## Abstract

A consistent hybrid ensemble filter (CHEF) for using hybrid forecast error covariance matrices that linearly combine aspects of both climatological and flow-dependent matrices within a nonvariational ensemble data assimilation scheme is described. The CHEF accommodates the ensemble data assimilation enhancements of (i) model space ensemble covariance localization for satellite data assimilation and (ii) Hodyss’s method for improving accuracy using ensemble skewness. Like the local ensemble transform Kalman filter (LETKF), the CHEF is computationally scalable because it updates local patches of the atmosphere independently of others. Like the sequential ensemble Kalman filter (EnKF), it serially assimilates batches of observations and uses perturbed observations to create ensembles of analyses. It differs from the deterministic (no perturbed observations) ensemble square root filter (ESRF) and the EnKF in that (i) its analysis correction is unaffected by the order in which observations are assimilated even when localization is required, (ii) it uses accurate high-rank solutions for the posterior error covariance matrix to serially assimilate observations, and (iii) it accommodates high-rank hybrid error covariance models. Experiments were performed to assess the effect on CHEF and ESRF analysis accuracy of these differences. In the case where both the CHEF and the ESRF used tuned localized ensemble covariances for the forecast error covariance model, the CHEF’s advantage over the ESRF increased with observational density. In the case where the CHEF used a hybrid error covariance model but the ESRF did not, the CHEF had a substantial advantage for all observational densities.

## 1. Introduction

Both theory (Bishop and Satterfield 2013) and many independent practical tests (Etherton and Bishop 2004; Wang et al. 2008a,b, 2013; Yaremchuk et al. 2011; Clayton et al. 2013; Kuhl et al. 2013) all suggest that hybrid error covariance models (Hamill and Snyder 2000; Lorenc 2003) are fundamentally more accurate than either just localized ensemble covariances or quasi-static climatological error covariance models. Usually, data assimilation (DA) schemes with hybrid error covariance models use a variational procedure to find the correction used to update the “best guess” of the state of the atmosphere. The reason for this is that variational data assimilation schemes (Derber et al. 1991; Courtier et al. 1994; Rabier et al. 2000; Kadowaki 2005; Xu et al. 2005; Rosmond and Xu. 2006; Rawlins et al. 2007; Gauthier et al. 2007; El Akkraoui et al. 2008) are specifically designed to accommodate the high-rank climatological part of the hybrid error covariance model, whereas ensemble Kalman filters cannot accommodate such high-rank matrices.

Ensemble Kalman filters (EnKFs) (Burgers et al. 1998; Pham et al. 1998; Houtekamer and Mitchell 2001; Bishop et al. 2001; Anderson 2001; Whitaker and Hamill 2002; Hunt et al. 2007) have received a great deal of attention in recent years. However, in atmospheric applications, EnKF data assimilation performance has not yet been good enough to justify a shift from variational DA to a nonvariational EnKF. An ideal analysis ensemble in a nonlinear system would be a random sample from the posterior distribution of truth given by Bayes’s theorem. Such an ensemble would be centered on the posterior mean or minimum error variance state estimate (an objective of EnKF data assimilation schemes) rather than the maximal likelihood state estimate (the typical objective of variational schemes). But in current practice, forecasts from hybrid variational analyses typically have smaller mean-square errors than those from EnKFs. Three possible reasons for the current inferior performance of EnKFs are that (i) current EnKFs do not accommodate hybrid error covariance models, (ii) the nonlinear constraints in mode-finding 4D variational schemes with outer loops may deliver “more realistic” states than those that come from EnKFs, and (iii) as will be outlined in this paper, there is an inconsistency between EnKF solutions and the known minimum error variance estimation equations when ensemble covariance localization is required. In this paper, we introduce a new ensemble filter, the consistent hybrid ensemble filter that is consistent with the minimum error variance estimation equations for localized ensemble covariances and that can accommodate hybrid error covariance models.

In section 2, we review the minimum error variance estimation equations for nonlinear observation operators and forecasts. Section 3 describes the CHEF approach to solving these equations. Section 4 compares the performance of CHEF against other ensemble-based data assimilation strategies. Conclusions follow in section 5.

## 2. The minimum error variance state estimation equations

One aim is to produce a minimum error variance estimate of the variables defining the model state and the list of functions of variables to be observed. In this context, is a discrete representation of smoothed reality and a list of perfect observations is defined by

where is the true filtered state. Precisely what constitutes the “filtered state” depends on both the model resolution and the diffusive effect of the numerical and subgrid parameterization schemes employed.

Any list of real observations will have observation errors given by

We will assume that has a mean of zero and a covariance of . The general nonlinear observation operator *H* can be represented in terms of a nonlinear state extension operation followed by a linear operation . The operator appends to the state vector , the minimal list of nonlinear functions of state elements defined at model grid points

which enables to be written in the form

where is a strictly linear operation. Equations (3) and (4) facilitate the consistent model space ensemble covariance localization discussed in section 3 and the upcoming proof that serial assimilation is equivalent to all-at-once assimilation. The amount by which the number of elements in **x** exceeds that of depends on the spatiotemporal coverage of observation platforms that observe nonlinear functions of the raw model state. While it is straightforward to compute ensemble covariances between nonlinear functions, current static climatological covariance models can only approximate covariances in the errors of forecasts of nonlinear functions of the state using linear approximations based on the best available guess of the state. This is a limitation of current static climatological error covariance models that future research may overcome.

For simplicity, we shall refer to the extended state vector **x** as the state vector and we will also assume that the extended state vector has *n* elements. Assume that the observations have been ordered into observation batch vectors that list observations. Concatenating all the gives the observation vector , which lists all observations. To each observation batch , there corresponds a linear observation operator and observation error covariance matrix such that

We assume that none of the observation errors in the first batch correlate with observation errors from the second batch—and so on.

Given true forecast and observation error covariance matrices and associated with the ensemble mean forecast *n-*vector of and the observations **y**, a linear observation operator such that , where are the true values of the observed variables and the model state variables, respectively, it can be shown (e.g., Daley 1991) that the state obtained by assimilating all of the batches of observations at the same time using

is the minimum error variance state estimate of the system, where is the analysis error covariance matrix given by

Note that provided the true distribution of forecast errors is known, (6) and (7) give the minimum error variance estimate for a linear estimator and its analysis error covariance matrix. This result holds even when the distribution of forecast errors is non-Gaussian and/or the observations are a nonlinear function of the state (Kalnay 2004, 151–155). Because we have included the nonlinear functions of the state required to form the observation operator within the state vector, the covariance matrix includes higher-than-second-order error moments and can accommodate Hodyss and Campbell’s (2013) quadratic innovation and hence (6) can be better than the best strictly linear state estimate.

Also note that (6) and (7) are readily extended into the time dimension by replacing the 3D state vector by a sequence of 3D states. Thus, as with other ensemble Kalman filters, the algorithmic changes required to turn the filter into a 4D smoother are trivial.

A compelling approach to data assimilation is to create a system that can solve (6) for any given estimate of and because then, provided one could figure out how to accurately estimate and , one would be able to obtain the minimum error variance estimate. In this paper, we refer to state estimation techniques that solve (6) and (7) as being consistent because they are consistent with the known minimum error variance state estimate. Note that there is only one minimum error variance state estimate given a single and .

Nonvariational ensemble Kalman filters such as those introduced by Houtekamer and Mitchell (1998), Anderson (2001), and Whitaker and Hamill (2002) utilize a serial or sequential observation assimilation approach in which the observations are assimilated one after the other. In section 3, we describe a serial assimilation approach that yields an analysis identical to that obtained from (6) regardless of whether the forecast error covariance model uses ensemble covariance localization and/or a hybrid and regardless of the order in which observations are assimilated. The analysis mean produced by serial ensemble Kalman filters with localization such as those introduced by Houtekamer and Mitchell (1998), Anderson (2001), and Whitaker and Hamill (2002) depend on the order in which observations are assimilated and, hence, do not accurately solve (6) and are incapable of producing the minimum error variance estimate and hence are inconsistent with (6).

Hunt et al.’s (2007) local form [local ensemble transform Kalman filter (LETKF)] of Bishop et al.’s (2001) ETKF uses observation error variance inflation to achieve localization and hence it too is inconsistent with (6) because it does not use the true observation error covariance matrix.

As far as the authors are aware, there are no nonvariational ensemble filters currently discussed in the literature that provide consistent solutions to (6) or (7) for the hybrid error covariance model. The objective of this paper is to introduce a new nonvariational consistent hybrid ensemble filter (CHEF) that solves (6) precisely and provides an extremely computationally efficient way of generating a *K*-member sample of posterior ensemble perturbations whose covariance converges to an exact solution to (7) as the sample size goes to infinity.

## 3. Obtaining the minimum error variance estimate using the CHEF

The CHEF algorithm is derived from a particular form of the proof that accurate serial-observation assimilation is identical to all-observations-at-once assimilation. Hence, we introduce CHEF with this proof.

It can be shown (e.g., Daley 1991) that if the all-at-once assimilation equations (6) and (7) are satisfied then the equations

are also satisfied (assuming that the inverses of the forecast and observation error covariance matrices exist). Equation (8) implies that if just the first batches of observations were assimilated using (6), then the resulting analysis would satisfy

Left multiplying (10) by then gives

By induction, the and obtained from the iteration

are precisely identical to the analysis and analysis error covariance matrix one would obtain by solving (6) and (7) in an “all-at-once” data assimilation step. It immediately follows that this consistent serial-observation assimilation solution is independent of the order in which observations are assimilated.

CHEF uses perturbed observations to obtain an analysis ensemble. Let each column of the *n* × *K* matrix be an ensemble member. Let the *p* × *K* matrix be a *K*-member ensemble of random observation perturbations created using

where the notation means that is a random normal vector with mean zero and covariance matrix equal to the identity matrix . Note that by definition the expected covariance as is required by the perturbed observations approach to ensemble generation. The *n* × *K* matrix of perturbed observations is then created using

Note that (14) and (15) ensure that the . We will use to denote the *j*th complete perturbed observation vector and to denote the *i*th batch of perturbed observations from .

Replacing the by , respectively, in (6)–(13) trivially shows that the analysis obtained from all-at-once assimilation of using as the first guess in (6) is identical to that which would be obtained from

The iteration (16) is, however, computationally impractical for large models. The CHEF algorithm uses observation volumes like the LETKF and Oke et al.’s (2005) ensemble optimal interpolation (OI) to ameliorate this problem.

To understand how observation volumes are used in the CHEF, consider Fig. 1. An ensemble of forecasts of the temperature variable (shown in red) is to be updated–analyzed using all observations within the local observation volume enclosed by the black circle. The observation (shown in magenta) is outside the observation volume and is not used to update the estimates of because it lies too many error correlation length scales away from to have a significant influence on the state at . However, observations (shown in blue) that lie within the observation volume are close enough to to have a nonnegligible impact on the estimate of the state at . Assume that these observations are related to the true values of the model variables via the equations , where are observation errors. Assume that the observation errors are uncorrelated with each other and that there are observation batches: the first containing and the second containing .

To map (16) to a form appropriate to the observation volume, we define the linear operators

where the row vector selects the variable from the state vector; that is, all of its elements are equal to zero except the element corresponding to . The observation volume form of (16) pertaining to the update of the ensemble estimate of is then given by

As we shall show later, provided the edge of the observation volume is a few times the distance beyond which variables have zero error correlation with each other, the state estimate obtained using the observation volume approach (18) is identical to the global solution from (16) to many significant figures. Note that in (18) there is no restriction to the type of model space forecast error covariance model used in CHEF. Introducing the notation and enables (18) to be further simplified to

These state vectors only have three elements and the error covariance matrices are just 3 × 3 matrices.

Campbell et al. (2010), inter alia, noted that when observations are integrals of the state (like ), ensemble covariance localization based on the distance between observations and other variables is problematic because integral observations cannot be associated with a single point in space–time. A consequence of this is that many ensemble covariance localization schemes do not localize covariances in observation space in a way that is consistent with the localization used between model variables. Campbell et al. (2010) demonstrated that such inconsistent localization gives data assimilation performance that is significantly inferior to that from consistent ensemble covariance localization that first localizes the ensemble covariances on the model grid, where separation distances are clearly defined, and then maps these localized covariances to observation space using the observation operator. Campbell et al. (2010) further note that the benefits of consistent model space ensemble covariance localization are particularly marked for satellite observations—the dominant observing platform over much of the planet.

While theoretically possible, a method for incorporating consistent model space ensemble covariance localization of the type discussed in Campbell et al. (2010) in the serial ensemble data assimilation schemes of Houtekamer and Mitchell (1998), Anderson (2001), and Whitaker and Hamill (2002) has not yet been proposed. We now show how CHEF readily accommodates a hybrid forecast error covariance matrix that incorporates consistent model space localization using the simple case outlined in Fig. 1. In this case, just five model variables are needed to estimate and and . The relevant 5 × 5 hybrid error covariance matrix is

where the columns is the *j*th ensemble perturbation, is the localization matrix possibly defined by the distances between model grid points, is the elementwise matrix product, and estimates the average climatological forecast error covariance. Note that because the distance between model grid points is unambiguously defined, the (extended) model space localization described in (20) avoids the difficulties that are associated with defining distance-dependent localization between model variables and observations whose location is ill defined because the observation is an integral of the state. The scalars α and β are positive definite weights that determine the extent to which ensemble-based or static covariances dominate the hybrid covariance model.

Having defined in the space of relevant model variables, one obtains in observation space using an operator that is just like the operator except it is designed to operate on the short vectors listing the five relevant variables in the observation volume rather than the long vectors listing all model variables. For the simple case depicted in Fig. 1,

The hybrid defined by (21) can then be used to initiate the data assimilation loop defined by (19). Note that while there is an up-front computational cost involved in building the hybrid of a model space localized ensemble covariance matrix and a climatological covariance matrix, once has been formed, the use of model space covariance localization does not impose any additional costs on the serial data assimilation loop given by (19).

Further computational advantages can be obtained over the formulation given by (19) by noting that (i) the correction term to the error covariance matrix of the state estimate has a low rank equal to the number of observations in each batch and (ii) once an observation has been assimilated, subsequent state estimates of the observed variable have no effect on the state estimate of the model variable being updated. In other words, the algorithm (19) wastes computational resources by continually updating estimates of observed variables even after the observation has been assimilated and also because it uses up more memory that it needs to when updating the error covariance matrix of the state estimate. These wasteful aspects of (19) can be removed by using

where the operator removes all the elements of column vectors pertaining to the (*i*+1)th batch of observations that was just assimilated. This feature makes the number of calculations associated with (22) very much smaller than the number of calculations associated with (19) when the number of observations within the observation volume is large. Equation (22) constitutes the CHEF algorithm for updating .

Note that while one computational processing unit (CPU) was using (22) to update the ensemble estimate , a differing CPU could use essentially the same algorithm to simultaneously and independently update another variable—, say. In this way, CHEF is computationally scalable up to the number of model variables. In the case of a very large ensemble, further scalability could be obtained by assigning differing processes to different parts of the loop in (22) over ensemble members.

Also note that the final output of the CHEF algorithm is a *K*-member perturbed observations analysis ensemble that is consistent with the prescribed hybrid analysis error covariance matrix. In a cycling system, this *K*-member analysis ensemble would be propagated by the nonlinear model across the subsequent data assimilation window. This forecast ensemble would then be used to construct the localized ensemble covariance matrix portion of the hybrid error covariance matrix to be used in this subsequent data assimilation window. Note that this localized ensemble covariance matrix is guaranteed to be positive semidefinite provided that the localization matrix is positive semidefinite. This fact should allay concerns about the fact that numerical round-off error can cause the non-square-root analysis error covariance update given by the third line of (22) to produce an analysis error covariance matrix with very small negative eigenvalues. Unlike in the classical Kalman filter, these negative eigenvalues cannot be amplified by time propagation and data assimilation cycling because, in CHEF, covariance time propagation is entirely handled by the *K*-member ensemble forecast—which is guaranteed to produce a positive semidefinite forecast error covariance matrix at the next cycle. The presence of very small negative eigenvalues in the posterior covariance is unlikely to cause any problems in the inverse square root operation on the second line of (22) because of the addition of the observation error covariance matrix. Furthermore, our experiments with CHEF so far indicate that any problems associated with noncycled spurious negative eigenvalues are imperceptible. [See Grewal and Andrews (2008) for a detailed discussion of these issues.]

The general CHEF algorithm for updating any model state is essentially identical to that given by (22). One just needs to define the model variables and observation operators appropriately to the system being updated. Note also that it is trivial to extend (22) to the case where (22) is used to update a patch of model variables. Natural choices for such patches include (i) all the variables at a single grid point, or (ii) all the variable values at a particular grid point at discrete times over a data assimilation window, or (iii) as in (ii) but for a 5 × 5 × 5 cube of grid points.

To better understand the potential computational costs associated with the CHEF, let us suppose that the observation volume was much bigger than that depicted in Fig. 1 and that we used about well-selected observations to update the single variable rather than just 2. In a great number of practical problems, 1000 well-selected observations would tie down the values of model variables at a single grid point, in the sense that the addition of more observations would not appreciably change the estimated value.

With around 1000 observations, the matrix would contain *O*(10^{6}) elements. Since 4 bytes of memory are required to represent a floating point number with seven significant decimal digits, 4 × 10^{6} bytes of memory would be required to hold this matrix in memory. Since a typical current day supercomputer CPU with 16 GB (16 × 10^{9} bytes) of RAM can hold 4000 such matrices at any one time, holding a single such matrix should not present much of an impact on available memory. In appendix A, we estimate that the number of floating point operations is *O*(*p*^{3}) ~ *O*(10^{9}) for a 100-member ensemble. Since each grid point can be updated independently of all other grid points the CHEF filter is embarrassingly scalable up to the number of model grid points and hence is well suited to massively parallel computing architectures.

For this number of observations, the cost of CHEF is considerably larger than the corresponding observation volume cost for a LETKF, which is *O*(*pK*^{2}) ~ 10^{7}. This big difference in cost means that the LETKF could use a bigger ensemble than the CHEF at the same computational cost. However, some experimenters (I. Szunyogh 2013, personal communication) have found little benefit in increasing the LETKF ensemble size beyond 100 members. Let us suppose enough computational resources were available to run the ensemble at a size where there was little or no benefit to LETKF or ensemble square root filter (ESRF) from increasing the ensemble size any further. If at this ensemble size, the CHEF with its hybrid error covariance model could still significantly outperform the LETKF or ESRF, then it would still have something of value to offer those who could afford its cost.

## 4. Tests of CHEF

The specific tests described in this section are based on synthetically generated forecast errors and corresponding forecast error covariance models. No dynamical forecast model is employed. Unlike real data tests or tests that use dynamical models to generate observations and forecasts, in these tests it is possible to know a priori what the optimal analysis is and hence enables diagnostics that are not readily available for tests involving dynamical models.

### a. General idealized framework

To provide a data assimilation testing environment in which the optimal solution is known, and readily calculated, we consider a ring of *n* equally spaced points at which the values of the *n*-vector is defined. To the extent that the innovation can be expressed in terms of the difference between observation and forecast errors , the structure of analysis corrections (6) and the analysis error covariance matrix given by (7) are completely independent of the actual values in . Hence, for simplicity we will assume that so that . We define a known true forecast error covariance matrix in the form

where is an orthonormal matrix ( being the identity matrix) whose columns list the orthonormal eigenvectors of the forecast error covariance matrix and is a positive semidefinite diagonal matrix whose elements list the eigenvalues corresponding to . For simplicity, we let be defined by a discrete Fourier basis of sine and cosine functions. This basis is readily obtained from the inverse Fourier transform of the identity matrix. We then assume that the forecast error and true state are given by the stochastic process

We use (24) to generate forecast errors because it ensures that (i) forecasts are unbiased and (ii) that the covariance of forecast errors is consistent with (23) because

Oceanic and atmospheric observational networks are constantly changing. To ensure that the performance of tested data assimilation schemes is independent of the particular location and density of observations, for some trials, we let the number and location of observations be determined by random uniformly distributed integers so that these quantities differ from one trial to the next. This approach often leads to more than one observation being taken at the same grid point. While the collocation of observations might lead to poor convergence rates in variational methods, the speed of the direct solution methods used here is insensitive to observation collocations. We assume that all observation errors are uncorrelated and that they all have the same error variance *R*. Consequently, for any given trial, the observation error covariance matrix is a diagonal *p* × *p* matrix whose diagonal elements are all equal to *R*.

The actual observation vector **y** to be assimilated is defined using

When *q* independent observations of observation error variance *R* reside at the same grid point, one can show that the error variance of the mean of these *q* observations is equal to *R*/*q*. Hence, by allowing random integers to assign more than one observation at each grid point our approach also partially randomizes the effective error variance of the network at each grid point.

In summary, the above procedure provides observations, observation operators, and forecasts for which all associated covariances are precisely known. This facilitates comparisons of suboptimal with optimal data assimilation schemes.

### b. What is the consequence of only using “local” observations to analyze a grid point?

From section 3, it is clear that the computational cost of CHEF decreases as the number of observations allowed to influence the analysis at each grid point is decreased. In addition, if the forecast error correlation between the observed variable and the analyzed variable is close to zero, it seems plausible that the observation will have very little effect on the analyzed variable.

Let us define the correlation width to be the minimum separation distance required to ensure that the real forecast error correlation between two variables is smaller than some critical value. In the experiments described in this subsection, we will quantify the difference between the optimal analysis obtained by (6) and the CHEF analysis as a function of the number of correlation widths used to define the observation volume.

For the experiments used to quantify this dependency, we use the following parameters. The number of grid points *n* = 128 and there is just a single variable at each grid point. For the trials in this subsection, the number of observations in each trial is determined by a random draw from a uniform distribution of integers between 1 and a prespecified . While we assume that all observations are collocated with grid points, the index of the grid point at which each observation is taken is determined by a random draw from a uniform distribution of integers between 1 and *n*. We also choose so that the expected number of observations is *p*/4. The observation error variance *R* = 1. The *i*th nonzero element of the diagonal matrix defining the forecast error covariance matrix in (23) is given by

where is the wavenumber of the *i*th sinusoidal eigenfunction associated with . Equation (27) ensures that the sum of the eigenvalues and hence that the sum of the diagonal elements (the trace) of is also . Because only depends on wavenumber and is independent of whether it pertains to a sine or cosine eigenfunction, the forecast error variance is the same at each horizontal location. Hence, (27) also ensures that the variance at each grid point is given by . For the experiments discussed in this subsection, we chose and .

The solid line on Fig. 2 displays the correlation of the forecast errors at each of the model grid points with the forecast error at the grid point at the center of the model domain. The peaks of the dashed line display the points at which the magnitude of this correlation function has fallen below 10^{−4} of its peak value. We define the correlation width to be the distance between the peak of the covariance function and this point. In this case, the correlation width is seven grid points.

For our experiments, we let the CHEF create an analysis at each grid point using all observations within the grid point’s observation volume. A gridpoint’s observation volume includes all observations within some specified number of correlation widths of the grid point. The solid line on Fig. 3 shows the mean over 10 independent trials of the base 10 logarithm of the maximum of the absolute values of the difference between each optimal analysis (6) and the CHEF analysis as a function of . (For reference, the mean-square error of the optimal analyses over 10 trials ranged from 0.5 to 0.7 over the 20 sets of 10 trials.) It shows that mean maximum differences between the CHEF analysis and the optimal analysis rapidly decreases from around 10^{−1} when only observations less than half a correlation width (three grid points) from the grid point being updated are used to below 10^{−6} when all observations less than 2.5 correlation widths from the grid point are used. The maximum differences remain at about this (very small) value until at 9.5 correlation widths (66 grid points) all observations are allowed to update each grid point. At this point, the maximum differences drop to around 10^{−15}. Differences at this magnitude can be attributed solely to numerical round-off error associated with the discrete representation of numbers used by the computer program. In other words, as anticipated by the theory, the CHEF delivers an analysis identical to the optimal solution (6) provided all observations are used to update each grid point. As will be discussed later, current EnKFs do not have this property. The dashed line on Fig. 3 shows that, for all values of , the domain averaged mean of differences between the optimal analysis mean (6) and the CHEF analysis mean is about an order of magnitude smaller than the mean maximum difference in all cases.

The slight lack of equivalence between the CHEF and (6) between 2.5 and 9 correlation widths is due to the fact that when one can construct a chain of observations around the domain such that each observation lies within a correlation width of another observation, each observation will have a nonzero contribution to the analysis at every single model grid point.

Since (i) some atmospheric and oceanic observations are only reported to two or three significant figures, and (ii) the observation error variance (including the observation error variance of representation) is barely known to two or even just one significant figure, one could argue that the differences between the CHEF analysis and the optimal analysis (6) are truly insignificant provided those differences are less than 10^{−3}. Figure 3 indicates that this requirement is met by the CHEF provided one allows observations within a single correlation width to influence the analysis at each grid point.

### c. Comparison of the CHEF with other algorithms

By ESRF, we mean the filter described in Whitaker and Hamill (2002). As indicated in Tippett et al. (2003), for serial single observation processing, the ESRF is identical to the ensemble adjustment Kalman filter (EAKF) described in Anderson (2001).

#### 1) Theoretical aspects

In ESRFs, the error covariance of the ensemble mean after the assimilation of *i* observations is given by

where is the raw ensemble covariance matrix and , where are the ensemble perturbations after the assimilation of *i* observations. In (28), is a correlation matrix used to attenuate spurious nonzero sample covariances associated with widely separated grid points. Note that the rank of the localized covariance matrix can be much greater than *K*, the number of ensemble members. As shown in the appendix of Posselt and Bishop (2012), if no ensemble covariance localization was required (all elements of were equal to unity) then after the assimilation of the *i*+1th batch of observations an ensemble of analysis perturbations whose covariance would be precisely consistent with (7) is

where **V** and are the orthonormal eigenvectors and eigenvalues of the observation space covariance matrix . In appendix B, we show that (29) is equivalent to Whitaker and Hamill’s (2002) form of the ESRF [and hence to Anderson’s (2001) EAKF] in the case of single observation processing. Note that is not the square root of the prior *localized* covariance matrix Equation (29) would deliver the true square root of the posterior covariance matrix if were replaced by a left square root but this would greatly increase the cost of the ESRF. Instead, after (29), ESRFs estimate the error covariance matrix of the ensemble mean after the assimilation of *i* + 1 observations using

However, (30) is only consistent with (7) in the special case that all the elements of are equal to unity, thus making the ensemble covariances unlocalized. Otherwise, it is inconsistent with (7) because is not the (high rank) square root of Specifically,

In contrast, the CHEF’s update of the error covariance matrix is always consistent with (7). This is one reason why the CHEF analysis should be more accurate than the ESRF analysis.

A second reason to expect that CHEF should outperform the ESRF is that the CHEF can use hybrid forecast error covariance matrices that blend climatological covariance matrices and ensemble-based error covariance models. A simple form of the hybrid covariance model that is often incorporated into variational data assimilation schemes is just . However, more general hybrid covariance models along the lines of are also possible and they could also easily be incorporated into CHEF.

In the case where a hybrid gave the expected true error covariance given an imperfect ensemble covariance, the CHEF would yield the minimum error variance estimate and a consistent ensemble. Schemes that could not accommodate hybrid covariance models such as the ESRF and EAKF would not be able to provide the minimum error variance estimate. 4DVAR schemes that can accommodate hybrid covariance models but give the posterior mode rather than the posterior mean would also fail to yield the linear minimum error variance state estimate.

The CHEF algorithm has similarities to the ESRF, EnKF, and LETKF. Like ESRFs and EnKFs, the CHEF processes observations serially. Like the LETKF and Oke et al.’s (2005) ensemble optimal interpolation, the update for each model variable can be done independently of the update of other model variables thus enabling a high degree of computational scalability. Such similarities will likely facilitate the incorporation of CHEF into existing LETKF or ESRF data assimilation schemes.

To test the ESRF algorithm based on using (28)–(30), it was coded up in MATLAB and then run for the special case in which there is no ensemble covariance localization. Localization was turned off by setting all of the elements of equal to unity; in other words, one sets , where denotes a row vector whose elements are all equal to unity. In this case, the ESRF should give exactly the same analysis as the CHEF and “all-observations-at-once” data assimilation schemes. The MATLAB version of the algorithms confirmed that this was true.

#### 2) Quantitative comparison of CHEF and ESRF algorithms

It can be shown that the nonhybrid CHEF analysis mean is identical to the ESRF analysis mean in the special cases of either a single observation or observations that are separated by a distance that is large compared with the error correlation length scale. Consequently, we expect the difference between CHEF and ESRF performance to be a function of the expected number *m* of observations that have a significant forecast error covariance with the variable being updated. Currently, operational global model forecasts assimilate millions to tens of millions of observations. Hence, it is possible that many thousands of observations could influence the analysis at many of the grid points in these systems.

To increase the number of observations that influence the analysis at each grid point, we set the parameter *d* used in (27) to *d* = 3. As illustrated in Fig. 4a, this results in a true physical space error correlation length scale that is six times broader than that shown in Fig. 1. As a quantitative measure of the number *m* of locally influential observations, we define *m* to be the number of observations that have a forecast error covariance greater than 0.1 with the variable being updated. With *d* = 3 and a single observation at each grid point, *m* = 41. With two, three, and four independent observations at each grid point, *m* = 82, 123, and 164, respectively. Thus, by changing the number of independent observations at each grid point we can isolate the effect of increasing the number of influential observations while keeping all other aspects of the experiment the same.

The localization matrices were assumed to have the same form as the true forecast error covariance matrix, but with a different *d* value. In each experiment, the localization function was tuned to minimize mean-square error (mse) in the *m* = 41 case.

For both the CHEF and the ESRF in all of the experiments described below, the analysis at each grid point was created using all observations that lay within two correlation widths of the grid point being updated. Consistent with Fig. 3, this was found to give CHEF analyses identical to analyses obtained using all-observations-at-once assimilation using (6) to more than four significant figures.

To motivate specific idealized scenarios, we note that what is considered an “ideal” ensemble depends on its intended use. An ideal for ensemble forecasters is an ensemble of forecasts whose members appear to have been drawn from the same distribution as the truth given antecedent conditions. In contrast, an ideal ensemble for data assimilation groups is an ensemble from which one may derive a high-rank forecast error covariance matrix that is equal to the covariance matrix of forecast errors given antecedent conditions. These differing ideals motivate the following two idealized experimental scenarios:

Scenario A: The ensemble members are drawn from the same distribution as the truth (as desired by ensemble forecasters). In this scenario, neither localized ensemble covariance matrices nor hybrid error covariance models that linearly combine the localized ensemble covariance matrix with a static covariance matrix are equal to the true error covariance matrix. In scenario A, both the truth and the ensemble members are stochastically generated using (23) with

*d*= 3.- Scenario B: The true error covariance matrix is a hybrid error covariance matrix in which a weight of 0.5 is given to the localized ensemble covariance matrix and a weight of 0.5 is also given to the climatological (static) error covariance matrix . In other words, . In this scenario, the ensemble is not drawn from the same distribution as the truth, but when its covariance is localized and used in the hybrid error covariance model it yields the true error covariance model (as desired by the data assimilation group). In scenario B, the ensemble members are obtained using (23) with
*d*= 3 in exactly the same way as in scenario A, but the truth is obtained from the stochastic process , where**c**is a random normal vector with mean**0**and covariance so that as is required by this scenario.

In the non-hybrid case, the differences between CHEF and the ESRF would vanish in the limit of an ensemble so large that localization was not required. Hence, to highlight the differences in the algorithms, we use an ensemble size for these experiments that will clearly benefit from localization: specifically, a small six-member ensemble. Before beginning quantitative tests for scenario A, we first tuned the localization function for the special case that each variable was observed once ( = **)** with observations of unit error variance ( = ). With this setup, the mean-square errors for the CHEF over 7 × 16 independent trials were 0.1078, 0.1072, and 0.1075 for *d* = 2.7, *d* = 2.4, and *d* = 2.1, respectively. Thus, the localization function corresponding to *d* = 2.4 minimized mse for CHEF. This localization function is depicted by the dotted–dashed black line in Fig. 4a. The gray dashed line in Fig. 4a gives the unlocalized ensemble sample covariance function of ensemble perturbations at each model variable with the 64th model variable. The solid gray line gives the corresponding localized ensemble covariance function while the black line gives the corresponding true error covariance function. Note that the “tuned” localization function is broader than the true error covariance function. Also note that the mse results obtained from the tuning exercise indicate a general lack of sensitivity to the localization length scale since the mean-square errors for *d* = 2.7 and *d* = 2.1 differ from that for *d* = 2.4 only in the fourth significant figure.

The corresponding mean-square errors for the ESRF over 7 × 16 independent trials were found to be 0.1127, 0.1127, and 0.1135 for *d* = 3.15, *d* = 3.45, and *d* = 3.75, respectively. We chose the *d* = 3.15 value for the ESRF localization because it was just as good as that for *d* = 3.45. Note that the length scale for the *d* = 3.15 localization function is actually narrower than the true error correlation length scale associated with *d* = 3.

To isolate the performance penalty paid by the ESRF for not accurately solving (6) as it serially assimilates observations, it is of interest to quantify the difference between ESRF and CHEF performance when the CHEF is restricted to using the non-hybrid localized ensemble covariance matrix . To this end, we performed a series of independent experiments for *m* equal 41, 82, 123, and 164 corresponding to each variable being independently observed, one, two, three, and four times. For each value of *m,* we did seven sets of where was chosen to ensure that the average mse over from CHEF was smaller than the corresponding mse from ESRF in each of the seven sets of Insisting that be large enough to make CHEF win all seven independent sets of trials ensures that the null hypothesis that the CHEF and ESRF are of equal quality can be rejected with more than 99% confidence. (This follows because the probability of flipping a coin and obtaining seven heads in a row is 1/128.) For *m* equal 41, 82, 123, and 164 and observation error variance equal to unity, it was found that with values of 16, 16, 8, and 6 were needed to achieve this property. The required value of decreases as *m* increases because the margin of improvement of CHEF over ESRF increases with *m*. To normalize the reduction in mse measure, we use the percentage measure,

where mse(optimal) is the mean-square error from the data assimilation scheme that used the true forecast error covariance matrix and a global solve. On average, mse(optimal) is equal to the mean value of the diagonal elements given by (7).

The dark solid line on Fig. 5a gives the average percentage mse reduction *S* due to using CHEF instead of the ESRF as a function of *m* for the case where both the ESRF and the CHEF begin assimilation with the localized ensemble covariance matrix. It shows how this improvement increases with the number *m* of observations that have a significant influence on the analysis of each grid point. Specifically, as *m* increases from 41 to 164, the average percentage mse reduction *S* due to CHEF increases from 8.2% to 26%. Note that these differences are solely attributable to the fact that the ESRF does not accurately solve (7) as it serially assimilates observations.

The dashed and dotted–dashed lines on Fig. 5a give the minimum and maximum values of the mse for the seven sets of trials. They give a sense of the range of mse values that were obtained in the trials. This range increases with *m* because was selected to decrease with *m*.

The thick gray line on Fig. 5a shows how CHEF performance increases when it is permitted to use a hybrid error covariance model of the form while the ESRF uses an optimally tuned localized ensemble covariance matrix. In this case, as *m* increases from 41 to 164, the average percentage mse reduction *S* due to CHEF increases from 31% to 54%. Thus, in this scenario A case, the reduction in analysis error variance due to the CHEF’s ability to use a hybrid error covariance model is much greater than the reduction in analysis error variance due to its ability to accurately solve (6).

We now consider scenario B. Here, the true forecast error covariance matrix is given by the hybrid error covariance model , where is defined by the correlation matrix corresponding to *d* = 2.25. The six ensemble members whose covariance defines are drawn from a distribution whose covariance is given by so that when averaged over many trials ( is the covariance matrix corresponding to *d* = 3).

As in scenario A, independent tuning experiments were performed to independently optimize CHEF and ESRF localization for scenario B. The localization matrices corresponding to *d* = 3.15 and *d* = 2.25 were found to minimize mse for the ESRF and the non-hybrid CHEF, respectively.

For *m* equal 41, 82, 123, and 164 and observation error variance equal to unity, it was found that with values of 16, 16, 8, and 5 were needed to ensure that CHEF had lower mse than ESRF in all seven independent trials. As before, the required value of decreases as *m* increases because the margin of improvement of CHEF over ESRF increases with *m*.

The dark solid line on Fig. 5b shows that the superiority of CHEF over the ESRF again increases with the number *m* of observations that have a significant influence on the analysis of each grid point. Specifically, as *m* increases from 41 to 164, the average percentage mse reduction *S* due to CHEF over ESRF increases from 8% to 27%. Again, these differences are solely attributable to the fact that the ESRF does not accurately solve (6) as it serially assimilates observations. The thick gray line on Fig. 5b pertains to the case where we let CHEF use the true hybrid error covariance model while the ESRF uses the optimally tuned localized ensemble covariance matrix . In this case, as *m* increases from 41 to 164, the average percentage mse reduction *S* due to CHEF increases from 20% to 29%. Note that in this case, the CHEF analysis is identical to the optimal analysis because the hybrid error covariance model used in CHEF is the true error covariance matrix. It is impossible to get an analysis of the form better than the CHEF analysis in this case.

These scenario B experiments serve to illustrate how when CHEF is used to assimilate observations, more accurate forecast error covariance models translate into improved analyses and that this property is not shared by the ESRF. Furthermore, the CHEF’s ability to incorporate hybrid error covariance models makes it easier to improve forecast error covariance models.

### d. Comparison of CHEF and ESRF performance in a data assimilation cycle

To check that there are no unforeseen problems associated with using CHEF to cycle an ensemble, here we compare the performance of CHEF and the ESRF in a simple cycling data assimilation system. We use model II of Lorenz (2005). The model features 240 grid points with periodic boundary conditions. We set the forcing term *F* = 15 and set the smoothing term (*K* in Lorenz’s notation) to 8. The integration time step was set equal to 0.025. This model features smoothly varying waves that chaotically grow, decay, and propagate. The model was integrated for 30 000 time steps from a random initial condition. This 30 000 time step state was then deemed to be “the true state” and another 2000 time steps were performed. Observations at every grid point were generated every second time step by adding random Gaussian noise to each true grid point value with a mean of zero and a variance *R* = 1.4. This value of *R* was chosen because its square root is ⅕th of the standard deviation of climatological variations. A *K* = 10-member ensemble was created by randomly sampling states from the original 30 000 time step run between the 15 000th and 30 000th time steps of the original run. Using this 10-member ensemble, data assimilation was performed every second time step to assimilate observations using either the CHEF or ESRF method. To make the performance of CHEF and ESRF more comparable, we ran CHEF in non-hybrid mode, only allowing it to use a localized 10-member ensemble covariance for its forecast error covariance model. Both the CHEF and the ESRF were able to quickly reduce the analysis error variance from a value near the climatological error variance to a value much lower than the observation error variance. The localization length scale was independently tuned for CHEF and ESRF to minimize mean-square error variance over the last 500 data assimilation cycles. As in the non-cycling experiments, it was found that the optimal localization function width for CHEF was considerably broader than that for the ESRF.

Many authors including Whitaker and Hamill (2002) have pointed out that the perturbed observations ensemble generation technique can introduce a lot of noise into the ensemble covariance model when the number of ensemble members is significantly less than 100. The ESRF does not use perturbed observations and hence avoids this source of noise. Nevertheless, even with this small ensemble size, Fig. 6 shows that ESRF and CHEF performance are comparable with CHEF having slightly smaller mean-square errors. Both the paired sample *t* test and the two sample *t* test state that the null hypothesis that the CHEF is no better than the ESRF can be rejected with 99% confidence for the last 1000/900/800/700/400/300/200 and 100 data assimilation cycles. Over the last 600 and 500 data assimilation cycles, the paired sample *t* test only allows the null hypothesis to be rejected with 75% and 94% confidence, respectively.

Thus, the superiority of the CHEF over the ESRF was not compromised by cycling—even when the deleterious effect on CHEF of choosing a small perturbed observations ensemble was combined with the deleterious effect of choosing to deny CHEF a hybrid covariance model.

## 5. Conclusions

Data assimilation schemes that use hybrid forecast error covariance models that linearly blend ensemble and quasi-static covariances have been found to outperform those that do not. High-rank hybrid error covariance models are readily accommodated by variational data assimilation schemes. However, variational schemes are typically designed to find posterior modes rather than posterior means when nonlinearity is present. In contrast, EnKFs are specifically designed to find the best linear unbiased estimate of the posterior mean rather than the mode. Hodyss (2012) has shown that a better than linear estimate of the posterior mean can be obtained in the EnKF framework by extending the innovation vector with higher powers of the innovation. Since an ensemble generation scheme that solved Bayes’s theorem in the general non-Gaussian case would center the ensemble on the true posterior mean, we speculate that the EnKF framework has a better chance of imparting this property to ensembles than variational schemes. Unfortunately, existing EnKFs solve (6) inaccurately and have been unable to accommodate high-rank hybrid error covariance models. In contrast, the new nonvariational consistent hybrid ensemble filter (CHEF) introduced here accurately solves (6) and readily accommodates hybrid error covariance models. Despite the word “filter” in its moniker, we have derived the CHEF in a fully four-dimensional “smoother” context. It is likely that the CHEF could be incorporated into existing EnKF, ESRF, and LETKF data assimilation systems fairly easily because of its similarities with these schemes. To summarize,

When given the true forecast and observation error covariance matrix and when all observations are allowed to influence the analysis at each grid point, the CHEF analysis is the linear minimum error variance estimate.

Like Hodyss’s (2012) quad filter, the CHEF can use higher-than-second-order error moments to estimate the posterior distribution; it is not limited to linear estimates of the posterior mean.

Optimal and CHEF analyses were found to agree to the first 3–4 significant digits if all observations within 1.5 correlation widths of each grid point were used.

Even with a nonhybrid forecast error covariance model, the CHEF outperformed an ESRF identical to those described in Anderson (2001) and Whitaker and Hamill (2002). The degree of relative superiority increased as the density of observations was increased. Despite the CHEF’s use of perturbed observations for ensemble generation, this superiority was maintained in cycling experiments with a small 10-member ensemble.

The margin of superiority of the CHEF over the ESRF was even larger and less sensitive to the density of observations when the CHEF was allowed to utilize its capability to accommodate hybrid error covariance models.

In an atmospheric application, the CHEF is likely to be more computationally expensive than both the LETKF and the ESRF but it will be more accurate. To see whether the additional computational expense is worth it, we hope to compare the performance of these data assimilation schemes against operational schemes in future work.

## Acknowledgments

This work was conceived while a guest of Juan Antonio Rengel Ortega and the Instituto Hidrográfico de la Marina (IHM), Armada Española, and Cádiz-España. Thanks to Juan and IHM for their hospitality. Craig Bishop acknowledges financial support from the Department of Navy’s Engineers and Scientists Exchange Program and the Office of Naval Research Grant N0001413WX00008. Bo Huang and Xuguang Wang acknowledge support from NASA NIP Grant NNX10AQ78G and NOAA Grant NA14NWS4680021.

### APPENDIX A

#### CHEF Cost Analysis Using 1000 Observations per Grid Point

Let the number of observations ; we take and . We use these values to give specific estimates of the order of magnitude of the number of floating point operations (FLOPS) required to apply each of the steps in the CHEF algorithm outlined in (22).

##### Cost of computing

This cost is a one-off cost incurred before the algorithm’s loop over batches of observations is begun. The CHEF puts no constraints on the form of . For the purpose of computing the cost of the CHEF filter, let us also assume that the matrix takes the hybrid form

where is a minimal list of the *q* variables in the observation volume that are required to predict the observations in the volume. This list also includes the variable to be updated. Since some observations are integrals of the state, we will assume that this list is four times larger than the number of observations so that and . The matrices and give the localization matrix and climatological forecast error covariance matrix for the *q* variables in . The parameters and are positive definite scalars whose values are obtained through some sort of “tuning” procedure. If *K* is , the number of FLOPS required to compute each element of the part of (A1) associated with the localized ensemble covariance would be . Generally, the cost of computing the static part would be even less than this so the cost overall would still be FLOPS. For very small ensemble sizes, the cost might be dominated by —but we will not consider that case here.

##### Cost of

For the first loop, the cost of obtaining is , while the cost of computing is also . The cost of these two operations decreases each loop because the number of rows and the number of columns decreases by 1 with each iteration. These two operations must be repeated times so an over estimate of the total cost for these operations is . This cost is about the same as the cost as constructing .

The cost of all the other steps in the algorithm (22) is orders of magnitude smaller than the cost of these two operations. Hence, it costs about FLOPS to assimilate 1000 observations with CHEF.

### APPENDIX B

#### Equivalence of ESRF Given by (29) and Whitaker and Hamill

In the case of serially assimilating single observations, Eq. (13) of Whitaker and Hamill (2002) defines a variable α that ensures that , where . In the notation of our paper, the value they give for this variable is . Now, by inspection of (29), would be equivalent with (29) provided that

To see that these are equivalent, note that

Using this in (B1) and noting that with a single observation **V** = = 1 and ,

then proves the equivalence of (29) and the expression given in Whitaker and Hamill (2002).

## REFERENCES

*Atmospheric Data Analysis*. Cambridge University Press, 457 pp.

*Kalman Filtering: Theory and Practice Using MATLAB*. 3rd ed. John Wiley and Sons, 592 pp.

*Atmospheric Modelling, Data Assimilation and Predictability*. Cambridge University Press, 341 pp.