1. Introduction
Interest in data assimilation methods using ensemble integrations of prediction models is growing rapidly in the atmospheric and oceanic communities. This is occurring because ensemble assimilation methods are maturing rapidly and because both prediction centers and research groups are becoming increasingly interested in characterizing more information about the probability distribution of the climate system than can be revealed by a single assimilated state estimate.
Ensemble assimilation methods were originally developed as computationally feasible approximate solutions of the nonlinear filtering problem patterned after the Kalman filter (Kalman and Bucy 1961; Courtier et al. 1993). This led to a sequence of related methods known as ensemble Kalman filters (Evensen 1994), which have been extended to increasingly general assimilation problems (Houtekamer and Mitchell 1998). More recently, other variants, still referred to as ensemble Kalman filters (Bishop et al. 2001; Anderson 2001; Pham 2001), have appeared in the literature demonstrating improved assimilation error characteristics and/or decreased computational cost. Some of these were developed directly from the probabilistic statement of the nonlinear filtering problem, rather than starting from the Kalman filter. Developing filters in this context can lead to a more straightforward understanding of their capabilities for those not intimately familiar with the intricacies of the Kalman filter.
Here, a framework is developed in which many of the ensemble Kalman filter methodologies documented to date can be described while still supporting a more general class of ensemble filters. The derivation begins with the nonlinear filtering problem and applies a sequence of simplifying assumptions. The introduction of a joint state–observation space (Tarantola 1987) leads to an ability to deal with observations related to the model state variables by nonlinear functions. A least squares assumption (equivalent to assuming a local Gaussian relation among the prior joint state variables) has been made, sometimes indirectly, in many descriptions of ensemble Kalman filters. Here, that assumption is made explicitly and a significant simplification in the description of the algorithms results. Under the assumptions made here, the ensemble filter problem simplifies to an application of a nonlinear filter to a scalar, followed by a sequence of linear regressions. This simplification makes it easier to analyze the relative capabilities of a variety of ensemble filter implementations and can lead to reduced computational cost.
Section 2 derives this context for ensemble filtering and section 3 shows how several previously documented ensemble Kalman filters are related. Section 4 discusses details of methods for doing scalar assimilation problems while section 5 offers conclusions.
2. Ensemble filtering
To simplify notation, this section discusses only what happens at a single time at which observations become available. Discussion of how filter assimilations are advanced in time using ensemble methods and prediction models can be found in Anderson (2001, hereafter A01), Houtekamer and Mitchell (1998), and Jazwinski (1970). Basically, each ensemble member is integrated forward in time independently using a forecast model between times at which observations are available.
a. Joint state–observation space and Bayesian framework


b. An ensemble method for the filtering problem
In ensemble methods for solving (2), information about the prior distribution of the state variables, xp, is available as a sample from N applications of a prediction model. An ensemble sample of the prior observation vector, yp, can be created by applying the forward observation operator to each prior state ensemble member, yp,i = h(xp,i), i = 1, … , N.
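As a purely illustrative sketch (not the paper's implementation), the joint state–observation ensemble implied by this construction might be assembled as follows; the function name joint_prior_ensemble and the operator h are hypothetical, and NumPy is assumed.

```python
import numpy as np

def joint_prior_ensemble(x_prior, h):
    """Append the prior observation sample y_i = h(x_i) to each state ensemble
    member to form the joint state-observation ensemble.

    x_prior : (N, n) array of N prior ensemble members of the model state.
    h       : forward observation operator mapping a state vector to the
              expected value of a single scalar observation.
    Returns an (N, n + 1) array whose last column is the prior observation sample.
    """
    y_prior = np.array([h(x) for x in x_prior])    # y_{p,i} = h(x_{p,i})
    return np.column_stack([x_prior, y_prior])     # z_{p,i} = (x_{p,i}, y_{p,i})
```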
Some Monte Carlo (ensemble) methods also have a weight, w, associated with each ensemble member. The possibility of weighted ensembles is not discussed in detail here, but the methods in this section are easily generalized to this case. Attempts to apply the most common types of weighting/resampling Monte Carlo algorithms in high-dimensional spaces have faced significant difficulties.
Observational error distributions of climate system observations are generally only poorly known and are often specified as Gaussian with zero mean (known instrument bias is usually corrected by removing the bias from the observation during a preprocessing step). Some observations have values restricted to certain ranges; for instance, precipitation must be positive. Redefining the observation variable as the log of the observation can lead to a Gaussian observational error distribution in this case (Tarantola 1987). Given Gaussian observational error distributions, observations can be decomposed into subsets where observational errors for observations in each subset are correlated but observational errors in different subsets are uncorrelated. In other words, 𝗥, the observational error covariance matrix, is block diagonal with each block being the size of the number of observations in the corresponding observation subset. Error distributions for the different subsets are independent, so the subsets can be assimilated sequentially in (2) in an arbitrary order.
For many commonly assimilated observations, each scalar observation has an error distribution that is independent of all others, allowing each scalar observation to be assimilated sequentially (Houtekamer and Mitchell 2001). If the observational covariance matrix is not strictly diagonal, a singular value decomposition (SVD; equivalent to an eigenvalue decomposition for a symmetric positive-definite matrix like 𝗥) can be performed on 𝗥. The prior joint state ensembles can be projected onto the singular vectors and the assimilation can proceed using this new basis, in which 𝗥′, the observational covariance matrix, is diagonal by definition. Upon completion of the assimilation computation, the updated state vectors can be projected back to the original state space. Given the application of this SVD, a mechanism for sequential assimilation of scalar observations implies no loss of generality for observations with arbitrary Gaussian error distributions. In everything that follows, results are presented only for assimilation of a single scalar observation so that m = 1, with joint state space size k = n + m = n + 1. Allowing arbitrary observational error distributions represented as a sum of Gaussians is a straightforward extension to the methods described below.
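A minimal sketch of this rotation, assuming NumPy and a symmetric positive semidefinite 𝗥, is given below; the function name and interface are illustrative only and are not part of the algorithms described here.

```python
import numpy as np

def rotate_to_uncorrelated(y_obs, y_prior_ens, R):
    """Rotate a block of observations with correlated Gaussian errors into a
    basis in which the error covariance is diagonal (an eigendecomposition of
    the symmetric matrix R), so the rotated scalar observations can be
    assimilated sequentially.

    y_obs       : (m,) observed values for this block.
    y_prior_ens : (N, m) prior ensemble sample of the observation vector.
    R           : (m, m) observational error covariance for this block.
    """
    eigvals, V = np.linalg.eigh(R)        # R = V diag(eigvals) V^T
    y_obs_rot = V.T @ y_obs               # rotated observations
    y_ens_rot = y_prior_ens @ V           # rotated prior observation ensemble
    return y_obs_rot, y_ens_rot, eigvals  # eigvals are the rotated (diagonal) error variances
```

After the scalar assimilations are completed in the rotated basis, the updated quantities can be projected back with the transpose of the same rotation.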
c. Two-step data assimilation procedure
Following A01, define the joint state space forward observation operator for a single observation as the order 1 × k linear operator 𝗛 = [0, 0, … , 0, 1]. The expected value of the observation can be calculated by applying 𝗛 to the joint state vector, z, which is equivalent to applying the possibly nonlinear operator, h, to the model state vector, x.
This suggests a partitioning of the assimilation of an observation into two parts. The first determines updated ensemble members for the observation variable y given the observation, yo. To update the ensemble sample of yp, an increment, Δyi = yu,i − yp,i, i = 1, … , N, is computed for each ensemble member, where yu,i denotes the updated value of the ith ensemble sample of the observation variable.
Given increments for the observation variable, the second step computes corresponding increments for each ensemble sample of each state variable, Δxi,j (i indexes the ensemble member and j = 1, … , k indexes which joint state variable throughout this report). This requires assumptions about the prior relationship between the joint state variables. Although reasonable alternatives exist (Tarantola 1987), the assumption used here is that the prior distribution is Gaussian (or a sum of Gaussians that allows generality). This is equivalent to assuming that a least squares fit (local least squares fit) to the prior ensemble members summarizes the relationship between the joint state variables.
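Under this least squares assumption, the second step reduces to a scalar regression of each state variable on the observation variable. The following sketch (NumPy; all names are illustrative) shows one way the resulting increments could be computed from the prior sample statistics; it is a schematic of the regression described here, not a reproduction of any particular published code.

```python
import numpy as np

def regress_increments(x_prior, y_prior, dy):
    """Second step of the two-step update: given increments dy_i for the
    ensemble sample of the (scalar) observation variable, compute increments
    for each state variable by linear regression on the prior sample,
    dx_{i,j} = cov(x_j, y) / var(y) * dy_i.

    x_prior : (N, n) prior state ensemble.
    y_prior : (N,)   prior observation-variable ensemble.
    dy      : (N,)   observation-variable increments from the first step.
    """
    y_anom = y_prior - y_prior.mean()
    x_anom = x_prior - x_prior.mean(axis=0)
    cov_xy = x_anom.T @ y_anom / (len(y_prior) - 1)   # prior cross covariances with y
    var_y = y_anom @ y_anom / (len(y_prior) - 1)      # prior variance of y
    return np.outer(dy, cov_xy / var_y)               # (N, n) array of state increments
```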


The linearization can also be done locally (Fig. 2) by computing local estimates of covariance for each ensemble member. This can be done, for instance, by only using a set of nearest neighbors (in y, in x, or in some combined distance metric) to compute sample covariance. Figure 2 shows an idealized form of nearest neighbor linearization in which only a single closest ensemble member is used to compute the statistical linearization. Related methods for doing local Gaussian kernel approximations of this type can be found in Silverman (1986) and Bengtsson and Nychka (2001). When x is functionally related to y as in Fig. 2, local linearization methods like this can give significantly enhanced performance when
If


When the state variable being updated and the observation variable are not functionally related, the use of local linearizations can be more problematic. Figure 3 shows an example where state variable x1 is being updated by an observation, yo. The expected value of the observation is y =
3. Relation to ensemble Kalman filters










It is easily verified that computing the impact of the observation on each state variable independently in (11), (14), and (15) is equivalent to computing the impact on all state variables at once. This was pointed out, but not rigorously derived, in A01 where assimilations were performed by looking at the impact of an observation on each state variable in turn. More complete discussion of estimation theory as applied in the preceding two subsections can be found in Cohn (1997).
a. Perturbed observation ensemble Kalman filter
In its traditional implementation, the perturbed observation ensemble Kalman filter (Houtekamer and Mitchell 1998) uses a random number generator to sample the observational error distribution (specified as part of the observing system) and adds these samples to the observation, yo, to form an ensemble sample of the observation distribution, yo,i = yo + εi, i = 1, … , N, where the εi are random samples of the observational error distribution.
An equivalent two-step procedure for the EnKF begins by computing the update increments for the observation variable, y, a scalar problem independent of the other joint state variables (4). Perturbed observations are generated and (7) is used to compute an updated variance for the observation variable; all matrices here are order 1 × 1. Equation (8) is evaluated N times to compute
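For concreteness, a scalar sketch of the perturbed observation update follows; it assumes a Gaussian observational error with variance obs_var, uses NumPy, and is meant only to illustrate the first (scalar) step, not to reproduce any particular operational implementation.

```python
import numpy as np

def enkf_scalar_increments(y_prior, y_obs, obs_var, rng=None):
    """Scalar perturbed-observation EnKF update for the observation variable.

    y_prior : (N,) prior ensemble sample of the observation variable.
    y_obs   : observed value (scalar).
    obs_var : observational error variance.
    Returns the increments dy_i = y_{u,i} - y_{p,i} for each ensemble member.
    """
    rng = np.random.default_rng() if rng is None else rng
    prior_var = np.var(y_prior, ddof=1)
    gain = prior_var / (prior_var + obs_var)              # scalar Kalman gain
    y_obs_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=y_prior.shape)
    y_updated = y_prior + gain * (y_obs_pert - y_prior)   # scalar EnKF analysis
    return y_updated - y_prior
```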


b. Ensemble adjustment Kalman filter–square root filter
Bishop et al. (2001), Whitaker and Hamill (2002), and Pham (2001) have all described other ensemble filtering methods similar to the EAKF in A01. Tippett et al. (2003), providing an analysis of the work of these different authors, point out that the methods are roughly equivalent and suggest that the name deterministic square root filter (Andrews 1968) may be more appropriate.




In summary, the EnKF and EAKF assume a Gaussian relation between the variables in the joint state space prior distribution. However, these methods do not use (7) and (8) directly to compute an updated distribution. Both methods can be recast in terms of the two-step assimilation context developed in section 2c. First, update increments are computed for each ensemble sample of the observation variable using scalar versions of the traditional algorithms. Once increments for ensemble samples of the observation variable have been computed, (16) is used to solve for the increments, Δzj,i, for each state variable in turn in terms of Δyi by linear regression. The appendix discusses this method in the case where the prior covariance matrix is degenerate.
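A corresponding scalar sketch of the deterministic (adjustment/square root) update is given below, assuming the updated mean and variance are the usual Gaussian product values; the shift-and-rescale form shown is one common way to realize such an update, and all names are illustrative.

```python
import numpy as np

def eakf_scalar_increments(y_prior, y_obs, obs_var):
    """Scalar deterministic (square root / ensemble adjustment) update for the
    observation variable: shift the ensemble mean to the posterior mean and
    contract the spread so the sample variance matches the posterior variance.
    """
    prior_mean = y_prior.mean()
    prior_var = np.var(y_prior, ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)    # Gaussian product variance
    post_mean = post_var * (prior_mean / prior_var + y_obs / obs_var)
    y_updated = post_mean + np.sqrt(post_var / prior_var) * (y_prior - prior_mean)
    return y_updated - y_prior
```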
There are a number of implications about the computational complexity of ensemble (Kalman) filtering that can be drawn from (16). First, there is no need to compute the prior covariance among the model state variables (only the prior cross covariance of each state variable with the observation variable is needed, along with the variance of the observation variable) or the complete updated covariance, Σu (only the updated variance of the observation variable is needed). Second, once the observational variables are updated, the increments for the state variables depend only on ratios of prior (co)variances. Any multiplication of the prior covariance matrix by some type of covariance inflation factor, as is done in many existing ensemble Kalman filter implementations (Anderson and Anderson 1999; Whitaker and Hamill 2002), does not impact the solution to (16). The impacts of covariance inflation are still felt in the first step in which the increments of the observational variable are computed. Note that many types of covariance inflation directly increase the spread of the joint prior distribution.
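To make this cancellation explicit in sample-statistics notation (rather than the paper's equation numbers): if every prior (co)variance is multiplied by an inflation factor λ, the regression coefficient is unchanged,

$$
\Delta x_{i,j} \;=\; \frac{\lambda\,\widehat{\operatorname{cov}}(x_j, y)}{\lambda\,\widehat{\operatorname{var}}(y)}\,\Delta y_i \;=\; \frac{\widehat{\operatorname{cov}}(x_j, y)}{\widehat{\operatorname{var}}(y)}\,\Delta y_i ,
$$

and only the Δyi themselves, computed in the scalar first step from the inflated prior variance of y, feel the inflation.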
There are important caveats to this discussion of enhanced computational efficiency. First, as noted in section 2b, correlations between observational errors require a rotation of the problem to apply the sequential methods discussed here. In the limit where most of the observational errors are correlated, the cost of doing this rotation could end up offsetting the savings from avoiding matrix multiplies. At present, most operational centers do not assume that there are correlated observational errors, but this could become important at a later date. If only certain sets of observations have correlated errors, a block diagonal covariance structure results and rotation can still be done more efficiently than computing the full matrix products. Second, the sequential method outlined here may present challenges for implementation on highly parallel computer architectures; relative performance on such hardware will require further analysis.
4. Additional methods for updating the observational variable ensembles
In this section, some additional methods for updating the observational variable are discussed. Once update increments for the observation variable are computed by one of the following methods, the rest of the joint state variables can be updated by linear regression using (16).
a. A kernel filter


The product of a prior expressed as a sum of Gaussians and a Gaussian observational distribution is equal to the sum of the products of the individual prior Gaussians and the observational Gaussian. The variance of all Gaussians summed in the product is identical in this case and can be computed by a single scalar application of (7). The means will all be different and can be computed by N scalar applications of (12). In the most naïve application of this method, an updated ensemble can be generated from this continuous representation by randomly sampling the sum of Gaussians as in Anderson and Anderson (1999).
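A sketch of this naive version follows (NumPy; the component weights come from the standard Gaussian product identity rather than from the text, and kernel_var is an assumed common kernel width).

```python
import numpy as np

def kernel_filter_sample(y_prior, y_obs, obs_var, kernel_var, rng=None):
    """Naive kernel filter update for the scalar observation variable: the prior
    is represented as a sum of N Gaussian kernels centered on the ensemble
    members (common variance kernel_var), the product with the Gaussian
    observational distribution is formed analytically, and a new ensemble is
    drawn at random from the resulting Gaussian mixture.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(y_prior)
    post_var = 1.0 / (1.0 / kernel_var + 1.0 / obs_var)    # common variance of every product Gaussian
    post_means = post_var * (y_prior / kernel_var + y_obs / obs_var)
    # Each product Gaussian carries a weight given by the Gaussian product rule.
    weights = np.exp(-0.5 * (y_prior - y_obs) ** 2 / (kernel_var + obs_var))
    weights /= weights.sum()
    comp = rng.choice(n, size=n, p=weights)                # pick mixture components
    return rng.normal(post_means[comp], np.sqrt(post_var)) # random sample of the updated ensemble
```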
This kernel method can be extended in a number of ways by allowing more general kernels. For instance, kernels with different means and different variances can be used following a variety of techniques like the class of nearest-neighbor methods (Silverman 1986; Bengtsson and Nychka 2001). In addition, kernels from the class of “generalized” Gaussians as described in Tarantola (1987) can lead to related kernel algorithms.
b. Quadrature product methods
Update methods that are based directly on "quadrature" solutions to (4) can also be used to find increments for observation variables. One implementation of such a method could begin by computing a continuous approximation to the prior distribution from the ensemble sample; again, kernel methods are an example. Quadrature methods can then be used to divide the real line into a set of intervals over which the product in the numerator of (4) is computed to approximate the updated distribution. An appropriate method can then be used to sample this updated distribution to generate new ensemble members.
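One possible realization of such a quadrature product update is sketched below with a Gaussian kernel prior on a regular grid and a crude resampling of the gridded product; all choices here, including the grid width, the kernel variance, and the resampling, are assumptions for illustration rather than a recommended scheme.

```python
import numpy as np

def quadrature_update_sample(y_prior, y_obs, obs_var, kernel_var, n_grid=2000, rng=None):
    """Quadrature product update for the scalar observation variable: build a
    continuous prior estimate (here a Gaussian kernel density), multiply it by
    the observational likelihood on a grid, normalize, and sample the resulting
    discrete approximation of the updated distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    pad = 5.0 * np.sqrt(kernel_var + obs_var)
    grid = np.linspace(y_prior.min() - pad, y_prior.max() + pad, n_grid)
    prior = np.exp(-0.5 * (grid[:, None] - y_prior) ** 2 / kernel_var).sum(axis=1)
    likelihood = np.exp(-0.5 * (grid - y_obs) ** 2 / obs_var)
    posterior = prior * likelihood             # numerator of (4) evaluated on the grid
    posterior /= posterior.sum()
    return rng.choice(grid, size=len(y_prior), p=posterior)  # crude resampling of the product
```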
c. General requirements for an observation variable update
Several characteristics may be important for algorithms used to update the observation variables. First, low-quality observations should have small impacts on the ensemble. For atmospheric and oceanic models, the prior distributions may be sampling model “attractors” that have a great deal of structure. Allowing low-impact observations to change the ensembles has the potential to destroy valuable information. Pure resampling algorithms would be an example of an undesirable method. In this case, the prior ensemble would be converted to a continuous representation that would then be only subtly modified by a low-information observation. This updated continuous distribution would then be resampled to generate an ensemble, leading to possibly large increments to ensemble members. The ensemble kernel filter as described above suffers from this deficiency and generally produces assimilations with larger ensemble mean error than do the EnKF and EAKF, despite the fact that in many instances it produces more accurate samples of the updated observation variable distribution when the prior is significantly non-Gaussian. Modifications to the kernel filter that limit the impact of low-information observations are required to make this method more generally useful.
For related reasons, it is desirable to limit the size of the increments for observation variables as noted in section 2. Since the regression used to update the state variables is a statistical linearization, it is likely to be an increasingly poor approximation as the increments increase. For instance, the updated mean and covariance of the observation variable for the EnKF would be the same if the pairing between the updated,
In some applications, the performance of the EnKF can be dramatically improved by pairing the updated observational variable ensemble members with the prior members so as to reduce the value of the update increments. The most obvious way to do this is to sort the prior and updated observational variable ensemble members and to associate the nth sorted updated ensemble member with the nth sorted prior ensemble member. Doing this reduces much of the difference between the EnKF and EAKF reported in A01. Doing ordered pairing also significantly improves the performance of kernel filter algorithms (section 4a) by reducing the size of the increments.
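The sorted pairing can be written in a few lines; the sketch below (NumPy; names illustrative) returns the reduced-magnitude increments associated with each prior member.

```python
import numpy as np

def pair_by_sorting(y_prior, y_updated):
    """Associate the nth-sorted updated observation ensemble member with the
    nth-sorted prior member, so that the increments dy_i = y_{u,i} - y_{p,i}
    are small for each prior member.
    """
    order_prior = np.argsort(y_prior)
    paired = np.empty_like(y_updated)
    paired[order_prior] = np.sort(y_updated)   # nth sorted updated goes with nth sorted prior
    return paired - y_prior                    # reduced-magnitude increments
```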
As an idealized example of this sorting procedure, suppose that for a three-member ensemble, the prior observation variable sample
While many existing ensemble filters can be recast in the framework presented here, there are some that apparently cannot. For instance, the particle filter of Pham (2001) requires information about the joint distribution of observation and state variables when doing resampling. Methods like the singular second-order-exact EnKF of Pham (2001), however, can be expressed in the two-step framework.
5. Conclusions
A local least squares framework for ensemble filtering has been derived leading to a two-step ensemble filtering update procedure when a new observation becomes available. The first step is to compute update increments for each ensemble member of a prior estimate of the value of the observation. This can be done using a variety of algorithms including the perturbed observation ensemble Kalman filter and the ensemble adjustment Kalman filter. Other viable update methods, for instance a kernel filter, extend beyond the Kalman filter context and can be referred to more generally as ensemble filters.
The second step computes increments for each ensemble member of the prior estimate for each state variable by using the prior ensemble sample to do a linear regression of each state variable in turn on the observation variable. The increments for a given state variable are computed by multiplying the corresponding observation variable increment by the prior covariance of the state and observation variable and dividing by the prior variance of the observation variable.
Deriving a class of ensemble filters in this two-step context has a number of advantages. First, it is computationally more efficient than previous descriptions of ensemble Kalman filter algorithms in the literature when observations do not have correlated error distributions. The cost is expected to be dominated by the computation of the prior sample cross covariance of the observation and state variables and the variance of the observation variables. A second advantage is that much more elaborate and expensive ensemble update methods can be used because they need be applied only in a scalar fashion to the observation variables. A final advantage is that it is easier to understand differences between various filtering algorithms. Differences only need to be explored in a scalar context making the relative features of, for instance, the EnKF and EAKF much easier to understand.
By lowering the cost of existing filters and opening up a variety of new filter update algorithms, it is hoped that this local least squares framework can accelerate the development of ensemble filtering algorithms that are best suited for applications such as numerical weather prediction and ocean state estimation.
Acknowledgments
The author is grateful to Chris Snyder, Jim Hansen, Jeff Whitaker, Tom Hamill, Joe Tribbia, and Ron Errico for ongoing discussions of ensemble filtering methods and to the reviewers of this manuscript. Special thanks are due to Stephen Anderson whose insight led the author to the methods developed here.
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129 , 2884–2903.
Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127 , 2741–2758.
Andrews, A., 1968: A square root formulation of the Kalman covariance equations. AIAA J., 6 , 1165–1168.
Bengtsson, T., and D. Nychka, 2001: Adaptive methods in numerical weather prediction. Proc., First Spanish Workshop on Spatio–Temporal Modeling of Environmental Processes, Benicassim, Castellon, Spain, Universitat Jaume I, 1–15.
Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129 , 420–436.
Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126 , 1719–1724.
Cohn, S. E., 1997: An introduction to estimation theory. J. Meteor. Soc. Japan, 75 , 257–288.
Courtier, P., J. Derber, R. Errico, J-F. Louis, and T. Vukicevic, 1993: Important literature on the use of adjoint, variational methods and the Kalman filter in meteorology. Tellus, 45A , 342–357.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 , 10143–10162.
Evensen, G., and P. J. van Leeuwen, 1996: Assimilation of Geosat altimeter data for the Agulhas current using the ensemble Kalman filter with a quasigeostrophic model. Mon. Wea. Rev., 124 , 85–96.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126 , 796–811.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129 , 123–137.
Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.
Kalman, R., and R. Bucy, 1961: New results in linear filtering and prediction theory. Trans. ASME J. Basic Eng., 83D , 95–108.
Keppenne, C. L., 2000: Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter. Mon. Wea. Rev., 128 , 1971–1981.
Lermusiaux, P. F., and A. R. Robinson, 1999: Data assimilation via error subspaces statistical estimation. Part I: Theory and schemes. Mon. Wea. Rev., 127 , 1385–1407.
Miller, R. N., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. J. Atmos. Sci., 51 , 1037–1056.
Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea. Rev., 128 , 416–433.
Pham, D. T., 2001: Stochastic methods for sequential data assimilation in strongly nonlinear systems. Mon. Wea. Rev., 129 , 1194–1207.
Silverman, B. W., 1986: Density Estimation for Statistics and Data Analysis. Chapman and Hall, 175 pp.
Tarantola, A., 1987: Inverse Problem Theory. Elsevier, 613 pp.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., in press.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130 , 1913–1924.
APPENDIX
Degenerate Prior Covariance
A potential complication occurs if Σp is degenerate so that its inverse is not defined. There are several reasons why Σp might be degenerate. First, it is possible that the joint state vector is of order greater than the size of the space that it spans. The most obvious instance occurs when an observation is a linear function of the prior model state variables (for instance if a state variable is observed directly). Second, details of the sample statistics of the prior ensemble could lead to degeneracy. For instance, if the ensemble is smaller than the size of the joint state space, Σp computed from the sample statistics must be degenerate. Third, details of the prediction model could lead to states that are confined to some submanifold of the model state space.
The two-step procedures for the EnKF and EAKF continue to be equivalent to the traditional implementations even if Σp is degenerate. In this case, (7)–(9) must be modified to replace the inverses with pseudoinverses. Let 𝗨T be an η × k orthogonal matrix whose rows are the set of left singular vectors of Σp corresponding to nonzero singular values (η is the rank of Σp). Applying 𝗨T to a vector in the original space gives the projection of that vector on the range of Σp. The projection of the covariance on this space is 𝗨TΣp𝗨. The probability density of the prior distribution outside of this subspace is zero, so the updated distribution and hence Δz must lie in this subspace.
Next, define an η × η orthogonal matrix 𝗕T that performs a change of basis in the reduced SVD subspace. Let the η × k matrix 𝗖T = 𝗕T𝗨T be defined so that the last row of 𝗖T is the projection of the observation variable, zk = y, on the range of Σp; the last column of 𝗖T is [0, 0, … , 0, ψ]T. As long as the observation vector does not lie in the null space of Σp, 𝗖T exists, but this must be the case for (2) to have a relevant solution. If the observation vector did lie entirely in the null space, then p(yo|zp) would be 0 with probability 1 and the result of (2) would be a delta function indicating a deterministic solution [and probably an improperly defined problem (Tarantola 1987)].
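A minimal numerical sketch of constructing 𝗨T from the SVD of Σp follows; the tolerance used to decide which singular values are treated as nonzero is an implementation choice and is not part of the text.

```python
import numpy as np

def range_projection(sigma_p, tol=1e-10):
    """Return U^T, the eta x k matrix whose rows are left singular vectors of
    the (symmetric) prior covariance sigma_p associated with singular values
    above the tolerance; applying it to a joint state vector projects that
    vector onto the range of sigma_p.
    """
    u, s, _ = np.linalg.svd(sigma_p)
    keep = s > tol * s.max()                # nonzero singular values define the rank eta
    u_t = u[:, keep].T                      # eta x k projection matrix
    sigma_reduced = u_t @ sigma_p @ u_t.T   # U^T Sigma_p U, covariance in the reduced space
    return u_t, sigma_reduced
```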


This demonstrates that using the two-step procedure in the original space, even if Σp is degenerate, gives results corresponding to those given by the previously documented versions of the EnKF and EAKF.

Fig. 1. An idealized representation showing the relation between update increments for a state variable, x, and an observation variable, y, for a five-member ensemble represented by asterisks. The projection of the ensemble on the x and y axes is represented by plus signs, and the observation, yo, is represented by ×. In this case, y is functionally related to x.

Fig. 2. As in Fig. 1 but showing the application of local least squares fits, in this case using only the nearest neighbor in y, to compute the updates for x given the updates for y. The local updates for the first and fourth ensemble members are shown by the black vectors.

Fig. 3. As in Fig. 1 but for a case in which the state variable being updated is not functionally related to the observation variable, y.