## 1. Introduction

Numerical prediction through state estimation in the geophysical sciences strives for high resolution and longer cycling intervals and/or assimilation windows. High resolution, which presently refers to resolving the mesoscale, provides the ability to resolve those details of the circulation that are important to making a useful forecast, such as convection along frontal boundaries and the internal dynamics within a tropical cyclone. The cycling interval refers to the time between state estimates. Longer cycling intervals can provide computational benefits because the reading and writing of the state vector (possibly many state vectors in the ensemble context), as well as the matrix algebra of the data assimilation algorithm, are executed less often. The assimilation window refers to the window in time over which observations are allowed to affect a state estimate. Longer assimilation windows allow for the inclusion of a greater number of observations in each window. It has also been suggested that, to be consistent with the observed life cycle of baroclinic waves, an assimilation window as long as 5 days is necessary (Fisher et al. 2005; Lorenc and Payne 2007).

Designing state estimation systems for use with high resolution and relatively long cycling intervals and/or assimilation windows will require consideration of a nonnormal probability density function (pdf) describing the likely distribution of states. This is because the effects of nonlinearity appear most strongly as the scale of the meteorological phenomena decreases and the time scale of interest increases (Zhang et al. 2003; Tan et al. 2004; Reynolds and Rosmond 2003). This implies that quantities such as the third moment of the distribution become important and, through the nonlinear model, that the second moment at forecast time becomes dependent on the moments higher than the second at the analysis time (Miller et al. 1994).

Previous work toward state estimation techniques for nonlinear modeling systems has been discussed by Kushner (1967), Jazwinski (1998, chapter 9), Anderson and Anderson (1999), and Julier and Uhlmann (1997, 2004). However, in the geophysical sciences, work toward explicitly accounting for the nonnormal aspects of the distribution of possible states within an ensemble data assimilation framework has emphasized particle filtering [see van Leeuwen (2009) for a comprehensive review]. In particle filtering one estimates the probability that a particular realization is the true state by comparing against observations. Given these probabilities (sometimes referred to as weights) one can, for example, make a state estimate based on the mean of the pdf or randomly sample from the pdf to generate an ensemble consistent with observational and prior uncertainties. While this method is capable of dealing with any arbitrary nonnormal pdf, it does so in a way that does not make transparent the role played by the underlying nonnormal aspects of the pdf. In addition, there appear to be significant obstacles to the application of particle methods to the high-dimensional systems of the geophysical sciences (Snyder et al. 2008). Last, the algorithmic structure is quite different from the explicit linear algebra involving the second moment required of traditional Kalman (1960) filtering. This makes it difficult to understand from the particle filtering perspective how and why the moments higher than the second affect a state estimate.
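The weighting step described above can be sketched in a few lines. This is a minimal scalar illustration assuming a Gaussian observation likelihood; `particle_update` is a name chosen here for illustration and is not an algorithm from the cited works.

```python
import numpy as np

def particle_update(particles, y_obs, obs_var):
    """Weight each particle by its (Gaussian) observation likelihood,
    then normalize so the weights form a discrete pdf."""
    # Unnormalized weights: likelihood of the observation given each particle
    w = np.exp(-0.5 * (particles - y_obs) ** 2 / obs_var)
    w /= w.sum()
    # A posterior-mean state estimate is the weighted average of the particles
    return w, np.sum(w * particles)

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=1000)   # prior ensemble ("particles")
w, x_mean = particle_update(particles, y_obs=0.8, obs_var=0.5)
```

Note that the weights, not any explicit covariance matrix, carry all the information about the nonnormal shape of the posterior, which is exactly why the role of the individual moments is opaque in this framework.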

Another way to obtain nonnormal distributions is through nonlinearity in the observation operators or nonnormality of the observation likelihood. In this case the prior distribution of possible states before observations are considered may be normal; however, the pdf after the observations are considered (the posterior) may be nonnormal. Two examples of ensemble filters aimed at the situation where the observation operator is nonlinear or the observation likelihood is nonnormal (but the prior distribution is normal) are the work of Zupanski (2005) and Fletcher and Zupanski (2006). Both of these works are directly related to variational methods (e.g., Lorenc 1986; Navon et al. 1992; Courtier et al. 1994) through the statement of a cost function, which defines the posterior distribution and implies that the solution technique is a minimization algorithm. The use of a minimization technique is fundamentally different from Kalman filtering: the minimization of a cost function aims at finding the mode of the posterior distribution, while the Kalman filter attempts to find its mean. The mean and the mode of the posterior will typically differ because of the generally nonnormal nature of the posterior. Here we present a theory for both nonnormal prior distributions and observation likelihoods that converges on the mean of the posterior for arbitrary posterior distributions and reduces to the traditional Kalman filter for normal posterior distributions.

The main goal of this work is the derivation of a framework for the basic understanding of the role of the moments higher than the second in ensemble data assimilation. The derivation we present will be such that 1) it is optimal for arbitrarily distributed errors, 2) it is transparent in its use of the moments of the pdf, and 3) it is algorithmically similar to the matrix operations of the traditional Kalman filter. To this end we rederive the state estimation problem starting from Bayes’ rule. We make the standard assumption that the truth is unknowable, and rather than attempting to find it we focus our attention on estimating the mean of the conditional pdf given observations and a previous guess. By expanding the expected forecast error in a polynomial series in powers of the innovation (the difference between today’s observation and our guess of that observation) we explicitly show how the moments of the prior affect the computation of the mean of the posterior. We emphasize that we make no assumptions on the structure of the prior distribution. This formulation of the problem reveals how the properties of the state estimate vary with the innovation and how this is related to the geometry of the mean of the posterior.

In section 2 we derive the theory for state estimation using polynomial expansions in the innovation. In section 3 we show how truncating the expansion yields linear and quadratic estimation equations. In section 4 we apply the theory of sections 2 and 3 to simple scalar systems with normal and nonnormal pdfs and to the Lorenz (1963) attractor. Section 5 closes the manuscript with a summary of the most important results and conclusions, and points out avenues of future research presently being investigated.

## 2. Theory

### a. Posterior distribution

We define **x** to be an *N* vector that is drawn from a distribution whose pdf we label *p_f*(**x**). At the present time we have a *p* vector of observations **y**, such that we may use Bayes’ rule to obtain a density that describes the combined knowledge of the likely distribution of states:

$$p(\mathbf{x}\,|\,\mathbf{y}) = \frac{p(\mathbf{y}\,|\,\mathbf{x})\,p_f(\mathbf{x})}{\int p(\mathbf{y}\,|\,\mathbf{x})\,p_f(\mathbf{x})\,d\mathbf{x}}.$$

The density on the left-hand side is the posterior distribution, whose mean is the object of the estimation problem.
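Bayes’ rule for a scalar state can be checked numerically by discretizing the state on a grid; the normal prior and the observation error variance below are illustrative choices, not values from the text.

```python
import numpy as np

# Discretize a scalar state and apply Bayes' rule on the grid: the posterior
# is the normalized product of the prior and the observation likelihood.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
prior = np.exp(-0.5 * x**2)                    # N(0, 1) prior, unnormalized
y = 1.0                                        # the observation
likelihood = np.exp(-0.5 * (y - x)**2 / 0.5)   # observation error variance 0.5
posterior = prior * likelihood
posterior /= posterior.sum() * dx              # normalize to a pdf
post_mean = np.sum(x * posterior) * dx         # mean of the posterior
```

For this normal/normal case the grid estimate agrees with the analytic posterior mean, y σ²ₓ/(σ²ₓ + σ²ₒ) = 2/3, which is the benchmark the filters of section 3 reproduce.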

### b. Posterior mean

In this section we will make no assumptions on the structure of the prior. The derivation below will assume that the observation likelihood is symmetric but not necessarily normal. This assumption is made only for ease and clarity of presentation; the derivation can easily be completed for arbitrary observation likelihoods. Similarly, we will assume throughout that the observation operator is linear, though this assumption is also easily relaxed and does not change the essential results [e.g., see Houtekamer and Mitchell (2001) for a discussion of linear versus nonlinear observation operators in an ensemble framework].

The function **f**, which maps the innovation to the update of the posterior mean, is assumed smooth and is the object of central interest. One way to understand the structure of **f** is through an expansion in terms of the observation about the prior estimate of that observation (Jazwinski 1998, 340–346):

$$\mathbf{f}(\mathbf{v}) = \mathbf{f}_0 + \mathbf{F}_1\mathbf{v} + \mathbf{F}_2(\mathbf{v}\otimes\mathbf{v}) + \cdots,$$

where the innovation **v** is the difference between the observation and the prior estimate of that observation; the observation operator maps the *N*-dimensional state vector into the *p*-dimensional observation space; and the vector **v** ⊗ **v** of squared innovations has length *p*², such that

$$\mathbf{v}\otimes\mathbf{v} = (\upsilon_1\upsilon_1,\ \upsilon_1\upsilon_2,\ \ldots,\ \upsilon_p\upsilon_p)^{\mathrm T},$$

where the symbol “⊗” refers to the Kronecker product and *υ_i* is the *i*th element of the innovation vector. A similar representation applies to the *p*³ vector of cubed innovations **v** ⊗ **v** ⊗ **v**.
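The Kronecker-product predictors can be formed directly with `numpy.kron`; a small sketch for p = 2 with illustrative innovation values:

```python
import numpy as np

# For a p-vector innovation v, the quadratic predictor is the Kronecker
# product v ⊗ v (length p**2) and the cubic predictor v ⊗ v ⊗ v (length p**3).
v = np.array([1.0, 2.0])   # p = 2 innovation (illustrative values)
v2 = np.kron(v, v)         # [v1*v1, v1*v2, v2*v1, v2*v2]
v3 = np.kron(v2, v)        # length p**3
```

The second index advances fastest, so all ordered products υᵢυⱼ appear, which matters later when matching these predictors to the third- and fourth-moment matrices.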

### c. Determining the coefficients of the expansion

The error **ε** is a random variable with zero mean. Therefore, an equation for the “error” in estimating the true state as the posterior mean may be obtained by subtracting the true state **x** from both sides of (2.7). Because the expected values of **ε** and of the innovation both vanish, taking the expectation of the outer product of this error with itself defines the *expected* posterior error covariance matrix, where the expectation is taken over all possible innovations. This *expected* posterior error covariance matrix may be calculated in block form, with **0**₁ the *p* × *p* zero matrix and **0**₂ the *p* × *p*² zero matrix. The matrices in (2.16a)–(2.16d) are defined from the second, third, and fourth moments of the prior, using the observation operator to map the *N*²-dimensional moment vectors into the *p*²-dimensional predictor space and making copious use of the identity relating products of Kronecker products.

In (2.17a), the gain matrices are obtained by minimizing the *expected* posterior error covariance matrix (2.15) with respect to the matrix of expansion coefficients.

## 3. Truncation of the polynomial expansion

In practice it is impossible to calculate all of the moments required in the matrix of coefficients of the complete expansion; the series must therefore be truncated.

### a. Linear filter

The linear filter truncates the expansion at the term linear in the innovation and neglects the constant term **f**₀ of the expansion; therefore, the estimate of the mean of the posterior for vanishingly small innovations is simply the mean of the prior. As we will show, this is a good assumption for symmetric prior distributions, but not for strongly skewed prior distributions.
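A minimal sketch of the linear (Kalman) update that this truncation recovers, with illustrative matrices rather than values from the text:

```python
import numpy as np

def linear_update(x_f, P, H, R, y):
    """Traditional Kalman (linear-in-the-innovation) update of the prior
    mean x_f; for a symmetric prior and small innovation this is the
    leading term of the polynomial expansion."""
    v = y - H @ x_f                     # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # linear gain
    x_a = x_f + K @ v                   # posterior-mean estimate
    P_a = P - K @ H @ P                 # expected posterior error covariance
    return x_a, P_a

x_f = np.array([1.0, 2.0])
P = np.array([[1.0, 0.5], [0.5, 2.0]])
H = np.array([[1.0, 0.0]])              # observe the first component only
R = np.array([[0.5]])
x_a, P_a = linear_update(x_f, P, H, R, np.array([2.0]))
```

Note that the unobserved second component is also updated, through the off-diagonal element of the prior covariance; the quadratic filter extends exactly this mechanism to the third moment.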

### b. Quadratic filter

The quadratic filter retains the constant term **f**₀. Hence, in the limit of vanishingly small innovation the linear filter (3.2a) fails to include the largest term of the complete expansion (2.5), while the quadratic filter makes an *estimate* of **f**₀.
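One way to realize a quadratic-in-the-innovation estimator from a Monte Carlo sample is ordinary least squares of the state on the predictors [1, v, v²]. The sketch below uses that regression interpretation with an illustrative chi-square prior and observation error variance; it is not the manuscript’s exact gain formulas.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.chisquare(2, size=n)                 # skewed (chi-square) prior draws
y = x + rng.normal(0.0, np.sqrt(0.5), n)     # simulated observations of x
v = y - y.mean()                             # innovation about the prior estimate

# Linear estimator: regress x on [1, v]; the quadratic estimator adds v**2.
A_lin = np.column_stack([np.ones(n), v])
A_quad = np.column_stack([np.ones(n), v, v**2])
c_lin, sse_lin = np.linalg.lstsq(A_lin, x, rcond=None)[:2]
c_quad, sse_quad = np.linalg.lstsq(A_quad, x, rcond=None)[:2]
```

Adding the squared-innovation predictor necessarily lowers the fitted residual variance, and for the skewed prior the quadratic coefficient is significantly nonzero: that coefficient carries the third-moment information the quadratic filter exploits.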

(recall that the *i*th column of the matrix square root of the prior error covariance is **ε***_{fi}*, the *i*th prior ensemble perturbation). Equation (3.8) is the difference between the perturbation of the squared innovation around its mean and the prediction of that perturbed squared innovation by a linear combination of the squared columns of the square root of the prior error covariance.

Here the identity is the *p* × *p* identity matrix. By comparing (3.11) to (3.5) one can see that the lowest-order variation in the analysis error covariance due to drawing a particular innovation is entirely attributable to the third moment of the prior. Equation (3.11) differs from (3.5) even when today’s innovation happens to be equal to zero, because the slope of the linear term is itself affected by the third moment of the prior.

The matrix in (3.14) is composed of *p* × *p* submatrices, each an *elementary matrix* defined with elements *e_{mn}* having the property that *e_{mn}* = 1 when *m* = *i* and *n* = *k* and all other elements are zero. Note that in the first matrix on the right-hand side of (3.14) the second index advances first and all possible elementary matrices are realized. In the second matrix on the right-hand side of (3.14) the *p* × *p* identity matrix appears at each location where *i* = *k*.

## 4. Illustrative examples

### a. Scalar systems

For concreteness we will focus on two different priors: 1) a normal distribution and 2) a chi-square distribution.

These two priors have been carefully chosen to illustrate what we believe are typical characteristics of pdfs in linear and nonlinear systems. The normally distributed prior has been chosen because it is the canonical model for linear systems. The chi-square prior has been chosen as a model for a nonnormal pdf, such as what would be found in a nonlinear, forced-dissipative system [such as Lorenz (1963); see section 4b]. This is because this pdf has the property that some states are forbidden (zero probability of occurrence), consistent with an underlying attracting manifold; consequently, the mean of the posterior is “curved” as a function of the innovation. As we shall see, this leads to the posterior error variance being a nonlinear function of the innovation. We emphasize that the curvature in the posterior mean is an important distinguishing characteristic between linear and nonlinear systems.

Fig. 1 shows the posterior distribution for the two priors mentioned above, as represented through a scatter diagram of random draws from the true distribution as a function of the innovation. One distinguishing characteristic of a normal prior is that the mean of its posterior is a linear function of the innovation (Fig. 1a). Another is that the posterior error variance is constant (Fig. 1b) and hence equal to its expected posterior error variance.
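These two characteristics of a normal prior can be verified with a small Monte Carlo sketch (illustrative variances; the regression interpretation of the estimator is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
sig2_x, sig2_o = 1.0, 0.5
x = rng.normal(0.0, np.sqrt(sig2_x), n)          # normal prior draws
v = x + rng.normal(0.0, np.sqrt(sig2_o), n)      # innovation (prior mean is 0)

# Regress x on [1, v, v**2]: for a normal prior the fitted posterior mean
# should be linear, with slope sig2_x / (sig2_x + sig2_o) and no curvature.
A = np.column_stack([np.ones(n), v, v**2])
c0, c1, c2 = np.linalg.lstsq(A, x, rcond=None)[0]
theory_slope = sig2_x / (sig2_x + sig2_o)
```

The fitted quadratic coefficient is statistically indistinguishable from zero for the normal prior, in contrast to the chi-square case discussed next.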

#### 1) Estimating the posterior mean

Application of the quadratic filter in (3.7a) to the normal prior in Figs. 1a,b results in a state estimate that is identical to that of the linear filter in (3.1a). This is because the third moment of a normal prior vanishes, and with it the quadratic gain term.

In contrast, application of the state estimation techniques of section 3 to the chi-square prior in Figs. 1c,d obtains quite different estimates of the mean and error variance of the posterior (Fig. 2) for a linear versus a quadratic estimate. Note that both the linear and quadratic estimation techniques provide a reliable (albeit different) estimate of the mean of the posterior for small innovation. The largest errors are obtained for large innovation, consistent with what one would expect from the truncation of a polynomial expansion. In other words, the effect of including additional terms in the expansion is to keep the estimate reliable over a larger range of innovations. An obvious way to define the range of innovations for which an estimation technique is reliable is to require its posterior error variance to remain smaller than its prior error variance. In Fig. 2b it can be seen that truncating the polynomial expansion at the linear term is reliable over a range of approximately **v** equal to −3 to 5, while truncating at the quadratic term is reliable over the much larger range of approximately **v** equal to −5 to 9. That the linear estimator is accurate only over a limited range of innovations is clearly one reason why operational forecasting centers find it useful to discard large innovations. Note, however, that large negative innovations in the system defined by the chi-square prior actually result in the mean of the posterior having smaller error variance than the error variance for **v** = **0**. This shows that a technique that can effectively make use of relatively large innovations is potentially much more accurate on those days when large innovations are realized.

A linear estimator obtains an estimate of the function **f**_{0} = 0 and therefore its estimate at **v** = 0 is simply that of the prior mean. In contrast, because the quadratic filter is capable of obtaining a nonzero estimate of the function **f**_{0} this allows the quadratic filter to obtain a smaller error variance even when **v** = 0. The posterior error variance when **v** = 0 is 0.45 for the quadratic filter while the linear filter obtains 0.57. Similarly, the linear filter finds an expected posterior error variance of 0.67 and for the quadratic filter the expected posterior error variance is found to be 0.51. This shows that when weighted toward small but likely innovations, the linear estimator reduces the error variance of the prior by approximately 83%, while the addition of the quadratic term reduces the error variance by another 10%. Because the first two terms of the polynomial series account for 93% of the expected posterior error variance it proves reasonable to truncate at the quadratic term.

#### 2) Sampling from the posterior

So far our focus has been upon estimating the posterior mean. An equally important part of employing ensemble methods is the generation of the ensemble such that it correctly samples the posterior distribution. Here we illustrate the application of the framework described in the previous section to distributions whose posterior error variance is a function of the innovation. Our focus will be on the chi-square prior because, as shown in Figs. 1a,b, the error variance of the normal distribution is not a function of the innovation and therefore application of traditional techniques in the linear and quadratic filters precisely recovers the true posterior distribution. It is the nonnormal aspects of the prior that lead to issues with the generation of a random sample consistent with the posterior distribution.

Upon averaging over the observation errors and the innovation **v**, one obtains an ensemble whose error variance is equal to the expected posterior error variance in (3.2b).

In *linear* filter algorithms the sign of the innovation within the perturbed observation algorithm can self-consistently be represented either way. However, in the quadratic filtering framework careful consideration of the structure of the innovation must be made to maintain the correct signed relationship between the third moment of the innovation and the third moment used in the quadratic gain term. These considerations show that the self-consistent generation of an ensemble in the quadratic filtering framework can be accomplished as in (4.3), where we again draw an observation error from the observation error distribution.

Upon averaging over the observation errors and the innovation **v**, one obtains an ensemble whose error variance is equal to the expected posterior error variance in (3.7b).
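A scalar sketch of the perturbed observations algorithm for the linear filter illustrates the property just described; the variances and the observation are illustrative values, and this is the standard perturbed observations form rather than the manuscript’s exact equations.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ens = 100_000
sig2_x, sig2_o = 1.0, 0.5
x_f = rng.normal(0.0, np.sqrt(sig2_x), n_ens)   # prior ensemble (mean 0)
y = 1.2                                          # a particular observation

# Perturbed observations: each member assimilates y plus an independent
# draw of observation error, using the linear gain K.
K = sig2_x / (sig2_x + sig2_o)
eta = rng.normal(0.0, np.sqrt(sig2_o), n_ens)
x_a = x_f + K * (y + eta - x_f)

# The resulting ensemble variance matches the *expected* posterior error
# variance (1 - K) * sig2_x regardless of the innovation that was drawn.
expected_var = (1.0 - K) * sig2_x
```

The ensemble spread is the same no matter how large the innovation happens to be, which is precisely the limitation discussed next.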

Figures 3a,b show the perturbed observations ensemble for the linear and quadratic estimation techniques. Recall that these figures are intended to be representations of the posterior distribution in Fig. 1c. Note that both methods produce far too many ensemble members that sample the region of zero probability of **x** < 0. Note also that because both methods create an ensemble whose error variance is that of the *expected* posterior error variance, and not the posterior error variance as a function of the innovation, the dispersion of the members around their mean appears quite uniform as a function of innovation. Last, the true posterior distribution in Fig. 1c has a strong third moment for small innovation while the ensemble distributions in Fig. 3 both have a very small third moment as can be seen by the symmetry of the distribution of the ensemble members around their mean. This excessive symmetry or smoothing of the shape of the pdf through ensemble generation by perturbed observation algorithms has been discussed previously by Lawson and Hansen (2004). This excessive symmetry of the ensemble’s estimate of the posterior is particularly unsatisfactory when we consider that the biggest error in our ensemble’s estimate of the third moment occurs where the innovation is small and hence where we might expect an expansion in the innovation to perform well. These issues arise not because of the framework of the expansion but because we required the ensemble to satisfy its *expected* error variance (and expected higher moments) and not its error variance as a function of innovation.

This symmetry of the ensemble’s estimate of the posterior distribution is a serious issue for obtaining an optimal filtering system when nonlinearity is such that the posterior mean contains substantial curvature. As shown by Miller et al. (1994), the moments of the posterior distribution higher than the second, after being propagated by the nonlinear model, will affect the second moment of the subsequent prior. This will then lead to an inaccurate prior that will adversely affect the performance of the linear and quadratic filtering equations in the next data assimilation cycle.

Obviously it would be far better for the posterior ensemble distribution to match its true error variance in Fig. 2b and to preserve its higher moments. At present we have not been able to derive a set of equations that produces an ensemble matching the true posterior error variance and its higher moments as a function of innovation. We have, however, already presented a technique for estimating the true error variance at observation locations (appendix B). In Fig. 4 we compare the true error variance as a function of the innovation to that estimated by the derivative technique of appendix B. Similarly, prior inflation techniques such as Anderson (2007), though generally discussed in the context of sampling error, also compensate for the fact that the error variance of the state estimate as a function of the innovation (Fig. 2b) should typically be larger than the expected error variance, and do so by using information derived from observations.

### b. Lorenz attractor

In this section we will apply the previously developed theory to the update of the posterior mean and distribution of the ensemble. The main goal here is to understand the behavior when observing only a portion of the state vector and updating unobserved variables.

We perform an integration starting from a point (initial condition) off the attractor and integrate long enough (ten units of time) to be assured that we have arrived on the attractor. We take this new state and perturb it 100 000 times with normally distributed random noise with mean 0 and variance 0.01, and subsequently integrate each perturbed state for 1 unit of time. We chose one unit of time because this is sufficiently long that the ensemble with such small initial variance now lies entirely on the attractor, but not so long that the members spread unrealistically far from each other in the context of ensemble data assimilation. Next, we assume that this ensemble is the true prior distribution and randomly observe the variable *z* of members of this prior with observation error variance of 0.1. This procedure determines the posterior distribution as a function of the innovation, which is plotted in Fig. 5. Fig. 5c shows the posterior distribution for the observed variable *z*. Note that the posterior of the observed variable is the most linear (but not exactly linear) as a function of the innovation. This property is easily explained by writing the formula for the innovation using the definition of the observation; the curvature in an unobserved variable such as *x* enters through the “square” of the observed variable, i.e., through the corresponding third moment.
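The ensemble construction described above can be sketched as follows. For speed this sketch uses 500 members instead of 100 000, and a fixed-step RK4 integrator with the standard Lorenz (1963) parameters, which is an assumption since the manuscript does not state its integration scheme.

```python
import numpy as np

def l63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz (1963) tendencies for a state (x, y, z)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4(state, dt, nsteps):
    """Fixed-step fourth-order Runge-Kutta integration."""
    for _ in range(nsteps):
        k1 = l63(state)
        k2 = l63(state + 0.5 * dt * k1)
        k3 = l63(state + 0.5 * dt * k2)
        k4 = l63(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

rng = np.random.default_rng(4)
dt = 0.01
# Spin up onto the attractor from an arbitrary initial condition (10 time units)
base = rk4(np.array([1.0, 1.0, 1.0]), dt, 1000)
# Perturb with noise of variance 0.01, then integrate each member 1 time unit
members = base + rng.normal(0.0, 0.1, size=(500, 3))
ensemble = np.array([rk4(m, dt, 100) for m in members])
# "Observe" z of a randomly chosen member with observation error variance 0.1
truth = ensemble[rng.integers(len(ensemble))]
y_obs = truth[2] + rng.normal(0.0, np.sqrt(0.1))
innovation = y_obs - ensemble[:, 2].mean()
```

Scatterplotting the members’ *x* and *y* against simulated innovations in *z* reproduces the curved posterior structure discussed in the text.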

#### 1) Estimating the posterior mean

Here we will illustrate the differences between the linear and quadratic filtering frameworks in the context of the posterior distributions of the Lorenz model. Figure 5 shows the true mean of the posterior as well as the estimates of that mean from the linear and quadratic filtering frameworks. The curvature of the posterior mean is now clearly related to the underlying attracting manifold of the Lorenz model. Because of this curvature the linear filtering framework has difficulty estimating the posterior mean of the unobserved variables for all but the smallest innovations. This issue with the linear filtering framework can be seen in the plots of the error variance about the linear filtering estimate of the mean in Figs. 5d,e. Similar to the chi-square model of the previous section we can see that the effect of including more terms in the expansion is to make the estimate of the mean of the posterior distribution reliable over a wider range of innovations. In addition, the estimation of **f**_{0} by the quadratic filter makes a clear difference in the estimation of the mean of the posterior for the *x* variable in Fig. 5a. This can be seen in the significant difference in the estimation of the mean for small innovation.

#### 2) Sampling from the posterior

We again use the perturbed observations framework in (4.1) and (4.3) to attempt to sample from the posterior distribution. The result is plotted in Fig. 6. Because the most noteworthy differences can be illustrated with the variable *x* alone, we show only that variable. The estimated posterior distribution for the variable *y* is very similar to that of *x* shown in Fig. 6. In contrast, the observed variable *z* has the best posterior estimate because of its significant linear characteristics (not shown). The dominating characteristic of the unobserved variables is that the structure of the estimated posterior distributions as a function of the innovation is excessively smooth. For both the *x* and *y* variables the inaccurate posterior mean of the linear filter leads to the dispersion of the ensemble members being centered at far too large values of *x* and *y*. For the quadratic filter the curvature of the mean of the distribution keeps the ensemble members appropriately centered, though, similar to the linear filter, the dispersion about the mean is still far too uniform. Note also that because we have insisted that the ensemble have uniform variance as a function of innovation, the relationship between the posterior error variance and the posterior third moment in (B.12) is not appropriately related to the first two moments of the distribution.

In Fig. 7 we show the estimate of the posterior error variance from taking the derivative of the linear and quadratic estimation equations with respect to the innovation. Because we have one observation in this experiment the derivative technique of appendix B obtains an estimate of a single column of the posterior error covariance matrix as a function of the innovation. Recall that in the linear estimation technique this column is equal to the expected value of the analysis error covariance for a linear estimator. As can be seen in Fig. 7 the quadratic framework does a fairly good job of estimating the true posterior error covariance over a wide range of innovations.

## 5. Summary and conclusions

A framework has been constructed through which the effect of the moments of the prior on the mean of the posterior can be clearly understood. This framework relies on the fact that the mean of the posterior can be written as a smooth function of the innovation and does not require any assumptions on the structure of the prior or the observation likelihood. This function was then expanded in a polynomial series in the innovation to relate the structure of this function to the moments of the prior. The result of this procedure was the determination of a “linear” update equation for the mean of the posterior using powers of the innovation.

The traditional Kalman filtering equations have been shown to be consistent with a dominant term in the expansion. The expansion shows that the linear filter is an accurate approximation to the posterior mean when the prior distribution is symmetric, but not necessarily normal, and the innovation is not particularly large. In this case, the expansion is relatively accurate after truncation to the term linear in the innovation, which is of course the traditional Kalman update equation for the posterior mean.

Upon including additional terms in the expansion a filter quadratic in the innovation is derived. This filter is quadratic in the sense that it uses square products of the innovation as additional predictors to increase the accuracy of the estimate of the posterior mean. The quadratic filter was shown to be able to produce state estimates that better follow the curvature of nonnormal posterior distributions when compared against traditional linear filtering techniques. In addition, this new filter is also capable of making a correction to the prior mean in situations where the innovation is identically zero by making use of information derived from the third moment of the prior distribution.

This polynomial expansion framework was applied to simple models of nonnormal pdfs. It was shown that the benefit derived from increasing the number of predictors through powers of the innovation is a state estimate that is reliable over a larger range of innovations. This behavior is most strongly realized in situations where the posterior distribution is highly curved as a function of the innovation, because the curvature in the posterior distribution is directly related to the moments higher than the second. This suggests that using the additional predictors provided by the polynomial expansion should require fewer innovations to be discarded during the assimilation process and hence deliver a better estimate of the posterior mean. On the Lorenz (1963) attractor it was shown that the posterior mean of the unobserved variables had the largest curvature as a function of the innovation, and subsequently the largest improvement from the quadratic innovation technique was found in the unobserved variables. This result has recently been seen in preliminary experiments in a high-dimensional nonlinear model and will be reported on in a later study.

The generation of the ensemble was also explored under the polynomial expansion framework. It was shown that the ensemble generation technique referred to as “perturbed observations” produces a collection of states that satisfy the expected posterior error variance. This turns out to be a significant issue for posterior distributions that are highly curved as a function of the innovation, because these distributions have posterior error variances that can differ substantially from their expected values. It is important to note that this issue with the error variance is not solely a peculiarity of the perturbed observations algorithm; square root filters (e.g., Anderson 2001; Bishop et al. 2001) also require the variance of the ensemble to be that of the expected error variance rather than the error variance as a function of the innovation. This issue with the error variance of the ensemble’s representation of the true posterior distribution has previously been dealt with implicitly using inflation techniques (e.g., Anderson 2007). We have shown here that an alternate and more direct technique can also be used, one that makes use of the fact that the true posterior error variance is related to the derivative of the state estimate with respect to the innovation.

Similarly, the estimate of the posterior distribution during ensemble generation through perturbed observations was shown to result in an overly symmetric distribution and hence the wrong third moment. This implies that cycling over short intervals may impose a significant amount of symmetry upon the ensemble’s estimate of the distribution and lead to a subsequently poor estimate of the third moment. While longer cycling intervals may allow asymmetric features to develop in the prior through nonlinear processes, they also increase the error variance of the prior, leading to the possibility of filter divergence.

The development of a localization strategy and the determination of the appropriate ensemble size are likely to be the most significant issues with the application of the quadratic filter. Preliminary results with a high-dimensional nonlinear model suggest that an ensemble size of at least 100 is necessary to properly estimate the third and fourth moment matrices. The exploration of approximations for computational expediency and the examination of the effects of sampling error, localization, and inflation in the framework of the polynomial expansion in a high-dimensional system are presently being examined and will be reported on in a later study.

## Acknowledgments

Useful and encouraging conversations on this subject with Craig Bishop, Jim Hansen, Justin Mclay, Jon Moskaitis, and Chris Snyder are greatly appreciated. I gratefully acknowledge support from the Naval Research Laboratory Base Program through Grant BE-4287-0-0-5.

## APPENDIX A

### Normally Distributed Powers of the Error

We show here that the linear gain matrix representation in (2.18) is consistent with the solution of the state estimation problem given normally distributed powers of the error.

where *N_p* and *N_υ* are normalization constants. The extended state vector is composed of the state and its successive powers; the first *N* elements of the extended state vector are the elements of the state itself.

An example of the “prior” (A.1) is plotted in Fig. A1 for the truncated two-dimensional space *x*_{1} and *x*_{2} for the chi-square prior distribution of section 4a. Note that this “pdf” is not intended to represent the probability of a particular state within the chi-square distribution but rather its moments.

where *N* is simply a normalization constant. The first *N*_{a} rows of (A.11) are identical to (2.18). Hence, we have now shown that the mean of a nonnormal posterior is related to an underlying normal distribution determined by powers of the forecast error. In addition, this reveals the proper variational solution for the quadratic filter to be simply the minimum of (A.6) with the expansion truncated at second order. This formulation of the problem was verified to precisely reproduce the quadratic estimate of the mean of the distributions of section 4a. Note that this formulation of the variational solution allows the state estimation problem for nonnormal priors to be solved with standard minimization techniques.
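A scalar sketch of this variational route is given below. Since the truncated cost (A.6) is not reproduced here, the exact negative log posterior for a chi-square(3) prior (as in section 4a) with a Gaussian observation is used as a stand-in; the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Chi-square(k) prior, Gaussian observation y = x + eps with eps ~ N(0, R).
# The exact log prior is a stand-in for the truncated polynomial cost (A.6).
k, R, y = 3, 1.0, 4.0

def neg_log_post(x):
    if x <= 0.0:
        return np.inf                  # chi-square support is x > 0
    log_prior = (k / 2 - 1) * np.log(x) - x / 2   # up to an additive constant
    log_like = -0.5 * (y - x) ** 2 / R
    return -(log_prior + log_like)

res = minimize_scalar(neg_log_post, bounds=(1e-6, 50.0), method="bounded")
print(res.x)                           # the variational state estimate
```

The minimizer solves the stationarity condition x² − 3.5x − 0.5 = 0 for these parameter values, showing how a standard one-dimensional minimizer delivers the nonnormal state estimate without any explicit gain computation.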

## APPENDIX B

### Geometry of the Mean

where *N* is a normalization factor that is independent of the innovation. Because of the properties of the derivative of an exponential function, the derivative of (B.6) takes the particularly simple form (B.7). Using (B.7) in (B.3) and (B.5) yields (B.8). Given the derivative in (B.8), we may now evaluate (B.1) to obtain the sought-after expression (B.9). A similar recursive relationship may be obtained for the moments beyond the third by performing the analogous analysis.

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Anderson, J. L., 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. *Tellus*, **59A**, 210–224.

Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. *Mon. Wea. Rev.*, **127**, 2741–2758.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436.

Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background-error covariances: Evaluation in a quasi-operational NWP setting. *Quart. J. Roy. Meteor. Soc.*, **131**, 1013–1043.

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: On the analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724.

Courtier, P., J.-N. Thepaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var, using an incremental approach. *Quart. J. Roy. Meteor. Soc.*, **120**, 1367–1387.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10143–10162.

Fisher, M., M. Leutbecher, and G. A. Kelly, 2005: On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. *Quart. J. Roy. Meteor. Soc.*, **131**, 3235–3246.

Fletcher, S. J., and M. Zupanski, 2006: A data assimilation method for log-normally distributed observational errors. *Quart. J. Roy. Meteor. Soc.*, **132**, 2505–2519.

Golub, G. H., and G. A. Meurant, 2010: *Matrices, Moments and Quadrature with Applications*. Princeton University Press, 363 pp.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Jazwinski, A. H., 1998: *Stochastic Processes and Filtering Theory*. Academic Press, 376 pp.

Julier, S. J., and J. K. Uhlmann, 1997: New extension of the Kalman filter to nonlinear systems. *Signal Processing, Sensor Fusion, and Target Recognition VI*, I. Kadar, Ed., International Society for Optical Engineering (SPIE Proceedings, Vol. 3068), 182–193.

Julier, S. J., and J. K. Uhlmann, 2004: Unscented filtering and nonlinear estimation. *Proc. IEEE*, **92**, 401–422.

Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. *J. Basic Eng.*, **82**, 35–45.

Kalman, R. E., and R. S. Bucy, 1961: New results in linear filtering and prediction theory. *Trans. ASME J. Basic Eng.*, **83D**, 95–107.

Kushner, H. J., 1967: Approximations to optimal nonlinear filters. *IEEE Trans. Automat. Control*, **AC-12**, 546–556.

Lawson, W. G., and J. A. Hansen, 2004: Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth. *Mon. Wea. Rev.*, **132**, 1966–1981.

Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **112**, 1177–1194.

Lorenc, A. C., and T. Payne, 2007: 4D-Var and the butterfly effect: Statistical four-dimensional data assimilation for a wide range of scales. *Quart. J. Roy. Meteor. Soc.*, **133**, 607–614.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141.

Miller, R. N., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. *J. Atmos. Sci.*, **51**, 1037–1056.

Navon, I. M., X. Zhou, J. Derber, and J. Sela, 1992: Variational data assimilation with an adiabatic version of the NMC spectral model. *Mon. Wea. Rev.*, **120**, 1433–1446.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. *Tellus*, **56A**, 415–428.

Reynolds, C. A., and T. E. Rosmond, 2003: Nonlinear growth of singular-vector-based perturbations. *Quart. J. Roy. Meteor. Soc.*, **129**, 3059–3078.

Snyder, C., T. Bengtsson, P. Bickel, and J. Anderson, 2008: Obstacles to high-dimensional particle filtering. *Mon. Wea. Rev.*, **136**, 4629–4640.

Tan, Z., F. Zhang, R. Rotunno, and C. Snyder, 2004: Mesoscale predictability of moist baroclinic waves: Experiments with parameterized convection. *J. Atmos. Sci.*, **61**, 1794–1804.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. *Mon. Wea. Rev.*, **131**, 1485–1490.

van Leeuwen, P. J., 2009: Particle filtering in geophysical systems. *Mon. Wea. Rev.*, **137**, 4089–4114.

Zhang, F., C. Snyder, and R. Rotunno, 2003: Effects of moist convection on mesoscale predictability. *J. Atmos. Sci.*, **60**, 1173–1185.

Zupanski, M., 2005: Maximum likelihood ensemble filter: Theoretical aspects. *Mon. Wea. Rev.*, **133**, 1710–1726.