Square Root and Perturbed Observation Ensemble Generation Techniques in Kalman and Quadratic Ensemble Filtering Algorithms

Daniel Hodyss and William F. Campbell

Naval Research Laboratory, Monterey, California

Abstract

The main goal of this work is to present a new square root ensemble generation technique that is consistent with a recently developed extension of Kalman-based linear regression to nonlinear polynomial regression (i.e., one that includes a quadratically nonlinear term in the mean update equation) and that is applicable to ensemble data assimilation in the geosciences. Along the way the authors present a unification of the theories of square root and perturbed observation (sometimes referred to as stochastic) ensemble generation in data assimilation algorithms configured to perform both linear (Kalman) regression and quadratic nonlinear regression. The performance of the linear and nonlinear regression algorithms with both ensemble generation techniques is explored in the three-variable Lorenz model as well as in a nonlinear model configured to simulate shear layer instabilities.

Corresponding author address: Dr. Daniel Hodyss, Naval Research Laboratory, Marine Meteorology Division, 7 Grace Hopper Ave., Stop 2, Monterey, CA 93943. E-mail: daniel.hodyss@nrlmry.navy.mil


1. Introduction

A key component of ensemble-based data assimilation (DA) is the generation of the ensemble consistent with the distribution of possible true states conditioned on the latest set of observations (commonly referred to as the posterior distribution). The two techniques that attempt to accomplish this task and that are commonly used in the geophysical community are referred to as perturbed observation (or stochastic) ensemble generation (Evensen 1994, 2003; Burgers et al. 1998; Houtekamer and Mitchell 2001, 2005) and square root (or deterministic) forms (Anderson 2001; Bishop et al. 2001). These techniques have been used with great success (e.g., Houtekamer et al. 2005; Houtekamer and Mitchell 2005; Szunyogh et al. 2008; Meng and Zhang 2008; Torn and Hakim 2008; Whitaker et al. 2008; Anderson et al. 2009) in a wide variety of meteorological modeling systems.

There exist three comparisons of these two ensemble generation techniques in the literature that are most relevant to the present work: Lawson and Hansen (2004), Sakov and Oke (2008), and Lei et al. (2010). This body of work clearly describes the fundamental differences between square root and perturbed observation algorithms from the perspective of the ensemble-based Kalman filter (EBKF). The main differences between square root and perturbed observation (stochastic) ensemble generation appear to be that 1) there is less variability in the variance with small ensemble sizes for square root methods and 2) the higher moments appear to be exaggerated by square root methods, especially given their propensity for generating outliers (e.g., Lawson and Hansen 2004; Sakov and Oke 2008; Anderson 2010).

In Hodyss (2011, 2012) a new technique for the estimation of the posterior mean was described that revealed how one could extend the linear regression capability of the EBKF to that of an algorithm that performs nonlinear polynomial regression with a new quadratic nonlinear term (hereafter referred to as quadratic nonlinear regression). This technique has its roots in nonlinear polynomial least squares regression (e.g., Golub and Meurant 2010) with a general introduction being found in Jazwinski (1970, 340–346).

One of the unique features of this technique is its remarkable mathematical similarity to the EBKF, which allows relatively minor changes to an already constructed EBKF algorithm such that it may perform quadratic nonlinear regression for the update of the mean. The update of the mean consistent with quadratic nonlinear regression allows for a significantly more accurate estimate of the posterior mean when the posterior distribution is skewed because Hodyss (2011) showed that a skewed posterior is associated with a curved (nonlinear in the innovation) posterior mean. As an example, Hodyss and Reinecke (2013) showed that prior/posterior distributions associated with the strong phase uncertainty of tropical cyclones exhibit significant skewness and are therefore strongly affected by these issues.

In Hodyss (2011, 2012) ensemble generation was performed with a version of perturbed observations that was consistent with quadratic nonlinear regression. The main motivation for this article is to provide the details of an algorithm that performs square root ensemble generation consistent with data assimilation algorithms based on quadratic nonlinear least squares regression. A secondary motivation of this article is to extend the previous work on the theories of square root and perturbed observation ensemble generation, first by showing mathematically how they are related to each other. Second, and equally importantly, we will extend previous work by illustrating that the fundamental assumption of both square root and perturbed observation ensemble generation techniques, irrespective of whether they are configured to perform linear or quadratic nonlinear regression, is that the posterior error variance is independent of the innovation. This has important ramifications for data assimilation because Hodyss (2011) showed that whenever the posterior is skewed the true posterior error variance is in fact a function of the innovation.

In section 2 we review and extend the theory of square root and perturbed observation algorithms by providing mathematical relationships between them. In section 3 we illustrate the fundamental properties of these two ensemble generation schemes in the Lorenz (1963) model for both the EBKF and quadratic ensemble filter algorithms. Section 4 applies these techniques to nonlinearly evolving shear instabilities. Finally, section 5 closes the manuscript with a recapitulation of the most important results and a discussion of the major conclusions.

2. Square root and perturbed observation ensemble generation

We begin by reviewing the properties of square root and perturbed observation ensemble generation schemes. The presentation that follows only discusses the ensemble generation step of an ensemble-based data assimilation algorithm because, given the same prior ensemble, the proper application of square root or perturbed observation ensemble generation has no effect on the mean update.

a. Ensemble-based Kalman filter form

Square root ensemble generation within an EBKF framework begins with the assumption that the ensemble we wish to generate should satisfy
$$\mathbf{P}^a = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{P}^f, \tag{2.1}$$
where $\mathbf{P}^f$ is the error covariance matrix derived from the prior distribution, $\mathbf{H}$ is the observation operator, $\mathbf{K}$ is the Kalman gain,
$$\mathbf{K} = \mathbf{P}^f\mathbf{H}^T(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R})^{-1}, \tag{2.2}$$
with $\mathbf{R}$ being the observation error covariance matrix and $\mathbf{P}^a$ the analysis error covariance matrix meant to describe the covariance matrix of the posterior distribution. A systematic discussion of (2.1) may be found in chapter 7 of Jazwinski (1970) as well as in Ghil and Malanotte-Rizzoli (1991, 162–163).
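For concreteness, a minimal NumPy sketch of (2.1) and (2.2) follows; the matrix sizes, the synthetic $\mathbf{P}^f$, $\mathbf{H}$, and $\mathbf{R}$, and the variable names are illustrative assumptions, not quantities from the experiments below.

```python
# Minimal sketch of the Kalman gain (2.2) and analysis error covariance (2.1).
# All inputs here are synthetic stand-ins; in practice Pf comes from the prior
# ensemble and H and R from the observing network.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2                                       # state and observation dimensions (arbitrary)
A = rng.standard_normal((n, n))
Pf = A @ A.T / n                                  # synthetic symmetric positive-definite prior covariance
H = rng.standard_normal((p, n))                   # linear observation operator
R = 0.5 * np.eye(p)                               # observation error covariance

K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)    # Kalman gain, (2.2)
Pa = (np.eye(n) - K @ H) @ Pf                     # analysis error covariance, (2.1)

# The analysis cannot be more uncertain than the prior in total variance.
assert np.trace(Pa) <= np.trace(Pf) + 1e-12
```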

Choosing to approximate random samples from the posterior distribution by insisting that those samples satisfy (2.1) makes at least three important assumptions. The first assumption is related to the fact that the moments higher than the second determined from a specific square root representation (e.g., Anderson 2001; Bishop et al. 2001) differ between representations because (2.1) only aims to constrain the second moments; this likely provides some explanation as to why different square root schemes obtain different performance levels in various applications. This difference in their higher moments can be understood by noting that all such representations are identically constrained to satisfy (2.1) and, therefore, the only difference between them is a rotation of the resulting ensemble members, which can only result in differences in the higher moments. This aspect of the problem will not be pursued here. The second assumption made in generating an ensemble using (2.1) arises from the fact that we implicitly assume that we accurately know the required forecast and observation error covariance matrices (Houtekamer and Mitchell 2005, p. 3273). This assumption is never satisfied in practice as these objects are simply unknown and/or difficult to estimate with a limited number of ensemble members.

The third assumption made by insisting that random samples satisfy (2.1) was discussed by Hodyss (2011) and refers to the way in which (2.1) was derived. One of the steps in deriving (2.1) is to integrate the posterior error covariance matrix over all possible observations (or, equivalently, over the innovations):
$$\bar{\mathbf{P}}^a = \int \mathbf{P}^a(\mathbf{v})\,p(\mathbf{v})\,d\mathbf{v}, \tag{2.3a}$$
where $\mathbf{v}$ is the innovation vector and $\mathbf{P}^a(\mathbf{v})$ is the posterior error covariance matrix:
$$\mathbf{P}^a(\mathbf{v}) = \int (\mathbf{x} - \mathbf{x}^a)(\mathbf{x} - \mathbf{x}^a)^T\,p(\mathbf{x}\,|\,\mathbf{v})\,d\mathbf{x}. \tag{2.3b}$$
Here $p(\mathbf{v})$ is the pdf that describes the distribution of innovations, $p(\mathbf{x}\,|\,\mathbf{v})$ is the pdf that describes the distribution of possible true states given an innovation $\mathbf{v}$ (i.e., the posterior distribution), and $\mathbf{x}^a$ is the state estimate (commonly referred to as an "analysis") whose error variance is being measured. The nontraditional overbar in (2.3a) is to make clear that this calculation determines the mean (or expected) posterior error covariance matrix, which is only equal to the true posterior error covariance matrix in (2.3b) when the posterior has no skewness. This result was originally demonstrated by Hodyss (2011), wherein it was shown that the posterior error covariance matrix is a function of the innovation whenever the posterior third moment is nonzero:
$$\mathrm{vec}[\mathbf{P}^a(\mathbf{v})] = \mathrm{vec}(\bar{\mathbf{P}}^a) + \boldsymbol{\Gamma}\,(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R})^{-1}\mathbf{v}, \tag{2.3c}$$
where $\boldsymbol{\Gamma}$ is the posterior third-moment matrix and $\mathrm{vec}[\mathbf{P}^a(\mathbf{v})]$ is a vector in which the columns of the posterior error covariance matrix have been concatenated into a single column vector [see Hodyss (2011) for further details]. Equation (2.3c) is only valid under the assumption that the various moments required by the algorithm were accurately specified (i.e., assumption two above). If they are not accurately specified then behavior different from that described by (2.3c) may be obtained.

Because we wish to generate an ensemble whose error variance is consistent with the expected squared error being made by our data assimilation system given the specific observations we obtained from the present analysis cycle, the ensemble should satisfy the stronger constraint (2.3b) and not (2.3a). It is important to keep in mind that even if the model is perfect and the ensemble has an infinite number of members, (2.1) will still be independent of the innovation and therefore incorrect when the posterior distribution has a nonzero third moment. The fact that (2.1) is, in the sense described above, inconsistent with the true posterior error variance (and higher moments) associated with the latest set of observations will be illustrated below, but the new algorithms listed in the next section will not address this issue.

Given these three caveats we proceed with illustrating the square root ensemble generation scheme of Bishop et al. (2001). Within an EBKF framework we always have available a square root of the prior such that $\mathbf{P}^f = \mathbf{Z}^f(\mathbf{Z}^f)^T$, where the square root of the prior covariance is constructed by employing each ensemble perturbation, $\mathbf{x}_k - \bar{\mathbf{x}}^f$, as a column of $\mathbf{Z}^f$:
$$\mathbf{Z}^f = \frac{1}{\sqrt{K-1}}\left[\mathbf{x}_1 - \bar{\mathbf{x}}^f,\; \mathbf{x}_2 - \bar{\mathbf{x}}^f,\; \ldots,\; \mathbf{x}_K - \bar{\mathbf{x}}^f\right], \tag{2.4}$$
with $\bar{\mathbf{x}}^f$ representing the prior mean and K the number of ensemble members. First, we will define a basis of eigenvectors from
$$(\mathbf{Z}^f)^T\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}\mathbf{Z}^f = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^T, \tag{2.5a}$$
where the columns of $\mathbf{E}$ represent a basis of eigenvectors and $\boldsymbol{\Lambda}$ is a diagonal matrix of corresponding eigenvalues. Note that the eigenvectors also serve as the right singular vectors of $\mathbf{R}^{-1/2}\mathbf{H}\mathbf{Z}^f$, such that
$$\mathbf{R}^{-1/2}\mathbf{H}\mathbf{Z}^f = \mathbf{U}\boldsymbol{\Lambda}^{1/2}\mathbf{E}^T, \tag{2.5b}$$
where the columns of $\mathbf{U}$ are the left singular vectors. Given (2.5a) and (2.5b) we may rewrite (2.1) as
$$\mathbf{P}^a = \mathbf{Z}^f\mathbf{T}(\mathbf{Z}^f\mathbf{T})^T, \tag{2.6}$$
where the "transformation matrix" is
$$\mathbf{T} = \mathbf{E}(\boldsymbol{\Lambda} + \mathbf{I})^{-1/2}\mathbf{E}^T, \tag{2.7}$$
whose trailing $\mathbf{E}^T$ includes the mean-preserving rotation of Wang et al. (2004). Therefore, an ensemble of perturbations consistent with the analysis error covariance matrix in (2.1) may be made by defining those perturbations as
$$\mathbf{Z}^a = \mathbf{Z}^f\mathbf{T}, \tag{2.8}$$
where $\mathbf{Z}^a$ is a square root of the analysis error covariance matrix derived from a square root ensemble generation technique. It is important to remember that this square root is nonunique, as any orthogonal matrix may be right-multiplied against (2.8) without invoking a change in (2.6). In fact, by comparing this construction with a perturbed observation ensemble generation algorithm we may answer the following question: what rotation (or transformation) matrix is required to generate a perturbed observation ensemble using a square root form?
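Before answering, a minimal sketch of (2.4)–(2.8) under the reconstruction above may be useful; it builds the transformation matrix from the eigenpairs of the K × K matrix in (2.5a) and checks that the resulting perturbations satisfy (2.1). The helper names are ours, not the paper's.

```python
# Sketch of square root (ETKF-style) ensemble generation, (2.4)-(2.8), for a
# linear observation operator.
import numpy as np

def ensemble_sqrt(X):
    """Columns of Zf are ensemble perturbations scaled by 1/sqrt(K-1), as in (2.4)."""
    K = X.shape[1]
    return (X - X.mean(axis=1, keepdims=True)) / np.sqrt(K - 1)

def sqrt_perturbations(Zf, H, R):
    """Return Za = Zf T with T = E (Lambda + I)^(-1/2) E^T, as in (2.6)-(2.8)."""
    S = Zf.T @ H.T @ np.linalg.inv(R) @ H @ Zf        # K x K matrix of (2.5a)
    lam, E = np.linalg.eigh(S)
    lam = np.clip(lam, 0.0, None)                     # guard tiny negative round-off
    T = E @ np.diag(1.0 / np.sqrt(lam + 1.0)) @ E.T   # transformation matrix (2.7)
    return Zf @ T

rng = np.random.default_rng(1)
n, p, K = 6, 3, 20
X = rng.standard_normal((n, K))                       # synthetic prior ensemble
H = rng.standard_normal((p, n))
R = np.eye(p)

Zf = ensemble_sqrt(X)
Za = sqrt_perturbations(Zf, H, R)

# Check: Za Za' equals (I - K H) Pf to round-off, i.e., the constraint (2.1).
Pf = Zf @ Zf.T
Kg = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
assert np.allclose(Za @ Za.T, (np.eye(n) - Kg @ H) @ Pf)
```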
To answer this question we first write a perturbed observation ensemble generation technique in matrix form:
$$\mathbf{Z}^a_{po} = \mathbf{Z}^f - \mathbf{K}\mathbf{V}, \tag{2.9}$$
where $\mathbf{V} = \mathbf{H}\mathbf{Z}^f + \mathbf{R}^{1/2}\boldsymbol{\Theta}$ and $\boldsymbol{\Theta}$ is a p × K matrix whose elements are random draws from a normal distribution with mean 0 and variance 1/K [N(0, 1/K)]. First, we would like to point out that, consistent with the discussion in Hodyss (2011), we employ a sign change in our definition of the "perturbed innovation" in (2.9). Please see Hodyss (2011) for further discussion of this issue. Second, it is important to realize that in the limit as K → ∞ the multiplication of (2.9) by its transpose recovers (2.1), and therefore the perturbed observation algorithm also satisfies (2.3a) and not (2.3b). Hence, neither the square root nor the perturbed observation algorithm produces a covariance matrix that is consistent with the present set of observations [(2.3b)].
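A sketch of (2.9) under the reconstruction above follows; it draws the N(0, 1/K) matrix, forms the perturbed innovations with the sign convention just discussed, and confirms numerically that the ensemble covariance approaches (2.1) for large K.

```python
# Sketch of perturbed observation generation in matrix form, (2.9):
# Za = Zf - K (H Zf + R^(1/2) Theta), Theta ~ N(0, 1/K) elementwise.
import numpy as np

rng = np.random.default_rng(2)
n, p, K = 4, 2, 100_000                        # large K to expose the limiting behavior
X = rng.standard_normal((n, K))
Zf = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(K - 1)
H = rng.standard_normal((p, n))
R = np.eye(p)

Pf = Zf @ Zf.T
Kg = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)

Theta = rng.standard_normal((p, K)) / np.sqrt(K)      # N(0, 1/K) draws
V = H @ Zf + np.linalg.cholesky(R) @ Theta            # perturbed innovations
Za = Zf - Kg @ V                                      # perturbed observation ensemble, (2.9)

# As K grows, Za Za' converges to the (2.1) covariance (error ~ 1/sqrt(K)).
Pa = (np.eye(n) - Kg @ H) @ Pf
print(np.max(np.abs(Za @ Za.T - Pa)))
```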
It will prove useful to write $\boldsymbol{\Theta}$ in terms of the singular value decomposition of (2.5a) and (2.5b) as $\boldsymbol{\Theta} = \mathbf{U}\boldsymbol{\Phi}$, with $\boldsymbol{\Phi}$ a K × K matrix whose elements are random draws from N(0, 1/K). Using this construction we find the transformation matrix that rotates the prior ensemble perturbations into the perturbed observation ensemble:
$$\mathbf{Z}^a_{po} = \mathbf{Z}^f\mathbf{T}_{po}, \tag{2.10}$$
where
$$\mathbf{T}_{po} = \mathbf{I} - \mathbf{E}\boldsymbol{\Lambda}(\boldsymbol{\Lambda} + \mathbf{I})^{-1}\mathbf{E}^T - \mathbf{E}\boldsymbol{\Lambda}^{1/2}(\boldsymbol{\Lambda} + \mathbf{I})^{-1}\boldsymbol{\Phi}. \tag{2.11}$$
Given $\mathbf{T}_{po}$ and $\mathbf{T}$ we may now find the rotation matrix that relates them:
$$\mathbf{Z}^f\mathbf{T}_{po} = \mathbf{Z}^f\mathbf{T}\mathbf{C}, \tag{2.12}$$
where $\mathbf{C}$ is the sought-after rotation matrix, which has the following form:
$$\mathbf{C} = \mathbf{E}(\boldsymbol{\Lambda} + \mathbf{I})^{-1/2}\mathbf{E}^T - \mathbf{E}\boldsymbol{\Lambda}^{1/2}(\boldsymbol{\Lambda} + \mathbf{I})^{-1/2}\boldsymbol{\Phi}. \tag{2.13}$$
Hence, we have now shown how one can generate a perturbed observation ensemble (2.9) using a square root form. Recall that in the limit as K → ∞ both the perturbed observation and the square root generation techniques satisfy (2.1) identically, and therefore it is the higher moments that are being changed by this rotation. This difference between the higher moments will be shown explicitly in section 3. Equations (2.11) and (2.13) may be useful in the context of testing perturbed observation ensemble generation in an already constructed data assimilation system based on a square root form, as the calculation of (2.11) rather than (2.7) is trivial.
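The reconstructed transformation (2.10)–(2.11) can be exercised directly: the sketch below builds T_po in ensemble space from the eigenpairs of (2.5a) and a K × K matrix of N(0, 1/K) draws, and verifies that the covariance of Zf T_po approaches (2.1).

```python
# Sketch of perturbed observation generation performed entirely in ensemble
# space via the transformation (2.10)-(2.11) reconstructed above.
import numpy as np

rng = np.random.default_rng(3)
n, p, K = 4, 2, 256
X = rng.standard_normal((n, K))
Zf = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(K - 1)
H = rng.standard_normal((p, n))
R = np.eye(p)

S = Zf.T @ H.T @ np.linalg.inv(R) @ H @ Zf            # K x K matrix of (2.5a)
lam, E = np.linalg.eigh(S)
lam = np.clip(lam, 0.0, None)

Phi = rng.standard_normal((K, K)) / np.sqrt(K)        # N(0, 1/K) draws
T_po = (np.eye(K)
        - E @ np.diag(lam / (lam + 1.0)) @ E.T        # deterministic part of (2.11)
        - E @ np.diag(np.sqrt(lam) / (lam + 1.0)) @ Phi)  # stochastic part of (2.11)

Za = Zf @ T_po                                        # perturbed observation ensemble, (2.10)

# The covariance matches (2.1) up to sampling noise of order 1/sqrt(K).
Pf = Zf @ Zf.T
Kg = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
print(np.max(np.abs(Za @ Za.T - (np.eye(n) - Kg @ H) @ Pf)))
```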
We may now also answer the converse question: what nonrandom matrix is required to generate the square root ensemble (2.8) within a perturbed observation framework (2.9)? To answer this question we again set (2.8) and (2.9) equal to each other:
$$\mathbf{Z}^f\mathbf{T} = \mathbf{Z}^f - \mathbf{K}\mathbf{V}_d, \tag{2.14}$$
but this time we solve for the as yet unknown matrix $\mathbf{V}_d$ to obtain the following:
$$\mathbf{V}_d = \mathbf{R}^{1/2}\mathbf{U}\boldsymbol{\Lambda}^{-1/2}(\boldsymbol{\Lambda} + \mathbf{I})\left[\mathbf{I} - (\boldsymbol{\Lambda} + \mathbf{I})^{-1/2}\right]\mathbf{E}^T. \tag{2.15}$$
Because of the factor $\boldsymbol{\Lambda}^{-1/2}$ in (2.15) we are implicitly assuming that there are no zero eigenvalues in (2.5a). In practice one does sometimes find very small eigenvalues in $\boldsymbol{\Lambda}$, and in this case one would remove those eigenvalues and corresponding eigenvectors from the calculation of (2.15). In any case, (2.15) allows the generation of a "perturbed observation" ensemble without random noise by using the formula
$$\mathbf{Z}^a = \mathbf{Z}^f - \mathbf{K}\mathbf{V}_d, \tag{2.16}$$
where the columns of $\mathbf{V}_d$ are perturbed innovations of the following form:
$$\mathbf{v}_k = \mathbf{R}^{1/2}\mathbf{U}\boldsymbol{\Lambda}^{-1/2}(\boldsymbol{\Lambda} + \mathbf{I})\left[\mathbf{I} - (\boldsymbol{\Lambda} + \mathbf{I})^{-1/2}\right]\boldsymbol{\epsilon}_k, \tag{2.17}$$
with $\boldsymbol{\epsilon}_k$ the kth column of $\mathbf{E}^T$. Equations (2.16) and (2.17) may prove useful for the generation of an ensemble at a small ensemble size, where random noise has sometimes been found to be detrimental in a perturbed observation framework. Another benefit of generating a square root ensemble using (2.16) is that one may employ localization in the calculation of the Kalman gain $\mathbf{K}$, and hence some accounting for spurious correlations may be made.
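Rather than evaluating the closed form (2.15) directly, the sketch below solves (2.14) for the noise-free innovation matrix numerically with a least squares solve, which also sidesteps the zero-eigenvalue caveat just mentioned; the construction follows our reconstruction of these equations.

```python
# Sketch of the noise-free "perturbed innovation" construction (2.14)-(2.16):
# find Vd such that Zf - K Vd reproduces the square root ensemble Zf T.
import numpy as np

rng = np.random.default_rng(4)
n, p, K = 6, 3, 20
X = rng.standard_normal((n, K))
Zf = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(K - 1)
H = rng.standard_normal((p, n))
R = np.eye(p)

S = Zf.T @ H.T @ np.linalg.inv(R) @ H @ Zf
lam, E = np.linalg.eigh(S)
T = E @ np.diag(1.0 / np.sqrt(np.clip(lam, 0.0, None) + 1.0)) @ E.T  # (2.7)

Pf = Zf @ Zf.T
Kg = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)

# Solve K Vd = Zf (I - T), the content of (2.14)-(2.15), in a least squares
# sense; the system is consistent, so the residual is round-off only.
Vd = np.linalg.lstsq(Kg, Zf @ (np.eye(K) - T), rcond=None)[0]
assert np.allclose(Zf - Kg @ Vd, Zf @ T)
```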

b. Quadratic ensemble filter form

Generating an estimate of random draws from the posterior distribution within a quadratic ensemble filter (Hodyss 2012) framework is remarkably similar to that for the EBKF. Square root ensemble generation within a quadratic ensemble filter framework begins with the assumption that the ensemble we wish to generate should satisfy
$$\hat{\mathbf{P}}^a = (\mathbf{I} - \hat{\mathbf{K}}\hat{\mathbf{H}})\hat{\mathbf{P}}^f, \tag{2.18}$$
where $\hat{\mathbf{P}}^f$ is an extended error "covariance" matrix derived from the prior distribution that includes information from moments higher than the second. The "hat" notation on the matrix components of (2.18), as well as on the matrices defined below, is being used to differentiate these matrices from those of the previous section, as the matrices here have been extended using the quadratic state-space approach of Hodyss (2012) such that they now include moments higher than the second. Equation (2.18) was first derived in Hodyss (2011) and is the solution to (2.3a) when $\mathbf{P}^a(\mathbf{v})$ [(2.3b)] is calculated using the state estimate from the quadratic ensemble filter.
The square root of the prior is simply created by concatenating the K ensemble perturbations with the squares of the ensemble perturbations [see Hodyss (2012) for further discussion]:
$$\hat{\mathbf{Z}}^f = \frac{1}{\sqrt{K-1}}\begin{bmatrix} \boldsymbol{\varepsilon}_1 & \cdots & \boldsymbol{\varepsilon}_K \\ \boldsymbol{\varepsilon}_1\circ\boldsymbol{\varepsilon}_1 - \boldsymbol{\sigma}^2 & \cdots & \boldsymbol{\varepsilon}_K\circ\boldsymbol{\varepsilon}_K - \boldsymbol{\sigma}^2 \end{bmatrix}, \tag{2.19}$$
where $\boldsymbol{\varepsilon}_k = \mathbf{x}_k - \bar{\mathbf{x}}^f$, the "$\circ$" symbol represents the Schur–Hadamard (elementwise) product of two vectors, and $\boldsymbol{\sigma}^2$ is simply a column vector consisting of the diagonal elements of the prior forecast error covariance matrix. It is important to note that the form of (2.19) is consistent with the data assimilation algorithm described in Hodyss (2012); the form of (2.19) would be different for the algorithm described in Hodyss (2011).
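A sketch of the extended square root (2.19) follows; the 1/√(K−1) scaling and the helper name are our assumptions. The check confirms that the upper-left block of the extended covariance (2.20) reproduces the usual prior covariance.

```python
# Sketch of the extended prior square root (2.19): each column stacks a
# perturbation on its elementwise (Schur-Hadamard) square with the prior
# variance removed.
import numpy as np

def extended_sqrt(X):
    """Build the extended square root of (2.19) from an n x K ensemble X."""
    n, K = X.shape
    eps = X - X.mean(axis=1, keepdims=True)                   # raw perturbations
    var = (eps * eps).sum(axis=1, keepdims=True) / (K - 1)    # diag of Pf
    return np.vstack([eps, eps * eps - var]) / np.sqrt(K - 1)

rng = np.random.default_rng(5)
X = np.exp(rng.standard_normal((5, 30)))      # a deliberately skewed (lognormal) ensemble
Zhat = extended_sqrt(X)

# The upper-left block of Zhat Zhat' is the usual Pf; the off-diagonal blocks
# carry third-moment information and the lower-right block fourth-moment
# information, as in (2.20).
Phat = Zhat @ Zhat.T
assert np.allclose(Phat[:5, :5], np.cov(X))
```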
The square root in (2.19) creates an extended covariance matrix that includes information from the third and fourth moments:
$$\hat{\mathbf{P}}^f = \hat{\mathbf{Z}}^f(\hat{\mathbf{Z}}^f)^T = \begin{bmatrix} \mathbf{P}^f & \mathbf{M}_3 \\ \mathbf{M}_3^T & \mathbf{M}_4 \end{bmatrix}, \tag{2.20}$$
where the blocks $\mathbf{M}_3$ and $\mathbf{M}_4$ of (2.20) include information from the third- and fourth-moment matrices of the prior distribution, respectively. The observation operator in the extended state space takes the particularly simple form
$$\hat{\mathbf{H}} = \begin{bmatrix} \mathbf{H} & \mathbf{0} \\ \mathbf{0} & \mathbf{H} \end{bmatrix}, \tag{2.21}$$
while the extended gain matrix takes the following form:
$$\hat{\mathbf{K}} = \hat{\mathbf{P}}^f\hat{\mathbf{H}}^T(\hat{\mathbf{H}}\hat{\mathbf{P}}^f\hat{\mathbf{H}}^T + \hat{\mathbf{R}})^{-1}, \tag{2.22}$$
and the extended observation error covariance matrix is $\hat{\mathbf{R}}$ (2.23), which describes the error covariance of the extended innovation vector defined below in (2.29).
As in Hodyss (2012) we write the observation operator as if it were linear only for ease of presentation. Including the nonlinear observation operator in this framework is trivial and is described in appendix A of Hodyss (2012).
Given (2.19) through (2.23) we begin as before and define an eigenvector basis as
$$(\hat{\mathbf{Z}}^f)^T\hat{\mathbf{H}}^T\hat{\mathbf{R}}^{-1}\hat{\mathbf{H}}\hat{\mathbf{Z}}^f = \hat{\mathbf{E}}\hat{\boldsymbol{\Lambda}}\hat{\mathbf{E}}^T. \tag{2.24}$$
Note that the matrix in (2.24) is K × K. Therefore, the cost of the eigenvalue decomposition is the same as that of section 2a. There is, however, an additional cost in the computation of the matrix in (2.24): it is a factor of 2 more expensive to compute than (2.5a). Because the cost of this step is minor compared with the other costs of the data assimilation algorithm, this additional runtime is negligible. While the number of eigenvectors and eigenvalues of (2.24) is the same as for (2.5a), their structure is different owing to the inclusion of information from moments higher than the second.
Given (2.24) we may rewrite (2.18) as
$$\hat{\mathbf{P}}^a = \hat{\mathbf{Z}}^f\hat{\mathbf{T}}(\hat{\mathbf{Z}}^f\hat{\mathbf{T}})^T, \tag{2.25}$$
where this new "transformation matrix" is
$$\hat{\mathbf{T}} = \hat{\mathbf{E}}(\hat{\boldsymbol{\Lambda}} + \mathbf{I})^{-1/2}\hat{\mathbf{E}}^T. \tag{2.26}$$
This allows a new estimate of random draws from the posterior distribution whose error variance is consistent with the quadratic ensemble filter and is defined as $\hat{\mathbf{Z}}^a = \hat{\mathbf{Z}}^f\hat{\mathbf{T}}$, where $\hat{\mathbf{Z}}^a$ is a square root of the analysis error covariance (second moments only) matrix consistent with the quadratic ensemble filter, which is again nonunique.
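The extended transform (2.24)–(2.26) then mirrors section 2a, as the sketch below illustrates. The extended observation error covariance used here is not taken from the paper: purely for illustration we assume a block-diagonal form with $\mathbf{R}$ and the Gaussian fourth-moment rule $2\,\mathbf{R}\circ\mathbf{R}$ in the second block.

```python
# Sketch of the extended square root transform (2.24)-(2.26). Rhat below is an
# illustrative assumption (block diagonal, Gaussian fourth-moment rule), not
# the paper's (2.23).
import numpy as np

rng = np.random.default_rng(6)
n, p, K = 4, 2, 30
X = np.exp(rng.standard_normal((n, K)))              # skewed prior ensemble
eps = X - X.mean(axis=1, keepdims=True)
var = (eps * eps).sum(axis=1, keepdims=True) / (K - 1)
Zhat = np.vstack([eps, eps * eps - var]) / np.sqrt(K - 1)    # extended square root, (2.19)

H = np.zeros((p, n)); H[0, 1] = 1.0; H[1, 3] = 1.0   # a selection-type observation operator
Hhat = np.block([[H, np.zeros((p, n))],
                 [np.zeros((p, n)), H]])             # extended observation operator, (2.21)
R = np.eye(p)
Rhat = np.block([[R, np.zeros((p, p))],
                 [np.zeros((p, p)), 2.0 * R * R]])   # assumed extended R (illustrative)

Shat = Zhat.T @ Hhat.T @ np.linalg.inv(Rhat) @ Hhat @ Zhat   # K x K matrix of (2.24)
lam, E = np.linalg.eigh(Shat)
That = E @ np.diag(1.0 / np.sqrt(np.clip(lam, 0.0, None) + 1.0)) @ E.T   # (2.26)
Zahat = Zhat @ That                                  # extended analysis perturbations

# Consistency with (2.18) in the extended space.
Phat = Zhat @ Zhat.T
Kghat = Phat @ Hhat.T @ np.linalg.inv(Hhat @ Phat @ Hhat.T + Rhat)
assert np.allclose(Zahat @ Zahat.T, (np.eye(2 * n) - Kghat @ Hhat) @ Phat)
```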
The perturbed observation algorithm consistent with the quadratic ensemble filter of Hodyss (2012) may be constructed by first creating K perturbed innovations of the following form:
$$\mathbf{v}_k = \sqrt{K-1}\,\mathbf{H}\mathbf{z}^f_k + \mathbf{R}^{1/2}\boldsymbol{\theta}_k, \qquad k = 1, \ldots, K, \tag{2.27}$$
where $\mathbf{z}^f_k$ is the kth column of $\mathbf{Z}^f$, $\boldsymbol{\theta}_k$ is a vector of independent N(0, 1) draws, and it is important to note the factors of $\sqrt{K-1}$ used here to ensure that the perturbations are not scaled by the ensemble size. It is important to unscale the perturbations because they will be squared in the next step, and this squaring of the perturbations can be thought of as a way to measure their amplitude. Given (2.27) we may generate an estimate of the square root of the posterior by using the following formula:
$$\hat{\mathbf{Z}}^a = \hat{\mathbf{Z}}^f - \hat{\mathbf{K}}\hat{\mathbf{V}}, \tag{2.28}$$
where the extended innovation matrix $\hat{\mathbf{V}}$ is calculated from
$$\hat{\mathbf{V}} = \frac{1}{\sqrt{K-1}}\begin{bmatrix} \mathbf{V} \\ \mathbf{V}\circ\mathbf{V} - \mathbf{D} \end{bmatrix}, \tag{2.29}$$
with $\mathbf{V}$ the matrix whose columns are the $\mathbf{v}_k$ and the expected square innovation matrix $\mathbf{D}$ being a p × K matrix whose columns are the diagonal of $\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R}$ repeated in each column. This term ensures that in the absence of bias in the prior mean (first moment) and/or "bias" in the prior variance (second moment) the extended innovation, like its Kalman counterpart, has zero expectation. In addition, when generalized into the realm of nonlinear regression, what is actually being perturbed is the innovation rather than the observation; in other words, if one actually did perturb the observation in this generalized nonlinear regression algorithm then a different and incorrect variance would be obtained. Last, consistent with the previous section, the relationships derived between the square root and perturbed observation algorithms can be derived in this framework as well, but they will not be repeated here in the interest of space.
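The structure of the extended innovation (2.29) can be sketched as follows; the unit-variance noise draw standing in for (2.27) and the overall scalings are our assumptions. The printout confirms that both blocks of the extended innovation average to roughly zero, as the zero-expectation property requires.

```python
# Sketch of the extended innovation (2.29): each perturbed innovation is
# paired with its elementwise square minus the expected squared innovation
# diag(H Pf H' + R).
import numpy as np

rng = np.random.default_rng(7)
n, p, K = 4, 2, 5000
X = np.exp(rng.standard_normal((n, K)))
eps = X - X.mean(axis=1, keepdims=True)            # unscaled prior perturbations
H = rng.standard_normal((p, n))
R = np.eye(p)
Pf = np.cov(X)

V = H @ eps + np.linalg.cholesky(R) @ rng.standard_normal((p, K))  # perturbed innovations, cf. (2.27)
d = np.diag(H @ Pf @ H.T + R)[:, None]             # expected squared innovation
V_ext = np.vstack([V, V * V - d])                  # extended innovations, cf. (2.29)

# Both blocks average to roughly zero (up to sampling noise), consistent with
# the zero-expectation property discussed above.
print(np.abs(V_ext.mean(axis=1)).max(), float(d.max()))
```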

3. Illustrative examples with Lorenz (1963)

a. True versus expected posterior variances

The Lorenz (1963) equations are
$$\frac{dx}{dt} = \sigma(y - x), \tag{3.1a}$$
$$\frac{dy}{dt} = x(\rho - z) - y, \tag{3.1b}$$
$$\frac{dz}{dt} = xy - \beta z, \tag{3.1c}$$
where the parameters take the standard values σ = 10, ρ = 28, and β = 8/3. The system (3.1a)–(3.1c) will be solved using the explicit Runge–Kutta (4, 5) solver (ode45) found in recent versions of MATLAB. We perform an integration of (3.1a)–(3.1c) using the procedure described in Hodyss (2011). We start from an arbitrarily chosen initial point off the attractor and integrate long enough (10 units of time) that we are assured that we have arrived on the attractor. We take this new state and perturb it 10^6 times with zero-mean, normally distributed random noise. We integrate each of these initial conditions for one unit of time. In the context of data assimilation we assume that the ensemble that results from this procedure is the prior distribution, and we observe the variables y and z of each member of this prior with an observation error variance of one on each variable. This procedure determines the posterior distribution as a function of the innovation by creating truth–innovation pairs. These truth–innovation pairs are then plotted in Fig. 1 as a scatter diagram in which each point on the plot represents one truth–innovation pair obtained from this procedure.
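The Monte Carlo procedure just described can be sketched as follows; SciPy's solve_ivp stands in for MATLAB's ode45, and the member count and perturbation amplitude are reduced, arbitrary stand-ins.

```python
# Sketch of the Lorenz (1963) truth-innovation pair construction.
import numpy as np
from scipy.integrate import solve_ivp

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0      # the standard chaotic parameter values

def lorenz63(t, s):
    x, y, z = s
    return [SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z]

rng = np.random.default_rng(8)

# Spin up onto the attractor, then build a perturbed ensemble of initial states.
spinup = solve_ivp(lorenz63, (0.0, 10.0), [1.0, 1.0, 1.0], rtol=1e-8, atol=1e-8).y[:, -1]
n_members = 500                                # 10^6 in the paper; reduced here for speed
ics = spinup[:, None] + 0.1 * rng.standard_normal((3, n_members))  # arbitrary perturbation size

# Integrate each member one unit of time; treat the result as the prior.
prior = np.empty((3, n_members))
for k in range(n_members):
    prior[:, k] = solve_ivp(lorenz63, (0.0, 1.0), ics[:, k], rtol=1e-8, atol=1e-8).y[:, -1]

# Observe y and z of each member with unit observation error variance. Each
# member plays the role of a possible truth; its innovation is the perturbed
# observation minus the prior-mean prediction of y and z.
prior_mean = prior.mean(axis=1, keepdims=True)
obs = prior[1:, :] + rng.standard_normal((2, n_members))
innovation = obs - prior_mean[1:, :]
pairs = np.vstack([prior, innovation])         # the truth-innovation pairs of Fig. 1
```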
Fig. 1. The posterior distribution plotted as a function of each element of the innovation. The black dots represent members of the posterior distribution and the gray dots are the shadow of the posterior. This "shadow" represents the distribution of innovations.

Fig. 1 shows the distribution of each state variable (x, y, and z) plotted as a scatter diagram as a function of the two elements of the innovation vector. Consistent with the discussion found in Hodyss (2011), the unobserved variable x is the most curved (a characteristic sign of non-Gaussian distributions) as a function of the innovation. Nevertheless, the observed variables (y and z) are also curved, although considerably less so than the unobserved variable. Hodyss (2011) showed that this curvature leads to the posterior error variance being a function of the innovation. In Fig. 2 the error variance of the posterior (i.e., the mean square difference between each state estimate and the truth) is plotted as a function of the innovation [(2.3b)] for three different state estimates: the true posterior mean, the EBKF mean, and the quadratic ensemble filter mean of Hodyss (2012). The point to be taken from Fig. 2 is that the true posterior error variance is a strong function of the most recent innovation, and a proper ensemble generation scheme would produce a different error variance for each different innovation that might be obtained. Note, however, that (2.1) and (2.18) are not functions of the innovation and therefore cannot vary appropriately with the innovation, such that an ensemble generation scheme based upon either of them could never produce Fig. 2. What (2.1) and (2.18) are in fact calculating is illustrated next.

Fig. 2. The binned posterior error variance as a function of the two elements of the innovation vector for three state estimates. (top to bottom) The rows correspond with each state variable: x, y, and z, respectively. (left to right) The columns correspond with the true posterior mean, the EBKF estimate, and the quadratic ensemble filter estimate of the posterior mean. Bins without color (white) received fewer than 100 samples and an error variance was not calculated.

As discussed in section 2, (2.1) and (2.18) are attempting to find the expected posterior covariance matrix in (2.3a) and not the posterior error covariance consistent with the latest innovation in (2.3b). Proof of this is provided in Table 1. In Table 1 we perform the integration suggested in (2.3a) by calculating the probability of obtaining an innovation in each bin [$p_{ij}$] in Fig. 2, multiplying the resulting field by the posterior error variance in Fig. 2, and subsequently summing the result:
$$\bar{\sigma}^2 = \sum_i \sum_j p_{ij}\,\sigma^2_{ij}, \tag{3.2}$$
where $\sigma^2_{ij}$ is the estimated posterior error variance in the (i, j) cell of Fig. 2 and $p_{ij}$ is the probability of drawing an innovation in the (i, j) cell of Fig. 2. We go on to also do this calculation in (3.2) for the third and fourth moments. For comparison purposes we also perform the indicated calculations of sections 2a and 2b to generate both square root and perturbed observation ensembles consistent with (2.1) and (2.18). What we see in Table 1 is that the true posterior error variance, when averaged over all innovations, matches extremely closely what is produced by the ensemble generation algorithms in sections 2a and 2b. In fact the posterior error variance from the quadratic ensemble filter of Hodyss (2012) is nearly identical to the minimum possible expected error variance of the true posterior mean. So, while both techniques do indeed provide an accurate estimate of the expected posterior error variance around each of their estimates of the posterior mean, it is in the expected higher moments where they begin to differ. As can be seen in Table 1, neither square root technique is as close with the higher moments as it was with the second moments. The fact that neither square root technique provides an especially accurate third and fourth moment is due to the fact that (2.1) and (2.18) only aim to constrain the second moments. In contrast, the perturbed observation technique actually does recover the expected higher moments to a high degree of accuracy. This is of course by construction, as (2.9) and (2.28) are the very formulas we would use to calculate the expected higher moments. In other words, the procedure of resampling the innovation as the sum of an appropriately scaled random number and one prior ensemble perturbation [e.g., the columns of $\mathbf{V}$ in (2.9)] is equivalent to an integration over innovation, as this procedure samples all innovations with the correct frequency.
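The binned integration (3.2) can be sketched in one dimension as below; the synthetic population and the deliberately crude state estimate are illustrative assumptions. The two printed numbers agree up to binning and the 100-sample threshold.

```python
# Sketch of the binned integration (3.2): the expected posterior error
# variance is the probability-weighted sum of per-bin error variances.
import numpy as np

rng = np.random.default_rng(9)
N = 200_000
truth = rng.standard_normal(N)                     # synthetic "truth" population
innovation = truth + rng.standard_normal(N)        # unit observation error
estimate = np.zeros(N)                             # a deliberately crude state estimate

err2 = (estimate - truth) ** 2
edges = np.linspace(-8.0, 8.0, 41)
which = np.digitize(innovation, edges)

expected_var = 0.0
for b in np.unique(which):
    in_bin = which == b
    if in_bin.sum() < 100:                         # mirror the 100-sample bin threshold
        continue
    p_ij = in_bin.mean()                           # probability of this innovation bin
    expected_var += p_ij * err2[in_bin].mean()     # p_ij times the binned error variance

print(expected_var, err2.mean())                   # agree up to binning/threshold effects
```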
Table 1. Expected error statistics from the Lorenz (1963) experiment. Within each moment block the rows give the stated statistic for the x, y, and then z variables. SQ refers to a square root algorithm and PO refers to perturbed observations.

Moments        Var   Prior     Posterior   Kalman (true)   Kalman SQ   Kalman PO   Quad (true)   Quad SQ   Quad PO
Second          x    0.0321    0.0202      0.0258          0.0259      0.0258      0.0204        0.0204    0.0204
                y    2.04      0.460       0.469           0.469       0.468       0.461         0.461     0.460
                z    1.75      0.373       0.413           0.413       0.413       0.374         0.374     0.374
Third (×10²)    x    −0.550    −0.0904     −0.311          −0.380      −0.312      −0.0940       −0.159    −0.0938
                y    −24.4     −4.38       −4.65           −15.6       −4.41       −4.69         −12.0     −4.44
                z    −220      −5.62       −8.39           −38.6       −8.77       −5.35         −16.3     −5.73
Fourth (×10²)   x    0.519     0.143       0.326           0.350       0.327       0.148         0.160     0.147
                y    1180      65.8        67.8            67.0        67.6        66.0          60.1      65.8
                z    1377      48.2        62.2            112         62.6        48.3          50.0      48.7

b. Cycling experiments

In this section we will perform cycling experiments with the Lorenz system for a variety of ensemble sizes and cycling intervals. We will perform square root and perturbed observation ensemble generation experiments here within both EBKF and quadratic ensemble filter DA algorithms. In these experiments we observe variables x and z of the Lorenz equations in (3.1a)–(3.1c) with observation error variances of 0.1 on both variables. We examine three cycling intervals of τ = 0.05, 0.1, and 0.15 units of time. The shorter cycling interval of τ = 0.05 will be used to understand the performance in a relatively linear (Gaussian) situation, while the longer cycling interval of τ = 0.15 will be used to understand the performance in a relatively nonlinear (non-Gaussian) situation. Four ensemble sizes will be shown: K = 32, 64, 128, and 256. For each ensemble size, cycling interval, and ensemble generation method, the prior inflation is tuned on four independent assimilation runs (starting from different points on the attractor as well as a different random seed in the random number generator) of 2050 cycles, in which the first 50 are discarded and statistics are calculated on the last 2000 cycles. The optimal inflation factor is the factor that leads to the minimum average RMS analysis error over the four independent assimilation runs. Subsequently, another independent assimilation experiment is begun using the optimally determined inflation and cycled for 10 050 cycles, in which again the first 50 are discarded and statistics are calculated over the last 10 000 cycles.
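The prior inflation being tuned here acts on the perturbations about the ensemble mean; a minimal sketch follows, with an arbitrary stand-in factor rather than a tuned value.

```python
# Sketch of multiplicative prior inflation: scale perturbations about the
# ensemble mean by a factor alpha before assimilation.
import numpy as np

def inflate(X, alpha):
    """Scale ensemble perturbations about the ensemble mean by alpha."""
    xbar = X.mean(axis=1, keepdims=True)
    return xbar + alpha * (X - xbar)

rng = np.random.default_rng(10)
X = rng.standard_normal((3, 64))
Xi = inflate(X, 1.05)                              # 1.05 is an arbitrary stand-in factor
print(np.var(Xi, axis=1, ddof=1) / np.var(X, axis=1, ddof=1))  # ~ alpha**2 per variable
```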

Figure 3 shows the root-mean-square (RMS) analysis errors for all experiments as a function of the cycling interval τ. The main result to be taken from this figure is that for weak nonlinearity (τ = 0.05) quadratic nonlinear regression is comparable to linear (EBKF) regression, while for strong nonlinearity (τ = 0.15) quadratic nonlinear regression is superior. This superiority becomes greater at the larger ensemble sizes because at these ensemble sizes the third and fourth moments become resolved quite well. Figure 3 reveals the peculiar property that at τ = 0.05 the Kalman-based square root generation technique actually obtains larger RMS analysis errors for the larger ensemble sizes. This pathological property of square root filters has been seen previously in Sakov and Oke (2008, their Fig. 4) and Anderson (2010, his Figs. 2 and 10). Figure 10 in Anderson (2010) is particularly pertinent as it is also in the Lorenz (1963) model but with a different (larger) number of observations and different (larger) observation error variances, which lends credence to the generality of this behavior.

Fig. 3. RMS analysis error in Lorenz-63 for three cycling intervals and ensemble sizes of (a) 32, (b) 64, (c) 128, and (d) 256 members. Solid lines are for the quadratic techniques and dashed lines are for the Kalman-based techniques. Black lines are for square root generation and gray lines are for perturbed observation ensemble generation.

Because of the emergence of this well-known problem we caution the reader not to attempt to draw any conclusions about the relative superiority of square root versus perturbed observation generation from Fig. 3. It appears that the well-known "outlier problem" of Lawson and Hansen (2004), Sakov and Oke (2008), and Anderson (2010) has emerged to corrupt the performance of both square root filters. This outlier problem is the degradation of the ensemble from the emergence of a single member substantially distant from the other members of the ensemble. We provide further evidence that this issue is the very one that is affecting the performance of the square root filter by finding the ensemble member at analysis time that maximizes the following norm for each of the 10 000 cycles in our test period:
$$\eta_k = \sum_{i=1}^{3} \frac{(z'_{i,k})^2}{\sigma_i^2}, \tag{3.3}$$
where $z'_{i,k}$ is the ensemble perturbation around the mean for the member being tested, k, and for the ith variable of the Lorenz equations in (3.1a)–(3.1c), and $\sigma_i^2$ is the ensemble variance for the ith variable. We then plot in Fig. 4 the distribution of $\eta = \max_k \eta_k$ for K = 32 and 256 ensemble members and for τ = 0.05. In Fig. 4 we see that the distribution of outliers is remarkably similar for both perturbed observation ensemble generation techniques and at both ensemble sizes. However, this is not true for the square root ensemble generation techniques. While both square root ensemble generation techniques have quite similar distributions of outliers at K = 32, the distribution of outliers at K = 256 for the Kalman square root technique is much more severely skewed toward extremely large outliers. We also examined plots of $\eta$ as a function of cycle number (not shown) and have determined that these large outliers develop rather slowly, as it tended to take more than 1000 cycles for $\eta$ to become greater than 15 in the Kalman-based square root technique. So, while the quadratic-based square root technique was not so severely affected by outliers in the experiments with τ ≤ 0.15, further experiments with τ = 0.2 (not shown) revealed severe outliers in both square root generation techniques and a subsequent divergence of both techniques. In contrast, neither perturbed observation technique diverged at τ = 0.2 (not shown), with both performing satisfactorily, but again with the quadratic technique being superior to the Kalman-based version.
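A sketch of the outlier norm (3.3), as reconstructed above, follows; the function and variable names are ours.

```python
# Sketch of the outlier norm (3.3): sum the squared perturbations of a member,
# normalized by the ensemble variance of each variable, and track the maximum
# over members.
import numpy as np

def max_outlier_norm(X):
    """Return the largest eta_k over the K members of an n x K ensemble."""
    eps = X - X.mean(axis=1, keepdims=True)        # perturbations about the mean
    var = eps.var(axis=1, ddof=1, keepdims=True)   # ensemble variance per variable
    eta = (eps * eps / var).sum(axis=0)            # eta_k for each member k
    return float(eta.max())

rng = np.random.default_rng(11)
X = rng.standard_normal((3, 32))
X[:, 0] += 8.0                                     # plant one obvious outlier member
print(max_outlier_norm(X))                         # large, flagging the planted outlier
```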
Fig. 4. Distribution of outliers in Lorenz (1963) experiments with a cycling interval of τ = 0.05. Solid lines are for the quadratic techniques and dashed lines are for the Kalman-based techniques. Black lines are for square root generation and gray lines are for perturbed observation ensemble generation.

We verified the detrimental effects of these outliers on square root ensemble generation performance by invoking a periodic random rotation of the transformation matrix (2.7) through the following transformation:
$$\mathbf{T} \rightarrow \mathbf{T}\boldsymbol{\Phi}, \tag{3.4}$$
where $\boldsymbol{\Phi}$ is a K × K matrix whose elements are random draws from N(0, 1/K). This random rotation is known to produce a smoothing of the ensemble distribution and, hence, is likely to reduce outliers. We therefore applied this random rotation every 500 cycles to a few test cases within our experimental group and found that generally this led to fewer outliers and subsequently lower RMS analysis error. In some cases, however, this procedure did not lead to lower RMS analysis error, and we believe that some tuning of the time interval between applications of the random rotation, as well as a possible retuning of the inflation factor, is required because of the subsequent change in the shape of the distribution. This, however, was not attempted here because it is clearly outside the scope of this paper, as our goal was simply to compare quadratic nonlinear regression against linear (Kalman) regression in a simple setting.

4. Cycling experiments in a 2D shear layer

This section describes experiments with nonlinearly evolving shear wave instabilities and compares with the experiments in Hodyss (2012) that only showed results for experiments with perturbed observations. The difference between Hodyss (2012) and here is that we now include ensemble generation using the square root forms discussed in section 2. We will provide a very brief overview of this experiment, but we refer the reader to Hodyss (2012) for details of the model, observations and observational network, and both Kalman and quadratic data assimilation algorithms for the estimate of the posterior mean.

A shear layer is simulated such that shear instabilities are produced in a model configuration with a state vector of length 8448, in which half the state variables correspond with the vorticity and the other half correspond with the potential temperature. Two separate ensemble DA algorithms are tested: 1) an EBKF using the square root ensemble generation of section 2a and 2) a quadratic ensemble filter using the square root ensemble generation of section 2b. All DA algorithms are coded to assimilate all observations at once (i.e., operate as a "global solve") in which 640 observations of zonal winds and temperature are assimilated. Four different cycling intervals are examined (100, 200, 300, and 400 time steps of the model). As shown by Hodyss (2012), the longer cycling intervals correspond with larger third moments. In a general sense the cycling intervals of 100 and 200 time steps can be thought of as nearly linear, 300 time steps as moderately nonlinear, and 400 time steps as strongly nonlinear. Both DA algorithms use prior inflation as well as the localization scheme of Bishop et al. (2011). These two techniques are separately tuned for each DA algorithm to deliver the minimum RMS analysis error (with respect to a truth run) over the first third (120 cycles, throwing out the first 20) of the test period. The experiments are carried out for 64-, 128-, and 256-member ensembles and for 320 cycles, in which the first 20 are thrown away and statistical validation is carried out over the last 300. The RMS analysis error over the last 300 cycles is shown in Fig. 5 for the different cycling intervals and ensemble sizes and includes the perturbed observation results from Hodyss (2012). Figure 5 shows that for all cycling intervals shown the performance of the quadratic ensemble filter with square root generation is superior to that of the EBKF with square root ensemble generation. At a cycling interval of τ = 100 and for 64-member ensembles (not shown) the EBKF with square root ensemble generation was marginally better than the quadratic ensemble filter but, as discussed below, was not statistically distinguishable from a bootstrap resampling perspective. In comparison with a perturbed observation generation scheme, we can see in Fig. 5 that for τ > 200 perturbed observations were superior to square root generation. This result, that longer cycling intervals favor perturbed observations, was also seen in the Lorenz (1963) experiments and, as in those experiments, we believe it is due to the greater nonlinearity at longer cycling intervals as well as the possible emergence of outliers here as well.

Fig. 5. RMS analysis error over 300 cycles for (a),(c),(e) vorticity and (b),(d),(f) potential temperature for three different cycling intervals and ensemble sizes of 64, 128, and 256 members. Solid lines are for the quadratic techniques and dashed lines are for the Kalman-based techniques. Black lines are for square root generation and gray lines are for perturbed observation ensemble generation. The cycling interval of τ = 400 was sufficiently nonlinear that 64- and 128-member ensembles led to filter divergence for the square root techniques; for this case the 256-member square root ensemble is represented by a filled circle for the Kalman technique and a filled square for the quadratic technique. For the cycling interval of τ = 300, 64-member ensembles led to filter divergence for both square root techniques and the RMS analysis error is therefore not plotted.

To assess statistical significance we use bootstrap resampling (Wilks 2006), which we apply to the 300 paired differences between the RMS analysis errors of the EBKF and the quadratic ensemble filter. We only apply bootstrap resampling to the square root generation experiments because Hodyss (2012) already discussed the statistical significance of the perturbed observation experiments. We choose a confidence interval defined by the 5% and 95% resampling interval. We assume the difference in the RMS analysis error is significant when this confidence interval does not include zero (because this is a paired difference, a zero value is the dividing line between the two techniques). We find that the quadratic ensemble filter is statistically significantly different from the EBKF for all experiments with 256-member ensembles. For 64- and 128-member ensembles the two DA algorithms are statistically different for cycling intervals greater than 100.
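A sketch of the paired-difference bootstrap follows; the synthetic differences stand in for the 300 paired RMS-error differences.

```python
# Sketch of the bootstrap significance test: resample the paired differences
# with replacement and ask whether the 5%-95% interval of resampled means
# excludes zero.
import numpy as np

rng = np.random.default_rng(12)
diffs = 0.02 + 0.1 * rng.standard_normal(300)      # stand-in paired RMS-error differences

n_boot = 10_000
means = np.array([rng.choice(diffs, size=diffs.size, replace=True).mean()
                  for _ in range(n_boot)])
lo, hi = np.percentile(means, [5.0, 95.0])
significant = not (lo <= 0.0 <= hi)                # zero outside the interval?
print(lo, hi, significant)
```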

5. Summary and conclusions

A new square root ensemble generation technique is described that is consistent with the expected analysis error variance of the quadratic ensemble filter of Hodyss (2012). Because the expected error variance of the posterior is affected by the third and fourth moments of the prior, this new technique provides some accounting for the effects of skewness. It was shown that this new technique provides a better estimate of the state of a nonlinear system than a Kalman-based square root technique in both the Lorenz (1963) model and in an experiment with a nonlinear model simulating shear layer instabilities. The outlier problem for square root ensemble generation of Lawson and Hansen (2004), Sakov and Oke (2008), and Anderson (2010) was found in the experiments with the Lorenz (1963) model and was shown to severely degrade the performance of the filter. The new quadratic square root technique appeared to have less of an issue with outliers, though for a large enough cycling interval (a proxy for nonlinearity) it too developed large outliers that severely degraded its performance.

It was also shown that the most important issue with this technique, as well as with all previous ensemble generation techniques, is their reliance on the expected posterior error variance rather than the error variance consistent with the latest innovation vector. It is important to realize that because the posterior error variance created by both square root and perturbed observation algorithms is incorrect when the posterior distribution is skewed, the state estimate from an ensemble-based Kalman (quadratic ensemble) filter does not actually deliver the minimum error variance estimate of a linear (quadratic) estimator even when the model is perfect and the ensemble is infinite.

Currently, we are examining techniques that attempt to reduce the impact of this issue with the error variance in skewed distributions through the application of appropriately designed posterior inflation algorithms. In addition, we are currently working toward applying quadratic nonlinear regression to a numerical weather prediction setting.

Acknowledgments

We gratefully acknowledge support from the Office of Naval Research PE-0601153N.

REFERENCES

  • Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.
  • Anderson, J. L., 2010: A non-Gaussian ensemble filter update for data assimilation. Mon. Wea. Rev., 138, 4186–4198.
  • Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Avellano, 2009: The Data Assimilation Research Testbed: A community facility. Bull. Amer. Meteor. Soc., 90, 1283–1296.
  • Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.
  • Bishop, C. H., D. Hodyss, P. Steinle, H. Sims, A. M. Clayton, A. C. Lorenc, D. M. Barker, and M. Buehner, 2011: Efficient ensemble covariance localization in variational data assimilation. Mon. Wea. Rev., 139, 573–580.
  • Burgers, G., P. J. Van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724.
  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.
  • Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367.
  • Ghil, M., and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography. Advances in Geophysics, Vol. 33, Academic Press, 141–266.
  • Golub, G. H., and G. A. Meurant, 2010: Matrices, Moments and Quadrature with Applications. Princeton University Press, 363 pp.
  • Hodyss, D., 2011: Ensemble state estimation for nonlinear systems using polynomial expansions in the innovation. Mon. Wea. Rev., 139, 3571–3588.
  • Hodyss, D., 2012: Accounting for skewness in ensemble data assimilation. Mon. Wea. Rev., 140, 2346–2358.
  • Hodyss, D., and P. A. Reinecke, 2013: Skewness of the prior through position errors and its impact on data assimilation. Data Assimilation for Atmospheric, Oceanic, and Hydrologic Applications, S. K. Park and L. Xu, Eds., Vol. II, Springer, 843 pp.
  • Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.
  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.
  • Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133, 604–620.
  • Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.
  • Lawson, W. G., and J. A. Hansen, 2004: Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth. Mon. Wea. Rev., 132, 1966–1981.
  • Lei, J., P. Bickel, and C. Snyder, 2010: Comparison of ensemble Kalman filters under non-Gaussianity. Mon. Wea. Rev., 138, 1293–1306.
  • Lorenz, E. N., 1963: Deterministic non-periodic flow. J. Atmos. Sci., 20, 130–141.
  • Meng, Z., and F. Zhang, 2008: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part IV: Comparison with 3DVAR in a month-long experiment. Mon. Wea. Rev., 136, 3671–3682.
  • Sakov, P., and P. R. Oke, 2008: Implications of the form of the ensemble transformation in the ensemble square root filter. Mon. Wea. Rev., 136, 1042–1053.
  • Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. Tellus, 60, 113–130.
  • Torn, R. D., and G. J. Hakim, 2008: Performance characteristics of a pseudo-operational ensemble Kalman filter. Mon. Wea. Rev., 136, 3947–3963.
  • Wang, X., C. H. Bishop, and S. J. Julier, 2004: Which is better, an ensemble of positive-negative pairs or a centered spherical simplex ensemble? Mon. Wea. Rev., 132, 1590–1605.
  • Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008: Ensemble data assimilation with the NCEP Global Forecast System. Mon. Wea. Rev., 136, 463–482.
  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.
