1. Introduction
A key component of ensemble-based data assimilation (DA) is the generation of an ensemble consistent with the distribution of possible true states conditioned on the latest set of observations (commonly referred to as the posterior distribution). The two techniques commonly used in the geophysical community to accomplish this task are perturbed observation (or stochastic) ensemble generation (Evensen 1994, 2003; Burgers et al. 1998; Houtekamer and Mitchell 2001, 2005) and square root (or deterministic) forms (Anderson 2001; Bishop et al. 2001). These techniques have been used with great success (e.g., Houtekamer et al. 2005; Houtekamer and Mitchell 2005; Szunyogh et al. 2008; Meng and Zhang 2008; Torn and Hakim 2008; Whitaker et al. 2008; Anderson et al. 2009) in a wide variety of meteorological modeling systems.
Three comparisons of these two ensemble generation techniques in the literature are most relevant to the present work: Lawson and Hansen (2004), Sakov and Oke (2008), and Lei et al. (2010). This body of work clearly describes the fundamental differences between square root and perturbed observation algorithms from the perspective of the ensemble-based Kalman filter (EBKF). The main differences appear to be that 1) square root methods produce less variability in the variance at small ensemble sizes and 2) the higher moments appear to be exaggerated with square root methods, especially given their propensity for generating outliers (e.g., Lawson and Hansen 2004; Sakov and Oke 2008; Anderson 2010).
In Hodyss (2011, 2012) a new technique for the estimation of the posterior mean was described that revealed how one could extend the linear regression capability of the EBKF to an algorithm that performs nonlinear polynomial regression with a new quadratic nonlinear term (hereafter referred to as quadratic nonlinear regression). This technique has its roots in nonlinear polynomial least squares regression (e.g., Golub and Meurant 2010); a general introduction can be found in Jazwinski (1970, 340–346).
One of the unique features of this technique is its remarkable mathematical similarity to the EBKF, which allows an already constructed EBKF algorithm to be converted, with relatively minor changes, into one that performs quadratic nonlinear regression for the update of the mean. An update of the mean consistent with quadratic nonlinear regression allows for a significantly more accurate estimate of the posterior mean when the posterior distribution is skewed because Hodyss (2011) showed that a skewed posterior is associated with a curved posterior mean (i.e., a nonlinear function of the innovation). As an example, Hodyss and Reinecke (2013) showed that the prior/posterior distributions associated with the strong phase uncertainty of tropical cyclones exhibit significant skewness and are therefore strongly affected by these issues.
In Hodyss (2011, 2012) ensemble generation was performed with a version of perturbed observations consistent with quadratic nonlinear regression. The main motivation for this article is to provide the details of an algorithm that performs square root ensemble generation consistent with data assimilation algorithms based on quadratic nonlinear least squares regression. A secondary motivation is to extend previous work on the theories of square root and perturbed observation ensemble generation, first by showing mathematically how the two are related to each other. Second, and equally importantly, we will show that the fundamental assumption of both square root and perturbed observation ensemble generation techniques, irrespective of whether they are configured to perform linear or quadratic nonlinear regression, is that the posterior error variance is independent of the innovation. This has important ramifications for data assimilation because Hodyss (2011) showed that whenever the posterior is skewed the true posterior error variance is in fact a function of the innovation.
In section 2 we review and extend the theory of square root and perturbed observation algorithms by providing mathematical relationships between them. In section 3 we illustrate the fundamental properties of these two ensemble generation schemes in the Lorenz (1963) model for both the EBKF and quadratic ensemble filter algorithms. Section 4 applies these techniques to nonlinearly evolving shear instabilities. Finally, section 5 closes the manuscript with a recapitulation of the most important results and a discussion of the major conclusions.
2. Square root and perturbed observation ensemble generation
We begin by reviewing the properties of square root and perturbed observation ensemble generation schemes. The presentation that follows discusses only the ensemble generation step of an ensemble-based data assimilation algorithm because, given the same prior ensemble, the proper application of square root or perturbed observation ensemble generation has no effect on the update of the mean.
a. Ensemble-based Kalman filter form
In the EBKF, the ensemble generation step constructs a set of posterior perturbations whose sample covariance reproduces the Kalman posterior error covariance,

P^a = (I − KH)P^f,  K = P^f H^T (H P^f H^T + R)^{−1},  (2.1)

where P^f is the prior (forecast) error covariance estimated from the prior ensemble, H is the (linear) observation operator, R is the observation error covariance, and K is the Kalman gain. The corresponding update of the mean is

x̄^a = x̄^f + Kd,  (2.2)

where x̄^f and x̄^a are the prior and posterior means and d = y − Hx̄^f is the innovation formed from the observation vector y.
Choosing to approximate random samples from the posterior distribution by insisting that those samples satisfy (2.1) makes at least three important assumptions. The first is that constraining the second moments is sufficient: the moments higher than the second differ between specific square root representations (e.g., Anderson 2001; Bishop et al. 2001) because (2.1) constrains only the second moments, and this likely explains in part why different square root schemes obtain different performance levels in various applications. This difference in higher moments can be understood by noting that any two such schemes are identically constrained to satisfy (2.1); the only difference between them is therefore a rotation of the resulting ensemble members, which can only manifest in the higher moments. This aspect of the problem will not be pursued here. The second assumption in generating an ensemble using (2.1) is that we accurately know the required forecast and observation error covariance matrices (Houtekamer and Mitchell 2005, p. 3273). This assumption is never satisfied in practice, as these quantities are simply unknown and/or difficult to estimate with a limited number of ensemble members.
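To make this concrete, the following minimal sketch (Python/NumPy; an ETKF-style symmetric square root, which is one of many valid square root forms and not necessarily the one used in this paper) generates posterior perturbations that satisfy (2.1) exactly. Replacing the transform T by TQ for any orthogonal matrix Q leaves the covariance constraint (2.1) satisfied but changes the higher moments, which is precisely the rotational freedom noted above; the symmetric choice below also preserves the zero sum of the perturbations, so the posterior mean is unaffected (cf. Wang et al. 2004).

```python
import numpy as np

def sqrt_ensemble_update(Xf, H, R):
    """ETKF-style square root update of prior perturbations Xf (n x K).

    Returns posterior perturbations Xa whose sample covariance
    Xa @ Xa.T / (K - 1) equals (I - K_gain @ H) @ Pf, where
    Pf = Xf @ Xf.T / (K - 1) is the prior sample covariance.
    """
    K = Xf.shape[1]                                # ensemble size
    S = (H @ Xf) / np.sqrt(K - 1)                  # observation-space perturbations
    G = S.T @ np.linalg.solve(R, S)                # S^T R^{-1} S, a K x K matrix
    w, C = np.linalg.eigh(G)                       # eigenvalues w >= 0
    T = C @ np.diag(1.0 / np.sqrt(1.0 + w)) @ C.T  # symmetric square root transform
    return Xf @ T
```

That this transform satisfies (2.1) follows from the Sherman–Morrison–Woodbury identity, which equates (I − KH)P^f with the sample covariance of Xf @ T.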
The third assumption, which is the focus of the present work, concerns precisely which error variance (2.1) represents. Perturbations generated to satisfy (2.1) reproduce the expected posterior error covariance,

P^a = E[(x − x̄^a)(x − x̄^a)^T],  (2.3a)

in which the expectation runs over the distribution of all possible innovations, rather than the posterior error covariance conditioned on the specific innovation actually obtained,

P^a(d) = E[(x − x̄^a)(x − x̄^a)^T | d].  (2.3b)
Because we wish to generate an ensemble whose error variance is consistent with the expected squared error being made by our data assimilation system given the specific observations obtained in the present analysis cycle, the ensemble should satisfy the stronger constraint (2.3b) and not (2.3a). It is important to keep in mind that even if the model is perfect and the ensemble has an infinite number of members, (2.1) will still be independent of the innovation and therefore incorrect when the posterior distribution has a nonzero third moment. The fact that (2.1) is, in the sense described above, inconsistent with the true posterior error variance (and higher moments) associated with the latest set of observations will be illustrated below, but the new algorithms described in the next section will not address this issue.
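By contrast, perturbed observation generation satisfies (2.1) only in expectation: each member assimilates a copy of the observations perturbed with an independent draw from the observation error distribution. A minimal sketch of the standard stochastic form of Burgers et al. (1998) follows (again illustrative, not necessarily the exact implementation used in the experiments below; rng is a NumPy Generator, e.g., np.random.default_rng(0)):

```python
import numpy as np

def po_ensemble_update(Ef, y, H, R, rng):
    """Perturbed observation (stochastic) update of the ensemble Ef (n x K).

    Each member assimilates the observation vector y perturbed by an
    independent draw from N(0, R), so the posterior sample covariance
    matches (I - K_gain @ H) @ Pf only in expectation.
    """
    n, K = Ef.shape
    Xf = Ef - Ef.mean(axis=1, keepdims=True)      # prior perturbations
    Pf = Xf @ Xf.T / (K - 1)                      # prior sample covariance
    K_gain = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    eps = rng.multivariate_normal(np.zeros(y.size), R, size=K).T  # p x K noise
    return Ef + K_gain @ (y[:, None] + eps - H @ Ef)
```

The sampling noise in eps is the source of the extra variability in the variance at small ensemble sizes noted in the introduction.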
b. Quadratic ensemble filter form
The construction for the quadratic ensemble filter parallels that of section 2a, with the ensemble generated so that its sample variance reproduces the expected posterior error variance of the quadratic ensemble filter [(2.18)], the quadratic analog of (2.1). Unlike its Kalman counterpart, this expected posterior error variance involves the third and fourth moments of the prior and therefore provides some accounting for the effects of skewness. Like (2.1), however, (2.18) remains an expectation over the distribution of all possible innovations and is therefore independent of the specific innovation obtained [cf. (2.3a) and (2.3b)].
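For intuition, the mean update with which this ensemble generation must be consistent can be viewed as a least squares fit of the state to a quadratic polynomial in the innovation (Hodyss 2011). The sketch below performs that fit directly from joint samples of the state and a scalar innovation; the actual filter assembles the equivalent coefficients from ensemble moments rather than an explicit lstsq solve, and the pairing of samples (e.g., prior members with simulated observation noise) is an assumption of the sketch.

```python
import numpy as np

def quadratic_regression_coefficients(X, d):
    """Least squares fit of x_hat(d) = a + b*d + c*d**2 over joint samples.

    X : (K, n) state samples; d : (K,) matching scalar innovation samples.
    Returns a (3, n) array of coefficients [a; b; c] per state variable.
    Dropping the d**2 column recovers ordinary linear (EBKF-like) regression.
    """
    A = np.column_stack([np.ones_like(d), d, d**2])
    coef, _, _, _ = np.linalg.lstsq(A, X, rcond=None)
    return coef
```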
3. Illustrative examples with Lorenz (1963)
a. True versus expected posterior variances
The model used throughout this section is the Lorenz (1963) system,

dx/dt = σ(y − x),  (3.1a)
dy/dt = x(ρ − z) − y,  (3.1b)
dz/dt = xy − βz,  (3.1c)

with the standard parameter values σ = 10, ρ = 28, and β = 8/3.
Fig. 1. The posterior distribution plotted as a function of each element of the innovation. The black dots represent members of the posterior distribution and the gray dots are the shadow of the posterior. This “shadow” represents the distribution of innovations.
Figure 1 shows the distribution of each state variable (x, y, and z) plotted as a scatter diagram against the two elements of the innovation vector. Consistent with the discussion in Hodyss (2011), the unobserved variable x is the most strongly curved (a characteristic sign of non-Gaussian distributions) as a function of the innovation. The observed variables (y and z) are also curved, although considerably less so than the unobserved variable. Hodyss (2011) showed that this curvature leads to the posterior error variance being a function of the innovation. In Fig. 2 the error variance of the posterior (i.e., the mean square difference between each state estimate and the truth) is plotted as a function of the innovation [(2.3b)] for three different state estimates: the true posterior mean, the EBKF mean, and the quadratic ensemble filter mean of Hodyss (2012). The point to be taken from Fig. 2 is that the true posterior error variance is a strong function of the most recent innovation, and a proper ensemble generation scheme would produce a different error variance for each innovation that might be obtained. Note, however, that (2.1) and (2.18) are not functions of the innovation; an ensemble generation scheme based upon either of them therefore cannot vary appropriately with the innovation and could never produce Fig. 2. What (2.1) and (2.18) are in fact calculating is illustrated next.
Fig. 2. The binned posterior error variance as a function of the two elements of the innovation vector for three state estimates. The rows (top to bottom) correspond to the state variables x, y, and z, respectively; the columns (left to right) correspond to the true posterior mean, the EBKF estimate, and the quadratic ensemble filter estimate of the posterior mean. Bins without color (white) received less than 100 samples and an error variance was not calculated.
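The qualitative behavior in Fig. 2 is straightforward to reproduce in a toy setting. The Monte Carlo sketch below (a skewed scalar prior observed with Gaussian noise, an illustrative stand-in for the Lorenz experiment with hypothetical parameter choices) bins the squared error of a linear regression estimate by the innovation; for a skewed prior the binned error variance varies strongly with the innovation, whereas repeating the exercise with a Gaussian prior yields a flat profile.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 200_000
x = rng.gamma(2.0, 1.0, K)                    # skewed scalar "prior" samples
d = (x + rng.normal(0.0, 1.0, K)) - x.mean()  # innovation: obs minus prior mean
# Linear (EBKF-like) regression estimate of x given d
C = np.cov(x, d)
b = C[0, 1] / C[1, 1]
xhat = x.mean() + b * (d - d.mean())
err2 = (x - xhat) ** 2
# Bin the squared error by innovation, as in Fig. 2
edges = np.linspace(np.percentile(d, 1), np.percentile(d, 99), 21)
idx = np.digitize(d, edges)
binned_var = [err2[idx == i].mean() for i in range(1, len(edges))]
```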
Table 1. Expected error statistics from the Lorenz (1963) experiment, listed for the x, y, and z variables in turn. SQ refers to a square root algorithm and PO refers to perturbed observations.

Moments | Variable | Prior | Posterior | Kalman (true) | Kalman SQ | Kalman PO | Quad (true) | Quad SQ | Quad PO
Second | x | 0.0321 | 0.0202 | 0.0258 | 0.0259 | 0.0258 | 0.0204 | 0.0204 | 0.0204
 | y | 2.04 | 0.460 | 0.469 | 0.469 | 0.468 | 0.461 | 0.461 | 0.460
 | z | 1.75 | 0.373 | 0.413 | 0.413 | 0.413 | 0.374 | 0.374 | 0.374
Third (×10²) | x | −0.550 | −0.0904 | −0.311 | −0.380 | −0.312 | −0.0940 | −0.159 | −0.0938
 | y | −24.4 | −4.38 | −4.65 | −15.6 | −4.41 | −4.69 | −12.0 | −4.44
 | z | −220 | −5.62 | −8.39 | −38.6 | −8.77 | −5.35 | −16.3 | −5.73
Fourth (×10²) | x | 0.519 | 0.143 | 0.326 | 0.350 | 0.327 | 0.148 | 0.160 | 0.147
 | y | 1180 | 65.8 | 67.8 | 67.0 | 67.6 | 66.0 | 60.1 | 65.8
 | z | 1377 | 48.2 | 62.2 | 112 | 62.6 | 48.3 | 50.0 | 48.7
b. Cycling experiments
In this section we perform cycling experiments with the Lorenz system for a variety of ensemble sizes and cycling intervals. Square root and perturbed observation ensemble generation are tested within both the EBKF and the quadratic ensemble filter DA algorithms. In these experiments we observe variables x and z of the Lorenz equations in (3.1a)–(3.1c) with an observation error variance of 0.1 on both variables. We examine three cycling intervals of τ = 0.05, 0.1, and 0.15 units of time. The shorter cycling interval of τ = 0.05 is used to understand the performance in a relatively linear (Gaussian) situation, while the longer cycling interval of τ = 0.15 is used to understand the performance in a relatively nonlinear (non-Gaussian) situation. Four ensemble sizes are shown: K = 32, 64, 128, and 256. For each ensemble size, cycling interval, and ensemble generation method, the prior inflation is tuned on four independent assimilation runs (starting from different points on the attractor and with different seeds in the random number generator) of 2050 cycles each, in which the first 50 are discarded and statistics are calculated on the last 2000 cycles. The optimal inflation factor is the one that yields the minimum average RMS analysis error over the four independent assimilation runs. A further independent assimilation experiment is then begun using the optimally determined inflation and cycled for 10 050 cycles, in which again the first 50 are discarded and statistics are calculated over the last 10 000 cycles.
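The tuning protocol can be summarized as follows, where run_cycles is a hypothetical driver (not part of any library) that integrates one assimilation run with a given inflation factor and seed and returns its per-cycle RMS analysis errors:

```python
import numpy as np

def tune_inflation(candidates, run_cycles, n_runs=4, n_cycles=2050, spinup=50):
    """Return the inflation factor minimizing the average RMS analysis error
    over n_runs independent runs, discarding the first `spinup` cycles."""
    def score(infl):
        errs = [run_cycles(infl, seed)[spinup:n_cycles] for seed in range(n_runs)]
        return np.mean([np.mean(e) for e in errs])
    return min(candidates, key=score)
```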
Figure 3 shows the root-mean-square (RMS) analysis errors for all experiments as a function of the cycling interval τ. The main result to be taken from this figure is that for weak nonlinearity (τ = 0.05) quadratic nonlinear regression is comparable to linear (EBKF) regression, while for strong nonlinearity (τ = 0.15) quadratic nonlinear regression is superior. This superiority grows with ensemble size because the third and fourth moments are then resolved quite well. Figure 3 also reveals the peculiar property that at τ = 0.05 the Kalman-based square root generation technique actually obtains larger RMS analysis errors for the larger ensemble sizes. This pathological property of square root filters has been seen previously in Sakov and Oke (2008, their Fig. 4) and Anderson (2010, his Figs. 2 and 10). Figure 10 of Anderson (2010) is particularly pertinent as it is also in the Lorenz (1963) model but with a different (larger) number of observations and different (larger) observation error variances, which lends credence to the generality of this behavior.
Fig. 3. RMS analysis error in Lorenz-63 for three cycling intervals and ensemble sizes of (a) 32, (b) 64, (c) 128, and (d) 256 members. Solid lines are for the quadratic techniques and dashed lines are for the Kalman-based techniques. Black lines are for square root generation and gray lines are for perturbed observation ensemble generation.
Fig. 4. Distribution of outliers in the Lorenz (1963) experiments with a cycling interval of τ = 0.05. Solid lines are for the quadratic techniques and dashed lines are for the Kalman-based techniques. Black lines are for square root generation and gray lines are for perturbed observation ensemble generation.

4. Cycling experiments in a 2D shear layer
This section describes experiments with nonlinearly evolving shear wave instabilities and compares them with the experiments of Hodyss (2012), which showed results only for perturbed observations. The difference from Hodyss (2012) is that we now include ensemble generation using the square root forms discussed in section 2. We provide only a very brief overview of the experiment and refer the reader to Hodyss (2012) for details of the model, the observations and observational network, and both the Kalman and quadratic data assimilation algorithms for the estimate of the posterior mean.
A shear layer is simulated such that shear instabilities are produced in a model configuration with a state vector of length 8448, in which half the state variables correspond to the vorticity and the other half to the potential temperature. Two separate ensemble DA algorithms are tested: 1) an EBKF using the square root ensemble generation of section 2a and 2) a quadratic ensemble filter using the square root ensemble generation of section 2b. Both DA algorithms are coded to assimilate all observations at once (i.e., operate as a “global solve”), in which 640 observations of zonal wind and temperature are assimilated. Four different cycling intervals are examined (100, 200, 300, and 400 time steps of the model). As shown by Hodyss (2012), the longer cycling intervals correspond with larger third moments. In a general sense, the cycling intervals of 100 and 200 time steps can be thought of as nearly linear, 300 time steps as moderately nonlinear, and 400 time steps as strongly nonlinear. Both DA algorithms use prior inflation as well as the localization scheme of Bishop et al. (2011). These two techniques are separately tuned for each DA algorithm to deliver the minimum RMS analysis error (with respect to a truth run) over the first third (120 cycles, discarding the first 20) of the test period. The experiments are carried out for 64-, 128-, and 256-member ensembles and for 320 cycles, in which the first 20 are discarded and statistical validation is carried out over the last 300.
The RMS analysis error over the last 300 cycles is shown in Fig. 5 for the different cycling intervals and ensemble sizes and includes the perturbed observation results from Hodyss (2012). Figure 5 shows that for all cycling intervals shown the quadratic ensemble filter with square root generation is superior to the EBKF with square root ensemble generation. At a cycling interval of τ = 100 and for 64-member ensembles (not shown) the EBKF with square root ensemble generation was nominally better than the quadratic ensemble filter but, as discussed below, the difference was not statistically distinguishable from a bootstrap resampling perspective. In comparison with perturbed observation generation, Fig. 5 shows that for τ > 200 perturbed observations were superior to square root generation. This result, that longer cycling intervals favor perturbed observations, was also seen in the Lorenz (1963) experiments and, as in those experiments, we believe it is due to the greater nonlinearity at longer cycling intervals as well as the possible emergence of outliers.
Fig. 5. RMS analysis error over 300 cycles for (a),(c),(e) vorticity and (b),(d),(f) potential temperature for three different cycling intervals and ensemble sizes of 64, 128, and 256 members. Solid lines are for the quadratic techniques and dashed lines are for the Kalman-based techniques. Black lines are for square root generation and gray lines are for perturbed observation ensemble generation. The cycling interval of τ = 400 was sufficiently nonlinear that 64- and 128-member ensembles led to filter divergence for the square root techniques; for this case the 256-member square root result is denoted by a filled circle for the Kalman technique and a filled square for the quadratic technique. For the cycling interval of τ = 300, 64-member ensembles led to filter divergence for both square root techniques and the RMS analysis error is therefore not plotted.
To assess statistical significance we use bootstrap resampling (Wilks 2006), which we apply to the 300 paired differences between the RMS analysis errors of the EBKF and the quadratic ensemble filter. We apply bootstrap resampling only to the square root generation experiments because Hodyss (2012) already discussed the statistical significance of the perturbed observation experiments. We choose a confidence interval defined by the 5% and 95% points of the resampled distribution and deem the difference in RMS analysis error significant when this interval does not include zero (because these are paired differences, zero is the dividing line between the two techniques). We find that the quadratic ensemble filter is statistically significantly different from the EBKF for all experiments with 256-member ensembles. For 64- and 128-member ensembles the two DA algorithms are statistically different for cycling intervals greater than 100.
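A minimal sketch of this significance test (assuming per-cycle RMS analysis error series err_a and err_b for the two filters; the resampling count is a hypothetical choice) is:

```python
import numpy as np

def paired_bootstrap_ci(err_a, err_b, n_boot=10_000, seed=0):
    """Bootstrap 5%-95% interval for the mean of the paired differences
    between two RMS analysis error series; the difference is judged
    significant when the interval excludes zero."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(err_a) - np.asarray(err_b)
    n = diff.size
    means = np.array([diff[rng.integers(0, n, n)].mean() for _ in range(n_boot)])
    return np.percentile(means, [5.0, 95.0])
```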
5. Summary and conclusions
A new square root ensemble generation technique is described that is consistent with the expected analysis error variance of the quadratic ensemble filter of Hodyss (2012). Because the expected error variance of the posterior is affected by the third and fourth moments of the prior, this new technique provides some accounting for the effects of skewness. It was shown that this new technique provides a better estimate of the state of a nonlinear system than a Kalman-based square root technique, both in the Lorenz (1963) model and in an experiment with a nonlinear model simulating shear layer instabilities. The outlier problem for square root ensemble generation identified by Lawson and Hansen (2004), Sakov and Oke (2008), and Anderson (2010) was found in the experiments with the Lorenz (1963) model and was shown to severely degrade the performance of the filter. The new quadratic square root technique appeared to suffer less from outliers, though for large enough cycling intervals (a proxy for nonlinearity) it too developed large outliers that severely degraded its performance.
It was also shown that the most important issue with this technique, as well as with all previous ensemble generation techniques, is its reliance on the expected posterior error variance rather than the error variance consistent with the latest innovation vector. Because the posterior error variance created by both square root and perturbed observation algorithms is incorrect when the posterior distribution is skewed, the state estimate from an ensemble-based Kalman (quadratic ensemble) filter does not actually deliver the minimum error variance estimate of a linear (quadratic) estimator, even when the model is perfect and the ensemble is infinite.
Currently, we are examining techniques that attempt to reduce the impact of this issue with the error variance in skewed distributions through the application of appropriately designed posterior inflation algorithms. In addition, we are currently working toward applying quadratic nonlinear regression to a numerical weather prediction setting.
Acknowledgments
We gratefully acknowledge support from the Office of Naval Research PE-0601153N.
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.
Anderson, J. L., 2010: A non-Gaussian ensemble filter update for data assimilation. Mon. Wea. Rev., 138, 4186–4198.
Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Avellano, 2009: The data assimilation research testbed: A community facility. Bull. Amer. Meteor. Soc., 90, 1283–1296.
Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.
Bishop, C. H., D. Hodyss, P. Steinle, H. Sims, A. M. Clayton, A. C. Lorenc, D. M. Barker, and M. Buehner, 2011: Efficient ensemble covariance localization in variational data assimilation. Mon. Wea. Rev., 139, 573–580.
Burgers, G., P. J. Van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.
Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367.
Ghil, M., and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography. Advances in Geophysics, Vol. 33, Academic Press, 141–266.
Golub, G. H., and G. A. Meurant, 2010: Matrices, Moments and Quadrature with Applications. Princeton University Press, 363 pp.
Hodyss, D., 2011: Ensemble state estimation for nonlinear systems using polynomial expansions in the innovation. Mon. Wea. Rev., 139, 3571–3588.
Hodyss, D., 2012: Accounting for skewness in ensemble data assimilation. Mon. Wea. Rev., 140, 2346–2358.
Hodyss, D., and P. A. Reinecke, 2013: Skewness of the prior through position errors and its impact on data assimilation. Data Assimilation for Atmospheric, Oceanic, and Hydrologic Applications, S. K. Park and L. Xu, Eds., Vol. II, Springer, 843 pp.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.
Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.
Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133, 604–620.
Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.
Lawson, W. G., and J. A. Hansen, 2004: Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth. Mon. Wea. Rev., 132, 1966–1981.
Lei, J., P. Bickel, and C. Snyder, 2010: Comparison of ensemble Kalman filters under non-Gaussianity. Mon. Wea. Rev., 138, 1293–1306.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141.
Meng, Z., and F. Zhang, 2008: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part IV: Comparison with 3DVAR in a month-long experiment. Mon. Wea. Rev., 136, 3671–3682.
Sakov, P., and P. R. Oke, 2008: Implications of the form of the ensemble transformation in the ensemble square root filter. Mon. Wea. Rev., 136, 1042–1053.
Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. Tellus, 60, 113–130.
Torn, R. D., and G. J. Hakim, 2008: Performance characteristics of a pseudo-operational ensemble Kalman filter. Mon. Wea. Rev., 136, 3947–3963.
Wang, X., C. H. Bishop, and S. J. Julier, 2004: Which is better, an ensemble of positive-negative pairs or a centered spherical simplex ensemble? Mon. Wea. Rev., 132, 1590–1605.
Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008: Ensemble data assimilation with the NCEP Global Forecast System. Mon. Wea. Rev., 136, 463–482.
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.