1. Introduction
Many ensemble-based Kalman filtering (EBKF) algorithms process the observations serially (e.g., Houtekamer and Mitchell 2001; Whitaker and Hamill 2002; Anderson 2003; Snyder and Zhang 2003). Serial observation processing views data assimilation as an iterative sequence of scalar update equations. The appeal of this approach is that it has very low memory requirements and avoids the complex, high-dimensional inverse calculations required by many other algorithms.
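To make this structure concrete, the following minimal sketch (written in Python purely for illustration; it is not the DART code, and the function name, the linear forward-operator rows, and the perturbed-observation form are our own choices for brevity) assimilates a set of observations one at a time. Every denominator in the gain is a scalar, so no high-dimensional matrix inverse appears, and memory use is dominated by the ensemble itself because only one observation is held at a time.

import numpy as np

def serial_enkf_update(ens, obs, obs_err_var, h_rows, rng):
    # ens: (n_state, n_ens) prior ensemble; obs, obs_err_var: (n_obs,) arrays;
    # h_rows: (n_obs, n_state) rows of an assumed linear forward operator.
    n_ens = ens.shape[1]
    for i in range(len(obs)):
        hx = h_rows[i] @ ens                          # prior ensemble in observation space
        hx_pert = hx - hx.mean()
        s2 = hx_pert @ hx_pert / (n_ens - 1)          # prior observation-space variance (a scalar)
        # covariance of every state variable with the observed quantity
        cov = (ens - ens.mean(axis=1, keepdims=True)) @ hx_pert / (n_ens - 1)
        gain = cov / (s2 + obs_err_var[i])            # scalar denominator: no matrix inversion
        # perturbed-observation (stochastic) form of the scalar update
        y_pert = obs[i] + rng.normal(0.0, np.sqrt(obs_err_var[i]), n_ens)
        ens = ens + gain[:, None] * (y_pert - hx)[None, :]
    return ens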
Recently, the push has been toward prediction, and therefore the assimilation of observations, for regions and phenomena for which high resolution is required and/or highly nonlinear physical processes are operating. For these situations, a basic hypothesis is that the EBKF is suboptimal and that performance gains could be achieved by accounting for aspects of the non-Gaussianity. To this end, we develop here a new component of the Data Assimilation Research Testbed (DART; Anderson et al. 2009) to allow a wide variety of users to test this hypothesis. This new version of DART allows one to run several variants of the EBKF, as well as several variants of the quadratic polynomial filter (Hodyss 2011), while employing the exact same forecast model and observations. Differences between the results of the two systems will then highlight the degree of non-Gaussianity in the system being examined.
This new version of DART employs the polynomial filtering theory of Hodyss (2011, 2012), which is extended here to serial observation processing data assimilation systems. The key feature of polynomial filtering is that its structure is homomorphic to the EBKF. This ensures that the changes to existing codes are minor and fit within their overall flow. The method essentially takes an augmented state vector approach in which one augments both the states and the observations with new states and synthetic observations corresponding to their squares. As we show below, this process of state augmentation corresponds to quadratic polynomial regression.
Section 2 reviews both polynomial regression and serial observation processing and then shows how to merge the two methods. Section 3 illustrates the basic ideas in a simple scalar system and shows in which regimes we believe polynomial filtering can be significantly better than the EBKF. Section 4 shows the performance of the new algorithm within the DART framework with several different modeling tests. A summary of the most important results and our conclusions from them are provided in section 5.
2. Polynomial regression using serial observation processing
In this section we first review previous work on extending ensemble methods to perform polynomial regression and then show how to connect this body of work to serial observation processing. Because our intended application is the DART infrastructure, we attempt to remain consistent with the notation of Anderson and Collins (2007). Table 1 provides a synopsis of the symbols used throughout this manuscript.
Table 1. Definitions of variables.

a. Polynomial regression





Equation (1) reveals three important features of the impact of curvature on state estimation. First, the magnitude of two different third moments (the third moments in observation space and the third moments between the observation space squared and the state space to be updated) needs to be significant for the new terms to contribute. If they vanish, an expansion truncated at these terms reduces to the EBKF. Second, the new information from the third moment does not just provide a new term that is quadratic in the observations but also provides a correction to the slope of the term linear in the observations. Finally, when the observation and the prior mean happen to be equal, such that the innovation vanishes, a correction is still made to the prior mean if the prior is skewed. This occurs because, for a skewed prior, the implied curvature of the posterior is such that its center bends away from the prior mean (e.g., Hodyss 2012).
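For readers who prefer a numerical illustration, the sketch below (a toy example with hypothetical numbers; it is not Eq. (1) itself) fits both a linear and a quadratic regression of a state variable on an observed variable drawn from a skewed joint sample and evaluates each at an observed value. The quadratic fit both changes the linear slope and adds a term quadratic in the observation, mirroring the first two points above.

import numpy as np

rng = np.random.default_rng(1)
n_ens = 10_000

# A skewed joint prior: the state x is a curved function of a latent variable z
z = rng.gamma(2.0, 1.0, n_ens)
x = z + 0.3 * z**2 + rng.normal(0.0, 0.2, n_ens)    # state variable to be updated
y = z + rng.normal(0.0, 0.5, n_ens)                 # observed variable (prior plus obs error)

# Linear regression of x on y (the EBKF-like estimate) vs quadratic regression
# of x on (y, y^2) (the polynomial-filter-like estimate)
A_lin = np.column_stack([np.ones(n_ens), y])
A_quad = np.column_stack([np.ones(n_ens), y, y**2])
c_lin, *_ = np.linalg.lstsq(A_lin, x, rcond=None)
c_quad, *_ = np.linalg.lstsq(A_quad, x, rcond=None)

y_obs = 4.0                                         # a hypothetical observed value
print("linear estimate   :", c_lin @ np.array([1.0, y_obs]))
print("quadratic estimate:", c_quad @ np.array([1.0, y_obs, y_obs**2]))

Note that the linear coefficient of the quadratic fit generally differs from that of the purely linear fit, which is the slope correction referred to above.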











The reader may readily verify, using the known formula for the inverse of a 2 × 2 matrix, that the first row in (8) is the same as (6) and therefore consistent with (1). The key feature of the gain in (8) is that the calculation of third and fourth moments has been rendered identical to the calculation of a covariance. More specifically, the calculation of the third moment is simply the “covariance” between the jth state variable and the ith “squared” observation. Thinking of the calculation of the moments higher than the second as simply the covariance between various powers of perturbations allows for the convenient calculation of these higher moments using the algorithmic tools of the EBKF.
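A minimal sketch of this bookkeeping is given below (illustrative only; the helper sample_cov and the choice to square the anomaly rather than the raw observation are ours). The point is simply that the same covariance routine, applied to squared perturbations, returns the required third- and fourth-moment terms.

import numpy as np

def sample_cov(a, b):
    # ordinary sample covariance between two ensembles of scalars
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

rng = np.random.default_rng(0)
n_ens = 5000
x = rng.gamma(2.0, 1.0, n_ens)          # the jth (skewed) state variable
y = x + rng.normal(0.0, 0.5, n_ens)     # the ith observed quantity

second = sample_cov(x, y)               # the usual second moment
y_sq = (y - y.mean())**2                # the "squared" observation perturbation
third = sample_cov(x, y_sq)             # third moment: covariance between state and squared obs
fourth = sample_cov(y_sq, y_sq)         # fourth-moment term: "variance" of the squared obs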
Further, because the structure of (5) with gain (8) is precisely of the form of the EBKF, we know that we may serially process the observation and the new pseudo-observation. To do this, we associate the new pseudo-observation with a portion of the new pseudoinnovation, which is the second element in (7). In the pseudoinnovation, we associate
The reader may have noticed that our decomposition of the pseudo-observation as
b. Serial observation processing within DART
1) Linear regression
We use the ensemble adjustment Kalman filter (EAKF; Anderson 2001) to illustrate this section. The modifications required to implement a stochastic EBKF (Burgers et al. 1998; Evensen 2003) are illustrated later. The state vector will have








2) Quadratic polynomial regression
We emphasize here that we will make the approximation of Hodyss (2012) and neglect the quadratic cross terms between observations. This approximation reduces the computational cost and memory requirements considerably at the expense of discarding information about higher-order relationships between variables.
To convert the linear regression algorithm of the previous section to quadratic polynomial regression, one needs to perform the two modifications discussed in section 2a: 1) augment the innovation with the new pseudo-observations and 2) augment the state vector with pseudosquared states in order to produce a covariance matrix of the form (9). We discuss the form of these modifications next.
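As a concrete illustration of the second modification, the sketch below (our own illustrative bookkeeping, not the DART implementation) appends squared anomalies to a toy prior ensemble; the sample covariance of the augmented ensemble then carries the ordinary covariances, the third moments, and the fourth-moment terms in the block structure of (9). The first modification, augmenting the innovation with the pseudo-observations, follows the definitions of section 2a and is not repeated here.

import numpy as np

def augment(ens):
    # append squared anomalies of each state variable to the ensemble
    anom = ens - ens.mean(axis=1, keepdims=True)
    return np.vstack([ens, anom**2])

rng = np.random.default_rng(2)
n_state, n_ens = 3, 2000
ens = rng.gamma(2.0, 1.0, (n_state, n_ens))      # a skewed toy prior ensemble

aug = augment(ens)                               # shape (2 * n_state, n_ens)
P_aug = np.cov(aug)                              # 6 x 6 augmented covariance matrix
# Upper-left block: ordinary covariances. Off-diagonal blocks: third moments
# (covariances between states and squared anomalies). Lower-right block:
# fourth-moment terms (covariances between squared anomalies).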















3) Issues with ensemble generation
We will begin this section by discussing issues that we had with developing the quadratic polynomial version of the stochastic EBKF [Burgers et al. (1998); Evensen (2003); sometimes erroneously referred to as “perturbed” observations; see Hodyss (2011), Hodyss and Campbell (2013), and Posselt et al. (2014) for further discussion] and then follow this up with a brief discussion of the differences between stochastic and deterministic ensemble generation methods.
It is important to realize that when the observation likelihood is Gaussian, the likelihood for the pseudo-observations is not. Hence, when generating an ensemble using the framework of the stochastic EBKF, it is technically incorrect to use Gaussian noise for the pseudo-observations. Nevertheless, the present version of DART associates the pseudo-observations with Gaussian noise because of the computational complexity of using the correct noise. The main goal of this section is to illustrate and explain this issue and how we chose to tackle it.
This issue is identical to the corresponding issue in the traditional stochastic EBKF when applied to skewed observation likelihoods [see Hodyss (2011, 2012) for further discussion]. When the observation likelihood is skewed, one must draw the noise for the observations from the correct skewed distribution and must also change the form of the update equation. While the change to the update equation is minor, the change in the way one draws the noise for the quadratic polynomial filter is quite complicated.
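A quick numerical check makes the point (a toy calculation with assumed numbers, independent of the example constructed below): if an observation carries Gaussian errors, the implied errors of its square are strongly skewed.

import numpy as np

def skew(a):
    d = a - a.mean()
    return np.mean(d**3) / np.mean(d**2)**1.5

rng = np.random.default_rng(3)
sigma, x_true = 1.0, 2.0
eps = rng.normal(0.0, sigma, 100_000)       # Gaussian observation errors

y = x_true + eps                            # ordinary observations
pseudo_err = y**2 - x_true**2               # implied error of the squared pseudo-observation,
                                            # equal to 2 * x_true * eps + eps**2
print("skewness of observation errors       :", skew(eps))         # near zero
print("skewness of pseudo-observation errors:", skew(pseudo_err))  # clearly nonzero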
To illustrate the issue, we construct a simple scalar non-Gaussian example and compare the correct ensemble generated by a stochastic ensemble generation method and that from the approximation we use in DART. We assume the prior is
As discussed in several places recently (see Hodyss 2011; Hodyss and Campbell 2013; Posselt et al. 2014; Hodyss et al. 2016), the intended result of the stochastic EBKF is not the true posterior moments but in fact their expected values with respect to the marginal distribution of observations. The expected posterior moments of a state estimation method may be generated by calculating the state estimate for each observation in the set of trials, subtracting this state estimate from the truth, raising this error to the power of the moment to be estimated, and finally averaging these quantities over all trials. These true expected posterior moment values for the quadratic polynomial filter are reported in Table 2.
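The procedure just described can be summarized in a few lines (a schematic sketch; estimator stands for any of the state estimation methods discussed here, and the names are ours):

import numpy as np

def expected_posterior_moments(estimator, truth, obs_draws, powers=(1, 2, 3)):
    # obs_draws: observations drawn from their marginal distribution (the trials).
    # For each trial form the state estimate, take truth minus estimate, raise the
    # error to the requested power, and average over all trials.
    errors = np.array([truth - estimator(y) for y in obs_draws])
    return {p: np.mean(errors**p) for p in powers}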
Table 2. Expected posterior moments of different quadratic polynomial ensemble generation methods.













Fig. 1. The true expected posterior PDF from a stochastic algorithm is shown in blue. The resulting posterior PDF from the stochastic DART algorithm is shown in red. The stochastic QF algorithm described in (30) produces a posterior PDF identical to the blue curve.













3. Understanding when quadratic polynomial filtering is useful
Quadratic polynomial filtering requires the estimation of the third and fourth moments of the prior, which are more difficult to estimate at small ensemble sizes than the second moments required by the EBKF. On the other hand, when non-Gaussianity is large, the resulting curvature of the posterior mean will be missed by the EBKF. Hence, we anticipate a competition between the size of the sampling errors from small ensemble sizes and the effects of non-Gaussianity, and its implied curvature, on the relative quality of EBKF and quadratic state estimates.
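The sampling-error side of this competition is easy to demonstrate with a toy Monte Carlo calculation (illustrative only; the gamma parameters and ensemble sizes are arbitrary choices): the relative sampling error of a sample third moment is consistently larger than that of a sample variance at the same ensemble size.

import numpy as np

rng = np.random.default_rng(4)
shape = 2.0                                # gamma shape; skewness = 2 / sqrt(shape)
true_var, true_third = shape, 2.0 * shape  # exact central moments of a gamma(shape, 1)

for n_ens in (10, 20, 50, 100, 1000):
    trials = rng.gamma(shape, 1.0, (5000, n_ens))
    anom = trials - trials.mean(axis=1, keepdims=True)
    var_hat = np.sum(anom**2, axis=1) / (n_ens - 1)
    third_hat = np.mean(anom**3, axis=1)   # a simple (biased) third-moment estimator
    print(n_ens,
          np.std(var_hat) / true_var,      # relative sampling error of the variance
          np.std(third_hat) / true_third)  # relative sampling error of the third moment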








The results of two sets of experiments are presented in Fig. 2. In the top panel of Fig. 2 we present results for essentially an infinite ensemble size. Because the prior is a gamma distribution, we know all of its moments without sampling error. Therefore, we evaluate

Fig. 2. Posterior MSE as a function of skewness. Results for (a) very large ensemble size and (b) for
It is interesting to see in Fig. 2 that the MSE of the EBKF is independent of skewness for infinite ensemble size. This result runs counter to the more typical philosophy that the EBKF is “worse” in non-Gaussian systems. In fact, the MSE of the EBKF for infinite ensemble size is just as good for Gaussian systems as it is for non-Gaussian systems. This result is actually already known from (16), which clearly shows that the expected squared error,

We also report in Fig. 2 the same experiments but with an ensemble size of
As the skewness increases, the quadratic polynomial filter and the EBKF become comparable around a skewness of about ½ at this particular ensemble size. Beyond a skewness of ½, the non-Gaussianity is so large that it is better to account for it using the quadratic polynomial filter than to ignore it with the EBKF. The standard particle filter is not better than the EBKF until a skewness of about ¾ and is never better than the quadratic polynomial filter over the range of skewness shown here. We believe that this result—that the standard particle filter is more strongly impacted by sampling error than the quadratic polynomial filter—is because the standard particle filter is attempting to implicitly evaluate all of the terms of the polynomial expansion in (1). By contrast, the quadratic polynomial filter has truncated the polynomial to terms that are more accurately evaluated at these small ensemble sizes.






4. Example test problems
The goals of this section are twofold. On the one hand, we would like to describe the differences we see between the EBKF linear regression schemes available in DART and the new quadratic polynomial method. On the other hand, we would like to provide a set of test cases that potential new users of this new DART version can use for testing and validation of their setup. To these ends, we now illustrate a collection of experiments in a simple nonlinear model as well as in a simple general circulation model (GCM) that has been used previously in other studies using DART (e.g., Anderson et al. 2005).
a. Lorenz-63
The Lorenz-63 (Lorenz 1963) system of equations is a standard test case for nonlinear data assimilation and is a standard component of the DART download package. The strong non-Gaussianity in this model arises because its attractor is extremely curved. Hodyss (2011) argued that curvature of the attractor imposes curvature on the posterior mean, which is associated with large posterior third moments, and therefore this system poses a challenging data assimilation problem for the EBKF. These extremely curved posterior distributions in Lorenz-63 allow us to illustrate the full potential of the quadratic polynomial filter.
We will use four different data assimilation methods: 1) the EBKF with ensemble adjustment ensemble generation (EAKF), 2) the EBKF with stochastic ensemble generation (EnKF), 3) the quadratic polynomial filter with ensemble adjustment ensemble generation (EAQF), and 4) the quadratic polynomial filter with stochastic ensemble generation (EnQF). These experiments will use observations of the x and z variables with an observation error variance of 0.1. We cycle for 10 000 cycles with observations 0.12 time units apart (12 time steps of the model). We will perform experiments with ensemble sizes of 5, 10, 20, 50, 100, and 1000 members. For all methods we used the adaptive inflation technique of Anderson (2009). For both versions of the quadratic polynomial filter we tuned the third-moment damping [see (43)] to deliver the minimum posterior RMSE.
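For readers setting up this test case outside of DART, a minimal sketch of the forecast model is given below (standard Lorenz-63 parameter values and a 0.01 time step are assumed, and a fourth-order Runge-Kutta scheme is used for illustration; the DART model_mod may integrate the equations differently):

import numpy as np

def lorenz63_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # classical Lorenz (1963) equations with the standard parameter values
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(state, dt=0.01):
    # one fourth-order Runge-Kutta step; 12 steps span the 0.12-unit window
    k1 = lorenz63_rhs(state)
    k2 = lorenz63_rhs(state + 0.5 * dt * k1)
    k3 = lorenz63_rhs(state + 0.5 * dt * k2)
    k4 = lorenz63_rhs(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def forecast(ens, n_steps=12, dt=0.01):
    # advance every ensemble member through one assimilation window
    ens = ens.copy()
    for _ in range(n_steps):
        for j in range(ens.shape[1]):
            ens[:, j] = rk4_step(ens[:, j], dt)
    return ens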
The results of these experiments in terms of the posterior RMSE of the z variable are shown in Fig. 3. The RMSEs of the x and y variables follow the same pattern as the z variable and are therefore not shown. There appear to be two regimes in Fig. 3. For ensemble sizes less than about 50 members the method with the smallest RMSE is the EAQF. For ensemble sizes greater than or equal to 50 the method with the smallest RMSE is the EnQF. This same two-regime pattern of behavior is also apparent between the two EBKF methods. Therefore, for all ensemble sizes, polynomial filtering is superior to the EBKF in this system. We believe this result is explained by the fact that the average absolute value of the prior skewness found in these experiments ranges from approximately 1 to 5. The results of this section are therefore consistent with section 3 and Fig. 2, which showed that the benefit from polynomial filtering is large when the skewness is larger than approximately 1.

Fig. 3. Posterior RMSEs of the z variable as a function of ensemble size for four data assimilation methods. Experiments that resulted in an RMSE greater than the observational uncertainty (
The quality of the shape of the posterior ensemble distributions can also be tested. We used binned spread–skill diagrams and the continuous ranked probability score (CRPS) to examine the quality of the ensemble distributions. We found that when the adaptive inflation of Anderson (2009) was properly tuned, all four methods had approximately the same binned spread–skill score in terms of the slope of the resulting plot (not shown). Additionally, the CRPS simply ranked the four methods identically to their rankings in Fig. 3, which reveals that the quality of the posterior mean dominates the CRPS value.
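For completeness, the sample-based CRPS estimator used conceptually here is sketched below (DART supplies its own diagnostics; this standalone version, with our own function name, is included only so that readers can reproduce the score for a scalar verification):

import numpy as np

def ensemble_crps(members, verification):
    # CRPS of the empirical ensemble distribution against a scalar verification:
    # mean |x_i - y|  minus  0.5 * mean |x_i - x_j|
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - verification))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2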
b. BGRID GCM
The model used in this section is a coarse-resolution, dry, hydrostatic general circulation model (GCM) without topography (Anderson et al. 2004). The model uses the temperature forcing of Held and Suarez (1994) to maintain a time-evolving, baroclinically unstable Rossby wave field at midlatitudes and Rayleigh damping of winds near the surface to represent friction. The model domain consists of five vertical levels with 30 grid points in latitude and 60 grid points in longitude, which is approximately 6° resolution. This model is available as a standard option in the DART download package.
A variety of experiments were performed to determine the degree of nonlinearity/non-Gaussianity in this model. It was found that at this resolution, and without physical processes related to moisture, the error growth in this model is substantially slower than in typical, full-physics atmospheric models. By trying a variety of cycling intervals and observational densities, we determined that a 10-day interval between observations was required to overcome this very slow error growth and develop the small degree of nonlinear/non-Gaussian behavior illustrated below. Additionally, we found that for all ensemble sizes tried here the EAKF and EAQF are substantially superior to the EnKF and EnQF. While we find this result, that deterministic methods are better than the stochastic methods in BGRID whereas the reverse holds in Lorenz-63, to be very interesting, we do not pursue it here because a detailed analysis of the differences is outside the scope of the present work. For this reason, we limit the presentation to the EAKF and EAQF.
The observational network consists of 100 randomly distributed surface pressure observations with an observational error variance of 1 hPa². The simulation is run for 50 yr while assimilating observations from this network every 10 days, which yields 1825 analysis times. Similar to the Lorenz-63 experiments above, we employ the adaptive inflation technique of Anderson (2009), but here we also include both vertical and horizontal localization, which is tuned for minimum posterior RMSE. The RMSEs of these analyses are measured in the four prognostic variables of the model (temperature, zonal and meridional winds, and surface pressure) and are calculated by discarding the first 100 analysis times and computing the RMSE from the remainder.
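The localization referred to here uses DART's standard compactly supported taper, which we take to be the Gaspari and Cohn (1999) fifth-order piecewise-rational function; a standalone sketch is given below for reference (this is our own version, not the DART routine, and accepts distances and half-widths in whatever units the user chooses):

import numpy as np

def gaspari_cohn(dist, half_width):
    # Gaspari and Cohn (1999) fifth-order taper; equals 1 at zero separation
    # and falls to 0 at twice the half-width. Returns an array.
    r = np.atleast_1d(np.abs(dist) / half_width).astype(float)
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - 5.0 / 3.0 * ri**2 + 1.0)
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + 5.0 / 3.0 * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper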
We performed data assimilation with ensembles of 10, 50, and 100 members. For these experiments we calculated the average absolute value of the prior skewness and found that for all variables it was approximately 0.1–0.5, an order of magnitude less than what was found in the Lorenz-63 experiments. Checking Fig. 2 at this level of skewness reveals that there is likely little to be gained from quadratic polynomial filtering, and hence we do not expect large improvements in this model setting.
The results in terms of the difference in posterior RMSEs between the EAKF and the EAQF as a function of different ensemble sizes and vertical model levels are shown in Fig. 4 and largely confirm our hypothesis that prior skewness values less than 1 lead to small improvements from quadratic polynomial filtering. The largest impact from the EAQF appears in the surface pressure, which is likely due to the observations being of that variable. The information from surface pressure observations must then be transmitted upward and to the other variables using the covariance structure as well as the higher-order comoments in the EAQF. In temperature and winds the EAQF generally provides a small improvement at all levels except for winds at the top of the model, which is inside a strong gravity-wave-damping region.

Fig. 4. Posterior RMSE differences from the BGRID experiments. The difference between the posterior RMSE from the EAKF and the EAQF is shown in all panels. Positive values denote that the EAQF had smaller RMSE. Results for (a) temperature (K); (b) winds (m s−1), where solid is the zonal wind and dashed is the meridional wind; and (c) surface pressure (hPa). In (a) and (b) blue curves correspond to experiments with 10 members, red curves are for 50 members, and green curves are for 100 members.
While it is tempting to anticipate a greater benefit from larger ensemble sizes in Fig. 4, this line of thinking is confounded by the generally larger errors at small ensemble sizes; indeed, the posterior RMSEs for both the EAKF and the EAQF (not shown) are substantially larger for an ensemble size of 10 than for 100. Therefore, the non-Gaussianity is actually larger for small ensemble sizes, and because the EAQF responds to non-Gaussianity, the EAKF-EAQF difference as a function of ensemble size does not follow a clear pattern. Finally, similar to the results from the Lorenz-63 experiments, binned spread–skill and CRPS were used to examine the quality of the ensemble distributions, and no notable differences were found when the adaptive inflation of Anderson (2009) was properly tuned.
5. Summary and conclusions
We have shown how to perform quadratic polynomial regression entirely within the standard framework of an ensemble-based Kalman filter (EBKF) that employs serial observation processing. The key principles are to rewrite multivariate, polynomial regression in a way that is homomorphic to the EBKF and to note that high-order, multivariate moments can be thought of as the covariance between the state and a power of a state variable. Applying these principles allows high-order, multivariate regression to follow the original flow of the EBKF and makes modifying an already constructed EBKF relatively straightforward.
We have shown that the benefit from quadratic polynomial filtering is governed by a competition between the ensemble size and the level of prior skewness in the physical system being simulated. As a general rule, with ensemble sizes less than 100 members, we suggest using quadratic polynomial filtering for systems with an average absolute value of the prior skewness greater than about 1. If the user can accept the computational expense of ensemble sizes greater than 100, then quadratic polynomial filtering could be beneficial for skewness values less than 1. In this case, however, the benefit will be small: Fig. 2 shows that even with an infinite ensemble a skewness of approximately 1.5 is required for a 10% improvement. Therefore, we suggest a careful examination of the prior skewness in the system to be studied before application of any nonlinear data assimilation method.
Third-moment damping was found to be essential for getting quadratic polynomial filtering to outperform the EBKF at ensemble sizes substantially less than 100. In the experiments in this manuscript the same α value was used for all variables in each experiment. It is likely, however, that performance gains would be realized by, for example, making the α value a function of field variable, latitude, and altitude in BGRID. Because of the success of third-moment damping, work in this direction is under way.
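Schematically, the damping can be thought of as scaling the sample third-moment terms by α before they enter the augmented gain; the one-line sketch below is purely illustrative and heavily simplified (the precise form of the damping is given by (43) and is not reproduced here).

def damp_third_moments(third_moments, alpha):
    # schematic only: alpha = 1 leaves the sample third moments untouched, while
    # alpha = 0 removes the quadratic contribution and (per the discussion
    # following Eq. (1)) recovers EBKF-like behavior
    return alpha * third_moments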
Quadratic polynomial filtering is now an option within the Data Assimilation Research Testbed (DART) of the Data Assimilation Research Section (DARES) of the National Center for Atmospheric Research. Those interested in making use of this facility should contact the DARES team (http://www.image.ucar.edu/DAReS/DART/) for the latest version of the code.
Acknowledgments. DH would like to thank Craig Bishop and Chris Snyder for useful conversations during this work. DH gratefully acknowledges support from the Office of Naval Research (Grant 4871-0-6-5).
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642, doi:10.1175/1520-0493(2003)131<0634:ALLSFF>2.0.CO;2.
Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83, doi:10.1111/j.1600-0870.2008.00361.x.
Anderson, J. L., 2010: A non-Gaussian ensemble filter update for data assimilation. Mon. Wea. Rev., 138, 4186–4198, doi:10.1175/2010MWR3253.1.
Anderson, J. L., and N. Collins, 2007: Scalable implementations of ensemble filter algorithms for data assimilation. J. Atmos. Oceanic Technol., 24, 1452–1463, doi:10.1175/JTECH2049.1.
Anderson, J. L., and Coauthors, 2004: The new GFDL global atmosphere and land model AM2–LM2: Evaluation with prescribed SST simulations. J. Climate, 17, 4641–4673, doi:10.1175/JCLI-3223.1.
Anderson, J. L., B. Wyman, S. Zhang, and T. Hoar, 2005: Assimilation of surface pressure observations using an ensemble filter in an idealized global atmospheric prediction system. J. Atmos. Sci., 62, 2925–2938, doi:10.1175/JAS3510.1.
Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Avellano, 2009: The Data Assimilation Research Testbed: A community facility. Bull. Amer. Meteor. Soc., 90, 1283–1296, doi:10.1175/2009BAMS2618.1.
Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724, doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.
Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367, doi:10.1007/s10236-003-0036-9.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, doi:10.1002/qj.49712555417.
Held, I. M., and M. Suarez, 1994: A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models. Bull. Amer. Meteor. Soc., 75, 1825–1830, doi:10.1175/1520-0477(1994)075<1825:APFTIO>2.0.CO;2.
Hodyss, D., 2011: Ensemble state estimation for nonlinear systems using polynomial expansions in the innovation. Mon. Wea. Rev., 139, 3571–3588, doi:10.1175/2011MWR3558.1.
Hodyss, D., 2012: Accounting for skewness in ensemble data assimilation. Mon. Wea. Rev., 140, 2346–2358, doi:10.1175/MWR-D-11-00198.1.
Hodyss, D., and W. F. Campbell, 2013: Square root and perturbed observation ensemble generation techniques in Kalman and quadratic ensemble filtering algorithms. Mon. Wea. Rev., 141, 2561–2573, doi:10.1175/MWR-D-12-00117.1.
Hodyss, D., W. F. Campbell, and J. S. Whitaker, 2016: Observation-dependent posterior inflation for the ensemble Kalman filter. Mon. Wea. Rev., 144, 2667–2684, doi:10.1175/MWR-D-15-0329.1.
Houtekamer, P., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
Lawson, W., and J. Hansen, 2004: Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth. Mon. Wea. Rev., 132, 1966–1981, doi:10.1175/1520-0493(2004)132<1966:IOSADF>2.0.CO;2.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, doi:10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
Posselt, D. J., D. Hodyss, and C. Bishop, 2014: Errors in ensemble Kalman smoother estimates of cloud microphysical parameters. Mon. Wea. Rev., 142, 1631–1654, doi:10.1175/MWR-D-13-00290.1.
Snyder, C., and F. Zhang, 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 1663–1677, doi:10.1175//2555.1.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.