  • Alspach, D. L., and H. W. Sorenson, 1972: Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Trans. Autom. Control, 17, 438–448.

  • Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758.

  • Auclair, F., P. Marsaleix, and P. D. Mey, 2003: Space-time structure and dynamics of the forecast error in a coastal circulation model of the Gulf of Lions. Dyn. Atmos. Oceans, 36, 309–346.

  • Bengtsson, T., C. Snyder, and D. Nychka, 2003: Toward a nonlinear ensemble filter for high-dimensional systems. J. Geophys. Res., 108, 8775, doi:10.1029/2002JD002900.

  • Bennett, A., 1992: Inverse Methods in Physical Oceanography. Cambridge University Press, 346 pp.

  • Bennett, A., 2002: Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press, 234 pp.

  • Bertsekas, D. P., and J. N. Tsitsiklis, 2008: Introduction to Probability. 2nd ed. Athena Scientific, 544 pp.

  • Bishop, C. M., 2006: Pattern Recognition and Machine Learning. Springer, 738 pp.

  • Bocquet, M., C. A. Pires, and L. Wu, 2010: Beyond Gaussian statistical modeling in geophysical data assimilation. Mon. Wea. Rev., 138, 2997–3023.

  • Casella, G., and R. L. Berger, 2001: Statistical Inference. 2nd ed. Duxbury, 660 pp.

  • Chen, R., and J. S. Liu, 2000: Mixture Kalman filters. J. Roy. Stat. Soc., 62B, 493–508.

  • Cover, T. M., and J. A. Thomas, 2006: Elements of Information Theory. Wiley-Interscience, 748 pp.

  • Commission on Physical Sciences, Mathematics, and Applications, 1993: Statistics and Physical Oceanography. The National Academies Press, 62 pp.

  • Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

  • Dee, D. P., and A. M. D. Silva, 2003: The choice of variable for atmospheric moisture analysis. Mon. Wea. Rev., 131, 155–171.

  • Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc., 39B, 1–38.

  • Dimet, F. X. L., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations. Tellus, 38A, 97–110.

  • Doucet, A., N. de Freitas, and N. Gordon, 2001: Sequential Monte-Carlo Methods in Practice. Springer-Verlag, 612 pp.

  • Dovera, L., and E. D. Rossa, 2011: Multimodal ensemble Kalman filtering using Gaussian mixture models. Comput. Geosci., 15, 307–323.

  • Duda, R. O., P. E. Hart, and D. G. Stork, 2001: Pattern Classification. 2nd ed. Wiley-Interscience, 654 pp.

  • Eisenberger, I., 1964: Genesis of bimodal distributions. Technometrics, 6, 357–363.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte-Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.

  • Evensen, G., 2007: Data Assimilation: The Ensemble Kalman Filter. Springer, 279 pp.

  • Eyink, G. L., and S. Kim, 2006: A maximum entropy method for particle filtering. J. Stat. Phys., 123, 1071–1128.

  • Frei, M., and H. R. Künsch, 2013: Mixture ensemble Kalman filters. Comput. Stat. Data Anal., 58, 127–138.

  • Gelb, A., 1974: Applied Optimal Estimation. MIT Press, 374 pp.

  • Ghanem, R., and P. Spanos, 1991: Stochastic Finite Elements: A Spectral Approach. Springer-Verlag, 214 pp.

  • Ghil, M., and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography. Advances in Geophysics, Vol. 33, Academic Press, 141–266.

  • Holmes, P., J. Lumley, and G. Berkooz, 1996: Turbulence, Coherent Structures, Dynamical Systems, and Symmetry. Cambridge University Press, 420 pp.

  • Hoteit, I., D. T. Pham, G. Triantafyllou, and G. Korres, 2008: A new approximate solution of the optimal nonlinear filter for data assimilation in meteorology and oceanography. Mon. Wea. Rev., 136, 317–334.

  • Houtekamer, P. L., H. L. Mitchell, and L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.

  • Ide, K., P. Courtier, M. Ghil, and A. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential and variational. J. Meteor. Soc. Japan, 75, 181–189.

  • Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.

  • Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. Trans. ASME, 82D, 35–45.

  • Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, 341 pp.

  • Kim, S., G. L. Eyink, J. M. Restrepo, F. J. Alexander, and G. Johnson, 2009: Ensemble filtering for nonlinear dynamics. Mon. Wea. Rev., 131, 2586–2594.

  • Kotecha, J. H., and P. A. Djuric, 2003: Gaussian particle filtering. IEEE Trans. Signal Process., 51, 2592–2601.

  • Krause, P., and J. M. Restrepo, 2009: The diffusion kernel filter applied to Lagrangian data assimilation. Mon. Wea. Rev., 137, 4386–4400.

  • Lermusiaux, P. F. J., 1997: Data assimilation via error subspace statistical estimation. Ph.D. thesis, Division of Engineering and Applied Sciences, Harvard University, 402 pp.

  • Lermusiaux, P. F. J., 1999a: Data assimilation via error subspace statistical estimation. Part II: Middle Atlantic Bight shelfbreak front simulations. Mon. Wea. Rev., 127, 1408–1432.

  • Lermusiaux, P. F. J., 1999b: Estimation and study of mesoscale variability in the Strait of Sicily. Dyn. Atmos. Oceans, 29, 255–303.

  • Lermusiaux, P. F. J., 2001: Evolving the subspace of the three-dimensional multiscale ocean variability: Massachusetts Bay. J. Mar. Syst., 29, 385–422.

  • Lermusiaux, P. F. J., 2006: Uncertainty estimation and prediction for interdisciplinary ocean dynamics. J. Comput. Phys., 217, 176–199, doi:10.1016/j.jcp.2006.02.010.

  • Lermusiaux, P. F. J., 2007: Adaptive modeling, adaptive data assimilation and adaptive sampling. Physica D, 230, 172–196.

  • Lermusiaux, P. F. J., and A. Robinson, 1999: Data assimilation via error subspace statistical estimation. Part I: Theory and scheme. Mon. Wea. Rev., 127, 1385–1407.

  • Lermusiaux, P. F. J., C.-S. Chiu, and A. R. Robinson, 2002a: Modeling uncertainties in the prediction of the acoustic wavefield in a shelfbreak environment. Theoretical and Computational Acoustics, E.-C. Shang, Q. Li, and T. Gao, Eds., World Scientific Publishing Co., 191–200.

  • Lermusiaux, P. F. J., A. R. Robinson, P. J. Haley, and W. G. Leslie, 2002b: Advanced interdisciplinary data assimilation: Filtering and smoothing via error subspace statistical estimation. Proc. OCEANS 2002 MTS/IEEE Conf., Biloxi, MS, IEEE, 795–802.

  • Lermusiaux, P. F. J., and Coauthors, 2006: Quantifying uncertainties in ocean predictions. Oceanography, 19, 92–105.

  • Lions, J. L., 1971: Optimal Control of Systems Governed by Partial Differential Equations. Springer-Verlag, 396 pp.

  • Lorenz, E., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141.

  • MacKay, D. J. C., 2003: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 628 pp.

  • Malanotte-Rizzoli, P., 1996: Modern Approaches to Data Assimilation in Ocean Modeling. Elsevier, 455 pp.

  • McLachlan, G., and D. Peel, 2000: Finite Mixture Models. John Wiley & Sons, Inc., 419 pp.

  • Miller, R. N., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. J. Atmos. Sci., 51, 1037–1056.

  • Moore, A. M., H. G. Arango, E. D. Lorenzo, B. D. Cornuelle, A. J. Miller, and D. J. Neilson, 2004: A comprehensive ocean prediction and analysis system based on the tangent linear and adjoint of a regional ocean model. Ocean Modell., 7, 227–258, doi:10.1016/j.ocemod.2003.11.001.

  • Papoulis, A., 1965: Probability, Random Variables and Stochastic Processes. McGraw-Hill, 583 pp.

  • Pham, D. T., 2001: Stochastic methods for sequential data assimilation in strongly nonlinear systems. Mon. Wea. Rev., 129, 1194–1207.

  • Robinson, A. R., P. F. J. Lermusiaux, and N. Q. Sloan, 1998: Data assimilation. The Sea, K. H. Brink and A. R. Robinson, Eds., The Global Coastal Ocean: Processes and Methods, Vol. 10, John Wiley and Sons, 541–594.

  • Sapsis, T., 2010: Dynamically orthogonal field equations. Ph.D. thesis, Department of Mechanical Engineering, Massachusetts Institute of Technology, 188 pp.

  • Sapsis, T., and P. F. J. Lermusiaux, 2009: Dynamically orthogonal field equations for continuous stochastic dynamical systems. Physica D, 238, 2347–2360, doi:10.1016/j.physd.2009.09.017.

  • Sapsis, T., and P. F. J. Lermusiaux, 2011: Dynamical criteria for the evolution of the stochastic dimensionality in flows with uncertainty. Physica D, 241, 60–76, doi:10.1016/j.physd.2011.10.001.

  • Schwartz, G. E., 1978: Estimating the dimension of a model. Ann. Stat., 6, 461–464.

  • Silverman, B., 1992: Density Estimation for Statistics and Data Analysis. Chapman & Hall, 175 pp.

  • Smith, K. W., 2007: Cluster ensemble Kalman filter. Tellus, 59A, 749–757.

  • Sobczyk, K., 2001: Information dynamics: Premises, challenges and results. Mech. Syst. Signal Process., 15, 475–498.

  • Sondergaard, T., 2011: Data assimilation with Gaussian mixture models using the dynamically orthogonal field equations. M.S. thesis, Department of Mechanical Engineering, Massachusetts Institute of Technology, 180 pp.

  • Sondergaard, T., and P. F. J. Lermusiaux, 2013: Data assimilation with Gaussian Mixture Models using the Dynamically Orthogonal field equations. Part II: Applications. Mon. Wea. Rev., 141, 1761–1785.

  • Sura, P., 2010: On non-Gaussian SST variability in the Gulf Stream and other strong currents. Ocean Dyn., 60, 155–170.

  • Tarantola, A., 2005: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 342 pp.

  • Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.

  • Ueckermann, M. P., P. F. J. Lermusiaux, and T. P. Sapsis, 2013: Numerical schemes for dynamically orthogonal equations of stochastic fluid and ocean flows. J. Comput. Phys., 233, 272–294, doi:10.1016/j.jcp.2012.08.041.

  • van Leeuwen, P. J., 2009: Particle filtering in geophysical systems. Mon. Wea. Rev., 137, 4089–4114.

  • Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.

  • Wunsch, C., 1996: The Ocean Circulation Inverse Problem. Cambridge University Press, 442 pp.



Data Assimilation with Gaussian Mixture Models Using the Dynamically Orthogonal Field Equations. Part I: Theory and Scheme

  • 1 Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts

Abstract

This work introduces and derives an efficient, data-driven assimilation scheme, focused on a time-dependent stochastic subspace that respects nonlinear dynamics and captures non-Gaussian statistics as they occur. The motivation is to obtain a filter that is applicable to realistic geophysical applications, but that also rigorously utilizes the governing dynamical equations with information theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of classical filters, the underlying theory and algorithmic implementation of the new filter are developed and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric Gaussian Mixture Models (GMMs) using the Expectation-Maximization algorithm and the Bayesian Information Criterion. Bayes’s law is then efficiently carried out analytically within the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example. Variations of the GMM-DO filter are also provided along with comparisons with related schemes.

Corresponding author address: Pierre F. J. Lermusiaux, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139. E-mail: pierrel@mit.edu


1. Introduction

Data assimilation (DA) is the process of quantitatively estimating dynamically evolving fields by melding information from observations with that predicted by computational models. Data assimilation has a long and interesting history; thorough expositions include Daley (1991), Ghil and Malanotte-Rizzoli (1991), Bennett (1992, 2002), Wunsch (1996), Malanotte-Rizzoli (1996), Robinson et al. (1998), Kalnay (2003), and Evensen (2007). Most schemes are derived from estimation theory (Jazwinski 1970; Gelb 1974), information theory (Sobczyk 2001; Cover and Thomas 2006), control theory (Lions 1971; Dimet and Talagrand 1986), and optimization and inverse problem theory (Tarantola 2005). While DA has traditionally been grounded in linear theory and the Gaussian approximation (Kalman 1960), recent years have seen the emergence of advanced DA schemes that attempt to shed these limitations. One research thrust has been the development of efficient methods that respect nonlinear dynamics and capture non-Gaussian features. Most such methods, however, are either difficult to apply to large realistic systems or still rely on ad hoc approximations. Our motivation here is to allow for realistic geophysical applications while rigorously utilizing the governing dynamical equations with information theory and learning theory for efficient Bayesian inference.

It is well known that geophysical dynamics can be very nonlinear and intermittent. The importance of accounting for nonlinearities in DA has also been known for some time (e.g., Miller et al. 1994). Nonlinearities affect not only prediction but also the melding of measured and predicted information. As a result, oceanic and atmospheric fields can be characterized by complex, far-from-Gaussian statistics (Commission on Physical Sciences, Mathematics, and Applications 1993; Lermusiaux et al. 2002a; Auclair et al. 2003; Dee and Silva 2003; Lermusiaux et al. 2006; Sura 2010). With the introduction of the ensemble Kalman filter (Evensen 1994; Houtekamer et al. 1998), error subspace schemes (Lermusiaux and Robinson 1999), and square root filters (Whitaker and Hamill 2002; Tippett et al. 2003) came the adoption of Monte Carlo methods (Doucet et al. 2001) within the DA community. In addition to utilizing the inherent nonlinearities of the governing equations, Monte Carlo methods allow exploration and exploitation of probabilistic structures beyond the simple Gaussian melding of information. One such class of methods is particle filters (e.g., Pham 2001; van Leeuwen 2009), which evolve probability density functions (pdfs) using a discrete set of model states, or particles, and a corresponding mixture of “Dirac functions.” Extensions include diffusion kernel filters (e.g., Krause and Restrepo 2009) and parametric filters (e.g., Kim et al. 2009). A related interest has been the approximation of distributions by Gaussian Mixture Models (GMMs; Bocquet et al. 2010). Examples include Alspach and Sorenson (1972), Anderson and Anderson (1999), Chen and Liu (2000), Bengtsson et al. (2003), Kotecha and Djuric (2003), Eyink and Kim (2006), Smith (2007), Hoteit et al. (2008), and Dovera and Rossa (2011), many of which will be examined later in this work.
As will be shown, GMMs provide an attractive method for approximating distributions for the purposes of Bayesian inference. When GMMs are fit to Monte Carlo data using the Expectation-Maximization algorithm (Dempster et al. 1977) and the Bayesian Information Criterion (Schwartz 1978), an accurate representation of the true pdf results, as developed in this work.

A concern with present nonlinear DA schemes is their difficulty in handling the dimensionality of state vectors commonly encountered in oceanic and atmospheric applications, typically of order $n \sim 10^{6}$–$10^{10}$. Common remedies have been the adoption of various localization approximations (Bengtsson et al. 2003) and heuristic arguments (Anderson and Anderson 1999). A number of filters (e.g., Lermusiaux and Robinson 1999) have instead opted to focus on a time-dependent dominant subspace of the full state space, thereby allocating computational resources solely to the states that matter most. In a similar manner, we employ here the Dynamically Orthogonal (DO) field equations (Sapsis and Lermusiaux 2009; Sapsis 2010). The DO equations originate directly from the governing dynamical equations (i.e., the stochastic partial differential equations describing the evolution of the full geophysical system). By applying an orthogonality condition on the evolution of the stochastic subspace, the governing equations are reduced to evolution equations for (i) the mean field, (ii) the stochastic subspace, and (iii) the probabilistic variability contained within the subspace. These DO equations efficiently represent the true evolving pdf between assimilation times and effectively approximate the Fokker–Planck equation.

In Part I of this two-part paper, we develop and derive the underlying theory and algorithms of the proposed DA scheme: the GMM-DO filter. In section 2, we introduce and define the filter’s core components. The derivation of the filter with a key proof is completed in section 3. Section 4 provides a simple example illustrating the filter’s update step, while section 5 places the GMM-DO filter in the context of contemporary schemes based on related ideas. Conclusions are in section 6. In appendixes A and B, we present the EM algorithm and outline variations of the filter, respectively. In Part II of this two-part paper (Sondergaard and Lermusiaux 2013), we apply the GMM-DO filter in a dynamical systems setting. Specifically, we evaluate its performance against contemporary filters when applied to 1) the double well diffusion experiment and 2) the sudden expansion fluid flow.

2. GMM-DO filter components

In this section, we introduce the core components that we ultimately combine into the GMM-DO filter, specifically:

  • Gaussian Mixture Models,

  • Expectation-Maximization algorithm,

  • Bayesian Information Criterion, and

  • Dynamically Orthogonal field equations.

In each case, we provide definitions and briefly justify the choice of these components in the context of oceanic and atmospheric DA. As a whole, the DO equations provide prior probabilities for a semiparametric assimilation framework based on Gaussian Mixture Models, which are fit with an Expectation-Maximization algorithm and a Bayesian Information Criterion; Bayes’s law is then employed analytically and efficiently to combine the predicted and observed information. The objective is to estimate the probabilistic properties of the dynamical state of the system under study, denoted as random state vector X. For ease of notation, expositions in this section are given in the corresponding dynamical state space; in computations, however, all Bayesian updates occur within the evolving subspace (see section 3). Table 1 summarizes the notation specific to this manuscript.
Table 1.

Notation relevant to the GMM-DO filter. [While we have primarily adopted notation specific to probability theory, information theory, and estimation theory, where possible we also utilize the notation advocated by Ide et al. (1997).]


a. Gaussian Mixture Models

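The pdf for a random vector, $X \in \mathbb{R}^n$, distributed according to a multivariate GMM is given by

$$p_X(x) = \sum_{j=1}^{M} \pi_j \, \mathcal{N}(x; \bar{x}_j, P_j), \qquad (1)$$

subject to the constraint that

$$\sum_{j=1}^{M} \pi_j = 1. \qquad (2)$$

We refer to $M \in \mathbb{N}$ as the mixture complexity, $\pi_j \in [0, 1]$ as the mixture weights, $\bar{x}_j \in \mathbb{R}^n$ as the mixture mean vectors, and $P_j \in \mathbb{R}^{n \times n}$ as the mixture covariance matrices. The multivariate Gaussian density function takes the following form:

$$\mathcal{N}(x; \bar{x}, P) = \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\left[-\frac{1}{2}(x - \bar{x})^{\mathrm{T}} P^{-1} (x - \bar{x})\right]. \qquad (3)$$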

GMMs provide an attractive semiparametric framework in which to approximate unknown distributions based on a set of ensemble realizations (McLachlan and Peel 2000). They are a flexible compromise between (i) a fully parametric (Gaussian) distribution, for which M = 1, and (ii) a (Gaussian) kernel density estimator (Silverman 1992), for which M = N, with N being the number of realizations. A single parametric distribution, while justified by maximum entropy arguments (Cover and Thomas 2006), often imposes too much structure on the ensemble set and cannot model highly skewed or multimodal distributions. A kernel density estimator, on the other hand, usually requires one to retain all N realizations for the purposes of inference, which is computationally burdensome. Furthermore, because a kernel is fit to every realization, it often necessitates a heuristic choice of the kernel’s shape parameter (see section 5).
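To make (1)–(3) and the M = 1 versus M = N comparison concrete, here is a minimal NumPy sketch. The function names (`gaussian_pdf`, `gmm_pdf`, `parametric_as_gmm`, `kde_as_gmm`) and the fixed kernel covariance of 0.1 are illustrative assumptions, not the authors' implementation; the sketch only shows that both limiting cases are GMMs: a single Gaussian with the ML sample mean and covariance, and N equally weighted kernels centered on the realizations.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density N(x; mean, cov) at a single point, as in (3)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    diff = x - mean
    # Solve cov @ z = diff rather than forming cov^{-1} explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** x.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)


def gmm_pdf(x, weights, means, covs):
    """GMM density (1): sum_j pi_j N(x; xbar_j, P_j), with weights obeying (2)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must be nonnegative and sum to 1")
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


def parametric_as_gmm(samples):
    """M = 1: a single Gaussian with the ML sample mean and covariance."""
    X = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    n = X.shape[1]
    cov = np.cov(X.T, bias=True).reshape(n, n)
    return [1.0], [X.mean(axis=0)], [cov]


def kde_as_gmm(samples, kernel_cov):
    """M = N: Gaussian kernel density estimator, one equally weighted kernel per realization."""
    X = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    N = X.shape[0]
    return np.full(N, 1.0 / N), list(X), [np.atleast_2d(kernel_cov)] * N


# Usage: 20 one-dimensional realizations, both limiting GMMs evaluated on a grid.
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=20)
grid = np.linspace(-4.0, 4.0, 801)
p_param = [gmm_pdf(g, *parametric_as_gmm(samples)) for g in grid]
p_kde = [gmm_pdf(g, *kde_as_gmm(samples, 0.1)) for g in grid]
```

A GMM with 1 < M < N sits between these two extremes: it keeps only M weights, means, and covariances instead of all N realizations.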

Mixture models efficiently summarize the ensemble set by a parameter vector while retaining the ability to accurately model complex distributions (see Fig. 1). In fact, in the limit of large complexity and small covariance, a GMM converges uniformly to any sufficiently smooth distribution (Alspach and Sorenson 1972). Other mixtures and expansions have been used to approximate arbitrary probability distributions, among them the Gram–Charlier expansion, the Edgeworth expansion, and Pearson-type density functions (Alspach and Sorenson 1972). The former two, however, may fail to be valid distributions when truncated (a valid distribution must be nonnegative everywhere and integrate to 1), while the latter does not lend itself well to Bayesian inference. In contrast, GMMs (1)–(3) are valid distributions by construction.

Fig. 1.

Gaussian (parametric) distribution, Gaussian Mixture Model, and Gaussian (kernel) density estimator based on 20 samples generated from the mixture of uniform distributions: , where denotes the continuous uniform pdf for random variable X.

Citation: Monthly Weather Review 141, 6; 10.1175/MWR-D-11-00295.1

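An important property of GMMs is that they are conjugate priors for the commonly used Gaussian observation models with a linear observation operator: their Bayesian update then remains a Gaussian mixture (Casella and Berger 2001; Sondergaard 2011). Specifically, for a prior multivariate GMM,

$$p_X(x) = \sum_{j=1}^{M} \pi_j^f \, \mathcal{N}(x; \bar{x}_j^f, P_j^f), \qquad (4)$$

and a multivariate Gaussian observation model with observation operator H and observation error covariance R,

$$p_{Y|X}(y \mid x) = \mathcal{N}(y; Hx, R), \qquad (5)$$

the Bayesian update remains a multivariate GMM,

$$p_{X|Y}(x \mid y) = \sum_{j=1}^{M} \pi_j^a \, \mathcal{N}(x; \bar{x}_j^a, P_j^a), \qquad (6)$$

with posterior parameters

$$\pi_j^a = \frac{\pi_j^f \, \mathcal{N}(y; H\bar{x}_j^f, HP_j^fH^{\mathrm{T}} + R)}{\sum_{m=1}^{M} \pi_m^f \, \mathcal{N}(y; H\bar{x}_m^f, HP_m^fH^{\mathrm{T}} + R)}, \quad \bar{x}_j^a = \bar{x}_j^f + K_j(y - H\bar{x}_j^f), \quad P_j^a = (I - K_jH)P_j^f, \qquad (7)$$

where

$$K_j = P_j^f H^{\mathrm{T}} \left(HP_j^fH^{\mathrm{T}} + R\right)^{-1} \qquad (8)$$

is the Kalman gain matrix associated with mixture component j.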

Consequently, for Gaussian observation models with GMMs as priors, the usually intractable Bayesian update reduces to an update of the elements of the parameter set, $\{\pi_j, \bar{x}_j, P_j\}_{j=1}^{M}$, given by (7). Specifically, the individual mixture mean vectors and covariance matrices are updated in accordance with the familiar Kalman filter equations, the coupling occurring solely through the mixture weights.
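The update (7)–(8) can be sketched in a few lines of NumPy: each component receives an ordinary Kalman update, and only the weights couple the components. This is a minimal illustration assuming a linear observation operator `H` and observation error covariance `R` given as arrays; the function name `gmm_bayes_update` is hypothetical, and a production code would compute the weights in log space to avoid underflow.

```python
import numpy as np


def gaussian_pdf(y, mean, cov):
    """N(y; mean, cov) for a single vector y."""
    diff = np.atleast_1d(y) - np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * maha) / np.sqrt((2.0 * np.pi) ** diff.size * np.linalg.det(cov)))


def gmm_bayes_update(weights, means, covs, y, H, R):
    """Posterior GMM for a prior GMM and observation y = H x + v, with v ~ N(0, R)."""
    post_w, post_m, post_c = [], [], []
    for w, m, P in zip(weights, means, covs):
        S = H @ P @ H.T + R                           # innovation covariance H P_j H^T + R
        K = np.linalg.solve(S, H @ P).T               # Kalman gain (8); valid since P and S are symmetric
        post_m.append(m + K @ (y - H @ m))            # mean update in (7)
        post_c.append((np.eye(len(m)) - K @ H) @ P)   # covariance update in (7)
        post_w.append(w * gaussian_pdf(y, H @ m, S))  # unnormalized weight in (7)
    post_w = np.asarray(post_w)
    return post_w / post_w.sum(), post_m, post_c


# Usage: a bimodal scalar prior, observed directly with noise variance 0.5.
H = np.array([[1.0]])
R = np.array([[0.5]])
w_a, m_a, P_a = gmm_bayes_update(
    [0.5, 0.5], [np.array([-2.0]), np.array([2.0])], [np.eye(1), np.eye(1)],
    y=np.array([1.5]), H=H, R=R,
)
```

With M = 1 this reduces to a single Kalman filter update; in the usage example, the observation near 2 shifts almost all of the posterior weight onto the second component.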

Having introduced GMMs as an attractive method for approximating distributions for the purposes of Bayesian inference, we now need to estimate their optimal parameter values,

$$\theta = \{\pi_1, \ldots, \pi_M, \bar{x}_1, \ldots, \bar{x}_M, P_1, \ldots, P_M\},$$

based on a set of N ensemble realizations, {x} = {x_1, …, x_N}. Here, we seek the parameter values that maximize the probability of obtaining the given realizations: the maximum likelihood (ML) estimators. For this, we make use of the Expectation-Maximization (EM) algorithm.

b. The Expectation-Maximization algorithm

The EM algorithm is an iterative procedure for estimating the parameters θ_i of a target distribution that maximize the probability of obtaining a given set of realizations, {x} = {x_1, …, x_N}. While the resulting ML estimators can be justified on intuitive grounds alone, they are also consistent and asymptotically efficient (Bertsekas and Tsitsiklis 2008). In most cases, differentiating the parametric probability distribution, $p_{\{X\}}(\{x\}; \theta_1, \ldots, \theta_M)$, with respect to θ_i and equating the result to zero for maximization,

$$\frac{\partial}{\partial \theta_i}\, p_{\{X\}}(\{x\}; \theta_1, \ldots, \theta_M) = 0, \quad i = 1, \ldots, M, \qquad (9)$$

results in nonlinear systems for the θ_i that lack closed-form solutions. Such is also the case for GMMs. Hence, one resorts to numerical methods to obtain the ML estimate. While various hill-climbing schemes exist, the EM algorithm exploits the structure of the probability model, treating the unknown association of each realization with a mixture component as missing data.

Specifically, the EM algorithm (see section a in appendix A) is an iterative succession of expectation and maximization steps for obtaining the ML estimate. It successively estimates the weights with which a given realization is associated with each of the M mixture components. This is done based on present parameter estimates, followed by optimizing these parameters again using the newly calculated weights. Repeating this, it ultimately arrives at an estimate for the ML parameter vector based on the set of ensemble realizations, {x}. In section b of appendix A, we present the EM algorithm for GMMs. The result is as follows:

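Given the set of ensemble realizations, {x} = {x_1, …, x_N}, and an initial parameter estimate,

$$\theta^{(0)} = \{\pi_1^{(0)}, \ldots, \pi_M^{(0)}, \bar{x}_1^{(0)}, \ldots, \bar{x}_M^{(0)}, P_1^{(0)}, \ldots, P_M^{(0)}\},$$

repeat until convergence:
  • E step: For all i ∈ {1, …, N} and j ∈ {1, …, M}, use the present parameter estimate, θ^(k), to form
    $$\tau_{ij}^{(k)} = \frac{\pi_j^{(k)} \, \mathcal{N}(x_i; \bar{x}_j^{(k)}, P_j^{(k)})}{\sum_{m=1}^{M} \pi_m^{(k)} \, \mathcal{N}(x_i; \bar{x}_m^{(k)}, P_m^{(k)})}. \qquad (10)$$
  • M step: For all j ∈ {1, …, M}, update the parameter estimate, θ^(k+1), according to
    $$\pi_j^{(k+1)} = \frac{N_j^{(k)}}{N}, \qquad (11)$$
    $$\bar{x}_j^{(k+1)} = \frac{1}{N_j^{(k)}} \sum_{i=1}^{N} \tau_{ij}^{(k)} x_i, \qquad (12)$$
    $$P_j^{(k+1)} = \frac{1}{N_j^{(k)}} \sum_{i=1}^{N} \tau_{ij}^{(k)} \left(x_i - \bar{x}_j^{(k+1)}\right)\left(x_i - \bar{x}_j^{(k+1)}\right)^{\mathrm{T}}, \qquad (13)$$
    where
    $$N_j^{(k)} = \sum_{i=1}^{N} \tau_{ij}^{(k)}. \qquad (14)$$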

Inspection of the above confirms intuition. In the E step of the EM algorithm [(10)], we calculate the probability that mixture component j generated realization x_i, based on the present parameter estimates, for all pairs of realizations and components. In the M step of the EM algorithm [(11)–(13)], the parameter values are updated to their weighted averages across all realizations [similar in form to (A2)–(A4) for the complete dataset]. As proved in section b of appendix A, repeated iterations of the above ensure that the likelihood converges to a local maximum. We thus arrive at a (locally) optimal fit of a GMM of complexity M to the set of N realizations, {x}.
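The E and M steps (10)–(14) translate almost line by line into NumPy. The sketch below is an illustration under our own assumptions, not the authors' code: the initial estimate θ^(0) (farthest-point selection of realizations as initial means, the sample covariance for every component), the small ridge `reg` added to each covariance to keep it invertible, and the convergence tolerance are choices the text leaves open. Responsibilities are computed in log space for numerical stability.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Row-wise log N(x_i; mean, cov) for data X of shape (N, n)."""
    n = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)                # whitened residuals, shape (n, N)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (np.sum(z ** 2, axis=0) + log_det + n * np.log(2.0 * np.pi))


def em_gmm(X, M, n_iter=500, tol=1e-8, reg=1e-6, seed=0):
    """Fit a full-covariance GMM of complexity M to realizations X (N, n) by EM."""
    X = np.asarray(X, dtype=float)
    N, n = X.shape
    rng = np.random.default_rng(seed)
    # theta^(0): farthest-point choice of initial means (an assumption), shared sample covariance.
    idx = [int(rng.integers(N))]
    for _ in range(1, M):
        d2 = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1), axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].copy()
    covs = np.array([np.cov(X.T, bias=True).reshape(n, n) + reg * np.eye(n) for _ in range(M)])
    weights = np.full(M, 1.0 / M)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step (10): responsibilities tau_ij of component j for realization x_i.
        log_p = np.column_stack([np.log(weights[j]) + log_gaussian(X, means[j], covs[j])
                                 for j in range(M)])
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        tau = np.exp(log_p - log_norm[:, None])
        ll = log_norm.sum()                             # log-likelihood at theta^(k)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M step (11)-(14): weighted averages over all realizations.
        Nj = tau.sum(axis=0)                            # (14)
        weights = Nj / N                                # (11)
        means = (tau.T @ X) / Nj[:, None]               # (12)
        for j in range(M):                              # (13), plus a small ridge
            d = X - means[j]
            covs[j] = (tau[:, j, None] * d).T @ d / Nj[j] + reg * np.eye(n)
    return weights, means, covs, ll


# Usage: two well-separated clusters in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-4.0, 0.0], 1.0, size=(150, 2)),
               rng.normal([4.0, 0.0], 1.0, size=(150, 2))])
w, mu, P, ll = em_gmm(X, M=2)
order = np.argsort(mu[:, 0])
```

Because EM only guarantees a local maximum, the initialization matters; in practice one might run several initializations and keep the fit with the largest log-likelihood.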

c. The Bayesian Information Criterion

Until now, we have assumed the mixture complexity M to be fixed and known. Such is rarely the case in practice, however. Determining the optimal complexity of a GMM can be a complicated task, particularly given limited a priori knowledge, and is often guided by empirical evidence, namely, the set of ensemble realizations. Such a task is formally referred to as “model selection.” While numerous schemes exist (e.g., Eisenberger 1964; McLachlan and Peel 2000; Duda et al. 2001), here we focus on the Bayesian Information Criterion (BIC).

Introducing a Bayesian framework, the parameter vector θ is assumed random and M is considered constant but unknown. We denote pΘ(θ; M) as the (arbitrary) prior distribution for θ at a given M, and p{X}|Θ({x} | θ; M) as the distribution for the ensemble set conditioned on a θ at a given M. In this work, the latter is a GMM.

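The goal is to select the model complexity M that maximizes the likelihood of obtaining {x}. In other words, by the assumed independence of the realizations, we seek M for which

$$p_{\{X\}}(\{x\}; M) = \int \prod_{i=1}^{N} p_{X|\Theta}(x_i \mid \theta; M)\, p_{\Theta}(\theta; M)\, d\theta \qquad (15)$$

is a maximum. A derivation of this optimum M is given in Sondergaard (2011). In summary, Laplace’s approximation is applied to the left-hand side of Bayes’s law (MacKay 2003),

$$p_{\Theta|\{X\}}(\theta \mid \{x\}; M) = \frac{p_{\{X\}|\Theta}(\{x\} \mid \theta; M)\, p_{\Theta}(\theta; M)}{p_{\{X\}}(\{x\}; M)}, \qquad (16)$$

evaluated at the ML estimate for the parameter vector θ. Ultimately, we obtain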
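$$\mathcal{L}_M \approx \hat{\mathcal{L}}_M + \log p_{\Theta}(\hat{\theta}; M) + \frac{K_M}{2}\log 2\pi - \frac{K_M}{2}\log N - \frac{1}{2}\log\left|\bar{\mathcal{I}}(\hat{\theta})\right|, \qquad (17)$$

where $K_M$ denotes the length of the parameter vector, $\bar{\mathcal{I}}(\hat{\theta})$ defines the expected Fisher information (Bishop 2006) in any one realization x_i evaluated at the ML estimate $\hat{\theta}$ of the parameter vector, and where we have defined the log-likelihoods

$$\mathcal{L}_M = \log p_{\{X\}}(\{x\}; M), \qquad (18)$$

$$\hat{\mathcal{L}}_M = \log p_{\{X\}|\Theta}(\{x\} \mid \hat{\theta}; M). \qquad (19)$$

For large N, we retain only the terms of (17) that grow with N, namely $\hat{\mathcal{L}}_M$ (of order N) and $-\frac{K_M}{2}\log N$, and drop the terms of order one, to arrive at the BIC:

$$\mathrm{BIC} = -2\hat{\mathcal{L}}_M + K_M \log N \approx -2\mathcal{L}_M, \qquad (20)$$

where N is the number of realizations, M is the mixture complexity, $\mathcal{L}_M$ is the log-likelihood of the ensemble set integrated across all possible parameter values, $\hat{\mathcal{L}}_M$ is the log-likelihood of the ensemble set evaluated at the ML estimate for the parameter vector, and $K_M$ is the number of parameters. The complexity M is chosen to minimize the BIC.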

The BIC is a quantitative equivalent of Occam’s razor (MacKay 2003; Duda et al. 2001), namely, that one should favor the simplest hypothesis consistent with the ensemble. Here, a balance is struck between underfitting, which imposes too much structure onto the data, and overfitting, which limits our predictive capacity beyond the ensemble. This is done by penalizing the fit of the realizations, quantified by twice the log-likelihood of the ensemble set evaluated at the ML parameter vector, $2\hat{\mathcal{L}}_M$, with a term proportional to the number of parameters, $K_M \log N$, which grows with the mixture complexity.
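Model selection with (20) can be sketched for a one-dimensional ensemble. The compact EM routine below (`em_gmm_1d`, a hypothetical helper whose quantile-based initial means are our assumption) returns the maximized log-likelihood $\hat{\mathcal{L}}_M$. The parameter count uses the standard value for a full-covariance GMM in n dimensions, $K_M = (M-1) + Mn + Mn(n+1)/2$ (free weights, means, and covariance entries), and the complexity with the smallest BIC is retained.

```python
import numpy as np


def em_gmm_1d(x, M, n_iter=500, tol=1e-10):
    """Compact EM for a 1-D GMM; returns the (locally) maximized log-likelihood L_hat_M."""
    x = np.asarray(x, dtype=float)
    N = x.size
    mu = np.quantile(x, (np.arange(M) + 0.5) / M)   # initial means at spread-out sample quantiles
    var = np.full(M, x.var())
    w = np.full(M, 1.0 / M)
    prev = -np.inf
    for _ in range(n_iter):
        log_p = np.log(w) - 0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x[:, None] - mu) ** 2 / var
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        ll = log_norm.sum()
        if ll - prev < tol:
            break
        prev = ll
        tau = np.exp(log_p - log_norm[:, None])     # E step: responsibilities
        Nj = tau.sum(axis=0)                        # M step: weights, means, variances
        w = Nj / N
        mu = (tau * x[:, None]).sum(axis=0) / Nj
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / Nj + 1e-9
    return ll


def n_params(M, n):
    """K_M for a full-covariance GMM in n dimensions: weights, means, covariance entries."""
    return (M - 1) + M * n + M * n * (n + 1) // 2


def bic(ll_hat, M, n, N):
    """BIC (20): -2 * L_hat_M + K_M * log N; smaller is better."""
    return -2.0 * ll_hat + n_params(M, n) * np.log(N)


# Usage: a bimodal ensemble; compare M = 1, 2, 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
scores = {M: bic(em_gmm_1d(x, M), M, n=1, N=x.size) for M in (1, 2, 3)}
best_M = min(scores, key=scores.get)
```

The single Gaussian pays for its poor fit through $-2\hat{\mathcal{L}}_M$, while each extra component must raise the log-likelihood by more than its share of the $K_M \log N$ penalty to be retained.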

At this point, what remains for our DA scheme is an efficient method for evolving the probabilistic description of the state in time. For this, we employ the DO equations.

d. The Dynamically Orthogonal field equations

The DO equations (Sapsis and Lermusiaux 2009; Sapsis 2010) are a closed, reduced set of evolution equations for general stochastic continuous fields, X(r, t; ω), described by a stochastic partial differential equation (SPDE):
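$$\frac{\partial X(r,t;\omega)}{\partial t} = \mathcal{L}[X(r,t;\omega);\omega], \qquad r \in D \tag{21}$$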
with initial conditions
$$X(r,t_0;\omega) = X_0(r;\omega), \qquad r \in D \tag{22}$$
and boundary conditions
$$\mathcal{B}[X(r,t;\omega)]\big|_{r=\xi} = h(\xi,t;\omega), \qquad \xi \in \partial D \tag{23}$$
where r denotes the position in space; t is time; ω is a random event; $\mathcal{L}$ is a general, potentially nonlinear, differential operator (presently, an ocean or fluid flow model); $\mathcal{B}$ is a linear differential operator; ξ is the spatial coordinate denoting the boundary; and h(ξ, t; ω) is the prescribed, possibly stochastic, boundary value. Two main assumptions are made in the derivation of the DO equations. First, a generalized, time-dependent Karhunen–Loeve decomposition of the fields (Lermusiaux 2006; Sapsis and Lermusiaux 2009) is used,
$$X(r,t;\omega) = \bar{x}(r,t) + \sum_{i=1}^{s}\tilde{x}_i(r,t)\,\Phi_i(t;\omega) \tag{24}$$
where $\bar{x}(r,t) = E[X(r,t;\omega)]$ are the mean fields, with E[·] being the expectation operator over ω; $\tilde{x}_i(r,t)$ are orthonormal modes spanning the time-dependent stochastic subspace; and Φi(t; ω) are zero-mean stochastic coefficients. The decomposition (24) defines generalized empirical orthogonal functions. In addition to the modes $\tilde{x}_i(r,t)$, the dimension s of the subspace also varies with time, but in what follows, for ease of notation, we omit t next to s. Second, after insertion of (24) into (21), a DO condition is imposed (i.e., the rate of change of the stochastic subspace basis is orthogonal to the subspace itself over the physical domain):
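$$\left\langle \frac{\partial \tilde{x}_i(r,t)}{\partial t},\, \tilde{x}_j(r,t)\right\rangle \equiv \int_D \frac{\partial \tilde{x}_i(r,t)}{\partial t}\,\tilde{x}_j(r,t)\,dr = 0, \qquad i,j = 1,\ldots,s \tag{25}$$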
With these assumptions, the original SPDE is reduced to DO equations (see definition below):
  1. a PDE (26) for the evolution of the mean field, $\bar{x}(r,t)$;

  2. a family of s PDEs (27) for the evolution of the orthonormal modes $\tilde{x}_i(r,t)$, describing a basis for the time-dependent dominant stochastic subspace; and,

  3. a system of s stochastic differential equations (28) for the coefficients, Φi(t; ω), which define how the stochasticity evolves within the stochastic subspace.

Mathematically, for the governing dynamics (21), with initial and boundary conditions (22) and (23), the coupled DO evolution equations are (using Einstein summation notation, $\sum_i a_i b_i \equiv a_i b_i$)
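$$\frac{\partial \bar{x}(r,t)}{\partial t} = E\left[\mathcal{L}[X(r,t;\omega);\omega]\right] \tag{26}$$
$$\frac{\partial \tilde{x}_i(r,t)}{\partial t} = \Pi\!\left(E\left[\mathcal{L}[X(r,t;\omega);\omega]\,\Phi_j(t;\omega)\right]\right) C^{-1}_{\Phi_i(t)\Phi_j(t)} \tag{27}$$
$$\frac{d\Phi_i(t;\omega)}{dt} = \left\langle \mathcal{L}[X(r,t;\omega);\omega] - E\left[\mathcal{L}[X(r,t;\omega);\omega]\right],\ \tilde{x}_i(r,t)\right\rangle \tag{28}$$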
where
$$\Pi(F(r)) = F(r) - \left\langle F(r), \tilde{x}_k(r,t)\right\rangle \tilde{x}_k(r,t) \tag{29}$$
is the projection of F(r) onto the orthogonal complement of the stochastic subspace; and,
$$C_{\Phi_i(t)\Phi_j(t)} = E\left[\Phi_i(t;\omega)\,\Phi_j(t;\omega)\right] \tag{30}$$
is the correlation between random variables Φi(t; ω) and Φj(t; ω), with the inverse of the correlation matrix being used in (27). The associated boundary conditions take the following form:
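$$\mathcal{B}[\bar{x}(r,t)]\big|_{r=\xi} = E\left[h(\xi,t;\omega)\right] \tag{31}$$
$$\mathcal{B}[\tilde{x}_i(r,t)]\big|_{r=\xi} = E\left[\Phi_j(t;\omega)\,h(\xi,t;\omega)\right] C^{-1}_{\Phi_i(t)\Phi_j(t)} \tag{32}$$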
and the initial conditions are given by
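$$\bar{x}(r,t_0) = E\left[X_0(r;\omega)\right] \tag{33}$$
$$\Phi_i(t_0;\omega) = \left\langle X_0(r;\omega) - \bar{x}(r,t_0),\ \tilde{x}_{i0}(r)\right\rangle \tag{34}$$
$$\tilde{x}_i(r,t_0) = \tilde{x}_{i0}(r) \tag{35}$$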
where i = 1, …, s and $\tilde{x}_{i0}(r)$ are the orthonormal modes for the stochastic subspace at t0.
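For a linear, deterministic operator $\mathcal{L}[X] = \mathbf{A}X$ with uncertainty only in the initial conditions, and with the Euclidean dot product standing in for ⟨·,·⟩, the expectations in (26)–(28) can be evaluated in closed form: (27) reduces to $(\mathbf{I} - \tilde{\mathcal{X}}\tilde{\mathcal{X}}^T)\mathbf{A}\tilde{\mathcal{X}}$ and (28) to $d\boldsymbol\Phi/dt = \tilde{\mathcal{X}}^T\mathbf{A}\tilde{\mathcal{X}}\boldsymbol\Phi$. The sketch below assumes a hypothetical matrix A, forward Euler time stepping, and a QR re-orthonormalization of the modes after each step (which leaves every realization unchanged); it checks that the DO ensemble tracks a direct integration of each realization. It is an illustration only, not the finite-volume DO schemes of Ueckermann et al. (2013).

```python
import numpy as np

def do_step_linear(xbar, Xt, Phi, A, dt):
    """One forward-Euler step of (26)-(28) for L[X] = A X (random initial conditions only).

    xbar: (n,) mean, Xt: (n, s) orthonormal modes, Phi: (N, s) coefficient realizations."""
    AXt = A @ Xt
    dxbar = A @ xbar                      # (26): E[A X] = A xbar
    dXt = AXt - Xt @ (Xt.T @ AXt)         # (27): E[A X Phi_j] C^-1 = A Xt, projected as in (29)
    dPhi = Phi @ (Xt.T @ AXt).T           # (28): <A X - E[A X], x_i> = x_i^T A Xt phi
    xbar, Xt, Phi = xbar + dt * dxbar, Xt + dt * dXt, Phi + dt * dPhi
    # Re-orthonormalize the modes; rotating Phi by R keeps xbar + Xt @ phi unchanged.
    Q, R = np.linalg.qr(Xt)
    return xbar, Q, Phi @ R.T

rng = np.random.default_rng(0)
n, s, N = 6, 2, 200
A = -0.5 * np.eye(n) + 0.2 * rng.standard_normal((n, n))  # hypothetical linear model
Xt0, _ = np.linalg.qr(rng.standard_normal((n, s)))          # initial orthonormal modes
Phi0 = rng.standard_normal((N, s))
Phi0 -= Phi0.mean(axis=0)                                   # zero-mean coefficients
xbar0 = rng.standard_normal(n)

xbar, Xt, Phi = xbar0.copy(), Xt0.copy(), Phi0.copy()
ens = xbar0 + Phi0 @ Xt0.T                                  # (N, n) full realizations
dt, nsteps = 5e-4, 2000
for _ in range(nsteps):
    xbar, Xt, Phi = do_step_linear(xbar, Xt, Phi, A, dt)
    ens = ens + dt * ens @ A.T                              # reference: Euler step of each realization

rel_err = np.linalg.norm(xbar + Phi @ Xt.T - ens) / np.linalg.norm(ens)
print(f"relative DO-vs-full difference after t = {dt * nsteps}: {rel_err:.1e}")
```

In this linear case the DO product $\tilde{\mathcal{X}}\boldsymbol\Phi^T$ obeys the same linear dynamics as the anomalies, so the remaining difference comes only from the time discretization and shrinks with dt.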

With the DO equations, both the stochastic subspace and the stochastic coefficients are dynamically evolved in time. They are initialized based on the initial pdf and thereafter evolved in accord with the SPDE governing X(r, t; ω) and its boundary conditions. This evolution is an advantage when compared to the proper orthogonal decomposition (Papoulis 1965; Holmes et al. 1996) and polynomial chaos (Ghanem and Spanos 1991), which both fix in time parts of their truncated expansion, the former the stochastic subspace and the latter the form of the stochastic coefficients. We note that s can also be evolved based on the dynamics and external observations (Sapsis and Lermusiaux 2011), as done in error subspace statistical estimation (ESSE) (Lermusiaux 1999b).

3. The GMM-DO filter

Combining the components described in section 2, and building on the foundations of classical assimilation schemes, we now complete the derivation of the GMM-DO filter: data assimilation with GMMs using the DO equations. The result is an efficient, data-driven scheme that preserves non-Gaussian statistics and respects nonlinear dynamics.

The GMM-DO filter consists of a recursive succession of two distinct steps: a forecast step and an update step. The Bayesian assimilation is the update step. As will be proved, this update is efficiently computed within the evolving subspace and the result is equivalent to the Bayesian update in the dynamical state space. For today’s ocean and atmosphere simulations, the subspace update is computationally feasible. We refer to Table 1 for notation.

a. Initial conditions

We initialize the state vector at discrete time k = 0 in a decomposed form,
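$$\mathbf{X}_0(\omega) = \bar{\mathbf{x}}_0 + \sum_{i=1}^{s_0}\tilde{\mathbf{x}}_{i,0}\,\Phi_{i,0}(\omega) \tag{36}$$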
that accords with the DO equations. The initial state mean $\bar{\mathbf{x}}_0$, orthonormal modes $\tilde{\mathbf{x}}_{i,0}$, and stochastic coefficients Φi,0(ω) are chosen so as to best represent the initial probabilistic state. Various representations and discretizations of the coefficients Φi(t; ω) exist (Sapsis and Lermusiaux 2009; Ueckermann et al. 2013), several of which can be employed with our GMM-DO scheme. Here, we adopt a Monte Carlo approach: we draw N realizations of the multivariate random vector $\boldsymbol\Phi_0(\omega) = [\Phi_{1,0}(\omega),\ldots,\Phi_{s_0,0}(\omega)]^T$ to obtain the following matrix:
e37
We emphasize that the $\boldsymbol\phi_{i,0}$ represent realizations residing in the initial stochastic subspace of dimension s0. With this, we rewrite (36) in its Monte Carlo ensemble form:
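$$\mathbf{x}_{i,0} = \bar{\mathbf{x}}_0 + \tilde{\mathcal{X}}_0\,\boldsymbol\phi_{i,0}, \qquad i = 1,\ldots,N \tag{38}$$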
where $\tilde{\mathcal{X}}_0 \in \mathbb{R}^{n\times s_0}$ (Table 1) is the matrix of modes forming an orthonormal basis for the initial subspace. This $\tilde{\mathcal{X}}_0$ is evolved in time by the dynamics and random forcing through (27).
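One way to obtain an initial mean, orthonormal modes, and coefficients in the Monte Carlo form (38), assumed here purely for illustration, is to take the sample mean and the leading singular vectors (EOFs) of a state-space ensemble, then project the anomalies onto those modes as in (34). The hypothetical helper below does this and checks that the modes are orthonormal and that the realizations are recovered when s equals the rank of the anomalies:

```python
import numpy as np

def do_decompose(ensemble, s):
    """Split an (N, n) ensemble into x_i = xbar + Xt @ phi_i, as in (38),
    using the s leading EOFs of the anomalies as modes (an illustrative choice)."""
    xbar = ensemble.mean(axis=0)
    anomalies = ensemble - xbar                 # rows are x_i - xbar
    # Right singular vectors of the anomaly matrix are the EOFs (discrete KL modes);
    # svals[s:] measure the variance left out of the subspace.
    _, svals, Vt = np.linalg.svd(anomalies, full_matrices=False)
    Xt = Vt[:s].T                               # (n, s) orthonormal modes
    Phi = anomalies @ Xt                        # (N, s) rows are phi_i^T, zero mean by construction
    return xbar, Xt, Phi, svals

rng = np.random.default_rng(1)
N, n, rank = 50, 8, 3
# Hypothetical ensemble whose anomalies span only `rank` directions
ensemble = rng.standard_normal(n) + rng.standard_normal((N, rank)) @ rng.standard_normal((rank, n))
xbar, Xt, Phi, svals = do_decompose(ensemble, s=rank)
recon = xbar + Phi @ Xt.T
print(np.allclose(Xt.T @ Xt, np.eye(rank)), np.allclose(recon, ensemble))  # True True
```

With s smaller than the anomaly rank, the same helper gives the best rank-s approximation of the ensemble in the Frobenius sense.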

b. Forecast

Starting from either the initial DO conditions or the posterior state description following the assimilation of data at time k − 1 (i.e., the Bayesian GMM update at k − 1),
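$$\mathbf{x}^a_{i,k-1} = \bar{\mathbf{x}}^a_{k-1} + \tilde{\mathcal{X}}_{k-1}\,\boldsymbol\phi^a_{i,k-1}, \qquad i = 1,\ldots,N \tag{39}$$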
we use the stochastic DO equations [(26)–(28)] to efficiently evolve the probabilistic description of the state vector in time, arriving at a forecast for observation time k:
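$$\mathbf{x}^f_{i,k} = \bar{\mathbf{x}}^f_{k} + \tilde{\mathcal{X}}_{k}\,\boldsymbol\phi^f_{i,k}, \qquad i = 1,\ldots,N \tag{40}$$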
This forecast is efficiently computed using the numerical schemes derived by Ueckermann et al. (2013). Specifically, for the mean and modes, we employ a second-order finite-volume spatial discretization and DO-specific projection method, and for the stochastic coefficients, a second- or fourth-order integration scheme in time.

As (39) and (40) indicate, the mean, orthonormal modes, and coefficients are all evolved during the forecast from tk−1 to tk. In particular, the span of the modes $\tilde{\mathcal{X}}_k$ differs from that of $\tilde{\mathcal{X}}_{k-1}$: the subspace evolves with time between data assimilation steps.

c. Observation

Common to oceanic and atmospheric applications, we employ here a linear (or linearized) observation model:
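$$\mathbf{Y}_k = \mathbf{H}\,\mathbf{X}_k + \boldsymbol\Upsilon_k, \qquad \boldsymbol\Upsilon_k \sim \mathcal{N}(\mathbf{0},\mathbf{R}) \tag{41}$$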
Here $\mathbf{Y}_k \in \mathbb{R}^p$ is the observation random vector at time k, $\mathbf{H} \in \mathbb{R}^{p\times n}$ is the linear observation model, and $\boldsymbol\Upsilon_k \in \mathbb{R}^p$ is the corresponding random noise vector, assumed to be Gaussian with zero mean and covariance matrix $\mathbf{R}$. We denote the realized observation vector by $\mathbf{y}_k \in \mathbb{R}^p$ and the realized noise vector by $\boldsymbol\upsilon_k \in \mathbb{R}^p$. This observation model could be generalized to other forms, which would lead to variations of the following update scheme.

d. Update

The whole update occurs at a fixed discrete time instant, so in what follows we omit the time index k. In the update, the subspace is for now assumed unchanged by the observations¹: the notation $(\cdot)^f$ or $(\cdot)^a$ is thus not used on the modes $\tilde{\mathcal{X}}$. Of course, the observations affect the subspace evolution after each assimilation, since the DO equations (26)–(28) are coupled. In summary, starting from the prior, here the DO forecast,
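$$\mathbf{x}^f_i = \bar{\mathbf{x}}^f + \tilde{\mathcal{X}}\,\boldsymbol\phi^f_i, \qquad i = 1,\ldots,N \tag{42}$$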
the goal is to update the mean state $\bar{\mathbf{x}}^f$ and the set of realizations $\{\boldsymbol\phi^f\}$, in accordance with (41) and the realized observations y, to obtain the posterior GMM-DO estimate:
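$$\mathbf{x}^a_i = \bar{\mathbf{x}}^a + \tilde{\mathcal{X}}\,\boldsymbol\phi^a_i, \qquad i = 1,\ldots,N \tag{43}$$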
To do so, we first optimally fit a GMM (section 2a) to the forecast set of realizations in the stochastic subspace. This prior GMM estimate is then updated within the subspace, in accordance with observations and Bayes’s law, ultimately leading to the posterior GMM-DO estimate (43). In what follows, we derive and describe this GMM-DO update algorithm.

1) GMM representation of prior set of ensemble realizations

At the time of a new set of measurements y, we use the EM algorithm and BIC to determine the GMM that best represents the set of ensemble realizations within the stochastic subspace, $\{\boldsymbol\phi^f\}$. We denote the parameters of the GMM by
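$$\boldsymbol\theta^f = \left\{\pi^f_1,\ldots,\pi^f_M,\ \boldsymbol\mu^f_1,\ldots,\boldsymbol\mu^f_M,\ \boldsymbol\Sigma^f_1,\ldots,\boldsymbol\Sigma^f_M\right\}$$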
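where $\pi^f_j \in [0,1]$, $\boldsymbol\mu^f_j \in \mathbb{R}^s$, and $\boldsymbol\Sigma^f_j \in \mathbb{R}^{s\times s}$. We again stress that the GMM efficiently resides in an s-dimensional subspace of the n-dimensional dynamical state space, with s ≪ n, which makes the prior estimation procedure computationally feasible.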
We determine the optimal mixture complexity by applying the BIC (20): GMMs of increasing complexity (i.e., M = 1, 2, 3, …) are successively fit with the EM algorithm until a minimum of the BIC is reached. The final result is a GMM optimally fit to the ensemble of realizations in the stochastic subspace. We write the resulting prior pdf of this GMM as
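$$p_{\boldsymbol\Phi}(\boldsymbol\phi) = \sum_{j=1}^{M}\pi^f_j\,\mathcal{N}\!\left(\boldsymbol\phi;\boldsymbol\mu^f_j,\boldsymbol\Sigma^f_j\right) \tag{44}$$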
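The search over M described above can be sketched with a plain EM fit per complexity. Everything in this sketch is an assumption for illustration: the function names, the initialization of the means at random realizations, the small covariance regularization `reg`, the fixed iteration count with no convergence test, and the parameter count $K_M$. It stops at the first increase of the BIC, as in the text, rather than scanning every M:

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Row-wise log N(x; mu, Sigma) for X of shape (N, s)."""
    s = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)                      # whitened residuals, (s, N)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (s * np.log(2.0 * np.pi) + logdet + (z ** 2).sum(axis=0))

def e_step(X, weights, means, covs):
    """Responsibilities and total log-likelihood (log-sum-exp for stability)."""
    logp = np.stack([np.log(w) + log_gauss(X, m, C) for w, m, C in zip(weights, means, covs)], axis=1)
    top = logp.max(axis=1, keepdims=True)
    lse = top + np.log(np.exp(logp - top).sum(axis=1, keepdims=True))
    return np.exp(logp - lse), lse.sum()

def fit_gmm_em(X, M, n_iter=200, reg=1e-6, seed=0):
    """Plain EM for an M-component, full-covariance GMM."""
    rng = np.random.default_rng(seed)
    N, s = X.shape
    means = X[rng.choice(N, size=M, replace=False)]         # start at random realizations
    covs = np.repeat(np.atleast_2d(np.cov(X, rowvar=False))[None], M, axis=0)
    weights = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        resp, _ = e_step(X, weights, means, covs)
        Nj = resp.sum(axis=0) + 1e-12                       # guard against empty components
        weights = Nj / N
        means = (resp.T @ X) / Nj[:, None]
        for j in range(M):
            D = X - means[j]
            covs[j] = (resp[:, j, None] * D).T @ D / Nj[j] + reg * np.eye(s)
    _, loglik = e_step(X, weights, means, covs)
    return weights, means, covs, loglik

def select_gmm_bic(X, M_max=6):
    """Fit M = 1, 2, ... and keep the last fit before the BIC first increases."""
    N, s = X.shape
    best = None
    for M in range(1, M_max + 1):
        weights, means, covs, loglik = fit_gmm_em(X, M)
        K = M * s + M * s * (s + 1) // 2 + (M - 1)         # assumed parameter count K_M
        bic = -2.0 * loglik + K * np.log(N)                 # (20)
        if best is not None and bic >= best["bic"]:
            break
        best = {"M": M, "bic": bic, "weights": weights, "means": means, "covs": covs}
    return best

rng = np.random.default_rng(2)
# Hypothetical bimodal coefficients in an s = 2 subspace, recentred to zero mean
Phi = np.vstack([rng.normal([-3.0, 0.0], 0.7, size=(100, 2)),
                 rng.normal([3.0, 1.0], 0.7, size=(100, 2))])
Phi -= Phi.mean(axis=0)
best = select_gmm_bic(Phi)
print(best["M"], best["weights"].round(2))
```

Because each M-step makes the weighted average of the mixture means equal to the sample mean, a fit to zero-mean coefficients returns a mixture whose overall mean is zero, consistent with the zero-mean coefficients of the DO decomposition.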
Because of the affine transformation (42) linking the stochastic subspace with the state space, we may expand the previously determined GMM into the state space according to the following:
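$$\bar{\mathbf{x}}^f_j = \bar{\mathbf{x}}^f + \tilde{\mathcal{X}}\,\boldsymbol\mu^f_j \tag{45}$$
$$\mathbf{P}^f_j = \tilde{\mathcal{X}}\,\boldsymbol\Sigma^f_j\,\tilde{\mathcal{X}}^T \tag{46}$$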
This is a key property of our GMM-DO filter. The mixture weights naturally remain unchanged. We note that $\bar{\mathbf{x}}^f_j$ and $\mathbf{P}^f_j$ now refer to the mean vector and covariance matrix, respectively, of mixture component j in the state space. We thus arrive at the prior distribution for the state vector in state space, taking the form of the following GMM:
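$$p_{\mathbf{X}}(\mathbf{x}) = \sum_{j=1}^{M}\pi^f_j\,\mathcal{N}\!\left(\mathbf{x};\bar{\mathbf{x}}^f_j,\mathbf{P}^f_j\right) \tag{47}$$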
We emphasize that, because of the affine transformation (42), this distribution would equally have been obtained had we fit the prior GMM directly in the state space to the set of realizations $\{\mathbf{x}^f\}$.
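Part of this claim can be checked numerically: for any fixed set of responsibilities, the responsibility-weighted mean and covariance computed in the state space (the EM M-step quantities) are exactly the images of the subspace ones under (45) and (46). The sketch below uses hypothetical random modes, realizations, and responsibilities; it covers only the M-step moments, since an E-step in the state space would additionally require evaluating the rank-deficient Gaussians of (47) on the subspace.

```python
import numpy as np

def weighted_moments(Y, r):
    """Responsibility-weighted mean and covariance of the rows of Y (EM M-step form)."""
    mu = r @ Y / r.sum()
    D = Y - mu
    return mu, (r[:, None] * D).T @ D / r.sum()

rng = np.random.default_rng(3)
n, s, N, M = 10, 3, 40, 2
xbar = rng.standard_normal(n)
Xt, _ = np.linalg.qr(rng.standard_normal((n, s)))   # hypothetical orthonormal modes
Phi = rng.standard_normal((N, s))                     # subspace realizations phi_i (rows)
X = xbar + Phi @ Xt.T                                 # state-space realizations, as in (42)
resp = rng.dirichlet(np.ones(M), size=N)              # arbitrary fixed responsibilities

for j in range(M):
    mu_phi, Sig_phi = weighted_moments(Phi, resp[:, j])
    mu_x, P_x = weighted_moments(X, resp[:, j])
    assert np.allclose(mu_x, xbar + Xt @ mu_phi)      # (45)
    assert np.allclose(P_x, Xt @ Sig_phi @ Xt.T)      # (46)
print("state-space M-step moments equal the mapped subspace moments")
```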

2) Bayesian update

Since the uncertainty of the state is restricted to the stochastic subspace, we prove next that the Bayesian update can be performed therein. In doing so, we again make use of the affine transformations (42)–(43) linking the stochastic subspace with the state space. We reemphasize that, presently, this subspace, described by the matrix $\tilde{\mathcal{X}}$, is assumed to remain unaffected by the assimilation. The theorem provides an efficient implementation of the GMM-DO filter’s update step, with significant computational savings due to the reduced dimensionality, s ≪ n. For realistic modeling with large state vectors, only this subspace update is computationally feasible.

(i) Theorem 1

Given the GMM fit (47) to the DO forecast as prior distribution and the realized observation vector y with the Gaussian observation model (41), the posterior distribution of the state vector in the state space is obtained by a Bayesian update of (44) carried out in the stochastic subspace. The result is equivalent to updating (47) directly. Specifically, the update equations for the mean $\bar{\mathbf{x}}^a$ and the parameters $\pi^a_j$, $\boldsymbol\mu^a_j$, and $\boldsymbol\Sigma^a_j$ are as follows:
e48
e49
e50
e51
e52
with the following definitions:
e53
e54
eq23
e55

(ii) Proof

Bayesian update in the state space. Applying the Bayesian update equations (4)–(7) of section 2a to the GMM prior (47) and the observation model (41), we first obtain the posterior distribution for the state vector in the state space:
e56
with
e57
e58
e59
where
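$$\mathbf{K}_j = \mathbf{P}^f_j\mathbf{H}^T\left(\mathbf{H}\mathbf{P}^f_j\mathbf{H}^T + \mathbf{R}\right)^{-1} \tag{60}$$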
is the Kalman gain matrix associated with mixture component j.
With this, we can derive the expression for the posterior mean field in the state space:
e61
e62
as well as for other moments in the state space (see the remark hereafter). This completes the Bayesian update in the full state space, with the posterior mean vector and GMM parameters all expressed in terms of the state space quantities and realized observations y.
Bayesian update in the stochastic subspace. Now, we show that the Bayesian update in the state space defined by (56)–(59) and (62) is equivalent to a Bayesian update in the stochastic DO subspace. We first remark that using (53) and (55) is computationally efficient. To derive (55), we use identity (46), the orthonormality of the modes, and definition (53):
eq4
Next, to derive the update equation (50) for the mixture weights, we start from (57) and use (45) and (46) to obtain the following:
e63
e64
which becomes by simple rearranging of terms:
e65
Then, applying definitions (53) and (54) leads to
e66
With this, we obtain an efficient update equation for the mixture weights using vectors and matrices specific to the subspace, all the while retaining the familiar structure of (57).
In a similar manner, to derive (48), (49), and (51) for the posterior mean $\bar{\mathbf{x}}^a$ and mixture means $\boldsymbol\mu^a_j$, we start with (62), use (45), and apply definition (55) to obtain the following:
e67
e68
which becomes, using and applying definitions (53) and (54),
e69
As a result, we obtain the following:
e70
where we have defined “intermediate” mean vectors in the stochastic subspace:
e71
These intermediate vectors, when appropriately weighted and combined, give the contribution of our Bayesian GMM-DO update to the conditional mean state $\bar{\mathbf{x}}^a$ relative to the forecast mean state $\bar{\mathbf{x}}^f$. We call these M vectors intermediate means because our DO framework requires that the parametric distribution describing the stochastic subspace have zero mean (i.e., $\sum_{j=1}^{M}\pi^a_j\boldsymbol\mu^a_j = \mathbf{0}$), a condition that the intermediate means do not satisfy in general. The actual means of the posterior mixture components in the subspace are obtained by a reset of these intermediate means:
e72
Rather than merely stating this as a matter of fact, however, we now derive this result. Similarly to (45), we first write the following:
e73
By subtracting $\bar{\mathbf{x}}^a$ and left-multiplying by $\tilde{\mathcal{X}}^T$, we then obtain the following:
e74
e75
where (75) results from the orthonormality of the modes (i.e., $\tilde{\mathcal{X}}^T\tilde{\mathcal{X}} = \mathbf{I}$). Inserting (58) and (70) into (75), we now have
e76
and then using (45), definition (55), and the orthonormality of the modes,
e77
Hence, we derive (51):
e78
Finally, to derive (52), which expresses the updated mixture covariance matrices $\boldsymbol\Sigma^a_j$ in terms of DO subspace quantities, we proceed similarly. As in (46), we expand $\mathbf{P}^a_j$:
e79
and then equate (79) to (59), inserting (46), to obtain the following:
eq5
We then left-multiply by $\tilde{\mathcal{X}}^T$ and right-multiply by $\tilde{\mathcal{X}}$, and use definition (55), to obtain the following:
e80
Here the orthonormality of the modes and the definition in (53) have been used.

With the above theorem, we have derived efficient expressions (48)–(52) for the GMM-DO update in the time-dependent stochastic subspace. To conclude, we note the similarity of these GMM-DO filter equations for a Bayesian update with the corresponding ESSE equations for a Gaussian update, both of which operate in the stochastic subspace.
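Theorem 1 can also be checked numerically. The sketch below uses hypothetical random modes, mixture parameters, H, R, and y. Its subspace quantities are written in assumed forms consistent with the steps of the proof, namely $\tilde{\mathbf{H}} = \mathbf{H}\tilde{\mathcal{X}}$, $\tilde{\mathbf{y}} = \mathbf{y} - \mathbf{H}\bar{\mathbf{x}}^f$, $\tilde{\mathbf{K}}_j = \boldsymbol\Sigma^f_j\tilde{\mathbf{H}}^T(\tilde{\mathbf{H}}\boldsymbol\Sigma^f_j\tilde{\mathbf{H}}^T + \mathbf{R})^{-1}$, intermediate means, and then the reset; they are not transcriptions of (48)–(55). The script compares the weights, mean, component means, and covariances with a direct per-component Kalman update in the full state space:

```python
import numpy as np

def log_gauss(y, m, S):
    """log N(y; m, S) for a single vector y."""
    d = y - m
    logdet = np.linalg.slogdet(S)[1]
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(S, d))

def normalize(logw):
    """Turn log-weights into weights that sum to one."""
    w = np.exp(np.asarray(logw) - np.max(logw))
    return w / w.sum()

rng = np.random.default_rng(4)
n, s, p, M = 12, 3, 4, 2
xbar = rng.standard_normal(n)                               # forecast mean
Xt, _ = np.linalg.qr(rng.standard_normal((n, s)))           # orthonormal modes
pis = np.array([0.3, 0.7])
mus = rng.standard_normal((M, s))
mus -= pis @ mus                                            # zero-mean subspace mixture
Sigs = np.array([0.5 * np.eye(s), 2.0 * np.eye(s)])
H = rng.standard_normal((p, n))
R = 0.25 * np.eye(p)
y = rng.standard_normal(p)

# (a) Full state space: one Kalman update per mixture component
xs = xbar + mus @ Xt.T                                      # (45)
Ps = np.array([Xt @ S @ Xt.T for S in Sigs])                # (46)
logw, xa, Pa = [], [], []
for j in range(M):
    S_in = H @ Ps[j] @ H.T + R
    K = Ps[j] @ H.T @ np.linalg.inv(S_in)                   # (60)
    logw.append(np.log(pis[j]) + log_gauss(y, H @ xs[j], S_in))
    xa.append(xs[j] + K @ (y - H @ xs[j]))
    Pa.append((np.eye(n) - K @ H) @ Ps[j])
w_full = normalize(logw)
xbar_full = w_full @ np.array(xa)

# (b) Same update in the s-dimensional subspace (assumed forms, see lead-in)
Ht, yt = H @ Xt, y - H @ xbar
logw, mu_hat, Sig_a = [], [], []
for j in range(M):
    S_in = Ht @ Sigs[j] @ Ht.T + R                          # equals H P_j H^T + R
    Kt = Sigs[j] @ Ht.T @ np.linalg.inv(S_in)
    logw.append(np.log(pis[j]) + log_gauss(yt, Ht @ mus[j], S_in))
    mu_hat.append(mus[j] + Kt @ (yt - Ht @ mus[j]))         # intermediate means
    Sig_a.append((np.eye(s) - Kt @ Ht) @ Sigs[j])
w_sub = normalize(logw)
mu_bar = w_sub @ np.array(mu_hat)
xbar_sub = xbar + Xt @ mu_bar
mu_a = np.array(mu_hat) - mu_bar                            # reset to a zero-mean mixture

assert np.allclose(w_full, w_sub) and np.allclose(xbar_full, xbar_sub)
for j in range(M):
    assert np.allclose(xa[j], xbar_sub + Xt @ mu_a[j])
    assert np.allclose(Pa[j], Xt @ Sig_a[j] @ Xt.T)
print("subspace update reproduces the state-space update")
```

Only s × s and p × p matrices are inverted in part (b), which is where the computational savings of the subspace update come from.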

(iii) Remark

Although strictly unnecessary for the GMM-DO filter, we can also obtain all updated state space quantities. For example, the full posterior covariance matrix in the state space can be obtained using the law of total variance (Bertsekas and Tsitsiklis 2008):
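$$\mathbf{P}^a = \sum_{j=1}^{M}\pi^a_j\left[\mathbf{P}^a_j + \left(\bar{\mathbf{x}}^a_j - \bar{\mathbf{x}}^a\right)\left(\bar{\mathbf{x}}^a_j - \bar{\mathbf{x}}^a\right)^T\right] \tag{81}$$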
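Because each component mean satisfies $\bar{\mathbf{x}}^a_j - \bar{\mathbf{x}}^a = \tilde{\mathcal{X}}\boldsymbol\mu^a_j$ by (73) and each covariance expands as $\mathbf{P}^a_j = \tilde{\mathcal{X}}\boldsymbol\Sigma^a_j\tilde{\mathcal{X}}^T$ by (79), the law of total variance can be rewritten so that only an s × s matrix is formed explicitly. A short worked step, under these identities:

$$
\mathbf{P}^a = \sum_{j=1}^{M}\pi^a_j\left[\tilde{\mathcal{X}}\boldsymbol\Sigma^a_j\tilde{\mathcal{X}}^T + \tilde{\mathcal{X}}\boldsymbol\mu^a_j\left(\tilde{\mathcal{X}}\boldsymbol\mu^a_j\right)^T\right]
= \tilde{\mathcal{X}}\left[\sum_{j=1}^{M}\pi^a_j\left(\boldsymbol\Sigma^a_j + \boldsymbol\mu^a_j\boldsymbol\mu^{a\,T}_j\right)\right]\tilde{\mathcal{X}}^T
$$

Since the posterior subspace mixture has zero mean, the bracketed s × s matrix is exactly the covariance of the posterior subspace GMM, so storing it together with $\tilde{\mathcal{X}}$ is sufficient; forming the n × n matrix is optional.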

3) Generation of posterior set of ensemble realizations

We complete the update step, as in ESSE scheme A (Lermusiaux and Robinson 1999), by generating a posterior set of realizations within the stochastic subspace, $\{\boldsymbol\phi^a\}$, from the posterior multivariate GMM with parameters:
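$$\boldsymbol\theta^a = \left\{\pi^a_1,\ldots,\pi^a_M,\ \boldsymbol\mu^a_1,\ldots,\boldsymbol\mu^a_M,\ \boldsymbol\Sigma^a_1,\ldots,\boldsymbol\Sigma^a_M\right\}$$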
With this, we arrive at the posterior DO representation in Monte Carlo form for the state vector based on a Bayesian assimilation of the observations y at time k:
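$$\mathbf{x}^a_{i,k} = \bar{\mathbf{x}}^a_k + \tilde{\mathcal{X}}_k\,\boldsymbol\phi^a_{i,k}, \qquad i = 1,\ldots,N \tag{82}$$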
We note that the sizes of the prior and posterior ensembles at time k in the stochastic subspace need not be the same (e.g., N can be evolved by a convergence criterion for the DO forecast from time k to the next observation time k + 1; Lermusiaux 2007; Ueckermann et al. 2013). This concludes the derivation of the GMM-DO filter, whose algorithm is summarized in the flowchart of Fig. 2. Extensions of this GMM-DO filter algorithm are provided in appendix B: specifically, an algorithm for limiting the GMM fit to a dominant subspace of the full stochastic DO subspace, and an algorithm for constraining the means of the GMM.
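Generating the posterior realizations amounts to sampling from the subspace GMM (pick a component according to the weights, then draw from that Gaussian) followed by the affine map (82). A sketch with a hypothetical posterior mixture (s = 2, M = 2, chosen so that the weighted mixture means cancel), hypothetical modes, and a posterior ensemble size unrelated to the prior one:

```python
import numpy as np

def sample_gmm(weights, means, covs, N, rng):
    """Draw N realizations from a GMM: choose a component by weight, then draw from it."""
    labels = rng.choice(len(weights), size=N, p=weights)
    phi = np.empty((N, means.shape[1]))
    for j in range(len(weights)):
        idx = labels == j
        phi[idx] = rng.multivariate_normal(means[j], covs[j], size=int(idx.sum()))
    return phi

rng = np.random.default_rng(5)
# Hypothetical posterior subspace GMM with zero overall mean
w_a = np.array([0.8, 0.2])
mu_a = np.array([[0.5, 0.1], [-2.0, -0.4]])    # 0.8*mu_1 + 0.2*mu_2 = 0
Sig_a = np.array([[[0.3, 0.05], [0.05, 0.2]],
                  [[0.6, 0.0], [0.0, 0.4]]])
n = 5
xbar_a = rng.standard_normal(n)                 # hypothetical posterior mean state
Xt, _ = np.linalg.qr(rng.standard_normal((n, 2)))

N_a = 200_000                                   # posterior ensemble size, free to differ from the prior's
phi_a = sample_gmm(w_a, mu_a, Sig_a, N_a, rng)
x_a = xbar_a + phi_a @ Xt.T                     # (82): x_i = xbar + Xt phi_i

# Sample moments approach the mixture mean (zero) and its total covariance
C_total = sum(w * (S + np.outer(m, m)) for w, m, S in zip(w_a, mu_a, Sig_a))
print(np.abs(phi_a.mean(axis=0)).max(), np.abs(np.cov(phi_a, rowvar=False) - C_total).max())
```

The sample mean of a finite posterior ensemble is only approximately zero; whether and how to recentre it is an implementation choice not addressed in this sketch.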
Fig. 2.

GMM-DO filter flowchart.

Citation: Monthly Weather Review 141, 6; 10.1175/MWR-D-11-00295.1

Next, we illustrate the GMM-DO filter procedure by way of a simple toy example. More realistic applications are provided in Part II (Sondergaard and Lermusiaux 2013).

4. Example

Assume we are provided with the following (arbitrarily chosen) forecast for the DO decomposed representation of the state:
eq7
with one hundred subspace realizations, $\{\boldsymbol\phi\} = \{\boldsymbol\phi_1,\ldots,\boldsymbol\phi_{100}\}$, generated from a Gaussian Mixture Model of complexity 2:
eq8
Let us further assume the following forecast parameters:
eq9
For simplicity, we will take the true field to coincide with one of the realizations:
eq10
We make noisy measurements of the first and third elements of the state vector, that is,
eq11
normally distributed with an error covariance matrix given by
eq12
where σobs = 5. We illustrate all of the above in Fig. 3a. We then proceed with the update step, following the GMM-DO flowchart (Fig. 2). We skip the BIC selection step and instead present results directly for GMMs of complexity M = 1 and M = 2. The former is a single Gaussian parametric distribution, while the latter would, with high probability, be selected by the BIC in the present example.
Fig. 3.

GMM-DO filter update. In column (i), we plot the set of ensemble realizations within the stochastic subspace, {φ} = {φ1, …, φ100}; in column (ii), we display the vectors and information residing in the state space. (a) The prior state estimate. (b) The fits of Gaussian Mixture Models of complexity M = 1 (PD) and M = 2 (GMM), with their marginal distributions plotted for each of the stochastic coefficients, Φ1 and Φ2. (c) The posterior state estimate, again shown in the decomposed form that accords with the DO equations.


a. Fitting of GMM

  • Use the EM algorithm to obtain the prior mixture parameters
    eq13
    within the stochastic subspace based on the set of ensemble realizations, $\{\boldsymbol\phi\}$. The identified mixtures (of complexities one and two), along with their marginal distributions, are displayed in Fig. 3b(i).

b. Update

  1. Calculate parameters:
    eq14
    and determine the mixture Kalman gain matrices:
    eq15
  2. Assimilate the measurements y by calculating the intermediate mixture means in the stochastic subspace,
    eq16
    and further compute the posterior mixture weights:
    eq17
  3. Update the DO mean field [displayed in Fig. 3c(ii)],
    eq18
    as well as the mixture parameters within the stochastic subspace:
    eq19
  4. Generate the posterior set of ensemble realizations within the stochastic subspace, $\{\boldsymbol\phi^a\}$, based on the multivariate GMM with posterior parameters:
    eq20
    We display the posterior set of realizations in Fig. 3c(i).
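Steps 1–3 can be traced in a short Python sketch. The numbers are hypothetical and are not those of the example above; only the observed elements (the first and third) and σobs = 5 are taken from the text. The subspace quantities use assumed forms consistent with section 3d, $\tilde{\mathbf{H}} = \mathbf{H}\tilde{\mathcal{X}}$, $\tilde{\mathbf{y}} = \mathbf{y} - \mathbf{H}\bar{\mathbf{x}}^f$, and $\tilde{\mathbf{K}}_j = \boldsymbol\Sigma_j\tilde{\mathbf{H}}^T(\tilde{\mathbf{H}}\boldsymbol\Sigma_j\tilde{\mathbf{H}}^T + \mathbf{R})^{-1}$, and the single Gaussian with the same mean and total covariance stands in for the PD. Step 4 (sampling) is omitted.

```python
import numpy as np

def gmm_do_update(xbar, Xt, pis, mus, Sigs, H, R, y):
    """Hypothetical helper for steps 1-3: returns (posterior mean, weights, means, covariances)."""
    Ht, yt = H @ Xt, y - H @ xbar
    logw, mu_hat, Sig_a = [], [], []
    for pj, mu, Sig in zip(pis, mus, Sigs):
        S = Ht @ Sig @ Ht.T + R                       # innovation covariance of component j
        Kt = Sig @ Ht.T @ np.linalg.inv(S)            # step 1: mixture Kalman gain
        d = yt - Ht @ mu
        logdet = np.linalg.slogdet(S)[1]
        logw.append(np.log(pj) - 0.5 * (logdet + d @ np.linalg.solve(S, d)))  # common 2*pi term dropped
        mu_hat.append(mu + Kt @ d)                    # step 2: intermediate means
        Sig_a.append((np.eye(len(mu)) - Kt @ Ht) @ Sig)
    w = np.exp(np.array(logw) - max(logw))
    w /= w.sum()                                      # step 2: posterior weights
    mu_hat = np.array(mu_hat)
    mu_bar = w @ mu_hat
    return xbar + Xt @ mu_bar, w, mu_hat - mu_bar, np.array(Sig_a)  # step 3

# Hypothetical setup: n = 3, s = 2, observe the first and third state elements
xbar = np.zeros(3)
Xt = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, np.sqrt(2.0)]]) / np.sqrt(2.0)
H = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
R = 5.0 ** 2 * np.eye(2)                              # sigma_obs = 5
pis = np.array([0.5, 0.5])
mus = np.array([[8.0, 0.0], [-8.0, 0.0]])             # bimodal, zero-mean mixture
Sigs = np.array([np.diag([4.0, 1.0])] * 2)
x_true = xbar + Xt @ np.array([9.0, 0.5])
y = H @ x_true                                         # noise-free for reproducibility

xa_gmm, w, mu_a, Sig_a = gmm_do_update(xbar, Xt, pis, mus, Sigs, H, R, y)
# Single Gaussian (M = 1) with the same mean and total covariance, standing in for the PD
C_pd = sum(pj * (S + np.outer(m, m)) for pj, m, S in zip(pis, mus, Sigs))
xa_pd, *_ = gmm_do_update(xbar, Xt, np.array([1.0]), np.zeros((1, 2)), C_pd[None], H, R, y)
print("GMM weights:", w.round(3))
print("error GMM:", np.linalg.norm(xa_gmm - x_true), "error PD:", np.linalg.norm(xa_pd - x_true))
```

Passing a single component (M = 1) reduces the helper to a Kalman update carried out in the subspace, the special case noted in section 6.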

By way of this simple example, we draw two conclusions on the benefits of the GMM-DO filter. First, because of the non-Gaussian prior statistics, the GMM was, as expected, found to provide a better posterior estimate than the Gaussian parametric distribution (PD), as evidenced, for example, by their posterior means [Fig. 3c(ii)]. In particular, because the PD’s covariance matrix conservatively overestimates the spread of the true pdf [Fig. 3b(i)], the noisy measurements were favored during the update step, essentially resulting in an “overshoot” of its posterior mean. The GMM’s accurate representation of the non-Gaussian features, on the other hand, properly balanced the prior information with that of the measurements, resulting in a successful Bayesian update. While this was to be expected given the bimodal prior distribution, the preceding arguments suggest that this holds for arbitrary distributions, as long as the GMM fit based on the EM algorithm and BIC provides a good approximation of the true pdf.

The second conclusion concerns the posterior statistics, represented by the subspace realizations $\{\boldsymbol\phi^a\}$ in Fig. 3c(i). In addition to capturing the true solution, the GMM produced a compact posterior set of realizations, indicating increased confidence in this estimate. The accuracy of the posterior representation of the true statistics clearly affects future assimilations (not shown here). We therefore hypothesize that the GMM-DO filter outperforms simpler schemes (e.g., the Gaussian parametric distribution) in this respect. In Part II of this two-part paper, we support this hypothesis by applying the GMM-DO filter to truly dynamical systems.

5. Discussion and comparisons with related schemes

In this section, we review a selection of past pioneering DA schemes that, as the GMM-DO filter, have adopted the use of GMMs for approximating the true pdf.

a. Alspach and Sorenson (1972)

GMMs were, to the best of our knowledge, first addressed in the context of filtering theory by Alspach and Sorenson (1972). Here, the authors were particularly motivated by the inappropriate use of the Gaussian parametric distribution, stating that “the Gaussian (parametric) approximation greatly reduces the amount of information that is contained in the true density, particularly when it is multimodal.” They emphasized the ability of GMMs to approximate arbitrary pdfs, all the while retaining the familiar computational tractability when placed in the context of Bayesian inference.

Based on an approximation of the known initial (non-Gaussian) distribution by a GMM of complexity M, their scheme would essentially run M extended Kalman filters in parallel, one for each mixture component, coupled solely through the mixture weights. Their update thus takes a form structurally similar to that of the GMM-DO filter, setting aside the latter’s focus on a stochastic subspace evolving nonlinearly through the fully coupled DO equations. While the authors freed themselves of the Gaussian parametric constraint, their scheme, inspired by the extended Kalman filter, remained grounded in linear theory. The authors also made no mention of the appropriate mixture complexity or of how the initial mixture parameters were obtained. Moreover, while they alluded to the need for intermittently restarting the distribution (either because of a mismatch between the forecast distribution and the observations, or because of the collapse of weights onto a single mixture component), no remedies were proposed.

b. Anderson and Anderson (1999)

Anderson and Anderson (1999), in part inspired by the recent advances of ensemble methods within the DA community (e.g., Evensen 1994; Lermusiaux 1997; Houtekamer et al. 1998), extended the work of Alspach and Sorenson by adopting a Monte Carlo approach for evolving the probabilistic description of the state in time. Arguing that “one of the fundamental advantages of a Monte Carlo approach [is its] ability to represent non-Gaussian probability distributions,” they chose to approximate the Monte Carlo realizations by use of a kernel density estimator:
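$$p_{\mathbf{X}}(\mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N}\mathcal{N}\!\left(\mathbf{x};\mathbf{x}_i,\alpha\hat{\mathbf{P}}\right) \tag{83}$$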
with $\mathbf{x}_i$ representing realizations in state space, $\hat{\mathbf{P}}$ the sample covariance matrix based on the set of ensemble realizations, and α a heuristically chosen scaling parameter.
Upon assimilating data from a Gaussian observation model, their posterior distribution for the state vector would thus take the familiar form:
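$$p_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}\mid\mathbf{y}) = \sum_{i=1}^{N}\pi^a_i\,\mathcal{N}\!\left(\mathbf{x};\mathbf{x}^a_i,\mathbf{P}^a_i\right) \tag{84}$$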
with parameters determined in accordance with (57)–(59), from which they would draw N new realizations.

The authors justifiably argued for the advantages over filters invoking the Gaussian parametric distribution, giving as example their respective performances when applied to the three-dimensional Lorenz-63 model (Lorenz 1963): while their kernel filter would represent states solely in accordance with model dynamics, simpler filters would potentially assign finite probability to regions of state space never visited.

One drawback of the filter lies in the arguments for choosing the scaling parameter α. Specifically, the authors stated that “while a number of methods for computing the constant covariance reduction factor α have been developed,…the value of α is often subsumed into a tuning constant and so does not need to be calculated explicitly.…Tuning a filter for a real system is complicated…[and] must be chosen with care.”

Hoteit et al. (2008) later extended the filter by allowing the realizations to carry uneven weights, drawing on the concepts of particle filters. Specifically, they retained the posterior form of (84) rather than drawing N new realizations following every assimilation step. To avoid the collapse of weights onto only a few realizations, they proposed a number of interesting methods for resampling. While effective, these ideas are not discussed further.

c. Bengtsson et al. (2003)

Bengtsson et al. (2003) expressed a concern over Anderson and Anderson’s use of kernel density methods for approximating distributions, arguing that the use of “scaled versions of the full ensemble covariance around each center in the mixture…cannot adapt as easily to local structure in the forecast distribution.” Instead, they proposed to approximate the set of realizations by a GMM (of complexity less than the number of realizations), estimating the mixture parameters using local knowledge of the ensemble distribution. They stated that such an approach would provide a more accurate approximation to the true pdf.

Their update step essentially proceeded as follows: M ensemble realizations would be arbitrarily chosen to act as means for the proposed Gaussian mixtures, and the $N_n$ nearest neighbors of each of these realizations would be used to approximate the respective mixture covariance matrices. From here, one would proceed with the Bayesian update, conceptually inspired by the ensemble Kalman filter (Evensen 1994).

As with Alspach and Sorenson, the authors left open how to determine both the mixture complexity M and the appropriate choice of $N_n$, the number of nearest neighbors. Furthermore, their choice of mixture means, based on an arbitrary sampling of ensemble realizations, would invite sampling noise.

The authors further expressed difficulties associated with manipulating pdfs in high-dimensional spaces. They thus introduced a hierarchy of adaptations to the aforementioned filter in which they invoked varying degrees of localization approximations, all based on heuristic arguments. As a remedy, however, they concluded that “a more sophisticated filter will likely rely on efficient, sequential identification of low-dimensional subspaces where non-Gaussian densities can be accurately represented and filtered using finite ensemble sizes.”

d. Smith (2007)

Indirectly extending the work by Bengtsson et al., Smith (2007) employed the EM algorithm to uncover the underlying structure represented by the set of ensemble realizations, thus alleviating former heuristic arguments. The author modified the ensemble Kalman filter to allow for a Gaussian mixture representation of the prior distribution, using Akaike’s Information Criterion (AIC) as the method for selecting the appropriate mixture complexity. (As a side note, McLachlan and Peel (2000) found the BIC to outperform the AIC when fitting Gaussian mixtures to data; specifically, the latter would have the tendency to overestimate the mixture complexity.) Similar to the scheme of Bengtsson et al., Smith retained the concept of operating on individual ensemble realizations during the update step, imposing only—but somewhat surprisingly—that the posterior distribution be normally distributed.

For illustration, the author applied his cluster ensemble Kalman filter to a two-dimensional phytoplankton–zooplankton biological model. While successful for such simple models, he emphasized the difficulties of extending his scheme to test cases of larger dimensions, making, however, the useful comment that “the state space could be projected onto a lower dimensional space depicting some relevant phenomenon, and the full covariance matrix in this state space could be used.”

e. Dovera and Rossa (2011)

Dovera and Rossa (2011) later modified the approach of Smith, attempting to overcome the constraint that the posterior distribution be Gaussian. Their update step seemingly disagreed with the output of the EM algorithm, however, a point of view reflected in the recent work by Frei and Künsch (2013).

The authors applied their scheme to both the Lorenz-63 model as well as a two-dimensional reservoir model, outperforming the regular ensemble Kalman filter. As with previous schemes, however, they equally noted the problems caused by systems of high dimensionality, again using a number of localization arguments to overcome this burden. With the GMM-DO filter, all of these issues are addressed by (i) adopting the generalized, time-dependent Karhunen–Loeve decomposition of the state dictated by the DO framework; and (ii) deriving the corresponding rigorous GMM-DO updates for fully Bayesian-based data assimilation.

6. Summary and conclusions

A data assimilation framework that rigorously utilizes the governing dynamical equations with information theory and learning theory for efficient Bayesian geophysical data assimilation was presented. The theory and algorithm of the resulting filter, the GMM-DO filter, were developed and derived. The DO equations and their adaptive stochastic subspace are employed to provide prior probabilities, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric GMMs using the Expectation-Maximization (EM) algorithm and the Bayesian Information Criterion (BIC). Bayes’s law is then efficiently carried out analytically within the evolving stochastic subspace.

Past literature had identified the advantages of adopting GMMs in a filtering setting, allowing the update step to capture and retain potential non-Gaussian features. In some cases, the EM algorithm and model selection criteria had been used to obtain optimal mixture parameter values, resulting in a more accurate approximation of the true pdf. However, existing schemes often reverted to heuristic approximations or surprising choices. A novelty of the GMM-DO filter lies in its rigorous coupling of GMMs, the EM algorithm, and the BIC with the efficient DO equations. By focusing on the time-dependent dominant stochastic subspace of the state space, we address prior limitations caused by the dimensionality of geophysical applications. In particular, ad hoc procedures become unnecessary. Unlike the ensemble Kalman filter and several other methods, we presently refrain from operating directly on individual ensemble realizations during the update step. Rather, under the assumption that the fitted GMM accurately captures the true prior pdf, we carry out Bayes’s law analytically and efficiently within the stochastic subspace.

The derived GMM-DO filter respects nonlinear dynamics and captures non-Gaussian statistics as they occur, obviating the use of empirical arguments. Of course, variations of the present filter exist, two of which are derived in appendix B. Additional areas for further research include the selection of the algorithms for fitting the GMMs to the DO realizations. Schemes based on the EM-BIC approach have the advantage of being generic, but there is a large body of literature on other estimators (McLachlan and Peel 2000), and some schemes could be tailored to specific oceanic or atmospheric applications. Constraints can also be added to this fitting procedure, leading to a supervised learning of the GMM properties. Other mixture models could be used (e.g., Laplace mixtures for heavier tails), depending on the application and efficiency requirements. One advantage of the GMM is that if the number of Gaussians is one (M = 1), one recovers a classic Kalman update. Since our GMM-DO filter estimates the optimal M, if it is found to be one, a Kalman update in the subspace is used. The GMM-DO filter is thus a straightforward and efficient extension of the Kalman filter for nonlinear and non-Gaussian geophysical systems. The present GMM-DO update could also be augmented with a subspace learning scheme based on the innovation vector and posterior misfit, extending the ESSE learning to GMMs. Another variation of this update is to operate directly on individual realizations; such a variation exists in ESSE. Another research direction is the derivation of GMM-DO smoothers. One possibility is to employ a statistical linearization as in the ESSE smoother (Lermusiaux and Robinson 1999; Lermusiaux et al. 2002b), but other options are possible, including hybrid ones with variational schemes (e.g., Moore et al. 2004).
Finally, for the case of white-noise stochastic forcing and a sufficiently small stochastic subspace, the Fokker–Planck equation that evolves the joint pdf of the stochastic coefficients of the DO expansion (Sapsis and Lermusiaux 2011) could be used instead of the stochastic differential equations for the DO realizations. This approach would directly provide the prior joint pdf for the Bayesian update, but numerical schemes other than those employed here would then be needed (Ueckermann et al. 2013).
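The remark that a single-component GMM (M = 1) recovers a classic Kalman update can be checked with a minimal sketch of the generic Gaussian-sum Bayesian update (in the spirit of Alspach and Sorenson 1972). The sketch assumes a scalar state, a scalar linear observation y = h x + v with Gaussian noise of variance r, and hypothetical function names; it is not the paper's subspace update, only an illustration of its structure: each component receives its own Kalman update, and each weight is rescaled by that component's predicted-observation likelihood.

```python
import math
from typing import List, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_linear_update(
    weights: List[float], means: List[float], variances: List[float],
    y: float, h: float, r: float,
) -> Tuple[List[float], List[float], List[float]]:
    """Posterior GMM for the prior sum_j w_j N(x; m_j, p_j) and y = h x + v, v ~ N(0, r)."""
    new_w, new_m, new_p = [], [], []
    for w, m, p in zip(weights, means, variances):
        s = h * p * h + r                              # innovation variance of this component
        k = p * h / s                                  # component Kalman gain
        new_w.append(w * gaussian_pdf(y, h * m, s))    # prior weight times observation likelihood
        new_m.append(m + k * (y - h * m))              # Kalman mean update
        new_p.append((1.0 - k * h) * p)                # Kalman variance update
    total = sum(new_w)
    return [w / total for w in new_w], new_m, new_p


# M = 1: the update reduces to the scalar Kalman filter.
w1, m1, p1 = gmm_linear_update([1.0], [1.0], [2.0], y=3.0, h=1.0, r=1.0)
# M = 2: the observation shifts probability mass toward the component it supports.
w2, m2, p2 = gmm_linear_update([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0], y=2.1, h=1.0, r=0.5)
```

With M = 1 the normalized weight is trivially one, and the mean and variance coincide with the Kalman gain formulas K = p h / (h p h + r).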

In Part I of this two-part paper, we derived the GMM-DO filter, outlined its algorithmic implementation, and placed it in the context of current literature. In Part II, we evaluate its performance when applied to the following test cases: (i) the double-well diffusion experiment and (ii) the sudden expansion fluid flow.

Acknowledgments

We are very thankful to the MSEAS group members, in particular for helpful discussions with Mr. M. P. Ueckermann, Dr. P. Haley Jr., and Dr. T. Sapsis, as well as with Mr. T. Lolla and P. Lu. We especially thank Mr. M. Ueckermann for his help on the DO numerics. TS also acknowledges Prof. G. Wornell and his MIT course for a clear introduction to a number of the information theory concepts relevant to the present research. We are grateful to the Office of Naval Research for support under Grants N00014-08-1-1097 (ONR6.1), N00014-09-1-0676 (Science of Autonomy—A-MISSION), N00014-11-1-0337 (ATL), and N00014-08-1-0586 (QPE). PFJL is also thankful to Sea Grant at MIT for the “2009 Doherty Professorship in Ocean Utilization” Award.

APPENDIX A

EM Algorithm (with Gaussian Mixture Models)

The EM algorithm is commonly introduced in the context of “incomplete data” (Dempster et al. 1977), for which ML parameter estimation by partial differentiation [(9)] fails to yield a closed-form solution. To circumvent this, the main idea is to artificially “complete” the data at hand with additional pseudo data (or knowledge about the data), thereby giving rise to closed-form solutions for the ML parameters (McLachlan and Peel 2000). The data with which to complete the existing dataset are chosen by the user and may have little physical relevance; their choice, however, ultimately dictates the efficiency of the algorithm. By conditioning the complete data on the available data, the algorithm iteratively obtains improved estimates of the ML parameters. This procedure lies at the heart of the EM algorithm.

For the case of GMMs, we augment the available dataset, represented by the set of ensemble realizations, {x} = {x1, …, xN}, to form the complete dataset,
$$\{z\} = \{(x_1, c_1), \ldots, (x_N, c_N)\}, \tag{A1}$$
where ci represents an indicator vector of length M such that
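$$(c_i)_j = \begin{cases} 1 & \text{if realization } x_i \text{ was generated by mixture component } j, \\ 0 & \text{otherwise,} \end{cases}$$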
with (ci)j referring to the jth element of vector ci. (Here, these membership indicators have little physical relevance and exist merely as a conceptual device within the EM framework.) Conditioned on the additional knowledge of the set {c} = {c1, …, cN}, we know the origin of each realization, namely, the mixture component that generated it. This knowledge gives rise to closed-form solutions for the ML estimator of the parameter vector, specifically the following:
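$$\pi_j = \frac{N_j}{N}, \tag{A2}$$
$$\bar{x}_j = \frac{1}{N_j}\sum_{i=1}^{N} (c_i)_j\, x_i, \tag{A3}$$
$$P_j = \frac{1}{N_j}\sum_{i=1}^{N} (c_i)_j\, (x_i - \bar{x}_j)(x_i - \bar{x}_j)^{\mathrm{T}}, \tag{A4}$$
where
$$N_j = \sum_{i=1}^{N} (c_i)_j. \tag{A5}$$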
With the addition of the dataset {c} = {c1, …, cN}, we have thus completed the data vector (i.e., in some sense, we pretend that we know which mixture component generated each realization, so as to get the EM iterations started). In the actual EM algorithm, however, a realization is not hard-wired to a particular mixture component, as done above. Rather, the algorithm iteratively estimates the weights with which a given realization is associated with each of the M mixture components.
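The closed-form complete-data estimates (A2)–(A5) can be illustrated with a minimal one-dimensional sketch. It assumes scalar realizations and one-hot indicator vectors supplied by the caller, uses a hypothetical function name, and requires every component to own at least one realization (otherwise its count in (A5) is zero).

```python
from typing import List, Tuple


def complete_data_mle(
    x: List[float], c: List[List[int]]
) -> Tuple[List[float], List[float], List[float]]:
    """ML weights, means, and variances when each realization's component is known.

    x[i] is a scalar realization; c[i] is its one-hot indicator vector of length M.
    """
    n, m = len(x), len(c[0])
    counts = [sum(ci[j] for ci in c) for j in range(m)]          # N_j
    weights = [counts[j] / n for j in range(m)]                   # pi_j = N_j / N
    means = [sum(ci[j] * xi for xi, ci in zip(x, c)) / counts[j] for j in range(m)]
    variances = [
        sum(ci[j] * (xi - means[j]) ** 2 for xi, ci in zip(x, c)) / counts[j]
        for j in range(m)
    ]                                                             # 1-D analogue of P_j
    return weights, means, variances


x = [-2.0, -1.0, 0.0, 4.0, 6.0]
c = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
weights, means, variances = complete_data_mle(x, c)
```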
In what follows, to avoid lengthy expressions, we omit random-variable subscripts when describing pdfs, with the understanding that their arguments are realizations of the corresponding random variable. For instance, in
$$p(x) \equiv p_X(x), \tag{A6}$$
x is a realization of the random variable X.

a. Derivation of EM algorithm

We let {x} = {x1, …, xN} denote the set of available data, {z} the complete data vector and θ = {θ1, …, θM} the set of parameters (to be determined) of the chosen distributional form, p({z}; θ). We further assume, as is often the case, that the available data are a unique and deterministic function of the complete data, that is, {x} = g({z}). (For instance, this may simply be a subset of the complete data.) By the total probability theorem (e.g., Bertsekas and Tsitsiklis 2008), we may thus write the following:
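$$p(\{z\};\theta) = p(\{x\},\{z\};\theta) = p(\{z\}\mid\{x\};\theta)\, p(\{x\};\theta), \tag{A7}$$
$$p(\{x\};\theta) = \frac{p(\{z\};\theta)}{p(\{z\}\mid\{x\};\theta)}. \tag{A8}$$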
By taking logarithms (see footnote A1), we consequently obtain, for any value of {z} that satisfies {x} = g({z}),
$$\log\{p(\{x\};\theta)\} = \log\{p(\{z\};\theta)\} - \log\{p(\{z\}\mid\{x\};\theta)\}. \tag{A9}$$
By further taking expectations with respect to the complete data, conditioned on the available data and parameterized by an arbitrary parameter vector θ(k) (to be specified below),
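$$\mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[f(\{Z\})\right] = \int f(\{z\})\, p(\{z\}\mid\{x\};\theta^{(k)})\, d\{z\}, \tag{A10}$$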
the left-hand side of (A9) remains unaffected,
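$$\mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\{p(\{x\};\theta)\}\right] = \log\{p(\{x\};\theta)\}, \tag{A11}$$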
and we thus obtain
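$$\log\{p(\{x\};\theta)\} = \mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\{p(\{Z\};\theta)\}\right] - \mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\{p(\{Z\}\mid\{x\};\theta)\}\right]. \tag{A12}$$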
For the sake of convenience, we define the notation
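$$U(\theta;\theta^{(k)}) \equiv \mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\{p(\{Z\};\theta)\}\right], \tag{A13}$$
$$V(\theta;\theta^{(k)}) \equiv -\mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\{p(\{Z\}\mid\{x\};\theta)\}\right], \tag{A14}$$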
to obtain the simplified expression
$$\log\{p(\{x\};\theta)\} = U(\theta;\theta^{(k)}) + V(\theta;\theta^{(k)}). \tag{A15}$$
By application of Gibbs’ inequality (MacKay 2003), we see that
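$$V(\theta;\theta^{(k)}) - V(\theta^{(k)};\theta^{(k)}) = \mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\left\{\frac{p(\{Z\}\mid\{x\};\theta^{(k)})}{p(\{Z\}\mid\{x\};\theta)}\right\}\right] \tag{A16}$$
$$= D_{\mathrm{KL}}\left(p(\{Z\}\mid\{x\};\theta^{(k)}) \,\big\|\, p(\{Z\}\mid\{x\};\theta)\right) \tag{A17}$$
$$\geq 0. \tag{A18}$$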
Therefore, if we denote θ(k) as our present estimate for the parameter vector and choose θ(k+1) such that it further satisfies U(θ(k+1); θ(k)) ≥ U(θ(k); θ(k)), we guarantee that
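$$\log\{p(\{x\};\theta^{(k+1)})\} = U(\theta^{(k+1)};\theta^{(k)}) + V(\theta^{(k+1)};\theta^{(k)}) \tag{A19}$$
$$\geq U(\theta^{(k)};\theta^{(k)}) + V(\theta^{(k)};\theta^{(k)}) \tag{A20}$$
$$= \log\{p(\{x\};\theta^{(k)})\}. \tag{A21}$$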
Consequently, upon repeated iterations, the sequence of parameter estimates never decreases the (log) likelihood of generating the data at hand, {x} = {x1, …, xN}. Assuming further that the likelihood is bounded from above, the sequence of likelihood values is thus guaranteed to converge; under mild regularity conditions, the parameter estimates converge to a stationary point of the likelihood (typically a local maximum), which serves as our estimate of the ML parameter vector (Casella and Berger 2001). In summary, the EM algorithm proceeds as follows.
EM algorithm

Given the available data {x} = {x1, …, xN}, an initial parameter estimate θ(0), and a proposed complete data vector {z} with a predetermined, user-specified distribution p({z}; θ), repeat until convergence:

  • Using the present parameter estimate θ(k), form
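    $$U(\theta;\theta^{(k)}) = \mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\{p(\{Z\};\theta)\}\right]; \tag{A22}$$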
  • Update the estimate for the parameter vector, θ(k+1), by maximizing U(θ; θ(k)):
    $$\theta^{(k+1)} = \arg\max_{\theta}\, U(\theta;\theta^{(k)}). \tag{A23}$$
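A minimal sketch of this loop for a one-dimensional GMM, with the E and M steps written out explicitly, also records the log likelihood after every iteration so that the monotonicity property (A19)–(A21) can be checked numerically. It is an illustration under stated assumptions (scalar data, a fixed iteration count in place of a convergence test, synthetic data from a seeded generator, hypothetical function names), not the authors' implementation.

```python
import math
import random


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, w, mu, var) -> float:
    """log p({x}; theta) for a 1-D GMM: sum_i log sum_j w_j N(x_i; mu_j, var_j)."""
    return sum(
        math.log(sum(wj * normal_pdf(x, mj, vj) for wj, mj, vj in zip(w, mu, var)))
        for x in xs
    )


def em_gmm_1d(xs, w, mu, var, iterations=50):
    m, n = len(w), len(xs)
    history = [log_likelihood(xs, w, mu, var)]
    for _ in range(iterations):
        # E step: tau[i][j] = P(component j | x_i; theta^(k)), which determines U(theta; theta^(k)).
        tau = []
        for x in xs:
            joint = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(m)]
            total = sum(joint)
            tau.append([v / total for v in joint])
        # M step: closed-form maximizer of U(theta; theta^(k)) for a GMM.
        counts = [sum(t[j] for t in tau) for j in range(m)]
        w = [counts[j] / n for j in range(m)]
        mu = [sum(t[j] * x for t, x in zip(tau, xs)) / counts[j] for j in range(m)]
        var = [sum(t[j] * (x - mu[j]) ** 2 for t, x in zip(tau, xs)) / counts[j] for j in range(m)]
        history.append(log_likelihood(xs, w, mu, var))
    return w, mu, var, history


random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(200)] + [random.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances, history = em_gmm_1d(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

The list `history` should be nondecreasing up to floating-point rounding, as guaranteed by (A19)–(A21).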

Next, we apply the EM algorithm to multivariate GMMs. We provide the derivation in a condensed manner; we refer to Sondergaard (2011) for full details.

b. The EM algorithm with Gaussian Mixture Models (GMMs)

We augment the available dataset, {x} = {x1, …, xN}, generated by a GMM of unknown parameters,
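$$p(x;\theta) = \sum_{j=1}^{M} \pi_j\, \mathcal{N}(x;\bar{x}_j,P_j), \qquad \theta_j = \{\pi_j, \bar{x}_j, P_j\}, \tag{A24}$$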
to form the complete dataset,
$$\{z\} = \{(x_1, c_1), \ldots, (x_N, c_N)\}, \tag{A25}$$
as described in (A1).
By the assumed independence of the data, the probability distribution for the complete data takes the following form:
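$$p(\{z\};\theta) = \prod_{i=1}^{N} p(x_i, c_i;\theta) \tag{A26}$$
$$= \prod_{i=1}^{N}\prod_{j=1}^{M} \left[\pi_j\, \mathcal{N}(x_i;\bar{x}_j,P_j)\right]^{(c_i)_j}. \tag{A27}$$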
Upon taking logarithms we obtain
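$$\log\{p(\{z\};\theta)\} = \sum_{i=1}^{N}\sum_{j=1}^{M} (c_i)_j \left[\log\{\pi_j\} + \log\{\mathcal{N}(x_i;\bar{x}_j,P_j)\}\right]. \tag{A28}$$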
By further taking the conditional expectation of (A28) given the available data, with the conditional distribution parameterized by an arbitrary vector θ(k), we obtain the expression to be maximized under the EM algorithm:
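$$U(\theta;\theta^{(k)}) = \mathbb{E}_{\{Z\}\mid\{x\};\theta^{(k)}}\left[\log\{p(\{Z\};\theta)\}\right] \tag{A29}$$
$$= \sum_{i=1}^{N}\sum_{j=1}^{M} \mathbb{E}\left[(C_i)_j \mid x_i;\theta^{(k)}\right] \left[\log\{\pi_j\} + \log\{\mathcal{N}(x_i;\bar{x}_j,P_j)\}\right]. \tag{A30}$$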
For convenience of notation, we define the following:
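$$\tau_{i,j}^{(k)} \equiv \mathbb{E}\left[(C_i)_j \mid x_i;\theta^{(k)}\right] = \Pr\left((C_i)_j = 1 \mid x_i;\theta^{(k)}\right) \tag{A31}$$
$$= \frac{\pi_j^{(k)}\, \mathcal{N}(x_i;\bar{x}_j^{(k)},P_j^{(k)})}{\sum_{m=1}^{M} \pi_m^{(k)}\, \mathcal{N}(x_i;\bar{x}_m^{(k)},P_m^{(k)})}. \tag{A32}$$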
This completes the E step of the EM algorithm (A22).
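The E step thus reduces to computing, for every realization and component, the posterior membership probability τi,j, that is, the conditional expectation of the indicator (Ci)j. A minimal pure-Python sketch for d-dimensional realizations follows. It assumes symmetric positive definite covariances, evaluates Gaussian log densities through a Cholesky factor, and normalizes in log space to avoid underflow; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a; `a` must be symmetric positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def gauss_logpdf(x: Sequence[float], mean: Sequence[float], cov: Matrix) -> float:
    """log N(x; mean, cov) using a Cholesky factor of cov."""
    d = len(x)
    low = cholesky(cov)
    y: List[float] = []
    for i in range(d):  # forward substitution: solve L y = x - mean
        r = (x[i] - mean[i]) - sum(low[i][k] * y[k] for k in range(i))
        y.append(r / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    # y^T y equals the Mahalanobis term (x - mean)^T cov^{-1} (x - mean).
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))


def responsibilities(xs, weights, means, covs) -> Matrix:
    """tau[i][j]: posterior probability that realization i came from component j."""
    tau: Matrix = []
    for x in xs:
        logs = [math.log(w) + gauss_logpdf(x, mu, c) for w, mu, c in zip(weights, means, covs)]
        top = max(logs)  # shift by the maximum (log-sum-exp) to avoid underflow
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        tau.append([u / total for u in unnorm])
    return tau


identity = [[1.0, 0.0], [0.0, 1.0]]
xs = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
tau = responsibilities(xs, [0.5, 0.5], [[0.0, 0.0], [10.0, 10.0]], [identity, identity])
```

The third realization lies midway between two equally weighted, equally shaped components, so its memberships split evenly.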
We proceed with evaluating θ(k+1), the parameter vector θ that maximizes U(θ; θ(k)). This forms the M step of the EM algorithm (A23). To determine the updated mixture weights, which must sum to one, we augment U(θ; θ(k)) with a Lagrange multiplier λ enforcing this constraint and so introduce the auxiliary function Λ:
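$$\Lambda(\theta,\lambda;\theta^{(k)}) = U(\theta;\theta^{(k)}) + \lambda\left(\sum_{j=1}^{M} \pi_j - 1\right). \tag{A33}$$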
By setting to zero the partial derivatives of Λ with respect to πp and λ, we obtain, after some manipulation, the final expression:
$$\pi_p^{(k+1)} = \frac{N_p^{(k)}}{N}, \qquad N_p^{(k)} = \sum_{i=1}^{N} \tau_{i,p}^{(k)}, \tag{A34}$$
where Np(k) is the sum total of membership probabilities (the effective number of particles) associated with mixture component p under the present parameter estimate θ(k). With this, we proceed to determine the remaining, unconstrained parameters: the mixture mean vectors and covariance matrices. To obtain the updated mixture mean vectors, we set the corresponding partial derivative of Λ to zero:
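$$\frac{\partial \Lambda}{\partial \bar{x}_p} = \sum_{i=1}^{N} \tau_{i,p}^{(k)}\, P_p^{-1} (x_i - \bar{x}_p) = 0, \tag{A35}$$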
to obtain
$$\bar{x}_p^{(k+1)} = \frac{1}{N_p^{(k)}} \sum_{i=1}^{N} \tau_{i,p}^{(k)}\, x_i. \tag{A36}$$
Similarly, to obtain the updated mixture covariance matrices, we enforce (with the updated mean vectors known)
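$$\frac{\partial \Lambda}{\partial P_p} = \frac{1}{2} \sum_{i=1}^{N} \tau_{i,p}^{(k)} \left[P_p^{-1} (x_i - \bar{x}_p^{(k+1)})(x_i - \bar{x}_p^{(k+1)})^{\mathrm{T}} P_p^{-1} - P_p^{-1}\right] = 0 \tag{A37}$$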
to ultimately arrive at
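$$P_p^{(k+1)} = \frac{1}{N_p^{(k)}} \sum_{i=1}^{N} \tau_{i,p}^{(k)}\, (x_i - \bar{x}_p^{(k+1)})(x_i - \bar{x}_p^{(k+1)})^{\mathrm{T}}. \tag{A38}$$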
This completes the condensed derivation of the EM algorithm as applied to GMMs. The algorithm is summarized in the main body of the text in (10)–(13). For additional remarks on the EM algorithm and its application to GMMs, including the choice of starting parameters and the issue of convergence, we refer to Sondergaard (2011).
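Given the membership probabilities τi,p from the E step (the posterior probability that realization i belongs to component p), the M-step updates for the weights, means, and covariances are weighted counts, weighted means, and weighted scatter matrices. The sketch below implements them in pure Python under stated assumptions (hypothetical names, every component with a nonzero total membership). With one-hot memberships it reduces to the complete-data estimates (A2)–(A5), which the usage example exercises.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def m_step(xs: List[List[float]], tau: Matrix) -> Tuple[List[float], Matrix, List[Matrix]]:
    """Updated weights, means, and covariances from memberships tau[i][p]."""
    n, d, m = len(xs), len(xs[0]), len(tau[0])
    weights: List[float] = []
    means: Matrix = []
    covs: List[Matrix] = []
    for p in range(m):
        n_p = sum(tau[i][p] for i in range(n))           # total membership N_p^(k)
        weights.append(n_p / n)
        mu = [sum(tau[i][p] * xs[i][a] for i in range(n)) / n_p for a in range(d)]
        cov = [
            [sum(tau[i][p] * (xs[i][a] - mu[a]) * (xs[i][b] - mu[b]) for i in range(n)) / n_p
             for b in range(d)]
            for a in range(d)
        ]                                                # scatter about the updated mean
        means.append(mu)
        covs.append(cov)
    return weights, means, covs


xs = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # one-hot memberships
weights, means, covs = m_step(xs, hard)
```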

APPENDIX B

Variations of the GMM-DO Filter

a. EM algorithm in q-dominant space of stochastic subspace

Estimating and manipulating nontrivial pdfs in high-dimensional spaces can be a difficult task (Bengtsson et al. 2003). Heuristic arguments suggest that the number of realizations required to accurately represent multivariate pdfs grows exponentially with the dimension of the space (Silverman 1992). This is one of the reasons why we investigate approximations to our main scheme that would allow efficient fitting of GMMs to realizations when the dimension of the stochastic subspace itself is large enough to pose a difficulty. Another reason arises from oceanic and atmospheric applications. In such applications, the variance of the ESSE or DO modes is often found to decay rapidly with mode number (e.g., Lermusiaux 1999a,b, 2001, 2007; Sapsis and Lermusiaux 2011). In addition, the low-variance modes are less accurate than the large-variance modes, mainly because of their much smaller variance, their proximity to the truncation index, and hence their unmodeled interactions with the truncated modes. As a result, fitting all structures of the marginal pdfs of these low-variance modes is likely unnecessary and can in fact reduce the robustness of the Bayesian inversion. Finally, such a reduction lowers the computational cost.

As in the main text, we let the dimension of the stochastic subspace be s (i.e., X̃ ∈ ℝn×s). When deemed necessary on the grounds of tractability and mode variance decay, we can limit our estimation of mixtures to the stochastic coefficients associated with the space defined by the q most dominant modes, denoting this space X̃q ∈ ℝn×q. We in turn approximate the stochastic coefficients of the remaining s − q modes, {Φq+1, …, Φs}, as zero-mean Gaussian with (co)variances based on the sample covariance matrix. For our purposes, an obvious and appropriate measure of dominance is the variance of each of the stochastic coefficients.
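Selecting q from the variance of each stochastic coefficient can be sketched as follows. The sketch assumes that the criterion is a cumulative-variance fraction (playing the role of the user-specified constant C introduced below, called `fraction` here), that the coefficients are already ordered by decreasing variance, and that sample variances with the N − 1 normalization are acceptable; all names are hypothetical.

```python
import statistics
from typing import List, Sequence


def coefficient_variances(phi: Sequence[Sequence[float]]) -> List[float]:
    """Sample variance of each stochastic coefficient; phi[i] holds the N realizations of Phi_(i+1)."""
    return [statistics.variance(row) for row in phi]


def dominant_dimension(variances: Sequence[float], fraction: float) -> int:
    """Smallest q whose q leading variances hold at least `fraction` of the total variance."""
    if any(a < b for a, b in zip(variances, variances[1:])):
        raise ValueError("coefficients must be ordered by decreasing variance")
    total = sum(variances)
    captured = 0.0
    for q, v in enumerate(variances, start=1):
        captured += v
        if captured >= fraction * total:
            return q
    return len(variances)


q = dominant_dimension([5.0, 3.0, 1.0, 0.5, 0.5], 0.85)   # 8.0 < 8.5 <= 9.0, so q = 3
```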

Next, we define this modified EM algorithm for GMMs in a q-dominant space.

EM algorithm in q-dominant space of stochastic subspace
Given the set of realizations, {φ} ∈ ℝs×N, associated with the stochastic subspace, X̃ ∈ ℝn×s, we limit our attention to the ensemble set, {φq} ∈ ℝq×N, associated with the q-dominant reduced space, X̃q ∈ ℝn×q, of the stochastic subspace (i.e., q ≤ s). We define q such that the following holds:
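$$\frac{\sum_{i=1}^{q} \operatorname{var}(\Phi_i)}{\sum_{i=1}^{s} \operatorname{var}(\Phi_i)} \geq C, \tag{B1}$$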
where C denotes a user-specified constant chosen such that the majority of the energy in the stochastic subspace is captured. [Note, we assume that the stochastic coefficients, Φi, are ordered by decreasing variance, i.e., var(Φ1) ≥ var(Φ2) ≥ … ≥ var(Φs). Other ratios are also possible (e.g., Lermusiaux 2007; Sapsis and Lermusiaux 2011).]
Based on the reduced ensemble set, {φq}, and an initial parameter estimate,
eq22
appropriately sized for the reduced EM estimation procedure, we repeat until convergence:
  • For all i ∈ {1, …, N} and j ∈ {1, …, M}, use the present parameter estimate, θq,(k), to form
    eb2
  • For all j ∈ {1, …, M}, update the parameter estimate, θq,(k+1), according to
    eb3
    eb4
    eb5
    where
    eb6
Once converged, we obtain the GMM associated with the stochastic subspace, X̃ ∈ ℝn×s, by embedding the above q-dominant vectors and matrices into their adequately sized equivalents:
eb7
and
eb8
where the s × s sample covariance matrix of the realizations is
eb9
and the subscript a:b,c:d denotes the submatrix of this sample covariance matrix spanning rows a through b and columns c through d.

In the above, we arrive at (B7) and (B8) by application of the law of iterated expectations and the law of total variance, respectively (e.g., Bertsekas and Tsitsiklis 2008), ensuring that the stochastic coefficients, {Φq+1, …, Φs}, are approximated as zero mean Gaussian distributions with variances based on the sample covariance matrix.
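The embedding step and the role of the law of total variance can be illustrated with a small sketch. It pads each component mean with zeros for the s − q truncated coefficients and copies the corresponding lower-right block of the sample covariance matrix into each component covariance. As an assumption of this sketch only, the cross-covariance blocks between dominant and truncated coefficients are set to zero, which keeps each component covariance positive semidefinite; the exact form of (B8) may differ. The usage example confirms that the mixture covariance, computed by the law of total variance, reproduces the sample (co)variances of the truncated coefficients. Names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def embed_component(mean_q: Sequence[float], cov_q: Matrix, sample_cov: Matrix) -> Tuple[List[float], Matrix]:
    """Embed a q-dimensional mixture component into the s-dimensional stochastic subspace."""
    q, s = len(mean_q), len(sample_cov)
    mean = list(mean_q) + [0.0] * (s - q)        # truncated coefficients get zero mean
    cov = [[0.0] * s for _ in range(s)]          # ASSUMPTION: cross blocks stay zero
    for a in range(q):
        for b in range(q):
            cov[a][b] = cov_q[a][b]              # fitted q-dominant block
    for a in range(q, s):
        for b in range(q, s):
            cov[a][b] = sample_cov[a][b]         # truncated block from the sample covariance
    return mean, cov


def mixture_covariance(weights, means, covs) -> Matrix:
    """Law of total variance: sum_j w_j P_j + sum_j w_j (m_j - m)(m_j - m)^T."""
    s = len(means[0])
    m = [sum(w * mu[a] for w, mu in zip(weights, means)) for a in range(s)]
    return [
        [sum(w * (c[a][b] + (mu[a] - m[a]) * (mu[b] - m[b])) for w, mu, c in zip(weights, means, covs))
         for b in range(s)]
        for a in range(s)
    ]


sample_cov = [[1.2, 0.1, 0.0], [0.1, 0.3, 0.05], [0.0, 0.05, 0.1]]   # s = 3
weights = [0.5, 0.5]                                                  # q = 1 fitted mixture
parts = [embed_component([-1.0], [[0.2]], sample_cov), embed_component([1.0], [[0.2]], sample_cov)]
means = [mu for mu, _ in parts]
covs = [c for _, c in parts]
total = mixture_covariance(weights, means, covs)
```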

b. EM algorithm with a constrained mean for the Gaussian Mixture Model

In the DO decomposition (24), we impose a zero-mean constraint on the random vector, Φ(ω), represented by the ensemble set, {φ} = {φ1, …, φN}. Since the EM algorithm is an unconstrained optimization procedure in this regard, however, the EM fit of the GMM may not necessarily itself be of zero mean:
eb10
While the test cases presented in Part II of this two-part paper suggest that this is of little concern (namely, that this mean offset is negligible and tends to zero as N increases), we nonetheless propose two possible remedies:
  1. When forming the auxiliary function in (A33), one may add the constraint that the GMM be of zero mean:
    eb11
    thus updating the auxiliary function (in the stochastic subspace) to
    eb12
    While this clearly provides a viable solution, a closer inspection reveals that such a constraint destroys the simplicity of the EM algorithm. In particular, the closed-form equations (11)–(13) for the updated mixture parameters no longer arise. Rather, the GMM parameters to be optimized become intimately coupled.
  2. A complementary approach first estimates the parameter vector by means of our regular EM algorithm for GMMs. This estimate is then fed as a first guess to the coupled set of equations obtained in 1) above, for which an iteration procedure of choice may be used. Since, in our experience, this first guess is good for sufficiently large N, we expect that only a few iterations are needed to converge to an optimal set of parameter values satisfying the additional zero-mean constraint.
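To make the offset in (B10) concrete, the following minimal sketch applies the standard EM-GMM weight and mean updates to subspace realizations φi with membership probabilities τi,j (the posterior probability that realization i belongs to component j) and evaluates the mixture mean Σj πj μj, where μj is a hypothetical name for the component means. Substituting πj = Nj/N and μj = (1/Nj) Σi τi,j φi gives Σj πj μj = (1/N) Σi φi Σj τi,j = (1/N) Σi φi, because the memberships of each realization sum to one. In this setting, the offset left after an M step therefore equals the sample mean of the realizations, which the sketch checks numerically. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def m_step_weights_and_means(phis: Sequence[Sequence[float]], tau: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """EM weight and mean updates for subspace realizations phis[i] with memberships tau[i][j]."""
    n, d, m = len(phis), len(phis[0]), len(tau[0])
    counts = [sum(row[j] for row in tau) for j in range(m)]
    weights = [counts[j] / n for j in range(m)]
    means = [[sum(tau[i][j] * phis[i][a] for i in range(n)) / counts[j] for a in range(d)] for j in range(m)]
    return weights, means


def mixture_mean(weights: Sequence[float], means: Sequence[Sequence[float]]) -> List[float]:
    """Mean of the GMM, sum_j pi_j mu_j: the quantity compared against zero in (B10)."""
    d = len(means[0])
    return [sum(w * mu[a] for w, mu in zip(weights, means)) for a in range(d)]


phis = [[1.0, -2.0], [0.5, 0.5], [-3.0, 1.0], [2.0, 0.0]]
tau = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8], [0.6, 0.4]]   # any rows summing to one
offset = mixture_mean(*m_step_weights_and_means(phis, tau))
sample_mean = [sum(p[a] for p in phis) / len(phis) for a in range(2)]

# An ensemble with exactly zero sample mean yields a zero offset after the M step.
centered = [[p[a] - sample_mean[a] for a in range(2)] for p in phis]
offset_centered = mixture_mean(*m_step_weights_and_means(centered, tau))
```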

REFERENCES

  • Alspach, D. L., and H. W. Sorenson, 1972: Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Trans. Autom. Control, 17, 438–448.

  • Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758.

  • Auclair, F., P. Marsaleix, and P. De Mey, 2003: Space-time structure and dynamics of the forecast error in a coastal circulation model of the Gulf of Lions. Dyn. Atmos. Oceans, 36, 309–346.

  • Bengtsson, T., C. Snyder, and D. Nychka, 2003: Toward a nonlinear ensemble filter for high-dimensional systems. J. Geophys. Res., 108, 8775, doi:10.1029/2002JD002900.

  • Bennett, A., 1992: Inverse Methods in Physical Oceanography. Cambridge University Press, 346 pp.

  • Bennett, A., 2002: Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press, 234 pp.

  • Bertsekas, D. P., and J. N. Tsitsiklis, 2008: Introduction to Probability. 2nd ed. Athena Scientific, 544 pp.

  • Bishop, C. M., 2006: Pattern Recognition and Machine Learning. Springer, 738 pp.

  • Bocquet, M., C. A. Pires, and L. Wu, 2010: Beyond Gaussian statistical modeling in geophysical data assimilation. Mon. Wea. Rev., 138, 2997–3023.

  • Casella, G., and R. L. Berger, 2001: Statistical Inference. 2nd ed. Duxbury, 660 pp.

  • Chen, R., and J. S. Liu, 2000: Mixture Kalman filters. J. Roy. Stat. Soc., 62B, 493–508.

  • Cover, T. M., and J. A. Thomas, 2006: Elements of Information Theory. Wiley-Interscience, 748 pp.

  • Commission on Physical Sciences, Mathematics, and Applications, 1993: Statistics and Physical Oceanography. The National Academies Press, 62 pp.

  • Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

  • Dee, D. P., and A. M. da Silva, 2003: The choice of variable for atmospheric moisture analysis. Mon. Wea. Rev., 131, 155–171.

  • Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc., 39B, 1–38.

  • Le Dimet, F.-X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations. Tellus, 38A, 97–110.

  • Doucet, A., N. de Freitas, and N. Gordon, 2001: Sequential Monte-Carlo Methods in Practice. Springer-Verlag, 612 pp.

  • Dovera, L., and E. Della Rossa, 2011: Multimodal ensemble Kalman filtering using Gaussian mixture models. Comput. Geosci., 15, 307–323.

  • Duda, R. O., P. E. Hart, and D. G. Stork, 2001: Pattern Classification. 2nd ed. Wiley-Interscience, 654 pp.

  • Eisenberger, I., 1964: Genesis of bimodal distributions. Technometrics, 6, 357–363.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte-Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.

  • Evensen, G., 2007: Data Assimilation: The Ensemble Kalman Filter. Springer, 279 pp.

  • Eyink, G. L., and S. Kim, 2006: A maximum entropy method for particle filtering. J. Stat. Phys., 123, 1071–1128.

  • Frei, M., and H. R. Künsch, 2013: Mixture ensemble Kalman filters. Comput. Stat. Data Anal., 58, 127–138.

  • Gelb, A., 1974: Applied Optimal Estimation. MIT Press, 374 pp.

  • Ghanem, R., and P. Spanos, 1991: Stochastic Finite Elements: A Spectral Approach. Springer-Verlag, 214 pp.

  • Ghil, M., and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography. Advances in Geophysics, Vol. 33, Academic Press, 141–266.

  • Holmes, P., J. Lumley, and G. Berkooz, 1996: Turbulence, Coherent Structures, Dynamical Systems, and Symmetry. Cambridge University Press, 420 pp.

  • Hoteit, I., D. T. Pham, G. Triantafyllou, and G. Korres, 2008: A new approximate solution of the optimal nonlinear filter for data assimilation in meteorology and oceanography. Mon. Wea. Rev., 136, 317–334.

  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.

  • Ide, K., P. Courtier, M. Ghil, and A. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential and variational. J. Meteor. Soc. Japan, 75, 181–189.

  • Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.

  • Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. Trans. ASME, 82D, 35–45.

  • Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, 341 pp.

  • Kim, S., G. L. Eyink, J. M. Restrepo, F. J. Alexander, and G. Johnson, 2009: Ensemble filtering for nonlinear dynamics. Mon. Wea. Rev., 131, 2586–2594.

  • Kotecha, J. H., and P. A. Djuric, 2003: Gaussian particle filtering. IEEE Trans. Signal Process., 51, 2592–2601.

  • Krause, P., and J. M. Restrepo, 2009: The diffusion kernel filter applied to Lagrangian data assimilation. Mon. Wea. Rev., 137, 4386–4400.

  • Lermusiaux, P. F. J., 1997: Data assimilation via error subspace statistical estimation. Ph.D. thesis, Division of Engineering and Applied Sciences, Harvard University, 402 pp.

  • Lermusiaux, P. F. J., 1999a: Data assimilation via error subspace statistical estimation. Part II: Middle Atlantic Bight shelfbreak front simulations. Mon. Wea. Rev., 127, 1408–1432.

  • Lermusiaux, P. F. J., 1999b: Estimation and study of mesoscale variability in the Strait of Sicily. Dyn. Atmos. Oceans, 29, 255–303.

  • Lermusiaux, P. F. J., 2001: Evolving the subspace of the three-dimensional multiscale ocean variability: Massachusetts Bay. J. Mar. Syst., 29, 385–422.

  • Lermusiaux, P. F. J., 2006: Uncertainty estimation and prediction for interdisciplinary ocean dynamics. J. Comput. Phys., 217, 176–199, doi:10.1016/j.jcp.2006.02.010.

  • Lermusiaux, P. F. J., 2007: Adaptive modeling, adaptive data assimilation and adaptive sampling. Physica D, 230, 172–196.

  • Lermusiaux, P. F. J., and A. Robinson, 1999: Data assimilation via error subspace statistical estimation. Part I: Theory and scheme. Mon. Wea. Rev., 127, 1385–1407.

  • Lermusiaux, P. F. J., C.-S. Chiu, and A. R. Robinson, 2002a: Modeling uncertainties in the prediction of the acoustic wavefield in a shelfbreak environment. Theoretical and Computational Acoustics, E.-C. Shang, Q. Li, and T. Gao, Eds., World Scientific Publishing Co., 191–200.

  • Lermusiaux, P. F. J., A. R. Robinson, P. J. Haley, and W. G. Leslie, 2002b: Advanced interdisciplinary data assimilation: Filtering and smoothing via error subspace statistical estimation. Proc. OCEANS 2002 MTS/IEEE Conf., Biloxi, MS, IEEE, 795–802.

  • Lermusiaux, P. F. J., and Coauthors, 2006: Quantifying uncertainties in ocean predictions. Oceanography, 19, 92–105.

  • Lions, J. L., 1971: Optimal Control of Systems Governed by Partial Differential Equations. Springer-Verlag, 396 pp.

  • Lorenz, E., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141.

  • MacKay, D. J. C., 2003: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 628 pp.

  • Malanotte-Rizzoli, P., 1996: Modern Approaches to Data Assimilation in Ocean Modeling. Elsevier, 455 pp.

  • McLachlan, G., and D. Peel, 2000: Finite Mixture Models. John Wiley & Sons, Inc., 419 pp.

  • Miller, R. N., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. J. Atmos. Sci., 51, 1037–1056.

  • Moore, A. M., H. G. Arango, E. Di Lorenzo, B. D. Cornuelle, A. J. Miller, and D. J. Neilson, 2004: A comprehensive ocean prediction and analysis system based on the tangent linear and adjoint of a regional ocean model. Ocean Modell., 7, 227–258, doi:10.1016/j.ocemod.2003.11.001.

  • Papoulis, A., 1965: Probability, Random Variables and Stochastic Processes. McGraw-Hill, 583 pp.

  • Pham, D. T., 2001: Stochastic methods for sequential data assimilation in strongly nonlinear systems. Mon. Wea. Rev., 129, 1194–1207.

  • Robinson, A. R., P. F. J. Lermusiaux, and N. Q. Sloan, 1998: Data assimilation. The Sea, K. H. Brink and A. R. Robinson, Eds., The Global Coastal Ocean: Processes and Methods, Vol. 10, John Wiley and Sons, 541–594.

  • Sapsis, T., 2010: Dynamically orthogonal field equations. Ph.D. thesis, Department of Mechanical Engineering, Massachusetts Institute of Technology, 188 pp.

  • Sapsis, T., and P. F. J. Lermusiaux, 2009: Dynamically orthogonal field equations for continuous stochastic dynamical systems. Physica D, 238, 2347–2360, doi:10.1016/j.physd.2009.09.017.

  • Sapsis, T., and P. F. J. Lermusiaux, 2011: Dynamical criteria for the evolution of the stochastic dimensionality in flows with uncertainty. Physica D, 241, 60–76, doi:10.1016/j.physd.2011.10.001.

  • Schwartz, G. E., 1978: Estimating the dimension of a model. Ann. Stat., 6, 461–464.

  • Silverman, B., 1992: Density Estimation for Statistics and Data Analysis. Chapman & Hall, 175 pp.

  • Smith, K. W., 2007: Cluster ensemble Kalman filter. Tellus, 59A, 749–757.

  • Sobczyk, K., 2001: Information dynamics: Premises, challenges and results. Mech. Syst. Signal Process., 15, 475–498.

  • Sondergaard, T., 2011: Data assimilation with Gaussian mixture models using the dynamically orthogonal field equations. M.S. thesis, Department of Mechanical Engineering, Massachusetts Institute of Technology, 180 pp.

  • Sondergaard, T., and P. F. J. Lermusiaux, 2013: Data assimilation with Gaussian Mixture Models using the Dynamically Orthogonal field equations. Part II: Applications. Mon. Wea. Rev., 141, 1761–1785.

  • Sura, P., 2010: On non-Gaussian SST variability in the Gulf Stream and other strong currents. Ocean Dyn., 60, 155–170.

  • Tarantola, A., 2005: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 342 pp.

  • Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.

  • Ueckermann, M. P., P. F. J. Lermusiaux, and T. P. Sapsis, 2013: Numerical schemes for dynamically orthogonal equations of stochastic fluid and ocean flows. J. Comput. Phys., 233, 272–294, doi:10.1016/j.jcp.2012.08.041.

  • van Leeuwen, P. J., 2009: Particle filtering in geophysical systems. Mon. Wea. Rev., 137, 4089–4114.

  • Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.

  • Wunsch, C., 1996: The Ocean Circulation Inverse Problem. Cambridge University Press, 442 pp.

1

As an aside, in ESSE (Lermusiaux 1999b), the update consists of two parts: data assimilation in a fixed subspace, followed by a correction of the subspace based on the innovation vector and posterior misfit. This results in prior and posterior subspaces that differ. We can generalize this subspace learning scheme to the present Bayesian GMM-DO framework, but this is not done here.

A1

In this appendix, to delimit the argument of logarithms, we use braces, log{·}. These braces for logarithms should not be confused with the braces used to denote a set of realizations as defined in Table 1; for example, {x} still represents a set of realizations in this appendix.
