  • Arulampalam, M. S., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Processing, 50, 174–188.
  • Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724.
  • Cohn, S. E., A. da Silva, J. Guo, M. Sienkiewicz, and D. Lamich, 1998: Assessing the effects of data selection with the DAO physical-space statistical analysis system. Mon. Wea. Rev., 126, 2913–2926.
  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10143–10162.
  • Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367.
  • Frakt, A. B., and A. S. Willsky, 2001: Computationally efficient stochastic realization for internal multiscale autoregressive models. Multidimens. Syst. Signal Processing, 12, 109–142.
  • Gelb, A., 1974: Applied Optimal Estimation. MIT Press, 374 pp.
  • Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process., 140, 107–113.
  • Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790.
  • Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.
  • Irving, W. W., P. W. Fieguth, and A. S. Willsky, 1997: An overlapping tree approach to multiscale stochastic modeling and estimation. IEEE Trans. Image Processing, 6, 1517–1529.
  • Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.
  • Keppenne, C. L., and M. M. Rienecker, 2002: Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model. Mon. Wea. Rev., 130, 2951–2965.
  • Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP: A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203.
  • Margulis, S. A., D. McLaughlin, D. Entekhabi, and S. Dunne, 2002: Land data assimilation and estimation of soil moisture using measurements from the Southern Great Plains 1997 Field Experiment. Water Resour. Res., 38, 1299, doi:10.1029/2001WR001114.
  • McLaughlin, D., 2007: A probabilistic perspective on nonlinear model inversion and data assimilation. Subsurface Hydrology: Data Integration for Properties and Processes, Geophys. Monogr., Vol. 171, Amer. Geophys. Union, 243–253.
  • Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.
  • Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.
  • Popinet, S., 2003: Gerris: A tree-based adaptive solver for the incompressible Euler equations in complex geometries. J. Comput. Phys., 190, 572–600.
  • Reichle, R. H., and R. D. Koster, 2005: Global assimilation of satellite surface soil moisture retrievals into the NASA Catchment land surface model. Geophys. Res. Lett., 32, L02404, doi:10.1029/2004GL021700.
  • Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2002: Hydrologic data assimilation with the ensemble Kalman filter. Mon. Wea. Rev., 130, 103–114.
  • Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.
  • Willsky, A. S., 2002: Multiresolution Markov models for signal and image processing. Proc. IEEE, 90, 1396–1458.
  • Zhou, Y., 2006: Multi-sensor large scale land surface data assimilation using ensemble approaches. Ph.D. thesis, Massachusetts Institute of Technology, 234 pp.
  • Zhou, Y., D. McLaughlin, D. Entekhabi, and V. Chatdarong, 2006: Assessing the performance of the ensemble Kalman filter for land surface data assimilation. Mon. Wea. Rev., 134, 2128–2142.
  • Fig. 1. A multiscale tree with scaling factor q.
  • Fig. 2. Spatial domain for the example.
  • Fig. 3. Forecast velocity (u and υ) correlation coefficients between point (8, 8) and all the grid cells over a 64 × 128 domain at t = 0.84. (top) Sample correlation from an ensemble with 52 replicates. (middle) Correlation derived from the tree model using the same ensemble as in the first row. (bottom) True correlation from an ensemble with 6240 replicates.
  • Fig. 4. Forecast velocity (u and υ) correlation coefficients between cell (88, 56) and all other cells. Arrangement is the same as in Fig. 3.
  • Fig. 5. Filter replicates, ensemble mean, and true velocity u time series at (left) cell (276, 472) and (right) cell (440, 296).
  • Fig. 6. (top half) Ensemble mean of u before and after update at measurement times, and the corresponding true values. (lower half) Ensemble standard deviation of u before and after update.
  • Fig. 7. (top half) Ensemble mean of υ before and after update at measurement times, and the corresponding true values. (lower half) Ensemble standard deviation of υ before and after update.
  • Fig. 8. RMSE of the difference between the ensemble means of u and υ and the corresponding true velocities (averaged over the entire domain).


An Ensemble Multiscale Filter for Large Nonlinear Data Assimilation Problems

Ralph Parsons Laboratory, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts

Abstract

Operational data assimilation problems tend to be very large, both in terms of the number of unknowns to be estimated and the number of measurements to be processed. This poses significant computational challenges, especially for ensemble methods, which are critically dependent on the number of replicates used to derive sample covariances and other statistics. Most efforts to deal with the related problems of computational effort and sampling error in ensemble estimation have focused on spatial localization. The ensemble multiscale Kalman filter described here offers an alternative approach that effectively replaces, at each update time, the prior (or background) sample covariance with a multiscale tree. The tree is composed of nodes distributed over a relatively small number of discrete scales. Global correlations between variables at different locations are described in terms of local relationships between nodes at adjacent scales (parents and children). The Kalman updating process can be carried out very efficiently on such a tree, especially if the update calculations exploit the tree’s parallel structure. In fact, the resulting savings in effort far exceed the additional work required to construct the tree. The tree-identification process offers possibilities for introducing localization in scale, which can be used instead of or in addition to localization in space. The multiscale filter is able to continually adapt to changing problem scales through associated changes in the tree structure. This is illustrated with a large (10^6 unknowns) turbulent fluid flow example that generates dynamic features that span a wide range of time and space scales. This filter is able to track changing features over long distances without any spatial localization, using a moderate ensemble size of 54. The computational savings provided by the multiscale approach, combined with opportunities for hybrid localization over both space and scale, offer significant practical benefits for large data assimilation applications.

Corresponding author address: Dennis McLaughlin, Bldg. 48-329, 15 Vassar Street, Cambridge, MA 02139. Email: dennism@mit.edu


1. Introduction

Environmental data assimilation problems tend to be very large, both in terms of the number of discrete unknowns to be estimated and the number of observations to be processed. The large number of unknowns results from a desire to capture the wide range of time and space scales encountered in many environmental problems. So, for example, efforts to resolve smaller features in ocean circulation models typically lead to finer computational grids with more cells and more unknowns. Until recently, the number of discrete states associated with spatially distributed environmental models tended to be larger than the number of measurements available for data assimilation. The situation has changed dramatically with the widespread availability of networked observing systems and high-resolution remote sensing observations. Now the number of observations to be processed can be as large as or even larger than the number of discrete states.

The combination of many unknowns and many measurements imposes a large computational burden on the probabilistic estimation methods commonly used to solve environmental data assimilation problems. Such methods typically require repeated solutions of the discretized governing equations (i.e., the forward model) and, in some cases, manipulation of very large matrices. Computational effort is an especially important issue for ensemble Kalman filters, which can perform poorly if the number of random replicates in the ensemble is too small but can quickly become computationally infeasible as the number of replicates is increased.

In the ensemble Kalman filter a dynamic forecasting model is used to propagate random replicates of the state vector between measurement times (Evensen 2003). When measurements become available, all of the replicates are updated to reflect the new information gained. The filter updates are derived from sample covariances computed from the propagated ensemble. If the ensemble is too small, sampling errors can lead to incorrect updates. In fact, when sampling errors are significant, measurement updating can actually be counterproductive, since the updated estimation error variance can be greater than the prior variance (Hamill et al. 2001; Lorenc 2003).

The computational and sampling-related difficulties encountered in ensemble filtering have prompted development of a number of methods for simplifying or approximating large problems. Most of these methods divide the original problem into many smaller and more manageable subproblems, using variations on the concept of localization. Spatial localization techniques solve subproblems that focus on nearby states and measurements, relying on the assumption that correlations between variables at different locations should be small beyond a certain characteristic separation distance. Examples of this approach include methods based on covariance filtering with Schur products (Hamill et al. 2001; Houtekamer and Mitchell 2001; Reichle and Koster 2005), methods that perform updates in small blocks of grid cells (Reichle et al. 2002; Margulis et al. 2002; Ott et al. 2004), and hybrid methods that combine covariance filtering with blocking (Keppenne and Rienecker 2002). These localization methods improve computational efficiency while also suppressing the adverse effects of sampling error.

The sample covariances that are explicitly or implicitly modified by localization can be viewed as approximate descriptions of the physical relationships embedded in the forecasting model. If the covariances are more or less arbitrarily changed it is possible that they will no longer properly portray these relationships. A number of researchers have observed and commented on imbalances introduced by localization in meteorological applications (Mitchell et al. 2002; Lorenc 2003). The solution to imbalance problems is to increase the characteristic length scale of the localization procedure. While this improves balance it also reduces the filtering and efficiency gains that make localization attractive.

In effect, the localization scale is a tuning parameter that trades off computational effort, sampling error, and imbalance error. It is likely that the optimum localization scale is application-dependent and, as a result, difficult to determine in advance. This is particularly true for problems where the dominant scales of variability change over time and space, requiring associated changes in the localization scale.

This paper presents a new multiscale approach to ensemble data assimilation that is designed primarily to improve computational efficiency but also provides some new options for introducing localization. Multiscale estimation is based on the concept of replacing the forecast covariance with a multiscale tree that implicitly describes spatial correlations. The primary advantages of the multiscale approach are its computational efficiency and its ability to adaptively localize in both space and scale. As in traditional ensemble Kalman filtering, the multiscale approach provides approximate estimates that may work better in some applications than in others.

In the next section of this paper we provide a brief review of ensemble filtering, primarily to introduce notation. We then introduce relevant multiscale estimation concepts. Next we show how ensemble methods and multiscale estimation can be combined. We illustrate the performance of the resulting ensemble multiscale filter with an example and conclude with a review of the advantages and limitations of the multiscale approach.

2. Ensemble filtering

The uncertainty in both models and measurements is the primary justification for taking a probabilistic approach to environmental estimation problems. Bayesian estimation theory provides a convenient framework for such an approach. Suppose that the system of interest is characterized at time t by the spatially discretized state vector xt. Each element of this vector corresponds to a distributed variable (e.g., pressure, temperature, velocity) evaluated at a particular cell on a fixed computational grid. The Bayesian approach treats xt as a random vector that may be characterized by an unconditional (or prior) joint probability density p(xt).

Now suppose that measurements of the state or other related variables available at time τ are assembled in the vector yτ. Then the conditional density p(xt|y0:T) characterizes everything we know about xt from prior information and from measurements obtained in the interval τ ∈ [0, T] (Jazwinski 1970). It is generally neither feasible nor desirable to derive this multivariate density for large problems. In practice, we focus on particular distributional properties, such as the mean, the mode, and two-point covariances. Nevertheless, it is useful to work with p(xt|y0:T) during problem formulation.

There are a number of ways to estimate properties of p(xt|y0:T), depending on the problem at hand. Here we focus on filtering problems, where estimates are desired at the end of a growing measurement interval (so T is always equal to the current time t). We suppose that the state and measurement vectors are described as follows:
    x_t = f_t(x_{t−1}, u_t),   (1)
    y_t = 𝗛_t x_t + e_t,   (2)
where ut is a vector of random model inputs, which are not necessarily white or additive; et is a measurement noise vector, which we assume to be zero mean and white in time; and xt has a random initial condition x0. The function ft() is derived from a spatially discretized time-dependent model of the system dynamics. To simplify the discussion we make the optional but convenient assumption that the measurements are related linearly to the states, through the measurement matrix 𝗛t. The random initial state, input vector, and measurement noise are all characterized by probability densities, which we assume to be given. We also assume that these random vectors are independent of one another. Errors in model structure are assumed to be represented by auxiliary model inputs included in (1).

The Markovian structure of the state equation in (1) enables us to solve the Bayesian estimation problem recursively. In this case, the process of deriving the probability densities of xt at t divides into two steps: a forecast from time t − 1 to time t and an update at time t. At any given time t > 0 the forecast step derives the forecast density p(xt|y0:t−1) from the previously updated density p(xt−1|y0:t−1). The update step at t derives the new updated density p(xt|y0:t) from p(xt|y0:t−1) and the likelihood p(yt|xt).

In practice, the forecast and updated densities are frequently assumed to be Gaussian so they can be completely characterized by their means and covariances (Gelb 1974). This approach, which is taken in the classical Kalman filter, can fail to capture important system features when the state and/or measurement equations are nonlinear, or when ut, et, or x0 are non-Gaussian (McLaughlin 2007). The ensemble Kalman filter deals with nonlinearity during the forecast step by working with ensembles of randomly generated replicates represented by xjt|t−1 (for the forecast ensemble at t) and xjt|t (for the updated ensemble at t; Arulampalam et al. 2002; Gordon et al. 1993). In this case the nonlinear forecast step may be written as
    x^j_{t|t−1} = f_t(x^j_{t−1|t−1}, u^j_t),   (3)
where xjt−1|t−1 is obtained from the update at time t − 1, and xj0|0 and ujt are synthetically generated replicates drawn from the initial state and input probability densities.
In the ensemble Kalman filter the updated replicates are derived by adjusting the forecast replicates as follows (Burgers et al. 1998; Evensen 1994):
    x^j_{t|t} = x^j_{t|t−1} + 𝗞_t [y_t − ŷ^j_{t|t−1}],   (4)
where ŷjt|t−1 is a measurement prediction replicate defined by
    ŷ^j_{t|t−1} = 𝗛_t x^j_{t|t−1} + e^j_t,   (5)
and 𝗞t is the Kalman gain defined by
    𝗞_t = ĉov(x_{t|t−1}) 𝗛_tᵀ [𝗛_t ĉov(x_{t|t−1}) 𝗛_tᵀ + 𝗥_t]^{−1}.   (6)
The vector ejt is a synthetically generated zero-mean random measurement perturbation, drawn from the specified probability density of the measurement error et. The matrix 𝗥t is the covariance of et.
In this paper the expression ĉov(v, w) indicates a sample estimate of cov(v, w) computed from N replicates of v and w. This sample covariance can be written as a matrix product of the following form:
    ĉov(v, w) = [1/(N − 1)] Ṽ W̃ᵀ,   (7)
where Ṽ is a matrix with column j equal to the mean-removed replicate v^j − v̄ and W̃ is a matrix with column j equal to the mean-removed replicate w^j − w̄. The expressions cov(v) and ĉov(v) are shorthand for cov(v, v) and ĉov(v, v), respectively.

In most ensemble estimation problems N is less than the dimensions of the state and measurement vectors, and the sample covariances used to derive the Kalman gain are rank deficient. However, the matrix inverted in (6) is full rank if 𝗥t is full rank. Also, note that the Kalman gain in the ensemble version of the Kalman filter is a nonlinear function of past measurements since it depends, through the sample covariances, on replicates derived from these measurements.

The linear update of (4) yields an ensemble that converges to the exact p(xt|y0:t) if the state and measurement at t have a Gaussian joint conditional density p(xt, yt|y0:t−1). There are many other versions of the ensemble Kalman filter, including square root forms that do not require the addition of random measurement perturbations in (5). A good review of some of the alternatives is provided by Tippett et al. (2003).
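To make the update concrete, the following NumPy sketch implements the perturbed-observation update of (4)–(6) for a linear measurement operator. All function and variable names are illustrative rather than taken from the paper, and the n × n sample covariance is formed explicitly only for readability; a production code would work with the mean-removed replicate matrices directly.

    import numpy as np

    def enkf_update(X, y, H, R, rng):
        """Perturbed-observation EnKF update; a minimal sketch of Eqs. (4)-(6).

        X : (n, N) forecast replicates x^j_{t|t-1}, one per column
        y : (m,)   measurement vector y_t
        H : (m, n) measurement matrix
        R : (m, m) measurement error covariance
        """
        N = X.shape[1]
        # Measurement prediction replicates with added perturbations e^j_t, Eq. (5)
        E = rng.multivariate_normal(np.zeros(y.size), R, size=N).T      # (m, N)
        Yhat = H @ X + E
        # Sample forecast covariance from mean-removed replicates, Eq. (7)
        Xp = X - X.mean(axis=1, keepdims=True)
        Cxx = Xp @ Xp.T / (N - 1)
        # Kalman gain, Eq. (6); the bracketed matrix is full rank if R is
        K = Cxx @ H.T @ np.linalg.inv(H @ Cxx @ H.T + R)
        # Linear replicate update, Eq. (4)
        return X + K @ (y[:, None] - Yhat)

    # Usage on a toy problem: 100 states, 20 replicates, 10 direct observations
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    H = np.eye(10, 100)
    R = 0.1 * np.eye(10)
    Xa = enkf_update(X, rng.normal(size=10), H, R, rng)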

For nonlinear problems the ensemble Kalman filter is a compromise that makes no restrictive assumptions during the forecast step but makes implicit Gaussian assumptions during the update step. Despite this compromise, experience shows that the ensemble Kalman filter provides an acceptable approximation in many applications. This issue is discussed in more detail in citations provided in Evensen (2003) as well as in Zhou et al. (2006). There is, however, no guarantee that the assumptions made in the ensemble Kalman filter will always be acceptable, and caution must be used when applying this technique to highly nonlinear problems.

As mentioned in the introduction, the ensemble Kalman filter update has some features that can complicate its application to large environmental problems. First, it requires computation and manipulation of large matrices (covariances, their square roots, or matrices of covariance eigenvectors, depending on the particular computational implementation used). Second, it is limited by sampling error, which can be significant when the ensemble is small. The multiscale estimation approach deals with the computational issue by introducing a fast scale-recursive update. This approach may also be able to reduce the adverse effects of sampling error without introducing spatial localization.

3. Multiscale models

a. Multiscale trees

Multiscale trees provide an efficient and convenient way to represent correlations between the elements of very large state vectors, such as those obtained from spatially discretized stochastic models. In a tree representation global correlations are built up from local correlations between nearby tree nodes. To see how this is done we need to introduce a number of definitions.

A multiscale tree model consists of a set of abstract nodes that may be visualized as shown in Fig. 1. Groups of nodes are organized into scales distinguished as separate rows in the tree diagram. The scale with the most nodes (at the bottom) is called the finest scale, while the scale with only one node (the root node at the top) is the coarsest scale. Each node s on the tree is associated with a relatively small nodal state vector χ(s) of dimension n(s). A detailed discussion of multiscale modeling is provided in Willsky (2002).

In a tree model the nodes at any given scale are related indirectly through their connections with common nodes located higher up the tree. As shown in Fig. 1, each internal tree node is connected to a parent and to several children. The parent–child connections are indicated graphically by lines on the tree diagram. For present purposes we suppose that every node (except a finest-scale node) has q children. We represent the children of s by sα1, sα2, . . . , sαq. Also, every node (except the root node) has a single parent, denoted sγ. The index m(s) indicates the scale of node s (i.e., the row on the tree diagram containing s). This index increases from 0 at the top of the tree (coarsest scale) to M at the bottom of the tree (finest scale). The subset of finest-scale nodes that descend from node s is indicated by T(s) and the corresponding vector of finest-scale states is χM(s). The set of all nodes at the finest scale is TM and the corresponding vector of all finest-scale states is χM. The subtree rooted at node s (i.e., the set composed of s and all its descendants) is indicated by S(s).
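Because the recursions that follow are phrased entirely in terms of parents, children, and finest-scale descendants, a concrete indexing helps fix ideas. The Python sketch below uses a hypothetical (scale, position) labeling of a regular q-ary tree; none of these helper names come from the paper.

    # Hypothetical indexing for the q-ary tree of Fig. 1: node s = (m, k), with
    # scale m = 0 (root) to M (finest) and within-scale position k = 0 .. q**m - 1.
    Q, M = 4, 3                              # q children per node, M + 1 scales

    def parent(s):                           # the single parent s-gamma (root excluded)
        m, k = s
        return (m - 1, k // Q)

    def children(s):                         # the q children s-alpha-1 .. s-alpha-q
        m, k = s
        return [(m + 1, Q * k + i) for i in range(Q)]

    def finest_descendants(s):               # T(s): the finest-scale nodes below s
        nodes = [s]
        while nodes[0][0] < M:
            nodes = [c for n in nodes for c in children(n)]
        return nodes

    root = (0, 0)
    assert all(parent(c) == root for c in children(root))
    assert len(finest_descendants(root)) == Q ** M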

In the applications considered here a multiscale tree is used to approximate conditional correlations between elements of the global state vector xt at a particular update time t. This can be done if the tree structure is appropriately defined. We assume that the tree topology (e.g., the geometric arrangement of nodes on the tree) is given but the definitions of the states at the tree nodes are to be determined. The process of identifying the nodal states above the finest scale is discussed in detail in the next section. For now it is sufficient to consider the definition of the finest-scale state.

Computational efficiency is achieved by dividing the very large global state vectors xt|t−1 and xt|t into many small local vectors, each assigned to a particular finest-scale tree node. A convenient way to do this is to divide the computational grid (usually two- or three-dimensional) of the original system model into relatively small blocks of nearby grid cells. Each block of cells, indicated by the set B(s), is assigned to a particular finest-scale tree node s. The resulting correspondence between global and tree states is given by
    χ_M = 𝗣[x_{t|t−1} − E(x_{t|t−1})],   (8)
where E() indicates mathematical expectation and 𝗣 is a specified invertible matrix of ones and zeros that maps each element of xt|t−1 associated with B(s) to a corresponding element in χM. When applied to individual forecast replicates the node mapping can be written as
    χ^j_M = 𝗣(x^j_{t|t−1} − x̄_{t|t−1}),   (9)
where x̄_{t|t−1} is the sample mean of the forecast ensemble.
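A minimal sketch of this assignment, assuming a regular two-dimensional grid divided into square blocks: the permutation 𝗣 is represented implicitly by per-block index arrays, and the nodal replicates of (9) are the mean-removed rows of each block. The helper names are ours.

    import numpy as np

    def block_indices(nx, ny, b):
        """Split an nx-by-ny grid into b-by-b blocks B(s); each entry lists
        the flat indices of the grid cells assigned to one finest-scale node."""
        idx = np.arange(nx * ny).reshape(nx, ny)
        return [idx[i:i + b, j:j + b].ravel()
                for i in range(0, nx, b) for j in range(0, ny, b)]

    def finest_scale_replicates(X, blocks):
        """Ensemble version of the node mapping, Eq. (9):
        chi_M^j(s) = P (x^j - x-bar) restricted to the cells in B(s)."""
        Xp = X - X.mean(axis=1, keepdims=True)        # remove the sample mean
        return {s: Xp[b, :] for s, b in enumerate(blocks)}

    # 8 x 8 grid split into 2 x 2 blocks -> 16 finest-scale nodes of 4 states
    X = np.random.default_rng(1).normal(size=(64, 10))    # 10 replicates
    chi_M = finest_scale_replicates(X, block_indices(8, 8, 2))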

b. Multiresolution autoregressive models and internality

In a multiscale estimation framework the states at different finest-scale nodes are related indirectly through their common ancestors (Frakt and Willsky 2001). There are many ways to do this. For estimation applications it is especially convenient to construct the tree so that it satisfies a multiscale extension of the well-known Markov property of time series analysis. As we shall see, this property makes it possible for measurement updates to be computed recursively in scale, with a two-stage process that moves up and then back down the tree.

For now it is sufficient to note that, if the multiscale Markov property holds, the state at a given node s can be related to the state at its parent by the following downward recursion:
    χ(s) = 𝗔(s)χ(sγ) + w(s),   (10)
where 𝗔(s) is a downward transition matrix and w(s) is a zero-mean random scale perturbation with covariance 𝗤(s). The root node state χ(0) that initializes the recursion is a zero-mean random variable with covariance cov[χ(0)]. The multiscale Markov property implies that the w(s) values at different nodes are uncorrelated with one another and with χ(sγ). Note that the scale of the parent node is m(sγ) = m(s) − 1.
An equivalent upward recursion can be written as
    χ(sγ) = 𝗙(s)χ(s) + w′(s),   (11)
where 𝗙(s) is an upward transition matrix and w′(s) is a zero-mean random scale perturbation with covariance 𝗤′(s). Here again, the multiscale Markov property implies that the w′(s) at different nodes are uncorrelated with one another and with χ(s). The upward recursion is initialized with one of the random finest-scale zero-mean states in χM.

The upward and downward recursions given above define a multiscale autoregressive (MAR) model of the tree states χ(s). This MAR model is analogous to the autoregressive models frequently encountered in time series analysis. If (10) is applied repeatedly it can be used to construct the covariances of χM and xt|t−1 from the 𝗔(s) and 𝗤(s) matrices and the root node covariance cov[χ(0)]. So the downward scale recursion, which relies only on local relationships between parents and children, provides an alternative (approximate) description of the global forecast sample covariance. This description is approximate because it is constrained by the tree topology and the state definitions and recursion parameters used in (10). The basic idea of multiscale estimation is to replace the information contained in the global forecast covariance with (10) and (11). There is no need to actually evaluate cov(χM) or cov(xt|t−1).
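As a concrete illustration of how (10) builds global structure from purely local relationships, the sketch below draws one joint sample of all tree states from given 𝗔(s) and 𝗤(s) matrices; the dictionary-based interface is our own.

    import numpy as np

    def simulate_tree_states(A, Qcov, cov_root, root, children, rng):
        """Draw one sample of every tree state with the downward recursion of
        Eq. (10); A[s] and Qcov[s] hold the transition matrix and perturbation
        covariance for each non-root node s."""
        n0 = cov_root.shape[0]
        chi = {root: rng.multivariate_normal(np.zeros(n0), cov_root)}
        stack = [root]
        while stack:
            s = stack.pop()
            for c in children(s):
                if c in A:                    # nodes below the finest scale are absent
                    w = rng.multivariate_normal(np.zeros(Qcov[c].shape[0]), Qcov[c])
                    chi[c] = A[c] @ chi[s] + w            # Eq. (10)
                    stack.append(c)
        return chi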

The MAR model of (10) and (11) enables us to develop very efficient multiscale updating algorithms. To take advantage of this capability we need to know how to design a tree that satisfies the multiscale Markov property. The design process is greatly facilitated if we focus on trees that have certain so-called internal properties. To define these properties suppose that χ(s) is the state vector at node s at scale m(s) < M and χm(s)+1(s) is the vector of all states at the children of s, which are all at scale m(s) + 1. The tree is said to be locally internal if χ(s) is a linear combination of the states at its children, for all nodes on the tree. This requirement can be expressed concisely as follows (Frakt and Willsky 2001):
    χ(s) = 𝗩(s)χ_{m(s)+1}(s),   (12)
where 𝗩(s) is an n(s) × n_{m(s)+1} dimensional matrix associated with node s and n_{m(s)+1} is the sum of the dimensions of the state vectors χ(sαi) for i = 1, . . . , q. The set of 𝗩(s) matrices defines, through (12), all the coarser-scale states on the tree.

The multiscale Markov property has a convenient special form (called the scale-recursive Markov property) that holds when the tree is locally internal. The scale-recursive Markov property relies on the fact that any given node s at scale m(s) partitions the nodes at the next finer-scale m(s) + 1 into q + 1 sets. The first q sets each consist of one of the q children of s. The final, (q + 1)th, set consists of the complementary group of all nodes that are at scale m(s) + 1 but are not children of s. The scale-recursive Markov property holds if and only if the vector of all states in any one of these q + 1 sets is conditionally uncorrelated with the vector of all states in each of the remaining q sets, given χ(s) (Frakt and Willsky 2001). When this property is satisfied we can derive the MAR model matrices 𝗔(s), 𝗤(s), 𝗙(s), and 𝗤′(s).

c. Identification of multiscale tree models

The discussion presented above indicates that we can obtain the MAR model needed for an efficient multiscale measurement update if we can select the 𝗩(s) to insure that the scale-recursive Markov property is satisfied. In particular, at each internal tree node s, 𝗩(s) should conditionally decorrelate all the states in the q + 1 sets of nodes partitioned by node s. To obtain perfect decorrelation we generally need to use high-dimensional coarser-scale nodal state vectors. This defeats the purpose of the tree, which is to provide a concise and efficient alternative to traditional estimation methods. For practical applications we need to constrain state dimensionality at each coarser-scale node or, equivalently, we need to limit the number of rows in the corresponding 𝗩(s) matrix to be less than or equal to some specified value d(s). Then the identification problem at node s reduces to a search for the 𝗩(s) with n(s) ≤ d(s) that minimizes the conditional covariance among the q + 1 sets partitioned by s.

The constrained tree-identification problem is easier to solve if we simplify the decorrelation process by only requiring that the state zi(s) = χ(sαi) associated with child sαi of node s must be conditionally uncorrelated with all other states at scale m(s) + 1. These other states are collected in the complementary vector zic(s) (Frakt and Willsky 2001). This simplification makes it possible to focus on pairwise conditional correlations between zi(s) and the individual elements of zic(s).

An additional simplification is achieved if we limit the set of 𝗩(s) candidates to block diagonal matrices having the following form:
    𝗩(s) = diag[𝗩1(s), 𝗩2(s), . . . , 𝗩q(s)].   (13)
The submatrix 𝗩i(s) corresponds to child sαi and has dimension di(s) × n(sαi), where di(s) is specified. The block diagonal structure of 𝗩(s) implies that each row of χ(s) is a linear combination of states at a particular child of s. The block diagonal restriction enables us to divide the 𝗩(s) identification problem into q smaller problems that each focus on the influence of a particular 𝗩i(s) on the conditional covariance between zi(s) and zic(s) (Frakt and Willsky 2001).
With these simplifications in mind, consider the conditional covariance, given χ(s), between zi(s) and zic(s):
    cov[zi(s), zic(s)|χ(s)] = cov[zi(s), zic(s)] − cov[zi(s), χ(s)]{cov[χ(s)]}^{−1} cov[χ(s), zic(s)].   (14)
This covariance needs to be zero, for each i = 1, . . . , q, in order for the scale-recursive Markov property to hold exactly. In practice, we cannot obtain zero correlation if di(s) is constrained. Instead, we seek the 𝗩i(s) that gives the smallest possible cov[zi(s), zic(s)|χ(s)].
Rather than attempt to minimize the conditional covariance matrix directly we use an indirect method called predictive efficiency, which achieves the same objective by working with the following scalar mean-squared error Ji(s):
    Ji(s) = E{[zic(s) − ẑic(s)]ᵀ[zic(s) − ẑic(s)]},   (15)
where ẑic(s) is an estimate of zic(s) derived from χ(s). This expression is minimized, for a given χ(s), by the conditional expectation (Jazwinski 1970):
    ẑic(s) = E[zic(s)|χ(s)].   (16)
If χ(s) is expressed as χ(s) = 𝗩i(s)zi(s), the minimum of Ji(s) depends on 𝗩i(s). If 𝗩i(s) is an identity matrix with di(s) = n(sαi), the conditional covariance cov[zi(s), zic(s)|χ(s)] is zero [this is easily checked by substituting χ(s) = zi(s) into (14)] and Ji(s) has some value Ji0(s). For any other 𝗩i(s) with di(s) < n(sαi) the Ji(s) will be at least as large as Ji0(s) and the conditional covariance will not be zero. In this case, the departure of the conditional covariance from its desired value of zero can be measured in terms of the difference between Ji(s) and Ji0(s) (Frakt and Willsky 2001):
    ϵ[zic(s)|χ(s)] = Ji(s) − Ji0(s) = tr{cov[zic(s)|χ(s)]} − Ji0(s).   (17)
The expression for ϵ[zic(s)|χ(s)] given after the second equality applies when the tree states are jointly Gaussian, which is the implicit assumption used in the ensemble Kalman filter update. In the predictive efficiency approach the best choice of 𝗩i(s) is taken to be the one that minimizes ϵ[zic(s)|χ(s)]:
    𝗩i(s) = arg min over 𝗩i(s) of ϵ[zic(s)|χ(s)].   (18)
This choice also minimizes the conditional covariance cov[zi(s), zic(s)|χ(s)], as desired.
Frakt and Willsky (2001) show that the 𝗩i(s) that minimizes ϵ[zic(s)|χ(s)] is given by the first di(s) rows of the following matrix 𝗩′i(s):
    𝗩′i(s) = 𝗨i(s)ᵀ cov[zi(s)]^{−1/2}.   (19)
The columns of the matrix 𝗨i(s) are the eigenvectors of the following n(sαi)-dimensional square matrix:
    cov[zi(s)]^{−1/2} cov[zi(s), zic(s)] cov[zic(s), zi(s)] cov[zi(s)]^{−1/2}.   (20)
These eigenvectors are assumed to be arranged according to the magnitudes of the corresponding eigenvalues, from largest to smallest.
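Under the reconstruction of (19) and (20) given above, the computation at one child reduces to two sample covariances and a symmetric eigendecomposition. The sketch below assumes that ĉov[zi(s)] is full rank, which requires a sufficiently large ensemble; the argument names are ours.

    import numpy as np

    def internal_matrix(Zi, Zc, d):
        """Compute the first d rows of V'_i(s), per Eqs. (19)-(20).

        Zi : (ni, N) mean-removed replicates of z_i(s)
        Zc : (nc, N) mean-removed replicates of z_i^c(s), possibly screened
        d  : the row limit d_i(s)
        """
        N = Zi.shape[1]
        Czz = Zi @ Zi.T / (N - 1)                    # cov-hat[z_i(s)]
        Czc = Zi @ Zc.T / (N - 1)                    # cov-hat[z_i(s), z_i^c(s)]
        # Symmetric inverse square root of cov-hat[z_i(s)] (assumed full rank)
        vals, vecs = np.linalg.eigh(Czz)
        Czz_isqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
        M20 = Czz_isqrt @ Czc @ Czc.T @ Czz_isqrt    # the matrix of Eq. (20)
        evals, U = np.linalg.eigh(M20)
        order = np.argsort(evals)[::-1]              # largest eigenvalues first
        Vprime = U[:, order].T @ Czz_isqrt           # Eq. (19)
        return Vprime[:d], evals[order]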

The predictive efficiency method outlined above uses (19) and (20) to compute a local internal matrix 𝗩i(s) for each child of s. The q 𝗩i(s) matrices obtained for all the children can be assembled to form 𝗩(s), as specified in (13). The total number of rows in the resulting 𝗩(s) matrix may exceed the total number of rows d(s) originally specified for 𝗩(s) in the nodal state vector size constraint. Following Frakt and Willsky (2001), we deal with this issue by retaining only the d(s) rows in 𝗩(s) that correspond to the d(s) largest predictive efficiency eigenvalues. Note that this will generally result in some of the reduced 𝗩i(s) matrices having more rows than others. That is, some children will contribute more elements to χ(s) than others.

Note that the 𝗩(s) matrices identified in the predictive efficiency procedure depend on low-dimensional prior covariances of the states in the vectors zi(s) and zic(s) at scale m(s) + 1. The scale m(s) + 1 covariances depend, in turn, on the 𝗩(s) matrices and prior covariances at scale m(s) + 2, and so on. This interscale dependence suggests that the 𝗩(s) matrices should be computed recursively, starting with covariances between elements of χM at the finest scale and gradually moving up the tree. Equation (12) is used to derive the required χ(s) covariances at scale m(s) from those at scale m(s) + 1.

In an ensemble implementation the exact finest-scale prior covariances are replaced by sample covariances computed from the finest-scale prior replicates χjM, which are derived from (9). Then 𝗩′i(s) and 𝗩i(s) at scale m(s) = M − 1 are obtained by substituting the sample covariances into (19) and (20). These internal matrices are used to compute the prior replicates at scale m(s) = M − 1, according to the ensemble version of (12):
    χ^j(s) = 𝗩(s)χ^j_{m(s)+1}(s).   (21)
This provides the set of replicates needed to compute sample covariances and 𝗩(s) matrices at scale m(s) = M − 2 and so on, up the tree to the root node.
It is possible to derive the MAR scale transition and covariance matrices that appear in (10) and (11) directly from the 𝗩(s) obtained from the predictive efficiency procedure. In an ensemble implementation these tree parameters can be written as functions of sample prior covariances as follows (Frakt and Willsky 2001):
    𝗔(s) = ĉov[χ(s), χ(sγ)]{ĉov[χ(sγ)]}^{−1},   (22)
    𝗤(s) = ĉov[χ(s)] − 𝗔(s) ĉov[χ(sγ), χ(s)],   (23)
    𝗙(s) = ĉov[χ(sγ), χ(s)]{ĉov[χ(s)]}^{−1},   (24)
    𝗤′(s) = ĉov[χ(sγ)] − 𝗙(s) ĉov[χ(s), χ(sγ)].   (25)
These sample covariances can all be derived recursively, from the replicates computed in (21).
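In code, (22)–(25) are four small regressions on the nodal ensembles. A sketch, with our own argument names:

    import numpy as np

    def mar_parameters(Chi_c, Chi_p):
        """Sample-covariance estimates of the MAR parameters, Eqs. (22)-(25).

        Chi_c : (nc, N) mean-removed child ensemble chi^j(s)
        Chi_p : (np_, N) mean-removed parent ensemble chi^j(s-gamma)
        """
        N = Chi_c.shape[1]
        Ccc = Chi_c @ Chi_c.T / (N - 1)      # cov-hat[chi(s)]
        Cpp = Chi_p @ Chi_p.T / (N - 1)      # cov-hat[chi(s-gamma)]
        Ccp = Chi_c @ Chi_p.T / (N - 1)      # cov-hat[chi(s), chi(s-gamma)]
        A = Ccp @ np.linalg.inv(Cpp)         # Eq. (22)
        Q = Ccc - A @ Ccp.T                  # Eq. (23)
        F = Ccp.T @ np.linalg.inv(Ccc)       # Eq. (24)
        Qp = Cpp - F @ Ccp                   # Eq. (25)
        return A, Q, F, Qp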

If the size of the vector zic(s) is large, derivation of the sample cross covariance between zi(s) and zic(s) required in (20) can be computationally demanding. The computational effort can be dramatically reduced if the sample cross covariance only considers correlations between zi(s) and the elements of zic(s) associated with nodes in a limited set (or neighborhood) Nh(i) of h nodes at scale m(s) + 1. This set can be selected to focus on nodes that are most likely to be strongly correlated with zi(s). One option is to relate Nh(i) to the spatial support of i. Each node i in the tree has a spatial support defined by the set of grid cells assigned to nodes in T(i), the set of finest-scale descendants of i. Nodes included in Nh(i) also have spatial supports defined by their finest-scale descendants. If we wish to use spatial proximity as a criterion for selecting Nh(i), we can, for example, assign to Nh(i) those nodes at scale m(s) + 1 whose supports are adjacent to the support of i. This is the approach taken in our example.

Frakt and Willsky (2001) show that the computational complexity of the predictive efficiency procedure can be reduced from O[n(x)^2] to O[n(x)] if the sample covariances for zi(s) only include elements of zic(s) associated with nodes that lie in Nh(i). The result is a substantial savings in computational effort for large problems.

Note that the use of spatial proximity screening in the predictive efficiency procedure differs from the spatial localization techniques discussed in the introduction. First, predictive efficiency is used only to identify tree parameters, so the screening procedure does not limit the scope of updates. Even with screening, each node is related to all other nodes and to all measurements through common ancestors on the tree. Second, the spatial support associated with a neighborhood with a given size h increases in size at coarser scales, making it easier for important long-distance correlations to have an impact on the overall structure of the tree. Predictive efficiency screening, like spatial localization, tends to filter out spurious fluctuations due to sampling errors. The example discussed later in this paper illustrates this filtering action. Of course, if the screening is too severe important information will be lost and there will be a decline in performance. So some judgment is involved in selecting the appropriate neighborhood size h for a given application.

It is useful to summarize the approximations and simplifications adopted in the tree-identification procedure outlined above:

  • The tree covariances and related tree parameters are sample estimates derived from a finite ensemble of forecast states.
  • There is an upper limit on the dimension d(s) of coarser-scale states.
  • The predictive efficiency method used at node s considers only correlations between zi(s) and zic(s), rather than between all q + 1 sets of scale m(s) + 1 nodes partitioned by node s.
  • The predictive efficiency method assumes that the 𝗩(s) matrix is block diagonal.
  • At any given node i the predictive efficiency method only considers correlations with nodes in the nodal neighborhood Nh(i) at scale m(s) + 1.

These approximations may compromise the tree’s ability to reproduce the true forecast covariance. However, in an ensemble application the last four approximations may also help suppress spurious long-distance correlations due to the first approximation. When this occurs, the approximate tree covariance may actually be more accurate than a traditional sample covariance derived from the same ensemble. This possibility is discussed further in the conclusions section.

4. An ensemble multiscale update procedure

Like the standard ensemble Kalman filter, the ensemble multiscale filter is designed to update the forecast replicates xjt|t−1 with a set of current measurements yt. The result is a new ensemble of updated replicates xjt|t. The multiscale update is carried out on a tree characterized by 𝗩(s) matrices derived from the forecast ensemble, using the predictive efficiency method described in the previous section. In the ensemble version all exact covariances cov() are replaced by sample covariances () computed over the replicates χj(s). If the tree provides a perfect representation of the sample forecast covariance the multiscale update will give the same result as the traditional ensemble Kalman filter. In general, the two updates will differ because the tree only approximates the sample covariance.

To use a multiscale framework we need to assign the global measurements in the vector yt to particular tree nodes, much as the global states at grid cells are assigned to the finest-scale tree nodes. In many geophysical applications measurements are taken over a range of spatial supports, varying from subgrid cell scales to regions as large as the entire computational domain. The most flexible way to accommodate a diverse set of measurements is to relate them all to the states at the finest scale. To make this explicit it is helpful to introduce some new definitions. In particular, suppose that the support of a particular scalar measurement yti in the global measurement vector yt consists of cells assigned to tree nodes in a subset TMi of the set of all finest-scale nodes TM. This measurement can be located at any tree node s that lies above all nodes in TMi, as implied by the following expression:
    T^i_M ⊆ T(s).   (26)
Recall that T(s) is the set of all finest-scale nodes descended from s.

In practice, a number of issues need to be considered in locating measurements on the tree, including the size of the nodal measurement vector (which affects the cost of various matrix computations) and the number of descendant nodes without measurements (which affects the cost of the update procedure). In general, the multiscale approach outlined here offers considerable flexibility since all nodal measurements are ultimately related to physically identifiable grid cells, regardless of where they are placed on the tree.

Once a measurement has been assigned to a particular tree node s it may be related to finest-scale states through the following tree-based measurement equation:
    y(s) = 𝗵(s)χ_M(s) + e(s).   (27)
Here 𝗵(s) is a matrix that relates the measurement to the finest-scale tree states χM(s) descended from s and e(s) is a zero-mean measurement error vector with specified covariance 𝗿(s). All of the measurements in the vector yt may be assigned to tree nodes in this way. The result is a group of measurement equations having the form of (27) for the set H of all nodes with measurements.
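A small sketch of the placement rule (26), reusing the hypothetical (scale, position) indexing from section 3a (Q, M, and finest_descendants are defined there): it simply enumerates the nodes whose finest-scale descendants contain the measurement support.

    def eligible_nodes(support_nodes):
        """support_nodes: the set T_M^i of finest-scale (m, k) nodes covered
        by measurement y_t^i; returns every node s with T_M^i within T(s)."""
        eligible = []
        for m in range(M + 1):
            for k in range(Q ** m):
                s = (m, k)
                if support_nodes <= set(finest_descendants(s)):
                    eligible.append(s)
        return eligible       # e.g., pick the finest (lowest) eligible node

    # A measurement covering two sibling finest-scale nodes
    print(eligible_nodes({(M, 0), (M, 1)}))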

Willsky (2002) and a number of references he cites describe a static two-sweep multiscale estimation algorithm that derives the Gaussian conditional mean and covariance for states and measurements distributed on a multiscale tree. Here we present an adaptation of this algorithm suitable for ensemble applications. In our version the focus is on random replicate updates, rather than moment updates. Appendixes A and B of Zhou (2006) show that the sample moments of the updated replicates converge to the exact conditional moments as the number of replicates increases.

The multiscale updating procedure is a two-sweep recursion that moves up the tree and then back down. In an ensemble implementation the upward sweep generates replicates χj(s|s) at node s that are conditioned on all measurements at or below the scale of s. The downward sweep generates replicates χj(s|S) at node s that are conditioned on all measurements on the tree, above as well as at or below s. Each updated global replicate xjt|t is constructed from the elements of the finest-scale replicate obtained at the end of the downward sweep. The details are described in sections 4a and 4b.

a. Upward sweep

The upward sweep of the ensemble multiscale update algorithm derives at each node s a set of updated replicates χj(s|s) that depend on measurements located at s and its descendants. The algorithm is a recursion that performs an update at each node at a given scale and then proceeds to the next higher scale. The nature of the update at a given node s depends on whether measurements are located at or below s. The general form is similar to the classical Kalman filter update and can be written in two parts. First is the case in which there is no measurement at or below the specified node s. This condition can be stated as H ∩ S(s) = ∅, where H is the set of measured nodes, S(s) is the set of nodes at or descended from s, and ∅ is the null set:
    χ^j(s|s) = χ^j(s).   (28)
Here the updated replicate is the same as the prior obtained from the forecast step. Next is the case in which there is at least one measurement at or below s:
    χ^j(s|s) = χ^j(s) + 𝗞(s)[Y^j(s) − Ŷ^j(s)],   (29)
where 𝗞(s) is a multiscale version of the Kalman gain, defined below; Y^j(s) is an augmented perturbed measurement vector; and Ŷ^j(s) is an augmented measurement prediction vector.
At scales m(s) < M above the finest scale the perturbed and predicted measurement vectors associated with a measured node s that lies above other measured nodes are
    Y^j(s) = [Y^j(sα1)ᵀ, . . . , Y^j(sαk)ᵀ, (y(s) + e^j(s))ᵀ]ᵀ   (30)
and
    Ŷ^j(s) = [Ŷ^j(sα1)ᵀ, . . . , Ŷ^j(sαk)ᵀ, ŷ^j(s)ᵀ]ᵀ.   (31)
Here the node sαi is a child of s that lies at or above a measured node and 0 ≤ k ≤ q. In terms of the notation defined above:
    ŷ^j(s) = 𝗵(s)χ^j_M(s).   (32)
The zero-mean random measurement perturbation ej(s) has the same covariance 𝗿(s) as e(s) in (27) and is included to insure that the update algorithm yields the correct conditional covariance. If k = 0 or if s is at the finest scale there are no measured nodes below s and the two augmented measurement vectors at s are
    Y^j(s) = y(s) + e^j(s),   (33)
    Ŷ^j(s) = ŷ^j(s) = 𝗵(s)χ^j_M(s).   (34)
 At scales m(s) < M above the finest scale the perturbed and predicted measurement vectors associated with an unmeasured node s that lies above measured nodes are
    Y^j(s) = [Y^j(sα1)ᵀ, . . . , Y^j(sαk)ᵀ]ᵀ   (35)
and
    Ŷ^j(s) = [Ŷ^j(sα1)ᵀ, . . . , Ŷ^j(sαk)ᵀ]ᵀ.   (36)
In this case the final entries appearing in (30) and (31) are omitted. Taken together, the augmented measurement equations form a recursion that propagates the information in measurements placed at tree nodes up the tree so that any node that is measured or has a measured descendant will be updated.
The multiscale Kalman gains used to compute the update and to construct the augmented measurement vectors depend on an augmented measurement error covariance matrix that reflects the definitions of the augmented measurement vectors. At scales above the finest scale the augmented measurement error covariance matrix 𝗥(s) depends on gains and measurement error covariances at the children sα1, . . . , sαk. The expression for 𝗥(s) at a measured node s that lies above other measured nodes is
    𝗥(s) = diag[𝗥(sα1), . . . , 𝗥(sαk), 𝗿(s)].   (37)
Here diag[ ] represents a square matrix with (k + 1) × (k + 1) square blocks. The diagonal blocks for i = 1, . . . , k have dimension n[Y(sαi)], and diagonal block k + 1 has dimension n[y(s)], where n[ ] indicates the dimension of the vector argument. All off-diagonal blocks are zero. If k = 0 or if s is at the finest scale there are no measured nodes below s and the augmented measurement error covariance matrix is
    𝗥(s) = 𝗿(s).   (38)
The covariance expression for an unmeasured node s that lies above measured nodes is
    𝗥(s) = diag[𝗥(sα1), . . . , 𝗥(sαk)].   (39)
In all cases the Kalman gain is given by
    𝗞(s) = ĉov[χ(s), Ŷ(s)]{ĉov[Ŷ(s)] + 𝗥(s)}^{−1}.   (40)
This multiscale gain matrix depends only weakly on the replicate values [through its dependence on 𝗥(s) and the sample covariance matrices]. Because 𝗥(s) will generally be full rank, the matrix to be inverted in (40) is nonsingular, even for small ensemble sizes.

The gain computation starts at the finest-scale m(s) = M, where the measurement error covariance and Kalman gain are computed at measured nodes. The replicates at all finest-scale nodes are then updated with (28) and (29). After this, the recursion moves up to scale m(s) = M − 1, where the new augmented measurement error covariance is computed from the finest-scale measurement error covariances and Kalman gains at all measured finest-scale nodes. Then the gains at scale m(s) = M − 1 are computed, the replicates are updated, and the recursion proceeds up to the next scale. This continues until upward sweep updates have been performed at all scales. Note that the upward sweep algorithm given here does not make explicit use of the multiscale transition equations of (10) and (11) but it does use 𝗩(s) matrices and (12) to compute the sample covariances in the Kalman gain expression. As we have seen, these 𝗩(s) matrices convey the same information as the scale transition equations.
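Collecting the pieces, the nodal update performed at each step of the upward sweep takes the following ensemble form. This mirrors our reconstruction of (29) and (40); the argument names are illustrative, and enough replicates are assumed for the sample covariances to be meaningful.

    import numpy as np

    def upward_update(Chi, Y, Yhat, Rs):
        """One nodal update of the upward sweep, per Eqs. (29) and (40).

        Chi  : (n, N) prior replicates chi^j(s)
        Y    : (m, N) augmented perturbed measurement replicates Y^j(s)
        Yhat : (m, N) augmented measurement prediction replicates
        Rs   : (m, m) augmented measurement error covariance R(s)
        """
        N = Chi.shape[1]
        Cp = Chi - Chi.mean(axis=1, keepdims=True)
        Yp = Yhat - Yhat.mean(axis=1, keepdims=True)
        CxY = Cp @ Yp.T / (N - 1)             # cov-hat[chi(s), Y-hat(s)]
        CYY = Yp @ Yp.T / (N - 1)             # cov-hat[Y-hat(s)]
        K = CxY @ np.linalg.inv(CYY + Rs)     # Eq. (40); nonsingular if R(s) is full rank
        return Chi + K @ (Y - Yhat)           # Eq. (29)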

b. Downward sweep

The downward sweep of the ensemble multiscale update algorithm derives at each node s a set of smoothed replicates χj(s|S) that depend on all measurements on the tree. At the end of the upward sweep, the updated root node replicates χj(s|s) = χj(s|S) at scale m(s) = 0 incorporate all tree measurements and so already constitute a smoothed ensemble. At any node s below the root smoothed replicates χj(s|S) are obtained by adjusting the corresponding updated replicates χj(s|s) from the upward sweep.

The adjustment at scale m(s) is carried out with a linear update similar to the one used for smoothing problems in time series analysis:
    χ^j(0|S) = χ^j(0|0),   (41)
    χ^j(s|S) = χ^j(s|s) + 𝗝(s)[χ^j(sγ|S) − χ^j(sγ|s)].   (42)
This update requires computation of a set of projected replicates χ^j(sγ|s) at sγ for m(s) > 0. The projected replicates characterize the state at the parent of s, given measurements at s and its descendants. They are obtained from the following version of the MAR upward recursion in (11):
    χ^j(sγ|s) = 𝗙(s)χ^j(s|s) + w′^j(s).   (43)
The random perturbation w′^j(s) is added to insure that the sample statistics are consistent with those of (11). Note that a different set of projected replicates is obtained at sγ from each of the q children of sγ.
The smoothing gain 𝗝(s) in (42) is given by
    𝗝(s) = ĉov[χ(s|s), χ(sγ|s)]{ĉov[χ(sγ|s)]}^{−1}.   (44)
The replicate adjustment made in (42) is proportional to the difference between the smoothed replicate χ^j(sγ|S) and the projected replicate χ^j(sγ|s) at sγ. This difference reflects the new information obtained from measurements that are not accounted for in χ^j(sγ|s) [i.e., measurements that are not at nodes in S(s) but are accounted for in χ^j(sγ|S); these nodes are in the complement of S(s)]. Note that χ^j(sγ|S) is available from the downward sweep computations at scale m(s) − 1, the scale above m(s).
The downward sweep ends at the finest-scale m(s) = M with a set of smoothed replicates χj(s|S). The desired ensemble of updated global replicates is obtained from the inverse of (9):
    x^j_{t|t} = 𝗣^{−1}χ^j_M(S) + x̄_{t|t−1},   (45)
where χ^j_M(S) is a replicate of all smoothed finest-scale states. The smoothed replicates account for all measurements taken at time t. The measurement update is carried out entirely on the tree and does not require computation of global covariances.
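The downward-sweep adjustment at a single node, per (42)–(44), is again a small regression on the nodal ensembles. A sketch with our own naming, assuming the projected-parent sample covariance is invertible:

    import numpy as np

    def downward_step(Chi_up, Proj_par, Smooth_par):
        """One nodal adjustment of the downward sweep, per Eqs. (42) and (44).

        Chi_up     : (n, N) upward-sweep replicates chi^j(s|s)
        Proj_par   : (p, N) projected parent replicates chi^j(s-gamma|s), Eq. (43)
        Smooth_par : (p, N) smoothed parent replicates chi^j(s-gamma|S)
        """
        N = Chi_up.shape[1]
        Cp = Chi_up - Chi_up.mean(axis=1, keepdims=True)
        Pp = Proj_par - Proj_par.mean(axis=1, keepdims=True)
        Cxp = Cp @ Pp.T / (N - 1)             # cov-hat[chi(s|s), chi(s-gamma|s)]
        Cpp = Pp @ Pp.T / (N - 1)             # cov-hat[chi(s-gamma|s)]
        J = Cxp @ np.linalg.inv(Cpp)          # smoothing gain, Eq. (44)
        return Chi_up + J @ (Smooth_par - Proj_par)      # Eq. (42)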

c. Algorithm summary

The multiscale ensemble estimation algorithm summarized in sections 3 and 4 consists of several distinct steps:

Forecast

  • The simulation model (state equation) of (3) is used to derive random forecast replicates of the discretized state vector xjt|t−1 at time t from random updated replicates xjt−1|t−1 at time t − 1. The replicates at both times are conditioned on measurements taken through t − 1.

Tree identification and computation of prior tree states

  • Equation (9) is used to derive the finest-scale tree replicates in χjM from the forecast replicates xjt|t−1 for a specified tree topology (e.g., the number of scales M + 1, the number of children per parent q, and the number of states per finest-scale node d).
  • Local sample state covariances are constructed at each node s at the finest-scale m(s) = M. These covariances and (19) and (20) are used to compute the internal weighting matrices 𝗩(s).
  • The prior state replicates at the next scale m(s) = M − 1 are derived from the 𝗩(s) matrices using (21).
  • The upward and downward tree parameters for scale m(s) are computed from local sample cross covariances of the state replicates, as specified in (22) through (25).
  • The identification process continues recursively up the tree until it reaches the root node at m(s) = 0.

Measurement allocation

  • Individual elements yti of the global measurement vector yt are assigned to tree nodes subject to the requirement of (26). The measurements at node s are related to χ(s) by (27).

Update: Upward sweep

  • The augmented perturbed and predicted measurement vectors, the measurement error covariance, and the Kalman gain are constructed at the finest scale from the prior state replicates in χjM and (33), (34), (38), and (40).
  • The updated states χj(s|s) at the finest-scale m(s) = M are derived from (28) and (29).
  • The augmented perturbed and predicted measurement vectors, the measurement error covariance, and the Kalman gain at nodes at the next scale m(s) = M − 1 are derived from the prior state replicates χj(s) and (30), (31), (35), (36), (37), (38), (39), and (40).
  • The updated states χj(s|s) at scale m(s) = M − 1 are derived from (28) and (29).
  • The updating process continues recursively up the tree until it reaches the root node at m(s) = 0, yielding χj(s|s).

Update: Downward sweep

  • The smoothed replicates χj(s|S) at the root node are constructed from (41).
  • Equations (43) and (44) are used to compute the projected replicates χj(0|s) and smoothing gain J(s) for each node s at scale m(s) = 1.
  • Equation (42) is used to derive the smoothed replicates χj(s|S) at each node s at scale m(s) = 1.
  • The smoothing process continues recursively down the tree until it computes the smoothed replicates χj(s|S) at the finest-scale m(s) = M. The global updated replicates xjt|t are obtained from (45).

d. Computational complexity

The complexity of the multiscale update depends on both the tree identification (which must generally be performed at every update time) and the measurement update. To obtain some order of magnitude estimates of computational complexity we suppose that 1) the state dimension at every node is d, 2) every global (finest scale) state is measured, 3) each parent has q children, 4) the number of scales is M + 1, and 5) the predictive efficiency covariance calculation uses a screen consisting of h nodes.

The complexities of the major computational operations for the corresponding multiscale filter are summarized in Table 1. It is apparent that the computational effort required by this filter depends strongly on the nodal state dimension d, which is why it is important to keep this tree parameter small. Computational effort may also be less than the Table 1 estimates if the measurements are sparse and located high on the tree, so that nodes lower on the tree do not need to be updated during the upward sweep. The memory requirement for the update is as much as 2q^M times the storage requirement for one finest-scale node.

We can compare the complexities for the multiscale and traditional Kalman filters if we make the reasonable assumptions that the number of replicates N = d and the neighborhood size h = q^2. Then the traditional Kalman filter has complexity O(q^{3M}d^3), which is much more expensive than the multiscale update, whose total complexity is O(q^{M}d^3).
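
For a concrete sense of this gap, the following sketch evaluates both expressions for illustrative tree parameters of our own choosing (they are not the values used in the example of section 5):

```python
# Order-of-magnitude operation counts, under the assumptions N = d, h = q**2.
q, M, d = 4, 5, 64                 # illustrative values only

n_states = q**M * d                # number of finest-scale states
traditional = q**(3 * M) * d**3    # O(q^{3M} d^3), nonlocalized Kalman filter
multiscale = q**M * d**3           # O(q^M d^3), multiscale update

print(f"states: {n_states:,}")                              # 65,536
print(f"speedup ~ q^(2M) = {traditional // multiscale:,}")  # 1,048,576
```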

If spatial localization is used, the complexity of the traditional filter can decrease significantly, depending on the approach. For example, when the region of interest is divided into q^M completely independent blocks, each containing d states, the traditional filter complexity decreases to O(q^{M}d^3). This complexity, which is the same as that of the multiscale filter, is obtained at the cost of ignoring all correlations between states and between states and measurements that extend beyond the spatial localization block. The complexities of other localization methods are likely to fall between those of the traditional and multiscale filters. The multiscale approach is most computationally advantageous, compared to localized Kalman filters, when the problem at hand cannot be readily divided into subproblems of fixed size. This can occur in situations in which dominant features vary greatly in scale over both time and space.

5. Example

a. Experimental setup

As suggested above, the concept of multiscale data assimilation is best illustrated with problems that exhibit a range of evolving scales. The two-dimensional incompressible flow example considered here has such variable scales. The flow field is characterized by random vortices that grow and move over time. These vortices are generated by the impact of a jet hitting a barrier placed near one of the domain boundaries, as shown in Fig. 2. The simulation model used for the forecasting step of the data assimilation is perfect except for uncertainty in the position of the jet along the domain boundary. The uncertain boundary condition leads to uncertain velocities with correlation scales that continually change as the vortices pass by.

The objective in our example is to characterize the complete velocity field with conditional statistics derived from synthetically generated measurements of the longitudinal (u) velocity component. The measurements are generated from a “true” velocity field which is obtained by running the forecasting model with a single time-dependent random realization for the jet position. The problem is sufficiently large (over 10^6 states) to be computationally challenging.

The governing equations are the two nondimensional momentum equations and the continuity equation:

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + \upsilon\frac{\partial u}{\partial y} = -\frac{\partial p}{\partial x} + \eta\left(\frac{\partial^{2} u}{\partial x^{2}} + \frac{\partial^{2} u}{\partial y^{2}}\right), \tag{46}$$

$$\frac{\partial \upsilon}{\partial t} + u\frac{\partial \upsilon}{\partial x} + \upsilon\frac{\partial \upsilon}{\partial y} = -\frac{\partial p}{\partial y} + \eta\left(\frac{\partial^{2} \upsilon}{\partial x^{2}} + \frac{\partial^{2} \upsilon}{\partial y^{2}}\right), \tag{47}$$

and

$$\frac{\partial u}{\partial x} + \frac{\partial \upsilon}{\partial y} = 0, \tag{48}$$
where u and υ are the longitudinal and transverse velocity components, p is pressure, and η is the eddy viscosity. The eddy viscosity is set equal to zero in (46) and (47), but numerical viscosity is generated at the corners of the barrier when the velocity equations are discretized.

We solve (46)–(48) with Gerris (Popinet 2003), an open-source software package for fluid flow applications. Gerris uses an adaptive mesh refinement method that adds detail in regions with small-scale flow features. The solution algorithm combines finite volume spatial discretization with a quad-tree nested refinement scheme and a multilevel Poisson solver for pressure computations. Advection terms are discretized using a robust second-order upwind scheme.

The boundary conditions for the example are listed in Table 2. The random jet center position c(t) is described by the following temporally discretized autoregressive equation:
$$c(t) = \alpha\, c(t - \Delta t) + w(t), \tag{49}$$
where α (0 < α < 1) is an autoregressive coefficient and w(t) is zero-mean white normally distributed noise with standard deviation 0.04. This expression causes the jet entrance to move up and down along the left boundary in a random fashion, staying mostly near the center of this boundary. The initial conditions are zero velocity everywhere.
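
A minimal simulation of this boundary forcing is sketched below. The AR(1) form and the noise standard deviation follow the text, but the value of the coefficient alpha is our assumption, since it is not recoverable here, and c is taken relative to the center of the left boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.95     # assumed autoregressive coefficient (value not given here)
sigma_w = 0.04   # standard deviation of the white noise w(t), from the text
n_steps = 500

c = np.zeros(n_steps)  # jet center position relative to the boundary center
for t in range(1, n_steps):
    c[t] = alpha * c[t - 1] + sigma_w * rng.standard_normal()

# With |alpha| < 1 the process is mean reverting, so the jet wanders up and
# down but stays mostly near the center of the left boundary.
```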

For the example considered here it is convenient to specify a regular grid for input and output purposes. Gerris refines this grid internally, without intervention from the user. The computational effort of the refined simulation is generally proportional to the size of the user-specified grid. We consider two resolutions for the regular input–output grid: a coarse 128 × 64 grid that is used to assess the tree’s ability to represent the forecast covariance and a fine 1024 × 512 grid that is used to demonstrate the estimation capabilities of the ensemble multiscale filter. Both grids cover the simulation domain of Fig. 2.

The states to be estimated in this problem are the discretized longitudinal and transverse velocities defined on the flow model’s regular computational grid (coarse or fine). The uncertain jet position is not estimated but the state updates attempt to correct for jet position errors (since the effects of position uncertainty are reflected in the random state replicates). The discretized pressures are diagnostic variables that are derived from the velocities at the previous time step. The tree properties used for the coarse and fine multiscale models are summarized in Table 3.

b. Assessment of the tree approximation

The performance of the ensemble multiscale filter depends on the tree’s ability to represent the true forecast covariance matrix. To obtain a quantitative assessment of this ability it is helpful to distinguish 1) the tree-based forecast covariance, obtained from the downward recursion (10) using tree parameters derived from the forecast replicates, the predictive efficiency procedure, and a given tree topology; 2) the sample forecast covariance, derived directly from the ensemble xjt|t−1; and 3) the true forecast covariance cov(xt|t−1). For practical purposes, we suppose that the true forecast covariance is equal to the sample covariance obtained from a very large ensemble.

To assess the tree approximation we use an ensemble of 6240 replicates to estimate cov(xt|t−1) over the coarse model grid described in Table 3. Tests with smaller ensembles indicate that the covariance computations have nearly converged to their asymptotic values for N = 6240. We compare this large-sample “true” forecast covariance with the small-sample forecast covariance and the small-sample tree-based covariance, each derived from 52 replicates. The results of this three-way comparison are shown in Figs. 3 and 4. Figure 3 plots contours of the correlations between velocities at cell (8, 8) (in the upper-left corner of the domain) and all other cells at time t = 0.84. Correlations for the u component are shown in the left column and correlations for the υ component in the right column. The three rows of each figure show correlations from the small-sample forecast, small-sample tree, and large-sample forecast covariances, respectively. Figure 4 shows similar plots for velocity correlations between cell (88, 56) (in the lower-right corner of the domain) and all other cells.
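
The correlation fields in Figs. 3 and 4 are ordinary sample statistics of the ensemble. A minimal sketch of the computation follows; the array layout is our own, and the grid and ensemble here are synthetic stand-ins.

```python
import numpy as np

def correlation_map(U, ref_cell):
    """Correlation between velocity at ref_cell and every grid cell.

    U: ensemble array of shape (ny, nx, N), one 2D field per replicate.
    Returns an (ny, nx) array of sample correlation coefficients.
    """
    anomalies = U - U.mean(axis=2, keepdims=True)   # remove ensemble mean
    ref = anomalies[ref_cell[0], ref_cell[1], :]    # reference-cell series
    cov = (anomalies * ref).mean(axis=2)            # sample cross covariance
    std = anomalies.std(axis=2) * ref.std()
    return cov / np.where(std == 0, 1, std)         # guard constant cells

# Illustrative use on a synthetic 64 x 128 ensemble of 52 replicates:
rng = np.random.default_rng(1)
U = rng.standard_normal((64, 128, 52))
rho = correlation_map(U, (8, 8))   # compare with the (8, 8) panels of Fig. 3
```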

These figures indicate that the small-sample tree covariances resemble the small-sample forecast covariances but are somewhat smoother and less affected by sampling anomalies, reflecting the benefits of the tree model’s localization properties. Both of the small-sample estimates differ significantly from the large-sample “truth” for the (8, 8) case, which is more affected by local conditions near the jet inlet and barrier. The small-sample approximations are better for the (88, 56) case.

The correlation plots clearly show the superposition of small and large features in this problem. Velocity fluctuations at (8, 8) are generally small-scale and have relatively little correlation with velocities in the rest of the domain. On the other hand, velocity fluctuations at (88, 56) are correlated over longer distances. Note that the u correlations at this point extend primarily in the longitudinal direction, while the υ correlations are primarily transverse. The structure of these correlations changes over time, with the regions of longer correlation generally moving from left to right. Evidence of the blocking pattern used to define the tree’s finest-scale states is apparent in the u correlation plot for (88, 56) where the contour lines change abruptly. Such artifacts are relatively easy to avoid if the grid blocks are allowed to overlap (Irving et al. 1997).

c. Data assimilation

To test the data assimilation capabilities of the multiscale ensemble filter we derive updated velocity replicates at all the nodes on the fine 1024 × 512 Gerris input–output grid. The ensemble size is 52, which is typical for operational problems of this size. The replicates are updated with synthetic longitudinal velocity measurements taken at 2048 equally spaced fine grid locations. The synthetic measurements are generated from the following measurement equation:
$$y_t = \mathsf{H}\, x(t) + e_t, \tag{50}$$
where x(t) is a single random replicate of the 1 048 576-dimensional state vector (defined as the “true” state for evaluations of estimation error), et is a 2048-dimensional vector of independent zero-mean normally distributed random measurement errors with standard deviation 1.5, and 𝗛 is a 2048 × 1 048 576 measurement matrix that selects the longitudinal velocities at the measurement locations.
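
The synthetic observing system is straightforward to reproduce in outline. The sketch below implements (50) on a reduced problem; the state and measurement dimensions and the placement of the sensors are stand-ins for the 1 048 576-state, 2048-sensor configuration described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Reduced-size stand-in for the measurement equation (50), y_t = H x(t) + e_t.
n_states, n_meas, sigma_e = 4096, 64, 1.5
x_true = rng.standard_normal(n_states)          # stand-in "true" state

obs_idx = np.linspace(0, n_states - 1, n_meas, dtype=int)  # equally spaced
H = np.zeros((n_meas, n_states))
H[np.arange(n_meas), obs_idx] = 1.0             # selection matrix

e = sigma_e * rng.standard_normal(n_meas)       # independent measurement errors
y = H @ x_true + e                              # synthetic measurements
```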

The temporal behavior of the ensemble multiscale filter is illustrated in Fig. 5, which shows longitudinal velocity results at locations (276, 472) and (440, 296). The true state (red), representative individual replicates (light blue), and the ensemble mean (dark blue) are plotted over the simulation period t ∈ [0, 0.84]. Since (276, 472) is at a measurement location, the ensemble mean provides a reasonably good estimate of the true state there. The ensemble mean estimate at (440, 296), which is not at a measurement location, is nearly as good.

Figures 6 and 7 show longitudinal and transverse velocity spatial distributions. The top halves of these figures compare the velocity ensemble means before and after the four update times (first and second columns) to the corresponding true values (third column). The complexity and multiscale character of the true velocity are apparent. Note that the forecast means tend to be more symmetric, especially at early times, reflecting the fact that the forecast replicates are generated by randomly located jets distributed symmetrically around the center of the left boundary. The true state is asymmetric because it is generated by a single random jet that happens to be located more often below the center of this boundary. The observations used to derive the updated velocity replicates are able to capture this asymmetry and the ensemble mean gives a reasonably good portrayal of the true velocity field, especially just after the update.

The lower halves of Figs. 6 and 7 show the ensemble standard deviations before (first column) and after (second column) the update times. Note the evidence of the measurement grid and the substantially reduced levels of uncertainty in the updated standard deviations. The updated longitudinal velocity standard deviations generally improve more than the transverse velocity standard deviations. This is reasonable, considering that the transverse velocity estimates are inferred indirectly from noisy measurements of the longitudinal velocity.

Uncertain flow features generated at the jet are able to propagate throughout much of the domain between measurements. Consequently, the benefits of each update are mostly lost by the time of the next update. This suggests that it would be helpful to take measurements more often. Some tree artifacts are visible in the form of sharp vertical or horizontal contours at grid block locations.

Figure 8 shows the time history of the root-mean-squared error between the ensemble mean and the known true velocities, taken over all grid cells. The effects of the four measurement updates are apparent not only in the longitudinal velocity plot but also in the transverse plot, even though only longitudinal velocities are measured. Between updates the longitudinal velocity error nearly returns to its previous high before falling again at each new update time.
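
The error measure in Fig. 8 is the root-mean-square difference between the ensemble mean and the true field, taken over all grid cells. A minimal sketch, with synthetic fields standing in for the filter output:

```python
import numpy as np

def rmse(ensemble_mean, truth):
    """Root-mean-square error over all grid cells (as in Fig. 8)."""
    return np.sqrt(np.mean((ensemble_mean - truth) ** 2))

# Illustrative use with synthetic stand-in fields:
rng = np.random.default_rng(3)
truth = rng.standard_normal((512, 1024))
estimate = truth + 0.1 * rng.standard_normal((512, 1024))
print(rmse(estimate, truth))   # ~0.1
```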

The abrupt state updates observed in Kalman filtering generally have the effect of changing the values of conserved quantities such as mass, momentum, and energy. These changes are to be expected if input information is uncertain and forecasts are incompatible with observations. In our application the only input uncertainty is the position of the inlet jet. As a result, the spatially integrated mass over the domain is conserved (within a few percent), even through the updates. The spatially integrated momentum and kinetic energy values (not shown here) generally increase after updates. This appears to reflect the impacts of 1) inlet position uncertainty, which influences the location and movement of simulated vortices; and 2) dissipation due to the lack of subgrid resolution in the forward simulation. New measurements at the update time generally reveal more velocity variability than was forecast, resulting in an increase in spatially integrated momentum and energy.

Although abrupt filter updates may correct for model errors, as they have in our example, they can have adverse impacts on forecasts, especially in meteorological applications (Cohn et al. 1998; Mitchell et al. 2002; Lorenc 2003). For this reason, considerable effort has been devoted to the problem of preserving dynamic balance during filter updates. The multiscale approach provides an opportunity to reassess the balance problem from the perspective of scale (rather than spatial) localization. It may be possible to adjust the tree structure to preserve various balance measures, much as model error perturbations and covariances have been adjusted for this purpose in other applications (Mitchell et al. 2002).

Overall, the ensemble multiscale filter gives results similar to a comparable ensemble Kalman filter, but with less computational effort and without the need for spatial localization. More extensive tests are needed to obtain a definitive assessment of the multiscale filter’s capabilities. It is worth noting that very few ensemble methods can be run in a reasonable amount of time for problems of this size without spatial localization. In fact, we were unable to get a traditional ensemble filter to run in a reasonable time on our multiprocessor cluster for the 1 048 576-state example described here. This makes it difficult to compare multiscale and traditional filter performance for the very large problems for which the multiscale approach is most attractive. Fruitful comparisons could be made with spatially localized ensemble filters that are computationally competitive for such problems.

6. Discussion and conclusions

The ensemble multiscale Kalman filter relies on the same basic structure as the classical ensemble Kalman filter. It uses a nonlinear model to propagate replicates of the system states between measurement times and uses implicit Gaussian assumptions to update these replicates when measurements become available. The difference in the two approaches is in the way the measurement update is implemented. The classical update relies on Kalman gains obtained from large low-rank global sample covariance matrices. These matrices are estimated from the forecast ensemble propagated from the previous update time. The multiscale approach effectively replaces the sample covariances with a tree that is also estimated from the forecast ensemble. However, the multiscale tree relates system variables at different locations through a set of local parent–child relationships rather than spatial correlations. This local tree-based description of spatial structure makes it possible to carry out the Kalman update in much less time than is required by the traditional covariance-based approach.
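
For contrast, the classical perturbed-observation update that the multiscale procedure replaces can be written in a few lines. This is a generic sketch of the Burgers et al. (1998) scheme, not the implementation compared against in this paper:

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Classical perturbed-observation EnKF update (Burgers et al. 1998).

    X: n x N forecast ensemble; y: m-vector of measurements;
    H: m x n measurement matrix; R: m x m measurement error covariance.
    """
    n, N = X.shape
    m = y.size
    A = X - X.mean(axis=1, keepdims=True)    # state anomalies
    S = H @ A                                # predicted-measurement anomalies
    C = S @ S.T / (N - 1) + R                # innovation covariance
    PHt = A @ S.T / (N - 1)                  # sample cross covariance P H^T
    Y = y[:, None] + rng.multivariate_normal(np.zeros(m), R, size=N).T
    return X + PHt @ np.linalg.solve(C, Y - H @ X)   # updated replicates

# Illustrative use:
rng = np.random.default_rng(4)
X = rng.standard_normal((100, 52))           # 100 states, 52 replicates
H = np.eye(10, 100)                          # observe the first 10 states
R = 0.25 * np.eye(10)
y = H @ rng.standard_normal(100) + 0.5 * rng.standard_normal(10)
Xa = enkf_update(X, y, H, R, rng)
```

The global sample covariances PHt and C in this sketch are exactly the quantities that the tree-based update avoids forming.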

The tree-based measurement update requires the forecast replicates to be assigned to particular finest-scale tree nodes and the measurements to be assigned to tree nodes at various scales, depending on the measurement support. Once these assignments are carried out the tree is identified directly from the finest-scale ensemble, using a predictive efficiency approach. The resulting tree generally only approximates the sample forecast covariances used in the classical ensemble filter. This is because 1) the states of the coarser tree nodes are truncated, 2) pairwise correlations are used to simplify the predictive efficiency computations, 3) the 𝗩(s) matrices used in the predictive efficiency procedure are assumed to be block diagonal, and 4) predictive efficiency correlations are derived only over a scale-dependent neighborhood around each node. All of these simplifications are introduced to improve computational efficiency.

The truncation and neighborhood screening approximations provide many different options for filtering out sampling error. Since these approximations affect relationships between nodes at different scales they provide a type of localization in scale (rather than in space). Scale-based localization affects the tree’s approximation of spatial correlations indirectly (rather than directly) through its influence on the common ancestors of the finest-scale nodes.

The example presented in this paper illustrates some of the capabilities of the multiscale approach. In particular, it shows that it is possible to obtain reasonable state estimates for a large nonlinear data assimilation problem with changing time and space scales, without any spatial localization. In our example scale localization provides some modest filtering of high-frequency sampling errors but it does not have much effect on larger-scale errors. Overall, the spatial covariance reconstructed from the tree (for diagnostic purposes) looks much like the sample forecast covariance estimated from the ensemble. Both exhibit the same large-scale deviations from the true forecast covariance.

Since the study described here focused primarily on implementation of the ensemble multiscale Kalman filter for a large problem, no effort was made to optimize the scale-localization procedure. Preliminary experiments suggest that it may be difficult to obtain significant suppression of large-scale sampling errors within the confines of the particular tree-identification framework presented here. However, it may be possible to do better if the coarser-scale node dimension d and the neighborhood size h are allowed to vary over time rather than being required to stay fixed at specified values. Also, it would be relatively easy to combine spatial localization with scale localization to deal with sampling errors over a wider range of conditions than either approach can handle individually.

The propagation step of the ensemble Kalman filter is well known to be amenable to parallelization over the ensemble. However, the sample covariance estimation portion of the traditional ensemble Kalman update requires merging of information from all replicates and is not inherently parallel. By contrast, the multiscale tree-identification procedure provides a number of options for parallel computing. For example, the predictive efficiency calculations can be carried out in parallel across all nodes on a given scale. Measurement updates in different tree branches can also be parallelized. These efficiencies are not included in our computational complexity discussion but could be quite beneficial for large problems.
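
As one illustration of this scale-parallel structure, the per-node computations at a given scale can be dispatched with Python's standard library. In this sketch, identify_node is a hypothetical placeholder for the local predictive-efficiency work of (19)–(21):

```python
from concurrent.futures import ProcessPoolExecutor

def identify_node(node_data):
    # Hypothetical stand-in for the local covariance and weighting-matrix
    # computations of (19)-(21) at a single tree node.
    return node_data

def identify_scale(nodes, workers=4):
    # Nodes on one scale are mutually independent, so their identification
    # steps can run concurrently; results are gathered before moving up
    # to the next scale.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(identify_node, nodes))

if __name__ == "__main__":
    print(identify_scale(list(range(16))))
```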

Overall, the ensemble multiscale Kalman filter appears to offer considerable advantages for large nonlinear data assimilation applications. Here the filter’s performance is demonstrated for one particular problem. Other problems may respond somewhat differently to the approximations introduced in the tree-identification procedure. However, our computational complexity analysis can be expected to apply in general. This analysis clearly shows the superiority of the multiscale approach over the classical nonlocalized ensemble Kalman filter. It remains to be seen how the multiscale approach compares to spatial localization procedures that also provide computational benefits over the classical approach. A complete assessment will require an examination of both accuracy and computational effort for a range of problems.

REFERENCES

  • Arulampalam, M. S., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Processing, 50, 174–188.

  • Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 1719–1724.

  • Cohn, S. E., A. da Silva, J. Guo, M. Sienkiewicz, and D. Lamich, 1998: Assessing the effects of data selection with the DAO physical-space statistical analysis system. Mon. Wea. Rev., 126, 2913–2926.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10143–10162.

  • Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367.

  • Frakt, A. B., and A. S. Willsky, 2001: Computationally efficient stochastic realization for internal multiscale autoregressive models. Multidimens. Syst. Signal Processing, 12, 109–142.

  • Gelb, A., 1974: Applied Optimal Estimation. MIT Press, 374 pp.

  • Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process., 140, 107–113.

  • Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790.

  • Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.

  • Irving, W. W., P. W. Fieguth, and A. S. Willsky, 1997: An overlapping tree approach to multiscale stochastic modeling and estimation. IEEE Trans. Image Processing, 6, 1517–1529.

  • Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.

  • Keppenne, C. L., and M. M. Rienecker, 2002: Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model. Mon. Wea. Rev., 130, 2951–2965.

  • Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP: A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203.

  • Margulis, S. A., D. McLaughlin, D. Entekhabi, and S. Dunne, 2002: Land data assimilation and estimation of soil moisture using measurements from the Southern Great Plains 1997 Field Experiment. Water Resour. Res., 38, 1299, doi:10.1029/2001WR001114.

  • McLaughlin, D., 2007: A probabilistic perspective on nonlinear model inversion and data assimilation. Subsurface Hydrology: Data Integration for Properties and Processes, Geophys. Monogr., Vol. 171, Amer. Geophys. Union, 243–253.

  • Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.

  • Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

  • Popinet, S., 2003: Gerris: A tree-based adaptive solver for the incompressible Euler equations in complex geometries. J. Comput. Phys., 190, 572–600.

  • Reichle, R. H., and R. D. Koster, 2005: Global assimilation of satellite surface soil moisture retrievals into the NASA Catchment land surface model. Geophys. Res. Lett., 32, L02404, doi:10.1029/2004GL021700.

  • Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2002: Hydrologic data assimilation with the ensemble Kalman filter. Mon. Wea. Rev., 130, 103–114.

  • Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.

  • Willsky, A. S., 2002: Multiresolution Markov models for signal and image processing. IEEE Proc., 90, 1396–1458.

  • Zhou, Y., 2006: Multi-sensor large scale land surface data assimilation using ensemble approaches. Ph.D. thesis, Massachusetts Institute of Technology, 234 pp.

  • Zhou, Y., D. McLaughlin, D. Entekhabi, and V. Chatdarong, 2006: Assessing the performance of the ensemble Kalman filter for land surface data assimilation. Mon. Wea. Rev., 134, 2128–2142.
Fig. 1. A multiscale tree with scaling factor q.

Fig. 2. Spatial domain for the example.

Fig. 3. Forecast velocity (u and υ) correlation coefficients between point (8, 8) and all the grid cells over a 64 × 128 domain at t = 0.84. (top) Sample correlation from an ensemble with 52 replicates. (middle) Correlation derived from tree model using the same ensemble as in the first row. (bottom) True correlation from an ensemble with 6240 replicates.

Fig. 4. Forecast velocity (u and υ) correlation coefficients between cell (88, 56) and all other cells. Arrangement is the same as in Fig. 3.

Fig. 5. Filter replicates, ensemble mean, and true velocity u time series at (left) cell (276, 472) and (right) cell (440, 296).

Fig. 6. (top half) Ensemble mean of u before and after update at measurement times, and the corresponding true values. (lower half) Ensemble standard deviation of u before and after update.

Fig. 7. (top half) Ensemble mean of υ before and after update at measurement times, and the corresponding true values. (lower half) Ensemble standard deviation of υ before and after update.

Fig. 8. RMSE between the ensemble means of u and υ and the corresponding true velocities (averaged over the entire domain).

Table 1. Computational complexity for the ensemble multiscale filter.

Table 2. Boundary conditions for the example.

Table 3. Tree inputs for the example.