• Bélanger, P. R., 1974: Estimation of noise covariance matrices for a linear time-varying stochastic process. Automatica, 10, 267–275, https://doi.org/10.1016/0005-1098(74)90037-5.
• Berry, T., and T. Sauer, 2013: Adaptive ensemble Kalman filtering of non-linear systems. Tellus, 65A, 20331, https://doi.org/10.3402/tellusa.v65i0.20331.
• Dee, D. P., 1995: On-line estimation of error covariance parameters for atmospheric data assimilation. Mon. Wea. Rev., 123, 1128–1145, https://doi.org/10.1175/1520-0493(1995)123<1128:OLEOEC>2.0.CO;2.
• Guttman, L., 1946: Enlargement methods for computing the inverse matrix. Ann. Math. Stat., 17, 336–343, https://doi.org/10.1214/aoms/1177730946.
• Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. Mon. Wea. Rev., 133, 3132–3147, https://doi.org/10.1175/MWR3020.1.
• Hodyss, D., and N. Nichols, 2015: The error of representation: Basic understanding. Tellus, 67A, 24822, https://doi.org/10.3402/tellusa.v67.24822.
• Janjić, T., and S. E. Cohn, 2006: Treatment of observation error due to unresolved scales in atmospheric data assimilation. Mon. Wea. Rev., 134, 2900–2915, https://doi.org/10.1175/MWR3229.1.
• Janjić, T., and Coauthors, 2018: On the representation error in data assimilation. Quart. J. Roy. Meteor. Soc., https://doi.org/10.1002/qj.3130, in press.
• Julier, S. J., and J. K. Uhlmann, 2004: Unscented filtering and nonlinear estimation. Proc. IEEE, 92, 401–422, https://doi.org/10.1109/JPROC.2003.823141.
• Kuramoto, Y., and T. Tsuzuki, 1976: Persistent propagation of concentration waves in dissipative media far from thermal equilibrium. Prog. Theor. Phys., 55, 356–369, https://doi.org/10.1143/PTP.55.356.
• Liu, Z.-Q., and F. Rabier, 2002: The interaction between model resolution, observation resolution and observation density in data assimilation: A one-dimensional study. Quart. J. Roy. Meteor. Soc., 128, 1367–1386, https://doi.org/10.1256/003590002320373337.
• Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
• Lorenz, E. N., 1996: Predictability—A problem partly solved. Proc. Seminar on Predictability, Vol. 1, Reading, United Kingdom, ECMWF, 18 pp., https://www.ecmwf.int/sites/default/files/elibrary/1995/10829-predictability-problem-partly-solved.pdf.
• Mehra, R., 1970: On the identification of variances and adaptive Kalman filtering. IEEE Trans. Autom. Control, 15, 175–184, https://doi.org/10.1109/TAC.1970.1099422.
• Mehra, R., 1972: Approaches to adaptive filtering. IEEE Trans. Autom. Control, 17, 693–698, https://doi.org/10.1109/TAC.1972.1100100.
• Mitchell, H. L., and R. Daley, 1997: Discretization error and signal/error correlation in atmospheric data assimilation. Tellus, 49A, 32–53, https://doi.org/10.3402/tellusa.v49i1.12210.
• Oke, P. R., and P. Sakov, 2008: Representation error of oceanic observations for data assimilation. J. Atmos. Oceanic Technol., 25, 1004–1017, https://doi.org/10.1175/2007JTECHO558.1.
• Ran, A., and R. Vreugdenhil, 1988: Existence and comparison theorems for algebraic Riccati equations for continuous- and discrete-time systems. Linear Algebra Appl., 99, 63–83, https://doi.org/10.1016/0024-3795(88)90125-5.
• Satterfield, E., D. Hodyss, D. D. Kuhl, and C. H. Bishop, 2017: Investigating the use of ensemble variance to predict observation error of representation. Mon. Wea. Rev., 145, 653–667, https://doi.org/10.1175/MWR-D-16-0299.1.
• Simon, D., 2006: Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley-Interscience, 552 pp.
• Sivashinsky, G., 1977: Nonlinear analysis of hydrodynamic instability in laminar flames I. Derivation of basic equations. Acta Astronaut., 4, 1177–1206, https://doi.org/10.1016/0094-5765(77)90096-0.
• Van Leeuwen, P. J., 2015: Representation errors and retrievals in linear and nonlinear data assimilation. Quart. J. Roy. Meteor. Soc., 141, 1612–1623, https://doi.org/10.1002/qj.2464.
• Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.
Fig. 4. (a) Comparison of the true solution (gray), its discrete time samples (black circles), and integrated observations (green circles) with the UKF estimates (blue, solid) and CUKF estimates (red, dotted) over the same time interval shown in Fig. 1. (b) Errors computed by subtracting the true discretized signal from the observation (green circles), the UKF estimates (blue, solid), and the CUKF estimates (red, dotted). (c),(d) As in (a),(b), but using the RK4 integrator with the same coarse time step.

Fig. 5. (a),(b) Comparison of filter results using the UKF without correlations to filtering with correlations (CUKF) on the Kuramoto–Sivashinsky model truncated in space to as many as 256 grid points, for observations integrated over two different spatial ranges [(a) and (b)]. (c) For 64 grid points, we show the robustness of the results after adding various levels of Gaussian instrument noise to the observations. (d) For the same case, we test the UKF with inflation by adding the identity matrix times a constant to Q (blue, solid) or R (red, dashed). We also show the effect of inflating the filter background covariance (black, dotted), where the x axis indicates the inflation percentage. In each case, inflation degraded the filter performance.

Fig. 6. Mean-squared error of filter estimates for linear models with (a) positive correlations and (b) negative correlations. The black curve is based on the filter using S = 0; all other curves use the true S. Notice that we obtain perfect recovery to the limit of numerical precision when the stability criterion is satisfied.

Fig. 7. Mean-squared error of filter estimates with positively correlated noise for (a) the L63 system in periodic and chaotic parameter regimes and (b) the L96 dynamical system for various values of the forcing parameter. The black curve is based on the filter using S = 0 (UKF); all other curves use the true S (CUKF).


Correlation between System and Observation Errors in Data Assimilation

  • 1 George Mason University, Fairfax, Virginia

Abstract

Accurate knowledge of two types of noise, system and observational, is an important aspect of Bayesian filtering methodology. Traditionally, this knowledge is reflected in individual covariance matrices for the two noise contributions, while correlations between the system and observational noises are ignored. We contend that in practical problems, it is unlikely that system and observational errors are uncorrelated, in particular for geophysically motivated examples where errors are dominated by model and observation truncations. Moreover, it is shown that accounting for the cross correlations in the filtering algorithm, for example in a correlated ensemble Kalman filter, can result in significant improvements in filter accuracy for data from typical dynamical systems. In particular, we discuss the extreme case where the two types of errors are maximally correlated relative to the individual covariances.

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Timothy Sauer, tsauer@gmu.edu


1. Introduction

Consider a discrete time nonlinear dynamical system with state variable $x_k$ and observations $y_k$ given by
$$x_{k+1} = f(x_k) + \omega_k, \tag{1}$$
$$y_k = h(x_k) + \nu_k, \tag{2}$$
where $\omega_k$ is called the system or dynamical noise (or the stochastic forcing) and $\nu_k$ is called the observation noise. In practice these noise terms are needed to account for model mismatch, truncation errors caused by differing resolutions, and stochastic terms such as instrument errors. There has been considerable recent interest in the implications of these different sources of error (e.g., Satterfield et al. 2017; Hodyss and Nichols 2015; Van Leeuwen 2015; Janjić et al. 2018).
On the other hand, most filtering algorithms are designed based on a separation of dynamical and observation noise. The nonlinear filtering literature typically considers noise terms $\omega_k$ and $\nu_k$, allowing correlation within each type, but tends to dismiss correlation between ω and ν. In this article, we argue the importance of modeling correlations between system and observation noise. Specifically, we will consider Kalman filters and ensemble Kalman filters (EnKF) that are derived based on the assumption that
$$\mathbb{E}\left[\begin{pmatrix}\omega_k \\ \nu_k\end{pmatrix}\begin{pmatrix}\omega_j \\ \nu_j\end{pmatrix}^{\top}\right] = \begin{pmatrix} Q & S \\ S^{\top} & R \end{pmatrix}\delta_{kj}, \tag{3}$$
where the block matrix on the right-hand side is assumed to be symmetric and positive semidefinite. The matrix $S$ contains the cross covariances between the variables $\omega_k$ and $\nu_k$.

In order for the block matrix in (3) to be positive semidefinite, both $Q$ and $R$ must be positive semidefinite. The classical viewpoint corresponds to simply extracting $Q$ and $R$ and ignoring $S$. However, it is reasonable to expect that in many physical systems, the truncations causing the noise in the state of the system would also affect the sensor or observation system. The goal of this paper is to establish that 1) correlations between system and observation errors are likely to be common in applied data assimilation problems caused by truncation of infinite-dimensional solutions and 2) incorporating these correlations can dramatically improve filter results.

If we consider the true state to be a function of space and time evolving in an infinite-dimensional function space, then the truncated true state and the observation are essentially two different finite-dimensional projections of this infinite-dimensional space (Dee 1995; Janjić and Cohn 2006; Oke and Sakov 2008). In section 2 we will show that the errors between the exact projections and the finite-dimensional approximations are correlated for generic observations. Error correlations arising from model truncation have been previously observed (Mitchell and Daley 1997; Hamill and Whitaker 2005; Liu and Rabier 2002). In particular, models are often composed of discrete dynamics occurring at points of a two- or three-dimensional grid. Remote observations by satellite or radiosonde can be viewed as integrations over a region including several grid points. Here, we provide a general framework to explain correlations between these quantities. We also consider other sources of error such as model mismatch and instrument error, and we show that significant correlations persist except in the case where instrument error dominates (since this error will be modeled as white noise that is uncorrelated with the state).

In section 3, a correlated version of the EnKF is developed that takes the correlations in (3) into account, and recovers the Kalman equations for linear systems. In section 4, the correlated unscented Kalman filter (CUKF; an unscented version of the EnKF) is applied to the examples from section 2. Using an appropriate $S$ substantially improves output accuracy, while a filter that ignores the cross-correlation matrix can lead to dramatically suboptimal results.
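To make the role of $S$ concrete, the linear analysis step that such a correlated filter generalizes can be sketched as follows. This is a minimal illustration of the standard correlated-noise Kalman update (cf. Simon 2006), not the paper's CUKF implementation; the function and variable names are ours:

```python
import numpy as np

def correlated_kf_update(x, P, y, H, R, S):
    """One linear Kalman analysis step with cross-correlated noise.

    S = E[omega nu^T] couples the forecast error to the observation
    noise, modifying both the innovation covariance and the gain
    (see, e.g., Simon 2006). Setting S = 0 recovers the standard KF.
    """
    innov = y - H @ x                          # innovation
    C = H @ P @ H.T + H @ S + S.T @ H.T + R    # innovation covariance
    K = (P @ H.T + S) @ np.linalg.inv(C)       # correlated Kalman gain
    x_a = x + K @ innov                        # analysis mean
    P_a = P - K @ (H @ P + S.T)                # analysis covariance
    return x_a, P_a
```

With $S = 0$ this reduces to the familiar gain $K = PH^{\top}(HPH^{\top}+R)^{-1}$; a nonzero $S$ both rotates the gain and further reduces the analysis covariance, which is the mechanism behind the accuracy gains reported below.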

In section 5 we investigate the effect of correlations in greater detail. First, we show that in the linear case, for any $Q$ and $R$ there exists a “maximal” $S$ for which perfect recovery of the state variables is possible. Second, we demonstrate examples of perfect recovery in nonlinear systems with such a maximal $S$. In these examples, if one ignores a maximal $S$ (setting $S = 0$ in the filter while the true $S$ is maximal), variables that would have been perfectly recovered by using the true $S$ will instead be estimated with variance on the order of the entries in $R$ (e.g., see Fig. 6).

2. Correlation between system and observation errors

To understand how correlations arise in applied data assimilation, we must first leave behind the idealized scenario described in (1) and (2). Following Dee (1995) and Janjić and Cohn (2006), we describe the true evolution and observation processes by replacing the discrete solution $x_k$ with an infinite-dimensional solution $u(z,t)$, which has some regularity (continuity or differentiability) in the spatial variable z and temporal variable t. We should note that the following analysis is very similar to Satterfield et al. (2017), except that we consider an infinite-dimensional solution instead of the high-resolution solution considered in Satterfield et al. (2017).

The discrete time solution and the observations can be viewed as projections of this continuous solution. The desired finite-dimensional discrete time solution
$$x_k = \mathcal{P}\left(u(\cdot, t_k)\right)$$
is a projection of the continuous solution. Meanwhile, the finite-dimensional discrete time observations
$$y_k = \mathcal{H}\left(u(\cdot, t_k)\right) + \nu_k^{i}$$
are given by another projection of the solution $u$, plus an instrument noise term $\nu_k^{i}$. Define the system error as the local truncation error (LTE) of the discrete solver f, namely,
$$\omega_k = x_{k+1} - f(x_k) = \mathcal{P}\left(u(\cdot, t_{k+1})\right) - f\left(\mathcal{P}(u(\cdot, t_k))\right),$$
and define the observation error by
$$\nu_k = y_k - \tilde h(x_k),$$
where $\mathcal{H}$ is the true observation projection and $\tilde h$ is the approximate discrete observation function. Letting h be a consistent discretization of the true observation projection [as in (5)], we further decompose the observation error in terms of the representation error
$$\nu_k^{r} = \mathcal{H}\left(u(\cdot, t_k)\right) - h(x_k)$$
and the observation model error
$$\nu_k^{m} = h(x_k) - \tilde h(x_k),$$
so that the total observation error becomes
$$\nu_k = \nu_k^{r} + \nu_k^{m} + \nu_k^{i}.$$
The above definitions are very similar to those found in Eqs. (1)–(7) in Satterfield et al. (2017), except that we have replaced their high-resolution observation function and truncation smoother with finite-dimensional projections of infinite-dimensional spaces, namely, $\mathcal{H}$ and $\mathcal{P}$, respectively. Projections from infinite-dimensional spaces were also considered by Janjić et al. (2018), who also considered additional terms in the decomposition of the observation error. Since our main focus is the correlations between system and observation error, we will restrict our attention to the three sources of observation error listed above. We should note that while the observation errors can be formally decomposed as above, the individual components may be correlated (especially the model error and representation error terms), so in general we do not expect a corresponding decomposition of the observation error covariance matrix.
For simplicity we assume that $\omega_k$ and $\nu_k$ are both mean zero and define the error variances to be
$$Q = \mathbb{E}\left[\omega_k \omega_k^{\top}\right], \qquad R = \mathbb{E}\left[\nu_k \nu_k^{\top}\right].$$
We briefly note that, while it may be reasonable to assume that the representation and observation model errors are uncorrelated with the instrument error $\nu_k^{i}$, their mutual cross covariance,
$$\mathbb{E}\left[\nu_k^{r}\left(\nu_k^{m}\right)^{\top}\right],$$
seems unlikely to be zero. However, our main concern here is the cross covariance between system and observation error, which is defined as
$$S = \mathbb{E}\left[\omega_k \nu_k^{\top}\right] = \mathbb{E}\left[\omega_k \left(\nu_k^{r} + \nu_k^{m}\right)^{\top}\right] \tag{4}$$
(assuming uncorrelated instrument errors). In this general setting it is already puzzling why one would assume that $S = 0$. By averaging over the full time series, we are defining global covariance matrices that are fixed in time. More generally, one could also consider time-varying covariance matrices that are either localized in time or in state space. However, this would only change the indices of the averages above, and in most situations one should expect nonzero $S$ matrices. In the next section we will show explicit examples of substantial correlation between system and observation errors in many practical situations. By imposing additional assumptions on the observation model (viz., that it is linear and local in both space and time) we will be able to show that the correlations are close to maximal. When these assumptions on the observation model are satisfied, we expect $S$ to be very important to the filtering problem, except when the observation errors are dominated by instrument error.
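The global covariance blocks just defined can be estimated empirically from paired samples of the two errors. A minimal sketch (the function name is ours; it assumes error samples are available, e.g., from an offline high-resolution run):

```python
import numpy as np

def empirical_blocks(W, V):
    """Estimate Q, S, and R from paired error samples.

    W: (T, n) array whose rows are system-error samples omega_k.
    V: (T, m) array whose rows are observation-error samples nu_k.
    Returns the blocks of the joint covariance matrix in (3).
    """
    E = np.hstack([W, V])            # joint samples, shape (T, n + m)
    C = np.cov(E, rowvar=False)      # full (n + m) x (n + m) covariance
    n = W.shape[1]
    return C[:n, :n], C[:n, n:], C[n:, n:]   # Q, S, R
```

When the two error series are nearly proportional, the estimated cross-covariance block is comparable in magnitude to the individual variances, which is exactly the regime studied below.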

a. Evaluation and averaging projections

We will assume that the finite-dimensional dynamics f and observation function h are consistent, meaning that the errors $\omega_k$ and $\nu_k^{r}$ go to zero in the limit of small discretization parameters in space and in time. As an example, consider the case when the evolution of the full solution is governed by a PDE
$$\frac{\partial u}{\partial t} = \mathcal{F}(u),$$
and consider the projection of the state onto a grid $\{z_j\}$ at time $t_k$, namely,
$$(x_k)_j = u(z_j, t_k).$$
If we assume that the solution has continuous derivatives in space and in time, we can use a solver that is order n in space and order m in time to obtain the system error
$$\omega_k = a\,\Delta z^{n} + b\,\Delta t^{m} + \text{higher-order terms},$$
where for simplicity we assume a uniform spatial grid with spacing $\Delta z$ in each dimension. The coefficients a and b depend on the derivatives of the solution $u$ within the relevant grid cells and time interval.
Now consider the associated observation operator $\mathcal{H}$. Rather than sampling at an instantaneous time, most observation modes have an associated time constant, and an average value over an interval with some weight function is returned. Similarly, the observation may involve multiple spatial grid points, as in the case of satellite observations involving radiative transfer that explicitly integrate over the entire vertical grid. Even for very local observations, the true observing system may be located between grid points, thereby involving interpolation between grid points. Thus, we assume the true observation has the following form:
$$\mathcal{H}(u)(z_j, t_k) = \int\!\!\int w(z, s)\, u(z, t_k - s)\, dz\, ds,$$
meaning that a consistent observation function h should be a quadrature rule for approximating this integral. Assuming that the discrete observation function has order $\hat{n}$ convergence in space and order $\hat{m}$ in time, the representation error is
$$\nu_k^{r} = c\,\Delta z^{\hat{n}} + d\,\tau^{\hat{m}} + \text{higher-order terms}, \tag{5}$$
where $\tau$ is the length of the observation time interval and the coefficients c and d depend on the derivatives of $u$ within the averaging window.
The situation described above is common in applications, namely, where the discrete solution is given by (or is equivalent to) evaluation on a grid and the true observation operator is a local weighted average of the full solution. In this case, we find the cross covariance of the system and observation errors to be (excluding higher-order terms)
$$S \approx \mathbb{E}\left[\left(a\,\Delta z^{n} + b\,\Delta t^{m}\right)\left(c\,\Delta z^{\hat{n}} + d\,\tau^{\hat{m}}\right)^{\top}\right],$$
and in the limit of small $\Delta z$ and $\Delta t$, the derivatives appearing in the coefficients will all be evaluated at points less than a grid spacing apart. Therefore, up to higher-order terms, $S$ can be rewritten in terms of derivatives evaluated at the same point. Notice in particular that when $n = \hat{n}$, the coefficients a and c are the same up to a scalar, and similarly when $m = \hat{m}$ the coefficients b and d are linear combinations of the same order derivatives. While it is possible for these terms to combine so as to exactly cancel when averaged over time, the correlation will often be nonzero.

As a special case, consider the situation when both the system and observation errors are dominated by the same single variable (either time or one of the spatial variables). In this case, the leading-order terms would differ only by a constant, so that up to higher-order terms the system error and observation error would be multiples of one another. This not only implies that $S \neq 0$ but also, as we will show in section 5, that the system and observation errors are maximally correlated up to higher-order corrections, so that S is as large as possible relative to the individual variances. For example, in the case of satellite observations of radiative transfer, the true observation integrates over the entire vertical component of the atmosphere, whereas the integral may be very localized in time and in the horizontal variables. This would suggest a relatively large error in the vertical (depending on the number of vertical grid points and the order of the quadrature rule used to estimate the radiative transfer) even if the model were perfectly specified. If the vertical direction also dominated the model error, then we would expect a high correlation of the model and observation errors.

Similar to the above analysis, we can also consider the observation model error to be a difference of quadrature rules given by the observation function h and an approximate function $\tilde h$. With these assumptions we find
$$\nu_k^{m} = \sum_j \left(\alpha_j - \tilde{\alpha}_j\right) u(z_j, t_k),$$
where α and $\tilde{\alpha}$ are the quadrature weights for h and $\tilde h$, respectively. Since both observation functions are assumed to be local, we again obtain an error in terms of $\Delta z$ and $\Delta t$ according to the order of agreement between h and $\tilde h$, which will lead to maximal correlations with the truncation errors. In practice, either the model error or the representation error could be the dominant term, but in either case we find nontrivial correlation with the system errors. In what follows, we focus on examples where representation error dominates, since this term will be the easiest to estimate in practice. The importance of treating correlated errors holds in either case. We now turn to some concrete examples.

Even for general nonlinear observation functions h and approximate observation functions $\tilde h$, we expect the correlation given in (4) to be nonzero, since setting (4) equal to zero imposes a nontrivial constraint on the system. The analysis in this section shows that for linear observations that have a local structure in space and time, truncation error can be expected to be nearly maximally correlated with both observation model error and representation error. When the observations are nonlinear or not local in space and time, we still expect error correlations to be present, although if not close to maximal, they may not be crucial to the filtering problem.

b. Time-averaged observations of an ODE

First consider the case when the true solution is discrete in space, meaning that $x(t)$ is a vector evolving continuously in time. Assume that the true evolution of $x$ is governed by an ODE
$$\frac{dx}{dt} = g(x),$$
and the projection is evaluation on a discrete time grid $t_k = k\,\Delta t$. If the discrete evolution operator f is Euler’s method, the system error is
$$\omega_k = x(t_{k+1}) - x(t_k) - \Delta t\, g\left(x(t_k)\right) = \frac{\Delta t^2}{2}\,\ddot{x}(t_k) + O(\Delta t^3), \tag{6}$$
where we have used $\dot{x}(t_k) = g\left(x(t_k)\right)$.
We assume the true observation is an unweighted average over a short interval of length $\tau$ centered at $t_k$, namely,
$$\mathcal{H}(x)(t_k) = \frac{1}{\tau}\int_{t_k - \tau/2}^{t_k + \tau/2} x(t)\, dt.$$
Consider the discrete observation function h to be a consistent quadrature rule using the grid points falling within the interval. In particular, when the interval contains only the single grid point $t_k$, we find $h(x_k) = x_k$ and the representation error is
$$\nu_k^{r} = \frac{1}{\tau}\int_{t_k - \tau/2}^{t_k + \tau/2} x(t)\, dt - x(t_k).$$
Expanding $x$ in a Taylor series centered at $t_k$ and cancelling odd terms, we find
$$\nu_k^{r} = \frac{\tau^2}{24}\,\ddot{x}(t_k) + O(\tau^4). \tag{7}$$
Comparing (6) and (7) shows that up to leading order, $\omega_k$ and $\nu_k^{r}$ are directly proportional, meaning that if the leading-order terms are nonzero, they are correlated.
If a backward Euler method were used instead of a forward Euler method, the leading term of the system error would be $-\frac{\Delta t^2}{2}\ddot{x}(t_k)$, which is negatively correlated with $\nu_k^{r}$. Moreover, if the observations arose from an asymmetric average [e.g., over $[t_k - \tau, t_k]$], then the observation error would be in terms of the first derivative of $x$ instead of the second derivative; however, these different derivatives are still likely to be correlated across components of the state. If higher-order methods were used, then the errors would involve higher-order derivatives, which are still highly likely to be correlated.
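The proportionality of (6) and (7) is easy to check numerically. Below is a minimal sketch using an assumed truth $x(t) = \sin(t)$, so that both errors are multiples of $\ddot{x}(t) = -\sin(t)$; the step sizes are illustrative choices of ours:

```python
import numpy as np

# Assumed truth x(t) = sin(t), which solves dx/dt = cos(t); then both the
# Euler LTE (6) and the representation error (7) are multiples of
# x''(t) = -sin(t).  Step sizes are illustrative.
dt, tau = 0.05, 0.04
t = np.arange(0.0, 20.0, dt)
x = np.sin(t)

# System error (6): truth one step ahead minus the forward Euler forecast.
omega = np.sin(t + dt) - (x + dt * np.cos(t))

# Observation error (7): exact average over [t - tau/2, t + tau/2] minus
# the one-point quadrature h(x_k) = x_k.
true_avg = (np.cos(t - tau / 2) - np.cos(t + tau / 2)) / tau
nu = true_avg - x

corr = np.corrcoef(omega, nu)[0, 1]
print(f"correlation of system and observation errors: {corr:.4f}")
```

The computed correlation is very close to one, reflecting that both error series are, to leading order, proportional to the same second derivative.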

c. Estimation of the full covariance matrix

The correlations described above will be illustrated in two simple examples. To show the effects most clearly, we assume a perfect model. We begin with a simple ODE solver.

Consider using forward Euler on the Lorenz-63 system (Lorenz 1963)
$$\dot{x} = \sigma(y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z, \tag{8}$$
where $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$. To demonstrate the correlation derived above, we first used a higher-order integrator [fourth-order Runge–Kutta (RK4) with a 0.005 time step] to produce a finely sampled ground truth signal. To define the truncated model, we used a forward Euler method with a much coarser time step. We used a 21-point composite trapezoid rule to approximate the integrated observation. In Figs. 1a and 1b we show the correlation between the system error, in this case the local truncation error (LTE) of the Euler solver, and the observation error, in this case only the representation error, as a function of time.
Fig. 1.

Demonstrating correlated noise in the truncated L63 system. (a) Comparing the true x coordinate of L63 (gray) to a one-step forecast using the forward Euler method with a coarse time step (blue, solid curve) and the integrated observation (red, dashed curve). (b) Comparing the system error (defined as the LTE) to the observation error (only representation error in this example); note the correlation. (c) Empirical covariance matrices, with red lines dividing the Q, S, and R blocks. (d)–(f) As in (a)–(c), but using the RK4 integrator with the same coarse time step. Color ranges in (c) and (f) are selected to emphasize the S matrix and may saturate for Q and R. Notice positive correlations in (b),(c) and negative correlations in (e),(f).

Citation: Monthly Weather Review 146, 9; 10.1175/MWR-D-17-0331.1

In Fig. 1c we show the estimated covariance matrix, which reveals the strong correlations between the system and observation errors. The covariance matrix is estimated by concatenating the system and observation errors at each step into a six-dimensional vector, and then computing the empirical covariance matrix of these vectors (averaged over T = 12 000 discrete time steps). The estimated matrix will be used in section 4 in a nonlinear filter, and this method can be used to estimate the matrix for general problems as long as one can afford a long offline run using a very fine discretization. Figures 1d–f show the same phenomenon for a more accurate solver, RK4 with a coarse time step. Although the system errors are much smaller, the correlation with the observation errors is still evident.
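The offline estimation procedure just described can be sketched as follows. The coarse step, averaging window, and run lengths below are illustrative stand-ins of ours, not the exact values behind Fig. 1:

```python
import numpy as np

def l63(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(g, v, h):
    k1 = g(v); k2 = g(v + 0.5 * h * k1)
    k3 = g(v + 0.5 * h * k2); k4 = g(v + h * k3)
    return v + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Fine "truth" run (RK4), subsampled to a coarse grid.
h_fine, sub = 0.005, 10                 # coarse step dt = 0.05 (illustrative)
v = np.array([1.0, 1.0, 1.0])
for _ in range(2000):                   # discard transient
    v = rk4_step(l63, v, h_fine)
fine = [v]
for _ in range(4000):
    v = rk4_step(l63, v, h_fine)
    fine.append(v)
fine = np.array(fine)
truth = fine[::sub]
dt = h_fine * sub
half = sub // 2                         # centered averaging window

ks = range(1, len(truth) - 1)
# System error: Euler LTE of the coarse (perfect-model) integrator.
omega = np.array([truth[k + 1] - (truth[k] + dt * l63(truth[k])) for k in ks])
# Observation error: trapezoid average of the fine solution over a centered
# window minus direct evaluation (representation error only).
nu = []
for k in ks:
    seg = fine[k * sub - half:k * sub + half + 1]
    avg = (0.5 * seg[0] + seg[1:-1].sum(axis=0) + 0.5 * seg[-1]) / (2 * half)
    nu.append(avg - truth[k])
nu = np.array(nu)

# Empirical 6x6 joint covariance with blocks Q, S, S^T, R.
C = np.cov(np.hstack([omega, nu]), rowvar=False)
```

The off-diagonal block `C[:3, 3:]` estimates $S$; its diagonal entries are strongly positive relative to the corresponding variances, consistent with the positive correlations in Fig. 1c.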

The difference between the positive correlations in Figs. 1b and 1c and the negative correlations in Figs. 1e and 1f will have a noticeable effect on filter accuracy, as shown in section 4a. In section 5, we will establish a theory explaining this disparity in the linear case.

As a second example, consider spatiotemporal dynamics given by the Kuramoto–Sivashinsky PDE (Kuramoto and Tsuzuki 1976; Sivashinsky 1977)
$$u_t + u u_z + u_{zz} + u_{zzzz} = 0, \tag{9}$$
defined on a periodic domain of length L. For simplicity, we will use an explicit method applying RK4 in time and second-order finite-difference formulas for the spatial derivatives. While an implicit method would be stable for much larger values of $\Delta t$, we will later see that the filter will be able to stably recover the signal from noisy observations even for large $\Delta t$ (the filter uses the observations to stabilize what would otherwise be an unstable numerical scheme). To obtain a high-resolution “ground truth” signal, we use a grid with 512 equally spaced spatial grid points and a fine time step.

To simulate a PDE integrator in practice, we truncate the model, applying the same RK4 solver with a reduced number of grid points and a larger $\Delta t$. Let N be the number of spatial grid points on the domain and let $\Delta z = L/N$ be the spatial step size. We define an observation function that integrates in space as
$$\mathcal{H}(u)(z_j, t_k) = \frac{1}{2\delta}\int_{z_j - \delta}^{z_j + \delta} u(z, t_k)\, dz, \tag{10}$$
where δ defines the spatial region over which the observations are averaged. For example, the true observation is first estimated using a composite trapezoid rule over those grid points of the full 512-gridpoint solution that fall within the averaging window. If we consider a truncated model with 64 grid points, then the observation function must be estimated using only the coarse grid points that fall within the window. For a still coarser truncation, the window may contain only a single grid point; in other words, our coarse model for the observation function becomes direct observation at each grid point. This happens when the integration range δ becomes smaller than the truncated grid spacing $\Delta z$. In each case, the estimated observation function h is consistent with the true observation $\mathcal{H}$, so we are not considering observation model error yet, but only representation error.
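The spatially integrated observation (10) and its coarse-grid quadrature can be sketched as follows. The helper below is ours, and the grid sizes in the usage are illustrative:

```python
import numpy as np

def integrated_obs(u, L, delta):
    """Average u over the window [z_j - delta, z_j + delta] around each grid
    point on a periodic domain of length L, via a composite trapezoid rule.
    Falls back to direct observation when the window is below the grid size.
    """
    N = len(u)
    dx = L / N
    r = int(round(delta / dx))        # half-window measured in grid points
    if r == 0:                        # 2*delta smaller than the grid spacing:
        return u.copy()               # direct observation at each grid point
    w = np.ones(2 * r + 1)
    w[0] = w[-1] = 0.5                # trapezoid end weights
    idx = (np.arange(N)[:, None] + np.arange(-r, r + 1)) % N
    return (u[idx] * w).sum(axis=1) / (2 * r)
```

Applying this with the same δ to both the 512-point truth and a decimated 64-point field, and differencing the (decimated) results, produces the representation error of the kind shown in Fig. 2.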

In Fig. 2 we compare the full resolution and truncated solutions for 64 grid points. The observation representation errors (top right) are tightly correlated to the system errors, which consist of truncation errors from the one-step integrator (bottom right). Both errors are also correlated with the underlying truth solution.

Fig. 2.

(top) (left) Ground truth 512 gridpoint solution, (middle left) the same solution decimated to 64 grid points, (middle right) the observation, which integrates the leftmost solution over 9 grid points before truncating, and (right) the observation error, the difference between the middle two solutions. (bottom) (left),(middle left) As in (top). (middle right) The one-step integrator output from the truncated model, using 64 grid points and . (right) System error, the difference between the middle two solutions.

Citation: Monthly Weather Review 146, 9; 10.1175/MWR-D-17-0331.1

It is helpful to view the empirical full covariance matrix of the system plus observation errors that can be estimated from the data in Fig. 2. In Fig. 3 (left) we show the matrix where the submatrices and have been spatially averaged [using the symmetry of (9) on the periodic domain], which reveals the strong correlation between the system and observation errors. In Fig. 3 (right), we plot the sorted eigenvalues of the empirically estimated matrix (black, solid curve), which result purely from the correlation of the truncation error in the model and the representation error in the observation.

Fig. 3.

For the Kuramoto–Sivashinsky model truncated onto 64 grid points with we show (left) the spatially averaged matrix; note that the cross covariance between dynamical truncation errors and observation representation errors has a larger magnitude than the variance of the observation errors. (right) The eigenvalues of (black, solid curve) decay quickly. The presence of eigenvalues that are very close to zero indicates that the matrix is close to maximally correlated, as we will show in section 5. We also show the eigenvalues for a correlation matrix computed in the presence of both observational model error and representation error (red, dashed curve). Finally, we show the eigenvalues after the diagonal of the matrix is increased by 50% (blue, dotted curve).


Next, we consider the case of a large observation model error and show that the observed correlations are still present in this case. While leaving the truncated observation function unchanged, we changed the true observation function to compute a weighted spatial average of the four nearest grid points to each observed point. Explicitly, prior to applying the composite trapezoid rule, we first average each location with its four nearest neighbors with weights:
eq20
which maintains the local structure but also implies that the quadrature rule used for the truncated state is no longer consistent with this new observation. In Fig. 3 (right), we plot the eigenvalues of the empirically estimated matrix (red, dashed curve) for this new observation, which contains both observation model error and representation error. While the correlation is slightly further from maximal, the same strong decay of eigenvalues and almost singular behavior is present as in the case of representation error alone.

We emphasize the presence in Fig. 3 of many small eigenvalues, indicating that the matrix is close to singular. To show that these small eigenvalues come from the special structure of the correlated errors, we artificially increased the diagonal of the submatrices by 50%, and the resulting eigenvalues are shown as the blue dotted curve in Fig. 3. In other words, by inflating the variances beyond their true values we destroy the special “almost rank deficient” nature of this type of correlated error. In section 5, we focus on this phenomenon, which indicates that the system and observation errors are very close to being maximally correlated. We will show that the strong correlation between the system and observation errors has significant consequences for the ability to estimate the true state from the observations.
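This eigenvalue structure is easy to reproduce with synthetic covariances (the matrices below are illustrative, not the paper's empirical estimates): when the observation errors are linear functions of the system errors, the joint covariance has only N nonzero eigenvalues, and inflating the diagonal by 50% removes the near-singularity.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 8
# system errors with a random SPD covariance Q
A = rng.standard_normal((N, N))
Q = A @ A.T + 0.1 * np.eye(N)
# observation errors that are exact linear functions of the system errors,
# mimicking representation error driven by the same truncation error
B = rng.standard_normal((M, N))
S = B @ Q               # cross covariance  Cov(nu, omega) = B Q
R = B @ Q @ B.T         # observation error covariance
C = np.block([[Q, S.T], [S, R]])

eig = np.sort(np.linalg.eigvalsh(C))
print(eig[:M])          # near zero: the joint noise is only N dimensional

# inflating the diagonal by 50% removes the rank deficiency
C_inf = C + 0.5 * np.diag(np.diag(C))
eig_inf = np.sort(np.linalg.eigvalsh(C_inf))
print(eig_inf[:M])      # bounded away from zero
```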

3. Filtering in the presence of correlations

In this section we review versions of the Kalman filter for linear and nonlinear dynamics, which include full correlations of system and observation errors. We begin with the linear formulas, and then discuss the unscented version of the ensemble Kalman filter for nonlinear models.

a. The Kalman filter for correlated system and observation noise

We begin by reviewing the Kalman update equations for a linear system with correlated noise (e.g., see Simon 2006). Assume the following model and observation equations:
e11
e12
where and represent the system dynamics and the linear observable, respectively; and and are fixed matrices. Assume (3) represents the noise covariances.
Given the posterior estimate of the state and covariance at step , the Kalman update for a linear system (Simon 2006; Bélanger 1974) is
e13
e14
e15
where represents the forecast of the state given only the observations up to time and represents the covariance of the forecast. Similarly, represents the forecast of the ith observation given only the observations up to time , and the difference between the observed variables and the forecast mapped into observation space,
eq21
is called the innovation. These innovations are often used to estimate the system and observation error covariances, as in Bélanger (1974), Mehra (1970, 1972), and Berry and Sauer (2013). The Kalman gain matrix optimally combines the forecast with the innovation to form the posterior estimate , which is the maximum likelihood and minimum variance estimator of the true state . The filter also produces the covariance matrix of the estimator for use in the next filter step.
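The symbols in (13)–(15) did not survive extraction, so the following is only a sketch of one common convention for the correlated-noise update, in which the observation noise ν is correlated with the system noise ω entering the same forecast step (the convention used for the CUKF in section 3b); all names are illustrative.

```python
import numpy as np

def kf_step_correlated(xa, Pa, y, F, H, Q, R, S):
    """One Kalman step for x_{k+1} = F x_k + omega, y_{k+1} = H x_{k+1} + nu,
    with Cov(omega) = Q, Cov(nu) = R, and cross covariance
    Cov(omega, nu) = S (an N x M matrix)."""
    # forecast
    xf = F @ xa
    Pf = F @ Pa @ F.T + Q
    # innovation statistics, including the cross-covariance terms
    Pxy = Pf @ H.T + S
    Pyy = H @ Pf @ H.T + H @ S + S.T @ H.T + R
    # gain and update; posterior covariance kept numerically symmetric
    K = Pxy @ np.linalg.inv(Pyy)
    xa_new = xf + K @ (y - H @ xf)
    Pa_new = Pf - K @ Pxy.T
    return xa_new, 0.5 * (Pa_new + Pa_new.T)
```

With S = 0 this reduces to the familiar uncorrelated Kalman update; the extra terms in Pxy and Pyy are precisely what the correlated filter exploits.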

b. The correlated unscented Kalman filter (CUKF)

We now generalize the correlated system and observation noise filtering approach to nonlinear systems, and we will show that for linear systems we recover exactly the equations above.

To apply the unscented Kalman filter we need to generate an unscented ensemble with the correct correlations. Since the noise realization is independent of the current state, we consider the concatenated state and noise vector as follows:
eq22
eq23
Notice that the concatenated state is dimensional, and the joint covariance matrix is . We then form the unscented ensemble, which is represented in a matrix
eq24
where contains the first N rows of the ensemble, contains the next N rows, and contains the final M rows. We also define the associated ensemble weights as
eq25
for . The scalar α defines the scaling of the ensemble, which is often chosen to be or , although Julier and Uhlmann (2004) suggest [in the limit as , the unscented Kalman filter (UKF) approaches the extended Kalman filter (EKF)]. Notice that if the matrix is constant, the square root of can be computed offline and then can be formed at each step as the block diagonal matrix with blocks and .
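Since the α-scaled weight formulas above are elided, the sketch below uses the simplest symmetric sigma-point set (equivalent to α = 1 with no center point), which reproduces a given mean and covariance exactly; the concatenated vector of state and noises with its block-diagonal covariance can be passed to it directly.

```python
import numpy as np

def unscented_ensemble(z_mean, P):
    """Symmetric unscented ensemble for a mean and covariance:
    2L points  z_mean +/- sqrt(L) * (columns of a square root of P),
    each with weight 1/(2L); reproduces the mean and covariance exactly."""
    L = z_mean.size
    # lower-triangular square root via Cholesky (P must be SPD)
    cols = np.sqrt(L) * np.linalg.cholesky(P)
    pts = np.hstack([z_mean[:, None] + cols, z_mean[:, None] - cols])
    wts = np.full(2 * L, 1.0 / (2 * L))
    return pts, wts
```

For the CUKF, z_mean would be the concatenation of the analysis mean with zero-mean noise blocks, and P the block-diagonal matrix built from the state covariance and the joint noise covariance.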
Now that we have generated an unscented ensemble with the correct correlations we can pass this ensemble through the nonlinear transformations defining
eq26
eq27
where the nonlinear functions f and h are applied to each column of the ensemble matrices to form the forecast ensemble matrices and that are and , respectively. Now we can compute the following forecast statistics:
eq28
where is the jth column of (the jth ensemble member).
We can now define the unscented version of the Kalman update for correlated noise as follows:
e16
Finally, as an implementation detail, the last equation should be computed as
eq29
in order to maintain numerical symmetry.
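Putting the pieces together, a minimal CUKF analysis step might look as follows. Equal sigma-point weights (α = 1) are assumed since the paper's weight formula is elided, the joint noise covariance C couples the system and observation noise, and the posterior covariance is computed in the symmetric form noted in the text; a tiny diagonal jitter keeps the Cholesky factorization well defined when C is nearly singular.

```python
import numpy as np

def cukf_step(xa, Pa, y, f, h, C, N, M):
    """One CUKF analysis step (sketch).  C is the (N+M) x (N+M) joint
    covariance [[Q, S],[S.T, R]] of system noise omega and observation
    noise nu; f and h act on single state vectors."""
    # sigma points for the concatenated vector [x; omega; nu]
    L = 2 * N + M
    z = np.concatenate([xa, np.zeros(N + M)])
    Pz = np.zeros((L, L))
    Pz[:N, :N] = Pa
    Pz[N:, N:] = C
    cols = np.sqrt(L) * np.linalg.cholesky(Pz + 1e-12 * np.eye(L))
    pts = np.hstack([z[:, None] + cols, z[:, None] - cols])
    w = 1.0 / (2 * L)
    X, W, V = pts[:N], pts[N:2 * N], pts[2 * N:]
    # propagate: forecast state ensemble, then observed ensemble
    Xf = np.column_stack([f(X[:, j]) + W[:, j] for j in range(2 * L)])
    Yf = np.column_stack([h(Xf[:, j]) + V[:, j] for j in range(2 * L)])
    xf, yf = Xf.mean(axis=1), Yf.mean(axis=1)   # equal weights
    dX, dY = Xf - xf[:, None], Yf - yf[:, None]
    Pxx = w * dX @ dX.T
    Pxy = w * dX @ dY.T
    Pyy = w * dY @ dY.T
    # Kalman update; symmetric covariance form for numerical safety
    K = Pxy @ np.linalg.inv(Pyy)
    xa_new = xf + K @ (y - yf)
    Pa_new = Pxx - K @ Pyy @ K.T
    return xa_new, 0.5 * (Pa_new + Pa_new.T)
```

For linear f and h the sigma-point statistics are exact, so this step reproduces the correlated-noise Kalman update of section 3a, consistent with the equivalence shown in appendix A.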

In appendix A we show the equivalence of the CUKF and the Kalman filter (KF) for linear problems with correlated noise, which shows that the CUKF is a natural generalization to nonlinear problems with correlated errors. We note that the generalization of the CUKF approach to an ensemble square root Kalman filter (EnSQKF) is a straightforward extension of the same Kalman update formulas. Integrating correlated noise into other Kalman filters such as the EnKF and the ensemble transform Kalman filter (ETKF) can also be achieved using the Kalman update for correlated noise. For large problems the covariance matrix would need to be localized in order to yield a practical method; for example, the localized ensemble transform Kalman filter (LETKF) can be adapted to use the unscented ensembles used here (Berry and Sauer 2013). A significant remaining task is generalizing the ensemble adjustment Kalman filter (EAKF) to additive system noise and correlated system and observation noise. Serial filters such as the EAKF cannot currently be applied even for additive system noise that is uncorrelated with the observation noise, and instead these filters typically use inflation to try to account for system error. Generalizing the serial filtering approach to allow these more general error models is an important task but is beyond the scope of this article.

4. Filtering systems with truncation errors

In this section we apply the CUKF to truncated observations of the Lorenz-63 and Kuramoto–Sivashinsky systems as described in section 2. The dynamics and observations considered in this section have no added noise, so the system errors arise only from truncation of the numerical solvers and the observation errors arise only from local integration (representation error only). The CUKF will use the empirically estimated matrices described in section 2, and we compare these to the filter results with the covariance matrix modified by setting the block equal to the 0 matrix, which we denote as UKF.

a. Example: Lorenz equations

First we consider the Lorenz-63 system in (8) with the observation described in section 2b. Using the same data generated in that example, we applied the CUKF and UKF. The estimates produced by these filters are shown in Fig. 4a (for the same time interval shown in Fig. 1). In Fig. 4b we show the errors between each filter’s estimates and the truth, compared to the observation representation errors over the same time interval. The CUKF, which uses the full matrix, obtains significantly superior estimates of the true state. Averaged over 6000 filter steps (after removing the initial filter transient), the root-mean-squared error (RMSE) of the standard UKF estimates is 0.29, whereas that of the CUKF estimates is 0.16. Compared to the RMSE of the raw observations, which is 0.35, the UKF reduced the error by 17%, while the CUKF reduced the error by 54%.

Fig. 4.

(a) Comparison of the true solution (gray) and its discrete time samples (black circles) and integrated observations (green circles) with the UKF estimates (blue, solid) and CUKF estimates (red, dotted) over the same time interval shown in Fig. 1. (b) Errors computed by subtracting the true discretized signal from the observation (green circles), the UKF estimates (blue, solid), and the CUKF estimate (red, dotted). (c),(d) As in (a),(b), but using the RK4 integrator with the same .


We then repeated this experiment using the RK4 integrator instead of forward Euler with the same truncated time step of and the results are shown in Figs. 4c and 4d. The RMSE of the UKF estimates with this integrator is 0.18 and the RMSE of the CUKF estimates is 0.16. Recall that the local truncation errors of RK4 were negatively correlated with the observation representation errors, resulting in relatively small differences between the UKF and CUKF for this example. The difference between positively and negatively correlated errors will be studied below in section 5. Notice that the CUKF with forward Euler obtains better estimates than the UKF using the far superior RK4 integrator. This shows that by using the correlations we can obtain better results with a much faster integrator. It also emphasizes the importance of the sign of the correlations, so that if possible one should select an integrator that yields errors that are positively correlated with observation representation errors (in the global average), possibly even if this requires using a lower-order method.

b. Example: Kuramoto–Sivashinsky

Next we consider filtering the observations of the Kuramoto–Sivashinsky model in (9) introduced in section 2c. Using a ground truth integrated with 512 spatial grid points and we consider truncated models with and 256 grid points and (results were similar for and ). In Figs. 5a and 5b we compare the RMSE of the UKF estimates, CUKF estimates, and the observations for two different spatial integration widths and . When the integral in (10) defining the true observation is estimated using the composite trapezoid rule on 3 grid points of the full 512 grid point solution and the RMSE of the observation representation errors is 0.01, which is of the signal variance. In Fig. 5a we see that for 64 grid points the UKF does not reduce the error much relative to the observation representation error, whereas the CUKF obtains a much better estimate. In fact, the CUKF error with 64 grid points is comparable to the UKF error with 256 grid points. When the integral in (10) is estimated using the composite trapezoid rule on 9 grid points of the full 512 grid point solution and the RMSE of the observation representation errors is 0.11, which is of the signal variance. As shown in Fig. 5b, the CUKF still outperforms the UKF; however, the difference at 64 grid points is less significant since meaning that each integral is completely contained between grid points.

Fig. 5.

(a),(b) Comparison of filter results using the UKF without correlations to filtering with correlations (CUKF) on the Kuramoto–Sivashinsky model truncated in space to and 256 grid points for observations integrated over (a) and (b) . (c) For 64 grid points and , we show the robustness of the results after adding various levels of Gaussian instrument noise with variance to the observations. (d) For the case we test the UKF with inflation by adding the identity matrix times a constant to (blue, solid) or (red, dashed). We also show the effect of inflating the filter background covariance (black, dotted) where the x axis indicates inflation percentage. In each case, inflation degraded the filter performance.


The results of both filters are robust for large until the numerical solver becomes extremely unstable, which occurred for with 256 grid points, since more grid points generally require a smaller to stabilize the solver. However, we note that the numerical solver is unstable even for 64 grid points with , and the filter is stabilizing the solver using the observations. We also examined the robustness of these results in the presence of additive Gaussian observation noise with covariance in Fig. 5c. Notice that for small the random noise is small and the errors from truncation dominate, meaning the correlations in are significant. As the uncorrelated noise is increased it eventually dominates the correlated part of the observation errors, so that the UKF and CUKF have similar performance.

Finally, in Fig. 5d we show the effect of inflation in the UKF by adding a constant multiple of the identity to either or , and in each case the best performance is found when using no inflation. We also tried inflating the filter background covariance matrix by multiplying by a constant greater than one, and this also had very little effect as shown in Fig. 5d. These results indicate that inflation cannot account for the correlated error. Since the and used in the UKF were determined empirically to be optimal in this example, the only way to improve the performance is to account for correlations using the CUKF.

5. Maximally correlated random variables and perfect recoverability

In the previous section, the importance of using the full correlation matrix was demonstrated, for system and observation errors that arise naturally from truncation and averaging that is common in geophysical modeling and filtering. In this section, we investigate the effects of cross correlation in a more systematic way. In particular, we identify the extreme case of maximally correlated random variables.

a. Maximum correlation

We begin by defining maximally correlated random variables.

Definition 5.1 (Maximally correlated random variables)

Let and be random variables with covariances and , respectively, and let be the cross covariance. We say that are maximally correlated if the Schur complement of in , namely , has minimal trace among all matrices . In other words .

While it is not immediately obvious from the definition, Lemma B.1 in appendix B shows that the roles of X and Y are symmetric, so that also minimizes . The idea of maximally correlated random variables is that by choosing an appropriate the matrix becomes rank deficient with rank N. Notice that we can make a linear change of the X variables,
eq30
so that is unchanged but the covariance matrix of is
e17
and the new state variables have covariance matrix with minimal trace. In other words, the variables have minimal variance among all possible choices of . According to the rank additivity formula of Guttman (1946), the rank of is equal to the sum of the rank of and the rank of its Schur complement , meaning that . Thus, by reducing the rank of the Schur complement we are actually choosing , which minimizes the rank of . Intuitively speaking, this choice of minimizes the dimensionality of the joint noise process.

A simple example of maximal correlation is to consider the case where , , and are scalars. By setting , we find the Schur complement to be . Moreover, with this choice of s the eigenvalues of are , so that is minimal over all possible choices of s. In general, when we can set , where and are matrix square roots (recall that matrix square roots are unique up to a choice of orthogonal matrix) and we find the Schur complement to be . When the formula for is similar and is given in Lemma B.1, which is stated and proved in appendix B.
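The symbols in the paragraph above are elided, so the construction below simply follows the stated recipe for the case N = M: with matrix square roots of the two covariances and any orthogonal matrix, the cross covariance S = R^{1/2} O Q^{1/2} drives the Schur complement to zero and makes the joint covariance rank deficient. The matrices are random illustrative examples.

```python
import numpy as np

def spd_sqrt(P):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(P)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(2)
N = 5
A = rng.standard_normal((N, N)); Q = A @ A.T + np.eye(N)  # system covariance
B = rng.standard_normal((N, N)); R = B @ B.T + np.eye(N)  # observation covariance

# a maximal cross covariance: S = R^{1/2} O Q^{1/2} for any orthogonal O
O = np.linalg.qr(rng.standard_normal((N, N)))[0]
S = spd_sqrt(R) @ O @ spd_sqrt(Q)

# the Schur complement R - S Q^{-1} S^T vanishes ...
schur = R - S @ np.linalg.solve(Q, S.T)
# ... so the joint covariance has rank N instead of 2N
C = np.block([[Q, S.T], [S, R]])
eigs = np.sort(np.linalg.eigvalsh(C))
print(np.linalg.norm(schur))   # near zero
print(eigs[:N])                # N eigenvalues near zero
```

This is exactly the "almost rank deficient" structure seen empirically in Fig. 3: the N small eigenvalues reflect the fact that the observation errors carry no information independent of the system errors.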

It follows from Lemma B.1 that given random variables and with covariance matrices , respectively, there always exists an matrix such that the total covariance matrix in (3) makes X and Y maximally correlated and has rank N. This finding is striking in the sense that if X and Y represent the system and observation errors of a dynamical system, respectively, and if they are maximally correlated, then the underlying noise/error process is actually only N dimensional, despite appearing dimensional. Since the observation errors are linear combinations of the system errors, up to an orthogonal transformation we can think of the process as effectively having no observation noise. We will make this rigorous below by showing that in the case of maximal correlations, the observation errors can be completely eliminated by filtering and the true state can be perfectly recovered.

Now consider the case when and are the system and observation covariances, respectively. From the previous lemma we can see that the easiest way to obtain maximally correlated processes is when the observation errors are linear combinations of the system errors (since this implies that has rank N). So, returning to the discussion in section 2a, we can now see that when the leading-order terms in the system and observation errors only differ by a constant multiple they will be maximally correlated up to higher-order terms. More generally, whenever the matrix has rank N, the system and observation errors are maximally correlated. In particular, the small eigenvalues in Fig. 3 indicate that the system and observation errors are close to maximally correlated.

b. Perfect recoverability in maximally correlated linear systems

Consider the linear system of the form (1) and (2), where
eq31
eq32
and assume the noise is generated by
eq33
as in (3). In this section we will show that when the covariance matrices and are maximally correlated, meaning that is chosen as in Lemma B.1, the state variables become perfectly recoverable, meaning that the limiting variance of the Kalman filter estimates of those variables is zero. Of course, in real applications we do not get to choose . Our purpose here is to demonstrate the maximal effect that can have on the ability to estimate random variables. As a consequence, if the true were maximal and one instead used a suboptimal filter with , the relative loss of accuracy would be “infinite” (since perfect reconstruction was possible with the true ). Although the results in this section only apply to linear filtering problems, in section 5c we will show similar empirical results for nonlinear filtering problems.
We show the effect that the maximal correlation has on the stationary posterior covariance of a Kalman filter. Without loss of generality, in this section we will assume and since we may replace by where is block diagonal with blocks and . Substituting (13) and (14) into (15) and setting , we find the discrete time algebraic Riccati equation (DARE):
e18

If (18) has a solution that is stabilizing, meaning that all the eigenvalues of are inside the unit circle [where is defined by (14) using the solution ], then this solution is unique and is the limiting covariance matrix of the Kalman filter, as shown in Ran and Vreugdenhil (1988). We can now state the following result; the proof can be found in appendix C.

Theorem 5.2

Assume that all the eigenvalues of the matrix
eq34
lie inside the unit circle. Then the limiting covariance matrix of a Kalman filtering problem with maximally correlated noise processes is zero when . In other words, all state variables are perfectly recoverable. When , if the general stability condition on is met, the limiting covariance matrix is zero when projected onto the top M eigenvectors of .

Notice that the Kalman filter has an asymmetry between and , which is not present in the definition of maximal correlation, because of their differing roles in the dynamics. The consequence of this asymmetry is seen in the stability condition. For simplicity, consider the case : when is positive, the stability condition is met for all , and even for some , since . Conversely, when is negative, we find that and stability of is no longer sufficient.

To demonstrate this result, we applied the numerical DARE solver implemented by MATLAB to a linear system with , where λ will be varied to demonstrate the effect of stability and , , and . To show how the filter estimates improve as approaches the maximal choice, we let for and . In this case, the correlation in is positive and we find that the stabilization criterion is
eq35
if and only if . In Fig. 6a we plot the mean of the diagonal elements of the numerical solution to (18) against the trace of the Schur complement for all values of j. The different curves correspond to different values of λ chosen near 1.5. Notice that for , as the noise approaches maximal correlation, the filter estimates approach the true state up to the limits of numerical precision. When , is still a solution of the DARE but is no longer stabilizing, and so the filter converges to a covariance matrix that has variances greater than zero.
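The scalar case can be checked directly by iterating the Riccati recursion rather than calling a DARE solver. The sketch below assumes unit noise variances with maximal positive correlation (s = 1) and the correlated-noise convention of section 3b, under which the closed-loop stability threshold works out to |λ| < 2; the specific numbers are illustrative.

```python
import numpy as np

def limiting_variance(lam, q=1.0, r=1.0, s=1.0, n_iter=500):
    """Iterate the scalar Riccati recursion for the correlated-noise
    filter (nu correlated with the omega entering the same forecast)
    and return the limiting posterior variance."""
    p = 1.0
    for _ in range(n_iter):
        pf = lam**2 * p + q        # forecast variance
        pyy = pf + 2.0 * s + r     # innovation variance
        pxy = pf + s               # state-innovation covariance
        p = pf - pxy**2 / pyy      # posterior variance
    return p

# maximal positive correlation s = sqrt(q r) = 1: perfect recovery persists
# even for unstable dynamics, as long as the stability threshold holds
print(limiting_variance(1.4))   # essentially zero
print(limiting_variance(2.1))   # strictly positive: stability violated
```

This mirrors the behavior in Fig. 6a: below the threshold the posterior variance collapses to numerical zero, while above it the DARE still has the zero solution but the iteration settles on a strictly positive stabilizing solution.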
Fig. 6.

Mean-squared error of filter estimates for linear models with (a) positive correlations and (b) negative correlations. Black curve is based on the filter using , all other curves use the true . Notice that we obtain perfect recovery to the limit of numerical precision when the stability criterion is satisfied: (a) for positive correlation and (b) for negative correlation.


To show the effect of negative correlation, we next consider the case for and . In this case, the correlation in is negative and we find that the stabilization criterion is
eq36
if and only if . Notice that in this case the dynamics are required to be stable () in order to stabilize the solution, whereas with positive correlations the dynamics could be unstable (). In Fig. 6b we plot the mean of the diagonal elements of the numerical solution to (18) against the trace of the Schur complement for all values of j. The different curves correspond to different values of λ chosen near 0.5. Notice that for , as the noise approaches maximal correlation, the filter estimates approach the true state up to the limits of numerical precision. When , is still a solution of the DARE but is no longer stabilizing, and so the filter converges to a covariance matrix with variances greater than zero.
Finally, we note that a standard form for the DARE used in numerical solvers, such as MATLAB, is
eq37
and (18) can be put in this form by setting , , , , , and .

c. Examples of UKF and perfect recovery in nonlinear systems

In this section we will apply the UKF to synthetic datasets generated with nonlinear dynamics where the system and observation errors are Gaussian distributed pseudorandom numbers. A surprising result is that despite the nonlinearity, we still obtain perfect recovery up to numerical precision for maximally correlated errors. Moreover, in analogy to the linear case, perfect recovery is not possible when the instabilities in the nonlinear dynamics become sufficiently strong.

We first consider the Lorenz-63 system introduced above in (8). We take the discrete time dynamics to be given by applying the RK4 solver with to the chaotic vector field in (8) and the direct observation function . We artificially add substantial system and observation noise of covariance and , respectively. According to the remarks preceding Lemma B.1, the system and observation noise are maximally correlated when , which implies that the Schur complement of and is the zero matrix. To test the recovery of the deterministic variables , we set for and , implying that is the trace of the Schur complement. A time series of Lorenz-63 was produced with noise specified from , and , and the filters of section 3 were applied. Results for the RMSE of the recovered variables are shown in Fig. 7a. In the limit as and approaches maximal correlation, we find perfect recovery of the true state using the CUKF algorithm with the true covariance matrix , as foreshadowed by the linear case. We repeated this experiment for the alternative parameter value in (8), which yields a globally attracting periodic orbit, and obtained very similar results, also shown in Fig. 7a.

Fig. 7.

Mean-squared error of filter estimates with positively correlated noise for (a) L63 in periodic and chaotic parameter regimes and (b) L96 dynamical systems for various values of the forcing parameter. Black curve is based on the filter using (UKF), and all other curves use the true (CUKF).


Since perfect recovery in the linear case depended on the degree of stability of the dynamics, we next investigate the effect of the Lyapunov exponents of a chaotic dynamical system on the ability to obtain perfect recovery. We consider the chaotic Lorenz-96 system, a 40-dimensional ODE given by
e19
where determines the size and number of the positive Lyapunov exponents of the chaotic dynamics (Lorenz 1996). Using , , the maximal correlation occurs when . We set for and and generated time series with noise from , and as above. The CUKF algorithm was applied to recover the 40-dimensional state. In Fig. 7b we show that for we obtain perfect recovery in the case of maximal correlation between system and observation noise. However, as increases, the system becomes more strongly chaotic and the perfect recovery breaks down. Notice that, as in the linear case, the failure of perfect recovery occurs very sharply between and . This suggests that, in analogy to the linear result, some form of stability condition is likely necessary for perfect recovery.
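For reference, the standard Lorenz-96 vector field and an RK4 step can be sketched as follows; the time step is elided in the text, so the dt used below is only an illustrative choice.

```python
import numpy as np

def lorenz96_rhs(x, F):
    """Lorenz-96 vector field dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F
    on a ring of len(x) variables (40 in the text)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def l96_rk4_step(x, dt, F):
    """One classical RK4 step for the Lorenz-96 system."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

Note that the uniform state with every component equal to F is an (unstable, for large F) equilibrium of this vector field, which provides a quick correctness check of the index shifts.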

6. Discussion

Approximating a dynamical system on a grid is pervasive in geophysical data assimilation applications. For dynamical processes, time is usually handled in a discrete fashion. We have shown that correlation between system and observation errors should be expected when the system errors derive from the local truncation errors of differential equation solvers, both in discrete time and on a spatial grid, and when the observational error is dominated by either observation model error or representation error.

In section 3, we introduced an approach to the ensemble Kalman filter that accounts for the correlations between system and observation errors. In particular, we showed that for spatiotemporal problems, extending the covariance matrix to allow cross correlations can reduce filtering error as much as a significant increase in grid resolution. Of course, obtaining more precise estimates of the truth with much coarser discretization allows faster runtimes and/or larger ensembles to be used.

Correlations are most significant when other independent sources of observation and system error are small compared to the truncation error. Of course, other sources of error, such as model error, may influence both the state and the observations, leading to further significant correlations, but for simplicity we focus on correlations arising in the perfect model scenario. It is reasonable to expect that in many physical systems, the noise affecting the state of the system would also affect the sensor or observation system.

The generalization of the CUKF to an ensemble square root Kalman filter (EnSQKF) is a straightforward extension. However, it remains to extend the ensemble adjustment Kalman filter (EAKF) for additive system noise to correlations between system and observation noise. An EAKF formulation is critical for situations when the ensemble size is necessarily much smaller than either the state or observation dimensions (N and M, respectively). This situation is common when the covariance matrices, which are used explicitly in the UKF approach above, do not fit in memory. A significant challenge in this formulation is that we cannot appropriately inflate the ensemble: since we assume the full correlation matrix is of maximum rank, any inflation of the small ensemble would only match the inflation in the subspace spanned by the ensemble. A promising alternative is to follow the approach of Whitaker and Hamill (2002) and design an alternative gain matrix such that the analysis ensemble has the same covariance as applying the Kalman gain to an appropriately inflated ensemble.

In this article, we have not dealt with the question of real-time estimation of the full covariance matrix . The importance of correctly specifying the and matrices was first demonstrated for the Kalman filter in Mehra (1970, 1972) and for nonlinear filters in Berry and Sauer (2013). We consider sequential methods for estimation of the full covariance matrix in parallel with filtering to be a fruitful area of future research.

Acknowledgments

We thank three reviewers whose helpful suggestions led to a much improved paper. This research was partially supported by National Science Foundation Grant DMS1723175.

APPENDIX A

Equivalence of CUKF and KF for Linear Problems with Correlated Errors

To justify our definition of the CUKF in section 3b, we will show that for linear systems the update in (16) is equivalent to the Kalman filter equations given in section 3a. We can define the covariance of the forecast by expanding the innovation as
ea1
and writing we find
eq38
where we recall that and . Notice that
eq39
so that
eq40
which implies that we can rewrite the Kalman gain equation as
eq41
Similarly, we can define the cross correlation between the state and observation as
eq42
and finally we can write the Kalman gain as
eq43
which agrees with the definition used in (16) for our version of the unscented Kalman filter.

APPENDIX B

Maximal Correlation when

In section 5 we showed how to define the maximal correlation matrix when . In the following lemma we derive the formula when .

Lemma B.1

Let be symmetric positive-definite matrices with eigendecompositions and . Denote diagonal entries by and .
  1. If , let be the first M columns of and be the first block of and let and .
  2. If , let and and let be any N columns of and the corresponding block of .
Then is minimized over all matrices by
eq44
for any orthogonal matrix . Moreover, for any maximal we have and .

Proof

Notice that
ea2
so that , where replaces the first M diagonal entries of with zeros if and if .

APPENDIX C

Proof of Theorem 5.2

Proof

It suffices to show that is a solution to the DARE. Let be the unique symmetric square root of and let be its inverse. Setting , notice that
eq45
and multiplying on the left by and on the right by we have
eq46
Finally, since is maximal with we have by Lemma B.1, which implies that
eq47
Since the previous equation is exactly like our (18) with , this shows that is a solution to the DARE. Moreover, since implies that the limiting Kalman gain is
eq48
where is invertible since it is the square root of an invertible matrix. Thus, the stabilizing condition is that the matrix
eq49
has eigenvalues inside the unit circle. Since solves the DARE and is stabilizing, it is the limiting covariance matrix of the Kalman filtering problem.

In the case when , recall that is the eigendecomposition from Lemma B.1 and contains the first M columns of so that . Since is maximal, we have that , and multiplying both sides by on the left and on the right we find , where . Similarly, setting , , and , we can rewrite the DARE (again multiplying both sides by on the left and on the right). The result is precisely the DARE from (18) with replaced by , respectively. Since , we have reduced to the case above, so is a solution of this DARE. In other words, satisfying is a solution of the DARE, and projecting the DARE onto the eigenvectors of orthogonal to would yield another DARE, which would need to be satisfied with a nonzero solution. Moreover, the resulting Kalman gain and stability condition become nontrivial in this case, but if the stability condition for the DARE is met, then we find a limiting covariance matrix, which is zero when projected onto the top M eigenvectors of .
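The DARE machinery used in this proof can be illustrated numerically for a standard toy system with uncorrelated noise: a stabilizing solution has vanishing Riccati residual and a closed-loop matrix with spectral radius less than one. The system matrices below are our own example, and we use SciPy's generic DARE solver rather than the paper's equations:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Toy linear system (not the paper's). By the duality a -> A^T, b -> H^T,
# scipy's DARE solver returns the limiting forecast covariance P of the
# Kalman filter.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
H = np.array([[1.0, 0.0]])
Q = np.diag([0.2, 0.1])
R = np.array([[0.5]])

P = solve_discrete_are(A.T, H.T, Q, R)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # limiting Kalman gain

# P satisfies P = A P A^T - A P H^T (H P H^T + R)^{-1} H P A^T + Q ...
resid = A @ P @ A.T - A @ K @ H @ P @ A.T + Q - P
assert np.max(np.abs(resid)) < 1e-8
# ... and the solution is stabilizing: A (I - K H) has spectral radius < 1.
closed_loop = A @ (np.eye(2) - K @ H)
assert np.max(np.abs(np.linalg.eigvals(closed_loop))) < 1.0
```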

REFERENCES

  • Bélanger, P. R., 1974: Estimation of noise covariance matrices for a linear time-varying stochastic process. Automatica, 10, 267–275, https://doi.org/10.1016/0005-1098(74)90037-5.

  • Berry, T., and T. Sauer, 2013: Adaptive ensemble Kalman filtering of non-linear systems. Tellus, 65A, 20331, https://doi.org/10.3402/tellusa.v65i0.20331.

  • Dee, D. P., 1995: On-line estimation of error covariance parameters for atmospheric data assimilation. Mon. Wea. Rev., 123, 1128–1145, https://doi.org/10.1175/1520-0493(1995)123<1128:OLEOEC>2.0.CO;2.

  • Guttman, L., 1946: Enlargement methods for computing the inverse matrix. Ann. Math. Stat., 17, 336–343, https://doi.org/10.1214/aoms/1177730946.

  • Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. Mon. Wea. Rev., 133, 3132–3147, https://doi.org/10.1175/MWR3020.1.

  • Hodyss, D., and N. Nichols, 2015: The error of representation: Basic understanding. Tellus, 67A, 24822, https://doi.org/10.3402/tellusa.v67.24822.

  • Janjić, T., and S. E. Cohn, 2006: Treatment of observation error due to unresolved scales in atmospheric data assimilation. Mon. Wea. Rev., 134, 2900–2915, https://doi.org/10.1175/MWR3229.1.

  • Janjić, T., and Coauthors, 2018: On the representation error in data assimilation. Quart. J. Roy. Meteor. Soc., https://doi.org/10.1002/qj.3130, in press.

  • Julier, S. J., and J. K. Uhlmann, 2004: Unscented filtering and nonlinear estimation. Proc. IEEE, 92, 401–422, https://doi.org/10.1109/JPROC.2003.823141.

  • Kuramoto, Y., and T. Tsuzuki, 1976: Persistent propagation of concentration waves in dissipative media far from thermal equilibrium. Prog. Theor. Phys., 55, 356–369, https://doi.org/10.1143/PTP.55.356.

  • Liu, Z.-Q., and F. Rabier, 2002: The interaction between model resolution, observation resolution and observation density in data assimilation: A one-dimensional study. Quart. J. Roy. Meteor. Soc., 128, 1367–1386, https://doi.org/10.1256/003590002320373337.

  • Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.

  • Lorenz, E. N., 1996: Predictability—A problem partly solved. Proc. Seminar on Predictability, Vol. 1, Reading, United Kingdom, ECMWF, 18 pp., https://www.ecmwf.int/sites/default/files/elibrary/1995/10829-predictability-problem-partly-solved.pdf.

  • Mehra, R., 1970: On the identification of variances and adaptive Kalman filtering. IEEE Trans. Autom. Control, 15, 175–184, https://doi.org/10.1109/TAC.1970.1099422.

  • Mehra, R., 1972: Approaches to adaptive filtering. IEEE Trans. Autom. Control, 17, 693–698, https://doi.org/10.1109/TAC.1972.1100100.

  • Mitchell, H. L., and R. Daley, 1997: Discretization error and signal/error correlation in atmospheric data assimilation. Tellus, 49A, 32–53, https://doi.org/10.3402/tellusa.v49i1.12210.

  • Oke, P. R., and P. Sakov, 2008: Representation error of oceanic observations for data assimilation. J. Atmos. Oceanic Technol., 25, 1004–1017, https://doi.org/10.1175/2007JTECHO558.1.

  • Ran, A., and R. Vreugdenhil, 1988: Existence and comparison theorems for algebraic Riccati equations for continuous- and discrete-time systems. Linear Algebra Appl., 99, 63–83, https://doi.org/10.1016/0024-3795(88)90125-5.

  • Satterfield, E., D. Hodyss, D. D. Kuhl, and C. H. Bishop, 2017: Investigating the use of ensemble variance to predict observation error of representation. Mon. Wea. Rev., 145, 653–667, https://doi.org/10.1175/MWR-D-16-0299.1.

  • Simon, D., 2006: Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley-Interscience, 552 pp.

  • Sivashinsky, G., 1977: Nonlinear analysis of hydrodynamic instability in laminar flames I. Derivation of basic equations. Acta Astronaut., 4, 1177–1206, https://doi.org/10.1016/0094-5765(77)90096-0.

  • Van Leeuwen, P. J., 2015: Representation errors and retrievals in linear and nonlinear data assimilation. Quart. J. Roy. Meteor. Soc., 141, 1612–1623, https://doi.org/10.1002/qj.2464.

  • Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.