## 1. Introduction

**and**

*ω*. In this article, we argue the importance of modeling correlations between system and observation noise. Specifically, we will consider Kalman filters and ensemble Kalman filters (EnKF) that are derived based on the assumption that

*ν* In order for

If we consider the true state to be a function of space and time evolving in an infinite-dimensional function space, then the truncated true state and the observation are essentially two different finite-dimensional projections of this infinite-dimensional space (Dee 1995; Janjić and Cohn 2006; Oke and Sakov 2008). In section 2 we will show that the errors between the exact projections and the finite-dimensional approximations are correlated for generic observations. Error correlations arising from model truncation have been previously observed (Mitchell and Daley 1997; Hamill and Whitaker 2005; Liu and Rabier 2002). In particular, models are often composed of discrete dynamics occurring at points of a two- or three-dimensional grid. Remote observations by satellite or radiosonde can be viewed as integrations over a region including several grid points. Here, we provide a general framework to explain correlations between these quantities. We also consider other sources of error such as model mismatch and instrument error, and we show that significant correlations persist except in the case where instrument error dominates (since this error will be modeled as white noise that is uncorrelated with the state).

In section 3, a correlated version of the EnKF is developed that takes the correlations in (3) into account, and recovers the Kalman equations for linear systems. In section 4, the correlated unscented Kalman filter (CUKF; an unscented version of the EnKF) is applied to examples from section 2. Using an appropriate

In section 5 we investigate the effect of correlations in greater detail. First, we show that in the linear case when

## 2. Correlation between system and observation errors

To understand how correlations arise in applied data assimilation, we must first leave behind the idealized scenario described in (1) and (2). Following Dee (1995) and Janjić and Cohn (2006), we describe the true evolution and observation processes by replacing the discrete solution *z* and temporal variable *t*. We should note that the following analysis is very similar to Satterfield et al. (2017), except that we consider an infinite-dimensional solution

*f*, namely,

*h* be a consistent discretization of the true observation projection [as in (5)], we further decompose the observation error in terms of the representation error

### a. Evaluation and averaging projections

*f* and observation function

*h* are

*consistent*, meaning that the errors

*n* in space and order

*m* in time to obtain the system error

*h* should be a quadrature rule for approximating this integral. Assuming that the discrete observation function has order

*a* and

*c* are the same up to a scalar, and similarly when

*b* and

*d* are linear combinations of the same order derivatives. While it is possible for these terms to combine so as to exactly cancel when averaged over time, the correlation will often be nonzero.

As a special case, consider the situation when both the system and observation errors are dominated by the same single variable (either time or one of the spatial variables). In this case, the leading-order terms would differ only by a constant, so that up to higher-order terms the system error and observation error would be multiples of one another. This not only implies that *maximally correlated* up to higher-order corrections, so that *S* is as large as possible relative to the individual variances. For example, in the case of satellite observations of radiative transfer, the true observation integrates over the entire vertical component of the atmosphere, whereas the integral may be very localized in time and the horizontal variables. This would suggest a relatively large error in terms of vertical

*h* and an approximate function

*α* and

*h* and

*h* and

Even for general nonlinear observation functions *h* and approximate observation functions

### b. Time-averaged observations of an ODE

*f* is Euler’s method, the system error is

*h* to be a consistent quadrature rule using the grid points falling within the interval

### c. Estimation of the full covariance matrix

The correlations described above will be illustrated in two simple examples. To show the effects most clearly, we assume a perfect model. We begin with a simple ODE solver.

In Fig. 1c we show the estimated covariance matrix *T* = 12 000 discrete time steps). The estimated

The difference between the positive correlations in Figs. 1b and 1c and negative correlations in Figs. 1e and 1f will have a noticeable effect in filter accuracy, as shown in section 4a. In section 5, we will establish a theory explaining this disparity in the linear case.
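The empirical estimate of the full covariance matrix used in these examples can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure behind Fig. 1: the function name `estimate_joint_covariance`, the removal of the sample mean, and the synthetic error series (a stand-in for the truncation errors, which would in practice be computed by comparing the truncated model and observation function against a high-resolution reference) are all hypothetical choices made for the sketch.

```python
import numpy as np

def estimate_joint_covariance(sys_err, obs_err):
    """Empirical covariance of stacked system and observation errors.

    sys_err: array of shape (T, N), one system error per time step.
    obs_err: array of shape (T, M), one observation error per time step.
    Returns (Xi, Q, S, R) with Xi = [[Q, S], [S.T, R]].
    """
    sys_err = np.asarray(sys_err, dtype=float)
    obs_err = np.asarray(obs_err, dtype=float)
    n = sys_err.shape[1]
    stacked = np.hstack([sys_err, obs_err])   # shape (T, N + M)
    Xi = np.cov(stacked, rowvar=False)        # np.cov subtracts the sample mean
    Q, S, R = Xi[:n, :n], Xi[:n, n:], Xi[n:, n:]
    return Xi, Q, S, R

# Synthetic stand-in (assumption): the observation error shares a component
# with the first coordinate of the system error, plus a small independent part.
rng = np.random.default_rng(0)
T = 12000
omega = rng.normal(size=(T, 3))
nu = 0.5 * omega[:, :1] + 0.1 * rng.normal(size=(T, 1))

Xi, Q, S, R = estimate_joint_covariance(omega, nu)
print(np.round(S.ravel(), 2))  # cross covariance between system and observation errors
```

The off-diagonal block `S` is the quantity that a filter assuming uncorrelated errors sets to zero; its sign distinguishes the positively and negatively correlated cases discussed above.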

*δ* defines the spatial region over which the observations are averaged. So for example, when

*δ* has become smaller than our truncated

*h* is consistent with the true observation

In Fig. 2 we compare the full resolution and truncated solutions for 64 grid points and

(top) (left) Ground truth 512 gridpoint solution (middle left) the same solution decimated to 64 grid points (middle right) the observation, which integrates the leftmost solution over 9 grid points before truncating, and (right) the observation error, which is the difference between the middle two solutions. (bottom) (left),(middle left) As in (top). (middle right) The 1-step integrator output from the truncated model, using 64 grid points and

Citation: Monthly Weather Review 146, 9; 10.1175/MWR-D-17-0331.1

It is helpful to view the empirical full covariance matrix

For the Kuramoto–Sivashinsky model truncated onto 64 grid points with

In Fig. 3 we should emphasize the presence of many small eigenvalues, indicating that the *maximally correlated*. We will show that the strong correlation between the system and observation errors has significant consequences for the ability to estimate the true state from the observations.

## 3. Filtering in the presence of correlations

In this section we review versions of the Kalman filter for linear and nonlinear dynamics, which include full correlations of system and observation errors. We begin with the linear formulas, and then discuss the unscented version of the ensemble Kalman filter for nonlinear models.

### a. The Kalman filter for correlated system and observation noise

*i*th observation given only the observations up to time

*innovation*. These innovations are often used to estimate the system and observation error as in Bélanger (1974), Mehra (1970, 1972), and Berry and Sauer (2013). The Kalman gain matrix
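For reference, one forecast and analysis cycle of a linear Kalman filter with correlated noise can be sketched as below. This is a hedged sketch rather than a transcription of the equations of this section: it assumes the convention that the system noise entering the state at step *i* and the observation noise at step *i* have cross covariance `S`, and it uses the standard correlated-noise update for that convention (see, e.g., Simon 2006). The function name `correlated_kf_step` and the numerical values are illustrative.

```python
import numpy as np

def correlated_kf_step(x, P, y, F, H, Q, R, S):
    """One forecast/analysis cycle with cross covariance S = E[omega nu^T].

    Assumed model: x_i = F x_{i-1} + omega_i,  y_i = H x_i + nu_i.
    """
    # Forecast step
    x_f = F @ x
    P_f = F @ P @ F.T + Q
    # Innovation and its covariance, including the cross terms from S
    innov = y - H @ x_f
    P_y = H @ P_f @ H.T + H @ S + S.T @ H.T + R
    # Cross covariance between the forecast error and the innovation
    P_xy = P_f @ H.T + S
    K = np.linalg.solve(P_y, P_xy.T).T   # K = P_xy P_y^{-1} (P_y is symmetric)
    x_a = x_f + K @ innov
    P_a = P_f - K @ P_y @ K.T
    return x_a, P_a, K

# Illustrative values (assumptions, not taken from the article)
F = np.array([[0.9, 0.1], [0.0, 0.8]])
H = np.array([[1.0, 0.5]])
Q = 0.1 * np.eye(2)
R = np.array([[0.2]])
S = np.array([[0.1], [0.0]])
x0, P0, y1 = np.array([1.0, -1.0]), np.eye(2), np.array([0.3])

x_a, P_a, K = correlated_kf_step(x0, P0, y1, F, H, Q, R, S)
print(K.ravel(), np.trace(P_a))
```

Setting `S` to zero recovers the usual gain `P_f H^T (H P_f H^T + R)^{-1}`, so the only changes relative to the uncorrelated filter are the extra terms in `P_y` and `P_xy`.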

### b. The correlated unscented Kalman filter (CUKF)

We now generalize the correlated system and observation noise filtering approach to nonlinear systems and we will show that for linear systems we recover exactly the equations above.

*N* rows of the ensemble,

*N* rows, and

*M* rows. We also define the associated ensemble weights as

*α* defines the scaling of the ensemble, which is often chosen to be

*f* and

*h* are applied to each column of the ensemble matrices to form the forecast ensemble matrices

*j*th column of

*j*th ensemble member).

In appendix A we show the equivalence of the CUKF and the Kalman filter (KF) for linear problems with correlated noise, which shows that the CUKF is a natural generalization to nonlinear problems with correlated errors. The generalization of the CUKF approach to an ensemble square root Kalman filter (EnSQKF) is a straightforward extension of the same Kalman update formulas. Correlated noise can also be integrated into other Kalman filters, such as the EnKF and the ensemble transform Kalman filter (ETKF), using the Kalman update for correlated noise. For large problems the covariance matrix would need to be localized for the method to be practical; for example, the localized ensemble transform Kalman filter (LETKF) can be adapted to use the unscented ensembles used here (Berry and Sauer 2013). A significant remaining task is generalizing the ensemble adjustment Kalman filter (EAKF) to additive system noise and to correlated system and observation noise. Serial filters such as the EAKF cannot currently be applied even for additive system noise that is uncorrelated with the observation noise; instead, these filters typically use inflation to try to account for system error. Generalizing the serial filtering approach to allow these more general types of inflation is an important task that is beyond the scope of this article.
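One way to realize an unscented filter with correlated noise is sketched below. It is an illustration under explicit assumptions, not the CUKF exactly as defined in this section: the ensemble is drawn from an augmented covariance containing the state covariance and the joint noise covariance `Xi`, the sigma points are a simple symmetric set with equal weights (the scaling parameter *α* and the weights of the article are not reproduced), and the noise is assumed to enter as `x_f = f(x) + omega`, `y_f = h(x_f) + nu`. The names `sym_sqrt` and `cukf_step` are hypothetical.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root; tolerates singular (e.g., maximally correlated) A."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def cukf_step(x, P, y, f, h, Xi):
    """One cycle of an augmented unscented filter with joint noise covariance Xi.

    Xi = [[Q, S], [S.T, R]] has size (N+M) x (N+M); f maps R^N to R^N and
    h maps R^N to R^M (both return 1-D arrays).
    """
    N, M = x.size, y.size
    L = 2 * N + M
    # Augmented covariance: state uncertainty is independent of the joint noise
    C = np.zeros((L, L))
    C[:N, :N] = P
    C[N:, N:] = Xi
    root = np.sqrt(L) * sym_sqrt(C)
    mean = np.concatenate([x, np.zeros(N + M)])[:, None]
    # 2L symmetric sigma points (columns), each with weight 1/(2L)
    E = np.hstack([mean + root, mean - root])
    Xs, Om, Nu = E[:N], E[N:2 * N], E[2 * N:]
    # Apply f and h column by column, adding the sampled noise
    Xf = np.column_stack([f(Xs[:, j]) for j in range(2 * L)]) + Om
    Yf = np.column_stack([h(Xf[:, j]) for j in range(2 * L)]) + Nu
    x_f, y_f = Xf.mean(axis=1), Yf.mean(axis=1)
    dX, dY = Xf - x_f[:, None], Yf - y_f[:, None]
    P_f = dX @ dX.T / (2 * L)
    P_y = dY @ dY.T / (2 * L)
    P_xy = dX @ dY.T / (2 * L)
    K = np.linalg.solve(P_y, P_xy.T).T   # K = P_xy P_y^{-1}
    return x_f + K @ (y - y_f), P_f - K @ P_y @ K.T

# Illustrative nonlinear example (assumed values): one Euler step of a pendulum
f = lambda v: v + 0.01 * np.array([v[1], -np.sin(v[0])])
h = lambda v: v[:1]
Xi = np.array([[0.1, 0.0, 0.1], [0.0, 0.1, 0.0], [0.1, 0.0, 0.2]])
x_a, P_a = cukf_step(np.array([1.0, 0.0]), np.eye(2), np.array([0.9]), f, h, Xi)
print(x_a, np.diag(P_a))
```

Because the symmetric sigma points reproduce the augmented mean and covariance exactly, this sketch reduces to the linear Kalman update with cross covariance `S` when `f` and `h` are linear, which is the property established for the CUKF in appendix A.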

## 4. Filtering systems with truncation errors

In this section we apply the CUKF to truncated observations of the Lorenz-63 and Kuramoto–Sivashinsky systems as described in section 2. The dynamics and observations considered in this section have no added noise, so the system errors arise only from truncation of the numerical solvers, and the observation errors arise only from local integration (representation error only). The CUKF will use the empirically estimated **0** matrix, which we denote as UKF.

### a. Example: Lorenz equations

First we consider the Lorenz-63 system in (8) with the observation described in section 2b. Using the same data generated in that example, we applied the CUKF and UKF. The estimates produced by these filters are shown in Fig. 4a (for the same time interval shown in Fig. 1). In Fig. 4b we show the errors between each filter’s estimates and the truth, compared to the observation representation errors over the same time interval. The CUKF, which uses the full

(a) Comparison of the true solution,

We then repeated this experiment using the RK4 integrator instead of forward Euler with the same truncated time step of

### b. Example: Kuramoto–Sivashinsky

Next we consider filtering the observations of the Kuramoto–Sivashinsky model in (9) introduced in section 2c. Using a ground truth integrated with 512 spatial grid points and

(a),(b) Comparison of filter results using the UKF without correlations to filtering with correlations (CUKF) on the Kuramoto–Sivashinsky model truncated in space to *x* axis indicates inflation percentage. In each case, inflation degraded the filter performance.

The results of both UKF are robust for large

Finally, in Fig. 5d we show the effect of inflation in the UKF by adding a constant multiple of the identity to either

## 5. Maximally correlated random variables and perfect recoverability

In the previous section, the importance of using the full correlation matrix *maximally correlated* random variables.

### a. Maximum correlation

We begin by defining maximally correlated random variables.

#### Definition 5.1 (Maximally correlated random variables)

Let *.* We say that *maximally correlated* if the Schur complement of

**X** and

**Y** are symmetric, so that

*N*. Notice that we can make a linear change of the

**X** variables,

A simple example of maximal correlation is to consider the *s* the eigenvalues of *s*. In general, when

It follows from Lemma B.1 that given random variables **X** and **Y** maximally correlated, and rank**X** and **Y** represent the system and observation errors of a dynamical system, respectively, and if they are maximally correlated, then the underlying noise/error process is actually only *N* dimensional, despite appearing *effectively having no observation noise.* We will make this rigorous below by showing that in the case of maximal correlations, the observation errors can be completely eliminated by filtering and the true state can be perfectly recovered.
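The idea of perfect recovery can be illustrated with a scalar toy model. The sketch below is an assumption-laden illustration and not a restatement of Theorem 5.2: it takes the model x_k = λ x_{k-1} + w_k, y_k = x_k + v_k with v_k = c w_k, so that Q = q, R = c²q, and S = cq (a rank-one joint covariance), and it iterates the analysis variance of the Kalman filter that uses the full cross covariance. For this toy model, linearizing the recursion about zero variance gives the contraction factor λ²c²/(1 + c)², so the variance decays to zero when |λc/(1 + c)| < 1 and settles at a positive value otherwise. The function name and parameter values are hypothetical.

```python
def analysis_variance(lam, q, c, steps=200, p0=1.0):
    """Iterate the scalar analysis variance with maximally correlated noise.

    Assumed model: x_k = lam * x_{k-1} + w_k,  y_k = x_k + v_k,  v_k = c * w_k,
    so that Q = q, R = c^2 q, S = c q.
    """
    Q, R, S = q, c * c * q, c * q
    p = p0
    for _ in range(steps):
        p_f = lam * lam * p + Q      # forecast variance
        p_y = p_f + 2.0 * S + R      # innovation variance with cross terms
        k = (p_f + S) / p_y          # gain using the cross covariance
        p = p_f - k * (p_f + S)      # analysis variance
    return p

stable = analysis_variance(lam=1.5, q=0.1, c=1.0)     # |lam*c/(1+c)| = 0.75
unstable = analysis_variance(lam=2.5, q=0.1, c=1.0)   # |lam*c/(1+c)| = 1.25
print(stable, unstable)
```

In the first case the variance shrinks to the level of rounding error even though the dynamics are unstable (λ = 1.5); in the second case the instability is too strong and the variance converges to a positive fixed point.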

Now consider the case when *N*). So, returning to the discussion in section 2a, we can now see that when the leading-order terms in the system and observation errors only differ by a constant multiple they will be maximally correlated up to higher-order terms. More generally, whenever the *N*, the system and observation errors are maximally correlated. In particular, the small eigenvalues in Fig. 3 indicate that the system and observation errors are close to maximally correlated.
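A numerical check of maximal correlation can be sketched as follows. The sketch assumes that the relevant condition is the vanishing of the Schur complement of the system error block in the joint covariance, which is how the code interprets Definition 5.1; the helper name `schur_complement_of_Q`, the matrix `B`, and the dimensions are hypothetical. The observation error is constructed as a linear function of the system error, the situation described above in which the errors differ only by a constant multiple.

```python
import numpy as np

def schur_complement_of_Q(Xi, n):
    """Return R - S^T Q^+ S for Xi = [[Q, S], [S^T, R]], Q the leading n x n block."""
    Q, S, R = Xi[:n, :n], Xi[:n, n:], Xi[n:, n:]
    return R - S.T @ np.linalg.pinv(Q) @ S

rng = np.random.default_rng(1)
N, M = 3, 2
A = rng.normal(size=(N, N))
Q = A @ A.T + np.eye(N)                  # well-conditioned system error covariance
B = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, -1.0]])         # hypothetical map: nu = B @ omega

# Joint covariance of (omega, B omega) versus independent errors with the same Q, R
Xi_max = np.block([[Q, Q @ B.T], [B @ Q, B @ Q @ B.T]])
Xi_ind = np.block([[Q, np.zeros((N, M))], [np.zeros((M, N)), B @ Q @ B.T]])

print(np.linalg.norm(schur_complement_of_Q(Xi_max, N)))               # near zero
print(np.linalg.matrix_rank(Xi_max), np.linalg.matrix_rank(Xi_ind))   # N versus N + M
print(np.round(np.linalg.eigvalsh(Xi_max), 6))                        # M eigenvalues near zero
```

The rank deficiency of `Xi_max` reflects the statement above that the underlying error process is only *N* dimensional, and eigenvalues of an empirical joint covariance that are small relative to the rest indicate errors that are close to maximally correlated.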

### b. Perfect recoverability in maximally correlated linear systems

If (18) has a solution that is stabilizing, meaning that all the eigenvalues of

#### Theorem 5.2

*M* eigenvectors of

*.*

Notice that the Kalman filter has an asymmetry between

*λ* will be varied to demonstrate the effect of stability and

*j*. The different curves correspond to different values of

*λ* chosen near 1.5. Notice that for

Mean-squared error of filter estimates for linear models

*j*. The different curves correspond to different values of

*λ* chosen near 0.5. Notice that for

### c. Examples of UKF and perfect recovery in nonlinear systems

In this section we will apply the UKF to synthetic datasets generated with nonlinear dynamics where the system and observation errors are Gaussian distributed pseudorandom numbers. A surprising result is that despite the nonlinearity, we still obtain perfect recovery up to numerical precision for maximally correlated errors. Moreover, in analogy to the linear case, perfect recovery is not possible when the instabilities in the nonlinear dynamics become sufficiently strong.

We first consider the Lorenz-63 system introduced above (8). We consider the discrete time dynamics

Mean-squared error of filter estimates with positively correlated noise for (a) L63 in periodic and chaotic parameter regimes and (b) L96 dynamical systems for various values of the forcing parameter. Black curve is based on the filter using

## 6. Discussion

Approximating a dynamical system on a grid is pervasive in geophysical data assimilation applications. For dynamical processes, time is usually handled in discrete fashion. We have shown that correlation between system and observation errors should be expected when the system errors derive from local truncation errors from differential equation solvers, both in discrete time and on a spatial grid, and when observational error is dominated by either observation model error or representation error.

In section 3, we introduced an approach to the ensemble Kalman filter that accounts for the correlations between system and observation errors. In particular, we showed that for spatiotemporal problems, extending the covariance matrix to allow cross correlations can reduce filtering error as much as a significant increase in grid resolution. Of course, obtaining more precise estimates of the truth with much coarser discretization allows faster runtimes and/or larger ensembles to be used.

Correlations are most significant when other independent sources of observation and system error are small compared to the truncation error. Of course, other sources of error, such as model error, may influence both the state and the observations leading to further significant correlations, but for simplicity we focus on correlations arising in the perfect model scenario. It is reasonable to expect that in many physical systems, the noise affecting the state of the system would also affect the sensor or observation system.

The generalization of the CUKF to an ensemble square root Kalman filter (EnSQKF) is a straightforward extension. However, it remains to extend the ensemble adjustment Kalman filter (EAKF) for additive system noise to correlations between system and observation noise. An EAKF formulation is critical for situations when the ensemble size is necessarily much smaller than either the state or observation dimensions (*N* and *M*, respectively). This situation is common when the covariance matrices, which are explicitly used in the UKF approach above, do not fit in memory. A significant challenge in this formulation is that we cannot appropriately inflate the ensemble since we assume the full correlation matrix

In this article, we have not dealt with the question of real-time estimation of the full covariance matrix

## Acknowledgments

We thank three reviewers whose helpful suggestions led to a much improved paper. This research was partially supported by National Science Foundation Grant DMS1723175.

## APPENDIX A

### Equivalence of CUKF and KF for Linear Problems with Correlated Errors

## APPENDIX B

### Maximal Correlation when

In section 5 we showed how to define the maximal correlation matrix

#### Lemma B.1

If

, let be the first *M*columns ofand be the first block of and let and . If

, let and and let be any *N*columns ofand the corresponding block of .

*.*Moreover, for any maximal

#### Proof

*M*diagonal entries of

## APPENDIX C

### Proof of Theorem 5.2

#### Proof

In the case when *M* columns of *M* eigenvectors of

## REFERENCES

Bélanger, P. R., 1974: Estimation of noise covariance matrices for a linear time-varying stochastic process. *Automatica*, **10**, 267–275, https://doi.org/10.1016/0005-1098(74)90037-5.

Berry, T., and T. Sauer, 2013: Adaptive ensemble Kalman filtering of non-linear systems. *Tellus*, **65A**, 20331, https://doi.org/10.3402/tellusa.v65i0.20331.

Dee, D. P., 1995: On-line estimation of error covariance parameters for atmospheric data assimilation. *Mon. Wea. Rev.*, **123**, 1128–1145, https://doi.org/10.1175/1520-0493(1995)123<1128:OLEOEC>2.0.CO;2.

Guttman, L., 1946: Enlargement methods for computing the inverse matrix. *Ann. Math. Stat.*, **17**, 336–343, https://doi.org/10.1214/aoms/1177730946.

Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. *Mon. Wea. Rev.*, **133**, 3132–3147, https://doi.org/10.1175/MWR3020.1.

Hodyss, D., and N. Nichols, 2015: The error of representation: Basic understanding. *Tellus*, **67A**, 24822, https://doi.org/10.3402/tellusa.v67.24822.

Janjić, T., and S. E. Cohn, 2006: Treatment of observation error due to unresolved scales in atmospheric data assimilation. *Mon. Wea. Rev.*, **134**, 2900–2915, https://doi.org/10.1175/MWR3229.1.

Janjić, T., and Coauthors, 2018: On the representation error in data assimilation. *Quart. J. Roy. Meteor. Soc.*, https://doi.org/10.1002/qj.3130, in press.

Julier, S. J., and J. K. Uhlmann, 2004: Unscented filtering and nonlinear estimation. *Proc. IEEE*, **92**, 401–422, https://doi.org/10.1109/JPROC.2003.823141.

Kuramoto, Y., and T. Tsuzuki, 1976: Persistent propagation of concentration waves in dissipative media far from thermal equilibrium. *Prog. Theor. Phys.*, **55**, 356–369, https://doi.org/10.1143/PTP.55.356.

Liu, Z.-Q., and F. Rabier, 2002: The interaction between model resolution, observation resolution and observation density in data assimilation: A one-dimensional study. *Quart. J. Roy. Meteor. Soc.*, **128**, 1367–1386, https://doi.org/10.1256/003590002320373337.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.

Lorenz, E. N., 1996: Predictability—A problem partly solved. *Proc. Seminar on Predictability*, Vol. 1, Reading, United Kingdom, ECMWF, 18 pp., https://www.ecmwf.int/sites/default/files/elibrary/1995/10829-predictability-problem-partly-solved.pdf.

Mehra, R., 1970: On the identification of variances and adaptive Kalman filtering. *IEEE Trans. Autom. Control*, **15**, 175–184, https://doi.org/10.1109/TAC.1970.1099422.

Mehra, R., 1972: Approaches to adaptive filtering. *IEEE Trans. Autom. Control*, **17**, 693–698, https://doi.org/10.1109/TAC.1972.1100100.

Mitchell, H. L., and R. Daley, 1997: Discretization error and signal/error correlation in atmospheric data assimilation. *Tellus*, **49A**, 32–53, https://doi.org/10.3402/tellusa.v49i1.12210.

Oke, P. R., and P. Sakov, 2008: Representation error of oceanic observations for data assimilation. *J. Atmos. Oceanic Technol.*, **25**, 1004–1017, https://doi.org/10.1175/2007JTECHO558.1.

Ran, A., and R. Vreugdenhil, 1988: Existence and comparison theorems for algebraic Riccati equations for continuous- and discrete-time systems. *Linear Algebra Appl.*, **99**, 63–83, https://doi.org/10.1016/0024-3795(88)90125-5.

Satterfield, E., D. Hodyss, D. D. Kuhl, and C. H. Bishop, 2017: Investigating the use of ensemble variance to predict observation error of representation. *Mon. Wea. Rev.*, **145**, 653–667, https://doi.org/10.1175/MWR-D-16-0299.1.

Simon, D., 2006: *Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches.* Wiley-Interscience, 552 pp.

Sivashinsky, G., 1977: Nonlinear analysis of hydrodynamic instability in laminar flames I. Derivation of basic equations. *Acta Astronaut.*, **4**, 1177–1206, https://doi.org/10.1016/0094-5765(77)90096-0.

Van Leeuwen, P. J., 2015: Representation errors and retrievals in linear and nonlinear data assimilation. *Quart. J. Roy. Meteor. Soc.*, **141**, 1612–1623, https://doi.org/10.1002/qj.2464.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.