Including Cross Correlation between Forecast and Observation Errors in an Ensemble Kalman Filter

Shun Ohishi RIKEN Center for Computational Science, Kobe, Japan
RIKEN Cluster for Pioneering Research, Kobe, Japan
RIKEN Interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Kobe, Japan
Institute for Space-Earth Environmental Research, Nagoya University, Nagoya, Japan

https://orcid.org/0000-0003-4043-8886
,
Yuki Kobayashi RIKEN Center for Computational Science, Kobe, Japan
Graduate School of Engineering, Kyoto University, Kyoto, Japan

, and
Takemasa Miyoshi RIKEN Center for Computational Science, Kobe, Japan
RIKEN Cluster for Pioneering Research, Kobe, Japan
RIKEN Interdisciplinary Theoretical and Mathematical Sciences Program (iTHEMS), Kobe, Japan

Open access

Abstract

The Kalman filter is an unbiased minimum variance estimator under the assumption of no cross correlation between the forecast and observation errors. Some data assimilation (DA) systems assimilate analyzed data as if they were independent observations, but they may contain errors correlated with the forecast errors. Examples include satellite retrievals used in atmospheric DA systems and optimally interpolated sea surface temperature (OISST) analysis data used in ocean DA systems. Even if the forecasts are not directly used in satellite retrievals or OISST, the model formulations and analysis methods are generally similar and could introduce correlated errors. This study brings to light the potential impacts of including the cross correlation in an ensemble Kalman filter using perfect-model twin experiments with the Lorenz-96 model. The observation errors are generated by mixing the forecast errors in the observation space with independent random noise. We formulate the ensemble transform Kalman filter (ETKF) with the cross correlation (ETKFCC) with only minor modifications to the ETKF and implement it with the Lorenz-96 model. We find nonnegligible impacts from the cross correlation. The results show that the ETKFCC is significantly more accurate than the standard ETKF at negligible additional computational cost.

© 2025 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Shun Ohishi, shun.ohishi@riken.jp

1. Introduction

The Kalman filter (KF) and the ensemble Kalman filter (EnKF) are widely used data assimilation methods that optimally combine forecasts and observations to estimate state variables and model parameters and to evaluate observing systems. Since the EnKF is easy to implement with various models and is efficient for parallel computations, it has been widely used for the analyses of the atmospheric and oceanic states and for the operational weather forecasts (Houtekamer and Zhang 2016; Balmaseda et al. 2015).

Previous studies have investigated the impacts of cross correlation between system noise $\mathbf{q}$ (a.k.a. model errors) and observation errors $\boldsymbol{\epsilon}^o$ [i.e., $\langle\mathbf{q}(\boldsymbol{\epsilon}^o)^T\rangle$] and proposed new KF and EnKF formulations that include this cross correlation (Petovello et al. 2009; Chang 2014; Berry and Sauer 2018; Raboudi et al. 2021). Hereafter, the notations follow Tables A1–A6 in appendix A. In a discrete-time nonlinear dynamical system, represented as $\mathbf{x}^t_{t+1} = M(\mathbf{x}^t_t) + \mathbf{q}_t$, the system noise comes from various sources of model imperfections such as parameterization and boundary conditions.

The KF and EnKF are formulated under the assumption of no cross correlation between the forecast and observation errors [i.e., $\langle\boldsymbol{\epsilon}^f(\boldsymbol{\epsilon}^o)^T\rangle = \mathbf{0}$]. To the best of the authors’ knowledge, little attention has been paid to the effects of this cross correlation, since it is generally assumed that observations are made independently of the model predictions. However, some data assimilation systems assimilate reanalysis data in the atmosphere (Hoover and Langland 2017) and optimal interpolation analysis data such as Merged Satellite and In Situ Data Global Daily Sea Surface Temperature (MGDSST; Kurihara et al. 2006) in the ocean (Hirose et al. 2019; Kido et al. 2022) as well as observations like satellite retrievals in the atmosphere (Miyoshi and Kunii 2012). If these data assimilate the same observations used in the systems, we expect $\langle\boldsymbol{\epsilon}^f(\boldsymbol{\epsilon}^o)^T\rangle \neq \mathbf{0}$. Even if the forecasts are not directly used for satellite retrievals, the forecast errors of two independent systems might be correlated because model formulations are generally similar. Therefore, these data could possibly contain errors correlated with the forecast errors.

For a discrete nonlinear forecast dynamical system $\mathbf{x}^f_{t+1} = M(\mathbf{x}^f_t)$, the forecast error evolution is given by $\boldsymbol{\epsilon}^f_{t+1} \doteq \mathbf{M}\boldsymbol{\epsilon}^f_t + \mathbf{q}_t$, and therefore, the forecast errors consist of the errors originating from the initial condition and system noise. To focus on the issue of $\langle\boldsymbol{\epsilon}^f(\boldsymbol{\epsilon}^o)^T\rangle \neq \mathbf{0}$ without mixing the impacts from $\langle\mathbf{q}(\boldsymbol{\epsilon}^o)^T\rangle$, this study performs perfect-model twin experiments using the Lorenz-96 model (Lorenz 1996; Lorenz and Emanuel 1998) without system noise (i.e., $\mathbf{q} = \mathbf{0}$) and investigates the pure impact of including the cross correlation between the forecast and observation errors by extending the EnKF.

In this paper, section 2 describes a method for generating the observation errors correlated with the forecast errors and presents the formulations of extended KF with cross correlation (KFCC) and extended ensemble transform Kalman filter (Bishop et al. 2001) with cross correlation (ETKFCC), followed by the experimental settings in section 3. Section 4 presents the results, and a summary is given in section 5.

2. Method

a. Correlated observation errors

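In this study, we perform perfect-model twin experiments and can obtain the exact forecast errors online by $\boldsymbol{\epsilon}^f = \mathbf{x}^f - \mathbf{x}^t$. In this case, the observation errors correlated with $\boldsymbol{\epsilon}^f$ can be generated by
$$\boldsymbol{\epsilon}^o = \mathbf{A}H(\boldsymbol{\epsilon}^f) + \boldsymbol{\eta}. \tag{1}$$
Here, $\mathbf{A} = \mathrm{diag}(\mathbf{a})$, where $\mathbf{a} = (a_1, a_2, \ldots, a_p)$ is a vector consisting of scalar parameters $a_i$ ($i = 1, 2, \ldots, p$), and $\boldsymbol{\eta}$ is independent Gaussian random noise with mean $\mathbf{0}$ and error covariance matrix $\mathbf{R}^{uc}$ [i.e., $\boldsymbol{\eta} \sim N(\mathbf{0}, \mathbf{R}^{uc})$], where the superscript $uc$ represents “uncorrelated” with the forecast errors [i.e., $\langle\boldsymbol{\epsilon}^f\boldsymbol{\eta}^T\rangle = \mathbf{0}$]. In a scalar form, the cross-correlation coefficient between the observation and forecast errors is calculated by
$$r = \frac{a\sigma^f}{\sqrt{(a\sigma^f)^2 + (\sigma^{uc})^2}}, \tag{2}$$
where $(\sigma^{uc})^2$ is the variance of $\eta$. According to Eq. (2), the correlation coefficient varies at each time step since $\sigma^f$ varies.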
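As a rough numerical illustration of Eqs. (1) and (2), the following sketch generates correlated observation errors in the scalar case and compares the sample cross correlation with Eq. (2). It assumes $H = 1$ and $\mathbf{A} = a\mathbf{I}$, and it draws stand-in forecast errors from a Gaussian with a fixed $\sigma^f$ instead of taking them from a cycled experiment; the function names and the numerical values are hypothetical and chosen only for this example.

```python
import numpy as np


def correlated_obs_errors(eps_f, a, sigma_uc, rng):
    """Eq. (1) with H = I and A = a*I: eps_o = a * eps_f + eta."""
    eta = rng.normal(0.0, sigma_uc, size=eps_f.shape)  # uncorrelated part
    return a * eps_f + eta


def theoretical_r(a, sigma_f, sigma_uc):
    """Scalar cross-correlation coefficient of Eq. (2)."""
    return a * sigma_f / np.sqrt((a * sigma_f) ** 2 + sigma_uc ** 2)


# Illustrative values (assumptions, not settings from the experiments).
rng = np.random.default_rng(0)
sigma_f, sigma_uc, a = 0.5, 1.0, 0.8

# Stand-in forecast errors with a fixed standard deviation sigma_f.
eps_f = rng.normal(0.0, sigma_f, size=200_000)
eps_o = correlated_obs_errors(eps_f, a, sigma_uc, rng)

r_sample = np.corrcoef(eps_f, eps_o)[0, 1]
r_theory = theoretical_r(a, sigma_f, sigma_uc)
print(f"sample r = {r_sample:.3f}, Eq. (2) r = {r_theory:.3f}")
```

Because $\sigma^f$ is held fixed here, the sample value should stay close to Eq. (2); in a cycled filter $\sigma^f$ changes from step to step, and so does $r$.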

We do not prescribe a fixed cross-correlation coefficient between the forecast and observation errors for the following reason. As shown in Eq. (B5) in appendix B, the exact $\sigma^f$ is required to generate correlated observation errors with a fixed cross correlation, but it is not trivial to obtain it at each analysis time step because the KF and EnKF cannot estimate it perfectly. Therefore, we adopt Eq. (1) as a feasible first step in this study.

b. KFCC

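The analysis errors are represented as $\acute{\boldsymbol{\epsilon}}^a \doteq (\mathbf{I}_{n\times n} - \acute{\mathbf{K}}\mathbf{H})\boldsymbol{\epsilon}^f + \acute{\mathbf{K}}\boldsymbol{\epsilon}^o$ by substituting forecasts $\mathbf{x}^f = \mathbf{x}^t + \boldsymbol{\epsilon}^f$, observations $\mathbf{y} = H(\mathbf{x}^t) + \boldsymbol{\epsilon}^o$, and analyses $\acute{\mathbf{x}}^a = \mathbf{x}^t + \acute{\boldsymbol{\epsilon}}^a$ into $\acute{\mathbf{x}}^a = \mathbf{x}^f + \acute{\mathbf{K}}[\mathbf{y} - H(\mathbf{x}^f)]$. Here, the acute accent indicates the inclusion of the cross correlation between forecast and observation errors, and $\acute{\mathbf{K}}$ is the gain matrix to be derived. If we include the cross correlation, we can derive the analysis error covariance matrix $\acute{\mathbf{P}}^a$ as follows:
$$\begin{aligned}\acute{\mathbf{P}}^a &= \langle\acute{\boldsymbol{\epsilon}}^a(\acute{\boldsymbol{\epsilon}}^a)^T\rangle \\ &= \mathbf{P}^f + \mathbf{C}\acute{\mathbf{K}}^T - \mathbf{P}^f\mathbf{H}^T\acute{\mathbf{K}}^T + \acute{\mathbf{K}}\mathbf{C}^T + \acute{\mathbf{K}}\mathbf{R}\acute{\mathbf{K}}^T - \acute{\mathbf{K}}\mathbf{C}^T\mathbf{H}^T\acute{\mathbf{K}}^T \\ &\quad - \acute{\mathbf{K}}\mathbf{H}\mathbf{P}^f - \acute{\mathbf{K}}\mathbf{H}\mathbf{C}\acute{\mathbf{K}}^T + \acute{\mathbf{K}}\mathbf{H}\mathbf{P}^f\mathbf{H}^T\acute{\mathbf{K}}^T,\end{aligned} \tag{3}$$
where $\mathbf{C} \equiv \langle\boldsymbol{\epsilon}^f(\boldsymbol{\epsilon}^o)^T\rangle$ is the covariance matrix between the forecast and observation errors. By minimizing the total analysis error variance [i.e., calculating $\partial\,\mathrm{tr}(\acute{\mathbf{P}}^a)/\partial\acute{\mathbf{K}} = \mathbf{0}$], $\acute{\mathbf{K}}$ is derived as
$$\begin{aligned}\acute{\mathbf{K}} &= (\mathbf{P}^f\mathbf{H}^T - \mathbf{C})(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} - \mathbf{H}\mathbf{C} - \mathbf{C}^T\mathbf{H}^T)^{-1} \\ &= \mathbf{P}^f\mathbf{H}^T(\mathbf{I}_{p\times p} - \mathbf{A})^T[(\mathbf{I}_{p\times p} - \mathbf{A})\mathbf{H}\mathbf{P}^f\mathbf{H}^T(\mathbf{I}_{p\times p} - \mathbf{A})^T + \mathbf{R}^{uc}]^{-1}.\end{aligned} \tag{4}$$
To derive Eq. (4), we used $\mathbf{R} = \mathbf{A}\mathbf{H}\mathbf{P}^f\mathbf{H}^T\mathbf{A}^T + \mathbf{R}^{uc}$ and $\mathbf{C} = \mathbf{P}^f\mathbf{H}^T\mathbf{A}^T$, which are obtained from Eq. (1) and $\langle\boldsymbol{\epsilon}^f\boldsymbol{\eta}^T\rangle = \mathbf{0}$. From Eqs. (3) and (4) and $\mathbf{C} = \mathbf{P}^f\mathbf{H}^T\mathbf{A}^T$, the analysis error covariance matrix is given by
$$\acute{\mathbf{P}}^a = (\mathbf{I}_{n\times n} - \acute{\mathbf{K}}\mathbf{H})\mathbf{P}^f + \acute{\mathbf{K}}\mathbf{C}^T = [\mathbf{I}_{n\times n} - \acute{\mathbf{K}}(\mathbf{I}_{p\times p} - \mathbf{A})\mathbf{H}]\mathbf{P}^f. \tag{5}$$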

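We refer to the method described here as the extended Kalman filter with cross correlation (KFCC). In the case of no cross correlation (i.e., $\mathbf{A} = \mathbf{C} = \mathbf{0}$), the KFCC is reduced to the KF with the Kalman gain matrix $\mathbf{K} = \mathbf{P}^f\mathbf{H}^T(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R})^{-1}$ and the analysis error covariance matrix $\mathbf{P}^a = (\mathbf{I}_{n\times n} - \mathbf{K}\mathbf{H})\mathbf{P}^f$. By replacing $\mathbf{H}\boldsymbol{\epsilon}^f$ in $\mathbf{K}$ and $\mathbf{P}^a$ with $(\mathbf{I}_{p\times p} - \mathbf{A})\mathbf{H}\boldsymbol{\epsilon}^f$, we can also confirm the consistency between the KF and KFCC. When $\mathbf{A} = \mathbf{I}_{p\times p}$, Eq. (4) indicates $\acute{\mathbf{K}} = \mathbf{0}$, and therefore, $\acute{\mathbf{x}}^a = \mathbf{x}^f$ and $\acute{\mathbf{P}}^a = \mathbf{P}^f$. Namely, the observations have no impact on the analysis, and we will not consider this situation in this paper.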

The expressions for $\acute{\mathbf{K}}$ and $\acute{\mathbf{P}}^a$ have forms similar to those of the KF except for multiplication by the diagonal matrix $\mathbf{I}_{p\times p} - \mathbf{A}$, and therefore, the KFCC can be implemented with only minor modifications to the KF. The additional computational cost for this multiplication would be negligible.
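The equivalence of the two forms of Eq. (4) and of Eq. (5) can be checked numerically. The sketch below is only an illustration under assumed inputs: a small random symmetric positive-definite $\mathbf{P}^f$, a random linear $\mathbf{H}$, and hand-picked diagonal $\mathbf{A}$ and $\mathbf{R}^{uc}$. The function names `kfcc_update` and `kfcc_gain_general` are hypothetical.

```python
import numpy as np


def kfcc_update(Pf, H, A, R_uc):
    """Gain from the second form of Eq. (4) and covariance from Eq. (5)."""
    HA = (np.eye(H.shape[0]) - A) @ H            # (I - A) H
    K = Pf @ HA.T @ np.linalg.inv(HA @ Pf @ HA.T + R_uc)
    Pa = (np.eye(Pf.shape[0]) - K @ HA) @ Pf
    return K, Pa


def kfcc_gain_general(Pf, H, C, R):
    """Gain from the first form of Eq. (4), written with C and R."""
    S = H @ Pf @ H.T + R - H @ C - C.T @ H.T
    return (Pf @ H.T - C) @ np.linalg.inv(S)


# Small illustrative problem (all values are assumptions).
rng = np.random.default_rng(1)
n, p = 4, 3
G = rng.normal(size=(n, n))
Pf = G @ G.T + n * np.eye(n)                     # symmetric positive definite
H = rng.normal(size=(p, n))                      # linear observation operator
A = np.diag([0.5, -0.3, 0.8])
R_uc = np.diag([1.0, 2.0, 0.5])

# Relations that follow from Eq. (1) and <eps_f eta^T> = 0.
C = Pf @ H.T @ A.T
R = A @ H @ Pf @ H.T @ A.T + R_uc

K, Pa = kfcc_update(Pf, H, A, R_uc)
K_general = kfcc_gain_general(Pf, H, C, R)
Pa_general = (np.eye(n) - K_general @ H) @ Pf + K_general @ C.T

print("gains agree:      ", np.allclose(K, K_general))
print("covariances agree:", np.allclose(Pa, Pa_general))
```

Setting `A` to a zero matrix in `kfcc_update` gives the standard Kalman gain with $\mathbf{R} = \mathbf{R}^{uc}$.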

c. ETKFCC

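Here we derive the extended ensemble transform Kalman filter (ETKF) with cross correlation (ETKFCC). The forecast and analysis error covariance matrices are written as
$$\hat{\mathbf{P}}^f \doteq \frac{1}{m-1}\delta\mathbf{X}^f(\delta\mathbf{X}^f)^T, \tag{6}$$
$$\hat{\acute{\mathbf{P}}}^a \doteq \frac{1}{m-1}\delta\acute{\mathbf{X}}^a(\delta\acute{\mathbf{X}}^a)^T, \tag{7}$$
respectively, where the hat and $\doteq$ indicate the ensemble-based sample estimates and approximation, respectively. Substituting Eq. (6) into Eq. (4), we get
$$\hat{\acute{\mathbf{K}}} = \delta\mathbf{X}^f(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T[(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T + (m-1)\mathbf{R}^{uc}]^{-1}. \tag{8}$$
Using a transform matrix $\acute{\mathbf{W}}$, the analysis ensemble perturbation matrix is written as
$$\delta\acute{\mathbf{X}}^a = \delta\mathbf{X}^f\acute{\mathbf{W}}. \tag{9}$$
Substituting Eq. (8) into Eq. (5) and Eq. (9) into Eq. (7), we get
$$\begin{aligned}\hat{\acute{\mathbf{P}}}^a &= \frac{1}{m-1}\delta\mathbf{X}^f\{\mathbf{I}_{m\times m} - (\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T[(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T + (m-1)\mathbf{R}^{uc}]^{-1}(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f\}(\delta\mathbf{X}^f)^T \\ &= \frac{1}{m-1}\delta\mathbf{X}^f\acute{\mathbf{W}}\acute{\mathbf{W}}^T(\delta\mathbf{X}^f)^T,\end{aligned} \tag{10}$$
and hence,
$$\begin{aligned}\acute{\mathbf{W}}\acute{\mathbf{W}}^T &= \mathbf{I}_{m\times m} - (\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T[(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T + (m-1)\mathbf{R}^{uc}]^{-1}(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f \\ &= [\mathbf{I}_{m\times m} + (m-1)^{-1}(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T(\mathbf{R}^{uc})^{-1}(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f]^{-1}.\end{aligned} \tag{11}$$
To derive Eq. (11), we applied the Sherman–Morrison–Woodbury formula: $\mathbf{I}_{m\times m} - \delta\mathbf{Y}^T(\delta\mathbf{Y}\delta\mathbf{Y}^T + \mathbf{R})^{-1}\delta\mathbf{Y} = (\mathbf{I}_{m\times m} + \delta\mathbf{Y}^T\mathbf{R}^{-1}\delta\mathbf{Y})^{-1}$. Following Hunt et al. (2007), we define
$$\tilde{\acute{\mathbf{P}}}^a \equiv (m-1)^{-1}\acute{\mathbf{W}}\acute{\mathbf{W}}^T = [(m-1)\mathbf{I}_{m\times m} + (\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T(\mathbf{R}^{uc})^{-1}(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f]^{-1}, \tag{12}$$
and consequently, the transform matrix is given by
$$\acute{\mathbf{W}} = [(m-1)\tilde{\acute{\mathbf{P}}}^a]^{1/2}. \tag{13}$$
Applying $\delta\mathbf{Y}^T(\delta\mathbf{Y}\delta\mathbf{Y}^T + \mathbf{R})^{-1} = (\mathbf{I}_{m\times m} + \delta\mathbf{Y}^T\mathbf{R}^{-1}\delta\mathbf{Y})^{-1}\delta\mathbf{Y}^T\mathbf{R}^{-1}$ to Eq. (8), we get
$$\hat{\acute{\mathbf{K}}} = \delta\mathbf{X}^f\tilde{\acute{\mathbf{P}}}^a(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T(\mathbf{R}^{uc})^{-1}. \tag{14}$$
Using Eqs. (9) and (14), the $j$th ensemble analysis state vector is given by
$$\begin{aligned}\mathbf{x}^{a(j)} &= \overline{\mathbf{x}^f} + \hat{\acute{\mathbf{K}}}[\mathbf{y} - H(\overline{\mathbf{x}^f})] + \delta\acute{\mathbf{X}}^{a(j)} \\ &= \overline{\mathbf{x}^f} + \delta\mathbf{X}^f\{\tilde{\acute{\mathbf{P}}}^a(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T(\mathbf{R}^{uc})^{-1}[\mathbf{y} - H(\overline{\mathbf{x}^f})] + \acute{\mathbf{W}}^{(j)}\}.\end{aligned} \tag{15}$$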
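To apply multiplicative covariance inflation, $\delta\mathbf{X}^f$ is replaced with $\sqrt{\rho}\,\delta\mathbf{X}^f$ throughout the formulation, where $\rho \geq 1$ is a multiplicative inflation factor. As a result, only Eq. (12) is modified to
$$\tilde{\acute{\mathbf{P}}}^a = \left[\frac{m-1}{\rho}\mathbf{I}_{m\times m} + (\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T(\mathbf{R}^{uc})^{-1}(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f\right]^{-1}. \tag{16}$$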
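In summary, we get the formulation of ETKFCC as follows:
$$\mathbf{x}^{a(j)} = \overline{\mathbf{x}^f} + \delta\mathbf{X}^f\{\tilde{\acute{\mathbf{P}}}^a(\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T(\mathbf{R}^{uc})^{-1}[\mathbf{y} - H(\overline{\mathbf{x}^f})] + \acute{\mathbf{W}}^{(j)}\}, \tag{17}$$
$$\tilde{\acute{\mathbf{P}}}^a = [(m-1)\mathbf{I}_{m\times m} + (\delta\mathbf{Y}^f)^T(\mathbf{I}_{p\times p} - \mathbf{A})^T(\mathbf{R}^{uc})^{-1}(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f]^{-1}, \tag{18}$$
$$\acute{\mathbf{W}} = [(m-1)\tilde{\acute{\mathbf{P}}}^a]^{1/2}. \tag{19}$$
The standard ETKF without the cross correlation is given by
$$\mathbf{x}^{a(j)} = \overline{\mathbf{x}^f} + \delta\mathbf{X}^f\{\tilde{\mathbf{P}}^a(\delta\mathbf{Y}^f)^T\mathbf{R}^{-1}[\mathbf{y} - H(\overline{\mathbf{x}^f})] + \mathbf{W}^{(j)}\}, \tag{20}$$
$$\tilde{\mathbf{P}}^a = \left[\frac{m-1}{\rho}\mathbf{I}_{m\times m} + (\delta\mathbf{Y}^f)^T\mathbf{R}^{-1}\delta\mathbf{Y}^f\right]^{-1}, \tag{21}$$
$$\mathbf{W} = [(m-1)\tilde{\mathbf{P}}^a]^{1/2}. \tag{22}$$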

Similarly to the case of the KFCC, in the case of no cross correlation (i.e., $\mathbf{A} = \mathbf{0}$), the ETKFCC is reduced to the ETKF. If $\delta\mathbf{Y}^f$ in the ETKF is replaced with $(\mathbf{I}_{p\times p} - \mathbf{A})\delta\mathbf{Y}^f$, the ETKF and ETKFCC are consistent. The ETKFCC only adds multiplications by the diagonal matrix $\mathbf{I}_{p\times p} - \mathbf{A}$ and can be implemented with very minor modifications to the ETKF. The additional computational cost would be negligible.
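A minimal sketch of one ETKFCC analysis step is given below to make the modification concrete. It is not the authors' code: it assumes a linear observation operator given as a matrix `H`, a symmetric `R_uc`, a diagonal $\mathbf{A}$ passed as the vector `a_diag`, and a symmetric square root computed by eigendecomposition. The function name `etkfcc_analysis` and all argument names are hypothetical.

```python
import numpy as np


def etkfcc_analysis(Xf, y, H, a_diag, R_uc, rho=1.0):
    """One ETKFCC analysis step (sketch).

    Xf     : (n, m) forecast ensemble, one member per column
    y      : (p,)   observations
    H      : (p, n) linear observation operator (assumed linear here)
    a_diag : (p,)   diagonal elements of A
    R_uc   : (p, p) covariance of the uncorrelated noise (assumed symmetric)
    rho    : multiplicative inflation factor (rho >= 1)
    """
    n, m = Xf.shape
    xf_mean = Xf.mean(axis=1)
    dXf = Xf - xf_mean[:, None]                  # forecast perturbations
    dYf = H @ dXf                                # perturbations in obs space
    dYs = (1.0 - a_diag)[:, None] * dYf          # (I - A) dYf
    Rinv_dYs = np.linalg.solve(R_uc, dYs)        # (R_uc)^-1 (I - A) dYf

    # Eq. (16): inverse of [(m-1)/rho I + dYf^T (I-A)^T R_uc^-1 (I-A) dYf]
    Pa_tilde = np.linalg.inv((m - 1) / rho * np.eye(m) + dYs.T @ Rinv_dYs)

    # Eq. (13): symmetric square root of (m-1) * Pa_tilde
    evals, evecs = np.linalg.eigh((m - 1) * Pa_tilde)
    W = (evecs * np.sqrt(evals)) @ evecs.T

    # Mean-update weights: Pa_tilde dYf^T (I-A)^T R_uc^-1 [y - H xf_mean]
    w_mean = Pa_tilde @ Rinv_dYs.T @ (y - H @ xf_mean)

    # Eq. (15): add the mean weights to every column of W
    return xf_mean[:, None] + dXf @ (w_mean[:, None] + W)
```

Passing `a_diag = np.zeros(p)` and `R_uc = R` gives the standard ETKF of Eqs. (20)–(22), which reflects how small the code change is: one elementwise scaling of the rows of `dYf`.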

3. Experimental setting

This study uses the Lorenz-96 model (Lorenz 1996; Lorenz and Emanuel 1998) with 40 variables (i.e., $n = 40$) and cyclic boundary conditions:
$$\frac{dx_k}{dt} = (x_{k+1} - x_{k-2})x_{k-1} - x_k + F \quad (k = 1, 2, \ldots, n), \tag{23}$$
where $F = 8.0$ is the forcing parameter. A model time interval of 0.05 corresponds to 6 h when $F = 8.0$ if we consider a typical error-doubling time for synoptic weather (Lorenz 1996; Lorenz and Emanuel 1998). The model is integrated with the fourth-order Runge–Kutta scheme using a time step of $\Delta t = 0.01$. These experimental settings follow previous studies (Terasaki and Miyoshi 2014; Ying and Zhang 2015).
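For reference, a compact sketch of the Lorenz-96 tendency and a fourth-order Runge–Kutta step is shown below. The function names are hypothetical, and the exact form of the sinusoidal initial condition in the usage lines is an assumption made for illustration.

```python
import numpy as np


def lorenz96_tendency(x, F=8.0):
    """dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F


def rk4_step(x, dt=0.01, F=8.0):
    """Advance the state by one fourth-order Runge-Kutta step of length dt."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def advance_6h(x, dt=0.01, F=8.0):
    """Five steps of dt = 0.01 give 0.05 time units, i.e., about 6 h."""
    for _ in range(5):
        x = rk4_step(x, dt, F)
    return x


# Usage: sinusoidal initial state with wavenumber 4 and amplitude 1
# (the exact functional form is an assumption for illustration).
n = 40
x0 = np.sin(2.0 * np.pi * 4.0 * np.arange(n) / n)
x1 = advance_6h(x0)
```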

In this study, we perform perfect-model twin experiments. An 11-yr nature run is conducted after a 1-yr spinup, initialized by a sinusoidal wave with wavenumber 4 and amplitude 1. To compare the accuracy of the ETKF and ETKFCC, we perform ETKF and ETKFCC experiments with an ensemble size of $m = 40$. The initial ensemble states are generated by 1-yr ensemble free runs with the initial conditions from standard Gaussian random numbers $N(0, 1)$. Observations at every model grid point (i.e., $\mathbf{H} = \mathbf{I}_{n\times n}$) are generated every 6 h by adding the correlated observation errors [Eq. (1)] to true values from the nature run, and $\mathbf{A} = a\mathbf{I}_{p\times p}$ is assumed for Eqs. (1) and (20)–(22). As discussed in section 1, some systems assimilate optimally interpolated analysis data. Since the background and observation error covariance matrices in the optimal interpolation are usually fixed in space and time, it would be reasonable to assume spatially uniform cross correlation between the forecast and observation errors over the global domain, i.e., $\mathbf{A} = a\mathbf{I}_{p\times p}$, when we assimilate the optimally interpolated analysis datasets. If multiple datasets with different values of the parameter $a$ are assimilated, we need to prescribe each diagonal element of $\mathbf{A}$ or to assimilate each dataset serially. To generate the correlated observation errors, the forecast errors are calculated online by $\boldsymbol{\epsilon}^f = \overline{\mathbf{x}^f} - \mathbf{x}^t$, and $\sigma^{uc}$ is set to 1 (i.e., $\mathbf{R}^{uc} = \mathbf{I}_{p\times p}$). We investigate the sensitivity to the parameter $a$ by setting $a = -1, -0.9, \ldots, 0.9$ so that the observation errors do not contain unreasonably large forecast error components. Although $a < 0$ would be unrealistic in practice, we investigate those cases in this theoretical study.

For the ETKF, $\mathbf{R}$ as well as $\rho$ should be tuned because $\mathbf{R}$ depends on $\mathbf{P}^f$, as is clear from $\mathbf{R} = \mathbf{A}\mathbf{H}\mathbf{P}^f\mathbf{H}^T\mathbf{A}^T + \mathbf{R}^{uc}$. To investigate the optimal settings for each parameter $a$, we conduct experiments using a variety of tuning parameters: $\rho = 1.00, 1.01, \ldots, 1.1, 1.2, \ldots, 2.0, 3.0, \ldots, 5.0$ and $\mathbf{R} = 1.0, 1.1, \ldots, 2.0, 3.0, \ldots, 10.0 \times \mathbf{I}_{p\times p}$ for the ETKF experiments. Although $\mathbf{R} = \mathbf{A}\mathbf{H}\mathbf{P}^f\mathbf{H}^T\mathbf{A}^T + \mathbf{R}^{uc}$ suggests that nonzero off-diagonal elements exist in $\mathbf{R}$, we assumed diagonal $\mathbf{R}$ in this study for simplicity. Since the ETKFCC does not use $\mathbf{R}$, only $\rho$ is tuned by setting $\rho = 1.00, 1.01, \ldots, 1.10$, where large values beyond 1.1 are not necessary since the ETKFCC includes correlated observation errors explicitly. Covariance localization in observation space can be applied by inflating $\mathbf{R}$ and $\mathbf{R}^{uc}$ in the ETKF and ETKFCC, respectively. However, to avoid an additional tuning parameter, the covariance localization is excluded by having a sufficiently large ensemble size of 40 for the 40-dimensional system.

We validate the results for 10 years excluding the first year in the 11-yr ETKF and ETKFCC experiments. Using $\boldsymbol{\epsilon}^f = \overline{\mathbf{x}^f} - \mathbf{x}^t$, $\boldsymbol{\epsilon}^a = \overline{\mathbf{x}^a} - \mathbf{x}^t$, and $\boldsymbol{\epsilon}^o$ at the analysis time, we calculate spatiotemporally averaged forecast, analysis, and observational root-mean-square errors (RMSEs) over the whole domain and analysis period. We calculate an improvement ratio (IR) defined as
$$\mathrm{IR}\,(\%) = 100 \times \frac{\mathrm{RMSE}_{\mathrm{ETKF}} - \mathrm{RMSE}_{\mathrm{ETKFCC}}}{\mathrm{RMSE}_{\mathrm{ETKF}}}, \tag{24}$$
where $\mathrm{RMSE}_{\mathrm{ETKF}}$ and $\mathrm{RMSE}_{\mathrm{ETKFCC}}$ indicate the forecast RMSEs in the ETKF and ETKFCC experiments, respectively. We also apply both the statistical t-test and the bootstrap method to the forecast RMSE differences between the ETKF and ETKFCC experiments and obtain the same results for the significance at a 99% confidence level. For the bootstrap method, the RMSE differences are resampled 10 000 times. Using the forecast ensemble spread at the analysis time, we calculate the spatiotemporally averaged forecast ensemble spread over the whole domain and analysis period. Using $\boldsymbol{\epsilon}^f$ and $\boldsymbol{\epsilon}^o$ at the analysis time over the whole analysis period, we calculate the cross-correlation coefficient between the forecast and observation errors at each grid point and then spatially average the cross-correlation coefficients over the whole domain, although the cross-correlation coefficient might vary at each time as discussed in section 2a. We define filter divergence as the spatiotemporally averaged analysis RMSEs being larger than the corresponding observational RMSEs.
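The improvement ratio and a bootstrap check on RMSE differences might be sketched as follows. The text does not give the details of the bootstrap procedure, so the resampling scheme here is an assumption: a plain percentile bootstrap on the mean of the differences, which ignores any serial correlation in the time series. The function names `improvement_ratio` and `bootstrap_mean_ci` are hypothetical.

```python
import numpy as np


def improvement_ratio(rmse_etkf, rmse_etkfcc):
    """IR (%) = 100 * (RMSE_ETKF - RMSE_ETKFCC) / RMSE_ETKF."""
    return 100.0 * (rmse_etkf - rmse_etkfcc) / rmse_etkf


def bootstrap_mean_ci(diff, n_resample=10_000, level=0.99, seed=0):
    """Percentile bootstrap interval for the mean of RMSE differences.

    diff holds (RMSE_ETKF - RMSE_ETKFCC) at each analysis time.
    Samples are drawn independently with replacement (an assumption).
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(diff, dtype=float)
    means = np.empty(n_resample)
    for i in range(n_resample):
        means[i] = rng.choice(diff, size=diff.size, replace=True).mean()
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(means, [alpha, 1.0 - alpha])
    return lower, upper
```

Under these assumptions, an interval that lies entirely above zero would be read as the ETKFCC having significantly smaller forecast RMSEs at the chosen confidence level.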

4. Results

Figure 1 shows the spatiotemporally averaged forecast RMSEs and ensemble spread and the spatially averaged cross-correlation coefficients for the optimal choice of R in the ETKF experiment (Fig. 2a). As the parameter a becomes more positive, the forecast RMSEs and ensemble spread become larger; as a becomes more negative, they become smaller (Figs. 1a,b). Likewise, the cross-correlation coefficients become more positive and more negative as a becomes more positive and more negative, respectively (Fig. 1c). Figure 2 shows that optimal observation error standard deviations (i.e., the square root of diagonal elements in optimal R; Fig. 2a) correspond well to the spatiotemporally averaged observational RMSEs (Fig. 2b), although they are overestimated near the points close to the filter divergence and when ρ > 1.6–1.8 (Fig. 2c, red color). For negative a, the minimum forecast RMSEs are achieved when no or minimal inflation is applied (white stars in Fig. 1). In contrast, more inflation is required for larger positive a, and especially when a ≥ 0.7, the optimal inflation parameter exceeds 1.50. At the same time, however, the optimal observation error standard deviations are substantially larger than the observational RMSEs (Fig. 2) and thereby balance the large inflation.

Fig. 1.

Spatiotemporally averaged forecast (a) RMSEs and (b) ensemble spread, and (c) spatially averaged cross-correlation coefficients between the forecast and observation errors in the ETKF experiment for multiplicative inflation parameter ρ = 2, 3, …, 5. (d)–(i) As in (a)–(c), but for ρ = 1.1, 1.2, …, 2.0 and ρ = 1.00, 1.01, …, 1.10, respectively. The observation error variances are manually tuned (cf. Fig. 2a). White stars denote where the forecast RMSE is minimum for each a. White areas indicate filter divergence and a = 1 (not performed).

Citation: Monthly Weather Review 153, 6; 10.1175/MWR-D-25-0016.1

Fig. 2.

As in Fig. 1, but for (a) optimal observation error standard deviations, (b) spatiotemporally averaged observational RMSEs, and (c) the ratios of (a) to (b). In (c), the ratios of 95%–105% are also shaded in white.

In the ETKFCC experiment, as a becomes more positive, the forecast RMSEs and ensemble spread become larger, and as a becomes more negative, they become smaller (Figs. 3a,b); the cross-correlation coefficients become more positive and more negative accordingly (Fig. 3c). These results are qualitatively the same as those in the ETKF experiment. Since the diagonal elements of $\acute{\mathbf{K}}$ should be between 0 and 1, Eq. (5) in a scalar form implies that positive and negative covariances between the forecast and observation errors increase and reduce the analysis error variances, respectively. Therefore, the larger forecast RMSEs and ensemble spread for positive a and the smaller ones for negative a are theoretically consistent with Eq. (5). In contrast to the ETKF experiment, for positive a, the optimal multiplicative inflation parameter in the ETKFCC experiment is slightly increased but almost the same as that with no cross correlation (i.e., a = 0) (Figs. 1 and 3).
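As a hedged illustration of the scalar argument, consider a single observed variable with $H = 1$. The lowercase symbols $p^f$ (forecast error variance), $\acute{k}$ (scalar gain), $\acute{p}^a$ (analysis error variance), and $c = ap^f$ (cross covariance) are introduced here only for this example. With these assumptions, Eqs. (4) and (5) reduce to

$$\acute{k} = \frac{(1-a)\,p^f}{(1-a)^2 p^f + (\sigma^{uc})^2}, \qquad \acute{p}^a = (1-\acute{k})\,p^f + \acute{k}\,c = \frac{p^f(\sigma^{uc})^2}{(1-a)^2 p^f + (\sigma^{uc})^2}.$$

For a fixed $p^f$ and $a < 1$, the denominator decreases as $a$ increases, so $\acute{p}^a$ grows toward $p^f$ as $a \to 1$ and shrinks as $a$ becomes more negative, while $a = 0$ recovers the usual scalar KF result $p^f(\sigma^{uc})^2/[p^f + (\sigma^{uc})^2]$. In the cycled experiments $p^f$ itself depends on $a$, so this expression only sketches the single-step tendency.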

Fig. 3.

As in Figs. 1g–i, but for the ETKFCC experiment.

The minimum forecast RMSEs for each a are compared between the ETKF and ETKFCC experiments (Fig. 4a). Open orange circles indicate cases where IR > 5% and the forecast RMSEs are significantly better in the ETKFCC experiment than in the ETKF experiment at a 99% confidence level. For most values of a, the forecast RMSEs are smaller in the ETKFCC experiment than in the ETKF experiment. Especially for a ≤ −0.3 and 0.5 ≤ a ≤ 0.8, IR is more than 5% and statistically significant. However, the observational RMSEs are not consistent between the ETKF and ETKFCC experiments (Fig. 4b) because $\mathbf{R} = \mathbf{A}\mathbf{H}\mathbf{P}^f\mathbf{H}^T\mathbf{A}^T + \mathbf{R}^{uc}$, and therefore, the forecast errors contribute to the observation errors. Consequently, when comparing the accuracy between the ETKF and ETKFCC experiments, it is more appropriate to use RMSE ratios, which are defined as the ratios of the spatiotemporally averaged forecast RMSEs to the corresponding observational RMSEs, rather than the forecast RMSEs (Fig. 4c). When the RMSE ratios are used for calculating IR and testing statistical significance, the results are qualitatively the same as when using the forecast RMSEs (Figs. 4a,c). Therefore, the ETKFCC outperforms the ETKF for most values of a, especially for a ≤ −0.3 and 0.5 ≤ a ≤ 0.8.

Fig. 4.

Spatiotemporally averaged (a) forecast and (b) observational RMSEs and (c) the RMSE ratios of the spatiotemporally averaged forecast RMSEs to the corresponding observational RMSEs when the forecast RMSE is minimum for each a. The black and orange lines indicate the ETKF and ETKFCC experiments, respectively. In (a), a logarithmic scale is used for the vertical axis to clarify the differences between the ETKF and ETKFCC experiments. In (a), open circles are illustrated if IR > 5% and the forecast RMSEs and RMSE ratios in the ETKFCC experiment are significantly better than the ETKF experiment at a 99% confidence level. In (c), open circles are the same as in (a), but the RMSE ratios are used.

Figure 5 shows the spatially averaged cross-correlation coefficients between the forecast and observation errors when R and ρ are optimal for each a. The result is consistent with Eq. (2), showing that the signs of the cross-correlation coefficients are determined by those of a. For the positive cross correlation, the forecasts and observations tend to be on the same side relative to the true values, and the analyses are likely to be located far away from the true values. In contrast, for the negative cross correlation, the forecasts and observations tend to be on the opposite side, and the analyses are likely to be close to the true values. As a result, the positive and negative cross correlation results in the degradation and improvement of the accuracy, respectively (Fig. 4a). Equation (2) indicates that the asymmetry of the forecast RMSEs between the positive and negative a causes that of the cross correlation and that the differences in the forecast RMSEs between the ETKF and ETKFCC experiments result in those in the cross-correlation coefficient (Figs. 4a and 5).

Fig. 5.

Spatially averaged cross-correlation coefficients between the forecast and observation errors in the ETKF (black) and ETKFCC (orange) experiments when the forecast RMSE is minimum for each a.

As discussed in section 2a and appendix B, it is difficult to conduct numerically stable experiments using observation errors generated by fixing the cross-correlation coefficients. Therefore, using a line graph, we compare the RMSE ratios between the ETKF and ETKFCC experiments relative to the estimated cross-correlation coefficients (Fig. 6). The results show that the ETKFCC outperforms the ETKF for the more positive and negative cross correlation. However, when the cross-correlation coefficient is nearly zero, the RMSE ratios in the ETKFCC experiment are almost the same as those in the ETKF experiment.

Fig. 6.

The RMSE ratios relative to the spatially averaged correlation coefficients in the ETKF (black) and ETKFCC (orange) experiments when the forecast RMSE is minimum for each a.

5. Summary

We derived the KFCC and ETKFCC to include the cross correlation between the forecast and observation errors and compared the ETKF and the new ETKFCC by perfect-model twin experiments with the Lorenz-96 model. We generated correlated observation errors by mixing the forecast errors in the observation space with random noise [Eq. (1)]. The results showed that the positive cross correlation significantly degraded the accuracy in both ETKF and ETKFCC (Figs. 1, 3, 4, and 6) because the forecasts and observations tend to be located on the same side relative to the true values, and vice versa for the negative correlation case. The optimal inflation parameters in the ETKFCC are close to the experiments without the cross correlation for all a, whereas those in the ETKF are exceedingly large for the positive a (Figs. 1 and 3). For most of the cross-correlation coefficients, the ETKFCC outperformed the ETKF with negligible additional computational cost. Therefore, it would be recommended to implement the ETKFCC if we assimilate observations with the cross correlation.

As described in section 3, this study assumed that the assimilated data have a uniform parameter $a$ (i.e., $\mathbf{A} = a\mathbf{I}_{p\times p}$). In practice, however, the parameter $a$ would vary for different types of observation, and it is necessary to prescribe the parameter $a$ (i.e., the diagonal elements of $\mathbf{A}$) or to serially assimilate each observation type with a uniform scalar parameter $a$. In practical data assimilation systems, it is challenging to estimate the cross-correlation coefficients accurately. Therefore, our future study will explore online and offline approaches to estimate the parameter $a$ using innovation statistics (Desroziers et al. 2005) and estimate the parameter $a$ of real-world data.

Acknowledgments.

We thank the editor and three anonymous reviewers for giving constructive comments. This work was supported by JST AIP (Grant JPMJCR19U2, Japan); MEXT (Grant JPMXP1020200305) as “Program for Promoting Researches on the Supercomputer Fugaku” (Large Ensemble Atmospheric and Environmental Prediction for Disaster Prevention and Mitigation); the COE research grant in computational science from Hyogo Prefecture and Kobe City through the Foundation for Computational Science; JST, SICORP (Grant JPMJSC1804, Japan); JSPS KAKENHI (Grant JP19H05605); JSPS KAKENHI Grant-in-Aid for Early-Career Scientists (Grant JP23K13174); JSPS Grant-in-Aid for Transformative Research Areas (Grant JP24H02227); the Japan Aerospace Exploration Agency (JX-PSPC-452680, JX-PSPC-500973, JX-PSPC-509736, JX-PSPC-513414, JX-PSPC-519799, and JX-PSPC-527843); JST, CREST (Grant JPMJCR20F2, Japan); Cabinet Office, Government of Japan, Moonshot R & D Program for Agriculture, Forestry and Fisheries (funding agency: Bio-oriented Technology Research Advancement Institution) JPJ009237; RIKEN Pioneering Project “Prediction for Science”; JST, CREST (Grant: JPMJSA2109); and JST, K program (Grant JPMJKP23D1).

Data availability statement.

The source codes and datasets generated during and/or analyzed during the current study are available from https://zenodo.org/record/7777540.

APPENDIX A

Notations

We followed the notations presented in Tables A1–A6 in this study.

Table A1. Notations for scalars.

Table A2. Notations for superscripts.

Table A3. Notations for subscripts.

Table A4. Notations for vectors.

Table A5. Notations for matrices.

Table A6. Notations for operators and normal distributions.

APPENDIX B

Correlated Observation Error with Fixed Observation Error Variance and Cross-Correlation Coefficient in Scalar Form

In this appendix, we derive correlated observation errors assuming a fixed observation error variance $(\sigma^o)^2$ and cross-correlation coefficient $r$ between the forecast and observation errors. First, we represent the correlated observation errors as
$$\epsilon^o = a\epsilon^f + b\eta, \tag{B1}$$
where $a$ and $b$ are the scalar parameters to be determined and $\eta \sim N[0, (\sigma^{uc})^2]$ is Gaussian random noise that is independent of the forecast error (i.e., $\langle\epsilon^f\eta\rangle = 0$). From Eq. (B1), the observation error variance is calculated as
$$(\sigma^o)^2 = (a\sigma^f)^2 + (b\sigma^{uc})^2. \tag{B2}$$
The cross correlation between forecast and observation errors is given by $r = a\sigma^f/\sigma^o$, and therefore,
$$a = \frac{r\sigma^o}{\sigma^f}. \tag{B3}$$
Substituting Eq. (B3) into Eq. (B2), we can solve for $b$:
$$b = \frac{\sigma^o}{\sigma^{uc}}\sqrt{1 - r^2}. \tag{B4}$$
Finally, substituting Eqs. (B3) and (B4) into Eq. (B1), we get
$$\epsilon^o = \sigma^o\left(\frac{r}{\sigma^f}\epsilon^f + \frac{\sqrt{1 - r^2}}{\sigma^{uc}}\eta\right). \tag{B5}$$

Here, variables and parameters, except $\sigma^f$, are either prescribed or can be estimated exactly. However, the KF and EnKF estimate $\sigma^f$ imperfectly, and therefore, it is not trivial to obtain exact $\epsilon^o$ at each time step.
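Equation (B5) can be checked numerically when $\sigma^f$ is known exactly. The sketch below assumes stand-in forecast errors drawn from a Gaussian with a known, fixed $\sigma^f$, which is precisely the information a cycled filter does not provide; the function name and parameter values are hypothetical.

```python
import numpy as np


def fixed_r_obs_errors(eps_f, sigma_f, sigma_o, r, sigma_uc, rng):
    """Eq. (B5): observation errors with prescribed sigma_o and r."""
    eta = rng.normal(0.0, sigma_uc, size=eps_f.shape)
    return sigma_o * (r / sigma_f * eps_f
                      + np.sqrt(1.0 - r ** 2) / sigma_uc * eta)


# Illustrative values (assumptions): sigma_f is treated as known exactly.
rng = np.random.default_rng(0)
sigma_f, sigma_o, sigma_uc, r = 0.7, 1.5, 1.0, -0.6
eps_f = rng.normal(0.0, sigma_f, size=200_000)
eps_o = fixed_r_obs_errors(eps_f, sigma_f, sigma_o, r, sigma_uc, rng)

print("sample std of eps_o:", eps_o.std())
print("sample correlation: ", np.corrcoef(eps_f, eps_o)[0, 1])
```

If an inexact value were passed as `sigma_f`, the realized variance and correlation would drift away from the prescribed `sigma_o` and `r`, which is the practical difficulty noted above.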

REFERENCES

  • Balmaseda, M. A., and Coauthors, 2015: The Ocean Reanalyses Intercomparison Project (ORA-IP). J. Oper. Oceanogr., 8, s80s97, https://doi.org/10.1080/1755876X.2015.1022329.

    • Search Google Scholar
    • Export Citation
  • Berry, T., and T. Sauer, 2018: Correlation between system and observation errors in data assimilation. Mon. Wea. Rev., 146, 29132931, https://doi.org/10.1175/MWR-D-17-0331.1.

    • Search Google Scholar
    • Export Citation
  • Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Chang, G., 2014: On Kalman filter for linear system with colored measurement noise. J. Geod., 88, 11631170, https://doi.org/10.1007/s00190-014-0751-7.

    • Search Google Scholar
    • Export Citation
  • Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 33853396, https://doi.org/10.1256/qj.05.108.

    • Search Google Scholar
    • Export Citation
  • Hirose, N., and Coauthors, 2019: Development of a new operational system for monitoring and forecasting coastal and open-ocean states around Japan. Ocean Dyn., 69, 13331357, https://doi.org/10.1007/s10236-019-01306-x.

    • Search Google Scholar
    • Export Citation
  • Hoover, B. T., and R. H. Langland, 2017: Forecast and observation-impact experiments in the Navy Global Environmental Model with assimilation of ECWMF analysis data in the global domain. J. Meteor. Soc. Japan, 95, 369389, https://doi.org/10.2151/jmsj.2017-023.

    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 44894532, https://doi.org/10.1175/MWR-D-15-0440.1.

  • Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Phys. D, 230, 112–126, https://doi.org/10.1016/j.physd.2006.11.008.

  • Kido, S., M. Nonaka, and Y. Miyazawa, 2022: JCOPE-FGO: An eddy-resolving quasi-global ocean reanalysis product. Ocean Dyn., 72, 599–619, https://doi.org/10.1007/s10236-022-01521-z.

  • Kurihara, Y., T. Sakurai, and T. Kuragano, 2006: Global daily sea surface temperature analysis using data from satellite microwave radiometer, satellite infrared radiometer and in-situ observations (in Japanese). Wea. Serv. Bull., 73, s1–s18.

  • Lorenz, E. N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Reading, United Kingdom, ECMWF, 1–18, https://www.ecmwf.int/en/elibrary/75462-predictability-problem-partly-solved.

  • Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55, 399–414, https://doi.org/10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

  • Miyoshi, T., and M. Kunii, 2012: Using AIRS retrievals in the WRF-LETKF system to improve regional numerical weather prediction. Tellus, 64A, 18408, https://doi.org/10.3402/tellusa.v64i0.18408.

  • Petovello, M. G., K. O’Keefe, G. Lachapelle, and M. E. Cannon, 2009: Consideration of time-correlated errors in a Kalman filter applicable to GNSS. J. Geod., 83, 51–56, https://doi.org/10.1007/s00190-008-0231-z.

  • Raboudi, N. F., B. Ait-El-Fquih, H. Ombao, and I. Hoteit, 2021: Ensemble Kalman filtering with coloured observation noise. Quart. J. Roy. Meteor. Soc., 147, 4408–4424, https://doi.org/10.1002/qj.4186.

  • Terasaki, K., and T. Miyoshi, 2014: Data assimilation with error-correlated and non-orthogonal observations: Experiments with the Lorenz-96 model. SOLA, 10, 210–213, https://doi.org/10.2151/sola.2014-044.

  • Ying, Y., and F. Zhang, 2015: An adaptive covariance relaxation method for ensemble data assimilation. Quart. J. Roy. Meteor. Soc., 141, 2898–2906, https://doi.org/10.1002/qj.2576.

    • Search Google Scholar
    • Export Citation
Save
  • Balmaseda, M. A., and Coauthors, 2015: The Ocean Reanalyses Intercomparison Project (ORA-IP). J. Oper. Oceanogr., 8, s80s97, https://doi.org/10.1080/1755876X.2015.1022329.

    • Search Google Scholar
    • Export Citation
  • Berry, T., and T. Sauer, 2018: Correlation between system and observation errors in data assimilation. Mon. Wea. Rev., 146, 29132931, https://doi.org/10.1175/MWR-D-17-0331.1.

    • Search Google Scholar
    • Export Citation
  • Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Chang, G., 2014: On Kalman filter for linear system with colored measurement noise. J. Geod., 88, 11631170, https://doi.org/10.1007/s00190-014-0751-7.

    • Search Google Scholar
    • Export Citation
  • Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 33853396, https://doi.org/10.1256/qj.05.108.

    • Search Google Scholar
    • Export Citation
  • Hirose, N., and Coauthors, 2019: Development of a new operational system for monitoring and forecasting coastal and open-ocean states around Japan. Ocean Dyn., 69, 13331357, https://doi.org/10.1007/s10236-019-01306-x.

    • Search Google Scholar
    • Export Citation
  • Hoover, B. T., and R. H. Langland, 2017: Forecast and observation-impact experiments in the Navy Global Environmental Model with assimilation of ECWMF analysis data in the global domain. J. Meteor. Soc. Japan, 95, 369389, https://doi.org/10.2151/jmsj.2017-023.

    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 44894532, https://doi.org/10.1175/MWR-D-15-0440.1.

    • Search Google Scholar
    • Export Citation
  • Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Phys. D, 230, 112126, https://doi.org/10.1016/j.physd.2006.11.008.

    • Search Google Scholar
    • Export Citation
  • Kido, S., M. Nonaka, and Y. Miyazawa, 2022: JCOPE-FGO: An eddy-resolving quasi-global ocean reanalysis product. Ocean Dyn., 72, 599619, https://doi.org/10.1007/s10236-022-01521-z.

    • Search Google Scholar
    • Export Citation
  • Kurihara, Y., T. Sakurai, and T. Kuragano, 2006: Global daily sea surface temperature analysis using data from satellite microwave radiometer, satellite infrared radiometer and in-situ observations (in Japanese). Wea. Serv. Bull., 73, s1s18.

    • Search Google Scholar
    • Export Citation
  • Lorenz, E. N., 1996: Predictablilty: A problem partly solved. Proc. Seminar on Predictability, Reading, United Kingdom, ECMWF, 118, https://www.ecmwf.int/en/elibrary/75462-predictability-problem-partly-solved.

    • Search Google Scholar
    • Export Citation
  • Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55, 399414, https://doi.org/10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Miyoshi, T., and M. Kunii, 2012: Using AIRS retrievals in the WRF-LETKF system to improve regional numerical weather prediction. Tellus, 64A, 18408, https://doi.org/10.3402/tellusa.v64i0.18408.

    • Search Google Scholar
    • Export Citation
  • Petovello, M. G., K. O’Keefe, G. Lachapelle, and M. E. Cannon, 2009: Consideration of time-correlated errors in a Kalman filter applicable to GNSS. J. Geod., 83, 5156, https://doi.org/10.1007/s00190-008-0231-z.

    • Search Google Scholar
    • Export Citation
  • Raboudi, N. F., B. Ait-El-Fquih, H. Ombao, and I. Hoteit, 2021: Ensemble Kalman filtering with coloured observation noise. Quart. J. Roy. Meteor. Soc., 147, 44084424, https://doi.org/10.1002/qj.4186.

    • Search Google Scholar
    • Export Citation
  • Terasaki, K., and T. Miyoshi, 2014: Data assimilation with error-correlated and non-orthogonal observations: Experiments with the Lorenz-96 model. SOLA, 10, 210213, https://doi.org/10.2151/sola.2014-044.

    • Search Google Scholar
    • Export Citation
  • Ying, Y., and F. Zhang, 2015: An adaptive covariance relaxation method for ensemble data assimilation. Quart. J. Roy. Meteor. Soc., 141, 28982906, https://doi.org/10.1002/qj.2576.

    • Search Google Scholar
    • Export Citation
  • Fig. 1.

    Spatiotemporally averaged forecast (a) RMSEs and (b) ensemble spread, and (c) spatially averaged cross-correlation coefficients between the forecast and observation errors in the ETKF experiment for multiplicative inflation parameters ρ = 2, 3, …, 5. (d)–(f),(g)–(i) As in (a)–(c), but for ρ = 1.1, 1.2, …, 2.0 and ρ = 1.00, 1.01, …, 1.10, respectively. The observation error variances are manually tuned (cf. Fig. 2a). White stars denote where the forecast RMSE is minimum for each a. White areas indicate either filter divergence or the case a = 1, which was not performed.

  • Fig. 2.

    As in Fig. 1, but for (a) optimal observation error standard deviations, (b) spatiotemporally averaged observational RMSEs, and (c) the ratios of (a) to (b). In (c), the ratios of 95%–105% are also shaded in white.

  • Fig. 3.

    As in Figs. 1g–i, but for the ETKFCC experiment.

  • Fig. 4.

    Spatiotemporally averaged (a) forecast and (b) observational RMSEs and (c) the RMSE ratios of the spatiotemporally averaged forecast RMSEs to the corresponding observational RMSEs when the forecast RMSE is minimum for each a. The black and orange lines indicate the ETKF and ETKFCC experiments, respectively. In (a), a logarithmic scale is used for the vertical axis to clarify the differences between the ETKF and ETKFCC experiments. In (a), open circles are plotted where IR > 5% and the forecast RMSEs and RMSE ratios in the ETKFCC experiment are significantly better than those in the ETKF experiment at the 99% confidence level. In (c), open circles are as in (a), but for the RMSE ratios.

  • Fig. 5.

    Spatially averaged cross-correlation coefficients between the forecast and observation errors in the ETKF (black) and ETKFCC (orange) experiments when the forecast RMSE is minimum for each a.

  • Fig. 6.

    The RMSE ratios relative to the spatially averaged correlation coefficients in the ETKF (black) and ETKFCC (orange) experiments when the forecast RMSE is minimum for each a.
