1. Introduction
The use of coupled models in Earth sciences is becoming increasingly common. In addition to atmosphere–ocean coupled models used for predicting El Niño–Southern Oscillation (ENSO), Earth system models that couple many components such as ice sheets, the biosphere, aerosols, and chemical species have improved our understanding of the Earth system and enabled projections of future climate with fewer artificial approximations.
The purpose of data assimilation (DA) is to prepare initial conditions for these models by estimating the current (real-time analysis) or past (reanalysis) state of the system from a combination of models and observations. Traditionally, the state of each component is estimated by using a background state estimate from an uncoupled model and observations of the same component. This approach is known as uncoupled DA. For example, the oceanic state is often analyzed using an ocean general circulation model (GCM) and observations of only the oceanic state, even if the analysis is then used for a prediction by a coupled model. Coupled data assimilation (CDA) substitutes the uncoupled models with a coupled model to generate the background and is expected to provide a more self-consistent and accurate analysis compared with uncoupled DA (e.g., Zhang et al. 2007; Penny and Hamill 2017).
One type of CDA known as weakly coupled DA (WCDA) uses a coupled model to generate the background but then updates the analysis separately for each component. Observations of one component are directly assimilated only into the state variables of the same component. The analysis increments of one component can, therefore, only indirectly impact the state of the other components through the forward integration of the coupled model. Weakly coupled ocean–atmosphere DA has been successfully implemented operationally and shown to be an improvement over uncoupled DA (e.g., Saha et al. 2010; Laloyaux et al. 2016).
Another type of CDA is called strongly coupled DA (SCDA; Penny and Hamill 2017). SCDA also uses a coupled model in the forecast step of DA but then uses the cross covariances of the background errors to assimilate observations of one component directly into the state estimate of the other components. Therefore, a model variable can be affected by the assimilation of all the possibly relevant observations of all the components, which should minimize the analysis uncertainty of the variable. In addition, SCDA is expected to achieve a more self-consistent analysis by alleviating the initialization shocks present in WCDA, which are caused by each component assimilating a disjoint set of observations (Mulholland et al. 2015).
Some recent experiments with SCDA show encouraging results. Sluka et al. (2016) showed that the direct assimilation of atmospheric observations into the ocean analysis reduced the error in the ocean analysis by ~46% compared to WCDA using an intermediate complexity atmosphere–ocean coupled model under a perfect model scenario. However, SCDA does not always produce improved analyses. Han et al. (2013) showed that the use of cross covariance between components is only beneficial for the slower component (i.e., the ocean in an atmosphere–ocean model) and only if using a very large ensemble (
The main difficulty of CDA is due to the presence of multiple time scales in the dynamics of a coupled system. A coupled atmosphere–ocean system, for example, has high-frequency modes such as convection and baroclinic instability (minutes to days) and low-frequency modes such as ENSO (seasonal to interannual). However, most of the observable quantities are (possibly nonlinear) combinations of those modes. Therefore, when observations are assimilated into the physical state variables, irrelevant modes projected onto the observation can introduce noise into the mode of interest (e.g., Tardif et al. 2014). Lu et al. (2015a,b) explicitly tackled the multi-time-scale characteristic of coupled systems. They first showed that the use of coupled covariance (i.e., SCDA relative to WCDA) is only beneficial in the deep tropics, where the ocean drives the atmosphere through anomalous sea surface temperature (Ruiz-Barradas et al. 2017). They attributed the negative impact of SCDA outside the tropics to high-frequency weather noise and proposed the leading average coupled covariance (LACC) method, in which the innovations from the atmospheric observations were averaged over several days to enhance the cross correlation between the atmosphere and the ocean.
The aforementioned, apparently contradictory results raise an important question: Under what conditions does SCDA work better than WCDA? In our study, we address this question and propose an offline method to determine which observations should be assimilated into which variables during the analysis update.
2. Theoretical analysis
In this section, we derive an expression that estimates the analysis uncertainty reduction by the assimilation of each observation.
Here, we assume that only a single observation is assimilated at a time. This is not a strong assumption because, in both Gaussian and Bayesian frameworks, theoretical analysis shows that the observations can be assimilated sequentially without changing the resulting analysis if they have mutually independent error distributions (Houtekamer and Mitchell 2001; Anderson 2003). Furthermore, Anderson (2003) pointed out that the observations with correlated errors can be transformed into ones with uncorrelated errors by performing a singular value decomposition on the observation error covariance matrix
Equation (6) indicates that the relative improvement of the estimate of the state of each model variable by the assimilation of a single observation is the product of two quantities: (i) the ratio of the background error variance at the observation location to the sum of the background and observation error variances at the observation location (which is close to one when the observation is accurate relative to the background) and (ii) the square of the background error correlation between the analyzed and observed variables. This equation also provides a quantitative estimate of the analysis error reduction by SCDA using estimates of the cross covariances between background errors in the different components.
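The relationship described above can be written out for a single scalar observation. The notation here is assumed for illustration and may differ from that of Eq. (6): $\sigma_{b,i}^2$ and $\sigma_{a,i}^2$ denote the background and analysis error variances of the $i$th model variable, $\sigma_{b,y}^2$ the background error variance of the observed quantity, $\sigma_o^2$ the observation error variance, and $\rho_{iy}$ the background error correlation between the $i$th variable and the observed quantity. Starting from the scalar Kalman filter variance update,

```latex
\begin{align}
\sigma_{a,i}^2 &= \sigma_{b,i}^2
  - \frac{\operatorname{cov}_b(x_i, y)^2}{\sigma_{b,y}^2 + \sigma_o^2}, \\
\frac{\sigma_{b,i}^2 - \sigma_{a,i}^2}{\sigma_{b,i}^2}
  &= \underbrace{\frac{\sigma_{b,y}^2}{\sigma_{b,y}^2 + \sigma_o^2}}_{\text{(i)}}
     \;\underbrace{\rho_{iy}^2}_{\text{(ii)}},
  \qquad
  \rho_{iy} = \frac{\operatorname{cov}_b(x_i, y)}{\sigma_{b,i}\,\sigma_{b,y}} .
\end{align}
```

As a worked example under these assumptions, if $\sigma_{b,y}^2 = \sigma_o^2$ then factor (i) is 0.5, so an observation with $\rho_{iy} = 0.9$ removes $0.5 \times 0.81 = 40.5\%$ of the background error variance, whereas one with $\rho_{iy} = 0.3$ removes only $0.5 \times 0.09 = 4.5\%$.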
We hypothesize that in an ensemble Kalman filter (EnKF; Evensen 1994), the assimilation of “irrelevant” observations in SCDA degrades the analysis if the detrimental effect of spurious correlations from the limited ensemble size exceeds the expected error reduction from the Kalman filter. This hypothesis implies that for a strongly coupled EnKF, we should use a correlation-cutoff method to localize the analysis, in which we only consider cross covariances between variables that have strong background error correlation.
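The size of the spurious correlations mentioned above can be gauged with a small Monte Carlo experiment. The sketch below is illustrative only and is not part of the experiments in this paper; the function name and parameters are hypothetical. It draws two independent Gaussian variables, computes their sample correlation from a K-member ensemble, and compares the mean squared sample correlation with 1/(K - 1), its expected value for independent Gaussian samples.

```python
import numpy as np


def mean_squared_sample_correlation(ensemble_size, n_trials=20000, seed=0):
    """Average squared sample correlation of two truly independent variables."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_trials, ensemble_size))
    b = rng.standard_normal((n_trials, ensemble_size))
    # Remove the ensemble mean to obtain perturbations.
    a -= a.mean(axis=1, keepdims=True)
    b -= b.mean(axis=1, keepdims=True)
    # Sample correlation for each trial (the 1/(K-1) factors cancel).
    corr = (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))
    return float(np.mean(corr**2))


if __name__ == "__main__":
    for k in (4, 10, 40, 100):
        print(k, mean_squared_sample_correlation(k), 1.0 / (k - 1))
```

In this setting, a true squared correlation that is not clearly larger than 1/(K - 1) is hard to distinguish from sampling noise, which is the situation in which the hypothesis expects coupling to hurt.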
3. Methods
Local EnKFs such as the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007) allow us to assimilate different subsets of observations for each model variable. Therefore, we can define a “localization pattern”, in which we select observations to be assimilated into each model variable depending on which component the observation and the model variable are located in (see details in section 3e and Fig. 1). In this section, the optimal localization pattern for a simple coupled model is sought by estimating the strength of cross correlation using an offline analysis cycle of the LETKF. Then, the localization pattern is tested in independent LETKF cycles with various ensemble sizes, and the accuracy of the resulting analysis is compared to that obtained with other localization patterns.
a. Model
This coupled model consists of three components: a fast “extratropical atmosphere” (xe, ye, ze), a fast “tropical atmosphere” (
Despite its extreme simplicity and computational efficiency, the multi-time-scale coupled model shares several important characteristics with the real coupled Earth system and is an excellent testbed for ideas on CDA problems. The model shows chaotic behavior with two distinct regimes: the coupled tropical atmosphere and ocean cycle through a random number of “normal years” (between 2 and 7), interrupted by an “El Niño year” with a large negative anomaly in X, before returning to normal years [see Fig. 2 of Peña and Kalnay (2004)]. Since this asymmetric oscillation occurs in neither the uncoupled tropical atmosphere nor the uncoupled ocean, it is regarded as an intrinsically coupled instability. Therefore, the model developers called the coupled tropical atmosphere and ocean an ENSO-like coupled system. The extratropical atmosphere, on the other hand, behaves almost like an individual chaotic system because of the weak coupling with the other components. Norwood et al. (2013) examined the properties of the coupled model and showed it to have two positive, five negative, and two near-zero Lyapunov exponents.
b. Data assimilation method
We use the LETKF (Hunt et al. 2007), one of the deterministic implementations of the ensemble Kalman filters classified as ensemble square root filters (EnSRFs; Tippett et al. 2003). The LETKF allows us to assimilate only a subset of the observations into the analysis of each variable.
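The per-variable selection of observations can be sketched as follows. This is a minimal illustration of an ensemble transform update in the style of Hunt et al. (2007), not the code used for the experiments; the function name `letkf_analysis`, its arguments, and the boolean `mask` are hypothetical, and a linear observation operator with diagonal observation error covariance is assumed.

```python
import numpy as np


def letkf_analysis(xb, yo, H, r_var, mask, inflation=1.0):
    """Update each variable using only the observations selected for it.

    xb    : (n, k) float background ensemble (n variables, k members)
    yo    : (p,)   observations
    H     : (p, n) linear observation operator (assumption)
    r_var : (p,)   observation error variances (diagonal R, assumption)
    mask  : (n, p) boolean; mask[i, j] is True if obs j updates variable i
    """
    n, k = xb.shape
    xb_mean = xb.mean(axis=1)
    Xb = xb - xb_mean[:, None]            # background perturbations
    yb = H @ xb
    yb_mean = yb.mean(axis=1)
    Yb = yb - yb_mean[:, None]            # perturbations in observation space
    xa = np.empty_like(xb)
    for i in range(n):
        sel = mask[i]
        if not sel.any():                 # no observation selected: keep background
            xa[i] = xb[i]
            continue
        C = Yb[sel].T / r_var[sel]        # Yb^T R^-1, shape (k, p_sel)
        A = (k - 1) / inflation * np.eye(k) + C @ Yb[sel]
        evals, evecs = np.linalg.eigh(A)  # A is symmetric positive definite
        Pa_tilde = (evecs / evals) @ evecs.T                   # A^-1
        Wa = (evecs * np.sqrt((k - 1) / evals)) @ evecs.T      # [(k-1) A^-1]^(1/2)
        w_mean = Pa_tilde @ C @ (yo[sel] - yb_mean[sel])
        xa[i] = xb_mean[i] + Xb[i] @ (w_mean[:, None] + Wa)
    return xa


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_vars, n_members = 9, 6
    background = rng.standard_normal((n_vars, n_members))
    observations = rng.standard_normal(n_vars)
    select_all = np.ones((n_vars, n_vars), dtype=bool)
    analysis = letkf_analysis(background, observations, np.eye(n_vars),
                              np.full(n_vars, 0.5), select_all)
    print(analysis.shape)
```

With `inflation=1.0`, the analysis mean and variance of each variable coincide with the Kalman filter update computed from the sample background covariance and the selected observations, which is what makes the choice of `mask` the only difference between localization patterns in this sketch.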
According to Ng et al. (2011) and Trevisan and Palatella (2011), the dimension of the subspace spanned by perturbations is at most
Although we also conducted some experiments with 100 members, the resulting temporal mean analysis RMS error was not qualitatively different from that of the 10-member experiments, and only the results from the smaller-ensemble experiments are shown below. Note that the LETKF is designed to provide the same analysis mean and analysis error covariance matrix as those of the Kalman filter for linear forward operators and with sufficient ensemble members to factorize the background error covariance matrix into ensemble perturbation matrices (Hunt et al. 2007). However, these premises may be violated if the model is biased or stochastic, or if the nonlinearity is significant (i.e., the errors are too large to neglect the second- and higher-order terms in the Taylor expansions of the nonlinear forward operators). In these difficult situations, a larger ensemble size will be beneficial because the statistical sampling of the stochastic or nonlinear error growth becomes more accurate. The insensitivity of EnSRF’s averaged analysis error to excessive ensemble size is thoroughly discussed in Sakov and Oke (2008).
For covariance inflation, we use multiplicative adaptive inflation of Wang and Bishop (2003). The diagnosed inflation factor
c. Experimental settings
We test our method by performing identical twin experiments. The model [Eq. (8)] is started from random initial conditions and spun up by integrating for 25 000 time steps before saving the subsequent 75 000 time steps as the truth. Observations are produced by adding Gaussian noise to the truth with a mean of zero and standard deviation of
The ensemble members are initialized with random numbers (with different random seeds from the one used for the truth) and spun up for 25 000 time steps before starting the analysis cycle so that the background ensemble members for the first analysis are random samples on the model’s attractor. Analysis experiments are conducted for the subsequent 75 000 time steps, the same period as the one for which we have saved the truth and the observations. The analysis is updated every 8 time steps, and therefore, the observations are only available at the end of each window. Within the 75 000 time steps (9375 analysis windows), only the last 50 000 time steps (6250 analysis windows) are used for calculating the background error correlation (Fig. 2) and the analysis error (Fig. 3) in the following subsections because we are only interested in the filter performance after its initial transient.
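The window counts quoted above follow directly from the step counts. The short sketch below reproduces that arithmetic; the constant names are hypothetical and are introduced only for illustration.

```python
# Step counts taken from the text; names are illustrative.
STEPS_PER_WINDOW = 8          # analysis is updated every 8 time steps
ANALYSIS_STEPS = 75_000       # length of the analysis experiment
VERIFICATION_STEPS = 50_000   # last part of the experiment used for statistics

n_windows = ANALYSIS_STEPS // STEPS_PER_WINDOW
n_verified = VERIFICATION_STEPS // STEPS_PER_WINDOW
n_discarded = n_windows - n_verified   # initial transient, not verified

print(n_windows, n_verified, n_discarded)
```

Under these numbers, the first 3125 analysis windows are treated as the filter's initial transient and excluded from the statistics.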
d. Offline experiment and error statistics
We first conduct an offline experiment to obtain the error statistics of the model. For this, we use the same analysis system as discussed in section 3c but with the truth, observations, and initial ensemble members independent from the main experiments. We use the fully coupled LETKF (Full pattern in section 3e) with
Figure 2 shows the mean of squared error correlation for each pair of variables. In this model there are only weak error correlations (
e. Covariance localization
We test the five covariance localization patterns shown in Fig. 1.
Full is the standard SCDA in which every observation is assimilated into the analysis of every state variable.
Adjacent uses the cross covariance only between directly interacting components. The cross covariances between the extratropical atmosphere and ocean are therefore ignored.
ENSO-coupling is the pattern suggested by our theoretical analysis and the offline experiment. The observations of the ENSO-like coupled system (i.e., the tropical atmosphere and the ocean) are mutually assimilated, but the extratropical atmosphere is analyzed individually.
Atmos-coupling analyzes the extratropical and tropical atmosphere together but the ocean separately. This pattern separately analyzes the fast and slow components.
Individual analyzes each component individually. The background is updated by the coupled model, but the analysis step is individually implemented for each component, which is equivalent to WCDA for this three-component model.
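The five patterns above can be encoded as boolean masks over pairs of variables. The sketch below is one possible encoding, assuming three variables per component ordered as extratropical atmosphere, tropical atmosphere, ocean, and one observation per model variable; these choices and all names are illustrative and are not taken from the experiment code.

```python
import numpy as np

COMPONENTS = ("extratropical", "tropical", "ocean")
VARS_PER_COMPONENT = 3

# Pairs of different components whose cross covariances are used.
COUPLED_PAIRS = {
    "Full": {("extratropical", "tropical"), ("tropical", "ocean"),
             ("extratropical", "ocean")},
    "Adjacent": {("extratropical", "tropical"), ("tropical", "ocean")},
    "ENSO-coupling": {("tropical", "ocean")},
    "Atmos-coupling": {("extratropical", "tropical")},
    "Individual": set(),
}


def localization_mask(pattern):
    """Return a 9x9 mask; mask[i, j] is True if an observation of
    variable j is assimilated into the analysis of variable i."""
    pairs = COUPLED_PAIRS[pattern]
    n = len(COMPONENTS) * VARS_PER_COMPONENT
    mask = np.zeros((n, n), dtype=bool)
    for a, comp_a in enumerate(COMPONENTS):
        for b, comp_b in enumerate(COMPONENTS):
            coupled = (a == b or (comp_a, comp_b) in pairs
                       or (comp_b, comp_a) in pairs)
            if coupled:
                rows = slice(a * VARS_PER_COMPONENT, (a + 1) * VARS_PER_COMPONENT)
                cols = slice(b * VARS_PER_COMPONENT, (b + 1) * VARS_PER_COMPONENT)
                mask[rows, cols] = True
    return mask


if __name__ == "__main__":
    for name in COUPLED_PAIRS:
        print(name, int(localization_mask(name).sum()))
```

Every mask is symmetric and contains the three within-component blocks, so the patterns differ only in which off-diagonal blocks are switched on.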
4. Results
The resulting analysis errors are plotted in Fig. 3.
Full (standard SCDA) performs worse than Individual (WCDA) when the ensemble size is small (
As Eq. (6) indicates, the assimilation of any type of observation with the Kalman filter will, on average, not increase the analysis uncertainty if the background and observation error covariance matrices are accurately specified. When the ensemble size is sufficient, the background error is small enough to ignore nonlinearity, and no localization is applied, the LETKF converges to the Kalman filter; the assimilation of an observation uncorrelated with a state variable will then be neither beneficial nor harmful. The number of ensemble members needed for a successful implementation of SCDA will be highly model dependent and may be affected by other factors such as the use of covariance inflation.
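The argument above can be illustrated with an idealized two-variable experiment, which is separate from the coupled-model experiments reported here. In the sketch below, the analyzed variable x and the observed variable y have truly independent, unit-variance background errors, so the exact Kalman gain for x is zero. The gain is instead estimated from a K-member ensemble, and the resulting mean squared analysis error of x is compared with the background value of one. All names and parameter values are hypothetical.

```python
import numpy as np


def analysis_mse_uncorrelated_obs(ensemble_size, obs_var=1.0,
                                  n_trials=50000, seed=0):
    """Mean squared analysis error of x after assimilating an observation
    of y whose background error is truly uncorrelated with that of x."""
    rng = np.random.default_rng(seed)
    # Errors of the background mean (independent, unit variance).
    err_x = rng.standard_normal(n_trials)
    err_y = rng.standard_normal(n_trials)
    obs_err = np.sqrt(obs_var) * rng.standard_normal(n_trials)
    # Ensemble perturbations sampled from the same error distribution.
    px = rng.standard_normal((n_trials, ensemble_size))
    py = rng.standard_normal((n_trials, ensemble_size))
    px -= px.mean(axis=1, keepdims=True)
    py -= py.mean(axis=1, keepdims=True)
    cov_xy = (px * py).sum(axis=1) / (ensemble_size - 1)   # spurious covariance
    var_y = (py**2).sum(axis=1) / (ensemble_size - 1)
    gain = cov_xy / (var_y + obs_var)
    innovation = obs_err - err_y        # observation minus background mean
    err_x_analysis = err_x + gain * innovation
    return float(np.mean(err_x_analysis**2))


if __name__ == "__main__":
    for k in (4, 10, 40, 100):
        print(k, analysis_mse_uncorrelated_obs(k))
```

In this idealized setting the error added by the spurious gain shrinks as the ensemble grows, consistent with the statement that an uncorrelated observation becomes neither beneficial nor harmful once the ensemble is large enough.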
The ENSO-coupling pattern suggested by the correlation-cutoff method did perform best in essentially all experiments, as we expected. In comparison to Individual (WCDA) and Atmos-coupling, ENSO-coupling was superior regardless of the ensemble size. The inferior performance of Individual and Atmos-coupling is noticeable in the tropical atmosphere and the ocean, between which these inferior patterns ignore strong background error cross covariances. This comparison shows the importance of including the background error covariances between the tropical atmosphere and the ocean in this model. In contrast to Full (standard SCDA) and Adjacent, ENSO-coupling performed well with the smaller ensembles (
These comparisons support the ENSO-coupling pattern, that is, the decision to ignore the weak cross covariance between the “extratropical atmosphere” and the other components while retaining the strong covariance between the “tropical atmosphere” and the “ocean,” as suggested by Fig. 2.
5. Summary
We first derived a simplified equation for the expected analysis error reduction when assimilating an observation into the analysis of each model variable. The results from the tests with five different covariance localization patterns support the intuitive idea that SCDA benefits from using the cross covariance between variables of different components only when they have strong background error correlations.
We then experimentally showed that the use of cross covariance in the LETKF could be detrimental when the ensemble size is too small. This supports the claim of Han et al. (2013) that a large ensemble is needed to improve the analysis using the full cross covariance. With a limited number of ensemble members, localizing the background error covariance is essential to obtain a better analysis. We proposed the correlation-cutoff method: first, estimate the mean of the squared background error correlation with an offline run; then, uncouple the data assimilation wherever the background error correlation between the analyzed and the observed variables is weak. In our experiments with the nine-variable coupled model of Peña and Kalnay (2004), the correlation-cutoff method, intermediate between the standard SCDA and WCDA approaches, results in the best analysis and is the most robust to the choice of ensemble size.
Covariance localization guided by the correlation-cutoff method is a general idea to increase the signal-to-noise ratio of data assimilation. This method, however, is particularly important for SCDA, where the correlation strength between different model components cannot be summarized by a simple function of distance, as exemplified by the carbon data assimilation of Kang et al. (2011). Although distance-dependent localization (Hamill et al. 2001) has shown great success in atmospheric and oceanic DA, it cannot deal with characteristics of the dynamics that are distance independent. On the other hand, the squared ensemble correlation is a norm-independent quantity between 0 and 1, which can be measured between any pair of observation and model variables. Furthermore, the method is also applicable before the implementation of SCDA; if a weakly coupled EnKF system has already been implemented, one can measure the squared ensemble correlations to assess in advance the variance reduction that implementing SCDA could achieve. With these two characteristics, the correlation-cutoff method can be particularly useful for coupled EnKF applications.
In the toy model we used, there was a clear distinction between strongly and weakly correlated pairs of variables (Fig. 2), and therefore, it was clear where to stop the coupled data assimilation. In more realistic configurations, the cutoff could be more complex. For example, Smith et al. (2017) estimated the coupled error correlation with a single column model and showed that strong cross correlations are mostly limited to the atmospheric boundary layer and the oceanic mixed layer. Such a limited distribution of cross correlation was also observed with an operational coupled GCM (T. Sluka 2017, personal communication). We expect that in such realistic models, as in the simple coupled atmosphere–ocean model we tested, the squared background error correlation will provide guidance on where to stop the coupled assimilation.
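The two steps of the correlation-cutoff method can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: it assumes that every model variable is directly observed, that the background ensembles from the offline run are stored in an array of shape (cycles, variables, members), and that a single threshold separates strong from weak correlations. The function names, the synthetic data, and the threshold value of 0.4 are all hypothetical.

```python
import numpy as np


def mean_squared_correlation(ensembles):
    """Time mean of the squared ensemble correlation.

    ensembles : (n_cycles, n_vars, k) background ensembles from an offline run
    returns   : (n_vars, n_vars) matrix with values between 0 and 1
    """
    pert = ensembles - ensembles.mean(axis=2, keepdims=True)
    cov = pert @ pert.transpose(0, 2, 1)      # unnormalized covariances per cycle
    std = np.sqrt(np.diagonal(cov, axis1=1, axis2=2))
    corr = cov / (std[:, :, None] * std[:, None, :])
    return (corr**2).mean(axis=0)


def cutoff_mask(msc, threshold):
    """Keep the coupling only where the mean squared correlation is strong."""
    return msc >= threshold


if __name__ == "__main__":
    # Synthetic stand-in for offline statistics: variables 0 and 1 are
    # strongly correlated, variable 2 is independent of both.
    rng = np.random.default_rng(0)
    z = rng.standard_normal((500, 3, 10))
    ens = np.stack([z[:, 0],
                    0.9 * z[:, 0] + np.sqrt(1 - 0.9**2) * z[:, 1],
                    z[:, 2]], axis=1)
    msc = mean_squared_correlation(ens)
    print(np.round(msc, 2))
    print(cutoff_mask(msc, threshold=0.4))
```

Because the squared correlation is dimensionless, the same threshold can be applied to pairs of variables with different units, which is the property that makes the quantity convenient across model components.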
Acknowledgments
We gratefully acknowledge that T. Yoshida is supported by the Japanese Government Long-term Overseas Fellowship Program. We thank Dr. Travis Sluka and three anonymous reviewers for making several insightful suggestions that significantly improved the clarity of the paper.
REFERENCES
Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642, https://doi.org/10.1175/1520-0493(2003)131<0634:ALLSFF>2.0.CO;2.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, https://doi.org/10.1029/94JC00572.
Gelb, A., J. F. Kasper Jr., A. N. Raymond Jr., C. F. Price, and A. A. Sutherland Jr., 1974: Applied Optimal Estimation. M.I.T. Press, 374 pp.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790, https://doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
Han, G., X. Wu, S. Zhang, Z. Liu, and W. Li, 2013: Error covariance estimation for coupled data assimilation using a Lorenz atmosphere and a simple pycnocline ocean model. J. Climate, 26, 10 218–10 231, https://doi.org/10.1175/JCLI-D-13-00236.1.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137, https://doi.org/10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, https://doi.org/10.1016/j.physd.2006.11.008.
Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. J. Basic Eng., 82, 35–45, https://doi.org/10.1115/1.3662552.
Kang, J.-S., E. Kalnay, J. Liu, I. Fung, T. Miyoshi, and K. Ide, 2011: “Variable localization” in an ensemble Kalman filter: Application to the carbon cycle data assimilation. J. Geophys. Res., 116, D09110, https://doi.org/10.1029/2010JD014673.
Laloyaux, P., M. Balmaseda, D. Dee, K. Mogensen, and P. Janssen, 2016: A coupled data assimilation system for climate reanalysis. Quart. J. Roy. Meteor. Soc., 142, 65–78, https://doi.org/10.1002/qj.2629.
Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533, https://doi.org/10.1002/qj.371.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
Lu, F., Z. Liu, S. Zhang, and Y. Liu, 2015a: Strongly coupled data assimilation using leading averaged coupled covariance (LACC). Part I: Simple model study. Mon. Wea. Rev., 143, 3823–3837, https://doi.org/10.1175/MWR-D-14-00322.1.
Lu, F., Z. Liu, S. Zhang, Y. Liu, and R. Jacob, 2015b: Strongly coupled data assimilation using leading averaged coupled covariance (LACC). Part II: CGCM experiments. Mon. Wea. Rev., 143, 4645–4659, https://doi.org/10.1175/MWR-D-15-0088.1.
Mulholland, D. P., P. Laloyaux, K. Haines, and M. A. Balmaseda, 2015: Origin and impact of initialization shocks in coupled atmosphere-ocean forecasts. Mon. Wea. Rev., 143, 4631–4644, https://doi.org/10.1175/MWR-D-15-0076.1.
Ng, G.-H. C., D. McLaughlin, D. Entekhabi, and A. Ahanin, 2011: The role of model dynamics in ensemble Kalman filter performance for chaotic systems. Tellus, 63A, 958–977, https://doi.org/10.1111/j.1600-0870.2011.00539.x.
Norwood, A., E. Kalnay, K. Ide, S.-C. Yang, and C. Wolfe, 2013: Lyapunov, singular and bred vectors in a multi-scale system: An empirical exploration of vectors related to instabilities. J. Phys. A, 46, 254021, https://doi.org/10.1088/1751-8113/46/25/254021.
Peña, M., and E. Kalnay, 2004: Separating fast and slow modes in coupled chaotic systems. Nonlinear Processes Geophys., 11, 319–327, https://doi.org/10.5194/npg-11-319-2004.
Penny, S. G., and T. M. Hamill, 2017: Coupled data assimilation for integrated earth system analysis and prediction. Bull. Amer. Meteor. Soc., 98, ES169–ES172, https://doi.org/10.1175/BAMS-D-17-0036.1.
Ruiz-Barradas, A., E. Kalnay, M. Peña, A. E. BozorgMagham, and S. Motesharrei, 2017: Finding the driver of local ocean–atmosphere coupling in reanalyses and CMIP5 climate models. Climate Dyn., 48, 2153–2172, https://doi.org/10.1007/s00382-016-3197-1.
Saha, S., and Coauthors, 2010: The NCEP Climate Forecast System Reanalysis. Bull. Amer. Meteor. Soc., 91, 1015–1057, https://doi.org/10.1175/2010BAMS3001.1.
Sakov, P., and P. R. Oke, 2008: Implications of the form of the ensemble transformation in the ensemble square root filters. Mon. Wea. Rev., 136, 1042–1053, https://doi.org/10.1175/2007MWR2021.1.
Sluka, T. C., S. G. Penny, E. Kalnay, and T. Miyoshi, 2016: Assimilating atmospheric observations into the ocean using strongly coupled ensemble data assimilation. Geophys. Res. Lett., 43, 752–759, https://doi.org/10.1002/2015GL067238.
Smith, P. J., A. S. Lawless, and N. K. Nichols, 2017: Estimating forecast error covariances for strongly coupled atmosphere–ocean 4D-Var data assimilation. Mon. Wea. Rev., 145, 4011–4035, https://doi.org/10.1175/MWR-D-16-0284.1.
Tardif, R., G. J. Hakim, and C. Snyder, 2014: Coupled atmosphere-ocean data assimilation experiments with a low-order climate model. Climate Dyn., 43, 1631–1643, https://doi.org/10.1007/s00382-013-1989-0.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490, https://doi.org/10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2.
Trevisan, A., and L. Palatella, 2011: On the Kalman Filter error covariance collapse into the unstable subspace. Nonlinear Processes Geophys., 18, 243–250, https://doi.org/10.5194/npg-18-243-2011.
Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158, https://doi.org/10.1175/1520-0469(2003)060<1140:ACOBAE>2.0.CO;2.
Yang, S.-C., and Coauthors, 2006: Data assimilation as synchronization of truth and model: Experiments with the three-variable Lorenz system. J. Atmos. Sci., 63, 2340–2354, https://doi.org/10.1175/JAS3739.1.
Zhang, S., M. J. Harrison, A. Rosati, and A. Wittenberg, 2007: System design and evaluation of coupled ensemble data assimilation for global oceanic climate studies. Mon. Wea. Rev., 135, 3541–3564, https://doi.org/10.1175/MWR3466.1.