## 1. Introduction

This work considers the data assimilation problem in coupled systems that consist of two subsystems. Examples include coupled ocean–atmosphere models (e.g., Russell et al. 1995), marine ecosystem models coupling physics and biology (e.g., Petihakis et al. 2009), and coupled flow and (contaminant) transport models (e.g., Dawson et al. 2004), to name a few.

In principle, data assimilation in coupled systems can be tackled by concatenating the states of the subsystems into one augmented state and treating the whole coupled system as a single dynamical system. After augmentation, a conventional data assimilation method, such as the ensemble Kalman filter (EnKF), can be directly applied. In this work, we present a divided state-space estimation strategy in the context of ensemble Kalman filtering. Instead of directly applying the update formulae in the conventional EnKF, we consider expressing the update formulae in terms of some quantities with respect to the subsystems themselves. In doing so, the update formulae in the divided estimation framework introduce some extra “cross terms” to account for the effect of coupling between the subsystems.

The divided estimation framework is derived based on the joint estimation one; hence, in principle these two approaches are mathematically equivalent. The main purpose of this work is to investigate the possibility of using the divided estimation strategy as an alternative to its joint counterpart. Whenever convenient, we would advocate the use of the joint estimation strategy, since it is conceptually more straightforward. However, there might still be some aspects in which the divided estimation strategy may appear more attractive, for example, in terms of the flexibility of implementation in large-scale applications, to be further discussed later.

This work is organized as follows: Section 2 outlines the filtering step of the EnKF in the joint and divided estimation frameworks. Section 3 conducts numerical experiments with a multiscale Lorenz 96 model and verifies that the joint and divided estimation frameworks have close performance under the same conditions. Section 4 investigates two extensions of the divided estimation framework that aim to achieve a certain trade-off between computational efficiency and accuracy. Finally, section 5 concludes the work and discusses some potential future developments.

## 2. Joint and divided estimation strategies with the EnKF

In the literature there are many variants of the EnKF (see, e.g., Anderson 2001; Bishop et al. 2001; Evensen 1994; Burgers et al. 1998; Hoteit et al. 2002; Luo and Moroz 2009; Tippett et al. 2003; Whitaker and Hamill 2002). In this work, we use the ensemble transform Kalman filter (ETKF; Bishop et al. 2001) for illustration. The extension to other filters can be done in a similar way. The joint and divided estimation strategies mainly differ at the filtering step, which is thus our focus hereafter. For ease of notation, we drop the time indices of all involved quantities.

Suppose that the state vectors in the coupled subsystems are *η* and *ξ*, respectively, and the corresponding observation subsystems are given by **y**_{η} = **H**_{η}*η* + **u**_{η} and **y**_{ξ} = **H**_{ξ}*ξ* + **u**_{ξ}, where **H**_{η} and **H**_{ξ} are some linear observation operators,^{1} and **u**_{η} and **u**_{ξ} are the corresponding observation noise with zero means and covariances **R**_{η} and **R**_{ξ}, respectively. In practice, it is possible that one of the subsystems (e.g., *ξ*) may not be observed. In this case, to overcome the technical problem in describing the unknown observation operator (e.g., **H**_{ξ}), one can set the associated covariance matrix **R**_{ξ} of **y**_{ξ} to +∞ so that **y**_{ξ} does not affect the update (Jazwinski 1970, p. 219). For convenience of discussion, we denote the dimensions of the vectors *η*, *ξ*, **x**, **y**_{η}, **y**_{ξ}, and **y** by *m*_{η}, *m*_{ξ}, *m*_{x}, *m*_{y_η}, *m*_{y_ξ}, and *m*_{y}, respectively, such that *m*_{η} + *m*_{ξ} = *m*_{x} and *m*_{y_η} + *m*_{y_ξ} = *m*_{y}.

In the above setting we have assumed that the observation operators **H**_{η} and **H**_{ξ} for different subsystems are “separable,” in the sense that the observation (say **y**_{η}) of each subsystem only depends on the corresponding subsystem state (say *η*). In some situations, however, the observation with respect to one subsystem may depend on the state variables of both subsystems. In such cases, one may introduce a certain transform to the observation system augmented by the observations with respect to the subsystems [see Eq. (1)], so that the resulting augmented observation system (after the transform) has a diagonal or block-diagonal observation operator and thus becomes separable.
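To make the augmented, separable setting concrete, the following Python sketch builds the block-diagonal observation operator and noise covariance from hypothetical subsystem quantities (all dimensions and matrices here are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical sizes for illustration (not from the paper).
m_eta, m_xi = 4, 3          # subsystem state dimensions
my_eta, my_xi = 2, 2        # subsystem observation dimensions

rng = np.random.default_rng(0)
H_eta = rng.standard_normal((my_eta, m_eta))   # observation operator for eta
H_xi = rng.standard_normal((my_xi, m_xi))      # observation operator for xi
R_eta = np.eye(my_eta)                         # observation error covariances
R_xi = np.eye(my_xi)

# Augmented (separable) observation system: block-diagonal operator and
# block-diagonal noise covariance (uncorrelated noise across subsystems).
H = np.block([[H_eta, np.zeros((my_eta, m_xi))],
              [np.zeros((my_xi, m_eta)), H_xi]])
R = np.block([[R_eta, np.zeros((my_eta, my_xi))],
              [np.zeros((my_xi, my_eta)), R_xi]])

eta = rng.standard_normal(m_eta)
xi = rng.standard_normal(m_xi)
x = np.concatenate([eta, xi])                  # augmented state

# Separability: the augmented projection reproduces the subsystem ones.
y_clean = H @ x
assert np.allclose(y_clean[:my_eta], H_eta @ eta)
assert np.allclose(y_clean[my_eta:], H_xi @ xi)
```

A block-diagonal structure like this is what makes the divided update below possible without losing information relative to the joint update.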

The augmented observation system is obtained by stacking the subsystem observations, with the augmented state **x** ≡ [*η*^{T}, *ξ*^{T}]^{T}, the augmented observation **y** ≡ [**y**_{η}^{T}, **y**_{ξ}^{T}]^{T}, the block-diagonal observation operator formed from **H**_{η} and **H**_{ξ}, and the augmented observation noise **u** ≡ [**u**_{η}^{T}, **u**_{ξ}^{T}]^{T}, whose covariance has diagonal blocks **R**_{η} and **R**_{ξ} and off-diagonal block **R**_{ηξ}, with **R**_{ηξ} being the cross covariance between **u**_{η} and **u**_{ξ}. Throughout this work, we assume the observation noise **u**_{η} and **u**_{ξ} are uncorrelated, such that **R**_{ηξ} = **0**. If, in addition, both **R**_{η} and **R**_{ξ} are diagonal, then the observation **y** can be assimilated serially through some scalar update formulae (Anderson 2003). For our deduction, though, we only need to assume that **R**_{ηξ} = **0**. If instead **R**_{ηξ} ≠ **0**, one can still obtain results similar to those presented below (though in somewhat more complicated forms), following a procedure similar to the derivation in the appendix.

Let the *n*-member background ensemble consist of the subsystem components of the augmented state, and let **x̄**^{b} and **S**^{b} be the sample mean and a square root matrix of the sample covariance of the background ensemble, respectively. Here, **Φ**^{b} and **Ξ**^{b} denote the blocks of **S**^{b} with respect to the subsystems *η* and *ξ*, respectively. On the other hand, projecting the background ensemble onto the observation space (projection ensemble for short), one can construct an *m*_{y} × *n* matrix (projection matrix for short). Similarly, one can also decompose the projection ensemble into its components with respect to the subsystems, for *i* = 1, …, *n*, and let the sample means of the corresponding sub-ensembles **Φ**^{obv} and **Ξ**^{obv} be defined accordingly.

### a. Implementation of the ETKF in the joint estimation framework

In the joint estimation framework, the ETKF updates the background mean and square root through a set of formulae involving an *n* × (*n* − 1) transform matrix **T**_{n−1}. Roughly speaking, **T**_{n−1} is an approximate square root of the matrix **Λ** = [**I**_{n} + (**S**^{h})^{T}**R**^{−1}**S**^{h}]^{−1} (with **I**_{n} being the *n*-dimensional identity matrix and **S**^{h} the projection matrix introduced above) and is constructed based on the (*n* − 1) leading eigenvalues of **Λ** and the associated eigenvectors (see Wang et al. 2004); the (*n* − 1) × *n* centering matrix **C** satisfies **CC**^{T} = **I**_{n−1} and **C1**_{n} = **0** (Livings et al. 2008; Wang et al. 2004), where **1**_{n} is an *n*-dimensional vector with all its elements being 1. Readers are referred to Hoteit et al. (2002) and Wang et al. (2004) for the construction of such a centering matrix. Also note that it can be more convenient to use a square root update formula expressed in terms of **S**^{b} when the ensemble size is larger than the dimension of the observation space (Posselt and Bishop 2012).

With the analysis mean **x̄**^{a} and the analysis square root at hand, the analysis ensemble **X**^{a} is obtained by adding the (suitably scaled) columns of the analysis square root to **x̄**^{a}, where (**X**^{a})_{i} denotes the *i*th column of **X**^{a}. Propagating **X**^{a} forward, one obtains a background ensemble at the next time step and a new assimilation cycle can begin.
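As a minimal illustration of the joint-framework update described above, the sketch below implements one ETKF-style analysis step in Python. For simplicity it uses the full symmetric square root of **Λ** instead of the (*n* − 1)-eigenvalue construction with a centering matrix, so it is a simplified variant rather than the exact scheme of the paper:

```python
import numpy as np

def etkf_update(X, y, H, R):
    """One ETKF-style analysis step (joint framework), as a minimal sketch.

    Uses the symmetric square root of Lambda = [I_n + (S^h)^T R^{-1} S^h]^{-1}
    for the ensemble transform, rather than the (n-1)-eigenvalue construction
    with a centering matrix described in the text.
    """
    m, n = X.shape
    xb = X.mean(axis=1)
    Sb = (X - xb[:, None]) / np.sqrt(n - 1)   # square root of sample covariance
    Sh = H @ Sb                               # projection of the square root
    Rinv = np.linalg.inv(R)
    Lam = np.linalg.inv(np.eye(n) + Sh.T @ Rinv @ Sh)
    # Analysis mean: xa = xb + Sb Lam (S^h)^T R^{-1} (y - H xb).
    xa = xb + Sb @ (Lam @ (Sh.T @ (Rinv @ (y - H @ xb))))
    # Analysis square root via the symmetric square root of Lam.
    w, V = np.linalg.eigh(Lam)
    T = V @ np.diag(np.sqrt(w)) @ V.T
    Sa = Sb @ T
    # Rebuild the analysis ensemble around the analysis mean.
    Xa = xa[:, None] + np.sqrt(n - 1) * Sa
    return Xa, xa, Sa
```

Because **Λ** leaves the vector of ones invariant, this transform preserves the ensemble mean, and the analysis mean coincides with the standard Kalman-filter mean update for the same sample covariance.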

### b. Implementation of the ETKF in the divided estimation framework

Given some square root matrices of the observation error covariances **R**_{η} and **R**_{ξ}, one obtains update formulae for the subsystem means and square roots that parallel those of the joint framework, in which the transform matrix **T**_{n−1} is now constructed based on the (*n* − 1) leading eigenvalues and the corresponding eigenvectors of the divided-framework counterpart of **Λ**, together with an (*n* − 1) × *n* centering matrix as previously discussed.

The mean update formulae Eqs. (10a) and (10b) in the divided estimation framework are similar to that in Eq. (8a). However, they also exhibit clear differences. For instance, the corrections in the divided estimation framework are governed by gain matrices, say **K**_{11} and **K**_{12} for the subsystem *η*, that bear different forms from the Kalman gain in Eq. (8a). When the coupling-related term involving the subsystem *ξ* vanishes, **K**_{11} reduces to the Kalman gain with respect to the subsystem *η* alone. In this sense, the presence of that term in **K**_{11} reflects the coupling between the subsystems *η* and *ξ*. Similar results can also be found for the other gain matrices **K**_{12}, **K**_{21}, and **K**_{22}. The square root update formula, say Eq. (12a) for the subsystem *η*, has its transform matrix **T**_{n−1} as an approximate square root matrix of the corresponding divided-framework counterpart of **Λ**.
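The algebraic equivalence between the joint update and a divided update with cross terms can be checked numerically. The following Python sketch (with illustrative sizes, not the paper's notation) partitions a joint ensemble-based Kalman gain into blocks and verifies that updating the subsystems separately reproduces the joint mean update; the off-diagonal blocks play the role of the cross terms discussed above:

```python
import numpy as np

# Minimal numerical check (not the paper's derivation) that partitioning the
# joint ensemble-based gain into blocks and updating each subsystem separately
# reproduces the joint mean update. All sizes are illustrative.
rng = np.random.default_rng(1)
m_eta, m_xi, n = 5, 4, 20
my_eta, my_xi = 3, 2
m, my = m_eta + m_xi, my_eta + my_xi

X = rng.standard_normal((m, n))               # joint background ensemble
H = rng.standard_normal((my, m))
R = np.eye(my)
y = rng.standard_normal(my)

xb = X.mean(axis=1)
Sb = (X - xb[:, None]) / np.sqrt(n - 1)
P = Sb @ Sb.T                                 # joint sample covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # joint Kalman gain
d = y - H @ xb                                # joint innovation

# Joint update of the augmented mean.
xa_joint = xb + K @ d

# Divided update: partition the gain and innovation by subsystem. The
# off-diagonal blocks K12 and K21 act as the "cross terms" coupling the
# subsystems; setting them to zero would decouple the two updates.
K11, K12 = K[:m_eta, :my_eta], K[:m_eta, my_eta:]
K21, K22 = K[m_eta:, :my_eta], K[m_eta:, my_eta:]
d_eta, d_xi = d[:my_eta], d[my_eta:]
eta_a = xb[:m_eta] + K11 @ d_eta + K12 @ d_xi
xi_a = xb[m_eta:] + K21 @ d_eta + K22 @ d_xi

assert np.allclose(np.concatenate([eta_a, xi_a]), xa_joint)
```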

## 3. Numerical experiments

### a. Experiment settings

The multiscale Lorenz 96 (ms-L96) model [Eq. (14)] consists of coupled ordinary differential equations for the state variables *x*_{i} and *z*_{j,i}, with *i* = 1, …, *m* and *j* = 1, …, *K*, where *F*, *c*, *b*, and *h* are constant parameters. The state variables *x*_{i} and *z*_{j,i} are cyclic as in the Lorenz 96 model (Lorenz and Emanuel 1998). For instance, one has *x*_{m+1} = *x*_{1}; *x*_{0} = *x*_{m}; *z*_{K+1,i} = *z*_{1,i}; *z*_{0,i} = *z*_{K,i}, and so on. In the experiments, we let *m* = 40, *K* = 1, *F* = 8, *c* = *b* = 10, and *h* = 0.8. This results in an 80-dimensional dynamical system with 40 *x*_{i} variables and 40 *z*_{1,i} variables. In the divided estimation framework, the two subsystems consist of the ordinary differential equations (ODEs) starting with *dx*_{i}/*dt* and *dz*_{1,i}/*dt*, respectively; that is, *x*_{i} and *z*_{1,i} play the roles of *η* and *ξ* in section 2. For convenience, we call the component *x*_{i} the fast mode (in terms of the rate of state change) and *z*_{1,i} the slow mode. Figure 1 plots the time series of some state variables in the ms-L96 model.
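Since Eq. (14) is not reproduced here, the following Python sketch implements a generic two-scale Lorenz 96 right-hand side in the spirit of Lorenz and Emanuel (1998); the exact coupling terms, coefficients, and cyclic conventions of the paper's Eq. (14) may differ, so this is an assumed form for illustration only:

```python
import numpy as np

def ms_l96_rhs(x, z, F=8.0, c=10.0, b=10.0, h=0.8):
    """Right-hand side of a generic two-scale Lorenz 96 formulation with one
    second-scale variable per first-scale variable (an assumed form; the
    paper's Eq. (14) may differ in detail). x and z are length-m arrays, and
    cyclic indexing is handled by np.roll.
    """
    # First scale: advection, damping, forcing, and coupling to z.
    dx = (np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2))
          - x + F - (h * c / b) * z)
    # Second scale: analogous dynamics at a c-scaled rate, driven by x.
    dz = (c * b * np.roll(z, -1) * (np.roll(z, 1) - np.roll(z, -2))
          - c * z + (h * c / b) * x)
    return dx, dz
```

With `x = z = 0`, the tendencies reduce to the constant forcing *F* for the first scale and zero for the second, which is a quick sanity check on the implementation.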

The dynamical system Eq. (14) is numerically integrated using the fourth-order Runge–Kutta–Fehlberg (RKF) method (Fehlberg 1970), and the system states are collected every 0.05 time unit (for brevity we call it an integration step). In the experiments, we run the system forward in time for 1500 integration steps and discard the first 500 steps to avoid a spinup period. In both the joint and divided estimation frameworks, data assimilation starts from step 501 until step 1500. The trajectory during this period is considered the truth. Synthetic observations are generated by adding Gaussian white noise (with zero mean and unit variance) to the fast mode state variables *x*_{1}, *x*_{5}, *x*_{9}, …, *x*_{37} and to the slow mode ones *z*_{1,1}, *z*_{1,5}, *z*_{1,9}, …, *z*_{1,37} (i.e., every fourth state variable), every four integration steps. Therefore, observations are available at 250 out of 1000 integration steps, from 20 out of 80 state variables. For convenience, we relabel the integration step 501 as the first assimilation step. An initial background ensemble with 20 ensemble members is generated by drawing samples from the 80-dimensional multivariate normal distribution *N*(**0**, **I**_{80}) and then adding these samples to the true state at the first assimilation cycle.
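The observation network just described can be sketched as a selection-type linear operator (the "true" state below is a random stand-in, not a model trajectory):

```python
import numpy as np

# Sketch of the synthetic observation network described above: every fourth
# state variable of each mode is observed (x_1, x_5, ..., x_37 and the
# corresponding z variables), with zero-mean, unit-variance Gaussian noise.
m = 40
obs_idx_fast = np.arange(0, m, 4)          # 0-based indices of x_1, x_5, ...
obs_idx = np.concatenate([obs_idx_fast, m + obs_idx_fast])  # slow mode offset

# Linear observation operator as a 20 x 80 selection matrix.
H = np.zeros((obs_idx.size, 2 * m))
H[np.arange(obs_idx.size), obs_idx] = 1.0

rng = np.random.default_rng(42)
x_true = rng.standard_normal(2 * m)        # stand-in for a true model state
y = H @ x_true + rng.standard_normal(obs_idx.size)  # unit-variance obs noise
```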

In the experiments below, we consider an extra possibility in which the integration of the subsystems *x*_{i} and *z*_{1,i} in Eq. (14) is also carried out in a “divided” way. This is achieved by temporarily treating the variables of one subsystem (say *z*_{1,i}) as constant parameters in the other subsystem (say *x*_{i}) during the integration, and vice versa. Such a parameterization may incur extra numerical errors during the integration steps. Our main motivation to consider this option is, however, its potential usefulness in data assimilation practices. For instance, it could be a fast—although crude, and likely not the best possible—way to combine Earth’s subsystem (ocean, atmosphere, etc.) models independently developed by different research groups and hence increase the reusability of existing resources. However, it is worthwhile to stress that running the subsystems separately is not mandatory for the implementation of the divided estimation framework.
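The "divided" integration idea can be sketched as follows, here with a classical RK4 step for brevity (the paper uses an RKF scheme): each subsystem is advanced with the other frozen at its start-of-step value, which is the source of the extra parameterization errors mentioned above. The function `rhs` is a placeholder for the coupled tendencies:

```python
import numpy as np

def rk4_joint(x, z, rhs, dt):
    """Standard RK4 step on the coupled system; rhs(x, z) -> (dx/dt, dz/dt)."""
    k1x, k1z = rhs(x, z)
    k2x, k2z = rhs(x + 0.5 * dt * k1x, z + 0.5 * dt * k1z)
    k3x, k3z = rhs(x + 0.5 * dt * k2x, z + 0.5 * dt * k2z)
    k4x, k4z = rhs(x + dt * k3x, z + dt * k3z)
    return (x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            z + dt / 6 * (k1z + 2 * k2z + 2 * k3z + k4z))

def rk4_divided(x, z, rhs, dt):
    """RK4 step in which each subsystem sees the other frozen at its
    start-of-step value, incurring a small parameterization error."""
    # Advance x with z treated as a constant parameter.
    k1, _ = rhs(x, z)
    k2, _ = rhs(x + 0.5 * dt * k1, z)
    k3, _ = rhs(x + 0.5 * dt * k2, z)
    k4, _ = rhs(x + dt * k3, z)
    x_new = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    # Advance z with x treated as a constant parameter.
    _, k1 = rhs(x, z)
    _, k2 = rhs(x, z + 0.5 * dt * k1)
    _, k3 = rhs(x, z + 0.5 * dt * k2)
    _, k4 = rhs(x, z + dt * k3)
    z_new = z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x_new, z_new
```

For a first-order (forward Euler) step the two variants coincide; the divergence between them appears through the intermediate stages of multi-stage schemes, which is precisely the parameterization error discussed in the text.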

Therefore, in each experiment below we consider four possible scenarios, which differ from one another depending on whether the dynamical system and/or the assimilation scheme is divided. For convenience, we denote these scenarios by (DS-joint, DA-joint), (DS-joint, DA-divided), (DS-divided, DA-joint), and (DS-divided, DA-divided), respectively, where the abbreviations DS and DA stand for “dynamical system” and “data assimilation.” Here, for instance, “DS-joint” means that the dynamical system is integrated as a whole, and “DA-divided” means that the divided estimation framework is adopted for data assimilation. The other terminologies are interpreted in a similar way.

For illustration, Fig. 2 outlines the main procedures in the scenario (DS-divided, DA-divided). Starting with an initial ensemble of the coupled system, we split the initial ensemble into two subensembles according to fast and slow modes and mark them by letters F and S, respectively. The subensemble F (S) acts as the input state vectors of the fast (slow) mode (denoted by solid arrow lines) and as the input “parameters” of the slow (fast) mode (denoted by dotted arrow lines). With incoming observations, the background ensembles of the fast and slow modes are updated to their analysis counterparts as described in section 2b. Propagating the analysis ensembles forward, one starts a new assimilation cycle, and so on.

For an *m*_{x}-dimensional system, the RMSE *e*_{k} of an analysis at time instant *k* is defined as *e*_{k} = ‖**x̄**_{k}^{a} − **x**_{k}^{tr}‖_{2}/√*m*_{x}, where **x̄**_{k}^{a} denotes the analysis mean and **x**_{k}^{tr} the true state at instant *k*.
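In code, this RMSE measure amounts to:

```python
import numpy as np

def rmse(xa, xt):
    """RMSE of an analysis against the truth: the Euclidean distance divided
    by the square root of the state dimension."""
    return np.linalg.norm(xa - xt) / np.sqrt(xt.size)
```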

### b. Experiment results

#### 1) Results with the plain setting

First, we investigate whether the joint and divided estimation frameworks yield the same results. To this end, we compare the analyses obtained in both methods by conducting a single update step using identical background ensembles and observations. The experiment is repeated 100 times, with the background ensemble and observations redrawn at random in each repetition so that in general they change across repetitions. Figure 3 shows that the mean and standard deviation (STD) of the differences (in absolute values) between the state variables of the analyses of both estimation frameworks are on the order of 10^{−16}. Our computations are carried out with MATLAB (version R2012a), in which the numerical precision of floating point numbers is 2.2204 × 10^{−16}. This indicates that the tiny differences reported in Fig. 3 mainly stem from the numerical precision in computations.

Figure 4 depicts the time series of the RMSEs of the estimates obtained in the four different scenarios, (DS-joint, DA-joint), (DS-joint, DA-divided), (DS-divided, DA-joint), and (DS-divided, DA-divided), with a longer time horizon. These four assimilation scenarios have identical initial background ensembles and observations. However, the background ensembles in these four scenarios may (gradually) deviate from each other at subsequent time instants because of the chaotic nature of the ms-L96 model and the extra parameterization errors in the DS-divided scenarios. Therefore, in Fig. 4 one can see that, in the DS-joint scenarios, the differences between the estimates from the joint (Fig. 4a) and divided (Fig. 4b) estimation frameworks are nearly zero during the early assimilation period, but become more substantial over time. Meanwhile, in the DS-divided scenarios the estimates from either the joint (Fig. 4c) or the divided (Fig. 4d) estimation framework deviate from those in the (DS-joint, DA-joint) scenario (Fig. 4a) more quickly with the extra parameterization errors.

In terms of estimation accuracy, the time-mean RMSE in Fig. 4a is 2.7866. In contrast, the time-mean RMSEs in Figs. 4b–d are −0.1203 (lower), −0.1649 (lower), and +0.3808 (higher), respectively, relative to that in Fig. 4a. This seems to suggest that the extra numerical errors due to parameterization are not always harmful. For instance, the time-mean RMSE in Fig. 4c appears to be the lowest in these four tested scenarios. A possible explanation of this result is discussed later from the point of view of covariance inflation.

Because of the interactions of the forecast and update steps in assimilating the ms-L96 model, it is challenging to obtain an analytic description of the dynamics of the differences between the reference trajectory of the (DS-joint, DA-joint) scenario and those of the (DS-joint, DA-divided), (DS-divided, DA-joint), and (DS-divided, DA-divided) scenarios. For this reason, in what follows we adopt two statistical measures, namely, the boxplot (see the left column of Fig. 5) and the histogram (see the right column of Fig. 5), to characterize these differences.

A boxplot depicts a group of data through their quartiles. In this work, the boxplot is adopted to plot the differences at certain time instants. The differences are 80-dimensional vectors, obtained by subtracting the trajectory of the reference scenario (DS-joint, DA-joint) from those of the scenarios (DS-joint, DA-divided), (DS-divided, DA-joint), and (DS-divided, DA-divided) at some particular time instants. A boxplot is used here to indicate the spatial distribution of the 80 elements in a difference vector at a particular time instant. For ease of visualization, we only plot the boxes at time steps {1:10:91} and {100:100:1000}, where *υ*_{ini}:Δ*υ*:*υ*_{final} stands for an array of scalars that grow from the initial value *υ*_{ini} to the final one *υ*_{final}, with an even increment Δ*υ* each time. Our boxplot setting follows the convention in MATLAB (version R2012a): on each box, the band inside the box denotes the median, the bottom and top of the box represent the 25th and 75th percentiles, the ends of the whiskers indicate the extent of the data considered nonoutliers, while outliers are marked individually as asterisks in Fig. 5. Note that, in the (DS-joint, DA-divided) scenario, because the differences from the reference trajectory are very tiny at the early assimilation stage, the boxes appear to collapse during this period (e.g., from time steps 1 to 91), which is consistent with the results in Fig. 4b. As time moves forward, the trajectory of the (DS-joint, DA-divided) scenario gradually deviates from the reference. Therefore, as indicated in Fig. 5a, the spreads of the differences become larger from time step 200 on, compared to those at earlier time steps. In addition, more outliers (asterisks) are seen after time step 200, while the medians of the differences appear to remain close to zero at all time steps. 
Similar phenomena can also be observed in the (DS-divided, DA-joint) and (DS-divided, DA-divided) scenarios, except that the periods in which the boxplots collapse are much shorter compared to that in the (DS-joint, DA-divided) scenario, which is also consistent with the results in Fig. 4.

The histogram is also used here to depict the distribution of an element in a difference vector during the whole assimilation time window. In the right column of Fig. 5 we show the twentieth and sixtieth elements, which correspond to the trajectory differences in the state variables *x*_{20} and *z*_{1,20}, respectively, in the scenarios (DS-joint, DA-divided), (DS-divided, DA-joint), and (DS-divided, DA-divided). In the (DS-joint, DA-divided) scenario, the histogram of the differences in state variable *x*_{20} appears to have a single peak at zero, while its support is inside the interval [−15, 15]. The histogram of the differences in state variable *z*_{1,20} also has a single peak at zero, but its support is narrower, being inside the interval [−1.5, 1.5] instead. Similar phenomena are also observed in the (DS-divided, DA-joint) and (DS-divided, DA-divided) scenarios, although the heights of the peaks tend to be lower, and the corresponding supports tend to be wider.

Overall, the results in Figs. 4 and 5 seem to suggest that the trajectories of the (DS-joint, DA-divided), (DS-divided, DA-joint), and (DS-divided, DA-divided) scenarios tend to oscillate around the reference trajectory of the (DS-joint, DA-joint) scenario, although they may also substantially deviate from the reference one at many time instants.

#### 2) Results with both covariance inflation and localization

Covariance inflation (Anderson and Anderson 1999) and localization (Hamill et al. 2001) are two important auxiliary techniques that can be used to improve the performance of an EnKF. Since the EnKF is a Monte Carlo implementation of the Kalman filter, when the ensemble size is relatively small, certain issues may arise, including, for instance, systematic underestimation of the variances of state variables, overestimation of the correlations of different state variables, and rank deficiency in the sample error covariance matrix.

Covariance inflation (Anderson and Anderson 1999) is introduced to tackle the variance underestimation problem by artificially increasing the sample error covariance to some extent. In relation to the results in the previous experiment, one possible explanation of the result there is that the extra numerical errors due to parameterization may have acted as some additive noise in the dynamical model, which is not always bad for a filter’s performance. Indeed, as has been reported in some earlier works (e.g., Gordon et al. 1993; Hamill and Whitaker 2011), introducing some artificial noise to the dynamical model may improve filter performance. In the context of EnKF, this may be considered an alternative form of covariance inflation (Hamill and Whitaker 2011), which may enhance the robustness of the filter from the point of view of *H*_{∞} filtering theory (Luo and Hoteit 2011; Altaf et al. 2013; Triantafyllou et al. 2013). One may also introduce artificial noise in a more sophisticated way, for example, through a certain nonlinear regression model, such that the statistical effect of the regression model mimics that of the dynamical model (Harlim et al. 2014).

How to optimally conduct covariance inflation is an ongoing research topic in the data assimilation community. Some recent developments include, for example, adaptive covariance inflation techniques (see, e.g., Anderson 2007, 2009) and covariance inflation from the point of view of residual nudging (Luo and Hoteit 2014b, 2013, 2012), among many others. For our purpose here, it appears sufficient to conduct covariance inflation by simply multiplying the analysis sample error covariance by a factor *δ*^{2} (*δ* ≥ 1), as originally proposed in Anderson and Anderson (1999). The values of *δ* in the experiment are {1:0.05:1.3}.
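Multiplicative inflation as used here amounts to rescaling the ensemble anomalies, which leaves the mean unchanged and multiplies the sample covariance by *δ*^{2}:

```python
import numpy as np

def inflate(X, delta):
    """Multiplicative covariance inflation (Anderson and Anderson 1999):
    rescale the ensemble anomalies by delta, so the sample covariance is
    multiplied by delta**2 while the ensemble mean is unchanged."""
    xm = X.mean(axis=1, keepdims=True)
    return xm + delta * (X - xm)
```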

Covariance localization (Hamill et al. 2001) is adopted to deal with the overestimation of the correlations and rank deficiency. In practice, different methods are proposed to conduct localization (see, e.g., Anderson 2007, 2009; Clayton et al. 2013; Kuhl et al. 2013; Wang et al. 2007). In our experiments, localization is directly applied to the gain matrices. We assume that *z*_{1,i} and *x*_{i} are located at the same grid point *i*. Covariance localization thus follows the settings in Anderson (2007), in which a parameter *l*_{c}, called the half-width (or length scale of localization), controls the degree of correlation tapering. We use the same half-width for the fast and slow components of the ms-L96 model, with *l*_{c} being chosen from the set {0.1:0.2:0.9}. In general, for both the joint and divided estimation frameworks, one may use different half-widths for different components (e.g., ocean and atmosphere) of a coupled system. In such circumstances, it could be more efficient to use an adaptive localization approach (see, e.g., Bishop and Hodyss 2007, 2009a,b, 2011).
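As an illustration of distance-based tapering applied to a gain matrix, the sketch below uses the fifth-order function of Gaspari and Cohn (1999) on a cyclic domain with positions normalized to [0, 1); the exact taper and its parameterization in Anderson (2007) may differ, so treat this as an assumed variant:

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Fifth-order piecewise-rational taper of Gaspari and Cohn (1999), a
    common choice for distance-based localization. The weight is 1 at zero
    distance and exactly 0 beyond twice the length scale c."""
    z = np.abs(dist) / c
    taper = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z <= 2.0)
    zi = z[inner]
    taper[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                    - (5.0 / 3.0) * zi**2 + 1.0)
    zo = z[outer]
    taper[outer] = ((1.0 / 12.0) * zo**5 - 0.5 * zo**4 + 0.625 * zo**3
                    + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return taper

def localize_gain(K, state_pos, obs_pos, l_c):
    """Taper each entry of a gain matrix K by the Gaspari-Cohn weight for the
    cyclic distance between state variable and observation location
    (positions assumed normalized to [0, 1))."""
    d = np.abs(state_pos[:, None] - obs_pos[None, :])
    d = np.minimum(d, 1.0 - d)        # cyclic (periodic) distance
    return K * gaspari_cohn(d, l_c)
```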

We investigate the filter performance in the aforementioned four scenarios by combining different values of the inflation factor *δ* and the half-width *l*_{c}. The corresponding results, in terms of time-mean RMSEs (the averages of the RMSEs over the assimilation time window) are reported in Fig. 6. In the experiments, the filters’ performance is improved in most of the cases, in comparison with the results in Fig. 4. In Fig. 6, the best filter performance is obtained with *l*_{c} ≈ 0.7, while with localization, covariance inflation does not seem to help improve the estimation accuracy,^{2} similar to the findings of Penny (2014). The above results, however, may strongly depend on the experimental settings. For instance, in the context of the hybrid local ETKF, Penny (2014) found that the best filter performance is achieved at relatively small *l*_{c} values (e.g., ≈0.2).

Figure 6 also indicates that, for a given model integration scenario (either DS-joint or DS-divided), the joint and divided estimation frameworks yield very close results. On the other hand, for a given estimation framework (either DA-joint or DA-divided), integrating the subsystems separately tends to deteriorate filter performance. In general, the performance deterioration is not severe, less than 10% in all cases with the same values of *δ* and *l*_{c}.

## 4. Two extensions from the practical point of view

In this section, we present two extensions of the aforementioned frameworks. These are largely motivated by the current status and challenges of conducting data assimilation in coupled ocean–atmosphere models (Bishop et al. 2012). These two extensions are illustrated within the (DS-divided, DA-divided) scenario. The extensions to the other scenarios can be implemented in a similar way.

### a. Different ensemble sizes in the subsystems

Here, we consider the possibility of running the filter with different ensemble sizes in the fast and slow modes. This may be considered an example in which one wants to gain computational efficiency by running fewer ensemble members in one of the subsystems, possibly at the cost of some loss of accuracy. To this end, let the ensemble sizes of the fast and slow modes be *n*_{f} and *n*_{s}, respectively. In the experiments, we consider four different cases, with (*n*_{f} = 20, *n*_{s} = 20), (*n*_{f} = 20, *n*_{s} = 15), (*n*_{f} = 15, *n*_{s} = 20), and (*n*_{f} = 15, *n*_{s} = 15) at the prediction step, and the targeted ensemble size is 20 for both modes at the filtering step. To apply the filter update formulae, the ensemble sizes of both modes should be equal. Therefore, a dimension mismatch arises when *n*_{f} ≠ *n*_{s}. This issue is addressed through a conditional sampling scheme discussed in the online supplemental material.

In each of the above cases, we investigate the filter’s performance when (i) neither covariance inflation nor covariance localization is applied (the plain setting) and (ii) both covariance inflation and covariance localization are adopted. In setting ii, the covariance inflation factor is 1.15 for both the fast and slow modes, and the half-width for covariance localization is 0.75.

Figure 7 plots the time series of the RMSEs for the above four different cases. In each case, when the filter is equipped with both covariance inflation and localization, its time-mean RMSE tends to be lower than that of the plain setting (with neither inflation nor localization). On the other hand, if one takes the case (*n*_{f} = 20, *n*_{s} = 20) with both covariance inflation and localization as the reference, then it is clear that reducing the ensemble size of either the fast or slow mode degrades the filter performance in terms of RMSE. Also, comparing Figs. 7b and 7c, one can see that reducing the ensemble size of the fast mode appears to have a larger (negative) impact than reducing the ensemble size of the slow one, which may be because the fast mode appears to dominate the dynamics of the ms-L96 model [see Fig. 8 later as well as the similar results in Hoteit and Pham (2004)]. On the other hand, comparing Figs. 7c and 7d, it seems better to reduce the ensemble sizes of both the fast and slow modes than to reduce the ensemble size of the fast mode only. This may also be because the fast mode is the dominant part of the dynamics of the ms-L96 model; therefore, the extra errors due to the sampling scheme may significantly affect the filter performance. However, a comparison between Figs. 7b and 7d suggests that if one only reduces the ensemble size of the slow mode, then the filter performance can be better than that resulting from reducing the ensemble sizes of both modes. Similar results are also observed with the plain setting, except that with the plain setting, the case (*n*_{f} = 20, *n*_{s} = 15) seems to perform slightly better than the one with (*n*_{f} = 20, *n*_{s} = 20).

### b. Incorporating the ensemble optimal interpolation into the divided estimation framework

If one subsystem of the coupled model (e.g., the ocean in the coupled ocean–atmosphere model) exhibits relatively slow changes, then it may be reasonable to assume that this subsystem has an (almost) constant background covariance over a short assimilation time window (Hoteit et al. 2002).^{3} As a result, optimal interpolation (OI; see, e.g., Cooper and Haines 1996) could be a reasonable assimilation scheme for such a slow-varying subsystem model because of its simplicity in implementation and significant savings in computational cost. The ensemble optimal interpolation (EnOI; see, e.g., Counillon and Bertino 2009) is an ensemble implementation of the OI scheme. It has an update step similar to that of the EnKF, but computes the associated background covariance (or square root matrix) based on a “historical” ensemble (Counillon and Bertino 2009). At the prediction step, the EnOI only propagates the analysis mean forward to obtain a background mean at the next assimilation cycle. This is computationally much cheaper than propagating the whole analysis ensemble forward as in the EnKF and hence appears attractive for certain applications (e.g., oceanography; see Hoteit et al. 2002; Bishop et al. 2012).
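A minimal sketch of the EnOI analysis step, with the static historical ensemble supplying the background covariance and only the mean being updated (the `alpha` scaling knob is an assumption for illustration, not from the paper):

```python
import numpy as np

def enoi_update(xb, X_hist, y, H, R, alpha=1.0):
    """EnOI analysis sketch: the Kalman-type gain is built from a static
    "historical" ensemble X_hist rather than a propagated flow-dependent
    ensemble, and only the mean state xb is updated (and later propagated).
    alpha is an optional scaling of the static covariance (an assumed knob,
    not from the paper)."""
    n = X_hist.shape[1]
    S = (X_hist - X_hist.mean(axis=1, keepdims=True)) / np.sqrt(n - 1)
    P = alpha * (S @ S.T)                  # static background covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return xb + K @ (y - H @ xb)
```

Because `X_hist` never changes, the gain can even be precomputed once for a fixed observation network, which is where the computational savings of OI-type schemes come from.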

Here, we consider tailoring the divided estimation framework so as to incorporate the EnOI into one of the subsystems. Such a modification is largely motivated by the current status and challenges of operational data assimilation in coupled ocean–atmosphere models, in which, because of the limitations in computational resources, one may use OI or three-dimensional variational data assimilation (3D-Var; or their ensemble implementations) for the ocean model and a more sophisticated scheme such as four-dimensional variational data assimilation (4D-Var) or EnKF for the atmosphere model. Therefore, combining these different assimilation systems becomes a challenge in practice (Bishop et al. 2012).

In our investigation below, to incorporate the EnOI into the divided estimation framework, some modifications are introduced. (i) At the prediction step, the slow mode only propagates forward the analysis mean of the corresponding subensemble and uses the analysis mean with respect to the fast mode as the “parameters” in the numerical integrations of the slow mode. On the other hand, the fast mode propagates forward the corresponding analysis subensemble [updated through Eqs. (12) and (13)] and uses the update of the historical ensemble [also through Eqs. (12) and (13)] of the slow mode as the parameters in the numerical integrations of the fast mode. (ii) At the filtering step, the background subensemble of the fast mode is the propagation of the analysis subensemble from the previous assimilation cycle, while the background subensemble of the slow mode is the historical ensemble generated by drawing a specified number of samples from a Gaussian distribution whose mean and covariance are equal to the “climatological” mean and covariance of the slow mode, respectively. This historical ensemble is produced once and for all and does not change over the assimilation window. However, at each assimilation cycle, when a new observation is available, the historical ensemble is updated according to Eqs. (12) and (13) and is used as the parameters of the fast mode. In doing so, the cross covariance between the historical ensemble of the slow mode and the flow-dependent subensemble of the fast mode may not accurately represent the true correlations between the two modes.

To generate the historical ensemble of the slow mode, we run the ms-L96 model forward in time for 100 000 integration steps, where the step size is 0.05. The climatological statistics are then taken as the temporal mean and covariance of the generated trajectory. Figure 8 shows the values of the climatological means and the eigenvalues of the climatological covariances of the fast and slow modes. These results suggest that the fast mode dominates the slow one in magnitudes, consistent with the results in Fig. 1.

In the experiments below, the ensemble sizes of the fast and slow modes are both 20. To distinguish the two schemes, hereafter we refer to the extended assimilation scheme with the EnOI as "DA-divided-exEnOI" and that without the EnOI as DA-divided. We also consider two settings. In the plain setting neither covariance inflation nor localization is conducted, while in the other setting both auxiliary techniques are applied to the fast and slow modes, with an inflation factor of 1.15 and a localization half-width of 0.7.
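The two auxiliary techniques can be sketched as follows, in a minimal Python/NumPy illustration. It assumes multiplicative inflation and the common Gaspari–Cohn fifth-order taper; the paper does not specify its localization function, so the taper choice is our assumption, with the half-width `c` meaning the taper vanishes beyond distance `2c`.

```python
import numpy as np

def inflate(E, lam):
    """Multiplicative covariance inflation: scale ensemble anomalies by lam."""
    m = E.mean(axis=1, keepdims=True)
    return m + lam * (E - m)

def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order taper with half-width c (support is 2c)."""
    z = np.abs(dist) / c
    t = np.zeros_like(z, dtype=float)
    m1 = z <= 1.0
    m2 = (z > 1.0) & (z < 2.0)
    # 0 <= z <= 1: -z^5/4 + z^4/2 + 5 z^3/8 - 5 z^2/3 + 1
    t[m1] = (((-0.25 * z[m1] + 0.5) * z[m1] + 0.625) * z[m1] - 5.0 / 3.0) * z[m1] ** 2 + 1.0
    # 1 < z < 2: z^5/12 - z^4/2 + 5 z^3/8 + 5 z^2/3 - 5 z + 4 - 2/(3z)
    t[m2] = ((((z[m2] / 12.0 - 0.5) * z[m2] + 0.625) * z[m2] + 5.0 / 3.0) * z[m2] - 5.0) \
        * z[m2] + 4.0 - 2.0 / (3.0 * z[m2])
    return t

def localized_cov(E, c):
    """Schur (elementwise) product of the taper matrix with the sample covariance,
    using circular distance on a ring of grid points (as in Lorenz 96 models)."""
    d = E.shape[0]
    i = np.arange(d)
    dist = np.abs(i[:, None] - i[None, :])
    dist = np.minimum(dist, d - dist)          # circular distance
    return gaspari_cohn(dist, c) * np.cov(E)

# usage with the settings of this section
rng = np.random.default_rng(0)
E = rng.standard_normal((8, 20))
E_infl = inflate(E, 1.15)                      # inflation factor 1.15
P_loc = localized_cov(E_infl, c=0.7)           # half-width 0.7
```

Inflation leaves the ensemble mean unchanged and scales the spread, while the taper leaves diagonal covariance entries intact and damps long-range (often spurious) correlations.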

Figure 9 plots the time series of the RMSEs for DA-divided and DA-divided-exEnOI. When neither covariance inflation nor localization is adopted, the RMSE trajectories of DA-divided and DA-divided-exEnOI are of comparable magnitude at many time instants, although substantial differences are also observed in some cases (e.g., over the interval between time steps 100 and 200). On the other hand, when covariance inflation and localization are applied, both the DA-divided and DA-divided-exEnOI schemes tend to yield lower time-mean RMSEs. In addition, with covariance inflation and localization, the difference in time-mean RMSE between DA-divided and DA-divided-exEnOI narrows from around 0.06 to around 0.01. Although the relative performance of the DA-divided and DA-divided-exEnOI schemes may in general change from case to case, the above experiment suggests, at least for the ms-L96 model, the potential of incorporating the EnOI into the divided estimation framework to reduce the computational cost.
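The RMSE metric reported above can be computed as follows; this is a minimal Python/NumPy sketch, with function names of our own choosing.

```python
import numpy as np

def rmse(x_est, x_true):
    # root-mean-square error over the state variables at one time instant
    return float(np.sqrt(np.mean((np.asarray(x_est) - np.asarray(x_true)) ** 2)))

def time_mean_rmse(est_traj, true_traj):
    # average of the instantaneous RMSEs over an assimilation window
    return float(np.mean([rmse(e, t) for e, t in zip(est_traj, true_traj)]))
```

The instantaneous RMSE is taken against the truth at each time step; the time-mean RMSE is the average of these values over the window.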

## 5. Discussion and conclusions

We consider the data assimilation problem in coupled systems composed of two subsystems. A straightforward method to tackle this problem is to augment the state vectors of the subsystems. In contrast, the divided estimation framework reexpresses the update formulae in the joint estimation framework in terms of some quantities with respect to the subsystems themselves. We also consider the option of running the subsystems separately, which may bring flexibility and efficiency to data assimilation practices in certain situations, but possibly at the cost of larger discretization errors during model integrations.

We use a multiscale Lorenz 96 model to evaluate the performance of four different data assimilation scenarios, combining different options of joint/divided subsystems and joint/divided estimation frameworks. In addition, we also consider two possible extensions that may be relevant for certain coupled data assimilation problems. The experiment results suggest that (i) with identical background ensemble and observation, the joint and divided estimation frameworks yield identical estimates up to machine precision; (ii) running the subsystems separately may provide extra flexibility in practice, but at the cost of reduced estimation accuracy in certain circumstances; and (iii) for the approximations used in the extension schemes of section 4, provided that the assimilation schemes are properly configured, one might still obtain reasonable estimates, especially when both covariance inflation and localization are applied.

The current work mainly serves as a proof-of-concept study. In real applications, for instance, data assimilation in coupled ocean–atmosphere general circulation models (OAGCMs), model balance and the generation of the initial background ensemble are among the issues that require special attention (Saha et al. 2014; Zhang et al. 2007). Additional challenges (e.g., different time scales between ocean and atmosphere components) may also arise when coupled data assimilation is extended to longer time scales (e.g., in the context of climate studies). In such cases, certain configurations in the current work may need to be modified, including, for instance, the way to generate the initial background ensemble and to conduct the conditional sampling (see the online supplemental material). This study may be considered a complement to some existing works in the literature (e.g., Zhang et al. 2007) in terms of the data assimilation schemes in use. In light of the mathematical equivalence between the joint and divided estimation frameworks, we envision that existing techniques (see, e.g., Saha et al. 2014; Zhang et al. 2007) and their future developments used to tackle the aforementioned challenges can also be applied in a similar way within the divided estimation framework.

One may also extend the present work to situations where the coupled system consists of more than two components. This extension may be of interest in certain situations, for instance, when the interactions of land, ocean, and atmosphere are under consideration or when the domain of a global model is divided into a number of subdomains such that data assimilation is conducted in a set of regional models, similar to the scenario considered in the local ensemble Kalman filter (Ott et al. 2004). In such cases, the corresponding update formulae may become more complicated when adopting the divided estimation framework. This topic will be investigated in the future.

## Acknowledgments

We thank three reviewers for their constructive comments and suggestions that significantly improved the presentation and quality of the work. This study was funded by King Abdullah University of Science and Technology (KAUST). The first author would also like to thank the IRIS/CIPR cooperative research project "Integrated Workflow and Realistic Geology" that is funded by industry partners ConocoPhillips, Eni, Petrobras, Statoil, and Total, as well as the Research Council of Norway (PETROMAKS), for partial financial support.

# APPENDIX

## Gain Matrices in the Divided Estimation Framework

In the divided estimation framework, the Kalman gain takes the standard form
$$\mathbf{K} = \mathbf{P}^{b}(\mathbf{H}^{h})^{T}\left[\mathbf{H}^{h}\mathbf{P}^{b}(\mathbf{H}^{h})^{T}+\mathbf{R}\right]^{-1}.$$
We first consider the component $\mathbf{P}^{b}(\mathbf{H}^{h})^{T}$, which reads as Eq. (A1). Next, we consider the component $[\mathbf{H}^{h}\mathbf{P}^{b}(\mathbf{H}^{h})^{T}+\mathbf{R}]^{-1}$, which can be expanded into Eq. (A2). Applying the matrix inversion lemma (Simon 2006, p. 11) to the right-hand side of Eq. (A2) yields Eq. (A3), with the quantities therein given by Eqs. (A4) and (A5). The equality between the second and third lines of Eq. (A5) follows from the Sherman–Morrison–Woodbury identity (Sherman and Morrison 1950), as stated in Eq. (A6). In the last line of Eq. (A5), $\mathbf{T}_{\xi}$ is a square root of $\mathbf{P}_{\xi}$ (Bishop et al. 2001). Similarly, we have Eq. (A7), with $\mathbf{T}_{\eta}$ being a square root of $\mathbf{P}_{\eta}$.

## REFERENCES

Altaf, U. M., T. Butler, X. Luo, C. Dawson, T. Mayo, and H. Hoteit, 2013: Improving short-range ensemble Kalman storm surge forecasting using robust adaptive inflation. *Mon. Wea. Rev.*, **141**, 2705–2720, doi:10.1175/MWR-D-12-00310.1.

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.

Anderson, J. L., 2003: A local least squares framework for ensemble filtering. *Mon. Wea. Rev.*, **131**, 634–642, doi:10.1175/1520-0493(2003)131<0634:ALLSFF>2.0.CO;2.

Anderson, J. L., 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. *Tellus*, **59A**, 210–224, doi:10.1111/j.1600-0870.2006.00216.x.

Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. *Tellus*, **61A**, 72–83, doi:10.1111/j.1600-0870.2008.00361.x.

Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. *Mon. Wea. Rev.*, **127**, 2741–2758, doi:10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.

Bishop, C. H., and D. Hodyss, 2007: Flow-adaptive moderation of spurious ensemble correlations and its use in ensemble-based data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 2029–2044, doi:10.1002/qj.169.

Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. *Tellus*, **61A**, 84–96, doi:10.1111/j.1600-0870.2008.00371.x.

Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. *Tellus*, **61A**, 97–111, doi:10.1111/j.1600-0870.2008.00372.x.

Bishop, C. H., and D. Hodyss, 2011: Adaptive ensemble covariance localization in ensemble 4D-VAR state estimation. *Mon. Wea. Rev.*, **139**, 1241–1255, doi:10.1175/2010MWR3403.1.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436, doi:10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.

Bishop, C. H., and Coauthors, 2012: Joint GODAE OceanView—WGNE workshop on short- to medium-range coupled prediction for the atmosphere-wave-sea-ice-ocean: Status, needs and challenges: Data assimilation—Whitepaper. GODAE OceanView Rep., 14 pp. [Available online at https://www.godae-oceanview.org/outreach/meetings-workshops/task-team-meetings/coupled-prediction-workshop-gov-wgne-2013/white-papers/.]

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: On the analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724, doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.

Clayton, A., A. Lorenc, and D. Barker, 2013: Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. *Quart. J. Roy. Meteor. Soc.*, **139**, 1445–1461, doi:10.1002/qj.2054.

Cooper, M., and K. Haines, 1996: Altimetric assimilation with water property conservation. *J. Geophys. Res.*, **101**, 1059–1077, doi:10.1029/95JC02902.

Counillon, F., and L. Bertino, 2009: Ensemble optimal interpolation: Multivariate properties in the Gulf of Mexico. *Tellus*, **61A**, 296–308, doi:10.1111/j.1600-0870.2008.00383.x.

Dawson, C., S. Sun, and M. F. Wheeler, 2004: Compatible algorithms for coupled flow and transport. *Comput. Methods Appl. Mech. Eng.*, **193**, 2565–2580, doi:10.1016/j.cma.2003.12.059.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10 143–10 162, doi:10.1029/94JC00572.

Fehlberg, E., 1970: Classical fourth and lower order Runge-Kutta formulas with stepsize control and their application to heat transfer problems (in German). *Computing*, **6**, 61–71, doi:10.1007/BF02241732.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear and non-Gaussian Bayesian state estimation. *IEE Proc. F Radar Signal Process.*, **140**, 107–113, doi:10.1049/ip-f-2.1993.0015.

Hamill, T. M., and J. S. Whitaker, 2011: What constrains spread growth in forecasts initialized from ensemble Kalman filters? *Mon. Wea. Rev.*, **139**, 117–131, doi:10.1175/2010MWR3246.1.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790, doi:10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

Harlim, J., A. Mahdi, and A. J. Majda, 2014: An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models. *J. Comput. Phys.*, **257**, 782–812, doi:10.1016/j.jcp.2013.10.025.

Hoteit, I., and D. T. Pham, 2004: An adaptively reduced-order extended Kalman filter for data assimilation in the tropical Pacific. *J. Mar. Syst.*, **45**, 173–188, doi:10.1016/j.jmarsys.2003.11.004.

Hoteit, I., D. T. Pham, and J. Blum, 2002: A simplified reduced order Kalman filtering and application to altimetric data assimilation in tropical Pacific. *J. Mar. Syst.*, **36**, 101–127, doi:10.1016/S0924-7963(02)00129-X.

Hoteit, I., X. Luo, and D. T. Pham, 2012: Particle Kalman filtering: An optimal nonlinear framework for ensemble Kalman filters. *Mon. Wea. Rev.*, **140**, 528–542, doi:10.1175/2011MWR3640.1.

Jazwinski, A. H., 1970: *Stochastic Processes and Filtering Theory.* Academic Press, 400 pp.

Kuhl, D. D., T. E. Rosmond, C. H. Bishop, J. McLay, and N. L. Baker, 2013: Comparison of hybrid ensemble/4DVar and 4DVar within the NAVDAS-AR data assimilation framework. *Mon. Wea. Rev.*, **141**, 2740–2758, doi:10.1175/MWR-D-12-00182.1.

Livings, D. M., S. L. Dance, and N. K. Nichols, 2008: Unbiased ensemble square root filters. *Physica D*, **237**, 1021–1028, doi:10.1016/j.physd.2008.01.005.

Lorenz, E. N., 2006: Predictability—A problem partly solved. *Predictability of Weather and Climate,* T. Palmer and R. Hagedorn, Eds., Cambridge University Press, 40–58.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. *J. Atmos. Sci.*, **55**, 399–414, doi:10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

Luo, X., and I. M. Moroz, 2009: Ensemble Kalman filter with the unscented transform. *Physica D*, **238**, 549–562, doi:10.1016/j.physd.2008.12.003.

Luo, X., and I. Hoteit, 2011: Robust ensemble filtering and its relation to covariance inflation in the ensemble Kalman filter. *Mon. Wea. Rev.*, **139**, 3938–3953, doi:10.1175/MWR-D-10-05068.1.

Luo, X., and I. Hoteit, 2012: Ensemble Kalman filtering with residual nudging. *Tellus*, **64A**, 17 130, doi:10.3402/tellusa.v64i0.17130.

Luo, X., and I. Hoteit, 2013: Covariance inflation in the ensemble Kalman filter: A residual nudging perspective and some implications. *Mon. Wea. Rev.*, **141**, 3360–3368, doi:10.1175/MWR-D-13-00067.1.

Luo, X., and I. Hoteit, 2014a: Ensemble Kalman filtering with residual nudging: An extension to state estimation problems with nonlinear observation operators. *Mon. Wea. Rev.*, **142**, 3696–3712, doi:10.1175/MWR-D-13-00328.1.

Luo, X., and I. Hoteit, 2014b: Efficient particle filtering through residual nudging. *Quart. J. Roy. Meteor. Soc.*, **140**, 557–572, doi:10.1002/qj.2152.

Luo, X., I. M. Moroz, and I. Hoteit, 2010: Scaled unscented transform Gaussian sum filter: Theory and application. *Physica D*, **239**, 684–701, doi:10.1016/j.physd.2010.01.022.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. *Tellus*, **56A**, 415–428, doi:10.1111/j.1600-0870.2004.00076.x.

Penny, S. G., 2014: The hybrid local ensemble transform Kalman filter. *Mon. Wea. Rev.*, **142**, 2139–2149, doi:10.1175/MWR-D-13-00131.1.

Petihakis, G., G. Triantafyllou, K. Tsiaras, G. Korres, A. Pollani, and I. Hoteit, 2009: Eastern Mediterranean biogeochemical flux model—Simulations of the pelagic ecosystem. *Ocean Sci.*, **5**, 29–46, doi:10.5194/os-5-29-2009.

Posselt, D. J., and C. H. Bishop, 2012: Nonlinear parameter estimation: Comparison of an ensemble Kalman smoother with a Markov chain Monte Carlo algorithm. *Mon. Wea. Rev.*, **140**, 1957–1974, doi:10.1175/MWR-D-11-00242.1.

Russell, G., J. Miller, and D. Rind, 1995: A coupled atmosphere-ocean model for transient climate change studies. *Atmos.–Ocean*, **33**, 683–730, doi:10.1080/07055900.1995.9649550.

Saha, S., and Coauthors, 2014: The NCEP Climate Forecast System version 2. *J. Climate*, **27**, 2185–2208, doi:10.1175/JCLI-D-12-00823.1.

Sherman, J., and W. J. Morrison, 1950: Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. *Ann. Math. Stat.*, **21**, 124–127, doi:10.1214/aoms/1177729893.

Simon, D., 2006: *Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches.* Wiley-Interscience, 552 pp.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. *Mon. Wea. Rev.*, **131**, 1485–1490, doi:10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2.

Triantafyllou, G., I. Hoteit, X. Luo, K. Tsiaras, and G. Petihakis, 2013: Assessing a robust ensemble-based Kalman filter for efficient ecosystem data assimilation of the Cretan Sea. *J. Mar. Syst.*, **125**, 90–100, doi:10.1016/j.jmarsys.2012.12.006.

Van Leeuwen, P. J., 2009: Particle filtering in geophysical systems. *Mon. Wea. Rev.*, **137**, 4089–4114, doi:10.1175/2009MWR2835.1.

Wang, X., C. H. Bishop, and S. J. Julier, 2004: Which is better, an ensemble of positive–negative pairs or a centered simplex ensemble? *Mon. Wea. Rev.*, **132**, 1590–1605, doi:10.1175/1520-0493(2004)132<1590:WIBAEO>2.0.CO;2.

Wang, X., T. M. Hamill, J. S. Whitaker, and C. H. Bishop, 2007: A comparison of hybrid ensemble transform Kalman filter–optimum interpolation and ensemble square root filter analysis schemes. *Mon. Wea. Rev.*, **135**, 1055–1076, doi:10.1175/MWR3307.1.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924, doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.

Zhang, S., M. Harrison, A. Rosati, and A. Wittenberg, 2007: System design and evaluation of coupled ensemble data assimilation for global oceanic climate studies. *Mon. Wea. Rev.*, **135**, 3541–3564, doi:10.1175/MWR3466.1.

Zupanski, M., 2005: Maximum likelihood ensemble filter: Theoretical aspects. *Mon. Wea. Rev.*, **133**, 1710–1726, doi:10.1175/MWR2946.1.

^{1} In cases of nonlinear observation operators, one may either approximate them by some linear ones or adopt more sophisticated assimilation schemes (see, e.g., Hoteit et al. 2012; Luo et al. 2010; Luo and Hoteit 2014a; Van Leeuwen 2009; Zupanski 2005).

^{2} When covariance localization is excluded, inflation may improve the filters’ performance (results not shown).

^{3} In the context of meteorological applications, the extension described here mainly targets short-term (e.g., subseasonal) time scales, while for seasonal or longer time scale applications (e.g., climate studies), the small-variation assumption (e.g., in the ocean component) may not be valid.