## 1. Introduction

To simulate real weather events with an acceptable degree of accuracy, numerical weather prediction models rely on carefully specified initial conditions that consider all past and present information leading up to the initialization time. Data assimilation provides a means of initializing models with the most probable state, given prior statistics about the model and imperfect observations. For high-dimensional systems, like atmospheric models, these methods typically assume Gaussian errors for the observations and prior probability density, and solve a linear system of equations for the posterior mean and covariance. Data assimilation methods can take the form of filters, which consider information up to the time of the analysis, or smoothers, which consider both past and future information relative to the analysis time. The most commonly used filters in atmospheric science are the ensemble Kalman filter (EnKF; Evensen 1994; Houtekamer and Mitchell 1998) and 3D variational data assimilation methods (3DVar; Le Dimet and Talagrand 1986), and the most commonly used smoother is the 4D variational method (4DVar; Thépaut and Courtier 1991). Ensemble-based approaches, like the EnKF, provide a sample representation of the posterior error covariance after each data assimilation cycle, which can be used for generating probabilistic forecasts and prior error statistics for future data assimilation cycles. On the other hand, variational methods provide an efficient means of assimilating large numbers of observations; the solution, however, is purely deterministic in that posterior covariances are not estimated. The framework for variational systems already exists for most models used operationally and for research; examples include thoroughly tested data processing and quality control procedures, efficient preconditioning methods, and physical balance constraints for the solution.

Another benefit of variational data assimilation is the straightforward implementation of a hybrid prior error covariance. Hybrid data assimilation methods combine flow-dependent, ensemble covariance with the stationary, climate-based covariances that are typically used in 3DVar and 4DVar. Hamill and Snyder (2000) found that mixing ensemble and static covariance produces better results than using the two components separately during data assimilation: the ensemble provides nonstationary error statistics, while the static covariance removes some of the filter sensitivity to ensemble size. Ensemble filters that use hybrid covariance are also less sensitive to model error, which introduces additional bias in the ensemble estimate (Etherton and Bishop 2004). Using a low-dimensional model, Bishop and Satterfield (2013) provide a mathematical justification for hybrid data assimilation by showing that a linear combination of ensemble and static covariance provides the optimal prior error covariance, given a distribution of ensemble variances and the likelihood of variances given the true conditional error variance.

Among the various types of hybrid data assimilation methods, recent studies find the best performance, in terms of analysis and forecast error reduction, from systems that incorporate ensemble information in variational smoothers (Tian et al. 2008, 2009; Zhang et al. 2009; Buehner et al. 2010b; Bishop and Hodyss 2011; Zhang and Zhang 2012; Zhang et al. 2013; Clayton et al. 2013; Kuhl et al. 2013; Poterjoy and Zhang 2014; Fairbairn et al. 2014; Wang and Lei 2014). One strategy is to introduce ensemble perturbations into pre-existing 4DVar systems, which will be denoted ensemble-4DVar (E4DVar) in this manuscript. A second strategy is to use a 4D ensemble to replace the tangent linear and adjoint model operators in 4DVar (Liu et al. 2008), which will be denoted 4D-ensemble-Var (4DEnVar). 4DEnVar requires the additional cost of running an ensemble forecast through the time window from which observations are collected, but avoids the coding and maintenance of a tangent linear model. More importantly, 4DEnVar does not require running the tangent linear model and its adjoint during each iteration of the cost function minimization, thus increasing the parallel capability of this algorithm over E4DVar. In this study, both E4DVar and 4DEnVar are implemented using the approach introduced in Zhang et al. (2009): from an ensemble forecast, the analysis state is found by minimizing the hybrid variational cost function, while an EnKF updates the ensemble perturbations around the hybrid solution.

Buehner et al. (2010a,b) provided the first comparison of E4DVar and 4DEnVar with other commonly used data assimilation methods for operational global models. Using a low-dimensional model, Fairbairn et al. (2014) performed a more systematic comparison of these methods, and investigated their sensitivity to observation density, ensemble size, model errors, and other system parameters. Both Buehner et al. and Fairbairn et al. applied a one-way coupling strategy in which ensemble perturbations are introduced into the variational system from a cycling EnKF, but the resulting variational solution is not fed back into the EnKF component. This approach facilitates a comparison of the two data assimilation methods using identical ensemble covariance, but does not allow the variational solution to affect the ensemble forecasts between cycles. In both of these studies, the authors applied either an ensemble-estimated covariance or a static covariance (no hybrid covariance) during data assimilation, and found E4DVar to provide more accurate results than 4DEnVar. Similar to Fairbairn et al., the current study applies a low-dimensional model to examine several aspects of E4DVar and 4DEnVar data assimilation systems. This study differs from previous studies that directly compare E4DVar and 4DEnVar in that it focuses primarily on hybrid implementations of the two systems, and uses a two-way coupling between EnKF and variational components, thus allowing E4DVar and 4DEnVar to evolve different ensemble forecast error covariance over time.

Hybrid data assimilation systems have known benefits for atmospheric models, but contain a number of user-specified parameters that must be tuned for optimal performance. These parameters include ensemble-specific values, such as ensemble size, covariance localization radius of influence, and covariance inflation coefficients, which are required for standard implementations of ensemble filters and smoothers. They also require the specification of a weighting coefficient for determining the amount of ensemble and stationary covariance used for the prior error statistics. To investigate the sensitivity of E4DVar and 4DEnVar to these tuning parameters, the two methods are compared over a range of configurations using the dynamical system introduced in Lorenz (1996, hereafter L96). The L96 model mimics the behavior of an atmospheric quantity over a latitudinal circle, but has a state dimension several orders of magnitude smaller than a typical atmospheric model, which allows for large numbers of assimilation experiments to be performed. Three main goals of these experiments are as follows: 1) investigate the role of the stationary covariance in hybrid data assimilation methods in the presence of preexisting ensemble covariance tuning parameters, 2) compare E4DVar with 4DEnVar systematically over a variety of data assimilation scenarios, and 3) examine the challenges that need to be overcome before replacing linearized models in 4DVar with a 4D ensemble.

The organization of this manuscript is as follows. We introduce the two data assimilation methods and forecast model in section 2. In section 3 we describe the implicit role of 4D covariance in variational smoothers, and show that differences in E4DVar and 4DEnVar estimations of time covariances determine their relative performance during data assimilation. Section 4 describes results from cycling data assimilation experiments, and section 5 provides a summary and conclusions.

## 2. Hybrid four-dimensional data assimilation

In this section, we present the mathematical framework for hybrid E4DVar and 4DEnVar, along with the dynamical system adopted for applying the two methods. We use lowercase boldface roman font to indicate vectors, uppercase boldface arial font to indicate matrices, and italic font to indicate scalars and nonlinear operators.

### a. Hybrid E4DVar

The weighting coefficient *β* can range from 0 to 1 and controls the impact of the two increments on the final analysis. This form of hybrid data assimilation is equivalent to replacing the background error covariance in (1) with

(Wang et al. 2007).

*n*th ensemble perturbation vector in the diagonal elements, and 0 in the off-diagonal elements. Equation (5) can then be written as
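As a concrete sketch of the hybrid blending described above (Python, with illustrative names; the paper expresses *β* through the variational cost function, but the equivalent prior covariance is the weighted sum formed below):

```python
import numpy as np

def hybrid_covariance(ens_perts, B_static, beta):
    """Blend a sample ensemble covariance with a static covariance.

    ens_perts: (n_state, n_ens) ensemble perturbations (mean removed)
    B_static:  (n_state, n_state) climatological covariance
    beta:      weight on the static component, 0 <= beta <= 1
    """
    n_ens = ens_perts.shape[1]
    P_ens = ens_perts @ ens_perts.T / (n_ens - 1)  # sample covariance
    return (1.0 - beta) * P_ens + beta * B_static
```

Setting *β* = 0 recovers the pure ensemble covariance and *β* = 1 the pure static covariance, matching the limits discussed in the text.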

### b. Hybrid 4DEnVar

*t*:

Here,

*n*th ensemble perturbation from time 0 to

*t* using the full nonlinear model. 4DEnVar requires a localization of the ensemble covariance at each observation time in the window, as opposed to E4DVar, which propagates a localized covariance forward implicitly using the tangent linear and adjoint models. The localization of time-dependent covariance introduces additional complexity in the 4DEnVar algorithm. To perform the localization, we adopt the strategy used in most previous studies, which is to use the same correlation matrix at each time (Liu et al. 2009; Buehner et al. 2010b; Liu and Xiao 2013; Fairbairn et al. 2014).
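The time-invariant localization strategy can be sketched as follows (Python; the function names and the choice of a Gaspari–Cohn taper are illustrative assumptions, since the correlation function is not specified in this excerpt). One correlation matrix is built for the cyclic domain and applied unchanged, via a Schur product, at every time in the window:

```python
import numpy as np

def gaspari_cohn(z):
    """Gaspari and Cohn (1999) fifth-order compactly supported taper;
    z is separation distance divided by the localization half-width."""
    z = abs(z)
    if z <= 1.0:
        return 1.0 - (5.0/3.0)*z**2 + (5.0/8.0)*z**3 + 0.5*z**4 - 0.25*z**5
    if z < 2.0:
        return (4.0 - 5.0*z + (5.0/3.0)*z**2 + (5.0/8.0)*z**3
                - 0.5*z**4 + (1.0/12.0)*z**5 - 2.0/(3.0*z))
    return 0.0

def ring_localization(n, half_width):
    """Correlation matrix for a cyclic domain of n grid points."""
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = min(abs(i - j), n - abs(i - j))  # cyclic distance
            C[i, j] = gaspari_cohn(d / half_width)
    return C

def localize_4d(P_by_time, C):
    """Apply the same Schur (elementwise) product localization at every
    observation time in the window, as in the studies cited above."""
    return [C * P for P in P_by_time]
```

Because the same matrix C is reused at all times, correlations that propagate with the flow during the window are damped as if they were stationary, which is the source of the 4DEnVar localization errors discussed in section 3.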

### c. Coupled EnKF–variational data assimilation

The hybrid data assimilation methods discussed above use ensemble forecasts to provide flow-dependent statistics for the background state. In this study, we estimate the posterior mean using the E4DVar or 4DEnVar analysis, and adjust the ensemble perturbations at the middle of the assimilation window using an EnKF (Zhang et al. 2009). In doing so, the variational component assimilates observations at their correct times from

### d. Treatment of sampling errors

*α* in (12) is called the “relaxation coefficient” and ranges from 0 to 1, where
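A minimal sketch of this relaxation, assuming the relaxation-to-prior-perturbations form of Zhang et al. (2004) that is commonly paired with this EnKF coupling (the exact form of the paper's Eq. (12) is not reproduced in this excerpt):

```python
import numpy as np

def relax_to_prior(prior_perts, post_perts, alpha):
    """Relax posterior ensemble perturbations back toward the prior ones.

    alpha = 0 keeps the pure EnKF update; alpha = 1 restores the full
    prior spread. Intermediate values inflate the posterior spread to
    compensate for sampling errors in the ensemble update.
    """
    return (1.0 - alpha) * post_perts + alpha * prior_perts
```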

### e. Forecast model

This dynamical system contains a quadratic advection term, a linear dissipation term, and a constant damping term. It exhibits chaotic behavior for

*F* to vary when investigating the sensitivity of E4DVar and 4DEnVar to model error and error growth rate. To distinguish between the

*F* used by the true dynamical system, and the

*F* used by the model in imperfect model experiments, we use

*F*, respectively.
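The L96 dynamics described above can be sketched as follows (Python; the fourth-order Runge–Kutta integrator is an assumption, as the time-stepping scheme is not stated in this excerpt):

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz (1996) tendency on a cyclic domain: quadratic advection,
    linear dissipation, and constant forcing F.
    dx_j/dt = (x_{j+1} - x_{j-2}) * x_{j-1} - x_j + F
    """
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """Advance the L96 state one step with fourth-order Runge-Kutta."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

Note that the uniform state x = F is a fixed point of these equations; chaotic behavior emerges once perturbations around it are amplified by the advection term.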

## 3. Four-dimensional covariance

In addition to being unable to evolve time covariance from

Mean squared error of covariance estimated from the E4DVar method (black) and 4DEnVar method (red) without localization (dashed lines) and with localization (solid lines).

Citation: Monthly Weather Review 143, 5; 10.1175/MWR-D-14-00224.1


The effects of the different localization strategies are illustrated in Fig. 3 by applying both methods to estimate covariances between all state variables in

Covariance between all 40 state variables at


For

Though not shown, the localization strategy used by E4DVar to estimate single columns of

The discussion in this section focuses mostly on the approximation of

## 4. Cycling data assimilation experiments

In this section, we present results from experiments that examine the sensitivity of E4DVar and 4DEnVar to a variety of modeling and observing system configurations, and tuning parameters that control the background error statistics. Each experiment uses observations generated from a “truth” simulation with added noise drawn from *κ* variables (grid points) and *κ* = 2 and

We perform separate sets of data assimilation experiments to examine the sensitivity of E4DVar and 4DEnVar to ensemble size, window length, observation network, model forcing, and model error; the system parameters for each of these experiments are listed in Table 1. Results from each 2500-day assimilation period are summarized by averaging the analysis RMSEs over all state variables and data assimilation cycles. Given the number of samples used to approximate these values, the largest margin of error for the RMSE calculations is found to be less than 0.01 with 95% confidence. For simplicity, RMSEs in each experiment are rounded to the nearest 0.001 units when the margin of error is smaller than 0.001, and 0.01 units when the margin of error is greater than 0.001.
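The error aggregation described above can be sketched as follows (Python, with illustrative names; it treats cycle-to-cycle RMSEs as approximately independent when forming the 95% margin of error, which is an assumption, since serial correlation between cycles would widen the true margin):

```python
import numpy as np

def mean_rmse_and_margin(analysis_errors):
    """Mean analysis RMSE over all cycles with a 95% margin of error.

    analysis_errors: (n_cycles, n_state) analysis-minus-truth differences.
    """
    # one RMSE per data assimilation cycle, averaged over state variables
    rmse = np.sqrt(np.mean(analysis_errors**2, axis=1))
    mean = rmse.mean()
    # normal-approximation 95% confidence half-width on the mean RMSE
    margin = 1.96 * rmse.std(ddof=1) / np.sqrt(rmse.size)
    return mean, margin
```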

Configuration of cycling data assimilation experiments.

The hybrid forms of E4DVar and 4DEnVar require two prior covariance matrices: *T*, and observation network, and use differences between the forecast and true system state to estimate covariances for the model state variables. These covariances are then averaged to form a covariance matrix that contains the same diagonal and off-diagonal elements for each variable; an example
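The averaging step can be sketched as follows (Python, with illustrative names): sample covariances from forecast-minus-truth differences are averaged over all grid points, so each row of the resulting matrix contains the same diagonal and off-diagonal elements, which yields a circulant matrix on the cyclic L96 domain:

```python
import numpy as np

def homogeneous_covariance(errors):
    """Static covariance from forecast-minus-truth samples, averaged over
    grid points so every variable shares the same diagonal and
    off-diagonal elements (circulant on the cyclic domain).

    errors: (n_samples, n_state) array of forecast errors.
    """
    n_state = errors.shape[1]
    P = np.cov(errors, rowvar=False)
    # mean covariance as a function of grid-point separation (lag) only
    mean_by_lag = np.array([
        np.mean([P[i, (i + lag) % n_state] for i in range(n_state)])
        for lag in range(n_state)
    ])
    return np.array([[mean_by_lag[(j - i) % n_state] for j in range(n_state)]
                     for i in range(n_state)])
```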

### a. Ensemble size

For large modeling applications, computing resources often limit the number of ensemble members used to estimate *α* to tune the imperfect error statistics from the ensembles.

Figure 4 shows mean RMSEs calculated for each ensemble size and configuration of *α* near the optimal values, which are indicated by the black box in each subplot. As expected, both methods become more stable and require less localization (larger *α*) as *α* for a given

Mean analysis RMSEs estimated for a range of *α* and ROI, and *α* and


A hybrid covariance is used to compensate for sampling errors in cases that use *α* and *β* values (Fig. 5). For *α* between Figs. 4a–d and 5a–d. The positive impact of using a hybrid covariance is less significant for *β* drops below 0.05 for the pair of data assimilation systems. The optimal *α* between Figs. 4e,f and 5e,f, remains fixed at 0.3, suggesting that covariance relaxation is more effective than the hybrid covariance at treating sampling errors when

Mean analysis RMSEs estimated for a range of *α* and *β*, with *β* changes between the second and third rows. Red shading indicates higher RMSEs, NA indicates that filter divergence occurs during the experiment, and the smallest errors are indicated by the black box.


The relationship between *β* and *β* = 0 in Fig. 4. The performance of both methods, however, depends less on the prescribed

Mean analysis RMSEs estimated for a range of *α* and *β* and


### b. Assimilation window length

This section examines the sensitivity of E4DVar and 4DEnVar to assimilation window length. The same observations are assimilated in all experiments, except *T* values of 0, 6, and 24 h are used. Smaller values of *T* require more frequent data assimilation cycles, with *T* = 0 h resulting in an ensemble filter step each time observations are available (*T* used in the experiments.

Figure 7 shows mean RMSEs from experiments that use *β* = 0, *α* values near the optimal pair of ensemble covariance parameters. The methods are equivalent when run in filtering mode (*T* = 0), and they both exhibit a decrease in RMSEs as *T* is increased to 24 h. Smoothers use increasingly more data to approximate the analysis state as window length is extended, which leads to a smaller sensitivity to random errors in the ensemble and observation network, and an overall reduction in analysis RMSEs. For the same reasoning, the longer window length also decreases the amount of relaxation needed to prevent filter divergence (Fig. 7). Though not shown, the performance of both methods continues to increase with *T* until reaching a point where nonlinearity in the model dynamics introduces errors in the solution. Figure 7 also verifies that differences in covariance localization cause E4DVar to outperform 4DEnVar as window length increases; the inability of 4DEnVar (with localization) to produce analyses as accurate as E4DVar for large *T* is consistent with the error estimation of

Using an ensemble size of 10, mean analysis RMSEs are estimated for a range of *α* and *T* = 0, 6, and 24 h; the range of *α* values changes between each row. Red shading indicates higher RMSEs, NA indicates that filter divergence occurs during the experiment, and the smallest errors are indicated by the black box.


We introduce hybrid covariance in the cost function to examine how *T* affects the optimal weight of static covariance for *β* decreases with *T* in a manner similar to *α* (Fig. 7). This result provides additional evidence that both 4D data assimilation methods are less sensitive to sampling noise as assimilation window length increases. Furthermore, the inability of 4DEnVar to propagate

Mean analysis RMSEs estimated for a range of *α* and *β*, and *T* = 0, 6, and 24 h; the range of *β* values changes between each row. Ensemble size and


### c. Observation network

We examine the sensitivity of the data assimilation methods to spatial and temporal observation density by changing the observation parameters *κ* and *κ* from 2 to 1 (Figs. 9a,b), which doubles the spatial density of the observations. The additional observations reduce the differences in analysis accuracy between E4DVar and 4DEnVar, and allow both methods to become stable over a larger number of ensemble parameters (cf. Figs. 4c,d). Because the system has full observability in this experiment, the ensemble members remain close to the true solution between data assimilation cycles, which reduces the sensitivity of analysis RMSEs to covariance localization and inflation.
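A *κ*-thinned observation network corresponds to a simple linear selection operator (a Python sketch with illustrative names, assuming direct observations of grid-point values as in these experiments):

```python
import numpy as np

def obs_operator(n_state, kappa):
    """Linear operator observing every kappa-th grid point; kappa = 1
    observes all variables, kappa = 2 every other one, and so on."""
    obs_idx = np.arange(0, n_state, kappa)
    H = np.zeros((obs_idx.size, n_state))
    H[np.arange(obs_idx.size), obs_idx] = 1.0
    return H
```

Halving *κ* doubles the number of rows of H, i.e., the spatial density of the observation network.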

Using an ensemble size of 20 and *α* and *α* and


We then make the observation network sparse by increasing *κ* to 3 (Figs. 9c,d). With *κ* fixed at 3, we also decrease the temporal frequency of observations by increasing *α* for estimating background error covariance given limited information.

We perform a second set of experiments to examine the impact of using a hybrid covariance to improve the background error estimation for sparse observation networks. Figure 10 shows the sensitivity of E4DVar and 4DEnVar to parameters *α* and *β* when data are assimilated from the three different observation networks using a fixed *β* is found near 0.02, which amounts to only a small decrease in E4DVar and 4DEnVar analysis RMSEs compared to the case in which no static covariance is used. The optimal weight of the static covariance, however, increases when the observation network is made more sparse (Figs. 10c–f), which is indicative of less accurate ensemble covariance when fewer observations are available to constrain the solution. While E4DVar continues to produce smaller analysis RMSEs than 4DEnVar, we find that the two methods benefit equally from the use of a hybrid covariance in these experiments.

Using an ensemble size of 20 and *α* and *β* for experiments assimilating observations (a),(b) of every variable (*β* and *α* values change between the first and second rows. Red shading indicates higher RMSEs, NA indicates that filter divergence occurs during the experiment, and the smallest errors are indicated by the black box.


### d. Model forcing

In section 3, 4D covariances modeled using ensemble and tangent-linear approaches show that the accuracy of *F* can control the rate at which small initial-condition errors grow in time between *F* is increased from 8 to 10 decreases from 2.1 to 1.5 days (Lorenz and Emanuel 1998). Likewise, the model forcing term is expected to influence the period over which time covariance can be estimated effectively by the two data assimilation methods.

Using a perfect model, we perform data assimilation experiments with *F* set to 9, 10, and 11 to test E4DVar and 4DEnVar on dynamical systems that are intrinsically less predictable. These experiments are similar to those presented in Figs. 7e and 7f in that we use *F* leads to larger analysis RMSEs in both data assimilation systems. The faster error growth also causes both methods to become increasingly less stable for large *α*; significant posterior variance inflation is no longer needed to increase the prior variance for future data assimilation cycles. As hypothesized, the 4DEnVar analyses suffer a larger drop in accuracy, owing to faster error growth in the assimilation window.

Using an ensemble size of 10 and *α* and


### e. Model error

In this section, we test E4DVar and 4DEnVar with an imperfect forecast model by keeping the true forcing term fixed at *α* for tuning the ensemble covariance without hybrid covariance. Both methods produce the lowest analysis RMSEs when a relatively small *α* of 0.6. Model error reduces the accuracy of 4D covariances approximated in the assimilation window, and covariance localization acts as the only mechanism for improving its estimation. The issues discussed in section 3 regarding time localization in 4DEnVar become more severe, owing to the use of a shorter

(a),(b) Mean analysis RMSEs are estimated for a range of *α* and *T* = 24 h, and an incorrect model forcing of *α* and *β*. Hybrid covariance is used in the standard cost function form in (c) and (d) and with ensemble perturbations in (e) and (f). Red shading indicates higher RMSEs, NA indicates that filter divergence occurs during the experiment, and the smallest errors are indicated by the black box.


When a hybrid covariance is introduced to treat model error during data assimilation, E4DVar and 4DEnVar respond very differently than in experiments where sampling errors related to ensemble size are most dominant (see section 4a). Increasing *β* from 0.0 to 0.1 in the E4DVar experiment leads to a 30% reduction in analysis RMSEs (Fig. 12c), and removes most of the sensitivity to *α*. The hybrid covariance in E4DVar serves a similar purpose as additive inflation for EnKFs; *α* for optimal performance, which suggests that the data assimilation benefits less from

*α*. Nevertheless, E4DVar performs worse with the hybrid perturbations, owing to the fact that

### f. Choice of analysis

*α* and

*β* values and a 24-h assimilation window. We apply E4DVar and 4DEnVar using the analyses defined by (20) and (21), respectively, and compare the results with experiments using (19). Mean analysis errors are plotted in Fig. 13 using “

Mean analysis RMSEs estimated for a range of *α* and *β*, and *T* = 24 h using (a),(b) analysis increments propagated forward with the nonlinear model from


## 5. Conclusions

Using a low-dimensional dynamical system, this study investigates the behavior of two hybrid 4D data assimilation systems currently being tested for operational use and for research applications in atmospheric science. One method, called E4DVar, is an extension of the traditional 4DVar method, and uses tangent linear and adjoint models to implicitly evolve a hybrid prior error covariance in the assimilation window. A second method, called 4DEnVar, uses a four-dimensional ensemble in place of the linearized models in 4DVar. Because of the absence of tangent linear and adjoint models in 4DEnVar, the climatological component of the background error covariance matrix is fixed in time; therefore, the 4DEnVar method in this study operates as a hybrid between 4DEnVar and 3DVar-FGAT.

A fundamental difference between E4DVar and 4DEnVar is how temporal covariances are estimated in the assimilation window. 4DEnVar uses an ensemble forecast from the full nonlinear model to estimate the time evolution of prior error covariance. This approach is beneficial for situations where the tangent linear approximation is invalid, as is the case for physical parameterization schemes used frequently in atmospheric models. It also provides a means of performing 4D data assimilation in a manner that is highly parallel, unlike E4DVar, which requires an integration of the tangent linear and adjoint models during each iteration of the cost function minimization. Nevertheless, the ensemble representation of time covariances in 4DEnVar has disadvantages when a hybrid covariance is applied, or when localization is needed to treat sampling errors.

The L96 model is used to examine the performance of E4DVar and 4DEnVar under a number of model, data assimilation, and observing system configurations. Experiments performed using ensemble covariance during data assimilation yield results that agree with the findings of Buehner et al. (2010b) and Fairbairn et al. (2014) in that E4DVar consistently produces the lowest analysis errors. The two methods provide comparable analysis accuracy when the forecast model is perfect, and when dense observation networks and large ensembles are available; however, E4DVar performs significantly better for sparse observations, and in cases where small ensemble sizes and model error decrease the optimal localization length scale. E4DVar also produces significantly lower analysis errors than 4DEnVar when either the assimilation window length or model forcing term are increased; in both cases, the pure ensemble representation of covariances in the assimilation window is found to be less accurate than applying linearized models to evolve a localized covariance matrix in time.

The E4DVar method has additional advantages when a hybrid prior covariance is used in the cost function for imperfect model experiments. The ability of E4DVar to evolve time-dependent covariance from the climatological component in the assimilation window has the same effect as applying additive inflation in EnKFs, except without the extra sampling errors. We find that 4DEnVar benefits more from using hybrid perturbations—comprising samples from the ensemble forecast and climatological error distribution—rather than using the full climatological covariance matrix in the cost function. Hybrid perturbations allow the ensemble to evolve errors from the climatological covariance in time, but introduce additional sampling noise. The inability of 4DEnVar to evolve the full static covariance forward in time, however, only appears to be an issue when the forecast model is imperfect. In perfect model experiments, E4DVar and 4DEnVar benefit equally from using static covariance in the cost function to improve the prior error covariance estimated from small ensembles, or in cases where observations are sparse.

One drawback of E4DVar not addressed in this study is the necessary use of a simplified forecast model when coding the tangent linear and adjoint operators for weather applications. Data assimilation experiments performed at national forecast centers, such as Environment Canada (Buehner et al. 2010b) and the Met Office (Lorenc et al. 2014), suggest that the positive qualities of E4DVar outweigh the negative effects of simplified linear operators in global models. For high-resolution limited-area models, Gustafsson and Bojarova (2014) provide evidence that suggests the opposite to be true. The relative performance of E4DVar and 4DEnVar for convective-scale applications, where errors in physical parameterization schemes are nontrivial, remains an open question. Despite the potential errors from simplified linear models in E4DVar, this method has advantages over 4DEnVar when short localization length scales are required, when forecast errors grow quickly in time, and when a hybrid covariance is necessary to compensate for model errors. Furthermore, alternative approaches to the hybrid methods tested in this study may prove more effective for weather applications (e.g., replacing 3DVar-FGAT in 4DEnVar with 4DVar or E4DVar). These strategies would also combine the most computationally expensive components of each system and require the coding of tangent linear and adjoint models.

## Acknowledgments

This work was supported by the Office of Naval Research Grant N000140910526, the National Science Foundation Grant AGS-1305798, and NASA Grant NNX12AJ79G. The authors thank Steven Greybush from The Pennsylvania State University for many helpful discussions during the course of this work, and three anonymous reviewers for their insightful comments.

## REFERENCES

Bishop, C. H., and D. Hodyss, 2011: Adaptive ensemble covariance localization in ensemble 4D-VAR state estimation. *Mon. Wea. Rev.*, **139**, 1241–1255, doi:10.1175/2010MWR3403.1.

Bishop, C. H., and E. A. Satterfield, 2013: Hidden error variance theory. Part I: Exposition and analytic model. *Mon. Wea. Rev.*, **141**, 1454–1468, doi:10.1175/MWR-D-12-00118.1.

Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background-error covariances: Evaluation in a quasi-operational NWP setting. *Quart. J. Roy. Meteor. Soc.*, **131**, 1013–1043, doi:10.1256/qj.04.15.

Buehner, M., P. L. Houtekamer, C. Charette, H. Mitchell, and B. He, 2010a: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments. *Mon. Wea. Rev.*, **138**, 1550–1566, doi:10.1175/2009MWR3157.1.

Buehner, M., P. L. Houtekamer, C. Charette, H. Mitchell, and B. He, 2010b: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: One-month experiments with real observations. *Mon. Wea. Rev.*, **138**, 1567–1586, doi:10.1175/2009MWR3158.1.

Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2013: Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. *Quart. J. Roy. Meteor. Soc.*, **139**, 1445–1461, doi:10.1002/qj.2054.

Courtier, P., J.-N. Thepáut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var, using an incremental approach. *Quart. J. Roy. Meteor. Soc.*, **120**, 1367–1387, doi:10.1002/qj.49712051912.

Etherton, B. J., and C. H. Bishop, 2004: Resilience of hybrid ensemble/3DVAR analysis schemes to model error and ensemble covariance error. *Mon. Wea. Rev.*, **132**, 1065–1080, doi:10.1175/1520-0493(2004)132<1065:ROHDAS>2.0.CO;2.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10 143–10 162, doi:10.1029/94JC00572.

Fairbairn, D., S. R. Pring, A. C. Lorenc, and I. Roulstone, 2014: A comparison of 4DVar with ensemble data assimilation methods. *Quart. J. Roy. Meteor. Soc.*, **140**, 281–294, doi:10.1002/qj.2135.

Fisher, M., and E. Andersson, 2001: Developments in 4D-Var and Kalman filtering. ECMWF Tech. Memo. 347, ECMWF, 36 pp.

Gustafsson, N., and J. Bojarova, 2014: Four-dimensional ensemble variational (4D-En-Var) data assimilation for the HIgh Resolution Limited Area Model (HIRLAM). *Nonlinear Processes Geophys.*, **21**, 745–762, doi:10.5194/npg-21-745-2014.

Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter-3D variational analysis scheme. *Mon. Wea. Rev.*, **128**, 2905–2919, doi:10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.

Honda, Y., M. Nishijima, K. Koizumi, Y. Ohta, K. Tamiya, T. Kawabata, and T. Tsuyuki, 2005: A pre-operational variational data assimilation system for a non-hydrostatic model at the Japan Meteorological Agency: Formulation and preliminary results. *Quart. J. Roy. Meteor. Soc.*, **131**, 3465–3475, doi:10.1256/qj.05.132.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811, doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

Huang, X., and Coauthors, 2009: Four-dimensional variational data assimilation for WRF: Formulation and preliminary results. *Mon. Wea. Rev.*, **137**, 299–314, doi:10.1175/2008MWR2577.1.

Kuhl, D. D., T. E. Rosmond, C. H. Bishop, J. McLay, and N. L. Baker, 2013: Comparison of hybrid ensemble/4DVar and 4DVar within the NAVDAS-AR data assimilation framework. *Mon. Wea. Rev.*, **141**, 2740–2758, doi:10.1175/MWR-D-12-00182.1.

Le Dimet, F.-X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. *Tellus*, **38A**, 97–110, doi:10.1111/j.1600-0870.1986.tb00459.x.

Liu, C., and Q. Xiao, 2013: An ensemble-based four-dimensional variational data assimilation scheme. Part III: Antarctic applications with Advanced Research WRF using real data. *Mon. Wea. Rev.*, **141**, 2721–2739, doi:10.1175/MWR-D-12-00130.1.

Liu, C., Q. Xiao, and B. Wang, 2008: An ensemble-based four-dimensional variational data assimilation scheme. Part I: Technical formulation and preliminary test. *Mon. Wea. Rev.*, **136**, 3363–3373, doi:10.1175/2008MWR2312.1.

Liu, C., Q. Xiao, and B. Wang, 2009: An ensemble-based four-dimensional variational data assimilation scheme. Part II: Observing System Simulation Experiments with Advanced Research WRF (ARW). *Mon. Wea. Rev.*, **137**, 1687–1704, doi:10.1175/2008MWR2699.1.

Lorenc, A. C., 1997: Development of an operational variational assimilation scheme. *J. Meteor. Soc. Japan*, **75**, 339–346.

Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP: A comparison with 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **129**, 3183–3203, doi:10.1256/qj.02.132.

Lorenc, A. C., and Coauthors, 2000: The Met Office global three-dimensional variational data assimilation scheme. *Quart. J. Roy. Meteor. Soc.*, **126**, 2991–3012, doi:10.1002/qj.49712657002.

Lorenc, A. C., N. Bowler, A. Clayton, and S. Pring, 2014: Development of the Met Office's 4DEnVar system. *Proc. Sixth EnKF Workshop*, Buffalo, NY, Met Office. [Available online at hfip.psu.edu/fuz4/EnKF2014/EnKF-Day1/Lorenc_4DEnVar.pptx.]

Lorenz, E. N., 1996: Predictability: A problem partly solved. *Proc. Seminar on Predictability*, Vol. 1, Reading, United Kingdom, ECMWF, 1–18.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. *J. Atmos. Sci.*, **55**, 399–414, doi:10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

Poterjoy, J., and F. Zhang, 2014: Intercomparison and coupling of ensemble and four-dimensional variational data assimilation methods for the analysis and forecasting of Hurricane Karl (2010). *Mon. Wea. Rev.*, **142**, 3347–3364, doi:10.1175/MWR-D-13-00394.1.

Sun, J., and N. A. Crook, 1997: Dynamical and microphysical retrieval from Doppler radar observations using a cloud model and its adjoint. Part I: Model development and simulated data experiments. *J. Atmos. Sci.*, **54**, 1642–1661, doi:10.1175/1520-0469(1997)054<1642:DAMRFD>2.0.CO;2.

Thepáut, J.-N., and P. Courtier, 1991: Four-dimensional variational data assimilation using the adjoint of a multilevel primitive-equation model. *Quart. J. Roy. Meteor. Soc.*, **117**, 1225–1254, doi:10.1002/qj.49711750206.

Thepáut, J.-N., P. Courtier, G. Belaud, and G. Lamaître, 1996: Dynamical structure functions in a four-dimensional variational assimilation: A case study. *Quart. J. Roy. Meteor. Soc.*, **122**, 535–561, doi:10.1002/qj.49712253012.

Tian, X., Z. Xie, and A. Dai, 2008: An ensemble-based explicit four-dimensional variational assimilation method. *J. Geophys. Res.*, **113**, D21124, doi:10.1029/2008JD010358.

Tian, X., Z. Xie, A. Dai, C. Shi, B. Jia, F. Chen, and K. Yang, 2009: A dual-pass variational data assimilation framework for estimating soil moisture profiles from AMSR-E microwave brightness temperature. *J. Geophys. Res.*, **114**, D16102, doi:10.1029/2008JD011600.

Wang, X., and T. Lei, 2014: GSI-based four-dimensional ensemble-variational (4DEnsVar) data assimilation: Formulation and single-resolution experiments with real data for NCEP Global Forecast System. *Mon. Wea. Rev.*, **142**, 3303–3325, doi:10.1175/MWR-D-13-00303.1.

Wang, X., C. Snyder, and T. M. Hamill, 2007: On the theoretical equivalence of differently proposed ensemble-3DVar hybrid analysis schemes. *Mon. Wea. Rev.*, **135**, 222–227, doi:10.1175/MWR3282.1.

Zhang, F., C. Snyder, and J. Sun, 2004: Impacts of initial estimate and observation availability on convective-scale data assimilation with an ensemble Kalman filter. *Mon. Wea. Rev.*, **132**, 1238–1253, doi:10.1175/1520-0493(2004)132<1238:IOIEAO>2.0.CO;2.

Zhang, F., M. Zhang, and J. A. Hansen, 2009: Coupling ensemble Kalman filter with four-dimensional variational data assimilation. *Adv. Atmos. Sci.*, **26**, 1–8, doi:10.1007/s00376-009-0001-8.

Zhang, F., M. Zhang, and J. Poterjoy, 2013: E3DVar: Coupling an ensemble Kalman filter with three-dimensional variational data assimilation in a limited-area weather prediction model and comparison to E4DVar. *Mon. Wea. Rev.*, **141**, 900–917, doi:10.1175/MWR-D-12-00075.1.

Zhang, M., and F. Zhang, 2012: E4DVar: Coupling an ensemble Kalman filter with four-dimensional data assimilation in a limited-area weather prediction model. *Mon. Wea. Rev.*, **140**, 587–600, doi:10.1175/MWR-D-11-00023.1.

^{1} The standard 4DVar uses

^{2} Another option is to apply the full nonlinear observation operator on the forecast members and mean before calculating