## 1. Introduction

For more than a decade four-dimensional variational data assimilation (4DVar) has been used by most of the main global numerical weather prediction (NWP) centers (Rabier 2005); the Met Office implemented 4DVar in 2004 (Rawlins et al. 2007). A weakness of the basic 4DVar method is the use of a fixed “climatological” model of the error covariance in the background forecast, which does not describe the flow-dependent errors of the day as well as ensemble Kalman filter methods (Lorenc 2003b). To address this Clayton et al. (2013) implemented a hybrid-4DVar method, where the term “hybrid” refers to a combination of a climatological covariance model with covariances calculated from an ensemble of forecasts, designed to sample the current uncertainty.

Another weakness of 4DVar is its cost, both in the time needed to run sequential iterations of linear and adjoint models for a high-resolution NWP system on a massively parallel computer, and in the effort needed to maintain and develop these models as the NWP system evolves because of advances in science and computing power (Kalnay et al. 2007). These costs can be avoided by using the ensemble trajectories (i.e., the time evolution of each member) to directly estimate a four-dimensional background error covariance. Pioneering attempts such as Tian et al. (2008) did not include localization so were not directly applicable to large NWP systems and relatively small ensembles. With localization the approach has come to be called four-dimensional (4D) ensemble-variational data assimilation (4DEnVar; Lorenc 2013); it was introduced by Liu et al. (2008, 2009) and tested in an NWP model by Buehner et al. (2010a,b). We have coded 4DEnVar within the Met Office global data assimilation software system. Because the operational system is hybrid-4DVar, already using a localized ensemble covariance to represent errors of the day, our tests were able to eliminate this effect and do a clean comparison of the hybrid-4DEnVar and hybrid-4DVar methods. This distinguishes the current paper from most others on hybrid data assimilation, which focused on the benefit of adding hybrid covariances to three-dimensional variational data assimilation (3DVar; Hamill and Snyder 2000; Wang et al. 2008a,b, 2013; Kleist and Ide 2015a), or to 4DVar (Clayton et al. 2013; Zhang and Zhang 2012), or on comparing 4DVar with hybrid-4DEnVar (Buehner et al. 2013; Gustafsson et al. 2014). The studies most related to ours are Wang and Lei (2014) and Kleist and Ide (2015b), who compared 4DEnVar with 3DEnVar.

Our initial plan was for trials of the hybrid-4DEnVar system with as many settings as possible based on those of the operational hybrid-4DVar (Clayton et al. 2013); these showed that although hybrid-4DEnVar performed quite well, improving on the hybrid-3DEnVar method used for example by Hamill and Snyder (2000) and Wang et al. (2013), it did not make as big an improvement as that shown by hybrid-4DVar over hybrid-3DVar. To better understand this difference in the methods we performed some additional diagnostic experiments: of the time evolution of error covariances implicit in the various methods and of the different methods used to control high-frequency oscillations. They indicated that, unless there is strong space localization, the 4DEnVar method can model the time evolution of ensemble covariances as well as 4DVar—in our experiments it is the 3D treatment of the climatological covariance which is the main contribution to the relatively poor performance of hybrid-4DEnVar. The new 4D incremental analysis update (4DIAU) time-filtering scheme used with hybrid-4DEnVar performed well enough compared to the well-established *J*_{c} method of 4DVar systems (Gauthier and Thépaut 2001) that this difference makes only a relatively minor contribution.

The structure of this paper is as follows: in section 2 we describe the data assimilation methods; in section 3 we describe realistic trials to evaluate hybrid-4DEnVar (the new method) compared to hybrid-4DVar (the operational method) and their 3D equivalents; in section 4 we describe the additional diagnostic studies of the time evolution of error covariances; and in section 5 we present diagnostics of the 4DIAU and *J*_{c} schemes used in the main trials, comparing them with the incremental analysis update (IAU) method used for 3D analysis methods (Bloom et al. 1996). Finally, in section 6, we discuss the conclusions and suggest means of improving the performance of 4DEnVar.

## 2. Data assimilation methods used

### a. Four-dimensional variational methods and notation

**d** = **y**^{o} − *H*(**x**^{b}) as a preliminary step, then increments it using an observation operator linearized about **x**^{b}. Using Eq. (3) enables all the methods in this paper to handle nonlinear

For large NWP systems, the background term

### b. 4DVar—Using a linear forecast model and its adjoint

While both methods studied in this paper are four-dimensional and variational, the term 4DVar has come to be associated with methods that achieve their four-dimensional constraint by running a forecast model and its adjoint at each iteration. In line with the recommendations of Lorenc (2013), in this paper we reserve 4DVar, even with a prefix, for such methods.

**x** and a 4D output

**x** by

**x**^{b} + *δ***x** and make the assumption that

**v**^{c} (Lorenc et al. 2000; Ingleby 2001) such that

^{T}. This gives

**v** which in this case is just a copy **v** = **v**^{c}:

Each iteration in the variational minimization of Eq. (8) requires an integration of the linear model

Control of high-frequency noise in the 4DVar experiments was achieved by adding a digital-filter *J*_{c} term to Eq. (8), similar to that of Gauthier and Thépaut (2001). This term penalizes high-frequency oscillations in the PF model trajectory. We discuss its properties in section 5.
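As a rough illustration of how a digital-filter penalty of this kind acts, the sketch below low-pass filters a toy increment trajectory and penalizes the squared difference between the mid-window state and its filtered counterpart. The Lanczos-windowed sinc weights, the 6-h cutoff, the hourly sampling, the scalar `scale`, and the function names are all assumptions made for illustration; they are not the Met Office filter or norm.

```python
import numpy as np

def lowpass_weights(n_half, dt_h, cutoff_period_h):
    """Lanczos-windowed sinc low-pass weights, normalized to sum to one (illustrative choice)."""
    k = np.arange(-n_half, n_half + 1)
    fc = dt_h / cutoff_period_h             # cutoff frequency in cycles per step
    h = 2.0 * fc * np.sinc(2.0 * fc * k)    # ideal low-pass impulse response
    window = np.sinc(k / (n_half + 1))      # Lanczos window to reduce ringing
    w = h * window
    return w / w.sum()

def jc_penalty(traj, weights, scale=1.0):
    """Penalty on the high-pass part of a trajectory at the window centre.

    traj has shape (n_times, n_state), with n_times == len(weights).
    """
    filtered_centre = weights @ traj        # low-pass filtered state at the centre
    centre = traj[len(weights) // 2]
    high_pass = centre - filtered_centre
    return 0.5 * scale * float(high_pass @ high_pass)

# Toy trajectories: seven hourly states of a single variable.
t = np.arange(-3, 4, dtype=float)                  # hours relative to the window centre
slow = np.cos(2.0 * np.pi * t / 48.0)[:, None]     # 48-h period: barely penalized
fast = np.cos(2.0 * np.pi * t / 2.0)[:, None]      # 2-h period: strongly penalized

w = lowpass_weights(n_half=3, dt_h=1.0, cutoff_period_h=6.0)
jc_slow = jc_penalty(slow, w)
jc_fast = jc_penalty(fast, w)
```

In this toy setting the fast oscillation incurs a penalty orders of magnitude larger than the slow one, which is the qualitative behavior the term relies on.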

*α*_{k} for each ensemble member.

*α*_{k} defines the local weight given to each perturbation, so that

*α*_{k} is smooth using techniques copied from the climatological covariance transform

**v** so that the background penalty term is transformed into a simple inner product:

**v**^{c} and all the

**v**. The same 4DVar penalty function in Eq. (8) can be used, this time calculating

^{1}

*J*_{c} term to control imbalance. For future flexibility we wish to investigate methods with no internal model steps, ruling out internal constraints such as the tangent-linear normal mode constraint (TLNMC; Kleist et al. 2009) or the external initialization scheme we used when 3DVar was operational, an incremental version of digital filtering initialization (DFI; Lynch and Huang 1992). The IAU method (Bloom et al. 1996) was used in the experiments in section 3. Its time-filtering properties are discussed in section 5.

### c. 4DEnVar—Using a precalculated ensemble of trajectories

*α*_{k} are the same as in En-4DVar: we ensure that it is smooth using a transform, setting

**v**. A very similar penalty function to Eq. (8) can be used:

**v**^{c} and all the

**v**. The same penalty function in Eq. (24) can be used, this time calculating

Notice that, apart from the *J*_{c} term, Eq. (24) is identical to the hybrid-4DVar penalty in Eq. (8), and both use the same method as in Eq. (3) for predicting the observed values. It is the different calculations of the 4D increments in Eqs. (25) and (17), which distinguish the methods. 4DEnVar does not require

Time filtering using a 4DIAU is used in our 4DEnVar experiments, since the digital-filter *J*_{c} term is not expected to be effective without a model. The four-dimensional IAU is like that of Bloom et al. (1996), but with an important difference: while the IAU adds the same 3D analysis increment gradually over a period during a model forecast, the 4DIAU adds a different 3D field valid at each time step of the forecast,^{2} taken from the four-dimensional

The differences between the *J*_{c} initialization used in 4DVar, this 4DIAU method, and the IAU used in the 3D analysis experiments are discussed further in section 5.
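The contrast between IAU and 4DIAU insertion can be sketched with a toy model. The example below assumes a linear "forecast model" that simply advects a field one grid point per step on a periodic domain, uniform insertion weights, and a 4D increment that moves with the flow; the names `step` and `run_with_insertion` are hypothetical and nothing here reproduces the operational code.

```python
import numpy as np

def step(x):
    """Toy linear forecast model: advect one grid point per time step (periodic)."""
    return np.roll(x, 1)

def run_with_insertion(x0, increments, weights):
    """Advance the model, adding weights[i] * increments[i] after step i."""
    x = x0.copy()
    for w, dx in zip(weights, increments):
        x = step(x) + w * dx
    return x

n_grid, n_steps = 40, 6
x0 = np.zeros(n_grid)

# A 4D increment that moves with the flow: a bump valid at steps 1..n_steps.
bump = np.exp(-0.5 * ((np.arange(n_grid) - 10) / 1.5) ** 2)
incr_4d = [np.roll(bump, i) for i in range(1, n_steps + 1)]
weights = np.full(n_steps, 1.0 / n_steps)       # uniform weights summing to one

# IAU: the same 3D increment (valid mid-window) is added at every step.
incr_3d = [np.roll(bump, n_steps // 2)] * n_steps
x_iau = run_with_insertion(x0, incr_3d, weights)

# 4DIAU: the increment valid at each step is added at that step.
x_4diau = run_with_insertion(x0, incr_4d, weights)

expected_end = np.roll(bump, n_steps)            # the bump advected to the window end
```

With these assumptions the 4DIAU run ends with the full bump at its advected position, whereas the IAU run spreads the same total increment along the flow and so has a lower peak.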

3DEnVar versions of the above can be obtained by making the 4D trajectories such as

Apart from the different methods for controlling high-frequency noise, the only difference between the methods is in their handling of the time evolution of increments in each window. Remove this by taking the three-dimensional versions and the methods are the same: En-3DVar is equivalent to 3DEnVar and hybrid-3DVar is equivalent to hybrid-3DEnVar. This helps in understanding the results in section 3. [For technical reasons in our implementation, there are other slight differences: **x**^{b} in En-4DVar and about the ensemble mean

## 3. NWP trials

The object of this first set of experiments was to compare the hybrid-4DEnVar method with the hybrid-4DVar system documented by Clayton et al. (2013) and used for operational global NWP at the Met Office. We, therefore, based the settings for hybrid-4DEnVar (e.g., for localization and hybridization of the ensemble covariances) on that paper, with no attempt to tune them for the new method. We also ran hybrid-3DEnVar and hybrid-3DVar experiments, with the expectation that they would be very similar. By using these as controls, we are able to measure how much adding the time dimension improves each method.

The other important difference is the *J*_{c} term used in hybrid-4DVar. This penalizes high-frequency oscillations revealed by applying a digital time filter to *J*_{c} term turned off, using the 4DIAU initialization used for the hybrid-4DEnVar experiment to control high-frequency noise. The experiments are summarized in Table 1.

The trials were run at slightly lower horizontal resolution than the Met Office operational global NWP system, which for the experimental period in 2013 ran its deterministic forecast with a grid spacing of about 26 km: the grids we used were 640 × 481 (about 42 km) for the deterministic forecast model and 432 × 325 (about 62 km) for the ensemble forecasts and the perturbation forecast model in 4DVar. All used the same 70 levels, with the top level at 80 km. The ensemble was precalculated by running a version of the Met Office Global and Regional Ensemble Prediction System (MOGREPS; Bowler et al. 2008; Flowerdew and Bowler 2011). Note that MOGREPS includes no external initialization or time filtering of the initial ensemble states or perturbations. Experiments showed that such measures were unnecessary, and indeed the ensemble perturbation trajectories are found to be relatively well-balanced after *T* + 3 h—the time they are first required for use in 4DVar or 4DEnVar. All the trials used the same 44-member ensemble trajectories to generate ensemble covariances using Eqs. (10) or (21) as appropriate. They assimilated the operational observations in a 6-h cycle for the month of July 2013. Each experiment ran forecasts to 5 days every 12 h. These were verified against observations for a range of fields and time ranges as indicated on the two axes on each panel in Fig. 1. The first three rows in each panel are measured north of 20°N, the next two rows are between 20°N and 20°S and the last three rows are south of 20°S. Length of forecast increases along each row.

The experiments were then compared in pairs by calculating the relative difference between their RMS errors for each verified field. These are shown graphically for each pair of experiments by the shaded squares in Fig. 1, whose areas are proportional to the percentage change in RMS error. The caption for each panel also shows the average of the values plotted.
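The pairwise comparison can be sketched as follows. The sign convention (positive means the test experiment has the larger RMS error), the function names, and the sample numbers are assumptions for illustration only; they are not taken from the trials.

```python
import numpy as np

def rms_error(forecast, observed):
    """Root-mean-square error of a forecast against observations."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

def percent_change_in_rmse(rmse_test, rmse_control):
    """Relative difference in RMS error, in percent of the control value."""
    return 100.0 * (rmse_test - rmse_control) / rmse_control

# Hypothetical verification sample for one field at one forecast range.
observed = np.zeros(4)
control_forecast = np.array([1.0, -1.0, 1.0, -1.0])
test_forecast = np.array([1.1, -1.1, 1.1, -1.1])

rmse_control = rms_error(control_forecast, observed)
rmse_test = rms_error(test_forecast, observed)
change = percent_change_in_rmse(rmse_test, rmse_control)   # about 10% worse

# The panel average is then the mean of such values over all verified fields.
panel_average = float(np.mean([change, -2.0, 0.5]))        # other values are made up
```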

The comparison of hybrid-4DVar with hybrid-4DEnVar, which is the objective of this paper, is shown in Fig. 1a. It shows that hybrid-4DEnVar performs on average over 3% worse, with the biggest degradations occurring in 500-hPa height forecasts in the Southern Hemisphere. The rest of this paper focusses on understanding the causes of this result.

We start by considering Fig. 1b, which confirms that, as expected, the hybrid-3DVar and hybrid-3DEnVar systems perform equally well. These experiments are used as reference in Figs. 1c,d. Figure 1c shows that adding the time dimension in hybrid-4DVar gives a big benefit, especially in the Southern Hemisphere. This is usual for 4DVar (e.g., it was seen in Rawlins et al. 2007). In comparison, Fig. 1d shows a much smaller improvement due to the time dimension in hybrid-4DEnVar.

We can eliminate the *J*_{c} term as the major cause of the improvement seen in the hybrid-4DVar experiment by considering the last two panels. Figure 1e shows that, using the same 4DIAU method to control noise, hybrid-4DVar still beats hybrid-4DEnVar, by nearly as much as in Fig. 1a. Figure 1f shows that there is some advantage from the *J*_{c} term over 4DIAU in 4DVar—we study this more in section 5.

So we can conclude that the time dimension is less well handled in hybrid-4DEnVar than in hybrid-4DVar. Experiments to show why are described in the next section.

## 4. Diagnosing the 4D covariances

### a. Experimental setup

We diagnose the behavior of the 4D covariances using single-observation experiments. We chose two different cases:

a strong midlatitude jet stream—a good test of strong advection, and

a hurricane—a good test of complex nonlinear physics, including moist processes.

For the first case we chose a region in the polar jet stream, located in the North Atlantic Ocean about 1500 km southeast of Newfoundland, Canada. The observation is located at level 29, which corresponds to a height of 5796 m and a pressure of about 450 hPa. The wind speed for this example is about 60 m s^{−1}. Figure 2 shows the background

For the second case we chose Hurricane Sandy as it was tracking northward through the Caribbean. Figure 3 shows the background

The pseudo-observations are specified as an increment to the background. The two observations are described in Table 2.

The single-observation experiments are designed to explore the 4D representation of the climatological and ensemble background-error covariances for hybrid-4DVar and hybrid-4DEnVar. They are also designed to explore the effect of horizontal localization on the ensemble covariance. Bearing this in mind, four covariance models with differing *β* and Gaussian localization scales^{3} *L* are investigated, given in Table 3.

Table 3. Covariance (Cov) models used for the single-observation experiments.
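To make the roles of the hybrid weight and the localization scale concrete, the following sketch builds a localized ensemble covariance and a hybrid covariance on a one-dimensional toy grid. It assumes a Gaussian localization of the form exp(−r²/2*L*²) (the footnote defining *L* is not reproduced here), a hybrid of the form `beta_c2 * B_clim + beta_e2 * (C ∘ P_ens)`, placeholder weights of 0.5, and random stand-in ensemble members; none of these values come from Table 3.

```python
import numpy as np

def gaussian_localization(coords_km, length_km):
    """Localization matrix C with entries exp(-r^2 / (2 L^2)) (assumed form)."""
    r = np.abs(coords_km[:, None] - coords_km[None, :])
    return np.exp(-0.5 * (r / length_km) ** 2)

def ensemble_covariance(members):
    """Sample covariance from members of shape (n_members, n_state)."""
    perts = members - members.mean(axis=0)
    return perts.T @ perts / (members.shape[0] - 1)

def hybrid_covariance(b_clim, p_ens, loc, beta_c2, beta_e2):
    """Weighted sum of climatological and localized (Schur product) ensemble covariances."""
    return beta_c2 * b_clim + beta_e2 * (loc * p_ens)

rng = np.random.default_rng(0)
coords = np.linspace(0.0, 6000.0, 61)                 # 100-km spacing
members = rng.standard_normal((44, coords.size))      # stand-in 44-member ensemble
p_ens = ensemble_covariance(members)

loc_500 = gaussian_localization(coords, 500.0)
loc_1200 = gaussian_localization(coords, 1200.0)
b_clim = gaussian_localization(coords, 300.0)         # stand-in climatological covariance

b_hybrid = hybrid_covariance(b_clim, p_ens, loc_1200, beta_c2=0.5, beta_e2=0.5)
```

The 500-km localization suppresses correlations at 1000-km separation far more strongly than the 1200-km one, which is why the shorter scale is described as severe.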

The single observations are located at the beginning of the window, which, since we turned off the 4DVar *J*_{c} term, implies that the hybrid-4DVar and hybrid-4DEnVar analysis increments should be identical at *t*_{0}.^{4} Note that our climatological covariance is built using transforms designed to give appropriately balanced increments (Lorenc et al. 2000) and our ensemble localization in Eqs. (15) and (16) uses the same transforms, so we did not expect these single-observation experiments to exhibit substantial imbalance. This was confirmed by repeating the 4DVar experiments with the *J*_{c} term switched on (not shown); results did not change significantly.

*E*_{SC}). They are calculated by

*M*_{0→6} is the model propagation from the beginning to the end of the assimilation window.

*M*_{0→6} and the perturbation forecast

_{0→6} used to compute *δ***x**^{a}(*t*_{6}). For hybrid-4DEnVar, Eq. (27) measures errors from four sources:

the resolution difference between the deterministic and ensemble forecasts;

the time invariance of the analysis increment generated by the climatological part of the background error covariance (as in 3DVar FGAT);

the linearization errors introduced by the alpha-control variable—a linear combination of trajectories need not itself be a trajectory; and

the errors introduced by the space localization (of the ensemble covariance between the initial and final times) not moving with the flow.
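A strong-constraint error diagnostic of this kind can be sketched on a toy nonlinear model. Because Eq. (27) is not reproduced in this text, the form used below is an assumption: the analysis increment valid at the end of the window minus the difference between nonlinear forecasts started with and without the initial increment, with a relative error defined as a ratio of norms. The Lorenz (1963) system stands in for the forecast model, and all names and numbers are hypothetical.

```python
import numpy as np

def lorenz63_tendency(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def propagate(x, n_steps=20, dt=0.01):
    """Stand-in for M_{0->6}: RK4 integration of a toy nonlinear model."""
    x = np.array(x, dtype=float)
    for _ in range(n_steps):
        k1 = lorenz63_tendency(x)
        k2 = lorenz63_tendency(x + 0.5 * dt * k1)
        k3 = lorenz63_tendency(x + 0.5 * dt * k2)
        k4 = lorenz63_tendency(x + dt * k3)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

def strong_constraint_error(xb0, dxa0, dxa_end):
    """Assumed form: end-of-window increment minus nonlinearly evolved initial increment."""
    evolved = propagate(xb0 + dxa0) - propagate(xb0)
    return dxa_end - evolved, evolved

def relative_error(e_sc, evolved):
    return float(np.linalg.norm(e_sc) / np.linalg.norm(evolved))

xb0 = np.array([1.0, 1.0, 20.0])     # background at the start of the window
dxa0 = np.array([0.1, 0.0, 0.0])     # analysis increment at the start of the window

# A time-invariant increment, as produced by a 3D climatological covariance.
e_static, evolved = strong_constraint_error(xb0, dxa0, dxa_end=dxa0)
re_static = relative_error(e_static, evolved)

# An increment that follows the nonlinear model exactly has zero error.
e_exact, _ = strong_constraint_error(xb0, dxa0, dxa_end=evolved)
re_exact = relative_error(e_exact, evolved)
```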

### b. Jet stream results

#### Analysis increments

The hybrid-4DVar and hybrid-4DEnVar analysis increments are compared for the four Cov models. Since the hybrid-4DVar and hybrid-4DEnVar increments are theoretically identical at the time of the observation (the start of the window), only the hybrid-4DEnVar increments are shown at this time. The increments at the end of the window are shown for both methods.

Firstly, hybrid-4DEnVar and hybrid-4DVar are compared for Cov model i, where the pure climatological covariance is used (

Next Cov model ii is examined, where hybrid-4DEnVar and hybrid-4DVar use a pure ensemble covariance (*L* = 500 km). Cov model ii is a comparison between 4DEnVar and En-4DVar and is shown in Figs. 4d–f. It is evident that the analysis increment at the beginning of the window (Fig. 4d) exhibits a heterogeneous structure, since the ensemble covariance captures the “errors of the day.” Note that such severe localization is not used in operational practice, but it clearly demonstrates an important effect. Both increments have moved downstream by the end of the assimilation window, but the En-4DVar increment (Fig. 4f) has moved farther downstream than the 4DEnVar increment (Fig. 4e). The localization between different times in 4DEnVar makes no allowance for the flow, which explains why the increment is smaller in magnitude and has not moved as far downstream as the En-4DVar increment. Fairbairn et al. (2014) used a simple model to demonstrate that this effect degrades the 4DEnVar analysis.

Cov model iii is shown in Figs. 4g–i. This is the same as Cov model ii but uses the longer 1200-km localization length scale used operationally at the Met Office. Thus, the analysis increments are spread over a larger area. The analysis increments at the end of the window for En-4DVar and 4DEnVar are similar, since the localization length scale is large enough to not significantly degrade the time correlations of the 4DEnVar ensemble covariance.

Cov model iv is shown in Figs. 4j–l, where the hybrid background-error covariance matrix is used. The analysis increments for Cov model iv are effectively a linear combination of the increments for Cov model i and Cov model iii. It is clear that the main difference between the two methods is the way they use the climatological covariance, rather than the ensemble covariance. The hybrid-4DEnVar method uses a time-invariant climatological covariance, so part of the analysis increment is permanently centered around the observation. The 4D ensemble covariance then propagates part of the analysis increment downstream. On the other hand, the hybrid-4DVar climatological and ensemble covariances are both propagated by the PF and adjoint models. Hence all of the hybrid-4DVar analysis increment is propagated downstream.
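The split between a stationary climatological part and a propagating ensemble part can be sketched directly. The code below assumes the hybrid-4DEnVar increment is a time-invariant climatological increment plus a sum over members of local weights multiplied pointwise by that member's perturbation trajectory; the scalar factors `beta_c` and `beta_e`, the single-member toy "ensemble", and the function name are illustrative assumptions.

```python
import numpy as np

def hybrid_4denvar_increment(dx_clim, alphas, pert_traj, beta_c=1.0, beta_e=1.0):
    """Return increments of shape (n_times, n_state).

    dx_clim   : (n_state,) climatological increment, identical at all times
    alphas    : (n_members, n_state) smooth local weights, constant in time
    pert_traj : (n_members, n_times, n_state) ensemble perturbation trajectories
    """
    ens_part = np.einsum("ks,kts->ts", alphas, pert_traj)
    return beta_c * dx_clim[None, :] + beta_e * ens_part

n_state, n_times, obs_index, shift = 60, 4, 15, 5
grid = np.arange(n_state)
bump = np.exp(-0.5 * ((grid - obs_index) / 2.0) ** 2)   # centred on the observation

# One toy member whose perturbation is advected `shift` points per time level.
pert_traj = np.stack([np.roll(bump, shift * t) for t in range(n_times)])[None, :, :]
alphas = np.ones((1, n_state))

dx = hybrid_4denvar_increment(bump, alphas, pert_traj, beta_c=0.5, beta_e=0.5)
downstream_index = obs_index + shift * (n_times - 1)
```

At the last time level half of the increment is still centred on the observation and half has moved to `downstream_index`, mirroring the behavior described for Cov model iv.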

### c. Strong-constraint errors

The strong-constraint errors for Cov model iv are shown in Fig. 5. The location of the largest errors for hybrid-4DEnVar agrees with Fig. 4k, where part of the analysis increment was not propagated downstream from the observation. These large errors are not present in hybrid-4DVar. Instead, hybrid-4DVar has some errors downstream, which are probably caused mainly by the simplified physics in the PF model, though as for hybrid-4DEnVar there will also be errors due to the higher resolution used for the deterministic forecast. The strong-constraint errors for hybrid-4DEnVar appear to be larger overall than the strong-constraint errors for hybrid-4DVar, which is clarified below.

### d. Relative errors

The relative strong-constraint errors for Cov model iii and Cov model iv are shown in Table 4. For Cov model iii, 4DEnVar performs slightly better than En-4DVar. This suggests that the linear assumption made by using the 4DVar PF and adjoint models is less accurate than the linear assumption made by using the 4DEnVar alpha-control variable. However, for Cov model iv, hybrid-4DVar performs significantly better than hybrid-4DEnVar. The results for Cov model iv agree with the trials of section 3, where hybrid-4DVar performed better than hybrid-4DEnVar. The results suggest that the inferior performance of hybrid-4DEnVar in the trials is related to the inferior 4D representation of the climatological background-error covariance.

Table 4. The relative strong-constraint errors (RE) for Cov models iii and iv with the jet stream case and the Hurricane Sandy case.

### e. Hurricane Sandy results

#### 1) Analysis increments

The experiments for the Hurricane Sandy case are performed in exactly the same way as the experiments for the jet stream case. The analysis increments for Cov model iii and Cov model iv are shown in Fig. 6. It is evident that the ensemble covariance spreads the increment around the hurricane. Because of the high winds and sharp gradients near a hurricane, any variations in position cause an unusually large ensemble covariance; in comparison the climatological covariance is almost negligible. This explains why the increments for Cov model iv (Figs. 6d–f) are approximately half the size of the increments for Cov model iii (Figs. 6a–c).

#### 2) Strong-constraint errors

Figure 7 shows the strong-constraint errors for Cov model iv. The strong-constraint errors for both hybrid-4DEnVar and hybrid-4DVar are largest near the hurricane eye. The strong-constraint errors for hybrid-4DVar are significantly larger than the strong-constraint errors for hybrid-4DEnVar. This may be related to the simplified physics of the PF model. In particular, its moist processes such as precipitation (Stiller and Ballard 2009; Stiller 2009), which are important in hurricane development, use very simple parameterizations with global coefficients almost certainly inappropriate for hurricanes.

#### 3) Relative errors

Table 4 shows the relative strong-constraint errors for Cov models iii and iv. In Cov model iii, 4DEnVar performs significantly better than En-4DVar. As discussed above, this is probably due to the simplified PF model physics. In Cov model iv, hybrid-4DEnVar also performs better than hybrid-4DVar, since the ensemble covariances dominate the hybrid near hurricanes.

## 5. *J*_{c}, IAU, and 4DIAU


The Met Office 4DVar scheme includes a penalty *J*_{c} on high-frequency oscillations, as revealed by applying a digital time filter to the increment trajectory ^{5} This constraint is effective enough that no further measures are needed to deal with imbalances—we just add the analysis increment valid at the start of the window directly to the background **x**^{b}. Since there is no model within 4DEnVar, we cannot expect the 4DVar *J*_{c} term to be effective, so it was omitted and there is currently no dynamical penalty on imbalance within 4DEnVar. Thus, the increments produced by hybrid-4DEnVar are poorly balanced, making it necessary to deal with the high-frequency gravity–inertia waves that are generated when the increments are added to the model.^{6}

A simple method of filtering high-frequency oscillations is the IAU scheme (Bloom et al. 1996). Here a fraction of the same 3D analysis increment is added each model time step over a time window. This weighted displacement has a time-filtering effect on the increments, similar to that of the time filter in the *J*_{c} term (Polavarapu et al. 2004). The IAU method was used for our 3DVar and 3DEnVar experiments, inserting the analysis increment over a 6-h span with uniform weights.
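The time-filtering effect of uniform-weight IAU can be estimated with a short calculation. The sketch assumes linear dynamics in which a free oscillation of a given period evolves independently of the rest of the state, so that the amplitude retained after insertion is the modulus of the weighted sum of phase factors; the 15-min time step and the function name are assumptions.

```python
import numpy as np

def iau_response(period_h, span_h=6.0, dt_h=0.25):
    """Amplitude retained by a free oscillation of the given period when a fixed
    increment is added in equal fractions at every time step across the span."""
    n = int(round(span_h / dt_h))
    t = (np.arange(n) + 0.5) * dt_h - 0.5 * span_h    # insertion times about the centre
    w = np.full(n, 1.0 / n)                           # uniform weights summing to one
    omega = 2.0 * np.pi / period_h
    return float(np.abs(np.sum(w * np.exp(-1j * omega * t))))

periods = [2.0, 3.0, 6.0, 12.0, 24.0, 48.0]
responses = {p: iau_response(p) for p in periods}
```

Under these assumptions, oscillations whose period divides the 6-h span are removed almost entirely, while a 48-h oscillation keeps more than 95% of its amplitude.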

Since 4DEnVar directly produces a 4D analysis increment, it is possible to modify the IAU scheme to add increments valid at the correct time to each model time step; we call this 4DIAU. Although superficially similar, 4DIAU has different time-filtering properties to IAU because there is no displacement in time.^{7} In theory, waves that evolve the same way in **x**^{b}) are not time filtered as they would be in the *J*_{c} term or the IAU. However, as long as the ensemble trajectories themselves contain relatively little noise, most noise will come from the manipulations of each 3D state in

To investigate the degree of high-frequency gravity wave activity in the various systems, we obtained near-surface pressure *p*_{1} increment time series produced by adding hybrid-4DVar and hybrid-4DEnVar analysis increments to a test forecast, using a variety of insertion strategies:

instantaneous at *T* − 3, the start of the analysis window;

instantaneous at *T* + 0;

instantaneous at *T* + 3;

uniform IAU from *T* − 3 to *T* + 3, using the increment valid at *T* + 0; and

uniform 4DIAU from *T* − 3 to *T* + 3, using the nearest hourly increment to each time.

The *p*_{1} evolution due to the analysis increment was obtained by subtracting the time series from the background forecast trajectory, and two balance diagnostics were calculated. The mean absolute tendency plots on the left of Fig. 8 tend to emphasize very high-frequency oscillations, which are dissipated rapidly by the filtering built into the forecast model’s dynamics. The mean absolute high-pass *p*_{1} plots on the right measure the average amplitude of the high-frequency oscillations revealed by applying the same time filter as used in the 4DVar *J*_{c} term, a more relevant measure for data assimilation. For the high-frequency oscillations captured by the filter, this diagnostic gives more weight to relatively slow oscillations than the tendency diagnostic. Since these slower oscillations tend to persist longer in the model, the value of this diagnostic reduces more slowly than that of the tendency diagnostic. The frequency response of the *J*_{c} filter, and that of the standard 6-h uniform-weights IAU scheme used with 3DVar and in these balance experiments, is shown in Fig. 9.

The basic findings are as follows:^{8}

The degree of imbalance produced by a hybrid-4DEnVar increment is similar to that produced by the *T* − 3 increment from hybrid-4DVar without a *J*_{c} term (cf. the black, red, and pink lines in Figs. 8e,f with the black lines in Figs. 8c,d).

The hybrid-4DEnVar with 4DIAU appears to be well balanced, better than hybrid-4DVar with *J*_{c} (cf. the blue lines in Figs. 8e,f with the black lines in Figs. 8a,b).

For hybrid-4DEnVar, 4DIAU is almost as effective as the 6-h IAU scheme (cf. the blue and green lines in Figs. 8e,f).

For hybrid-4DVar without *J*_{c}, 4DIAU (blue lines in Figs. 8c,d) is less effective than both IAU (green lines in the same figures) and *J*_{c} (black lines in Figs. 8a,b). In fact, 4DIAU is not much better than just applying the analysis increment at *T* − 3 and allowing the filtering built into the forecast model to deal with the noise (cf. the black and blue lines in Figs. 8c,d).

The time-filtering properties of IAU and 4DIAU are a result of destructive interference between increments added at different times, so the ineffectiveness of 4DIAU when used with 4DVar is due to the dynamical consistency of the increments. The essence of this effect is illustrated in Fig. 10, which shows (first seven rows) high-frequency components of the *p*_{1} field at *T* + 6 produced by adding analysis increments (climatological component alone, corresponding to **v**) at *T* − 3, *T* − 2, …, *T* + 3. This insertion process is similar to what is done in IAU or 4DIAU, but here we add full analysis increments, and add the increments every hour rather than every time step. Figures 10a,c are for 4DIAU-like insertions, where the increment times match the insertion times. Figures 10b,d are for IAU-like insertions where the same *T* + 0 increment is added at all times. For hybrid-4DEnVar (Figs. 10a,b), the climatological components of the analysis increments are the same at all times, so 4DIAU is equivalent to IAU. The (same) increment added at different times produces much different high-frequency structures at *T* + 6, which tend to cancel each other when averaged together (bottom panels). For hybrid-4DVar without a *J*_{c} term, the effect of the IAU-like insertions (Fig. 10d) is similar to what we see with hybrid-4DEnVar. However, because of the dynamical consistency between the hourly analysis increments, the 4DIAU-like insertions (Fig. 10c) produce very similar high-frequency patterns at *T* + 6: what differences we do see are the result of mismatches between the evolution of analysis increments in the 4DVar PF model and in the full model, but these are relatively small. Thus, the average increment shown on the bottom row of Fig. 10c is similar to the individual increments above, with very little cancellation evident.
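The cancellation argument can be checked with a single wave mode. The sketch below represents one fast wave as a complex amplitude that rotates freely in time, inserts full increments hourly from *T* − 3 to *T* + 3, and averages their contributions at *T* + 6. The 4-h period and the unit amplitude are assumptions chosen for illustration, and the perturbation model is taken to be perfect.

```python
import numpy as np

period_h = 4.0                                   # toy fast-wave period (assumed)
omega = 2.0 * np.pi / period_h
insert_times = np.arange(-3, 4, dtype=float)     # T-3, ..., T+3 (hourly)
t_verify = 6.0

def evolve(z, hours):
    """Free evolution of one wave mode, represented as a complex amplitude."""
    return z * np.exp(-1j * omega * hours)

a0 = 1.0 + 0.0j                                  # wave amplitude in the T-3 increment

# 4DIAU-like: each increment is a0 evolved to its valid time, then run on to T+6.
consistent = [evolve(evolve(a0, t + 3.0), t_verify - t) for t in insert_times]

# IAU-like: the same T+0 increment is inserted at every time, then run on to T+6.
a_mid = evolve(a0, 3.0)
repeated = [evolve(a_mid, t_verify - t) for t in insert_times]

amp_4diau_like = abs(np.mean(consistent))        # contributions share one phase
amp_iau_like = abs(np.mean(repeated))            # contributions differ in phase
```

The dynamically consistent insertions all arrive at *T* + 6 with the same phase, so their average keeps the full amplitude, whereas the repeated insertions arrive with different phases and largely cancel.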

Corresponding results for the ensemble components of the analysis increments (corresponding to the

In light of these results, we see that the 4DVar4DIAU trial described in section 3 included much poorer noise control than the other trials, which suggests that its half-percent deficit against 4DVar (Fig. 1f) is likely an overestimate of the effect of not having a *J*_{c} term in 4DEnVar.

## 6. Discussion and conclusions

We have set out a consistent documentation of the use of climatological, ensemble, and hybrid covariances in 4DVar and 4DEnVar.

We have described an appealingly simple 4DIAU method, for adding the 4D increment from 4DEnVar to the background in a way that reduces imbalance in the subsequent forecast, while not damping real moving features seen in the ensemble.

We have *not* in this paper studied the benefit from hybrid covariances, that is, using an ensemble to add “errors of the day” information to the climatological covariance traditionally used in 3DVar and 4DVar. Instead we used the same 3D hybrid covariances and studied the effect of differences in the way they were extended to four dimensions in hybrid-4DEnVar compared to hybrid-4DVar. The main conclusion was that, with the same settings (tuned for hybrid-4DVar), the new hybrid-4DEnVar system performed worse than the existing hybrid-4DVar. In detail:

The main flaw in our hybrid-4DEnVar, compared to hybrid-4DVar, is that the climatological component of the covariance is not evolved in time: it is used as in 3DVar. [In retrospect it is not surprising that this effect dominates, since we followed Clayton et al. (2013) and gave the climatological covariances a weight of

in the hybrid.]

Time-covariance errors resulting from the localization and model not commuting (Fairbairn et al. 2014), and the 3D localization not following the flow, are not important for our 1200-km localization scale and 6-h window, but they do become important for a 500-km scale.

The 4DEnVar method was able to use the ensemble component of the covariances often more accurately than in En-4DVar. In the jet stream case evolution errors induced by localization in 4DEnVar were smaller than 4DVar errors, except when the localization was severe. In the hurricane case approximations in the perturbation forecast model induced errors in 4DVar evolutions.

### Improving hybrid-4DEnVar

The maintenance and running costs of hybrid-4DVar are larger, so there is an incentive to improve hybrid-4DEnVar. Our results show that to do this we need to reduce the weight on climatological

The Met Office has increased its operational ensemble size from 23 to 44; further increases are expected as computer power allows. This is guided by Houtekamer et al. (2014), who indicated possible benefits for the Canadian system in increasing the size beyond 192.

The MOGREPS ensemble system (Bowler et al. 2008; Flowerdew and Bowler 2011) was designed and tested to quantify uncertainty in short-period forecasts. Attention in the past focused on having the correct ensemble spread, with less attention on the implied covariances used in data assimilation. Experiments are under way to see whether alternative ensemble generation methods, such as an ensemble of data assimilations (EDA; Bonavita et al. 2012) of 4DEnVar assimilations, give better results.

Localization need not be simply a function of distance: see the discussion around Eq. (16). For instance Buehner and Charron (2007) showed that spectral localization is a spatial smoother. The wave band implementation of Buehner (2012) allows us to use different spatial localization scales for each wave band. Multivariate localization (i.e., making the transformed balanced and unbalanced variables independent) would allow us to impose the balance assumed by the climatological covariance model on the ensemble covariances, at the risk of removing real flow-dependent correlations like those seen by Montmerle and Berre (2010). We have implemented these methods and plan to test their benefit.

Wang et al. (2013) and Wang and Lei (2014) found that when the ensemble resolution matched the assimilation model, the weight *β*_{c} and *β*_{e}—something easy to achieve in conjunction with the wave band filter.

Flow-following localization might be done using the ensemble itself (Bishop and Hodyss 2007, 2009a, b), or it might be done by replacing

## Acknowledgments

David Fairbairn would like to thank EPSRC for partly funding his engineering doctorate.

## REFERENCES

Bishop, C. H., and D. Hodyss, 2007: Flow-adaptive moderation of spurious ensemble correlations and its use in ensemble-based data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 2029–2044, doi:10.1002/qj.169.

Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. *Tellus*, **61A**, 84–96, doi:10.1111/j.1600-0870.2008.00371.x.

Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. *Tellus*, **61A**, 97–111, doi:10.1111/j.1600-0870.2008.00372.x.

Bloom, S. C., L. L. Takacs, A. M. D. Silva, and D. Ledvina, 1996: Data assimilation using incremental analysis updates. *Mon. Wea. Rev.*, **124**, 1256–1271, doi:10.1175/1520-0493(1996)124<1256:DAUIAU>2.0.CO;2.

Bonavita, M., L. Isaksen, and E. Holm, 2012: On the use of EDA background error variances in the ECMWF 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **138**, 1540–1559, doi:10.1002/qj.1899.

Bowler, N. E., A. Arribas, K. R. Mylne, K. B. Robertson, and S. E. Beare, 2008: The MOGREPS short-range ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **134**, 703–722, doi:10.1002/qj.234.

Buehner, M., 2012: Evaluation of a spatial/spectral covariance localization approach for atmospheric data assimilation. *Mon. Wea. Rev.*, **140**, 617–636, doi:10.1175/MWR-D-10-05052.1.

Buehner, M., and M. Charron, 2007: Spectral and spatial localization of background-error correlations for data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 615–630, doi:10.1002/qj.50.

Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010a: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments. *Mon. Wea. Rev.*, **138**, 1550–1566, doi:10.1175/2009MWR3157.1.

Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010b: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: One-month experiments with real observations. *Mon. Wea. Rev.*, **138**, 1567–1586, doi:10.1175/2009MWR3158.1.

Buehner, M., J. Morneau, and C. Charette, 2013: Four-dimensional ensemble-variational data assimilation for global deterministic weather prediction. *Nonlinear Processes Geophys.*, **20**, 669–682, doi:10.5194/npg-20-669-2013.

Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2013: Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. *Quart. J. Roy. Meteor. Soc.*, **139**, 1445–1461, doi:10.1002/qj.2054.

Courtier, P., J.-N. Thépaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var, using an incremental approach. *Quart. J. Roy. Meteor. Soc.*, **120**, 1367–1387, doi:10.1002/qj.49712051912.

Desroziers, G., J.-T. Camino, and L. Berre, 2014: 4DEnVar: Link with 4D state formulation of variational assimilation and different possible implementations. *Quart. J. Roy. Meteor. Soc.*, **140**, 2097–2110, doi:10.1002/qj.2325.

Etherton, B. J., and C. H. Bishop, 2004: Resilience of hybrid ensemble/3DVAR analysis schemes to model error and ensemble covariance error. *Mon. Wea. Rev.*, **132**, 1065–1080, doi:10.1175/1520-0493(2004)132<1065:ROHDAS>2.0.CO;2.

Fairbairn, D., S. R. Pring, A. C. Lorenc, and I. Roulstone, 2014: A comparison of 4D-Var with ensemble data assimilation methods. *Quart. J. Roy. Meteor. Soc.*, **140**, 281–294, doi:10.1002/qj.2135.

Fisher, M., and H. Auvinen, 2012: Long window weak-constraint 4D-Var. *ECMWF Seminar on Data Assimilation for Atmosphere and Ocean*, Shinfield Park, Reading, ECMWF, 189–202.

Flowerdew, J., and N. E. Bowler, 2011: Improving the use of observations to calibrate ensemble spread. *Quart. J. Roy. Meteor. Soc.*, **137**, 467–482, doi:10.1002/qj.744.

Gauthier, P., and J.-N. Thépaut, 2001: Impact of the digital filter as a weak constraint in the preoperational 4DVAR assimilation system of Météo-France. *Mon. Wea. Rev.*, **129**, 2089–2102, doi:10.1175/1520-0493(2001)129<2089:IOTDFA>2.0.CO;2.

Gustafsson, N., J. Bojarova, and O. Vignes, 2014: A hybrid variational ensemble data assimilation for the high resolution limited area model (HIRLAM). *Nonlinear Processes Geophys.*, **21**, 303–323, doi:10.5194/npg-21-303-2014.

Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter-3D variational analysis scheme. *Mon. Wea. Rev.*, **128**, 2905–2919, doi:10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.

Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. *Mon. Wea. Rev.*, **133**, 3132–3147, doi:10.1175/MWR3020.1.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790, doi:10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

Houtekamer, P., X. Deng, H. L. Mitchell, S.-J. Baek, and N. Gagnon, 2014: Higher resolution in an operational ensemble Kalman filter. *Mon. Wea. Rev.*, **142**, 1143–1162, doi:10.1175/MWR-D-13-00138.1.

Ide, K., P. Courtier, M. Ghil, and A. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential and variational. *J. Meteor. Soc. Japan*, **75**, 181–189.

Ingleby, N. B., 2001: The statistical structure of forecast errors and its representation in the Met Office global 3D variational data assimilation scheme. *Quart. J. Roy. Meteor. Soc.*, **127**, 209–231, doi:10.1002/qj.49712757112.

Ingleby, N. B., A. C. Lorenc, K. Ngan, F. R. Rawlins, and D. R. Jackson, 2013: Improved variational analyses using a nonlinear humidity control variable. *Quart. J. Roy. Meteor. Soc.*, **139**, 1875–1887, doi:10.1002/qj.2073.

Kalnay, E., H. Li, T. Miyoshi, S.-C. Yang, and J. Ballabrera-Poy, 2007: 4-D-Var or ensemble Kalman filter? *Tellus*, **59A**, 758–773, doi:10.1111/j.1600-0870.2007.00261.x.

Kepert, J., 2009: Covariance localisation and balance in an ensemble Kalman filter. *Quart. J. Roy. Meteor. Soc.*, **135**, 1157–1176, doi:10.1002/qj.443.

Kleist, D. T., and K. Ide, 2015a: An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part I: System description and 3D-Hybrid results. *Mon. Wea. Rev.*, in press.

Kleist, D. T., and K. Ide, 2015b: An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part II: 4DEnVar and hybrid variants. *Mon. Wea. Rev.*, in press.

Kleist, D. T., D. F. Parrish, J. C. Derber, R. Treadon, R. M. Errico, and R. Yang, 2009: Improving incremental balance in the GSI 3DVAR analysis system. *Mon. Wea. Rev.*, **137**, 1046–1060, doi:10.1175/2008MWR2623.1.

Le Dimet, F.-X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. *Tellus*, **38A**, 97–110, doi:10.1111/j.1600-0870.1986.tb00459.x.

Liu, C., Q. Xiao, and B. Wang, 2008: An ensemble-based four-dimensional variational data assimilation scheme. Part I: Technical formulation and preliminary test. *Mon. Wea. Rev.*, **136**, 3363–3373, doi:10.1175/2008MWR2312.1.

Liu, C., Q. Xiao, and B. Wang, 2009: An ensemble-based four-dimensional variational data assimilation scheme. Part II: Observing System Simulation Experiments with advanced research WRF (ARW). *Mon. Wea. Rev.*, **137**, 1687–1704, doi:10.1175/2008MWR2699.1.

Lorenc, A. C., 2003a: Modelling of error covariances by four-dimensional variational data assimilation. *Quart. J. Roy. Meteor. Soc.*, **129**, 3167–3182, doi:10.1256/qj.02.131.

Lorenc, A. C., 2003b: The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **129**, 3183–3203, doi:10.1256/qj.02.132.

Lorenc, A. C., 2013: Recommended nomenclature for EnVar data assimilation methods. *Research Activities in Atmospheric and Oceanic Modelling*, WGNE, 2 pp. [Available online at http://www.wcrp-climate.org/WGNE/BlueBook/2013/individual-articles/01_Lorenc_Andrew_EnVar_nomenclature.pdf.]

Lorenc, A. C., and F. Rawlins, 2005: Why does 4D-Var beat 3D-Var? *Quart. J. Roy. Meteor. Soc.*, **131**, 3247–3257, doi:10.1256/qj.05.85.

Lorenc, A. C., and Coauthors, 2000: The Met. Office global three-dimensional variational data assimilation scheme. *Quart. J. Roy. Meteor. Soc.*, **126**, 2991–3012, doi:10.1002/qj.49712657002.

Lynch, P., and X.-Y. Huang, 1992: Initialization of the HIRLAM model using a digital filter. *Mon. Wea. Rev.*, **120**, 1019–1034, doi:10.1175/1520-0493(1992)120<1019:IOTHMU>2.0.CO;2.

Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. *Mon. Wea. Rev.*, **130**, 2791–2808, doi:10.1175/1520-0493(2002)130<2791:ESBAME>2.0.CO;2.

Montmerle, T., and L. Berre, 2010: Diagnosis and formulation of heterogeneous background-error covariances at the mesoscale. *Quart. J. Roy. Meteor. Soc.*, **136**, 1408–1420, doi:10.1002/qj.655.

Ota, Y., J. C. Derber, E. Kalnay, and T. Miyoshi, 2013: Ensemble-based observation impact estimates using the NCEP GFS. *Tellus*, **65A**, 20038, doi:10.3402/tellusa.v65i0.20038.

Polavarapu, S., S. Ren, A. Clayton, D. Sankey, and Y. Rochon, 2004: On the relationship between incremental analysis updating and incremental digital filtering. *Mon. Wea. Rev.*, **132**, 2495–2502, doi:10.1175/1520-0493(2004)132<2495:OTRBIA>2.0.CO;2.

Rabier, F., 2005: Overview of global data assimilation developments in numerical weather-prediction centres. *Quart. J. Roy. Meteor. Soc.*, **131**, 3215–3233, doi:10.1256/qj.05.129.

Rawlins, F., S. P. Ballard, K. J. Bovis, A. M. Clayton, D. Li, G. W. Inverarity, A. C. Lorenc, and T. J. Payne, 2007: The Met Office global 4-dimensional variational data assimilation system. *Quart. J. Roy. Meteor. Soc.*, **133**, 347–362, doi:10.1002/qj.32.

Stiller, O., 2009: Efficient moist physics schemes for data assimilation. II: Deep convection. *Quart. J. Roy. Meteor. Soc.*, **135**, 721–738, doi:10.1002/qj.362.

Stiller, O., and S. P. Ballard, 2009: Efficient moist physics schemes for data assimilation. I: Large-scale clouds and condensation. *Quart. J. Roy. Meteor. Soc.*, **135**, 707–720, doi:10.1002/qj.400.

Tian, X., Z. Xie, and A. Dai, 2008: An ensemble-based explicit four-dimensional variational assimilation method. *J. Geophys. Res.*, **113**, D21124, doi:10.1029/2008JD010358.

Wang, X., and T. Lei, 2014: GSI-based four dimensional ensemble-variational (4DEnsVar) data assimilation: Formulation and single-resolution experiments with real data for NCEP global forecast system. *Mon. Wea. Rev.*, **142**, 3303–3325, doi:10.1175/MWR-D-13-00303.1.

Wang, X., C. Snyder, and T. M. Hamill, 2007: On the theoretical equivalence of differently proposed ensemble–3DVAR hybrid analysis schemes. *Mon. Wea. Rev.*, **135**, 222–227, doi:10.1175/MWR3282.1.

Wang, X., D. M. Barker, C. Snyder, and T. M. Hamill, 2008a: A hybrid ETKF-3DVAR data assimilation scheme for the WRF model. Part I: Observing system simulation experiment. *Mon. Wea. Rev.*, **136**, 5116–5131, doi:10.1175/2008MWR2444.1.

Wang, X., D. M. Barker, C. Snyder, and T. M. Hamill, 2008b: A hybrid ETKF-3DVAR data assimilation scheme for the WRF model. Part II: Real observation experiments. *Mon. Wea. Rev.*, **136**, 5132–5147, doi:10.1175/2008MWR2445.1.

Wang, X., D. Parrish, D. Kleist, and J. S. Whitaker, 2013: GSI 3DVar-based ensemble–variational hybrid data assimilation for NCEP global forecast system: Single-resolution experiments. *Mon. Wea. Rev.*, **141**, 4098–4117, doi:10.1175/MWR-D-12-00141.1.

Zhang, M., and F. Zhang, 2012: E4DVar: Coupling an ensemble Kalman filter with four-dimensional variational data assimilation in a limited-area weather prediction model. *Mon. Wea. Rev.*, **140**, 587–600, doi:10.1175/MWR-D-11-00023.1.

^{1}

Lorenc (2003b) incorrectly suggests that *β*_{c} + *β*_{e} = 1.

^{2}

In our experiments,

^{3}

For a discussion of the relationship between *L* and other localization functions see Clayton et al. (2013, their section 2.5).

^{4}

Very small differences exist in practice due to the differences in linearization noted at the end of section 2.

^{5}

While Gauthier and Thépaut (2001) use the full energy norm, the Met Office scheme uses only the elastic term that depends on the pressure increment, and applies this to both pressure and pressure tendency.

^{6}

We ran an experiment without such measures: the scores (not shown) were much worse than for any experiment in section 3.

^{7}

Because our

^{8}

The apparent increases in imbalance after *T* + 12 are due to diurnal variations in grid-scale noise produced by the model’s convection scheme, and are unrelated to imbalances produced by the analysis increments.