## 1. Introduction

Nonvariational ensemble Kalman filters (Houtekamer and Mitchell 1998; Bishop et al. 2001; Anderson 2001; Whitaker and Hamill 2002) are now used across a wide range of fields. Variations on these techniques that involve expanding the ensemble size beyond the *K* ensemble members propagated by the nonlinear ensemble forecast have been proposed for differing reasons.

Bishop and Hodyss (2009, 2011) introduced ensemble expansion techniques in order to allow flow-dependent time-evolving ensemble covariance localization. These papers used the fact that an ensemble of size *K* that is expanded to a size of *M* = *LK* by taking the element-wise product of each raw member with each of the *L* columns of the square root of a localization matrix results in an *M*-member ensemble whose covariance is inherently localized. We shall hereafter refer to this type of expansion as a modulation product ensemble expansion. Leng et al. (2013) used an equivalent procedure to obtain nonadaptive inherent localization. Whitaker (2016) used a modulation product ensemble expansion to inherently localize ensemble covariances in the vertical and thus avoid the pitfalls outlined in Campbell et al. (2010) of attempting to localize satellite radiance ensemble covariances in radiance space rather than model space.
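The inherent-localization property of the modulation product can be verified directly. The sketch below (our own illustration; all variable names are assumptions) builds an *M* = *LK*-member modulated ensemble from *K* raw perturbations and the *L* columns of a generic localization square root, then checks that its covariance equals the element-wise product of the localization matrix and the raw ensemble covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, L = 30, 5, 4                      # state size, raw members, sqrt-localization columns

# Raw ensemble perturbations (mean removed), scaled so that P_raw = X @ X.T
X = rng.standard_normal((n, K))
X -= X.mean(axis=1, keepdims=True)
X /= np.sqrt(K - 1)

# A generic low-rank localization square root W; the implied localization matrix is C = W W^T
W = rng.standard_normal((n, L)) / np.sqrt(L)
C = W @ W.T

# Modulation product expansion: every raw member element-wise multiplied by every column of W
Z = np.column_stack([W[:, l:l+1] * X for l in range(L)])   # n x (L*K)

# The expanded ensemble covariance is inherently localized: Z Z^T = C ∘ (X X^T)
P_raw = X @ X.T
assert np.allclose(Z @ Z.T, C * P_raw)
```

The identity holds because each block contributes (**w**ₗ**w**ₗᵀ) ∘ (XXᵀ), and summing over *l* rebuilds the full localization matrix in the Schur product.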

Kretschmer et al.’s (2015) climatologically augmented local ensemble transform Kalman filter (ETKF) expands the ensemble size by introducing *M*–*K* climatological forecast error proxies to the raw *K* forecast error proxies produced by the nonlinear model to create an *M*-member ensemble. As shown by Bishop and Satterfield (2013), the mean of the distribution of true error variances given an imperfect ensemble variance is a weighted ensemble variance plus a weighted climatological variance. Kretschmer et al.’s innovation allows such hybrid error covariance models to be incorporated directly into the ETKF framework. Sommer and Janjic (2017) tested ensemble expansions similar to Kretschmer et al.’s (2015) but in their case they used them to account for model error.

Regardless of the motivation for using an ensemble of size *M* to update the ensemble mean while only propagating an ensemble of size *K*, one is faced with the question of how to create the *K* analysis perturbations that will be used to initialize the next *K*-member ensemble forecast. In considering how to do this, one must also account for the fact that when some type of ensemble expansion has been employed, some ensemble perturbations may be considered more representative of the true forecast error distribution than others. For well-tuned ensemble forecasting systems, the most representative ensemble perturbations will be the *K* ensemble perturbations produced by the nonlinear forecast model. The gain ETKF (GETKF) introduced in this paper provides a way of producing *K* analysis ensemble members from an *M*-member prior ensemble that can account for the fact that the *K* raw ensemble perturbations are likely better error proxies than arbitrarily selected members of the *M*-member ensemble. To illustrate the technique, we will focus on the case where the ensemble expansion is used to enable vertical ensemble covariance localization for the assimilation of satellite-like observations that are vertical integrals of the state. Section 2 uses a simple satellite-relevant data assimilation problem to illustrate how (i) the modulation ensemble expansion technique would improve the ability of the ETKF to extract information from satellites but (ii) does *not* provide an obvious solution for the problem of how to create a *K*-member analysis ensemble from the *M*-member analysis ensemble produced by the ETKF. Section 3 introduces the GETKF as a solution to this problem. Sections 4 and 5 compare the accuracy of the GETKF method for obtaining *K* analysis members with various ad hoc methods for obtaining *K* analysis members from the ETKF’s *M* analysis members.
Section 4 makes the comparison using statistical models and theoretically derived *true* analysis error covariance matrices while section 5 makes the comparison within the context of a newly developed *storm-track* version of the Lorenz-96 model and a cycling data assimilation scheme. Concluding remarks follow in section 6.

## 2. ETKF satellite data assimilation and modulated ensembles

### a. Modulated ensembles

Consider the problem of estimating an *n* = 100 gridpoint vertical profile of temperature from *p* = 100 satellite radiances whose vertical weighting functions are depicted in Fig. 1. Methods of localizing ensemble covariances in the vertical based on the distances between observations and model variables (Hamill et al. 2001) are inappropriate for such observations because the variable that is observed does not exist at a single height: each observation is an integral of variables at many different heights. Campbell et al. (2010) compared the performance of EnKFs that used model space vertical covariance localization, in which the localization is prescribed purely in terms of the distance between model variables, and EnKFs that used observation space localization, in which the localization is based on “estimated” distances between satellite observations and other variables. They found that the model space localization was superior to the observation space localization. In particular, the observation space localization approach was unable to recover the true state in the special case where there are as many perfect satellite observations as there are model variables. In contrast, with model space localization, EnKFs were readily able to recover the true model state in this case.

An inherently localized ensemble can be constructed so that its covariance satisfies

$$\mathbf{Z}\mathbf{Z}^{\mathrm{T}} = \mathbf{C} \circ \mathbf{P}^{f}, \qquad (1)$$

where $\mathbf{C}$ is the localization matrix, $\mathbf{P}^{f}$ is the raw ensemble covariance, and $\circ$ denotes the element-wise (Schur) product. Writing $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}}$ with $\mathbf{W} = [\mathbf{w}_{1}, \ldots, \mathbf{w}_{L}]$, and letting $\mathbf{x}_{j}^{f}$ denote the *j*th raw ensemble perturbation corresponding to the *j*th ensemble forecast member of the *K*-member ensemble, the columns of $\mathbf{Z}$ are given by the element-wise products

$$\mathbf{z}_{(l-1)K+j} = \mathbf{w}_{l} \circ \mathbf{x}_{j}^{f}, \qquad l = 1, \ldots, L, \quad j = 1, \ldots, K. \qquad (2)$$

In the case examined by Whitaker (2016), the modulation product pertained strictly to a vertical localization covariance matrix and $L \sim O(10)$; hence, the modulation product ensemble was an order of magnitude larger than the raw ensemble. The proof that the expanded ensemble given by (2) satisfies (1) is given in Eqs. (1) and (2) of Bishop and Hodyss (2009).

The element in the *i*th row and *j*th column of the true *n* × *n* forecast error covariance matrix is specified by (3), which prescribes height-dependent error correlation length scales such that the lower atmosphere has shorter correlation length scales (in *i* and *j*) than the upper atmosphere. With (3) as the true forecast error covariance matrix, we can obtain a reasonable counterpart for the *K* raw forecast ensemble perturbations using (4), in which each raw perturbation is a random draw from a normal distribution whose covariance is the true forecast error covariance matrix.

A *K* = 50 member ensemble was generated in this way, and its (unlocalized) covariance matrix was then computed. The localization matrix was constructed as follows:

- Broader length scales than those defining the true covariance matrix were used in (3) to create a correlation matrix that was similar to the true correlation matrix but with broader length scales.
- The (eigenvector, eigenvalue) pairs of this broadened correlation matrix were computed and ordered from largest eigenvalue to smallest eigenvalue. Having determined that 10 eigenvalues were sufficient to account for 85% of the sum of all the eigenvalues, the 10 leading (eigenvector, eigenvalue) pairs were used to create a low-rank approximation $\mathbf{W}\mathbf{W}^{\mathrm{T}}$. Note that each column of $\mathbf{W}$ is an eigenvector multiplied by the square root of its corresponding eigenvalue.
- The final low-rank localization matrix $\mathbf{C}$ was created by removing the deviation of the diagonal of $\mathbf{W}\mathbf{W}^{\mathrm{T}}$ from unity using $\mathbf{C} = \mathbf{D}^{-1/2}\mathbf{W}\mathbf{W}^{\mathrm{T}}\mathbf{D}^{-1/2}$, where $\mathbf{D} = \operatorname{diag}(\mathbf{W}\mathbf{W}^{\mathrm{T}})$.
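The construction above can be sketched as follows; the Gaussian correlation shape, grid size, and length scale below are our own illustrative assumptions, not the specific forms used in (3):

```python
import numpy as np

n = 100
z = np.arange(n)

# Broad-length-scale correlation matrix playing the role of the broadened version of (3)
# (a Gaussian shape is assumed here purely for illustration)
Cb = np.exp(-0.5 * ((z[:, None] - z[None, :]) / 15.0) ** 2)

# Eigenpairs ordered from largest to smallest eigenvalue
vals, vecs = np.linalg.eigh(Cb)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Keep the leading L pairs that explain at least 85% of the sum of the eigenvalues
frac = np.cumsum(vals) / vals.sum()
L = int(np.searchsorted(frac, 0.85) + 1)
W = vecs[:, :L] * np.sqrt(vals[:L])     # each column: eigenvector times sqrt(eigenvalue)

# Renormalize so the final localization matrix has an exactly unit diagonal
d = np.sqrt(np.sum(W ** 2, axis=1))     # sqrt of diag(W W^T)
W = W / d[:, None]
C = W @ W.T
assert np.allclose(np.diag(C), 1.0)
```

The diagonal renormalization is the matrix analog of dividing each row of **W** by the square root of the corresponding diagonal element of **WW**ᵀ.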

Figure 3 allows us to visualize the relationship between the raw ensemble members and the modulated ensemble members. Referring to (2), the first *K* members of the modulated ensemble are the raw members modulated by the first column of $\mathbf{W}$, the next *K* members are the raw members modulated by the second column, and so on. The *M* modulated ensemble perturbations are then added to the ensemble mean to create an *M*-member ensemble.

### b. The modulated ETKF (METKF) and the analysis ensemble reduction problem

A cost-effective way to assimilate *p* observations with an *M*-member ensemble is to compute the matrix of normalized observation space ensemble perturbations, where *H* is the (nonlinear) observation operator and the normalization involves the inverse square root of the *p* × *p* observation error covariance matrix. This matrix is obtained at the modest cost (*O*(*Mp*) operations) of computing the normalized observation space ensemble perturbations for each of the *M* modulated members. To test the resulting METKF, the true state was generated using (13), and *p* = 100 error-prone observations of this true state were then created.

The blue line in Fig. 4a depicts an example of a true model state generated using (13). In Fig. 4b, the blue line depicts the corresponding true state in observation space while the red line depicts the corresponding error-prone observations and the cyan line depicts the mean of the forecast ensemble in observation space. The difference between the observations (Fig. 4b, red line) and the prior mean in observation space (Fig. 4b, cyan line) is then used in (12) to correct the model space prior ensemble mean (Fig. 4a, cyan line). The resulting analyses obtained with and without a modulated ensemble are depicted in Fig. 4a by the mauve and black lines, respectively. Inspection of Fig. 4a shows that the *M* = *KL* = 500 member modulated ensemble allows the analysis (mauve line) to track the true state (blue line) more closely than the analysis (black line) from the unmodulated *K* = 50 member ensemble. Direct computation shows that, in this case, the mean square errors (MSEs) of the analyses with and without the modulated ensembles are 0.26 and 0.6, respectively. To check whether this difference was statistically significant, the aforementioned data assimilation experiment was repeated eight times using entirely independent random numbers to create the truth, the observations, and the ensemble. The dashed and solid lines give the MSEs for the unmodulated and modulated ensemble cases, respectively, in each of these eight experiments. In all eight cases, the MSE obtained using the modulated ensemble was lower than that obtained using the unmodulated ensemble. If there were no statistical difference in ETKF performance with and without modulated ensembles, then the probability of finding superior performance in eight out of eight cases would be $2^{-8} = 1/256 \approx 0.004$; the improvement is therefore statistically significant.
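The significance claim rests on simple binomial arithmetic:

```python
# Under the null hypothesis of no difference, each of the eight independent trials
# favors the modulated ensemble with probability 1/2, so winning all eight has probability:
p = 0.5 ** 8
print(p)   # 0.00390625, i.e., 1/256
```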

In a local implementation, (17) updates the *n* model variables associated with the *j*th vertical column of the model using the observations lying within the *j*th observation volume and the *n* × *M* matrix of modulated ensemble perturbations pertaining to the *j*th grid column. Note that (11) and (17) utilize the symmetric form of the ETKF transform matrix.

Updating entire vertical columns of variables saves as many calls to the ETKF solver as there are vertical model levels [*O*(100)]. However, it also increases the number of observations processed by the ETKF in each call. The net effect of this change to the computational cost will depend on the specific details of the LETKF implementation and the observational network. The number of floating point operations required for the modulated ensemble form of the LETKF depends on the number of observations *p* within the cylindrical observation volume used to update a vertical column, the number *n* of model variables in the vertical column being updated, the number of unmodulated ensemble members *K*, the number *L* of eigenvectors retained when approximating the square root of the vertical localization matrix, and the number of modulated ensemble members *M* = *LK*.

The operation count scaling for the ensemble update given by (17) is dominated by a term that is quadratic in the modulated ensemble size *M* = *LK* and hence quadratic in the number *L* of eigenvectors retained from the localization matrix. If only two eigenvectors were retained rather than 10, the dominant term would shrink by a factor of 25. With *L* = 10, updating a single vertical grid point with the modulated ensemble takes roughly 100 times longer than updating a single vertical grid point with no ensemble modulation. However, if there were 100 vertical grid points, the cost of updating the entire vertical column with or without modulation would be about the same.

Within the context of a cycling ensemble data assimilation scheme operating on a computer with sufficient resources to run an ensemble with *K* members, (17) presents a problem: it delivers *M* analysis members, *but there are only enough computational resources to propagate K members*. How might the *K* members to be propagated forward be obtained from the *M* analysis members?

One option, based on the *perturbed observations* approach (Burgers et al. 1998), would be to set each analysis member according to (18), where the *i*th member is built using the *i*th vector of *K* random normally distributed observation perturbations. Each sample of *K* perturbations has the mean removed so that the perturbations sum to zero. The *K* analysis states obtained from (18) are stored in the matrix of posterior ensemble members.

A *stochastic subsampling* approach would be to set the *K* analysis perturbations according to (19), in which the *M* METKF analysis perturbations are postmultiplied by an *M* × *K* matrix whose elements are independent random draws from a normal distribution. With this construction, the expected sample covariance of the resulting *K*-member ensemble is identical to the covariance of the *M*-member analysis ensemble.
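The covariance-preserving property that motivates stochastic subsampling can be checked numerically. In this sketch (our own; the N(0, 1/*K*) scaling of the random matrix is an assumption chosen so that the expectation works out), the *K*-member sample covariance converges to the *M*-member covariance as *K* grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, K = 6, 40, 100_000          # K is made large only to expose the expectation property

Za = rng.standard_normal((n, M))  # stand-in for the M METKF analysis perturbations
Pa_M = Za @ Za.T                  # covariance implied by the M-member ensemble

# Stochastic subsampling: postmultiply by an M x K matrix of N(0, 1/K) draws,
# so that E[G @ G.T] = I and hence E[(Za G)(Za G)^T] = Za Za^T
G = rng.standard_normal((M, K)) / np.sqrt(K)
Xa_K = Za @ G
Pa_K = Xa_K @ Xa_K.T

assert np.abs(Pa_K - Pa_M).max() < 0.05 * np.abs(Pa_M).max()
```

In a practical scheme the sample mean of the *K* draws would also be removed; that step is omitted here for brevity.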

A *deterministic subsampling* approach would be to set the analysis perturbations according to (20), which selects *K* columns of the METKF posterior perturbation matrix specified by a starting index *i* and an index step. Kretschmer et al. (2015) faced a similar problem of selecting *K* initial conditions from an *M*-member ensemble. In their case, the first *K* members of the *M*-member ensemble corresponded to raw forecast perturbations. These were considered better estimates of the true forecast error covariance matrix than the climatological perturbations. To obtain analysis perturbations closely associated with these members, they simply selected the first *K* members of the posterior ensemble. This is the case of deterministic subselection obtained when one sets the starting index and the index step both equal to 1.

All of these methods are easy to implement and add little to the cost of the method. In the next section we introduce the GETKF and in the section after that, we present the results of tests that show that the GETKF gives a *K*-member analysis ensemble covariance matrix that is closer to the true analysis error covariance matrix than those delivered by these simpler methods.

## 3. The gain form of the ETKF (GETKF)

The GETKF produces an update of the *K* unmodulated perturbations while respecting key aspects of the localized ensemble covariances used by the data assimilation scheme. Its derivation begins from the concise SVD of the normalized observation space perturbation matrix (22); by the *concise* SVD, we mean the SVD that removes all of the left and right singular vectors corresponding to zero singular values. Equation (22) implies (23), and using (23) in (21) gives (24). Like (11), (24) yields the form of the modified Kalman gain^{1} appropriate for ensemble perturbations discussed in Whitaker and Hamill (2002), and this gain may be applied directly to the *K* raw ensemble perturbations. Doing so yields an equation for the *i*th posterior ensemble member of the form (26), where the posterior analysis mean is given by (12). In the case of a single observation (*p* = 1), it can be shown that (26) gives results identical to Whitaker and Hamill’s (2002) and Anderson’s (2001) filters.
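The square root perturbation update at the heart of this construction can be sketched numerically. In the snippet below (our own notation and a deliberately simplified setting: a linear observation operator and R = I are assumptions, and the per-mode factor generalizes the Whitaker and Hamill (2002) scalar factor), the modified gain built from the SVD of the observation space perturbations reproduces the Kalman posterior covariance exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, M, p = 5, 12, 4

Z = rng.standard_normal((n, M))          # modulated perturbations: prior covariance Pf = Z Z^T
H = rng.standard_normal((p, n))          # linear observation operator; R = I assumed
Pf = Z @ Z.T

# Concise SVD of the normalized observation space perturbations S = H Z (since R = I)
S = H @ Z
U, s, Vt = np.linalg.svd(S, full_matrices=False)

# Per-mode square root factor: [s^2/(1+s^2)] * [1 + (1+s^2)^(-1/2)]^(-1),
# the multimode analog of the Whitaker and Hamill (2002) scalar factor
D = (s**2 / (1.0 + s**2)) / (1.0 + 1.0 / np.sqrt(1.0 + s**2))

# Gain-form perturbation update: Za = Z - Ktilde H Z = Z (I - G D G^T), with G = Vt^T
Za = Z - (Z @ Vt.T * D) @ Vt

# Check: the updated perturbations carry exactly the Kalman posterior covariance
Pa = Pf - Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + np.eye(p)) @ H @ Pf
assert np.allclose(Za @ Za.T, Pa)
```

In the GETKF proper, this same modified gain would be applied to the *K* raw forecast perturbations rather than to the modulated perturbations themselves.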

Equation (29) produces the *n* × *K* matrix of analysis perturbations by applying the modified gain constructed from the *M*-member modulated ensemble to the *K* forecast ensemble perturbations. The scalar factor *a* defined by (30) is referred to as the *inherent GETKF inflation*. This factor ensures that the average analysis error variance produced by the GETKF is identical to that produced by the METKF. We shall denote the covariance matrix of the analysis ensemble obtained using (30) in (29) by $\mathbf{P}^{a}_{\mathrm{GETKF}}$.

With (29), the number of operations required to update an individual ensemble member is roughly the same as that required to update the ensemble mean [see (12)]. This means that the dominant computational cost of updating all members is proportional to *K* times the cost of a single mean update; with *K* ~ 100, this gives a total cost proportional to about 100 mean updates.

## 4. Comparison of the accuracy of the GETKF analysis covariance with the perturbed observation and subsampling alternatives in a simple model

The accuracies of the *K*-member analysis ensemble covariance matrices produced by each method were measured against the true analysis error covariance matrix using two measures: an elementwise mean square error (33) and an elementwise correlation (34). The correlation measure (34) is equal to unity whenever one covariance matrix is a scalar multiple *b* of the other, where *b* is some arbitrary scalar. Thus, (34) may be viewed as a measure of the similarity of the shapes of the two covariance matrices, as it is independent of the amplitude of the field, while (33) measures some combination of the similarity of shape and amplitude. The MSE and correlation measures of accuracy were computed over eight entirely independent trials. Figure 5 plots the MSE and correlation measures over each of these trials for each of the methods considered.
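A plausible concrete realization of the two measures can be sketched as follows (the exact forms of (33) and (34) are not reproduced in the text, so the definitions below are assumptions): an elementwise mean square difference and an elementwise anomaly correlation that is unity whenever one matrix is a positive scalar multiple of the other:

```python
import numpy as np

def cov_mse(A, B):
    # Elementwise mean square difference between two covariance matrices
    return float(np.mean((A - B) ** 2))

def cov_corr(A, B):
    # Anomaly correlation of the elements of two covariance matrices;
    # equals 1 whenever B = b*A for any positive scalar b
    a, b = A.ravel(), B.ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(4)
X = rng.standard_normal((8, 3))
A = X @ X.T

# Scaling by a positive scalar leaves the correlation measure at unity...
assert np.isclose(cov_corr(A, 3.0 * A), 1.0)
# ...while the MSE measure responds to amplitude as well as shape
assert cov_mse(A, 3.0 * A) > 0.0
```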

We have not displayed the results for the deterministic subsampling technique in Fig. 5 because it was found to be distinctly less accurate than the other techniques. With *K* = 50 and *L* = 10 in our simple model experiment, this approach selects six members (1, 10, 19, 28, 37, and 46) that are raw members modulated by column 1 of the truncated renormalized localization matrix and six more members (55, 64, 73, 82, 91, and 100) that are raw members modulated by column 2, and so on until 50 members are obtained. There was not a great deal of sensitivity to changing which modulated ensemble members were selected. The deterministic selection procedure associated with the above index step was found to be better than or statistically indistinguishable from other selection procedures. In this case, this selection procedure was significantly superior to that used by Kretschmer et al. (2015) of just choosing the first 50 members of the expanded ensemble, which correspond to the 50 raw members modulated by column 1 of the localization matrix. However, since the ensemble expansion technique employed by Kretschmer et al. (augmentation with climatological forecast error proxies) is very different from the ensemble modulation expansion technique used here, our result does not imply that Kretschmer et al.’s deterministic subselection approach was suboptimal for their application.

Most promising for the GETKF is the fact that its analysis ensemble covariance matrix estimate was *always* closer to the true forecast error covariance matrix than any of the other techniques of obtaining *K* analysis perturbations from the *M* METKF analysis perturbations.

## 5. Cycling experiments with a simple dynamical model

To further examine GETKF performance, here we test it in a data assimilation cycling mode using a newly created “storm track” version of the Lorenz-96 model (Lorenz and Emanuel 1998) and observations that are an integral of the state.

The forcing *F* (which is set to a constant value of 8 in the original model) is treated as an independent random variable at every grid point. Specifically, if the forcing at a grid point at the previous time step was $F_{\mathrm{old}}$, then the forcing at the current time step is given by the red-noise recursion

$$F_{\mathrm{new}} = rF_{\mathrm{old}} + (1 - r)G, \qquad (35)$$

where *G* is a random number drawn from a gamma distribution. To interpret the scalar *r* in this equation, note that right multiplying (35) by $F_{\mathrm{old}}$ and taking expectations shows that *r* governs the decorrelation of *F* over one time step. We set *r* so that the autocorrelation of *F* has an *e*-folding time scale of three model time steps. Taking the expectation of (35) shows that, in the long run, the mean of the *F* values produced by (35) must equal the mean of *G*. We set the mean of the gamma distribution from which the values of *G* are drawn to be equal to 8, thus ensuring that the mean of *F* is also 8. The parameters of the gamma distribution were further chosen so that the variance of *F* about its mean value of 8 was equal to ⅛.
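Whatever the exact form of (35), a first-order autoregression toward gamma-distributed draws reproduces the stated properties (mean 8, variance ⅛, *e*-folding time of three steps). The sketch below is our own illustration; the recursion form and gamma parameters are assumptions derived from those stated targets:

```python
import numpy as np

rng = np.random.default_rng(3)

r = np.exp(-1.0 / 3.0)           # autocorrelation e-folding time of three time steps
mean_F, var_F = 8.0, 1.0 / 8.0

# Stationary AR(1) variance: Var(F) = (1-r)/(1+r) * Var(G); solve for Var(G)
var_G = var_F * (1.0 + r) / (1.0 - r)
shape, scale = mean_F**2 / var_G, var_G / mean_F   # gamma mean = shape*scale = 8

F = np.full(40, mean_F)          # one independently forced value per grid point
hist = []
for _ in range(20_000):
    G = rng.gamma(shape, scale, size=F.shape)
    F = r * F + (1.0 - r) * G    # assumed red-noise recursion for the forcing
    hist.append(F.copy())
hist = np.asarray(hist)

assert abs(hist.mean() - 8.0) < 0.05
assert abs(hist.var() - var_F) < 0.02
```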

The second modification replaces the constant linear damping of the original model with a spatially varying linear damping term; this zonal variation of the damping is what gives the model its storm-track character.

Data assimilation experiments are performed with eight ensemble members, using the serial algorithm of Whitaker and Hamill (2002) (incorporating both observation space localization and model space localization via modulated ensembles), the METKF and the GETKF (both of which employ model space localization using modulated ensembles). All experiments use the observation-dependent posterior inflation algorithm of Hodyss et al. (2016).^{2} The tunable parameters for the inflation scheme [*a* and *b* in Eq. (4.4) of Hodyss et al. (2016)] are both fixed to 1.0 for all of the experiments. Note that for the GETKF, the Hodyss et al. posterior inflation is applied to the perturbations obtained using (29), where (29) includes the multiplicative factor *a* that through (30) ensures that the trace of the GETKF posterior covariance matrix is identical to that of the corresponding METKF posterior covariance.

Each observed value is equal to the average of seven spatially contiguous grid points. Each grid point has a unique average associated with it. Anderson and Lei (2013) found that such *integral* observations are particularly challenging for observation space localization in the standard 40-variable Lorenz-96 model. These observations are analogous to satellite radiance observations, where the forward radiative transfer operator involves a vertical integral of the state.^{3}
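The observation operator described here, a periodic seven-point running average on the 80-point grid, can be written as a matrix (variable names are ours):

```python
import numpy as np

n = 80        # grid points in the storm-track Lorenz-96 domain
width = 7     # each observation averages seven contiguous grid points

# H[k, j] = 1/7 for the seven points centered (periodically) on grid point k
H = np.zeros((n, n))
for k in range(n):
    for off in range(-(width // 2), width // 2 + 1):
        H[k, (k + off) % n] = 1.0 / width

# Every grid point has a unique running average associated with it
assert np.allclose(H.sum(axis=1), 1.0)
assert np.allclose(H @ np.ones(n), np.ones(n))
```

Each row integrates the state over a local kernel, which is why no single "location" can be assigned to an observation without some arbitrariness.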

Since we chose to assimilate all 80 unique seven-point running averages of the system each data assimilation cycle, the observation error covariance matrix is 80 × 80 and is taken to be diagonal.

When using idealized models for data assimilation experiments, it is of interest to note the ratio of the error-doubling time to the data assimilation time interval. While we have not performed a detailed analysis of the error-doubling time in this model, we do know how our modifications to the original Lorenz and Emanuel (1998) model alter the growth of ensemble spread. Specifically, allowing the diffusion to vary zonally had little overall impact on the growth of the ensemble spread, but changing the forcing *F* from a constant to a randomly varying *F* increased the growth of the spread over a single time step from 1.15 to 1.66. This suggests that the error-doubling time for our modified version of this model is even shorter than that of the original model. Lorenz and Emanuel (1998) state that the error-doubling time of their original model was 2 days, which is 8 times larger than the data assimilation time interval used in our experiments; since our modified model grows spread faster, the corresponding ratio in our experiments would be less than 8.

Covariance localization is based on the Gaspari and Cohn (1999, hereafter GC) compactly supported correlation function of the distance between the *i*th and *j*th grid points and a length scale *l*. The structure of the climatological covariance matrix (Fig. 7) suggests that a spatially varying covariance localization length scale, tighter in the center of the domain and broader at each end, should perform better than a constant localization scale. The variation of the covariance function length scale appears to mirror the spatial structure of the linear damping terms; furthermore, dynamical reasoning suggests that error correlation length scales would also be partially controlled by the linear damping term. For those two reasons, and to reduce the size of the parameter space of varying localization length scales to be explored, we chose to let the GC localization length scale have the same spatial structure as the linear damping term via (38), where *m* is the index locating the grid point of interest and $d_0$ is a reference localization length scale. The (*i*, *j*)th element of the localization matrix is then given by (39), in which the GC function is evaluated using the average of the length scales at the *i*th and *j*th grid points; averaging the length scales at the *i*th and *j*th grid points in (39) ensures that the localization matrix is symmetric. We have confirmed that the use of the spatially varying length scale given by (38) results in analysis errors that are about 10% smaller than those obtained with any constant localization length scale. A plot of the localization matrix arising from (39) is given in Fig. 8 for the case where the reference localization length scale $d_0 = 20$.
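A localization matrix of this kind can be sketched with the GC fifth-order function and averaged length scales in the spirit of (39). The particular spatially varying length-scale profile below (tight mid-domain, broad at the ends) is our own illustrative assumption:

```python
import numpy as np

def gc(r):
    # Gaspari and Cohn (1999) fifth-order compactly supported correlation function,
    # evaluated at normalized distance r = distance / length scale
    r = np.abs(np.asarray(r, dtype=float))
    out = np.zeros_like(r)
    m1, m2 = r <= 1.0, (r > 1.0) & (r < 2.0)
    r1, r2 = r[m1], r[m2]
    out[m1] = (((-0.25 * r1 + 0.5) * r1 + 0.625) * r1 - 5.0 / 3.0) * r1**2 + 1.0
    out[m2] = (((((r2 / 12.0 - 0.5) * r2 + 0.625) * r2 + 5.0 / 3.0) * r2 - 5.0) * r2
               + 4.0 - 2.0 / (3.0 * r2))
    return out

n = 80
i = np.arange(n)
dist = np.abs(i[:, None] - i[None, :])
dist = np.minimum(dist, n - dist)                 # periodic domain

# Illustrative spatially varying length scale: tight mid-domain, broad at the ends
d = 20.0 - 10.0 * np.sin(np.pi * i / n) ** 2

# Averaging the i-th and j-th length scales, as in (39), guarantees symmetry
Lscale = 0.5 * (d[:, None] + d[None, :])
C = gc(dist / Lscale)

assert np.allclose(C, C.T)
assert np.allclose(np.diag(C), 1.0)
```

Because each (*i*, *j*) pair sees the same averaged length scale as (*j*, *i*), symmetry holds by construction, whatever length-scale profile is used.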

In our implementation of the observation space localization form of the serial EnSRF, each observation is used to update the mean and *K* raw perturbations of the state variables using ensemble covariances localized with the function given by (39), in which the distance is measured from the central grid point of the *k*th boxcar averaging kernel. In other words, the “location” assigned to each observation is the fourth grid point of the seven consecutive grid points involved in the running average that defines the observation.

For model space localization, a synthetic modulation product ensemble is created by modulating the eight-member ensemble with the eigenvectors of the localization matrix implied by the spatially varying GC localization function using the procedure described in section 2. In all experiments, we ensure that the number *L* of leading eigenvectors retained in our approximation of (39)’s localization matrix is sufficient to explain 99% of the trace of the localization matrix. The scaled eigenvectors are then used to perform the modulation, leading to a synthetic ensemble of size 8*L*, where *L* is a function of the reference localization length scale: as the localization length scale decreases, *L* increases. When the modulated ensemble is used in the serial EnSRF, each observation and the modulated ensemble covariances are used to update the mean, the *M* modulated perturbations, and the *K* raw perturbations of the state variables. The modulated ensemble covariances are not localized because the localization is already “baked into” the ensemble via the modulation product. After all observations have been assimilated by the serial EnSRF, the fully updated *K* raw perturbations are added back to the posterior mean to create the final posterior ensemble and this *K*-member ensemble is propagated forward by the nonlinear model.

The GETKF simultaneously assimilates all of the observations in the local observation volume, using (12) to update the ensemble mean and (29) to update the *K* raw ensemble perturbations. In this toy model example, the local observation volume is global in that it contains *all* of the observations.

The METKF, on the other hand, involves the computation of a set of weights that are used to transform the entire set of 8*L*-member background ensemble perturbations into an 8*L*-member set of analysis ensemble perturbations. The first eight members of the *prior* 8*L*-member modulated ensemble are in fact the original eight members propagated by the forecast model multiplied by the first eigenvector of the localization matrix. This knowledge suggests an ad hoc approach in which one would try to “undo” the modulation in the posterior ensemble by elementwise dividing each of the first eight members of the posterior ensemble by the first eigenvector of the localization matrix. Since the first eigenvector typically has a relatively simple structure without zero values, demodulation of the first eight posterior members by the first eigenvector seems like the best option. We call this approach the “demodulated” METKF.

Figure 9 shows the structure of the first eigenvector of the localization matrix. Because demodulation divides elementwise by this eigenvector, small-amplitude elements amplify roundoff error: setting *d*_{0} = 10 (15) instead of 20 results in a loss of 9 (5) digits of precision. We note in the case of the operational NCEP data assimilation system that the first eigenvector of the vertical localization matrix contains values very close to zero, and these near-zero values cause an extreme loss of precision in the updated ensemble, so that the demodulated METKF approach is infeasible. For the same reason, the demodulated METKF approach was also unworkable for the set of problems considered in section 4. As previously noted, another alternative is to use the perturbed observations approach [see (18)].

Figure 10 shows the root-mean-square error (RMSE) of the ensemble mean analysis (relative to the nature run used to generate the observations) for the serial filter using observation space localization, the METKF (both the “perturbed-obs” and demodulated variants), and the GETKF using model space localization, as a function of the GC localization length scale *d*_{0}. For reference, the horizontal black dashed line shows the near-optimal analysis error obtained by running a 256-member ETKF with no localization. For all values of *d*_{0}, model space localization outperforms observation space localization. The GETKF outperforms the demodulated METKF for the *d*_{0} values for which the demodulated METKF is stable. The demodulated METKF fails for the smaller values of *d*_{0}, for which elements of the first eigenvector of the localization matrix become small enough to cause a catastrophic loss of precision.

The modulated-ensemble serial filter with model space localization performs identically to the GETKF when the adjustment factor *a* in (29) is set equal to 1. However, when (30) is used to define *a*, the GETKF outperforms the modulated-ensemble serial filter for localization length scales larger than the optimal localization length scale.

Figure 11 shows that when (30) is used to define *a*, its average value increases with increasing localization length scale *d*_{0}. Near the RMSE-minimizing localization length scale, *a* is approximately equal to 1 and has a neutral or small effect on ensemble perturbation size. If it were generally true that the localization that makes the inherent GETKF inflation neutral also minimizes RMSE, one could adaptively decide the vertical localization for each vertical column by varying the localization until the “*a*” obtained from (30) became equal to 1. We leave it to future research to assess the merit of the associated hypothesis that *the localization that neutralizes inherent GETKF inflation also minimizes RMSE*.

## 6. Conclusions

The GETKF has been introduced and described. It is a variation on the ETKF that provides a solution to the problem of how to rapidly obtain just *K* posterior ensemble members from an ETKF-type method when the size of the forecast ensemble has been synthetically increased from *K* members to *KL* members. To better assess the potential value of the GETKF, alternative methods for creating just *K* analysis members from *KL* members were also examined. These alternative methods included the well-established perturbed observation method, a stochastic subsampling of the analysis distribution implied by the *KL* member posterior ensemble, a deterministic subsampling approach, and a demodulation approach.

In tests with a statistical model that used 50 raw ensemble members to assimilate a vertical profile of observations, each of which was an integral of the state and in which the true suboptimal analysis error covariance matrix was perfectly known, it was found that the GETKF produced significantly more accurate analysis error covariance matrices than any of the aforementioned alternatives.

In cycling data assimilation tests with a newly developed 80-variable storm-track version of the Lorenz-96 model and observations that were integrals of the state, the following results were obtained:

- Model space localization outperformed observation space localization.
- The GETKF method for obtaining a *K*-member posterior ensemble from a *KL*-member prior ensemble resulted in lower mean square analysis errors than either the demodulation or perturbed observation methods.
- If the GETKF’s posterior adjustment factor was set equal to unity rather than the value given by (30), GETKF’s performance was identical to that obtained when modulated ensembles were used in the serial EnSRF and the serial EnSRF’s modified gain was used to obtain *K* posterior perturbations.
- The GETKF gave superior or equivalent performance to the EnSRF when (30) was used to set the GETKF’s posterior adjustment factor *a*. The superior performance was confined to localization length scales larger than the optimal localization length scale. Intriguingly, at the optimal localization length scale, the average value of *a* was approximately equal to 1.

In dynamical systems that have a richer range of scales than the simple storm-track model considered here, it can be impractical to optimally tune the localization length scale for all the phenomena likely to occur. In such situations, the lack of sensitivity of GETKF performance to localization length scale could lend it advantages over the EnSRF.

Within the context of LETKFs, our simple model results suggest that the GETKF ensemble update algorithm should replace the ETKF ensemble update when modulation product ensembles have been used for vertical model space localization. Penny et al. (2015) have demonstrated how the removal of vertical localization allows the LETKF to update entire vertical columns of state variables simultaneously. The GETKF with ensembles modulated in the vertical also simultaneously updates entire vertical columns of state variables, but in contrast to Penny et al. (2015), it incorporates vertical *model space* localization. For observations of variables that are vertical integrals of the state such as satellite-based radiance observations, the vertical “location” of the observation is ill-defined. This makes observation space localization particularly problematic. Our experiments together with those of Campbell et al. (2010) and Whitaker (2016) have found that model space localization was more effective than observation space localization when assimilating observations of variables that are vertical integrals of the state.

Within the context of deterministic EnKFs that assimilate observations serially and employ model space vertical localization through a modulation product ensemble expansion, our results suggest that performance might be improved by replacing the one-at-a-time serial assimilation of a vertical column of observations by an “all at once” assimilation of the entire vertical column of observations using the GETKF. In such systems, the LETKF could achieve ensemble covariance localization in the horizontal using horizontal-distance-dependent observation error variance inflation while a serial deterministic EnKF could achieve it using localization functions that were solely a function of horizontal distance.

In future work, we plan to apply a local volume version of the GETKF, using modulation product ensembles to assimilate satellite radiances with vertical localization in model space. Preliminary experiments using a serial assimilation approach (Whitaker 2016) have shown this approach to significantly enhance the ability of ensemble methods to extract information from satellite radiances. Apart from the potential skill gains mentioned above, it is possible that the local GETKF version of the algorithm will be found to be computationally more efficient than the serial filter in atmospheric applications because 1) the number of observations typically far exceeds the number of modulated ensemble members and 2) the cost of the matrix computations required by the GETKF is governed by the number of modulated ensemble members rather than by the number of observations.

*Acknowledgments.* CHB gratefully acknowledges funding support from the Chief of Naval Research through the NRL Base Program (PE 0601153N). JSW and LL acknowledge the support of the Disaster Relief Appropriations Act of 2013 (P.L. 113-2) that funded NOAA Research Grant NA14OAR4830123.

# APPENDIX

## Analysis Error Covariance Matrix in the Presence of a Suboptimal Gain

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.

Anderson, J. L., and L. Lei, 2013: Empirical localization of observation impact in ensemble Kalman filters. *Mon. Wea. Rev.*, **141**, 4140–4153, doi:10.1175/MWR-D-12-00330.1.

Bishop, C. H., and D. Hodyss, 2009: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. *Tellus*, **61A**, 97–111, doi:10.1111/j.1600-0870.2008.00372.x.

Bishop, C. H., and D. Hodyss, 2011: Adaptive ensemble covariance localization in ensemble 4D-VAR state estimation. *Mon. Wea. Rev.*, **139**, 1241–1255, doi:10.1175/2010MWR3403.1.

Bishop, C. H., and E. A. Satterfield, 2013: Hidden error variance theory. Part I: Exposition and analytic model. *Mon. Wea. Rev.*, **141**, 1454–1468, doi:10.1175/MWR-D-12-00118.1.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436, doi:10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.

Bonavita, M., M. Hamrud, and L. Isaksen, 2015: EnKF and hybrid gain ensemble data assimilation. Part II: EnKF and hybrid gain results. *Mon. Wea. Rev.*, **143**, 4865–4882, doi:10.1175/MWR-D-15-0071.1.

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724, doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.

Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Covariance localization for satellite radiances in ensemble Kalman filters. *Mon. Wea. Rev.*, **138**, 282–290, doi:10.1175/2009MWR3017.1.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757, doi:10.1002/qj.49712555417.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790, doi:10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

Hamrud, M., M. Bonavita, and L. Isaksen, 2015: EnKF and hybrid gain ensemble data assimilation. Part I: EnKF implementation. *Mon. Wea. Rev.*, **143**, 4847–4864, doi:10.1175/MWR-D-14-00333.1.

Hodyss, D., W. Campbell, and J. Whitaker, 2016: Observation-dependent posterior inflation for the ensemble Kalman filter. *Mon. Wea. Rev.*, **144**, 2667–2684, doi:10.1175/MWR-D-15-0329.1.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811, doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. *Physica D*, **230**, 112–126, doi:10.1016/j.physd.2006.11.008.

Kretschmer, M., B. R. Hunt, and E. Ott, 2015: Data assimilation using a climatologically augmented local ensemble transform Kalman filter. *Tellus*, **67A**, 26617, doi:10.3402/tellusa.v67.26617.

Leng, H., J. Song, F. Lu, and X. Cao, 2013: A new data assimilation scheme: The space-expanded ensemble localization Kalman filter. *Adv. Meteor.*, **2013**, 410812, doi:10.1155/2013/410812.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulations with a small model. *J. Atmos. Sci.*, **55**, 399–414, doi:10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

Penny, S. G., 2014: The hybrid local ensemble transform Kalman filter. *Mon. Wea. Rev.*, **142**, 2139–2149, doi:10.1175/MWR-D-13-00131.1.

Penny, S. G., D. W. Behringer, J. A. Carton, and E. Kalnay, 2015: A hybrid Global Ocean Data Assimilation System at NCEP. *Mon. Wea. Rev.*, **143**, 4660–4677, doi:10.1175/MWR-D-14-00376.1.

Sommer, M., and T. Janjic, 2017: A flexible additive inflation scheme for treating model error in ensemble Kalman filters. *Proc. 19th European Geophysical Union General Assembly*, Vienna, Austria, EGU2017-7393, http://meetingorganizer.copernicus.org/EGU2017/EGU2017-7393.pdf.

Wang, X., C. H. Bishop, and S. J. Julier, 2004: Which is better, an ensemble of positive–negative pairs or a centered spherical simplex ensemble? *Mon. Wea. Rev.*, **132**, 1590–1605, doi:10.1175/1520-0493(2004)132<1590:WIBAEO>2.0.CO;2.

Wang, X., T. M. Hamill, J. S. Whitaker, and C. H. Bishop, 2007: A comparison of hybrid ensemble transform Kalman filter–Optimum interpolation and ensemble square root filter analysis schemes. *Mon. Wea. Rev.*, **135**, 1055–1076, doi:10.1175/MWR3307.1.

Whitaker, J. S., 2016: Performing model space localization for satellite radiances in an ensemble Kalman filter. *20th Conf. on Integrated Observing and Assimilation Systems for the Atmosphere, Oceans, and Land Surface (IOAS-AOLS)*, New Orleans, LA, Amer. Meteor. Soc., P253, https://ams.confex.com/ams/96Annual/webprogram/Paper281727.html.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924, doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.

^{1} The modified gain discussed here maps prior ensemble perturbations to posterior perturbations. In contrast, the modified gains of Penny (2014), Hamrud et al. (2015), and Bonavita et al. (2015) map prior means to posterior means.

^{2} Code in the Python programming language to reproduce all of the experiments shown here is available online (https://github.com/jswhit/L96).

^{3} Many radiance observations are vertical integrals of nonlinear functions of the state but, for simplicity, we ignore such complexities in this paper.