1. Introduction
The Kalman filter was first proposed by Kalman (1960) under the assumptions of linear dynamics, known error covariances, and Gaussian random variables. Evensen (1994) introduced the ensemble Kalman filter (EnKF) into this context. The EnKF uses a short-term forecast ensemble to estimate flow-dependent background error covariance, at a computational cost far more affordable than that of the original Kalman filter for high-dimensional problems. As variations of the EnKF, several ensemble square root filters (EnSRF) have been developed [for a summary see Tippett et al. (2003)] to maintain consistency between the ensemble and the covariance (see Houtekamer and Mitchell 2001; Anderson 2001; Bishop et al. 2001; Whitaker and Hamill 2002; Snyder and Zhang 2003). Ott et al. (2004) designed the local ensemble transform Kalman filter so that the state variables can be updated completely in parallel. Meanwhile, considerable effort has been devoted to combining the EnKF with three- and four-dimensional variational data assimilation (3D/4D-Var; e.g., Hamill and Snyder 2000; Lorenc 2003; Wang et al. 2008a,b; Zhang et al. 2009, 2013). Sampling error is a problem common to all ensemble Kalman filters.
Since the ensemble size is usually far smaller than the state dimension in geoscience applications, the sample covariance computed from the ensemble members cannot capture all the main properties of the true covariance. Much of this sampling error can be eliminated when the physical correlation radius is small compared with the spatial extent of the state, which is usually the case in numerical weather prediction. Houtekamer and Mitchell (2001) and Hamill et al. (2001) used localization to remove spurious correlations between distant variables. Localization can be used in all ensemble-based filters and has been shown to be a powerful tool for limiting the sampling error due to a small ensemble size. The Gaspari–Cohn (GC) localization function (Gaspari and Cohn 1999) is symmetric positive definite and widely used across a broad range of applications. As an alternative, Jun et al. (2011) considered estimating the background covariance by smoothing the sample covariance with a kernel function. The GC function is effective for spatial localization, but until recently the localization radius was chosen empirically in most applications. Anderson (2012) shed new light on this problem with a probabilistic approach, and Anderson and Lei (2013) extended that approach to determine the localization radius based on observation impact. Lei and Anderson (2014) chose the localization radius that minimizes the difference between the sample covariance and localized sample covariances computed from different samples. Meanwhile, Bishop and Hodyss (2007) and Bishop et al. (2011) proposed using entry-by-entry powers of the sample correlation together with a nonadaptive empirical localization function.

The method proposed in the current study is based on a sequential ensemble square root filter; the algorithm is presented in detail in Table 1. For covariance inflation we use the relaxation method of Zhang et al. (2004). We emphasize that the method discussed in this manuscript applies only when observation errors are uncorrelated; generalization to correlated observation errors will be the subject of future research.
Table 1. Algorithm of the sequential localized ensemble square root Kalman filter: Algorithm AK.
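Table 1 gives the full algorithm AK. As a point of reference only, the following is a minimal sketch of one serial square root update for a single uncorrelated observation, using the deterministic square root factor of Whitaker and Hamill (2002); the function name and interfaces are illustrative, not the paper's notation.

```python
import numpy as np

def serial_ensrf_update(X, obs_idx, y, r_obs, taper):
    """One serial EnSRF step: assimilate a scalar observation y taken
    at grid point obs_idx into the n x N ensemble X, with observation
    error variance r_obs and a length-n localization taper."""
    n, N = X.shape
    xm = X.mean(axis=1)                     # ensemble mean
    Xp = X - xm[:, None]                    # perturbations
    hx = X[obs_idx, :]                      # H x for a restriction operator H
    hxp = hx - hx.mean()
    hpht = hxp @ hxp / (N - 1)              # scalar H P H^T
    K = taper * (Xp @ hxp) / (N - 1) / (hpht + r_obs)     # localized gain
    xm_a = xm + K * (y - hx.mean())         # mean update
    beta = 1.0 / (1.0 + np.sqrt(r_obs / (hpht + r_obs)))  # WH02 factor
    Xp_a = Xp - np.outer(beta * K, hxp)     # square root perturbation update
    return Xp_a + xm_a[:, None]
```

In a serial filter this update is applied observation by observation, with the analysis ensemble from one observation serving as the prior for the next.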

The structure of this paper is as follows. In section 2 we propose a variational approach that finds the optimal localization radius by minimizing a cost function when the true covariance is known. In section 3 we generalize the method to the case where the true covariance is unknown but can be represented probabilistically from the sample covariance. Numerical results on the performance of the method are presented throughout sections 2, 3, and 4; in particular, section 4 contains numerical results based on the Lorenz-96 system. Section 5 provides a summary and a discussion of the proposed localization method.
2. Optimal localization with true covariance: A cost function approach
As mentioned earlier, when the physical correlation radius is small, the sampling noise at large distances can be largely removed by localization. Nevertheless, this benefit can come at the expense of removing small but nonnegligible true covariance beyond the cutoff distance and of damping the true covariance within the localization radius. Figure 1 is a visual illustration of the effect of covariance localization. In this experiment, we use the GC correlation function as the true covariance, setting the true radius of influence (ROI) equal to 5, and draw 61 sample vectors from the Gaussian distribution whose covariance is this GC function.
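For reference, a sketch of the GC correlation function used here (Gaspari and Cohn 1999, their Eq. 4.10): a fifth-order piecewise rational taper with half-width c that vanishes identically beyond 2c.

```python
import numpy as np

def gc_correlation(z, c):
    """Gaspari-Cohn compactly supported taper with half-width c.
    Returns 1 at z = 0 and exactly 0 for z >= 2c."""
    r = np.atleast_1d(np.abs(np.asarray(z, dtype=float))) / c
    out = np.zeros_like(r)
    near, far = r <= 1.0, (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    out[near] = (-0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3
                 - (5.0 / 3.0) * rn**2 + 1.0)
    out[far] = ((1.0 / 12.0) * rf**5 - 0.5 * rf**4 + 0.625 * rf**3
                + (5.0 / 3.0) * rf**2 - 5.0 * rf + 4.0 - (2.0 / 3.0) / rf)
    return out
```

On the periodic domains used later, the separation z between grid points i and j would be the cyclic distance min(|i − j|, n − |i − j|).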

Fig. 1. (top) Gaspari–Cohn covariance (blue line), which serves as the true covariance with radius of influence equal to 5; ensemble covariance (red line) with ensemble size N = 61; and localized sample covariance (black line) with localization radius 24. (bottom) The difference between the true covariance and the sample/localized sample covariance. This panel shows that localization can significantly reduce sampling error when the true correlation radius is far smaller than the dimension of the domain.
Cost function F0
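A plausible form for a cost function of this type, assuming it measures the squared mismatch between the localized sample covariance and the known true covariance (an illustrative sketch only, not necessarily the exact expression):

```latex
F_0(r) \;=\; \bigl\lVert \boldsymbol{\rho}_r \circ \widehat{\mathbf{P}}
  - \mathbf{P}^t \bigr\rVert_F^2
  \;=\; \sum_{i,j}\bigl[\rho_r(i,j)\,\widehat{P}_{ij} - P^t_{ij}\bigr]^2,
\qquad
\mathrm{ROI}^{(0)} \;=\; \arg\min_r F_0(r),
```

where ρ_r is the GC taper with radius of influence r, ∘ is the Schur (entrywise) product, P̂ the sample covariance, and P^t the true covariance.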










Based on the above analysis, we propose algorithm A0 (Table 2) for evaluating the optimal ROI; its results are shown in the left part of Table 3. In algorithm A0, the resulting optimal radius is denoted ROI(0).
Table 2. Algorithm A0.
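Under the illustrative form of F0 sketched above, A0 reduces to a one-dimensional search over candidate radii. A minimal Python sketch, reusing gc_correlation from earlier in this section (the mapping between the ROI and the GC half-width is our assumption):

```python
import numpy as np

def roi0_search(P_true, P_sample, dist, candidates=np.arange(1.0, 130.5, 0.5)):
    """Algorithm A0 (sketch): return the candidate ROI whose GC taper
    minimizes the mismatch between localized sample and true covariance.
    dist is the n x n matrix of pairwise (cyclic) grid distances."""
    costs = []
    for roi in candidates:
        # assumption: taper half-width ROI/2, so it vanishes beyond the ROI
        taper = gc_correlation(dist, roi / 2.0)
        costs.append(np.sum((taper * P_sample - P_true) ** 2))
    return candidates[int(np.argmin(costs))]
```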

Table 3. Comparison of F0 and F for given true covariance. ROImax is the maximal ROI allowed by algorithms A0 and A1, respectively.


Fig. 2. The (rescaled) probability density estimates of ROI(0) and ROI(1) for experiments using a known true covariance with ROI = 2, 5, 10, or 20. The curves are normalized so that the maximum of each curve equals 1. (left) ROI(0) and (right) ROI(1). Blue curves are for ROI = 2, red curves for ROI = 5, magenta curves for ROI = 10, and black curves for ROI = 20. The maximum likelihood estimate of each curve is listed in Table 3; the MLEs of the curves at (left) are listed in the left half of Table 3. Note that there are no black or magenta curves in the (top right) panel; see the text for discussion.
For all choices of parameters in this experiment, the resulting maximum likelihood estimate of ROI(0) is always larger than the true ROI. This is desirable, since our goal is to find a localization that reduces spurious correlations at far distances while preserving the true correlation at near distances. Second, for a larger ensemble size (or a larger true ROI) the resulting ROI(0) also becomes larger. We also find that, as the true ROI increases for a fixed ensemble size, the increment of ROI(0) is almost linearly proportional to the increment of the true ROI, which is likely due to the Gaussian assumption. More discussion of Table 3 and Fig. 2 is presented in section 3.
3. A probabilistic method of localization





In the above formula p(



- the true covariance may not lie in this subspace and the subspace is not flat;
- mathematically it is harder to simplify the cost function to a computable form.



a. How to choose the cost function

The advantages of the function F11 are the following:
- The truncated F11 and the full F11 have the same value:
- In the case that the true covariance is known, F0 and F11 differ only by a constant; more precisely, F11 = F0 + C, where the constant C has no impact on where the cost function's minimum is attained and therefore no impact on the determined localization radius.


For the exact mathematical expression/definition of
Theorem 3.1
Suppose there is a scalar observation available at this time and the observational variance is









Based on the above theorem we propose algorithm A1 (Table 4) to adaptively determine the localization radius. The computational cost of this algorithm is O(N²m).
Table 4. Algorithm A1.
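The per-observation cost F10 = F11 plus a penalty term follows from Theorem 3.1 and is not restated here. Structurally, however, A1 minimizes the cost separately for each of the m uncorrelated observations and then takes the peak (MLE) of the estimated density of those minimizers as ROI(1), exactly as visualized later in Fig. 9. A sketch of that outer loop, with the per-observation cost supplied as a callable:

```python
import numpy as np
from scipy.stats import gaussian_kde

def roi1_estimate(cost_fn, n_obs, roi_grid):
    """Algorithm A1 outer loop (sketch).
    cost_fn(roi, k) -> value of the cost F for observation k at this roi."""
    minimizers = np.empty(n_obs)
    for k in range(n_obs):
        costs = np.array([cost_fn(r, k) for r in roi_grid])
        minimizers[k] = roi_grid[np.argmin(costs)]   # red dots in Fig. 9
    if np.ptp(minimizers) == 0.0:                    # degenerate density
        return minimizers[0]
    density = gaussian_kde(minimizers)               # (rescaled) pdf estimate
    return roi_grid[np.argmax(density(roi_grid))]    # blue dot in Fig. 9
```

The loop over the m observations is the source of the factor m in the O(N²m) cost quoted above.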

b. Comparison of ROI(0) and ROI(1) for a given true covariance
We present the experimental results in Table 3 and Fig. 2, comparing the ROI from algorithm A0 of section 2 [ROI(0)] with that from the new probabilistic algorithm A1 [ROI(1)]. The experiment for ROI(0) is the same as in section 2, where the true covariance is known. In addition, we add columns 7–11, which give the values of ROI(1) computed by algorithm A1 for the same true covariance and sample covariance used in columns 2–6, although the true covariance is completely unknown to algorithm A1. Again we perform 1000 tests for each experiment and estimate the probability density of ROI(1) for each experiment. We plot the (rescaled) probability density function of ROI(1) in the right panels of Fig. 2 and list the maximum likelihood estimates (MLE) of ROI(1) in the right half of Table 3. Note that the last row of Table 3 gives the maximum values of ROI(0) and ROI(1) allowed by algorithms A0 and A1, respectively. The upper bound for ROI(0) can be set arbitrarily, with a larger upper bound translating into a larger range of possible solutions; ROImax = 130 is found to be a large enough search range for our application. The upper bound for ROI(1) is not as flexible as in the ROI(0) case. In this experiment and the experiments in the next section, the maximum of ROI(1) is chosen to be
When the true ROI = 2, ROI(0) varies from 2.5 to 5, whereas ROI(1), computed with no knowledge of the true covariance, ranges from 2.2 to 17, a much larger range than that of ROI(0). For example, when N = 11 the MLE of ROI(0) is about 1.3 times the true ROI, and when N = 121 it is about 2.6 times the true ROI. Note that there is no similar linear relationship between the MLE and the true ROI for ROI(1) [as was found for ROI(0)]. When N = 11 and ROI = 10 or 20, the probability density of ROI(1) concentrates entirely at a single point, so the graph of its (rescaled) probability density function cannot be seen in Fig. 2. Similarly, for N = 21 and 31 the probability density of ROI(1) in Fig. 2 is highly concentrated near the maximal possible ROI (i.e., ROImax). This is because algorithm A1 does not allow ROI(1) to be larger than
At first glance, the differences between ROI(0) and ROI(1) are quite large over the set of parameters we examine. When N = 121, the MLE of ROI(1) is larger than that of ROI(0) for true ROI = 2, 5, and 10, but smaller for true ROI = 20. This result is not surprising, because in algorithm A1 the true covariance is completely unknown, as in real-data applications. On the other hand, the values of ROI(1) are close to those of ROI(0) when N = 61. Moreover, when the ensemble size is small, ROI(1) is also small even though the true covariance has a larger ROI; this is intuitively right, since with a smaller ensemble we place less trust in the ensemble covariance. We also see that for a larger ensemble size both algorithms result in larger ROIs. The algorithm therefore adjusts the ROI according to the expected sampling error for a given data assimilation cycle.
Although F11 would give the same value if we slightly enlarged the region allowed by the Wishart distribution, the integral would differ because of the noninformative Jeffreys prior. As a consequence, if we used F11 alone as the cost function, the resulting ROI(1) would simply be ROImax in most situations; this is why the second term in F10 is required to make the solution nontrivial. On the other hand, the prior does not have to be the Jeffreys prior, and the penalty term admits other choices as well. In this sense the method is still quite empirical when the true covariance is unknown. Future studies will explore more informative prior probability distributions for optimizing the ensemble localization distance.
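For reference, the standard distributional ingredients invoked in this section (Wishart density: Muirhead 1982; Jeffreys prior: Bernardo and Smith 1994): if the ensemble perturbations are Gaussian with covariance P, then S = (N − 1)P̂ is Wishart distributed with k = N − 1 degrees of freedom, and the Jeffreys prior for a covariance matrix is an improper power of its determinant. In sketch form, with n = nLoc:

```latex
p(\mathbf{S}\mid\mathbf{P}) \;\propto\;
  \frac{|\mathbf{S}|^{(k-n-1)/2}}{|\mathbf{P}|^{k/2}}
  \exp\!\Bigl[-\tfrac12 \operatorname{tr}\bigl(\mathbf{P}^{-1}\mathbf{S}\bigr)\Bigr],
\qquad
p(\mathbf{P}) \;\propto\; |\mathbf{P}|^{-(n+1)/2}.
```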
4. Application to the ensemble covariance generated by the Lorenz-96 model
a. Experiments with the Lorenz-96 system and approximated true covariance
Now we design experiments to compare algorithms A0 and A1 in a Lorenz-96 system (Lorenz 2006). The system is configured with 120 variables and 30 uniformly distributed observations located on the model grid points. The external forcing F is set to 8, the time step dt is 0.05, and observations arrive every two time steps. We use an error variance of 0.04 for all observations in these experiments. The observation operator H is simply the restriction operator, as required by the current version of the algorithm. To obtain an approximated true covariance in the Lorenz-96 system, we first run 6000 ensemble members for T1 time steps with EnKF data assimilation and then run the ensemble for T2 further time steps without data assimilation. We then use all 6000 ensemble members to compute an approximation of the true covariance
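For reproducibility, a sketch of the Lorenz-96 model in the stated configuration (n = 120, F = 8, dt = 0.05), integrated with the fourth-order Runge–Kutta scheme mentioned in section 4b:

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """Advance the Lorenz-96 state one time step with fourth-order RK."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

The approximated true covariance is then the sample covariance of the 6000 members after the T1 assimilation steps and T2 free-forecast steps described above.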

Fig. 3. The (rescaled) probability density estimates of ROI(0) and ROI(1) for experiments using the approximated true covariance in a Lorenz-96 system with T1 = 50 and T2 = 1, 10, and 100. The curves are normalized so that the maximum of each curve equals 1. (left) ROI(0) and (right) ROI(1). Blue curves are for T2 = 1, red curves for T2 = 10, and black curves for T2 = 100. The MLE of each curve is listed in Table 5.

Fig. 4. The (rescaled) probability density estimates of ROI(0) and ROI(1) for experiments using the approximated true covariance in a Lorenz-96 system with T1 = 500 and T2 = 1, 10, and 100. The curves are normalized so that the maximum of each curve equals 1. (left) ROI(0) and (right) ROI(1). Blue curves are for T2 = 1, red curves for T2 = 10, and black curves for T2 = 100. The MLE of each curve is listed in Table 5.
Table 5. Comparison of F0 and F for the approximated true covariance in the Lorenz-96 system. Entries are maximum likelihood estimates of ROI(0) and ROI(1); the probability estimates are based on 1000 independent tests. Our true covariance

We do not expect algorithms A1 and A0 to give similar values in all cases, because algorithm A1 does not use the true covariance while algorithm A0 has complete knowledge of it.
When T1 = 50 (T1 is the EnKF analysis duration) and N = 61, ROI(0) = 17.7, 18.6, and 10.6 for the different values of T2 (T2 is the length of the free ensemble forecast without assimilation), and ROI(1) = 19.4, 19.5, and 11, respectively, which are very close to the values of ROI(0). For T1 = 500 (Fig. 4), however, the probability density function of ROI(0) is quite peculiar for T2 = 1 and 10 (blue and red curves in the left panels) when the ensemble size is small. The maximum likelihood estimates of ROI(0) are around 7.7, 2.5, and 8 for the different values of T2, regardless of whether N = 11, 21, or 31. Although it is not clear why T2 = 10 results in ROI(0) ≈ 2.5, which might be due to the system dynamics, this shows that ROI(0) changes little as the ensemble size varies among N = 11, 21, and 31; the estimated probability densities of ROI(0) for these parameters have similar shapes, as can be seen in the first three panels on the left of Fig. 4.

Comparing the graphs in the left panels of Figs. 2, 3, and 4, we see that although the (approximate) true covariance is known in all of these cases, the shapes of the probability distributions of ROI(0) differ. In the experiments of the left panels of Fig. 2, where the sample is drawn exactly from a Gaussian distribution, the probability density curves are approximately symmetric. In contrast, for tests using small ensemble sizes, in the left panels of Figs. 3 and 4, the curves show no such symmetry, probably because the distributions of the state variables in the Lorenz-96 system are not close to Gaussian. For N = 21, T1 = 50, and T2 = 10 (red curve in Fig. 3, N = 21), the curve even has two peaks; in such a case it would be hard to determine a good localization radius even if the true covariance were known.

For T1 = 500 and T2 = 100 (the black curves in Fig. 4), the behavior of ROI(0) becomes close to that for T1 = 50 and T2 = 100 for all ensemble sizes. This might be because, after running the model for 100 time steps without data assimilation, the distribution of the state variables approaches the “climatological distribution.” Comparing this case with the numerical results in section 4b, however, we see that an ROI around 11 does not give the optimal RMSE in the long-time real implementation. The curves in the right panels of Figs. 3 and 4 are more symmetric, which might be because our probabilistic method still assumes a Gaussian distribution; and for T1 = 50 or 500 with T2 = 1 or 10, ROI(1) results in a smaller RMSE according to the numerical results in section 4b, hence it is a better value for real-data implementation.
b. Experiments that use algorithm A1 to do adaptive localization in Lorenz-96 system
Now we implement algorithm A1 in the Lorenz-96 system. As mentioned before, the Lorenz-96 system configured here has n = 120 state variables and is integrated with the fourth-order Runge–Kutta scheme. The observations are uniformly distributed and lie on some of the grid points; the number of observations m and the ensemble size N differ from case to case. To counter filter divergence, we use a relaxation coefficient α = 0.5 (Zhang et al. 2004); this inflation is necessary to prevent the model from blowing up. The choice α = 0.5 is not tuned: we repeated the experiment with α = 0.25 and obtained similar results. The choice of α can slightly influence the resulting ROI(1), but the change is negligible, at least for this experiment, so we show only the results for α = 0.5. At every time step when observations are available, we first compute ROI(1) using algorithm A1 and then use ROI(1) as the localization radius when assimilating the observations.
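The relaxation of Zhang et al. (2004) blends the analysis perturbations back toward the prior perturbations; in sketch form, with Xp_f and Xp_a denoting the forecast and analysis perturbation matrices:

```python
def relax_to_prior(Xp_f, Xp_a, alpha=0.5):
    """Zhang et al. (2004) relaxation: counter the underestimation of
    analysis spread by mixing prior perturbations back in."""
    return (1.0 - alpha) * Xp_a + alpha * Xp_f
```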
1) Sensitivity to ensemble size
In Fig. 5 we plot the curves of ROI(1) as a function of time for different ensemble sizes. In general we see that the resulting localization radius increases with the ensemble size, which is consistent with the results in section 2 where the true covariance is known.

Fig. 5. Adaptive ROI from algorithm A1 as a function of time for different ensemble sizes. For a larger ensemble size the resulting ROI(1) also becomes larger. This experiment is done in the Lorenz-96 system.
2) Comparison of RMSE for different observational density
In this subsection we fix the ensemble size at N = 61 and plot the spatial RMSE of the ensemble mean as a function of time for time steps 1–50, 1000–1200, and 1200–1400. These values are plotted for m = 30, 60, and 120 in Figs. 6, 7, and 8, respectively, along with ROI(1) as a function of time. While a larger ROI tends to yield a smaller RMSE, the solution can be unstable when the observations are sparse, as indicated by the missing red curve in Fig. 6; the optimal choice of ROI must lower the RMSE while maintaining a stable solution. In Table 6 we show the temporal mean of the RMSE from time step 1000 to time step 5000 for different numbers of observations and fixed/adaptive ROI.

When m = 30, the temporal means of the RMSE for the green, blue, cyan, and black curves are 0.213, 0.2274, 0.4295, and 0.2296, respectively. The ROI(1) value becomes stable after a few steps of data assimilation and lies around 20.2, and the RMSEs of the black, blue, and green curves are indistinguishable; in this case the adaptive method almost gives the best ROI. On the other hand, the resulting ROI is smaller when the observation density increases. This is likely because assimilating denser observations causes the ensemble correlation to be more narrowly supported, which in turn leads to a smaller localization radius from this algorithm. Our speculation is based on Zhang et al. (2006, 729–730) and Daley and Menard (1993, see their Fig. 2): with more observations assimilated, larger-scale errors are reduced first and more effectively, which is often called the “whitening” of the analysis error spectrum. At the same time, it is known that with denser observations a larger localization radius can be applied to improve the mean update while maintaining the stability of the solution.

When m = 60, the temporal means of the red, green, blue, cyan, and black curves are 0.1038, 0.1061, 0.1138, 0.1381, and 0.1112, and the temporal mean of ROI(1) is 18.2209. When m = 120, the temporal means of the red, green, blue, cyan, and black curves are 0.0652, 0.0689, 0.0718, 0.0854, and 0.0713, while the temporal mean of ROI(1) is 16.5928. In Fig. 8 the blue and black curves are nearly identical, because the temporal mean of ROI(1) is approximately 16; and the red curve is consistently better than the black curve, implying that there is still room to reduce the RMSE by enlarging the ROI. This suggests that more work is needed to take observation density into account so that the localization radius determined by this algorithm can be more efficient. Finally, comparing the graphs in the top panels, we see that with more observations the RMSE decreases more quickly, and the curves for the first 50 time steps in Figs. 7 and 8 are hard to distinguish.

Fig. 6. Comparison of the spatial RMSE of the ensemble mean using adaptive ROI with that using fixed ROI = 8, 16, and 24 in the Lorenz-96 system. ROI(1) converges to around 20 after a few time steps, and the RMSE for adaptive ROI(1) is comparable with that for the optimal ROI = 24. The bottom plot shows how ROI(1) evolves with time.

Fig. 7. Comparison of the spatial RMSE of the ensemble mean using adaptive ROI with that using fixed ROI = 8, 16, …, 30 in the Lorenz-96 system. ROI(1) converges to around 18 after a few time steps, and the RMSE for adaptive ROI(1) is slightly larger than that for the optimal ROI = 30. The bottom plot shows how ROI(1) evolves with time.

Fig. 8. Comparison of the spatial RMSE of the ensemble mean using adaptive ROI with that using fixed ROI = 8, 16, …, 30 in the Lorenz-96 system. ROI(1) converges to around 16 after a few time steps, and the RMSE for adaptive ROI(1) is slightly larger than that for the optimal ROI = 30. The bottom plot shows how ROI(1) evolves with time.
Table 6. The temporal mean RMSE of the ensemble mean from time step 1000 to time step 5000 in the Lorenz-96 system for different observation densities and fixed/adaptive ROI. NaN refers to missing data caused by frequent model breakdown.

3) Curve of F value
The value of the cost function F for each observation in the m = 30 case is plotted in the top panels of Fig. 9 as a function of ROI for time steps 2, 120, and 1020; the ensemble size is N = 61. The red dots indicate where F attains its minimum for each observation. In the bottom panels, the curves are the (rescaled) estimated probability density of all possible ROIs, and the blue dot marks where the maximum likelihood is taken, which is therefore the final ROI(1) used for localization at that time step. The curves in the top panels are almost convex, so the minimum of F for each observation (red dots) is well defined rather than lying trivially at ROImax; this is a consequence of adding the penalty term to the cost function. The bottom panels show that the optimal ROI(1) values for the individual observations are distributed such that the probability density function of ROI(1) has only one peak (i.e., the maximum likelihood estimate is clearly well defined). If the density function had two peaks, it would not be easy to determine a unique localization radius; in that case one could try dividing the observations into groups so that for each group the probability density function of ROI(1) is better behaved. This aspect of the algorithm needs further exploration in the future.

Fig. 9. (top) F-value curves [defined in Eq. (11)] for different observations and different time steps, and (bottom) the estimated (rescaled) probability density of ROI. This experiment is done in the Lorenz-96 system. Since there are m = 30 observations, there are 30 curves in each top panel. The red dots mark where the minimum of the F value is attained for each observation. The blue dots are the maximum likelihood estimates of ROI(1) at each time step and hence the output ROI(1) of algorithm A1 at each time step.
5. Discussion and summary
In this article we first presented a cost function approach for analyzing the sampling error in the case where the true covariance is known, and we then generalized the method to the case where the true covariance is unknown. We presented a probabilistic approach for determining the localization radius adaptively when serial ensemble square root filters are used to assimilate uncorrelated observations. The advantage of this method is that it uses only the information from the ensemble members and observations within a single assimilation window; furthermore, the computational cost of the algorithm is small, and its performance in the Lorenz-96 system is promising. We compared the results of this probabilistic method with those of the deterministic method in the case where the true covariance is known. The results show that the probabilistic method gives a more useful radius of influence for long-time implementation than the deterministic method. As a side result, we find that determining a good localization radius requires considering more than sampling error alone; in particular, the dynamical properties of the model need to be taken into account.
There are several issues worth further discussion and investigation. First, the algorithm under discussion does not use any information about the observation density; as a consequence, ROI(1) is not the optimal radius of influence when observations are dense. Second, the algorithm in this article is based on serial ensemble square root Kalman filters and requires that the observation operator H be the restriction operator and that the observations be uncorrelated; these restrictions should be removed in future work. Third, the ROI(0) output in the Lorenz-96 experiments, where the true covariance is approximated by 6000 ensemble members, is clearly not optimal according to the curves in Fig. 8, suggesting again that, to obtain a good localization radius, system dynamics beyond sampling error need to be considered. Finally, the choice of a noninformative prior and the use of an empirical penalty term as presented in this article are not necessarily optimal; deriving a useful mathematical formula for other choices of prior is a challenging problem.
There are several ways to modify the algorithm to make use of more statistical tools. For example, we can generalize the concept of maximum likelihood to the "peaks of the density function": if the probability density function of ROI(1) has more than one peak, it may be wiser to use the different peak values as the ROI for different observations rather than using only the peak at the maximum likelihood estimate. One can also try combining this scheme with other methods, for example, taking observation impact into account to obtain a weighted maximum likelihood estimate. Future research is also warranted to extend the current one-dimensional study to two or three dimensions, as well as to covariances across different physical variables (e.g., between temperature and winds).
We thank Zhibiao Zhao for his valuable suggestions on the choice of the Jeffreys prior; Ningtao Wang and Cheng You for their valuable suggestions about the Wishart distribution; Jeffrey Anderson for encouraging us to publish this result; and John Harlim, Jonathan Poterjoy, and the anonymous reviewers for their patient and careful reading and generous suggestions on how to make the mathematics clearer. This work was supported in part by Office of Naval Research Grant N000140910526 and National Science Foundation Grant ATM-0840651.
APPENDIX
Mathematical Derivation
a. Notation
For convenience, the notation in this appendix differs slightly from that in the main text. For example,
b. Review of setup
We now consider how to determine the localization radius when assimilating a single observation. Since we assimilate only one observation at a time, for notational convenience we always assume the observation is on the first grid point. The observation operator H is then a row vector, specifically H = (1, 0, …, 0).
Let








Let n = nLoc be the number of state variables within distance ROI of the observation, and let N be the ensemble size. We require N > nLoc.






c. Mathematical derivation
2) Compute 








Lemma 2:
Because of space limitations we merely remark that this can be proven by careful but straightforward computation, leaving to the interested reader the task of filling in the details.


It is known that for an n × n SPD matrix
Therefore



















Lemma 4: For 2l > n + 2,
Lemma 5: Let




















Hence (combining all the above identities), we have

3) Compute










Combining all the results above, we have the following theorem:



By the same substitution as in the computation of a3 in the proof of Lemma 7, it is not hard to see the equivalence between Eqs. (A5) and (A6) and Eqs. (4) and (5).
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
Anderson, J. L., 2012: Localization and sampling error correction in ensemble Kalman filter data assimilation. Mon. Wea. Rev., 140, 2359–2371, doi:10.1175/MWR-D-11-00013.1.
Anderson, J. L., and L. Lei, 2013: Empirical localization of observation impact in ensemble Kalman filters. Mon. Wea. Rev., 141, 4140–4153, doi:10.1175/MWR-D-12-00330.1.
Bernardo, J. M., and A. F. Smith, 1994: Bayesian Theory. John Wiley and Sons, 608 pp.
Bishop, C. H., and D. Hodyss, 2007: Flow adaptive moderation of spurious ensemble correlations and its use in ensemble-based data assimilation. Quart. J. Roy. Meteor. Soc., 133, 2029–2044, doi:10.1002/qj.169.
Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, doi:10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
Bishop, C. H., D. Hodyss, P. Steinle, H. Sims, A. M. Clayton, A. C. Lorenc, D. M. Barker, and M. Buehner, 2011: Efficient ensemble covariance localization in variational data assimilation. Mon. Wea. Rev., 139, 573–580, doi:10.1175/2010MWR3405.1.
Daley, R., and R. Menard, 1993: Spectral characteristics of Kalman filter systems for atmospheric data assimilation. Mon. Wea. Rev., 121, 1554–1565, doi:10.1175/1520-0493(1993)121<1554:SCOKFS>2.0.CO;2.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, doi:10.1029/94JC00572.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, doi:10.1002/qj.49712555417.
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919, doi:10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790, doi:10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
Jun, M., I. Szunyogh, M. G. Genton, F. Zhang, and C. H. Bishop, 2011: A statistical investigation of sensitivity of ensemble-based Kalman filters to covariance filtering. Mon. Wea. Rev., 139, 3036–3051, doi:10.1175/2011MWR3577.1.
Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. Trans. ASME J. Fluids Eng., 82D, 35–45, doi:10.1115/1.3662552.
Lei, L., and J. Anderson, 2014: Comparisons of empirical localization techniques for serial ensemble Kalman filters in a simple atmospheric general circulation model. Mon. Wea. Rev., 142, 739–754, doi:10.1175/MWR-D-13-00152.1.
Lorenc, A., 2003: The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203, doi:10.1256/qj.02.132.
Lorenz, E. N., 2006: Predictability—A problem partly solved. Predictability of Weather and Climate, T. N. Palmer and R. Hagedorn, Eds., Cambridge University Press, 40–58.
Muirhead, R. J., 1982: Aspects of Multivariate Statistical Theory. John Wiley and Sons, 673 pp.
Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428, doi:10.1111/j.1600-0870.2004.00076.x.
Snyder, C., and F. Zhang, 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 1663–1677, doi:10.1175//2555.1.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490, doi:10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2.
Wang, X., D. M. Barker, C. Snyder, and T. M. Hamill, 2008a: A hybrid ETKF–3DVar data assimilation scheme for the WRF model. Part I: Observing system simulation experiment. Mon. Wea. Rev., 136, 5116–5131, doi:10.1175/2008MWR2444.1.
Wang, X., D. M. Barker, C. Snyder, and T. M. Hamill, 2008b: A hybrid ETKF–3DVar data assimilation scheme for the WRF model. Part II: Real observation experiments. Mon. Wea. Rev., 136, 5132–5147, doi:10.1175/2008MWR2445.1.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.
Zhang, F., C. Snyder, and J. Sun, 2004: Impacts of initial estimate and observation availability on convective-scale data assimilation with an ensemble Kalman filter. Mon. Wea. Rev., 132, 1238–1253, doi:10.1175/1520-0493(2004)132<1238:IOIEAO>2.0.CO;2.
Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part I: Perfect model experiments. Mon. Wea. Rev., 134, 722–736, doi:10.1175/MWR3101.1.
Zhang, F., M. Zhang, and J. A. Hansen, 2009: Coupling ensemble Kalman filter with four-dimensional variational data assimilation. Adv. Atmos. Sci., 26, 1–8, doi:10.1007/s00376-009-0001-8.
Zhang, F., M. Zhang, and J. Poterjoy, 2013: E3DVar: Coupling an ensemble Kalman filter with three-dimensional variational data assimilation in a limited-area weather prediction model and comparison to E4DVar. Mon. Wea. Rev., 141, 900–917, doi:10.1175/MWR-D-12-00075.1.