Abstract

This study proposes a variational approach to adaptively determine the optimum radius of influence for ensemble covariance localization when uncorrelated observations are assimilated sequentially. Covariance localization is commonly used by ensemble Kalman filters to limit the impact of covariance sampling errors when the ensemble size is small relative to the dimension of the state. The probabilistic approach is based on the premise of finding an optimum localization radius that minimizes the distance between the Kalman update computed with the localized sample covariance and the update computed with the true covariance, when a sequential ensemble Kalman square root filter is used. The authors first examine the effectiveness of the proposed method for cases in which the true covariance is known or can be approximated by a sufficiently large ensemble. Not surprisingly, it is found that the smaller the true covariance length scale (distance) or the smaller the ensemble, the smaller the localization radius that is needed. The authors further generalize the method to the more usual scenario in which the true covariance is unknown but can be represented or estimated probabilistically from the ensemble sample covariance. The mathematical formula for this probabilistic and adaptive approach using the Jeffreys prior is derived. Promising results and limitations of this new method are discussed through experiments using the Lorenz-96 system.

1. Introduction

The Kalman filter was first proposed by Kalman (1960) under the assumptions of linear dynamics, known error covariances, and Gaussian random variables. Evensen (1994) introduced the ensemble Kalman filter (EnKF) into this context. The EnKF uses a short-term ensemble to estimate a flow-dependent background error covariance, at a computational cost far more affordable than the original Kalman filter for high-dimensional problems. As variants of the EnKF, several ensemble square root filters (EnSRF) have been developed [for a summary see Tippett et al. (2003)] in order to maintain consistency between the ensemble and the covariance (see Houtekamer and Mitchell 2001; Anderson 2001; Bishop et al. 2001; Whitaker and Hamill 2002; Snyder and Zhang 2003). Ott et al. (2004) designed the local ensemble transform Kalman filter so that the state variables can be updated completely in parallel. There have also been continued efforts to combine the EnKF with three- and four-dimensional variational data assimilation (3D/4D-Var; e.g., Hamill and Snyder 2000; Lorenc 2003; Wang et al. 2008a,b; Zhang et al. 2009, 2013). A problem common to all ensemble Kalman filters is sampling error.

Since the ensemble size is usually far smaller than the state dimension in geoscience applications, the sample covariance computed from the ensemble members cannot capture all the main properties of the true covariance. Some of the sampling errors can be removed when the physical correlation radius is small compared with the spatial dimension of the state, which is usually the case in numerical weather prediction. Houtekamer and Mitchell (2001) and Hamill et al. (2001) used localization to remove spurious correlations between distant variables. Localization can be used in all ensemble-based filters and has proved to be a powerful tool for limiting sampling error due to a small ensemble size. The Gaspari–Cohn (GC) localization function (Gaspari and Cohn 1999) is symmetric positive definite and widely used. As an alternative, Jun et al. (2011) considered estimating the background covariance by smoothing the sample covariance with a kernel function. The GC function is effective for spatial localization, but until recently the localization radius was in most applications chosen empirically. Anderson (2012) shed new light on this problem with a probabilistic approach, and Anderson and Lei (2013) further extended the probabilistic approach to determining the localization radius based on observation impact. Lei and Anderson (2014) chose the localization radius that minimizes the difference between the sample covariance and the localized sample covariance computed from different samples. Bishop and Hodyss (2007) and Bishop et al. (2011) proposed the use of entry-by-entry powers of the sample correlation together with a nonadaptive empirical localization function. The method proposed in the current study is based on a sequential ensemble square root filter; for covariance inflation we use the relaxation method of Zhang et al. (2004). For clarity, we present the algorithm in detail in Table 1. We emphasize that the method discussed in this manuscript applies only when the observation errors are uncorrelated; generalization of this approach to correlated observation errors will be the subject of future research.

Table 1.

Algorithm of sequential localized ensemble square root Kalman filter: Algorithm AK.


The structure of this paper is as follows. In section 2 we propose a variational approach through minimizing a cost function to find the optimum localization radius in the case that the true covariance is known. In section 3 we generalize the method to the case where the true covariance is unknown but can be represented probabilistically from the sample covariance. Numerical results about the performance of this method are presented through sections 2, 3, and 4. In particular, section 4 contains the numerical results based on the Lorenz-96 system. Section 5 provides a summary and discussion on the proposed localization method.

2. Optimal localization with true covariance: A cost function approach

As mentioned earlier, when the physical correlation radius is small, the sampling noise at far distances can be largely reduced by localization. Nevertheless, this benefit can come at the expense of removing small but nonnegligible true covariance beyond the cutoff distance and lowering the true covariance within the localization radius. Figure 1 is a visual illustration of the effect of covariance localization. In this experiment, we use the GC correlation function as the true covariance by setting the true radius of influence (ROI) equal to 5. We draw 61 sample vectors from the Gaussian distribution N(0, Σ_true) and compute the sample covariance. Using a localization radius of 24, the localized sample covariance is then compared with the raw sample covariance and the true covariance. The localization clearly reduces the spurious correlations at far distances.
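As a concrete illustration of this kind of experiment, the following minimal Python sketch (our own illustration, not the authors' code; it assumes a one-dimensional domain of 120 grid points and treats the GC half-width as the ROI, neither of which is stated for this figure) draws 61 samples from N(0, Σ_true) with a GC true covariance of ROI = 5 and compares the raw sample covariance with the version localized at radius 24:

import numpy as np

def gaspari_cohn(d, c):
    # Gaspari-Cohn (1999) fifth-order piecewise rational function;
    # here c is the half-width, so the support is 2c (one common convention).
    r = np.abs(d) / float(c)
    gc = np.zeros_like(r, dtype=float)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    gc[m1] = (-0.25 * r[m1]**5 + 0.5 * r[m1]**4 + 0.625 * r[m1]**3
              - (5.0 / 3.0) * r[m1]**2 + 1.0)
    gc[m2] = ((1.0 / 12.0) * r[m2]**5 - 0.5 * r[m2]**4 + 0.625 * r[m2]**3
              + (5.0 / 3.0) * r[m2]**2 - 5.0 * r[m2] + 4.0 - (2.0 / 3.0) / r[m2])
    return gc

n, N = 120, 61                                    # state dimension, ensemble size
dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
cov_true = gaspari_cohn(dist, 5.0)                # "true" covariance, ROI = 5
ens = np.random.multivariate_normal(np.zeros(n), cov_true, size=N)
cov_s = np.cov(ens, rowvar=False)                 # raw sample covariance
cov_loc = gaspari_cohn(dist, 24.0) * cov_s        # localized with radius 24

print(np.abs(cov_s - cov_true).max(), np.abs(cov_loc - cov_true).max())

The printed maxima typically confirm that localization removes most of the large spurious values far from the diagonal while leaving the near-diagonal structure essentially unchanged.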

Fig. 1.

(top) Gaspari–Cohn covariance (blue line), which serves as the true covariance with radius of influence equal to 5; ensemble covariance (red line) with ensemble size N = 61; and localized sample covariance (black line) with localization radius 24. (bottom) The difference between the true covariance and the sample/localized sample covariance. This panel shows that localization can significantly reduce sampling error when the true correlation radius is far smaller than the dimension of the domain.


Cost function F0

To gain further understanding of this problem we first examine the solution in the case where the true covariance is known, but we acknowledge that this is impossible in practice. In later sections of this paper we show how to remove this restriction and apply this technique to real problems where the true covariance or Kalman gain is not known. For this scenario we construct a way of describing how “good” the localization radius is for a given ensemble by associating it with the value of a cost function F0. Recall, when one observation y is available, the updating formula of state variable xi using the algorithm in section 1 is as follows:

 
\[ \Delta\bar{x}_i = \rho_i\,\beta_i^{s}\,\Delta y , \]

where Δx̄_i is the mean update of the ith state variable, Δy is the innovation, β_i^s is the regression coefficient calculated from the sample, and ρ_i is the value of the localization function for the ith state variable and the observation y.
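For readers who prefer code, a minimal sketch of this serial mean update for a single observation of the kth state variable is given below (Python; the function name and interface are ours, and the corresponding perturbation/square root update of the algorithm in Table 1 is omitted):

import numpy as np

def serial_mean_update(x_pert, x_mean, y_obs, r_obs, k, rho):
    # x_pert: (N, n) forecast perturbations; x_mean: (n,) forecast mean
    # y_obs: scalar observation of state component k with error variance r_obs
    # rho: (n,) localization weights, rho_i = GC(ROI, d_i)
    N = x_pert.shape[0]
    hx = x_pert[:, k]                        # perturbations of the observed variable
    var_y = hx @ hx / (N - 1)                # sample variance of h(x)
    cov_xy = x_pert.T @ hx / (N - 1)         # sample covariance of x with h(x)
    beta_s = cov_xy / (var_y + r_obs)        # sample regression coefficient beta_i^s
    innov = y_obs - x_mean[k]                # innovation Delta y
    return x_mean + rho * beta_s * innov     # localized mean update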

If we know the true covariance Σ_true, we can simply substitute it for the sample covariance Σ_s without doing localization. Hence, the exact Kalman updating formula becomes

 
\[ \Delta\bar{x}_i = \beta_i^{t}\,\Delta y , \]

where

 
\[ \beta_i^{t} = \frac{\big(\boldsymbol{\Sigma}_{\rm true}\,\mathbf{h}^{\mathrm T}\big)_i}{\mathbf{h}\,\boldsymbol{\Sigma}_{\rm true}\,\mathbf{h}^{\mathrm T} + r} \]

is the true regression coefficient (perfect Kalman gain). Here h is the observation operator and r is the observation error variance.

A natural way of comparing the difference between using the true covariance and the localized sample covariance is represented by the following cost function:

 
\[ F_0(\mathrm{ROI}) = \sum_{i=1}^{n} \big( \rho_i\,\beta_i^{s} - \beta_i^{t} \big)^2 . \]

The optimal localization radius is the ROI that minimizes the cost function F0. Now consider the case when more than one observation is available at the same time. We can find the optimal ROI for each observation, estimate the probability density of these per-observation optima, and take its maximum likelihood value as the overall optimal ROI.

Based on the above analysis, we propose algorithm A0 (Table 2) for evaluating the optimal ROI (its results are shown in the left part of Table 3). In algorithm A0, β_ji^s denotes the sample regression coefficient for the jth observation and ith grid point; similarly, β_ji^t is the regression coefficient computed from the true covariance for the jth observation and ith grid point, and d_ji is the distance (in number of grid points) between the jth observation and the ith grid point. In the left half of Table 3 and the left panel of Fig. 2, we show the value of ROI(0), in terms of grid points, for different true covariances and ensemble sizes. Given a true covariance of GC form with correlation radius ROI = 2, 5, 10, or 20, we draw samples of size N = 11, 21, …, 121 from the distribution N(0, Σ_true) and compute the sample covariance Σ_s. We assume observations exist at every grid point and have an error variance of 0.04. We then compute ROI(0) using algorithm A0 for each pair of ROI and N values. This test is repeated 1000 times for each pair of parameters, the probability density of ROI(0) is estimated, and the (rescaled) probability density curves are plotted in Fig. 2. In Table 3 (left half) we show the maximum likelihood estimate of ROI(0) for each choice of ensemble size and radius of influence of the true covariance.
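A compact sketch of algorithm A0 follows (our paraphrase of Table 2, not the authors' code; it reuses the gaspari_cohn helper from the sketch after Fig. 1, uses the squared-difference cost written above as F0, and stands in for the maximum likelihood estimate with the discrete mode of the per-observation optima):

import numpy as np

def roi_a0(cov_s, cov_true, obs_idx, r_obs, dist, roi_grid):
    # cov_s, cov_true: sample and true covariance matrices (n x n)
    # obs_idx: grid indices of the observations (restriction-operator H)
    # dist: (n x n) grid-point distances; roi_grid: candidate ROI values
    per_obs = []
    for j in obs_idx:
        beta_s = cov_s[:, j] / (cov_s[j, j] + r_obs)        # sample regression coeff.
        beta_t = cov_true[:, j] / (cov_true[j, j] + r_obs)  # true regression coeff.
        costs = [np.sum((gaspari_cohn(dist[:, j], roi) * beta_s - beta_t) ** 2)
                 for roi in roi_grid]                       # F0 for each candidate ROI
        per_obs.append(roi_grid[int(np.argmin(costs))])     # ROI(0) for observation j
    vals, counts = np.unique(per_obs, return_counts=True)
    return vals[np.argmax(counts)]        # most frequent per-observation optimum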

Table 2.

Algorithm A0.

Table 3.

Comparison of F0 and F for given true covariance: maximum likelihood estimates of ROI(0) (left half) and ROI(1) (right half), each based on 1000 independent tests for every combination of ensemble size and true ROI. ROImax is the maximal ROI allowed by algorithm A0 and algorithm A1, respectively.
Fig. 2.

The (rescaled) probability density estimate of ROI(0) and ROI(1) for experiments using a known true covariance with ROI = 2, 5, 10, or 20. The curves are normalized so that the maximum of each curve equals 1. (left) ROI(0) and (right) ROI(1). Blue curves are for ROI = 2, red for ROI = 5, magenta for ROI = 10, and black for ROI = 20. The maximum likelihood estimate of each curve is listed in Table 3 (left panels in the left half, right panels in the right half). Note that there are no black or magenta curves in the top-right panel; please refer to the text for a discussion of this.


For all choices of parameters in this experiment, the resulting maximum likelihood estimate of ROI(0) is always larger than the true ROI. This is desirable since our goal is to find a localization that reduces the spurious correlation at far distances while also approximating the true correlation at near distances. Second, for a larger ensemble size (or larger true ROI) the resulting ROI(0) also gets larger. We also find that, for a fixed ensemble size, the increase in ROI(0) is almost linearly proportional to the increase in the true ROI, which is likely due to the Gaussian assumption. More discussion of Table 3 and Fig. 2 is presented in section 3.

3. A probabilistic method of localization

In more common and realistic cases the true covariance is almost always unknown, so algorithm A0 is of limited use for real-data implementations. Anderson (2012) introduced a probabilistic method by treating the true correlation between a scalar variable and a scalar observation as a random variable conditioned on the ensemble. Lei and Anderson (2014) chose the localization value that minimizes the difference between the sample regression coefficient and the localized sample regression coefficient obtained from another pair of the same type of variables from another ensemble group of the hierarchical group filter. We follow an approach similar to Anderson (2012) in this study. More precisely, we compute the optimal localization radius whenever an observation is available for data assimilation. We assume that the true covariance between a scalar observation and all nearby state variables follows a specified error distribution, so that we can compute the average value of a cost function over that distribution. Part of the cost function we present resembles Eq. (1) in Lei and Anderson (2014). However, the main technique we employ in this manuscript is considerably different: Lei and Anderson (2014) do not assume any prior distribution of the true covariance, but instead compare the sample regression coefficient with the localized sample regression coefficient of the same type (under their definition) of observation-state variable pairs from a different ensemble group. We use the Jeffreys prior and the Wishart distribution to calculate the conditional distribution of the true covariance matrix, rather than using the sample covariance computed from another ensemble group as in Lei and Anderson (2014). It is also worth noting that the method in Lei and Anderson (2014) finds a spatially homogeneous localization, whereas the scheme proposed in this manuscript can be applied at any location and may therefore vary spatially. In the method proposed in this section, β_i^t is still the true regression coefficient, but it is now a function of the true covariance, which is represented by a random matrix; the regression coefficient is therefore a random number. The distribution of the true covariance given a sample comes directly from Bayes's theorem:

 
\[ p(\boldsymbol{\Sigma} \mid \boldsymbol{\Sigma}_s) = \frac{p(\boldsymbol{\Sigma}_s \mid \boldsymbol{\Sigma})\; p(\boldsymbol{\Sigma})}{p(\boldsymbol{\Sigma}_s)} . \]

In the above formula p(Σ_s) is a constant for a given sample and p(Σ_s | Σ) is the well-known Wishart distribution: since the ensemble members are assumed to be drawn from a Gaussian distribution, it has been mathematically proved that the distribution of the sample covariance is exactly the Wishart distribution (see Muirhead 1982). The term p(Σ) is called the prior distribution, the choice of which is not unique. Here we use the Jeffreys prior, which is proportional to the square root of the determinant of the Fisher information. We do not claim that the Jeffreys prior is the most appropriate distribution, but it is commonly used in the statistics community (see Bernardo and Smith 1994).
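For reference, a sketch of the two standard densities involved, under the assumption that the ensemble members are independent Gaussian draws with covariance Σ (here n denotes the dimension of the covariance block under consideration and Γ_n is the multivariate gamma function; this is textbook material, e.g., Muirhead 1982, rather than a result specific to this paper):

\[
p(\mathbf{S} \mid \boldsymbol{\Sigma})
  = \frac{\det(\mathbf{S})^{(N-n-2)/2}}
         {2^{\,n(N-1)/2}\,\Gamma_n\!\big(\tfrac{N-1}{2}\big)\,
          \det(\boldsymbol{\Sigma})^{(N-1)/2}}
    \exp\!\Big[-\tfrac{1}{2}\,\mathrm{tr}\big(\boldsymbol{\Sigma}^{-1}\mathbf{S}\big)\Big],
    \qquad \mathbf{S} = (N-1)\,\boldsymbol{\Sigma}_s,\ \ N-1 \ge n,
\]
\[
p(\boldsymbol{\Sigma}) \propto \det(\boldsymbol{\Sigma})^{-(n+1)/2}
\quad \text{(Jeffreys prior for a multivariate normal covariance).}
\]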

The average of the cost function F over all true (random) covariance is

 
\[ \bar F(\mathrm{ROI}) = \int F(\boldsymbol{\Sigma})\; p(\boldsymbol{\Sigma} \mid \boldsymbol{\Sigma}_s)\, d\boldsymbol{\Sigma} . \]

One limitation is that the Wishart distribution is valid only when the sample size is larger than the dimension of the state vector, which is not satisfied in most applications. If the sample covariance is not of full rank, then Σ_s does not lie in an open subset of the space of all symmetric positive definite (SPD) matrices: the singular positive semidefinite matrices form a lower-dimensional surface within the space of SPD matrices, and the probability distribution is concentrated on this surface. The true covariance, however, is usually of full rank and hence does not lie on this surface. Although analogs of the Wishart distribution have been studied for the case that Σ_s is rank deficient, we do not use them for the following two reasons:

  1. the true covariance may not lie in this subspace and the subspace is not flat;

  2. mathematically it is harder to simplify the cost function to a computable form.

Therefore, we compute a truncated average of the cost function instead of the full average. Specifically, we consider

 
\[ \bar F(\mathrm{ROI}) = \int F(\boldsymbol{\Sigma}_{\rm Loc})\; p(\boldsymbol{\Sigma}_{\rm Loc} \mid \boldsymbol{\Sigma}_{s,{\rm Loc}})\, d\boldsymbol{\Sigma}_{\rm Loc} , \]

where Σ_s,Loc (or Σ_Loc) is the sample (or true) covariance of the state variables that lie within distance ROI of the observation. The ROI can only be chosen such that the number of state variables within distance ROI of an observation is strictly less than the ensemble size N. Intuitively, this truncated average only takes into account the influence of the observation on the state variables that lie within the radius of influence. This restriction also causes the resulting ROI to be extremely small if the method is applied directly to 3D GCMs; this aspect will be studied further in the future.

a. How to choose the cost function

If we use exactly the same cost function F0 for the truncated average as in section 2, where the true covariance is known, we find that the variational method often leads to a trivial solution (ROI = 0, not shown). We believe this is partly because the truncation ignores all information outside the radius of influence. In other words, the localization radius obtained by using F0 in Eqs. (5) and (6) might be quite different if a truncated version of F0 is used instead. To limit this loss of information, we consider a modified cost function:

 
\[ F_{11}(\mathrm{ROI}) = \sum_{i=1}^{n} \Big[ \rho_i^2\,(\beta_i^{s})^2 \;-\; 2\,\rho_i\,\beta_i^{s}\,\beta_i^{t} \Big] . \]

The advantages of the function F11 are the following:

  1. The truncated F11 and the full F11 have the same value: 
    \[ \sum_{i:\,\rho_i > 0} \big[\rho_i^2(\beta_i^{s})^2 - 2\rho_i\beta_i^{s}\beta_i^{t}\big] \;=\; \sum_{i=1}^{n} \big[\rho_i^2(\beta_i^{s})^2 - 2\rho_i\beta_i^{s}\beta_i^{t}\big] ; \]
  2. In the case that the true covariance is known, F0 and F11 differ only by a constant. More precisely,
    \[ F_0(\mathrm{ROI}) = F_{11}(\mathrm{ROI}) + C, \qquad C = \sum_{i=1}^{n} (\beta_i^{t})^2 , \]
    where the constant C does not depend on the ROI and therefore has no impact on where the minimum of the cost function is attained, i.e., on the chosen localization radius (see the expansion sketched after this list).
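Under the quadratic forms reconstructed above for F0 and F11 (our notation), the relation in property 2 follows by simply expanding the square, and property 1 follows because every term with ρ_i = 0 drops out of F11:

\[
F_0(\mathrm{ROI})
 = \sum_i \big(\rho_i\beta_i^{s} - \beta_i^{t}\big)^2
 = \underbrace{\sum_i \big[\rho_i^2(\beta_i^{s})^2 - 2\rho_i\beta_i^{s}\beta_i^{t}\big]}_{F_{11}(\mathrm{ROI})}
   \;+\; \underbrace{\sum_i (\beta_i^{t})^2}_{C,\ \text{independent of ROI}} .
\]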

Combining the above properties, we see that for a given true covariance, substituting F11 for F0 in algorithm A0 is mathematically equivalent. More importantly, when the true covariance is unknown, the truncated average of F11 under the Jeffreys prior can be simplified explicitly, so that an integration over thousands of variables can be avoided. However, because the uninformative Jeffreys prior is used as the probability distribution of the true covariance, we find that an additional penalty term must be included in the cost function to avoid a trivial solution in which the resulting ROI is simply the largest radius of influence allowed by the algorithm. Therefore, we define

 
formula

where the second term is a penalty that can be interpreted as a sampling noise (stability) term, accounting for the difference between the true and the sample covariance scaled by the localization coefficient.

Now we present the complete form of the cost function designed for the case when the true covariance is unknown:

 
\[ F(\mathrm{ROI}) = \int F_{10}\; p(\boldsymbol{\Sigma}_{\rm Loc} \mid \boldsymbol{\Sigma}_{s,{\rm Loc}})\, d\boldsymbol{\Sigma}_{\rm Loc} , \]

where ρi = GC(ROI, di).

For the exact mathematical expression and definition of p(Σ_Loc | Σ_s,Loc) and dΣ_Loc, please refer to the appendix. Note that this integral is taken over thousands of variables, so it cannot be computed directly by numerical quadrature. Mathematical simplifications of Eq. (11) can be made under certain assumptions, and a computable form of this expression is derived in the appendix.

Theorem 3.1

Suppose there is a scalar observation available at this time with observational error variance r. The observation operator is h = (0, 0, …, 0, 1, 0, …, 0). Let k be the index of the state variable that is observed. Let nLoc be the number of state variables that are within distance ROI from the observation. Let ϵ_i ∈ ℝ^N be the vector of forecast perturbations of the ith component of the state variable.

Let

 
formula

Let

 
formula
 
formula
 
formula
 
formula

Then

 
formula
 
formula

where E_D[f(x)] denotes the expectation of f(x) when x follows the distribution D.

Based on the above theorem we propose algorithm A1 (Table 4) to adaptively determine the localization radius. The computational cost of this algorithm is O(N²m), where m is the number of observations.

Table 4.

Algorithm A1.

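The overall structure of algorithm A1 can be sketched as follows (Python; the closed-form expected cost of Theorem 3.1 is not reproduced here and is passed in as a placeholder callable, the gaspari_cohn helper from the earlier sketch is reused, and all names and the interface are ours rather than the authors'):

import numpy as np
from scipy.stats import gaussian_kde

def roi_a1(x_pert, obs_idx, r_obs, dist, roi_grid, expected_cost):
    # expected_cost(x_pert, j, r_obs, rho) stands for the truncated average of
    # the cost function evaluated in closed form (Theorem 3.1); everything
    # else below is bookkeeping.
    N = x_pert.shape[0]
    per_obs = []
    for j in obs_idx:
        candidates = []
        for roi in roi_grid:
            n_loc = int(np.sum(dist[:, j] <= roi))
            if n_loc >= N:                       # Wishart restriction: n_loc < N
                continue
            rho = gaspari_cohn(dist[:, j], roi)
            candidates.append((expected_cost(x_pert, j, r_obs, rho), roi))
        per_obs.append(min(candidates)[1])       # ROI minimizing the expected cost
    if len(set(per_obs)) == 1:                   # degenerate case: all optima agree
        return per_obs[0]
    kde = gaussian_kde(per_obs)                  # density of per-observation optima
    grid = np.linspace(min(per_obs), max(per_obs), 200)
    return grid[np.argmax(kde(grid))]            # maximum likelihood ROI(1)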

b. Comparison of ROI(0) and ROI(1) for a given true covariance

We present the experimental results in Table 3 and Fig. 2 for a comparison of the ROI obtained with algorithm A0 of section 2 [ROI(0)] and with the new probabilistic algorithm A1 [ROI(1)]. The experiment for ROI(0) is the same as in section 2, where the true covariance is known. In addition, columns 7–11 give the values of ROI(1) computed by algorithm A1 for the same true and sample covariances used in columns 2–6, although the true covariance is completely unknown to algorithm A1. Again we perform 1000 tests for each experiment and estimate the probability density of ROI(1). The (rescaled) probability density functions of ROI(1) are plotted in the right panels of Fig. 2 and the maximum likelihood estimates (MLE) of ROI(1) are listed in the right half of Table 3. The last row of Table 3 gives the maximum values of ROI(0) and ROI(1) allowed by algorithms A0 and A1, respectively. The upper bound for ROI(0) can be set arbitrarily, with a larger upper bound translating into a larger range of possible solutions; ROImax = 130 is found to be a large enough search range for our application. The upper bound for ROI(1) is not as flexible: in this experiment and those in the next section, the maximum of ROI(1) is determined by the requirement that the number of state variables within the ROI be strictly less than the ensemble size N. Therefore, when the ensemble size is small and ROI(1) tends to be large, the ROI(1) values of the 1000 independent tests simply concentrate at the maximum allowed by the algorithm, which explains the peculiar shape of the black curves in the right panels of Fig. 2 for small ensemble sizes.

When the true ROI = 2, ROI(0) varies from 2.5 to 5. For the algorithm that uses no knowledge of the true covariance, ROI(1) ranges from 2.2 to 17, a much wider range than for ROI(0). For example, when N = 11 the MLE of ROI(0) is about 1.3 times the true ROI, and when N = 121 it is about 2.6 times the true ROI. There is no similar linear relationship between the MLE and the true ROI for ROI(1) [which is found for ROI(0)]. When N = 11 and ROI = 10 or 20, the probability density of ROI(1) concentrates at a single point, so the corresponding (rescaled) probability density curve cannot be seen in Fig. 2. Similarly, for N = 21 and 31 the probability density of ROI(1) in Fig. 2 is very concentrated near the maximal possible ROI (i.e., ROImax), because algorithm A1 does not allow ROI(1) to exceed its ensemble-size-dependent bound and the best radius of influence in the search range is very likely to be that upper bound. When N = 121 the distribution of ROI(1) is far from the upper bound ROImax, but the MLE of ROI(1) still does not depend linearly on the true ROI.

At first glance, the differences between ROI(0) and ROI(1) are quite large over the set of parameters that we examine. When N = 121, the MLE of ROI(1) is larger than that of ROI(0) for true ROI = 2, 5, and 10 but smaller for true ROI = 20. This result is not surprising because in algorithm A1 the true covariance is completely unknown, as in real-data applications. On the other hand, the values of ROI(1) are close to ROI(0) when N = 61. Moreover, when the ensemble size is small, ROI(1) is also small even though the true covariance has a larger ROI, which is intuitively right since a smaller ensemble warrants less trust in the ensemble covariance. We also see that, for larger ensemble sizes, both algorithms result in larger ROIs. The algorithm therefore adjusts the ROI according to the expected sampling error for a given data assimilation cycle.

Although F11 would give the same value if the region allowed by the Wishart distribution were slightly enlarged, the integral would be different because of the noninformative Jeffreys prior. As a consequence, if F11 alone were used as the cost function, the resulting ROI(1) would simply be ROImax in most situations; this is why the second term in F10 is required to make the solution nontrivial. On the other hand, the prior does not have to be the Jeffreys prior, and the penalty term can take other forms as well. In this sense the method is still quite empirical when the true covariance is unknown. Future studies will explore the use of more informative prior probability distributions for optimizing the ensemble localization distance.

4. Application to the ensemble covariance generated by the Lorenz-96 model

a. Experiments with the Lorenz-96 system and approximated true covariance

Now we design experiments to compare algorithms A0 and A1 in the Lorenz-96 system (Lorenz 2006). Here the system is configured with 120 variables and 30 uniformly distributed observations located on model grid points. The external forcing is F = 8, the time step is dt = 0.05, and observations are available every two time steps. We use an error variance of 0.04 for all observations in these experiments. The observation operator H is simply the restriction operator, as required by the current version of the algorithm. To obtain an approximated true covariance in the Lorenz-96 system, we first run 6000 ensemble members for T1 time steps with EnKF data assimilation and then run them for T2 time steps without data assimilation; the full 6000-member ensemble then provides an approximation Σ_true of the true covariance. In the first T1 time steps we use covariance relaxation with α = 0.5 (Zhang et al. 2004); we choose α = 0.5 because the model can still blow up when no localization is used, even with an ensemble size of 6000. Next, we draw N members from the 6000 and compute the sample covariance Σ_s. Finally, we compute the optimal ROI(0) using algorithm A0 with the known (approximated) true covariance and compare it with the optimal ROI(1) computed by algorithm A1 with no knowledge of the true covariance. We performed 1000 tests for each set of parameters and estimated the probability density of the optimum ROI for both algorithms. The probability density functions are shown in Figs. 3 and 4, and the maximum likelihood estimates of ROI(0) and ROI(1) are listed in Table 5.
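The model component of this experiment is easy to reproduce. The sketch below gives the Lorenz-96 tendency and a fourth-order Runge–Kutta step with the configuration used here (F = 8, dt = 0.05), together with a rough outline of how a large ensemble can furnish an approximate true covariance; the T1-step assimilation spin-up is omitted for brevity, so this is an illustration of the setup rather than a faithful reproduction of it:

import numpy as np

def lorenz96_tendency(x, F=8.0):
    # dx_k/dt = (x_{k+1} - x_{k-2}) * x_{k-1} - x_k + F, with cyclic indices
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    # One fourth-order Runge-Kutta step of the Lorenz-96 system.
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Outline of the approximate true covariance (assimilation spin-up omitted):
ens = np.random.randn(6000, 120)                  # large ensemble, 120 variables
for _ in range(100):                              # free forecast for T2 steps
    ens = np.array([rk4_step(member) for member in ens])
cov_true_approx = np.cov(ens, rowvar=False)       # 6000-member sample covariance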

Fig. 3.

The (rescaled) probability density estimate of ROI(0) and ROI(1) for experiments using approximated true covariance in a Lorenz-96 system with T1 = 50, T2 = 1, 10, and 100. Here we normalize the curves so the maximum of each curve is always equal to 1. (left) ROI(0) and (right) ROI(1). Blue curves are for T2 = 1. Red curves are for T2 = 10. Black curves are for T2 = 100. The MLE of each curve is listed in Table 5.


Fig. 4.

The (rescaled) probability density estimate of ROI(0) and ROI(1) for experiments using approximated true covariance in a Lorenz-96 system with T1 = 500, T2 = 1, 10, and 100. Here we normalize the curves so the maximum of each curve is always equal to 1. (left) ROI(0) and (right) ROI(1). Blue curves are for T2 = 1. Red curves are for T2 = 10. Black curves are for T2 = 100. The MLE of each curve is listed in Table 5.


Table 5.

Comparison of F0 and F for the approximated true covariance in the Lorenz-96 system: maximum likelihood estimates of ROI(0) (left half) and ROI(1) (right half), each based on 1000 independent tests. The true covariance Σ_true is approximated by the sample covariance of 6000 ensemble members, which are first run for T1 time steps with data assimilation and then for T2 time steps without data assimilation. For each test the sample covariance Σ_s is computed from N randomly drawn ensemble members, and ROI(0) and ROI(1) are computed from that sample covariance and the approximated true covariance. ROImax is the maximal ROI allowed by algorithm A0 and algorithm A1.


We do not expect algorithms A1 and A0 to give similar values in all cases, because algorithm A1 does not use the true covariance, whereas algorithm A0 has complete information about it.

When T1 = 50 (T1 is the EnKF analysis duration) and N = 61, ROI(0) = 17.7, 18.6, and 10.6 for the different T2 (T2 is the length of the free ensemble forecast without assimilation), and ROI(1) = 19.4, 19.5, and 11, respectively, which are very close to the values of ROI(0). For T1 = 500 (Fig. 4), however, the probability density function of ROI(0) is quite peculiar for T2 = 1 and 10 (blue and red curves in the left panels) when the ensemble size is small. The maximum likelihood estimates of ROI(0) are around 7.7, 2.5, and 8 for the different T2, regardless of whether N = 11, 21, or 31. Although it is not clear why T2 = 10 results in ROI(0) ≈ 2.5, which might be due to the system dynamics, this shows that ROI(0) changes only a little as the ensemble size varies among N = 11, 21, and 31. The estimated probability densities of ROI(0) for these parameters have similar shapes, as can be seen in the first three panels on the left of Fig. 4. Comparing the left panels of Figs. 2, 3, and 4, we see that although the (approximate) true covariance is known in all of these cases, the shapes of the probability distributions of ROI(0) differ. In the experiments of the left panels of Fig. 2 the probability density curves are approximately symmetric, which is the case in which the samples are drawn exactly from a Gaussian distribution. In contrast, for the tests with small ensemble sizes in the left panels of Figs. 3 and 4, the curves are not symmetric at all, probably because the distributions of the state variables in the Lorenz-96 system are not close to Gaussian. For N = 21, T1 = 50, and T2 = 10 (red curve in Fig. 3, N = 21) the curve even has two peaks; in this case it would be hard to determine a good localization radius even if the true covariance were known. For T1 = 500 and T2 = 100 (black curves in Fig. 4), the behavior of ROI(0) becomes close to that for T1 = 50 and T2 = 100 for all ensemble sizes, probably because after running the model for 100 time steps without data assimilation the distribution of the state variables approaches the climatological distribution. However, comparing this case with the numerical results in section 4b shows that an ROI around 11 does not give the optimal RMSE in the long-time real implementation. The curves in the right panels of Figs. 3 and 4 are more symmetric, possibly because our probabilistic method still assumes a Gaussian distribution; and for T1 = 50 or 500 with T2 = 1 or 10, ROI(1) results in a smaller RMSE according to the numerical results in section 4b, hence it is a better value for real-data implementation.

b. Experiments that use algorithm A1 to do adaptive localization in Lorenz-96 system

Now we implement algorithm A1 in the Lorenz-96 system. As mentioned before, the Lorenz-96 system configured here has n = 120 state variables and is integrated using the fourth-order Runge–Kutta scheme. The observations are uniformly distributed and located on a subset of the grid points; the number of observations m and the ensemble size N differ among the cases. To prevent filter divergence, we use a relaxation coefficient α = 0.5 (Zhang et al. 2004); this inflation is necessary to keep the model from blowing up. The choice α = 0.5 is not tuned: the same experiment with α = 0.25 gives similar results, and although α slightly influences the resulting ROI(1), the change is negligible, at least for this experiment, so only results for α = 0.5 are shown. At every time step at which observations are available, we first compute ROI(1) using algorithm A1 and then use it as the localization radius when assimilating the observations.
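The relaxation used for inflation blends the analysis perturbations back toward the forecast perturbations. A minimal sketch of this relaxation-to-prior step (our own illustration of the Zhang et al. 2004 method, with a hypothetical function name) is:

import numpy as np

def relax_to_prior(x_f_pert, x_a_pert, alpha=0.5):
    # Covariance relaxation of Zhang et al. (2004): the new analysis
    # perturbations are a weighted blend of analysis and forecast perturbations.
    return (1.0 - alpha) * np.asarray(x_a_pert) + alpha * np.asarray(x_f_pert)

With alpha = 0.5 the analysis perturbations retain half of the prior spread, which is what keeps the ensemble from collapsing in these experiments.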

1) Sensitivity to ensemble size

In Fig. 5 we plot the curves of ROI(1) as a function of time for different ensemble sizes. In general we see that the resulting localization radius increases with the ensemble size, which is consistent with the results in section 2 where the true covariance is known.

Fig. 5.

Adaptive ROI from algorithm A1 as a function of time for different ensemble sizes in the Lorenz-96 system. For larger ensemble sizes the resulting ROI(1) is also larger.


2) Comparison of RMSE for different observation densities

In this subsection, we fix the ensemble size at N = 61 and plot the spatial RMSE of the ensemble mean as a function of time for time steps 1–50, 1000–1200, and 1200–1400. These values are plotted for m = 30, 60, and 120 in Figs. 6, 7, and 8, respectively, along with ROI(1) as a function of time. While a larger ROI tends to result in smaller RMSEs, the solution can be unstable when the observations are sparse, as indicated by the missing red curve in Fig. 6; the optimal choice of ROI must lower the RMSE while maintaining a stable solution. Table 6 shows the temporal mean of the RMSE from time step 1000 to time step 5000 for different numbers of observations and fixed/adaptive ROI. When m = 30, the temporal mean RMSE for the green, blue, cyan, and black curves is 0.213, 0.2274, 0.4295, and 0.2296, respectively. The ROI(1) value becomes stable after a few steps of data assimilation and settles around 20.2, and the RMSEs of the black, blue, and green curves are indistinguishable; in this case the adaptive method gives nearly the best ROI. On the other hand, we found that the resulting ROI becomes smaller when the observation density increases. This is likely because assimilating denser observations makes the ensemble correlation more narrowly supported, which then leads to a smaller localization radius from this algorithm. Our speculation is based on Zhang et al. (2006, 729–730) and Daley and Menard (1993, see their Fig. 2): with more observations assimilated, larger-scale errors are reduced more effectively first, which is often called the "whitening" of the analysis error spectrum. At the same time, it is known that with denser observations a larger localization radius can be applied to improve the mean update while maintaining the stability of the solution. When m = 60, the temporal means of the red, green, blue, cyan, and black curves are 0.1038, 0.1061, 0.1138, 0.1381, and 0.1112, and the temporal mean of ROI(1) is 18.2209. When m = 120, the temporal means of the red, green, blue, cyan, and black curves are 0.0652, 0.0689, 0.0718, 0.0854, and 0.0713, while the temporal mean of ROI(1) is 16.5928. In Fig. 8 the blue and black curves are nearly identical because the temporal mean of ROI(1) is approximately 16, and the red curve is consistently better than the black curve, implying that there is still room to reduce the RMSE by enlarging the ROI. This suggests that more work is needed to take observation density into account so that the localization radius determined by this algorithm can be more efficient. On the other hand, comparing the top panels shows that with more observations the RMSE decreases more quickly, and it is hard to distinguish the curves during the first 50 time steps in Figs. 7 and 8.

Fig. 6.

Comparison of spatial RMSE of ensemble mean using adaptive ROI with that using fixed ROI = 8, 16, and 24 in the Lorenz-96 system. We can see that the ROI(1) converges to around 20 after a few time steps. The RMSE for adaptive ROI(1) is comparable with that for optimal ROI = 24. The bottom plot shows how ROI(1) evolves with time.


Fig. 7.

Comparison of spatial RMSE of ensemble mean using adaptive ROI with that using fixed ROI = 8, 16, …, 30 in the Lorenz-96 system. We can see that the ROI(1) converges to around 18 after a few time steps. The RMSE for adaptive ROI(1) is slightly larger than that for optimal ROI = 30. The bottom plot shows how ROI(1) evolves with time.


Fig. 8.

Comparison of spatial RMSE of ensemble mean using adaptive ROI with that using fixed ROI = 8, 16, …, 30 in the Lorenz-96 system. We can see that the ROI(1) converges to around 16 after a few time steps. The RMSE for adaptive ROI(1) is slightly larger than that for optimal ROI = 30. The bottom plot shows how ROI(1) evolves with time.


Table 6.

The temporal RMSE of the ensemble mean from time steps 1000 to 5000 in the Lorenz-96 system for different observation densities and fixed/adaptive ROI. NaN refers to missing data caused by frequent model breakdown.


3) Curve of F value

The value of the cost function F for each observation in the m = 30 case is plotted in the top panels of Fig. 9 as a function of ROI for time steps 2, 120, and 1020; the ensemble size is N = 61. The red dots indicate where F attains its minimum for each observation. In the bottom panels, the curves are the (rescaled) estimated probability density of all the per-observation optimal ROIs, and the blue dot shows where the maximum likelihood is attained, which is the final ROI(1) used for localization at each time step. The curves in the top panels are nearly convex, so the minimizing ROI (red dots) for each observation is not a trivial solution; this is also a consequence of adding the penalty term to the cost function. The bottom panels show that the optimal ROI(1) values for the individual observations are distributed such that the probability density function of ROI(1) has only one peak (i.e., the maximum likelihood estimate is clearly well defined). If the density function had two peaks, it would not be easy to determine a unique localization radius; in that case one could try dividing the observations into groups so that for each group the probability density function of ROI(1) is better behaved. This aspect of the algorithm needs further exploration in the future.

Fig. 9.

(top) F-value curves [defined in Eq. (11)] for different observations and different time steps, and (bottom) the estimated (rescaled) probability density of ROI, for the Lorenz-96 experiment. Since there are m = 30 observations, there are 30 curves in each top panel. The red dots mark the minimum of the F value for each observation; the blue dots mark the maximum likelihood estimate of ROI(1) at each time step, which is the output of algorithm A1 at that time step.


5. Discussion and summary

In this article we first presented a cost function approach to analyze the sampling error issue in the case where the true covariance is known, and then generalized this method to the case where the true covariance is unknown. We presented a probabilistic approach to determine the localization radius adaptively when serial ensemble square root filters are used to assimilate uncorrelated observations. The advantage of this method is that it uses only the information from the ensemble members and observations within a single assimilation window. Furthermore, the computational cost of this algorithm is small and its performance in the Lorenz-96 system is promising. We compared the results of this probabilistic method with those of the deterministic method in the case where the true covariance is known. The results show that the probabilistic method gives a more useful radius of influence for long-time implementation than the deterministic method. As a side result, we find that determining a good localization radius requires considering more than sampling error alone; in particular, the dynamical properties of the model need to be taken into account.

Several issues deserve further discussion and investigation. First, the algorithm under discussion does not utilize any information about the observation density; as a consequence, ROI(1) is not the optimal radius of influence when observations are dense. Second, the algorithm in this article is based on serial ensemble square root Kalman filters and requires that the observation operator H be the restriction operator and that the observations be uncorrelated; these restrictions should be removed in future work. Third, the ROI(0) obtained in the Lorenz-96 experiments, where the true covariance is approximated by 6000 ensemble members, is clearly not optimal according to the curves in Fig. 8, again suggesting that system dynamics beyond sampling error need to be considered in order to obtain a good localization radius. Finally, the choice of a noninformative prior and the use of an empirical penalty term, as presented in this article, are not necessarily optimal; deriving a useful mathematical formula for other choices of prior is a challenging problem.

There are several ways the algorithm could be modified to make use of more statistical tools. For example, the concept of maximum likelihood could be generalized to the "peaks of the density function": if the probability density function of ROI(1) has more than one peak, it may be wiser to use the different peak values as the ROI for different observations rather than using only the value at the maximum likelihood estimate. One could also combine this scheme with other methods, for example by taking observation impact into account to obtain a weighted maximum likelihood estimate. Future research is also warranted to extend the current one-dimensional study to 2D or 3D, as well as to covariances across different physical variables (e.g., between temperature and winds).

Acknowledgments

We thank Zhibiao Zhao for his valuable suggestions on the choice of the Jeffreys prior; Ningtao Wang and Cheng You for their valuable suggestions about the Wishart distribution; Jeffrey Anderson for encouraging us to publish this result; and John Harlim, Jonathan Poterjoy, and the anonymous reviewers for their patient and careful reading and generous suggestions on how to make the mathematics clearer. This work was supported in part by Office of Naval Research Grant N000140910526 and National Science Foundation Grant ATM-0840651.

APPENDIX

Mathematical Derivation

a. Notation

For convenience, the notation in this appendix differs slightly from that in the main text. In particular, s here denotes (N − 1) times the (truncated) sample covariance rather than the sample covariance itself; Σ denotes the truncated true covariance Σ_Loc; the true regression coefficient is denoted by r_i rather than β_i^t; and n is short for nLoc.

b. Review of setup

We now consider how to determine the localization radius when assimilating a single observation. Since only one observation is assimilated at a time, for convenience of notation we assume the observation is on the first grid point. The observation operator H is then a row vector, specifically H = (1, 0, …, 0).

Let r̂_i denote the regression coefficient for the ith variable computed from the pure sample covariance matrix, and let r_i be the regression coefficient for the ith variable computed from the true covariance matrix. The regression coefficient for the ith variable when using the localized sample covariance matrix is then ρ_i r̂_i, where ρ_i = GC(ROI, d_i).

We do not know what the true covariance is; hence we treat the true covariance as a random matrix Σ. Let s be N − 1 times the sample covariance matrix, and let p(Σ | s) be the probability density of the true covariance matrix given s. Then

 
formula

Let r be the observation error variance, and let H = (1, 0, …, 0).

Then

 
formula
 
formula

Let n = nLoc be the number of state variables within distance ROI to the observation and N be the ensemble size. We require N > nLoc.

We define functions F1 and F2:

 
formula

and

 
formula

where

 
formula

is a constant depending only on the sample and n such that

 
formula

We want to minimize the following:

 
formula

Here we use the Jeffreys prior, p(Σ) ∝ det(Σ)^−(n+1)/2.

The notations are

 
formula
 
formula

(Muirhead 1982).

c. Mathematical derivation
1) First we compute :
 
formula

Let s = (s)−1 and = −1

 
formula

[By Muirhead (1982) theorem 2.1.8, d−1 = det()n−1d.]

 
formula

[By Muirhead (1982) theorem 2.1.11.]

 
formula

Hence,

lemma 1: .

2) Compute
 
formula

Let

 
formula

where n−1 is the (n − 1) × (n − 1) identity matrix, 0 is the zero row vector, and t = (t2, …, tn)T is a column vector whose entries are defined by ti =−(b1i/bii) for 2 ≤ in. We define d = dt1, …, dtn and dt = dt2, …, dtn. Let be a (n − 1) × (n − 1) matrix whose entries are defined by = bi+1,j+1b1,i+1b1,j+1/b11. Then it can be straightforwardly verified that

 
formula

and we have

lemma 2: .

Because of space limitations we merely remark that this can be proven by careful but straightforward computation, leaving to the interested reader the task of filling in the details.

It is easy to see that . So

 
formula

It is known that for an n × n SPD matrix Σ, dΣ^−1 = det(Σ)^−(n+1) dΣ [see Muirhead (1982), theorem 2.1.8].

Therefore and

 
formula

Now define . So

 
formula

Let the ensemble perturbation be and define δi = ϵi + tiϵ1. Then by direct computation one can get

 
formula

Then

 
formula

Let . Then

 
formula

It follows from Muirhead (1982) theorem 2.1.11 that

 
formula

Hence,

 
formula

Let

 
formula

and , , , for 2 ≤ i and jn.

Then

 
formula
 
formula

and d = dt1dt.

Hence, it suffices to compute the following:

 
formula
 
formula
 
formula

Lemma 3: For 2l > n + 2 and even integer n,

 
formula

Lemma 4: For 2l > n + 2, .

Lemma 5: Let I ∈ ℝ^(n×n) denote the identity matrix and s ∈ ℝ^n be a column vector. Then det(I + ss^T) = 1 + |s|².

Lemma 6:

 
formula
 
formula

where is the singular value decomposition, and (T)i is the ith row of the matrix T.

Proof: Recall that and :

 
formula

Let as before, then and :

 
formula

where a3 is essentially a Gamma function. One can easily get the desired expression of a3 by substitution.

Let

 
formula

then

 
formula

Recall the following (well known) identities about multivariate gamma functions (assuming both arguments are odd numbers):

 
formula
 
formula
 
formula
 
formula

It is not hard to check that

 
formula

Hence (combining all the above identities), we have

lemma 7:

 
formula
3) Compute :

By similar derivation we can get

 
formula

Hence, it suffices to compute

 
formula

Lemma 8:

 
formula

It is easy to see by symmetry that when i ≠ j

 
formula
 
formula

On the other hand,

 
formula

Hence, one can derive

 
formula

Lemma 9: For odd numbers n, 2k:

 
formula

And similarly, we can derive

Lemma 10:

 
formula

Combining all the results above, we have the following theorem:

Theorem 5.1:

 
formula
 
formula

where Δii is the ith diagonal element of Δ.

It is not hard to see that

 
formula

And by doing the same substitution as in the computation of a3 in the proof of lemma 7, it is not hard to see the equivalence between Eqs. (A5), (A6) and Eqs. (4), (5).

REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.

Anderson, J. L., 2012: Localization and sampling error correction in ensemble Kalman filter data assimilation. Mon. Wea. Rev., 140, 2359–2371.

Anderson, J. L., and L. Lei, 2013: Empirical localization of observation impact in ensemble Kalman filters. Mon. Wea. Rev., 141, 4140–4153.

Bernardo, J. M., and A. F. Smith, 1994: Bayesian Theory. John Wiley and Sons, 608 pp.

Bishop, C. H., and D. Hodyss, 2007: Flow adaptive moderation of spurious ensemble correlations and its use in ensemble-based data assimilation. Quart. J. Roy. Meteor. Soc., 133, 2029–2044.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.

Bishop, C. H., D. Hodyss, P. Steinle, H. Sims, A. M. Clayton, A. C. Lorenc, D. M. Barker, and M. Buehner, 2011: Efficient ensemble covariance localization in variational data assimilation. Mon. Wea. Rev., 139, 573–580.

Daley, R., and R. Menard, 1993: Spectral characteristics of Kalman filter systems for atmospheric data assimilation. Mon. Wea. Rev., 121, 1554–1565.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.

Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.

Jun, M., I. Szunyogh, M. G. Genton, F. Zhang, and C. H. Bishop, 2011: A statistical investigation of sensitivity of ensemble-based Kalman filters to covariance filtering. Mon. Wea. Rev., 139, 3036–3051.

Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. Trans. ASME J. Fluids Eng., 82D, 35–45.

Lei, L., and J. Anderson, 2014: Comparisons of empirical localization techniques for serial ensemble Kalman filters in a simple atmospheric general circulation model. Mon. Wea. Rev., 142, 739–754.

Lorenc, A., 2003: The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203.

Lorenz, E. N., 2006: Predictability—A problem partly solved. Predictability of Weather and Climate, T. N. Palmer and R. Hagedorn, Eds., Cambridge University Press, 40–58.

Muirhead, R. J., 1982: Aspects of Multivariate Statistical Theory. John Wiley and Sons, 673 pp.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

Snyder, C., and F. Zhang, 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 1663–1677.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.

Wang, X., D. M. Barker, C. Snyder, and T. M. Hamill, 2008a: A hybrid ETKF–3DVar data assimilation scheme for the WRF model. Part I: Observing system simulation experiment. Mon. Wea. Rev., 136, 5116–5131.

Wang, X., D. M. Barker, C. Snyder, and T. M. Hamill, 2008b: A hybrid ETKF–3DVar data assimilation scheme for the WRF model. Part II: Real observation experiments. Mon. Wea. Rev., 136, 5132–5147.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.

Zhang, F., C. Snyder, and J. Sun, 2004: Impacts of initial estimate and observation availability on convective-scale data assimilation with an ensemble Kalman filter. Mon. Wea. Rev., 132, 1238–1253.

Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part I: Perfect model experiments. Mon. Wea. Rev., 134, 722–736.

Zhang, F., M. Zhang, and J. A. Hansen, 2009: Coupling ensemble Kalman filter with four-dimensional variational data assimilation. Adv. Atmos. Sci., 26, 1–8.

Zhang, F., M. Zhang, and J. Poterjoy, 2013: E3DVar: Coupling an ensemble Kalman filter with three-dimensional variational data assimilation in a limited-area weather prediction model and comparison to E4DVar. Mon. Wea. Rev., 141, 900–917.

Footnotes

This article is included in the Sixth WMO Data Assimilation Symposium Special Collection.