Local Ensemble Transform Kalman Filter with Cross Validation

Mark Buehner Data Assimilation and Satellite Meteorology Research Section, Environment and Climate Change Canada, Dorval, Quebec, Canada

Open access

Abstract

Many ensemble data assimilation (DA) approaches suffer from the so-called inbreeding problem. As a consequence, there is an excessive reduction in ensemble spread by the DA procedure, causing the analysis ensemble spread to systematically underestimate the uncertainty of the ensemble mean analysis. The stochastic EnKF used for operational NWP in Canada largely avoids this problem by applying cross validation, that is, using an independent subset of ensemble members for updating each member. The goal of the present study is to evaluate two new variations of the local ensemble transform Kalman filter (LETKF) that also incorporate cross validation. In idealized numerical experiments with Gaussian-distributed background ensembles, the two new LETKF approaches are shown to produce reliable analysis ensembles such that the ensemble spread closely matches the uncertainty of the ensemble mean, without any ensemble inflation. In ensemble DA experiments with highly nonlinear idealized forecast models, the deterministic version of the LETKF with cross validation quickly diverges, but the stochastic version produces better results, nearly identical to the stochastic EnKF with cross validation. In the context of a regional NWP system, ensemble DA experiments are performed with the two new LETKF-based approaches with cross validation, the standard LETKF, and the stochastic EnKF. All approaches with cross validation produce similar ensemble spread at the first analysis time, though the amplitude of the changes to the individual members is larger with the stochastic approaches. Over the 10-day period of the experiments, the fit of the ensemble mean background state to radiosonde observations is statistically indistinguishable for all approaches evaluated.

Denotes content that is immediately available upon publication as open access.

For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Mark Buehner, mark.buehner@canada.ca

1. Introduction

Several ensemble data assimilation (DA) algorithms are currently used for applications to numerical weather prediction (NWP). These generally fall into three categories: the stochastic ensemble Kalman filter (EnKF) that assimilates perturbed observations (e.g., Houtekamer and Mitchell 1998), the class of approaches collectively referred to as deterministic ensemble square root filters (EnSRF; Tippett et al. 2003) and the local version of the ensemble transform Kalman filter (LETKF; Hunt et al. 2007). Both the EnSRF and LETKF can be considered deterministic ensemble filters as they do not require perturbing the observations. However, they differ in how observations are assimilated in that the EnSRF assimilates small batches of observations serially to update model variables at all affected grid points and the LETKF simultaneously assimilates all observations affecting a single grid point. At Environment and Climate Change Canada (ECCC), the stochastic EnKF has been used operationally for ensemble DA since 2005. Since then, some studies suggest that the assimilation of satellite radiance observations in ensemble DA systems can be improved by modifying the method for spatial localization (Campbell et al. 2010; Lei et al. 2018). This localization method can be efficiently implemented with the LETKF approach using an expanded ensemble to implicitly account for model space covariance localization (Bishop et al. 2017). For this and other reasons related to algorithmic differences between the stochastic EnKF and the LETKF, it was decided to perform a comparison between the stochastic EnKF and LETKF approaches at ECCC using both idealized numerical experiments and initial tests with an experimental regional NWP system for ensemble DA.

In the typical case of using a small ensemble size relative to the number of observations and model variables, many ensemble DA algorithms suffer from the so-called inbreeding problem (section 3 of Houtekamer and Zhang 2016). This occurs when the analysis update of an ensemble member is computed using error statistics estimated from an ensemble that includes the member being updated. An important consequence of inbreeding is the excessive reduction in ensemble spread by the DA procedure such that the analysis ensemble spread systematically underestimates the uncertainty of the ensemble mean analysis (even when the background ensemble spread is consistent with the uncertainty of the ensemble mean background state). Most of the current implementations of the LETKF and EnSRF are among the algorithms that suffer from this problem. Numerous ensemble inflation methods can be used to compensate for this loss of reliability of the ensemble, including additive inflation, multiplicative inflation and relaxation methods (e.g., Houtekamer et al. 2009; Whitaker and Hamill 2012). Even without the problem of inbreeding, some type of inflation is still typically necessary to compensate for various sources of error in both the model and the data assimilation procedure itself. Nevertheless, it is preferable that the ensemble DA algorithm not suffer from inbreeding and therefore modifies the ensemble distribution to accurately reflect the reduction in uncertainty in the ensemble mean from assimilating observations.

The effect of inbreeding can be substantially reduced or even eliminated by applying the concept of cross validation. This was introduced by Houtekamer and Mitchell (1998) in the context of a stochastic EnKF and also discussed more generally by Houtekamer and Zhang (2016). With this approach, the ensemble is split into equal-sized subensembles. Then, for each subensemble, the analysis update is computed using a Kalman gain matrix estimated using the other, independent, members. Houtekamer et al. (2009) show how, in the context of idealized DA experiments with a perfect NWP model, the ensemble spread closely matches the uncertainty in the ensemble mean when using a stochastic EnKF with cross validation. Using a simple scalar model with chaotic behavior, Mitchell and Houtekamer (2009, hereafter MH2009) further examined the effectiveness of cross validation in a stochastic EnKF for different total ensemble sizes and numbers of subensembles. To date, cross validation has only been extensively applied to the stochastic EnKF algorithm developed and used for operational NWP at ECCC. Whitaker and Hamill (2002) also describe a version of their deterministic EnSRF with cross validation, but this was only tested with an idealized low-dimensional model. They note that the use of cross validation in the EnSRF requires significantly more computations while producing results qualitatively similar to those of the original EnSRF when tested with an idealized model. When tested with a highly nonlinear model, Bowler et al. (2013) showed that the EnSRF with cross validation could, under certain conditions, lead to highly non-Gaussian ensembles and divergence of the ensemble mean from observations. Cross validation has not yet been applied to the LETKF algorithm for ensemble data assimilation.

The goal of the present study is to present and evaluate two new variations of the LETKF that use cross validation to reduce the effect of inbreeding. The first variation is a “stochastic” approach that, like the current ECCC implementation of the EnKF, uses random perturbations to sample the uncertainty due to observation error so that the full Kalman gain matrix (normally used in the LETKF only for updating the ensemble mean) can be used for updating each ensemble member independently. The second variation is a “deterministic” approach, like the original LETKF, that relies on the gain form of the LETKF as proposed by Bishop et al. (2017). As with the stochastic EnKF with cross validation, both LETKF approaches with cross validation use a subset of ensemble members independent from the member perturbation being updated. The current study also provides one of the first comparisons between stochastic and deterministic ensemble filters in a fully realistic NWP context.

In the following section, the ensemble DA approaches considered in this study are described: standard stochastic EnKF, stochastic EnKF with cross validation (hereafter EnKF-CV), standard LETKF, stochastic LETKF with cross validation (LETKF-SCV), and deterministic LETKF with cross validation (LETKF-DCV). Results are presented in section 3 from applying these approaches to the simple problem of updating a Gaussian-distributed background ensemble that is repeatedly randomly generated from a known true error covariance matrix. In sections 4 and 5, the ensemble DA approaches are applied to idealized DA experiments using the logistic map and the 40-variable model of Lorenz and Emanuel (1998), respectively, as the forecast model. Preliminary results are presented in section 6 from applying these approaches to an experimental ensemble DA system for NWP based on a limited-area atmospheric model over Canada. Finally, some conclusions are given in section 7.

2. Ensemble DA approaches

a. Standard LETKF

The local version of the ensemble transform Kalman filter (Bishop et al. 2001; Hunt et al. 2007) is a commonly used ensemble DA approach for NWP (e.g., Miyoshi et al. 2010; Schraff et al. 2016). To allow efficient implementation on a parallel computer, the analysis procedure is performed independently at each grid point by assimilating all of the $N_y$ observations within a specified radius of influence of the grid point (though the algorithm is usually made more efficient by following the approach described by Yang et al. 2009). Following this local approach, the ensemble members are updated through a two-step procedure (for more details see Hunt et al. 2007). In the first step, a set of weights is computed that, when multiplied by the local background ensemble perturbations, generates the analysis increment for the ensemble mean at the local grid point being considered. These weights are computed with a Kalman analysis equation (based on the Kalman gain matrix $\mathbf{K} = \mathbf{P}^a \mathbf{H}^T \mathbf{R}^{-1}$) that is modified to operate within the subspace spanned by the local ensemble member perturbations:
$$\bar{\mathbf{w}}^a = \tilde{\mathbf{P}}^a (\mathbf{Y}^b)^T \mathbf{R}^{-1} (\mathbf{y}^o - \bar{\mathbf{y}}^b), \tag{1}$$
where $\mathbf{R}$ is the observation error covariance matrix, $\mathbf{y}^o$ is the vector of observations, $\bar{\mathbf{y}}^b$ is the ensemble mean of the background ensemble in observation space, given by
$$\bar{\mathbf{y}}^b = \frac{1}{N_e} \sum_{m=1}^{N_e} H(\mathbf{x}_m^b), \tag{2}$$
the matrix $\tilde{\mathbf{P}}^a$ is the $N_e \times N_e$ analysis error covariance matrix in the subspace spanned by the $N_e$ background ensemble perturbations, given by
$$\tilde{\mathbf{P}}^a = \left[(N_e - 1)\mathbf{I} + (\mathbf{Y}^b)^T \mathbf{R}^{-1} \mathbf{Y}^b\right]^{-1}, \tag{3}$$
and the $n$th column of the $N_y \times N_e$ matrix $\mathbf{Y}^b$ is the ensemble perturbation of the $n$th member projected into observation space,
$$[\mathbf{Y}^b]_n = H(\mathbf{x}_n^b) - \bar{\mathbf{y}}^b, \tag{4}$$
where $H(\mathbf{x})$ is the observation operator that projects the model state $\mathbf{x}$ into observation space. In the second step, an $N_e \times N_e$ weight matrix for the ensemble perturbations is computed such that it equals the symmetric square root of the analysis error covariance matrix $\tilde{\mathbf{P}}^a$ in the subspace spanned by the background ensemble perturbations, given by
$$\mathbf{W}^a = \left[(N_e - 1)\tilde{\mathbf{P}}^a\right]^{1/2}. \tag{5}$$
Multiplication of this weight matrix by the background ensemble perturbations results in an ensemble of perturbations with covariance equal to the estimated analysis error covariance. The final analysis for the $n$th ensemble member is then obtained by summing the weights for the ensemble mean increment and the $n$th column of the weight matrix for the ensemble perturbations and multiplying by the local background ensemble perturbations:
$$\mathbf{x}_n^a = \bar{\mathbf{x}}^b + \mathbf{X}^b\left(\bar{\mathbf{w}}^a + [\mathbf{W}^a]_n\right), \tag{6}$$
where $\bar{\mathbf{x}}^b$ is the background ensemble mean, $\bar{\mathbf{x}}^b = (1/N_e)\sum_{n=1}^{N_e} \mathbf{x}_n^b$, and the matrix $\mathbf{X}^b$ consists of the background ensemble perturbations (i.e., $[\mathbf{X}^b]_n = \mathbf{x}_n^b - \bar{\mathbf{x}}^b$).
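The two-step update described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the observation operator is taken to be linear and supplied as a matrix `H`, no localization is applied, and only a single local region is treated. The function name `letkf_update` and its argument layout are illustrative.

```python
import numpy as np

def letkf_update(xb, yo, H, R):
    """Standard LETKF update for one local region (sketch, no localization).

    xb : (Nx, Ne) background ensemble, one member per column
    yo : (Ny,) observation vector
    H  : (Ny, Nx) linear observation operator (assumption of this sketch)
    R  : (Ny, Ny) observation error covariance matrix
    """
    Ne = xb.shape[1]
    xb_mean = xb.mean(axis=1, keepdims=True)
    Xb = xb - xb_mean                      # background perturbations
    yb = H @ xb
    yb_mean = yb.mean(axis=1, keepdims=True)
    Yb = yb - yb_mean                      # perturbations in observation space

    C = Yb.T @ np.linalg.inv(R)            # (Yb)^T R^-1
    A = (Ne - 1) * np.eye(Ne) + C @ Yb     # inverse of the subspace analysis covariance
    lam, E = np.linalg.eigh(A)             # A is symmetric positive definite

    Pa = E @ np.diag(1.0 / lam) @ E.T                  # subspace analysis covariance
    w_mean = Pa @ C @ (yo - yb_mean[:, 0])             # step 1: weights for the mean
    Wa = E @ np.diag(np.sqrt((Ne - 1) / lam)) @ E.T    # step 2: symmetric square root

    # Each analysis member: mean + Xb (w_mean + n-th column of Wa)
    return xb_mean + Xb @ (w_mean[:, None] + Wa)
```

Using the eigendecomposition for both the inverse and the symmetric square root avoids computing two separate matrix functions of the same matrix.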

Spatial covariance localization is an important aspect of any ensemble-based assimilation approach to reduce the effect of sampling error when using relatively small ensembles (Hamill and Whitaker 2001). With the LETKF approach, it is not possible to directly apply localization to the background error covariances due to the analysis procedure described above being performed within the subspace spanned by the ensemble member perturbations [though, as mentioned in the introduction, it is possible to use an expanded ensemble to implicitly account for model space localization (Bishop et al. 2017)]. Therefore, instead of reducing the background error covariance for increasingly separated pairs of locations, the observation error variance is increased with increasing distance between the observation and the local grid point where the analysis is being computed (Hunt et al. 2007). As a consequence, the influence of observations is smoothly reduced as the distance increases between the observation and the local analysis state grid point. If the observation error correlation matrix is assumed to be diagonal, then spatial localization is applied to the calculation of the weight fields by multiplying the diagonal elements of the matrix $\mathbf{R}^{-1}$ by a monotonically decreasing localization function.
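As an illustration of this observation-space localization, the following sketch scales the diagonal of the inverse observation error covariance by a distance-dependent weight. The Gaussian form `exp(-0.5 * (d / L)**2)` is an assumption made here for concreteness, as are the function name `localized_rinv_diag` and the one-dimensional positions; the text only requires a monotonically decreasing function.

```python
import numpy as np

def localized_rinv_diag(obs_var, obs_pos, grid_pos, length_scale):
    """Diagonal of R^-1 after localization around one analysis grid point.

    obs_var      : (Ny,) observation error variances (diagonal R assumed)
    obs_pos      : (Ny,) observation positions along a 1D axis
    grid_pos     : position of the grid point being analyzed
    length_scale : localization length scale (same units as the positions)
    """
    dist = np.abs(np.asarray(obs_pos, dtype=float) - grid_pos)
    # Assumed Gaussian localization function: 1 at zero distance, decreasing with distance
    loc = np.exp(-0.5 * (dist / length_scale) ** 2)
    # Scaling R^-1 down is equivalent to inflating the observation error variance
    return loc / np.asarray(obs_var, dtype=float)
```

A distant observation thus receives a small weight in the local analysis, which has the same effect as assigning it a large error variance.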

b. Stochastic EnKF with and without cross validation

The stochastic EnKF (Burgers et al. 1998) estimates the full Kalman gain matrix [i.e., $\mathbf{K} = \mathbf{P}\mathbf{H}^T(\mathbf{H}\mathbf{P}\mathbf{H}^T + \mathbf{R})^{-1}$] by computing the error covariances from the complete ensemble of background states. To correctly account for observation uncertainty, the assimilated observations are perturbed such that the perturbations have zero ensemble mean and are randomly drawn from a Gaussian distribution with covariance equal to the observation error covariance matrix $\mathbf{R}$. The same Kalman gain matrix is used to compute the full analysis state for each ensemble member. Using a similar notation as for the LETKF, the analysis for the $n$th member is therefore given by
$$\mathbf{x}_n^a = \mathbf{x}_n^b + \mathbf{X}^b(\mathbf{Y}^b)^T\left[\mathbf{Y}^b(\mathbf{Y}^b)^T + (N_e - 1)\mathbf{R}\right]^{-1}\left[\mathbf{y}^o + \boldsymbol{\varepsilon}_n - H(\mathbf{x}_n^b)\right], \tag{7}$$
where $\boldsymbol{\varepsilon}_n$ is the observation random perturbation for the $n$th ensemble member.
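A minimal NumPy sketch of this perturbed-observation update is given below. It assumes a linear observation operator supplied as a matrix `H` and no localization; the function name `enkf_update` and the use of a NumPy random generator `rng` are illustrative choices, not details from the study.

```python
import numpy as np

def enkf_update(xb, yo, H, R, rng):
    """Stochastic EnKF update with perturbed observations (sketch).

    xb  : (Nx, Ne) background ensemble, one member per column
    yo  : (Ny,) observation vector
    H   : (Ny, Nx) linear observation operator (assumption of this sketch)
    R   : (Ny, Ny) observation error covariance matrix
    rng : numpy.random.Generator used to draw the observation perturbations
    """
    Ne = xb.shape[1]
    Xb = xb - xb.mean(axis=1, keepdims=True)
    yb = H @ xb
    Yb = yb - yb.mean(axis=1, keepdims=True)

    # Perturbations drawn from N(0, R), then forced to have zero ensemble mean
    eps = rng.multivariate_normal(np.zeros(len(yo)), R, size=Ne).T   # (Ny, Ne)
    eps -= eps.mean(axis=1, keepdims=True)

    # Same gain for every member, estimated from the complete ensemble
    K = Xb @ Yb.T @ np.linalg.inv(Yb @ Yb.T + (Ne - 1) * R)
    return xb + K @ (yo[:, None] + eps - yb)
```

Because the perturbations have zero ensemble mean, the analysis ensemble mean is unaffected by the particular random draw.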

Spatial covariance localization is applied in this EnKF approach by applying a Schur product (i.e., element-wise multiplication) to both the $\mathbf{X}^b(\mathbf{Y}^b)^T$ and $\mathbf{Y}^b(\mathbf{Y}^b)^T$ covariance matrices. These matrices are multiplied by localization matrices such that the resulting covariances are progressively reduced for increasing distance either between observation and grid point (for the first matrix) or between two observations (for the second matrix).

The stochastic EnKF that is currently an important component of the suite of ECCC operational NWP systems uses cross validation (Houtekamer and Mitchell 1998; MH2009). In the absence of any model error, this approach produces analysis ensembles with spread that closely matches the error in the ensemble mean (as demonstrated in Fig. 3 of Houtekamer et al. 2009). With k-fold cross validation (see section 3c of Houtekamer and Zhang 2016), the original background ensemble is divided into k subensembles of equal size. Then the analysis update for all members within a given subensemble is obtained using background error covariances estimated using all of the other, independent, subensembles. For the EnKF-CV algorithm, the analysis update is computed for each member using an estimate of the full Kalman gain matrix. The nth member of the analysis ensemble is obtained as
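$$\mathbf{x}_n^a = \mathbf{x}_n^b + \hat{\mathbf{X}}^b(\hat{\mathbf{Y}}^b)^T\left[\hat{\mathbf{Y}}^b(\hat{\mathbf{Y}}^b)^T + (\hat{N}_e - 1)\mathbf{R}\right]^{-1}\left[\mathbf{y}^o + \boldsymbol{\varepsilon}_n - H(\mathbf{x}_n^b)\right], \tag{8}$$
where the “hat” on $\hat{\mathbf{X}}^b$ and $\hat{\mathbf{Y}}^b$ indicates that they only consist of the $\hat{N}_e$ ensemble member perturbations from all subensembles that do not include the $n$th member.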
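The k-fold cross-validation update can be sketched as follows. This is an illustrative sketch only: it assumes a linear observation operator `H`, no localization, contiguous subensembles, and perturbations of the retained members computed about the mean of those retained members (the text does not specify the last point). The observation perturbations `eps` are passed in by the caller, and the name `enkf_cv_update` is hypothetical.

```python
import numpy as np

def enkf_cv_update(xb, yo, H, R, eps, k):
    """Stochastic EnKF with k-fold cross validation (sketch).

    xb  : (Nx, Ne) background ensemble (float array), one member per column
    yo  : (Ny,) observation vector
    H   : (Ny, Nx) linear observation operator (assumption of this sketch)
    R   : (Ny, Ny) observation error covariance matrix
    eps : (Ny, Ne) observation perturbations, one column per member
    k   : number of subensembles
    """
    Ne = xb.shape[1]
    yb = H @ xb
    xa = np.empty_like(xb)
    for fold in np.array_split(np.arange(Ne), k):
        # Members from all other subensembles, independent of the ones being updated
        others = np.setdiff1d(np.arange(Ne), fold)
        Nh = len(others)
        Xh = xb[:, others] - xb[:, others].mean(axis=1, keepdims=True)
        Yh = yb[:, others] - yb[:, others].mean(axis=1, keepdims=True)
        K = Xh @ Yh.T @ np.linalg.inv(Yh @ Yh.T + (Nh - 1) * R)
        xa[:, fold] = xb[:, fold] + K @ (yo[:, None] + eps[:, fold] - yb[:, fold])
    return xa
```

Setting `k` equal to the ensemble size gives the leave-one-out configuration, in which each member is updated with a gain estimated from all of the other members.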

In the context of a realistic NWP application, the observations cannot all be assimilated simultaneously with the stochastic EnKF, since a matrix with dimension equal to the number of observations must be inverted. Therefore, the complete set of observations is divided into batches of more manageable size. All ensemble members are then updated after each batch is assimilated sequentially, one after the other [see Houtekamer and Mitchell (2005) for more details on the efficient implementation of this EnKF algorithm].

c. Stochastic LETKF with cross validation

Because the analysis update for the LETKF is performed within the subspace spanned by the background ensemble perturbations, it is not immediately apparent how to apply cross validation within the LETKF algorithm. However, one possible approach can be formulated by noting the similar role of the analysis equations of the EnKF-CV algorithm [Eq. (8)] and of the LETKF equation for computing the ensemble mean analysis increment [Eq. (1)]. In both cases, the full Kalman gain matrix is computed using ensemble estimates of the background error covariances, though the equations appear to be very different (mostly because they are based on alternative, but mathematically equivalent, forms of the Kalman gain matrix). Consequently, by following the same approach as the EnKF-CV, we can obtain a stochastic LETKF with cross validation (LETKF-SCV). It should be emphasized that this approach can be considered as simply an alternative approach for implementing the stochastic EnKF with cross validation, though the results can significantly differ when assimilating a large number of observations and applying spatial localization. However, due to the numerous similarities with the original LETKF algorithm with respect to the mathematical formulation and technical implementation (i.e., only minor modifications are required to an existing LETKF system to obtain the LETKF-SCV algorithm) it was decided, somewhat arbitrarily, to include “LETKF” in the name of the approach instead of simply referring to it as a variant of the stochastic EnKF.

In the LETKF-SCV approach, the full Kalman gain matrix is applied to each ensemble member and the observations are perturbed to obtain the following weight vector for the nth ensemble member:
$$\mathbf{w}_n^a = \hat{\mathbf{P}}^a(\hat{\mathbf{Y}}^b)^T\mathbf{R}^{-1}\left[\mathbf{y}^o + \boldsymbol{\varepsilon}_n - H(\mathbf{x}_n^b)\right], \tag{9}$$
where $\hat{\mathbf{P}}^a$ and $\hat{\mathbf{Y}}^b$ are computed using only the members from all subensembles that do not contain the $n$th member. This weight vector represents the analysis increment for the $n$th ensemble member in the subspace spanned by the same subset of background ensemble perturbations used to compute the Kalman gain matrix in (9). Then, the $n$th analysis ensemble member is computed as
$$\mathbf{x}_n^a = \mathbf{x}_n^b + \hat{\mathbf{X}}^b\mathbf{w}_n^a, \tag{10}$$
where $\hat{\mathbf{X}}^b$ contains the background ensemble perturbations from all subensembles that do not contain the $n$th member. This approach is similar to one proposed by Bishop et al. (2017) that also uses perturbed observations in the context of an ensemble transform Kalman filter, but for facilitating the use of an expanded ensemble to account for model space covariance localization.
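The LETKF-SCV update can be sketched in the same style. The sketch below assumes a linear observation operator `H`, no localization, leave-one-out subensembles, and perturbations of the retained members taken about their own mean; the name `letkf_scv_update` is hypothetical and the observation perturbations `eps` are supplied by the caller.

```python
import numpy as np

def letkf_scv_update(xb, yo, H, R, eps):
    """Stochastic LETKF with leave-one-out cross validation (sketch).

    xb  : (Nx, Ne) background ensemble (float array), one member per column
    yo  : (Ny,) observation vector
    H   : (Ny, Nx) linear observation operator (assumption of this sketch)
    R   : (Ny, Ny) observation error covariance matrix
    eps : (Ny, Ne) observation perturbations, one column per member
    """
    Ne = xb.shape[1]
    yb = H @ xb
    Rinv = np.linalg.inv(R)
    xa = np.empty_like(xb)
    for n in range(Ne):
        others = np.delete(np.arange(Ne), n)       # all members except the n-th
        Nh = Ne - 1
        Xh = xb[:, others] - xb[:, others].mean(axis=1, keepdims=True)
        Yh = yb[:, others] - yb[:, others].mean(axis=1, keepdims=True)
        # Subspace analysis covariance built only from the independent members
        Pa = np.linalg.inv((Nh - 1) * np.eye(Nh) + Yh.T @ Rinv @ Yh)
        # Weight vector for member n using its own perturbed innovation
        w = Pa @ Yh.T @ Rinv @ (yo + eps[:, n] - yb[:, n])
        xa[:, n] = xb[:, n] + Xh @ w
    return xa
```

Without localization, this weight-space form is algebraically equivalent to applying the cross-validated Kalman gain of the EnKF-CV to the same perturbed innovation.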

Alternatively, the analysis ensemble obtained with (10) could be recentered on an analysis mean computed with the same equation as with the standard LETKF that uses the entire ensemble and unperturbed observations [Eq. (1)]. This is the approach that will be used in tests with a realistic NWP system so as to reduce the differences between the different LETKF approaches being tested.

d. Deterministic LETKF with cross validation

To avoid the need to randomly perturb the observations, as in the standard LETKF, a deterministic LETKF with cross validation (LETKF-DCV) can be formulated using the so-called gain form of the LETKF. This approach was originally proposed by Bishop et al. (2017) to facilitate the use of an expanded ensemble that implicitly accounts for model space covariance localization (Lei et al. 2018). They demonstrate how the expanded ensemble is used to compute a modified Kalman gain matrix for updating only the ensemble perturbations of the much smaller original background ensemble. It is straightforward to apply the same procedure to cross validation such that the modified gain matrix is computed using a subset of the ensemble members that is independent from the member being updated.

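For the LETKF-DCV approach, the ensemble mean analysis increment is first computed using the same approach as the standard LETKF. Then, the analysis ensemble perturbation for the $n$th member is obtained by first computing the $\hat{N}_e \times \hat{N}_e$ matrix:
$$\hat{\mathbf{A}} = (\hat{\mathbf{Y}}^b)^T\mathbf{R}^{-1}\hat{\mathbf{Y}}^b, \tag{11}$$
and its eigendecomposition:
$$\hat{\mathbf{A}} = \hat{\mathbf{E}}\hat{\boldsymbol{\Lambda}}\hat{\mathbf{E}}^T, \tag{12}$$
where again, the “hat” indicates quantities that only consist of the $\hat{N}_e$ ensemble members from all subensembles that do not include the $n$th member. Then, [similar to Eq. (10) of Lei et al. (2018)] the analysis ensemble perturbation for the $n$th member is given by
$$[\mathbf{X}^a]_n = [\mathbf{X}^b]_n - \hat{\mathbf{X}}^b\hat{\mathbf{E}}\left\{\mathbf{I} - (\hat{N}_e - 1)^{1/2}\left[\hat{\boldsymbol{\Lambda}} + (\hat{N}_e - 1)\mathbf{I}\right]^{-1/2}\right\}\hat{\boldsymbol{\Lambda}}^{-1}\hat{\mathbf{E}}^T(\hat{\mathbf{Y}}^b)^T\mathbf{R}^{-1}[\mathbf{Y}^b]_n. \tag{13}$$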
As pointed out by Bishop et al. (2017), the second term on the RHS plays the role of the modified Kalman gain in the EnSRF of Whitaker and Hamill (2002) that, when applied to the background ensemble perturbations, provides the appropriate analysis ensemble perturbation. Cross validation is indeed used in (13), since the nth background ensemble perturbation is updated using quantities computed with only ensemble members from all subensembles that do not include the nth member.

Note that, unlike the standard LETKF approach, the analysis ensemble perturbations obtained with (13) are not guaranteed to have a zero ensemble mean. Therefore, the mean is simply removed before addition to the mean analysis.
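The gain-form perturbation update with cross validation can be sketched as below. This is a sketch under several assumptions: a linear observation operator `H`, no localization, leave-one-out subensembles, and retained-member perturbations taken about their own mean. Zero eigenvalues are skipped because the corresponding eigenvectors contribute nothing to the update; the tolerance `tol` and the function names are illustrative choices.

```python
import numpy as np

def gain_form_update(xp, yp, Xh, Yh, Rinv, tol=1e-10):
    """Update one perturbation xp (obs-space counterpart yp) with the modified
    gain built from the independent perturbations Xh, Yh."""
    Nh = Xh.shape[1]
    lam, E = np.linalg.eigh(Yh.T @ Rinv @ Yh)
    lam = np.clip(lam, 0.0, None)                 # guard against tiny negative round-off
    d = np.sqrt((Nh - 1) / (lam + Nh - 1))
    f = np.zeros_like(lam)
    pos = lam > tol
    f[pos] = (1.0 - d[pos]) / lam[pos]            # {I - D} Lambda^-1 on the nonzero modes
    return xp - Xh @ (E @ (f * (E.T @ (Yh.T @ (Rinv @ yp)))))

def letkf_dcv_perturbations(xb, H, R):
    """Analysis perturbations for the deterministic LETKF with cross validation."""
    Ne = xb.shape[1]
    Xb = xb - xb.mean(axis=1, keepdims=True)      # perturbations about the full mean
    Yb = H @ Xb                                   # valid because H is linear here
    Rinv = np.linalg.inv(R)
    Xa = np.empty_like(Xb)
    for n in range(Ne):
        others = np.delete(np.arange(Ne), n)
        Xh = xb[:, others] - xb[:, others].mean(axis=1, keepdims=True)
        Xa[:, n] = gain_form_update(Xb[:, n], Yb[:, n], Xh, H @ Xh, Rinv)
    # The cross-validated perturbations need not sum to zero, so remove their mean
    return Xa - Xa.mean(axis=1, keepdims=True)
```

When `gain_form_update` is given the full set of perturbations instead of an independent subset, it reproduces the symmetric square root update of the standard LETKF, which provides a convenient consistency check.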

3. Experiments with idealized 1D ensembles

a. Experimental setup

The ensemble DA approaches are first compared by applying them to an idealized situation of assimilating observations that lie along a single nonperiodic spatial dimension with evenly spaced grid points. Gaussian-distributed background ensembles with 10 members are randomly generated from a known true state (equal to zero) and error covariance matrix such that the background ensemble spread is statistically consistent with the error in the background ensemble mean. To achieve this, the ensemble mean is first computed by adding to the true state a random draw from the true error distribution. The ensemble perturbations added to this mean are computed as random draws from the same distribution, but with the sample ensemble mean removed. Note that there is no cycling of the ensemble through time in these experiments. The Gaussian function is used to define the true background error correlations with a length scale of two grid units, and both the background and observation error variances are equal to one. In this idealized situation, the error distributions are all Gaussian with known covariances and therefore all assumptions involved in the formulation of the EnKF and LETKF are respected.
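The generation of one random background ensemble can be sketched as follows. The exact definition of the Gaussian correlation length scale is not given in this section, so the form `exp(-0.5 * (d / L)**2)` used here is an assumption, as are the function names.

```python
import numpy as np

def gaussian_covariance(n_grid, length_scale, variance=1.0):
    """Homogeneous covariance on an evenly spaced, nonperiodic 1D grid.
    The Gaussian form exp(-0.5 (d/L)^2) is an assumption of this sketch."""
    i = np.arange(n_grid)
    d = i[:, None] - i[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def draw_background_ensemble(P, n_members, rng, truth=None):
    """Return (ensemble mean, ensemble) with spread consistent with the mean error."""
    n = P.shape[0]
    truth = np.zeros(n) if truth is None else truth
    # Ensemble mean: true state plus one draw from the true error distribution
    ens_mean = truth + rng.multivariate_normal(np.zeros(n), P)
    # Perturbations: draws from the same distribution with the sample mean removed
    pert = rng.multivariate_normal(np.zeros(n), P, size=n_members).T
    pert -= pert.mean(axis=1, keepdims=True)
    return ens_mean, ens_mean[:, None] + pert

rng = np.random.default_rng(0)
P = gaussian_covariance(n_grid=10, length_scale=2.0)
ens_mean, ens = draw_background_ensemble(P, n_members=10, rng=rng)
```

Repeating the last call many times with fresh random draws gives the independent realizations over which the statistics are accumulated.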

Several configurations of the idealized experiments are conducted. In the first, only a single observation is assimilated to correct the background ensemble. The entire spatial domain for this experiment consists of 10 grid points evenly spaced along a single spatial dimension. In two additional configurations, 20 randomly distributed observations are assimilated within a spatial domain of 20 grid points. Since the observations are located at randomly chosen grid points, some grid points may have no observations, whereas others can have numerous independent observations. The ensemble DA approaches are applied to this configuration both with and without spatial covariance localization applied. The spatial localization function is defined as the Gaussian function with a length scale of three grid units. To obtain robust statistical results, the numerical experiments are conducted repeatedly over 10 000 random realizations of the background and observation error and the ensemble perturbations. The approaches with cross validation are implemented such that each subensemble contains only a single ensemble member. Consequently, when performing the analysis update for any given member, the covariances are estimated using all of the other members (i.e., $\hat{N}_e = N_e - 1$).

b. Results

The square root of the mean squared errors (RMSE) of the background ensemble mean and analysis ensemble mean are computed over the 10 000 random realizations. For a reliable ensemble, this RMSE should be consistent with the ensemble spread as measured with the standard deviation (stddev) of the ensemble perturbations with respect to the ensemble mean. Figure 1 shows the RMSE of the ensemble mean (blue curves) and the ensemble spread (red curves) for the background ensemble (pale curves) and the analysis ensembles produced by applying each of the ensemble DA approaches with no spatial localization used. Without cross validation, applying either the EnKF or LETKF approach produces analysis ensembles with a spread that systematically underestimates the error in the ensemble mean throughout the spatial domain (Fig. 1a). When applied to such a simple problem with no covariance localization, the EnKF and LETKF produce identical results in terms of both the RMSE of the ensemble mean and average ensemble spread. The approaches with cross validation produce analysis ensembles with a spread that is more consistent with the error in the ensemble mean (Fig. 1b). This consistency is highest for the EnKF-CV and LETKF-SCV approaches, which produce identical results. With the LETKF-DCV approach, the ensemble spread matches very closely the error in the ensemble mean at the observation location (eighth grid point), but still underestimates the error at locations distant from the observation, though to a much lesser extent than the approaches without cross validation.
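The two diagnostics compared in this section can be computed along the following lines. This sketch assumes that the average spread is the square root of the realization-mean ensemble variance (with the unbiased `ddof=1` normalization); the text does not state the averaging convention, and the function name and array layout are illustrative.

```python
import numpy as np

def rmse_and_spread(ens, truth):
    """Ensemble-mean RMSE and average spread at each grid point.

    ens   : (n_realizations, n_grid, n_members) ensembles
    truth : (n_realizations, n_grid) true states
    """
    ens_mean = ens.mean(axis=2)
    rmse = np.sqrt(((ens_mean - truth) ** 2).mean(axis=0))
    # Assumed convention: average the variance over realizations, then take the root
    spread = np.sqrt(ens.var(axis=2, ddof=1).mean(axis=0))
    return rmse, spread
```

For a reliable ensemble the two returned curves should nearly coincide at every grid point.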

Fig. 1.

The RMSE of the analysis ensemble mean (blue) and average analysis ensemble spread (red) computed over 10 000 realizations for (a) EnKF (○), LETKF (+) and (b) EnKF-CV (○), LETKF-SCV (+), LETKF-DCV (×). For this configuration, there are 10 ensemble members, 1 observation (with location denoted by the gray diamond near the bottom of the panel) and 10 grid points. The true homogeneous background error correlation function has a length scale of two grid points. The RMSE of the background ensemble mean (pale blue) and background ensemble spread (pale red) are also shown.

Citation: Monthly Weather Review 148, 6; 10.1175/MWR-D-19-0402.1

Similarly, when assimilating 20 observations, the approaches without cross validation produce analysis ensembles with spread that systematically underestimates the error in the ensemble mean (Fig. 2a). The extent of this underestimation is much greater than when only a single observation was assimilated. The ensemble spread is again much closer to the error in the ensemble mean when using the approaches with cross validation (Fig. 2b), especially at the well observed locations with multiple observations. At locations that are not directly observed, the spread differs from the error in the ensemble mean, with the EnKF-CV and LETKF-SCV producing ensembles with spread larger than the ensemble mean error and the LETKF-DCV producing ensembles with spread smaller than the ensemble mean error. However, this lack of consistency is again much less pronounced than with the approaches without cross validation. When spatial localization is applied to the same configuration, the ensemble mean error and the ensemble spread become more consistent, even for the approaches without cross validation (Fig. 3a), due to both a reduction in the ensemble mean error and an increase in the ensemble spread. The consistency is nearly perfect throughout the spatial domain when applying spatial localization with the three DA approaches with cross validation (Fig. 3b).

Fig. 2.

As in Fig. 1, but for a configuration with 20 grid points and 20 randomly located observations (the number of observations at each grid point is indicated by the size of the gray diamonds).

Fig. 3.

As in Fig. 2, but with spatial localization applied to the covariance matrix estimated from the ensemble. The localization function has a length scale of 3 grid points.

Even though the analysis ensemble mean error and spread are both nearly identical when comparing the stochastic and deterministic approaches (i.e., EnKF versus LETKF and EnKF-CV/LETKF-SCV versus LETKF-DCV), this is obtained by very different changes made to the individual ensemble perturbations during the analysis procedure. The assimilation of perturbed observations to account for observation uncertainty results in larger changes to the ensemble perturbations for the stochastic approaches as compared with the use of a symmetric square root of the analysis error covariances in the deterministic approaches. Figure 4 shows the stddev of the analysis updates to the ensemble perturbations (i.e., the differences between the analysis and background ensemble perturbations) for EnKF and LETKF (Fig. 4a) and also for EnKF-CV, LETKF-SCV, and LETKF-DCV (Fig. 4b). The stochastic approaches both with and without cross validation have a higher stddev than the deterministic approaches. In fact, for this particular experimental configuration, the use of cross validation has a much smaller effect on the amplitude of these increments than the choice between a deterministic and a stochastic ensemble DA approach (compare Figs. 4a and 4b).

Fig. 4.

The stddev of the ensemble perturbation increments for (a) EnKF (○), LETKF (+) and (b) EnKF-CV (○), LETKF-SCV (+), LETKF-DCV (×) for the case with 20 observations and spatial localization applied. The gray line without symbols denotes the background error stddev.

4. Experiments with the logistic map

a. Experimental setup

As opposed to the perfectly Gaussian distributions used in the experiments just presented, many realistic ensemble DA applications, including for NWP, can involve non-Gaussian distributions. A simple scalar model, the logistic map (Strogatz 2015), was used by MH2009 to examine different approaches for avoiding inbreeding in the stochastic EnKF. The model is nonlinear, exhibits chaotic behavior, and therefore can produce ensembles with non-Gaussian distributions. Following the procedure used by MH2009, a large number of ensemble DA experiments are performed (10 000) using each of the approaches described in section 2. The initial conditions of the true state are uniformly sampled within the interval (0.25, 0.75) and the following form of the logistic map is integrated for 12 time units:
$$x(t) = 4\,x(t-1)\,[1 - x(t-1)],$$
where x(t) is the state variable at time index t (for integer values of t > 0). For the data assimilation experiments, both the initial ensemble mean and the observations were generated by perturbing the true state with a Gaussian random number with stddev equal to 0.01. As in the previous experiments, cross validation was implemented such that each subensemble contains only a single ensemble member.
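The truth run and the synthetic observations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the random seed, and the choice of one observation per time index (including the initial time) are assumptions not stated in the text.

```python
import random


def logistic_map(x):
    """One step of the logistic map x(t) = 4 x(t-1) [1 - x(t-1)]."""
    return 4.0 * x * (1.0 - x)


def simulate_truth_and_obs(n_steps=12, obs_std=0.01, seed=0):
    """Return (truth, obs) lists of length n_steps + 1.

    Assumption: one observation is generated at every time index,
    including the initial time.
    """
    rng = random.Random(seed)
    x = rng.uniform(0.25, 0.75)  # initial true state
    truth = [x]
    obs = [x + rng.gauss(0.0, obs_std)]  # truth plus Gaussian error
    for _ in range(n_steps):
        x = logistic_map(x)
        truth.append(x)
        obs.append(x + rng.gauss(0.0, obs_std))
    return truth, obs


truth, obs = simulate_truth_and_obs()
```

Repeating this with 10 000 different seeds would give the set of realizations over which the statistics are accumulated.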

b. Results

The resulting RMSE of the ensemble mean and the ensemble spread at both the initial and final times are presented for 6 (Table 1) and 12 (Table 2) ensemble members. The results obtained with the EnKF and EnKF-CV approaches are very similar to those obtained by MH2009 (their Tables 1 and 4, respectively). The LETKF results are also very similar to those obtained by MH2009 (their Table 7) when using the EnSRF of Whitaker and Hamill (2002). As expected, the LETKF-SCV approach produced results identical to those of EnKF-CV. The LETKF-DCV approach, on the other hand, produced unstable realizations in which the analyzed value of the state variable quickly diverged from its true value. MH2009 also showed that, for a sufficiently small ensemble size, the stochastic EnKF with 2 subensembles could lead to an unstable realization. They explain that this instability occurs when the two subensembles become sufficiently dissimilar from each other. When this happens, the subensemble with smaller spread is updated using a larger gain than the subensemble with larger spread, thus amplifying the difference between the spreads of the two subensembles. The same behavior is seen in numerical experiments with the LETKF-DCV approach (an example is shown in Table 3). In fact, it appears to occur more frequently with LETKF-DCV, which becomes unstable even with larger ensembles (up to 24-member ensembles were tested) for which the stochastic EnKF with cross validation provides stable results. This is likely due to the added disadvantage that the LETKF-DCV approach tends to maintain, and even amplify, any non-Gaussianity in the background ensemble during the analysis update. Lawson and Hansen (2004) also observed this feature for a deterministic ensemble DA approach, whereas a stochastic EnKF tended to produce an analysis ensemble that was more Gaussian than the background ensemble.
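The instability mechanism described by MH2009 can be illustrated with a small numerical sketch. It assumes a directly observed scalar state (observation operator equal to one) and two subensembles, each updated with a gain built from the sample variance of the other; the member values and the function name are hypothetical and chosen only to make the effect visible.

```python
import statistics


def cv_gains(sub_a, sub_b, obs_var):
    """Scalar Kalman gains with 2-subensemble cross validation.

    Each subensemble is updated with a gain computed from the sample
    variance of the *other* subensemble (scalar state, H = 1).
    """
    var_a = statistics.variance(sub_a)
    var_b = statistics.variance(sub_b)
    gain_for_a = var_b / (var_b + obs_var)  # covariance from B updates A
    gain_for_b = var_a / (var_a + obs_var)  # covariance from A updates B
    return gain_for_a, gain_for_b


tight = [0.499, 0.500, 0.501]  # subensemble with small spread
wide = [0.45, 0.50, 0.55]      # subensemble with large spread
gain_tight, gain_wide = cv_gains(tight, wide, obs_var=0.01 ** 2)
```

In this sketch the tight subensemble receives the larger gain and the wide subensemble the smaller one, so the tight subensemble is pulled strongly toward the observation while the wide one is barely corrected, which is consistent with the amplification of the spread difference described above.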

Table 1.

Resulting analysis ensemble mean RMSE and ensemble spread at the initial and final times from ensemble DA experiments with the logistic map and using 6 members for EnKF, EnKF-CV, LETKF, LETKF-SCV, and LETKF-DCV obtained from 10 000 realizations. In this and following tables, ∞ indicates when the model produces unrealistically large values after the ensemble has diverged from the true state.

Table 2.

As in Table 1, but with 12 ensemble members.

Table 3.

Detailed results of a single unstable LETKF-DCV realization with 6 ensemble members. The outlier member is indicated by the column with the numbers in bold.

5. Experiments with the Lorenz model

a. Experimental setup

The ensemble DA approaches are also tested with the Lorenz and Emanuel (1998) model that has a single periodic spatial dimension and therefore benefits from the use of spatial localization. This model is chaotic and has been used in numerous past studies for the initial testing of ensemble DA algorithms (e.g., Whitaker and Hamill 2002; Bowler et al. 2013). A configuration with 40 model grid points, a time step equal to 0.05 time units and 20 ensemble members was used. The observations are located at every second grid point, have an error stddev equal to one and are assimilated at every time step. This is similar to one of the configurations used by Bowler et al. (2013) in their tests of various ensemble DA approaches. The initial true state was obtained by first running the model for 100 time steps from a state with an infinitesimal perturbation added at the eighth grid point to an otherwise spatially constant state. The initial ensemble mean for the DA experiments was obtained by randomly perturbing (with stddev of 0.4) the true state. The initial ensemble perturbations were also randomly generated (with stddev of 0.4). For both the truth and DA runs, the model was then integrated for 1100 time steps and the first 100 time steps are not used in the comparison. As for the idealized experiments described in the previous two sections, cross validation was implemented such that each subensemble contains only a single ensemble member. Spatial localization was implemented by using the fifth-order polynomial with compact support proposed by Gaspari and Cohn (1999). The best results were obtained with a localization function that becomes zero at a distance of 10 grid points.
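For readers unfamiliar with this model, a minimal sketch of the truth-run spin-up follows. The tendency equation is the standard form of the Lorenz and Emanuel (1998) model; the forcing value of 8, the fourth-order Runge-Kutta scheme, the perturbation size, and the choice of which half of the grid points is observed are assumptions, since the text does not specify them.

```python
import numpy as np


def l96_tendency(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F on a periodic domain."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def l96_step(x, dt=0.05, forcing=8.0):
    """Advance one time step with fourth-order Runge-Kutta (an assumption)."""
    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = l96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = l96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


n_grid = 40
x = np.full(n_grid, 8.0)          # spatially constant state
x[7] += 1.0e-3                    # small perturbation at the eighth grid point
for _ in range(100):              # spin-up of 100 time steps
    x = l96_step(x)

obs_index = np.arange(0, n_grid, 2)  # every second grid point (offset assumed)
```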

b. Results

Table 4 provides the RMSE of the analysis ensemble mean and the mean analysis ensemble spread, both averaged temporally and spatially. The results are qualitatively similar to those obtained with the logistic map. In particular, the LETKF-DCV approach produces highly non-Gaussian ensembles with a single outlier member that quickly diverges from all the other members. This is also consistent with the results of Bowler et al. (2013), who found that the EnSRF could perform very poorly when applying a cross validation approach with subensembles. The two stochastic approaches with cross validation (i.e., EnKF-CV and LETKF-SCV) produce slightly different results because spatial localization is applied differently in the two approaches. Both approaches produce ensembles with spread that overestimates the error in the ensemble mean to a small degree.

Table 4.

Resulting time averaged analysis ensemble mean RMSE and analysis ensemble spread from data assimilation experiments with the Lorenz and Emanuel (1998) model. Note that the ensemble DA approaches without cross validation employed random additive inflation (EnKF: σ = 0.085, LETKF: σ = 0.02).

Uncorrelated random additive inflation applied to the analysis ensembles is used for both the EnKF and LETKF approaches so that the ensemble spread is approximately consistent with the error of the ensemble mean. For this, the EnKF approach requires significantly more inflation (σ = 0.085) than the LETKF (σ = 0.02). Without random inflation (not shown), the EnKF approach results in a very large error in the ensemble mean, whereas the LETKF approach produces ensembles that are only slightly less accurate than with additive inflation. The standard LETKF approach with random additive inflation results in the lowest ensemble mean error, whereas the EnKF with random additive inflation results in the highest ensemble mean error.

6. Experiments with a regional NWP model

a. Experimental setup

An experimental regional NWP system is used for performing an initial comparison between the three LETKF-based approaches (LETKF, LETKF-SCV, and LETKF-DCV) and the EnKF-CV approach currently used operationally at ECCC for global ensemble DA. A regional configuration of the Global Environmental Multiscale (GEM) model (Girard et al. 2014) is employed with a spatial domain covering most of Canada and northern United States, 10 km grid spacing and 80 vertically staggered levels between the surface and 0.1 hPa. All ensemble DA approaches with cross validation use 8 subensembles. The same lateral boundary conditions (provided by a global EnKF) are used for all experiments. In addition, the same additive inflation (applied to the analysis ensembles) is used for all experiments by adding random realizations generated with spatial and multivariate covariances from a low-resolution and scaled version of the static background-error covariance matrix (with stddev multiplied by 0.25) from the global deterministic NWP system that is based on 4D-EnVar (Buehner et al. 2015). This additive inflation approach follows that used in the operational global EnKF at ECCC (Houtekamer et al. 2019).

The specific configuration of the EnKF-CV is similar to the system described by Bédard et al. (2018), except for a reduced ensemble size of 128 members and more severe spatial localization. The localization function is the same function from Gaspari and Cohn (1999) already mentioned, such that the horizontal covariances are forced to zero at a horizontal distance of 1400 km for vertical levels below 400 hPa, which increases to a distance of 2000 km for levels above 14 hPa. Vertically, the covariances are forced to zero at a “distance” of 2.0 units of the natural logarithm of pressure. For the three LETKF experiments, a shorter localization length scale is used, such that the horizontal covariances are forced to zero at a distance of 1200 km for all levels and in the vertical at 1.5 units of the natural logarithm of pressure, which was found to be necessary to obtain a similar fit of the mean of the first analysis ensemble to radiosonde observations. This is consistent with the results of Greybush et al. (2011) that suggest the optimal localization length scale is shorter when using so-called R localization in the LETKF than when using B localization in the EnKF.
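As a reminder of the shape of this localization function, the sketch below evaluates the fifth-order piecewise rational function of Gaspari and Cohn (1999). The function name and the parameterization by `cutoff` (the distance at which the function reaches zero, i.e., twice the half-width used in the original formulation) are choices made here for illustration.

```python
def gaspari_cohn(distance, cutoff):
    """Fifth-order compactly supported correlation function.

    `cutoff` is the distance at which the function becomes zero.
    """
    c = 0.5 * cutoff
    r = abs(distance) / c
    if r <= 1.0:
        return (-0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3
                - (5.0 / 3.0) * r**2 + 1.0)
    if r < 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
                + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0


# Lorenz-model setting: zero at a distance of 10 grid points.
weights_lorenz = [gaspari_cohn(d, cutoff=10.0) for d in range(11)]

# LETKF NWP setting: horizontal weight at 600 km with a 1200 km cutoff.
weight_nwp = gaspari_cohn(600.0, cutoff=1200.0)
```

With this parameterization the weight is 1 at zero separation, 5/24 at half the cutoff distance, and exactly 0 at and beyond the cutoff.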

The 10-day experiments are performed with the first analysis at 1800 UTC 30 June 2016 and subsequent analyses performed every 6 h. The same set of observations that passed the quality control procedures of the regional deterministic prediction system are assimilated for all ensemble DA approaches, including those from radiosondes, aircraft, land stations, ships, buoys, scatterometers, atmospheric motion vectors, satellite-based radio occultation, and brightness temperature from microwave and infrared satellite sounders/imagers. While the EnKF-CV approach assimilates the observations sequentially in batches, the LETKF simultaneously assimilates all observations within the distance from a given grid point such that the localization function is nonzero. This algorithmic difference and the difference in the approach used for localization may cause different results from the EnKF-CV and LETKF-SCV approaches. The standard LETKF and LETKF-DCV have the additional difference with respect to the EnKF-CV approach due to the assimilation of observations without perturbations added. Unlike the other approaches, the standard LETKF does not use cross validation and therefore requires an additional procedure to maintain sufficient ensemble spread. For the experiment presented here, the relaxation to prior spread (RTPS) approach, introduced by Whitaker and Hamill (2012), is used with a coefficient of 0.8 (where this value was chosen empirically based on preliminary testing with various values).
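The RTPS step can be sketched as follows, using the inflation formula of Whitaker and Hamill (2012), in which each analysis perturbation is multiplied by α(σb − σa)/σa + 1, where σb and σa are the background and analysis ensemble stddev at each grid point. The array layout (members along the first axis), the function name, and the synthetic input data are assumptions made for illustration; the sketch also assumes a nonzero analysis spread everywhere.

```python
import numpy as np


def rtps(analysis_perts, background_perts, alpha=0.8):
    """Relaxation to prior spread.

    Both inputs have shape (n_members, n_state) with the ensemble mean
    already removed. Assumes the analysis spread is nonzero everywhere.
    """
    sigma_b = background_perts.std(axis=0, ddof=1)
    sigma_a = analysis_perts.std(axis=0, ddof=1)
    factor = alpha * (sigma_b - sigma_a) / sigma_a + 1.0
    return analysis_perts * factor


rng = np.random.default_rng(0)
background = rng.normal(size=(20, 5))
background -= background.mean(axis=0)   # perturbations about the mean
analysis = 0.5 * background             # synthetic analysis with halved spread
inflated = rtps(analysis, background, alpha=0.8)
```

After inflation the spread equals (1 − α)σa + ασb, so with α = 0.8 most of the spread reduction produced by the analysis is undone, which is consistent with the small reduction in spread reported for the LETKF with RTPS.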

As explained in section 2, with cross validation the gain matrix is computed using ensemble covariances obtained from only a subset of the complete ensemble. As part of this calculation, the ensemble mean of the subset should be subtracted from the members. However, when applied to the LETKF, this would significantly increase the computational cost when the number of observations is large, since it necessitates the recalculation of the matrix $(\mathbf{Y}^b)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{Y}^b$ for each subensemble. To avoid these additional computations for the NWP experiments, an alternative approach is taken. Since this matrix is in the space of the ensemble members, each matrix element corresponds to a pair of members. Therefore, to obtain the required matrix for updating a single subensemble, a subset of the matrix elements is simply extracted from the matrix computed with the full ensemble. For the relatively large ensemble size used for the NWP experiments, the effect of this simplification is likely to be small. For the idealized experiments in the previous sections, which used much smaller ensemble sizes, the ensemble mean was subtracted for the subset of members used to compute covariances for the analysis update of each subensemble.
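The extraction step can be sketched as below. The sketch assumes a diagonal observation-error covariance matrix, synthetic observation-space perturbations, and that the elements extracted for a given subensemble are those corresponding to the members outside that subensemble (the independent members); all variable and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_members, n_sub = 50, 16, 4

# Observation-space perturbations about the FULL ensemble mean.
Yb = rng.normal(size=(n_obs, n_members))
Yb -= Yb.mean(axis=1, keepdims=True)
r_inv = 1.0 / rng.uniform(0.5, 2.0, size=n_obs)  # diagonal of R^{-1}

# Computed once with the full ensemble: shape (n_members, n_members).
C_full = Yb.T @ (r_inv[:, None] * Yb)

subensembles = np.array_split(np.arange(n_members), n_sub)


def cv_matrix(k):
    """Matrix used to update subensemble k, built from the other members."""
    others = np.setdiff1d(np.arange(n_members), subensembles[k])
    return C_full[np.ix_(others, others)], others


C_sub, others = cv_matrix(0)
```

The extracted block equals the product recomputed from the columns of the independent members alone, but with perturbations still defined relative to the full-ensemble mean rather than the subset mean, which is the simplification discussed above.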

The LETKF algorithms were implemented within the same modular FORTRAN code used for the operational deterministic 4D-EnVar, which is nearly independent from the EnKF software. For the experiments performed in this study, the overall computational cost is lower for the LETKF approaches, though both systems could likely be further optimized.

b. Results from first analysis time

All ensemble DA experiments begin by performing 6-h ensemble forecasts initialized from the same analysis ensemble obtained from a global stochastic EnKF experiment. Then, the resulting ensemble after the first analysis is compared when using the different approaches. Because the background ensemble is the same for each experiment, these analysis ensembles differ only due to the single application of each ensemble DA procedure.

Figures 5–7 show the background ensemble spread and the analysis ensemble spread from applying each assimilation approach. The standard LETKF without RTPS is also included in this comparison, though this approach was not cycled over the 10-day period. For surface pressure (Fig. 5), the ensemble spread is significantly reduced by all of the approaches with respect to the background spread (Fig. 5a), except when using the standard LETKF with RTPS (Fig. 5d). As expected, the ensemble spread is reduced the most with the standard LETKF without RTPS (Fig. 5c) due to inbreeding. The application of RTPS is intended to compensate for inbreeding by inflating the analysis ensemble spread, which results in only a small reduction in spread relative to the background ensemble. All approaches with cross validation produce analysis ensembles with similar spread (Figs. 5b,e,f). Qualitatively similar results are also seen for temperature (Fig. 6) and zonal wind (Fig. 7) at the model level near 300 hPa. Based on these comparisons (and similar comparisons for other variables and vertical levels, which are not shown), the two LETKF approaches with cross validation produce analysis ensemble spread very similar to that of the EnKF-CV approach currently used at ECCC.

Fig. 5.

Ensemble spread stddev from (a) the first background ensemble and also from the first analysis ensemble in the 10-day regional NWP ensemble DA experiments produced by (b) EnKF-CV, (c) LETKF, (d) LETKF with RTPS, (e) LETKF-SCV, and (f) LETKF-DCV for surface pressure (in units of hPa).

Fig. 6.

As in Fig. 5, but for temperature (in units of K) at 300 hPa.

Fig. 7.

As in Fig. 6, but for the zonal wind component (in units of m s−1) at 300 hPa.

The ensemble mean analysis increment is shown in Fig. 8 for the same three variables and vertical levels as shown in the previous figures. Results for only the EnKF-CV (Figs. 8a,c,e) and standard LETKF (Figs. 8b,d,f) are shown, since all variations of the LETKF approach produce the same ensemble mean increment for the first analysis time (since, as mentioned in section 2c, they all used the same equation as the standard LETKF to compute the ensemble mean increment). While the large-scale pattern of the increments is similar between the approaches, many local details differ. In general, the EnKF-CV approach tends to produce increments with larger local values, especially apparent for temperature and zonal wind at 300 hPa in several areas (e.g., over British Columbia and near Lake Superior). These differences are likely due to the two major differences in the implementation of the approaches: the EnKF-CV assimilates observations in batches and applies covariance localization to the background-error covariances, while the LETKF assimilates all observations simultaneously at each analysis grid point and applies covariance localization by modifying the observation-error covariance matrix.

Fig. 8.

Analysis increment for the ensemble mean at the first analysis time in the 10-day regional NWP ensemble DA experiments for (a),(c),(e) EnKF-CV and (b),(d),(f) LETKF shown for (a),(b) surface pressure (in units of hPa), (c),(d) temperature (in units of K) at 300 hPa, and (e),(f) the zonal wind component (in units of m s−1) at 300 hPa.

It was previously demonstrated that the three approaches with cross validation produce very similar analysis ensemble spread (Figs. 5–7). However, as seen from the idealized experiments in section 3, the stochastic approaches (EnKF-CV and LETKF-SCV) can produce much larger changes to the ensemble perturbation of each member than the deterministic approach with cross validation (LETKF-DCV). Figures 9–11 show the stddev of the increment to the ensemble perturbations for surface pressure (Fig. 9) and also for temperature (Fig. 10) and zonal wind (Fig. 11), both at 300 hPa. Consistent with the results from the idealized experiments, these figures show that the stochastic approaches (EnKF-CV and LETKF-SCV) make much larger changes to the ensemble perturbations than the deterministic approaches (LETKF with RTPS and LETKF-DCV). The fact that different ensemble DA approaches can produce similar analysis ensemble spread but different analysis ensemble perturbations is discussed by Tippett et al. (2003) in the context of comparing different types of ensemble square root filters. It may be expected that approaches that make smaller changes to the ensemble perturbations during the analysis procedure may be more effective at maintaining the dynamical balances present in the individual background ensemble members. It is also possible, however, that deterministic approaches, especially when combined with cross validation, may result in ensembles that diverge more from a Gaussian distribution, as seen in the idealized experiments.

Fig. 9.

Stddev of increments to the ensemble perturbations for surface pressure (in units of hPa) for the first analysis time in the 10-day regional NWP ensemble DA experiments for (a) EnKF-CV, (b) LETKF with RTPS, (c) LETKF-SCV, and (d) LETKF-DCV.

Fig. 10.

As in Fig. 9, but for temperature (in units of K) at 300 hPa.

Fig. 11.

As in Fig. 10, but for the zonal wind component (in units of m s−1) at 300 hPa.

c. Results from 10-day experiments

Each ensemble DA approach was cycled for a period of 10 days and the resulting ensemble mean analysis and background states compared with radiosonde observations. The resulting bias and stddev of the radiosonde observations minus the corresponding ensemble mean values are shown in Fig. 12 for both meridional wind (Figs. 12a,c) and temperature (Figs. 12b,d). Overall, the results from all LETKF-based approaches (standard LETKF with RTPS, LETKF-SCV, and LETKF-DCV) are very similar to those from using the EnKF-CV approach. Statistically significant differences with respect to EnKF-CV are indicated in the figure with filled circles. Only the observation-minus-analysis temperature bias shows many vertical levels with statistically significant differences for the two deterministic approaches (standard LETKF with RTPS and LETKF-DCV), though these differences represent both decreases and increases in bias, depending on the vertical level. For the ensemble mean background state, no differences with EnKF-CV are statistically significant with the curves of all four experiments being nearly indistinguishable.

Fig. 12.

The bias and stddev of the (a),(b) analysis ensemble mean and (c),(d) background ensemble mean with respect to radiosonde observations of (a),(c) meridional wind and (b),(d) temperature for the 10-day regional NWP ensemble DA experiments: stochastic EnKF with cross validation (black), LETKF with RTPS (red), LETKF-SCV (cyan), and LETKF-DCV (green). Any statistically significant differences between the stochastic EnKF and other experiments are indicated with filled circles.

A series of 3-day ensemble forecasts were produced from each of the ensemble data assimilation experiments. In terms of the RMSE of the ensemble mean, ensemble spread and the continuous ranked probability score, few of the differences between the approaches are statistically significant.

7. Conclusions

In this study, two new ensemble DA approaches are presented that allow the problem of inbreeding to be largely eliminated by applying the concept of cross validation to the LETKF approach. With cross validation, the analysis update for a given ensemble member perturbation is performed using covariances estimated from an independent subset of ensemble members. In the LETKF-SCV approach, perturbed observations are assimilated and the full gain matrix is applied to each ensemble member. In the LETKF-DCV approach, the gain form of the LETKF (Bishop et al. 2017) is used to perform the analysis update for each ensemble member perturbation. A comparison between these and the EnKF-CV approach was carried out using both idealized numerical experiments and realistic regional NWP experiments. A summary of the results follows:

  • in idealized tests with a Gaussian distributed background ensemble, all methods with cross validation (EnKF-CV, LETKF-SCV, and LETKF-DCV) produce analysis ensembles that are much more consistent with the error in the ensemble mean than the standard approaches without cross validation (EnKF and LETKF);

  • in idealized ensemble DA experiments with highly nonlinear forecast models, the LETKF-DCV approach produces non-Gaussian ensembles that can eventually result in total failure of the approach, whereas the stochastic approaches with cross validation (EnKF-CV and LETKF-SCV) are more robust and produce ensembles that have similar or lower ensemble mean error than the EnKF without cross validation; and

  • in a realistic regional NWP application over a period of 10 days, the first analysis ensembles have very similar ensemble spread when using any of the approaches with cross validation (EnKF-CV, LETKF-SCV, and LETKF-DCV), whereas use of the standard LETKF approach with multiplicative inflation (RTPS) produces a much larger analysis ensemble spread at the first analysis time; all approaches (both with and without cross validation) result in statistically indistinguishable observation-minus-ensemble mean background statistics over the 10-day experiments.

The promising results of this study have motivated plans for future work to perform a more comprehensive comparison of the current EnKF-CV approach with LETKF approaches both with and without cross validation in the context of the ECCC operational global ensemble DA system. If one of the LETKF approaches provides ensemble forecast results of comparable quality without increased computational cost, then the LETKF could be considered as a possible replacement of the EnKF-CV approach in the operational system. This could be justified because an ensemble DA approach based on the LETKF would allow the implementation of improved vertical localization to benefit the assimilation of satellite radiance observations (Lei et al. 2018). Moreover, the LETKF approaches are implemented in the same modular FORTRAN software used for the operational regional and global deterministic DA systems based on 4D-EnVar (Buehner et al. 2015; Caron et al. 2015). As a consequence, a switch from EnKF-CV to an LETKF-based approach would also improve the efficiency of many types of modifications to the data assimilation systems (e.g., addition of new types of assimilated observations) that currently need to be implemented separately in both the deterministic and ensemble DA software libraries. Comparisons will also need to be performed of the LETKF approaches with the EnKF-CV approach in the context of a hybrid DA system in which a 4D-EnVar data assimilation procedure is used to partially recenter the members, as is currently done in the operational system (Houtekamer et al. 2019).

Acknowledgments

The author thanks Seung Jong Baek for help with the regional EnKF experiments and Craig Bishop for initial positive feedback on the basic idea of applying the gain-form LETKF for cross validation. Jean-François Caron, Pieter Houtekamer, and three anonymous reviewers are also thanked for comments that helped to improve an earlier version of the paper.

REFERENCES

  • Bédard, J., M. Buehner, J.-F. Caron, S. J. Baek, and L. Fillion, 2018: Practical ensemble-based approaches to estimate atmospheric background error covariances for limited-area deterministic data assimilation. Mon. Wea. Rev., 146, 37173733, https://doi.org/10.1175/MWR-D-18-0145.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bishop, C. H., J. S. Whitaker, and L. Lei, 2017: Gain form of the ensemble transform Kalman filter and its relevance to satellite data assimilation with model space ensemble covariance localization. Mon. Wea. Rev., 145, 45754592, https://doi.org/10.1175/MWR-D-17-0102.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bowler, N. E., J. Flowerdew, and S. R. Pring, 2013: Tests of different flavours of EnKF on a simple model. Quart. J. Roy. Meteor. Soc., 139, 15051519, https://doi.org/10.1002/qj.2055.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Buehner, M., and Coauthors, 2015: Implementation of deterministic weather forecasting systems based on ensemble–variational data assimilation at Environment Canada. Part I: The global system. Mon. Wea. Rev., 143, 25322559, https://doi.org/10.1175/MWR-D-14-00354.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Burgers, G., P. Jan van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 17191724, https://doi.org/10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Mon. Wea. Rev., 138, 282290, https://doi.org/10.1175/2009MWR3017.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Caron, J., T. Milewski, M. Buehner, L. Fillion, M. Reszka, S. Macpherson, and J. St-James, 2015: Implementation of deterministic weather forecasting systems based on ensemble–variational data assimilation at Environment Canada. Part II: The regional system. Mon. Wea. Rev., 143, 25602580, https://doi.org/10.1175/MWR-D-14-00353.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723757, https://doi.org/10.1002/qj.49712555417.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Girard, C., and Coauthors, 2014: Staggered vertical discretization of the Canadian Environmental Multiscale (GEM) model using a coordinate of the log-hydrostatic-pressure type. Mon. Wea. Rev., 142, 11831196, https://doi.org/10.1175/MWR-D-13-00255.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Greybush, S. J., E. Kalnay, T. Miyoshi, K. Ide, and B. R. Hunt, 2011: Balance and ensemble Kalman filter localization techniques. Mon. Wea. Rev., 139, 511522, https://doi.org/10.1175/2010MWR3328.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hamill, T. H., and J. S. Whitaker, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 27762790, https://doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796811, https://doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 32693289, https://doi.org/10.1256/qj.05.135.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 44894532, https://doi.org/10.1175/MWR-D-15-0440.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., H. L. Mitchell, and X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. Mon. Wea. Rev., 137, 21262143, https://doi.org/10.1175/2008MWR2737.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., M. Buehner, and M. De La Chevrotière, 2019: Using the hybrid gain algorithm to sample data assimilation uncertainty. Quart. J. Roy. Meteor. Soc., 145, 3556, https://doi.org/10.1002/qj.3426.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hunt, B. R., E. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112126, https://doi.org/10.1016/j.physd.2006.11.008.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lawson, W. G., and J. A. Hansen, 2004: Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth. Mon. Wea. Rev., 132, 19661981, https://doi.org/10.1175/1520-0493(2004)132<1966:IOSADF>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lei, L., J. S. Whitaker, and C. Bishop, 2018: Improving assimilation of radiance observations by implementing model space localization in an ensemble Kalman filter. J. Adv. Model. Earth Syst., 10, 32213232, https://doi.org/10.1029/2018MS001468.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55, 399414, https://doi.org/10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Mitchell, H. L., and P. L. Houtekamer, 2009: Ensemble Kalman filter configurations and their performance with the logistic map. Mon. Wea. Rev., 137, 43254343, https://doi.org/10.1175/2009MWR2823.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Miyoshi, T., Y. Sato, and T. Kadowaki, 2010: Ensemble Kalman filter and 4D-Var intercomparison with the Japanese operational global analysis and prediction system. Mon. Wea. Rev., 138, 28462866, https://doi.org/10.1175/2010MWR3209.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Schraff, C., H. Reich, A. Rhodin, A. Schomburg, K. Stephan, A. Periáñez, and R. Potthast, 2016: Kilometre-scale ensemble data assimilation for the COSMO model (KENDA). Quart. J. Roy. Meteor. Soc., 142, 14531472, https://doi.org/10.1002/qj.2748.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Strogatz, S., 2015: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. 2nd ed. CRC Press, 532 pp., https://doi.org/10.1201/9780429492563.

    • Crossref
    • Export Citation
  • Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 14851490, https://doi.org/10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2.

  • Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.

  • Whitaker, J. S., and T. M. Hamill, 2012: Evaluating methods to account for system errors in ensemble data assimilation. Mon. Wea. Rev., 140, 3078–3089, https://doi.org/10.1175/MWR-D-11-00276.1.

  • Yang, S., E. Kalnay, B. Hunt, and N. E. Bowler, 2009: Weight interpolation for efficient data assimilation with the local ensemble transform Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 251–262, https://doi.org/10.1002/qj.353.

1 The length scale is defined in terms of the e-folding distance (i.e., the distance at which the correlation equals 1/e).
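
As a minimal illustration of this definition, the sketch below builds a homogeneous correlation matrix on a 10-point grid with an e-folding length scale of two grid points, matching the configuration described for Fig. 1. The Gaussian-shaped function exp(-(d/L)^2), the non-periodic grid, and the names `efolding_correlation` and `length_scale` are assumptions made here for illustration only; the functional form actually used in the experiments is not stated in this part of the text.

```python
import numpy as np


def efolding_correlation(n_points, length_scale):
    """Homogeneous correlation matrix that equals 1/e at `length_scale`.

    Assumption (not from the source): a Gaussian-shaped correlation
    exp(-(d / L)**2) on a non-periodic one-dimensional grid.
    """
    idx = np.arange(n_points)
    # Pairwise separation between grid points, in units of grid points.
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.exp(-(dist / length_scale) ** 2)


# 10 grid points, e-folding length scale of 2 grid points.
corr = efolding_correlation(n_points=10, length_scale=2.0)
```

With this choice, the correlation between two points separated by exactly `length_scale` grid points is 1/e by construction, whatever the grid size. A different functional form would need its own scaling to satisfy the same definition.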

Save
  • Bédard, J., M. Buehner, J.-F. Caron, S. J. Baek, and L. Fillion, 2018: Practical ensemble-based approaches to estimate atmospheric background error covariances for limited-area deterministic data assimilation. Mon. Wea. Rev., 146, 37173733, https://doi.org/10.1175/MWR-D-18-0145.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bishop, C. H., J. S. Whitaker, and L. Lei, 2017: Gain form of the ensemble transform Kalman filter and its relevance to satellite data assimilation with model space ensemble covariance localization. Mon. Wea. Rev., 145, 45754592, https://doi.org/10.1175/MWR-D-17-0102.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bowler, N. E., J. Flowerdew, and S. R. Pring, 2013: Tests of different flavours of EnKF on a simple model. Quart. J. Roy. Meteor. Soc., 139, 15051519, https://doi.org/10.1002/qj.2055.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Buehner, M., and Coauthors, 2015: Implementation of deterministic weather forecasting systems based on ensemble–variational data assimilation at Environment Canada. Part I: The global system. Mon. Wea. Rev., 143, 25322559, https://doi.org/10.1175/MWR-D-14-00354.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Burgers, G., P. Jan van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126, 17191724, https://doi.org/10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Mon. Wea. Rev., 138, 282290, https://doi.org/10.1175/2009MWR3017.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Caron, J., T. Milewski, M. Buehner, L. Fillion, M. Reszka, S. Macpherson, and J. St-James, 2015: Implementation of deterministic weather forecasting systems based on ensemble–variational data assimilation at Environment Canada. Part II: The regional system. Mon. Wea. Rev., 143, 25602580, https://doi.org/10.1175/MWR-D-14-00353.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723757, https://doi.org/10.1002/qj.49712555417.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Girard, C., and Coauthors, 2014: Staggered vertical discretization of the Canadian Environmental Multiscale (GEM) model using a coordinate of the log-hydrostatic-pressure type. Mon. Wea. Rev., 142, 11831196, https://doi.org/10.1175/MWR-D-13-00255.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Greybush, S. J., E. Kalnay, T. Miyoshi, K. Ide, and B. R. Hunt, 2011: Balance and ensemble Kalman filter localization techniques. Mon. Wea. Rev., 139, 511522, https://doi.org/10.1175/2010MWR3328.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hamill, T. H., and J. S. Whitaker, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 27762790, https://doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796811, https://doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 32693289, https://doi.org/10.1256/qj.05.135.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., and F. Zhang, 2016: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 144, 44894532, https://doi.org/10.1175/MWR-D-15-0440.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., H. L. Mitchell, and X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. Mon. Wea. Rev., 137, 21262143, https://doi.org/10.1175/2008MWR2737.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Houtekamer, P. L., M. Buehner, and M. De La Chevrotière, 2019: Using the hybrid gain algorithm to sample data assimilation uncertainty. Quart. J. Roy. Meteor. Soc., 145, 3556, https://doi.org/10.1002/qj.3426.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hunt, B. R., E. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112126, https://doi.org/10.1016/j.physd.2006.11.008.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lawson, W. G., and J. A. Hansen, 2004: Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth. Mon. Wea. Rev., 132, 19661981, https://doi.org/10.1175/1520-0493(2004)132<1966:IOSADF>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lei, L., J. S. Whitaker, and C. Bishop, 2018: Improving assimilation of radiance observations by implementing model space localization in an ensemble Kalman filter. J. Adv. Model. Earth Syst., 10, 32213232, https://doi.org/10.1029/2018MS001468.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55, 399414, https://doi.org/10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Mitchell, H. L., and P. L. Houtekamer, 2009: Ensemble Kalman filter configurations and their performance with the logistic map. Mon. Wea. Rev., 137, 43254343, https://doi.org/10.1175/2009MWR2823.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Miyoshi, T., Y. Sato, and T. Kadowaki, 2010: Ensemble Kalman filter and 4D-Var intercomparison with the Japanese operational global analysis and prediction system. Mon. Wea. Rev., 138, 28462866, https://doi.org/10.1175/2010MWR3209.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Schraff, C., H. Reich, A. Rhodin, A. Schomburg, K. Stephan, A. Periáñez, and R. Potthast, 2016: Kilometre-scale ensemble data assimilation for the COSMO model (KENDA). Quart. J. Roy. Meteor. Soc., 142, 14531472, https://doi.org/10.1002/qj.2748.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Strogatz, S., 2015: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. 2nd ed. CRC Press, 532 pp., https://doi.org/10.1201/9780429492563.

    • Crossref
    • Export Citation
  • Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 14851490, https://doi.org/10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 19131924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Whitaker, J. S., and T. M. Hamill, 2012: Evaluating methods to account for system errors in ensemble data assimilation. Mon. Wea. Rev., 140, 30783089, https://doi.org/10.1175/MWR-D-11-00276.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Yang, S., E. Kalnay, B. Hunt, and N. E. Bowler, 2009: Weight interpolation for efficient data assimilation with the local ensemble transform Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 251262, https://doi.org/10.1002/qj.353.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Fig. 1.

    The RMSE of the analysis ensemble mean (blue) and average analysis ensemble spread (red) computed over 10 000 realizations for (a) EnKF (○), LETKF (+) and (b) EnKF-CV (○), LETKF-SCV (+), LETKF-DCV (×). For this configuration, there are 10 ensemble members, 1 observation (with location denoted by the gray diamond near the bottom of the panel) and 10 grid points. The true homogeneous background error correlation function has a length scale of two grid points. The RMSE of the background ensemble mean (pale blue) and background ensemble spread (pale red) are also shown.

  • Fig. 2.

    As in Fig. 1, but for a configuration with 20 grid points and 20 randomly located observations (the number of observations at each grid point is indicated by the size of the gray diamonds).

  • Fig. 3.

    As in Fig. 2, but with spatial localization applied to the covariance matrix estimated from the ensemble. The localization function has a length scale of 3 grid points.

  • Fig. 4.

    The stddev of the ensemble perturbation increments for (a) EnKF (○), LETKF (+) and (b) EnKF-CV (○), LETKF-SCV (+), LETKF-DCV (×) for the case with 20 observations and spatial localization applied. The gray line without symbols denotes the background error stddev.

  • Fig. 5.

    Ensemble spread stddev for surface pressure (in units of hPa) from (a) the first background ensemble and from the first analysis ensemble in the 10-day regional NWP ensemble DA experiments produced by (b) EnKF-CV, (c) LETKF, (d) LETKF with RTPS, (e) LETKF-SCV, and (f) LETKF-DCV.

  • Fig. 6.

    As in Fig. 5, but for temperature (in units of K) at 300 hPa.

  • Fig. 7.

    As in Fig. 6, but for the zonal wind component (in units of m s⁻¹) at 300 hPa.

  • Fig. 8.

    Analysis increment for the ensemble mean at the first analysis time in the 10-day regional NWP ensemble DA experiments for (a),(c),(e) EnKF-CV and (b),(d),(f) LETKF shown for (a),(b) surface pressure (in units of hPa), (c),(d) temperature (in units of K) at 300 hPa, and (e),(f) the zonal wind component (in units of m s⁻¹) at 300 hPa.

  • Fig. 9.

    Stddev of increments to the ensemble perturbations for surface pressure (in units of hPa) for the first analysis time in the 10-day regional NWP ensemble DA experiments for (a) EnKF-CV, (b) LETKF with RTPS, (c) LETKF-SCV, and (d) LETKF-DCV.

  • Fig. 10.

    As in Fig. 9, but for temperature (in units of K) at 300 hPa.

  • Fig. 11.

    As in Fig. 10, but for the zonal wind component (in units of m s⁻¹) at 300 hPa.

  • Fig. 12.

    The bias and stddev of the (a),(b) analysis ensemble mean and (c),(d) background ensemble mean with respect to radiosonde observations of (a),(c) meridional wind and (b),(d) temperature for the 10-day regional NWP ensemble DA experiments: stochastic EnKF with cross validation (black), LETKF with RTPS (red), LETKF-SCV (cyan), and LETKF-DCV (green). Any statistically significant differences between the stochastic EnKF and other experiments are indicated with filled circles.
