## Abstract

The ensemble Kalman filter (EnKF) and its deterministic variants, mostly square root filters such as the ensemble transform Kalman filter (ETKF), represent a popular alternative to variational data assimilation schemes and are applied in a wide range of operational and research activities. Their forecast step employs an ensemble integration that fully respects the nonlinear nature of the analyzed system. In the analysis step, they implicitly assume the prior state and observation errors to be Gaussian. Consequently, in nonlinear systems, the analysis mean and covariance are biased, and these filters remain suboptimal. In contrast, the fully nonlinear, non-Gaussian particle filter (PF) only relies on Bayes’s theorem, which guarantees an exact asymptotic behavior, but because of the so-called curse of dimensionality it is exposed to weight collapse. Here, it is shown how to obtain a new analysis ensemble whose mean and covariance exactly match the Bayesian estimates. This is achieved by a deterministic matrix square root transformation of the forecast ensemble, and subsequently a suitable random rotation that significantly contributes to filter stability while preserving the required second-order statistics. The properties and performance of the proposed algorithm are further investigated via a set of experiments. They indicate that such a filter formulation can increase the analysis quality, even for relatively small ensemble sizes, compared to other ensemble filters in nonlinear, non-Gaussian scenarios. Localization enhances the potential applicability of this PF-inspired scheme in larger-dimensional systems. The proposed algorithm, which is fairly easy to implement and computationally efficient, is referred to as the nonlinear ensemble transform filter (NETF).

## 1. Introduction

Data assimilation combines model forecasts with real-world observations to achieve an optimal estimation of the state of a dynamical system. Given the strong sensitivity of highly nonlinear and chaotic systems such as Earth's atmosphere to their initial conditions, an advanced data assimilation system is a key prerequisite for generating good forecasts. Furthermore, data assimilation techniques are successfully applied in a reanalysis context to reconstruct the atmospheric state over extended periods, which is relevant for climate studies.

Data assimilation systems have advanced significantly in the past decades. Many developments concern the advancement of variational methods such as 4DVAR (Talagrand and Courtier 1987), which require the minimization of a high-dimensional cost function, and hence, a reliable tangent linear and adjoint model of the forward model has to be derived and maintained. 4DVAR can be classified as a smoother, as it aims at producing an analysis trajectory for a complete time window such that it represents an optimal fit to both the background state and all the observations. In contrast, filters only consider past observations and are usually applied in a sequential way by iterating a forecast step, which executes the forward model until observation time, and an analysis step, which turns the prior forecast state into a posterior estimate that contains observational information. The output of the analysis step is then reused as the initial condition for the next forecast step. Initiated by the introduction of the ensemble Kalman filter (EnKF; Evensen 1994), sequential, ensemble-based techniques now constitute a popular alternative to variational methods. Their particular advantage, at least for most state estimation applications, is the independence of the actual analysis algorithm from the specific forward model. Furthermore, the combination of both approaches into so-called hybrid systems is an active field of research, particularly for operational applications (e.g., Buehner et al. 2013; Lorenc et al. 2015).

The EnKF is based on the Kalman filter (KF; Kalman 1960), originally derived as an optimal solution (in a least squares sense) for linear models and linear observation operators. As this leads to Gaussian densities, the KF analysis step relies on the prior mean and covariance to compute the posterior mean and covariance. The KF analysis can be interpreted as a linear regression of the model state on the observation vector (Anderson 2010). To apply it to nonlinear systems, several approximations have been suggested. They all attempt to obtain realistic estimates of the prior mean and covariance within the forecast step. The extended KF employs a linearized propagation of the error covariance matrix, which can lead to unrealistic results in nonlinear problems (Miller et al. 1994). The EnKF introduced a new perspective by estimating the prior moments from a preceding ensemble forecast. This can be interpreted as a Monte Carlo approximation for solving the underlying Fokker–Planck equation (Evensen 2009). However, the update step remains Gaussian, since it utilizes the usual KF equations to determine the mean and covariance of the analysis ensemble.

The classical EnKF requires a perturbation of the observations in the analysis step in order to maintain the correct covariance (Burgers et al. 1998), which results in suboptimal behavior because of sampling errors, particularly for small ensemble sizes. As alternatives, the so-called deterministic EnKFs were suggested, which generate the analysis ensemble with the desired properties without perturbing the observations (Bishop et al. 2001; Anderson 2001; Whitaker and Hamill 2002; Evensen 2004; Ott et al. 2004). Most of them belong to the class of square root filters (Tippett et al. 2003; Nerger et al. 2012) as they require the computation of a matrix square root in order to determine an appropriate transformation matrix to be applied to the prior ensemble.

A distinct approach is represented by the particle filter (PF), first published by Gordon et al. (1993). Here, the properties of the analysis, mostly moments, are estimated directly via Bayes's theorem. The PF only requires some mild assumptions (Doucet et al. 2001, chapter 2) about the form of the prior probability density function (pdf) and does not restrict it to a particular type. Therefore, it can be regarded as a fully nonlinear, non-Gaussian filter. Even though this seems appealing, its application to high-dimensional systems is not straightforward because of the so-called curse of dimensionality (Snyder et al. 2008). In a Bayesian update in a high-dimensional space with many independent observations, many particles are likely to fall outside the region of significant probability mass and receive negligible weights; their information is effectively lost, which results in a filter collapse. The review by van Leeuwen (2009) summarized different PF variants and approximations, whereas the equivalent-weights PF (van Leeuwen 2010) represents a promising more recent development. It relies on a stochastic forward model to pull the particles toward the observations and employs suitable proposal densities to ensure almost equal weights. Other recent attempts to improve the PF apply a form of residual nudging (Luo and Hoteit 2014) or variational methods (implicit PF; e.g., Atkins et al. 2013).

During the past years, there has been research activity concerning the development of algorithms that attempt to address the drawbacks of both the EnKF and the PF. A popular approach, initiated by Anderson and Anderson (1999) and Pham (2001), is the representation of the forecast ensemble as a Gaussian mixture, allowing a better consideration of nonlinear features contained in the ensemble (e.g., Hoteit et al. 2008; Stordal et al. 2011; Hoteit et al. 2012; Frei and Künsch 2013a,b). Other authors use the EnKF as a proposal for a PF (Papadakis et al. 2010) or apply linear programming methods (Reich 2013).

As Lei and Bickel (2011) pointed out, in a nonlinear, non-Gaussian scenario the EnKFs will necessarily produce an analysis where the mean and covariance are biased, resulting from the assumption of a Gaussian prior pdf, which, in general, is not satisfied. They suggested the nonlinear ensemble adjustment filter (NLEAF), which uses a stochastic update mechanism designed such that the analysis mean and covariance are unbiased. This introduces sampling error as an additional source of uncertainty, and the algorithm may exhibit high computational costs even in moderate dimensions.

Our work continues their paradigm by deriving and exploring a novel deterministic formulation of the analysis step that aims at generating an analysis ensemble whose first two moments exactly match the Bayesian estimates. In contrast to the PF, the resulting algorithm does not attempt to estimate the full analysis pdf but concentrates on its most important properties. It can be classified as a square root filter, as it applies a properly derived transform matrix similar to the ensemble transform Kalman filter (ETKF). We also motivate and show that an additional moment-preserving random rotation contributes to filter stability by preventing the buildup of ensemble outliers, as similarly found for the ETKF (Lawson and Hansen 2004; Sakov et al. 2012). Together with a localization approach as an effective way to overcome the curse of dimensionality (van Leeuwen 2009), the algorithm may also offer a potential applicability to systems of larger dimensionality. Besides a theoretical exploration, the main focus of this work is the investigation of its properties in systems of different sizes and the interpretation of its performance. This particularly concerns a comparison with EnKF variants to assess the advantages of the nonlinear approach, and with the NLEAF in order to evaluate the effect of a deterministic update.

The remainder of this paper is organized as follows. Section 2 reviews the use of deterministic square root filters within the EnKF framework, with the focus on the ETKF. In section 3, we summarize the PF and reformulate its mean and covariance. This allows a straightforward derivation of the novel analysis algorithm in section 4, where we also discuss its principal characteristics. Section 5 contains an application of the new filter in various experiments in order to demonstrate and compare its performance properties. Finally, section 6 draws the main conclusions.

## 2. Ensemble Kalman filtering

This section reviews the EnKF framework. The technique of a deterministic ensemble transformation will be used for the derivation of the new filter in section 4.

### a. Classical EnKF

In the following discussion, we exclusively concentrate on the analysis step, and therefore neglect any time indices. The state of the dynamical system is described by a state vector **x** of size *d*. From the previous forecast step, an ensemble of *m* independent prior states representing the best estimate of the prior pdf (Evensen 1994) is given. Collecting all ensemble vectors as columns of a *d* × *m* matrix $\mathbf{X}^f$ and the ensemble perturbations as columns of $\mathbf{X}'^f$, that is, its *i*th column is $\mathbf{x}_i^f - \overline{\mathbf{x}}^f$, the estimates of the prior mean and covariance can be conveniently written as

$$\overline{\mathbf{x}}^f = \frac{1}{m}\sum_{i=1}^{m}\mathbf{x}_i^f \quad (1)$$

and

$$\mathbf{P}^f = \frac{1}{m-1}\,\mathbf{X}'^f\left(\mathbf{X}'^f\right)^{\mathrm{T}}, \quad (2)$$

where $\mathbf{X}'^f = \mathbf{X}^f - \overline{\mathbf{X}}^f$ and $\overline{\mathbf{X}}^f$ denotes the *d* × *m* matrix containing $\overline{\mathbf{x}}^f$ in each column.
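For illustration, these estimates translate directly into NumPy. The following sketch (the function name and array layout are our own choices, not part of the original formulation) computes the prior mean, the perturbation matrix, and the covariance estimate from a *d* × *m* ensemble:

```python
import numpy as np

def ensemble_stats(Xf):
    """Prior mean, perturbations, and covariance of a d x m ensemble.

    Implements the sample mean over the m columns and the unbiased
    covariance estimate built from the perturbation matrix.
    """
    d, m = Xf.shape
    x_mean = Xf.mean(axis=1)          # ensemble mean
    Xp = Xf - x_mean[:, None]         # perturbation matrix X'_f
    Pf = (Xp @ Xp.T) / (m - 1)        # unbiased covariance estimate
    return x_mean, Xp, Pf
```

Note that the perturbation columns sum to zero by construction, a property that is used repeatedly below.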

The KF assumes a multivariate Gaussian prior density, entirely described by the two moments, $p(\mathbf{x}) = \mathcal{N}\!\left(\overline{\mathbf{x}}^f, \mathbf{P}^f\right)$. Furthermore, suppose the observation is given by the vector **y** of size *k*, and an observation operator $H$ is available that translates a model state **x** into observation space. Its specific form depends on the explicit application; for example, it can represent a simple interpolation of model states, or a complex radiative transfer model to derive satellite radiances from model variables. In the KF framework, the observation is supposed to have Gaussian error statistics with covariance $\mathbf{R}$; therefore, its likelihood density is

$$p(\mathbf{y}\mid\mathbf{x}) \propto \exp\!\left[-\frac{1}{2}\left(\mathbf{y}-H(\mathbf{x})\right)^{\mathrm{T}}\mathbf{R}^{-1}\left(\mathbf{y}-H(\mathbf{x})\right)\right]. \quad (3)$$

Under these assumptions, the analysis pdf is also Gaussian, with the modified moments

$$\overline{\mathbf{x}}^a = \overline{\mathbf{x}}^f + \mathbf{K}\left(\mathbf{y} - \mathbf{H}\overline{\mathbf{x}}^f\right) \quad (4)$$

and

$$\mathbf{P}^a = \left(\mathbf{I}_d - \mathbf{K}\mathbf{H}\right)\mathbf{P}^f. \quad (5)$$

Here, $\mathbf{I}_d$ is the *d* × *d* identity matrix and $\mathbf{H}$ is the tangent linear observation operator. The Kalman gain is computed as

$$\mathbf{K} = \mathbf{P}^f\mathbf{H}^{\mathrm{T}}\left(\mathbf{H}\mathbf{P}^f\mathbf{H}^{\mathrm{T}} + \mathbf{R}\right)^{-1}. \quad (6)$$

For an EnKF, it is usually required that the analysis mean and covariance satisfy Eqs. (4) and (5). The correct mean can be easily guaranteed by recentering the ensemble. To retain the desired covariance matrix, the prior ensemble perturbations have to be updated correctly. For the original EnKF (Evensen 1994; Burgers et al. 1998), this request can be achieved by updating each ensemble member with the KF equation by assigning each member an artificially perturbed observation:

$$\mathbf{X}^a = \mathbf{X}^f + \mathbf{K}\left(\mathbf{Y} - \mathbf{H}\mathbf{X}^f\right), \quad (7)$$

where the *k* × *m* matrix $\mathbf{Y}$ contains *m* perturbed observation vectors drawn from $\mathcal{N}(\mathbf{y}, \mathbf{R})$.
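The perturbed-observation update can be sketched as follows, assuming for simplicity a linear observation operator given as a matrix. This helper (our own naming) is illustrative only; it forms the Kalman gain explicitly, which is feasible for small systems:

```python
import numpy as np

def enkf_analysis(Xf, y, H, R, rng):
    """Classical (stochastic) EnKF analysis step, a minimal sketch.

    The Kalman gain is built from the ensemble covariance estimate, and
    each member is updated with its own perturbed observation drawn
    from N(y, R).
    """
    d, m = Xf.shape
    Xp = Xf - Xf.mean(axis=1, keepdims=True)
    Pf = Xp @ Xp.T / (m - 1)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain
    # one perturbed observation per member
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=m).T
    return Xf + K @ (Y - H @ Xf)                     # member-wise KF update
```

With a vanishing observation error, every member is pulled onto the observation, which is a quick sanity check for an implementation.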

Note that the tangent linear observation operator $\mathbf{H}$, which appears in some equations, is not required for the EnKFs, as all terms involving $\mathbf{H}$ can be evaluated by applying the nonlinear observation operator $H$ to each ensemble state (Houtekamer and Mitchell 1998).

### b. Deterministic EnKFs

The use of perturbed observations only guarantees the KF Eqs. (4) and (5) to be satisfied for $m \to \infty$. For finite ensemble sizes, particularly for small ones that are typical in large-scale applications, the EnKF behaves suboptimally, owing to the additionally introduced sampling errors associated with the perturbed observations. To overcome this issue, several authors (see, e.g., the summary by Tippett et al. 2003) suggested alternative, deterministic algorithms to generate the analysis ensemble with the desired moments. Their common basis is that they formally require the analysis ensemble covariance to match the KF result given in Eq. (5):

$$\frac{1}{m-1}\,\mathbf{X}'^a\left(\mathbf{X}'^a\right)^{\mathrm{T}} = \mathbf{P}^a = \left(\mathbf{I}_d - \mathbf{K}\mathbf{H}\right)\mathbf{P}^f. \quad (8)$$

With the exception of the "deterministic EnKF" (Sakov and Oke 2008a), this is achieved by using a matrix square root technique. Here, we refer to the ETKF, originally described by Bishop et al. (2001) and revised by Hunt et al. (2007). Their formulation can be obtained by taking the ensemble representation of $\mathbf{P}^f$, as given in Eq. (2), and applying it in Eqs. (5) and (6), which results in

$$\mathbf{P}^a = \frac{1}{m-1}\,\mathbf{X}'^f\,\mathbf{A}\left(\mathbf{X}'^f\right)^{\mathrm{T}}, \quad (9)$$

where

$$\mathbf{A} = \left[\mathbf{I}_m + \frac{1}{m-1}\left(\mathbf{H}\mathbf{X}'^f\right)^{\mathrm{T}}\mathbf{R}^{-1}\,\mathbf{H}\mathbf{X}'^f\right]^{-1}. \quad (10)$$

Hence, the matrix $\mathbf{A}^{1/2}$ can be interpreted as a transform matrix applied to the prior ensemble perturbations to obtain the analysis perturbations, $\mathbf{X}'^a = \mathbf{X}'^f\mathbf{A}^{1/2}$. Since $\mathbf{A}^{1/2}$ has to be computed as the matrix square root of $\mathbf{A}$, the filter is classified as a square root filter. As pointed out by several authors (Wang et al. 2004; Sakov and Oke 2008b), the unique symmetric square root of the positive definite matrix $\mathbf{A}$ should be used to preserve the ensemble mean. However, the transform matrix itself is not unique, since using $\mathbf{A}^{1/2}\boldsymbol{\Lambda}$ instead of $\mathbf{A}^{1/2}$, with an arbitrary orthogonal matrix $\boldsymbol{\Lambda}$ of size *m* × *m* such that $\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\mathrm{T}} = \mathbf{I}_m$, generates a different ensemble with the same requested covariance matrix (Sakov and Oke 2008b). As shown by Livings et al. (2008), this rotation matrix should additionally satisfy $\boldsymbol{\Lambda}\mathbf{1} = \mathbf{1}$ [i.e., having the vector $\mathbf{1} = (1, \ldots, 1)^{\mathrm{T}}$ as an eigenvector] in order to retain the ensemble mean.
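A minimal sketch of the ETKF transform matrix, assuming the observed ensemble perturbations and the inverse observation-error covariance are given (names are our own); the symmetric square root is obtained from an eigendecomposition of the symmetric positive definite matrix:

```python
import numpy as np

def etkf_transform(HXp, R_inv, m):
    """ETKF transform sketch: returns A and its symmetric square root.

    HXp holds the observed ensemble perturbations (k x m); A is the
    m x m matrix whose square root transforms the prior perturbations.
    """
    A_inv = np.eye(m) + (HXp.T @ R_inv @ HXp) / (m - 1)
    A = np.linalg.inv(A_inv)
    # unique symmetric square root via eigendecomposition
    vals, vecs = np.linalg.eigh(A)
    A_sqrt = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return A, A_sqrt
```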

The analysis ensemble mean given in Eq. (4) can be reformulated (Hunt et al. 2007) as a weighted linear combination of the prior ensemble perturbations, emphasizing that the ETKF works in the subspace spanned by the ensemble:

$$\overline{\mathbf{x}}^a = \overline{\mathbf{x}}^f + \mathbf{X}'^f\,\overline{\mathbf{w}}, \qquad \overline{\mathbf{w}} = \frac{1}{m-1}\,\mathbf{A}\left(\mathbf{H}\mathbf{X}'^f\right)^{\mathrm{T}}\mathbf{R}^{-1}\left(\mathbf{y} - \mathbf{H}\overline{\mathbf{x}}^f\right). \quad (11)$$

In the next section, it will be shown that the mean calculated within a PF can be written in a similar form, based on a distinct weight vector.

## 3. Particle filter and its moments: A reformulation

As noted in the previous section, all EnKF variants rely on the KF analysis equations to determine the analysis mean and covariance; thus, they implicitly assume a Gaussian prior pdf. For nonlinear models, the prior ensemble will in general not form a Gaussian distribution, which can only be partly considered by the EnKF analysis (Evensen 2009, p. 43). Because of the linear update formulation, the analysis will exhibit a bias caused by this simplification, as discussed by Lei and Bickel (2011) in more detail. An alternative analysis algorithm is given by the PF.

### a. Basic particle filter

The PF (Gordon et al. 1993), also known as sequential Monte Carlo method in the statistical literature (Doucet et al. 2001), constitutes a fully nonlinear and non-Gaussian approach to the filtering problem, since no strong assumptions or restrictions on the prior or the observational pdf are posed. The basis for the PF is Bayes's theorem, which quantifies the update of a prior pdf with new pieces of information. The latter is represented by the observational pdf, also called the likelihood density. The result is the posterior pdf, that is, the density of the state conditioned on the observation:

$$p(\mathbf{x}\mid\mathbf{y}) = \frac{p(\mathbf{y}\mid\mathbf{x})\,p(\mathbf{x})}{p(\mathbf{y})}, \quad (12)$$

where $p(\mathbf{y}) = \int p(\mathbf{y}\mid\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}$. Hence, the posterior expectation of any function *f* of **x** can be written as

$$\mathbb{E}\left[f(\mathbf{x})\mid\mathbf{y}\right] = \int f(\mathbf{x})\,p(\mathbf{x}\mid\mathbf{y})\,\mathrm{d}\mathbf{x}.$$

If the prior ensemble represents an i.i.d., *m*-sized sample of the prior pdf, $\mathbf{x}_i^f \sim p(\mathbf{x})$, we can calculate a Monte Carlo approximation of the expected value as

$$\mathbb{E}\left[f(\mathbf{x})\mid\mathbf{y}\right] \approx \sum_{i=1}^{m} w_i\, f\!\left(\mathbf{x}_i^f\right). \quad (13)$$

This equation can be regarded as a weighted mean, where each prior ensemble state is assigned a specific weight that incorporates the observational information. In the more general case, the prior ensemble is not necessarily equally weighted (e.g., if no resampling has been performed in the previous analysis step) and the previous weights have to be considered as well, yielding the new weights:

$$w_i = \frac{w_i^{\mathrm{prev}}\; p\!\left(\mathbf{y}\mid\mathbf{x}_i^f\right)}{\sum_{j=1}^{m} w_j^{\mathrm{prev}}\; p\!\left(\mathbf{y}\mid\mathbf{x}_j^f\right)}. \quad (14)$$

The Bayesian weight of each ensemble member is directly determined by its likelihood. In practice, the denominator is computed by the constraint that all weights have to sum up to 1, since the analysis pdf has to be normalized. For example, for Gaussian observation errors, the weights are easily obtained by evaluating the multivariate normal density introduced in Eq. (3):

$$w_i \propto \exp\!\left[-\frac{1}{2}\left(\mathbf{y}-H\!\left(\mathbf{x}_i^f\right)\right)^{\mathrm{T}}\mathbf{R}^{-1}\left(\mathbf{y}-H\!\left(\mathbf{x}_i^f\right)\right)\right]. \quad (15)$$
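In practice, such Gaussian weights are best evaluated in log space to avoid numerical underflow of very small likelihoods. The following sketch (our own helper, with NumPy) shifts by the maximum log weight before exponentiating and then normalizes:

```python
import numpy as np

def pf_weights(y, Y, R_inv):
    """Normalized PF weights for Gaussian observation errors, a sketch.

    Y is the k x m matrix of predicted observations H(x_i^f). Working in
    log space and subtracting the maximum keeps the exponentials in a
    safe numerical range; the normalization is unaffected by the shift.
    """
    innov = y[:, None] - Y                       # innovation of each member
    logw = -0.5 * np.einsum('km,kl,lm->m', innov, R_inv, innov)
    logw -= logw.max()                           # stabilize before exp
    w = np.exp(logw)
    return w / w.sum()                           # weights sum to 1
```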

The PF is appealing because it performs a Bayesian analysis without any restrictive assumptions; thus, it entirely considers the nonlinear nature of the system and the resulting non-Gaussian densities. However, the ensemble states are not updated, only their relative weights are modified. The curse of dimensionality states that in high-dimensional state spaces with a large number of independent observations, many particles do not lie in the region of significant probability mass with respect to the likelihood pdf. Consequently, the weights will exhibit a high variance, with many particles having negligible relative weights. These particles are effectively lost for the next forecast step, and therefore, after a few iterations, the filter will collapse into effectively one particle (Bickel et al. 2008). This issue is often counteracted by resampling the particles according to their weights (e.g., Gordon et al. 1993; Kitagawa 1996). However, Snyder et al. (2008) showed that this does not avoid the problem mentioned above unless the number of particles strongly exceeds a critical number, which grows exponentially with the square of an effective dimension of the system. Additionally, direct resampling does not help in deterministic systems without stochastic terms (e.g., model error), where duplicated particles will remain identical throughout the complete next forecast phase. Various approaches have been summarized by van Leeuwen (2009), and a particularly promising method appears to be the equivalent-weights PF (van Leeuwen 2010; Ades and van Leeuwen 2013), which has recently been shown to be able to perform stable assimilation in large-scale circulation models (van Leeuwen and Ades 2013; Ades and van Leeuwen 2015). It utilizes cleverly chosen proposal densities to relax the ensemble members toward the observation during the forecast step, which implies that accounting for model error is essential. Hence, it does not work with deterministic models, and the implementation of the forecast step has to be adapted and tuned to the specific assimilation problem.

### b. Particle filter moments

We now compute the Bayesian expectations of mean and covariance. For this purpose, we define the weight vector **w** of length *m*, where each element corresponds to the particle’s weight as defined in Eq. (14).

To obtain the analysis mean, Eq. (13) is evaluated with $f(\mathbf{x}) = \mathbf{x}$, leading to

$$\overline{\mathbf{x}}^a = \sum_{i=1}^{m} w_i\,\mathbf{x}_i^f. \quad (16)$$

Using the weight vector **w**, we split the analysis mean into the prior mean and perturbations:

$$\overline{\mathbf{x}}^a = \sum_{i=1}^{m} w_i\left(\overline{\mathbf{x}}^f + \mathbf{x}_i'^f\right) = \overline{\mathbf{X}}^f\mathbf{w} + \mathbf{X}'^f\mathbf{w}.$$

Here, $\overline{\mathbf{X}}^f$ represents a *d* × *m* matrix with all columns equal to the prior ensemble mean. Together with the constraint $\sum_{i=1}^{m} w_i = 1$, this leads to

$$\overline{\mathbf{x}}^a = \overline{\mathbf{x}}^f + \mathbf{X}'^f\mathbf{w}. \quad (17)$$

Hence, the PF analysis has the same form as the ETKF analysis in Eq. (11): a linear combination of the ensemble perturbations, determined by a weight vector, is added to the prior mean. The only difference is that the weight vector now contains the “true” weights, as derived from the fully nonlinear, non-Gaussian Bayes’s theorem.

The covariance predicted by the PF is computed using Eq. (13) with $f(\mathbf{x}) = \left(\mathbf{x} - \overline{\mathbf{x}}^a\right)\left(\mathbf{x} - \overline{\mathbf{x}}^a\right)^{\mathrm{T}}$, resulting in

$$\mathbf{P}^a = \sum_{i=1}^{m} w_i\left(\mathbf{x}_i^f - \overline{\mathbf{x}}^a\right)\left(\mathbf{x}_i^f - \overline{\mathbf{x}}^a\right)^{\mathrm{T}}. \quad (18)$$

The aim is to write the analysis covariance matrix in the same form as in Eq. (9) in order to identify an appropriate transform matrix for the new perturbations. In the first step, we substitute the analysis mean with the help of Eq. (17) to introduce the matrix of prior perturbations, and then, we expand the product within the sum:

$$\mathbf{P}^a = \sum_{i=1}^{m} w_i\left(\mathbf{x}_i'^f - \mathbf{X}'^f\mathbf{w}\right)\left(\mathbf{x}_i'^f - \mathbf{X}'^f\mathbf{w}\right)^{\mathrm{T}} = \sum_{i=1}^{m} w_i\,\mathbf{x}_i'^f\left(\mathbf{x}_i'^f\right)^{\mathrm{T}} - \sum_{i=1}^{m} w_i\,\mathbf{x}_i'^f\left(\mathbf{X}'^f\mathbf{w}\right)^{\mathrm{T}} - \mathbf{X}'^f\mathbf{w}\sum_{i=1}^{m} w_i\left(\mathbf{x}_i'^f\right)^{\mathrm{T}} + \mathbf{X}'^f\mathbf{w}\left(\mathbf{X}'^f\mathbf{w}\right)^{\mathrm{T}}.$$

As the first term contains the columns of $\mathbf{X}'^f$, it can be written as the matrix product $\mathbf{X}'^f\tilde{\mathbf{W}}\left(\mathbf{X}'^f\right)^{\mathrm{T}}$, where $\tilde{\mathbf{W}} = \operatorname{diag}(\mathbf{w})$ is the *m* × *m* diagonal matrix created from the weights. The last term is $\mathbf{X}'^f\mathbf{w}\mathbf{w}^{\mathrm{T}}\left(\mathbf{X}'^f\right)^{\mathrm{T}}$ owing to the sum of weights being 1, and the second and third terms can also be written as the same matrix product (but with a negative sign). Therefore, the final result is

$$\mathbf{P}^a = \frac{m}{m-1}\,\mathbf{X}'^f\left(\tilde{\mathbf{W}} - \mathbf{w}\mathbf{w}^{\mathrm{T}}\right)\left(\mathbf{X}'^f\right)^{\mathrm{T}}. \quad (19)$$

We introduced the factor $m/(m-1)$ in order to restore the usual unbiased covariance estimator in the case of equal weights (L. Nerger, AWI Bremerhaven, 2013, personal communication). In combination, Eqs. (17) and (19) allow an efficient evaluation of the first two moments of a PF analysis. In the following, we employ this result to generate an analysis ensemble.
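Eqs. (17) and (19) reduce to a few matrix operations. The sketch below (our own naming, assuming NumPy) returns the Bayesian mean and covariance for a given normalized weight vector:

```python
import numpy as np

def pf_moments(Xf, w):
    """Bayesian mean and covariance of the PF analysis, a sketch.

    Xf is the d x m prior ensemble, w the normalized weight vector; the
    factor m/(m-1) restores the unbiased estimator for equal weights.
    """
    d, m = Xf.shape
    x_mean = Xf.mean(axis=1)
    Xp = Xf - x_mean[:, None]
    xa_mean = x_mean + Xp @ w                    # weighted analysis mean
    A_tilde = np.diag(w) - np.outer(w, w)        # diag(w) - w w^T
    Pa = (m / (m - 1)) * Xp @ A_tilde @ Xp.T     # analysis covariance
    return xa_mean, Pa
```

With equal weights, the mean and covariance collapse to the ordinary ensemble estimates, which provides a direct consistency check.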

## 4. Nonlinear ensemble transform filter

Based on the previous results, this section derives the new filter and discusses its general properties and implications.

### a. Ensemble transformation

As discussed in section 2, the ETKF and its variants aim at constructing an analysis ensemble where mean and covariance match the KF equations. The general idea now is to obtain an analysis ensemble where the mean and covariance exactly match the Monte Carlo estimates based on Bayes's theorem, as stated by Eq. (13). This results in a nonparametric analysis formulation, in the sense that, as in the PF, no specific distribution for the prior pdf is assumed. Hence, we do not introduce biases resulting from the linear update due to the Gaussian assumption as for the EnKFs. At this stage, we assume the Bayesian estimates to be reliable for further processing, which is not valid if a weight collapse has happened and most members already have negligible weight. In sections 4b and 4d we return to this issue. In mathematical terms, the requirement to match the empirical analysis covariance with the Bayesian estimate in Eq. (19) is formulated as

$$\frac{1}{m-1}\,\mathbf{X}'^a\left(\mathbf{X}'^a\right)^{\mathrm{T}} \stackrel{!}{=} \frac{m}{m-1}\,\mathbf{X}'^f\,\tilde{\mathbf{A}}\left(\mathbf{X}'^f\right)^{\mathrm{T}}, \qquad \tilde{\mathbf{A}} = \tilde{\mathbf{W}} - \mathbf{w}\mathbf{w}^{\mathrm{T}}. \quad (20)$$

Comparing Eq. (20) with Eq. (9), we note that the analysis covariance has the same structure as in the ETKF, with $\mathbf{A}$ replaced by $m\tilde{\mathbf{A}}$. This insight suggests that an analysis ensemble can be constructed by utilizing the square root of $m\tilde{\mathbf{A}}$ as transform matrix:

$$\mathbf{X}'^a = \mathbf{X}'^f\,\mathbf{T}, \qquad \mathbf{T} = \sqrt{m}\,\tilde{\mathbf{A}}^{1/2}. \quad (21)$$

As $\tilde{\mathbf{A}}$ is a real, symmetric, positive semidefinite *m* × *m* matrix (see appendix A), it has a unique symmetric square root that can be computed through a singular value decomposition, $\tilde{\mathbf{A}} = \mathbf{U}\mathbf{S}\mathbf{U}^{\mathrm{T}}$, so that $\tilde{\mathbf{A}}^{1/2} = \mathbf{U}\mathbf{S}^{1/2}\mathbf{U}^{\mathrm{T}}$. The use of a symmetric square root is motivated by experiences from ETKF variants (Lawson and Hansen 2004; Sun et al. 2009) and also guarantees that the ensemble perturbations remain centered at zero (Sakov and Oke 2008b).

Having obtained the analysis ensemble perturbations from the deterministic transformation, the final ensemble is easily obtained by recentering the perturbations around the Bayesian mean given by Eq. (17), so that $\mathbf{X}^a = \overline{\mathbf{X}}^a + \mathbf{X}'^a$. As mentioned by Nerger et al. (2012) in the context of the ETKF, the update of the mean and ensemble perturbations can be combined efficiently into one single step:

$$\mathbf{X}^a = \overline{\mathbf{X}}^f + \mathbf{X}'^f\left(\mathbf{W} + \mathbf{T}\right), \quad (22)$$

where $\mathbf{W}$ is an *m* × *m* matrix containing **w** in each column.

In summary, the analysis algorithm just presented generates an ensemble with exactly the same mean and covariance as derived directly from the fully nonlinear, non-Gaussian Bayes’s theorem, and therefore, it is referred to as the nonlinear ensemble transform filter (NETF). It applies the same paradigm as followed by the NLEAF (Lei and Bickel 2011) in a deterministic way.
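The complete deterministic update of section 4a condenses into a short routine. This is an illustrative sketch (assuming NumPy, with the symmetric square root obtained from an eigendecomposition), not a reference implementation:

```python
import numpy as np

def netf_analysis(Xf, w):
    """Deterministic NETF ensemble update, a minimal sketch.

    Builds the transform sqrt(m) * (diag(w) - w w^T)^{1/2} with the unique
    symmetric square root and recenters the new perturbations around the
    Bayesian mean, so the analysis ensemble reproduces the PF moments.
    """
    d, m = Xf.shape
    x_mean = Xf.mean(axis=1, keepdims=True)
    Xp = Xf - x_mean
    A_tilde = np.diag(w) - np.outer(w, w)
    vals, vecs = np.linalg.eigh(A_tilde)          # symmetric square root
    T = np.sqrt(m) * vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    W = np.tile(w[:, None], (1, m))               # w in each column
    return x_mean + Xp @ (W + T)                  # combined one-step update
```

A useful property to verify: the vector of ones is an eigenvector of the transform with eigenvalue zero, so the analysis mean is exactly the weighted Bayesian mean.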

### b. Random rotations

As already mentioned in section 2 for the ETKF, the transform matrix of the NETF is not unique, since an additional rotation of the ensemble in the space spanned by the ensemble members,

$$\mathbf{X}'^a = \mathbf{X}'^f\,\mathbf{T}\boldsymbol{\Lambda}, \quad (23)$$

does not change the mean and covariance of the ensemble as long as $\boldsymbol{\Lambda}$ is orthogonal and satisfies $\boldsymbol{\Lambda}\mathbf{1} = \mathbf{1}$ (Livings et al. 2008). Hence, one can choose $\boldsymbol{\Lambda}$ as a random matrix with these properties. To generate such a random matrix, the "second-order exact sampling" method of Pham (2001) is applicable. It was developed to obtain a random *m* × (*m* − 1) rotation matrix, whereas we need a random *m* × *m* matrix, requiring a slight modification of the algorithm, which is described in appendix B.
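One way to generate a mean-preserving random rotation, in the spirit of the modified algorithm of appendix B (the construction below is our own sketch, not the paper's exact procedure), is to draw a uniformly random orthogonal matrix in the subspace orthogonal to the vector of ones and embed it:

```python
import numpy as np

def random_mean_preserving_rotation(m, rng):
    """Random orthogonal m x m matrix Lambda with Lambda @ 1 = 1, a sketch.

    A random rotation is drawn in the (m-1)-dimensional subspace
    orthogonal to the vector of ones and embedded via an orthonormal
    basis aligned with 1/sqrt(m), so the ensemble mean is preserved
    while the perturbations are randomly mixed.
    """
    # orthonormal basis whose first column is parallel to the ones vector
    B = np.eye(m)
    B[:, 0] = 1.0 / np.sqrt(m)
    Q, _ = np.linalg.qr(B)
    # random orthogonal matrix of size (m-1), sign-fixed for uniformity
    G = rng.normal(size=(m - 1, m - 1))
    Om, R = np.linalg.qr(G)
    Om = Om * np.sign(np.diag(R))
    core = np.eye(m)
    core[1:, 1:] = Om        # leave the mean direction untouched
    return Q @ core @ Q.T
```

Because the rotation acts as the identity along the mean direction, applying it after the transform leaves both the analysis mean and covariance unchanged.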

Such an additional rotation has already been investigated for ETKFs, where it has been shown to potentially improve filter stability. The reason is that a purely deterministic update may reinforce a situation where the ensemble is nearly collapsed, in the sense that most of the ensemble members are very close and the correct moments are enforced by a few strong outliers, as found by Lawson and Hansen (2004) and emphasized by Leeuwenburgh et al. (2005). The NETF derived here is particularly sensitive to this issue, as its transform matrix is formed exclusively by the Bayesian weights, which are known to possibly have a large variance, especially in high-dimensional spaces. Such an ensemble is more likely to suffer from weight collapse in the successive analysis step. The random rotation suggested here counteracts this problem by generating a new ensemble that conserves the exact second-order statistics. This procedure comes at the price that higher-order moments are discarded and the analysis ensemble statistics become closer to Gaussian distributions (Nerger et al. 2012). However, at the current stage it is unclear how the deterministic transformation already affects higher-order moments, which are not considered in our scheme (see also appendix C). As we will illustrate in section 5, our experiments confirm that augmenting the transform matrix with a random rotation strongly improves filter stability.

### c. Alternative ensemble transformation

The structure of the update mechanism derived in section 4a is very similar to the ETKF, as it utilizes a right-sided transformation of the prior ensemble perturbation matrix to obtain new perturbations with the desired covariance. The same request may also be met by a left-sided transformation:

$$\mathbf{X}'^a = \mathbf{T}_{\mathrm{L}}\,\mathbf{X}'^f, \quad (24)$$

which resembles the transformation appearing in the ensemble adjustment Kalman filter (Anderson 2001). It can easily be verified that the following choice of the left-sided transform matrix,

$$\mathbf{T}_{\mathrm{L}} = \left(\mathbf{P}^a\right)^{1/2}\left(\mathbf{P}^f\right)^{-1/2}, \quad (25)$$

also produces analysis ensemble perturbations that have zero mean and an empirical covariance of $\mathbf{P}^a$. Again, the update can be augmented by a random rotation in ensemble space as discussed in section 4b:

$$\mathbf{X}'^a = \mathbf{T}_{\mathrm{L}}\,\mathbf{X}'^f\boldsymbol{\Lambda}. \quad (26)$$

In this form, the filter variant is referred to as the NETF in state space (NETFiSS). When used with random rotations, the filter properties are expected to be identical to the NETF, since a new ensemble with Gaussian characteristics is created (see section 5c). However, the type of the ensemble transformation affects the computational resources needed (see section 4e).

### d. Localization and inflation

The direct application of ensemble-based filters in large-scale and nonlinear systems results in suboptimal behavior due to the low-rank approximation of the covariance matrix and sampling errors, leading to spurious correlations and an underestimation of the ensemble spread. Both issues are caused by the inevitable necessity to use a rather small ensemble owing to computational restrictions. While the tendency to underestimate the variance is usually suppressed by inflation methods, spurious correlations are overcome through localization (e.g., Ott et al. 2004; Janjić et al. 2011). It has been shown that in combination with such rather ad hoc techniques, the filter performance in high-dimensional systems can be enhanced significantly and made competitive with established variational schemes (Kalnay et al. 2007; Buehner et al. 2010; Yang et al. 2012). Furthermore, van Leeuwen (2009) suggested that localization might be a helpful tool to fight the curse of dimensionality inherent to PFs, as it strongly reduces the effective state dimensionality. However, it is argued that its application is a challenge for PFs since the weights become spatially dependent (van Leeuwen 2009; Lei and Bickel 2011). The NETF overcomes this problem because at each analysis step a new, equally weighted ensemble is generated and local weights only exist intermediately.

Local analyses with the NETF can be performed similar to the principles summarized by Hunt et al. (2007), computing a separate analysis for each model grid point where only observations within a defined radius are considered. In addition to such a domain localization (Houtekamer and Mitchell 1998), the observational influence can be reduced with increasing distance by multiplying the corresponding entries in $\mathbf{R}^{-1}$, which in our filter appears in the weight computation of Eq. (15) in the case of Gaussian observations, with an appropriate weight function (observation localization; e.g., Miyoshi and Yamane 2007; Kirchgessner et al. 2014). If localization is used, the random rotation has to be applied to the global ensemble in order to avoid inconsistent local random transformations (i.e., after the fully deterministic local updates, the global analysis perturbations are replaced by $\mathbf{X}'^a_{[g]}\boldsymbol{\Lambda}$). Equivalently, one may also use the same rotation matrix for all local updates.

The NETF relies on the Bayesian weights, whose variance increases as the filter approaches weight degeneracy. Inflation can be used to counteract this issue by increasing the chances of more particles being in the region of significant probability mass with respect to the likelihood pdf (Lei and Bickel 2011; Frei and Künsch 2013a). It can be applied in the usual way before or after the analysis step by replacing the corresponding ensemble according to $\mathbf{x}_i \rightarrow \overline{\mathbf{x}} + \gamma\left(\mathbf{x}_i - \overline{\mathbf{x}}\right)$, with an inflation factor *γ* slightly larger than 1 (Anderson and Anderson 1999).
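Multiplicative inflation amounts to stretching the perturbations about the ensemble mean; a minimal sketch (our own helper):

```python
import numpy as np

def inflate(X, gamma):
    """Multiplicative inflation: x_i -> mean + gamma * (x_i - mean).

    A factor gamma slightly larger than 1 increases the ensemble spread
    while leaving the ensemble mean unchanged.
    """
    x_mean = X.mean(axis=1, keepdims=True)
    return x_mean + gamma * (X - x_mean)
```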

### e. Computational complexity

As in the ETKF, the analysis is performed in the *m*-dimensional subspace spanned by the ensemble members, and hence, the computational expense is similar for a given ensemble size. In contrast to all EnKFs, our filter does not involve the computation of an inverse matrix. This prevents computational instabilities caused by very small singular values, which are sometimes neglected in ETKF implementations for that reason (Sakov et al. 2012). If a localized version is applied, the local analyses are independent and can be computed in parallel as for the ETKF. The generation of random rotation matrices consumes additional resources; however, it is possible to resort to a collection of precalculated random matrices, since they only depend on the ensemble size *m*.

The left-sided variant NETFiSS performs the update in state space as its transform matrix is of dimension *d* × *d*. It requires the evaluation of the product of two matrix square roots of this size (one being inverse). For comparison, in each NLEAF analysis step *m* + 1 matrix square roots of state covariance matrices have to be computed, together with *m* matrix multiplications in state space (Lei and Bickel 2011, section 3c). These update algorithms are feasible for small-sized problems but become very costly for systems of intermediate and high dimensionality. We conclude that, concerning computational complexity, the NETF is the more efficient choice in larger-dimensional cases.

### f. Summary of the final algorithm

The NETF is only concerned with the analysis step, converting any prior ensemble into an analysis ensemble by assimilating the observation. Below, we summarize the algorithm to generate the analysis including localization as mentioned in section 4d, allowing a simple implementation into an existing sequential ensemble data assimilation system. We assume that the global state vector of dimension *d* is partitioned into a number of *local domains*, where each one may for instance contain all model variables at one single grid point. For simplicity of the description, following Hunt et al. (2007), we mark all global quantities with an additional subscript [*g*], that is, we begin with the prior ensemble of size *m*, stored in the *d* × *m* ensemble matrix , and the *k*-dimensional observation with error covariance . All quantities referring to the current local domain have no additional subscript. If localization is not required, one simply has to skip step 3 and ignore all distinctions between local and global quantities.

1. Compute the predicted observations by applying the observation operator to the prior ensemble, and collect the resulting vectors in a *k* × *m* ensemble matrix **Y**_[g].

2. Prepare an appropriate orthogonal, mean-preserving random rotation matrix **Λ** (see appendix B for an algorithm).

3. Select the data for the local analysis: extract all rows of **X**_[g] that belong to the current local domain to obtain the local ensemble states **X**. Select the entries of **y**_[g] that are to be considered for the local analysis (e.g., all observations within a specified localization radius) and the corresponding rows and columns of **R**_[g], which form the local observation vector **y** and covariance matrix **R**. Accordingly, choose the same rows from **Y**_[g], forming **Y**, which contains the ensemble’s counterparts of the localized observation **y**.

4. If desired, multiply the entries of **R**⁻¹ with an appropriate weight function to reduce the influence of more distant observations.

5. Evaluate the likelihood for each member to obtain the analysis weights, *w*_i ∝ *p*(**y** | **Y**_i).

6. Normalize the weights such that they sum to one.

7. Compute the local analysis mean as the weighted ensemble mean, using the normalized weights.

8. Compute the transform matrix **T** as the symmetric square root of *m*(diag(**w**) − **ww**ᵀ) [cf. Eq. (20) and appendix A], and apply the random rotation, **W** = **TΛ**.

9. Form the local analysis ensemble by adding the transformed prior perturbations to the local analysis mean.

10. Having performed steps 3–9 for all local domains, form the global analysis ensemble: reverse the first operation of step 3 by inserting the local analysis ensemble matrices (the outputs of step 9) into the corresponding rows of the *d* × *m* matrix **X**ᵃ_[g].
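As a minimal illustration of the analysis step summarized above (global version, no localization), the following Python sketch assumes a Gaussian likelihood; the function name and interface are ours, not from the paper:

```python
import numpy as np

def netf_analysis(X, Y, y, R_inv, Lambda):
    """One global NETF analysis step (illustrative sketch).

    X: (d, m) prior ensemble; Y: (k, m) predicted observations;
    y: (k,) observation; R_inv: (k, k) inverse obs-error covariance;
    Lambda: (m, m) mean-preserving random rotation (see appendix B).
    """
    d, m = X.shape
    # Bayesian weights, here from a Gaussian likelihood (any density works)
    innov = y[:, None] - Y                                  # (k, m)
    logw = -0.5 * np.einsum('im,ij,jm->m', innov, R_inv, innov)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                            # normalized weights
    # Weighted (Bayesian) analysis mean
    xa_mean = X @ w
    # Transform matrix: symmetric square root of m * (diag(w) - w w^T)
    A = np.diag(w) - np.outer(w, w)
    evals, evecs = np.linalg.eigh(m * A)
    T = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    # Rotate the transformed prior perturbations and recenter
    Xp = X - X.mean(axis=1, keepdims=True)
    return xa_mean[:, None] + Xp @ T @ Lambda
```

For the localized algorithm, this routine would be called once per local domain with the local quantities, reusing the same rotation matrix for every domain.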

## 5. Exemplary applications

In this section, we apply the NETF to a variety of data assimilation test beds in order to explore its principal behavior and compare its performance to other ensemble filters. We deal with three distinct model environments in order to highlight the impact of nonlinear dynamics and observations as well as the method’s potential applicability in larger-dimensional settings. First, we introduce the models and describe the experimental setups. Then, the actual results are shown and discussed.

### a. Models, setups, and evaluation measures

In all cases, we perform twin experiments: a reference trajectory, referred to as the truth **x**ᵗ, is generated, from which synthetic observations are drawn by means of perturbation. The observation errors are restricted to be uncorrelated with constant variance, that is, **R** = σ²_y **I**_k, where *k* is the number of observed state components.
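The observation generation in such a twin experiment can be sketched as follows (names are illustrative):

```python
import numpy as np

def synthetic_obs(x_truth, obs_idx, sigma_y, rng):
    """Draw perturbed observations from the truth: y = H x^t + eps,
    with uncorrelated Gaussian errors, i.e., R = sigma_y**2 * I_k."""
    eps = rng.normal(scale=sigma_y, size=len(obs_idx))
    return x_truth[obs_idx] + eps
```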

We compare the NETF with both ETKF variants (with and without random rotations) and with the first- and second-order versions of the NLEAF, which rely on perturbed observations. In this context, we also include the classical EnKF, as defined in Eq. (7), as the stochastic counterpart of the ETKF. Table 1 summarizes basic properties of all these filters. For each experimental run, all filters under investigation receive exactly the same observations, initial ensemble, and, if needed, rotation matrices.

An ensemble filter’s performance in nonlinear scenarios is quite sensitive to the chosen inflation, which is the most important tuning parameter (Sakov and Oke 2008b). To fairly compare the filter results, it is desirable to remove the dependency on inflation. We therefore tuned the prior inflation factor *γ* for optimal performance (in terms of RMSE; see below) by varying it for each experimental case and filter. While this simple but expensive procedure is sufficient for toy model tests, for real applications, more sophisticated methods such as adaptive inflation (e.g., Anderson 2007; Miyoshi 2011; Luo and Hoteit 2013) may be more suitable.

The following setup issues are common to all Lorenz experiments: the truth is generated by integrating an initial state for 100 nondimensional time units of the corresponding model, and the data assimilation experiments are performed within this time range. The true initial state itself is the outcome of a spinup run over 30 time units. The initial ensembles are sampled around the true initial state (as, e.g., in Yang et al. 2012) in order to minimize the effects of filter spinup. They are generated by second-order exact sampling with a covariance of one-tenth of the climatological covariance (extracted from the truth).

#### 1) Lorenz 1963 (L63)

The well-known L63 model (Lorenz 1963) is a three-dimensional system, (*x*, *y*, *z*), described by the ordinary differential equations:

$$\dot{x} = \sigma (y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z.$$

Because of its simple formulation but challenging dynamics, this model has often been applied as a benchmark for data assimilation experiments (e.g., Miller et al. 1994; Ahrens 1999; Yang et al. 2012; Sakov et al. 2012). The system is solved with a fourth-order Runge–Kutta discretization scheme with time step Δ*t* = 0.01 (therefore, 100 time units are equal to 10 000 time steps), using the standard parameters σ = 10, ρ = 28, and β = 8/3. With this choice, the system exhibits strong nonlinearity that depends on the current flow. The spinup run is initialized with a fixed arbitrary state. We use a partial observation system for *x* and *y* while the *z* component remains hidden. In addition to the variation of the ensemble size, the experiments differ in the time interval between two successive Gaussian observations and their error variance.
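The model and its integration scheme can be written compactly; the following sketch uses the standard L63 parameters and a generic RK4 step:

```python
import numpy as np

def l63_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz 1963 right-hand side for the state s = (x, y, z)."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```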

#### 2) Lorenz models of intermediate dimensionality

The L96 system (Lorenz 1996) roughly emulates a meteorological variable on a latitude circle:

$$\dot{X}_j = (X_{j+1} - X_{j-2})\, X_{j-1} - X_j + F, \qquad j = 1, \ldots, d,$$

with cyclic indices. It is widely applied as another typical nonlinear data assimilation test bed (e.g., see Lei and Bickel 2011; Nerger et al. 2012; Frei and Künsch 2013a; Kirchgessner et al. 2014). The system is applied on a periodic domain with *d* = 80 grid points (twice the usual dimensionality) and a forcing of *F* = 8. It is solved with a fourth-order Runge–Kutta scheme with the typical time step Δ*t* = 0.05 (therefore, 100 time units are equal to 2000 time steps). The spinup run is initialized with a small perturbation of the constant equilibrium state *X*_j = *F*.

Because of the intermediate dimensionality of the system, we apply the filters in a localized form using local analyses together with observation localization, as outlined in section 4f. Hence, for each grid point, only observations within the localization radius are considered, and we apply a fifth-order polynomial correlation function [Gaspari and Cohn 1999, their Eq. (4.10)] to **R**⁻¹ such that the observational weights smoothly decrease with distance. The observation error variance is set to σ²_y, and the system is observed at all grid points with an odd index every other time step. This sparse-observation situation is a difficult setup for a data assimilation system (Lei and Bickel 2011), particularly if one considers that the time between two observations corresponds to about 12 h in the real atmosphere (Lorenz 1996). Furthermore, we create a slightly non-Gaussian observation scenario whose treatment is an advantage of the nonlinear filters. To that purpose, we sample the independent observation errors from a univariate Laplace (double exponential) density with zero mean and a scale parameter of σ_y/√2, which guarantees that the density’s variance is σ²_y. The KF-based filters implicitly assume the likelihood pdf to be a Gaussian density with the same variance, which in principle looks similar but has slightly less weight near zero and in the tails.
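The localization weight function and the variance-matched Laplace errors can be sketched as follows (the Gaspari–Cohn form is standard; function names are ours):

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Fifth-order piecewise-rational correlation function
    [Gaspari and Cohn 1999, Eq. (4.10)]; compact support of width 2c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    w[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                - 5.0 / 3.0 * ri**2 + 1.0)
    ro = r[outer]
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                + 5.0 / 3.0 * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

def laplace_obs_errors(sigma_y, size, rng):
    """Laplace errors whose variance equals sigma_y**2: the Laplace
    variance is 2 b**2, so the scale is b = sigma_y / sqrt(2)."""
    return rng.laplace(scale=sigma_y / np.sqrt(2.0), size=size)
```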

In the L96 system, two neighboring grid points do not exhibit a significant correlation (Lorenz 2005). Here, localization is mainly applied as a dimension-reduction technique, as discussed by Lei and Bickel (2011), since a filter solely based on the Bayesian weights cannot work directly in high-dimensional spaces because of the curse of dimensionality. However, it may be more desirable to apply localization based on actual spatial dependence. For that purpose, we also investigate the filter’s behavior in model II of Lorenz (2005), which extends the L96 model by a spatial smoothing parameter *K*:

$$\dot{X}_j = [X, X]_{K,j} - X_j + F,$$

where [⋅, ⋅]_{K,j} denotes an average over the nearby grid points. Rainwater and Hunt (2013) applied this L2005 system in data assimilation experiments, and we adopt their model design for the smoothing parameter and forcing, resulting in a detectable correlation for up to five neighboring grid points. However, again we use a state dimensionality of *d* = 80. Except for an increased localization radius, the setup remains the same as that described for the L96 model.

#### 3) Linear advection (LA)

In a final experiment, we test the filter in a simple model that is characterized not by challenging dynamics but by a state dimensionality one order of magnitude larger than that of the just-presented Lorenz systems. This demand is met by the linear advection (LA) model on a periodic 1D grid with *d* = 1000 grid points, first used by Evensen (2004), which propagates the field one grid cell to the right at each time step without changing its shape:

$$x_j(t_{i+1}) = x_{j-1}(t_i), \qquad j = 1, \ldots, d.$$

The experimental setup differs slightly from the Lorenz experiments: following Sakov and Oke (2008b) and Evensen (2009), a climatological field is obtained by sampling a random Gaussian field whose correlation function is Gaussian with a spatial decorrelation length of 20 grid points and a variance of 1. The true field, as well as the initial ensemble fields, are generated by adding further random fields with these characteristics to the climatology. Gaussian observations are created every five time steps at each tenth grid point. The localization radius equals twice the spatial decorrelation length, that is, 40 grid points. No inflation is used in this linear scenario, where the analysis field should converge to the truth.
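The model step and the field sampling can be sketched as follows; the spectral details of Evensen’s sampling are replaced here by a direct covariance square root, so this is one possible construction rather than the original implementation:

```python
import numpy as np

def la_step(field):
    """Advect the periodic 1D field one grid cell to the right."""
    return np.roll(field, 1)

def gaussian_field_sampler(d, length_scale, variance):
    """Return a sampler for periodic 1D random fields whose correlation
    function is (approximately) Gaussian with the given decorrelation
    length, via a symmetric square root of the covariance matrix."""
    idx = np.arange(d)
    dist = np.minimum(np.abs(idx[:, None] - idx), d - np.abs(idx[:, None] - idx))
    cov = variance * np.exp(-0.5 * (dist / length_scale) ** 2)
    evals, evecs = np.linalg.eigh(cov)
    # Clip tiny negative eigenvalues introduced by the periodic distance
    root = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    return lambda rng: root @ rng.normal(size=d)
```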

### b. Evaluation

The main evaluation criterion is the root-mean-square error (RMSE). The difference between the analysis mean and the truth is quantified for all times *t* using

$$\mathrm{RMSE}(t) = \sqrt{\frac{1}{d} \sum_{j=1}^{d} \left[ \bar{x}^a_j(t) - x^t_j(t) \right]^2},$$

and then the average over the entire period of interest is computed.

In some cases, we also assess the quality of the ensembles by computing the empirical ensemble spread, the square root of the average ensemble variance over all state components, as a measure of its uncertainty; on average, it should be of similar magnitude as the RMSE (e.g., Palmer et al. 2005; Hopson 2014). The statistical reliability of the ensemble can be quantified by determining the 0.025 and 0.975 quantiles of the ensemble: for a well-calibrated ensemble, the truth should lie inside this interval in 95% of all cases. Finally, the continuous ranked probability score (CRPS; Gneiting et al. 2007) and the ignorance score (CRIGN; Tödter and Ahrens 2012) are useful probabilistic measures that consider the entire ensemble distribution with respect to the truth. They are computed component-wise, and their purpose is to summarize the distance between the ensemble pdf and the pdf of the verifying truth in probability space. While the CRPS applies a quadratic norm, the CRIGN quantifies the difference in information content and is more sensitive to cases where the truth lies close to or outside of the ensemble range boundaries.
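For a scalar component, the CRPS of an ensemble can be estimated with the standard kernel representation (Gneiting et al. 2007); this sketch uses that common estimator:

```python
import numpy as np

def crps_ensemble(ens, truth):
    """Sample CRPS of a scalar ensemble against the verifying truth,
    via the kernel form CRPS = E|X - y| - 0.5 E|X - X'|."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - truth))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2
```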

The innovations, **d** = **y** − **ȳ**ᶠ, where **ȳ**ᶠ denotes the prior mean in observation space, allow the extraction of further diagnostics independent of the truth. Specifically, the expected innovation variance (Desroziers et al. 2005) can be expressed as the sum of the observation error variance σ²_y and the average prior ensemble variance in observation space. Following Anderson (2007), it should correspond to the actual sample variance of the innovations in a well-calibrated data assimilation system.
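This consistency check can be sketched as follows (interface and names are ours):

```python
import numpy as np

def innovation_consistency(innovations, prior_obs_ens, sigma_y):
    """Compare the actual innovation variance with its expected value,
    sigma_y**2 plus the average prior ensemble variance in observation
    space (Desroziers et al. 2005; Anderson 2007).

    innovations: (T, k) innovations over T analysis times.
    prior_obs_ens: (T, k, m) prior ensemble in observation space.
    """
    actual = np.mean(innovations ** 2)
    expected = sigma_y ** 2 + np.mean(np.var(prior_obs_ens, axis=2, ddof=1))
    return actual, expected
```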

### c. Results and discussions

#### 1) Lorenz 63 (L63)

The first experiment concentrates on some basic properties of the NETF, using a fixed ensemble size, observation interval, and observation error variance. To demonstrate the importance of the random rotation, we ran the filter 1) without augmented rotations, 2) with a deterministic, mean-preserving rotation matrix [Nerger et al. 2012, their Eq. (24)], and 3) with random rotation matrices as introduced in section 4b. Figure 1a shows the temporal evolution of the average RMSE for the first 2500 time steps. Initially, the basic PF (with universal resampling) shows the best results, but the strong increase in RMSE indicates filter collapse, which is expected in a fully deterministic system, as motivated in section 3. The NETF without rotations is not much better than the PF, as it diverges only a few assimilation cycles later. With a deterministic rotation, stability can be maintained for a longer period, but eventually the filter collapses as well. In contrast, the NETF with random rotations remains stable until the end of the assimilation window. We obtained very similar results independently of the setup and did not succeed in obtaining stable runs without rotations or with deterministic ones. Next, in Fig. 1b we compare the standard NETF with the variant NETFiSS, which employs a left-sided transformation (see section 4c). Again, we notice that without random rotations the filter quickly diverges, and in similar experiments we did not obtain stable results over the long term, even with inflation tuning. However, if the NETFiSS uses the same random rotation matrices as the NETF, the performances are almost identical, as expected from the theoretical considerations. Consequently, in all further experiments we only refer to the standard NETF with random rotations, as summarized in section 4f.

For practical reasons, it is important to investigate the filter performance with respect to the ensemble size, so all experiments were run with a range of ensemble sizes. Figure 2 focusses on the ensemble sizes required to obtain reasonable RMSE results in this scenario. While the ETKF with random rotations is only slightly better than the EnKF, the ETKF without rotations degrades with ensemble size, probably because the probability of generating outliers increases (Lawson and Hansen 2004). The NETF’s performance lies between the NLEAF1 and NLEAF2 for larger ensemble sizes. However, the NETF diverges only for very small ensembles and delivers superior results even with just 20 members, whereas the NLEAF2 requires more than 40 ensemble members to outperform it, presumably because of the additional sampling noise.

For this scenario, Table 2 presents the additional evaluation measures introduced in section 5b. Because of covariance inflation, the ensembles are not underdispersive, as their spread typically exceeds the RMSE slightly. This does not hold for the NLEAF1/2 at the smallest ensemble sizes, as they require more members to work properly, in accordance with Fig. 2. Apart from such exceptional cases, the innovation variance is usually of similar magnitude as the prior ensemble variance plus the observation error variance, indicating that inflation tuning leads to a reliable assimilation system. This is also confirmed by the fact that the truth lies within the 95% confidence interval of the ensemble close to 95% of the time, except for small ensemble sizes and for the ETKF without rotations in general. The probabilistic scores CRPS and CRIGN emphasize that the NETF not only produces a better analysis mean than the KF-based filters at each time step, but also an ensemble distribution that accumulates more probability mass near the truth.

Finally, Fig. 3 shows the optimal average RMSE for different temporal observation densities at fixed observation error variance. Here, for each filter the smallest RMSE over all ensemble sizes has been chosen, based on the insight that large ensemble sizes can be detrimental, particularly to the ETKF. In general, the analysis quality decreases almost linearly with the observation interval. The NETF clearly outperforms the KF-based filters, which all give quite similar results here, with the ETKFs being slightly better than their stochastic counterpart. In the most nonlinear scenario, the ETKF without rotations shows the better performance due to its optimal behavior at very small ensemble sizes. The NETF performs quite similarly to the second-order version NLEAF2 but is not able to improve upon it. Their relative improvement over the ETKF increases with larger observation intervals, a sign of a better treatment of the growing nonlinear developments.

From the low-dimensional L63 experiments, we conclude that the NETF outperforms the KF-based algorithms because of its nonlinear ensemble update, as well as the NLEAF1 because it aims at second-order exactness. Nevertheless, it cannot completely achieve the quality of the NLEAF2 in the L63 environment. This can probably be explained by the fact that the random rotations tend to suppress higher-order moments that may implicitly be (at least partly) preserved by the stochastic NLEAF2 updates. A similar behavior has been pointed out in a comparison of the ETKF and EnKF by Lawson and Hansen (2004) who found the stochastic update to be more capable of dealing with strongly nonlinear situations. However, the NETF is still more efficient because the NLEAF2 requires larger ensemble sizes to achieve superior RMSE values.

#### 2) Lorenz models of intermediate dimensionality

The L63 experiments confirmed that the NETF is able to create a stable and good analysis in the presence of strong nonlinearity. Regarding a potential application to real-world problems, its behavior in systems of larger dimensionality is an important issue.

The first experiment concerns the 80-dimensional L96 system with the setup described in section 5a. Figure 4 summarizes the filter performances in terms of RMSE for various ensemble sizes. The KF-based filters achieve similar results, and it is noticeable that the ETKF without rotations performs best at intermediate ensemble sizes and then degrades again. With random rotations, the ETKF’s performance is almost independent of ensemble size in this scenario, as also reported by Lei and Bickel (2011) for their experiments. The stochastic EnKF is only worse for small ensemble sizes. For larger ensembles, the NLEAFs outperform the KF-based methods, while the NETF consistently exhibits the best performance already with an ensemble size of 20 or more. This confirms that in such a nonlinear, non-Gaussian scenario of intermediate dimensionality, the suggested method performs well with localization, and the deterministic update mechanism appears more beneficial than its stochastic counterpart.

The next experiment concerns the L2005 model, which exhibits a distinct spatial structure compared to the L96 model. Figure 5 shows the analysis error with respect to ensemble size. Concerning the relative performances of the KF-based filters, the general structure is very similar to the L96 experiment. The most remarkable, seemingly counterintuitive difference is that here the NLEAF1 performs better than the NLEAF2, except at the largest ensemble sizes. Lei and Bickel (2011) do not show the NLEAF2 for their larger-dimensional experiments; hence, we cannot directly compare these findings with their results. A possible explanation is suggested by the update formalism of the NLEAF2 [Lei and Bickel 2011, their Eq. (3)], which requires the estimation of an individual analysis covariance matrix for each ensemble member, based on each of the *m* perturbed observations. These low-rank approximations of the *d* × *d* covariance matrix with *m* members are subject to sampling error, and it seems that the stochastic errors of the perturbed observations amplify the errors in the estimation of the covariances, which may lead to these unexpected results in certain larger-dimensional cases. This hypothesis is supported by the fact that in the low-dimensional L63 experiments the NLEAF2 consistently outperformed the NLEAF1. Additionally, in the L2005 system, spatial correlations are more important than in the L96 system; hence, a reliable estimation of the covariances is of greater relevance here. The NETF, which also focusses on the second-order statistics, does not suffer from this issue. Again, it exhibits the smallest analysis error for all but the smallest ensemble sizes. We conclude that, particularly in larger-dimensional scenarios, the benefits of the deterministic update mechanism of the NETF become more apparent.

To strengthen these findings, Table 3 gives an overview of further diagnostic measures at a fixed ensemble size. As in the L63 case, the comparison of RMSE and spread, as well as of the innovation variance and the expected innovation variance, indicates that, thanks to the inflation procedure, the filters behave reasonably well in both state and observation space. With few exceptions, all scores reveal that the NETF performs best. The CRIGN shows that, in information-theoretical terms, the NETF ensembles also exhibit the closest match with the truth. The CRPS confirms this insight, but its differences are smaller. The reason is that the CRIGN is more sensitive to outliers, which is why it is also the best score for detecting the degradation of the ETKF without rotations for larger *m*, in contrast to the RMSE, which only increases slightly. Furthermore, all scores are consistent with the finding that the NLEAF2 only achieves the performance of the NLEAF1 at the largest ensemble sizes. Figure 6 shows the relation between the ensemble spread and the corresponding skill in terms of RMSE. In this case, no inflation was required for the NETF, which leads to optimal behavior for all bins, while the ETKF and the NLEAF2 on average exhibit slightly too much spread. Nevertheless, there is an almost perfect correlation between spread and skill due to the inflation tuning, except for the NLEAF2, which has too little spread in the most difficult situations indicated by a high RMSE.

In the L96–L2005 experiments presented so far, the NETF and NLEAFs were able to compute their analysis based on the actual, non-Gaussian likelihood pdf, while the ETKFs and EnKF could only use the specified covariance in a Gaussian manner. A concluding L2005 experiment examines the impact of such a misspecification of the likelihood on the NETF. The setup remains identical to before, except that now *all* filters wrongly assume the likelihood to be Gaussian. Figure 7 shows the resulting RMSEs. While the performances of the ETKFs and EnKF are unchanged, the nonlinear filters now show less dominant behavior. It is not surprising that the approximation of non-Gaussian error distributions as Gaussian can lead to significant analysis errors (Fowler and van Leeuwen 2013). In particular, the NLEAFs are no longer able to outperform the EnKF and the ETKF. Note that here the NLEAF1 always outperforms the NLEAF2. However, the NETF still delivers a better analysis, at least for sufficiently large ensembles, even though the improvement is not as pronounced as before. These results indicate that even if the NETF does not use the true likelihood pdf to find the analysis weights, it still has the potential to achieve superior results.

In summary, the experiments with intermediate Lorenz systems form a consistent picture. The NETF performs well already for an intermediate ensemble size and is able to outperform not only the ETKF and EnKF, but also its stochastic counterpart, the NLEAF. We note that without random rotations the NETF diverged, as in the L63 cases. Based on the experiments performed in this work, we conclude that the additional random rotation is required to maintain filter stability by generating a new ensemble with Gaussian properties. However, it is possible that other experimental settings or implementations might lead to different experiences.

#### 3) Linear advection

In this final experiment, the dimensions of the ensemble, the observations, and the state each differ by about one order of magnitude, representing a difficult setup for filters that rely on the Bayesian weights. However, it approaches real-world situations, which are usually characterized by *m* ≪ *k* ≪ *d*.

Figure 8 shows the RMSE of each filter resulting from simulations with various ensemble sizes. By definition, the chosen scenario represents a linear, Gaussian assimilation problem. Recalling the Gaussian assumption built into all KFs, it is not surprising that the ETKF, which exploits this information in a very efficient manner, performs best. Thus, its RMSE can be seen as the lower limit for a given ensemble size. Nevertheless, the NETF, which cannot benefit from the Gaussianity of the situation as much as the ETKF because it relies only on the Bayesian weights, is stable and approaches the performance of the ETKF for larger ensemble sizes. In contrast, the stochastic filters (NLEAFs and EnKF) perform similarly in that they exhibit a considerably higher RMSE and do not converge to the smallest possible value. In this scenario, the perturbed observations introduce additional random noise and render these filters considerably suboptimal. While in the L63 experiments they helped to improve the analysis, these experiments indicate that, in situations where the main challenge is an increased dimensionality, the second-order exact filters (NETF and ETKF) show a more favorable behavior. Furthermore, this confirms that, by means of localization, the NETF offers potential applicability in systems where the ensemble size is small compared to the dimensions of observation and state.

### d. Summary

We summarize the main findings from the series of experiments that applied the NETF to different models and setups:

- Despite being solely based on the Bayesian weights, the algorithm is capable of remaining stable even in difficult situations and higher-dimensional spaces, indicating the importance of the random rotations after the ensemble transformation.

- In nonlinear and non-Gaussian scenarios, the NETF is able to outperform the KF-based algorithms (ETKF and EnKF), as it does not rely on Gaussian assumptions. In all experiments, the ensemble sizes required to achieve superior results were fairly small, indicating that the update mechanism acts quite efficiently. However, it will not work properly with *very* small ensemble sizes such as 5 or 10, for which the ETKF may already achieve its best performance.

- Domain localization, as suggested by Hunt et al. (2007) for the ETKF, also works successfully for this nonlinear filter formulation. It reduces the effective state dimensionality, which in turn helps avoid weight collapse in larger-dimensional systems. This allows the use of the NETF with rather small ensemble sizes also in these scenarios.

- In some cases, as in low-dimensional but strongly nonlinear scenarios, the perturbed observations used in the NLEAF can be beneficial. However, particularly in larger-dimensional problems, the NETF is able to outperform its stochastic counterpart because of its deterministic update mechanism.

## 6. Conclusions

This paper first reformulates the mean and covariance of the fully nonlinear, non-Gaussian Bayes’s theorem, and then uses them to derive a new type of filter algorithm, the NETF, which exhibits second-order statistics derived from the Bayesian estimates. This is achieved by a deterministic matrix square root transformation of the prior ensemble together with a suitable random rotation. Compared to the PF, at each analysis step these transformations generate a new, equally weighted ensemble, which contributes to filter stability. Compared to EnKFs, it does not suffer from the bias caused by the linear ensemble update.

The paradigm of using the Bayesian expectations to create an analysis ensemble with these characteristics is closely related to the approach employed by the NLEAF (Lei and Bickel 2011). The key difference lies in the method of constructing the analysis ensemble. The NLEAF uses a stochastic update of the ensemble members where the realized mean and covariance converge to the desired results. In this work, we applied a deterministic matrix square root transformation to generate an analysis ensemble with exact second-order statistics without sampling noise. Furthermore, the proposed method is also computationally more efficient, as it only requires the computation of one single matrix square root in the ensemble subspace, and does not involve a matrix inversion. Like the NLEAF, it exclusively relies on the Bayesian weights in order to derive the targeted second-order statistics; hence, it may also be exposed to weight collapse. The additional random rotation of the ensemble aims at increasing the filter stability while preserving the first two moments of the analysis ensemble.

As common to all algorithms that directly employ the Monte Carlo approximation to Bayes’s theorem, the NETF needs to evaluate the likelihood density to obtain the Bayesian weights. Hence, its functional form has to be specified, but in contrast to the EnKFs, this form is not restricted to be Gaussian. This additional freedom can be important for particular applications that deal with inherently non-Gaussian observation error statistics.

The proposed filter, which is fairly easy to implement, was tested in a set of experiments, mostly with nonlinear, non-Gaussian scenarios. The results show that it outperformed the ETKF and EnKF in these scenarios in terms of analysis error and ensemble quality even for relatively small ensemble sizes. While in the low-dimensional L63 system the NETF’s performance was between the NLEAF1 and NLEAF2, in all experiments with a larger dimensionality, its analysis error was also noticeably smaller compared to the NLEAFs, emphasizing the advantages of the deterministic update formulation. In these cases, it was confirmed that, with localization, such a scheme only based on the Bayesian weights is capable of producing stable and superior results, despite the curse of dimensionality. We conclude from these initial experiments that the filter, despite its rather simple design, appears to be a promising contribution to the field of nonlinear data assimilation and may be potentially suitable in large-scale systems. This should be studied in the future.

In the experiments conducted here, the random rotation was required to maintain filter stability. This probably stems from the effect of the deterministic transform on higher-order moments, which remains an open issue at the current stage (see also appendix C). For the ETKF, the influence of random rotations on the analysis quality depends on the experimental setup, such as ensemble size and model properties. Thus, concerning the random rotation step in the NETF, further research on its necessity and impact in other situations is desirable.

The NETF concentrates, like the other ensemble filters tested in these experiments, on the first two moments of the analysis pdf. In principle, third- or higher-order moments could also be derived using Eq. (13). This raises the question whether ensembles of higher-order exactness may be generated in a similar way, which could be a direction for further research. In the work of Miller et al. (1994), the consideration of third and fourth moments with an extended KF resulted in major improvements in L63 experiments. Hence, a first step could be the development of a higher-order EnKF or ETKF before using the Bayesian estimates. However, such a filter may be limited by the fact that the Monte Carlo estimates of higher-order moments require a rather large ensemble size to be reliable, and by the computational expense of higher-order moments in larger-dimensional spaces.

Another aspect could concern the reutilization of the final analysis ensemble for ensemble predictions. An ETKF-based initialization scheme was shown to create better ensemble forecasts than classical alternatives such as singular vector or breeding methods (Wang and Bishop 2003), particularly if enhanced by rotations (Wang et al. 2004). In this context, it would be interesting to investigate the impact of a nonlinear analysis scheme on successive ensemble forecasts [see also Ades (2013), chapter 7.3], potentially enhanced by random rotations as discussed in this work.

## Acknowledgments

The authors thank Dr. L. Nerger and P. Kirchgessner (AWI, Bremerhaven) for valuable discussions on the subject, and the three anonymous reviewers for their detailed comments, which greatly helped in improving the manuscript. We acknowledge the support from the German Federal Ministry of Education and Research (BMBF) under Grant MiKliP: DEPARTURE/01LP1118B.

### APPENDIX A

#### Properties of the Transform Matrix

Apart from a scaling factor *m*, the square of the transform matrix defined in Eq. (20) is **A** = diag(**w**) − **ww**ᵀ, where **w** denotes the vector of normalized analysis weights. For an investigation of its basic properties, we assume all weights to be nonzero, which always holds if *p*(**y** | **x**) is Gaussian. Currently, it is unclear how a likelihood pdf with finite support that may assign zero weights would affect the NETF.

We first show that is positive semidefinite. For this purpose, we make use of the fact that a symmetric, real and (weakly) diagonally dominant matrix with nonnegative diagonal entries is positive (semi) definite [Brouwer and Haemers (2012), p. 30, Lemma 2.10.1]. By definition, is real and symmetric. The diagonal elements of are , which are always nonnegative. With for , the sum of the absolute values of all off-diagonal elements in any matrix row *i* is

Hence, in any row the sum of the absolute values of the off-diagonal elements equals the corresponding diagonal element. This proves that $\mathbf{A}$ also exhibits weak diagonal dominance, which finally allows us to establish its positive semidefiniteness. As a result, it possesses a unique symmetric square root $\mathbf{A}^{1/2}$ such that $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$. Note that if one neglects an ensemble member to compute $\mathbf{A}$ in an $(m-1)$-dimensional subspace (e.g., the member with the smallest weight), the matrix becomes strictly diagonally dominant and therefore positive definite.

As $\operatorname{diag}(\mathbf{w})$ is a full-rank matrix and $\mathbf{w}\mathbf{w}^{\mathrm{T}}$ has rank 1, the rank of $\mathbf{A}$ is $m-1$. The null space of $\mathbf{A}$ can be found by solving $\mathbf{A}\mathbf{z} = \mathbf{0}$ for **z**. Evaluating both sides of this equation, we find that the constraint on any component of **z** is $z_i = \sum_j w_j z_j$, that is, all components must be equal. Hence, the null spaces of $\mathbf{A}$ and the transform matrix consist only of the direction given by the vector $\mathbf{1} = (1, \ldots, 1)^{\mathrm{T}}$. This property ensures that the analysis perturbations remain centered around zero.
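These properties are straightforward to verify numerically. The following NumPy sketch (with an arbitrary positive weight vector; the variable names are ours, not from the text) checks positive semidefiniteness, the rank, the null space, and the symmetric square root:

```python
import numpy as np

rng = np.random.default_rng(42)

m = 10
w = rng.random(m)
w /= w.sum()                       # normalized, strictly positive weights

# Apart from the factor m, the squared transform matrix: A = diag(w) - w w^T
A = np.diag(w) - np.outer(w, w)

# Positive semidefinite: no negative eigenvalues (up to round-off)
eigvals = np.linalg.eigvalsh(A)
assert eigvals.min() > -1e-12

# Rank m - 1: exactly one eigenvalue vanishes
assert np.sum(np.abs(eigvals) < 1e-12) == 1

# Null space spanned by the vector of ones: A @ 1 = 0
assert np.allclose(A @ np.ones(m), 0.0)

# Unique symmetric square root A^(1/2) with A^(1/2) A^(1/2) = A
lam, Q = np.linalg.eigh(A)
sqrtA = Q @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ Q.T
assert np.allclose(sqrtA @ sqrtA, A)
```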

### APPENDIX B

#### Construction of the Random Rotation Matrix

The generation of a random rotation matrix appears in the singular evolutive interpolated KF (SEIK; Pham 2001; Hoteit et al. 2002) and has been reviewed by Nerger et al. (2012). The method relies on generating a sequence of Householder matrices and only requires the ability to draw random numbers on a unit sphere in any dimension *D*. This can easily be performed by drawing *D* independent samples from the standard normal density, collecting the results into a *D*-dimensional vector **a**, and normalizing it so that ||**a**||_{2} = 1.
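This sampling step amounts to only a few lines of NumPy (a minimal sketch; the function name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vector(D: int) -> np.ndarray:
    """Draw a point uniformly distributed on the unit sphere in dimension D."""
    a = rng.standard_normal(D)    # D independent N(0, 1) samples
    return a / np.linalg.norm(a)  # normalize so that ||a||_2 = 1

a = random_unit_vector(7)
assert np.isclose(np.linalg.norm(a), 1.0)
```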

To preserve the mean and covariance of the analysis ensemble, we need an orthogonal random matrix that has $\mathbf{1}/\sqrt{m}$ as an eigenvector with eigenvalue 1 [Livings et al. (2008), Theorem 3]. It was suggested (P. Kirchgessner, AWI Bremerhaven, 2013, personal communication) to combine the method of Nerger et al. (2012) with the algorithm of Sakov and Oke (2008b), who show how to obtain a mean-preserving, orthogonal matrix from any random matrix. This results in the following algorithm:

1. Use the algorithm presented in the appendix of Nerger et al. (2012), but omit the last step, resulting in an orthogonal random matrix **Ω** of size $(m-1) \times (m-1)$. Note that a further application of this algorithm would indeed generate an orthogonal random matrix, but it would not have the mean-preserving property required for our application. Embed **Ω** into the block matrix
   $$\boldsymbol{\Omega}' = \begin{pmatrix} 1 & \mathbf{0}^{\mathrm{T}} \\ \mathbf{0} & \boldsymbol{\Omega} \end{pmatrix}. \tag{B1}$$
2. Create an arbitrary orthogonal matrix $\mathbf{U}$ having $\mathbf{1}/\sqrt{m}$ as first basis vector. For instance, one may use the deterministic matrix mentioned in Nerger et al. (2012) by calculating the full Householder matrix associated with this vector, which has $\mathbf{1}/\sqrt{m}$ as its last column; reversing the order of its columns then yields the required form.
3. Perform the inverse basis transformation to obtain the final random rotation matrix:
   $$\boldsymbol{\Lambda} = \mathbf{U} \boldsymbol{\Omega}' \mathbf{U}^{\mathrm{T}}. \tag{B2}$$

An alternative for step 1 is to use the orthogonal matrix from the singular value decomposition of a random matrix [i.e., $\boldsymbol{\Omega} = \mathbf{U}_{\Omega} \boldsymbol{\Sigma} \mathbf{V}^{\mathrm{T}}$, taking $\mathbf{U}_{\Omega}$; Evensen 2004].

Additionally, we prove that this algorithm generates a random matrix with the desired properties.

- Orthogonality: Eq. (B2) directly yields $\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\mathrm{T}} = \mathbf{U}\boldsymbol{\Omega}'\mathbf{U}^{\mathrm{T}}\mathbf{U}\boldsymbol{\Omega}'^{\mathrm{T}}\mathbf{U}^{\mathrm{T}}$. Since $\mathbf{U}$ is orthogonal, $\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\mathrm{T}} = \mathbf{U}\boldsymbol{\Omega}'\boldsymbol{\Omega}'^{\mathrm{T}}\mathbf{U}^{\mathrm{T}}$ follows. Equation (B1) reveals that the orthogonality of **Ω** implies that $\boldsymbol{\Omega}'$ is orthogonal, and consequently, $\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\mathrm{T}} = \mathbf{I}$.
- Mean preservation: $\boldsymbol{\Lambda}\mathbf{1} = \mathbf{1}$ can be shown by successive evaluation of the matrix–vector products. First, $\mathbf{U}^{\mathrm{T}}\mathbf{1} = \sqrt{m}\,\mathbf{e}_1$ because the first column vector of $\mathbf{U}$ is $\mathbf{1}/\sqrt{m}$ and, due to its orthogonality, all other column vectors are orthogonal to **1**. Second, the matrix $\boldsymbol{\Omega}'$ leaves $\mathbf{e}_1$ unchanged due to its special form in Eq. (B1). Finally, $\mathbf{U}\left(\sqrt{m}\,\mathbf{e}_1\right) = \mathbf{1}$ because all entries in the first column of $\mathbf{U}$ are equal to $1/\sqrt{m}$.
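As a numerical complement to this proof, the full construction can be sketched in NumPy (a sketch with illustrative names, using the SVD-based alternative for step 1 and a Householder reflector for the basis matrix):

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_preserving_rotation(m: int) -> np.ndarray:
    """Random orthogonal matrix with eigenvector 1/sqrt(m) and eigenvalue 1."""
    # Step 1 (SVD alternative): random orthogonal (m-1) x (m-1) matrix Omega
    Omega, _, _ = np.linalg.svd(rng.standard_normal((m - 1, m - 1)))
    # Embed Omega into a block matrix that leaves e_1 unchanged [cf. Eq. (B1)]
    Omega_b = np.eye(m)
    Omega_b[1:, 1:] = Omega
    # Orthogonal basis U whose first column is 1/sqrt(m), built from the
    # Householder reflector that exchanges e_1 and 1/sqrt(m)
    v = np.ones(m) / np.sqrt(m)
    u = v - np.eye(m)[:, 0]
    U = np.eye(m) - 2.0 * np.outer(u, u) / (u @ u)
    # Inverse basis transformation [cf. Eq. (B2)]
    return U @ Omega_b @ U.T

Lam = mean_preserving_rotation(8)
assert np.allclose(Lam @ Lam.T, np.eye(8))        # orthogonality
assert np.allclose(Lam @ np.ones(8), np.ones(8))  # mean preservation
```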

### APPENDIX C

#### Scalar Gaussian–Gaussian Example

We demonstrate the single update of a scalar state variable *x* with Gaussian prior and likelihood densities, $p(x) = \mathcal{N}(x; \mu_b, \sigma_b^2)$ and $p(y \mid x) = \mathcal{N}(y; x, \sigma_o^2)$, respectively. The analytical solution, as given by the KF, is the Gaussian analysis pdf $p(x \mid y) = \mathcal{N}(x; \mu_a, \sigma_a^2)$. An ensemble of forecast states is generated and updated by the ETKF and by the NETF with and without random rotation. Figure C1 visualizes the resulting densities in the form of histograms. As expected, the ETKF transform matrix in Eq. (10), which is derived from the theoretical KF analysis covariance in Eq. (5), produces a normally distributed ensemble. In contrast, the ensemble generated by the NETF without rotation clearly does not follow the desired analysis pdf, even though it exhibits the correct second-order statistics by construction. After the random rotation, the ensemble is Gaussian again. At the current stage, the impact of the purely deterministic transform with Eq. (21) on higher-order moments is an open issue. However, the empirical insights gained from the experiments conducted within this work underline the importance of the random rotation for the NETF.

## REFERENCES

*Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh,* B. Clarke and S. Ghosal, Eds., Vol. 3, Institute of Mathematical Statistics, 318–329.

*Spectra of Graphs.* Springer, 263 pp.

*Sequential Monte Carlo Methods in Practice.* Springer-Verlag, 581 pp.

*Data Assimilation: The Ensemble Kalman Filter.* Springer, 279 pp.

**142,** 2165–2175.

**143,** 212–229.

*Proc. Seminar on Predictability,* Reading, United Kingdom, ECMWF, 1–18.

*ECMWF Newsletter,* No. 106, ECMWF, Reading, United Kingdom.