## Abstract

Operational forecasting with simulation models involves melding observations and model dynamics to determine a set of initial conditions for each forecast. The Kalman filter (KF) provides the optimal closed-form solution in the general linear stochastic (perfect model) case, while the target of the problem has not even been defined in the case of imperfect models. Data assimilation in a nonlinear, perfect-model scenario is considered. It is shown that a new fully nonlinear approach based upon the indistinguishable states (IS) systematically outperforms the ensemble Kalman filter (EnKF). The IS approach provides an ensemble of initial conditions consistent with (i) the model dynamics, (ii) the observational noise model, and (iii) the particular observations over a window. It is argued that this is the relevant limit to consider in data assimilation when the desire is to place high probability density in the vicinity of the target state. The advantages of the IS approach come in part from its ability to provide attractor-balanced ensembles near any attracting manifold on which the system may evolve. The EnKF provides a computationally cheaper alternative that places points in the general vicinity of the target. A low (i.e., 2) dimensional example is used to provide easily visualized evidence for these claims, which are then tested in a higher (i.e., 12) dimensional system. Inasmuch as the IS approach is shown to outperform the EnKF systematically in these perfect-model experiments, it provides an interesting alternative approach when informative ensembles are desired.

## 1. Introduction

Data assimilation is the procedure of combining simulation models with observations. Given noisy observations, the aim of data assimilation is, arguably, a sample of (weighted) model states, consistent both with the observations and the given noise model. A key goal of data assimilation is the generation of a reliable sample of system states for real-time applications including nowcasting, numerical weather prediction, and seasonal prediction. Simulation models of interest are high-dimensional, nonlinear, chaotic dynamical systems, which are imperfect. Observations are typically incomplete, available at discrete times, and contaminated with observational noise with poorly known characteristics. The main aim of this paper is to demonstrate that ensembles formed using a new nonlinear method systematically outperform those based upon a state-of-the-art linear method in a variety of perfect-model contexts.

In this paper, we consider data assimilation in the perfect-model scenario (PMS). Within PMS, the system state and model state estimates evolve according to the same dynamics, and the statistical characteristics of the observational noise are known exactly. Probability theory is the guiding framework for data assimilation in this setting (Jazwinski 1970). We advocate the following definition of data assimilation within PMS: given a prediction model and observations up to and including the present time *t* = 0, the goal of data assimilation is to sample (ideally to compute) the conditional probability distribution of the system state. Given this definition, we view any method that seeks to generate a set of states for *t* = 0 representative of this conditional probability distribution as a method for ensemble data assimilation. This includes a whole variety of methodologies and frameworks, including those explored in this article. Given the complexity and size of simulation models of interest, solving the problem analytically is intractable and Monte Carlo (ensemble) methods (Leith 1974) are used instead. The use of nonlinear models suggests the use of ensemble data assimilation methods, which do not rely, implicitly or explicitly, on assumptions of dynamical linearity. Approaches that are optimal in linear systems (like the Kalman filter; Kalman 1960) need not perform well in nonlinear systems. The primary goal of this paper is to contrast a standard linear method with a nonlinear method, illustrating their relative ability in the limit of large computational resources.

We describe and compare a novel methodology for nonlinear ensemble data assimilation based upon indistinguishable states (IS). The IS approach has a number of appealing properties: it does not require explicit assumptions regarding the form of prior distributions, nor does it require complicated resampling techniques, nor prior specification of the properties of model error. By construction, the IS approach forms ensemble members from model trajectories, thus forming ensembles that reflect the (nonlinear) dynamics of the model. In PMS, the IS approach has theoretically satisfying limiting properties that do not hinge on dynamical linearity or distributional assumptions. The IS approach can be applied to nonlinear observation operators and non-Gaussian likelihood functions without approximation. An IS approach will also function in the special case of vanishingly low observational noise and linearized dynamics, but can never be expected to outperform optimal linear methods in truly linear problems.

In section 2, the problem of probabilistic state estimation in the context of ensemble data assimilation is described, along with a simple method for comparing two ensemble data assimilation methods probabilistically. In section 3, we describe the theory underlying the IS approach and its practical implementation. To illustrate the aspects of this IS method that differ from other approaches, results from a series of Observing System Simulation Experiments (OSSEs; Daley 1991; Arnold and Dey 1986) are shown. Here the IS method is compared to an ensemble Kalman filter data assimilation technique that has been successfully demonstrated in high-dimensional atmospheric prediction problems (Anderson 2001, 2003; Anderson et al. 2005). In comparing two relatively complicated ensemble data assimilation schemes (IS and ensemble Kalman filter), we find it insightful to also include relatively simple ensemble data assimilation schemes in order to confirm that complex methods are worthwhile, and the degree to which they are useful (McSharry et al. 2003). We compare to two additional schemes: one is based only on the observational noise, providing a “zero skill” baseline that assumes no knowledge of the system’s dynamics; the other selects initial conditions in a manner consistent with the long-term dynamics (i.e., “on the attractor”) but only uses observational information at the analysis time (i.e., no prehistory). Details of these schemes and numerical experiments are provided in section 4. In section 5, results are shown for the nonlinear chaotic two-dimensional Ikeda system, while in section 6, results are shown for a 12-variable Lorenz (1996) system. In both sections 5 and 6, for a nonlinear regime of the model/data assimilation system, we demonstrate that this IS method significantly outperforms the ensemble Kalman filter (and the other schemes). Section 7 provides a discussion and the conclusions.

## 2. Evaluating probabilistic state estimation using ensembles

### a. Definitions and notation

We work within the strong perfect-model scenario, where the system and model states evolve according to the same dynamics: the model is structurally correct and there is no parametric uncertainty. In this case a “true” system state exists. The true system state is denoted by *x̃ _{i}*, where *t _{i}* is time indexed by *i* and *x̃ _{i}* is an *m*-dimensional vector. We make a number of assumptions in setting up the problem, with little loss of generality. Assume that observations of the true state are available periodically with a sampling time of Δ*t*_{assim} = *t*_{i+1} − *t _{i}*. At time *t _{i}*, let the *m*-dimensional vector of observations be *s _{i}* = *H*(*x̃ _{i}*) + *ν _{i}*. Here *H* is generally a nonlinear operator and *ν _{i}* is an *m*-dimensional observational error. Assume that *H* is the same at each observing time. For simplicity we assume observation of the complete state vector (i.e., *m* equals the state dimension) and take *H* to be the identity operator. The posterior ensemble at time *t _{i}* is a set of *m*-dimensional state estimates *x _{i}*^{(k)} (*k* = 1, … , *K*) with weights *w _{k}*^{α} normalized to sum to 1, where *α* denotes a particular ensemble formation scheme.

### b. Probabilistic evaluation

We focus on the problem of probabilistic state estimation of the system state given observations up to and including the present time, which is sometimes referred to as probabilistic nowcasting (see, e.g., Hansen and Smith 2001). To compare the various ensemble formation schemes defined below, we deem the method that systematically assigns more probability mass in a neighborhood about the “target” to be the better of two methods. We denote the *m*-dimensional “target state” at time *t _{i}* by *x̂ _{i}*, noting that in the PMS experiments below, the target state is the true state (i.e., *x̂ _{i}* = *x̃ _{i}*). Specifically, in comparing two ensemble data assimilation methods *M _{α}* and *M _{β}* at time *t _{i}*, one can draw a hypersphere of radius *ε* (hereafter *ε* ball) around the target state. For each method, one can sum the weights of the ensemble members that lie within the *ε* ball. We denote these quantities by *P _{α}* and *P _{β}*. If *P _{α}* > *P _{β}*, *M _{α}* wins; if *P _{β}* > *P _{α}*, *M _{β}* wins; if *P _{α}* = *P _{β}*, the methods tie. If one method systematically assigns more probability mass to the target over a series of assimilation times for a range of *ε*-ball sizes, then it is to be preferred. We stress that this approach does not correspond to a proper skill score (Brocker and Smith 2007; Du 2009) and should not be used recklessly; nevertheless it addresses the question of interest and is of value when robust results over a range of length scales are obtained (as happens below). The *ε*-ball approach also provides a direct examination of the ensemble, avoiding the need to translate ensembles into continuous probability distributions in order to apply the more common proper scores.

## 3. Indistinguishable states of nonlinear ensemble data assimilation: Theory and practical considerations

### a. Forming an ensemble using the indistinguishable states of a reference trajectory

Our goal is to generate an ensemble of state estimates conditioned on observations at *t _{i}* and at the *N* observation times in the past. We start with a sequence of state estimates from *t*_{i−N} to *t _{i}*, denoted by *x*^{e}, which is roughly consistent with the observational evidence from *t*_{i−N} to *t _{i}* (the superscript *e* denotes state estimate). This sequence of state estimates can be computed using a four-dimensional variational data assimilation method (4DVAR; e.g., Lorenc 1986) or other nonlinear noise reduction techniques such as gradient descent (GD; Judd and Smith 2001, 2004; Judd 2003; Judd et al. 2004; Ridout and Judd 2002). The GD methodology is used in this paper, and the reader is referred to Judd et al. (2004) for a detailed description of the algorithm, omitted here for brevity. [See Judd et al. (2008) for an application of related algorithms in operational models.] The relevant details of our particular implementation of GD will be given as necessary throughout this paper.
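As a rough illustration of the GD idea (this is not the implementation of Judd et al. (2004); the explicit Jacobian argument, fixed step size, and simple first-order update are assumptions of this sketch), gradient descent reduces the indeterminism of a pseudo-orbit, i.e., the summed squared mismatch between each state and the image of its predecessor under the model map:

```python
import numpy as np

def indeterminism(u, f):
    """Summed squared mismatch of a pseudo-orbit u (shape (N+1, m)) under map f."""
    mismatch = u[1:] - np.array([f(x) for x in u[:-1]])
    return np.sum(mismatch ** 2)

def gd_step(u, f, jac, eta=0.05):
    """One gradient-descent step reducing the indeterminism of u.

    jac(x) returns the m-by-m Jacobian of f at x (an assumption of this
    sketch; the full algorithm is described in Judd et al. (2004))."""
    g = np.zeros_like(u)
    mism = u[1:] - np.array([f(x) for x in u[:-1]])
    g[1:] += 2.0 * mism                       # derivative w.r.t. u_{j+1}
    for j in range(len(u) - 1):
        g[j] += -2.0 * jac(u[j]).T @ mism[j]  # derivative w.r.t. u_j
    return u - eta * g
```

Iterating `gd_step` drives the pseudo-orbit toward a model trajectory (indeterminism near zero), which is the sense in which GD achieves noise reduction.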

After the GD procedure, a “reference” trajectory is obtained from the final GD estimate by mapping the first state estimate forward in time to *t _{i}* under the model dynamics. The result is a reference trajectory denoted by *x*^{r} (the superscript *r* denotes reference). We now focus on how an informative ensemble of trajectories can be formed *given* the reference trajectory. We emphasize that in practice, the actual methodology (GD, 4DVAR, etc.) used to determine the reference trajectory is likely to depend on practical considerations.

The reference trajectory will not be the true trajectory (even in PMS); the coherent aim of data assimilation in this case is to form an ensemble. We proceed by finding the indistinguishable states (IS) of the reference trajectory, noting that a more general approach would also consider an ensemble of reference trajectories. As the concept of indistinguishable states is central to this paper, we briefly review the detailed introduction given in Judd and Smith (2001).

Suppose we are given a “candidate” trajectory (superscript *c* denotes candidate) that we may wish to add to our ensemble. In the limit *N* → ∞ the candidate trajectory is said to be indistinguishable from the reference trajectory if and only if there is a nontrivial probability (given the noise model) that the observations of the reference trajectory would be statistically mistaken as observations of the candidate. For finite *N* one is forced to set a threshold value.

Suppose that at *t*_{i−N} we have a vector of observations *s*_{i−N} = *x̃*_{i−N} + *ν*_{i−N}, and assume that the observational errors *ν*_{i−N} are independent and identically distributed, and drawn from a density *ρ* that is not singular. For *t*_{i−N}, the joint density that the reference and the candidate are indistinguishable is given by

$$\tilde{q}(x^{r}_{i-N}, x^{c}_{i-N}) = \int \rho(\nu)\,\rho(\nu + x^{c}_{i-N} - x^{r}_{i-N})\,d\nu, \tag{1}$$

where $\tilde{q}$ can be normalized by dividing by $\tilde{q}(x^{r}_{i-N}, x^{r}_{i-N})$, implying that the reference trajectory is indistinguishable from itself with a probability of 1. Normalizing $\tilde{q}$ defines *q*.

When *ρ* is Gaussian:

$$q(x^{r}_{j}, x^{c}_{j}) = \exp\left[-\tfrac{1}{4}\,(x^{c}_{j} - x^{r}_{j})^{\mathrm{T}}\,\Sigma^{-1}\,(x^{c}_{j} - x^{r}_{j})\right], \tag{2}$$

where $\Sigma$ is the observational error covariance matrix. The product of *q* values for *t*_{i−N} to *t _{i}* gives the indistinguishability criterion:

$$Q(x^{r}, x^{c}) = \prod_{j=i-N}^{i} q(x^{r}_{j}, x^{c}_{j}), \tag{3}$$

for the two trajectories *x*^{r} and *x*^{c}. If *Q*(*x*^{r}, *x*^{c}) > 0, *c* and *r* are indistinguishable.

The reference trajectory and its indistinguishable states define a set of trajectories, which can be used to form an ensemble. Of course, whenever *ρ* has unbounded support, then while *Q* may be vanishingly small for almost all candidate trajectories, it is *always* greater than zero for any finite *N*. We restrict attention to the Gaussian case for the remainder of this paper (other densities are considered in Judd and Smith 2001). As discussed below, we have formed sets of indistinguishable states in our experiments by only accepting states for which the value of *Q* exceeds some specified threshold.
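For Gaussian observational noise, the normalized single-time density *q* and the window product *Q* are straightforward to compute. A minimal sketch (the array shapes and function names are our own):

```python
import numpy as np

def q_gauss(x_ref, x_cand, cov_inv):
    """Normalized single-time indistinguishability density for Gaussian rho.

    Equals 1 exactly when the candidate state equals the reference state."""
    d = x_cand - x_ref
    return np.exp(-0.25 * d @ cov_inv @ d)

def q_product(traj_ref, traj_cand, cov_inv):
    """Product of q over the window; trajectories are (N+1, m) arrays."""
    return np.prod([q_gauss(r, c, cov_inv)
                    for r, c in zip(traj_ref, traj_cand)])
```

Note that *Q* compares two trajectories to each other and makes no reference to the observations; the observations enter only through the weights below.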

Weights are assigned to the chosen ensemble members using the observations. For example, suppose we have a set of *k* = 1, … , *K* ensemble members obtained by application of this IS method. The trajectories can be weighted by

$$w_{k} = \prod_{j=i-N}^{i} \rho(s_{j} - x^{(k)}_{j}), \tag{4}$$

where $x^{(k)}_{j}$ is the state of ensemble member *k* at *t _{j}*. The weights can be normalized by a constant factor *θ* defined such that $\theta \sum_{k=1}^{K} w_{k} = 1$. The weight assigned to ensemble member *k* is then *θw _{k}*.
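The likelihood weighting and its normalization can be sketched as follows (assuming an identity observation operator and isotropic Gaussian noise; the function names are illustrative):

```python
import numpy as np

def member_weight(traj, obs, sigma):
    """Unnormalized weight: product over the window of the Gaussian
    observation likelihood of each state of the trajectory.

    traj, obs : (N+1, m) arrays; sigma : observational error std dev."""
    resid = obs - traj
    return np.exp(-0.5 * np.sum(resid ** 2) / sigma ** 2)

def normalize(weights):
    """Rescale weights to sum to 1 (theta = 1 / sum of the raw weights)."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()
```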

### b. Computing the indistinguishable states of a reference trajectory: Practical considerations

As the observational window length *N* → ∞, the IS of a reference trajectory form a subset of the reference trajectory’s unstable manifold (Judd and Smith 2001). One strategy for computing IS of the reference trajectory is to generate many candidate trajectories by taking small perturbations about the initial point of the reference trajectory and integrating forward, requiring some minimum *Q* [Eq. (3)] for a candidate trajectory to be included in the ensemble. The precise scheme used to form informative ensembles will be problem and system dependent; our choices are discussed along with the results in sections 5 and 6. Recall that our aim here is not to optimize the implementation of IS in these cases, but merely to demonstrate an implementation of IS that convincingly outperforms the other alternatives considered.
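One version of this perturb-integrate-accept strategy can be sketched as follows (isotropic Gaussian noise and identity *H* are assumed; the box width, threshold handling, and function signature are illustrative, not the tuned choices of sections 5 and 6):

```python
import numpy as np

def sample_indistinguishable(ref_init, f, n_steps, obs_sigma, q_min,
                             box=0.01, n_members=100, seed=0):
    """Accept/reject sketch: perturb the start of the reference trajectory,
    integrate forward under the one-step model map f, and keep candidates
    whose indistinguishability Q exceeds q_min."""
    rng = np.random.default_rng(seed)
    ref = [np.asarray(ref_init, dtype=float)]
    for _ in range(n_steps):
        ref.append(f(ref[-1]))
    ref = np.array(ref)

    members = []
    while len(members) < n_members:
        cand = [ref[0] + rng.uniform(-box, box, size=ref[0].shape)]
        for _ in range(n_steps):
            cand.append(f(cand[-1]))
        cand = np.array(cand)
        d2 = np.sum((cand - ref) ** 2)           # squared separation over window
        Q = np.exp(-0.25 * d2 / obs_sigma ** 2)  # Gaussian case of Eq. (3)
        if Q > q_min:                            # reject unlikely candidates
            members.append(cand)
    return ref, members
```

In practice the rejection rate, and hence the cost, depends strongly on the box size relative to the local dynamics.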

## 4. Details of the experimental design and the alternative approaches

### a. Design of the OSSEs

An initial state was obtained by propagating a point (believed to be close to the system’s attractor) for an additional 10 000 Δ*t*_{assim} steps under the system’s dynamics to place the initial target state even closer to the system (and, in PMS, also the model) attractor. Next, a target trajectory of 11 000 steps (each separated by Δ*t*_{assim}) was obtained. In each case the observing network was set to identity (i.e., *H* = **I**). Simulated observations were obtained by adding a random error sampled from a Gaussian distribution with a standard deviation of *σ*_{obs} to each component of the target. Observations of individual state variables are uncorrelated. For all 11 000 steps, posterior ensembles of size *K* = 1000, along with their weights, were computed for a variety of ensemble data assimilation schemes. Distributions with *K* = 1000 appear robust, allowing us to compare the various methods at an ensemble size where the impacts of sampling error are minimal. The first 1000 assimilation steps were discarded to minimize any transient effects.
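The OSSE construction described above can be sketched as follows (the function and its arguments are our own; any one-step map `f` can stand in for the system dynamics):

```python
import numpy as np

def make_osse(f, x0, dim, spinup=10_000, n_steps=11_000,
              sigma_obs=0.05, seed=1):
    """Build an OSSE truth trajectory and simulated observations.

    f  : one-step system map
    x0 : a point believed to be near the attractor"""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(spinup):            # settle the point onto the attractor
        x = f(x)
    truth = np.empty((n_steps, dim))
    for i in range(n_steps):
        truth[i] = x
        x = f(x)
    # identity observation operator; iid Gaussian error per component
    obs = truth + rng.normal(0.0, sigma_obs, size=truth.shape)
    return truth, obs
```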

### b. Alternative approaches (EN, PB, and OB)

The IS method will be compared with three methods, each with a different underlying motivation. The first scheme, EN, denotes an ensemble adjustment Kalman filter (EAKF) described in Anderson (2001) and Anderson (2003). The aim of applying the IS method to a hierarchy of complex assimilation problems motivated comparison of IS ensembles with an ensemble Kalman filter scheme that has been demonstrated to give accurate results in high-dimensional, complex prediction problems (Anderson et al. 2005). This is the rationale for our comparison to the EN scheme. The EN method is a particular version of a deterministic ensemble square root Kalman filter (Tippett et al. 2003). Given the large ensemble size, the EN was implemented without any heuristic adjustments such as covariance localization (e.g., Hamill et al. 2001) or covariance inflation (e.g., Anderson and Anderson 1999). Note that the EN assigns equal weight to all the ensemble members (*w _{k}* = 1/*K*). At the first of the 11 000 consecutive assimilation times, the prior ensemble was formed from a 1000-member sample of the system’s climatology. The EN was used to update this ensemble for the 11 000 consecutive steps. Discarding the first 1000 steps allowed the EN to equilibrate to the observing system before any comparisons were made.

When comparing two relatively complex ensemble data assimilation schemes (IS and EN), including simple approaches provides benchmarks for the comparison of skill and for justifying the use of any complex method in the first place. They are included here as a necessary test toward demonstrating that the more complicated methods are warranted. Both simple schemes only know about the observations at the analysis time. The first simple scheme, called observations distribution sampling (OB), has no knowledge of the system dynamics, while the other scheme, one-step perfect Bayes (PB), exploits knowledge of the system dynamics as well. The OB method obtains 1000 equally likely ensemble members by adding 1000 random perturbations to the observations at a particular time. The random perturbations are random draws from the observational noise model (in this case a Gaussian observational error distribution with standard deviation *σ*_{obs}). In the PB scheme, a hypersphere (a circle in the two-dimensional Ikeda system) of radius 3*σ*_{obs} was drawn around the observation. Starting from a random point believed to be close to the attractor, the model was integrated until 1000 states within the 3*σ*_{obs} ball were obtained. The ensemble members were then weighted by their relative likelihood given the observations at the time in consideration. The PB method is a one-step method, as it is only conditioned on observations at one particular time. While our choice of 3*σ*_{obs} is arbitrary, this scheme is an attempt at approximating a scheme that assumes the prior probability density function (PDF) is the climatology.
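The two simple schemes might be sketched as follows (scalar observational noise, identity *H*, and the stopping logic are our assumptions; the function names are illustrative):

```python
import numpy as np

def ob_ensemble(obs, sigma, K=1000, seed=2):
    """OB: add K draws of the observational noise to the observation."""
    rng = np.random.default_rng(seed)
    ens = obs + rng.normal(0.0, sigma, size=(K, obs.size))
    return ens, np.full(K, 1.0 / K)      # equally weighted

def pb_ensemble(obs, sigma, f, x0, K=1000, max_iter=10_000_000):
    """PB: iterate the model until K on-attractor states fall within
    3*sigma of the observation, then weight by the Gaussian likelihood."""
    x = np.asarray(x0, dtype=float)
    ens = []
    for _ in range(max_iter):
        x = f(x)
        if np.linalg.norm(x - obs) <= 3.0 * sigma:
            ens.append(x.copy())
            if len(ens) == K:
                break
    ens = np.array(ens)
    w = np.exp(-0.5 * np.sum((ens - obs) ** 2, axis=1) / sigma ** 2)
    return ens, w / w.sum()
```

The long model integration in `pb_ensemble` is the source of PB's high computational cost: states near the observation arrive only as often as the natural measure visits that neighborhood.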

While ad hoc choices were made in implementing the OB and PB schemes, the comparisons to the IS and EN methods demonstrate the significant advantage of employing the more complicated EN and IS schemes. Furthermore, we can intuit how the OB and PB will perform based on how they are implemented. The OB method is the simplest of all schemes considered, providing a zero-skill baseline as the scheme has no knowledge of system dynamics. In contrast, the PB scheme uses the system dynamics to restrict attention only to points consistent with the long-term dynamics (i.e., on the attractor; Hansen and Smith 2001; Smith 1996); the consistency of PB with the dynamics and the observation suggests it will outperform OB, but with a significant increase in computational cost.

The EN approach can be interpreted as a linear generalization of OB using a window of observations and the system dynamics, while IS attempts a practical approximation of PB over that window. As noted by Lorenz (1963), waiting for an analog in a perfect model is valuable, when it is an option; arguably PB yields a perfect ensemble (see Smith et al. 1999) when evaluated over a long window. As noted above, in this paper we restrict PB to a window consisting of only a single observation. Inasmuch as the computational cost of PB increases rapidly with the length of the window, it would be interesting to explore where, for any finite computational resource, PB would outperform IS. This question is only of interest within PMS and falls beyond the scope of this paper.

## 5. Numerical results in the Ikeda system

### a. Model equations

The Ikeda system (Ikeda 1979) is a two-dimensional nonlinear chaotic map. It is a common test bed in assimilation studies (Hansen and Smith 2001; Judd and Smith 2004; Ridout and Judd 2002). The equations are as follows:

$$x_{i+1} = 1 + \mu\,(x_{i}\cos\theta_{i} - y_{i}\sin\theta_{i}),$$

$$y_{i+1} = \mu\,(x_{i}\sin\theta_{i} + y_{i}\cos\theta_{i}),$$

$$\theta_{i} = a - \frac{b}{1 + x_{i}^{2} + y_{i}^{2}},$$

where [*x*_{i+1}, *y*_{i+1}]^{T} is the state vector at “time” indexed by *i*; *μ*, *a*, and *b* are real-valued parameters. Following previous studies we take *μ* = 0.83, *a* = 0.4, and *b* = 6.0 (Hansen and Smith 2001; Lawson and Hansen 2004; Judd 2003; Judd and Smith 2001), where the system exhibits sensitive dependence on initial conditions. The Ikeda system has a complicated fractal attractor, which does not fill the entire state space; it is depicted in Fig. 1. Our main motivation for using Ikeda is that the entire two-dimensional state space can be displayed simultaneously, allowing visual comparison of the various strategies, and indicating how the IS method achieves its advantage over other data assimilation strategies.
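For reference, a direct implementation of the map with the quoted parameter values (the function name is ours) might read:

```python
import numpy as np

MU, A, B = 0.83, 0.4, 6.0   # parameter values used in the text

def ikeda(state, mu=MU, a=A, b=B):
    """One iteration of the Ikeda map (Ikeda 1979)."""
    x, y = state
    theta = a - b / (1.0 + x ** 2 + y ** 2)
    return np.array([1.0 + mu * (x * np.cos(theta) - y * np.sin(theta)),
                     mu * (x * np.sin(theta) + y * np.cos(theta))])
```

Since |[*x*, *y*]| is contracted by *μ* < 1 before the constant offset is added, orbits remain bounded while the rotation angle *θ* varies strongly with the radius, producing the fractal attractor of Fig. 1.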

Results were obtained for observations every iteration of the system so that Δ*t*_{assim} = 1. Numerical experiments for this parameter regime of Ikeda reveal that one iteration of the system is roughly ⅘ of the error doubling time (Lawson and Hansen 2004). The observational error standard deviation was set to *σ*_{obs} = 0.05, relatively large given the size of the attractor (Fig. 1), to evaluate the methods in a nonlinear regime.

### b. Results

#### 1) Ensemble formation using the IS method

We begin by focusing on the first assimilation time for which comparisons are made (the 1000th observation time or *t*_{1000}). For *t*_{1000}, the target is at the intersection of the straight lines in Fig. 2. The observation is located at roughly [0.46, −0.71], precisely in the middle of the circle marked by the inner blue dot. The small dots in the background are samples from the Ikeda system’s attractor.

The first step in the IS method is to find a reference trajectory. A sequence of state estimates was obtained using the GD method as described in Judd and Smith (2001; see also Judd et al. 2004), using an initial state equal to the observations and following Ridout and Judd (2002) in determining the length of the window used to find an accurate reference. (The appendix provides further details regarding the numerical integration required.) Figures 3a,b depict solutions obtained by GD. In Figs. 3a,b, the target values for *x* and *y* are depicted by the black dots from *t*_{991} to *t*_{1000}, the error bars indicating the observational standard deviation (0.05). The observation values are depicted by the black circles. The solid blue line joins the state estimates obtained by solving Eq. (6) in Judd et al. (2004) numerically, until the indeterminism [defined by Eq. (1) in Judd et al. (2004)] *L* ≈ 10^{−9}. The indeterminism *L* quantifies how far a solution obtained from gradient descent is from a true system trajectory. In both Figs. 3a,b, it appears that the sequence of state estimates obtained via GD has achieved noise reduction. A reference trajectory, computed by mapping forward the state estimate at *t*_{991}, is located at the centers of the blue circles in Figs. 3a,b. For all subsequent assimilation times, a similar procedure was followed. Details concerning the computational cost of this procedure can be found in the appendix.

Returning to Fig. 2, the observation indicated by the blue dot within the circle at [0.46, −0.71] depicts the starting point for the GD calculations. The final point in the solution is the blue dot at the center of the circle at roughly [0.375, −0.75] after the GD algorithm has been applied. We find that successive intermediate points move toward the target. Notice how the final GD state is not only closer to the target than the observation, but also falls close to a nearby branch of the attractor. Averaged over the full set of experiments, the mean distance of the observations from the truth was found to be 0.0626, whereas the GD state achieved a mean distance from the truth of 0.0266, indicating that our choice of *N* = 9 has achieved noise reduction. Ridout and Judd (2002) also used *N* = 9 in comparisons to an extended Kalman filter. We have verified empirically that further noise reduction can be achieved for larger windows (*N* > 9), but *N* = 9 proved sufficient for our purpose of contrasting the IS and EN.

Figure 4 depicts a reference trajectory at *t*_{1000} and 999 of its IS, colored by their relative likelihood given the observations from *t*_{991} to *t*_{1000}. Notice how all the ISs line up close to a relevant branch of the Ikeda attractor, as previously observed in Judd and Smith (2001).

We now describe how the ISs were computed, and discuss the various parameter choices that define our particular implementation. Any trajectory segment (any 10 consecutive iterations of the system equations) will fall some finite distance from the reference trajectory and thus have *Q* > 0, even if vanishingly small, so an ensemble formed by taking any trajectories for which *Q* > 0 would include (unlikely) trajectories that are far from the observations. For any finite window, we thus only accept trajectories for which *Q* is greater than some threshold, taken in this paper to be 0.1, and discard less likely trajectories for which *Q* is smaller.

Given the reference trajectory from *t*_{991} to *t*_{1000}, we generate a 1000-member ensemble by computing 999 IS candidates each with *Q* > 0.1. These 999 IS candidates and the reference trajectory form the 1000 member ensemble. One expects, of course, an infinite number of trajectories with *Q* > 0.1. Ideally, we would like to randomly sample from a subset of trajectories on the system’s attractor. In practice, this would require knowing the (singular) density of attractor states in a neighborhood about the initial point of the reference trajectory at *t*_{991}, introducing the high computational costs associated with PB. Instead, we adopt a simple scheme for picking the IS, which, while ad hoc, is sufficient to outperform the EN. Specifically, we generate a candidate trajectory by adding a perturbation to the reference trajectory at *t*_{991}, integrate it forward, and accept this candidate trajectory if and only if *Q* > 0.1; some trajectories are rejected. We have picked a perturbation scheme such that the histogram of *Q* values greater than 0.1 appears *roughly* uniform. Our rationale for picking a scheme that generates a roughly uniform distribution in *Q* (beyond 0.1) is that we have no a priori knowledge of the *Q* density. While the histogram in the top panel of Fig. 5 is not statistically uniform, the procedure generates a range of *Q* values greater than 0.1.

The perturbations that generated the first 199 IS were sampled from a uniform box centered about the reference trajectory at *t*_{991}; the next 200 IS were generated using a smaller box size, and the next 200 using an even smaller box size (see the appendix for details). The rationale behind our choice of a “box” scheme is its simplicity, both conceptually and computationally; no doubt more efficient schemes can be defined when needed, and that remains an interesting area of future research. Simple trial and error was used to tune the box sizes appropriately. We note that, on average, 44 000 integrations of the Ikeda system were required to generate the 1000-member ensemble using our particular implementation of the box scheme.
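The tiered-box sampling of initial perturbations can be sketched as follows (the box sizes and counts below are illustrative placeholders, not the tuned values, which are given in the appendix):

```python
import numpy as np

def tiered_perturbations(center, box_sizes=(1e-2, 1e-3, 1e-4),
                         counts=(200, 400, 400), seed=3):
    """Draw perturbed initial states from a sequence of shrinking uniform
    boxes centered on the reference initial state. Smaller boxes tend to
    yield candidates with larger Q, helping spread the accepted Q values."""
    rng = np.random.default_rng(seed)
    out = []
    for s, n in zip(box_sizes, counts):
        out.append(center + rng.uniform(-s, s, (n, center.size)))
    return np.vstack(out)
```

Each perturbed state would then be integrated forward over the window and kept only if its *Q* exceeds the 0.1 threshold, as described above.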

The histogram of corresponding relative likelihoods is shown in the bottom panel of Fig. 5. Notice that, after normalizing, no one member dominates the ensemble. It was noted that for all 10 000 assimilation times, the values of *Q* and the relative likelihood were strongly correlated.

In summary, we have made five key simplifications in our implementation of the IS method: (i) we have obtained a sequence of state estimates using GD; (ii) we have obtained a reference trajectory by mapping the state estimate at the start of the observation window, obtained from GD, forward to the most current observing time; (iii) we have used a window length of *N* = 9; (iv) we have constructed an ensemble of trajectories for which *Q* > 0.1; and (v) we have chosen a perturbation scheme that generates a roughly uniform distribution of *Q* values at each particular assimilation time. While our implementation could be improved by increasing *N*, might be improved by using another scheme to generate a reference trajectory, and could be made more efficient by employing a more clever perturbation scheme, the empirical results are more than sufficient to justify our choices [(i), (ii), (iii), (iv), and (v)]. A summary of the numerical details of this implementation is provided in the appendix.

#### 2) Comparisons of the IS method to EN, OB, and PB

Figure 6 illustrates the main results of this paper and suggests why the IS method outperforms the alternative methods. It depicts 1000-member ensembles generated by the IS method and the EN for four consecutive assimilation times. Each panel is constructed so that the target is at the center of the panel at the intersection of the two lines; the observation is shown as a black circle. The small black background dots depict points on the attractor. The magenta crosses depict the 1000-member equally weighted ensemble obtained using the EN. The IS ensemble is depicted by the colored dots, where the color indicates the relative likelihood of each ensemble member. Note that in each case the IS ensemble extends near the target.

Note that the IS ensemble reflects the structure of the system’s attractor and tends to include points in small neighborhoods about the target. “Misleading” observations as in the top right panel of Fig. 6 tend to shift the probability distribution on the IS ensemble away from the target, but the ensemble in total retains members near the target, and recovers quickly as shown in the subsequent (bottom left) panel of Fig. 6.

The EN ensemble is often far from the system’s attractor and does not “cover” the target nearly as well as the IS method. The entire EN ensemble is drawn away from the target in the top-right panel, and as it is sequential, the EN cannot recover in the manner that the IS ensemble does. We also note that the IS ensemble is on occasion distinctly non-Gaussian, especially in the top two panels of Fig. 6. The behaviors illustrated in Fig. 6 strongly suggest why the IS method outperforms the EN systematically.

In the bottom-right panel of Fig. 7 is a quantitative comparison of the IS versus EN method. For a particular assimilation time, a series of *ε*-balls of radius [1.0, 0.5, 0.1, 0.05, 0.025, 0.01, 0.001] have been drawn around the target. For a given method, the weights of the ensemble members falling within the *ε* ball are summed. If the IS method assigns more probability it is assigned one point and vice versa. If the methods tie, each is assigned a point. Over the 10 000 assimilation times, the percentage of wins and ties versus the log of the *ε*-ball size is plotted in the bottom-right panel of Fig. 7. Except for the largest *ε*-ball size (where they tie), the IS ensemble outperforms the EN. The log of the observational error standard deviation is indicated by the diamond on the horizontal axis. For small *ε*-ball size, the IS method nearly always wins. Given the large number of trials, these and all subsequent results are statistically significant (one can formulate significance tests based on binomial distributions to show this, but for brevity we do not provide results from such testing given the large number of trials).

As revealed in Fig. 7, the IS method clearly outperforms the PB and OB methods. The EN method either outperforms OB or nearly ties it. However, the EN method is found to be inferior to the PB method for small *ε*-ball sizes. In view of Fig. 6, this result is not too surprising: the PB ensemble is, by construction, very close to (if not on) the attractor, whereas the EN ensemble often places the majority of its states far from the attractor. Since the target is on the attractor, this suffices to make the PB ensemble superior to the EN ensemble for small *ε*-ball sizes under this measure. For larger *ε*-ball sizes, the EN clearly outperforms PB.

A simultaneous comparison of all the methods is provided in Fig. 8. For each assimilation time, if a particular method assigns more probability in a neighborhood of size *ε* about the target than any other method, it wins and is assigned one point. If there is no clear winner, all the methods are assigned one point (this intentionally gives inferior methods an advantage). The IS method is repeatedly found to be superior to all the other methods, in spite of this bias in favor of the inferior methods. Figure 8 depicts the percentage of wins and ties for each method as a function of the log of *ε*-ball size. The IS method is the clear winner, EN ranks second, PB is third, and the OB scheme is last.
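The simultaneous all-methods comparison can be sketched with a small helper (a self-contained illustration under our own naming; the paper does not publish code):

```python
import numpy as np

def all_methods_points(masses):
    """Given the epsilon-ball probability masses assigned by each method
    at one assimilation time, award one point to the unique winner, or
    one point to every method when there is no clear winner (this tie
    rule intentionally gives inferior methods an advantage)."""
    masses = np.asarray(masses, dtype=float)
    winners = masses == masses.max()
    if winners.sum() == 1:
        return winners.astype(int)
    return np.ones(len(masses), dtype=int)
```

Accumulating these points over all assimilation times, per *ε*-ball size, gives the per-method win/tie percentages shown in Fig. 8.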

The IS method significantly outperforms the EN in terms of placing more probability mass near the target over a range of definitions of “near.” The score used is not proper, but it is appropriate for our interests and avoids the complications of transforming the ensemble into a continuous probability distribution, as the application of more common proper scores would require. One can imagine a situation in which this score would be misleading: if, for example, the IS method placed only slightly greater probability mass near the target on most occasions while being wildly off target now and then. If this were the case, the claim that the IS has performed better than the EN in our experiments might be considered misleading. Scatterplots (not shown) of the ensemble mean distance to the “truth” for the IS versus the EN (over a series of verification times) show that there are no instances in which the IS is very far from the truth, whereas the EN is on occasion. The *ε*-ball score thus reflects the properties we wish to examine.

The linearity assumptions inherent to the EN method are not met in these experiments, and so it is not surprising that the EN is not the preferred method. It is possible, of course, that the performance of the EN was due to too small an ensemble size. To verify that the poor performance of the EN was not due to ensemble size, we ran an additional set of experiments for the EN using 10 000 members, and found that its performance relative to the 10 times smaller 1000-member IS ensemble did not change significantly (the win/tie results analogous to the bottom-right panel of Fig. 7 look nearly identical; not shown for brevity). This suggests that the EN results have converged and that the EN cannot be improved significantly by increasing the number of ensemble members. An identical analysis closer to the linear regime (decreasing the standard deviation of the observational noise to 0.0001) reveals that the IS method still systematically outperforms the EN, although less dramatically so. While we have intentionally focused on the nonlinear regime in this paper, it may be of interest in follow-on work to determine what degree of nonlinearity is required before one sees benefits in using the IS method, as opposed to the EN method, which is optimal in linear systems. At the request of a reviewer, we note that, averaged over all the experiments, the mean ensemble mean distance to the truth was 0.0264 for the IS method and 0.0463 for the EN. We also note that RMS statistics in visibly nonlinear distributions can be very misleading (Smith 1996; McSharry and Smith 1999).

Contrasting the results of IS and EN in these simple systems shows that both add value above pure sampling of the inverse observational noise (OB). We note that, for small *ε*-balls, PB outperforms the EN method, suggesting the importance of sampling on the attractor (see Smith 1996). Were it computationally tractable to extend PB to longer windows, it would be expected to outperform IS as well, as accountable ensembles would result (see Smith 1996 for a discussion of accountable ensembles).

## 6. Numerical results in the Lorenz 96 12-variable system

### a. Model equations

While difficulties identified in low-dimensional systems rarely vanish in high-dimensional systems, success often fails to generalize. In this section, results for a 12-variable version of the Lorenz 1996 (L96) system (Lorenz 1996) are presented. L96 is often used in assimilation studies (e.g., Hansen and Smith 2001; Bengtsson et al. 2003). The equation is

$$\frac{dx_j}{dt} = \left(x_{j+1} - x_{j-2}\right)x_{j-1} - x_j + F,$$

where *j* = 1, … , 12 and the forcing parameter *F* = 8. The system is cyclic with *x*_{j+12} = *x*_{j}. The locations of the state variables can be conceptualized as equidistant locations on a latitude circle. This version of L96 has (i) “energy” conservation, (ii) nonlinear advection and linear dissipation, (iii) sensitivity to initial conditions, and (iv) external forcing (Lorenz 2005). The energy is conserved through a compensation between the constant forcing term and the linear dissipation term. For *F* = 8, disturbances propagate from low to high index *j* (“west” to “east”; Lorenz 2005).

Following Lorenz (2005), a fourth-order Runge–Kutta scheme with time step Δ*t* = 0.025 is used. Numerical experiments yield an error doubling time of roughly 16Δ*t*. Assuming that the doubling time in the atmosphere is roughly 2 days, our value of Δ*t* can be thought of as equivalent to 3 h. The climatological standard deviation is *σ*_{climate,j} = 3.6 (the same for all *j*).
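The 12-variable L96 system and its fourth-order Runge–Kutta integration can be sketched as follows (a minimal illustration with our own names; cyclic indexing is handled with `np.roll`):

```python
import numpy as np

F = 8.0    # forcing parameter
J = 12     # number of state variables
DT = 0.025 # Runge-Kutta time step, roughly 3 "hours"

def l96_tendency(x):
    """L96 tendency dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F,
    with cyclic boundary x_{j+12} = x_j via np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=DT):
    """One fourth-order Runge-Kutta step."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * dt * k1)
    k3 = l96_tendency(x + 0.5 * dt * k2)
    k4 = l96_tendency(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

Note that the uniform state *x*_{j} = *F* is a steady state of these equations; chaotic trajectories are obtained by integrating from a small perturbation of it.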

We take Δ*t*_{assim} = 2Δ*t* or 6 h, similar to current-day operational systems (Bishop et al. 2003). We have set *σ*_{obs} = 1.0. The results below confirm that we are evaluating the various methods (IS, EN, OB, and PB) in a nonlinear regime.

### b. Results

#### 1) Ensemble formation using the IS method

Using a procedure similar to that used with the Ikeda system, reference trajectories were obtained from GD state estimates with *N* = 19. This means that the separation in time between the initial and final points of the reference trajectory is 19Δ*t*_{assim} (nearly 5 “days”). Averaged over all the assimilation times, the mean observation distance to the truth was found to be 3.374, whereas the mean distance of the GD solution was 0.798, indicating that the GD method with *N* = 19 achieves significant noise reduction. Further numerical details regarding our computations with GD are found in the appendix.

Again employing a scheme similar to that used with the Ikeda system, we generated a set of 999 IS, each with *Q* > 0.1 and with a roughly uniform *Q* distribution (again, this is typical of all the assimilation times). Details of the numerical perturbation scheme are summarized in the appendix; on average, 14 000 integrations were required to obtain the 1000-member ensemble. At each observation time, a 1000-member ensemble was formed using the 999 IS and the reference trajectory. The top-left panel of Fig. 9 plots state variables *x*_{1} and *x*_{2} at *t*_{1001}. The target is depicted by the intersection of the two lines at the center of each panel, the observation by the black circle. The IS ensemble members, weighted by their relative likelihood, are depicted by the colored dots, whereas the magenta crosses indicate the EN ensemble members, which are each equally likely. The top panel of Fig. 10 shows a histogram of *Q* values corresponding to *t*_{1001}; the bottom panel shows the histogram of relative likelihood values. As in the Ikeda system, the majority of the relative likelihood is not assigned to only one member, and again the *Q* values and relative likelihood values are strongly correlated. We note that we are sampling from a 240-dimensional likelihood function, where the dimension is the number of observing times (i.e., 20) multiplied by the number of observations at each time (i.e., 12).
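The relative-likelihood weighting of candidate trajectories over the observation window can be sketched under the assumption of independent Gaussian observational noise; this is our own simplified illustration, not the paper's exact implementation:

```python
import numpy as np

def relative_likelihood_weights(trajectories, observations, sigma_obs=1.0):
    """Weight candidate model trajectories by their relative likelihood
    given the observations over the window, assuming independent
    Gaussian observational noise with standard deviation sigma_obs.

    trajectories : (K, T, d) array of candidate trajectories
    observations : (T, d) array of observations over the window
    Returns a (K,) array of weights summing to one.
    """
    # Sum of squared observation-trajectory misfits for each member
    sq = ((trajectories - observations) ** 2).sum(axis=(1, 2))
    # Log-likelihoods; subtract the max before exponentiating to avoid
    # underflow when the window is long (high-dimensional likelihood)
    loglik = -0.5 * sq / sigma_obs**2
    w = np.exp(loglik - loglik.max())
    return w / w.sum()
```

With 20 observing times and 12 observed variables per time, each member's likelihood is evaluated over the 240-dimensional observation window, which is why the max-subtraction safeguard matters numerically.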

#### 2) Comparisons of the IS method to other schemes

Figure 9 depicts two-dimensional projections of the posterior ensembles at *t*_{1001}, *t*_{1011}, *t*_{1021}, and *t*_{1031}. Unlike Fig. 6 (analogous plot for Ikeda), background dots corresponding to the attractor are not shown. We note that the two-dimensional projection of the L96 attractor appears to be a circular ball (with standard deviation 3.6 and mean 2.3 in each direction). The results in Fig. 9 suggest that the two-dimensional projections of the IS and EN ensembles are close to (if not on) the two-dimensional projection of the system’s attractor. It also appears that the IS ensemble assigns more probability near the target than the EN ensemble.

The win/tie comparisons for the L96 results are depicted in Figs. 11 and 12. For the L96 results, the *ε*-ball sizes were set to [4.0, 2.0, 1.0, 0.5, 0.25, 0.1, 0.05]. The bottom-right panel of Fig. 11 plots the percentage of wins and ties for the IS versus the EN. The log of the observational standard deviation is indicated by the diamond at zero on the horizontal axis. The IS method again systematically assigns more probability in a small neighborhood about the target than the EN, except for small *ε*-balls, *ε* < 0.1, within which neither method tends to assign nonzero probability (the volume of a 12-dimensional sphere decreases quickly with *ε*). The IS method is shown to outperform PB and OB in Fig. 11.

To test whether the deficiency of the EN method is due to using too small an ensemble, we ran the EN for *K* = 10 000. The same win/tie comparisons were made against the 10 times smaller *K* = 1000 IS ensemble. As with the Ikeda system, the EN method appears to have converged, in the sense that the relative comparison of the IS versus the EN (bottom-right panel of Fig. 11) is essentially unchanged. An additional set of results was obtained closer to the linear regime, with the observational standard deviation set to 0.001. As for Ikeda, the results suggest that as the noise level tends to zero, linearized dynamics provides an excellent description near the target, and the EN and IS results become comparable.

The all-methods-at-once comparison in Fig. 12 again shows that the IS systematically assigns more probability to the target than all the other methods for a range of *ε*-balls. At the request of a reviewer, we note that the mean ensemble mean distance to the truth was 0.796 for the IS method, whereas it was 0.900 for the EN. We again note that the RMS distance between the ensemble mean and truth in nonlinear systems is at best a blunt measure of ensemble quality and can easily mislead (see McSharry and Smith 1999). Finally we note, as for the Ikeda system, that the degenerate case in which the IS method wins by a small margin most of the time but loses drastically on occasion does not occur, as there are no instances in which the ensemble mean distance to the truth is significantly larger for the IS than for the EN method. The IS ensembles robustly outperform the other methods considered.

## 7. Discussion and conclusions

Arguably, the aim of data assimilation in the nonlinear context is a set of states at a given time representative of the system’s conditional probability distribution given observations up to and including that time. In this paper, a novel approach to nonlinear ensemble data assimilation based upon indistinguishable states (IS) has been shown to be superior to an alternative approach based upon a state-of-the-art ensemble Kalman filter; this result appears to be robust in the limit of large computational resources. The IS method makes use of a nonlinear noise reduction technique to obtain a reference trajectory, and then forms an ensemble by weighting the reference trajectory and its indistinguishable states by their relative likelihood. The fact that ad hoc elements remain in the IS approach as presented, which no doubt could be improved upon, should not distract us from the fact that the IS approach as presented outperforms the EN systematically.

The IS methodology has a number of desirable properties. Unlike other nonlinear ensemble formation techniques described in the state estimation literature (Anderson and Anderson 1999; Pham 2001; Kim et al. 2003; Bengtsson et al. 2003; van Leeuwen 2001; Doucet et al. 2001; Evensen 1994), the IS methodology requires neither explicit assumptions or parameterizations of prior distributions of states nor the implementation of complicated resampling techniques. The IS method constructs ensembles from a set of trajectories consistent with the model’s nonlinear dynamics over the observation window. The IS method can be applied to nonlinear forward observation operators and non-Gaussian distributions.

In this paper, we have used the gradient descent method (e.g., Judd and Smith 2001) to obtain reference trajectories. Alternative methods are clearly available; ultimately, the choice is a pragmatic one. We stress that the IS method provides a methodology for ensemble formation irrespective of the scheme for finding reference trajectories. The exploration of alternative methods for obtaining reference trajectories, and indeed of sampling over reference trajectories, remains an interesting area for further research.

Success in any number of specific systems does not, of course, establish that the IS method provides superior state estimation in high-dimensional dynamical systems in general, or outside PMS (see, however, Judd et al. 2008). The results presented above illustrate the performance of the IS method compared to a state-of-the-art ensemble Kalman filter scheme (Anderson 2001, 2003), which has been applied successfully to high-dimensional atmospheric prediction problems (Anderson et al. 2005). The manner in which the IS approach outperforms the EN will generalize inasmuch as it respects the nonlinear dynamics of the system. Perfect-model comparisons in a two-dimensional Ikeda system and a 12-variable L96 system have been made in a nonlinear regime of the model/data assimilation system. We find that for both systems, the nonlinear IS method systematically assigns more probability mass near the target (which is the “truth” in PMS) for a range of small neighborhoods than does the EN. The IS method also drastically outperforms a scheme that samples the inverse noise model about the value of an observation (OB) and a method that forms an ensemble by weighting (using relative likelihood) a random sample of points near the model manifold within a neighborhood of the observation value (PB). The IS, EN, PB, and OB methods have been evaluated for an ensemble size *K* = 1000. The superiority of the IS approach is clear inasmuch as employing values of *K* an order of magnitude larger for the EN does not appear to improve its performance: the EN with a *K* = 10 000 member ensemble was outperformed by the 10 times smaller *K* = 1000 IS ensemble in much the same manner as in the comparison of equal-sized *K* = 1000 ensembles. In this sense the EN appears to have converged, and its performance will not be improved by massive increases in computational resources. On the other hand, we expect the performance of the IS method to improve with longer observational windows and larger ensemble sizes. Within PMS, the IS method can exploit further increases in computational resources, which the EN will not.

Ensemble members generated by the IS method have been weighted by their relative likelihood given the observations. By considering trajectories with nontrivial *Q* density, the IS approach avoids the problem of nearly all the (relative) likelihood being assigned to one ensemble member, a phenomenon that has been observed and studied analytically in the context of random sampling (Bengtsson et al. 2008). The set of examples introduced here can be used in future studies of the relative likelihood sampling issue.

This paper lays a foundation for understanding how the IS method performs in more complicated assimilation problems. Applications of the IS approach in higher-dimensional settings, in studies with explicit restrictions on computational resources and ensemble sizes among the competing methods, and with incomplete observations or nonlinear observation operators remain to be explored. The IS framework can be extended in a systematic fashion to the imperfect-model scenario, as shown in Judd and Smith (2004). Given the results above and in Judd and Smith (2004), a similar study in the imperfect-model scenario is a natural extension of this work (see Du 2009). The IS methodology provides an exciting new framework for tackling some of the most interesting prediction problems in forecasting and data assimilation.

## Acknowledgments

The authors would like to acknowledge the support of the Statistical and Applied Mathematical Sciences Institute located in Research Triangle Park, North Carolina. Shree Khare would also like to acknowledge the support of the Institute for Mathematics Applied to Geosciences division at the National Center for Atmospheric Research in Boulder, Colorado. Leonard Smith acknowledges the support of the EU ENSEMBLES program. Both authors are happy to acknowledge comments by H. Du, K. Judd, and by the anonymous reviewers whose critiques improved the quality of the manuscript.