## Abstract

Particle filtering methods for data assimilation may suffer from the “curse of dimensionality,” where the required ensemble size grows rapidly as the dimension increases. It would, therefore, be useful to know a priori whether a particle filter is feasible to implement in a given system. Previous work provides an asymptotic relation between the necessary ensemble size and an exponential function of $\tau^2$, a statistic that depends on observation-space quantities and that is related to the system dimension when the number of observations is large; for linear, Gaussian systems, the statistic can be computed from eigenvalues of an appropriately normalized covariance matrix. Tests with a low-dimensional system show that these asymptotic results remain useful when the system is nonlinear, with either the standard or optimal proposal implementation of the particle filter. This study explores approximations to the covariance matrices that facilitate computation in high-dimensional systems, as well as different methods to estimate the accumulated system noise covariance for the optimal proposal. Since $\tau^2$ may be approximated using an ensemble from a simpler data assimilation scheme, such as the ensemble Kalman filter, the asymptotic relations thus allow an estimate of the ensemble size required for a particle filter before its implementation. Finally, the improved performance of particle filters with the optimal proposal, relative to those using the standard proposal, in the same low-dimensional system is demonstrated.

## 1. Introduction

Ensemble methods have been used in a variety of geophysical estimation problems, including atmospheric applications, oceanography, and land surface systems. Recently there has been growing interest in particle filtering methods in particular, as these methods are better able to capture the nonlinearity inherent in many geophysical systems [e.g., the merging particle filter of Nakano et al. (2007), the equivalent-weights filter of Ades and van Leeuwen (2013), and the implicit particle filter (Morzfeld et al. 2012)]. At the same time, particle filters also tend to suffer from the “curse of dimensionality” where the required ensemble size grows very rapidly as the dimension increases. Thus, it would be useful to know a priori whether a particle filter is feasible to implement in a given system.

The curse of dimensionality is a well-known problem in density estimation, as Monte Carlo estimation of high-dimensional probability densities demands notoriously large sample sizes (Silverman 1986). In a series of related papers, Bengtsson et al. (2008), Bickel et al. (2008), and Snyder et al. (2008) show that the curse of dimensionality is also manifest in the simplest particle filter. They demonstrate that the required ensemble size scales exponentially with a statistic related, in part, to the system dimension and that may be considered as an effective dimension. More general particle filters employ sequential importance sampling and allow a choice of proposal distribution from which particles are drawn. Snyder et al. (2015) [see also Snyder (2012)] showed that the exponential increase of the ensemble size with effective dimension also holds for particle filters using the optimal proposal distribution (Doucet et al. 2001), which we will introduce in more detail in section 5.

We will consider particle filters based on both of the proposals above. In the case examined by Bengtsson et al. (2008), Bickel et al. (2008), and Snyder et al. (2008), the proposal is the transition distribution for the system dynamics, where new particles are generated by evolving particles from the previous time under the system dynamics. It yields the bootstrap filter of Gordon et al. (1993) and was termed the “standard” proposal by Snyder (2012) and Snyder et al. (2015). The optimal proposal is of interest both because of its relation to the implicit and equivalent-weights particle filters and because it minimizes the degeneracy of weights, as shown in Snyder et al. (2015), and thereby provides a bound on the performance of other particle filters that use sequential importance sampling.

Our ultimate goal is to be able to determine whether a particle filter would be feasible to implement, given that we have statistics from, say, a working ensemble Kalman filter (EnKF). For the standard proposal, this is straightforward: the ensemble forecast step provides a draw from the proposal, and we simply need to compute weights based on the observation likelihood for each member. However, it is harder to use an existing ensemble to assess the feasibility of the particle filter based on the optimal proposal, since it is nontrivial to develop an algorithm to sample from this proposal (cf. Morzfeld et al. 2012). An alternative is to utilize the results of Bengtsson et al. (2008), Bickel et al. (2008), and Snyder et al. (2008, 2015), which relate the behavior of the weights in the linear, Gaussian case to eigenvalues of certain covariance matrices that may be estimated from an existing ensemble. We aim to evaluate the use of these results in a more general nonlinear, non-Gaussian setting, by using ensembles from a working sequential EnKF to calculate the relevant statistics (without implementing the particle filter directly). As a specific test case, we employ the Lorenz-96 system and demonstrate the nonlinearity present in this example. Note that sampling error also presents an issue in applying these results; we investigate these effects and possible methods for overcoming them in this paper as well.

In addition, it is nontrivial to implement the truly optimal particle filter in nonlinear settings when the observations are not available every model time step. We investigate several approximations to the implementation of the “optimal” particle filter and utilize these approximations in sections 5 and 6. However, we emphasize that these approximations are no longer guaranteed to be optimal.

We note here that Chorin and Morzfeld (2013) have investigated a different, but related, effective dimension of a Gaussian data assimilation problem. In particular, they define a “feasibility criterion” to be the Frobenius norm of the steady-state posterior covariance matrix (which can be calculated exactly in the linear, Gaussian regime). While both studies explore potential limitations of particle filtering in high-dimensional systems, their criterion is based on bounding the total posterior error variance as a function of an effective dimension, whereas the studies of Snyder et al. (2008) and Snyder et al. (2015) quantify the relation between degeneracy of the particle-filter weights and an effective dimension.

The remainder of this paper is organized as follows. In section 2, we review the ensemble Kalman filter and the particle filter and their respective implementations. Section 3 reviews the previous results of Snyder et al. (2008) regarding the limits of particle filters in high-dimensional linear systems. Section 4 verifies the applicability of the results for linear, Gaussian systems and the standard proposal to nonlinear systems by testing the results on the Lorenz-96 system; this is specifically useful for understanding the similar extension needed for the optimal proposal. In section 5, we consider approximations of the optimal proposal in a nonlinear system and discuss some of the difficulties that arise, in particular regarding additive model noise in nonlinear systems. Section 6 includes comparisons of performance of the standard proposal and approximation to the optimal proposal in a nonlinear system. Finally, section 7 summarizes the results and draws conclusions.

## 2. Review of ensemble methods and previous results

Ensemble data assimilation methods approximate probability distributions using an ensemble of states, either weighted or unweighted. Two common ensemble methods are the EnKF and the particle filter. Generally, the traditional EnKF algorithm uses unweighted ensemble members, which are themselves updated when an observation becomes available. On the other hand, the particle filter uses a weighted ensemble. When an observation is available, the particles are drawn from a proposal distribution and then reweighted according to the observation.

In this section, we will first describe the setup and some notation, and then briefly review the standard and optimal proposal implementations of the particle filter as well as the ensemble Kalman filter.

### a. Setup and notation

Assume that our state of interest is given by $\mathbf{x} \in \mathbb{R}^{N_x}$, where *k* indexes time and $N_x$ is the dimension of the state. We will additionally assume that model noise is added at integration time steps (of size $\delta$) indexed by *l* below, while observations are available every $T/\delta$ integration steps, indexed by *k*. Let the time between observations be denoted by $T$. The model evolution can be described as

$$\mathbf{x}_{l+1} = m(\mathbf{x}_l) + \boldsymbol{\eta}_l, \tag{1}$$

where $t_l = l\delta$ and observations are available for $t_k = kT$, $k = 1, 2, \ldots$. The $\boldsymbol{\eta}_l$, whose dependence on *k* is suppressed for notational convenience, are i.i.d. random variables that represent the system noise and have a distribution to be specified later. Further define $\mathcal{M}$ to be the operator that takes $\mathbf{x}_{k-1}$ to $\mathbf{x}_k$. That is, $\mathcal{M}$ includes the nonlinear convolution of the noise between observation times:

$$\mathbf{x}_k = \mathcal{M}(\mathbf{x}_{k-1}, \boldsymbol{\eta}) = m\bigl(\cdots m\bigl(m(\mathbf{x}_{k-1}) + \boldsymbol{\eta}_1\bigr) + \boldsymbol{\eta}_2 \cdots\bigr) + \boldsymbol{\eta}_{T/\delta}. \tag{2}$$

In particular, the effect of the model noise over the time window between observations is not simply additive when *m* is nonlinear, since the noise is convolved into the state at the intermediate integration times *l*. Next assume that we have linear, noisy observations of the state given by

$$\mathbf{y}_k = \mathbf{H}\mathbf{x}_k + \boldsymbol{\epsilon}_k, \tag{3}$$

where $\mathbf{y}_k$ is the observation of dimension $N_y$ and $\boldsymbol{\epsilon}_k \sim N(\mathbf{0}, \mathbf{R})$ is the observation error. For the following methods, we denote unweighted ensembles (where curly brackets are used to indicate ensembles) of size $N_e$ as $\{\mathbf{x}^i\}_{i=1}^{N_e}$ and weighted ensembles as $\{\mathbf{x}^i, w^i\}_{i=1}^{N_e}$. Finally, $\mathbf{y}_{1:k}$ will denote the concatenation of the observations from time $t_1$ to time $t_k$.

### b. Particle filter

The particle filter estimates the true Bayesian probability distribution using a weighted ensemble of states. When an observation is available at time $t_k$, we are interested in $p(\mathbf{x}_k \mid \mathbf{y}_{1:k})$. We assume that we are unable to sample from the distribution of interest directly; instead, we will sample from a “proposal” distribution $\pi$, which we can choose, and then assign appropriate weights to each member of the sample. We will briefly review the derivation of the weight update based on sequential importance sampling (SIS; Doucet et al. 2000; Snyder 2012; Snyder et al. 2015). To this end, suppose we wish to sample from $p(\mathbf{x}_{0:k} \mid \mathbf{y}_{1:k})$ and we are given a sample $\{\mathbf{x}_{0:k-1}^i\}_{i=1}^{N_e}$ from a proposal distribution with weights $\{w_{k-1}^i\}$. Assume that the proposal distribution factors as

$$\pi(\mathbf{x}_{0:k} \mid \mathbf{y}_{1:k}) = \pi(\mathbf{x}_k \mid \mathbf{x}_{0:k-1}, \mathbf{y}_{1:k})\,\pi(\mathbf{x}_{0:k-1} \mid \mathbf{y}_{1:k-1}). \tag{4}$$

If a sample $\{\mathbf{x}_k^i\}$ is then drawn from $\pi(\mathbf{x}_k \mid \mathbf{x}_{0:k-1}^i, \mathbf{y}_{1:k})$, the appropriate importance weights are

$$w_k^i \propto w_{k-1}^i\,\omega_k^i, \tag{5}$$

where

$$\omega_k^i = \frac{p(\mathbf{y}_k \mid \mathbf{x}_k^i)\,p(\mathbf{x}_k^i \mid \mathbf{x}_{k-1}^i)}{\pi(\mathbf{x}_k^i \mid \mathbf{x}_{0:k-1}^i, \mathbf{y}_{1:k})}, \tag{6}$$

and the $w_k^i$ are normalized to sum to 1. A draw from the filtering distribution $p(\mathbf{x}_k \mid \mathbf{y}_{1:k})$ is then obtained by retaining $\mathbf{x}_k^i$ and ignoring $\mathbf{x}_{0:k-1}^i$.

The simplest choice for a proposal density is the standard proposal, in which $\pi(\mathbf{x}_k \mid \mathbf{x}_{0:k-1}^i, \mathbf{y}_{1:k})$ is chosen to be $p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^i)$. Then the weight update is given by

$$w_k^i \propto w_{k-1}^i\,p(\mathbf{y}_k \mid \mathbf{x}_k^i). \tag{7}$$

Doucet et al. (2000) discuss the so-called optimal proposal, which includes information about the previous state as well as the current observation: $\pi(\mathbf{x}_k \mid \mathbf{x}_{0:k-1}^i, \mathbf{y}_{1:k}) = p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^i, \mathbf{y}_k)$. In this case, the weights are updated according to

$$w_k^i \propto w_{k-1}^i\,p(\mathbf{y}_k \mid \mathbf{x}_{k-1}^i). \tag{8}$$
Sampling from this proposal is discussed in more detail in the appendix; note that both drawing particles and updating the weights are more complicated for the optimal proposal than for the standard proposal. Despite this added computational effort, there are cases in which the optimal proposal yields a significant performance gain over the standard proposal with the same number of particles (see section 6 below). Thus, the optimal proposal may be more computationally tractable than the standard proposal in terms of the number of particles needed for an acceptable error level.
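The standard-proposal weight update, in which each forecast particle is simply reweighted by its observation likelihood, can be sketched in a few lines. This is a minimal sketch assuming Gaussian observation errors; the function name and array conventions are ours:

```python
import numpy as np

def standard_proposal_update(prev_logw, particles, y, H, R_inv):
    """Log-domain weight update for the standard proposal: each forecast
    particle (a row of `particles`, already drawn from p(x_k | x_{k-1}))
    is reweighted by its Gaussian observation likelihood p(y_k | x_k^i),
    and the weights are then normalized to sum to 1."""
    innov = y - particles @ H.T                     # innovations, shape (Ne, Ny)
    logw = prev_logw - 0.5 * np.einsum("ij,jk,ik->i", innov, R_inv, innov)
    logw -= logw.max()                              # avoid underflow in exp
    w = np.exp(logw)
    return w / w.sum()
```

Working in the log domain avoids underflow when the weights are nearly degenerate, which is exactly the regime where collapse is a concern.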

### c. Ensemble Kalman filter

Evensen (1994) introduced the ensemble Kalman filter as an approximation of the Kalman filter that, like the particle filter, uses an ensemble of realizations of the system state to represent probability distributions. Unlike the particle filter, the ensemble Kalman filter uses an unweighted (or equally weighted) ensemble of states. Suppose the ensemble at time $t_k$ is given by $\{\mathbf{x}_i^f\}_{i=1}^{N_e}$, where *f* stands for “forecast,” and *a* will represent “analysis.” If an observation is also available at time $t_k$, each ensemble member is updated according to

$$\mathbf{x}_i^a = \mathbf{x}_i^f + \mathbf{K}\left(\mathbf{y} + \boldsymbol{\epsilon}_i - \mathbf{H}\mathbf{x}_i^f\right),$$

where $\mathbf{K} = \mathbf{P}^f\mathbf{H}^T(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R})^{-1}$ and $\mathbf{P}^f$ is the ensemble covariance of the forecast:

$$\mathbf{P}^f = \frac{1}{N_e - 1}\sum_{i=1}^{N_e}\left(\mathbf{x}_i^f - \overline{\mathbf{x}^f}\right)\left(\mathbf{x}_i^f - \overline{\mathbf{x}^f}\right)^T.$$

This is the so-called perturbed-observation formulation of the EnKF (Evensen 2003), in which each observation is viewed as a random variable. In the update step, we replace $\mathbf{y}$ with $\mathbf{y} + \boldsymbol{\epsilon}_i$, where $\boldsymbol{\epsilon}_i$ has the same statistics as the observation error noise. This formulation has been shown to give the correct posterior covariance; otherwise, the covariance is overly tightened (see Burgers et al. 1998; Houtekamer and Mitchell 1998).
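The perturbed-observation analysis step can be sketched as follows. This is a minimal, unlocalized and uninflated sketch; all names are ours:

```python
import numpy as np

def enkf_perturbed_obs(Xf, y, H, R, rng):
    """Perturbed-observation EnKF analysis step: each member of the forecast
    ensemble Xf (shape (Ne, Nx)) is updated with its own perturbed
    observation y + eps_i, where eps_i ~ N(0, R)."""
    Ne = Xf.shape[0]
    Pf = np.atleast_2d(np.cov(Xf.T, ddof=1))          # sample forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)    # Kalman gain
    eps = rng.multivariate_normal(np.zeros(len(R)), R, size=Ne)
    return Xf + (y + eps - Xf @ H.T) @ K.T            # analysis ensemble
```

In our experiments the sample covariance would additionally be localized (Gaspari and Cohn 1999) and inflated before the gain is computed.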

The EnKF is a linear method, and thus will be suboptimal for problems that are significantly non-Gaussian even if $N_e$ is large. However, for distributions that are close to Gaussian, the EnKF works well with relatively few ensemble members, though often it requires localization and inflation (see Houtekamer and Mitchell 1998, 2001; Anderson and Anderson 1999; Hamill et al. 2001). In the experiments in this paper, we use the perturbed-observation formulation of the EnKF with covariance localization using the compactly supported correlation function of Gaspari and Cohn (1999) and a small but fixed inflation.

### d. Review of previous asymptotic results

Snyder et al. (2008) prove, in certain regimes, an exponential relationship between the variance of the observation log-likelihood and the inverse of the maximum weight. In the linear Gaussian case, this variance can be calculated as a sum of eigenvalues of an explicit function of covariances. First, we give some definitions.

Define the weight update factor $\omega^i$ as in (6). Next define $\tau^2$ to be the variance of the log of these factors conditioned on the observations:

$$\tau^2 = \operatorname{var}\left(\log\omega^i \mid \mathbf{y}_{1:k}\right).$$

Let $w_{\max} = \max_i w^i$ denote the maximum weight over the ensemble. Snyder et al. (2008) show that

$$E\left(\frac{1}{w_{\max}}\right) \simeq 1 + \frac{(2\ln N_e)^{1/2}}{\tau} \tag{14}$$

under the following assumptions: first, that the observation errors are spatially and temporally independent; second, that $N_y$ and $N_e$ are large; third, that $\tau$ and $N_y$ are large, in the sense that $(\ln N_e)^{1/2}/\tau \to 0$ as $N_y \to \infty$; and finally, that the distribution of $\log\omega$ over draws of $\mathbf{x}$ from the proposal is sufficiently close to Gaussian. The first three assumptions are easily verified and generally hold in the systems of interest to this work. The final assumption is less obvious, and below we investigate situations in which this assumption may not hold true.

Snyder et al. (2008) apply these asymptotic results to the particle filter with the standard proposal, where $\omega^i = p(\mathbf{y}_k \mid \mathbf{x}_k^i)$. Snyder (2012) and Snyder et al. (2015) note that similar arguments apply to the optimal proposal, where $\omega^i = p(\mathbf{y}_k \mid \mathbf{x}_{k-1}^i)$. Snyder et al. (2015) also show that the optimal proposal minimizes the degeneracy of the weights over draws of both $\mathbf{x}_{k-1}$ and $\mathbf{x}_k$, and thus provides a bound on the performance of other particle filters that use sequential importance sampling, including the implicit particle filter (Morzfeld et al. 2012) and the equivalent-weights particle filter (Ades and van Leeuwen 2013). Although we will consider nonlinear, non-Gaussian systems, the asymptotics developed in the linear, Gaussian case are of interest here because they provide an explicit expression for $\tau^2$. Additionally, in the linear, Gaussian case, the conditions for $\log\omega$ to be Gaussian (and thus for validity of the asymptotic theory) are straightforward.

We take the linear, Gaussian system to be

$$\mathbf{x}_k = \mathbf{M}\mathbf{x}_{k-1} + \boldsymbol{\eta}_k, \qquad \boldsymbol{\eta}_k \sim N(\mathbf{0}, \mathbf{Q}), \tag{15}$$

where $\mathbf{M}$ is a linear operator and the observations are as defined in (3). Note that, for the linear, Gaussian case, the dynamics are written only for observation times $t_k$, in contrast to (1).

With the standard proposal, we first need to calculate the eigenvalues $\lambda_j$ of the matrix

$$\mathbf{R}^{-1/2}\mathbf{H}\operatorname{cov}(\mathbf{x}_k)\mathbf{H}^T\mathbf{R}^{-T/2}. \tag{16}$$

Here we have omitted the notation for conditioning on $\mathbf{y}_{1:k-1}$; thus, $\operatorname{cov}(\mathbf{x}_k)$ denotes the forecast covariance $\operatorname{cov}(\mathbf{x}_k \mid \mathbf{y}_{1:k-1})$. Snyder et al. (2008) and references therein derive the following relation:

$$E(\tau^2) = \sum_j\left(\lambda_j + \frac{3}{2}\lambda_j^2\right), \tag{17}$$

where the expectation is taken over $\mathbf{y}_{1:k}$. Moreover, Bickel et al. (2008) show that $\log\omega$ is asymptotically Gaussian (over draws from the proposal), and the relation in (14) is valid, as long as no eigenvalue(s) dominate the sum of squares above. In the case of the optimal proposal, the same expression in (17) and the same conditions for validity hold, but using the eigenvalues of

$$\left(\mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R}\right)^{-1/2}\mathbf{H}\mathbf{M}\operatorname{cov}(\mathbf{x}_{k-1})\mathbf{M}^T\mathbf{H}^T\left(\mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R}\right)^{-T/2}. \tag{18}$$

In a system where each degree of freedom is independent and independently observed, these expressions simplify and show that $\tau^2$ will be proportional to the number of observations. A similar but more informal derivation of this result also appears in Ades and van Leeuwen (2013).
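In the linear, Gaussian setting, the chain from covariance eigenvalues to a predicted maximum weight can be sketched as below. This follows our reading of the expressions in Snyder et al. (2008); the exact constants should be checked against that reference, and all names are ours:

```python
import numpy as np

def tau2_standard(HPHt, R):
    """Effective-dimension statistic tau^2 for the standard proposal, from the
    eigenvalues of the R-normalized observation-space forecast covariance
    (linear, Gaussian theory; constants per our reading of Snyder et al. 2008)."""
    L_inv = np.linalg.inv(np.linalg.cholesky(R))     # R^{-1/2} (Cholesky factor)
    lam = np.linalg.eigvalsh(L_inv @ HPHt @ L_inv.T)
    return np.sum(lam + 1.5 * lam**2)

def predicted_inv_maxweight(tau2, Ne):
    """Asymptotic prediction E(1/w_max) ~ 1 + sqrt(2 ln Ne)/tau: values near 1
    indicate an ensemble close to weight collapse."""
    return 1.0 + np.sqrt(2.0 * np.log(Ne)) / np.sqrt(tau2)
```

The same functions apply to the optimal proposal if the normalized covariance for that case is supplied instead.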

## 3. Model and experimental setup

In all experiments in this paper, we will restrict our attention to the nonlinear dynamical system of Lorenz (1996). The deterministic form of these equations is given by

$$\frac{dX_j}{dt} = \left(X_{j+1} - X_{j-2}\right)X_{j-1} - X_j + F, \tag{19}$$

for $j = 1, \ldots, N_x$ and $F = 8$ here. The subscripts indicate the spatial location in a one-dimensional, periodic domain and should be understood modulo $N_x$.

We solve a discrete-time, stochastic version of this equation, cast in the form (1). Fixing an integration time step $\delta$ and an observation time step $T$, we compute *m* by integrating (19) over a single time step using a fourth-order Runge–Kutta scheme and draw $\boldsymbol{\eta}_l$ from $N(\mathbf{0}, \sigma_m^2\mathbf{I})$. Except where noted below, all results employ a fixed value of $\delta$. Alternatively, we could have started from a continuous-time stochastic differential equation by including noise directly in (19); this distinction is not crucial to any of the results we present. The observing network in the experiments will consist of full observations, so that $\mathbf{H} = \mathbf{I}$, and the observation error covariance is $\mathbf{R} = \sigma_o^2\mathbf{I}$.
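The discrete-time stochastic model step just described can be sketched as follows (function names, and the $\sigma_m$ notation for the per-step noise standard deviation, are ours):

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz-96 tendency dX_j/dt = (X_{j+1} - X_{j-2}) X_{j-1} - X_j + F,
    with periodic indexing handled by np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One deterministic fourth-order Runge-Kutta step: this is the map m."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def stochastic_step(x, dt, sigma_m, rng, F=8.0):
    """One step of the stochastic model: RK4 step plus additive Gaussian
    system noise with covariance sigma_m^2 I."""
    return rk4_step(x, dt, F) + sigma_m * rng.standard_normal(x.shape)
```

Composing `stochastic_step` over $T/\delta$ steps gives one realization of the operator $\mathcal{M}$, with the noise convolved into the state at each intermediate step.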

We will need an example ensemble data assimilation (DA) scheme to calculate the statistics necessary to test the asymptotic theory. Since our goal is to demonstrate that these statistics may be used in practice to determine the applicability of the particle filter, we will use the EnKF, a common method for high-dimensional problems, to calculate the statistics. While the EnKF is suboptimal in the nonlinear case, we wish to show that the method is reasonably effective across a wide range of parameters for this system. The spread–skill relation (Table 1) indicates that this is true. For this experiment, we fix $N_x$ and $T$, let the system noise variance $\sigma_m^2$ be fixed, and vary the observation error variance $\sigma_o^2$. For each value of observation error, we run the EnKF for 200 sequential observations. Table 1 shows the forecast mean squared error and forecast variance over the last 190 observations, for each value of observation error. As the results show, the EnKF is working well with the chosen values of inflation and localization (1.05 and 5, respectively), since the forecast mean squared error (MSE) and variance are comparable and neither blows up.

In section 6, we will run a sequential particle filter for many observations to compare the overall performance of different proposal algorithms. In this case, we will need to resample in order to prevent weight collapse; here, we test two different resampling thresholds. The first is the resampling threshold defined in Kong et al. (1994), in which the filter resamples when the effective sample size falls below a fixed threshold. This is the threshold suggested by Arulampalam et al. (2002). The second threshold is based on the maximum weight, in which the filter resamples when $w_{\max}$ exceeds a certain value (here we use 0.5). We then use a Monte Carlo Metropolis–Hastings resampling technique [see Hastings (1970) and Robert and Casella (2004) for an introduction and van Leeuwen (2009) for a description applied to particle filters], followed by resetting the weights to be equal.
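Both resampling triggers are cheap to evaluate from the normalized weights; a minimal sketch (function names and keyword arguments are ours):

```python
import numpy as np

def effective_sample_size(w):
    """Effective sample size of normalized weights (Kong et al. 1994):
    equals Ne for uniform weights and approaches 1 near collapse."""
    return 1.0 / np.sum(w**2)

def needs_resampling(w, Ne_threshold=None, max_weight_threshold=None):
    """Trigger resampling when the effective sample size falls below a fixed
    threshold, or when the maximum weight exceeds a threshold (0.5 in our
    experiments)."""
    if Ne_threshold is not None and effective_sample_size(w) < Ne_threshold:
        return True
    if max_weight_threshold is not None and w.max() > max_weight_threshold:
        return True
    return False
```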

## 4. Extension to nonlinear case: Standard proposal

Our goal is to show how to use an existing DA ensemble to determine whether it would be feasible to use a particle filter for a given nonlinear system, and if so, how many particles would be necessary to avoid filter collapse. In the case of the standard proposal, it is straightforward to directly calculate the weights without implementing the particle filter and to quantify the statistics of the maximum weight directly. Alternatively, if we know the covariance matrices in (16), we could use (17) to estimate $\tau^2$ and then predict $w_{\max}$ from (14). This alternative approach to predicting the behavior of $w_{\max}$ is especially useful in the case of the optimal proposal, where computing the weights directly requires sampling from the optimal proposal, which can be difficult. Thus, we first numerically demonstrate the theory in the simpler case with the standard proposal, but with a nonlinear model, before moving on to considering the optimal proposal. Note that the asymptotics have been verified numerically for linear, Gaussian systems in Snyder et al. (2008).

We consider the Lorenz (1996) equations with $N_x = 100$, fix the system noise variance $\sigma_m^2$, and vary the observation error variance $\sigma_o^2$. The existing DA scheme we use is the EnKF as described in section 3.

First, to demonstrate the degree of nonlinearity in this system of equations, we study the difference in perturbations after evolving two initial points forward under the fully nonlinear Lorenz equations as well as a linearized system. Specifically, we choose a random observation time in the EnKF experiment, choose two random ensemble members as our initial perturbation, and linearize the system about one of them. We evolve each ensemble member under the linearized dynamics to get $\mathbf{x}_1^{\text{lin}}$ and $\mathbf{x}_2^{\text{lin}}$ and under the full dynamics to get $\mathbf{x}_1^{\text{full}}$ and $\mathbf{x}_2^{\text{full}}$; we then measure the linearity of the system with

$$\Gamma = \frac{\left\|\boldsymbol{\delta}^{\text{full}} - \boldsymbol{\delta}^{\text{lin}}\right\|}{\left\|\boldsymbol{\delta}^{\text{lin}}\right\|},$$

where $\boldsymbol{\delta} = \mathbf{x}_2 - \mathbf{x}_1$ denotes the difference between the two evolved members. This measure will be close to 0 if the full system is close to linear. Additionally, note that this traditional version of the Lorenz (1996) equations with $F = 8$ has a doubling time of 2.1 days, where one model time unit corresponds to 5 days (Lorenz 1996); thus, the doubling time is 0.42 model time units. (While system dimension and system noise can each have an effect on doubling time, we found that the doubling time of the version implemented here does not differ significantly from the original 40-variable deterministic system.) Table 2 shows results with a fixed $\delta$ and variable integration time $T$, averaged over 100 randomly chosen observation times. Note that in the experiments in this paper, we vary the time between observations, with one value of $T$ for the standard proposal experiment and another for the optimal proposal experiment (as in the following section).
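This linearity diagnostic can be approximated without coding a separate tangent-linear model by using a finite-difference surrogate for the linearized evolution. The sketch below (the finite-difference shortcut, and all names, are ours) returns values near 0 for nearly linear dynamics:

```python
import numpy as np

def nonlinearity_measure(step, x_ref, delta, eps=1e-6):
    """Relative difference between a perturbation evolved under the full
    dynamics and under a finite-difference proxy for the linearization about
    x_ref. `step` maps a state to its evolved state over the window of
    interest; the result is close to 0 when the dynamics act nearly linearly
    on the perturbation."""
    d_full = step(x_ref + delta) - step(x_ref)               # nonlinear evolution
    d_lin = (step(x_ref + eps * delta) - step(x_ref)) / eps  # tangent-linear proxy
    return np.linalg.norm(d_full - d_lin) / np.linalg.norm(d_lin)
```

For example, passing a composition of Lorenz-96 integration steps as `step` gives the window-length dependence discussed above.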

The results in Table 2 show that the measure of nonlinearity is very close to 0 for a single integration step, but quickly increases for longer time windows. This implies that the system is well approximated by a linear model after just one integration step, but the nonlinearity increases as the length of integration increases. We have, therefore, chosen observation frequencies for the following experiments that guarantee nonlinear behavior of the model between observations in order to test the theory. Additionally, note that although we are operating in a regime where the system is fully observed, based on theory, we expect the same results to hold in the more realistic situation of inhomogeneous spatial observation coverage. In particular, fewer observations will lead to more strongly non-Gaussian probability distributions; however, we are testing the effects of non-Gaussian distributions by ensuring the time between observations is long enough to display nonlinear behavior.

Next, to test the asymptotic theory on the calculation of $\tau^2$ and its relationship to $w_{\max}$, we run the EnKF with a localization radius of 5 and a covariance inflation of 1.05 on the Lorenz equations with 100 variables for 3000 sequential observations; at each observation time, before the EnKF analysis, we calculate what the true maximum weight would be if we were implementing the particle filter. We also calculate $\tau^2$ using the approximation defined in the linear case. We emphasize that we are not running a sequential particle filter here, merely using the ensemble from a sequential EnKF to calculate the relevant statistics. To have an accurate estimation of the covariance matrices, we run the EnKF with a large number of ensemble members to perform this estimation, then draw a smaller ensemble to calculate the weights directly. This smaller ensemble size is the value of $N_e$ used in (14) in the numerics. In this experiment, we fix $N_x = 100$, $T$, and the system noise variance $\sigma_m^2$, and vary the observation error variance from 5 × 10^{−5} to 0.05. Note that varying the observation error leads to different values of *τ*, and thus different data points, since the estimate of $\tau^2$ involves the eigenvalues of a matrix proportional to $\mathbf{R}^{-1}$. Thus, small values of $\sigma_o^2$ lead to larger values of *τ* and will result in ensembles that are close to collapse. Intuitively, this can be understood by thinking about a one-dimensional case: if the variance of the observation likelihood is very small, then the support of the probability distribution is very narrow, and all particles except the one closest to the observation will have very small weight.

Figure 1 shows the results of the asymptotic theory of collapse, where each data point is averaged over the last 2990 steps of the filter and the error bars represent 95% confidence intervals. Note that the observation error increases as we move in the positive *x*-axis direction. Results using the full covariance to calculate the eigenvalues are given in blue. They agree well with the theory in the regime near the origin, where the theory is formally valid, but deviations from the theory increase with distance from the origin (i.e., as *τ* decreases).

There are several additional reasons for deviation from the theory within the asymptotic regime as well. In particular, the theory relies on the assumption that $\log\omega$ is an approximate sample from a Gaussian distribution. This assumption is satisfied provided $\log\omega$ is a sum of a large number of sufficiently independent random variables. However, evolving the ensemble under the state dynamics generally concentrates the state variance into a few growing structures, which increases spatial correlations and makes observation-space quantities more dependent. This leads to $\log\omega$ effectively being a sum over fewer independent random variables, which (all else being equal) in turn leads to $\log\omega$ being less Gaussian.

To test whether the non-Gaussian nature of $\log\omega$ may be a factor in the deviation of the numerics from the theory, we also investigate the degree to which $\log\omega$ may be skewed in this system. In particular, we look at the skewness of the ensembles of $\log\omega$ for the standard proposal experiment in this section; results are given in Table 3. If the skewness is far from 0, then the sample distribution is far from Gaussian. The skewnesses are averaged over the last 2990 observations. As the results in Table 3 show, larger observation error generally leads to higher skewness values; since large observation error corresponds to smaller *τ* (and thus to points farther from the asymptotic regime), this may explain why the data points do not follow the theory as well far from the asymptotic regime.
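The skewness diagnostic is straightforward to compute from an ensemble of log weight factors; a minimal sketch (the function name is ours):

```python
import numpy as np

def sample_skewness(logw_factors):
    """Sample skewness of the log weight-update factors; values far from 0
    indicate a non-Gaussian distribution of log-weights, in which case the
    Gaussian-based asymptotics may be unreliable."""
    x = np.asarray(logw_factors, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5
```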

To additionally give a visual approximation of the distribution of $\log\omega$, we run the same experiment with larger ensembles (*N*_{e} = 10 000) and plot the histograms of $\log\omega$ for several levels of observation error at one fixed time step. These plots are given in Fig. 2, with the skewness at that time step included in each plot. Although the increased ensemble size can have an effect on the nature of this distribution, the observed skewness values are relatively close to those for the above experiment with the smaller ensemble. That is, the differences in skewness do not have a visible effect on these histograms. In fact, these histograms all look quite close to Gaussian, which is supported by the fairly weak skewness values. Thus, increasing skewness may not be the only cause of the deviation between numerics and theory. However, the frequent changes of variables needed to derive (14) prevent a more detailed analysis of this deviation.

In practice, there are difficulties in using (17) to estimate $\tau^2$. First, computing a covariance matrix from a small sample typically yields an eigenvalue spectrum that is artificially steep, with too much variance in leading directions. The corresponding calculation of $\tau^2$ will then be too large, since it is a sum of higher powers of the eigenvalues. We have, therefore, chosen a large ensemble in this experiment in order to estimate the covariances accurately and avoid this effect. Second, for large numbers of observations and large ensembles, calculating eigenvalues of these matrices may be computationally prohibitive. Specifically, in order to avoid severely overestimating the leading eigenvalues, the sample covariances are generally localized before the eigenvalues are computed. This computation is nontrivial.

Thus, we also tested this theory using a computationally feasible approximation for the eigenvalues of (16): we assume $\mathbf{H}\operatorname{cov}(\mathbf{x}_k)\mathbf{H}^T$ and $\mathbf{R}$ are diagonal, so that the eigenvalues are simply the products of the corresponding diagonal elements of $\mathbf{H}\operatorname{cov}(\mathbf{x}_k)\mathbf{H}^T$ and $\mathbf{R}^{-1}$. This diagonal approximation is, in some sense, the most extreme localization, and in this limit the eigenvalues of the matrices are easy to obtain. Results with $\tau^2$ approximated in this way are also shown in Fig. 1. The approximation systematically underestimates $\tau^2$; thus, data points with the approximation always lie to the right of those using eigenvalues of the full matrix in (16).^{1} Nevertheless, using the approximation of $\tau^2$ in the asymptotic relation gives reasonable predictions of $w_{\max}$, often better than with the unapproximated $\tau^2$, because the underestimation by the diagonal approximation compensates for the overestimation of $E(1/w_{\max})$ that is, empirically, a property of the asymptotic relation when *τ* is not large. It is not clear whether this compensation will be equally effective in other problems.
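The extreme-localization (diagonal) approximation is a one-line computation; a sketch (the function name is ours), which is exact whenever both matrices really are diagonal:

```python
import numpy as np

def eigs_diagonal_approx(HPHt, R):
    """Approximate the eigenvalues of the R-normalized observation-space
    covariance by treating both matrices as diagonal:
    lambda_j ~ (H P H^T)_{jj} / R_{jj}. This is the extreme-localization
    limit, and it stays cheap even for very many observations."""
    return np.diag(HPHt) / np.diag(R)
```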

## 5. Optimal proposal

Next, we follow the approach of the previous section, but apply the asymptotic theory to our approximation of the optimal proposal. Specifically, we wish to use an existing ensemble to evaluate the feasibility of a particle filter using the optimal proposal. As in the case of the standard proposal above, the evaluation will be limited by the fact that it applies results from linear, Gaussian systems in a nonlinear, non-Gaussian setting, and by sampling errors in estimating the necessary covariance matrices from a finite ensemble. We will check these limitations with numerical simulations using the Lorenz (1996) system.

There are two additional issues that must be addressed when evaluating the feasibility of the optimal proposal. First, the assumption that $p(\mathbf{y}_k \mid \mathbf{x}_{k-1})$, which defines the weight update, is a normal distribution with mean $\mathbf{H}\mathcal{M}(\mathbf{x}_{k-1}, 0)$ and covariance $\mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R}$ (see the appendix) is not true in the case we consider (a nonlinear model in which observations are not taken at every model step). We will consider weight updates of this form, but we emphasize again that this is an approximation to the correct weight update for the optimal proposal and is necessary if we are to avoid the difficulties involved in implementing the optimal proposal. The second issue, which we turn to below, is that some of the covariance matrices involved in the definition in (18) of $\tau^2$ do not appear explicitly in the nonlinear problem.

### a. Model noise in nonlinear systems

The matrix in (18), whose eigenvalues determine $\tau^2$ for the optimal proposal via (17) in the linear, Gaussian case, involves the covariance matrices $\mathbf{M}\operatorname{cov}(\mathbf{x}_{k-1})\mathbf{M}^T$ and $\mathbf{Q}$. Since (1)–(3) for the nonlinear system do not specify these quantities, we take the approach of defining them through more general expressions that reduce to the correct result for the linear, Gaussian case.

To compute the covariance involving the linear dynamics $\mathbf{M}$, we first define $\mathcal{M}_0(\mathbf{x}_{k-1}) = \mathcal{M}(\mathbf{x}_{k-1}, 0)$ [recalling that $\mathcal{M}$ in (2) is a function of the state as well as the realizations of the noise at each integration step *l*]. In the linear case, $\mathcal{M}_0(\mathbf{x}_{k-1}) = \mathbf{M}\mathbf{x}_{k-1}$, and a more general definition for the desired covariance is

$$\mathbf{M}\operatorname{cov}(\mathbf{x}_{k-1})\mathbf{M}^T = \operatorname{cov}\left[\mathcal{M}_0(\mathbf{x}_{k-1}) \mid \mathbf{y}_{1:k-1}\right].$$

We estimate the right-hand side for the nonlinear system by evolving an ensemble of initial conditions from $p(\mathbf{x}_{k-1} \mid \mathbf{y}_{1:k-1})$ using $\mathcal{M}_0$, applying $\mathbf{H}$, and computing the sample covariance.

For the covariance $\mathbf{Q}$, there are at least two possible definitions that generalize to the nonlinear system. The first uses

$$\mathbf{Q} = \operatorname{cov}\left(\mathbf{x}_k \mid \mathbf{x}_{k-1}\right),$$

which is an identity in the linear case and gives a quantity that, in the nonlinear case, will depend on $\mathbf{x}_{k-1}$. We can estimate the covariance on the right-hand side by starting from a given $\mathbf{x}_{k-1}$ and computing an ensemble $\{\mathcal{M}(\mathbf{x}_{k-1}, \boldsymbol{\eta}^i)\}$ over realizations $\boldsymbol{\eta}^i$ of the system noise. Let $\hat{\mathbf{Q}}(\mathbf{x}_{k-1})$ be the state-space covariance estimated in this way. (Recall from section 3 that the system noise is Gaussian in our experiments.) A further step would be to compute $\overline{\mathbf{Q}}$ by averaging the $\hat{\mathbf{Q}}(\mathbf{x}_{k-1})$ over an ensemble of $\mathbf{x}_{k-1}$.

The second possible definition relies on

This expression is again an identity in the linear case— can be written as the sum over contributions from the noise in (1) at each of the model time steps between and . Beginning from an ensemble of realizations of , we estimate the covariance on the right-hand side above by evolving each member from to with both and , with independent realizations of the system noise in , and then taking the sample covariance of the differences in . We denote this estimate .
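The second estimate can be sketched analogously (again with illustrative names and the same toy forward-Euler integrator): each member is integrated twice from the same initial condition, once with and once without the per-step noise, and the sample covariance of the endpoint differences is taken.

```python
import numpy as np

def l96_step(x, F=8.0, dt=0.05):
    """One forward-Euler step of the Lorenz (1996) model (illustrative only)."""
    d = (np.roll(x, -1, axis=-1) - np.roll(x, 2, axis=-1)) * np.roll(x, 1, axis=-1) - x + F
    return x + dt * d

def noise_cov_from_differences(ens0, n_steps, q_var, rng=None):
    """Estimate the accumulated system-noise covariance from paired noisy and
    noise-free integrations of each ensemble member."""
    rng = np.random.default_rng(rng)
    noisy = ens0.copy()
    clean = ens0.copy()
    for _ in range(n_steps):
        noisy = l96_step(noisy) + rng.normal(0.0, np.sqrt(q_var), noisy.shape)
        clean = l96_step(clean)  # same dynamics, no noise
    return np.cov(noisy - clean, rowvar=False)
```

Note that each member here needs only one extra (noise-free) integration, which is the computational advantage discussed below.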

It is not immediately obvious whether one of these definitions is to be preferred. They will agree in the limit of linear dynamics and may differ as nonlinearity increases. We have, therefore, explored the behavior of both approaches in the case with , , and with varying model noise and initial ensemble spread . The test consists of evolving the particles forward from time 0 to time and estimating in the three ways described above. First, we calculate for each particle; second, we take the average of these values; finally, we estimate as above. We found that the variations of about were negligible relative to the magnitude of elements of . Similarly, we found good agreement between and in these cases. Thus, the effects of nonlinearity in estimating the effective model noise covariance are small in these experiments; in particular, they are much smaller than the sampling error in estimates of with ensembles of size 1000, which we use in the following experiments.

The two definitions do, however, differ substantially in their computational demands, as the computation of the and requires an ensemble of integrations for each , while a single integration for each suffices for . In all following experiments, therefore, we use to estimate the model noise covariance, as it is the most computationally efficient.

### b. Numerical results

Snyder et al. (2015) have rigorously shown that the asymptotics developed in Bengtsson et al. (2008) and Snyder et al. (2008) also hold for the optimal proposal. Here, we numerically show how these results extend to the nonlinear system of Lorenz (1996), with our approximation of the optimal proposal. As in the experiment with the standard proposal, we run the EnKF with a localization radius of 5 and a covariance inflation of 1.05 on the Lorenz equations with 100 variables for 3000 sequential observations. We fix and the system noise , and vary the observation error from 5 × 10^{−3} to 1. In this experiment, we use the approximation described above when calculating both *τ* and the exact weights. The size of the ensemble used to calculate is , but we take a subsample of size when calculating the weights themselves {and, as above, use in the theoretical value }. We approximate sampling from the optimal proposal by sampling from the distribution derived for the linear, Gaussian case given in (A3), replacing all s with .

The results are given in Fig. 3. Clearly, the data points do not agree with the theory as well as in the experiment with the standard proposal. This is likely due to the parameter choices in this experiment. When using this approximation of the optimal proposal, we found empirically that we needed to increase the time between observations in order to satisfy the assumption that the filter is close to collapse [i.e., that is close to 0]. However, as mentioned previously, this also leads to a steep spectrum of the covariance matrices, which in turn violates the assumption that is approximately Gaussian. We again investigate the values of skewness for this experiment; these results are given in Table 4. Note that the longer observation time window in this experiment leads to higher values of skewness than for the shorter time window standard proposal experiment in the previous section.

As in the previous section, we also perform this experiment with the larger ensemble size of *N*_{e} = 10 000 and plot the histograms of for a fixed time step and increasing observation error in Fig. 4. The values of skewness for that particular step are also included in the figure. As before, although the ensemble size differs from the above experiment, the instantaneous values of skewness for *N*_{e} = 10 000 are comparable to the time averages for . In particular, note that the increasing observation error generally leads to more skewed, non-Gaussian distributions of .

Thus, we would expect worse agreement with the asymptotics in these experiments, because is less Gaussian than in the standard proposal experiments. The data points for which the full covariances were used (in blue) fall almost entirely above the theoretical line in solid black. On the other hand, since approximating the eigenvalues by the diagonal elements leads to underestimating *τ*, the data points for which this approximation was used (in red) are much closer to the theoretical line. That is, the underestimation of *τ* by the diagonal approximation compensates for the overestimation of *τ* by the theory due to the steep spectrum. But, as in the case of the standard proposal, these approximations are more accurate in the asymptotic regime (close to the origin) while the data deviate from the theory away from this regime.

## 6. Performance of standard and optimal proposals

Recently, there has been a focus in the particle filtering community on the optimal proposal as an improvement over the standard proposal (Doucet et al. 2000; Arulampalam et al. 2002; Bocquet et al. 2010; Snyder 2012; Snyder et al. 2015). Intuitively, sampling from a distribution conditioned on the new observations should perform better than sampling from a distribution conditioned only on the previous observations. The form of the weight update also provides intuition for the performance gain: the standard proposal weight update involves the distribution of the observations conditioned on the state at the current time , whereas the optimal proposal weight update is conditioned on the state at the previous time: . Since uncertainty generally increases with a longer prediction window, the likelihood will tend to be broader than , and thus there will be less variance across the weights for the optimal proposal update.
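The broader-likelihood intuition can be illustrated with a toy computation (the particle values and variances here are hypothetical, not from the paper's experiments): weighting the same set of particles with a broader likelihood yields a flatter set of normalized weights, with a smaller maximum weight.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 0.0                                   # a scalar observation
forecasts = rng.normal(0.0, 1.0, 500)     # particle predictions of the observed quantity

def normalized_weights(like_var):
    """Gaussian-likelihood importance weights, normalized to sum to one."""
    logw = -0.5 * (y - forecasts) ** 2 / like_var
    w = np.exp(logw - logw.max())         # subtract max for numerical stability
    return w / w.sum()

# Narrow likelihood, analogous to the standard-proposal update p(y | x_n):
w_narrow = normalized_weights(0.1)
# Broader likelihood, analogous to the optimal-proposal update p(y | x_{n-1}),
# whose variance is inflated by the accumulated system noise:
w_broad = normalized_weights(0.1 + 1.0)

assert w_broad.max() < w_narrow.max()     # flatter weights under the broad likelihood
```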

In a review of non-Gaussian data assimilation methods, Bocquet et al. (2010) performed a simple comparison between the standard and optimal proposal implementations of the particle filter and found that the optimal proposal results in lower mean squared errors for smaller ensemble sizes and has comparable performance to the standard proposal for large ensemble sizes. Here, we perform experiments that not only compare the mean squared errors of these methods but also consider the frequency at which resampling occurs, as well as the maximum weight of each method after a single step.

To test the usefulness of the optimal proposal, experiments were run with the Lorenz (1996) system with 5, 10, and 20 variables, with full observations once per time step for 300 time steps, using both the standard proposal and our approximate implementation of the optimal proposal as described in section 5. The observation error variance, system noise variance, and initial ensemble variance are fixed at , respectively. We test two resampling thresholds: first, when the effective sample size falls below ; and second, when the maximum weight exceeds 0.5. After resampling, the weights are reset to and a small amount of jitter (with a variance of 0.01) is added to each particle. This extra noise is added to avoid degeneracy and is also known as regularization (see, e.g., van Leeuwen 2009; Doucet and Johansen 2011; Chopin 2004). The errors are averaged over the last 200 time steps, but the resample counts are over the entire 300-step window.
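A minimal sketch of this resampling logic, assuming the effective-sample-size trigger (function names are illustrative; the maximum-weight trigger would simply test `w.max() > 0.5` instead):

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum(w_i^2) for normalized weights w."""
    return 1.0 / np.sum(w ** 2)

def maybe_resample(particles, w, threshold, jitter_var=0.01, rng=None):
    """Resample with replacement when N_eff drops below `threshold`,
    reset the weights to uniform, and add small Gaussian jitter
    (regularization) to avoid duplicate particles."""
    rng = np.random.default_rng(rng)
    n = len(w)
    if effective_sample_size(w) < threshold:
        idx = rng.choice(n, size=n, p=w)
        particles = particles[idx] + rng.normal(0.0, np.sqrt(jitter_var), particles.shape)
        w = np.full(n, 1.0 / n)
    return particles, w
```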

Figure 5 shows the root mean squared error of the posterior mean as a function of ensemble size. For the system with the smallest state dimension () and increasing ensemble size, results for the standard proposal converge quickly toward those from our approximation to the optimal proposal. When is larger, however, the results for the standard proposal do not appear to converge over the range of ensemble sizes considered, and the root mean squared errors remain much larger than those from our approximation of the optimal proposal even at the largest ensemble size. Note also that the standard proposal improves slightly over the approximation to the optimal proposal for and large ensembles. We believe this reflects the approximations in our implementation of the optimal proposal. Similar errors result from both resampling thresholds, with the exception of small ensemble sizes for small state dimension (), in which case the threshold determined by effective sample size results in smaller errors.

A further difference is that the filter using the standard proposal resamples much more often than the filter using our approximation to the optimal proposal, with both resampling thresholds (see Table 5). This may help explain why our approximation of the optimal proposal achieves smaller errors: since resampling introduces additional sampling noise into the algorithm, resampling less frequently should be preferable to resampling often.

Note that under the effective sample size threshold, the number of times the filter resampled increases with ensemble size for a fixed state dimension. This may be due to the dependence of the threshold on the ensemble size, leading to increased resampling frequency with . Alternatively, our approximation of the optimal proposal density may be too narrow in relation to the posterior density, resulting in particles in the tail of the proposal with high posterior probability, and thus a low effective sample size. To test this, we also tried inflating the proposal variance as in Del Moral and Murray (2015). With inflation, the resampling frequency still increased with increasing , though not as drastically. Additionally, the errors were not affected by inflation, and so we have not included those results here. On the other hand, the threshold determined by the maximum weight results in decreasing resampling frequency as grows for a fixed , without inflating the proposal variance. This would suggest that even if the effective sample size is small, the weights are well distributed across these particles. Then, even though the effective sample size may be increasing at a slower rate than , resulting in higher resampling frequency with larger , the weights are still distributed across more particles. While resampling methods constitute a rich area of research, they are not the focus of this work and will not be investigated further here.

In addition to having smaller errors over time, the optimal proposal is less likely to experience collapse than the standard proposal. A hint of this behavior is the lower number of resampling steps needed for our approximation to the optimal proposal than for the standard proposal; however, this effect can be studied directly by comparing the maximum weight after one step for each proposal. Results are shown in Fig. 6. All parameters are fixed at the same value for each proposal, except the state dimension, which varies as shown. The ensemble size is fixed at , the data are averaged over 100 trials, and the error bars show 95% confidence intervals based on this sample. These results demonstrate that, for fixed ensemble size and state dimension, our approximation to the optimal proposal consistently provides a lower maximum weight, and thus less variance across the weights. In this experiment, the improvement is especially clear in the regime where the state dimension is between 10 and 50.

When is small compared to the prior covariance and the observation error covariance, the optimal and standard proposals are nearly identical. Without system noise, becomes a deterministic function of and both and are delta functions at . This can also be seen in the system in (A1)–(A2) discussed in the appendix: when is very small, so is in (A6), and the mean and covariance of approach and , respectively, which are the same as the standard proposal mean and covariance. Thus, the gain that the optimal proposal affords over the standard proposal will depend on the size of the system noise. Table 6 includes results for the Lorenz system with 5 variables and 500 particles, with varying system noise. The resampling threshold is determined by the effective sample size (specifically, when falls below 0.1). The particle filter with each proposal distribution was run over 300 time steps, with observations at each time step; the table includes the ratio of the means of the errors over the last 200 time steps as well as the number of times the resampling threshold was reached out of the 300 time steps.

As Table 6 shows, the difference in errors between the standard proposal and the approximation of the optimal proposal increases as the system noise increases. For the smallest system noise, the ratio of the error from the approximation to the optimal proposal to that from the standard proposal is very close to 1, but for larger noise, our approximation of the optimal proposal yields a significant decrease in error over the standard proposal. Since the optimal proposal (or an approximation thereof) requires more computational effort than the standard proposal, if the problem of interest has very small system noise, then the standard proposal should be used. Lin et al. (2013) present the optimal proposal particle filter as one method in a class of “look ahead” algorithms, and investigate other such algorithms in the context of computational expense for various types of problems.
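The small-system-noise limit discussed above can be checked in a scalar linear-Gaussian toy problem (the numbers and symbols below are illustrative, not the paper's): as the system-noise variance shrinks, the optimal-proposal mean collapses onto the deterministic forecast and its variance vanishes, reproducing the standard proposal.

```python
# Scalar linear-Gaussian illustration: state transition x -> m*x + noise(q),
# observation y = h*x + noise(r). Symbols m, h, q, r are illustrative.
m, h, r = 0.9, 1.0, 0.5
x, y = 2.0, 1.0

def optimal_proposal(q):
    """Mean and variance of the scalar optimal proposal p(x_n | x_{n-1}, y_n)."""
    var = 1.0 / (1.0 / q + h * h / r)
    mean = var * (m * x / q + h * y / r)
    return mean, var

mean_small, var_small = optimal_proposal(1e-8)
mean_large, var_large = optimal_proposal(1.0)

# With tiny q, the proposal collapses onto the forecast m*x,
# just like the (noise-free) standard proposal...
assert abs(mean_small - m * x) < 1e-6 and var_small < 1e-7
# ...while with appreciable q it is pulled noticeably toward the observation y.
assert abs(mean_large - m * x) > 0.1
```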

## 7. Discussion and conclusions

In this work, we attempted to answer the question of whether one could predict the collapse of the optimal particle filter without building a scheme to sample from the optimal proposal. We have shown that this is possible in the Lorenz (1996) system, using results from Snyder et al. (2008) and their extension to the optimal proposal in Snyder et al. (2015). The results of the former demonstrate how to use eigenvalues of matrices from a linear, Gaussian system to calculate the effective dimension , which can then be used to assess the feasibility of the particle filter in that system. One key issue is determining the extent to which these results are valid in more general settings (e.g., nonlinear systems). To this end, we have numerically shown that the asymptotic approximations and results found in Snyder et al. (2008, 2015) are also useful in the nonlinear regime of the Lorenz (1996) system with both the standard proposal and the above approximate implementation of the optimal proposal.

In extending the asymptotic results to nonlinear systems and the optimal proposal, another important issue is to estimate an “effective” system-noise covariance corresponding to the additive Gaussian noise at observation times assumed in Snyder et al. (2008, 2015). We have discussed several different approximations of this covariance, and shown that the asymptotic results are also valid with our implementation of the optimal proposal when these approximate system-noise covariances are used. We emphasize that this implementation is an approximation to the truly “optimal” proposal, and thus the theory guaranteeing optimality of this proposal algorithm no longer holds in this setting.

Additionally, the eigenvalue decompositions necessary to estimate the effective dimension will be costly for large systems (and large ensembles). Thus, in practice, we will need to find computationally feasible approximations. In this work, we have chosen to approximate the matrices as diagonal to simplify these eigenvalue calculations. This approximation appears to be effective in the idealized system considered here, though it also tends to overestimate the degree of collapse. The margin of this overestimation decreases as the system gets closer to collapse.

Finally, motivated by the results of Snyder (2012), which demonstrate the benefits of the optimal proposal implementation over the standard proposal in a simple example, we investigated the performance gain of an approximate implementation of the optimal proposal over the standard proposal in a nonlinear system. We have shown that the approximation of the optimal proposal implementation not only collapses less frequently than the standard proposal in the same regimes, but also results in quicker error convergence as a function of increasing particles. Thus, for systems in which the particle filter may work, utilizing the optimal proposal can provide increased performance with fewer particles than the standard proposal. The optimal proposal, however, is not trivial to implement and its benefits disappear in the limit of small system noise.

There are several remaining challenges regarding particle filters in nonlinear systems. First, experiments still need to be done to determine how the filters behave when applied sequentially; the experiments in sections 4 and 5 of this paper study the degree of collapse after one assimilation step. However, this does not preclude the possibility of the particle filter collapsing after two or more steps. Second, while we have presented a general methodology, we have only tested it on one system; further testing in a wider variety of systems would be of interest. In addition, it could be useful to know whether the numerical results in this paper have an analytical analog, as in the linear Gaussian case. Finally, further work should be done to investigate the optimal proposal, particularly with regard to approximations of the model noise covariance.

## Acknowledgments

Slivinski was supported by the NSF through Grants DMS-0907904 and DMS-1148284, by ONR through DOD (MURI) Grant N000141110087, and by NCAR’s Advanced Study Program during a collaborative visit to NCAR.

### APPENDIX

#### Sampling from the Optimal Proposal

Recall that the optimal proposal requires conditioning on the current observation: (Doucet et al. 2000; Snyder 2012). Consider the case of additive Gaussian noise and a linear observation operator, where the system is given by

with and . Then

In this case, the weights have an analytic update expression, since

Thus, the particles at time are first sampled from (A3), and then their weights are updated according to
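For the reader's convenience, the standard linear-Gaussian optimal-proposal formulas from Doucet et al. (2000) can be stated in generic notation; the symbols below are generic and need not match the paper's (A1)–(A6) exactly. For a system $x_n = M x_{n-1} + \eta_n$, $y_n = H x_n + \epsilon_n$ with $\eta_n \sim \mathcal{N}(0, Q)$ and $\epsilon_n \sim \mathcal{N}(0, R)$, the optimal proposal is Gaussian:

```latex
\begin{align*}
p(x_n \mid x_{n-1}, y_n) &= \mathcal{N}\!\left(\mu_n, \Sigma\right), \\
\Sigma &= \left(Q^{-1} + H^{\mathsf T} R^{-1} H\right)^{-1}, \\
\mu_n &= \Sigma \left(Q^{-1} M x_{n-1} + H^{\mathsf T} R^{-1} y_n\right),
\end{align*}
```

and the corresponding weight update depends only on the previous state:

```latex
w_n^{(i)} \propto w_{n-1}^{(i)}\,
\mathcal{N}\!\left(y_n;\; H M x_{n-1}^{(i)},\; H Q H^{\mathsf T} + R\right).
```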

## REFERENCES

*Probability and Statistics: Essays in Honor of David A. Freedman*, D. Nolan and T. Speed, Eds., Institute of Mathematical Statistics, 316–334.

*Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh*, B. Clarke and S. Ghosal, Eds., Institute of Mathematical Statistics, 318–329.

*Sequential Monte Carlo Methods in Practice*, A. Doucet, N. de Freitas, and N. Gordon, Eds., Springer-Verlag, 2–14.

*Proc. Seminar on Predictability*, Vol. 1, Reading, Berkshire, United Kingdom, ECMWF, 1–18.

*ECMWF Seminar on Data Assimilation for Atmosphere and Ocean*, Shinfield, Reading, United Kingdom, ECMWF, 161–170.

## Footnotes

Current affiliation: Cooperative Institute for Research in Environmental Sciences, Boulder, Colorado.

^{+}

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

^{1}

Since *τ* is a sum of squares of the eigenvalues of (16) [see (17)] and because the sum of the eigenvalues equals the sum of the diagonal elements, *τ* will be underestimated by the diagonal approximation whenever the eigenvalue spectrum is steeper than the sorted list of diagonal elements. We expect this to be true in many problems involving spatial correlations, with spatially correlated but nearly spatially homogeneous processes being a prime example.
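This can be checked directly: for a symmetric matrix, the sum of squared eigenvalues equals trace of the matrix squared, i.e., the sum of all squared entries, so dropping the off-diagonal contributions can only underestimate it. A small numerical illustration with a hypothetical spatially correlated, nearly homogeneous covariance:

```python
import numpy as np

# Hypothetical covariance with exponentially decaying spatial correlations
# and homogeneous (unit) diagonal, as in the footnote's prime example.
n = 40
i = np.arange(n)
C = np.exp(-np.abs(i[:, None] - i[None, :]) / 5.0)

eig_sq_sum = np.sum(np.linalg.eigvalsh(C) ** 2)   # what the exact calculation uses
diag_sq_sum = np.sum(np.diag(C) ** 2)             # what the diagonal approximation uses

assert np.isclose(eig_sq_sum, np.sum(C ** 2))     # trace(C @ C) identity for symmetric C
assert eig_sq_sum > diag_sq_sum                   # diagonal approximation underestimates
```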