## 1. Introduction

Particle filters are ensemble-based algorithms for data assimilation that, unlike many schemes, make no assumptions about the probability distributions for the prior or the observation errors. One difficulty is that particle filters may suffer from degeneracy, in which the weight assigned to one ensemble member (or particle) converges to one while those assigned to all other members approach zero. Bengtsson et al. (2008), Bickel et al. (2008), and Snyder et al. (2008) (hereafter BBS08) showed that avoiding degeneracy in the most elementary particle filter [essentially, the bootstrap filter of Gordon et al. (1993)] requires an ensemble size that increases exponentially with the variance of the log-likelihood of the observations given each member, which in simple examples is proportional to the system dimension.
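This degeneracy is easily reproduced numerically. The following sketch (Python, with an assumed toy setup of i.i.d. standard Gaussian priors and unit observation-error variance, not the configuration of any study cited here) applies a single bootstrap-filter weighting step and tracks the maximum normalized weight as the state dimension grows with the ensemble size held fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_max_weight(n_x, n_e=100):
    """One weighting step of the bootstrap filter with i.i.d. components.

    Prior and observation errors are standard Gaussian (an assumed toy
    setup); returns the largest normalized particle weight.
    """
    truth = rng.standard_normal(n_x)
    y = truth + rng.standard_normal(n_x)         # observe every component
    particles = rng.standard_normal((n_e, n_x))  # draws from the prior
    log_w = -0.5 * np.sum((y - particles) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())              # stabilize before exponentiating
    return (w / w.sum()).max()

for n_x in (1, 10, 100):
    print(f"dimension {n_x:4d}: max weight {bootstrap_max_weight(n_x):.3f}")
```

With 100 particles, the maximum weight is typically small in one dimension but climbs toward one by dimension 100, illustrating why avoiding degeneracy forces the ensemble size to grow exponentially with problem size.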

More general particle filters employ sequential importance sampling, in which the ensemble members at each step are drawn from a proposal distribution. The choice of proposal distribution strongly influences the performance of these filters (e.g., Liu and Chen 1998; Doucet et al. 2000; Arulampalam et al. 2002). Here, we consider the proposal given by the distribution of the present state given the state at the previous step and the most recent observations, which is known as the “optimal” proposal (Doucet et al. 2000). We extend the asymptotic results of BBS08 to the optimal proposal for the case of linear, Gaussian systems and thus demonstrate that, even with that proposal, the required ensemble size will still grow exponentially with an appropriate measure of the problem size.

Several particle filters that use sequential importance sampling have recently been developed for geophysical applications. The implicit particle filter (Chorin and Tu 2009; Morzfeld et al. 2012; Chorin et al. 2013) generates the *i*th new particle by solving an algebraic equation related to the likelihood of the most recent observations given the *i*th particle at the previous time. It is equivalent to the optimal proposal for linear, Gaussian systems. The equivalent-weights particle filter (van Leeuwen 2010; Ades and van Leeuwen 2015) also uses the most recent observations in generating the *i*th new particle, by nudging the trajectory beginning from the *i*th particle at the previous time toward the new observations. It includes a further step that depends on the new observations, in which most particles are adjusted toward locations with nearly equal importance ratios. Papadakis et al. (2010) also present a particle filter in which the proposal uses the new observations.

For a given system and a given ensemble size, these more sophisticated particle filters often perform much better than the bootstrap filter, but their behavior as the system size increases has not yet been established. In principle, the analysis of BBS08 could be extended to each new filter. We avoid this nontrivial task by demonstrating that, among the class of particle filters that generate new particles based only on the new observations and the particles generated at the previous step, the optimal proposal minimizes the variance of the (unnormalized) weights over draws of both the previous and new particles. This result extends the usual optimality statement for the optimal proposal, namely, that it minimizes the variance of weights over draws of the new particles. The extended optimality means the particle filter employing the optimal proposal provides a lower bound for the ensemble size necessary to avoid degeneracy of the weights, a bound which applies to all single-step particle filters that use sequential importance sampling, including the implicit particle filter and the equivalent-weights particle filter.

The outline of the paper is as follows. The next section provides further background on sequential importance sampling and the optimal proposal distribution. In section 3, we show that the asymptotic results of BBS08 also hold for the optimal proposal in the case of linear, Gaussian systems and we examine the behavior of the optimal proposal in a simple test problem with independent and identically distributed (i.i.d.) degrees of freedom. Some of the basic results given here can also be found in Snyder (2012). Section 4 demonstrates that the optimal proposal is optimal in the extended sense described above, while section 5 outlines how system dimension, the system dynamics, and details of the observing network influence the required ensemble size, leading to a back-of-the-envelope assessment of particle filtering for global numerical weather prediction. We conclude with a summary and discussion of our results, including a suggestion that effective particle filters for high-dimensional systems will need to include some form of spatial localization, such as is employed in ensemble Kalman filters.

## 2. Background

This section briefly reviews sequential importance sampling and the optimal proposal distribution, together with the degeneracy of the particle weights and the asymptotic results of BBS08 related to degeneracy. Readers familiar with these topics can proceed to section 3.

Consider a discrete-time system with state **x** of dimension *N*_{x} and observations **y** of dimension *N*_{y} that are related to the state. The system is determined by the transition density *p*(**x**_{k} | **x**_{k−1}) and the observation likelihood *p*(**y**_{k} | **x**_{k}), where the subscript *k* indicates evaluation at the *k*th time, *t*_{k}.

Our goal is to estimate the filtering density, the pdf of **x**_{k} conditioned on all observations up to and including time *t*_{k}. Since all pdfs in what follows will be conditioned on the available observations, we suppress that conditioning in the notation where no ambiguity results.

Particle filters are sequential Monte Carlo techniques that represent the filtering density by a weighted ensemble, with weights *w*_{k}^{i} and members **x**_{k}^{i} for *i* = 1, …, *N*_{e}, where *N*_{e} is the ensemble size and the weights must sum to 1 over the ensemble. The ensemble members are also termed *particles*.

In sequential importance sampling, each new member **x**_{k}^{i} is drawn from a density *q*(**x**_{k} | **x**_{k−1}^{i}, **y**_{k}), termed the *proposal*. The particle weights are then updated by multiplying the previous weight *w*_{k−1}^{i} by the ratio *p*(**y**_{k} | **x**_{k}^{i}) *p*(**x**_{k}^{i} | **x**_{k−1}^{i}) / *q*(**x**_{k}^{i} | **x**_{k−1}^{i}, **y**_{k}), so that the weighted ensemble remains a consistent approximation of the filtering density.
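As a minimal illustration of one sequential-importance-sampling update, the sketch below uses an assumed scalar linear-Gaussian model (the parameters `m`, `q2`, `r2` are illustrative) and an arbitrary Gaussian proposal; each particle's weight is the likelihood times the transition density divided by the proposal density, evaluated at the new draw:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_gauss(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * ((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))

# assumed scalar model: x_k = m x_{k-1} + N(0, q2), y_k = x_k + N(0, r2)
m, q2, r2 = 0.9, 0.5, 1.0
n_e = 2000
x_prev = rng.standard_normal(n_e)   # equally weighted particles at t_{k-1}
y = 0.7                             # the new observation

# an illustrative proposal: Gaussian centered on m x_{k-1}, inflated variance
prop_var = 2.0 * q2
x_new = m * x_prev + np.sqrt(prop_var) * rng.standard_normal(n_e)

# importance weights: likelihood * transition / proposal (equal prior weights)
log_w = (log_gauss(y, x_new, r2)
         + log_gauss(x_new, m * x_prev, q2)
         - log_gauss(x_new, m * x_prev, prop_var))
w = np.exp(log_w - log_w.max())
w /= w.sum()

posterior_mean = np.sum(w * x_new)
print(f"weighted posterior-mean estimate: {posterior_mean:.3f}")
```

For these parameters the exact posterior mean is y(m² + q2)/(m² + q2 + r2) ≈ 0.40, which the weighted estimate should approximate; any proposal that covers the posterior gives a consistent estimate, but the spread of the weights depends strongly on the choice of proposal.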

It is also possible to develop particle filters that do not employ sequential importance sampling (Klaas et al. 2005; Nakano 2014). While we expect that high-dimensional problems will also be difficult for that class of particle filters, the results we present in sections 3 and 4 do not carry over directly.

An important subtlety in sequential importance sampling is that, for each *i*, the new member **x**_{k}^{i} is drawn from a proposal density conditioned on the *i*th member at the previous time, so the proposal generally differs from particle to particle.

The simplest choice is the transition density *p*(**x**_{k} | **x**_{k−1}^{i}), which we will call the *standard* proposal. Another possible choice, which is known as the optimal proposal in the particle-filtering literature (e.g., Doucet et al. 2000), is the conditional density *p*(**x**_{k} | **x**_{k−1}^{i}, **y**_{k}), in which the new member is conditioned on both the previous member and the new observations.
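The distinction between the two proposals can be made concrete in a scalar linear-Gaussian model (the parameter values below are assumed for illustration). For the optimal proposal, the importance ratio of likelihood times transition density over proposal density collapses to *p*(**y**_{k} | **x**_{k−1}^{i}), so the weight is identical for every draw of the new particle; the sketch verifies this identity numerically:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_gauss(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * ((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))

# assumed scalar model: x_k = m x_{k-1} + N(0, q2), y_k = x_k + N(0, r2)
m, q2, r2 = 0.9, 0.5, 1.0
x_prev, y = 0.3, 1.2                 # one particle at t_{k-1}, one new obs

# standard proposal: draw from the transition density; weight = likelihood
x_std = m * x_prev + np.sqrt(q2) * rng.standard_normal(10)
log_w_std = log_gauss(y, x_std, r2)          # varies from draw to draw

# optimal proposal p(x_k | x_{k-1}, y_k): Gaussian with
s2 = 1.0 / (1.0 / q2 + 1.0 / r2)             # variance
mu = s2 * (m * x_prev / q2 + y / r2)         # mean
x_opt = mu + np.sqrt(s2) * rng.standard_normal(10)

# likelihood * transition / proposal reduces to p(y | x_{k-1}) for every draw
log_w_opt = (log_gauss(y, x_opt, r2) + log_gauss(x_opt, m * x_prev, q2)
             - log_gauss(x_opt, mu, s2))
assert np.allclose(log_w_opt, log_gauss(y, m * x_prev, q2 + r2))
print("optimal-proposal weight is identical for all draws")
```

The assertion holds exactly (up to roundoff) because the product of two Gaussians in the numerator factors into the optimal-proposal density times the marginal likelihood of the observation given the previous particle.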

When the maximum weight approaches one for some particle *i*, a situation termed *degeneracy* in the particle-filtering literature and *collapse* in Snyder et al. (2008), all other particles have weights close to zero and the conditional distribution will be poorly approximated. The asymptotic results of BBS08 concern degeneracy. They define


Since (6) does not depend on

## 3. The optimal proposal in the context of linear, Gaussian systems

This section demonstrates that the asymptotic arguments of Bengtsson et al. (2008) and Snyder et al. (2008) are applicable to the optimal proposal when the system is linear and Gaussian. It also presents a linear, Gaussian example that illustrates both the asymptotic results and the potential benefits provided by the optimal proposal relative to the standard proposal. Although what follows is restricted to linear, Gaussian systems, the numerical simulations of Slivinski and Snyder (2015, manuscript submitted to *Mon. Wea. Rev.*) demonstrate that the asymptotic results are also informative in simple nonlinear systems.

### a. Asymptotic relations following Bengtsson et al.

To extend the asymptotic arguments of Bengtsson et al. (2008) and Snyder et al. (2008) to the optimal proposal, we must show that

*j*th components of

Extending these results for the optimal proposal [especially (8)] to non-Gaussian, nonlinear systems hinges on showing that the same asymptotic conditions hold in that setting; the numerical simulations of Slivinski and Snyder (2015, manuscript submitted to *Mon. Wea. Rev.*) show that (8) is valid for a specific, significantly non-Gaussian system. Nevertheless, the linear, Gaussian case considered here is sufficient to establish that the optimal proposal does not avoid degeneracy.

The linear, Gaussian case also provides insight into the potential advantages of the optimal proposal. Comparing (16) and (12) shows that the *i*th eigenvalue of (16) is always bounded below by the *i*th eigenvalue of (12), so that

### b. A simple system with i.i.d. degrees of freedom

Consider the linear, Gaussian system (9) with i.i.d. degrees of freedom, characterized by two parameters: *q*, the standard deviation of the system noise; and *a*, the standard deviation of the prior for each degree of freedom. The deterministic system dynamics increasingly control the forecast uncertainty as *q* becomes small, so the ratio of *q* to *a* measures the degree to which the deterministic system dynamics affect the forecast uncertainty.

We first check the validity of the asymptotic relation (8) for the optimal proposal. Figure 1 shows
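A rough Monte Carlo companion to this check, under assumed unit parameter values and identity dynamics rather than the exact configuration behind Fig. 1: the variance of the log weights (a proxy for the variance quantity that controls degeneracy) grows in proportion to the number of i.i.d. components for either proposal, but with a markedly smaller constant for the optimal proposal.

```python
import numpy as np

rng = np.random.default_rng(3)
q2, r2, a2 = 1.0, 1.0, 1.0   # system-noise, obs-error, prior variances (assumed)
n_e = 20_000

def log_weight_variance(n_x, optimal):
    """Variance over particles of the log unnormalized weight for one step.

    Identity dynamics are assumed: x_k = x_{k-1} + N(0, q2) per component,
    observed with error variance r2.
    """
    x_prev = np.sqrt(a2) * rng.standard_normal((n_e, n_x))
    truth_prev = np.sqrt(a2) * rng.standard_normal(n_x)
    truth = truth_prev + np.sqrt(q2) * rng.standard_normal(n_x)
    y = truth + np.sqrt(r2) * rng.standard_normal(n_x)
    if optimal:
        # optimal-proposal weight p(y | x_prev): obs given previous particle
        log_w = -0.5 * np.sum((y - x_prev) ** 2, axis=1) / (q2 + r2)
    else:
        # standard proposal: evolve the particles, then weight by the likelihood
        x_new = x_prev + np.sqrt(q2) * rng.standard_normal((n_e, n_x))
        log_w = -0.5 * np.sum((y - x_new) ** 2, axis=1) / r2
    return np.var(log_w)

for n_x in (20, 80):
    print(n_x, log_weight_variance(n_x, optimal=False),
          log_weight_variance(n_x, optimal=True))
```

Doubling the number of components roughly doubles the log-weight variance under either proposal, while the optimal proposal reduces it by a roughly constant factor; neither effect rescues the particle filter from the exponential scaling.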

Returning to (19), it is immediately clear that

Even at moderate values of the system noise variance, the decrease of

The predictions of the asymptotic theory are confirmed in Fig. 3, which shows the minimum

A further, more minor advantage of the optimal proposal is that it yields better estimates for a given value of

## 4. Performance bounds from the optimal proposal

We next demonstrate that particle filters using the optimal proposal have minimal degeneracy, first explaining on an informal, intuitive level why this is so and then presenting a rigorous proof.

Our arguments apply to “single step” algorithms for particle filters, that is, algorithms in which the sample at

### a. An intuitive view

In a single-step particle filter, however, the best that we can hope for is that *before*

Comparison of (23) and (21) yields similar intuition. Choosing the optimal proposal makes the second density on the rhs agree, but the proposal lacks the conditioning on

These heuristic arguments indicate that the fundamental limitation of a single-step particle filter is not how cleverly the proposal is chosen but rather that the algorithm does not correct particles at earlier times to reflect new observations.

### b. Another look at optimality of the optimal proposal

The foregoing, intuitive argument suggests that the optimal proposal really is the best possible proposal for a single-step particle filter, since it is exactly the second distribution on the rhs of (23). We next give a rigorous result, showing that the optimal proposal minimizes the variance of

A general, inductive proof covering times from

This means that other single-step particle filters will always exhibit degeneracy that is more pronounced than that for the optimal proposal at a given ensemble size.
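This ordering can be checked numerically. The sketch below, for an assumed scalar linear-Gaussian model with identity dynamics, estimates the relative variance of the unnormalized weights over joint draws of the previous and new particles for three single-step proposals: the standard proposal, the optimal proposal, and an intermediate proposal with the optimal mean but inflated variance. The optimal proposal attains the smallest value:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_gauss(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * ((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))

q2, r2 = 1.0, 1.0                 # assumed system-noise and obs-error variances
n = 200_000
y = 1.0                           # a fixed new observation
x_prev = rng.standard_normal(n)   # draws of the previous particle, N(0, 1)

def rel_var(log_w):
    """Relative variance var(w)/mean(w)^2 of the unnormalized weights."""
    w = np.exp(log_w - log_w.max())
    return np.var(w) / np.mean(w) ** 2

# standard proposal: draw from the transition density; weight = likelihood
x = x_prev + np.sqrt(q2) * rng.standard_normal(n)
rv_std = rel_var(log_gauss(y, x, r2))

# optimal proposal: weight p(y | x_prev) does not depend on the new draw
rv_opt = rel_var(log_gauss(y, x_prev, q2 + r2))

# intermediate proposal: optimal mean, inflated variance (illustrative)
s2 = 1.0 / (1.0 / q2 + 1.0 / r2)
mu = s2 * (x_prev / q2 + y / r2)
c2 = 2.0 * s2
x_m = mu + np.sqrt(c2) * rng.standard_normal(n)
rv_mid = rel_var(log_gauss(y, x_m, r2) + log_gauss(x_m, x_prev, q2)
                 - log_gauss(x_m, mu, c2))

print(f"relative variance  standard: {rv_std:.3f}  "
      f"intermediate: {rv_mid:.3f}  optimal: {rv_opt:.3f}")
```

The intermediate proposal has the right center but extra spread, and by the law of total variance its weights must be at least as variable as the optimal-proposal weights; the Monte Carlo estimates reproduce this ordering.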

In contrast to our arguments, Bocquet et al. (2010) compare particle filters using the standard and optimal proposal in an idealized system and find advantages for the standard proposal in certain parameter regimes. The system they consider, however, is deterministic, a setting in which the standard and optimal proposals are identical. Moreover, the filters they implement include a fictitious system noise when drawing from the respective proposals. We conclude that any advantages of the standard proposal in their experiments arise from specific details of those implementations.

### c. Block-resampling algorithms

Sections 4a and 4b explain how the performance of single-step particle filters is limited by the need to correct particles at earlier times in light of the new observations. One remedy is block resampling, in which particles over a window of the *L* most recent times will also be resampled using a proposal distribution that depends on

Block resampling using (27) is one potential way to reduce degeneracy: the weights become better behaved as the length *L* of the resampling window increases, as long as the system dynamics are not deterministic [i.e., as long as the system noise does not vanish], although any improvement must be weighed against the cost of sampling over a window of length *L*.

Block resampling is not without drawbacks. As was the case for the optimal proposal, block resampling using (27) depends crucially on the specification of the system noise. More important, sampling from the proposal distribution is no longer easy or inexpensive. Indeed, its difficulty approaches that of sampling directly from the posterior distribution as *L* increases. Although the techniques of Morzfeld et al. (2012) offer promise for the future, implementation of block resampling remains prohibitive at present for many geophysical applications such as numerical weather prediction.

## 5. Degeneracy, number of observations, and system dimension

The interest of our results lies in their implications for particle filters in high-dimensional systems. Equation (8) relates the maximum weight to

In general, however,

This leaves the question of how large the relevant quantities are for problems of practical interest, which we consider next with a back-of-the-envelope estimate (see also Slivinski and Snyder 2015, manuscript submitted to *Mon. Wea. Rev.*). Chorin and Morzfeld (2013) also explore the feasibility of high-dimensional assimilation, but from a different perspective.

For the standard proposal,

Extending this estimate to the optimal proposal requires assumptions about the magnitude of the system noise appropriate for global NWP. Little is known about the system noise and, indeed, many operational assimilation systems ignore it. It therefore seems reasonable to assume that the deficiencies of the forecast model are not the dominant contributions to short-range forecast errors. In the context of the simple system, the correct parameter regime is
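The force of the exponential scaling is easy to appreciate with back-of-the-envelope arithmetic. The growth rate used below, with the required ensemble size taken as the exponential of half the log-weight variance, is an assumed, BBS08-flavored rule of thumb rather than the precise expression in the text:

```python
import math

# back-of-the-envelope: if avoiding degeneracy requires roughly
# N_e ~ exp(tau2 / 2), where tau2 denotes the variance of the log weights
# (an assumed scaling of the BBS08 type, not this paper's exact expression),
# then even modest tau2 already demands enormous ensembles
for tau2 in (10, 40, 100):
    print(f"log-weight variance {tau2:4d}  ->  N_e ~ {math.exp(tau2 / 2):.1e}")
```

Since the log-weight variance grows with the number of informative observations, values far larger than 100 are plausible for global NWP, and the implied ensemble sizes are astronomically beyond any conceivable computing resource.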

## 6. Summary and discussion

This paper has shown that particle filters using the optimal proposal are subject to the asymptotic results of BBS08 that relate the expectation of one over the maximum weight to

Various properties of the optimal proposal can be illustrated in the simple system given in section 3b. The simple system consists of

A second important result (section 4) is that the optimal proposal is optimal in the sense that it minimizes a specific measure of weight degeneracy. More precisely, among all “single step” proposal distributions for

We next consider the relation of our results to the study of Ades and Van Leeuwen (2015). They apply the equivalent-weights particle filter in experiments with simulated two-dimensional turbulence and

How can their results be reconciled with ours that show that the optimal proposal bounds the performance of the equivalent-weights filter? First, the equivalent-weights proposal as implemented in Ades and Van Leeuwen (2015) becomes sharper as *ϵ*, which they set to

Second, as discussed in section 5, the degeneracy of the weights does not depend directly on

Overall, the optimal proposal offers substantial improvements over the standard proposal when the system noise is not too small, requiring orders of magnitude fewer ensemble members in many moderately large problems. At the same time,

It is important to emphasize that our results hold only for particle filters using sequential importance sampling. Filters that seek to apply importance sampling directly to the marginal, conditional distribution at

In our view, further progress in particle filtering for large, spatially distributed problems, such as global NWP, will rest on the incorporation of some form of spatial localization into the algorithm. Localization (Houtekamer and Mitchell 1998, 2001; Hamill et al. 2001) capitalizes on the common property in geophysical systems that state variables separated by a sufficient distance are nearly independent, and is the key idea that allows the ensemble Kalman filter to perform well with

## APPENDIX A

### Demonstration that

Clearly

## APPENDIX B

### General Proof of Optimality

We show by induction that for proposals of the form (B1), choosing

The arguments of section 4b hold with

## REFERENCES

Ades, M., and P. J. van Leeuwen, 2015: The equivalent-weights particle filter in a high-dimensional system. *Quart. J. Roy. Meteor. Soc.*, **141**, 484–503, doi:10.1002/qj.2370.

Arulampalam, M., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. *IEEE Trans. Signal Process.*, **50**, 174–188, doi:10.1109/78.978374.

Bengtsson, T., C. Snyder, and D. Nychka, 2003: Toward a nonlinear ensemble filter for high-dimensional systems. *J. Geophys. Res.*, **108**, 8775, doi:10.1029/2002JD002900.

Bengtsson, T., P. Bickel, and B. Li, 2008: Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. *Probability and Statistics: Essays in Honor of David A. Freedman*, D. Nolan and T. Speed, Eds., Vol. 2, Institute of Mathematical Statistics, 316–334, doi:10.1214/193940307000000518.

Bickel, P., B. Li, and T. Bengtsson, 2008: Sharp failure rates for the bootstrap particle filter in high dimensions. *Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh*, B. Clarke and S. Ghosal, Eds., Vol. 3, Institute of Mathematical Statistics, 318–329, doi:10.1214/074921708000000228.

Bocquet, M., C. A. Pires, and L. Wu, 2010: Beyond Gaussian statistical modeling in geophysical data assimilation. *Mon. Wea. Rev.*, **138**, 2997–3023, doi:10.1175/2010MWR3164.1.

Boffetta, G., A. Celani, A. Crisanti, and A. Vulpiani, 1997: Predictability in two-dimensional decaying turbulence. *Phys. Fluids*, **9**, 724–734, doi:10.1063/1.869227.

Chorin, A. J., and X. Tu, 2009: Implicit sampling for particle filters. *Proc. Natl. Acad. Sci. USA*, **106**, 17 249–17 254, doi:10.1073/pnas.0909196106.

Chorin, A. J., and M. Morzfeld, 2013: Conditions for successful data assimilation. *J. Geophys. Res. Atmos.*, **118**, 11 522–11 533, doi:10.1002/2013JD019838.

Chorin, A. J., M. Morzfeld, and X. Tu, 2013: A survey of implicit particle filters for data assimilation. *State Space Models: Applications in Economics and Finance*, Y. Zeng and S. Wu, Eds., Springer, 63–88.

Doucet, A., S. Godsill, and C. Andrieu, 2000: Sequential Monte Carlo methods for Bayesian filtering. *Stat. Comput.*, **10**, 197–208, doi:10.1023/A:1008935410038.

Doucet, A., M. Briers, and S. Sénécal, 2006: Efficient block sampling strategies for sequential Monte Carlo methods. *J. Comput. Graph. Stat.*, **15**, 693–711, doi:10.1198/106186006X142744.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. *IEE Proc.-F Radar Signal Process.*, **140**, 107–113.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790, doi:10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811, doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.

Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. *Physica D*, **230**, 112–126, doi:10.1016/j.physd.2006.11.008.

Klaas, M., A. Doucet, and N. de Freitas, 2005: Towards practical Monte Carlo: The marginal particle filter. *Proc. 21st Annual Conf. on Uncertainty in Artificial Intelligence*, Arlington, VA, Association for Uncertainty in Artificial Intelligence, 308–315.

Kong, A., J. Liu, and W. Wong, 1994: Sequential imputations and Bayesian missing data problems. *J. Amer. Stat. Assoc.*, **89**, 278–288, doi:10.1080/01621459.1994.10476469.

Lin, M., R. Chen, and J. S. Liu, 2013: Lookahead strategies for sequential Monte Carlo. *Stat. Sci.*, **28**, 69–94, doi:10.1214/12-STS401.

Liu, J. S., and R. Chen, 1998: Sequential Monte Carlo methods for dynamic systems. *J. Amer. Stat. Assoc.*, **93**, 1032–1044, doi:10.1080/01621459.1998.10473765.

Metref, S., E. Cosme, C. Snyder, and P. Brasseur, 2014: A non-Gaussian analysis scheme using rank histograms for ensemble data assimilation. *Nonlinear Processes Geophys.*, **21**, 869–885, doi:10.5194/npg-21-869-2014.

Morzfeld, M., X. Tu, E. Atkins, and A. J. Chorin, 2012: A random map implementation of implicit filters. *J. Comput. Phys.*, **231**, 2049–2066, doi:10.1016/j.jcp.2011.11.022.

Nakano, S., 2014: Hybrid algorithm of ensemble transform and importance sampling for assimilation of non-Gaussian observations. *Tellus*, **66A**, 21429, doi:10.3402/tellusa.v66.21429.

Papadakis, N., E. Mémin, A. Cuzol, and N. Gengembre, 2010: Data assimilation with the weighted ensemble Kalman filter. *Tellus*, **62A**, 673–697, doi:10.1111/j.1600-0870.2010.00461.x.

Parlett, B. N., 1998: *The Symmetric Eigenvalue Problem.* SIAM, xxiv + 391 pp.

Rabier, F., A. McNally, E. Andersson, P. Courtier, P. Undén, J. Eyre, A. Hollingsworth, and F. Bouttier, 1998: The ECMWF implementation of three-dimensional variational assimilation (3D-Var). Part II: Structure functions. *Quart. J. Roy. Meteor. Soc.*, **124**, 1809–1829, doi:10.1002/qj.49712455003.

Reich, S., 2013: A nonparametric ensemble transform method for Bayesian inference. *SIAM J. Sci. Comput.*, **35**, A2013–A2024, doi:10.1137/130907367.

Rotunno, R., and C. Snyder, 2008: A generalization of Lorenz's model for the predictability of flows with many scales of motion. *J. Atmos. Sci.*, **65**, 1063–1076, doi:10.1175/2007JAS2449.1.

Snyder, C., 2012: Particle filters, the "optimal" proposal and high-dimensional systems. *ECMWF Seminar on Data Assimilation for Atmosphere and Ocean*, Shinfield, United Kingdom, ECMWF, 161–170.

Snyder, C., T. Bengtsson, P. Bickel, and J. Anderson, 2008: Obstacles to high-dimensional particle filtering. *Mon. Wea. Rev.*, **136**, 4629–4640, doi:10.1175/2008MWR2529.1.

van Leeuwen, P. J., 2009: Particle filtering in geophysical systems. *Mon. Wea. Rev.*, **137**, 4089–4114, doi:10.1175/2009MWR2835.1.

van Leeuwen, P. J., 2010: Nonlinear data assimilation in geosciences: An extremely efficient particle filter. *Quart. J. Roy. Meteor. Soc.*, **136**, 1991–1999, doi:10.1002/qj.699.

Weir, B., R. N. Miller, and Y. H. Spitz, 2013: A potential implicit particle method for high-dimensional systems. *Nonlinear Processes Geophys.*, **20**, 1047–1060, doi:10.5194/npg-20-1047-2013.
