1. Introduction
Particle filters are ensemble-based algorithms for data assimilation that, unlike many schemes, make no assumptions about the probability distributions for the prior or the observation errors. One difficulty is that particle filters may suffer from degeneracy, in which the weight assigned to one ensemble member (or particle) converges to one while those assigned to all other members approach zero. Bengtsson et al. (2008), Bickel et al. (2008), and Snyder et al. (2008) (hereafter BBS08) showed that avoiding degeneracy in the most elementary particle filter [essentially, the bootstrap filter of Gordon et al. (1993)] requires an ensemble size that increases exponentially with the variance of the log-likelihood of the observations given each member, which in simple examples is proportional to the system dimension.
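The collapse of the weights with increasing dimension can be reproduced in a few lines. The following is a minimal sketch, not the configuration used in BBS08: it assumes a standard-normal prior, identity observation operator, and unit observation-error variance in every component, and the function name `mean_max_weight` and its defaults are chosen here purely for illustration.

```python
import numpy as np


def mean_max_weight(n_dim, n_particles=100, n_trials=20, seed=0):
    """Average the largest normalized weight over several random trials.

    Assumed toy setup (not taken from the paper): prior N(0, I),
    observations y = x_true + eps with eps ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    max_weights = []
    for _ in range(n_trials):
        truth = rng.standard_normal(n_dim)
        y = truth + rng.standard_normal(n_dim)
        # Bootstrap-style filter: particles are drawn from the prior.
        particles = rng.standard_normal((n_particles, n_dim))
        # Log-likelihood of the observations given each particle.
        log_w = -0.5 * np.sum((y - particles) ** 2, axis=1)
        # Subtract the maximum before exponentiating to avoid underflow.
        log_w -= log_w.max()
        w = np.exp(log_w)
        w /= w.sum()
        max_weights.append(w.max())
    return float(np.mean(max_weights))


for n_dim in (1, 10, 50, 200):
    print(n_dim, round(mean_max_weight(n_dim), 3))
```

With the ensemble size held fixed, the average maximum weight in this toy setting moves toward one as `n_dim` grows, which is the degeneracy described above.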
More general particle filters employ sequential importance sampling, in which the ensemble members at each step are drawn from a proposal distribution. The choice of proposal distribution strongly influences the performance of these filters (e.g., Liu and Chen 1998; Doucet et al. 2000; Arulampalam et al. 2002). Here, we consider the proposal given by the distribution of the present state given the state at the previous step and the most recent observations, which is known as the “optimal” proposal (Doucet et al. 2000). We extend the asymptotic results of BBS08 to the optimal proposal for the case of linear, Gaussian systems and thus demonstrate that, even with that proposal, the required ensemble size will still grow exponentially with an appropriate measure of the problem size.
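For a linear, Gaussian system the optimal proposal is available in closed form, so a single update step can be sketched directly. The code below is an illustration under assumed notation that does not come from the text: dynamics matrix `M`, observation operator `H`, system-noise covariance `Q`, and observation-error covariance `R`. In that setting the proposal is Gaussian with a Kalman-type mean and covariance, and the incremental weight depends only on the previous particle.

```python
import numpy as np


def optimal_proposal_step(x_prev, w_prev, y, M, H, Q, R, rng):
    """One sequential-importance-sampling step with the optimal proposal.

    Assumed model (illustrative notation):
        x_k = M x_{k-1} + eta,  eta ~ N(0, Q)
        y_k = H x_k + eps,      eps ~ N(0, R)
    x_prev has shape (n_particles, n_x); w_prev holds positive weights.
    """
    n_particles, n_x = x_prev.shape
    S = H @ Q @ H.T + R                    # covariance of y_k given x_{k-1}
    K = np.linalg.solve(S, H @ Q).T        # gain K = Q H^T S^{-1}
    P = Q - K @ H @ Q                      # proposal covariance (I - K H) Q
    P = 0.5 * (P + P.T)                    # symmetrize against round-off

    forecast = x_prev @ M.T                # M x_{k-1} for every particle
    innov = y - forecast @ H.T             # y_k - H M x_{k-1}
    mean = forecast + innov @ K.T          # proposal mean for every particle
    x_new = mean + rng.multivariate_normal(np.zeros(n_x), P, size=n_particles)

    # Incremental weight is p(y_k | x_{k-1}) = N(y_k; H M x_{k-1}, S),
    # which does not depend on the newly drawn x_k.
    log_incr = -0.5 * np.sum(innov * np.linalg.solve(S, innov.T).T, axis=1)
    log_w = np.log(w_prev) + log_incr
    log_w -= log_w.max()
    w = np.exp(log_w)
    return x_new, w / w.sum()
```

Because the incremental weight is independent of the new draw, all of the weight variability in this sketch comes from the spread of the previous particles.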
Several particle filters that use sequential importance sampling have recently been developed for geophysical applications. The implicit particle filter (Chorin and Tu 2009; Morzfeld et al. 2012; Chorin et al. 2013) generates the ith new particle by solving an algebraic equation related to the likelihood of the most recent observations given the ith particle at the previous time. It is equivalent to the optimal proposal for linear, Gaussian systems. The equivalent-weights particle filter (van Leeuwen 2010; Ades and Van Leeuwen 2015) also uses the most recent observations in generating the ith new particle, by nudging the trajectory beginning from the ith particle at the previous time toward the new observations. It includes a further step that depends on the new observations, in which most particles are adjusted toward locations with nearly equal importance ratios. Papadakis et al. (2010) also present a particle filter in which the proposal uses the new observations.
For a given system and a given ensemble size, these more sophisticated particle filters often perform much better than the bootstrap filter, but their behavior as the system size increases has not yet been established. In principle, the analysis of BBS08 could be extended to each new filter. We avoid this nontrivial task by demonstrating that, out of the class of particle filters that generate new particles based only on the new observations and the particles generated at the previous step, the optimal proposal minimizes the variance of the (unnormalized) weights over draws of both the previous and new particles. This result extends the usual optimality statement for the optimal proposal, namely, that it minimizes the variance of weights over draws of the new particles. The extended optimality means that the particle filter employing the optimal proposal provides a lower bound for the ensemble size necessary to avoid degeneracy of the weights, a bound that applies to all single-step particle filters that use sequential importance sampling, including the implicit particle filter and the equivalent-weights particle filter.
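One way to see why such a bound is plausible is the law of total variance. The sketch below is a heuristic outline in assumed notation, not necessarily the form of the argument given later in the paper: x_{k-1} is a previous particle, x_k the new particle drawn from a proposal \pi, y_k the new observations, and w the unnormalized incremental weight. It also assumes that \pi is positive wherever the numerator of w is positive.

```latex
% Assumed notation: \pi(x_k | x_{k-1}, y_k) is any single-step proposal;
% expectations are over x_{k-1} and over x_k drawn from \pi, with y_k fixed.
\begin{align*}
w(x_{k-1}, x_k) &= \frac{p(y_k \mid x_k)\, p(x_k \mid x_{k-1})}{\pi(x_k \mid x_{k-1}, y_k)}, \\
E\left[ w \mid x_{k-1} \right] &= \int p(y_k \mid x_k)\, p(x_k \mid x_{k-1})\, dx_k = p(y_k \mid x_{k-1}), \\
\mathrm{var}(w) &= \mathrm{var}\left( E[w \mid x_{k-1}] \right) + E\left[ \mathrm{var}(w \mid x_{k-1}) \right]
\ge \mathrm{var}\left( p(y_k \mid x_{k-1}) \right).
\end{align*}
```

The first term on the right of the last line is the same for every proposal, and the second term vanishes exactly when w does not depend on x_k, that is, when \pi(x_k | x_{k-1}, y_k) = p(x_k | x_{k-1}, y_k), the optimal proposal.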
The outline of the paper is as follows. The next section provides further background on sequential importance sampling and the optimal proposal distribution. In section 3, we show that the asymptotic results of BBS08 also hold for the optimal proposal in the case of linear, Gaussian systems and we examine the behavior of the optimal proposal in a simple test problem with independent and identically distributed (i.i.d.) degrees of freedom. Some of the basic results given here can also be found in Snyder (2012). Section 4 demonstrates that the optimal proposal is optimal in the extended sense described above, while section 5 outlines how system dimension, the system dynamics, and details of the observing network influence the required ensemble size, leading to a back-of-the-envelope assessment of particle filtering for global numerical weather prediction. We conclude with a summary and discussion of our results, including a suggestion that effective particle filters for high-dimensional systems will need to include some form of spatial localization, such as is employed in ensemble Kalman filters.
2. Background
This section briefly reviews sequential importance sampling and the optimal proposal distribution, together with the degeneracy of the particle weights and the asymptotic results of BBS08 related to degeneracy. Readers familiar with these topics can proceed to section 3.
Consider a discrete-time system with state x of dimension
Our goal is to estimate the filtering density
Particle filters are sequential Monte Carlo techniques that represent











It is also possible to develop particle filters that do not employ sequential importance sampling (Klaas et al. 2005; Nakano 2014). While we expect that high-dimensional problems will also be difficult for that class of particle filters, the results we present in sections 3 and 4 do not carry over directly.
An important subtlety in sequential importance sampling is that, for each i,











Since (6) does not depend on
3. The optimal proposal in the context of linear, Gaussian systems
This section demonstrates that the asymptotic arguments of Bengtsson et al. (2008) and Snyder et al. (2008) are applicable to the optimal proposal when the system is linear and Gaussian. It also presents a linear, Gaussian example that illustrates both the asymptotic results and the potential benefits provided by the optimal proposal relative to the standard proposal. Although what follows is restricted to linear, Gaussian systems, the numerical simulations of Slivinski and Snyder (2015, manuscript submitted to Mon. Wea. Rev.) demonstrate that the asymptotic results are also informative in simple nonlinear systems.
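The benefit of the optimal proposal in a linear, Gaussian setting can be previewed numerically by comparing the sample variance of the log-weights under the two proposals. The sketch below assumes a toy system with independent components, x_k = a x_{k-1} + q eta and y_k = x_k + eps with standard-normal eta and eps, and a standard-normal prior at the previous time; these parameter choices and the function name `log_weight_variances` are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np


def log_weight_variances(n_dim, n_particles, a=1.0, q=1.0, seed=0):
    """Sample variance of log-weights for the standard and optimal proposals.

    Assumed toy system (per component): x_k = a x_{k-1} + q eta,
    y_k = x_k + eps, with eta, eps ~ N(0, 1) and x_{k-1} ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    x_prev = rng.standard_normal((n_particles, n_dim))

    # Synthetic truth and observations drawn from the same model.
    truth = a * rng.standard_normal(n_dim) + q * rng.standard_normal(n_dim)
    y = truth + rng.standard_normal(n_dim)

    # Standard proposal: draw x_k from the dynamics, weight by p(y_k | x_k).
    x_std = a * x_prev + q * rng.standard_normal((n_particles, n_dim))
    logw_std = -0.5 * np.sum((y - x_std) ** 2, axis=1)

    # Optimal proposal: weight by p(y_k | x_{k-1}) = N(y_k; a x_{k-1}, q^2 + 1).
    logw_opt = -0.5 * np.sum((y - a * x_prev) ** 2, axis=1) / (q**2 + 1.0)

    return float(logw_std.var()), float(logw_opt.var())


var_std, var_opt = log_weight_variances(n_dim=100, n_particles=2000)
print("standard:", var_std, "optimal:", var_opt)
```

In this toy configuration the log-weight variance under the optimal proposal is smaller than under the standard proposal, and both grow with `n_dim`, consistent with the qualitative picture developed in this section.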
a. Asymptotic relations following Bengtsson et al.
To extend the asymptotic arguments of Bengtsson et al. (2008) and Snyder et al. (2008) to the optimal proposal, we must show that





































Extending these results for the optimal proposal [especially (8)] to non-Gaussian, nonlinear systems hinges on showing that
The linear Gaussian case also provides insight into the potential advantages of the optimal proposal. Comparing (16) and (12) shows that
b. A simple system with i.i.d. degrees of freedom
Consider the linear, Gaussian system (9) with

















We first check the validity of the asymptotic relation (8) for the optimal proposal. Figure 1 shows
Accuracy of the asymptotic relation (8) in numerical simulations using the optimal proposal. To obtain a range of values for
Citation: Monthly Weather Review 143, 11; 10.1175/MWR-D-15-0144.1
Returning to (19), it is immediately clear that
Contours for the ratio of
Even at moderate values of the system noise variance, the decrease of
The predictions of the asymptotic theory are confirmed in Fig. 3, which shows the minimum
The minimum
Although more minor, the optimal proposal has the additional advantage that it yields better estimates for a given value of
(top) The ratio of mean squared error (MSE), averaged over 100 realizations, to the MSE of the optimal, conditional-mean estimate and (bottom) the ratio of the estimated posterior variance to the MSE, both as functions of the expected maximum weight. Results from numerical simulations for the standard proposal (circles) and optimal proposal (dots) are shown, with different points corresponding to different values of
4. Performance bounds from the optimal proposal
We next demonstrate that particle filters using the optimal proposal have minimal degeneracy, first explaining on an informal, intuitive level why this is so and then presenting a rigorous proof.
Our arguments apply to “single step” algorithms for particle filters, that is, algorithms in which the sample at
a. An intuitive view



















In a single-step particle filter, however, the best that we can hope for is that
Comparison of (23) and (21) yields similar intuition. Choosing the optimal proposal makes the second density on the rhs agree, but the proposal lacks the conditioning on
These heuristic arguments indicate that the fundamental limitation of a single-step particle filter is not how cleverly the proposal is chosen but rather that the algorithm does not correct particles at earlier times to reflect new observations.
b. Another look at optimality of the optimal proposal
The foregoing, intuitive argument suggests that the optimal proposal really is the best possible proposal for a single-step particle filter, since it is exactly the second distribution on the rhs of (23). We next give a rigorous result, showing that the optimal proposal minimizes the variance of
A general, inductive proof covering times from














This means that other single-step particle filters will always exhibit degeneracy that is more pronounced than that for the optimal proposal at a given
In contrast to our arguments, Bocquet et al. (2010) compare particle filters using the standard and optimal proposals in an idealized system and find advantages for the standard proposal in certain parameter regimes. The system they consider, however, is deterministic, a setting in which the standard and optimal proposals are identical. Moreover, the filters they implement include a fictitious system noise when drawing from the respective proposals. We conclude that any advantages of the standard proposal in their experiments arise from specific details of those implementations.
c. Block-resampling algorithms
Sections 4a and 4b explain how the performance of single-step particle filters is limited by the need to correct particles at


Block resampling using (27) is one potential way to reduce
Block resampling is not without drawbacks. As was the case for the optimal proposal, block resampling using (27) depends crucially on the specification of the system noise. More important, sampling from the proposal distribution is no longer easy or inexpensive. Indeed, its difficulty approaches that of sampling directly from the posterior distribution as L increases. Although the techniques of Morzfeld et al. (2012) offer promise for the future, implementation of block resampling remains prohibitive at present for many geophysical applications such as numerical weather prediction.
5. 
, number of observations, and system dimension

The interest of our results lies in their implications for particle filters in high-dimensional systems. Equation (8) relates the maximum weight to
In general, however,
This leaves the question of how large
For the standard proposal,
Extending this estimate to the optimal proposal requires assumptions about the magnitude of the system noise appropriate for global NWP. Little is known about the system noise and, indeed, many operational assimilation systems ignore system noise. It, therefore, seems reasonable to assume that the deficiencies of the forecast model are not the dominant contributions to short-range forecast errors. In the context of the simple system, the correct parameter regime is
6. Summary and discussion
This paper has shown that particle filters using the optimal proposal are subject to the asymptotic results of BBS08 that relate the expectation of one over the maximum weight to
Various properties of the optimal proposal can be illustrated in the simple system given in section 3b. The simple system consists of
A second important result (section 4) is that the optimal proposal is optimal in the sense that it minimizes a specific measure of weight degeneracy. More precisely, among all “single step” proposal distributions for
We next consider the relation of our results to the study of Ades and Van Leeuwen (2015). They apply the equivalent-weights particle filter in experiments with simulated two-dimensional turbulence and
How can their results be reconciled with ours that show that the optimal proposal bounds the performance of the equivalent-weights filter? First, the equivalent-weights proposal as implemented in Ades and Van Leeuwen (2015) becomes sharper as
Second, as discussed in section 5, the degeneracy of the weights does not depend directly on
Overall, the optimal proposal offers substantial improvements over the standard proposal when the system noise is not too small, requiring orders of magnitude fewer ensemble members in many moderately large problems. At the same time,
It is important to emphasize that our results hold only for particle filters using sequential importance sampling. Filters that seek to apply importance sampling directly to the marginal, conditional distribution at
In our view, further progress in particle filtering for large, spatially distributed problems, such as global NWP, will rest on the incorporation of some form of spatial localization into the algorithm. Localization (Houtekamer and Mitchell 1998, 2001; Hamill et al. 2001) capitalizes on the common property in geophysical systems that state variables separated by a sufficient distance are nearly independent, and is the key idea that allows the ensemble Kalman filter to perform well with
APPENDIX A
Demonstration that 

Clearly





APPENDIX B
General Proof of Optimality





We show by induction that for proposals of the form (B1), choosing
The arguments of section 4b hold with




















REFERENCES
Ades, M., and P. J. Van Leeuwen, 2015: The equivalent-weights particle filter in a high-dimensional system. Quart. J. Roy. Meteor. Soc., 141, 484–503, doi:10.1002/qj.2370.
Arulampalam, M., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process., 50, 174–188, doi:10.1109/78.978374.
Bengtsson, T., C. Snyder, and D. Nychka, 2003: Toward a nonlinear ensemble filter for high-dimensional systems. J. Geophys. Res., 108, 8775, doi:10.1029/2002JD002900.
Bengtsson, T., P. Bickel, and B. Li, 2008: Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. Probability and Statistics: Essays in Honor of David A. Freedman, D. Nolan and T. Speed, Eds., Vol. 2, Institute of Mathematical Statistics, 316–334, doi:10.1214/193940307000000518.
Bickel, P., B. Li, and T. Bengtsson, 2008: Sharp failure rates for the bootstrap particle filter in high dimensions. Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, B. Clarke and S. Ghosal, Eds., Vol. 3, Institute of Mathematical Statistics, 318–329, doi:10.1214/074921708000000228.
Bocquet, M., C. A. Pires, and L. Wu, 2010: Beyond Gaussian statistical modeling in geophysical data assimilation. Mon. Wea. Rev., 138, 2997–3023, doi:10.1175/2010MWR3164.1.
Boffetta, G., A. Celani, A. Crisanti, and A. Vulpiani, 1997: Predictability in two-dimensional decaying turbulence. Phys. Fluids, 9, 724–734, doi:10.1063/1.869227.
Chorin, A. J., and X. Tu, 2009: Implicit sampling for particle filters. Proc. Natl. Acad. Sci. USA, 106, 17 249–17 254, doi:10.1073/pnas.0909196106.
Chorin, A. J., and M. Morzfeld, 2013: Conditions for successful data assimilation. J. Geophys. Res. Atmos., 118, 11 522–11 533, doi:10.1002/2013JD019838.
Chorin, A. J., M. Morzfeld, and X. Tu, 2013: A survey of implicit particle filters for data assimilation. State Space Models: Applications in Economics and Finance, Y. Zeng and S. Wu, Eds., Springer, 63–88.
Doucet, A., S. Godsill, and C. Andrieu, 2000: Sequential Monte-Carlo methods for Bayesian filtering. Stat. Comput., 10, 197–208, doi:10.1023/A:1008935410038.
Doucet, A., M. Briers, and S. Sénécal, 2006: Efficient block sampling strategies for sequential Monte Carlo methods. J. Comput. Graph. Stat., 15, 693–711, doi:10.1198/106186006X142744.
Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc.-F Radar Signal Process., 140, 107–113.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790, doi:10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811, doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137, doi:10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, doi:10.1016/j.physd.2006.11.008.
Klaas, M., A. Doucet, and N. de Freitas, 2005: Towards practical Monte Carlo: The marginal particle filter. Proc. 21st Annual Conf. on Uncertainty in Artificial Intelligence, Arlington, VA, Association for Uncertainty in Artificial Intelligence, 308–315.
Kong, A., J. Liu, and W. Wong, 1994: Sequential imputations and Bayesian missing data problems. J. Amer. Stat. Assoc., 89, 278–288, doi:10.1080/01621459.1994.10476469.
Lin, M., R. Chen, and J. S. Liu, 2013: Lookahead strategies for sequential Monte Carlo. Stat. Sci., 28, 69–94, doi:10.1214/12-STS401.
Liu, J. S., and R. Chen, 1998: Sequential Monte Carlo methods for dynamic systems. J. Amer. Stat. Assoc., 93, 1032–1044, doi:10.1080/01621459.1998.10473765.
Metref, S., E. Cosme, C. Snyder, and P. Brasseur, 2014: A non-Gaussian analysis scheme using rank histograms for ensemble data assimilation. Nonlinear Processes Geophys., 21, 869–885, doi:10.5194/npg-21-869-2014.
Morzfeld, M., X. Tu, E. Atkins, and A. J. Chorin, 2012: A random map implementation of implicit filters. J. Comput. Phys., 231, 2049–2066, doi:10.1016/j.jcp.2011.11.022.
Nakano, S., 2014: Hybrid algorithm of ensemble transform and importance sampling for assimilation of non-Gaussian observations. Tellus, 66A, 21429, doi:10.3402/tellusa.v66.21429.
Papadakis, N., E. Mémin, A. Cuzol, and N. Gengembre, 2010: Data assimilation with the weighted ensemble Kalman filter. Tellus, 62A, 673–697, doi:10.1111/j.1600-0870.2010.00461.x.
Parlett, B. N., 1998: The Symmetric Eigenvalue Problem. SIAM, xxiv + 391 pp.
Rabier, F., A. McNally, E. Andersson, P. Courtier, P. Undén, J. Eyre, A. Hollingsworth, and F. Bouttier, 1998: The ECMWF implementation of three-dimensional variational assimilation (3D-Var). Part II: Structure functions. Quart. J. Roy. Meteor. Soc., 124, 1809–1829, doi:10.1002/qj.49712455003.
Reich, S., 2013: A nonparametric ensemble transform method for Bayesian inference. SIAM J. Sci. Comput., 35, A2013–A2024, doi:10.1137/130907367.
Rotunno, R., and C. Snyder, 2008: A generalization of Lorenz’s model for the predictability of flows with many scales of motion. J. Atmos. Sci., 65, 1063–1076, doi:10.1175/2007JAS2449.1.
Snyder, C., 2012: Particle filters, the “optimal” proposal and high-dimensional systems. ECMWF Seminar on Data Assimilation for Atmosphere and Ocean, Shinfield, United Kingdom, ECMWF, 161–170.
Snyder, C., T. Bengtsson, P. Bickel, and J. Anderson, 2008: Obstacles to high-dimensional particle filtering. Mon. Wea. Rev., 136, 4629–4640, doi:10.1175/2008MWR2529.1.
van Leeuwen, P. J., 2009: Particle filtering in geophysical systems. Mon. Wea. Rev., 137, 4089–4114, doi:10.1175/2009MWR2835.1.
van Leeuwen, P. J., 2010: Nonlinear data assimilation in geosciences: An extremely efficient particle filter. Quart. J. Roy. Meteor. Soc., 136, 1991–1999, doi:10.1002/qj.699.
Weir, B., R. N. Miller, and Y. H. Spitz, 2013: A potential implicit particle method for high-dimensional systems. Nonlinear Processes Geophys., 20, 1047–1060, doi:10.5194/npg-20-1047-2013.
In those papers,