## 1. Introduction

Ensemble methods for data assimilation are presently undergoing rapid development. The ensemble Kalman filter (EnKF), in various forms, has been successfully applied to a wide range of geophysical systems including atmospheric flows from global to convective scales (Whitaker et al. 2004; Snyder and Zhang 2003), oceanography from global to basin scales (Keppenne et al. 2005), and the land surface (Reichle et al. 2002). Particle filters are another class of ensemble-based assimilation methods of interest in geophysical applications. [See Gordon et al. (1993) or Doucet et al. (2001) for an introduction.]

In their simplest form, particle filters calculate posterior weights for each ensemble member based on the likelihood of the observations given that member. Like the EnKF, particle filters are simple to implement and largely independent of the forecast model, but they have the added attraction that they are, in principle, fully general implementations of Bayes’s rule and applicable to highly non-Gaussian probability distributions. Unlike the EnKF, however, particle filters have so far mostly been applied to low-dimensional systems. This paper examines obstacles to applying particle filters in high-dimensional systems.

Both particle filters and the EnKF are Monte Carlo techniques—they work with samples (i.e., ensembles) rather than directly with the underlying probability density function (pdf). Naively, one would expect such techniques to require ensemble sizes that are large relative to the dimension of the state vector. Experience has shown, however, that this requirement does not hold for the EnKF if localization of the sample covariance matrix is employed (Houtekamer and Mitchell 1998, 2001; Hamill et al. 2001). The feasibility of the EnKF with ensemble sizes much smaller than the state dimension also has theoretical justification. Furrer and Bengtsson (2007) and Bickel and Levina (2008) examine the sample covariance structure for reasonably natural classes of covariance matrices and demonstrate the effectiveness of localizing the sample covariance matrix.

There is much less experience with particle filters in high dimensions. Several studies have presented results from particle filters and smoothers for very low dimensional systems, including that of Lorenz (1963) and the double-well potential (Pham 2001; Kim et al. 2003; Moradkhani et al. 2005; Xiong et al. 2006; Chin et al. 2007). Both van Leeuwen (2003) and Zhou et al. (2006), however, apply the particle filter to higher-dimensional systems. Van Leeuwen (2003) considers a model for the Agulhas Current with dimension of roughly 2 × 10^{5}, and Zhou et al. (2006) use a land surface model of dimension 684. We will return to the relation of our results to their studies in the concluding section.

One might expect that particle filters, which in essence attempt to approximate the full pdf of the state, will be substantially more difficult to apply in high dimensions than the EnKF, which only involves approximation of the mean and covariance. The estimation of continuous pdfs is known to suffer from the “curse of dimensionality,” requiring computations that increase exponentially with dimension (Silverman 1986).

We argue here that high-dimensional particle filters face fundamental difficulties. Specifically, we explore the result from Bengtsson et al. (2008) and Bickel et al. (2008) that, unless the ensemble size is exponentially large in a quantity *τ* ^{2}, the particle-filter update suffers from a “collapse” in which with high probability a single member is assigned a posterior weight close to one while all other members have vanishingly small weights. The quantity *τ* ^{2} is the variance of the observation log likelihood, which depends not only on the state dimension but also on the prior distribution and the number and character of observations. As will be discussed later, *τ* ^{2} may be considered an effective dimension as it is proportional to the dimension of the state vector in some simple examples.

The tendency for collapse of weights has been remarked on previously in the geophysical literature (Anderson and Anderson 1999; Bengtsson et al. 2003; van Leeuwen 2003) and is also well known in the particle-filtering literature, where it is often referred to as “degeneracy,” “impoverishment,” or “sample attrition.” Unlike previous studies, however, we emphasize the collapse of weights as a fundamental obstacle to particle filtering in high-dimensional systems, in that very large ensembles are required to avoid collapse even for system dimensions of a few tens or hundreds.^{1}

Because of the tendency for collapse, particle filters invariably employ some form of resampling or selection step after the updated weights are calculated (e.g., Liu 2001), in order to remove members with very small weights and replenish the ensemble. We do not analyze resampling algorithms in this paper but rather contend that, whatever their efficacy for systems of small dimension and reasonably large ensemble sizes, they are unlikely to overcome the need for exponentially large ensembles as *τ* ^{2} grows. Resampling proceeds from the approximate posterior distribution computed by the particle filter; it does not improve the quality of that approximate posterior.

The particle filter can also be cast in the framework of importance sampling (see Doucet et al. 2001 for an introduction), which allows one to choose the proposal distribution from which the particles are drawn. All the analysis in this paper assumes that the proposal is the prior distribution, a simple and widely used approach. Although the possibility is yet to be demonstrated, clever choices of the proposal distribution may be able to overcome the need for exponentially large ensemble sizes in high-dimensional systems.

The outline of the paper is as follows. In section 2, we review the basics of particle filters. Section 3 illustrates the difficulty of particle filtering when *τ* ^{2} is not small through simulations for the simplest possible example: a Gaussian prior and observations of each component of the state with Gaussian errors, both of which have identity covariance. In section 4, we derive (following Bengtsson et al. 2008) an asymptotic condition on the ensemble sizes that yield collapse when both the prior and observation errors are independent and identically distributed in each component of the state vector. Section 5 extends those results to the more general case of Gaussian priors and Gaussian observation errors. Section 6 briefly discusses the effect of a specific heavy-tailed distribution for the observation error.

## 2. Background on particle filters

Our notation will generally follow that of Ide et al. (1997) except for the dimensions of the state and observation vectors and our use of subscripts to indicate ensemble members.

Let **x** of dimension *N*_{x} be the state of the system represented in some discrete basis, such as the values of all prognostic variables on a regular grid. Since it cannot be determined exactly given imperfect observations, we consider **x** to be a random variable with pdf *p*(**x**).

The subsequent discussion will focus on the update of *p*(**x**) given new observations at some time *t* = *t*_{0}. That is, suppose that we have both a prediction *p*[**x**(*t*_{0})] and a vector of observations **y** that depends on **x**(*t*_{0}) and has dimension *N*_{y}. {To be more precise, *p*[**x**(*t*_{0})] is conditioned on all observations prior to *t* = *t*_{0}. Since all pdfs here pertain to *t* = *t*_{0} and will be conditioned on all previous observations, in what follows we suppress explicit reference to *t*_{0} and the previous observations.} We wish to estimate *p*(**x**|**y**), the pdf of **x** given the observations **y**, which we will term the posterior pdf.

We assume that the observations are related to the state through

**y** = 𝗛(**x**) + *ϵ*,     (1)

where 𝗛 is the observation operator and *ϵ* is the observation error, whose pdf is assumed known. More general observation models are of course possible, but (1) suffices for all the points we wish to make in this paper.

The particle filter begins with an ensemble {**x**^{f}_{i}, *i* = 1, . . . , *N*_{e}} that is assumed to be drawn from *p*(**x**), where the superscript *f* (for “forecast”) indicates a prior quantity. The ensemble members are also known as particles. The update step makes the approximation of replacing the prior density *p*(**x**) by a sum of delta functions, *N*^{−1}_{e}Σ^{Ne}_{i=1}*δ*(**x** − **x**^{f}_{i}). Applying Bayes’s rule yields

*p*(**x**|**y**) ≈ Σ^{Ne}_{i=1} *w*_{i}*δ*(**x** − **x**^{f}_{i}),     (2)

where the posterior weights are given by

*w*_{i} = *p*(**y**|**x**^{f}_{i})/Σ^{Ne}_{j=1} *p*(**y**|**x**^{f}_{j}).     (3)

In the posterior, each member **x**^{f}_{i} is weighted according to how likely the observations would be if **x**^{f}_{i} were the true state.

If one of the likelihoods *p*(**y**|**x**^{f}_{i}) is much larger than the rest, max*w*_{i} will be close to 1 and the particle filter approximates the posterior pdf as a single point mass. The particle-filter estimates of posterior expectations, such as the posterior mean

**x̂**^{a} = Σ^{Ne}_{i=1} *w*_{i}**x**^{f}_{i},     (4)

may then be poor approximations. We will loosely term this situation, in which a single member is given almost all the posterior weight, as a collapse of the particle filter. The goal of our study is to describe the situations in which collapse occurs, both through the rigorous asymptotic results of Bengtsson et al. (2008) for large *N*_{e} and *N*_{y} and through simulations informed by the asymptotics.

## 3. Failure of the particle filter in a simple example

We next consider a simple example, in which the prior distribution *p*(**x**) is Gaussian with each component of **x** independent and of unit variance and the observations **y** are of each component of **x** individually with independent Gaussian errors of unit variance. More concisely, consider *N*_{y} = *N*_{x}, 𝗛 = 𝗜, **x** ∼ *N*(0, 𝗜), and *ϵ* ∼ *N*(0, 𝗜), where the symbol ∼ means “is distributed as” and *N*(*μ*, 𝗣) is the Gaussian distribution with mean *μ* and covariance matrix 𝗣.

Figure 1 shows histograms for max*w*_{i} from simulations of the particle-filter update using *N*_{x} = 10, 30, and 100, and *N*_{e} = 10^{3}. In the simulations, **x**, *ϵ*, and an ensemble {**x**^{f}_{i}, *i* = 1, . . . , *N*_{e}} are drawn from *N*(0, 𝗜). Weights *w*_{i} are then computed from (3). The histograms are based on 10^{3} realizations for each value of *N*_{x}.

The maximum *w*_{i} is increasingly likely to be close to 1 as *N*_{x} and *N*_{y} increase. Large weights appear occasionally in the case *N*_{x} = 10, for which max*w*_{i} > 0.5 in just over 6% of the simulations. Once *N*_{x} = 100, the average value of max*w*_{i} over the 10^{3} simulations is greater than 0.8 and max*w*_{i} > 0.5 with probability 0.9. Collapse of the weights occurs frequently for *N*_{x} = 100 despite the ensemble size *N*_{e} = 10^{3}.

The posterior mean in this example is **x**^{a} = (**x̄**^{f} + **y**)/2, where the superscript *a* (for “analysis”) indicates a posterior quantity and the prior mean **x̄**^{f} = 0 in this example. The expected squared error of **x**^{a} is *E*(|**x**^{a} − **x**|^{2}) = [*E*(|**x̄**^{f} − **x**|^{2}) + *E*(|**y** − **x**|^{2})]/4 = *N*_{x}/2, while that of the observations [*E*(|**y** − **x**|^{2})] is equal to *N*_{x}. The posterior mean estimated by the particle filter, **x̂**^{a}, has squared error of 5.5, 25, and 127 for *N*_{x} = 10, 30, and 100, respectively, when averaged over the simulations. Thus, **x̂**^{a} has error close to that of **x**^{a} only for *N*_{x} = 10. For *N*_{x} = 100, collapse of the weights is pronounced and **x̂**^{a} is a very poor estimator of the posterior mean: it has *larger* errors than either the prior or the observations.

As might be expected, the effects of collapse are also apparent in the particle-filter estimate of posterior variance, which is given by Σ*w*_{i}|**x**^{f}_{i} − **x̂**^{a}|^{2}. The correct posterior variance is given by *E*(|**x** − **x**^{a}|^{2}) = *N*_{x}/2, yet the particle-filter estimates (again averaged over 10^{3} simulations) are 4.7, 10.5, and 19.5 for *N*_{x} = 10, 30, and 100, respectively. Except for *N*_{x} = 10, the particle-filter update significantly underestimates the posterior variance, especially when compared with the squared error of **x̂**^{a}.

The natural question is how large the ensemble must be in order to avoid the complete failure of the update. This example is tractable enough that the answer may be found by direct simulation: for various *N*_{x}, we simulate with *N*_{e} = 10 × 2^{k} and increase *k* until the average squared error of **x̂**^{a} is less than that of the prior or the observations. We emphasize that this merely requires that the particle-filter estimate of the state is no worse than simply relying on the observations or the prior alone (i.e., that the particle filter “does no harm”). The *N*_{e} required to reach this minimal threshold is shown as a function of *N*_{x} (or *N*_{y}) in Fig. 2.

The required *N*_{e} appears to increase exponentially in *N*_{x}. The limitations this increase places on implementations of the particle filter are profound. For *N*_{x} = *N*_{y} = 90, somewhat more than 3 × 10^{5} ensemble members are needed. Ensemble sizes for larger systems can be estimated from the best-fit line shown in Fig. 2. Increasing *N*_{x} and *N*_{y} to 100 increases the necessary ensemble size to just under 10^{6}, while *N*_{x} = *N*_{y} = 200 would require 10^{11} members.
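The search for the minimal ensemble size can be sketched the same way. The code below is again an illustrative sketch of ours (the realization count, the safety cap, and the starting size are our choices): it doubles *N*_{e} until the particle-filter posterior mean “does no harm,” i.e., until its mean squared error drops below *N*_{x}, the error of the prior mean or of the observations alone:

```python
import numpy as np

def pf_error(n_x, n_e, rng):
    """Squared error of the particle-filter posterior-mean estimate."""
    x_true = rng.standard_normal(n_x)
    y = x_true + rng.standard_normal(n_x)
    ens = rng.standard_normal((n_e, n_x))
    logw = -0.5 * ((y - ens) ** 2).sum(axis=1)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return ((w @ ens - x_true) ** 2).sum()

def required_ne(n_x, n_real=150, seed=0, cap=10**5):
    """Smallest N_e = 10 * 2**k whose mean error beats prior and observations."""
    rng = np.random.default_rng(seed)
    n_e = 10
    while n_e < cap:
        err = np.mean([pf_error(n_x, n_e, rng) for _ in range(n_real)])
        if err < n_x:        # the "does no harm" threshold
            break
        n_e *= 2
    return n_e

r5, r25 = required_ne(5), required_ne(25)
print(r5, r25)
```

Even this coarse search reproduces the qualitative message of Fig. 2: the required *N*_{e} grows rapidly with *N*_{x}.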

The exponential dependence of *N*_{e} on *N*_{x} is also apparent in other aspects of the problem. Figure 3 shows the minimum *N*_{e} such that the maximum *w*_{i} (averaged over 400 realizations) is less than a specified value. For each of the values 0.6, 0.7, and 0.8, the required *N*_{e} increases approximately exponentially with *N*_{x}.

_{x}## 4. Behavior of weights for large *N*_{y}

_{y}

The previous example highlights potential difficulties with the particle-filter update but does not permit more general conclusions. Results of Bengtsson et al. (2008), outlined in this section and the next, provide further guidance on the behavior of the particle-filter weights. Our discussion will be largely heuristic; we refer the reader to Bengtsson et al. for more rigorous and detailed proofs.

### a. Approximation of the observation likelihood

Suppose that each component *ϵ*_{j} of *ϵ* is independent and identically distributed (i.i.d.) with density *f*(). Then for each member **x**^{f}_{i}, the observation likelihood can be written as

*p*(**y**|**x**^{f}_{i}) = Π^{Ny}_{j=1} *f*[*y*_{j} − (𝗛**x**^{f}_{i})_{j}],     (5)

where *y*_{j} and (𝗛**x**^{f}_{i})_{j} are the *j*th components of **y** and 𝗛**x**^{f}_{i}, respectively. An elementary consequence of (5) is that, given **y**, the likelihood depends only on *N*_{y}, *f*() and the prior as reflected in the observed variables 𝗛**x**. There is no direct dependence on the state dimension *N*_{x}.

_{x}*ψ*() = log

*f*(),where

*V*

_{ij}= −

*ψ*[

*y*

_{j}− (𝗛

**x**

^{f}

_{i})

_{j}], the negative log likelihood of the

*j*th component of the observation vector given the

*i*th ensemble member. It is convenient to center and scale the argument of the exponent in (6) by definingwhereThen (6) becomeswhere

*S*has zero mean and unit variance. The simplest situation (as in the example of section 3) is when the random variables

_{i}*V*

_{ij},

*j*= 1, . . . ,

*N*

_{y}, are independent given

**y**, so thatBecause

*S*is a sum of

_{i}*N*random variables, its distribution will often be close to Gaussian if

_{y}*N*is large. When

_{y}*V*

_{ij},

*j*= 1, . . . ,

*N*

_{y}, are independent given

**y**, the distribution of

*S*on any fixed, finite interval approaches the standard Gaussian distribution for large

_{i}*N*if the Lindeberg condition holds with probability tending to 1 (see Durret 2005, section 2.4a). More generally, the approximate normality of

_{y}*S*holds for any observation error density

_{i}*f*() such that ∫

*f*

^{1−ϵ}(

*t*)

*dt*is finite for some

*ϵ*> 0 and when the

*V*are not i.i.d. but have sufficiently similar distributions and are not too dependent (see Bengtsson et al. 2008). We note in passing that the requirement that the

_{ij}*V*be not too dependent as

_{ij}*N*increases means that

_{y}*N*must become large as well and also that the components of the state vector are not strongly dependent. The pdf of the

_{x}**x**

^{f}

_{i}must also have a moment-generating function, but is otherwise unconstrained. We will return to the role of

*N*in the collapse later.

Equation (8) together with the approximation *S*_{i} ∼ *N*(0, 1) is the basis for the asymptotic conditions for collapse derived in section 4b. They allow statements about the asymptotic behavior of the likelihood, and thus of the *w*_{i}, for large sample sizes *N*_{e} and large numbers of observations *N*_{y}, using asymptotic results for large samples from the standard normal distribution.

Showing that the approximation *S*_{i} ∼ *N*(0, 1) is adequate for our purposes is nontrivial, since the behavior in the tails of the distribution is crucial to the derivations but convergence to a Gaussian is also weakest there. The interested reader will find details and proofs in Bengtsson et al. (2008). In fact, the approximation is adequate when the *S*_{i} are distributed as noncentral *χ* ^{2} variables with *N*_{y} degrees of freedom, which is exactly the case when the observations themselves are Gaussian. As the study of Bengtsson et al. (2008) shows, the adequacy of the Gaussian approximation for the *S*_{i} also holds if *ψ* = log*f* has a moment-generating function, for instance if *f* is Cauchy. In what follows, however, we will assume that *S*_{i} ∼ *N*(0, 1) holds in a fashion that makes succeeding manipulations valid.
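The near normality of *S*_{i} is easy to examine numerically. The sketch below (ours, not from the original study) does so for the identity Gaussian example of section 3, where the conditional moments *μ* and *τ* ^{2} of (7b) have simple closed forms given **y** (special cases of the formulas derived in section 5):

```python
import numpy as np

# Standardize the summed negative log likelihoods of a large ensemble and
# check that the result is close to N(0, 1) for large N_y.
rng = np.random.default_rng(0)
n_y, n_e = 200, 20_000

y = rng.standard_normal(n_y) + rng.standard_normal(n_y)  # y = x + eps
ens = rng.standard_normal((n_e, n_y))                    # members ~ N(0, I)
v_sum = 0.5 * ((y - ens) ** 2).sum(axis=1)               # sum_j V_ij

mu = 0.5 * (y ** 2 + 1.0).sum()        # E(sum_j V_ij | y) for this example
tau = np.sqrt((0.5 + y ** 2).sum())    # sd(sum_j V_ij | y) for this example
s = (v_sum - mu) / tau                 # Eq. (7a)

print(s.mean(), s.var(), (s < 0).mean())
```

The sample mean and variance of *S*_{i} come out close to 0 and 1, and the sample is nearly symmetric about zero, as the central limit theorem suggests for *N*_{y} this large.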

### b. Heuristic derivation of conditions for collapse

From (8), the member with the largest weight is the one with the smallest *S*_{i}, so the maximum weight *w*_{(Ne)} can be expressed as

*w*_{(Ne)} = {Σ^{Ne}_{i=1} exp[*τ*(*S*_{(1)} − *S*_{(i)})]}^{−1},     (10)

where *S*_{(i)} is the *i*th-order statistic of the sample {*S*_{i}, *i* = 1, . . . , *N*_{e}}.^{2} Defining

*T* = Σ^{Ne}_{i=2} exp[*τ*(*S*_{(1)} − *S*_{(i)})],     (11)

we then have 1/*w*_{(Ne)} = 1 + *T*. Collapse of the particle-filter weights occurs when *T* approaches zero.

We estimate *E*(*T*) for large *N*_{e} and *N*_{y} by approximating *E*[*T*|*S*_{(1)}] and then taking an expectation over the distribution of *S*_{(1)}. For an expectation conditioned on *S*_{(1)}, the sum in (10) may be replaced by a sum over an unordered ensemble with the condition *S*_{i} > *S*_{(1)}. In that case the expectation of each term in the sum will be identical and

*E*[*T*|*S*_{(1)}] = (*N*_{e} − 1)*E*{exp[*τ*(*S*_{(1)} − *S̃*)]|*S*_{(1)}},     (12)

where *S̃* is drawn from the same distribution as the *S*_{i} but with values restricted to be greater than *S*_{(1)}.

Suppose now that *S*_{i} ∼ *N*(0, 1). Then *S̃* has the density

*φ*(*z*)/Φ̄[*S*_{(1)}],  *z* > *S*_{(1)},

where *φ*() is the density for the standard normal distribution and Φ̄(*x*) = ∫^{∞}_{x} *φ*(*z*) *dz*. Writing the expectation explicitly with the density of *S̃* yields

*E*[*T*|*S*_{(1)}] = (*N*_{e} − 1){Φ̄[*S*_{(1)}]}^{−1} ∫^{∞}_{S(1)} exp[*τ*(*S*_{(1)} − *z*)]*φ*(*z*) *dz*.     (13)

Next, we replace *φ*(*z*) by (2*π*)^{−1/2}exp(−*z*^{2}/2) in the integrand in (13), complete the square in the exponent, and use the definition of Φ̄(*x*) to obtain the following:

*E*[*T*|*S*_{(1)}] = (*N*_{e} − 1) exp(*τS*_{(1)} + *τ* ^{2}/2) Φ̄[*τ* + *S*_{(1)}]/Φ̄[*S*_{(1)}].     (14)

The behavior of *S*_{(1)} for large *N*_{e} follows from classical results for the minimum of a Gaussian sample:^{3} as *N*_{e} → ∞,

*S*_{(1)} ≈ −(2 log*N*_{e})^{1/2}.     (15)

Thus, since *S*_{(1)} is becoming large and negative, Φ̄[*S*_{(1)}] approaches 1 and may be ignored in (14) when calculating the asymptotic behavior of *E*(*T*|*S*_{(1)}).

Consider now the limit in which *τ*/(2 log*N*_{e})^{1/2} → ∞ as *N*_{e} → ∞. In this limit, *τ* + *S*_{(1)} ≈ *τ*[1 − (2 log*N*_{e})^{1/2}/*τ*] → ∞ and so, by the standard approximation to the behavior of Φ̄(*x*) for large *x*,^{4} we find from (14) that

*E*[*T*|*S*_{(1)}] ≈ (*N*_{e} − 1)*φ*[*S*_{(1)}]/(*τ* + *S*_{(1)}).     (16)

But, reversing the reasoning that led to (16) gives *φ*[*S*_{(1)}]/|*S*_{(1)}| ≈ Φ[*S*_{(1)}], where Φ(*x*) = 1 − Φ̄(*x*) is the cumulative distribution function (cdf) for the standard Gaussian. Thus,

*E*[*T*|*S*_{(1)}] ≈ (*N*_{e} − 1)|*S*_{(1)}|Φ[*S*_{(1)}]/(*τ* + *S*_{(1)}),     (17)

as *N*_{e} → ∞.

Taking the expectation over the distribution of *S*_{(1)} then gives

*E*[1/*w*_{(Ne)}] ≈ 1 + *τ* ^{−1}(2 log*N*_{e})^{1/2}.     (19)

To see this, recall that evaluating the cdf of a random variable at the value of the random variable, as in Φ(*S*_{i}), yields a random variable with a uniform distribution on [0, 1]. This property underlies the use of rank histograms as diagnostics of ensemble forecasts (Hamill 2001 and references therein) and is known in statistics as the “probability integral transform.” Thus, Φ[*S*_{(1)}] is distributed as the minimum of a sample of size *N*_{e} from a uniform distribution and *E*{Φ[*S*_{(1)}]} ≈ 1/*N*_{e}. In the next section, we will confirm (19) with direct simulations.
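The probability integral transform argument is simple to verify by direct simulation. The following sketch (ours) averages Φ evaluated at the minimum of repeated Gaussian samples of size *N*_{e}; the average should be close to 1/(*N*_{e} + 1), the exact mean of the minimum of *N*_{e} uniform variates:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_e, n_trials = 500, 4000

def std_normal_cdf(x):
    # Phi(x) expressed through the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mins = rng.standard_normal((n_trials, n_e)).min(axis=1)   # samples of S_(1)
mean_phi = np.mean([std_normal_cdf(m) for m in mins])
print(mean_phi, 1.0 / (n_e + 1))
```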

Equation (19) implies that the particle filter will suffer collapse asymptotically if *N*_{e} ≪ exp(*τ* ^{2}/2). More generally, *N*_{e} must increase exponentially with *τ* ^{2} in order to keep *E*[1/*w*_{(Ne)}] fixed as *τ* increases. This exponential dependence of *N*_{e} on *τ* ^{2} is consistent with the simulation results of section 3, where *τ* ^{2} ∝ *N*_{y}.

In contrast to the most obvious intuition, the asymptotic behavior of *w*_{(Ne)} given in (19) does not depend directly on the state dimension *N*_{x}. Instead, the situation is more subtle: *τ* ^{2}, a measure of the variability of the observation priors, controls the maximum weight. The dimensionality of the state enters only implicitly, via the approximation that *S*_{i} is asymptotically Gaussian, which requires that *N*_{x} be asymptotically large. One can then think of *τ* ^{2} as an equivalent state dimension, in the sense that *τ* ^{2} is the dimension of the identity-prior, identity-observation example (in section 3) that would have the same collapse properties.
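Equation (19) itself can be checked by direct simulation, independent of any particular prior or observation model, by drawing *S*_{i} ∼ *N*(0, 1) and forming weights proportional to exp(−*τS*_{i}). The following sketch (ours; the values of *N*_{e} and *τ* are arbitrary choices in the regime *τ* ≫ (2 log*N*_{e})^{1/2}) compares the simulated *E*[1/*w*_{(Ne)}] − 1 with the prediction:

```python
import numpy as np

# Draw S_i ~ N(0, 1), form weights proportional to exp(-tau * S_i), and
# compare the simulated E[1/w_(Ne)] - 1 with the prediction of Eq. (19).
rng = np.random.default_rng(0)
n_e, tau, n_trials = 1000, 40.0, 2000

inv_wmax = np.empty(n_trials)
for k in range(n_trials):
    s = rng.standard_normal(n_e)
    ratios = np.exp(-tau * (s - s.min()))   # w_i / w_max, cf. Eq. (10)
    inv_wmax[k] = ratios.sum()              # = 1/w_(Ne)

simulated = inv_wmax.mean() - 1.0
predicted = np.sqrt(2.0 * np.log(n_e)) / tau
print(simulated, predicted)
```

For these values the simulated mean of 1/*w*_{(Ne)} − 1 falls close to the prediction, within the accuracy one expects of the asymptotic argument.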

## 5. The Gaussian–Gaussian case

The analysis in the previous section focused on situations in which the log likelihoods for the observations (considered as random functions of the prior) were mutually independent and identically distributed. In general, however, the observation likelihoods need not be i.i.d., since the state variables are correlated in the prior distribution and observations may depend on multiple state variables. In this section, we consider the case of a Gaussian prior, Gaussian observation errors, and linear 𝗛, where analytic progress is possible even for general prior covariances and general 𝗛.

Let the prior **x** ∼ *N*(0, 𝗣) and the observation error *ϵ* ∼ *N*(0, 𝗥). We may assume that both **x** and *ϵ* have mean zero since, if the observations depend linearly on the state, *E*(**y**) = 𝗛*E*(**x**) and *p*(**y**|**x**) is unchanged if **y** is replaced by **y** − *E*(**y**) and **x** by **x** − *E*(**x**).

Whatever the covariance of *ϵ*, the transformation **y**′ = 𝗥^{−1/2}**y** also leaves *p*(**y**|**x**) unchanged but results in cov(*ϵ*′) = cov(𝗥^{−1/2}*ϵ*) = 𝗜. Further simplification comes from diagonalizing cov(𝗥^{−1/2}𝗛**x**) via an additional orthogonal transformation in the observation space. Let **y**″ = 𝗤^{T}**y**′, where 𝗤 is the matrix of eigenvectors of cov(𝗥^{−1/2}𝗛**x**) with corresponding eigenvalues *λ*^{2}_{j}, *j* = 1, . . . , *N*_{y}; then cov(𝗤^{T}𝗥^{−1/2}𝗛**x**) = diag(*λ*^{2}_{1}, . . . , *λ*^{2}_{Ny}), while *ϵ*″ = 𝗤^{T}*ϵ*′ still has identity covariance and *p*(**y**|**x**) is again unchanged because 𝗤 is orthogonal. [Anderson (2001) presents a similar transformation that diagonalizes the problem in terms of the state variables, rather than the observation variables.] We therefore assume, without loss of generality, that

𝗥 = 𝗜 and cov(𝗛**x**) = diag(*λ*^{2}_{1}, . . . , *λ*^{2}_{Ny}),     (20)

and drop primes in the sequel.

### a. Analysis of the observation likelihood

The likelihood *p*(**y**|**x**^{f}_{i}) can be written in terms of a sum over the log likelihoods *V*_{ij} as in (6). In addition, the pdf for each component of the observations is Gaussian with unit variance and, given **x**^{f}_{i}, mean (𝗛**x**^{f}_{i})_{j}. Thus,

*V*_{ij} = [*y*_{j} − (𝗛**x**^{f}_{i})_{j}]^{2}/2 + *c*.     (21)

The additive constant *c* results from the normalization of the Gaussian density and may be omitted without loss of generality, since it cancels in the calculation of the weights *w*_{i}.

Collapse again requires Σ^{Ny}_{j=1} *V*_{ij} to be approximately Gaussian with mean *μ* and variance *τ* ^{2}. Leaving aside for the moment the conditions under which the sum is approximately Gaussian, the mean and variance given **y** of Σ^{Ny}_{j=1} *V*_{ij} can be calculated directly using (20) together with the properties of the standard normal distribution and the fact that the *V*_{ij} are independent as *j* varies [as in (7b)]. This yields

*μ* = Σ^{Ny}_{j=1} (*y*^{2}_{j} + *λ*^{2}_{j})/2

and

*τ* ^{2} = Σ^{Ny}_{j=1} (*λ*^{4}_{j}/2 + *λ*^{2}_{j}*y*^{2}_{j}).

Our interest is in the behavior for typical realizations **y** of the observations. Proceeding rigorously would require taking the expectation of (19) over **y**. Here, we simply assume that expectation may be approximated by replacing *τ* in (19) by its expectation over **y**. Using the fact that *E*(*y*^{2}_{j}) = *λ*^{2}_{j} + 1, we have

*E*(*μ*) = *N*_{y}/2 + Σ^{Ny}_{j=1} *λ*^{2}_{j}     (22a)

and

*E*(*τ* ^{2}) = Σ^{Ny}_{j=1} (3*λ*^{4}_{j}/2 + *λ*^{2}_{j}).     (22b)
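The conditional variance *τ* ^{2} and its expectation (22b) can be checked by Monte Carlo in the transformed coordinates (20). The sketch below (ours; the *λ* sequence is an arbitrary illustrative choice) compares the ensemble variance of the summed log likelihoods with the closed-form expressions:

```python
import numpy as np

# Compare the ensemble variance of sum_j V_ij, for one fixed y, with the
# closed-form conditional variance tau^2, and report the expectation from
# Eq. (22b) alongside.
rng = np.random.default_rng(0)
n_y, n_e = 40, 100_000
lam2 = 1.0 / (1.0 + np.arange(n_y)) ** 0.5     # lambda_j^2 = (1+j)^(-1/2)

# One realization of y in the transformed coordinates of Eq. (20).
y = np.sqrt(lam2) * rng.standard_normal(n_y) + rng.standard_normal(n_y)

# Observed priors (H x_i)_j ~ N(0, lambda_j^2) and the summed log likelihoods.
hx = np.sqrt(lam2) * rng.standard_normal((n_e, n_y))
v_sum = 0.5 * ((y - hx) ** 2).sum(axis=1)

tau2_mc = v_sum.var()                                   # Monte Carlo tau^2
tau2_formula = (0.5 * lam2 ** 2 + lam2 * y ** 2).sum()  # conditional tau^2
e_tau2 = (1.5 * lam2 ** 2 + lam2).sum()                 # Eq. (22b)
print(tau2_mc, tau2_formula, e_tau2)
```

For *λ*_{j} = 1 the last expression reduces to 5*N*_{y}/2, the value used in section 5b.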

Given the eigenvalues *λ*_{1} ≥ *λ*_{2} ≥ . . . , the distribution of *S*_{i} = (Σ*V*_{ij} − *μ*)/*τ* converges to a standard Gaussian as *N*_{y} → ∞ if and only if

max_{j}(*λ*^{2}_{j})/(Σ^{Ny}_{j=1} *λ*^{4}_{j})^{1/2} → 0.     (23)

That is, *S*_{i} converges to a Gaussian when no single eigenvalue or set of eigenvalues dominates the sum of squares: (23) implies that max_{j}(*λ*^{2}_{j})/Σ^{Ny}_{j=1} *λ*^{2}_{j} → 0 as *N*_{y} → ∞. The condition (23) also means that *τ* ^{2} → ∞, which in turn leads to collapse if log*N*_{e}/*τ* ^{2} → 0.

On the other hand, in the case that (23) is not satisfied, the unscaled log likelihood converges to a quantity that does *not* have a Gaussian distribution. Collapse does not occur since the updated ensemble empirical distribution converges to the true posterior as *N*_{e} → ∞, whatever *N*_{y} may be.

### b. Simulations

We first check the prediction for *E*[1/*w*_{(Ne)}] − 1 as a function of *N*_{e} and *N*_{y}, given in (19), for the Gaussian–Gaussian case. For simplicity, let *λ*_{j} = 1, *j* = 1, . . . , *N*_{y} (as in the example of section 3). Then (22b) implies that *E*(*τ* ^{2}) = 5*N*_{y}/2 and (19) becomes

*E*[1/*w*_{(Ne)}] ≈ 1 + [(4/5) log(*N*_{e})/*N*_{y}]^{1/2}.     (24)

This approximation is valid when *N*_{e} is large enough that the sample minimum follows (15) and *N*_{y} is large enough that log(*N*_{e})/*N*_{y} is small. To capture the appropriate asymptotic regime, we have performed simulations with *N*_{e} = *N*^{α}_{y}, *α* = 0.75, 0.875, 1.0, 1.25, with *N*_{y} varying over a dozen values between 600 and 3000, and *E*[1/*w*_{(Ne)}] approximated by averaging over 1000 realizations of the experiment. As can be seen from Fig. 4, *E*[1/*w*_{(Ne)}] − 1 has an approximately linear relation to [log(*N*_{e})/*N*_{y}]^{1/2}, as (24) predicts.

Equation (19) also implies that asymptotic collapse of the particle filter depends only on *τ* rather than the specific sequence {*λ*_{j}, *j* = 1, . . . , *N*_{y}}. To illustrate that *τ* does control collapse, we consider various *λ* sequences by setting *λ*^{2}_{j} = *cj*^{−θ}. In this case, the simulations fix *N*_{y} = 4 × 10^{3} and *N*_{e} = 10^{5} while *θ* takes the values 0.3, 0.5, and 0.7 and *c* is varied such that substituting (22b) in (19) gives 0.01 < *E*[1/*w*_{(Ne)}] − 1 < 0.075. These values are again chosen to capture the appropriate asymptotic regime where the normalized log likelihood *S*_{i} is approximately Gaussian. The expectation *E*[1/*w*_{(Ne)}] is approximated by averaging over 400 realizations of the experiment.

Figure 5 shows results as a function of (2 log*N*_{e})^{1/2}/*τ*. As predicted by (19), *E*[1/*w*_{(Ne)}] depends mainly on *τ* rather than on the specific *λ* sequence. The simulations thus confirm the validity of (19) and, in particular, the control of the maximum weight by *τ*. Nevertheless, some limited scatter around the theoretical prediction remains, which arises from weak dependence of *E*[1/*w*_{(Ne)}] on the *λ* sequence for finite *τ*. We defer to a subsequent study a more detailed examination of the behavior of the maximum weight for finite *τ* and *N*_{e} and the limits of validity of (19).

## 6. Multivariate Cauchy observation-error distribution

Van Leeuwen (2003) proposes the use of a multivariate Cauchy distribution for the observation error to avoid collapse and gives some numerical results supporting his claim. In Bengtsson et al. (2008), analytical arguments as well as simulations indicate that collapse still occurs with such an observation-error distribution, but more slowly. Specifically, they show that *E*(*T*) still approaches zero in the limit log(*N*_{e})/*N*_{y} → 0. The condition for collapse is thus identical to that in the Gaussian–Gaussian case, namely, log(*N*_{e})/*N*_{y} → 0, but the rate of collapse is distinctly slower than those implied by (19) or (24).

The slower collapse can be understood from the structure of the multivariate Cauchy distribution. If *ϵ* has a multivariate Cauchy distribution, then *ϵ* can be written as

*ϵ* = (*z*_{1}, . . . , *z*_{Ny})/|*z*_{Ny+1}|,

where *z*_{1}, . . . , *z*_{Ny+1} are i.i.d. *N*(0, 1). For given *z*_{Ny+1} close to 0, the errors have very long Gaussian tails. This makes collapse harder because the true posterior resembles the prior, implying that the observations have relatively little information.
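This representation also gives a convenient way to sample the multivariate Cauchy distribution. The sketch below (ours) draws errors in this way and checks a basic property of the standard Cauchy marginals, *P*(|*ϵ*_{j}| < 1) = 1/2:

```python
import numpy as np

# Draw multivariate Cauchy observation errors: a Gaussian vector divided by
# the magnitude of one extra independent standard normal variate.  Each
# component is then a standard Cauchy, for which P(|eps_j| < 1) = 1/2.
rng = np.random.default_rng(0)
n_y, n_samples = 5, 100_000

z = rng.standard_normal((n_samples, n_y + 1))
eps = z[:, :n_y] / np.abs(z[:, n_y:])        # multivariate Cauchy draws

frac_inside = (np.abs(eps[:, 0]) < 1.0).mean()
print(frac_inside)
```

The heavy tails are evident in the samples themselves: unlike Gaussian draws, occasional components are enormous, which is what keeps the likelihood from concentrating on a single member quite so quickly.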

## 7. Conclusions

Particle filters have a well-known tendency for the particle weights to collapse, with one member receiving a posterior weight close to unity. We have illustrated this tendency through simulations of the particle-filter update for the simplest example, in which the priors for each of *N*_{x} state variables are i.i.d. and Gaussian, and the observations are of each state variable with independent, Gaussian errors. In this case, avoiding collapse and its detrimental effects can require very large ensemble sizes even for moderate *N*_{x}. The simulations indicate that the ensemble size *N*_{e} must increase exponentially with *N*_{x} in order for the posterior mean from the particle filter to have an expected error smaller than either the prior or the observations. For *N*_{x} = 100, the posterior mean will typically be worse than either the prior or the observations unless *N*_{e} > 10^{6}.

Asymptotic analysis, following Bengtsson et al. (2008) and Bickel et al. (2008), provides precise conditions for collapse either in the case of i.i.d. observation likelihoods or when both the prior and the observation errors are Gaussian (but with general covariances) and the observation operator is linear. The asymptotic result holds when *N*_{e} is large, when *τ* ^{2}, the variance of the observation log likelihood defined in (7b), becomes large, and when the observation log likelihood has an approximately Gaussian distribution. Then, in the limit that *τ* ^{−1}(2 log*N*_{e})^{1/2} → 0, the maximum weight *w*_{(Ne)} satisfies *E*[1/*w*_{(Ne)}] ≈ 1 + *τ* ^{−1}(2 log*N*_{e})^{1/2}. Collapse therefore occurs as *τ* increases unless the ensemble size *N*_{e} grows exponentially with *τ* ^{2}.

In the case that both the prior and observation errors are Gaussian, *τ* ^{2} can be written as a sum over the eigenvalues of the observation-space-prior covariance matrix. The theory then predicts that collapse does not depend on the eigenstructure of the prior covariances, except as that influences *τ*. Simulations in section 5 confirm this result.

It is thus not the state dimension per se that matters for collapse, but rather *τ*, which depends on both the variability of the prior and the characteristics of the observations. Still, one may think of *τ* ^{2} as an effective dimension, as it gives the dimension of the identity-prior, identity-observation Gaussian system (as in section 3) that would have the same collapse properties. This analogy is only useful, however, when the normalized observation log likelihood *S*_{i} defined in (7a) has an approximately Gaussian distribution, which requires that *N*_{x} not be too small.

Our results point to a fundamental obstacle to the application of particle filters in high-dimensional systems. The standard particle filter, which uses the prior as a proposal distribution together with some form of resampling, will clearly require exponentially increasing ensemble sizes as the state dimension increases and thus will be impractical for many geophysical applications. Nevertheless, some limitations of this study will need to be addressed before the potential of particle filtering in high dimensions is completely clear.

First, the simulations and asymptotic theory presented here have not dealt with the most general situation, namely, when the prior and observations are non-Gaussian and have nontrivial dependencies among their components. There is no obvious reason to expect that the general case should have less stringent requirements on *N _{e}* and we speculate that the Gaussian–Gaussian results of section 5 will still be informative even for non-Gaussian systems. Some support for this claim comes from the results of Nakano et al. (2007), who apply the particle filter with a variety of ensemble sizes to the fully nonlinear, 40-variable model of Lorenz (1996). Consistent with Fig. 2, they find that an ensemble size between 500 and 1000 is necessary for the posterior from the particle filter to have smaller rms errors than the observations themselves.

Second, the asymptotic theory pertains to the behavior of the maximum weight, but says nothing about how the tendency for collapse might degrade the quality of the particle-filter update. Indeed, the update may be poor long before the maximum weight approaches unity, as illustrated by Figs. 2 and 3. What is needed is practical guidance on ensemble size for a given problem with finite *N _{x}*, *N _{y}*, and *τ*. Though rigorous asymptotic analysis will be difficult, we anticipate that simulations may provide useful empirical rules to guide the choice of ensemble size.

Third, we have not addressed the possible effects of sequentially cycling the particle filter given observations at multiple instants in time. Overall, cycling must increase the tendency for collapse of the particle filter. The quantitative effect, however, will depend on the resampling strategy, which again makes analytic progress unlikely.

Fourth, we have not considered proposal distributions other than the prior, nor have we considered resampling algorithms, which are frequently employed to counteract the particle filter’s tendency for collapse of the ensemble. We emphasize that resampling strategies that do not alter the update step are unlikely to overcome the need for very large *N _{e}*, since they do not improve the estimate of the posterior distribution, but merely avoid carrying members with very small weights further in the algorithm. It is conceivable that the required *N _{e}* might be reduced by splitting a large set of observations valid at a single time into several batches, and then assimilating the batches serially with resampling after each update step. Alternatively, one might identify states in the past that will evolve under the system dynamics to become consistent with present observations, thereby reducing the need for large ensembles of present states when updating given present observations. Gordon et al. (1993) term this process “editing,” and a similar idea is employed by Pitt and Shephard (1999). Such a scheme, however, would likely demand very large ensembles of past states.
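A minimal sketch of the batch-splitting idea, assuming an identity observation operator and Gaussian observation errors (the function names and the choice of systematic resampling are ours; the text does not prescribe a particular resampling algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

def systematic_resample(weights):
    """Systematic resampling: draw N_e member indices with probability
    proportional to the (normalized) weights, using one uniform offset."""
    n = weights.size
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, n - 1)   # guard against floating-point roundoff

def assimilate_in_batches(ens, y, obs_var, n_batches):
    """Split one observation vector into batches and assimilate them
    serially, resampling after each weight update. Assumes each
    observation measures one state component directly (identity H)."""
    for batch in np.array_split(np.arange(y.size), n_batches):
        loglik = -0.5 * np.sum((ens[:, batch] - y[batch]) ** 2, axis=1) / obs_var
        w = np.exp(loglik - loglik.max())
        ens = ens[systematic_resample(w / w.sum())]  # duplicate high-weight members
    return ens
```

Note that the earlier caveat still applies: resampling between batches only duplicates existing prior draws, so whether this actually reduces the required *N _{e}* remains an open question.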

As noted in the introduction, both van Leeuwen (2003) and Zhou et al. (2006) have applied particle filters to systems of dimension significantly larger than 100. In Zhou et al., however, each update is based on only a single observation (and only 28 observations total are assimilated); assuming that the prior uncertainty is comparable to the observation variance, *τ* ^{2} < 28 in their case and their ensemble sizes of *O*(1000) would be adequate based on Fig. 3. Based on the characteristics of the sea surface height observations assimilated by van Leeuwen, we estimate that the particle-filter update uses *O*(100) observations at each (daily) analysis. Allowing for the possibility that nearby observations are significantly correlated owing to the relatively large scales emphasized by sea surface height, then van Leeuwen’s use of 500–1000 ensemble members would seem to be at the edge of where our results would indicate collapse to occur. Consistent with this, van Leeuwen notes a strong tendency for collapse.

Fundamentally, the particle filter suffers collapse in high-dimensional problems because the prior and posterior distributions are nearly mutually singular, so that any sample from the prior distribution has exceptionally small probability under the posterior distribution. For example, in the Gaussian i.i.d. case, the prior and posterior distributions have almost all their mass confined to the neighborhood of hyperspheres with different radii and different centers. The mutual singularity of different pdfs becomes generic in high dimensions and is one manifestation of the curse of dimensionality.
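The thin-shell picture is a standard property of high-dimensional Gaussians and is easy to verify numerically: for *x* ~ N(0, I _{n}), the radius ‖*x*‖ concentrates near √n, with relative spread shrinking like 1/√n.

```python
import numpy as np

rng = np.random.default_rng(3)

# Radii of standard Gaussian samples: the mean radius grows like sqrt(n)
# while the relative spread of the radius shrinks, so almost all of the
# probability mass lies in a thin spherical shell.
for n in (2, 100, 10000):
    r = np.linalg.norm(rng.standard_normal((2000, n)), axis=1)
    print(n, round(r.mean() / np.sqrt(n), 3), round(r.std() / r.mean(), 4))
```

Two Gaussians with different means or variances therefore concentrate on nearly disjoint shells once *n* is large, which is the mutual singularity invoked above.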

Another way of looking at the cause of collapse is that the weights of different members for any chosen state variable are influenced by *all* observations, even if those observations are nearly independent of the particular state variable. The particle filter thus inherently overestimates the information available in the observations and underestimates the uncertainty of the posterior distribution. Similar problems occur for the EnKF and, for spatially distributed systems with finite correlation lengths (e.g., most geophysical systems), can be reduced by explicitly restricting any observation’s influence to some spatially local neighborhood. This motivates the development of nonlinear, non-Gaussian ensemble assimilation schemes that perform spatially local updates, as in Bengtsson et al. (2003) or Harlim and Hunt (2007).
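The contrast with a spatially local update can be made concrete. The sketch below is schematic and is not the scheme of Bengtsson et al. (2003) or Harlim and Hunt (2007); it simply computes, for each state variable, member log weights from only the observations within a cutoff distance, assuming a 1D periodic grid with one observation per grid point (all assumptions ours):

```python
import numpy as np

rng = np.random.default_rng(4)

def local_log_weights(ens, y, obs_var, cutoff):
    """Schematic local weighting: for each state variable j, compute
    member log weights using only the observations within `cutoff`
    grid points of j on a 1D periodic domain (identity obs operator)."""
    n_e, n_x = ens.shape
    logw = np.zeros((n_x, n_e))
    sep = np.abs(np.arange(n_x)[:, None] - np.arange(n_x)[None, :])
    dist = np.minimum(sep, n_x - sep)            # periodic grid distance
    for j in range(n_x):
        near = dist[j] <= cutoff                 # observations allowed to act on variable j
        logw[j] = -0.5 * np.sum((ens[:, near] - y[near]) ** 2, axis=1) / obs_var
    return logw
```

Each state variable is then weighted by *O*(cutoff) observations rather than all *N _{y}* of them, which keeps the effective dimension of each local update small; the hard part, and the subject of the cited schemes, is recombining such locally weighted members into a coherent global analysis.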

It was T. Hamill who first introduced the lead author to the potential problems with the particle-filter update in high dimensions. This work was supported in part by NSF Grant 0205655.

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. *Mon. Wea. Rev.*, **127**, 2741–2758.

Bender, C. M., and S. A. Orszag, 1978: *Advanced Mathematical Methods for Scientists and Engineers.* McGraw-Hill, 593 pp.

Bengtsson, T., C. Snyder, and D. Nychka, 2003: Toward a nonlinear ensemble filter for high-dimensional systems. *J. Geophys. Res.*, **108** (D24), 8775–8785.

Bengtsson, T., P. Bickel, and B. Li, 2008: Curse of dimensionality revisited: Collapse of the particle filter in very large scale systems. *Probability and Statistics: Essays in Honor of David A. Freedman*, D. Nolan and T. Speed, Eds., Vol. 2, Institute of Mathematical Statistics, 316–334, doi:10.1214/193940307000000518. [Available online at http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.imsc/1207580091.]

Bickel, P., and E. Levina, 2008: Regularized estimation of large covariance matrices. *Ann. Stat.*, **36**, 199–227.

Bickel, P., B. Li, and T. Bengtsson, 2008: Sharp failure rates for the bootstrap particle filter in high dimensions. *Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh*, B. Clarke and S. Ghosal, Eds., Vol. 3, Institute of Mathematical Statistics, 318–329, doi:10.1214/074921708000000228.

Chin, T. M., M. J. Turmon, J. B. Jewell, and M. Ghil, 2007: An ensemble-based smoother with retrospectively updated weights for highly nonlinear systems. *Mon. Wea. Rev.*, **135**, 186–202.

David, H. A., and H. N. Nagaraja, 2003: *Order Statistics.* 3rd ed. John Wiley and Sons, 458 pp.

Doucet, A., N. de Freitas, and N. Gordon, 2001: An introduction to sequential Monte Carlo methods. *Sequential Monte Carlo Methods in Practice*, A. Doucet, N. de Freitas, and N. Gordon, Eds., Springer-Verlag, 2–14.

Durrett, R., 2005: *Probability: Theory and Examples.* 3rd ed. Duxbury Press, 512 pp.

Furrer, R., and T. Bengtsson, 2007: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. *J. Multivar. Anal.*, **98** (2), 227–255.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. *IEE Proc. F*, **140**, 107–113.

Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. *Mon. Wea. Rev.*, **129**, 550–560.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790.

Harlim, J., and B. R. Hunt, 2007: A non-Gaussian ensemble filter for assimilating infrequent noisy observations. *Tellus*, **59A**, 225–237.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Ide, K., P. Courtier, M. Ghil, and A. C. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential, and variational. *J. Meteor. Soc. Japan*, **75** (Special Issue), 181–189.

Keppenne, C. L., M. M. Rienecker, N. P. Kurkowski, and D. A. Adamec, 2005: Ensemble Kalman filter assimilation of temperature and altimeter data with bias correction and application to seasonal prediction. *Nonlinear Processes Geophys.*, **12**, 491–503.

Kim, S., G. L. Eyink, J. M. Restrepo, F. J. Alexander, and G. Johnson, 2003: Ensemble filtering for nonlinear dynamics. *Mon. Wea. Rev.*, **131**, 2586–2594.

Liu, J. S., 2001: *Monte Carlo Strategies in Scientific Computing.* Springer-Verlag, 364 pp.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–148.

Lorenz, E. N., 1996: Predictability: A problem partly solved. *Proc. Seminar on Predictability*, Vol. 1, Reading, Berkshire, United Kingdom, ECMWF, 1–18.

Moradkhani, H., K.-L. Hsu, H. Gupta, and S. Sorooshian, 2005: Uncertainty assessment of hydrologic model states and parameters: Sequential data assimilation using the particle filter. *Water Resour. Res.*, **41**, W05012, doi:10.1029/2004WR003604.

Nakano, S., G. Ueno, and T. Higuchi, 2007: Merging particle filter for sequential data assimilation. *Nonlinear Processes Geophys.*, **14**, 395–408.

Pham, D. T., 2001: Stochastic methods for sequential data assimilation in strongly nonlinear systems. *Mon. Wea. Rev.*, **129**, 1194–1207.

Pitt, M. K., and N. Shephard, 1999: Filtering via simulation: Auxiliary particle filters. *J. Amer. Stat. Assoc.*, **94**, 590–599.

Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2002: Hydrologic data assimilation with the ensemble Kalman filter. *Mon. Wea. Rev.*, **130**, 103–114.

Silverman, B. W., 1986: *Density Estimation for Statistics and Data Analysis.* Chapman and Hall, 175 pp.

Smith, K. W., 2007: Cluster ensemble Kalman filter. *Tellus*, **59A**, 749–757.

Snyder, C., and F. Zhang, 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. *Mon. Wea. Rev.*, **131**, 1663–1677.

van Leeuwen, P. J., 2003: A variance-minimizing filter for large-scale applications. *Mon. Wea. Rev.*, **131**, 2071–2084.

Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data assimilation. *Mon. Wea. Rev.*, **132**, 1190–1200.

Xiong, X., I. M. Navon, and B. Uzunoglu, 2006: A note on the particle filter with posterior Gaussian resampling. *Tellus*, **58A**, 456–460.

Zhou, Y., D. McLaughlin, and D. Entekhabi, 2006: Assessing the performance of the ensemble Kalman filter for land surface data assimilation. *Mon. Wea. Rev.*, **134**, 2128–2142.

^{1} This obstacle is equally relevant to a related class of “mixture” filters in which the prior ensemble serves as the centers for a kernel density estimate of the prior (Anderson and Anderson 1999; Bengtsson et al. 2003; Smith 2007). These filters also involve the calculation of the weight of each center given observations, and thus are subject to similar difficulties.

^{2} In other words, *S*_{(1)} is the minimum of the sample, *S*_{(2)} is the next smallest element, and so on until the maximum, *S*_{(N_{e})}.

^{3} If a random variable *X* depends on a parameter *a*, we write *X*(*a*) = *o _{p}*(1) as (say) *a* → ∞ if Pr(|*X*| ≥ *δ*) → 0 for all *δ* > 0.