## Abstract

This study presents the first application of a localized particle filter (PF) for data assimilation in a high-dimensional geophysical model. Particle filters form Monte Carlo approximations of model probability densities conditioned on observations, while making no assumptions about the underlying error distribution. Unlike standard PFs, the local PF uses a localization function to reduce the influence of distant observations on state variables, which significantly decreases the number of particles required to maintain the filter’s stability. Because the local PF operates effectively using small numbers of particles, it provides a possible alternative to Gaussian filters, such as ensemble Kalman filters, for large geophysical models. In the current study, the local PF is compared with stochastic and deterministic ensemble Kalman filters using a simplified atmospheric general circulation model. The local PF is found to provide stable filtering results over yearlong data assimilation experiments using only 25 particles. The local PF also outperforms the Gaussian filters when observation networks include measurements that have non-Gaussian errors or relate nonlinearly to the model state, like remotely sensed data used frequently in atmospheric analyses. Results from this study encourage further testing of the local PF on more complex geophysical systems, such as weather prediction models.

## 1. Introduction

Ensemble data assimilation strategies are now used frequently for state and parameter estimation in geoscience. Examples include the perturbed observation ensemble Kalman filter (Evensen 1994; Houtekamer and Mitchell 1998), deterministic ensemble square root filters (e.g., Anderson 2001; Bishop and Hodyss 2011; Whitaker and Hamill 2002), and ensemble/variational hybrids (e.g., Hamill and Snyder 2000; Lorenc 2003; Buehner 2005). These methods attempt to provide Monte Carlo estimates of a system’s probability density conditioned on observations, and are formulated by assuming errors for the model state and observations are independent and Gaussian. A major benefit of Gaussian approaches is that they can be made to work effectively using affordable ensemble sizes by accounting for sampling errors in ensemble-estimated covariances. Nevertheless, these methods may not be the best option as computational resources allow for larger ensembles that better resolve non-Gaussian errors.

A recent paper by Poterjoy (2016, hereafter P16) introduces a new localized particle filter (PF) for ensemble data assimilation and forecasting. The filter operates in a manner similar to traditional PFs in that it estimates posterior quantities by reweighting particles (ensemble members) based on the likelihood of observations. Particle filters do not assume a parametric probability density function (PDF) for errors in the model state estimate, which allows them to perform well for most applications when provided with a sufficiently large number of particles; see Doucet et al. (2001) for a review. Unlike conventional PF methods, the local PF reduces the influence of distant observations on the weight calculations via a localization function, similar to covariance localization in ensemble Kalman filters (Houtekamer and Mitchell 1998; Hamill and Whitaker 2001). The resulting filter does not require the number of particles to increase exponentially with the degrees of freedom in the system, as is the case for standard PFs (Bickel et al. 2008; Bengtsson et al. 2008; Snyder et al. 2008).

Localization has already been used in previously proposed PF variants, including the local-local ensemble filter (LLEF) of Bengtsson et al. (2003) and the moment matching filter of Lei and Bickel (2011). These studies present novel approaches of generating localized updates to particles that differ from the local PF described in P16. Nevertheless, their potential for data assimilation in large geophysical models has not yet been demonstrated. Another strategy for maintaining a filter’s stability when the state dimension is large is to sample particles from a proposal distribution, typically conditioned on the most recent observations (Doucet et al. 2001). Methods of this type include the equivalent-weights PF (van Leeuwen 2010) and the implicit PF (Chorin et al. 2010). More recently, Ades and van Leeuwen (2015) applied the equivalent-weights PF to a barotropic vorticity model with 65 500 state variables, demonstrating its ability to prevent filter divergence using only 32 particles for a large model. This method relies on modifying the transition density between observation times to draw particles closer to observations, then sampling from a proposal density that ensures particles have nearly equal weights.

The current study differs from previous work in that it uses a high-dimensional geophysical model to examine potential benefits of a PF-based method over two commonly used Gaussian filters: the perturbed observation ensemble Kalman filter (EnKF; Evensen 1994; Houtekamer and Mitchell 1998), and the ensemble adjustment Kalman filter (EAKF; Anderson 2001). P16 already performed extensive tests of the local PF using the 40-variable model of Lorenz (1996) and found the filter to be a robust method for various observation densities, observation types, and model system dynamics. P16 also show that the local PF is substantially better than Gaussian methods for assimilating dense observation networks that relate nonlinearly to the model state, like satellite and radar measurements collected routinely by environmental observing platforms. Most importantly, the local PF produces stable results using as few as 5 particles for the 40-variable Lorenz system. These results provide an incentive to explore the local PF’s potential for data assimilation applications that would require very large numbers of particles for standard PF methods.

Two aspects of the local PF that could not be investigated using the Lorenz system in P16 are explored here: the filter’s ability to produce physically consistent posterior particles and the scalability of the new method for high-dimensional systems. In geophysical models, state variables are often highly correlated due to physical balances created by the model dynamics. These correlations can exhibit complex structures that change rapidly with time (e.g., Poterjoy and Zhang 2011), which motivates the use of ensembles to approximate model forecast errors. Localization can modify or destroy these correlations (Mitchell et al. 2002; Greybush et al. 2011), so physical balances must be considered when applying localized ensemble filters. Despite this limitation, localized ensemble Kalman filters perform reasonably well for most geophysical applications. The local PF must be shown to be as good or better than Kalman filters at preserving balance to be considered an effective data assimilation method. The filter also needs to be practical for large models, where the number of particles is usually severely limited. For typical modeling applications in atmospheric and oceanic science, computational resources usually allow for particle numbers on the order of 100–1000, which is far from what is needed for standard PF methods. To test whether the local PF is a viable method for high-dimensional geophysical systems, the current study uses yearlong cycling data assimilation experiments with an idealized atmospheric general circulation model (GCM). Similar to P16, the local PF is compared with the EAKF and EnKF for cases in which the linear/Gaussian assumptions of these filters are appropriate and for cases in which they are inappropriate for the assimilation problem at hand.

The manuscript is organized in the following manner. Section 2 introduces the most basic form of particle filtering and presents the local PF algorithm. Section 3 discusses deterministic and probabilistic forecast results from yearlong cycling data assimilation experiments using the EAKF, EnKF, and local PF. The last section summarizes the study and provides a discussion of possible applications for the local PF.

## 2. A particle filter for high-dimensional systems

### a. Sequential importance resampling and its limitations

Particle filters are sequential methods that estimate probability densities of the model state at discrete times conditioned on observations of the true system state (Doucet et al. 2001). In this section, we discuss a PF called the bootstrap filter (Gordon et al. 1993), which generates an equally weighted set of particles from the posterior distribution each time observations are available. To introduce this filter, let denote a random vector of length representing the state variables of the system we wish to estimate at a given time. Also, let be a vector of length containing the available observations at the same time as . These observations are related to the true state through

where *H* is an operator that maps the model state into observation space and *ϵ* is the error in the measurements. Starting from a sample of particles to approximate a prior PDF for (denoted , for ), the filter estimates the posterior probability of as a sum of weighted delta functions: . If the prior particles are equally likely at each filtering time, as will be assumed throughout this study, the posterior weights are given by the likelihood of observations assuming each particle is the true state:

The weights in (2), normalized by (3), are called “importance weights” in the PF literature. In the bootstrap filter, they are used to generate equally likely posterior samples by removing prior particles with low weights and duplicating particles with high weights. The resulting particles then initialize an ensemble forecast for the succeeding cycle.

The bootstrap filter has a number of desirable properties; most notably, it is simple to implement and converges to the true Bayesian solution as approaches . Nevertheless, the number of particles required to prevent the collapse of importance weights onto a single particle increases exponentially with the dimension of the system (Bickel et al. 2008; Bengtsson et al. 2008; Snyder et al. 2008). The filter’s stability for large applications can be improved by sampling from a proposal distribution conditioned on the previous model state and most recent observations. Recent studies by Slivinski and Snyder (2016) and Snyder et al. (2015), however, suggest the weight collapse for high-dimensional systems may still be inevitable, even when the proposal distribution is nearly optimal. Despite this work, methods such as the equivalent-weights PF (van Leeuwen 2010; Ades and van Leeuwen 2015) have shown some skill in high-dimensional systems with relatively small numbers of particles. Snyder et al. (2015) argue that the equivalent-weights PF operates outside the “optimal proposal” context as described in their asymptotic analysis, which is one possible explanation for its stability in large models. The equivalent-weights PF also nudges particles toward observations when transitioning the ensemble to the next observation time; this nudging is performed locally in state space, thus providing some form of localized update to particles.

### b. The local particle filter

Ensemble Kalman filters rely on covariance localization and variance inflation methods to reduce the effects of sampling errors and prevent the ensemble variance from collapsing for small ensembles (Houtekamer and Mitchell 2001; Hamill and Whitaker 2001). Adopting similar strategies, P16 introduced the local PF, which limits the update of particles by the bootstrap filter to regions in the vicinity of observations. The local PF exploits the fact that the signal-to-noise ratio of ensemble correlations between variables in generally decreases as their separation increases. In this section we briefly describe the main parts of the local PF algorithm, and refer readers to P16 for a detailed explanation and derivation of the presented equations.

The local PF assimilates observations serially to generate posterior particles that reflect the local impact of observations. This process requires updating particles after each observation is processed, while simultaneously updating a matrix of weights (denoted ) that reflect the marginal probabilities of each state variable, given all observations up to the current one. The term contains columns, which are populated by vectors of normalized weighting coefficients of length . The calculation of depends only on observations up to the current one in the assimilation sequence, and the prior particles before any observations have been assimilated; these particles are denoted , for . If the observation errors are uncorrelated, the *n*th set of column elements can be estimated sequentially for each observation using the likelihood of each observations given . The weight calculated for the *j*th state variable of the *n*th particle given the *i*th observation, , is given by

Here, the superscript refers to quantities that reflect all observations up to , and the subscripts *n* and *j* on *x* and *ω* indicate the particle index and state variable, respectively. In (4), is a localization function that controls the impact of each observation on the weights; it has a maximum value of 1 when the distance between and is 0, and decays to 0 when and are far apart. We use the fifth-order correlation function given by Eq. (4.10) of Gaspari and Cohn (1999) for . This function resembles a Gaussian, but tapers to zero at a rate controlled by a half-width parameter *r*. Each column of the weight matrix contains the normalized weights , where indicates element-wise division of vectors. Coefficients in reflect the standard PF importance weights at the location of observations, but approach as goes to zero. Section 2c of P16 discusses the motivation for choosing a weight equation of the form given in (4).

The weighting matrix can be used to approximate properties of the posterior distribution:

where indicates element-wise multiplication and is a function for the mean, variance, etc., which operates on column vectors containing the particles. The term provides information regarding the posterior probability distribution for each state variable, but not the joint probability of neighboring state variables. As a result, (6) cannot estimate multivariate properties, such as covariances. This limitation is not problematic for the update steps described below, which use bootstrap resampling steps to provide multivariate corrections.

Provided with (4)–(6), the local PF assimilates observations serially to generate posterior particles. In our description of this process, we will use to denote particles updated by observations up to . The first step of the local PF is to sample particles with replacement based on scalar weights proportional to and merge the sampled particles with prior particles within the localization region.^{1} The sampled particles will be denoted for , where indicates the integer location of each sampled particle in the prior ensemble. One means of forming the posterior particles is to use an update equation of the following form:

where (6) provides the posterior mean . The vectors and control the contribution of the sampled and prior particles within the update region. A solution that satisfies the posterior mean and variance () provided from (6) is given by the equations for the *j*th elements of and :

where ; see P16 for a derivation and further interpretation of the update equations. The update step relies on (4)–(6) to estimate the posterior mean and variance in (8); this part of the algorithm helps maintain consistency between the updated particles and the weights in . Like the bootstrap filter, relationships between posterior state variables are introduced during the sampling step of the algorithm, so (7)–(10) avoid explicit calculations of multivariate quantities, such as covariances. The resulting update contains the same posterior solution as the bootstrap filter near observations, since the right-hand side of (7) converges to as approaches unity. The particles updated via (7) are then assigned equal weights before continuing to the next observation.

After processing all observations, higher-order corrections are then made to the particles using a nonparametric kernel density mapping approach (McGinnis et al. 2015). This step updates the particles so that the marginal probability densities of each state variable match the marginal densities given by . Details of this mapping step are provided in section 3b of P16.

Following the above update steps, the resulting filter provides only an approximation to standard PF methods, which converge to the Bayesian solution as approaches . For this reason, a rigorous mathematical justification for the local PF may not be feasible. When processing each observation, the local PF follows Bayes’s rule for updating variables that have a small spatial displacement from the observation. But the impact of the observation on the remaining state variable updates is tapered within a localization region to avoid the collapse of the posterior variance for variables assumed to be independent of the observation. The shape and extent of this tapering depends on an empirical localization function, so the posterior update within this region no longer follows Bayesian filtering theory. We, however, do not view this limitation as a disadvantage to the EAKF and EnKF methods used in this study, which use localization functions to make similar approximations.

### c. Posterior inflation and observation quality control

For geophysical filtering applications, the number of particles used to approximate the state-space PDF is typically much smaller than the degrees of freedom in the model. Particles are also integrated forward in time using imperfect forecast models with unknown errors. Performing many successive filtering steps without considering these systematic errors can lead to the collapse of ensemble variance over time. Ensemble Kalman filters generally rely on prior or posterior variance inflation methods to compensate for these errors; for example, the EAKF and EnKF in this study use adaptive prior inflation (Anderson 2007). In this study, we stabilize the local PF by maintaining relatively uniform importance weights. A simple, yet effective means of preventing the weights from converging to a single particle is to multiply in (4) and (10) by a scalar *α*, where . We also calculate weights for the sampling component of the update steps using , instead of , so the sampled particles are consistent with near observations. As before, the revised weights are normalized so their sum is equal to unity. These steps enforce a minimum value of for the posterior weights before normalization, which prevents the collapse of the PF when all particles are far from the observation. In addition, the *α* parameter reduces the correction made to particles during the update step in a manner similar to posterior inflation schemes used in some implementations of ensemble Kalman filters (e.g., Zhang et al. 2004; Whitaker and Hamill 2012). This modification affects the sampling of particles and the calculation of and used to merge particles in (7). The resulting change acts to relax the posterior distribution back to the prior when filter degeneracy may occur.

The local PF algorithm applies a sampling step for each observation, thus making the treatment of sampling errors using *α* an important component of the filter. To demonstrate the impact of *α* on the calculation of posterior weights, we present three univariate examples using a simple Gaussian application. For the first case, we generate a sample of 20 particles from and calculate weights based on a single observation, with . We maintain the same samples for the second two cases, but use and for so that fewer particles return high likelihoods. Figures 1a–c show the posterior weights before and after normalization with (i.e., no inflation) for the three cases, illustrating how a greater difference between the observation and particles affects the distribution of weights. In the first example (Fig. 1a), the particles contain relatively uniform weights, with higher values near the mean of . The number of particles with large weights becomes progressively smaller as the mean of is shifted to larger values (Figs. 1b,c), until a single particle has nearly all the weight after normalization. In these experiments, the normalized weights tend to collapse onto a single particle when the average mean squared difference between particles and the observation is larger than the sum of prior and observation error variance. This collapse can occur in real filtering applications if an observation is an outlier or if the forecast model contains a large bias. To prevent the weight collapse in this demonstration, we set , which places a minimum value of 0.01 on the prenormalized weights (see dashed line in Figs. 1d–f). Reducing *α* from 1 to 0.99 has no major effect on the and cases, but forces equal weights between particles when the probability of each particle is small. In practice, the *α* coefficient has the same effect as choosing a with heavy tails—a method that has been used as early as Meinhold and Singpurwalla (1989) for making filters more robust to observation outliers. Therefore, *α* also acts as a quality control mechanism by reducing the effect of observations with low likelihoods.

## 3. Cycling data assimilation experiments

### a. Forecast model: An idealized GCM

To examine the potential of the local PF as a data assimilation method for geophysical models, we apply the filter for a coarse-resolution atmospheric GCM. The model has the same hydrostatic dynamical core as the Geophysical Fluid Dynamics Laboratory AM2 climate model (Anderson et al. 2004), but with no moisture and topography, and no parameterizations for radiation, turbulence, and convection. Instead, the model uses Held and Suarez (1994) forcing, which consists of Newtonian relaxation of the temperature field to a prescribed radiative equilibrium solution and Rayleigh damping of winds near the surface to represent friction. The prognostic variables are zonal and meridional winds (*u* and *υ*), temperature *T*, and surface pressure . The domain consists of 60 longitude × 30 latitude grid points on an Arakawa B grid (Arakawa and Lamb 1977) with 5 vertical sigma levels. The domain resolution is fine enough to support a constant train of baroclinic Rossby waves along mid- and upper latitudes of the model domain. The analysis and prediction of these waves provides a suitable test problem for PF methods, which have known limitations for systems with many degrees of freedom. The same model, as configured here, is available in the DART data assimilation package as the “bgrid-solo” option.

### b. Experiment configuration

This section presents results from three sets of cycling data assimilation experiments comparing the performance of the local PF and two ensemble Kalman filters for the idealized GCM application. In the first of these experiments, the observations contain Gaussian errors and *H* is a linear operator. Prior errors also remain close to Gaussian for this model, providing a scenario in which the assumptions made by Kalman filtering methods are appropriate. To make the filtering problem more challenging, we use non-Gaussian measurement errors in the second experiment and a nonlinear operator for *H* in the third experiment. For each of these cases, we generate 300 synthetic sounding observations every 6 h from a yearlong “truth” simulation using the same resolution and configuration as the forecast model; that is, the model used to propagate particles forward in time between cycles is perfect. The soundings contain measurements of as well as *u*, *υ*, and *T* at each model level, which are created from the truth simulation using *H* with random errors added [see (1)]. We choose the locations of these sounding randomly from a uniform distribution over the domain and maintain the same spatial network of observations for all times.

To initialize our experiments we add small, uncorrelated perturbations to a previously spunup model state, then integrate each perturbed state forward 30 days. The ensemble perturbations grow into large amplitude correlated errors over this period, providing a suitable approximation of a climatological prior probability density for the first data assimilation cycle. All three experiments use a relatively small number of particles (), thus allowing sampling noise to factor into the performance of the filters. Results discussed in this manuscript come from configurations of the EAKF, EnKF, and local PF that are tuned to produce the lowest prior root-mean-square errors (RMSEs) for each experiment. We tuned the filters by performing several sensitivity experiments with various settings for *r* (and *α* for the PF). Though not shown, simulations were performed using *r* values of 0.07, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, and 0.27 rad and *α* values of 0.8, 0.85, 0.9, and 0.99. The EAKF and EnKF do not rely on a relaxation parameter to maintain the filter’s stability, but instead use adaptive prior covariance inflation (Anderson 2007) with prior standard deviations for the inflation coefficient (*γ*) fixed at 0.6.

### c. Linear H, Gaussian likelihood experiment

The first test of the local PF is for a case in which *H* is an operator that performs a linear horizontal interpolation from model space to observation space. The simulated observations contain uncorrelated Gaussian errors, with standard deviations of 1 hPa, 1 K, and 1 m s^{−1} for , *T*, and wind variables, respectively. The purpose of this experiment is to test whether the local PF can provide adequate filtering results for a realistic geophysical system using a small number of particles. We use the three filters to generate posterior particles and 6-h forecasts, which approximate the prior errors for each succeeding assimilation cycle. We then summarize prior error statistics from the EAKF, EnKF, and local PF cases by calculating domain averages of RMSE and ensemble spread for all cycles and variable types (Fig. 2). All three filters require about 30 days to produce time series of global mean errors that are near stationary, so the first 45 days are omitted from the verification. Table 1 also summarizes the results from this experiment (and the nonlinear/non-Gaussian experiments) using RMSEs and spread averaged over the period plotted in Fig. 2. For the linear/Gaussian experiment, all three filters produce the lowest prior RMSEs when the localization radius is set to , and with *α* set to 0.9 for the PF. The resulting prior statistics show that the local PF provides stable results over the entire yearlong cycling period and produces forecast errors that are well below the observation errors. The EAKF and EnKF priors have lower RMSEs than the local PF because parametric methods are less sensitive to sampling errors. This experiment, however, demonstrates that the local PF can provide stable results for large applications using a small number of particles.

Next, we explore how well the PF preserves balance when updating particles each assimilation cycle. Arriving at the same optimal *r* values for the three filters allows for a convenient comparison of the methods when considering factors such as initial condition imbalance, where the amount of localization can have a large influence (Anderson 2012). To examine potential imbalances between data assimilation cycles we adopt a commonly used metric, RMS surface pressure tendency (), which quantifies the pressure adjustment immediately after model initialization. The metric is estimated using tendencies from the first model time step ( s) for each particle. The amount of imbalance after initialization is influenced significantly by the magnitude of analysis increments, which can be estimated roughly by the prior RMSEs. Though not shown, the dependence of RMS on prior RMSEs is seen during the spinup stage of data assimilation. The dependence also becomes evident in experiments performed in the following sections, where the local PF produces prior RMSEs that are comparable or smaller than the ensemble Kalman filter RMSEs. To remove this dependence, we performed an additional experiment in which 6-h ensemble forecasts from the cycling EAKF experiment are used as prior particles for the local PF. We then calculate pressure tendency from forecasts initialized with these posterior local PF states.

Figure 3 shows global mean RMS , averaged over the particles for each cycle of the EAKF and local PF cases, and for the posterior PF particles generated using the EAKF priors; results from the EnKF cycles are omitted because they resemble the EAKF case. We also provide average RMS from the truth simulation (black line) for reference. Values much larger than the true pressure tendency are assumed to come from spurious adjustments made after initialization and are indicative of improperly balanced posterior particles. Particles initialized from the local PF undergo a larger adjustment than EAKF particles because the PF produces a prior forecast that is not as close to the truth as the EAKF in the linear/Gaussian experiment (Fig. 2). The differences in RMS decrease considerably when the PF uses prior particles from the EAKF cycles, which follows from the smaller corrections needed to update the particles (cf. blue and red lines in Fig. 3). Using the same prior particles, imbalances in posterior PF and EAKF updates lead to temporal mean values of pressure tendency that are about 15% higher than global mean RMS calculated from the truth. RMS values from the PF forecasts are within 10% of the EAKF case over the duration of the experiment, suggesting particles initialized by the two methods undergo a comparable amount of adjustment after initialization. The metric shows that the local PF and EAKF perform equally well at preserving balance when updating particles, thus validating the localization and update strategies adopted for the local PF.

### d. Linear H, non-Gaussian likelihood experiment

One advantage of the PF over ensemble Kalman filters is flexibility in choosing the likelihood function for calculating weights. A number of observation types assimilated by atmospheric models contain non-Gaussian errors that are often skewed or bounded; precipitation data present one example. While methods exist for transforming observed data into Gaussian distributions (e.g., Lien et al. 2013), or reformulating the filter update equations to handle certain non-Gaussian PDFs (e.g., Fletcher and Zupanski 2006), the PF simply uses a likelihood function that best fits the measurement errors.^{2} The only requirement is that errors come from a known parametric distribution. Developing a filter that can handle measurement errors in a general manner may allow for a more efficient treatment of observations with known non-Gaussian errors.

For this experiment we apply the same linear *H* as the previous experiment, but use a skew normal distribution to generate observation errors. The skew normal PDF is given by

where *ψ*, *s*, and *κ* are location, scale, and shape parameters, respectively; and is the error function. This distribution has a mean *μ* and variance given by

To create observation errors, we set , , and in (12) and (13), solve for *ψ* and *s*, then sample from (11) using these parameters. Figure 4 compares the skew normal PDF with a Gaussian PDF for the parameters specified here. The resulting samples are added to the simulated truth to form observations. In this experiment, the EAKF and EnKF correctly assume the observation errors have a mean and variance of 0 and 1, respectively, but have no knowledge of the skewness associated with these errors. On the other hand, the PF uses (11) directly in the weight equation, with location, scale, and shape parameters modified to fit the observation likelihoods.

Figure 5 shows prior RMSEs and spread for the experiment using skew normal observation errors. Here, the best configurations of the EAKF, EnKF, and local PF contain the same localization radius as the previous experiment (*r* = 0.24), but with *α* reduced from 0.9 to 0.85 for the PF. Prior errors from the EAKF and EnKF simulations resemble results from the linear/Gaussian case discussed in section 3c. Because the observation errors are unbiased and have the correct error variance, the skewed observation errors lead to similar posterior mean and error variance estimates as the Gaussian likelihood experiment (see the appendix for more details). The relative PF performance, however, is improved over the linear/Gaussian case. This method has the advantage of using the correct PDF for the observation likelihoods, which has a higher peak probability density than a Gaussian PDF with the same variance. The larger maximum in peak probability density allows the local PF to move more particles toward the truth than in the previous Gaussian case, thus leading to lower RMSEs. The performance gains of the local PF are large enough to reduce most of the differences in RMSEs between the three filtering methods (Fig. 5).

Because the observation errors follow a non-Gaussian distribution, forecast errors between cycles exhibit more non-Gaussian behavior than the linear/Gaussian experiment, rendering RMSE and ensemble spread insufficient for quantifying the probabilistic skill of the filters. Therefore, we apply rank histograms to evaluate 6-h ensemble forecasts generated during experiments where the linear/Gaussian assumptions made by the EAKF and EnKF are violated. Rank histograms plot how often the true model state lands in discrete bins formed by ranking particles in ascending order (Anderson 1996; Hamill and Colucci 1996; Talagrand et al. 1997). A flat histogram is a necessary, but insufficient condition for particles to come from the same probability distribution as the true model state, and will be used here to determine potential shortcomings in the filters. The histograms are calculated independently for all model grid points from prior particles generated every 8 cycles (48 h) of the experiments. Using data every 48 h reduces the effects of temporal correlations when estimating whether each histogram exhibits a significant deviation from a uniform probability distribution.

After calculating histograms, we apply a chi-squared test to determine whether deviations from a uniform distribution can be detected at a 99% significance level. Our motivation for using hypothesis testing is to identify regions where nonuniform histograms may exist, before performing a visual inspection of histograms in suspect regions. Figures 6a and 6b show global maps indicating [with crisscrosses (×s)] model grid points where forecasts for midlevel *T* (vertical model level 3) likely contain nonuniform histograms.^{3} Histograms detected to be nonuniform often reveal large forecast biases relative to RMSEs; therefore, we plot time-averaged in addition to results from the hypothesis testing (shaded contours in Fig. 6). Despite providing mean RMSEs that are comparable to the PF, the EAKF and EnKF yield inferior probabilistic forecasts (cf. Figs. 6a,c,e), demonstrating the advantage of using the local PF for this application. Like the ensemble Kalman filters, the PF also contains regions where the chi-squared test indicates nonuniform histograms, due mostly to forecast biases in upper latitudes. One reason for this result is that we configured the filters to provide the lowest ensemble mean RMSEs over the cycling period, with no consideration of probabilistic skill. A part of this process involves finding the localization length scale that produces the best results when used for all 300 sounding observations, despite the fact that the optimal length scale is a function of many factors, including latitude (Anderson 2012). The filters can be tuned to provide better ensemble forecasts than what is shown here, but at the expense of higher RMSEs.

### e. Nonlinear H, Gaussian likelihood experiment

As shown in P16, the benefit of PFs over ensemble Kalman filters is apparent only when ensemble statistics are sufficient for detecting non-Gaussian PDFs. Therefore, distributions that are close to Gaussian require a larger to detect deviations from Gaussianity than distributions that are far from Gaussian. One way of designing an experiment where a small number of particles can detect non-Gaussian distributions is to generate observations that relate nonlinearly to the model state, so that the projection of prior particles into observation space results in samples from a non-Gaussian PDF (see Fig. 5 of P16). For the last test of the local PF, we use an *H* comprising absolute value and natural log functions: . Here, is a time- and spatially averaged base state calculated from the truth simulation. The purpose of this constant is to prevent the observation-space prior variance from collapsing to values close to zero when applying the log operator to large numbers, such as state variables. This step allows us to use the same observation error variance used in the first experiment (cf. section 3c).

Figure 7 shows time series plots of domain-averaged prior RMSEs and spread for the filter experiments using the nonlinear *H*. All three filters produce higher errors than the previous experiment because the new observations provide only indirect measurements of the system state at each filter time. The new *H* causes the EAKF and EnKF updates to be suboptimal, leading to prior RMSEs that are almost twice the local PF RMSEs over the cycling period (Table 1). EAKF forecasts tend to exhibit the largest peaks in mean RMSE between cycles, and in some cases, yield ensemble spread that is much smaller than the true prior errors. One factor leading to elevated errors for the ensemble Kalman filters is the smaller localization half-width needed to maintain stable results; in this case, *r* must be reduced to 0.07 for both the EAKF and EnKF. Though not shown, using larger *r* leads to filter divergence partway through the experiment for both cases. At the same time, the optimal *r* and *α* parameters for the local PF remain unchanged from the previous experiment. P16 also found the local PF to produce reasonable results using the same *r* for linear and nonlinear *H*, because the filter does not need to compensate for systematic errors introduced when Gaussian assumptions are invalid.

We again use rank histograms to examine the probabilistic forecast performance of the three methods. As was the case for the non-Gaussian likelihood experiment, the EAKF and EnKF forecasts have a larger number of grid points with nonuniform histograms than the PF (Figs. 6b,d). For the nonlinear *H* experiment, the result is mostly due to a domainwide cold bias caused by assuming a Gaussian distribution for the observation-space prior PDF. Though not shown, ensemble forecasts (for all three filters) also tend to be underdispersed in the lower latitudes, which is a major factor contributing to the nonuniform histograms near the equator. This result is made worse in the EAKF and EnKF cases, which use a relatively short localization radius to maintain the filters’ stability. The small *r* required for the ensemble Kalman filters is far from optimal in low latitudes, where the Rossby radius of deformation is quite large. The local PF, however, benefits from maintaining a large localization radius for this observation network, which allows for significant improvements over these filters.

## 4. Conclusions

This study describes the first application of a localized PF for data assimilation in a high-dimensional geophysical model. As a benchmark, the local PF is compared with two commonly used deterministic and stochastic ensemble Kalman filters in the DART software package. Particle filters provide benefits over Gaussian filters when linear/Gaussian assumptions limit the effectiveness of Kalman filtering theory. Nevertheless, the number of particles required for standard PFs is known to scale exponentially with the system dimension. The local PF avoids this issue by limiting the influence of observations on the calculation of posterior quantities using a strategy similar to covariance localization in ensemble Kalman filters. To form posterior samples, the filter processes observations serially, updating particles via sampling and merging steps to approximate the posterior mean and covariance resulting from the local influence of observations. The algorithm then uses a probability mapping step to make higher-order corrections not considered during the previous steps. The resulting set of particles reflects samples from a posterior PDF that approximates the Bayesian solution near observations and the prior away from observations. The first step in the local PF algorithm has a computational cost that scales with the number of observations assimilated; likewise, the probability mapping step scales with the number of particles. In typical data assimilation applications, where the number of observations is much greater than the number of particles, the time required to run the filter is dominated by the first part of the algorithm.

Using a relatively small number of particles (), the local PF provides consistently accurate filtering results in yearlong cycling data assimilation experiments performed with a simplified atmospheric GCM. For applications where linear/Gaussian approximations are appropriate, the EAKF and EnKF provide lower RMSEs than the local PF. This result occurs because ensemble Kalman filters correctly assume a Gaussian distribution for prior errors, and use particles to estimate only the mean and covariance of this distribution. The local PF is nonparametric, which makes it more sensitive to sampling errors for small numbers of particles; it also contains a resampling step for each observation, which increases this sensitivity. The number of particles required to produce comparable results with ensemble Kalman filters and the local PF depends mostly on the degrees of freedom depicted by observations within localization regions and the validity of linear/Gaussian approximations. The local PF demonstrates benefits over the EAKF and EnKF when observation errors are selected from a non-Gaussian distribution, and when observations relate nonlinearly to the model state. For the case of non-Gaussian observation errors, this benefit comes in the form of improved probabilistic forecasts over the ensemble Kalman filters, rather than smaller RMSEs. Additional particles are required in order for the PF to achieve lower RMSEs than Gaussian filters for this experiment. Furthermore, posterior particles generated by the local PF are found to produce a comparable level of initial condition imbalance as EAKF and EnKF particles, which justifies the use of localized updates in the algorithm. These results suggest that a nonparametric data assimilation system based on the local PF introduced in P16 may be practical for real geophysical applications.

This study and P16 focus mostly on the potential of PFs for assimilating observations that have nonlinear operators or non-Gaussian measurement errors. Advantages demonstrated by the local PF thus far provide enough incentive to explore its potential in real geophysical modeling applications. Nevertheless, the theoretical benefits of the local PF for highly nonlinear dynamical models have not yet been tested. Future work will focus on applying the local PF in high-dimensional systems that do not require nonlinear measurement operators to yield non-Gaussian priors. One possible application is for mesoscale convective systems in weather models, which contain large gradients in wind, temperature, and moisture that can lead to non-Gaussian errors. Other conceivable uses include land surface models, ocean models, and coupled system models, all of which are governed by nonlinear dynamical systems that are not well constrained by observations.

Moreover, the serial processing of observations by the local PF and its inclusion in the DART software package means that it can be used together with ensemble Kalman filters to obtain a superior data assimilation system. We hypothesize that the best filtering option when small to moderate numbers of particles are available may be to use the PF for observation networks that are problematic for Gaussian filters, while applying a Kalman filter update when linear/Gaussian assumptions are appropriate. Nevertheless, the practical benefits of the local PF need to be explored for more realistic geophysical modeling applications.

## Acknowledgments

This research is sponsored by the NCAR Advanced Study Program. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation. We would like to acknowledge high-performance computing support from Yellowstone (ark:/85065/d7wd3xhc) provided by NCAR’s Computational and Information Systems Laboratory, sponsored by the National Science Foundation.

### APPENDIX

#### Univariate Skew Normal Experiments

Numerical simulations demonstrate the impact of using the PF and the EAKF to assimilate observations with skew normal errors for a univariate application. Here, we perform single observation experiments using the PF and EAKF to generate posterior particles. In each experiment, we sample 500 particles and a truth state from , then generate an observation by adding a random error from a skew normal distribution to the truth. As in section 3d, the observation error comes from a distribution that has a mean, variance, and skewness of 0, 1, and 10, respectively. Figures A1a,b show rank histograms produced from the PF and EAKF experiments, with average RMSE and spread indicated in the upper right of each plot. For reference, we also plot results using the EAKF for a case in which observation errors are Gaussian, but contain the same variance as the skew normal experiment (Fig. A1c). On average, the EAKF assimilation of observations with skew normal errors leads to posterior mean RMSEs that are comparable to the (nonskewed) Gaussian error case. The nonuniform rank histogram produced from the EAKF, however, indicates deficiencies in probabilistic analyses produced by this method for the skew normal observation errors. The PF produces a flat rank histogram and the smallest average RMSEs and spread because it uses the correct likelihood function when calculating particle weights.

## REFERENCES

*Methods in Computational Physics: Advances in Research and Applications*, J. Chang, Ed., Vol. 17, Elsevier, 173–265.

*Probability and Statistics: Essays in Honor of David A. Freedman*, D. Nolan and T. Speed, Eds., Vol. 2, Institute of Mathematical Statistics, 316–334.

*Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh*, B. Clarke and S. Ghosal, Eds., Vol. 3, Institute of Mathematical Statistics, 318–329.

*Sequential Monte Carlo Methods in Practice*, Springer-Verlag, 2–14.

*13th Conf. on Probability and Statistics in the Atmospheric Sciences*, San Francisco, CA, Amer. Meteor. Soc., 51–56.

*Proc. Seminar on Predictability*, Vol. 1, Reading, United Kingdom, ECMWF, 1–18.

*Machine Learning and Data Mining Approaches to Climate Science*, V. Lakshmanan et al., Eds., Springer International Publishing, doi:

*Proc. ECMWF Workshop on Predictability*, Reading, United Kingdom, ECMWF,

## Footnotes

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

^{1}

We use systematic resampling (Kitagawa 1996) for this step, but any type of resampling method can be used.

^{2}

Like the PF, the rank histogram filter (Anderson 2010) also benefits from flexibility in choosing a likelihood function, except variables in state space are updated via a linear regression from the observation-space correction.

^{3}

We performed the same analysis for several other model levels and variables and find midlevel *T* to be representative of how forecasts are verified for most variables.