## Abstract

Particle filters are well known in statistics. They have a long tradition in the framework of ensemble data assimilation (EDA) as well as Markov chain Monte Carlo (MCMC) methods. A key challenge today is to employ such methods in a high-dimensional environment, since the naïve application of the classical particle filter usually leads to filter divergence or filter collapse when applied within the very high dimension of many practical assimilation problems (known as the curse of dimensionality). The goal of this work is to develop a localized adaptive particle filter (LAPF), which follows closely the idea of the classical MCMC or bootstrap-type particle filter, but overcomes the problems of collapse and divergence based on localization in the spirit of the local ensemble transform Kalman filter (LETKF) and adaptivity with an adaptive Gaussian resampling or rejuvenation scheme in ensemble space. The particle filter has been implemented in the data assimilation system for the global forecast model ICON at Deutscher Wetterdienst (DWD). We carry out simulations over a period of 1 month with a global horizontal resolution of 52 km and 90 layers. With four variables analyzed per grid point, this leads to 6.6 × 10^{6} degrees of freedom. The LAPF can be run stably and shows a reasonable performance. We compare its scores to the operational setup of the ICON LETKF.

## 1. Introduction

Data assimilation is concerned with the use of observations in combination with a numerical model to determine the state of a dynamical system. In the framework of weather forecasting, data assimilation has a long history, ranging from the early works of Bjerknes (1904) and Richardson (1922) to modern ensemble data assimilation systems (cf. Bauer et al. 2015). Data assimilation links the model world with reality, usually by using a wide range of observational data to correct the development of the dynamical system step by step through an assimilation cycle and, thus, providing initial conditions for forecasting.

For an introduction into data assimilation methods, we refer to Kalnay (2003), Evensen (2009), Anderson and Moore (2012), van Leeuwen et al. (2015), Reich and Cotter (2015), and Nakamura and Potthast (2015). The history of data assimilation methods used within an operational framework started with optimal interpolation from the 1960s to the 1990s. Variational methods such as three-dimensional variational assimilation (3D-VAR) have been employed operationally since about 1990, with four-dimensional variational assimilation (4D-VAR) since approximately 2000. Variational methods have the strength to calculate a best estimate for either one time slice (3D-VAR) or over some temporal window (4D-VAR). Ensemble data assimilation development started in the mid-1990s, with operational use since about 2010.

The ensemble Kalman filter (EnKF) was developed by Evensen (1994; see also Evensen and van Leeuwen 2000; Evensen 2009). The idea was applied to global numerical weather prediction by Houtekamer and Mitchell (1998, 2001, 2005) and Houtekamer et al. (2005). In the area of geophysical data assimilation, Burgers et al. (1998) developed the theoretical basis of the EnKF methods based on perturbations of the observations. Whitaker and Hamill (2002) proposed the alternative approach, called the ensemble square root filter (EnSRF). It does not use randomly perturbed observations, but formulates a deterministic calculation of the posterior ensemble.

Further variants of ensemble filters include, for example, the singular evolutive extended Kalman filter (SEEKF; Pham et al. 1998), the ensemble adjustment Kalman filter (EAKF; Anderson 2001), and the ensemble transform Kalman filter (ETKF; Bishop et al. 2001). Localization is a key ingredient of the ensemble Kalman filter (Houtekamer and Mitchell 1998; denoted as data selection with a cutoff radius), the work by Brusdal et al. (2003), the local ensemble Kalman filter (LEKF; Ott et al. 2004), and the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007), where all locally available observations are assimilated in one step. Various other forms of explicit filters are developed (e.g., the GIGG filter; Bishop 2016). For an overview of ensemble-based data assimilation methods, we refer to Vetra-Carvalho et al. (2018).

Important current research topics on ensemble Kalman filters are covariance localization and inflation (see van Leeuwen 2003b; Miyoshi et al. 2007; Miyoshi and Sato 2007; Campbell et al. 2010; Greybush et al. 2011; Janjić et al. 2011; Periáñez et al. 2014). Localization for particle filters has been described by, for example, Bengtsson et al. (2003) and van Leeuwen (2003a). Flow-adaptive localization has been described by Bishop and Hodyss (2007, 2009a,b) and Anderson (2007a); multiscale localization by Miyoshi and Kondo (2013); and flow-adaptive inflation by Anderson (2007b, 2009), Li et al. (2009), and Miyoshi (2011). The investigation of large ensembles has been carried out by, for example, Miyoshi et al. (2014).

Leaving the Gaussian regime, for which ensemble Kalman filters are best, particle filters take into account the full nonlinearity of both the model dynamics and observation operators in applications, leading to strongly non-Gaussian distributions on all temporal and spatial scales. Particle filters have a long history in stochastic modeling, where they have been used since the 1960s under the name of iterative Markov chain Monte Carlo (MCMC) methods (Bain and Crisan 2009; Crisan and Rozovskii 2011). The idea is to sample some probability distribution, where the number of samples reflects the local strength of the probability density, their weights are adapted using observations, and then resampling is carried out. Several particle filter methods have been formulated and tested for small-dimensional problems, ranging from early work by Gordon et al. (1993) to the review of van Leeuwen (2009). More recently, Ades and van Leeuwen (2013, 2015) have adapted their equivalent-weights particle filter (EWPF) to high-dimensional systems using a simple relaxation technique and a proposal to ensure equal weights for the particles. Zhu et al. (2016) have further improved the EWPF to their implicit equal weights particle filter (IEWPF). Further, Gaussian mixture models have been developed, based on the estimation of the model error probability distribution by a set of Gaussians (see Hoteit et al. 2008; Stordal et al. 2011 for a hybrid method of a Gaussian mixture and the particle filter). Frei and Künsch (2013) develop a hybrid method for an EnKF and a particle filter. For a recent review, we refer to van Leeuwen (2009) and Reich and Cotter (2015; cf. Nakamura and Potthast 2015).

It is well known that in a high-dimensional framework, particle filters suffer from so-called filter collapse or filter divergence under the curse of dimensionality (cf. van Leeuwen 2010; Snyder et al. 2008, 2015; Bickel et al. 2008). This means that usually, only very few or just one of the ensemble members carry all the weight in the assimilation step. This immediately destroys the diversity of the ensemble and leads to useless behavior when applied in an iterative way. Different ideas have been developed to overcome filter collapse: for example, guiding the particles to the right places in the high-dimensional space (van Leeuwen 2010). Also, using localization for particle filters has become popular (see, e.g., Reich and Cotter 2015; Poterjoy and Anderson 2016). Only recently, Robert et al. (2018) and Poterjoy (2016) started to apply particle filters in an operational environment for a large-scale operational global weather model. Robert et al. (2018) employ the data assimilation coding environment of Deutscher Wetterdienst (DWD) working with the convection-permitting COSMO Model. To avoid filter divergence, they combine the particle filter step with an ensemble Kalman filter step. Poterjoy (2016) works in a convective-scale framework in comparison to the global setup; he also employs localization as a core ingredient of his localized particle filter (see also Poterjoy et al. 2017).

Here, our goal is to develop a localized adaptive particle filter (LAPF) for global atmospheric data assimilation to fit into the framework of a global operational weather prediction model. We want to show that with appropriate adaptation, the classical particle filter can be a stable and useful approach in a large-scale operational framework.

Our reference is the implementation of the LETKF (Hunt et al. 2007) for the Icosahedral Nonhydrostatic (ICON) model. This has been operational at DWD since 20 January 2015. The ICON-LETKF provides initial conditions for the global ensemble prediction system (ICON-EPS) and feeds the dynamic covariance matrix of DWD’s ensemble–variational data assimilation system (EnVAR). The basic ingredients of the LAPF will be described in detail in sections 3b–f. Section 3b describes the classical particle filter; section 3c describes the projection onto ensemble space to calculate the particle weights; section 3d describes a classical resampling step; section 3e describes spread control by optimal spread estimation; and section 3f describes a Gaussian rejuvenation or resampling step, carried out by modulated global draws around each particle after the first classical resampling step.

Further, in section 3a, we describe 1) the set of observation operators that are included in our assimilation tests, 2) the LETKF reference implementation, 3) the localization in observation space, 4) multiplicative inflation and relaxation to prior perturbation (RTPP), 5) the assimilation grid and interpolation, 6) additive covariance inflation, and 7) the incremental analysis update; all these components are important parts of the system.

Adaptive Gaussian rejuvenation in combination with optimal spread estimation acts as spread control for the ensemble, such that filter collapse or filter divergence can be avoided. It ensures the stability of the particle filter, in the sense that the ensemble spread stays in a feasible range, the filter does not crash, and its core task to synchronize the system to real observations works. The ensemble space projection helps to keep the ensemble weights in a reasonable range. The careful treatment of the rejuvenation by modulated global draws in ensemble space around each ensemble member after resampling keeps atmospheric fields intact for further integration.

By running the implementation for a test period of 1 month with a global horizontal resolution of 52 km, 90 vertical layers, and four variables analyzed per grid point (i.e., with state space dimension of approximately *n* = 6.6 × 10^{6}), we demonstrate the feasibility of the method. A comparison of scores with the operational setup shows that the localized adaptive particle filter is able to provide a reasonable atmospheric analysis in a large-scale environment.

In section 2, we give an introduction into the ICON model setup and the ensemble prediction system ICON-EPS. Section 3 describes the localized adaptive particle filter in ensemble space with Gaussian resampling and spread control. The ensemble data assimilation (EDA) suite of DWD includes different tools, in particular multiplicative and additive covariance inflation, and various stochastic schemes for generating spread in surface fields. We survey our system and experimental setup in section 4 and evaluate the assimilation cycle and forecasts in detail in sections 4a and 4b, studying the development of spread, bias, scores, and the stability of the system. Finally, conclusions and further development steps are discussed in section 5.

## 2. The ICON model and ICON-EPS

In this part, we briefly introduce the ICON model in section 2a, the ICON-EPS in section 2b, and the preoperational experimental setup in section 2c.

### a. The ICON model and deterministic forecast system

ICON is the operational global numerical weather prediction (NWP) model of DWD. It is a joint project of DWD and the Max Planck Institute for Meteorology (MPI-M; Zängl et al. 2015). ICON is based on the prognostic variables suggested by Gassmann and Herzog [2008; for further information, see also Reinert et al. (2018)], but instead of using the three-dimensional Lamb transformation, it uses the two-dimensional version to convert the nonlinear momentum advection into a vector-invariant form (Zängl et al. 2015).

The model grid is based on an unstructured triangular grid that is generated by successive refinement of a spherical icosahedron, which consists of 20 equilateral triangles with an edge length of about 7054 km. These triangles are subdivided (e.g., by bisection, trisection) into smaller triangles, leading to a model grid with the desired spatial resolution. The use of this icosahedral grid provides a nearly homogeneous coverage of the globe. After dividing the triangles, the operational grid of the ICON model consists of 2 949 120 triangles on each horizontal level. Each triangle has an average area of 173 km^{2}. This corresponds to the global horizontal resolution of 13 km for the deterministic run. For the operational setup, a two-way nested area over Europe with 6.5-km horizontal resolution is included. The ensembles run with an operational horizontal resolution of 40 km, with a 20-km-resolution nest over Europe.

The scalar prognostic variables (e.g., temperature, humidity) are located in the center of the triangles, whereas the wind components are located at the edge midpoints of the triangles. The most important prognostic variables (e.g., wind, humidity, cloud water, cloud ice, temperature, snow, precipitation) are calculated for all grid cells on 90 terrain-following vertical model levels, which range from the surface up to a height of 75 km, leading to 265 million grid points in the operational setup. Additional prognostic equations are solved over land on seven soil levels for soil temperature and soil water content. If snow is present, several snow variables are also determined. Once per day (at 0000 UTC), the sea surface temperature (just over ice-free ocean) is analyzed from observations and is kept constant during the forecasts. For the sea ice fraction of the ice-covered oceans, we proceed in the same way. However, ice thickness and ice surface temperature are determined by a simple sea ice model.

Beyond the adiabatic processes in the atmosphere (horizontal and vertical transport processes), diabatic processes (e.g., radiation, turbulence) play a major role in NWP. Describing these small-scale processes is part of the physics parameterization of ICON.

Within the operational workflow, we distinguish the data assimilation cycle and the forecast mode. During the data assimilation cycle, a 3-h forecast starting from the previous analyses (the first guess) is blended with all observations valid for a 3-h time window centered at the analysis date. It is important to note that traditionally, the atmospheric analysis calculates increments to four core variables (temperature, humidity, and two wind components) at each grid point in its three-dimensional grid, with pressure adapted according to the hydrostatic equation. The model adapts further prognostic variables itself. Additional two-dimensional fields that are also adapted have been neglected in our variable counts. To obtain an optimal initial state for the subsequent forecasts, we calculate a variational analysis with a dynamic covariance matrix, known as ensemble variational data assimilation (EnVAR). In total, 70% of the covariance matrix is calculated from the ensemble runs based on a LETKF. Further, 30% of the covariance matrix is given by its climatological part based on the National Meteorological Center (NMC) method (Parrish and Derber 1992). The EnVAR and the LETKF were made operational on 20 January 2016.

Based on the analyses for 0000 and 1200 UTC, ICON provides a 180-h forecast in just 1-h wall-clock time. Forecasts over 120 h are based on the analyses of the 0600 and 1800 UTC run, and the 30-h forecasts are based on the 0300, 0900, 1500, and 2100 UTC analyses.

### b. The ICON-EPS

ICON-EPS has run preoperationally since January 2016, providing background error correlations for the operational global EnVAR system of DWD, with operational forecasts including all ensemble products since 17 January 2018. It also has provided boundary conditions for the operational local-area kilometer-scale ensemble data assimilation (KENDA; Schraff et al. 2016) and ensemble prediction system COSMO-DE-EPS (Bouallègue et al. 2013) since March 2017.

The ICON-EPS initial conditions are provided by the LETKF, which is part of the hybrid data assimilation suite LETKF+EnVAR for the ICON model. The ICON-EPS is run up to 180 h at 0000 and 1200 UTC, up to 120 h at 0600 and 1800 UTC, and up to 30 h at the 3-hourly intervals in between, with the main purpose of generating forecasts and ensemble boundary conditions in the short range of up to 30-h lead time every 3 h.

### c. Development environment, experimental setup, and period

The main goal of our work is to investigate the feasibility and performance of a stable particle filter for global numerical weather prediction with the ICON model in the above operational setup. For our experimental test, we chose the period of 1–31 May 2016.

Since quality control is carried out based on the deterministic run, it is always part of experiments. Here, for the development of the LAPF, we choose the established 52-km experimental resolution for the ensemble and 26 km for the deterministic run. In standard DWD experiments, EDA tests are usually run with the operational ensemble size members, which is also our choice for the LAPF development. In this setup, we study the performance and stability of the particle filter in direct comparison to the LETKF-based operational setup.

The development of the LAPF takes place in the data assimilation coding environment of DWD. The suite includes modules for snow analysis (SNOW) every 3 h, sea surface temperature (SST) analysis, and soil moisture analysis (SMA) once per day. The surface analysis consists of separate modules in which, among others, random perturbations are added to the ensemble members. For our experiments, this part has been kept identical to the operational setup (cf. Reinert et al. 2018).

## 3. An LAPF with Gaussian resampling

In this section, we introduce the localized adaptive particle filter with adaptive Gaussian resampling and spread control. First, section 3a describes the operational LETKF implementation, which serves as a reference for comparison and whose core algorithm is replaced by the LAPF algorithm. Section 3b presents the classical particle filter basis of the method, as an important first step, we describe the ensemble transform version of this particle filter in section 3c. Then, we go into details of classical resampling in section 3d and describe the indicator for spread control in section 3e and the adaptive Gaussian resampling or rejuvenation in section 3f. We note that the adaptive Gaussian rejuvenation is carried out on top of classical resampling (i.e., this is an additional tool for spread control added to the classical particle filter).

### a. The operational LETKF implementation

In this section, we first briefly describe the operational ensemble data assimilation method, its components, and setup. For the LAPF implementation, the core LETKF algorithm II is replaced by a particle filter step, keeping the observation handling, quality control, and part of the inflation and localization facilities unchanged.

#### 1) Observation operators and analysis frequency

Observation operators are evaluated at 3-hourly intervals in the cycled data assimilation code. Observation types currently include TEMP, PILOT,^{1} SHIP, SYNOP,^{2} BUOY, wind profiler, aircraft, atmospheric motion vector (AMV),^{3} radio occultations, scatterometer, and satellite radiances. For further information about the observation types, we refer to ECMWF (2014, p. 32). Observational quality control (observation minus first-guess check) and bias correction (for radiances and individual aircrafts) are performed in the deterministic data assimilation system. Bias-corrected observations and quality control flags are then further passed to the ensemble data assimilation system.

#### 2) LETKF

The formulation of the LETKF at DWD is based on the proposal of Hunt et al. (2007). The implementation is shared with the regional KENDA system (Schraff et al. 2016) for the COSMO EPS. The ensemble Kalman filter equations are solved in ensemble space (39 dimensions in case of 40 members). In principle, the Kalman gain matrix uses the background error covariance matrix in order to determine the analysis increment of the ensemble mean and the symmetric square root of the analysis ensemble covariance matrix . Practically, a weight matrix is derived, which is used for the construction of the analysis ensemble as a linear combination of the forecast ensemble members. Since the LAPF implementation directly imitates the LETKF transform, we need to look into more detail here. The operational LETKF system implements Eqs. (20) and (21) of Hunt et al. (2007), that is,

for calculating the mean of the analysis ensemble and given by

where we use the letter *L* for the number of ensemble members and the notation for the linear coefficients of the analysis mean, denotes the analysis covariance in the space of ensemble coefficients, is the observation error covariance matrix, are the observations, is the mean of the model equivalents of the observations, *H* is the observation operator, and is the matrix of ensembles minus mean in observation space. Equation (2) in model space leads to Eqs. (22) and (23) of Hunt et al. (2007):

where is the analysis mean, and is the matrix of ensemble minus its mean. The analysis ensemble is calculated as in (24) of Hunt et al. (2007). We obtain

using

with the symmetric square root denoted by the 1/2 power of the symmetric matrix . We note that a derivation of this algorithm with its links to classical inverse problems theory can also be found in Nakamura and Potthast (2015, chapter 5).

#### 3) Localization on

Localization is performed by calculating independent analyses (weight matrices ) at each analysis grid point using only the observations in the vicinity of that location. The observations are weighted smoothly in dependence on their distance to that point, according to a localization function chosen as the fifth-order polynomial described by Gaspari and Cohn (1999), which is similar to a Gaussian but has compact support. We use a horizontal localization length scale of 300 km and a vertical length scale varying from 0.3 [given in ln(*p*)] at the surface to 0.8 at the model top (75 km). The length scales are defined following Daley (1993), using the second derivative of the localization function *c* at its origin: . For a Gaussian, this coincides with the standard deviation of the distribution. Formally, the inverse of the observation error covariance matrix [in (2), respectively (6)] is weighted by a pointwise multiplication with the function defined by Gaspari and Cohn, such that observations that are located at a larger distance from the current analysis grid point receive less weight when calculating the analysis. Therefore, this procedure is often denoted as localization on .

#### 4) Multiplicative inflation and RTPP

The analysis ensemble spread is adjusted by multiplicative inflation with a factor ranging from 0.9 to 1.5, based on an online estimate of spread and ensemble-mean root-mean-square error (RMSE) in observation space, following Houtekamer et al. (2005). The inflation factor is estimated locally based on the statistics on observation minus first-guess differences, as described in section 3e, and the matrices are adjusted respectively. In addition, an RTPP is applied following Whitaker and Hamill (2012), with a rate of 0.75. The latter preserves a reasonable situation-dependent spread–skill relationship in the analysis cycle.

#### 5) Assimilation grid and interpolation

Tapering the observations with a smooth function and taking the symmetric square root in the LETKF algorithm ensures that the weight matrices only change on scales on the order of (or larger than) the localization length scale. For this reason, it is sufficient to derive the weight matrices on a coarser analysis grid (e.g., Yang et al. 2009), with spacing on the order of this prescribed length scale. Afterward, the are interpolated to the model grid, and the final analyses are derived by taking linear combinations of the forecast ensemble members according to the interpolated weight matrices.

#### 6) Additive covariance inflation

To account for model errors, additive random perturbations consistent with 25% of the amplitude of the climatological matrix used in the deterministic EnVAR assimilation system are added to the analysis ensemble members. In addition, the SST is perturbed by random perturbations of 1 K, which are a linear combination of perturbations with spatial correlation length scales of 100 and 1000 km and a time scale of 1 day.

#### 7) Incremental analysis update

The analysis increments applied by the cycled data assimilation system, as well as the stochastic perturbations, introduce imbalances and spinup effects, which are diminished by using an incremental analysis update (IAU) scheme (see Bloom et al. 1996). The combined analysis increments from the LETKF, inflation schemes, and additive perturbations are added to the model trajectory, dispensing them over a time interval of 3 h, symmetrically adjusted around the analysis date.

### b. The classical particle filter

The concept of the LAPF is chosen to be as parallel as possible to the ensemble transform Kalman filter described in sections 3a(1)–(7) above. Here, we basically replace the analysis step 2 by the steps described in sections 3b–f.

Let us recall the stochastic notation and background of the particle filter (following Nakamura and Potthast 2015), where denotes model states of dimension *n*, and is the vector of *m* observations at time . Bayesian data assimilation starts with some prior distribution. The analysis step employs observations to derive an analysis distribution. This analysis distribution is propagated to the next time step, where it acts as a prior distribution for the subsequent assimilation step. For our implementation, we consider the analysis distribution for time as initial state. Then, the analysis distribution is propagated in time by a short-range ensemble forecast, from to , to get the first-guess distribution . Afterward, the Bayes formula

is employed to calculate the new analysis distribution

where *c* is a normalization constant so that

The classical particle filter uses an ensemble of states, which represents the prior probability distribution at time in the form of *δ* functions. Alternatively, particles are considered as draws from this prior distribution, and in the case of *L* particles, they each carry a weight of 1/*L*. To carry out the analysis step at time , weights are calculated by

for the particles corresponding to (7), where *c* is a normalization constant. We note that sometimes we use the normalization to *L* for easier discussion (i.e., ). Then, for the prior each particle carries the weight 1.

### c. Projection onto ensemble space

In the following, we drop the explicit declaration of the time index *k* and write for the observations at time .

As seen from (1), the LETKF is based on the projection of the observation onto ensemble space. For brevity, we use for . We note that the orthogonal projection of the observation difference onto the ensemble space with respect to the scalar product weighted by in is given by

Compare, for example, lemma 3.2.3 of Nakamura and Potthast (2015) using the adjoint based on the weighted scalar product in and the Euclidean scalar product in the space of ensemble coefficients.^{4} The corresponding particle filter weights (9) at time based on this projection are given by its ensemble transform projection:

with normalization constant *c*. Abbreviating and , we first note

and

where is the standard unit vector, which is 1 in its th component and zero otherwise. Now, the exponent of (11) is transformed into

leading to

The classical weight (9) is known to lead to filter divergence in high-dimensional spaces. Here, the ensemble transform and the projection *P* onto ensemble space lead to a significant reduction of the dimensionality. The observation vector is mapped onto the vector in the space with ensemble size *L*. The weights (11) now penalize the distance of the ensemble members , in to the projection *C* of the observations onto ensemble space. The histograms in Fig. 1 show the result of this projection step, which, in combination with adaptive Gaussian resampling and localization, leads to a feasible behavior of the particle filter weights.

To evaluate the relationship between the classical particle filter weights and the ensemble space particle filter weights, we note

where we use the orthogonality of the projection *P* with respect to the scalar product with weight such that the mixed terms of *P* with vanish. The second exponential factor in the last line of (16) is equal to a constant for all since we have

If the ensemble space spans only a small part of the full state space, the constant can be very small, so the ensemble transformation effectively removes a very small but uniform factor from the ensemble weights.

After the determination of the weights [(15)], the classical resampling (section 3d) is carried out. Then, the spread control (section 3e) will be prepared. Subsequently, the adaptive Gaussian resampling step (section 3f) will be executed.

### d. Classical resampling

The LAPF carries out a classical resampling step based on (15), suggested already by Gordon et al. (1993). For resampling, accumulated weights , , are defined by

where now we employ normalization to the total weight of *L*. Then, similar to Doucet et al. (2001; see also Elsheikh et al. 2014; Crisan and Rozovskii 2011), we draw , , set , and define the transform matrix for the particles by

with , where denotes the interval of values . This is carried out for each analysis grid point ; for brevity, however, we use instead of .

### e. Spread control

In ensemble data assimilation systems, the spread of the ensemble evolves as a result of model dynamics, model errors (represented by additive perturbations or multiplicative inflation), and active observations^{5} and thus relies on a correct specification of model and observational errors. As it is very difficult to properly estimate and model these errors, the spread of the ensemble is adjusted. In the operational LETKF, an adaptive inflation factor *ρ* is estimated, based on statistics of observations minus first guess (cf. Desroziers et al. 2005; Li et al. 2009). For this purpose, we use adaptive Gaussian resampling with parameters based on the estimate of *ρ* in the LAPF. It is derived from the observation minus background () statistics, the current ensemble spread, and the assumed observation error. Its determination is based on

with the true background state , the background state , the linearization of *H*, the vector of observation errors , and the vector of background errors . Then, if the observation errors and background errors are uncorrelated, we obtain

(see Desroziers et al. 2005). To estimate the inflation factor, we substitute the expectation values of the background and observation errors with the actual ensemble covariance matrix multiplied by the inflation factor *ρ* and the nominal covariance of observation error , respectively: and , resulting in

Now, by taking the trace of the matrices on both sides, using , , and , the inflation factor *ρ* is estimated by

following Desroziers et al. (2005) and Li et al. (2009).

Equation (23) computes a scalar inflation factor *ρ* based on a set of observations and the corresponding ensemble spread in observation space. It is carried out locally as a localized ensemble data assimilation method is employed; that is, we calculate

at each point with the local innovation vector , the observation error , and the local estimate of the background error covariance in observation space. The factor is the estimate for the local variance inflation at the analysis point *p* in the LETKF.

Because of the localization procedure, the number of observations used in this method may be small, and the estimated value of *ρ* may be based on limited statistics. To make the estimate more robust, we first limit *ρ* by lower and upper bounds of 0.9 and 1.5 and afterward perform a temporal smoothing. A weighting factor is chosen to combine the estimated by (24) in the current cycle *k* and the used in the previous analysis cycle (3 h in the past) to get the to be applied in the current cycle *k*:

In the LETKF, *ρ* is used at each analysis grid point to calculate the filter transformation matrix by

where is the transform matrix defined in (6). For the localized adaptive particle filter, pure multiplicative inflation is not appropriate, since it would just inflate the distribution of the remaining duplicate ensemble members. Instead, we apply the Gaussian resampling based on (23) and (24), as described below.

### f. Gaussian resampling or rejuvenation

The LAPF first calculates the ensemble weights according to the ensemble space projection (15) and the resampling (19) at each of the analysis points . Usually, this resampling leads to part of the total number of particles only getting the majority of the weights. Often, a rejuvenation step [see Doucet et al. 2001; van Leeuwen et al. 2015, Eq. (2.39)] is carried out around each of the remaining particles (i.e., new particles are generated based on a pseudorandom draw in ensemble space). We note that this rejuvenation can be considered as classical resampling from some posterior represented by a superposition of Gaussian functions in the spirit of classical MCMC methods (Nakamura and Potthast 2015, chapters 4 and 5).

In our implementation of the LAPF, we draw from a Gaussian distribution around each remaining particle in ensemble space, used with appropriate multiplicity as constructed by in (19). Using a Gaussian with mean given by the column vector of and with a covariance matrix , this leads to a draw from a distribution in physical space with mean given by the ensemble and the rescaled ensemble covariance matrix .

#### 1) Global rejuvenation

A pseudorandom matrix with each element draw from a Gaussian distribution with mean zero and covariance 1 is chosen globally to ensure the best possible continuity of the meteorological variables in physical space. Since resampling is global, without modulating the random matrix , the resulting perturbations would be scaled superpositions of the original ensemble members, keeping linear balances completely and nonlinear balances to some degree. However, adaptivity of the pseudorandom draws turns out to be crucial to avoid filter divergence and filter collapse. Only after adapting the spread of the rejuvenation or resampling carefully as follows did the filter start to be stable.

#### 2) Adaptivity

For the adaptive resampling step, the size of the draw (given by ) is modulated by applying a scalar perturbation factor *σ* for each analysis grid point. Scaling of the draw around each member at time is carried out by

The specification of the factor *σ* is based on the inflation parameter estimated in (25):

where elementary tuning tests lead to the values , , , and (which might not be an optimal choice). The function continuously depends on *ρ* with if and if .

We summarize that we always perturb each of the remaining members of the filter after the classical resampling step by a Gaussian with standard deviation of at least and at most . The scaling is a continuous function of the input parameters of our estimator of *ρ* (i.e., it depends continuously on space, on the observations, and on the ensemble members). The only discontinuities can occur when the matrix in dependence of the spatial point *p* itself is discontinuous according to a change in the weights resulting from the classical resampling procedure.

#### 3) Number of surviving particles

We remark that the adaptive Gaussian rejuvenation ensures that with probability one, the analysis ensemble consists of *L* distinct particles. However, in the first classical resampling step, the number of distinct particles can be significantly smaller with the limiting case where only one particle gains all the weight, such that after resampling and before rejuvenation, we have *L* identical copies of this particle in a given localization point. It is, of course, highly interesting to study the statistics of how many particles remain at each analysis grid point and in what way the ensemble projection (11) helps the filter to stay away from collapse.

In Fig. 1, for three dates close to the end of the experimental period, we show some histograms of the number of particles at each analysis grid point surviving the first resampling step (i.e., those particles with weights before rejuvenation when normalizing the total weight to *L*). The results show that at 100 hPa (high level with few data; left column), mainly five to 20 particles obtain most of the weight. There are considerably fewer cases where 20 to 40 particles survive the classical resampling step.

In the midtroposphere (500 hPa; middle column) it can clearly be seen that the first date (0300 UTC 20 May 2016) and the third date (2100 UTC 31 May 2016) show a quite similar distribution. The number of cases with one up to 30 particles with weight larger than one is very similar. For the second date (second row; 1200 UTC 25 May 2016), there are only a few cases with more than 20 surviving particles. This is probably due to the larger amount of synoptic data at 1200 UTC, compared to 0300 and 2100 UTC.

The result for the bottom level (1000 hPa; right column) differs from the other two levels. In all these subfigures, there exists one peak at one to five particles. For the first and the third rows, there is also a peak at 30 particles. The first peak might be due to model biases in the boundary layer in combination with a high number of observations.

A high number of observations leads to a small number of particles surviving the classical resampling, which is the well-known filter divergence phenomenon. We remark, however, that due to the ensemble transform and projection step of section 3b, this divergence only occurs in a part of the localization boxes. Further, adaptive Gaussian rejuvenation in ensemble space guarantees the calculation of *L* distinct analysis ensemble members with a controlled spread and distribution.

#### 4) Examples

Examples for the matrix from (27) are displayed in Fig. 2. The matrices show how the analysis ensemble is constructed from the first-guess ensemble. Entries close to one indicate that an analysis ensemble member is sampled from the respective first-guess member. In general, some first-guess members lead to multiple analysis members, whereas others are dismissed totally (only entries close to zero in that row). The deviations from zero or one are consequences of the Gaussian resampling step.

In the first case (top left; in the lower stratosphere with few observations), the analysis members are sampled from a large number of first-guess particles. Going farther to the ground, fewer first-guess particles are selected, but are replicated multiple times due to their large weight. This is caused by the larger number of observations farther down in the atmosphere putting more constraints on the prior distribution.

It may also be noted that in Fig. 2a, the strength of the Gaussian resampling is weak so that entries remain close to zero or one. In the subsequent figures, the estimated inflation factor is larger so that final weights almost fill the range in between −0.5 and 1.

## 4. Numerical tests in the operational framework and results

Here, we present tests of the localized adaptive particle filter for the global ICON model for a typical experimental setup and for a time period of 1 month. In particular, we investigate the assimilation cycle and forecast scores in some detail, diagnosing the development of spread, bias, scores, and the stability of the system.

We start with evaluating the assimilation cycle of the LAPF experiment in section 3a. As reference, we use the LETKF implementation that is operational at DWD. With the help of spread control, the LAPF provides reasonable results and is stable over the full experimental period. In section 3b, we investigate the quality of forecast runs and compare it to the performance of the system in the operational setup.

### a. Assimilation cycle

For the experimental diagnostics, a spinup period of 1 week is excluded from the observation minus first-guess statistics, as bias-correction algorithms and ensemble spread have to adapt to the new method.

We first investigate upper-air observation minus first-guess (obs − fg) statistics based on radiosonde observations (TEMP). Some selection of results is displayed in Fig. 3, in particular bias (ME), RMSE, and standard deviation (SD) for 3-h forecasts (first guesses), generated by the reference (LETKF) and the particle filter (LAPF). These statistics are based on observations that passed the quality control (i.e., that were actually used in the assimilation). Figure 3 visualizes statistics for the global domain and for the time period 8–31 May 2016.

The results show that the root-mean-square error and standard deviation of the current LAPF are about 10%–15% worse than those of the LETKF. They also show that the system is functioning and shows comparable features to the LETKF-based statistics. The LAPF shows better results for upper-air temperature than for relative humidity. For the bias, the results for relative humidity determined by the LAPF are slightly better than the results of the LETKF (in the sense that they are closer to zero). The values of RMSE and SD in the vertical column show similar shapes, but with higher values for LAPF. Overall, this first implementation of the LAPF scheme, where we were not yet able to carry out the time-consuming full tuning, which is usually done for an operational system, shows a very reasonable behavior in comparison to the LETKF.

Since filter collapse and filter divergence are of high interest, we next investigate the behavior of corresponding diagnostics, investigating spread behavior and the number of surviving particles in each resampling step. Figure 4 shows the spread averaged over all ensemble members for 0000 UTC 31 May 2016 (i.e., at the end of the first month of cycling). Displayed is the spread at ICON level 64 (approximately 500 hPa) for upper-air temperature and specific humidity. In this context, spread is the pointwise variance of the ICON-EPS.

The left panels in the second and third rows show the fields for the LAPF; the right panels show those for the LETKF. In the first row, the differences of the spread between LETKF and LAPF are displayed for upper-air temperature in Fig. 4a and specific humidity spread in Fig. 4b. It can clearly be seen that the structures are similar, but the spread for both variables is higher for the LETKF ( and ) than for the LAPF ( and ).

The main structures of the spread of LAPF and LETKF show similar physical features. The differences between the two filters are more random and linked to the stochastic parts of the methods.

The regions with the maximum spread are also the places where the LAPF and the LETKF differ most. One example of this is the temperature over Madagascar. Here, the spread of the LETKF is much larger than that of the LAPF. Also, the vortex over western Siberia shows a big difference between the two filters.

For the specific humidity, the biggest differences are situated in the tropics, where humidity values are large. The spread difference plot clearly resembles the patterns of the spread in specific humidity itself. It indicates that the differences between LAPF and LETKF are often situated at the borders of the big vortices. The largest differences are situated, for example, in the region around the west coast of Mexico, where the LAPF has a maximum but the LETKF does not.

For studying the development of the spread and the stability of the filter, time series from 1 to 31 May 2016 for both filters are plotted in Fig. 5. The spread starts from zero and needs a couple of days to settle, because we started all members from the same state duplicated times. The first row shows the mean of the spread for upper-air temperature and specific humidity, calculated at each point in time and for one horizontal level of the atmospheric grid. For both quantities, the LAPF shows lower values for mean spread than the LETKF. The same holds for the minimum value of the spread. However, the maximum values of the spread of the LETKF and the LAPF show quite similar values.

### b. Forecast verification

Forecasts were run twice a day at 0000 and 1200 UTC. In Figs. 6 and 7, a verification of temperature, relative humidity, and wind components compared to radiosonde observations is shown: continuous ranked probability score (CRPS), SD, RMSE, and ME for lead times of 24, 48, 72, 96, 120, and 144 h for the forecasts based on the LETKF and the LAPF analyses. The determination of SD, RMSE, and ME is based on the ensemble mean. At this stage, the results show that in the current development stage, the LAPF does not outperform the LETKF. However, the shapes of CRPS, SD, and RMSE are comparable, indicating that the LAPF is stable and has a reasonable behavior. The humidity bias of the LAPF is reduced in comparison to the LETKF at all heights above 850 hPa. It is consistent with theory and implementation that the particle filter does not draw the model fields to the observations as strongly as the LETKF, a distance that is maintained throughout all forecast lead times and explains the behavior of the scores.

## 5. Conclusions

Standard algorithms for data assimilation used for large-scale atmospheric analysis in operational centers include the ensemble Kalman filter and 4D-VAR. These data assimilation methods are either inherently or practically based on the assumption that the underlying distribution is Gaussian. If the ensemble distribution is not Gaussian, these methods are not optimal. In the case of non-Gaussianity, more general Bayesian methods such as the particle filter have been proposed. The core idea of the particle filter is to realize the Bayesian approach giving a weight to each particle, depending on its distance to the observations. The adaptation of particles is carried out in different ways: by resampling, nudging particles toward some proposal distribution, or optimal transport processes.

Classical particle filters in high-dimensional dynamical systems suffer from filter divergence or filter collapse due to the curse of dimensionality. In this work, we have developed and implemented a localized adaptive particle filter (LAPF) in ensemble space with spread control and Gaussian resampling or rejuvenation. With the help of modulated rejuvenation, we prevent the filter divergence as well as filter collapse. It has been implemented for global atmospheric data assimilation to fit into the framework of the global operational weather prediction model ICON of DWD.

The LAPF was tested over a period of 1 month with 40 ensemble members, a global horizontal resolution of 52 km, and 90 vertical layers in an operational setup with slightly reduced resolution. A comparison of the scores with those of the operational system of DWD (with some modest reduction of resolution) shown in section 4 demonstrates that the localized adaptive particle filter is able to provide reasonable atmospheric analysis in a large-scale environment. We have shown that for this first attempt, the RMSE quality of forecasts based on the LAPF is 10%–15% behind the forecast quality of the LETKF for forecasts up to several days (cf. Figs. 6 and 7), and BIAS is partly improved, in particular for humidity. Altogether, for the assimilation cycle and forecasts, the LAPF shows promising results (Fig. 3). Furthermore, we are able to demonstrate the stability of the LAPF over a period of 1 month (cf. Figs. 4 and 5) and show that atmospheric data assimilation within an operational modeling environment is possible based on a localized adaptive particle filter approach.

## REFERENCES

*Optimal Filtering*. Courier Corporation, 368 pp.

*Fundamentals of Stochastic Filtering*. Stochastic Modelling and Applied Probability Series, Vol. 60, Springer, 390 pp., https://doi.org/10.1007/978-0-387-76896-0.

*Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh*, Vol. 3, Institute of Mathematical Statistics, 318–329, https://doi.org/10.1214/074921708000000228.

*The Oxford Handbook of Nonlinear Filtering*. Oxford University Press, 1080 pp.

*Atmospheric Data Analysis.*Cambridge University Press, 472 pp.

*Sequential Monte Carlo Methods in Practice*. Springer, 582 pp., https://doi.org/10.1007/978-1-4757-3437-9.

*Data Assimilation: The Ensemble Kalman Filter*. Springer, 280 pp.

*Atmospheric Modeling, Data Assimilation and Predictability.*Cambridge University Press, 341 pp.

*Inverse Modeling*. IOP Publishing, 509 pp., https://doi.org/10.1088/978-0-7503-1218-9.

*Probabilistic Forecasting and Bayesian Data Assimilation.*Cambridge University Press, 297 pp., https://doi.org/10.1017/CBO9781107706804.

*Weather Prediction by Numerical Process*. Cambridge University Press, 258 pp.

*Seminar on Recent Developments in Data Assimilation for Atmosphere and Ocean*, Shinfield Park, Reading, United Kingdom, ECMWF, 265–286, https://www.ecmwf.int/en/elibrary/13249-nonlinear-ensemble-data-assimilation-ocean.

*Nonlinear Data Assimilation*. Frontiers in Applied Dynamical Systems: Reviews and Tutorials Series, Vol. 2, Springer, 118 pp., https://doi.org/10.1007/978-3-319-18347-3.

## Footnotes

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1}

TEMPs and PILOTs are particular weather balloons measuring profiles of, for example, temperature and humidity.

^{2}

SYNOPs are the classical land-based weather stations measuring, for example, temperature, humidity, and pressure.

^{3}

Calculated from subsequent satellite images.

^{4}

This is readily obtained from , where denotes an inner product and where and . See Nakamura and Potthast (2015) for a detailed introduction.

^{5}

Active observations are those that passed the quality control.