## Abstract

This paper proposes an efficient data assimilation approach based on the sigma-point Kalman filter (SPKF). With a potential for nonlinear filtering applications, the proposed approach, designated as the local unscented transform Kalman filter (LUTKF), is similar to the SPKF in that the mean and covariance of the nonlinear system are estimated by propagating a set of sigma points—also referred to as ensemble members—generated using the scaled unscented transformation (SUT), while making no assumptions with regard to nonlinear models. However, unlike the SPKF, the LUTKF can reduce the influence of observations on distant state variables by employing a localization scheme to suppress spurious correlations between distant locations in the error covariance matrix. Moreover, while the SPKF uses the augmented state vector constructed by concatenating the model states, model noise, and measurement noise, the system state for the LUTKF is not augmented with the random noise variables, thereby providing an accurate state estimate with relatively few sigma points. In sensitivity experiments executed with a 40-variable Lorenz system, the LUTKF required only three sigma points to prevent filter divergence for linear/nonlinear measurement models. Comparisons of the LUTKF and the local ensemble transform Kalman filters (LETKFs) reveal the advantages of the proposed filter in situations that share common features with geophysical data assimilation applications. In particular, the LUTKF shows considerable benefits over LETKFs when assimilating densely spaced observations that are related nonlinearly to the model state and that have high noise levels—such as the assimilation of remotely sensed data from satellites and radars.

## 1. Introduction

One of the most widely used Bayesian filters for assimilating atmospheric and oceanic data is the ensemble Kalman filter (EnKF; Evensen 1992, 1994; Houtekamer and Mitchell 1998; Evensen and van Leeuwen 2000). Several variations of the EnKF have been introduced in the field of Earth science, such as the local ensemble Kalman filter (LEKF; Ott et al. 2004), the ensemble transform Kalman filter (ETKF; Bishop et al. 2001), and the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007).

For a nonlinear model, the EnKF methods can propagate the state estimate and its covariance in time through fully nonlinear model equations without the linearization (e.g., calculation of the Jacobian or Hessian) of the nonlinear model required by the EKF (extended Kalman filter). However, the EnKF approaches assume a linear observation operator to obtain the standard Kalman gain and error covariance; more precisely, they account only for the first-order Taylor approximation of the nonlinear observation operator, instead of linearizing the nonlinear operator on the entire model space as in Eqs. (6.11), (6.14), and (6.15) of Hamill (2006). For this reason, the EnKF cannot provide an accurate estimate of the moments up to second order with respect to the true states. Thus, if the EnKF algorithms are executed with highly nonlinear observation functions, they can produce large errors when estimating the state from nonlinear observations [e.g., satellite altimetry data assimilation, where observations have a strongly nonlinear relationship with the model state variables (Hamon et al. 2019)].

The choice of the ensemble size has a significant impact on the analysis performance of the EnKF algorithm. To determine which linear combination of ensemble members approaches the true state of the system through the analysis, the ensemble size of the EnKF should be large enough to span the state space as fully as possible, although it can be reasonably small compared to Monte Carlo approaches. However, the use of excessive ensemble members may not be suitable for high-dimensional models, such as atmospheric and oceanic models, because it requires substantial computational resources. While hybridization of the background error covariance can reduce sampling noise with the aid of a static background error covariance, this study focuses on assessing the merit of the new ensemble DA scheme; the background modeling is therefore kept purely ensemble-based to simplify the discussion.

Recently, the sigma-point Kalman filter (SPKF), a new ensemble-based data assimilation method using the scaled unscented transformation (SUT) in nonlinear systems, has been proposed to overcome the aforementioned drawbacks associated with the EnKF approaches in the field of Earth science (Ambadan and Tang 2009; Luo and Moroz 2009; Tang et al. 2014). The most important difference between the SPKF and EnKF methods described above is in the generation of ensemble members and in the determination of the ensemble size. Unlike the EnKF methods, the sigma points in the SPKF are deterministically selected using the SUT during every data assimilation cycle. For an *L*-dimensional system, a set of 2*L* + 1 sigma points in the SPKF is used to propagate the state estimate and its error covariance (uncertainty) through nonlinear model equations. Unlike the EnKF algorithms, the SPKF makes no assumptions regarding the linearized observation operator; therefore, it can estimate the error statistics of the model state through a fully nonlinear measurement operator by reinterpreting the calculation for the standard Kalman gain and analysis covariance (Ambadan and Tang 2009; Tang et al. 2014).

Despite this advantage offered by the SPKF over the EnKF methods, the SPKF approach is associated with some practical problems. The SPKF algorithm, which uses 2*L* + 1 sigma points for an *L*-dimensional system, becomes prohibitively expensive for a high-dimensional system because the number of sigma points grows with the state dimension. Moreover, the SPKF approach uses an augmented state vector constructed by concatenating the model states, model noise, and measurement noise, so the augmented state dimension is the sum of the model state, model error, and measurement error dimensions. Hence, in the SPKF approach, it can be extremely difficult to assimilate a large number of observations for high-dimensional systems.

To solve this problem, several refinements of the SPKF have been proposed. One approach is to reduce the set of sigma points so that only the most important ones are used. For example, the simplex SPKF (Julier and Uhlmann 2002) uses *L* + 1 sigma points that affect the evolution of the error covariance in a nonlinear model with large dimensionality. Another approach is principal component analysis (PCA), which represents the principal component space with fewer sigma points while retaining the main features of the full sigma-point space (Ambadan and Tang 2009). Although this scheme reduces the number of sigma points in the full sigma-point space, it can approximate the error covariance with the same estimation accuracy as the original SPKF by using sigma points in the principal component space. An alternative solution to the same problem is to use the truncated singular value decomposition (TSVD) for the SPKF (Luo and Moroz 2009; Tang et al. 2014). Because the number of sigma points decreases by truncating the analysis error covariance matrix, this approach can efficiently reduce the computational cost of the SPKF. However, for higher-dimensional systems (such as global atmospheric and oceanic models), the reduced SPKF schemes mentioned above are still computationally intractable.

Another potential solution to the problem with the SPKF is localization, which is important for applications of the EnKF algorithms to high-dimensional systems. The localization scheme allows the analysis to be performed more efficiently for each model grid point with a moderate ensemble size, as in low-dimensional models. However, since the SPKF uses an augmented state vector constructed by concatenating the model state variables, model noise, and measurement noise for all model grid points, the model state estimate of the SPKF is executed for the global domain. Therefore, localization cannot be readily used in the SPKF to execute the data assimilation efficiently.

In this paper, we propose an efficient scheme for implementing the SPKF algorithm, called a local unscented transform Kalman filter (LUTKF). Unlike the SPKF, the LUTKF uses the localization scheme, which is the same as that used in the LETKF; that is, it executes the local data assimilation at each model grid point. Furthermore, the LUTKF does not use the augmented state vector; that is, it uses the state vector that consists of only the state variables. This makes it easier to implement the local analysis in the LUTKF and enables the LUTKF to be more computationally efficient for nonlinear systems. In this study, the results from cycling data assimilation experiments executed through the 40-variable Lorenz (1996) model show that the LUTKF has great potential in performing efficient data assimilation with a small and reasonable ensemble size for nonlinear models. The performance benefits of the LUTKF were evaluated using the existing ensemble Kalman filters as benchmarks, including the SPKF (known as a global filter) and the LETKF (known as a local filter). As future work, we will explore the potential for high-dimensional models, such as atmospheric and oceanic models.

The rest of this paper is organized as follows: section 2 presents the SPKF in its most basic form; section 3 introduces the LUTKF algorithm, which incorporates the nonaugmented SPKF algorithm and the localization approach; section 4 presents experiments to verify the validity of the LUTKF algorithm as an assimilation method; and finally, section 5 summarizes results and discusses the potential of the LUTKF approach for high-dimensional problems.

## 2. Sigma-point Kalman filtering

The basic framework for the SPKF algorithm involves estimating the state of a nonlinear system by the dynamical model and the observation model with nonadditive noise as follows:

where **x**_{k} denotes the system state vector at time *k*, *f*(⋅) is a nonlinear function of the state, **z**_{k} is the measurement vector, and *h*(⋅) is the nonlinear function that describes the relationship between the measurement and state. The process noise **w**_{k} and measurement noise **v**_{k} are assumed to be white noise and uncorrelated with the zero-mean and covariance matrices $Q$_{k} and $R$_{k}, respectively.^{1}

In the SPKF, the process noise and measurement noise are treated as specific states to be estimated; however, they are not actually estimated using the filter. Hence, the state vector for the SPKF is redefined as the concatenation of the original state, process noise, and measurement noise (Ambadan and Tang 2009; Tang et al. 2014). The augmented state vector $\mathbf{x}_k^a$ and corresponding covariance $\mathbf{P}_k^a$ are given by

where $P$_{k} is the estimation error covariance of the original model state **x**_{k}. The dimension of the augmented state vector is as follows:

where *L*_{x} is the original state dimension, *L*_{w} is the process noise dimension, and *L*_{υ} is the measurement noise dimension (i.e., $\mathbf{x}_k^a \in \mathbb{R}^{L_x + L_w + L_\upsilon}$).

The SPKF algorithm can estimate the statistical moments of the nonlinear model using the SUT (Julier and Uhlmann 2004). In the SPKF, the mean and covariance of the model state are calculated by propagating the deterministically chosen sigma points, also referred to as ensemble members, through the nonlinear model. The data assimilation procedure of the SPKF can be divided into two phases: sigma point selection and state estimation.

### a. Sigma point selection

During this phase, 2*L*^{a} + 1 sigma points are deterministically chosen using the SUT to enable them to capture the statistical properties of the nonlinear model (Julier et al. 2000; Ito and Xiong 2000; Wan and van der Merwe 2000). Thus,

where the subscript *k* − 1|*k* − 1 means the posterior or analysis state at time *k* − 1, $\hat{\mathbf{x}}_{k-1|k-1}^a$ is the mean of the augmented state, *λ* is a scaling parameter, and $[\sqrt{(L^a+\lambda)\mathbf{P}_{k-1|k-1}^a}]_i$ is the *i*th column (or row) vector of the matrix square root of the weighted covariance matrix $(L^a+\lambda)\mathbf{P}_{k-1|k-1}^a$. For numerical efficiency, the Cholesky factorization is generally used to calculate the matrix square root.

The weight associated with the *i*th sigma-point is defined as

where $\omega_{(m)}^i$ and $\omega_{(c)}^i$ represent the weighting terms used to calculate the mean and covariance, respectively. The scaling parameter *λ* is determined by *λ* = *α*^{2}(*L*^{a} + *κ*) − *L*^{a}. The parameter *α* controls the size (spread) of the sigma-point distribution around the mean $\hat{\mathbf{x}}_{k-1|k-1}^a$ and should be set to a small positive value (0 < *α* ≤ 1). The scaling parameter *κ* is a nonnegative value (*κ* ≥ 0) that helps guarantee the positive semidefiniteness of the covariance matrix, but its exact value is not critical; therefore, *κ* = 0 is a good default. Another parameter *β* can be used to incorporate knowledge of the higher-order moments of the state distribution and is usually set to a nonnegative value (*β* ≥ 0).^{2}
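As an illustration of the sigma point selection phase, the following NumPy sketch generates the 2*L* + 1 sigma points and their SUT weights. This is our own minimal implementation, not the authors' code; the function name and defaults are assumptions, and the standard SUT weight formulas are used.

```python
import numpy as np

def sigma_points_and_weights(x_mean, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Generate 2L+1 sigma points and SUT weights (illustrative sketch).

    x_mean : (L,) state mean; P : (L, L) covariance.
    Returns sigma points chi (2L+1, L), mean weights w_m, covariance weights w_c.
    """
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L          # scaling parameter lambda
    # Matrix square root of (L + lambda) * P via Cholesky factorization;
    # column i of S is the i-th perturbation direction.
    S = np.linalg.cholesky((L + lam) * P)
    chi = np.empty((2 * L + 1, L))
    chi[0] = x_mean                            # central sigma point
    for i in range(L):
        chi[1 + i] = x_mean + S[:, i]          # positive perturbation
        chi[1 + L + i] = x_mean - S[:, i]      # negative perturbation
    # Weights: the 2L symmetric points share a common weight; the central
    # point's covariance weight carries the (1 - alpha^2 + beta) correction.
    w_m = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (L + lam)
    w_c[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)
    return chi, w_m, w_c
```

By construction the weighted ensemble reproduces the input mean and covariance exactly, which is the defining property of the deterministic sampling.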

### b. Mean and covariance estimation

The sigma points are propagated through the fully nonlinear system [Eqs. (1) and (2)], while making no assumptions regarding the linearity of prediction and observation models:

where $\chi_{(x),k|k-1}^i$ is the prior or forecast sigma-point state vector at time *k*, $\chi_{(w),k-1|k-1}^i$ and $\chi_{(\upsilon),k-1|k-1}^i$ are sigma-point vectors that correspond to the model error and observation error, respectively, and $\mathbf{z}_k^i$ represents the transformed sigma point in observation space.

Using the transformed sigma points obtained from Eqs. (12) and (13), the predicted state estimate (background mean) $\hat{\mathbf{x}}_{k|k-1}$, corresponding covariance $P$_{k|k−1}, predicted measurement estimate $\hat{\mathbf{z}}_k$, and its error covariance $S$_{k} at time *k* can then be calculated by

The SPKF expressions for calculating the optimal Kalman gain $K$_{k}, the updated state estimate (analysis mean) $\hat{\mathbf{x}}_{k|k}$, and its covariance $P$_{k|k} are as follows (more detailed derivations for $K$_{k} and $P$_{k|k} can be found in the appendix):

where $P$_{xz} denotes the cross covariance between $\hat{\mathbf{x}}_{k|k-1}$ and $\hat{\mathbf{z}}_k$ and can be computed by

The estimates of the mean and covariance using the SUT are accurate to the second order (third order for true Gaussian priors) of the Taylor series expansion for a nonlinear function.^{3} However, the SPKF approach is computationally infeasible in high-dimensional models, such as atmospheric or ocean models, due to the large dimensionality (*L*^{a}) of the augmented states. Another major drawback of the SPKF approach is that the sigma points of the SPKF are determined based on the global domain, making it impossible to use the localization scheme for generating the sigma points, which is a good strategy for overcoming issues with computational feasibility.
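To make the analysis step above concrete, the following sketch computes the weighted means, covariances, gain, and update from sigma points that have already been propagated through *f* and *h*. It is an illustrative implementation in our own notation, not the paper's code; for simplicity it follows the additive-noise convention (as in section 3a), adding $R$ to the innovation covariance instead of carrying noise sigma points.

```python
import numpy as np

def ut_update(chi_f, z_f, w_m, w_c, z_obs, R):
    """One SPKF-style analysis step (illustrative sketch).

    chi_f : (n_sig, Lx) forecast sigma points, already propagated through f
    z_f   : (n_sig, Lz) sigma points mapped into observation space by h
    w_m, w_c : mean/covariance weights; z_obs : observation; R : obs covariance
    """
    x_b = w_m @ chi_f                    # background mean
    z_b = w_m @ z_f                      # predicted measurement
    Xp = chi_f - x_b                     # state perturbations
    Zp = z_f - z_b                       # observation perturbations
    P_b = (w_c * Xp.T) @ Xp              # background covariance
    S = (w_c * Zp.T) @ Zp + R            # innovation covariance (R added)
    P_xz = (w_c * Xp.T) @ Zp             # cross covariance
    K = P_xz @ np.linalg.inv(S)          # Kalman gain
    x_a = x_b + K @ (z_obs - z_b)        # analysis mean
    P_a = P_b - K @ S @ K.T              # analysis covariance
    return x_a, P_a
```

Note that no Jacobian of *h* appears anywhere: the gain is formed entirely from the cross covariance and innovation covariance of the transformed sigma points, which is the reinterpretation that frees the filter from a linearized observation operator.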

## 3. The local unscented transform Kalman filter

The LUTKF locally executes the data assimilation based on the nonaugmented SPKF algorithm (section 3a) where the spatial localization method (section 3b) is applied at each grid point.

### a. Nonaugmented SPKF filtering

Suppose we have an *L*_{x}-dimensional discrete-time nonlinear system described by the dynamical model and the observation model with additive noise. Accordingly, Eqs. (1) and (2) can be rewritten, respectively, as

The process noise and measurement noise in the fully nonlinear system [Eqs. (1) and (2)] are injected in a nonlinear fashion. Hence, the state vector for the SPKF is redefined as the concatenation of the original state, process noise, and measurement noise so that the SPKF algorithm takes nonlinearities in the multiplicative noise into account. However, in the case of additive process and measurement noise, the system state vector does not need to be augmented with the random noise variables, since the SPKF does not need to consider nonlinearities in the noise model as multiplicative noise in Eqs. (1) and (2) (Julier and Uhlmann 1996; van der Merwe 2004). The use of the nonaugmented state vector can reduce the dimension of the sigma points as well as the total number of sigma points used. While the dimension of the augmented state vector in the SPKF is *L*^{a}, the dimension of the state vector in the nonaugmented SPKF is *L*_{x}. Hence, while the number of the sigma points required to estimate the true mean and covariance in the SPKF is 2*L*^{a} + 1, the number of the sigma points used in the nonaugmented SPKF is 2*L*_{x} + 1.

The sigma point selection and state estimation of the nonaugmented SPKF are similar to those of the SPKF, except for the use of the nonaugmented state vector; that is, the augmented state estimate $\hat{\mathbf{x}}^a$, covariance $P$^{a}, and state dimension *L*^{a} in both phases of the SPKF [Eqs. (6)–(21)] are replaced with their nonaugmented forms $\hat{\mathbf{x}}$, $P$, and *L*_{x}, respectively. Furthermore, the sigma points are transformed through the dynamical model (22) and measurement model (23) as follows:

Since the process noise **w**_{k} and measurement noise **v**_{k} are assumed to be zero-mean white noise, there are no additive noise terms in Eqs. (14) and (16) for the nonaugmented SPKF. Therefore, sigma points for the noise vectors are not required in Eqs. (24) and (25), as in van der Merwe (2004).

Note that unlike the SPKF, the covariance matrices $Q$_{k−1} and $R$_{k} are added at the end of the prior error covariance $P$_{k|k−1} and measurement error covariance $S$_{k}, respectively, in order to take account of the model error **w**_{k−1} and observation noise **v**_{k} as follows:

The nonaugmented SPKF algorithm has been successfully implemented and employed to analyze practical systems (Farina et al. 2002; Crassidis and Markley 2003; Bshara et al. 2010; Cui et al. 2017). A detailed description and derivation of this SPKF algorithm can be found in van der Merwe (2004).

### b. Spatial localization

The LUTKF algorithm is analogous to carrying out the SPKF locally, with the nonaugmented form for each region of the model. The spatial localization has a great impact on the performance of the EnKFs for high-dimensional systems, such as atmospheric and oceanic models (Oke et al. 2005, 2008; Deng et al. 2010).

The localization requires enough ensemble members to represent the uncertainty in only the local space; therefore, the local analysis for LUTKF can efficiently perform the data assimilation with a smaller ensemble size for each model grid point, just as it is applied to the low-dimensional system. Localizing the analysis can suppress spurious correlations between distant positions in the background covariance $P$_{k|k−1}, thereby providing better analysis results (Houtekamer and Mitchell 2001; Whitaker and Hamill 2002). Moreover, the local analysis can be computed independently in parallel. However, the localization can increase the imbalance of analyses (Greybush et al. 2011).

The localization approaches used in the EnKF algorithms can be classified into two general categories: B localization that addresses the background covariance $P$_{k|k−1}; and R localization that deals with the observation error covariance $R$_{k} (Greybush et al. 2011; Houtekamer and Zhang 2016).

The B localization multiplies the elements in the background covariance $P$_{k|k−1} by a distance-dependent function that decays to zero beyond a cutoff distance, and thus the observations do not influence the model state beyond the cutoff distance (Houtekamer and Mitchell 2001; Whitaker and Hamill 2002). The B localization can be applied to the EnKF algorithm by multiplying the elements of the background covariance matrix $P$_{k|k−1} by another matrix **ρ** whose elements denote the distance-dependent localization function using a Schur product; therefore, the ensemble Kalman gain in the EnKF is given by

where $H$ is the linearized operator (partial derivative) of the nonlinear measurement function *h*, and ∘ represents an element-wise product.

However, applying the B localization to the LUTKF in order to remove spurious correlations in the background covariance $P$_{k|k−1} may be computationally intractable, since the background covariance matrix $P$_{k|k−1} is not represented explicitly in the expressions for calculating the background observation covariance $S$_{k} and Kalman gain $K$_{k} [Eqs. (17) and (18)]. Furthermore, since the B localization in model space yields full-rank background covariance matrix $P$_{k|k−1} (Manoj et al. 2014), the LUTKF may suffer from decomposing the background covariance matrix. For this reason, the B localization may not be suitable for the LUTKF.

The simplest method for implementing the R localization takes into account the observations only within a cutoff radius surrounding the grid point, and it carries out the analysis separately for each spatial grid point of the model (Houtekamer and Mitchell 1998; Keppenne 2000; Ott et al. 2004; Hunt et al. 2007). However, the abrupt cutoff distance in the R localization can lead to a noisy estimate of the model state. To solve this problem, the smoothed R localization was proposed by Hunt et al. (2007). This localization can achieve an effect similar to the B localization by multiplying the elements of the observation error covariance $R$_{k} by a function that gradually increases the uncertainty assigned to the observations as their distance from the analysis grid point increases (Houtekamer and Zhang 2016); therefore, the observations do not have an impact on the model state at the analysis grid point beyond the cutoff distance *c*.

The localization can be applied to the LUTKF by dividing the diagonal elements of $R$_{k} in Eq. (27) by the Gaussian localization function *G*(*d*, *L*) = exp(−*d*^{2}/2*L*^{2}), where *d* is redefined as the distance between observation and analysis grid point, and *L* is reinterpreted as the localization distance for the observations.^{4} In practice, this study uses the following piecewise polynomial approximation of the Gaussian localization function with compact support, which is given by Eq. (4.10) of Gaspari and Cohn (1999) as follows:

This equation facilitates expressing the localization values in terms of the cutoff distance *c* = 2 × (0.3)^{−0.5}*L* rather than the localization distance *L*. Considering only the diagonal elements of $R$_{k} for the localization offers better computational efficiency than localization techniques that require another matrix whose elements represent the localization function for all the elements, as in the B localization.
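For reference, both localization functions can be written in a few lines. The sketch below implements the Gaussian function *G*(*d*, *L*) and the standard Gaspari and Cohn (1999, Eq. 4.10) fifth-order piecewise polynomial; it is our own illustrative code, with the Gaspari–Cohn function parameterized by its half-support width (the function reaches zero at twice that distance).

```python
import numpy as np

def gaussian_loc(d, L):
    """Gaussian localization G(d, L) = exp(-d^2 / (2 L^2))."""
    return np.exp(-np.asarray(d, dtype=float)**2 / (2.0 * L**2))

def gaspari_cohn(d, c_half):
    """Gaspari-Cohn (1999, Eq. 4.10) compactly supported polynomial.

    c_half is half the support width: the function equals 1 at d = 0
    and decays to exactly 0 at d = 2 * c_half.
    """
    r = np.atleast_1d(np.abs(d)) / float(c_half)
    out = np.zeros_like(r, dtype=float)
    m1 = r <= 1.0                      # inner branch
    m2 = (r > 1.0) & (r < 2.0)         # outer branch
    r1, r2 = r[m1], r[m2]
    out[m1] = (-0.25 * r1**5 + 0.5 * r1**4 + 0.625 * r1**3
               - (5.0 / 3.0) * r1**2 + 1.0)
    out[m2] = ((1.0 / 12.0) * r2**5 - 0.5 * r2**4 + 0.625 * r2**3
               + (5.0 / 3.0) * r2**2 - 5.0 * r2 + 4.0 - (2.0 / 3.0) / r2)
    return out
```

In the smoothed R localization, the diagonal entries of $R$_{k} are divided by these values, so the assigned observation error variance grows smoothly (and becomes effectively infinite at the cutoff) as the observation moves away from the analysis grid point.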

The LUTKF offers the following advantages over the SPKF: (i) the use of the nonaugmented state makes it easier to implement the localization scheme at each model grid point by considering only the original model state, not including the process and measurement noise in the state vector, and it allows the LUTKF to be more computationally efficient by reducing the number of sigma points; and (ii) unlike the SPKF, the LUTKF can employ the localization method used to enhance the computational efficiency and to suppress spurious correlations between distant locations in the error covariance matrix.

### c. Efficient algorithm implementation

In this subsection, we provide a step-by-step description of the LUTKF algorithm with a focus on ease of implementation and computational efficiency; we therefore describe the simplest way to implement each step and present alternative methods that reduce the amount of computation in some steps.

By exploiting the localization method introduced in section 3b, the LUTKF can split the global analysis domain into a number of local analysis points (i.e., model grid points where the analyses are executed), and its data assimilation can be conducted independently for each local analysis point, as in the LETKF (Hunt et al. 2007; Greybush et al. 2011). Thus, if the state dimension of the dynamical model can be expressed as *L*_{x} at each model grid point, then the LUTKF based on the nonaugmented state vector starts with an initial ensemble of 2*L*_{x} + 1 *L*_{x}-dimensional model state vectors $\{\chi_{(x),0|0}^i\}_{i=0}^{2L_x}$ generated locally using the deterministic sampling based on the SUT at each grid point, just as it is applied to the *L*_{x}-dimensional system. For the LUTKF, the mean and covariance of the model state are initialized before any observations are available as follows:

where *E*[⋅] denotes the expected value or the mathematical expectation, and **x**_{0} is the initial model state and can be chosen to properly approximate the error statistics of the model, as in Evensen (2003). By substituting *L*^{a}, $\hat{\mathbf{x}}_{k-1|k-1}^a$, and $\mathbf{P}_{k-1|k-1}^a$ with *L*_{x}, $\hat{\mathbf{x}}_{0|0}$, and $P$_{0|0} in Eqs. (6)–(11), respectively, the initial ensemble members $\{\chi_{(x),0|0}^i\}_{i=0}^{2L_x}$ and their associated weights ($\omega_{(m)}^i$ for the ensemble mean and $\omega_{(c)}^i$ for the ensemble covariance) are locally determined at each grid point.

For example, for the Lorenz (1996) model (hereafter L96) that mimics the time evolution of an unspecified scalar meteorological quantity *x* at *N*_{x} equidistant grid points along a latitude circle, the LUTKF starts with three (2*L*_{x} + 1, where *L*_{x} = 1) ensemble members chosen locally in a deterministic manner for each model grid point of the *N*_{x}-dimensional L96 model, just as it is applied to the one-dimensional model (i.e., *L*_{x} = 1) by the spatial localization at that model grid point, as shown in Fig. 1. That is, because the L96 model has one model variable *x* for one model grid point, three ensemble members are locally used for the LUTKF to solve the analysis at each model grid point. On the other hand, the LETKF starts with randomly generated ensemble members at each model grid point (see Fig. 1).
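The L96 test model used throughout this section is compact enough to state in code. The sketch below is a standard, illustrative implementation of the Lorenz (1996) tendency and a fourth-order Runge–Kutta time step; the time step and forcing values are common choices, not necessarily those of the paper's experiments.

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz (1996) tendency: dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F,
    with cyclic boundary conditions handled by np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def l96_step(x, dt=0.05, F=8.0):
    """Advance the L96 state one step with fourth-order Runge-Kutta."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

Because each grid point carries a single scalar variable (*L*_{x} = 1), the local SUT ensemble at each point consists of just 2(1) + 1 = 3 members, which is the source of the three-member result quoted in the abstract.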

After generating the initial ensemble, the LUTKF recursively estimates the model state using the following steps:

#### 1) Step 1

In this step, before the model state for each model grid point at time *k* is predicted, a global analysis ensemble of 2*L*_{x} + 1 *L*_{x,[g]}-dimensional model state vectors $\{\chi_{(x),k-1|k-1,[g]}^i\}_{i=0}^{2L_x}$ at time *k* − 1 is first obtained by concatenating the local analysis ensembles $\{\chi_{(x),k-1|k-1}^i\}_{i=0}^{2L_x}$ for each grid point of the model, where [*g*] indicates that the ensemble member reflects the global model state. The global model state dimension *L*_{x,[g]} is computed by multiplying the local state dimension *L*_{x} by the number of model grid points. Then, the global background ensemble $\{\chi_{(x),k|k-1,[g]}^i\}_{i=0}^{2L_x}$ at time *k* is obtained by applying the nonlinear function *f* to each global analysis ensemble member as follows:

The background ensemble mean $\hat{\mathbf{x}}_{k|k-1}$ at each model grid point is locally calculated using the local background ensemble $\{\chi_{(x),k|k-1}^i\}_{i=0}^{2L_x}$ for each grid point as follows:

Also, the background error covariance $P$_{k|k−1} for each grid point is locally obtained by computing the weighted sum of the outer product for the local background ensemble perturbations and then adding the *L*_{x} × *L*_{x} local process noise covariance matrix $Q$_{k−1} at the end of the weighted sum, as in Eq. (26). The weighted sum of the outer product for vectors in Eq. (26) can be replaced with a matrix multiplication as follows:

where $\mathbf{X}_{k|k-1}^\omega$ and $\mathbf{X}_{k|k-1}$ are *L*_{x} × (2*L*_{x} + 1) matrices whose *i*th columns are $\omega_{(c)}^i[\chi_{(x),k|k-1}^i - \hat{\mathbf{x}}_{k|k-1}]$ and $\chi_{(x),k|k-1}^i - \hat{\mathbf{x}}_{k|k-1}$, respectively.
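The replacement of the weighted sum of outer products by a single matrix product can be sketched as follows; this is illustrative code in our own naming, where `chi` holds the sigma points row-wise.

```python
import numpy as np

def weighted_cov_matmul(chi, x_mean, w_c, Q):
    """Weighted ensemble covariance as one matrix product.

    Computes P = X_w @ X.T + Q, where column i of X is the perturbation
    chi_i - x_mean and X_w scales each column by its covariance weight
    w_c[i], reproducing the weighted sum of outer products.
    """
    X = (chi - x_mean).T      # L_x x (2*L_x + 1) perturbation matrix
    Xw = X * w_c              # column-wise weighting by w_c
    return Xw @ X.T + Q       # add local process noise covariance Q
```

Expressing the sum as a matrix product lets optimized BLAS routines do the work, which matters once the local analysis is repeated at every grid point.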

#### 2) Step 2

To estimate the model state in observation space, this step forms the global background observation ensemble $\{\mathbf{z}_{k,[g]}^i\}_{i=0}^{2L_x}$ by mapping $\{\chi_{(x),k|k-1,[g]}^i\}_{i=0}^{2L_x}$ into observation space through the observation operator *h*, as follows:

where [*g*] indicates that the ensemble member reflects all available observations.

#### 3) Step 3

To perform the local analysis for each grid point in the LUTKF that requires R localization rather than B localization as mentioned in section 3b, local observations within a certain cutoff distance centered on each model grid point need to be selected from global observations, as shown in Fig. 2; that is, because observations beyond the cutoff distance at each grid point do not have an impact on the local analysis, they are not included for the local analysis, as in Houtekamer and Mitchell (1998). An optimal value of the cutoff distance for achieving the best data assimilation results can be determined empirically by trial and error.

Therefore, this step starts with the choice of the local observations required to perform the local analysis at each model grid point.^{5} The *L*_{z}-dimensional local observation vector **z**_{k} and its error covariance matrix $R$_{k} at each grid point are determined by selecting the elements used for the local analysis within a certain cutoff distance (local region) surrounding a given model grid point from the *L*_{z,[g]}-dimensional global observation vector **z**_{k,[g]} and its error covariance matrix $R$_{k,[g]}. For each grid point, a local ensemble of 2*L*_{x} + 1 *L*_{z}-dimensional background observation vectors $\{\mathbf{z}_k^i\}_{i=0}^{2L_x}$ can be obtained by selecting the elements corresponding to the local observation vector **z**_{k} in $\{\mathbf{z}_{k,[g]}^i\}_{i=0}^{2L_x}$.

Then, the mean of the local background observation ensemble $\hat{\mathbf{z}}_k$ is locally calculated at each grid point as follows:

The error covariance $S$_{k} for $\hat{\mathbf{z}}_k$ is locally determined by computing the weighted sum of the outer product for local background observation perturbations and then adding $R$_{k} at the end of the weighted sum, as in Eq. (27). The weighted sum of the outer product for vectors in Eq. (27) can be substituted with a matrix multiplication:

where $\mathbf{Z}_k^\omega$ and $\mathbf{Z}_k$ are *L*_{z} × (2*L*_{x} + 1) matrices whose *i*th columns are $\omega_{(c)}^i(\mathbf{z}_k^i - \hat{\mathbf{z}}_k)$ and $\mathbf{z}_k^i - \hat{\mathbf{z}}_k$, respectively. For the smoothed R localization, the diagonal elements of $R$_{k} in Eqs. (27) and (37) can be divided by the localization function *C*(*d*, *c*), as explained in section 3b. Similarly, the cross covariance $P$_{xz} between $\hat{\mathbf{x}}_{k|k-1}$ and $\hat{\mathbf{z}}_k$ is locally calculated by

#### 4) Step 4

In this step, the Kalman gain $K$_{k} is locally computed by multiplying $P$_{xz} by the inverse of $S$_{k} at each model grid point, as in Eq. (18). If the number of observations (*L*_{z}) used for the local analysis of each model grid point is large compared to the ensemble size (2*L*_{x} + 1) and the number of model variables (*L*_{x}), this step may result in the highest computational cost among steps due to the inverse of *L*_{z} × *L*_{z} matrix $S$_{k}.

Since the covariance matrix $S$_{k} is symmetric, Eq. (18) can be expressed as the linear system

To avoid the matrix inversion in Eq. (18), determining $\mathbf{K}_k^T$ by solving the linear system given by Eq. (39) can reduce the numerical cost of calculating $K$_{k}, as in Krishnamoorthy and Menon (2013). Our simulations (not reported here), which used the 40-variable L96 model in which the average number of observations used for the local analysis is 50, showed that the computational time for this step using Eq. (18) is approximately 3.89 times that using Eq. (39). Therefore, it may be more efficient to compute $K$_{k} by solving the system of linear equations rather than inverting $S$_{k} in this step.
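In NumPy terms, the solve-based gain computation might look like the following illustrative one-liner; the function name is ours.

```python
import numpy as np

def kalman_gain_solve(P_xz, S):
    """Compute K = P_xz S^{-1} without forming S^{-1} explicitly.

    Since S is symmetric, K^T satisfies the linear system S K^T = P_xz^T,
    so a single solve replaces the matrix inversion.
    """
    return np.linalg.solve(S, P_xz.T).T
```

The result is mathematically identical to `P_xz @ np.linalg.inv(S)` but cheaper and numerically better conditioned, since the explicit inverse is never materialized.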

#### 5) Step 5

This step computes the local analysis ensemble mean $\hat{x}_{k|k}$ and covariance $P_{k|k}$ at each model grid point using Eqs. (19) and (20), respectively. By replacing $K_k S_k K_k^T$ with $P_{xz} K_k^T$ using Eq. (18), Eq. (20) can be written as

This form computes $P_{k|k}$ more efficiently than Eq. (20) by reducing the number of matrix multiplications; more precisely, the calculation of $P_{xz} K_k^T$ requires $L_x^2 (2L_z - 1)$ floating-point operations, whereas computing $K_k S_k K_k^T$ requires $(L_x^2 + L_x L_z)(2L_z - 1)$ operations.
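The cheaper form of the analysis covariance can be sketched as follows, under the assumption that `K` already satisfies $K_k S_k = P_{xz}$ from Eq. (18); the names are illustrative:

```python
import numpy as np

def analysis_covariance(P_b, P_xz, K):
    """Analysis covariance in the rewritten form of Eq. (20).

    P_a = P_b - P_xz K^T is algebraically equal to P_b - K S K^T when
    K S = P_xz holds [Eq. (18)], but it skips the extra
    (Lx x Lz) by (Lz x Lz) product.
    """
    return P_b - P_xz @ K.T
```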

For each grid point, the local analysis ensemble members $\{\chi_{(x),k|k}^i\}_{i=0}^{2L_x}$ and their associated weights at time *k* are obtained locally by replacing *L*^{a}, $\hat{x}_{k-1|k-1}^a$, and $P_{k-1|k-1}^a$ with *L*_{x}, $\hat{x}_{k|k}$, and $P_{k|k}$ in Eqs. (6)–(11), respectively. These ensemble members are then used to compute the local background mean $\hat{x}_{k+1|k}$ and covariance $P_{k+1|k}$ in Step 1 of the next time, *k* + 1.
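For readers unfamiliar with the sigma-point selection invoked here, the following sketch implements the standard scaled unscented transformation; Eqs. (6)–(11) presumably follow this scheme, but the code is not a transcription of the paper's exact equations:

```python
import numpy as np

def sut_sigma_points(x_mean, P, alpha=1.0, beta=2.0, kappa=0.0):
    """Sigma points and weights of the standard scaled unscented transform.

    Returns 2L + 1 points: the mean, plus the columns of the matrix
    square root of (L + lam) * P added to and subtracted from the mean.
    """
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L
    sqrt_mat = np.linalg.cholesky((L + lam) * P)   # matrix square root
    chi = np.empty((L, 2 * L + 1))
    chi[:, 0] = x_mean
    for i in range(L):
        chi[:, 1 + i] = x_mean + sqrt_mat[:, i]
        chi[:, 1 + L + i] = x_mean - sqrt_mat[:, i]
    w_m = np.full(2 * L + 1, 0.5 / (L + lam))      # mean weights
    w_c = w_m.copy()                               # covariance weights
    w_m[0] = lam / (L + lam)
    w_c[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)
    return chi, w_m, w_c
```

The weighted sample mean and covariance of these points reproduce `x_mean` and `P` exactly, which is the property the analysis step relies on when regenerating the ensemble.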

### d. Scalable algorithm for ensemble size

The LUTKF can estimate the state of a nonlinear system with a small number of deterministically chosen sigma points by combining the nonaugmented form with localization. Localization in the LUTKF removes spurious long-distance correlations, which arise from the sampling error of a finite ensemble, by suppressing the influence of distant observations. Nonetheless, a limited ensemble size may still produce spurious correlations, especially in higher-dimensional systems. The ensemble therefore needs to be large enough to approximately span the space of possible system states, because more ensemble members generally allow more observations to be used gainfully.

A possible solution to this problem is to increase the number of sigma points so that correlations between grid points that are far apart are suppressed, allowing the LUTKF to approximate the error correlation statistics of the model more accurately. Several modifications of the sigma-point generation could make the LUTKF applicable to this problem. Here, as an example, we propose a variant of the LUTKF algorithm that controls the number of sigma points produced, called the local scalable unscented transform Kalman filter (LSUTKF).

Similar to the LUTKF, the LSUTKF estimates the nonaugmented state of the nonlinear system [Eqs. (22) and (23)], in which the process noise and observation noise are purely additive. However, unlike the LUTKF, the LSUTKF employs a truncated normal distribution when selecting the sigma points, so that the sigma-point space approximates the mean and error covariance of the model state more accurately under the assumption that the system state vector **x**_{k} follows a Gaussian distribution. The truncated normal distribution represents a random variable *x* that is normally distributed with mean *μ* and variance *σ*^{2} and truncated to the range (*a*, *b*) (i.e., *a* < *x* < *b*), where the values of *a* and *b* are defined over the domain of the standard normal distribution (Robert 1995; Chu et al. 2010). This distribution is denoted $N_t(\mu, \sigma^2, a, b)$. To adjust the number of sigma points to the system, the LSUTKF generates sigma points using both the deterministic sampling approach of the LUTKF and stochastic (random) sampling from the truncated normal distribution.

For an *L*_{x}-dimensional system, while the LUTKF uses 2*L*_{x} + 1 sigma points, the number of sigma points required to approximate the error statistics of the model in the LSUTKF is 2*qL*_{x} + 1, where *q* is a scaling factor that adjusts the number of sigma points. The initialization, forecast, and analysis steps of the LSUTKF are executed in the same way as those of the LUTKF except for the sigma point selection and the substitution of the value of *L*_{x} in the equations of the LUTKF with *δ* = *qL*_{x}.

Given the mean $\hat{x}_{k-1|k-1}$ and its covariance $P_{k-1|k-1}$ in the sigma-point selection phase of the LSUTKF, 2*δ* + 1 sigma points $\{\chi_{(x),k-1|k-1}^i\}_{i=0}^{2\delta}$ are locally generated at each model grid point as follows:

and for *i* = 1, …, *L*_{x}, the following procedures are executed:

where $D$ is the *L*_{x} × (*q* − 1) matrix whose columns are random vectors generated from the truncated normal distribution $N_t\{0, (P_{k-1|k-1})_i, 0, [\sqrt{(\delta+\lambda) P_{k-1|k-1}}]_i\}$, and $D_m$ denotes the *m*th column vector of the matrix $D$.
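A rough sketch of the stochastic part of this selection is given below. The rejection sampler and, in particular, the choice of truncation bound `b` are our reading of the text rather than the authors' exact formulas:

```python
import numpy as np

def truncnorm_sample(mu, sigma, a, b, size, rng):
    """Rejection sampler for N_t(mu, sigma^2, a, b), with the truncation
    bounds (a, b) given in standard-normal units, as in the paper."""
    out = np.empty(size)
    n = 0
    while n < size:
        z = rng.standard_normal(max(size, 64))
        z = z[(z > a) & (z < b)]            # keep draws inside (a, b)
        take = min(z.size, size - n)
        out[n:n + take] = z[:take]
        n += take
    return mu + sigma * out

def lsutkf_random_matrix(P, q, delta, lam, rng):
    """Sketch of the stochastic sigma-point matrix D of the LSUTKF.

    Each row i holds q - 1 draws from N_t(0, P_ii, 0, b). The bound
    b = sqrt(delta + lam) is a hypothetical reading of the truncation
    range in the text, not the authors' exact formula.
    """
    Lx = P.shape[0]
    b = np.sqrt(delta + lam)
    D = np.empty((Lx, q - 1))
    for i in range(Lx):
        D[i, :] = truncnorm_sample(0.0, np.sqrt(P[i, i]), 0.0, b, q - 1, rng)
    return D
```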

Similar to the LUTKF, the sigma points in the LSUTKF are chosen deterministically through the vectors $[\sqrt{(\delta+\lambda) P_{k-1|k-1}}]_i$ centered on the mean $\hat{x}_{k-1|k-1}$, so that they represent the mean and covariance of the posterior distribution. Moreover, to produce additional sigma points that follow the Gaussian distribution, further points are determined from the matrix $D$ obtained from the truncated distribution around the mean $\hat{x}_{k-1|k-1}$.

For example, when applied to the L96 model mentioned in section 3c, sigma points generated by the LSUTKF with mean $\hat{x}_{k-1|k-1}$ and covariance $P_{k-1|k-1}$ are indicated by circle symbols at the bottom of Fig. 3. The red circle represents the sigma point for the mean $\hat{x}_{k-1|k-1}$, and the circles connected by green and blue solid lines on either side of this point denote sigma points obtained by the deterministic sampling of the LUTKF. The circles located symmetrically in the light gray shaded region around the red circle indicate sigma points obtained by random sampling from the truncated normal distribution.

Based on this approach, we performed data assimilation tests with an enlarged ensemble size and verified its ability to eliminate spurious long-distance correlations within the LUTKF framework. However, a more detailed description of the LSUTKF is beyond the scope of this study, which aims to propose a localized form of the SPKF algorithm; the methodology, implementation, and data assimilation results for the LSUTKF will therefore be described in future work.

## 4. Cycling data assimilation experiments

### a. Lorenz (1996) model

In this section, the LUTKF algorithm is applied to the L96 model. The L96 model comprises *N*_{x} equally spaced variables $\{x_i\}_{i=1}^{N_x}$, which can be thought of as atmospheric variables at *N*_{x} equidistant grid points along a latitude circle, governed by the following differential equations:

with cyclic boundary conditions (i.e., both $x_{i+N_x}$ and $x_{i-N_x}$ are equal to *x*_{i}).

For experiments executed in this study, the L96 model setup is similar to that of Lorenz and Emanuel (1998), where the number of model variables *N*_{x} is set to 40 and the magnitude of the forcing *F* is fixed at 8, resulting in chaotic behavior in the system dynamics. The L96 model assumes that a unit time Δ*t* = 1 is equivalent to 5 days. Forward integration of Eq. (43) is carried out numerically through the fourth-order Runge–Kutta method with a time step of Δ*t* = 0.05 (i.e., 6 h).
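For reference, the L96 tendencies and one RK4 step can be written compactly as follows; this is a standard implementation of Eq. (43), with the cyclic boundaries handled by `np.roll`:

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, cyclic in i."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """Advance the L96 state by one fourth-order Runge-Kutta step (6 h)."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

Note that the uniform state *x*_{i} = *F* is a steady state of the tendencies, which is why perturbed initial conditions are needed to excite the chaotic dynamics.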

### b. Design of the experiment

To investigate the data assimilation performance of the LUTKF and LETKF, several numerical experiments were performed using model and observation system configurations that mimic the features of realistic weather systems. Both the LUTKF and LETKF employ the compactly supported correlation function [Eq. (29)] described in section 3b for the R localization technique, with the localization length scale *c* of its Gaussian-type structure set to 1.1 and 3.7 to achieve the best data assimilation results for the LUTKF and LETKF, respectively. Note that the value of *c* in this experiment is expressed in units of the distance between two adjacent grid points in the L96 model.
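Section 4d identifies Eq. (29) as the Gaspari and Cohn (1999) correlation function; a standard implementation of that fifth-order compactly supported function, which vanishes for separations *d* ≥ 2*c*, is sketched below:

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn (1999) fifth-order compactly supported correlation.

    d : separation distance(s); c : localization length scale.
    Returns values in [0, 1] that drop to zero for d >= 2c.
    """
    r = np.abs(np.atleast_1d(np.asarray(d, dtype=float))) / c
    out = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    r1 = r[inner]
    out[inner] = (((-0.25 * r1 + 0.5) * r1 + 0.625) * r1
                  - 5.0 / 3.0) * r1**2 + 1.0
    r2 = r[outer]
    out[outer] = ((((r2 / 12.0 - 0.5) * r2 + 0.625) * r2 + 5.0 / 3.0) * r2
                  - 5.0) * r2 + 4.0 - 2.0 / (3.0 * r2)
    return out
```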

For the LETKF, forecast error covariance (i.e., the uncertainty in its prior state estimate) can be underestimated because of limited ensemble size *N*_{e} and systematic errors (Whitaker and Hamill 2002; Whitaker et al. 2008; Zhang et al. 2004). To avoid filter divergence due to this problem, the LETKF in this experiment uses, in addition to the localization method, the adaptive covariance inflation scheme of Whitaker and Hamill (2012), referred to as relaxation to prior spread (RTPS). The tunable parameter of the RTPS method [*α* in Eq. (3) of Whitaker and Hamill (2012)] was set to 0.4 for all the experiments. We found that this choice of *α* yields a low value of the domain-averaged root-mean-square error (RMSE) for the LETKF. Increasing *α* above this optimal value did not improve the analyses, because the background error covariance becomes increasingly overestimated; decreasing *α* toward 0 systematically underestimates the background error covariance, degrading the performance of the LETKF through underweighted observations.
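The RTPS scheme can be sketched in a few lines; the per-variable form below follows the relaxation rule of Whitaker and Hamill (2012), with illustrative argument names and shapes:

```python
import numpy as np

def rtps_inflation(analysis_ens, prior_spread, alpha=0.4):
    """Relaxation-to-prior-spread inflation (Whitaker and Hamill 2012).

    Rescales the analysis perturbations so the posterior spread relaxes
    toward the prior spread: sigma_a <- (1 - alpha) sigma_a + alpha sigma_b.

    analysis_ens : (L, Ne) analysis ensemble, one member per column
    prior_spread : (L,) prior ensemble standard deviations
    """
    mean = analysis_ens.mean(axis=1, keepdims=True)
    pert = analysis_ens - mean
    post_spread = pert.std(axis=1, ddof=1)
    factor = (1.0 - alpha) + alpha * prior_spread / post_spread
    return mean + pert * factor[:, None]
```

With *α* = 0 the analysis ensemble is returned unchanged, and with *α* = 1 its spread is fully restored to the prior spread, matching the sensitivity to *α* discussed above.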

State estimation results of the LUTKF depend on the choices for the control parameters *α*, *β*, and *κ*, which were introduced in section 2. Trial simulations (not shown here) showed that values of *α* = 1, *β* = 2, and *κ* = 0 lead to a high correlation between the true state and its estimated value by the LUTKF in the experiments executed in this study.

Considering that the model errors are white and uncorrelated, the model error covariance matrix $Q$ is assumed to be diagonal. For our experiments, the diagonal elements of $Q$ are set to 0.01. The true states are generated by integrating the system over 6000 cycles with the fourth-order Runge–Kutta method from the specified initial condition **x**_{0}, which is generated by adding noise $\mathbf{w} \sim N(0, Q)$ to the magnitude of the forcing *F*. Furthermore, the covariance $Q$ is incorporated into the assimilation through Eq. (26).

The experiments were carried out with three different observation operators: the first experiment used the linear operator *h*(**x**) = **x**, where the model state is interpolated from model space to observation space; the second and third experiments used the nonlinear measurement operators |**x**| and ln(|**x**|) applied to the interpolated states, respectively, introducing an additional source of nonlinearity that can limit the effectiveness of the LETKF. The features of each observation operator are described in Fig. 5 of Poterjoy (2016).

The *N*_{z} observations are created by adding noise $\mathbf{v} \sim N(0, R)$ to the true data using Eq. (23), where the diagonal elements of the covariance matrix $R$ are equal to 0.01.^{6} Unless stated otherwise, the default configuration for the LETKF consists of a linear interpolation for *h* and no model error [e.g., **w**_{k−1} in Eqs. (1) and (22)], as in Hunt et al. (2007). Also, the value of *N*_{z} is set to 100 to prevent filter divergence during the cycling experiments performed by the assimilation methods with the three observation operators and ensemble sizes ranging from 2 to 1000. Trial simulations (not reported here) showed that this choice of *N*_{z} leads to a high degree of correlation between the true model state and its estimated value for the filters in these experimental configurations. The time interval of the observations was set to 6 h; thus, the observed states were assimilated into the nonlinear model every 6 h.
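A minimal sketch of the synthetic-observation generation with the three operators might look as follows; `make_obs`, the selection matrix `H`, and the operator labels are illustrative, not the paper's code:

```python
import numpy as np

def make_obs(x_true, H, obs_op, R_diag, rng):
    """Generate synthetic observations z = h(H x) + v, with v ~ N(0, R).

    H       : (Nz, Nx) matrix mapping the state to observation locations
    obs_op  : "linear", "abs", or "logabs" for h(x) = x, |x|, ln(|x|)
    R_diag  : observation-error variance (scalar or length-Nz array)
    """
    ops = {
        "linear": lambda y: y,
        "abs":    lambda y: np.abs(y),
        "logabs": lambda y: np.log(np.abs(y)),
    }
    y = ops[obs_op](H @ x_true)
    return y + rng.normal(0.0, np.sqrt(R_diag), size=y.shape)
```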

The spatial positions of the observations were fixed over time and determined randomly from a Gaussian distribution centered on the variable **x**_{20} with a standard deviation of one-third of the model domain length. As a result, most observations are located at the regions of the model domain surrounding the variable **x**_{20}, and thus relatively unobserved domains exist, which is similar to environmental observation platforms. After a spinup period of 1000 cycles, the average values of prior domain-averaged RMSEs and the ensemble spread over 5000 cycles were calculated to verify the performance of the data assimilation algorithms.

The assimilation experiments were performed on a Cray XC40 system (available online from Cray, Inc., at https://www.cray.com/products/computing/xc-series) comprising 2904 dual-processor nodes, each with 128 GB of random access memory. Each processor in the node is a 2.60 GHz Intel Xeon CPU E5–2690 v3 with hyper-threading disabled, running on Linux. Twenty computational nodes on the supercomputer system were employed for verification of the assimilation performance and computational efficiency of the filters. The L96 model and data assimilation methods were implemented with Python 2.7.3 software (available online from Python Software Foundation at https://www.python.org/downloads/release/python-273) and the Message Passing Interface (MPI) for the Python package (available online at https://mpi4py.readthedocs.io/en/stable) called mpi4py, used for parallel computing in Python. For the LUTKF, the data assimilation for each model grid point by using the localization scheme can be executed independently in parallel. Therefore, the Cray system was used to execute independently the local analysis for each grid point of the L96 model with *N*_{x} = 40 in parallel, using two processor compute cores per computational node; that is, data assimilation calculations of the LUTKF were parallelized by performing the local analysis using a compute core for each grid point.

### c. Experiment with ensemble size and measurement operator

Figure 4 shows the average values of the prior ensemble mean RMSE and spread of the SPKF, LUTKF, and LETKF for the three observation operators and ensemble sizes *N*_{e}. The SPKF required 361 ensemble members [2*L*^{a} + 1, where *L*^{a} given by Eq. (5) is determined by *L*_{x} = 40, *L*_{w} = 40, and *L*_{υ} = 100] from the deterministic sampling and global analysis to provide a satisfactory filtering solution for the L96 model. As described in section 3, the LUTKF algorithm, based on nonaugmented SPKF filtering and the localization approach, can carry out the data assimilation with a small ensemble size. Since the L96 model has one model variable *x* per grid point, the spatial localization lets the LUTKF operate as if applied to a one-dimensional model (i.e., *L*_{x} = 1), so it locally required only three (2*L*_{x} + 1) ensemble members from the deterministic sampling to provide a stable filtering solution at each grid point of the *N*_{x}-dimensional L96 model. On the other hand, the LETKF provided satisfactory filtering results when *N*_{e} ≥ 10 locally at each L96 model grid point; that is, its prior RMSE values shown in Fig. 4 decreased abruptly until the number of ensemble members reached 10 (i.e., the optimal ensemble size) and then converged to about 0.114, 0.115, and 0.182 for the measurement operators *h*(**x**) = **x**, *h*(**x**) = |**x**|, and *h*(**x**) = ln(|**x**|), respectively. On average, its prior spread values matched the true forecast errors.

When *h*(**x**) = **x** and *h*(**x**) = |**x**|, the results show that the LUTKF outperforms the prior estimation accuracy of the LETKF for small ensemble sizes ranging from 2 to 7 (blue shading in Figs. 4a,b), because the LETKF produces a noisy estimate for the forecast error distribution of the true state if a small value of *N*_{e} is used due to the random sampling.

When *h* is strongly nonlinear [i.e., *h*(**x**) = ln(|**x**|)], the LUTKF achieves about 91% higher accuracy than the LETKF for *N*_{e} = 3 (see blue shading in Fig. 4c), compared with about 46.21% and 48.74% higher accuracy for the observation types *h*(**x**) = **x** and *h*(**x**) = |**x**|, respectively. Because the LETKF makes an implicit linearization assumption, retaining only the first-order term of the Taylor expansion even for the highly nonlinear function *h*, it is suboptimal for this observation operator. By contrast, the LUTKF makes no assumption regarding *h*, owing to the SUT, and thereby provides at least second-order accuracy.

Figure 4 also shows that the SPKF, based on the global analysis and the augmented state vector, can offer slightly more accurate estimates with a larger ensemble size for the measurement operators *h*(**x**) = **x**, *h*(**x**) = |**x**|, and *h*(**x**) = ln(|**x**|) than the LUTKF, which is based on the local analysis and the nonaugmented state vector. However, the SPKF can be computationally infeasible because the large dimensionality (*L*^{a}) of the augmented state vector requires an excessive number of ensemble members. The LUTKF, in contrast, is more computationally efficient for nonlinear systems, reducing the number of sigma points through the nonaugmented state vector and the localization approach. The computational performance of the SPKF and LUTKF is discussed further in section 4e.

Figures 5–8 show rank histograms obtained from the prior ensembles of the SPKF, LUTKF, and LETKF during the experiments using the three observation types. The rank histograms were obtained by counting how often the true model state falls in each of the discrete bins formed by ranking the prior ensemble members in ascending order. This verification was performed for state variables 2, 10, 18, 26, and 34 and for all variables from 1 to 40 in the model domain, using the SPKF with *N*_{e} = 361 (Fig. 5), the LUTKF with *N*_{e} = 3 (Fig. 6), and the LETKF with *N*_{e} = 3 (Fig. 7) and *N*_{e} = 10 (Fig. 8) during the data assimilation cycles. Nonuniform histograms indicate that the probabilistic representation of a variable by the prior ensemble deviates from the true prior distribution of the system state.
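The rank-histogram computation described here can be sketched as follows (ties between the truth and ensemble members are ignored for simplicity; the function name is illustrative):

```python
import numpy as np

def rank_histogram(truth_series, ens_series):
    """Rank of the truth within the sorted prior ensemble at each time.

    truth_series : (T,) verifying truths for one state variable
    ens_series   : (T, Ne) prior ensemble members at the same times
    Returns counts over Ne + 1 bins; a flat histogram suggests that the
    ensemble samples the same distribution as the truth.
    """
    ranks = (ens_series < truth_series[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ens_series.shape[1] + 1)
```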

Figures 5 and 6 show that the SPKF and LUTKF achieve flat histograms for all the state variables and the three observation types. While Fig. 5 indicates that the SPKF, based on the global domain, can approximate the true prior distribution of the model state with a large ensemble size by using the augmented state vector, Fig. 6 demonstrates the advantage of the LUTKF in yielding reliable probabilistic representations close to the truth even with a small number of ensemble members. As expected, the histograms in Fig. 7 reveal noticeable deficiencies in the probabilistic forecasts of the LETKF using a small ensemble size *N*_{e} = 3. In particular, the observation type *h*(**x**) = ln(|**x**|) shows substantial deviations, which stem from the suboptimal analysis, accurate only to first order even for the strongly nonlinear function, as well as from the small number of ensemble members (see Figs. 7m–r). On the other hand, Fig. 8 shows that if the ensemble size is large enough to represent the true prior distribution of the model state, LETKF forecasts can provide relatively uniform histograms.

### d. Experiment on correlation between state variables

In this subsection, several data assimilation experiments illustrate how the localization approach affects the prior correlation between neighboring state variables. The LUTKF and LETKF were run as local filters using localization, while the SPKF and ETKF were run as the corresponding global filters without localization; the localization performance was then compared.

Figure 9 shows prior correlations between state variable 20 (i.e., **x**_{20}) and all other variables in the background covariance, which were determined by the local filters and global filters; the assumed true correlations obtained by the particle filter (PF; Doucet et al. 2001; van Leeuwen 2009) using 10 000 ensemble members are illustrated by black lines for reference.^{7} For simplicity, the Gaspari and Cohn (1999) correlation function [Eq. (29)] used in the local filters for the localization is marked as “GC” in this figure.

For *N*_{e} = 100, the ETKF offers relatively accurate correlations near **x**_{20} (red lines in Figs. 9a–c). However, the use of *N*_{e} that is insufficient to describe nonzero uncertainty in the global space can lead to a noisy estimate of the correlation between state variables far from **x**_{20} in the background covariance (green lines in Figs. 9a–c). The resultant inaccurate correlations can overestimate the impact of the observations from one location on the analysis in distant locations.

When *N*_{e} = 10, the LETKF provides comparable results as those produced by the ETKF using *N*_{e} = 100 (red lines in Figs. 9d–f). This is because the localization requires an ensemble whose size represents the nonzero uncertainty in only the local region and removes spurious correlations between distant positions in the background covariance by reducing the influence of observations on distant state variables.

Nonetheless, if *N*_{e} is too small for the local analysis, the correlation results of the LETKF are relatively noisy in regions far from **x**_{20} (green lines in Figs. 9d–f). Furthermore, the correlations in the LETKF experiments using *N*_{e} = 3 with the highly nonlinear *h*(**x**) = ln(|**x**|) were less accurate than in the experiments with the observation types *h*(**x**) = **x** and *h*(**x**) = |**x**| (green line in Fig. 9f). This result is caused by the incorrect assumption of linearity in the LETKF for the nonlinear function *h*, in addition to sampling errors due to a small *N*_{e}. For the LETKF, a larger value of *N*_{e} permits a larger localization radius *c*, so that more observations can be used effectively in the state estimate, as shown in Figs. 9d–f.

On the other hand, the LUTKF with the smaller ensemble size than the LETKF using *N*_{e} = 10 can estimate relatively accurate prior correlations [even for *h*(**x**) = ln(|**x**|)], since it makes no assumption with regard to *h*, owing to the SUT (red lines in Figs. 9j–l).

Also, the LUTKF provides results comparable to the SPKF with 361 [2*L*^{a} + 1, where *L*^{a} given by Eq. (5) is determined by *L*_{x} = 40, *L*_{w} = 40, and *L*_{υ} = 100] ensemble members (red lines in Figs. 9g–i). These results demonstrate that the LUTKF can capture major components of the state estimate using a much smaller *N*_{e} than would be required otherwise, thereby providing a good approximation when many ensemble members are not available.

### e. Potential advantages of the LUTKF

For data assimilation problems in realistic settings, such as atmospheric and oceanic systems, remotely sensed data from radars (Johnson et al. 2015) and satellites (Hocking et al. 2011) need to be assimilated to perform weather prediction and analysis. In this case, the filters should process densely spaced observations that are related nonlinearly to the model state, which poses a challenge for the LETKF, given the incorrect linearization assumption of *h*. For a situation analogous to this problem, the first set of experiments (hereafter Case A) was performed for the L96 system by assimilating a network of 200 randomly located observations every 6 h using *h*(**x**) = ln(|**x**|).

Figures 10a–d show the prior state estimates obtained from the LETKF with 3 and 10 members, the LUTKF with *N*_{e} = 3, and the SPKF with *N*_{e} = 561 [2*L*^{a} + 1, where *L*^{a} given by Eq. (5) is determined by *L*_{x} = 40, *L*_{w} = 40, and *L*_{υ} = 200] over time steps 3000 to 4000 of the cycling experiments, allowing a clearer comparison of the performance of the methods. The results show that all the filters except the LETKF with *N*_{e} = 3 provide state estimates close to the true state, indicating that they perform well in estimating the model state provided that the magnitude of the observation noise is appropriate. However, the LETKF must assume linearity of the observation model, and the SPKF requires a large ensemble size of 561; the LUTKF, by contrast, uses only three ensemble members, indicating its benefit over both. The variation of the prior ensemble mean RMSE over the 40 variables with time for the assimilation schemes is shown in Fig. 10e.

Table 1 compares the overall assimilation performance of each filter through the averaged prior RMSE over all the variables and time steps during the cycling experiments. As seen in Fig. 10e and Table 1, the LUTKF can accomplish forecast results that are as accurate as both the LETKF using *N*_{e} = 10 and SPKF.

For the sake of completeness, Fig. 11 shows the prior state estimates and their domain-averaged RMSEs for the Case A experiments using *N*_{z} = 100, as performed in section 4c. The figure shows that the model states are fairly well estimated by the LUTKF, with prior RMSE values comparable to those of the LETKF using *N*_{e} = 10 and the SPKF using *N*_{e} = 361 [2*L*^{a} + 1, where *L*^{a} given by Eq. (5) is calculated by *L*_{x} = 40, *L*_{w} = 40, and *L*_{υ} = 100] at most time steps, although the opposite occurs at some assimilation steps. Although not shown, the four assimilation methods in this case yield higher prior mean RMSEs than the results in Table 1, owing to the smaller number of observations; the prior mean RMSE values of the LETKF with *N*_{e} = 3, LETKF with *N*_{e} = 10, LUTKF, and SPKF are 2.685, 0.182, 0.213, and 0.171, respectively.

The frequent peaks in the prior mean RMSEs in Figs. 10e and 11e result from either overestimation or underestimation of the model states, as shown in Figs. 10a and 11a, respectively. These estimation errors are most likely related to random measurement noise and to the chaotic character of the L96 system in transition states. Thus, the assimilation methods can estimate the model state poorly when using very noisy observations during a transition period. The performance of the filters when noisier observations are assimilated is examined in detail in the second set of experiments.

Table 1 also shows the computational time required for each assimilation approach to estimate the model state. To compare computational efficiency, all the assimilation schemes were implemented in the same programming framework described in section 4b. In addition, the local analyses for the model grid points in the LUTKF and LETKF can be computed independently in parallel using a parallel programming library such as the mpi4py package. As a result, even though the LUTKF yields no computational benefit over the LETKF when *N*_{e} = 3, it incurs a lower computational cost than the LETKF with *N*_{e} = 10, as shown in Table 1. The SPKF, which requires 561 members for a good state estimate, is the most computationally expensive, costing 12.31 times as much as the LUTKF.

To mimic a more realistic situation, the second set of experiments (hereafter Case B) was executed by increasing the noise level (error standard deviation) of the observations tenfold; that is, the diagonal entries of the observation error covariance $R$ were set to 1. The diagonal entries of the model error covariance $Q$ were also fixed at 1. Figure 12 presents the assimilation results for the state estimation methods and the corresponding RMSEs over assimilation time steps 3000 to 4000. Comparing Figs. 12a–c reveals that the LUTKF estimates the state better than the LETKF with *N*_{e} = 3 or *N*_{e} = 10. Figure 12e shows divergent RMSE tracks among the four estimation approaches over the assimilation cycles. For example, the RMSE of the SPKF changes almost steadily, while that of the LUTKF varies considerably with time; in the two LETKF runs the changes are even more abrupt and frequent. Figure 12e also shows that the LUTKF provides more reliable and accurate estimates than the two LETKF approaches, even though the system is subject to higher noise.

The prior mean RMSE of assimilation results for each estimation scheme in Case B is shown in Table 2. The results show that the estimation accuracy of the LUTKF is as good as the SPKF, and the LUTKF provides better results than the two LETKF methods. Table 2 also compares the correlation between the true and estimated trajectories from the assimilation methods. The LUTKF using *N*_{e} = 3 can provide correlation results that are as high as the SPKF assimilation. This means that the LUTKF has a superior ability to capture the observation information effectively when estimating the model state using noisy observations. For Case B experiments with *N*_{z} = 100 (not reported here), the LETKF with *N*_{e} = 3 and *N*_{e} = 10, LUTKF, and SPKF yield prior mean RMSE values of 4.243, 1.518, 1.636, and 1.146, respectively, and their correlations are 0.313, 0.911, 0.902, and 0.949, respectively.

Although not shown, the data assimilation methods in Case B are performed at a similar computational cost as Case A and result in accurate estimates of the error statistics of the model. To provide a more complete assessment of filter performance when *h* is nonlinear, Fig. 13 shows assimilation results and corresponding RMSEs for the additional experiment using *h*(**x**) = |**x**| in Case B. From Fig. 13, we can see that the LUTKF can provide state estimation results that are as accurate as the LETKF with *N*_{e} = 10 and SPKF. Although not shown, the prior mean RMSE values of the LETKF for *N*_{e} = 3, LETKF for *N*_{e} = 10, LUTKF, and SPKF assimilation results are 0.826, 0.331, 0.397, and 0.329, respectively. These results suggest that LUTKF continues to provide reliable performance, even if observations relate nonlinearly to the model state, provided that sufficient observations are available.

In the Case A and Case B experiments, densely spaced observations close to variable **x**_{20} were assimilated, a setting in which the LUTKF with *N*_{e} = 3 is particularly advantageous: the filter degeneration expected from a small ensemble size can be offset by assimilating numerous observations at nearly the same location, as the settings *N*_{z} = 100 and *N*_{z} = 200 imply.

To examine the impact of less dense observations on the assimilation performance of the filters, the third set of experiments (hereafter Case C) was performed as in Case B, but with the densely spaced observations (*N*_{z} = 200) replaced by 40 observations (*N*_{z} = 40) equally spaced on the grid points; that is, all of the Case B experimental configurations except the observation network were retained in Case C. For the best assimilation results in Case C, the localization length scale *c* of the Gaussian localization function given by Eq. (29) is set to 1.1, 1.4, and 3.7 for the LUTKF with *N*_{e} = 3, LETKF with *N*_{e} = 3, and LETKF with *N*_{e} = 10, respectively.

Figure 14 shows the prior state estimates of the LETKF with *N*_{e} = 3 and *N*_{e} = 10, LUTKF using *N*_{e} = 3, and SPKF using *N*_{e} = 241 [2*L*^{a} + 1, where *L*^{a} given by Eq. (5) is determined by *L*_{x} = 40, *L*_{w} = 40, and *L*_{υ} = 40] and their corresponding domain-averaged RMSEs in Case C. Figure 15 represents the prior state estimation errors from the true state and the posterior state estimates minus prior state estimates (analysis increments) as a function of time (data assimilation cycle) and space (L96 model state variable) for the LETKF with *N*_{e} = 10, LUTKF using *N*_{e} = 3, and SPKF using *N*_{e} = 241 in Case C. As seen in Figs. 14, 15a, 15c, and 15e, the LUTKF can achieve estimation results that are better than the LETKF with *N*_{e} = 3 and comparable to the LETKF with *N*_{e} = 10, although it cannot provide estimation results that are as accurate as the SPKF, since assimilated observations are insufficient. Figures 15b, 15d, and 15f show that the continuity of the analysis increment of the LUTKF is comparable to that of the LETKF. This occurs because the LUTKF that performs the spatial localization as in the LETKF can result in the overlap of assimilated observations within the local regions between neighboring model grid points, which helps to ensure the spatial continuity of the analysis at adjacent grid points (Hunt et al. 2007).

Table 3 shows the prior mean RMSE, the correlation between the true and estimated states, and the computational time for the assimilation schemes in Case C. From Fig. 14e and Table 3, we can see that the estimation accuracy of the LUTKF is similar to that of the LETKF with *N*_{e} = 10 in both RMSE and correlation, although the LUTKF does not improve on the SPKF with *N*_{e} = 241. This suggests that the LUTKF can capture the observation information as effectively as the LETKF with *N*_{e} = 10 when estimating the model state from a smaller number of observations.
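A minimal sketch of the verification diagnostics reported in Table 3, assuming the usual definitions: the prior mean RMSE is the domain-averaged RMSE of each cycle averaged over all cycles, and the correlation is the Pearson correlation between true and estimated states pooled over cycles and grid points. The exact formulas used in the paper are not shown in this excerpt, so these definitions are assumptions.

```python
import numpy as np

def prior_mean_rmse(truth, estimate):
    """Domain-averaged RMSE per cycle, averaged over all cycles.

    truth, estimate: arrays of shape (n_cycles, n_state)."""
    per_cycle = np.sqrt(np.mean((estimate - truth) ** 2, axis=1))
    return per_cycle.mean()

def state_correlation(truth, estimate):
    """Pearson correlation between true and estimated states,
    pooled over all cycles and grid points."""
    return np.corrcoef(truth.ravel(), estimate.ravel())[0, 1]
```

Note that a constant bias leaves the correlation at 1 while inflating the RMSE, which is why the two diagnostics are reported together.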

Furthermore, because fewer observations are assimilated in Case C than in Case A and Case B, the LETKF with *N*_{e} = 3, LETKF with *N*_{e} = 10, LUTKF, and SPKF in Case C yield higher prior mean RMSEs and lower correlations than those in Table 1 and Table 2, and their computational times decrease accordingly: the times in Table 1 exceed those in Case C by about 60.29%, 105.16%, 74.93%, and 76.82%, respectively, as shown in Table 3.

To examine the assimilation performance of the filters when applied to the L96 system with more chaotic behavior, Case C experiments using the L96 system with *F* = 8.1 were performed. In general, the behavior of the L96 system is affected by the magnitude of *F* (Lorenz 1996). For very small values of *F*, all solutions of Eq. (43) will converge to a steady state, while, for somewhat larger values of *F*, most solutions are periodic. However, when the value of *F* is large enough, the solutions are chaotic.

The maximal Lyapunov exponent (MLE), which provides a reasonable indication of the degree of chaos in a system, was calculated; a negative MLE indicates that the system is exponentially stable, while a positive MLE means that the system is chaotic and unpredictable. The MLE values of the L96 model for *F* = 8.0 and *F* = 8.1 were 0.212 and 0.218, respectively. This suggests that the L96 model with *F* = 8.1 displays somewhat more chaotic behavior than the model using *F* = 8.0.
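The paper does not state how the MLE was computed; a common choice is the two-trajectory (Benettin-style) method sketched below, in which a tiny perturbation to an L96 trajectory is repeatedly renormalized and its mean logarithmic growth rate is accumulated. The time step, spinup length, and other settings here are assumptions, so the resulting values need not match the reported 0.212 and 0.218, which depend on the paper's integration settings and time units.

```python
import numpy as np

def l96_rhs(x, F):
    """L96 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F (cyclic)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4(x, F, dt):
    """One fourth-order Runge-Kutta step of the L96 model."""
    k1 = l96_rhs(x, F)
    k2 = l96_rhs(x + 0.5 * dt * k1, F)
    k3 = l96_rhs(x + 0.5 * dt * k2, F)
    k4 = l96_rhs(x + dt * k3, F)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def mle(F, n=40, dt=0.01, spinup=2000, steps=20000, d0=1e-8):
    """Maximal Lyapunov exponent via repeated renormalization of a
    perturbed twin trajectory (Benettin-style two-trajectory method)."""
    rng = np.random.default_rng(0)
    x = F + 0.01 * rng.standard_normal(n)   # start near, then fall onto, the attractor
    for _ in range(spinup):
        x = rk4(x, F, dt)
    y = x + d0 * rng.standard_normal(n) / np.sqrt(n)  # perturbation of size ~d0
    s = 0.0
    for _ in range(steps):
        x, y = rk4(x, F, dt), rk4(y, F, dt)
        d = np.linalg.norm(y - x)
        s += np.log(d / d0)
        y = x + (d0 / d) * (y - x)          # rescale perturbation back to d0
    return s / (steps * dt)                 # mean exponential growth rate
```

A positive return value indicates chaos, matching the qualitative discussion above; the magnitude depends on the chosen time units.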

Figure 16 shows the prior state estimates and corresponding RMSEs of the filters for the Case C experiments using the L96 system with *F* = 8.1. Similar to the assimilation results of the L96 model using *F* = 8.0 in Fig. 14, the LUTKF using as few as three ensemble members achieves estimation performance as accurate as the LETKF with *N*_{e} = 10 and outperforms the LETKF using *N*_{e} = 3, although it yields no significant benefit over the SPKF using *N*_{e} = 241.

From Table 3, we can see that the LETKF with *N*_{e} = 3, LETKF with *N*_{e} = 10, LUTKF, and SPKF using the L96 model with *F* = 8.1 provide about 1.55%, 4.81%, 4.59%, and 5.52% higher RMSE values, respectively, and 1.74%, 1.79%, 1.34%, and 1.54% lower correlations, respectively, compared to the experiments using the L96 model with *F* = 8.0. This occurs because estimation accuracy degrades when the system state becomes more chaotic and unstable under the larger value of *F*. Table 3 also shows that the filters have similar computational costs regardless of the value of *F*, since the magnitude of *F* has little impact on the computational time of the filters.

## 5. Conclusions

This paper presents an efficient data assimilation method called the local unscented transform Kalman filter (LUTKF) for nonlinear filtering applications. Unlike the EnKF, which uses a random sampling strategy and makes assumptions regarding the linearity of measurement operators, the LUTKF estimates the error statistics of the model state (mean and covariance) through deterministically chosen samples (called sigma points) generated by the scaled unscented transformation (SUT), without making any assumptions about nonlinear observation operators.

In a broad sense, the LUTKF can be thought of as a variant of the localized EnKF (such as the LETKF) with a specific sample selection approach (i.e., the SUT). Similar to localized EnKFs, the LUTKF can reduce the impact of observations on distant state variables by using a localization approach to suppress spurious correlations between distant positions in the background covariance, whereas the SPKF estimates the model state over the global domain. The spatial local analysis within neighboring regions of each model state variable makes it possible to use far fewer measurements than the SPKF, and it enables the LUTKF to perform data assimilation with a smaller ensemble size than the SPKF.

In addition to avoiding the global analysis of the SPKF, the LUTKF is designed to carry out local analysis for the nonlinear dynamical model and observation model with additive noise. Therefore, while the system state for the SPKF is based on an augmented form in which the model state vector is redefined as the concatenation of the original model state, model error, and observation error, the LUTKF does not need to augment the model state with the random noise variables. The use of the nonaugmented state vector can reduce the number of dimensions of the sigma points as well as the total number of sigma points used.
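For reference, a sketch of standard SUT sigma-point generation (the selection scheme named above), with assumed values for the scaling parameters α, β, and κ. It illustrates the point-count argument: an augmented SPKF state of dimension *L*_{x} + *L*_{w} + *L*_{υ} = 120 requires 2*L* + 1 = 241 sigma points, while the nonaugmented 40-variable state requires only 81. The LUTKF's further reduction to three sigma points comes from its local analysis, which this sketch does not implement.

```python
import numpy as np

def sut_sigma_points(mean, cov, alpha=1e-3, beta=2.0, kappa=0.0):
    """Generate the 2L+1 sigma points and SUT weights for an L-dimensional state."""
    L = mean.size
    lam = alpha**2 * (L + kappa) - L            # SUT scaling parameter
    S = np.linalg.cholesky((L + lam) * cov)     # matrix square root (lower triangular)
    pts = np.empty((2 * L + 1, L))
    pts[0] = mean
    for i in range(L):
        pts[1 + i] = mean + S[:, i]
        pts[1 + L + i] = mean - S[:, i]
    Wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))   # mean weights
    Wc = Wm.copy()                                     # covariance weights
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)
    return pts, Wm, Wc

# Augmented SPKF state (120 dimensions) -> 241 points; nonaugmented 40 -> 81.
pts_aug, _, _ = sut_sigma_points(np.zeros(120), np.eye(120))
assert pts_aug.shape == (241, 120)
```

The weighted sigma points reproduce the input mean and covariance exactly, which is the property the SUT relies on when propagating error statistics through nonlinear models.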

Because the LUTKF uses both the nonaugmented form and the localization scheme, another advantage of the LUTKF is that it is computationally affordable for nonlinear filtering applications, and its data assimilation calculations can be easily parallelized. In terms of computing time, the LUTKF can provide substantial benefits over the SPKF that executes a global analysis.

The LUTKF approach introduced in this manuscript may be problematic when applied to geophysical data assimilation applications. Similar to localized EnKFs, the local analysis does not ensure that physical balances are preserved in the posterior model state. This issue can be worsened as the localization length scale decreases, as demonstrated by Mitchell et al. (2002). Moreover, the LUTKF assumes that observations are uncorrelated in order to localize a posterior solution in a manner similar to covariance localization in localized EnKFs, such as the LETKF approach used here. This assumption may limit the effectiveness of the filters in assimilating remotely sensed data, such as satellite retrievals (Stewart et al. 2008). In the case of infrared satellite radiance data, off-diagonal components of observation error covariance can be significant (Bormann et al. 2016); in this situation, further consideration is required regarding the feasibility of the LUTKF approach.

Results from the cycling data assimilation experiments with the 40-variable Lorenz model show that the LUTKF requires only three ensemble members (i.e., sigma points) to accurately estimate the error statistics of the model. With the linear observation operators, the LUTKF yields lower prior mean RMSEs than the LETKF (with localization and adaptive inflation) for ensemble sizes smaller than eight. The LUTKF is most beneficial in applications in which dense observations related nonlinearly to the model state are assimilated. In this case, the LUTKF offers considerable benefits over the LETKF and results that are comparable to the SPKF. These results suggest that the LUTKF has the potential to become an effective method to assimilate observations into realistic geophysical systems that require strongly nonlinear observation operators; for example, the assimilation of remotely sensed data, such as satellite radiances or radar reflectivity.

We have already begun examining the feasibility of the LUTKF in larger models. Although not discussed in this study, the LUTKF requires a much smaller ensemble size than the LETKF and SPKF to estimate the error statistics of the model in cycling data assimilation experiments carried out with a realistic atmospheric model that includes 70 762 328 variables. Furthermore, we have developed a variant of the LUTKF algorithm, called the local scalable unscented transform Kalman filter (LSUTKF), that can scale the ensemble size to approximate the error statistics of the realistic model more accurately. Data assimilation experiments using this model will be the topic of a future study that will explore the estimation accuracy, numerical stability, and computational cost for high-dimensional problems.

## Acknowledgments

We thank Dr. Jeffrey Anderson for helpful and insightful comments on this work. This work was part of the R&D project on the development of global numerical weather prediction systems of the Korea Institute of Atmospheric Prediction Systems (KIAPS) funded by the Korea Meteorological Administration (KMA) and supported by 2019 Research Fund of Myongji University.

### APPENDIX

#### Kalman Gain and Analysis Covariance for the SPKF

The error in the posterior state estimation is defined as

$$\mathbf{e}_{k|k} = \mathbf{x}_k - \hat{\mathbf{x}}_{k|k}. \tag{A1}$$

Because the Kalman filter is a minimum mean square error (MMSE) estimator, we want to minimize the mathematical expectation of the square of the magnitude of the estimation error, which is given by $E[\|\mathbf{e}_{k|k}\|_2^2]$.^{8} This is the same as minimizing the trace of the posterior estimation error covariance (analysis covariance) $\mathbf{P}_{k|k}$ calculated by

$$\mathbf{P}_{k|k} = E[\mathbf{e}_{k|k}\mathbf{e}_{k|k}^{\mathrm{T}}]. \tag{A2}$$

Assuming that the estimation error $\mathbf{e}_{k|k}$ is unbiased (i.e., $E[\mathbf{e}_{k|k}] = \mathbf{0}$), and noting that the state update $\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k\tilde{\mathbf{z}}_k$ implies $\mathbf{e}_{k|k} = \mathbf{e}_{k|k-1} - \mathbf{K}_k\tilde{\mathbf{z}}_k$, the analysis covariance $\mathbf{P}_{k|k}$ is rewritten as

$$\mathbf{P}_{k|k} = E[(\mathbf{e}_{k|k-1} - \mathbf{K}_k\tilde{\mathbf{z}}_k)(\mathbf{e}_{k|k-1} - \mathbf{K}_k\tilde{\mathbf{z}}_k)^{\mathrm{T}}], \tag{A3}$$

where $\mathbf{e}_{k|k-1} = \mathbf{x}_k - \hat{\mathbf{x}}_{k|k-1}$ and $\tilde{\mathbf{z}}_k = \mathbf{z}_k - \hat{\mathbf{z}}_k$.

Similar to the posterior estimation error $\mathbf{e}_{k|k}$, assuming that both the prior estimation error $\mathbf{e}_{k|k-1}$ and measurement residual $\tilde{\mathbf{z}}_k$ are unbiased, the cross covariance matrices $\mathbf{P}_{xz}$ (between $\hat{\mathbf{x}}_{k|k-1}$ and $\hat{\mathbf{z}}_k$) and $\mathbf{P}_{zx}$ (between $\hat{\mathbf{z}}_k$ and $\hat{\mathbf{x}}_{k|k-1}$) can be expressed as

$$\mathbf{P}_{xz} = E[\mathbf{e}_{k|k-1}\tilde{\mathbf{z}}_k^{\mathrm{T}}], \tag{A4}$$

$$\mathbf{P}_{zx} = E[\tilde{\mathbf{z}}_k\mathbf{e}_{k|k-1}^{\mathrm{T}}] = \mathbf{P}_{xz}^{\mathrm{T}}. \tag{A5}$$

Expanding Eq. (A3) and substituting Eqs. (A4) and (A5), together with the prior covariance $\mathbf{P}_{k|k-1} = E[\mathbf{e}_{k|k-1}\mathbf{e}_{k|k-1}^{\mathrm{T}}]$ and the innovation covariance $\mathbf{P}_{zz} = E[\tilde{\mathbf{z}}_k\tilde{\mathbf{z}}_k^{\mathrm{T}}]$, gives

$$\mathbf{P}_{k|k} = \mathbf{P}_{k|k-1} - \mathbf{K}_k\mathbf{P}_{zx} - \mathbf{P}_{xz}\mathbf{K}_k^{\mathrm{T}} + \mathbf{K}_k\mathbf{P}_{zz}\mathbf{K}_k^{\mathrm{T}}. \tag{A6}$$

Next we find the Kalman gain matrix $\mathbf{K}_k$ that minimizes the trace of the covariance $\mathbf{P}_{k|k}$. The trace of the analysis covariance $\mathrm{Tr}(\mathbf{P}_{k|k})$ is minimized when its matrix derivative with respect to the gain matrix is zero. Using the gradient matrix rules, this can be represented by

$$\frac{\partial\,\mathrm{Tr}(\mathbf{P}_{k|k})}{\partial\mathbf{K}_k} = -2\mathbf{P}_{xz} + 2\mathbf{K}_k\mathbf{P}_{zz} = \mathbf{0}. \tag{A7}$$

From Eq. (A7), the optimal Kalman gain that yields MMSE estimates is calculated by

$$\mathbf{K}_k = \mathbf{P}_{xz}\mathbf{P}_{zz}^{-1}. \tag{A8}$$

By replacing the Kalman gain in Eq. (A6) with the optimal value derived above, the optimal expression for the analysis covariance is given by

$$\mathbf{P}_{k|k} = \mathbf{P}_{k|k-1} - \mathbf{K}_k\mathbf{P}_{zz}\mathbf{K}_k^{\mathrm{T}}. \tag{A9}$$
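The derivation above can be checked numerically. The sketch below builds random but consistent covariances and verifies that the gain $\mathbf{P}_{xz}\mathbf{P}_{zz}^{-1}$ zeroes the trace gradient of Eq. (A7), reduces the general analysis covariance of Eq. (A6) to the compact optimal form, and attains a trace no larger than that of a perturbed gain. The matrix sizes here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, nz = 4, 3  # state and observation dimensions (arbitrary)

# Random but consistent covariances: Pzz and Pprior symmetric positive
# definite, Pxz an arbitrary cross covariance, with Pzx = Pxz.T.
A = rng.standard_normal((nz, nz))
Pzz = A @ A.T + nz * np.eye(nz)
Pxz = rng.standard_normal((nx, nz))
B = rng.standard_normal((nx, nx))
Pprior = B @ B.T + nx * np.eye(nx)

def analysis_cov(K):
    """General analysis covariance as a function of the gain [form of Eq. (A6)]."""
    return Pprior - K @ Pxz.T - Pxz @ K.T + K @ Pzz @ K.T

K_opt = Pxz @ np.linalg.inv(Pzz)          # optimal MMSE gain
grad = -2.0 * Pxz + 2.0 * K_opt @ Pzz     # trace gradient of Eq. (A7), should vanish
P_opt = analysis_cov(K_opt)               # general form evaluated at the optimum
P_compact = Pprior - K_opt @ Pzz @ K_opt.T  # compact optimal expression
```

Because the trace is a convex quadratic in the gain, any perturbation of `K_opt` can only increase it, which is what the minimization argument in the appendix asserts.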

## REFERENCES

*J. Atmos. Sci.*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*IEEE Trans. Veh. Technol.*

*J. Climate*

*J. Guid. Control Dyn.*

*IEEE Trans. Signal Process.*

*Ocean Modell.*

*Sequential Monte Carlo Methods in Practice*

*J. Geophys. Res.*

*J. Geophys. Res.*

*Ocean Dyn.*

*Mon. Wea. Rev.*

*IEEE Trans. Aerosp. Electron. Syst.*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Predictability of Weather and Climate*

*J. Atmos. Oceanic Technol.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Physica D*

*IEEE Trans. Autom. Control*

*Mon. Wea. Rev.*

*Proc. 2002 American Control Conf.*

*Proc. IEEE*

*IEEE Trans. Autom. Control*

*Mon. Wea. Rev.*

*Proc. IEEE Signal Processing Algorithms Architectures, Arrangements, and Applications*

*Proc. Seminar on Predictability*

*J. Atmos. Sci.*

*Physica D*

*J. Atmos. Oceanic Technol.*

*J. Atmos. Sci.*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*Ocean Modell.*

*Tellus*

*Mon. Wea. Rev.*

*Stat. Comput.*

*Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches*. Wiley-Interscience, 526 pp

*Int. J. Numer. Methods Fluids*

*J. Adv. Model. Earth Syst.*

*Mon. Wea. Rev.*

*Proc. 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (ASSPCC)*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

## Footnotes

^{1}

Since there is no general approach for determining the noise covariances $Q$_{k} and $R$_{k} of the dynamical and observation models, these covariances, known as parameters of the Kalman filter (Simon 2006), are often obtained by sensitivity experiments or Monte Carlo methods (Miller et al. 1994) and by the estimation methods suggested by Evensen (2003).

^{2}

The parameter *β* can be employed to control the error in the kurtosis that has an impact on the heaviness of the tails of the posterior distribution (Wan and van der Merwe 2000; van der Merwe 2004). If the state distribution of the dynamic model is Gaussian, then the optimal choice of *β* is 2.

^{3}

A more detailed analysis for the estimation accuracy of the SUT can be found in van der Merwe (2004) and Simon (2006).

^{4}

The observation covariance matrix $R$ is assumed to be diagonal. This is reasonable when observation errors are white and mutually uncorrelated.

^{5}

Since the local analysis needs much less data compared to the global analysis and can be separately executed for each model grid point in parallel, the localization enhances the computational efficiency of the data assimilation.

^{6}

The diagonal elements of $R$ are much smaller than those examined in most other studies in the literature. Additional experiments using diagonal elements that are set to 1 are addressed in section 4e.

^{7}

The PF makes no assumptions regarding the linearity of prediction and observation models and about the error distributions needed to estimate and sample from the posterior density.

^{8}

Denotes the two-norm of a vector, also known as the Euclidean norm, defined as the length or magnitude of the vector: $\|\mathbf{x}\|_2 = \sqrt{\mathbf{x}^{\mathrm{T}}\mathbf{x}} = \sqrt{x_1^2 + \cdots + x_n^2}$.