## 1. Introduction

Among sequential data assimilation methods, Kalman filters have been widely used in meteorology and oceanography. The standard Kalman Filter (KF) is a simplification of Bayesian estimation that provides sequential, unbiased, minimum error variance estimates based on a linear combination of all past measurements and dynamics (Welch and Bishop 1995). Since the introduction of the extended Kalman filter (EKF), the nonlinear extension of the standard KF, there have been many attempts to use the EKF in weather or climate prediction models. It has been shown that the EKF can be used for sequential data assimilation in strongly nonlinear systems (Miller et al. 1994). Unfortunately, the requirement of the Jacobian or tangent linear model (TLM) for the linearization of nonlinear functions limits the use of the EKF for many real-world problems. Another major drawback of the EKF is that it uses only the first-order terms of the Taylor expansion of the nonlinear function; this approximation often introduces large errors in the estimation of covariance matrices in highly nonlinear models (Miller et al. 1994). In other words, the inaccuracy of the propagated means and covariances resulting from the linearization of the nonlinear model is one of the major drawbacks of the EKF data assimilation algorithm.

Another alternative to the standard KF is the ensemble Kalman filter (EnKF), introduced by Evensen (Evensen 1994; Houtekamer and Mitchell 1998), in which the error covariances are estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker–Planck equation, can be estimated using ensemble integrations (Evensen 1994, 1997); thus, the error covariance matrices can be calculated by integrating the ensemble of model states. The EnKF overcomes the EKF drawback of neglecting the contributions from higher-order statistical moments in calculating the error covariance. The major strengths of the EnKF include the following: (i) there is no need to calculate the tangent linear model or Jacobian of nonlinear models, which is extremely difficult for ocean (or atmosphere) general circulation models (GCMs); (ii) the covariance matrix is propagated in time via the fully nonlinear model equations (no linear approximation as in the EKF); and (iii) it is well suited to modern parallel computers (cluster computing) (Keppenne 2000).

The finite ensemble size has major effects on the performance of the EnKF. A small ensemble size increases the residual errors and gives inaccurate statistical moments, while a large ensemble size is not computationally feasible in the case of atmospheric or ocean GCMs. Another disadvantage of the EnKF is that it assumes a linear measurement operator: if the measurement function is nonlinear, it has to be linearized in the EnKF. Nonlinear measurement functions appear in many situations: for example, in the parameter estimation of nonlinear dynamical models, the measurement relationships between observations and parameters are nonlinear. Another example is satellite altimetry data assimilation, in which the observation (sea level height) is often nonlinearly related to the variable required for assimilation (e.g., temperature). Thus, the condition of linear measurement limits the use of the EnKF in some real-world problems.

The sigma-point Kalman filters (SPKFs; van der Merwe et al. 2004) have recently been proposed in an attempt to address these drawbacks of the EKF and EnKF. The SPKF is a *derivativeless* sequential optimal estimation method that uses a novel deterministic sampling approach, eliminating the need to calculate the TLM or the Jacobian of the model equations as required by the standard KF (Julier et al. 1995; Nørgaard et al. 2000b; Ito and Xiong 2000; Lefebvre et al. 2002; Wan and van der Merwe 2000; Haykin 2001; van der Merwe 2004). It has been found that the expected error of this implicit linearization is smaller than that of a truncated Taylor series linearization (Schei 1997; Lefebvre et al. 2002; van der Merwe and Wan 2001a). The SPKF algorithm has been successfully implemented in many areas such as robotics, artificial intelligence, natural language processing, and global positioning system navigation (van der Merwe and Wan 2001a,b; Haykin 2001; van der Merwe 2004; van der Merwe et al. 2004; Wan and van der Merwe 2000). In this paper, we will show that the SPKF, as an ensemble Kalman filter with a specific ensemble, has great potential in the assimilation of nonlinear systems. This paper is meant as an initial effort in exploring the possibility of applying the SPKF in atmospheric and oceanic data assimilation.

This paper is structured as follows: section 2 introduces the sigma-point methodology and section 3 describes SPKF implementation in the highly nonlinear Lorenz model. Section 4 describes the experimental details and gives a detailed comparison of SPKF with EKF and EnKF. Section 5 describes the SPKF implementation in higher-dimensional systems. Section 6 summarizes the conclusions.

## 2. Methodology: Sigma-point Kalman filters

In this section we explain the sigma-point concept and the SPKF algorithms in detail. The so-called sigma-point approach is based on deterministic sampling of the state distribution to calculate the approximate covariance matrices for the standard Kalman filter equations. The family of SPKF algorithms includes the unscented Kalman filter (UKF; Julier et al. 1995; Wan and van der Merwe 2000), the central difference Kalman filter (CDKF; Nørgaard et al. 2000b; Ito and Xiong 2000), and their square root versions (Haykin 2001; van der Merwe and Wan 2001a,b). Another interpretation of the sigma-point approach is that it implicitly performs a statistical linearization (Gelb 1974; Lefebvre et al. 2002) of the nonlinear model through a weighted statistical linear regression (WSLR) to calculate the covariance matrices (van der Merwe and Wan 2001a,b; van der Merwe et al. 2004). In the SPKF, the model linearization is done through a linear regression between *n* points (called sigma points) drawn from a prior distribution of a random variable, rather than through a truncated Taylor series expansion at a single point (van der Merwe et al. 2004). It has been found that this linearization is much more accurate than a truncated Taylor series linearization (Schei 1997; Lefebvre et al. 2002; van der Merwe and Wan 2001a).
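The WSLR idea above can be illustrated with a minimal sketch: given weighted samples of a nonlinear function, fit the linear model that minimizes the weighted squared residuals instead of differentiating at a single point. The function `wslr` and the toy numbers below are purely illustrative, not from the paper.

```python
import numpy as np

def wslr(points, weights, f):
    """Weighted statistical linear regression: fit y ~ A x + b to samples of f."""
    y = np.array([f(x) for x in points])      # propagated samples
    xbar = weights @ points                   # weighted input mean
    ybar = weights @ y                        # weighted output mean
    dx = points - xbar
    dy = y - ybar
    P_xx = (weights[:, None] * dx).T @ dx     # weighted input covariance
    P_yx = (weights[:, None] * dy).T @ dx     # weighted cross covariance
    A = P_yx @ np.linalg.inv(P_xx)            # regression slope
    b = ybar - A @ xbar                       # regression intercept
    return A, b

# Toy example: f(x) = x**2 sampled symmetrically around x = 1.
pts = np.array([[0.5], [1.0], [1.5]])
w = np.array([0.25, 0.5, 0.25])
A, b = wslr(pts, w, lambda x: x**2)
```

Here the slope comes out as the regression through all weighted points, and the intercept absorbs curvature information that a single-point Taylor expansion would ignore.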

Consider an *L*-dimensional dynamical system represented by a set of discretized state space equations,

$$\theta_k = f(\theta_{k-1}) + q_{k-1}, \tag{1}$$

$$\psi_k = h(\theta_k) + r_k, \tag{2}$$

where $\theta_k$ represents the system state vector at time *k*, *f*(·) is the nonlinear function of the state, $q_{k-1}$ is the random (white) model error, $\psi_k$ is the measured state, *h*(·) is the measurement function, and $r_k$ is the zero-mean random measurement noise.

The analysis, or updated state estimate, is obtained from the forecast $\hat{\theta}_k^-$ and the measurement $\psi_k$ as

$$\hat{\theta}_k = \hat{\theta}_k^- + \mathbf{K}_k[\psi_k - h(\hat{\theta}_k^-)],$$

where $\mathbf{K}_k$ is the Kalman gain, which is optimally chosen such that it minimizes the weighted scalar sum of the diagonal elements of the error covariance matrix $\mathbf{P}_{\theta_k}^-$ (Gelb 1974). The standard expressions for the Kalman gain and the error covariance matrix are given by

$$\mathbf{K}_k = \mathbf{P}_{\theta_k}^-\mathbf{H}^{\mathrm T}(\mathbf{H}\mathbf{P}_{\theta_k}^-\mathbf{H}^{\mathrm T} + \mathbf{R})^{-1} \tag{4}$$

and

$$\mathbf{P}_{\theta_k}^- = E[(\theta_k - \hat{\theta}_k^-)(\theta_k - \hat{\theta}_k^-)^{\mathrm T}], \tag{5}$$

where $\mathbf{H}$ is the linearized measurement operator, $\mathbf{R}$ is the observation error covariance matrix, and *E*[·] represents the mathematical expectation or the expected value. The error covariance update, or the analysis covariance matrix, which represents the change in forecast error covariance when a measurement is employed, is given by

$$\mathbf{P}_{\theta_k} = (\mathbf{I} - \mathbf{K}_k\mathbf{H})\mathbf{P}_{\theta_k}^-, \tag{6}$$

where $\mathbf{I}$ is the identity matrix. For the EKF, the forecast error covariance is formulated as

$$\mathbf{P}_{\theta_k}^- = \mathbf{A}_{k-1}\mathbf{P}_{\theta_{k-1}}\mathbf{A}_{k-1}^{\mathrm T} + \mathbf{Q}_{k-1},$$

where $\mathbf{A}_{k-1}$ is the TLM of the nonlinear model (1) and $\mathbf{Q}_{k-1}$ is the model error covariance matrix. The TLM often introduces errors in highly nonlinear models and is extremely difficult to obtain for GCMs. Another major drawback of the EKF is that it uses the linearized measurement operator $\mathbf{H}$ to calculate the Kalman gain and to update the error covariance. The linearization of a nonlinear measurement function is computationally difficult and may result in large estimation errors.
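The EKF forecast covariance step above can be sketched numerically. The matrices below are illustrative toy values (not from the paper); the point is only the propagation rule, forecast covariance = TLM times analysis covariance times TLM transpose, plus model error.

```python
import numpy as np

A = np.array([[1.0, 0.1],    # tangent linear model (Jacobian of f), toy values
              [0.0, 0.9]])
P = np.diag([0.5, 0.2])      # analysis error covariance at step k-1
Q = np.diag([0.01, 0.01])    # model error covariance

# EKF forecast error covariance: P_forecast = A P A^T + Q
P_forecast = A @ P @ A.T + Q
```

Because the propagation is a congruence transform plus a positive semidefinite term, `P_forecast` stays symmetric positive definite, as a covariance must be.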

The SPKF family addresses the above issues of EKF and EnKF. It uses a different approach in calculating the Kalman gain and the error covariance matrices. The technique employed in SPKF is to reinterpret the standard Kalman gain and covariance update equation in such a way that it does not need the TLM and the linearized measurement operator. This interpretation is explained below.

The term $\mathbf{P}_{\theta_k}^-\mathbf{H}^{\mathrm T}$ in the Kalman gain Eq. (4) can be interpreted as the cross covariance $\mathbf{P}_{\tilde{\theta}_k\tilde{\psi}_k}$ between the state and observation errors, and the remaining expression can be interpreted as the error covariance $\mathbf{P}_{\tilde{\psi}_k}$ of the difference between model and observation (Gelb 1974). Proof of this interpretation can be found in appendix A. Therefore, the optimal gain or Kalman gain $\mathbf{K}_k$ can be rewritten as

$$\mathbf{K}_k = \mathbf{P}_{\tilde{\theta}_k\tilde{\psi}_k}\mathbf{P}_{\tilde{\psi}_k}^{-1}. \tag{9}$$

Here, $\tilde{\psi}_k$ is defined as the error between the noisy observation $\psi_k$ and its prediction $\hat{\psi}_k^-$, given by $\tilde{\psi}_k = \psi_k - \hat{\psi}_k^-$. By using relation (9), the covariance update Eq. (6) can be rewritten as (see appendix B for details)

$$\mathbf{P}_{\theta_k} = \mathbf{P}_{\theta_k}^- - \mathbf{K}_k\mathbf{P}_{\tilde{\psi}_k}\mathbf{K}_k^{\mathrm T}. \tag{10}$$

Unlike the standard KF, the SPKF algorithm makes use of this new interpretation [Eqs. (9) and (10)], which avoids the use of the Jacobian while retaining consistency and accuracy.
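For a linear observation operator the reinterpreted gain and update, Eqs. (9) and (10), reduce exactly to the classical forms, which the sketch below verifies numerically. The matrices are illustrative toy values; in the SPKF the two covariances are estimated from sigma points rather than formed from an explicit H.

```python
import numpy as np

P_forecast = np.diag([1.0, 1.0])   # forecast state error covariance (toy)
H = np.array([[1.0, 0.0]])         # linear observation operator, for checking
R = np.array([[0.5]])              # observation error covariance

# With linear H, the cross covariance is P H^T and the innovation
# covariance is H P H^T + R; the SPKF estimates these from sigma points.
P_xy = P_forecast @ H.T
P_yy = H @ P_forecast @ H.T + R

K = P_xy @ np.linalg.inv(P_yy)              # Eq. (9)
P_analysis = P_forecast - K @ P_yy @ K.T    # Eq. (10)

# Classical update (I - K H) P_forecast for comparison.
P_classical = (np.eye(2) - K @ H) @ P_forecast
```

The two analysis covariances agree, which is why Eq. (10) can replace Eq. (6) without any linearized operator appearing in the filter.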

In the standard KF the state error covariance is calculated during the time update process and is corrected during the measurement update process. Updating the error covariance matrix is important because it represents the change in forecast error covariance when a measurement is performed. The EnKF implementation does not require the covariance update equation because it can directly calculate the updated error covariance matrix from a set of ensemble members. Evensen (2003) has derived the analysis covariance equation, which is consistent with the standard KF error covariance update Eq. (6). But the true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible. The SPKF makes use of the reformulated error covariance update Eq. (10) and chooses the ensemble members deterministically in such a way that they can capture the statistical moments of the nonlinear model accurately; in other words, the forecast error covariance Eq. (5) is computed using deterministically chosen samples, called sigma points. In a broad sense, the SPKF algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus, the SPKF is fully consistent with the time update and measurement update formulation of the Kalman filter algorithm. In the next subsections we will discuss each SPKF algorithm in detail.

### a. Sigma-point unscented Kalman filter (SP-UKF)

The number of sigma points required to capture the mean and covariance of an *L*-dimensional state vector at time *k* is 2*L* + 1; thus, the *sigma-point state vector* is given by (Julier et al. 1995; Julier 2002; Wan and van der Merwe 2000)

$$\chi_k = [\chi_{k,0}\quad \chi_{k,i}^{+}\quad \chi_{k,i}^{-}], \qquad i = 1, \ldots, L,$$

where $\chi_{k,0}$, $\chi_{k,i}^{+}$, and $\chi_{k,i}^{-}$ are the sigma-point vectors. The selection scheme for choosing the sigma points is based on the scaled unscented transformation, which transforms the model state vector according to the following equations:

$$\chi_{k,0} = \hat{\theta}_k, \qquad w_0^{(m)} = \frac{\lambda}{L+\lambda}, \qquad w_0^{(c)} = \frac{\lambda}{L+\lambda} + (1 - \alpha^2 + \beta), \tag{12}$$

$$\chi_{k,i}^{+} = \hat{\theta}_k + \left[\sqrt{(L+\lambda)\mathbf{P}_{\theta_k}}\right]_i, \qquad i = 1, \ldots, L, \tag{13}$$

$$\chi_{k,i}^{-} = \hat{\theta}_k - \left[\sqrt{(L+\lambda)\mathbf{P}_{\theta_k}}\right]_i, \qquad w_i^{(m)} = w_i^{(c)} = \frac{1}{2(L+\lambda)}, \tag{14}$$

where $[\sqrt{(L+\lambda)\mathbf{P}_{\theta_k}}]_i$ is the *i*th row (or column) of the weighted matrix square root of the covariance matrix $\mathbf{P}_{\theta_k}$; $w_i^{(m)}$ is the weighting term corresponding to the mean; $w_i^{(c)}$ corresponds to the covariance; and $\lambda = \alpha^2(L+\kappa) - L$ is a scaling parameter. The parameter *α* is set to a small positive value (0 ≤ *α* ≤ 1) and determines the spread of the sigma points around the mean state $\theta_k$. Another control parameter is *κ*, which guarantees the positive semidefiniteness of the covariance matrix and is usually set to a nonnegative value (*κ* ≥ 0); *β* is a nonnegative weighting term that can be used to incorporate any prior knowledge of the nature of the state distribution.

The sigma points are then propagated through the nonlinear model and the measurement function,

$$\chi_k^{\theta} = f(\chi_{k-1}^{\theta}) + \chi_{k-1}^{q}, \tag{15}$$

$$\Psi_k = h(\chi_k^{\theta}) + \chi_k^{r}, \tag{16}$$

where $\chi_k^{\theta}$ is the forecast sigma-point state vector, $\chi_{k-1}^{q}$ is the sigma-point vector corresponding to the model error, and $\chi_k^{r}$ corresponds to the observation error. The approximated mean, covariance, and cross covariance for the calculation of the Kalman gain are computed as follows (Julier et al. 1995; Julier 2002; Wan and van der Merwe 2000, 2001):

$$\hat{\theta}_k^- = \sum_{i=0}^{2L} w_i^{(m)} \chi_{k,i}^{\theta}, \tag{17}$$

$$\mathbf{P}_{\theta_k}^- = \sum_{i=0}^{2L} w_i^{(c)} (\chi_{k,i}^{\theta} - \hat{\theta}_k^-)(\chi_{k,i}^{\theta} - \hat{\theta}_k^-)^{\mathrm T}, \tag{18}$$

$$\hat{\psi}_k^- = \sum_{i=0}^{2L} w_i^{(m)} \Psi_{k,i}, \tag{19}$$

$$\mathbf{P}_{\tilde{\psi}_k} = \sum_{i=0}^{2L} w_i^{(c)} (\Psi_{k,i} - \hat{\psi}_k^-)(\Psi_{k,i} - \hat{\psi}_k^-)^{\mathrm T}, \tag{20}$$

$$\mathbf{P}_{\tilde{\theta}_k\tilde{\psi}_k} = \sum_{i=0}^{2L} w_i^{(c)} (\chi_{k,i}^{\theta} - \hat{\theta}_k^-)(\Psi_{k,i} - \hat{\psi}_k^-)^{\mathrm T}. \tag{21}$$
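A minimal sketch of the scaled unscented transform, Eqs. (12)–(21), is given below: generate the 2*L* + 1 sigma points and weights, push them through a function, and recover the transformed mean and covariance from the weighted points. The parameter values are common illustrative choices, not prescriptions from the paper; the identity function is used so the result can be checked against the input statistics.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=0.1, beta=2.0, kappa=0.0):
    L = mean.size
    lam = alpha**2 * (L + kappa) - L                 # scaling parameter
    S = np.linalg.cholesky((L + lam) * cov)          # weighted matrix square root

    # 2L+1 sigma points: the mean, then mean +/- each square-root column.
    sigmas = np.vstack([mean, mean + S.T, mean - S.T])

    wm = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))   # mean weights
    wc = wm.copy()                                   # covariance weights
    wm[0] = lam / (L + lam)
    wc[0] = lam / (L + lam) + (1 - alpha**2 + beta)

    y = np.array([f(s) for s in sigmas])             # propagate each sigma point
    ymean = wm @ y                                   # weighted mean, Eq. (17)/(19)
    dy = y - ymean
    ycov = (wc[:, None] * dy).T @ dy                 # weighted covariance, Eq. (18)/(20)
    return ymean, ycov

# Identity transform should return the input statistics exactly.
m = np.array([1.0, -1.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
ym, yP = unscented_transform(m, P, lambda s: s)
```

For a linear (here identity) mapping the transform is exact; for a nonlinear mapping it matches the true moments to at least second order.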

The Kalman gain 𝗞 can be calculated using Eq. (9) and the state covariance is updated using Eq. (10). A detailed description and derivation of the UKF algorithm and sigma-point formulation can be found in the above referenced literature.

### b. Sigma-point central difference Kalman filter (SP-CDKF)

The SP-CDKF is based on Stirling's polynomial interpolation formula. Using Stirling's interpolation, the nonlinear model given by Eq. (1) can be approximated as

$$\theta_k \approx \tilde{f}(\theta_{k-1}) = f(\bar{\theta}_{k-1}) + \tilde{D}_{\Delta\theta}\,f + \frac{1}{2!}\tilde{D}_{\Delta\theta}^2\,f,$$

where $\tilde{f}(\cdot)$ is the linearized model, and $\tilde{D}_{\Delta\theta}$ and $\tilde{D}_{\Delta\theta}^2$ are the central divided difference operators, which we explain later in this section. Here the linearization of the nonlinear model is achieved by using a linear transformation that statistically decouples the state vector $\theta_k$ (Schei 1997). It has been shown that this approximation is always better than using the Jacobian matrix (Schei 1997). The linear transformation is based on the square root factorization of the model covariance matrix and is given by

$$\theta_k = \bar{\theta}_k + \mathbf{S}_{\theta_k} z_k.$$

Here $\bar{\theta}_k$ is the mean state and $\mathbf{S}_{\theta_k}$ is the Cholesky factor of the updated error covariance matrix (10) that satisfies the following condition:

$$\mathbf{P}_{\theta_k} = \mathbf{S}_{\theta_k}\mathbf{S}_{\theta_k}^{\mathrm T}.$$

The terms $\tilde{D}_{\Delta\theta}$ and $\tilde{D}_{\Delta\theta}^2$ are the first- and second-order central divided difference operators and can be written as (Ito and Xiong 2000; Nørgaard et al. 2000a; van der Merwe and Wan 2001a; Wan and van der Merwe 2001)

$$\tilde{D}_{\Delta\theta}f = \frac{1}{\delta}\left(\sum_{i=1}^{L} \Delta\theta_i\, m_i d_i\right) f(\bar{\theta}),$$

$$\tilde{D}_{\Delta\theta}^2 f = \frac{1}{\delta^2}\left[\sum_{i=1}^{L} \Delta\theta_i^2\, d_i^2 + \sum_{i=1}^{L}\sum_{\substack{j=1\\ j\neq i}}^{L} \Delta\theta_i \Delta\theta_j (m_i d_i)(m_j d_j)\right] f(\bar{\theta}),$$

where $m_i$, $d_i$, and $d_i^2$ are the mean, partial first-order, and partial second-order central divided difference operators, respectively, defined as

$$m_i f(\bar{\theta}) = \frac{1}{2}\left[f\left(\bar{\theta} + \tfrac{\delta}{2} s_{\theta_i}\right) + f\left(\bar{\theta} - \tfrac{\delta}{2} s_{\theta_i}\right)\right],$$

$$d_i f(\bar{\theta}) = f\left(\bar{\theta} + \tfrac{\delta}{2} s_{\theta_i}\right) - f\left(\bar{\theta} - \tfrac{\delta}{2} s_{\theta_i}\right),$$

$$d_i^2 f(\bar{\theta}) = f(\bar{\theta} + \delta s_{\theta_i}) + f(\bar{\theta} - \delta s_{\theta_i}) - 2f(\bar{\theta}),$$

where $\delta$ is the central difference step size and $s_{\theta_i}$ is the *i*th column of the Cholesky factor of the updated error covariance matrix (10).

One main advantage of SP-CDKF over SP-UKF is that it uses only one “control parameter” (*δ*) compared to three (*λ*, *α*, and *κ*) in UKF. For exact derivation and algorithmic details see Ito and Xiong (2000), Nørgaard et al. (2000a), van der Merwe and Wan (2001a), and Wan and van der Merwe (2001).
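The central divided differences that SP-CDKF builds on can be sketched for a scalar function; `d1` and `d2` below are illustrative one-dimensional analogs of the operators above, with `delta` playing the role of the single control parameter.

```python
def d1(f, x, delta):
    """First-order central divided difference (scalar analog)."""
    return (f(x + delta) - f(x - delta)) / (2.0 * delta)

def d2(f, x, delta):
    """Second-order central divided difference (scalar analog)."""
    return (f(x + delta) + f(x - delta) - 2.0 * f(x)) / delta**2

# For f(x) = x**3 at x = 1, the exact derivatives are f' = 3 and f'' = 6.
f = lambda x: x**3
delta = 0.1
slope = d1(f, 1.0, delta)        # 3 + delta**2: an O(delta^2) error term
curvature = d2(f, 1.0, delta)    # exact for a cubic
```

Unlike a truncated Taylor expansion at a single point, the divided differences sample the function on both sides of the mean, which is what lets the filter pick up second-order information without any derivatives.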

## 3. SPKF applied to the Lorenz model

In the field of data assimilation, the celebrated Lorenz (1963) model has served as a test bed for examining the properties of various data assimilation methods (Gauthier 1992; Miller et al. 1994; Evensen 1997) because the Lorenz model shares many common features with the atmospheric circulation and climate system in terms of variability and predictability (Palmer 1993). By adjusting the model parameters that control the nonlinearity of the system, the model can be used to simulate nearly regular oscillations or highly nonlinear fluctuations.

### a. Lorenz 1963 model

The Lorenz (1963) model is governed by three coupled ordinary differential equations:

$$\frac{dx}{dt} = \sigma(y - x) + q^x, \tag{41}$$

$$\frac{dy}{dt} = \rho x - y - xz + q^y, \tag{42}$$

$$\frac{dz}{dt} = xy - \beta z + q^z, \tag{43}$$

where *x*, *y*, and *z* are related to the intensity of convective motion and to the temperature gradients in the horizontal and vertical directions, respectively; the parameters *σ*, *ρ*, and *β* will be referred to as dynamical parameters; and $q^x$, $q^y$, and $q^z$ represent the unknown model errors, assumed to be uncorrelated in time (white noise). Also, we assume that all the measurements or observations are linear functions of the nonlinear model states. The true data are created by integrating the model over 4000 time steps using the fourth-order Runge–Kutta scheme (Press et al. 1992), with the parameters *σ*, *ρ*, and *β* set to 10.0, 28.0, and 8/3, respectively, and the initial conditions set to 1.508870, −1.531271, and 25.46091 (Miller et al. 1994; Evensen 1997). The integration step is set to 0.01. The observation datasets are simulated by adding normally distributed noise to the true data. The assimilation process is completely governed by the model Eqs. (41)–(43) after the initial guesses are given; at each step of the integration, the initial conditions are the estimated model state from the previous step.

### b. State estimation

To apply the KF, we discretize the nonlinear Lorenz model (41)–(43) using the fourth-order Runge–Kutta method and write it in the form of the state space equations given by (1) and (2), where $\theta_k$ represents the system state vector (a column vector composed of *x*, *y*, and *z*), *f*(·) is the nonlinear function of the state, and $q_k$ is the random (white) process noise vector (a column vector composed of $q^x$, $q^y$, and $q^z$). The measured model state $\psi_k$ required for the application of the KF is a function of the states according to Eq. (2), where *h*(·) is the measurement function and $r_k$ is the random measurement noise vector.

For the SPKF implementation, the state vector is augmented with the model and measurement error vectors. The augmented state vector $\Theta_k$ and the corresponding covariance matrix are given by

$$\Theta_k = [\theta_k^{\mathrm T}\ \ q_k^{\mathrm T}\ \ r_k^{\mathrm T}]^{\mathrm T}, \qquad \mathbf{P}_{\Theta_k} = \begin{bmatrix}\mathbf{P}_{\theta_k} & 0 & 0\\ 0 & \mathbf{Q}_k & 0\\ 0 & 0 & \mathbf{R}_k\end{bmatrix}.$$

Therefore, the augmented state dimension is the sum of the original state dimension, the model error dimension, and the measurement error dimension,

$$L_{\Theta} = L_{\theta} + L_q + L_r,$$

where $L_{\theta}$ is the dimension of the state, $L_q$ is the dimension of the model error vector, and $L_r$ is that of the measurement errors. The augmented sigma points are found using the transformation Eqs. (12)–(14). The dimension of the augmented sigma-point vector is $2L_{\Theta} + 1$. For the Lorenz model discussed here, the augmented sigma-point vector dimension is 19; in other words, the number of sigma points required to approximate the error statistics accurately is 19. The augmented sigma-point vector is then propagated through (15) and (16), and the optimal terms for the calculation of the Kalman gain are computed according to Eqs. (17)–(21).
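The dimension bookkeeping of the augmentation can be sketched directly; the diagonal values for P, Q, and R below are toy placeholders, but the block-diagonal layout and the resulting sigma-point count (19 for the Lorenz model) follow the text.

```python
import numpy as np

L_theta, L_q, L_r = 3, 3, 3        # Lorenz state, model error, obs error dims
L_aug = L_theta + L_q + L_r        # augmented state dimension L_Theta = 9
n_sigma = 2 * L_aug + 1            # number of sigma points = 19

# Block-diagonal augmented covariance (toy diagonal values for P, Q, R).
P = np.diag([1.0, 1.0, 1.0])
Q = np.diag([0.1, 0.1, 0.1])
R = np.diag([2.0, 2.0, 2.0])
P_aug = np.zeros((L_aug, L_aug))
P_aug[:3, :3] = P                  # state error covariance block
P_aug[3:6, 3:6] = Q                # model error covariance block
P_aug[6:, 6:] = R                  # measurement error covariance block
```

Drawing sigma points from `P_aug` is what lets Eqs. (15) and (16) carry the model and observation errors through the nonlinear functions instead of adding them afterward.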

### c. Parameter estimation from noisy measurements

In the parameter estimation problem, the dynamical model may be a known nonlinear function *f*(·) or an empirical model parameterized by the vector $\Lambda$. The state space representation of the parameter estimation problem for the Lorenz model can be written as

$$\Lambda_k = \Lambda_{k-1} + q_k^{\Lambda}, \tag{54}$$

$$\psi_k = f(\Lambda_k) + r_k^{\Lambda}, \tag{55}$$

where *f*(·) is the nonlinear measurement model given by the Lorenz Eqs. (41)–(43); $\Lambda$ is the parameter vector that constitutes the dynamical parameters *σ*, *ρ*, and *β*; and $q_k^{\Lambda}$ and $r_k^{\Lambda}$ represent the model and measurement error vectors, respectively. The SPKF (SP-UKF and SP-CDKF) equations for the parameter estimation problem are similar to those of the state estimation formulation except that the state (here the states are parameters) time evolution is linear [Eq. (54)] and the measurement function is nonlinear [Eq. (55)].
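The structure of Eqs. (54)–(55) can be sketched as two functions: a random walk for the parameters and a nonlinear measurement through the Lorenz tendencies. Using the tendencies themselves as the observation map is an illustrative stand-in for the paper's measurement model, and `q_std` is an arbitrary noise level.

```python
import numpy as np

def parameter_step(params, rng, q_std=1e-3):
    """Eq. (54): the parameter vector evolves as a random walk."""
    return params + rng.normal(0.0, q_std, params.shape)

def measure(params, state):
    """Eq. (55): nonlinear measurement of the parameters through the
    Lorenz equations, evaluated at a given model state."""
    sigma, rho, beta = params
    x, y, z = state
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - beta * z])

rng = np.random.default_rng(1)
params = np.array([10.0, 28.0, 8.0 / 3.0])
obs = measure(parameter_step(params, rng), np.array([1.0, 2.0, 20.0]))
```

Even though the parameter dynamics (54) are trivially linear, the filter still needs a nonlinear-measurement-capable update, which is exactly where the sigma-point formulation replaces the linearized operator of EKF/EnKF.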

### d. Joint estimation of parameters and states

There are two basic approaches to the simultaneous estimation of model states and parameters: the *dual estimation* and *joint estimation* approaches (Haykin 2001; Nelson 2000; van der Merwe 2004). In the dual estimation approach, two Kalman filters run simultaneously, one for state and one for parameter estimation. In the joint estimation approach, on the other hand, the system state and parameters are concatenated into a single higher-dimensional joint state vector and only one Kalman filter is used to estimate the joint vector. For example, the joint state vector for the SPKF data assimilation can be written as

$$\Theta_k^{J} = [\theta_k^{\mathrm T}\ \ \Lambda_k^{\mathrm T}]^{\mathrm T}.$$

In the joint estimation process, the SPKF schemes estimate the states using parameters that are estimated at every time step from the prior states. In this study, we present only the joint estimation of parameters and states because it incorporates the complete model states and parameters during the assimilation cycles.

## 4. Experiments and results

In this section we demonstrate the feasibility of the SPKF algorithms as an effective data assimilation method for highly nonlinear models. The SPKF algorithms discussed in the previous sections are now examined and compared with the standard EKF and EnKF methods; to make the comparison fair, all experiments were designed almost identically to those of Miller et al. (1994) and Evensen (1997).

### a. State estimation

The first set of experiments was carried out with initial conditions, parameters, and observation noise levels identical to those in Miller et al. (1994) and Evensen (1997): the observations and initial conditions are simulated by adding normally distributed noise *N*(0, 2) to the true states, with an interval of 25 time steps between consecutive observations.

For all the cases to be discussed, we assume that the model and observation errors are uncorrelated in both space and time. Because there is no general way to set the model error, the amount of model error to use in the KF is usually determined experimentally by trial and error or by statistical methods such as Monte Carlo, which is computationally expensive (Miller et al. 1994). In our experiments, the model errors were intentionally designed in such a way that the model would not drift too far from the true state. In detail, we set the model errors by calculating the expected errors in the state scaled by a decreasing exponential factor that is a function of the assimilation time; initially, the model covariance matrix is set to an arbitrary diagonal value and then anneals toward zero exponentially as the assimilation proceeds. For simulating model errors in the ensemble Kalman filter, we follow the method suggested by Evensen (2003). An ensemble of 1000 members was used in the EnKF, as in Evensen's experiment (Evensen 1997).
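The exponential annealing of the model error covariance described above can be sketched as follows; the initial amplitude `Q0` and decay `rate` are illustrative values, since the text only specifies an arbitrary initial diagonal that decays toward zero.

```python
import numpy as np

def model_error_cov(k, Q0=1.0, rate=0.01, n=3):
    """Diagonal model error covariance at assimilation step k: an arbitrary
    initial amplitude annealed toward zero exponentially in time."""
    return Q0 * np.exp(-rate * k) * np.eye(n)

Q_start = model_error_cov(0)      # full initial amplitude
Q_late = model_error_cov(1000)    # essentially annealed to zero
```

Annealing lets the filter trust the observations heavily at the start, when the state estimate is poor, and trust the model increasingly as the assimilation converges.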

Figures 1a–d show the state estimates using the EKF, EnKF, SP-UKF, and SP-CDKF, respectively. As can be seen, all four methods generate model states close to the true values, indicating the good capability of these methods in estimating model states when the initial perturbation and observation noise are of moderate size, as given here. Results similar to Figs. 1a and 1b were also obtained by Miller et al. (1994) and Evensen (1997). It should be noted that both the EKF and the EnKF can produce good state estimates, but the former needs to construct the TLM and the latter requires a large ensemble size of 1000. In contrast, the SP-UKF and the SP-CDKF use only 19 "particular" ensemble members (sigma points) here, showing their advantages over the EKF and EnKF.

A comparison among the four methods is shown in Fig. 2: the variation of the error square (ES) with time step. The ES at time step *k* is defined here as the squared difference between the estimated state and the true model state, scaled by a scalar *N*:

$$\mathrm{ES}_k = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{k,i} - \theta_{k,i}^{t}\right)^2,$$

where $\hat{\theta}_{k,i}$ and $\theta_{k,i}^{t}$ are the *i*th components of the estimated and true state vectors, respectively.
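The two error metrics used in the comparison can be sketched directly; taking the scalar *N* to be the number of state variables is an assumption made here for illustration.

```python
import numpy as np

def error_square(estimate, truth, N=3):
    """ES at each time step: squared estimation error scaled by the scalar N
    (assumed here to be the number of state variables)."""
    return np.sum((estimate - truth) ** 2, axis=1) / N

def rmse(estimate, truth):
    """Root-mean-square error over all time steps and components."""
    return np.sqrt(np.mean((estimate - truth) ** 2))

# Tiny worked example: two time steps of a 3-variable state.
est = np.array([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]])
tru = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 2.0]])
es = error_square(est, tru)
```

The per-step ES exposes the transient spikes discussed below, while the single RMSE number summarizes each method for Table 1.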

From Fig. 2, we can see that the SP-UKF and SP-CDKF assimilations have a smaller ES than the EnKF at most times, although some assimilation steps show the opposite. These "peak" values of ES correspond to either overestimation or underestimation of model states, which are most probably related to random noise in the "observations" and to the chaotic nature of the Lorenz system. The state estimate is probably poor when a large noise is assimilated and when the state is in transition from one chaotic regime to the other (also see the discussions below).

The overall performance of each assimilation is measured by the root-mean-square error (RMSE) over all time steps, as shown in Table 1. SP-UKF and SP-CDKF have slightly smaller RMSEs than the other methods. The most impressive point in the table is that the SPKF methods use only 19 sigma points (or, in general, 19 conditional ensemble members) to estimate the statistical moments of the nonlinear model accurately. This is an advantage for data assimilation problems in low-dimensional systems, but in the case of atmospheric or ocean GCMs the 2*L* + 1 integrations are not computationally feasible. The SPKF implementation, its limitations, and methods to overcome those limitations are described in detail in section 5.

For the sake of completeness, we performed an EnKF assimilation experiment with 19 ensemble members instead of 1000. The result of this experiment is shown in Fig. 3a, and the corresponding ES and RMSE are shown in Fig. 3b and Table 1. These results show that the state estimation errors from the EnKF with 19 ensemble members are around 5–10 times those from the SPKF. Thus, the EnKF with only 19 members could not appropriately capture the mean and covariance of the highly nonlinear Lorenz model. On the other hand, with just 19 conditional ensemble members (or sigma points), the SPKF is able to capture the statistical moments of the highly nonlinear Lorenz model.

The assimilation experiments were performed on a symmetric multiprocessor (SMP) machine with two AMD Opteron 248 CPUs (Advanced Micro Devices 2007) with a clock speed of 2.2 GHz, running Linux. MATLAB 7.3.0.298 (R2006b) software (available online from Mathworks, Inc., at http://www.mathworks.com/products/matlab/index.html) was used to implement the model and the data assimilation algorithms. Table 1 also compares the computation time required by each assimilation algorithm discussed above. To compare computational efficiency, we used the same programming framework to implement all the data assimilation methods. The computational cost is lowest for the EKF, followed by the two SPKF methods. The EnKF, which requires 1000 members for a good estimate (see Fig. 1b), is the most expensive, around 50–80 times the cost of the SPKF.

The second set of experiments was carried out with a more realistic situation by increasing the observation noise level tenfold: the observations and initial conditions are generated by adding normally distributed noise *N*(0, 20) to the true states.

In the third set of experiments, we increased the observation noise level as well as the interval between consecutive observations: the interval between observations was increased from 25 to 40 time steps, and the observations and initial conditions were again generated by adding normally distributed noise to the true data.

Figure 7 shows some divergence among the four methods at some time steps of the assimilation track. For example, the errors (ES) vary almost steadily in SP-UKF, whereas SP-CDKF shows a relatively significant variation of ES with time step. Compared with the SPKF, the variation of ES is more striking in the EKF and EnKF. The significant variation of ES might be related to the chaotic nature of the Lorenz system and to the capability of each algorithm to capture the observation information. The chaotic Lorenz attractor is known to have a butterfly shape with two wings. For a good estimate of the transition state from one wing to the other, the assimilation should be able to characterize the information of both wings of the Lorenz attractor. Obviously this depends on two issues: the observations themselves and the assimilation algorithm. If observations are assimilated more frequently (i.e., the interval between observations is small), sufficient data allow more information about both chaotic regimes to be covered in the assimilation. This is the reason why there are many more "abnormal" values of ES in Fig. 7 than in Fig. 2, in which the observations are more frequent. On the other hand, if an assimilation algorithm has a better capacity to mix observation and model information to characterize transitions, it will produce better estimates of transition states. In many cases, this depends strongly on the model and observation error covariances. While the observation error covariance is usually prescribed, the model error covariance is updated at each assimilation step in the family of Kalman filters, depending on the algorithm. Thus, Fig. 7 suggests that the SPKF is probably better than the EKF and EnKF in the assimilation of some transition states using noisy observations.

Again, we repeated the EnKF assimilation (for case 3) with 1000 ensemble members; the result is shown in Fig. 8. The result is not as good as the SPKF assimilation and appears noisier. This is probably because the assimilated observations are noisier and less frequent; thus, an ensemble size of 1000 is probably not enough to capture the statistical moments accurately.

### b. Parameter estimation

Estimating uncertain dynamical model parameters is one of the important tasks in data assimilation, where the measurement function is usually nonlinear. The requirement of the tangent linear measurement operator 𝗛 in the optimal gain term given by Eq. (4) makes the EKF and EnKF assimilation schemes inaccurate and inappropriate for parameter estimation in nonlinear dynamical systems. It has been shown that EnKF data assimilation gave poor results in estimating the dynamical parameter of the Lorenz model (Kivman 2003). The SPKF methods should be better alternatives for parameter estimation because they do not need to linearize the nonlinear measurement function.

The experimental setup is identical to that of the first case of the state estimation problem discussed in the previous subsection. To simulate a more realistic situation, the initial guesses of the parameters are generated by adding normally distributed noise of covariance 100 to the true parameters. In the first case, we assume that only one parameter (say *β*) is uncertain. Thus, our task is to estimate the correct value of *β* from infrequent observations contaminated by noise. Figure 9 shows the SPKF parameter estimation results: Figs. 9a and 9b show the parameter estimation using SP-UKF and SP-CDKF, respectively.

From these figures it is clear that the SPKF assimilation methods can retrieve dynamical parameters well from noisy observations. In the above experiment, even though the initial parameter was far from the true value (the standard deviation is 10), the SPKF method is still able to estimate the parameter accurately. In general, our experiments suggest faster convergence for the SP-CDKF algorithm. This might be due to algorithm tuning, because SP-CDKF uses only one control parameter (*δ*) compared to three (*λ*, *α*, and *κ*) in SP-UKF.

In the second case we assume that two dynamical parameters (say *ρ* and *β*) are uncertain. This situation is more difficult than the first case because inaccuracy in the estimation of one parameter can result in inaccurate estimation of the other. The initial parameters were generated using the same method as in the previous case: adding normally distributed noise of covariance 100 to the true parameters. Figures 10a and 10b show the results of the simultaneous estimation of *ρ* and *β* using SP-UKF and SP-CDKF, respectively. The SPKF assimilation approaches the true parameters much more slowly here than in the single-parameter case, which suggests that more frequent observations might be needed to estimate both parameters accurately.

### c. Joint estimation of model states and parameters

Data assimilation problems involving inaccurate model states and parameters arise in many situations in meteorology and physical oceanography. In such situations our task is to estimate the model states and parameters simultaneously from a set of noisy observations. In this experiment we used the SPKF data assimilation schemes for the joint estimation of parameters and states. The experimental setup is identical to that of the state estimation discussed above, where the interval between noisy observations is set to 25 and the noise covariance is 2. In the joint estimation approach the model states and parameters evolve in time simultaneously; the model states are estimated at each assimilation step using the parameters estimated from the prior states. In this simulation we estimated the model state *x* and the dynamical parameter *σ* simultaneously. Figures 11 and 12 show the joint estimation results and the corresponding error square for the SP-UKF assimilation, respectively; Figs. 13 and 14 do the same for the SP-CDKF data assimilation.

Simultaneously estimating both state and parameter values increases the nonlinearity of the assimilation problem, thereby increasing the assimilation time needed to retrieve them. From Figs. 12 and 14, we can see that the ES of the parameter estimate decreases with time. One interesting feature in Figs. 11–14 is that even when the estimated parameters are far from the true values, the model states are still estimated well. This is because the initial model errors for the states are much higher than those for the parameter; thus, the analysis weights the observations much more than the model simulation associated with inaccurate parameters. When the estimated parameter gradually approaches the true value, the ES of the model state estimate does not seem to decrease significantly. This is because as the model error decreases with assimilation time (i.e., as the model becomes more and more accurate), the model state becomes more sensitive to slight changes in the estimated parameter.

In summary, all of the above experiments, covering state, parameter, and joint estimation with different observation frequencies and noise levels, show that sigma-point Kalman filters are efficient and accurate assimilation algorithms for the highly nonlinear Lorenz system. If the observation density is high and the noise level is small, all of the data assimilation methods discussed above estimate the model state accurately, but at the cost of additional computation and, for the EKF, the requirement of a TLM. Even when the noise level is high and the observations are less frequent, however, the SPKF can estimate the model states and parameters with good accuracy, without requiring a TLM or costly computation.

## 5. SPKF data assimilation in higher-dimensional systems

In the preceding sections, we demonstrated the power and merits of the SPKF, as well as its advantages over the EKF and EnKF, using the low-dimensional Lorenz model. One of the crucial issues in evaluating a data assimilation algorithm is its computational expense when applied to realistic models of large dimensionality. In this section we further explore the SPKF using higher-dimensional Lorenz models.

For an *L*-dimensional system, the number of sigma points required to estimate the true mean and covariance is 2*L* + 1. As described in the previous sections, this procedure works well for low-dimensional models like the Lorenz 1963 model, but 2*L* + 1 sigma-point integration is computationally unfeasible if the system dimension is of the order of tens of millions, as in global GCMs. Julier (Julier and Uhlmann 2002; Julier 2003; Julier and Uhlmann 2004) has shown that by using the *simplex unscented transformation* the minimum number of sigma points that gives the same estimation accuracy as the SP-UKF can be reduced to *L* + 1. These sigma points are called simplex sigma points, but for higher-dimensional systems even this *L* + 1 simplex sigma-point integration is computationally intractable. A possible solution to this problem is to reduce the number of sigma points by selecting a particular subset of the original sigma-point space that can approximate the error statistics of the model. In the following subsections we examine this possibility.
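A minimal sketch of the 2*L* + 1 symmetric sigma-point construction (the basic unscented transform; the scaled and simplex variants cited above use different point sets and weights). For a Gaussian, the weighted sample moments of these points reproduce the mean and covariance exactly:

```python
import numpy as np

def symmetric_sigma_points(mean, cov, kappa=0.0):
    """2L+1 sigma points of the basic unscented transform for an L-dim Gaussian."""
    L = len(mean)
    S = np.linalg.cholesky((L + kappa) * cov)       # matrix square root
    pts = np.vstack([mean, mean + S.T, mean - S.T]) # center point plus +/- columns
    w = np.full(2 * L + 1, 0.5 / (L + kappa))
    w[0] = kappa / (L + kappa)
    return pts, w

# A 3-dimensional example: the weighted sample moments match exactly.
mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 0.5]])
pts, w = symmetric_sigma_points(mean, cov, kappa=1.0)
m_hat = w @ pts
P_hat = (pts - m_hat).T @ np.diag(w) @ (pts - m_hat)
print(np.allclose(m_hat, mean), np.allclose(P_hat, cov))   # True True
```

For L = 3 this needs only 7 deterministic points, which is the economy (relative to random ensembles) that the reduction schemes above push further.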

### a. A subspace approach with sigma points: Design and implementation

The error subspace approximation $\mathsf{P}_k^p$ to the error covariance $\mathsf{P}_k$ should be defined by minimizing the norm of the difference between $\mathsf{P}_k$ and $\mathsf{P}_k^p$; i.e.,

$$\min_{\mathsf{P}_k^p} \left\| \mathsf{P}_k - \mathsf{P}_k^p \right\|. \tag{58}$$

According to the minimum criterion (58), the error subspace is characterized by the singular vectors and singular values of $\mathsf{P}_k$.

We follow an idea similar to ESSE to form a sigma-point subspace that approximates the mean and error covariance of the system. In our approach, it is assumed that although the estimate of a system's full error statistics requires all sigma points, its dominant errors can be estimated using the most important sigma points. Theoretically, these most important sigma points should be chosen based on (58). However, this would introduce considerable complexity and be difficult to implement. For simplicity, as a first step toward a complete solution, we have used principal component analysis (PCA) to identify the most important sigma points that influence the evolution of the error covariance. The main idea behind using PCA is to represent the multidimensional sigma-point space by a smaller number of sigma points while still retaining the main features of the original sigma-point space; that is, the sigma points in the principal component space are used to calculate the error propagation. The selection of sigma points is based on the proportion of variance explained. Specifically, instead of using the full sigma-point space, we use the leading principal components, thereby reducing the number of sigma points required to approximate the forecast error covariance. In the following subsections, we will see the potential of this approach in the assimilation of higher-dimensional systems.
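A sketch of this PCA-based reduction, under the assumption that we are given a matrix of forecast sigma-point deviations. It illustrates the variance-proportion selection rule, not the paper's exact implementation; the deviation matrix here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical forecast sigma-point deviations: n-dim state, m sigma points,
# with rapidly decaying variance across state directions (synthetic example).
n, m = 100, 201
A = rng.standard_normal((n, m)) * np.linspace(5.0, 0.1, n)[:, None]

P_full = A @ A.T / m                      # full sample covariance

# PCA via SVD of the scaled deviation matrix: keep the leading components
# that together explain at least 90% of the total variance.
U, s, _ = np.linalg.svd(A / np.sqrt(m), full_matrices=False)
frac = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(frac, 0.90)) + 1  # number of leading components kept
P_red = (U[:, :r] * s[:r]**2) @ U[:, :r].T

retained = np.trace(P_red) / np.trace(P_full)
print(r, retained)                        # far fewer directions than m, >= 90% of variance
```

The reduced covariance `P_red` is built from only `r` leading directions, which is the mechanism by which the number of sigma points carried forward can be cut down.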

### b. Experiments with the Lorenz 1995 model

The Lorenz 1995 model consists of *K* variables $X_1, \ldots, X_K$, which may be thought of as atmospheric variables in *K* sectors of a latitude circle, governed by

$$\frac{dX_k}{dt} = \left(X_{k+1} - X_{k-2}\right)X_{k-1} - X_k + F,$$

where the constant *F*, called the forcing term, is independent of *k*. By using cyclic boundary conditions, the definition of $X_k$ is extended to all values of *k*; that is, $X_{k-K}$ and $X_{k+K}$ equal $X_k$. It is assumed that a unit time $\Delta t = 1$ is associated with 5 days.

The experimental setup is similar to that of Lorenz (2006), with *K* = 36 and the magnitude of the forcing set to 8, for which the system is chaotic. The system is integrated using the fourth-order Runge–Kutta method with a time step of Δ*t* = 0.05 (i.e., 6 h). The experiments were carried out with random initial conditions, and the observations were generated by adding normally distributed noise *N*(0,
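The governing equations and the RK4 integration above can be sketched as follows (*K* = 36, *F* = 8, Δ*t* = 0.05 as in the text; the observation-noise standard deviation and spin-up length are assumed placeholders):

```python
import numpy as np

K, F, dt = 36, 8.0, 0.05                # 36 sectors, forcing 8, time step = 6 h

def l95_tendency(X):
    """dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F with cyclic boundaries."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def rk4_step(X, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = l95_tendency(X)
    k2 = l95_tendency(X + 0.5 * dt * k1)
    k3 = l95_tendency(X + 0.5 * dt * k2)
    k4 = l95_tendency(X + dt * k3)
    return X + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
X = F + 0.01 * rng.standard_normal(K)   # random perturbation of the unstable fixed point
for _ in range(1000):                   # spin up onto the chaotic attractor
    X = rk4_step(X, dt)

# Noisy observations of the full state (noise variance is an assumed placeholder).
obs = X + rng.normal(0.0, 1.0, size=K)
print(X.min(), X.max())                 # the trajectory stays bounded on the attractor
```

Note that the uniform state $X_k = F$ is a fixed point of the tendency (the advection term vanishes and $-X_k + F = 0$), which is why the integration starts from a small perturbation of it.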

### c. Performance and evaluation

For all the cases to be discussed, we assume that the model and observation errors are uncorrelated in both space and time. In the first case we use the "full" sigma-point space for the calculation of the error covariance; thus we have a total of 217 sigma points, and hence 217 ensemble members. Figure 15a shows the state estimate using the SP-UKF. As can be seen, the SP-UKF estimates model states close to the true values, indicating the good capability of the original SPKF method.

In the second case we use the *reduced sigma-point subspace* to calculate the error covariance. Here we select 40 sigma points, which account for more than 90% of the total variance. The result of this experiment is shown in Fig. 15b. As can be seen, the model states can be fairly well estimated by the reduced SPKF, although its estimation accuracy is not as good as that of the original SPKF. This suggests a possible route for applying the SPKF to high-dimensional systems. For the sake of completeness, we also performed an EnKF assimilation experiment with 40 ensemble members. The ensemble was generated using the same approach as in the previous experiment with the Lorenz 1963 model, in which we used 19 ensemble members (Fig. 3a). The result of this experiment is shown in Fig. 15c. Comparing Figs. 15b and 15c reveals that the reduced SPKF is better than the EnKF for the state estimate, especially in magnitude. It is apparent that the EnKF underestimates the magnitude of the model states during the transition period.

We also performed the SPKF assimilation experiment for the 960-variable Lorenz '95 model. The experimental setup is identical to that of the previous cases except that *K* = 960. Two cases are studied with this model. In the first case we use all sigma points (a total of 5761), and in the second case we use the 200 most important sigma points for the calculation of the error covariances. The results of these experiments are shown in Fig. 16. For comparison, we also performed an EnKF assimilation experiment with 200 ensemble members; the result is shown in Fig. 16c. The reduced SPKF clearly gives a better estimate than the EnKF in both phase and magnitude. The EnKF estimate is often out of phase with the "true" trajectory, a problem absent in the reduced SPKF. The correlation between the estimated and true trajectories is 0.59 for the reduced SPKF and 0.10 for the EnKF.

A great deal of additional research is needed for better design and implementation of these techniques applied to atmosphere or ocean GCMs for state, parameter, and joint estimation problems. However, the above experimental results are promising, and a variety of possible extensions to these techniques could be developed to deal with more complicated situations.

## 6. Discussion and conclusions

The EKF and EnKF, two important Kalman-type filters, have been widely applied to atmospheric and oceanic data assimilation because of their efficient and simple algorithms. Their major weaknesses are that the EKF requires the tangent linear model or Jacobian for linearization of nonlinear forecast models, and that the EnKF's performance depends strongly on ensemble size, which often imposes an intractable computational burden. Neither filter can deal directly with systems in which the observed data are a nonlinear transformation of the states.

In this study we introduced and evaluated two recently proposed derivativeless sigma-point Kalman filters. The SPKF implements derivativeless optimal estimation using a novel deterministic sampling approach in which a small set of samples suffices to accurately estimate the forecast error statistics; this is unlike the EnKF, in which a random sampling strategy is used. The SPKF reinterprets the standard Kalman gain and covariance update equations so that no linearization of the nonlinear prediction model or the nonlinear measurement operator is required, and the statistical moments of the nonlinear model are captured accurately by the deterministic sampling technique. Thus, in the SPKF the forecast error covariance is computed from deterministically chosen samples, called sigma points. In a broad sense, the SPKF algorithm can be considered a particular case of the ensemble Kalman filter with a specific sample selection scheme; that is, the forecast sigma points in SPKF algorithms are specific ensembles, conditioned on the selection scheme, that represent the error statistics accurately. Also, the ensemble forecast step in the SPKF can be parallelized by running each ensemble member on a separate processor of a parallel computer (or cluster), resulting in large computational savings.
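The difference between the two sampling strategies can be illustrated on a scalar toy problem (an illustrative assumption, not one of the paper's experiments): for a quadratic function of a Gaussian variable, three deterministically chosen sigma points recover the transformed mean exactly, while a random three-member ensemble carries sampling error.

```python
import numpy as np

# Scalar Gaussian x ~ N(mu, var); for g(x) = x**2 the exact mean is mu**2 + var.
mu, var = 1.5, 0.8
exact = mu**2 + var

# Deterministic sampling: 2L+1 = 3 sigma points (L = 1, kappa = 2).
kappa, L = 2.0, 1
spread = np.sqrt((L + kappa) * var)
pts = np.array([mu, mu + spread, mu - spread])
w = np.array([kappa / (L + kappa), 0.5 / (L + kappa), 0.5 / (L + kappa)])
ut_mean = w @ pts**2                      # exact for polynomials up to third order

# Random sampling: a 3-member "ensemble" of the same size.
rng = np.random.default_rng(0)
ens = mu + np.sqrt(var) * rng.standard_normal(3)
mc_mean = np.mean(ens**2)

print(ut_mean - exact, mc_mean - exact)   # UT error is ~0; MC error is O(1/sqrt(N))
```

This is the sense in which the sigma points are "specific ensembles conditioned on the selection scheme": the same sample budget is placed where it captures the moments, rather than drawn at random.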

Using the highly nonlinear low-dimensional Lorenz 1963 model and a higher-dimensional Lorenz 1995 model, we investigated the capability and performance of the SPKF relative to standard KF-based data assimilation methods for three classes of problems: state estimation, parameter estimation, and joint estimation. The results demonstrate that the SPKF has better estimation accuracy than the EKF and EnKF in all experiments. The SPKF experiments with a higher-dimensional model suggest that it is possible to reduce the number of sigma points, and thereby the computation time, by using a reduced sigma-point space approach. These results are encouraging and suggest that the SPKF could become an effective method for assimilating observations into realistic models such as atmospheric or oceanic GCMs. The SPKF also has the advantage that it does not need tangent linear or Jacobian operators of the original models.

The SP-UKF and SP-CDKF data assimilation schemes involve the calculation of the matrix square root of the state covariance matrix, which is a computationally intensive process. It has been shown that the square root formulations of the SP-UKF and SP-CDKF are numerically efficient and stable and have estimation accuracy equal to the original SP-UKF and SP-CDKF (van der Merwe and Wan 2001a,b). Because the state space dimension of the models used in this study is relatively small, comparing the numerical stability of the square root formulation with the original SP-UKF and SP-CDKF implementations is of little practical relevance here; this issue is therefore left for future study in GCMs.
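The idea behind the square-root formulation can be sketched generically: propagate a Cholesky factor of the covariance via a QR factorization instead of forming the covariance itself. This shows the generic mechanism, not the exact SR-UKF algebra of van der Merwe and Wan (2001a,b); the transition matrix here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) * 0.5          # illustrative transition matrix
P = np.eye(n)                                  # prior covariance
Q = 0.1 * np.eye(n)                            # process noise covariance

# Full-covariance propagation.
P_next = A @ P @ A.T + Q

# Square-root propagation: update the Cholesky factor directly via QR,
# never forming P_next explicitly (better numerical conditioning).
S = np.linalg.cholesky(P)
Sq = np.linalg.cholesky(Q)
M = np.hstack([A @ S, Sq]).T                   # (2n, n) compound factor
R = np.linalg.qr(M, mode='r')                  # M = Q_orth R  =>  M.T @ M = R.T @ R
S_next = R.T                                   # a valid square root of P_next

print(np.allclose(S_next @ S_next.T, P_next))  # True
```

Because `M.T @ M` equals `A P A.T + Q` by construction, the triangular factor from the QR step is a square root of the propagated covariance without ever squaring the factor, which is the source of the stability advantage.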

In this study, we explored the SPKF using highly simplified nonlinear models. One might be concerned about the performance and efficiency of the SPKF when a realistic GCM is used. Additional research is needed for better implementation of these techniques in atmospheric or oceanic GCM data assimilation problems. Nonetheless, the present study represents a step toward advanced data assimilation algorithms, using simple nonlinear models that share some common features with complicated atmospheric and oceanic models. Future work will also study the parallelization of SPKF data assimilation in GCMs, similar to EnKF parallelization, because the propagation of each sigma point through the nonlinear model is independent. We are currently working on an implementation of the SPKF for a realistic ocean GCM that will investigate the estimation accuracy, numerical stability, and consistency, as well as the computational difficulties. These studies will be described in future work.

## Acknowledgments

The authors thank the Oregon Graduate Institute, Dr. Eric A. Wan, and Dr. Rudolph van der Merwe for providing the ReBEL tool kit (van der Merwe and Wan 2003), part of which was used in this research. This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant and the Canadian Foundation for Climate and Atmospheric Sciences (CFCAS) network project "Global Ocean–Atmosphere Prediction and Predictability."

## REFERENCES

Advanced Micro Devices, cited 2007: AMD Opteron processor technical documents. Advanced Micro Devices, Inc. [Available online at http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118,00.html.]

Baheti, R. S., O. Halloran, and H. R. Itzkowitz, 1990: Mapping extended Kalman filters onto linear arrays. *IEEE Trans. Autom. Control*, **35**, 1310–1319.

Daum, F. E., and J. Fitzgerald, 1983: Decoupled Kalman filters for phased array radar tracking. *IEEE Trans. Autom. Control*, **28**, 269–283.

Davis, R. E., 1977: Techniques for statistical analysis and prediction of geophysical fluid systems. *Geophys. Astrophys. Fluid Dyn.*, **8**, 245–277.

Evensen, G., 1992: Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model. *J. Geophys. Res.*, **97** (C11), 17905–17924.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99** (C5), 10143–10162.

Evensen, G., 1997: Advanced data assimilation for strongly nonlinear dynamics. *Mon. Wea. Rev.*, **125**, 1342–1354.

Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. *Ocean Dyn.*, **53**, 343–367.

Gamage, N., and W. Blumen, 1993: Comparative analysis of low-level cold fronts: Wavelet, Fourier, and empirical orthogonal function decompositions. *Mon. Wea. Rev.*, **121**, 2867–2878.

Gauthier, P., 1992: Chaos and quadric-dimensional data assimilation: A study based on the Lorenz model. *Tellus*, **44A**, 2–17.

Gelb, A., 1974: *Applied Optimal Estimation*. MIT Press, 374 pp.

Hasselmann, K., 1988: PIPs and POPs: A general formalism for the reduction of dynamical systems in terms of principal interaction patterns and principal oscillation patterns. *J. Geophys. Res.*, **93**, 11015–11021.

Haykin, S., Ed., 2001: *Kalman Filtering and Neural Networks*. Wiley, 284 pp.

Houtekamer, P., and H. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Ito, K., and K. Xiong, 2000: Gaussian filters for nonlinear filtering problems. *IEEE Trans. Autom. Control*, **45**, 910–927.

Julier, S., 1998: A skewed approach to filtering. *Proc. SPIE Conf. on Signal and Data Processing of Small Targets*, Orlando, FL, International Society for Optical Engineering, 271–282.

Julier, S., 2002: The scaled unscented transformation. *Proc. 2002 American Control Conf.*, Vol. 6, Anchorage, AK, IEEE, 4555–4559.

Julier, S., 2003: The spherical simplex unscented transformation. *Proc. 2003 American Control Conf.*, Vol. 3, Denver, CO, IEEE, 2430–2434.

Julier, S., and J. Uhlmann, 2002: Reduced sigma-point filters for the propagation of mean and covariances through nonlinear transformations. *Proc. 2002 American Control Conf.*, Vol. 2, Anchorage, AK, IEEE, 887–892.

Julier, S., and J. Uhlmann, 2004: Unscented filtering and nonlinear estimation. *Proc. IEEE*, **92**, 401–422.

Julier, S., J. Uhlmann, and H. Durrant-Whyte, 1995: A new approach for filtering nonlinear systems. *Proc. 1995 American Control Conf.*, Seattle, WA, IEEE, 1628–1632.

Keppenne, C. L., 2000: Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter. *Mon. Wea. Rev.*, **128**, 1971–1981.

Kivman, G. A., 2003: Sequential parameter estimation for stochastic systems. *Nonlinear Processes Geophys.*, **10**, 253–259.

Lefebvre, T., H. Bruyninckx, and J. De Schutter, 2002: Comment on "A new method for the nonlinear transformation of means and covariances in filters and estimators." *IEEE Trans. Autom. Control*, **47**, 1406–1409.

Lermusiaux, P. F. J., 1997: Error subspace data assimilation methods for ocean field estimation: Theory, validation, and applications. Ph.D. thesis, Harvard University, 402 pp.

Lermusiaux, P. F. J., and A. R. Robinson, 1999: Data assimilation via error subspace statistical estimation. Part I: Theory and schemes. *Mon. Wea. Rev.*, **127**, 1385–1407.

Lorenz, E., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141.

Lorenz, E., 1965: A study of the predictability of a 28-variable atmospheric model. *Tellus*, **17**, 321–333.

Lorenz, E., 2005: Designing chaotic models. *J. Atmos. Sci.*, **62**, 1574–1587.

Lorenz, E., 2006: Predictability—A problem partly solved. *Predictability of Weather and Climate*, T. Palmer and R. Hagedorn, Eds., Cambridge University Press, 40–58.

Lorenz, E., and K. A. Emmanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. *J. Atmos. Sci.*, **55**, 399–414.

Meyers, S. D., B. G. Kelly, and J. J. O'Brien, 1993: An introduction to wavelet analysis in oceanography and meteorology: With application to the dispersion of Yanai waves. *Mon. Wea. Rev.*, **121**, 2858–2866.

Miller, R., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. *J. Atmos. Sci.*, **51**, 1037–1056.

Nelson, A. T., 2000: Nonlinear estimation and modeling of noisy time-series by dual Kalman filtering methods. Ph.D. thesis, Oregon Graduate Institute of Science and Technology, 298 pp.

Nørgaard, M., N. K. Poulsen, and O. Ravn, 2000a: Advances in derivative-free state estimation for nonlinear systems. Tech. Rep. IMM-REP-1998-15, Dept. of Mathematical Modeling, Technical University of Denmark, 33 pp.

Nørgaard, M., N. K. Poulsen, and O. Ravn, 2000b: New developments in state estimation of nonlinear systems. *Automatica*, **36**, 1627–1638.

Ohmuro, T., 1984: A decoupled Kalman tracker using LOS coordinates. *Proc. Int. Symp. Noise and Clutter Rejection in Radars and Imaging Sensors*, Tokyo, Japan, IEEE, 451–455.

Palmer, T., 1993: Extended-range atmospheric prediction and the Lorenz model. *Bull. Amer. Meteor. Soc.*, **74**, 49–65.

Penland, C., 1989: Random forcing and forecasting using principal oscillation pattern analysis. *Mon. Wea. Rev.*, **117**, 2165–2185.

Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1992: *Numerical Recipes in C: The Art of Scientific Computing*. 2nd ed. Cambridge University Press, 994 pp.

Schei, T. S., 1997: A finite-difference method for linearization in nonlinear estimation algorithms. *Automatica*, **33**, 2053–2058.

Schnur, R., G. Schmitz, N. Grieger, and H. von Storch, 1993: Normal modes of the atmosphere as estimated by principal oscillation patterns and derived from quasi-geostrophic theory. *J. Atmos. Sci.*, **50**, 2386–2400.

Simon, D., 2006: *Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches*. 1st ed. Wiley-Interscience, 526 pp.

Teman, R., 1991: Approximation of attractors, large eddy simulations and multiscale methods. *Proc. Roy. Soc. London*, **434A**, 23–29.

Van der Merwe, R., 2004: Sigma-point Kalman filters for probabilistic inference in dynamic state-space models. Ph.D. thesis, Oregon Health and Science University.

Van der Merwe, R., and E. A. Wan, 2001a: Efficient derivative-free Kalman filters for online learning. *Proc. 2001 European Symp. on Artificial Neural Networks (ESANN)*, Bruges, Belgium, 6 pp. [Available online at http://www.dice.ucl.ac.be/Proceedings/esann/esannpdf/es2001-21.pdf.]

Van der Merwe, R., and E. A. Wan, 2001b: The square-root unscented Kalman filter for state and parameter estimation. *Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)*, Vol. 6, Salt Lake City, UT, IEEE, 3461–3464.

Van der Merwe, R., and E. A. Wan, cited 2003: ReBEL: Recursive Bayesian estimation library. [Available online at http://choosh.csee.ogi.edu/rebel/index.html.]

Van der Merwe, R., A. Doucet, N. de Freitas, and E. Wan, 2000: The unscented particle filter. Tech. Rep. CUED/F-INFENG/TR 380, Cambridge University Engineering Department, 46 pp. [Available online at http://citeseer.ist.psu.edu/article/vandermerwe00unscented.html.]

Van der Merwe, R., E. A. Wan, and S. I. Julier, 2004: Sigma-point Kalman filters for nonlinear estimation and sensor fusion: Applications to integrated navigation. *AIAA Guidance, Navigation and Control Conf.*, Providence, RI, American Institute of Aeronautics and Astronautics, 5120–5122.

Von Storch, H., and C. Frankignoul, 1997: Empirical modal decomposition in coastal oceanography. *The Global Coastal Ocean*, K. Brink and A. R. Robinson, Eds., *The Sea*, Vol. 10, Wiley, 419–455.

Von Storch, H., G. Burger, R. Schnur, and J.-S. von Storch, 1995: Principal oscillation patterns: A review. *J. Climate*, **8**, 377–400.

Wallace, J. M., C. Smith, and C. S. Bretherton, 1992: Singular value decomposition of wintertime sea surface temperature and 500-mb height anomalies. *J. Climate*, **5**, 561–576.

Wan, E. A., and R. Van der Merwe, 2000: The unscented Kalman filter for nonlinear estimation. *Proc. 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (ASSPCC)*, Lake Louise, AB, Canada, IEEE, 153–158.

Wan, E. A., and R. Van der Merwe, 2001: The unscented Kalman filter. *Kalman Filtering and Neural Networks*, S. Haykin, Ed., Wiley, 221–277.

Weare, B. C., and J. Nasstrom, 1982: Examples of empirical orthogonal function analyses. *Mon. Wea. Rev.*, **110**, 481–485.

Welch, G., and G. Bishop, 1995: An introduction to the Kalman filter. Tech. Rep. TR95-041, University of North Carolina, Chapel Hill, NC, 16 pp.

## APPENDIX A

### Reinterpretation of the Standard Kalman Gain

Let *E*[·] represent the mathematical expectation or expected value. Thus, the state and covariance update equations can be rewritten as

$$\hat{\boldsymbol{\theta}}_k^a = \hat{\boldsymbol{\theta}}_k^f + \mathsf{K}_k\left(\boldsymbol{\psi}_k - \hat{\boldsymbol{\psi}}_k\right),$$

$$\mathsf{P}_{\theta_k}^a = \mathsf{P}_{\theta_k}^f - \mathsf{K}_k\mathsf{H}_k\mathsf{P}_{\theta_k}^f.$$

Now the "standard" Kalman gain equation is given by

$$\mathsf{K}_k = \underbrace{\mathsf{P}_{\theta_k}\mathsf{H}_k^{\mathrm{T}}}_{\text{cross covariance}}\Bigl(\underbrace{\mathsf{H}_k\mathsf{P}_{\theta_k}\mathsf{H}_k^{\mathrm{T}} + \mathsf{R}_k}_{\text{innovation covariance}}\Bigr)^{-1}, \tag{A7}$$

where $\mathsf{P}_{\theta_k}$ is the forecast error covariance matrix. The first underbracketed expression in the Kalman gain term can be interpreted as the cross covariance between the state and observation errors (Gelb 1974; Simon 2006):

$$\mathsf{P}_{\theta_k}\mathsf{H}_k^{\mathrm{T}} = E\bigl[(\boldsymbol{\theta}_k - \hat{\boldsymbol{\theta}}_k^f)(\boldsymbol{\psi}_k - \hat{\boldsymbol{\psi}}_k)^{\mathrm{T}}\bigr] = \mathsf{P}_{\theta_k\tilde{\psi}_k}. \tag{A8}$$

Similarly, the second underbracketed expression in Eq. (A7) can be interpreted as the error covariance of the difference between model and observation (Gelb 1974):

$$\mathsf{H}_k\mathsf{P}_{\theta_k}\mathsf{H}_k^{\mathrm{T}} + \mathsf{R}_k = E\bigl[(\boldsymbol{\psi}_k - \hat{\boldsymbol{\psi}}_k)(\boldsymbol{\psi}_k - \hat{\boldsymbol{\psi}}_k)^{\mathrm{T}}\bigr] = \mathsf{P}_{\tilde{\psi}_k\tilde{\psi}_k}. \tag{A12}$$

Therefore the Kalman gain can be rewritten as

$$\mathsf{K}_k = \mathsf{P}_{\theta_k\tilde{\psi}_k}\,\mathsf{P}_{\tilde{\psi}_k\tilde{\psi}_k}^{-1}. \tag{A17}$$

The main advantage of this form of the Kalman gain is that it avoids explicit use of the measurement operator, especially when the measurement operator is a nonlinear function of the state. A complete statistical derivation of the above formulation can be found in Simon (2006).
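In the linear case the two gain expressions coincide algebraically; a quick numerical check (the matrix sizes and entries here are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
Hm = rng.standard_normal((p, n))                 # linear measurement operator
Pf = np.eye(n) + 0.1 * rng.standard_normal((n, n))
Pf = Pf @ Pf.T                                   # symmetric positive definite forecast covariance
Rm = np.eye(p)                                   # observation error covariance

# Standard Kalman gain.
K_std = Pf @ Hm.T @ np.linalg.inv(Hm @ Pf @ Hm.T + Rm)

# Reinterpreted gain: cross covariance times inverse innovation covariance.
P_xy = Pf @ Hm.T                                 # state-observation cross covariance
P_yy = Hm @ Pf @ Hm.T + Rm                       # innovation covariance
K_new = P_xy @ np.linalg.inv(P_yy)

print(np.allclose(K_std, K_new))                 # True (identical by construction)
```

The practical point of the reinterpreted form is that `P_xy` and `P_yy` can be estimated directly from sigma points (or ensembles), so no explicit operator `Hm` is needed when the measurement function is nonlinear.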

## APPENDIX B

### An Alternate Formula for Updating the State Error Covariance Matrix

Define the observation error in terms of the observation $\boldsymbol{\psi}_k$ and its prediction $\hat{\boldsymbol{\psi}}_k$, and the state error in terms of $\boldsymbol{\theta}_k$ and its forecast $\hat{\boldsymbol{\theta}}_k^f$:

$$\tilde{\boldsymbol{\theta}}_k = \boldsymbol{\theta}_k - \hat{\boldsymbol{\theta}}_k^f, \tag{B1}$$

$$\tilde{\boldsymbol{\psi}}_k = \boldsymbol{\psi}_k - \hat{\boldsymbol{\psi}}_k. \tag{B2}$$

The cross covariance $\mathsf{P}_{\theta_k\tilde{\psi}_k}$ between the state and observation errors and the innovation covariance $\mathsf{P}_{\tilde{\psi}_k\tilde{\psi}_k}$, given by Eqs. (A8) and (A12), can then be rewritten in terms of Eqs. (B1) and (B2). After the analysis step, the state error becomes

$$\boldsymbol{\theta}_k - \hat{\boldsymbol{\theta}}_k^a = \tilde{\boldsymbol{\theta}}_k - \mathsf{K}_k\tilde{\boldsymbol{\psi}}_k. \tag{B3}$$

Taking the outer product and expectation of (B3) produces

$$\mathsf{P}_{\theta_k}^a = \mathsf{P}_{\theta_k}^f - \mathsf{K}_k\mathsf{P}_{\theta_k\tilde{\psi}_k}^{\mathrm{T}} - \mathsf{P}_{\theta_k\tilde{\psi}_k}\mathsf{K}_k^{\mathrm{T}} + \mathsf{K}_k\mathsf{P}_{\tilde{\psi}_k\tilde{\psi}_k}\mathsf{K}_k^{\mathrm{T}}.$$

Substituting the expression for the Kalman gain, given by Eq. (A17), into the above expression, the covariance update equation simplifies to

$$\mathsf{P}_{\theta_k}^a = \mathsf{P}_{\theta_k}^f - \mathsf{K}_k\mathsf{P}_{\tilde{\psi}_k\tilde{\psi}_k}\mathsf{K}_k^{\mathrm{T}}.$$

A more detailed interpretation and derivation of the above expression can be found in Simon (2006).

Table: RMSE and computation time for case 1.

^{1}

A more detailed statistical derivation and interpretation of these formulations can be found in Simon (2006).

^{2}

The weighting term corresponding to the zeroth sigma point directly affects the magnitude of errors in higher-order moments for symmetric distributions (Julier 2002; van der Merwe et al. 2000). The parameter *β* is thus introduced to minimize the higher-order errors.

^{3}

However, our numerical experiments show that the SP-CDKF does not always outperform SP-UKF. See the following discussions.

^{4}

The linear transformation from the stochastic vector *ϕ*_{k} to *θ*_{k}, *θ*_{k} = *S*_{θk}*ϕ*_{k}, decouples the fully coupled state vector *θ*_{k}; the covariance of *ϕ*_{k} is equal to the identity matrix. For computational reasons the square root matrix *S*_{θk} often remains triangular (Cholesky decomposition). More details on decoupling and its advantages in Kalman filters can be found in Ohmuro (1984), Baheti et al. (1990), and Daum and Fitzgerald (1983).

^{5}

The model is considered to have a relatively large error at the initial time, so the assimilation weights the observations more heavily. As a result, the model prediction does not drift too far from the true values.

^{6}

We choose *N* to be 4000, which is the total number of time steps.