## 1. Introduction

Since the early 1960s, data assimilation in atmospheric and oceanographic applications has been based on Kalman filtering theory (Kalman and Bucy 1961; Jazwinski 1970). Beginning with optimal interpolation (Gandin 1963), and continuing with three-dimensional (Parrish and Derber 1992; Rabier et al. 1998; Cohn et al. 1998; Daley and Barker 2001) and four-dimensional variational data assimilation (Navon et al. 1992; Zupanski 1993; Zou et al. 1995, 2001; Courtier et al. 1994; Rabier et al. 2000; Zupanski et al. 2002), the data assimilation methodologies operationally used in atmospheric and oceanic applications can be viewed as an effort to approximate the Kalman filter/smoother theoretical framework (Cohn 1997). The approximations are necessary because of the lack of knowledge of the statistical properties of models and observations, as well as because of the tremendous computational burden associated with the high dimensionality of realistic atmospheric and oceanic data assimilation problems. So far, common approaches in realistic data assimilation have been to approximate (e.g., model) error covariances, as well as to avoid the calculation of the posterior (e.g., analysis) error covariance. These approximations share the problem of not being able to use fully cycled error covariance information, as the theory suggests. The consequence is not only that the produced analysis is of reduced quality, but also that no reliable estimates of the uncertainties of the produced analysis are available.

A novel approach to data assimilation in oceanography and meteorology pursued in recent years (Evensen 1994; Houtekamer and Mitchell 1998; Pham et al. 1998; Lermusiaux and Robinson 1999; Brasseur et al. 1999; Hamill and Snyder 2000; Evensen and van Leeuwen 2000; Keppenne 2000; Bishop et al. 2001; Anderson 2001; van Leeuwen 2001; Haugen and Evensen 2002; Reichle et al. 2002b; Whitaker and Hamill 2002; Anderson 2003; Ott et al. 2004), based on the use of ensemble forecasting in nonlinear Kalman filtering, offers for the first time the means to consistently estimate the analysis uncertainties. The price to pay is the reduced dimension of the analysis subspace (defined by the ensemble forecasts); thus there is a concern that this subspace may not be sufficient to adequately represent all important dynamical features and instabilities. Preliminary results show, however, that this may not always be a problem (e.g., Houtekamer and Mitchell 2001; Keppenne and Rienecker 2002). On the other hand, it is anticipated that the ensemble size will need to be increased as more realistic and higher-resolution models and observations are used. This, however, may be feasible even on currently available computers. With the advancement of computer technology, and of multiple processing in particular, which is ideally suited to the ensemble framework, the future looks promising for the continuing development and realistic application of ensemble data assimilation methodology.

In achieving that goal, however, there are still a few unresolved methodological and practical issues that will be pursued in this paper. Current ensemble data assimilation methodologies are broadly grouped into stochastic and deterministic approaches (Tippett et al. 2003). A common starting point for these algorithms is the use of the *solution form* of the extended Kalman filter (e.g., Evensen 2003), obtained assuming linearized dynamics and observation operators, with a Gaussian assumption regarding the measurements and control variables (e.g., initial conditions). We refer to this as a *linearized* solution form. Since realistic observation operators are generally nonlinear, a common approach to nonlinearity in ensemble data assimilation is to use a first-order Taylor series assumption, that is, to use a difference between two nonlinear operators in the place of a linearized observation operator. The use of the linearized solution form with nonlinear observation operators, however, creates a mathematical inconsistency in the treatment of nonlinear observation operators. An alternate way to deal with the nonlinearity of observation operators is to first pose a fully nonlinear problem, and then find the solution in the ensemble-spanned subspace. This is the approach adopted in this paper.

The proposed ensemble data assimilation method is based on a combination of the maximum likelihood and ensemble data assimilation, named the maximum likelihood ensemble filter (MLEF). The analysis solution is obtained as a model state that maximizes the posterior conditional probability distribution. In practice, the calculation of the maximum likelihood state estimate is performed using an iterative minimization algorithm, thus making the MLEF approach closely related to the iterated Kalman filter (Jazwinski 1970; Cohn 1997). Since the cost function used to define the analysis problem is arbitrarily nonlinear, the treatment of nonlinear observation operators is considered an advantage of the MLEF algorithm. The use of optimization in MLEF forms a bond between ensemble data assimilation and control theory. Like other ensemble data assimilation algorithms, MLEF produces an estimate of the analysis uncertainty (e.g., analysis error covariance). The idea behind this development is to produce a method capable of optimally exploiting the experience gathered in operational data assimilation and the advancements in ensemble data assimilation, eventually producing a qualitatively new system. The practical goal is to develop a single data assimilation system easily applicable to the simplest, as well as to the most complex nonlinear models and observation operators.

While the maximum likelihood estimate has a unique solution for unimodal probability density functions (PDFs), there is a possibility for a nonunique solution in the case of multimodal PDFs. This issue will be given more attention in the future.

The method is explained in section 2, algorithmic details are given in section 3, experimental design is presented in section 4, results in section 5, and conclusions are drawn in section 6.

## 2. MLEF methodology

From variational methods it is known that a maximum likelihood estimate, adopted in MLEF, is a suitable approach in applications to realistic data assimilation in meteorology and oceanography. From operational applications of data assimilation methods, it is also known that a Gaussian PDF assumption, used in derivation of the cost function (e.g., Lorenc 1986), is generally accepted and widely used. Although the model and observation operators are generally nonlinear, and observation and forecast errors are not necessarily Gaussian, the Gaussian PDF framework is still a state-of-the-art approach in meteorological and oceanographic data assimilation (e.g., Cohn 1997). This is the main reason why a Gaussian PDF framework is used in this paper.

The mathematical framework of the MLEF algorithm is presented in two parts, the forecast and the analysis steps, followed by a brief comparison with related data assimilation methodologies.

### a. Forecast step

In the Kalman filter, the forecast error covariance evolves according to

𝗣_{f}(*k*) = 𝗠_{k,k−1}𝗣_{a}(*k* − 1)𝗠^{T}_{k,k−1} + 𝗤(*k* − 1),   (1)

where 𝗣_{f}(*k*) is the forecast error covariance at time *t*_{k}, 𝗠_{k,k−1} is the linearized forecast model (e.g., Jacobian) from time *t*_{k−1} to time *t*_{k}, 𝗣_{a}(*k* − 1) is the analysis error covariance at time *t*_{k−1}, and 𝗤(*k* − 1) is the model error covariance at time *t*_{k−1}. The model error is neglected in the remainder of this paper. With this assumption, and after dropping the time indexing, the forecast error covariance is

𝗣_{f} = 𝗣^{1/2}_{f}𝗣^{T/2}_{f},   (2)

with the square root written in terms of its columns,

𝗣^{1/2}_{f} = (**b**_{1} **b**_{2} · · · **b**_{S}),   (3)

where *N* defines the dimension of the model state (e.g., initial conditions), and the index *S* refers to the number of ensembles. In practical ensemble applications, *S* is much smaller than *N*. Using (3) in definition (2), the square root forecast error covariance is formed from nonlinear forecast perturbations,

**b**_{i} = *M*(**x**_{k−1} + **p**_{i}) − *M*(**x**_{k−1}),   *i* = 1, . . . , *S*,   (4)

where *M* denotes the nonlinear forecast model, **p**_{i} is the *i*th column of the square root analysis error covariance, and **x**_{k−1} is the analysis from the previous analysis cycle, at time *t*_{k−1}. Note that each of the columns {**b**_{i}: *i* = 1, . . . , *S*} has *N* elements. The ensemble square root forecast error covariance 𝗣^{1/2}_{f} can be obtained from *S* nonlinear ensemble forecasts, *M*(**x**_{k−1} + **p**_{i}), plus one control forecast, *M*(**x**_{k−1}) [e.g., (4)]. The forecast error covariance definition (4) implies the use of a control (deterministic) forecast instead of an ensemble mean, commonly used in other ensemble data assimilation methods. Ideally, the control forecast represents the most likely dynamical state; thus it is intrinsically related to the use of the maximum likelihood approach. In principle, however, the use of an ensemble mean instead of the most likely deterministic forecast is also possible.
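As an illustration of this forecast step, the ensemble square root 𝗣^{1/2}_{f} can be formed from *S* nonlinear forecasts plus one control forecast. The following is a minimal sketch (not the authors' code; `model` is a stand-in for the nonlinear forecast model *M*):

```python
import numpy as np

def sqrt_forecast_cov(model, x_analysis, Pa_sqrt):
    """Columns b_i = M(x + p_i) - M(x), with p_i the columns of Pa_sqrt.

    model      : callable mapping a state vector (length N) to its forecast
    x_analysis : analysis state from the previous cycle
    Pa_sqrt    : N x S square root analysis error covariance
    Returns the N x S square root forecast error covariance.
    """
    control = model(x_analysis)                 # one control forecast
    cols = [model(x_analysis + p) - control     # S perturbed forecasts
            for p in Pa_sqrt.T]
    return np.column_stack(cols)

# Toy check with a linear "model": the result then equals M_lin @ Pa_sqrt
M_lin = np.array([[1.0, 0.1], [0.0, 1.0]])
Pf_sqrt = sqrt_forecast_cov(lambda x: M_lin @ x, np.zeros(2), np.eye(2))
```

For a linear model the perturbation differences collapse to the familiar 𝗠𝗣^{1/2}_{a} propagation, which provides a convenient correctness check.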

Important to note is that the availability of an ensemble square root analysis error covariance 𝗣^{1/2}_{a}, provided by the data assimilation algorithm, is critical for proper coupling between analysis and forecast. In addition to data assimilation cycles, the 𝗣^{1/2}_{a} columns could be used as initial perturbations for ensemble forecasting, in agreement with (4).

### b. Analysis step

Following the maximum likelihood approach, the analysis is defined as the minimizer of the cost function

*J*(**x**) = ½(**x** − **x**_{b})^{T}𝗣^{−1}_{f}(**x** − **x**_{b}) + ½[**y** − *H*(**x**)]^{T}𝗥^{−1}[**y** − *H*(**x**)],   (5)

where **x** is the model state vector, **x**_{b} denotes the prior (background) state, and **y** is the measurement vector. The background state **x**_{b} is an estimate of the most likely dynamical state; thus it is a deterministic forecast from the previous assimilation cycle. The nonlinear observation operator *H* represents a mapping from model space to observation space, and 𝗥 is the observation error covariance matrix.
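Because 𝗣_{f} is available only through its ensemble square root, the cost function is most naturally evaluated in the ensemble subspace, writing **x** = **x**_{b} + 𝗣^{1/2}_{f}**w**, in which case the background term reduces to ½**w**^{T}**w**. A hedged sketch, assuming a diagonal observation error covariance and hypothetical argument names:

```python
import numpy as np

def cost(w, x_b, Pf_sqrt, H, y, R_diag):
    """Cost function restricted to the ensemble subspace.

    x = x_b + Pf_sqrt @ w, so the background term is 0.5 * w.w;
    the observation term uses a diagonal R (variances in R_diag).
    """
    x = x_b + Pf_sqrt @ w
    d = y - H(x)                        # innovation y - H(x)
    return 0.5 * (w @ w) + 0.5 * (d @ (d / R_diag))

# Example with an identity observation operator
J = cost(np.array([1.0, 0.0]), np.zeros(2), np.eye(2),
         lambda u: u, np.zeros(2), np.ones(2))
```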

Note that the error covariance matrix 𝗣_{f} is defined in the ensemble subspace [e.g., (4)], and thus it has a much smaller rank than the true forecast error covariance. Therefore, the cost function definition (5) has only a form similar to the three-dimensional variational cost function (e.g., Parrish and Derber 1992); however, it is defined in the ensemble subspace only. In strict terms, the invertibility of 𝗣_{f} in (5) is preserved only in the range of 𝗣_{f}, implying that the cost function (5) is effectively defined in the range of 𝗣_{f} as well. The same reasoning and definitions are implicit in other ensemble data assimilation methods, with the exception of hybrid methods (e.g., Hamill and Snyder 2000).

The minimization of (5) employs the change of variable

**x** − **x**_{b} = 𝗣^{1/2}_{f}(𝗜 + 𝗖)^{−T/2}*ζ*,   (6)

where *ζ* is the control variable defined in ensemble subspace, and the notation 𝗣^{T/2}_{f} = (𝗣^{1/2}_{f})^{T} is used in the above formula. A closer inspection reveals that the change of variable (6) is a perfect preconditioner in quadratic minimization problems (Axelsson 1984), that is, assuming linear observation operators. This means that, with the change of variable (6) and linear observation operators, the solution is obtained in a single step of minimization iteration. The matrix defined in (6) is the square root of an inverse Hessian of (5), with

𝗖 = (𝗥^{−1/2}𝗛𝗣^{1/2}_{f})^{T}(𝗥^{−1/2}𝗛𝗣^{1/2}_{f}),   (7)

where 𝗛 denotes the linearized observation operator. The matrix 𝗖 is commonly neglected in Hessian preconditioning in variational problems, because of high dimensionality and associated computational burden.

The calculation of (𝗜 + 𝗖)^{−T/2}, however, requires some attention. Since the columns of the square root forecast error covariance are known, the *i*th column of the matrix appearing in (7) is computed in terms of nonlinear observation operators as

**z**_{i} = 𝗥^{−1/2}[*H*(**x** + **b**_{i}) − *H*(**x**)],   (8)

where each **z**_{i} has the dimension of observation space. The matrix 𝗖 can be then defined as

𝗖 = 𝗭^{T}𝗭,   𝗭 = (**z**_{1} **z**_{2} · · · **z**_{S}).   (9)

Note that 𝗖 is an *S* × *S* symmetric matrix; thus it has small dimensions defined by the number of ensembles. To calculate efficiently the inversion and the square root involved in (𝗜 + 𝗖)^{−T/2}, an eigenvalue decomposition (EVD) of the matrix 𝗖 may be used. One obtains 𝗖 = 𝗩**Λ**𝗩^{T}, where 𝗩 denotes the eigenvector matrix, and **Λ** is the eigenvalue matrix. Then

(𝗜 + 𝗖)^{−T/2} = 𝗩(𝗜 + **Λ**)^{−1/2}𝗩^{T}.   (10)

For a linear observation operator, the columns defined by (8) reduce to the columns of 𝗥^{−1/2}𝗛𝗣^{1/2}_{f}.
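The construction of 𝗖 from observation-space ensemble perturbations and the EVD-based computation of (𝗜 + 𝗖)^{−T/2} can be sketched as follows (illustrative only; an arbitrary nonlinear operator `H` and a diagonal 𝗥 are assumed):

```python
import numpy as np

def inv_sqrt_I_plus_C(x, Pf_sqrt, H, R_diag):
    """Build (I + C)^{-T/2} from ensemble observation perturbations.

    z_i = R^{-1/2} [H(x + b_i) - H(x)]   (columns of Z)
    C   = Z^T Z,  EVD  C = V Lam V^T
    (I + C)^{-T/2} = V (I + Lam)^{-1/2} V^T   (symmetric, since C is)
    """
    Hx = H(x)
    Z = np.column_stack([(H(x + b) - Hx) / np.sqrt(R_diag)
                         for b in Pf_sqrt.T])
    C = Z.T @ Z                          # S x S symmetric matrix
    lam, V = np.linalg.eigh(C)           # EVD of C
    return V @ np.diag(1.0 / np.sqrt(1.0 + lam)) @ V.T

# With an identity operator and unit quantities, C = I, so the
# transform is I / sqrt(2)
T = inv_sqrt_I_plus_C(np.zeros(2), np.eye(2), lambda u: u, np.ones(2))
```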

As shown in appendix A, within a linear operator framework, the first minimization iteration calculated using the preconditioned steepest descent is equivalent to the ensemble-based reduced-rank Kalman filter (Verlaan and Heemink 2001; Heemink et al. 2001), or to the Monte Carlo–based ensemble Kalman filter (Evensen 1994). Although different in detail, the computational effort involved in calculation with ensemble-based Kalman filters is comparable to the calculation of ensemble-based Hessian preconditioning and the gradient present in the MLEF algorithm. In both the ensemble Kalman filters and the MLEF, the computational cost of the analysis step is dominated by a matrix inversion computation [e.g., (A5)–(A7), appendix A].

Once the minimization is completed, the MLEF algorithm finds the optimal analysis (**x**_{opt}), and then calculates 𝗣^{1/2}_{a} according to

𝗣^{1/2}_{a} = 𝗣^{1/2}_{f}[𝗜 + 𝗖(**x**_{opt})]^{−T/2}.   (12)

The columns of 𝗣^{1/2}_{a} are then used as initial perturbations for the next assimilation cycle, according to (3) and (4), and cycling of analysis and forecast continues.
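For linear observation operators a single preconditioned iteration is exact, and the full analysis step, including the square root analysis error covariance of (12), reduces to small dense matrix algebra. A self-contained sketch under those assumptions (hypothetical names, diagonal 𝗥):

```python
import numpy as np

def linear_analysis(x_b, Pf_sqrt, Hmat, y, R_diag):
    """One analysis step for a linear observation operator Hmat.

    Equivalent to the Kalman update written in ensemble space:
    x_a      = x_b + Pf^{1/2} (I + C)^{-1} Z^T R^{-1/2} (y - H x_b),
    Pa^{1/2} = Pf^{1/2} (I + C)^{-T/2}.
    """
    Z = (Hmat @ Pf_sqrt) / np.sqrt(R_diag)[:, None]      # R^{-1/2} H Pf^{1/2}
    C = Z.T @ Z
    lam, V = np.linalg.eigh(C)
    inv_IC = V @ np.diag(1.0 / (1.0 + lam)) @ V.T        # (I + C)^{-1}
    d = (y - Hmat @ x_b) / np.sqrt(R_diag)               # scaled innovation
    x_a = x_b + Pf_sqrt @ (inv_IC @ (Z.T @ d))
    Pa_sqrt = Pf_sqrt @ (V @ np.diag(1.0 / np.sqrt(1.0 + lam)) @ V.T)
    return x_a, Pa_sqrt

# Scalar example: prior variance 1, observation variance 1
x_a, Pa_sqrt = linear_analysis(np.zeros(1), np.eye(1),
                               np.eye(1), np.ones(1), np.ones(1))
```

In the scalar example the result matches the textbook Kalman update: gain 0.5 and analysis variance 0.5.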

### c. The MLEF and related data assimilation methodologies

The MLEF method encompasses a few important existing methodologies and algorithms.

#### 1) Variational data assimilation

The minimization of the cost function, used to derive the maximum likelihood estimate in MLEF, is inherently related to variational data assimilation algorithms. The difference is that, in the MLEF formulation, the minimization is performed in an ensemble-spanned subspace, while in the variational method the full model space is used. The issue of the number of degrees of freedom is problem dependent and will require consideration in future realistic applications. At present, one should note that there are ways to introduce complementary degrees of freedom and obtain a unique mathematical solution (e.g., Hamill and Snyder 2000). Also, there is a practical possibility to increase the degrees of freedom by introducing more ensemble members. All of these options require careful examination in problem-oriented applications.

#### 2) Iterated Kalman filter

Another methodology related to MLEF is the iterated Kalman filter (IKF; Jazwinski 1970; Cohn 1997), developed with the idea of iteratively solving the nonlinear problem. Bell and Cathey (1993) demonstrated that the IKF is a Gauss–Newton method. Like the MLEF, the IKF calculates the mode (e.g., the maximum likelihood approach) under the underlying Gaussian assumption. An obvious difference is that the MLEF is defined within an ensemble framework. The practical advantage of an iterative methodology, such as the IKF or MLEF, is fundamentally tied to the choice of minimization method. An integral part of the MLEF is the use of an unconstrained minimization algorithm, in the form of the nonlinear conjugate-gradient and the limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) quasi-Newton methods (e.g., Gill et al. 1981; Luenberger 1984; Nocedal 1980). The unconstrained minimization approach allows a very efficient iterative solution to problems with significant nonlinearities and large residuals (e.g., Gill et al. 1981).

#### 3) Ensemble transform Kalman filter

The matrix transform and eigenvalue decomposition used for the Hessian preconditioning in MLEF [(6)–(10)] is equivalent to the matrix transform introduced in the ETKF algorithm (Bishop et al. 2001). This approach allows an efficient reduction of the dimensions of a matrix to be inverted. Therefore, the MLEF algorithm can be viewed as a maximum likelihood approach to the ETKF (C. Bishop 2003, personal communication).

The idea behind the MLEF is to retain only the components and concepts deemed advantageous from other algorithms, while weak components are changed or improved. For example, the cost function minimization, used in variational methods and IKF, is characterized as beneficial: minimization allows the equivalence between the inverse Hessian and analysis error covariance to be valid even for arbitrary nonlinear observation operators. Modeling of forecast error covariance, Hessian preconditioning, and adjoint model development are all considered weak points of variational methods and are improved or avoided using an ensemble framework. Hessian preconditioning introduced in the ETKF is considered advantageous as well. The ensemble framework makes the probabilistic forecasting and data assimilation with realistic prediction models and observations feasible, which is not possible with IKF.

The end products of the MLEF algorithm are 1) deterministic analysis, corresponding to the model state that maximizes the posterior probability distribution, and 2) (square root) analysis error covariance, corresponding to an estimate of analysis uncertainty.

## 3. Algorithmic details

The MLEF algorithm is designed to exploit the data assimilation infrastructure in existing algorithms. For example, the innovation vectors (e.g., observation-minus-forecast residuals) are calculated as in existing data assimilation algorithms, and the minimization currently used in variational data assimilation can be used in MLEF. To optimize the MLEF performance in realistic applications, the multiple processor capability of parallel computing is made an important component of the algorithm.

As implied in the previous section, the underlying principle in the MLEF development was to improve the computational stability of the algorithm by using only square root matrices. There are five algorithmic steps in the MLEF.

### a. Step 1: Ensemble forecasting from previous to new analysis cycle

A square root forecast error covariance is computed first. Normally, the initial ensemble perturbations are the columns of a square root analysis error covariance, available from a previous analysis cycle. At the very start of data assimilation, however, there is no previous analysis error covariance, and one needs to provide some initial ensemble perturbations to be used in (4). Among many feasible options, the following strategy is adopted in MLEF: define random perturbations to initial conditions some time in the past, say one to two assimilation cycles earlier, in order to form a set of perturbed initial conditions. Then use this set to initiate ensemble forecasting. The nonlinear ensemble forecast perturbations are computed as a difference between the ensemble forecasts and the control (e.g., unperturbed) forecast, valid at the time of the first data assimilation cycle. According to (4), these perturbations are then used as columns of a square root forecast error covariance, required for data assimilation.

Note that this step, common to all ensemble data assimilation algorithms, may contribute significantly to the computational cost of ensemble data assimilation in high-dimensional applications. It allows an efficient use of parallel computing, however, and thus the actual cost can be significantly reduced in practice.

### b. Step 2: Forward ensemble run to observation location—Innovation vector calculation

The vectors **z**_{i} [(8)] are computed as nonlinear ensemble perturbations of innovation vectors, where the ensemble perturbations **b**_{i} are obtained from previously completed ensemble forecasts [(4)]. This means that each ensemble forecast is interpolated to observation location, using the same observation operator available in an existing variational data assimilation algorithm. The calculation of innovation vector perturbations is done without communication between processors; thus it is efficiently scalable on parallel computers.

### c. Step 3: Hessian preconditioning and 𝗖-matrix calculation

This step is done only in the first minimization iteration. The matrix 𝗖 is computed from ensemble perturbations around the initial forecast guess and is used for Hessian preconditioning. The innovation vectors calculated in step 2 are used to calculate the elements of the matrix 𝗖 [(8)]. The elements of 𝗖 are computed through an inner-product calculation, and this represents the second dominant computational effort in MLEF (the most dominant being the ensemble forecasting). Note that an equivalent computational effort is involved in the ETKF algorithm. Although 𝗖 is an *S* × *S* symmetric matrix (*S* being the ensemble size), there are still *S*(*S* + 1)/2 inner products to be calculated. If parallel computing is available, each of the inner products can be efficiently calculated on a separate processor, essentially with no communication between the processors, thus significantly reducing the computational cost. The EVD calculation of 𝗖 is of negligible cost, 𝗖 being a small-dimensional matrix. Standard EVD subroutines for dense matrices, commonly available in a general mathematical library, such as the Linear Algebra Package (LAPACK; Anderson et al. 1992), or similar, may be used. As shown by (10), the matrix inversion involved in the change of variable (6) is easily accomplished.

### d. Step 4: Gradient calculation

The gradient calculation requires a repeated calculation of the innovation vector perturbations **z**_{i} in each minimization iteration, however without the need to update the matrix 𝗖. The components of the gradient vector in ensemble space [(11)] are essentially the control forecast innovation vector components projected on each ensemble perturbation. Given the good parallel scalability of the innovation vector calculation mentioned above, the cost of the gradient calculation is relatively small.

### e. Step 5: Analysis error covariance

As stated earlier, the required square root of the analysis error covariance is obtained as a by-product of the minimization algorithm. The actual computation method depends on the employed minimization algorithm. For example, if a quasi-Newton algorithm is used, one could use the inverse Hessian update formula (e.g., Nocedal 1980) to update the analysis error covariance. In this work, however, we employed a nonlinear conjugate-gradient algorithm (e.g., Luenberger 1984), with the line-search algorithm as defined in Navon et al. (1992). To obtain a satisfactory square root analysis error covariance, the relation (12) is used, with 𝗖 computed around the optimal analysis. Otherwise, the calculation is identical to step 3. Because 𝗖 is computed close to the true minimum, the nonlinear part of the Hessian is negligible, and a good estimate of the analysis error covariance can be obtained. The columns of the square root analysis error covariance are then used as perturbations for ensemble forecasting in step 1, and the new analysis cycle begins.

Note that error covariance localization, not employed in the current MLEF algorithm, is an important component of most ensemble-based data assimilation algorithms (e.g., Houtekamer and Mitchell 1998; Hamill et al. 2001; Whitaker and Hamill 2002). The idea is that, if the forecast error covariance is noisy and has unrealistic distant correlations, these correlations should be removed. A noisy error covariance is anticipated if the number of ensembles is very small. In the MLEF applications presented here, however, initially noisy error covariances became localized after only a few analysis cycles, without any need for an additional localization procedure. For that reason, the issue of error covariance localization is left for future work.

## 4. Experimental design

The MLEF method will be used in a simple one-dimensional example, in order to illustrate the anticipated impact in realistic applications.

### a. Model

The model used in this study is the one-dimensional Korteweg–de Vries–Burgers (KdVB) equation,

∂**u**/∂*t* + **u**(∂**u**/∂*x*) + ∂^{3}**u**/∂*x*^{3} − *ν*∂^{2}**u**/∂*x*^{2} = 0,

where **u** is a nondimensional model state vector and *ν* is a diffusion coefficient. The numerical solution is obtained using centered finite differences in space, and the fourth-order Runge–Kutta scheme for time integration (Marchant and Smyth 2002). The model domain has dimension *N* = 101, with the grid spacing Δ*x* = 0.5 nondimensional units, and the time step is Δ*t* = 0.01 nondimensional units. Periodic boundary conditions are used. In the control experiment the diffusion coefficient is *ν* = 0.07.

The KdVB model includes a few desirable characteristics, such as the nonlinear advection, dispersion, and diffusion. It also allows the solitary waves (e.g., solitons), a nonlinear superposition of several waves, with damping due to diffusion. Various forms of this model are being used in hydrodynamics, nonlinear optics, plasma physics, and elementary particle physics. An interesting weather-related application of a coupled KdV-based system of equations can be found in Gottwald and Grimshaw (1999a, b). Also implied by Mitsudera (1994) in applications to cyclogenesis, the KdV-based system supports baroclinic instability, and it models realistically a nonlinear interaction between the flow and topography.
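A minimal sketch of such a discretization (periodic centered differences in space, fourth-order Runge–Kutta in time) follows. The specific KdVB form **u**_{t} + **uu**_{x} + **u**_{xxx} − *ν***u**_{xx} = 0, with unit advection and dispersion coefficients, is an assumption here, not the authors' code:

```python
import numpy as np

def kdvb_tendency(u, dx, nu):
    """du/dt = -(u u_x + u_xxx) + nu u_xx, periodic centered differences."""
    ux   = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)
    uxx  = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    uxxx = (np.roll(u, -2) - 2.0 * np.roll(u, -1)
            + 2.0 * np.roll(u, 1) - np.roll(u, 2)) / (2.0 * dx**3)
    return -(u * ux + uxxx) + nu * uxx

def rk4_step(u, dt, dx, nu):
    """One fourth-order Runge-Kutta time step."""
    k1 = kdvb_tendency(u, dx, nu)
    k2 = kdvb_tendency(u + 0.5 * dt * k1, dx, nu)
    k3 = kdvb_tendency(u + 0.5 * dt * k2, dx, nu)
    k4 = kdvb_tendency(u + dt * k3, dx, nu)
    return u + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Parameters from the text: N = 101, dx = 0.5, dt = 0.01, nu = 0.07
x = np.arange(101) * 0.5
u = 0.5 / np.cosh(0.25 * (x - 10.0))**2     # illustrative solitary pulse
u = rk4_step(u, 0.01, 0.5, 0.07)
```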

The initial conditions are defined using a two-soliton analytic solution of the KdV equation [e.g., (15)], where *x* refers to distance and *t* to time. The parameters *β*_{1} and *β*_{2} reflect the amplitude of the two solitons and are chosen to be *β*_{1} = 0.5 and *β*_{2} = 1.0. The solitons progress with a speed proportional to their amplitude, and the specific choice of the parameters assures that the solitons will often interact during the time integration of the model.

Note that the model run defined as *truth* is using *β*_{1} = 0.5 and *β*_{2} = 1.0, and the initial conditions used in assimilation experiments are defined using *β*_{1} = 0.4 and *β*_{2} = 0.9, with the time parameter *t* lagging behind the truth by one time unit (e.g., 100 model time steps). The initial forecast error covariance is defined using ensemble forecasts [e.g., (4)], initiated from a set of random perturbations two cycles prior to the first observation time. The initial perturbations are formed by randomly perturbing parameters of the solution (15), such as the time and the *β*_{1} and *β*_{2} parameters, around the values used in assimilation run, that is, using *β*_{1} = 0.4 and *β*_{2} = 0.9.

### b. Observations

The observations are chosen as random perturbations to the truth [i.e., the forecast run with initial conditions using *β*_{1} = 0.5 and *β*_{2} = 1.0 in (15)], with the error *ε*_{obs} = 0.05 nondimensional units. Note that such a choice implies a perfect model assumption. The observation error covariance 𝗥 is chosen to be diagonal (e.g., variance only), with elements *ε*^{2}_{obs}. There are approximately 10 irregularly spaced observations available at each analysis time. Two types of experiments are performed: (i) *in situ* observations, fixed at one location at all times, and (ii) *targeted* observations, with observations following the solitons’ peaks throughout the integration. Initially, however, both the in situ and targeted observations are chosen to be identical.

The observation operator is a quadratic transformation operator, defined as *H*(**u**) = **u**^{2}. The choice of a quadratic observation operator is influenced by a desire to test the algorithm with a relatively weakly nonlinear observation operator, not necessarily related to any meteorological observations. In practice, the observation operators of interest would include highly nonlinear observation operators, such as the radiative transfer model for a cloudy atmosphere (e.g., Greenwald et al. 1999), with extensive use of exponential functions. Also, radar reflectivity measurements of rain, snow, and hail are related to model-produced specific humidity and density through logarithmic and other nonlinear functions (M. Xue 2004, personal communication). The observations are taken at grid points to avoid an additional impact of interpolation. The case of the linear observation operator is less interesting, since then the MLEF solution is identical to the reduced-rank ensemble Kalman filter solution. The use of the linear observation operator, however, is important for algorithm development and initial testing. In that case, the MLEF solution is obtained in a single minimization step because of the implied perfect Hessian preconditioning.

The observations are made available every two nondimensional time units. Given the model time step of 0.01 units, each analysis cycle implies 200 model time steps. The time integration of the control forecast and the observations are shown in Fig. 1. Note that no data assimilation is involved in creating these plots. The time evolution shown corresponds to the first 10 analysis cycles and illustrates the two-soliton character of the solution; these are the same cycles examined in section 5.

### c. Experiments

The control experiment includes 10 ensemble members (as compared with 101 total degrees of freedom) with 10 targeted observations, and employs a quadratic observation operator. The iterative minimization employed is the Fletcher–Reeves nonlinear conjugate-gradient algorithm (e.g., Luenberger 1984). In each of the MLEF data assimilation cycles, three minimization iterations are performed to obtain the analysis. In all experiments 100 analysis cycles are performed, by which time the amplitude of the solitary waves in the control forecast is reduced by one order of magnitude because of diffusion. The long assimilation period also helps in evaluating the stability of the MLEF algorithm performance.

The experiments are designed in such a way as to address two potentially important and challenging problems in realistic atmospheric and oceanic data assimilation: impact of minimization and impact of observation location.

### d. Validation

To compare the results of various experiments, four validation methods are employed.

#### 1) Root-mean-square (rms) error

It is assumed that the true state, **u**_{true}, is given by the control forecast used to produce the observations. This is not completely true, being dependent on the relative influence of observation and forecast errors, but it is assumed acceptable. With this assumption, the rms error is calculated as

rms = {(1/*N*) Σ^{N}_{i=1}[(**u** − **u**_{true})_{i}]^{2}}^{1/2},

where *N* defines the model state dimension (i.e., the number of grid points).
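The rms computation amounts to a single vector operation; a brief sketch:

```python
import numpy as np

def rms_error(u, u_true):
    """Root-mean-square difference over the N grid points."""
    return float(np.sqrt(np.mean((u - u_true) ** 2)))

# A uniform error of 0.1 at every grid point gives an rms of 0.1
err = rms_error(np.full(101, 0.1), np.zeros(101))
```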

#### 2) Analysis error covariance estimate

The analysis error covariance is an estimate obtained from an ensemble data assimilation algorithm, and it will be shown in terms of the actual matrix elements. This is the new information produced by ensemble data assimilation, generally not available in variational data assimilation. It requires special attention, since this information is directly transferred to ensemble forecasting and also estimates the uncertainty of the produced analysis.

#### 3) The *χ*^{2} validation test

The *χ*^{2} validation diagnostics (e.g., Menard et al. 2000), developed to validate the Kalman filter performance, can also be used in the context of ensemble data assimilation. This diagnostics evaluates the correctness of the innovation (observation minus forecast) covariance matrix that employs a predefined observation error covariance 𝗥, and the MLEF-computed forecast error covariance 𝗣_{f}. We adopt the definition used in Menard et al. (2000), in which *χ*^{2} is defined in observation space, normalized by the number of observations, *N*_{obs}:

*χ*^{2} = (1/*N*_{obs})[**y** − *H*(**x**)]^{T}(𝗥 + 𝗛𝗣_{f}𝗛^{T})^{−1}[**y** − *H*(**x**)].   (18)

The calculation of the inverse (𝗥 + 𝗛𝗣_{f}𝗛^{T})^{−1} (e.g., its square root) is defined in appendix B, **y** denotes observations, and **x** is the model forecast. Because of an iterative estimation of the optimal analysis in MLEF, the forecast **x** denotes the forecast from the last minimization iteration, and the matrix 𝗖 is calculated about the optimal state. For a Gaussian distribution of innovations, and a linear observation operator *H*, the conditional mean of *χ*^{2} defined by (18) should be equal to 1. As in Menard et al. (2000), the conditional mean is substituted by a time mean. In this paper, a 10-cycle moving average is computed, as well as the instant values of *χ*^{2}, calculated at each assimilation cycle. Because of the use of a nonlinear model in the calculation of 𝗣_{f}, and a statistically small sample (i.e., relatively few observations per cycle), one can expect only values of *χ*^{2} close to 1 and not necessarily equal to 1.
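A sketch of the per-cycle diagnostic follows (a linear observation operator matrix and a diagonal 𝗥 are assumed here for illustration; the ensemble square root supplies 𝗛𝗣_{f}𝗛^{T}):

```python
import numpy as np

def chi2_diagnostic(y, Hx, Hmat, Pf_sqrt, R_diag):
    """chi^2 innovation statistic, normalized by the number of observations.

    The innovation covariance R + H Pf H^T is built from the ensemble
    square root, so only an S-rank update of the diagonal R is needed.
    """
    HP = Hmat @ Pf_sqrt                      # obs-space ensemble perturbations
    S_cov = np.diag(R_diag) + HP @ HP.T      # R + H Pf H^T
    d = y - Hx                               # innovation
    return float(d @ np.linalg.solve(S_cov, d)) / len(y)

# Innovation [1, 0] with unit prior and observation variances
val = chi2_diagnostic(np.array([1.0, 0.0]), np.zeros(2),
                      np.eye(2), np.eye(2), np.ones(2))
```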

#### 4) Innovation vector PDF statistics

The normalized innovations should have the standard normal distribution *N*(0, 1). Note that, if the innovations (19) are random variables with distribution *N*(0, 1), then (17)–(18) define a *χ*^{2} distribution with *N*_{obs} degrees of freedom.

In our applications, because of the nonlinearity of the forecast model and the observation operator *H* and the relatively small statistical sample, only an approximate normal distribution should be expected.

## 5. Results

### a. Linear observation operator experiments

When linear observation operators are employed, and a Gaussian error distribution is assumed, in principle there is no difference between the MLEF and any related EnKF algorithm. Formally, a single minimization iteration of the MLEF is needed, with step length equal to 1. These experiments are conducted in order to develop and test the MLEF algorithm, especially the statistics of the produced results, using the diagnostics defined in sections 4d(3) and 4d(4). Note that a perfect statistical fit cannot be expected, since the forecast model is still a nonlinear model, with diffusion, and the posterior statistics are not exactly Gaussian. An obvious consequence of having few observations per cycle is that the innovation statistics may not be representative of the true PDF statistics. Two experiments are performed, one with 10 (targeted) observations per cycle, and the other with all observations (e.g., 101 per cycle).

The *χ*^{2} test is shown in Fig. 2. Although in both experiments the value of *χ*^{2} is close to one, much better agreement is obtained with more observations (Fig. 2b). With all observations it also takes more analysis cycles to converge to 1, which may be a sign of an increased difficulty of the KdVB model in fitting numerous, noisy observations. Note that with all observations assimilated, there is a greater chance that some observations are negative, which would in turn impact the numerical stability of the model.

The innovation histogram is shown in Fig. 3 and indicates a similar impact of the statistical sample. Deviations from a Gaussian PDF are more pronounced when fewer observations are used (Fig. 3a) than in the case with all observations (Fig. 3b). There is also a notable right shift of the PDF, which could be the impact of diffusion (e.g., Daley and Menard 1993) or of model nonlinearity.

The results in Figs. 2a and 3a indicate what can be expected from the experiments with a quadratic observation operator and few observations per cycle. On the other hand, in future applications of the MLEF with real models and real observations, one could expect a much better innovation statistical sample, given the enormous number of satellite and radar measurements available today.

### b. Quadratic observation operator

#### 1) Control experiment

The rms error of the control MLEF experiment is shown in Fig. 4, together with the rms error from the experiment with no observations. Any acceptable data assimilation experiment should have a smaller rms error than the no-assimilation experiment. During the initial 11–12 cycles, however, there is a pronounced increase of the rms error in the no-assimilation experiment, suggesting that the particular initial perturbation (defined as a difference from the truth) is unstable during the initial cycles. As the cycles continue, the no-assimilation forecast converges toward the true solution, indicating an ultimate stability of the initial perturbation. This is an artifact of diffusion, which would eventually force all forecasts to zero, therefore producing rms errors equal to zero. One can note good rms convergence in the MLEF experiment within the first few cycles. The final rms error is nonzero, since the defined truth (e.g., **u**_{true}) is just a long-term forecast used to create observations, not necessarily equal to the actual true analysis solution. Overall, the rms error indicates stable MLEF performance.

The estimate of the analysis error covariance in the control MLEF experiment is shown in Fig. 5, for analysis cycles 1, 4, 7, and 10. These cycles are chosen in order to illustrate the initial adjustment of the analysis. Each of the figures represents actual matrix elements, with the diagonal corresponding to the variance. All analysis error covariance figures have a threshold of ±1 × 10^{−4} nondimensional units, in order to ease the qualitative comparison between the results from different experiments. Although the true analysis error covariance is not known, it would have nonzero values, since the observations have a nonzero error. One can immediately note how the analysis error covariance becomes localized by the fourth cycle, without the need for any artificial error covariance localization. Also, the values of the covariance matrix remain relatively small throughout the cycles, moving with the solitons.

Statistics of innovation vectors are an important indicator of algorithm performance, especially useful when the truth is not known. The *χ*^{2} test and the innovation histogram are shown in Fig. 6. As suggested earlier, because of increased nonlinearity and the small statistical sample, one should expect only approximate agreement. It is clear that the *χ*^{2} value remains near one throughout the analysis cycles, again suggesting stable performance of the MLEF algorithm. The innovation histogram shows a close resemblance to the standard normal PDF, confirming that the innovation statistics are satisfactory.

#### 2) Impact of iterative minimization

The control MLEF experiment (with three minimization iterations) is compared with an ensemble data assimilation experiment with no explicit minimization. In both experiments 10 ensemble members and 10 observations are used, and a quadratic observation operator is employed. The only difference is that the MLEF employs an iterative minimization, while the no-minimization experiment performs a single minimization iteration with step length equal to 1 (e.g., appendix A). The experiment without minimization indirectly reflects the impact of the linear analysis solution implied in ensemble Kalman filters. It should be noted, however, that there are many other details of ensemble Kalman filters not captured in this experiment, and any direct comparison should be taken with caution.
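The difference between a single linear-type update and iterative minimization can be illustrated on a deliberately simple scalar problem with a quadratic observation operator h(x) = x². This is only a toy sketch, not the MLEF itself; all numbers and names are illustrative:

```python
# Toy scalar analysis: background xb with variance pb, one observation
# y = h(x_true) with error variance r, quadratic operator h(x) = x**2.
xb, pb = 1.0, 0.5
r = 0.1
x_true = 1.3
y = x_true**2  # noise-free observation, for clarity

def h(x):  return x**2
def hp(x): return 2.0 * x  # Jacobian of h

def cost(x):
    """Quadratic-background / nonlinear-observation cost function."""
    return 0.5 * (x - xb)**2 / pb + 0.5 * (y - h(x))**2 / r

def gauss_newton_step(x):
    """One Gauss-Newton iteration with unit step length."""
    grad = (x - xb) / pb - hp(x) * (y - h(x)) / r
    hess = 1.0 / pb + hp(x)**2 / r  # Gauss-Newton Hessian approximation
    return x - grad / hess

# "No minimization": a single step with the Jacobian evaluated at the
# background, mimicking the linear analysis implied in EnKF-like updates.
x_single = gauss_newton_step(xb)

# Iterative minimization: three Gauss-Newton iterations, as in the
# control MLEF experiment.
x_iter = xb
for _ in range(3):
    x_iter = gauss_newton_step(x_iter)
```

For this setup the iterated solution attains a lower cost and lands closer to `x_true` than the single linear step, mirroring the qualitative behavior reported below.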

The rms errors are shown in Fig. 7. It is obvious that, without minimization, the ensemble-based reduced-rank data assimilation algorithm does not perform well. The explanation is that the MLEF algorithm is better equipped to handle nonlinearities of observation operators, and thus it produces smaller rms errors. Most of the difficulties in the no-minimization experiment occur during the first 11–12 cycles, coinciding with the rms increase noted in the no-observation experiment (Fig. 4).

With fewer observations, the positive impact of iterative minimization is still notable in terms of rms error, although the impact is somewhat smaller in magnitude (Fig. 8). Again, most of the differences occur during the first cycles, with both solutions reaching the same rms in later cycles. The reduced number of observations has a negative impact on the performance of both algorithms, as expected. The impact of minimization is also evaluated for the in situ observations, in terms of rms error (Fig. 9). As before, the positive impact of minimization is notable only in the first cycles, with both algorithms showing signs of difficulty. A hidden problem with the no-minimization experiments with five observations and with in situ observations was that a satisfactory solution was possible only for smaller initial ensemble perturbations. Therefore, the results shown in Figs. 8 and 9 imply smaller initial ensemble perturbations than in the experiments with 10 targeted observations (Fig. 7). This may be an indication of the sensitivity of the KdVB numerical solution to large perturbations, but it may also suggest a critical role of iterative minimization in situations with large innovation residuals. This issue will be further examined in the future, in applications with realistic models and real observations.

Overall, the use of iterative minimization in MLEF shows a positive impact in terms of the rms error. The impact appears to be stronger for a better observed system.

#### 3) Impact of observation location

An interesting problem, related to the impact of observation location on the performance of ensemble data assimilation, is now considered. The issue of targeted observations, as a means of improving the regular observation network, has been thoroughly discussed and evaluated (Palmer et al. 1998; Buizza and Montani 1999; Langland et al. 1999; Szunyogh et al. 2002; Majumdar et al. 2002). Here, we indirectly address this issue by examining the impact of observation location on the performance of the MLEF algorithm.

At the initial time, the in situ and targeted observations are chosen to be identical. The two solitons may be viewed as weather disturbances whose phase and amplitude are important to predict, for example, the temperature associated with fronts. Since these systems move and interact with each other, it is instructive to evaluate the impact of targeted observations, intuitively associated with the location of the disturbances. Figure 10 shows the rms errors in the targeted and in situ MLEF experiments. There is a strong positive impact of targeted observations, which are able to resolve the two disturbances at all times. The particular location of the in situ observations does not allow the optimal use of observation information with regard to the two solitons. Only at cycles when the solitons are passing through the in situ observation network is the observation information adequately transferred and accumulated, eventually resulting in small rms errors.

The analysis error covariance associated with the in situ MLEF experiment is shown in Fig. 11 and should be compared with the control MLEF experiment (Fig. 5). The positive impact of targeted observations is now even more obvious. In the first cycle, the results are identical since the targeted and in situ observations are identical. As the cycles proceed, much larger uncertainties are obtained than in the control MLEF experiment, especially near the location of solitons. Although one should not draw strong conclusions from this simple experiment, the results appear to suggest that targeted observations amplify the beneficial impact of ensemble data assimilation.

## 6. Summary and conclusions

The maximum likelihood ensemble filter is presented, in application to the one-dimensional Korteweg–de Vries–Burgers equation with two solitons. The filter combines the maximum likelihood approach with the ensemble Kalman filter methodology to create a qualitatively new ensemble data assimilation algorithm with desirable computational features. The analysis solution is obtained as the model state that maximizes the posterior probability distribution, via an unconstrained minimization of an arbitrary nonlinear cost function. This creates an important link between control theory and ensemble data assimilation. Like other ensemble data assimilation algorithms, the MLEF produces an estimate of the analysis uncertainty (e.g., the analysis error covariance) and employs only the nonlinear forecast model and observation operators. The use of linearized models, or adjoints, required for variational methods, is completely avoided. The impact of the MLEF method is illustrated in an example with a quadratic observation operator. The innovation vector statistics (e.g., the *χ*^{2} test and innovation histogram) indicate satisfactory, stable performance of the algorithm. Although in this paper the MLEF method is applied in a simple environment, all calculations and processing of observations are directly applicable to state-of-the-art forecast models and arbitrary nonlinear observation operators. Since the observations assimilated in the experiments presented here are just a single realization of infinitely many possible realizations, the obtained results also depend on the particular observation realization.

The impact of targeted observations is another important issue relevant to the operational data assimilation and the use of ensembles. It appears that the targeted observation network amplifies the beneficial impact of ensemble data assimilation. This is certainly an issue worthy of further investigation.

The positive impact of iterative minimization, on both the rms error and the analysis error covariance, is obvious. The MLEF algorithm clearly benefits from the maximum likelihood component. The additional calculation involved in iterative minimization is almost negligible compared to the cost of ensemble forecasts and the Hessian preconditioning calculations. Only two to three minimization iterations are anticipated in realistic applications, further easing possible concerns about iterative minimization.

The positive impact of minimization in the case of a nonlinear observation operator suggests that an iterative minimization approach can also be used in other ensemble-based data assimilation algorithms that rely on the calculation of the conditional mean (e.g., the ensemble mean). Such algorithms would be more robust with respect to nonlinear observation operators.

Because it uses a control deterministic forecast as a first guess, the MLEF method may be more appealing in applications where a deterministic forecast is of interest. The MLEF method offers a potential advantage when the computational burden forces the ensemble forecasts to be calculated at coarser resolution than desired: one can still minimize the cost function defined at fine resolution and thus produce the control (maximum likelihood) forecast at fine resolution, with only the ensembles used for the error covariance calculation defined at coarse resolution. Using the ensemble mean as a first guess, on the other hand, may be a limiting factor in that respect, since the data assimilation problem would then be defined and solved only at the coarser resolution.

In a forthcoming paper, model error and model error covariance evolution will be added to the MLEF algorithm. Applications to realistic models and observations are also under way. At a somewhat higher computational cost, the MLEF algorithm allows a straightforward extension to smoothing, which could be relevant in applications with a high temporal frequency of observations.

In future MLEF development, a non-Gaussian PDF framework and improved Hessian preconditioning are anticipated, to further extend the use of control theory in challenging geophysical applications. Both the conditional mean (e.g., minimum variance) and the conditional mode (e.g., maximum likelihood) are important PDF estimates (e.g., Cohn 1997). Future development of the MLEF will address these issues.

## Acknowledgments

I thank Dusanka Zupanski for many helpful discussions and careful reading of the manuscript. My gratitude is also extended to Ken Eis for helpful comments and suggestions. I also thank Thomas Vonder Haar and Tomislava Vukicevic for their continuous support throughout this work. I am greatly indebted to Rolf Reichle and an anonymous reviewer for thorough reviews that significantly improved the manuscript. This research was supported by the Department of Defense Center for Geosciences/Atmospheric Research at Colorado State University under Cooperative Agreement DAAD19-02-2-0005 with the Army Research Laboratory.

## REFERENCES

Anderson, E., and Coauthors, 1992: *LAPACK Users' Guide*. Society for Industrial and Applied Mathematics, 235 pp.

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Anderson, J. L., 2003: A local least squares framework for ensemble filtering. *Mon. Wea. Rev.*, **131**, 634–642.

Axelsson, O., 1984: *Iterative Solution Methods*. Cambridge University Press, 644 pp.

Bell, B. M., and F. W. Cathey, 1993: The iterated Kalman filter update as a Gauss–Newton method. *IEEE Trans. Automat. Contr.*, **38**, 294–297.

Bishop, C., J. Etherton, and S. J. Majmudar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436.

Brasseur, P., J. Ballabrera, and J. Verron, 1999: Assimilation of altimetric data in the mid-latitude oceans using the SEEK filter with an eddy-resolving primitive equation model. *J. Mar. Syst.*, **22**, 269–294.

Buizza, R., and A. Montani, 1999: Targeting observations using singular vectors. *J. Atmos. Sci.*, **56**, 2965–2985.

Cohn, S. E., 1997: Estimation theory for data assimilation problems: Basic conceptual framework and some open questions. *J. Meteor. Soc. Japan*, **75**, 257–288.

Cohn, S. E., A. da Silva, J. Guo, M. Sienkiewicz, and D. Lamich, 1998: Assessing the effects of data selection with the DAO physical-space statistical analysis system. *Mon. Wea. Rev.*, **126**, 2913–2926.

Courtier, P., J.-N. Thepaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var using an incremental approach. *Quart. J. Roy. Meteor. Soc.*, **120**, 1367–1388.

Daley, R., and R. Menard, 1993: Spectral characteristics of Kalman filter systems for atmospheric data assimilation. *Mon. Wea. Rev.*, **121**, 1554–1565.

Daley, R., and E. Barker, 2001: NAVDAS: Formulation and diagnostics. *Mon. Wea. Rev.*, **129**, 869–883.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99** (C5), 10143–10162.

Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. *Ocean Dyn.*, **53**, 343–367.

Evensen, G., and P. J. van Leeuwen, 2000: An ensemble Kalman smoother for nonlinear dynamics. *Mon. Wea. Rev.*, **128**, 1852–1867.

Fisher, M., and P. Courtier, 1995: Estimating the covariance matrix of analysis and forecast error in variational data assimilation. ECMWF Tech. Memo. 220, 28 pp.

Gandin, L. S., 1963: *Objective Analysis of Meteorological Fields* (in Russian). Gidrometeorizdat, 238 pp. [English translation by Israel Program for Scientific Translations, 1965, 242 pp.]

Gill, P. E., W. Murray, and M. H. Wright, 1981: *Practical Optimization*. Academic Press, 401 pp.

Golub, G. H., and C. F. van Loan, 1989: *Matrix Computations*. 2d ed. The Johns Hopkins University Press, 642 pp.

Gottwald, G., and R. Grimshaw, 1999a: The formation of coherent structures in the context of blocking. *J. Atmos. Sci.*, **56**, 3640–3662.

Gottwald, G., and R. Grimshaw, 1999b: The effect of topography on the dynamics of interacting solitary waves in the context of atmospheric blocking. *J. Atmos. Sci.*, **56**, 3663–3678.

Greenwald, T. J., S. A. Christopher, J. Chou, and J. C. Liljegren, 1999: Intercomparison of cloud liquid water path derived from the GOES 9 imager and ground based microwave radiometers for continental stratocumulus. *J. Geophys. Res.*, **104**, 9251–9260.

Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. *Mon. Wea. Rev.*, **128**, 2905–2919.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790.

Haugen, V. E. J., and G. Evensen, 2002: Assimilation of SLA and SST data into an OGCM for the Indian Ocean. *Ocean Dyn.*, **52**, 133–151.

Heemink, A. W., M. Verlaan, and J. Segers, 2001: Variance reduced ensemble Kalman filtering. *Mon. Wea. Rev.*, **129**, 1718–1728.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Jazwinski, A. H., 1970: *Stochastic Processes and Filtering Theory*. Academic Press, 376 pp.

Kalman, R., and R. Bucy, 1961: New results in linear prediction and filtering theory. *Trans. ASME J. Basic Eng.*, **83D**, 95–108.

Keppenne, C. L., 2000: Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter. *Mon. Wea. Rev.*, **128**, 1971–1981.

Keppenne, C. L., and M. M. Rienecker, 2002: Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model. *Mon. Wea. Rev.*, **130**, 2951–2965.

Langland, R. H., and Coauthors, 1999: The North Pacific Experiment (NORPEX-98): Targeted observations for improved North American weather forecasts. *Bull. Amer. Meteor. Soc.*, **80**, 1363–1384.

Lermusiaux, P. F. J., and A. R. Robinson, 1999: Data assimilation via error subspace statistical estimation. Part I: Theory and schemes. *Mon. Wea. Rev.*, **127**, 1385–1407.

Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **112**, 1177–1194.

Luenberger, D. L., 1984: *Linear and Non-linear Programming*. 2d ed. Addison-Wesley, 491 pp.

Majumdar, S. J., C. H. Bishop, B. J. Etherton, and Z. Toth, 2002: Adaptive sampling with the ensemble transform Kalman filter. Part II: Field program implementation. *Mon. Wea. Rev.*, **130**, 1356–1369.

Marchant, T. R., and N. F. Smyth, 2002: The initial-boundary problem for the Korteweg–de Vries equation on the negative quarter-plane. *Proc. Roy. Soc. London*, **458A**, 857–871.

Menard, R., S. E. Cohn, L.-P. Chang, and P. M. Lyster, 2000: Assimilation of stratospheric chemical tracer observations using a Kalman filter. Part I: Formulation. *Mon. Wea. Rev.*, **128**, 2654–2671.

Mitsudera, H., 1994: Eady solitary waves: A theory of type B cyclogenesis. *J. Atmos. Sci.*, **51**, 3137–3154.

Nocedal, J., 1980: Updating quasi-Newton matrices with limited storage. *Math. Comput.*, **35**, 773–782.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. *Tellus*, **56A**, 415–428.

Palmer, T. N., R. Gelaro, J. Barkmeijer, and R. Buizza, 1998: Singular vectors, metrics, and adaptive observations. *J. Atmos. Sci.*, **55**, 633–653.

Parrish, D. F., and J. C. Derber, 1992: The National Meteorological Center's Spectral Statistical Interpolation Analysis System. *Mon. Wea. Rev.*, **120**, 1747–1763.

Pham, D. T., J. Verron, and M. C. Roubaud, 1998: A singular evolutive extended Kalman filter for data assimilation in oceanography. *J. Mar. Syst.*, **16**, 323–340.

Rabier, F., A. McNally, E. Andersson, P. Courtier, P. Unden, J. Eyre, A. Hollingsworth, and F. Bouttier, 1998: The ECMWF implementation of three-dimensional variational assimilation (3D-Var). Part II: Structure functions. *Quart. J. Roy. Meteor. Soc.*, **124A**, 1809–1829.

Rabier, F., H. Jarvinen, E. Klinker, J.-F. Mahfouf, and A. Simmons, 2000: The ECMWF operational implementation of four-dimensional variational assimilation. I: Experimental results with simplified physics. *Quart. J. Roy. Meteor. Soc.*, **126A**, 1143–1170.

Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2002a: Hydrologic data assimilation with the ensemble Kalman filter. *Mon. Wea. Rev.*, **130**, 103–114.

Reichle, R. H., J. P. Walker, R. D. Koster, and P. R. Houser, 2002b: Extended versus ensemble Kalman filtering for land data assimilation. *J. Hydrometeor.*, **3**, 728–740.

Szunyogh, I., Z. Toth, A. V. Zimin, S. J. Majumdar, and A. Persson, 2002: Propagation of the effect of targeted observations: The 2000 Winter Storm Reconnaissance Program. *Mon. Wea. Rev.*, **130**, 1144–1165.

Tippett, M., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. *Mon. Wea. Rev.*, **131**, 1485–1490.

van Leeuwen, P. J., 2001: An ensemble smoother with error estimates. *Mon. Wea. Rev.*, **129**, 709–728.

Verlaan, M., and A. W. Heemink, 2001: Nonlinearity in data assimilation applications: A practical method for analysis. *Mon. Wea. Rev.*, **129**, 1578–1589.

Vvedensky, D., 1993: *Partial Differential Equations with Mathematica*. Addison-Wesley, 465 pp.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924.

Zou, X., Y.-H. Kuo, and Y.-R. Guo, 1995: Assimilation of atmospheric radio refractivity using a nonhydrostatic adjoint model. *Mon. Wea. Rev.*, **123**, 2229–2250.

Zou, X., H. Liu, J. Derber, J. G. Sela, R. Treadon, I. M. Navon, and B. Wang, 2001: Four-dimensional variational data assimilation with a diabatic version of the NCEP global spectral model: System development and preliminary results. *Quart. J. Roy. Meteor. Soc.*, **127**, 1095–1122.

Zupanski, M., 1993: Regional four-dimensional variational data assimilation in a quasi-operational forecasting environment. *Mon. Wea. Rev.*, **121**, 2396–2408.

Zupanski, M., D. Zupanski, D. Parrish, E. Rogers, and G. DiMego, 2002: Four-dimensional variational data assimilation for the Blizzard of 2000. *Mon. Wea. Rev.*, **130**, 1967–1988.

## APPENDIX A

### Equivalence of the Kalman Gain and Hessian Preconditioning-Gradient Calculation

Here *α* is the step length, 𝗘 is the Hessian, and **g** is the gradient of the cost function (6) in the first minimization iteration. Denoting the Jacobian of the observation operator as 𝗛, and noting that for a quadratic cost function the optimal step length *α* is equal to 1 (Gill et al. 1981), the solution of the iterative minimization problem in the first iteration is *identical* to the extended Kalman filter solution (Jazwinski 1970). In this context, the matrix identity (A6) shows the equivalence between the Kalman gain calculation and the Hessian-gradient calculation in iterative minimization. For a nonquadratic cost function, the step length differs from 1, and the solution (A7) is not identical to the extended Kalman filter solution. The Kalman gain computation, however, remains the same as the Hessian-gradient computation.
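This equivalence is easy to verify numerically for a linear observation operator: one Newton iteration with unit step length on the quadratic cost reproduces the Kalman filter analysis. A small NumPy sketch (random, purely illustrative matrices and sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 3  # state and observation dimensions (illustrative)

# Symmetric positive-definite forecast and observation error covariances.
A = rng.standard_normal((n, n)); Pf = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))  # linear observation operator
xb = rng.standard_normal(n)      # background state
y = rng.standard_normal(m)       # observations

# Newton step on the quadratic cost
# J(x) = 1/2 (x-xb)^T Pf^{-1} (x-xb) + 1/2 (y-Hx)^T R^{-1} (y-Hx):
g = -H.T @ np.linalg.solve(R, y - H @ xb)            # gradient at x = xb
E = np.linalg.inv(Pf) + H.T @ np.linalg.solve(R, H)  # Hessian of J
x_newton = xb - np.linalg.solve(E, g)                # step length alpha = 1

# Extended-Kalman-filter analysis with gain K = Pf H^T (H Pf H^T + R)^{-1}:
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
x_kalman = xb + K @ (y - H @ xb)
```

The two analyses agree to round-off, which is the Sherman–Morrison–Woodbury identity underlying the equivalence discussed above.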

## APPENDIX B

### Computation of the Matrix 𝗚^{−1/2}

The matrices 𝗚^{−1} [(18)] and 𝗚^{−1/2} [(19)] are needed for the computation of normalized innovations. An efficient algorithm for computing the inverse square root matrix 𝗚^{−1/2} is presented here. It relies on the Sherman–Morrison–Woodbury (SMW) formula, as well as on an iterative matrix square root calculation procedure. The matrix 𝗚^{−1/2} is used to calculate the normalized innovations [(19)], which are then used in calculating the *χ*^{2} sum [(18)]. From (17) and (18), one can identify the form of 𝗚^{−1}.

Introducing the matrix 𝗭 = (**z**_{1} **z**_{2} . . . **z**_{S}), one can redefine 𝗚^{−1} in terms of 𝗖 = 𝗭^{T}𝗭, where the matrix 𝗖 is the same as defined by (9) and (10) from the text. This means that (B3) can be rewritten as (B4). The matrices 𝗩 and **Λ** are both available from the MLEF algorithm [(10)]; therefore, *all* matrices on the right-hand side of (B4) are available.

To calculate the square root of the positive-definite symmetric matrix 𝗚^{−1}, one can exploit an iterative formula (Golub and van Loan 1989, p. 554, problem P11.2.4), which produces a unique symmetric positive-definite square root matrix 𝗚^{−1/2}. The special form of 𝗚^{−1} [e.g., (B4)], and the fact that 𝗩 is unitary (e.g., 𝗩^{T}𝗩 = 𝗜), allow a simplification of the matrix inversion involved in (B5). To see that, it is convenient to write 𝗚^{−1} in the generic form (B6), where **Ψ**_{0} is a nonzero diagonal matrix with known elements *ψ*_{i} = −1/(1 + *λ*_{i}). After applying (B5) with the formulation (B6), one finds that the matrices 𝗫_{k} and 𝗫_{k}^{−1} keep the same form, and only the diagonal matrices **Σ**_{k} and **Γ**_{k} are updated during the iterations. This greatly simplifies the computational burden of the matrix square root calculation. A recursive (iterative) algorithm for 𝗚^{−1/2} can then be defined.

In the experiments conducted in this paper, satisfactory convergence was found after only three iterations [e.g., *N* = 3 in (B10)]. The above algorithm is stable and convenient for matrix square root calculations in the context of the MLEF.
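The recursion in this appendix exploits the special low-rank structure of 𝗚^{−1}. As a generic illustration of the iterative inverse-square-root idea, the standard Denman–Beavers coupled iteration computes 𝗚^{−1/2} for any symmetric positive-definite matrix (a sketch of the general technique, not the paper's structured algorithm):

```python
import numpy as np

def inv_sqrt_spd(G, n_iter=20):
    """Inverse square root of a symmetric positive-definite matrix via the
    Denman-Beavers coupled iteration: Y -> G^{1/2}, Z -> G^{-1/2}.

    A generic sketch; the recursion (B7)-(B10) in the paper instead exploits
    the special diagonal-plus-low-rank structure of G^{-1} so that only
    diagonal matrices are updated, avoiding full matrix inversions.
    """
    Y = np.array(G, dtype=float)
    Z = np.eye(Y.shape[0])
    for _ in range(n_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        Y, Z = Y_next, Z_next
    return Z  # approximates G^{-1/2}

# Check on a random SPD matrix: G^{-1/2} G G^{-1/2} should be the identity.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
G = A @ A.T + 4 * np.eye(4)
Gm12 = inv_sqrt_spd(G)
```

The coupled iteration converges quadratically for well-conditioned symmetric positive-definite input, which is consistent with the very small iteration counts reported above.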