## Abstract

In hybrid variational–ensemble data assimilation schemes preconditioned on the square root **U** of the background covariance **B**, **U** is a linear map from the model space to a higher-dimensional control space. Because **U** is a nonsquare matrix, the transformed cost function still contains the inverse of **B**. To avoid this inversion, all studies have used the diagonal quadratic form of the background term in practice without any justification. This study shows that this practical cost function belongs to a class of cost functions that come into play whenever the minimization problem is transformed from the model space to a higher-dimensional space. Each such cost function is associated with a vector in the kernel of **U** (Ker**U**), leading to an infinite number of these cost functions, among which the practical cost function corresponds to the zero vector. These cost functions are shown to be the natural extension of the transformed one from the orthogonal complement of Ker**U** to the full control space.

In practice, these cost functions reduce to a practical form whose calculation does not require a predefined vector in Ker**U**, and they are as valid as the transformed one in the control space. This means the minimization process need not be restricted to any subspace, contrary to previous studies. This was demonstrated using a real-observation data assimilation system. The theory justifies the use of the practical cost function and its variant in the hybrid variational–ensemble data assimilation method.

## 1. Introduction

Variational data assimilation (VAR) finds the mode of the posterior distribution through minimization of a cost function. The second moment of the prior distribution (the background error covariance **B**) is required in the definition of the cost function. This kind of information is not available when VAR is used alone. Therefore, the variational method usually replaces the unknown **B**, the "errors of the day," with a climatological error covariance (Parrish and Derber 1992). This second moment is available in the ensemble Kalman filter (EnKF) by sampling the prior distribution with a number of ensemble members, which means the errors of the day are partially captured in EnKF. Although only partial information on the background errors can be obtained, this information, when ingested into variational schemes, introduces the errors of the day into VAR. This blending of the second moment from EnKF into VAR is called the hybrid variational–ensemble method in data assimilation (Hamill and Snyder 2000; Lorenc 2003; Buehner 2005).

The incorporation of the ensemble error covariance into the variational method introduces a new form for the square root **U** of **B**, which is no longer a square full-rank matrix. Under the variational schemes preconditioned on the square root of **B** (**B**^{1/2} preconditioning), the matrix **U** is used as the preconditioner in minimizing the cost function. This is equivalent to transforming the cost function from the model space to a higher-dimensional control space through the operator **U**. As a result, the background term in the control space cannot be written as a diagonal quadratic form, as it can when **U** is square and full rank. In fact, the background term contains the inverse of **B**, which cannot be simplified further.

The simplest way to tackle this problem is to define a cost function in the control space by imposing the diagonal form on the background term, rather than deriving the cost function from the original one in the model space. This approach was adopted implicitly in the work of Lorenc (2003) and Buehner (2005), and was followed by other authors using hybrid schemes with **B**^{1/2} preconditioning (Wang et al. 2008; Zhang and Zhang 2012; Clayton et al. 2013; Desroziers et al. 2014). These studies were justified by the proof in Wang et al. (2007, hereafter W07) showing the equivalence between the original and the proposed cost functions, in the sense that the point that minimizes the latter, when mapped to the model space, also minimizes the former. However, the reason why the minimum points of the two cost functions coincide was not explained.

The inconsistency in the background terms between the transformed cost function and the cost function used in practice was first noticed by Ménétrier and Auligné (2015, hereafter MA15). The authors called the former the correct transformed cost function and the latter the practical transformed cost function. They distinguished three minimization problems in the hybrid method, corresponding to the original, the transformed, and the practical cost functions. Here the original cost function is the cost function defined in the model space. The authors found that the transformed and the practical cost functions are identical on the range of the transpose of **U** (Im**U**^{T}). Moreover, the practical cost function attains its minimum in this subspace. These two facts imply that the minimization problems with the transformed and practical cost functions, when restricted to Im**U**^{T}, are totally equivalent.

This clearly elaborated on the research of W07. W07 showed that the transformed and practical cost functions are identical at the minimum point of the practical cost function. However, the variation of these two cost functions about this point was not examined, although understanding this variation is important since in practice only an approximation of the minimum point is attainable. This was examined in MA15, where the two cost functions were shown to be identical not only at this point but over all of Im**U**^{T}. Since the practical cost function differs from the transformed one outside Im**U**^{T}, these authors argued that any minimizer used for the practical cost function should be restricted to Im**U**^{T}.

Why can the analysis be obtained from the practical cost function even though this cost function differs from the correct transformed one? In this paper we show that it is not incidental that the practical cost function and the transformed cost function are identical on Im**U**^{T}. This coincidence suggests that something more fundamental is hidden beneath the surface. In fact, the so-called practical cost function is a special case of a class of cost functions that come into play when the minimization problem is transformed from the model space to a higher-dimensional space. That is to say, these cost functions are related to the extra dimensions when the control space has more dimensions than the model space, and they are as valid as the transformed cost function to work with in the control space. MA15 concluded that any minimizer used for the practical cost function should be restricted to Im**U**^{T} because the authors considered the practical cost function an ad hoc cost function used in practice to avoid the inverse of **B**.

The mathematical formulation behind the new class of cost functions is established in section 2. The key object here is the kernel of **U** (Ker**U**). A simple example is also presented to clarify the theory. The properties of these cost functions and of the transformed one are investigated in section 3. It turns out that the transformed cost function is only defined on the orthogonal complement (Ker**U**)^{⊥} of Ker**U** and is always less than or equal to these cost functions. This fact is demonstrated in section 4 using a real-observation data assimilation system. Section 5 reinterprets the new cost function as a natural extension of the transformed one from (Ker**U**)^{⊥} to the full control space and discusses its implications. After showing that the cost function in the alpha control variable method is a variant of the new one, conclusions are given.

## 2. Cost functions in control space

In this section, we adopt the same mathematical notation as MA15. The variational method finds the mode of the posterior distribution, leading to minimization of a cost function, which is represented in the incremental form (Courtier et al. 1994):

$$
J(\delta\mathbf{x}) = \frac{1}{2}\,\delta\mathbf{x}^{\mathrm{T}}\mathbf{B}^{-1}\delta\mathbf{x} + \frac{1}{2}\,[\mathbf{y} - H(\mathbf{x}_{b}) - \mathbf{H}\,\delta\mathbf{x}]^{\mathrm{T}}\mathbf{R}^{-1}[\mathbf{y} - H(\mathbf{x}_{b}) - \mathbf{H}\,\delta\mathbf{x}], \tag{1}
$$

where **x**_{b} ∈ ℝ^{n} is the first-guess value, δ**x** ∈ ℝ^{n} is the analysis increment, **y** ∈ ℝ^{m} denotes the observations, **B** ∈ ℝ^{n×n} is the background error covariance matrix, **R** ∈ ℝ^{m×m} is the observational error covariance matrix, *H*: ℝ^{n} → ℝ^{m} is a nonlinear operator, and **H**: ℝ^{n} → ℝ^{m} is the Jacobian of *H* evaluated at **x**_{b}. The operator *H* is the observation operator in the three-dimensional variational method and the composition of the observation operator with the model in the four-dimensional variational method. We assume that **B** and **R** are full-rank matrices so that their inverses exist. To find the analysis increment, we find the zero of the gradient of this cost function:

$$
\nabla J_{x} = \mathbf{B}^{-1}\delta\mathbf{x} + \mathbf{H}^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\,\delta\mathbf{x} - \mathbf{d}). \tag{2}
$$

Here, for compactness, we denote the innovation vector **y** − *H*(**x**_{b}) by **d**.

In the hybrid variational–ensemble method, the background error covariance matrix is a linear combination of the climatological error covariance matrix **B**_{c} and the ensemble error covariance matrix **P**_{e}:

$$
\mathbf{B} = \beta_{c}^{2}\,\mathbf{B}_{c} + \beta_{e}^{2}\,(\mathbf{L}\circ\mathbf{P}_{e}), \tag{3}
$$

where β_{c}^{2} and β_{e}^{2} are the weights assigned to **B**_{c} and **P**_{e}, respectively; **L** is the localization matrix; and the symbol ∘ stands for the Hadamard product. The ensemble error covariance matrix is calculated from the forecast ensemble perturbations Δ**x**_{k}:

$$
\mathbf{P}_{e} = \sum_{k=1}^{K}\Delta\mathbf{x}_{k}\,\Delta\mathbf{x}_{k}^{\mathrm{T}}, \qquad \Delta\mathbf{x}_{k} = \frac{\mathbf{x}_{k} - \bar{\mathbf{x}}}{\sqrt{K-1}}. \tag{4}
$$

In Eq. (4), **x**_{k} is the forecast of member *k*, $\bar{\mathbf{x}}$ is the forecast ensemble mean, and *K* denotes the number of ensemble members. Equation (4) shows that the *K* vectors Δ**x**_{k} form the column vectors of the square root of **P**_{e}.
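The construction in Eq. (4) is easy to check numerically. The following sketch uses arbitrary toy dimensions and random data standing in for an ensemble; it builds **P**_{e} from the normalized perturbations and verifies it against the usual sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5, 10                          # model dimension, ensemble size (toy values)

X = rng.standard_normal((n, K))       # hypothetical ensemble forecasts x_k
x_mean = X.mean(axis=1, keepdims=True)

# Normalized perturbations: the K columns of a square root of P_e.
dX = (X - x_mean) / np.sqrt(K - 1)

# P_e assembled from its square root, Eq. (4) ...
P_e = dX @ dX.T
# ... agrees with the sample covariance of the ensemble.
assert np.allclose(P_e, np.cov(X))
```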

Because of the limited number of ensemble members, the localization matrix **L** is used to remove the resulting sampling noise between points at long distances. To find an appropriate form for the square root of the localized matrix **L** ∘ **P**_{e}, we use the square root of **L** and the following property of the Hadamard product:

$$
\mathbf{L}\circ\mathbf{P}_{e} = \sum_{i=1}^{n}\sum_{k=1}^{K}(\mathbf{l}_{i}\circ\Delta\mathbf{x}_{k})(\mathbf{l}_{i}\circ\Delta\mathbf{x}_{k})^{\mathrm{T}}, \tag{5}
$$

where the vectors **l**_{i} are the column vectors of the square root of **L**. Denoting by **b**_{c,i} the *i*th column vector of the square root of **B**_{c}, the square root of **B** has the following form:

$$
\mathbf{B}^{1/2} = [\beta_{c}\mathbf{b}_{c,1}, \ldots, \beta_{c}\mathbf{b}_{c,n}, \beta_{e}(\mathbf{l}_{1}\circ\Delta\mathbf{x}_{1}), \ldots, \beta_{e}(\mathbf{l}_{n}\circ\Delta\mathbf{x}_{K})]. \tag{6}
$$

This shows that the set of column vectors **b**_{c,i} and **l**_{i} ∘ Δ**x**_{k} forms a square root of **B**. Assuming that **L** is a full-rank square matrix, the number of vectors **l**_{i} is *n*, leading to *n* × *K* vectors of the form **l**_{i} ∘ Δ**x**_{k}. Combined with the *n* vectors **b**_{c,i}, the square root of **B** is therefore formed by *p* = *n* × (*K* + 1) vectors. Denoting this matrix by **U**, **U** is a linear map from ℝ^{p} to ℝ^{n}. Thus, because of the use of localization, the dimension of the control space is increased by a factor of the number of ensemble members (this factor can be reduced depending on the specification of the localization matrix) compared to that of the model space.
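The Hadamard identity in Eq. (5) underlying this square root can likewise be verified numerically. In the sketch below, the localization matrix and perturbations are arbitrary toy data; the matrix whose *n* × *K* columns are **l**_{i} ∘ Δ**x**_{k} is checked to reproduce **L** ∘ **P**_{e}:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 6, 4                                   # toy dimensions

dX = rng.standard_normal((n, K))              # perturbation columns of P_e^{1/2}
P_e = dX @ dX.T

S = rng.standard_normal((n, n))               # a square root L^{1/2} of a toy
L = S @ S.T                                   # symmetric localization matrix

# Columns l_i ∘ dx_k of the square root of the localized covariance.
cols = np.column_stack([S[:, i] * dX[:, k] for i in range(n) for k in range(K)])

# Eq. (5): the n*K columns form a square root of L ∘ P_e (Hadamard product).
assert np.allclose(cols @ cols.T, L * P_e)
```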

We use this matrix as a preconditioner for the minimization of the cost function *J*:

$$
\delta\mathbf{x} = \mathbf{U}\mathbf{v} = \beta_{c}\sum_{i=1}^{n}\upsilon_{i}\,\mathbf{b}_{c,i} + \beta_{e}\sum_{j=1}^{n}\sum_{k=1}^{K}\upsilon_{j,k}\,(\mathbf{l}_{j}\circ\Delta\mathbf{x}_{k}). \tag{7}
$$

Equation (7) shows that the analysis increment is expressed in terms of the column vectors of the square root of **B**, and υ_{i} and υ_{j,k} are the components of the analysis increment in the span of the column vectors of **U**, which form the control vector **v** ∈ ℝ^{p}. Since the number of column vectors is greater than their dimension, this is a redundant representation, which is manifested in the nonzero dimension of the kernel of **U** (Ker**U**). In the control space, the cost function in Eq. (1) becomes

$$
J(\mathbf{v}) = \frac{1}{2}\,\mathbf{v}^{\mathrm{T}}\mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}\mathbf{v} + \frac{1}{2}\,(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d}). \tag{8}
$$

We call this the transformed cost function. Unlike the case where **U** is a square full-rank matrix, the inverse of **B** = **UU**^{T} still appears in the background term, and this term cannot be simplified to the diagonal quadratic form (1/2)**v**^{T}**v**.

To overcome this problem, our approach is to replace the gradient vector in Eq. (2) by another vector that has the same zero point under an appropriate transformation. Applying the operator **B** to both sides of Eq. (2), we have

$$
\mathbf{B}\nabla J_{x} = \delta\mathbf{x} + \mathbf{B}\mathbf{H}^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\,\delta\mathbf{x} - \mathbf{d}). \tag{9}
$$

Since **B** is full rank, its kernel contains only the zero vector. Therefore, **B**∇*J*_{x} = **0** entails ∇*J*_{x} = **0**. In other words, if δ**x** is the zero of **B**∇*J*_{x}, it is also the zero of ∇*J*_{x}. That means that to solve for the analysis increment, we may find the zero of **B**∇*J*_{x}.

Equation (9) can be further rewritten with the help of the square root of **B**, by substituting δ**x** = **U****v** and **B** = **UU**^{T}:

$$
\mathbf{B}\nabla J_{x} = \mathbf{U}\,[\mathbf{v} + \mathbf{U}^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d})]. \tag{10}
$$

Denoting the term in brackets on the right-hand side by **g**, Eq. (10) shows that if we can find **v** such that **g** is in Ker**U**, that is, **U****g** = **0**, the vector δ**x** = **U****v** will be the zero of ∇*J*_{x}. Note that, due to the presence of Ker**U**, the two equations ∇*J*_{x} = **0** and **g** = **0** are not equivalent, in the sense that if δ**x** satisfies the former, the corresponding **v** such that δ**x** = **U****v** does not necessarily satisfy the latter. Thus, given an arbitrary vector **v**_{0} ∈ Ker**U**, to find the zero of ∇*J*_{x}, we find the zero of **g** − **v**_{0}:

$$
\mathbf{v} - \mathbf{v}_{0} + \mathbf{U}^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d}) = \mathbf{0}. \tag{11}
$$

This expression suggests that the solution **v** can be obtained by minimization of the following cost function:

$$
\tilde{J}(\mathbf{v}) = \frac{1}{2}\,\mathbf{v}^{\mathrm{T}}\mathbf{v} + \frac{1}{2}\,(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d}) - \mathbf{v}_{0}^{\mathrm{T}}\mathbf{v}, \tag{12}
$$

or equivalently, by absorbing the last term into the first term in Eq. (12), a form similar to the cost functions in Eqs. (1) and (8) can be obtained:

$$
\tilde{J}(\mathbf{v}) = \frac{1}{2}\,(\mathbf{v} - \mathbf{v}_{0})^{\mathrm{T}}(\mathbf{v} - \mathbf{v}_{0}) + \frac{1}{2}\,(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{v} - \mathbf{d}). \tag{13}
$$

Note that the constant term (1/2)**v**_{0}^{T}**v**_{0} is added in Eq. (13) compared with Eq. (12). We call this the extended cost function, a name that describes its nature as examined in section 5.

If **U** is a nonsingular square matrix, Ker**U** contains only the zero vector and the cost function in Eq. (13) with **v**_{0} = **0** is identical to the transformed cost function in Eq. (8). However, when Ker**U** is nontrivial, containing vectors besides the zero vector, Eq. (13) introduces a new kind of cost function in the control space. In fact, even in the case **v**_{0} = **0**, this cost function cannot be identified with the transformed cost function. This reveals a new fact involving the extra dimensions when we lift the original cost function from the model space to the higher-dimensional control space. That is, each vector in Ker**U** induces a cost function that, when minimized in the control space, yields the same solution in the model space after being transformed by **U**. Thus, the number of such cost functions is infinite, corresponding to the number of vectors in Ker**U**.
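This family of cost functions can be illustrated with a small numerical sketch (random toy operators with **R** = **I**; all sizes are arbitrary assumptions). Because the cost in Eq. (13) is quadratic, its minimum is the solution of a linear system, and two different choices of **v**_{0} ∈ Ker**U** yield different control-space minima but the same model-space increment:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 3, 7, 2                     # model, control, observation dimensions
U = rng.standard_normal((n, p))       # full row rank with probability 1
A = rng.standard_normal((m, n)) @ U   # A = HU
d = rng.standard_normal(m)

def minimum(v0):
    # Zero of the gradient (v - v0) + A^T (A v - d) of Eq. (13) with R = I.
    return np.linalg.solve(np.eye(p) + A.T @ A, v0 + A.T @ d)

# v0 = 0 (the practical cost) versus a random nonzero vector in Ker U.
ker = np.linalg.svd(U)[2][n:].T       # orthonormal basis of Ker U
v0 = ker @ rng.standard_normal(p - n)

v_a, v_b = minimum(np.zeros(p)), minimum(v0)
assert not np.allclose(v_a, v_b)      # different minima in control space...
assert np.allclose(U @ v_a, U @ v_b)  # ...but the same increment U v
```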

In particular, when **v**_{0} = **0**, Eq. (13) gives the cost function used in all hybrid schemes preconditioned on the square root of **B**, which MA15 called the practical cost function. The mathematical treatment here shows that it is legitimate to use this cost function to find the solution in the control space. Although the theory does not give preferential treatment to any of these cost functions, the absence of the inverse of **B** in Eq. (13) favors this cost function over the transformed cost function in practice. We retain the term "practical" for this cost function to be consistent with previous studies.

All the mathematical treatment so far assumes that **B** is a full-rank matrix. If **B** is singular, we use the Moore–Penrose pseudoinverse **B**^{+} of **B** to redefine the cost function in Eq. (1):

$$
J(\delta\mathbf{x}) = \frac{1}{2}\,\delta\mathbf{x}^{\mathrm{T}}\mathbf{B}^{+}\delta\mathbf{x} + \frac{1}{2}\,(\mathbf{H}\,\delta\mathbf{x} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\,\delta\mathbf{x} - \mathbf{d}). \tag{14}
$$

The minimization should also be confined to the range of **B**, δ**x** ∈ Im**B**, to have a well-defined minimization problem. The problem thus becomes a constrained minimization problem. To get rid of the constraint δ**x** ∈ Im**B**, we substitute **B****u** for δ**x**, with **u** ∈ ℝ^{n}, and redefine the cost function in terms of **u**:

$$
J(\mathbf{u}) = \frac{1}{2}\,(\mathbf{B}\mathbf{u})^{\mathrm{T}}\mathbf{B}^{+}(\mathbf{B}\mathbf{u}) + \frac{1}{2}\,(\mathbf{H}\mathbf{B}\mathbf{u} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{B}\mathbf{u} - \mathbf{d}). \tag{15}
$$

Noting that **B** is symmetric and **BB**^{+} is the orthogonal projection onto Im**B**, the cost function and its gradient take the following forms:

$$
J(\mathbf{u}) = \frac{1}{2}\,\mathbf{u}^{\mathrm{T}}\mathbf{B}\mathbf{u} + \frac{1}{2}\,(\mathbf{H}\mathbf{B}\mathbf{u} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{B}\mathbf{u} - \mathbf{d}), \qquad \nabla J_{u} = \mathbf{B}\mathbf{u} + \mathbf{B}\mathbf{H}^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{B}\mathbf{u} - \mathbf{d}). \tag{16}
$$

Comparing ∇*J*_{u} in Eq. (16) with **B**∇*J*_{x} in Eq. (9), it is easy to see that the two gradients are identical under the substitution δ**x** = **B****u**. We can then proceed as before and derive a cost function similar to Eq. (13). A subtle difference is that if we use Eq. (9) for the singular case, δ**x** is required to be in Im**B**. It is interesting that the singular case points out that **B**∇*J*_{x} is more fundamental than ∇*J*_{x} in defining the minimum solution.

A simple example helps to further clarify the theory. Suppose that *n* = 2 and we want to minimize the following cost function:

$$
J_{0}(x, y) = \frac{x^{2}}{6} + \frac{y^{2}}{4} + \frac{1}{2}\,(x + y - 1)^{2}, \tag{17}
$$

where *x* and *y* are scalars. It is clear from this cost function that *m* = 1, **H** = (1 1), **R** = 1, **d** = 1, and **B** has the diagonal form

$$
\mathbf{B} = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}. \tag{18}
$$

This cost function has the minimum value min(*J*_{0}) = 1/12 at the point **x**_{0} = (1/2, 1/3).

The matrix **B** can be decomposed as

$$
\mathbf{B} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}^{\mathrm{T}} + \begin{pmatrix} -1 \\ 1 \end{pmatrix}\begin{pmatrix} -1 \\ 1 \end{pmatrix}^{\mathrm{T}} + \begin{pmatrix} -1 \\ -1 \end{pmatrix}\begin{pmatrix} -1 \\ -1 \end{pmatrix}^{\mathrm{T}}. \tag{19}
$$

This implies that the following 2 × 3 matrix can be used as one of the square roots of **B**:

$$
\mathbf{U} = \begin{pmatrix} 1 & -1 & -1 \\ 0 & 1 & -1 \end{pmatrix}. \tag{20}
$$

We express any vector in the two-dimensional *x*–*y* space in terms of the three column vectors of **U**:

$$
\begin{pmatrix} x \\ y \end{pmatrix} = p\begin{pmatrix} 1 \\ 0 \end{pmatrix} + q\begin{pmatrix} -1 \\ 1 \end{pmatrix} + r\begin{pmatrix} -1 \\ -1 \end{pmatrix}. \tag{21}
$$

Hence, the three scalars (*p*, *q*, *r*) form a new three-dimensional *p*–*q*–*r* space, the control space.

After applying Eq. (8), the transformed cost function is

$$
J(\mathbf{v}) = \frac{(p - q - r)^{2}}{6} + \frac{(q - r)^{2}}{4} + \frac{1}{2}\,(p - 2r - 1)^{2}.
$$

Using the conjugate gradient algorithm with the initial guess (0, 0, 0), this cost function attains the minimum value 1/12 at the point **v**_{11} = (1/6, 0, −1/3). However, if started with the initial guess (1, 1, 1), the minimum value is still 1/12 but the minimum point is **v**_{12} = (3/2, 2/3, 1/3). Although *J*(**v**) has many minimum points in the *p*–*q*–*r* space, depending on the initial guess, all these points are mapped to the same point **x**_{0} in the *x*–*y* space. The fact that the transformed cost function attains the same minimum at different points will be addressed in section 3.

After applying Eq. (13) with **v**_{0} = **0**, the practical cost function is obtained:

$$
\tilde{J}(\mathbf{v}) = \frac{1}{2}\,(p^{2} + q^{2} + r^{2}) + \frac{1}{2}\,(p - 2r - 1)^{2}.
$$

It attains its minimum value at the point **v**_{2} = (1/6, 0, −1/3). This minimum point is unique; therefore, it does not depend on the initial guess as in the case of *J*(**v**). When mapped back to the *x*–*y* space, we again obtain the minimum point **x**_{0} of *J*_{0}.

The more interesting case of the cost function in Eq. (13) is when **v**_{0} is nontrivial. From the definition of **U**, it is simple to verify that

$$
\mathbf{U}\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = \mathbf{0}.
$$

That means the vector **v**_{0} = (2 1 1)^{T} ∈ Ker**U**. In fact, this vector spans Ker**U**, since the dimension of Ker**U** is 1. Using this vector, Eq. (13) provides another cost function:

$$
\tilde{J}(\mathbf{v}) = \frac{1}{2}\,[(p - 2)^{2} + (q - 1)^{2} + (r - 1)^{2}] + \frac{1}{2}\,(p - 2r - 1)^{2}.
$$

The conjugate gradient algorithm gives the minimum at the point **v**_{3} = (13/6, 1, 2/3). Again, **U** maps this point to **x**_{0} in the *x*–*y* plane.
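The whole example can be reproduced numerically. One caveat: the stated facts (the kernel vector, the minimum points, and **B**) determine the square root only up to these constraints, so the matrix below is an assumed reconstruction consistent with all of them. Since the costs are quadratic, their minima are obtained by solving the linear gradient equations directly:

```python
import numpy as np

# Assumed 2x3 square root: U U^T = diag(3, 2), Ker U spanned by (2, 1, 1)^T.
U = np.array([[1., -1., -1.],
              [0.,  1., -1.]])
A = np.array([1., 0., -2.])           # HU with H = (1 1)
d = 1.0
v0 = np.array([2., 1., 1.])
assert np.allclose(U @ v0, 0.0)       # v0 lies in Ker U

def minimum(v0):
    # Zero of the gradient (v - v0) + A^T (A v - d) of the extended cost, Eq. (13).
    return np.linalg.solve(np.eye(3) + np.outer(A, A), v0 + A * d)

v2 = minimum(np.zeros(3))             # practical cost, v0 = 0
v3 = minimum(v0)                      # extended cost with the nonzero kernel vector

assert np.allclose(v2, [1/6, 0, -1/3])
assert np.allclose(v3, [13/6, 1, 2/3])
assert np.allclose(U @ v2, [1/2, 1/3])    # both map to x0 = (1/2, 1/3)
assert np.allclose(U @ v3, U @ v2)
```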

## 3. Properties of cost functions in control space

Since the observation terms in the cost functions in Eqs. (8) and (13) are identical, to understand the behavior of these cost functions we only need to investigate the background terms. Here the operator **U**^{T}(**UU**^{T})^{−1}**U** plays a key role. We prove that **U**^{T}(**UU**^{T})^{−1}**U** is the orthogonal projection operator onto the orthogonal complement of Ker**U**, which is also Im**U**^{T}. It is symmetric and idempotent:

$$
[\mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}]^{\mathrm{T}} = \mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}, \qquad [\mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}]^{2} = \mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}.
$$

Thus, **U**^{T}(**UU**^{T})^{−1}**U** is confirmed to be an orthogonal projection operator. Now consider the action of **U**^{T}(**UU**^{T})^{−1}**U** on a vector **v** in the control space. Writing **v** = **v**_{1} + **v**_{2} with **v**_{1} ∈ Ker**U** and **v**_{2} = **U**^{T}**z** ∈ Im**U**^{T},

$$
\mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}\mathbf{v} = \mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}(\mathbf{v}_{1} + \mathbf{U}^{\mathrm{T}}\mathbf{z}) = \mathbf{U}^{\mathrm{T}}\mathbf{z} = \mathbf{v}_{2}.
$$

Since the decomposition of a given vector **v** into the sum of two vectors in Ker**U** and (Ker**U**)^{⊥} is unique, **U**^{T}(**UU**^{T})^{−1}**U** is the orthogonal projection onto (Ker**U**)^{⊥}. Moreover, it can be shown that **U**^{T}(**UU**^{T})^{−1} is the Moore–Penrose pseudoinverse of **U**.
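These projection properties are easy to confirm numerically for a random full-row-rank **U** (toy sizes assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 4, 9
U = rng.standard_normal((n, p))            # full row rank with probability 1

# P = U^T (U U^T)^{-1} U, the claimed orthogonal projection onto Im U^T.
P = U.T @ np.linalg.solve(U @ U.T, U)

assert np.allclose(P, P.T)                 # symmetric
assert np.allclose(P @ P, P)               # idempotent
assert np.allclose(P @ U.T, U.T)           # identity on Im U^T
ker = np.linalg.svd(U)[2][n:].T            # orthonormal basis of Ker U
assert np.allclose(P @ ker, 0.0)           # annihilates Ker U
# U^T (U U^T)^{-1} is the Moore-Penrose pseudoinverse of U.
assert np.allclose(U.T @ np.linalg.inv(U @ U.T), np.linalg.pinv(U))
```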

Given the fact that **U**^{T}(**UU**^{T})^{−1}**U** is the orthogonal projection onto (Ker**U**)^{⊥}, the transformed background term can be simplified as

$$
\frac{1}{2}\,\mathbf{v}^{\mathrm{T}}\mathbf{U}^{\mathrm{T}}(\mathbf{U}\mathbf{U}^{\mathrm{T}})^{-1}\mathbf{U}\mathbf{v} = \frac{1}{2}\,(\mathbf{v}_{1} + \mathbf{v}_{2})^{\mathrm{T}}\mathbf{v}_{2} = \frac{1}{2}\,\mathbf{v}_{2}^{\mathrm{T}}\mathbf{v}_{2}. \tag{22}
$$

Here we decompose **v** into its components **v**_{1} ∈ Ker**U** and **v**_{2} ∈ (Ker**U**)^{⊥}. The transformed observation term can also be rewritten in terms of **v**_{2}:

$$
J_{o}(\mathbf{v}) = \frac{1}{2}\,(\mathbf{H}\mathbf{U}\mathbf{v}_{2} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{v}_{2} - \mathbf{d}). \tag{23}
$$

Equations (22) and (23) show that the transformed cost function depends only on the projection **v**_{2} of **v** onto (Ker**U**)^{⊥}:

$$
J(\mathbf{v}) = \frac{1}{2}\,\mathbf{v}_{2}^{\mathrm{T}}\mathbf{v}_{2} + \frac{1}{2}\,(\mathbf{H}\mathbf{U}\mathbf{v}_{2} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{v}_{2} - \mathbf{d}) = J(\mathbf{v}_{2}). \tag{24}
$$

Let **v**_{2} ∈ (Ker**U**)^{⊥} be a fixed vector; *J*(**v**) is then invariant under the addition of any vector **v**_{1} ∈ Ker**U**. That means the transformed cost function is constant on any set comprising a specific vector **v**_{2} in (Ker**U**)^{⊥} and all vectors in Ker**U**, namely, the affine subspace **v**_{2} + Ker**U**. Consequently, *J*(**v**) attains its minimum not at one point but on a specific affine subspace **v**_{2}^{*} + Ker**U**, where **v**_{2}^{*} is the minimum point of *J*(**v**) in (Ker**U**)^{⊥}. Thus, in minimization we can obtain different minimum points depending on the initial guess, as illustrated in the example at the end of section 2.

The background term in the extended cost function in Eq. (13) can also be decomposed in terms of components in Ker**U** and (Ker**U**)^{⊥}:

$$
\frac{1}{2}\,(\mathbf{v} - \mathbf{v}_{0})^{\mathrm{T}}(\mathbf{v} - \mathbf{v}_{0}) = \frac{1}{2}\,(\mathbf{v}_{1} - \mathbf{v}_{0})^{\mathrm{T}}(\mathbf{v}_{1} - \mathbf{v}_{0}) + \frac{1}{2}\,\mathbf{v}_{2}^{\mathrm{T}}\mathbf{v}_{2}. \tag{25}
$$

The corresponding observation term $\tilde{J}_{o}(\mathbf{v})$ has the same form as *J*_{o}(**v**) in Eq. (23). Therefore, in terms of **v**_{1} and **v**_{2} the extended cost function has the following form:

$$
\tilde{J}(\mathbf{v}) = \frac{1}{2}\,(\mathbf{v}_{1} - \mathbf{v}_{0})^{\mathrm{T}}(\mathbf{v}_{1} - \mathbf{v}_{0}) + J(\mathbf{v}_{2}). \tag{26}
$$

From this decomposed form an important inequality can be deduced:

$$
\tilde{J}(\mathbf{v}) \geq J(\mathbf{v}_{2}) = J(\mathbf{v}). \tag{27}
$$

Equality occurs if and only if **v**_{1} = **v**_{0}. That means $\tilde{J}(\mathbf{v})$ can only attain its minimum at a point **v*** whose projection onto Ker**U** is **v**_{0}: for any **v*** whose projection **v**_{1}^{*} differs from **v**_{0}, replacing **v**_{1}^{*} by **v**_{0} yields another point at which $\tilde{J}(\mathbf{v})$ is smaller. Therefore, the minimum point of $\tilde{J}(\mathbf{v})$ always lies on the affine subspace **v**_{0} + (Ker**U**)^{⊥}. In the example in section 2, the component in Ker**U** of the minimum point **v**_{3} = (13/6, 1, 2/3)^{T} of the extended cost function is exactly **v**_{0} = (2 1 1)^{T}.

The inequality in Eq. (27) also reveals another interesting fact involving the two cost functions: the extended cost function is always greater than or equal to the transformed one. The difference is exactly the diagonal quadratic term evaluated on Ker**U**, (1/2)(**v**_{1} − **v**_{0})^{T}(**v**_{1} − **v**_{0}). When **v**_{1} = **v**_{0}, the difference vanishes and the two cost functions are identical on the affine subspace **v**_{0} + (Ker**U**)^{⊥}. This identity was proven by MA15 for the special case **v**_{0} = **0**; however, the relation of the two cost functions outside (Ker**U**)^{⊥} had not been examined.

Since the minimum point of $\tilde{J}(\mathbf{v})$ lies on the affine subspace **v**_{0} + (Ker**U**)^{⊥}, one of the minimum points of *J*(**v**) must lie there as well. As shown by Eq. (24), any minimum point of *J*(**v**) must have the form **v**_{1} + **v**_{2}^{*}, where **v**_{1} is an arbitrary vector in Ker**U**. That means the common minimum point of both $\tilde{J}(\mathbf{v})$ and *J*(**v**) must be **v**_{0} + **v**_{2}^{*}. In other words, the minimum points of the extended cost function, as **v**_{0} varies, make up the minimum plane **v**_{2}^{*} + Ker**U** of the transformed cost function. Thus, in minimizing $\tilde{J}(\mathbf{v})$ we also obtain one of the minimum points of *J*(**v**). To illustrate, we return to the example in section 2. It is easy to check that the minimum point **v**_{3} of the extended cost function is another minimum point of the transformed cost function besides the two points **v**_{11} and **v**_{12}. In fact, **v**_{11} is identified with **v**_{2}^{*}, and **v**_{12} and **v**_{3} can be expressed as (2/3)**v**_{0} + **v**_{11} and **v**_{0} + **v**_{11}, respectively.

In the special case **v**_{0} = **0** we have the practical cost function, and the affine subspace **v**_{0} + (Ker**U**)^{⊥} becomes (Ker**U**)^{⊥}. MA15 stated that it is coincidental that the transformed and practical cost functions are identical on (Ker**U**)^{⊥} and that the practical cost function has a minimum on (Ker**U**)^{⊥}. We have shown here that these two facts are in fact closely related to each other. In addition, the minimization of $\tilde{J}(\mathbf{v})$ need not be restricted to (Ker**U**)^{⊥}, since the practical cost function is as legitimate as the transformed one in the control space.

## 4. Real observation experiments

Desroziers et al. (2014) described four different practical implementations of the hybrid variational–ensemble method, according to the space in which the cost function is represented. In their work, the cost function in Eq. (13) with **v**_{0} = **0** is used in algorithm 1, which corresponds to minimizing in the column space of **U**, while the original cost function in Eq. (1) is used in algorithm 3, which corresponds to minimizing in the model space. This description can lead to the misinterpretation that the practical cost function is the form the original cost function takes in the control space. In fact, the correct transformed cost function is the one in Eq. (8), not the practical cost function. That means the two algorithms work with two different cost functions, which are equivalent only when the practical cost function is restricted to (Ker**U**)^{⊥}. Note that this causes no problem, since the theory in section 2 supports the validity of the practical one.

In section 3, we showed that $\tilde{J}(\mathbf{v})$ ≥ *J*(**v**). This entails that the extended cost function is always greater than or equal to the original one: $\tilde{J}(\mathbf{v})$ ≥ *J*(δ**x** = **U****v**). This fact is difficult to verify in practice, since **B**^{−1} is required to calculate the value of *J*(δ**x**). However, at their common minimum point, $\tilde{J}(\mathbf{v})$ and *J*(**v**) are identical; therefore, $\tilde{J}(\mathbf{v})$ and the original cost function *J*(δ**x**) should converge to the same value in their minimization processes, and this can be verified in practice. In this section, we demonstrate this using a data assimilation system with real observations. Before doing so, however, we address two practical issues with the cost function in Eq. (13).

An unpleasant feature of the extended cost function in practice is the presence of the vector **v**_{0} in its definition, since it is not easy to find a vector **v**_{0} ≠ **0** in Ker**U**. However, **v**_{0} is not needed if we minimize the cost function in Eq. (13) in terms of the new variable **w** = **v** − **v**_{0}:

$$
\tilde{J}(\mathbf{w}) = \frac{1}{2}\,\mathbf{w}^{\mathrm{T}}\mathbf{w} + \frac{1}{2}\,(\mathbf{H}\mathbf{U}\mathbf{w} - \mathbf{d})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{H}\mathbf{U}\mathbf{w} - \mathbf{d}), \tag{28}
$$

where we have relied on the fact that **U****v** = **U**(**w** + **v**_{0}) = **U****w** to rewrite the observation term. The approximation of the minimum point **w*** of the cost function in Eq. (28) can then be obtained using any minimization algorithm. Since our objective is the analysis increment δ**x*** and not **v***, it can be calculated directly from **w***:

$$
\delta\mathbf{x}^{*} = \mathbf{U}\mathbf{w}^{*}. \tag{29}
$$

Also, in terms of **w**, the inequality $\tilde{J}(\mathbf{v})$ ≥ *J*(δ**x** = **U****v**) becomes $\tilde{J}(\mathbf{w})$ ≥ *J*(δ**x** = **U****w**). It is worth noting that the cost function in Eq. (28) is identical in form to the cost function in Eq. (13) with the trivial vector **v**_{0} = **0**, namely, the practical cost function.
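As a numerical sanity check (toy random operators, assumed sizes), the minimum of the practical form in Eq. (28), mapped through **U** as in Eq. (29), coincides with the increment obtained by solving the original problem in the model space:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, m = 3, 8, 2
U = rng.standard_normal((n, p))
B = U @ U.T
H = rng.standard_normal((m, n))
R = 0.5 * np.eye(m)
d = rng.standard_normal(m)
A = H @ U

# Minimum of Eq. (28): w* solves (I + A^T R^{-1} A) w = A^T R^{-1} d.
w_star = np.linalg.solve(np.eye(p) + A.T @ np.linalg.solve(R, A),
                         A.T @ np.linalg.solve(R, d))
dx_control = U @ w_star                   # Eq. (29)

# Model-space zero of Eq. (2): (B^{-1} + H^T R^{-1} H) dx = H^T R^{-1} d.
dx_model = np.linalg.solve(np.linalg.inv(B) + H.T @ np.linalg.solve(R, H),
                           H.T @ np.linalg.solve(R, d))

assert np.allclose(dx_control, dx_model)
```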

The second issue in practice is that, to calculate the product between **U** and **w**, we would seemingly need the Hadamard product between each column of the square root of the localization matrix and each forecast ensemble perturbation, as represented in Eq. (7). This seemingly difficult computation can be accomplished using products between **L**^{1/2} and certain vectors deduced from **w**. To show this, we rewrite the second term on the right-hand side of Eq. (7), with **w** in place of **v**, for a fixed perturbation Δ**x**_{k} as

$$
\sum_{i=1}^{n} w_{i,k}\,(\mathbf{l}_{i}\circ\Delta\mathbf{x}_{k}) = \Delta\mathbf{x}_{k}\circ(\mathbf{L}^{1/2}\mathbf{w}_{k}), \tag{30}
$$

where for simplicity β_{e} is not included. In Eq. (30), **w**_{k} is a vector obtained from **w** by gathering all components of **w** associated with the perturbation Δ**x**_{k}. That means we have a decomposition of **w** into **w**_{0}, **w**_{1}, …, **w**_{K} according to the contributions to the analysis increment from the climatological part (the 0th vector) and the ensemble part (the first to *K*th vectors), respectively. While the vector **w**_{0} has the size of the dimension *n* of our minimization problem, the remaining vectors **w**_{k} have the same size as the number of columns of **L**^{1/2}.

The first term on the right-hand side of Eq. (7) can also be rewritten in a form similar to Eq. (30). If we factorize **B**_{c} into its error correlation matrix **C** and its standard deviations **Σ**, which is a diagonal matrix, **B**_{c} = **Σ****C****Σ**, we can use **Σ****C**^{1/2} as the square root of **B**_{c}, where **C**^{1/2} is the square root of **C**. With this square root of **B**_{c}, the contribution from the climatological part, with **w** in place of **v**, becomes

$$
\mathbf{B}_{c}^{1/2}\mathbf{w}_{0} = \boldsymbol{\Sigma}\mathbf{C}^{1/2}\mathbf{w}_{0} = \Delta\mathbf{x}_{0}\circ(\mathbf{C}^{1/2}\mathbf{w}_{0}), \tag{31}
$$

where again for simplicity β_{c} is not included, and Δ**x**_{0} is the vector representing the field of climatological standard deviations. In other words, we vectorize the diagonal matrix **Σ** to obtain the vector Δ**x**_{0}.

As a result, the more practical form of Eq. (7), with **w** in place of **v**, is

$$
\delta\mathbf{x} = \mathbf{U}\mathbf{w} = \beta_{c}\,\Delta\mathbf{x}_{0}\circ(\mathbf{C}^{1/2}\mathbf{w}_{0}) + \beta_{e}\sum_{k=1}^{K}\Delta\mathbf{x}_{k}\circ(\mathbf{L}^{1/2}\mathbf{w}_{k}). \tag{32}
$$

Similarly, for calculations involving the transpose of **U**, we have, for any **z** ∈ ℝ^{n},

$$
\mathbf{U}^{\mathrm{T}}\mathbf{z} = \begin{pmatrix} \beta_{c}\,(\mathbf{C}^{1/2})^{\mathrm{T}}(\Delta\mathbf{x}_{0}\circ\mathbf{z}) \\ \beta_{e}\,(\mathbf{L}^{1/2})^{\mathrm{T}}(\Delta\mathbf{x}_{1}\circ\mathbf{z}) \\ \vdots \\ \beta_{e}\,(\mathbf{L}^{1/2})^{\mathrm{T}}(\Delta\mathbf{x}_{K}\circ\mathbf{z}) \end{pmatrix}. \tag{33}
$$

Equations (32) and (33) show that modeling the operator **U** reduces to modeling the two operators **C**^{1/2} and **L**^{1/2}, which are very similar in nature.
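The identity in Eq. (30) that makes this computation cheap — one matrix–vector product with **L**^{1/2} followed by one Hadamard product, instead of *n* column-wise Hadamard products — can be checked directly on toy random data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
Lh = rng.standard_normal((n, n))     # a toy square root L^{1/2}
dx_k = rng.standard_normal(n)        # one ensemble perturbation
w_k = rng.standard_normal(n)         # the components of w tied to this member

# Left-hand side of Eq. (30): explicit sum over the columns l_i ∘ dx_k.
lhs = sum(w_k[i] * (Lh[:, i] * dx_k) for i in range(n))

# Right-hand side: one product with L^{1/2}, then one Hadamard product.
rhs = dx_k * (Lh @ w_k)

assert np.allclose(lhs, rhs)
```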

The data assimilation system we used here is a hybrid three-dimensional variational–ensemble data assimilation system (hybrid-3DVAR) built around the operational limited-area model, the Nonhydrostatic Mesoscale Model (NHM; Saito et al. 2006) of the Japan Meteorological Agency (JMA). Analysis perturbations were generated by a local ensemble transform Kalman filter (NHM-LETKF). Ensemble analyses were then created by adding these perturbations to the analysis from the 3DVAR program. Short-range ensemble forecasts initialized from these analyses provided the ensemble background error covariance for this variational program. The climatological background error covariance was estimated with the NMC method (Parrish and Derber 1992). The control variables were surface pressure, zonal and meridional wind components, potential temperature, and pseudorelative humidity, which is the ratio between specific humidity and saturated specific humidity. Boundary conditions were interpolated from forecasts of the JMA’s global model.

In NHM hybrid-3DVAR, both operators **C**^{1/2} and **L**^{1/2} were modeled as sequences of simple operators, which reduce to one-dimensional operators for the horizontal and vertical correlations. All parameters defining the shape of **C** were obtained from JMA's operational mesoscale analysis system. They were the horizontal correlation length scales estimated for each control variable at each vertical level, the vertical correlation matrix representing the vertical correlations for all vertical levels, and the climatological standard deviations. The horizontal localization length scales were calculated by dilating the corresponding climatological correlation length scales by a factor of 2. The same method was applied to the vertical climatological correlation matrix to derive the vertical localization matrix, using the absolute values of the vertical correlations.

To examine the behavior of the practical cost function in Eq. (28) and the original cost function in Eq. (1) near their minimum points, we need to implement two different minimization algorithms. For the case of $\tilde{J}(\mathbf{w})$, we have in fact employed the **B**^{1/2} preconditioning of *J*(δ**x**), and with its canonical quadratic form we can apply any standard minimization algorithm. In addition to the **B**^{1/2} preconditioning, the original cost function can also be preconditioned on **B** (Wang 2010; Kuhl et al. 2013), in which case the cost function need not be transformed. This preconditioner is mainly employed with minimization algorithms based on the conjugate gradient (CG) method, like the Lanczos version of the CG method (Gürol et al. 2014) or the biCG method (El Akkraoui et al. 2013). Therefore, we adopted the **B**-preconditioned Lanczos-based CG method of Gürol et al. (2014) to minimize *J*(δ**x**). This algorithm is similar to algorithm 3 in Desroziers et al. (2014), except that the Lanczos version of the CG method is used, so that the value of *J*(δ**x**) is easily calculated from the Lanczos vectors without the need to apply **B**^{−1}.

The CG method can also be applied to the practical cost function in Eq. (28), as in algorithm 1 of Desroziers et al. (2014). However, we did not choose the CG method for minimizing $\tilde{J}(\mathbf{w})$ in NHM hybrid-3DVAR. The reason is that if a gradient-based minimization algorithm like the CG method is used with an initial guess inside (Ker**U**)^{⊥}, which is usually the zero initial guess, all successive approximations are also inside (Ker**U**)^{⊥}, as proved by MA15. That means the minimization process is restricted to (Ker**U**)^{⊥}, where $\tilde{J}(\mathbf{w})$ = *J*(δ**x** = **U****w**), whereas we want to explore the full control space in the minimization process. Therefore, we adopted the limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) method. This quasi-Newton minimization algorithm approximates the Hessian, and vectors outside (Ker**U**)^{⊥} are introduced in the minimization process.
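The restriction proved by MA15 can be reproduced in a few lines: for the quadratic practical cost, a pure gradient method started from zero keeps every iterate inside Im**U**^{T} = (Ker**U**)^{⊥}, because the gradient at any such point lies in Im**U**^{T} itself. The sketch below uses toy random operators with **R** = **I** and plain gradient descent standing in for CG:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 3, 7, 2
U = rng.standard_normal((n, p))
A = rng.standard_normal((m, n)) @ U       # A = HU
d = rng.standard_normal(m)
ker = np.linalg.svd(U)[2][n:].T           # orthonormal basis of Ker U

def grad(w):
    # Gradient of the practical cost, Eq. (28), with R = I.
    return w + A.T @ (A @ w - d)

w = np.zeros(p)                           # the usual zero initial guess
step = 1.0 / (1.0 + np.linalg.norm(A, 2) ** 2)
for _ in range(200):
    w = w - step * grad(w)
    # Every iterate stays orthogonal to Ker U, i.e. inside (Ker U)^perp.
    assert np.allclose(ker.T @ w, 0.0)
```

A method such as LBFGS, started from a random guess with a component in Ker**U**, does not satisfy this orthogonality, which is why it was chosen here to explore the full control space.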

NHM hybrid-3DVAR was run for one assimilation cycle (3 h) over the same domain as the previously operational one at JMA, consisting of 241 × 193 × 40 grid points with 15-km horizontal grid spacing. There were 50 ensemble members. All routine observations retrieved from JMA's database were assimilated except radiance data. The observation errors were obtained from JMA's operational mesoscale analysis system. The same weight β^{2} = 0.5 was applied to both **B**_{c} and **L** ∘ **P**_{e}. Thus, the same background error covariance, the same observational errors, and the same observations were used in all experiments.

Figure 1 shows the values of the practical cost function in Eq. (28) and the original cost function as functions of the iteration step for the first 20 iterations. The initial guess in the CG method was the zero initial guess. To ensure that the full control space was explored, some random initial guesses with different magnitudes were also considered in the LBFGS method, besides the zero initial guess. Clearly, the values of $\tilde{J}(\mathbf{w})$ always converge to those of *J*(δ**x**) as the iteration step increases, regardless of the initial guess. This similarity near their minimum points explains why we tend to identify $\tilde{J}(\mathbf{w})$ with the transform of *J*(δ**x**) into the control space, although in fact the two are not equivalent.

The analysis increments resulting from the minimization processes in Fig. 1 are illustrated in Fig. 2 for the zonal wind field at model level 8. Although several initial guesses were tried in minimizing the practical cost function in Eq. (28), Fig. 2 only shows the analysis increments associated with the initial guesses of smallest and largest magnitude. As anticipated from the analysis in section 3, all three plots in Figs. 2a–c are indistinguishable, and this holds for other fields at other model levels as well. The difference between the analysis increments in Figs. 2c and 2a is plotted in Fig. 2d, showing that the differences are two to three orders of magnitude smaller than the analysis increments themselves. In contrast to the conclusion of MA15, this shows that minimization of the practical cost function need not be restricted to (Ker **U**)^{⊥} and any minimization algorithm can be applied.
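The insensitivity to the initial guess can be checked in the same toy setup (random wide **U**, identity observation operator; an assumed sketch, not the NHM system): the practical cost function has Hessian **I** + **U**^{T}**H**^{T}**H U**, which is strictly positive definite, so it has a unique minimizer **w*** and every initial guess yields the same increment δ**x** = **Uw***.

```python
import numpy as np

# Toy check that different initial guesses give the same analysis increment:
# the practical cost 0.5 w'w + 0.5 ||d - H U w||^2 is strictly convex, so
# gradient descent from any start converges to the same w*, hence the same
# increment dx = U w*.  (Assumed random setup, R = I.)
rng = np.random.default_rng(1)
n, p = 4, 9
U = rng.normal(size=(n, p))
H = np.eye(n)
d = rng.normal(size=n)

def minimize(w0, steps=2000, lr=0.02):
    w = w0.copy()
    for _ in range(steps):
        w -= lr * (w + U.T @ (H.T @ (H @ (U @ w) - d)))
    return w

dx_zero = U @ minimize(np.zeros(p))                 # zero initial guess
dx_rand = U @ minimize(10.0 * rng.normal(size=p))   # large random guess
print(np.linalg.norm(dx_zero - dx_rand))            # ~0: identical increments
```

This mirrors the behavior in Figs. 2a–c: the control vectors may differ during the iterations, but the model-space increments agree to numerical precision.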

## 5. Discussion and conclusions

In the derivation of the cost function in Eq. (13) in section 2, the crucial point is to solve for the zero of the gradient with respect to the control variable, ∇_{**v**}*J*, instead of the zero of the model-space gradient ∇_{δ**x**}*J*. Then the resulting cost function follows naturally from the theory. However, this derivation does not give much insight into the nature of this cost function and its relation to the transformed cost function in Eq. (8). Further investigation in section 3 provides a clue to understanding the nature of the extended cost function, when the transformed cost function is shown to be defined on (Ker **U**)^{⊥} as in Eq. (24). Another clue comes from the practical form in Eq. (28) of the extended cost function in section 4. These two clues suggest rewriting *J*(**v**) in terms of the new variable **w** = **v** − **v**_{0}. This can be done easily since the projection of **w** on (Ker **U**)^{⊥} is the same as the projection of **v** on (Ker **U**)^{⊥}:

Here, to avoid confusion with **w**_{2} in Eq. (30), the symbol **w**_{⊥} is used to denote the projection of **w** on (Ker **U**)^{⊥}. We also put *J̃*(**w**) beneath *J*(**w**) to highlight the relation between the two cost functions.

Equation (34) clearly shows that *J̃*(**w**) is in fact the natural extension of *J*(**w**) from (Ker **U**)^{⊥} to the full control space. This justifies calling *J̃*(**w**) the "extended cost function." The nature of *J̃*(**w**) helps explain all the properties of this cost function and its relation to *J*(**w**) analyzed in section 3. For example, when **w**_{||} ∈ Ker **U** is added to **w**_{⊥} in *J*(**w**), it leaves the observation term unaffected but at the same time introduces an additional quadratic term (1/2)**w**_{||}^{T}**w**_{||} into the background term. This implies that *J̃*(**w**) is always greater than or equal to *J*(**w**).
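The decomposition above can be verified directly in a small numerical sketch (an assumed toy setup with random **U** and identity observation operator): since **Uw**_{||} = 0, the observation term depends only on **w**_{⊥}, and the background term splits by orthogonality, giving *J̃*(**w**) = *J̃*(**w**_{⊥}) + (1/2)**w**_{||}^{T}**w**_{||}.

```python
import numpy as np

# Numerical check of the decomposition Jtilde(w_perp + w_par)
#   = Jtilde(w_perp) + 0.5 ||w_par||^2  >=  Jtilde(w_perp),
# where w_par lies in Ker U.  (Toy setup: random U, H = I, R = I.)
rng = np.random.default_rng(2)
n, p = 4, 9
U = rng.normal(size=(n, p))
H = np.eye(n)
d = rng.normal(size=n)

def Jtilde(w):                                   # practical cost function
    r = d - H @ (U @ w)
    return 0.5 * w @ w + 0.5 * r @ r

P_range = U.T @ np.linalg.solve(U @ U.T, U)      # projector onto (Ker U)^perp
w = rng.normal(size=p)
w_perp = P_range @ w                             # component in (Ker U)^perp
w_par = w - w_perp                               # component in Ker U

lhs = Jtilde(w)
rhs = Jtilde(w_perp) + 0.5 * w_par @ w_par
print(abs(lhs - rhs))                            # ~0: the identity holds
```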

The practical form in Eq. (28) also suggests an alternative form of the extended cost function. In Eq. (32), if we define new variables *α*_{0}, *α*_{1}, …, *α*_{K} as

Eq. (32) will become more compact in terms of *α*_{k}:

Like all entries of **w**_{k}, all entries of *α*_{k} are nondimensional and can be interpreted as weights for ensemble member *k*. In addition, all new vectors *α*_{k} have the same size as the dimension *n* of the model space. This differs from the case of **w**_{k}, since the size of **w**_{k} for *k* > 0 is, in general, not *n* but the number of columns of the corresponding square root matrix.

When all vectors *α*_{k} are combined into a single vector **α**, the transformation matrix between **w** and **α** is given in Eq. (37). In general, this transformation matrix is not square and therefore not invertible. This can be illustrated in the simplest case when no localization is employed: the localization matrix then has all entries equal to 1, its square root reduces to a single column of ones, and the transformation matrix becomes nonsquare. However, if the transformation matrix is assumed to be a full-rank square matrix, it is invertible. This enables **w** to be represented in terms of **α**, and the cost function in Eq. (28) to be rewritten in terms of **α** as in Eq. (38), where δ**x** is calculated from Eq. (36) and the matrix in the background term is the square of the transformation matrix. Because of the presence of this matrix, in practice the cost function is usually transformed back to the practical form in Eq. (28) rather than preconditioned with this matrix.

The cost function in Eq. (38) is the hybrid version of the alpha control variable method proposed by Lorenc (2003), which usually appears in the EnVAR method (Liu et al. 2008; Buehner et al. 2010; Lorenc et al. 2015). This is also the cost function used in algorithm 2 of Desroziers et al. (2014). W07 pointed out that the minimum point of the cost function in Eq. (38), when transformed back to the model space, is also the minimum point of the original cost function. However, the reason why these two different cost functions have the same minimum point has not been pursued. Here we have shown that this form of the extended cost function is indeed a variant of the practical form in Eq. (28) in the special case when the transformation matrix is a full-rank square matrix. The result that W07 proved can then be easily deduced from the properties of the extended cost function.

In the data assimilation literature on the hybrid variational–ensemble method, when applying the control variable transform δ**x** = **Uv**, it was taken for granted that the transformed background term has the canonical quadratic form (1/2)**v**^{T}**v**. However, this is far from the case if we write down the mathematical transformation of this term. In fact, the correct transformed term is (1/2)**v**^{T}**U**^{T}(**UU**^{T})^{−1}**Uv**, which can be identified with the canonical form only on a certain subspace of the control space, as was first recognized by MA15. In the absence of a theory supporting the cost function with this canonical background term, these authors labeled it the practical cost function. It is surprising that this problem has gone unnoticed in almost all studies. Despite that, this cost function has been shown to work well in all cases.
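The gap between the two background terms can be made concrete in a toy numerical sketch (assumed random **U** with **B** = **UU**^{T}): the matrix **U**^{T}(**UU**^{T})^{−1}**U** is the orthogonal projector onto (Ker **U**)^{⊥}, so the exact transformed term equals (1/2)‖**Pv**‖^{2}, which coincides with the canonical term (1/2)**v**^{T}**v** only for **v** in (Ker **U**)^{⊥}.

```python
import numpy as np

# Toy comparison of the exact transformed background term
#   0.5 v' U' (UU')^{-1} U v  =  0.5 ||P v||^2
# with the canonical term 0.5 v'v, where P projects onto (Ker U)^perp.
rng = np.random.default_rng(3)
n, p = 4, 9
U = rng.normal(size=(n, p))                      # full row rank a.s.
P = U.T @ np.linalg.solve(U @ U.T, U)            # projector onto (Ker U)^perp

def exact_term(v):
    return 0.5 * v @ (P @ v)

def canonical_term(v):
    return 0.5 * v @ v

v = rng.normal(size=p)                           # general control vector
v_perp = P @ v                                   # restricted to (Ker U)^perp
print(abs(exact_term(v_perp) - canonical_term(v_perp)))  # ~0: equal on subspace
print(canonical_term(v) - exact_term(v))                 # > 0 off the subspace
```

This is precisely the subspace identification recognized by MA15: off (Ker **U**)^{⊥} the canonical term overestimates the exact one, which is why the practical cost function is greater than or equal to the transformed one.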

In this study, we have tried to develop a mathematical framework to explain such a cost function. We have shown that the hybrid variational–ensemble data assimilation method introduces a class of cost functions different from the correctly transformed cost function. This class of cost functions has a close connection with the kernel of **U**, which is nontrivial when we lift the cost function from the model space to a higher-dimensional space. Each vector in Ker **U** induces a different cost function, and the zero vector induces the practical cost function that has been used in all previous studies. Each cost function has a unique minimum point, and all these points form the minimum plane of the transformed cost function. Notably, these cost functions are always greater than or equal to the transformed one. All these properties are a consequence of the essential fact that the new cost function is the natural extension of the transformed one from (Ker **U**)^{⊥} to the full control space.

The mathematical derivation does not depend on the hybrid formulation of the square root of **B** and only assumes that this square root is a linear map from the model space to a higher-dimensional space. That is to say, the analysis increment has a redundant representation in an overcomplete set of vectors. Therefore, the same arguments apply to any method in which the number of columns of **U** is greater than the model dimension, such as modeling **B** with wavelets (Fisher 2003).
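The redundancy claimed here is easy to exhibit numerically (assumed toy **U**): whenever **U** has more columns than the model dimension, Ker **U** is nontrivial, and adding any kernel vector to a control vector changes the representation but not the analysis increment.

```python
import numpy as np

# Toy illustration of the redundant representation: for a wide U, the
# increment dx = U w is unchanged by adding any vector in Ker U to w.
rng = np.random.default_rng(4)
n, p = 4, 9
U = rng.normal(size=(n, p))                      # p > n: Ker U is nontrivial

# Construct a kernel vector by projecting a random vector onto Ker U
z = rng.normal(size=p)
k = z - U.T @ np.linalg.solve(U @ U.T, U @ z)    # k lies in Ker U

w = rng.normal(size=p)
print(np.linalg.norm(U @ k))                     # ~0: k really is in Ker U
print(np.linalg.norm(U @ w - U @ (w + k)))       # ~0: same increment dx
```

The same construction applies verbatim to any overcomplete representation of δ**x**, such as a wavelet-based **B**, since only the shape of **U** matters.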

In practice, these extended cost functions can be reduced to a practical form similar to the practical cost function. That means **v**_{0} ∈ Ker **U** in the definition of the extended cost function is not needed in minimization. This clearly explains why this form has been used in all previous studies without any problem. Furthermore, the cost function in the alpha control variable method has been shown to derive from the extended cost function when the transformation matrix is a full-rank square matrix. Thus, the theory not only justifies the use of these two cost functions in practice, but also describes their derivation from the original one using the hybrid assumption for **U**. Note that W07 viewed these two cost functions and the original one as independent cost functions and showed that they have the same minimum point. In fact, these two cost functions result from the original cost function when the hybrid form of **U** is used. This study has provided a complete picture of the hybrid variational–ensemble method in both theory and practice.

## Acknowledgments

This work was supported by the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) through the Strategic Programs for Innovative Research (SPIRE), the FLAGSHIP2020 project (Advancement of meteorological and global environmental predictions utilizing observational "Big Data"), and the JSPS Grant-in-Aid for Scientific Research "Study of optimum perturbation methods for ensemble data assimilation" (16H04054). The LBFGS program was kindly provided by the numerical prediction division of JMA. We would like to thank two anonymous reviewers, whose valuable comments have significantly improved the quality of the manuscript.

## REFERENCES

Fisher, M., 2003: Background error covariance modelling. *Proc. ECMWF Seminar on Recent Developments in Data Assimilation for Atmosphere and Ocean*, Reading, United Kingdom, ECMWF.

## Footnotes

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).