Lognormal and Mixed Gaussian–Lognormal Kalman Filters

Steven J. Fletcher (https://orcid.org/0000-0002-1662-7460), Milija Zupanski, Michael R. Goodliff, Anton J. Kliewer, Andrew S. Jones, John M. Forsythe, Ting-Chi Wu, Md. Jakir Hossen, and Senne Van Loon

Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, Colorado

Abstract

In this paper we present the derivation of two new forms of the Kalman filter equations; the first is for a purely lognormally distributed random variable, while the second is for a combination of Gaussian and lognormally distributed random variables. We show that the appearance is similar to that of the Gaussian-based equations, but that the analysis state is a multivariate median and not the mean. We also show results of the mixed distribution Kalman filter with the Lorenz 1963 model with lognormal errors for the background and observations of the z component, compare them to analysis results from a traditional Gaussian-based extended Kalman filter, and show that under certain circumstances the new approach produces more accurate results.

Michael R. Goodliff’s current affiliation: Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, and NOAA/Physical Sciences Laboratory, Boulder, Colorado.

Anton J. Kliewer’s current affiliation: Cooperative Institute for Research in the Atmosphere, NOAA/OAR/ESRL/Global Systems Laboratory, Boulder, Colorado.

Ting-Chi Wu’s current affiliation: Ministry of Science and Technology, Taiwan.

Md. Jakir Hossen’s current affiliation: I.M. Systems Group, Inc., College Park, Maryland.

© 2023 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Steven J. Fletcher, steven.fletcher@colostate.edu


1. Introduction

Over the last decade or so, there have been advances in the variational (VAR) forms of data assimilation to allow for non-Gaussian behavior of the errors; specifically, there has been much development that allows for lognormally distributed errors, as well as for Gaussian errors combined with lognormal errors such that these errors can be minimized simultaneously (Fletcher and Zupanski 2006a,b, 2007; Fletcher 2010; Fletcher and Jones 2014). A full summary of the development of the mixed Gaussian–lognormal variational systems from full field 3DVAR to incremental 4DVAR can be found in Fletcher (2017).

The aforementioned theory has been applied in a microwave retrieval system for temperature and mixing ratio from the AMSU-A brightness temperatures (Kliewer et al. 2016). There, three theories are compared: 1) the mixed distribution approach, which seeks the mode, or most likely state, of the analysis distribution, 2) a logarithmic transform applied to the mixing ratio, which seeks the median of the analysis distribution (Fletcher and Zupanski 2007), and 3) a Gaussian model for the errors of temperature and mixing ratio. It is shown that the mixed distribution approach performs better than the other two mentioned approaches in fitting to observations.

However, the extension of the lognormal theory to the Kalman filters, and hence to ensemble-based data assimilation systems, has not been forthcoming. The basis of the lognormal variational work was provided in the seminal paper of Cohn (1997), which contains the first definition of a lognormally distributed observational error associated with a direct observation; this error takes the form of a ratio, not a difference. This definition, together with a change of variables for the background state, allowed for a version of the Kalman equations that accounts for direct observations with lognormal errors. The reader is referred to Cohn (1997) for the full details of this derivation. The difference between the work in Cohn (1997) and the work presented here is twofold: 1) we allow for nonlinear observation operators in the Kalman filter equations, and 2) we allow for a combination of Gaussian and lognormally distributed background and observational errors.

The equations for the Kalman filter (Kalman 1960; Kalman and Bucy 1961) can be derived from either a control theory approach or a least squares formulation (Fletcher 2017), where the latter refers to minimizing the trace of the analysis error covariance matrix with respect to the Kalman gain matrix. The Kalman filter therefore seeks the mean of the analysis distribution to minimize the errors. For a Gaussian distribution, the mean, mode, and median are equivalent. However, for skewed distributions the three descriptive statistics are not equal. For left-skewed distributions, the mode is less than the median, which is less than the mean, whereas for right-skewed distributions the opposite is true. In section 2, we show that if one tries to follow the initial steps of the least squares approach, it is impossible to derive a lognormal-based Kalman filter because one cannot separate out the analysis and forecast error covariance matrices from the logarithmic operator in order to evolve them by the linear model. Another stumbling point is that it is not possible to define a weighted sum of predicted states and new observations that determines an estimate of the state at the current time while remaining consistent with a lognormal estimate.

We shall briefly summarize an alternative approach, referred to as the lognormal Kalman filter (Kondrashov et al. 2011), in which a logarithmic transform is introduced for a model variable and the analytical differential equation is rederived for the new variable. This approach utilizes the property that the logarithm of a lognormally distributed random variable is a Gaussian distributed random variable, so that the Kalman filter can be applied to the new variable. An inverse operation is applied to recover a value of the original model variable. This approach would not be practical for a numerical weather or ocean prediction model because it requires rederivation of the associated prognostic differential equations for the state in the new variable.

Given these setbacks, we recall that the Kalman filter is equivalent to 3DVAR when the static background error covariance matrix is replaced with a flow-dependent background error covariance matrix. We therefore examine a cost function–based approach to obtain the lognormal analysis state in terms of a lognormal-based Kalman gain matrix, and take the expectation of this state to obtain an expression that is similar to the Gaussian-based analysis error covariance matrix. We say "similar" in that the cost function will be defined to obtain the median, or unbiased, state of the analysis distribution. We should note here that the lognormal distribution is defined in terms of the vector of means $\mu$ and the covariance matrix $\Sigma$ of $\ln x$, not of the vector of random variables $x$. We shall show that the lognormal-based analysis error covariance matrix is equivalent to the inverse Hessian matrix of the associated cost function scaled by the inverse of the derivatives of $\ln x$. We shall also show that the estimate for the lognormal Kalman gain matrix minimizes the trace of the lognormal analysis covariance matrix.

The remainder of this paper is as follows. In section 3 we present the derivations of the lognormal and the mixed Gaussian–lognormal versions of the Kalman filter equations. In section 4 we examine the performance of the mixed Gaussian–lognormal Kalman filter equations against the Gaussian extended Kalman filter with the Lorenz 1963 model (Lorenz 1963) for different observational error variances and for different frequencies of observations. Also in this section we test the robustness of the new scheme against the extended Kalman filter over 5000 assimilation runs for the different observational error variance experiments and for 5000 perturbed true and background states. It is assumed in these experiments that the z component of the model is better modeled with a lognormal distribution than with a Gaussian. The paper finishes with conclusions and ideas for future work.

2. Difficulties with lognormal-based Kalman filters

a. Statistical derivation of the forecast error covariance matrix

As indicated in the introduction, a first approach to derive a lognormal-based Kalman filter data assimilation system would be to follow the statistical derivation that is summarized in Fletcher (2017), but with the lognormal equivalent in parallel. The starting point is to define the Gaussian distributed analysis errors as
$$\varepsilon_a \equiv x_a - x_t, \tag{1}$$
where $x_t$ is the true state that is being sought, and $x_a$ is the analysis state at the current time.
Introducing the time component, along with the background or forecast state $x_b = \mathbf{M}_{n,n-1}x_a^{n-1}$, in terms of the numerical model $\mathbf{M}$ operating on the analysis state at the previous time step, yields the background/forecast error as
$$\varepsilon_b^n \equiv \mathbf{M}x_a^{n-1} - x_t^n, \tag{2}$$
where $\mathbf{M} \equiv \mathbf{M}_{n,n-1}$ is a linear or linearized numerical model matrix that operates from time step $t_{n-1}$ to time step $t_n$.
Introducing the lognormal equivalent of (1) and (2), we have
$$\varepsilon_{al}^n \equiv x_{al}^n \oslash x_t^n, \tag{3}$$
$$\varepsilon_{bl}^n \equiv (\mathbf{M}x_{al}^{n-1}) \oslash x_t^n, \tag{4}$$
where $\oslash$ is the Hadamard division operator, which is a componentwise division operator. As we are assuming that the components of the analysis and true state are lognormally distributed, we have the property that all of these entries are greater than zero. The subscript $l$ above, and throughout, refers to the components associated with the lognormal formulations. The reason that the analysis error in (3) is defined as a ratio is that the lognormal distribution is geometric, which implies that the ratio or product of two independent lognormal random variables is itself a lognormal random variable. For the Gaussian distribution the equivalent property is that the difference or sum of two independent Gaussian random variables is also a Gaussian random variable.
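To make this geometric property concrete, the short sketch below (a minimal illustration assuming NumPy; the parameters are invented for demonstration) draws two independent lognormal vectors, forms their componentwise ratio, and confirms that the logarithm of that ratio behaves as a Gaussian random variable:

```python
# Minimal sketch: the componentwise (Hadamard) ratio of two independent
# lognormal samples is itself lognormal, so its logarithm is Gaussian.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Hypothetical "analysis" and "true" states with lognormal components.
x_a = rng.lognormal(mean=0.7, sigma=0.25, size=n_samples)
x_t = rng.lognormal(mean=0.7, sigma=0.50, size=n_samples)

eps_a = x_a / x_t            # Hadamard division, as in (3)
log_eps = np.log(eps_a)      # Gaussian: mean 0.7 - 0.7 = 0,
                             # std sqrt(0.25**2 + 0.5**2) ~ 0.559
print(log_eps.mean(), log_eps.std())
```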
Using the definition of the analysis state at time $t = t_{n-1}$ for the Gaussian variable:
$$x_a^{n-1} = x_t^{n-1} + \varepsilon_a^{n-1}, \tag{5}$$
the background errors can be written in terms of the previous time's true state and analysis error as
$$\varepsilon_b^n = \mathbf{M}x_t^{n-1} + \mathbf{M}\varepsilon_a^{n-1} - x_t^n. \tag{6}$$
This makes it possible to factorize the background error as
$$\varepsilon_b^n = \mathbf{M}\varepsilon_a^{n-1} + \varepsilon_m^n, \tag{7}$$
where
$$\varepsilon_m^n \equiv \mathbf{M}x_t^{n-1} - x_t^n \tag{8}$$
is the model error.
To derive the analysis error covariance matrix in the lognormal Kalman filter, it is desirable to factorize the lognormal analysis error in a similar fashion as in (7). However, using the definition of the background error (4) together with the analysis state at time $t = t_{n-1}$:
$$x_{al}^{n-1} = x_t^{n-1} \circ \varepsilon_{al}^{n-1}, \tag{9}$$
it is clear that this is not possible. Indeed, in this way one finds
$$\ln\varepsilon_{bl}^n = \ln \mathbf{M}(x_t^{n-1} \circ \varepsilon_{al}^{n-1}) - \ln x_t^n \neq \ln \mathbf{M}x_t^{n-1} + \ln \mathbf{M}\varepsilon_{al}^{n-1} - \ln x_t^n, \tag{10}$$
where the inequality comes from the fact that the linear model does not commute through the Hadamard product operator $\circ$, and as such the model cannot act on both factors separately. The logarithm then cannot be expanded either, so the background error cannot be factorized as in (7). Note that we consider the logarithm of the background error because the Kalman filter equations are defined through the expectations of the relevant Gaussian random variable, which for the lognormal distribution is $\ln x$, and not the lognormal random variable $x$.
To factorize the time evolution of the lognormal analysis error, another definition of the background error is necessary. A possible workaround is to instead define the background state at the next analysis time $t = t_n$ as the true state at that analysis time multiplied by the evolution of the analysis error from the previous analysis time, $x_{bl}^n = x_t^n \circ \mathbf{M}\varepsilon_{al}^{n-1}$. In this case, one has to multiply by the lognormal model error to keep everything consistent. Then, the logarithm of the background error can be written as
$$\ln\varepsilon_{bl}^n = \ln x_t^n + \ln \mathbf{M}\varepsilon_{al}^{n-1} - \ln x_t^n + \ln\varepsilon_{ml} = \ln \mathbf{M}\varepsilon_{al}^{n-1} + \ln\varepsilon_{ml}, \tag{11}$$
where the lognormal model error $\varepsilon_{ml}$ is defined as
$$\ln\varepsilon_{ml}^n \equiv \ln \mathbf{M}x_t^{n-1} - \ln x_t^n. \tag{12}$$
This now implies that it is not possible to move the numerical model out of the logarithm, which causes a problem as it is not the evolution of the logarithm of the analysis errors that is needed, but rather the logarithm of the evolution of the analysis error.
For the Gaussian filter, the forecast error covariance matrix can now be formed by multiplying (7) with its transpose and then applying the statistical expectation operator $E[\cdot]$, which yields
$$\mathbf{P}_f^n \equiv E[\varepsilon_b^n(\varepsilon_b^n)^{\mathrm{T}}] = E[(\mathbf{M}\varepsilon_a^{n-1} + \varepsilon_m^n)(\mathbf{M}\varepsilon_a^{n-1} + \varepsilon_m^n)^{\mathrm{T}}] = \mathbf{M}E[\varepsilon_a^{n-1}(\varepsilon_a^{n-1})^{\mathrm{T}}]\mathbf{M}^{\mathrm{T}} + E[\varepsilon_m^n(\varepsilon_m^n)^{\mathrm{T}}] = \mathbf{M}\mathbf{P}_a^{n-1}\mathbf{M}^{\mathrm{T}} + \mathbf{Q}^n, \tag{13}$$
where $\mathbf{P}_a^{n-1}$ is the analysis error covariance matrix at time $t = t_{n-1}$, $\mathbf{Q}^n$ is the model error covariance matrix at time $t = t_n$, and it is assumed that the analysis error and the model error are uncorrelated.
Applying the same approach to (11) yields the lognormal forecast and model error covariance matrices as
$$\mathbf{P}_{fl}^n \equiv E[(\ln \mathbf{M}\varepsilon_{al}^{n-1} + \ln\varepsilon_{ml}^n)(\ln \mathbf{M}\varepsilon_{al}^{n-1} + \ln\varepsilon_{ml}^n)^{\mathrm{T}}] = E[\ln \mathbf{M}\varepsilon_{al}^{n-1}(\ln \mathbf{M}\varepsilon_{al}^{n-1})^{\mathrm{T}}] + E[\ln\varepsilon_{ml}^n(\ln\varepsilon_{ml}^n)^{\mathrm{T}}] = \mathbf{P}_{al}^n + \mathbf{Q}_l^n, \tag{14}$$
where it is assumed that the logarithms of the analysis and model errors are uncorrelated. It is clear from (14) that the definition of the forecast error covariance matrix does not explicitly contain the numerical model acting on the analysis error covariance matrix from the previous analysis time. It is, however, implicit in the state $x^n$, as this is the result of evolving the true state and the analysis error from the previous time step. It should be noted here that, given that it is not possible to interchange the model and the logarithm, for the remainder of the paper the new approach will use the nonlinear model, as there is no advantage in using the linear model to interchange operators. As future applications are more likely to be nonlinear, this approach also removes the errors that are introduced through linearization.

b. Kalman gain matrices

Another problem with trying to derive a lognormal version of the Kalman filter arises in the analysis step, as shown in Fletcher (2017). Given a predicted state $x_b^{n+1}$ that is associated with observations up to time step $t_n$, and assuming that an observation has been received at time $t = t_{n+1}$, an estimate of the state at $t = t_{n+1}$, given the observation at time $t = t_{n+1}$, is required. In Kalman filter theory this step is started by assuming that the estimate is a weighted sum of the predicted state and the new observation, that is to say
$$x_a^{n+1} = \mathbf{K}_b^{n+1}x_b^{n+1} + \mathbf{K}_o^{n+1}y^{n+1}, \tag{15}$$
where $y^{n+1}$ is the observation at time $t = t_{n+1}$. We should note here that the observation could be either a direct or an indirect observation of the predicted state. However, for a lognormal approach the equivalent of (15) would be in terms of a weighted sum of the logarithms of the predicted state and the observations, such that
$$\ln x_{al}^{n+1} = \ln(\mathbf{K}_{bl}^{n+1}x_{bl}^{n+1}) + \ln(\mathbf{K}_{ol}^{n+1}y^{n+1}), \tag{16}$$
where the $\mathbf{K}$ matrices in (15) and (16) are referred to as gain matrices. It is not possible to manipulate these equations to obtain expressions for the gain matrices, because the logarithms act on the products of the gain matrices and the state, and it is not possible to interchange the operators to overcome this problem. Thus this approach cannot continue through the steps of the derivation of the Gaussian-based Kalman filter equations to seek an equivalent lognormal version of the Kalman gain matrix and filter equations.

c. Change of variable-based lognormal Kalman filter

As mentioned in the introduction, an alternative approach to obtain a form of a lognormal-based Kalman filter has been proposed in Kondrashov et al. (2011). This approach, which recently has been used for a reanalysis of ring current phase space densities in Aseev and Shprits (2019), should more accurately be referred to as the change of variable lognormal Kalman filter. In Kondrashov et al. (2011) the authors state that a striking feature of the radiation belts is that values of observed electron fluxes, and of the modeled phase space density (PSD), vary by several orders of magnitude. The corresponding error distributions are therefore not Gaussian, while standard data assimilation methods, such as the Kalman filter and its various adaptations to large-dimensional and nonlinear problems, are essentially based on least squares minimization of Gaussian errors, as was shown in the last section. Thus it is not possible to change these approaches directly to ensure consistency with a lognormal distribution.

As mentioned in Kondrashov et al. (2011), a technique that is quite often used for dealing with lognormal random variables is to use the property that the logarithm of a lognormal random variable is a Gaussian random variable. However, as shown in Fletcher and Zupanski (2007), when transforming from lognormal to Gaussian random variables and then minimizing the cost function in variational data assimilation, or finding the covariance matrix and the mean state with an ensemble Kalman filter system, the descriptive statistic that is found in lognormal space once the inverse transform is applied is the median state. A problem with this approach for the ensemble Kalman filter formulation is that the model is in terms of the original lognormal variable and not the transformed logarithmic variable.

The motivation for considering a lognormal model for the PSD is justified in Kondrashov et al. (2011) by the statements that this variable is always positive and that its variations, as measured by the standard deviation, generally increase as its mean value increases. In contrast, Gaussian distributed variables can be negative and have a standard deviation that does not change as the mean changes. Lognormal errors arise when sources of variation accumulate multiplicatively, whereas Gaussian errors arise when these sources are additive.

To overcome the problem just mentioned, the authors of Kondrashov et al. (2011) propose rewriting the parabolic partial differential equation for the time evolution of the PSD:
$$\frac{\partial f}{\partial t} = L^2\frac{\partial}{\partial L}\left(\frac{D_{LL}}{L^2}\frac{\partial f}{\partial L}\right) - \frac{f}{\tau_L}, \tag{17}$$
where the PSD is $f = f(L, t; \mu, J)$ in the Van Allen radiation belts, at fixed values of the adiabatic invariants $\mu$ and $J$, by considering the problem for $\log f$. The radial variable $L$ is the distance in the equatorial plane, measured in Earth radii $R_E$, from the center of Earth to the magnetic field line around which the electron moves at time $t$, $D_{LL}$ is a radial diffusion term, and $\tau_L$ is the characteristic lifetime of the linear decay of $J$ and $\mu$.
Thus, Kondrashov et al. develop a lognormal-based dynamical model for the change of variable $S = \log f$ through the chain rule for partial derivatives in time and space, applied to (17), which results in
$$\frac{\partial S}{\partial t} = L^2\frac{\partial}{\partial L}\left(\frac{D_{LL}}{L^2}\frac{\partial S}{\partial L}\right) - \frac{1}{\tau_L} + D_{LL}\left(\frac{\partial S}{\partial L}\right)^2. \tag{18}$$
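As a check on this step, the chain-rule calculation behind (18) can be written out explicitly; the short derivation below assumes $S = \ln f$ (natural logarithm), so that $\partial f/\partial t = f\,\partial S/\partial t$ and $\partial f/\partial L = f\,\partial S/\partial L$:

```latex
% Sketch of the chain-rule step from (17) to (18), assuming S = ln f.
% Substituting f_t = f S_t and f_L = f S_L into (17):
\begin{align*}
f\frac{\partial S}{\partial t}
  &= L^2\frac{\partial}{\partial L}\left(\frac{D_{LL}}{L^2} f \frac{\partial S}{\partial L}\right)
     - \frac{f}{\tau_L}\\
  &= L^2 f \frac{\partial}{\partial L}\left(\frac{D_{LL}}{L^2}\frac{\partial S}{\partial L}\right)
     + D_{LL} f \left(\frac{\partial S}{\partial L}\right)^2 - \frac{f}{\tau_L},
\end{align*}
% where the product rule produced the quadratic term; dividing through by
% f > 0 recovers (18).
```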
Kondrashov et al. (2011) show positive results from this approach compared to using the standard Gaussian-based Kalman filter equations. However, in numerical weather and ocean prediction it is not realistic to change all of the positive definite variables and then rederive the associated prognostic differential equations. Also, as the work in Goodliff et al. (2020) has shown, it is possible that the distribution of the positive definite variables changes with the dynamics and may not always be lognormal. Therefore, in numerical weather and ocean prediction the underlying distribution should be able to change dynamically, which is unfeasible in a change of variable approach.

In the next section we present an approach that builds on a cost function for the lognormal median of the posterior distribution for the situation where both the background and observational errors are lognormally distributed. We also show that this approach ensures that the derived covariances are consistent with a multivariate lognormal distribution.

3. Lognormal and mixed Gaussian–lognormal Kalman filters

It appears from the attempted derivation in section 2 that it is not possible to obtain a lognormal version of the Kalman filter equations based upon the first two moments of the multivariate Gaussian distribution by following a least squares approach, which would have involved showing that the derived Kalman gain matrix minimizes the trace of the analysis error covariance matrix. However, what is important to recall here is that for the Gaussian distribution the three descriptive statistics are the same; that is to say, the mode, median, and mean are equal. This is not true for the lognormal distribution. It is shown in Fletcher and Zupanski (2006a) that these three statistics are quite different, and that they each have their own properties: the mode is the maximum likelihood state, the median is the unbiased state, and the mean is the minimum variance state in distribution theory. To make this important distinction clearer, we review some of the characteristics of the lognormal distribution before deriving the lognormal Kalman filter.

a. Properties of the lognormal distribution

It is quite often stated that an error is unbiased if its mean is equal to zero. This is not true for many non-Gaussian distributed random variables. As just stated, the median is the unbiased state, which implies that the cumulative density function at the median is equal to 0.5. For example, to say that a lognormal random variable, or error $\varepsilon$, is unbiased if $E[\varepsilon] = 0$ is equivalent to saying that
$$\exp\left(\mu + \frac{\sigma^2}{2}\right) = 0, \tag{19}$$
where $\mu = E[\ln\varepsilon]$ and $\sigma^2$ is the variance of $\ln\varepsilon$ and not of $\varepsilon$, which cannot happen. It is not possible to have a zero mean for a lognormal random variable. However, it is possible that $E[\ln\varepsilon] = 0$, but applying this assumption to the background error would imply that the distributions of the true state and the background state are of the same type and that they have the same median, not the same mean, in lognormal space, which is true in the Gaussian transformed space as well. This is because of the definition of the distribution of the ratio of two independent lognormal random variables, and its associated mean: if $x_b \sim \mathrm{LN}(\mu, \sigma_1^2)$ and $x_t \sim \mathrm{LN}(\mu, \sigma_2^2)$, then $\varepsilon_b = (x_b/x_t) \sim \mathrm{LN}(\mu - \mu, \sigma_1^2 + \sigma_2^2) = \mathrm{LN}(0, \sigma_1^2 + \sigma_2^2)$.

To help illustrate this point, in Fig. 1 we have plotted two lognormal distributions that represent the probability density functions (PDFs) of the true state (solid curve) and the analysis state (dashed curve), where the two states have the same Gaussian mean, $\mu_t = \mu_a = \ln 2$, but different Gaussian variances, $\sigma_t^2 = 0.25^2$ and $\sigma_a^2 = 0.5^2$, respectively. We have also plotted lines for the mode, median, and mean of the two distributions, in this order, in their respective line styles.

Fig. 1. (left) Plots to illustrate the differences in the modes, medians, and means for two different lognormal distributions that represent the true state's distribution (solid curve) and the analysis state's distribution (dashed curve). (right) The distribution for the associated analysis error $\varepsilon_a = x_a/x_t$.

It is clear from the left panel of Fig. 1 that the two distributions have the same median, but that neither their modes nor their means are equal. In the right panel of Fig. 1 we have plotted the associated distribution for the equivalent analysis error $\varepsilon_a = x_a/x_t$. While the distribution has a median at 1, the most likely state is to the left of this value, indicating that the state with the highest probability of occurring for the analysis error is not equality, which implies a bias in the analysis.

In Fletcher and Jones (2014) it is shown that, when following a modal approach for the incremental formulation of mixed Gaussian–lognormal 4DVAR, the analysis error distribution, also referred to as the posterior distribution, had a mode at 1. This indicates that the most likely answer from the data assimilation system was something close to the true state.

b. Lognormal Kalman filter—Median-based approach

Given this brief explanation and illustration of the descriptive statistics of the lognormal distribution, it is clear that the way to derive an equivalent set of Kalman filter–type equations for lognormally distributed errors, using the forecast error covariance matrix evolution described earlier, is to start by defining a cost function whose solution is the median of the analysis error distribution (Fletcher 2017), as this is equivalent to $\ln\varepsilon_a$, and then find the associated covariance matrix for $\ln\varepsilon_a$. In Fletcher (2017) it is shown that the median analysis state for lognormally distributed background and observation errors is the minimum of the following cost function:
$$J(x) = \frac{1}{2}(\ln x - \ln x_b)^{\mathrm{T}}\mathbf{P}_{fl}^{-1}(\ln x - \ln x_b) + \frac{1}{2}[\ln y - \ln h(x)]^{\mathrm{T}}\mathbf{R}_l^{-1}[\ln y - \ln h(x)], \tag{20}$$
where $h(x)$ is the nonlinear observation operator that maps the state to observation space, and $\mathbf{P}_{fl}$, now defined in terms of the nonlinear numerical model, is given by
$$\mathbf{P}_{fl}^n = [\ln M(x_b^{n-1} \circ \varepsilon_a^{n-1}) - \ln x_t^n][\ln M(x_b^{n-1} \circ \varepsilon_a^{n-1}) - \ln x_t^n]^{\mathrm{T}} + \mathbf{Q}_l^n. \tag{21}$$
Differentiating (20) with respect to $x$ results in
$$\nabla_x J(x) = \mathbf{W}_b^{-\mathrm{T}}\mathbf{P}_{fl}^{-1}(\ln x - \ln x_b) - \mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}[\ln y - \ln h(x)], \tag{22}$$
where $\mathbf{H}$ is the Jacobian of the observation operator, defined by $H_{i,j} \equiv \partial h_i(x)/\partial x_j$, and
$$\mathbf{W}_b^{-1} = \mathrm{diag}(x_1^{-1}, x_2^{-1}, \ldots, x_n^{-1}), \qquad \mathbf{W}_o^{-1} = \mathrm{diag}\{[h_1(x)]^{-1}, [h_2(x)]^{-1}, \ldots, [h_{N_o}(x)]^{-1}\}, \tag{23}$$
where $n$ is the total number of state variables, and $N_o$ is the total number of observed variables.
As it will be required later to finalize the derivation of the lognormal Kalman filter, it can be shown that the Hessian matrix of (20) is
$$\nabla_x^2 J(x) = \mathbf{W}_b^{-\mathrm{T}}\mathbf{P}_{fl}^{-1}\mathbf{W}_b^{-1} + \mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}\mathbf{W}_o^{-1}\mathbf{H}. \tag{24}$$
To compare later with the analysis covariance matrix, it is useful to write down a scaled Hessian matrix as
$$\mathbf{W}_b^{\mathrm{T}}\nabla_x^2 J(x)\mathbf{W}_b = \mathbf{P}_{fl}^{-1} + \mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b. \tag{25}$$
An important feature to notice about (25) is that the Hessian matrix is positive definite, because of the following short proof. Let $\mathbf{A}$ be a positive definite matrix, defined by $x^{\mathrm{T}}\mathbf{A}x > 0\ \forall x \neq 0$; it is known that $\mathbf{A}^{-1}$ is then also positive definite. Now let $\mathbf{B} = \mathbf{W}^{\mathrm{T}}\mathbf{A}^{-1}\mathbf{W}$, and consider the scalar expression $x^{\mathrm{T}}\mathbf{W}^{\mathrm{T}}\mathbf{A}^{-1}\mathbf{W}x = (\mathbf{W}x)^{\mathrm{T}}\mathbf{A}^{-1}(\mathbf{W}x)$. Because we know that $\mathbf{A}^{-1}$ is positive definite, the only way this expression can be 0 is if $\mathbf{W}x$ is equal to 0; but if we assume that $\mathbf{W}$ has full column rank, then this implies that $x = 0$. This then implies that $\mathbf{B} = \mathbf{W}^{\mathrm{T}}\mathbf{A}^{-1}\mathbf{W}$ must be positive definite.

Returning to (24), the diagonal matrices multiplying $\mathbf{P}_{fl}^{-1}$ are of full column rank, which, from the proof above, implies that the first term in (24) is positive definite. Turning to the second term, we know that $\mathbf{W}_o$ has full column rank as it is a diagonal matrix. With respect to $\mathbf{H}$, we know that there has to be a sensitivity to at least one state variable, and that the same observation is not assimilated twice; therefore, $\mathbf{H}$ must have full rank as well. Thus the second term in (24) is also positive definite. Given that the sum of two positive definite matrices is also a positive definite matrix, the Hessian matrix must be positive definite. By these same arguments, the scaled Hessian matrix in (25) must also be positive definite.
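The rank argument above is easy to verify numerically; the following sketch (NumPy assumed, with randomly generated test matrices) confirms that $\mathbf{W}^{\mathrm{T}}\mathbf{A}^{-1}\mathbf{W}$ has strictly positive eigenvalues when $\mathbf{A}$ is positive definite and $\mathbf{W}$ has full column rank:

```python
# Numerical check: W^T A^{-1} W is positive definite when A is symmetric
# positive definite and W (here diagonal, like W_b) has full column rank.
import numpy as np

rng = np.random.default_rng(2)
n = 5
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)              # symmetric positive definite
W = np.diag(rng.uniform(0.5, 2.0, n))    # full-rank diagonal matrix

B = W.T @ np.linalg.inv(A) @ W
print(np.linalg.eigvalsh(B).min() > 0)   # True: all eigenvalues positive
```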

As with the minimization of the Gaussian cost function in 3DVAR, setting the gradient of (20) equal to zero, and calling the state that achieves this the analysis state, denoted by $x_a$, yields
$$\nabla_x J(x_a) = \mathbf{W}_b^{-\mathrm{T}}\mathbf{P}_{fl}^{-1}(\ln x_a - \ln x_b) - \mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}[\ln y - \ln h(x_a)] = 0. \tag{26}$$
The next step in deriving the lognormal version of the Kalman filter equations is to state that we wish to associate the analysis state with the observations in terms of some form of lognormal-based Kalman gain matrix $\mathbf{K}_l$. Thus, we require
$$\ln x_a - \ln x_b = \mathbf{K}_l[\ln y - \ln h(x_a)]. \tag{27}$$
The reason for the form in (27) is that applying a logarithm to the lognormal random variables results in Gaussian random variables; thus, one is free to consider a linear combination of these new variables, similar to the original Gaussian approach. This form also enables us to form an analysis covariance matrix that is consistent with a lognormal distribution.
Next we introduce the logarithmic geometric tangent linear approximation (Fletcher and Jones 2014), which enables the logarithm of the model fields to be operated on by the Jacobian of the observation operator $h(x)$. The starting point is to consider the numerical geometric derivative of $\ln h_j(x)$, for a component $x_i$, with a multiplicative increment $\Delta x_i$, such that
$$\frac{\ln h_j(x_i\Delta x_i) - \ln h_j(x_i)}{x_i(\Delta x_i - 1)} \approx \frac{1}{h_j(x_i)}\frac{dh_j(x_i)}{dx_i}. \tag{28}$$
Multiplying and dividing (28) by $[\ln(x_i\Delta x_i) - \ln x_i]$ and interchanging the denominators on the left-hand side yields
$$\frac{\ln h_j(x_i\Delta x_i) - \ln h_j(x_i)}{\ln(x_i\Delta x_i) - \ln x_i}\left[\frac{\ln(x_i\Delta x_i) - \ln x_i}{x_i(\Delta x_i - 1)}\right] \approx \frac{1}{h_j(x_i)}\frac{dh_j(x_i)}{dx_i}. \tag{29}$$
Taking the limit of the second factor on the left-hand side of (29) with respect to $\Delta x_i \to 1$, and then multiplying by its inverse, yields
$$\frac{\ln h_j(x_i\Delta x_i) - \ln h_j(x_i)}{\ln(x_i\Delta x_i) - \ln x_i} \approx \frac{1}{h_j(x_i)}\frac{dh_j(x_i)}{dx_i}x_i. \tag{30}$$
Rearranging the expression above yields
$$\ln h_j(x_i\Delta x_i) - \ln h_j(x_i) \approx \frac{1}{h_j(x_i)}\frac{dh_j(x_i)}{dx_i}x_i[\ln(x_i\Delta x_i) - \ln x_i],$$
for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, N_o$. Collecting all of the different components for $i$ and $j$ results in the following general matrix–vector geometric tangent linear approximation to $h(x \circ \Delta x)$:
$$\ln h(x \circ \Delta x) - \ln h(x) \approx \mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b[\ln(x \circ \Delta x) - \ln x]. \tag{31}$$
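A quick scalar check of (30) illustrates the quality of this approximation for a multiplicative increment close to one; the observation operator below is a hypothetical choice for demonstration (NumPy assumed):

```python
# Scalar check of the geometric tangent linear approximation (30):
# ln h(x*dx) - ln h(x) ~ [x h'(x) / h(x)] [ln(x*dx) - ln x] for dx -> 1.
import numpy as np

def h(x):            # hypothetical nonlinear observation operator
    return x**2 + 1.0

def dh(x):           # its derivative
    return 2.0 * x

x, dx = 3.0, 1.05    # state component and multiplicative increment
lhs = np.log(h(x * dx)) - np.log(h(x))
rhs = (x * dh(x) / h(x)) * (np.log(x * dx) - np.log(x))
print(lhs, rhs)      # ~0.0883 vs ~0.0878: close, and closer as dx -> 1
```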
Returning to the derivation of the lognormal Kalman gain matrix, we substitute $x_a$ for $x \circ \Delta x$ and $x_b$ for $x$, along with the assumption that the analysis error, or $\Delta x$, is close to one, which is consistent with the assumption made for the tangent linear approximation used in incremental variational data assimilation (Fletcher and Jones 2014; Fletcher 2017). Thus, we have that
$$\ln h(x_a) \approx \ln h(x_b) + \mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b(\ln x_a - \ln x_b). \tag{32}$$
Now substituting (32) into (26) results in
$$\mathbf{W}_b^{-\mathrm{T}}\mathbf{P}_{fl}^{-1}(\ln x_a - \ln x_b) - \mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}[\ln y - \ln h(x_b) - \mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b(\ln x_a - \ln x_b)] = 0. \tag{33}$$
Premultiplying (33) by $\mathbf{W}_b^{\mathrm{T}}$ and factorizing yields
$$(\mathbf{P}_{fl}^{-1} + \mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b)(\ln x_a - \ln x_b) = \mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}[\ln y - \ln h(x_b)]. \tag{34}$$
Premultiplying (34) by the inverse of the first matrix factor results in the format desired in (27), given by
$$\ln x_a - \ln x_b = (\mathbf{P}_{fl}^{-1} + \mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b)^{-1}\mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}[\ln y - \ln h(x_b)], \tag{35}$$
and as such the lognormal Kalman gain matrix is of a similar form to the Gaussian version, but now contains the derivatives of the logarithms, given by
$$\mathbf{K}_l \equiv (\mathbf{P}_{fl}^{-1} + \mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b)^{-1}\mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1} \equiv \mathbf{P}_{fl}\mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}(\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b\mathbf{P}_{fl}\mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}} + \mathbf{R}_l)^{-1}. \tag{36}$$
Obtaining the second expression in (36) involves a double application of the Sherman–Morrison–Woodbury formula. A proof for the Gaussian equivalent from the Physical Space Assimilation System (PSAS) can be found on page 755 of Fletcher (2017); to obtain the expression above involves scaling by $\mathbf{W}_b$ and $\mathbf{W}_o^{-1}$. An important feature to note here is that it is required that $\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b\mathbf{P}_{fl}\mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}} + \mathbf{R}_l$ be positive definite. From the argument presented earlier we know that the observational Jacobian matrix multiplied by the $\mathbf{W}$ matrices is still of full rank, and as such the expression above is positive definite.
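The equivalence of the two forms in (36) is easy to confirm numerically; the sketch below (NumPy assumed, with random full-rank test matrices) builds $\hat{\mathbf{H}} = \mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b$ and checks that both expressions give the same gain:

```python
# Check that the two expressions for the lognormal Kalman gain in (36)
# agree, using Hhat = W_o^{-1} H W_b and random SPD covariances.
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 4                                    # state and observation sizes
G = rng.standard_normal((n, n)); Pfl = G @ G.T + n * np.eye(n)
G = rng.standard_normal((m, m)); Rl = G @ G.T + m * np.eye(m)
H = rng.standard_normal((m, n))                # full-rank Jacobian
Wb = np.diag(rng.uniform(0.5, 2.0, n))         # diag(x_i)
Wo_inv = np.diag(rng.uniform(0.5, 2.0, m))     # diag(1 / h_i(x))
Hhat = Wo_inv @ H @ Wb

K1 = np.linalg.solve(np.linalg.inv(Pfl) + Hhat.T @ np.linalg.solve(Rl, Hhat),
                     Hhat.T @ np.linalg.inv(Rl))
K2 = Pfl @ Hhat.T @ np.linalg.inv(Hhat @ Pfl @ Hhat.T + Rl)
print(np.allclose(K1, K2))                     # True
```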
It is the latter expression of the lognormal Kalman gain matrix $\mathbf{K}_l$ that will be used in the next step, which is to derive the lognormal equivalent of the analysis error covariance matrix. The starting point here is to take the lognormal analysis error definition from (3) and use (27) to substitute for $\ln x_a$. Thus, we have
$$\ln\varepsilon_a = \ln x_b - \ln x_t + \mathbf{K}_l[\ln y - \ln h(x_b)]. \tag{37}$$
To obtain a form similar to that derived for the Gaussian approach, $\ln h(x_b)$ must be rewritten in terms of the true state and the background error. This is achieved through using the geometric tangent linear approximation (31) with $x_b = x_t \circ \varepsilon_{bl}$, which results in
$$\ln h(x_b) \approx \ln h(x_t) + \mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b\ln\varepsilon_{bl}. \tag{38}$$
Substituting (38) into (37) yields
$$\ln\varepsilon_a = \ln\varepsilon_{bl} + \mathbf{K}_l[\ln y - \ln h(x_t) - \mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b\ln\varepsilon_{bl}] = (\mathbf{I} - \mathbf{K}_l\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b)\ln\varepsilon_{bl} + \mathbf{K}_l\ln\varepsilon_{ol}. \tag{39}$$
To simplify the appearance of the derivation we now define $\hat{\mathbf{H}} \equiv \mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b$. To form the lognormal analysis error covariance matrix, we take the expectation of $\ln\varepsilon_a(\ln\varepsilon_a)^{\mathrm{T}}$, which is
$$\mathbf{P}_{al} \equiv E[\ln\varepsilon_a(\ln\varepsilon_a)^{\mathrm{T}}] = (\mathbf{I} - \mathbf{K}_l\hat{\mathbf{H}})E[\ln\varepsilon_{bl}(\ln\varepsilon_{bl})^{\mathrm{T}}](\mathbf{I} - \mathbf{K}_l\hat{\mathbf{H}})^{\mathrm{T}} + \mathbf{K}_lE[\ln\varepsilon_{ol}(\ln\varepsilon_{ol})^{\mathrm{T}}]\mathbf{K}_l^{\mathrm{T}} = (\mathbf{I} - \mathbf{K}_l\hat{\mathbf{H}})\mathbf{P}_{fl}(\mathbf{I} - \mathbf{K}_l\hat{\mathbf{H}})^{\mathrm{T}} + \mathbf{K}_l\mathbf{R}_l\mathbf{K}_l^{\mathrm{T}}. \tag{40}$$
Following the same expansion of the products in (40) as for the Gaussian case, and noticing that the lognormal Kalman gain equation can be written as
$$\mathbf{K}_l = \mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}}[\hat{\mathbf{H}}\mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}} + \mathbf{R}_l]^{-1}, \tag{41}$$
results in (40) becoming
$$\mathbf{P}_{al} = (\mathbf{I} - \mathbf{K}_l\hat{\mathbf{H}})\mathbf{P}_{fl}.$$
We now show that the analysis error covariance matrix is equivalent to the inverse of the scaled Hessian matrix in (25). The first step is to expand $\mathbf{K}_l$ in (41) in terms of $\hat{\mathbf{H}}$, using the rule for the inverse of the product of two matrices, yielding
$$\mathbf{P}_{al} = \mathbf{P}_{fl} - \mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}}\mathbf{R}_l^{-1}[\hat{\mathbf{H}}\mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}}\mathbf{R}_l^{-1} + \mathbf{I}]^{-1}\hat{\mathbf{H}}\mathbf{P}_{fl}. \tag{42}$$
Next, recalling the Sherman–Morrison–Woodbury formula:
$$(\mathbf{A} + \mathbf{U}\mathbf{V}^{\mathrm{T}})^{-1} \equiv \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}[\mathbf{I} + \mathbf{V}^{\mathrm{T}}\mathbf{A}^{-1}\mathbf{U}]^{-1}\mathbf{V}^{\mathrm{T}}\mathbf{A}^{-1},$$
where for (42) this implies that $\mathbf{A} = \mathbf{P}_{fl}^{-1}$, $\mathbf{U} = \hat{\mathbf{H}}^{\mathrm{T}}\mathbf{R}_l^{-1}$, and $\mathbf{V} = \hat{\mathbf{H}}^{\mathrm{T}}$, the lognormal analysis error covariance matrix can be rewritten as
$$\mathbf{P}_{al} = (\mathbf{P}_{fl}^{-1} + \hat{\mathbf{H}}^{\mathrm{T}}\mathbf{R}_l^{-1}\hat{\mathbf{H}})^{-1} = (\mathbf{P}_{fl}^{-1} + \mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}\mathbf{R}_l^{-1}\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b)^{-1}, \tag{43}$$
where the expression inside the brackets on the right-hand side of (43) is the same as the scaled Hessian matrix of the cost function in (25).
The final part of this derivation is to show that the expression for the lognormal Kalman gain matrix minimizes the trace of the analysis error covariance matrix. Thus, differentiating (40) with respect to $\mathbf{K}_l$ and setting to zero yields
$$\frac{\partial \mathbf{P}_{al}}{\partial \mathbf{K}_l} = -2(\mathbf{I} - \mathbf{K}_l\hat{\mathbf{H}})\mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}} + 2\mathbf{K}_l\mathbf{R}_l = 0 \;\Rightarrow\; \mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}} = \mathbf{K}_l(\hat{\mathbf{H}}\mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}} + \mathbf{R}_l) \;\Rightarrow\; \mathbf{K}_l = \mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}}(\hat{\mathbf{H}}\mathbf{P}_{fl}\hat{\mathbf{H}}^{\mathrm{T}} + \mathbf{R}_l)^{-1}. \tag{44}$$
It is clear that the expression for the lognormal-based Kalman gain matrix in (36) is the same as that in (44), and thus the lognormal Kalman gain matrix minimizes, in a least squares sense, the analysis error covariances.
Thus, in summary, the equations for the analysis step of the lognormal Kalman filter are given by
$$\mathbf{P}_{fl}^n = [\ln M(x_a^{n-1} \circ \varepsilon_a^{n-1}) - \ln x_t^n][\ln M(x_a^{n-1} \circ \varepsilon_a^{n-1}) - \ln x_t^n]^{\mathrm{T}} + \mathbf{Q}_l^n, \tag{45}$$
$$\ln x_a^n = \ln x_b^n + \mathbf{K}_l^n[\ln y - \ln h(x_b^n)], \tag{46}$$
$$\mathbf{K}_l^n = \mathbf{P}_{fl}^n\mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}}[\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b\mathbf{P}_{fl}^n\mathbf{W}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{W}_o^{-\mathrm{T}} + \mathbf{R}_l]^{-1}, \tag{47}$$
$$\mathbf{P}_{al}^n = (\mathbf{I} - \mathbf{K}_l^n\mathbf{W}_o^{-1}\mathbf{H}\mathbf{W}_b)\mathbf{P}_{fl}^n, \tag{48}$$
$$x_b^n = M(x_a^{n-1}). \tag{49}$$
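As an illustration of how (45)–(49) fit together, the following sketch codes one analysis update; it is a minimal toy, assuming NumPy and a direct observation operator $h(x) = x$ (so $\mathbf{H} = \mathbf{I}$), with invented background, observation, and covariance values:

```python
# One lognormal KF analysis step, Eqs. (46)-(48), for h(x) = x.
import numpy as np

def lognormal_analysis(xb, y, Pfl, Rl):
    n = xb.size
    H = np.eye(n)                    # Jacobian of h(x) = x
    Wb = np.diag(xb)                 # diag(x_i), evaluated at the background
    Wo_inv = np.diag(1.0 / xb)       # diag(1 / h_i(x)); here h(x) = x
    Hhat = Wo_inv @ H @ Wb           # reduces to the identity for this h
    Kl = Pfl @ Hhat.T @ np.linalg.inv(Hhat @ Pfl @ Hhat.T + Rl)   # Eq. (47)
    ln_xa = np.log(xb) + Kl @ (np.log(y) - np.log(xb))            # Eq. (46)
    Pal = (np.eye(n) - Kl @ Hhat) @ Pfl                           # Eq. (48)
    return np.exp(ln_xa), Pal

xb = np.array([2.0, 5.0, 1.5])       # background state (all positive)
y = np.array([2.2, 4.5, 1.7])        # direct observations (all positive)
Pfl = 0.10 * np.eye(3)               # covariance of ln background error
Rl = 0.05 * np.eye(3)                # covariance of ln observation error
xa, Pal = lognormal_analysis(xb, y, Pfl, Rl)
print(xa)                            # analysis median; strictly positive
```

By construction the update is additive in $\ln x$, so the exponentiated analysis state can never be negative or zero.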
As shown with the development of the lognormal forms of variational data assimilation, we do not live in a world with only one type of distribution, and as such the next step is to combine the lognormal Kalman filter just derived with the Gaussian Kalman filter, using the mixed Gaussian–lognormal distribution from Fletcher and Zupanski (2006b), to form a second non-Gaussian-based Kalman filter system.

c. Mixed Gaussian–lognormal Kalman filter

In this section we shall refer to the mixed Gaussian–lognormal Kalman filter as MXKF. The starting point for the derivation of the MXKF is the definition of the associated background, observational, model, and analysis errors. As we are assuming that the error that is to be minimized is from a mixed distribution, there are a set of Gaussian distributed errors and a set of lognormally distributed errors that need to be minimized simultaneously. This then implies that the true, background, and analysis states are given by
$$x_t^{mx} \equiv \begin{pmatrix} x_{tG} \\ x_{tl} \end{pmatrix}, \qquad x_b^{mx} \equiv \begin{pmatrix} x_{bG} \\ x_{bl} \end{pmatrix}, \qquad x_a^{mx} \equiv \begin{pmatrix} x_{aG} \\ x_{al} \end{pmatrix}, \tag{50}$$
where $G$ represents the Gaussian distributed random variables and $l$ represents the lognormally distributed random variables, which implies that the associated mixed distributed errors are given by
$$\varepsilon_b^{mx,n} \equiv \begin{pmatrix} x_{bG}^n - x_{tG}^n \\ \ln x_{bl}^n - \ln x_{tl}^n \end{pmatrix}, \quad \varepsilon_a^{mx,n} \equiv \begin{pmatrix} x_{aG}^n - x_{tG}^n \\ \ln x_{al}^n - \ln x_{tl}^n \end{pmatrix}, \quad \varepsilon_o^{mx} \equiv \begin{pmatrix} y_G - h_G(x_t^{mx,n}) \\ \ln y_l - \ln h_l(x_t^{mx,n}) \end{pmatrix}, \quad \varepsilon_m^{mx} \equiv \begin{pmatrix} [M(x_t^{mx,n-1})]_G - x_{tG}^n \\ \ln[M(x_t^{mx,n-1})]_l - \ln x_{tl}^n \end{pmatrix}. \tag{51}$$
It should be noted here that the number of Gaussian observation errors will, in most circumstances, be less than the dimension of the Gaussian component of the true state. This is also true for the lognormally distributed errors. Finally, there may not be an equal number of Gaussian and lognormal background, or observational, errors.
The mixed distribution-based forecast error covariance matrix can be shown to be
$$\mathbf{P}_f^{mx,n} \equiv \begin{pmatrix} M(\varepsilon_{aG}^{n-1}) \\ \ln M(\varepsilon_{al}^{n-1}) \end{pmatrix}\begin{pmatrix} M(\varepsilon_{aG}^{n-1}) \\ \ln M(\varepsilon_{al}^{n-1}) \end{pmatrix}^{\mathrm{T}}, \tag{52}$$
where it can clearly be seen that there are covariances between the Gaussian and the lognormal forecast errors.
The next step is to define the equivalent cost function from the lognormal median approach to find the median of the mixed distribution, which is given by
$$J_{mx}(x) = \frac{1}{2}\begin{pmatrix} x_G - x_{bG} \\ \ln x_l - \ln x_{bl} \end{pmatrix}^{\mathrm{T}}(\mathbf{P}_f^{mx})^{-1}\begin{pmatrix} x_G - x_{bG} \\ \ln x_l - \ln x_{bl} \end{pmatrix} + \frac{1}{2}\begin{pmatrix} y_G - h_G(x) \\ \ln y_l - \ln h_l(x) \end{pmatrix}^{\mathrm{T}}\mathbf{R}_{mx}^{-1}\begin{pmatrix} y_G - h_G(x) \\ \ln y_l - \ln h_l(x) \end{pmatrix}, \tag{53}$$
where the gradient of (53) can be shown to be
$$\nabla_x J_{mx}(x) = \tilde{\mathbf{W}}_b^{-\mathrm{T}}(\mathbf{P}_f^{mx})^{-1}\begin{pmatrix} x_G - x_{bG} \\ \ln x_l - \ln x_{bl} \end{pmatrix} - \mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\begin{pmatrix} y_G - h_G(x) \\ \ln y_l - \ln h_l(x) \end{pmatrix}, \tag{54}$$
and the associated Hessian and scaled Hessian matrices are given by
$$\nabla_x^2 J_{mx}(x) = \tilde{\mathbf{W}}_b^{-\mathrm{T}}(\mathbf{P}_f^{mx})^{-1}\tilde{\mathbf{W}}_b^{-1} + \mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\tilde{\mathbf{W}}_o^{-1}\mathbf{H}, \qquad \tilde{\mathbf{W}}_b^{\mathrm{T}}\nabla_x^2 J_{mx}(x)\tilde{\mathbf{W}}_b = (\mathbf{P}_f^{mx})^{-1} + \tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b, \tag{55}$$
where
$$\tilde{\mathbf{W}}_b^{-1} \equiv \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathrm{diag}(x_{l1}^{-1}, \ldots, x_{lN}^{-1}) \end{pmatrix}, \qquad \tilde{\mathbf{W}}_o^{-1} \equiv \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathrm{diag}\{[h_{l1}(x)]^{-1}, \ldots, [h_{lN_o}(x)]^{-1}\} \end{pmatrix}.$$
The next stage is to form the analysis errors in terms of the background state combined with a weighting of the observations. Thus, for the mixed distribution approach this should be of the following form:
$$\begin{pmatrix} x_{aG} - x_{bG} \\ \ln x_{al} - \ln x_{bl} \end{pmatrix} = \mathbf{K}_{mx}\begin{pmatrix} y_G - h_G(x_a) \\ \ln y_l - \ln h_l(x_a) \end{pmatrix}. \tag{56}$$
As with the lognormal derivation, we introduce the geometric tangent linear approximations to the lognormal observation operators, and the additive tangent linear model to the Gaussian observation operators, into the gradient of the mixed distribution cost function and set it to zero. This results in
$$0 = \tilde{\mathbf{W}}_b^{-\mathrm{T}}(\mathbf{P}_f^{mx})^{-1}\begin{pmatrix} x_{aG} - x_{bG} \\ \ln x_{al} - \ln x_{bl} \end{pmatrix} - \mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\left[\begin{pmatrix} y_G - h_G(x_b) \\ \ln y_l - \ln h_l(x_b) \end{pmatrix} - \tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b\begin{pmatrix} x_{aG} - x_{bG} \\ \ln x_{al} - \ln x_{bl} \end{pmatrix}\right]. \tag{57}$$
Factorizing (57) results in
$$[(\mathbf{P}_f^{mx})^{-1} + \tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b]\begin{pmatrix} x_{aG} - x_{bG} \\ \ln x_{al} - \ln x_{bl} \end{pmatrix} = \tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\begin{pmatrix} y_G - h_G(x_b) \\ \ln y_l - \ln h_l(x_b) \end{pmatrix}. \tag{58}$$
Therefore, for the mixed distribution approach one finds
$$\begin{pmatrix} x_{aG} - x_{bG} \\ \ln x_{al} - \ln x_{bl} \end{pmatrix} = [(\mathbf{P}_f^{mx})^{-1} + \tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b]^{-1}\tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\begin{pmatrix} y_G - h_G(x_b) \\ \ln y_l - \ln h_l(x_b) \end{pmatrix}. \tag{59}$$
Thus, the Kalman gain matrix for the mixed Gaussian–lognormal approach is
$$\mathbf{K}_{mx} \equiv [(\mathbf{P}_f^{mx})^{-1} + \tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b]^{-1}\tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}. \tag{60}$$
Through applying the Sherman–Morrison–Woodbury formula twice it is possible to write (60) in the more usable form
$$\mathbf{K}_{mx} \equiv \mathbf{P}_f^{mx}\tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}[\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b\mathbf{P}_f^{mx}\tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}} + \mathbf{R}_{mx}]^{-1}, \tag{61}$$
where the proof of the expression above can be found through using the derivation on page 755 of Fletcher (2017), and the proof of positive definiteness follows from the arguments in the lognormal Kalman filter analysis error derivation.
As with the lognormal approach, we introduce some notation to simplify the appearance of the derivation for the analysis error covariance matrix. We denote $\tilde{\mathbf{H}} \equiv \tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b$. Using the definition of the analysis errors for the mixed distribution from (51), the different forms of tangent linear approximations presented earlier, as well as the standard version from the Gaussian formulation, results in the mixed distribution analysis errors of the following form:
$$\begin{pmatrix} \varepsilon_{aG} \\ \ln\varepsilon_{al} \end{pmatrix} = (\mathbf{I} - \mathbf{K}_{mx}\tilde{\mathbf{H}})\begin{pmatrix} \varepsilon_{bG} \\ \ln\varepsilon_{bl} \end{pmatrix} + \mathbf{K}_{mx}\begin{pmatrix} \varepsilon_{oG} \\ \ln\varepsilon_{ol} \end{pmatrix}. \tag{62}$$
Thus, forming the product of the analysis error vector with its transpose and taking the expectation results in
$$\mathbf{P}_a^{mx} = (\mathbf{I} - \mathbf{K}_{mx}\tilde{\mathbf{H}})\mathbf{P}_f^{mx}(\mathbf{I} - \mathbf{K}_{mx}\tilde{\mathbf{H}})^{\mathrm{T}} + \mathbf{K}_{mx}\mathbf{R}_{mx}\mathbf{K}_{mx}^{\mathrm{T}}, \tag{63}$$
where following the same arguments as for the Gaussian and lognormal cases results in the analysis error covariance matrix of the following form:
$$\mathbf{P}_a^{mx} = (\mathbf{I} - \mathbf{K}_{mx}\tilde{\mathbf{H}})\mathbf{P}_f^{mx}. \tag{64}$$
The final step is to confirm that the inverse of the analysis error covariance matrix is equivalent to the scaled Hessian of (53) in (55), which can easily be shown as the expression above is the same in appearance as the standard Gaussian version and the lognormal version from the last section. Therefore, the analysis error covariance matrix for the mixed distribution approach is given by
$$\mathbf{P}_a^{mx} = [(\mathbf{P}_f^{mx})^{-1} + \tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}\mathbf{R}_{mx}^{-1}\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b]^{-1}. \tag{65}$$
Thus, in summary, the mixed Gaussian–lognormal-based Kalman filter equations are given by
$$\mathbf{P}_f^{mx,n} = \begin{pmatrix} M(\varepsilon_{aG}^{n-1}) \\ \ln M(\varepsilon_{al}^{n-1}) \end{pmatrix}\begin{pmatrix} M(\varepsilon_{aG}^{n-1}) \\ \ln M(\varepsilon_{al}^{n-1}) \end{pmatrix}^{\mathrm{T}} + \mathbf{Q}_{mx}^n, \tag{66}$$
$$\begin{pmatrix} x_{aG}^n \\ \ln x_{al}^n \end{pmatrix} = \begin{pmatrix} x_{bG}^n \\ \ln x_{bl}^n \end{pmatrix} + \mathbf{K}_{mx}^n\begin{pmatrix} y_G - h_G(x_b^n) \\ \ln y_l - \ln h_l(x_b^n) \end{pmatrix}, \tag{67}$$
$$\mathbf{K}_{mx}^n = \mathbf{P}_f^{mx,n}\tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}}[\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b\mathbf{P}_f^{mx,n}\tilde{\mathbf{W}}_b^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\tilde{\mathbf{W}}_o^{-\mathrm{T}} + \mathbf{R}_{mx}]^{-1}, \tag{68}$$
$$\mathbf{P}_a^{mx,n} = (\mathbf{I} - \mathbf{K}_{mx}^n\tilde{\mathbf{W}}_o^{-1}\mathbf{H}\tilde{\mathbf{W}}_b)\mathbf{P}_f^{mx,n}, \tag{69}$$
$$\begin{pmatrix} x_{bG}^n \\ x_{bl}^n \end{pmatrix} = M\begin{pmatrix} x_{aG}^{n-1} \\ x_{al}^{n-1} \end{pmatrix}. \tag{70}$$
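To show how the block structure of (66)–(70) works in practice, the sketch below performs one MXKF analysis update for a two-component toy state, one Gaussian and one lognormal, both observed directly (so $\mathbf{H} = \mathbf{I}$); NumPy is assumed and all values are invented for illustration:

```python
# One MXKF analysis step, Eqs. (67)-(69), for a (Gaussian, lognormal) state.
import numpy as np

xbG = np.array([1.0])                 # Gaussian background component
xbl = np.array([3.0])                 # lognormal background component
yG, yl = np.array([1.2]), np.array([2.5])
Pf = 0.2 * np.eye(2)                  # mixed forecast error covariance
R = 0.1 * np.eye(2)                   # mixed observation error covariance
H = np.eye(2)                         # direct observations of both components

# Block-diagonal W-tilde matrices: identity on the Gaussian part,
# x (or 1/h(x)) on the lognormal part, as defined after (55).
Wb = np.diag(np.concatenate([np.ones(1), xbl]))
Wo_inv = np.diag(np.concatenate([np.ones(1), 1.0 / xbl]))
Htilde = Wo_inv @ H @ Wb

K = Pf @ Htilde.T @ np.linalg.inv(Htilde @ Pf @ Htilde.T + R)   # Eq. (68)
innov = np.concatenate([yG - xbG, np.log(yl) - np.log(xbl)])
update = K @ innov                                              # Eq. (67)
xaG = xbG + update[:1]                  # additive update, Gaussian part
xal = np.exp(np.log(xbl) + update[1:])  # multiplicative update, lognormal part
Pa = (np.eye(2) - K @ Htilde) @ Pf                              # Eq. (69)
print(xaG, xal)
```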

In this section it has been shown that it is possible to derive a nonlinear version of the Kalman filter equations to be used with lognormal random variables, as well as with a combination of lognormal and Gaussian random variables. The appearance of the set of equations is similar to the Gaussian form, but the evolution of the analysis error covariance matrix is exact and is not obtained through the application of the linearized model.

In the next section the mixed Gaussian–lognormal Kalman filter equations will be tested against the original linearized Gaussian Kalman filter equations, referred to as the extended Kalman filter (EKF) with the Lorenz 1963 model. It has been shown in the development and testing of the mixed Gaussian–lognormal variational data assimilation systems that the z component of this model is highly non-Gaussian, and as such is a good test to assess the performance of these new equations.

4. Experiments with the Lorenz 1963 model

As just mentioned, the Lorenz 1963 model has been used extensively to test the development of the mixed Gaussian–lognormal-based variational data assimilation systems. An important feature of the Lorenz 1963 model is that there are regions of the Lorenz attractor where the z component does not follow a Gaussian distribution, as has been shown in Fletcher (2010) and Goodliff et al. (2020). This component is always positive, while the x and y components take positive and negative values. In Fletcher (2010) there are four climatologies of the z component created from the Lorenz 1963 model with the initial conditions that we shall define shortly, where it was clear that after 100 000 time steps this component appeared to have a global mode with a skewness to the left, and a secondary mode with a smaller occurrence rate than the global mode. When fitting a lognormal distribution to these data, the global mode is very well captured with a slight underestimation of the secondary mode. When a Gaussian distribution was fitted to these data, both modes were underestimated; the Gaussian mode was in between the two modes and assigned higher probabilities to states that did not occur that often. See Fig. 21.3 in Fletcher (2017) for this example.

This model is also a good choice due to its simplicity for a dynamical model that exhibits chaotic behavior. Another important property of this model is that it is very sensitive to the initial conditions from which it starts, and as such can give very different answers even when out by a few decimal places from the true state. For an example of this sensitivity see Fletcher (2017). The continuous model equations are given by
$$\frac{dx}{dt} = \sigma(y - x), \tag{71}$$
$$\frac{dy}{dt} = \rho x - y - zx, \tag{72}$$
$$\frac{dz}{dt} = xy - \beta z, \tag{73}$$
where $x = x(t)$, $y = y(t)$, and $z = z(t)$ are the state variables, and $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$ are parameters.

The experiments will compare the analysis errors from the EKF, which uses the linearized model, against those from the MXKF, which uses the full nonlinear model. The two filters will be tested with different observational errors, where it is assumed that the x and y components have Gaussian errors and the z component has lognormal errors. The observations are generated with different observational error variances, and with different times between analysis updates, to determine the robustness of the new approach.

In this section we shall look at the sensitivity of the EKF and the MXKF to both the observational error variance size and the time between observations. The numerical scheme used for the discretization of the nonlinear model is the second-order explicit Runge–Kutta scheme. The MXKF utilizes the nonlinear numerical model for all three components, while the EKF utilizes a linearized version of the model, which was calculated analytically and then discretized with the same scheme as the nonlinear model. The adjoint model was also calculated by hand. This is all coded in MATLAB. See Fletcher (2017) for more details on this calculation.
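For reference, a minimal sketch of the nonlinear model integration follows (Python with NumPy here, rather than the MATLAB used in the study; the time step is an illustrative choice), using the Heun form of the second-order explicit Runge–Kutta scheme named above:

```python
# Lorenz 1963 model, Eqs. (71)-(73), advanced with a second-order
# explicit Runge-Kutta (Heun) step.
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz63(s):
    x, y, z = s
    return np.array([SIGMA * (y - x), RHO * x - y - z * x, x * y - BETA * z])

def rk2_step(s, dt):
    k1 = lorenz63(s)
    k2 = lorenz63(s + dt * k1)
    return s + 0.5 * dt * (k1 + k2)

s = np.array([-5.4458, -5.4841, 22.5606])  # true initial conditions (below)
dt = 0.01                                  # assumed step size
for _ in range(1000):
    s = rk2_step(s, dt)
print(s)                                   # z remains positive on the attractor
```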

We shall consider four different configurations in this section: 1) σo = 0.5, with 50 time steps between observations; 2) σo = 2, with 25 time steps between observations; 3) σo = 0.25, with 200 time steps between observations; and 4) σo = 1, with 100 time steps between observations, where σo is the observational error standard deviation.

The true solution is started from the initial conditions $x_0^t = -5.4458$, $y_0^t = -5.4841$, and $z_0^t = 22.5606$, while the background solution starts from $x_0^b = -5.9$, $y_0^b = -5.0$, and $z_0^b = 24.0$. These are the same set of initial conditions that have been used with the mixed Gaussian–lognormal variational data assimilation schemes. However, an extra feature that is used in these experiments, and is the same for both the EKF and the MXKF, is an approximation for the model error term. We use the values that are suggested in Evensen and Fabio (1997):
$$\mathbf{Q} \approx \begin{pmatrix} 0.1491 & 0.1505 & 0.0007 \\ 0.1505 & 0.9048 & 0.0014 \\ 0.0007 & 0.0014 & 0.9180 \end{pmatrix}. \tag{74}$$

a. Experiment 1: σo = 0.5, 50 time steps between observations

In Fig. 2 we have two sets of plots: the first is of the z and x trajectories for the true states (red lines), the solution from the MXKF (blue lines), and the solution from the EKF (black lines), along with the observations (green circles). The second set of plots is of the z and x errors, where for the z component we define the error as the ratio, while for the x component the error is defined as the difference.

Fig. 2. (left) The z and x true (red), mixed Gaussian–lognormal Kalman filter (MXKF) (blue), and extended Kalman filter (EKF) (black) solutions, with observations (green circles). (right) The $z_a/z_t$ error and $x_a - x_t$ error plots for the analysis from the MXKF (solid black) and the EKF (dashed black) for σo = 0.5 with 50 time steps between observations.

It is clear from the trajectory plots in Fig. 2 that the observations are not that accurate, but are frequent. Both solutions appear drawn toward the observations, in that the error increases when the less accurate observations are assimilated compared to the more accurate ones. However, when we consider the error plots we can see that both approaches are affected by the less than perfect observations, but the MXKF solution is more consistent for both the x and z components, as measured by the z error being close to one and the x error being close to zero, with the MXKF scheme recovering more quickly.

b. Experiment 2: σo = 2, 25 time steps between observations

In this section we consider the case where we have more observations than in experiment 1, but these observations are less accurate. These results are presented in Fig. 3, in the same configuration as in experiment 1. We can see that, while there are some quite inaccurate observations, the solutions from both approaches do not go out of phase from the true solution. Again it is clear that the MXKF approach stays more consistent than the EKF solution.

Fig. 3. (left) The z and x true (red), MXKF (blue), and EKF (black) solutions, with observations (green circles). (right) The $z_a/z_t$ error and $x_a - x_t$ error plots for the analysis from the MXKF (solid black) and the EKF (dashed black) for σo = 2 with 25 time steps between observations.

c. Experiment 3: σo = 0.25, with 200 time steps between observations

In this experiment we consider the case where we have fewer observations, but they are quite accurate. These results are presented in Fig. 4. As expected there is a decrease in the accuracy of both approaches, but neither of the solutions goes out of phase. We again see that the MXKF produces a more consistent solution than the EKF for both the x and z components.

Fig. 4. (left) The z and x true (red), MXKF (blue), and EKF (black) solutions, with observations (green circles). (right) The $z_a/z_t$ error and $x_a - x_t$ error plots for the analysis from the MXKF (solid black) and the EKF (dashed black) for σo = 0.25 with 200 time steps between observations.

d. Experiment 4: σo = 1, 100 time steps between observations

The results from this experiment are presented in Fig. 5, where we can see the situation where the EKF does go out of phase with the true solution, even moving onto the wrong attractor for a short while, before assimilating additional observations that bring it back toward the true solution. However, for this configuration we see that the MXKF approach does not go out of phase in either the z or the x component to the extreme that the EKF solution does, and appears to better assimilate the observations each time.

Fig. 5. (left) The z and x true (red), MXKF (blue), and EKF (black) solutions, with observations (green circles). (right) The $z_a/z_t$ error and $x_a - x_t$ error plots for the analysis from the MXKF (solid black) and the EKF (dashed black) for σo = 1 with 100 time steps between observations.

e. Robustness testing 1: Observational error standard deviations and observational frequency

In this subsection we present results from running experiments 1–4 with 5000 different random draws from the observational error distribution to test the robustness of the MXKF. To determine the robustness we calculate the analysis error as a lognormally distributed random variable, i.e., the ratio of the analysis to the true state, for all 5000 solutions from the MXKF and the EKF, and from these errors we calculate the average minimum error and the average maximum error for each filter. These results are summarized in Table 1 to highlight the spread from the average minimum analysis error to the average maximum analysis error for the MXKF and the EKF.
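The metric just described is straightforward to compute; the sketch below (NumPy assumed; `ratios` is a hypothetical placeholder array of analysis-to-truth ratios, of shape runs by analysis times) averages the per-run minimum and maximum errors:

```python
# Robustness metric: average of per-run min and max analysis-error ratios.
import numpy as np

# Placeholder data standing in for z_a / z_t from 5000 assimilation runs.
ratios = np.random.default_rng(4).lognormal(0.0, 0.1, size=(5000, 200))

avg_min = ratios.min(axis=1).mean()
avg_max = ratios.max(axis=1).mean()
print(avg_min, avg_max)   # both near 1 for a well-performing scheme
```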

Table 1. Summary of the average minimum and maximum analysis errors for experiments 1–4 over 5000 assimilation runs.

We note here that during the 5000 evaluations using the experiment 3 and experiment 4 configurations there was one instance for each where the MXKF did not converge. That aside, given that the analysis error is a ratio, if the scheme is performing well then the analysis error should be approximately equal to 1, as seen in the results in Fletcher and Jones (2014).

From the values in Table 1 it is clear that the MXKF, on average for the situations considered here, has a smaller spread between the average maximum and average minimum analysis error for all four experiments, bearing in mind the caveat above. It appears that the scheme performs best on average for the situation where there are inaccurate but more frequent observations (experiment 2). The largest spread for the MXKF appears to be for experiment 3: accurate but less frequent observations.

f. Robustness testing 2: Perturbing the true and background states’ initial conditions

In the results presented here, experiment 1's observational error standard deviation and frequency were used, but the initial conditions for the true state and the background state were randomly perturbed using the MATLAB function NORMRND with mean zero and three different standard deviations, σp = 0.1, 0.5, 1. Different perturbations were applied to the true initial conditions and the background initial conditions from those presented at the beginning of this section, but they were drawn from the same distribution. The same performance metrics as for robustness testing 1 were applied here and are summarized in Table 2.

Table 2. Summary of the average minimum and maximum analysis errors for perturbing the true and background initial conditions over 5000 assimilation runs using experiment 1's observational configuration.

From Table 2 it is clear that there is a sensitivity in the MXKF to the initial conditions for the true and background states. It should be noted that the MXKF had an approximately 1% failure rate for all three configurations, while the EKF did not fail. As with robustness testing 1, the MXKF has a smaller spread between its average maximum and minimum analysis error compared to the EKF when it converged.

5. Conclusions and further work

In this paper we have shown that it is not possible to follow the linear least squares approach that is used to derive the Kalman filter and extended Kalman filter (EKF) equations to derive a similar expression for lognormally distributed errors. However, we have shown that if we keep the nonlinear model and follow a cost function–based approach associated with the median from Fletcher (2010), then it is possible to derive a set of nonlinear equations for the update of the median of the lognormal analysis state together with its uncertainty. We were able to extend this to the mixed Gaussian–lognormal probability density function, where the associated Kalman filter equations are referred to as the MXKF.

We coded the new MXKF, along with the EKF, for the Lorenz 1963 model in MATLAB and showed that, for different configurations of observational error variances and time steps between observations, where the observational errors for the x and y components were Gaussian distributed while those for the z component were lognormally distributed, the MXKF appeared to be more consistent with the true solutions for longer periods than the EKF. We should note that the EKF used the linearized numerical model, while the MXKF used the nonlinear numerical model. This appears to affect the performance of the EKF compared to the MXKF, in that the EKF appears to fit more to the observations, while the MXKF does not always pull straight toward the observations.

To evaluate the general performance of the MXKF against the EKF, a set of 5000 assimilation experiments was run for each of the four experimental configurations from section 4. It was shown that the MXKF had a smaller spread between the average minimum and average maximum analysis error for all four experimental configurations, but we note that there was one realization each for experiments 3 and 4 where the MXKF did not converge. This robustness test was followed up with a sensitivity study of the MXKF and the EKF to perturbed true state and background state initial conditions.

It has been shown in Fletcher and Jones (2014) that lognormal-based data assimilation systems can be quite sensitive to the accuracy of the observations of the different components of the Lorenz 1963 model near the transition zones between the two attractors; it is possible that the lognormal Kalman filter is also suffering from this here, and it is left for further work to determine if this is the case.

The next step in this work is to build the theory for an ensemble-based approach to the MXKF equations and rigorously test it with different toy problems. Given the nonlinear nature of the equations, and the dependence on the equations being derived from a cost function, the most likely candidate would be the maximum likelihood ensemble filter (MLEF) from Zupanski (2005). The MLEF comprises two steps: the forecast step uses the standard definition of the update for the forecast error covariance matrix from the Kalman filter, but uses the nonlinear model to evolve this matrix between analysis times, instead of a linear model, through an ensemble where each ensemble member's perturbation is a column of the square root of the analysis error covariance matrix from that assimilation cycle. The analysis step solves a flow-dependent 3DVAR cost function projected into ensemble space through a Hessian preconditioner. The square root analysis error covariance is updated through the inversion of the Hessian preconditioner. These steps are easily adaptable to the new MXKF equations for the updates of the analysis and forecast error covariance matrices.

The motivation behind the non-Gaussian work of the last 15 years has been to develop more consistent data assimilation systems for positive definite variables. In the atmosphere it is well known that relative humidity is positive definite, that is to say it is always greater than zero, and as such we do not wish a data assimilation system to produce a value for this field that is negative or zero. It was shown in Kliewer et al. (2016) that using a mixed Gaussian–lognormal 1DVAR for a temperature–mixing ratio retrieval makes it possible to obtain better fits to both the temperature and the moisture channels through the covariances between the Gaussian and lognormal random variables. A full description of the links between the two distributions can be found in Fletcher (2017).

As most operational numerical weather prediction centers use a form of hybrid ensemble–variational data assimilation algorithm, it has become important for the mixed Gaussian–lognormal theory to move toward that approach. The major stumbling block, however, has been the Kalman filter component needed to create the ensemble covariance. The work in this paper is the first step toward a Gaussian–lognormal hybrid 4DVAR.

Acknowledgments.

National Science Foundation Grant AGS-1738206 at CIRA/CSU supported authors 1, 3, 4, 5, and 6, while authors 2 and 7 were supported by NOAA's Hurricane Forecast Improvement Program Award NA18NWS4680059. Funding for authors 1, 8, and 9 came from National Science Foundation Grant AGS-2033405 at CIRA/CSU. We are grateful for the extremely helpful comments of the three anonymous reviewers, and in particular to reviewer 2 for the proof of the positive definiteness of the different analysis error covariance matrices.

Data availability statement.

The code that supports the findings of this study is openly available at https://mountainscholar.org/handle/10217/234474.

REFERENCES

  • Aseev, N. A., and Y. Y. Shprits, 2019: Reanalysis of ring current electron phase space densities using Van Allen Probe observations, convection model, and log-normal Kalman filter. Space Wea., 17, 619–638, https://doi.org/10.1029/2018SW002110.
  • Cohn, S. E., 1997: An introduction to estimation error theory. J. Meteor. Soc. Japan, 75, 257–288, https://doi.org/10.2151/jmsj1965.75.1B_257.
  • Evensen, G., and N. Fabio, 1997: Solving for the generalized inverse of the Lorenz model. J. Meteor. Soc. Japan, 75, 229–243, https://doi.org/10.2151/jmsj1965.75.1B_229.
  • Fletcher, S. J., 2010: Mixed lognormal-Gaussian four-dimensional data assimilation. Tellus, 62A, 266–287, https://doi.org/10.1111/j.1600-0870.2010.00439.x.
  • Fletcher, S. J., 2017: Data Assimilation for the Geosciences: From Theory to Applications. Elsevier, 976 pp.
  • Fletcher, S. J., and M. Zupanski, 2006a: A data assimilation method for log-normally distributed observational errors. Quart. J. Roy. Meteor. Soc., 132, 2505–2519, https://doi.org/10.1256/qj.05.222.
  • Fletcher, S. J., and M. Zupanski, 2006b: A hybrid multivariate normal and lognormal distribution for data assimilation. Atmos. Sci. Lett., 7, 43–46, https://doi.org/10.1002/asl.128.
  • Fletcher, S. J., and M. Zupanski, 2007: Implications and impacts of transforming lognormal variables into normal variables in VAR. Meteor. Z., 16, 755–765, https://doi.org/10.1127/0941-2948/2007/0243.
  • Fletcher, S. J., and A. S. Jones, 2014: Multiplicative and additive incremental variational data assimilation for mixed lognormal-Gaussian errors. Mon. Wea. Rev., 142, 2521–2544, https://doi.org/10.1175/MWR-D-13-00136.1.
  • Goodliff, M., S. Fletcher, A. Kliewer, J. Forsythe, and A. Jones, 2020: Detection of non-Gaussian behavior using machine learning techniques: A case study on the Lorenz 63 model. J. Geophys. Res. Atmos., 125, e2019JD031551, https://doi.org/10.1029/2019JD031551.
  • Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. J. Basic Eng., 82, 35–45, https://doi.org/10.1115/1.3662552.
  • Kalman, R. E., and R. S. Bucy, 1961: New results in linear filtering and prediction theory. J. Basic Eng., 83, 95–108, https://doi.org/10.1115/1.3658902.
  • Kliewer, A. J., S. J. Fletcher, A. S. Jones, and J. M. Forsythe, 2016: Comparison of Gaussian, logarithmic transform and mixed Gaussian-log-normal distribution based 1DVAR microwave temperature-water vapour mixing ratio retrievals. Quart. J. Roy. Meteor. Soc., 142, 274–286, https://doi.org/10.1002/qj.2651.
  • Kondrashov, D., M. Ghil, and Y. Shprits, 2011: Lognormal Kalman filter for assimilating phase space density data in the radiation belts. Space Wea., 9, S11006, https://doi.org/10.1029/2011SW000726.
  • Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
  • Zupanski, M., 2005: Maximum likelihood ensemble filter. Part I: Theoretical aspects. Mon. Wea. Rev., 133, 1710–1726, https://doi.org/10.1175/MWR2946.1.
  • Fig. 1. (left) Illustration of the differences in the modes, medians, and means for two lognormal distributions representing the true state's distribution (solid curve) and the analysis state's distribution (dashed curve). (right) The distribution of the associated analysis error εa = xa/xt.

  • Fig. 2. (left) z and x: truth (red), the mixed Gaussian–lognormal Kalman filter (MXKF; blue), the extended Kalman filter (EKF; black), and observations (green circles). (right) za/zt error and xa − xt error for the analyses from the MXKF (solid black) and the EKF (dashed black), for σo = 0.5 with 50 time steps between observations.

  • Fig. 3. As in Fig. 2, but for σo = 2 with 25 time steps between observations.

  • Fig. 4. As in Fig. 2, but for σo = 0.25 with 200 time steps between observations.

  • Fig. 5. As in Fig. 2, but for σo = 0.5 with 50 time steps between observations.
