1. Introduction
Over the last decade or so, there have been advances in the variational (VAR) forms of data assimilation that allow for non-Gaussian behavior of the errors; specifically, there has been much development allowing for lognormally distributed errors, as well as for combinations of Gaussian and lognormal errors that can be minimized simultaneously (Fletcher and Zupanski 2006a,b, 2007; Fletcher 2010; Fletcher and Jones 2014). A full summary of the development of the mixed Gaussian–lognormal variational systems, from full field 3DVAR to incremental 4DVAR, can be found in Fletcher (2017).
The aforementioned theory has been applied in a microwave retrieval system for temperature and mixing ratio from AMSU-A brightness temperatures (Kliewer et al. 2016). There, three approaches are compared: 1) the mixed distribution approach, which seeks the mode, or most likely state, of the analysis distribution; 2) a logarithmic transform applied to the mixing ratio, which seeks the median of the analysis distribution (Fletcher and Zupanski 2007); and 3) a Gaussian model for the errors of temperature and mixing ratio. It is shown that the mixed distribution approach fits the observations better than the other two approaches.
However, extending the lognormal theory to the Kalman filters, and hence to ensemble-based data assimilation systems, has proven elusive. The basis of the lognormal variational work was provided in the seminal paper of Cohn (1997), which contains the first definition of a lognormally distributed observational error associated with a direct observation, expressed as a ratio rather than a difference. This definition, together with a change of variables for the background state, allowed for a version of the Kalman equations that accounts for direct observations with lognormal errors. The reader is referred to Cohn (1997) for the full details of this derivation. The work presented here differs from Cohn (1997) in two ways: 1) we allow for nonlinear observation operators in the Kalman filter equations, and 2) we allow for a combination of Gaussian and lognormally distributed background and observational errors.
The equations for the Kalman filter (Kalman 1960; Kalman and Bucy 1961) can be derived from either a control theory approach or a least squares formulation (Fletcher 2017), where the latter refers to minimizing the trace of the analysis error covariance matrix with respect to the Kalman gain matrix. The Kalman filter is therefore seeking the mean of the analysis distribution to minimize the errors. For a Gaussian distribution, the mean, mode, and median are equivalent. However, for skewed distributions the three descriptive statistics are not equal: for right-skewed distributions, the mode is less than the median, which is less than the mean, whereas for left-skewed distributions the opposite is true. In section 2, we show that if one tries to follow the initial steps of the least squares approach, it is impossible to derive a lognormal-based Kalman filter because one cannot separate the analysis and forecast error covariance matrices from the logarithmic operator in order to evolve them by the linear model. Another stumbling block is that it is not possible to define a weighted sum of predicted states and new observations to estimate the state at the current time while remaining consistent with a lognormal estimate.
We shall briefly summarize an alternative approach, referred to as the lognormal Kalman filter (Kondrashov et al. 2011), where a logarithmic transform is introduced for a model variable and the analytical differential equation is re-derived for the new variable. This approach exploits the property that the logarithm of a lognormally distributed random variable is a Gaussian distributed random variable, so that the Kalman filter can be applied to the new variable; an inverse transform then recovers a value of the original model variable. This approach would not be practical for a numerical weather or ocean prediction model because it requires re-deriving the associated prognostic differential equations for the state in the new variable.
Given these setbacks, we recall that the Kalman filter is equivalent to 3DVAR when the static background error covariance matrix is replaced with a flow-dependent background error covariance matrix. We therefore examine a cost function–based approach: we obtain the lognormal analysis state in terms of a lognormal-based Kalman gain matrix and take the expectation of this state to obtain an expression that is similar to the Gaussian-based analysis error covariance matrix. We say “similar” in that the cost function will be defined to obtain the median, or unbiased state, of the analysis distribution. We should note here that the lognormal distribution is defined in terms of the vector of means μ and the covariance matrix Σ of ln x, not of the vector of random variables x. We shall show that the lognormal-based analysis error covariance matrix is equivalent to the inverse Hessian matrix of the associated cost function scaled by the inverse of the derivatives of ln x. We shall also show that the estimate for the lognormal Kalman gain matrix minimizes the trace of the lognormal analysis covariance matrix.
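For orientation, a schematic of a median-seeking lognormal cost function, written here in the form used in Fletcher (2010) for lognormal background and observational errors (the operators in the present derivation may differ in detail), is

$$ J(\mathbf{x}) = \frac{1}{2}\left(\ln\mathbf{x}-\ln\mathbf{x}^{b}\right)^{\mathrm{T}}\mathbf{B}^{-1}\left(\ln\mathbf{x}-\ln\mathbf{x}^{b}\right) + \frac{1}{2}\left(\ln\mathbf{y}-\ln h\left(\mathbf{x}\right)\right)^{\mathrm{T}}\mathbf{R}^{-1}\left(\ln\mathbf{y}-\ln h\left(\mathbf{x}\right)\right), $$

where x^b is the background state, y is the vector of observations, h is the (possibly nonlinear) observation operator, and B and R are the covariance matrices of the logarithmic background and observational errors; minimizing this J, without the additional linear term that defines the modal cost function, yields the median.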
The remainder of this paper is organized as follows. In section 3 we present the derivations of the lognormal and the mixed Gaussian–lognormal versions of the Kalman filter equations. In section 4 we examine the performance of the mixed Gaussian–lognormal Kalman filter equations against the Gaussian extended Kalman filter with the Lorenz 1963 model (Lorenz 1963) for different observational error variances and for different frequencies of observations. Also in this section we test the robustness of the new scheme against the extended Kalman filter over 5000 assimilation runs for the different observational error variance experiments and for 5000 perturbed true and background states. It is assumed in these experiments that the z component of the model is better modeled with a lognormal distribution than with a Gaussian. The paper finishes with conclusions and ideas for future work.
2. Difficulties with lognormal-based Kalman filters
a. Statistical derivation of the forecast error covariance matrix
b. Kalman gain matrices
c. Change of variable-based lognormal Kalman filter
As mentioned in the introduction, an alternative approach has been proposed to obtain a form of a lognormal-based Kalman filter in Kondrashov et al. (2011). This approach, which recently has been used for a reanalysis of ring current phase space densities in Aseev and Shprits (2019), should more accurately be referred to as the change-of-variable lognormal Kalman filter. In Kondrashov et al. (2011) the authors state that a striking feature of the radiation belts is that values of observed electron fluxes, and of the modeled phase space density (PSD), vary by several orders of magnitude. The corresponding error distributions are therefore not Gaussian, while standard data assimilation methods, such as the Kalman filter and its various adaptations to large-dimensional and nonlinear problems, are essentially based on least squares minimization of Gaussian errors, as was shown in the last section. Thus these approaches cannot be modified directly to ensure consistency with a lognormal distribution.
As mentioned in Kondrashov et al. (2011), a technique that is quite often used for dealing with lognormal random variables is to exploit the property that the logarithm of a lognormal random variable is a Gaussian random variable. However, as shown in Fletcher and Zupanski (2007), when lognormal random variables are transformed into Gaussian random variables, whether by minimizing the cost function in variational data assimilation or by finding the covariance matrix and the mean state with an ensemble Kalman filter system, the descriptive statistic recovered in lognormal space once the inverse transform is applied is the median state. A further problem with this approach for the ensemble Kalman filter formulation is that the model is expressed in terms of the original lognormal variable and not the transformed logarithmic variable.
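A minimal sketch of this median property, using only base MATLAB (the sample parameters below are illustrative choices, not values from the paper):

```matlab
% Averaging in log space and transforming back recovers the median of a
% lognormal variable, not its mean. Parameters here are illustrative.
rng(1);                                 % reproducibility
mu = log(2); sigma = 0.5;               % parameters of ln(x)
x = exp(mu + sigma*randn(1e6, 1));      % lognormal sample
fprintf('mean          = %.3f\n', mean(x));            % ~ exp(mu + sigma^2/2)
fprintf('median        = %.3f\n', median(x));          % ~ exp(mu) = 2
fprintf('exp(mean log) = %.3f\n', exp(mean(log(x))));  % ~ median, not mean
```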
The motivation for considering a lognormal model for the PSD is justified in Kondrashov et al. (2011) by the observations that this variable is always positive and that its variations, as measured by the standard deviation, generally increase as its mean value increases. Gaussian distributed variables, by contrast, can be negative and have a standard deviation that does not change as the mean changes. Lognormal errors arise when sources of variation accumulate multiplicatively, whereas Gaussian errors arise when these sources are additive.
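This distinction can be illustrated with a short sketch (our construction, not from the paper; sizes and amplitudes are illustrative):

```matlab
% Multiplicative accumulation of many small random factors produces an
% approximately lognormal variable; additive accumulation produces an
% approximately Gaussian one.
rng(2);
n = 500; m = 1e4;                   % factors per sample, number of samples
e = 0.01*randn(n, m);               % small zero-mean disturbances
x_mult = prod(1 + e, 1);            % multiplicative accumulation: skewed right
x_add  = sum(e, 1);                 % additive accumulation: symmetric
histogram(x_mult); hold on; histogram(x_add);   % compare the two shapes
```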
In the next section we present an approach that builds on a cost function for the lognormal median of the posterior distribution for the situation where both the background and observational errors are lognormally distributed. We shall also show that this approach ensures that the derived covariances are consistent with a multivariate lognormal distribution.
3. Lognormal and mixed Gaussian–lognormal Kalman filters
It appears from the attempted derivation in section 2 that it is not possible to obtain a lognormal version of the Kalman filter equations, which are based upon the first two moments of the multivariate Gaussian distribution, by following a least squares approach; doing so would have required showing that the derived Kalman gain matrix minimizes the trace of the analysis error covariance matrix. However, it is important to recall here that for the Gaussian distribution the three descriptive statistics, the mode, median, and mean, are equal. This is not true for the lognormal distribution. It is shown in Fletcher and Zupanski (2006a) that these three statistics are quite different and that each has its own property in distribution theory: the mode is the maximum likelihood state, the median is the unbiased state, and the mean is the minimum variance state. To make this important distinction clearer, we review some of the characteristics of the lognormal distribution before deriving the lognormal Kalman filter.
a. Properties of the lognormal distribution
To help illustrate this point, we have plotted two lognormal distributions representing the probability density functions (PDFs) of the true state (solid curve) and the analysis state (dashed curve), where the two states have the same Gaussian mean, μt = μa = ln 2, but different Gaussian variances.
Fig. 1. (left) Plots illustrating the differences in the modes, medians, and means for two lognormal distributions representing the true state’s distribution (solid curve) and the analysis state’s distribution (dashed curve). (right) The distribution for the associated analysis error.
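A sketch that reproduces the character of the left panel (the variances below are illustrative choices, since the paper’s exact values accompany the figure):

```matlab
% Two lognormal PDFs with the same Gaussian mean mu = log(2), hence the
% same median exp(mu) = 2, but different Gaussian variances (illustrative).
mu = log(2); s1 = 0.35; s2 = 0.60;
x  = linspace(0.01, 8, 800);
lnpdf = @(x, m, s) exp(-(log(x) - m).^2 ./ (2*s^2)) ./ (x .* s * sqrt(2*pi));
plot(x, lnpdf(x, mu, s1), 'k-', x, lnpdf(x, mu, s2), 'k--');
xline(exp(mu), ':');                    % common median at x = 2
legend('true state PDF', 'analysis state PDF', 'common median');
xlabel('x'); ylabel('probability density');
```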
It is clear from the left panel of Fig. 1 that the two distributions have the same median, but that neither their modes nor their means are equal. In the right panel of Fig. 1 we have plotted the associated distribution for the equivalent analysis error.
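This behavior follows from the standard expressions for the three descriptive statistics of a univariate lognormal random variable x with ln x ~ N(μ, σ²):

$$ \operatorname{mode}(x)=e^{\mu-\sigma^{2}},\qquad \operatorname{median}(x)=e^{\mu},\qquad \operatorname{mean}(x)=e^{\mu+\sigma^{2}/2}, $$

so equal values of μ force equal medians, while any difference in σ² separates both the modes and the means, with mode < median < mean whenever σ² > 0.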
In Fletcher and Jones (2014) it is shown that, when following a modal approach for the incremental formulation of mixed Gaussian–lognormal 4DVAR, the analysis error distribution, also referred to as the posterior distribution, had a mode at 1. This indicates that the most likely answer from the data assimilation system was close to the true state.
b. Lognormal Kalman filter—Median-based approach
Returning to (24) we have that the diagonal matrices multiplying
c. Mixed Gaussian–lognormal Kalman filter
In this section it has been shown that it is possible to derive a nonlinear version of the Kalman filter equations to be used with lognormal random variables, as well as with a combination of lognormal and Gaussian random variables. The set of equations is similar in appearance to the Gaussian form, but the evolution of the analysis error covariance matrix is exact and not through the application of the linearized model.
In the next section the mixed Gaussian–lognormal Kalman filter equations will be tested against the original linearized Gaussian Kalman filter equations, referred to as the extended Kalman filter (EKF), using the Lorenz 1963 model. It has been shown in the development and testing of the mixed Gaussian–lognormal variational data assimilation systems that the z component of this model is highly non-Gaussian, and as such it provides a good test to assess the performance of these new equations.
4. Experiments with the Lorenz 1963 model
As just mentioned, the Lorenz 1963 model has been used extensively to test the development of the mixed Gaussian–lognormal-based variational data assimilation systems. An important feature of the Lorenz 1963 model is that there are regions of the Lorenz attractor where the z component does not follow a Gaussian distribution, as has been shown in Fletcher (2010) and Goodliff et al. (2020). This component is always positive, while the x and y components take both positive and negative values. In Fletcher (2010), four climatologies of the z component were created from the Lorenz 1963 model with the initial conditions that we define shortly; after 100 000 time steps this component appeared to have a global mode with a skewness to the left and a secondary mode with a smaller occurrence rate than the global mode. When a lognormal distribution was fitted to these data, the global mode was captured very well, with a slight underestimation of the secondary mode. When a Gaussian distribution was fitted, both modes were underestimated; the Gaussian mode fell between the two modes and assigned higher probabilities to states that did not occur that often. See Fig. 21.3 in Fletcher (2017) for this example.
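A sketch of such a fit comparison (our construction; the climatology below is a placeholder sample, and normfit/lognfit require the Statistics and Machine Learning Toolbox):

```matlab
% Fit Gaussian and lognormal PDFs to a positive climatology and compare.
rng(3);
z = exp(log(24) + 0.3*randn(1e5, 1));    % placeholder positive climatology
[mu_g, sd_g] = normfit(z);               % Gaussian fit: mean and std of z
p_ln = lognfit(z);                       % lognormal fit: [mu, sigma] of ln z
zz = linspace(min(z), max(z), 500);
plot(zz, normpdf(zz, mu_g, sd_g), '--', zz, lognpdf(zz, p_ln(1), p_ln(2)), '-');
hold on; histogram(z, 'Normalization', 'pdf', 'DisplayStyle', 'stairs');
legend('Gaussian fit', 'lognormal fit', 'climatology');
```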
The experiments will compare the analysis errors from the EKF, which uses the linearized model, against those from the mixed Gaussian–lognormal Kalman filter (MXKF), which uses the full nonlinear model. The two filters will be tested with different observational errors, where it is assumed that the x and y components have Gaussian errors and the z component has lognormal errors. The observations are generated with different observational error variances and with different times between analysis updates to determine the robustness of the new approach.
In this section we shall look at the sensitivity of the EKF and the MXKF to both the observational error variance and the time between observations. The numerical scheme used for the discretization of the nonlinear model is the second-order explicit Runge–Kutta scheme. The MXKF utilizes the nonlinear numerical model for all three components, while the EKF utilizes a linearized version of the model, which was calculated analytically and then discretized with the same scheme as the nonlinear model. The adjoint model was also calculated by hand. This is all coded in MATLAB. See Fletcher (2017) for more details on this calculation.
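A minimal sketch of the nonlinear model and this discretization (the parameter values, step size, and initial condition below are the classical illustrative choices, not necessarily the paper’s settings):

```matlab
% Lorenz 1963 model advanced with a second-order explicit Runge-Kutta
% (Heun) scheme. Parameters and initial state are illustrative.
sigma = 10; rho = 28; beta = 8/3; dt = 0.01;
f = @(s) [sigma*(s(2) - s(1));        ...  % dx/dt
          s(1)*(rho - s(3)) - s(2);   ...  % dy/dt
          s(1)*s(2) - beta*s(3)];          % dz/dt
s = [1; 1; 1]; N = 5000;
traj = zeros(3, N);
for n = 1:N
    k1 = f(s);
    k2 = f(s + dt*k1);
    s  = s + 0.5*dt*(k1 + k2);             % second-order RK update
    traj(:, n) = s;
end
plot3(traj(1,:), traj(2,:), traj(3,:));    % the familiar butterfly attractor
```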
We shall consider four different configurations in this section: 1) σo = 0.5 with 50 time steps between observations, 2) σo = 2 with 25 time steps between observations, 3) σo = 0.25 with 200 time steps between observations, and 4) σo = 1 with 100 time steps between observations, where σo is the observational error standard deviation.
a. Experiment 1: σo = 0.5, 50 time steps between observations
In Fig. 2 we have two sets of plots. The first shows the z and x trajectories for the true state (red lines), the solution from the MXKF (blue lines), and the solution from the EKF (black lines), along with the observations (green circles). The second set of plots shows the z and x errors, where the error in the z component is defined as a ratio, while the error in the x component is defined as a difference.
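In our notation, writing a for the analysis and t for the truth, these error definitions are

$$ \epsilon_{z}=\frac{z^{a}}{z^{t}},\qquad \epsilon_{x}=x^{a}-x^{t}, $$

so a well-performing scheme has εz close to 1 and εx close to 0.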
Fig. 2. (left) z and x trajectories for the true state (red), the mixed Gaussian–lognormal Kalman filter (MXKF) (blue), and the extended Kalman filter (EKF) (black), with observations (green circles). (right) The corresponding z and x errors.
It is clear from the trajectory plots in Fig. 2 that the observations are not that accurate, but they are frequent. Both solutions appear drawn toward the observations, in that the error increases when the less accurate observations are assimilated compared to the more accurate ones. However, the error plots show that while both approaches are affected by the less than perfect observations, the MXKF solution is more consistent for both the x and z components, as measured by the z error staying close to 1 and the x error staying close to 0, and the MXKF scheme recovers more quickly.
b. Experiment 2: σo = 2, 25 time steps between observations
In this section we consider the case where we have more observations than in experiment 1, but these observations are less accurate. These results are presented in Fig. 3 in the same configuration as in experiment 1. We can see that, while there are some quite inaccurate observations, the solutions from both approaches stay in phase with the true solution. Again it is clear that the MXKF approach stays more consistent than the EKF solution.
Fig. 3. (left) z and x trajectories for the true state (red), the MXKF (blue), and the EKF (black), with observations (green circles). (right) The corresponding z and x errors.
c. Experiment 3: σo = 0.25, 200 time steps between observations
In this experiment we consider the case where we have fewer observations, but they are quite accurate. These results are presented in Fig. 4. As expected there is a decrease in the accuracy of both approaches, but neither solution goes out of phase. We again see that the MXKF produces a more consistent solution than the EKF for both the x and z components.
Fig. 4. (left) z and x trajectories for the true state (red), the MXKF (blue), and the EKF (black), with observations (green circles). (right) The corresponding z and x errors.
d. Experiment 4: σo = 1, 100 time steps between observations
The results from this experiment are presented in Fig. 5, where we can see that the EKF does go out of phase with the true solution, even moving onto the wrong attractor for a short while, before assimilating additional observations that bring it back toward the true solution. However, for this configuration the MXKF approach does not go out of phase in either the z or the x component to the extreme that the EKF solution does, and it appears to better assimilate the observations each time.
Fig. 5. (left) z and x trajectories for the true state (red), the MXKF (blue), and the EKF (black), with observations (green circles). (right) The corresponding z and x errors.
e. Robustness testing 1: Observational error standard deviations and observational frequency
In this subsection we present results from running experiments 1 to 4 with 5000 different random draws from the observational error distribution to test the robustness of the MXKF. To determine the robustness, we calculate the analysis error as a lognormally distributed random variable, i.e., the ratio of the analysis to the true state, for all 5000 solutions from the MXKF and the EKF; from these errors we calculate the average minimum error and the average maximum error for each filter. These results are summarized in Table 1 to highlight the spread from the average minimum analysis error to the average maximum analysis error for the MXKF and the EKF.
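A sketch of this metric (our variable names; the error array below is placeholder data for illustration):

```matlab
% For each of the 5000 runs, take the min and max analysis error over the
% assimilation window, then average these extremes across runs.
rng(4);
nruns = 5000; ntimes = 400;                 % illustrative dimensions
errs = 1 + 0.05*randn(nruns, ntimes);       % placeholder ratio errors (~1)
avg_min = mean(min(errs, [], 2));           % average minimum analysis error
avg_max = mean(max(errs, [], 2));           % average maximum analysis error
fprintf('avg min = %.3f, avg max = %.3f\n', avg_min, avg_max);
```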
Table 1. Summary of the average minimum and maximum analysis errors for experiments 1–4 over 5000 assimilation runs.
We note here that during the 5000 evaluations using the experiment 3 and experiment 4 configurations there was one instance for each where the MXKF did not converge. This aside, since the analysis error is a ratio, a well-performing scheme should produce an analysis error approximately equal to 1, as seen in the results in Fletcher and Jones (2014).
From the values in Table 1 it is clear that, on average and bearing in mind the caveat above, the MXKF has a smaller spread between the average maximum and average minimum analysis error for all four experiments. The scheme appears to perform best on average when there are inaccurate but frequent observations (experiment 2). The largest spread for the MXKF occurs in experiment 3, with accurate but infrequent observations.
f. Robustness testing 2: Perturbing the true and background states’ initial conditions
In the results presented here, experiment 1’s observational error standard deviation and frequency were used, but the initial conditions for the true state and the background state were randomly perturbed using the MATLAB function NORMRND with mean zero and three different standard deviations, σp = 0.1, 0.5, 1. Different perturbations were applied to the true initial conditions and the background initial conditions from those presented at the beginning of this section, but they were drawn from the same distribution. The same performance metrics as for robustness testing 1 were applied here and are summarized in Table 2.
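A sketch of this perturbation step (our variable names; the unperturbed states below are placeholders, and normrnd requires the Statistics and Machine Learning Toolbox):

```matlab
% Perturb the true and background initial conditions with independent
% zero-mean Gaussian draws of standard deviation sigma_p.
x0_true = [1; 1; 1];                            % placeholder true state
x0_back = [1.1; 0.9; 1.2];                      % placeholder background state
sigma_p = 0.5;                                  % one of 0.1, 0.5, 1
x0_true_pert = x0_true + normrnd(0, sigma_p, size(x0_true));
x0_back_pert = x0_back + normrnd(0, sigma_p, size(x0_back));
```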
Table 2. Summary of the average minimum and maximum analysis errors for perturbing the true and background initial conditions over 5000 assimilation runs using experiment 1’s observational configuration.
From Table 2 it is clear that the MXKF is sensitive to the initial conditions for the true and background states. It should be noted that the MXKF had an approximately 1% failure rate for all three configurations, while the EKF did not fail. As with robustness testing 1, when it converged the MXKF had a smaller spread between its average maximum and minimum analysis errors than the EKF.
5. Conclusions and further work
In this paper we have shown that it is not possible to follow the linear least squares approach used to derive the Kalman filter and extended Kalman filter (EKF) equations to derive a similar expression for lognormally distributed errors. However, we have shown that if we keep the nonlinear model and follow a cost function–based approach associated with the median from Fletcher (2010), then it is possible to derive a set of nonlinear equations for the update of the median of the lognormal analysis state together with its uncertainty. We were able to extend this to the mixed Gaussian–lognormal probability density function, where the associated Kalman filter equations are referred to as the MXKF.
We coded the new MXKF, along with the EKF, for the Lorenz 1963 model in MATLAB and showed that, for different configurations of observational error variances and time steps between observations, where the observational errors for the x and y components were Gaussian distributed while those for the z component were lognormally distributed, the MXKF was more consistent with the true solutions for longer periods than the EKF. We should note that the EKF used the linearized numerical model, while the MXKF used the nonlinear numerical model. This appears to affect the performance of the EKF relative to the MXKF: the EKF tends to fit more closely to the observations, while the MXKF does not always pull straight to the observations.
To evaluate the general performance of the MXKF against the EKF, a set of 5000 assimilation experiments was run for each of the four experimental configurations from section 4. It was shown that the MXKF had a smaller spread between the average minimum and average maximum analysis error for all four experimental configurations, but we note that there was one realization for each of experiments 3 and 4 where the MXKF did not converge. This robustness test was followed up with a sensitivity study of the MXKF and the EKF to perturbed true state and background state initial conditions.
It has been shown in Fletcher and Jones (2014) that lognormal-based data assimilation systems can be quite sensitive to the accuracy of the observations of the different components of the Lorenz 1963 model near the transition zones between the two attractors. It is possible that the lognormal Kalman filter is also suffering from this here; determining whether this is the case is left for further work.
The next step in this work is to build the theory for an ensemble-based approach to the MXKF equations and to rigorously test it with different toy problems. Given the nonlinear nature of the equations, and their derivation from a cost function, the most likely candidate is the maximum likelihood ensemble filter (MLEF) of Zupanski (2005). The MLEF comprises two steps. The forecast step uses the standard definition of the update for the forecast error covariance matrix from the Kalman filter but uses the nonlinear model, instead of a linear model, to evolve this matrix between analysis times, through an ensemble in which each member’s perturbation is a column of the square root of the analysis error covariance matrix from that assimilation cycle. The analysis step solves a flow-dependent 3DVAR cost function projected into ensemble space through a Hessian preconditioner; the square root analysis error covariance is updated through the inversion of the Hessian preconditioner. These steps are easily adaptable to the new MXKF equations for the updates of the analysis and forecast error covariance matrices.
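A schematic of the MLEF forecast step under these definitions (our notation, with a toy one-step model; all names below are illustrative placeholders, not the paper’s or Zupanski’s code):

```matlab
% Columns of the square-root analysis error covariance are evolved through
% the nonlinear model M to form the square-root forecast error covariance.
M = @(x) 0.95*x + 0.1*sin(x);          % toy nonlinear model for one step
x_a = randn(3, 1);                     % analysis state (placeholder)
sqrtPa = 0.1*randn(3, 2);              % sqrt analysis covariance, 2 members
xf = M(x_a);                           % nonlinear forecast of the analysis
sqrtPf = zeros(size(sqrtPa));
for i = 1:size(sqrtPa, 2)
    sqrtPf(:, i) = M(x_a + sqrtPa(:, i)) - xf;   % evolved perturbation
end
```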
The reason behind the non-Gaussian work over the last 15 years has been to develop more consistent data assimilation systems for positive definite variables. In the atmosphere it is well known that relative humidity is positive definite, that is to say it is always larger than zero, and as such we do not wish for a data assimilation system to produce an answer that is negative or equal to zero for this field. It has been shown in Kliewer et al. (2016) that by using a mixed Gaussian–lognormal 1DVAR for a temperature–mixing ratio retrieval it is possible to obtain better fits to both the temperature and the moisture channels through the covariances between the Gaussian and lognormal random variables. A full description of the links between the two distributions can be found in Fletcher (2017).
As most of the operational numerical weather prediction centers use a form of hybrid ensemble/variational data assimilation algorithm, it became important for the mixed Gaussian–lognormal theory to move toward that approach. However, the major stumbling block has been the Kalman filter component used to create the ensemble covariance. The work in this paper is the first step toward a Gaussian–lognormal hybrid 4DVAR.
Acknowledgments.
The National Science Foundation Grant AGS-1738206 at CIRA/CSU supported authors 1, 3, 4, 5, and 6, while authors 2 and 7 were supported by NOAA’s Hurricane Forecast Improvement Program Award NA18NWS4680059. Funding for authors 1, 8, and 9 came from National Science Foundation Grant AGS-2033405 at CIRA/CSU. We are grateful for the extremely helpful reviews from the three anonymous reviewers and are grateful to reviewer two for the proof to show the positive definiteness of the different analysis error covariance matrices.
Data availability statement.
The code that supports the findings of this study is openly available at the following URL: https://mountainscholar.org/handle/10217/234474.
REFERENCES
Aseev, N. A., and Y. Y. Shprits, 2019: Reanalysis of ring current electron phase space densities using Van Allen probe observations, convection model, and log-normal Kalman filter. Space Wea., 17, 619–638, https://doi.org/10.1029/2018SW002110.
Cohn, S. E., 1997: An introduction to estimation error theory. J. Meteor. Soc. Japan, 75, 257–288, https://doi.org/10.2151/jmsj1965.75.1B_257.
Evensen, G., and N. Fabio, 1997: Solving for the generalized inverse of the Lorenz model. J. Meteor. Soc. Japan, 75, 229–243, https://doi.org/10.2151/jmsj1965.75.1B_229.
Fletcher, S. J., 2010: Mixed lognormal-Gaussian four-dimensional data assimilation. Tellus, 62A, 266–287, https://doi.org/10.1111/j.1600-0870.2010.00439.x.
Fletcher, S. J., 2017: Data Assimilation for the Geosciences: From Theory to Applications. Elsevier, 976 pp.
Fletcher, S. J., and M. Zupanski, 2006a: A data assimilation method for log-normally distributed observational errors. Quart. J. Roy. Meteor. Soc., 132, 2505–2519, https://doi.org/10.1256/qj.05.222.
Fletcher, S. J., and M. Zupanski, 2006b: A hybrid multivariate normal and lognormal distribution for data assimilation. Atmos. Sci. Lett., 7, 43–46, https://doi.org/10.1002/asl.128.
Fletcher, S. J., and M. Zupanski, 2007: Implications and impacts of transforming lognormal variables into normal variables in VAR. Meteor. Z., 16, 755–765, https://doi.org/10.1127/0941-2948/2007/0243.
Fletcher, S. J., and A. S. Jones, 2014: Multiplicative and additive incremental variational data assimilation for mixed lognormal-Gaussian errors. Mon. Wea. Rev., 142, 2521–2544, https://doi.org/10.1175/MWR-D-13-00136.1.
Goodliff, M., S. Fletcher, A. Kliewer, J. Forsythe, and A. Jones, 2020: Detection of non-Gaussian behavior using machine learning techniques: A case study on the Lorenz 63 model. J. Geophys. Res. Atmos., 125, e2019JD031551, https://doi.org/10.1029/2019JD031551.
Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. J. Basic Eng., 82, 35–45, https://doi.org/10.1115/1.3662552.
Kalman, R. E., and R. S. Bucy, 1961: New results in linear filtering and prediction theory. J. Basic Eng., 83, 95–108, https://doi.org/10.1115/1.3658902.
Kliewer, A. J., S. J. Fletcher, A. S. Jones, and J. M. Forsythe, 2016: Comparison of Gaussian, logarithmic transform and mixed Gaussian-log-normal distribution based 1DVAR microwave temperature-water vapour mixing ratio retrievals. Quart. J. Roy. Meteor. Soc., 142, 274–286, https://doi.org/10.1002/qj.2651.
Kondrashov, D., M. Ghil, and Y. Shprits, 2011: Lognormal Kalman filter for assimilating phase space density data in the radiation belts. Space Wea., 9, S11006, https://doi.org/10.1029/2011SW000726.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
Zupanski, M., 2005: Maximum likelihood ensemble filter. Part I: Theoretical aspects. Mon. Wea. Rev., 133, 1710–1726, https://doi.org/10.1175/MWR2946.1.