• Anderson, J. L., 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus, 59A, 210–224.

• Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83.

• Blanchet, I., C. Frankignoul, and M. Cane, 1997: A comparison of adaptive Kalman filters for a tropical Pacific Ocean model. Mon. Wea. Rev., 125, 40–58.

• Brankart, J-M., C. Ubelmann, C-E. Testut, E. Cosme, P. Brasseur, and J. Verron, 2009: Efficient parameterization of the observation error covariance matrix for square root or ensemble Kalman filters: Application to ocean altimetry. Mon. Wea. Rev., 137, 1908–1927.

• Cohn, S. E., 1997: An introduction to estimation theory. J. Meteor. Soc. Japan, 75, 257–288.

• Cosme, E., J-M. Brankart, J. Verron, P. Brasseur, and M. Krysta, 2010: Implementation of a reduced rank, square-root smoother for high resolution ocean data assimilation. Ocean Modelling, in press.

• Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

• Daley, R., 1992: Estimating model-error covariances for application to atmospheric data assimilation. Mon. Wea. Rev., 120, 1735–1746.

• Dee, D., 1995: Online estimation of error covariance parameters for atmospheric data assimilation. Mon. Wea. Rev., 123, 1128–1145.

• Evensen, G., and P. J. van Leeuwen, 1996: Assimilation of Geosat altimeter data for the Agulhas Current using the ensemble Kalman filter with a quasi-geostrophic model. Mon. Wea. Rev., 124, 85–96.

• Hoang, S., R. Baraille, O. Talagrand, X. Carton, and P. De Mey, 1997: Adaptive filtering: Application to satellite data assimilation in oceanography. Dyn. Atmos. Oceans, 27, 257–281.

• Le Provost, C., and J. Verron, 1987: Wind-driven mid-latitude circulation—Transition to barotropic instability. Dyn. Atmos. Oceans, 11, 175–201.

• Lermusiaux, P., 2007: Adaptive modelling, adaptive data assimilation and adaptive sampling. Physica D, 230, 172–196.

• Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533.

• Maybeck, P. S., 1979: Stochastic Models, Estimation and Control. Vol. 1. Academic Press, 423 pp.

• Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea. Rev., 128, 416–433.

• Pham, D. T., J. Verron, and M. C. Roubaud, 1998: Singular evolutive extended Kalman filter with EOF initialization for data assimilation in oceanography. J. Mar. Syst., 16, 323–340.

• Von Mises, R., 1964: Mathematical Theory of Probability and Statistics. Academic Press, 694 pp.

• Wahba, G., D. R. Johnson, F. Gao, and J. Gong, 1995: Adaptive tuning of numerical weather prediction models: Randomized GCV in three- and four-dimensional data assimilation. Mon. Wea. Rev., 123, 3358–3369.

• Wang, X., and C. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158.
Fig. 1. (left) Mean and (right) standard deviation of the dynamic height (m) over the full square basin. Our region of interest is represented by the black square in the middle of the basin.

Fig. 2. Dynamic height (m) snapshots corresponding to year 21 on 10, 20, and 30 Oct.

Fig. 3. Simulated observation noise (m) corresponding (left to right) to correlation models A, B, and C with σ = 0.2 m and ℓ = 5 grid points (for the last two models).

Fig. 4. (left) Likelihood function for the scaling factor of the forecast error covariance matrix, as a function of the number of input innovation vectors (1, 3, 10, and 100). (right) Mode of the posterior probability, together with percentiles 0.1 and 0.9 (thin dashed lines), as a function of the number of innovations (in abscissa).

Fig. 5. Mode (solid line) and percentiles 0.1 and 0.9 (thin dashed lines) of the posterior probability distribution for the forecast error covariance scaling factor, as estimated from the last 50 observations of a signal with nonconstant statistics. The dotted line represents the true scaling factor.

Fig. 6. (top) Estimated scaling factor for the forecast error covariance matrix of the model-simulated signal (thick lines). Associated root-mean-square error (only shown for the first 20 yr) for (middle) altimetry and (bottom) velocity, as compared to another simulation (thin lines) that is performed without adaptivity (i.e., with fixed α); the dashed lines represent the corresponding error standard deviation as estimated by the filter.

Fig. 7. (top) Estimated scaling factor for the observation error covariance matrix; the dashed lines represent the percentiles 0.1 and 0.9, and the thin line represents the solution that is obtained with an incorrect forecast error covariance scaling. Associated rms error for (middle) altimetry and (bottom) velocity; the dashed line represents the error standard deviation as estimated by the filter, and the thin lines represent the solution that is obtained without the adaptive mechanism.

Fig. 8. Likelihood function for the correlation length scale (in grid points), as a function of the number of input innovation vectors (1, 3, 10, and 100). The result is shown for correlation model B with (left) ℓ = 2 and (middle) 5 grid points and (right) correlation model C with ℓ = 5 grid points.

Fig. 9. Estimated scaling factors for (top) the forecast error covariance matrix and (middle) the observation error covariance matrix. The three lines correspond to experiments performed with correlation models A (solid lines), B (dashed lines), or C (dotted lines) for the observation error. (bottom) Associated rms error for altimetry (solid line) and error standard deviation as estimated by the filter (dotted line). The smallest error (about 4 cm, similar to Fig. 6) is obtained using correlation model A and the largest error (about 8 cm) using correlation model C.

Fig. 10. Likelihood function for parameters α (x axis) and β (y axis), based on the first two innovation vectors. Experiments performed using correlation model (left) A, (middle) B, or (right) C for the observation error.


Efficient Adaptive Error Parameterizations for Square Root or Ensemble Kalman Filters: Application to the Control of Ocean Mesoscale Signals

1 LEGI/CNRS-Grenoble Universités, Grenoble, France

Abstract

In Kalman filter applications, an adaptive parameterization of the error statistics is often necessary to avoid filter divergence, and prevent error estimates from becoming grossly inconsistent with the real error. With the classic formulation of the Kalman filter observational update, optimal estimates of general adaptive parameters can only be obtained at a numerical cost that is several times larger than the cost of the state observational update. In this paper, it is shown that there exist a few types of important parameters for which optimal estimates can be computed at a negligible numerical cost, provided that the computation is performed using a transformed algorithm that works in the reduced control space defined by the square root or ensemble representation of the forecast error covariance matrix. The set of parameters that can be efficiently controlled includes scaling factors for the forecast error covariance matrix, scaling factors for the observation error covariance matrix, or even a scaling factor for the observation error correlation length scale.

As an application, the resulting adaptive filter is used to estimate the time evolution of ocean mesoscale signals using observations of the ocean dynamic topography. To check the behavior of the adaptive mechanism, this is done in the context of idealized experiments, in which model error and observation error statistics are known. This ideal framework is particularly appropriate to explore the ill-conditioned situations (inadequate prior assumptions or uncontrollability of the parameters) in which adaptivity can be misleading. Overall, the experiments show that, if used correctly, the efficient optimal adaptive algorithm proposed in this paper introduces useful supplementary degrees of freedom in the estimation problem, and that the direct control of these statistical parameters by the observations increases the robustness of the error estimates and thus the optimality of the resulting Kalman filter.

* Current affiliation: MERCATOR-Ocean, Toulouse, France

Corresponding author address: Jean-Michel Brankart, LEGI/CNRS, BP53X, 38041 Grenoble CEDEX, France. Email: jean-michel.brankart@hmg.inpg.fr


1. Introduction

In Kalman filters, the accuracy of the estimated error covariances closely depends on the quality of the assumptions about model error and observation error statistics. Inaccurate parameterization may even lead to filter divergence, with error estimates becoming grossly inconsistent with the real error (Maybeck 1979; Daley 1991). To avoid this divergence, one recognized solution (adaptive filtering) is to determine the list of uncertain parameters in the model or observation error statistics, and try to adjust them using the actual differences between forecasts and observations (Daley 1992; Dee 1995; Wahba et al. 1995; Blanchet et al. 1997; Hoang et al. 1997; Wang and Bishop 2003; Lermusiaux 2007; Li et al. 2009). In particular, if the forecast and observation error probability distributions can be assumed Gaussian, a possible solution (proposed by Dee 1995) is to compute the maximum likelihood estimate of the adaptive parameters given the current innovation vector. This strategy is used and further developed in Mitchell and Houtekamer (2000) or Anderson (2007, 2009) in the more specific context of the ensemble Kalman filter. It is also this line of thought that is followed in this study to compute optimal estimates of adaptive statistical parameters.

A major difficulty with this kind of method is that, in general, the computational complexity of the parameter estimation is several times larger than the complexity of the estimation of the system state vector (i.e., than the classic observational update of the Kalman filter). The reason is that, in Kalman filters, the optimal state estimate is linear in the observation vector (of size y), whereas the optimal parameter estimate is intrinsically nonlinear in the observation vector, so that the optimal solution must be computed iteratively (for instance using a downhill simplex method to find the maximum of the likelihood function, as in Mitchell and Houtekamer 2000). A first objective of this paper is to show that there exist nonetheless a few types of important parameters, for which a maximum likelihood optimal estimate can be computed at a numerical cost that is asymptotically negligible (for large y) with respect to that of the standard Kalman filter observational update. Second, taking advantage of this small additional computational complexity, the method is extended to condition the current parameter estimation on the full sequence of past innovations, which amounts to solving an additional (nonlinear) filtering problem for the unknown statistical parameters.

Furthermore, in square root or ensemble Kalman filters, the forecast error covariance matrix is always available in square root form, making it possible to use a modified observational update algorithm [proposed by Pham et al. (1998), as one of the essential elements defining the singular evolutive extended Kalman (SEEK) filter algorithm], whose computational complexity is linear in the number of observations, instead of being cubic as in the standard formula. Originally, this modified algorithm requires that the observation error covariance matrix be diagonal, but solutions exist to preserve its numerical efficiency (linearity in y) in the presence of observation error correlations, as shown by Brankart et al. (2009), who also give a detailed comparison of the modified versus the original algorithms. In the present paper, we first show in section 2 how the optimal adaptive filtering problem described above can be formulated in the framework of this modified square root algorithm. It is indeed in this framework that optimal parameter estimates can be computed at negligible additional numerical cost. This is shown in section 3, where the discussion focuses on the few types of parameters for which such computational efficiency is possible. These important parameters are (i) scaling factors for the forecast error covariance matrix, (ii) scaling factors for the observation error covariance matrix, and (iii) scaling factors for the observation error correlation length scale.

In section 4, this adaptive filter is applied to the problem of estimating the evolution of an ocean mesoscale signal using observations of the ocean dynamic topography. To demonstrate the behavior of the adaptive mechanism, idealized experiments are performed, in which the reference signal (the truth of the problem) is generated by a primitive equation ocean model and sampled to produce synthetic observations with known error statistics. In that way, it is possible to check that the method is able to produce accurate parameter estimates and to explore the ill-conditioned situations (inappropriate prior assumptions or uncontrollability of the parameters) in which adaptivity can be misleading.

2. Formulation of the problem

a. Nonadaptive statistics

Let us consider the problem of estimating the evolution of a system described by the state vector x(t), between times t0 and tN+1, given a set of observation vectors yk at times tk, k = 1, … , N (with tk < tk+1):

$$ y_k = \mathsf{H}_k\, x_k + \epsilon_k \quad (1) $$
where xk = x(tk), 𝗛k, and ϵk are the state vector, the observation operator, and the observational error at time tk, respectively. We also assume that we have information on the initial condition x0 = x(t0) and optionally on dynamical laws governing the time evolution of x(t). In many situations, it is useful to solve the filtering problem, in which the estimation at time t is computed using only past information. This means that the information only needs to be propagated forward in time from t0 to tN+1, with discrete updates each time an observation vector is available. In a probabilistic framework, this amounts to computing sequentially the following probability distributions:

$$ p_0(x_0) \rightarrow p_1^f(x_1) \rightarrow p_1^a(x_1) \rightarrow \cdots \rightarrow p_N^f(x_N) \rightarrow p_N^a(x_N) \rightarrow p_{N+1}(x_{N+1}) \quad (2) $$
where p0(x0) and pN+1(xN+1) are the initial and final probability distributions, and where pkf(xk) and pka(xk) are the probability distributions at time tk before and after the observation vector yk is taken into account (superscripts “f” and “a” stand for “forecast” and “analysis”). Since we solve a filtering problem, it is implicit that every probability distribution is conditioned on the past observations [i.e., yk′ with k′ < k for pkf(xk), and yk′ with k′ ≤ k for pka(xk)].
In this study, it is assumed that every probability distribution of the sequence (2) is Gaussian:
$$ p_k^f(x_k) = \mathcal{N}\!\left(x_k^f, \mathsf{P}_k^f\right), \qquad p_k^a(x_k) = \mathcal{N}\!\left(x_k^a, \mathsf{P}_k^a\right) \quad (3) $$
where xkf and xka are the expected forecast and analysis state vectors at time tk, respectively, and where 𝗣kf and 𝗣ka are the corresponding forecast and analysis error covariance matrices, respectively. They are computed by repeating the following two steps in sequence from k = 1 to k = N. The forecast step computes xkf and 𝗣kf from xk−1a and 𝗣k−1a by exploiting the prior knowledge about the time dependence of xk and xk−1 (as expressed, e.g., by approximate dynamical laws or by a time decorrelation model). The analysis step (or observational update) computes xka and 𝗣ka from xkf and 𝗣kf by conditioning the prior distribution pkf(xk) on the observation vector yk using Bayes’s theorem: pka(xk) ∼ pkf(xk)p(yk|xk). It is well known that the observational update preserves Gaussianity as soon as p(yk|xk) is Gaussian: p(yk|xk) = 𝒩(𝗛kxk, 𝗥k), where 𝗥k = 〈ϵkϵkT〉 is the observation error covariance matrix, and that xka and 𝗣ka can be computed by the classic linear observational update formulas:
$$ x_k^a = x_k^f + \mathsf{K}_k\, d_k, \qquad d_k = y_k - \mathsf{H}_k\, x_k^f \quad (4) $$

$$ \mathsf{P}_k^a = \left(\mathsf{I} - \mathsf{K}_k \mathsf{H}_k\right) \mathsf{P}_k^f, \qquad \mathsf{K}_k = \mathsf{P}_k^f\, \mathsf{H}_k^{\mathrm{T}} \left(\mathsf{H}_k\, \mathsf{P}_k^f\, \mathsf{H}_k^{\mathrm{T}} + \mathsf{R}_k\right)^{-1} \quad (5) $$
where dk is the innovation vector and 𝗞k is the Kalman gain. It is useful to remark that the Gaussian parameters xkf and xka, k = 1, … , N (and the innovations dk as well) functionally depend on the sequence of observation vectors yk, k = 1, … , N (even if, in practice, xkf and xka are usually directly determined from the actual value of the observations: yk = yko). Conversely, the Gaussian parameters 𝗣kf and 𝗣ka do not depend on the observations yk (but on the observation operator 𝗛k and on the observation error covariance matrix 𝗥k) and as long as all input covariance matrices are assumed to be error free, the resulting 𝗣kf and 𝗣ka are also known matrices, which do not need to be inferred from the observations. This is precisely what is going to be changed in section 2b.
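For concreteness, here is a minimal sketch of this classic observational update, Eqs. (4) and (5), in Python/NumPy (the language and all names are illustrative choices, not from the paper). It makes the cost structure explicit: forming the gain requires inverting the y × y innovation covariance matrix.

```python
import numpy as np

def classic_update(xf, Pf, y, H, R):
    """Classic Kalman observational update, Eqs. (4)-(5)."""
    d = y - H @ xf                            # innovation vector d_k
    C = H @ Pf @ H.T + R                      # innovation covariance (y x y)
    K = Pf @ H.T @ np.linalg.inv(C)           # Kalman gain K_k (cost ~ y^3)
    xa = xf + K @ d                           # Eq. (4): analysis state
    Pa = (np.eye(len(xf)) - K @ H) @ Pf       # Eq. (5): analysis error covariance
    return xa, Pa, d
```

The y3 cost of this inversion is the bottleneck that the transformed algorithm described next removes.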
If the forecast error covariance matrices 𝗣kf are available in square root form:
$$ \mathsf{P}_k^f = \mathsf{S}_k^f \left(\mathsf{S}_k^f\right)^{\mathrm{T}} \quad (6) $$
as in square root or ensemble Kalman filters, then it is possible to reformulate the observational update by linearly transforming the state and observation vectors xk and yk into the new vectors ξk and ηk:
$$ x_k = x_k^f + \mathsf{S}_k^f\, \mathsf{U}_k\, \xi_k, \qquad \eta_k = \mathsf{U}_k^{\mathrm{T}} \left(\mathsf{H}_k \mathsf{S}_k^f\right)^{\mathrm{T}} \mathsf{R}_k^{-1}\, y_k \quad (7) $$

where δk is the projection of the innovation vector dk on the square root of the forecast error covariance matrix 𝗛k𝗦kf (with metric 𝗥k−1):

$$ \delta_k = \left(\mathsf{H}_k \mathsf{S}_k^f\right)^{\mathrm{T}} \mathsf{R}_k^{-1}\, d_k \quad (8) $$
and 𝗨k (unitary matrix) and Λk (diagonal matrix) are the matrices with the normalized eigenvectors and inverse eigenvalues of the matrix:
$$ \boldsymbol{\Gamma}_k = \mathsf{I} + \left(\mathsf{H}_k \mathsf{S}_k^f\right)^{\mathrm{T}} \mathsf{R}_k^{-1}\, \mathsf{H}_k \mathsf{S}_k^f, \qquad \text{with}\quad \boldsymbol{\Gamma}_k\, \mathsf{U}_k = \mathsf{U}_k\, \boldsymbol{\Lambda}_k^{-1} \quad (9) $$
By the transformation in (7), the probability distributions pkf(xk) and p(yk|xk) transform to
$$ p_k^f(\xi_k) = \mathcal{N}(0, \mathsf{I}), \qquad p(\eta_k|\xi_k) = \mathcal{N}\!\left[\eta_k^f + \left(\boldsymbol{\Lambda}_k^{-1} - \mathsf{I}\right) \xi_k,\; \boldsymbol{\Lambda}_k^{-1} - \mathsf{I}\right], \qquad \eta_k^f = \mathsf{U}_k^{\mathrm{T}} \left(\mathsf{H}_k \mathsf{S}_k^f\right)^{\mathrm{T}} \mathsf{R}_k^{-1}\, \mathsf{H}_k x_k^f \quad (10) $$
so that
$$ p_k^a(\xi_k) = \mathcal{N}\!\left(\boldsymbol{\Lambda}_k\, \mathsf{U}_k^{\mathrm{T}}\, \delta_k,\; \boldsymbol{\Lambda}_k\right) \quad (11) $$
from which it is easy to compute xka and 𝗣ka by inverting the transformation in (7):
$$ x_k^a = x_k^f + \mathsf{S}_k^f\, \mathsf{U}_k\, \boldsymbol{\Lambda}_k\, \mathsf{U}_k^{\mathrm{T}}\, \delta_k, \qquad \mathsf{P}_k^a = \mathsf{S}_k^f\, \mathsf{U}_k\, \boldsymbol{\Lambda}_k\, \mathsf{U}_k^{\mathrm{T}} \left(\mathsf{S}_k^f\right)^{\mathrm{T}} \quad (12) $$
By the transformation 𝗨k, the observational updates of the individual components of the ξk vector are thus independent of each other [all covariance matrices in (10) and (11) are diagonal].

The computational complexity of the observational update is also structurally modified by the transformation. In the linear observational update algorithm, the computational cost mainly results from the dependence between the vector components. In the presence of correlation, weighting optimally the forecast and observational information indeed requires the inversion of a full matrix (with computational complexity proportional to the cube of the size of the matrix). The main difference introduced by the transformation is thus that, in Eq. (5), the inversion is performed in the observation space, while in Eq. (9), it is performed in the state space [the cost of the inversion of 𝗥 is here assumed negligible, as shown by Brankart et al. (2009), for large classes of observation error correlation models]. Moreover, if 𝗣kf is rank deficient, the size of the matrix Γk in (9) is not the size of the state vector, but the rank of 𝗣kf, which is given by the number of independent columns in 𝗦kf (i.e., the error modes or the ensemble members). For such problems, the transformation in (7) is here introduced as a simple way of obtaining directly the reduced observational update formulas, which are otherwise deduced from (4) and (5) using the Sherman–Morrison–Woodbury formula (Pham et al. 1998). This transformed algorithm is particularly efficient for the reduced rank problems involving many observations, which are quite common in atmospheric and oceanic applications of square root or ensemble Kalman filters. The key property of the algorithm is indeed that its computational complexity is linear in the number y of observations, making it possible to deal efficiently with very large observation vectors [see Brankart et al. (2009) for a detailed comparison of the computational complexity of the transformed versus the original algorithm].
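The sketch below (illustrative Python again, assuming a diagonal 𝗥 stored as a vector of variances) implements this reduced-space update, Eqs. (6)–(12); the only dense decomposition is the r × r eigenproblem (9), so the cost grows linearly with the number of observations.

```python
import numpy as np

def reduced_update(xf, Sf, y, H, Rdiag):
    """Observational update in the reduced space spanned by the r columns of Sf,
    for a diagonal observation error covariance (Rdiag holds the variances)."""
    r = Sf.shape[1]
    d = y - H @ xf                                     # innovation d_k
    HS = H @ Sf                                        # H_k S_k^f  (y x r)
    delta = HS.T @ (d / Rdiag)                         # Eq. (8)
    Gamma = np.eye(r) + (HS.T * (1.0 / Rdiag)) @ HS    # Eq. (9)
    invlam, U = np.linalg.eigh(Gamma)                  # eigenvalues of Gamma = 1/lambda_l
    lam = 1.0 / invlam
    xa = xf + Sf @ (U @ (lam * (U.T @ delta)))         # Eq. (12): xa = xf + Sf Gamma^-1 delta
    Sa = Sf @ (U * np.sqrt(lam))                       # square root of Pa = Sf Gamma^-1 Sf^T
    return xa, Sa, delta, U, lam
```

The quantities delta, U, and lam returned here are exactly those reused at no extra cost by the adaptive parameter estimates of section 3.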

b. Adaptive statistics

In the filtering problem described in section 2a, optimal observational updates can only be obtained if the forecast and observation error covariance matrices 𝗣kf and 𝗥k are accurate. However, in realistic atmospheric or oceanic applications, both matrices depend on inaccurate parameters. On the one hand, in the forecast step, modeling the time dependence of the errors usually requires questionable parameterizations (e.g., to account for errors in the dynamical laws governing the time evolution of the system). On the other hand, accurate observation error statistics are not always available. This is especially true for representation errors [difference between the truth of the problem and the real world, see Cohn (1997)], which can occur for instance if the state vector of the problem only contains a subrange of the scales that are present in the real system. In addition, the computation of the covariance matrices 𝗣kf and 𝗣ka never involves the observations yk themselves, so that no feedback using differences with respect to the real world is possible; consequently, any inaccuracy in the parameterization of the observation or system noises can lead to instability of the error statistics produced by the filter. This is a well-known effect in Kalman filters, which is usually circumvented (in atmospheric and oceanic applications) by adjusting uncertain parameters in the forecast or observation error covariance matrices using statistics (variance or covariance) of the innovation sequence (adaptive Kalman filters). See for instance Dee (1995) for a more precise justification.

To describe the adaptive mechanism in a probabilistic framework, we introduce vectors of uncertain parameters αk and βk in the description of the forecast and observation error covariance matrices 𝗣kf(αk) and 𝗥k(βk), with probability distributions pkf(αk, βk) and pka(αk, βk) before and after their update using observations yk. Thus, αk and βk are additional random vectors that must be estimated from the observational information: they are additional degrees of freedom that are introduced in the estimation problem to account for possible uncertainty in the Gaussian error covariances. In principle, assuming that the parameters αk are uncertain transforms the probability distribution pkf(xk) into
$$ p_k^f(x_k) = \int p_k^f(x_k | \alpha_k)\; p_k^a(\alpha_k, \beta_k)\; d\alpha_k\, d\beta_k \quad (13) $$
where pkf(xk|αk) is the Gaussian probability distribution of xk that is assumed in section 2a if the vector of parameters αk is known. In this equation, we use the updated parameter probability distribution pka(αk, βk) for the parameters to express that all available information before and including time tk is taken into account to estimate the uncertain parameters in the prior state probability distribution pkf(xk), that is, the observational update of the parameter probability distribution is performed before the observational update of the state probability distribution. However, in (13), which explicitly takes into account uncertainties in the parameters, pkf(xk) is no longer Gaussian in general, so that the filter equations described in section 2a do not apply anymore. An approximation is needed. To close the problem and be able to compute an explicit solution both for the state of the system and for the parameters, the central assumption is that the forecast probability distribution pkf(xk) is still Gaussian (as in section 2a), but with covariances corresponding to the current best estimate α*k of the parameters:
$$ p_k^f(x_k) \simeq p_k^f(x_k | \alpha_k^*) \quad (14) $$
This approximation means that uncertainty in the parameters is not taken into account in the computation of the state estimates, which makes little difference as long as the parameters remain sufficiently accurate. Equation (14) is indeed the zeroth-order term in the expansion of the integral in (13), with the parameter variance as the small quantity. By applying the same chain of arguments to the observation probability distribution, we directly obtain
$$ p(y_k | x_k) \simeq p(y_k | x_k, \beta_k^*) \quad (15) $$
Consequently, with the approximations in (14) and (15), the state filtering problem remains formally unchanged with respect to section 2a, and it remains only to show how best estimates of the parameters α*k and β*k can be obtained.
In this study, the successive values of α*k and β*k are computed by solving an additional filtering problem for the parameters, which amounts to sequentially computing the following probability distributions (which are not Gaussian in general):
$$ p_0(\alpha_0, \beta_0) \rightarrow p_1^f(\alpha_1, \beta_1) \rightarrow p_1^a(\alpha_1, \beta_1) \rightarrow \cdots \rightarrow p_N^f(\alpha_N, \beta_N) \rightarrow p_N^a(\alpha_N, \beta_N) \quad (16) $$
The forecast step computes pkf(αk, βk) from pka(αk−1, βk−1) by exploiting a prior knowledge p(αk, βk|αk−1, βk−1) about the time dependence of the parameter values (which must be specified, see section 3):
$$ p_k^f(\alpha_k, \beta_k) = \int p(\alpha_k, \beta_k | \alpha_{k-1}, \beta_{k-1})\; p_{k-1}^a(\alpha_{k-1}, \beta_{k-1})\; d\alpha_{k-1}\, d\beta_{k-1} \quad (17) $$
The analysis step (or observational update) computes pka(αk, βk) from pkf(αk, βk) by conditioning the prior distribution pkf(αk, βk) on the innovation vector dk using Bayes’s theorem:

$$ p_k^a(\alpha_k, \beta_k) \sim p_k^f(\alpha_k, \beta_k)\; p(d_k | \alpha_k, \beta_k) \quad (18) $$

In writing this equation, it is assumed that innovation dk contains information about the parameters that is independent of that contained in the previous innovations, which is already included in pkf(αk, βk). As in the state filtering problem, this can be justified by the hypothesis that model errors and observation errors (which parameters αk and βk are meant to model) are independent for times tk and tk′ (k ≠ k′). Furthermore, since pkf(xk|αk) and p(yk|xk, βk) are the Gaussian distributions given in section 2a, the probability distribution p(dk|αk, βk) of the innovation vector dk is also a Gaussian distribution:

$$ p(d_k | \alpha_k, \beta_k) = \mathcal{N}\!\left[0,\; \mathsf{C}_k(\alpha_k, \beta_k)\right], \qquad \mathsf{C}_k(\alpha_k, \beta_k) = \mathsf{H}_k\, \mathsf{P}_k^f(\alpha_k)\, \mathsf{H}_k^{\mathrm{T}} + \mathsf{R}_k(\beta_k) \quad (19) $$
This fully defines the update of the (αk, βk) distribution defined by Eq. (18). It is then the purpose of section 3 to describe how the sequence of probability distributions in (16) can be computed efficiently from (17) and (18) in practical applications, and how to deduce from them the best estimates α*k and β*k to use for the state observational update at time tk.

This joint state and parameter estimation problem can even be solved without the assumptions in (14) and (15) as soon as the state observational update is performed by a separate application of Eq. (4) for every member of the ensemble [as proposed by Evensen and van Leeuwen (1996), for the ensemble Kalman filter]. In that case, the prior non-Gaussian forecast probability distribution given by the integral in (13) can be simulated by randomly drawing a different parameter vector from pka(αk, βk) to perform the update of each member. With that scheme, uncertainty in the parameter of the prior distribution can thus be explicitly taken into account, with asymptotic convergence of the prior distribution to the integral in (13) for large ensemble size. However, using (13) instead of (14) may not be appropriate if pka(αk, βk) is not very accurate, for instance if the dispersion of the parameters (second-order moment) is not correctly simulated. Inaccuracy of the second-order moment is indeed the very reason why adaptivity is needed in the filter in (2) for the system state vector. Since this cannot be repeated for the parameter filter in (16), the use of the best parameter estimates α*k and β*k with Eqs. (14) and (15) can also be viewed as a closure that prevents inaccuracies in the second-order moments of pka(αk, βk) from affecting directly the observational update of the state vector.
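To make the parameter filter (16)–(18) concrete, here is a deliberately simple sketch (hypothetical Python, not from the paper): a single scaling factor α, scalar innovations with variance α·s2f + s2o, and the probability density p(α) carried on a discrete grid, so that the Bayes update (18) with the Gaussian likelihood (19) is just a pointwise multiplication.

```python
import numpy as np

# Hypothetical grid-based version of the parameter filter (16)-(18), for one
# scaling factor alpha and scalar innovations d_k ~ N(0, alpha*s2f + s2o).
alphas = np.linspace(0.01, 5.0, 500)     # discrete grid of candidate alpha values
p = np.exp(-alphas)                       # prior p0(alpha), as used in section 4b
p /= p.sum()

s2f, s2o = 1.0, 0.04                      # prior forecast / observation error variances
rng = np.random.default_rng(0)
for k in range(200):
    d = rng.normal(scale=np.sqrt(2.0 * s2f + s2o))   # synthetic innovation, true alpha = 2
    var = alphas * s2f + s2o                          # innovation variance, cf. Eq. (19)
    p *= np.exp(-0.5 * d**2 / var) / np.sqrt(var)     # Bayes update, Eq. (18)
    p /= p.sum()                                      # renormalize on the grid

print("mode of p(alpha):", alphas[np.argmax(p)])      # concentrates near the true value 2
```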

3. Efficient adaptive parameter estimates

a. Constant parameters

If the parameters αk and βk are assumed constant in time, then p(αk, βk|αk−1, βk−1) = δ(αk − αk−1, βk − βk−1), so that the forecast step in (17) reduces to

$$ p_k^f(\alpha_k, \beta_k) = p_{k-1}^a(\alpha_{k-1}, \beta_{k-1}) \quad (20) $$
which means that the previous knowledge of the parameters is not altered by the passing of time. In this simple case, we can drop the index k from the parameter vectors, and all probability distributions of the sequence in (16) can be obtained by a recursive application of (18):

$$ p_k^a(\alpha, \beta) \sim p_0(\alpha, \beta) \prod_{k'=1}^{k} p(d_{k'} | \alpha, \beta) = p_0(\alpha, \beta)\, L_k(\alpha, \beta) \quad (21) $$
where Lk(α, β) is the likelihood function (as defined, e.g., in Von Mises 1964). From this expression, the best estimates α*k and β*k of the parameters α and β at time tk can be obtained as the mean (minimum variance estimator) or the mode (maximum probability estimator) of pka(α, β). In this study, we only consider the mode of (21), which minimizes the cost function:

$$ J_k(\alpha, \beta) = -\ln p_0(\alpha, \beta) + \frac{1}{2} \sum_{k'=1}^{k} \left[ d_{k'}^{\mathrm{T}}\, \mathsf{C}_{k'}^{-1}(\alpha, \beta)\, d_{k'} + \ln \left| \mathsf{C}_{k'}(\alpha, \beta) \right| \right] + \mathrm{Cst} \quad (22) $$
where the Gaussian shape of p(dk|α, β) in (19) is explicitly introduced, and Cst is an arbitrary constant.

b. Nonconstant parameters

If the parameters αk and βk are not assumed constant in time, it is necessary to introduce a prior knowledge p(αk, βk|αk−1, βk−1) about the time dependence of the parameters in order to perform the forecast step in (17). In this study, it is assumed that the effect of time is only to diffuse parameter probability densities from high probability regions to low probability regions, according to the simple model:
$$ p_k^f(\alpha_k, \beta_k) \sim \left[ p_{k-1}^a(\alpha_k, \beta_k) \right]^{f}, \qquad 0 < f \le 1 \quad (23) $$
For a Gaussian distribution, the exponent f simply corresponds to multiplying the covariance by a factor 1/f, which means that the effect of time is simply to increase the error variance on the previous estimate by a factor 1/f. We call f the “forgetting exponent” because it corresponds to the exponential rate at which old information must be forgotten in the estimation of the parameters. The cost function in (22) indeed transforms to
$$ J_k(\alpha, \beta) = -f^{\,k} \ln p_0(\alpha, \beta) + \frac{1}{2} \sum_{k'=1}^{k} f^{\,k-k'} \left[ d_{k'}^{\mathrm{T}}\, \mathsf{C}_{k'}^{-1}(\alpha, \beta)\, d_{k'} + \ln \left| \mathsf{C}_{k'}(\alpha, \beta) \right| \right] + \mathrm{Cst} \quad (24) $$
Measured in number of assimilation cycles, the e-folding forgetting time scale is thus ke = (ln 1/f)−1. Using a very small f exponent (f → 0) means that only the last innovation is used to estimate the current parameters. This particular case exactly corresponds to the solution proposed by Dee (1995).

In addition, the solution of Dee (1995) is written without the first term in the cost function in (22) or (24), which corresponds to computing the maximum likelihood estimator of the parameters (as defined, e.g., in Von Mises 1964, chapter 10). The parameters α*k and β*k are then said to maximize the likelihood of the observed innovation sequence (i.e., the conditional probability of the innovation sequence for given parameters). This is useful in the absence of reliable prior information on the parameters. With the parameterization in (23), this initial information is anyway progressively forgotten with time, as shown by the exponentially decreasing factor fk in the cost function in (24).
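A small numerical illustration of the forgetting exponent (assumed values, not from the paper): with f = 0.99, the e-folding time is ke = (ln 1/f)−1 ≈ 99.5 cycles, and the weights f^(k−k′) in (24) show how quickly old innovations stop contributing.

```python
import numpy as np

f = 0.99                                   # forgetting exponent of Eq. (23)
k_e = 1.0 / np.log(1.0 / f)                # e-folding forgetting time scale
print(k_e)                                 # ~99.5 assimilation cycles
ages = np.arange(0, 501, 100)              # innovation ages in cycles
print(np.round(f ** ages, 3))              # weights: [1. 0.366 0.134 0.049 0.018 0.007]
```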

c. Evaluation of the cost function

Minimizing the cost functions (22) or (24) with respect to the parameters requires the application of an iterative method, and thus the possibility of evaluating Jk for the successive iterates of the parameters α, β (for the sake of simplicity, subscript k is removed for the parameters; it is now implicit that they are computed for the current cycle k). The main difficulty with the expressions in (22) or (24) is that the evaluation of Jk requires the computation of the inverse and determinant of the covariance matrix 𝗖k′(α, β) for the full sequence k′ ≤ k of previous innovation vectors, and this is needed for all successive iterates of α, β (let p be the number of iterates needed to reach the minimum with sufficient accuracy). Even if we truncate the innovation sequence to the K last innovations, this corresponds to a computational complexity proportional to pKy3 (leading behavior for large y), just to compute the optimal parameters α* and β* at time tk [i.e., a factor pK with respect to the computational complexity of the observational update (4) and (5), or more precisely, with respect to the leading component (proportional to y3) of this computational complexity for large observation vectors]. This large computational cost explains why using K = 1 (as in Dee 1995) is the only affordable solution (with this classic observational update algorithm) to compute optimal adaptive parameters in realistic atmospheric or oceanic assimilation systems. But, even in this special case, the computational complexity of the parameter estimation is still a factor p larger than the estimation of the state vector with Eqs. (4) and (5).

The objective of this section is to show how the cost function can be computed using the reduced innovation vector δk (size r) defined by Eq. (8), that is, by exploiting the transformation in (7) that transports the inversion problem from the observation space to the reduced dimension space defined by the square root or ensemble representation of the forecast error covariance matrix 𝗣kf(α). For that purpose, assume first that we know the reduced innovation vectors δk′(α, β) and the matrices Γk′(α, β), for k′ ≤ k, defined by Eqs. (8) and (9) as functions of the parameters α, β, together with the corresponding eigenvalues Λk′(α, β) and eigenvectors 𝗨k′(α, β) as defined by Eq. (9). From this, we need to compute two kinds of terms in the cost function: dk′T𝗖k′−1(α, β)dk′ and ln|𝗖k′(α, β)|. On the one hand, dk′T𝗖k′−1dk′ can be computed by inverting

$$ \mathsf{C}_{k'}(\alpha, \beta) = \mathsf{H}_{k'}\, \mathsf{P}_{k'}^f(\alpha)\, \mathsf{H}_{k'}^{\mathrm{T}} + \mathsf{R}_{k'}(\beta) \quad (25) $$
using the Sherman–Morrison–Woodbury formula:
$$ \mathsf{C}_{k'}^{-1} = \mathsf{R}_{k'}^{-1} - \mathsf{R}_{k'}^{-1}\, \mathsf{H}_{k'} \mathsf{S}_{k'}^f\; \boldsymbol{\Gamma}_{k'}^{-1} \left(\mathsf{H}_{k'} \mathsf{S}_{k'}^f\right)^{\mathrm{T}} \mathsf{R}_{k'}^{-1} \quad (26) $$
which gives directly:
$$ d_{k'}^{\mathrm{T}}\, \mathsf{C}_{k'}^{-1}\, d_{k'} = d_{k'}^{\mathrm{T}}\, \mathsf{R}_{k'}^{-1}\, d_{k'} - \delta_{k'}^{\mathrm{T}}\, \boldsymbol{\Gamma}_{k'}^{-1}\, \delta_{k'} \quad (27) $$
or
$$ d_{k'}^{\mathrm{T}}\, \mathsf{C}_{k'}^{-1}\, d_{k'} = d_{k'}^{\mathrm{T}}\, \mathsf{R}_{k'}^{-1}\, d_{k'} - \delta_{k'}^{\mathrm{T}}\, \mathsf{U}_{k'}\, \boldsymbol{\Lambda}_{k'}\, \mathsf{U}_{k'}^{\mathrm{T}}\, \delta_{k'} \quad (28) $$
On the other hand, ln|𝗖k′| can be computed using the general formula:

$$ \left| \mathsf{X} + \mathsf{Z}\, \mathsf{Z}^{\mathrm{T}} \right| = \left| \mathsf{X} \right| \left| \mathsf{I} + \mathsf{Z}^{\mathrm{T}}\, \mathsf{X}^{-1}\, \mathsf{Z} \right| \quad (29) $$
where 𝗭 is any rectangular matrix and 𝗫 a regular square matrix. By applying this formula to the matrix in (25) with 𝗫 = 𝗥k′(β) and 𝗭 = 𝗛k′𝗦k′f(α), and using the definition (9) of Γk′, we obtain the determinant:

$$ \left| \mathsf{C}_{k'} \right| = \left| \mathsf{R}_{k'} \right| \left| \boldsymbol{\Gamma}_{k'} \right| \quad (30) $$
In addition, using the definition (9) of Λk′ and since |Γk′| = |Λk′|−1, we finally obtain the following:

$$ \ln \left| \mathsf{C}_{k'} \right| = \ln \left| \mathsf{R}_{k'} \right| - \ln \left| \boldsymbol{\Lambda}_{k'} \right| = \ln \left| \mathsf{R}_{k'} \right| - \sum_{l=1}^{r} \ln \lambda_{l,k'} \quad (31) $$
Efficiently computing the determinant of the covariance matrix 𝗖k is a classic difficulty in many problems involving the estimation of Gaussian parameters, which is here circumvented by introducing the diagonal covariance matrix Λk instead of 𝗖k using the transformation in (7).

Yet, the transformed Eqs. (28) and (31) are still not a solution by themselves to efficiently compute the cost function as a function of the parameters, since the computation of δk′(α, β), Γk′(α, β), and then Λk′(α, β) and 𝗨k′(α, β) is required for the full sequence of previous innovation vectors k′ ≤ k, and for all successive iterates of (α, β) (needed to minimize the cost function); that is, there is still a factor pK with respect to the computational complexity of the state observational update (8) and (9), or more precisely, with respect to the leading component (proportional to y) of this computational complexity for large observation vectors. However, with the transformed equations in (28) and (31), there are classes of parameters α, β for which the additional computational complexity of the cost function minimization is either negligible with respect to the state observational update, or at least independent of the number y of observations. These classes of parameters are presented in sections 3d–f. Such simplifications are not possible with the original expressions in (22) or (24), because the matrix 𝗖k′(α, β) is the sum of two matrices: one depending on α and the other on β. The inverse and determinant must therefore be computed explicitly for every iteration of α and β.
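The identity behind this efficiency can be checked numerically. The illustrative sketch below (Python, random test matrices, all names made up) evaluates the two cost-function terms with the reduced-space formulas (28) and (31) and compares them with a direct computation in observation space.

```python
import numpy as np

# Illustrative numerical check (made-up test data) that the reduced-space
# formulas (27)-(28) and (30)-(31) reproduce the direct cost-function terms.
rng = np.random.default_rng(1)
n, r, ny = 40, 5, 60                      # state size, rank, number of observations
Sf = rng.normal(size=(n, r))              # square root of Pf, Eq. (6)
H = rng.normal(size=(ny, n))
Rdiag = 0.5 + rng.random(ny)              # diagonal observation error covariance
d = rng.normal(size=ny)                   # an innovation vector

HS = H @ Sf
delta = HS.T @ (d / Rdiag)                               # Eq. (8)
Gamma = np.eye(r) + (HS.T * (1.0 / Rdiag)) @ HS          # Eq. (9)
invlam, U = np.linalg.eigh(Gamma)                        # eigenvalues of Gamma = 1/lambda

# Reduced-space evaluation: cost independent of ny once delta is formed
term1 = d @ (d / Rdiag) - delta @ (U @ ((1.0 / invlam) * (U.T @ delta)))   # Eq. (28)
term2 = np.sum(np.log(Rdiag)) + np.sum(np.log(invlam))                     # Eq. (31)

# Direct evaluation in observation space (cost ~ ny^3)
C = HS @ HS.T + np.diag(Rdiag)                           # Eq. (25) with alpha = beta = 1
print(np.isclose(term1, d @ np.linalg.solve(C, d)))      # True
print(np.isclose(term2, np.linalg.slogdet(C)[1]))        # True
```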

d. Scaling of the forecast error covariance matrix

Let us assume first that a scaling factor α > 0 is introduced at each time tk to rescale the forecast error covariance matrix that is normally produced by the filter: 𝗣kf = α𝗣̃kf. [An efficient method to compute optimal estimates of this scaling factor is also proposed by Anderson (2007, 2009).] This can be done for instance to compensate for a possible collapse of the ensemble forecast that can result from an insufficient system noise parameterization or the inadequacy of the Gaussian approximation. Then, if α is the only parameter to estimate, the terms (28) and (31) of the cost function reduce to

$$ d_k^{\mathrm{T}}\, \mathsf{C}_k^{-1}(\alpha)\, d_k = d_k^{\mathrm{T}}\, \mathsf{R}_k^{-1}\, d_k - \alpha \sum_{l=1}^{r} \frac{ \tilde{\lambda}_{l,k} \left( \tilde{u}_{l,k}^{\mathrm{T}}\, \tilde{\delta}_k \right)^2 }{ \tilde{\lambda}_{l,k} + \alpha \left( 1 - \tilde{\lambda}_{l,k} \right) }, \qquad \ln \left| \mathsf{C}_k(\alpha) \right| = \ln \left| \mathsf{R}_k \right| + \sum_{l=1}^{r} \ln \frac{ \tilde{\lambda}_{l,k} + \alpha \left( 1 - \tilde{\lambda}_{l,k} \right) }{ \tilde{\lambda}_{l,k} } \quad (32) $$
where δ̃k, Λ̃k, and 𝗨̃k (with columns ũl,k) are already available from the state observational update performed at time tk, so that the computational complexity of (32) is negligible (it is proportional to r, because the product 𝗨̃kTδ̃k is also already available). Even the gradient of the cost function can be computed for a similar numerical cost. Consequently, if the observational update is performed in the reduced space defined by the transformation in (7), an optimal adaptive estimate of the scaling factor α* can always be obtained without significant additional cost. Moreover, this can be directly generalized to a vector of parameters, scaling separately the eigenmodes of the forecast error covariance matrix (i.e., by independent scale factors applied to the eigenvalues λl,k, l = 1, … , r). This can be done for instance by parameterizing a function α(λ) giving the scaling factor α as a function of the relative importance of each mode in the square root or ensemble representation of 𝗣̃kf.

However, when estimating the inflation parameter α, it is important to keep in mind that nothing prevents the maximum likelihood estimator of α from being negative, which means that innovations are too small to be explained by the observation error covariance 𝗥k alone, so that the adaptive scheme is attempting to reduce 𝗖k(α) by subtracting a multiple of 𝗛k𝗣̃kf𝗛kT from 𝗥k. This usually results from an incorrect parameterization of the observation error covariance matrix, which underestimates the accuracy of the observations. This problem cannot occur with the maximum probability estimator since the prior probability distribution p0(α) is equal to zero for α ≤ 0, so that optimization always provides a positive value for α*. In this case, however, the optimal value is closely related to the behavior of p0(α) near zero and no longer to the statistics of the innovation sequence. This means that the adaptive scheme ascribes undeserved accuracy to the forecast (i.e., a small scaling factor α*), to compensate for an incorrect parameterization of the observation errors. This phenomenon always results in further underestimating the weight of the observations in the assimilation system, and it can only be avoided either by improving the parameterization of the observation errors or by including observation error parameters β in the list of adaptive parameters (see sections 3e,f).
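The following sketch (illustrative Python with made-up test dimensions) scans the likelihood of α with the reduced formulas of (32): once δ̃k, 𝗨̃k, and the eigenvalues are stored from the state update, each trial value of α costs only O(r) operations.

```python
import numpy as np

# Illustrative likelihood scan for the forecast error scaling alpha (section 3d).
# Test dimensions and data are made up; Rdiag holds the diagonal of R.
rng = np.random.default_rng(4)
r, ny, K = 5, 80, 50
HS = rng.normal(size=(ny, r))                     # H S~ (prior square root in obs space)
Rdiag = np.full(ny, 0.04)                         # R = sigma^2 I with sigma = 0.2
C_true = 2.0 * HS @ HS.T + np.diag(Rdiag)         # innovations drawn with true alpha = 2
D = rng.multivariate_normal(np.zeros(ny), C_true, size=K)

invlam, U = np.linalg.eigh(np.eye(r) + (HS.T * (1.0 / Rdiag)) @ HS)  # Eq. (9), prior
Z = ((D / Rdiag) @ HS @ U) ** 2                   # squared projections of Eq. (8) vectors
q0 = np.sum(D * (D / Rdiag))                      # sum over innovations of d^T R^-1 d

def J(alpha):                                     # sum over innovations of Eq. (32) terms
    g = 1.0 + alpha * (invlam - 1.0)              # eigenvalues of Gamma(alpha)
    return q0 - alpha * np.sum(Z / g) + K * (np.sum(np.log(Rdiag)) + np.sum(np.log(g)))

alphas = np.linspace(0.05, 5.0, 500)
print("alpha* =", alphas[np.argmin([J(a) for a in alphas])])   # close to 2
```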

e. Scaling of the observation error covariance matrix

A second possibility is to introduce a scaling factor β at each time tk to rescale the observation error covariance matrix: 𝗥k = β𝗥̃k, where 𝗥̃k is the default prior estimate for this matrix. This can be useful to adjust inaccurate observation error statistics resulting in particular from the unknown amplitude of representation errors. Thus, if we also keep the parameter α (introduced in section 3d) in the control vector, Eqs. (28) and (31) reduce to

$$ d_k^{\mathrm{T}}\, \mathsf{C}_k^{-1}(\alpha, \beta)\, d_k = \frac{1}{\beta}\, d_k^{\mathrm{T}}\, \tilde{\mathsf{R}}_k^{-1}\, d_k - \frac{\alpha}{\beta^2} \sum_{l=1}^{r} \frac{ \tilde{\lambda}_{l,k} \left( \tilde{u}_{l,k}^{\mathrm{T}}\, \tilde{\delta}_k \right)^2 }{ \tilde{\lambda}_{l,k} + (\alpha/\beta) \left( 1 - \tilde{\lambda}_{l,k} \right) } \quad (33) $$
and
$$ \ln \left| \mathsf{C}_k(\alpha, \beta) \right| = y_k \ln \beta + \ln \left| \tilde{\mathsf{R}}_k \right| + \sum_{l=1}^{r} \ln \frac{ \tilde{\lambda}_{l,k} + (\alpha/\beta) \left( 1 - \tilde{\lambda}_{l,k} \right) }{ \tilde{\lambda}_{l,k} } \quad (34) $$
where yk is the number of observations at time tk. Again, the computational complexity of (33) and (34) is negligible. The term dkT𝗥̃k−1dk can indeed be computed once and for all for each time tk, with a computational complexity 2yk (for a diagonal 𝗥̃k) that is negligible with respect to that of the observational update. We can then conclude that, if the observational update is performed in the reduced space defined by the transformation in (7), optimal adaptive estimates of both scaling factors α*k and β*k can always be obtained without significant additional cost.
In addition, it is possible to generalize the simple problem of estimating one single scaling factor β for the whole observation error covariance matrix to the more general problem of estimating separate scaling factors βi for several segments of the observation vector. This may be useful for instance if the observation vector results from the merging of several datasets (i = 1, … , N), originating from different instruments, whose errors need to be scaled separately. However, this generalized problem requires additional matrix operations that may significantly increase the computational complexity, because changing the structure of the matrix 𝗥k modifies the eigenvalue problem defined by Eq. (9), which must then be solved again for every new iterate of the vector of parameters β. The first step is to express the inverse matrix 𝗥k−1(β) as the sum of the matrices 𝗥̃k,i−1 (separately scaled with factor 1/βi), which are nonzero only for observations belonging to the corresponding dataset:

$$ \mathsf{R}_k^{-1}(\beta) = \sum_{i=1}^{N} \frac{1}{\beta_i}\, \tilde{\mathsf{R}}_{k,i}^{-1} \quad (35) $$
and to separately store the corresponding vectors δk,i and matrices Γk,i given by Eqs. (8) and (9). This computation does not involve any additional operations since this can be done once and for all, and since every scalar product only needs to be extended to the part of 𝗥̃k,i−1 with nonzero values (which are complementary, so that the count of operations remains unchanged). From these precomputed elements, the cost function can be evaluated by performing the following operations for every iterate of the vector β and for every k′ ≤ k: (i) compute the overall δk′(β) and Γk′(β) as

$$ \delta_{k'}(\beta) = \sum_{i=1}^{N} \frac{1}{\beta_i}\, \delta_{k',i}, \qquad \boldsymbol{\Gamma}_{k'}(\beta) = \mathsf{I} + \sum_{i=1}^{N} \frac{1}{\beta_i} \left( \boldsymbol{\Gamma}_{k',i} - \mathsf{I} \right) \quad (36) $$
and (ii) compute the eigenvalue–eigenvector decomposition of Γk′(β), from which the second term of Eqs. (28) and (31) can be easily computed [with factor α as in (32) if required]. Concerning the first term of (28) or (31), it can be computed as the sum of the individual elements:

$$ d_{k'}^{\mathrm{T}}\, \mathsf{R}_{k'}^{-1}(\beta)\, d_{k'} = \sum_{i=1}^{N} \frac{1}{\beta_i}\, d_{k'}^{\mathrm{T}}\, \tilde{\mathsf{R}}_{k',i}^{-1}\, d_{k'}, \qquad \ln \left| \mathsf{R}_{k'}(\beta) \right| = \sum_{i=1}^{N} \left( y_{k',i} \ln \beta_i + \ln \left| \tilde{\mathsf{R}}_{k',i} \right| \right) \quad (37) $$
where yk′,i is the number of observations in the segment number i (and |𝗥̃k′,i| the determinant of the corresponding nonzero block). This computation is straightforward since the individual terms dk′T𝗥̃k′,i−1dk′ can also be computed once and for all at time step k′. The dominant cost of these additional operations (for N ≪ r) comes from the necessity to recompute Λk′ and 𝗨k′ (computational complexity proportional to r3) for every iterate of β and every k′ ≤ k, so that the overall additional complexity is proportional to pKr3, still independent of the number of observations, but no longer linear in r.

Finally, it is important to keep in mind that, in order to attempt the joint control of observation and forecast error covariance scaling factors α and β, we must be certain that both parameters can be simultaneously identified through the innovation sequence. [See Li et al. (2009), who also propose a method to control these two parameters.] This is for instance clearly impossible if observation and forecast errors have identical covariance structures (in the observation space), because then parameters α and β play exactly the same role in the innovation covariance matrix in (25). In this ill-conditioned situation, parameters α and β are not jointly controllable whatever the number of observations or the length of the innovation sequence.
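The controllability caveat is easy to reproduce numerically. In the illustrative sketch below (Python, made-up data), 𝗛𝗣f𝗛T and 𝗥 share the same structure 𝗕, so the innovation covariance (25) is (α + β)𝗕 and the likelihood is exactly invariant under exchanges of α and β that keep α + β fixed.

```python
import numpy as np

# Illustrative check of the controllability caveat (made-up data): when
# H Pf H^T and R share the same structure B, the innovation covariance (25)
# is (alpha + beta) B, and only the sum alpha + beta is identifiable.
rng = np.random.default_rng(2)
ny = 40
A = rng.normal(size=(ny, ny))
B = A @ A.T / ny + np.eye(ny)             # common covariance structure
d = rng.multivariate_normal(np.zeros(ny), 2.5 * B)

def neg2loglik(alpha, beta):
    C = alpha * B + beta * B              # C(alpha, beta) of Eq. (25)
    return d @ np.linalg.solve(C, d) + np.linalg.slogdet(C)[1]

print(np.isclose(neg2loglik(2.0, 0.5), neg2loglik(0.5, 2.0)))   # True
```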

f. Observation error correlation length scale

All developments presented so far are valid whatever the shape of the observation error covariance matrix. However, performing the observational update in the reduced space using Eqs. (7)–(12) can only be efficient (i.e., linear in the number of observations) if the observation error covariance matrix 𝗥k can be inverted at low cost (e.g., if it can be assumed diagonal). It is nevertheless possible to preserve the efficiency of the transformed observational update algorithm in the presence of observation error correlations, as shown by Brankart et al. (2009). Their method consists of augmenting the observation vector yk with new observations that are linear combinations of the original observations, and assuming a diagonal observation error covariance matrix in the augmented observation space. Since the computational complexity of the algorithm is linear in the number of observations, the size of the observation vector can indeed be increased without prohibitive effect on the numerical cost. If the augmented observation vector y+, the augmented observation operator 𝗛+, and the associated diagonal observation error covariance matrix 𝗥+ are defined as
$$ y_k^{+} = \begin{pmatrix} y_k \\ \mathsf{T}\, y_k \end{pmatrix}, \qquad \mathsf{H}_k^{+} = \begin{pmatrix} \mathsf{H}_k \\ \mathsf{T}\, \mathsf{H}_k \end{pmatrix}, \qquad \mathsf{R}^{+} = \begin{pmatrix} \mathsf{R}_0 & 0 \\ 0 & \mathsf{R}_1 \end{pmatrix} \quad (38) $$
where 𝗧 is any linear transformation operator, and 𝗥0, 𝗥1 are diagonal matrices, then, the observational update that is performed using y+ and 𝗥+ is equivalent to an observational update performed using only observations y with a nondiagonal observation matrix 𝗥 given by
$$ \mathsf{R} = \left( \mathsf{R}_0^{-1} + \mathsf{T}^{\mathrm{T}}\, \mathsf{R}_1^{-1}\, \mathsf{T} \right)^{-1} \quad (39) $$
It can be shown for instance that adding gradient observations (𝗧 is then the discrete gradient operator) is equivalent to assuming observation error correlation decreasing exponentially with the distance, with a correlation length scale given by ℓ = σ0/σ1 (for homogeneous 𝗥0 = σ02𝗜 and 𝗥1 = σ12𝗜). The purpose of this section is to show how the optimal adaptive algorithm described above must be modified if this kind of parameterization of the observation errors correlations is applied.
First of all, it can be easily verified that the terms of the cost function in (27) are not affected by the observation vector transformation and can thus be identically computed using y and 𝗥 or y+ and 𝗥+. The vector δk and the matrix Γk are indeed left unchanged by this change of observation vector, if the observation error covariance matrices 𝗥 and 𝗥+ are related by Eq. (39). (It is the purpose of this transformation to keep the observational update unchanged.) This is also true for the term dT𝗥−1d in Eq. (27):
$$ d^{\mathrm{T}}\, \mathsf{R}^{-1}\, d = d^{\mathrm{T}}\, \mathsf{R}_0^{-1}\, d + \left( \mathsf{T} d \right)^{\mathrm{T}} \mathsf{R}_1^{-1} \left( \mathsf{T} d \right) \quad (40) $$
These terms can thus be computed efficiently using a diagonal observation error covariance matrix in the augmented observation space, using the elements δk, Γk, 𝗨k, and Λk that are already available from the observational update performed at time tk.
However, the adaptive algorithm cannot be applied blindly with the augmented observation vector defined by (38), because the determinant of the observation error covariance matrix, which is present in the terms of the cost function in (31), is modified by the transformation: |𝗥| ≠ |𝗥+|. The evaluation of the cost function thus requires an explicit computation of the determinant |𝗥| using Eq. (39). Fortunately, this computation can be simplified a great deal using the following transformation. First, the determinant of (39) can be rewritten as
$$ \left| \mathsf{R} \right| = \left| \mathsf{R}_0^{-1} + \mathsf{T}^{\mathrm{T}}\, \mathsf{R}_1^{-1}\, \mathsf{T} \right|^{-1} = \left| \mathsf{R}_0 \right| \left| \mathsf{I} + \mathsf{R}_0\, \mathsf{T}^{\mathrm{T}}\, \mathsf{R}_1^{-1}\, \mathsf{T} \right|^{-1} \quad (41) $$
Then using again Eq. (29) in the same way as it is used to deduce (30) and (31), we obtain
$$ \ln \left| \mathsf{R} \right| = \ln \left| \mathsf{R}_0 \right| - \sum_{l=1}^{y} \ln \left( 1 + \frac{\sigma_0^2}{\sigma_1^2}\, \mu_l \right) \quad (42) $$

where μl are the eigenvalues of 𝗧T𝗧 (for homogeneous 𝗥0 = σ02𝗜 and 𝗥1 = σ12𝗜). In general, computing the μl eigenvalues is a problem with computational complexity proportional to y3, but in many practical situations, a sufficient knowledge of this spectrum may not be very difficult to acquire if 𝗧 is kept simple enough [as is required for an efficient observational update, see Brankart et al. (2009)]. For instance, if 𝗧 is a discrete gradient operator and 𝗥0 = σ02𝗜 and 𝗥1 = σ12𝗜 are homogeneous, μl are simply the eigenvalues of a discrete Laplacian operator.
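This determinant identity can be verified directly. The sketch below (illustrative Python, a one-dimensional observation array and a discrete gradient 𝗧, all sizes made up) compares ln|𝗥| computed from the implicit definition (39) with the eigenvalue form (42).

```python
import numpy as np

# Illustrative check of Eqs. (39)-(42) in one dimension (made-up sizes):
# R is defined implicitly by augmenting the observations with their discrete
# gradient (operator T), and ln|R| is recovered from the spectrum of T^T T.
ny = 30
s0, s1 = 0.2, 0.05                        # R0 = s0^2 I, R1 = s1^2 I, so l = s0/s1
T = np.zeros((ny - 1, ny))                # discrete gradient operator
for i in range(ny - 1):
    T[i, i], T[i, i + 1] = -1.0, 1.0

Rinv = np.eye(ny) / s0**2 + T.T @ T / s1**2       # inverse of R, from Eq. (39)
logdet_direct = -np.linalg.slogdet(Rinv)[1]       # ln|R| computed the expensive way

mu = np.linalg.eigvalsh(T.T @ T)                  # eigenvalues of a discrete Laplacian
logdet_eig = ny * np.log(s0**2) - np.sum(np.log1p((s0 / s1)**2 * mu))   # Eq. (42)
print(np.isclose(logdet_direct, logdet_eig))      # True
```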
Let us assume now that the observation error covariance matrix depends on two uncertain parameters β0 and β1, which must be controlled:
$$ \mathsf{R}(\beta_0, \beta_1) = \left( \frac{1}{\beta_0}\, \tilde{\mathsf{R}}_0^{-1} + \frac{1}{\beta_1}\, \mathsf{T}^{\mathrm{T}}\, \tilde{\mathsf{R}}_1^{-1}\, \mathsf{T} \right)^{-1} \quad (43) $$

where 𝗥̃0 and 𝗥̃1 are the default prior estimates of the diagonal matrices 𝗥0 and 𝗥1 defined by (39). Here β0 and β1 are thus scaling factors for these two matrices: 𝗥0 = β0𝗥̃0 and 𝗥1 = β1𝗥̃1. If 𝗧 is a combination of derivative operators of successive orders (and 𝗥0, 𝗥1 are spatially homogeneous: 𝗥0 = σ02𝗜, 𝗥1 = σ12𝗜), then the scaling factor for the correlation length scale ℓ is a function of the ratio β0/β1. If 𝗧 is a discrete gradient, this reduces to √(β0/β1) (see Brankart et al. 2009). With the parameterization in (43) together with the scaling parameter α, Eqs. (28) and (31) for the components of the cost function reduce to
$$ d_k^{\mathrm{T}}\, \mathsf{C}_k^{-1}\, d_k = \frac{1}{\beta_0}\, d_k^{\mathrm{T}}\, \tilde{\mathsf{R}}_0^{-1}\, d_k + \frac{1}{\beta_1} \left( \mathsf{T} d_k \right)^{\mathrm{T}} \tilde{\mathsf{R}}_1^{-1}\, \mathsf{T} d_k - \delta_k^{\mathrm{T}}\, \mathsf{U}_k\, \boldsymbol{\Lambda}_k\, \mathsf{U}_k^{\mathrm{T}}\, \delta_k \quad (44) $$
and
$$ \ln \left| \mathsf{C}_k \right| = \ln \left| \mathsf{R}_k(\beta_0, \beta_1) \right| - \ln \left| \boldsymbol{\Lambda}_k \right|, \qquad \ln \left| \mathsf{R}_k(\beta_0, \beta_1) \right| = y_k \ln \beta_0 + \ln \left| \tilde{\mathsf{R}}_0 \right| - \sum_{l=1}^{y_k} \ln \left( 1 + \frac{\beta_0}{\beta_1}\, \mu_l \right) \quad (45) $$

[with μl here the eigenvalues of 𝗥̃0𝗧T𝗥̃1−1𝗧, cf. (41)–(42)],
because δk(α, β0, β1) and Γk(α, β0, β1) defined in (36) can be written as
$$ \delta_k(\alpha, \beta_0, \beta_1) = \sqrt{\alpha} \left( \frac{1}{\beta_0}\, \tilde{\delta}_{k,0} + \frac{1}{\beta_1}\, \tilde{\delta}_{k,1} \right), \qquad \boldsymbol{\Gamma}_k(\alpha, \beta_0, \beta_1) = \mathsf{I} + \frac{\alpha}{\beta_0} \left[ \left( \tilde{\boldsymbol{\Gamma}}_{k,0} - \mathsf{I} \right) + \frac{\beta_0}{\beta_1} \left( \tilde{\boldsymbol{\Gamma}}_{k,1} - \mathsf{I} \right) \right] \quad (46) $$

where the index i = 0, 1 refers to the two segments of the augmented observation vector (the original observations and their 𝗧-transforms), with δ̃k,i and Γ̃k,i computed with 𝗥̃0 and 𝗥̃1 as in (36).
Equations (44)–(46) show that the eigenvalue–eigenvector decomposition in (9) needs only to be recomputed when iterating on the ratio β0/β1, which scales the observation error correlation length. The computational complexity of the minimization is thus proportional to pKr3 (leading behavior for large y and r), where p is here only the number of iterations requiring the update of the correlation length scale. In summary, by minimizing the cost function in (22) or (24) with components (44) and (45), it is possible to adapt simultaneously scaling factors for the forecast error covariance matrix, for the observation error covariance matrix, and for the observation error correlation length scale (given by α, β0, and β0/β1, respectively).
As a particular case, the problem with α = 0 corresponds to estimating the parameters β0 and β1 of the covariance 𝗥 of a zero mean random vector from a finite sample of independent events dk′, k′ = 1, … , k. In this case, the last term disappears in (44) and (45), and we can obtain the likelihood function as
$$ L_k(\beta_0, \beta_1) \propto \left| \mathsf{R}(\beta_0, \beta_1) \right|^{-k/2} \exp \left\{ -\frac{1}{2} \sum_{k'=1}^{k} \left[ \frac{1}{\beta_0}\, d_{k'}^{\mathrm{T}}\, \tilde{\mathsf{R}}_0^{-1}\, d_{k'} + \frac{1}{\beta_1} \left( \mathsf{T} d_{k'} \right)^{\mathrm{T}} \tilde{\mathsf{R}}_1^{-1}\, \mathsf{T} d_{k'} \right] \right\} \quad (47) $$

where a constant size y of the observation vector, a constant operator 𝗧, and constant diagonal matrices 𝗥̃0 and 𝗥̃1 are assumed. If a prior probability distribution p0(β0, β1) is available, the posterior distribution is then pka(β0, β1) ∼ p0(β0, β1)Lk(β0, β1). Parameters β0 and β1 jointly govern the observation error variance σ2 and the observation error correlation length scale ℓ (while the shape of the correlation function is governed by the operator 𝗧, which is assumed to be known). The likelihood function in (47) is thus also a likelihood function for σ2 and ℓ as soon as their relation to β0 and β1 is known. This relation can be deduced from the functions f(ℓ) = σ02/σ12 and g(ℓ) = σ02/σ2 that can be obtained from (43) with 𝗥̃0 = σ02𝗜 and 𝗥̃1 = σ12𝗜 (see Brankart et al. 2009). Using these functions, the likelihood function in (47) can be rewritten as a function of σ2 and ℓ:

$$ L_k(\sigma^2, \ell) \propto \left[ g(\ell)\, \sigma^2 \right]^{-ky/2} \prod_{l} \left[ 1 + f(\ell)\, \mu_l \right]^{k/2} \exp \left\{ -\frac{1}{2\, g(\ell)\, \sigma^2} \sum_{k'=1}^{k} \left[ d_{k'}^{\mathrm{T}}\, d_{k'} + f(\ell) \left\| \mathsf{T} d_{k'} \right\|^2 \right] \right\} \quad (48) $$

where μl are here directly the eigenvalues of 𝗧𝗧T (and not of 𝗧T𝗧 as before). If 𝗧 is the gradient operator, the μl are simply the eigenvalues of the Laplacian operator, f(ℓ) = ℓ2, and g(ℓ) is a decreasing function of ℓ, behaving proportionally to ℓ−n (in n dimensions) if ℓ is large with respect to the distance between observations. If the correlation length is known, we retrieve the classic likelihood function for the variance σ2 of a Gaussian random vector. Conversely, if the variance is known, Eq. (48) is simply the likelihood function L(ℓ) for the correlation length of a random vector with known variance and known correlation shape. The equation shows that it can be computed explicitly at low cost as soon as 𝗧, μl, f(ℓ), and g(ℓ) are known (see section 4d).

4. Demonstration experiments

The purpose of this section is to demonstrate how the optimal adaptive algorithm described in the previous sections can be used in practice. As an example application, we consider the problem of estimating the long-term evolution of a model-simulated ocean mesoscale signal from synthetic observations of the ocean dynamic topography. To concentrate on the behavior of the adaptive algorithm, it is assumed that the only source of information to solve this estimation problem comes from these synthetic observations. The dynamical laws governing the ocean flow are not exploited to constrain the solution. Only simplified assimilation experiments are thus performed in which the ocean model operator 𝗠 is set to zero (total ignorance) or identity (persistence). In that way, a clear diagnostic of the adaptive mechanism can be obtained, without being blinded by unverifiable interferences with a complex nonlinear ocean model. Several examples are shown to illustrate the control of the forecast error covariance scaling (section 4b), the observation error covariance scaling (section 4c), and the correlation length scale (section 4d), before attempting the joint control of all these parameters (section 4e). But before that, we describe how the reference mesoscale signal and the synthetic observations are generated (section 4a).

a. Description of the experiments

The reference mesoscale signal is simulated using a primitive equation model of an idealized square and 5000-m-deep flat bottom ocean at midlatitudes (between 25° and 45°N). In this square basin, a double-gyre circulation is created by a constant zonal wind forcing blowing westward in the northern and southern parts of the basin and eastward in the middle part of the basin {with sinusoidal latitude dependence: τ = −τ0 cos[2π(λ − λmin)/(λmax − λmin)] and τ0 = 0.1 N m−2}. The western intensification of these two gyres produces western boundary currents that feed an eastward jet in the middle of the square basin (see the resulting mean dynamic height in Fig. 1). This jet is unstable (Le Provost and Verron 1987), so that the flow is dominated by chaotic mesoscale dynamics, with the largest eddies ∼100 km wide, with corresponding velocities of ∼1 m s−1 and dynamic height differences of ∼1 m (see the resulting dynamic height standard deviation in Fig. 1). All this is very similar in shape and magnitude to what is observed in the Gulf Stream (North Atlantic) or in the Kuroshio (North Pacific).

The time evolution of this chaotic system is computed using the Nucleus for European Modelling of the Ocean (NEMO) numerical ocean model, with a horizontal resolution of ¼° × ¼° cosλ and 11 levels in the vertical (see Cosme et al. 2010 for more detail about the model configuration). The three main physical parameters governing the dominant characteristics of the flow are the stratification, the bottom friction, and the horizontal viscosity. The model is started from rest with uniform stratification and can be considered to reach equilibrium statistics after 20 yr of simulation. In this paper, we thus concentrate on the estimation of the 100-yr signal from years 21 to 120. Moreover, we focus our study on a limited subdomain in the middle of the jet (about 650 × 650 km, as shown by the black square in Fig. 1) with intense and quite homogeneous mesoscale activity. It is also assumed that the reference simulation is known with a time resolution of one snapshot every 10 days. Figure 2 shows for instance the simulated dynamic height in October of year 21, with a time resolution of 10 days, which is clearly sufficient to observe the slow westward (upstream) motion of the main eddies.

To estimate the time evolution of this mesoscale flow, we assume that the ocean altimetry is observed every 10 days at model resolution. Without a dynamical model to constrain the estimation problem, it is indeed important to have observations with a sufficient horizontal coverage. However, in order to generate the synthetic observations, an artificial observational noise is added to the reference simulation. This noise is meant to simulate the measurement and representation errors that always exist in a real observation dataset. In our system, the representation error mainly includes the subgrid-scale eddies that are not resolved by the model discretization. This component of the observation error is thus often dominant and poorly known, which justifies adjusting its main statistics (variance and correlation length scale) using the available observations. In this study, three kinds of correlation model are used to simulate the observational noise: (A) uncorrelated errors, (B) an exponential decorrelation function: ρ(r) = exp(−r/ℓ), and (C) a smooth noise correlation model: ρ(r) = (r/ℓ)K1(r/ℓ), where K1 is the modified Bessel function of the second kind of order 1. The last two models have the property that they can be efficiently parameterized using a diagonal observation error covariance matrix in an augmented observation space [as in Eq. (38)]. Model B just requires including gradient observations (with error standard deviation σ1 = σ0/ℓ), while model C requires including both gradient and curvature observations (with error standard deviations σ1 = σ0/(√2ℓ) and σ2 = σ0/ℓ2). See Brankart et al. (2009) for more detail about this correspondence. Figure 3 shows an example of the simulated observation noise corresponding to the three correlation models. They are randomly sampled from the Gaussian distribution 𝒩(0, 𝗥), where the observation error covariance matrix 𝗥 is characterized by a uniform standard deviation σ = 0.2 m and one of the correlation models A, B, or C, with a correlation length scale ℓ equal to 5 grid points (for the last two models).
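As an illustration of these three noise models, the sketch below (Python with NumPy/SciPy, a one-dimensional observation array for simplicity) builds the corresponding correlation matrices and draws noise samples from 𝒩(0, σ2ρ); all sizes are made up.

```python
import numpy as np
from scipy.special import kv          # modified Bessel function of the second kind

ny, sigma, l = 200, 0.2, 5.0          # grid points, noise std (m), length scale (points)
r = np.abs(np.subtract.outer(np.arange(ny), np.arange(ny))).astype(float)

rho_B = np.exp(-r / l)                              # model B: exponential decorrelation
x = np.where(r == 0.0, 1.0, r / l)                  # dummy value at r = 0
rho_C = np.where(r == 0.0, 1.0, x * kv(1, x))       # model C: (r/l) K1(r/l) -> 1 as r -> 0

rng = np.random.default_rng(3)
samples = {name: rng.multivariate_normal(np.zeros(ny), sigma**2 * rho,
                                         check_valid="ignore")   # tolerate round-off
           for name, rho in [("A", np.eye(ny)), ("B", rho_B), ("C", rho_C)]}
```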

b. Control of the forecast error covariance scaling

The first set of experiments is dedicated to the estimation of a single global scaling factor α for the forecast error covariance matrix, by applying the method described in section 3d. In these first experiments, observation errors are uncorrelated, with standard deviation σ = 0.2 m, and the corresponding observation error covariance matrix is assumed perfectly known: 𝗥 = σ²𝗜. Problems with a constant and with a time-varying parameter α are considered in turn.

1) Gaussian random signal

As a first illustration of the adaptive mechanism, let us consider the problem of estimating a sequence of random and independent draws from the Gaussian probability distribution 𝒩[0, αref(k)𝗣f], where αref(k) is not accurately known, and 𝗣f is the covariance of the 100-yr model sequence described in section 4a (and illustrated in Fig. 2). In this very simple (but unfavorable) situation, the best model is obviously 𝗠 = 0, and the past observations are only useful to improve the knowledge of the parameter α. Of the pair of filtering problems described by the sequences in (2) and (16), only the second remains, and the problem reduces to estimating one parameter of a Gaussian distribution 𝒩(0, α𝗛𝗣f𝗛T + 𝗥) from a random sample whose size k grows with time.

As a first case study, the reference sequence of draws xk is sampled from a constant Gaussian probability distribution: αref(k) = 1. From this, we can simulate a sequence of observation vectors yk as explained above, and then compute the terms of the cost function in (22) using Eq. (32). This can be done efficiently by computing δ̃k and Λ̃ once and for all (only δ̃k depends on k in this simple problem) from the square root decomposition of 𝗣f using Eqs. (8) and (9). Figure 4 (left panel) shows the resulting likelihood function Lk(α), given by Eq. (21), for the time indices k = 1, 3, 10, and 100. (The function is scaled so that the maximum is always equal to 1.) The narrowing of Lk(α) around the reference value α = 1 for increasing k shows that the knowledge of the parameter increases with time as more observations become available. With a prior probability distribution for the parameter α, for instance p0(α) = exp(−α), the posterior probability at time tk can be computed as pk(α) ∼ p0(α)Lk(α). The mode of this distribution, together with percentiles 0.1 and 0.9, is drawn in Fig. 4 (right panel) as a function of k, showing that the probability density progressively concentrates toward the reference value α = 1.
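
The narrowing of the likelihood can be reproduced with a low-dimensional analog. The sketch below uses toy dimensions and a generic dense-matrix Gaussian likelihood rather than the reduced-space formulas of Eqs. (21), (22), and (32): innovations are drawn from 𝒩(0, αref𝗦 + 𝗥) with 𝗦 standing in for 𝗛𝗣f𝗛T, and the posterior mode of α is located on a grid with the same exponential prior.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 10                                    # observation-space dimension (toy value)
A = rng.standard_normal((m, m))
S = A @ A.T                               # stands in for H Pf H^T
R = 0.04 * np.eye(m)                      # uncorrelated errors, sigma = 0.2 m

def loglik(alpha, innovations):
    # Sum of Gaussian log-densities of the innovations under N(0, alpha*S + R).
    C = alpha * S + R
    _, logdet = np.linalg.slogdet(C)
    Cinv = np.linalg.inv(C)
    return sum(-0.5 * (logdet + d @ Cinv @ d) for d in innovations)

alpha_ref = 1.0
Lt = np.linalg.cholesky(alpha_ref * S + R)
innovations = [Lt @ rng.standard_normal(m) for _ in range(100)]

alphas = np.linspace(0.1, 3.0, 60)
for k in (1, 3, 10, 100):
    # Log-posterior with the exponential prior p0(alpha) = exp(-alpha)
    logpost = np.array([loglik(a, innovations[:k]) - a for a in alphas])
    print(k, round(alphas[logpost.argmax()], 2))  # mode tightens around alpha_ref = 1
```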

If the reference parameter is no longer constant, and if little is known about the time dependence of the parameter values, we can use the simple model described in section 3b to forget the oldest innovations in the computation of the current parameter best estimate. To test the method for several parameter fluctuation time scales in one single experiment, we set αref(k) = a + b sin[ω(k)k] with ω(k) = k/k1 and k1 = 10 000. It is shown in Fig. 5 (thick dotted line) for a = 1 and b = ½. The figure also shows the estimate α*(k) that is obtained using an e-folding forgetting time scale ke = 100 (i.e., a forgetting exponent f ≃ 0.99). This forgetting time scale is well adapted only if it is comparable to the reference parameter fluctuation time scale, ke ∼ 1/ω(k) (i.e., for k ≃ k1/ke = 100). Before this time index, the accuracy of the estimate could be improved by keeping a longer sequence of observations; after it, it becomes better to forget the observations faster. The estimation thus suffers from the poor prior knowledge (summarized in the single parameter f ≃ 0.99) of the time dependence between parameter values. As the fluctuation frequency increases with time, the representation of the successive minima and maxima of the reference parameter time series becomes less and less accurate.
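
The forgetting mechanism amounts to geometric down-weighting of the past innovation terms in the log-likelihood. A minimal sketch follows, under the assumption that each term is evaluated with a generic dense-matrix Gaussian density; the function names are ours, not the paper's.

```python
import numpy as np

def loglik_term(alpha, d, S, R):
    # Gaussian log-density of one innovation d under N(0, alpha*S + R).
    C = alpha * S + R
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + d @ np.linalg.solve(C, d))

def forgetting_loglik(alpha, innovations, S, R, f=0.99):
    # Innovation i (1-based) at current time k receives weight f**(k - i),
    # i.e., an e-folding memory of k_e = -1/ln(f) steps (~100 for f = 0.99).
    k = len(innovations)
    return sum(f ** (k - i) * loglik_term(alpha, d, S, R)
               for i, d in enumerate(innovations, start=1))
```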

2) Model simulated signal

Let us now consider the more realistic problem of estimating the 100-yr model-simulated mesoscale signal described in section 4a (and illustrated in Fig. 2). The initial condition is assumed perfectly known and the observation errors are still uncorrelated (with σ = 0.2 m), but here a better model is to assume persistence: 𝗠 = 𝗜. Furthermore, assuming stationary statistics, the corresponding model error covariance matrix 𝗤 can be consistently parameterized as the time covariance of differences between successive model snapshots in the reference simulation. The forecast step of the state filtering problem thus reduces to xkf = xk−1a. In addition, in these experiments, it is assumed that the model error is always dominant and gives the structure of the forecast error covariance matrix; we thus write 𝗣kf = α𝗤, where α is an unknown scaling parameter. (In other situations, it may be better to make a different assumption, for instance retaining the propagated analysis error term in addition to the scaled model error term α𝗤. Such scaling factors can be adapted as well by the method described in this paper.)
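
A minimal sketch of this stationary parameterization (the array name is hypothetical): 𝗤 is taken as the sample covariance of the one-step differences between successive reference snapshots.

```python
import numpy as np

def persistence_model_error_cov(snapshots):
    # snapshots: (K, n) array of K reference states of dimension n.
    # Q is the sample covariance of the one-step differences x_{k+1} - x_k.
    diffs = np.diff(snapshots, axis=0)
    diffs = diffs - diffs.mean(axis=0)
    return diffs.T @ diffs / (diffs.shape[0] - 1)
```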

With this parameterization, we can solve the joint filtering problem for the state of the system and for the parameter α, as explained in sections 2 and 3. In the parameter filter, we prescribe the prior probability distribution p0(α) = exp(−α) and the forgetting exponent f = 0.9. Figure 6 (top panel) shows the resulting estimate α*(tk), obtained as the mode of the posterior probability distribution at time tk. As expected, the covariance of the error of the persistence model is close to our prior estimate 𝗤, so that the estimated α remains close to 1. On the other hand, as explained in the previous example [section 4b(1)], the estimated accuracy of the parameter is very sensitive to the forgetting exponent (i.e., to the number of innovation vectors that is assumed relevant to include in the current parameter estimate). This sensitivity to a subjective assumption is the very reason why using the closure in (14) and (15) is thought to be better than using the integral in (13) to simulate the forecast error probability distribution (see the explanation at the end of section 2b). In this example with very steady statistics, better parameter accuracy can be obtained by increasing the forgetting exponent f (using for instance f = 0.99 instead of f = 0.9), but this comes at the expense of a much larger numerical cost (10 times larger for f = 0.99), since a longer innovation sequence must be used for each evaluation of the cost function.

Figure 6 also illustrates the corresponding error on the state estimate, as obtained for altimetry (middle panel) and velocity (bottom panel). The figure shows the root-mean-square difference between the state estimate (after the observational update at time tk) and the reference simulation (solid thick line), as compared to the corresponding error estimate produced by the filter (the square root of the trace of the analysis error covariance matrix; dashed thick line). In the adaptive experiment (thick lines), these two curves remain quite consistent in the long term, indicating that the adaptive mechanism sufficiently constrains the error statistics to produce consistent estimates of the total error variance. This result is compared with another simulation that is performed without the adaptive mechanism (thin lines), using a fixed, inaccurate value of the scaling factor α. The error parameterization is then inconsistent, which deprives the filter of statistical optimality, so that larger errors are immediately produced (together with overoptimistic error estimates).

c. Control of the observation error covariance scaling

The second set of experiments is dedicated to the estimation of a single global scaling factor β for the observation error covariance matrix, by applying the method described in section 3e. In these experiments, observation errors are still uncorrelated, with a standard deviation of σ = 0.2 m, but the scaling of the observation error covariance matrix is assumed unknown: 𝗥 = β𝗥̃, with 𝗥̃ = σ²𝗜. We first consider the same example as in section 4b(2), but with a known forecast error covariance matrix 𝗣kf = 𝗤. The only parameter to control is thus the scaling factor β. This is done by solving the joint filtering problem for the state of the system and for the parameter β [as in section 4b(2) for parameter α, but this time using the expression of the cost function given in section 3e], using again the prior probability distribution p0(β) = exp(−β), but with forgetting exponent f = 1, since β can be assumed constant.

Figure 7 (top panel, thick lines) shows the resulting estimate β*(tk), obtained as the mode of the posterior probability distribution at time tk, together with percentiles 0.1 and 0.9. As expected, the optimal estimate β*(tk) quickly converges toward the correct value β = 1, with an associated error decreasing to zero as more observations become available. Old observations are indeed never forgotten, since the parameter is assumed constant. The figure also illustrates the corresponding error on the state estimate (as in Fig. 6), showing that we obtain the same result as in Fig. 6 as soon as the estimate of β becomes sufficiently accurate. This result is compared with another simulation that is performed without the adaptive mechanism (thin lines), using the inaccurate value β = 0.5. We observe again that, without adaptive statistics, the error estimate remains inconsistent, so that the resulting nonoptimal scheme can only produce larger errors, together with a badly estimated error variance (underestimated in this example).

On the other hand, it is also interesting to investigate what happens if we control the scaling factor β alone, in the presence of an inaccurate forecast error covariance scaling. For that purpose, we redo the same experiment with 𝗣kf = α𝗤, with constant α = ¼. The resulting estimate β*(tk) is also shown in Fig. 7 (top panel, thin line). As can be observed, β*(tk) quickly diverges from the correct value β = 1, because the adaptive mechanism tries to compensate for the underestimation of the forecast error variance by overestimating the observation error, in order to account for the total innovation signal. It is thus very important to stress again that the method assumes that all statistical parameters not included in the control vector are accurately known. Inconsistency in the prior statistical parameterization can easily lead to grossly incorrect adaptive parameters.

d. Control of the correlation length scale

The third set of experiments is dedicated to the estimation of the correlation length scale from a random sample of a Gaussian distribution, by applying the method described in section 3f. In these experiments, it is assumed that this Gaussian distribution is characterized by a zero mean and a covariance 𝗥. This application thus corresponds to the problem described in section 3f with the simplification that α = 0 (i.e., 𝗥 is the only remaining term in the covariance 𝗖 of the random vector), so that we can directly apply Eqs. (47) and (48) to estimate the signal variance and/or correlation length scale.

As an illustration, let us consider the problem of estimating the correlation length scale of the observational noise described in section 4a (and illustrated in Fig. 3) from samples of various sizes. In this experiment, it is assumed that the noise standard deviation σ = 0.2 m and the correlation structure of model B [ρ(r) = exp(−r/ℓ)] or model C [ρ(r) = (r/ℓ)K1(r/ℓ)] are known, so that the correlation length scale ℓ is the only parameter that must be estimated. Moreover, as noted above, these two correlation structures can be consistently parameterized using the parameterization in (39) with a transformation operator 𝗧 that includes the gradient for model B, or the gradient and curvature for model C. From this operator, it is easy to evaluate the eigenvalues μl of 𝗧𝗧T, and the functions f(ℓ) and g(ℓ) using Eq. (43). Figure 8 illustrates the resulting likelihood function L(ℓ) for the correlation length scale ℓ, as computed using Eq. (48), for several sizes k of the sample. (The figure is scaled so that the maximum is always equal to 1.) For correlation model B, the likelihood function narrows close to the correct value ℓ = 2 grid points (left panel) or ℓ = 5 grid points (middle panel) as more observations become available, which indicates that the adaptive mechanism presented in this paper is able to control the correlation length scale of a random signal, as soon as a correct assumption about the correlation structure can be formulated. For correlation model C, also illustrated in Fig. 8, there is a discrepancy between the true correlation length scale (ℓ = 5 grid points) and the estimated value (ℓ ∼ 3.2 grid points). This problem results from an approximate evaluation of the function g(ℓ) in Eq. (48), for which we assumed an infinite domain (whereas our real domain is not very large with respect to the correlation length scale). This difficulty does not arise if the adaptive parameters are β0 and β1 (instead of σ² and ℓ), but it points out the sensitivity of the estimation to inaccuracies in the representation of the correlation structure.
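
As an illustration of this maximum likelihood estimation of ℓ, the following sketch works on a 1D toy grid with a dense covariance matrix for correlation model B; the paper's implementation instead uses the transformed, augmented-observation form of Eqs. (47) and (48), and the grid size and sample count below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, ell_true = 50, 0.2, 5.0           # 1D grid size, std (m), true scale
r = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # pairwise distances

def cov(ell):
    # Covariance for model B: sigma^2 * exp(-r / ell).
    return sigma**2 * np.exp(-r / ell)

samples = rng.multivariate_normal(np.zeros(n), cov(ell_true), size=100)

def loglik(ell, xs):
    # Gaussian log-likelihood of the samples xs under N(0, cov(ell)).
    C = cov(ell)
    _, logdet = np.linalg.slogdet(C)
    Cinv = np.linalg.inv(C)
    return sum(-0.5 * (logdet + x @ Cinv @ x) for x in xs)

ells = np.linspace(1.0, 12.0, 45)
for k in (1, 10, 100):
    ll = np.array([loglik(e, samples[:k]) for e in ells])
    print(k, round(ells[ll.argmax()], 2))   # narrows toward ell_true = 5
```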

e. Joint control of forecast and observation error parameterizations

Last, in a fourth set of experiments, we solve a more general assimilation problem in which two of the adaptive parameters considered in the previous sections are controlled together: a parameter α to adjust the scaling of the forecast error covariance matrix, and a parameter β to adjust the scaling of the observation error covariance matrix. For that purpose, we use observations with uncorrelated or correlated errors (σ = 0.2 m and ℓ = 5 grid points) using correlation model A, B, or C, as illustrated in Fig. 3. In addition, in these experiments, the structure of the observation error covariance matrix is assumed to be known: it is given by Eq. (39) with an operator 𝗧 consistent with the real correlation model (A, B, or C), and with diagonal matrices 𝗥0 = σ0²𝗜 and 𝗥1 = σ1²𝗜 parameterized in such a way that the true value of the global scaling parameter is β = 1. As the prior probability distribution for the parameters, we use p0(α, β) = exp[−(α + β)], that is, independent prior distributions with exponential probability density for each parameter. As in section 4b(2), the parameters are not assumed constant in time, and the same forgetting exponent f = 0.9 is used. With these assumptions, we can solve the joint filtering problem for the state of the system and for the parameters α and β. In particular, the optimal parameter estimates are obtained at each time tk by minimizing the cost function whose components are given by Eqs. (33) and (34) as a function of the parameters α and β (with the observation vector yk restricted to the real observations, i.e., excluding the fictitious gradient or curvature observations, if any).
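
A minimal sketch of the joint estimation follows, with toy dimensions and a dense-matrix likelihood rather than the reduced-space cost function of Eqs. (33) and (34): innovations are modeled as 𝒩(0, α𝗦 + β𝗥̃), with 𝗦 standing in for 𝗛𝗣f𝗛T, and the posterior mode is located on a 2D parameter grid with the independent exponential priors.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 10
A = rng.standard_normal((m, m))
S = A @ A.T                        # stands in for H Pf H^T
Rhat = 0.04 * np.eye(m)            # prescribed observation error structure

Lt = np.linalg.cholesky(1.0 * S + 1.0 * Rhat)     # truth: alpha = beta = 1
innov = [Lt @ rng.standard_normal(m) for _ in range(50)]

def logpost(alpha, beta):
    # Gaussian log-likelihood under N(0, alpha*S + beta*Rhat), plus the
    # independent exponential log-priors -(alpha + beta).
    C = alpha * S + beta * Rhat
    _, logdet = np.linalg.slogdet(C)
    Cinv = np.linalg.inv(C)
    return sum(-0.5 * (logdet + d @ Cinv @ d) for d in innov) - alpha - beta

grid = np.linspace(0.2, 2.5, 40)
lp = np.array([[logpost(a, b) for b in grid] for a in grid])
ia, ib = np.unravel_index(lp.argmax(), lp.shape)
print(round(grid[ia], 2), round(grid[ib], 2))     # mode near (1, 1)
```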

Figure 9 (top two panels, thick lines) shows the resulting estimates α*(tk) and β*(tk), obtained as the modes of the posterior probability distribution at time tk. As expected, for all correlation models A, B, or C, the optimal forecast error covariance scaling α*(tk) converges toward the same solution as in section 4b(2), and the optimal observation error covariance scaling β*(tk) converges approximately toward the correct value β = 1. Figure 9 (bottom panel) also illustrates the corresponding error on altimetry. For correlation model A, the result is similar to the solution shown in Fig. 6 (obtained with known β), whereas for the two other correlation models B and C, the error is larger, as a result of the lower quality of the observations that are assimilated (correlated instead of uncorrelated observation error). In all cases, however, the error estimate produced by the adaptive filter (dotted thick line) remains a consistent estimate of the real error standard deviation. Without adaptivity (not shown in the figure), the error estimate can become grossly inconsistent (as explicitly shown in the examples of sections 4b and 4c), with the same negative consequences on filter optimality.

Finally, in order to characterize more completely the knowledge that is acquired about parameters α and β, Fig. 10 represents their joint likelihood function (based on the first two innovation vectors), as obtained in the three experiments (i.e., with correlation model A, B, or C for the observation error). The first observation is that β is diagnosed with better accuracy than α, as a consequence of the larger number of degrees of freedom in the observation error as compared to the forecast error (see Figs. 2 and 3). Second, in these experiments, the slope of the first principal axis of the sensitivity ellipse happens to be only slightly negative, which indicates that, in this case study, the adaptive scheme is able to make a clear distinction between forecast and observation errors. This absence of an unstable direction with large inaccuracy (along which any combination of α and β would represent the total covariance equally well) explains why parameters α and β can be jointly identified. Again, this is linked to the very distinct structures of the forecast and observation errors (cf. Figs. 2 and 3), so that their respective variances can easily be diagnosed as soon as their respective covariance structures are known.

5. Conclusions

It is a common practice in Kalman filter applications to adjust uncertain statistical parameters by exploiting the information contained in the innovation sequence. Yet, optimal estimates can only be obtained if it is possible to evaluate the posterior probability distribution for the parameters given the innovation sequence. With the classic formulation of the Kalman filter observational update, the computational complexity of this optimization is C0 ∼ pKy³ (leading behavior for large y), where y is the size of the innovation vector, K is the number of innovation vectors, and p is the number of iterates needed to reach the optimum (i.e., a factor pK with respect to the observational update itself). This cost is obviously prohibitive for the large systems that are usually considered in atmospheric and oceanic applications, so that practitioners are compelled to develop complex nonoptimal schemes, usually based on a clever fitting of filter statistics to innovation statistics. In this paper, it has been demonstrated that optimal parameter estimates can be computed efficiently (with a computational complexity C1 that is asymptotically negligible with respect to that of the observational update) as soon as the observational update is performed using a transformed formulation working in the reduced control space defined by the square root or ensemble representation of the forecast error covariance matrix. (This transformed formulation is also more efficient for the state observational update as soon as the dimension r of the reduced space is small with respect to the size of the observation vector: r ≪ y.) However, this level of efficiency can only be achieved for the following important parameters: the scaling of the forecast error covariance matrix (C1 ∼ pKr), the scaling of the observation error covariance matrix (C1 ∼ pKr), or parameters modifying the shape of the observation error covariance matrix, such as the correlation length scale (C1 ∼ pKr³). In addition, the method is based on the fundamental assumption that the probability distribution of the innovation given the parameters is Gaussian. A direct generalization to non-Gaussian distributions is possible as soon as a nonlinear change of variables (anamorphosis) can be found to transform the non-Gaussian distributions into Gaussian distributions.
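
To illustrate where this efficiency comes from, the sketch below (our notation, assuming uncorrelated observation errors 𝗥 = σ²𝗜 and a square root factor of 𝗛𝗣f𝗛T) shows that after a one-off projection of the whitened innovation onto the r leading directions, each likelihood evaluation for a trial α costs only O(r).

```python
import numpy as np

def precompute(d, HS, sigma):
    # HS: (y, r) square root factor of H Pf H^T; d: innovation of size y;
    # observation errors assumed uncorrelated, R = sigma^2 * I.
    U, s, _ = np.linalg.svd(HS / sigma, full_matrices=False)
    lam = s**2                      # eigenvalues of H Pf H^T / sigma^2
    dw = d / sigma                  # whitened innovation
    proj = U.T @ dw                 # components along the r reduced directions
    resid2 = dw @ dw - proj @ proj  # innovation energy outside the reduced space
    return lam, proj, resid2

def neg2_loglik(alpha, lam, proj, resid2):
    # -2 log p(d | alpha), up to an alpha-independent constant: O(r) per call.
    return np.sum(np.log(alpha * lam + 1.0) + proj**2 / (alpha * lam + 1.0)) + resid2
```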

Idealized experiments have been performed to demonstrate the ability of this optimal adaptive algorithm to effectively control unknown statistical parameters in addition to the state of the system. These experiments have been designed to estimate a synthetic mesoscale signal over a limited region of the ocean basin, so that the rank of the covariance matrices is always small enough to preserve the efficiency of the adaptive algorithm. This means that, in more realistic applications involving a large dynamical system, the adaptive algorithm could only remain efficient if it is used in conjunction with a covariance localization method, from which local low-rank covariance matrices can be obtained. In that way, it would become possible to compute the adaptive parameters locally, and thus to introduce even more degrees of freedom in the filter statistics. Showing how this can be done with sufficient numerical efficiency is the subject of future work.

In addition, this local region of the mesoscale flow is estimated with a simplified assimilation scheme (i.e., a zero or identity model and full observation coverage), devised in such a way that the optimality of the scheme can always be easily diagnosed. The results show, first, that the adaptive mechanism is able to control the unknown statistical parameters cited above, separately and even jointly. Second, the associated measure of the accuracy of the estimated parameters (given, for instance, by the difference between percentiles 0.9 and 0.1 of their marginal posterior distribution) is closely related to the number of innovation vectors that contribute significantly to the estimation. It is thus quite sensitive to the prior assumptions about the time dependence of the parameters (i.e., the choice of the forgetting exponent in our parameterization). Third, the experiments demonstrate that adaptivity can significantly improve the estimation of the state of the system. Comparisons with nonadaptive examples show that statistical parameters that are left inaccurate make the system depart from optimality, with dramatic consequences on the accuracy of the estimation. In this case study, moreover, the application of the adaptive algorithm always produces error estimates that are more consistent with the real error. This is a direct consequence of the improved accuracy of the statistical parameters, to which the error estimates are particularly sensitive (much more than the state estimate itself).

However, the experiments also show that the estimation of the statistical parameters can be distorted by the presence of inaccurate parameters that are not included in the control vector. The system then tries to compensate for these errors and can produce parameter estimates that are even further away from their real values. Conversely, if too many parameters are included in the control vector, they may not be simultaneously controllable using the available observations. For instance, forecast and observation error scaling factors cannot be controlled together if their correlation structures are too similar. Consequently, defining the list of control parameters remains a subjective choice that needs to be carefully considered in order to find the best compromise between adjusting the largest number of uncertain parameters (to remove any possible inaccuracy in the error parameterization) and the possibility of controlling them effectively through the innovation sequence. Nevertheless, the optimal adaptive algorithm presented in this paper potentially introduces useful supplementary degrees of freedom in the estimation problem, and if they are judiciously chosen, the direct control of these statistical parameters by the observations increases the robustness of the error estimates produced by the filter. Being closer to statistical optimality, the adaptive filter can thus make better use of the observational information, and be of direct benefit to atmosphere and ocean data assimilation systems.

Acknowledgments

This work was conducted as part of the MERSEA and MyOcean projects funded by the EU (Grants AIP3-CT-2003-502885 and FP7-SPACE-2007-1-CT-218812-MYOCEAN), with additional support from CNES. The calculations were performed using HPC resources from GENCI-IDRIS (Grant 2009-011279).

REFERENCES

  • Anderson, J. L., 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus, 59A, 210–224.

  • Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83.

  • Blanchet, I., C. Frankignoul, and M. Cane, 1997: A comparison of adaptive Kalman filters for a tropical Pacific Ocean model. Mon. Wea. Rev., 125, 40–58.

  • Brankart, J-M., C. Ubelmann, C-E. Testut, E. Cosme, P. Brasseur, and J. Verron, 2009: Efficient parameterization of the observation error covariance matrix for square root or ensemble Kalman filters: Application to ocean altimetry. Mon. Wea. Rev., 137, 1908–1927.

  • Cohn, S. E., 1997: An introduction to estimation theory. J. Meteor. Soc. Japan, 75, 257–288.

  • Cosme, E., J-M. Brankart, J. Verron, P. Brasseur, and M. Krysta, 2010: Implementation of a reduced rank, square-root smoother for high resolution ocean data assimilation. Ocean Modelling, in press.

  • Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

  • Daley, R., 1992: Estimating model-error covariances for application to atmospheric data assimilation. Mon. Wea. Rev., 120, 1735–1746.

  • Dee, D., 1995: Online estimation of error covariance parameters for atmospheric data assimilation. Mon. Wea. Rev., 123, 1128–1145.

  • Evensen, G., and P. J. van Leeuwen, 1996: Assimilation of Geosat altimeter data for the Agulhas Current using the ensemble Kalman filter with a quasi-geostrophic model. Mon. Wea. Rev., 124, 85–96.

  • Hoang, S., R. Baraille, O. Talagrand, X. Carton, and P. De Mey, 1997: Adaptive filtering: Application to satellite data assimilation in oceanography. Dyn. Atmos. Oceans, 27, 257–281.

  • Le Provost, C., and J. Verron, 1987: Wind-driven mid-latitude circulation—Transition to barotropic instability. Dyn. Atmos. Oceans, 11, 175–201.

  • Lermusiaux, P., 2007: Adaptive modelling, adaptive data assimilation and adaptive sampling. Physica D, 230, 172–196.

  • Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533.

  • Maybeck, P. S., 1979: Stochastic Models, Estimation and Control. Vol. 1. Academic Press, 423 pp.

  • Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea. Rev., 128, 416–433.

  • Pham, D. T., J. Verron, and M. C. Roubaud, 1998: Singular evolutive extended Kalman filter with EOF initialization for data assimilation in oceanography. J. Mar. Syst., 16, 323–340.

  • Von Mises, R., 1964: Mathematical Theory of Probability and Statistics. Academic Press, 694 pp.

  • Wahba, G., D. R. Johnson, F. Gao, and J. Gong, 1995: Adaptive tuning of numerical weather prediction models: Randomized GCV in three- and four-dimensional data assimilation. Mon. Wea. Rev., 123, 3358–3369.

  • Wang, X., and C. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158.

Fig. 1.

(left) Mean and (right) standard deviation of the dynamic height (m) over the full square basin. Our region of interest is represented by the black square in the middle of the basin.

Fig. 2.

Dynamic height (m) snapshots corresponding to year 21 on 10, 20, and 30 Oct.

Fig. 3.

Simulated observation noise (m) corresponding (left to right) to correlation models A, B, and C with σ = 0.2 m and ℓ = 5 grid points (for the last two models).

Fig. 4.

(left) Likelihood function for the scaling factor of the forecast error covariance matrix, as a function of the number of input innovation vectors (1, 3, 10, and 100). (right) Mode of the posterior probability, together with percentiles 0.1 and 0.9 (thin dashed lines), as a function of the number of innovations (in abscissa).

Fig. 5.

Mode (solid line) and percentiles 0.1 and 0.9 (thin dashed lines) of the posterior probability distribution for the forecast error covariance scaling factor, as estimated from the last 50 observations of a signal with nonconstant statistics. The dotted line represents the true scaling factor.

Fig. 6.

(top) Estimated scaling factor for the forecast error covariance matrix of the model-simulated signal (thick lines). Associated root-mean-square error (only shown for the first 20 yr) for (middle) altimetry and (bottom) velocity, as compared to another simulation (thin lines) that is performed without adaptivity (i.e., with a fixed scaling factor α); the dashed lines represent the corresponding error standard deviation as estimated by the filter.

Fig. 7.

(top) Estimated scaling factor for the observation error covariance matrix; the dashed lines represent the percentiles 0.1 and 0.9, and the thin line represents the solution that is obtained with an incorrect forecast error covariance scaling. Associated rms error for (middle) altimetry and (bottom) velocity; the dashed line represents the error standard deviation as estimated by the filter, and the thin lines represent the solution that is obtained without the adaptive mechanism.

Fig. 8.

Likelihood function for the correlation length scale (in grid points), as a function of the number of input innovation vectors (1, 3, 10, and 100). The result is shown for the correlation model B with (left) ℓ = 2 and (middle) 5 grid points and (right) the correlation model C with ℓ = 5 grid points.

Fig. 9.

Estimated scaling factors for (top) the forecast error covariance matrix and (middle) the observation error covariance matrix. The three lines correspond to experiments performed with correlation models A (solid lines), B (dashed lines), or C (dotted lines) for the observation error. (bottom) Associated rms error for altimetry (solid line) and error standard deviation as estimated by the filter (dotted line). The smallest error (about 4 cm, similar to Fig. 6) is obtained using correlation model A and the largest error (about 8 cm) using correlation model C.

Fig. 10.

Likelihood function for parameters α (x axis) and β (y axis), based on the first two innovation vectors. Experiments performed using the correlation model (left) A, (middle) B, or (right) C for the observation error.
