1. Introduction
In atmospheric or oceanic applications of Kalman filters, the growing number of available observations often leads to a prohibitive cost of the observational update (analysis step), and to the necessity of simplifying the problem. Ad hoc solutions must be found to make the problem numerically tractable. A first option is to synthesize the observational information by aggregating observations into superobservations, or even by dropping the least useful or most redundant measurements (data thinning). Another option is to transform the original algorithm and reduce its computational complexity by taking advantage of prior hypotheses on the error statistics (i.e., on the shape of the state and observation error covariance matrices). Simplifications are thus applied to the second-order statistical moments of the errors (which are only approximately known anyway) rather than to the observations themselves. Of course, these two options are not mutually exclusive; they can interact with and complement each other. As explained in Rabier (2006), the need for data thinning can also result from oversimplistic assumptions in the parameterization of the observation error covariance matrix. For instance, with a suboptimal scheme neglecting observation error correlations, decreasing the observation density can help improve the accuracy of the estimation (Liu and Rabier 2002, 2003). In this paper, we propose to reduce the numerical cost of the observational update by using simplified (but rather general) parameterizations of the observation error covariance matrix. The expected consequence is that, with improved efficiency, together with sufficient accuracy and robustness in the representation of the observation error covariance matrix, this method can substantially reduce the need for data thinning.
If the forecast error covariance matrix is available in square root form, as in square root or ensemble Kalman filters, it is possible to use a modified observational update algorithm (proposed by Pham et al. 1998), whose computational complexity is linear in the number of observations (instead of being cubic in the standard formula), provided that the observation error covariance matrix can be inverted at low cost, for instance if it is diagonal. It is the purpose of this paper to introduce specific parameterizations of the observation error correlations that preserve the numerical efficiency of that modified algorithm. This can be done (i) by expressing the observation error covariance matrix as the sum of a diagonal and a low-rank matrix, or (ii) by applying a linear transformation to the observation vector (and assuming uncorrelated observations in the transformed space). It is interesting to note that, in parameterization ii, nonsquare transformation matrices are possible, which means that the observation vector can be augmented with new observations that are linear combinations of the original observations. Both parameterizations are presented in section 2 of this paper. In section 3, a specific choice of linear transformation, consisting of augmenting the observation vector with gradients of the original observations, is studied in more detail.
In section 4, the algorithm is applied to ocean altimetric observations, as simulated by a 1/4° model of the tropical Atlantic Ocean, focusing on the North Brazil Current. Altimetric observation errors are indeed known to be spatially correlated because of, for example, orbit errors or atmospheric correction errors. Moreover, these correlations are important to take into account, because they can directly improve the quality of the observational update (especially for the dynamic height gradient, and thus for velocities), and the accuracy of the associated error estimates. In the North Brazil Current, the ratio between signal amplitude (about 5 cm) and typical observational noise (about 4 cm) remains moderate: the signal is only marginally observed. This example is thus appropriate to show the importance of accounting for error correlations to reconstruct the ocean circulation, and to check the validity of our simplified parameterizations.
2. Parameterization of the observation error covariance matrix


a. Observational update in square root or ensemble Kalman filters














It is worth noting here that, in realistic applications, the observational update is often performed locally by dividing the full model state into subdomains, and by performing a separate observational update for every subdomain using a subset of the global observation dataset (see, e.g., Anderson 2003; Houtekamer and Mitchell 1998; Ott et al. 2004; Tippett et al. 2003). With local methods, the size of the observation vector can be significantly reduced with respect to a global observational update, thus modifying the computational complexity (3) and (7) of the algorithms and the gain factor (8) that is obtained by using formula (5) instead of formula (2). The same expressions can however still be applied provided that x and y are defined as the sizes of the local state and observation vectors. In addition, if r is still the number of columns in 𝗦f, it can usually be set smaller with local methods. The use of low-rank 𝗣f parameterizations (or small size ensembles) is indeed one important reason why local methods are required (Houtekamer and Mitchell 1998).
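To make the difference between the two update formulas concrete, the following minimal sketch compares a standard gain computation with a square-root formulation on a small random problem. Formulas (2) and (5) are not reproduced in this text, so the code relies on the common textbook forms of both updates, which is an assumption; all names (`n_state`, `n_obs`, `rank`, `r_diag`, `S_f`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_obs, rank = 8, 50, 4            # illustrative sizes (x, y, r)

S_f = rng.standard_normal((n_state, rank))   # square root of P^f (P^f = S_f S_f^T)
H = rng.standard_normal((n_obs, n_state))    # linear observation operator
r_diag = rng.uniform(0.5, 2.0, n_obs)        # diagonal of R (R assumed diagonal)
innovation = rng.standard_normal(n_obs)      # y - H x^f

# Standard form: a y-by-y system is solved, so the cost grows like y^3.
P_f = S_f @ S_f.T
innov_cov = H @ P_f @ H.T + np.diag(r_diag)
dx_standard = P_f @ H.T @ np.linalg.solve(innov_cov, innovation)

# Square-root form: only an r-by-r system is solved. Apart from applying H,
# the work involving the observations is O(y r^2) when R is diagonal.
HS = H @ S_f                                 # shape (y, r)
HS_w = HS / r_diag[:, None]                  # R^{-1} (H S_f), cheap for diagonal R
gamma = HS.T @ HS_w                          # (H S_f)^T R^{-1} (H S_f), shape (r, r)
coeffs = np.linalg.solve(np.eye(rank) + gamma, HS_w.T @ innovation)
dx_sqrt = S_f @ coeffs

max_diff = np.max(np.abs(dx_standard - dx_sqrt))
```

Both increments agree to rounding error on this example, while the second one never forms or inverts a matrix of size y × y.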
b. Observational update of the error covariance in square root or ensemble Kalman filters
















c. Modal parameterization of the observation error covariance matrix










Formula (5) or (11) with parameterization (17) for 𝗥 can only be advantageous with respect to formula (2) or (9) (i.e., C1 + C1R < C0, C1P + C1R < C0P or C1E + C1R < C0E) if the number of columns q of Θ is small with respect to the number of observations (q ≪ y); that is, if the observation error covariance matrix 𝗥 is the sum of a diagonal matrix and a low rank matrix. [This is why expression (17) is chosen: the diagonal term is necessary to make the matrix regular.] If this can be done, the computational complexity remains linear in the number of observations y and the numerical efficiency of formulas (5) and (11) is preserved.
A further difficulty is that Θ needs to be computed. Obviously, it cannot be computed by decomposition of a full size 𝗥 matrix (followed by rank reduction), because the computational complexity of such an operation is again proportional to y^3. A possibility (for spatially distributed observations) is to define the correlated part of 𝗥 at the nodes of a regular grid (𝗥_g), compute the square root 𝗥_g = Θ_g Θ_g^T (once and for all) on that grid (with rank reduction if possible), and interpolate the modes Θ_g at the observation locations (for every spatial distribution of the observations) to obtain Θ = 𝗛_g Θ_g. Such an approximation is valid if the error modes Θ_g contain only scales that are large compared with the regular grid resolution; that is, if the 𝗥 matrix can be represented by the superposition of a white noise (the diagonal part 𝗗) and a large-scale red noise (the correlated part 𝗗^{1/2}ΘΘ^T𝗗^{1/2}). In such a case, two observations that are close together (much closer than the red noise correlation scales) are assumed fully independent with respect to the white noise, and fully dependent with respect to the red noise.
This parameterization is thus particularly efficient if the typical distance between observations is small with respect to the correlation scales, because then the number q of error modes can be made small with respect to the number of observations y (q ≪ y), and the additional cost C1R, given by (21), remains tractable (asymptotically for large y): the linear term in y is only increased to y(r^2 + q^2 + rq) instead of yr^2. In other situations, this parameterization cannot preserve the efficiency of formulas (5) and (11) and other solutions must be found (see next sections).
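The reason the cost stays linear in y can be sketched with the Woodbury identity. The example below assumes that expression (17), which is not reproduced in this text, has the form suggested by the description above, namely 𝗥 = 𝗗^{1/2}(𝗜 + ΘΘ^T)𝗗^{1/2}; the function name `apply_r_inverse` and the test sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_modes = 200, 3                       # y and q, with q << y

d = rng.uniform(0.5, 2.0, n_obs)              # diagonal of D (white-noise variances)
theta = rng.standard_normal((n_obs, n_modes)) / np.sqrt(n_obs)  # error modes (y, q)
v = rng.standard_normal(n_obs)                # any vector to which R^{-1} is applied


def apply_r_inverse(d, theta, v):
    """Apply R^{-1} to v for R = D^{1/2} (I + Theta Theta^T) D^{1/2}.

    The Woodbury identity reduces the inversion to a q-by-q system, so the
    cost is linear in the number of observations.
    """
    w = v / np.sqrt(d)                                    # D^{-1/2} v
    small = np.eye(theta.shape[1]) + theta.T @ theta      # shape (q, q)
    w = w - theta @ np.linalg.solve(small, theta.T @ w)   # (I + Theta Theta^T)^{-1} w
    return w / np.sqrt(d)                                 # final D^{-1/2}


# Dense reference, built only to check the result on this small example.
scaled_modes = np.sqrt(d)[:, None] * theta                # D^{1/2} Theta
R = np.diag(d) + scaled_modes @ scaled_modes.T
max_diff = np.max(np.abs(apply_r_inverse(d, theta, v) - np.linalg.solve(R, v)))
```

The dense matrix `R` is formed here for verification only; in an actual update it would never be built.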
An even more efficient parameterization can be built by directly using a reduced rank parameterization for the inverse observation error covariance matrix 𝗥^{−1} = ΘΘ^T, with square root Θ (y × q), q ≪ y. With respect to parameterization (17), the linear term in y is reduced to y(r^2 + rq) instead of y(r^2 + q^2 + rq). Such a simplified parameterization is used in the oceanographic applications described in Brankart et al. (2003) and Testut et al. (2003). However, singular parameterizations of 𝗥^{−1} are dangerous because they imply that the null space of 𝗥^{−1} is assumed unobserved (infinite observation error), which may lead to neglecting important observational information. Again, this amounts to building superobservations (presumably the most useful ones) by projecting the original observations on the error modes (the columns of Θ), and dropping everything that is orthogonal to them. In this paper, we prefer to follow our original plan to keep all observations and thus only propose regular parameterizations of 𝗥.
d. Simulating correlations by linear transformation of the observation vector




Moreover, for a general linear operator 𝗧, the computational complexity of the application of the operator (e.g., to compute δy+ = 𝗧δy) is equal to yy+, where y+ is the size of the transformed observation vector (y+ ≥ y for a regular transformation). Hence, this complexity can only be linear in y if the structure of 𝗧 is simple. It can even become negligible (asymptotically) if every transformed observation (in the vector y+) is related to a small number of original observations. (It is the same argument that leads to neglecting the cost of 𝗛, see section 2a.) On the other hand, because the cost of the observational update is linear in y in formulas (5) and (11), we have the freedom to imagine a transformation 𝗧 that increases the number of observations (y+ > y), without prohibitive consequence on the numerical cost. Essentially, as soon as 𝗧 is known and is simple enough, the same computational complexity as (7), (12), and (15) applies, with y replaced by y+. (Thus the relative cost is multiplied by y+/y.) An example of such simple transformation, consisting of adding gradient observations to the original observation vector, is examined in section 3.
In addition, it is interesting to point out that, with uncorrelated observation errors in the transformed observation space, the observational update described by formulas (2) and (9) can be replaced by a repeated application of these formulas, using the observations in y+ one by one. The updated xa, 𝗣a obtained at each stage of the sequence are used as background state and background error covariance (xf and 𝗣f) for the next update. This is the serial processing algorithm that is also often used in ensemble filtering to reduce the numerical cost (at the expense of the assumption that observation errors are independent). By constructing an augmented observation vector with a diagonal error covariance matrix, the transformation method proposed in this paper thus also allows the application of this serial algorithm in the presence of observation error correlations. On the other hand, in many applications, there can be several observation datasets with independent errors (e.g., if they originate from different instruments) so that the matrix 𝗥 is block-diagonal. Such observation error covariance matrices can also be easily simulated by this method by applying separate transformations to the corresponding segments of the observation vector, for instance by augmenting the observation vector with discrete gradients computed inside each observation dataset.
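The equivalence between the batch update and the serial processing of uncorrelated observations can be checked on a toy problem. The sketch below uses the standard Kalman update written for one scalar observation at a time; the sizes and random inputs are arbitrary, and the variable names are illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_state, n_obs = 4, 6

A = rng.standard_normal((n_state, n_state))
P_f = A @ A.T + np.eye(n_state)               # background error covariance
x_f = rng.standard_normal(n_state)            # background state
H = rng.standard_normal((n_obs, n_state))     # observation operator
r_diag = rng.uniform(0.5, 2.0, n_obs)         # uncorrelated observation error variances
y = rng.standard_normal(n_obs)                # observations

# Batch update with all observations at once.
K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + np.diag(r_diag))
x_batch = x_f + K @ (y - H @ x_f)
P_batch = P_f - K @ H @ P_f

# Serial update: one scalar observation at a time, no matrix inversion.
x_a, P_a = x_f.copy(), P_f.copy()
for h, r, obs in zip(H, r_diag, y):
    ph = P_a @ h                              # P h^T, shape (n_state,)
    gain = ph / (h @ ph + r)                  # division by a scalar innovation variance
    x_a = x_a + gain * (obs - h @ x_a)
    P_a = P_a - np.outer(gain, ph)            # (I - k h) P, using the symmetry of P

state_diff = np.max(np.abs(x_a - x_batch))
cov_diff = np.max(np.abs(P_a - P_batch))
```

The two results coincide only because the observation errors are uncorrelated, which is exactly what the transformed observation space is meant to provide.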














3. Simulating correlations by adding gradient observations
a. One-dimensional problem


















Figure 1 shows the solution of (39) computed numerically (by inversion of a tridiagonal matrix) for σ0 = 1 and different values of ℓ/Δξ, as compared with the continuous solution (35). The solution is drawn for ℓ = 1 and decreasing Δξ (left panel), showing the convergence toward the exponential decorrelation as Δξ → 0; and for Δξ = 1 and decreasing ℓ (right panel), showing how small correlation length scales (smaller than the observation resolution Δξ) are parameterized with this approach.
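A numerical experiment of this kind can be sketched as follows. Equations (38) and (39) are not reproduced in this text, so the code assumes unit grid spacing, a transformation that stacks the identity and first differences, and a diagonal covariance in the transformed space with standard deviations σ0 and σ1 = σ0/ℓ; the names `sigma0`, `sigma1`, `ell`, and `diff` are illustrative. A dense inverse is used for brevity where a tridiagonal solver would normally be preferred.

```python
import numpy as np

sigma0 = 1.0
ell = 5.0                          # target correlation length, in grid points
sigma1 = sigma0 / ell              # error std assumed for neighbour differences
n = 201                            # number of observations along the line

# Transformation T stacks the identity and the first-difference operator.
identity = np.eye(n)
diff = np.diff(identity, axis=0)   # shape (n-1, n): row i gives y[i+1] - y[i]

# Inverse covariance implied in the original observation space:
# R^{-1} = T^T R_plus^{-1} T = I / sigma0^2 + D^T D / sigma1^2 (tridiagonal).
R_inv = identity / sigma0**2 + diff.T @ diff / sigma1**2
R = np.linalg.inv(R_inv)

# Correlation with the central point; the variance is nearly uniform
# away from the two ends of the line, so one normalization is enough.
mid = n // 2
lags = np.arange(0, 16)
correlation = R[mid, mid + lags] / R[mid, mid]
exponential = np.exp(-lags / ell)
max_gap = np.max(np.abs(correlation - exponential))
```

With ℓ equal to five grid points, the implied correlation stays within about 0.001 of the exponential exp(−|lag|/ℓ), in line with the convergence described above.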






b. Two-dimensional problem










c. Higher order derivatives





Table 1 provides explicit expressions of the covariance function







The generality of the method is more directly obvious for discrete problems since any transformation 𝗧 can be obtained by adding finite differences of successive orders. However, increasing the number p of derivatives added to the observation vector also increases the numerical cost, so that the most effective parameterization always results from a compromise between a fine representation of a target observation error spectrum and the numerical efficiency of the observational update. In this respect, two critical elements are always the identification of an accurate prior model for the observation error correlations and the validation of this model using the observed information.
4. Application to altimetry in the North Brazil Current
Ocean altimetric observations are distributed along lines (the satellite ground tracks) or, in the future, along two-dimensional ribbons (wide-swath altimeters). Altimetric observation errors (due to the altimetric measurement itself, orbit errors, or atmospheric correction errors) are known to be spatially correlated along the ground track (or across the swath). The purpose of this section is to demonstrate the sensitivity of the observational update to these observation error correlations, and to check whether the parameterization proposed in this paper is appropriate to take these errors into account. This is done using the particular example of the North Brazil Current circulation.
a. Description of the experiment
The North Brazil Current is a surface western boundary current flowing westward along the north Brazilian coast. It is fed from the southeast by the tropical surface current, and brings the water to the northwest into the Caribbean Sea. The current sheds large anticyclonic rings (with a diameter of about 200 km) that are also transported toward the Caribbean Sea, covering the 2000 km in about 3 months [see Fratantoni et al. (1995) for more details]. The total transport of the mean current is about 21 Sv (1 Sv ≡ 10^6 m^3 s^−1; da Silveira et al. 1994), with typical surface velocities of 1 m s^−1 for the main current and for the rings, corresponding to dynamic height differences of about 0.2 m.
A reference simulation of the circulation is computed using a primitive equation model covering the tropical Atlantic between 15°S and 20°N. It is a subregion of the Drakkar global ocean configuration at a 1/4° resolution of the Nucleus for European Modelling of the Ocean model (NEMO; Barnier et al. 2006; Penduff et al. 2007), using boundary conditions extracted from a global simulation. The model atmospheric forcing is computed from European Centre for Medium-Range Weather Forecasts 40-year reanalysis (ERA-40) atmospheric data using bulk aerodynamic formulas. A 5-yr reference simulation of the tropical Atlantic model (computed by repeating the 2002 atmospheric data 5 times) is illustrated in Figs. 3 and 4. In this study, we focus on the results obtained in the region of the North Brazil Current (between 4.5° and 12.5°N, and between 6° and 46.5°W), which is shown in the figures. Figure 3 presents two snapshots of the sea surface height, together with its gradient and surface velocity for 2 and 14 December of the first year, showing the rings moving westward, and illustrating the close relation between altimetry and surface velocity. Figure 4 shows the mean circulation (sea surface height, gradient and surface velocity) averaged over the 5 yr of the simulation, together with the corresponding standard deviation. The order of magnitude of the sea surface height variability is similar to the bulk error standard deviation on satellite altimetric measurements, which is presently about 0.04 m. This variability is thus only marginally observed by such satellites, so that a fine tuning of the statistical parameters is particularly needed.
To test the observational update with different kinds of observation error parameterization, we need to define (i) the background (or forecast) state xf, and (ii) the true state xt, from which the observations y are sampled, and to which the estimation must be compared. As background state, we use the mean circulation (shown in Fig. 4, top panels); as true state, we use one of the model snapshots (illustrated in Fig. 3). And as observation, we assume that altimetry is observed over the full domain, with a 4-cm error standard deviation, at every node of the model grid. To test the sensitivity of the solution to the kind of observation error, two observation vectors are generated from the true state xt: a first one by adding uncorrelated observation noise, and a second one by adding a correlated observation noise, with a covariance matrix given by (23), with transformation (46) (for various values of ℓ= σ0/σ1). The noise is scaled to have a uniform standard deviation σ = 0.04 m. To randomly draw Gaussian noise vectors with known covariance 𝗥, we use the method described in the appendix of Fukumori (2002). Figure 5 (top panels) shows an example of such noise vectors, generated for three correlation lengths: ℓ = 0, 5, and 15 grid points. (In this section, ℓ = 0 stands for uncorrelated noise.) The figure also shows the corresponding error on the right difference between adjacent grid points, illustrating how the observational error on the discrete gradient decreases with ℓ.
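For readers who want to reproduce this kind of synthetic noise, one generic way to draw a Gaussian vector with a prescribed covariance is through a Cholesky factor. This is a swapped-in textbook technique and not necessarily the method of Fukumori (2002); the one-dimensional exponential covariance used below is also only an illustrative stand-in for the covariance given by (23) and (46).

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, ell = 100, 0.04, 5.0     # points on a line, std (m), length (grid points)

# Illustrative target covariance: exponential correlation along a line,
# with a uniform standard deviation sigma.
idx = np.arange(n)
R = sigma**2 * np.exp(-np.abs(idx[:, None] - idx[None, :]) / ell)

# If R = L L^T and w ~ N(0, I), then L w ~ N(0, R).
L = np.linalg.cholesky(R)
noise = L @ rng.standard_normal(n)

# Forward differences between adjacent points, as used for the gradient error.
noise_diff = np.diff(noise)

factor_error = np.max(np.abs(L @ L.T - R))
```

A dense Cholesky factorization costs on the order of n^3 operations, so this sketch is only practical for small test vectors.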
It is interesting to make the link between this simulated observational noise and the characteristics of observation error in real altimetry data. As explained at the beginning of section 2, observation error is always the sum of a measurement error and a representation error. On the one hand, the altimetric measurement is affected by several kinds of error (altimetric measurement, orbit error, atmospheric correction error) with a bulk standard deviation of about 3–5 cm, and horizontal correlation patterns that can depend on the satellite orbit and on the state of the atmosphere. On the other hand, altimetric data are actually spatial averages over about 5–10 km along track, which is about a factor of 3 smaller than the resolution of our model. The resulting representation error, which corresponds to this limited range of spatial scales in the continuous ocean system, is thus likely to remain small here with respect to measurement errors. Consequently, the properties of our randomly simulated observational noise (4-cm standard deviation, with various correlation length scales) are chosen to be in the range of what can be expected for real altimetric data in this region and for that kind of ocean model.
In addition, in order to increase the robustness of the test, each experiment is repeated by using, as true state, every snapshot of the sequence (one every 6 days for 5 yr), and the results are averaged over that ensemble of experiments. There is thus an ensemble of true states x_i^t, i = 1, …, N (with N = 300) and the corresponding ensemble of observations y_i sampled from them. (Observational errors are drawn independently for every member of the sample.) Hence, provided that the ensemble of true states can be viewed as representative of all possible states of the system, our indicator gives the average error that is committed using the observation error parameterization that is being tested (starting from the mean as background state).
To parameterize the background (or forecast) error covariance matrix 𝗣_i^f, we use the covariance of all snapshots of the model simulation (sampled every month over the 5 yr of the simulation), except those that are less than 1 month away from the true state (1 month being larger than the typical decorrelation time scale), in order to avoid any influence of the true state on the input error covariance matrix (𝗣_i^f is thus recomputed for every member i = 1, …, N). With an ensemble of about 60 independent realizations (one per month), the 95% confidence interval for a correlation (assuming normal pdfs) excludes zero only if the correlation exceeds 0.26. Correlations below 0.26 are thus not significant. Consequently, if the size of the region is much larger than the spatial decorrelation scale, the integrated influence of distant observations with nonsignificant correlations can be as large as that of close observations with significant correlations. To avoid the spurious effect of these inaccurate long-range correlations (resulting from the use of a small size ensemble), we perform a separate local observational update [as in Brankart et al. (2003) or Testut et al. (2003)] for each water column, with an additional weight on the observations decreasing with the distance r as exp(−r^2/d^2), with d = 200 km (the typical distance at which the correlation ceases to be significant). Figure 6 illustrates the resulting local structure of the background covariance that is used to perform the observational update. The figure shows the observational update increment that would result from one single observation (with 0.04-m error standard deviation), located at 9°N, 54°W (in the middle of the area traversed by the mesoscale rings). The long-range (nonsignificant) influence is effectively set to zero, without much affecting the local covariance structure described by the ensemble.
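The order of magnitude of the significance threshold can be recovered with a short calculation. The text does not say how the value 0.26 was derived; the sketch below uses the Fisher z-transform, which is one common approximation and is an assumption here, together with the distance weight exp(−r^2/d^2) quoted above. The function names are hypothetical.

```python
import math


def correlation_threshold(n_samples, z_crit=1.96):
    """Smallest |correlation| whose 95% interval excludes zero.

    Uses the Fisher z-transform: atanh(r) is roughly normal with
    standard deviation 1 / sqrt(n_samples - 3) for Gaussian data.
    """
    return math.tanh(z_crit / math.sqrt(n_samples - 3))


def localization_weight(distance_km, d_km=200.0):
    """Distance weight exp(-r^2 / d^2), with d = 200 km as in the text."""
    return math.exp(-((distance_km / d_km) ** 2))


threshold = correlation_threshold(60)          # roughly 0.25 for 60 realizations
weight_at_d = localization_weight(200.0)       # exp(-1) at the distance d
```

For 60 realizations this approximation gives a threshold of roughly 0.25, the same order as the 0.26 quoted for "about 60" realizations.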
b. Uncorrelated errors


However, the error reduction factor (with respect to background error) is less favorable for the gradient of altimetry. This is because computing altimetric difference Δζ between adjacent model cells amplifies the relative errors. Relative errors on velocities are again slightly worse because the relation between surface velocity and altimetry is not perfectly geostrophic (and thus not perfectly linear).
On the other hand, the observational update of the error covariance is illustrated in Fig. 7 (bottom panels), showing the estimated error standard deviation (the square root of the diagonal of 𝗣a). This estimation is quite consistent in amplitude and structure with the ensemble standard deviation of the error (measured by difference with respect to the true state), also shown in Fig. 7 (top panels). The good quality of the error estimate (for all variables) is the consequence of the consistent parameterization of the observation error covariance matrix; it also indicates that the background error covariance matrix (𝗣f) is quite accurately parameterized.
c. Correlated errors, with diagonal 𝗥 parameterization
In experiment 2, we use the observation vector that is perturbed by the correlated noise (with ℓ = 5 grid points), but keep the same diagonal parameterization of the observation error covariance matrix (𝗥 = σ^2𝗜). The parameterization is thus inconsistent with the simulated errors: the errors are assumed uncorrelated, even though they are not. Figure 8 shows the corresponding error maps to be compared with Fig. 7. The comparison shows that the error on altimetry is significantly larger than in experiment 1. This is because the number of independent observations in the L × L area is reduced to about (L/ℓ)^2 ∼ 1. Here, the background error keeps an influence, and the typical error is about ϵζ ∼ 0.03 m, a rough estimation that is again quite consistent with the results observed in Fig. 8.
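A back-of-the-envelope version of this estimate can be written in a few lines. The background error standard deviation is not stated at this point in the text; the sketch assumes it is comparable to the signal amplitude of about 5 cm quoted in the introduction, and combines it with a single independent observation by adding inverse variances. This may not be the exact estimate used in the paper.

```python
import math

sigma_b = 0.05      # assumed background error std (m), taken as the signal amplitude
sigma_o = 0.04      # observation error std (m), from the text
n_independent = 1   # about (L / ell)^2 independent observations, from the text

# Precisions (inverse variances) add for independent pieces of information.
precision = 1.0 / sigma_b**2 + n_independent / sigma_o**2
error_std = 1.0 / math.sqrt(precision)     # about 0.031 m
```

Under these assumptions the result is close to the 0.03 m quoted above, which illustrates why the background error still matters when only about one independent observation is available per L × L area.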
However, larger errors on altimetry do not necessarily mean larger errors on the altimetry gradient or larger errors on velocity. In experiment 2, the error increase on the gradient with respect to experiment 1 is actually smaller than the error increase on altimetry. This is because the gradient is better observed through correlated observations than through uncorrelated observations (see Fig. 5). Fitting the solution to correlated observations (even with an inappropriate diagonal error parameterization, as in experiment 2) thus partly offsets the advantage of experiment 1, in which white noise is easily filtered out of a large-scale (L = 125 km) signal with an optimal parameterization. This better observation of the gradient is the reason why keeping all available observations in the observation vector (even if the errors are strongly correlated) is always a better solution than subsampling the observations.
On the other hand, the observational update of the error covariance is illustrated in Fig. 8 (bottom panels), showing the estimated error standard deviation (the square root of the diagonal of 𝗣a). This estimation is identical to that of experiment 1 (Fig. 7, bottom panels), since all statistical parameters (𝗣f,𝗥) are kept identical. This largely underestimates the standard deviation of the true error (measured using the ensemble of differences with respect to the true state), that is shown in the top panels of Fig. 8. The estimation is about a factor of 3 below reality. This situation is the consequence of the inconsistent parameterization of the observation error covariance matrix. The diagonal 𝗥 parameterization lets the scheme believe that the data are more accurate than they are, so that it underestimates the error that is effectively in the system.
d. Correlated errors, with consistent 𝗥 parameterization
In experiment 3, we use the same observation vector as in experiment 2 (perturbed by the correlated noise), but add gradient observations to simulate correlations in the observation error covariance matrix, with the diagonal covariance matrix (38). By choosing σ0 = 0.275 m and σ1 = σ0/ℓ (for ℓ = 5 grid points, see Table 3), this observation error parameterization is thus perfectly consistent with the simulated errors. According to the theory presented in section 3, these observations with correlated errors are thus equivalent to much less accurate observations (σ0 = 0.275 m), together with accurate observations of the gradient (σ1 = 0.055 m per grid point). This is consistent with the idea suggested above that increasing ℓ reduces the number of independent observations. Figure 9 shows the corresponding error maps to be compared to Figs. 8 and 7. The comparison shows that the errors on altimetry are still larger than in experiment 1 (Fig. 7), because the quantity of information in the observation vector is still the same as in experiment 2, but the errors are smaller than in experiment 2 (Fig. 8), because the observation error parameterization is now consistent, so that the observational update is closer to optimality. (𝗣f is still an approximation: it cannot be considered that the background error is drawn randomly from a pdf of covariance 𝗣f.)
However, the error reduction (with respect to experiment 2) on altimetry gradient and velocity is significantly larger because the greater confidence that must be given to the gradient is now explicitly taken into account in the observational update, through the nondiagonal parameterization of the observation error covariance matrix 𝗥. Moreover, this parameterization is applied in practice, equivalently, by adding gradient observations to the observation vector. (This addition of gradient observations can only bring the estimated gradient closer to the observed gradient, which is more accurate if the observation errors are spatially correlated.) The improvement of the gradient resulting from the parameterization of error correlations (if they exist) is thus clearly demonstrated by this experiment.
On the other hand, the observational update of the error covariance is illustrated in Fig. 9 (bottom panels), showing the estimated error standard deviation (the square root of the diagonal of 𝗣a). As in experiment 1, it is consistent with the standard deviation of the true error (measured using the ensemble of differences with respect to the true state), which is shown in the top panels of Fig. 9. Again, this is due to the consistent parameterization of the observation error covariance matrix, which has been restored by the addition of gradient observations (with adequate values for σ0 and σ1).
e. Sensitivity to the correlation scale
In this last section, we examine how the results presented above depend on the observation error correlation length. For that purpose, the same experiment is repeated for various simulated observation noises, with a correlation length ℓo ranging from 0 to 10 grid points. And, for each of these simulated noises, several parameterizations of the observation error covariance matrix are tested, using a correlation length ℓp also ranging from 0 to 10 grid points. Figure 10 shows the resulting error standard deviation for sea surface height and velocity (as measured by the ensemble of differences with respect to the true states), averaged over the domain of interest, as a function of ℓo and ℓp. The figure also shows the ratio between the averaged estimated error and the averaged measured error. It is only if ℓo = ℓp that the parameterization is consistent with the simulated errors: it is thus along that line that the measured error should be minimum (for a given ℓo) and that the ratio between estimated and measured errors should be equal to 1.
The results show that underestimating the observation error correlation length scale (ℓp ≪ ℓo) leads to a moderate increase of the error on sea surface height, but to a very significant increase of the error on velocity. The estimation of the error standard deviation is also well below reality. This situation indeed corresponds to giving too much importance to the observations and to imposing too weak a constraint on the gradient. On the contrary, overestimating the observation error correlation length scale (ℓp ≫ ℓo) leads to a moderate increase of the error on velocity but to a significant increase of the error on sea surface height. The estimation of the error standard deviation is also well above reality for sea surface height, whereas no sensitivity can be observed for velocity. A correct tuning of ℓp is thus required to accurately estimate both variables, with consistent error estimates.
However it must be noted that in these experiments, the optimal parameterization is not on the line ℓo = ℓp as it should be, but noticeably below that value, especially for the large values of ℓo. The benefit obtained by giving to the observations more credit than they deserve (by using ℓp < ℓo) can only be explained by inaccuracies in the parameterization of the 𝗣f matrix, which is here approximated by a limited size ensemble. Overestimating the confidence in the observations is thus somewhat useful here to compensate suboptimalities in the statistical parameterization of the scheme.
5. Conclusions
Classical algorithms to compute the observational update in Kalman filters are penalized by a computational complexity proportional to the cube of the number of observations. In square root or ensemble Kalman filters, this algorithm can be modified [as proposed by Pham et al. (1998)] to become linear in the number of observations if the observation error covariance matrix is diagonal. In this paper, it has been demonstrated that these benefits can be preserved with two nondiagonal parameterizations of the observation error covariance matrix 𝗥. The first method, parameterizing 𝗥 as the sum of a diagonal and a low-rank matrix, is especially efficient if the typical distance between observations is small with respect to the correlation scales. The second method, simulating correlations by applying a linear transformation to the observation vector (with diagonal 𝗥 in the transformed space), is more generic. It is shown to be especially efficient to describe simple correlation structures if gradient observations can be added to the observation vector. This is possible, for instance, if the observations are distributed along lines or at the nodes of two-dimensional grids so that discrete gradients can be computed by subtracting successive observations. This has been shown to be equivalent to assuming a specific form of the observation error covariance matrix, whose correlation function and power spectrum have been computed analytically in the asymptotic limit of dense (continuous) observations. The correlation scale is then the ratio of the observation error standard deviations that are assumed for the original observations and for the gradient observations.
Test experiments have been performed with the aim of reconstructing the circulation of the North Brazil Current, as simulated by a 1/4° model of the tropical Atlantic Ocean, using synthetic altimetric observations. Various observation datasets were generated by perturbation with uncorrelated and correlated noise, and for several correlation scales. For each dataset, diagonal and nondiagonal parameterizations of the observation error covariance matrix have been used to perform the observational update of altimetry together with surface velocities. The results show first that the more the observations are correlated, the less information they contain about altimetry. This is also true for velocity (but to a lesser degree), although the gradient of altimetry is better observed through correlated observations. Second, assuming a diagonal observation error covariance matrix in the presence of correlated noise leads to a nonoptimal solution that mainly penalizes the reconstruction of surface velocities and underestimates the error variance (by a factor of 3 in our experiments). Third, optimal parameterizations of the observation error covariance matrix usually produce solutions that are close to minimizing the resulting error, although an artificial increase of the confidence in the observations (e.g., using a smaller correlation length) can lead to smaller errors (by compensating for misspecifications of the forecast error covariance). Fourth, the experiments also suggest that our efficient parameterization of the observation error covariance matrix by adding gradient observations adequately represents observation error correlations. Adding explicit gradient observations can even be useful to compensate for deficiencies in the forecast error statistics, by ensuring a direct control of velocities through gradient data.
It must be stressed, however, that these conclusions may be sensitive to the region of interest: a fine-tuning of the observation error correlations may be less critical in regions where the noise-to-signal ratio is much smaller (as in the Gulf Stream region), since high relative accuracy is always obtained.
In ocean data assimilation applications, altimetry is always a key element of the observation system. However, the growing number of available observations (not only altimetry) often leads to a prohibitive numerical cost and to the temptation of simplifying the problem by aggregating (or even dropping) observations or by making simplistic assumptions about the statistics (such as uncorrelated observation errors). These simplifications are always made at the expense of an optimal use of the observations, and particularly of altimetry, which is sensitive to that kind of approximation. The scheme proposed in this paper is a response to that problem: analyzing more observations at lower cost becomes possible with a realistic and robust parameterization of the observation error correlations. Being closer to statistical optimality, the scheme can thus make better use of the observational information (especially about velocity), and be of direct benefit to ocean data assimilation systems.
Incidentally, our results also suggest a possible way of improving data thinning strategies. For instance, if the density of altimetric observations along the ground track is reduced, critical information about the gradient (and thus about velocity) is likely to be lost, especially if the observation errors are correlated. A better strategy is certainly to transform the original observation vector by adding gradient observations, and to parameterize a diagonal observation error covariance matrix in the transformed space as explained in this paper. The data thinning can then be performed on the transformed observation vector (aggregating observations and rescaling error variances) as if the data were independent. In that way, it becomes possible to give a reasonable importance to the gradient information in the reduced observation vector.
Acknowledgments
This work was conducted as part of the MERSEA project funded by the European Union (Contract AIP3-CT-2003-502885), with additional support from CNES. We also thank the anonymous reviewers for their useful comments and suggestions. The calculations were performed with the support of IDRIS/CNRS.
REFERENCES
Abramowitz, M., and I. A. Stegun, 1970: Handbook of Mathematical Functions. 9th ed. Dover Publications, 1046 pp.
Anderson, J. L., 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634–642.
Barnier, B., and Coauthors, 2006: Impact of partial steps and momentum advection schemes in a global ocean circulation model at eddy permitting resolution. Ocean Dyn., 56, 543–567.
Bateman, H., and A. Erdelyi, 1954: Tables of Integral Transforms. Vols. 1 and 2. McGraw-Hill Book Company, 835 pp.
Brankart, J-M., and P. Brasseur, 1996: Optimal analysis of in situ data in the western Mediterranean using statistics and cross-validation. J. Atmos. Oceanic Technol., 13, 477–491.
Brankart, J-M., C-E. Testut, P. Brasseur, and J. Verron, 2003: Implementation of a multivariate data assimilation scheme for isopycnic coordinate ocean models: Application to a 1993–96 hindcast of the North Atlantic Ocean circulation. J. Geophys. Res., 108 (C3), 3074, doi:10.1029/2001JC001198.
Cohn, S. E., 1997: An introduction to estimation theory. J. Meteor. Soc. Japan, 75, 257–288.
Da Silveira, I., L. Miranda, and W. Brown, 1994: On the origins of the North Brazil Current. J. Geophys. Res., 99 (C11), 22501–22512.
Evensen, G., and P. J. van Leeuwen, 1996: Assimilation of Geosat altimeter data for the Agulhas current using the ensemble Kalman filter with a quasigeostrophic model. Mon. Wea. Rev., 124, 85–96.
Fratantoni, D., W. Johns, and T. Townsend, 1995: Rings of the North Brazil Current: Their structure and behavior inferred from observations and a numerical simulation. J. Geophys. Res., 100 (C6), 10633–10654.
Fukumori, I., 2002: A partitioned Kalman filter and smoother. Mon. Wea. Rev., 130, 1370–1383.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.
Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, 341 pp.
Kimeldorf, G., and G. Wahba, 1970: A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Ann. Math. Stat., 41, 495–502.
Landau, L., and E. Lifshitz, 1951: Statistical Physics: Course of Theoretical Physics. Vol. 5. Butterworth-Heinemann, 592 pp.
Liu, Z., and F. Rabier, 2002: The interaction between model resolution and observation resolution and density in data assimilation. Quart. J. Roy. Meteor. Soc., 128, 1367–1386.
Liu, Z., and F. Rabier, 2003: The potential of high density observations for numerical weather prediction: A study with simulated observations. Quart. J. Roy. Meteor. Soc., 129, 3013–3035.
McIntosh, P. C., 1990: Oceanographic data interpolation: Objective analysis and splines. J. Geophys. Res., 95 (C8), 13529–13541.
Morse, P. M., and H. Feshbach, 1953: Methods of Theoretical Physics. Parts I and II. McGraw-Hill, 1978 pp.
Ott, E., B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, and J. A. Yorke, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.
Penduff, T., J. Le Sommer, B. Barnier, A-M. Treguier, J-M. Molines, and G. Madec, 2007: Influence of numerical schemes on current-topography interactions in 1/4° global ocean simulations. Ocean Sci., 3, 509–524.
Pham, D. T., J. Verron, and M. C. Roubaud, 1998: Singular evolutive extended Kalman filter with EOF initialization for data assimilation in oceanography. J. Mar. Syst., 16, 323–340.
Rabier, F., 2006: Importance of data: A meteorological perspective. Ocean Weather Forecasting: An Integrated View of Oceanography, E. P. Chassignet and J. Verron, Eds., Springer, 343–360.
Reif, F., 1965: Fundamentals of Statistical and Thermal Physics. McGraw Hill, 651 pp.
Testut, C-E., P. Brasseur, J-M. Brankart, and J. Verron, 2003: Assimilation of sea-surface temperature and altimetric observations during 1992–1993 into an eddy permitting primitive equation model of the North Atlantic Ocean. J. Mar. Syst., 40–41, 291–316.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.

Fig. 1. Observation error covariance as a function of the distance ρ, as obtained numerically by inversion of the tridiagonal matrix given by (39) (for σ0 = 1 and different values of ℓ/Δξ). The solution is drawn (dotted curves) (left) for ℓ = 1 and decreasing Δξ = 2, 1, and 0.5 and (right) for Δξ = 1 and decreasing ℓ = 2, 1, 0.5, and 0.1. Larger bullets correspond to smaller ℓ/Δξ. In the left panel, the discrete solutions are multiplied by the factor ℓ/Δξ, to show the convergence to the continuous solution given by (35) (solid curve).
Citation: Monthly Weather Review 137, 6; 10.1175/2008MWR2693.1

Fig. 2. Observation error covariance as a function of the distance ρ (along the grid lines), as obtained numerically for regular and isotropic grid spacings (for σ0 = 1 and different values of ℓ/Δξ). The solution is drawn (dotted curves) (left) for ℓ = 1 and decreasing Δξ = 1, 0.5, and 0.2 and (right) for Δξ = 1 and decreasing ℓ = 2, 1, 0.5, and 0.1. Larger bullets correspond to smaller ℓ/Δξ. In the left panel, the discrete solutions are multiplied by the factor ℓ²/Δξ², to show the convergence to the continuous solution given by (44) (solid curve).

Fig. 3. Snapshots of the circulation in the region of the North Brazil Current, as simulated by the model for (top) 2 and (bottom) 14 Dec of the first year. (left) The sea surface height (m), (middle) the magnitude of its gradient (meters per grid point), and (right) sea surface velocity (m s⁻¹).

Fig. 4. As in Fig. 3, but for the (top) means and (bottom) standard deviations of the 5-yr simulation.

Fig. 5. Simulated observational noise on sea surface elevation for three correlation lengths: (from left to right) ℓ = 0, 5, and 15 grid points. (top) The random noise (with variance equal to 1) and (bottom) the corresponding gradient, using the grid spacing as the length unit.

Fig. 6. Observational update increment on (left) sea surface height (m), (middle) zonal velocity (m s⁻¹), and (right) meridional velocity (m s⁻¹) that would result from one single observation (with 0.04-m error standard deviation) of altimetry located at 9°N, 54°W (in the middle of the area traversed by the mesoscale rings). This illustrates the size of the domain of influence of the observations.

Fig. 7. Error standard deviation corresponding to experiment 1, (top) as measured by the ensemble of differences with respect to the true states, and (bottom) as estimated by the scheme (the square root of the diagonal of 𝗣a). It is shown (left) for altimetry (m), (middle) for its gradient (m per grid point), and (right) for velocity (m s⁻¹).

Fig. 8. As in Fig. 7, but for experiment 2.

Fig. 9. As in Fig. 7, but for experiment 3.

Fig. 10. This figure generalizes the results of Figs. 7, 8, and 9 (here averaged over the domain), by showing them as a function of the observation error correlation length scales (in grid points) ℓo (x axis), characterizing the simulated noise, and ℓp (y axis), which is used to parameterize the observation error covariance matrix 𝗥. Shown are results for (left two panels) sea surface height and (right two panels) velocity. Within each variable pair, the left panel shows the true error standard deviation (as measured by the ensemble of differences with respect to the true states), and the right panel shows the ratio between estimated and measured error standard deviations.

Observation error power spectra and associated covariance functions. All spectra have the form of (48), so that they can be directly simulated by adding successive derivatives of the observations in the observation vector. The corresponding covariance functions have been derived from the tables of integral transforms compiled by Bateman and Erdelyi (1954); Jp is the Bessel function of the first kind of order p, Kp is the modified Bessel function of the second kind of order p, and kei0 is a Kelvin function (see Abramowitz and Stegun 1970). In functions (1.4) and (1.5), the parameters θ and α are such that −π/2 < θ < π/2 and 0 < α < 1. Some particular cases are included separately: (1.1) is (1.6) with p = 0; (1.2) is (1.4) with θ = π/4; (1.3) is (1.4) with θ = 0; (1.3) is (1.6) with p = 1; (2.1) is (2.4) with p = 0; (2.3) is (2.4) with p = 1.


The three experiments described in this paper differ only in the (simulated) observation error or in the parameterization of the observation error. The observation error standard deviation is always set to 0.04 m, with consistent parameterization. The difference is only in the correlation: in experiment 1, the observation error is simulated by white noise; in experiments 2 and 3, it is correlated noise with spatial correlation length ℓ = 5 grid points. In experiments 1 and 2, the observation errors are parameterized using a diagonal 𝗥 matrix (i.e., assuming uncorrelated errors); in experiment 3, gradient observations are added to simulate correlations. Hence, only experiments 1 and 3 receive a parameterization that is consistent with the simulated errors (σ1 = ∞ means that no gradient observations are used).


Values of observation error standard deviation (σ0, m) and gradient error standard deviation (σ1, m per grid point) to use for parameterizing observation errors with standard deviation σ = 0.04 m and correlation length ℓ (in observation grid points). The correspondence is established using (23), with transformation (46).

