• Ballarotta, M., and Coauthors, 2019: On the resolutions of ocean altimetry maps. Ocean Sci., 15, 1091–1109, https://doi.org/10.5194/os-15-1091-2019.

  • Beckers, J. M., and M. Rixen, 2003: EOF calculations and data filling from incomplete oceanographic datasets. J. Atmos. Oceanic Technol., 20, 1839–1856, https://doi.org/10.1175/1520-0426(2003)020<1839:ECADFF>2.0.CO;2.

  • Bentley, J. L., 1975: Multidimensional binary search trees used for associative searching. Commun. ACM, 18, 509–517, https://doi.org/10.1145/361002.361007.

  • Bessières, L., and Coauthors, 2017: Development of a probabilistic ocean modelling system based on NEMO 3.5: Application at eddying resolution. Geosci. Model Dev., 10, 1091–1106, https://doi.org/10.5194/gmd-10-1091-2017.

  • Compo, G. P., and Coauthors, 2011: The Twentieth Century Reanalysis project. Quart. J. Roy. Meteor. Soc., 137, 1–28, https://doi.org/10.1002/QJ.776.

  • Dibarboure, G., M.-I. Pujol, F. Briol, P.-Y. Le Traon, G. Larnicol, N. Picot, F. Mertz, and M. Ablain, 2011: Jason-2 in DUACS: Updated system description, first tandem results and impact on processing and products. Mar. Geod., 34, 214–241, https://doi.org/10.1080/01490419.2011.584826.

  • Dufau, C., M. Orsztynowicz, G. Dibarboure, R. Morrow, and P.-Y. Le Traon, 2016: Mesoscale resolution capability of altimetry: Present and future. J. Geophys. Res. Oceans, 121, 4910–4927, https://doi.org/10.1002/2015JC010904.

  • Durand, M., L.-L. Fu, D. P. Lettenmaier, D. E. Alsdorf, E. Rodriguez, and D. Esteban-Fernandez, 2010: The Surface Water and Ocean Topography mission: Observing terrestrial surface water and oceanic submesoscale eddies. Proc. IEEE, 98, 766–779, https://doi.org/10.1109/JPROC.2010.2043031.

  • Fablet, R., J. Verron, B. Mourre, B. Chapron, and A. Pascual, 2018a: Improving mesoscale altimetric data from a multitracer convolutional processing of standard satellite-derived products. IEEE Trans. Geosci. Remote Sens., 56, 2518–2525, https://doi.org/10.1109/TGRS.2017.2750491.

  • Fablet, R., P. Viet, R. Lguensat, P.-H. Horrein, and B. Chapron, 2018b: Spatio-temporal interpolation of cloudy SST fields using conditional analog data assimilation. Remote Sens., 10, 310, https://doi.org/10.3390/rs10020310.

  • Fu, L.-L., and R. Ferrari, 2008: Observing oceanic submesoscale processes from space. Eos, Trans. Amer. Geophys. Union, 89, 488–488, https://doi.org/10.1029/2008EO480003.

  • Gandin, L. S., 1965: Objective Analysis of Meteorological Fields. Israel Program for Scientific Translations, 242 pp.

  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, https://doi.org/10.1002/qj.49712555417.

  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811, https://doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

  • Jouanno, J., J. Ochoa, E. Pallàs-Sanz, J. Sheinbaum, F. Andrade-Canto, J. Candela, and J.-M. Molines, 2016: Loop Current frontal eddies: Formation along the Campeche Bank and impact of coastally trapped waves. J. Phys. Oceanogr., 46, 3339–3363, https://doi.org/10.1175/JPO-D-16-0052.1.

  • Le Traon, P. Y., F. Nadal, and N. Ducet, 1998: An improved mapping method of multisatellite altimeter data. J. Atmos. Oceanic Technol., 15, 522–534, https://doi.org/10.1175/1520-0426(1998)015<0522:AIMMOM>2.0.CO;2.

  • Le Traon, P. Y., and Coauthors, 2019: From observation to information and users: The Copernicus Marine Service perspective. Front. Mar. Sci., 6, 234, https://doi.org/10.3389/FMARS.2019.00234.

  • Lguensat, R., P. Tandeo, P. Ailliot, M. Pulido, and R. Fablet, 2017: The analog data assimilation. Mon. Wea. Rev., 145, 4093–4107, https://doi.org/10.1175/MWR-D-16-0441.1.

  • Lguensat, R., P. H. Viet, M. Sun, G. Chen, F. Tian, B. Chapron, and R. Fablet, 2019: Data-driven interpolation of sea level anomalies using analog data assimilation. Remote Sens., 11, 858, https://doi.org/10.3390/rs11070858.

  • Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.

  • Minamide, M., and F. Zhang, 2017: Adaptive observation error inflation for assimilating all-sky satellite radiance. Mon. Wea. Rev., 145, 1063–1081, https://doi.org/10.1175/MWR-D-16-0257.1.

  • Miyoshi, T., E. Kalnay, and H. Li, 2013: Estimating and including observation-error correlations in data assimilation. Inverse Probl. Sci. Eng., 21, 387–398, https://doi.org/10.1080/17415977.2012.712527.

  • Penduff, T., and Coauthors, 2014: Ensembles of eddying ocean simulations for climate. CLIVAR Exchanges, Vol. 65, International CLIVAR Project Office, Southampton, United Kingdom, 19–22.

  • Ponte, R. M., and R. D. Ray, 2002: Atmospheric pressure corrections in geodesy and oceanography: A strategy for handling air tides. Geophys. Res. Lett., 29, 2153, https://doi.org/10.1029/2002GL016340.

  • Pujol, M.-I., G. Dibarboure, P.-Y. Le Traon, and P. Klein, 2012: Using high-resolution altimetry to observe mesoscale signals. J. Atmos. Oceanic Technol., 29, 1409–1416, https://doi.org/10.1175/JTECH-D-12-00032.1.

  • Pujol, M.-I., Y. Faugère, G. Taburet, S. Dupuy, C. Pelloquin, M. Ablain, and N. Picot, 2016: DUACS DT2014: The new multi-mission altimeter data set reprocessed over 20 years. Ocean Sci., 12, 1067–1090, https://doi.org/10.5194/os-12-1067-2016.

  • Schleicher, D., 2007: Hausdorff dimension, its properties, and its surprises. Amer. Math. Mon., 114, 509–528, https://doi.org/10.1080/00029890.2007.11920440.

  • Takens, F., 1981: Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, Warwick 1980, D. A. Rand and L.-S. Young, Eds., Lecture Notes in Mathematics, Vol. 898, Springer-Verlag, 366–381, https://doi.org/10.1007/BFB0091903.

  • Tandeo, P., P. Ailliot, J. J. Ruiz, A. Hannart, B. Chapron, R. Easton, and R. Fablet, 2015: Combining analog method and ensemble data assimilation: Application to the Lorenz-63 chaotic system. Machine Learning and Data Mining Approaches to Climate Science, Springer, 3–12.

  • Tandeo, P., P. Ailliot, M. Bocquet, A. Carrassi, T. Miyoshi, M. Pulido, and Y. Zhen, 2018: A review of innovation-based methods to jointly estimate model and observation error covariance matrices in ensemble data assimilation. arXiv, https://arxiv.org/abs/1807.11221.

  • Ubelmann, C., P. Klein, and L. L. Fu, 2015: Dynamic interpolation of sea surface height and potential applications for future high-resolution altimetry mapping. J. Atmos. Oceanic Technol., 32, 177–184, https://doi.org/10.1175/JTECH-D-14-00152.1.

  • Wu, Z., 1995: Compactly supported positive definite radial functions. Adv. Comput. Math., 4, 283–292, https://doi.org/10.1007/BF03177517.

  • Zhen, Y., P. Tandeo, S. Leroux, S. Metref, T. Penduff, and J. Le Sommer, 2019: 3da code and data. Zenodo, accessed 17 December 2019, https://doi.org/10.5281/zenodo.3559784.

  • Fig. 1. The attractors of (left) the original state variable and (right) the time-delayed state variable of L63.

  • Fig. 2. (top) The trajectory of the truth, the observations, and the AnDA and OI estimates. The RMSEs of the AnDA and OI estimates are 0.77 and 1.177, respectively. (middle) Estimated reanalysis standard deviation and absolute error of the reanalysis estimate for AnDA. The estimated standard deviation is strongly correlated with the absolute error. (bottom) Estimated reanalysis standard deviation and absolute error of the reanalysis estimate for OI. In this example, the estimated standard deviation is periodic since it depends only on the observation frequency and the magnitude of R.

  • Fig. 3. Snapshots of the “true” SSH in the region of interest on different days of year 2004, featuring the formation and shedding of a “Loop Current eddy.” The SSH here comes from the OCCIPUT ensemble simulation (see text). The two symbols on the maps mark the location of the Loop Current and the Florida coast grid points, respectively, at 25°N, 85°W and at 26.03°N, 82°W.

  • Fig. 4. Time series of the reconstructed daily SSH for year 2004 at the two grid points marked on the maps in Fig. 3: (top) in the Loop Current (25°N, 85°W) and (bottom) near the Florida coast (26.03°N, 82°W). The reconstructed SSH is shown for AnDA, OI and OICOA and compared with the true SSH.

  • Fig. 5. (a) Temporal power spectral density (PSD) of the reconstructed SSH (AnDA, OI, OICOA) and true SSH. (b) Temporal signal-to-noise ratio (R) measuring the temporal coherence of each of the reconstructed SSH (AnDA, OI, OICOA) with the true SSH. Both PSD and R are averaged over the entire domain. Both panels share the same x axis in log scale for temporal frequency [cycles per day (cpd)]. The tick labels on the top axis give the corresponding periods in days.

  • Fig. 6. Monthly averages of (top) estimated standard deviation and (bottom) absolute error centered on (a) 8 Mar and (b) 8 Sep 2004. The $P^s$ produced by OI (top-middle panel for each month) and OICOA (top-right panel for each month) depends only on the tracks of satellite altimetry and the background covariance $B$. Therefore, the estimated standard deviation is not a relevant approximation of the absolute error (bottom-middle and bottom-right panels for each month). On the other hand, the estimated standard deviation produced by AnDA is flow dependent (top-left panel for each month) and closer to the absolute error.

An Adaptive Optimal Interpolation Based on Analog Forecasting: Application to SSH in the Gulf of Mexico

  • 1 IMT Atlantique, Lab-STICC, UBL, Brest, France
  • 2 Ocean-Next, Grenoble, France
  • 3 Université Grenoble Alpes, CNRS, IRD, IGE, Grenoble, France

Abstract

Because of the irregular sampling pattern of raw altimeter data, many oceanographic applications rely on sea surface height (SSH) products gridded on regular grids, where gaps have been filled by interpolation. Today, the operational SSH products are created using the simple, but robust, optimal interpolation (OI) method. If well tuned, OI is computationally cheap and provides accurate results at low resolution. However, OI is not suited to producing high-resolution and high-frequency maps of SSH. To improve the interpolation of SSH satellite observations, a data-driven approach (i.e., constructing a dynamical forecast model from the data) was recently proposed: analog data assimilation (AnDA). AnDA adaptively chooses analog situations from a catalog of SSH scenes, originating from numerical simulations or a large database of observations, which allows the temporal propagation of physical features at different scales while each observation is assimilated. In this article, we review the AnDA and OI algorithms and compare their skills in numerical experiments. The experiments are observing system simulation experiments (OSSE) on the Lorenz-63 system and on an SSH reconstruction problem in the Gulf of Mexico. The results show that AnDA, with no tuning required, produces reconstructions comparable to those of OI with tuned parameters. Moreover, AnDA reconstructs the signals at higher frequencies than OI does. Finally, an important additional feature of any interpolation method is the ability to assess the quality of its reconstruction. This study shows that the standard deviation estimated by AnDA is flow dependent and hence more informative about the reconstruction quality than the one estimated by OI.

Corresponding author: Yicun Zhen, zhenyicun@protonmail.com

1. Introduction

Satellite altimetry is an essential component of the global ocean observing system with many applications key to climate monitoring, operations at sea and oceanic process understanding. Satellite altimeters provide measurements of sea surface height (SSH), a dynamical parameter that holds information about the upper-ocean pressure field. Satellite derived SSH measurements are used for monitoring changes in sea level at global and regional scales. They are also used for estimating upper-ocean circulation at scales larger than the first Rossby radius of deformation where the geostrophic balance holds. Satellite altimetry is therefore a key source of information for ocean monitoring systems, and an essential constraint in ocean forecasting systems.

In practice, many oceanographic applications of satellite altimetry rely on gridded SSH products rather than on raw along-track SSH data. Satellite altimeters indeed provide SSH measurements along ground tracks, following a sampling pattern that depends on the satellite orbit. The existing constellation of altimeters combines several satellites, but the overall sampling of SSH data is irregular with large gaps both in space and in time, and will remain so in the near future with the advent of wide-swath altimetry. Still, many applications of SSH data require the tracking of oceanic flow features in space and time or the computation of spatial derivatives of SSH, such as applications related to ship routing, search and rescue, oil spills, or fisheries, as detailed in Le Traon et al. (2019). Hence, for convenience, many applications of SSH data are currently based on operational data products where SSH data are interpolated on a regular spatial grid at fixed time intervals.

Presently, the most commonly used operational gridded SSH products are based on static interpolation methods. Operational gridded SSH L4 products, as distributed for instance by the AVISO data center within the Copernicus program (Pujol et al. 2016; Le Traon et al. 2019), combine information from multiple altimeters through an optimal interpolation (OI) analysis. Optimal interpolation analysis (Gandin 1965) is a static interpolation method which uses the autocorrelation of a field to define the relative weights given to a set of observed data for reconstructing the field at unobserved locations. In practice, gridded SSH products are therefore obtained as weighted sums of observed SSH values, derived from explicit assumptions as to the space and time autocorrelation structure of the SSH field.

Although widely used, OI-based gridded SSH products are affected by several limitations and shortcomings. The quality of OI-based SSH reconstructions is indeed intrinsically dependent on the choice of the predefined autocorrelation parameters; but in practice, the chosen autocorrelation parameters are usually not optimal because of the tradeoffs due to the optimization of the product resolution at global scale (Dibarboure et al. 2011; Pujol et al. 2012). Moreover, the OI procedure does not provide an a priori estimation of the level of error of the reconstructed fields. Most importantly, OI is not state dependent and therefore does not account for the complex, nonlinear dynamics of oceanic flows (Ubelmann et al. 2015). These limitations and shortcomings will likely become more problematic with the higher-spatial-resolution capability of upcoming wide-swath altimeters (Fu and Ferrari 2008; Durand et al. 2010).

Several alternative approaches to static interpolation methods have been proposed in the context of ocean remote sensing. Methods have for instance been proposed for improving the representation and estimation of the covariance structure of the field to interpolate. This includes the Data Interpolating Empirical Orthogonal Functions (DINEOF) method (Beckers and Rixen 2003), a parameter-free procedure used for interpolating sea surface temperature (SST) or surface chlorophyll (Chl). In the context of SSH mapping, approaches accounting explicitly for the nonlinear dynamics of SSH have been proposed. Ubelmann et al. (2015) rely for instance on a dynamical propagator based on quasigeostrophic theory. Alternatively, Lguensat et al. (2019) propose to use analog forecasting to account for ocean dynamics in SSH mapping algorithms. Research has also focused on exploiting synergies between different sensors for improving SSH mapping algorithms [as, for instance, with SST; see Fablet et al. (2018a)].

Because it is parameter-free and state dependent, analog data assimilation appears as a promising approach for improving SSH mapping algorithms. Analog data assimilation (AnDA), also known as empirical dynamical modeling, is a state estimation procedure which combines data assimilation and analog forecasting (Tandeo et al. 2015; Lguensat et al. 2017). AnDA uses a catalog of trajectories in the system state space, which can be drawn from observations or from numerical model simulations. The catalog is used for inferring the system dynamics and for building estimates of the system state at unobserved locations and times. Realistic applications to oceanic data include the interpolation of SST (Fablet et al. 2018b) and the interpolation of SSH (Lguensat et al. 2019). Lguensat et al. (2019) have shown in particular how AnDA can be used for improving OI-based SSH fields at finescale. To date, however, a comparison of the respective skills and performances of OI and AnDA in the context of SSH mapping is missing.

In this study, we investigate how AnDA performs as compared to OI for the reconstruction of SSH maps from along-track SSH data. Our aim is to document the potential benefits of AnDA in the context of the design of operational gridded L4 SSH products. We present results based on observing system simulation experiments (OSSE) over the Gulf of Mexico where the true state and the catalog of scenes are drawn from different members of a 50-member ensemble model simulation run at 1/4° resolution. Our analysis focuses in particular on the relative performance of AnDA and OI in reconstructing the time variability of SSH signals, on the sensitivity of the reconstruction to the size of the catalog and the ability of the methods to estimate the quality of their reconstructions.

Within the limitations of our OSSE experiments, our results show that (i) AnDA provides estimates of SSH with error levels comparable to an optimally tuned OI but without the need to tune the covariance parameters a priori; (ii) AnDA can reconstruct high-frequency SSH fluctuations more reliably than OI, which shows limited skill for time scales shorter than the pretuned temporal correlation scale; and (iii) AnDA provides a reliable a priori estimate of the absolute error of the reconstructed SSH field, therefore allowing us to detect when the quality of the reconstruction is poor. Our results therefore suggest that applications of AnDA to the mapping of SSH are worth investigating further.

This paper is organized as follows. In section 2, the OI and AnDA algorithms are reviewed, and details are given on how to tune the parameters. Then, both methods are applied to the Lorenz-63 system in section 3. In section 4, AnDA and OI are implemented on the SSH mapping problem in the region of the Gulf of Mexico. Finally, section 5 provides a summary, a discussion, and conclusions. The code and data for reproducing the numerical results of the SSH experiments are available online (Zhen et al. 2019).

2. Description of the interpolation algorithms

OI is a widely used method for interpolating sparse and noisy observations. AnDA, on the other hand, is a data-driven interpolation method (i.e., one that constructs a dynamical forecast model from the data) that was introduced by Tandeo et al. (2015) and described in detail by Lguensat et al. (2017). The details of these two algorithms follow.

a. Optimal interpolation

OI is written as a linear inverse problem such as

$$x = x^b + \eta^b,$$
$$y = Hx + \epsilon,$$

with $x^b$ the background or a priori information, $H$ the transformation from state $x$ to observations $y$, $\eta^b \sim N(0, B)$ the background error, and $\epsilon \sim N(0, R)$ the observation error. Here, $x^b$, $B$, and $R$ are prescribed by the users. OI is a reanalysis and has a direct Gaussian solution given by $N(x^s, P^s)$ such that

$$x^s = x^b + K(y - Hx^b), \qquad P^s = B - KHB,$$

with $K = BH^T(HBH^T + R)^{-1}$ the gain controlling the influence of the observations and the background.
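As an illustration only, the OI solution above can be sketched in a few lines of NumPy. The function name `optimal_interpolation` and the toy values of the background, covariances, and observation are hypothetical; they are not taken from the experiments of this paper.

```python
import numpy as np

def optimal_interpolation(xb, B, H, R, y):
    """Return the OI mean xs and covariance Ps for the model y = H x + eps."""
    S = H @ B @ H.T + R                    # innovation covariance H B H^T + R
    # Gain K = B H^T S^{-1}; the transpose trick relies on B and S being symmetric.
    K = np.linalg.solve(S, H @ B).T
    xs = xb + K @ (y - H @ xb)             # reanalysis mean
    Ps = B - K @ H @ B                     # reanalysis covariance
    return xs, Ps

# Toy setup (hypothetical values): three state components, only the first is observed.
idx = np.arange(3)
xb = np.zeros(3)
B = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # correlation decays with index distance
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[1.0]])
y = np.array([2.0])

xs, Ps = optimal_interpolation(xb, B, H, R, y)
```

In this toy case the single observation corrects the unobserved components in proportion to their background correlation with the observed one, and `Ps` does not depend on the value of `y`.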

The quality of OI results largely depends on the choice of the B and R matrices (Tandeo et al. 2018). The matrix R represents the error covariances in the observational model. It can be measured or estimated offline if we assume that the observation error is stationary, which is the case in this article. However, in realistic applications, R can be nonstationary and should be estimated online (Minamide and Zhang 2017). The matrix R is not necessarily diagonal, i.e., the observation errors can be correlated. But, in practice, R is often assumed diagonal in order to reduce computational costs (Miyoshi et al. 2013). In our experiment, we set

R=rI,

where r is a scalar and I is the identity matrix.

The choice of $B$ should be consistent with the choice of $x^b$. If $x^b$ is chosen to be the climatological mean state field $\bar{x}$, then it is reasonable to choose $B$ as the spatial–temporal climatology background covariance matrix. However, saving the complete spatial–temporal climatology covariances is not possible in large dimensional applications because of the prohibitive requirement for storage space. Therefore, a parameterized covariance matrix is often used as a substitute for the complete climatology covariances (Wu 1995; Gaspari and Cohn 1999). A popular choice of $B$ has the following form:

$$B(x_i, t_1, x_j, t_2) = B_{\mathrm{spatial}}(i, j)\, f(dt / L_t),$$

with $dt = |t_1 - t_2|$ and where $B_{\mathrm{spatial}}(i, j)$ is the $(i, j)$th component of a predetermined symmetric positive-definite matrix that represents the spatial climatology distribution of the state variable $x$, $f$ is a predetermined function that defines the shape of the temporal correlation of each component of $x$, and $L_t$ is a prescribed parameter that defines a uniform decay rate for the temporal correlation. The matrix $B_{\mathrm{spatial}}$ can be a parameterized matrix or the sample covariances computed from a long time series of $x$. Technically, $B$ must be a symmetric positive-definite matrix. Hence, the choice of $f$ cannot be arbitrary. When the dimension of $x$ is large, directly inverting the full matrix $HBH^T + R$ is numerically demanding. In the present study, we implement OI locally in the spatial dimension, as presented in algorithm 1 (Table 1). The choice of $B_{\mathrm{spatial}}$ and $f$ depends on the application problem and will be discussed in each experimental section. Note that OI can also be implemented locally in both spatial and temporal dimensions.
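To make the separable form concrete, the following sketch assembles such a matrix as a Kronecker product. The Gaussian shape used for $f$, the exponential spatial covariance, and all numerical values are assumptions chosen for illustration; the choices actually used are discussed in the experimental sections.

```python
import numpy as np

def separable_background_cov(B_spatial, times, Lt, f=lambda s: np.exp(-s**2)):
    """Build B with B[(t1, i), (t2, j)] = B_spatial[i, j] * f(|t1 - t2| / Lt).

    Rows and columns are ordered time-major: index = time_index * n_space + i.
    The default f is a Gaussian shape, which is an assumption of this sketch.
    """
    dt = np.abs(times[:, None] - times[None, :])
    C_time = f(dt / Lt)                      # temporal correlation matrix
    return np.kron(C_time, B_spatial)        # tensor product of time and space parts

# Hypothetical toy problem: 4 grid points on a line, 5 time steps.
pos = np.arange(4.0)
B_spatial = np.exp(-np.abs(pos[:, None] - pos[None, :]) / 2.0)
times = np.arange(5.0)

B = separable_background_cov(B_spatial, times, Lt=2.0)
min_eig = np.linalg.eigvalsh(B).min()        # positive if B is positive definite
```

Checking the smallest eigenvalue is a simple way to see why $f$ cannot be arbitrary: a function that does not yield a positive-definite temporal correlation matrix would make `min_eig` negative.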

Table 1.

Algorithm 1. Local optimal interpolation.

b. Analog data assimilation

AnDA is a combination of analog forecasting and data assimilation. For the data assimilation part, we use the ensemble Kalman smoother (EnKS), which is commonly used in many classic data assimilation problems (see, for instance, Compo et al. 2011). The EnKS requires an ensemble run of $N_e$ simulations starting from different initial states. This ensemble run provides sample covariances for data assimilation at every time step. The EnKS consists of a forward filter and a backward smoother. In the forward process, the forecast of each ensemble member is calculated separately, and each member is updated by the ensemble Kalman filter whenever observations are available. In the backward smoother, each member is updated recursively in the backward direction. The EnKS is summarized in algorithm 2 (Table 2). The subscript $i$ refers to the $i$th member and $t$ to the time, and the superscripts $p$, $f$, $a$, and $s$ refer to the forecast without noise, forecast with noise, analysis, and reanalysis, respectively. In the forward Kalman filter, $\varepsilon_{i,t}$ is artificially created and added to $y_t$ to compensate for the loss of variance (Houtekamer and Mitchell 1998). Line 4 of algorithm 2 implements covariance localization, which consists of the Schur product $P \circ C_{\mathrm{loc}}$, with $C_{\mathrm{loc}}$ a prescribed spatial covariance localization matrix (e.g., the Gaussian function or the Gaspari–Cohn matrix; Gaspari and Cohn 1999). In AnDA, the forecasting operator $F$ in line 7 of algorithm 2 is replaced by the analog forecasting.
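The forward analysis step described above (perturbed observations and a localized sample covariance) can be sketched as follows. This is a minimal illustration and not a transcription of algorithm 2: the function name `enkf_update`, the Gaussian localization matrix, and the toy ensemble are hypothetical, and the backward smoother is omitted.

```python
import numpy as np

def enkf_update(Xf, y, H, R, C_loc, rng):
    """Stochastic EnKF analysis step. Xf has shape (n, Ne); returns Xa of the same shape."""
    n, Ne = Xf.shape
    A = Xf - Xf.mean(axis=1, keepdims=True)            # ensemble anomalies
    P = C_loc * (A @ A.T) / (Ne - 1)                   # Schur product with the sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)       # Kalman gain
    # One perturbation per member, drawn from N(0, R) and added to the observation.
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=Ne).T
    return Xf + K @ (y[:, None] + eps - H @ Xf)

# Hypothetical toy problem: 5 state components, the first one observed.
rng = np.random.default_rng(0)
n, Ne = 5, 50
Xf = rng.normal(size=(n, Ne))
idx = np.arange(n)
C_loc = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)   # Gaussian localization
H = np.zeros((1, n))
H[0, 0] = 1.0
R = np.array([[0.1]])
y = np.array([3.0])

Xa = enkf_update(Xf, y, H, R, C_loc, rng)
```

Because the observation is much more accurate than the prior spread in this toy case, the analysis ensemble for the observed component is pulled close to the observation and its variance is reduced.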

Table 2.

Algorithm 2. Ensemble Kalman smoother (with covariance localization).

The major difference between AnDA and classic data assimilation is that AnDA uses analog forecasting to predict the state at the next time step, instead of running the numerical model. In many applications, the analog forecast method could be an interesting alternative since it can simulate the dynamics of variables that are not necessarily represented in a numerical model. For instance, if an underlying variable of the system is not modeled by the numerical model but is present in the analog database, the analog forecast will be able to describe its relationship to other variables and predict its evolution. To ensure the good performance of the analog forecast method and consequently of AnDA, a large historical dataset of state variables is needed: the catalog.

The quality of the analog forecasting procedure highly depends on the quality and the space of the catalog. First, the catalog has to be as rich as possible to cover all the possible situations. Larger catalogs usually lead to better performance of AnDA. Second, the analogs have to live in an informative space. In practice, it can be a subspace to reduce the dimensionality of the problem (e.g., the EOF space used in section 4) or an augmented space when the dimension of the system is too low to distinguish situations that are not real analogs (e.g., the time delayed state space used in section 3). The catalog is then saved in a k-dimensional tree structure so that the relevant analogs at each time step can be accessed efficiently (Bentley 1975). The technique of analog forecasting at each time step can be briefly summarized by the following three steps.

  • Step 1: For a given state estimate $x_t$, search for $k$ analogs $(A_1, \ldots, A_k)$ that are nearest to $x_t$ within the catalog, where $k$ is prechosen. At the same time, we are also given the successors of $A_i$, denoted by $S_1, \ldots, S_k$. Here $S_i$ is the physical state one time step later than $A_i$.

  • Step 2: Build a local model $M_t$ between $A_1, \ldots, A_k$ and $S_1, \ldots, S_k$, i.e., $S_i = M_t(A_i) + \eta_{i,t}$, where $\eta_{i,t}$ is assumed to be white, independent, and identically distributed noise, the distribution of which can be calculated from $A_i$ and $S_i$.

  • Step 3: Apply the local model $M_t$ to $x_t$: $x_{t+1} = M_t(x_t) + \eta_t$, where $\eta_t$, describing the model error of $M_t$, is drawn randomly and follows the same distribution as $\eta_{i,t}$.

Lguensat et al. (2017) pointed out that there are various choices of local models in the second step. Lguensat et al. (2019) compared these local models, and their numerical results show that the locally linear model outperforms the others. In our applications, the local model $M_t$ is the locally linear model that regresses $S_i$ over the anomalies of analogs $A'_i = A_i - \bar{A}$, where $\bar{A}$ refers to the weighted mean of $A_i$. Equivalently, the local model we choose is the linear model that regresses the anomalies of successors $S'_i = S_i - \bar{S}$ over $A'_i$. In the numerical implementation, this linear regression can be done with respect to the leading components of $A'_i$. In the case that $x_t$ represents the full state, this local model can be thought of as an approximation of the tangent linear model restricted to the attractor if the current state estimate $x_t$ lies on the attractor and the distribution of analogs is dense enough. Furthermore, the distribution of the residuals $\eta_{i,t}$ is always assumed to be Gaussian in our applications. Hence, the noise $\eta_t$ of step 3 follows $N(0, Q_t)$, where $Q_t$ is the weighted covariance matrix of the residuals $S_i - M_t(A_i)$. The details of the analog forecast with the locally linear model are described in algorithm 3 (Table 3).
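The three steps and the locally linear model can be sketched as below. This is a simplified illustration and not a transcription of algorithm 3: the neighbor search is brute force instead of a k-d tree, the Gaussian kernel weights are an assumed choice, the regression uses all components of the anomalies, and the toy catalog is generated from a hypothetical linear map.

```python
import numpy as np

def analog_forecast(x, analogs, successors, k, rng):
    """One locally linear analog forecast step for a state x of shape (n,)."""
    # Step 1: k nearest analogs (brute force here; a k-d tree scales better).
    d = np.linalg.norm(analogs - x, axis=1)
    nn = np.argsort(d)[:k]
    A, S = analogs[nn], successors[nn]
    # Kernel weights are an assumption of this sketch.
    w = np.exp(-(d[nn] / (np.median(d[nn]) + 1e-12)) ** 2)
    w /= w.sum()
    # Step 2: weighted linear regression of successor anomalies on analog anomalies.
    A_bar, S_bar = w @ A, w @ S
    Ap, Sp = A - A_bar, S - S_bar
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(sw * Ap, sw * Sp, rcond=None)
    res = Sp - Ap @ coef
    Q = (w[:, None] * res).T @ res               # weighted covariance of the residuals
    # Step 3: apply the local model to x and add a random model error.
    mean = S_bar + (x - A_bar) @ coef
    eta = rng.multivariate_normal(np.zeros(len(x)), Q)
    return mean + eta, mean, Q

# Hypothetical toy catalog: noisy linear dynamics in two dimensions.
rng = np.random.default_rng(0)
M = np.array([[0.9, -0.2], [0.2, 0.9]])
analogs = rng.uniform(-1.0, 1.0, size=(2000, 2))
successors = analogs @ M.T + 0.01 * rng.normal(size=(2000, 2))

x0 = np.array([0.3, -0.1])
x_next, mean, Q = analog_forecast(x0, analogs, successors, k=30, rng=rng)
```

In AnDA this forecast would be applied to each ensemble member separately, so that the spread of the forecast ensemble reflects both the ensemble spread and the local residual covariance `Q`.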

Table 3.

Algorithm 3. Analog forecast.

c. Conceptual differences

In this subsection we discuss, from a conceptual point of view, the differences between AnDA and OI based on the formulations of these two algorithms. These differences are then assessed in sections 3 and 4 on numerical experiments.

OI is a purely spatial–temporal interpolation method. The performance of OI relies completely on the choice of the static matrices $B$ and $R$. Hence, the interpolation does not account for the dynamics of the underlying state variable. As a consequence, the estimated posterior variance of the OI reanalysis depends only on the positions of the observations and the physical locations of the state variables. On the other hand, AnDA automatically learns the dynamics from the catalog at every time step. Hence, the posterior variance of AnDA should be flow dependent.

In operational usages of OI, it is usually not realistic to construct the full spatial–temporal climatological covariance. Hence, B is often assumed to be the tensor product of a spatial covariance matrix and a temporal correlation matrix. The temporal correlation matrix is uniquely determined by a scalar parameter Lt which defines the temporal correlation scale. Numerically, this artificial temporal correlation smooths out the temporal fluctuations of the reanalysis that have periods shorter than Lt. Hence, the OI should not be able to reconstruct the signal for modes of periods less than Lt. In contrast, AnDA does not have this limitation since the state variables are propagated under the dynamics learned from the catalog.

3. Application to the Lorenz-63 system

In this section, we compare the reanalysis means and variances produced by AnDA with those produced by OI, using the classic three-dimensional Lorenz-63 (L63) chaotic system (Lorenz 1963):

$$\frac{dx_t}{dt} = 10(y_t - x_t), \quad \frac{dy_t}{dt} = x_t(28 - z_t) - y_t, \quad \frac{dz_t}{dt} = x_t y_t - \frac{8}{3} z_t.$$

The system is integrated with dt = 0.01 using the fourth-order Runge–Kutta method. The first component x(t) is observed every 10 time steps (i.e., dtobs = 0.1), with an additive white Gaussian noise of variance R = 2. After model spinup, we first run the model for $10^3$ time steps to generate the truth, and then we continue to run the model for $10^4$ time steps to generate the catalog for AnDA. Our goal is to calculate the reanalysis of x together with its uncertainty estimate, based on the simulated observations. In this experiment we pretend that we have no knowledge of y and z. Therefore, we cannot directly apply the L63 equations for forecasting, which is the scenario that AnDA is designed for.
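The experimental setup described above can be reproduced with a short sketch. The initial condition, the spinup length of 5000 steps, and the random seed are assumptions made for illustration; they are not specified in the text.

```python
import numpy as np

def l63(s):
    """Right-hand side of the Lorenz-63 equations."""
    x, y, z = s
    return np.array([10.0 * (y - x), x * (28.0 - z) - y, x * y - 8.0 / 3.0 * z])

def rk4_step(s, dt=0.01):
    """One fourth-order Runge-Kutta step."""
    k1 = l63(s)
    k2 = l63(s + 0.5 * dt * k1)
    k3 = l63(s + 0.5 * dt * k2)
    k4 = l63(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(s0, n_steps, dt=0.01):
    """Return the n_steps states that follow s0."""
    traj = np.empty((n_steps, 3))
    s = np.asarray(s0, dtype=float)
    for i in range(n_steps):
        s = rk4_step(s, dt)
        traj[i] = s
    return traj

rng = np.random.default_rng(0)                  # seed is an assumption
spinup = integrate([1.0, 1.0, 1.0], 5000)       # spinup length is an assumption
truth = integrate(spinup[-1], 1000)             # 10^3 steps for the truth
catalog = integrate(truth[-1], 10000)           # 10^4 further steps for the catalog
obs_idx = np.arange(0, len(truth), 10)          # x is observed every 10 steps
obs = truth[obs_idx, 0] + rng.normal(0.0, np.sqrt(2.0), obs_idx.size)  # R = 2
```

Only the first column of `catalog` would be made available to AnDA, since y and z are treated as unknown.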

a. Implementation of AnDA

Applying AnDA directly to the first L63 component does not lead to a good estimate. Indeed, if $x_t = a$, the intersection of the section x = a and the L63 attractor has two branches, which is the case for a large proportion of possible values of a. Whether $x_{t+1}$ is greater or smaller than $x_t$ then depends on which branch the full state variable $(x_t, y_t, z_t)$ lies on. Hence, it is roughly equally likely for $x_t$ to increase or decrease in the next time step. Therefore, we would not be able to make an informative prediction of $x_{t+1}$ by merely looking at the analogs of $x_t$. A solution to this problem is to consider the time-delayed states $\mathbf{x}_t = (x_t, x_{t-\tau}, x_{t-2\tau})^{\mathrm{T}}$ for the implementation of AnDA. Experimentally, we find τ = 11 to be the optimal value. Figure 1 shows the original attractor and the attractor of the time-delayed state variable. When the time-delayed states are used as analogs, the implementation of AnDA changes accordingly, as explained in the appendix. We use an ensemble of size Ne = 50. At each time step, we apply analog forecasting separately to each ensemble member with k = 50, which is the parameter mentioned in step 1 of analog forecasting. For the Kalman smoother, we use R = 2, which is the same as the observation error variance used to create the observations.
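As an illustration, the time-delayed states and the analog search could be built as in the sketch below. Interpreting τ as a number of model time steps is an assumption, and the brute-force Euclidean search stands in for whatever nearest-neighbor structure an actual implementation uses; the names `delay_embed` and `find_analogs` are hypothetical.

```python
import numpy as np

def delay_embed(x, tau, dim=3):
    """Rows are (x_t, x_{t-tau}, ..., x_{t-(dim-1)tau}) for every valid t."""
    x = np.asarray(x, dtype=float)
    start = (dim - 1) * tau
    cols = [x[start - j * tau: len(x) - j * tau] for j in range(dim)]
    return np.column_stack(cols)

def find_analogs(state, catalog_states, k):
    """Indices of the k nearest catalog states.

    The last catalog row is excluded because it has no successor.
    """
    d = np.linalg.norm(catalog_states[:-1] - state, axis=1)
    return np.argsort(d)[:k]

# Usage on a toy series: the successor of analog i is simply row i + 1.
series = np.sin(0.05 * np.arange(2000))
embedded = delay_embed(series, tau=11)
idx = find_analogs(embedded[500], embedded, k=50)
successors = embedded[idx + 1]
```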

Fig. 1.

The attractors of (left) the original state variable and (right) the time-delayed state variable of L63.

Citation: Journal of Atmospheric and Oceanic Technology 37, 9; 10.1175/JTECH-D-20-0001.1

b. Implementation of OI

Since we only consider the first component x of the full system, we choose the following prior background covariance:

$$B(x_{t_1}, x_{t_2}) = B_{11} \exp\{-|t_1 - t_2|^2 / L_t^2\},$$

where $B_{11}$ is the climatology covariance of x, which can be calculated from a long-time simulation. The parameters of OI, namely, r and $L_t$ as indicated by Eqs. (4) and (7), are tuned so that the OI algorithm produces the minimal root-mean-square error (RMSE); here we set r = 2 and $L_t$ = 0.2.
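A minimal sketch of OI for a scalar time series with this Gaussian temporal covariance is given below. It uses the standard OI gain and posterior variance and assumes that the background mean is a known constant; the function name `oi_time_series` and the value of `b11` in the usage lines are illustrative.

```python
import numpy as np

def oi_time_series(t_grid, t_obs, y_obs, b11, r, lt, mean=0.0):
    """OI estimate and posterior standard deviation on t_grid (hypothetical helper)."""
    def cov(t1, t2):
        # Gaussian temporal covariance B11 * exp(-|t1 - t2|^2 / Lt^2)
        return b11 * np.exp(-np.subtract.outer(t1, t2) ** 2 / lt ** 2)

    B_go = cov(t_grid, t_obs)                           # B H^T
    B_oo = cov(t_obs, t_obs) + r * np.eye(len(t_obs))   # H B H^T + R
    K = np.linalg.solve(B_oo, B_go.T).T                 # gain B H^T (H B H^T + R)^{-1}
    x_hat = mean + K @ (y_obs - mean)
    var = b11 - np.einsum("ij,ij->i", K, B_go)          # diagonal of the posterior covariance
    return x_hat, np.sqrt(np.maximum(var, 0.0))

# Usage: the posterior standard deviation does not involve y_obs at all.
t_grid = np.arange(0.0, 2.0, 0.01)
t_obs = np.arange(0.0, 2.0, 0.1)
x_hat, stdev = oi_time_series(t_grid, t_obs, np.sin(t_obs), b11=60.0, r=2.0, lt=0.2)
```

Because `var` is computed from the covariances and the observation times only, the estimated standard deviation is identical for any observed values.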

c. Comparison of mean estimates

Let $\hat{x}$ be the reanalysis estimate of AnDA or OI, and $x^{\mathrm{true}}$ be the truth, such that $x^{\mathrm{true}}$ and $\hat{x}$ exist for $t_1, t_2, \ldots, t_T$. Suppose that $\hat{x} = (\hat{x}_j)_{j=1,\ldots,n_x}$ and $x^{\mathrm{true}} = (x_j^{\mathrm{true}})_{j=1,\ldots,n_x}$ are of dimension $n_x$ (which equals 1 in the present L63 case). The RMSE of $\hat{x}$ is then defined as

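$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\frac{1}{n_x}\sum_{i=1}^{T}\sum_{j=1}^{n_x}\left[\hat{x}_j(t_i) - x_j^{\mathrm{true}}(t_i)\right]^2}.$$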

Although the xt we use for analog forecast with time-delayed states has three components, we only take the first component xt to compute the RMSE. The time-delayed estimates (i.e., the second and the third components of the state reanalysis) are not used to evaluate the performance.
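For reference, the RMSE defined above amounts to the following computation, assuming the estimates and the truth are stored as arrays of shape (T,) or (T, n_x); the function name `rmse` is illustrative.

```python
import numpy as np

def rmse(x_hat, x_true):
    """Root-mean-square error averaged over all times and state components."""
    diff = np.asarray(x_hat, dtype=float) - np.asarray(x_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

For the time-delayed L63 state, only the first column of the reanalysis would be passed to this function.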

The RMSE for AnDA is 0.77, and the minimal (after tuning the parameters) RMSE for OI is 1.177. The top panel of Fig. 2 shows the trajectory of the truth, the observation, and the reanalysis estimates of the L63 first component. The state reanalysis produced by OI apparently has large errors when the state is near the origin. In contrast, the trajectory of AnDA manages to reproduce the L63 dynamics even when the observation errors are large. In this experiment, we do not meet the curse of dimensionality, since we have $10^4$ samples in the catalog while the Hausdorff dimension (Schleicher 2007) of the L63 attractor is around 2.06. Therefore, the dynamics represented by the analog forecast method approximates the true dynamics very well.

Fig. 2.

(top) The trajectory of the truth, the observations, and the AnDA and OI estimates. The RMSE of AnDA estimates and OI estimates are 0.77 and 1.177, respectively. (middle) Estimated reanalysis standard deviation and absolute error of reanalysis estimate for AnDA. The estimated standard deviation is strongly correlated to the absolute error. (bottom) Estimated reanalysis standard deviation and absolute error of reanalysis estimate for OI. In this example, the estimated standard deviation is periodic since it only depends on the observation frequency and the magnitude of R.


d. Comparison of estimated standard deviations

Another interesting way of comparing AnDA and OI is to assess the quality of the estimated standard deviation of the state reanalysis against the true absolute error. Indeed, the absolute error directly quantifies how far the estimate is from the truth. However, the truth is usually unknown, and hence the absolute error is often not accessible. When this is the case, estimated standard deviations are often used as a reference to inform on the actual error of the state estimate. Hence, providing an estimated standard deviation that corresponds to the absolute error is a key feature for a reconstruction method. These quantities are defined as follows:

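$$\mathrm{stdev} = \sqrt{\frac{\mathrm{diag}(\mathbf{P}^s)}{n_x}},$$
$$\mathrm{abserror} = \frac{|\hat{x} - x^{\mathrm{true}}|}{\sqrt{n_x}}.$$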

It is not surprising to see that the OI algorithm produces a periodic estimate of standard deviation (Fig. 2, bottom panel). Indeed, the estimated error is only based on the observation sampling. This is a strong limitation of OI. In contrast, the estimated standard deviation of AnDA is much more flow dependent (Fig. 2, middle panel). The absolute error of AnDA increases each time the state variable is close to the bifurcation point or the furthest points of the two wings. At those times, the AnDA estimated standard deviation manages to inform on the error made as the complexity of the L63 dynamics renders the state estimation harder.

4. Application to the interpolation of along-track SSH

a. Targeted region and dataset

In this section we test the OI and AnDA algorithms in an OSSE aiming at interpolating along-track SSH onto gridded SSH maps. We focus here on a 10° × 10° region in the eastern Gulf of Mexico (centered at 85°W, 25°N; see Fig. 3). The region of interest comprises 41 × 41 grid points, of which nx = 1353 are ocean grid points (the rest being landmasses); this gives the dimension of the state variable x.

Fig. 3.

Snapshots of the “true” SSH in the region of interest on different days of year 2004, featuring the formation and shedding of a “Loop Current eddy.” The SSH here comes from the OCCIPUT ensemble simulation (see text). The two symbols on the maps mark the location of the Loop Current and the Florida coast grid points, respectively, at 25°N, 85°W and at 26.03°N, 82°W.


The ocean circulation in this region features the Loop Current (LC), an anticyclonic flowing meander entering the Gulf through the Yucatan Channel (Yucatan Current), and exiting along the southern tip of the Florida Peninsula (Florida Current). The Loop Current is known as an unstable system and episodically sheds large anticyclonic eddy rings of scale 200–400 km with periods ranging from about 100 to 450 days (see Fig. 3). The shedding of these Loop Current eddies is a complicated process, as eddies can detach from and reattach to the Loop Current before propagating westward across the Gulf. SSH variability in the region is also related to smaller cyclonic eddies (80–120 km) that are observed moving along the outer edge of the LC [Loop Current frontal eddies (LCFEs)], on both subannual and submonthly time scales, and to coastally trapped waves that respond to wind variability, especially to winter cold surges [see Jouanno et al. (2016) for a review].

We perform the OSSE using daily SSH maps from one of the Ocean Chaos–Impacts, Structures, Predictability (OCCIPUT) ensemble simulations (Penduff et al. 2014; Bessières et al. 2017). This is a regional North Atlantic Ocean–sea ice 50-member ensemble simulation performed at eddy-permitting horizontal resolution (1/4°). After a common 20-yr spinup, the 50 members are restarted from slightly perturbed initial conditions and forced over 20 years (1993–2012) with identical surface forcing. In the following, the SSH of the last year of the first ensemble member is taken as the ground truth. We then use the location of the real along-track AVISO observations available for 2004 (that include 4 satellites: TOPEX/Poseidon, GFO, Jason-1, Envisat), to generate our pseudo-observations by locally and linearly interpolating the truth along the observed tracks. No observation error is artificially added to the simulated observations (i.e., Rtrue = 0).

The historical catalog from which AnDA learns the forecast model is thus made of the daily maps of SSH from the 19 remaining years of the 49 remaining ensemble members (meaning 19 × 49 × 365 = 339 815 daily SSH maps in total). As an element of comparison, the historical catalog in Lguensat et al. (2019) for a similar problem is 34 years of 3-day data (4017 SSH maps).

b. Implementation of AnDA

First, we reduce the dimension of the state variable. We take the coefficients of the first 100 leading EOFs as the reduced state $x^{\mathrm{red}} \in \mathbb{R}^{100}$. In practice, we calculate the spatial climatology covariance $B^{\mathrm{clim}} \in \mathbb{R}^{1353 \times 1353}$ based on the OCCIPUT simulation:

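$$B^{\mathrm{clim}} = \frac{1}{365\,000}\sum_{i_Y=1}^{20}\sum_{i_N=1}^{50}\sum_{t=1}^{365}\left[x_{i_N,i_Y}(t) - \bar{x}\right]\left[x_{i_N,i_Y}(t) - \bar{x}\right]^{\mathrm{T}},$$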

where $x_{i_N,i_Y}(t)$ refers to the SSH on the $t$th day of year $i_Y$ of the $i_N$th ensemble member. The EOFs (denoted by $e_i$) are the eigenvectors of $B^{\mathrm{clim}}$:

$$B^{\mathrm{clim}} e_i = \lambda_i e_i,$$

for i = 1, 2, …, 1353. Then for a given state variable $x \in \mathbb{R}^{1353}$, the reduced state is defined by

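$$x^{\mathrm{red}} = (\langle x, e_1\rangle, \langle x, e_2\rangle, \ldots, \langle x, e_{100}\rangle)^{\mathrm{T}} \in \mathbb{R}^{100}.$$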

The first 100 EOFs explain more than 99% of the variance of SSH. This explained variance is stable over the whole 20-yr time series.
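The EOF reduction can be sketched as follows, assuming the SSH maps are stacked in an array of shape (number of maps, n_x) with land points already removed. The function names are illustrative, and the projection acts on the state x itself, following the definition of the reduced state given above.

```python
import numpy as np

def fit_eofs(X, n_modes):
    """Leading EOFs of the sample covariance of X, shape (n_samples, n_x)."""
    mean = X.mean(axis=0)
    anomalies = X - mean
    B = anomalies.T @ anomalies / X.shape[0]      # climatology covariance
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]   # indices of the leading modes
    E = eigvecs[:, order]                         # columns are the EOFs e_i
    explained = eigvals[order].sum() / eigvals.sum()
    return mean, E, explained

def reduce_state(x, E):
    """Coefficients <x, e_i> of x on the retained EOFs."""
    return E.T @ x
```

The returned `explained` fraction gives a direct check of how many modes are needed; in the setting of this section it would exceed 0.99 for 100 modes.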

AnDA is implemented with respect to xred. Our catalog consists of the xred that were calculated using 49 members (member 2 to member 50), from year 1 to year 19. Therefore, the catalog and the truth come from different members and years. By dimension reduction, the corresponding observation operator Hred is different from the original H:

$$H^{\mathrm{red}} x^{\mathrm{red}} = \sum_{i=1}^{100} x_i^{\mathrm{red}} H e_i.$$

The corresponding observation error variance $R^{\mathrm{red}}_{\mathrm{obs}}$ is no longer zero, since the small components (i.e., ⟨x, e101⟩, …, ⟨x, e1353⟩) are missing in the reduced state variable. In this reduced space, covariance localization is implemented as $T^{-1}[T(P) \circ C_{\mathrm{loc}}]$ in line 4 of algorithm 2 (Table 2), where T transforms the covariances of $x^{\mathrm{red}}$ to the covariances of the original physical state x.
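The reduced observation operator and the localization in reduced space can be illustrated as below. The sketch assumes that the EOFs are stored as orthonormal columns of a matrix E, that T maps a reduced covariance P to E P Eᵀ, and that its inverse is the projection back onto the EOFs; these are assumptions about the form of T, which is not spelled out here. Function names are hypothetical.

```python
import numpy as np

def reduced_obs_operator(H, E):
    """H: (n_obs, n_x) observation operator; E: (n_x, m) leading EOFs. Returns H E."""
    return H @ E

def localize_reduced_cov(P_red, E, C_loc):
    """Apply a localization matrix C_loc (n_x, n_x) to a reduced covariance (m, m)."""
    P_phys = E @ P_red @ E.T              # T(P): covariance in physical space
    return E.T @ (P_phys * C_loc) @ E     # T^{-1}[T(P) o C_loc], elementwise product
```

With a localization matrix of ones, the reduced covariance is returned unchanged, which is a convenient sanity check for the two transforms.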

We choose Ne = 1000 (ensemble size for data assimilation), k = 1000 (the parameter mentioned in algorithm 3; Table 3), and R = 4 cm². A different choice of analogs was made in Lguensat et al. (2019), where the analogs and successors were chosen to represent only the small-scale modes of the complete simulated SSH. The large-scale modes of SSH were first reconstructed using the OI method. Then the small-scale modes were reconstructed using AnDA. Although this space reduction strategy was shown to be promising, its success requires a catalog of high-resolution SSH data, which are not often available.

c. Implementation of OI

We considered the following background covariance matrix:

$$B(x_{i,t_1}, x_{j,t_2}) = B_{ij} \exp\{-d_t^2 / L_t^2\},$$

where i, j = 1, 2, …, dim(x), $d_{ij}$ refers to the physical distance between $x_i$ and $x_j$, $d_t = |t_1 - t_2|$, $B_{ij} = \mathrm{Cov}(x_i, x_j)$ refers to the spatial climatology covariance, and $L_t$ is the scalar parameter defining the temporal scales of the covariance matrices.

The parameters $B_{ij}$ are directly calculated from the SSH dataset and the parameter $L_t$ is tuned so that the OI algorithm produces the minimal RMSE. Often in real applications, when the true spatial climatology covariances are not accessible, they are also parameterized and the background covariance matrix is approximated by $B[x_i(t_1), x_j(t_2)] = \sqrt{B_{ii}B_{jj}} \exp\{-d_{ij}^2/L_x^2 - d_t^2/L_t^2\}$, with $L_x$ a scalar parameter defining the spatial scales of the covariance matrices. However, for the sake of a fair comparison between OI and AnDA, and since the simulated dataset in our experiments is large enough, we are able to estimate the spatial climatology covariances $B_{ij}$ and fully compute Eq. (11). This formulation indeed yields the best results in the experiments of the present section (comparison not shown).

Note that, for the OI used in Data Unification and Altimeter Combination System (DUACS), the choice of the parameters Lx and Lt is usually made as a best global trade-off to achieve global resolution of the mesoscale features (e.g., Dibarboure et al. 2011; Pujol et al. 2012), and could, in principle, be better optimized in a specific regional context (hence the tuned OI in this study).

In this study, we also consider an OI optimized with conventional objective analysis (Le Traon et al. 1998) and here named OICOA. The OICOA experiment is used as a point of comparison in order to show that (i) an OI is difficult to tune (conventional objective analysis fails to do so) and (ii) an incorrectly tuned OI can lead to significant errors. The covariance function BCOA is chosen to be

$$B^{\mathrm{COA}}(x_{i,t_1}, x_{j,t_2}) = \sqrt{B_{ii}B_{jj}}\, C_{ij}^{\mathrm{COA}} \exp\{-d_t^2 / L_t^2\},$$

where

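$$C_{ij}^{\mathrm{COA}} = \left[1 + \frac{\alpha d_{ij}}{L_x} + \frac{(\alpha d_{ij})^2}{6 L_x^2} - \frac{(\alpha d_{ij})^3}{6 L_x^3}\right] \exp\left\{-\frac{\alpha d_{ij}}{L_x}\right\},$$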

with α = 3.34. In Le Traon et al. (1998), the parameters are chosen to be Lx = 150 km, Lt = 20 days. We tune R so that the method produces the minimal RMSE based on the given Lx and Lt. A sensitivity test (not shown here) demonstrated that the difference between OI computed from Eq. (11) and OICOA is mainly due to the difference in the parameter Lt. The correlation functions are also different but do not make a significant difference in our numerical results.
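The OICOA covariance function can be evaluated with the short sketch below. The square-root scaling by the two local variances is an assumption about the normalization, and the function names are illustrative. Distances are in km and time lags in days.

```python
import numpy as np

ALPHA = 3.34

def coa_correlation(d, lx):
    """Spatial correlation of conventional objective analysis at distance d."""
    r = ALPHA * np.asarray(d, dtype=float) / lx
    return (1.0 + r + r ** 2 / 6.0 - r ** 3 / 6.0) * np.exp(-r)

def coa_covariance(var_i, var_j, d_ij, dt, lx=150.0, lt=20.0):
    """Space-time covariance between two grid points with variances var_i and var_j."""
    temporal = np.exp(-np.asarray(dt, dtype=float) ** 2 / lt ** 2)
    return np.sqrt(var_i * var_j) * coa_correlation(d_ij, lx) * temporal
```

With α = 3.34, the spatial correlation is close to its first zero crossing at a distance equal to Lx, which is why Lx can be read as a zero-crossing scale.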

d. RMSE results

The RMSE values for SSH, vorticity and velocity reconstructed with the three methods (AnDA, OI and OICOA) are summarized in Table 4. Here, the velocity refers to the two-dimensional vector of the geostrophic velocities, which is defined as (u, υ) = (−∂ssh/∂y, ∂ssh/∂x) (g/f), and the vorticity is defined as q = ∂υ/∂x − ∂u/∂y, where g = 9.81 m s−2 is the gravitational acceleration and f is the Coriolis parameter. Table 4 shows that AnDA performs as well as the best-tuned OI (i.e., tuned and optimized specifically for the region of interest) for these three variables, resulting in very similar RMSE values for the two methods in the full region of interest. In the case of SSH, the RMSE value for AnDA is smaller than the one for OI (1.40 and 1.68 cm, respectively). However, this difference fades when the RMSE is computed over the central region only, i.e., excluding coastal areas. In the following, we will show that this is due to the fact that AnDA can reconstruct the high-frequency SSH fluctuations of the coastal areas much better than OI. These high-frequency SSH fluctuations are likely related to the coastally trapped waves responding to winter windstorm surges, as mentioned in Jouanno et al. (2016).
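The diagnostic variables can be derived from a gridded SSH map with centered finite differences, as in the sketch below. It assumes a regular grid with constant spacings dx and dy in meters, SSH in meters, and a Coriolis parameter f = 2Ω sin(latitude); the vorticity returned is the relative vorticity ∂υ/∂x − ∂u/∂y of the geostrophic flow, in s−1. The function name and the constant-spacing simplification are illustrative.

```python
import numpy as np

G = 9.81            # gravitational acceleration (m s^-2)
OMEGA = 7.2921e-5   # Earth's rotation rate (rad s^-1)

def geostrophic(ssh, lat_deg, dx, dy):
    """Geostrophic velocities and relative vorticity from ssh of shape (ny, nx).

    lat_deg has shape (ny,): one latitude per grid row.
    """
    f = 2.0 * OMEGA * np.sin(np.deg2rad(lat_deg))[:, None]   # Coriolis parameter
    dssh_dy, dssh_dx = np.gradient(ssh, dy, dx)              # derivatives along rows, columns
    u = -(G / f) * dssh_dy
    v = (G / f) * dssh_dx
    vorticity = np.gradient(v, dx, axis=1) - np.gradient(u, dy, axis=0)
    return u, v, vorticity
```

For a sea surface tilted uniformly in the zonal direction, this gives a purely meridional flow with zero vorticity, as expected from geostrophy.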

Table 4.

Summary of the RMSE values obtained with the three methods for year 2004 (6 Jan–31 Dec 2004): AnDA, OI, and OICOA, for SSH (in cm), geostrophic velocity (in cm s−1), and vorticity [(100 s)−1], computed over the full domain, in the central region only (indicated with an asterisk), i.e., excluding the coastal areas (23.78°–27.13°N, 83.75°–90°W), and in the Florida and Yucatan coastal area.


It is also clear from Table 4 that OICOA is systematically less accurate than AnDA and OI in terms of RMSE. The time series of the reconstructed SSH at two example grid points, displayed in Fig. 4, illustrate why this is the case. With parameter Lt set to 20 days for the temporal correlation scale of OICOA, the reconstructed SSH misses the high-frequency fluctuations of the signal. These high-frequency fluctuations are particularly strong near the Florida coast (bottom panel), while in the Loop Current (top panel), the large-amplitude fluctuations appear to be of monthly and subannual time scales, as they are associated with the fluctuations of the LC meander and LCE shedding. On the other hand, the tuning of the best-tuned OI with Lt = 6 days results in a better behavior of the reconstructed SSH in the high frequencies. We quantify this further in the following with a dedicated temporal spectral analysis. At this point, we wish to emphasize the fact that AnDA is as accurate as the best-tuned OI, without the need to explicitly tune the parameters Lx and Lt. In AnDA, the information is implicitly provided by the historical catalog. It should be recalled, however, that these results are produced in the context of an OSSE with pseudo-observations derived from the simulated truth, and so the historical catalog from which AnDA learns is fully consistent with those observations. We reserve for future investigations the case where the catalog and the truth come from different sources.

Fig. 4.

Time series of the reconstructed daily SSH for year 2004 at the two grid points marked on the maps in Fig. 3: (top) in the Loop Current (25°N, 85°W) and (bottom) near the Florida coast (26.03°N, 82°W). The reconstructed SSH is shown for AnDA, OI and OICOA and compared with the true SSH.


An additional sensitivity test has been performed in order to assess the impact of the catalog size on the reconstruction performance. Three other catalog sizes have been tested, using 1 member (19 × 365 daily SSH maps), 20 members (20 × 19 × 365 daily SSH maps), and 30 members (30 × 19 × 365 daily SSH maps) of the 19-yr OCCIPUT ensemble, and compared to the current catalog using 49 members (49 × 19 × 365 daily SSH maps). The resulting SSH reconstruction RMSEs are, respectively, 1.82, 1.46, 1.45, and 1.40 cm. As expected, the performance of AnDA is improved by a larger catalog. However, the dependence is not linear, and the difference between using 20 members, 30 members, and 49 members is relatively small. In fact, both the 20-member catalog and the 30-member catalog also lead to a smaller RMSE than OI.

e. Temporal spectral results

The top panel of Fig. 5 shows the temporal power spectral densities (PSD) averaged over the entire domain for the reconstructed SSH with the three methods and for the true SSH. The PSD of the three reconstructed signals are very close to the PSD of the truth at time scales longer than about 30 days, confirming that all three methods produce equivalent energy reconstructions of the monthly to subannual fluctuations. But at higher frequencies, we find that only the PSD for the AnDA-reconstructed SSH stays close to the truth. A drop-off in the PSD is clearly seen for the OI- and OICOA-reconstructed SSH at approximately 6 and 20 days, respectively, which is consistent with the values set for Lt in each case.

Fig. 5.

(a) Temporal power spectral density (PSD) of the reconstructed SSH (AnDA, OI, OICOA) and true SSH. (b) Temporal signal-to-noise ratio (R) measuring the temporal coherence of each of the reconstructed SSH (AnDA, OI, OICOA) with the true SSH. Both PSD and R are averaged over the entire domain. Both panels share the same x axis in log scale for temporal frequency [cycles per day (cpd)]. The tick labels on the top axis give the corresponding periods in days.


We also check the noise-to-signal ratio between the reconstructed signals and the truth (Fig. 5, bottom panel). For this purpose, and following Dufau et al. (2016) and Ballarotta et al. (2019), we compute the spectral noise-to-signal ratio as

$$R = 1 - \frac{\mathrm{PSD}_{\mathrm{Error}}}{\mathrm{PSD}_{\mathrm{Truth}}},$$

where PSDError is the PSD of the difference between the reconstructed SSH and the truth, and PSDTruth is the PSD of the true signal. This metric provides a measure of the coherence between the two signals that takes into account differences in both amplitude and phase (Ballarotta et al. 2019). Figure 5 (bottom panel) shows that the spectral noise-to-signal ratio for AnDA remains far above 0.5 down to time scales of ~5 days, which confirms that a good coherence exists between the AnDA-reconstructed and the true SSH. The coherence is not as good for the two reconstructed SSH signals of OI and OICOA. For the best-tuned OI, R drops below 0.5 at time scales of ~15 days, even if the PSD drops off only around 6 days. In other words, the OI method manages to reconstruct enough energy at high frequencies (although not below 6 days) yet fails to produce a coherent signal at those scales. As for OICOA, both PSD and R drop off at time scales of about 25 days.
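A minimal sketch of this spectral metric for a single grid-point time series is given below. It uses a raw periodogram for simplicity, which is an assumption; the spectral estimator and the domain averaging used for Fig. 5 may differ. Function names are illustrative.

```python
import numpy as np

def psd(x, dt=1.0):
    """Raw periodogram of a demeaned series, zero frequency excluded."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2 * dt / len(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)        # cycles per unit of dt
    return freqs[1:], spec[1:]

def spectral_ratio(recon, truth, dt=1.0):
    """R = 1 - PSD_Error / PSD_Truth as a function of frequency."""
    recon = np.asarray(recon, dtype=float)
    truth = np.asarray(truth, dtype=float)
    freqs, p_err = psd(recon - truth, dt)
    _, p_truth = psd(truth, dt)
    return freqs, 1.0 - p_err / p_truth
```

A perfect reconstruction gives R = 1 at all frequencies, while a reconstruction that contains no signal at a given frequency gives R = 0 there, so the 0.5 level marks the scale at which the error carries half the energy of the true signal.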

We thus find confirmation in Fig. 5 that AnDA is able to reconstruct an SSH signal with good coherence to the truth at higher frequencies than OI, even when AnDA is compared with the OI specifically tuned for the region of interest. This is consistent with what we had already pointed out from the example gridpoint time series in Fig. 4. As already discussed, in the domain we examine here, submonthly fluctuations are the strongest in coastal regions because of the response to wind bursts (Jouanno et al. 2016). Operational systems such as DUACS, based on OI, are not able to capture those fast fluctuations well, but can partially get around this limitation by using additional products such as the AVISO Dynamic Atmosphere Correction (DAC) to propose an a posteriori correction for the missed and aliased part of the signal corresponding to the dynamical high-frequency ocean response to wind and pressure forcing (e.g., Ponte and Ray 2002). Note that this correction, however, is only based on this specific source of high-frequency fluctuations, while our study shows that a method such as AnDA is able to capture high-frequency signals originating from all kinds of sources (to the extent that the fluctuations are well represented in the historical catalog). We find indeed that the spectral results shown in Fig. 5 remain robust when restricting the area of the spectral analysis to the central ocean-only region (not shown here), meaning that AnDA is able to capture high-frequency fluctuations of any kind, and not only the coastally trapped waves.

f. Estimated standard deviation results

Another interesting result of this study is that, consistently with the L63 application in section 3, AnDA produces a more informative estimated standard deviation. Indeed, it has similar spatial patterns as the absolute error (i.e., the difference with the true signal), and does not only depend on the tracks of the observations to interpolate. This is illustrated in Fig. 6 where estimated standard deviation and absolute error are averaged over a 1-month period on two example dates (8 March and 8 September 2004) for AnDA, OI, and OICOA. For visual purposes, we show the monthly averaged distribution of the absolute error as it presents a clearer flow-dependent feature than the daily distribution.

Fig. 6.

Monthly averages of (top) estimated standard deviation and (bottom) absolute error centered on (a) 8 Mar and (b) 8 Sep 2004. The Ps produced by OI (top-middle panel for each month) and OICOA (top-right panel for each month) depend only on the tracks of satellite altimetry and the background covariance B. Therefore, the estimated standard deviation does not seem relevant to approximate the absolute error (bottom-middle and bottom-right panels for each month). On the other hand, the estimated standard deviation produced by AnDA is flow dependent (top-left panel for each month) and closer to the absolute error.


Figure 6 shows that the absolute error of AnDA on 9 March and 9 September is smaller than that of OI and OICOA, especially along the Florida coast and in the Loop Current. This means that on those dates, the AnDA-reconstructed SSH is closer to the truth, which is consistent with the time series given in Fig. 4. For instance, on 9 September, the absolute errors concentrate near the anticyclonic flowing meander, and it is clear that in this region the absolute error of AnDA is smaller than that of OI and OICOA.

Figure 6 also illustrates the fact that the estimated standard deviation for OI depends on the satellite observation sampling (here, along tracks) and on the background error covariance matrix B. This is consistent with the results given in Fig. 2 (bottom panel) in the case of the L63 system. The estimated standard deviation for OI is thus noninformative. In contrast, the estimated standard deviation of AnDA does not only depend on the observation sampling but also on the flow. Therefore, its pattern is more correlated to the absolute error (see top and bottom left panels of each snapshot in Fig. 6).

5. Conclusions

This paper reviews the algorithms of analog data assimilation (AnDA) and optimal interpolation (OI), and presents the numerical results of interpolation with the Lorenz-63 (L63) system and with simulated sea surface height (SSH) data. Our comparison of AnDA and OI mainly focuses on the root-mean-square error (RMSE) of the state estimate, the estimated standard deviation, and the temporal spectra of the reconstructed states. To achieve a fair comparison, we carefully tune the parameters of OI so that the RMSE is minimized. As a reference we also present the numerical results of OI for a classical but suboptimal set of parameters (labeled OICOA) in the experiments with SSH data. This setting corresponds to the seminal work described in Le Traon et al. (1998).

In the tests with the L63 model, a case where we do not meet the curse of dimensionality, we show that AnDA produces more realistic interpolated trajectories, especially when the true state is near the center of the system attractor (see Fig. 2, top panel). Meanwhile, the standard deviation estimated by AnDA is highly correlated with the absolute error, which is unknown in practice, and is hence much more informative. On the other hand, the standard deviation estimated by OI is uncorrelated with the absolute error (see Fig. 2, middle and bottom panels) and only depends on the background and observation terms.

In the tests with simulated SSH data, AnDA and OI produce comparable RMSE for the daily SSH estimates (Table 4). However, only the interpolation using AnDA captures the high-frequency fluctuations well, including those generated in the coastal regions in response to winter wind bursts (Fig. 4). We show that the reconstructed temporal spectrum of AnDA is also more consistent with that of the truth, in terms of energy and coherence, at both large and small time scales. In contrast, the OI-reconstructed temporal spectrum suffers a significant loss of energy and is incoherent with the truth at small time scales (see Fig. 5). Moreover, the standard deviation estimated by AnDA is once again more informative. Indeed, compared to OI results, the AnDA estimated standard deviation is flow dependent, evolving in space and time, and has a significant correlation with the absolute error (see Fig. 6).

To summarize, AnDA and OI are interpolation methods with slightly different formulations. In the case of OI, parameters controlling spatial–temporal variability and levels of noise are prescribed by the user. The optimization of these parameters is time demanding, especially for large systems. AnDA instead uses analogs, and these parameters are adaptively learned from a catalog of data, which needs to be as rich as possible. In one sense, the construction of the catalog in AnDA is time demanding, but once it is created, the procedure is very convenient as it does not need additional tuning. In terms of interpolation results, AnDA and OI differ in their mean and standard deviation estimates. Regarding the mean estimate, AnDA, based on a catalog of numerical simulations, creates realistic trajectories that capture fast and slow fluctuations at the same time. OI instead linearly interpolates the observations with static parameters, which makes it incapable of capturing time scales that are smaller than the temporal correlation parameter. Regarding the standard deviation, OI can only estimate a standard deviation that depends on the background and observation error covariances. AnDA produces much more realistic standard deviation estimates, correlated with the absolute error of interpolation. This means that AnDA is able to detect when and where the interpolation is relevant or not. This point is crucial for the quantification of the uncertainty in the interpolation.

Our study demonstrates the potential of AnDA as an alternative to OI for the interpolation of along-track satellite observations. As a first demonstration, we have investigated pure “twin” experiments, where the pseudo-observations and the AnDA historical catalog come from the same source (i.e., are fully consistent), and where a comparison to the known true SSH is possible. These twin experiments lead to encouraging results for AnDA, and call for future work to further test AnDA in the context of realistic operational applications. Future work will need to address several questions. First, is the good performance of AnDA confirmed when the historical catalog and the along-track observations do not come from the same source? In other words, a realistic experimental study should be performed with real observations or at least artificial observations extracted from an entirely distinct numerical simulation. Second, is the AnDA method applicable to other regions and/or to global scale? The current implementation is technically sufficient and can be straightforwardly applied to any other region of similar size to the Gulf of Mexico with no additional implementation difficulties. In this case, the creation of a new catalog will require new model data, which can be costly unless, as in the present study with OCCIPUT data, the catalog is based on data that are available globally. The good performance of AnDA at regional scale (as shown here for the Gulf of Mexico) should then be confirmed in other regions, under the condition that a computationally reasonable number of EOFs is enough to capture the dynamics of that region. In other words, for this specific implementation of AnDA to work well, the energy distribution of the signal’s EOF decomposition must present a small tail. For the same reason, the EOF-based AnDA implementation is most likely not suited (as is) for global-scale applications. Being able to maintain a relatively small and detailed catalog is crucial to ensure a successful analog search. The EOF decomposition (with a computationally reasonable number of EOFs) would fail to capture the detailed SSH signal in a larger region, and even more so at global scale. This restriction does pose an important challenge to a global-scale implementation of AnDA. However, a solid lead to extend the AnDA implementation to global scale has recently been developed. This implementation is a mixture of the EOF-based AnDA implementation, as used in the present paper, and the patch-based AnDA implementation, as described in Lguensat et al. (2019). This new implementation is currently under scrutiny and is already showing promising results. Finally, what is the computational cost of AnDA in comparison to OI? For the moment, the computational cost of AnDA is much larger than that of OI but, as already mentioned, a strong argument for AnDA is that the method does not require as much tuning as OI. Moreover, in a realistic setting, the tuning of OI is not only complicated and time-consuming, but its optimality cannot be guaranteed. Although these considerations are obviously hard to quantify, a study should be conducted in which the computational efficiency of both the OI and AnDA codes has been optimized. Also, in order to appropriately quantify the tuning efforts, the study should take into account the entire mapping production chain. A logical next step for AnDA would hence be to implement a comparative study in a realistic altimetric mapping production context, in close collaboration with operational institutions.

Acknowledgments

This work has been carried out as part of the Copernicus Marine Environment Monitoring Service (CMEMS) 3DA project. CMEMS is implemented by Mercator Ocean in the framework of a delegation agreement with the European Union. Sammy Metref was funded by ANR through Contract ANR-17-CE01-0009-01. The ensemble simulation dataset used in this study was produced as part of the OCCIPUT project (http://meom-group.github.io/projects/occiput/) funded by the French Agence Nationale de la Recherche (ANR) through Contract ANR-13-BS06-0007-01, and further supported by the PIRATE project funded by the Centre National d’Études Spatiales (CNES) through the Ocean Surface Topography Science Team (OST/ST). The original OCCIPUT dataset is available upon request (contact: thierry.penduff@cnrs.fr).

APPENDIX

Time-Delayed Analog Forecast

A key aspect of analog forecasting is how to choose the analogs. On the one hand, the analogs need to be informative, meaning that … Motivated by the mathematical theory established by Takens (1981), which states that, under certain conditions, the attractor of the original system can be embedded into the space of lagged partial state variables, we also consider using time-delayed states as the extended state variable. For the numerical experiment with the Lorenz-63 system, our state estimate at time $t$ is the three-dimensional vector $x_{\mathrm{lag}}(t) = (x_t, x_{t-\tau}, x_{t-2\tau})^{\mathrm{T}}$, where $x_t$ is the first component of the Lorenz-63 full state at time $t$ and $\tau$ is a prescribed time gap. The value of $\tau$ is discussed in section 3a. For each $t$, although $x_t$ is represented in $x_{\mathrm{lag}}(t)$, $x_{\mathrm{lag}}(t+\tau)$, and $x_{\mathrm{lag}}(t+2\tau)$, we do not update these three vectors at the same time. In other words, at the forecasting step at time $t-1$ or at the data assimilation step at time $t$, only $x_{\mathrm{lag}}(t)$ is updated.

However, we do not apply time-delayed states in the experiment with SSH data, since experimentally we did not find that they improve the quality of the reanalysis.
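The construction of the time-delayed state for the Lorenz-63 experiment can be illustrated with a short sketch. It is not taken from the code of this study: the fourth-order Runge-Kutta integrator, the time step `dt = 0.01`, the standard Lorenz (1963) parameters, and the lag of 8 steps in the usage note are assumptions chosen for illustration (the value of τ actually used is the one discussed in section 3a), and the function names are hypothetical.

```python
import numpy as np


def lorenz63_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system (standard parameters)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def integrate_lorenz63(x0, n_steps, dt=0.01):
    """Integrate Lorenz-63 with a classical RK4 scheme (assumed here)."""
    trajectory = np.empty((n_steps + 1, 3))
    trajectory[0] = x0
    for k in range(n_steps):
        s = trajectory[k]
        k1 = lorenz63_rhs(s)
        k2 = lorenz63_rhs(s + 0.5 * dt * k1)
        k3 = lorenz63_rhs(s + 0.5 * dt * k2)
        k4 = lorenz63_rhs(s + dt * k3)
        trajectory[k + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return trajectory


def delay_embed(series, lag, dim=3):
    """Rows are (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag}) for every valid t.

    `lag` is the time gap tau expressed in number of samples.
    """
    series = np.asarray(series)
    if lag < 1:
        raise ValueError("lag must be a positive number of samples")
    start = (dim - 1) * lag  # first index with a complete history
    if start >= series.size:
        raise ValueError("series too short for this lag and dimension")
    columns = [
        series[start - j * lag: series.size - j * lag] for j in range(dim)
    ]
    return np.stack(columns, axis=1)
```

For example, `delay_embed(integrate_lorenz63(np.array([1.0, 1.0, 1.0]), 5000)[:, 0], lag=8)` returns one three-dimensional delayed vector per time step, built only from the first Lorenz-63 component, which is the kind of extended state described above.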

REFERENCES

  • Ballarotta, M., and Coauthors, 2019: On the resolutions of ocean altimetry maps. Ocean Sci., 15, 1091–1109, https://doi.org/10.5194/os-15-1091-2019.

  • Beckers, J. M., and M. Rixen, 2003: EOF calculations and data filling from incomplete oceanographic datasets. J. Atmos. Oceanic Technol., 20, 1839–1856, https://doi.org/10.1175/1520-0426(2003)020<1839:ECADFF>2.0.CO;2.

  • Bentley, J. L., 1975: Multidimensional binary search trees used for associative searching. Commun. ACM, 18, 509–517, https://doi.org/10.1145/361002.361007.

  • Bessières, L., and Coauthors, 2017: Development of a probabilistic ocean modelling system based on NEMO 3.5: Application at eddying resolution. Geosci. Model Dev., 10, 1091–1106, https://doi.org/10.5194/gmd-10-1091-2017.

  • Compo, G. P., and Coauthors, 2011: The Twentieth Century Reanalysis project. Quart. J. Roy. Meteor. Soc., 137, 1–28, https://doi.org/10.1002/QJ.776.

  • Dibarboure, G., M.-I. Pujol, F. Briol, P.-Y. Le Traon, G. Larnicol, N. Picot, F. Mertz, and M. Ablain, 2011: Jason-2 in DUACS: Updated system description, first tandem results and impact on processing and products. Mar. Geod., 34, 214–241, https://doi.org/10.1080/01490419.2011.584826.

  • Dufau, C., M. Orsztynowicz, G. Dibarboure, R. Morrow, and P.-Y. Le Traon, 2016: Mesoscale resolution capability of altimetry: Present and future. J. Geophys. Res. Oceans, 121, 4910–4927, https://doi.org/10.1002/2015JC010904.

  • Durand, M., L.-L. Fu, D. P. Lettenmaier, D. E. Alsdorf, E. Rodriguez, and D. Esteban-Fernandez, 2010: The Surface Water and Ocean Topography mission: Observing terrestrial surface water and oceanic submesoscale eddies. Proc. IEEE, 98, 766–779, https://doi.org/10.1109/JPROC.2010.2043031.

  • Fablet, R., J. Verron, B. Mourre, B. Chapron, and A. Pascual, 2018a: Improving mesoscale altimetric data from a multitracer convolutional processing of standard satellite-derived products. IEEE Trans. Geosci. Remote Sens., 56, 2518–2525, https://doi.org/10.1109/TGRS.2017.2750491.

  • Fablet, R., P. Viet, R. Lguensat, P.-H. Horrein, and B. Chapron, 2018b: Spatio-temporal interpolation of cloudy SST fields using conditional analog data assimilation. Remote Sens., 10, 310, https://doi.org/10.3390/rs10020310.

  • Fu, L.-L., and R. Ferrari, 2008: Observing oceanic submesoscale processes from space. Eos, Trans. Amer. Geophys. Union, 89, 488–488, https://doi.org/10.1029/2008EO480003.

  • Gandin, L. S., 1965: Objective Analysis of Meteorological Fields. Israel Program for Scientific Translations, 242 pp.

  • Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, https://doi.org/10.1002/qj.49712555417.

  • Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811, https://doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

  • Jouanno, J., J. Ochoa, E. Pallàs-Sanz, J. Sheinbaum, F. Andrade-Canto, J. Candela, and J.-M. Molines, 2016: Loop Current frontal eddies: Formation along the Campeche Bank and impact of coastally trapped waves. J. Phys. Oceanogr., 46, 3339–3363, https://doi.org/10.1175/JPO-D-16-0052.1.

  • Le Traon, P. Y., F. Nadal, and N. Ducet, 1998: An improved mapping method of multisatellite altimeter data. J. Atmos. Oceanic Technol., 15, 522–534, https://doi.org/10.1175/1520-0426(1998)015<0522:AIMMOM>2.0.CO;2.

  • Le Traon, P. Y., and Coauthors, 2019: From observation to information and users: The Copernicus Marine Service perspective. Front. Mar. Sci., 6, 234, https://doi.org/10.3389/FMARS.2019.00234.

  • Lguensat, R., P. Tandeo, P. Ailliot, M. Pulido, and R. Fablet, 2017: The analog data assimilation. Mon. Wea. Rev., 145, 4093–4107, https://doi.org/10.1175/MWR-D-16-0441.1.

  • Lguensat, R., P. H. Viet, M. Sun, G. Chen, F. Tian, B. Chapron, and R. Fablet, 2019: Data-driven interpolation of sea level anomalies using analog data assimilation. Remote Sens., 11, 858, https://doi.org/10.3390/rs11070858.

  • Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.

  • Minamide, M., and F. Zhang, 2017: Adaptive observation error inflation for assimilating all-sky satellite radiance. Mon. Wea. Rev., 145, 1063–1081, https://doi.org/10.1175/MWR-D-16-0257.1.

  • Miyoshi, T., E. Kalnay, and H. Li, 2013: Estimating and including observation-error correlations in data assimilation. Inverse Probl. Sci. Eng., 21, 387–398, https://doi.org/10.1080/17415977.2012.712527.

  • Penduff, T., and Coauthors, 2014: Ensembles of eddying ocean simulations for climate. CLIVAR Exchanges, Vol. 65, International CLIVAR Project Office, Southampton, United Kingdom, 19–22.

  • Ponte, R. M., and R. D. Ray, 2002: Atmospheric pressure corrections in geodesy and oceanography: A strategy for handling air tides. Geophys. Res. Lett., 29, 2153, https://doi.org/10.1029/2002GL016340.

  • Pujol, M.-I., G. Dibarboure, P.-Y. Le Traon, and P. Klein, 2012: Using high-resolution altimetry to observe mesoscale signals. J. Atmos. Oceanic Technol., 29, 1409–1416, https://doi.org/10.1175/JTECH-D-12-00032.1.

  • Pujol, M.-I., Y. Faugère, G. Taburet, S. Dupuy, C. Pelloquin, M. Ablain, and N. Picot, 2016: DUACS DT2014: The new multi-mission altimeter data set reprocessed over 20 years. Ocean Sci., 12, 1067–1090, https://doi.org/10.5194/os-12-1067-2016.

  • Schleicher, D., 2007: Hausdorff dimension, its properties, and its surprises. Amer. Math. Mon., 114, 509–528, https://doi.org/10.1080/00029890.2007.11920440.

  • Takens, F., 1981: Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, Warwick 1980, D. A. Rand and L.-S. Young, Eds., Lecture Notes in Mathematics, Vol. 898, Springer-Verlag, 366–381, https://doi.org/10.1007/BFB0091903.

  • Tandeo, P., P. Ailliot, J. J. Ruiz, A. Hannart, B. Chapron, R. Easton, and R. Fablet, 2015: Combining analog method and ensemble data assimilation: Application to the Lorenz-63 chaotic system. Machine Learning and Data Mining Approaches to Climate Science, Springer, 3–12.

  • Tandeo, P., P. Ailliot, M. Bocquet, A. Carrassi, T. Miyoshi, M. Pulido, and Y. Zhen, 2018: A review of innovation-based methods to jointly estimate model and observation error covariance matrices in ensemble data assimilation. arXiv, https://arxiv.org/abs/1807.11221.

  • Ubelmann, C., P. Klein, and L. L. Fu, 2015: Dynamic interpolation of sea surface height and potential applications for future high-resolution altimetry mapping. J. Atmos. Oceanic Technol., 32, 177–184, https://doi.org/10.1175/JTECH-D-14-00152.1.

  • Wu, Z., 1995: Compactly supported positive definite radial functions. Adv. Comput. Math., 4, 283–292, https://doi.org/10.1007/BF03177517.

  • Zhen, Y., P. Tandeo, S. Leroux, S. Metref, T. Penduff, and J. Le Sommer, 2019: 3da code and data. Zenodo, accessed 17 December 2019, https://doi.org/10.5281/zenodo.3559784.
