Using Local Dynamics to Explain Analog Forecasting of Chaotic Systems

P. Platzer,a,b,c P. Yiou,a P. Naveau,a P. Tandeo,b J.-F. Filipot,c P. Ailliot,d and Y. Zhenb

a Laboratoire des Sciences du Climat et de l’Environnement, UMR 8212 CNRS-CEA-UVSQ, Institut Pierre-Simon Laplace and Université Paris-Saclay, Gif-sur-Yvette, France
b Lab-STICC, UMR CNRS 6285, IMT Atlantique, Plouzané, France
c France Énergies Marines, Plouzané, France
d Laboratoire de Mathématiques de Bretagne Atlantique, Brest, France

Abstract

Analogs are nearest neighbors of the state of a system. By using analogs and their successors in time, one is able to produce empirical forecasts. Several analog forecasting methods have been used in atmospheric applications and tested on well-known dynamical systems. Such methods are often used without reference to theoretical connections with dynamical systems. Yet, analog forecasting can be related to the dynamical equations of the system of interest. This study investigates the properties of different analog forecasting strategies by taking local approximations of the system’s dynamics. We find that analog forecasting performances are highly linked to the local Jacobian matrix of the flow map, and that analog forecasting combined with linear regression allows us to capture projections of this Jacobian matrix. Additionally, the proposed methodology allows us to efficiently estimate analog forecasting errors, an important component in many applications. Carrying out this analysis also makes it possible to compare different analog forecasting operators, helping us to choose which operator is best suited depending on the situation. These results are derived analytically and tested numerically on two simple chaotic dynamical systems. The impact of observational noise and of the number of analogs is evaluated theoretically and numerically.

© 2021 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Paul Platzer, paul.platzer@imt-atlantique.fr


1. Introduction

To evaluate the future state of a physical system, one strategy is to use physical knowledge to build differential equations that emulate the dynamics of this system. Then, measurements provide information on the initial state from which these equations must be integrated. Data assimilation (Carrassi et al. 2018) gives a framework to account for two main types of error in this forecasting process. First, the aforementioned equations do not perfectly describe the real dynamics of the system, and solving these equations often requires additional approximations, such as numerical discretization. These first error sources combine into what is called model error. Second, observations are usually partial and noisy, such that the initial state from which the differential equations must be integrated is uncertain. Observation error is especially important for chaotic dynamical systems, as the latter are highly sensitive to initial conditions. The quantification of model and observational uncertainties is an important topic in data assimilation (Tandeo et al. 2020).

For complex, highly nonlinear systems such as the atmosphere, forecasts based on physical equations are challenging. Indeed, the numerical integration of discretized physical equations can be expensive for high-resolution grids, and unresolved small-scale processes such as convection (Prein et al. 2015) or orographic drag (Lott and Miller 1997) must be parameterized. Therefore, many empirical methods have been used in atmospheric sciences (see Van den Dool 2007 and references therein). The volume of data from model outputs, observations, or the combination of both has increased exponentially in the last decades (Balaji 2015), which has strengthened the scientific interest in empirical methods. One such method, called analog forecasting, is based on a notion originally introduced by Lorenz (1969) to estimate atmospheric predictability. Analog forecasting has been used in meteorological applications and on low-dimensional dynamical systems. Analogs were originally used for weather forecasting (Schuurmans 1973), but nowadays numerical weather prediction is usually based on physical models combined with observations through data assimilation, thanks to progress in physical models and computing power. However, analog forecasting remains attractive for statistical tasks, because it is computationally cheap and can be used when no physical model is available. For instance, Yiou (2014) uses analogs in the context of stochastic weather generators. Several authors (Tandeo et al. 2015; Hamilton et al. 2016; Lguensat et al. 2017; Grooms 2021) combined analog forecasting and data assimilation. Analog forecasting procedures have recently been used in a large range of environmental applications, from well-known atmospheric oscillations (Alexander et al. 2017; Wang et al. 2020) to surface wind velocity (Delle Monache et al. 2013) or solar irradiance (Ayet and Tandeo 2018).

Analog forecasting proposes to bypass physical equations and to use existing trajectories of the system instead, drawn either from numerical model output, observation data, or reanalysis. Analog methods are based on the hypothesis that one is provided with many (or one long) trajectories of the system of interest, which makes it possible to find analog states close to any initial state, and to use the time successors of these analogs to evaluate the future state of the system. The fluctuating quality and density of available trajectories adds variability to this process. This leads to analog forecasting errors, which play the same role as the previously described model errors. The present study focuses on analog forecasting errors.

Data-based strategies have been used extensively for the forecast of dynamical systems, some of which are similar to the analog methods studied here. Early approaches similar to the locally linear analog forecasting operator (introduced later in the present paper) include the reconstruction of equations of motion by Crutchfield and McNamara (1987) and the forecast strategy of Farmer and Sidorowich (1987) and Farmer and Sidorowich (1988). In particular, the latter estimate the scaling of forecast errors with forecast time, number of data, and intrinsic properties of the system such as the maximal Lyapunov exponent. This approach is similar to the one presented in this paper, although we also focus on forecasts at small time horizons, for which the divergence associated with the maximal Lyapunov exponent is not yet dominant. Local approaches have also been combined with time embeddings for data-based forecasts of dynamical systems, a strategy that has grown popular since the early works of Sauer (1994) and Sugihara (1994).

Theoretical arguments on the convergence of analog forecasts toward the real future state can be found in Zhao and Giannakis (2016), where the authors focus on specific kernels that make it possible to enhance analog forecasting performances. More recently, Alexander and Giannakis (2020) showed theoretical convergence toward optimal forecasts for an extended version of conventional analog forecasting approaches, involving kernel principal component regressions. Our objective here is to relate the errors of the analog forecasting operators of Lguensat et al. (2017) to local dynamical properties of the system. In the following, “analog forecasting” refers to the specific type of methods studied by Lguensat et al. (2017). Our work has strong connections with the work of Atencia and Zawadzki (2017), which focuses on the growth of analog forecasting errors to compare analog ensembles with other ensemble forecast methods.

Preliminary results suggest that analog forecasting errors can be estimated empirically using local approximations of the true dynamics (Platzer et al. 2019). The current paper gives a more in-depth description of the theory that supports different analog forecasting procedures and allows us to evaluate the evolution of analog covariance matrices. The methodology is applied to two chaotic Lorenz systems.

A theoretical framework for analog forecasting is outlined in section 2, and three analog forecasting operators are recalled. Using this framework, the successor-to-future state distance is expressed in section 3. Section 4 examines analog forecasting mean and covariance and investigates the link between linear regression in analog forecasting and the Jacobian matrix of the real system’s flow map.

2. Analog forecasting

a. Mathematical framework

Let a dynamical system be defined by the following time-differential equation:
$$\frac{dx}{dt} = f(x), \tag{1}$$
where $x$ is a vector that fully characterizes the state of the system, and $f$ is a deterministic, vector-valued map. The space $\mathcal{P}$ in which $x$ lives is called the phase space. For real geophysical systems, the physical space is an infinite-dimensional function space, as variables such as pressure or temperature are continuous functions of space and time. Mathematical studies of the attractors of atmospheric systems have suggested a finite dimension, although upper bounds could not be obtained (Temam 1988). Thus, an infinite-dimensional physical space can still yield a finite-dimensional phase space, as is the case for many hydrodynamical systems. Throughout this study, $\mathcal{P}$ is a vector space of finite dimension $n$. The system is supposed to be autonomous, such that $f: \mathcal{P} \to \mathcal{P}$ does not depend on time.

Given an initial state $x_0$, a forecast gives an estimation of the state of the system $x_t$ at a later time $t$. The true future state $x_t$ is given by the flow map $\Phi: \mathcal{P} \times \mathbb{R} \to \mathcal{P}$ such that
$$\Phi_t: x_0 \mapsto \Phi_t(x_0) = x_t. \tag{2}$$

For the dynamical system defined through Eq. (1), $\Phi$ represents the time integration of this equation. The theorem of Poincaré (1890) states that trajectories will come back infinitely close to their initial condition after a sufficiently long time, and will do so an infinite number of times. It is valid for many geophysical systems of interest and is based on the hypotheses that the system can only access a finite volume of phase space and that the flow “preserves volumes” in phase space. Furthermore, if the dynamical system has an attractor set $\mathcal{A} \subset \mathcal{P}$, then almost all trajectories issued from its “basin of attraction” (a set of typical initial conditions) converge to this subset $\mathcal{A}$ (Milnor 1985). In the following, we assume the simple case where the system has one attractor $\mathcal{A}$, and all trajectories lie in $\mathcal{A}$.

Analog methods are based on the idea that if one is provided with a long enough trajectory of the system of interest, one will find analog states close to any initial point $x_0$ on the attractor $\mathcal{A}$. The trajectory from which the analogs are taken is called the “catalog” $\mathcal{C}$, and can come either from numerical model output or from reprocessed observational data.

Analog forecasting thus supposes that we know a finite number of initial states that are close enough to $x_0$ to be called “analogs,” and that the flow map of the analogs resembles $\Phi$. Therefore, the time successors of the analogs should allow us to estimate the real future state $x_t$. In the following, the kth analog and its successor are noted $a_0^k$ and $a_t^k$. Note that analog forecasting is intrinsically random as it depends on the catalog, which is one out of many possible trajectories. The variability in the catalog influences the ability of the analogs and successors to estimate the future state. This motivates the use of probabilistic analog forecasting operators $\Theta$ such that
$$\Theta_t: x_0 \mapsto \Theta_t(x_0), \tag{3}$$
where $\Theta_t(x_0)$ is a distribution that gives information both about the estimation of the future state $x_t$ and about the variability of this estimation process.

Note that for chaotic dynamical systems, any small but finite perturbation of the initial condition grows exponentially in time. Therefore, however small the distance between the analog $a_0^k$ and the initial state $x_0$ may be, the distance between the successor $a_t^k$ and the future state $x_t$ will eventually become as large as the typical distance between two random points on the attractor. For a chaotic system, we define the Lyapunov time as the time it takes for an infinitesimal error in initial conditions to double. The time needed for $|a_t^k - x_t|$ to be as large as the distance between any two points on the attractor is a function of both the Lyapunov time and the initial distance between $a_0^k$ and $x_0$. After this divergence, the analog forecast is no more informative than a simple climatological forecast (i.e., a forecast based on average statistics over the attractor). This study is devoted to the properties of analog forecasts while the distance between $a_t^k$ and $x_t$ is still small compared to the typical distance between any two points on the attractor.

In atmospheric applications, the Lyapunov time is of the order of 5–10 days (see Vannitsem 2017 and references therein). The chaotic behavior makes forecasts difficult, whatever the methodology used. It is the main factor limiting the atmospheric predictability horizon. For a more detailed introduction to Lyapunov exponents and chaos theory in the context of atmospheric applications, the reader is referred to Dijkstra (2016). Note that the quantification of atmospheric predictability was the first motivation for the study of atmospheric analogs (Lorenz 1969), and the latter are still used for this purpose today (Li and Ding 2011).

b. Analog forecasting operators

We present three analog forecasting operators originally introduced in Lguensat et al. (2017). A finite number $K$ of analogs $(a_0^k)_{k \in [1,K]}$ and successors $(a_t^k)_{k \in [1,K]}$ are used and are assigned weights $(\omega_k)_{k \in [1,K]}$.

The distribution $\Theta_t(x_0)$ of each analog forecast is a discrete distribution with finite support (i.e., a finite sum of Dirac delta functions), with each analog/successor pair defining an element of the empirical distribution.

  • The locally constant operator (“LC operator” or simply “LC”) uses only the successors to estimate $x_t$: $\Theta_t^{LC}(x_0) \sim \sum_k \omega_k \delta_{a_t^k}(\cdot)$. The mean forecast is thus $\mu^{LC} = \sum_k \omega_k a_t^k$. The covariance of the forecast is $\mathrm{cov}_{\omega_k}(a_t^k)$, the $\omega$-weighted empirical covariance of the successors.

  • The locally incremental operator (“LI operator” or simply “LI”) uses $x_0$, the analogs, and the successors to estimate $x_t$: $\Theta_t^{LI}(x_0) \sim \sum_k \omega_k \delta_{x_0 + a_t^k - a_0^k}(\cdot)$. The mean forecast is $\mu^{LI} = x_0 + \sum_k \omega_k (a_t^k - a_0^k)$. The covariance of the forecast is $\mathrm{cov}_{\omega_k}(a_t^k - a_0^k)$, the $\omega$-weighted empirical covariance of the increments.

  • The locally linear operator (“LL operator” or simply “LL”) performs a weighted linear regression between the analogs and the successors. The regression is applied between $a_0^k - \mu_0$ and the successors $a_t^k$, where $\mu_0 = \sum_k \omega_k a_0^k$. This gives a slope $S$, an intercept $c$, and residuals $\xi_k = a_t^k - S \cdot (a_0^k - \mu_0) - c$, so that $\Theta_t^{LL}(x_0) \sim \sum_k \omega_k \delta_{\mu^{LL} + \xi_k}(\cdot)$. The mean forecast is $\mu^{LL} = S \cdot (x_0 - \mu_0) + c$. The covariance of the forecast is $\mathrm{cov}_{\omega_k}(\xi_k)$, the $\omega$-weighted empirical covariance of the residuals.
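For concreteness, the three mean forecasts above can be sketched in a few lines of code. The following is a minimal one-dimensional illustration (an assumption for readability, not code from the paper): given analogs, successors, and normalized weights, it returns the LC, LI, and LL means, with the LL slope and intercept obtained by weighted least squares.

```python
def analog_means(x0, a0, at, w):
    """Mean forecasts of the LC, LI, and LL operators (scalar states).

    a0: analogs, at: their time successors, w: normalized weights (sum to 1).
    """
    mu0 = sum(wk * ak for wk, ak in zip(w, a0))   # weighted mean of analogs
    mut = sum(wk * sk for wk, sk in zip(w, at))   # weighted mean of successors
    mu_LC = mut                                   # locally constant
    mu_LI = x0 + mut - mu0                        # locally incremental
    # Weighted linear regression of at on (a0 - mu0): slope S, intercept c.
    S = sum(wk * (ak - mu0) * (sk - mut) for wk, ak, sk in zip(w, a0, at)) \
        / sum(wk * (ak - mu0) ** 2 for wk, ak in zip(w, a0))
    c = mut  # the predictor is centered, so the intercept is the weighted mean
    mu_LL = S * (x0 - mu0) + c                    # locally linear
    return mu_LC, mu_LI, mu_LL
```

When the true flow map is exactly affine, the LL mean is exact, while the LC and LI means carry an error that depends on $\mu_0 - x_0$, consistent with the analysis of section 4.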

The LC, LI, and LL analog forecasting operators are illustrated in Fig. 1. The variance of the LC is similar around t = 0 and at the final value of t. In contrast, the variance of the LI goes to 0 as t → 0, but at large times the LI estimator has a larger variance than the LC. The next sections provide information that helps to interpret this phenomenon. The LL is able to capture the dynamics, and therefore shows a smaller variance and better skill than the LC and LI at all times. This is because, in this example, nonlinear terms are negligible and the flow map of the analogs matches exactly the real system’s flow map.

Fig. 1.

Analog forecasting operators presented in section 2b. The flow map $\Phi_t(x_0)$ has a simple polynomial form. Analogs are drawn from a normal distribution centered on $x_0$ and follow the same model as the real state $x$. The same analogs and flow maps are used for the three operators: (a) LC, (b) LI, and (c) LL. Weights $\omega_k$ are computed using Gaussian kernels. The size of the kth triangle is proportional to $\omega_k$. The real initial and future states $x_0$ and $x_t$ are shown as full circles. The initial forecast distribution is given either by the analogs in (a) or by a Dirac delta at $x_0$ in (b) and (c). The final forecast distribution is given by the successors in (a), by the increments added to $x_0$ in (b), or by residuals added to a linear regression applied to $x_0$ in (c).

Citation: Journal of the Atmospheric Sciences 78, 7; 10.1175/JAS-D-20-0204.1

It is worth mentioning another kind of analog forecasting operator called “constructed analogs” (CA). It is a particular case of the LC operator where the weights $\omega_k^{CA}$ can take negative values and are chosen such that the mean of the analogs $\mu_0$ is as close as possible to the initial state: $\{\omega_k^{CA}\}_k = \arg\min_{\{\omega_k\}_k} |\sum_k \omega_k a_0^k - x_0|$. It was used by Van Den Dool (1994) in cases where using the analogs from the catalog directly did not yield satisfactory results. Later, Tippett and DelSole (2013) showed that CA are equivalent to the LL operator with constant weights.

In the following, and unless otherwise specified, it is assumed that the weights $\omega_k$ are positive, decreasing functions of the distance between $a_0^k$ and $x_0$, and that $\sum_k \omega_k = 1$. This is a common choice; for instance, Lguensat et al. (2017) set $\omega_k \propto \exp(-|a_0^k - x_0|^2 / 2l^2)$, where $l$ is the median of the analog-to-initial-state distances. This gives more weight to the analogs that are closest to the real initial state. On the same topic, see the work of McDermott and Wikle (2016). However, the present analysis of analog forecasting errors is not restricted to any particular choice of weights. In the following, we make approximations in the limit $a_0^k \to x_0$, so that the only hypothesis concerning the weights is that $\omega_k \approx 0$ if the kth analog is too far away from $x_0$ for the limit to be taken.
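The kernel weighting above can be coded directly. The following is a hypothetical scalar sketch (for vector states, replace the absolute value with a norm), with the bandwidth set to the median analog-to-state distance as in Lguensat et al. (2017):

```python
import math

def gaussian_weights(x0, analogs):
    """Normalized weights w_k proportional to exp(-|a0k - x0|^2 / (2 l^2)),
    where l is the median analog-to-initial-state distance."""
    d = [abs(a - x0) for a in analogs]
    s = sorted(d)
    m = len(s) // 2
    l = s[m] if len(s) % 2 == 1 else 0.5 * (s[m - 1] + s[m])  # median distance
    w = [math.exp(-dk ** 2 / (2 * l ** 2)) for dk in d]
    total = sum(w)
    return [wk / total for wk in w]  # normalize so the weights sum to 1
```

By construction, the weights sum to one, equidistant analogs receive equal weights, and weights decay quickly for analogs much farther than the median distance.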

3. Successor-to-future state distance

a. Notations and hypotheses

This work assumes that the evolution dynamics of the analogs are similar to those of the system of interest, and that the system is deterministic. This can be stated in differential-equation form:
$$\frac{dx}{dt} = f(x), \quad x_{t=0} = x_0, \tag{4a}$$
$$\forall k, \quad \frac{da^k}{dt} = f_a(a^k), \quad a^k_{t=0} = a_0^k, \tag{4b}$$
$$f_a = f + \delta \tilde{f}, \tag{4c}$$
or in an integrated form using flow maps:
$$x_t = \Phi_t(x_0), \tag{5a}$$
$$\forall k, \quad a_t^k = \Phi_{a,t}(a_0^k), \tag{5b}$$
$$\Phi_{a,t} = \Phi_t + \delta \tilde{\Phi}_t, \tag{5c}$$
where $\Phi_a$ is the flow map of the analogs, and $\tilde{\Phi}$ is the difference between the analog and real flow maps, normalized through the scalar value $\delta$ so that $\Phi$, $\Phi_a$, and $\tilde{\Phi}$ are of the same order of magnitude. In particular, in the following, the limit $\delta \to 0$ will be taken, equivalent to the limit $|\delta \tilde{\Phi}_t| \ll |\Phi_t|$. This will allow us to emphasize that, at first order in $\delta$, the contribution of the error in the analogs’ flow map (or “dynamical error”) is additive and linear in the small parameter $\delta$. The maps $f$, $f_a$, and $\tilde{f}$ are defined accordingly.
To account for observational errors in the catalog, we use an error-in-variables framework:
$$\forall k, \quad a_t^k - \epsilon_t^k = \Phi_{a,t}(a_0^k - \epsilon_0^k), \tag{5d}$$
where $\epsilon_0^k$ is the additive observational error of the analog, and $\epsilon_t^k$ is that of the successor. In the description using $f$, the observational noises would be added at times 0 and $t$ to a trajectory that follows the equation given by $f_a$. The noises do not perturb the trajectory, only its observation.

In these formulations, the fundamental hypotheses of analog forecasting are the continuity of $\Phi_t$ (or $f$) with respect to the phase-space variable $x$, the density of the catalog $\mathcal{C}$ (i.e., for all $k$, $a_0^k$ is close to $x_0$ for a given metric), and the adequacy of the analogs’ dynamics (i.e., $\delta$ is small, $\Phi_a \approx \Phi$).

One must assume that the catalog yields enough samples to be dense in phase space (in the topological sense), meaning that no region of phase space is left unexplored. However, the minimum catalog length required to reach a given density grows exponentially with the dimension of the system (Van Den Dool 1994; Nicolis 1998), making the use of analogs in high dimension challenging. Also, some variables may exhibit extreme events that belong to regions of phase space that are rarely visited (distribution tails), even when the phase-space dimension is low. Therefore, one must be especially careful when using analogs to forecast extreme events.
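This curse of dimensionality is easy to reproduce numerically. The sketch below (an illustration, not from the paper) measures the mean nearest-neighbor distance among uniformly sampled points: for a fixed catalog size, the distance to the best analog degrades rapidly as the dimension grows.

```python
import random

def mean_nn_distance(dim, n_points=200, seed=0):
    """Average nearest-neighbor distance among n_points uniform samples in [0,1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(pts):
        # squared Euclidean distance to the closest other point
        nearest_sq = min(sum((u - v) ** 2 for u, v in zip(p, q))
                         for j, q in enumerate(pts) if j != i)
        total += nearest_sq ** 0.5
    return total / n_points
```

With 200 points, the mean nearest-neighbor distance grows markedly between dimension 2 and dimension 8, so keeping the analog quality constant as dimension increases would require exponentially more data.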

In practice, the fact that the analogs’ dynamics are close to the real dynamics of the system (i.e., δ → 0) is only partially achieved. For instance, climate change may induce some modifications in atmospheric dynamics between the preindustrial era and the twenty-first century, so that the hypothesis $\Phi_a \approx \Phi$ might not be valid if one uses analogs of wind velocity from the 1960s to produce wind velocity forecasts in 2020. Also, if one uses analogs from numerical model output, discretization and physical approximations imply that $\Phi_a \neq \Phi$.

b. When analogs work: Taylor expansions of the dynamics

1) Distance between successor and real state

Assuming different levels of smoothness of the flow maps and using Taylor expansions, one can estimate the difference between the real future state $x_t$ and any given successor $a_t^k$ at leading order:
$$\forall k, \quad a_t^k - x_t = \delta \tilde{\Phi}_t(x_0) + [\nabla \Phi_t|_{x_0}] \cdot (a_0^k - x_0) + O(|a_0^k - x_0|^2, \delta |a_0^k - x_0|), \tag{6a}$$
where $\nabla \Phi_t|_{x_0}$ is the Jacobian matrix (the matrix of partial derivatives in phase space) of $\Phi_t$ at $x_0$, “$\cdot$” is the matrix–vector multiplication, and $O(|a_0^k - x_0|^2, \delta |a_0^k - x_0|)$ represents higher-order terms. The notation $\alpha = O(\beta)$ means that the absolute ratio of $\alpha$ over $\beta$ is bounded as $\beta$ goes toward zero. We also use the notation $O(\alpha, \beta) = O(\alpha) + O(\beta)$.
Neglecting higher-order terms and simplifying notations, Eq. (6a) can be rewritten
$$a_t^k - x_t \approx \delta \tilde{\Phi}_t + \nabla \Phi_t \cdot (a_0^k - x_0), \tag{6b}$$
where the evaluation of $\delta \tilde{\Phi}_t$ and $\nabla \Phi_t$ at $x_0$ is implicit. In the presence of observational noise as in Eq. (5d), two terms are added:
$$a_t^k - x_t \approx \delta \tilde{\Phi}_t + \nabla \Phi_t \cdot (a_0^k - x_0) + \epsilon_t^k - \nabla \Phi_t \cdot \epsilon_0^k. \tag{6c}$$

The leading-order difference terms explicitly described on the right-hand side of Eq. (6b) come from two sources. The first is the difference between the analog and true flow maps at point $x_0$, which is independent of $a_0^k$. The second is the mismatch in the initial condition, left-multiplied by the Jacobian matrix of the true flow map at point $x_0$.

Equation (6b) states that, at first order, these two error terms are additive. This is not true at higher orders. Higher-order terms include the bilinear product of $a_0^k - x_0$ with the matrix of second derivatives of $\Phi_t$ (the Hessian), and the product of the Jacobian of $\tilde{\Phi}_t$ at $x_0$ with $a_0^k - x_0$. Note also that, although the ratio of the higher-order terms to $|a_0^k - x_0|^2$ is bounded, these terms can still be of large magnitude. In particular, at large lead times, higher-order terms can become dominant due to the chaotic divergence associated with positive Lyapunov exponents. A detailed example will be given in Fig. 7.

Figure 2 shows applications of Eq. (6b) to the three-variable system of Lorenz (1963), hereafter L63. We focus on Eq. (6b) rather than on the full Eq. (6c), because $\epsilon_t^k$ plays the same role as $\delta \tilde{\Phi}_t$ and $\epsilon_0^k$ plays the same role as $a_0^k - x_0$. A real trajectory is compared with two analog trajectories. The L63 system is solved numerically using a fourth-order Runge–Kutta finite-difference scheme, with a numerical integration time step Δt = 0.01 in nondimensional time. For notation details, see Eq. (A1) in appendix A. The real trajectory has parameters σ = 10, ρ = 28, and β = 8/3, while the σ parameter of the analog dynamics is slightly perturbed, with σ̃ = 9 = 0.9σ. The matrices $\delta \tilde{\Phi}_t$ and $\nabla \Phi_t$ are estimated numerically using the formulas given below and the time step Δt = 0.01. The 10th analog stays close enough to the real trajectory at all times (Fig. 2a); therefore, Eq. (6b) gives a satisfactory approximation of $|a_t^{10} - x_t|$ (Fig. 2b). The 100th analog starts to be too far from the real trajectory around t ≈ 0.7 (Figs. 2a,b), and Eq. (6b) provides a poor approximation of $|a_t^{100} - x_t|$ (Fig. 2b).

Fig. 2.

Illustrating Eqs. (6b) and (6e) on the three-variable L63 system. (a) A real trajectory from $x_0$ to $x_t$ and two analog trajectories, namely the 10th-best analog $a_0^{10}$ to $a_t^{10}$ and the 100th-best analog $a_0^{100}$ to $a_t^{100}$. The catalog is shown in white. (b) Comparison of the exact value of the norm of $a_t^k - x_t$ (full lines) and the sum of the two terms on the right-hand side of Eq. (6b) (dashed lines). (c) Contributions of the first term (black squares) and the second term (brown circles and blue triangles) of the right-hand side of Eq. (6b), projected on the first coordinate of the L63 system. (d) Contributions of the first term (black squares) and the second term (brown circles and blue triangles) of the right-hand side of Eq. (6e), projected on the first coordinate of the L63 system.


The different right-hand-side terms of Eq. (6b) are projected on the first axis of phase space and displayed in Fig. 2c. The “flow map” term $\delta \tilde{\Phi}_t$ is the same for both analogs, but the “initial condition” term $\nabla \Phi_t \cdot (a_0^k - x_0)$ is much larger for the 100th analog; one can also see that these terms are proportional, here negatively correlated.

Further assuming that first-order terms in $t$ are dominant, one can express Eq. (6a) in the alternative formulation
$$\forall k, \quad a_t^k - x_t = t \delta \tilde{f}(x_0) + [\mathbf{I} + t \nabla f|_{x_0}] \cdot (a_0^k - x_0) + O(t^2, |a_0^k - x_0|^2, \delta |a_0^k - x_0|), \tag{6d}$$
where $\mathbf{I}$ is the identity matrix. Using lighter notations, this becomes
$$a_t^k - x_t \approx t \delta \tilde{f} + [\mathbf{I} + t \nabla f] \cdot (a_0^k - x_0), \tag{6e}$$
where the evaluation of $\delta \tilde{f}$ and $\nabla f$ at $x_0$ is implicit. This last formulation is analogous to an Euler scheme used in finite-difference numerical methods for solving differential equations. It describes the behavior of analog forecasting at small times, before the divergence associated with positive Lyapunov exponents becomes dominant. In Fig. 2d, one can see that the right-hand-side terms of Eq. (6e) approximate the terms of Eq. (6b) only for $t \lesssim 0.1$. This also explains the behavior of some analog forecasting operators in Fig. 7 (introduced later in the paper) for $t \lesssim 0.1$.

2) Link between the two formulations, $\nabla f$ and $\nabla \Phi_t$

Equation (6e) is a first-order expansion in time of Eq. (6b). The fundamental resolvent matrix $M(t, t')$ gives a more complete relationship between the two representations: $M(t, t')$ is the solution of the time-varying linear system $dM(t, t')/dt = \nabla f|_{x_t} \cdot M(t, t')$ with $M(t', t') = \mathbf{I}$. The fundamental resolvent matrix can be approximated numerically as $M(t, t') \approx \exp(\Delta t \, \nabla f_t) \exp(\Delta t \, \nabla f_{t - \Delta t}) \cdots \exp(\Delta t \, \nabla f_{t'})$, with numerical time step $\Delta t$ and using the short notation $\nabla f_t := \nabla f|_{x_t}$.

We have
$$\delta \tilde{\Phi}_t(x_0) \approx \delta \int_0^t M(t, u) \cdot \tilde{f}(x_u) \, du, \tag{7a}$$
$$\nabla \Phi_t|_{x_0} = M(t, 0), \tag{7b}$$
where the “≈” sign indicates that Eq. (7a) is valid only at first order in $\delta$. This first order is enough to compute the right-hand-side terms of Eq. (6b), which is also valid at first order in $\delta$.
From Eq. (7b), one can derive Taylor expansions relating $\nabla f$ and $\nabla \Phi_t$, such as
$$\nabla \Phi_t = \mathbf{I} + t \, \nabla f_0 + \frac{t^2}{2} \left[ (\nabla f_0)^2 + \frac{d}{dt} \nabla f_0 \right] + O(t^3), \tag{8}$$
where $\nabla \Phi_t$ is implicitly evaluated at $x_0$. The short notation $\nabla f_0$ is used for $\nabla f|_{x_0}$, and $(d/dt) \nabla f_0$ is the time derivative, along the trajectory $x_t$, of the Jacobian of $f$ at $t = 0$: $(d/dt) \nabla f|_0 := \lim_{t \to 0} (\nabla f_t - \nabla f_0)/t$. At first order in $t$, one recovers the result expressed in Eq. (6e).

4. Consequences for analog forecasting operators

a. Mean error of analog forecasting operators

By multiplying Eqs. (6a) and (6d) by $\omega_k$ and summing over $k$, one can compare the distances from $x_t$ to the averages $\mu^{LC}$, $\mu^{LI}$, and $\mu^{LL}$ of the different analog forecasting operators of section 2b. Those averages depend on $t$, although only implicitly in the notation. Let $\mu_0 = \sum_k \omega_k a_0^k$ be the weighted mean of the analogs; we then have the following expressions.

LC mean error:
$$\mu^{LC} - x_t = \delta \tilde{\Phi}_t(x_0) + [\nabla \Phi_t|_{x_0}] \cdot (\mu_0 - x_0) + O\Big(\textstyle\sum_k \omega_k |a_0^k - x_0|^2, \delta \sum_k \omega_k |a_0^k - x_0|\Big), \tag{9a}$$
$$\mu^{LC} - x_t = t \delta \tilde{f}(x_0) + [\mathbf{I} + t \nabla f|_{x_0}] \cdot (\mu_0 - x_0) + O\Big(t^2, \textstyle\sum_k \omega_k |a_0^k - x_0|^2, \delta \sum_k \omega_k |a_0^k - x_0|\Big). \tag{9b}$$
Using simpler notations with implicit evaluation at $x_0$, this gives
$$\mu^{LC} - x_t \approx \delta \tilde{\Phi}_t + \nabla \Phi_t \cdot (\mu_0 - x_0), \tag{9c}$$
$$\mu^{LC} - x_t \approx t \delta \tilde{f} + [\mathbf{I} + t \nabla f] \cdot (\mu_0 - x_0), \tag{9d}$$
and, in the presence of noise:
$$\mu^{LC} - x_t \approx \delta \tilde{\Phi}_t + \nabla \Phi_t \cdot (\mu_0 - x_0) + \textstyle\sum_k \omega_k \epsilon_t^k - \nabla \Phi_t \cdot \sum_k \omega_k \epsilon_0^k. \tag{9e}$$
LI mean error:
$$\mu^{LI} - x_t = \delta \tilde{\Phi}_t(x_0) + [\nabla \Phi_t|_{x_0} - \mathbf{I}] \cdot (\mu_0 - x_0) + O\Big(\textstyle\sum_k \omega_k |a_0^k - x_0|^2, \delta \sum_k \omega_k |a_0^k - x_0|\Big), \tag{10a}$$
$$\mu^{LI} - x_t = t \delta \tilde{f}(x_0) + [t \nabla f|_{x_0}] \cdot (\mu_0 - x_0) + O\Big(t^2, \textstyle\sum_k \omega_k |a_0^k - x_0|^2, \delta \sum_k \omega_k |a_0^k - x_0|\Big). \tag{10b}$$
Using simpler notations with implicit evaluation at $x_0$, this gives
$$\mu^{LI} - x_t \approx \delta \tilde{\Phi}_t + [\nabla \Phi_t - \mathbf{I}] \cdot (\mu_0 - x_0), \tag{10c}$$
$$\mu^{LI} - x_t \approx t \delta \tilde{f} + t \nabla f \cdot (\mu_0 - x_0). \tag{10d}$$
And, in the presence of noise,
$$\mu^{LI} - x_t \approx \delta \tilde{\Phi}_t + [\nabla \Phi_t - \mathbf{I}] \cdot (\mu_0 - x_0) + \textstyle\sum_k \omega_k \epsilon_t^k - \nabla \Phi_t \cdot \sum_k \omega_k \epsilon_0^k. \tag{10e}$$

The errors of the LC and LI operators are both affected by the difference between the analog and real flow maps. This source of error cannot be circumvented unless some information about $\delta \tilde{\Phi}$ is available. The other first-order error term is linear in $\mu_0 - x_0$, but when $t \to 0$ this term is of order $t$ in the LI case. Thus, for small lead times (i.e., when the first-order time approximations are valid), as both $t \to 0$ and $\mu_0 \to x_0$ (dense catalog), the mean of the LI provides a better estimate of $x_t$. This is why this operator is qualified by Lguensat et al. (2017) as more “physically sound” than the LC: the LI takes advantage of the fact that $\lim_{t \to 0} \nabla \Phi_t = \mathbf{I}$, just as any finite-difference numerical scheme does. Formulas similar to Eqs. (9) and (10) were used by Platzer et al. (2019) to predict analog forecasting errors with the LC and LI operators on the famous three-variable L63 system, with δ = 0.

Another interesting property of the LI is that it can give estimates of $x_t$ outside the convex hull of the catalog. This is related to what is called “novelty creation” in the machine-learning terminology. Such a property is interesting because it could help generalize beyond what has already been observed: since observations never span the whole phase space, it is useful to be able to produce forecasts beyond past observations. However, this can also generate inconsistent forecasts. Indeed, if t is not small enough, the LI operator can produce forecasts with a large error due to the $-\mathbf{I}$ term in Eq. (10a). In Fig. 1, one can see that the LI has a larger variance than the LC at large times.

In Eqs. (9e) and (10e), the noises ε_0^k and ε_t^k are averaged with weights ω_k. This averaging procedure gives new centered variables Σ_k ω_k ε_0^k and Σ_k ω_k ε_t^k. If the noises have finite variance σ², the variance of the averaged noises is ≈σ²/K [to be more precise, the variance is divided by ≈K/2 if one uses the procedure of Lguensat et al. (2017)]. This shows that there is a bias–variance trade-off: raising the value of K lowers the variance but raises the bias of the estimator of x_t.
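As a minimal numerical illustration of this variance reduction (a sketch under assumed settings: i.i.d. standard Gaussian noises and, for simplicity, uniform weights ω_k = 1/K):

```python
import numpy as np

rng = np.random.default_rng(0)
K, sigma, n_trials = 50, 1.0, 20000

# K i.i.d. noises per trial, averaged with uniform weights omega_k = 1/K.
eps = rng.normal(0.0, sigma, size=(n_trials, K))
averaged = eps.mean(axis=1)          # plays the role of sum_k omega_k * eps^k

# The empirical variance of the averaged noise is close to sigma^2 / K.
print(averaged.var())                # ~ sigma**2 / K = 0.02
```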

Equation (9) is also valid for the constructed analogs (CA) introduced in section 4b, where the weights {ω_k^CA} are chosen so that |Σ_k ω_k^CA a_0^k − x_0| is as small as possible. This means that the (μ_0 − x_0)-linear term of Eq. (9) is also small. As mentioned earlier, Tippett and DelSole (2013) showed that this strategy is equivalent to making a linear regression. This explains why the (μ_0 − x_0)-linear term is absent from Eqs. (11a) and (11b).
LL mean error:
μ_LL − x_t = δΦ̃^t(x_0) + O(Σ_k ω_k |a_0^k − x_0|², δ Σ_k ω_k |a_0^k − x_0|),   (11a)
μ_LL − x_t = t δf̃(x_0) + O(t², Σ_k ω_k |a_0^k − x_0|², δ Σ_k ω_k |a_0^k − x_0|).   (11b)

Another way to understand why the (μ_0 − x_0)-linear term should disappear when using the LL is to see that the LL estimates the local Jacobian of the flow map. Indeed, the linear regression between the analogs and the successors gives an estimate of ∇Φ^t|_{x_0}, with an estimation error that is at least of order O(|μ_0 − x_0|, δ). Section 4b gives a detailed argument to support this claim and investigates its limitations. The estimation error between the linear regression matrix and the Jacobian thus adds higher-order error terms to the right-hand side of Eqs. (11a) and (11b), but these are already included in the O(Σ_k ω_k |a_0^k − x_0|², δ Σ_k ω_k |a_0^k − x_0|).

The modified form of Eqs. (11a) and (11b) under additive observational noise is not straightforward. The linear regression of the LL operator is designed for the special case of ε_t^k ~ N(0, σ²I) and ε_0^k = 0. The errors-in-variables model, where ε_t^k, ε_0^k ~ N(0, σ²I), is a more realistic representation of measurement errors. Applying regular linear least squares regression in this case is known to induce a bias in the estimated linear coefficients. The bias is toward zero when Φ^t is one-dimensional and linear, but it is more complicated when Φ^t is nonlinear (Griliches and Ringstad 1970) and multidimensional. Total least squares (Markovsky and Van Huffel 2007) could be used to account for this fact. However, for the sake of simplicity, the LL operator presented in section 3b is applied here. A numerical example will be described in sections 4b(5) and 4c and in Fig. 7.

We now make the explicit link between the three operators. Recall the notations of section 2b: the LL operator finds slope S and intercept c such that, for all k, a_t^k = S(a_0^k − μ_0) + c + ξ^k, using weighted least squares estimates. This gives c = Σ_k ω_k a_t^k = μ_LC; thus we have μ_LL = μ_LC + S(x_0 − μ_0), and the following relations hold:
μ_LC = μ_LL|_{S=0},
μ_LI = μ_LL|_{S=I},
such that the LC and LI operators are particular cases of the LL operator. We also have lim_{t→0} S = I, because, for all k, lim_{t→0} a_t^k = a_0^k. Thus, the mean forecasts of the LL and LI operators are equivalent as t approaches 0: μ_LL ~ μ_LI as t → 0.
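These relations can be sketched in code (our illustration, not the authors' implementation; the hypothetical helper `mean_forecasts` returns μ_LC, μ_LI, and μ_LL, with the LL slope S obtained by weighted least squares on centered analogs):

```python
import numpy as np

def mean_forecasts(x0, analogs, successors, weights):
    """Return (mu_LC, mu_LI, mu_LL) for a single forecast.

    analogs, successors: (K, n) arrays of a_0^k and a_t^k; weights: (K,).
    """
    w = weights / weights.sum()
    mu0 = w @ analogs                        # weighted mean of the analogs
    mu_lc = w @ successors                   # LC: weighted mean of successors
    mu_li = x0 + w @ (successors - analogs)  # LI: add mean increment to x0
    # LL: weighted least squares fit a_t^k ~ S (a_0^k - mu0) + c, with c = mu_LC.
    X = analogs - mu0
    W = np.diag(w)
    S = np.linalg.solve(X.T @ W @ X, X.T @ W @ successors).T
    mu_ll = mu_lc + S @ (x0 - mu0)
    return mu_lc, mu_li, mu_ll
```

Setting S = 0 (or S = I) in the last expression recovers μ_LC (or μ_LI), mirroring the relations above.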

This analysis shows that, for small lead times (i.e., when the time linear approximation is still valid) and in terms of mean forecast error, the LL operator is more accurate than the LI, and the latter is more accurate than the LC. These findings are in agreement with the numerical experiments of Lguensat et al. (2017). A detailed description of the advantages of each analog forecasting operator for different lead times is given in section 4c, in a numerical experiment of the L63 system.

We now investigate the link between the local Jacobian of the flow map, ∇Φ^t|_{x_0}, and the linear regression matrix S from the LL operator.

b. Ability of analogs to estimate local Jacobians

If analogs can estimate the Jacobian of the real system, it means that analog forecasting provides a local approximation of the real dynamics, which supports the relevance of analogs for short-range forecasts. Furthermore, an estimate of the local Jacobian can be useful in applications such as the extended Kalman filter (see Jazwinski 1970 for a detailed introduction), where the Jacobian allows us to estimate the evolution of the forecast covariance.

1) Derivation of the first-order error in Jacobian estimation

It is possible to find an exact expression for the first-order error term in the estimation of the local Jacobian. Let us start with the case of perfect agreement between the real and analog flow maps: Φ_a ≡ Φ, or δ = 0. Then, assume that in the neighborhood of x_0 where the analogs lie, the flow Φ^t(·) can be approximated by a quadratic function in phase space. We then have
∀k,  a_t^k = ∇Φ^t(a_0^k − μ_0) + (1/2)(a_0^k − μ_0)∇²Φ^t(a_0^k − μ_0)^T + Cst,   (13)
where “Cst” is a constant (independent of k), and the Jacobian and Hessian of Φ^t are implicitly evaluated at x_0 (see appendix B for the notation of products of vectors and Hessians). In the next equations, the t superscript is dropped to simplify notations. Let X be the matrix of the analogs minus their mean, so that the kth row of X is a_0^k − μ_0. Similarly, let Y be the matrix of the successors, with the kth row of Y being a_t^k. Equation (13) thus translates into Y = X∇Φ^T + (1/2)X∇²Φ X^T, omitting the constant.
Now let Ω = diag(ω_1, …, ω_K) be the K × K diagonal matrix of the weights given to each analog in the regression. Then S is the weighted least squares solution of the linear regression, S = (X^TΩX)^{−1}X^TΩY. With a bit of rewriting, this finally gives
S − ∇Φ = (X^TΩX)^{−1}X^TΩ{X∇²Φ[(1/2)X + (μ_0 − x_0)^T ⊗ J_{K,1}]^T},   (14)
where ⊗ is the Kronecker matrix product and JK,1 is the column vector with K elements all equal to 1.

Equation (14) tells us that S is close to the Jacobian at x_0, up to a term that is linear in the distances between the mean of the analogs μ_0 and the analogs a_0^k, and another term that is linear in the distance between μ_0 and x_0. These linear error terms depend on the second-order phase-space derivatives of Φ at the point x_0 (the Hessian of Φ).

Conducting the same derivation but relaxing the hypothesis δ = 0, one finds the same result with an additional linear error term involving the Jacobian of Φ̃. This analysis allows us to state that S = ∇Φ^t + O(|μ_0 − x_0|, δ), provided that the distance between the analogs and their mean is of the same order as the distance between their mean and x_0.
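This scaling can be checked numerically; the sketch below uses a toy quadratic map standing in for Φ^t (δ = 0, uniform weights; names and settings are ours), drawing analogs at distance ~h from x_0 and comparing the regression matrix with the exact Jacobian:

```python
import numpy as np

def flow(x):
    """Toy quadratic stand-in for the flow map Phi^t."""
    return np.array([x[0] + 0.1 * x[0] * x[1], x[1] - 0.1 * x[0] ** 2])

def jacobian(x):
    """Exact Jacobian of the toy flow at x."""
    return np.array([[1.0 + 0.1 * x[1], 0.1 * x[0]],
                     [-0.2 * x[0], 1.0]])

def regression_slope(x0, h, K=20, seed=0):
    """Least squares slope S computed from K analogs at distance ~h of x0."""
    rng = np.random.default_rng(seed)
    analogs = x0 + h * rng.normal(size=(K, 2))
    successors = np.array([flow(a) for a in analogs])
    X = analogs - analogs.mean(axis=0)
    Y = successors - successors.mean(axis=0)
    return np.linalg.solve(X.T @ X, X.T @ Y).T   # uniform weights

x0 = np.array([0.5, -0.3])
for h in (1e-1, 1e-2, 1e-3):
    err = np.abs(regression_slope(x0, h) - jacobian(x0)).max()
    print(h, err)   # the error shrinks roughly linearly with h
```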

However, the claim that the linear regression matrix S is able to approximate the Jacobian ∇Φt must be tempered by several facts. To illustrate these, the regular LL analog forecasting operator will now be compared with two other strategies aimed at solving dimensionality issues.

2) Strategies for linear regression in high dimension

Dimensionality can make analog forecasting difficult, especially when using the locally linear analog forecasting operator. Here are recalled two strategies that can be used to circumvent this issue.

The first approach uses empirical orthogonal functions (EOFs, also called principal component analysis) at every forecast step. The dimension is reduced by keeping only the first n_eof EOFs of the set of analogs (a_0^k)_{k∈[1,K]}, or, equivalently, only the first n_eof principal components of the matrix X^TΩX.

Reducing dimension using EOFs:

  • Find analogs (a_0^k)_{k∈[1,K]} of the initial state x_0.

  • Compute the n EOFs of the weighted set of analogs (a_0^k)_{k∈[1,K]}.

  • Keep the first n_eof EOFs, up to 95% of the total variance.

  • Project x_0, (a_0^k)_{k∈[1,K]}, and (a_t^k)_{k∈[1,K]} on the first n_eof EOFs.

  • Perform LL analog forecasting in this projected space.
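The steps above can be sketched with a plain SVD (one possible implementation, with simplifications: unweighted EOFs and a simplified variance threshold; function names are ours):

```python
import numpy as np

def eof_basis(analogs, var_kept=0.95):
    """n x n_eof matrix of the leading EOFs of the (centered) analog set."""
    X = analogs - analogs.mean(axis=0)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)  # EOFs = rows of Vt
    frac = s ** 2 / (s ** 2).sum()
    n_eof = int(np.searchsorted(np.cumsum(frac), var_kept) + 1)
    return Vt[:n_eof].T

def ll_forecast_eof(x0, analogs, successors, var_kept=0.95):
    """LL analog forecast performed in the space of the leading EOFs."""
    E = eof_basis(analogs, var_kept)                  # n x n_eof projector
    x0_r, A_r, Y_r = x0 @ E, analogs @ E, successors @ E
    X = A_r - A_r.mean(axis=0)
    S = np.linalg.solve(X.T @ X, X.T @ (Y_r - Y_r.mean(axis=0))).T
    return Y_r.mean(axis=0) + S @ (x0_r - A_r.mean(axis=0))
```

The forecast is returned in the reduced (EOF) space; mapping it back with E recovers only the components spanned by the retained EOFs.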

The second strategy is to perform n analog forecasts, one for each coordinate of the phase space P, and to assume that the future of a given coordinate only depends on the initial values of the neighboring coordinates and not on the whole initial vector x_0. In the model of Lorenz (1996, hereafter L96), Eq. (A2) in the appendix motivates the choice of keeping only the initial coordinates {i − 2, i − 1, i, i + 1, i + 2} to estimate the ith future coordinate. Thus we keep only n_trunc = 5 initial coordinates, and the LL operator performs n linear regressions with 5 coefficients at each forecast. By combining the results of those linear regressions, one finds an n × n matrix that is sparse by construction: all elements more than two cells away from the diagonal are equal to zero. This was introduced in Lguensat et al. (2017) as “local analogs.” In the present paper this strategy will rather be termed “coordinate-by-coordinate” analog forecasting, a choice made to avoid confusion between locality in phase space and locality in physical space.

Coordinate-by-coordinate forecast:

for i from 1 to n, forecast the ith future coordinate x_{t,i}:

  • Condition the forecast Θ_{LL,i}^t on a few initial coordinates around x_{0,i}:
    Θ_{LL,i}^t(x_0) = Θ_{LL,i}^t(x_{0,i−2}, x_{0,i−1}, x_{0,i}, x_{0,i+1}, x_{0,i+2}).
  • Find analogs of the truncated initial vector (x_{0,i−2}, x_{0,i−1}, x_{0,i}, x_{0,i+1}, x_{0,i+2}).

  • Perform LL analog forecasting Θ_{LL,i}^t.

  • Store the coefficients of the linear regression (S_{i,i−2}, S_{i,i−1}, S_{i,i}, S_{i,i+1}, S_{i,i+2}).

  • Aggregate the coefficients into the n × n matrix S.
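This loop can be sketched as follows (our illustration, assuming cyclic indexing as in the L96 geometry and uniform weights):

```python
import numpy as np

def cbc_slope(analogs, successors, offsets=(-2, -1, 0, 1, 2)):
    """Coordinate-by-coordinate regression: one small LL fit per coordinate i.

    Returns an n x n matrix S that is zero by construction more than two
    cells away from the diagonal (cyclic indexing).
    """
    K, n = analogs.shape
    S = np.zeros((n, n))
    for i in range(n):
        cols = [(i + o) % n for o in offsets]           # truncated initial coords
        X = analogs[:, cols] - analogs[:, cols].mean(axis=0)
        y = successors[:, i] - successors[:, i].mean()
        S[i, cols] = np.linalg.solve(X.T @ X, X.T @ y)  # n_trunc = 5 coefficients
    return S
```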

There are other possibilities than EOFs or coordinate-by-coordinate forecasting to reduce dimensionality (such as dynamic mode decomposition; Schmid 2010). However, they often require prior knowledge of the system itself, so that a use on one particular set of data might not be transposable to another set without tests that are beyond the scope of this paper, as stated by the “no free lunch” theorem (Wolpert 1996). Thus, we prefer to keep an EOF analysis, as the variables we consider have roughly Gaussian distributions, with no particularly heavy tails. The coordinate-by-coordinate forecast is justified by the nature of the L96 dynamical equations and the experiments of Lguensat et al. (2017). However, in a real application (e.g., over a large spatial domain), the coordinate-by-coordinate procedure has to be carefully tuned: the number of coordinates taken into account must be of the order of the spatial correlation length.

The next section investigates limitations to the claim that the matrix S from the LL operator is able to approximate the Jacobian ∇Φt, and studies the impact of dimension reduction techniques on this Jacobian estimation.

3) Effect of the number of analogs and the phase-space dimension

First, to be able to compute S, one must have enough analogs to perform the inversion of the matrix X^TΩX, where X is the matrix of the analogs and Ω the diagonal matrix of the weights. This cannot be done unless K, the number of analogs used for the forecast, is greater than or equal to n, the phase-space dimension. Using the EOF or coordinate-by-coordinate strategies from the previous section, one can reduce the dimension to n_eof or n_trunc, needing only to satisfy K ≥ n_eof or K ≥ n_trunc.

To illustrate the practical consequences of these issues, numerical simulations of the L96 system were performed with n = 8. The L96 is a well-known chaotic dynamical system with a flexible dimension, which allows us to set the latter to a convenient value. A dimension of 8 is sufficiently high to reveal the effects of dimensionality, and low enough to allow us to display full 8 × 8 matrices. The governing equations were solved using a fourth-order Runge–Kutta numerical scheme with an integration time step Δt = 0.05. A catalog was built from one long trajectory (10⁴ nondimensional times) using the real equations (δ = 0). Then, analog forecasting was performed at lead time 0.05, using the LL operator on 2 × 10⁴ test points (10³ nondimensional times) taken from another trajectory on the attractor (independent of the catalog). Setting the number of analogs to the limiting case K = 9 implies that there are just enough analogs to perform the linear regression (plus one extra analog). Even though n = 8 is not a very large dimension, if one is provided with only 9 good analogs, one must consider dimension reduction. Regular LL analog forecasting was compared with analog forecasting combined with EOFs (keeping the EOFs up to 95% variance) and with coordinate-by-coordinate analog forecasting (with n_trunc = 5).
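For reference, the L96 integration and catalog construction can be sketched as below (a simplified setup with assumed forcing F = 8 and a shorter catalog than the paper's; function names are ours):

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz-96 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """One fourth-order Runge-Kutta step of the L96 system."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def make_catalog(n=8, n_steps=2000, dt=0.05, F=8.0, spinup=500):
    """One trajectory on the attractor; catalog = (states, successors at lead dt)."""
    x = F + 0.01 * np.random.default_rng(0).normal(size=n)
    traj = []
    for i in range(spinup + n_steps + 1):
        x = rk4_step(x, dt, F)
        if i >= spinup:                # discard the initial transient
            traj.append(x)
    traj = np.array(traj)
    return traj[:-1], traj[1:]
```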

For a comparison with the actual atmospheric circulation, we can consider the study of Faranda et al. (2017). The authors estimate the instantaneous dimension of sea level pressure fields spanning the North Atlantic and Europe (22.5°–70°N, 80°W–50°E), with various horizontal resolutions (0.75°, 1.5°, and 2.5°) from reanalysis data. The average dimension is close to 13, with variations between 5 and 17 (25th and 75th percentiles). For comparison with our L96 example, assume that one searches for analogs of sea level pressure fields over the North Atlantic, after projection on a subspace of dimension 15. In such a case, the example that we study in this section corresponds to the situation where only 14 good analogs can be found.

To compare the matrix S from the linear regression between the analogs and successors with the Jacobian matrix ∇Φ, we use the RMSE, which we define as
RMSE(S − ∇Φ) = [(1/N_coeff) Σ_{i,j} (S_{i,j} − ∇Φ_{i,j})²]^{1/2},
where N_coeff is the number of coefficients of the matrix. For instance, if the phase-space dimension is n, then N_coeff = n². Thus, our definition of the RMSE between matrices is proportional to the Frobenius norm of the difference. Tests were performed using the operator norm instead of the RMSE, giving similar results.
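This definition reads, in a one-line sketch:

```python
import numpy as np

def matrix_rmse(S, J):
    """RMSE between two matrices: Frobenius norm of the difference,
    divided by the square root of the number of coefficients."""
    return np.linalg.norm(S - J) / np.sqrt(S.size)
```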

The EOF strategy ensures that the linear regression can be performed, as it projects the phase space P onto the EOFs that maximize the variance of the set of analogs. Thus the rank of the set of analogs is likely to be equal to n_eof in this reduced space. However, the EOF strategy necessarily misses some of the components of the full n × n Jacobian matrix ∇Φ^t, as it gives only the estimation of an n_eof × n_eof matrix. The coordinate-by-coordinate method also ensures that the linear regression can be performed as long as n_trunc is low enough, but it also misses some of the elements of the Jacobian matrix of the flow map. Indeed, even though the coefficients of ∇f are zero more than two cells away from the diagonal, this is not the case for ∇Φ^t. Recall that, at second order in time, ∇Φ^t = I + t∇f + (t²/2)[(∇f)² + (d/dt)∇f]. Thus, some coefficients of order t² will not be captured by the linear regression matrix S when using coordinate-by-coordinate analog forecasting with n_trunc = 5.

The linear regression matrix S is then compared with ∇Φ^t for the three methods. The real value of ∇Φ^t is estimated with the second-order time expansion of Eq. (8), which can be computed directly from the model Eq. (A2). An example is shown in Fig. 3. In this case, regular analog forecasting misses the Jacobian, with an RMSE of 0.317, because the rank of the set of analogs is too low and X^TΩX is thus not invertible. Analog forecasting combined with EOFs gives a better result, as it circumvents this inversion problem, with a total RMSE between S and ∇Φ^t of 0.151. Coordinate-by-coordinate analog forecasting gives the best solution in this case, with an RMSE of 0.053. Note that many coefficients of the matrix S are set to zero by construction when using the coordinate-by-coordinate method.

Fig. 3.

Flow map Jacobian matrix estimation with the model of Lorenz (1996), without observational noise. Forecast lead time is t = 0.05 Lorenz time, catalog length is 10⁴ Lorenz times, and phase-space dimension is n = 8. K = 9 analogs are used for the forecast, with Gaussian kernels for the weights ω_k and shape parameter λ set to the median of analog-to-state distances |a_0^k − x_0|. (a) Jacobian matrix ∇Φ^t|_{x_0}. (b) Linear regression matrix S using regular analogs. (c) Difference S − ∇Φ^t|_{x_0} with regular analogs, also giving the value of the RMSE below the plot. (d),(e) As in (b) and (c), but the linear regression is performed in a lower-dimensional subspace spanned by the first EOFs of the set of the K = 9 analogs. (f),(g) As in (d) and (e), but the linear regression is performed coordinate by coordinate, assuming that the coefficients are zero more than two cells away from the diagonal.

Citation: Journal of the Atmospheric Sciences 78, 7; 10.1175/JAS-D-20-0204.1

Then, Fig. 4 shows empirical probability density functions of the RMSE of S − ∇Φ^t for each of the three methods. The low number of analogs implies large fluctuations of the regular LL analog forecasts, as the rank of the set of analogs used can be below or close to the phase-space dimension, making the inversion of X^TΩX hazardous. This variability is noticeably reduced when the inversion is performed in the n_eof-dimensional reduced space. The EOF strategy has the advantage of preventing large errors and the drawback of hindering accurate estimations of the Jacobian [log10(RMSE) < −1]. Indeed, when using EOFs, the linear regression matrix has a rank necessarily lower than n, and some information is missed. Finally, coordinate-by-coordinate analog forecasting is able to produce better estimations of the Jacobian on average, with a variability between that of the regular analogs and that of the analogs combined with EOFs. However, the probability of obtaining accurate estimations of the Jacobian [log10(RMSE) < −1.4] is lower with coordinate-by-coordinate analog forecasting than with regular analog forecasting, as the area under the graph for log10(RMSE) < −1.4 is larger for regular analogs than for coordinate-by-coordinate analog forecasting. This is due to the small (order t²) nonzero coefficients more than two cells away from the diagonal that coordinate-by-coordinate analog forecasting cannot estimate.

Fig. 4.

Empirical probability density function of RMSE in flow map Jacobian matrix estimation, depending on the method used. We use the system of Lorenz (1996) with phase-space dimension n = 8. K = 9 analogs are used for each forecast and the methods are as in Fig. 3.


In some situations, however, the number of analogs K is much larger than the phase-space dimension n, but the linear regression matrix S is still unable to approximate the Jacobian ∇Φt. The next section gives an example of such a situation.

4) Effect of the analogs’ rank and the attractor’s dimension

As we have seen, to calculate S and perform LL analog forecasting, one must invert the matrix XTΩX. This means that the set of analogs must be of rank n. Yet, in some situations, the dimension of the attractor is lower than the full phase-space dimension n. Thus, if the catalog is made of one trajectory inside the attractor, the set of analogs might not be of rank n, however large K might be. In some cases, the dimension of the attractor is between n − 1 and n, such that the matrix XTΩX is still invertible but very sensitive to fluctuations in the rank of the set of analogs.

Similar remarks can be made for the successors. If Y (the matrix of successors) is not of rank n, then the matrix S, if it can be computed, is not of rank n either. Thus S will not be able to estimate the Jacobian ∇Φ^t if the latter is of rank n. Note that the rank of the successors (the rank of the matrix Y) depends strongly on the ranks of the analogs and of the Jacobian matrix: since Y ≈ X∇Φ^T at first order in X, if the analogs are not of rank n, the successors are likely not to be of rank n either.

Thus, depending on the dimension of the attractor, the LL analog forecasting operator might not be able to estimate the local Jacobian of the real flow map, but only a projection of this Jacobian matrix onto the local sets of analogs and successors. This is a typical case where data-driven methods are not able to reveal the full physics of an observed system unless provided with other sources of information or hypotheses, such as a parametric law.

The three-variable L63 system is used to illustrate this fact. This system is known to have a dimension of ≈2.06, with local variations around this value (Caby et al. 2019). This is the perfect case study where the rank of the set of analogs will be close to n − 1. Thus, the linear regression matrix S between the analogs and the successors is not able to approximate the full (3 × 3) Jacobian matrix ∇Φ^t. Restricting to the vector subspace V_a spanned by the first two EOFs of the analogs, (e_1^a, e_2^a), one can better understand the connection between the two matrices ∇Φ^t and S. In the following, the subscript r indicates restriction to (e_1^a, e_2^a). The choice of using only the first two EOFs is motivated by the quasi-planar nature of the Lorenz attractor. In the next formulas the t superscript is dropped for the sake of readability:
∇Φ_r = ∇Φ (e_1^a  e_2^a),
S_r = S (e_1^a  e_2^a),
where (e_1^a  e_2^a) is the 3 × 2 matrix whose columns are e_1^a and e_2^a.

The condition number of the set of analogs gives a direct way to measure whether the matrix X^TΩX can be inverted, and whether S can approximate a full-rank Jacobian matrix. This number is the ratio of the highest to the lowest singular value. It has the advantage of being a continuous function of the set of analogs, while the rank is a discontinuous function that takes only integer values. If the condition number is large, the set of analogs is almost contained in a plane, and the analogs might not be able to approximate the full Jacobian ∇Φ. Conversely, if the condition number is close to 1, the rank of the set of analogs is clearly 3, and the analogs will be able to approximate the full Jacobian matrix. Note that the condition number of the set of analogs is not directly linked to the dimension of the attractor: one simply uses the fact that the attractor is locally close to a plane, without referring further to the complex notion of attractor dimension.
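This diagnostic can be sketched as follows (the centering uses the weighted mean of the analogs; uniform weights when weights=None):

```python
import numpy as np

def analog_condition_number(analogs, weights=None):
    """Ratio of the highest to the lowest singular value of the centered set."""
    X = analogs - np.average(analogs, axis=0, weights=weights)
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    return s[0] / s[-1]
```

A set of analogs lying close to a plane in three dimensions yields a very large condition number, while an isotropic cloud yields a condition number close to 1.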

This can be investigated through numerical simulations of the L63 system, using a fourth-order Runge–Kutta numerical scheme with a time step Δt = 0.01 to solve the governing equations. A catalog was generated from a trajectory of 10⁵ nondimensional times, with the original equations (δ = 0). LL analog forecasting was performed at horizon t = 0.01 with K = 40 analogs, on 10⁴ points randomly selected on the attractor. The linear regression matrix S was then compared with ∇Φ^t, with and without restriction to (e_1^a, e_2^a). To estimate numerically the real value of ∇Φ^t, a third-order time expansion similar to Eq. (8) was computed directly from the model equations.

We use the same RMSE as before, where the value of Ncoeff takes the restriction into account, such that it is either equal to 9 (full matrices) or 6 (restricted matrices).

Figure 5 shows that estimation of the Jacobian by the analogs improves as the catalog size (and therefore the catalog density) grows. This validates that the analogs are able to approximate accurately the Jacobian matrix of the flow map. The figure also shows that, once restricted to the two-dimensional subspace spanned by the analogs, this estimation is much more accurate and less fluctuating.

Fig. 5.

RMSE in estimating the L63 Jacobian matrix with analogs, as a function of catalog size. The brown circles indicate the median RMSE (with 10% and 90% quantiles) for the total (3 × 3) Jacobian matrix. The violet squares indicate the median RMSE (with 10% and 90% quantiles) for the (2 × 2) Jacobian matrix after projection on the first two EOFs of the successors and restriction to the first two EOFs of the analogs. The projection and restriction imply a much lower RMSE and a much lower variability. Both estimation errors are decreasing functions of the catalog size. The number of test points decreases with catalog size, as more test points are needed for small catalogs.


Figure 6 displays the RMSE of the full (3 × 3) difference S − ∇Φ as a function of the condition number of the set of analogs. Large RMSE values are highly correlated with high condition numbers, while low RMSE values can only be achieved when the condition number of the analogs is close to 1.

Fig. 6.

RMSE in the analog estimation of the full (3 × 3) Jacobian matrix ∇Φ^t as a function of analog rank and median analog distance, with the L63 system. The rank of each set of analogs is measured by the ratio between the lowest and the highest singular values of the set of analogs. Most of the variability of the RMSE is explained by the rank of the analogs. Some of the remaining variability can be explained by the median distance from the analogs a_0^k to x_0, which gives a measure of the local catalog density. The catalog size is 10⁵ nondimensional times, δ = 0, and we use K = 40 analogs. Tests are done at 10⁴ points randomly selected on the attractor.


All these elements show that the estimation of the Jacobian matrix from analogs is highly dependent on the number of analogs K, the condition number of the set of analogs, the attractor's dimension, and the phase-space dimension n. However, the fact that the matrix from the LL operator does not approximate the full Jacobian ∇Φ^t does not mean that the analog forecast will poorly approximate the future state x_t. For the LL forecast to be efficient, one only needs a good approximation of the restricted Jacobian, and that the inversion associated with the linear regression is not ill-conditioned.

5) Effect of observational noise

In the strong-noise limit, the variations of a_t^k (respectively a_0^k) with k are dominated by the variations of ε_t^k (respectively ε_0^k), and there is no correlation between successors and analogs. Therefore, the least squares estimate of the linear relationship S between the analogs and successors gives a matrix of zeros, and the LL operator reduces to the LC operator. However, this effect can be mitigated by a large value of K.

We conducted numerical experiments with the three-dimensional Lorenz system (not shown). We considered additive noises ε ~ i.i.d. N(0, σ²I), where σ is a given percentage (0.01% or 1%) of the root-mean-square distance between two points picked randomly on the attractor. We used the values K = 40 and K = 200. In the strong-noise case (1%), S is close to zero (for both K = 40 and K = 200), and the condition number of the analogs is always low because the two-dimensional structure of the attractor is lost inside the isotropic, three-dimensional noise. In the weak-noise case (0.01%), we recover that RMSE(S − ∇Φ^t) is a growing function of the analogs' condition number, as in Fig. 6. Also, RMSE(S − ∇Φ^t) is a decreasing function of K. However, even for low condition numbers and high K, the estimation error is much larger than in the noise-free case of Fig. 6.

c. Forecast error at various time horizons: Numerical experiment

In this section, we use Eqs. (9e), (10e), and (11a) to interpret analog forecast errors on a practical numerical example of the L63, for various lead times. In particular, we explore the limitations of these equations for large time horizons.

Before describing the numerical experiment, we recall some of the results of Farmer and Sidorowich (1988). The authors used analog-like forecasting techniques and focused on the long-term behavior of the forecast errors. In particular, they derived the time slope of the mean exponential growth of errors, associated with the maximal Lyapunov exponent (MLE). As long as there is a nonzero component of a_0^k − x_0 along the direction associated with the MLE, this exponent drives the long-term amplitude of ∇Φ^t(a_0^k − x_0) and of the terms involving the Hessian included in O(|a_0^k − x_0|²) in Eqs. (9a), (10a), and (11a). At first order, the Jacobian grows as e^{λt}, where λ is the MLE, and the Hessian grows as e^{2λt}. Therefore, the errors of the LC and LI grow as e^{λt}, and the error of the LL grows as e^{2λt}. However, for very large lead times, the error term that grows as e^{2λt} becomes dominant and drives the error of all three operators (LC, LI, and LL), such that they become equivalent in accuracy.

Farmer and Sidorowich (1988) also studied an iterative version of the LL operator. Instead of directly using successors at time t, one uses successors at a small time step dt, splitting the forecast into multiple small-time-step forecasts. This avoids the error term involving the Hessian that grows as e^{2λt}. In this case, at each iteration, the forecast error grows both through an additive error due to the nonzero difference ∇Φ^{dt} − S and through the growth of previous errors under the dynamics of the system. For large lead times, the growth of previous errors becomes dominant and grows with time as e^{λt}.
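Such an iterative scheme can be sketched as follows (a simplified illustration, not Farmer and Sidorowich's exact procedure; `ll_step` is a hypothetical small-time-step LL forecast with uniform weights):

```python
import numpy as np

def ll_step(x, catalog_states, catalog_next, K=20):
    """One small-time-step locally linear forecast from state x."""
    d = np.linalg.norm(catalog_states - x, axis=1)
    idx = np.argsort(d)[:K]                    # K nearest analogs of x
    A, Y = catalog_states[idx], catalog_next[idx]
    X = A - A.mean(axis=0)
    S = np.linalg.solve(X.T @ X, X.T @ (Y - Y.mean(axis=0))).T
    return Y.mean(axis=0) + S @ (x - A.mean(axis=0))

def iterative_ll(x0, catalog_states, catalog_next, n_steps):
    """Compose n_steps small-step LL forecasts instead of one direct forecast."""
    x = x0
    for _ in range(n_steps):
        x = ll_step(x, catalog_states, catalog_next)
    return x
```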

We conducted numerical experiments of the L63 system to produce mean analog forecasts, whose errors were computed over 2000 points on the attractor. Additive, independent and identically distributed (i.i.d.) Gaussian observational noises were considered, and we set δ = 0. Figure 7 shows medians of these forecast errors for several lead times and several methods.

Fig. 7.

Influence of the noise intensity σ and the number of analogs K on analog forecasting errors, using the Lorenz (1963) system, a catalog size of L = 10⁷, and a time step dt = 0.01 between elements of the catalog. Median errors of the mean analog forecasting operators, |μ − x_t|, are computed over 2000 points taken randomly on the attractor. Black curves indicate slopes associated with the maximal Lyapunov exponent.


First, let us consider the case of strong noise (1% of the RMS distance in the attractor; red curves in Fig. 7). In this case, and for the experiment reported here, the noise is larger than the typical distance between analogs and the initial state, |a_0^k − x_0|. All methods (LC, LI, and LL) give similar results (indistinguishable on the graph). The iterative LL method adds an error term at each iteration due to observational noise and is therefore less accurate than the other methods (not shown). As stated above, in the case of strong noise, S ≈ 0 and the LL operator tends to the LC operator at all times. Then, for small lead times (here, t ≲ 1), the noise terms of Eqs. (9e) and (10e) are dominant, such that the LC and LI operators have similar performances. This also explains why raising the value of K diminishes the forecast error. On the other hand, for large lead times (here, t ≳ 1), the terms proportional to (μ_0 − x_0) in Eqs. (9e) and (10e) are dominant. Therefore, raising the value of K does not change much the accuracy of mean analog forecasting. Also, for large lead times, the growth of errors through the Jacobian ∇Φ^t follows the slope given by the MLE (we take here the value of 0.9057; see Viswanath 1998). Finally, note that the similar performances of the LC and LI operators at large lead times can be explained by the fact that |∇Φ^t| ≫ |I|.

Then, consider the case of weak noise (0.01% of the RMS distance in the attractor; violet curves in Fig. 7). In this case, the noise amplitude is smaller than the typical distance between analogs and initial state, but still nonnegligible. For small lead times (here, t ≲ 1.5), there is no dominant term in Eqs. (9e) and (10e). Thus, raising the value of K diminishes the errors due to noise but also raises the errors proportional to (μ_0 − x_0), resulting in better performances for the LL, but similar performances for the LC and LI. For large lead times (t ≳ 3.5 for K = 40 and t ≳ 2.5 for K = 200), all methods (LC, LI, and LL) are equivalent and driven by the Hessian terms O(Σ_k ω_k |a_0^k − x_0|²), with the expected slope of 2λ. Therefore, raising the value of K increases the bias and raises the mean analog forecasting errors. As expected, the iterative LL error grows as e^{λt} and is more accurate than the direct LL at large lead times (t ≳ 2.5 for K = 40 and 0.01% noise, t ≳ 0.5 for K = 200 and 0.01% noise, t ≳ 0.1 for K = 40 and zero noise, and at all times for K = 200 and zero noise). Note also that this iterative method is the only one that benefits from noise reduction through the increase of K at large lead times.

For very small lead times only (t ≲ 0.1), the LI operator performs better than the LC operator. Note that at very small lead times (again, t ≲ 0.1), the forecast error of the LC operator is a decreasing function of time. This is due to the fact that the system under study is dissipative and the trace of ∇f is negative (for more details, see Nicolis et al. 2009). Thus, at very small lead times the distance between two initially close states decreases. This decreasing trend at very small lead times is also observed for the direct LL operator. We interpret this as the result of a very good match between the real Jacobian and the linear matrix estimated by the direct LL. On the other hand, the iterative LL adds a new error term at each iteration and is thus not a decreasing function of time, even for lead times t ≲ 0.1. This explains why the iterative LL performs worse than the direct LL at these very small lead times. This can only be observed when noise is not the main driver of the forecast error.

Similar arguments explain the curves of the direct and iterative LL operators for zero noise (green curves in Fig. 7). In this case, raising the value of K increases the error at all lead times. The fact that the curves of the direct LL with zero noise and with weak noise coincide at large lead times confirms that noise is not the main driver of forecast error at large lead times.

Note that these results are, in principle, influenced by the choice of the Euclidean metric to find analogs. Alternatives have been explored (e.g., Blanchet et al. 2018), but Toth (1991) argued that different similarity measures yield only marginal differences in forecast performance.

d. Evolution of mean and covariance under Gaussian assumption

In this section, it is assumed that the weighted discrete distributions of the analogs ∑k ωk δ_a0k and of their successors ∑k ωk δ_atk can be approximated by Gaussian distributions:

\sum_k \omega_k \delta_{a_0^k} \approx \mathcal{N}(\mu_0, P_0), \qquad (16a)

\sum_k \omega_k \delta_{a_t^k} \approx \mathcal{N}(\mu_t, P_t), \qquad (16b)

where μt = μLC, the mean forecast of the LC operator. One advantage of the Gaussian distribution is that any linear combination of Gaussian-distributed variables is also Gaussian distributed. This property allows us to derive many results in exact, analytical form. This distribution also has the benefit of being entirely determined by its first two moments (i.e., its mean and covariance). Finally, note that assuming Gaussian forecasts simplifies computations in a data-assimilation framework. However, this distribution can only represent light-tailed phenomena, and might thus not be suited to the study of extremes.
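Under this assumption, the Gaussian parameters are simply the weighted empirical moments of the analogs (or of their successors). A minimal sketch of this moment matching, with a hypothetical helper name:

```python
import numpy as np

def weighted_gaussian(points, weights):
    """Moment-matched Gaussian N(mu, P) for a weighted discrete
    distribution sum_k w_k * delta_{points[k]}, as in Eqs. (16a)/(16b).
    (Hypothetical helper name; this is plain weighted moment matching.)"""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = w @ points                            # weighted mean
    centered = points - mu
    P = (w[:, None] * centered).T @ centered   # weighted covariance
    return mu, P
```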

Note that if Eq. (16a) holds and if the flow map can be approximated by a linear function in the convex hull of the set of analogs, then Eq. (16b) holds. In practice, this is satisfied if the flow map is smooth enough, and if the catalog is dense enough so that most of the variations in the set of successors can be explained by the Jacobian matrix of the flow map at μ0.

In this section, we assume zero observational noise. Nonzero centered Gaussian observational noise would add terms to the expressions given below for the covariance matrices. Combining the Gaussian hypothesis with Eq. (5b) and approximating Φ_a^t(·) by its tangent around μ0, we have the classic relationships:

\mu_t = \Phi_a^t(\mu_0) + O(\mathrm{Tr}\,P_0), \qquad (17a)

P_t = \nabla\Phi_a^t|_{\mu_0}\, P_0\, \nabla\Phi_a^t|_{\mu_0}^{\mathrm{T}} + O(\mathrm{Tr}\,P_0), \qquad (17b)

where Tr is the trace operator. Similar relations can be found using the differential representation of Eq. (5b):

\frac{d\mu_t}{dt} = f_a(\mu_t) + O(\mathrm{Tr}\,P_t), \qquad \mu_{t=0} = \mu_0, \qquad (18a)

\frac{dP_t}{dt} = \nabla f_a|_{\mu_t}\, P_t + P_t\, \nabla f_a|_{\mu_t}^{\mathrm{T}} + O(\mathrm{Tr}\,P_t), \qquad P_{t=0} = P_0. \qquad (18b)
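The differential relations above can be integrated numerically: the mean is advected by the vector field and the covariance by the Jacobian. The sketch below does this for the L63 system of appendix A, dropping the O(Tr P) remainders; the explicit Euler scheme and the step size are illustrative choices, not the paper's setup.

```python
import numpy as np

def l63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """L63 vector field (appendix A)."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def l63_jac(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Jacobian of the L63 vector field."""
    return np.array([[-sigma, sigma, 0.0],
                     [rho - x[2], -1.0, -x[0]],
                     [x[1], x[0], -beta]])

def propagate(mu, P, t, dt=1e-3):
    """Explicit-Euler propagation of the analog mean and covariance,
    d(mu)/dt = f(mu) and dP/dt = J P + P J^T, with the O(Tr P)
    remainders dropped (an illustrative scheme, not the paper's setup)."""
    for _ in range(int(round(t / dt))):
        J = l63_jac(mu)
        mu = mu + dt * l63(mu)
        P = P + dt * (J @ P + P @ J.T)
    return mu, P
```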
Now, let us make the simplifying hypothesis that |x0 − μ0|² ≲ Tr P0, which means that the state x0 is no farther from the analogs' mean μ0 than the standard deviation of the analogs. Then, one evaluates Φ_a^t, f_a, and their derivatives at x0 and xt instead of μ0 and μt, giving additional terms:
\mu_t = \Phi^t(x_0) + \delta\tilde{\Phi}^t(x_0) + \nabla\Phi^t|_{x_0}(\mu_0 - x_0) + O(\mathrm{Tr}\,P_0,\ \delta|\mu_0 - x_0|), \qquad (19a)

P_t = \nabla\Phi^t|_{x_0} P_0 \nabla\Phi^t|_{x_0}^{\mathrm{T}} + \delta\left(\nabla\tilde{\Phi}^t|_{x_0} P_0 \nabla\Phi^t|_{x_0}^{\mathrm{T}} + \nabla\Phi^t|_{x_0} P_0 \nabla\tilde{\Phi}^t|_{x_0}^{\mathrm{T}}\right) + \left(\left[(\mu_0 - x_0)\nabla^2\Phi^t|_{x_0}\right] P_0 \nabla\Phi^t|_{x_0}^{\mathrm{T}} + \nabla\Phi^t|_{x_0} P_0 \left[(\mu_0 - x_0)\nabla^2\Phi^t|_{x_0}\right]^{\mathrm{T}}\right) + O(\mathrm{Tr}\,P_0,\ \delta|\mu_0 - x_0|), \qquad (19b)
where terms of order |μ0 − x0|² are included in O(Tr P0), and ∇²Φ^t|x0 is the Hessian of Φ^t at x0. In the time-differential representation we have
\frac{d(\mu_t - x_t)}{dt} = \nabla f|_{x_t}(\mu_t - x_t) + \delta\tilde{f}(x_t) + O(\mathrm{Tr}\,P_t,\ \delta|\mu_t - x_t|), \qquad (20a)

\frac{dP_t}{dt} = \nabla f|_{x_t} P_t + P_t \nabla f|_{x_t}^{\mathrm{T}} + \delta\left(\nabla\tilde{f}|_{x_t} P_t + P_t \nabla\tilde{f}|_{x_t}^{\mathrm{T}}\right) + \left(\left[(\mu_t - x_t)\nabla^2 f|_{x_t}\right] P_t + P_t \left[(\mu_t - x_t)\nabla^2 f|_{x_t}\right]^{\mathrm{T}}\right) + O(\mathrm{Tr}\,P_t,\ \delta|\mu_t - x_t|). \qquad (20b)

Equation (20a) is equivalent to Eq. (19a), which is also equivalent to Eq. (6b). Equation (20a) can be Taylor expanded around t = 0 to find Eq. (9b). This analysis recovers the results from section 4a for the mean forecast of the LC analog forecasting operator.

Equations (20b) and (19b) are two representations of the same phenomenon. They show that, at first order, the growth in covariance between the analogs and successors is directly linked to the Jacobian matrix of Φ^t at x0. The covariance of the analog forecast will depend on the covariance of the analogs at t = 0, P0, and on the system's local Jacobian ∇Φ^t|x0. This is another way to see that the analogs are highly linked to the local dynamics of the system. If the local dynamics induce a large spread in the future possible trajectories, it is captured in the successors' covariance Pt. On the contrary, if the local dynamics are flat (∇Φ^t|x0 ≈ I or ∇f|x0 ≈ 0), the successors' covariance is equal to the analogs' covariance.

Finally, in the presence of observational noise as in Eq. (5d), and if the noises are centered and Gaussian, of the form ε0k ~ i.i.d. N(0, Σ0) and εtk ~ i.i.d. N(0, Σt), and independent of the analogs and successors, then Eq. (17b) is modified to give

P_t = \nabla\Phi_a^t|_{\mu_0}\,(P_0 + \Sigma_0)\,\nabla\Phi_a^t|_{\mu_0}^{\mathrm{T}} + \Sigma_t + O(\mathrm{Tr}\,P_0),
while Eq. (17a) is unchanged. Similarly, Eq. (19b) is modified to give
P_t = \nabla\Phi^t|_{x_0}(P_0 + \Sigma_0)\nabla\Phi^t|_{x_0}^{\mathrm{T}} + \delta\left(\nabla\tilde{\Phi}^t|_{x_0}(P_0 + \Sigma_0)\nabla\Phi^t|_{x_0}^{\mathrm{T}} + \nabla\Phi^t|_{x_0}(P_0 + \Sigma_0)\nabla\tilde{\Phi}^t|_{x_0}^{\mathrm{T}}\right) + \left(\left[(\mu_0 - x_0)\nabla^2\Phi^t|_{x_0}\right](P_0 + \Sigma_0)\nabla\Phi^t|_{x_0}^{\mathrm{T}} + \nabla\Phi^t|_{x_0}(P_0 + \Sigma_0)\left[(\mu_0 - x_0)\nabla^2\Phi^t|_{x_0}\right]^{\mathrm{T}}\right) + \Sigma_t + O(\mathrm{Tr}\,P_0,\ \delta|\mu_0 - x_0|).
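At leading order, the modified Eq. (17b) amounts to stretching the combined analog covariance and analog noise by the local Jacobian, then adding the successor noise. A one-line sketch (hypothetical helper name; higher-order terms dropped):

```python
import numpy as np

def noisy_successor_cov(J, P0, Sigma0, Sigmat):
    """Leading-order successor covariance with observational noise: the
    analog noise Sigma0 is stretched by the local Jacobian J together
    with P0, while the successor noise Sigmat is added directly.
    (Hypothetical helper name; higher-order terms are dropped.)"""
    return J @ (P0 + Sigma0) @ J.T + Sigmat
```

With an expanding direction in J, the same Σ0 contributes far more to the successor spread than in a contracting direction, which is the dynamics-dependence noted in the text.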

These equations show that observational noise adds variability among the set of successors both through the observational error of the successors and the observational error of the analogs. Again, the noise contribution from the analogs to the successor variability is dependent on the local dynamics of the system.

5. Conclusions

For practical applications, one often does not have access to analogs in phase space (i.e., to full observations of the system). The analysis of partially observed systems amounts to building (possibly multivariate) observables that create a one-to-one map between the space of observations and the phase space. Such an analysis is beyond the scope of this article, and we assume that this issue has been treated separately. The dimension of the observables is connected to the dimension of the full system, as demonstrated by Caby et al. (2020). The number of key variables in phase space has been estimated to be moderate (5–17; see Faranda et al. 2017). This motivates our focus on low-dimensional systems, which allows meaningful illustrations. For theoretical work on analog forecasting with partial observations, the reader is referred to the early works of Farmer and Sidorowich (1988), Sauer (1994), and Sugihara (1994), which make use of time embeddings [see the study of Sauer et al. (1991) on time embeddings], and to the more recent works of Zhao and Giannakis (2016) and Alexander and Giannakis (2020), which use kernel projection techniques. A complementary concern is the compensation of observational noise with analogs, which was investigated by Chau et al. (2020) in a data assimilation framework.

One must bear in mind that the use of analog forecasting in applications raises issues such as the choice of the space in which forecasting is performed, the choice of the right metric to compare analogs and initial state, and the combination of analogs with other techniques. In data assimilation, one might want to use Gaussian distributions instead of the discrete distributions of section 2b in order to use Kalman filtering. Ridge and least absolute shrinkage and selection operator (LASSO) regularizations could be used instead of the techniques mentioned in section 4b. Such regularization techniques add penalty terms to the least squares problem (Hastie et al. 2009), which keeps the linear regression well posed, especially when the predictors are highly correlated. Each of these operational choices must be made with memory use and computational time in mind [see Lguensat et al. (2017) for differences between regular and coordinate-by-coordinate analog forecasting].
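As an illustration of the ridge option, the sketch below solves the weighted locally linear regression with an added α·I penalty on the Gram matrix; the function name and the `alpha` parameter are our own choices, and the LL operator studied in this paper uses plain (unregularized) least squares.

```python
import numpy as np

def ridge_ll_forecast(x0, analogs, successors, weights, alpha=1e-3):
    """Weighted locally linear analog forecast with a ridge penalty.

    Adding alpha * I to the weighted Gram matrix keeps the regression
    well posed when the analogs are nearly collinear. The function name
    and the `alpha` parameter are illustrative; the paper's LL operator
    uses plain (unregularized) least squares.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    xbar, ybar = w @ analogs, w @ successors      # weighted means
    Xc, Yc = analogs - xbar, successors - ybar    # centered predictors/targets
    G = Xc.T @ (w[:, None] * Xc) + alpha * np.eye(Xc.shape[1])
    B = np.linalg.solve(G, Xc.T @ (w[:, None] * Yc))  # slope ~ flow-map Jacobian
    return ybar + (x0 - xbar) @ B
```

As α → 0 this reduces to the ordinary weighted least squares fit; larger α trades a small bias for stability when the predictors are highly correlated.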

Analog forecasting allows us to avoid solving complex nonlinear equations by using existing solutions starting from similar initial conditions. The accuracy of analog forecasting depends on local dynamical properties of the system of interest. In particular, the quality of analog forecasts is related to the Jacobian matrix of the real system’s flow map, and the linear regression from analogs to successors is shown to provide an approximation of this matrix. This approximation helps to study the mean accuracy of known analog forecasting operators, and to compare different methods that evaluate this Jacobian matrix, using numerical experiments of well-known dynamical systems. The LL operator is found to give the best approximation of the future state, provided that the linear regression is not ill posed. For long-term forecasts, an iterative version of the LL can be even more accurate. The LI operator is shown to have better skills than the LC at small lead times. The behavior of analog forecast errors at large lead times is driven by the maximal Lyapunov exponent of the system. Additive observational noise lowers the performances of analog forecasts but can be mitigated by using a larger number of analogs. However, raising the number of analogs also increases bias, and the optimal trade-off depends on the time horizon and the analog method used. The Jacobian matrix of the flow map is found to drive the growth of the successors’ covariance matrix. Altogether, this brings theoretical evidence that analogs can be used to emulate a real system and gives quantitative expressions to analyze and predict the accuracy of analog forecasting techniques.

Acknowledgments

The work was financially supported by ERC Grant 338965-A2C2 and ANR 10-IEED-0006-26 (CARAVELE project). We thank the three anonymous reviewers, as this version of the paper owes much to their remarks and suggestions.

APPENDIX A

Lorenz Systems

The three-variable L63 (Lorenz 1963) system of equations is
\frac{dx_1}{dt} = \sigma(x_2 - x_1), \qquad \frac{dx_2}{dt} = x_1(\rho - x_3) - x_2, \qquad \frac{dx_3}{dt} = x_1 x_2 - \beta x_3,

with usual parameters σ = 10, β = 8/3, and ρ = 28.
The n-variable L96 (Lorenz 1996) system of equations is
\forall\, i \in [1, n], \qquad \frac{dx_i}{dt} = (x_{i+1} - x_{i-2})\, x_{i-1} - x_i + \theta,

where θ is the forcing parameter. We set n = 8 and θ = 8, and use periodic boundary conditions x_{i+n} = x_i.
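For reference, both right-hand sides can be written compactly (a sketch with illustrative function names; np.roll implements the periodic boundaries of L96):

```python
import numpy as np

def l63_rhs(x, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """Right-hand side of the three-variable L63 system."""
    x1, x2, x3 = x
    return np.array([sigma * (x2 - x1),
                     x1 * (rho - x3) - x2,
                     x1 * x2 - beta * x3])

def l96_rhs(x, theta=8.0):
    """Right-hand side of the n-variable L96 system; np.roll implements
    the periodic boundary conditions x_{i+n} = x_i."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + theta
```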

APPENDIX B

Product of Hessian with Vectors

Let g be a vector-valued, phase-space-dependent function g: ℝⁿ → ℝⁿ, such as Φ^t or f.

The Hessian of g at x is denoted ∇²g|x. It has dimension n³ and its (i, j, k)th coefficient [∇²g|x]_{i,j,k} equals ∂²g_k/(∂x_i ∂x_j)(x). The product of a Hessian ∇²g|x with an n-dimensional vector y is a matrix whose (i, k)th coefficient [y∇²g|x]_{i,k} equals ∑_j y_j ∂²g_k/(∂x_i ∂x_j). The double product of a Hessian with two n-dimensional vectors y and z is a vector whose kth coefficient [y(∇²g|x)z^T]_k equals ∑_{i,j} y_j z_i ∂²g_k/(∂x_i ∂x_j). The double product of a Hessian ∇²g|x with two matrices X and Y of the same shape k × n is a matrix of shape k × n whose (k, j)th coefficient is ∑_{l,m} X_{k,l} Y_{k,m} ∂²g_j/(∂x_l ∂x_m).
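These index conventions can be checked numerically. The sketch below builds a finite-difference Hessian of a vector field and forms the single and double products defined above with einsum; the step size h and the function names are our own choices.

```python
import numpy as np

def hessian(g, x, h=1e-5):
    """Central finite-difference Hessian of a vector field g: R^n -> R^n,
    with H[i, j, k] = d^2 g_k / (dx_i dx_j), matching the convention above.
    (The step size h and function names are our own choices.)"""
    n = x.size
    H = np.zeros((n, n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (g(x + ei + ej) - g(x + ei - ej)
                       - g(x - ei + ej) + g(x - ei - ej)) / (4.0 * h * h)
    return H

def hess_vec(H, y):
    """Single product: [y Hess]_{i,k} = sum_j y_j H[i, j, k] (a matrix)."""
    return np.einsum('j,ijk->ik', y, H)

def hess_vec_vec(H, y, z):
    """Double product: [y Hess z^T]_k = sum_{i,j} y_j z_i H[i, j, k] (a vector)."""
    return np.einsum('j,i,ijk->k', y, z, H)
```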

REFERENCES

  • Alexander, R., and D. Giannakis, 2020: Operator-theoretic framework for forecasting nonlinear time series with kernel analog techniques. Physica D, 409, 132520, https://doi.org/10.1016/j.physd.2020.132520.
  • Alexander, R., Z. Zhao, E. Székely, and D. Giannakis, 2017: Kernel analog forecasting of tropical intraseasonal oscillations. J. Atmos. Sci., 74, 1321–1342, https://doi.org/10.1175/JAS-D-16-0147.1.
  • Atencia, A., and I. Zawadzki, 2017: Analogs on the Lorenz attractor and ensemble spread. Mon. Wea. Rev., 145, 1381–1400, https://doi.org/10.1175/MWR-D-16-0123.1.
  • Ayet, A., and P. Tandeo, 2018: Nowcasting solar irradiance using an analog method and geostationary satellite images. Sol. Energy, 164, 301–315, https://doi.org/10.1016/j.solener.2018.02.068.
  • Balaji, V., 2015: Climate computing: The state of play. Comput. Sci. Eng., 17, 9–13, https://doi.org/10.1109/MCSE.2015.109.
  • Blanchet, J., S. Stalla, and J.-D. Creutin, 2018: Analogy of multiday sequences of atmospheric circulation favoring large rainfall accumulation over the French Alps. Atmos. Sci. Lett., 19, e809, https://doi.org/10.1002/asl.809.
  • Caby, T., D. Faranda, G. Mantica, S. Vaienti, and P. Yiou, 2019: Generalized dimensions, large deviations and the distribution of rare events. Physica D, 400, 132–143, https://doi.org/10.1016/j.physd.2019.06.009.
  • Caby, T., D. Faranda, S. Vaienti, and P. Yiou, 2020: Extreme value distributions of observation recurrences. Nonlinearity, 34, 118–163, https://doi.org/10.1088/1361-6544/abaff1.
  • Carrassi, A., M. Bocquet, L. Bertino, and G. Evensen, 2018: Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdiscip. Rev.: Climate Change, 9, e535, https://doi.org/10.1002/wcc.535.
  • Chau, T. T. T., P. Ailliot, and V. Monbet, 2020: An algorithm for non-parametric estimation in state–space models. Comput. Stat. Data Anal., 153, 107062, https://doi.org/10.1016/j.csda.2020.107062.
  • Crutchfield, J. P., and B. S. McNamara, 1987: Equations of motion from a data series. Complex Syst., 1, 417–452.
  • Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, https://doi.org/10.1175/MWR-D-12-00281.1.
  • Dijkstra, H. A., 2016: Understanding climate variability using dynamical systems theory. The Fluid Dynamics of Climate, A. Provenzale, E. Palazzi, and K. Fraedrich, Eds., CISM International Centre for Mechanical Sciences, Vol. 564, Springer, 1–38.
  • Faranda, D., G. Messori, and P. Yiou, 2017: Dynamical proxies of North Atlantic predictability and extremes. Sci. Rep., 7, 41278, https://doi.org/10.1038/srep41278.
  • Farmer, J. D., and J. J. Sidorowich, 1987: Predicting chaotic time series. Phys. Rev. Lett., 59, 845–848, https://doi.org/10.1103/PhysRevLett.59.845.
  • Farmer, J. D., and J. J. Sidorowich, 1988: Exploiting chaos to predict the future and reduce noise. Evolution, Learning and Cognition, World Scientific, 277–330.
  • Griliches, Z., and V. Ringstad, 1970: Error-in-the-variables bias in nonlinear contexts. Econometrica, 38, 368–370, https://doi.org/10.2307/1913020.
  • Grooms, I., 2021: Analog ensemble data assimilation and a method for constructing analogs with variational autoencoders. Quart. J. Roy. Meteor. Soc., 147, 139–149, https://doi.org/10.1002/qj.3910.
  • Hamilton, F., T. Berry, and T. Sauer, 2016: Ensemble Kalman filtering without a model. Phys. Rev. X, 6, 011021, https://doi.org/10.1103/PhysRevX.6.011021.
  • Hastie, T., R. Tibshirani, and J. Friedman, 2009: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 745 pp.
  • Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Elsevier, 376 pp.
  • Lguensat, R., P. Tandeo, P. Ailliot, M. Pulido, and R. Fablet, 2017: The analog data assimilation. Mon. Wea. Rev., 145, 4093–4107, https://doi.org/10.1175/MWR-D-16-0441.1.
  • Li, J., and R. Ding, 2011: Temporal–spatial distribution of atmospheric predictability limit by local dynamical analogs. Mon. Wea. Rev., 139, 3265–3283, https://doi.org/10.1175/MWR-D-10-05020.1.
  • Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
  • Lorenz, E. N., 1969: Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci., 26, 636–646, https://doi.org/10.1175/1520-0469(1969)26<636:APARBN>2.0.CO;2.
  • Lorenz, E. N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Shinfield Park, Reading, United Kingdom, ECMWF, https://www.ecmwf.int/node/10829.
  • Lott, F., and M. J. Miller, 1997: A new subgrid-scale orographic drag parametrization: Its formulation and testing. Quart. J. Roy. Meteor. Soc., 123, 101–127, https://doi.org/10.1002/qj.49712353704.
  • Markovsky, I., and S. Van Huffel, 2007: Overview of total least-squares methods. Signal Process., 87, 2283–2302, https://doi.org/10.1016/j.sigpro.2007.04.004.
  • McDermott, P. L., and C. K. Wikle, 2016: A model-based approach for analog spatio-temporal dynamic forecasting. Environmetrics, 27, 70–82, https://doi.org/10.1002/env.2374.
  • Milnor, J., 1985: On the concept of attractor. The Theory of Chaotic Attractors, Springer, 243–264.
  • Nicolis, C., 1998: Atmospheric analogs and recurrence time statistics: Toward a dynamical formulation. J. Atmos. Sci., 55, 465–475, https://doi.org/10.1175/1520-0469(1998)055<0465:AAARTS>2.0.CO;2.
  • Nicolis, C., P. Perdigao, and S. Vannitsem, 2009: Dynamics of prediction errors under the combined effect of initial condition and model errors. J. Atmos. Sci., 66, 766–778, https://doi.org/10.1175/2008JAS2781.1.
  • Platzer, P., P. Yiou, P. Tandeo, P. Naveau, and J.-F. Filipot, 2019: Predicting analog forecasting errors using dynamical systems. CI 2019: Ninth Int. Workshop on Climate Informatics, Paris, France, École Normale Supérieure, 69–72, https://opensky.ucar.edu/islandora/object/technotes%3A581/datastream/PDF/view.
  • Poincaré, H., 1890: Sur le problème des trois corps et les équations de la dynamique. Acta Math., 13, A3–A270.
  • Prein, A. F., and Coauthors, 2015: A review on regional convection-permitting climate modeling: Demonstrations, prospects, and challenges. Rev. Geophys., 53, 323–361, https://doi.org/10.1002/2014RG000475.
  • Sauer, T., 1994: Time series prediction by using delay coordinate embedding. Time Series Prediction: Forecasting the Future and Understanding the Past, A. Weigend and N. A. Gershenfeld, Eds., 175–193.
  • Sauer, T., J. A. Yorke, and M. Casdagli, 1991: Embedology. J. Stat. Phys., 65, 579–616, https://doi.org/10.1007/BF01053745.
  • Schmid, P. J., 2010: Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech., 656, 5–28, https://doi.org/10.1017/S0022112010001217.
  • Schuurmans, C. J., 1973: A 4-year experiment in long-range weather forecasting, using circulation analogues. Meteor. Rdsch., 26, 2–4.
  • Sugihara, G., 1994: Nonlinear forecasting for the classification of natural time series. Philos. Trans. Roy. Soc. London, A348, 477–495, https://doi.org/10.1098/rsta.1994.0106.
  • Tandeo, P., and Coauthors, 2015: Combining analog method and ensemble data assimilation: Application to the Lorenz-63 chaotic system. Machine Learning and Data Mining Approaches to Climate Science, Springer, 3–12.
  • Tandeo, P., P. Ailliot, M. Bocquet, A. Carrassi, T. Miyoshi, M. Pulido, and Y. Zhen, 2020: A review of innovation-based methods to jointly estimate model and observation error covariance matrices in ensemble data assimilation. Mon. Wea. Rev., 148, 3973–3994, https://doi.org/10.1175/MWR-D-19-0240.1.
  • Temam, R., 1988: Infinite-Dimensional Dynamical Systems in Mechanics and Physics. Springer-Verlag, 500 pp.
  • Tippett, M. K., and T. DelSole, 2013: Constructed analogs and linear regression. Mon. Wea. Rev., 141, 2519–2525, https://doi.org/10.1175/MWR-D-12-00223.1.
  • Toth, Z., 1991: Intercomparison of circulation similarity measures. Mon. Wea. Rev., 119, 55–64, https://doi.org/10.1175/1520-0493(1991)119<0055:IOCSM>2.0.CO;2.
  • Van Den Dool, H. M., 1994: Searching for analogues, how long must we wait? Tellus, 46A, 314–324, https://doi.org/10.3402/tellusa.v46i3.15481.
  • Van Den Dool, H. M., 2007: Empirical Methods in Short-Term Climate Prediction. Oxford University Press, 240 pp.
  • Vannitsem, S., 2017: Predictability of large-scale atmospheric motions: Lyapunov exponents and error dynamics. Chaos, 27, 032101, https://doi.org/10.1063/1.4979042.
  • Viswanath, D., 1998: Lyapunov exponents from random Fibonacci sequences to the Lorenz equations. Ph.D. dissertation, Cornell University, 94 pp.
  • Wang, X., J. Slawinska, and D. Giannakis, 2020: Extended-range statistical ENSO prediction through operator-theoretic techniques for nonlinear dynamics. Sci. Rep., 10, 2636, https://doi.org/10.1038/s41598-020-59128-7.
  • Wolpert, D. H., 1996: The lack of a priori distinctions between learning algorithms. Neural Comput., 8, 1341–1390, https://doi.org/10.1162/neco.1996.8.7.1341.
  • Yiou, P., 2014: AnaWEGE: A weather generator based on analogues of atmospheric circulation. Geosci. Model Dev., 7, 531–543, https://doi.org/10.5194/gmd-7-531-2014.
  • Zhao, Z., and D. Giannakis, 2016: Analog forecasting with dynamics-adapted kernels. Nonlinearity, 29, 2888–2939, https://doi.org/10.1088/0951-7715/29/9/2888.
Save
  • Alexander, R., and D. Giannakis, 2020: Operator-theoretic framework for forecasting nonlinear time series with kernel analog techniques. Physica D, 409, 132520, https://doi.org/10.1016/j.physd.2020.132520.

    • Search Google Scholar
    • Export Citation
  • Alexander, R., Z. Zhao, E. Székely, and D. Giannakis, 2017: Kernel analog forecasting of tropical intraseasonal oscillations. J. Atmos. Sci., 74, 13211342, https://doi.org/10.1175/JAS-D-16-0147.1.

    • Search Google Scholar
    • Export Citation
  • Atencia, A., and I. Zawadzki, 2017: Analogs on the Lorenz attractor and ensemble spread. Mon. Wea. Rev., 145, 13811400, https://doi.org/10.1175/MWR-D-16-0123.1.

    • Search Google Scholar
    • Export Citation
  • Ayet, A., and P. Tandeo, 2018: Nowcasting solar irradiance using an analog method and geostationary satellite images. Sol. Energy, 164, 301315, https://doi.org/10.1016/j.solener.2018.02.068.

    • Search Google Scholar
    • Export Citation
  • Balaji, V., 2015: Climate computing: The state of play. Comput. Sci. Eng., 17, 913, https://doi.org/10.1109/MCSE.2015.109.

  • Blanchet, J., S. Stalla, and J.-D. Creutin, 2018: Analogy of multiday sequences of atmospheric circulation favoring large rainfall accumulation over the French Alps. Atmos. Sci. Lett., 19, e809, https://doi.org/10.1002/asl.809.

    • Search Google Scholar
    • Export Citation
  • Caby, T., D. Faranda, G. Mantica, S. Vaienti, and P. Yiou, 2019: Generalized dimensions, large deviations and the distribution of rare events. Physica D, 400, 132143, https://doi.org/10.1016/j.physd.2019.06.009.

    • Search Google Scholar
    • Export Citation
  • Caby, T., D. Faranda, S. Vaienti, and P. Yiou, 2020: Extreme value distributions of observation recurrences. Nonlinearity, 34, 118163, https://doi.org/10.1088/1361-6544/abaff1.

    • Search Google Scholar
    • Export Citation
  • Carrassi, A., M. Bocquet, L. Bertino, and G. Evensen, 2018: Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdiscip. Rev.: Climate Change, 9, e535, https://doi.org/10.1002/wcc.535.

    • Search Google Scholar
    • Export Citation
  • Chau, T. T. T., P. Ailliot, and V. Monbet, 2020: An algorithm for non-parametric estimation in state–space models. Comput. Stat. Data Anal., 153, 107062, https://doi.org/10.1016/j.csda.2020.107062.

    • Search Google Scholar
    • Export Citation
  • Crutchfield, J. P., and B. S. McNamara, 1987: Equations of motion from a data series. Complex Syst., 1, 417452.

  • Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 34983516, https://doi.org/10.1175/MWR-D-12-00281.1.

    • Search Google Scholar
    • Export Citation
  • Dijkstra, H. A., 2016: Understanding climate variability using dynamical systems theory. The Fluid Dynamics of Climate, A. Provenzale, E. Palazzi, and K. Fraedrich, Eds., CISM International Centre for Mechanical Sciences, Vol. 564, Springer, 1–38.

  • Faranda, D., G. Messori, and P. Yiou, 2017: Dynamical proxies of North Atlantic predictability and extremes. Sci. Rep., 7, 41278, https://doi.org/10.1038/srep41278.

    • Search Google Scholar
    • Export Citation
  • Farmer, J. D., and J. J. Sidorowich, 1987: Predicting chaotic time series. Phys. Rev. Lett., 59, 845848, https://doi.org/10.1103/PhysRevLett.59.845.

    • Search Google Scholar
    • Export Citation
  • Farmer, J. D., and J. J. Sidorowich, 1988: Exploiting chaos to predict the future and reduce noise. Evolution, Learning and Cognition, World Scientific, 277–330.

  • Griliches, Z., and V. Ringstad, 1970: Error-in-the-variables bias in nonlinear contexts. Econometrica, 38, 368370, https://doi.org/10.2307/1913020.

    • Search Google Scholar
    • Export Citation
  • Grooms, I., 2021: Analog ensemble data assimilation and a method for constructing analogs with variational autoencoders. Quart. J. Roy. Meteor. Soc., 147, 139149, https://doi.org/10.1002/qj.3910.

    • Search Google Scholar
    • Export Citation
  • Hamilton, F., T. Berry, and T. Sauer, 2016: Ensemble Kalman filtering without a model. Phys. Rev. X, 6, 011021, https://doi.org/10.1103/PhysRevX.6.011021.

    • Search Google Scholar
    • Export Citation
  • Hastie, T., R. Tibshirani, and J. Friedman, 2009 : The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 745 pp.

  • Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Elsevier, 376 pp.

  • Lguensat, R., P. Tandeo, P. Ailliot, M. Pulido, and R. Fablet, 2017: The analog data assimilation. Mon. Wea. Rev., 145, 40934107, https://doi.org/10.1175/MWR-D-16-0441.1.

    • Search Google Scholar
    • Export Citation
  • Li, J., and R. Ding, 2011: Temporal–spatial distribution of atmospheric predictability limit by local dynamical analogs. Mon. Wea. Rev., 139, 32653283, https://doi.org/10.1175/MWR-D-10-05020.1.

    • Search Google Scholar
    • Export Citation
  • Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Lorenz, E. N., 1969: Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci., 26, 636646, https://doi.org/10.1175/1520-0469(1969)26<636:APARBN>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Lorenz, E. N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Shinfield Park, Reading, United Kingdom, ECMWF, https://www.ecmwf.int/node/10829.

  • Lott, F., and M. J. Miller, 1997: A new subgrid-scale orographic drag parametrization: Its formulation and testing. Quart. J. Roy. Meteor. Soc., 123, 101127, https://doi.org/10.1002/qj.49712353704.

    • Search Google Scholar
    • Export Citation
  • Markovsky, I., and S. Van Huffel, 2007: Overview of total least-squares methods. Signal Process., 87, 22832302, https://doi.org/10.1016/j.sigpro.2007.04.004.

    • Search Google Scholar
    • Export Citation
  • McDermott, P. L., and C. K. Wikle, 2016: A model-based approach for analog spatio-temporal dynamic forecasting. Environmetrics, 27, 7082, https://doi.org/10.1002/env.2374.

    • Search Google Scholar
    • Export Citation
  • Milnor, J., 1985: On the concept of attractor. The Theory of Chaotic Attractors, Springer, 243–264.

  • Nicolis, C., 1998: Atmospheric analogs and recurrence time statistics: Toward a dynamical formulation. J. Atmos. Sci., 55, 465475, https://doi.org/10.1175/1520-0469(1998)055<0465:AAARTS>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Nicolis, C., P. Perdigao, and S. Vannitsem, 2009: Dynamics of prediction errors under the combined effect of initial condition and model errors. J. Atmos. Sci., 66, 766778, https://doi.org/10.1175/2008JAS2781.1.

    • Search Google Scholar
    • Export Citation
  • Platzer, P., P. Yiou, P. Tandeo, P. Naveau, and J.-F. Filipot, 2019: Predicting analog forecasting errors using dynamical systems. CI 2019: Ninth Int. Workshop on Climate Informatics, Paris, France, École Normale Supérieure, 69–72, https://opensky.ucar.edu/islandora/object/technotes%3A581/datastream/PDF/view.

  • Poincaré, H., 1890: Sur le problème des trois corps et les équations de la dynamique. Acta Math., 13, A3A270.

  • Prein, A. F., and Coauthors, 2015: A review on regional convection-permitting climate modeling: Demonstrations, prospects, and challenges. Rev. Geophys., 53, 323361, https://doi.org/10.1002/2014RG000475.

    • Search Google Scholar
    • Export Citation
  • Sauer, T., 1994: Time series prediction by using delay coordinate embedding. Time Series Prediction: Forecasting the Future and Understanding the Past, A. Weigend and N. A. Gershenfeld, Eds., 175–193.

  • Sauer, T., J. A. Yorke, and M. Casdagli, 1991: Embedology. J. Stat. Phys., 65, 579616, https://doi.org/10.1007/BF01053745.

  • Schmid, P. J., 2010: Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech., 656, 528, https://doi.org/10.1017/S0022112010001217.

    • Search Google Scholar
    • Export Citation
  • Schuurmans, C. J., 1973: A 4-year experiment in long-range weather forecasting, using circulation analogues. Meteor. Rdsch., 26, 24.

  • Sugihara, G., 1994: Nonlinear forecasting for the classification of natural time series. Philos. Trans. Roy. Soc. London, A348, 477495, https://doi.org/10.1098/rsta.1994.0106.

    • Search Google Scholar
    • Export Citation
  • Tandeo, P., and Coauthors, 2015: Combining analog method and ensemble data assimilation: Application to the Lorenz-63 chaotic system. Machine Learning and Data Mining Approaches to Climate Science, Springer, 3–12.

  • Tandeo, P., P. Ailliot, M. Bocquet, A. Carrassi, T. Miyoshi, M. Pulido, and Y. Zhen, 2020: A review of innovation-based methods to jointly estimate model and observation error covariance matrices in ensemble data assimilation. Mon. Wea. Rev., 148, 39733994, https://doi.org/10.1175/MWR-D-19-0240.1.

    • Search Google Scholar
    • Export Citation
  • Temam, R., 1988: Infinite-Dimensional Dynamical Systems in Mechanics and Physics. Springer-Verlag, 500 pp.

  • Tippett, M. K., and T. DelSole, 2013: Constructed analogs and linear regression. Mon. Wea. Rev., 141, 25192525, https://doi.org/10.1175/MWR-D-12-00223.1.

    • Search Google Scholar
    • Export Citation
  • Toth, Z., 1991: Intercomparison of circulation similarity measures. Mon. Wea. Rev., 119, 5564, https://doi.org/10.1175/1520-0493(1991)119<0055:IOCSM>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Van Den Dool, H. M., 1994: Searching for analogues, how long must we wait? Tellus, 46A, 314324, https://doi.org/10.3402/tellusa.v46i3.15481.

    • Search Google Scholar
    • Export Citation
  • Van Den Dool, H. M., 2007: Empirical Methods in Short-Term Climate Prediction. Oxford University Press, 240 pp.

  • Vannitsem, S., 2017: Predictability of large-scale atmospheric motions: Lyapunov exponents and error dynamics. Chaos, 27, 032101, https://doi.org/10.1063/1.4979042.

    • Search Google Scholar
    • Export Citation
  • Viswanath, D., 1998: Lyapunov exponents from random Fibonacci sequences to the Lorenz equations. Ph.D. dissertation, Cornell University, 94 pp.

  • Wang, X., J. Slawinska, and D. Giannakis, 2020: Extended-range statistical ENSO prediction through operator-theoretic techniques for nonlinear dynamics. Sci. Rep., 10, 2636, https://doi.org/10.1038/s41598-020-59128-7.

  • Wolpert, D. H., 1996: The lack of a priori distinctions between learning algorithms. Neural Comput., 8, 1341–1390, https://doi.org/10.1162/neco.1996.8.7.1341.

  • Yiou, P., 2014: AnaWEGE: A weather generator based on analogues of atmospheric circulation. Geosci. Model Dev., 7, 531–543, https://doi.org/10.5194/gmd-7-531-2014.

  • Zhao, Z., and D. Giannakis, 2016: Analog forecasting with dynamics-adapted kernels. Nonlinearity, 29, 2888–2939, https://doi.org/10.1088/0951-7715/29/9/2888.

  • Fig. 1.

    Analog forecasting operators presented in section 2b. The flow map Φt(x0) has a simple polynomial form. Analogs are drawn from a normal distribution centered on x0 and follow the same model as the real state x. The same analogs and flow maps are used for the three operators (a) LC, (b) LI, and (c) LL. Weights ωk are computed using Gaussian kernels; the size of the kth triangle is proportional to ωk. The real initial and future states x0 and xt are shown as filled circles. The initial forecast distribution is given by the analogs in (a), and by a Dirac delta at x0 in (b) and (c). The final forecast distribution is given by the successors in (a), by the increments added to x0 in (b), and by the residuals added to a linear regression applied to x0 in (c).
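    The three operators in this caption can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: the function name, argument layout, and the choice of the kernel shape parameter λ as the median analog-to-state distance (as in Fig. 3) are assumptions.

    ```python
    import numpy as np

    def analog_forecasts(x0, a0, at, lam=None):
        """Sketch of the three analog forecasting operators.

        x0 : (n,)   current state
        a0 : (K, n) analogs of x0 drawn from the catalog
        at : (K, n) successors of the analogs at lead time t
        Returns the LC, LI, and LL point forecasts (weighted means).
        """
        d = np.linalg.norm(a0 - x0, axis=1)   # analog-to-state distances
        if lam is None:
            lam = np.median(d)                # kernel shape parameter (assumption)
        w = np.exp(-((d / lam) ** 2))
        w /= w.sum()                          # Gaussian-kernel weights omega_k

        # (a) Locally constant (LC): weighted mean of the successors.
        lc = w @ at
        # (b) Locally incremental (LI): weighted mean of the increments, added to x0.
        li = x0 + w @ (at - a0)
        # (c) Locally linear (LL): weighted linear regression of successors
        #     on analogs, applied to x0.
        mu0, mut = w @ a0, w @ at
        A0, At = a0 - mu0, at - mut           # weighted anomalies
        W = np.diag(w)
        S = (At.T @ W @ A0) @ np.linalg.pinv(A0.T @ W @ A0)
        ll = mut + S @ (x0 - mu0)
        return lc, li, ll
    ```

    For an exactly linear flow map the LL forecast is exact (S recovers the Jacobian), whereas LC and LI only approximate it; this is the behavior illustrated in panels (a)–(c).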

  • Fig. 2.

    Illustrating Eqs. (6b) and (6e) on the three-variable L63 system. (a) A real trajectory from x0 to xt and two analog trajectories, namely the 10th best analog (from a_0^(10) to a_t^(10)) and the 100th best analog (from a_0^(100) to a_t^(100)). The catalog is shown in white. (b) Comparison of the exact value of the norm of a_t^(k) − xt (full lines) with the sum of the two terms on the right-hand side of Eq. (6b) (dashed lines). (c) Contributions of the first term (black squares) and the second term (brown circles and blue triangles) of the right-hand side of Eq. (6b), projected on the first coordinate of the L63 system. (d) As in (c), but for the right-hand side of Eq. (6e).

  • Fig. 3.

    Flow-map Jacobian matrix estimation with the model of Lorenz (1996), without observational noise. The forecast lead time is t = 0.05 Lorenz time, the catalog length is 10^4 Lorenz times, and the phase-space dimension is n = 8. K = 9 analogs are used for the forecast, with Gaussian kernels for the weights ωk and shape parameter λ set to the median of the analog-to-state distances |a_0^(k) − x0|. (a) Jacobian matrix ∇Φ^t|x0. (b) Linear regression matrix S using regular analogs. (c) Difference S − ∇Φ^t|x0 with regular analogs; the RMSE is given below the plot. (d),(e) As in (b) and (c), but the linear regression is performed in a lower-dimensional subspace spanned by the first EOFs of the set of K = 9 analogs. (f),(g) As in (d) and (e), but the linear regression is performed coordinate by coordinate, assuming that the coefficients are zero two cells away from the diagonal.
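    The reduced regression described for panels (d) and (e) can be sketched as follows. This is a minimal reconstruction under stated assumptions (function name, argument layout, and the weighted-SVD definition of the EOFs are ours, not taken from the paper): the Jacobian is estimated by weighted least squares after projecting the analog anomalies onto their first p EOFs.

    ```python
    import numpy as np

    def jacobian_estimate_eof(a0, at, w, p):
        """Rank-p estimate of the flow-map Jacobian from K analog/successor
        pairs: weighted regression of successor anomalies on the analog
        anomalies projected onto their first p EOFs.

        a0 : (K, n) analogs, at : (K, n) successors,
        w  : (K,)   nonnegative weights summing to 1, p : number of EOFs kept.
        """
        mu0, mut = w @ a0, w @ at                  # weighted means
        A0, At = a0 - mu0, at - mut                # anomalies
        # EOFs of the analogs = left singular vectors of the weighted anomalies
        U, _, _ = np.linalg.svd((np.sqrt(w)[:, None] * A0).T,
                                full_matrices=False)
        U = U[:, :p]                               # (n, p) leading EOFs
        Z0 = A0 @ U                                # analog anomalies in EOF space
        W = np.diag(w)
        Sp = (At.T @ W @ Z0) @ np.linalg.pinv(Z0.T @ W @ Z0)  # (n, p) regression
        return Sp @ U.T                            # (n, n) rank-p Jacobian estimate
    ```

    With p = n and well-spread analogs this reduces to the full regression matrix S of panel (b); taking p < n discards the poorly sampled directions responsible for most of the estimation error.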

  • Fig. 4.

    Empirical probability density function of RMSE in flow map Jacobian matrix estimation, depending on the method used. We use the system of Lorenz (1996) with phase-space dimension n = 8. K = 9 analogs are used for each forecast and the methods are as in Fig. 3.

  • Fig. 5.

    RMSE of the analog estimate of the L63 Jacobian matrix as a function of catalog size. The brown circles indicate the median RMSE (with 10% and 90% quantiles) of the full (3 × 3) Jacobian matrix. The violet squares indicate the median RMSE (with 10% and 90% quantiles) of the (2 × 2) Jacobian matrix after projection on the first two EOFs of the successors and restriction to the first two EOFs of the analogs. The projection and restriction yield a much lower RMSE and much lower variability. Both estimation errors are decreasing functions of the catalog size. The number of test points decreases with catalog size, as more test points are needed for small catalogs.

  • Fig. 6.

    RMSE of the analog estimate of the full (3 × 3) Jacobian matrix ∇Φ^t as a function of analog rank and median analog distance, with the L63 system. The rank of each set of analogs is measured by the ratio between the lowest and the highest singular values of the set of analogs. Most of the variability of the RMSE is explained by the rank of the analogs. Some of the remaining variability can be explained by the median distance from the analogs a_0^(k) to x0, which gives a measure of the local catalog density. The catalog size is 10^5 nondimensional times, δ = 0, and we use K = 40 analogs. Tests are performed at 10^4 points randomly selected on the attractor.

  • Fig. 7.

    Influence of the noise intensity σ and of the number of analogs K on analog forecasting errors, using the Lorenz (1963) system, a catalog size of L = 10^7, and a time step dt = 0.01 between elements of the catalog. Median errors of the mean analog forecasting operators |μ_L − xt| are computed over 2000 points taken randomly on the attractor. Black curves indicate slopes associated with the maximal Lyapunov exponent.