Abstract

A numerical framework for data-based identification of nonstationary linear factor models is presented. The approach is based on the extension of the recently developed method for identification of persistent dynamical phases in multidimensional time series, permitting the identification of discontinuous temporal changes in underlying model parameters. The finite element method (FEM) discretization of the resulting variational functional is applied to reduce the dimensionality of the resulting problem and to construct the numerical iterative algorithm. The presented method results in the sparse sequential linear minimization problem with linear constrains. The performance of the framework is demonstrated for the following two application examples: (i) in the context of subgrid-scale parameterization for the Lorenz model with external forcing and (ii) in an analysis of climate impact factors acting on the blocking events in the upper troposphere. The importance of accounting for the nonstationarity issue is demonstrated in the second application example: modeling the 40-yr ECMWF Re-Analysis (ERA-40) geopotential time series via a single best stochastic model with time-independent coefficients leads to the conclusion that all of the considered external factors are found to be statistically insignificant, whereas considering the nonstationary model (which is demonstrated to be more appropriate in the sense of information theory) identified by the methodology presented in the paper results in identification of statistically significant external impact factor influences.

1. Introduction

The parameterization of reduced dynamical models describing the behavior of the observed (measured) multiscale data has been a field of intensive research in the last several years. Approaches introduced in atmospheric science include stochastic differential equations (Majda et al. 1999, 2003; Wilks 2005), multiscale schemes (Majda et al. 2003; Fatkullin and Vanden-Eijnden 2004), regression models (Orrell 2003), discrete Markov chain models (Khouider et al. 2003), hidden Markov models (Majda et al. 2006; Horenko et al. 2008a,b), and conditional Markov models (Crommelin and Vanden-Eijnden 2008). In the present paper a purely data-driven approach for parameterization by means of nonstationary multivariate autoregressive factor (VARX) models is introduced, based on the combination of the stationary VARX models widely used in econometrics (Tsay 2005) with the recently introduced finite element method (FEM)-clustering procedure (Horenko 2009, 2010b). The resulting numerical strategy is demonstrated to allow the multiscale approximation of the nonstationary dynamical processes via optimal sequences of locally stationary fast VARX processes and some slow (or persistent) hidden process switching between them. In an atmospheric context it was recently demonstrated that the FEM-clustering framework can be successfully applied to identify large-scale dynamical circulation patterns in realistic atmospheric general circulation models (AGCMs; Franzke et al. 2009). In the current manuscript the FEM-clustering framework is extended to allow for discontinuous hidden processes via the formulation of the respective variational problems in the space of the functions with bounded variation. Applications of the proposed method are demonstrated in two different scenarios and compared with standard purely data-driven methods (Orrell 2003; Wilks 2005): (i) in the context of the subgrid-scale parameterization for the Lorenz ’96 model (Lorenz 1996) with nonstationary forcing in the right-hand side and (ii) in the context of climate impact factor analysis for 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) historical geopotential data in Europe between 1958 and 2003 (Simmons and Gibson 2000).

The outline of the remainder of this paper is as follows. In section 1 the inverse problem for nonstationary dynamical systems is formulated as a clustering problem and it is demonstrated how the multiscale assumption can be incorporated into the resulting variational formulation via the persistency condition in the space of functions with bounded variation. The finite element method is deployed to reduce the dimensionality of the resulting optimization problem, and the iterative numerical scheme is introduced. In section 2 the numerical details of the resulting FEM–VARX clustering method are explained. In section 3 different strategies of postprocessing the clustering results are discussed with regard to their insight into the analyzed data. In section 4 the performance of the presented framework is demonstrated for two practical applications and the discussion is presented in section 5.

2. Constrained clustering method

a. Model distance functional

Let x0, … , xT ∈ Ψ ⊂ Rn be the observed n-dimensional time series with T + 1 snapshots in time interval [0, T]. In the following we will assume that in this time interval the considered time series xt is approximated by a time-discrete output of the certain direct mathematical model

 
formula

where F(·) is the model operator, τ is the model time step, is the memory depth (m = 1 for Markov models), and

 
formula

is a (time-dependent) set of the model parameters (including, if necessary in some model contexts, some initial and/or boundary values), where d is the dimension of a model parameter space. Let

 
formula

be a functional (called the model distance functional) describing the distance between some given xt at time t and the output of the model (1) calculated for a fixed set of parameters θ(t). In this case, for a given observation series x0, … , xT and some fixed functional form g(·), the inverse problem (or the parameter identification problem) can be approached via the solution of the following variational problem:

 
formula

subjected to the constraints given in (2). Problem (4) is clearly ill posed if no special assumptions about the temporal dependence of the unknown parameters θ(t) can be made. In the following we will assume that for any t ∈ [0, T] model distance, functional (3) can be represented as a convex linear combination of K ≥ 1 stationary model distance functionals—that is, model functionals dependent on some constant (time-independent) model parameters θiΩ, i = 1, … , K:

 
formula

with some time-dependent model affiliations γi(t) fulfilling the convexity condition

 
formula
 
formula

In another words, we assume here that at any time t the global time-dependent (or nonstationary) model distance functional (3) can be approximated by one of K local time-independent (or stationary) model distance functionals chosen according to some time-dependent probabilities (or model affiliations) Γ(t) = [γ1(t), … , γK(t)]. This idea for the inverse numerical problems is widely used in the context of data clustering (Höppner et al. 1999); in the presented general form it was introduced in Horenko (2010a) and stems from the classical spline interpolation approach for direct numerical problems (see, e.g., Deuflhard 2004). Inserting the ansatz (5) into (4) results in the minimization of the average clustering functional

 
formula

subject to (2), (6), and (7) with Θ = (θ1, … , θK).

To comprehend the above concepts it is instructive to consider a case where the direct model (1) has a following simple form:

 
formula

where εt is some independent identically distributed (i.i.d.) stochastic variable with zero expectation and θ(t): [0, T] → Rn is a time-dependent parameter describing the evolution of the expectation value of the process xt. The model distance functional (3) in such a case has the form

 
formula

and the corresponding average clustering functional can be numerically minimized applying the standard K-means clustering algorithm (Bezdek 1981; Höppner et al. 1999). This means that with the help of the ansatz (5), the solution of the nonstationary inverse problem for dynamical system (9) can be approached via the iterative clustering algorithm based on the minimization of the average clustering functional (8).

In the following it will be explained in detail how these concepts can be interpreted and applied in context of more general nonstationary multivariate models with external forcing (VARX models).

b. Nonstationary VARX models and VARX model distance functional

The stationary VARX model is a widely used dynamical multivariate tool to investigate the time series subject to external forcing (Brockwell and Davis 2002; Tsay 2005). If, in addition to the time series x0, … , xTRn (describing the internal degrees of freedom of the considered dynamical system), the time series u0, … , uTRl of the external influences (or forcing) is available, the nonstationary nonlinear VARX model has the following form:

 
formula

where ϕ1(xtτ, … , xt): RnmRd is some (in general nonlinear) function connecting the previous observations xtτ, … , xt; ϕ2(u) = {ϕ21[u(t)], … , ϕ2k[u(t)]} : RlRk is some fixed (nonlinear) function of the external factors; εt : [0, T] → Rh(hn) is a Gaussian process with zero expectation; ; μ(t) : [0, T] → Rn; 𝗔(t) : [0, T] → Rn×d; 𝗕(t) : [0, T] → Rn×k; and 𝗖(t) : [0, T] → Rn×h.

The most simple and straightforward form of the VARX model used in the literature is the linear autoregressive factor model with ϕ1(xtτ, … , xt) = [xtτ, … , xt] and a VARX model equation (Brockwell and Davis 2002):

 
formula

We will further assume that the noise matrix 𝗖(t) (describing the coupling between the h-dimensional Gaussian noise process to the analyzed time series xt) for any t ∈ [0, T] can be represented as

 
formula

where 𝗣(t) : [0, T] → Rn×h is an orthogonal matrix function and Λ(t) : [0, T] → Rh×h is a diagonal matrix function with nonnegative diagonal elements. Defining θ(t) = [μ(t), 𝗔(t), 𝗕(t), 𝗖(t)] and under the assumption (13), the VARX model distance functional (3) of dynamical system (11) can be written as

 
formula

where the 𝗣(t)-weighted norm ‖·‖𝗣(t) = [·𝗣(t), 𝗣(t)·]2 is used and † denotes the matrix transposition. The main advantage of the above definition of the least squares residual norm (14)—as compared to the standard Gaussian norm based on the 𝗖(t)-weighted norm ‖·‖𝗖(t) = [·, 𝗖(t)·]2—is that it preserves the norm of the original residuals of the least-squared problem in the essential noise dimension. If the aforementioned assumptions are fulfilled, then it is easy to demonstrate that the time series of model distances g[xt, θ(t)], t = 0, … , T is a χ2 process. In the application examples below it will be demonstrated how this property of the process can be used to estimate the confidence intervals of the model parameters. In the context of stationary VARX models (e.g., in the case of the time-independent parameter matrices 𝗔, 𝗕, and 𝗖) this property can be deployed to a posteriori check the model assumption (11) and to demonstrate the asymptotical normality of the resulting parameter estimates (i.e., to demonstrate that for T → ∞ the parameter estimates are distributed according to the multivariate Gaussian; Reinsel 1993). Applying the ansatz (5), the nonstationary parameter identification problem can be approached via the solution of the minimization problem (8) [with θi = (μi, 𝗔i, 𝗕i, 𝗖i)]. This can be done by means of the iterative clustering algorithm (just as described for the K-means clustering; Höppner et al. 1999; Horenko 2010b).

c. Incorporation of additional information

Direct numerical treatment of the problem (8) is hampered by the following issues: (i) the problem is ill posed since the number of unknowns can be higher then the number of known parameters and (ii) because of the nonlinearity of g, the problem is in general nonconvex and the numerical solution gained with some sort of local minimization algorithm depends on the initial parameter values (Deuflhard 2004). Perhaps it is even more important that the solution Γ of the above constrained minimization task might be an irregular function: to see this, let us assume that we already know the minimizing values Θ* for Θ. Then, the minimizer Γ* for the affiliation vector Γ has the following form:

 
formula

Thus, the datum xt has perfect affiliation with state i if the model distance functional for xt is minimal in state i. That is, if the process exhibits strong variability, then the affiliations are rather nonsmooth functions. Whenever the affiliation functions just take values 0 or 1, we will call them deterministic, in the sense that then for every datum in the time series it is certain to which cluster it belongs.

As demonstrated in the literature (Horenko 2009, 2010b), one of the possibilities for approaching the two aforementioned problems simultaneously is first to incorporate some additional information about the regularity of the observed process [e.g., in the form of smoothness assumptions in the space of time-continuous functions Γ(·)] and then to apply a finite Galerkin discretization (e.g., a FEM discretization) of this infinite-dimensional Hilbert space. In context of Tykhonov-based FEM-clustering methods, this was done assuming the weak differentiability of the time-continuous functions γi; that is,

 
formula

where W1,2(0, T) is the Sobolev space of weakly differentiable functions [e.g., functions with 2(0, T) integrable first derivatives].

As demonstrated in Horenko (2010a), one possibility to incorporate this a priori information from (16) into the optimization is to modify the functional (8) and to write it in the Tykhonov-regularized form

 
formula

However, introduction of the ε2-dependent penalty term in the formulation of the Tykhonov-regularized problem (17) changes the functional form of the original clustering problem (8) and biases the position of the solution of the respective minimization problem with growing ε2 (e.g., the solution of the regularized problem may have a significant deviation from the global minimum of the original problem). As was demonstrated in Horenko (2009, 2010b), increasing the value of ε2 leads to the “smoothing out” of the sharp transitions between the cluster states.1 Moreover, the formulation (16) of the persistency condition in W1,2 sense relies on the differentiability and continuity of the underlying cluster affiliation functions γi(t). This cannot in general be assumed to be granted in the cases where the transitions between the cluster states are sharp [e.g., when the function γi(t) has jumps and is discontinuous].

In the following, an alternative way to incorporate the persistency assumption into the original clustering problem (8) (avoiding the two abovementioned problems) will be presented, based on the formulation of the persistency condition in a functional space allowing for a discontinuity of its elements [functions with bounded variation, BV(0, T)].

d. Persistence in the BV(0, T) sense: The constrained BV-clustering method and FEM discretization

Instead of limiting ∂tγi in the 2 sense [which relies on the differentiability and continuity assumption for the cluster affiliation function γi(t)], we will consider the time-discrete functions γi(t) defined only at the time instances where the observations xt are available. We will formulate the persistency condition in the time-discrete BV(0, T) sense

 
formula

where the persistency parameter C defines the maximal number of transition between the cluster state i and all other states in time interval (0, T). Note that since at least in the time-continuous case W1,2(0, T) ⊂ BV(0, T) (Moreau 1988), this kind of persistency condition will also allow us to preserve the “smooth” W1,2 transitions between the cluster states in the continuous limit by an appropriate choice of the persistency parameter C.

Let 𝗗 be the discrete difference operator (the right-hand derivatives) with regard to t:

 
formula

For a given Θ = (θ1, … , θK), let

 
formula

Then problem (8) transforms to

 
formula

where † denotes the transposition operation. The above problem (21) is subject to the constraints

 
formula
 
formula
 
formula

The direct numerical optimization of the problem (21)(24) is hampered by the fact that the persistency constraint (22) makes the overall problem nondifferentiable. In the following, an adequate transformation of the above problem to the differentiable formulation will be introduced. This will allow us to use the standard numerical optimization methods in the context of (21)(24). We define υi := Dγi and split this into nonpositive and nonnegative parts (following the theorem about the unique representation of the BV functions; cf. Moreau 1988):

 
formula

Moreover, let 𝗗−1 be the discrete integration operator; for example,

 
formula

for some variable (defining the initial value for the cluster affiliation i at time 0). Now we can express γi as

 
formula

Defining

 
formula

we can express the original nondifferentiable optimization problem [(21)(24)] as the following differentiable optimization problem (for more details, see the appendix):

 
formula

in the vector space of the higher dimensionality [since by construction the dimension of variable i defined in (28) is almost twice as high as the dimension of the original variable γi]. The solution of the above minimization problem can be approached via the subspace iteration procedure (e.g., via the solution of the restrained optimization problems in parameter subspaces and θ subsequently). Completely analogously to the Tykhonov-regularized FEM-clustering case (Horenko 2009, 2010b), it can be demonstrated that this subspace iteration procedure converges toward the (local) minimum of the problem (29) if some appropriate assumptions (convexity and differentiability) of the model distance functional (3) are fulfilled.

However, since the dimensionality of the variable is growing as K(2T + 1) with the length of the analyzed time series, the numerical solution of the restrained problem (29) for a fixed value of Θ can become increasingly expensive for long time series. In the following the finite element method will be deployed to reduce the dimensionality of the above problem.

e. FEM discretization

Let {0 = τ0, τ1, τ2, … , τN−1, τN, τN+1 = T} be a finite subset of the interval [0, T] with uniform time steps δt. We can define a set of N ≪ (T + 1) time-discrete functions with bounded variation {f1(t), f2(t), … , fN(t)} defined at T + 1 time points of the observation series x, where each function fi(t) shall take positive values at the observation time instances of x in the time interval (τi−1, τi+1) and be zero at the time instances outside this interval.2 Now we represent the υ’s from (25) by these functions; thus,

 
formula

where χN+ and χN are discrete discretization errors. As δt goes to 1, the discretization errors become zero. This develops into a reduced discrete representation of the discrete problem (29) by using

 
formula

where 𝗪NR(T+1)×N is a FEM-basis matrix and i is defined analogously. Then the previously defined i can be approximated as

 
formula

Defining

 
formula

results in the FEM-discretized problem in the BV sense:

 
formula

Note that x has a dimensionality of K(2N + 1), which is much less then the dimension K(2T + 1) of the original variable if NT. Analogously to the Tykhonov-regularized FEM-clustering problem described in Horenko (2009), adaptive FEM techniques can be deployed to find the optimal set of time intervals δt for a given total discretization error δ̃.3

For a fixed Θ the above minimization problem is a linear minimization with regard to x with linear equality and inequality constraints. This problem can be solved by means of some standard numerical methods of linear programming like simplex or interior point methods. For the fixed parameter x, the above problem is an unconstrained minimization problem that can be solved analytically if the model distance functional g is convex with regard to θ and the problem ∂g(xt, θ)/θ = 0 has a unique analytical solution with regard to θ (as will be demonstrated in the following, this is the case for the VARX models). The iterations can be repeated until the change of the functional value does not exceed some predefined threshold for the change of the functional value. The algorithm is:

  • choose an arbitrary x0

  • set s = 1

  • while not converged repeat

    • step 1: solve (34) with respect to (w.r.t.) Θ for a fixed xs−1 analytically and identify Θs

    • step 2: solve (34) w.r.t. x for a fixed Θs numerically (via linear programming) and identify xs

    • set s = s + 1

Compared to the Tykhonov-regularized FEM-clustering problem as described in Horenko (2009, 2010b), the above problem [(34)] has two major numerical advantages. First, while for the Tykhonov-regularized FEM-clustering problem the persistence could be influenced only indirectly through the choice of the regularization parameter ε2, in the context of (34) it is directly controllable via the persistency threshold C. Second, a sparse linear programming problem is solved in each iteration of the above algorithm (compared to the more numerically expensive quadratic programming in the case of the Tykhonov-regularized FEM-clustering problem).

In the next section it will be demonstrated how the above numerical procedure can be formulated for the parameter identification of nonstationary VARX models (11).

3. FEM–VARX clustering: Parameter estimation and determination of optimal values for number of states and persistency threshold

a. Derivation of the estimator formulas

In every iteration s of the subspace iteration algorithm described above, the unconstrained minimization problem

 
formula

is solved for a fixed value of (s−1) (step 2 of the above algorithm). If the dynamics of the analyzed time series is assumed to be governed by the VARX process (11) with memory m, the model distance functional in the clustering problem formulation will take the form (14). Let 𝗔𝗕i = (μi, 𝗔i, 𝗕i) ∈ Rn×(1+d+k) [where d is the dimensionality of the output of function ϕ1(·)] and Γs−1(t) be the cluster affiliations reconstructed from the current values s−1 via the subsequent application of Eqs. (30) and (32). After inserting Γs−1(t) into the function gi(θi), i = 1, … , K and taking the derivative of the functional L0 (34) w.r.t. the new variables 𝗔𝗕i, we get the optimal estimators for 𝗔𝗕si in the iteration s as the solution of the following system of linear equations:4

 
formula

where ϕ1t = ϕ1(xtτ, … , xt) and ϕ2t = ϕ2[u(t)]. If the matrix (𝗫*s𝗫s) is invertible, then the solution of the above system exists, is unique, and can be expressed as

 
formula

where * denotes the matrix conjugate (Golub and Loan 1989). The optimal estimates of the noise parameters 𝗖i are straightforwardly calculated from the covariance matrices of the model residuals in the cluster state i:

 
formula

Equations (37) and (38) give explicit estimator expressions that are used in step 2 of the subspace iteration algorithm.

As can be seen from the above estimator formulas, from the view point of the inverse numerical problem there is no difference between linear (12) and nonlinear (11) factor models: both result in the solution of a linear least squares problem (36). This is explained by the fact that in both cases the right-hand sides of the models are linear functions of model parameters. However, as will be demonstrated in the following, linear autoregressive models can in some situations provide more insight, allowing us to use the available tools of linear data analysis, and can be successfully applied to analyze realistic (nonlinear) data.

b. FEM–VARX model order selection for fixedK and C

To select a proper model order m and the optimal functional form ϕ2(u) = {ϕ21[u(t)], … , ϕ2k[u(t)]} for the external factors, standard tools of information theory such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) McQuarrie and Tsai (1998) developed for the linear stationary VARX models can be applied a posteriori to the locally stationary VARX models identified via the FEM–VARX procedure. In terms of the information criteria, various models are compared wrt the special functional consistent of the model log likelihood with an added regularization term penalizing the total number of parameters involved in the model (to avoid overfitting). For example, the BIC functional (being in general more robust than AIC; cf. McQuarrie and Tsai 1998) for the cluster state i will have the form

 
formula

where i = γigi(θi) and Ni is the number of the model parameters in the cluster state i. Given any two estimated cluster models [e.g., models with different memory depth and/or different functions ϕ2(u)], the model with the lower value of BIC(i) is the one to be preferred. In the following it will be demonstrated how the criterion (39) can be applied in the praxis to make a decision about the optimal functional form of the local VARX models and to test the significance of different external factors u for the dynamical process explaining the analyzed time series data.

c. Choosing the optimal number of local models K

The upper bound for the number of statistically distinguishable cluster states for each value of the persistency threshold C can be algorithmically estimated in the following way: starting with some a priori chosen (big) K, one solves the optimization problem (34) for different fixed value of C and calculates the confidence intervals of the resulting local parameter matrices Θi, i = 1, … , K (this can be done applying standard bootstrap sampling procedures; see Chernik 2007). If two of the estimated parameter sets for two of the identified cluster states have confidence intervals that overlap in all components, this means that respective clusters are statistically indistinguishable and the whole procedure must be repeated for K = K − 1. If at a certain point all of the matrices are statistically distinguishable, the procedure is stopped and Kmax(C) = K.

Another possible means of estimating the optimal number of clusters can be used if the identified transition process Γ(t) is shown to be Markovian for given K, C. Markovianity can be verified applying some standard tests (e.g., one can check the generator structure of the hidden process; see Metzner et al. 2007).5 In such a case the hidden transition matrix can be calculated and its spectrum can be examined for a presence of the spectral gap (a gap separating the fast and the slow time scales in the Markov dynamics). If the spectral gap is present, then the number of the dominant eigenvalues (i.e., eigenvalues between the spectral gap and 1.0) gives the number of the metastable clusters in the system (Schütte and Huisinga 2003). Positive verification of the Markovianity of the hidden process has an additional advantage: it allows us to construct a reduced dynamical model of the analyzed process and to estimate some dynamical characteristics of the analyzed process (e.g., one can calculate relative statistical weights, mean exit times, and mean first passage times for the identified clusters) (Horenko et al. 2008a).

Verification of the Markov assumption also allows us to construct predictive Markovian models of the persistent dynamical process Γ(t) switching between the cluster states (Horenko et al. 2008a; Horenko 2009). As will be demonstrated for the numerical examples in the following, the respective K × K Markovian transition matrix, together with the parameter estimates (37) and (38) of the locally stationary cluster states, can help to construct the dynamical ensemble prediction models. It is important to mention that without this a posteriori Markov verification, the minimum of the functional (34) cannot be directly used to generate the data-based predictions since the identified cluster affiliation function Γ is just an abstract BV function with some predefined persistency C and has no intrinsic representation in terms of the underlying dynamical process.

d. Selection of the optimal persistency threshold C for a given number of states K

Linearity of the functional (34) wrt for fixed values of model distance parameters θi can help to apply the standard instruments from the theory of ill-posed linear problems, such as the L-curve approach (Calvetti et al. 2000), to identify the optimal value of the persistency threshold C for a fixed number of clusters K. For example, the optimal value of C can be determined as the edge point (or the point of maximal curvature) on a two-dimensional plot. This plot depicts a dependence of the residuum-norm of the solution from C (Horenko 2009). Alternatively, as would be demonstrated in the numerical examples, decreasing C up to some point [meaning the increasing regularity of the optimal solutions of (34) in the BV sense] results in decreasing the respective values L0 of the clustering functional in the solution point (meaning the increasing quality of the resulting clustering). The point with the minimal value of L0 in such a case indicates the optimal persistency threshold guaranteeing the best clustering quality.

4. Postprocessing of the FEM–VARX clustering results

Application of the FEM–VARX numerical scheme with fixed values of K and C results in the identification of K optimal locally stationary VARX models and the persistent BV(0, T) function Γ(t) switching between them. As explained above, the a posteriori verification of the Markovian hypothesis for Γ(t) allows us to construct the reduced representation of the overall dynamics in the multiscale sense (e.g., as some slow persistent Markovian process switching between K locally stationary VARX parameter sets). Postprocessing of the derived local VARX models can give some additional insight into the analyzed time series. For example, expectation values of mean dynamical equilibrium positions of the analyzed dynamical process in the cluster state i can be calculated as functions of the external forcing ut. In the case of the linear autoregressive VARX models (12), this is done via the solution of the following system of linear equations:6

 
formula

Note that if 𝗔qi = 0, ∀ q, then the above result is equivalent to the multivariate trend estimate in context of the recently introduced FEM–K-trends clustering algorithm (Horenko 2010b).

The local linearity of the identified VARX models also allows us to apply various techniques of model reduction known in the literature, such as proper orthogonal decomposition or balanced truncation (Moore 1981) allowing for construction of energy-preserving low-rank approximations to the identified VARX models.

Another kind of insight into the underlying multidimensional dynamics can be gained via the analysis of the Fourier transforms of the identified models (e.g., via the analysis of the transfer function matrices and directed transfer functions; Pereda et al. 2005). This kind of analysis can help to quantify the causal influence of different data dimensions on each other.

5. Numerical examples

In the following we will illustrate the proposed FEM–VARX clustering strategy on two practical examples: first, on data from the Lorenz ’96 model (Lorenz 1996) with external periodical forcing switching between the deterministic and the chaotic regime behavior and second, on a set of averaged daily ERA-40 500-hPa geopotential data between 1958 and 2003 on a 16 × 9 spatial grid covering Europe and part of the North Atlantic.

a. Subgrid-scale modeling for Lorenz ’96 system with forcing

The Lorenz ’96 model of type II (Lorenz 1996; Orrell 2003) is a two-scale simplified ordinary differential equation (ODE) describing advection, damping, and forcing of some (slow) resolved atmospheric variable i being coupled to some (fast) subscale variables yi,j:

 
formula

with all of the model parameters ( = 8, = 4, h = 1, b = c = 10) set to be the same as in the paper of Orrell (2003). In contrast to the paper of Orrell (2003), we choose the external forcing to be an explicitly time-dependent function of the form F(t) = 10 sin(2π/80t), resulting in switching between “deterministic” and “chaotic” regime behavior (Orrell and Smith 2003) on the time scale of the forcing (e.g., producing a system with three different time scales).

We generate one realization of (41) in the time interval [0, T] with an adaptive Runge–Kutta method of the fourth order (MATLAB command ode45), resulting in a 40-dimensional time series with 4000 instances (time step τ = 0.1). We further aim at parameterization of the subgrid-scale influences xti = Fty(t) in the nonstationary VARX form (11) with external factors defined by the resolved variables {ϕ2[u(t)] = [1(t), … , 8(t)]}. Figure 1 demonstrates the application of the FEM–VARX clustering to the time series of xt and ϕ2[u(t)] defined in such a way for K = 2 (number of clusters), q = 1 (memory depth), and with N = 200 (number of finite elements) for various numbers of persistency threshold parameter. The minimum is achieved for C = 6, which corresponds to the number of transitions through the bifurcation point of (41). As can be seen from Fig. 2, the FEM–VARX clustering procedure for K = 2 and C = 6 identifies both dynamical regimes of the model (41), whereas the minimization of (34) without the persistency constraint (or, equivalently, C = 2N = 400) results in a much worse clustering result (in terms of the optimal value of L0) and is due to the trapping in the local minimum of the clustering functional. Inspection of the confidence intervals of the estimated local stationary VARX model parameters in Fig. 3 reveals that whereas the matrices 𝗔 are statistically indistinguishable (e.g., the “own dynamics” of the subgrid-scale process is basically the same in both states), there are statistically significant differences in 𝗕 (in this case it governs the coupling from the resolved degrees of freedom to subgrid-scale degrees of freedom). Also, the noise intensities 𝗖 are different in both cases (in the identified chaotic regime the noise intensity is significantly higher).

Fig. 1.

Optimal value of the functional (34) for different values of persistence threshold C as calculated for the time series of forced Lorenz ’96 model [K = 2, N = 200, parameters of the Lorenz ’96 model as specified in text; each C optimization is repeated 100 times with randomly generated initial values and the result with minimal L0(C) is kept].

Fig. 1.

Optimal value of the functional (34) for different values of persistence threshold C as calculated for the time series of forced Lorenz ’96 model [K = 2, N = 200, parameters of the Lorenz ’96 model as specified in text; each C optimization is repeated 100 times with randomly generated initial values and the result with minimal L0(C) is kept].

Fig. 2.

(a) Cluster affiliation function γ1(t) (the calculation is performed with the FEM–VARX algorithm for C = 6 (dashed) and C = 400 (dotted). (b) Fragment of the subscale data F3y(t) together with the data affiliation corresponding to the cluster affiliation function from (a) (C = 6, N = 200, K = 2).

Fig. 2.

(a) Cluster affiliation function γ1(t) (the calculation is performed with the FEM–VARX algorithm for C = 6 (dashed) and C = 400 (dotted). (b) Fragment of the subscale data F3y(t) together with the data affiliation corresponding to the cluster affiliation function from (a) (C = 6, N = 200, K = 2).

Fig. 3.

(a)–(c) Diagonal elements of the local VARX parameters (C = 6, N = 200, K = 2).

Fig. 3.

(a)–(c) Diagonal elements of the local VARX parameters (C = 6, N = 200, K = 2).

Finally, we verify the Markov assumption for the identified affiliation function Γ(t), generate the ensemble predictions (with 10 000 ensemble members) based on the estimated eight-dimensional FEM–VARX parameterization (based on the first 90% of the data), calculate the expectation values of the relative prediction errors (in 2-norm, based on the last 10% of the data; black solid line in Fig. 4), and compare them with predictions obtained by other data-driven models (see Fig. 4): (i) stationary constant (gray dotted line) and regressive (black dashed–dotted line) subgrid-scale models (Orrell 2003); (ii) stationary linear stochastic model (black dashed line) (Wilks 2005); and (iii) one-dimensional FEM–VARX model, estimated under the assumption that different subgrid-scale processes in different dimensions do not interact (e.g., the matrices 𝗔 are diagonal) (gray solid line). Figure 4 shows that the fully interactive eight-dimensional persistent FEM–VARX model has a much better predictive skill for the considered nonstationary model series.

Fig. 4.

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see description in text).

Fig. 4.

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see description in text).

b. Analysis of ERA-40 geopotential data in Europe (1958–2003)

Using the FEM–VARX method introduced above, we analyze daily mean values of the 500-hPa air temperature field from the ERA-40 reanalysis data (Simmons and Gibson 2000).7 We consider a region with the coordinates 32.5°–75.0°N, 27.5°W–47.5°E, which includes Europe and a part of the eastern North Atlantic. The resolution of the data is 5°, which implies a grid with 16 points in the zonal and 9 in the meridional direction. For the analysis we have considered temperature values only for the period 1 December 1958 to 31 July 2003; thus, we end with a 144-dimensional time series of 16 314 days. To remove the seasonal trend we apply a standard procedure, where from each value in the time series we subtract a mean build over all values corresponding to the same day and month (e.g., from the data on 1 January 1959 we subtract the mean value over all days that are the first of January, and so on). The resulting deviations ΔH(t) of the geopotential heights from their seasonal mean values are the subject of the data analysis in the following.

Since the overall dimensionality of 144 of the analyzed data series will induce uncertainties too high for the given time span of 16 314 days, the scope of the data analysis in the following is reduced to the projections on 20 dominant EOFs (describing approximately 99% of the total data variance).

Three different external influence factors ut are tested on their significance for the change of the EOF projection data xt in the setting of the VARX models (11): (i) atmospheric CO2 values ut1 from the Mauna Loa observatory (data are available online at http://cdiac.ornl.gov/ftp/trends/co2/maunaloa.co2), (ii) the seasonal trend factor in the form ut2 = sin(2π/365.4t), and (iii) the sunspot cycle data ut3 (data are available online at http://solarscience.msfc.nasa.gov/SunspotCycle.shtml).

We start the data analysis parameterizing the globally stationary VARX model for 20 dominant EOFs with different sets of external factors described above for various values of the memory parameter q, applying the BIC test to verify the statistical significance of the factors and testing the independent identically distributed (i.i.d.) assumptions for the model residuals with the AR(1) test (Brockwell and Davis 2002) (to check the validity of the underlying estimator assumptions). It is found that the optimal globally stationary VARX model for the analyzed data is the one with q = 2 and no external factors; that is, in the globally stationary framework all of the external factors are found to be statistically nonsignificant.

Application of the nonstationary FEM–VARX framework according to the K selection procedure described above results in determining K = 4 as the maximal number of statistically distinguishable local VARX states (valid for a wide range of parameters C, N, q). For a fixed value of K = 4 we further determine the optimal value of the persistency threshold C (in the same manner demonstrated in Fig. 1), determine the optimal value of memory q = 2 via the BIC test (e.g., the same as for the globally stationary VARX model), and test the i.i.d. assumptions for the model residuals with the AR(1)-test (Brockwell and Davis 2002). Next, we repeat the verification of external factors in the same way as described above for the globally stationary VARX model. As shown in Fig. 5, the BIC test confirms the hypothesis that the optimal FEM–VARX model is the one with the two external factors ut1 (atmospheric CO2 values) and ut2 (seasonal factor), whereas the influence of the sunspot cycle data ut3 is found to be statistically negligible. Therefore, all of the further tests are conducted only with the factors ut1 and ut2.

Fig. 5.

Differences between BIC (39) as calculated for different EOF dimensions of the VARX models with and without the factors ut1 and ut2 (negative values indicate the EOF dimensions where the influence of both factors is statistically nonsignificant) for the global stationary linear VARX model (dashed) and for the local stationary linear FEM–VARX factor models (calculated with K = 4, N = 4000, C = 3000, q = 2) (solid lines). The dotted zero-line marks the statistical significance level (components above this line are significant in a sense of the BIC criterion).

Fig. 5.

Differences between BIC (39) as calculated for different EOF dimensions of the VARX models with and without the factors ut1 and ut2 (negative values indicate the EOF dimensions where the influence of both factors is statistically nonsignificant) for the global stationary linear VARX model (dashed) and for the local stationary linear FEM–VARX factor models (calculated with K = 4, N = 4000, C = 3000, q = 2) (solid lines). The dotted zero-line marks the statistical significance level (components above this line are significant in a sense of the BIC criterion).

Inspection of the respective cluster affiliation functions γi(t) (see Fig. 6) together with the mean equilibrium positions in the cluster states (40) reveals (see Figs. 7 and 8) that two of the identified cluster states, namely states 3 and 4, are describing the blocking situation in the upper troposphere. Figure 6 shows the comparison of the sum of cluster affiliations γ3(t) + γ4(t) for the par of the time series with the scaled zonally averaged Lejenas–Okland blocking index. It indicates the appearance of a blocking anticyclone and the duration of the event. We have a blocking if the geopotential height difference at 500 hP between 40° and 60°N is negative over a region with 20° zonal extent. The exact formula is given in Lupo et al. (1997); for the purpose of representation, we have computed a zonally averaged value of the index, rescaled it, and reversed its sign.

Fig. 6.

Comparison of the negative Lejenas–Oakland blocking index (dashed line) and the sum of cluster affiliations of locally linear states 3 and 4 (solid line, calculated with FEM–VARX for K = 4, N = 4000, C = 3000, q = 2).

Fig. 6.

Comparison of the negative Lejenas–Oakland blocking index (dashed line) and the sum of cluster affiliations of locally linear states 3 and 4 (solid line, calculated with FEM–VARX for K = 4, N = 4000, C = 3000, q = 2).

Fig. 7.

FEM–VARX cluster state 3 (blocking state): Mean dynamical equilibrium positions [see Eq. (40)] for the deviations ΔH of the geopotential heights (solid) and their confidence intervals (dashed) at different times (revealing the influence of both external factors separately; see explanation in text).

Fig. 7.

FEM–VARX cluster state 3 (blocking state): Mean dynamical equilibrium positions [see Eq. (40)] for the deviations ΔH of the geopotential heights (solid) and their confidence intervals (dashed) at different times (revealing the influence of both external factors separately; see explanation in text).

Fig. 8.

As in Fig. 7, but for FEM–VARX cluster state 4 (blocking state).

Fig. 8.

As in Fig. 7, but for FEM–VARX cluster state 4 (blocking state).

Next we aim at demonstrating how the postprocessing methods described in section 3 can help us gain additional insight into the data to understand the impact of the external factors. Figures 7 and 8 show the EOF back-projection to the original 144-dimensional space of the mean dynamical equilibrium positions in clusters 3 and 4 calculated at different times with the respective values of the factors ut1 and ut2 As can be seen from the graphics, the impact of the seasonal factor ut2 results in significant weakening of the blocking states in summer, a finding consistent with the observations reported in the literature. The impact of the increasing CO2 concentration also results in a weakening of the blocking situation in both states 3 and 4 that is less pronounced but still statistically significant (see the respective confidence intervals in Figs. 7 and 8).

Finally, in the same manner as in the previous nonstationary Lorenz ’96 example, we construct an ensemble prediction model (with 10 000 ensemble members) based on the FEM–VARX clustering results and compare the resulting predictions with the ones obtained by other methods. First, we verify the Markov assumption in the time-homogenous approximation for the identified affiliation function Γ(t), generate the ensemble predictions based on the estimated 20-dimensional FEM–VARX parameterization (based on the first 90% of the data), and calculate the expectation values of the relative prediction errors (in 2-norm, based on the last 10% of the data). As can be seen from Fig. 9, FEM–VARX ensemble predictions produce much better predictions than the constant model (where the prediction is always the expectation value over the whole history) and the “same as today model” (where the prediction for the next day is just the state of the system now). Compared with the globally stationary VARX model (where the ensemble prediction is calculated from 10 000 ensemble members of the global VARX model), FEM–VARX produces only slightly better predictions (approximately 2% better). This observation could be explained by the low overall persistence of the process Γ(t) compared, for example, to the previous Lorenz ’96 example where the hidden process was very persistent. This issue needs a deeper understanding and is a matter of future research.

Fig. 9.

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see detailed description and discussion in text).

Fig. 9.

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see detailed description and discussion in text).

6. Conclusions and discussion

A numerical FEM–VARX scheme for a data-driven parameterization of nonstationary multiscale dynamical processes was introduced. As demonstrated in the present paper, the application of the FEM–VARX method allows for the good description of the analyzed nonlinear and nonstationary data (in terms of recovering the nonlinear effects associated with the regime switching and in terms of the good prediction quality of the resulting reduced representation). Furthermore, a wide range of data-analysis techniques—for example, from information theory (like the Bayesian information criterion; cf. McQuarrie and Tsai 1998), model reduction approaches (like balanced truncation; cf. Moore 1981) and adaptive finite element methods—has become available and can be applied in the FEM–VARX context to postprocess the obtained results and to get additional insight into the analyzed data. In contrast to other multiscale approaches known from the literature (Majda et al. 2003; Fatkullin and Vanden-Eijnden 2004), the presented numerical scheme is not a systematic strategy based on the knowledge of some first principles but is purely data driven and results in an approximation of the analyzed process by means of a sequence of “simple” linear factor models. From this perspective, it is important to put further efforts into development of mixed numerical schemes, combining some features of systematic “first principles” approaches (and with this some a priori knowledge about the analyzed dynamics) with some aspects of purely data-driven approaches (like the one presented in this paper).

One of the main emphases in the present paper was on implicit mathematical assumptions being made at different stages of the derivation of the numerical method and postprocessing of the obtained results. In context of the meteorological application it was shown how big the impact of the implicit stationarity assumption is on the analysis of climate factors’ influence: whereas in the case of the globally stationary VARX model with constant coefficients, the impacts of the seasonal trend and CO2 concentration were found to be statistically insignificant, the nonstationary FEM–VARX clearly reveals the statistical significance of the two factors (see Fig. 5). It is important to emphasize that the applicability of the presented method, as well as subsequent interpretation and postprocessing of the obtained results, is dependent on the fulfillment of the mathematical assumptions involved. Conclusions that are drawn in every specific application case are reliable only if those assumptions are fulfilled; this is one of the most important issues to be kept in mind when applying methods such as the one presented in this paper.

Acknowledgments

The work was partially supported by the DFG SPP 1276 MetStroem “Meteorology and Turbulence Mechanics”, the DFG Research Center “Matheon” (Project E8), and the Center of Scientific Simulations (CSS Berlin).

REFERENCES

REFERENCES
Bezdek
,
J.
,
1981
:
Pattern Recognition with Fuzzy Objective Function Algorithms.
Plenum Press, 256 pp
.
Braess
,
D.
,
2007
:
Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics.
3rd ed. Cambridge University Press, 365 pp
.
Brockwell
,
P.
and
R.
Davis
,
2002
:
Introduction to Time Series and Forecasting.
Springer, 434 pp
.
Calvetti
,
D.
,
S.
Morigi
,
L.
Reichel
, and
F.
Sgallari
,
2000
:
Tikhonov regularization and the L-curve for large discrete ill-posed problems.
J. Comput. Appl. Math.
,
123
:
423
446
.
doi:10.1016/S0377-0427(00)00414-3
.
Chernik
,
M.
,
2007
:
Bootstrap Methods: A Guide for Practitioners and Researchers.
Series in Probability and Statistics, Wiley, 400 pp
.
Crommelin
,
D.
and
E.
Vanden-Eijnden
,
2008
:
Subgrid-scale parameterization with conditional Markov chains.
J. Atmos. Sci.
,
65
:
2661
2675
.
Deuflhard
,
P.
,
2004
:
Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms.
Springer Series in Computational Mathematics, Vol. 35, Springer, 424 pp
.
Fatkullin
,
I.
and
E.
Vanden-Eijnden
,
2004
:
A computational strategy for multiscale systems with applications to the Lorenz ’96 model.
J. Comp. Phys.
,
200
:
605
638
.
Franzke
,
C.
,
I.
Horenko
,
A.
Majda
, and
R.
Klein
,
2009
:
Systematic metastable atmospheric regime identification in an AGCM.
J. Atmos. Sci.
,
66
:
1997
2012
.
Golub
,
G.
and
C. V.
Loan
,
1989
:
Matrix Computations.
2nd ed. Johns Hopkins University Press, 642 pp
.
Höppner
,
F.
,
F.
Klawonn
,
R.
Kruse
, and
T.
Runkler
,
1999
:
Fuzzy Cluster Analysis: Methods for Classification, Data Analysis, and Image Recognition.
John Wiley and Sons, 289 pp
.
Horenko
,
I.
,
2009
:
On robust estimation of low-frequency variability trends in discrete Markovian sequences of atmospheric circulation patterns.
J. Atmos. Sci.
,
66
:
2059
2072
.
Horenko
,
I.
,
2010a
:
Finite element approach to clustering of multidimensional time series.
SIAM J. Sci. Comput.
,
32
:
62
83
.
Horenko
,
I.
,
2010b
:
On clustering of non-stationary meteorological time series.
Dyn. Atmos. Oceans
,
49
:
164
187
.
Horenko
,
I.
,
S.
Dolaptchiev
,
A.
Eliseev
,
I.
Mokhov
, and
R.
Klein
,
2008a
:
Metastable decomposition of high-dimensional meteorological data with gaps.
J. Atmos. Sci.
,
65
:
3479
3496
.
Horenko
,
I.
,
R.
Klein
,
S.
Dolaptchiev
, and
C.
Schuette
,
2008b
:
Automated generation of reduced stochastic weather models. I: Simultaneous dimension and model reduction for time series analysis.
Multiscale Model. Simul.
,
6
:
1125
1145
.
Khouider
,
B.
,
A.
Majda
, and
M.
Katsoulakis
,
2003
:
Coarse-grained stochastic processes for tropical convection and climate.
Proc. Natl. Acad. Sci. USA
,
100
:
11941
11946
.
Lorenz
,
E. N.
,
1996
:
Predictability—A problem partly solved.
Proc. Seminar on Predictability, Vol. 1, Reading, United Kingdom, ECMWF, 1–18
.
Lupo
,
A. R.
,
R. J.
Oglesby
, and
I. I.
Mokhov
,
1997
:
Climatological features of blocking anticyclones: A study of Northern Hemisphere CCM1 model blocking events in present-day and double CO2 concentration atmospheres.
Climate Dyn.
,
13
:
181
195
.
Majda
,
A.
,
I.
Timofeev
, and
E.
Vanden-Eijnden
,
1999
:
Models for stochastic climate prediction.
Proc. Natl. Acad. Sci. USA
,
96
:
14687
14691
.
Majda
,
A.
,
I.
Timofeev
, and
E.
Vanden-Eijnden
,
2003
:
Systematic strategies for stochastic mode reduction in climate.
J. Atmos. Sci.
,
60
:
1705
1722
.
Majda
,
A.
,
C.
Franzke
,
A.
Fischer
, and
D.
Crommelin
,
2006
:
Distinct metastable atmospheric regimes despite nearly Gaussian statistics: A paradigm model.
Proc. Natl. Acad. Sci. USA
,
103
:
8309
8314
.
McQuarrie
,
A.
and
C.
Tsai
,
1998
:
Regression and Time Series Model Selection.
World Scientific, 455 pp
.
Metzner
,
P.
,
I.
Horenko
, and
C.
Schütte
,
2007
:
Generator estimation of Markov jump processes based on incomplete observations nonequidistant in time.
Phys. Rev. E
,
76
:
066702
.
doi:10.1103/PhysRevE.76.066702
.
Moore
,
B.
,
1981
:
Principal component analysis in linear systems: controllability, observability, and model reduction.
IEEE Trans. Autom. Control
,
26
:
17
32
.
Moreau
,
J-J.
,
1988
:
Bounded variation in time.
Topics in Nonsmooth Mechanics, J.-J. Moreau, P. D. Panagiotopoulos, and G. Strang, Eds., Birkhauser, 1–74
.
Orrell
,
D.
,
2003
:
Model error and predictability over different time scales in the Lorenz ‘96 systems.
J. Atmos. Sci.
,
60
:
2219
2228
.
Orrell
,
D.
and
L.
Smith
,
2003
:
The spectral bifurcation diagram: Visualizing bifurcations in high-dimensional systems.
Int. J. Bifurcation Chaos
,
13
:
3015
3027
.
Pereda
,
E.
,
R.
Quiroga
, and
J.
Bhattacharya
,
2005
:
Nonlinear multivariate analysis of neurophysiological signals.
Prog. Neurobiol.
,
77
:
1
37
.
Reinsel
,
G.
,
1993
:
Elements of Multivariate Time Series Analysis.
1st ed. Springer, 263 pp
.
Schütte
,
C.
and
W.
Huisinga
,
2003
:
Biomolecular conformations can be identified as metastable sets of molecular dynamics.
Handbook of Numerical Analysis, P. G. Ciaret and J.-L. Lions, Eds., Vol. X, Elsevier, 699–744
.
Simmons
,
A.
and
J.
Gibson
,
2000
:
The ERA-40 project plan.
ERA-40 Project Rep. Ser. 1, ECMWF, 63 pp
.
Tsay
,
R.
,
2005
:
Analysis of Financial Time Series.
2nd ed. Wiley-Interscience, 605 pp
.
Wilks
,
D. S.
,
2005
:
Effects of stochastic parametrizations in the Lorenz ‘96 model.
Quart. J. Roy. Meteor. Soc.
,
131
:
389
407
.

APPENDIX

Making the BV Problem Differentiable

Inserting (25) and (27) into (21), we get

 
formula

subject to

 
formula
 
formula
 
formula
 
formula

Note that the second expression above is the equivalent transformation of the original equality condition (6); therefore, we cannot just set without losing the preservation of the equality constrain (6) in the transformed optimization problem.8 By defining

 
formula
 
formula
 
formula
 
formula

and using the definition (28), we can rewrite (21)(24) in the matrix form (29).

Footnotes

Corresponding author address: Illia Horenko, Institute of Mathematics, Free University of Berlin, Arnimallee 6, 14195 Berlin, Germany. Email: horenko@math.fu-berlin.de

1

The influence of the Tykhonov regularization in the W1,2 Sobolev norm is equivalent to the action of the diffusion operator ε2Δt on γi(t) [e.g., the increase of the regularization parameter ε2 results in the stronger diffusion of the affiliation function γi(t) through the interface separating the clusters in time]. For a detailed discussion of the W1,2 regularization effects and their influence on different clustering scenarios, see Horenko (2009, 2010b).

2

For practical examples of standard finite element function sets fi(t) (such as linear finite elements) in discrete time, see Horenko (2010b) and Braess (2007).

3

This can be done in a standard way, controlling the norm of the discretization errors χN+, χN locally and applying the multigrid approach to guarantee that |χN±| ≤ δ̃ (Braess 2007).

4

We get use of the convexity of the functional g[xt, θi] = ‖xtμi − 𝗔iϕ1(xtτ, … , xt) − 𝗕iϕ2[u(t)]‖Pi with respect to parameters 𝗔𝗕i and the necessary minimum condition for convex functionals.

5

Note that all of the numerical criteria for verification of the Markov assumption known in the literature imply the stationarity (or homogeneity) of the underlying transition matrix and are in general not applicable if the analyzed process is known to be strongly nonstationary.

6

Equation (40) is derived under the assumptions that the analyzed data x are locally weakly stationary [e.g., in this case, that the expectation value of x is (locally) time-independent] and the external noise process εt is i.i.d. and has a zero expectation.

7

The author thanks the Potsdam Institute for Climate Impact Research (PIK) for access to the ERA-40 data.

8

One could take the condition as an additional explicit constraint, ending up with two equality constrains instead of one. This will obviously increase the overall numerical cost of the method.