## 1. Introduction

The parameterization of reduced dynamical models describing the behavior of the observed (measured) multiscale data has been a field of intensive research in the last several years. Approaches introduced in atmospheric science include stochastic differential equations (Majda et al. 1999, 2003; Wilks 2005), multiscale schemes (Majda et al. 2003; Fatkullin and Vanden-Eijnden 2004), regression models (Orrell 2003), discrete Markov chain models (Khouider et al. 2003), hidden Markov models (Majda et al. 2006; Horenko et al. 2008a,b), and conditional Markov models (Crommelin and Vanden-Eijnden 2008). In the present paper a purely data-driven approach for parameterization by means of nonstationary multivariate autoregressive factor (VARX) models is introduced, based on the combination of the stationary VARX models widely used in econometrics (Tsay 2005) with the recently introduced finite element method (FEM)-clustering procedure (Horenko 2009, 2010b). The resulting numerical strategy is demonstrated to allow the multiscale approximation of the nonstationary dynamical processes via optimal sequences of locally stationary fast VARX processes and some slow (or persistent) hidden process switching between them. In an atmospheric context it was recently demonstrated that the FEM-clustering framework can be successfully applied to identify large-scale dynamical circulation patterns in realistic atmospheric general circulation models (AGCMs; Franzke et al. 2009). In the current manuscript the FEM-clustering framework is extended to allow for discontinuous hidden processes via the formulation of the respective variational problems in the space of the functions with bounded variation. Applications of the proposed method are demonstrated in two different scenarios and compared with standard purely data-driven methods (Orrell 2003; Wilks 2005): (i) in the context of the subgrid-scale parameterization for the Lorenz ’96 model (Lorenz 1996) with nonstationary forcing in the right-hand side and (ii) in the context of climate impact factor analysis for 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) historical geopotential data in Europe between 1958 and 2003 (Simmons and Gibson 2000).

The outline of the remainder of this paper is as follows. In section 1 the inverse problem for nonstationary dynamical systems is formulated as a clustering problem and it is demonstrated how the multiscale assumption can be incorporated into the resulting variational formulation via the persistency condition in the space of functions with bounded variation. The finite element method is deployed to reduce the dimensionality of the resulting optimization problem, and the iterative numerical scheme is introduced. In section 2 the numerical details of the resulting FEM–VARX clustering method are explained. In section 3 different strategies of postprocessing the clustering results are discussed with regard to their insight into the analyzed data. In section 4 the performance of the presented framework is demonstrated for two practical applications and the discussion is presented in section 5.

## 2. Constrained clustering method

### a. Model distance functional

*x*

_{0}, … ,

*x*∈ Ψ ⊂

_{T}**R**

*be the observed*

^{n}*n*-dimensional time series with

*T*+ 1 snapshots in time interval [0,

*T*]. In the following we will assume that in this time interval the considered time series

**x**

*is approximated by a time-discrete output of the certain direct mathematical model*

_{t}**F**(·) is the model operator,

*τ*is the model time step,

*mτ*is the memory depth (

*m*= 1 for Markov models), and

*d*is the dimension of a model parameter space. Let

**x**

*at time*

_{t}*t*and the output of the model (1) calculated for a fixed set of parameters

**(**

*θ**t*). In this case, for a given observation series

*x*

_{0}, … ,

*x*and some fixed functional form

_{T}*g*(·), the inverse problem (or the parameter identification problem) can be approached via the solution of the following variational problem:

**(**

*θ**t*) can be made. In the following we will assume that for any

*t*∈ [0,

*T*] model distance, functional (3) can be represented as a convex linear combination of

*K*≥ 1 stationary model distance functionals—that is, model functionals dependent on some constant (time-independent) model parameters

*θ*∈

_{i}**Ω**,

*i*= 1, … ,

*K*:

*γ*(

_{i}*t*) fulfilling the convexity condition

*t*the global time-dependent (or nonstationary) model distance functional (3) can be approximated by one of

*K*local time-independent (or stationary) model distance functionals chosen according to some time-dependent probabilities (or model affiliations)

**Γ**(

*t*) = [

*γ*

_{1}(

*t*), … ,

*γ*(

_{K}*t*)]. This idea for the inverse numerical problems is widely used in the context of data clustering (Höppner et al. 1999); in the presented general form it was introduced in Horenko (2010a) and stems from the classical spline interpolation approach for direct numerical problems (see, e.g., Deuflhard 2004). Inserting the ansatz (5) into (4) results in the minimization of the average clustering functional

**Θ**= (

*θ*

_{1}, … ,

*θ*).

_{K}*is some independent identically distributed (i.i.d.) stochastic variable with zero expectation*

**ϵ**_{t}**(**

*θ**t*): [0,

*T*] →

**R**

*is a time-dependent parameter describing the evolution of the expectation value of the process*

^{n}**x**

*. The model distance functional (3) in such a case has the form*

_{t}*K*-means clustering algorithm (Bezdek 1981; Höppner et al. 1999). This means that with the help of the ansatz (5), the solution of the nonstationary inverse problem for dynamical system (9) can be approached via the iterative clustering algorithm based on the minimization of the average clustering functional (8).

In the following it will be explained in detail how these concepts can be interpreted and applied in context of more general nonstationary multivariate models with external forcing (VARX models).

### b. Nonstationary VARX models and VARX model distance functional

*x*

_{0}, … ,

*x*∈

_{T}**R**

*(describing the internal degrees of freedom of the considered dynamical system), the time series*

^{n}*u*

_{0}, … ,

*u*∈

_{T}**R**

*of the external influences (or forcing) is available, the nonstationary nonlinear VARX model has the following form:*

^{l}

*ϕ*_{1}(

*x*

_{t−τ}, … ,

*x*

_{t−mτ}):

**R**

^{nm}→

**R**

^{d}is some (in general nonlinear) function connecting the previous observations

*x*

_{t−τ}, … ,

*x*

_{t−mτ};

*ϕ*_{2}(

*u*) = {

*ϕ*

_{2}

^{1}[

*u*(

*t*)], … ,

*ϕ*

_{2}

^{k}[

*u*(

*t*)]} :

**R**

^{l}→

**R**

^{k}is some fixed (nonlinear) function of the external factors;

*: [0,*

**ϵ**_{t}*T*] →

**R**

^{h}(

*h*≪

*n*) is a Gaussian process with zero expectation;

**(**

*μ**t*) : [0,

*T*] →

**R**

^{n}; 𝗔(

*t*) : [0,

*T*] →

**R**

^{n×d}; 𝗕(

*t*) : [0,

*T*] →

**R**

^{n×k}; and 𝗖(

*t*) : [0,

*T*] →

**R**

^{n×h}.

**ϕ**_{1}(

*x*

_{t−τ}, … ,

*x*

_{t−mτ}) = [

*x*

_{t−τ}, … ,

*x*

_{t−mτ}] and a VARX model equation (Brockwell and Davis 2002):

*t*) (describing the coupling between the

*h*-dimensional Gaussian noise process to the analyzed time series

**x**

_{t}) for any

*t*∈ [0,

*T*] can be represented as

*t*) : [0,

*T*] →

**R**

^{n×h}is an orthogonal matrix function and

**Λ**(

*t*) : [0,

*T*] →

**R**

^{h×h}is a diagonal matrix function with nonnegative diagonal elements. Defining

**(**

*θ**t*) = [

**(**

*μ**t*), 𝗔(

*t*), 𝗕(

*t*), 𝗖(

*t*)] and under the assumption (13), the VARX model distance functional (3) of dynamical system (11) can be written as

*t*)-weighted norm ‖·‖

_{𝗣(t)}=

*t*), 𝗣

^{†}(

*t*)·]

_{2}

*t*)-weighted norm ‖·‖

_{𝗖(t)}=

*t*)·]

_{2}

*g*[

**x**

*,*

_{t}**(**

*θ**t*)],

*t*= 0, … ,

*T*is a

*χ*

^{2}process. In the application examples below it will be demonstrated how this property of the process can be used to estimate the confidence intervals of the model parameters. In the context of stationary VARX models (e.g., in the case of the time-independent parameter matrices 𝗔, 𝗕, and 𝗖) this property can be deployed to a posteriori check the model assumption (11) and to demonstrate the asymptotical normality of the resulting parameter estimates (i.e., to demonstrate that for

*T*→ ∞ the parameter estimates are distributed according to the multivariate Gaussian; Reinsel 1993). Applying the ansatz (5), the nonstationary parameter identification problem can be approached via the solution of the minimization problem (8) [with

*= (*

**θ**_{i}*, 𝗔*

**μ**^{i}*, 𝗕*

^{i}*, 𝗖*

^{i}*)]. This can be done by means of the iterative clustering algorithm (just as described for the*

^{i}*K*-means clustering; Höppner et al. 1999; Horenko 2010b).

### c. Incorporation of additional information

*g*, the problem is in general nonconvex and the numerical solution gained with some sort of local minimization algorithm depends on the initial parameter values (Deuflhard 2004). Perhaps it is even more important that the solution

**Γ**of the above constrained minimization task might be an irregular function: to see this, let us assume that we already know the minimizing values

**Θ**

_{*}for

**Θ**. Then, the minimizer

**Γ**

_{*}for the affiliation vector

**Γ**has the following form:

**x**

*has perfect affiliation with state*

_{t}*i*if the model distance functional for

**x**

*is minimal in state*

_{t}*i*. That is, if the process exhibits strong variability, then the affiliations are rather nonsmooth functions. Whenever the affiliation functions just take values 0 or 1, we will call them

*deterministic*, in the sense that then for every datum in the time series it is certain to which cluster it belongs.

**Γ**(·)] and then to apply a finite Galerkin discretization (e.g., a FEM discretization) of this infinite-dimensional Hilbert space. In context of Tykhonov-based FEM-clustering methods, this was done assuming the weak differentiability of the time-continuous functions

*γ*; that is,

_{i}*W*

_{1,2}(0,

*T*) is the Sobolev space of weakly differentiable functions [e.g., functions with

_{2}(0,

*T*) integrable first derivatives].

However, introduction of the *ϵ*^{2}-dependent penalty term in the formulation of the Tykhonov-regularized problem (17) changes the functional form of the original clustering problem (8) and biases the position of the solution of the respective minimization problem with growing *ϵ*^{2} (e.g., the solution of the regularized problem may have a significant deviation from the global minimum of the original problem). As was demonstrated in Horenko (2009, 2010b), increasing the value of *ϵ*^{2} leads to the “smoothing out” of the sharp transitions between the cluster states.^{1} Moreover, the formulation (16) of the persistency condition in *W*_{1,2} sense relies on the differentiability and continuity of the underlying cluster affiliation functions *γ _{i}*(

*t*). This cannot in general be assumed to be granted in the cases where the transitions between the cluster states are sharp [e.g., when the function

*γ*(

_{i}*t*) has jumps and is discontinuous].

In the following, an alternative way to incorporate the persistency assumption into the original clustering problem (8) (avoiding the two abovementioned problems) will be presented, based on the formulation of the persistency condition in a functional space allowing for a discontinuity of its elements [functions with bounded variation, BV(0, *T*)].

### d. Persistence in the BV(0, T) sense: The constrained BV-clustering method and FEM discretization

*in the*

_{t}γ_{i}^{2}sense [which relies on the differentiability and continuity assumption for the cluster affiliation function

*γ*(

_{i}*t*)], we will consider the time-discrete functions

*γ*(

_{i}*t*) defined only at the time instances where the observations

**x**

*are available. We will formulate the persistency condition in the time-discrete BV(0,*

_{t}*T*) sense

*C*defines the maximal number of transition between the cluster state

*i*and all other states in time interval (0,

*T*). Note that since at least in the time-continuous case

*W*

_{1,2}(0,

*T*) ⊂ BV(0,

*T*) (Moreau 1988), this kind of persistency condition will also allow us to preserve the “smooth”

*W*

_{1,2}transitions between the cluster states in the continuous limit by an appropriate choice of the persistency parameter

*C*.

*t*:

**Θ**= (

*θ*

_{1}, … ,

*θ*), let

_{K}*υ*:=

_{i}**D**

*γ*

_{i}^{†}and split this into nonpositive and nonnegative parts (following the theorem about the unique representation of the BV functions; cf. Moreau 1988):

^{−1}be the discrete integration operator; for example,

*i*at time 0). Now we can express

*γ*

_{i}^{†}as

*x̃*defined in (28) is almost twice as high as the dimension of the original variable

_{i}*γ*]. The solution of the above minimization problem can be approached via the subspace iteration procedure (e.g., via the solution of the restrained optimization problems in parameter subspaces

_{i}**x̃**and

**subsequently). Completely analogously to the Tykhonov-regularized FEM-clustering case (Horenko 2009, 2010b), it can be demonstrated that this subspace iteration procedure converges toward the (local) minimum of the problem (29) if some appropriate assumptions (convexity and differentiability) of the model distance functional (3) are fulfilled.**

*θ*However, since the dimensionality of the variable **x̃** is growing as *K*(2*T* + 1) with the length of the analyzed time series, the numerical solution of the restrained problem (29) for a fixed value of **Θ** can become increasingly expensive for long time series. In the following the finite element method will be deployed to reduce the dimensionality of the above problem.

### e. FEM discretization

*τ*

_{0},

*τ*

_{1},

*τ*

_{2}, … ,

*τ*

_{N−1},

*τ*,

_{N}*τ*

_{N+1}=

*T*} be a finite subset of the interval [0,

*T*] with uniform time steps

*δ*. We can define a set of

_{t}*N*≪ (

*T*+ 1) time-discrete functions with bounded variation {

*f*

_{1}(

*t*),

*f*

_{2}(

*t*), … ,

*f*(

_{N}*t*)} defined at

*T*+ 1 time points of the observation series

*x*, where each function

*f*(

_{i}*t*) shall take positive values at the observation time instances of

*x*in the time interval (

*τ*

_{i−1},

*τ*

_{i+1}) and be zero at the time instances outside this interval.

^{2}Now we represent the

*υ*’s from (25) by these functions; thus,

*χ*

_{N}^{+}and

*χ*

_{N}^{−}are discrete discretization errors. As

*δ*goes to 1, the discretization errors become zero. This develops into a reduced discrete representation of the discrete problem (29) by using

_{t}*∈*

_{N}**R**

^{(T+1)×N}is a FEM-basis matrix and

**ṽ**

_{i}

^{−}is defined analogously. Then the previously defined

*x̃*can be approximated as

_{i}**x**

*K*(2

*N*+ 1), which is much less then the dimension

*K*(2

*T*+ 1) of the original variable

**x̃**if

*N*≪

*T*. Analogously to the Tykhonov-regularized FEM-clustering problem described in Horenko (2009), adaptive FEM techniques can be deployed to find the optimal set of time intervals

*δ*for a given total discretization error

_{t}**.**

*δ̃*^{3}

For a fixed **Θ** the above minimization problem is a linear minimization with regard to **x****x***g* is convex with regard to *θ* and the problem ∂*g*(*x _{t}*,

*θ*)/

*θ*= 0 has a unique analytical solution with regard to

*θ*(as will be demonstrated in the following, this is the case for the VARX models). The iterations can be repeated until the change of the functional value does not exceed some predefined threshold for the change of the functional value. The algorithm is:

choose an arbitrary

*x*^{0}set

*s*= 1while not converged repeat

Compared to the Tykhonov-regularized FEM-clustering problem as described in Horenko (2009, 2010b), the above problem [(34)] has two major numerical advantages. First, while for the Tykhonov-regularized FEM-clustering problem the persistence could be influenced only indirectly through the choice of the regularization parameter *ϵ*^{2}, in the context of (34) it is directly controllable via the persistency threshold *C*. Second, a sparse linear programming problem is solved in each iteration of the above algorithm (compared to the more numerically expensive quadratic programming in the case of the Tykhonov-regularized FEM-clustering problem).

In the next section it will be demonstrated how the above numerical procedure can be formulated for the parameter identification of nonstationary VARX models (11).

## 3. FEM–VARX clustering: Parameter estimation and determination of optimal values for number of states and persistency threshold

### a. Derivation of the estimator formulas

*s*of the subspace iteration algorithm described above, the unconstrained minimization problem

*x̃*

^{(s−1)}(step 2 of the above algorithm). If the dynamics of the analyzed time series is assumed to be governed by the VARX process (11) with memory

*m*, the model distance functional in the clustering problem formulation will take the form (14). Let 𝗔𝗕

*= (*

^{i}*μ*, 𝗔

^{i}*, 𝗕*

^{i}*) ∈*

^{i}**R**

^{n×(1+d+k)}[where

*d*is the dimensionality of the output of function

*ϕ*

_{1}(·)] and Γ

^{s−1}(

*t*) be the cluster affiliations reconstructed from the current values

*x̃*

^{s−1}via the subsequent application of Eqs. (30) and (32). After inserting Γ

^{s−1}(

*t*) into the function

*g*(

_{i}*θ*),

_{i}*i*= 1, … ,

*K*and taking the derivative of the functional

**L**

_{0}(34) w.r.t. the new variables 𝗔𝗕

*, we get the optimal estimators for 𝗔𝗕*

^{i}*in the iteration*

_{s}^{i}*s*as the solution of the following system of linear equations:

^{4}

*ϕ*_{1}

^{t}=

*ϕ*_{1}(

*x*

_{t−τ}, … ,

*x*

_{t−mτ}) and

*ϕ*_{2}

^{t}=

*ϕ*_{2}[

*u*(

*t*)]. If the matrix (𝗫*

_{s}𝗫

_{s}) is invertible, then the solution of the above system exists, is unique, and can be expressed as

*are straightforwardly calculated from the covariance matrices of the model residuals in the cluster state*

^{i}*i*:

As can be seen from the above estimator formulas, from the view point of the inverse numerical problem there is no difference between linear (12) and nonlinear (11) factor models: both result in the solution of a linear least squares problem (36). This is explained by the fact that in both cases the right-hand sides of the models are linear functions of model parameters. However, as will be demonstrated in the following, linear autoregressive models can in some situations provide more insight, allowing us to use the available tools of linear data analysis, and can be successfully applied to analyze realistic (nonlinear) data.

### b. FEM–VARX model order selection for fixedK and C

*m*and the optimal functional form

*ϕ*_{2}(

*u*) = {

*ϕ*

_{2}

^{1}[

*u*(

*t*)], … ,

*ϕ*

_{2}

^{k}[

*u*(

*t*)]} for the external factors, standard tools of information theory such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) McQuarrie and Tsai (1998) developed for the linear stationary VARX models can be applied a posteriori to the locally stationary VARX models identified via the FEM–VARX procedure. In terms of the information criteria, various models are compared wrt the special functional consistent of the model log likelihood with an added regularization term penalizing the total number of parameters involved in the model (to avoid overfitting). For example, the BIC functional (being in general more robust than AIC; cf. McQuarrie and Tsai 1998) for the cluster state

*i*will have the form

_{i}=

*γ*

_{i}g_{i}^{†}(

*θ*) and

_{i}*N*is the number of the model parameters in the cluster state

_{i}*i*. Given any two estimated cluster models [e.g., models with different memory depth and/or different functions

*ϕ*_{2}(

*u*)], the model with the lower value of BIC(

*i*) is the one to be preferred. In the following it will be demonstrated how the criterion (39) can be applied in the praxis to make a decision about the optimal functional form of the local VARX models and to test the significance of different external factors

*u*for the dynamical process explaining the analyzed time series data.

### c. Choosing the optimal number of local models K

The upper bound for the number of statistically distinguishable cluster states for each value of the persistency threshold *C* can be algorithmically estimated in the following way: starting with some a priori chosen (big) *K*, one solves the optimization problem (34) for different fixed value of *C* and calculates the confidence intervals of the resulting local parameter matrices **Θ*** ^{i}*,

*i*= 1, … ,

*K*(this can be done applying standard bootstrap sampling procedures; see Chernik 2007). If two of the estimated parameter sets for two of the identified cluster states have confidence intervals that overlap in all components, this means that respective clusters are statistically indistinguishable and the whole procedure must be repeated for

*K*=

*K*− 1. If at a certain point all of the matrices are statistically distinguishable, the procedure is stopped and

*K*

_{max}(

*C*) =

*K*.

Another possible means of estimating the optimal number of clusters can be used if the identified transition process Γ(*t*) is shown to be Markovian for given *K*, *C*. Markovianity can be verified applying some standard tests (e.g., one can check the generator structure of the hidden process; see Metzner et al. 2007).^{5} In such a case the hidden transition matrix can be calculated and its spectrum can be examined for a presence of the spectral gap (a gap separating the fast and the slow time scales in the Markov dynamics). If the spectral gap is present, then the number of the dominant eigenvalues (i.e., eigenvalues between the spectral gap and 1.0) gives the number of the metastable clusters in the system (Schütte and Huisinga 2003). Positive verification of the Markovianity of the hidden process has an additional advantage: it allows us to construct a reduced dynamical model of the analyzed process and to estimate some dynamical characteristics of the analyzed process (e.g., one can calculate relative statistical weights, mean exit times, and mean first passage times for the identified clusters) (Horenko et al. 2008a).

Verification of the Markov assumption also allows us to construct predictive Markovian models of the persistent dynamical process Γ(*t*) switching between the cluster states (Horenko et al. 2008a; Horenko 2009). As will be demonstrated for the numerical examples in the following, the respective *K* × *K* Markovian transition matrix, together with the parameter estimates (37) and (38) of the locally stationary cluster states, can help to construct the dynamical ensemble prediction models. It is important to mention that without this a posteriori Markov verification, the minimum of the functional (34) cannot be directly used to generate the data-based predictions since the identified cluster affiliation function Γ is just an abstract BV function with some predefined persistency *C* and has no intrinsic representation in terms of the underlying dynamical process.

### d. Selection of the optimal persistency threshold C for a given number of states K

Linearity of the functional (34) wrt **x̃** for *fixed* values of model distance parameters *θ _{i}* can help to apply the standard instruments from the theory of ill-posed linear problems, such as the

*L*-curve approach (Calvetti et al. 2000), to identify the optimal value of the persistency threshold

*C*for a fixed number of clusters

*K*. For example, the optimal value of

*C*can be determined as the edge point (or the point of maximal curvature) on a two-dimensional plot. This plot depicts a dependence of the residuum-norm of the solution from

*C*(Horenko 2009). Alternatively, as would be demonstrated in the numerical examples, decreasing

*C*up to some point [meaning the increasing regularity of the optimal solutions of (34) in the BV sense] results in decreasing the respective values

**L**

_{0}of the clustering functional in the solution point (meaning the increasing quality of the resulting clustering). The point with the minimal value of

**L**

_{0}in such a case indicates the optimal persistency threshold guaranteeing the best clustering quality.

## 4. Postprocessing of the FEM–VARX clustering results

*K*and

*C*results in the identification of

*K*optimal locally stationary VARX models and the persistent BV(0,

*T*) function Γ(

*t*) switching between them. As explained above, the a posteriori verification of the Markovian hypothesis for Γ(

*t*) allows us to construct the reduced representation of the overall dynamics in the multiscale sense (e.g., as some slow persistent Markovian process switching between

*K*locally stationary VARX parameter sets). Postprocessing of the derived local VARX models can give some additional insight into the analyzed time series. For example, expectation values of mean dynamical equilibrium positions

*i*can be calculated as functions of the external forcing

*u*. In the case of the linear autoregressive VARX models (12), this is done via the solution of the following system of linear equations:

_{t}^{6}

*= 0, ∀*

_{q}^{i}*q*, then the above result is equivalent to the multivariate trend estimate in context of the recently introduced FEM–

*K*-trends clustering algorithm (Horenko 2010b).

The local linearity of the identified VARX models also allows us to apply various techniques of model reduction known in the literature, such as proper orthogonal decomposition or balanced truncation (Moore 1981) allowing for construction of energy-preserving low-rank approximation*s* to the identified VARX models.

Another kind of insight into the underlying multidimensional dynamics can be gained via the analysis of the Fourier transforms of the identified models (e.g., via the analysis of the transfer function matrices and directed transfer functions; Pereda et al. 2005). This kind of analysis can help to quantify the causal influence of different data dimensions on each other.

## 5. Numerical examples

In the following we will illustrate the proposed FEM–VARX clustering strategy on two practical examples: first, on data from the Lorenz ’96 model (Lorenz 1996) with external periodical forcing switching between the deterministic and the chaotic regime behavior and second, on a set of averaged daily ERA-40 500-hPa geopotential data between 1958 and 2003 on a 16 × 9 spatial grid covering Europe and part of the North Atlantic.

### a. Subgrid-scale modeling for Lorenz ’96 system with forcing

*x̃*being coupled to some (fast) subscale variables

_{i}*y*

_{i,j}:

*h*= 1,

*b*=

*c*= 10) set to be the same as in the paper of Orrell (2003). In contrast to the paper of Orrell (2003), we choose the external forcing to be an explicitly time-dependent function of the form

*F*(

*t*) = 10 sin(2

*π*/80

*t*), resulting in switching between “deterministic” and “chaotic” regime behavior (Orrell and Smith 2003) on the time scale of the forcing (e.g., producing a system with three different time scales).

We generate one realization of (41) in the time interval [0, *T*] with an adaptive Runge–Kutta method of the fourth order (MATLAB command ode45), resulting in a 40-dimensional time series with 4000 instances (time step *τ* = 0.1). We further aim at parameterization of the subgrid-scale influences *x _{t}^{i}* =

**F**

*(*

_{t}^{y}*t*) in the nonstationary VARX form (11) with external factors defined by the resolved variables {

*ϕ*_{2}[

*u*(

*t*)] = [

*x̃*

_{1}(

*t*), … ,

*x̃*

_{8}(

*t*)]}. Figure 1 demonstrates the application of the FEM–VARX clustering to the time series of

**x**

*and*

_{t}

*ϕ*_{2}[

*u*(

*t*)] defined in such a way for

*K*= 2 (number of clusters),

*q*= 1 (memory depth), and with

*N*= 200 (number of finite elements) for various numbers of persistency threshold parameter. The minimum is achieved for

*C*= 6, which corresponds to the number of transitions through the bifurcation point of (41). As can be seen from Fig. 2, the FEM–VARX clustering procedure for

*K*= 2 and

*C*= 6 identifies both dynamical regimes of the model (41), whereas the minimization of (34) without the persistency constraint (or, equivalently,

*C*= 2

*N*= 400) results in a much worse clustering result (in terms of the optimal value of

**L**

_{0}) and is due to the trapping in the local minimum of the clustering functional. Inspection of the confidence intervals of the estimated local stationary VARX model parameters in Fig. 3 reveals that whereas the matrices 𝗔 are statistically indistinguishable (e.g., the “own dynamics” of the subgrid-scale process is basically the same in both states), there are statistically significant differences in 𝗕 (in this case it governs the coupling from the resolved degrees of freedom

**x̃**to subgrid-scale degrees of freedom). Also, the noise intensities 𝗖 are different in both cases (in the identified chaotic regime the noise intensity is significantly higher).

Finally, we verify the Markov assumption for the identified affiliation function Γ(*t*), generate the ensemble predictions (with 10 000 ensemble members) based on the estimated eight-dimensional FEM–VARX parameterization (based on the first 90% of the data), calculate the expectation values of the relative prediction errors (in 2-norm, based on the last 10% of the data; black solid line in Fig. 4), and compare them with predictions obtained by other data-driven models (see Fig. 4): (i) stationary constant (gray dotted line) and regressive (black dashed–dotted line) subgrid-scale models (Orrell 2003); (ii) stationary linear stochastic model (black dashed line) (Wilks 2005); and (iii) one-dimensional FEM–VARX model, estimated under the assumption that different subgrid-scale processes in different dimensions do not interact (e.g., the matrices 𝗔 are diagonal) (gray solid line). Figure 4 shows that the fully interactive eight-dimensional persistent FEM–VARX model has a much better predictive skill for the considered nonstationary model series.

### b. Analysis of ERA-40 geopotential data in Europe (1958–2003)

Using the FEM–VARX method introduced above, we analyze daily mean values of the 500-hPa air temperature field from the ERA-40 reanalysis data (Simmons and Gibson 2000).^{7} We consider a region with the coordinates 32.5°–75.0°N, 27.5°W–47.5°E, which includes Europe and a part of the eastern North Atlantic. The resolution of the data is 5°, which implies a grid with 16 points in the zonal and 9 in the meridional direction. For the analysis we have considered temperature values only for the period 1 December 1958 to 31 July 2003; thus, we end with a 144-dimensional time series of 16 314 days. To remove the seasonal trend we apply a standard procedure, where from each value in the time series we subtract a mean build over all values corresponding to the same day and month (e.g., from the data on 1 January 1959 we subtract the mean value over all days that are the first of January, and so on). The resulting deviations Δ*H*(*t*) of the geopotential heights from their seasonal mean values are the subject of the data analysis in the following.

Since the overall dimensionality of 144 of the analyzed data series will induce uncertainties too high for the given time span of 16 314 days, the scope of the data analysis in the following is reduced to the projections on 20 dominant EOFs (describing approximately 99% of the total data variance).

Three different external influence factors **u*** _{t}* are tested on their significance for the change of the EOF projection data

**x**

*in the setting of the VARX models (11): (i) atmospheric CO*

_{t}_{2}values

*u*

_{t}^{1}from the Mauna Loa observatory (data are available online at http://cdiac.ornl.gov/ftp/trends/co2/maunaloa.co2), (ii) the seasonal trend factor in the form

*u*

_{t}^{2}= sin(2

*π*/365.4

*t*), and (iii) the sunspot cycle data

*u*

_{t}^{3}(data are available online at http://solarscience.msfc.nasa.gov/SunspotCycle.shtml).

We start the data analysis parameterizing the globally stationary VARX model for 20 dominant EOFs with different sets of external factors described above for various values of the memory parameter *q*, applying the BIC test to verify the statistical significance of the factors and testing the independent identically distributed (i.i.d.) assumptions for the model residuals with the AR(1) test (Brockwell and Davis 2002) (to check the validity of the underlying estimator assumptions). It is found that the optimal globally stationary VARX model for the analyzed data is the one with *q* = 2 and no external factors; that is, in the globally stationary framework all of the external factors are found to be statistically nonsignificant.

Application of the nonstationary FEM–VARX framework according to the *K* selection procedure described above results in determining *K* = 4 as the maximal number of statistically distinguishable local VARX states (valid for a wide range of parameters *C*, *N*, *q*). For a fixed value of *K* = 4 we further determine the optimal value of the persistency threshold *C* (in the same manner demonstrated in Fig. 1), determine the optimal value of memory *q* = 2 via the BIC test (e.g., the same as for the globally stationary VARX model), and test the i.i.d. assumptions for the model residuals with the AR(1)-test (Brockwell and Davis 2002). Next, we repeat the verification of external factors in the same way as described above for the globally stationary VARX model. As shown in Fig. 5, the BIC test confirms the hypothesis that the optimal FEM–VARX model is the one with the two external factors *u _{t}*

^{1}(atmospheric CO

_{2}values) and

*u*

_{t}^{2}(seasonal factor), whereas the influence of the sunspot cycle data

*u*

_{t}^{3}is found to be statistically negligible. Therefore, all of the further tests are conducted only with the factors

*u*

_{t}^{1}and

*u*

_{t}^{2}.

Inspection of the respective cluster affiliation functions *γ _{i}*(

*t*) (see Fig. 6) together with the mean equilibrium positions in the cluster states (40) reveals (see Figs. 7 and 8) that two of the identified cluster states, namely states 3 and 4, are describing the blocking situation in the upper troposphere. Figure 6 shows the comparison of the sum of cluster affiliations

*γ*

_{3}(

*t*) +

*γ*

_{4}(

*t*) for the par of the time series with the scaled zonally averaged Lejenas–Okland blocking index. It indicates the appearance of a blocking anticyclone and the duration of the event. We have a blocking if the geopotential height difference at 500 hP between 40° and 60°N is negative over a region with 20° zonal extent. The exact formula is given in Lupo et al. (1997); for the purpose of representation, we have computed a zonally averaged value of the index, rescaled it, and reversed its sign.

Next we aim at demonstrating how the postprocessing methods described in section 3 can help us gain additional insight into the data to understand the impact of the external factors. Figures 7 and 8 show the EOF back-projection to the original 144-dimensional space of the mean dynamical equilibrium positions in clusters 3 and 4 calculated at different times with the respective values of the factors *u _{t}*

^{1}and

*u*

_{t}^{2}As can be seen from the graphics, the impact of the seasonal factor

*u*

_{t}^{2}results in significant weakening of the blocking states in summer, a finding consistent with the observations reported in the literature. The impact of the increasing CO

_{2}concentration also results in a weakening of the blocking situation in both states 3 and 4 that is less pronounced but still statistically significant (see the respective confidence intervals in Figs. 7 and 8).

Finally, in the same manner as in the previous nonstationary Lorenz ’96 example, we construct an ensemble prediction model (with 10 000 ensemble members) based on the FEM–VARX clustering results and compare the resulting predictions with the ones obtained by other methods. First, we verify the Markov assumption in the time-homogenous approximation for the identified affiliation function Γ(*t*), generate the ensemble predictions based on the estimated 20-dimensional FEM–VARX parameterization (based on the first 90% of the data), and calculate the expectation values of the relative prediction errors (in 2-norm, based on the last 10% of the data). As can be seen from Fig. 9, FEM–VARX ensemble predictions produce much better predictions than the constant model (where the prediction is always the expectation value over the whole history) and the “same as today model” (where the prediction for the next day is just the state of the system now). Compared with the globally stationary VARX model (where the ensemble prediction is calculated from 10 000 ensemble members of the global VARX model), FEM–VARX produces only slightly better predictions (approximately 2% better). This observation could be explained by the low overall persistence of the process Γ(*t*) compared, for example, to the previous Lorenz ’96 example where the hidden process was very persistent. This issue needs a deeper understanding and is a matter of future research.

## 6. Conclusions and discussion

A numerical FEM–VARX scheme for a data-driven parameterization of nonstationary multiscale dynamical processes was introduced. As demonstrated in the present paper, the application of the FEM–VARX method allows for the good description of the analyzed nonlinear and nonstationary data (in terms of recovering the nonlinear effects associated with the regime switching and in terms of the good prediction quality of the resulting reduced representation). Furthermore, a wide range of data-analysis techniques—for example, from information theory (like the Bayesian information criterion; cf. McQuarrie and Tsai 1998), model reduction approaches (like balanced truncation; cf. Moore 1981) and adaptive finite element methods—has become available and can be applied in the FEM–VARX context to postprocess the obtained results and to get additional insight into the analyzed data. In contrast to other multiscale approaches known from the literature (Majda et al. 2003; Fatkullin and Vanden-Eijnden 2004), the presented numerical scheme is not a systematic strategy based on the knowledge of some first principles but is purely data driven and results in an approximation of the analyzed process by means of a sequence of “simple” linear factor models. From this perspective, it is important to put further efforts into development of mixed numerical schemes, combining some features of systematic “first principles” approaches (and with this some a priori knowledge about the analyzed dynamics) with some aspects of purely data-driven approaches (like the one presented in this paper).

One of the main emphases in the present paper was on implicit mathematical assumptions being made at different stages of the derivation of the numerical method and postprocessing of the obtained results. In context of the meteorological application it was shown how big the impact of the implicit stationarity assumption is on the analysis of climate factors’ influence: whereas in the case of the globally stationary VARX model with constant coefficients, the impacts of the seasonal trend and CO_{2} concentration were found to be statistically insignificant, the nonstationary FEM–VARX clearly reveals the statistical significance of the two factors (see Fig. 5). It is important to emphasize that the applicability of the presented method, as well as subsequent interpretation and postprocessing of the obtained results, is dependent on the fulfillment of the mathematical assumptions involved. Conclusions that are drawn in every specific application case are reliable only if those assumptions are fulfilled; this is one of the most important issues to be kept in mind when applying methods such as the one presented in this paper.

## Acknowledgments

The work was partially supported by the DFG SPP 1276 MetStroem “Meteorology and Turbulence Mechanics”, the DFG Research Center “Matheon” (Project E8), and the Center of Scientific Simulations (CSS Berlin).

## REFERENCES

Bezdek, J., 1981:

*Pattern Recognition with Fuzzy Objective Function Algorithms*. Plenum Press, 256 pp.Braess, D., 2007:

*Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics*. 3rd ed. Cambridge University Press, 365 pp.Brockwell, P. and R. Davis, 2002:

*Introduction to Time Series and Forecasting*. Springer, 434 pp.Calvetti, D., S. Morigi, L. Reichel, and F. Sgallari, 2000: Tikhonov regularization and the

*L*-curve for large discrete ill-posed problems.,*J. Comput. Appl. Math.***123**:423–446. doi:10.1016/S0377-0427(00)00414-3.Chernik, M., 2007:

*Bootstrap Methods: A Guide for Practitioners and Researchers*. Series in Probability and Statistics, Wiley, 400 pp.Crommelin, D. and E. Vanden-Eijnden, 2008: Subgrid-scale parameterization with conditional Markov chains.

,*J. Atmos. Sci.***65**:2661–2675.Deuflhard, P., 2004:

*Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms*. Springer Series in Computational Mathematics, Vol. 35, Springer, 424 pp.Fatkullin, I. and E. Vanden-Eijnden, 2004: A computational strategy for multiscale systems with applications to the Lorenz ’96 model.

,*J. Comp. Phys.***200**:605–638.Franzke, C., I. Horenko, A. Majda, and R. Klein, 2009: Systematic metastable atmospheric regime identification in an AGCM.

,*J. Atmos. Sci.***66**:1997–2012.Golub, G. and C. V. Loan, 1989:

*Matrix Computations*. 2nd ed. Johns Hopkins University Press, 642 pp.Höppner, F., F. Klawonn, R. Kruse, and T. Runkler, 1999:

*Fuzzy Cluster Analysis: Methods for Classification, Data Analysis, and Image Recognition*. John Wiley and Sons, 289 pp.Horenko, I., 2009: On robust estimation of low-frequency variability trends in discrete Markovian sequences of atmospheric circulation patterns.

,*J. Atmos. Sci.***66**:2059–2072.Horenko, I., 2010a: Finite element approach to clustering of multidimensional time series.

,*SIAM J. Sci. Comput.***32**:62–83.Horenko, I., 2010b: On clustering of non-stationary meteorological time series.

,*Dyn. Atmos. Oceans***49**:164–187.Horenko, I., S. Dolaptchiev, A. Eliseev, I. Mokhov, and R. Klein, 2008a: Metastable decomposition of high-dimensional meteorological data with gaps.

,*J. Atmos. Sci.***65**:3479–3496.Horenko, I., R. Klein, S. Dolaptchiev, and C. Schuette, 2008b: Automated generation of reduced stochastic weather models. I: Simultaneous dimension and model reduction for time series analysis.

,*Multiscale Model. Simul.***6**:1125–1145.Khouider, B., A. Majda, and M. Katsoulakis, 2003: Coarse-grained stochastic processes for tropical convection and climate.

,*Proc. Natl. Acad. Sci. USA***100**:11941–11946.Lorenz, E. N., 1996: Predictability—A problem partly solved.

*Proc. Seminar on Predictability,*Vol. 1, Reading, United Kingdom, ECMWF, 1–18.Lupo, A. R., R. J. Oglesby, and I. I. Mokhov, 1997: Climatological features of blocking anticyclones: A study of Northern Hemisphere CCM1 model blocking events in present-day and double CO2 concentration atmospheres.

,*Climate Dyn.***13**:181–195.Majda, A., I. Timofeev, and E. Vanden-Eijnden, 1999: Models for stochastic climate prediction.

,*Proc. Natl. Acad. Sci. USA***96**:14687–14691.Majda, A., I. Timofeev, and E. Vanden-Eijnden, 2003: Systematic strategies for stochastic mode reduction in climate.

,*J. Atmos. Sci.***60**:1705–1722.Majda, A., C. Franzke, A. Fischer, and D. Crommelin, 2006: Distinct metastable atmospheric regimes despite nearly Gaussian statistics: A paradigm model.

,*Proc. Natl. Acad. Sci. USA***103**:8309–8314.McQuarrie, A. and C. Tsai, 1998:

*Regression and Time Series Model Selection*. World Scientific, 455 pp.Metzner, P., I. Horenko, and C. Schütte, 2007: Generator estimation of Markov jump processes based on incomplete observations nonequidistant in time.

,*Phys. Rev. E***76**:066702. doi:10.1103/PhysRevE.76.066702.Moore, B., 1981: Principal component analysis in linear systems: controllability, observability, and model reduction.

,*IEEE Trans. Autom. Control***26**:17–32.Moreau, J-J., 1988: Bounded variation in time.

*Topics in Nonsmooth Mechanics,*J.-J. Moreau, P. D. Panagiotopoulos, and G. Strang, Eds., Birkhauser, 1–74.Orrell, D., 2003: Model error and predictability over different time scales in the Lorenz ‘96 systems.

,*J. Atmos. Sci.***60**:2219–2228.Orrell, D. and L. Smith, 2003: The spectral bifurcation diagram: Visualizing bifurcations in high-dimensional systems.

,*Int. J. Bifurcation Chaos***13**:3015–3027.Pereda, E., R. Quiroga, and J. Bhattacharya, 2005: Nonlinear multivariate analysis of neurophysiological signals.

,*Prog. Neurobiol.***77**:1–37.Reinsel, G., 1993:

*Elements of Multivariate Time Series Analysis*. 1st ed. Springer, 263 pp.Schütte, C. and W. Huisinga, 2003: Biomolecular conformations can be identified as metastable sets of molecular dynamics.

*Handbook of Numerical Analysis,*P. G. Ciaret and J.-L. Lions, Eds., Vol. X, Elsevier, 699–744.Simmons, A. and J. Gibson, 2000: The ERA-40 project plan. ERA-40 Project Rep. Ser. 1, ECMWF, 63 pp.

Tsay, R., 2005:

*Analysis of Financial Time Series*. 2nd ed. Wiley-Interscience, 605 pp.Wilks, D. S., 2005: Effects of stochastic parametrizations in the Lorenz ‘96 model.

,*Quart. J. Roy. Meteor. Soc.***131**:389–407.

## APPENDIX

### Making the BV Problem Differentiable

^{8}By defining

(a) Cluster affiliation function *γ*_{1}(*t*) (the calculation is performed with the FEM–VARX algorithm for *C* = 6 (dashed) and *C* = 400 (dotted). (b) Fragment of the subscale data *F*_{3}* ^{y}*(

*t*) together with the data affiliation corresponding to the cluster affiliation function from (a) (

*C*= 6,

*N*= 200,

*K*= 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

(a) Cluster affiliation function *γ*_{1}(*t*) (the calculation is performed with the FEM–VARX algorithm for *C* = 6 (dashed) and *C* = 400 (dotted). (b) Fragment of the subscale data *F*_{3}* ^{y}*(

*t*) together with the data affiliation corresponding to the cluster affiliation function from (a) (

*C*= 6,

*N*= 200,

*K*= 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

(a) Cluster affiliation function *γ*_{1}(*t*) (the calculation is performed with the FEM–VARX algorithm for *C* = 6 (dashed) and *C* = 400 (dotted). (b) Fragment of the subscale data *F*_{3}* ^{y}*(

*t*) together with the data affiliation corresponding to the cluster affiliation function from (a) (

*C*= 6,

*N*= 200,

*K*= 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

(a)–(c) Diagonal elements of the local VARX parameters (*C* = 6, *N* = 200, *K* = 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

(a)–(c) Diagonal elements of the local VARX parameters (*C* = 6, *N* = 200, *K* = 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

(a)–(c) Diagonal elements of the local VARX parameters (*C* = 6, *N* = 200, *K* = 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see description in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see description in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see description in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Differences between BIC (39) as calculated for different EOF dimensions of the VARX models with and without the factors *u _{t}*

^{1}and

*u*

_{t}^{2}(negative values indicate the EOF dimensions where the influence of both factors is statistically nonsignificant) for the global stationary linear VARX model (dashed) and for the local stationary linear FEM–VARX factor models (calculated with

*K*= 4,

*N*= 4000,

*C*= 3000,

*q*= 2) (solid lines). The dotted zero-line marks the statistical significance level (components above this line are significant in a sense of the BIC criterion).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Differences between BIC (39) as calculated for different EOF dimensions of the VARX models with and without the factors *u _{t}*

^{1}and

*u*

_{t}^{2}(negative values indicate the EOF dimensions where the influence of both factors is statistically nonsignificant) for the global stationary linear VARX model (dashed) and for the local stationary linear FEM–VARX factor models (calculated with

*K*= 4,

*N*= 4000,

*C*= 3000,

*q*= 2) (solid lines). The dotted zero-line marks the statistical significance level (components above this line are significant in a sense of the BIC criterion).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Differences between BIC (39) as calculated for different EOF dimensions of the VARX models with and without the factors *u _{t}*

^{1}and

*u*

_{t}^{2}(negative values indicate the EOF dimensions where the influence of both factors is statistically nonsignificant) for the global stationary linear VARX model (dashed) and for the local stationary linear FEM–VARX factor models (calculated with

*K*= 4,

*N*= 4000,

*C*= 3000,

*q*= 2) (solid lines). The dotted zero-line marks the statistical significance level (components above this line are significant in a sense of the BIC criterion).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the negative Lejenas–Oakland blocking index (dashed line) and the sum of cluster affiliations of locally linear states 3 and 4 (solid line, calculated with FEM–VARX for *K* = 4, *N* = 4000, *C* = 3000, *q* = 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the negative Lejenas–Oakland blocking index (dashed line) and the sum of cluster affiliations of locally linear states 3 and 4 (solid line, calculated with FEM–VARX for *K* = 4, *N* = 4000, *C* = 3000, *q* = 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the negative Lejenas–Oakland blocking index (dashed line) and the sum of cluster affiliations of locally linear states 3 and 4 (solid line, calculated with FEM–VARX for *K* = 4, *N* = 4000, *C* = 3000, *q* = 2).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

FEM–VARX cluster state 3 (blocking state): Mean dynamical equilibrium positions *H* of the geopotential heights (solid) and their confidence intervals (dashed) at different times (revealing the influence of both external factors separately; see explanation in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

FEM–VARX cluster state 3 (blocking state): Mean dynamical equilibrium positions *H* of the geopotential heights (solid) and their confidence intervals (dashed) at different times (revealing the influence of both external factors separately; see explanation in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

FEM–VARX cluster state 3 (blocking state): Mean dynamical equilibrium positions *H* of the geopotential heights (solid) and their confidence intervals (dashed) at different times (revealing the influence of both external factors separately; see explanation in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

As in Fig. 7, but for FEM–VARX cluster state 4 (blocking state).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

As in Fig. 7, but for FEM–VARX cluster state 4 (blocking state).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

As in Fig. 7, but for FEM–VARX cluster state 4 (blocking state).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see detailed description and discussion in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see detailed description and discussion in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

Comparison of the mean relative prediction errors calculated for the last 10% of the time series based on the different models trained on the first 90% of the data (see detailed description and discussion in text).

Citation: Journal of the Atmospheric Sciences 67, 5; 10.1175/2010JAS3271.1

^{1}

The influence of the Tykhonov regularization in the *W*_{1,2} Sobolev norm is equivalent to the action of the diffusion operator *ϵ*^{2}Δ* _{t}* on

*γ*(

_{i}*t*) [e.g., the increase of the regularization parameter

*ϵ*

^{2}results in the stronger diffusion of the affiliation function

*γ*(

_{i}*t*) through the interface separating the clusters in time]. For a detailed discussion of the

*W*

_{1,2}regularization effects and their influence on different clustering scenarios, see Horenko (2009, 2010b).

^{2}

For practical examples of standard finite element function sets *f _{i}*(

*t*) (such as linear finite elements) in discrete time, see Horenko (2010b) and Braess (2007).

^{3}

This can be done in a standard way, controlling the norm of the discretization errors *χ _{N}*

^{+},

*χ*

_{N}^{−}locally and applying the multigrid approach to guarantee that |

*χ*

_{N}^{±}| ≤

**(Braess 2007).**

*δ̃*^{4}

We get use of the convexity of the functional *g*[**x**_{t}, * θ_{i}*] = ‖

**x**

_{t}−

*− 𝗔*

**μ**^{i}

^{i}**ϕ**_{1}(

*x*

_{t−τ}, … ,

*x*

_{t−mτ}) − 𝗕

^{i}**ϕ**_{2}[

*u*(

*t*)]‖

_{Pi}with respect to parameters 𝗔𝗕

*and the necessary minimum condition for convex functionals.*

^{i}^{5}

Note that all of the numerical criteria for verification of the Markov assumption known in the literature imply the stationarity (or homogeneity) of the underlying transition matrix and are in general not applicable if the analyzed process is known to be strongly nonstationary.

^{6}

Equation (40) is derived under the assumptions that the analyzed data **x** are locally weakly stationary [e.g., in this case, that the expectation value of **x** is (locally) time-independent] and the external noise process *ϵ _{t}* is i.i.d. and has a zero expectation.

^{7}

The author thanks the Potsdam Institute for Climate Impact Research (PIK) for access to the ERA-40 data.

^{8}

One could take the condition