## 1. Introduction

The aim of the present paper is to discuss some existing techniques and to demonstrate several new ideas of reduced stochastic modeling based on observed multidimensional meteorological data. Many dynamical systems in meteorology and climate research are characterized by the presence of low-dimensional attractors, that is, the dynamics of those processes converge toward some low-dimensional manifolds of the high-dimensional phase space (Lorenz 1963, 2006; Dijkstra and Neelin 1995; Guo and Huang 2006). An important question in the analysis of such systems is the identification of those manifolds because they give a reduced representation of the original dynamics in only a few essential degrees of freedom. If knowledge about the system is present only in the form of observational or measurement data, and there is no a priori knowledge about the equations governing the system’s dynamics, then the challenging problem of identifying those attractive manifolds and constructing reduced low-dimensional models becomes a problem of time series analysis and pattern recognition in many dimensions. The choice of the appropriate data analysis strategies (which implies a set of method-specific assumptions about the analyzed data) plays a crucial role in the correct interpretation of the available time series. Correct identification of attractors in the observed meteorological and climate data will allow for a reduced stochastic description of multidimensional processes and can help improve the quality of resulting predictions.

In most cases, the problem of identifying attractors is hindered by the fact that they have very complicated, even fractal geometry and noninteger dimensionality, so the points close to each other in Euclidean space can be infinitely wide apart on the attractor. Therefore, to apply methods of topological data analysis and clustering that require the notion of a geometrical distance between the different points in the phase space, it is necessary first to transform the data into an appropriate form. The first theoretical results about the existence of such transformations (or embeddings) for certain classes of dynamical systems were obtained by Whitney (1936); the method of delays was proposed subsequently by Packard et al. (1980). Later, in the context of the turbulence dynamics, the Takens embedding theorem provided a theoretical background to the method of delays (Takens 1981). However, none of these results gives a general strategy of how to calculate the dimensionality of the attractor if it is a priori unknown and how to extract the relevant information from the measurements.
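To make the method of delays concrete, here is a minimal sketch (our own illustration, not code from any of the cited works), where `q` is the window length:

```python
import numpy as np

def delay_embed(z, q):
    """Method of delays: stack q consecutive observations into one vector."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]                    # scalar series: c = 1
    T, c = z.shape
    # x_t collects the window z_t, ..., z_{t+q-1} (dimension q*c)
    return np.column_stack([z[i:T - q + 1 + i] for i in range(q)])

# a scalar series of length 6 embedded with window length q = 3
x = delay_embed(np.arange(6), 3)          # shape (4, 3); rows are windows
```

For a *c*-dimensional series each embedded point has dimension *qc*.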

Broomhead and King (1986; see also Broomhead et al. 1988) have suggested calculating the dimension *m* of the attractive manifold a posteriori by setting the embedding dimension sufficiently high and then calculating *m* from the numerical rank of the covariance matrix of the embedded data. From the mathematical point of view, this is equivalent to applying principal component analysis (PCA) to the embedded data and approximating the attractive manifold by an *m*-dimensional linear manifold built by the *m* dominant eigenvectors of the covariance. These ideas were further developed by Vautard et al. (1992) in terms of singular-spectrum analysis (SSA) and were implemented for analysis of surface air temperature time series. However, these approaches implicitly rely on the assumption that the attractor can be globally approximated by a linear manifold. Assessing the quality of this approximation (and therefore also the validity of the assumption) remained beyond the scope of those papers. We understand the approximation quality here to be a functional that describes the distance between the actual attractive manifold and its approximation by a linear manifold.

In the present paper we show a way of constructing a quality functional. In the framework given by the Takens embedding theorem, this functional quantifies the reconstruction error, that is, the distance between the original data embedded in Euclidean space and their reconstruction based on a low-dimensional description. In contrast to traditional approaches like SSA, we assume that the attractor manifold can be represented as a combination of a fixed number *K* of *m-*dimensional hyperplanes in an *n-*dimensional embedding (i.e., extended space). The reduced representation of the analyzed data results from a projection on *K* unknown low-dimensional linear manifolds. We demonstrate how two special constraints imposed on the projection operators result in two different forms of dimension reduction: (i) an extension of the PCA toward multiple hidden states (with essential coordinates calculated as linear combinations of all of the original degrees of freedom in extended space) and (ii) what we call the principal original components (POC), with essential coordinates that are subsets of the original degrees of freedom in extended space.

We demonstrate the application of the proposed method both to the Lorenz oscillator system with measurement noise and to the analysis of historical air temperature data and show how the new method can be used for the elimination of noise and identification of the seasonal low-frequency components in meteorological data. The results are compared with other existing methods of seasonal cycle elimination. We also present an application of the proposed POC method in the context of constructing low-dimensional predictive models of temperature dynamics in the atmosphere.

## 2. Extraction of topological information from time series

Let us consider a dynamical system described by the ordinary differential equation

d**x**/d*t* = *F*(**x**),  (1)

with **x** ∈ 𝗠 ⊂ 𝗥^{n} representing a state of the system and 𝗠 being the space of all possible system configurations, or phase space. If *F*(**x**) is locally Lipschitz continuous, then (1) defines an initial value problem (i.e., a unique solution curve passes through each point **x** ∈ 𝗠). In this case, we can formally write the solution at any time *t* given an initial value **x**_{0} as **x**_{t} = *ϕ*_{t}**x**_{0}, where *ϕ*_{t} represents a so-called flow of dynamical system (1). In many dynamical systems, it is the case that their flows evolve toward some low-dimensional objects called *attractors*. A set 𝗔 ⊂ 𝗠 is called an attractor if the following conditions are satisfied: (i) 𝗔 = *ϕ*_{t}𝗔, ∀*t*, that is, 𝗔 is invariant under the flow; (ii) there is a neighborhood of 𝗔, 𝗕(𝗔), called the basin of attraction for 𝗔, such that 𝗕(𝗔) = {*s* | ∀*N* ⊂ 𝗕(𝗔), ∃*T*, ∀*t* > *T*, *ϕ*_{t}*s* ∈ *N*}; and (iii) there is no proper subset of 𝗔 with the first two properties.

The identification of attractors for dynamical systems is important because the projection of the original high-dimensional system's observations *x* on the attractor allows us to find a reduced essential representation of the system's dynamics, which is low-dimensional if dim(𝗔) ≪ dim(𝗠). In the context of data analysis, this can help reduce the dimensionality of the observation vector and permit an optimal measurement process involving a minimal number of the system's degrees of freedom. However, in the context of time series analysis, this task can be hindered by the fact that the attractor may have very complicated, even fractal geometry and noninteger dimensionality, so points that are close to each other in Euclidean space can be infinitely far apart on the attractor. This issue can restrict the application of topological data analysis and clustering methods that require the notion of a geometrical distance between the different points in the phase space. Therefore, to be able to extract the essential topological information out of the measured data, it is necessary to transform the data into an appropriate form.

As was first shown by Whitney (1936), sufficiently smooth connected *m*-dimensional manifolds can be smoothly embedded in Euclidean (2*m* + 1)-dimensional space; in other words, at least for certain classes of attractors, such a transformation (in the form of embedding) exists. This result, however, does not answer the question of how this embedding can be constructed. The answer to this question was given by Takens (1981) in his embedding theorem. It states that such an embedding can be constructed in a form of a vector function containing (2*m* + 1) appropriately chosen consecutive discrete measurements of the attractor process. The Takens embedding theorem gave a solid theoretical background to the method of delays first proposed by Packard et al. (1980). The basic idea of the method is to consider the frames or windows of a certain length *q* for a discrete observation series *z*_{t} ∈ 𝗥^{c}, resulting in a new extended observation series *x*_{t} ∈ 𝗥^{n}, *n* = *qc*. The practical application of the Takens embedding theorem is still limited since it gives no answer to the questions of how to determine the attractor dimensionality *m* and how to extract the attractor manifolds from the high-dimensional observational data. However, the fact that there exists a Euclidean embedding of some dimensionality *n* gave Broomhead et al. (1988) the idea of combining the method of delays with techniques of statistical data analysis (Broomhead and King 1986; Broomhead et al. 1988).

Let us assume that we were able to estimate the upper bound *n* of the embedding dimension for the given time series {*z*_{t}}_{t=1,...,T}. The idea of the method proposed in Broomhead and King (1986) is to identify the attractor dimension *m* by finding the principal directions with the data variance exceeding a certain threshold [typically given by the machine tolerance (Broomhead et al. 1988)] in the *n*-dimensional data *x*_{t} (*m* ≪ *n*). Therefore, the resulting numerical strategy can be described as a combination of the Takens method of delays with the principal component analysis of the data in extended space. However, the application of PCA implicitly relies on the assumption that the attractor can be globally approximated by a linear manifold. Assessing the quality of this approximation (and therefore also the validity of the aforementioned assumption) remained beyond the scope of the paper of Broomhead and King (1986). In the next section, we will present a framework that allows us to overcome this difficulty by introducing the concept of states hidden in the data and by replacing the global linear approximation of the attractor by local linear approximations specific to each of the hidden states.

## 3. Topological identification of hidden states

### a. Residual functional with hidden states

Let us consider a set of *K* linear transformation matrices 𝗠_{i} ∈ 𝗥^{n×m} and 𝗧_{i} ∈ 𝗥^{m×n}, *i* = 1, . . . , *K*, where 𝗧_{i} is understood to project onto the subspace spanned by the local principal directions and 𝗠_{i} is a linear transformation casting the reduced vector back into the original space. Mathematically, the problem of identifying 𝗠_{i}, 𝗧_{i} can be stated as a minimization problem with regard to the residual functional, describing the least-squares difference between the original observation and its reconstruction from the *m*-dimensional projection:

𝗟(*θ*, *γ*) = Σ^{T}_{t=1} Σ^{K}_{i=1} *γ*_{i}(*t*) ‖*x*_{t} − *μ*_{i} − 𝗠_{i}𝗧_{i}(*x*_{t} − *μ*_{i})‖²,  (2)

with *θ* = (𝗧_{1}, 𝗠_{1}, *μ*_{1}, . . . , 𝗧_{K}, 𝗠_{K}, *μ*_{K}) and *μ*_{i} ∈ 𝗥^{n}. The "hidden path" *γ*_{i}(*t*) denotes the probability of optimally describing the *n*-dimensional vector **x**_{t} at time *t* with the local transformations 𝗠_{i} and 𝗧_{i} [Σ^{K}_{i=1} *γ*_{i}(*t*) = 1 for all *t*]. The quantity *γ*_{i}(*t*) provides a relative weight to the statement that an observation *x*_{t} belongs to the *i*th hidden state.

We will first consider the case that the hidden path *γ* is known and fixed. In the case of the full optimization of (2), the problem can become ill posed in the general case because the number of unknown parameters can approach the number of data points *x*_{t} and the optimized functional is not convex. To solve this problem we suggest two implicit regularization techniques in the form of additional assumptions imposed on the optimization strategy. As shown in Horenko et al. (2006b) and Horenko and Schuette (2007, manuscript submitted to *Econ. J.*, hereafter HOS), the formulation of the full optimization strategy (with regard to both *θ* and *γ*) can be achieved by applying either (i) hidden Markov models (HMMs), with few hidden states and Gaussian output in the essential coordinates, and the expectation-maximization (EM) framework, or (ii) a wavelets representation of the hidden path *γ* (with few wavelets coefficients involved). In both cases the essential step is the calculation of the reestimation formulas resulting from setting the partial derivatives ∂𝗟/∂*θ* to zero for a fixed sequence of hidden probabilities *γ*_{i}(*t*) and number *K* of hidden states.

To guarantee the uniqueness of the resulting parameters *θ* = argmin 𝗟, we have to impose certain constraints on the transformation matrices 𝗠_{i}, 𝗧_{i}. Depending on the requirements imposed on the resulting attractor dimensions, various matrix structures could be imposed. In the following, we consider two special forms of constraints on the projection operators and discuss the resulting numerical strategies in both cases.

#### 1) Case 1: State-specific principal component analysis

Let 𝗠_{i} = 𝗧^{T}_{i} and furthermore assume that the 𝗧_{i} are orthonormal linear projectors, that is, 𝗧_{i}𝗧^{T}_{i} = Id. Setting the partial derivatives ∂𝗟/∂*θ* to zero then yields explicit reestimation formulas for the case that *γ*_{i}(*t*) is known and fixed (Horenko et al. 2006b; HOS): the rows of the optimal 𝗧_{i} are the *m* dominant eigenvectors of the weighted covariance matrix

Σ^{T}_{t=1} *γ*_{i}(*t*)(*x*_{t} − *μ*_{i})(*x*_{t} − *μ*_{i})^{T},  (4)

where the associated eigenvalue matrix Λ_{i} is a matrix with the *m* dominant eigenvalues of the weighted covariance matrix on the diagonal (nondiagonal elements are zero), and the optimal center vectors are the weighted means

*μ*_{i} = Σ^{T}_{t=1} *γ*_{i}(*t*)*x*_{t} / Σ^{T}_{t=1} *γ*_{i}(*t*).  (5)

In other words, each of the *K* hidden states is characterized by a specific set of essential dimensions 𝗧_{i} (which can be defined as corresponding dominant eigenvectors) and center vectors *μ*_{i} ∈ 𝗥^{n} calculated from the conditional averaging of the time series with regard to corresponding occupation probabilities *γ*_{i}(*t*) (Horenko et al. 2006b). Note that for *K* = 1 we automatically recover the standard PCA procedure.
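For a single hidden state with fixed weights, the reestimation step reduces to a weighted mean plus the dominant eigenvectors of the weighted covariance matrix; a minimal numpy sketch (our own illustration, with `g` holding the occupation probabilities of one state):

```python
import numpy as np

def reestimate_state(x, g, m):
    """Weighted mean and projector built from the m dominant eigenvectors
    of the gamma-weighted covariance matrix of one hidden state."""
    mu = (g[:, None] * x).sum(axis=0) / g.sum()    # weighted center
    d = x - mu
    cov = (g[:, None] * d).T @ d                   # weighted covariance
    evals, evecs = np.linalg.eigh(cov)             # ascending order
    T_proj = evecs[:, ::-1][:, :m].T               # rows: dominant eigvecs
    return mu, T_proj

rng = np.random.default_rng(2)
x = rng.normal(size=(400, 3)) * np.array([5.0, 1.0, 0.1])
mu, T_proj = reestimate_state(x, np.ones(400), 1)
# the single dominant direction aligns with the first (largest) axis
```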

For an unknown sequence of *γ*_{i}(*t*), two optimization strategies have been proposed: (i) HMM–PCA (Horenko et al. 2006b) and (ii) wavelets–PCA (HOS).

Majda et al. (2006) have recently demonstrated the application of the HMM–Gaussian framework and EM optimization in the atmospheric context (see also Franzke et al. 2008). They verified the power of the HMM approach with regard to identification of the blocking events hidden in some output time series of several atmospheric models. They showed that the identification of the hidden states was possible even despite the nearly Gaussian statistical distribution of the corresponding probability density function (a case where traditional clustering methods usually fail to reveal the hidden states). In the following paragraphs we will briefly describe a different HMM technique originally developed in the context of molecular dynamics data analysis (Horenko et al. 2006b).

Let us assume that the knowledge about the analyzed data allows the two following assumptions: (i) that the unknown vector of hidden probabilities *γ*_{i}(*t*) can be assumed to be an output of the Markov process *X*_{t} with *K* states and (ii) that the probability distribution *P*(𝗧_{i}*x*_{t} | *X*_{t} = *i*), which is the conditional probability distribution of the projected data in the hidden state *i*, can be assumed to be Gaussian in each of the hidden states. If both of these assumptions hold, then the HMM framework can be used and a special form of EM algorithm can be constructed to find the minimum of the residual functional (2) [for details of the derivation and the resulting algorithmic procedure, refer to our previous works Horenko et al. (2006b) and HOS]. The resulting method is linear in *T* and scales as *O*(*mn*²) with the dimension of the problem and as *O*(*K*²) with the number *K* of hidden states. However, like all likelihood-based methods in an HMM setting, HMM–PCA does not guarantee the uniqueness of the optimum since the EM algorithm converges toward a local optimum of the likelihood function.

If the knowledge about the data is not sufficient to make the above assumptions, we can constrain the functions *γ*_{i}(*t*) to the class of piecewise constant functions switching between the discrete values of 0 and 1 and represent them as a finite linear combination of *p* Haar wavelets. As demonstrated in HOS, this allows us to reduce the dimensionality of the parameter space necessary to minimize the residual functional and to construct an iterative optimization scheme called wavelets–PCA. The main advantage of this approach is that, in contrast to HMM–PCA, we do not need any a priori assumptions about the data being Markovian or Gaussian. [This is so because the resulting numerical scheme is a direct optimization of the functional (2); for details see HOS.] However, the major drawback of wavelets–PCA is its quadratic scaling with the number *p* of Haar wavelet functions. This constrains the applicability of the method to cases where only a few (in practice 10–20) transitions between essential manifolds are present in the data. Therefore, in the case of a very long data series it is possible to verify the assumptions needed to use HMM–PCA via a comparison of both methods for short fragments of the series. If the resulting functions *γ*_{i}(*t*) and associated manifolds 𝗧_{i} are equal or very similar for both of the methods, we can assume that the Markovianity and local Gaussianity assumptions are valid and the whole data series can be analyzed with the help of HMM–PCA.

Note that in contrast to the methodology presented in Horenko et al. (2006b) and HOS, where the observational data were directly analyzed with the HMM–PCA and wavelets–PCA approaches, the construction of the embedding and casting the data into extended space allow us to combine both methods with the Takens (1981) method of delays. The resulting approaches therefore circumvent the problem connected with the above-mentioned global linearity assumption of the method proposed by Broomhead and King (1986) and allow for the adaptive construction of local linear attractors associated with each of the hidden states. In short, the "global attractor linearity" assumption of the method proposed in Broomhead and King (1986) is replaced by a set of "local linearity assumptions" in the HMM–PCA and wavelets–PCA methods. An additional advantage of the proposed strategy compared to the HMM–PCA procedure described in Horenko et al. (2006b) is the fact that, as mentioned above, casting the observed (possibly non-Markovian) data in an extended space of sufficiently high dimensionality makes the resulting data Markovian and thereby satisfies one of the basic assumptions needed for the HMM–PCA method. This issue is closely related to the choice of the frame length *q* and will be discussed in detail at a later stage.
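To illustrate the wavelets ansatz (a toy sketch of our own, not the HOS scheme): a hidden path with a single transition is already represented exactly by two Haar functions.

```python
import numpy as np

def haar_matrix(T_len, p):
    """First p functions of the (unnormalized) Haar system on [0, T_len)."""
    basis = [np.ones(T_len)]                 # constant scaling function
    level = 0
    while len(basis) < p:
        n_w = 2 ** level                     # wavelets on this level
        for k in range(n_w):
            if len(basis) == p:
                break
            w = np.zeros(T_len)
            a, b = k * T_len // n_w, (k + 1) * T_len // n_w
            mid = (a + b) // 2
            w[a:mid], w[mid:b] = 1.0, -1.0   # shifted/scaled mother wavelet
            basis.append(w)
        level += 1
    return np.array(basis)

# a hidden path gamma_1(t) with one switch at t = 64 needs only two terms
H = haar_matrix(128, 2)
gamma1 = 0.5 * H[0] + 0.5 * H[1]             # 1 on [0, 64), 0 on [64, 128)
```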

It is important to mention that the applicability of the above strategy is limited to cases where the essential attractor manifolds of different hidden states are unequal and can be identified as directions of dominant spatial variability in extended space. This problem is closely connected to the problem of the optimal choice of the local reduced dimensionality *m*. For observational data where the differences between the attractors are not significant for small *m* (corresponding to hidden states with almost identical *m* dominant local PCA modes), the value of *m* should be increased and the optimization procedure should be repeated. In addition to the obvious numerical problems associated with the calculation of the *m* dominant eigenvectors of the covariance matrix for large *m*, there is a natural upper limit on *m* known from information theory, which is related to the statistical uncertainty of the identified eigenvectors of the covariance matrix and the signal-to-noise ratio of the data (Broomhead et al. 1988; Lisi 1996; Gilmore 1998). All of the above considerations restrict the applicability of the described framework for very high *n* to the cases where a low *m* is sufficient for the identification of the differences in local attractor manifolds.

The Gaussianity assumption for the observation process in the HMM–PCA method gives an opportunity to estimate the confidence intervals of the manifold parameters (*μ*_{i}, 𝗧_{i}) straightforwardly. This can be done in the standard way of multivariate statistical analysis because the variability of the weighted covariance matrices (4) involved in the calculation of the optimal projectors 𝗧_{i} is given by the Wishart distribution (Mardia et al. 1979). The confidence intervals of 𝗧_{i} can be estimated by sampling from this distribution and calculating the *m* dominant eigenvectors of the sampled matrices.

#### 2) Case 2: State-specific principal original components

In many cases, knowledge about the essential degrees of freedom of the observed system can help to plan an optimal measurement process; that is, the measurement of the (few) essential degrees of freedom can be used to reconstruct the complete information about the system. However, the local PCA modes 𝗧_{i} described above define the essential degrees of freedom as linear combinations of all observation dimensions, with coefficients defined by the elements of the matrices 𝗧_{i}. This means that if all of these elements are significantly nonzero, one has to measure all of the original system's dimensions in order to get a time series of essential coordinates. If the process of measurement is expensive, it is worthwhile to reduce the number of measurements and to express the essential degrees of freedom in terms of as few observed system dimensions as possible. To approach this problem mathematically, we can first fix the reduced dimensionality *m* and hidden path *γ*_{i}(*t*) and minimize the functional (2) subject to a special form of constraints imposed on the projector operator 𝗧_{i}. To satisfy the requirement that only *m* different original dimensions of the observation process can be used in the construction of the reduced dynamics, we have to demand that the respective elements of the *m* × *n* matrices 𝗧_{i} can be either 0 or 1 and that there cannot be more than one 1 per matrix column. We do not impose any explicit constraints on 𝗠_{i}.

The minimization of (2) with regard to 𝗧_{i} can then be performed by inspecting all *C*_{n,m} possible projectors 𝗧_{i} (assuming that *m* ≪ *n* so the number of possibilities is not too large). For each fixed parameter 𝗧_{i}, we can calculate the explicit optimum of the functional (2) with regard to 𝗠_{i} by setting the respective partial derivative to 0 and solving the resulting matrix equation. It is easy to verify that this matrix equation can be solved analytically, leading to the following expression for the optimal value of 𝗠_{i}:

𝗠_{i} = 𝗖_{i}𝗧^{T}_{i}(𝗧_{i}𝗖_{i}𝗧^{T}_{i})^{−1},  with  𝗖_{i} = Σ^{T}_{t=1} *γ*_{i}(*t*)(*x*_{t} − *μ*_{i})(*x*_{t} − *μ*_{i})^{T}.

The calculation of the optimal value for *μ*_{i} is analogous to the PCA case and results in the same estimation (5). The set of parameters (𝗧_{i}, 𝗠_{i}, *μ*_{i}) calculated in such a way can then be substituted into the functional (2). The one of the *C*_{n,m} parameter sets with the lowest value of 𝗟 then defines an optimal attractive manifold in POCs.

### b. Pipeline of data compression and reconstruction

As a result of the optimization described above, we obtain for each of the *K* hidden states a set of parameters (*μ*_{i}, 𝗧_{i}, 𝗠_{i}). This information can be used to compress the original data *z* (i.e., to find a low-dimensional description of the dynamics). The data-compression pipeline based on this dimension reduction proceeds as follows.

First, the frame length *q* is selected and the multidimensional process *z* is embedded in the extended space. Following the Takens (1981) theorem, the frame length *q* should be chosen such that *q* ≥ (1/*c*)(2*m* + 1), where *m* is the estimated attractor dimension and *c* is the spatial dimension of the observational data *z*. If the attractor dimension *m* is a priori unknown, the choice of *q* is bounded from above by the feasibility of computing the dominant eigenvectors of the (*n* × *n*) covariance matrix (where *n* = *qc*). Another limitation comes into play when analyzing observational data with some memory mem > 1 (note that we define Markov processes as those with mem = 1). In this case, the frame length should be chosen such that *q* ≥ mem in order to guarantee the Markov property for the embedded data *x* and to fulfill the formal criteria of the Takens theorem.

The embedding results in a time series *x* in extended space whose elements are the dynamical patterns of length *q* from the original time series *z*. As the next step, the number of hidden states *K* and the desired reduced dimensionality *m* should be selected (e.g., by inspection of the spectra of the resulting local covariance matrices) and the functional (2) should be minimized numerically with one of the procedures described above. Afterward, the *m*-dimensional projections *x*^{i}_{p} of the time series *z* on the respective manifolds can be calculated. The compression factor associated with this dimension reduction strategy is *n*(*T* − *q*)/[*Km*(*T* − *q*) + *Kn*(*m* + 1)] and converges to *n*/(*Km*) as *T* → ∞. The projected time series *x*^{i}_{p} can be projected back (or reconstructed) by applying the projector operators 𝗠_{i}, *i* = 1, . . . , *K*. This operation delivers the reconstructed dynamical frames *x*_{r} in the extended space. Finally, the reconstruction *z*_{r} of the original time series *z* can be achieved by averaging: for each element of *z*_{r} we calculate the expectation value over all of the reconstructed dynamical patterns contained in *x*_{r}. As will be demonstrated in the following numerical examples, this leads to the smoothing or filtration of the reconstructed dynamics and will allow us to recover the attractor structures hidden by the external noise.

## 4. Numerical examples

In this section we will illustrate the proposed strategy for dimension reduction, dominant temporal pattern recognition, and decomposition into metastable states by three examples: (i) a time series of the Lorenz oscillator in three dimensions with a multivariate autoregressive (MVAR) process added to it to mimic the noisy measurement process with memory, (ii) a one-dimensional dataset of historical daily air temperatures in Berlin,^{1} and (iii) a set of historical averaged daily temperatures between 1976 and 2002 on a 20 × 29 point spatial grid covering Europe and part of the North Atlantic [for details concerning the data, see Horenko et al. (2008)].

The first example represents a toy model aiming to illustrate the proposed framework on a simple physical system. Identified hidden states and attractor manifolds can be straightforwardly interpreted according to our knowledge about the process.

In the next example, we demonstrate the power of topological pattern recognition and identification of hidden states associated with those dynamic patterns. In contrast to all other methods for identification of hidden states in one-dimensional data of which we are aware (Hamilton 1989; Horenko et al. 2006a), the proposed method allows assumption-free identification of the hidden states based on differences in the topology of corresponding dynamical patterns and simultaneous state-specific filtering of the data. We will show how this property of the framework can help eliminate the low-frequency seasonal trend from the meteorological data and will compare the results with filtering obtained by applying some standard methods of seasonal cycle elimination.

Finally, in the third example we demonstrate the application of the POC strategy for dimension reduction to high-dimensional meteorological data. The question to be answered in this context involves finding a (small) subset of original measurement dimensions that is optimal for the reconstruction of temporal data patterns for the rest of the measurements. In the presented example, this subset is represented by points on the geographic map where the temperature measurements are acquired. The example demonstrates that the original measurement dimensions that are optimal with regard to the temporal pattern recognition can be used to construct global stochastic models for temperature prediction based on some additional assumptions about the dynamics in the extended space. We verify those assumptions for the analyzed time series and demonstrate the application of MVAR models in this context. Finally, we analyze the performance of a resulting prediction strategy with regard to the number *m* of original measurement dimensions involved in the construction of the predictive stochastic model.

### a. Lorenz system with measurement noise

The Lorenz system (Lorenz 1963) is integrated from the initial value [*x*(0), *y*(0), *z*(0)] = (35, −10, −7) using the implicit Euler scheme with discretization time step *h* = 0.01 on a time interval [0, 20]. To the resulting time series we add the output of a three-dimensional MVAR process [MVAR(3)] to mimic a measurement noise with memory (Brockwell and Davis 2002). The resulting three-dimensional data series is shown in the upper panel of Fig. 1. The added noise is chosen to be quite intense so that the original smooth "butterfly" structure of the respective attractor is smeared out.
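A sketch of how such a test series can be generated (our own illustration: we substitute a classical Runge–Kutta step for the implicit Euler scheme, use a first-order MVAR process rather than MVAR(3), and the noise coefficients are our own toy choices):

```python
import numpy as np

def lorenz_rhs(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate(u0, h=0.01, n_steps=2000):
    """Integrate the Lorenz system with a classical RK4 step of size h."""
    traj = np.empty((n_steps, 3))
    u = np.array(u0, dtype=float)
    for k in range(n_steps):
        k1 = lorenz_rhs(u)
        k2 = lorenz_rhs(u + 0.5 * h * k1)
        k3 = lorenz_rhs(u + 0.5 * h * k2)
        k4 = lorenz_rhs(u + h * k3)
        u = u + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[k] = u
    return traj

rng = np.random.default_rng(6)
clean = simulate([35.0, -10.0, -7.0])        # t in [0, 20] with h = 0.01
# first-order MVAR measurement noise e_t = A e_{t-1} + w_t (toy coefficients)
A, e = 0.8 * np.eye(3), np.zeros(3)
noise = np.empty_like(clean)
for k in range(len(clean)):
    e = A @ e + rng.normal(scale=2.0, size=3)
    noise[k] = e
noisy = clean + noise
```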

To analyze the resulting time series, we first apply the standard PCA strategy to the extended space representation of the dynamics [with frame length *q* = 12, as described in Broomhead and King (1986) and Broomhead et al. (1988)]. To estimate the number *m* of essential dimensions, we first look at the logarithmic plot of the eigenvalues of the corresponding global covariance matrix. The top of Fig. 2 shows that there is no clear spectral gap indicating the presence of dominant low-dimensional manifolds in the extended 36-dimensional space. This is a hint that the standard PCA dimension reduction strategy may not succeed in finding an adequate reduced representation of the dynamics (for *m* < 3). In fact, as can be seen from the left panel of Fig. 3 there is a significant discrepancy between the original Lorenz trajectory and its reconstruction from a reduced representation of the noisy data (*m* = 2) based on the standard PCA approach.

The application of both HMM–PCA and wavelets–PCA approaches to the same noisy data sequence from Fig. 1 (with *K* = 2, *m* = 12, and *q* = 12) results in almost identical hidden paths (see Fig. 4). As can be seen in Fig. 2, the spectra of the local covariance matrices in each of the identified states show a presence of a spectral gap at *m* = 2. The bottom of Fig. 1 shows the reconstruction of the three-dimensional dynamics from its reduced two-dimensional HMM–PCA representation, thereby revealing the original butterfly structure of the Lorenz attractor hidden in the noisy data. As can be seen from the comparison of standard PCA and HMM–PCA reconstructions, the latter demonstrate a much higher resemblance to the original data. The bottom of Fig. 1 also suggests an explanation of the identified states: they correspond to the wings of the Lorenz-attractor butterfly.

### b. One-dimensional example: Historical temperatures in Berlin

As a first realistic example for the proposed analysis strategy we take the historical daily air temperature measurements as averaged values from 0000, 0600, 1200, and 1800 UTC observations in Berlin between 1 January 1976 and 26 September 1978 (resulting in a time series of 1000 measurements).

In contrast to the previous example, the analyzed time series is one-dimensional and there is no obvious metastability in the data (i.e., the dynamics is not obviously switching between geometrically well-separated domains of the phase space, as in the case of the Lorenz oscillator). However, we propose that the extended space representation of the dynamics can help reveal hidden dynamical patterns in the data that we can use to identify hidden metastability.

To reveal such patterns, we first have to eliminate the dominant seasonal trend from the data. We compare three standard ways of doing so: (i) subtraction of the mean seasonal trend with period *P* = 1.0 yr from the original temperature series {*z_{t}*}_{t=1,...,T} by a subtraction of the *P*-averaged temperature *z̄_{t}* = |*I*(*t*)|^{−1} Σ_{k∈I(t)} *z*_{t+kP}, where *I*(*t*) = [*k* ∈ 𝗡 | 1 ≤ (*t* + *kP*) ≤ *T*]; (ii) Fourier filtering of the signal {*z_{t}*}_{t=1,...,T} (i.e., elimination of the Fourier modes exceeding a threshold amplitude); and (iii) an SSA (as described in Vautard et al. 1992) for *m* = 1, *q* = 50. As can be seen from the bottom of Fig. 5, none of the above-mentioned methods is able to eliminate the seasonal trend completely from the data; that is, there remains a significant low-frequency periodicity in the data. This observation can be explained by the violation of the implicit assumptions of each approach. Method (i) implies the presence of an exactly periodic component with *P* = 1.0 yr in the analyzed data and stationarity of the discretized process *z*_{t+kP}, ∀*t* ∈ [1, *T*], *k* ∈ *I*(*t*). Method (ii) is based on the application of periodic sine and cosine functions for the decomposition of the signal (which means that one has to involve indefinitely many of them for certain classes of periodic signals) and implicitly assumes stationarity of the period. Method (iii) assumes the existence of a single globally linear attractive manifold, as discussed above. None of those assumptions is involved in the minimization of the functional (2) by means of wavelets–PCA. As we will demonstrate later, this leads to a reliable extraction of the low-frequency modes from the analyzed data.
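A minimal numerical illustration of why methods (i) and (ii) can fail: periodic-mean subtraction and Fourier thresholding are applied to a synthetic series whose seasonal phase drifts slowly, violating the exact-periodicity and stationarity assumptions named above. The synthetic signal and all thresholds are assumptions for illustration only.

```python
# Sketch: residual seasonality left by methods (i) and (ii) on a synthetic
# daily "temperature" whose seasonal cycle has a slowly drifting phase.
import numpy as np

rng = np.random.default_rng(1)
T, P = 3000, 365                                   # ~8 "years" of daily data
t = np.arange(T)
phase_drift = 0.1 * np.sin(2 * np.pi * t / (7 * P))  # slow phase wandering
signal = 10 * np.sin(2 * np.pi * t / P + phase_drift) + rng.normal(0, 1, T)

# Method (i): subtract the P-averaged temperature over all k with
# 1 <= t + k*P <= T, as in the text.
mean_cycle = np.array([signal[k::P].mean() for k in range(P)])
resid_i = signal - mean_cycle[t % P]

# Method (ii): eliminate Fourier modes exceeding a threshold amplitude.
spec = np.fft.rfft(signal)
spec[np.abs(spec) > 0.2 * np.abs(spec).max()] = 0.0
resid_ii = np.fft.irfft(spec, n=T)

def lowfreq_power(x, n_modes=20):
    """Power in the lowest nonzero Fourier modes (seasonal band and below)."""
    return np.sum(np.abs(np.fft.rfft(x)[1:n_modes]) ** 2)

# Compare against a pure-noise baseline of the same length.
print(lowfreq_power(resid_i), lowfreq_power(resid_ii),
      lowfreq_power(rng.normal(0, 1, T)))
```

Because the phase drifts, the yearly mean cycle and the thresholded Fourier modes both miss part of the seasonal energy, leaving low-frequency power in the residuals.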

We start by applying the standard PCA strategy and the wavelets–PCA (with *K* = 2, *m* = 49, *q* = 50) to the extended space representation of the dynamics in the same way as described in the previous example. The top of Fig. 6 demonstrates that there is no clear spectral gap indicating the presence of dominant low-dimensional manifolds in extended space for the standard PCA dimension reduction strategy. In contrast, as seen in the right-hand side of Fig. 6, wavelets–PCA exhibits a clear spectral gap after the first eigenvalue for both of the hidden states and therefore indicates the presence of two one-dimensional attractive manifolds (*m* = 1). We repeat the wavelets–PCA analysis for *K* = 2, *m* = 1, *q* = 50 and get a hidden sequence *γ_{i}*(*t*) that is identical to the previous case with *m* = 49.

To check the applicability of the HMM–PCA strategy, we test the conditional Gaussianity (for the two hidden states identified by wavelets–PCA) of the embedded data *x_{t}* for different values of the frame length *q* ≥ 4 [because the memory of the analyzed data was found to be four according to the common test based on application of autoregressive models (Brockwell and Davis 2002)]. Applying standard statistical tests (the *χ*^{2}, Kolmogorov–Smirnov, and Shapiro–Wilk tests) to the marginal statistical distributions of the extended data leads us to discard the Gaussian hypothesis (for a probability of type I error of *α* = 0.05). This indicates that the data in the hidden states are non-Gaussian, thereby prohibiting application of HMM–PCA.

To interpret the results of the wavelets–PCA analysis and to understand the physical meaning of the identified states, we first look at the estimated manifold parameters (*μ_{i}*, 𝗧_{i}) for each of the states (see Fig. 7). The lower panel of Fig. 7 demonstrates that the identified optimal projectors 𝗧_{i}, *i* = 1, 2, are significantly different for the two states. The parameters *μ_{i}*, *i* = 1, 2, shown in the top panel of Fig. 7 can be understood as the mean dynamical patterns of length *q* characteristic of the identified states. This means that the first hidden state is characterized by an almost linear mean dynamical pattern with an upward trend of the temperature dynamics (daily temperature difference of 0.12 ± 0.018), whereas the second hidden state is characterized by an almost linear mean dynamical pattern with a downward trend (daily temperature difference of −0.12 ± 0.02). It is important to mention that this obvious linearity of the hidden trends resulting from the wavelets–PCA procedure is not a result of assumptions made implicitly in the analysis procedure. The only assumption needed for the wavelets–PCA dimension reduction strategy is the discreteness of the underlying hidden path (HOS).

To make the last point clear, we look at the one-dimensional reduced representation of the extended space dynamics based on the wavelets–PCA strategy (*K* = 2, *m* = 1, *q* = 50; see the top panel of Fig. 8). The reduced dynamics exhibits a “sawtooth” form resulting from the consecutive linear upward and downward trends. The discontinuity of the reduced dynamics is explained by the jumps between the local attractive manifolds and is a direct consequence of the discreteness assumption involved in the wavelets–PCA procedure. Finally, the bottom panel of Fig. 8 compares the original time series with its reconstruction from the reduced wavelets–PCA representation. Analogously to the previous example, the reconstructed series is smooth and clearly reveals the dynamical behavior associated with each of the hidden attractive manifolds. The bottom panel of Fig. 5 shows that, in contrast to all other standard filtering methods we have used, the wavelets–PCA-based analysis identifies the attractive seasonal trend components.

After subtracting the identified linear attractive seasonal trend components from the analyzed data, we can check the applicability of local linear autoregressive (AR) models for the reduced data *x_{r}*(*t*) ∈ 𝗥^{1} [“local” in this context means that for each of the hidden states we can test for applicability of a specific AR model used to describe the filtered data locally in that state (Brockwell and Davis 2002)]. In the next example, this analysis will help us to construct predictive models based on a wavelets–PCA decomposition of the data.
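A local AR fit of the kind described here can be sketched by ordinary least squares (the regression approach of Brockwell and Davis 2002); the two state-local series below are synthetic AR(1) stand-ins, not the filtered temperature data.

```python
# Sketch: fitting a separate linear AR(p) model to the detrended data of
# each hidden state. The state-local series are synthetic stand-ins.
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit: x[t] = c + sum_j a_j * x[t-j] + noise."""
    Y = x[p:]
    Z = np.column_stack([np.ones(len(Y))] +
                        [x[p - j:len(x) - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef   # [c, a_1, ..., a_p]

rng = np.random.default_rng(3)
# Synthetic state-local data: AR(1) with different coefficients per "state".
x0 = np.zeros(2000)
x1 = np.zeros(2000)
for t in range(1999):
    x0[t + 1] = 0.8 * x0[t] + rng.normal()
    x1[t + 1] = -0.5 * x1[t] + rng.normal()

print(fit_ar(x0, 1))   # close to [0, 0.8]
print(fit_ar(x1, 1))   # close to [0, -0.5]
```

Comparing the residuals of such state-specific fits against a single global AR model is one simple way to test whether the local linear description is adequate.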

### c. Multidimensional example: Historical temperatures on a grid

To demonstrate the application of the presented framework to realistic multidimensional data with memory we choose daily mean values of the 2-m surface air temperature from the 40-yr European Centre for Medium-Range Weather Forecasts Re-Analysis (ERA-40) data (Simmons and Gibson 2000). We consider a region with the coordinates 33.0°–73.5°N, 27.0°W–45.5°E, which includes Europe and a part of the eastern North Atlantic. The original spatial resolution of the data was reduced to approximately 2.0° × 2.5° latitude and longitude by spatial averaging. Thus, we have temperature values on a grid of 20 × 29 points (resulting in *c* = 580). The time record is from 1976 to 2002 and includes 9736 daily values averaged from the 0000, 0600, 1200, and 1800 UTC observations.

Because both HMM–PCA and wavelets–PCA scale as *O*(*mn*^{2}) with regard to the extended dimension *n* = *qc* and reduced dimensionality *m*, our specific implementation of the presented dimension reduction strategy is restricted to the cases where *n* is not too big (no more than 1000–1500). This restriction is only due to the current implementation of the code and is not intrinsic to the presented strategy itself. On account of this restriction we cannot permit the frame length *q* to be arbitrarily large (because the extended space dimensionality *n* is a product of the observation process dimension *c* and the frame length *q*).

The application of the wavelets–PCA strategy for *K* = 2, *q* = 2, and *m* = 1 results in the identification of the hidden path shown in Fig. 9. The figure shows that the hidden path is almost identical to the one calculated from the one-dimensional Berlin temperature data in the previous example. This indicates that the hidden process identified from the current multidimensional temperature data is the same as in the example above and can be understood as a consequence of upward and downward trends in the dynamics. Inspection of the mean dynamical patterns associated with those trends reveals the linearity of the underlying dynamical trends. As in the example above, standard tests for linear autoregressive behavior of the filtered data can be applied (i.e., for the data with the attractive trend components subtracted), and the application of localized multivariate autoregressive (MVAR) processes for the construction of predictive models can be motivated (Brockwell and Davis 2002).

In contrast to the PCA type of the dimension reduction (where the essential manifolds are constructed as linear combinations of all of the original degrees of freedom), the POC approach identifies those manifolds from subsets of the original system dimensions. In the context of meteorological data analysis, this approach has an important advantage compared to PCA because only partial observation of the process is necessary to calculate the reduced representation of the overall system dynamics (i.e., it is enough to measure only a few of the original process dimensions to reconstruct the dynamics in all of the remaining degrees of freedom and to make predictions).

To identify the essential original dimensions in the analyzed temperature time series in each of the hidden states, we solve the POC optimization problems (2) for an increasing number *m* of original dimensions involved and for the fixed hidden path *γ_{i}*(*t*) that results from the wavelets–PCA method (with *K* = 2, *q* = 2, and *m* = 1). The time series data from the resulting POC subgrids are then used in the identification of two-dimensional MVAR [MVAR(2)] model parameters in each of the hidden states separately. The identification of the MVAR(2) model parameters is done by applying the standard regression method described in Brockwell and Davis (2002). The 1-day predictions based on those models are then used to reconstruct the full observational data on the 20 × 29 point grid (applying the identified POC projectors 𝗠_{i}, 𝗧_{i}), and the mean prediction errors are calculated for different subgrids. As can be seen from Fig. 10, to achieve a mean one-day prediction error of 2.5°C for the whole 20 × 29 grid it is enough to measure the temperature on only four grid points in winter and spring and on nine grid points in summer and autumn. Figure 11 shows the comparison of mean prediction errors based on hidden-state-specific MVAR(2) models as functions of the number *m* of original system dimensions involved. As expected, the mean prediction error decreases monotonically for both states as *m* increases.
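The per-state MVAR(2) regression and 1-step prediction can be sketched as follows; the bivariate series is a synthetic stand-in for a POC-subgrid temperature record, and all model coefficients are illustrative assumptions.

```python
# Sketch: MVAR(2) parameter identification by least-squares regression and
# 1-step ("1-day") prediction, as applied per hidden state in the text.
import numpy as np

def fit_mvar2(x):
    """x: (T, d). Fit x[t] = c + A1 x[t-1] + A2 x[t-2] + noise by OLS."""
    Y = x[2:]
    Z = np.hstack([np.ones((len(Y), 1)), x[1:-1], x[:-2]])
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return B   # shape (1 + 2d, d): stacked [c; A1^T; A2^T]

def predict_1step(B, x):
    """1-step predictions of x[2:] from the two preceding frames."""
    Z = np.hstack([np.ones((len(x) - 2, 1)), x[1:-1], x[:-2]])
    return Z @ B

rng = np.random.default_rng(4)
d, T = 2, 3000
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])   # illustrative stable coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
x = np.zeros((T, d))
for t in range(2, T):
    x[t] = A1 @ x[t - 1] + A2 @ x[t - 2] + rng.normal(size=d)

B = fit_mvar2(x[:2000])                   # identify on a training segment
pred = predict_1step(B, x[2000:])
err = np.sqrt(np.mean((pred - x[2002:]) ** 2))   # mean 1-step prediction error
print(err)    # roughly the unit noise level
```

In the paper's setting the predicted subgrid values would additionally be mapped back to the full 20 × 29 grid through the POC projectors before the error is evaluated.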

## 5. Discussion

A numerical framework for the simultaneous identification of hidden states and essential attractor manifolds in high-dimensional data is presented. The idea of the method is a combination of the method of delays (based on the Takens embedding theorem) with the minimization of a functional describing the Euclidean distance between original and reconstructed data (both in embedded representation). The extended space representation results in a Euclidean embedding of the underlying dynamical process and justifies the application of the Euclidean residual functional (2). The minimization of this functional results in the approximation of the attractive manifolds by *K* linear hyperplanes.

We have demonstrated how the solution of the general form of the optimization problem (2) can be approached numerically in two specific cases: (i) if the essential coordinates are linear combinations of all original degrees of freedom, resulting in wavelets–PCA and HMM–PCA approaches, and (ii) if the essential coordinates are chosen as linear combinations of a few optimally chosen original degrees of freedom, resulting in a method of POC. The subtraction of the mean dynamical patterns associated with each of the hidden states allowed us to eliminate the low-frequency seasonal components much better than was possible with the other standard methods we applied (see right-hand side of Fig. 5). We verified the applicability of a certain class of models for the filtered temperature series analysis and predictions, namely, the linear MVAR autoregressive models.

In the context of temperature data analysis, we have demonstrated that the hidden process switching between two attractive manifolds is almost exactly the same for a one-dimensional temperature series from a single grid point and for the whole multidimensional data on a grid. It is important to mention that this finding was independent of the assumptions that are typically needed for the mean 1-yr cycle subtraction (11) and therefore does not presuppose the existence and stationarity of the 1-yr temperature cycle. It indicates the presence of an identical hidden low-frequency mode switching between the seasonal trends. This feature should be validated for the different available sets of meteorological data.

Future work will require overcoming two technical problems in the present realization of the method: first, *n* cannot be taken too high (1000–1500 in the present version of the code), and second, the number of hidden states *K* is chosen a priori. Although the first problem can be solved by the implementation of an efficient variant of the Lanczos eigenvalue solver, surmounting the second problem will require the incorporation of some theoretical results from the spectral theory of Markov chains (Schütte and Huisinga 2003).

Another still unresolved problem is the efficient implementation of the POC optimization procedure: the current version relies on a direct combinatorial search of optimal subgrids and is not applicable for high values of *m*.
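The direct combinatorial search mentioned here can be sketched as follows; the residual criterion and the synthetic data are illustrative assumptions, but the C(*c*, *m*) growth of the search space is visible directly.

```python
# Sketch: direct combinatorial search over m-subsets of the c original
# dimensions. Each subset is scored by how well a linear regression on it
# reconstructs the full state; cost grows combinatorially with m.
import itertools
import numpy as np

rng = np.random.default_rng(5)
T, c, m = 500, 6, 2
latent = rng.normal(size=(T, m))
W = rng.normal(size=(m, c))
X = latent @ W + 0.1 * rng.normal(size=(T, c))   # c observed dims, rank ~ m

def reconstruction_error(X, subset):
    """Residual of predicting all dims linearly from the chosen subset."""
    Z = X[:, subset]
    B, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return np.linalg.norm(X - Z @ B)

best = min(itertools.combinations(range(c), m),
           key=lambda s: reconstruction_error(X, s))
n_tried = len(list(itertools.combinations(range(c), m)))
print("best subset:", best, "error:", reconstruction_error(X, best))
print("number of subsets tried:", n_tried)   # C(c, m); explodes for large m
```

Even this toy case makes the scaling problem plain: C(580, *m*) subsets would have to be scored for the grid in section 4c, which is why a more efficient search strategy is needed.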

## Acknowledgments

The author thanks Christof Schuette (FU) and Rupert Klein (FU/PIK) for their friendship and constant support. I also thank H. Osterle and S. Dolaptchiev from Potsdam Institute of Climate Research (PIK) for the use of the historical temperature data. The author also thanks the anonymous reviewers for useful comments.

## REFERENCES

Brockwell, P. J., and R. A. Davis, 2002: *Introduction to Time Series and Forecasting*. 2nd ed. Springer, 434 pp.

Broomhead, D., and G. King, 1986: Extracting qualitative dynamics from experimental data. *Physica D*, **20**, 217–236.

Broomhead, D., R. Jones, and G. King, 1988: Comment on “Singular-value decomposition and embedding dimension.” *Phys. Rev. A*, **37**, 5004–5005.

Dijkstra, H., and J. Neelin, 1995: On the attractors of an intermediate coupled ocean–atmosphere model. *Dyn. Atmos. Oceans*, **22**, 19–48.

Franzke, C., D. Crommelin, A. Fischer, and A. Majda, 2008: A hidden Markov model perspective on regimes and metastability in atmospheric flows. *J. Climate*, **21**, 1740–1757.

Gilmore, R., 1998: Topological analysis of chaotic dynamical systems. *Rev. Mod. Phys.*, **70**, 1455–1529.

Guo, B., and D. Huang, 2006: Existence of weak solutions and trajectory attractors for the moist atmospheric equations in geophysics. *J. Math. Phys.*, **47**, 083508, doi:10.1063/1.2245207.

Hamilton, J., 1989: A new approach to the economic analysis of nonstationary time series and the business cycle. *Econometrica*, **57**, 357–384.

Horenko, I., E. Dittmer, A. Fischer, and C. Schütte, 2006a: Automated model reduction for complex systems exhibiting metastability. *SIAM Multiscale Model. Simul.*, **5**, 802–827.

Horenko, I., J. Schmidt-Ehrenberg, and C. Schütte, 2006b: Set-oriented dimension reduction: Localizing principal component analysis via hidden Markov models. *Computational Life Sciences II: CompLife 2006*, M. Berthold, R. Glen, and I. Fischer, Eds., Lecture Notes in Bioinformatics, Vol. 4216, Springer, 98–115.

Horenko, I., R. Klein, S. Dolaptchiev, and C. Schütte, 2008: Automated generation of reduced stochastic weather models. I: Simultaneous dimension and model reduction for time series analysis. *SIAM Multiscale Model. Simul.*, **6**, 1125–1145.

Lisi, F., 1996: Statistical dimension estimation in singular spectrum analysis. *Stat. Methods Appl.*, **5**, 203–209.

Lorenz, E., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141.

Lorenz, E., 2006: An attractor embedded in the atmosphere. *Tellus*, **58A**, 425–429.

Majda, A., C. Franzke, A. Fischer, and D. Crommelin, 2006: Distinct metastable atmospheric regimes despite nearly Gaussian statistics: A paradigm model. *Proc. Natl. Acad. Sci. USA*, **103**, 8309–8314.

Mardia, K., J. Kent, and J. Bibby, 1979: *Multivariate Analysis*. Academic Press, 521 pp.

Packard, N., J. Crutchfield, J. Farmer, and R. Shaw, 1980: Geometry from a time series. *Phys. Rev. Lett.*, **45**, 712–716.

Schütte, C., and W. Huisinga, 2003: Biomolecular conformations can be identified as metastable sets of molecular dynamics. *Special Volume: Computational Chemistry*, P. G. Ciarlet and C. Le Bris, Eds., Vol. 10, *Handbook of Numerical Analysis*, Elsevier, 699–744.

Simmons, A., and J. Gibson, 2000: The ERA-40 project plan. ERA-40 Project Rep. Series 1, ECMWF, Reading, United Kingdom, 63 pp.

Takens, F., 1981: Detecting strange attractors in turbulence. *Dynamical Systems and Turbulence, Warwick 1980*, D. Rand and L. Young, Eds., Lecture Notes in Mathematics, Vol. 898, Springer, 366–381.

Vautard, R., P. Yiou, and M. Ghil, 1992: Singular-spectrum analysis: A toolkit for short, noisy chaotic signals. *Physica D*, **58**, 95–126.

Whitney, H., 1936: Differentiable manifolds. *Ann. Math.*, **37**, 645–680.

^{1} The data were kindly provided by H. Osterle and R. Klein from the Potsdam Institute of Climate Research (PIK).