## 1. Introduction

Over the last two decades, singular spectrum analysis (SSA) and its multivariate extension (M-SSA) have become widely used in the identification of intermittent or modulated oscillations in geophysical and climatic time series (Vautard et al. 1992). These methods are closely related to principal component analysis (PCA) or empirical orthogonal function (EOF) analysis; they are primarily designed for the reduction of the dimensionality of a given dataset and the compression of a maximum of variance into a minimal number of robust components. In the identification of regularity, as well as the reduction to a simpler and easier to interpret picture of complex observations, SSA and M-SSA have demonstrated their usefulness in numerous applications; see Ghil et al. (2002b) for a comprehensive overview.

Both SSA and M-SSA decompose the time-delayed embedding of a given dataset (Broomhead and King 1986a,b) into a set of data-adaptive orthogonal components, while M-SSA also takes cross correlations into account. It turns out that these components can be classified essentially into trends, oscillatory patterns, and noise and allow a reconstruction of a “skeleton” of the underlying dynamical system’s structure (Vautard and Ghil 1989; Ghil and Vautard 1991; Vautard et al. 1992). Several practical aspects of SSA and its application to time series analysis are covered in Golyandina et al. (2001) and Golyandina and Zhigljavsky (2013).

In practice, we are usually confronted with the problem of the regular part of the behavior being rather weak and hidden by substantial noise. Without any a priori knowledge of the underlying dynamics—and by visual inspection or by more common spectral methods alone—it may be difficult to find and formulate a proper model for the mechanisms that underlie the regularity, if any.

To prevent the misinterpretation of random fluctuations as oscillations in a univariate SSA analysis, Allen and Smith (1996, hereafter AS) formulated an objective statistical-significance test against a red noise null hypothesis. In their Monte Carlo–type test, these authors simulate short-term temporal correlations of the time series by means of a first-order autoregressive process [AR(1)]; such a process does not support oscillatory behavior and it is therefore well adapted to the task.
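As a rough illustration of the red noise null hypothesis, the following sketch simulates an AR(1) process and recovers its parameters from the lag-0 and lag-1 autocovariances. The function names, the parameter values, and the naive moment estimator are assumptions made for this example; AS use a lower-bias estimator that is better suited to short series.

```python
import numpy as np

def simulate_ar1(n, gamma, sigma, rng):
    """Simulate x[t] = gamma * x[t-1] + sigma * w[t], with w unit-variance white noise."""
    x = np.empty(n)
    # Draw the first value from the stationary distribution, so no spin-up is needed.
    x[0] = rng.normal(scale=sigma / np.sqrt(1.0 - gamma**2))
    for t in range(1, n):
        x[t] = gamma * x[t - 1] + sigma * rng.normal()
    return x

def fit_ar1_naive(x):
    """Naive moment estimate of (gamma, sigma); biased for short series."""
    x = x - x.mean()
    c0 = np.dot(x, x) / len(x)                 # lag-0 autocovariance
    c1 = np.dot(x[:-1], x[1:]) / (len(x) - 1)  # lag-1 autocovariance
    gamma = c1 / c0
    sigma = np.sqrt(c0 * (1.0 - gamma**2))
    return gamma, sigma

# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(0)
x = simulate_ar1(5000, gamma=0.7, sigma=1.0, rng=rng)
gamma_hat, sigma_hat = fit_ar1_naive(x)
```

Because the autocorrelation of such a process decays monotonically with lag, surrogates drawn from it carry no preferred oscillation period.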

In the case of multivariate data, it is not only temporal but also spatial correlations that have to be taken into account in the formulation of the null hypothesis. Allen and Robertson (1996) proposed a transformation of the data to pairwise uncorrelated principal components (PCs) by means of a conventional PCA analysis; their method then proceeds to fit independent AR(1) processes to each of the PCs.

Small mismatches in each of the null hypotheses are likely, however, to be amplified as the number of PCs increases, and this increases the risk of erroneously identifying oscillations where none are present. In an idealized experiment that uses harmonic oscillators hidden by irregular noise, we shall see that such misidentification may occur even when the superimposed noise originates from an AR(1) process as well; that is, there is no formally erroneous specification in the definition of the null hypothesis.

In this paper, we introduce a modification of the Monte Carlo test of Allen and Robertson (1996) that helps reduce so-called type-I errors and improves the discriminating power of the test. We propose here to apply Procrustes target rotation (Green 1952; Hurley and Cattell 1962; Schönemann 1966) in matching M-SSA eigendecompositions of the null hypothesis’s surrogate data with that of the observed data. In this setting, the Monte Carlo tests of AS and of Allen and Robertson (1996) emerge as special cases.

In our application to sea surface temperature (SST) data from the Simple Ocean Data Assimilation (SODA) reanalysis (Carton and Giese 2008; Giese and Ray 2011) and sea level pressure (SLP) data from the atmospheric Twentieth-Century Reanalysis, version 2 (20CRv2; Compo et al. 2011), we rely furthermore on the varimax M-SSA analysis introduced recently by Groth and Ghil (2011). Feliks et al. (2013) applied such a varimax rotation to climatic indices and showed that it helps reduce mixture problems in the EOFs and that it provides therewith much sharper spectral results.

In the present paper, we follow Feliks et al. (2011) and focus on SST data in the Gulf Stream region but combine it with SLP data in the entire North Atlantic region in a joint M-SSA analysis. The cleaner and sharper spectral results obtained herein support the previous authors’ findings of very similar interannual peaks in the Gulf Stream SST data and the North Atlantic Oscillation (NAO). Given that the proposed Monte Carlo test improves M-SSA’s discriminating power, our findings provide even stronger evidence for shared physical mechanisms between the Gulf Stream’s meandering and the atmospheric NAO.

The remainder of the paper is organized as follows: In section 2, we first discuss the key features of M-SSA and the recently introduced varimax rotation of the M-SSA solution. In section 3, we briefly review the Monte Carlo testing procedure as previously applied to M-SSA. In section 4, we introduce the concept of Procrustes target rotation as a generalization of such a test. In section 5, we apply the proposed testing methodology to an idealized statistical experiment. The results of this experiment are systematically evaluated in section 6. The proposed methodology is finally applied to observed SST and SLP data in section 7, and a summary of the results in section 8 concludes the paper.

## 2. SSA

Univariate SSA and its multivariate extension M-SSA rely on the classical Karhunen–Loève decomposition of a stochastic process. Broomhead and King (1986a,b) introduced SSA and M-SSA into dynamical system analysis, as a robust version of the Mañé–Takens idea to reconstruct dynamics from a time-delayed embedding of time series (Mañé 1981; Takens 1981). The method essentially diagonalizes the lag–covariance matrix with respect to a basis of orthogonal eigenvectors and computes the corresponding eigenvalues.

The M-SSA eigenvectors are often referred to as space–time empirical orthogonal functions (ST-EOFs; Plaut and Vautard 1994; Ghil et al. 2002b). The new data-adaptive eigenbasis is optimal in the sense that—for any truncation *k* with respect to the leading eigenelements—it minimizes the total mean squared error between the full dataset and that *k*-truncated reconstruction. Hence, projecting the dataset onto the data-adaptive eigenvectors simplifies the picture of a possibly high-dimensional complex system by viewing it in an optimal subspace.
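The optimality property can be checked numerically. The sketch below builds a time-delayed embedding of a toy two-channel dataset, diagonalizes its lag-covariance matrix, and compares the mean squared error of a *k*-truncated reconstruction with the sum of the discarded eigenvalues. The function names, the normalization of the covariance matrix, and the toy data are assumptions made for this example.

```python
import numpy as np

def trajectory_matrix(x, M):
    """Stack M lagged copies of each channel; x is (N, D), result is (N - M + 1, D * M)."""
    N, D = x.shape
    rows = N - M + 1
    blocks = [np.column_stack([x[m:m + rows, d] for m in range(M)]) for d in range(D)]
    return np.hstack(blocks)

def mssa(x, M):
    """Return eigenvalues (descending), ST-EOFs (as columns), and the trajectory matrix."""
    X = trajectory_matrix(x - x.mean(axis=0), M)
    C = X.T @ X / X.shape[0]          # lag-covariance matrix (assumed normalization)
    lam, E = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    return lam[order], E[:, order], X

# Toy data: one shared oscillation of period 20 plus white noise (hypothetical values).
rng = np.random.default_rng(1)
t = np.arange(300)
x = np.column_stack([np.sin(2 * np.pi * t / 20), np.cos(2 * np.pi * t / 20)])
x = x + 0.5 * rng.normal(size=x.shape)

lam, E, X = mssa(x, M=30)
k = 2
X_k = X @ E[:, :k] @ E[:, :k].T               # projection onto the k leading ST-EOFs
mse_k = np.sum((X - X_k) ** 2) / X.shape[0]   # equals the sum of the discarded eigenvalues
```

Any other orthonormal basis of the same dimension *k* would leave a residual at least as large as `mse_k`.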

Consider a multivariate time series with *D* channels of length *N*. Each channel *d* is embedded into an *M*-dimensional phase space by using lagged copies

The latter explicitly imposes a Toeplitz structure—with constant sub- and superdiagonals—on the covariance matrix, and the eigenvectors are then necessarily either symmetric or antisymmetric in the univariate case. This feature of the Toeplitz approach helps the detection of oscillations, but it enhances the risk of spurious-oscillation identification as well. In the multichannel case (Keppenne and Ghil 1993; Plaut and Vautard 1994), each block has Toeplitz structure, while the “grand” block covariance matrix is symmetric. Hence, all eigenvalues are real, but negative eigenvalues may appear as well. This negative bias has to be compensated by a positive bias in the positive eigenvalues, which may reduce the power of the statistical test.
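The symmetry property of the univariate Toeplitz estimate can be verified numerically. The following sketch builds a Toeplitz lag-covariance matrix from sample autocovariances and checks that each eigenvector is either symmetric or antisymmetric under reversal of the lag index, which holds when the eigenvalues are distinct. The function name, the `N - k` normalization of the autocovariances, and the white noise input are assumptions made for this example.

```python
import numpy as np

def toeplitz_lag_covariance(x, M):
    """Toeplitz covariance estimate with entries c(|i - j|) for lags 0, ..., M - 1."""
    x = x - x.mean()
    N = len(x)
    c = np.array([np.dot(x[:N - k], x[k:]) / (N - k) for k in range(M)])
    idx = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
    return c[idx]

rng = np.random.default_rng(2)
x = rng.normal(size=200)
C = toeplitz_lag_covariance(x, M=10)

lam, E = np.linalg.eigh(C)
flipped = E[::-1, :]   # reverse the lag index of every eigenvector
is_symmetric = np.all(np.isclose(flipped, E, atol=1e-8), axis=0)
is_antisymmetric = np.all(np.isclose(flipped, -E, atol=1e-8), axis=0)
```

With this estimator the matrix is not guaranteed to be positive semidefinite, which is how the negative eigenvalues mentioned above can arise.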

For simplicity, we rely here on the trajectory approach of Eq. (1) to calculate

*D* consecutive segments

*M*, each of which is associated with a channel in

Still, PCA overall was primarily designed for dimensionality reduction and signal compression of multichannel data. It is thus not clear, a priori, how informative PCA in general or M-SSA in particular can be in the interpretation of the underlying system’s dynamics or structure. Both PCA and M-SSA, in fact, suffer from degeneracy of eigenvectors when the corresponding eigenvalues are similar in size (North et al. 1982). Instead of clearly separating structurally distinct dynamical phenomena—e.g., distinct oscillations in the case of M-SSA—one often observes a mixture of two or more eigenvectors.

A common approach to reduce such mixture effects and to improve the physical interpretation of the results is to perform a rotation of the eigenvectors:

In the following, we no longer explicitly distinguish between unrotated and rotated eigenelements and thus drop the superscript

## 3. Monte Carlo SSA: A short review

Usually, the set of eigenvector–eigenvalue pairs is ranked in descending order of the eigenvalues. This informal ranking, however, should not be confused with the order of significance. It is only by testing against a specific null hypothesis that one can draw further conclusions about significant deterministic behavior (cf. AS).

In this context, AS proposed a framework of Monte Carlo testing for SSA that compares the variance captured by the data eigenvalues with that of an ensemble of surrogate data. Since climate and other geophysical records tend to have larger power at lower frequencies, the authors discuss the null hypothesis of an AR(1) process; note, though, that the class of null hypotheses is not restricted to such simple, linear stochastic processes. The AS approach also provides a low-bias estimator for the model parameters, given even a short time series. In the case of multivariate datasets, Allen and Robertson (1996) proposed first a rotation to uncorrelated principal components by means of a classical PCA, prior to the estimation of independent AR(1) processes.

Once an appropriate model for the null hypothesis has been formulated, an ensemble of surrogate data of the same length *N* and dimension *D* as the original dataset is generated and, for each realization, the covariance matrix

However, as AS further pointed out, this test tends to be too lax. Since the eigendecomposition puts maximum variance into a minimal number of data-adaptive components, artificial variance compression may occur; that is, SSA may account for too much variance in the largest eigenvalues and too little in the smallest. This increases the likelihood of the largest data eigenvalues being significant. Later on we shall see that this effect is amplified when the number of channels increases (e.g., when

Groth and Ghil (2011) have shown, furthermore, that this undesired effect of artificial variance compression can be at least partly reduced by a subsequent varimax rotation. The latter relaxes slightly the diagonal form of the eigenvalue matrix

Initially, this test gives confidence intervals for only

Such a pure frequency encoding of eigenvectors is quite helpful for single-channel SSA analysis (Ghil et al. 2002b), but it can be rather misleading in the multichannel setting of M-SSA. In the latter, it is possible that two eigenvectors have similar frequencies but different spatial patterns and thus could be linked to different types of dynamical behavior (e.g., two oscillators that are only slightly out of tune but not actually synchronized; Groth and Ghil 2011). The frequency pairing would then associate the same significance level to both eigenvectors, although their corresponding eigenvalues might be quite different.

In the end, the test in Eq. (8) may carry the opposite risk of being too conservative and thus not sufficiently sensitive. Since the null hypothesis approximates only certain aspects of the dataset, its eigendecomposition may be suitable for the description of the dataset only to a limited extent. In this context, AS noted that a weak signal may not align very well with the eigenvectors of the null hypothesis and that it could, therefore, be missed altogether. Finally, this test on the EOFs of the null hypothesis may not take into account all the advantages of a varimax M-SSA.

Significance tests on variance can be complemented by using other statistical properties of the oscillatory modes to distinguish regular behavior from noise (e.g., Paluš and Novotná 2004; Holmström and Launonen 2013).

## 4. Revisiting Monte Carlo SSA

In the following, we propose a Monte Carlo SSA algorithm that operates on the data eigenbasis in such a way as to keep its optimal data description properties as well as its high sensitivity in the identification of weak signals. At the same time, we allow for small mismatches between the data-generating process and the null hypothesis in order to reduce the risk of the test being either too lenient or overly conservative.

### a. Procrustes target rotation

The linear maps in Eq. (9) are based on the structure of the eigenvectors, and they disregard completely the eigenvalue spectrum. It is, however, the combination of both eigenvalues and eigenvectors that determines the spatiotemporal correlation structure of the underlying dynamical system. To improve the comparison of the data eigendecomposition with that of the surrogate data, we have to take both aspects into account.
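A minimal sketch of a target rotation that uses both eigenvalues and eigenvectors is given here. It assumes the classical SVD solution of the orthogonal Procrustes problem (Schönemann 1966) and scales each eigenvector by the square root of its eigenvalue before matching. The function names, the toy covariance matrices, and the definition of the rotated variances are assumptions made for this example and may differ in detail from the formulation in this section.

```python
import numpy as np

def scaled_eofs(C):
    """Eigenvectors of C, each multiplied by the square root of its eigenvalue."""
    lam, E = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    lam = np.clip(lam[order], 0.0, None)   # guard against tiny negative round-off
    return E[:, order] * np.sqrt(lam)

def procrustes_rotation(A, B):
    """Orthogonal T that minimizes ||A @ T - B||_F (classical SVD solution)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Toy covariance matrices standing in for data (C_X) and one surrogate (C_R).
rng = np.random.default_rng(3)
scales = np.array([3.0, 2.0, 1.5, 1.0, 0.8, 0.5])
Y_data = rng.normal(size=(200, 6)) * scales
Y_surr = rng.normal(size=(200, 6)) * scales
C_X = Y_data.T @ Y_data / 200
C_R = Y_surr.T @ Y_surr / 200

B = scaled_eofs(C_X)   # target: scaled data eigenvectors
A = scaled_eofs(C_R)   # scaled surrogate eigenvectors
T = procrustes_rotation(A, B)

# Variance of the surrogate along each rotated direction (assumed test statistic).
rotated_variances = np.sum((A @ T) ** 2, axis=0)
```

Since `T` is orthogonal, the rotation redistributes the surrogate variance among the components without changing its total.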


Cliff (1966) has suggested, as an alternative to the target rotation of one eigenbasis onto another, the possibility of comparing the two eigenbases with a common basis of maximum similarity. A solution to this problem is likewise given by Eq. (10), with the rotation of the data and surrogate data eigenvectors to

### b. Rank-deficient covariance matrix

So far we have assumed covariance matrices

*η* is equal to the larger of

The test based on the reduced covariance matrix is essentially a univariate test since no cross-channel covariance information is used in

A solution to this rank-deficiency problem is likewise given by the target rotation onto ST-EOFs, as described in the previous section. In this algorithm, it is the multiplication of the eigenvectors by their corresponding singular values that automatically restricts the projection to nonvanishing eigenelements and the orthogonality constraints in

As a special case of this solution, we could also simply remove all vanishing eigenelements from

### c. Composite null hypothesis

Once a certain part of the time series has been identified as signal, AS proposed a single-channel SSA composite test for the remainder of the time series against an AR(1) null hypothesis.

Following AS, we define

In the test against a pure-noise null hypothesis, *M*), and we estimate the AR(1) parameters from Eq. (16) instead, even when

In our proposed target-rotation algorithm, the individual realizations of the surrogate covariance matrix

The AS composite test we have considered so far applies only to the single-channel case of SSA. In the multichannel case of M-SSA, the covariance matrix

*D* individual

*M*, as long as

*d* a covariance matrix

*D* independent AR(1) processes, and the channel-wise solution from Eq. (19) is not unique.

This indeterminacy is a shortcoming of the test on T-EOFs, which are invariant with respect to an orthogonal rotation of the input channels. We will come back to this problem when comparing the null hypothesis test on T-EOFs with that on ST-EOFs in the presence of cross correlations. In the latter test, the ST-EOFs are not invariant and we proceed with the channel-wise solution of Eq. (19).

A drawback of this solution is the fact that only the projection onto T-EOFs [cf. Eq. (14)] satisfies

Note that in the multichannel parameter estimation of Eq. (19), there is no equivalent formulation to the single-channel case of Eq. (17) (i.e., no equivalent to a parameter estimation that would take the Procrustes target rotation into account).

## 5. Experimental design

To illustrate the limitations of the Monte Carlo tests in section 3, we consider the idealized case of a cluster of harmonic oscillators with red noise superimposed on the observations; hence, the model specification of the null hypothesis is correct.

We create *T*, and it has the same frequency dependence as the superimposed AR(1) process—that is, *γ* being the damping parameter of the AR(1) process. We consider the optimal case of independent AR(1) processes for each of the *D* channels and set

The multichannel time series is first transformed to uncorrelated PCs by means of a classical PCA analysis, and individual AR(1) processes are fitted to each of the PCs. Then, surrogate realizations of length *N* are created and transformed back.
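The surrogate generation just described can be sketched as follows. The function name, the naive AR(1) moment estimator, the clipping of the damping parameter, and the toy input are assumptions made for this example; a lower-bias estimator such as that of AS would normally be preferred.

```python
import numpy as np

def pca_ar1_surrogate(x, rng):
    """One surrogate: PCA, then independent AR(1) fits per PC, then back-transformation."""
    mean = x.mean(axis=0)
    xc = x - mean
    N = xc.shape[0]
    # Spatial EOFs from the zero-lag covariance matrix.
    _, S = np.linalg.eigh(xc.T @ xc / N)
    pcs = xc @ S
    surrogate_pcs = np.empty_like(pcs)
    for j in range(pcs.shape[1]):
        p = pcs[:, j]
        c0 = np.dot(p, p) / N
        c1 = np.dot(p[:-1], p[1:]) / (N - 1)
        gamma = np.clip(c1 / c0, -0.99, 0.99)   # keep the fitted process stationary
        sigma = np.sqrt(c0 * (1.0 - gamma**2))
        s = np.empty(N)
        s[0] = rng.normal(scale=np.sqrt(c0))    # start from the stationary variance
        for t in range(1, N):
            s[t] = gamma * s[t - 1] + sigma * rng.normal()
        surrogate_pcs[:, j] = s
    # Transform the surrogate PCs back to the original channels.
    return surrogate_pcs @ S.T + mean

# Toy input: three linearly mixed white noise channels (hypothetical).
rng = np.random.default_rng(6)
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
x = rng.normal(size=(400, 3)) @ mixing
surrogate = pca_ar1_surrogate(x, rng)
```

Repeated calls with the same generator yield an ensemble of surrogate realizations for the Monte Carlo test.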

Figure 1 shows the resulting eigenvalue spectrum for a typical realization of the simulation experiment. Figures 1a and 1b show the same spectrum but with different significance levels.

In Fig. 1a, the error bars are derived from the projection algorithm of Eq. (6). For each surrogate realization, the covariance matrix is determined and projected onto the data ST-EOFs.
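The projection step can be sketched as follows: each surrogate covariance matrix is projected onto the data eigenvectors, and percentiles of the projected variances across the ensemble serve as error bars. The function name, the 2.5% and 97.5% levels, and the white noise surrogates are assumptions made for this toy example; the test in the text uses AR(1) surrogates and ST-EOFs.

```python
import numpy as np

def projection_test(C_data, surrogate_covs, lower=2.5, upper=97.5):
    """Compare data eigenvalues with surrogate variances projected onto the data EOFs."""
    lam, E = np.linalg.eigh(C_data)
    order = np.argsort(lam)[::-1]
    lam, E = lam[order], E[:, order]
    # diag(E^T C_R E) for every surrogate covariance matrix C_R.
    proj = np.array([np.einsum("ik,ij,jk->k", E, C_R, E) for C_R in surrogate_covs])
    lo, hi = np.percentile(proj, [lower, upper], axis=0)
    return lam, lo, hi, lam > hi

# Toy data: one high-variance direction hidden among white noise channels.
rng = np.random.default_rng(4)
data = rng.normal(size=(300, 5))
data[:, 0] *= 5.0
C_data = data.T @ data / 300

# White noise surrogates with the same total variance as the data.
scale = np.sqrt(np.trace(C_data) / 5)
surrogate_covs = []
for _ in range(200):
    s = scale * rng.normal(size=(300, 5))
    surrogate_covs.append(s.T @ s / 300)

lam, lo, hi, significant = projection_test(C_data, surrogate_covs)
```

In this toy setting only the leading eigenvalue exceeds its upper percentile.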

We clearly observe the limitation of the significance test since more eigenvalues than expected are significant. Especially among the largest eigenvalues, there are some that correspond to the noise (open circles) and that, nonetheless, appear as significant oscillations. The appearance of such false positives (FPs) reduces the precision and limits the explanatory power of the test. All true oscillations (filled circles), however, also reject the null hypothesis of random fluctuations. These oscillations are referred to as true positives (TPs).

In Fig. 1b, the error bars are derived from the scaled target-rotation algorithm of Eq. (11). We first determine the eigendecomposition for each of the former surrogate realizations and then look for an optimal orthogonal rotation toward the data eigendecomposition, as described in section 4. We see that this algorithm reduces the number of FPs to one at

In this experiment, we have chosen the parameters in the limiting case where

Furthermore, we have performed a significance test on the null-hypothesis basis according to Eq. (8). It turns out that, with this test, only three out of four oscillations are detected. Figure 2 thus confirms the findings of AS that the test is less sensitive in the detection of weak signals. Moreover, we observe that each true oscillation is not necessarily captured by a single EOF pair of

## 6. Sensitivity versus specificity

### a. Basic definitions

We consider now the identification of an oscillatory pair as a binary classification test. Such a test is characterized by its sensitivity and specificity.

Ideally, one wishes for a test that is 100% sensitive (i.e., it identifies all actual oscillations) and also 100% specific (i.e., it does not mistake any spurious oscillation as an actual one). But, in practice, there is always a trade-off between these two properties of a test, and any classification test has a minimum error, known as the Bayes error rate (Fukunaga 1990).
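For concreteness, the two quantities can be computed from the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function name and the example labels below are hypothetical.

```python
def sensitivity_specificity(is_true, is_significant):
    """Return (sensitivity, specificity) of a binary classification test."""
    tp = fp = tn = fn = 0
    for truth, flagged in zip(is_true, is_significant):
        if truth and flagged:
            tp += 1          # true oscillation, correctly detected
        elif truth:
            fn += 1          # true oscillation, missed
        elif flagged:
            fp += 1          # noise component, wrongly flagged
        else:
            tn += 1          # noise component, correctly rejected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical outcome: 8 true oscillatory eigenelements among 20, with
# 7 of them detected and 2 noise eigenelements wrongly flagged.
is_true = [True] * 8 + [False] * 12
is_significant = [True] * 7 + [False] + [True] * 2 + [False] * 10
sens, spec = sensitivity_specificity(is_true, is_significant)
```

Here the sensitivity is 7/8 and the specificity is 10/12.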

In the following, we proceed with the two tests based on the data EOFs that we first compared in Fig. 1 and evaluate more systematically their capability to identify true oscillatory components. Based on an ensemble of multiple repetitions of the experiment, we thus evaluate the sensitivity as well as the specificity of these two tests.

First, we determine the eigendecomposition of a noise-free realization of the set of harmonic oscillators of section 5 in order to get a reference set of eigenelements that describes the harmonic part of the full time series. Next, we add the desired amount of red noise and rerun the eigenanalysis.

To identify true oscillations in the eigendecomposition of the noise-contaminated dataset, we (i) project the noise-free covariance matrix onto the noise-contaminated eigendecomposition [cf. Eq. (6)] and (ii) apply the proposed target-rotation algorithm from the noise-free eigenelements to the noise-contaminated eigenelements [Eq. (12)]. This way, the noise-free realization is compared with the noise-contaminated eigendecomposition exactly in the same way as we compare the surrogate realizations in the two significance tests.

Corresponding to the four imposed oscillations in the noise-free realization, we consider the eight largest elements in

Typically, we expect to identify the same eigenelements as true oscillations, with or without Procrustes rotation, but small differences are possible. To avoid biasing the identification algorithm in favor of one of the two significance tests, we keep only those noise-contaminated realizations that give the same identification results.

Once we have classified the eigenelements of the noise-contaminated dataset into true and spurious oscillations, we proceed with the two significance tests as in the previous section (cf. Fig. 1) and determine the number of TPs and FPs (#TP and #FP, respectively) from an ensemble of several repetitions of the experiment.

### b. Artificial variance compression

In the example of section 5 we have seen that the inclusion of eigenvalues, in addition to the EOFs, into the target-rotation algorithm has reduced the number of FPs, in particular for higher-rank eigenvalues. For this near-singular case of the covariance matrix (i.e.,

The experiment is repeated 100 times and the specificity is plotted as a function of the rank order *k* in Fig. 3a. As expected, there is an enhanced rate of FPs, especially at higher-rank EOFs, and that, in turn, reduces the specificity in both cases. In the target-rotation algorithm, however, this undesirable effect has been largely reduced, and we show in the following subsections that the overall number of FPs does indeed not exceed the expected nominal level, according to the chosen significance level.

Figure 3b shows the distribution of true oscillations among the full spectrum of eigenvalues, as a fraction of the number of replicates of the test. As a consequence of the low signal-to-noise ratio, it turns out that the true oscillations are not necessarily attributed to the leading eigenvalues and that the eigenvalue rank order used originally by Broomhead and King (1986a,b) is, therefore, not a reliable method for separating signal from noise; this result provides additional motivation for using a Monte Carlo test against a red noise process (cf. Allen and Smith 1996; Allen and Robertson 1996). To compare the success of the latter type of test with that of a simple rank-order test, we determine the number of TPs in the set of the eight largest eigenvalues as well. This criterion implies, of course, knowledge of the correct number of true oscillations, and it thus avoids estimating a break in the eigenvalue spectrum.

In the following, we first examine in greater detail the reliability of the two tests based on the data ST-EOFs, with and without scaling of the ST-EOFs, in different experimental settings. In particular, we consider the effect of modifying the number *D* of observed channels, the observation length *N*, and the window width *M*. Furthermore, we examine the effects of data compression into PCs. Finally, we compare the tests based on ST-EOFs with those based on T-EOFs.

### c. Number *D* of observed channels

We first analyze the influence of the observed number of channels. In our experiment of a cluster of oscillators with uncorrelated observational red noise, we expect to improve the detection rate of shared oscillations as the number of channels—and hence the amount of information—increases, while at the same time the signal-to-noise ratio is enhanced.

Figure 4 shows #TP and #FP (i.e., the average number of TPs and FPs) as a function of the number *D* of channels. An increase in *D* does indeed help the extraction of the signal from the noise, and the number of TPs converges toward its maximum value, which is

The convergence occurs already for

This improved detection rate, however, cannot be merely attributed to a concentration of oscillatory behavior in the largest eigenvalues alone. It turns out that no more than half of the eight largest eigenvalues are TPs (heavy gray line in Fig. 4).

While the sensitivity of the test does grow with *D*, it is indispensable to keep the specificity high as well. Figure 4 makes it clear that this is not the case for the projection algorithm (i.e., when

It is only the inclusion of eigenvalue information into the scaled target-rotation algorithm that finally helps control this type-I error, with the average number of FPs (solid red line in Fig. 4) now below the expected level of FPs over the entire range of *D* values. We note that this algorithm has a slight tendency toward a more conservative behavior, a property for which Procrustes methods have often been criticized (e.g., Paunonen 1997), but the detection rate remains comparable to that of the unscaled target-rotation algorithm. In particular, the scaled target-rotation algorithm would be the preferable choice with respect to the test’s explanatory power.

To improve the detection of weak signals in the case of single-channel SSA, Paluš and Novotná (2004) proposed a test on the regularity of the oscillatory modes rather than on their variance. Their significance test has a demonstrably enhanced sensitivity, but it is not clear whether it remains sufficiently specific as well; see, for instance, Figs. 2 and 4 in Paluš and Novotná (2004), with further noise EOFs becoming significant above the upper significance levels in the test on regularity. On the other hand, the frequency-pairing algorithm in their test is much less susceptible to the problem of artificial variance compression than the projection approach in Eq. (6); see again Fig. 2 in Paluš and Novotná (2004), with the number of significant noise EOFs below the lower significance levels becoming largely reduced.

### d. Length *N* of the observations

In the previous subsection we have seen that for the unscaled target-rotation algorithm, the specificity strongly depends on the ratio of the embedding dimension *N*. It is the singular character of the covariance matrix *N* considerably exceeds *M*.

For a fixed number *D* of channels, we have varied the observation length *N* and rerun the analysis as before. Figure 5 shows the average number of TPs and FPs for both the unscaled and scaled target-rotation algorithms.

On the one hand, we see that the sensitivity is enhanced as the observation length *N* increases and that #TP tends toward the maximum value for sufficiently large *N*. It is only for very short observations that it drops to much lower values.

On the other hand, #FP in the unscaled target-rotation algorithm remains much higher than expected, even for large *N*. The scaled target-rotation algorithm, though, helps control the type-I error over the whole range of *N* values in the figure. In particular, the latter algorithm remains superior to the projection algorithm even in a full-rank case of very large *N*. This comparison demonstrates that the extent to which the artificial variance compression influences the hypothesis test is difficult to predict, as already indicated by Allen and Robertson (1996).

### e. Window length *M*

The length *N* of the observations and the number *D* of channels are usually specified by the experimental setting, but the window length *M* is a flexible parameter to be judiciously chosen by the data analyst. In its choice, we are usually confronted with the general trade-off between a high spectral resolution on the one hand and a high temporal resolution on the other; the decision may therefore depend on the specific problem. Increasing the window length increases the number of RCs as well, and M-SSA will then provide a more detailed spectral decomposition of the dataset, while incurring the risk of an excessive number of FPs.

Figure 6 shows the results of analyzing the same dataset as in the previous subsections for various values of *M*. First of all, the figure shows a consistently high detection rate for both methods over a large range of *M* values, with a slight decrease at small window length only. Note that a minimal window size of

At the same time, we observe that the scaled target-rotation algorithm controls well type-I errors over the entire range of *M* values in the figure. The projection algorithm for full-rank covariance matrices with *M* does the number of FPs diminish to the nominal level—a feature that strongly limits the choice of *M*.

### f. The effects of data compression

In the M-SSA analysis of high-dimensional data, it is common practice to perform first a conventional PCA analysis and to retain only a subset of leading PCs for the subsequent M-SSA analysis. This preprocessing is meant to reduce the number of input channels and the computational cost while retaining a large fraction of the total variance. Usually a small number *L* of channels is kept, no matter how large the dataset [Dettinger et al. 1995; Allen and Robertson 1996; Robertson 1996; Ghil et al. 2002b; see also Table 1 in Moron et al. (1998)]. Since the resulting PCs are pairwise uncorrelated at zero lag, the M-SSA results can be simply tested against independent AR(1) processes (cf. section 3).

Even though the transformation to conventional PCs turns out to be a helpful preprocessing step in the M-SSA analysis, its implications for the properties of the subsequent signal detection are rather complex. Given that the signal of interest typically involves only a small fraction of the total variance, we would expect it to show up only among the spatial EOFs (S-EOFs) with relatively small variance, while the leading S-EOFs might capture other large-scale effects. In this respect, the prior transformation to PCs could interfere with the detection of weak signals.

To study the implications of this type of preprocessing, we increase the number of observed channels in the previous example of a cluster of harmonic oscillators to *γ* for the *D* superimposed AR(1) processes is randomly drawn from the interval

These correlations between channels are meant to simulate the effect of spatial correlations in a randomly perturbed spatiotemporal process and, given our choice of *κ*, we expect the noise part to dominate the prior PCA analysis.

Figure 7 illustrates the implications of a transformation of the input channels to conventional PCs. The variance that is captured by the *L* leading S-EOFs increases monotonically, as expected, as the number *L* of S-EOFs increases. Before proceeding to the M-SSA analysis, one usually selects a good trade-off between a low number of components and a high fraction of the variance they capture.

Although the S-EOFs provide an efficient representation of the spatial aspects of the signal’s variance, potentially important information about a possibly weak signal in the time domain may be missed. To illustrate this limitation, we project the reference signal—that is, the cluster of harmonic oscillators without noise—onto the same S-EOFs and determine its variance as well (light solid line in the figure).
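The comparison described here can be sketched in a few lines: the S-EOFs are computed from the full signal only, and both the full and the noise-free reference signal are then projected onto the *L* leading S-EOFs. The function name, the toy signal with random phases, and the shared noise component that mimics spatial correlation are assumptions made for this example.

```python
import numpy as np

def captured_variance(full, reference):
    """Cumulative variance of both signals in the L leading S-EOFs of `full`."""
    fc = full - full.mean(axis=0)
    rc = reference - reference.mean(axis=0)
    lam, S = np.linalg.eigh(fc.T @ fc / fc.shape[0])
    S = S[:, np.argsort(lam)[::-1]]      # S-EOFs ranked by the variance of the full signal
    var_full = np.cumsum(np.var(fc @ S, axis=0))
    var_ref = np.cumsum(np.var(rc @ S, axis=0))
    return var_full, var_ref

rng = np.random.default_rng(5)
N, D = 500, 10
t = np.arange(N)[:, None]
# Weak oscillation with a random initial phase in each channel (hypothetical values).
reference = 0.3 * np.sin(2 * np.pi * t / 25 + rng.uniform(0, 2 * np.pi, size=D))
# Noise with a component shared by all channels, mimicking spatial correlation.
noise = rng.normal(size=(N, D)) + 2.0 * rng.normal(size=(N, 1))
full = reference + noise

var_full, var_ref = captured_variance(full, reference)
ratio = var_ref / var_full   # variance ratio as a function of L = 1, ..., D
```

Entry `L - 1` of each array corresponds to retaining the `L` leading S-EOFs, so the last entries recover the total variances of the two signals.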

The variance of the reference signal increases monotonically with *L*, like that of the full signal, but the variance ratio between the reference and the full signal (dashed line) decreases markedly as *L* is reduced to values typically used in the type of preprocessing discussed herein; as a consequence, when

These effects have been further emphasized by the fact that the oscillations in our statistical experiment have random initial phases and that the PCA analysis is not taking this phase information into account. A similar negative effect is likewise possible in the detection of traveling oscillatory patterns (e.g., when having to rely on regionally averaged data) such as the SST data in section 7.

After the above preprocessing and retention of *L* leading conventional PCs, we next evaluate the detection rate of the subsequent M-SSA analysis. As before, we determine a set of reference ST-EOFs that correspond to oscillatory modes in order to identify true oscillatory behavior in the full signal’s ST-EOFs. To account for the projection of the full signal onto S-EOFs, we project the noise-free reference signal onto the same subspace as well and derive reference ST-EOFs from the M-SSA analysis of the corresponding PCs. Here we focus only on the scaled target-rotation algorithm and determine TPs and FPs in an ensemble of 50 repetitions of the experiment.

Figure 8 is similar to Figs. 4–6 and shows the number of TPs and FPs as a function of *L*. It turns out that the detection rate, or #TP (heavy solid line), is best for large *L* and that it does reach its optimal value in this limit. On the other hand, as the dataset is compressed into a decreasing number of PCs, the detection rate drops very markedly. This marked drop is comparable to that of Fig. 4, where *D*, the number of observed channels, affects the detection rate in a similar way.

In conclusion, we have seen that a compression of the dataset into a few leading PCs can strongly influence the capability of M-SSA to extract weak but significant signals. In particular, in the presence of other high-variance components, such preprocessing may substantially reduce the signal-to-noise ratio.

When the number of channels *D* exceeds the length *N* of the dataset (

Although it would be theoretically possible to perform M-SSA on the full set of *D* channels via the complementary eigendecomposition of the reduced covariance matrix (cf. section 2), it is computationally more efficient and numerically stabler to perform the subsequent varimax rotation on ST-EOFs of length

### g. Significance test on T-EOFs

So far, we have focused only on the two significance tests that are based on the unscaled and scaled target-rotation of ST-EOFs, respectively, in the two cases of a full-rank and a rank-deficient covariance matrix. In the latter case, we now compare these results with those of the two tests that are based on the unscaled and scaled target-rotation of T-EOFs (cf. section 4).

Since no cross-channel covariance information is taken into account in the two tests on T-EOFs, we first transform the input channels into spatial PCs. As already discussed, this helps eliminate cross correlations at lag zero, but correlations at other time lags may remain and influence the tests on T-EOFs. To compare its reliability with that of the two tests on ST-EOFs, we consider correlated noise, as in the previous subsection, with the noise coupling strength set to

Figure 9 shows the number of TPs and FPs as a function of *κ* for the two unscaled target-rotation algorithms onto ST-EOFs and T-EOFs. Note that the latter case equals that of a projection onto T-EOFs as proposed by Allen and Robertson (1996) [cf. Eq. (14)]. The parameters *N*, *M*, and *D* are chosen to emphasize the severity of the problem of artificial variance compression (e.g., by taking

For uncorrelated noise (

The scaled target-rotation algorithm, on the other hand, clearly helps reduce the number of FPs not only in the test on ST-EOFs but also in the test on T-EOFs, as shown in Fig. 10. When the coupling is weak (*κ* increases. The TP rate at weak coupling is likewise high in both tests, although it does drop faster in the test on T-EOFs as the noise coupling increases.

In conclusion, the null-hypothesis test that is based on the scaled target-rotation algorithm of ST-EOFs provides the most reliable results also for the case of a rank-deficient covariance matrix (

## 7. An application to North Atlantic data

The SODA reanalysis dataset (Giese and Ray 2011, version 2.2.4) provides monthly SST fields over the 138-yr interval 1871–2008. The SST, following Feliks et al. (2011), is taken equal to the temperature in the upper 5 m of the ocean.

The analysis here is for the Gulf Stream region (30°–50°N, 76°–35°W), which includes the Cape Hatteras and the Grand Banks subregions. Feliks et al. (2011) identified in either one or both of these two regions interannual spectral peaks of 8.5, 4.2, and 2.8 yr. As discussed in the introduction, these peaks are similar to those found in the NAO index (Gámiz-Fortis et al. 2002; Paluš and Novotná 2004; Feliks et al. 2013); hence, the possibility of shared mechanisms between the ocean and atmosphere in the North Atlantic basin is worth examining further.

We include in our analysis, therefore, atmospheric SLP data that cover the North Atlantic region (25°–80°N, 80°W–33°E), taken from the 20CRv2 project (Compo et al. 2011) in the same 138-yr interval. The SST and SLP data fields are first converted into anomalies; that is, we remove at each grid point the average value over the 138-yr interval. To account for geographical variations in the grid-size resolution, we further multiply each grid point by the cosine of its latitude.
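The two preprocessing steps above (removal of the time mean at each grid point, followed by cosine-of-latitude weighting) can be sketched as below. The function name `weighted_anomalies`, the `(time, lat, lon)` array layout, and the toy field are assumptions made for this illustration; they do not describe the actual SODA or 20CRv2 data handling.

```python
import numpy as np

def weighted_anomalies(field, lat_deg):
    """Remove the time mean at each grid point and weight by cos(latitude).

    field   : array of shape (time, lat, lon)  (assumed layout)
    lat_deg : array of shape (lat,), latitudes in degrees
    """
    anomalies = field - field.mean(axis=0, keepdims=True)
    weights = np.cos(np.deg2rad(lat_deg))[None, :, None]
    return anomalies * weights

# Toy field (illustrative only): 5 time steps on a 3 x 4 grid.
rng = np.random.default_rng(1)
lat = np.array([0.0, 30.0, 60.0])
field = 10.0 + rng.standard_normal((5, 3, 4))
anom = weighted_anomalies(field, lat)
```

The weighting follows the text literally, that is, each grid point is multiplied by the cosine of its latitude.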

In the present analysis, we focus on interannual variability only and subsample the data in time with an annual sampling rate. The latter step implies a low-pass filtering of the monthly data with a Chebyshev type-I filter from which we take all the July values; see Feliks et al. (2013) for details.

To combine the annual SST and SLP anomalies into a joint M-SSA analysis, we further normalize each field to unit variance and concatenate all channels into a single large trajectory matrix. The resolution in time and space gives a total of

Figure 11a shows the M-SSA results—together with the significance level from a test against a null hypothesis of pure noise—as derived from the scaled target-rotation algorithm of ST-EOFs. It turns out that five eigenvalues exceed the significance level of 99% in the interannual frequency band: the largest eigenvalue, which can be attributed to a trend component, and two oscillatory pairs at period lengths of 2.7 and 2.2 yr, respectively.

A significance test that is based on the T-EOFs of the null-hypothesis covariance matrix confirms these findings (cf. Fig. 11b). The probability of observing five or more excursions above the 99% quantile, as given by the binomial distribution, is approximately 1.3%. This probability means that, even without any prior knowledge of the underlying dynamics and the frequencies of interest, we can still reject the null hypothesis at a significance level that is only marginally lower than 99%.
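The binomial argument can be reproduced with a short calculation. The sketch below assumes independent tests, each with an excursion probability of 1% under the null hypothesis; the function name `binomial_tail` and the value `n_tests=100` are placeholders chosen here for illustration, not values taken from the analysis.

```python
from math import comb

def binomial_tail(n_tests, p, k):
    """P(X >= k) for X ~ Binomial(n_tests, p)."""
    below = sum(comb(n_tests, j) * p**j * (1 - p) ** (n_tests - j) for j in range(k))
    return 1.0 - below

# n_tests = 100 is a placeholder, not a value taken from the analysis above.
prob = binomial_tail(n_tests=100, p=0.01, k=5)
```

To check a reported probability, `n_tests` would be set to the number of eigenvalues that are actually tested.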

It appears that the low-frequency variance in EOFs 1–3 dominates the entire spectrum and that the significance levels for the adjacent frequencies are likely, therefore, to be overestimated; that is, EOFs 6–7 at a frequency of 0.05 yr^{−1} appear below the 15% quantile. Removing the leading components from the significance test could thus help reduce the bias in the AR(1) parameter estimation toward the strong trend components. In a composite test, we hence exclude not only EOFs 1–3 from the parameter estimation of the AR(1) null hypothesis but also the two oscillatory pairs that we found to be significant at the periods 2.7 and 2.2 yr, respectively.

Figure 12 shows the updated significance levels against a composite null hypothesis. In particular, at frequencies of

In the present case, however, we have prior reason to focus on a 7–8-yr frequency band. Feliks et al. (2011) identified significant oscillatory modes with periods of 8.5 and 10.5 yr in the SST field of the Cape Hatteras and Grand Banks regions, respectively, and have shown that these modes spin up in an atmospheric model and become synchronized with a simulated NAO index.

These authors have also shown that the spatiotemporal pattern of the 8.5-yr mode shares certain features with the so-called gyre mode (Jiang et al. 1995), which has a dominant 7–8-yr peak across a hierarchy of ocean models (Speich et al. 1995; Chang et al. 2001; Ghil et al. 2002a; Dijkstra and Ghil 2005). It is the modeling evidence in Feliks et al. (2011) combined with that obtained through the markedly improved discriminant power of the present significance test that provides the requisite, stronger evidence for the existence of such a joint mode in the analyzed SST and SLP data.

Relatively small discrepancies between the earlier frequency results of Feliks et al. (2011) and the present ones might be due to the shorter duration of the SODA reanalysis used there, which covered only 50 yr (Carton and Giese 2008, version 2.0.2–4), as well as to the absence of a subsequent varimax rotation toward unimodal ST-EOFs in the previous analysis. We have further analyzed the SST and SLP anomalies separately, and the results support the existence of similar 7.7-, 2.7-, and 2.2-yr modes common to both fields (not shown).

Figure 13 shows the reconstruction of the SST and SLP anomaly fields from the three oscillatory modes found to be highly significant. In the SST anomalies (Figs. 13a–c), we observe in all three modes a concentration of small areas of high variance and of alternating sign along the Gulf Stream front. This spatial structure yields a weaker versus a stronger meandering of the eastward jet in the opposite phases of each mode, while the resulting deflection from the mean Gulf Stream position is largest in the 7.7-yr mode.

In the SLP anomalies (Figs. 13d–f), we observe a clear dipole structure in all three modes. In both the 7.7- and 2.2-yr modes, this dipole has a meridional orientation, with its two extrema appearing near the Iceland low and the Azores high, respectively. The phase alternation in these two modes thus leads to a weaker versus a stronger meridional SLP gradient. In the 2.7-yr mode, the dipole structure is tilted and oriented southwest to northeast. Hence, this oscillatory mode contributes to a tilt of the total SLP pattern in that direction. In all three oscillatory modes, we are thus led to the conclusion that the variability in the Gulf Stream region is probably linked to variability in the NAO.

Finally, to compare our analysis of the full SST field in the Gulf Stream region with that of a simple univariate analysis of regionally averaged indicators, we have further analyzed the mean SST field, as well as the leading PC of the same region, in a single-channel SSA analysis. It turns out that in both cases, apart from a low-frequency component of high variance, no further oscillatory modes are found to be significant at the 99% level.

In the single-channel SSA analysis of the mean SLP anomalies, as well as the leading PC of the North Atlantic region, oscillatory modes similar to the three modes discussed above are found to be significant, but only at a lower, 97.5% level. The single-channel SSA results thus confirm our findings on the possibly negative effects of data compression on the detection of weak signals; they clearly demonstrate the advantages of a full multichannel spectral analysis over the analysis of a simple scalar indicator.

## 8. Summary

In numerous applications, multichannel singular spectrum analysis (M-SSA) has proven an efficient tool for the identification of regular behavior in high-dimensional data (Ghil et al. 2002b, and references therein). Since M-SSA, like single-channel SSA (Vautard and Ghil 1989), can generate oscillatory-looking patterns from pure noise, Monte Carlo–type tests have been developed to provide objective criteria for its significance (Allen and Smith 1996; Allen and Robertson 1996). In the present paper, we have proposed several ways of improving such tests and studied their performance as a function of various parameters, such as the number *D* of observed channels, the length *N* of the time series, and the window parameter *M*.

We have shown that straightforward Monte Carlo tests for M-SSA are more likely to fail as the embedding dimension grows relative to the length *N* of the observed time series. We introduced here Procrustes target rotation into the M-SSA setting and showed that it markedly improves the discriminant power of Monte Carlo–type tests by reducing the risk of type-I errors, while maintaining their sensitivity.
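For orientation, the generic orthogonal Procrustes problem (Schönemann 1966) that underlies such target rotations can be sketched as follows. This is the textbook solution via singular value decomposition, not a reproduction of the unscaled or scaled algorithms of the paper; the names `procrustes_rotation`, `source`, and `target`, as well as the toy matrices, are assumptions made for the example.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal matrix T minimizing ||source @ T - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

# Toy check (illustrative only): the source spans the same subspace as the
# target but in a mixed basis, so the rotation should undo the mixing.
rng = np.random.default_rng(2)
target, _ = np.linalg.qr(rng.standard_normal((12, 4)))  # 4 orthonormal columns
mix, _ = np.linalg.qr(rng.standard_normal((4, 4)))      # random orthogonal mixing
source = target @ mix.T
T = procrustes_rotation(source, target)
rotated = source @ T
```

In a Monte Carlo setting, `source` would hold the eigenvectors of a surrogate realization and `target` those of the data, so that each surrogate is compared with the data in the best-matching orientation.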

Our M-SSA analysis relied on varimax-rotated ST-EOFs (Groth and Ghil 2011), and we have shown that, in particular, the scaled target-rotation algorithm of ST-EOFs provides a robust significance test for both full-rank and rank-deficient covariance matrices. In the latter case, it clearly outperforms the test based on T-EOFs of a reduced covariance matrix, as proposed by Allen and Robertson (1996), especially in the presence of cross correlations.

We have further shown the limitations of preprocessing large datasets via data compression onto a few leading S-EOFs by means of a conventional PCA analysis in the M-SSA setting (i.e., when the goal is the detection of weak but significant signals in the space–time domain by M-SSA). Once a certain part of the time series has already been identified as signal, we have further proposed a generalization of the single-channel SSA composite test of Allen and Smith (1996) to M-SSA.

The evaluation of the methods was carried out at first in an idealized experiment using a cluster of harmonic oscillators with observational red noise. The perturbing noise is generated by the same class of AR(1) processes as the one used in the null hypothesis, and hence there is no formally erroneous specification of the latter.
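A red-noise channel of this class can be generated as in the sketch below. The recursion and the symbol names `gamma` (lag-1 coefficient) and `alpha` (innovation amplitude) are assumptions made for the illustration, and the parameter values are arbitrary rather than those of the experiment.

```python
import numpy as np

def ar1_noise(n, gamma, alpha, rng):
    """AR(1) series x[t] = gamma * x[t-1] + alpha * xi[t], with xi ~ N(0, 1).

    The first value is drawn from the stationary distribution, whose
    variance is alpha**2 / (1 - gamma**2); this requires |gamma| < 1.
    """
    x = np.empty(n)
    x[0] = rng.standard_normal() * alpha / np.sqrt(1.0 - gamma**2)
    for t in range(1, n):
        x[t] = gamma * x[t - 1] + alpha * rng.standard_normal()
    return x

rng = np.random.default_rng(3)
x = ar1_noise(20000, gamma=0.7, alpha=1.0, rng=rng)
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]  # sample lag-1 autocorrelation
```

For a long realization, `lag1` should lie close to `gamma`, which is the property the null hypothesis relies on when fitting the AR(1) parameters to the data.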

The end-to-end testing algorithm that results from these various comparisons is summarized in appendix B. We applied this algorithm—along with the new varimax rotation methods introduced and tested herein—to the analysis of interannual variability in the North Atlantic basin. This analysis combined the SST field in the Gulf Stream region that includes Cape Hatteras and the Grand Banks with the SLP field over the entire North Atlantic basin. Given the more refined spectral results of varimax-rotated ST-EOFs and the improved discriminant power of our modified Monte Carlo test, we have been able to provide even stronger evidence for shared mechanisms between the Gulf Stream region and the North Atlantic Oscillation in the interannual frequency band.

## Acknowledgments

We thank Yizhak Feliks, Dmitri Kondrashov, and Andrew W. Robertson for helpful suggestions. Andreas Groth was supported by a postdoctoral fellowship of the Groupement d’Intérêt Scientifique (GIS) Réseau de Recherche sur le Développement Soutenable (R2DS) of the Région Ile-de-France while at the Ecole Normale Supérieure in Paris. Andreas Groth and Michael Ghil both received support from NSF Grants DMS-1049253 and OCE-1243175, as well as from ONR’s Multidisciplinary University Research Initiative (MURI) Grant N00014-12-1-0911.

## APPENDIX A

### Varimax Rotation of ST-EOFs

The common idea of so-called simple-structure rotations is to find a rotation that simplifies the interpretation of the eigenvectors and that reduces mixture effects. There are several ways to quantify the simplicity of an eigenvector’s structure (Richman 1986). Varimax rotation attempts to find an orthogonal rotation given by

*S* is the number of rotated eigenvectors

Since the criterion

*d* to the *k*th ST-EOF and then try to maximize the variance in the participation index instead. Thus, the criterion becomes with the normalization

In this way, the criteria

As proposed by Groth and Ghil (2011), we scale each eigenvector by its singular value prior to rotation, in order to stabilize the results over a large range of the number *S* of rotated eigenvectors and to minimize the risk of an overrotation (O’Lenic and Livezey 1988). That is, we first derive an orthogonal rotation matrix ^{1/2} the singular values. This yields a nonorthogonal

As shown hereafter, the rotation

## APPENDIX B

### Summary of Monte Carlo SSA Algorithms

Table B1 summarizes the different versions of the Monte Carlo SSA algorithm that have been discussed in the present paper. This includes the original algorithms of Allen and Smith (1996) for single-channel SSA and of Allen and Robertson (1996) for M-SSA (first row of the table), as well as their generalization to covariance matrices of arbitrary rank via the unscaled target-rotation algorithm (second row). The first column of the table deals with the case of a full-rank covariance matrix, while the second column presents the rank-deficient case.

Table B1. Original Monte Carlo SSA algorithms of Allen and Smith (1996) and Allen and Robertson (1996) for full-rank and rank-deficient covariance matrices, respectively, in comparison with the proposed Procrustes target rotation algorithms.

All the algorithms in the upper half of the table (rows one and two) are based on the structure of the EOFs alone (i.e., they disregard completely the eigenvalue spectrum). To improve the comparison of the data eigendecomposition with that of the surrogate data, the scaled target-rotation algorithm is included in the lower half of the table (third and fourth rows).

Note that all the steps in each of these algorithms remain exactly the same in the case of varimax-rotated data eigenelements; that is, the target eigenelements

## REFERENCES

Allen, M. R., and A. W. Robertson, 1996: Distinguishing modulated oscillations from coloured noise in multivariate datasets. *Climate Dyn.*, **12**, 775–784, doi:10.1007/s003820050142.

Allen, M. R., and L. A. Smith, 1996: Monte Carlo SSA: Detecting irregular oscillations in the presence of colored noise. *J. Climate*, **9**, 3373–3404, doi:10.1175/1520-0442(1996)009<3373:MCSDIO>2.0.CO;2.

Broomhead, D. S., and G. P. King, 1986a: Extracting qualitative dynamics from experimental data. *Physica D*, **20**, 217–236, doi:10.1016/0167-2789(86)90031-X.

Broomhead, D. S., and G. P. King, 1986b: On the qualitative analysis of experimental dynamical systems. *Nonlinear Phenomena and Chaos*, S. Sarkar, Ed., Adam Hilger, 113–144.

Carton, J. A., and B. S. Giese, 2008: A reanalysis of ocean climate using Simple Ocean Data Assimilation (SODA). *Mon. Wea. Rev.*, **136**, 2999–3017, doi:10.1175/2007MWR1978.1.

Chang, K.-I., M. Ghil, K. Ide, and C.-C. A. Lai, 2001: Transition to aperiodic variability in a wind-driven double-gyre circulation model. *J. Phys. Oceanogr.*, **31**, 1260–1286, doi:10.1175/1520-0485(2001)031<1260:TTAVIA>2.0.CO;2.

Cliff, N., 1966: Orthogonal rotation to congruence. *Psychometrika*, **31**, 33–42, doi:10.1007/BF02289455.

Compo, G. P., et al., 2011: The Twentieth Century Reanalysis project. *Quart. J. Roy. Meteor. Soc.*, **137**, 1–28, doi:10.1002/qj.776.

Dettinger, M. D., M. Ghil, and C. L. Keppenne, 1995: Interannual and interdecadal variability in United States surface-air temperatures, 1910–87. *Climatic Change*, **31**, 35–66, doi:10.1007/BF01092980.

Dijkstra, H. A., and M. Ghil, 2005: Low-frequency variability of the large-scale ocean circulation: A dynamical systems approach. *Rev. Geophys.*, **43**, RG3002, doi:10.1029/2002RG000122.

Elsner, J. B., 1995: Significance tests for SSA. *Proc. 19th Climate Diagnostics Workshop*, College Park, MD, CAC/NOAA, 187–190.

Elsner, J. B., and A. A. Tsonis, 1994: Low-frequency oscillation. *Nature*, **372**, 507–508, doi:10.1038/372507b0.

Feliks, Y., M. Ghil, and A. W. Robertson, 2011: The atmospheric circulation over the North Atlantic as induced by the SST field. *J. Climate*, **24**, 522–542, doi:10.1175/2010JCLI3859.1.

Feliks, Y., A. Groth, A. W. Robertson, and M. Ghil, 2013: Oscillatory climate modes in the Indian monsoon, North Atlantic, and tropical Pacific. *J. Climate*, **26**, 9528–9544, doi:10.1175/JCLI-D-13-00105.1.

Fukunaga, K., 1990: *Introduction to Statistical Pattern Recognition*. 2nd ed., Academic Press, 592 pp.

Gámiz-Fortis, S., D. Pozo-Vázquez, M. Esteban-Parra, and Y. Castro-Díez, 2002: Spectral characteristics and predictability of the NAO assessed through singular spectral analysis. *J. Geophys. Res.*, **107**, 4685, doi:10.1029/2001JD001436.

Ghil, M., and R. Vautard, 1991: Interdecadal oscillations and the warming trend in global temperature time series. *Nature*, **350**, 324–327, doi:10.1038/350324a0.

Ghil, M., Y. Feliks, and L. Sushama, 2002a: Baroclinic and barotropic aspects of the wind-driven ocean circulation. *Physica D*, **167**, 1–35, doi:10.1016/S0167-2789(02)00392-5.

Ghil, M., et al., 2002b: Advanced spectral methods for climatic time series. *Rev. Geophys.*, **40**, 1–41, doi:10.1029/2000RG000092.

Giese, B. S., and S. Ray, 2011: El Niño variability in Simple Ocean Data Assimilation (SODA), 1871–2008. *J. Geophys. Res.*, **116**, C02024, doi:10.1029/2010JC006695.

Golyandina, N., and A. A. Zhigljavsky, 2013: *Singular Spectrum Analysis for Time Series*. Springer, 120 pp.

Golyandina, N., V. Nekrutkin, and A. Zhigliavsky, 2001: *Analysis of Time Series Structure: SSA and Related Techniques*. Chapman & Hall/CRC, 320 pp.

Green, B., 1952: The orthogonal approximation of an oblique structure in factor analysis. *Psychometrika*, **17**, 429–440, doi:10.1007/BF02288918.

Groth, A., and M. Ghil, 2011: Multivariate singular spectrum analysis and the road to phase synchronization. *Phys. Rev. E*, **84**, 036206, doi:10.1103/PhysRevE.84.036206.

Holmström, L., and I. Launonen, 2013: Posterior singular spectrum analysis. *Stat. Anal. Data Min.*, **6**, 387–402, doi:10.1002/sam.11195.

Hurley, J. R., and R. B. Cattell, 1962: The Procrustes program: Producing direct rotation to test a hypothesized factor structure. *Behav. Sci.*, **7**, 258–262, doi:10.1002/bs.3830070216.

Jiang, S., F. Jin, and M. Ghil, 1995: Multiple equilibria, periodic, and aperiodic solutions in a wind-driven, double-gyre, shallow-water model. *J. Phys. Oceanogr.*, **25**, 764–786, doi:10.1175/1520-0485(1995)025<0764:MEPAAS>2.0.CO;2.

Jolliffe, I. T., 2002: *Principal Component Analysis*. 2nd ed. Springer, 488 pp.

Kaiser, H., 1958: The varimax criterion for analytic rotation in factor analysis. *Psychometrika*, **23**, 187–200, doi:10.1007/BF02289233.

Keppenne, C. L., and M. Ghil, 1993: Adaptive filtering and prediction of noisy multivariate signals: An application to subannual variability in atmospheric angular momentum. *Int. J. Bifurcation Chaos*, **3**, 625–634, doi:10.1142/S0218127493000520.

Mañé, R., 1981: On the dimension of the compact invariant sets of certain non-linear maps. *Dynamical Systems and Turbulence*, Lecture Notes in Mathematics, Vol. 898, Springer, 230–242, doi:10.1007/BFb0091916.

Moron, V., R. Vautard, and M. Ghil, 1998: Trends, interdecadal and interannual oscillations in global sea-surface temperatures. *Climate Dyn.*, **14**, 545–569, doi:10.1007/s003820050241.

North, G. R., T. L. Bell, R. F. Cahalan, and F. J. Moeng, 1982: Sampling errors in the estimation of empirical orthogonal functions. *Mon. Wea. Rev.*, **110**, 699–706, doi:10.1175/1520-0493(1982)110<0699:SEITEO>2.0.CO;2.

O’Lenic, E. A., and R. E. Livezey, 1988: Practical considerations in the use of rotated principal component analysis (RPCA) in diagnostic studies of upper-air height fields. *Mon. Wea. Rev.*, **116**, 1682–1689, doi:10.1175/1520-0493(1988)116<1682:PCITUO>2.0.CO;2.

Paluš, M., and D. Novotná, 2004: Enhanced Monte Carlo singular system analysis and detection of period 7.8 years oscillatory modes in the monthly NAO index and temperature records. *Nonlinear Processes Geophys.*, **11**, 721–729, doi:10.5194/npg-11-721-2004.

Paunonen, S. V., 1997: On chance and factor congruence following orthogonal Procrustes rotation. *Educ. Psychol. Meas.*, **57**, 33–59, doi:10.1177/0013164497057001003.

Plaut, G., and R. Vautard, 1994: Spells of low-frequency oscillations and weather regimes in the Northern Hemisphere. *J. Atmos. Sci.*, **51**, 210–236, doi:10.1175/1520-0469(1994)051<0210:SOLFOA>2.0.CO;2.

Richman, M. B., 1986: Rotation of principal components. *Int. J. Climatol.*, **6**, 293–335, doi:10.1002/joc.3370060305.

Robertson, A. W., 1996: Interdecadal variability over the North Pacific in a multi-century climate simulation. *Climate Dyn.*, **12**, 227–241, doi:10.1007/BF00219498.

Schönemann, P., 1966: A generalized solution of the orthogonal Procrustes problem. *Psychometrika*, **31**, 1–10, doi:10.1007/BF02289451.

Speich, S., H. Dijkstra, and M. Ghil, 1995: Successive bifurcations in a shallow-water model applied to the wind-driven ocean circulation. *Nonlinear Processes Geophys.*, **2**, 241–268, doi:10.5194/npg-2-241-1995.

Takens, F., 1981: Detecting strange attractors in turbulence. *Dynamical Systems and Turbulence*, Lecture Notes in Mathematics, Vol. 898, Springer, 366–381.

Vautard, R., and M. Ghil, 1989: Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. *Physica D*, **35**, 395–424, doi:10.1016/0167-2789(89)90077-8.

Vautard, R., P. Yiou, and M. Ghil, 1992: Singular-spectrum analysis: A toolkit for short, noisy chaotic signals. *Physica D*, **58**, 95–126, doi:10.1016/0167-2789(92)90103-T.

^{1} In the case of varimax-rotated eigenelements, the target becomes