## Abstract

The variability of Indo-Pacific SST on seasonal to multidecadal time scales is investigated using a recently introduced technique called nonlinear Laplacian spectral analysis (NLSA). Through this technique, drawbacks associated with ad hoc prefiltering of the input data are avoided, enabling recovery of low-frequency and intermittent modes not previously accessible via classical approaches. Here, a multiscale hierarchy of spatiotemporal modes is identified for Indo-Pacific SST in millennial control runs of CCSM4 and GFDL CM3 and in HadISST data. On interannual time scales, a mode with spatiotemporal patterns corresponding to the fundamental component of ENSO emerges, along with ENSO-modulated annual modes consistent with combination mode theory. The ENSO combination modes also feature prominent activity in the Indian Ocean, explaining a significant fraction of the SST variance in regions associated with the Indian Ocean dipole and suggesting a deterministic relationship between these patterns. A pattern representing the tropospheric biennial oscillation also emerges along with its associated annual cycle combination modes. On multidecadal time scales, the dominant NLSA mode in the model data is predominantly active in the western tropical Pacific; this pattern is referred to here as the west Pacific multidecadal mode (WPMM). The interdecadal Pacific oscillation also emerges as a distinct NLSA mode, though with smaller explained variance than the WPMM. Analogous modes on interannual and decadal time scales are also identified in HadISST data for the industrial era, as well as in model data of comparable time span, though decadal modes are either absent or of degraded quality in these datasets.

## 1. Introduction

The internal variability of the Indo-Pacific Ocean is of wide scientific and societal interest as it has been found to be associated with a plethora of phenomena characterizing Earth’s climate. Among these phenomena is El Niño–Southern Oscillation (ENSO)—the dominant mode of interannual variability of the coupled climate system (Bjerknes 1969; Philander 1990; Neelin et al. 1998; Wang and Picaut 2004; Sarachik and Cane 2010; Trenberth 2013; Wang et al. 2017). The corresponding warm El Niño and cold La Niña events occur predominantly over the tropical Pacific Ocean and are sustained by ocean–atmosphere interactions producing a positive feedback, known as Bjerknes feedback (Bjerknes 1969; Wyrtki 1975; Cane and Zebiak 1985), whereby a positive SST anomaly in the eastern equatorial Pacific induces anomalous westerly surface winds, which reinforce the original SST anomaly through anomalous downwelling in the eastern Pacific. Moreover, negative-feedback mechanisms, leading to a self-sustained interannual oscillation (Cane and Zebiak 1985), have been extensively studied through conceptual models emphasizing equatorial wave propagation (Suarez and Schopf 1988; Battisti and Hirst 1989; Picaut et al. 1997; Weisberg and Wang 1997), heat recharge/discharge (Jin 1997a,b), or aspects of both (Wang 2001; Graham et al. 2015). Such models are part of a model hierarchy for ENSO that includes low-order deterministic and stochastic models (Thompson and Battisti 2001; Timmermann et al. 2003; Kleeman 2008; Gehne et al. 2014), intermediate-complexity models (Zebiak and Cane 1987; Suarez and Schopf 1988; Jin and Neelin 1993; Kleeman et al. 1999), and coupled climate models (Neelin et al. 1994; Deser et al. 2012; Choi et al. 2013; Bellenger et al. 2014).

Individual El Niño and La Niña events occur every 3–8 yr and are locked to the seasonal cycle (Rasmusson and Carpenter 1982), developing early in the boreal summer and peaking during the following winter. Among many studies on the phase synchronization of ENSO with the annual cycle, quadratic nonlinearities in the coupled atmosphere–ocean system have been proposed as a mechanism for producing so-called ENSO combination modes with spectral peaks at the sum and difference of the ENSO and annual cycle frequencies (Stein et al. 2011, 2014; Stuecker et al. 2013, 2015a; Ren et al. 2016). As a result of this coupling, the atmospheric response of ENSO combination modes exhibits a southward migration of surface zonal winds taking place in boreal winter (McGregor et al. 2012), which is generally associated with the seasonally locked termination of ENSO events (Vecchi 2006; Lengaigne et al. 2006). Another important feature of ENSO, which is a hallmark of nonlinear dynamics, is that asymmetries exist between the El Niño and La Niña phases, characterized by positively skewed SST anomalies due to strong El Niño events (Burgers and Stephenson 1999; An and Jin 2004; Weller and Cai 2013).

The tropospheric biennial oscillation (TBO; Meehl 1987, 1993, 1994, 1997; Li and Zhang 2002; Li et al. 2006) and the Indian Ocean dipole (IOD; Saji et al. 1999; Webster et al. 1999) are other prominent modes of interannual variability confined to the Indo-Pacific domain that are also seasonally locked and driven by strong atmosphere–ocean coupling. The TBO describes biennial variability of the Asian–Australian monsoon in a process where a weak Australian monsoon in boreal winter with anomalous easterlies and negative SST anomalies over the Maritime Continent is followed by a strong Indian monsoon with positive SST anomalies over the Maritime Continent and a westward shift of the Walker circulation, which is followed in turn by a strong Australian monsoon (and opposite-sign SST circulation anomalies) in the subsequent boreal winter (e.g., Meehl 1993, 1997; Meehl and Arblaster 2002; Wu and Kirtman 2004; Li et al. 2006). On the other hand, the IOD emerges in late spring, peaks in October, and quickly diminishes afterward. The IOD is usually measured by means of the IOD index, which is defined as the difference in SST anomalies between the western (50°–70°E, 10°S–10°N) and eastern (90°–110°E, 10°S–0°) parts of the Indian Ocean (Saji et al. 1999). A positive IOD phase is associated with a negative SST anomaly west of Sumatra, which is reinforced through Bjerknes feedback by coupling with correspondingly weaker convection and westerlies.

Although the IOD was discovered over 15 yr ago and has received considerable attention by the climate science community, it is still not fully understood. Several studies [e.g., Annamalai et al. (2003) and references therein] report that the IOD is correlated with the ENSO and have argued that the former is forced by the latter. Others (e.g., Yamagata et al. 2003; Ashok et al. 2003; Behera et al. 2006) claim that the IOD is an intrinsic mode of the Indian Ocean and pointed to cases of strong IOD events emerging in the absence of El Niño and La Niña events. Establishing the independence of the highly correlated IOD and ENSO events through classical methods such as empirical orthogonal function (EOF) analysis is far from straightforward, and despite extensive research on IOD dynamics and climatic impacts (e.g., Ashok et al. 2007; Manatsa et al. 2008; Cai et al. 2012, 2013; Pepler et al. 2014), there is no consensus on whether it is an intrinsic mode of Indian Ocean variability or a regional manifestation of ENSO. Therefore, alternative ways of reexamining its nature should still be sought. Similarly, the origin and nature of the TBO is a debated topic with some studies finding that it is an ENSO-slaved phenomenon (Goswami 1995), others that it can exist independently of ENSO forcing (Li et al. 2006), and yet others questioning IOD statistics altogether (Stuecker et al. 2015b).

Recently, Dommenget (2011) and Zhao and Nigam (2015) undertook an effort to verify previously applied methodologies and argued that upon careful analysis no IOD pattern emerges independently of ENSO. Moreover, Izumo et al. (2010, 2014) and Yuan et al. (2011, 2013) reported that the IOD is a good predictor of ENSO more than one year ahead. If true, this would potentially improve the predictability of ENSO beyond the so-called spring barrier (Webster and Hoyos 2010), a well-recognized scientific challenge and at the same time an issue of significant importance to economies around the world. Arguably, however, until the relation between the IOD and ENSO is correctly established, progress in recognizing the impact of these phenomena on other parts of the climate system will be hampered.

Beyond prediction of interannual fluctuations of Earth’s climate system, prognosing decadal to centennial variability also constitutes a grand challenge for climate science and potentially entails benefits to economies worldwide. Most studies on the decadal variability of the Indo-Pacific basin have focused on the Pacific decadal oscillation (PDO; Mantua et al. 1997) or the interdecadal Pacific oscillation (IPO; Power et al. 1999), which are considered by some as part of the same phenomenon. The PDO is traditionally defined over the North Pacific (20°–65°N, 120°E–110°W) through the first EOF of seasonally detrended monthly data. The IPO is usually defined as the leading EOF of low-pass-filtered SST anomalies over the whole Pacific basin. It is often described as ENSO-like decadal variability owing to the similarity of its spatial pattern to ENSO, although the two patterns differ in their relative activity in the tropical and North Pacific Oceans.

A major challenge in research on decadal variability is the relatively short observational record from which it is difficult to robustly establish modes operating at decadal time scales and to project their activity into the future. Han et al. (2014) examined a longer observational record and showed that the IPO does not explain in itself sea level rise over certain decades. Based on this finding, they hypothesized the existence of another decadal mode. They argued that this mode acts mainly over the tropical Indian Ocean (as opposed to the IPO, which is active in the Pacific Ocean) but did not address the question of its origin. England et al. (2014) analyzed the most recent hiatus and associated it with exceptionally strong trade winds [also considered responsible for sea level rise by Han et al. (2014)] and increased heat uptake by the ocean. Notably, they observed that the magnitude of trade wind strengthening could not be explained solely by the IPO, suggesting that other factors (e.g., the internal variability of the Indian Ocean) should play a role. Without further explanation of the origin of trade wind enhancement, a heat budget analysis of the modeled hiatus was performed based on a simulation with prescribed values of trade winds (as models fail to produce as strong winds as observed; see Luo et al. 2012). Several studies (Karnauskas et al. 2009; Solomon and Newman 2012; Seager et al. 2015) also stress the need to consider modes of Indo-Pacific internal variability other than the IPO to build a more complete explanation of the recent climatic record. In particular, other indices created from low-pass-filtered (decadal and centennial) SST (Timmermann 2003; Yeh and Kirtman 2004; Sun and Yu 2009; Ogata et al. 2013), whose associated spatial patterns generally consist of equatorial SST zonal dipoles, have been associated with decadal fluctuations in models and observations.

In this work, presented as a two-part series, we reexamine the existence and significance of the internal variability of the Indo-Pacific Ocean using a recent data analysis technique called nonlinear Laplacian spectral analysis (NLSA; Giannakis and Majda 2011, 2012c, 2013, 2014; Giannakis and Majda 2014 is hereafter GM). NLSA combines ideas from delay-coordinate embeddings of dynamical systems and kernel methods for machine learning to extract spatiotemporal modes of variability from high-dimensional time series. Unlike classical linear techniques such as EOF analysis, which typically require filtering or detrending of the data to isolate the time scale of interest, NLSA is able to simultaneously extract modes of multiple time scales with no ad hoc preprocessing of the data. The NLSA modes are computed from the eigenfunctions of a discrete Laplace–Beltrami operator on the nonlinear manifold sampled by the data. These eigenfunctions can be thought of as nonlinear analogs of the principal components (PCs) in EOF analysis, and they capture intrinsic time scales associated with the dynamical system generating the data (Berry et al. 2013; Giannakis 2017). As a result, NLSA is a useful tool for objectively recovering quasiperiodic patterns such as the dominant oscillations in the Indo-Pacific system from high-dimensional time series. Among other applications, NLSA and related kernel algorithms have been used to recover and predict low-frequency modes of variability in the North Pacific Ocean (Giannakis and Majda 2012a,b; Bushuk et al. 2014; Comeau et al. 2017), including the PDO and the North Pacific Gyre oscillation (Di Lorenzo et al. 2008).

In this paper, we study NLSA modes of Indo-Pacific SST variability in control integrations of coupled climate models and in observational data. Our analyzed model data are monthly averaged output from millennial control integrations of the Community Climate System Model, version 4 (CCSM4; Gent et al. 2011; Deser et al. 2012), and the Geophysical Fluid Dynamics Laboratory Climate Model, version 3 (GFDL CM3; Griffies et al. 2011). As observational data, we study monthly averaged HadISST data (Rayner et al. 2003) spanning the industrial and satellite eras. Applied to these datasets, NLSA yields a hierarchy of modes spanning seasonal to multidecadal time scales. These modes include the annual cycle and its harmonics, as well as a family of aperiodic modes. In particular, the dominant patterns at the interannual time scale correspond to ENSO and the annual modulations of these patterns as predicted by combination mode theory. Moreover, NLSA recovers a distinct family of modes representing the TBO together with the associated combination modes capturing the phase locking of this pattern to the annual cycle. On decadal time scales, the dominant mode recovered by NLSA from model output is characterized by a pronounced anomaly in the tropical equatorial western Pacific, with anomalies of opposite sign identified west of Tasmania and over the eastern Pacific. This pattern, which we refer to here as the west Pacific multidecadal mode (WPMM), resembles more closely the second EOF of low-pass-filtered SST as opposed to the IPO. The latter is also recovered as a distinct mode by NLSA. In the second part of this work (Giannakis and Slawinska 2017, manuscript submitted to *J. Climate*, hereafter Part II), we study various linkages and climatic impacts of the modes identified in this paper.

This paper is organized as follows. Section 2 contains a high-level description of NLSA algorithms. In section 3, we describe the datasets studied in this work. In section 4, we present and discuss the modes of Indo-Pacific SST recovered by NLSA from the CCSM4 dataset. Section 5 contains a sensitivity analysis of these results on the choice of NLSA parameters, spatial domain, temporal extent of the data, and the presence of noise. In sections 6 and 7, we present and discuss the corresponding Indo-Pacific SST modes in the CM3 and HadISST data, respectively. The paper ends in section 8 with concluding remarks. A comparison of the NLSA modes recovered from CCSM4 with those obtained via singular spectrum analysis (SSA) is included in an appendix. Movies illustrating the dynamical evolution of the NLSA modes are provided as supplementary material. The supplementary material also contains a discussion on the properties of the annual cycle modes from the CM3 and HadISST data.

## 2. Nonlinear Laplacian spectral analysis algorithms

NLSA (GM) is a nonlinear time series analysis technique designed to extract intrinsic spatiotemporal modes of variability from high-dimensional data generated by dynamical systems. In this section, we present an overview of the technique, referring the reader to GM and Giannakis (2015) for more thorough discussions.

Let be a multivariate time series consisting of *d*-dimensional samples *x*_{i} taken uniformly at times *t*_{i} = *iδt*. In sections 3–7 ahead, the *x*_{i} will be vectors consisting of monthly averaged SST gridpoint values in the Indo-Pacific sector. In NLSA, the samples are viewed as the values of an observable defined on the phase space of an abstract dynamical system (here, Earth’s climate system), and the goal is to decompose into a collection of temporal and spatial patterns with clear time-scale separation and without prefiltering the data. In the Indo-Pacific SST application studied here these patterns include the seasonal cycle, interannual oscillations such as ENSO, decadal oscillations such as the WPMM, and also modulated patterns representing interactions of these basic modes.

Similarly to classical eigendecomposition techniques such as EOF analysis, SSA (Ghil et al. 2002), and the extended EOF (EEOF) analysis (which is equivalent to SSA), NLSA extracts these patterns through the eigenvectors of a linear operator defined on the dataset. However, instead of the global covariance operator utilized in EEOFs and SSA, NLSA employs a Laplace–Beltrami operator defined on the nonlinear manifold sampled by the data and approximated through kernel techniques (Belkin and Niyogi 2003; Coifman and Lafon 2006). The eigenfunctions of this operator, which can be thought of as nonlinear analogs to PCs, are used for dimension reduction and spatiotemporal pattern extraction. A key feature of the Laplace–Beltrami operator in NLSA is that it is constructed from time-lagged sequences of data as opposed to individual snapshots *x*_{i}. As has been shown theoretically (Berry et al. 2013), the eigenfunctions from time-lagged embedded data separate the observed signal into intrinsic time scales associated with stable Lyapunov directions of the dynamics, such as the quasiperiodic oscillations that are the focus of this work.

### a. Time-lagged embedding

Time-lagged embedding, which is also familiar from SSA and EEOFs, involves mapping each snapshot *x*_{i} into a time-lagged sequence *X*_{i} = (*x*_{i}, *x*_{i+1}, …, *x*_{i −q+1}) ∈ where *q* is the number of lags. This technique was originally introduced in state-space reconstruction methods for dynamical systems (Packard et al. 1980; Takens 1981; Broomhead and King 1986; Sauer et al. 1991), where it was established that for sufficiently large *q*, the dataset with *N* = *n* − *q* + 1, in lagged embedding space lies with high probability on a manifold , which is equivalent (diffeomorphic) to the attractor of the dynamical system generating the data, even if the samples *x*_{i} do not preserve the attractor (i.e., the observations are incomplete, as is frequently the case in atmosphere ocean science).

Time-lagged embedding modifies the topology of the data, making it more Markovian if the observations are incomplete, but it also modifies its geometry. That is, inner products in embedding space and the corresponding distances, given by

between pairs of lagged embedded vectors depend not only on the snapshots *x*_{i} and *x*_{j} but also on the dynamical trajectories that the system took to arrive at these snapshots. As a result, distances in lagged embedding space induce a Riemannian geometry on that depends on the dynamical system generating the data. NLSA employs kernel methods for nonlinear data analysis (Belkin and Niyogi 2003; Coifman and Lafon 2006) to empirically approximate differential operators that depend on the induced geometry, and it performs dimension reduction and spatiotemporal pattern extraction through the eigenfunctions of these operators, as follows.

### b. Laplace–Beltrami eigenfunctions

Let *ξ*_{i} = *X*_{i} − *X*_{i−1} be the time tendency of state *X*_{i}, and let cos*θ*_{ij} = (*X*_{i} − *X*_{j})/(‖*ξ*_{i}‖ ‖*X*_{i} − *X*_{j}‖) be the cosine of the angle between *ξ*_{i} and the relative displacement vector *X*_{i} − *X*_{j} between states *X*_{i} and *X*_{j}. Fixing the parameters *ε* > 0 and *ζ* ∈ [0, 1), we define the kernel function as follows:

This quantity describes an anisotropic pairwise measure of similarity between states *X*_{i} and *X*_{j} that favors states that are mutually aligned to the dynamical flow (i.e., states with cos^{2}*θ*_{ij} ≈ 1 and cos^{2}*θ*_{ji} ≈ 1). Moreover, *K* is exponentially decaying with a rate of decay (bandwidth) controlled by *ε*. The kernel in (1) was introduced in Giannakis (2015) and includes the kernel employed in the earlier work of GM as the special case *ζ* = 0. The family of kernels in (1) can also be viewed as a generalization of radial Gaussian kernels, *K*(*x*_{i}, *x*_{j}) = (which are popular for nonlinear dimension reduction; e.g., Belkin and Niyogi 2003; Coifman and Lafon 2006), through the incorporation of delay embeddings and the ‖*ξ*_{i}‖ and cos*θ*_{ij} terms that depend on the time tendencies of the data. We use the symbol to represent the *N* × *N* symmetric matrix with elements *K*_{ij} = *K*(*X*_{i}, *X*_{j}) computed from a kernel *K*.

Consider now an arbitrary scalar-valued function *f* on . Associated with *f* is the time series (temporal pattern) {*f*_{1}, *f*_{2}, …, *f*_{N}} with *f*_{i} = *f* (*X*_{i}), which we can represent by an *N*-dimensional column vector **f** = (*f*_{1}, *f*_{2}, …, *f*_{N})^{T}. As the number of samples *N* grows, the matrix–vector product **f** = **g** approximates the action of an integral operator on . Moreover, because of the exponential decay of the kernel, as the bandwidth parameter *ε* tends to zero, the component *g*_{i} of **g** depends on the structure of *f* and the properties of the kernel in a small neighborhood of *X*_{i} in a manner that was elucidated in a series of works on harmonic analysis and machine learning (Belkin and Niyogi 2003; Coifman and Lafon 2006; Singer 2006; von Luxburg et al. 2008; Berry and Sauer 2016). Here, we follow the procedure introduced by Coifman and Lafon (2006), which involves normalizing to create a Markov matrix through the sequence of operations:

where and are diagonal matrices with nonzero entries given by the column vectors and , respectively, and is the *N*-dimensional column vector with elements all equal to 1. By construction, is similar to the symmetric matrix , and therefore it possesses eigenvectors {*ϕ*_{0}, *ϕ*_{1}, … , *ϕ*_{N−1}} that form a complete basis of . One can check that the *ϕ*_{k} are orthogonal with respect to the weighted inner product , where is the diagonal matrix with . Moreover, because is an irreducible right-stochastic matrix with , the eigenvalues *λ*_{k} corresponding to *ϕ*_{k} have the ordering 1 = *λ*_{0} > *λ*_{1} ≥ *λ*_{2} ≥ …, *λ*_{N−1}, where the eigenvector corresponding to *λ*_{0} is the constant eigenvector *ϕ*_{0} = **1**. In NLSA, the orthogonal time series corresponding to the eigenfunctions are used as temporal modes of variability, which are nonlinear analogs to the PC time series in EEOF analysis and SSA.

Coifman and Lafon (2006) established that in the limit of large data (*N* → ∞ and ε → 0) and for the radial Gaussian kernel, the operator ( − )/ε converges to the Laplace–Beltrami operator associated with the Riemannian metric inherited by the manifold through its embedding in data space. In particular, the eigenvectors *ϕ*_{k} = (*ϕ*_{1k}, *ϕ*_{2k}, … , *ϕ*_{Nk})^{T} approximate the eigenfunctions *ϕ*_{k} of the continuous operator at the sampled data points; that is, *ϕ*_{ik} ≈ *ϕ*_{k}(*X*_{i}). An important property of the eigenfunctions (Belkin and Niyogi 2003) is that they satisfy an optimality condition for the preservation of local distances in nonlinear dimension reduction maps of the form Φ: with Φ(*X*_{i}) = (*ϕ*_{i1}, *ϕ*_{i2}, … , *ϕ*_{im}). Moreover, the eigenvalues give rise to a measure of roughness for the corresponding eigenfunctions through the quantity 1 − *λ*_{k}. In particular, the latter is proportional to the average gradient of *ϕ*_{k} on so that eigenfunctions of with *λ*_{k} close to 1 correspond to “large scale” weakly oscillatory functions. Note that *ϕ*_{k} being a weakly oscillatory function on does not imply that the time series {*ϕ*_{1k}, *ϕ*_{2k}, …, *ϕ*_{Nk}} has low frequency—the time scales present in the time series are an outcome of both the geometrical structure of *ϕ*_{k} on and the sampling trajectory of the dynamics.

The analysis of Coifman and Lafon (2006) was generalized by Berry and Sauer (2016), who established that a general local kernel, not necessarily isotropic [e.g., the kernel in (1)], can be used to approximate the Laplace–Beltrami operator of a Riemannian metric that depends on the leading moments of the kernel about the diagonal *X*_{i} = *X*_{j}. Thus, through a suitable choice of kernel one can construct dimension reduction coordinates with desirable properties for the data analysis task at hand. Here, our objective is to produce eigenfunctions with enhanced time-scale separation, and the features of the kernel in (1) that contribute to this skill are time-lagged embedding and the directional dependence of the kernel on the local dynamical flow. Specifically, Berry et al. (2013) theoretically established that time-lagged embedding significantly enhances the ability of the eigenfunctions to capture dynamically stable quasiperiodic patterns with time-scale separation, a property that was experimentally observed in GM and Giannakis and Majda (2012a). Moreover, Giannakis (2015) established that in the limit ζ → 1 (where the directional dependence of the dynamical flow in the kernel is maximally significant) the eigenfunctions recover slow intrinsic time scales of the dynamical system generating the data. Both time-lagged embedding and the *ζ*-dependent terms contribute to the clean temporal character of the modes recovered in section 4 ahead requiring no prefiltering of the data.

As a simple example illustrating the interpretation of the eigenvalues as measures of roughness, and the fact that the eigenvalues are generally not related to explained variance, consider a dataset {*x*_{0}, …, *x*_{n}} generated by a time-periodic process. That is, we have *x*_{i} = *F*(*θ*_{i}), where *F* is a one-to-one, periodic function of period 2π taking values in , and *θ*_{i} = *ωt*_{i} for some angular frequency *ω*. Since *F* depends on a single periodic degree of freedom *θ*_{i} and is one to one, the data *x*_{i} lie on a one-dimensional subset *S* of , which has the topology of a circle. On *S*, we can define the Fourier functions *ψ*_{k}, where *k* ∈ {0, 1, 2, …} is an integer-valued wavenumber, *ψ*_{k}(*θ*) = cos(*kθ*/2) when *k* is even, and *ψ*_{k}(*θ*) = sin[(*k* + 1)*θ*/2] when *k* is odd. The Fourier functions are eigenfunctions of the canonical Laplace–Beltrami operator on the circle Δ = −∂^{2}/∂*θ *^{2}, corresponding to the eigenvalues *μ*_{k} = (*k*/2)^{2} and *μ*_{k} = [(*k* + 1)/2)]^{2} for even and odd *k*, respectively. Intuitively, Fourier functions with large wavenumber *k* are “rough” functions on the circle, and the fact that *μ*_{k} scales as *k*^{2} embodies that notion. In particular, the eigenvalues satisfy the following relationship:

For the family of kernels in (1) and the diffusion maps’ normalization, the numerical eigenfunctions *ϕ*_{k} converge to *ψ*_{k} in the sense that in the limit of large data *ϕ*_{ik} → *ψ*_{k}(*θ*_{i}). Moreover, the numerical eigenvalues *λ*_{k} have the limiting behavior (1 −*λ*_{k})/ε → *μ*_{k}, which, taking into account (3), is consistent with the statement made earlier that 1 −*λ*_{k} is a measure of roughness proportional to the average gradient of *ψ*_{k} on the data manifold.

As an illustration that the leading eigenfunctions from NLSA can capture low-variance yet dynamically important modes of variability, consider the same periodic example with *d* = 3 and *F*(*θ*) = [*F*_{1}(*θ*), *F*_{2}(*θ*), *F*_{3}(*θ*)] = [cos*θ*, sin*θ*, *R*cos(*lθ*)] with *l* ≫ 1. This particular choice of *F* describes a “wavy” circle exhibiting oscillatory behavior in the *F*_{3} direction. For *R* sufficiently large, that direction carries the majority of the signal’s variance, and the leading EOF is (0, 0, 1). As a result, the leading PC is proportional to *F*_{3} and exhibits variability at the frequency *lω*. On the other hand, the leading eigenfunctions from NLSA are not affected by the properties of *F* (which are extrinsic to the dynamical system), and approximate the lowest-wavenumber Fourier functions sin*θ* and cos*θ*, capturing the fundamental frequency *ω*.

Of course, the datasets generated by the atmosphere ocean climate system are of significantly higher complexity than the one-dimensional periodic example discussed here, but the basic behavior discussed above generalizes to systems with multiple periodic and nonperiodic degrees of freedom. In such real-world settings, we identify physically important modes by selecting a conservatively large number of eigenfunctions (ordered in order of decreasing *λ*_{k}) and manually examining the properties of the eigenfunction time series and the spatiotemporal reconstructions (performed as described in section 2c below) of climatic variables of interest. Once a candidate set of patterns of interest has been identified, we perform various sensitivity tests to verify their robustness as described in section 5 ahead and then focus on the patterns that have been deemed robust. As stated earlier, the family of kernels in (1) formulated in lagged embedding space is crucial in ensuring that individual eigenfunctions recover modes of variability with time-scale separation and clear physical interpretability. An example, involving North Pacific mixed layer temperature data, of the time-scale mixing that can take place in the eigenfunctions if no embedding is performed can be seen in Fig. 6 in Giannakis and Majda (2013).

### c. Spatiotemporal reconstruction

Consider a time series with *y*_{i} ∈ , sampled at the same times *t*_{i} as the input data used in NLSA. Let also *Y*_{i} = (*y*_{i}, *y*_{i−1}, …, *y*_{i−q+1}) be the corresponding time-lagged embedded data. In this paper, we will always set (i.e., will be Indo-Pacific SST data), but in Part II will also represent other fields of interest such as Indo-Pacific surface winds. We reconstruct spatiotemporal patterns associated with from the eigenfunctions following the convolution approach used in EEOF analysis and SSA (Ghil et al. 2002). Selecting an eigenfunction *ϕ*_{k}, we first compute the spatiotemporal pattern _{k} in lagged-embedding space given by _{k} = *ϕ*_{k}, where _{k} is a (*qd*′) × *N* matrix. That pattern is then projected to a spatiotemporal pattern _{k} in *d*′-dimensional physical space by averaging along *d* × *q* blocks of _{k} corresponding to the lagged windows; see, e.g., (1) in Giannakis and Majda (2012c) for an explicit formula. Note that _{k} is a *d* × *n* matrix, and the matrix element *y*_{k,j} corresponds to the value of the reconstructed signal at the *i*th grid point in physical space at time *t*_{j}.

Given a collection of indices *κ* = {*k*_{1}, *k*_{2}, …, *k*_{r}} we define the reconstructed pattern as the sum . In sections 4–7 ahead, *κ* will frequently consist of pairs of in-quadrature (90° out of phase) eigenfunctions representing oscillatory phenomena (e.g., the annual cycle) and families of modes related by amplitude modulations (e.g., ENSO and its modulation by the annual cycle). For each such pattern we compute the time average at each grid point and the associated explained variance . We also compute a spatially aggregated average . Note that because the are not constrained to be orthogonal in time and/or space, *σ*_{k,i} and *σ*_{k} do not add in quadrature.

If all of the eigenfunctions are used i.e., κ = {0, …, *N* − 1}, recovers the original data {*y*_{1}, …, *y*_{n}} exactly [note that the initial sample *y*_{0} is not recovered since the corresponding *x*_{0} sample is “used up” to compute the time tendency *ξ*_{0} in the kernel in (1)]. Of course, in practice one performs dimension reduction using *m* ≪ *N* eigenfunctions. These *m* eigenfunctions need not be consecutive, and in what follows we will employ a set of nonconsecutive eigenfunctions representing important annual, interannual, and decadal modes of variability of Indo-Pacific SST.

## 3. Dataset description

### a. Model data

We analyze 1300 yr of monthly averaged SST data from a control integration of CCSM4 (CCSM 2010; Gent et al. 2011). This control run is forced with preindustrial greenhouse gas levels and has an ocean grid of approximately 1° nominal resolution. In this configuration, CCMS4 is known to simulate realistic ENSO and Pacific decadal variability (Deser et al. 2012). This work focuses on the Indo-Pacific domain, which we define here as the ocean grid points in the longitude–latitude box 28°E–70°W, 60°S–20°N. Working throughout with the model’s native ocean grid, the spatial dimension (number of grid points) of the dataset is *d* = 44 471. We also analyze 800 years of monthly SST data from a preindustrial control integration of CM3 (Griffies et al. 2011; GFDL 2012) over the same Indo-Pacific domain. In this case, the nominal resolution is slightly lower corresponding to 24 612 grid points. Both datasets exhibit a small amount of drift that does not exceed 0.01 K per century. Note that the data are not subjected to preprocessing such as removal of the seasonal cycle, low-pass filtering, or detrending to remove the drift prior to analysis via NLSA. As will be discussed below, the ability to capture amplitude modulation relationships (e.g., the modulation of the annual cycle by ENSO through combination modes) through NLSA relies crucially on temporal information that would be lost by prefiltering the data.

### b. Observational data

We also study HadISST data (Rayner et al. 2003; HadISST 2013) over the same Indo-Pacific domain. This dataset consists of monthly averaged SST data on a uniform 1° latitude–longitude grid containing *d* = 20 960 grid points in our Indo-Pacific domain. We use either the full, 140-yr HadISST dataset from January 1870 to December 2013 or the shorter, but more reliable, portion of that dataset from January 1979 to December 2009 spanning the satellite era (the latter dataset was truncated to 2009 for compatibility with the reanalysis data used in Part II). Unlike the model data, these datasets exhibit a significant trend component, which we do not remove as we found that NLSA naturally separates the statistically stationary and trend components of the input data into distinct eigenfunctions. We defer the study of such trend modes to future work.

## 4. Spatiotemporal modes recovered from CCSM4 data

We extract spatiotemporal modes of Indo-Pacific SST using the NLSA algorithm described in section 2 applied to the 1300-yr CCSM4 control run. To induce time-scale separation between multidecadal, interannual, and seasonal modes, we use a 20-yr embedding window corresponding to *q* = 240 monthly lags. Note that the embedding window used in this work is significantly longer than the 2-yr window used by GM in studies of North Pacific SST variability via NLSA. Following Giannakis (2015), we work with the kernel in (1) for the parameter values ε = 1 and ζ = 0.99. Using these parameter values, we compute the leading 200 Laplace–Beltrami eigenfunctions and their associated spatiotemporal patterns. We will discuss the sensitivity of our results with respect to the NLSA parameters as well as the choice of spatial domain, time span of the data, and the presence of noise in section 5.

In what follows and in Part II, out of the 200 computed eigenfunctions we focus primarily on an 18-element subset representing dominant modes of Indo-Pacific variability—the annual cycle and its harmonics, ENSO, the combination modes associated with the modulation of the annual cycle by ENSO, the WPMM, the IPO, the TBO, and the combination modes associated with the modulation of the annual cycle by the TBO. These modes were identified through the temporal power spectra of the eigenfunction time series and the corresponding spatiotemporal reconstructions. The global ordering of the corresponding eigenfunctions in the spectrum (ordered as described in section 2b) is 1, …, 12, 90, 93, 116, 117, 146, 147. For notational clarity, we denote these eigenfunctions by *ϕ*_{1}, *ϕ*_{2}, …, *ϕ*_{18}, respectively, in accordance with their ordering within the selected subset. In addition to this subset, the NLSA spectrum contains several other “secondary” ENSO modes and their combinations with the annual cycle and its harmonics forming an ENSO frequency cascade (Stuecker et al. 2015a). In Part II, we will examine the role of these modes in El Niño–La Niña asymmetries.

Throughout this section, we frequently make reference to the amplitude and fractional variance explained by our modes. By amplitude we mean half of the difference between the maximum and minimum values of the SST anomalies in a given region, reconstructed for a given set of modes as described in section 2c and visualized in the movies in the supplementary material. Variances are also computed as described in section 2c either globally or at each grid point to construct spatial maps of explained variance. Depending on the context, we report fractional variances relative to the raw data, the raw data with the seasonal cycle subtracted, the reconstructed data using all modes {*ϕ*_{1}, …, *ϕ*_{18}}, and the reconstructed data using all modes except from those representing the seasonal cycle. For conciseness, we sometimes refer to the raw or reconstructed explained variance computed after removal of the seasonal cycle as “nonperiodic explained variance.” We note again that since the reconstructed patterns are not constrained to be orthogonal (either in time or in space), our computed variances are not additive.

### a. Annual cycle modes

The first six eigenfunctions *ϕ*_{1}, …, *ϕ*_{6}, come in doubly degenerate pairs that represent the annual cycle and its first two harmonics. The corresponding time series and spatial composites are depicted in Fig. 1, and 150 yr of reconstructed signal are visualized in movie 1 of the supplementary material. The variance explained by these modes relative to the raw SST data is shown in Figs. 2a–c.

The time series in this family have the structure of in-quadrature sinusoidal waves of frequency 1, 2, and 3 yr^{−1}, respectively, for {*ϕ*_{1}, *ϕ*_{2}}, {*ϕ*_{3}, *ϕ*_{4}}, and {*ϕ*_{5}, *ϕ*_{6}}. In the spatial domain, the annual pair {*ϕ*_{1}, *ϕ*_{2}} describes fluctuations of predominantly zonal character (Fig. 1c) with pronounced strength in the Southern Hemisphere midlatitudes and a relatively weak signal over tropics. The corresponding explained variance relative to the raw data reaches more than 60% south of the tropic of Capricorn (see Fig. 2a). In particular, the amplitude of the reconstructed anomalies reaches up to 5 K in a zonal belt centered around the 35°S parallel, explaining up to 70% of the raw variability east of Australia. The variance explained by the annual modes relative to the data reconstructed from the 16-eigenfunction family (not shown here) is even higher, reaching up to 90% in that region. Averaged over the whole Indo-Pacific basin, the variance associated with the annual modes is approximately 46% and 58% of the raw and reconstructed data, respectively, with only a weak contribution from the tropics.

The semiannual periodic pair {*ϕ*_{3}, *ϕ*_{4}} features a more complex spatial structure than the annual pair, with its activity taking place not just in the midlatitudes but also over the tropics (see Fig. 1f and movie 1 in the supplementary material). The strongest reconstructed fluctuations for this mode occur in the western and northern parts of the Indian Ocean (especially the coastal waters of Somalia, the Arabian Sea, and the Bay of Bengal), the vicinity of the Indonesian Archipelago, and the eastern tropical Pacific, where anomaly amplitudes up to ~1.5 K are observed. Averaged over the whole Indo-Pacific basin, the semiannual modes explain approximately 10% and 15% of the raw and reconstructed variance, respectively. Regionally, the explained variance of these modes can be significantly higher, reaching 25%–50% of the raw variability over the tropical Indian Ocean and the Indonesian Archipelago (see Fig. 2b).

Similarly, to the semiannual modes, the dominant spatial anomalies for the triennial modes {*ϕ*_{5}, *ϕ*_{6}} are concentrated in the tropics but generally feature smaller scales and weaker amplitudes (see Fig. 1i and movie 1 in the supplementary material). Nevertheless, amplitudes up to ~0.5 K associated with these modes develop over the Gulf of Aden and the west part of the Arabian Sea, where the resulting variance amounts to ~20% of the raw variability (see Fig. 2c).

### b. ENSO and ENSO combination modes

ENSO and its associated combination modes are captured by several NLSA modes. In this paper, we focus on the mode families {*ϕ*_{7}, *ϕ*_{8}}, {*ϕ*_{9}, *ϕ*_{10}}, and {*ϕ*_{11}, *ϕ*_{12}}, which represent the fundamental component of ENSO and amplitude modulations of the annual cycle by ENSO, respectively. The spatial and temporal patterns associated with these families are displayed in Fig. 3 and movies 2 and 3 in the supplementary material. Spatial maps of the fractional variance explained by these modes, computed relative to the raw SST data, the raw SST data minus the seasonal cycle, and the signal reconstructed by the nonperiodic modes {*ϕ*_{7}, …, *ϕ*_{14}} in our selected mode family are shown in Figs. 2d–f, 4a–c, and 5a–c.

Modes {*ϕ*_{7}, *ϕ*_{8}} capture the dominant features of ENSO through an in-quadrature pair of amplitude-modulated waveforms (see Fig. 3b). These waveforms have a carrier frequency of ν_{ENSO} ≈ 4 yr^{−1} and a low-frequency modulating envelope evolving on decadal time scales. The resulting Fourier spectrum (Fig. 3a) is strongly peaked at the carrier frequency but is broad and contains appreciable power on decadal to centennial time scales. In Part II, we will make a connection between the amplitude envelope of this family and the WPMM.

Spatially (see Fig. 3c and movie 2 in the supplementary material), the {*ϕ*_{7}, *ϕ*_{8}} cycle initiates with positive SST anomalies emerging west of the Galapagos Islands, exceeding at times 1 K there. At the same time, a center of opposite-sign and slightly weaker anomalies develops over the South China Sea, as well as along the eastern and western coast of Australia. This phase lasts a few months and is associated with the westward expansion of the eastern tropical Pacific anomalies. Moreover, an El Niño–like triangular pattern develops as this anomaly weakens while spreading simultaneously south and north to the subtropical eastern Pacific. At the same time, positive anomalies up to ~0.5 K develop over Indian Ocean waters north of 40°S and south of an arc crossing through the Gulf of Aden and the Perth basin. Also, negative anomalies propagate east of Australia and intensify over the southeast Pacific Ocean. A double triangular pattern of opposite-sign anomalies eventually covers the majority of the subtropical Pacific waters east of the warm pool and persists for several months before dissipating, ending one-half of the ENSO cycle.

Averaged in time, the explained variance of modes {*ϕ*_{7}, *ϕ*_{8}} reaches up to 20% and 50% of the raw (Fig. 2d) and reconstructed (not shown) variability of the equatorial regions of the Pacific Ocean. Averaged over the whole Indo-Pacific basin, these variances amount to 4.6% and 5.7%, respectively. When all but periodic modes are taken into account, the explained variance of the reconstructed data increases to 70% in the equatorial Pacific (see Fig. 5a) and to 40% for the whole Indo-Pacific basin.

The next two pairs of eigenfunctions, {*ϕ*_{9}, *ϕ*_{10}} and {*ϕ*_{11}, *ϕ*_{12}}, shown in Figs. 3e,h, describe amplitude modulations of the annual cycle by the primary ENSO modes. In the Fourier domain, these modulations are evidenced by strong peaks in the spectra in Figs. 3d,g at the frequencies (1 yr^{−1}) − ν_{ENSO} = 0.75 yr^{−1} and (1 yr^{−1}) + ν_{ENSO} = 1.25 yr^{−1}, respectively. Moreover, the amplitude envelopes of {*ϕ*_{9}, *ϕ*_{10}} and {*ϕ*_{11}, *ϕ*_{12}}, computed via Hilbert transforms, correlate almost perfectly [correlation coefficient of 0.99 at *p* value numerically equal to zero based on a *t* test with (1300 yr) × ν_{ENSO} − 2 = 323 degrees of freedom] with the corresponding amplitude envelope of {*ϕ*_{7}, *ϕ*_{8}}. A more direct test for modulating behavior is to examine the relative phases between the complex numbers *z* = *ϕ*_{7} + i*ϕ*_{8}, *z*_{−} = *ϕ*_{9} + i*ϕ*_{10}, and *z*_{+} = *ϕ*_{11} + i*ϕ*_{12}. As shown in Fig. 6, the real and imaginary parts of *z*_{−}/*z* and *z*_{+}/*z*_{−} are to a good approximation in-quadrature sinusoidal waves of frequency 1 and 2 yr^{−1}, respectively. This confirms that {*ϕ*_{9}, *ϕ*_{10}} and {*ϕ*_{11}, *ϕ*_{12}} are modulations of {*ϕ*_{7}, *ϕ*_{8}} by the annual cycle and that the peak frequency of {*ϕ*_{9}, *ϕ*_{10}} lies between the frequencies of its parent signals (ENSO and the annual cycle), whereas the peak frequency of {*ϕ*_{11}, *ϕ*_{12}} exceeds the frequencies of both of its parent signals. Hereafter, we denote the families {*ϕ*_{9}, *ϕ*_{10}} and {*ϕ*_{11}, *ϕ*_{12}} by ENSO-A1 and ENSO-A2, respectively, where ENSO-A stands for “ENSO-annual.”

The dominant frequencies of ENSO-A1 and ENSO-A2 are consistent with a frequency cascade attributed to quadratic nonlinearities in the coupled equations of motion for the ocean and atmosphere, leading to a synchronization between the seasonal cycle and ENSO (McGregor et al. 2012; Stuecker et al. 2013, 2015a; Stein et al. 2014; Ren et al. 2016). In previous works, these combination modes were either explicitly constructed through products of interannual ENSO and annual periodic signals (Stuecker et al. 2015a), through Hilbert transform techniques applied to ENSO indices (Stein et al. 2011, 2014), or they were identified via EOF analysis of surface atmospheric circulation fields (McGregor et al. 2012; Stuecker et al. 2013; Ren et al. 2016). In the EOF-based analyses, ENSO and the combination modes at the 1 yr^{−1} ± ν_{ENSO} frequencies (a total of six real-valued or four complex-valued signals) were represented as a single pair of in-quadrature PCs. In NLSA, the full ENSO, ENSO-A1, and ENSO-A2 families emerge naturally from Indo-Pacific SST model data. In sections 6 and 7, we will see that these modes are also recovered from the CM3 and HadISST datasets, respectively, and in Part II we will establish that the surface atmospheric circulation patterns due to these modes are consistent with the southward shift of zonal wind anomalies. The deterministic nature of the relationship between ENSO and ENSO-A1–ENSO-A2 has been proposed as a source of ENSO termination predictability (Stuecker et al. 2013, 2015a).

The spatiotemporal pattern associated with ENSO-A1 (shown in Fig. 3f and movie 3 in the supplementary material) consists primarily of SST anomalies emerging periodically along the coast of equatorial South America, strengthening up to ~0.2 K in the vicinity of the Galapagos Islands, and subsequently propagating westward over the equatorial Pacific until dissipating around the date line. The termination of this equatorial wavelike pattern is associated with simultaneous emergence of another cluster of SST anomalies located southwest of the coast of Sumatra. These anomalies have the same sign as the terminating wave, reach amplitudes up to ~0.4 K, and persist west of the Sunda Strait for 6–7 months. In the meantime, a southwestward-propagating SST anomaly develops in the northeast Indian Ocean. This traveling disturbance appears to reflect upon reaching the east coast of Africa near the equator and to cross the Indian Ocean for a second time until it merges with the cluster of anomalies west of the Sunda Strait (see, e.g., January–September of simulation year 1170 in movie 3 of the supplementary material). After the decay of the latter anomalies, a pronounced disturbance develops along the north coast of Australia and splits into a part that propagates slowly along the west Australian coast and a part that crosses the Pacific Ocean in a southeastward direction. Toward the end of this phase, new anomalies develop over the eastern tropical Pacific, and the cycle repeats.

Overall, the vicinity of the Galapagos Islands (and more broadly the eastern and central equatorial Pacific) and the Indian Ocean west of the Sunda Strait are regions where a significant amount of variability can be attributed to the ENSO-A1 family. In particular, the corresponding explained variance of the reconstructed data is up to 35% west of the Sunda Strait and 13.5% for the whole Indo-Pacific basin when all but periodic modes are taken into account (see Fig. 5b). Moreover, as shown in Figs. 2e and 4b, these modes explain up to ~8% and ~12% of the variance of the raw and nonperiodic signal west of the Sunda Strait, averaging to 1.6% and 4%, respectively, for the whole domain. Note that the region west of the Sunda Strait is typically associated with the IOD as defined conventionally through an index based on SST anomalies measured partially there (Saji et al. 1999). As pointed out in section 1, although the nature of the IOD and its relation to ENSO have been studied extensively over the last 15 yr, controversies remain, and no prevailing consensus has been achieved yet in the community. Our results indicate that in CCSM4 (and also for HadISST; see section 7), a significant portion of the SST fluctuations contributing to the variability typically associated with the IOD can be interpreted as a deterministic modulation of the annual cycle by the fundamental ENSO modes {*ϕ*_{7}, *ϕ*_{8}}.

The spatiotemporal patterns associated with ENSO-A2 bear several similarities with the patterns described above for ENSO-A1 but also have notable differences. First, the amplitude of the former is ~30% lower than that of the latter for most of the regions where these two modes are both active. Furthermore, in the case of ENSO-A2, the traveling disturbance propagating west of the Galapagos is faster, has significantly finer spatial structure than in the case of ENSO-A2, and propagates farther to the north and west of the equatorial Pacific before reflecting back over the Celebes Sea. Compared to ENSO-A1, the anomalies west of the Sunda Strait associated with ENSO-A2 last for a significantly shorter time and exhibit a stronger intensification after the arrival of the wave that has crossed the Indian Ocean twice (first in a southwest and then in a southeast direction of propagation). The temporal evolution of the anomalies over the South China Sea is also shifted in phase relative to the equatorial wave propagation. Averaged over the whole domain, these modes explain 1.1% of the raw variance, 2.9% of the raw nonperiodic variance, and 10.4% of the reconstructed nonperiodic variance. As with ENSO-A1, the explained variance by the ENSO-A2 modes is especially pronounced over the eastern Indian Ocean, reaching 9% of the raw variance, 12% of the raw nonperiodic variance, and 35% and of the reconstructed nonperiodic variance northwest of the Sunda Strait (see Figs. 2f, 4c, and 5c, respectively). Thus, these modes are also a significant contributor to SST anomalies over the region traditionally associated with the IOD. Similarly to ENSO termination, the deterministic nature of the relationship between ENSO and ENSO-A1–ENSO-A2 has potential implications to SST predictability in IOD regions.

### c. TBO and TBO combination modes

The fundamental component of the TBO is captured by eigenfunctions {*ϕ*_{15}, *ϕ*_{16}}. These modes form a degenerate in-quadrature oscillatory pair, which, as shown in Figs. 7a and 7b, is of predominantly biennial periodicity with a peak frequency ν_{TBO} = 0.5 yr^{−1}. Spatially (see Fig. 7c and movie 1 in the supplementary material), these modes are active over the majority of the Indo-Pacific sector and explain approximately 0.25%, 1.3%, and 5% of the raw, nonperiodic, and reconstructed nonperiodic variance, respectively. For the region northeast and east of New Guinea (see Figs. 2g, 4d, and 5d), approximately 2%, 4%, and 22% of the corresponding variances can be attributed to this pair.

The cycle represented by {*ϕ*_{15}, *ϕ*_{16}} starts with positive anomalies developing west of the Galapagos Islands, which intensifies up to 0.15 K over a six-month period. In the meantime, an anomaly of the same sign and similar amplitude emerges along the west coast of Sumatra and Java and another one dissipates over the South China Sea. Also, weaker negative anomalies grow southeast of New Zealand and off the coast of Somalia. Six months later, positive anomalies west of the Galapagos propagate westward, mainly to the north of the equator, and negative anomalies southeast of New Zealand start to propagate east while weakening in the meantime. The westward propagation of the disturbances leads to a fast warming northeast of New Guinea three months later. This wave subsequently propagates in a southeast direction and gradually decays. In the meantime, the positive anomalies west of Sumatra dissipate, while negative anomalies intensify over the South China Sea. A simultaneous development of negative anomalies west of the Galapagos initiates the second half (year) of the cycle.

Similar to ENSO, the TBO modes {*ϕ*_{15}, *ϕ*_{16}} have associated combination modes {*ϕ*_{17}, *ϕ*_{18}}, representing the phase locking of the TBO with the annual cycle. As shown in Figs. 7d,e, the dominant frequency of this oscillatory pair is ν_{TBO} + 1 yr^{−1} ≃ 1.5 yr^{−1}, and its amplitude envelope (extracted via a Hilbert transform) has a correlation coefficient of 0.87 with the corresponding amplitude envelope of the fundamental TBO modes [the *p* value of this correlation coefficient is numerically zero based on a *t* test with (1300 yr) × ν_{TBO} − 2 = 748 degrees of freedom]. Moreover, a similar amplitude and frequency analysis as that performed in section 4b and Fig. 6 (not shown here) reveals that {*ϕ*_{17}, *ϕ*_{18}} is indeed a pair of combination modes analogous to ENSO-A2. Hereafter, we denote this pair by TBO-A. In the spatial domain (Fig. 7f), the SST patterns associated with these modes have strong loadings in the equatorial Indo-Pacific sector and especially over the Maritime Continent. In Part II, we will see that together with the fundamental TBO modes these patterns are consistent with the surface wind and precipitation patterns associated with biennial variability of the Indian–Australian monsoon (Meehl and Arblaster 2002).

### d. Decadal and multidecadal modes

#### 1) West Pacific multidecadal mode

Mode *ϕ*_{13}, whose temporal pattern and frequency spectrum are depicted in Figs. 8a,b, represents the dominant multidecadal fluctuations of SST as extracted in the CCSM4 control run via NLSA. Eigenfunctions with the qualitative features of *ϕ*_{13} occur consistently from shorter time scales over a range of embedding windows greater than a decade and over a range of the other NLSA parameters. The spatial patterns associated with the negative phase of *ϕ*_{13} (see Fig. 8c and movie 2 in the supplementary material) consist predominantly of a globular negative SST anomaly with amplitudes up to ~0.25 K, which is bounded by New Guinea from the south, the date line from the east, and Borneo from the west. A weaker anomaly of the same sign develops over the southern Pacific east of the date line, while anomalies of the opposite sign prevail at the same time in the vicinity of Tasmania, gradually weakening westward for over 100° of longitude. Weaker fluctuations emerge also over other regions. In particular, the western Indian Ocean and the subtropical central Pacific are anomalously colder, whereas positive anomalies can be found south of Ecuador, east and west of Australia, and over the Indonesian Archipelago, the Bay of Bengal, and the South China Sea.

The significant events associated with the WPMM are of aperiodic character and vary in spatial extent and duration. The spatial pattern developing during these events consists of the anomalies described above persisting over the course of individual episodes that can last for several decades. For example, the reconstructed years 1060–1120 in movie 2 of the supplementary material are associated with a single prolonged event with a positive anomaly over the warm pool.

The WPMM explains ~1% of the reconstructed variance averaged over the whole Indo-Pacific basin and ~12% if all but annual cycle modes are taken into account. As shown in Figs. 2h and 5e, a strong imprint of this mode is found over the equatorial Pacific northeast of New Guinea, where it explains locally over 10% and 50% of the nonperiodic raw and reconstructed variance, respectively (see Figs. 4e and 5e). Similar values of the nonperiodic SST variance localized south of the subtropics and west of Tasmania can be explained by this mode.

In summary, the spatial characteristics of the WPMM partially resemble those of a west–east decadal dipole, identified often by the second EOF of low-pass-filtered SST (Timmermann 2003; Yeh and Kirtman 2004; Ogata et al. 2013). While the physical relevance of this EOF has sometimes been questioned (e.g., owing to preprocessing of the data by low-pass filtering), the fact that the WPMM is recovered here from unprocessed data suggests that, at least in CCSM4, it has a physically meaningful dynamical origin. In Part II, this assertion will be supported through an analysis of surface wind, thermocline depth, and precipitation fields associated with the WPMM.

#### 2) Interdecadal Pacific oscillation

Besides the WPMM, the NLSA spectrum contains another low-frequency mode, extracted via eigenfunction *ϕ*_{14}. As shown in Figs. 8d–f, this eigenfunction has a strong decadal component, with individual events usually lasting more than a decade, but is also influenced significantly by higher-frequency (biennial) fluctuations. According to the composite in Fig. 8f and the reconstructions in movie 2 of the supplementary material, this mode is active mainly over the Pacific basin. In particular, it generates a pronounced anomaly up to ~0.1 K prevailing over a large triangle-shaped region with a vertex in the central Pacific Ocean and an eastern edge along the coastline of South America. An equally pronounced but negative anomaly occurs south of the last anomaly reaching as far as 60°S latitude and as far as Tasmania in the west. Such spatial characteristics, in particular the ENSO-like pattern over the eastern part of the South Pacific Ocean, resemble the pattern associated usually with IPO. The explained variance from *ϕ*_{14}, averaged over the Indo-Pacific basin (with the Indian Ocean playing a minor role in comparison to other basins), amounts to less than 1% of the raw and reconstructed variability and is slightly less than 7% if all but periodic modes are taken into account. Note that *ϕ*_{14} explains 2 times less variance than the WPMM identified by *ϕ*_{13}.

## 5. Sensitivity analysis

In this section, we verify the robustness of the Indo-Pacific SST modes presented in section 4 with respect to the choice of NLSA parameters (section 5a), the choice of spatial and temporal domain (sections 5b and 5c), and the presence of independent and identically distributed (i.i.d.) Gaussian noise in the data (section 5d). Additional details on the choice of NLSA parameters can be found in GM and in Giannakis (2015). Technical results on the behavior of NLSA and related kernel algorithms in the presence of i.i.d. noise are discussed in Giannakis (2017).

### a. Choice of NLSA parameters

The parameters of NLSA as described in section 2 are the number of embedding lags *q*, the kernel bandwidth *ε*, and the cone kernel parameter *ζ*. For the class of kernels in (1) featuring *ξ*-dependent normalization factors the nominal value of *ε* is 1 (Giannakis 2015). The results in this paper and in Part II were obtained using this parameter value, but we have verified that qualitatively similar results can be obtained for values *ε* in the interval [1, 5]. Theoretically, for maximal capability to recover slow time scales (in the present context, decadal modes), *ζ* should be set arbitrarily close to 1 but not equal to or exceeding 1 (Giannakis 2015). Throughout, we work with ζ = 0.99, though we find that our results are not too sensitive on values of *ζ* in the interval [0.95, 1].

To verify the robustness of our results against the number of embedding lags, we performed analyses with *q* in the interval [48, 480] corresponding to physical embedding window lengths Δ*t* in the range 4–40 yr. The salient temporal and spatial features of the annual cycle, ENSO, and leading ENSO combination modes (including the amplitude–phase relationships between the ENSO and ENSO-A modes) are qualitatively robust under these changes. On the other hand, in order to cleanly resolve the WPMM, IPO, TBO, and TBO combination modes we found that Δ*t ***≳**10 yr embedding windows are generally required. As stated in the preamble of section 4, our nominal choice, *q* = 240, Δ*t* = 20 yr, is significantly longer than the 2-yr embedding windows used in previous studies of North Pacific SST variability via NLSA (GM and Giannakis 2015). We believe that the longer embedding windows needed to achieve time-scale separation in the Indo-Pacific system studied here is at least partly caused by the broadband character of ENSO.

In particular, we observe that modes with qualitatively similar characteristics as the ENSO and ENSO-A modes described in section 4b are present in the NLSA spectrum for embedding windows as small as 4 yr. As the embedding window length increases, the power spectral densities of the ENSO and ENSO combination modes become increasingly concentrated around the fundamental ENSO and ENSO-A peaks, and additional, “secondary” ENSO modes appear with frequency peaks adjacent to the dominant ENSO peak. This behavior, illustrated in Fig. 9, is generally consistent with the presence of an ENSO frequency cascade in the data, which is progressively separated by NLSA into its constituent frequencies with increasing *q*. Such a frequency cascade is expected on general grounds owing to dynamical nonlinearities in the coupled atmosphere–ocean system (Stuecker et al. 2015a), and as *q* increases NLSA is theoretically expected (Berry et al. 2013; Giannakis 2017) to increasingly resolve these frequencies through individual eigenfunctions. Together, these facts may explain the behavior observed in Fig. 9. It is important to note that the dominant ENSO frequency ν_{ENSO}, the phase relationships between the ENSO and ENSO-A modes (depicted in Fig. 7), and the structure of the corresponding spatial SST patterns (see Fig. 3) are robust under changes of *q* despite the emergence of new ENSO-like modes. Moreover, the decadal ENSO envelopes are also reasonably robust for Δ*t* ≥ 10 yr. While a detailed study of the secondary ENSO modes is beyond the scope of this work, in Part II we will demonstrate that the collection of fundamental and secondary NLSA ENSO modes provides a realistic representation of El Niño–La Niña SST asymmetries.

### b. Choice of spatial domain

To verify the robustness of our results against changes of the spatial domain we have performed NLSA on SST data from the same CCSM4 control integration on two subregions of our Indo-Pacific domain, namely the tropical Pacific band 100°E–110°W and 20°S–20°N and the tropical Indian Ocean band 30°–110°E and 30°S–30°N. The modes recovered by NLSA on these subdomains are in good agreement with the family presented in section 2. This agreement is particularly good in the case of the annual and interannual modes in the top of the spectrum (i.e., the annual cycle and its harmonics, ENSO and ENSO-A modes). WPMM patterns can also be detected from these subdomains, although in the case of the tropical Pacific subdomain the WPMM pattern is corrupted (presumably owing to the regional dominance of ENSO activity).

In general, we do not detect qualitatively new patterns among the leading NLSA modes from the two subdomains, suggesting that the Indian and Pacific Oceans exhibit strong dynamical couplings. We will study this point further in Part II using spatiotemporal reconstructions of atmospheric circulation fields. The agreement of the eigenfunctions from the different domains is also consistent with theoretical invariance properties of eigenfunctions obtained from the kernel in (1) owing to delay embeddings (Berry et al. 2013; Giannakis 2017) and the directionally dependent cone kernel structure (Giannakis 2015). Together, these properties imply that for data generated by a single dynamical system (in this case, the Indo-Pacific atmosphere–ocean system) and for sufficiently many delays *q*, different observation functions (in particular, SST over the full Indo-Pacific domains or the Pacific and Indian Ocean subdomains) should generically produce similar leading eigenfunctions.

### c. Influence of the temporal span of the input data

Another factor whose relevance in obtaining our results we examine here is the millennial length of the CCSM4 dataset. For a dataset of this time span, all modes presented in section 4 are reasonably well sampled. In particular, approximately 30 strong events (split approximately evenly between positive and negative phases) are captured for the WPMM, the slowest mode in our mode family. The duration of these events can easily exceed the time span of the satellite era and can even be comparable to the length of the industrial era (see, e.g., the positive WPMM event emerging around year 1060 in movie 2 of the supplementary material and lasting for approximately 65 yr), raising questions about the detectability of these modes in observational datasets of limited temporal coverage and potentially significant systematic biases.

Here, we examine the sensitivity of the modes in section 4 to the time span of the input data by splitting the original 1300-yr CCSM4 time series into eight disjoint consecutive subperiods, each spanning 162.5 yr, and performing NLSA on each batch with all parameters kept the same as in the original setup. Examining the resulting eight sets of eigenfunctions, we find that the annual, ENSO, and ENSO-A modes can be robustly captured from these smaller datasets, but the quality of the WPMM, IPO, TBO, and TBO-A modes is significantly impacted. In particular, just two out of eight sets of eigenfunctions included discernible WPMM-like modes, namely subperiods 1 and 3. SST reconstructions based on these eigenfunctions produced features resembling the WPMM recovered from the full dataset, but the eigenfunction time series were found to be significantly influenced by high-frequency variability, possibly due to mixing with ENSO. These findings indicate that the possibility of recovering clean decadal modes analogous to the WPMM from observational data through the current formulation of NLSA is limited, and we will verify that this is indeed the case in section 7. Thus, long climate simulations, or perhaps paleoclimatic data, remain indispensable for recovering multidecadal modes of variability via this method.

### d. Noise robustness

As shown in section 4, the NLSA Indo-Pacific SST modes extracted from millennial CCSM4 data exhibit a remarkably clear temporal evolution for both the variance-dominating modes such as ENSO and the ENSO-A modes, as well as for the low-amplitude WPMM and TBO. Besides the time span of the data, another potential barrier in the detection of these modes in real-world data is measurement noise, which is inevitably part of any observational dataset. Intuitively, one would expect the noise of given strength to have more pronounced impact on retrieval of lower-amplitude modes such as the WPMM and TBO, as opposed to variance-dominating modes such as ENSO.

Here, in order to assess the robustness of our extracted modes against observational noise, we examine the results of NLSA applied to millennial Indo-Pacific SST data corrupted by i.i.d. Gaussian noise. It can be shown (Giannakis 2017) that the presence of i.i.d. noise (not necessarily Gaussian) leads to a random bias in the pairwise distances ||*X*_{i} − *X*_{j}|| between the data in lagged embedding space, but, by the law of large numbers, as the number of lags *q* grows the bias becomes a constant with high probability. This statistically constant bias enters in the kernel matrix but is canceled in the normalization procedure in (2) for constructing the Markov matrix used for computing eigenfunctions. Thus, time-delay embedding naturally endows NLSA with robustness against i.i.d. measurement noise, and the larger *q* is the larger the noise tolerance becomes. This result also holds for noises with spatially nonuniform statistics and spatial dependencies, but otherwise i.i.d. in time. Of course, in practical applications there are limits on how large *q* can be (e.g., owing to computational cost and the fact that *q* must be significantly smaller than the number of samples *n* to avoid spurious correlations).

In separate calculations, we have computed eigenfunctions using data corrupted by i.i.d. Gaussian noise of standard deviation *σ* in the range 0.5–3 K using *q* = 240 lags as in the noise-free case. (Note that the presence of a time-independent, nonzero noise mean is irrelevant for NLSA since such a term cancels from all pairwise data differences *X*_{i} − *X*_{j}.) We found that the WPMM is still present in the spectrum for σ = 0.5 K, but the quality of its temporal pattern is substantially degraded. Increasing the noise to 1 K results in the WPMM being indistinguishable among the leading 200 eigenfunctions. On the other hand, for the same number of lags, ENSO and ENSO-A eigenfunctions can be detected for noise standard deviations up to 3 and 2 K, respectively. In agreement with theoretical expectations, we also found that the noise tolerance for these modes increased by increasing *q*.

## 6. Comparison with CM3 dataset

In this section, we compare the Indo-Pacific SST modes extracted from the 1300-yr CCSM4 dataset (section 4) against modes extracted from the 800-yr CM3 dataset. We analyzed this dataset in the same fashion as CCSM4, and the results were tested using a wide range of NLSA parameters. As different models have different physics representations and initial and boundary conditions, we do not obtain a perfect agreement between the two analyses. Qualitatively, however, the periodic, ENSO, and ENSO-A modes recovered from CCSM4 can also be identified in the CM3 control run. Modes resembling the WPMM, IPO, and TBO are also recovered from CM3, but their temporal patterns are noisier than their CCSM4 analogs. Figures 10 and 11 show the temporal patterns, frequency spectra, and spatial composites for the interannual and decadal CM3 modes corresponding to the CCSM4 modes in Figs. 3, 7, and 8. Below, we outline the main similarities and differences between the two sets of modes. In the interest of brevity, the properties of the annual cycle modes from CM3 are discussed in the supplementary material.

The ENSO and ENSO-A modes from CM3 are displayed in Fig. 10. As shown in Fig. 10c, the dominant interannual spatiotemporal pattern from CM3 features an ENSO-like triangular anomaly that compares well to the corresponding pattern from CCSM4 (Fig. 3c). However, the dominant ENSO time scale range of 2–3 yr for CM3 is significantly shorter than the 3–5-yr ENSO time-scale characteristic of CCSM4. Pronounced spatial differences between the two patterns are also present. In particular, the pattern of westward-propagating anomalies originating in the Galapagos Islands and gradually weakening in the eastern Pacific progresses significantly faster in CM3 than in CCSM4. In other regions, such as the sector west of Australia, the interannual anomalies in CM3 are significantly weaker than those in CCSM4. Despite these differences, the dominant interannual modes from CM3 exhibit multidecadal fluctuations, which are qualitatively similar to the behavior observed in CCSM4.

ENSO combination modes also have a pronounced signature in the CM3 dataset (Figs. 10f and 10i, respectively) that similarly to other modes is stronger than CCSM4. The first family of ENSO combination modes (analogous to the ENSO-A1 modes in section 4b) grows west of the Galapagos and subsequently weakens rapidly during its westward propagation. Also, the opposite-signed anomaly accompanying the Galapagos pattern develops east of Borneo and New Guinea, as opposed to the South China Sea as observed in CCSM4. The anomaly west of Australia is weaker, whereas the one along the west coastline of Sumatra and Java is more pronounced in CM3 than in CCSM4. The properties of the second set of combination modes (analogous to ENSO-A2 in section 4b) differ between two models in a similar fashion as the first annual modulation. Despite these differences, both the first and second annual modulations of the interannual mode are very pronounced west of the Sunda Strait in the CM3 dataset and can be associated with the IOD.

Biennial modes representing the TBO are also captured in the CM3 dataset (Figs. 11a–c), albeit with temporal patterns influenced by annual modulation. In both models, the biennial modes generate anomalies over the eastern Pacific and west of the Sunda Strait, and these anomalies have similar amplitude but different regional extent. Also, the westward propagation associated with this mode is much less pronounced over the tropical Pacific in CM3, and the emergence of anomalies west of the Sunda Strait is less coincident to the dissipation of the equatorial wave. We did not recover TBO combination modes from CM3 analogous to the TBO-A modes in Figs. 7d–f.

A WPMM-like mode is also recovered in the CM3 dataset (Figs. 11d,e), but in this case the mode’s temporal pattern is somewhat mixed with an annual signal. The corresponding spatial pattern (Fig. 11f) exhibits strong anomalies in the western Pacific, but these anomalies are not as spatially pronounced as in CCSM4 (Fig. 8f), and no opposite-sign anomalies are observed in the eastern tropical Pacific. Spatial differences also occur elsewhere and in particular over the southern midlatitudes. The second CM3 decadal mode has a cleaner temporal pattern (Figs. 11g,h) in comparison to its CCSM4 counterpart. Its corresponding spatial composite (Fig. 11i) resembles the IPO, featuring an ENSO-like signal over the tropical eastern Pacific, which is surrounded by same-sign but weaker anomalies prevailing over some parts of the Indian and Pacific Oceans.

## 7. Spatiotemporal modes recovered from observational data

In this section, we present the Indo-Pacific SST mode family recovered by NLSA from the 140-yr HadISST dataset spanning the industrial era (1870–2013). As stated in section 3b we do not detrend the data. Moreover, we work with the same bandwidth and cone kernel parameters for NLSA as with the model data (ε = 1 and ζ = 0.99), but we decrease the embedding window to Δ*t* = 4 yr, as we found that for longer embedding windows (including the 20-yr embedding window employed earlier) the top part of the spectrum becomes dominated by trend modes and “combination modes” between trend and periodic modes. Since the behavior of NLSA for such long embedding windows compared to the total time series length and simultaneous presence of trends is yet to be assessed, in what follows we present results obtained using a 4-yr embedding window.

With the choice of parameters described above, we identify a 16-mode family, consisting of eigenfunctions 1, …, 6, 11, 14, …, 19, 25, 26, 29 (ordered as described in section 2b), which we label *ϕ*_{1}, …, *ϕ*_{16} as in sections 4 and 6. This family consists of periodic modes representing the annual cycle and its first two harmonics {*ϕ*_{1}, …, *ϕ*_{6}}, the fundamental component of ENSO together with the associated ENSO-A1 and ENSO-A2 combination modes {*ϕ*_{7}, …, *ϕ*_{12}}, an IPO mode *ϕ*_{13}, a pair of TBO modes {*ϕ*_{14}, *ϕ*_{15}}, and a WPMM mode *ϕ*_{16}. Besides these modes, the NLSA spectrum from HadISST data contains trend like modes as well as “combination modes” representing modulations of the internal variability modes by the trend modes. We defer study of these modes to future work. Also, taking into account the sensitivity analysis in section 5c, we express caution about the robustness of the IPO and WPMM recovered from the HadISST data, though we find that the properties of these modes are broadly consistent with those recovered from the model data. Similarly, our confidence on the TBO modes from HadISST is somewhat diminished since we will see that its frequency peaks are not well resolved, but nevertheless that mode retains some of the qualitative features observed in its counterpart from the model data. In separate calculations, we have computed NLSA modes from the portion of the HadISST dataset spanning the satellite era and found that the periodic, ENSO, ENSO-A modes can also be recovered from that shorter dataset, although the WPPM, IPO, and TBO are not successfully recovered. In what follows (sections 7a–7c), we describe the properties of the HadISST mode family apart from the annual cycle modes whose properties are outlined in the supplementary material.

### a. ENSO and ENSO combination modes

First, we examine the collection of interannual modes *ϕ*_{7}, …, *ϕ*_{12}, whose power spectra, time series, and SST composites are shown in Fig. 12. The dynamic evolution of reconstructed SST, surface wind, and thermocline depth patterns associated with these modes will be discussed in more detail in Part II (along with movies in the supplementary material). Among these modes, the pair {*ϕ*_{7}, *ϕ*_{8}} (Figs. 12a–c) represents the fundamental component of ENSO. The dominant frequency of this pair, ν_{ENSO} = 0.25 yr^{−1}, compares well with the corresponding frequency peak identified in the corresponding mode from CCSM4 (Fig. 3a), but the spectra of the HadISST modes (Fig. 12a) are significantly broader. This discrepancy is at least partly caused by the different embedding windows used in each case, for, as shown in Fig. 9, the spectra of the HadISST ENSO modes compare well with the spectra of the CCSM4 ENSO modes computed for a 4-yr embedding window. A comparison of the SST composites in Fig. 12c and Figs. 3c and 10c reveals that the SST patterns from HadISST and CCSM4–CM3 are in good agreement over the tropical Pacific, but notable differences can be observed in the Indian Ocean and the Maritime Continent north of Borneo. In particular, instead of the dipolar Indian Ocean SST anomaly patterns observed in CCSM4 and CM3, the HadISST composites exhibit SST anomalies, which have the same sign as the main eastern Pacific ENSO anomaly, and extend over the majority of the Indian Ocean and the Maritime Continent region to the north of Borneo. In Part II, we will see that the reconstructed SST fields from the HadISST ENSO modes reproduce well the historical El Niño–La Niña events, as is also suggestive by the eigenfunction time series in Fig. 12b (e.g., notice the strong activity in the second half of the 1990s associated with the 1997/98 El Niño and the 1998–2000 La Niñas).

Modes {*ϕ*_{9}, *ϕ*_{10}} (Figs. 12d–f) and {*ϕ*_{11}, *ϕ*_{12}} (Figs. 12g–i) have spectral peaks at the ENSO combination frequencies ν_{ENSO} − 1 yr^{−1} and ν_{ENSO} + 1 yr^{−1}, respectively, and their amplitude envelopes correlate significantly with the envelopes of the fundamental ENSO modes [correlation coefficients 0.96 and 0.78, respectively, at *p* ≈ 0 using a *t* test with (140 yr^{−1}) × ν_{ENSO} − 2 = 33 degrees of freedom]. Moreover, the amplitudes and phases of these modes have relationships with the fundamental ENSO modes {*ϕ*_{7}, *ϕ*_{8}} analogous to those shown in Fig. 6 for the corresponding CCSM4 modes. Based on these facts, we conclude that the pairs {*ϕ*_{9}, *ϕ*_{10}} and {*ϕ*_{11}, *ϕ*_{12}} are combination modes between ENSO and the annual cycle; we denote these pairs ENSO-A1 and ENSO-A2 following the notation of sections 4b and 6. The SST composites from these modes (Figs. 12f,i) exhibit significant activity over the Indian Ocean (and in particular the region west of the Sunda Strait) as in the case of the ENSO-A modes from the model data (Figs. 3 and 10). However, the ENSO-A composites from HadISST differ in their prominent activity in the central Pacific (at ~120°W). In particular, the ENSO-A1 composite (Fig. 12i) features two prominent anomaly clusters of opposite sign, which are not seen in the model-based composites (Figs. 3i and 10i). Despite these differences, the ENSO and ENSO-A combination modes from HadISST explain comparable variance in the Indian Ocean as their counterparts from CCSM4 and CM3. Indeed, as shown in the variance maps in Fig. 13 the ENSO-A modes families from HadISST each explain ~10% of the nonperiodic SST variance in the eastern equatorial Indian Ocean region off the Sunda Strait, which is comparable to the corresponding variances explained by the CCSM4 modes (see section 4b).

### b. Biennial modes

HadISST mode pair {*ϕ*_{14}, *ϕ*_{15}}, depicted in Fig. 14a, displays a cluster of SST anomalies in the Maritime Continent and off Australia’s north coast, together with a cluster of opposite-sign anomalies in the eastern equatorial Pacific off the Galapagos Islands (Fig. 14c). Moreover, the frequency spectra of these modes (Fig. 14a) show evidence of spectral peaks at the TBO frequency ν_{TBO} = 0.5 yr^{−1} and the corresponding combination frequency ν_{TBO} + 1 yr^{−1} = 1.5 yr^{−1} (though these peaks are of marginal statistical significance). Because of these spatial and spectral characteristics, we interpret modes {*ϕ*_{14}, *ϕ*_{15}} as representations of the TBO that mix the fundamental TBO frequency and the TBO combination frequency with the annual cycle into a single pair of modes (cf. the TBO modes from CCSM4 presented in section 4c, which resolve these frequencies into distinct mode pairs). The shortness of the embedding window used to compute modes from the observational data may be contributing to the observed mixing of the TBO frequencies.

### c. Decadal modes

Figures 14d–i display the power spectra, time series, and SST composites associated with modes *ϕ*_{16} and *ϕ*_{13}—the modes recovered from the HadISST data that most closely resemble the WPMM and IPO modes, respectively. In particular, Fig. 14e shows that *ϕ*_{16} describes a multidecadal oscillation (albeit with noticeable mixing with high-frequency variability), essentially completing a single cycle over the 140-yr-long HadISST dataset (the time average of the *ϕ*_{16} time series is close to zero). The corresponding SST composites (Fig. 14f) feature a cluster of negative SST anomalies in the central tropical Pacific, which broadly resembles the western Pacific anomaly cluster in the WPMM recovered from CCSM4 but shifted to the east. The *ϕ*_{16} composite in Fig. 14f also features a central-eastern Pacific dipole as seen in the WPMM composite from CCSM4, but instead of the positive SST anomalies developing off the southern and eastern coasts of Australia in CCSM4, the HadISST mode features a positive anomaly cluster in the south-central Pacific. As stated earlier, owing to the shortness of the available HadISST data we do not have particularly high confidence in that mode, though in separate calculations we have confirmed that it appears in the NLSA spectrum for different embedding window lengths in the range 4–20 yr. Notably, the time series of *ϕ*_{16} correlates strongly with the low-frequency ENSO envelope. Turning now to *ϕ*_{13}, this mode is characterized by a red-noise-like power spectrum (Fig. 14g) and an SST composite (Fig. 14i) characteristic of the IPO. As with the IPO modes from CCSM4 and CM3, the time series of *ϕ*_{13} is noticeably influenced by interannual variability. Nevertheless it exhibits a discernible decadal signal that broadly resembles IPO variability.

## 8. Concluding remarks

In this paper, we have studied the variability of Indo-Pacific SST in millennial control integrations of the CCSM4 and CM3 coupled climate models and HadISST observational data for the industrial and satellite eras using a recent data analysis technique called nonlinear Laplacian spectral analysis (GM; Giannakis 2015). An advantage of NLSA over traditional eigendecomposition techniques such as EOF analysis and SSA is that it requires no prefiltering of the input data to isolate the time scales of interest. Instead, NLSA extracts a multiscale hierarchy of modes through the eigenfunctions of a single kernel operator designed to extract intrinsic time scales of data generated by complex dynamical systems. Applied to Indo-Pacific SST data, NLSA recovers a hierarchy of physically meaningful modes spanning seasonal to multidecadal time scales. Here, we have focused on the spatiotemporal properties of a subset of this family that includes the annual cycle and its harmonics, ENSO, combination modes between ENSO and the annual cycle (denoted here as ENSO-A modes), representations of the TBO (together with its associated combination modes) and IPO, and a new low-frequency pattern, which we refer to as the west Pacific multidecadal mode (WPMM). This family was best recovered in the 1300-yr-long CCSM4 dataset, where it was also found to be robust under changes of NLSA parameters and spatial domain and addition of i.i.d. Gaussian noise. The periodic, ENSO, and ENSO-A modes were also robustly recovered in CM3 and HadISST data (the latter both for the industrial and satellite eras), but the TBO, WPMM, and IPO modes were found to be of poorer quality (in the sense of time-scale mixing) in CM3 and industrial-era HadISST, and were not detectable in satellite-era HadISST data. A similar degradation of quality took place in separate analyses of ~150-yr portions of the CCSM4 data, suggesting that the poor quality of the TBO, WPMM, and IPO modes in HadISST data may be due to the time span of that dataset as opposed to absence of these modes in nature.

We showed that the ENSO and ENSO-A modes are naturally recovered through NLSA from raw monthly Indo-Pacific SST data from both models and observations. In particular, the recovered temporal patterns have a high accuracy of the frequencies, phase relationships, and multiplicities expected theoretically from quadratic nonlinearities in the governing equations of the coupled atmosphere–ocean system (McGregor et al. 2012; Stein et al. 2011, 2014; Stuecker et al. 2013, 2015a; Ren et al. 2016). Moreover, we showed that the ENSO-A modes explain a significant portion of SST variability over the area traditionally used to define IOD indices (Saji et al. 1999). Specifically, these modes explain ~35% of the raw variance in these regions after removal of the seasonal cycle. These results suggest that a significant portion of IOD variability (and potentially, predictability) can be associated with deterministic modulations of the annual cycle by ENSO.

Similarly to ENSO, we found that the phase locking of the TBO to the seasonal cycle can be represented through a distinct family of oscillatory patterns and their associated combination modes (denoted TBO-A modes). However, possibly owing to their low amplitude in comparison to ENSO, a clean identification of these patterns was only possible in the 1300-yr-long CCSM4 data. In particular, even though we did recover biennial modes in CM3 and in industrial-era HadISST data, these modes mixed together the TBO and TBO-A frequencies. Thus, our results from CCSM4 are consistent with studies attributing a dynamical origin to the TBO (e.g., Meehl and Arblaster 2002; Li et al. 2006; Meehl and Arblaster 2011), but the challenges in detecting these modes in the other datasets studied here are also representative of difficulties in TBO detection reported in other studies (Stuecker et al. 2015b).

Another result stemming from this work has been the identification of the WPMM as a distinct mode of decadal Indo-Pacific SST variability. The dominant feature of this mode is a persistent cluster of SST anomalies in the western Pacific. This pattern, which was most clearly observed in the CCSM4 data and for a wide range of NLSA parameters, is distinct from the eastern Pacific decadal SST anomalies typically associated with the IPO (Power et al. 1999; Meehl et al. 2013) and resembles more closely the second EOF of low-pass-filtered SST (Timmermann 2003; Yeh and Kirtman 2004; Sun and Yu 2009; Ogata et al. 2013). As with the TBO, we find that the WPMM is best recovered from the CCSM4 data, where it is robust under changes of NLSA parameters and the choice of spatial domain. However, because of its multidecadal nature and small explained variance, this mode requires datasets spanning several centuries in order for the sampling density in state space to be high enough for its robust detection. In particular, while modes resembling the WPMM from CCSM4 were recovered from both CM3 and industrial-era HadISST, these modes have noisier temporal patterns mixing decadal and higher-frequency time scales. Similar issues were encountered in experiments with ~150-yr portions of the CCSM4 dataset, suggesting that the poorer quality of the WPMM in CM3 and HadISST is at least partly caused by the time series length as opposed to absence of that mode. Still, it is also likely that the differences in the WPMM (as well as the other modes studied in this work) between the CCSM4, CM3, and HadISST datasets are not just outcomes of the time span and the NLSA parameters in the analysis, but may be also caused by model biases (Deser et al. 2012) and the fact that the HadISST dataset exhibits a significant forced variability component. In particular, even though our results suggest that NLSA performs a reasonable decomposition of the internal and forced variability in the HadISST dataset, the theoretical properties of NLSA (and, to our knowledge, other comparable eigendecomposition techniques) in the presence of trends are not well understood, and it is possible that forced variability contributes to mode discrepancies.

In summary, the family of NLSA modes presented here provides an attractive low-dimensional basis to characterize numerous features of the Indo-Pacific climate system. In Part II, we will study the SST, surface wind, thermocline depth, and precipitation patterns represented by the ENSO and TBO modes and the associated coupling between the Pacific and Indian Oceans, as well as the relationship between ENSO and TBO activity and the WPMM in CCSM4 data. We will also demonstrate the impacts of the WPMM on decadal precipitation over Australia in CCSM4.

## Acknowledgments

J. Slawinska received support from the Center for Prototype Climate Modeling at NYU Abu Dhabi and NSF Grant AGS-1430051. The research of D. Giannakis is supported by ONR Grant N00014-14-0150, ONR MURI Grant 25-74200-F7112, and NSF Grant DMS-579 1521775. This research was partially carried out on the high-performance computing resources at New York University Abu Dhabi. We thank Sulian Thual and Benjamin Lintner for stimulating comments and feedback. We also thank three anonymous reviewers for detailed and constructive comments that helped us to improve the manuscript.

### APPENDIX

#### Comparison with SSA

For completeness, we have compared the NLSA spatiotemporal patterns described in section 4 with the corresponding patterns recovered through SSA (Ghil et al. 2002) using the same Indo-Pacific SST dataset from CCSM4 and Δ*t* = 20 yr embedding window as in section 4. Representative temporal patterns from SSA and the associated phase composites are shown in Fig. 15 and Fig. 16, where the modes are ordered in order of decreasing singular value. The main commonalities and differences between the two approaches are as follows.

As in NLSA, the leading pairs of SSA modes are doubly degenerate periodic modes associated with the annual cycle and its harmonics. The SSA spectrum contains two such pairs of annual and semiannual modes (not shown), whose spatial patterns are in good agreement with the corresponding NLSA patterns in Fig. 1. We note that the SSA also recovers a pair of triennial modes (also not shown here) analogous to the NLSA modes {*ϕ*_{5}, *ϕ*_{6}}, but the SSA triennial pair appears as the 22nd and 23rd modes in the spectrum. This is an example of the differences in mode ordering that can occur between NLSA (which orders the modes in order of increasing roughness on the data manifold) and SSA (which orders the modes in order of decreasing explained variance).

The next pair of SSA modes (Figs. 15a–e) represents the fundamental component of ENSO. As in the case of the NLSA ENSO modes in Figs. 3a–c, the SSA ENSO modes form a 90° out-of-phase oscillatory pair with an interannual dominant frequency of ≃4 yr^{−1} and the characteristic SST patterns of eastern Pacific El Niño–La Niña. Moreover, the SSA ENSO modes have a decadal envelope, which correlates almost perfectly with the corresponding envelope from NLSA [correlation coefficient 0.99; hereafter, unless otherwise stated all amplitude envelopes are extracted via Hilbert transforms and the sample correlation coefficients have numerically zero *p* values based on *t* tests with (1300 yr) × ν_{ENSO} − 2 = 323 degrees of freedom].

Beyond the fundamental ENSO pair, the NLSA and SSA spectra differ significantly. In particular, the next several SSA modes have the structure of secondary ENSO modes as opposed to the ENSO-A combination modes recovered by NLSA. This is another example of the differences in mode ordering between SSA and NLSA. According to NLSA, the ENSO-A modes are represented by smoother functions on the data manifold than the secondary interannual modes, and therefore the former are leading the latter in the mode ordering, despite that the secondary interannual modes explain more variance than the ENSO-A modes. Two pairs of SSA modes that resemble the ENSO-A1 (Figs. 3d–f) and ENSO-A2 (Figs. 3g–i) modes are modes {27, 28} (Figs. 15d–f) and {48, 49} (Figs. 15g–i), respectively. These pairs exhibit the dominant frequencies that are consistent with the theoretical ENSO-A frequencies. However, the envelopes of these modes do not correlate as strongly with the fundamental ENSO envelope as they do in NLSA. Specifically, the corresponding correlation coefficients are 0.75 and 0.94, respectively (recall that in case of NLSA, the corresponding values are greater than 0.99). In the spatial domain, SSA modes {27, 28} and {48, 49} are in reasonably good qualitative agreement with their NLSA counterparts in Figs. 3f and 3i, respectively; in particular, they exhibit strong activity in the eastern Indian Ocean region employed in IOD indices and southward-propagating anomalies off the west coast of Australia as discussed in section 4b. Besides the ENSO and ENSO-A modes, the SSA spectrum contains a pair of biennial modes, {35, 36} (Figs. 15j–l), which have significantly different spatiotemporal characteristics than the corresponding NLSA TBO modes (Figs. 7a–c). We did not find evidence of TBO-A combination modes in the SSA spectrum.

Next, consider the decadal modes. Figures 16a–c displays the temporal pattern, frequency spectrum, and SST composite corresponding to the leading SSA decadal mode, mode 32. This mode has a red-noise-like frequency spectrum carrying the majority of its power on decadal time scales, though compared to the NLSA WPMM spectrum (Fig. 8d) the frequency spectrum of the SSA mode decays less steeply and exhibits more power on shorter time scales. In the spatial domain, there is also a moderate agreement between SSA mode 32 and the WPMM in that both modes feature a decadal SST anomaly cluster in the equatorial western Pacific coexisting with opposite-sign anomalies in the South Pacific, but in the case of SSA the amplitude is weaker. Also, SSA mode 32 has weaker correlation (correlation coefficient 0.48) with the amplitude envelope of the SSA ENSO mode than the WPMM has with the amplitude envelope of the NLSA ENSO mode (correlation coefficient 0.63; see Part II).

SSA modes 43 and 45 (Figs. 16d–i) are also of decadal character. In the spatial domain, these modes describe an oscillation featuring a cluster of SST anomalies that originates in the equatorial eastern Pacific and propagates westward until it merges with an anomaly of the same sign that develops in the western equatorial Pacific. At the same time, another same-sign anomaly cluster emerges in the South Pacific east of the coast of New Zealand and travels toward the east. Overall, these motions produce an apparent basin-scale anticyclonic motion in the South Pacific Ocean. We were not able to identify a clear analog of this pattern in the NLSA spectrum. Besides modes 43–45, SSA contains additional decadal modes, but these modes are generally mixed with high-frequency (~1 yr^{−1}) oscillations and we do not discuss them here. We also note that as with NLSA there are trend like modes in the SSA spectrum (not shown) that capture the small model drift present in the CCSM4 dataset.

In summary, applied to the Indo-Pacific SST dataset from CCSM4, both SSA and NLSA are able to extract modes of adequate time scale separation, spanning seasonal to multidecadal time scales. The leading modes, in particular the annual and semiannual modes and the fundamental component of ENSO, are in good agreement between the two methods. Further down the spectrum, the two methods have significant differences. In particular, NLSA has higher skill in capturing the combination modes associated with the nonlinear interaction between ENSO and the annual cycle, and it also provides a better representation of the TBO and its associated combination modes. On the decadal scale, both SSA and NLSA produce clean decadal spatiotemporal patterns, although we were not able to find a direct correspondence between the decadal modes extracted via the two methods. In particular, we were not able to identify a decadal mode in the SSA spectrum whose amplitude modulation relationships with ENSO have comparable strength to those of the WPMM.

## REFERENCES

*Proc. Conf. on Intelligent Data Understanding 2011*, Mountain View, CA, NASA, 107–117.

*Mathematical and Computational Modeling: With Applications in Engineering and the Natural and Social Sciences*, R. Melnik, Ed., Wiley, 137–191.

*El Niño, La Niña, and the Southern Oscillation*. International Geophysics Series, Vol. 46, Academic Press, 293 pp.

*The El Niño-Southern Oscillation Phenomenon.*Cambridge University Press, 384 pp.

*Dynamical Systems and Turbulence, Warwick 1980*, Lecture Notes in Mathematics, Vol. 898, Springer, 366–381, doi:.

*Reference Module in Earth Systems and Environmental Sciences*, S. Elias, Ed., Elsevier, doi:.

*Earth’s Climate: The Ocean–Atmosphere Interaction*, C. Wang, S.-P. Xie, and J. A. Carton, Eds.,

*Geophys. Monogr.*, Vol. 147, Amer. Geophys. Union, 21–48, doi:.

*Coral Reefs of the Eastern Tropical Pacific: Persistence and Loss in a Dynamic Environment*, P. W. Glynn, D. P. Manzello, and I. C. Enoch, Eds., Coral Reefs of the World, Vol. 8, Springer, 85–106, doi:.

## Footnotes

Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/JCLI-D-16-0176.s1.

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).