Nonlinear Principal Component Analysis: Tropical Indo–Pacific Sea Surface Temperature and Sea Level Pressure

Adam Hugh Monahan Oceanography Unit, Department of Earth and Ocean Sciences, and Crisis Points Group, Peter Wall Institute for Advanced Studies, University of British Columbia, Vancouver, British Columbia, Canada


Abstract

Nonlinear principal component analysis (NLPCA) is a generalization of traditional principal component analysis (PCA) that allows for the detection and characterization of low-dimensional nonlinear structure in multivariate datasets. The authors consider the application of NLPCA to two datasets: tropical Pacific sea surface temperature (SST) and tropical Indo–Pacific sea level pressure (SLP). It is found that for the SST data, the low-dimensional NLPCA approximations characterize the data better than do PCA approximations of the same dimensionality. In particular, the one-dimensional NLPCA approximation characterizes the asymmetry between spatial patterns characteristic of average El Niño and La Niña events, which the 1D PCA approximation cannot. The differences between NLPCA and PCA results are more modest for the SLP data, indicating that the lower-dimensional structures of this dataset are nearly linear.

Corresponding author address: Dr. Adam H. Monahan, Oceanography Unit, Dept. of Earth and Ocean Sciences, University of British Columbia, 6270 University Boulevard, Vancouver, BC V6T1Z4, Canada.


1. Introduction

Principal component analysis (PCA), also known as empirical orthogonal function (EOF) analysis, is a powerful technique for the objective characterization of low-dimensional linear structure in multivariate datasets. Consequently, it has enjoyed broad application in meteorology and oceanography (for a review, see von Storch and Zwiers 1999). However, the constraint that only linear structure can be detected is a strong one, and in general it leads to suboptimal low-dimensional approximations. PCA can be viewed as one of many feature extraction methods, which in general are concerned with the characterization of low-dimensional structure in multivariate datasets. Kramer (1991) demonstrated that if PCA is formulated as a variational problem for the detection of low-dimensional linear structure, then it has a natural extension to the more general problem of characterizing low-dimensional nonlinear structure and a straightforward implementation using feed-forward neural networks. He denoted this generalization nonlinear principal component analysis (NLPCA). Monahan (2000a) discussed the application of NLPCA to climatological datasets and demonstrated that the method was able to produce more representative one- and two-dimensional approximations of the Lorenz attractor, a dataset with substantial nonlinear low-dimensional structure, than was PCA. In particular, the one- and two-dimensional NLPCA approximations explained, respectively, 76% and 99.5% of the variance in the Lorenz data, as compared to 60% and 95% for the PCA approximation. While NLPCA has been applied in a number of different fields [see Monahan (2000a) for an overview], it has not yet been systematically applied to climatological datasets, apart from a single unpublished report by Sengupta and Boyle (1995), the results of which were equivocal.

In this paper, we apply NLPCA to two geophysical datasets: tropical Pacific Ocean sea surface temperature (SST) and tropical Indo–Pacific sea level pressure (SLP). Section 2 provides a brief overview of NLPCA, and section 3 describes the methodology used to construct the NLPCA approximations. Sections 4 and 5 describe the results of the NLPCA of sea surface temperature and sea level pressure, respectively, and a summary and conclusions are given in section 6.

2. Nonlinear principal component analysis

Feature extraction problems, in particular PCA and NLPCA, are described in detail in Monahan (2000a). We provide here a brief overview of NLPCA and its implementation. Nonlinear principal component analysis solves the following feature extraction problem: given the (climatic) dataset X(tn) ∈ ℜM, n = 1, . . . , N, find sf: ℜM → ℜP and f: ℜP → ℜM, 1 ⩽ P < M, such that
ε² = ⟨‖X(tn) − X̂(tn)‖²⟩  (1)
is minimized, where
X̂(tn) = (f ∘ sf)[X(tn)],  (2)
the angle brackets denote an average over time, and ‖v‖ denotes the L2 norm of the vector v. The notation g ∘ h denotes the composition of functions g and h, that is, [g ∘ h](x) = g[h(x)]. It is assumed that the data are zero-centered in time:
⟨X⟩ = 0.  (3)
The function sf maps the original data, of dimension M, to a space of smaller dimension P, while f is a map from this space back to ℜM. The composition f ∘ sf maps the original data to points on the P-dimensional manifold X̂(tn) embedded in ℜM, which is an optimal approximation to the original data in the sense that the sum of squared errors (1) is minimized. In other words, subject to constraints on sf and f, the approximation X̂(tn) is constructed so as to run through the middle of the data.
In particular, if the functions sf and f are constrained to be linear, then the problem reduces to traditional principal component analysis:
X̂(tn) = (f ∘ sf)[X(tn)] = ΠᵀΠ X(tn),  (4)
where Π is the P × M matrix whose kth row is the kth empirical orthogonal function, or loading, ek. In this case, the action of f ∘ sf is to project the original data into the P-dimensional linear subspace of ℜM spanned by the first P EOFs of X(tn). This P-dimensional approximation is optimal if the data points X(tn) are drawn from a distribution whose structure is naturally represented by a set of orthogonal axes (e.g., multivariate Gaussian). If the data contain a lower-dimensional structure that is not planar, however, this approximation is suboptimal.
We can generalize PCA to allow the determination of lower-dimensional, nonplanar structure by allowing the functions sf and f to be nonlinear. An algorithm for the implementation of this generalized NLPCA, involving the use of a five-layer feed-forward neural network, was introduced by Kramer (1991). Feed-forward neural networks are described in appendix A, and the network used to implement NLPCA is illustrated in Fig. 1. The first (input) and fifth (output) layers each contain M neurons, while the third (bottleneck) layer contains P neurons. Layers 2 and 4, denoted, respectively, the encoding and decoding layers, both contain L neurons. Their transfer functions (see appendix A) are hyperbolic tangents. The transfer functions of the bottleneck and output layers are linear. The network parameters (weights and biases) are adjusted using a conjugate gradient algorithm until the network output, denoted N[X(tn)], minimizes the sum of squared differences:
ε² = ⟨‖X(tn) − N[X(tn)]‖²⟩  (5)
(subject to certain caveats discussed in the subsequent section). Because the output of the network is adjusted until it matches the input as closely as possible, the network is said to be autoassociative. Layers 1 through 3 estimate the map sf from ℜM to ℜP, while layers 3 through 5 estimate the map f from ℜP to ℜM. The entire network is the composition of these two maps, so that X̂(tn) = N[X(tn)]. Using the optimal approximation result of Cybenko (1989, discussed in appendix A), it is clear that as long as L is sufficiently large, Kramer’s network can solve the feature extraction problem stated above, with sf and f constrained only to be continuous. If all transfer functions in the network N are set to be linear, the algorithm reduces to traditional PCA (Baldi and Hornik 1989). Note that as sf and f are constrained to be continuous functions, NLPCA cannot model surfaces that are discontinuous or self-intersecting (Malthouse 1998). However, by employing a seven-layer autoassociative neural network, the method can be generalized to allow the characterization of such surfaces (Monahan 2000b).
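To make the architecture concrete, the forward pass and training loop can be sketched in a few lines of numpy. This is a minimal illustration on toy data, not the configuration used in this paper: it trains by plain gradient descent with hand-coded backpropagation rather than the conjugate-gradient algorithm, and the network sizes and data are arbitrary.

```python
import numpy as np

def init(M, L, P, rng):
    # Five-layer autoassociative net (Kramer 1991): input(M) ->
    # encoding(L, tanh) -> bottleneck(P, linear) -> decoding(L, tanh)
    # -> output(M, linear). Weights small random; biases zero.
    w = lambda a, b: rng.normal(0.0, 0.1, (a, b))
    return [w(L, M), np.zeros(L), w(P, L), np.zeros(P),
            w(L, P), np.zeros(L), w(M, L), np.zeros(M)]

def forward(p, X):
    W1, b1, W2, b2, W3, b3, W4, b4 = p
    H1 = np.tanh(X @ W1.T + b1)     # encoding layer
    S = H1 @ W2.T + b2              # bottleneck: s_f[X(t_n)]
    H2 = np.tanh(S @ W3.T + b3)     # decoding layer
    Xhat = H2 @ W4.T + b4           # output: approximation of X(t_n)
    return H1, S, H2, Xhat

def train(X, L=4, P=1, lr=0.05, n_iter=2000, seed=0):
    # Minimize the sum of squared differences, Eq. (5), by plain
    # gradient descent (the paper uses a conjugate-gradient method).
    p = init(X.shape[1], L, P, np.random.default_rng(seed))
    N = len(X)
    for _ in range(n_iter):
        W1, b1, W2, b2, W3, b3, W4, b4 = p
        H1, S, H2, Xhat = forward(p, X)
        d4 = 2.0 * (Xhat - X) / N             # d(error)/d(output)
        d3 = (d4 @ W4) * (1.0 - H2**2)        # through decoding tanh
        d2 = d3 @ W3                          # through bottleneck
        d1 = (d2 @ W2) * (1.0 - H1**2)        # through encoding tanh
        grads = [d1.T @ X, d1.sum(0), d2.T @ H1, d2.sum(0),
                 d3.T @ S, d3.sum(0), d4.T @ H2, d4.sum(0)]
        p = [q - lr * g for q, g in zip(p, grads)]
    return p

# Toy data with 1D nonlinear (parabolic) structure embedded in 2D
t = np.linspace(-1.0, 1.0, 200)
X = np.stack([t, t**2], axis=1)
X -= X.mean(axis=0)                 # zero-centre in time, Eq. (3)

p = train(X)
err = np.mean(np.sum((X - forward(p, X)[3])**2, axis=1))
```

The bottleneck series `S` plays the role of the nonlinear principal component time series; layers 1 through 3 implement sf, and layers 3 through 5 implement f.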
As is always the case when mathematical models are generalized, certain features of the original model survive the generalization, and other features are lost. An important characteristic of PCA is that it partitions variance, in the sense that if X̂(tn) is the PCA approximation to X(tn), then
⟨‖X(tn)‖²⟩ = ⟨‖X̂(tn)‖²⟩ + ⟨‖X(tn) − X̂(tn)‖²⟩,  (6)
which is to say that the total variance of X(tn) is the sum of the total variance of X̂(tn) and the total variance of the residuals X(tn) − X̂(tn). This partition is important, as it allows one to describe a PCA approximation as explaining a given fraction of the variance in the system. Considering a dataset sampled from the Lorenz system, Monahan (2000a) found empirically that NLPCA also partitions variance in this sense. While we are not aware of a proof of this result, it holds again in all examples considered in this paper. Consequently, we are comfortable according it the status of an empirical fact.
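For PCA, the variance partition (6) follows from the orthogonality of the approximation and the residual, and it is easy to check numerically; the following sketch verifies it on a random dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
X -= X.mean(axis=0)                       # zero-centred data

P = 2
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Xhat = X @ Vt[:P].T @ Vt[:P]              # P-dim PCA approximation

total = np.mean(np.sum(X**2, axis=1))           # <||X||^2>
approx = np.mean(np.sum(Xhat**2, axis=1))       # <||Xhat||^2>
resid = np.mean(np.sum((X - Xhat)**2, axis=1))  # <||X - Xhat||^2>
# Eq. (6): total variance = approximated variance + residual variance
```

For NLPCA no such orthogonality argument is available, which is why the partition there has only the status of an empirical fact.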
A feature that does not survive the generalization is the additive nature of a P-dimensional PCA approximation. The P-dimensional PCA approximation to X(tn) may be found by first finding the 1D PCA approximation X̂(1)(tn), subtracting this from the original data to form the residual R(1)(tn) = X(tn) − X̂(1)(tn), determining the 1D PCA approximation X̂(2)(tn) of R(1)(tn), and iterating this procedure until P successive 1D approximations have been found. The resulting P-dimensional approximation will be the sum of the X̂(i)(tn). Alternatively, the optimal P-dimensional approximation can be determined all at once. With PCA, these two approaches are equivalent, because any linear function, g, of P variables (u1, . . . , uP) has the additive structure
g(u1, . . . , uP) = g1(u1) + g2(u2) + · · · + gP(uP).  (7)
With NLPCA, however, the two approaches are distinct, because a nonlinear function f cannot usually be decomposed as a generalized additive model:
f(u1, u2, . . . , uP) = f1(u1) + f2(u2) + · · · + fP(uP)  (8)
for some f1, f2, . . . , fP. Note that this is unrelated to the fact that NLPCA approximations are found to partition variance. We shall refer to the first, iterative approach as a modal analysis and to the second as nonmodal. Naturally enough, in a modal analysis, each 1D approximation will be referred to as a mode and ordered in terms of decreasing fraction of variance explained. We will compare both the modal and nonmodal approaches in this paper. Theoretically, the nonmodal P-dimensional approximation should be superior to the modal approximation, because it will be drawn from a broader class of functions, although the modal analysis is more amenable to interpretation. Of course, a general P-dimensional analysis could involve both modal and nonmodal decompositions at various stages; such mixed modal/nonmodal analyses will not be considered here.
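The modal procedure is a deflation loop. In the sketch below, a 1D PCA fit stands in for the 1D approximation routine; in a modal NLPCA, the hypothetical `fit_1d` argument would instead train a network with a single bottleneck neuron:

```python
import numpy as np

def pca_1d(X):
    # Stand-in 1D approximation: projection onto the leading EOF.
    # In a modal NLPCA this would instead train a P = 1 network.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    e = Vt[0]
    return np.outer(X @ e, e)

def modal_decomposition(X, P, fit_1d=pca_1d):
    # Iterative (modal) analysis: fit a 1D approximation to the
    # current residual, subtract it, and repeat P times.
    modes, R = [], X.copy()
    for _ in range(P):
        Xi = fit_1d(R)
        modes.append(Xi)
        R = R - Xi                 # residual passed to the next mode
    return modes                   # P-dim approximation is sum(modes)

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
X -= X.mean(axis=0)
Xhat = sum(modal_decomposition(X, 2))
```

With the PCA stand-in, the sum of the P modes reproduces the P-dimensional PCA approximation exactly, which is precisely the additivity property (7) that NLPCA lacks.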

3. Model building

Neural networks are powerful function approximation tools, and the avoidance of overfitting is a primary issue in their implementation (Finnoff et al. 1993; Yuval 2000). A model is said to be overfit if its performance on the data used to determine the model parameters (the training data) is good, but is poor on other data. To avoid overfitting, we have used a simple technique called early stopping, as in Monahan (2000a). Because the neural network is nonlinear in the model parameters, these must be determined iteratively, in a process referred to as training. In early stopping, the training is terminated before the error function is minimized, according to a well-defined stopping criterion. In essence, the idea behind early stopping is that the training is allowed to continue sufficiently long to fit the structure underlying the data, but not long enough to fit the noise. The term “noise” refers to any variability that cannot be robustly characterized by the approximation being constructed. The strategy we employed was to hold aside a fraction (in our case, 20%) of randomly selected data points in a validation set not used to train the network. While network training proceeded, the network performance on the validation set was monitored, and training was stopped when this performance began to degrade, or after 5000 iterations, whichever came first. Note that other methods exist for robustly minimizing the error function (Yuval 2000); early stopping was used because of its relative transparency and ease of implementation.
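The early stopping strategy described above amounts to a small amount of bookkeeping around the training loop. A generic sketch follows, with hypothetical `step` and `val_error` callables and a stop-at-first-rise rule rather than a patience window:

```python
import numpy as np

def split_validation(X, frac=0.2, seed=0):
    # Hold aside a randomly selected fraction of the data points
    # (20% in this paper) as a validation set not used for training.
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(frac * len(X))
    return X[idx[n_val:]], X[idx[:n_val]]     # (training, validation)

def train_early_stopping(step, val_error, max_iter=5000):
    # `step()` performs one training iteration on the training set;
    # `val_error()` evaluates the model on the validation set.
    # Stop when validation performance degrades, or at max_iter.
    best = np.inf
    for i in range(max_iter):
        step()
        e = val_error()
        if e > best:
            return i        # validation error began to rise
        best = min(best, e)
    return max_iter

# Illustration with a scripted validation-error history that falls,
# reaches a minimum, and then rises as overfitting sets in:
errors = iter([5.0, 4.0, 3.0, 2.0, 3.0, 4.0])
stopped_at = train_early_stopping(lambda: None, lambda: next(errors))
```

In practice the validation error is noisy, so implementations usually tolerate a few rising iterations before stopping.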

The early stopping algorithm described above confers on the training results a degree of sensitivity to the network parameters used to initialize the iterative training procedure; this is exacerbated by the possible existence of multiple minima in the error function (5). To address this problem, an ensemble of at least 20 training runs starting from different, randomly chosen, initial parameter values was carried out, for each analysis performed. The training results from this ensemble were examined, and those members of the ensemble for which the final error over the validation set was greater than that over the training set were discarded. We shall refer to the remaining members of the ensemble as candidate models.

The number of neurons L in the encoding and decoding layers determines the class of functions that sf and f can take. As L increases, there is more flexibility in the forms of sf and f, but the model also has more parameters, implying both that the error surface becomes more complicated and that the parameters are less constrained by data. Consequently, if L is large, the scatter among the candidate models can be large, as measured by the normalized mean square distance (NMSD). The NMSD between approximations X̂{1}(tn) and X̂{2}(tn) is defined as
NMSD = ⟨‖X̂{1}(tn) − X̂{2}(tn)‖²⟩ / ⟨‖X(tn)‖²⟩.  (9)
This statistic was introduced in Monahan (2000a), in which it was found that NLPCA approximations for which the NMSD was less than about 2% were essentially indistinguishable. In the end, the number L of neurons used in the encoding and decoding layers was the maximum such that the NMSD between NLPCA approximations to X(tn) in the candidate model set was less than 5%. This threshold value of NMSD was chosen on the basis of experience and intuition, and not on the basis of any rigorous sampling theory for this test statistic (which we do not have). In other words, for any given analysis, the value of L used in the NLPCA network is the largest that produces a robust set of candidate models. The early stopping technique ensures that the NLPCA approximation is robust with respect to the introduction of new data, and the existence of a set of similar candidate models (as measured by NMSD) ensures that the approximation is robust with respect to the initial parameter values used in the training.
Finally, once L was determined and a set of candidate models obtained, the model selected as “the” NLPCA approximation was the one with the highest fraction of explained variance (FEV),
FEV = 1 − ⟨‖X(tn) − X̂(tn)‖²⟩ / ⟨‖X(tn)‖²⟩,  (10)
which is a meaningful statistic because NLPCA partitions variance as described in Eq. (6). The FEV differed little between candidate models.
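Both statistics are one-line computations. In the sketch below, the normalization of the NMSD by the total variance of the data is our assumption; Monahan (2000a) gives the precise definition:

```python
import numpy as np

def nmsd(Xhat1, Xhat2, X):
    # Normalized mean square distance between two candidate
    # approximations; normalizing by the total variance of the data
    # X is an assumption here (Monahan 2000a gives the definition).
    return (np.mean(np.sum((Xhat1 - Xhat2)**2, axis=1))
            / np.mean(np.sum(X**2, axis=1)))

def fev(X, Xhat):
    # Fraction of explained variance, Eq. (10); meaningful because
    # NLPCA is found empirically to partition variance, Eq. (6).
    return 1.0 - (np.mean(np.sum((X - Xhat)**2, axis=1))
                  / np.mean(np.sum(X**2, axis=1)))
```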

4. Tropical Pacific sea surface temperature

We consider first a dataset composed of monthly averaged National Oceanic and Atmospheric Administration sea surface temperatures (SST) for the tropical Pacific Ocean. The data are on a 2° × 2° grid from 19°S to 19°N and from 125°E to 69°W, and span the period from January 1950 to December 1998 (Smith et al. 1996). A climatological annual cycle was calculated by averaging the data for each calendar month, and monthly SST anomalies (SSTAs) were defined relative to this annual cycle. To render the NLPCA problem tractable, the dataset was preprocessed by projecting it on the space of its first 10 EOF modes ek, k = 1, . . . , 10, in which 91.4% of the total variance is contained. By doing so, we take advantage of the data compression aspect of PCA, which is distinct from feature extraction, for which we shall use NLPCA. Such preprocessing of data to reduce the problem to a manageable size is common in rotated PCA (Barnston and Livezey 1987) and in statistical forecasting (Barnston 1994; Tangang et al. 1998). The first three EOF spatial patterns of SSTA are displayed in Fig. 2; these explain, respectively, 57.6%, 10.9%, and 6.8% of the total SST variance.
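The preprocessing pipeline (anomaly formation followed by EOF truncation) can be sketched as follows, on synthetic data; the grid, record length, and truncation here are illustrative, not those of the SST dataset:

```python
import numpy as np

def monthly_anomalies(X, months):
    # Remove the climatological annual cycle: subtract each calendar
    # month's long-term mean. X is (time, space); months holds 0..11.
    A = np.asarray(X, dtype=float).copy()
    for m in range(12):
        sel = months == m
        A[sel] -= A[sel].mean(axis=0)
    return A

def eof_truncate(A, k=10):
    # Compress the anomaly field onto its k leading EOFs; returns the
    # PC time series, the EOFs, and the fraction of variance retained.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    pcs = A @ Vt[:k].T
    frac = np.sum(S[:k]**2) / np.sum(S**2)
    return pcs, Vt[:k], frac

# Synthetic illustration: 10 years of monthly data at 20 grid points
rng = np.random.default_rng(3)
months = np.tile(np.arange(12), 10)
X = np.sin(2 * np.pi * months[:, None] / 12) + rng.normal(size=(120, 20))
A = monthly_anomalies(X, months)
pcs, eofs, frac = eof_truncate(A, k=5)
```

The subsequent NLPCA then operates on the 10 PC time series rather than on the full spatial field.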

A scatterplot of the two leading principal component time series is shown by the solid dots in Fig. 3a. The time series corresponding to these two PCA modes are uncorrelated, but they are clearly not independent. Indeed, Fig. 3a indicates that both strongly positive and negative values of the first PCA time series are associated with negative values of the second PCA time series. Physically, this describes the fact that the strongest positive SST anomalies during an average El Niño event lie closer to the eastern boundary of the Pacific than do the coldest anomalies during an average La Niña event. We shall return to this point later.

We consider first a modal NLPCA decomposition of this SSTA data. Mode 1, found using a network with L = 4 neurons in the encoding and decoding layers and a single neuron in the bottleneck layer, explains 69.1% of the variance in the 10-dimensional EOF space and, thus, 63.3% of the variance in the total SSTA data, as compared to 57.6% explained by the 1D PCA approximation. Four candidate models were obtained from an ensemble of 20; these models differed with an NMSD of at most 1%. Projections of the first NLPCA mode onto the subspaces spanned by the SSTA EOFs (e1, e2), (e2, e3), (e1, e3), and (e1, e2, e3) are shown by the open circles in Figs. 3a–d, respectively. All four projections are shown because it is difficult to understand the structure of the NLPCA approximation from a single projection alone. This is particularly evident in Fig. 3b, in which the curve, viewed edge-on, appears to be self-intersecting, when in fact the other projections demonstrate that this is not the case. Figure 3a indicates that NLPCA mode 1 characterizes the structure discussed in the previous paragraph; NLPCA mode 1 is primarily a mixture of PCA modes 1 and 2. Associated with this mode is the standardized time series
α1(tn) = sf[X(tn)] / std{sf[X(tn)]},  (11)
corresponding to the (standardized) output of the single neuron in the bottleneck layer. A plot of α1(tn) appears in Fig. 4. This time series bears a strong resemblance to the Niño-3.4 time series (defined as the average SSTA over a box from 7°S to 7°N, and from 119° to 171°W), also displayed in Fig. 4; the correlation coefficient between the two series is 0.88.

In contrast to PCA, no single spatial pattern is associated with any given NLPCA mode. The approximation X̂(tn), however, corresponds to a sequence of patterns that can be visualized cinematographically. This cinematographic interpretation is implicit in traditional PCA; the 1D PCA approximation X̂(tn) = [e1 · X(tn)]e1 describes the evolution in time of a standing oscillation. This oscillation has a fixed spatial structure with an amplitude that varies in time. The more general approximation X̂(tn) = (f ∘ sf)[X(tn)], with sf and f nonlinear, is not so constrained, and can characterize more complex lower-dimensional structures. There is no reason in general to expect the optimal 1D approximation to a spatial field to be a standing oscillation, but standing oscillations are the only such approximations that traditional PCA can produce. The power of NLPCA lies in its ability to characterize more general lower-dimensional structure.

Figure 5 displays maps corresponding to the first NLPCA mode X̂(1), for values of the time series α1 = −3.5, −1.5, −0.75, −0.25, 0.25, 0.75, 1.5, 3.5. These values were chosen to provide a representative sample of spatial structures associated with the NLPCA approximation. Clearly, NLPCA mode 1 describes the evolution of average ENSO events. This should be contrasted with PCA mode 1, which describes only the standing oscillation associated with ENSO variability. This difference between NLPCA and PCA modes 1 results from the spatial asymmetry between the average warm and cold phases of ENSO. In particular, warm events described by NLPCA mode 1 display the strongest anomalies near the Peruvian coast, whereas the cold events are strongest near 150°W. This asymmetry in the evolution of NLPCA mode 1 arises because NLPCA mode 1 mixes PCA modes 1 and 2; for both El Niño and La Niña events, the PCA mode 2 spatial map (Fig. 2b) enters into the NLPCA mode 1 approximation with the same (negative) sign.

This spatial asymmetry between average El Niño and average La Niña events is manifest in a composite study. Figure 6a is a composite of November–December–January (NDJ) averaged SSTA for those years in which the NDJ Niño-3.4 index is greater than one standard deviation above the long-term mean; Figure 6b is the same for those NDJ for which Niño-3.4 is less than one standard deviation below the long-term mean. This averaging period was used for the composites because NDJ displays the largest variance of all 3-month seasons. These two maps correspond to the SSTA patterns of an average El Niño and an average La Niña, respectively. Note that, consistent with the maps corresponding to the 1D NLPCA approximation (Fig. 5), the largest SST anomalies tend to be located in the central Pacific during average La Niña events and centered in the eastern Pacific during average El Niño events. This asymmetry in the composite fields was previously noted by Hoerling et al. (1997).

The symmetric component of the composite El Niño and La Niña maps, as defined in appendix B, is displayed in Fig. 6c. This map, which in a rough sense characterizes the pattern in the composites that is related nonlinearly to the Niño-3.4 time series, bears a strong resemblance to SST EOF mode 2 (Fig. 2b). In fact, the spatial correlation between the two maps is −0.90. The antisymmetric component of the composite (not shown) bears a strong resemblance to EOF 1; the pattern correlation between these two is 0.975. Thus, PCA mode 1 may be interpreted as characterizing the component of average ENSO behavior that is antisymmetric between El Niño and La Niña events. By mixing EOF modes 1 and 2, NLPCA mode 1 is able to characterize the spatial asymmetry between average El Niño and average La Niña events. The bias of SST toward warm anomalies in the eastern part of the basin and toward cold anomalies in the western part is also evident in the study of Burgers and Stephenson (1999), who calculate the skewness of the observed SSTA distribution. It is interesting to note the striking similarity between their map of the spatial distribution of skewness (their Fig. 3a) and the symmetric component of the SSTA composite displayed in Fig. 6c.

A final comparison of the 1D NLPCA and 1D PCA approximations is given in Figs. 7a and 7b, which show, respectively, maps of the pointwise correlation between the original SSTA data and the 1D PCA approximation, and of the pointwise correlation between SSTA and the 1D NLPCA approximation. The two approximations are equally well correlated with the original data over the eastern-central half of the Pacific Ocean, except near the Ecuadorian coast, where the NLPCA correlations are somewhat higher than those of the PCA approximation. In the western Pacific, and in particular in the neighborhood of the EOF mode 1 zero line, the 1D NLPCA approximation displays a greater fidelity to the original data, as measured by the pointwise correlation, than does the 1D PCA approximation.

We now consider SSTA NLPCA mode 2, which was calculated using a network containing L = 3 neurons in the encoding and decoding layers. Figure 8 displays mode 2 projected onto the subspaces spanned by SSTA EOFs (e1, e2), (e2, e3), (e1, e3), and (e1, e2, e3). The five candidate models from an ensemble of 20 trials differed from each other with an NMSD that was always less than 4%. NLPCA mode 2 explains 11.1% of the variance in the original SSTA data. The associated standardized time series, α2(tn), is shown in Fig. 9. Interestingly, the correlation coefficient between α1(tn) and α2(tn) is −0.06; the two time series are essentially uncorrelated. Figure 10 displays maps of SSTA NLPCA mode 2, X̂(2), corresponding to α2 = −4, −1, −0.1, 0, 0.1, 0.2, 0.3, 0.4, 1. These values of α2 were selected to produce a representative sample of maps describing NLPCA mode 2. When α2 is strongly negative, SSTA NLPCA mode 2 is characterized by negative anomalies in the central and western Pacific and positive anomalies in the eastern Pacific. As α2 increases through zero, the anomalies decrease in magnitude, while the positive anomalies in the eastern part of the basin become increasingly concentrated in the equatorial region. Eventually, the region of positive SSTA breaks off from the coast of South America and migrates into the central Pacific. As α2 increases further, the SSTA pattern becomes the opposite of that for α2 near zero, with positive anomalies in the central and western Pacific and negative anomalies in the east. Finally, for α2 near the extreme positive part of its range, SSTA NLPCA mode 2 is characterized by negative anomalies along the equator, extending to the date line, with positive anomalies throughout the rest of the basin. Because the anomalies are often concentrated along the equator, it is reasonable to associate this mode with some aspects of ENSO variability not captured by mode 1. Indeed, it is interesting to note from Fig. 9 that NLPCA mode 2 is more active in the later part of the record than in the earlier. The two strong minima in α2(tn) coincide with the decay phases of the large El Niño events of 1982–83 and 1997–98, describing the lingering patches of warm water in the eastern tropical Pacific observed during these periods. Two of the three weaker minima in the post-1977 period are associated with the peaks of the La Niña events of 1984–85 and 1988–89. This is consistent with the fact that the cold anomalies during La Niñas in this later period are somewhat stronger and more concentrated in the central Pacific Ocean, and weaker in the eastern Pacific Ocean, than during La Niña events in the earlier part of the record, as indicated by a composite analysis (not shown).

A number of studies have noted a shift in ENSO variability in 1977 (e.g., Wang 1995). The apparent nonstationarity of α2(tn) is consistent with a shift at this time, although the precise timing of the shift in α2(tn) is not obvious. The 1977 shift is, in fact, manifest in the SSTA NLPCA mode 1 time series α1 (Fig. 4); the time series is biased toward negative extrema before 1977 and toward positive extrema after 1977. However, it should be noted that this nonstationarity in variance may simply reflect the inclusion, from November 1981 onwards, of satellite data in the Smith et al. dataset (Smith et al. 1996).

Thus, the first mode of the modal NLPCA decomposition of tropical Pacific SSTA describes the average variability associated with the ENSO phenomenon and nicely characterizes the asymmetry in spatial structure between average El Niño and average La Niña events. The second mode characterizes some differences in evolution between individual ENSO events, and in particular, displays a nonstationarity consistent with the observed 1977 “regime shift” in ENSO variability.

Plots of a 2D nonmodal NLPCA approximation of the SSTA data (i.e., using two neurons in the bottleneck layer), projected in the subspaces spanned by SSTA EOFs (e1, e2), (e2, e3), (e1, e3), and (e1, e2, e3), are shown in Figs. 11a–d. The associated network used L = 6 neurons in the encoding and decoding layers, and the NMSD between candidate models (8 out of an ensemble of 20) varied between 1% and 3%. The 2D nonmodal NLPCA approximation explains 79.0% of the variance in the truncated dataset, and thus 72.2% of the variance of the original data. The time series corresponding to the output of the bottleneck layer (not shown), denoted (β1, β2)(tn) = sf[X(tn)], are highly correlated with each other (r = −0.835) and with the Niño-3.4 index (r1 = −0.879 and r2 = 0.889, respectively). Because the 2D nonmodal NLPCA depends on two parameters, β1 and β2, it is difficult to visualize the results using a sequence of maps as we did with the modal NLPCA in Figs. 5 and 10. Figures 12a and 12b display maps of the pointwise correlation coefficient between the SSTA data and the 2D PCA and 2D nonmodal NLPCA approximations, respectively. The 2D nonmodal NLPCA approximation produces higher correlations than the 2D PCA approximation in the central equatorial, western, and southeastern Pacific, and slightly lower correlations in the eastern equatorial Pacific.

It is worth considering the time series β1(tn) and β2(tn) in more detail. As was pointed out by Malthouse (1998), the parameterization sf[X(tn)] of the P-dimensional surface determined by NLPCA is only defined up to an arbitrary homeomorphism (i.e., a continuous, one-to-one, and onto function with continuous inverse). That is, for an arbitrary homeomorphism g: ℜP → ℜP, the time series g{sf[X(tn)]} is also an acceptable parameterization of the surface, because f ∘ sf = (f ∘ g−1) ∘ (g ∘ sf). In particular, for any homeomorphism g: ℜ2 → ℜ2, [g1(tn), g2(tn)] = g[β1(tn), β2(tn)] parameterizes the surface found by 2D nonmodal NLPCA. Which parameterization is determined by the NLPCA algorithm is a matter of chance. This degeneracy complicates the interpretation of the time series determined by nonmodal NLPCA. In particular, the time series βi(tn) may not be independent; they may not even be uncorrelated.

Determining a set of P independent variables γi parameterizing the surface from the set of P time series βi(tn) determined empirically by NLPCA is a problem of feature extraction in the space of the variables parameterizing the surface. Thus, PCA or modal NLPCA can be used to calculate the γi(tn). In the case at hand, inspection of the scatterplot of β1(tn) with β2(tn) (not shown) indicated that PCA was appropriate for separation of the correlated time series β1(tn) and β2(tn) into two uncorrelated time series γ1(tn) and γ2(tn). The homeomorphism g is thus simply a linear function. The first PCA mode explained 92.7% of the variance in β space, and the associated time series (not shown) describes ENSO variability. Its correlation coefficient with the Niño-3.4 time series is 0.92 and with α1(tn) is 0.87. The second PCA mode explains the remaining 7.3% of the variance in β space, and the associated time series γ2(tn) (not shown) is rather similar to α2(tn); the two time series have a correlation of 0.7. In particular γ2(tn) displays the same difference in activity between the pre-1977 and the post-1977 periods as does α2(tn), with the same prominent peaks appearing in both time series. The parameterization [β1(tn), β2(tn)] thus contains essentially the same information as the two time series α1(tn) and α2(tn). This extra step of processing required to allow interpretation of the time series produced by nonmodal NLPCA indicates a distinct disadvantage of nonmodal NLPCA as compared to modal NLPCA.
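The rotation from correlated parameters βi to uncorrelated parameters γi is an ordinary PCA in the 2D parameter space, i.e., a linear choice of the homeomorphism g. A sketch with synthetic stand-ins for β1(tn) and β2(tn):

```python
import numpy as np

# Two correlated bottleneck time series (synthetic stand-ins for
# beta_1 and beta_2), rotated into uncorrelated series gamma_1 and
# gamma_2 by a PCA in the 2D parameter space.
rng = np.random.default_rng(4)
base = rng.normal(size=1000)
beta = np.stack([base + 0.1 * rng.normal(size=1000),
                 -0.8 * base + 0.3 * rng.normal(size=1000)], axis=1)
beta -= beta.mean(axis=0)

_, _, Vt = np.linalg.svd(beta, full_matrices=False)
gamma = beta @ Vt.T               # PCA-rotated parameterization

c = np.corrcoef(gamma.T)[0, 1]    # zero (to rounding) by construction
```

Here the homeomorphism g is simply the orthogonal matrix of EOFs of the β time series.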

The 2D modal NLPCA approximation to the SSTA data (not shown), obtained by adding the leading two 1D NLPCA approximations, strongly resembles the 2D nonmodal NLPCA approximation displayed in Fig. 11. Because of the variance-partitioning property of NLPCA, the fraction of variance explained by this approximation is the sum of the fractions explained by the two 1D modal approximations, that is, 74.4% (slightly greater than that obtained with the 2D nonmodal approximation). This approximation differs in detail from the nonmodal approximation displayed in Fig. 11, but the two agree broadly in their general features. Figure 12c displays a map of the pointwise correlation coefficient between the 2D modal NLPCA approximation and the original SSTA data; correlations are somewhat higher than those of the 2D nonmodal approximation in the western Pacific Ocean and somewhat lower in the eastern equatorial Pacific, but by and large the differences between the two correlation maps are small.

Note that, while in principle we would expect a 2D nonmodal NLPCA approximation to be better able to characterize general low-dimensional structure than a 2D modal NLPCA approximation, in the case of SSTA we find that the former explains 72.2% of the variance, while the latter explains 74.4%. In fact, the model corresponding to the modal NLPCA had 13% more free parameters than that corresponding to the nonmodal NLPCA. We suspect that this allowed the modal model more flexibility than the nonmodal, leading to the slight improvement in the fraction of variance explained.

Thus, we have seen that in both 1D and 2D, and for both modal and nonmodal approaches, NLPCA produces approximations of greater fidelity to the tropical Pacific SSTA data than does PCA. In particular, both the 1D PCA and 1D NLPCA approximations describe “average” ENSO variability, but the 1D modal NLPCA approximation was able to characterize the spatial asymmetry between average El Niño and La Niña events in a fashion that 1D PCA cannot. Note that there are considerable differences between the structure and evolution of individual El Niño and La Niña events; no claim is made that any of the 1D or 2D SSTA NLPCA approximations captures the full range of this variability. Indeed, Penland and Sardeshmukh (1995) find that an embedding space of 15 PCA modes is needed for their linear inverse model SSTA forecasts. As an N-dimensional manifold requires a Cartesian embedding space of at most 2N + 1 dimensions to prevent spurious self-intersections, it would seem that a manifold of dimension at least 7 is required to capture all aspects of ENSO variability. This study presents low-dimensional estimates of this presumed ENSO attractor.

5. Tropical Indo–Pacific sea level pressure

As a second application of NLPCA to a dataset of climatological significance, we consider the Comprehensive Ocean–Atmosphere Data Set monthly averaged sea level pressure (SLP) over the tropical Indo–Pacific area (Woodruff et al. 1987), on a 2° × 2° grid from 27°S to 19°N and from 31°E to 67°W, covering the period from January 1950 to December 1998. The annual cycle was removed in the same fashion as for the SST data to produce sea level pressure anomalies (SLPAs). The SLPA field was then smoothed in time using a 3-month running mean filter and in space with a 1–2–1 filter in each spatial direction. The resulting smoothed SLPA field was then projected onto the space spanned by its 10 leading EOF modes, which together explain 60% of the total variance in the data.
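The preprocessing chain described above (anomalies, a 3-month running mean, a 1–2–1 spatial filter, truncation to the leading EOFs) can be sketched as follows; the helper names and the use of an SVD for the EOF truncation are illustrative choices, not the paper's actual code:

```python
import numpy as np

def remove_annual_cycle(x):
    """Subtract the long-term mean of each calendar month at every grid point.

    x: array of shape (n_months, n_points); months assumed to start in January.
    """
    anom = x.copy()
    for m in range(12):
        anom[m::12] -= x[m::12].mean(axis=0)
    return anom

def running_mean_3(x):
    """3-month running mean along the time axis (interior points only)."""
    return (x[:-2] + x[1:-1] + x[2:]) / 3.0

def smooth_121(field):
    """One pass of a 1-2-1 filter along a spatial axis (interior points)."""
    return 0.25 * field[:, :-2] + 0.5 * field[:, 1:-1] + 0.25 * field[:, 2:]

def project_onto_leading_eofs(anom, k=10):
    """Truncate the anomaly field to its k leading EOFs via the SVD.

    Returns the rank-k reconstruction and the fraction of variance retained.
    """
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    recon = u[:, :k] @ np.diag(s[:k]) @ vt[:k]
    frac = (s[:k] ** 2).sum() / (s ** 2).sum()
    return recon, frac
```

In a full 2D application the 1–2–1 filter would be applied once along each spatial direction; the sketch shows a single pass along one axis.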

Figure 13 displays the 1D NLPCA approximation of the SLPA data projected on the subspaces spanned by SLPA EOFs (e1, e2), (e2, e3), (e1, e3), and (e1, e2, e3). This curve was obtained using a network with L = 2 neurons in the encoding and decoding layers; the NMSD between the eight candidate models from an ensemble of 50 ranged between 0.1% and 0.3%. The 1D NLPCA approximation explains 27.0% of the total variance in the SLPA data, only a slight improvement over the 24.2% explained by the 1D PCA approximation. Figure 14 displays plots of α1(tn), the standardized time series associated with the 1D NLPCA approximation, and the Southern Oscillation index (SOI), calculated by subtracting the SLPA at Darwin, Australia (12°S, 131°E) from that at Tahiti (17°S, 149°W) and then applying a 3-month running average smoother. The two time series bear a strong resemblance to each other on interannual and longer timescales; their correlation coefficient is 0.7. The 1D modal NLPCA approximation thus seems to describe ENSO variability in the SLPA field. This association is reinforced by inspection of the sequence of maps of (1) for α1 = −3, −2, −1, −0.5, 0, 0.5, 1, 2 (Fig. 15). This sequence of spatial patterns describes the east–west SLPA dipole associated with average Southern Oscillation variability. Figure 16 displays composites of SLPA averaged over those December–January–February (DJF) periods in which the SOI was more than 1 standard deviation below the long-term average (Fig. 16a) or more than 1 standard deviation above it (Fig. 16b). This 3-month period was selected because, of all 3-month seasons in the record, it displays the greatest variance. Figures 16a and 16b represent average El Niño and La Niña patterns, respectively. Clearly, the maps in Fig. 15 for α1 < 0 correspond to the El Niño composite and those for α1 > 0 correspond to the La Niña composite. Comparison of Figs. 15 and 16 indicates that the 1D NLPCA approximation characterizes an asymmetry between average El Niño and average La Niña events, particularly in the eastern half of the domain. As was the case with SSTA, maps of pointwise correlation between the data and the 1D PCA and 1D NLPCA approximations (not shown) indicate that, particularly around the nodal line of the 1D PCA approximation, the 1D NLPCA approximation displays a higher degree of fidelity to the original data.
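The SOI construction described above can be sketched as below; the final standardization step is an assumption based on the reference to the "standardized SOI" in the caption of Fig. 14:

```python
import numpy as np

def soi_from_slpa(slpa_tahiti, slpa_darwin):
    """Sketch of the SOI construction used in the text: Tahiti SLPA minus
    Darwin SLPA, then a 3-month running mean, then (assumed) standardization
    to zero mean and unit variance."""
    raw = np.asarray(slpa_tahiti, float) - np.asarray(slpa_darwin, float)
    smoothed = (raw[:-2] + raw[1:-1] + raw[2:]) / 3.0   # 3-month running mean
    return (smoothed - smoothed.mean()) / smoothed.std()
```

The running mean shortens the series by two months, one at each end.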

In calculating the second mode of the modal NLPCA decomposition of the SLPA data, it was determined that robust results could be obtained only with L = 1 neuron in the encoding and decoding layers. Neural network–based NLPCA can find nonlinear structure only if there are two or more neurons in the encoding and decoding layers; in other words, the optimal 1D characterization of the residual data, obtained by subtracting the 1D NLPCA approximation from the original SLPA data, is a straight line. The spatial pattern and associated time series of this mode (not shown), which explains 15.9% of the variance of the original dataset, strongly resemble those of SLPA PCA mode 2: the correlation between the two time series is 0.96, and the pattern correlation between the two spatial patterns is 0.93. Thus, with the dataset available, no robust lower-dimensional nonlinear structures could be found in these residuals. Furthermore, a 2D nonmodal NLPCA of the SLPA data did not yield particularly interesting results.
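The modal procedure described here (fit a 1D approximation, subtract it, and fit the next mode to the residual) reduces to successive leading PCA modes when only a single neuron is used, as was found for the SLPA residual. A minimal sketch with synthetic data:

```python
import numpy as np

def leading_pca_mode(x):
    """Best straight-line (1D PCA) approximation of a centered data matrix
    x of shape (n, p); this is what modal NLPCA reduces to when only one
    neuron is used in the encoding and decoding layers."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    approx = np.outer(u[:, 0] * s[0], vt[0])     # rank-1 approximation
    frac = s[0] ** 2 / (s ** 2).sum()            # fraction of variance explained
    return approx, frac

# Modal decomposition sketch: fit mode 1, subtract it, fit mode 2 to the residual.
rng = np.random.default_rng(2)
data = rng.standard_normal((200, 30))
data -= data.mean(axis=0)                        # center in time

mode1, frac1 = leading_pca_mode(data)
residual = data - mode1
mode2, _ = leading_pca_mode(residual)

# Fractions of the *original* variance explained by successive modes add,
# mirroring the variance-partitioning property discussed in the text.
frac2 = (mode2 ** 2).sum() / (data ** 2).sum()
```

In the full method the rank-1 PCA step is replaced by a 1D NLPCA fit; the subtract-and-refit structure is the same.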

We conclude that, apart from a weakly nonlinear 1D NLPCA approximation corresponding to ENSO variability and characterizing a slight asymmetry between average El Niño and average La Niña events, the robust low-dimensional structure of this SLPA data is linear.

6. Summary and conclusions

Traditional PCA is a powerful tool for the detection and characterization of low-dimensional linear structure in multivariate datasets. However, when a dataset contains substantial nonlinear low-dimensional structure, the results of PCA are suboptimal. We have investigated the application of a nonlinear generalization of PCA, denoted NLPCA, to two datasets of climatic significance: tropical Pacific SST and tropical Indo–Pacific SLP.

Considering first the SST data, it was found that a 1D NLPCA approximation, which we denote NLPCA mode 1, explains 63.3% of the total variance in the SST field, in contrast to 57.6% for the first PCA mode. That this 1D approximation describes average ENSO variability is clear upon inspection of both the time series α1(tn) (Fig. 4a) and the sequence of spatial maps (Fig. 5) characterizing the approximation. PCA mode 1 also characterizes average ENSO variability, but only its linear component, so it is unable to describe the asymmetry in spatial pattern between average warm and cold events manifest in the 1D NLPCA approximation and in a composite analysis. NLPCA improves on PCA by allowing low-dimensional approximations to have a structure other than that of simple standing oscillations. While both the 1D NLPCA approximation and the composite analysis describe the asymmetry between average warm and cold ENSO events, NLPCA has the advantages that it does not require the a priori specification of a time series over which to composite and that it provides a full 1D approximation to the data, in contrast to the 0D approximation produced by the composite analysis. SSTA NLPCA mode 2 explains 11.1% of the SSTA variance and characterizes some aspects of ENSO variability not described by SSTA NLPCA mode 1. In particular, a striking temporal nonstationarity in the variability of this mode is consistent with the difference in variability between pre-1977 and post-1977 La Niña events discussed in a number of recent studies (e.g., Wang 1995). A 2D nonmodal NLPCA approximation is found to explain 72.2% of the total SSTA variance, in contrast to 68.5% for a 2D PCA approximation and 74.4% for a modal 2D NLPCA approximation. It is difficult to visualize a 2D nonmodal NLPCA approximation, but pointwise correlation maps comparing the spatial performance of the 2D PCA approximation, the 2D nonmodal NLPCA approximation, and the 2D modal NLPCA approximation (Fig. 12) indicate that the two nonlinear approximations are not very different from each other, and that both characterize variability in the western part of the Pacific basin better than the linear approximation does. However, because the time series of the variables parameterizing the surface produced by nonmodal NLPCA are determined only up to an arbitrary homeomorphism, complicating their interpretation, a secondary feature extraction step using PCA had to be carried out to recover from the nonmodal analysis the same information present in the time series produced by a modal analysis. This is a significant disadvantage of nonmodal NLPCA as compared to modal NLPCA. Taken together, the results presented in section 4 demonstrate that the tropical Pacific SST dataset considered contains nonlinear lower-dimensional structure that can robustly be characterized by NLPCA.

In section 5, it was seen that the tropical Indo–Pacific SLPA dataset is characterized primarily by linear low-dimensional structure. A 1D NLPCA approximation was found to explain 27% of the total variance, in contrast to 24.2% for the 1D PCA approximation. This mode describes average ENSO variability in the SLPA field and characterizes the asymmetry in spatial pattern between average El Niño and average La Niña events. No robust nonlinear second NLPCA mode could be found, indicating that the SLPA data are dominated by linear structure.

This analysis has been a preliminary investigation of the application of NLPCA to climatic datasets. Because of its ease of implementation, early stopping with an ensemble of models was used to ensure the robustness of the results of NLPCA to the introduction of new data and to the initial parameter values used in network training. Other, more sophisticated, regularization techniques involving the modification of the cost function can be implemented to automate the determination of model parameters such as L (the number of neurons in the encoding and decoding layers) and to eliminate the necessity of considering an ensemble of candidate models (Yuval 2000). Such modifications of the model building procedure will presumably be implemented in future applications of NLPCA to the analysis of climate data.
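Early stopping with an ensemble of candidate models, as used throughout this study, can be sketched in a simplified setting. This toy version trains a linear model by full-batch gradient descent (the networks in the text are nonlinear), halting when the validation error stops improving, and builds an ensemble from several random splits and initializations:

```python
import numpy as np

def train_with_early_stopping(X, y, lr=0.1, max_epochs=500, patience=10, seed=0):
    """Toy early-stopping loop: gradient descent on a linear least-squares
    model, halting when the validation error has not improved for `patience`
    consecutive epochs. (Illustrative only, not the paper's network code.)"""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    split = int(0.8 * n)
    tr, va = idx[:split], idx[split:]          # random train/validation split
    w = 0.1 * rng.standard_normal(X.shape[1])  # random initial parameters

    best_w, best_err, stale = w.copy(), np.inf, 0
    for _ in range(max_epochs):
        grad = 2.0 * X[tr].T @ (X[tr] @ w - y[tr]) / len(tr)
        w -= lr * grad
        err = np.mean((X[va] @ w - y[va]) ** 2)
        if err < best_err:
            best_w, best_err, stale = w.copy(), err, 0
        else:
            stale += 1
            if stale >= patience:
                break                          # early stop
    return best_w, best_err

# Ensemble of candidate models from different random splits/initializations;
# the spread among their fits plays the role of the NMSD check in the text.
X = np.column_stack([np.linspace(-1.0, 1.0, 200), np.ones(200)])
y = 3.0 * X[:, 0] + 0.5 + 0.05 * np.random.default_rng(3).standard_normal(200)
candidates = [train_with_early_stopping(X, y, seed=s) for s in range(5)]
```

If the candidate fits agree closely, the result is deemed robust to the choice of training data and initial parameters.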

The operation of feature extraction plays a significant role in the statistical analysis of climatic datasets. Generally, there is no a priori reason to expect that any low-dimensional structure underlying a multivariate dataset should be linear, and thus no reason to expect traditional PCA to provide an optimal characterization of this lower-dimensional structure. NLPCA provides a natural generalization of PCA to the nonlinear feature extraction problem. As we have demonstrated using tropical Pacific SST anomaly data, NLPCA can provide insight into the structure of a dataset that PCA cannot. Because of its complexity, we do not expect NLPCA to replace simpler traditional approaches such as PCA, but rather to serve as another useful method in the climate statistician's toolbox.

Acknowledgments

The author would like to acknowledge helpful comments by Lionel Pandolfo, William W. Hsieh, Benyang Tang, Yuval Zudman, and Francis Zwiers. In particular, I would like to thank Dr. Tang for helping prepare the SST and SLP datasets. As well, I gratefully acknowledge the insightful comments provided by two anonymous reviewers, which substantially improved this manuscript. This work was funded by a University of British Columbia University Graduate Fellowship and by the Crisis Points Group of the Peter Wall Institute for Advanced Studies.

REFERENCES

  • Baldi, P., and K. Hornik, 1989: Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2, 53–58.

  • Barnston, A. G., 1994: Linear statistical short-term climate predictive skill in the Northern Hemisphere. J. Climate, 7, 1513–1564.

  • ——, and R. E. Livezey, 1987: Classification, seasonality, and persistence of low-frequency atmospheric circulation patterns. Mon. Wea. Rev., 115, 1083–1126.

  • Bishop, C. M., 1995: Neural Networks for Pattern Recognition. Clarendon Press, 482 pp.

  • Burgers, G., and D. B. Stephenson, 1999: The “normality” of El Niño. Geophys. Res. Lett., 26, 1027–1030.

  • Cybenko, G., 1989: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst., 2, 303–314.

  • Finnoff, W., F. Hergert, and H. G. Zimmermann, 1993: Improving model selection by nonconvergent methods. Neural Networks, 6, 771–783.

  • Hoerling, M. P., A. Kumar, and M. Zhong, 1997: El Niño, La Niña, and the nonlinearity of their teleconnections. J. Climate, 10, 1769–1786.

  • Hsieh, W. W., and B. Tang, 1998: Applying neural network models to prediction and data analysis in meteorology and oceanography. Bull. Amer. Meteor. Soc., 79, 1855–1870.

  • Kramer, M. A., 1991: Nonlinear principal component analysis using autoassociative neural networks. AIChE J., 37, 233–243.

  • Malthouse, E. C., 1998: Limitations of nonlinear PCA as performed with generic neural networks. IEEE Trans. Neural Networks, 9, 165–173.

  • Monahan, A. H., 2000a: Nonlinear principal component analysis by neural networks: Theory and application to the Lorenz system. J. Climate, 13, 821–835.

  • ——, 2000b: Nonlinear principal component analysis of climate data. Ph.D. dissertation, University of British Columbia, 157 pp. [Available from Dept. of Earth and Ocean Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.]

  • Penland, C., and P. D. Sardeshmukh, 1995: The optimal growth of tropical sea surface temperature anomalies. J. Climate, 8, 1999–2024.

  • Sengupta, S. K., and J. S. Boyle, 1995: Nonlinear principal component analysis of climate data. PCMDI Tech. Rep. 29, 21 pp. [Available from Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, University of California, Livermore, CA 94550.]

  • Smith, T. M., R. W. Reynolds, R. E. Livezey, and D. C. Stokes, 1996: Reconstruction of historical sea surface temperatures using empirical orthogonal functions. J. Climate, 9, 1403–1420.

  • Tangang, F. T., B. Tang, A. H. Monahan, and W. W. Hsieh, 1998: Forecasting ENSO events: A neural network-extended EOF approach. J. Climate, 11, 29–41.

  • von Storch, H., and F. W. Zwiers, 1999: Statistical Analysis in Climate Research. Cambridge University Press, 494 pp.

  • Wang, B., 1995: Interdecadal changes in El Niño onset in the last four decades. J. Climate, 8, 267–285.

  • Woodruff, S. D., R. J. Slutz, R. L. Jenne, and P. M. Steurer, 1987: A Comprehensive Ocean–Atmosphere Data Set. Bull. Amer. Meteor. Soc., 68, 1239–1250.

  • Yuval, 2000: Neural network training for prediction of climatological time series, regularized by minimization of the generalized cross-validation function. Mon. Wea. Rev., 128, 1456–1473.

APPENDIX A

Neural Networks

As is described in detail by Bishop (1995) and Hsieh and Tang (1998), a feed-forward neural network is a nonparametric statistical model composed of a series of parallel layers, each of which contains a number of processing elements, or neurons, such that the output of the ith layer is used as input to the (i + 1)th. If y_j^{(i)} is the output of the jth neuron of the ith layer, then

y_k^{(i+1)} = \sigma^{(i+1)} \Bigl( \sum_j w_{jk}^{(i+1)} y_j^{(i)} + b_k^{(i+1)} \Bigr)   (A1)

is the output of the kth neuron of the (i + 1)th layer. The elements of the arrays w_{jk}^{(i+1)} are referred to as the weights, and those of the vectors b_k^{(i+1)} as the biases. The transfer function characterizing the (i + 1)th layer is denoted \sigma^{(i+1)}; it may be linear or nonlinear. The first, or input, layer receives the values of the data presented to the network; its transfer function is simply the identity map \sigma_I: x \mapsto x. The famous flexibility of neural networks comes from the use of nonlinear transfer functions (typically hyperbolic tangents) in some or all of the remaining layers. An important result, due to Cybenko (1989), is that a three-layer neural network with S input neurons, hyperbolic tangent transfer functions in the second layer, and linear transfer functions in the third layer of T neurons can approximate to arbitrary accuracy any continuous function from R^S to R^T, provided the number of neurons in the second layer is sufficiently large.
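Equation (A1) translates directly into code. A minimal sketch (the array shapes and function names are illustrative):

```python
import numpy as np

def layer_output(y_prev, w, b, transfer=np.tanh):
    """Output of layer i+1 from the outputs y_prev of layer i, following (A1):
    y_k = sigma( sum_j w_jk * y_j + b_k ). Shapes: y_prev (n, J), w (J, K), b (K,)."""
    return transfer(y_prev @ w + b)

def feed_forward(x, layers):
    """Pass input x through a list of (w, b, transfer) triples, one triple
    per layer after the identity-map input layer."""
    y = x
    for w, b, transfer in layers:
        y = layer_output(y, w, b, transfer)
    return y

def identity(z):
    """Linear transfer function, as used in an output layer."""
    return z
```

A three-layer network in the sense of Cybenko (1989) is then a tanh hidden layer followed by a linear output layer.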

APPENDIX B

Symmetric and Antisymmetric Components of Composites

We consider a spatial field Y(t_n), n = 1, . . . , N, which is composited using a time series X(t_n), as follows. Two subsets of time, {t_n^{(+)}} and {t_n^{(−)}}, are defined by

\{t_n^{(+)}\} = \{t_n : X(t_n) > c\},   (B1)
\{t_n^{(-)}\} = \{t_n : X(t_n) < -c\},   (B2)

where c is some threshold; in our case, it is one standard deviation of X(t_n). The positive and negative composites of Y(t_n), Y^{(+)} and Y^{(−)}, are simply defined as the respective averages over {t_n^{(+)}} and {t_n^{(−)}}:

Y^{(+)} = \langle Y \rangle_+,   (B3)
Y^{(-)} = \langle Y \rangle_-,   (B4)

where \langle \cdot \rangle_+ and \langle \cdot \rangle_- denote averages over {t_n^{(+)}} and {t_n^{(−)}}, respectively.
Maps of Y^{(+)} and Y^{(−)}, where Y(t_n) is SSTA and X(t_n) is the NDJ-averaged Niño-3.4 index, are shown in Figs. 6a and 6b, respectively. In general, the spatial patterns of Y^{(+)} and Y^{(−)} differ by more than a sign.
We want to determine the symmetric and antisymmetric [under a change of sign in X(t_n)] components of Y^{(+)} and Y^{(−)}. To address this question, we assume the minimal nonlinear model for the dependence of Y(t_n) on X(t_n):

Y(t_n) = a^{(0)} + a^{(1)} X(t_n) + a^{(2)} X^2(t_n) + \epsilon_n,   (B5)

where ε_n is a vector noise process, assumed to satisfy

\langle \epsilon \rangle = \langle \epsilon \rangle_+ = \langle \epsilon \rangle_- = 0.   (B6)

The validity of this approximation depends both on the validity of the model (B5) and on the length of the records {t_n}, {t_n^{(+)}}, and {t_n^{(−)}}. We can assume that both Y(t_n) and X(t_n) are zero-centered in time. This implies that

0 = a^{(0)} + a^{(2)} \langle X^2 \rangle,   (B7)

and so we can rewrite our model as

Y(t_n) = a^{(1)} X(t_n) + a^{(2)} [X^2(t_n) - \langle X^2 \rangle] + \epsilon_n.   (B8)

The vector a^{(1)} is the field pattern antisymmetric under a change of sign in X, while a^{(2)} is the pattern symmetric under such a change of sign. They will be referred to, respectively, as the antisymmetric and symmetric components of the composite.
Clearly, by the definition of the composite maps,

Y^{(+)} = a^{(1)} \langle X \rangle_+ + a^{(2)} (\langle X^2 \rangle_+ - \langle X^2 \rangle),   (B9)
Y^{(-)} = a^{(1)} \langle X \rangle_- + a^{(2)} (\langle X^2 \rangle_- - \langle X^2 \rangle).   (B10)

This is a linear system that can easily be solved to yield

a^{(1)} = [ (\langle X^2 \rangle_- - \langle X^2 \rangle) Y^{(+)} - (\langle X^2 \rangle_+ - \langle X^2 \rangle) Y^{(-)} ] / D,   (B11)
a^{(2)} = [ \langle X \rangle_+ Y^{(-)} - \langle X \rangle_- Y^{(+)} ] / D,   (B12)

where

D = \langle X \rangle_+ (\langle X^2 \rangle_- - \langle X^2 \rangle) - \langle X \rangle_- (\langle X^2 \rangle_+ - \langle X^2 \rangle).   (B13)

Figure 6c displays a^{(2)} for tropical Pacific SSTA composited according to the Niño-3.4 index. A map of a^{(1)} (not shown) looks very much like SSTA EOF mode 1 (Fig. 2a); the spatial correlation between these is 0.975.
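The decomposition of appendix B is straightforward to implement: form the composites, then solve the 2 × 2 linear system (B9)–(B10) at every grid point. A sketch, with the threshold defaulting to one standard deviation of X as in the text:

```python
import numpy as np

def composite_decomposition(X, Y, c=None):
    """Estimate the antisymmetric (a1) and symmetric (a2) composite
    components by solving (B9)-(B10) at every grid point.

    X: (n,) compositing index; Y: (n, p) field; c: threshold (default 1 std).
    """
    X = np.asarray(X, float)
    Y = np.asarray(Y, float)
    if c is None:
        c = X.std()
    pos, neg = X > c, X < -c

    Yp, Yn = Y[pos].mean(axis=0), Y[neg].mean(axis=0)   # composites Y(+), Y(-)
    xp, xn = X[pos].mean(), X[neg].mean()               # <X>_+, <X>_-
    x2 = (X ** 2).mean()                                # <X^2>
    x2p, x2n = (X[pos] ** 2).mean(), (X[neg] ** 2).mean()

    # 2x2 system shared by all grid points:
    #   Yp = a1*xp + a2*(x2p - x2)
    #   Yn = a1*xn + a2*(x2n - x2)
    A = np.array([[xp, x2p - x2],
                  [xn, x2n - x2]])
    coeffs = np.linalg.solve(A, np.vstack([Yp, Yn]))    # shape (2, p)
    return coeffs[0], coeffs[1]                         # a1, a2
```

When Y is generated exactly by the model (B8), the procedure recovers a^{(1)} and a^{(2)} exactly; with noise, it recovers them up to the sampling error discussed in the text.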

Hoerling et al. (1997) considered the linear combinations Y^{(+)} − Y^{(−)} and Y^{(+)} + Y^{(−)} and denoted them the linear and nonlinear responses of Y to X, respectively. The above analysis shows this identification is appropriate only in the special case that \langle X \rangle_- = -\langle X \rangle_+ and \langle X^2 \rangle_- = \langle X^2 \rangle_+. This is certainly not true in general, although for the case they considered, in which X(t_n) was an SST index similar to Niño-3.4, it is a fairly good approximation.

In principle, one could use the technique described above to fit the more general model

Y(t_n) = \sum_{k=0}^{K} a^{(k)} X^k(t_n) + \epsilon_n   (B14)

by stratifying the data into K + 1 subsets. Presumably, however, as K increases, so does the sampling variability associated with the decreasing validity of approximations (B6).

Fig. 1. The five-layer feed-forward neural network used to perform NLPCA.

Citation: Journal of Climate 14, 2; 10.1175/1520-0442(2001)013<0219:NPCATI>2.0.CO;2

Fig. 2. Spatial patterns of the first three SSTA EOF patterns, normalized to unit magnitude. The contour interval is 0.02, the bold contour is the zero line, and negative contours are dashed.

Fig. 3. Scatterplot of SSTA data (points) and SSTA NLPCA mode 1 approximation (open circles) projected onto the planes spanned by (a) e1 and e2, (b) e2 and e3, and (c) e1 and e3. (d) A scatterplot of the 1D NLPCA approximation projected into the subspace spanned by e1, e2, and e3.

Fig. 4. Plot of α1(tn), the time series associated with SSTA NLPCA mode 1 (thick line), and the normalized Niño-3.4 index (thin line).

Fig. 5. Sequence of spatial maps characterizing SSTA NLPCA mode 1 for (a) α1 = −3.5, (b) α1 = −1.5, (c) α1 = −0.75, (d) α1 = −0.25, (e) α1 = 0.25, (f) α1 = 0.75, (g) α1 = 1.5, and (h) α1 = 3.5. Contour interval: 0.5°C.

Fig. 6. SSTA composite maps for average (a) El Niño and (b) La Niña events. Contour interval: 0.5°C. (c) Symmetric part of composites (a) and (b). Contour interval: 0.1°C. See text for definition of composites and of the symmetric component.

Fig. 7. Maps of pointwise correlation coefficients between observed SSTA and (a) 1D PCA approximation and (b) 1D NLPCA approximation.

Fig. 8. As in Fig. 3, but for SSTA NLPCA mode 2.

Fig. 9. Plot of SSTA NLPCA mode 2 time series α2(tn).

Fig. 10. Maps corresponding to SSTA NLPCA mode 2 approximation for (a) α2 = −4, (b) α2 = −1, (c) α2 = −0.1, (d) α2 = 0, (e) α2 = 0.1, (f) α2 = 0.2, (g) α2 = 0.3, (h) α2 = 0.4, and (i) α2 = 1. Contour interval: 0.5°C.

Fig. 11. As in Fig. 3, but for SSTA 2D nonmodal NLPCA approximation.

Fig. 12. Maps of pointwise correlations between observed SSTA and (a) 2D PCA approximation, (b) 2D nonmodal NLPCA approximation, and (c) 2D modal NLPCA approximation.

Fig. 13. As in Fig. 3, but for SLPA NLPCA mode 1.

Fig. 14. Plot of α1(tn), the standardized time series associated with SLPA NLPCA mode 1 (thick line), and of standardized SOI (thin line).

Fig. 15. Plot of a sequence of spatial maps characterizing SLPA NLPCA mode 1 for (a) α1 = −3, (b) α1 = −2, (c) α1 = −1, (d) α1 = −0.5, (e) α1 = 0, (f) α1 = 0.5, (g) α1 = 1, and (h) α1 = 2. Contour interval: 0.5 hPa.

Fig. 16. Composites of SLPA during average (a) El Niño and (b) La Niña events. Contour interval: 0.5 hPa.