1. Introduction
Pattern recognition plays an important role in quantitative climatology, helping to diagnose and understand climate processes. Archetypal analysis (AA) is one technique that is gaining traction in the geophysical science community for its ability to find patterns based on extreme modes of data. Because the utility of this analysis for geophysical problems has been recognized only recently, resources and references are scattered for the researcher who wishes to implement the technique. The goal of this paper is to present the AA method along with a detailed description of the decisions made in its implementation and the effect each decision may have on the final output. We also provide a discussion on the interpretation of AA with respect to geophysical data.
Empirical orthogonal function (EOF) decomposition or factorization has become a hallmark of statistical analysis and data reduction (Hotelling 1933; Jolliffe 1986) since its application in the mid-1950s by Lorenz (1956) to weather and climate studies. Known also as principal component analysis (PCA), EOF analysis constructs patterns in the spatial dimension that maximize variance. The constructed EOFs are not directly interpretable in terms of the original data and therefore any attempt to attribute a particular dynamical mechanism to any one EOF pattern is discouraged when analyzing geophysical data (Monahan et al. 2009). By comparison, AA seeks patterns from the extreme points of a convex hull, or envelope, surrounding the data in state space. It follows that the constructed archetypes may be interpreted in terms of the original data, as shown in the derivation section of this paper and by previous studies (Mørup and Hansen 2012; Bauckhage 2014). Other pattern recognition types discussed here are nonnegative matrix factorization (Cichocki et al. 2009; Gillis 2020; Mairal 2014, 2017), clustering, and optimization on manifold (Boumal et al. 2014; Hannachi and Trendafilov 2017; Hannachi 2021; Trendafilov and Gallo 2021).
Like any data mining tool, archetypal analysis involves many decisions that tune or optimize certain parameters to the needs of the user, and its output depends strongly on the decisions made along the way. A 39-yr sea surface temperature (SST) reanalysis dataset is used here as a paradigm for the method and to illustrate some of these choices.
We demonstrate the utility of this analysis method and the benefits that arise particularly when analyzing climate and weather datasets. We show that the decisions made to implement AA can greatly affect the interpretation of results and should therefore be considered carefully and documented thoroughly in all work involving AA. The structure of this paper is as follows. Sections 1 and 2 provide a historical perspective and the rationale for the work. Next, we introduce the datasets in section 3. In section 4, we contrast the PCA and AA methods and introduce the minimization algorithm for reduced space archetypal analysis (RSAA). Section 5 describes some decisions required on the input data and on the archetypal analysis, and their impacts on the final result. We then examine potential generalizations and extensions of the AA method in section 6. Section 7 illustrates teleconnections derived from extreme conditions resolved by AA applied to SSTA. Last, section 8 provides a summary and conclusion statement. The appendix touches on the available computing packages.
2. Data
We apply AA to the Optimum Interpolation Sea Surface Temperature (OISST), version 2.1 (v2.1), high-resolution dataset (Reynolds et al. 2007), which provides SST on a 0.25° global grid. A subset from 1982 to 2020 is reinterpolated to a 4° × 4° resolution when computational efficiency is required. Some illustrations of the technique consider daily anomalies, but here we focus on monthly anomalies. SST anomalies (SSTA) are defined in the standard way: daily and monthly SSTA are departures from daily and monthly climatological values, defined respectively as the time mean for each day of the year and each month of the year across all years in the interval 1982–2020. Only complete years are considered. The spatial and temporal imprints of the archetypes, and their linkages to extratropical atmospheric circulation, are revealed by compositing JRA-55 reanalysis atmospheric fields (Kobayashi et al. 2015) at the surface and on isobaric levels with the corresponding level of temporal aggregation.
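The monthly anomaly definition above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the processing chain used for OISST; the function name `monthly_anomalies`, the (time, space) array layout, and the assumption that the series starts in January with no missing values are all choices made for this sketch.

```python
import numpy as np

def monthly_anomalies(sst, n_years):
    """Remove the month-of-year climatology from a (time, space) array.

    sst: array of shape (12 * n_years, n_points), complete years only,
         starting in January (an assumption of this sketch).
    """
    n_time, n_points = sst.shape
    if n_time != 12 * n_years:
        raise ValueError("expected complete years of monthly data")
    by_month = sst.reshape(n_years, 12, n_points)
    climatology = by_month.mean(axis=0)            # (12, n_points)
    return (by_month - climatology).reshape(n_time, n_points)

# Synthetic stand-in for 39 years (1982-2020) of monthly SST at 5 grid points.
rng = np.random.default_rng(0)
n_years, n_points = 39, 5
seasonal = 2.0 * np.sin(2 * np.pi * np.arange(12) / 12)
sst = (15.0 + np.tile(seasonal, n_years)[:, None]
       + rng.normal(size=(12 * n_years, n_points)))
ssta = monthly_anomalies(sst, n_years)
```

By construction, the anomalies average to zero for each calendar month at every grid point. Daily anomalies follow the same pattern with a day-of-year climatology, although leap days then need explicit handling.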
The SST dataset is used principally to illustrate the AA method: the weather and climate research community is familiar with its variability patterns across multiple spatiotemporal scales, and its teleconnections have been extensively studied and can be readily compared with the AA results presented hereafter.
3. Mathematical derivation
AA belongs to an ever-increasing class of data analysis methods called matrix factorization (Cichocki et al. 2009; Elad 2010; Elden 2019; Gan et al. 2020; Gillis 2020; Hannachi 2021), where factorization allows one to represent the original data as a combination of factors or components that are provably easier to interpret. Another advantage of factorization is that the dimensionality of the dataset, and hence its complexity, can also be reduced. Due diligence requires that the domain of applicability of these methods be thoroughly tested when they are employed. Hereafter, we will focus on PCA, one of the oldest and most widespread techniques in statistical data analysis (Hotelling 1933; Jolliffe 1986), and AA, a lesser-known technique that is emerging in applications to geophysical problems (Steinschneider and Lall 2015; Hannachi and Trendafilov 2017; Richardson et al. 2021; Risbey et al. 2021).
PCA and AA are data-driven factorization methods belonging to unsupervised clustering techniques (Mørup and Hansen 2012). Only the PCA truncation order r and archetype cardinality p (akin to a number of clusters in clustering methods) allowed in the decomposition are predefined. Although not explicitly indicated in our notation, as not to render it too cumbersome, we weight spatially the SSTA data matrix
The constraints in the minimization procedure in both cases critically affect the factorization. For PCA and a predefined truncation of level r = min(s, t), the factorization is lossless. In other words, the original dataset
Archetypes are convex combinations of data points and data points are approximated in terms of convex combinations of archetypes.
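The two-way convexity stated above can be checked numerically. The sketch below assumes the common form X ≈ XCS with left-stochastic C and S, as in the PCHA formulation of Mørup and Hansen (2012); the symbols `X`, `C`, and `S`, the matrix sizes, and the random (unoptimized) mixture weights are assumptions of this illustration rather than the notation or solution of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
s, t, p = 6, 50, 4            # space, time, number of archetypes (illustrative)
X = rng.normal(size=(s, t))   # data matrix, one record per column

def random_left_stochastic(rows, cols, rng):
    """Nonnegative matrix whose columns each sum to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0, keepdims=True)

C = random_left_stochastic(t, p, rng)  # archetypes as mixtures of records
S = random_left_stochastic(p, t, rng)  # records as mixtures of archetypes

archetypes = X @ C        # (s, p): each column is a convex combination of data
X_hat = archetypes @ S    # (s, t): each column is a convex combination of archetypes

# Relative sum of squared errors of this (random, not optimized) factorization.
rel_sse = np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2
```

Because every column of `C` and `S` is a set of nonnegative weights summing to 1, the archetypes cannot leave the range spanned by the data and the reconstructions cannot leave the range spanned by the archetypes. An AA solver searches for the `C` and `S` that minimize `rel_sse` under these constraints.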
The convexity characteristic of AA is crucial and leads to a probabilistic interpretation of both archetypes
Finally, to reveal potential teleconnections based on AA, the composites
4. Data-driven characteristics impacting AA
a. Data structure and distribution
Toy datasets illustrating the results of AA applied to 100 samples of 1000 points per sample drawn from (a) a 3D normal distribution of unit variance, (b) a 3D uniform distribution centered on the origin, and (c) the distribution of points from the first three scaled PCs of daily SST anomalies, λ1–3PC1–3. In all examples, the dark gray points depict the 1000 points of the last of the 100 samples and the light gray points represent their projections on the X–Y, Y–Z, and Z–X planes of the coordinate system. The eight archetypes resulting from AA of each sample are shown in color.
Citation: Artificial Intelligence for the Earth Systems 1, 3; 10.1175/AIES-D-21-0007.1
In general, geophysical observables do not have a spherically or elliptically shaped distribution. As a tractable example, Fig. 1c displays the results of AA for eight archetypes applied to 100 samples of 1000 daily SSTA records drawn from the 14 245 records of sea surface temperatures over the 1982–2020 period. When the first three scaled PCs, λ1–3PC1–3, are retained for AA, the reduced dataset seems at first glance elliptically distributed given that the eigenvalues in the singular value decomposition of SSTA are such that λ1 > λ2 > λ3. However, a closer inspection of the three-dimensional point cloud reveals that, at times, excursions occur away from the broad ellipsoidally shaped distribution. These outliers are readily identified by the AA method and are consistently arranged in small spatially coherent clusters across all 100 samples.
Points in Δp−1 can be projected on the plane perpendicular to the diagonal of the nonnegative orthant6 of the p-dimensional hypercube centered at the origin of
Simplex representations of detrended monthly SST anomalies over the 1982–2020 for archetypes cardinalities of (a) 4 and (b) 8.
b. Dimensionality reduction
The spatiotemporal dimensions of the SST and JRA-55 datasets considered throughout this work are commensurate to the resolutions of other major global reanalysis efforts and ocean–atmosphere general circulation model (OAGCM) simulations. For OISST v2.1, the daily (monthly) data matrix
To alleviate the dimensionality issue, we may first consider a reduction of the domain size or spatiotemporal averaging. This is suitable for certain problems; for example, when AA is used to find extreme patterns involving large-scale ocean processes and both local and remote linkages to atmospheric circulation, or when variability at smaller scales and higher frequencies can safely be ignored. For example, applying 4° × 4° spatial averaging to the original 0.25° × 0.25° OISST v2.1 dataset decreases the spatial dimension by a factor of 16 × 16 = 256, that is, by two orders of magnitude. Additional aggregation of the time dimension from daily to monthly records brings the total reduction in the number of data points to a factor of 14 245/468 × 256 ≈ 7792, without substantially changing SST AA results for large-scale phenomena such as ENSO.
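The arithmetic behind these reduction factors can be reproduced directly. The short sketch below counts daily and monthly records over 1982–2020 and combines them with the spatial coarsening factor; it considers grid cells only and ignores land masking, which is an assumption of this illustration.

```python
from datetime import date

n_days = (date(2020, 12, 31) - date(1982, 1, 1)).days + 1  # daily records
n_months = 12 * (2020 - 1982 + 1)                          # monthly records
spatial_factor = (4 / 0.25) ** 2   # 0.25-degree cells merged per 4-degree cell
total_factor = n_days / n_months * spatial_factor

print(n_days, n_months, spatial_factor, round(total_factor))
```

The daily count includes the ten leap days between 1984 and 2020, which is why the record length is 14 245 rather than 39 × 365 = 14 235.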
Another class of dimensionality reduction rests on general matrix factorization procedures (Mørup and Hansen 2012; Nguyen and Holmes 2019), where the original data matrix
To illustrate the equivalence between AA and RSAA methods as in Eq. (7), we apply AA and RSAA to spatially averaged OISST v2.1 monthly anomalies on 4° × 4° grid. The AA and RSAA data matrix dimensions are
AA and RSAA results for the stochastic matrices,
A difficulty encountered by most dimensionality reduction methods is to justify the level of truncation or reduction to apply to the original dataset. Usually for PCA, researchers rely on the first “significant” step change between consecutive values of ranked fraction of variance explained [Eq. (5)] when displayed on a scree plot. However, the optimal truncation level is conditioned by the data itself. The SSTA data matrix spectral characteristics show a rather smooth and incremental decrease between consecutive eigenvalues without an obvious step change and, therefore, the truncation levels applied throughout this work are mainly informed by the percentage of variance one wishes to retain and driven by computational considerations: typically, 90%–100% of the variance, corresponding to reduced dimensions of the order of O(10^3) or less. The residual variance made up of the excluded modes of variability can be displayed to gain insight into where it lies and what it represents. Similarly, a spectral analysis of the associated PCs indicates which time scales are excluded from the reduced dataset. Hereafter, the variance explained is given as the ranked fraction of variance of the full or reduced dataset and ranges from 0 to 1.
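A truncation level chosen from a target fraction of retained variance can be sketched with a singular value decomposition. The function `truncation_level`, the 90% target, and the synthetic low-rank test matrix are assumptions of this illustration; scaling each PC by its singular value is one reading of the "scaled PCs" used in the text, and no spatial weighting is applied here.

```python
import numpy as np

def truncation_level(X, target=0.90):
    """Smallest number of PCs whose cumulative variance fraction >= target.

    X: (space, time) anomaly matrix with the time mean already removed.
    Returns (r, fractions, scaled_pcs), with scaled_pcs of shape (r, time).
    """
    U, sing, Vt = np.linalg.svd(X, full_matrices=False)
    fractions = sing**2 / np.sum(sing**2)      # ranked fraction of variance
    cumulative = np.cumsum(fractions)
    r = int(np.searchsorted(cumulative, target) + 1)
    r = min(r, len(sing))                      # guard against round-off at target=1
    scaled_pcs = sing[:r, None] * Vt[:r]       # singular value times PC
    return r, fractions, scaled_pcs

# Synthetic anomalies: five dominant modes plus weak noise.
rng = np.random.default_rng(2)
X = (rng.normal(size=(40, 5)) @ rng.normal(size=(5, 120))
     + 0.1 * rng.normal(size=(40, 120)))
X -= X.mean(axis=1, keepdims=True)
r, fractions, scaled_pcs = truncation_level(X, target=0.90)
```

Plotting `fractions` against rank gives the scree plot discussed above. When no step change is visible, as for SSTA, the target fraction and the resulting `r` become an explicit, documentable choice.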
(top) Explained variances and (bottom) sum of squares errors for increasing number of principal components or archetypes for PCA (blue) and AA (red) for both detrended (continuous) and full (dashed) monthly SST anomalies. The AA explained variance [Eq. (8)] reported for both the full and detrended cases corresponds to RSAA results for cardinality from 3 to 20 when all 468 PCs are retained.
No obvious spectral gaps can be readily identified and the spectral characteristics of the RSAA factorization, as for PCA, are of little help.
In investigating the impact of dimension reduction on RSAA factorization, one does not fail to notice that both stochastic matrices, solutions of Eq. (7)
The Gini coefficient has been used previously in economics as a measure of wealth inequality, where Gini = 1 indicates maximum inequality and Gini = 0 indicates perfectly distributed wealth across oi values. In the application of AA, Gini = 1 for the probability matrix
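For readers unfamiliar with the Gini coefficient, the sketch below computes it for a vector of mixture weights using the textbook mean-absolute-difference form. The exact normalization of the definition used in this paper is not reproduced here; the optional rescaling by n/(n − 1), which makes a one-hot vector score exactly 1, and the function name `gini` are assumptions of this illustration.

```python
import numpy as np

def gini(weights, normalize=True):
    """Gini coefficient of a nonnegative 1D vector.

    Uses the mean-absolute-difference definition. With normalize=True the
    result is rescaled by n / (n - 1) so that a one-hot vector gives 1.
    """
    w = np.sort(np.asarray(weights, dtype=float))
    n = w.size
    if n < 2 or w.sum() == 0:
        raise ValueError("need at least two entries and a positive sum")
    index = np.arange(1, n + 1)
    g = (2.0 * np.sum(index * w)) / (n * w.sum()) - (n + 1.0) / n
    return g * n / (n - 1) if normalize else g

one_hot = gini([0.0, 0.0, 0.0, 1.0])       # record identified with one archetype
uniform = gini([0.25, 0.25, 0.25, 0.25])   # record blended equally
mixed = gini([0.7, 0.1, 0.1, 0.1])         # intermediate case
```

Applied column by column to a left-stochastic matrix, for instance with `np.apply_along_axis(gini, 0, S)` for a hypothetical weight matrix `S`, this yields one score per record: values near 1 flag records dominated by a single archetype and values near 0 flag evenly blended records.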
Figure 5 illustrates the effect of increasing retained dimensions on how well dispersed the mixture weights are, as well as a comparison of the variance explained by the reduced problem versus the explained variances compared to the total variance of the original data. The top panel plots the Gini coefficient as a function of retained dimension (the number of scaled PCs) for seven selected archetype numbers (shown by different colored lines). For
Gini coefficients as a function of retained dimensions (scaled principal components) for matrices (top)
The impact on
When the dimensionality of the original data becomes computationally prohibitive even for PCA, “approximate” methods of dimension reduction can be deployed. For geophysical datasets, Seitola et al. (2014) and Hannachi (2021) have recently illustrated that the issue of dimensionality could be addressed through random projections (RP), where the PCA decomposition factors,
c. Trend analysis
Geophysical data often display nonstationary or trending behavior, of which a notable example is the observed warming of SST caused by globally rising temperatures due to increasing greenhouse gas concentrations (IPCC 2013, 2019). It is important to ascertain the impact of this trend on natural climate modes. For example, there is clear observational evidence that significant changes in the nature of key ENSO indicators happened after 1980, when at least three major El Niño episodes occurred in the 39-yr period between 1982 and 2020 compared to the previous 39 years (Capotondi and Sardeshmukh 2015, 2017; Capotondi et al. 2020). Throughout this work, we focus only on SSTA over the satellite era (Reynolds et al. 2007). Given the relatively short record of near-global SST coverage available since the advent of satellite observing platforms, the power of any statistical analysis to investigate the interplay between SSTA variability and a “warming” or slowly changing mean state is limited. Therefore, we will not attempt hereafter to explain this interaction. However, if we are interested in detecting the different “flavors” of ENSO, removing a linear trend from SSTA prior to AA implementation is a legitimate step to prevent the global warming signature from potentially “washing out” the ENSO global extreme imprints, provided one assumes that internal variability of the climate system can be neatly separated from anthropogenic forcing effects. Conversely, if we are interested in the change of ENSO extreme impacts under climate change, we would want to retain that trend. It remains part of the due diligence in the application of AA to properly formulate the questions that this method aims to address.
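Removing a linear trend at each grid point amounts to a least-squares fit in time. The sketch below is a minimal version on synthetic data; the function name `remove_linear_trend`, the (time, space) layout, and the absence of missing values are assumptions of this illustration, and the fit also removes the time mean along with the slope.

```python
import numpy as np

def remove_linear_trend(ssta):
    """Subtract a least-squares linear fit in time at every grid point.

    ssta: (time, space) anomaly array without missing values.
    """
    n_time = ssta.shape[0]
    time = np.arange(n_time, dtype=float)
    design = np.column_stack([time, np.ones(n_time)])       # slope, intercept
    coeffs, *_ = np.linalg.lstsq(design, ssta, rcond=None)  # (2, space)
    return ssta - design @ coeffs

# Synthetic monthly anomalies with a different imposed trend at each point.
rng = np.random.default_rng(6)
n_time, n_points = 468, 3
trend = 0.002 * np.arange(n_time)[:, None] * np.array([1.0, 2.0, -1.0])
ssta = trend + rng.normal(scale=0.3, size=(n_time, n_points))
detrended = remove_linear_trend(ssta)
```

Running AA on both `ssta` and `detrended` is then a direct way to see which archetypes are tied to the slowly changing mean state and which are not.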
To illustrate the impact of trend removal, we compare the resulting archetype patterns,
AA spatial pattern and time series results using (a) full and (b) detrended monthly SST anomalies over 1982–2020 for a selected archetype number of 4. The two left columns on each subplot show archetypes constructed by
As in Fig. 6, but for an AA cardinality of 6.
As in Fig. 6, but for an AA cardinality of 8.
5. Archetypal analysis computation and extensions
a. Initialization and convergence
Esposito (2021) has recently reviewed initialization methods for nonnegative matrix factorization (NMF). NMF shares algorithmic similarities with AA and so do initialization strategies employed to solve the optimization problem sketched in section 4, Eq. (7). Here, we combine random-based and clustering-based or data-driven initialization procedures. Being simpler to implement, random-based procedures are used as a benchmark for more sophisticated ones but require a thorough investigation of their robustness and reproducibility. For AA, a suite of strategies has been adopted in the literature (Bauckhage and Thurau 2009; Thurau et al. 2009, 2011; Eugster and Leisch 2011; Mørup and Hansen 2012; Bauckhage and Manshaei 2014; Suleman 2017a,b; Mair et al. 2017; Mair and Brefeld 2019). Our benchmark in all cases, the first trial of typically a number of randomly sampled initializations per optimization, is always the data-driven “FurthestSum” procedure advocated by Mørup and Hansen (2012). The algorithm FurthestSum initializes the AA procedure with a number of observation points equal to the desired AA cardinality through the matrix
However, Suleman (2017a) criticizes FurthestSum as “ill-conceived” and potentially leading to archetype redundancy after convergence for increasing archetype cardinality. To protect against this eventuality, we implement the initialization procedure prescribed by Mair and Brefeld (2019) based on coreset construction for AA, algorithm 2 in Mair and Brefeld (2019), where initial seed archetypes are randomly drawn from a distribution constructed from the square Euclidean distance of each data point X(⋅, t) from the time mean of
For all results reported in the paper, we use the MATLAB code PCHA of Mørup and Hansen (2012), suitably modified to accommodate both FurthestSum and AA coreset initializations (see Table A1 in the appendix for references). The AA procedure runs through an outer loop consisting of 1000 initialization trials (999 random coreset trials and one FurthestSum trial). For each individual trial, the iterative nonlinear least squares algorithm in PCHA, the inner loop, is allowed to converge with a stopping criterion of 10^−8 on the relative sum of squared errors (SSE). We report the solution that minimizes the relative SSE across all 1000 trials. We note that FurthestSum, throughout our many experiments, never corresponds to the optimum. All computations are performed in double precision.
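The outer loop of this procedure, repeated seeding followed by selection of the trial with the lowest relative SSE, can be sketched independently of the inner solver. In the sketch below, seed records are drawn with probability proportional to their squared Euclidean distance from the time mean, which is one simple reading of the distance-based sampling described above and not the coreset algorithm of Mair and Brefeld (2019). The inner solver `nearest_seed_solver` is a deliberately crude placeholder and is not PCHA; all function names are hypothetical, and the FurthestSum trial is omitted.

```python
import numpy as np

def distance_weighted_seeds(X, p, rng):
    """Draw p distinct record indices, favouring records far from the time mean.

    X: (space, time). Sampling probability is taken proportional to the
    squared Euclidean distance from the time mean (assumption of this sketch).
    """
    d2 = np.sum((X - X.mean(axis=1, keepdims=True)) ** 2, axis=0)
    return rng.choice(X.shape[1], size=p, replace=False, p=d2 / d2.sum())

def nearest_seed_solver(X, seeds):
    """Placeholder inner solver, NOT PCHA: seeds are kept as fixed archetypes
    and each record is assigned to its nearest one (a one-hot mixture)."""
    A = X[:, seeds]                                           # (space, p)
    d2 = ((X[:, None, :] - A[:, :, None]) ** 2).sum(axis=0)   # (p, time)
    rel_sse = d2.min(axis=0).sum() / np.sum(X ** 2)
    return {"archetypes": A, "labels": d2.argmin(axis=0)}, rel_sse

def best_of_trials(X, p, n_trials, solver, rng):
    """Run the solver from n_trials seedings and keep the lowest relative SSE."""
    best, best_sse = None, np.inf
    for _ in range(n_trials):
        result, rel_sse = solver(X, distance_weighted_seeds(X, p, rng))
        if rel_sse < best_sse:
            best, best_sse = result, rel_sse
    return best, best_sse

X = np.random.default_rng(3).normal(size=(3, 200))
single, sse_single = best_of_trials(X, 4, 1, nearest_seed_solver,
                                    np.random.default_rng(4))
multi, sse_multi = best_of_trials(X, 4, 50, nearest_seed_solver,
                                  np.random.default_rng(4))
```

Because both runs start from the same random state, the 50-trial result can never be worse than the single-trial one. Passing a real AA solver with the same `(X, seeds) -> (result, rel_sse)` signature in place of the placeholder would reproduce the structure, though not the details, of the multi-start strategy described above.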
Archetypes nesting for SSTA with trend (left-hand side) and SSTA with linear trend removed (right-hand side). The numbers in each row label the archetype rank based on the time mean of the AA stochastic matrix,
A nonexhaustive list of archetypal analysis package URLs with corresponding computing language types and main references.
b. Archetype cardinality
We observe no clear “knee point” in the evolution of either the AA explained variance or the sum of squared errors between the full SSTA data and the AA representations as a function of archetype cardinality in Fig. 4. Therefore, a balance has to be struck between AA cardinality and representation of extreme conditions in the original dataset. To avoid the pitfalls of archetype redundancy mentioned by Suleman (2017a), for example, several initialization procedures, dimension reduction truncation levels, and aggregation levels have to be tested for a number of archetype orders and the results compared (Bauckhage and Thurau 2009; Suleman 2017b).
It is interesting to note, somewhat unexpectedly, that the global SSTA archetypes “nest” in contrast to the assumption of Risbey et al. (2021) for AA applied to geopotential anomalies at 500 hPa. A pattern correlation distance is applied to identify archetype correspondence for different cardinalities. The correspondence is corroborated by visual inspection directly in Fig. 9. Overall, pattern correlations across near-identical archetypes are typically larger than 0.8. Table 1 shows the AA correspondence across several archetype orders from nAA = 4 to 10, for both the full and detrended cases, where each row corresponds to near-identical archetypes independent of the order nAA, at least when nAA is small, ≤10. Such a correspondence could not be readily established when comparing the full with nAA = p and the detrended with nAA = p − 2 sets of archetypes. For the nondetrended SSTA, 2 archetypes account for the cooling and warming patterns. As a linear trend has been removed, the cooling and warming patterns found in the full case have to be absent in the detrended case. For example, Fig. 6b detrended archetypes for a cardinality of 4 (nAA = p − 2 with p = 6) can only be compared to Fig. 7a archetypes 3–6 for a cardinality of 6 (nAA = p) given that archetypes 1 and 2 correspond to global cooling and warming patterns. The mismatch between Fig. 6b detrended archetypes for a cardinality of 4 and the remaining archetypes 3–6 in Fig. 7a for the full problem possibly indicates that a clean separation between a slow-changing mean state and natural modes of variability is elusive, at least as far as the distribution of extremes is concerned.
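Matching archetypes across cardinalities by pattern correlation can be sketched as follows. The functions `pattern_correlation` and `match_archetypes` are hypothetical names; the correlation is unweighted (no area weighting), and the 0.8 threshold simply echoes the typical value quoted above rather than a criterion stated by the paper.

```python
import numpy as np

def pattern_correlation(A, B):
    """Centered pattern correlations between columns of A (s, p) and B (s, q)."""
    Az = A - A.mean(axis=0, keepdims=True)
    Bz = B - B.mean(axis=0, keepdims=True)
    Az /= np.linalg.norm(Az, axis=0, keepdims=True)
    Bz /= np.linalg.norm(Bz, axis=0, keepdims=True)
    return Az.T @ Bz                                    # (p, q)

def match_archetypes(A, B, threshold=0.8):
    """For each column of A, the best-correlated column of B (-1 if below threshold)."""
    corr = pattern_correlation(A, B)
    best = corr.argmax(axis=1)
    best_corr = corr[np.arange(corr.shape[0]), best]
    return np.where(best_corr >= threshold, best, -1), best_corr

# Synthetic check: B holds shuffled, slightly perturbed copies of A's four
# patterns plus two unrelated ones, mimicking cardinalities 4 and 6.
rng = np.random.default_rng(5)
A = rng.normal(size=(500, 4))
perm = np.array([2, 0, 3, 1])
B = np.hstack([A[:, perm] + 0.05 * rng.normal(size=(500, 4)),
               rng.normal(size=(500, 2))])
matches, corrs = match_archetypes(A, B)
```

A correspondence table such as Table 1 can then be assembled by chaining `matches` from one cardinality to the next, with visual inspection of the patterns remaining a useful cross-check.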
(a) Stacked bar plots of
The nesting properties of AA, when applied to SSTA, could be utilized to increase computational speed when stepping through the archetype orders by “recycling” the results from the previous order, or perturbations thereof, as initial seeds for the next order. Throughout this paper, however, we assume at the outset that archetypes do not nest, even approximately, and we reinitialize our trials randomly and independently of the results for the previous order.
As companion to Table 1, Fig. 9 illustrates the patterns “nestedness” and the impact of cardinality on the affiliation probabilities expressed by the matrix
Figure 9a illustrates the hierarchical nature of the AA power of discrimination for extreme SSTA conditions as a function of increasing cardinality. For example, it shows that no spurious blending of archetypes occurs when “it matters,” that is, when extreme conditions occur. This can be seen when the archetype corresponding to the three major Niño intervals between 1982 and 2020 matches across 2–8 cardinalities as the pink colored intervals in 1982–83, 1997–98, and 2015–16 indicate. For a cardinality of 2 in Fig. 9a first row, all records, being convex combinations of archetypes, have to be expressed as “blended” patterns by construction unless they correspond to strongly expressed Niño and Niña intervals (respectively pink and light green) as depicted in Fig. 9b fifth and third rows. Individual data records not corresponding to extremes will lead to a low discrimination score Δp(t) introduced in Eq. (10) or a low Gini coefficient Γp(t) in Eq. (11) for
c. Serial correlation and causality
As mentioned in section 4, a direct AA factorization of SSTA,
Transitions from one record S(⋅, t) to the next S(⋅, t + 1) can be now analyzed from the changes from
As an alternative to postprocessing results obtained from the direct application of AA, one could imagine AA to be applied to datasets where serial correlation has been explicitly modeled. Horenko (2009, 2010a,b,c), O’Kane et al. (2013), Risbey et al. (2015), Franzke et al. (2015), O’Kane et al. (2016), Yu et al. (2016), O’Kane et al. (2017), Gerber et al. (2020), and Quinn et al. (2021) extend matrix factorization techniques to time series predictions, where lags of the dataset under investigation are included in vector autoregressive or dynamic linear models. These methods have to be combined with an appropriate level of regularization as the number of free parameters typically increases quadratically with the spatial, latent, or retained dimensions s multiplied by the number of lags L, that is, as O(Ls^2). Additional levels of regularization are often further imposed on solutions to handle the ill-conditioning and overfitting that plague problems of these types for high-dimensional datasets.
A simpler approach would be to apply time-embedding (Takens 1981) to construct an augmented data matrix
d. Multivariate RSAA
The time-embedding construction followed by RSAA described previously is a special case of a more general method where a combined EOF analysis is employed (O’Kane et al. 2017; Hannachi 2021) followed by RSAA on the augmented data matrix
6. Application of archetypal analysis
In section 5, we illustrate in Fig. 1c how AA identifies extreme SST conditions in the reduced space spanned by the first three scaled PCs of SSTA and we describe the impact of detrending the dataset to separate extremes in the interannual variability from those driven by anthropogenic forcing. Hereafter, we present an application of AA to characterize ENSO, building on the example of Hannachi and Trendafilov (2017). For a choice of four archetypes applied to global SST anomaly fields, the resulting archetypes display patterns indicative of the four ENSO types: the classical eastern Pacific-type El Niño and La Niña (Rasmusson and Carpenter 1982) and the central Pacific-type [coined “Modoki” by Ashok et al. (2007)] El Niño and La Niña (e.g., Fu et al. 1986; Wang 1995; Trenberth and Smith 2006; Kao and Yu 2009; Cai et al. 2009). These patterns are shown in column 1 in Fig. 10.
AA composite results using detrended monthly SST anomalies over 1982–2020 for a selected archetype number of 4. (left) The resulting spatial patterns of SST anomalies constructed with the
The time series of the archetype coefficients,
In the following examples, we formed composites based on the
Column 2 in Fig. 10 illustrates the anomalous 300-hPa zonal wind component, 500-hPa geopotential height anomalies, and thermal wind anomaly vectors corresponding to the spatial pattern of SST anomaly for each ENSO archetype to the left. Several key features match previously identified ENSO behavior. Conventional understanding of a Northern Hemisphere wintertime La Niña episode includes a strong westerly Pacific jet stream that splits around a well-developed North Pacific high pressure system (Alexander et al. 2002; Newman et al. 2016; Christensen et al. 2017; Capotondi et al. 2020). Both classical and Modoki La Niña archetypes (Fig. 10, rows 2 and 3) follow this behavior, though the classical La Niña pattern has a more coherent strengthening of the subtropical jet over Asia into the North Pacific. This strengthening of the jet occurs in association with, or in response to, the enhanced thermal wind induced by the warm and cold SST anomalies across the North Pacific Ocean in this pattern. In the Southern Hemisphere the main response for La Niña is in the polar jet stream. For La Niña Modoki (Fig. 10, row 2) there is an almost circumglobal change from warm SST anomalies to cold SST anomalies around 50°S latitude, which results in westerly thermal wind anomalies and an enhanced polar jet stream. The stronger polar jet stream is associated with lower geopotential height poleward of the jet, indicating enhanced storminess at high latitudes. At lower latitudes in the South Pacific, the subtropical jet stream is weakened for La Niña Modoki by the strong easterly thermal wind anomaly.
The Modoki El Niño in the top row has a strong anticyclone at 500 hPa in geopotential height over the Gulf of Alaska with evidence of an in situ response to thermal wind anomalies, consistent with findings by Kao and Yu (2009) that indicated Modoki ENSO tends to favor in situ development forced primarily by the atmosphere. In the Southern Hemisphere, both El Niño types (rows 1 and 4) feature SST gradients that enhance the thermal wind in the vicinity of the subtropical jet in the Pacific, though this is much stronger for classical El Niño (row 4). The geopotential height anomalies for both El Niño types feature a ridge and trough about South America reminiscent of the Pacific–South America pattern.
A different set of atmospheric diagnostic composites for the same archetype patterns is provided in column 3 of Fig. 10. Here, monthly averaged maximum daily wind speed anomalies at the surface are shaded and superimposed with contours of velocity potential difference anomalies between 150 and 850 hPa as in Adames and Wallace (2014), as well as vectors of anomalous wave activity flux (WAF) at 200 hPa (Takaya and Nakamura 2001). In line with recent studies (Liang et al. 2021; Chen et al. 2015), both flavors of El Niño are associated with a lessening of the easterly trade winds. Conversely, both flavors of La Niña correspond to a strengthening of the easterly trade winds. The well-developed North Pacific block in the Modoki La Niña (Fig. 10, row 2) is supported by strong WAF activity from the tropics into the North Pacific.
The composite behavior of WAF in the Southern Hemisphere may prompt questions about the influence of ENSO on the South Pacific convergence zone (SPCZ), which has been described as a “graveyard for fronts” (Trenberth 1976) and more recently associated with Rossby wave breaking (Matthews 2011). In each of the ENSO archetype composites there is a flux of wave activity in the Southern Hemisphere polar waveguide, which tends to move equatorward across the Australian continent and into the SPCZ region, consistent with the analysis of Matthews (2011). However, there are variations in this picture from case to case, as indicated by the set of longitudes where the flux out of the waveguide moves equatorward. For El Niño Modoki (Fig. 10, row 1) and classical La Niña (Fig. 10, row 3) the strong equatorward flux is in the Indian Ocean and Australian continent region. For classical El Niño (Fig. 10, row 4) and La Niña Modoki (Fig. 10, row 2) the equatorward flux is over the Australian continent and there is convergence of wave activity flux in the South Pacific region. In these two cases the impact on the SPCZ seems stronger, as indicated by the more coherent northwest–southeast-oriented anomaly in maximum surface wind for these composites.
The strongest anomaly composites are found in the tropical to subtropical bands for classical El Niño (Fig. 10, row 4) and Modoki La Niña (Fig. 10, row 2) in both surface and atmospheric fields aloft. Monthly averages of maximum daily surface wind speed, velocity potential difference, Δ150 − 850, and thermal wind anomaly composites show a clear correspondence to ENSO phases. For classical El Niños there is a slackening of the surface trade winds in the central Pacific, enhanced/reduced convection activity in the eastern/western Pacific, and symmetrically diverging thermal wind anomalies from the equatorial region. The opposite conditions are observed for Modoki La Niñas with a reinforcement of the trade winds in the western to central Pacific, a reduced/enhanced convection activity in the eastern/western Pacific and symmetrically convergent northeasterly and southeasterly thermal wind anomalies toward the equator.
One notable feature of the composite teleconnection patterns associated with the four ENSO types is that they are not particularly symmetric. That is, the teleconnections for classical versus Modoki forms are different for the same ENSO type, just as the El Niño and La Niña forms are different. In some cases, the teleconnection for the Modoki form more closely resembles that for the opposite ENSO type than it does the classical equivalent. This is understandable in that the teleconnections form in response to the global SST patterns for each type, which can be very different outside the tropics. More local gradients in SST in the archetype patterns can drive thermal wind responses that modify the jets and dynamical response in a region.
7. Summary and conclusions
This paper has demonstrated the utility of the AA method and the benefits that arise particularly when analyzing geophysical data. A derivation of RSAA is first provided as the foundation for working with large datasets that require an initial dimensionality reduction step to increase computational efficiency. Using a prototype dataset of monthly SST anomalies between 1982 and 2020, we have shown how outliers around a broadly ellipsoid-shaped distribution may be readily identifiable as archetypes of the data. These spatial archetype patterns resemble anomalies of SST associated with ENSO. If trends are of interest, the nondetrended data yield archetypes that show gradual warming from an initially cold pattern to a warmer one. Detrending the data prior to AA may remove a global warming pattern and instead reveal different flavors of ENSO, like the central Pacific (aka Modoki) versus classical eastern Pacific ENSO. As the number of archetypes increases from 4 to 8, the spatial patterns increase in diversity while still retaining familiar ENSO patterns. The Gini coefficient is introduced as a tool to inform the choice of the number of principal components to be retained in the analysis, based on the total variance explained by RSAA and conditioned on archetype cardinality. The Gini coefficient can also be used as a univariate discrimination score to identify extreme conditions and their persistence. Last, a useful application of AA is presented to show that compositing around the time series of the AA matrix factors reveals familiar atmospheric teleconnection patterns associated with extreme SST anomaly patterns.
We show that the decisions made to implement AA can greatly affect the interpretation of results. There is a priori no guarantee that solutions exist for the minimization problem for any given task, or that the solutions found will be meaningful. Results of AA should therefore be considered carefully through the lens of each individual decision made. Chosen methods should be documented thoroughly in all work involving AA to encourage reproducibility and understanding.
Acknowledgments.
This research was supported by the Multiyear Climate Project at the Commonwealth Scientific and Industrial Research Organisation, Oceans and Atmosphere. Bernadette Sloyan and Christopher Chapman were also funded by The Centre for Southern Hemisphere Oceans Research, Hobart, Tasmania, Australia. Abdelwaheb Hannachi and Nikolay Trendafilov received no external support for this work. We thank Richard Matear at the Commonwealth Scientific and Industrial Research Organisation, Oceans and Atmosphere, for his constant support and encouragement. We are also grateful to the reviewers whose insightful comments led to substantial improvement of the paper.
Data availability statement.
SST data are from the OISST v2.1 high-resolution dataset provided by the NOAA/OAR/ESRL Physical Sciences Laboratory. These data are available at https://psl.noaa.gov/. The atmospheric reanalysis data used to relate extreme events to large-scale climate modes come from the JRA-55 project carried out by the Japan Meteorological Agency (JMA). JRA-55 data are available at https://jra.kishou.go.jp.
Footnotes
A left-stochastic matrix is a non-negative matrix with each column summing to 1.
In geometry, the convex hull of a set S of points sampled from r-dimensional Euclidean space is the smallest convex r-polytope that encloses the entire data set and whose vertices are points of S.
The cardinality of a set corresponds to the number of elements in the set.
The square root
The non-negative orthant is the generalization of the first quadrant in two dimensions to n dimensions.
Qualifies nondeterministic polynomial acceptable problems with reference to the computing time needed to find their “near optimal” solutions.
Abrol and Sharma (2020) recently made use of the Gini sparsity measure to develop a computationally efficient greedy AA (GAA) algorithm, in which the measure is used within the AA optimization procedure to update the sparse stochastic matrix C.
See Komarov (2021).
For example, if a dimension reduction step has been implemented.
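The definitions of a left-stochastic matrix and of the convex hull given in the footnotes above can be made concrete with a short sketch. It assumes the standard AA factorization X ≈ XCS, in which C and S are left-stochastic, so that the archetypes XC are convex combinations of the data and the reconstruction XCS is a convex combination of the archetypes. The matrices below are random rather than optimized, and all names and sizes (`random_left_stochastic`, `n_space`, `n_time`, `n_arch`) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_left_stochastic(rows, cols, rng):
    """Random non-negative matrix whose columns each sum to 1."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=0, keepdims=True)


n_space, n_time, n_arch = 5, 20, 3            # toy sizes (assumed)
X = rng.standard_normal((n_space, n_time))    # data: one column per time step

C = random_left_stochastic(n_time, n_arch, rng)   # builds archetypes from data
S = random_left_stochastic(n_arch, n_time, rng)   # expresses data via archetypes

Z = X @ C        # archetypes: convex combinations of the data columns
X_hat = Z @ S    # reconstruction: convex combinations of the archetypes

# A convex combination cannot leave the range spanned by the points it
# combines, so every archetype stays inside the bounding box of the data
# (a necessary consequence of lying in the convex hull).
lo = X.min(axis=1, keepdims=True)
hi = X.max(axis=1, keepdims=True)
inside = bool(np.all((Z >= lo - 1e-12) & (Z <= hi + 1e-12)))
print(inside)

# Squared Frobenius residual, the quantity an AA solver would minimize
# over C and S; here it is only evaluated for the random matrices.
residual = np.linalg.norm(X - X_hat, "fro") ** 2
```

An actual AA solver would iterate on C and S to reduce `residual` while preserving the column-sum and non-negativity constraints; this sketch only shows what those constraints imply geometrically.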
APPENDIX
Available AA Packages
A number of open-source AA packages are available online. They have been implemented for most major computing languages in use today, such as MATLAB, Python, and R. The reader is referred to Table A1 for a nonexhaustive selection of package URLs. Throughout the paper, we have used exclusively the AA package developed by Mørup and Hansen (2012), after trialing several of the implementations listed in Table A1. We have found the pure MATLAB script “PCHA” extremely robust and easy to modify for our purpose. The computation speed of PCHA is comparable to that of the MATLAB version of the Sparse Modeling Software (SPAMS), version 2.6, of Mairal (2014, 2017), and the two yield near-identical solutions in the stochastic matrices
REFERENCES
Abrol, V., and P. Sharma, 2020: A geometric approach to archetypal analysis via sparse projections. Proc. 37th Int. Conf. on Machine Learning, Online, ICML, 42–51, http://proceedings.mlr.press/v119/abrol20a/abrol20a.pdf.
Adames, A. F., and J. M. Wallace, 2014: Three-dimensional structure and evolution of the MJO and its relation to the mean flow. J. Atmos. Sci., 71, 2007–2026, https://doi.org/10.1175/JAS-D-13-0254.1.
Alexander, M. A., I. Bladé, M. Newman, J. R. Lanzante, N.-C. Lau, and J. D. Scott, 2002: The atmospheric bridge: The influence of ENSO teleconnections on air–sea interaction over the global oceans. J. Climate, 15, 2205–2231, https://doi.org/10.1175/1520-0442(2002)015<2205:TABTIO>2.0.CO;2.
Aloise, D., A. Deshpande, P. Hansen, and P. Popat, 2009: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn., 75, 245–248, https://doi.org/10.1007/s10994-009-5103-0.
Ashok, K., S. K. Behera, S. A. Rao, H. Weng, and T. Yamagata, 2007: El Niño Modoki and its possible teleconnection. J. Geophys. Res., 112, C11007, https://doi.org/10.1029/2006JC003798.
Bauckhage, C., 2014: A note on archetypal analysis and the approximation of convex hulls. arXiv, 1410.0642, https://doi.org/10.48550/arXiv.1410.0642.
Bauckhage, C., and C. Thurau, 2009: Making archetypal analysis practical. Pattern Recognition, J. Denzler, G. Notni, and H. Süße, Eds., Lecture Notes in Computer Science, Vol. 5748, Springer, 272–281.
Bauckhage, C., and K. Manshaei, 2014: Kernel archetypal analysis for clustering web search frequency time series. 22nd Int. Conf. on Pattern Recognition, Stockholm, Sweden, IEEE, 1544–1549.
Boumal, N., B. Mishra, P.-A. Absil, and R. Sepulchre, 2014: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res., 15, 1455–1459.
Boyd, S. P., and L. Vandenberghe, 2004: Convex Optimization. 1st ed. Cambridge University Press, 727 pp.
Cai, W., and T. Cowan, 2009: La Niña Modoki impacts Australia autumn rainfall variability. Geophys. Res. Lett., 36, L12805, https://doi.org/10.1029/2009GL037885.
Cai, W., T. Cowan, and A. Sullivan, 2009: Recent unprecedented skewness towards positive Indian Ocean Dipole occurrences and its impact on Australian rainfall. Geophys. Res. Lett., 36, L11705, https://doi.org/10.1029/2009GL037604.
Capotondi, A., and P. D. Sardeshmukh, 2015: Optimal precursors of different types of ENSO events. Geophys. Res. Lett., 42, 9952–9960, https://doi.org/10.1002/2015GL066171.
Capotondi, A., and P. D. Sardeshmukh, 2017: Is El Niño really changing? Geophys. Res. Lett., 44, 8548–8556, https://doi.org/10.1002/2017GL074515.
Capotondi, A., A. T. Wittenberg, J.-S. Kug, K. Takahashi, and M. J. McPhaden, 2020: ENSO Diversity. El Niño Southern Oscillation in a Changing Climate, Geophys. Monogr., Vol. 253, Amer. Geophys. Union, 65–86, https://doi.org/10.1002/9781119548164.ch4.
Chen, D., and Coauthors, 2015: Strong influence of westerly wind bursts on El Niño diversity. Nat. Geosci., 8, 339–345, https://doi.org/10.1038/ngeo2399.
Chen, Y., J. Mairal, and Z. Harchaoui, 2014: Fast and robust archetypal analysis for representation learning. Conf. on Computer Vision and Pattern Recognition, Columbus, OH, IEEE, 1478–1485.
Christensen, H. M., J. Berner, D. R. B. Coleman, and T. N. Palmer, 2017: Stochastic parameterization and El Niño–Southern Oscillation. J. Climate, 30, 17–38, https://doi.org/10.1175/JCLI-D-16-0122.1.
Christiansen, B., 2007: Atmospheric circulation regimes: Can cluster analysis provide the number? J. Climate, 20, 2229–2250, https://doi.org/10.1175/JCLI4107.1.
Cichocki, A., R. Zdunek, A. H. Phan, and S.-I. Amari, 2009: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. 1st ed. Wiley Publishing, 504 pp.
Cutler, A., and L. Breiman, 1994: Archetypal analysis. Technometrics, 36, 338–347, https://doi.org/10.1080/00401706.1994.10485840.
Elad, M., 2010: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. 1st ed. Springer, 376 pp.
Elden, L., 2019: Matrix Methods in Data Mining and Pattern Recognition. 2nd ed. Society for Industrial and Applied Mathematics, 229 pp.
Esposito, F., 2021: A review on initialization methods for nonnegative matrix factorization: Towards omics data experiments. Mathematics, 9, 1006, https://doi.org/10.3390/math9091006.
Eugster, M. J. A., and F. Leisch, 2011: Weighted and robust archetypal analysis. Comput. Stat. Data Anal., 55, 1215–1225, https://doi.org/10.1016/j.csda.2010.10.017.
Fligner, M. A., and J. S. Verducci, 1986: Distance based ranking models. J. Roy. Stat. Soc., 48B, 359–369, https://doi.org/10.1111/j.2517-6161.1986.tb01420.x.
Franzke, C. L. E., T. J. O’Kane, D. P. Monselesan, J. S. Risbey, and I. Horenko, 2015: Systematic attribution of observed Southern Hemisphere circulation trends to external forcing and internal variability. Nonlinear Processes Geophys., 22, 513–525, https://doi.org/10.5194/npg-22-513-2015.
Fu, C., H. F. Diaz, and J. O. Fletcher, 1986: Characteristics of the response of sea surface temperature in the central Pacific associated with warm episodes of the Southern Oscillation. Mon. Wea. Rev., 114, 1716–1739, https://doi.org/10.1175/1520-0493(1986)114<1716:COTROS>2.0.CO;2.
Gan, G., C. Ma, and J. Wu, 2020: Data Clustering: Theory, Algorithms and Applications. 2nd ed. SIAM Press, 406 pp.
Gerber, S., L. Pospisil, M. Navandar, and I. Horenko, 2020: Low-cost scalable discretization, prediction, and feature selection for complex systems. Sci. Adv., 6, eaaw0961, https://doi.org/10.1126/sciadv.aaw0961.
Gillis, N., 2020: Nonnegative Matrix Factorization. Society for Industrial and Applied Mathematics, 350 pp.
Gini, C., 1921: Measurement of inequality of incomes. Econ. J., 31, 124–126, https://doi.org/10.2307/2223319.
Han, R., B. Osting, D. Wang, and Y. Xu, 2022: Probabilistic methods for approximate archetypal analysis. Inf. Inference J. IMA, 2022, iaac008, https://doi.org/10.1093/imaiai/iaac008.
Hannachi, A., 2021: Further topics. Patterns Identification and Data Mining in Weather and Climate, A. Hannachi, Ed., Springer, 367–413.
Hannachi, A., and N. Trendafilov, 2017: Archetypal analysis: Mining weather and climate extremes. J. Climate, 30, 6927–6944, https://doi.org/10.1175/JCLI-D-16-0798.1.
Hasselmann, K., 1988: PIPs and POPs: The reduction of complex dynamical systems using principal interaction and oscillation patterns. J. Geophys. Res., 93, 11 015–11 021, https://doi.org/10.1029/JD093iD09p11015.
Horenko, I., 2009: On robust estimation of low-frequency variability trends in discrete Markovian sequences of atmospheric circulation patterns. J. Atmos. Sci., 66, 2059–2072, https://doi.org/10.1175/2008JAS2959.1.
Horenko, I., 2010a: Finite element approach to clustering of multidimensional time series. SIAM J. Sci. Comput., 32, 62–83, https://doi.org/10.1137/080715962.
Horenko, I., 2010b: On clustering of non-stationary meteorological time series. Dyn. Atmos. Oceans, 49, 164–187, https://doi.org/10.1016/j.dynatmoce.2009.04.003.
Horenko, I., 2010c: On the identification of nonstationary factor models and their application to atmospheric data analysis. J. Atmos. Sci., 67, 1559–1574, https://doi.org/10.1175/2010JAS3271.1.
Hotelling, H., 1933: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol., 24, 417–441, https://doi.org/10.1037/h0071325.
Hurley, N., and S. Rickard, 2009: Comparing measures of sparsity. IEEE Trans. Inf. Theory, 55, 4723–4741, https://doi.org/10.1109/TIT.2009.2027527.
IPCC, 2013: Climate Change 2013: The Physical Science Basis. Cambridge University Press, 1535 pp., https://doi.org/10.1017/CBO9781107415324.
IPCC, 2019: The Ocean and Cryosphere in a Changing Climate. H.-O. Pörtner et al., Eds., Cambridge University Press, 766 pp., https://www.ipcc.ch/site/assets/uploads/sites/3/2022/03/SROCC_FullReport_FINAL.pdf.
Izenman, A. J., 2008: Linear dimensionality reduction. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning, 1st ed. A. J. Izenman, Ed., Springer Texts in Statistics, Springer, 195–236.
Jolliffe, I. T., 1986: Principal Component Analysis. Springer Verlag, 271 pp.
Jolliffe, I. T., and J. Cadima, 2016: Principal component analysis: A review and recent developments. Philos. Trans. Roy. Soc., 374A, 20150202, https://doi.org/10.1098/rsta.2015.0202.
Kao, H.-Y., and J.-Y. Yu, 2009: Contrasting eastern-Pacific and central-Pacific types of ENSO. J. Climate, 22, 615–632, https://doi.org/10.1175/2008JCLI2309.1.
Keller, S. M., M. Samarin, F. Arend Torres, M. Wieser, and V. Roth, 2021: Learning extremal representations with deep archetypal analysis. Int. J. Comput. Vis., 129, 805–820, https://doi.org/10.1007/s11263-020-01390-3.
Kobayashi, S., and Coauthors, 2015: The JRA-55 reanalysis: General specifications and basic characteristics. J. Meteor. Soc. Japan, 93, 5–48, https://doi.org/10.2151/jmsj.2015-001.
Komarov, O., 2021: okomarov/ginicoeff. GitHub, accessed 10 November 2021, https://github.com/okomarov/ginicoeff.
Liang, Y., A. V. Fedorov, and P. Haertel, 2021: Intensification of westerly wind bursts caused by the coupling of the Madden-Julian oscillation to SST during El Niño onset and development. Geophys. Res. Lett., 48, e2020GL089395, https://doi.org/10.1029/2020GL089395.
Lorenz, E., 1956: Empirical orthogonal functions and statistical weather prediction. MIT Department of Meteorology Statistical Forecasting Project Scientific Rep. 1, 49 pp., https://eapsweb.mit.edu/sites/default/files/Empirical_Orthogonal_Functions_1956.pdf.
Mair, S., and U. Brefeld, 2019: Coresets for archetypal analysis. 9 pp., https://papers.nips.cc/paper/2019/file/7f278ad602c7f47aa76d1bfc90f20263-Paper.pdf.
Mair, S., A. Boubekki, and U. Brefeld, 2017: Frame-based data factorizations. Int. Conf. on Machine Learning, Sydney, New South Wales, Australia, ICML, 2305–2313, http://proceedings.mlr.press/v70/mair17a/mair17a.pdf.
Mairal, J., 2014: Sparse modeling for image and vision processing. Found. Trends Comput. Graph. Vis., 8, 85–283, https://doi.org/10.1561/0600000058.
Mairal, J., 2017: SPAMS: A SPArse Modeling Software, v 2.6. http://thoth.inrialpes.fr/people/mairal/spams/doc/html/index.html.
Matthews, A. J., 2011: A multiscale framework for the origin and variability of the South Pacific convergence zone. Quart. J. Roy. Meteor. Soc., 138, 1165–1178, https://doi.org/10.1002/qj.1870.
Meilă, M., 2007: Comparing clusterings—An information based distance. J. Multivar. Anal., 98, 873–895, https://doi.org/10.1016/j.jmva.2006.11.013.
Mo, K. C., and M. Ghil, 1987: Statistics and dynamics of persistent anomalies. J. Atmos. Sci., 44, 877–902, https://doi.org/10.1175/1520-0469(1987)044<0877:SADOPA>2.0.CO;2.
Monahan, A. H., J. C. Fyfe, M. H. Ambaum, D. B. Stephenson, and G. R. North, 2009: Empirical orthogonal functions: The medium is the message. J. Climate, 22, 6501–6514, https://doi.org/10.1175/2009JCLI3062.1.
Mørup, M., and L. K. Hansen, 2012: Archetypal analysis for machine learning and data mining. Neurocomputing, 80, 54–63, https://doi.org/10.1016/j.neucom.2011.06.033.
Motevalli-Soumehsaraei, B., and A. Barnard, 2019: Archetypal analysis package, version 1. CSIRO, https://doi.org/10.25919/5d3958889f7ff.
Newman, M., and Coauthors, 2016: The Pacific decadal oscillation, revisited. J. Climate, 29, 4399–4427, https://doi.org/10.1175/JCLI-D-15-0508.1.
Nguyen, L. H., and S. Holmes, 2019: Ten quick tips for effective dimensionality reduction. PLOS Comput. Biol., 15, e1006907, https://doi.org/10.1371/journal.pcbi.1006907.
North, G. R., 1984: Empirical orthogonal functions and normal modes. J. Atmos. Sci., 41, 879–887, https://doi.org/10.1175/1520-0469(1984)041<0879:EOFANM>2.0.CO;2.
O’Kane, T. J., J. S. Risbey, C. Franzke, I. Horenko, and D. P. Monselesan, 2013: Changes in the metastability of the midlatitude Southern Hemisphere circulation and the utility of nonstationary cluster analysis and split-flow blocking indices as diagnostic tools. J. Atmos. Sci., 70, 824–842, https://doi.org/10.1175/JAS-D-12-028.1.
O’Kane, T. J., J. S. Risbey, D. P. Monselesan, I. Horenko, and C. L. E. Franzke, 2016: On the dynamics of persistent states and their secular trends in the waveguides of the Southern Hemisphere troposphere. Climate Dyn., 46, 3567–3597, https://doi.org/10.1007/s00382-015-2786-8.
O’Kane, T. J., D. P. Monselesan, J. S. Risbey, I. Horenko, and C. L. E. Franzke, 2017: On memory, dimension, and atmospheric teleconnections. Math. Climate Wea. Forecasting, 3, 1–27, https://doi.org/10.1515/mcwf-2017-0001.
Quinn, C., D. Harries, and T. J. O’Kane, 2021: Dynamical analysis of a reduced model for the North Atlantic Oscillation. J. Atmos. Sci., 78, 1647–1671, https://doi.org/10.1175/JAS-D-20-0282.1.
Rasmusson, E. M., and T. H. Carpenter, 1982: Variations in tropical sea surface temperature and surface wind fields associated with the Southern Oscillation/El Niño. Mon. Wea. Rev., 110, 354–384, https://doi.org/10.1175/1520-0493(1982)110<0354:VITSST>2.0.CO;2.
Reynolds, R. W., T. M. Smith, C. Liu, D. B. Chelton, K. S. Casey, and M. G. Schlax, 2007: Daily high-resolution-blended analyses for sea surface temperature. J. Climate, 20, 5473–5496, https://doi.org/10.1175/2007JCLI1824.1.
Richardson, D., A. S. Black, D. P. Monselesan, T. S. Moore, J. S. Risbey, A. Schepen, D. T. Squire, and C. R. Tozer, 2021: Identifying periods of forecast model confidence for improved subseasonal prediction of precipitation. J. Hydrometeor., 22, 371–385, https://doi.org/10.1175/JHM-D-20-0054.1.
Risbey, J. S., T. J. O’Kane, D. P. Monselesan, C. Franzke, and I. Horenko, 2015: Metastability of Northern Hemisphere teleconnection modes. J. Atmos. Sci., 72, 35–54, https://doi.org/10.1175/JAS-D-14-0020.1.
Risbey, J. S., D. P. Monselesan, A. S. Black, T. S. Moore, D. Richardson, D. T. Squire, and C. R. Tozer, 2021: The identification of long-lived Southern Hemisphere flow events using archetypes and principal components. Mon. Wea. Rev., 149, 1987–2010, https://doi.org/10.1175/MWR-D-20-0314.1.
Seitola, T., V. Mikkola, J. Silen, and H. Järvinen, 2014: Random projections in reducing the dimensionality of climate simulation data. Tellus, 66A, 25274, https://doi.org/10.3402/tellusa.v66.25274.
Seth, S., and M. J. A. Eugster, 2016: Probabilistic archetypal analysis. Mach. Learn., 102, 85–113, https://doi.org/10.1007/s10994-015-5498-8.
Steinschneider, S., and U. Lall, 2015: Daily precipitation and tropical moisture exports across the eastern United States: An application of archetypal analysis to identify spatiotemporal structure. J. Climate, 28, 8585–8602, https://doi.org/10.1175/JCLI-D-15-0340.1.
Suleman, A., 2017a: On ill-conceived initialization in archetypal analysis. Adv. Data Anal. Classif., 11, 785–808, https://doi.org/10.1007/s11634-017-0303-0.
Suleman, A., 2017b: Validation of archetypal analysis. Int. Conf. on Fuzzy Systems, Naples, Italy, IEEE, 1–6, https://doi.org/10.1109/FUZZ-IEEE.2017.8015385.
Takaya, K., and H. Nakamura, 2001: A formulation of a phase-independent wave-activity flux for stationary and migratory quasigeostrophic eddies on a zonally varying basic flow. J. Atmos. Sci., 58, 608–627, https://doi.org/10.1175/1520-0469(2001)058<0608:AFOAPI>2.0.CO;2.
Takens, F., 1981: Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, Warwick 1980, D. Rand and L.-S. Young, Eds., Vol. 898, Springer, 366–381.
Thurau, C., K. Kersting, and C. Bauckhage, 2009: Convex non-negative matrix factorization in the wild. Ninth IEEE Int. Conf. on Data Mining, Miami Beach, FL, IEEE, 523–532, https://doi.org/10.1109/ICDM.2009.55.
Thurau, C., K. Kersting, M. Wahabzada, and C. Bauckhage, 2011: Convex non-negative matrix factorization for massive datasets. Knowl. Inf. Syst., 29, 457–478, https://doi.org/10.1007/s10115-010-0352-6.
Trenberth, K. E., 1976: Spatial and temporal variations of the Southern Oscillation. Quart. J. Roy. Meteor. Soc., 102, 639–653, https://doi.org/10.1002/qj.49710243310.
Trenberth, K. E., and L. Smith, 2006: The vertical structure of temperature in the tropics: Different flavors of El Niño. J. Climate, 19, 4956–4973, https://doi.org/10.1175/JCLI3891.1.
Trendafilov, N., and M. Gallo, 2021: Data analysis on simplexes. Multivariate Data Analysis on Matrix Manifolds, N. Trendafilov and M. Gallo, Eds., Springer Series in the Data Sciences, Springer, 373–402.
Vinué, G., 2017: Anthropometry: An R package for analysis of anthropometric data. J. Stat. Software, 77, 1–39, https://doi.org/10.18637/jss.v077.i06.
Wang, B., 1995: Interdecadal changes in El Niño onset in the last four decades. J. Climate, 8, 267–285, https://doi.org/10.1175/1520-0442(1995)008<0267:ICIENO>2.0.CO;2.
Yu, H.-F., N. Rao, and I. S. Dhillon, 2016: Temporal regularized matrix factorization for high-dimensional time series prediction. Advances in Neural Information Processing Systems 29, Barcelona, Spain, NIPS, 847–855, https://proceedings.neurips.cc/paper/2016/hash/85422afb467e9456013a2a51d4dff702-Abstract.html.
Zonoobi, D., A. A. Kassim, and Y. V. Venkatesh, 2011: Gini index as sparsity measure for signal reconstruction from compressive samples. IEEE J. Sel. Top. Signal Process., 5, 927–932, https://doi.org/10.1109/JSTSP.2011.2160711.