The Identification of Long-Lived Southern Hemisphere Flow Events Using Archetypes and Principal Components

James S. Risbey, CSIRO Oceans and Atmosphere, Hobart, Tasmania, Australia

Didier P. Monselesan, CSIRO Oceans and Atmosphere, Hobart, Tasmania, Australia

Amanda S. Black, CSIRO Oceans and Atmosphere, Hobart, Tasmania, Australia

Thomas S. Moore, CSIRO Oceans and Atmosphere, Hobart, Tasmania, Australia

Doug Richardson, CSIRO Oceans and Atmosphere, Hobart, Tasmania, Australia

Dougal T. Squire, CSIRO Oceans and Atmosphere, Hobart, Tasmania, Australia

Carly R. Tozer, CSIRO Oceans and Atmosphere, Hobart, Tasmania, Australia


Abstract

From time to time atmospheric flows become organized and form coherent long-lived structures. Such structures could be propagating, quasi-stationary, or recur in place. We investigate the ability of principal components analysis (PCA) and archetypal analysis (AA) to identify long-lived events, excluding propagating forms. Our analysis is carried out on the Southern Hemisphere midtropospheric flow represented by geopotential height at 500 hPa (Z500). The leading basis patterns of Z500 for PCA and AA are similar and describe structures representing (or similar to) the southern annular mode (SAM) and Pacific–South American (PSA) pattern. Long-lived events are identified here from sequences of 8 days or longer where the same basis pattern dominates for PCA or AA. AA identifies more long-lived events than PCA using this approach. The most commonly occurring long-lived event for both AA and PCA is the annular SAM-like pattern. The second most commonly occurring event is the PSA-like Pacific wave train for both AA and PCA. For AA the flow at any given time is approximated as weighted contributions from each basis pattern, which lends itself to metrics for discriminating among basis patterns. These show that the longest long-lived events are in general better expressed than shorter events. Case studies of long-lived events featuring a blocking structure and an annular structure show that both PCA and AA can identify and discriminate the dominant basis pattern that most closely resembles the flow event.

© 2021 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: James Risbey, james.risbey@csiro.au

1. Introduction

The passage of weather systems in the extratropics generates variability in day-to-day conditions at the surface. Weather systems are a mix of transient (mobile) structures such as fronts, and quasi-stationary structures such as blocking highs. These structures induce meridional flow, which transports heat and moisture and generates the variability in surface conditions. When meridional flow patterns persist for long periods of time this variability is often manifest as extremes of temperature (heatwaves, cold events) and rainfall (wet or dry spells). For example, extreme heatwaves in Europe and Russia have been associated with long-lived blocking events (Dole et al. 2011; Lau and Kim 2012; Schneidereit et al. 2012); floods in Pakistan have been associated with a persistent Rossby wave train (Lau and Kim 2012); and droughts have been associated with persistent or recurring ridges/blocks (Teng and Branstator 2017). This study focuses on long-lived atmospheric flow events, which are interesting because they are unusual and because they can generate extreme conditions at the surface.

The time scales associated with synoptic variability are typically between about 2 and 10 days (Trenberth and Mo 1985; Dole 1986). Individual weather systems (highs and lows) are recognizable for multiple days. Individual systems may be embedded in organized large-scale patterns. Such patterns can be transient or quasi-stationary and may manifest as single features or as a recurrence of similar features. Organized patterns can span the hemisphere as in circumglobal wave trains (Branstator 2002), or may express in particular sectors of the hemisphere as in regional wave trains (Wirth et al. 2018), blocks, or large-scale troughs (Black et al. 2021). Most extratropical atmospheric flow patterns are recognizable as distinct patterns for typically about a week, but this period can be considerably longer. Some flow features can persist for a month or more, particularly in the Northern Hemisphere (Haines 1994).

Individual long-lived flow events can be obvious, such as when a block sits or recurs in place through much of a month and generates a run of unseasonal weather. However, not all long-lived flow events are obvious at the time, and even when they are, that does not guarantee that they can be readily quantified. Ideally, we would like some way to identify and quantify all long-lived flow events. That would enable us to better understand their variability, their surface impacts, and the conditions that give rise to them. The difficulty is that these long-lived events take a variety of different forms with different structures and different spatiotemporal signatures on the flow (Mo and Ghil 1987; Lau et al. 1994).

Sets of methods have been developed to identify specific types of long-lived events. Perhaps the canonical long-lived flow event is the persistent block. Methods used to characterize individual blocking events include identifying persistent positive height anomalies (Trenberth and Mo 1985; Dole 1986; Renwick 2005), and indices based around a splitting or reversal of the midtropospheric flow (Pook and Gibson 1999; Tibaldi and Molteni 2002; Pelly and Hoskins 2003). Another targeted long-lived flow type is Rossby wave trains. Rossby wave train events have been identified using methods to track Rossby wave packets (Wirth et al. 2018) and to describe the envelope of recurrent Rossby wave patterns (Röthlisberger et al. 2019).

Long-lived flow events have also been studied in association with the low-frequency modes of variability in the atmosphere (Wallace and Gutzler 1981; Haines 1994). A good example of this is the annular modes: northern annular mode (NAM) (Lorenz 1951) and southern annular mode (SAM) (Thompson and Wallace 2000). Indices of the annular modes have been developed to track their evolution in real time and to allow forecasts of their potential evolution over coming weeks. The NAM and SAM indices are also used to identify specific annular mode events. These events tend to signify the strength of coupling between mid- and high latitudes (Spensberger et al. 2020). When such events are particularly long lived, they are sometimes ascribed as causes of specific seasonal anomalies over the midlatitude continents, for example, cold winters (Cohen et al. 2010) and hot, dry springs (Lim et al. 2019).

Another set of modes of variability that are similarly tracked using indices are the Pacific–North America (PNA) pattern (Wallace and Gutzler 1981) and Pacific–South America (PSA) pattern (Mo and Ghil 1987; Lau et al. 1994; Mo and Higgins 1998). These patterns reflect characteristics of wave train structures and blocking (in the nodes of the wave train) (Mo and Ghil 1987; Risbey et al. 2015; O’Kane et al. 2017). PNA events have been associated with drought in North America (Trenberth et al. 1988; Teng and Branstator 2017). PSA events have been associated with severe frost seasons in South America (Müller and Ambrizzi 2007) and extreme cold (Risbey et al. 2019) and rain (Tozer et al. 2018) in Australia.

Each of these low-frequency modes of variability (NAM, SAM, PNA, PSA) has been quantified using more than one type of index. They have typically been defined in simple form using point-based data (latitude–longitude coordinates or station locations), and more generally through principal components analysis (PCA) of atmospheric height or streamfunction fields. The use of PCA to construct indices of modes of variability is common in atmospheric sciences (North et al. 1982; Hannachi et al. 2007). The annular modes are typically defined as the leading principal component (PC) of geopotential height (at a level between 500 and 1000 hPa) in the respective hemispheres (Thompson and Wallace 2000). The PNA is typically defined as the second leading mode of rotated principal components of Northern Hemisphere geopotential height at 500 or 700 hPa (Mo and Livezey 1986; Barnston and Livezey 1987). The PSA is defined variously as the second and third PCs of 500-hPa geopotential height (Mo and Paegle 2001; O’Kane et al. 2017), as the leading PCs of 200-hPa eddy streamfunction (Mo and Higgins 1998; Mo and Paegle 2001), and via rotated PCs of 200-hPa streamfunction (Lau et al. 1994).

The use of PCA to define these modes of variability and their indices follows from the role of PCA in maximizing the variance represented in successive basis functions. PCA is an efficient means for decomposing the variance in the flow, and the leading PCA modes have spatial structures that are broadly consistent with those obtained using other methods (Mo and Ghil 1987; Risbey et al. 2015). In short, PCA is one of the primary tools used to generate indices of the low-frequency variability in the atmosphere, and these indices are used to track the evolution of the low-frequency modes in observations and model forecasts. For example, the NOAA Climate Prediction Center provides real-time updates of a range of climate teleconnection modes based on PCA. The time series of these PC indices are also used to identify and relate extreme or long-lived expressions of these modes to extreme surface conditions or seasonal anomalies (Cohen et al. 2010; Hendon et al. 2014). The association between runs of high values of PC indices and the identification of extreme events is routine, but mostly informal in the literature. By this we mean that the event is “identified” by visual inspection of the PC time series without formal measures to identify it. We explore here some of the issues in formalizing that link.

Our interest is in examining how well PCA-based indices capture individual long-lived flow events, what these events look like, what their surface signatures are, and how well PCA discriminates between different flow types at any point in time. We ask these questions for the case of long-lived Southern Hemisphere flow events, and so the relevant PCA modes here are the SAM (PC1) and PSA (PC2, PC3). As a point of reference, we will compare and contrast the PCA-based indices with those from an alternative means of decomposing the flow. There are many such alternatives that have been deployed in climate analysis, but the one selected here is archetypal analysis (AA) (Cutler and Breiman 1994). As such, we only consider PCA and AA here. Our rationale for this is that PCA is perhaps the most commonly used method, and we want to contrast it with a method (AA) that is formally similar, but uses a very different optimization principle. We do not argue that either method is appropriate or best for use with atmospheric data. Rather, we explore how well they work in identifying long-lived flow events. There are many issues with the use of these methods and we describe some of them below.

Archetypal analysis is not a new technique, but is relatively new to climate analysis (Steinschneider and Lall 2015; Hannachi and Trendafilov 2017; Knighton et al. 2019; Richardson et al. 2021). One advantage of AA for comparison with PCA is that it frames the general problem in the same formalism: a decomposition of the data, $x_i(s)$ (time dimension i = 1, …, t; space dimension s), into an optimal linear combination of basis functions of space $z_k(s)$ and expansion functions of time $\alpha_{ik}$. That is, $x_i(s) = \sum_{k=1}^{p} \alpha_{ik} z_k(s)$, for truncation at p basis functions. The differences between PCA and AA arise in the constraints and goal in determining the optimal set of basis functions. For PCA the constraints provide solutions that maximize the successive amount of variance explained by each basis function. For AA a different set of constraints is used to find solutions that best approximate the convex hull of the data, X. Since AA is less familiar to the climate literature, we provide an explanation of this statement in describing the method below.

In the remainder of the paper we provide context for the use of AA with climate data, we describe the Southern Hemisphere data used for the analysis, and provide more detail on the differences between PCA and AA. We describe the leading basis functions that emerge from use of these two decompositions, and our methods for examining long-lived events in the PCA and AA settings. We generate composites of long-lived PCA and AA events and their surface signatures. We examine the problem of discriminating between basis functions at given points in time, which is a part of the problem of identifying individual events. Finally, to see how PCA and AA perform in identifying events in more detail, we examine some select case studies of long-lived flow events in the Southern Hemisphere.

2. Data and dimensionality

The midtropospheric flow is characterized here by the geopotential height at 500 hPa (Z500). The source of Z500 is JRA-55 (Kobayashi et al. 2015), which provides a high-resolution (nominally 0.5625° regridded to 1.25° latitude–longitude), four-dimensional variational reanalysis from 1958 to the present. Anomalies of the flow are calculated by removing the daily mean (calculated over 1958–2018) for each day of the year. The daily anomalies are denoted Z500. Surface fields for 2-m temperature T2m are also taken from the JRA-55.

The data for our analysis, X, consist of the sequence of spatial fields (s = 288 longitudes × 73 latitudes = 21 024 points) of Z500 over the Southern Hemisphere. The sequence contains t = 22 280 days of daily fields. The total dimension of the data, X = X[s, t], is s × t, or nearly half a billion, which imposes a large computational burden on optimization methods applied to the data. This is relevant to the archetype analysis here. As such, we chose to reduce the dimensionality of the data by first applying PCA to the data. The PCA solution method is via singular value decomposition (svd) of X, $\mathrm{svd}(X[s, t]) = U[s, s]\,\Lambda[s, t]\,V^{T}[t, t] = [\mathrm{EOF}, \lambda, \mathrm{PC}]$, where U is the matrix form of the EOF patterns, Λ is the eigenmatrix, whose diagonal elements are the square root of the explained variance, and $V^{T}$ is the transpose of the principal component matrix. We represent X[s, t] by $X_r$, the leading r PCs of the data, accounting for 99% of the variance in the data. For our data, r = 155 is needed to retain 99%. At each time instant i, $x_{r,i} = \sum_{k=1}^{r} \lambda_k\, \mathrm{PC}_{k,i}\, \mathrm{EOF}_k$. The data reduction step reduces the spatial dimension of the data from 21 024 to 155, to give a reduced dimensionality of $s_r \times t$ = 155 × 22 280 = 3 453 400. In the archetype analysis that follows, our representation of X for clustering purposes is taken to be this reduced form of the data consisting of the leading 155 PCs. For the PCA event analysis we chose to retain the first 5 PCs from the svd of X as basis functions (see section 5).
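As an illustration of the reduction step, the following sketch applies an SVD to a small synthetic space-by-time matrix and keeps the fewest modes that retain 99% of the variance. The function name `truncate_by_variance`, the synthetic data, and the array sizes are assumptions made for illustration; this is not the JRA-55 data or the code used in this study.

```python
import numpy as np

def truncate_by_variance(X, frac=0.99):
    """Return (U_r, lam_r, Vt_r) for the fewest modes explaining `frac` of the variance.

    X has shape (s, t): space by time, with the time mean already removed.
    """
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    explained = lam**2 / np.sum(lam**2)            # variance fraction per mode
    r = int(np.searchsorted(np.cumsum(explained), frac)) + 1
    r = min(r, lam.size)                           # guard against round-off at frac ~ 1
    return U[:, :r], lam[:r], Vt[:r, :]

# Synthetic stand-in for the anomalies (assumption): three strong modes plus weak noise.
rng = np.random.default_rng(0)
s, t = 60, 400
X = 5.0 * rng.standard_normal((s, 3)) @ rng.standard_normal((3, t))
X += 0.1 * rng.standard_normal((s, t))
X -= X.mean(axis=1, keepdims=True)                 # remove the time mean at each point

U_r, lam_r, Vt_r = truncate_by_variance(X, 0.99)
X_r = U_r @ np.diag(lam_r) @ Vt_r                  # reduced-rank reconstruction
retained = np.sum(X_r**2) / np.sum(X**2)           # fraction of variance kept
print(U_r.shape[1], float(retained))
```

The reduced representation used for the archetype step corresponds to `np.diag(lam_r) @ Vt_r`, which has r rows rather than s.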

3. Flow decomposition methods

a. The general problem

All methods to cluster or decompose the atmospheric flow entail a range of choices and compromises. Methods to decompose the flow typically seek to find some small set of basis patterns of the flow, (z1, …, zp) that describe its dominant characteristics. The form and properties of z are determined by the constraints applied in solving for z. The basis patterns provide a way to categorize the flow, but the resulting categories and their structure are themselves influenced by the filtering choices implicit in every method (Monahan et al. 2009).

It is convenient to assume that flow events in the atmosphere generally correspond to particular recurring modes or patterns of variability (such as the SAM and PSA in the Southern Hemisphere). Further, it is hoped that the basis patterns from a flow decomposition might broadly represent these modes. Analysis would be difficult if every long-lived event were uniquely different and not broadly classifiable—though this is at least in part the case. The classification or clustering of events is challenging because the atmosphere (and the reanalyses used to represent it) is very high dimensional (Christiansen 2007). Instances of daily weather patterns in reanalyses tend to be sparse and dissimilar in many ways (Van Den Dool 1994; Lorenz 2004). This dissimilarity follows the curse of dimensionality. As the dimensionality of our data, X, increases, the distance between points in X increases. If distance is taken to be a measure of similarity, then the dissimilarity of points in X increases with increasing dimensionality (Altman and Krzywinski 2018). This increase in dissimilarity is problematic for all classification approaches based on assigning similarity according to distance from a set of representative basis patterns.
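The growth of distance with dimensionality can be seen in a small numerical sketch. It assumes independent, unit-variance coordinates, which is a simplification: the leading PCs of real Z500 data have decaying variance, so the growth would be slower there. The function name and sample sizes are illustrative only.

```python
import numpy as np

def mean_pairwise_distance(n_points, dim, rng):
    """Mean Euclidean distance between all distinct pairs of standard-normal points."""
    pts = rng.standard_normal((n_points, dim))
    sq = np.sum(pts**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * pts @ pts.T   # squared distances
    d = np.sqrt(np.clip(d2, 0.0, None))                  # clip tiny negative round-off
    upper = np.triu_indices(n_points, k=1)               # each pair counted once
    return float(d[upper].mean())

rng = np.random.default_rng(1)
dims = [3, 30, 155]
mean_dist = {d: mean_pairwise_distance(200, d, rng) for d in dims}
for d in dims:
    print(d, mean_dist[d])
```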

Both PCA and AA provide statistical representations of some properties of the atmospheric data, but these properties do not guarantee that either method performs well at identifying long-lived flow events. In particular, these methods have not been designed to identify the dynamical modes of the system. For PCA it has been shown that EOFs generally do not correspond to the true dynamical modes because the atmosphere is nonlinear, nonconservative, and its modes are generally nonnormal (North 1984; Mo and Ghil 1987; Hasselmann 1988; Monahan et al. 2009; Hassanzadeh and Kuang 2016; Sheshadri and Plumb 2017). Both PCA and AA are insensitive to the time ordering of data in that their basis patterns are the same even if the time dimension is shuffled. By contrast, real atmospheric flows are autocorrelated and sensitive to time ordering. Methods incorporating time dynamics such as principal oscillation patterns (Hasselmann 1988) may be better suited to identification of dynamical modes (Sheshadri and Plumb 2017). However, such methods may not capture well the quasi-stationary aspects of the flow (Dole 1986), which can be important for long-lived events (see section 5).
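The insensitivity of PCA to time ordering can be checked directly. The sketch below builds a synthetic autocorrelated series (an AR(1) process with an assumed coefficient of 0.8, not reanalysis data), shuffles its time dimension, and compares the leading EOFs and singular values before and after shuffling.

```python
import numpy as np

rng = np.random.default_rng(2)
s, t = 20, 300

# Synthetic autocorrelated data (assumption): AR(1) in time at each spatial point.
noise = rng.standard_normal((s, t))
X = np.empty((s, t))
X[:, 0] = noise[:, 0]
for i in range(1, t):
    X[:, i] = 0.8 * X[:, i - 1] + noise[:, i]
X *= np.linspace(3.0, 1.0, s)[:, None]     # unequal variances across points
X -= X.mean(axis=1, keepdims=True)

perm = rng.permutation(t)                  # shuffle the time dimension
U, lam, _ = np.linalg.svd(X, full_matrices=False)
U_sh, lam_sh, _ = np.linalg.svd(X[:, perm], full_matrices=False)

# EOFs are defined up to sign, so compare |u_k . u_k_shuffled| with 1.
alignment = np.abs(np.sum(U[:, :3] * U_sh[:, :3], axis=0))
print(alignment)
```

Only the ordering of the PC time series changes under the shuffle; the spatial patterns and explained variances do not.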

It is still an open question whether classifiable flow types with an underlying independent dynamical basis exist, and whether PCA or AA would capture them if they did. We can hypothesize that a method based on capturing successive amounts of variance (PCA) ought to reflect some of the more commonly expressed low-frequency variability. Alternatively, a method based on finding “extreme” archetypes in the data (AA) may be appealing for climate analysis if the extremes perhaps better represent the constituent modes. For example, one might think of extremes as purer expressions of the flow with higher signal to noise. Whether this is the case is tested empirically here by using the basis patterns from each method to try to identify long-lived events.

b. Principal component analysis

Since PCA is well known in the climate literature and described above, we provide a brief description here in terms that make the formal equivalence with AA clearer. Both PCA and AA methods can be posed, following Cutler and Breiman (1994), in general form as finding the linear combination of basis patterns $\sum_k \alpha_{ik} z_k$ that best approximates each instance $x_i$ of X as the minimizer of
$$\sum_i \Big\| x_i - \sum_k \alpha_{ik} z_k \Big\|^2, \tag{1}$$
for k = 1, …, p basis patterns.

In the case of PCA, z1, …, zp are taken to be orthonormal and the αik are unconstrained real coefficients, yielding basis vectors as solutions for z that maximize the successive amount of variance explained by each orthonormal mode (Jolliffe and Cadima 2016). While PCA lends itself to very efficient solution methods, the resulting patterns can be somewhat difficult to interpret physically, because each pattern does not need to resemble instances of the data, xi, nor do the xi need to be approximated by a mixture of the basis patterns (Cutler and Breiman 1994). The eigenvectors of PCA point in the direction of maximal variation, but their magnitudes are not physically related to the magnitudes of xi (Mørup and Hansen 2012).

c. Archetypal analysis

For archetypal analysis, two key conditions are applied on the selection of the basis patterns (here “archetypes”), z. First, the archetype patterns are mixtures (linear combinations) of the data values. That is,
$$z_k = \sum_{i=1}^{t} \beta_{ki} x_i, \qquad k = 1, \ldots, p, \tag{2}$$
where $\beta_{ki} \ge 0$ and $\sum_i \beta_{ki} = 1$. This feature ensures that the magnitudes of the archetypes are interpretable in terms of the magnitudes of $x_i$ and have the same physical units. The second feature of archetypal analysis is that each data point $x_i$ can be approximated as a combination of the archetypes:
$$x_i \approx \sum_{k=1}^{p} \alpha_{ik} z_k, \tag{3}$$
subject to minimization of Eq. (1), where $\alpha_{ik} \ge 0$ and $\sum_k \alpha_{ik} = 1$. The latter two conditions mean that each $x_i$ can be interpreted as a mixture of the archetypes. The coefficients, $\alpha_{ik}$, can be interpreted as probabilities, $p(x_i|z_k)$, indicating membership to classes represented by the archetypes (Bauckhage and Thurau 2009).
The archetype algorithm (Cutler and Breiman 1994) consists of minimizing the residual sum of squares, RSS:
$$\mathrm{RSS} = \sum_{i=1}^{n} \Big\| x_i - \sum_{k=1}^{p} \alpha_{ik} \sum_{j=1}^{n} \beta_{kj} x_j \Big\|^2 = \| X - XBA \|_F^2, \tag{4}$$
subject to constraints $\alpha_{ik} \ge 0$ ($A \ge 0$), $\sum_k \alpha_{ik} = 1$, $\beta_{kj} \ge 0$ ($B \ge 0$), $\sum_j \beta_{kj} = 1$. Here n = t is the number of daily fields, A[p, t] and B[t, p] are matrix forms of α and β (Bauckhage and Thurau 2009), and the subscript F denotes the Frobenius norm $\|M\|_F$. There is no closed form solution to Eq. (4). Given an initial guess of the archetypes, $z_k$, the estimation algorithm alternates between finding the best α for given archetypes z and finding the best archetypes z for given α. Our archetype estimation uses the Matlab code provided by Mørup and Hansen (2012) with 1000 seeds to find the seed that best minimizes the RSS. The use of many seeds also helps to reduce the likelihood that our solutions are trapped in local (rather than global) minima. The initial seed uses the “FurthestSum” algorithm of Mørup and Hansen (2012), and we test that any final iteration is at least better than this.
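The alternating structure of the estimation can be sketched in a few lines. The code below is a minimal projected-gradient illustration of the minimization of Eq. (4) under the stated constraints; it is not the Matlab implementation of Mørup and Hansen (2012) used here, it uses a random start rather than “FurthestSum”, and it runs a single seed. The function names, step-size rule, and synthetic triangle data are all assumptions.

```python
import numpy as np

def project_to_simplex(M):
    """Euclidean projection of each column of M onto {v : v >= 0, sum(v) = 1}."""
    k, n = M.shape
    srt = -np.sort(-M, axis=0)                        # columns sorted descending
    css = np.cumsum(srt, axis=0) - 1.0
    idx = np.arange(1, k + 1)[:, None]
    cond = srt - css / idx > 0
    rho = k - 1 - np.argmax(cond[::-1, :], axis=0)    # last row where cond holds
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(M - theta, 0.0)

def archetypal_analysis(X, p, n_iter=200, seed=0):
    """Alternate projected-gradient updates of A and B for min ||X - X B A||_F^2.

    X: (d, n) data, one sample per column. Returns A (p, n), B (n, p), RSS history.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    B = project_to_simplex(rng.random((n, p)))
    A = project_to_simplex(rng.random((p, n)))
    G = X.T @ X                                       # Gram matrix (n, n)
    g_norm = np.linalg.norm(G, 2)
    history = []
    for _ in range(n_iter):
        # Best-effort update of the weights A for fixed archetypes Z = X B.
        Z = X @ B
        ZtZ = Z.T @ Z
        step = 1.0 / (2.0 * np.linalg.norm(ZtZ, 2) + 1e-12)
        A = project_to_simplex(A - step * 2.0 * (ZtZ @ A - Z.T @ X))
        # Update of B (and hence the archetypes) for fixed A.
        AAt = A @ A.T
        step = 1.0 / (2.0 * g_norm * np.linalg.norm(AAt, 2) + 1e-12)
        B = project_to_simplex(B - step * 2.0 * (G @ B @ AAt - G @ A.T))
        history.append(float(np.linalg.norm(X - X @ B @ A) ** 2))
    return A, B, history

# Synthetic example (assumption): 300 points drawn inside a triangle in 2D.
rng = np.random.default_rng(3)
corners = np.array([[0.0, 4.0, 2.0], [0.0, 0.0, 3.0]])   # (2, 3)
weights = rng.dirichlet(np.ones(3), size=300).T           # (3, 300)
X = corners @ weights
A, B, rss = archetypal_analysis(X, p=3, n_iter=300)
Z = X @ B                                                 # archetypes, Eq. (2)
print(rss[0], rss[-1])
```

Because each step uses a step size of one over the Lipschitz constant of the corresponding gradient, the RSS does not increase from one iteration to the next, but a single random start can still end in a local minimum, which is the reason for using many seeds.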

d. Reduced space archetype analysis

In the reduced space archetype analysis, the reduced form of the data, $X_r$, is used. In matrix form, $X_r[s, t] = U[s, r]\,\Lambda[r, r]\,V^{T}[r, t]$, where r = 155 ≤ min(s, t). The reduced form of Eq. (4) is then
$$\| X_r - X_r BA \|_F^2 = \| U\Lambda V^{T} - U\Lambda V^{T} BA \|_F^2 = \| U(\Lambda V^{T} - \Lambda V^{T} BA) \|_F^2. \tag{5}$$
Since the Frobenius norm $\|M\|_F$ is used in Eq. (5), we can exploit the fact that it is invariant in operations with unitary matrices U such that $\|MU\|_F = \|UM\|_F = \|M\|_F$. The truncated EOF matrix U has orthonormal columns ($U^{T}U = I$, where I is the identity matrix), so left multiplication by U leaves the norm unchanged. As such, the reduced space archetype problem reduces to solution of
$$\| \Lambda V^{T} - \Lambda V^{T} BA \|_F^2. \tag{6}$$
This means that the spatial dependency through the EOF patterns drops out, and in the reduced space analysis only the eigenmatrix and principal components remain in $X_r$. In the reduced space of $X_r$, the archetypes are given by
$$Z_r[r, p] = \Lambda[r, r]\,V^{T}[r, t]\,B[t, p]. \tag{7}$$
In the original data space the archetypes then take the following form:
$$Z[s, p] = U[s, r]\,Z_r[r, p]. \tag{8}$$
We are confident that the reduction of the data in this way does not bias the outcome of the archetypal analysis. We checked this by performing archetypal analysis for p = 4 archetypes on the full dataset X (without reduction), and we obtained very similar results to the analysis on the reduced version of the data.
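The equivalence of the full-space and reduced-space objectives can be verified numerically. The sketch below uses a small random matrix in place of the Z500 data and arbitrary feasible A and B (both assumptions), and checks that the residuals in Eqs. (5) and (6) agree and that Eq. (8) recovers the archetypes in the original space.

```python
import numpy as np

rng = np.random.default_rng(4)
s, t, r, p = 40, 120, 6, 3                  # small illustrative sizes (assumption)

X = rng.standard_normal((s, t))
U, lam, Vt = np.linalg.svd(X, full_matrices=False)
U, Lam, Vt = U[:, :r], np.diag(lam[:r]), Vt[:r, :]
X_r = U @ Lam @ Vt                          # reduced form of the data

# Any feasible A (p, t) and B (t, p), with non-negative columns summing to 1.
A = rng.random((p, t))
A /= A.sum(axis=0)
B = rng.random((t, p))
B /= B.sum(axis=0)

rss_full = np.linalg.norm(X_r - X_r @ B @ A, "fro") ** 2               # Eq. (5)
rss_reduced = np.linalg.norm(Lam @ Vt - Lam @ Vt @ B @ A, "fro") ** 2  # Eq. (6)

Z_r = Lam @ Vt @ B                          # archetypes in the reduced space, Eq. (7)
Z = U @ Z_r                                 # archetypes in the original space, Eq. (8)
print(rss_full, rss_reduced)
```

The optimization therefore only needs to handle matrices with r rows, and the spatial patterns are restored by a single multiplication with U at the end.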

4. Archetype illustration

a. Severe spatial truncation example

The archetype algorithm generates iterative fits of a convex hull to the data, X (or Xr), subject to the constraints above. By “convex hull” we mean the shape of the smallest convex set that contains X. This process is difficult to visualize for even the reduced form of our data here using 155 PCs. For the purpose of illustration of the method only, let us suppose for the moment that our spatial data had been severely truncated to just the leading three PCs (instead of 155). That is, our spatial dimension is now 3 and we retain our full time dimension of daily data corresponding to 22 280 days, and denote this severely spatially truncated version of the data as X3.

We can now “visualize” X3 by plotting the location of each daily instance of X3 on the three-dimensional PC axes shown in Fig. 1. The cloud of points in X3 can be contained within a convex hull shown by the black polyhedron. The 184 vertices of the convex hull are all points in X3. These points are formally “extreme” in the sense that they do not lie on any open line segment joining any two points in X3. We can also think of them as “extreme” in that the set of points on the convex hull encloses within it all other points in X3.

Fig. 1.

Convex hull of Z500 truncated to the first three PCs. The cloud of points in blue represents daily Z500 for all days in the reanalysis, where each value is represented by the leading three PCs of the data. The black lines forming the polyhedron connect the vertices of a convex hull around all the blue points. The green tetrahedron is the AA approximation of the polyhedral convex hull for k = 4 archetypes. The light blue circles at the vertices of the tetrahedron are the locations of the four archetypes.

Citation: Monthly Weather Review 149, 6; 10.1175/MWR-D-20-0314.1

For this illustration we carried out an archetypal analysis on X3 using p = 4 archetypes. The converged AA solution is the best approximation to the original 184 vertex convex hull using just four points (archetypes) that iteratively minimizes the residuals of the points outside the fitted convex hull. This solution is shown by the green tetrahedron in Fig. 1. The four archetypes selected are shown by the larger blue circles at the vertices of the tetrahedron. These points do not necessarily correspond to actual points in X3. These archetypal points are also “extreme” in that they sit on (an approximation of) the convex hull of X3. In this illustration the archetypes are on the outer edges of the three-dimensional space of X3, whereas in our actual analysis they are on the convex hull of a 155-dimensional polytope feature space, X155.

This simple illustration suggests that it is a challenging exercise to represent distinct properties in the cloud of points in Fig. 1 by a handful of basis functions. For PCA, a selection of three basis functions here would correspond to just three vectors along the three PC coordinate axes (solid red lines), aligning the data in terms of maximal variation. For AA it is visually apparent that just four archetypes are a coarse approximation of the convex hull of X3. For clustering methods such as k-means, the basis functions would be centroids of groups of points in the data, maximizing their similarity in terms of distance. It is an open question for each method how well the data disaggregate in terms of the constraining property (variance, extremeness, distance), whether a small number of basis functions captures the desired disaggregation, and in turn whether that disaggregation is useful for classifying the desired event (long-lived events here).

b. Simplex representation

For AA, each instance in X is represented by a weighted combination of the archetypal points [Eq. (3)], and is thus reconstructed in terms of these archetypal extremes in the data. In our 3D spatial dimension example in Fig. 1 we can see how well (or not) the data are clustered around the archetypal points. One would like to visualize how close our data are to the selected archetypes in the more general case where the number of spatial dimensions of the data (155 here) is much greater than 3. The relationship of the set of data, X, to the p archetypes $z_k$ can be represented graphically by mapping each archetype to the p vertices $\mu_k$ of a simplex ($z_k \rightarrow \mu_k$, k = 1, …, p). Choosing p = 4 archetypes, we can then represent each instance of the data, $x_i$, by $\alpha_{i1}\mu_1 + \alpha_{i2}\mu_2 + \alpha_{i3}\mu_3 + \alpha_{i4}\mu_4$ to generate the cloud of points in Fig. 2. Each of the four archetypes is given a unique color in Fig. 2 (left panel). Each instance $x_i$ is assigned an archetype color according to the archetype with the highest value of α, that is, the archetype with index $\arg\max_k \alpha_{ik}$.
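The mapping to the simplex plot can be sketched as follows. The placement of the vertices evenly around a unit circle is an assumption made for a two-dimensional plot (the layout used for Fig. 2 is not specified in the text), and the weights in `alpha` are invented values rather than results from the analysis.

```python
import numpy as np

def simplex_vertices(p):
    """Place p vertices evenly on the unit circle (assumed 2D layout)."""
    angles = 2.0 * np.pi * np.arange(p) / p
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (p, 2)

def to_simplex_plot(alpha):
    """Map weights alpha (t, p) to plot coordinates and the dominant archetype index."""
    mu = simplex_vertices(alpha.shape[1])
    coords = alpha @ mu                     # sum_k alpha_ik * mu_k for each day i
    dominant = np.argmax(alpha, axis=1)     # archetype with the largest weight
    return coords, dominant

# Invented example weights for three days (each row sums to 1).
alpha = np.array([
    [1.00, 0.00, 0.00, 0.00],   # a "pure" day: plotted on the first vertex
    [0.25, 0.25, 0.25, 0.25],   # an equal mixture: plotted at the centre
    [0.10, 0.60, 0.20, 0.10],   # mostly the second archetype
])
coords, dominant = to_simplex_plot(alpha)
print(coords)
print(dominant)
```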

Fig. 2.

Simplex plot of the phase space for (left) the four archetypes and (right) the five PCs of Z500. The color plotted corresponds to the color of the leading basis function. For the archetypes the leading archetype has the highest value of α. For the PCs, the leading PC has the highest magnitude.

If a daily instance were perfectly characterized by a single archetype, then the weight for that archetype would be 1 and the weights for the remaining archetypes would all be 0. Such a point would lie on the vertex corresponding to that archetype on the simplex plot (one of the points 1, 2, 3, 4 on Fig. 2). If the archetypes are successful in discriminating points in the data, then most of the points in the simplex plot would be tightly bunched around the four vertices. At the other extreme, if most points in the data were an equal mixture (weight) of each of the archetypes, then they would lie near the origin (middle) of the simplex plot. Thus, the simplex plot gives us a graphical way to show where all our data fall with respect to the archetypes. The cloud of points in Fig. 2 is fairly well distributed through the archetype space and does not strongly separate into distinct regimes around each archetype. That is to say, there is a continuum of flows, with each daily flow pattern having some affiliation to more than one archetype. The white space around each of the vertices indicates that there are few days that are close to the “pure” archetypes.

For comparison with the archetypes, we generated a set of mixture weights for the five leading PCs (see section 5) from PCA of Z500 by normalizing by the maximum absolute value of the PCs at each time: $\omega_k = \mathrm{PC}_k / \max\left(|\mathrm{PC}_{k=1,\ldots,5}|\right)$. We then construct a simplex for the PCs analogously by representing each $x_i$ by $\sum_{k=1}^{5} \omega_{ik}\mu_k$. The results for the five transformed PCs are shown in the right panel of Fig. 2. The Z500 data for the PCs are similarly well mixed across PCs on any given day. For both methods (AA and PCA) most days are not strongly affiliated to a single basis function. Almost all flow days will be represented by a mixture of basis functions. In section 7 we provide a measure for assessing how well the leading basis function is discriminated from the other basis functions on any given day. Well-discriminated flow days will lie nearer the vertices of the simplex plots in Fig. 2.
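A short sketch of the PC normalization is given below, using invented PC values for two days. Unlike the archetype weights α, the resulting ω can be negative and do not sum to 1; the function name `pc_weights` is hypothetical, and days on which all five PCs are exactly zero are not handled.

```python
import numpy as np

def pc_weights(pcs):
    """Scale PCs (t, 5) by the largest absolute PC value on each day."""
    scale = np.max(np.abs(pcs), axis=1, keepdims=True)
    return pcs / scale

# Invented PC values for two days (rows) and five PCs (columns).
pcs = np.array([
    [ 2.0, -0.5,  0.1,  0.3, -0.2],
    [-0.4,  0.2, -1.6,  0.8,  0.1],
])
omega = pc_weights(pcs)
leading = np.argmax(np.abs(omega), axis=1)   # PC with the highest magnitude each day
print(omega)
print(leading)
```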

c. Archetype time series

The coefficients α (the weights applied to each archetype to reconstruct each $x_i$) and β (the weights applied to $x_i$ to construct the archetypes) that result from archetypal analysis of Z500 for the case of p = 4 archetypes are shown in Fig. 3. The top row in this figure is the time series of β. The β series is sparse in the sense that, for any given archetype $z_k$, relatively few values (days) in X contribute to the construction of the archetype, $z_k = \sum_i \beta_{ki} x_i$. Those days with nonzero values of β are the components of the archetype and thus reflect flows that most resemble that archetype.

Fig. 3.

(top) The daily time series of the archetype coefficients βk from Eq. (2) for p = 4 archetypes of Z500. The time series spans the reanalysis period from 1958 to 2019. The width of the lines is exaggerated here as each line corresponds to a single day only. (middle) A stacked plot of the probabilities αk of each of the k = 4 archetypes at each day. The probabilities always sum to 1, so the stacking shows the proportion of each archetype each day. (bottom) An expansion of the middle panel to show the archetype probabilities over just the period 2009–10. The color code for the archetypes in all panels follows the legend in the top panel.

Citation: Monthly Weather Review 149, 6; 10.1175/MWR-D-20-0314.1

Every day in the time series of X can be approximated by a combination of the archetypes weighted by the α coefficients: $x_i \approx \sum_{k=1}^{p} \alpha_{ik} z_k$. Since $\sum_k \alpha_{ik} = 1$, we stack the α coefficients in Fig. 3 (middle panel) to show the relative contribution of each archetype to each day. From this “broad-brush” view (showing the entire time series), it is clear that there is strong interannual and multidecadal variability in the relative contributions of the archetypes to X. The interannual variability is apparent in the bottom panel, which shows 2009 and 2010 only. The year 2010 features long periods where the first archetype, AA1, is particularly dominant. This AA1 event forms one of our case studies here and is explored in detail in section 8.
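The weighted combination of archetypes can be sketched in a few lines of Python. This is a minimal illustration only: the function name `reconstruct_day`, the three-point "grid," and all numerical values are hypothetical, and the weights are assumed to be nonnegative and to sum to 1, as is standard for archetypal analysis.

```python
def reconstruct_day(alpha_i, archetypes):
    """Approximate one daily field as sum_k alpha_ik * z_k.

    alpha_i    : list of p archetype weights for day i (nonnegative, sum to 1)
    archetypes : list of p archetype fields, each a list of grid values
    """
    if any(a < 0 for a in alpha_i) or abs(sum(alpha_i) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n_grid = len(archetypes[0])
    return [
        sum(a * z[j] for a, z in zip(alpha_i, archetypes))
        for j in range(n_grid)
    ]


# Hypothetical example: p = 4 archetypes on a 3-point grid (values in m).
archetypes = [
    [40.0, -20.0, 0.0],
    [0.0, 60.0, -30.0],
    [-50.0, 10.0, 20.0],
    [10.0, 0.0, 80.0],
]
alpha_day = [0.5, 0.25, 0.25, 0.0]
field = reconstruct_day(alpha_day, archetypes)
```

Because the weights form a convex combination, the reconstructed field always lies within the hull spanned by the archetypes.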

5. Selection of number of basis patterns

When PCA is used to reduce the dimensionality of the data, X, one can retain many PC basis patterns (155 here) and still bring about a large reduction in dimensionality (by two orders of magnitude in our case). For the problem of identifying long-lived flow events in the atmosphere we now want to select some small subset of the leading PCs to form our basis patterns. We do this because it is impractical to keep track of more than a small handful of basis patterns. In addition, it is not necessarily desirable to have more patterns as the physical interpretation of the higher-order PC patterns is less clear (Jolliffe 1993). In this section we provide more detail on how we chose the number of basis patterns to represent long-lived events for PCA and AA.

The selection of the number of basis patterns to characterize the flow from any given method is somewhat arbitrary and involves tradeoffs (Christiansen 2007). Jolliffe (1993) summarizes different criteria that are typically applied to limit the number of PCs selected for PCA basis patterns. These include retaining enough basis patterns (PCs) to account for a set amount of variance, retaining the leading PCs that dominate the explained variance, or retaining those basis patterns that can be physically interpreted. Jolliffe (1993) notes that the first two criteria can be difficult to satisfy for climate data, which is multiscale (Lau et al. 1994; O’Kane et al. 2017) and high dimensional, and thus selection is often based on physical interpretability.

a. Explained variance

As more PCs or archetypes are retained, more of the variance in X is accounted for. The fraction of variance in X explained is shown as a function of the number of retained basis patterns (PCs or archetypes) in Fig. 4. For a given number of basis patterns, PCA explains more of the variance in X than AA, which must be the case as PCA optimizes Eq. (1) for maximum variance, whereas AA optimizes the fit to a convex hull of X. With 20 basis patterns, PCA explains about 3/4 of the variance in X, whereas AA explains a bit less than half. Similarly, the sum of squared errors [Eq. (4)] is lower for PCA for a given number of basis patterns than for AA (bottom panel of Fig. 4).
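One simple way to compute an explained-variance fraction from any reconstruction of X is sketched below. It assumes that X holds anomalies (time mean removed) and that the fraction is one minus the ratio of the residual sum of squares to the total sum of squares. This is a common convention and may differ in detail from Eq. (4); the function names and the toy data are hypothetical.

```python
def sum_of_squares(rows):
    """Sum of squared values over all times (rows) and grid points."""
    return sum(v * v for row in rows for v in row)


def explained_variance_fraction(X, X_hat):
    """1 - SSE / SST for anomaly data X and its reconstruction X_hat."""
    residual = [
        [x - xh for x, xh in zip(row_x, row_h)]
        for row_x, row_h in zip(X, X_hat)
    ]
    return 1.0 - sum_of_squares(residual) / sum_of_squares(X)


# Hypothetical anomalies: 3 times x 2 grid points, with zero time mean.
X = [[1.0, -1.0], [3.0, -3.0], [-4.0, 4.0]]
# Hypothetical reconstruction from a reduced set of basis patterns.
X_hat = [[1.0, -1.0], [2.0, -2.0], [-4.0, 4.0]]
fraction = explained_variance_fraction(X, X_hat)
```

The same function applies to a PCA or an AA reconstruction, which is what allows the two methods to be compared at a given number of basis patterns.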

Fig. 4.

(top) The fraction of variance of Z500 explained by successive increases in the number of basis functions for PCA (blue curve) and AA (brown curve). (bottom) The sum of squared errors in the data as a function of the number of basis functions for PCA (blue curve) and AA (brown curve).


The selection of basis patterns can be aided when there is a clear “knee point” in the curves in Fig. 4 where additional basis patterns do little to add to the explained variance. In our case these curves are relatively smooth without a clear “knee point” for both PCA and AA. As such, we rely primarily on physical interpretation and past practice in the literature in selecting the number of basis patterns.

b. Physical interpretation

For PCA the first three basis patterns are often used to characterize the Southern Hemisphere flow from geopotential height and streamfunction fields. The first PC of Z500 (PC1) has an annular pattern in its EOF field (Fig. 5, top panel) and is often taken as an index of the SAM (Thompson and Wallace 2000; O’Kane et al. 2017; Tozer et al. 2018). The second and third PCs/EOFs display a wave train–like structure in the South Pacific (Fig. 5, second and third panels) and are used to portray the PSA pattern (Mo and Paegle 2001; O’Kane et al. 2017). The fourth and fifth PCs/EOFs are circumglobal with a pronounced wave train–like structure in the Indian Ocean (Fig. 5, fourth and fifth panels) and are sometimes denoted the Indian–Pacific–Atlantic (IPA) pattern (Tozer et al. 2018; Risbey et al. 2019).

Fig. 5.

(left) The leading five basis functions (EOFs) for PCA of Z500. (right) The four basis functions (archetypes) for AA of Z500 with p = 4 archetypes. The color bar applies to the archetypes only and is in units of m. The EOFs have no units.


Together, these first five PCs have been used to provide statistical representation of three distinct flow types in the Southern Hemisphere: “SAM” (PC1), “PSA” (PC2 and PC3), and “IPA” (PC4 and PC5). We use the quotes here to indicate that the PCs are not necessarily indicative of all aspects of these modes. The physical interpretation of these modes has been questioned (Mo and Ghil 1987; Christiansen 2002; Cohen and Saito 2002; Matthewman and Magnusdottir 2012; Spensberger et al. 2020). We do not try to resolve their physical basis here, but we do relate the events identified using the PCs and AAs to case studies using daily weather maps to explore what they do identify. We continue to use these three terms (SAM, PSA, IPA) throughout the paper because they are common and convenient shorthand for these modes.

c. Coherence and phase

The relationship of the paired PCs with one another is illustrated in Fig. 6, which shows their coherence and phase relationships. The PSA modes (2,3) and the IPA modes (4,5) are both nearly at 90° phase to one another over the 2–30-day range. The coherence of both the PSA and IPA modes drops off after about 10 days, but is higher for the IPA (~0.65) than the PSA (~0.4) over the 2–10-day range. The IPA PC modes are more coherent and at very nearly 90° phase over the 2–10-day period range, reflecting the propagating nature of this pattern. The lower coherence of the PSA modes implies that they are not pure propagating modes. The PSA is known to exhibit more quasi-stationarity, whereas the IPA mode reflects the more transient flow activity of the Indian Ocean region atmospheric waveguide (Tozer et al. 2018; Risbey et al. 2019).

Fig. 6.

(top left) The magnitude squared coherence between PC pair 2 and 3 and between PC pair 4 and 5 for PCA of Z500. (bottom left) The phase between these same PC pairs. (top right) The magnitude squared coherence between the first two PC pairs for the tendency of Z500, where the tendency is the time difference, Z500(i + 1) − Z500(i). (bottom right) The phase of the first two tendency pairs.


The quasi-stationary aspects of these PC modes are also revealed through PCA of the tendencies [Z500(i + 1) − Z500(i)] of Z500. The coherence and phase of the first two paired modes of the tendencies are shown in the right column of Fig. 6. These modes have much higher coherence than the modes in the left column, indicating that the PSA and IPA modes on the left are not purely propagating.

d. Archetype patterns

Since we have selected five basis patterns (PCs) to represent the flow in PCA, we want to select a similar number of basis patterns for AA. Since the five PC patterns represent only three distinct circulation types (SAM, PSA, IPA), we have made an intermediate choice and selected four basis patterns for AA. For AA, the archetypes do not come as pairs. The archetype algorithm finds archetypal points to approximate the convex hull of X, which ensures that the archetypes are well spaced (distinct) extremes in X. By selecting four archetypes, we will have four distinct basis patterns (circulation types here).

The magnitudes of the archetype patterns are obtained from Eq. (2). The spatial fields for the four archetypes can be constructed by applying the archetype weights, $\beta_{ki}$, to the geopotential height fields, $\sum_{i=1}^{t} \beta_{ki}\,\mathrm{Z500}_i$, for each archetype $k$. The resulting spatial patterns for the four archetypes are shown in Fig. 5 (right column). Each archetype pattern represents only the sign of the pattern shown. This contrasts with the EOF patterns in Fig. 5 where both the sign of the pattern shown and its opposite-signed pattern are associated with each EOF pattern.

The first archetype pattern, AA1, has a three-wave sequence of ridges and troughs and predominantly lower Z500 at higher latitude. This pattern closely resembles the positive SAM pattern (the opposite-signed pattern to EOF1 in Fig. 5). The second and third archetype patterns, AA2 and AA3, feature a wave train–like pattern in the Pacific and about South America. The second archetype pattern is similar to the PSA1 pattern (EOF2 in Fig. 5), and the third archetype pattern is similar to the PSA2 pattern (EOF3 in Fig. 5). Patterns AA2 and AA3 correspond to the opposite signs of the patterns shown for EOF2 and EOF3. The fourth archetype pattern, AA4, has four waves in the storm track with higher-than-normal Z500 at higher latitude. The latter feature means that it resembles the negative SAM pattern.

The selection of five PCs and four archetypes here results in sets of patterns that are each broadly relatable to physical phenomena (inasmuch as the SAM and PSA are real), and which are broadly relatable to one another. The SAM and the PSA are represented in both PCs and archetypes here. This is notable in that there is no a priori reason why these patterns should be shared, since the optimization of Eq. (1) is based on different principles (maximizing explained variance and representing extremes) for PCA and AA.

The higher-order PC/EOF patterns (EOF4 and EOF5) are not particularly evident in the four archetype patterns, though the fourth archetype does have more pronounced wave structure in the Indian Ocean sector (which is a characteristic of EOF4 and EOF5). Note that for the archetypes, unlike PCA, there is no requirement that each successive archetype be orthogonal to the modes that precede it. For PC/EOF patterns this means that higher-order EOFs tend toward successively smaller-scale structures (higher wavenumber) (Mo and Ghil 1987), whereas AA patterns need not do so. A further difference is that the PCs are ordered in terms of explained variance, whereas the k archetypes are effectively unordered. The archetypes have been numbered here (1, 2, 3, 4) in decreasing order of their average probability of occurrence, given by the time mean $\overline{\alpha_{ik}}$ (averaged over $i$) for each archetype $k$.
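The numbering convention for the archetypes can be illustrated with a short Python sketch that sorts archetypes by their time-mean weight. The function name `order_by_mean_probability` and the three-archetype sample weights are hypothetical.

```python
def order_by_mean_probability(alpha):
    """Return archetype indices sorted by decreasing time-mean weight.

    alpha : list of days, each a list of p archetype weights
    """
    n_days = len(alpha)
    p = len(alpha[0])
    means = [sum(day[k] for day in alpha) / n_days for k in range(p)]
    order = sorted(range(p), key=lambda k: means[k], reverse=True)
    return order, means


# Hypothetical weights for 3 days and 3 archetypes (each row sums to 1).
alpha = [
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
order, means = order_by_mean_probability(alpha)
```

In this toy case the archetype stored at index 1 has the highest mean weight and would therefore be labeled first.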

A final difference noted here is that the PCs “nest” but AAs do not. That is, the first p patterns selected do not depend on the choice of p for PCA, but they do for AA. For a given set of data, X, the first five PCs/EOFs look the same no matter how many PCs we compute (though they do change as X grows when more samples are added to it). For AA the optimal positions of the archetypes on the convex hull encompassing X will typically change as the number of archetypes used to approximate the hull changes (Bauckhage and Thurau 2009). That means that the patterns of the archetypes describing the flow change as p changes. We have computed the archetypes of X for p = 2, …, 20 archetypes (p = 1 is just the mean flow). Though the patterns are different in each case, the archetypes always contain patterns that resemble those of the SAM-like and PSA-like patterns found here for p = 4 archetypes.

6. Long-lived AA and PCA events

a. Definition of events

We are interested here in long-lived (persistent) features of the hemispheric flow. Our approach is to define long-lived events as those where a single basis pattern zk dominates the flow (over other basis patterns) for a sufficiently long period of time τ. Though the choice of τ is somewhat arbitrary, we want it to be longer than for typical synoptic features in the Southern Hemisphere (2–7 days), but not so long that the events themselves are so rare as to limit our sample sizes. We have tested a range of thresholds and have settled on τ ≥ 8 days here. This is about the time scale of very persistent blocking features in the Southern Hemisphere (Trenberth and Mo 1985; Renwick 2005).

The long-lived flow events here are characterized using both PCA and AA. For AA the selection of which basis pattern dominates the flow at any given point in time is relatively straightforward as the archetype probabilities αik give the likelihood of how much each basis pattern zk contributes at each time i. We simply select the basis pattern zm where αm = max(αk) at each time. For long-lived archetype events, we require sequences of τ = 8 days or longer in which zm is the same basis pattern. The sequences over which the same archetype has highest likelihood can be as short as 1 day (our time resolution) or longer than a month. In Fig. 7 we show the histograms of these run lengths for the four archetypes. Sequences longer than 8 days make up the tail of the distribution. The longest sequences tend to favor AA1, which resembles the positive SAM pattern.
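The event definition for AA amounts to a run-length calculation on the daily leading archetype. The sketch below is a minimal Python illustration under that reading; the function names and the sample sequence are hypothetical and are not the code used in the paper.

```python
from itertools import groupby


def leading_archetype(alpha):
    """Index of the archetype with the largest weight on each day."""
    return [max(range(len(day)), key=lambda k: day[k]) for day in alpha]


def long_lived_events(leading, min_days=8):
    """Return (start_day, length, archetype) for runs of at least min_days."""
    events = []
    start = 0
    for archetype, run in groupby(leading):
        length = len(list(run))
        if length >= min_days:
            events.append((start, length, archetype))
        start += length
    return events


# Hypothetical sequence: 3 days of archetype 1, 9 days of archetype 0,
# then 2 days of archetype 2.
leading = [1] * 3 + [0] * 9 + [2] * 2
events = long_lived_events(leading)
```

Only the 9-day run qualifies as a long-lived event here; the shorter runs contribute to the histogram of run lengths but not to the event set.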

Fig. 7.

(top) A histogram of the length of continuous sequences of the leading archetype. The sequences are terminated when the leading archetype changes to a different archetype. The histogram is truncated at 30 days. (middle) Similar sequences for the leading PC based on the absolute value of the PCs weighted by eigenvalues. In this case + or − values of the PC are not differentiated. (bottom) The + and − states for each PC are assessed as separate categories. The inset plots in each case show the same data as in the outer plot, but focusing on just run lengths of 8 days or longer.


For PCA, the basis patterns are less readily interpretable relative to one another on any given day, since (unlike the α term for AA) there is no formal relationship expressing the relative likelihood of each PC basis pattern at any given point in time. For svd(X) = [EOF, λ, PC], the EOFs and PCs are orthonormal and unitless. The physical dimensions of the data are retained in the eigenvalues λ. To compare the PCs they need to be scaled by $\lambda_k$, so that we assess $\lambda_k \mathrm{PC}_{ik}$. The $\lambda_k \mathrm{PC}_{ik}$ represent the relative magnitudes of each basis pattern, $\mathrm{EOF}_k$, at each time $i$. For PCA we choose each day the pattern, $k = m$, that has the largest absolute value, $|\lambda_m \mathrm{PC}_{im}| = \max_k |\lambda_k \mathrm{PC}_{ik}|$. Long-lived PCA events are defined similarly as sequences of 8 days or longer in which the same basis pattern is selected. The lengths of runs over which the same PC is selected are shown in Fig. 7. We have broken these runs down into cases where each $\mathrm{PC}_k$ is a single category (middle panel) and where we treat the plus and minus signs as different categories, $\mathrm{PC}_k^{+}$ and $\mathrm{PC}_k^{-}$ (bottom panel). The histograms for the PCs are similar in profile to those for AA, but there are fewer sequences that extend beyond 8 days. The PC associated with SAM (SAM+ for the bottom panel) tends toward longer sequence lengths for the PCs, as it does for AA.
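The daily selection of the leading PC can be sketched as follows. This is an illustrative Python fragment only: the function name `leading_pc`, the scaling values, and the PC values are hypothetical, and the scaling factors λ are assumed to be positive so that the sign of the scaled value is the sign of the PC.

```python
def leading_pc(pcs_at_time, eigenvalues, signed=False):
    """Pick the PC with the largest |lambda_k * PC_k| at one time.

    Returns the PC index k (0-based), or the tuple (k, sign) when
    signed=True so that the + and - states are separate categories.
    """
    scaled = [lam * pc for lam, pc in zip(eigenvalues, pcs_at_time)]
    k = max(range(len(scaled)), key=lambda j: abs(scaled[j]))
    if not signed:
        return k
    return (k, "+" if scaled[k] >= 0 else "-")


# Hypothetical scaling factors and PC values for one day.
eigenvalues = [5.0, 3.0, 2.5, 2.0, 1.5]
pcs_day = [0.1, -0.4, 0.2, 0.3, 0.1]
unsigned_choice = leading_pc(pcs_day, eigenvalues)
signed_choice = leading_pc(pcs_day, eigenvalues, signed=True)
```

Applying this day by day yields a categorical sequence that can be searched for runs of 8 days or longer, in the same way as for the archetypes.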

The sequences of runs with a particular leading archetype (or PC) end when a different archetype/PC has higher probability/magnitude. We can assess whether there are any preferred transitions from one archetype (or PC) to another by recording all transitions through the time series. In Fig. 8 we assess which archetype or PC follows another from one day to the next. The daily sampling rate should be sufficient to resolve transitions. In this plot we count all day-to-day sequences, and so include counts of persistent cases where the dominant archetype or PC is the same from one day to the next. The transitions are expressed here as probabilities. For both AA and PCA, the persistence cases (where the basis pattern does not change from one day to the next) are by far the majority of cases, illustrated by the high probabilities on the diagonals.
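A day-to-day transition table of this kind can be built by counting consecutive pairs in the sequence of leading patterns. The Python sketch below assumes that the probabilities are conditional on the starting pattern (each row sums to 1); the normalization used for Fig. 8 may differ, and the function name and sample sequence are hypothetical.

```python
def transition_probabilities(leading, n_patterns):
    """Row-normalized counts of day-to-day transitions.

    Entry [a][b] estimates the probability that pattern a on day i is
    followed by pattern b on day i + 1 (a == b counts persistence).
    """
    counts = [[0] * n_patterns for _ in range(n_patterns)]
    for today, tomorrow in zip(leading, leading[1:]):
        counts[today][tomorrow] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Hypothetical 10-day sequence of leading patterns (0-based labels).
leading = [0, 0, 0, 1, 1, 0, 2, 2, 2, 0]
probs = transition_probabilities(leading, 3)
```

Asymmetry between entries `[a][b]` and `[b][a]` in such a table is what indicates a preferred direction of transition.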

Fig. 8.

(left) The probability that a given archetype (on the y axis) moves (from time i to time i + 1) to another archetype (on the x axis) based on daily archetype sequences. This includes cases where the archetype does not change (on the diagonal elements). (center) As in the left panel, but for PCs, where the sign of the PC is not taken into account. (right) As in the left panel, but for PCs where the sign of the PC is explicitly accounted for.


The archetype transitions in Fig. 8 are not symmetric. For example, AA2 (like PSA1) rarely transitions to AA3 (like PSA2), whereas AA3 prefers AA2 to any other archetype (except itself). A transition in which AA3 leads to AA2 is in the sense of eastward propagation (the preferred direction of propagation in the storm track). The transition plots for the PCs show similar behavior. If we consider PC transitions without regard to sign of the PC (middle panel), then the transitions between PC2 (PSA1) and PC3 (PSA2) seem fairly symmetric. However, when we account for the sign of the PCs (right panel), then PC2 much prefers to transition to PC3 with opposite sign to itself, and PC3 prefers to transition to PC2 with the same sign. In both cases, this is consistent with a preference for eastward (rather than westward) pattern transitions. While westward transitions are less favored, they do still occur, consistent with occasional regression of the flow. For the signed PCs (right panel) there are very few transitions between positive and negative states of the same PC number (illustrated by the white spaces just off the diagonal).

b. Event results

The set of long-lived flow events that results from the criterion that the dominant basis pattern must last for 8 days or more is shown in Fig. 9 for AA and in Fig. 10 for PCA. There are more events and longer events for AA than PCA. For both AA and PCA the long-lived event that occurs most often is related to the SAM (SAM-like). For AA, it is AA1 (SAM+) and for PCA it is PC1 (SAM). For both AA and PCA the dominance of SAM events over the other basis functions occurs primarily in summer and winter. In the transition seasons the long-lived events are spread more evenly across basis patterns. We have not explored the reasons here for seasonal differences in the relative frequency of long-lived events for each basis pattern. It may be that the transition seasons provide better definitions of the polar waveguide (Hoskins and Ambrizzi 1993; Ambrizzi et al. 1995), and thus may be more conducive to wave train–like structures such as the PSA.

Fig. 9.

(top-left) A calendar with days of the year (1 Jan–31 Dec) along the x axis and years along the y axis. All archetype events where the leading archetype spans 8 days or more are shown with a color strip marking the days when the event occurred. The white space covers days where no qualifying event was found. (bottom) The annual cycle as the total number of archetype event days for each day of the year for each archetype. (right) The annual total number of archetype event days for each archetype. (bottom right) The archetype color legend is shown.


Fig. 10.

As in Fig. 9, but for the five PCs.


The time series of annual long-lived flow days for each basis pattern (right panels in Figs. 9 and 10) shows strong interannual variability for each basis pattern. While we have not analyzed the basis patterns for trends here, there is clearly a trend for AA1 toward more long-lived event days through the time series, which is consistent with the documented trend toward more positive SAM events (Thompson et al. 2000; Thompson and Solomon 2002).

Though we chose τ = 8 days or longer for our persistent events, we could have chosen other thresholds and that would change the pattern of events in Figs. 9 and 10. In particular, what happens when larger values of τ (longer events) are used? We have explored this question for the case of AA in Fig. 11. The left panel shows the proportion of total days in the time series in which a qualifying event occurs for each archetype as the value of τ changes from 1 to 20 days. When τ = 1 day, there is always an event and so the sum of the proportions for each archetype (down the column) is 1. The first archetype, AA1, has the highest proportion of event days for all choices of τ here. Once the value of τ gets beyond about 17 days, most events are AA1 (SAM+) events.
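The dependence of the event-day proportion on the threshold τ can be sketched with the same run-length idea. The Python fragment below is illustrative only; the function name `event_day_proportion` and the 20-day sample sequence are hypothetical.

```python
from itertools import groupby


def event_day_proportion(leading, n_patterns, tau):
    """Fraction of all days that fall in runs of length >= tau, per pattern."""
    event_days = [0] * n_patterns
    for pattern, run in groupby(leading):
        length = len(list(run))
        if length >= tau:
            event_days[pattern] += length
    return [days / len(leading) for days in event_days]


# Hypothetical 20-day sequence of leading patterns (0-based labels).
leading = [0] * 10 + [1] * 4 + [0] * 2 + [2] * 4
by_tau = {tau: event_day_proportion(leading, 3, tau) for tau in (1, 3, 8)}
```

With τ = 1 every day belongs to an event, so the proportions sum to 1; as τ grows, only the pattern with the longest runs retains any event days.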

Fig. 11.

(left) The proportion of total days in the time series in which a qualifying event occurs for each archetype where the minimum duration of the event is indicated on the x axis, running from 1 to 20 days. (right) The average probability of events ($\bar{\alpha}_k$) for each archetype $k$ as a function of the minimum duration of the event (x axis).


Since the archetype events in Fig. 9 are selected on the basis of the archetype probabilities αk, one can ask whether the more long-lived events are associated with higher probabilities. That is, when a very long-lived event takes place, are the probabilities associated with the selected archetype any different from those associated with shorter events? The right panel of Fig. 11 shows the average probability per event ($\bar{\alpha}$) for each of the archetypes as a function of our definition of τ from 1 to 20 days. For each of the archetypes there is an increase in average event probability with an increase in duration, τ. The only real exception to that is for archetypes 2 and 3 at τ > 17, when there are very few events anyway and the estimate of event probability is not well sampled. For all archetypes for events longer than τ = 8 days the average archetype probability over the event is greater than 0.55, implying that the preferred archetype is fairly strongly favored. In section 7 we provide more direct assessment of how well the leading archetypes/PCs are discriminated from the other archetypes/PCs during events.

c. Surface signatures

In this section we examine the surface temperature (T2m) signatures of the long-lived events for AA and PCA to assess how extreme they are. For each event we calculate the average daily surface temperature during the event and compare this with the climatological distribution of daily surface temperatures. The set of long-lived event days for the temperature composites was calculated two ways: using all days in each event, and using only those days with high discrimination scores (see section 7). The results are broadly similar using these two methods as the long-lived event days feature better discrimination between basis functions than nonevent days. We show results for the second method here. The results for AA are shown in Fig. 12. The surface temperatures associated with long-lived AA events are not extreme per se, but the events do generate large-scale cold and warm temperature anomalies. These anomalies are concentrated in the regions where meridional flow is strong. The persistent meridional flow associated with the kinds of wave train structures exhibited is efficient in generating temperature extremes (Garfinkel and Harnik 2017). Where the archetype patterns resemble SAM+ and SAM− (AA1 and AA4) there are cold and warm signatures, respectively, over Antarctica consistent with the movement of the storm tracks.
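Ranking an event-mean temperature against a climatological distribution can be sketched as a percentile calculation at a single grid point. The Python fragment below is a minimal illustration that assumes an empirical percentile (the share of climatological values at or below the event value); the exact ranking procedure used for the composites is not specified here, and the function name and temperatures are hypothetical.

```python
from bisect import bisect_right


def percentile_of(value, climatology):
    """Percentage of climatological values less than or equal to value."""
    ordered = sorted(climatology)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Hypothetical daily temperatures (deg C) at one grid point.
climatology = [10.0, 12.0, 14.0, 16.0, 18.0]
event_mean = 15.0
rank = percentile_of(event_mean, climatology)
```

Percentiles near 0 or 100 would indicate cold or warm event signatures that are extreme relative to the climatology at that location.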

Fig. 12.

Composites are shown over long-lived event days for each archetype of the circulation and surface signature. The composites are composed of all days in each long-lived event where the discrimination score exceeds 0.8. The circulation is indicated by contours of Z850. The shading is for percentiles of 2-m temperature T2m, where the event daily average is ranked against the climatology over all days. The number in the top left of each panel is the number of days making up the composite. The two numbers in the top right of each panel are the maximum percentile of T2m and maximum of |Z850| in m, respectively.


The surface signatures for PC events are shown in Fig. 13. We show composites for the positive and negative states of each PC separately as they have opposite signed height anomalies and therefore antisymmetric surface temperature signatures. The PCs have qualitatively the same surface signatures as the AAs. The regions of meridional flow generate warm and cool extremes. The composites for the higher-order PCs (4 and 5) have many fewer events than the low-order PCs, which makes their surface signatures noisier and more difficult to interpret.

Fig. 13.

As in Fig. 12, but for composites over long-lived event days for each PC where the discrimination score exceeds 0.7.


7. Discriminating among basis functions

a. Discrimination definition

The sequences of consecutive runs of the same basis pattern define long-lived flow events as described above. For AA, the basis pattern selected has the highest probability; for PCA it has the largest magnitude. It is likely that some long-lived flow events will correspond to cases where the selected basis pattern has much higher probability/amplitude than the lesser patterns, and in some cases the selected pattern may be only marginally higher probability/amplitude. We would like to have some measure to assess how well the selected pattern is discriminated from the other patterns. For example, selected patterns that are better discriminated might correspond better to cases where actual long-lived events have a more coherent dynamical signature.

The metric for discriminating patterns needs to provide a measure of how much the selected pattern, with probability $\alpha_m$, is different from the other patterns. For AA we start with the constraint that the sum of the probabilities for each basis pattern is 1 at each time: $\sum_k \alpha_k = 1$. At a given time, the selected (highest probability) pattern has probability $\alpha_m$. Selecting out $\alpha_m$ we have $\alpha_m + \sum_{k \neq m} \alpha_k = 1$. We can rearrange to give
$$\sum_{k \neq m} \frac{\alpha_k}{\alpha_m} = \frac{1}{\alpha_m} - 1. \tag{9}$$
The term $\sum_{k \neq m} (\alpha_k)/(\alpha_m)$ assesses the ratio of the probability of each of the nonselected patterns to the probability of the selected pattern $m$, summing to provide an overall measure of how different the selected pattern is from the others. Since this sum is equivalent [from Eq. (9)] to the simple expression $(1/\alpha_m) - 1$ we can use this simpler form to derive a score. The optimal score using this term is when $\alpha_m = 1$, yielding a value of 0. The worst score possible would be when $\alpha_m$ attains its lowest possible probability, which is $1/p$, where $p$ is the number of archetypes (basis patterns), yielding a value of $p - 1$. We can normalize the score so that it is bounded between 0 and 1 by multiplying by $1/(p - 1)$. We would also prefer a score where 1 is the best possible score and 0 is the lowest possible, so we can subtract from 1 to yield a discrimination score Δ:
$$\Delta_{\mathrm{AA}} = 1 - \left(\frac{1}{p - 1}\right)\left(\frac{1}{\alpha_m} - 1\right). \tag{10}$$
A perfect $\Delta_{\mathrm{AA}}$ score of 1 now corresponds to the case where $\alpha_m = 1$ and the other archetypes all have probability 0. The lowest $\Delta_{\mathrm{AA}}$ score of 0 corresponds to the case where all archetypes are equally likely with probability $1/p$ and one cannot differentiate between the archetype likelihoods.
In the case of PCA the magnitudes of the PCs at each time are not constrained in their sum as the likelihoods for AA are. To assess the PCs in a similar way to AA we can transform the PCs by the Taxicab norm (sum of the absolute values of vector components: $\sum_{l=1}^{p} |\lambda_l \mathrm{PC}_l|$), such that for each PC basis function $k$:
$$\gamma_k = \frac{|\lambda_k \mathrm{PC}_k|}{\sum_{l=1}^{p} |\lambda_l \mathrm{PC}_l|}.$$
This scaling means that $\sum_{k=1}^{p} \gamma_k = 1$, and we can now define a discrimination score similarly to that for AA. Let $\gamma_m = \max(\gamma_k)$ be the highest magnitude scaled PC of the $p$ PC basis patterns at each time. Then, following Eq. (10), we can derive a discrimination score for the PCs as follows:
$$\Delta_{\mathrm{PC}} = 1 - \left(\frac{1}{p - 1}\right)\left(\frac{1}{\gamma_m} - 1\right).$$
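The two discrimination scores share the same functional form, which the following Python sketch makes explicit. It is a minimal illustration: the function names `discrimination_score` and `taxicab_scaled` and all numerical inputs are hypothetical.

```python
def discrimination_score(weights):
    """Delta = 1 - (1 / (p - 1)) * (1 / w_m - 1), with w_m = max(weights).

    weights must be nonnegative and sum to 1, e.g. the archetype
    probabilities alpha_k or the taxicab-scaled PC magnitudes gamma_k.
    """
    p = len(weights)
    w_max = max(weights)
    return 1.0 - (1.0 / w_max - 1.0) / (p - 1)


def taxicab_scaled(pcs_at_time, eigenvalues):
    """gamma_k = |lambda_k PC_k| / sum_l |lambda_l PC_l| at one time."""
    magnitudes = [abs(lam * pc) for lam, pc in zip(eigenvalues, pcs_at_time)]
    total = sum(magnitudes)
    return [m / total for m in magnitudes]


# Limiting cases for p = 4 archetypes.
delta_pure = discrimination_score([1.0, 0.0, 0.0, 0.0])       # one archetype only
delta_mixed = discrimination_score([0.25, 0.25, 0.25, 0.25])  # equal mixture

# Hypothetical scaling factors and PC values for one day.
gamma = taxicab_scaled([0.1, -0.4, 0.2, 0.3, 0.1], [5.0, 3.0, 2.5, 2.0, 1.5])
delta_pc = discrimination_score(gamma)
```

The two limiting cases reproduce the bounds of the score (1 for a single pure pattern, 0 for an equal mixture), and any other day falls between them.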

b. Discrimination results

For each long-lived AA or PC flow event in Figs. 9 and 10 a dominant basis pattern persisted through the duration of the event. We quantify our assessment of dominance here through the discrimination scores ΔAA and ΔPC, which attempt to measure how well the dominant basis pattern is discriminated from the other basis patterns at the time. The discrimination scores over all days and basis patterns (regardless of whether there is a long-lived flow event or not) are shown in the top left of Fig. 14. The histogram of ΔAA scores is shifted to higher values of Δ than for ΔPC. By this measure the AAs are in general more discriminating than the PCs. However, the direct comparison is not entirely like-for-like in that ΔAA discriminates probabilities of basis functions and ΔPC discriminates magnitudes of basis functions.

Fig. 14.

Histograms of the discrimination scores ΔAA (blue) and ΔPC (brown). (top left) Scores over all days in the record. (top right) Scores only over days on which a long-lived AA event occurs. (bottom right) Scores only on days in which a long-lived PC event occurs. (bottom left) Scores over days common to both long-lived AA and PC events.


We can restrict the comparison of ΔAA and ΔPC scores to just days in which long-lived AA or PC events occur, and to days in which both AA and PC events occur (remaining panels in Fig. 14). As would be expected the ΔAA scores are higher than ΔPC scores for AA events (top-right panel), but they are also higher when assessed just on days when long-lived PC events occur (bottom right). For days common to both AA and PC events (bottom left) the ΔAA scores are again higher than ΔPC scores.

Thus far we have compared ΔAA and ΔPC scores over groups of days, but without regard for how the scores compare on the same day. This “pairing” of scores on the same day gives us the most direct comparison and is presented in Fig. 15 for all days, AA event days, PC event days, and common AA and PC event days. The blue line on the plots indicates where ΔAA = ΔPC. Higher counts below this line indicate where ΔAA > ΔPC in pairings, and higher counts above this line indicate ΔPC > ΔAA. The paired scores show that ΔAA is generally higher than ΔPC on the same day, and this is true whether considering all days, or just the event-day combinations.
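A same-day pairing of the two scores can be summarized by a grid of counts and by the share of days on which one score exceeds the other. The Python sketch below illustrates this with bins of width 0.1; the function names and the four sample score pairs are hypothetical.

```python
def fraction_aa_higher(delta_aa, delta_pc):
    """Fraction of days on which the AA score exceeds the PC score."""
    pairs = list(zip(delta_aa, delta_pc))
    return sum(1 for aa, pc in pairs if aa > pc) / len(pairs)


def paired_counts(delta_aa, delta_pc, n_bins=10):
    """Counts of same-day score pairs on an n_bins x n_bins grid over [0, 1].

    Rows index the PC score bin and columns index the AA score bin.
    """
    def bin_index(score):
        # A score of exactly 1 is placed in the top bin.
        return min(int(score * n_bins), n_bins - 1)

    counts = [[0] * n_bins for _ in range(n_bins)]
    for aa, pc in zip(delta_aa, delta_pc):
        counts[bin_index(pc)][bin_index(aa)] += 1
    return counts


# Hypothetical same-day scores for four days.
delta_aa = [0.82, 0.75, 0.41, 0.93]
delta_pc = [0.64, 0.78, 0.35, 0.55]
share = fraction_aa_higher(delta_aa, delta_pc)
counts = paired_counts(delta_aa, delta_pc)
```

With rows as the PC score and columns as the AA score, counts concentrated to the right of the diagonal correspond to days on which the AA score is the higher of the two.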

Fig. 15.

Counts of paired discrimination scores. The color scale indicates the number of times each ΔAA and ΔPC score coincide in bin widths of 0.1 for each score. The paired scores are taken over (top left) all days, (top right) just long-lived AA event days, (bottom right) just long-lived PC event days, and (bottom left) common AA and PC event days. The diagonal blue line indicates where ΔAA = ΔPC.
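The pairing shown in Fig. 15 amounts to a two-dimensional histogram of same-day scores. The following is a minimal sketch only, assuming the daily scores are held in two equal-length arrays with values in [0, 1]. The function name `paired_score_counts` is hypothetical and the input arrays are random placeholders, not the scores from this study.

```python
import numpy as np

def paired_score_counts(delta_aa, delta_pc, bin_width=0.1):
    """Count same-day (delta_aa, delta_pc) pairs on a regular grid of bins."""
    delta_aa = np.asarray(delta_aa, dtype=float)
    delta_pc = np.asarray(delta_pc, dtype=float)
    if delta_aa.shape != delta_pc.shape:
        raise ValueError("scores must be paired day by day")
    # Assumption: scores lie in [0, 1], so the bin edges span that interval.
    n_bins = int(round(1.0 / bin_width))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _, _ = np.histogram2d(delta_aa, delta_pc, bins=[edges, edges])
    # Fraction of days on which the AA score exceeds the PC score.
    frac_aa_higher = float(np.mean(delta_aa > delta_pc))
    return counts, edges, frac_aa_higher

rng = np.random.default_rng(0)
demo_aa = rng.uniform(0.0, 1.0, size=1000)  # placeholder values only
demo_pc = rng.uniform(0.0, 1.0, size=1000)  # placeholder values only
counts, edges, frac = paired_score_counts(demo_aa, demo_pc)
```

Here `counts[i, j]` is the number of days with ΔAA in bin `i` and ΔPC in bin `j`, and `frac` summarizes on what share of days the AA score is the larger of the pair.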


Next, we look at the discrimination scores for each AA or PC basis pattern separately. These are shown for the four archetypes and five PCs in Fig. 16. The scores are for days during long-lived events, and so correspond to the event days shown in Figs. 9 and 10. The score counts reduce for each successive AA or PC, consistent with the reduction in event days for successive basis patterns. There are no clear differences in the shape of the histograms for each basis pattern, indicating that they all perform similarly in discriminating events. The possible exception is the highest-order patterns (AA4 for AA; PC4 and PC5 for PCA), which appear to have a bit more probability mass at lower discrimination scores. These higher-order basis functions are also the least well sampled.

Fig. 16.

Histograms of discrimination scores (left) for the four archetypes on days included in all long-lived archetype events and (right) for the five PCs over all long-lived PC events.


For both AA and PC there is a range of discrimination scores associated with long-lived events. It is encouraging that most of the event scores are greater than 0.5. The median event scores for AA on AA events and for PCA on PC events are about 0.75 (Fig. 14). In the next section we focus on some individual long-lived flow events to examine in more detail how well individual events are identified by the AA and PCA methods.

8. Case study events

We have selected two years for case studies of long-lived flow events: 2009 and 2010. Our choice of years was motivated in part by the wish to include particular long-lived events that have been described in the literature. At the end of 2009 there was a particularly long-lived blocking event in the southeast Pacific (Boening et al. 2011), and 2010 featured several extended periods of pronounced positive SAM (Hendon et al. 2014; Lim and Hendon 2015). In each case we examine the events to see whether they qualified as long-lived events by our AA and PCA event definitions, whether the basis patterns selected for these events by AA and PCA resemble the flow patterns that occurred, and how well discriminated these events were.

To put these two years in initial context we show the time series through 2009–10 of daily values of ΔAA and ΔPC (Fig. 17). The Δ values are truncated below 0.7 to highlight only the most well discriminated events here. The period of the blocking event at the end of 2009 is indicated by the first set of dashed vertical lines. Both AA and PCA give high discrimination scores to basis patterns at this time, with mostly the AA3 pattern for AA (top panel) and PC2 for PCA (bottom panel). The second case study period is indicated by the dashed vertical lines spanning much of 2010. In 2010 the high discrimination scores are dominated by AA1 (top panel) and PC1 (bottom panel) to a remarkable degree. Much of the period from May through August, and then from October through November is dominated by AA1 and PC1. The discrimination scores for the May through August event are very high for both AA and PCA. We now look at these cases in more detail.

Fig. 17.

Time series from 1 Jan 2009 to 31 Dec 2010 of daily discrimination scores: (top) ΔAA and (bottom) ΔPC. Only scores above 0.7 are shown. The horizontal dashed line in each panel corresponds to the 95th-percentile Δ score. The numbers 1–4 in the top panel denote AA1–AA4. The numbers 1–5 in the bottom panel denote PC1–PC5. The dates on the time axis correspond to the first day of the given month. The dashed vertical lines denote the two case study periods, one in 2009 and one in 2010.
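The display choices in Fig. 17 (hiding scores at or below 0.7 and marking a 95th-percentile reference level) can be sketched as follows. This is an illustration under stated assumptions: the function name `truncate_scores` is hypothetical, and the percentile is computed over whatever series is passed in, which may differ from the sample used for the figure.

```python
import numpy as np

def truncate_scores(scores, floor=0.7, pct=95.0):
    """Mask scores at or below `floor` and return a percentile reference level."""
    scores = np.asarray(scores, dtype=float)
    # Scores that are not above the floor become NaN so that they are not drawn.
    shown = np.where(scores > floor, scores, np.nan)
    # Assumption: the percentile is taken over the full supplied series.
    reference = float(np.percentile(scores, pct))
    return shown, reference

shown, reference = truncate_scores([0.2, 0.7, 0.9, 1.0])
```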


a. 2009: Blocking case

The extended blocking event in 2009 has been documented through its impacts on the ocean. Satellite data revealed a period of record increase in ocean bottom pressure over the southeast Pacific Ocean from October 2009 through January 2010 (Boening et al. 2011). Boening et al. (2011) showed that this increase was primarily driven by enhanced wind stress curl associated with a persistent blocking high in the region. The event described by Boening et al. (2011) reached a peak in November 2009. We can visualize this event in the atmosphere by examining a sequence of daily charts of Z500 in Fig. 18. In the last few days of October there is a wavenumber-4 pattern with a well-defined wave train in the Indian Ocean sector. The pattern then transforms in the beginning of November to lower wavenumber with a pronounced anomaly (blocking high) in the Pacific just upstream of South America. This block then persists in place for almost the entire month of November. The duration of this block is not unprecedented, but it is unusual.

Fig. 18.

Sequence of 24 days of daily weather charts from 28 Oct to 20 Nov 2009. The contour lines depict Z500 and the shaded regions show Z500 anomalies, where red colors indicate positive anomalies (ridges) and blue colors indicate negative anomalies (troughs). The sequences go down the columns so that it is easier to compare longitudinal positions of the ridges from day to day. The absolute value |Z500| of each daily field is denoted in the top left of each panel in units of meters. The shading scale is the same for all days and is bounded at ±500 m.


To show the November 2009 blocking event in broader context, we provide a Hovmöller plot of Z500 at 55°S in the top panel of Fig. 19. The blocking event shows up here as the contiguous positive anomaly (red) near 240°E denoted by the dashed ellipse. Contiguous positive anomalies at the same longitude (blocks) over such a long duration are rare. The panels immediately below the Hovmöller show the AA probabilities αk and normalized PC magnitudes γk. The Hovmöller shows the block longitude during the November event shifting between 220° and 260°E. When the block is closer to 250°E, AA3 (with a ridge near 250°E in its pattern in Fig. 5) is the dominant archetype in the αk panel, and when the block shifts to nearer 220°E, the dominant archetype in the αk panel is AA4 (which has a dominant ridge near 220°E in Fig. 5). The PCs also switch in the γk panel for PCs from a preference for PC2 to PC1 when the block in question moves back toward 220°E, though this preference is not directly relatable to the preferred positions of ridges in the EOF1 pattern, since the possible ridge positions are 200° and 250°E in the EOF1 pattern.
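A Hovmöller section of the kind described above is a time by longitude slice of the anomaly field at one latitude. The sketch below assumes the Z500 anomalies are already available as an array with dimensions (time, latitude, longitude). The function names are hypothetical, and the input is a synthetic stationary ridge placed at 240°E purely for illustration, not reanalysis data.

```python
import numpy as np

def hovmoller_at_latitude(z500_anom, lats, target_lat=-55.0):
    """Return the (time, lon) slice at the grid latitude nearest target_lat."""
    z500_anom = np.asarray(z500_anom, dtype=float)
    lats = np.asarray(lats, dtype=float)
    j = int(np.argmin(np.abs(lats - target_lat)))
    return z500_anom[:, j, :], float(lats[j])

def ridge_longitude(hov, lons):
    """Longitude of the largest positive anomaly on each day."""
    return np.asarray(lons)[np.argmax(hov, axis=1)]

# Synthetic example: a ridge fixed at 240E on every day and at every latitude.
lons = np.arange(0.0, 360.0, 2.5)
lats = np.arange(-90.0, -19.0, 2.5)
n_days = 30
ridge = 200.0 * np.exp(-((lons - 240.0) / 20.0) ** 2)  # metres, synthetic
field = np.broadcast_to(ridge, (n_days, lats.size, lons.size))
hov, lat_used = hovmoller_at_latitude(field, lats)
block_lon = ridge_longitude(hov, lons)
```

Tracking the daily maximum in this way is one simple means of following shifts in block longitude such as those between 220° and 260°E noted above. It is not a blocking index.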

Fig. 19.

The x axis of each panel is a time series of daily data from 1 Sep to 31 Dec 2009. (top) A Hovmöller plot of Z500 at 55°S with positive anomalies shaded red and negative anomalies shaded blue. The elliptical region denotes a period of extended blocking near 240°E. The sloping dashed lines indicate a wave train in the Indian Ocean region. (second panel) The daily time series of the probabilities αk for each archetype AAk. (third panel) The daily time series of normalized PC magnitude γk for each PCk. (bottom) The daily time series of the discrimination scores ΔAA and ΔPC.


The 2009 blocking event is associated primarily with AA3 for AA and PC2 for PCA. The corresponding basis patterns in Fig. 5 are similar. They both reflect the “PSA” pattern, with some variation between them in the positions of the troughs and ridges in the PSA wave train pattern. As such, the assignment of a basis pattern to the event is broadly consistent across AA and PCA. The event qualifies as a long-lived flow event for both AA and PCA in Figs. 9 and 10, respectively.

The discrimination scores for the 2009 blocking event (and other events) are generally higher for AA than PCA (bottom panel in Fig. 19). A common exception to this occurs when the leading event identified by PCA is PC4 or PC5 (corresponding to the “IPA” modes). An example of this occurs at the end of the blocking event. In early December the block in the Pacific weakens and dissipates. As it does so, the Hovmöller shows a wave train (sequence of propagating troughs and ridges) established in the Indian Ocean sector (longitudes 30°–120°E), indicated by the pair of sloping dashed lines. This propagating wave train pattern is well represented in the basis patterns for PC4 and PC5 in Fig. 5, but there is no pattern that well represents this feature among the AA basis patterns in Fig. 5. Thus the discrimination scores for the archetypes are all low at the end of the blocking event, whereas the scores for PC4 and PC5 are higher and better discriminated than the AAs at that time.

b. 2010: Southern annular mode case

For AA, 2010 is the year with the highest number of AA1 (SAM+) event days (Fig. 9). It also ranks among the years with the most PC1 (SAM) event days (Fig. 10). The event analysis for both AA and PCA identified long-lived AA1/PC1 events in May–June and October–November. The October–November event is identified by Hendon et al. (2014) and Lim and Hendon (2015) as an “extreme positive excursion of the SAM”.

The basis patterns for SAM+ events are shown in the top row of Fig. 5. For the PCA pattern (EOF1) on the left the sign needs to be reversed to correspond to SAM+. With that, the AA and PCA patterns AA1 and PC1 are remarkably similar. They feature a deep, poleward displaced trough at 260°E, lower pressures at other longitudes on the poleward side of the storm track, and three broad ridges on the equatorward side of the storm track.

We turn now to daily Z500 charts in May–June corresponding to the first major SAM+ event in 2010 (Fig. 20). Before the event, on 12 May, a wavenumber-6 pattern spans the circumglobal storm track region. By 17 May this pattern has transitioned to the canonical SAM+ pattern featuring all the major SAM+ characteristics (deep trough at 260°E, extended high-latitude trough, three broad equatorward ridges). The positions of the ridges vary from day to day over the course of the event, but the zonal trough at high latitudes with a deep trough at 260°E persists through most of the May–June event.

Fig. 20.

As in Fig. 18, but for daily weather charts from 12 May to 4 Jun 2010.


The deep trough near 260°E evident in the daily charts is also captured in the Hovmöller plot in the top panel of Fig. 21. The trough is persistent at this longitude through much of the 2010 period shown, as indicated by the sequence of ellipses marking persistent troughs at 260°E. Whenever the trough at 260°E has large amplitude, both AA and PCA tend to register highest probability/amplitude for AA1/PC1 in the middle two panels of Fig. 21. Both AA and PCA provide good discrimination of AA1/PC1 from the other basis functions during the SAM+ events. For AA, AA1 is consistently and clearly dominant over the other AA basis patterns, and generally has higher discrimination scores (bottom panel of Fig. 21) than PC1.

Fig. 21.

As in Fig. 19, but from 1 May to 31 Dec 2010. The elliptical regions in the Hovmöller plot denote periods through the year where there is a persistent trough at 260°E.


9. Conclusions

There is no single way to identify long-lived flow events in the atmosphere. Long-lived events are often associated with the annular modes of variability (NAM, SAM), with blocking, or with quasi-stationary wave trains such as those associated with the PNA or PSA. The annular modes and wave train modes are often described, quantified, and monitored using PCA. We provided a method here to identify long-lived flow events using PCA, and compared PCA with AA.

For both PCA and AA we used a finite number (5 and 4, respectively) of basis functions to represent atmospheric modes during long-lived events. This might seem like a serious limitation in that each basis function is effectively a fixed pattern, and so we have only a small number of fixed patterns in each case to represent the vast variability of flow phenomena. The underlying idea must therefore be that long-lived events have finite and repeated forms of expression which can be partly approximated by the set of basis functions. That idea is at least consistent with the view that there are natural regimes or modes of variability (Lorenz 1969; Charney and DeVore 1979) and the observation that annular and wave train structures are a feature of long-lived events.

The basis patterns for PCA and AA are obtained by different optimization routes; PCA maximizes the explained variance of successive basis patterns, and AA forms basis patterns that reflect the extremes of the data. These features of the optimization make both PCA and AA potentially suited to capturing long-lived events. For example, since persistence often increases with spatial scale, long-lived events ought to correspond to the larger spatial-scale structures that generate variance of the flow. This would lend them to identification by PCA. Alternatively, long-lived events might be extreme in the sense that they are rare and occur with high amplitude when the flow is more organized. The extreme nature of the events would lend them to identification by AA. As it turns out, the leading basis patterns for PCA and AA for Southern Hemisphere flow are similar and both capture elements of annular and wave train structures. PC1 and AA1 both represent a version of the SAM. PC2 and PC3, and AA2 and AA3 here both represent patterns characteristic of the PSA.
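The variance-maximizing property of PCA noted above can be illustrated with a singular value decomposition of the anomaly matrix. This is a generic sketch, not the configuration used in this study: the function name `pca_svd` is hypothetical, the data are random placeholders, and the area (latitude) weighting commonly applied to gridded fields is omitted.

```python
import numpy as np

def pca_svd(data, n_modes):
    """EOF patterns, PC time series, and explained-variance fractions via SVD."""
    data = np.asarray(data, dtype=float)           # shape (time, space)
    anomalies = data - data.mean(axis=0)           # remove the time mean at each point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)        # singular values are sorted descending
    eofs = vt[:n_modes]                            # (mode, space), orthonormal rows
    pcs = u[:, :n_modes] * s[:n_modes]             # (time, mode)
    return eofs, pcs, variance_fraction[:n_modes]

rng = np.random.default_rng(1)
# Placeholder data: 200 times, 30 grid points, with unequal variance per point.
synthetic = rng.normal(size=(200, 30)) * np.linspace(3.0, 0.5, 30)
eofs, pcs, frac = pca_svd(synthetic, n_modes=5)
```

Each successive mode explains no more variance than the one before it, and the patterns are mutually orthogonal. The AA optimization is different in kind and is not reproduced here.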

Where long-lived events are defined as sequences of the same leading basis pattern lasting 8 days or longer, there are more long-lived events for AA than PCA. The most common long-lived event is like positive SAM (AA1) for AA, and is SAM (PC1) for PCA. For both AA and PCA the long-lived SAM event is more common in summer and winter than in the transition seasons. The PSA-type modes (AA2 and AA3; PC2 and PC3) are the next most common long-lived events. The IPA modes of PCA (PC4 and PC5) are the least common long-lived event, and when they do occur, they tend to be shorter. This is consistent with the propagation implied by the strong coherence of the IPA modes, and the more transient, high wavenumber IPA flow pattern.
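The event definition above (the same leading basis pattern for 8 days or longer) reduces to finding runs in a daily sequence of labels. The sketch below assumes that such a label sequence is already available. The function name and the example labels are hypothetical, and any further criteria applied in the study are not represented.

```python
from itertools import groupby

def long_lived_events(dominant, min_days=8):
    """Return (pattern, start_index, length) for runs lasting at least min_days."""
    events = []
    start = 0
    for pattern, run in groupby(dominant):
        length = sum(1 for _ in run)
        if length >= min_days:
            events.append((pattern, start, length))
        start += length
    return events

# Hypothetical daily labels of the dominant basis pattern.
labels = [1] * 10 + [3] * 4 + [2] * 8 + [1] * 7
print(long_lived_events(labels))  # [(1, 0, 10), (2, 14, 8)]
```

The 4-day and 7-day runs fall short of the 8-day threshold and are not counted as events.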

The surface temperature signatures of the long-lived PCA and AA events for the leading basis patterns are similar, which is not surprising as their spatial structures are similar. The large-scale troughs and ridges in the leading basis patterns are in similar locations, and these generate persistent meridional flow and warm/cold extremes consistent with that flow.

The longest long-lived events are, in general, better expressed than shorter events. By this we mean that in AA the archetype probability over the event (ᾱ) has higher values the longer the event. That is, the longer the event lasts, the more similar the flow pattern during the event is to the selected archetype basis pattern, and the better the selected archetype pattern is discriminated from the other archetype patterns.

We developed a score Δ to measure how well the leading basis pattern is discriminated from the other basis patterns at any given time. The score is well suited to AA because the archetype basis patterns are assigned probabilities at each time, whereas for PCA it is less clear how to differentiate the strength of expression of the PCs. Long-lived flow events are generally well discriminated by our score for both AA and PCA. The AA discrimination scores are typically higher than those for PCA, except when higher wavenumber IPA-like events occur. The IPA structure is not well described by the four archetype patterns here.

The case studies here examined two previously documented long-lived episodes: a long-lived block/PSA event in 2009 and a set of very persistent positive SAM events in 2010. The 2009 block was persistent broadly in the region of 250°E, but with vacillation of longitude during the event that is characteristic of both PSA1 and PSA2 structures. Both AA and PCA successfully identified long-lived events during the 2009 event. In both AA and PCA the basis pattern identified was of PSA type and had a blocking center consistent with that observed. The leading basis pattern was also well discriminated from the other basis patterns during the event for AA and PCA. The strong positive SAM event in May/June 2010 had a classic SAM structure in Z500 of lower heights around the pole, higher heights in ridges equatorward of the storm track, and a deep trough in the eastern Pacific. The May/June event was identified as a long-lived positive SAM event by both AA and PCA and was well discriminated.

Our use of a very simple definition of long-lived events as persistent sequences of the same dominant basis pattern in PCA or AA is a starting point only and is likely to be problematic in some ways. For example, to what extent are the events identified in this way real, and are we perhaps missing long-lived events not well classified by this approach? We have some faith that the events identified are real because they have physically based annular or wave train structures. Further, daily sequences of synoptic charts (for the cases we examined) show persistent or recurring features that share much of the same structure as the dominant PCA and AA basis patterns during the events.

Our views of what the atmospheric modes of variability look like are partly shaped by PCA, since we use PCA to characterize many of these modes. When we use PCA as a tool to identify events, we will see PCA-structured events, reinforcing what we expect. It is therefore reassuring that AA yields some of the same long-lived event patterns as PCA here. On the other hand, AA identifies noticeably more long-lived events than PCA here. From this one might conclude that either AA yields more false-positive events than PCA, or that the additional AA events are real and take different form or are better characterized by AA. We cannot provide a conclusive answer to this question, since we have not performed a systematic study of all the events identified. We do note that AA lends itself more readily to discrimination among basis functions and thus is perhaps better suited to event identification using our definition.

The long-lived events identified here are largely quasi-stationary by construction of our method. We use a small number of fixed basis patterns and require that the dominant basis pattern is persistent through time. Events will therefore be identified when the flow pattern is largely stationary and matches one of the basis patterns better than the others. If the flow patterns are propagating it is likely that the best matching basis pattern will change unless the variation in flow pattern is small and/or the features of the pattern recur in time. Our approach therefore largely excludes transient patterns for identification as long-lived events. When the pattern is strongly propagating, even if it remains coherent, the positions of the troughs and ridges will move in space, which would likely result in affiliation with a different (fixed) basis pattern. This would break the run of persistence required by our definition. Other methods exist to cope with coherent translating structures such as moving archetypes (Cutler and Stone 1997) and will be the focus of future work.
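The effect of propagation on affiliation with fixed patterns can be shown with an idealized example. The sketch assumes two hypothetical fixed basis patterns in quadrature (a cosine and a sine of zonal wavenumber 3) and takes the dominant pattern to be the one with the largest absolute projection. That choice is a stand-in for the AA probabilities and normalized PC magnitudes used in this study, and the wavenumber and phase speed are arbitrary.

```python
from itertools import groupby
import numpy as np

LONS = np.deg2rad(np.arange(0.0, 360.0, 5.0))
WAVENUMBER = 3
# Two fixed, hypothetical basis patterns in quadrature.
BASIS = np.stack([np.cos(WAVENUMBER * LONS), np.sin(WAVENUMBER * LONS)])

def dominant_pattern(field):
    """Index of the fixed pattern with the largest absolute projection."""
    return int(np.argmax(np.abs(BASIS @ field)))

def dominant_sequence(speed_deg_per_day, n_days=40):
    """Dominant pattern on each day for a wave translating at constant speed."""
    shifts = np.deg2rad(speed_deg_per_day) * np.arange(n_days)
    return [dominant_pattern(np.cos(WAVENUMBER * (LONS - s))) for s in shifts]

def longest_run(labels):
    """Length of the longest run of identical consecutive labels."""
    return max(len(list(run)) for _, run in groupby(labels))

stationary = longest_run(dominant_sequence(0.0))    # wave fixed in place
propagating = longest_run(dominant_sequence(10.0))  # wave moving 10 degrees per day
```

In this idealized setting the stationary wave stays affiliated with one pattern for all 40 days, whereas the propagating wave switches pattern every few days, so no run reaches the 8-day threshold.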

This work has been solely concerned with the identification of long-lived events. We have not examined the theory behind why, when, or where they form, and what sustains them. It is hoped that steps to improve the description and climatology of long-lived events will provide more impetus and support for a deeper understanding of them.

Acknowledgments

This research was supported by the Decadal Climate Forecasting Project at CSIRO. We appreciate the very constructive review comments of Pedram Hassanzadeh and other reviewers.

Data availability statement

The reanalysis data used for this work are from the Japanese 55-year Reanalysis (JRA-55) project carried out by the Japan Meteorological Agency (JMA). These data are available at https://jra.kishou.go.jp.

REFERENCES

  • Altman, N., and M. Krzywinski, 2018: The curse(s) of dimensionality. Nat. Methods, 15, 399400, https://doi.org/10.1038/s41592-018-0019-x.

  • Ambrizzi, T., B. Hoskins, and H. Hsu, 1995: Rossby wave propagation and teleconnection patterns in the austral winter. J. Atmos. Sci., 52, 36613672, https://doi.org/10.1175/1520-0469(1995)052<3661:RWPATP>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Barnston, A. G., and R. E. Livezey, 1987: Classification, seasonality and persistence of low-frequency atmospheric circulation patterns. Mon. Wea. Rev., 115, 10831126, https://doi.org/10.1175/1520-0493(1987)115<1083:CSAPOL>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bauckhage, C., and C. Thurau, 2009: Making archetypal analysis practical. Lect. Notes Comput. Sci., 5748, 272281, https://doi.org/10.1007/978-3-642-03798-6_28.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Black, A., and Coauthors, 2021: Australian northwest cloudbands and their relationship to atmospheric rivers and precipitation. Mon. Wea. Rev., 149, 11251139, https://doi.org/10.1175/MWR-D-20-0308.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Boening, C., T. Lee, and V. Zlotnicki, 2011: A record-high ocean bottom pressure in the South Pacific observed by GRACE. Geophys. Res. Lett., 38, L04602, https://doi.org/10.1029/2010GL046013.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Branstator, G., 2002: Circumglobal teleconnections, the jet stream waveguide, and the North Atlantic Oscillation. J. Climate, 15, 18931910, https://doi.org/10.1175/1520-0442(2002)015<1893:CTTJSW>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Charney, J. G., and J. G. DeVore, 1979: Multiple flow equilibria in the atmosphere and blocking. J. Atmos. Sci., 36, 12051216, https://doi.org/10.1175/1520-0469(1979)036<1205:MFEITA>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Christiansen, B., 2002: On the physical nature of the Arctic Oscillation. Geophys. Res. Lett., 29, 1805, https://doi.org/10.1029/2002GL015208.

  • Christiansen, B., 2007: Atmospheric circulation regimes: Can cluster analysis provide the number? J. Climate, 20, 22292250, https://doi.org/10.1175/JCLI4107.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Cohen, J., and K. Saito, 2002: A test for annular modes. J. Climate, 15, 25372546, https://doi.org/10.1175/1520-0442(2002)015<2537:ATFAM>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Cohen, J., J. Foster, M. Barlow, K. Saito, and J. Jones, 2010: Winter 2009–2010: A case study of an extreme Arctic oscillation event. Geophys. Res. Lett., 37, L17707, https://doi.org/10.1029/2010GL044256.

    • Search Google Scholar
    • Export Citation
  • Cutler, A., and L. Breiman, 1994: Archetypal analysis. Technometrics, 36, 338347, https://doi.org/10.1080/00401706.1994.10485840.

  • Cutler, A., and E. Stone, 1997: Moving archetypes. Physica D, 107, 116, https://doi.org/10.1016/S0167-2789(97)84209-1.

  • Dole, R., 1986: The life cycles of persistent anomalies and blocking over the North Pacific. Advances in Geophysics, Vol. 29, Academic Press, 31–69, https://doi.org/10.1016/S0065-2687(08)60034-5.

    • Crossref
    • Export Citation
  • Dole, R., M. Hoerling, J. Perlwitz, J. Elscheid, P. Pegion, T. Zhang, X. Quan, and D. Murray, 2011: Was there a basis for anticipating the 2010 Russian heat wave? Geophys. Res. Lett., 38, L06702, https://doi.org/10.1029/2010GL046582.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Garfinkel, C., and N. Harnik, 2017: The non-Gaussianity and spatial asymmetry of temperature extremes relative to the storm track: The role of horizontal advection. J. Climate, 30, 445464, https://doi.org/10.1175/JCLI-D-15-0806.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Haines, K., 1994: Low-frequency variability in atmospheric middle latitudes. Surv. Geophys., 15, 161, https://doi.org/10.1007/BF00665686.

  • Hannachi, A., and N. Trendafilov, 2017: Archetypal analysis: Mining weather and climate extremes. J. Climate, 30, 69276944, https://doi.org/10.1175/JCLI-D-16-0798.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hannachi, A., I. Jolliffe, and D. Stephenson, 2007: Empirical orthogonal functions and related techniques in atmospheric science: A review. Int. J. Climatol., 27, 11191152, https://doi.org/10.1002/joc.1499.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hassanzadeh, P., and Z. Kuang, 2016: The linear response function of an idealized atmosphere. Part II: Implications for the practical use of the fluctuation-dissipation theorem and the role of operator’s nonnormality. J. Atmos. Sci., 73, 34413452, https://doi.org/10.1175/JAS-D-16-0099.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hasselmann, K., 1988: PIPs and POPs: The reduction of complex dynamical systems using principal interaction and oscillation patterns. J. Geophys. Res., 93, 11 01511 021, https://doi.org/10.1029/JD093iD09p11015.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hendon, H., E.-P. Lim, J. Arblaster, and D. Anderson, 2014: Causes and predictability of the record wet east Australian spring 2010. Climate Dyn., 42, 11551174, https://doi.org/10.1007/s00382-013-1700-5.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hoskins, B., and T. Ambrizzi, 1993: Rossby wave propagation on a realistic longitudinally varying flow. J. Atmos. Sci., 50, 16611671, https://doi.org/10.1175/1520-0469(1993)050<1661:RWPOAR>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Jolliffe, I., 1993: Principal component analysis: A beginner’s guide—II: Pitfalls, myths, and extensions. Weather, 48, 246253, https://doi.org/10.1002/j.1477-8696.1993.tb05899.x.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Jolliffe, I., and J. Cadima, 2016: Principal component analysis: A review and recent developments. Philos. Trans. Roy. Soc. London, 374A, 116, https://doi.org/10.1098/rsta.2015.0202.

    • Search Google Scholar
    • Export Citation
  • Knighton, J., G. Pleiss, E. Carter, S. Lyon, M. Walter, and S. Steinschneider, 2019: Potential predictability of regional precipitation and discharge extremes using synoptic-scale climate information via machine learning: An evaluation for the eastern continental United States. J. Hydrometeor., 20, 883900, https://doi.org/10.1175/JHM-D-18-0196.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Kobayashi, S., and Coauthors, 2015: The JRA-55 reanalysis: General specifications and basic characteristics. J. Meteor. Soc. Japan, 93, 548, https://doi.org/10.2151/jmsj.2015-001.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lau, K., P. Sheu, and I. Kang, 1994: Multiscale low-frequency circulation modes in the global atmosphere. J. Atmos. Sci., 51, 11691193, https://doi.org/10.1175/1520-0469(1994)051<1169:MLFCMI>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lau, W. K., and K.-M. Kim, 2012: The 2010 Pakistan flood and Russian heat wave: Teleconnection of hydrometeorological extremes. J. Hydrometeor., 13, 392403, https://doi.org/10.1175/JHM-D-11-016.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lim, E.-P., and H. Hendon, 2015: Understanding and predicting the strong Southern Annular Mode and its impact on the record wet Australian spring 2010. Climate Dyn., 44, 28072824, https://doi.org/10.1007/s00382-014-2400-5.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lim, E.-P., H. Hendon, G. Boschat, D. Hudson, D. Thompson, A. Dowdy, and J. Arblaster, 2019: Australian hot and dry extremes induced by weakenings of the stratospheric polar vortex. Nat. Geosci., 12, 896901, https://doi.org/10.1038/s41561-019-0456-x.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lorenz, E., 1951: Seasonal and irregular variations of the Northern Hemisphere sea-level pressure profile. J. Meteor., 8, 5259, https://doi.org/10.1175/1520-0469(1951)008<0052:SAIVOT>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lorenz, E., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21, 289307, https://doi.org/10.3402/tellusa.v21i3.10086.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lorenz, E., 2004: The Essence of Chaos. University of Washington Press, 240 pp.

  • Matthewman, N., and G. Magnusdottir, 2012: Clarifying ambiguity in intraseasonal Southern Hemisphere climate modes during austral winter. J. Geophys. Res., 117, D03105, https://doi.org/10.1029/2011JD016707.

    • Search Google Scholar
    • Export Citation
  • Mo, K., and R. Livezey, 1986: Tropical-extratropical geopotential height teleconnections during the Northern Hemisphere winter. Mon. Wea. Rev., 114, 24882515, https://doi.org/10.1175/1520-0493(1986)114<2488:TEGHTD>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Mo, K., and M. Ghil, 1987: Statistics and dynamics of persistent anomalies. J. Atmos. Sci., 44, 877902, https://doi.org/10.1175/1520-0469(1987)044<0877:SADOPA>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Mo, K., and R. Higgins, 1998: The Pacific–South American modes and tropical convection during the Southern Hemisphere winter. Mon. Wea. Rev., 126, 15811596, https://doi.org/10.1175/1520-0493(1998)126<1581:TPSAMA>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Mo, K., and J. Paegle, 2001: The Pacific–South American modes and their downstream effects. Int. J. Climatol., 21, 12111229, https://doi.org/10.1002/joc.685.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Monahan, A., J. Fyfe, M. Ambaum, D. Stephenson, and G. North, 2009: Empirical orthogonal functions: The medium is the message. J. Climate, 22, 65016514, https://doi.org/10.1175/2009JCLI3062.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Mørup, M., and L. Hansen, 2012: Archetypal analysis for machine learning and data mining. Neurocomputing, 80, 5463, https://doi.org/10.1016/j.neucom.2011.06.033.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Müller, G., and T. Ambrizzi, 2007: Teleconnection patterns and Rossby wave propagation associated with generalized frosts over southern South America. Climate Dyn., 29, 633645, https://doi.org/10.1007/s00382-007-0253-x.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • North, G., 1984: Empirical orthogonal functions and normal modes. J. Atmos. Sci., 41, 879887, https://doi.org/10.1175/1520-0469(1984)041<0879:EOFANM>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • North, G., T. Bell, and R. Cahalan, 1982: Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699706, https://doi.org/10.1175/1520-0493(1982)110<0699:SEITEO>2.0.CO;2.

  • O’Kane, T., D. Monselesan, and J. Risbey, 2017: A multiscale reexamination of the Pacific–South American pattern. Mon. Wea. Rev., 145, 379–402, https://doi.org/10.1175/MWR-D-16-0291.1.

  • Pelly, J., and B. Hoskins, 2003: A new perspective on blocking. J. Atmos. Sci., 60, 743–755, https://doi.org/10.1175/1520-0469(2003)060<0743:ANPOB>2.0.CO;2.

  • Pook, M., and T. Gibson, 1999: Atmospheric blocking and storm tracks during SOP-1 of the FROST project. Aust. Meteor. Mag., 48, 51–60.

  • Renwick, J., 2005: Persistent positive anomalies in the Southern Hemisphere circulation. Mon. Wea. Rev., 133, 977–988, https://doi.org/10.1175/MWR2900.1.

  • Richardson, D., A. Black, D. Monselesan, T. Moore, J. Risbey, D. Squire, and C. Tozer, 2021: Identifying periods of forecast model confidence for improved subseasonal prediction of precipitation. J. Hydrometeor., 22, 371–385, https://doi.org/10.1175/JHM-D-20-0054.1.

  • Risbey, J., T. O’Kane, D. Monselesan, C. Franzke, and I. Horenko, 2015: Metastability of Northern Hemisphere teleconnection modes. J. Atmos. Sci., 72, 35–54, https://doi.org/10.1175/JAS-D-14-0020.1.

  • Risbey, J., D. Monselesan, T. O’Kane, C. Tozer, M. Pook, and P. Hayman, 2019: Synoptic and large-scale determinants of extreme austral frost events. J. Appl. Meteor. Climatol., 58, 1103–1124, https://doi.org/10.1175/JAMC-D-18-0141.1.

  • Röthlisberger, M., L. Frossard, L. Bosart, D. Keyser, and O. Martius, 2019: Recurrent synoptic-scale Rossby wave patterns and their effect on the persistence of cold and hot spells. J. Climate, 32, 3207–3226, https://doi.org/10.1175/JCLI-D-18-0664.1.

  • Schneidereit, A., S. Schubert, P. Vargin, F. Lunkeit, X. Zhu, D. Peters, and K. Fraedrich, 2012: Large-scale flow and the long-lasting blocking high over Russia: Summer 2010. Mon. Wea. Rev., 140, 2967–2981, https://doi.org/10.1175/MWR-D-11-00249.1.

  • Sheshadri, A., and R. A. Plumb, 2017: Propagating annular modes: Empirical orthogonal functions, principal oscillation patterns, and time scales. J. Atmos. Sci., 74, 1345–1361, https://doi.org/10.1175/JAS-D-16-0291.1.

  • Spensberger, C., M. Reeder, T. Spengler, and M. Patterson, 2020: The connection between the southern annular mode and a feature-based perspective on Southern Hemisphere midlatitude winter variability. J. Climate, 33, 115–129, https://doi.org/10.1175/JCLI-D-19-0224.1.

  • Steinschneider, S., and U. Lall, 2015: Daily precipitation and tropical moisture exports across the eastern United States: An application of archetypal analysis to identify spatiotemporal structure. J. Climate, 28, 8585–8602, https://doi.org/10.1175/JCLI-D-15-0340.1.

  • Teng, H., and G. Branstator, 2017: Causes of extreme ridges that induce California droughts. J. Climate, 30, 1477–1492, https://doi.org/10.1175/JCLI-D-16-0524.1.

  • Thompson, D., and J. Wallace, 2000: Annular modes in the extratropical circulation. Part I: Month-to-month variability. J. Climate, 13, 1000–1016, https://doi.org/10.1175/1520-0442(2000)013<1000:AMITEC>2.0.CO;2.

  • Thompson, D., and S. Solomon, 2002: Interpretation of recent Southern Hemisphere climate change. Science, 296, 895–899, https://doi.org/10.1126/science.1069270.

  • Thompson, D., J. Wallace, and G. Hegerl, 2000: Annular modes in the extratropical circulation. Part II: Trends. J. Climate, 13, 1018–1036, https://doi.org/10.1175/1520-0442(2000)013<1018:AMITEC>2.0.CO;2.

  • Tibaldi, S., and F. Molteni, 2002: On the operational predictability of blocking. Tellus, 42, 343–365, https://doi.org/10.3402/tellusa.v42i3.11882.

  • Tozer, C., J. Risbey, T. O’Kane, D. Monselesan, and M. Pook, 2018: The relationship between wave trains in the Southern Hemisphere storm track and rainfall extremes over Tasmania. Mon. Wea. Rev., 146, 4201–4230, https://doi.org/10.1175/MWR-D-18-0135.1.

  • Trenberth, K., and K. Mo, 1985: Blocking in the Southern Hemisphere. Mon. Wea. Rev., 113, 3–21, https://doi.org/10.1175/1520-0493(1985)113<0003:BITSH>2.0.CO;2.

  • Trenberth, K., G. Branstator, and P. Arkin, 1988: Origins of the 1988 North American drought. Science, 242, 1640–1645, https://doi.org/10.1126/science.242.4886.1640.

  • Van Den Dool, H., 1994: Searching for analogues, how long must we wait? Tellus, 46, 314–324, https://doi.org/10.3402/tellusa.v46i3.15481.

  • Wallace, J. M., and D. S. Gutzler, 1981: Teleconnections in the geopotential height field during the Northern Hemisphere winter. Mon. Wea. Rev., 109, 784–812, https://doi.org/10.1175/1520-0493(1981)109<0784:TITGHF>2.0.CO;2.

  • Wirth, V., M. Riemer, E. Chang, and O. Martius, 2018: Rossby wave packets on the midlatitude waveguide: A review. Mon. Wea. Rev., 146,