## Abstract

Persistent weather regimes in daily North Atlantic–European winter mean sea level pressure (MSLP) fields from the 140-yr Twentieth Century Reanalysis are investigated. The phase space is divided into discrete cells based on quantiles of empirical orthogonal function (EOF) principal components; the cells are thus approximately equally populated. An estimate of persistence is provided in terms of the number of different cells visited for a given trajectory duration. This technique is also applied to the well-known Lorenz63 system, which clearly exhibits two regimes, and the more complex Lorenz96 system where the regime structure is less pronounced. While the analysis identifies the two regimes of both the Lorenz63 and Lorenz96 systems, evidence for comparable regimes in the MSLP data is weaker. Recurrent weather regimes produced by *k*-means clustering might be expected to be clearly linked to slower-moving regions of phase space, but this is shown not to be the case. Only the region of phase space associated with the negative phase of the North Atlantic Oscillation (NAO) shows any regime-like behavior. Nevertheless, the analysis does reveal some structure to the time evolution of the atmospheric circulation—transitions between neighboring pairs of cells show a preferred direction of evolution in many cases.

## 1. Introduction

The analysis of atmospheric circulation in terms of weather regimes has a long history. Earlier subjective catalogues such as Grosswetterlagen (Baur et al. 1944) have been supplemented by more recent objective methods, with various atmospheric fields used to categorize daily weather patterns into different regimes. As noted by Michelangeli et al. (1995, hereinafter M95) the concept of a weather regime is somewhat imprecise, with no single accepted definition. One possible basis for defining weather regimes is persistence (Mo and Ghil 1988). A related approach is taken by Vautard (1990), who defines weather regimes as states for which the large-scale flow is stationary on average. Dole and Gordon (1983) find persistent geopotential height anomalies (both positive and negative) in regions associated with frequent blocking episodes, while Barnston and Livezey (1987) detect persistence in hemispheric height field patterns found using rotated principal component analysis.

Another approach is to equate regimes with regions of higher density of points in phase space. These regions can be found through estimation of local maxima of probability density functions (PDFs), although this approach becomes impractical in higher-dimensional spaces because of sparsity of data (e.g., Steinbach et al. 2004; Kimoto and Ghil 1993; Crommelin 2004). Christiansen (2005) uses a combination of persistence and PDFs to investigate regimes in planetary wave amplitude. Kimoto and Ghil (1993) find some evidence of PDF maxima corresponding to persistent states in an analysis of Northern Hemisphere 700-hPa height fields in winter. They argue that it would be desirable to search for PDF maxima in an eight-dimensional phase space, but that the limited data available make this approach statistically unfeasible. The limited evidence they find in two-dimensional phase space may be an indication of multimodality in a higher-dimensional space.

Another technique used to search for phase space maxima is cluster analysis, with one of the most popular variants being *k*-means cluster analysis (e.g., M95). This method partitions a collection of fields into a set of clusters by assigning each field to a cluster and then moving fields between clusters to minimize the total within-cluster variance. The number of clusters *k* must be specified beforehand, which leads to the problem of determining how many clusters exist in the system.
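The partitioning step can be illustrated with a minimal sketch. It uses the common Lloyd-style iteration (assign each point to its nearest centroid, then recompute centroids), which is an assumption made here for illustration and not necessarily the variant used in M95; the function name `kmeans` and its arguments are hypothetical.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd-style k-means on a list of equal-length tuples (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its current members,
        # which cannot increase the total within-cluster variance.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

Note that only the positions of the points enter the calculation; their ordering in time plays no role.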

Many papers focus on the North Atlantic–European (NAE) region in winter; several authors have argued that four clusters is an appropriate choice for this case (e.g., M95; Cassou 2008; Dawson et al. 2012), generally based on a comparison test using surrogate multinormal data (taken from first-order Markov processes with the same lag-1 autocorrelation and variance as the corresponding principal component time series of the atmospheric data). Other authors suggest that evidence for the existence of multiple regimes is weak (Christiansen 2007; Fereday et al. 2008).

Simple dynamical systems have long been argued to provide an analogy for the behavior of the climate system. In this analogy, the preferred states exhibited by (for example) the Lorenz (1963) system (hereinafter Lorenz63) correspond to weather regimes in atmospheric fields (Corti et al. 1999; Palmer 1999). Palmer (1993) uses the Lorenz63 model as a simple paradigm for extratropical climate variability and investigates the effect of additional terms intended to represent tropical SST. More recently, the ability to simulate weather regimes has been proposed as a way of evaluating model performance (Dawson et al. 2012; Christensen et al. 2015, hereinafter C15).

This paper examines the extent to which recurrent circulation patterns (i.e., with the highest probability of occurrence) detected by *k*-means clustering correspond to quasi-persistent states of the system, as the above analogy and previous work (e.g., Cassou 2008; Dawson et al. 2012) suggest they might. Clearly there are limits to the realism of a comparison between atmospheric circulation and the simple Lorenz63 system, but the latter may be usefully employed as a test of the ability of methods to detect regime structures (Stephenson et al. 2004). As with previous work, the focus is on the NAE region in winter. In the Lorenz63 system, the trajectory through phase space alternates irregularly between the two quasi-persistent regimes. In its usual implementation, the *k*-means analysis only takes account of the relative spatial position of the fields in phase space, ignoring any temporal information. The standard analysis cannot therefore guarantee that the circulation patterns identified as regimes by cluster analysis are more persistent than other circulation patterns.

Straus et al. (2007) use a modified version of *k*-means analysis, in which a proportion of the fields associated with the fastest trajectories through phase space are filtered out before the remaining fields are clustered [a similar approach is taken by Itoh and Kimoto (1999) in a model-based study]. The clusters produced are of significantly higher quality than those produced from multinormal surrogate data. However, it is not completely clear whether the retained fields are from particular small regions or more evenly distributed across phase space.

Temporal evolution is assessed here by dividing the phase space into a regular grid of cells and examining trajectories of sequences of daily fields through this phase space grid. This method is applied to the Lorenz63 system, the more complex Lorenz (1996) system (hereinafter Lorenz96), and mean sea level pressure (MSLP) fields for the NAE region in winter. Such an approach also makes possible a straightforward analysis of preferred transitions between neighboring cells, in a similar way to Crommelin (2004).

Transitions between different states have also been analyzed in previous work. Luo et al. (2012) investigate NAO variability in recent decades using the M95 *k*-means cluster analysis method. They find preferred transitions between positive and negative phases of the NAO involving the other pair of the commonly derived set of four weather regimes, the Atlantic ridge and Scandinavian blocking. Michel and Rivière (2011) also follow the M95 method and find a similar set of four types. They also identify several preferred transitions between types, some of which are explained in terms of Rossby wave breaking events. Related studies have found preferred states in the position of the eddy-driven Atlantic jet stream and preferred transitions between them (Hannachi et al. 2012; Woollings et al. 2010a); these transitions have been linked to Rossby wave breaking (Franzke et al. 2011).

The grid of cells used in this paper contains many more than four cells. This allows an analysis of transitions between neighboring cells at a smaller scale in phase space than in previous work; for example, preferred transitions that occur within any of the four clusters can be investigated.

## 2. Data and methods

Data from three sources are analyzed. One dataset consists of ensemble mean daily mean MSLP fields from the Twentieth Century Reanalysis (Compo et al. 2011) for the boreal winter (December–March) for the period 1871–2010 and the NAE region (taken as 25°–70°N, 70°W–50°E). Anomalies are produced by removing a seasonal cycle climatology produced by smoothing the fields with repeated application of a binomial filter, as in Fereday et al. (2008).

The other two datasets are taken from integrations of two different dynamical systems developed by Edward Lorenz. The first of these (Lorenz63) is the well-known three-variable system described by the equations

$$
\frac{dx}{dt} = \sigma (y - x), \qquad \frac{dy}{dt} = x(\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z,
$$

with *σ* = 10, *β* = 8/3, and *ρ* = 28, parameter choices that produce the well-known chaotic behavior with two regimes. The fourth-order Runge–Kutta method is used to integrate the Lorenz equations, with a time step of 0.0001. The initial condition **x**_{0} = (−0.9295, −1.6890, 9.9316) is taken from a “spinup” integration of the Lorenz equations so as to start on the Lorenz attractor. The Lorenz integration is used to demonstrate that the analysis can detect persistent regimes in the data, so that it can be usefully applied to the MSLP data.
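As an illustration of the integration just described, the following sketch applies a classical fourth-order Runge–Kutta step to the standard Lorenz (1963) equations with the stated parameters, time step, and initial condition. The function names are hypothetical, and the grouping of 300 steps into one "day" anticipates the sampling interval discussed below; for brevity every "day" is kept, whereas the analysis retains only winter days.

```python
def lorenz63(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """Right-hand side of the standard Lorenz (1963) equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for an autonomous system."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def integrate_days(x0, n_days, dt=1e-4, steps_per_day=300):
    """Integrate forward and keep one sample per 'day' (300 steps of 0.0001)."""
    state, samples = tuple(x0), []
    for _ in range(n_days):
        for _ in range(steps_per_day):
            state = rk4_step(lorenz63, state, dt)
        samples.append(state)
    return samples

# Ten "days" starting from the initial condition quoted in the text.
daily = integrate_days((-0.9295, -1.6890, 9.9316), n_days=10)
```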

As in Stephenson et al. (2004), the integration is sampled to mimic the temporal distribution of the atmospheric data. A given time interval *τ* = 0.03 (i.e., 300 time steps) in the Lorenz integration is chosen to correspond to a day of the MSLP data (the choice of time interval is motivated below). The start of the integration is taken to correspond to 1 January 1871 (i.e., the start date of the MSLP data). With these choices made, the Lorenz integration is calculated up to the time corresponding to 31 December 2010 (the end date of the MSLP data) and sampled at the times corresponding to the winter days contained in the MSLP data. Both datasets are therefore of the same length, consisting of a set of separate 4-month continuous sections corresponding to the December–March (DJFM) seasons.

Both datasets can be represented as a time series of points in a phase space, with the coordinates of each point given by the principal components (PCs) of the leading empirical orthogonal functions (EOFs) of the data. Each dataset is analyzed by dividing its phase space into a regular grid of cells, whose boundaries are given by quantiles of the principal component distributions of the leading EOFs. This choice yields roughly equally populated cells. This approach is similar to previous work (Stephenson et al. 2004; Hannachi et al. 2012; Hannachi and Turner 2013) where the data are transformed to a uniform probability space. However, these papers test for the presence of regimes through the appearance of clusters of points in the transformed data, rather than through the cell-based persistence analysis described below, which (as far as the author is aware) has not previously been used.
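A minimal sketch of the quantile-based gridding follows. It assumes the principal component series are already available as plain lists (one per retained EOF) and uses a simple order-statistic estimate of the quantiles; the names `quantile_edges` and `assign_cells` are hypothetical.

```python
from bisect import bisect_right

def quantile_edges(values, n_bins):
    """Interior bin edges at the 1/n, 2/n, ... quantiles (order-statistic estimate)."""
    s = sorted(values)
    return [s[(i * len(s)) // n_bins] for i in range(1, n_bins)]

def assign_cells(pcs, n_bins):
    """Return one cell index tuple per time step.

    pcs is a list of PC time series, one list per retained EOF, all of equal
    length. Each series is binned independently by its own quantiles, so each
    bin of each axis holds roughly the same number of days.
    """
    edges = [quantile_edges(series, n_bins) for series in pcs]
    return [
        tuple(bisect_right(e, v) for e, v in zip(edges, point))
        for point in zip(*pcs)
    ]
```

For example, `assign_cells([pc1, pc2, pc3], 4)` would label each day with a triple of indices in 0–3, giving a 4 × 4 × 4 grid.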

The second dynamical system investigated (Lorenz96) is described in Lorenz (1996) and C15. This system is used here as a further test of the method in a more complex system with weaker regime structure than Lorenz63. The system describes the evolution of two sets of coupled variables: *K* large-scale, low-frequency variables *X*_{k} and *JK* smaller-scale, high-frequency variables *Y*_{j,k}. The governing equations are as follows:

$$
\frac{dX_k}{dt} = -X_{k-1}\left(X_{k-2} - X_{k+1}\right) - X_k + F - \frac{hc}{b}\sum_{j=1}^{J} Y_{j,k}
$$

and

$$
\frac{dY_{j,k}}{dt} = -cb\,Y_{j+1,k}\left(Y_{j+2,k} - Y_{j-1,k}\right) - cY_{j,k} + \frac{hc}{b}X_k,
$$

with *j* = 1, …, *J* and *k* = 1, …, *K*. The system has periodic boundary conditions, so that *X*_{k+K} = *X*_{k}, *Y*_{j,k+K} = *Y*_{j,k}, *Y*_{j−J,k} = *Y*_{j,k−1}, and *Y*_{j+J,k} = *Y*_{j,k+1}. The *X*_{k} terms represent some variable in *K* sectors of a latitude circle, with *J* smaller-scale variables *Y*_{j,k} per sector. The parameters *h*, *F*, *b*, and *c* represent the coupling constant, forcing strength, and spatial and time scale ratios, respectively. The parameters chosen are *K* = 8, *J* = 32, *h* = 1, *F* = 20, *b* = 10, and *c* = 10, as in C15. The system is integrated with a numerical ordinary differential equation solver, and the results found to resemble those described in C15. The results used here are taken every 0.05 model time units of the integration.
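The tendencies of the two-scale system can be sketched as follows, assuming the standard form of the Lorenz (1996) equations with the parameter values quoted above. The function name and the flat, sector-by-sector storage of the *Y* variables are choices made here for illustration; with that storage, wrapping the flat index reproduces the stated boundary conditions.

```python
def lorenz96_tendency(X, Y, h=1.0, F=20.0, b=10.0, c=10.0):
    """Time derivatives (dX, dY) of the two-scale Lorenz (1996) system.

    X is a list of K large-scale values. Y is a flat list of J*K small-scale
    values stored sector by sector, so Y[k*J + j] is variable j of sector k.
    """
    K, n = len(X), len(Y)
    J = n // K
    dX = []
    for k in range(K):
        coupling = (h * c / b) * sum(Y[k * J:(k + 1) * J])
        # Negative indices wrap around, giving the periodic boundary in k.
        dX.append(-X[k - 1] * (X[k - 2] - X[(k + 1) % K]) - X[k] + F - coupling)
    dY = []
    for m in range(n):
        k = m // J  # sector that small-scale variable m belongs to
        dY.append(
            -c * b * Y[(m + 1) % n] * (Y[(m + 2) % n] - Y[m - 1])
            - c * Y[m]
            + (h * c / b) * X[k]
        )
    return dX, dY
```

These tendencies can be passed to any standard ODE integrator; the text does not specify which solver was used.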

C15 find two persistent regimes in this system corresponding to wavenumber-1 and wavenumber-2 states of the *X*_{k} variables. The leading four EOFs of the system consist of two degenerate pairs: EOF1 and EOF2 (together accounting for 68.7% of the variance) are a pair of wavenumber-2 waves in quadrature (i.e., *π*/4 out of phase), while EOF3 and EOF4 (accounting for an additional 14.4%) are a pair of wavenumber-1 waves in quadrature (*π*/2 out of phase). C15 therefore define the axes of their two-dimensional phase space as the modulus of the wavenumber-2 EOFs principal component vector [PC1, PC2] and the modulus of the wavenumber-1 EOFs principal component vector [PC3, PC4]. The same process is followed here, before again dividing the phase space into a grid of cells based on quantiles. Because the phase space is not based directly on the EOF principal components, the cell sizes and populations are more unevenly distributed.

### Method to detect persistent weather regimes

To examine the evolution of each system in time, each cell is examined in turn as the starting cell for sequences of points of a given duration. The sequence of points moving forward in time from each cell member is found (assuming the end of the season is not reached before the end of the sequence) and the number of cells visited during each sequence is counted (Fig. 1).

The idea is to measure persistence in terms of the number of different cells visited in a set time, since this number equates to the proportion of the total phase space that the trajectory travels through. Trajectories visiting only a few cells in phase space in a set time show more persistence than trajectories visiting many cells.
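The counting procedure can be sketched as below. The sketch assumes that each day has already been assigned a cell label and a season identifier (hypothetical inputs `cells` and `season_ids`, with days stored in time order), and that a trajectory of a given duration consists of that many consecutive daily points including the starting day.

```python
from statistics import median

def cells_visited(cells, season_ids, duration):
    """Map each starting cell to the distinct-cell counts of its trajectories."""
    counts = {}
    for start in range(len(cells) - duration + 1):
        end = start + duration - 1
        if season_ids[start] != season_ids[end]:
            continue  # the season ends before the trajectory does, so skip it
        n_distinct = len(set(cells[start:start + duration]))
        counts.setdefault(cells[start], []).append(n_distinct)
    return counts

def median_cells_visited(cells, season_ids, duration):
    """Median number of distinct cells visited, per starting cell."""
    return {
        cell: median(values)
        for cell, values in cells_visited(cells, season_ids, duration).items()
    }
```

Low medians mark starting cells from which trajectories tend to remain in a small part of phase space.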

The cells visited statistic is used as a basis for the choice of *τ*, the time interval of the Lorenz63 integration corresponding to one day of the MSLP data. For the MSLP–Lorenz63 comparison to be useful, the Lorenz63 trajectories should visit a similar proportion of phase space (and therefore a similar proportion of the total number of cells) as the MSLP trajectories for the same number of “days” in both systems. For example, for the MSLP data, trajectories lasting 20 days visit (on average) around 20% of the cells. If *τ* were very small, then only one or two cells would be visited in 20 days of the Lorenz63 integration, whereas if *τ* were very large, then the majority of the cells would be visited. An intermediate value of *τ* is therefore chosen; the choice does not need to be exact since the intention is not to make a precise quantitative comparison.

Similarly, the choice of time interval in the Lorenz96 integration corresponding to one day is motivated by the same criterion as for the Lorenz63 system; therefore 0.05 time units are taken to correspond to one day [this is one-quarter of the value chosen by Lorenz (1996) based on the relative error doubling times in the model and the atmosphere]. As with the Lorenz63 system, the integration data are subsampled to mimic the temporal distribution of the MSLP data.

## 3. Lorenz63–Lorenz96–MSLP comparison results

We first examine the Lorenz63 dataset, to test that the method is capable of highlighting the two known regimes in the system. We expect trajectories of a given duration starting from cells close to the center of each regime to visit relatively few neighboring cells, with trajectories starting from cells in the transition regions between the two regimes expected to visit more cells in the same amount of time.

A choice first has to be made as to how to divide the phase space into cells, both in terms of the dimensionality of the grid of cells and the number of quantiles in each dimension. A higher-dimensional grid allows a greater proportion of the total variance to be analyzed (since more EOFs are included) but produces more cells, and therefore fewer members in each cell. Similarly, dividing each dimension into more cells gives greater resolution, but again at the cost of reducing the number of members of each cell. In the case of the Lorenz63 data, 96% of the variance is contained within the leading pair of EOFs, so a two-dimensional grid of cells is clearly optimal.

The left-hand panel of Fig. 2 shows a plot of the median number of cells visited in a 20-“day” period for all trajectories starting from each cell of an 8 × 8 grid, with the trajectories shown in the background. There is large variation in this statistic across the grid of cells, with a range of 4–15. The regimes corresponding to the two lobes of the Lorenz attractor can be clearly seen as local minima (of 4 and 6) in the plot, showing that the cell analysis is capable of detecting the regime structure. Note also the weaker minimum of 10 in the lower half of column 4 in the grid; this corresponds to trajectories that become slow moving in a particular region midway between the two regimes, so that few cells are visited for trajectories starting in this section of phase space. The asymmetry between the two halves of this panel of Fig. 2 is due to the finite sample of the Lorenz63 data; for longer integrations of the Lorenz63 system, the median cells visited statistic appears to converge to a symmetric final state (not shown). The slow-moving region close to the center of the attractor appears just off center (in column 4) because there is an even number of columns in the grid.

Because the Lorenz63 system is relatively simple and has strong regime structure, the cell counting technique is further tested for the more complex Lorenz96 system. The results are shown in the right-hand panel of Fig. 2. Two regions of increased persistence that appear to correspond to the regimes identified by C15 are visible in the top-left and bottom-right corners of the figure panel, although they are weaker than for the Lorenz63 data (the three cells with smaller values of median cells visited in the bottom-left corner cumulatively contain less than 1% of the total number of days in the sample, so they appear unlikely to be significant).

We now apply the analysis to a grid of EOF cells based on the MSLP data for comparison. In contrast to the two Lorenz systems, the leading pair of MSLP EOFs account for only 35% of the total variance. The leading three EOFs account for similar proportions of the total variance so are not well separated. Additionally, the search for regimes may be hindered by projecting onto a low-dimensional subspace of the system (Kimoto and Ghil 1993). The MSLP analysis is therefore carried out for a 4 × 4 × 4 grid of 64 cells (i.e., the same number as for the Lorenz systems) based on the leading three EOFs, which account for around 52% of the total variance. The leading EOF resembles the NAO with nodes close to Iceland and the Azores, while the second EOF has nodes over Scandinavia and the North Atlantic; the third EOF has a monopole over the center of the domain (Fig. 3). Of course, a higher-dimensional grid based on more EOFs could be used, but as discussed above the finite size of the dataset starts to become an issue. While the leading three EOFs account for similar proportions of the total variance, EOF4 accounts for only 9.4% of the variance, so it appears well separated. Using only the leading EOFs also implicitly focuses on the large-scale MSLP features that vary more slowly in time.

There is less variation in the median cells visited across the grid (ranging from 11 to 13, as shown in the top row of Fig. 4) compared to the results in Fig. 2. Significance is tested by constructing 100 red noise datasets with the same autocorrelation and variance as the principal component time series (the datasets have a Gaussian distribution). While a number of cells (mostly around the edges of the grid) show significantly longer trajectories than the red noise data, there are no cells with significantly shorter trajectories. These results show a lack of evidence for regime-like behavior in the MSLP phase space.
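A red noise surrogate of the kind described can be sketched as a first-order autoregressive process with Gaussian innovations, scaled so that its lag-1 autocorrelation and variance match those of a given principal component series. The function names are hypothetical, and the treatment of season boundaries in the surrogates is not specified in the text, so it is ignored here.

```python
import random
from statistics import fmean, pvariance

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    m = fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

def red_noise(n, r, variance, rng):
    """AR(1) series of length n with lag-1 autocorrelation r and given variance."""
    sd = variance ** 0.5
    innov_sd = sd * (1.0 - r * r) ** 0.5  # keeps the process variance stationary
    x = [rng.gauss(0.0, sd)]
    for _ in range(n - 1):
        x.append(r * x[-1] + rng.gauss(0.0, innov_sd))
    return x

def surrogates(series, n_sets=100, seed=0):
    """Red noise series matching the autocorrelation and variance of `series`."""
    rng = random.Random(seed)
    r, var = lag1_autocorr(series), pvariance(series)
    return [red_noise(len(series), r, var, rng) for _ in range(n_sets)]
```

Applying the cell analysis to each surrogate set yields a null distribution of the median cells visited statistic against which the observed values can be compared.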

The results are not particularly sensitive to the chosen grid resolution (not shown). However, the two regimes of the Lorenz63 system are not resolved for grids with a coarser resolution than about 5 × 5. The two minima corresponding to the Lorenz63 regimes remain visible for finer-resolution grid sizes, although some unoccupied cells occur in the finer-resolution grids. For the Lorenz96 system, the two regimes are evident for grid resolutions of between 5 × 5 and 8 × 8; above this resolution, the more weakly detected regime is not evident. For the MSLP data, finer-resolution cubic grids (up to 8 × 8 × 8) show qualitatively similar results to the 4 × 4 × 4 grid, with much less variation across phase space in the median cells visited statistic compared to the Lorenz data. This statistic appears noisier for finer-resolution grids, presumably because of the smaller number of members in each cell.

Results for different trajectory durations are also broadly similar to the results shown here. For a range of different durations of 5 days up to 50 days, the minima in the Lorenz63 system corresponding to the two regimes are always visible, as is the case for the Lorenz96 system. The MSLP system again shows relatively little variation in the median cells visited statistic across phase space. Because the MSLP EOFs are not well separated, the MSLP cell analysis was repeated using varimax rotated EOFs, but little difference was found in the results.

Phase speed (i.e., the distance traveled through phase space in one day) is examined by calculating the distribution of phase speeds from all possible trajectories starting from a given cell. For the Lorenz63 system with the strongest regime structure, there is a pronounced difference in phase speed between different cells: the trajectories in the fastest cells move significantly farther in one “day” than the trajectories in the slowest cells. Comparing the cell median phase speed values for each cell in the grid, the ratio of the highest value to the lowest value is approximately 7.4. For the Lorenz96 and MSLP systems, these ratios are approximately 1.9 and 1.5, respectively, suggesting that regimes are weaker in the Lorenz96 and weakest in the MSLP systems. While there is no overlap in phase speeds for the slowest and fastest cells in the Lorenz63 system, for the MSLP system the phase speed distributions of all the cells overlap. Figure 5 shows percentiles of cells visited in 20 days and phase speed for all trajectories starting from a given cell. Within each MSLP cell, the fastest trajectories move about 4 times farther in one day than the slowest, suggesting more variation within each cell than between cells. Although the MSLP cells with the slowest phase speeds tend to have a lower median cells visited statistic, the link is fairly weak, perhaps because the latter statistic is based on trajectories over a longer time period than the former.
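The phase speed comparison can be sketched as follows, taking the phase speed to be the Euclidean distance between consecutive daily points in the PC coordinates (an assumption, since the distance measure is not stated explicitly). The inputs `points`, `cells`, and `season_ids` are hypothetical per-day lists.

```python
from math import dist
from statistics import median

def phase_speeds_by_cell(points, cells, season_ids):
    """One-day displacements in phase space, grouped by the starting cell."""
    speeds = {}
    for i in range(len(points) - 1):
        if season_ids[i] != season_ids[i + 1]:
            continue  # do not measure a step across a gap between seasons
        speeds.setdefault(cells[i], []).append(dist(points[i], points[i + 1]))
    return speeds

def speed_ratio(speeds):
    """Ratio of the highest to the lowest cell-median phase speed."""
    medians = [median(v) for v in speeds.values()]
    return max(medians) / min(medians)
```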

### Comparison to clusters

Previous work has suggested that there are four regimes in the NAE region during winter, corresponding to the two phases of the NAO and two additional patterns generally called Atlantic ridge and Scandinavian blocking (e.g., Cassou 2008). To see whether there is any correspondence between the center of these regimes and variation in the median number of cells visited, cluster analysis is applied to the area-weighted MSLP fields to produce four clusters, so that the composition of the cells in terms of the clusters can be determined. Experiments with principal components of the leading EOFs showed that changes to the number of principal component series retained for clustering made little difference to the final classification. The four clusters produced here spatially resemble the four clusters identified in previous work: Scandinavian blocking (SBL), NAO+, Atlantic ridge (AR), and NAO− for clusters 1–4 respectively (left-hand column of Fig. 6). The relative frequency of occurrence of the clusters differs from some previous work (e.g., Dawson and Palmer 2015), possibly because different years are covered by the datasets used to generate the clusters.

The bottom row of Fig. 4 shows the distribution of the four clusters within the 4 × 4 × 4 grid of cells. The grid cell containing each cluster centroid is labeled with the appropriate cluster number in the top row of the figure. Comparison of the two rows of the figure shows that there is no clear slower-moving region of phase space at the center of each cluster, except perhaps for the NAO− cluster. Indeed, this has to be the case given the high degree of homogeneity of the data in the top row of the figure.

This result is reinforced by the PDF plots in the right column of Fig. 6. The plots are produced by dividing the 64 cells in the grid into five groups, with each cell belonging to a unique group. The idea is to separate the postulated slower-moving regions of phase space at the very center of each of the four clusters (contained in groups 1–4) from the faster-moving transition regions between the clusters (group 5). Group 1 contains those cells dominated by cluster 1 (defined as being where 99% or more of the cell members belong to cluster 1). Groups 2, 3, and 4 are similarly defined for clusters 2, 3, and 4 (the four groups contain 8, 5, 6, and 4 cells respectively). Group 5 contains the remaining 41 cells where no single cluster accounts for 99% of the members of a cell. The distribution of the number of cells visited in all 20-day trajectories starting from each group of cells is then calculated (other trajectory durations of 5, 10, and 30 days were also tried, with qualitatively similar results to the 20-day trajectories). If the regime structure is present, the PDFs for groups 1–4 (located at the center of the clusters) would be expected to favor lower numbers of cells visited compared to group 5 (the transition regions).
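The grouping of cells can be sketched as below, assuming each day carries both a cell label and a cluster label numbered 1 to 4 (hypothetical inputs `cells` and `clusters`). A cell joins the group of its dominant cluster only if that cluster accounts for at least the threshold fraction of its members; otherwise it joins the transition group.

```python
from collections import Counter

def group_cells(cells, clusters, threshold=0.99, n_clusters=4):
    """Assign each cell to a cluster-dominated group or to the transition group."""
    members = {}
    for cell, cluster in zip(cells, clusters):
        members.setdefault(cell, Counter())[cluster] += 1
    groups = {}
    for cell, counts in members.items():
        cluster, n = counts.most_common(1)[0]
        dominated = n / sum(counts.values()) >= threshold
        # Groups 1..n_clusters are cluster cores; n_clusters + 1 is the remainder.
        groups[cell] = cluster if dominated else n_clusters + 1
    return groups
```

The cells visited distributions can then be pooled over all starting cells sharing a group.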

However, inspection of the plots shows that this is not the case: for clusters 1, 2, and 3 there is little evidence of slower-moving regions of phase space at the heart of the cluster. Cluster 4 (NAO−) does show a weak tendency toward shorter trajectories compared to the transition regions, but the PDF still shows substantial overlap with the transition cell PDF.

This suggests that the centers of clusters do not represent atmospheric states that are any more persistent than the states represented by neighboring cells; what small variation there is in median cells visited across phase space does not correlate with the position of the clusters. This is consistent with the results of Franzke and Feldstein (2005), who suggest that low-frequency teleconnection patterns can equally well be regarded as a continuum of states as opposed to a particular small set of recurrent regimes. Similarly, the scalar blocking index used by Scaife et al. (2010) to diagnose when blocking occurs has a Gaussian-like PDF, suggesting that blocking is part of a continuum of atmospheric states rather than a distinct regime.

The slowest-moving section of phase space (as identified by the column of cells in Fig. 4 with the lowest value) is contained within the NAO− cluster. This is perhaps not surprising, given the well-known persistence of blocked atmospheric states (e.g., Dole 1989) and the relative persistence of the negative phase of the NAO (Barnes and Hartmann 2010; Woollings et al. 2010b; Rivière and Drouard 2015). Stan and Straus (2007) relate blocking to weather regimes in a study of the Pacific–North American sector; they find that blocking is most associated with one particular regime, but many further days in this regime are not associated with blocking. The results here suggest that the NAO− cluster may show similar behavior, containing some persistent episodes visiting fewer cells, but also intervals of shorter duration.

Previous authors have used low-pass filtering to try to discern weather regimes in the data (e.g., Kimoto and Ghil 1993; Straus et al. 2007; Straus 2010). To test whether this approach alters the results above, the analysis was repeated with a 10-day low-pass Lanczos filter applied to the MSLP anomaly fields. The cluster centroids show some variation compared to the unfiltered case, although the NAO cluster remains relatively unaffected (Fig. 7). However, the results of the analysis showed little qualitative difference to the unfiltered case (not shown). The median number of cells visited statistic is slightly reduced, with a range of 8–10 compared to 11–13 for the unfiltered data. This is perhaps to be expected, given that the low-pass filtering may enhance persistence. However, the distribution of cells visited is again relatively homogeneous across phase space, with the NAO− cluster again standing out as slightly favoring slower-moving trajectories.

## 4. Preferred transitions

The above results show little evidence of persistent regime-like structures in the MSLP data, with the possible exception of the NAO− cluster. However, some phase space structure may be discerned from the cell analysis by looking at transitions between cells from one day to the next.

Examination of these sequences shows that for some pairs of cells, there are many more transitions in one direction than the other. Furthermore, pairs of cells with a preferred direction can be linked together, highlighting preferred pathways through phase space. The preferred transitions are slightly stronger for the low-pass filtered data. Figure 8 shows one of the longest possible chains of transitions with a significantly preferred direction for each link of the chain (significance at the 5% level is assessed using the null hypothesis that for a pair of cells A and B, A → B and B → A transitions are equally likely). Note that Fig. 8 is not meant to imply that any single trajectory follows every link of the entire chain; rather, the figure is intended to show that individual preferred transitions can be linked together into longer coherent sequences of states. Examples of such sequences include a low pressure anomaly moving east between cells 4 and 10, and a high pressure anomaly moving west and then north between cells 16 and 25. The latter transition is consistent with the results of Woollings et al. (2008), who find that episodes of blocking over Greenland are frequently preceded by blocking over northern Europe.
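The significance assessment for a pair of cells can be sketched with a two-sided exact binomial test under the stated null hypothesis that A → B and B → A transitions are equally likely. The specific test is not named in the text, so the choice made here is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import comb

def transition_counts(cells, season_ids):
    """Count day-to-day transitions between different cells within a season."""
    counts = Counter()
    for i in range(len(cells) - 1):
        if season_ids[i] == season_ids[i + 1] and cells[i] != cells[i + 1]:
            counts[(cells[i], cells[i + 1])] += 1
    return counts

def direction_p_value(n_ab, n_ba):
    """Two-sided exact binomial p value for equal A->B and B->A probabilities."""
    n, k = n_ab + n_ba, max(n_ab, n_ba)
    # Probability of a split at least this uneven in the favored direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

A pair would be flagged as having a preferred direction at the 5% level when `direction_p_value` returns a value below 0.05.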

Previous work has examined preferred transitions in the context of four winter clusters. Luo et al. (2012) find transitions from NAO+ to SBL and on to NAO− for 1991–2008, and from NAO− to AR and on to NAO+ for 1978–90. Michel and Rivière (2011) find the transitions from NAO+ to SBL, AR to NAO+, NAO+ to AR, and SBL to NAO− to be favored above other possible transitions for the 1958–2001 period.

Because the clusters occupy several cells and a single cell can contain members of several clusters, a direct comparison with these results is not straightforward. Also, some of the preferred cell transitions found here occur largely within a single cluster, so this information will be missed by an analysis that focuses on transitions between clusters. Nevertheless, the second row of Fig. 8 is suggestive of a transition from NAO+ (cells 6–8) to AR (cell 10), as previously found by Michel and Rivière (2011). The fourth and fifth rows of cells in Fig. 8 show a transition from SBL (cell 16) to AR (cells 20 and 21), which then evolves to an NAO− state (cells 24 and 25). This is consistent with the transition from SBL to NAO− identified by previous authors (Michel and Rivière 2011; Luo et al. 2012; Woollings et al. 2008; Vautard 1990). Vautard (1990) also identifies three transitions as unlikely to occur (from Greenland anticyclone to either blocked or Atlantic ridge, and from zonal to Greenland anticyclone); consistent with those results, none of these transitions is evident in Fig. 8.

Figure 8 shows one possible sequence of preferred transitions. An alternative focus is the most probable transitions between neighboring cells (where the probability of a transition from cell A to cell B is defined as the number of A → B transitions divided by the total number of trajectories starting from cell A). The 40 most probable transitions are shown in Fig. 9, which displays the full set of 64 cell mean fields as four layers of the 4 × 4 × 4 cube of cells. Transitions involving the cells projecting onto the positive NAO (EOF1 quantile 1, top-left quadrant of the figure) and negative NAO (EOF1 quantile 4, bottom-right quadrant) tend to be the most probable. Several multicell transition sequences can be seen; in particular, the previously mentioned sequence of NAO+ evolving to an AR-like state reappears in the top row of the top-left quadrant, while the sequence from SBL to NAO− is again evident running clockwise around the edge of the bottom-right quadrant. Other sequences not present in Fig. 8 are also visible, and the direction of the flow through phase space shows coherence across the grid of cells.
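The transition probability defined above can be illustrated with a short sketch. It assumes a single continuous day-by-day sequence of cell labels, and it interprets "trajectories starting from cell A" as every day spent in cell A that has a following day; both are assumptions made for illustration, as are the function names. The restriction to neighboring cells and the breaks between winters are not handled here.

```python
from collections import Counter


def transition_probabilities(cells):
    """Map (a, b) -> estimated probability of a move from cell a to cell b.

    cells: day-by-day sequence of cell labels (hypothetical input format).
    """
    pair_counts = Counter()
    start_counts = Counter()
    for a, b in zip(cells, cells[1:]):
        start_counts[a] += 1  # every day in cell a that has a successor
        if a != b:
            pair_counts[(a, b)] += 1  # only moves to a different cell count
    return {pair: n / start_counts[pair[0]] for pair, n in pair_counts.items()}


def most_probable(probs, top=40):
    """Return the `top` transitions with the highest estimated probability."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical label sequence, for illustration only.
probs = transition_probabilities([1, 1, 2, 1, 2, 3])
ranked = most_probable(probs, top=2)
```

For real seasonal data, each winter would need to be processed as its own sequence so that the last day of one winter is not paired with the first day of the next.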

## 5. Conclusions

The time evolution of daily mean MSLP patterns has been analyzed for the North Atlantic–European region in winter. The technique used is to divide the phase space of the system (as defined by the principal component time series of the leading EOFs) into approximately equally populated cells, and then to count the number of different cells visited in sequences of days. For the Lorenz systems this technique reveals regions of faster- and slower-moving trajectories that correspond to regime structure in phase space. For the Twentieth Century Reanalysis MSLP data, however, there is much less difference between the fastest- and slowest-moving regions of phase space. Furthermore, there is no correlation between the slower-moving regions of phase space and the centroids of clusters found using *k*-means cluster analysis, implying that evidence for temporally persistent regimes is weak. These results are found despite the focus on the leading EOF modes and the use of a 4-month winter season, both factors that have been argued by Dawson and Palmer (2015) to aid the detection of regimes.
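The cell counting technique summarized above can be sketched in a few lines. The sketch assumes that each principal component is split independently at its own empirical quantiles, and that cells are labeled by a zero-based integer; the exact binning procedure, the label numbering, and all function names here are illustrative assumptions rather than the method as implemented for this paper.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Interior edges that split `values` into n_bins roughly equal groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_cells(pcs, n_bins=4):
    """Assign each day to a cell.

    pcs: list of per-day tuples of principal component values.
    Returns one integer label per day in range(n_bins ** n_dims).
    """
    n_dims = len(pcs[0])
    edges = [quantile_edges([row[d] for row in pcs], n_bins)
             for d in range(n_dims)]
    labels = []
    for row in pcs:
        label = 0
        for d in range(n_dims):
            # Index of the quantile bin along dimension d, in 0..n_bins-1.
            label = label * n_bins + bisect_right(edges[d], row[d])
        labels.append(label)
    return labels


def cells_visited(labels, window):
    """Number of different cells in each run of `window` consecutive days."""
    return [len(set(labels[i:i + window]))
            for i in range(len(labels) - window + 1)]
```

Low counts from `cells_visited` mark slower-moving (more persistent) stretches of the trajectory, and high counts mark faster-moving ones. With three principal components and `n_bins=4`, the labels span the 64 cells of a 4 × 4 × 4 cube.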

This paper’s interpretation of persistence in terms of cells visited is slightly different from defining persistence in terms of distance traveled through phase space. The unequal size of the cells in phase space (as is clear from comparing the edge cells with the center cells in Fig. 1) means that traveling a given distance in phase space does not necessarily equate to visiting a set number of cells. Repeating the analysis of Figs. 2, 4, and 6 with a measure of distance traveled along trajectories rather than cells visited gives qualitatively similar results. The persistent states remain clear in the Lorenz63 data, while the centers of the MSLP clusters do not all correspond to slower-moving phase space trajectories, although blocking patterns are linked to slightly slower-moving trajectories (not shown). (For the Lorenz96 data, only the wavenumber-2 regime appears to be linked with smaller distances traveled, perhaps because the cell sizes and populations are unevenly distributed in this system.)
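As a rough illustration of the alternative measure, the sketch below sums the distance traveled along a trajectory over a fixed number of days. It assumes Euclidean distance in the space of the retained principal components, which the text does not specify; the function name `distance_traveled` is likewise hypothetical.

```python
from math import dist


def distance_traveled(pcs, window):
    """Path length through phase space over each run of `window` days.

    pcs: list of per-day tuples of principal component values.
    Assumption: Euclidean distance between consecutive daily states.
    """
    steps = [dist(a, b) for a, b in zip(pcs, pcs[1:])]
    # A run of `window` days contains window - 1 day-to-day steps.
    return [sum(steps[i:i + window - 1])
            for i in range(len(steps) - window + 2)]
```

Unlike a count of cells visited, this measure does not depend on where the cell boundaries fall, so it is unaffected by the larger size of the edge cells.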

Cluster analysis is clearly a useful tool for simplifying the analysis of large amounts of data, with a range of applications that include the analysis of teleconnections (Cassou 2008), links between circulation and surface climate (Beck et al. 2016), and model evaluation (Dawson et al. 2012; C15). The results presented here do not invalidate this earlier work but nevertheless suggest that care is needed in interpreting the physical significance of the clusters found by the *k*-means method. The four winter clusters found in many previous papers are clearly reproducible, given the similarity of patterns produced by different authors. However, the current analysis shows that (with the possible exception of NAO−) these clusters do not appear to represent either regimes, in the sense of quasi-persistent states of the atmospheric circulation, or states that are more persistent than intermediate patterns between them. Rather, there appears to be a continuum of states in phase space.

The lack of evidence for regime-like behavior does not mean that the phase space for the winter North Atlantic–European region MSLP is featureless. The relatively short observed record hinders the search for regimes. Because the limited amount of observed data restricts the feasibility of searching in a high-dimensional space (Kimoto and Ghil 1993), it may be that there is some higher-dimensional regime structure that the cell counting method fails to detect. The lack of observed data similarly hampers complex techniques such as that applied to a quasigeostrophic model by Pires and Ribeiro (2017). Their analysis uses a long ($10^{6}$ day) model integration to adequately sample the model attractor; since this is much longer than the duration of the observed record (and the model attractor is presumably much simpler than that of the real atmosphere), the analysis only appears feasible for observed data if quite simple non-Gaussian structures are sought on a moderate dimension space. Nevertheless, the cell counting method described here does detect some spatial variation in the speed of trajectories through phase space, although it is weaker than that shown in either of the Lorenz systems. In particular, trajectories associated with the negative phase of the NAO are slightly slower moving than elsewhere. This is consistent with previous work suggesting stronger persistence for the negative phase of the NAO than the positive phase (Barnes and Hartmann 2010; Woollings et al. 2010b; Rivière and Drouard 2015).

Moreover, the analysis shows that there is a preferred direction to the progression of trajectories through many parts of phase space. Because previous work tends to focus on transitions between a handful of regimes occupying large parts of phase space, predictability within some of the typically derived regimes may have been missed. Again, these results indicate that a finer-resolution analysis than is often used may be needed to fully describe the evolution of weather states in the North Atlantic–European region.

## Acknowledgments

This work was supported by the Joint UK BEIS/Defra Met Office Hadley Centre Climate Programme (GA01101). Useful conversations with Jeff Knight and Adam Scaife are acknowledged. Comments from three anonymous reviewers helped to improve the paper.

## REFERENCES

*Proc. Seminar on Predictability*, Shinfield Park, Reading, ECMWF, Vol. 1, 1–18.

*New Directions in Statistical Physics: Econophysics, Bioinformatics, and Pattern Recognition*, L. T. Wille, Ed., Springer, 273–309.
