## Abstract

Observed atmospheric circulation over the North Atlantic–European (NAE) region is examined using cluster analysis. A clustering algorithm incorporating a “simulated annealing” methodology is employed to improve on solutions found by the conventional *k*-means technique. Clustering is applied to daily mean sea level pressure (MSLP) fields to derive a set of circulation types for six 2-month seasons. A measure of the quality of this clustering is defined to reflect the average similarity of the fields in a cluster to each other. It is shown that a range of classifications can be produced for which this measure is almost identical but which partition the days quite differently. This lack of a unique set of circulation types suggests that distinct weather regimes in NAE circulation do not exist or are very weak. It is also shown that the stability of the clustering solution to removal of data is not maximized by a suitable choice of the number of clusters. Indeed, there does not appear to be any robust way of choosing an optimum number of circulation types. Despite the apparent lack of preferred circulation types, cluster analysis can usefully be applied to generate a set of patterns that fully characterize the different circulation types appearing in each season. These patterns can then be used to analyze NAE climate variability. Ten clusters per season are chosen to ensure that a range of distinct circulation types that span the variability is produced. Using this classification, the effect of forcing of NAE circulation by tropical Pacific sea surface temperature (SST) anomalies is analyzed. This shows a significant influence of SST in this region on certain circulation types in almost all seasons. A tendency for a negative correlation between El Niño and an anomaly pattern resembling the positive winter North Atlantic Oscillation (NAO) emerges in a number of seasons. A notable exception is November–December, which shows the opposite relationship, with positive NAO-like patterns correlated with El Niño.

## 1. Introduction

Many attempts have been made to understand midlatitude synoptic climate variability in terms of a relatively small set of dominant large-scale patterns or regimes. In this paradigm, the evolution of the atmospheric circulation consists of periods of one persistent regime interspersed with transitions from one regime to another. It has been argued that an analogy can be made with simple chaotic dynamical systems such as the Lorenz system, in which state trajectories spend more time in those regions of phase space corresponding to quasi-stationary states (Palmer 1998, 1999).

The focus of this paper is the North Atlantic–European (NAE) region. The first schemes for classifying the atmospheric circulation patterns over this region involved subjectively grouping together patterns in synoptic weather charts, for example, Grosswetterlagen (Baur et al. 1944; James 2006, and references therein) and Lamb weather patterns (Lamb 1972). More recently, numerous studies have tried to classify atmospheric circulation patterns objectively. One approach is to define regimes as quasi-stationary large-scale circulation patterns (Vautard 1990). A more popular approach is to look for those circulation types that recur most frequently. One way to do this is to look for local maxima in a probability density function (PDF) of circulation types (e.g., Kimoto and Ghil 1993a, b; Corti et al. 1999; Hsu and Zwiers 2001). Since there are insufficient historical circulation data to reliably estimate PDFs in many dimensions (and also to remove noise that may obscure the presence of weather regimes), the data are generally projected onto the leading pair of spatial patterns of a principal component analysis. Each field (pressure or geopotential height) is then represented by a point on a two-dimensional plane. A bivariate PDF is estimated from the distribution of points, and a search algorithm identifies weather regimes as the local maxima in this PDF. The statistical significance of the local maxima is generally assessed by comparison with PDFs of Gaussian red noise data.

Another popular method is cluster analysis, which aims to partition the data so that each cluster contains similar fields and different clusters contain patterns distinct from one another. Several variants exist, including mixture model clustering (Smyth et al. 1999), *k*-means (Michelangeli et al. 1995), hierarchical clustering (Cheng and Wallace 1993), and the simulated annealing method used here (Philipp et al. 2007). A problem with all cluster methods is how to identify the optimal number of clusters *k*. Tests based on Gaussian red noise are used in *k*-means cluster analysis by Michelangeli et al. (1995, hereafter M95) and others to do this.

Virtually all previous studies focus on the Northern Hemisphere winter season (definitions of which vary). Most studies conclude that there are indeed preferred weather regimes present in the NAE region, and several claim to reproduce the circulation patterns found in other studies. However, while there are similarities between circulation patterns found in different studies, there are also differences. In addition, the number of regimes identified is not robust to the method used, varying between two and six or more. This variation may be due in part to the different time averaging (e.g., daily or monthly means) used in different studies; Teng et al. (2004) show that time averaging of the data can affect the detection of regimes.

However, the evidence for multiple weather regimes from both the PDF and cluster analysis approaches is inconclusive. The PDFs do not always show more than one maximum (e.g., Kimoto and Ghil 1993a) and even when multiple maxima are seen, they may not be significant (Christiansen 2002). Rather than attempt to find multiple regimes, Stephenson et al. (2004) instead try to reject the null hypothesis that the data are multinormal (i.e., multivariate Gaussian). They analyze the same data as Corti et al. (1999) (who identify four regimes) and follow the same procedure of producing a bivariate PDF of the data in the space spanned by the leading pair of empirical orthogonal functions (EOFs). Despite this, they find that the null hypothesis of multinormality cannot be rejected at the 5% level.

Of course, significant deviations from multinormality are not a guarantee of multimodality: the data could be unimodal but non-Gaussian. The problem is that unimodality and multimodality are such broad hypotheses that it is difficult to test their significance for a given set of data. The often-used comparison with Gaussian red noise suffers from the limitation that multinormal distributions are only one possible type of unimodal distribution.

Christiansen (2007) applies *k*-means clustering to synthetic datasets constructed to contain no clusters but whose distributions are either anisotropic Gaussian, skewed, or platykurtic (i.e., flattened). The variable *k* is chosen by comparing the original data with Gaussian red noise. For the anisotropic Gaussian data *k* = 1 is correctly chosen. However, for the skewed and platykurtic data *k* > 1 is selected, with larger *k* for larger skewness or platykurticity. Christiansen also compares the clusters generated by applying mixture model clustering and *k*-means clustering to the same data; the results are inconsistent in terms of both the number and shape of the clusters produced. This suggests that the multiple weather regimes found by M95 and others may not represent physically important features of NAE circulation.

Research based on other methods also casts doubt on the weather regimes paradigm. In a study based on teleconnection patterns, Franzke and Feldstein (2005) find results consistent with a continuum of such patterns. Two interpretations are offered: either that each member of the continuum can be expressed as a linear combination of a few physical modes analogous to basis patterns, or that most members of the continuum represent real physical patterns with their own spatial structure and frequency of occurrence. The autocorrelation times for all the low-frequency patterns are found to be similar, contradicting the idea that a few preferred regime patterns are more persistent than the rest.

The schemes described above are all statistical; hence any regimes identified are not necessarily physically meaningful and may be merely statistical artifacts. Even if regimes are identified from PDFs, they may not be persistent or predictable. Sura et al. (2005) model atmospheric circulation as a dynamical system that evolves due to predictable, deterministic interactions on long time scales and forcing on time scales sufficiently short that it can be approximated by stochastic noise. Using a bivariate PDF analysis of observed data, they conclude that the structure of the PDF can only be explained by including stochastic noise that depends on the state of the system. They show that for a simple one-dimensional system such state-dependent stochastic noise can limit persistence and predictability while being consistent with local maxima in the PDF. This means that the existence of regimes associated with non-Gaussian features of a PDF does not guarantee useful predictability.

An obvious use of classifications of circulation types is to investigate atmosphere–ocean links. The surface ocean [as represented by sea surface temperature (SST)] has the potential to modify extratropical circulation through heating or cooling of the atmosphere and consequent atmospheric dynamical responses. Numerous authors have attempted to determine the forcing of NAE circulation anomalies by SST (e.g., Ratcliffe and Murray 1970; Palmer and Sun 1985; Peng et al. 1995; Rodwell et al. 1999; Rodwell and Folland 2002, 2003; Moron and Plaut 2003; Cassou et al. 2004), primarily for the winter season. Although the existence of oceanic forcing of the extratropical atmosphere has been demonstrated in the North Atlantic, the effect of the atmospheric forcing of the extratropical ocean appears to be larger. This is consistent with the hypothesis of Hasselmann (1976) that the extratropical ocean integrates weather noise in time to give a red spectrum without much feedback to the atmosphere.

The focus in this paper is on links between tropical Pacific SST and the NAE region. Several such links have been documented by previous authors. Precipitation in the western Mediterranean is shown by Mariotti et al. (2002) to be positively correlated with the Niño-3.4 index in autumn but negatively correlated in spring. The link is not present in all decades, however. Van Oldenborgh et al. (2000) show that the winter Niño-3 index is correlated with wetter conditions in northern Europe and drier conditions in southern Europe in the following spring. Pozo-Vázquez et al. (2001) find a positive North Atlantic Oscillation (NAO)-like response in a composite of MSLP fields in December to February associated with strong La Niña events. A similar La Niña–positive NAO association is found by Gouirand and Moron (2003) for January to March, while El Niño is linked to the negative NAO in the same months. These links show decadal variability, with the La Niña association being more stable. Fraedrich (1990) finds El Niño events to be linked to an increase in cyclonic Grosswetterlagen circulation types in winter, but since these types are based on the central European region the results may not be directly comparable to changes in the NAO.

Other authors find the amplitude and spatial pattern of El Niño–Southern Oscillation (ENSO) events to be important. Toniazzo and Scaife (2006) find that the NAE region response to El Niño events in January–February varies nonlinearly with the strength of the events; moderate El Niño events are associated with a negative NAO-like response, but strong events are associated with a positive MSLP anomaly over the eastern North Atlantic and western Europe. Larkin and Harrison (2005) divide El Niño events into those showing a “conventional” pattern (warming in the central and eastern tropical Pacific) and “date line” events (warming in the central Pacific only). The conventional events are associated with warming in eastern Europe, whereas date line events show cooling in this region.

The work presented in this paper is based on observed MSLP clusters generated using the simulated annealing algorithm described in Philipp et al. (2007). Unlike previous studies, the motivation for using cluster analysis here is not to find a small set of preferred low-frequency weather regimes. The aim is rather to generate a representative set of large-scale circulation patterns for each season, to investigate climate variability as manifested by changes in circulation. The number of clusters used is consequently larger than in many previous studies. Analysis of the frequency of occurrence of the clusters reveals variability and atmosphere–ocean links in each season.

Data and methods are discussed further in section 2. Section 3 discusses the issue of cluster instability. Associations between clusters and SST are described in section 4. Because the cluster analysis produces too many SST links to analyze everything in detail here, attention is restricted to tropical Pacific SST. Finally, section 5 contains a discussion of the results and conclusions.

## 2. Data and methods

To characterize observed daily circulation, fields from the European and North Atlantic Daily to Multidecadal Climate Variability (EMULATE) MSLP (EMSLP) gridded dataset are used (Ansell et al. 2006). EMSLP was developed as part of the European Union Fifth Framework project. The dataset has a resolution of 5° × 5° over the region 25° to 70°N, 70°W to 50°E. The data run from the start of 1850 to the end of 2003. Monthly-mean SST data used in section 4 are taken from the Hadley Centre Sea Ice and SST (HadISST) dataset (Rayner et al. 2003).

Clusters based on daily data are generated for different seasons. Daily data are used to give a more detailed characterization of circulation variability in each season than is possible using fields based on longer time means. Clusters are generated using a simulated annealing variant of *k*-means clustering previously applied in genetics (Lukashin and Fuchs 2001). The *k*-means algorithm works to minimize the within-cluster variance summed over all the clusters (hereafter *V*) by exchanging MSLP fields between a prespecified number of clusters *k*. In conventional *k*-means, the fields are always moved so that *V* is reduced at each step. This process continues until a local minimum of *V* is reached. The annealing algorithm improves on this approach by sometimes moving MSLP fields to increase *V* to try to avoid local minima of *V*. The parameter controlling the proportion of such exchanges is called the temperature (by analogy with the annealing of metals by slow cooling). As the algorithm proceeds, the temperature is reduced exponentially so that progressively fewer “incorrect” moves are made and the algorithm converges to a minimum of *V*. In theory, if the temperature is reduced sufficiently slowly the algorithm should converge to the global minimum. In practice, better solutions are generally found by running the algorithm several times with a faster cooling rate than by running only once with a very slow cooling rate.
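The annealing idea described above can be illustrated with a minimal sketch. It is not the algorithm of Philipp et al. (2007): the Metropolis acceptance rule, the initial temperature scaling, and the names `anneal_kmeans`, `cooling`, and `sweeps` are all assumptions made for illustration. For clarity rather than speed, *V* is recomputed from scratch after every trial move.

```python
import numpy as np

def within_cluster_variance(X, labels, k):
    """V: squared distance of every field to its cluster centroid, summed over clusters."""
    V = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members):
            V += ((members - members.mean(axis=0)) ** 2).sum()
    return V

def anneal_kmeans(X, k, cooling=0.9, sweeps=50, seed=0):
    """Hypothetical annealing loop: single-field moves with Metropolis acceptance."""
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = rng.integers(0, k, size=n)      # random initial partition
    V = within_cluster_variance(X, labels, k)
    T = V / n                                # initial temperature (assumed scaling)
    for _ in range(sweeps):
        for i in rng.permutation(n):
            old, new = labels[i], rng.integers(0, k)
            if new == old:
                continue
            labels[i] = new                  # trial move of field i
            V_new = within_cluster_variance(X, labels, k)
            dV = V_new - V
            # Always accept a decrease; accept an increase with probability exp(-dV/T).
            if dV <= 0 or rng.random() < np.exp(-dV / T):
                V = V_new
            else:
                labels[i] = old              # reject: undo the move
        T *= cooling                         # exponential cooling
    return labels, V
```

Running such a routine several times with different seeds and keeping the classification with the smallest *V* corresponds to the multiple-run strategy mentioned above.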

This work uses clusters for the 2-month seasons January–February (JF), March–April (MA), May–June (MJ), July–August (JA), September–October (SO), and November–December (ND). This choice is made to reduce within-season variation while still retaining a substantial quantity of data within each season. Rather than reducing the dimensions of the data prior to clustering by projecting onto the leading few EOFs, the full 250-dimensional MSLP fields are clustered.

The aim of using cluster analysis in this paper is to produce a set of distinct circulation patterns that are characteristic of each season, to study the variability of different circulation types. No attempt is made to isolate low-frequency weather regimes; every daily field is classified and no temporal filtering of the fields is carried out. However, a seasonally varying climatology is subtracted from the MSLP fields before the clustering is carried out. Without this step, cluster frequencies tend to be biased toward the start or end of the season. This is particularly true for the seasons around the equinoxes where there are substantial differences between typical MSLP fields at the beginning and end of the season.

The climatology is produced by forming a mean MSLP field for each day of the year using all the years of the EMSLP data. Because the amount of MSLP data is limited, the averaged set of fields for each day is only an approximation to the “true” climatology for that day. To address this, the climatology is then smoothed by repeatedly applying a 1, 2, 1 binomial filter so that the fields from neighboring days are merged together. The number of times to apply the filter is chosen by using an ensemble of 18 Hadley Centre atmospheric model (HadAM3) runs (Pope et al. 2000) for the period 1950–2002. The ensemble mean from all the runs is used as an estimate of the true climatology. A climatology is then formed from an individual run. The binomial filter is then repeatedly applied to this climatology and the area-weighted sum of squared differences between the smoothed climatology and the true climatology is calculated. The optimum number of times to apply the filter is then taken to be the number that minimizes this sum of squared errors, which is found to be 270. The effect of this repeated filtering is well approximated as a Gaussian filter with time scale √270 ≈ 16.4 days, with a half power period of approximately 88 days. This approach succeeds in producing clusters that occur more evenly throughout each season.
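The smoothing procedure can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: the filter is assumed to wrap around the calendar year, the climatology is stored as a (day of year, grid point) array, and the function names and the `weights` argument (for example, cosine of latitude) are hypothetical. The last function uses the amplitude response cos²(π*f*) of a single 1, 2, 1 pass to compute the half power period for a given number of passes.

```python
import numpy as np

def smooth_climatology(clim, passes):
    """Apply a circular 1-2-1 filter along the day-of-year axis `passes` times."""
    out = np.asarray(clim, dtype=float).copy()
    for _ in range(passes):
        out = (0.25 * np.roll(out, 1, axis=0) + 0.5 * out
               + 0.25 * np.roll(out, -1, axis=0))
    return out

def best_number_of_passes(noisy_clim, true_clim, weights, max_passes):
    """Pass count that minimizes the area-weighted sum of squared errors."""
    smoothed = np.asarray(noisy_clim, dtype=float).copy()
    errors = []
    for _ in range(max_passes + 1):
        errors.append(float((weights * (smoothed - true_clim) ** 2).sum()))
        smoothed = smooth_climatology(smoothed, 1)
    return int(np.argmin(errors))

def half_power_period(passes):
    """Period (days) at which the filtered power falls to one half.

    One pass has amplitude response cos^2(pi f), so n passes have
    power response cos^(4n)(pi f).
    """
    f = np.arccos(0.5 ** (1.0 / (4 * passes))) / np.pi
    return 1.0 / f
```

With 270 passes, `half_power_period(270)` evaluates to roughly 88 days, in line with the value quoted above.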

## 3. Stability and number of clusters

We first consider whether the cluster algorithm reveals any genuine clustering in the MSLP data. In the following discussion, “classification” means a partition of the MSLP fields into a set of clusters. Our experiments with EMSLP show that in general, the clusters produced are of roughly equal size (although there is a tendency toward a broader distribution of sizes as the number of clusters *k* is increased). This suggests that rather than finding distinct regimes, the algorithm could merely be partitioning a smooth cloud of data into similarly sized volumes [as was also noted by Christiansen (2007)]. By running the annealing algorithm with a high cooling rate, classifications corresponding to local minima of the sum of within-cluster variance *V* can be found. For almost every choice of *k*, different solutions can be found for which *V* is within 1% of the best estimate of its global minimum value, but whose clusters are substantially different (Fig. 1). This suggests that genuine clusters are not present in the data. In addition, each cluster possesses a high density of points near to the origin in the figure with decreasing density away from the origin. This structure does not match the intuitive idea of each cluster consisting of a dense kernel of points at its center surrounded by a lower-density cloud of points.

Cluster stability can be further examined by looking more carefully for the global minimum of *V*. Running the algorithm many times with a slow cooling rate increases the likelihood that this global minimum is found. First, the entire set of data is clustered to produce a classification called FULL. Next, a random subset of half the data is selected and clustered (with the same number of clusters *k*) to produce a classification called HALF. Each MSLP field in the subset of the data now belongs to both a cluster in HALF and a cluster in FULL. This information can be shown in a matrix, where the *i*th row and *j*th column shows the number of fields in HALF cluster *i* and FULL cluster *j*, as displayed here for *k* = 4 for JF season clusters:

Each HALF cluster is paired with a unique FULL cluster—as indicated by the bold numbers, the HALF–FULL pairings here are 1–1, 2–2, 3–4, and 4–3. The idea is to pair the clusters with the largest numbers of fields in common, as in the example above. *P* is defined as the ratio of the sum of the bold numbers to the sum of the numbers in the matrix, so that *P* = 1 for completely stable clusters. The procedure is repeated with 100 half-size datasets, with the cluster stability for each *k* defined as the average value of *P*. Stability values for the JF season for the periods 1900–49 and 1950–99 are shown in Table 1.
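A minimal sketch of the stability measure *P* is given below. The pairing rule is an assumption: each HALF cluster is matched to a distinct FULL cluster so that the total number of shared fields is as large as possible, found here by brute force over all pairings. The function names are hypothetical and cluster labels are assumed to run from 0 to *k* − 1.

```python
from itertools import permutations

def contingency(half_labels, full_labels, k):
    """Matrix whose (i, j) entry counts fields in HALF cluster i and FULL cluster j."""
    m = [[0] * k for _ in range(k)]
    for h, f in zip(half_labels, full_labels):
        m[h][f] += 1
    return m

def stability(matrix):
    """P: fraction of fields that fall in the paired (HALF, FULL) clusters."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Try every one-to-one pairing and keep the one sharing the most fields.
    best = max(sum(matrix[i][p[i]] for i in range(k))
               for p in permutations(range(k)))
    return best / total
```

The brute-force search is adequate for small *k*; for larger *k*, an assignment solver such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would be a more practical choice.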

Cluster stability is very high for *k* = 2. This is unsurprising, as in this case the clusters are large and highly constrained by the shape of the cloud of data in phase space. The stability gradually decreases as *k* is increased; this is because for larger *k* there are a greater number of possible solutions close to the global optimum. For large *k* the stability remains moderate, because with many small clusters some good matches are inevitable by chance; even clusters based on random data would show some stability in this case. Stability does not decrease monotonically as *k* increases, with some local maxima in stability values such as for *k* = 5. However, the stability values show some variations between the different time periods, so it is unclear whether these maxima are robust. This once again shows a lack of evidence for genuine clusters. To confirm that the lack of clustering is not a consequence of using unfiltered daily data, the stability analysis was also carried out for 10-day mean MSLP fields for the JF season. The cluster stability is once again high for *k* = 2 and slowly decreases as *k* is increased, suggesting that our cluster stability arguments also apply to low-pass filtered data.

Previous authors (including M95) have based their choice of *k* on cluster stability, with the idea that the correct number of clusters should be more stable. The approach taken by M95 (and most of the subsequent authors using their method) is to run the *k*-means algorithm many times to generate a set of classifications of the same set of fields (the algorithm does not find the same solution each time since it can become trapped in different local minima of *V*). If the clusters are robust, then these classifications should all be very similar. The similarity of each pair of classifications is measured using a method based on correlation coefficients of the cluster centroids. M95 calculate an average similarity 0 < *c* < 1 (where *c* = 1 indicates that all the classifications are identical) for 2 ≤ *k* ≤ 10. They compare this with 90% confidence intervals for *c* generated by clustering 100 Gaussian red noise datasets constructed to have the same lag-0 and lag-1 covariance as the original data. Their conclusion is that the correct choice is *k* = 4 (when *c* = 0.92), since this is the only value of *k* for which the value of *c* is significantly higher for the real data than for the red noise data. The most robust clusters are actually found for *k* = 2 (when *c* > 0.99), but this is rejected since robust clusters are then also found in the red noise data (which by construction do not contain any).
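One standard way to build Gaussian red noise with prescribed lag-0 and lag-1 covariance is a first-order multivariate autoregressive process. The sketch below assumes this construction; it is not necessarily the procedure of M95, and the function names are hypothetical. With *A* = *C*₁*C*₀⁻¹ and noise covariance *Q* = *C*₀ − *A* *C*₀ *A*ᵀ, the stationary process has lag-0 covariance *C*₀ and lag-1 covariance *C*₁.

```python
import numpy as np

def fit_ar1(X):
    """Estimate the AR(1) matrix A and noise covariance Q from X (time, variables)."""
    Xc = X - X.mean(axis=0)
    n = len(Xc)
    C0 = Xc.T @ Xc / n                    # lag-0 covariance
    C1 = Xc[1:].T @ Xc[:-1] / (n - 1)     # lag-1 covariance E[x_t x_{t-1}^T]
    A = C1 @ np.linalg.inv(C0)
    Q = C0 - A @ C0 @ A.T
    Q = 0.5 * (Q + Q.T)                   # remove rounding asymmetry
    return A, Q, C0

def red_noise(A, Q, C0, n, seed=0):
    """Generate n steps of x_t = A x_{t-1} + e_t with e_t ~ N(0, Q)."""
    rng = np.random.default_rng(seed)
    d = len(C0)
    x = rng.multivariate_normal(np.zeros(d), C0)   # start from the stationary state
    noise = rng.multivariate_normal(np.zeros(d), Q, size=n)
    out = np.empty((n, d))
    for t in range(n):
        x = A @ x + noise[t]
        out[t] = x
    return out
```

In practice the fit would normally be made in a truncated EOF space, since *C*₀ estimated from the full gridded fields may be poorly conditioned.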

The analysis of cluster stability appears unable to reach any clear conclusions about the “correct” value of *k* for a particular season; if stability were the sole criterion used, *k* = 2 would always be selected (or even *k* = 1 if this possibility is allowed). Given this, along with the smooth structure and lack of any obvious clustering within plots such as Fig. 1, we conclude that *there is no objective choice of the number of clusters.*

Cluster analysis can nevertheless be useful in generating a representative set of circulation patterns for each season. Clusters are used in preference to EOFs because they provide a simple way of partitioning the phase space into localized regions. Cluster centroids are averages of similar circulation fields, and hence correspond to physical circulation patterns. EOF spatial patterns are constrained to be mutually orthogonal, so they do not necessarily resemble physical circulation patterns. In addition, opposite phases of each EOF are constrained to have the same spatial pattern, which may not be appropriate for physical phenomena, such as the NAO; clusters are not restricted in this way. While the clusters are all approximately the same size (and hence importance), higher EOFs account for relatively little of the total variance and have correspondingly reduced importance.

For small *k* (as M95 and other methods favor) the full range of circulation variability is not well represented by the cluster centroids. A further problem is that each cluster contains a large number of fields so that the within-cluster variance is large. At the other extreme, if *k* is too large then neighboring cluster centroids begin to look very similar.

In this paper, we choose *k* = 10 for every season. This is a compromise: not so few clusters that the cluster centroids do not effectively span the space of data, but not so many that the similarity between neighboring cluster centroids is too great. We emphasize that *k* = 10 is an arbitrary choice that cannot be objectively justified. However, the choice of 10 patterns is not unprecedented in circulation-type analysis, being used by Lund (1963) and Barnston and Livezey (1987). It is a larger number than has often been chosen previously but produces a range of circulation types while keeping cluster centroids reasonably distinct. An example of this is the set of centroids for the January–February season (Fig. 2). These centroids display a range of patterns including distinct varieties of strong zonal flow and blocking.

The relationship between the JF clusters and the winter NAO is highlighted in Fig. 3, which shows the projection of the cluster centroids onto the plane spanned by the leading pair of EOFs. The EOF 1 spatial pattern (not shown) accounts for 21% of the variance and resembles the winter NAO, with two zonally elongated nodes occupying the northern and southern halves of the NAE region. The clusters that most resemble the opposite phases of the winter NAO (4, 7, and 10) project onto EOF 1 as expected. Because the northern node of the EOF 1 pattern extends from Greenland to Scandinavia, clusters with large pressure anomalies over Scandinavia (2 and 6) also project onto EOF 1.

## 4. Application to links between circulation and SST

The cluster classification derived from the analysis in the previous section is now used to examine the influence of observed global SSTs on the NAE circulation. SST relationships are investigated in all seasons by lagged regression with the frequency of occurrence (days per season) of each cluster, so that there is one frequency value per cluster per year. The regression is performed over the period 1870–2002, to correspond to the availability of SST in the HadISST dataset. To investigate SST effects on the atmosphere, we use SST from the month preceding the 2-month season of interest. One-month lags are considered sufficient, since the time scale for tropical SST forcing of the extratropical atmosphere via Rossby wave propagation is under a month (Sardeshmukh and Hoskins 1988; Ting and Held 1990). In addition, ENSO itself varies slowly, so that 1-month lags with Pacific SST are likely to be very similar to 2-month lags. In examining the atmospheric imprint on SST, we use SST from the month following the season. Unlagged regressions were also analyzed; the results are intermediate between the SST leading and SST lagging patterns (not shown).
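The construction of the frequency series and the regression can be sketched as follows. The direction of the regression (SST regressed on cluster frequency at each grid point) and the function names are assumptions. The lag is handled by the caller, who supplies the SST fields for the month preceding (or following) each season, aligned year by year with the frequencies.

```python
import numpy as np

def seasonal_frequency(years, labels, k):
    """Days per season spent in each cluster; returns (unique years, (n_years, k) array)."""
    uniq = np.unique(years)
    freq = np.zeros((len(uniq), k))
    for row, y in enumerate(uniq):
        freq[row] = np.bincount(labels[years == y], minlength=k)
    return uniq, freq

def regress_sst_on_frequency(sst, freq_c):
    """Least-squares slope of SST (n_years, n_points) per unit frequency of one cluster."""
    f = freq_c - freq_c.mean()
    s = sst - sst.mean(axis=0)
    return (f @ s) / (f @ f)
```

For example, for the JF season with SST leading, `sst` would hold one December field per year and `freq_c` the number of days in the following January–February assigned to the cluster of interest.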

Trends in global SST over the instrumental period are removed from the SST data before computing regressions. To do this, a principal component analysis of low-pass filtered global SST is performed. This uses data for the period 1911 to 2002, over which the data quality is considered adequate for this purpose (Folland et al. 1999). The leading eigenvector resembles the spatial pattern of the linear trend over the same period, but its principal component time series represents the well-known nonlinearity in observed historical warming. The projection of this eigenvector onto each SST field in the dataset is then subtracted from the data. This approach allows a nonlinear estimate of the secular change to be removed.
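The trend removal can be illustrated with a short sketch, assuming that the fields are stored as (time, grid point) arrays of anomalies and that area weighting and the low-pass filter are applied beforehand. The function name is hypothetical. The leading eigenvector is taken from a singular value decomposition of the low-pass filtered data, and its projection onto every field is subtracted.

```python
import numpy as np

def remove_leading_mode(sst, lowpass_sst):
    """Subtract from each SST field its projection onto the leading EOF of lowpass_sst."""
    anomalies = lowpass_sst - lowpass_sst.mean(axis=0)
    # Rows of vt are the EOF spatial patterns, each of unit length.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    e1 = vt[0]
    amplitude = sst @ e1                  # time series of the trend mode
    return sst - np.outer(amplitude, e1), e1
```

Because the amplitude is computed separately for each field, the removed signal follows the time evolution of the data rather than being constrained to a straight line.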

Significant relationships between SST and circulation-type frequencies are apparent in all seasons, although the strength and significance of these links vary between clusters. In general, a stronger effect is seen for MSLP leading SST, with pronounced SST patterns in the North Atlantic consistent with atmospheric forcing of the ocean. The SST regression patterns are positively spatially correlated with cluster MSLP over the North Atlantic (in the sense that high pressure is associated with warm SST) in 57 of the 60 clusters. The correlation coefficients are generally larger in the Northern Hemisphere summer than in winter, averaging 0.51 between May and October and 0.29 between November and April. The stronger correlation of high MSLP with warm SST in summer suggests that it results from increased insolation under clear skies.

The number of patterns produced by the analysis precludes a complete discussion of all the SST associations found. Instead, we select a subset of clusters that show links to tropical Pacific SSTs. This approach is chosen because progress has been made on the effect of North Atlantic SST on NAE circulation while the effect of ENSO is less well established though starting to become clearer (Merkel and Latif 2002; Brönnimann et al. 2004; Toniazzo and Scaife 2006). By using a cluster decomposition, it is possible to identify which circulation types show sensitivity to the phase of ENSO. Additionally, the use of six 2-month seasons offers a better chance to resolve the seasonal cycle of ENSO influence than the traditional 3-month seasons. The regression analysis is limited, however, in that it is not possible to distinguish which phase of ENSO is responsible for the signal. For example, the appearance of an El Niño pattern may mean that the relevant circulation type occurs more frequently in El Niño years, or less frequently in La Niña years, or both.

For the JF season, inspection of SST-frequency regressions for all 10 clusters reveals that the frequencies of occurrence of two clusters, JF4 and JF7, are related to La Niña–like negative SST regressions in the tropical Pacific (Fig. 4). The patterns of both of these circulation types appear zonally elongated and dipolar, with positive anomalies in the southern part and negative anomalies in the northern part, akin to the conventional positive winter NAO. JF4 is most similar in this respect, while the latitude of the northern center in JF7 lies farther to the south.

In general, it is not possible to find a JF cluster centroid pattern that aligns completely with an NAO pattern defined by, for example, principal component analysis. This is because cluster analysis does not necessarily produce clusters that lie directly on the principal axis of variation of the data. If there are sufficient clusters to differentiate the range of circulation types in the data, however, one or two patterns similar to each phase of the NAO would be expected. This is found to be the case; JF4 and JF7 resemble the positive NAO and JF10 resembles the negative NAO. Thus the patterns found to have La Niña–like associations here can both be claimed to have positive NAO characteristics, corresponding to strong westerly weather types over Europe. The correlation of December Niño-3 values (SST average over the region 5°S–5°N, 90°–150°W) with cluster frequency is −0.24 for JF4 and −0.14 for JF7. The significance of these correlations is assessed by correlating Niño-3 against 10 000 randomly reordered versions of the cluster frequency time series. Only the JF4 correlation is significant at the 5% level, reflecting the greater overall significance for JF4 and the tendency for significance in only part of the equatorial East Pacific for JF7.
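The significance test based on random reordering can be sketched as a permutation test. The two-sided form of the test and the function name are assumptions; the sketch also treats the years as exchangeable, that is, it ignores any serial correlation in the two series.

```python
import numpy as np

def permutation_pvalue(nino3, freq, n_perm=10000, seed=0):
    """Correlation of nino3 with freq and its two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(nino3, freq)[0, 1]
    # Null distribution: correlations with randomly reordered frequency series.
    r_null = np.array([np.corrcoef(nino3, rng.permutation(freq))[0, 1]
                       for _ in range(n_perm)])
    p = float((np.abs(r_null) >= abs(r_obs)).mean())
    return r_obs, p
```

A correlation is then judged significant at the 5% level when the returned p-value is below 0.05.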

There are some significant North Atlantic SST links in the month before the JF season (December) in JF4, but there is little sign of the tripole pattern (warm SST anomalies extending eastward from the United States and cold anomalies in the tropical and subpolar North Atlantic) usually associated with the positive NAO for either cluster. Tripole SSTs do appear, however, in the regression patterns for the month after the JF season (March). For these circulation types, therefore, the analysis suggests relatively weak wintertime forcing of the atmosphere by SST but rather stronger forcing of SST by the atmosphere within the North Atlantic. Note also that there are no clusters with significant positive tropical Pacific SST (i.e., El Niño–like) regressions. El Niño–like regression patterns do exist for some circulation types, but these regressions are not significant and the circulation types do not resemble the negative NAO. Equally, types similar to the negative NAO do not appear to have El Niño–like regression patterns.

Different circulation types to those in JF are related to Pacific SSTs in MA. Two clusters are found to produce significant regressions, MA2 and MA6 (Fig. 5). Both show February SST regressions with a La Niña–like pattern. MA2 has an anticyclonic center north of the British Isles, with weak positive MSLP anomalies over most of Europe and weak negative anomalies over the northwestern Atlantic and Russia. This alignment of alternating positive and negative centers has the appearance of a Rossby wave train. MA6 has a strong Scandinavian anticyclone with cyclonic anomalies centered over Greenland. These are the only MA circulation types with anticyclonic centers over northern Europe. The cluster mean MSLP patterns show that these types correspond to blocking circulations, in contrast to the zonal relationships with a La Niña–like regression pattern in JF. Again, there appear to be relatively weak North Atlantic SST signals preceding the season, but stronger and somewhat tripolar SST signals following the occurrence of these types. February Niño-3 correlates significantly at the 5% level with the frequency of MA6 (−0.20) but not MA2 (−0.15).

In MJ, there is a further negative La Niña–type regression for April SST for circulation type MJ5, a zonal dipole, which appears similar to the positive winter NAO anomaly (Fig. 6). Unlike JF and MA, there is a circulation type (MJ2) with approximately opposite anomalies (akin to the negative winter NAO) linked with a positive, El Niño–like regression pattern for April SST. Correlations with April Niño-3 are +0.18 for MJ2 and −0.10 for MJ5. Only the former is significant at the 5% level, which is consistent with the regression pattern for MJ5 showing significance only in the central Pacific, away from much of the Niño-3 region. There is also an indication of significant opposite North Atlantic tripole SSTs leading these circulation types by one month, although the lagged SST regressions show stronger tripole patterns.

A “positive NAO” cluster pattern almost identical to MJ5 is associated with a La Niña–like regression in July–August (JA7; Fig. 7). Note that this winter NAO-like pattern is not the same as the summer NAO pattern described in Hurrell and Folland (2002). The Niño-3 index for the previous month is significantly correlated with the frequency time series (−0.24). A further circulation type (JA10) also appears to have a negative regression with tropical Pacific SST, albeit weaker. Like JA7, this pattern has a high-latitude cyclonic anomaly, this time centered over Scandinavia. The Niño-3 correlation (−0.13) is not significant.

SO is the season with the fewest apparent ENSO connections. There is a hint of an El Niño–like regression associated with a positive MSLP center over Greenland and Iceland and negative centers over Scandinavia and the Atlantic Ocean (SO9; Fig. 8). This pattern still bears some similarity to a negative zonal dipole. The circulation type that is most opposite to SO9 does show a negative tropical Pacific regression, but it is not significant (not shown).

Of the nine cluster patterns surveyed so far, four appear to show a positive NAO-like dipole (resembling the winter NAO pattern) with a regression onto negative tropical Pacific SST anomalies (JF4, JF7, MJ5, JA7), while two link a negative NAO-like dipole with a positive tropical Pacific SST regression pattern (MJ2, SO9). This suggests a tendency for a systematic relationship between NAE circulation and Pacific SST that is almost independent of season (although note that MA is an exception). Such a relationship would see warmer SSTs result in a shift toward a more negative “NAO” dipole pattern. The regression analysis for ND, however, provides a very different picture (Fig. 9). ND3, which resembles the positive winter NAO, has a significant positive regression with an El Niño–like pattern of SST. Positive Pacific SST regressions are also found with similar circulation types having a strong zonal component of flow (ND5 and ND6). These have cyclonic anomalies over Scandinavia and the British Isles, respectively. Lag correlations of Niño-3 with each of these cluster series (ND3: +0.17, ND5: +0.24, ND6: +0.22) are all significant at the 5% level. In addition to the positive SST regressions, significant links between negative SST patterns and the frequencies of clusters ND4 and ND9 are found. These types possess meridional MSLP dipole anomalies and are among the clusters showing the blocked patterns in the ND classification. Again, correlations of the frequencies with the Niño-3 index for October are significant (ND4: −0.19, ND9: −0.26). Overall, ND is the season with the strongest apparent influence from tropical Pacific SSTs, as shown by both the number and strength of significant lagged relationships. Like the other seasons, ND circulation does not appear to be led by strong North Atlantic SST anomalies, except for ND5, which shows negative regressions with a horseshoe SST pattern. A range of tripole SST anomaly patterns does arise, however, after ND seasons in which these types occur.

In addition to the linear analysis described above, cluster frequency was plotted against the Niño-3.4 index in an attempt to highlight any nonlinear relationship (very similar results are obtained if the Niño-3 index is used). The results show that the links between ENSO and NAE circulation are weak, but the relationships between different patterns and phases of ENSO shown in the linear analysis are nevertheless discernible. Examples of these plots for November–December clusters 5 and 9 (which show significant linear regression patterns resembling El Niño and La Niña, respectively) are displayed in Fig. 10. The weakness of the links suggests that diagnosing a nonlinear relationship between NAE circulation and ENSO would be difficult.

The reversal of the Pacific SST associations in ND from those in other seasons, where La Niña–type regression patterns are associated with more zonal circulation types and El Niño–type regression patterns (if present) are associated with more blocked types, is striking. Particularly intriguing is the contrast between ND and similar clusters in neighboring JF. For example, the patterns of ND3 and JF4 show remarkable similarity to each other (and a resemblance to the positive phase of the winter NAO), and yet appear to occur more frequently in response to opposite ENSO phases. The difference is particularly clear for MSLP leading SST (Figs. 4 and 9), where both show the same North Atlantic tripole SST pattern but opposite ENSO phases in the Pacific. This change in sensitivity is consistent with the findings of other studies (Huang et al. 1998; Lau and Nath 2001; Moron and Plaut 2003; Brönnimann et al. 2007), which suggest a reversal in the effect of ENSO on extratropical circulation between early and late winter. Moron and Gouirand (2003) also show significant differences in the NAE response to ENSO in early and late winter. In November–December, El Niño is associated with a positive NAO-like pattern, while La Niña is linked to positive pressure anomalies between Greenland and western Europe. From January to March, however, the patterns associated with each phase of ENSO are spatially similar but of the opposite sign. The timing of the change suggests that links between El Niño and NAE circulation may be seriously obscured in analysis based on the standard December–February winter season. Manzini et al. (2006) show this reversal in a model study of the response to El Niño forcing; at 60°N easterly winds propagate downward from the stratosphere and reach the surface in February–March, replacing westerly winds evident in early winter.

## 5. Summary

An observational classification of daily MSLP circulation patterns covering the whole year has been produced using cluster analysis. We use a simulated annealing algorithm that improves on the previously employed *k*-means algorithm by better avoiding local minima of the sum of within-cluster variance *V*.
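To illustrate the idea, the following minimal sketch applies a generic simulated annealing scheme to the objective *V*. It is not the algorithm used in this study: the single-point move, the geometric cooling schedule, and all names and parameter values (`within_cluster_ss`, `anneal_clusters`, `t_start`, `cooling`) are assumptions made for illustration, and *V* is recomputed from scratch at each step for clarity rather than speed.

```python
import math
import random


def within_cluster_ss(points, labels, k):
    """V: sum over clusters of squared distances from members to the cluster mean."""
    dim = len(points[0])
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing to V
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
    return total


def anneal_clusters(points, k, n_steps=20_000, t_start=1.0, cooling=0.9995, seed=0):
    """Search for labels with low V; requires k >= 2. Returns (best_labels, best_v)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    v = within_cluster_ss(points, labels, k)
    best_labels, best_v = list(labels), v
    temp = t_start
    for _ in range(n_steps):
        # Propose moving one randomly chosen point to a different cluster.
        i = rng.randrange(len(points))
        old = labels[i]
        new = rng.randrange(k - 1)
        if new >= old:
            new += 1  # skip over the current label so that new != old
        labels[i] = new
        v_new = within_cluster_ss(points, labels, k)
        dv = v_new - v
        # Always accept improvements; accept uphill moves with probability
        # exp(-dv / temp), which lets the search leave local minima early on.
        if dv <= 0 or rng.random() < math.exp(-dv / temp):
            v = v_new
            if v < best_v:
                best_labels, best_v = list(labels), v
        else:
            labels[i] = old  # reject the move
        temp *= cooling  # assumed geometric cooling schedule
    return best_labels, best_v
```

Accepting occasional increases in *V* at high temperature is what distinguishes this from a greedy descent such as *k*-means, which moves only downhill and can therefore stop in a local minimum.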

Investigation of the January–February season shows that classifications exist with *V* close to the minimum value but substantial differences within the clusters. Stability analysis based on clustering subsets of the MSLP data shows that stability is high when the number of clusters *k* is small and gradually decreases as *k* is increased. For small *k* the clusters are large, so their configuration is highly constrained by the shape of the data in phase space. For larger *k* the clusters are smaller, so there are many more possible arrangements of the clusters and the shape becomes less important. Cluster stability may also be affected by regions of high density associated with weather regimes (the clustering originally sought), but no such regions are apparent in the dataset used in this study. We conclude that stability cannot reliably be used as a way of determining an optimal number of weather regimes.
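One simple way to quantify how differently two classifications partition the same days is the Rand index, the fraction of day pairs on which the two classifications agree (both place the pair in one cluster, or both separate it). The sketch below is illustrative only: the stability measure used in this study is not restated here, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of pairs of days treated consistently by two classifications.

    A pair counts as consistent if both classifications put the two days in
    the same cluster, or both put them in different clusters. Cluster numbers
    themselves are arbitrary and do not need to match between classifications.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must cover the same days")
    agree = 0
    n_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        n_pairs += 1
    return agree / n_pairs
```

A value of 1 means the two partitions are identical up to relabeling. Because the loop visits every pair of days, the cost grows quadratically with the length of the record, so a long daily series may need subsampling.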

Cluster analysis is nevertheless useful, since it can be used to generate a set of representative circulation types and associated frequency time series for each season. We make the subjective choice of 10 clusters for every season (more than previous authors have used) because this shows a range of circulation types not adequately represented if fewer clusters are chosen. Some of the circulation types derived from this analysis correspond to well-known phenomena, such as the winter NAO, but there are many others related to other facets of the circulation.

The set of circulation types derived using the cluster analysis is used to investigate the influence of SST on NAE circulation. This is inferred using regression of the cluster frequency with the SST in the month preceding each season. We concentrate on the effect of ENSO to simplify the analysis, and because it has been less well studied. The results show a range of associations through the year, with the largest number of significant ENSO associations in ND (five) and the fewest in SO (one marginal regression pattern). Correlations between the Niño-3 index and lagged cluster frequency show that the links described above generally explain a modest fraction of the circulation variance. This may explain why ENSO effects on European climate have not been more readily apparent.
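A lagged regression map of this kind can be sketched as follows. The code is a minimal illustration under stated assumptions, not the procedure used in the study: SST anomalies at each grid point are regressed by ordinary least squares on the seasonal cluster frequency, the data layout (one row of grid-point values per year) is hypothetical, and the function names are invented for the example.

```python
def regression_slope(x, y):
    """Ordinary least squares slope of y on x (x must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def lagged_regression_map(freq_by_year, sst_by_year):
    """Slope of preceding-month SST on cluster frequency at every grid point.

    freq_by_year[t]   : frequency of one cluster in the season of year t
    sst_by_year[t][g] : SST anomaly at grid point g in the month before that
                        season (assumed layout; missing values not handled)
    """
    n_grid = len(sst_by_year[0])
    return [
        regression_slope(freq_by_year, [row[g] for row in sst_by_year])
        for g in range(n_grid)
    ]
```

Each element of the returned list is the SST change associated with a unit increase in cluster frequency at one grid point. Mapping these slopes gives a pattern that can be compared with El Niño–like or La Niña–like anomalies.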

Examining the SST regression patterns in each season reveals a tendency for clusters with MSLP anomaly patterns similar to the positive winter NAO to have La Niña–like SST regression patterns. To a lesser extent, clusters with anomalies similar to negative winter NAO tend to have El Niño–like SST regression patterns. Two seasons, MA and ND, do not fit this pattern. In particular, ND shows the reverse links: circulation types with NAO-positive characteristics have El Niño–like SST regression patterns, and two blocking types have La Niña–like SST regressions. These results confirm previous suggestions that there is a marked difference in the effect of ENSO on European climate between early and late winter that may be masked in analyses using means over the conventional DJF winter season.

## Acknowledgments

This work was funded by the EU EMULATE project (Contract EVK2-CT-2002-00161 EMULATE).

## Footnotes

*Corresponding author address:* D. R. Fereday, Met Office Hadley Center, FitzRoy Rd., Exeter EX1 3PB, United Kingdom. Email: david.fereday@metoffice.gov.uk