1. Introduction
The analysis of lowfrequency atmospheric variability within the framework of circulation regimes has proved quite useful in terms of relating synoptic behavior and extreme surface anomalies to the largescale flow (Ayrault et al. 1995; Arnott et al. 2004; Cassou et al. 2005; Stan and Straus 2007). Methods that have been used to identify such regimes include cluster analysis (Michelangeli et al. 1995), modeling the probability distribution function in terms of additive Gaussian structures (Smyth et al. 1999), and statistical models based on neural networks (Monahan et al. 2001), although this last method has been questioned (Christiansen 2005).
However, the vast majority of observational studies identifying regimes have one feature in common: the analysis is carried out on time series of a single variable (often geopotential height or streamfunction), often filtered to remove synoptic timescale variability, and expressed in a reduced dimensional representation [usually principal components (PCs)].
Theoretical studies, on the other hand, have shown the importance of baroclinicscale transient feedback on the large scale as a mechanism to help support circulation regimes (Reinhold and Pierrehumbert 1982; Vautard and Legras 1988). Observational studies have confirmed the feedback between baroclinic waves and lowfrequency structures (Nakamura and Wallace 1990; Nakamura et al. 1997).
The purpose of this article is to introduce a method to identify circulation regimes that includes baroclinicscale properties. This is accomplished in the context of the work of Straus et al. (2007, hereafter SCM), in which circulation regimes were associated with significant clustering of lowfrequency filtered height maps in a state space defined by PCs.
This paper uses the same reanalysis dataset as in SCM, but incorporates the baroclinic eddy properties to help define a minimum state space dimension. In addition we use an improved method of assessing significance, involving a generalization of the “random phase” approximation of Christiansen (2007).
Section 2 briefly describes the dataset, the preprocessing used, the construction of the envelope function of eddy properties and the maximum covariance analysis. Section 3 describes the cluster analysis and the tests of significance. Brief conclusions are presented in section 4.
2. Data and analysis
Time series of oncedaily geopotential height and meridional wind at 200 hPa were obtained from the reanalyses of the National Centers for Environmental Prediction (NCEP) for the 152day period starting 21 November, for each of the 54 winters 1948/49 through 2001/02. An estimated climatological annual cycle was subtracted (as in SCM), so that the resulting series retain both interannual and intraseasonal variability. A digital filter was used to divide the time series of each variable into two components: a “low frequency” component containing all periods longer than about 10 days, and a “high frequency” component (retaining periods shorter than 10 days), representative of baroclinic eddies. The digital filter is similar to that used by, for example, Blackmon (1976), who use a filter that retains periods of 2–6 days. Here the highfrequency category extends up to periods of 10 days, since this filter is meant to capture not only the growth and propagation of baroclinic waves, but their life cycle as well (Simmons and Hoskins 1978). We denote the highfrequency components by primes.
The modulation of υ_{rms} series on time scales longer than 10 days was obtained by retaining only the lowfrequency component. The use of such an “envelope” functions is given (e.g., in Nakamura and Wallace 1990). Since the meridional wind is dominated by the longitudinal derivative of the streamfunction, it emphasizes the higher zonal wavenumbers.
Note that we need to filter twice: once to obtain the highfrequency fields appearing on the righthand side of Eq. (1) and a second time to lowpass filter the resulting rms velocity. Since each application of the filter results is a loss of 15 days of the record at the beginning and at the end, the highfrequency fields are available for 120 days starting 7 December, but the envelope fields are available only for the 92 days starting 21 December for each winter.
A principal component (or empirical orthogonal function) analysis was carried out on the 92day time series of lowfrequency 200hPa geopotential height, over the domain 20°–90°N, 150°E–30°W, denoted by the vector Z, and, separately on the lowfrequency envelope of υ_{rms}, denoted by the vector E. The spectrum of explained variance for Z is fairly red, with the leading three modes explaining 22%, 16%, and 12% of the total space–time variance, respectively, while the leading 10 modes together explain 84%. In contrast, the spectrum of the E modes is somewhat less red, with the leading three modes explaining about 15%, 10%, and 9% of the total variance, and the leading 10 modes explaining 59%.
The covariance of the envelope field with the lowfrequency flow is assessed with the use of maximum covariance analysis (MCA),^{1} applied to the PCs of Z and E. This analysis identifies linear combinations of PCs that maximize the squared covariance between the two fields. For more details on this and related multivariate analysis techniques, see Bretherton et al. (1992) and Wallace et al. (1992).
The results of the MCA calculations depend on the truncation (number of PCs) of the input fields. Table 1 gives a number of statistics as a function of PC truncation: the total squared covariance explained, the squared covariance explained by each of the first three modes, the percent of total squared covariance explained, and the percentage of the truncated variance of Z explained by each mode. In addition, the correlations between the leading PCs of Z and E with the corresponding leading MCA time series are shown. The correlations for the first mode are highly significant, indicating that the MCA analysis is not overly dominated by the Z spectrum, which is the redder of the two.^{2}
One robust result from Table 1 is that the first 3 MCA modes dominate at all truncations, explaining at least 75% of the total squared covariance. At least five PCs are required, however, to represent these three modes in a consistent fashion.
Figures 1 –3 compare the MCA patterns of Z and E for truncations of 5 and 10 PCs for the leading three modes. All the patterns shown are heterogeneous correlation functions (Bretherton et al. 1992), so that the maps of E give the covariances of this envelope field with the (standardized) Z time series associated with the given mode, while the height patterns give the covariance of Z with the (standardized) E time series associated with the given mode.
The leading MCA pattern shows a strong high in the North Pacific and a trough over northwestern North America (Figs. 1a,c). The associated northward shift in the storm track (seen in the envelope of υ_{rms} in Figs. 1b,d) is consistent with the shift in the jet implied by Figs. 1a,c.
There is agreement in the overall pattern between the two representations based on a different truncation. This is also true for the second mode (Fig. 2), which shows a North Atlantic mode combined with a low over western North America. Here we see that the weakened westerlies (anomalous easterlies) in the North Atlantic are consistent with the reduced storm track (envelope) activity, while farther south the situation is reversed. In the Pacific, the low over western North America (high in the North Pacific) is also associated with consistent storm track changes.
The third MCA pattern shows a sort of asymmetric annular mode, with enhanced height gradients in the Pacific (and a low over Alaska), consistent with enhanced storm track activity in this region. This configuration holds also in the North Atlantic, although the height gradient and envelope function anomalies are fairly weak in the five PC truncation.
3. Cluster analysis and significance
a. Truncation and algorithm
We apply the cluster analysis directly to the set of Z time series associated with the first M MCA modes, where the dimension M is to be chosen. (Each atmospheric state is represented by a point in a Mdimensional space.) The cluster analysis consists of a partitioning algorithm that, given a prespecified number k of clusters, iteratively assigns all points to one of k groups (clusters) so that the ratio of the withincluster variance to the total variance is minimized (Michelangeli et al. 1995). No quasistationary filtering was applied (see SCM). Once each state is identified as belonging to a particular cluster, corresponding composite maps of Z and E are obtained using the full original gridpoint representation of the filtered datasets. The composites are presented as anomalies from the full climatology for all fields.
To apply the cluster analysis we must first decide the dimension M and the number N of PCs (i.e., the truncation) to be used to define the MCA modes. The results of the previous section suggest that only three distinct MCA modes explain the covariance of the storm track envelope field with the lowfrequency height field. A minimum truncation of 5 PCs for each field is necessary to capture these MCA modes, suggesting that the truncation N be at least 5. We find consistent cluster analysis results for a range of truncations N ≥ 5 and dimension M ≥ 3.
Figures 4 –7 compare the composite fields of Z and E for 4 clusters (k = 4) for the choices of (N = 5 and M = 3), (N = 5 and M = 5), (N = 10 and M = 5), and (N = 10 and M = 10). The height composites are indicated by the contours, the envelope υ_{rms} by the shading. The Pacific trough, Alaska ridge, and Arctic high patterns of SCM show reasonably good agreement across the various choices of truncation and dimension, with the envelope field consistent with the height gradient anomalies.
The Arctic low pattern shows somewhat less consistency for the (N = 10 and M = 5) case. This cluster was also found to be less robust in SCM. Repeating the cluster analysis with only k = 3 clusters, we find good agreement between all choices of N and M, and with the results of SCM.
b. Significance
Cluster analysis seeks to identify regions in the reduced space whose probability density is higher than would be expected from a suitable background (a multinormal distribution). Although PCs are linearly uncorrelated (orthogonal in time), a “significant” cluster implies that they are not statistically independent. Significance testing is carried out by generating a large number of independent synthetic time series for each PC. Using the approach we describe below, we are assured that each synthetic time series approximately captures the temporal correlation structure of the corresponding real PC, yet they are statistically independent.
This approach, designed for clusters computed from PCs, would not be appropriate if applied directly to an MCAbased cluster calculation, since the MCA time series for different modes are not linearly uncorrelated. However, if synthetic PCs are generated as described below, and the MCA and cluster analysis repeated using these PCs, the results are appropriate for assessing significance.
Christiansen (2007) pointed out that the temporal correlation structure of any time series can be captured in a synthetic time series by computing the Fourier harmonic coefficients for all frequencies, retaining the original amplitudes, but randomizing the phases for each frequency. This randomphase approach preserves the spectrum. Since (for an infinite time series) the spectrum is the Fourier transform of the temporal autocovariance function, the synthetic time series should retain the original temporal correlation information.
A synthetic time series corresponding to P_{y,d} is formed using a randomly chosen surrogate Ŝ_{y} to which we add 54 randomly chosen surrogates F̂_{d}. This is repeated for each of the PCs to form a synthetic dataset of the same size as the original observed dataset. The cluster analysis is repeated for each of 100 such datasets; the number of datasets for which the variance ratio is less than the observed (synthetic data more clustered than observed data), once divided by 100, gives the significance level.
Significance results are shown in Table 2 for a range of k and a range of M, the number of MCA modes used in the clustering. In each case the truncation N = M. For 3 or 4 clusters, the corresponding p value is 0.03 or below^{3} for all M. Note that this significance test does not measure the degree to which the underlying distribution departs from a multinormal one (Christiansen 2007).
4. Conclusions and discussion
The main conclusions of this article are the following:

For wintertime in the Pacific–North America region, three MCA modes capture over 75% of the squared covariance between the lowfrequency 200hPa height field and the envelope fields describing the lowfrequency variation of the rms of highfrequency meridional wind. This result is robust to changes in the PC truncation of the fields in the analysis as long as at least 5 PCs are retained.

Cluster analysis carried out for a range of PC truncations and number of MCA modes identifies essentially the same preferred patterns identified in SCM (for the same observational dataset) on the basis of the lowfrequency flow. These states are thus characterized by a strong eddy–mean flow relationship.

An analysis of significance improved over that of SCM shows that 3 or 4 clusters are significant with a p value of 0.03 or less, independent of the number of modes kept in the cluster analysis. This is based on the full input dataset. The degree to which the underlying distribution departs from a Gaussian in not measured in this analysis.
For the region and season presented here, inclusion of the transient eddy configuration did not substantially modify the preferred patterns suggested by the original cluster analysis using principal components. However in general this will not be the case.
Acknowledgments
The author would like to acknowledge extensive discussions with Dr. Tim DelSole. This work was supported by the National Science Foundation under Grant ATM0332910, the National Aeronautics and Space Administration under Grant NNG04GG46G, and the National Oceanic and Atmospheric Administration under Grant NA04OAR4310034.
REFERENCES
Arnott, J. M. , J. L. Evans , and R. Chiaromonte , 2004: Characterization of extratropical transition using cluster analysis. Mon. Wea. Rev., 132 , 2916–2937.
Ayrault, F. , F. Lalaurette , A. Joly , and C. Loo , 1995: North Atlantic ultra high variability. Tellus, 47A , 671–696.
Blackmon, M. L. , 1976: A climatological spectral study of the 500 mb geopotential height of the Northern Hemisphere. J. Atmos. Sci., 33 , 1607–1623.
Bretherton, C. S. , C. Smith , and J. M. Wallace , 1992: Comparison of methods for finding coupled patterns in climate data. J. Climate, 5 , 541–560.
Cassou, C. , L. Terray , and A. Phillips , 2005: Tropical Atlantic influence on European heat waves. J. Climate, 18 , 2805–2811.
Christiansen, B. , 2005: The shortcomings of nonlinear principal component analysis in identifying circulation regimes. J. Climate, 18 , 4815–4832.
Christiansen, B. , 2007: Atmospheric circulation regimes: Can cluster analysis provide the number? J. Climate, 20 , 2229–2250.
Michelangeli, PA. , R. Vautard , and B. Legras , 1995: Weather regimes: Recurrence and quasi stationarity. J. Atmos. Sci., 52 , 1237–1256.
Monahan, A. H. , L. Pandolfo , and G. Flato , 2001: The preferred structure of variability of the northern hemisphere atmospheric circulation. Geophys. Res. Lett., 28 , 1019–1028.
Nakamura, H. , and J. M. Wallace , 1990: Observed changes in baroclinic wave activity during the life cycles of lowfrequency circulation anomalies. J. Atmos. Sci., 47 , 1100–1116.
Nakamura, H. , M. Nakamura , and J. L. Anderson , 1997: The role of high and lowfrequency dynamics in blocking formation. Mon. Wea. Rev., 125 , 2074–2093.
Reinhold, B. B. , and R. T. Pierrehumbert , 1982: Dynamics of weather regimes: Quasistationary waves and blocking. Mon. Wea. Rev., 110 , 1105–1145.
Simmons, A. J. , and B. J. Hoskins , 1978: The life cycles of some nonlinear baroclinic waves. J. Atmos. Sci., 35 , 414–432.
Smyth, P. , K. Ide , and M. Ghil , 1999: Multiple regimes in NorthernHemisphere height fields via mixture model clustering. J. Atmos. Sci., 56 , 3704–3723.
Stan, C. , and D. M. Straus , 2007: Is blocking a circulation regime? Mon. Wea. Rev., 135 , 2406–2413.
Straus, D. M. , S. Corti , and F. Molteni , 2007: Circulation regimes: Chaotic variability versus SSTforced predictability. J. Climate, 20 , 2251–2272.
Vautard, R. , and B. Legras , 1988: On the source of midlatitude lowfrequency variability. Part II: Nonlinear equilibration of weather regimes. J. Atmos. Sci., 45 , 2845–2867.
Wallace, J. M. , C. Smith , and C. S. Bretherton , 1992: Singular value decomposition of wintertime sea surface temperature and 500mb height anomalies. J. Climate, 5 , 561–576.
Heterogeneous covariance maps for the second MCA mode with truncation of 10 PCs for (a) the lowfrequency height field and (b) the envelope rms of highfrequency meridional wind. (c),(d) The same results for 5 PC truncation. The covariance of each field is shown with respect to the unit of standard deviation of the other fields’ MCA temporal coefficient. The contour interval is 10 m in (a),(c) and 0.2 m s^{−1} in (b),(d).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Heterogeneous covariance maps for the second MCA mode with truncation of 10 PCs for (a) the lowfrequency height field and (b) the envelope rms of highfrequency meridional wind. (c),(d) The same results for 5 PC truncation. The covariance of each field is shown with respect to the unit of standard deviation of the other fields’ MCA temporal coefficient. The contour interval is 10 m in (a),(c) and 0.2 m s^{−1} in (b),(d).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Heterogeneous covariance maps for the second MCA mode with truncation of 10 PCs for (a) the lowfrequency height field and (b) the envelope rms of highfrequency meridional wind. (c),(d) The same results for 5 PC truncation. The covariance of each field is shown with respect to the unit of standard deviation of the other fields’ MCA temporal coefficient. The contour interval is 10 m in (a),(c) and 0.2 m s^{−1} in (b),(d).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Heterogeneous covariance maps for the third MCA mode with truncation of 10 PCs for (a) the lowfrequency height field and (b) the envelope rms of highfrequency meridional wind. (c),(d) The same results for 5 PC truncation. The covariance of each field is shown with respect to the unit of standard deviation of the other fields’ MCA temporal coefficient. The contour interval is 10 m in (a),(c) and 0.2 m s^{−1} in (b),(d).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Heterogeneous covariance maps for the third MCA mode with truncation of 10 PCs for (a) the lowfrequency height field and (b) the envelope rms of highfrequency meridional wind. (c),(d) The same results for 5 PC truncation. The covariance of each field is shown with respect to the unit of standard deviation of the other fields’ MCA temporal coefficient. The contour interval is 10 m in (a),(c) and 0.2 m s^{−1} in (b),(d).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Heterogeneous covariance maps for the third MCA mode with truncation of 10 PCs for (a) the lowfrequency height field and (b) the envelope rms of highfrequency meridional wind. (c),(d) The same results for 5 PC truncation. The covariance of each field is shown with respect to the unit of standard deviation of the other fields’ MCA temporal coefficient. The contour interval is 10 m in (a),(c) and 0.2 m s^{−1} in (b),(d).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Pacific trough cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.3 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Pacific trough cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.3 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Pacific trough cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.3 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Alaska ridge cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.3 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Alaska ridge cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.3 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Alaska ridge cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.3 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Arctic high cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.5 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Arctic high cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.5 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Arctic high cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.5 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Arctic low cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.5 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Arctic low cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.5 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Arctic low cluster composites using (a) 10 MCA coefficients computed from 10 PCs, (b) 5 MCA coefficients computed from 10 PCs, (c) 5 MCA coefficients computed from 5 PCs, and (d) 3 MCA coefficients computed from 5 PCs. The contours show the 200hPa height field (contour interval of 30 m), shading the 200hPa envelope rms of highfrequency meridional wind (interval of 0.5 m s^{−1}).
Citation: Monthly Weather Review 138, 11; 10.1175/2010MWR3333.1
Summary of maximum covariance calculations between lowfrequency 200hPa height Z and envelope storm track E (see text for details). No. PCs gives the number of PCs used, Tot sq cov gives the total squared covariance in m^{2}, Sq cov is the squared covariance for each mode in m^{2}, Cov exp is the percentage of squared covariance explained by each mode, Z var exp is the percentage of height variance explained, Z cor is the correlation (×100) between the PC of the Z and MCA mode, and E cor is the correlation (×100) between the PC of the E and MCA modes.
Statistical significance of clusters using MCA coefficients as a function of the dimension of the space M (number of MCA modes used), and the number of clusters k. The numbers reported are the p values, computed as the percentage of synthetic time series with greater clustering (lower variance ratio) than the reanalysis time series. In all cases, the number of PCs N = M.
MCA is often called singular value decomposition, which describes the mathematical technique used to maximize covariance between two sets of fields.
If the Z field were to dominate the MCA, which may occur if the envelope field were to have a completely white spectrum, the matrix that describes the rotation of the Z PCs to give the new MCA coordinates would look very much like the identity matrix, at least for the lowest modes (upperlefthand corner). We have confirmed that this is not the case here, so that the rotation matrix looks very distinct from the identity matrix.