The circulation regimes in the Pacific–North American region are studied using the NCEP–NCAR reanalyses for the 18-winter period (1981/82–1998/99; NCEP18) and for the 54-winter period (1948/49–2001/02; NCEP54). The sampling properties of the regimes are estimated using very large ensembles (of size 55) of winter simulations made for the NCEP18 period with the atmospheric general circulation model of the Center for Ocean–Land–Atmosphere Studies, forced by observed SST and sea ice.
The regimes are identified using a modified version of the k-means method. From the NCEP54 dataset a set of four clusters was found [i.e., the Alaskan ridge (AR), Arctic low (AL), Pacific trough (PT), and the Arctic high (AH)], which are significant (vis-à-vis a multinormal background), and more reproducible (within randomly chosen half-length samples) than would be expected from a multinormal process. The frequency of occurrence of the PT (AH) has increased (decreased) significantly during the past two decades.
The PT cluster obtained from NCEP18 dataset more closely resembles the El Niño–forced seasonal mean pattern of recent decades than it does the traditional PNA.
The GCM simulates the AR, AL, and PT clusters (but not the AH). The simulated AR and PT patterns have errors (cf. the NCEP18 results), which are outside the range of internal variability. The simulated frequency of occurrence agrees with the NCEP18 results within sampling variability.
The differences in cluster properties of the PT and AR regimes between the NCEP18 and NCEP54 datasets are due to changes in SST forcing, not sampling error.
Year-to-year changes in the frequency of occurrence of the PT, AL, and AR clusters in the simulations and the NCEP18 dataset are generally consistent with each other.
The notion that the large-scale circulation of the extratropical atmosphere can be described by the alternation of circulation regimes (or weather regimes), in which anomalies in the amplitude and phase of planetary waves are dynamically equilibrated by variations in diabatic energy sources and nonlinear interactions with synoptic-scale eddies, can be traced back to a number of seminal studies in the 1980s (e.g., Reinhold and Pierrehumbert 1982; Vautard and Legras 1988). Since then, circulation regimes have been simulated and diagnosed using a variety of models, ranging from quasigeostrophic (e.g., Marshall and Molteni 1993; D’Andrea and Vautard 2001; Kondroshov et al. 2004) to balance equation models (Itoh and Kimoto, 1999) to primitive equation general circulation models (GCMs) (e.g., Hansen and Sutera 1990; Haines and Hannachi 1995; Monahan et al. 2000). Although regime identification in the actual observed record remains a nontrivial statistical problem (see the review by Molteni et al. 2006), recent studies have obtained results that are consistent across different methodologies (Alhamed et al. 2002; Arnott et al. 2004; Cassou et al. 2004; Robertson and Mechoso 2003).
The methods used to establish the existence and significance of regimes have, however, been questioned by a number of authors. Recently Stephenson et al. (2004) and Hsu and Zwiers (2001) have criticized the specific methodology of Corti et al. (1999) who sought evidence for regimes in the two-dimensional probability density function (PDF) of winter monthly mean height fields based on reanalyses. Stephenson et al. (2004) argue in more general terms that rigorous hypothesis testing is essential and that a null hypothesis of a multinormal distribution for large-scale atmospheric fields may be very hard to reject. This was also the conclusion of Toth (1991), although only in a phase-averaged sense.
In this paper, regimes are identified from the mathematical technique of cluster analysis. Since the concept of regimes is still somewhat controversial, we refer to specific results obtained in this paper in terms of clusters. The issue of careful hypothesis testing in our analysis will be taken up again in sections 3 and 4. The difficulty in determining the number of clusters (Christiansen 2006) will also be discussed.
In the last decade, a number of studies have investigated the possible dependence of atmospheric regime properties on the state of slowly varying components of the climate system, such as sea surface temperature (SST). By analogy with the behavior of low-order chaotic systems, it was argued that the first-order response to a moderate variation of boundary forcing should be manifested only by a change in the regime’s frequency of occurrence (Molteni et al. 1993; Palmer 1993, 1999). It was also suggested that the observed interdecadal variability in Northern Hemisphere regime frequency may be one aspect of the atmospheric response to increased greenhouse forcing (Corti et al. 1999; Shindell et al. 1999; Hsu and Zwiers 2001). However, recent studies have pointed out that large variations in the distribution of diabatic forcing (such as those associated with strong El Niño events) may lead to more substantial changes of regime properties, including the spatial pattern of the cluster “centroids” or even the number of regimes (e.g., Molteni and Corti 1998; Straus and Molteni 2004).
It is now widely recognized that the variation of circulation regime (cluster) properties may be an important issue for climate predictions, from seasonal forecasts to future climate scenarios. However, either when validating model results against observations or when comparing regime properties in different periods, one is faced with the problem of estimating the significance of the products of complex statistical tools. Specifically, one needs to quantify the uncertainty in regime properties arising from the internal, chaotic dynamics of the atmosphere and use such information to assess the predictability of such properties as a response to external variations in forcing terms.
Given the limitations of the observed record of upper-air fields, it is very difficult to address these issues from reanalysis data. However, ensembles of GCM simulations can provide reliable estimates of the effects of internal versus external variability in cluster properties. In this paper, we make use of the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis data (Kalnay et al. 1996) and of a large set of seasonal ensemble simulations with the Center for Ocean–Land–Atmosphere Studies (COLA) atmospheric GCM, previously analyzed by Straus and Molteni (2004) to investigate three specific topics:
To what extent are the properties of model-simulated wintertime clusters over the Pacific–North American (PNA) region consistent with those obtained from reanalysis data (for the common period of 1981/82–1998/99)?
Are the differences between the observed cluster properties for the recent period (as above) and those for the extended period of 1948/49–2001/02 within the range of internal atmospheric variability?
Are interannual variations in cluster frequencies (at least partially) reproducible as a function of SST anomalies?
In principle it would be more consistent to pose question 2 as a comparison between the clusters occurring during two equal-length, nonoverlapping, periods. However, the shortness of the observed record dictates that we use as many winters as possible for one of the periods; the choice of the other (recent) period is dictated by the availability of the simulations.
Although the basic k-means clustering procedure adopted here (following Michelangeli et al. 1995, hereafter MVL) is the same as used by Straus and Molteni (2004, hereafter SM04), two important methodological differences between this study and SM04 should be pointed out.
Only quasi-stationary periods are examined, as in Toth (1991) and Toth (1993). The physical motivation is that, in order for circulation regimes (or preferred patterns) to also correspond to “weather regimes” during which the character of the synoptic disturbances is unusually persistent, the large-scale flow must be quasi-stationary (Vautard 1990). The association of persistent large-scale patterns with such weather regimes is quite explicit in earlier works (see, e.g., Rex 1950; Namias 1964; Reinhold and Pierrehumbert 1982). The use of quasi-stationary prefiltering of the data in cluster analysis has been discussed by Toth (1992), Kimoto and Ghil (1993b), and Itoh and Kimoto (1999). To accomplish this filtering, only fields with a slow time evolution in the state space have been retained, a point to which we return later.
Only the seasonal cycle averaged over all years for a given period has been removed, followed by the definition of a single set of regimes (clusters). This is true for both reanalysis and model-simulated datasets. This contrasts with the methodology of SM04, in which the entire analysis, encompassing the removal of the seasonal cycle and the subsequent definition of the clusters, was carried out in a completely independent manner for each winter. Thus in particular the interannual circulation anomalies associated with strong El Niño–Southern Oscillation (ENSO) events play a direct role in the current analysis, whereas by design they played only an indirect role in SM04.
Section 2 of the paper gives a description of the datasets, experimental settings, and basic data reduction employed in the paper. The clusters from reanalysis data are described in section 3, where a detailed description of the cluster analysis procedure is given, the impact of the introduction of the “quasi-stationarity” criterion is discussed, the clusters obtained from reanalysis data both for the 54- and the 18-yr period corresponding to that of the simulations are presented, and the dependence of the cluster probability (or occupancy) on tropical Pacific SST is estimated.
Issues of significance and reproducibility for the clusters obtained from the reanalysis datasets are described in section 4, which can be skipped without loss of context by readers not interested in these more technical matters.
The regimes from the large ensemble of simulations and detailed comparisons of cluster statistics for modeled and observed regimes, designed to address the issues outlined in questions 1 and 2 above, are presented in section 5, while section 6 addresses question 3. In these sections, we estimate the contribution of atmospheric internal variability to fluctuations of cluster properties (such as centroid patterns and frequencies of occurrence) by comparing results from different subsamples extracted from the full 55-member ensemble dataset. Section 7 discusses the sensitivity of our results to the choice of the number of clusters.
Our conclusions, which are summarized and discussed in section 8, offer a rather optimistic view on the prospect of exploiting regime predictability in GCM simulations and suggest that internal atmospheric dynamics alone cannot account for all observed variations in regime statistics between the last half and the last two decades of the twentieth century.
2. Data and analysis methods
a. NCEP–NCAR reanalysis
The daily wintertime 200-hPa geopotential height and zonal wind fields1 were accessed from the NCEP–NCAR reanalysis for two periods: 18 winters (1981/82– 1998/99) and 54 winters (1948/49–2001/02). These datasets are referred to as NCEP18 and NCEP54, respectively. The shorter, more recent period was chosen to match the very large set of GCM ensemble simulations described in section 2b, while the longer period covers almost the whole available data record. For each winter, the 110-day period starting on 12 December was accessed. To ensure consistency with the simulated data only 0000 UTC data were used. The original fields, available on a 2.5° × 2.5° latitude–longitude grid, were interpolated to a Gaussian grid corresponding to triangular 42 (T42) truncation (approximately 2.8° × 2.8° latitude–longitude grid) in order that the reanalysis fields have the same horizontal resolution as the GCM fields.
b. GCM “Grand Ensemble” dataset
The atmospheric GCM used is version 2.2 of the COLA GCM, run at horizontal spectral resolution of triangular T63, with 18 sigma levels. It uses the dynamical core of the NCAR Community Climate Model version 3 (CCM3) described in Kiehl et al. (1998); otherwise, it is as described in Schneider (2002). The dependent variables of the model are spectrally treated, except moisture which is advected using the semi-Lagrangian technique. The land surface model, which is coupled to the atmospheric model, is the Simplified Biosphere Model (SSiB) documented in Xue et al. (1991). The parameterization of deep convection is the relaxed Arakawa–Schubert scheme (Moorthi and Suarez 1992). For further details consult Schneider (2002).
For each of the 18 winters (1981/82–1998/99) we analyze an ensemble of 55 members. We label the initial conditions (ICs) for each winter as follows: ICs 1–5 are from the NCEP–NCAR reanalyses for 0000 UTC of 26–30 November. Initial conditions 6–10 are obtained from ICs 1–5 by adding a perturbation that consists of the difference between the analyses 12 h prior and 12 h subsequent. The remaining 45 ICs were obtained by adding nine further perturbations each to ICs 1–5, chosen randomly from a larger set of perturbations consisting of 24-, 48-, and 72-h differences of 0600, 1200, and 1800 UTC reanalysis states. In each case, the two reanalysis states used to form the perturbation bracket the original IC in time.
The observed SST and sea ice fields used as boundary conditions for the GCM simulations are taken from Reynolds (see Reynolds and Smith 1994); these fields vary weekly, so both intraseasonal and interannual variations in boundary forcing are used.
As with the reanalysis data, for each winter we access the 110-day records of 200-hPa height and zonal wind (once daily, at 0000 UTC), starting on 12 December. We interpolate these fields, originally on a T63 Gaussian (approximately 1.9° × 1.9° latitude–longitude) grid, to the common T42 Gaussian grid. The dataset consisting of all 55 simulations for each of the 18 winters is referred to as the “Grand Ensemble,” GE18, or GE in this paper.
c. Time-scale filtering
A “seasonal cycle” was obtained by retaining the lowest three components of a Legendre expansion in time of the 110-day time series at each grid point for each winter simulation (Straus 1983), followed by averaging the parabolic series over all ensemble simulations for all years. This mean seasonal cycle was removed from each individual simulation. For the observational datasets (NCEP54 and NCEP18), the seasonal cycle was obtained by averaging the parabolic time series at each grid point over all available years (54 or 18) and removing this seasonal cycle from the data for all years. Following many other studies on regimes, we filter out baroclinic, synoptic-scale fluctuations with the use of temporal filtering. The filtering was applied to the 110-day time series for each winter (after removal of the seasonal cycle) by the application of a 15-point digital Lanczos filer (Duchon 1979) so that fluctuations with periods greater (less) than about 10 days were retained (filtered out). After filtering, 96 days are available for further analysis. Thus, the time series have only the mean seasonal cycle and components with periods less than about 10 days filtered out. We then use only every fourth point of this filtered dataset, giving 24 points in each seasonal time series (see section 2g).
d. Reduction of dimensionality
We use an empirical orthogonal function (EOF) analysis in order to reduce the dimensionality of the datasets. The EOFs (spatial patterns) of 200-hPa height and zonal wind were computed from the T42 grids as the normalized eigenvectors of the covariance matrix2 calculated over the Pacific–North American region 20°–80°N, 150°E–30°W for the NCEP54, NCEP18, and GE18 datasets. The dimensional principle components (PCs), which are the time series of the EOFs, form the coordinates of the reanalysis or model state in a state space. The total space–time variance explained jointly by EOF 1–6 and EOF 1–10 are given in Table 1 for the various datasets. For geopotential height (u wind), the leading 6 EOFs explain at least ∼69% (∼63%) of the regional variance, and the leading 10 EOFs explain over ∼80% (∼77%) of the variance.
e. GCM one-member “EXP18” samples
From the GE18 200-hPa height PCs, we extracted 55 GCM one-member samples. A single sample consists of PCs from one winter simulation for each of the 18 winters. Each sample contains simulations with the same IC label. Since even the simulations with the same IC label are intialized independently for each calendar year, they may be considered to be independent. Note also that each sample of integrations is influenced by the identical record of winter SST. We subsequently refer to these samples as the EXP18 dataset.
f. GCM three-member “EXP54” samples
We also make use of 55 GCM three-member samples, each consisting of PCs from three winter simulations for each of the 18 winters. For each winter, the PCs belonging to the three-member sample n are obtained by taking the PCs of the simulation with ICn from the one-member sample EXP18, and augmenting it by the PCs from two randomly chosen simulations from among the other 54 members for the same winter. Here n runs from 1 to 55. We subsequently refer to this as the EXP54 dataset. It should be noted that the EXP54 samples include data from the same number (54) of winters as NCEP54, but associated with the same range of SST variability as in NCEP18 and EXP18. Therefore, any differences in statistics derived from the EXP18 and EXP54 datasets should only be ascribed to internal atmospheric variability.
g. Synthetic Markov datasets
In assessing whether the null hypothesis of multinormality can be rejected, it is necessary to perform Monte Carlo simulations using a large number of synthetic datasets, which are described here. (The actual statistical testing is described in sections 3 and 4). A large pool (from 100 to 500) of synthetic datasets is created separately to be comparable to each of the NCEP54, NCEP18, and GE18 datasets. In each case, each synthetic dataset has precisely the same length as the observed or simulated dataset against which it is compared, and is generated from a series of n Markov processes, whose variance and first-order autocorrelation are obtained from the first n PCs of the appropriate observed or simulated dataset. A requirement for using the surrogate synthetic datasets for hypothesis testing is that the low-frequency PC time series (sampled every 4 days) is well approximated by a Markov process.
3. Regimes from NCEP–NCAR reanalysis
a. Cluster analysis of quasi-stationary states
Following SM04, the k-means method (as described by MVL) was used. Recent studies using observationally based datasets have found that this method gives results consistent with those of hierarchical and mixture model methods (Alhamed et al. 2002; Cassou et al. 2004; Robertson and Mechoso 2003). Using the 6- or 10-dimensional state space spanned by the EOFs (as described in section 2), a partition into k clusters, or categories, is sought. This partition is accomplished in such a way that the ratio of the variance between cluster centroids (a centroid being the average over all states in a cluster) to the average intracluster variance is maximized. This is equivalent to minimizing the intracluster variance since the total variance of the dataset is fixed. The variances are computed using an Eulerian distance metric in the state space of the PCs.
Before the cluster analysis is carried out, the temporally filtered time series (see section 2c) are screened to focus on quasi-stationary periods, a procedure recommended by Toth (1992), Kimoto and Ghil (1993a), and Itoh and Kimoto (1999) in order to ensure that periods of persistent synoptic weather are considered. This quasi-stationary selection was accomplished by first computing a histogram of the distance in state space between successive points to give a measure of the difference between low-frequency states four days apart. (Fifty bins were used to compute the histogram, based on the computed upper limit of the distribution.) For m = 1, . . . , 50, a cumulative distribution is computed as the fraction of states that fall into the lowest m bins; once that fraction exceeds α, the states in the remaining bins (m + 1 and higher) are excluded from the cluster calculation. Since the choice of α is arbitrary, we will explore the sensitivity of some of our results to the value of α, which ranges between 0.40 (strong filtering) and 1.00 (no filtering).
Methods used to verify the significance and stability of the clusters include significance vis-à-vis a null hypothesis of a multinormal background distribution, reproducibility using randomly chosen half-length datasets, and consistency between clusters obtained from different variables (see MVL and SM04). Since the null hypothesis of a multinormal distribution may be rejected not only because of multimodality but also because of skewness (Stephenson et al. 2004; Christiansen 2007, hereafter C07), we have carried out statistical analyses (similar to those of C07) that indicate the modest degree of skewness of the PCs is not sufficient to explain the high significance levels that we obtain. See section 4 and the appendix for details.
Significance vis-à-vis a multinormal distribution is computed here as in SM04, using a Monte Carlo procedure in which a large number of synthetic datasets are created, each the same length as the dataset analyzed. The synthetic datasets are described in detail in section 2g. The cluster analysis is repeated (and the variance ratio calculated) for each set of synthetic data, and the significance level p is given as the fraction of cases in which the variance ratio of the Monte Carlo data fall below that computed from the real data.
For those cases in which quasi-stationary filtering was invoked (α < 1.0), the Markov process variance and lag-1 autocorrelation were obtained from the unfiltered PCs, and the quasi-stationary filtering was applied prior to the cluster analysis of each synthetic dataset to ensure consistency.
In the context of determining the dependence of the cluster properties on external forcing (in this case SST), the reproducibility tests are carried out by comparing randomly chosen half-length subsets within each simulation in order to avoid mixing different sets of SST forcings. To have a measure of reproducibility expected in the context of the multinormal null hypothesis, we have also carried out equivalent reproducibility tests using a number of synthetic datasets (as above). By comparing the clusters of the 200-hPa u-wind field to the cluster centroid maps of the geostrophic u wind calculated from the height clusters, we have tried to assess the consistency of the clusters across variables. Since geostrophic balance is expected to work well in midlatitudes, this comparison is equivalent to a sensitivity test of the cluster analysis to the choice of the distance metric in state space. Results of these tests are discussed in section 4.
b. Regimes in the observed 54-winter record
One of the most difficult problems in cluster analysis is to uniquely define the number (k) of clusters (as discussed in C07). In the next section and the appendix we present evidence that k should be at least 3 based on the strength of the clustering given by the variance ratio, but also that k should be less than 5 based on reproducibility results. Although we focus on four clusters, we give some results for k = 3 in section 7 to indicate the sensitivity of our results to the cluster choice.
Figure 1 shows the height maps of the four centroids from the NCEP54 record, computed using six EOFs and α = 0.50. We refer to cluster 1 (Fig. 1a) as the Alaskan ridge (AR) pattern, in agreement with the classification of SM04 and Renwick and Wallace (1996). It is similar to the P6 Pacific cluster reported by Kimoto and Ghil (1993b, hereafter KG), to the Pacific cluster reported in Fig. 11b of Smyth et al. (1999, hereafter SIG), to the A global cluster of Cheng and Wallace (1993, hereafter CW) in their Fig. 4, and to the A Pacific cluster shown in Fig. 8 of CW. This pattern has the appearance of a wave train originating in the subtropical Pacific, and is associated with Alaskan blocking (Renwick and Wallace 1996). Cluster 3 (Fig. 1c) consists of a wave train nearly out of phase with the AR but with the centers shifted, particularly the ones over the continent. This cluster, referred to as the Pacific trough (PT), resembles the P1 Pacific pattern of KG, the Pacific cluster of SIG shown in their Fig. 11a, and the R Pacific cluster of CW (their Fig. 8). This regime pattern is referred to as PNA-like, after the well-known PNA pattern of Wallace and Gutzler (1981).
Cluster 4 (Fig. 1d) is termed the Arctic high (AH), and is strongly suggestive of the North Atlantic Oscillation (NAO) and/or the Northern Hemisphere annular mode. It strongly resembles the global cluster G of SIG (their Fig. 9c), and the global cluster G of CW (their Fig. 4). Finally, cluster 2 (Fig. 1b), termed the Arctic low (AL), is similar to the Pacific P4 cluster of KG (their Fig. 5).
c. NCEP–NCAR 18-winter record
We chose to analyze the 18-winter record so that the tropical SST forcing would be the same as that of the GCM simulations to which we compare. However, with only 18 winters (and hence a rather small sample size of 18 × 24 = 432) it is not possible to demonstrate that the clusters have high formal significance.
The cluster centroids are shown in Fig. 2 for the same parameters used in Fig. 1. One can easily establish a visual one-to-one correspondence of these patterns with those shown in Fig. 1 for the NCEP54 dataset with, for example, the AR regime having a very strong high over the Gulf of Alaska in both cases and the PT regime a strong low. The associated pattern correlations of the AR, AL, PT, and AH clusters are 0.96, 0.91, 0.62, and 0.66, respectively. (The corresponding pattern correlations for the cluster analysis with no quasi-stationary filtering are 0.97, 0.60, 0.67, and 0.67, respectively. Thus only the AL changes dramatically using the pattern correlation as a measure of correspondence. Another measure of correspondence of two cluster centroids will be introduced later in the paper.)
The modifications in the PT pattern between the two periods are worth noting: the continental high has been shifted from northwestern North America in NCEP54 to Hudson Bay in NCEP18. The degree of mixing between the PT and AH also differs in the two periods. The latter period PT pattern is quite similar to the seasonal mean response forced by tropical Pacific SST in more recent times (Barnston and Livezey 1987; Straus and Shukla 2002, and many studies quoted therein), whereas the configuration of the PT pattern in NCEP54 is closer to the classical PNA pattern of Wallace and Gutzler (1981). This suggests that changes in the cluster maps may be associated with different tropical Pacific SST patterns. This hypothesis will be tested in section 6.
d. Evolution of cluster frequency
To assess the degree to which we can associate certain regimes with SST forcing, it is helpful to examine the winter-by-winter time series of the residence frequency of the quasi-stationary low-frequency atmosphere flow in each of the four NCEP54 clusters in Fig. 1. Such a series is shown for each of the clusters in Figs. 3a–d along with scatterplots of the standardized frequency against the standardized (seasonally averaged January–March) Niño-3.4 SST index3 shown in Figs. 3e–h. The frequency for a given cluster for a given winter gives the fraction of time the circulation resides in that cluster. For the α = 0.50 results, the frequencies add up to approximately 0.50 considering the whole dataset, but this is not true for each year since there could be a different proportion of quasi-stationary states in any given year.
While the Alaskan ridge and Arctic low frequencies do not display any obvious trend, the Pacific trough frequency clearly increases in the last half of the record while the Arctic high frequency decreases. The linear trend of the PT is 0.15 (54 yr)−1, that of the AH is −0.16 (54 yr)−1. Both trends are significant at the 98% level with respect to trends seen in 100 equivalent synthetic datasets. (The AH negative trend is very stable with respect to changes in the quasi-stationary filtering parameter α, while the PT trend increases gradually from 0.15 to 0.29 as α is increased from 0.5 to 1.0, i.e., as the filtering is removed.) The PT frequency also seems to undergo a qualitative change in the late 1970s; a link to the regime shift in the stratosphere found by Christiansen (2003) is suggested.
Both the AR and PT are correlated with Niño-3.4, (−0.4 and 0.4, respectively)4 based on a 54-yr time series, supporting the notion that the AR is more active in cold (La Niña) events (as in SM04 and Renwick and Wallace 1996) and also the association of the PT with warm events. (One can clearly pick out the strong warm El Niño winters of 1982/83, 1986/87, and 1997/98 as having a higher frequency of occurrence of the PT regime.)
Table 2 shows the cluster frequencies averaged over the three 18-yr periods within the 54 yr, as well as the 54-yr average, and the frequencies for the NCEP18 regimes. The most common regime in both the NCEP54 and NCEP18 calculations (the PT) becomes more common from the first to the third period, while the least frequented regime (AH) becomes less common. The AR and AL do not show obvious trends.
It is interesting to note that the positive trend in the North Atlantic Oscillation index as shown for example by Hurrell (1995) is expressed here as a suppression of the Arctic high.
4. Significance, consistency, and reproducibility of NCEP54 regimes
The significance of the NCEP54 height clusters vis-à-vis a suitably defined multinormal synthetic dataset (as in section 2g) is presented in Table 3. Results are given for both 6-EOF and 10-EOF truncations, and for a wide range of α. (The smaller the value of α, the greater the degree of prefiltering.) Significance values greater than 90% are shown in bold. Also shown is the variance ratio (ratio of variance among centroid clusters to average intracluster variance) for various values of α for the six-EOF truncation.
Following the approach of C07, we apply the cluster analysis to a synthetic dataset consisting of independent processes, each one having the same lag-1 autocorrelation and skewness as the corresponding PC of NCEP54. The results show that skewness alone is not enough to generate the high levels of significance in the clustering procedure5 reported in Table 3. For details, consult the appendix.
For six EOFs, the significance values of Table 3 show relatively little variation for k = 3–6 and α ≤ 0.8 (with values generally above 87%). The significance is clearly lower for k = 2 (<73%), and is somewhat lower for the unfiltered data (α = 1.0). The behavior of the significance is less systematic for 10 EOFs, indicating that the NCEP54 dataset is too short to adequately sample the larger state space.
The reproducibility is calculated by dividing the low-frequency PC time series of length 24 for each winter into two randomly chosen time series of length 12 and carrying out the cluster analysis for the two 54-winter datasets, each of length 12 × 54 = 648. (This procedure insures that all randomly chosen samples feel approximate the same SST forcing.) This procedure is repeated for 100 trials, and for each trial the two sets of clusters obtained (for each value of k) are matched. The matching is done in two ways: by maximizing the mean pattern correlation over every possible pairing of the k centroids between the two half-length samples and by minimizing the mean squared error. (See the appendix for more details. All averaging of correlations is carried out using the Fisher Z transform.)
The pairing giving the maximum average correlation, or minimum mean squared error (over k centroids), is then archived for each trial, yielding a distribution for each measure. For the pattern correlation, the average of these distributions is 0.95, 0.85, and 0.78 for k = 3, k = 4, and k = 5, respectively, while the median correlation is 0.96, 0.86, and 0.65. The centroid patterns for the average of all 200 half-length samples (matched on the basis of pattern correlation) are consistent with the patterns shown in Fig. 1. (For k = 4, the pattern correlations for the AR, AL, PT, and AH patterns are 0.98, 0.69, 0.87, and 0.98, respectively.)
The average of the mean squared error distribution of half-length samples was compared with that computed using a large number (100) realizations of synthetic Markov processes (see section 2g) for the NCEP54 dataset. The ratio of the NCEP54 error to that obtained from the synthetic data is shown in Table 4 for values of k from 3 to 5. Here we report the reproducibility of each cluster individually. It is clear that for k > 4 all cluster centroids are not on average as reproducible as the multinormal data; hence, k = 5 and larger can be rejected. It is noteworthy that for k = 4 the Arctic low is not reproducible.
Since considerations of the variance ratio suggest that k should be at least 3, but reproducibility considerations suggest that k < 5, we have chosen four clusters for many of the results in this paper, with some results repeated for k = 3.
We show that k = 4 is a reasonable choice in two further ways, based on the consistency of centroids with respect to variable (height versus zonal wind) and with respect to domain (regional versus hemispheric).
The consistency with respect to variable is assessed by comparing the maps of the geostrophic zonal wind obtained from the NCEP54 height cluster centroids with the maps of the centroids computed from the cluster analysis of the zonal wind. For six PCs and α = 0.50 these patterns (not shown) match quite well, with pattern correlations of 0.93, 0.84, 0.72, and 0.82 for the AR, AL, PT, and AH, respectively. (Note that, since the filtering is done independently for height and zonal wind clusters, the set of states going into the zonal wind and height cluster analyses may be somewhat different.)
We have compared the centroids shown in Fig. 1 to those computed using a full-hemispheric domain (north of 20°N) in order to provide a test of the sensitivity to the choice of lateral boundary. The two sets of patterns (not shown) are nearly identical over the hemispheric region 150°–30°W. The clusters that we have called the AH and AL have elements of both the Northern Hemisphere annular mode (Thompson and Wallace 2000) and the NAO (Hurrell 1995).
As in SM04, consistency with respect to variable and region also hold for values of k other than 4; however, it is important to note that they do hold for k = 4.
Finally, we show that the patterns given in Fig. 1 for NCEP54 are consistent with the cluster centroids obtained using other combinations of PCs retained and quasi-stationary filtering parameter α. Table 5 reports the average pattern correlation between the clusters of the standard NCEP54 calculation (six PCs, α = 0.50) and a number of other configurations. For each pair of configurations and for each k all possible pairwise matchings are considered, and the matching with the maximum average correlation is chosen.
5. Clusters of the Grand Ensemble
The cluster centroids of geopotential height for k = 4 are shown for the “Grand Ensemble” with the standard parameters (six EOFs, α = 0.50) in Fig. 4. Compared to Fig. 2, we are able to easily recognize the counterpart of the NCEP18 Alaskan ridge, Arctic low, and the Pacific trough from the simulations (Figs. 6a–c). (The corresponding pattern correlations are 0.74, 0.76, and 0.84.) The AH cluster in nature is replaced in the GCM results with a distinctive zonally oriented wave train pattern (Fig. 6d).
Despite the overall pattern agreement for the AR, AL, and PT regimes, there are some differences. To assess these differences, we compute the pattern correlation for each regime from the cluster analyses based on each of the 55 members of the EXP18 samples, and the cluster analysis based on the entire GE18. From these values we construct a probability distribution function, which is shown for each regime in Fig. 5. We match the regime centroids in each of the EXP18 sample clusters with those in the GE18 as before, by considering all possible pairings of k centroids and retaining the pairing with the largest average correlation. The width of the PDFs for the AL and AH patterns indicates that these clusters, although significant within the entire GE18, are less reproducible across samples than the AR and PT. In fact, if only two clusters are retained in the GE18 analysis (k = 2), the centroids obtained strongly resemble the AR and PT (not shown).
For the AR and PT regimes, the PDF of pattern correlation has most of its weight at values higher than the pattern correlation between the NCEP18 and GE18 centroid (shown as a vertical line in Fig. 5), indicating that the subtle differences between simulated and observed patterns are in fact significant vis-à-vis expected sampling variations. The AL correlation of 0.76, however, lies within the (wide) range indicated by the sampling PDF, so we can say that this simulated regime pattern has statistically significant agreement with the reanalyses.
The formal significance tests of the GE cluster analysis indicates 100% significance for every value of k tested, from 2 to 6. This is in part due to the size of the dataset (24 independent states for each of 55 ensemble members for each of 18 winters yields 23 760 states): even a modest degree of clustering is very rarely reproduced with a multinormal synthetic dataset of this size.
The frequency of occurrence of the AR, PT, and AL for the GE are compared to those observed (NCEP18) for the same winters in Table 2. While the PT is visited as frequently as observed, the AR is somewhat less common, while the AL is somewhat more common. The GCM underestimate of the AR frequency is consistent with the general underprediction of blocking in most models (see Watson and Colucci 2002 and references therein).
We evaluate whether the differences in the cluster occupation for the AL and AR in the simulations compared to the reanalyses could be due to sampling error by again utilizing the EXP18 samples. The PDFs in Fig. 6 indicate the distribution of frequencies for the 55 cluster calculations carried out on the EXP18 dataset. In contrast to the differences in pattern evaluated in Fig. 5, here we see that the modest differences in cluster population between the GE and NCEP18 (both indicated by vertical lines) are well within the spread of the samples. Thus, we can conclude that while the simulated circulation regime patterns show some significant differences from our reanalysis estimates, the cluster frequencies do not.
6. Cluster frequencies and SST forcing
As mentiond in the introduction, the analyses presented in this paper for simulated and reanalysis datasets do not remove the seasonal cycles of individual winters prior to computing the clusters, but only the climatological seasonal cycle. This means that SST-forced interannual variability may play a role in determining the regimes. The purpose of this section is to explore that role. Specifically, we ask three questions: Are the differences in regime patterns seen from the NCEP54 and NCEP18 analyses presented in section 3 only due to sampling error, or does the change in SST forcing play a role? Are the decadal shifts in cluster occupancy seen within the NCEP54 analysis (shown in Table 2) larger than the estimated sampling error? Finally, is the interannual variability of cluster frequencies within the GE consistent with that of NCEP18?
Again, we use the sampling properties of the large set of EXP18 cluster analyses to try to answer these questions. Figure 7 presents the PDF of pattern correlation (for each regime) between the clusters in each of the EXP18 sample members with the matching cluster in the corresponding EXP54 dataset. (Remember, each of the 54-yr EXP54 datasets was obtained by starting with one EXP18 sample and augmenting it with two other randomly chosen 18-yr time series from other initial conditions.) The clusters obtained from each EXP18 and EXP54 sample were identified by matching with the full GE cluster centroids using the pattern correlation obtained as before. The PDFs in Fig. 7 thus give an estimate of the difference in regime patterns that might be expected from 18-yr versus 54-yr time series based on sampling error alone. The vertical lines indicate the actual pattern correlations between the NCEP54 and NCEP18 cluster patterns. While the AR and AL pattern differences certainly lie within the range of sampling error (as might be expected since these differences are small), the PT shows significant differences, while the significance of the change in AH is marginal. The PT and AH showed a distinct trend in frequency of occurrence during the 54 yr (shown in Fig. 3).
The change in PT from a pattern showing some resemblance to the wave-train-like PNA of Wallace and Gutzler (1981) in the NCEP54 results to the Tropical Northern Hemisphere (TNH) pattern of Barnston and Livezey (1987) in the more recent years is certainly consistent with the work of Straus and Shukla (2002), who showed that the seasonal mean response to ENSO during the more recent decades closely resembles the NCEP18 PT regime.
The year-by-year variation in cluster frequency from the GE18 and NCEP18 cluster calculations are shown in Fig. 8 as the red and blue lines, respectively) along with a band (shown in light gray) representing the one standard deviation uncertainty in the frequency within each given winter due to sampling error. (This was estimated from the GE18 directly by evaluating the cluster frequency for each of the 55 ensemble members for each year, based on the cluster classification for the entire GE dataset.)
There are individual years in which statistically significant discrepancies can be seen between NCEP18 and GE. For example, the simulations underestimate the occurrence of the Alaskan ridge during the winters of 1988/89 and 1989/90, and also underestimate the occurrence of the Pacific trough during the warm ENSO years of 1986/87 and 1997/98. For the PT, AL, and AR, the blue and red lines track each other reasonably well except during the mid-1990s, while, in general, the differences between red and blue lines tend to stay within the gray band.
The overall intermember standard deviation σ of the frequency of occurrence of each regime from the GE18 results is given in the last column of Table 2. Comparing these to the observed changes in frequency also shown in Table 2 for the three 18-yr periods spanning 1948/49–2001/02 and with the time series in Fig. 3, we see that the observed decrease in the AR from the first to second period is significant (∼2σ), as is the increase from the second to the third period. The increase in PT occurrence in period 3 compared to period 1 in the table is highly significant (∼3σ). It is curious that the decrease in what we label the AH regime between period 1 and period 3 in NCEP54 is also nominally significant (∼−5σ) based on the GE18 estimates of variability although, since this circulation regime is misrepresented in the GCM, the estimate of variability may not be very good.
7. Results for three clusters
As we have indicated in sections 3 and 4, partitions of the NCEP54 dataset into more than four clusters are less reproducible on average than would be expected from the null hypothesis of a multinormal PDF (see Table 4), while the strength of the clustering (measured by the variance ratio) for k = 2 is rather low (see Table 3). But one cannot rule out k = 3 as a viable alternative for the NCEP54 data.
The centroids for k = 3 are shown in Fig. 9, in a form analogous to the k = 4 maps of Fig. 1. (Note that there is no panel b on purpose.) The AR and AH patterns carry over from k = 4 to k = 3 nearly unaltered, while the k = 4 PT and AL are clearly merged into a single pattern shown in Fig. 9c.
The NCEP18 centroid patterns for k = 3 are shown in Figs. 10a–c. The AR and AL patterns appear nearly unchanged from their four-cluster versions, while the AH and PT are merged into a pattern that, however, still strongly resembles the PT (cf. Figs. 2c and 10c). Figures 10d–f show the k = 3 patterns for the GE18, and can be compared to the maps in Fig. 4. Here the AR and PT, as well as the wave-train-like AH carry over quite well from Figs. 4 to 10, while the AL regime in GE18 is lost.
The year-to-year variability in the frequency of occurrence of the AR and PT patterns, which are the only ones now in common between GE18 and NCEP18, is presented in Fig. 11 in the same form as the k = 4 results in Fig. 8, with the gray band denoting plus or minus one standard deviation in frequency, computed from among the 55 members of GE18.
The AR results for both the GE18 and NCEP18 datasets are remarkably robust. Overall the frequencies of both increase very little in Fig. 11 (cf. Fig. 8) in spite of the fact that there are only three possible clusters. The peaks in 1988/89, 1989/90, and 1996/97 for the NCEP18 dataset are well reproduced, and the peak in 1993/94 is stronger. The overall impression that the simulated (GE18) AR does not occur often enough, but yet tracks the NCEP18 frequency except during the mid-1990s still holds.
Comparison of Figs. 8c and 11b shows that the PT frequency increases somewhat when only three clusters are used, but the year-to-year variation in both simulated and reanalysis datasets is very well preserved. The NCEP18 frequency departs from the gray GE18 band (on the positive side) only during the warm events of 1986/87 and 1997/98, and on the negative side during the winters of 1988/89 and 1989/90. (While the PT does not occur in NCEP18 for k = 4 at all during the winter of 1993/94, for k = 3 a small fraction is seen, putting the NCEP18 result just in the gray band.) Also note that the simulations track the reanalysis reasonably well except during the mid-1990s, and this holds for three or four clusters.
8. Discussion and conclusions
While there has been a great deal of interest in better understanding the properties of circulation regimes in the real atmosphere, the short record lengths available for low-frequency boreal wintertime flow present a formidable technical challenge to verifying the significance of deviations of the circulation PDF from Gaussian behavior. This difficulty is only compounded by the realization that the regime properties may be sensitive to forcing, for example, tropical Pacific SST as discussed in this paper or global radiative forcing in a future climate.
To assess whether differences between regime characteristics from reanalysis or simulated datasets are “significant,” and whether changes from one period to another are related to forcing, or are simply due to sampling error, it is essential to have some estimates of the relevant sampling properties. Numerical models of the atmosphere or atmosphere–ocean system play an important role in this regard since very large samples (generated with the same forcing) can be constructed.
In this paper we utilize 55-member ensembles of seasonal integrations of the COLA AGCM utilizing observed SST for 18 winters to provide cluster sampling properties. The simulated clusters shown in Fig. 4 are associated with a variance ratio that is achieved only by far fewer than 1% of multivariate normal Markovian synthetic time series.
The reliance on an imperfect atmosphere-only model to provide the sampling properties motivates the question of the consistency of the properties of model-simulated clusters with those obtained from reanalysis data. Figures 2 and 4 (which compare the regime centroids of the NCEP18 and simulated datasets) clearly show that three out of the four observed clusters (the Alaskan ridge, Arctic low, and Pacific trough) have identifiable counterparts in the reanalysis, while the Arctic high is not simulated. Yet the pattern correlation between the simulated and observed AR and PT clusters falls below the range implied by sampling error, indicating some error in their simulation. The relative frequency of occurrence is better simulated, however, with the small differences between NCEP18 and simulated frequencies lying well within the expected spread due to sampling error.
The direct comparison of simulated and observed clusters for the common recent 18-winter period is somewhat clouded by the fact that formal significance of the cluster analysis in the NCEP18 record is hard to establish. With the longer NCEP54 record, we establish that the four-cluster partition is significant (at a level above 90%) with respect to the null hypothesis of a multinormal dataset, is more reproducible than would be expected on the basis of that null hypothesis, and has as large enough variance ratio to suggest bimodality in the one-dimensional PDF. (Consistency with an independent calculation carried out over the entire hemisphere and agreement with a separate cluster analysis of the zonal wind are also found for a range of cluster numbers.)
Comparing the circulation clusters of the NCEP54 record (Fig. 1) with those of the NCEP18 record (Fig. 2), we ask whether the differences in centroid patterns are simply due to sampling or whether they reflect differences in the external SST forcing between the two periods. Figure 7 suggests that the intrarecord differences for the AR and AL patterns are probably due to sampling error, while the changes in the PT configuration are larger than expected due only to sampling error. The changes seen in this centroid from NCEP54 to NCEP18 resemble the differences between the classic “PNA” pattern and the seasonal mean response of the atmosphere to the strong El Niño events of the last several decades (Straus and Shukla 2002). Our results suggest that the change in the PT is caused by differences in SST anomalies, and also suggest the importance of low-frequency circulation regimes in determining the seasonal mean response to those SST anomalies. An alternate approach to demonstrating the sensitivity of the low-frequency PNA-like cluster to SST forcing is to examine the frequency of occurrence of a fixed regime pattern (that from the NCEP54 record) from year to year; this is shown in Fig. 3.
Figure 3 confirms that both the AR and PT frequency in the NCEP54 record are correlated with seasonal mean tropical Pacific SST, with the former seen more often in cold ENSO winters, and the latter seen more frequently in warm years. The enhancement of the AR pattern during cold winters has been reported previously (e.g., Chen and van den Dool 1997; Renwick and Wallace 1996), but has not before been put into a consistent framework of circulation regimes.
The very strong decadal behavior in the frequency of the AH in the NCEP54 record shown in Fig. 3 (with far fewer occurrences seen in the latter decades) compared to the more consistent behavior of the AL over time suggests an asymmetric behavior of the atmosphere with regard to the expression of the annular-type modes over the Pacific–North American sector. The trend in strengthening of the westerlies over the North Atlantic is seen here to be due to the decreasing importance of the AH pattern.
The regime sensitivity to tropical SST forcing implies a degree of added intraseasonal and interannual potential predictability, and is thus of great interest. Whether there is any hope of realizing this predictability of course depends on the ability of GCMs to capture not only the regimes, but their dependence on SST forcing. Figure 8 shows that the year-to-year variability of simulated frequencies of the AL and PT are close to those observed for the NCEP18 record, and that in general the differences between simulated and observed regime frequencies stay within the uncertainty band estimated from the simulations.
In the introduction we mention two (competing) hypotheses regarding the change in regime behavior to changes in boundary forcing: a change only in the frequency of occurrence of fixed regimes, versus a change in the regime structure itself. The behavior of the AR is a good example of the former; that of the PT regime of the latter (see Figs. 1, 2 and 8).
Since the sampling properties of the regimes have been estimated with a single uncoupled atmospheric GCM, it is important to investigate the model dependence to these sampling properties, and in particular whether they are strongly changed in atmosphere–ocean coupled models. This work is currently under way.
We wish to acknowledge the work of Larry Marx and Dan Paolino in executing the large number of GCM simulations. The insightful comments of the anonymous reviewers and the editor have helped to improve the manuscript. This work was supported by the National Science Foundation under Grant ATM-03-32910, the National Aeronautics and Space Administration under Grant NNG-04-GG46G, and the National Oceanic and Atmospheric Administration under Grant NA04-OAR-4310034.
Difference Measures and the Effect of Skewness
Measures of the differences between maps
In this paper we use two measures of difference between maps. One measure is just the mean squared error; the other the (uncentered) pattern correlation C. These errors are computed in the PC state space. The two maps are described by N-dimensional vectors x and y, where N is the number of PCs retained in the analysis. The mean squared error is proportional to the squared distance (E) between the vectors, which can be written as
where x is the length of the vector x, y the length of y; EA measures the amplitude error, and EP the phase error. Note that EP = 2xy(1 − C) and that C is just the cosine of the angle between x and y. When the vectors are aligned in state space, we have EP = 0, C = 1, and EA = E. When the vectors have the same length (x = y), we have EA = 0, and EP = E. In all cases discussed in the text, we find that E ≈ EP; that is, EA ≪ EP.
Effect of skewness on significance estimates
Here we show that the skewness of the NCEP54 PCs does not yield significant clustering against a multinormal background hypothesis using the k-means approach.
We follow the approach of C07, who considers an idealized two-dimensional distribution given by the product of a normal distribution and a Gamma distribution. In C07, clusters are computed for values of the shape parameter α = 50, 20, 10, 5, 3. The corresponding values of skewness are s = 0.28, 0.45, 0.63, 0.89, 1.15, since s = 2/α. The k-means method is applied using an approach similar to our significance testing (see section 4) to give the preferred number of clusters. For values of α = 10 or lower (values of s = 0.63 or higher), more than one cluster is preferred, even though the idealized distribution is clearly unimodal.
For the NCP54 dataset, large skewness was confined to the leading four PCs, with (absolute) coefficients of 0.25, 0.23, 0.15, and 0.27, respectively. (Thus from the results of C07, we would not expect the k-means method to falsely report more than one cluster based on skewness alone.) Skewed time series of pseudorandom data using a nonsymmetric Markov model were constructed in a four-dimensional space (see below). In each direction, the skewness, lag-1 autocorrelation, and standard deviation are those of the corresponding PC. (The full multivariate cross-covariance structure is not completely maintained.)
The nonsymmetric Markov model is given by
where β, γ, and x0 are positive constants, and ε is a Gaussian white noise process. Because of the asymmetry in the dissipative term controlled by the parameter γ, large negative values occur more frequently than large positive values. The time series is thus (negatively) skewed with the skewness coefficient increasing with γ. For β = 0.2 and x0 = 1, values of γ ranging from 0.2 to 1 produce time series with (absolute) skewness coefficients ranging from 0.14 to 0.32.
The number of times in each pseudorandom dataset was set to the length of the time series for α = 0.5 in the NCEP54 dataset. For each pseudorandom dataset, a cluster analysis was performed and the significance was assessed as for the NCEP54 dataset. This cluster analysis was repeated for 200 skewed pseudorandom datasets. The average significance of clusters for cluster number k in the range of 2–4 varied from 50% to 60%, and the number of pseudorandom skewed datasets for which the significance exceeded 90% (95%) was less than 10% (5%). Thus, the k-means method does not indicate significant clustering.
The entire calculation was repeated twice, and the results are the same.
* Current affiliation: European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
Corresponding author address: David Straus, George Mason University/COLA, 4041 Powder Mill Rd., Suite 302, Calverton, MD 20705. Email: firstname.lastname@example.org
The equivalent barotropic nature of midlatitude intraseasonal variability, with amplitudes increasing with height, motivated the use of upper-tropospheric fields for the analysis.
To assure appropriate area weighting, the fields are multiplied by the square root of the cosine of latitude prior to entering the covariance matrix calculation. The final centroid maps presented here, which are synthesized from the EOFs, have this square root weighting divided out before plotting.
The Niño-3.4 SST index is defined as the average SST over the region 5°S–5°N, 190°–240°.
The lag-1 year autocorrelation of the Niño-3.4 time series is quite small (−0.031). Thus, any correlation whose absolute value is greater than 0.22 is statistically significant at the 95% level.
C07 show that for skewness ≥0.63 in a unimodal one-dimensional time series, the k-means method will indicate a preferred choice of more than cluster. The magnitude of the skewness of the leading four PCs for NCEP54 is 0.24, 0.23, 0.15, and 0.27, respectively.