Considerations for Use of the Barnett and Preisendorfer (1987) Algorithm for Canonical Correlation Analysis of Climate Variations

Climate Prediction Center, NCEP/NWS/NOAA, Camp Springs, Maryland

Abstract

No abstract available.

Corresponding author address: Dr. Robert E. Livezey, Climate Prediction Center, W/NP51, Rm. 604, 5200 Auth Rd., Camp Springs, MD 20744.

Email: livezey@sgi84.wwb.noaa.gov

1. Introduction

Bretherton et al. (1992) have pointed out that for most problems involving interseasonal and longer timescale variability of the atmosphere and oceans, prefiltering data is necessary for canonical correlation analysis (CCA) to perform comparably to certain other multivariate statistical schemes like singular value decomposition. This approach to CCA for climate problems was first suggested by Barnett and Preisendorfer (1987, hereafter BP), who filtered multiple fields of time series by replacing them with a truncated set of their principal components (PCs). Subsequent to its introduction, the BP method was applied extensively to linear climate prediction at the National Weather Service (Barnston 1994), which in turn has led to several further applications, both published (Barnston and Smith 1996, hereafter BS; He and Barnston 1996; Barnston and He 1996) and in progress.

The CCA procedure produces an orthogonal hierarchy of predictor–predictand pattern pairs that represent, in turn, the maximum amount of cross correlation between the predictor and predictand fields, the second largest portion, and so on. The transformation matrix relating the ordered predictor and predictand patterns can be used to optimally linearly specify the latter from the former. We adopted CCA (specifically the BP method) to address variants of the specification (i.e., simultaneous relationship) problems studied by BS (see Smith and Livezey 1999, hereafter SL, and Livezey and Smith 1999, hereafter LS2). In the course of this work it became apparent that the application of PC prefiltering in CCA requires attention to two considerations that can have an important impact on results. In our view these two considerations, data weighting and PC truncation, have not received the attention they deserve to date. The purpose of this note is to increase awareness of these issues among users of the BP method. In the next section we briefly describe the BP method and the specific problem environments that call for consideration of data weighting. In the final section we present an example in which PC truncation had a profound impact on CCA results.

2. The BP method and data weighting

Our application of the BP method was as follows: standardization of the gridded data point by point, data weighting, separate PC analyses of the predictor and predictand covariance matrices, and CCA between truncated sets of predictor and predictand PC time series. Our procedure differed from those in the previous studies in the assignment of the predictor weights and in the number of predictor and predictand PCs retained for CCA. It is these differences that motivate this note.
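The four steps listed above can be sketched in a few lines of NumPy. This is only an illustrative outline under stated assumptions, not the code used in SL or LS2: the function names (`standardize`, `truncated_pcs`, `bp_cca`) and the array layout (rows are times, columns are grid points) are hypothetical, and the weights are passed in as given.

```python
import numpy as np

def standardize(x):
    """Standardize each column (grid point) to zero mean and unit variance."""
    anom = x - x.mean(axis=0)
    return anom / anom.std(axis=0, ddof=1)

def truncated_pcs(x, weights, n_modes):
    """Standardize, weight, then do PCA of the covariance matrix via SVD.

    Returns the leading n_modes PC time series, each scaled to unit variance.
    """
    z = standardize(x) * weights            # standardize first, then weight
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    n_time = x.shape[0]
    return u[:, :n_modes] * np.sqrt(n_time - 1)

def bp_cca(x, y, wx, wy, nx, ny):
    """Canonical correlations between truncated predictor and predictand PCs."""
    px = truncated_pcs(x, wx, nx)
    py = truncated_pcs(y, wy, ny)
    # The PCs in each set are uncorrelated with unit variance, so the
    # canonical correlations are the singular values of their cross-covariance.
    cross_cov = px.T @ py / (x.shape[0] - 1)
    return np.linalg.svd(cross_cov, compute_uv=False)
```

Calling `bp_cca(x, y, wx, wy, nx, ny)` with arrays of shape `(n_time, n_points)` returns `min(nx, ny)` canonical correlations in descending order. The left and right singular vectors of the same cross-covariance matrix would give the canonical patterns in PC space.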

Predictor and predictand field weighting prior to PCA is desirable in two situations. First, if the spatial density of gridded data varies substantially from place to place, areas with higher density coverage will have disproportionate influence on the PCA. Two of the datasets used in SL and LS2 have this property: Global SSTs on an equally spaced latitude–longitude grid where the density of grid points increases with latitude, and U.S. climate division data where it increases from the western to the eastern United States.

Ensuring that equal areas have equal influence on the PCA can be achieved in at least two ways, one of which is reinterpolation to equal-area grids. An alternative approach, which we have adopted, is to area-weight the data, but this has to be done very carefully to achieve the desired result. For instance, if weights are applied and then the correlation matrix is formed, the weighting will be nullified because calculation of the correlations involves restandardization. This is why we standardize, weight, and then form the covariance matrix. These steps without the weighting are equivalent to formation of the correlation matrix. The area weights themselves should be proportional to the square root of the area the data point (or climate division) represents. This is because the objective of PCA is the most efficient orthogonal representation of the total variance, which in the context of the procedure discussed here is the sum of the squares of the standardized data. Thus, the squares of the standardized data need to be area-weighted, rather than the data themselves. Lastly, the area weights can be scaled so that the total variance on a gridded field is unchanged after weighting, only redistributed.
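A small numerical check makes these points concrete. The sketch below assumes a regular latitude–longitude grid on which the area represented by a grid point is proportional to the cosine of latitude; the function name `area_weights`, the four latitudes, and the random data are illustrative only.

```python
import numpy as np

def area_weights(lat_deg):
    """Square-root-of-area weights, scaled so the mean squared weight is 1.

    Assumes grid-point area is proportional to cos(latitude).
    """
    w = np.sqrt(np.cos(np.deg2rad(lat_deg)))
    return w / np.sqrt(np.mean(w**2))

rng = np.random.default_rng(0)
lat = np.array([0.0, 30.0, 60.0, 75.0])      # illustrative latitudes
data = rng.standard_normal((50, lat.size))   # rows: times, columns: points

# Standardize point by point, then weight.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
w = area_weights(lat)
zw = z * w

cov_weighted = np.cov(zw, rowvar=False)           # weighting is retained
corr_of_weighted = np.corrcoef(zw, rowvar=False)  # weighting is nullified
corr_unweighted = np.corrcoef(z, rowvar=False)
```

Here `corr_of_weighted` equals `corr_unweighted`, the diagonal of `cov_weighted` holds the squared weights (so variance is weighted by area), and the trace of `cov_weighted` equals the number of grid points, so the total variance is redistributed but not changed.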

The other situation for which it is necessary to consider predictor–predictand field weighting arises when there are multiple predictor and/or predictand fields and the individual fields for the combined PCAs have differing numbers of data points. Under these conditions the fields with the most data points (and the most variance) have the most influence on the combined PCA. In SL and LS2 the global SST data (796 grid points) were often used in combination with either the U.S. climate division data (327 divisions) or Pacific–North American region 700-hPa data (163 grid points) as predictors. The influences of the different grids on the PCA can be equalized by weighting all data on each grid (again after standardization and before covariance matrix formation). Weighting is proportional to the square root of the ratio of the average total variance on the various grids to the total variance on the particular grid. Of course, the between-field weights can be considered an adjustable variable of the CCA, and this was the case in SL, where they were varied by a further uniform percentage reduction of variance of the global SST data. Generally, there was relatively little sensitivity to a broad range of departures from equal variances in two-predictor-field CCAs, although results were slightly better (i.e., specification skill increased) when the variances were matched. In LS2 total variances were equalized between predictor fields with the sum of the total variances conserved in the CCAs. For their specification studies BS used uniform weighting.
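The between-field weight described above can be written out directly. In this sketch the function name `field_weights` is hypothetical, and the total variance of each standardized field is taken to equal its number of data points, which holds when any area weights have been scaled to conserve total variance. The two point counts are the SST and climate division counts quoted in the text.

```python
import numpy as np

def field_weights(total_variances):
    """Weight per field: sqrt(average total variance / field's total variance)."""
    tv = np.asarray(total_variances, dtype=float)
    return np.sqrt(tv.mean() / tv)

# Total variance of each standardized field, taken as its number of points.
n_points = np.array([796, 327])
w = field_weights(n_points)

# Variance scales with the square of the weight applied to the data.
weighted_total_variance = w**2 * n_points
```

After weighting, both fields carry the same total variance (the average of the two originals), and the sum over fields is unchanged, which matches the equalization with conserved total variance described for LS2.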

3. PC truncation

Of more importance to the results in SL and LS2 than the data weighting were the departures from the approaches of Barnston (1994) and BS to PC truncation prior to CCA, in which the number of processed PCs was fixed for both predictors and predictands at 6 and 11, respectively, to minimize overfitting. In our work the only limits to the number of either predictor or predictand modes retained for further analysis were those imposed by the mathematical requirements of the CCA algorithm. The guidelines in O’Lenic and Livezey (1988) were followed to set initial predictor and predictand truncations separately. These were then varied systematically, and the sensitivity of the results of cross-validation tests (see SL or BS for descriptions of these procedures) was noted before final selections were made. The objective was to find sets of predictor and predictand PC truncations for these empirical models that optimized specification performance and varied smoothly from season to season for each variable field.
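A cross-validated sensitivity test of this kind might look like the following sketch. It makes several assumptions that are not taken from SL or BS: leave-one-out cross-validation, no data weighting, arbitrary candidate truncations rather than ones set from the O’Lenic and Livezey (1988) guidelines, and a least-squares regression between the truncated PC sets for the specification step. That regression gives the same specification as CCA between the PC sets when all canonical modes are retained. The names `loo_skill` and `truncation_scan` are hypothetical.

```python
import numpy as np

def leading_eofs(z, n_modes):
    """Leading EOFs (n_modes, n_points) of standardized data z."""
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[:n_modes]

def loo_skill(x, y, nx, ny):
    """Leave-one-out average temporal correlation of specified vs observed y."""
    n = x.shape[0]
    y_hat = np.empty_like(y, dtype=float)
    for k in range(n):
        train = np.arange(n) != k
        # All statistics are computed from the training years only.
        xm, xs = x[train].mean(axis=0), x[train].std(axis=0, ddof=1)
        ym, ys = y[train].mean(axis=0), y[train].std(axis=0, ddof=1)
        zx, zy = (x[train] - xm) / xs, (y[train] - ym) / ys
        ex, ey = leading_eofs(zx, nx), leading_eofs(zy, ny)
        px, py = zx @ ex.T, zy @ ey.T
        coef, *_ = np.linalg.lstsq(px, py, rcond=None)
        # Specify the withheld year and return to physical units.
        pc_hat = ((x[k] - xm) / xs) @ ex.T @ coef
        y_hat[k] = (pc_hat @ ey) * ys + ym
    r = [np.corrcoef(y_hat[:, j], y[:, j])[0, 1] for j in range(y.shape[1])]
    return float(np.mean(r))

def truncation_scan(x, y, nx_values, ny_values):
    """Cross-validated skill for every (nx, ny) truncation pair."""
    return {(nx, ny): loo_skill(x, y, nx, ny)
            for nx in nx_values for ny in ny_values}
```

The dictionary returned by `truncation_scan` can be inspected season by season to look for truncation pairs that perform well and vary smoothly, in the spirit of the objective stated above.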

The outcomes of these tests suggested truncations (Table 1) that departed substantially from the conservative practices of Barnston (1994) and BS. The optimal numbers for retained predictor modes were universally much larger than the actual truncation numbers in the previous studies, while those for retained predictand modes were generally considerably smaller than for truncations in BS and often somewhat larger for U.S. precipitation in Barnston (1994).

In retrospect, none of these differences were surprising. In the case of the predictands, the PC truncations are entirely consistent with many existing studies of the principal modes of low-frequency variability of the target fields. As for the predictors, Smith et al. (1996) found that the global SST variability, as well as many regional-scale structures, required the use of comparable numbers of modes to span this dataset well.

Of all the differences between directly comparable specifications in SL and in BS (U.S. seasonal surface temperatures and precipitation from global SSTs), we believe the PC truncations used were the most important ones. To eliminate some of the data differences between the two approaches, a comparison between respective cross-validated performances was conducted by retesting the BS specifications over the data period and U.S. domain used in SL. This resulted in temperature specifications with lower skill than those completed for this study (Fig. 1a) and precipitation specifications of overall comparable performance (Fig. 1b).

Probably the only difference between SL and BS important to the temperature specification results (Fig. 1a) was the use of different PC truncations. However, two other differences in the experiments likely also had some impact on the relative precipitation specification performance (Fig. 1b), and both represent advantages for the BS precipitation specifications. The effect of one of these, specifically the use of smoother U.S. precipitation data in BS than in SL, was partially tested by rescoring the present specifications against smoothed precipitation fields. The smoothing consisted of PC decompositions of the precipitation observations and reconstructions with the truncations used for the CCAs (Table 1). The result was higher average temporal correlations (Fig. 1b).
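The smoothing step, a PC decomposition followed by a truncated reconstruction, can be sketched as follows. The function name `pc_smooth` is hypothetical, and the sketch assumes the decomposition is done on standardized, unweighted data with rows as times and columns as locations; the details of the authors' implementation are not given in the text.

```python
import numpy as np

def pc_smooth(field, n_modes):
    """Reconstruct a (time, space) field from its leading n_modes PCs."""
    mean = field.mean(axis=0)
    std = field.std(axis=0, ddof=1)
    z = (field - mean) / std                     # standardize point by point
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    # Keep only the leading modes, then return to physical units.
    z_smooth = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
    return z_smooth * std + mean
```

Retaining all modes returns the original field, while a truncated reconstruction keeps the time mean at each location and reduces the variance there. Scoring specifications against `pc_smooth(observations, n_modes)` rather than the raw observations corresponds to the rescoring against smoothed verification data described above.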

For any application of the BP method, parsimonious truncations that retain the covarying predictor and predictand modes are highly desirable. For many problems (like those in SL and LS2) sample sizes are too small to determine confidently a priori what the best truncations should be. These are the situations where the procedure outlined here, namely, the use of the O’Lenic and Livezey (1988) guidelines combined with judicious cross-validated sensitivity tests, should be the most useful.

REFERENCES

  • Barnett, T. P., and R. Preisendorfer, 1987: Origins and levels of monthly and seasonal forecast skill for United States surface air temperatures determined by canonical correlation analysis. Mon. Wea. Rev., 115, 1825–1850.

  • Barnston, A. G., 1994: Linear statistical short-term climate predictive skill in the Northern Hemisphere. J. Climate, 7, 1513–1564.

  • Barnston, A. G., and T. M. Smith, 1996: Specification and prediction of global surface temperature and precipitation from global SST using CCA. J. Climate, 9, 2660–2697.

  • Barnston, A. G., and Y. He, 1996: Skill of canonical correlation analysis forecasts of 3-month mean surface climate in Hawaii and Alaska. J. Climate, 9, 2579–2605.

  • Bretherton, C. S., C. Smith, and J. M. Wallace, 1992: An intercomparison of methods for finding coupled patterns in climate data. J. Climate, 5, 541–560.

  • He, Y., and A. G. Barnston, 1996: Long-lead forecasts of seasonal precipitation in the tropical Pacific islands using CCA. J. Climate, 9, 2020–2035.

  • Livezey, R. E., and T. M. Smith, 1999: Covariability of aspects of North American climate with global sea surface temperatures on interannual to interdecadal timescales. J. Climate, 12, 289–302.

  • O’Lenic, E., and R. E. Livezey, 1988: Practical considerations in the use of rotated principal components analysis (RPCA) in diagnostic studies of upper-air height fields. Mon. Wea. Rev., 116, 1682–1689.

  • Smith, T. M., and R. E. Livezey, 1999: GCM systematic error correction and specification of the seasonal mean Pacific–North America region atmosphere from global SSTs. J. Climate, 12, 273–288.

  • Smith, T. M., R. W. Reynolds, R. E. Livezey, and D. C. Stokes, 1996: Reconstruction of historical sea surface temperatures using empirical orthogonal functions. J. Climate, 9, 1403–1420.

Fig. 1. Seasonalities of average temporal correlations for specification of three-month mean U.S. (a) surface temperatures and (b) precipitation, by CCA with only global SSTs as predictors (solid lines) and Barnston and Smith’s (1996) version of the same (dashed lines). In (b) the dotted line is the counterpart of the solid line but for smoothed verification data.

Citation: Journal of Climate 12, 1; 10.1175/1520-0442(1999)012<0303:CFUOTB>2.0.CO;2

Table 1. Predictor and predictand PC truncations for seasonal mean CCA specifications with seasons denoted by letters, for example, JFM for January–March.

* This note was originally part of a submission received 15 November 1996.
