Regression-Based Methods for Finding Coupled Patterns

Michael K. Tippett, International Research Institute for Climate and Society, Palisades, New York

Timothy DelSole, George Mason University, Fairfax, Virginia, and Center for Ocean–Land–Atmosphere Studies, Calverton, Maryland

Simon J. Mason, International Research Institute for Climate and Society, Palisades, New York

Anthony G. Barnston, International Research Institute for Climate and Society, Palisades, New York


Abstract

A variety of multivariate statistical methods exist for analyzing the relations between two datasets. Two commonly used methods are canonical correlation analysis (CCA) and maximum covariance analysis (MCA), which find the projections of the data onto coupled patterns with maximum correlation and covariance, respectively. These projections are often used in linear prediction models. Redundancy analysis and principal predictor analysis construct projections that maximize the explained variance and the sum of squared correlations of regression models, respectively. This paper shows that the above pattern methods are equivalent to different diagonalizations of the regression between the two datasets. The different diagonalizations are computed using the singular value decomposition of the regression matrix developed using data that are suitably transformed for each method. This common framework for the pattern methods permits easy comparison of their properties. Principal component regression is shown to be a special case of CCA-based regression. A commonly used linear prediction model constructed from MCA patterns does not give a least squares estimate because correlations among MCA predictors are neglected. A variation, denoted least squares estimate (LSE)-MCA, is suggested that uses the same patterns but minimizes squared error. Since the different pattern methods correspond to diagonalizations of the same regression matrix, they all produce the same regression model when a complete set of patterns is used. Different prediction models are obtained when an incomplete set of patterns is used, with each method optimizing different properties of the regression. Some key points are illustrated in two idealized examples, and the methods are applied to statistical downscaling of rainfall over northeast Brazil.
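The central claim, that pattern methods are singular value decompositions of a regression matrix built from suitably transformed data, can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code. The function names `inv_sqrt` and `pattern_regression`, the synthetic data, and the specific transformations are assumptions: both datasets are whitened for CCA, and only the predictors are whitened for redundancy analysis (labeled `"rda"` here).

```python
import numpy as np

def inv_sqrt(C):
    """Symmetric inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** -0.5) @ V.T

def pattern_regression(X, Y, method="cca", k=None):
    """Return a rank-k regression matrix B_k (Y ~ X @ B_k) and the singular values.

    Assumed transformations (hypothetical helper, not from the paper's code):
      "cca": whiten both X and Y
      "rda": whiten X only
    """
    n = X.shape[0]
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Wx = inv_sqrt(Cxx)
    Wy = inv_sqrt(Cyy) if method == "cca" else np.eye(Y.shape[1])
    # Regression matrix of the transformed data Y @ Wy on X @ Wx.
    M = Wx @ Cxy @ Wy
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = len(s) if k is None else k
    # Keep the leading k coupled patterns, then undo the transformations.
    M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
    return Wx @ M_k @ np.linalg.inv(Wy), s

# Synthetic, centered data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
Y = X @ rng.standard_normal((4, 3)) + rng.standard_normal((200, 3))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # ordinary least squares
B_cca, s_cca = pattern_regression(X, Y, "cca")  # complete set of CCA patterns
B_rda, s_rda = pattern_regression(X, Y, "rda")  # complete set of RDA patterns
B_1, _ = pattern_regression(X, Y, "cca", k=1)   # truncated to one pattern

print(np.allclose(B_cca, B_ols), np.allclose(B_rda, B_ols))
```

With a complete set of patterns, both variants recover the ordinary least squares matrix. Truncating to `k` patterns gives a rank-`k` model that differs between methods. In the CCA variant the singular values are the canonical correlations.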

Corresponding author address: M. K. Tippett, International Research Institute for Climate and Society, The Earth Institute of Columbia University, Lamont Campus/61 Route 9W, Palisades, NY 10964. Email: tippett@iri.columbia.edu

