

Consensus Clustering of U.S. Temperature and Precipitation Data

Department of Atmospheric Sciences, University of California, Los Angeles, Los Angeles, California

Abstract

A “consensus clustering” strategy is applied to long-term temperature and precipitation time series data for the purpose of delineating climate zones of the conterminous United States in a “data-driven” (as opposed to “rule-driven”) fashion. Cluster analysis simplifies a dataset by arranging “objects” (here, climate divisions or stations) into a smaller number of relatively homogeneous groups or clusters on the basis of interobject dissimilarities computed using the identified “attributes” (here, temperature and precipitation measurements recorded for the objects). The results demonstrate the spatial scales associated with climatic variability and may suggest climatically justified ways in which the number of objects in a dataset may be reduced. Implicit in this work is the arguable contention that temperature and precipitation data are both necessary and sufficient for the delineation of climatic zones.

In prior work, the temperature and precipitation data were mixed during the computation of the interobject dissimilarities. This allowed the clusters to jointly reflect temperature and precipitation distinctions, but also had inherent problems relating to arbitrary attribute scaling and information redundancy that proved difficult to resolve. In the present approach, the temperature and precipitation data are clustered separately and then categorically intersected to forge consensus clusters. The consensus outcome may be viewed as having identified the temperature subzones of precipitation clusters (or vice versa) or as representing distinct groupings that are relatively homogeneous with respect to both attribute types simultaneously.
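The categorical intersection described above can be sketched in a few lines. This is only an illustration, not the paper's implementation: the function name `intersect_partitions`, the label strings, and the six example objects are all hypothetical. It assumes each separate clustering is available as one label per object, listed in the same object order.

```python
def intersect_partitions(labels_a, labels_b):
    """Categorically intersect two clusterings of the same objects.

    Two objects share a consensus cluster only if they share a cluster
    in BOTH input clusterings.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    pair_to_id = {}  # (label_a, label_b) -> consensus cluster id
    consensus = []
    for pair in zip(labels_a, labels_b):
        # setdefault assigns the next unused id the first time a pair is seen
        consensus.append(pair_to_id.setdefault(pair, len(pair_to_id)))
    return consensus, pair_to_id


# Hypothetical labels for six objects (e.g., climate divisions).
temp_labels = ["T1", "T1", "T2", "T2", "T2", "T1"]
precip_labels = ["P1", "P2", "P1", "P1", "P2", "P2"]

consensus, pair_to_id = intersect_partitions(temp_labels, precip_labels)
print(consensus)        # [0, 1, 2, 2, 3, 1]
print(len(pair_to_id))  # 4
```

In this toy case two 2-cluster inputs yield 4 consensus clusters. In general the number of consensus clusters can be as large as the product of the two input cluster counts, which is how coarse input clusterings can produce a fragmented consensus result.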

The dissimilarity measure employed herein is the Euclidean distance. As it employs only continuous time series data representing a single information type (temperature or precipitation), the consensus approach has the advantage of allowing an attractively simple interpretation of the total Euclidean distance between object pairs. The total squared distance may be subdivided into three components representing object dissimilarity with respect to temporal mean (level), seasonality (variability), and coseasonality (relative temporal phasing). Therefore, concerns about redundancy or arbitrary scaling problems are neutralized. This is seen as the chief advantage of consensus clustering.
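One standard algebraic identity consistent with this three-way split is sketched below. The abstract does not give the paper's exact formula or normalization, so this is an assumption: for two series x and y of length n, with means, population standard deviations (divide by n), and covariance cov, the squared Euclidean distance equals n(mean_x - mean_y)^2 + n(s_x - s_y)^2 + 2n(s_x s_y - cov). The function name and the sample values are hypothetical.

```python
import math


def distance_components(x, y):
    """Split the squared Euclidean distance between two equal-length
    series into level, variability, and phasing terms.

    Assumes population (divide-by-n) standard deviations; the paper's
    own normalization may differ.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    s_x = math.sqrt(sum(d * d for d in dx) / n)
    s_y = math.sqrt(sum(d * d for d in dy) / n)
    cov = sum(a * b for a, b in zip(dx, dy)) / n

    level = n * (mean_x - mean_y) ** 2    # difference in temporal mean
    variability = n * (s_x - s_y) ** 2    # difference in amplitude
    phasing = 2 * n * (s_x * s_y - cov)   # equals 2*n*s_x*s_y*(1 - r)
    return level, variability, phasing


# Illustrative (made-up) seasonal values for two objects.
x = [2.0, 8.0, 15.0, 9.0]
y = [5.0, 9.0, 12.0, 10.0]

components = distance_components(x, y)
total = sum((a - b) ** 2 for a, b in zip(x, y))
print(components, total)
```

The phasing term is written as s_x s_y - cov rather than in terms of the correlation r so that a constant series (zero standard deviation) does not cause a division by zero. The three terms sum to the total squared distance up to floating-point rounding.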

The consensus strategy has several disadvantages. It is possible for two (or more) relatively general, undetailed clusterings to produce a very complex and fragmented clustering following categorical intersection. Further, the fact that the analyst chooses the clustering levels of the separate, contributing clusterings means that he or she has considerable freedom in fashioning the consensus outcome, which makes it difficult (if not impossible) to argue that true, “natural” clusters have been identified. The latter often applies to cluster analysis in general, however. It is believed that the consensus approach merits consideration owing to its advantages.

Two consensus outcomes are presented: a lower-order solution with 14 clusters and a higher-order solution with 26 clusters. The sensitivity of these clusterings to perturbations in the input data is assessed. The regionalizations are compared with those presented in prior work.
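The abstract does not state how sensitivity to input perturbations is measured. As a hedged illustration only, one common way to compare a baseline clustering with the clustering obtained from perturbed data is the adjusted Rand index, which equals 1 for identical partitions and is near 0 for chance-level agreement. The function below is a minimal sketch of that index, not the paper's procedure; the label lists are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same >= 2 objects")
    # Count object pairs placed together in both, in a, and in b.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = 0.5 * (pairs_a + pairs_b)
    if maximum == expected:
        # Degenerate case (e.g., both partitions trivial): treat as identical.
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)


# Hypothetical baseline and perturbed-data cluster labels for four objects.
baseline = [0, 0, 1, 1]
perturbed = [0, 0, 1, 2]
print(adjusted_rand_index(baseline, perturbed))
```

Because the index depends only on which objects are grouped together, it is unaffected by how the clusters happen to be numbered in each run.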

Corresponding author address: Prof. Robert G. Fovell, Dept. of Atmospheric Sciences, University of California, Los Angeles, 405 Hilgard Ave., Los Angeles, CA 90095-1565.

Email: fovell@atmos.ucla.edu
