Climate regions within the northeastern United States are defined using a combination of multivariate statistical techniques. A set of over 100 climatic variables from 641 United States and Canadian Cooperative Observer Network stations forms the basis for the classification. Ward's method is applied to various numbers of retained principal components to produce a suite of hierarchical clustering solutions. A single 54-cluster solution is selected based on the similarity of cluster outcomes across sequentially larger principal component datasets. These clusters form a set of seeds that are used to derive a final nonhierarchical cluster solution.
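The hierarchical stage described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes NumPy and SciPy are available, and the function names (`pc_scores`, `ward_labels`, `seed_centroids`) and the use of standardized variables are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def pc_scores(data, n_components):
    """Standardize the columns and project onto the leading principal components."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

def ward_labels(scores, n_clusters):
    """Cut a Ward's-method dendrogram into at most n_clusters groups."""
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

def seed_centroids(scores, labels):
    """Mean PC score of each cluster, usable as seeds for a later nonhierarchical pass."""
    return np.array([scores[labels == k].mean(axis=0) for k in np.unique(labels)])

# Hypothetical stand-in for the station-by-variable matrix (random numbers)
rng = np.random.default_rng(0)
data = rng.normal(size=(60, 8))
scores = pc_scores(data, n_components=3)
labels = ward_labels(scores, n_clusters=5)
seeds = seed_centroids(scores, labels)
```

Repeating `ward_labels` for several values of `n_components` and comparing the resulting partitions is one way to mimic the selection of a stable solution.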
A novel approach is used in the nonhierarchical cluster analysis to reduce bias introduced by both redundant and irrelevant data. A sequence of cluster solutions is developed in which an additional principal component is considered in each successive solution. Final cluster membership is assigned based on the maximum frequency of cluster membership within this array of solutions. Approximately one-fourth of the climatological stations change cluster membership as a result of this nonhierarchical clustering procedure. These changes result in substantial improvements to the spatial homogeneity of the clusters. Marginal improvements to within- and between-cluster standard deviation are also realized.
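The frequency-based assignment described above can be illustrated with a short sketch. It is deliberately simplified: each solution here is a single nearest-seed pass rather than a full iterative nonhierarchical clustering, and the names `assign_to_nearest` and `modal_membership`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from math import dist

def assign_to_nearest(scores, seeds, n_components):
    """Label each station with the index of its nearest seed centroid,
    comparing only the first n_components principal component scores."""
    labels = []
    for row in scores:
        d = [dist(row[:n_components], s[:n_components]) for s in seeds]
        labels.append(d.index(min(d)))
    return labels

def modal_membership(scores, seeds, max_components):
    """Build one solution per number of retained components (1..max_components)
    and give each station the label it receives most often."""
    solutions = [assign_to_nearest(scores, seeds, k)
                 for k in range(1, max_components + 1)]
    final = []
    for station_labels in zip(*solutions):
        # Ties go to the label encountered first, i.e. the one from the
        # solution with fewer components (an arbitrary choice for this sketch)
        final.append(Counter(station_labels).most_common(1)[0][0])
    return final, solutions

# Toy example: three stations, two seeds, three principal components
scores = [(0.0, 0.0, 0.0), (0.6, 0.0, 0.0), (1.0, 3.0, 0.0)]
seeds = [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
final, solutions = modal_membership(scores, seeds, max_components=3)
```

In the toy data the second station joins the second cluster when only one component is considered but the first cluster once a second component is added, so the vote places it in the first cluster.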
Once a final grouping of stations is established, discriminant functions are calculated to distinguish the climatic zones in terms of variables derived from latitude, longitude, and elevation. Cross-validation shows that more than 60% of the stations are correctly classified based on the discriminant functions. Because the spatial resolution of the 641 climatological stations is relatively low, a 5-min gridded elevation dataset is used in conjunction with the discriminant functions to produce the final climate delineations.
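The discriminant step can be sketched as a linear discriminant analysis with a pooled covariance matrix. This is an assumption-laden illustration: the study does not state which form of discriminant function or cross-validation it uses, so the linear form, the leave-one-out scheme, the function names, and the synthetic three-column predictors (standing in for latitude, longitude, and elevation variables) are all hypothetical.

```python
import numpy as np

def fit_lda(x, y):
    """Return class labels, class means, inverse pooled covariance, and log priors."""
    classes = np.unique(y)
    means = np.array([x[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance, shared by all classes
    centered = x - means[np.searchsorted(classes, y)]
    pooled = centered.T @ centered / (len(x) - len(classes))
    log_priors = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, means, np.linalg.inv(pooled), log_priors

def predict_lda(model, x):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, inv_cov, log_priors = model
    a = means @ inv_cov                       # one coefficient vector per class
    b = -0.5 * np.sum(a * means, axis=1) + log_priors
    return classes[np.argmax(x @ a.T + b, axis=1)]

def loo_accuracy(x, y):
    """Leave-one-out cross-validation: refit without each station, then classify it."""
    hits = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        model = fit_lda(x[keep], y[keep])
        hits += predict_lda(model, x[i:i + 1])[0] == y[i]
    return hits / len(x)

# Synthetic, well-separated stand-in for two climate zones
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(10, 1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
accuracy = loo_accuracy(x, y)
model = fit_lda(x, y)
```

Calling `predict_lda(model, grid_features)` on predictors computed for every cell of a gridded elevation dataset would then yield a zone label per cell, where `grid_features` is a hypothetical array built the same way as `x`.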