Changes in Earth’s climate in response to atmospheric greenhouse gas buildup impact the health of terrestrial ecosystems and the hydrologic cycle. The environmental conditions influential to plant and animal life are often mapped as ecoregions, which are land areas having similar combinations of environmental characteristics. This idea is extended to establish regions of similarity with respect to climatic characteristics that evolve through time using a quantitative statistical clustering technique called Multivariate Spatio-Temporal Clustering (MSTC). MSTC was applied to the monthly time series output from a fully coupled general circulation model (GCM) called the Parallel Climate Model (PCM). Results from an ensemble of five 99-yr Business-As-Usual (BAU) transient simulations from 2000 to 2098 were analyzed. MSTC establishes an exhaustive set of recurring climate regimes that form a “skeleton” through the “observations” (model output) throughout the occupied portion of the climate phase space formed by the characteristics being considered. MSTC facilitates direct comparison of ensemble members and ensemble and temporal averages since the derived climate regimes provide a basis for comparison. Moreover, by mapping all land cells to discrete climate states, the dynamic behavior of any part of the system can be studied by its time-varying sequence of climate state occupancy. MSTC is a powerful tool for model developers and environmental decision makers who wish to understand long, complex time series predictions of models. Strong predicted interannual trends were revealed in this analysis, including an increase in global desertification; a decrease in the cold, dry high-latitude conditions typical of North American and Asian winters; and significant warming in Antarctica and western Greenland.
Understanding the physical environment that affects the life cycles of all plants and animals (including humans) is of paramount importance as natural and anthropogenic environmental changes occur. The environment is characterized by a large number of conditions, including land surface properties (soil type, elevation, rivers and lakes, vegetation, etc.), ocean properties (sea surface temperatures, salinity, circulation patterns, etc.), and atmospheric properties (chemical species concentrations, air temperatures, atmospheric structure, etc.). Humans have long attempted to understand and model the interactions of these properties. The environmental conditions influential to plant and animal life are often mapped as ecoregions, which are groupings of land areas having similar combinations of environmental characteristics. Köppen and Geiger (Köppen and Geiger 1928; see also Köppen and Geiger 1930; Thornthwaite 1931; Thornthwaite 1948) long ago used a hierarchical classification scheme to map vegetation zones based only on temperature and precipitation patterns. More recent regionalizations have become more specialized and provide a framework for ecological research (Bailey 1983; Omernik 1987; Omernik 1995; McMahon et al. 2001). A multivariate clustering technique, developed by Hargrove and Hoffman, has been applied to ecological regionalization using many environmental characteristics for the conterminous United States at a resolution of 1 km² (Hargrove and Hoffman 2004; Hargrove and Hoffman 1999). This technique is one of the most objective and quantitative in use today.
Here we extend our Multivariate Geographic Clustering (MGC) technique to regionalize changing environmental conditions predicted by a fully coupled general circulation model (GCM) called the Parallel Climate Model (PCM). We call this new technique for analyzing time series data Multivariate Spatio-Temporal Clustering (MSTC). The resulting regions, which change as simulation time progresses, are called climate regimes or climate states. Three important environmental characteristics, taken from model output, are used in this analysis: temperature, precipitation, and soil moisture. While choosing only three variables will simplify graphical presentation of the results, any number of characteristics could be analyzed simultaneously. Nevertheless, temperature, precipitation, and soil moisture are the three most important climatic parameters affecting terrestrial ecosystems, the hydrological cycle, and human life. Impacts due to predicted changes in these parameters are of utmost concern to environmental-policy decision makers.
1.1. Multivariate Spatio-Temporal Clustering
A multivariate statistical clustering technique based on an iterative k-means algorithm (Hartigan 1975) has been used to extract patterns of climatological significance from the output of five 99-yr Business-As-Usual (BAU) integrations of a fully coupled GCM. Originally developed and implemented on a Beowulf-style parallel computer constructed by Hoffman and Hargrove from surplus commodity desktop PCs (Hargrove et al. 2001), the high-performance parallel clustering algorithm (Hoffman and Hargrove 1999) was previously applied to the derivation of ecoregions from map stacks of 9 and 25 geophysical conditions or variables for the conterminous United States. The resulting regionalizations have recently been used to quantify the representativeness of the AmeriFlux sampling network (Hargrove et al. 2003). Now applied both across space and through time, the MSTC technique yields temporally varying climate regimes that can be used to diagnose model behavior and inherent variability and to understand and interpret model predictions. Figure 1 describes this MSTC procedure.
The left side of Figure 1 represents geographic space, while the right side illustrates the same map cells or “observations” from every snapshot in time in a multidimensional data space. The N characteristics of each map cell on the left are used as the N coordinates for that observation in data space on the right. In Figure 1, N is three: temperature, precipitation, and soil moisture. The values of the N input variables for the collection of map cells are standardized such that each variable has a mean of zero and a standard deviation of one. These values are then used to uniquely locate each map cell in the N-dimensional data space. Any two cells from anywhere in the maps possessing similar combinations of conditions will, by definition, be located near each other in data space. Their proximity and relative positions will quantitatively reflect their similarity.
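The standardization step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' parallel implementation: the function name standardize and the sample values are hypothetical, and the population standard deviation is assumed.

```python
from statistics import fmean, pstdev

def standardize(observations):
    """Rescale each variable (column) to zero mean and unit standard deviation.

    Assumes every variable actually varies (nonzero standard deviation).
    """
    columns = list(zip(*observations))
    means = [fmean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in observations
    ]

# Hypothetical map cells: (temperature in K, precipitation in kg m-2 s-1,
# soil moisture as a volume fraction).
cells = [(250.0, 1.0e-5, 0.9), (290.0, 3.0e-5, 0.3), (300.0, 8.0e-5, 0.6)]
z = standardize(cells)
```

After this rescaling, Euclidean distances in data space weight the three variables comparably even though their native units differ by orders of magnitude.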
Having no information about the geographic coordinates of each observation, the iterative clustering algorithm finds k groups or clusters of observations based on their proximity, by simple Euclidean distance, in data space. The number of clusters, k, is chosen by the user. First, a set of k initial centroid locations in data space is determined from the entire collection of input maps using a fast parallel algorithm called the “best of the best.” These initial centroids (or seeds) are chosen to be the k most widely distributed observations in data space. On a parallel computer this is accomplished by dividing up the total number of observations among a number of computer nodes, n. Each node finds the k best seeds from its portion of the observations and sends these seeds to a single node that subsequently determines the best k seeds from the n × k set of candidate seeds, that is, the best of the best.
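The two-stage seed selection can be illustrated with the sketch below. The text does not specify how the "most widely distributed" observations are found, so a greedy farthest-point (maximin) rule is assumed here purely for illustration. The function names are hypothetical, and the compute nodes are simulated by slicing one list rather than by real message passing.

```python
import math

def farthest_point_seeds(points, k):
    """Greedy maximin selection (an assumed stand-in for 'most widely distributed')."""
    seeds = [points[0]]
    while len(seeds) < k:
        # Pick the point whose nearest already-chosen seed is farthest away.
        best = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(best)
    return seeds

def best_of_the_best(points, k, n_nodes):
    """Each simulated node proposes k seeds; one final pass keeps k of the n*k candidates."""
    chunks = [points[i::n_nodes] for i in range(n_nodes)]
    candidates = [s for chunk in chunks for s in farthest_point_seeds(chunk, k)]
    return farthest_point_seeds(candidates, k)

# Hypothetical observations on a small 2D grid, split across two simulated nodes.
points = [(float(x), float(y)) for x in range(4) for y in range(3)]
seeds = best_of_the_best(points, k=3, n_nodes=2)
```

The sketch assumes each node holds at least k distinct observations; the final reduction only ever examines n × k candidates, which is what keeps the seeding step cheap on a parallel machine.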
In the iterative part of the algorithm, each observation is assigned to the cluster whose centroid is nearest to it in data space. After all observations are assigned to a cluster, new centroid positions are computed using the mean values for each coordinate of all observations assigned to that cluster. As a result, the centroids migrate to the most densely populated regions of data space. This procedure is repeated until the number of observations that change cluster assignments in a single iteration is less than some threshold. Once the threshold is met, final cluster assignments are saved. Reassembling the map cells in geographic space for each point in time and coloring them randomly according to their cluster assignment yields new maps showing regions having approximately equal multivariance with respect to the N characteristics used in the clustering process. Since the clustering algorithm is implemented for distributed memory parallel supercomputers, it may be used to analyze very large datasets (Hoffman and Hargrove 1999).
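A serial sketch of the iterative assignment and centroid update is given below. It is a minimal illustration under stated assumptions, not the distributed-memory implementation cited above: the function name kmeans and its threshold parameter are hypothetical, and empty clusters are simply left in place.

```python
import math

def kmeans(points, centroids, threshold=1):
    """Iterate until fewer than `threshold` observations change cluster in one pass."""
    assignment = [None] * len(points)
    while True:
        changed = 0
        for i, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                changed += 1
        if changed < threshold:
            return assignment, centroids
        # Move each centroid to the coordinate-wise mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)  # assumption: an empty cluster keeps its centroid
        centroids = updated

# Two well-separated hypothetical groups of observations and two initial seeds.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

With a threshold of 1 the loop stops only when no observation changes cluster; a larger threshold trades a small amount of accuracy for fewer passes over a very large dataset.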
1.2. The Parallel Climate Model (PCM)
Output from a fully coupled GCM, called PCM, was used for these analyses. PCM makes use of the third version of the National Center for Atmospheric Research (NCAR) Community Climate Model (CCM3; Kiehl et al. 1998; Hurrell et al. 1998; Briegleb and Bromwich 1998), the NCAR Land Surface Model (LSM; Bonan 1996; Bonan 1998), the U.S. Department of Energy (DOE) Los Alamos National Laboratory Parallel Ocean Program (POP; Dukowicz and Smith 1994; Maltrud et al. 1998; Smith et al. 1995), the Naval Postgraduate School sea ice model, a River Transport Model (RTM; Branstetter 2001), and a distributed flux coupler (Bryan et al. 1996). The CCM3 is a spectral dynamics model that uses a T42 resolution (approximately 2.8° × 2.8°) with 18 hybrid levels in the vertical. The LSM simulates the biogeophysics of prescribed vegetation types and hydraulic and thermal properties of 12 soil types. The RTM routes water runoff from land into the oceans. The POP uses a grid with a displaced North Pole at an average resolution of 2/3° latitude–longitude with increased latitudinal resolution of approximately 1/2° near the equator. The sea ice component predicts the evolution of ice thickness, ice concentration, velocity, snow thickness, and surface temperature (Zhang and Hibler 1997); and it uses an elastic–viscous–plastic (EVP) ice rheology dynamics (Hunke and Dukowicz 1997). The distributed flux coupler, originally developed for the NCAR Climate System Model (CSM), connects the PCM components and facilitates the exchange of flux and state variables among the component models. The fully coupled modeling system is described in Washington et al. (Washington et al. 2000).
Versions of PCM, and a follow-on model called the Parallel Climate Transitional Model (PCTM), have been used for a wide variety of control and transient simulations. For example, the DOE Accelerated Climate Prediction Initiative (ACPI) commissioned an ensemble of runs for the twentieth century using oceans initialized to 1995 observed conditions. This forcing employed an anomaly coupling scheme and was created by a group from the Scripps Institution of Oceanography and other institutions (Pierce et al. 2004). Barnett et al. (Barnett et al. 2001) have shown that the model-produced signals are indistinguishable from observations at the 0.05 confidence level, and they suggest that the model uniquely captures the effects of anthropogenic forcing by replicating both air temperature increases and associated ocean heat uptake. When the same runs were performed without the oceanic forcing, a comparison of the results by Dai et al. (Dai et al. 2004) showed only a ±0.1°C difference in global surface temperature between the two sets of runs. They conclude that “the effect of small errors in the oceans (such as those associated with climate drifts) on coupled GCM-simulated climate changes for the next 50–100 years may be negligible” (Dai et al. 2004).
Among the large number of simulations performed using version 1.1 of PCM is an ensemble of 99-yr BAU scenario runs beginning in the year 2000. Run on the IBM RS/6000 SP parallel computers at the Oak Ridge National Laboratory’s Center for Computational Sciences (CCS), these simulations project continued atmospheric increases in CO2 and other trace gases comparable to the mean of all the scenarios developed subsequently for the Intergovernmental Panel on Climate Change’s (IPCC’s) Special Report on Emissions Scenarios (SRES; Nakićenović and Swart 2000). The CO2 levels increase from ∼371 ppm in 2000 to 710 ppm in 2100. These five BAU runs show a global mean surface temperature increase of ∼1.9°C over the twenty-first century and a ∼3% increase in global precipitation. The model produces a cooling over the North Atlantic Ocean (1°–2°C mostly in winter) resulting from a 20% slowdown in the thermohaline circulation (Dai et al. 2001). Presented here are results from all five of these BAU runs analyzed using MSTC.
MSTC was applied to the analysis of monthly output from the ensemble of five transient BAU scenario runs of PCM for the years 2000–2098. Three fields of ecological significance were included in the analysis: temperature (K), precipitation (kg m−2 s−1), and soil moisture (volume fraction of root zone soil water). Only values over land were analyzed (2796 out of 8192 total model grid cells); ocean and sea ice regions were not included.
The clustering process establishes an exhaustive set of clusters (called climate regimes) occupied by one or more land cell observations. The centroids of the resulting regimes in the three-dimensional climate data space (or phase space) represent and define the synoptic conditions of their land cell membership. The centroids represent a partitioning of the variance in the data and form a “skeleton” for the observations in the occupied portion of phase space. The number of clusters requested in the clustering process determines the number of climate regimes that will result. Therefore, it implicitly defines the multivariance encapsulated within a regime or the radius of the resulting clusters.
Each land cell must occupy one and only one regime at any single point in time. The transitions among these regimes (or states in phase space) for any land cell trace out an orbit or climate trajectory among successively occupied regimes. When a land cell enters a climate regime it has never previously visited, a climate change is said to have occurred. When the entire trajectory is traced out (along a time axis added to the phase space), it forms a manifold representing the complete climate regime or state occupancy for that location or land cell over the entire time period. Rare climate extremes appear as excursions from the body of the manifold while strong trends appear as changes or shifts in manifold shape.
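The definition of a climate change as the first entry into a previously unvisited regime can be expressed as a short Python sketch. The function name and the baseline_months parameter are hypothetical: the text does not say how the initial set of visited regimes is established, so the regimes occupied during an initial baseline window are assumed here not to count as changes.

```python
def new_regime_events(regime_sequence, baseline_months=12):
    """Return (time index, regime) pairs for first visits after the baseline window."""
    seen = set(regime_sequence[:baseline_months])
    events = []
    for t in range(baseline_months, len(regime_sequence)):
        regime = regime_sequence[t]
        if regime not in seen:
            seen.add(regime)
            events.append((t, regime))
    return events

# Hypothetical monthly regime sequence for one land cell.
sequence = [1, 1, 2, 2, 1, 1, 2, 2, 9, 1]
events = new_regime_events(sequence, baseline_months=4)
```

In this toy sequence the cell alternates between regimes 1 and 2 and then enters regime 9 for the first time at index 8, which is the only event reported.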
Phase space state occupancy for a single land cell changes as a result of seasonal and interannual cycles, including potential changes in climate. By taking 5-yr running averages of cluster frequency (or land area under each regime or state) over the entire globe, seasonal cycles can be filtered out so that interannual trends stand out. The resulting curves show growth in some climate regimes, representing an increase in spatial area, and compensatory loss in spatial area under other regimes. The number of land cells is fixed throughout the simulations; whenever one regime loses a land cell member, that cell must be gained by another regime.
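The running average of regime occupancy can be sketched as follows. This is an illustration only: the function names are hypothetical, a trailing 60-month window is assumed for the 5-yr average, and the monthly counts are synthetic.

```python
from collections import Counter

def occupancy_counts(assignments_by_month, regime):
    """Number of land cells assigned to `regime` in each month."""
    return [Counter(month)[regime] for month in assignments_by_month]

def running_mean(values, window):
    """Trailing-window mean; returns len(values) - window + 1 points."""
    if len(values) < window:
        raise ValueError("series is shorter than the averaging window")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        out.append(total / window)
    return out

# Synthetic 10-yr monthly counts: a seasonal swing of 4 cells plus a gain of
# one cell per year.
counts = [10 + 4 * (m % 12 < 6) + m // 12 for m in range(120)]
smoothed = running_mean(counts, window=60)
```

Because the window spans a whole number of years, the seasonal swing contributes the same amount to every average and only the interannual trend remains in the smoothed curve.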
Since clusters have nearly equal radii, cluster centroids are regularly distributed throughout the occupied portion of phase space. Therefore, the number of clusters (or climate regimes) requested in the MSTC process defines the “resolution” at which the multivariance of the data (model output) is “sampled.” The magnification of a microscope can serve as an analogy to this resolution. When the magnification is low, only the largest features of an object are visible. When the magnification is increased, much more of the detail of an object becomes visible. When a complex object is viewed at a single magnification, that magnification must be sufficiently high to expose the finest level of detail desired, even if that detail of interest is isolated in a particular region of the object. The rest of the object may be homogeneous and not warrant such a high magnification.
Similarly, choosing too few climate regimes will result in very large, broad regimes and will likely cause the researcher to miss predicted climate signals. Choosing too many regimes will cause very small changes, attributable to noise, to be amplified in the analysis. Desired is an adequate sampling of the data multivariance that produces cohesive regions containing cells that experience only a handful of climate regimes annually. A suitable number of regimes is usually chosen—just like the magnification of a microscope—by trying a few different orders of magnitude. The exact number is not particularly important. For a preliminary analysis using temperature, precipitation, and cloud cover, eight regimes were requested; however, this was too few for an adequate sampling of the predicted multivariance. Subsequent analyses used 32 regimes since this number has proven to be adequate for producing cohesive regions, elucidating climate variability, and detecting predicted climate change with model results at this model resolution.
Initially, the results from each of the five BAU runs were clustered independently into 32 regimes each. The regimes produced from clustering each run independently were similar but were numbered differently. Moreover, because of model variability, the resulting climate regime definitions were not directly comparable. A single common set of climate regimes was needed to serve as a basis for comparison across runs. Such a set of climate regimes was generated by clustering the entire ensemble of runs simultaneously. The climate regimes that resulted represent the synoptic conditions for all five BAU runs taken together. These common climate regimes may be used to intercompare predictions among any of these runs. Finally, the ensemble average was computed for each month and clustered. However, instead of allowing the climate regimes or states to be determined from a partitioning of the ensemble average itself, the common climate regimes defined for the entire ensemble were used as centroids for a single-pass clustering or classification of the ensemble average time series. As a result, the ensemble average regime changes can be directly compared to those of the individual runs since they share a common set of basis states. MSTC serves as a transform between geographic space and phase space through simulation time, and climate regimes derived in one phase space may be transplanted into another to facilitate data comparison.
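The single-pass classification of the ensemble average against the common regimes can be sketched as below. The function names and sample values are hypothetical, and the averaged fields are assumed to be expressed in the same standardized units as the centroids.

```python
import math

def ensemble_mean(runs):
    """Average corresponding observations (same cell, same month) across runs."""
    return [
        tuple(sum(values) / len(values) for values in zip(*cell))
        for cell in zip(*runs)
    ]

def classify(observations, centroids):
    """Assign each observation to the nearest fixed centroid; centroids are not updated."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(obs, centroids[c]))
        for obs in observations
    ]

# Two hypothetical common regimes and two hypothetical ensemble members,
# each holding the same two land cells.
common_regimes = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
run_a = [(0.1, 0.0, 0.0), (2.0, 2.0, 2.2)]
run_b = [(-0.1, 0.0, 0.0), (2.0, 2.0, 1.8)]
labels = classify(ensemble_mean([run_a, run_b]), common_regimes)
```

Because the centroids stay fixed, the labels assigned to the ensemble average refer to exactly the same regimes as the labels assigned to each individual run.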
When MSTC is applied to the ensemble model results, each land cell is assigned to a climate regime for each time period (month). The multivariate results from each run can then be displayed as global maps animated through time, as shown in Movie 1 for the run designated B05.12. The static images shown here consist of two frames: January (left panel) and July (right panel) of 2005. The top row contains maps using random colors and the bottom row contains the same maps using similarity colors. As shown in the legend, the similarity colors consist of a linear combination of the three input variables: temperature (red), precipitation (green), and soil moisture (blue). For example, Antarctica and Greenland are shades of blue since they are dominated by high soil moisture; the Tropics tend to be yellow (red and green) due to high temperatures and precipitation; and the subtropics are shades of red because of warm temperatures and more modest precipitation. Next to each map is a histogram of climate regime occupancy showing the number of land cells contained in each cluster. As time progresses in the animations, the map patterns and histograms change reflecting seasonal and interannual changes predicted by the model.
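One way to build similarity colors of this kind is sketched below. The exact scaling used for the figures is not given in the text, so a per-variable min-max rescaling of the centroid coordinates to the 0 to 255 range is assumed. The function name and the centroid values are hypothetical.

```python
def similarity_colors(centroids):
    """Map (temperature, precipitation, soil moisture) centroids to (R, G, B) triples."""
    columns = list(zip(*centroids))
    lows = [min(col) for col in columns]
    # Guard against a variable that is constant across all centroids.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple(round(255 * (value - low) / span)
              for value, low, span in zip(row, lows, spans))
        for row in centroids
    ]

# Hypothetical standardized centroids: cold and dry with saturated soil;
# hot and wet; hot and dry with dry soil.
centroids = [(-1.0, -1.0, 1.0), (1.0, 1.0, 0.0), (1.0, -1.0, -1.0)]
colors = similarity_colors(centroids)
```

Under this assumed scaling the first centroid is pure blue, the second is dominated by red and green (yellow), and the third is pure red, in line with the color patterns described for ice sheets, the Tropics, and the subtropics.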
3.1. Climate regime evolution
Table 1 contains the quantitative definitions of the 32 common climate regimes established by applying MSTC to the entire ensemble of runs simultaneously. The regimes are characterized by the centroid locations of clusters in climate phase space. The table is sorted by temperature, precipitation, and then soil moisture so the coolest regimes appear at the top while the hottest appear at the bottom. The first column, containing the cluster number used by the algorithm, is colored randomly while the remaining columns show similarity colors using the same variable-to-color assignments as in Movie 1. Climate regimes that experience a net global increase in surface area in one or more BAU runs are denoted by a plus (+) in the first column; those with a minus (−) shrink in net surface area globally. Table 1 serves as a quantitative legend to all figures, which are based on the 32 common climate regimes.
An analysis of the 5-yr running average of cluster frequency (or climate regime or state occupancy) reveals consistent trends across all five runs when divided into 32 regimes. Figure 2 shows global climate regime evolution curves of high variance for all five BAU runs and for the ensemble average. High-variance curves identify climate regimes that undergo significant changes in spatial area defined as those for which the 5-yr running average of state occupancy changes by 100 or more grid cells over the course of the simulation. Curves with low variance, representing regimes that did not experience significant global land area changes (i.e., less than 100 grid cells), are not included in these graphs. All curves are shown in random colors corresponding to the cluster definitions in Table 1. The evolution curves for the ensemble average were generated by taking 5-yr running averages of the ensemble average time series; they do not represent an average of the evolution curves of the ensemble members.
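The selection of high-variance regimes can be sketched as follows. The text does not state whether "changes by 100 or more grid cells" refers to the range of the smoothed curve or to the difference between its end points; the range (maximum minus minimum) is assumed here. The function name and all occupancy values are hypothetical.

```python
def high_variance_regimes(smoothed_by_regime, min_change=100):
    """Keep regimes whose smoothed occupancy varies by at least `min_change` cells.

    The returned sign marks net growth ("+") or net shrinkage ("-").
    """
    selected = {}
    for regime, curve in smoothed_by_regime.items():
        if max(curve) - min(curve) >= min_change:
            selected[regime] = "+" if curve[-1] > curve[0] else "-"
    return selected

# Hypothetical smoothed occupancy curves, in grid cells, for three regimes.
curves = {
    9: [200.0, 260.0, 330.0],
    26: [400.0, 340.0, 280.0],
    3: [150.0, 160.0, 155.0],
}
selected = high_variance_regimes(curves)
```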
The curves in Figure 2 reveal an increase in spatial area occupied by the climate regime that typifies summertime desert regions (cluster 9, i.e., an increase in global desertification) and a decrease in the spatial area occupied by the climate regime typifying wintertime high-latitude permafrost regions (cluster 26). Additionally, significant changes are seen to occur in both Antarctica and Greenland due primarily to increasing temperatures over the 99-yr time period. This is indicated by a significant decrease in occupancy of the three coldest and driest regimes (clusters 6, 28, and 32) and compensatory increases in Antarctic and Greenland summer and spring/fall regimes (clusters 11 and 7). While desertlike conditions increase globally, the desert winter regime (cluster 1) actually decreases in coverage indicating increasingly warmer and drier winters.
For more regional analyses, similar regime evolution curves can be computed on a continental basis. Figures 3, 4, and 5 contain evolution curves of high-variance regimes for North America, Antarctica, and Eurasia, respectively. The variance threshold for these continental evolution curves to be included in the figure is 1% of the total continental area. As with Figure 2, curves for each BAU run as well as for the ensemble average are presented. While trends can be seen in other regimes in individual runs for North America, the ensemble average shows a significant decrease only in the spatial area of cluster 26: the coldest and driest Canadian winter conditions. In Antarctica, the coldest and driest conditions in the middle of the continent give way to climate regimes that typify current coastal conditions. As with North America, the spatial area of cluster 26 decreases in Eurasia. For Eurasia this regime represents the coldest and driest Siberian winter conditions. Interestingly, the ensemble average sees a measurable increase in the spatial area of a semiarid summer regime not observed in any single ensemble member. This results from the increasing trend of cluster 9, the hottest and driest desert summer regime, seen in three of the five ensemble members. Averaging the continuous variables each month effectively shifts this signal in three of the five runs from the most extreme desert regime to a semiarid regime in the ensemble average. It may be more accurate to say that three out of five model runs predict a significant increase in the hottest and driest desert summer regime than to say that the model runs predict a growth in a semiarid summer regime. MSTC provides an easy method for detecting both trends.
Once a subset of “changing” climate regimes is determined, a new animation containing only those regimes shows when and where each significantly changing regime occurs throughout the entire simulation. Movie 2 is an example of such an animation containing two frames (January and July of 2005) from the ensemble average time series with only the high-variance regimes plotted. These regimes are the same as the ones plotted in the ensemble average global climate regime evolution graph in Figure 2.
3.2. Climate state space representation
Plotting the climate regimes (i.e., the cluster centroids) in the three-dimensional climate phase or state space representation allows one to observe the portion of that space that is occupied by the land surface at all points in geographic space and time. Movie 3 shows the 32 common climate regimes derived from MSTC plotted using similarity colors in the temperature, precipitation, and soil moisture phase space. These climate states are common to all of the input observations (i.e., model predictions) and as such serve as a basis for comparison among these predictions. When all the input observations are plotted in the same state space, as shown in Movie 4, it is easily observed that the derived states (Movie 3) form an equal-multivariate skeleton throughout the portion of phase space occupied by observations.
The planes formed by cells separated from the main body of observations in Movie 4 represent regions modeled with completely saturated soil moisture, including Antarctica and Greenland. In Movie 3, a line of blue regimes accounts for these cells. The main body of observations in Movie 4 is flat on the front because precipitation cannot go negative. The most densely populated region of phase space is near the front representing low to mid soil moisture, low precipitation, and warm temperatures. As is evidenced by the line of yellow–green regimes in Movie 3, high precipitation occurs only at moderately warm temperatures. The shape formed by these regimes in this climate phase space makes intuitive sense and demonstrates the covariance of the variables under analysis.
Since every individual land cell on the globe must exist in one and only one of these climate states at any single point in time, the transitions among climate states can be analyzed to learn something about the dynamics of the climate in that geographic location. When the transitions among climate states for a single location on the globe are drawn in state space, they form a climate trajectory like that shown in Movie 5. In this animation, the current position of a land cell in state space is represented by a “spider” resting on one of the climate states. The land cell of interest is a point in the Middle East. As time progresses, the spider spins a “web” among these states. Since repeated traversal of transition paths causes web segments to thicken, favored transitions become more prominent while rare transitions appear as very thin line segments. In each frame, colored line segments represent the current transition, and the color of that segment is borrowed from the state to which the spider just traversed.
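The thickening of web segments corresponds to counting how often each transition between states is traversed. A minimal sketch follows, assuming that transitions are directed and that months in which the cell stays in the same state add no segment. The function name and the sample sequence are hypothetical.

```python
from collections import Counter

def transition_counts(regime_sequence):
    """Count directed transitions between successive, distinct climate states."""
    return Counter(
        (a, b)
        for a, b in zip(regime_sequence, regime_sequence[1:])
        if a != b
    )

# Hypothetical monthly state sequence for one land cell.
sequence = [1, 1, 9, 9, 1, 9, 5]
web = transition_counts(sequence)
```

In a rendering, each count could set the line width of the corresponding segment, so that favored transitions dominate and rare ones remain thin.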
Because all five BAU runs are clustered together to define common climate states, climate trajectories for all the simulations can be plotted simultaneously in the same state space. For example, the animation in Movie 6 shows the same Middle Eastern point from each of the five BAU runs represented as five spiders simultaneously creating their own trajectories in the same common climate state space as time progresses. The runs start at different times so not all spiders are shown initially. Spiders and thickening web segments are colored by run number. When a transition occurs, the segment color matches that of the spider traversing the segment. When not being actively traversed, black segments represent transitions made by most or all of the runs while colored segments represent transitions favored by only one of the simulations.
This sort of visualization makes it easy to see when (in time) and where (in state space) model predictions agree or disagree for a single location on the globe. When the simulations agree, the spiders converge; when they disagree, the spiders diverge. As one would expect, climate variability among runs results in frequent state space divergence; however, the frequently made transitions between states emerge as simulation time progresses. For the Middle Eastern point, the simulations tend to agree most often during summer and winter months, but they may take very different paths among other states during spring and fall months.
When the entire climate trajectory for a single spot on the globe is traced out in a new phase space that includes a time axis, it forms a coil or a climate manifold in that space representing the shape of its predicted climate state occupancy. Rarely visited climate extremes appear as excursions from the body of the manifold while strong trends appear as changes in manifold shape. Climate manifolds for the Middle Eastern point for all five BAU runs and for the ensemble average are shown in Movie 7. Since only three dimensions can be shown at one time, the soil moisture axis has been replaced with a time axis. In all the runs and the ensemble average, the bodies of the manifolds are approximately the same.
Rarely and with decreasing frequency as time progresses, this Middle Eastern point visits a cold winter state that appears as a “comb” below the body of the manifold. Late in the integrations of two of the ensemble members (B06.06 and B06.09), the Middle Eastern point enters the hottest and driest desert summer regime, a state it had never previously occupied. As a result, a climate change is said to have occurred. Notice that although every simulation predicts continued but rare occupancy of the cold winter state, the land cell in the ensemble average stops visiting that state altogether. Moreover, in the ensemble average, the predicted climate change is not observed and much of the variability in precipitation at warm temperatures from all of the runs is lost. This highlights the danger of analyzing only ensemble averages. While ensemble averages are good for finding the strongest interannual trends, they are not good for diagnosing model behavior or understanding climate variability since averaging at each time point removes all variability and forces apparent model behavior to the trend.
3.3. Temporal average snapshots
To detect and map long-term trends using MSTC, representative decadal climate regimes were determined for each BAU run by taking 10-yr averages of temperature, precipitation, and soil moisture early in the present (2001–10) and the future (2089–98). To further distinguish seasonal changes, separate averages were computed for Northern Hemisphere winter [December–January–February (DJF)] and Northern Hemisphere summer [June–July–August (JJA)]. To accomplish this, each of the three fields was averaged for each of the four seasonal 10-yr periods of time for each of the five BAU runs. Then these 20 global realizations were subjected to a single-pass clustering or classification using the common climate regimes as basis states. The resulting maps and histograms are shown using similarity colors in Figures 6 and 7.
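The seasonal 10-yr averaging for a single field at a single land cell can be sketched as below. The function name and the synthetic series are hypothetical. The monthly series is assumed to begin in January of the start year, and each December is grouped with the January and February of the same calendar year, which may differ from the grouping actually used.

```python
def seasonal_mean(monthly_values, start_year, first_year, last_year, months):
    """Average the values falling in `months` (1-12) of the years first_year..last_year."""
    selected = []
    for index, value in enumerate(monthly_values):
        year = start_year + index // 12
        month = index % 12 + 1
        if first_year <= year <= last_year and month in months:
            selected.append(value)
    return sum(selected) / len(selected)

# Synthetic series for 2000-2010 in which each value equals its month number.
series = [float(m % 12 + 1) for m in range(11 * 12)]
djf = seasonal_mean(series, 2000, 2001, 2010, months={12, 1, 2})
jja = seasonal_mean(series, 2000, 2001, 2010, months={6, 7, 8})
```

Applying the same averaging to each of the three fields, for both seasons and both decades, gives the four snapshots per run that are then classified against the common regimes.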
By looking at the histograms, one can observe global area changes between the present and future predictions that match the regime evolution curves in Figure 2. By looking at the maps, one can determine not only where those area changes occur but also how regimes may be redistributed among or within continents. For instance, the histograms in Figure 6 show a consistent decrease in the desert winter regime (cluster 1) and an increase in the desert summer regime (cluster 9). The maps demonstrate that these changes in area occur primarily in the Sahara Desert and the Middle East. The significant decrease in spatial area of the Siberian/Canadian winter regime (cluster 26) is readily observed in those areas of the maps. The Antarctic and Greenland fall/spring #2 (cluster 7) and summer (cluster 11) regimes gain area with a compensatory loss of area by the cooler and drier fall/spring #1 regime (cluster 32). In addition, the maps and histograms show an increase in arid winter conditions in central Asia (cluster 21) and the center of North America (cluster 20).
Similarly, in Figure 7, the desert winter regime (cluster 1) shrinks in the Southern Hemisphere. The coldest and driest Antarctic winter regime (cluster 6) shrinks while the Antarctic and Greenland fall/spring #1 regime (cluster 32) grows. This growth is seen during the Southern Hemisphere winter despite the overall decline in the trend for this regime reflected strongly in Figure 6. The spatial area of the arid winter regime (cluster 20), which grew during the Northern Hemisphere winter in central North America, shrinks during the Northern Hemisphere summer. It represents the Arctic continental coastlines, the Andes in South America, and a region of southern Australia during the Northern Hemisphere summer. In addition, a warm desert margin regime marked by particularly low precipitation (cluster 16) grows in Figure 7. During the Northern Hemisphere summer, the regime begins to encroach upon central Asia and the United States, particularly the south, resulting in significantly drier summers for the southern states.
Similar seasonal area change maps for 10-yr averages of the ensemble average were generated. The ensemble average time series was temporally averaged for the present (2001–10) and future (2089–98) predictions for the Northern Hemisphere winter months (DJF) and the Northern Hemisphere summer months (JJA). These four snapshots were then clustered by a single pass through MSTC using the 32 common climate regimes, resulting in seasonal land area classifications. Figures 8 and 9 show the results for DJF and JJA, respectively, using random colors (first row) and similarity colors (second row). In addition, stoplight-colored difference maps are presented for both seasons with respect to the present (left panel) and future (right panel) predictions. In the difference maps, green regimes grow in area from the present to the future, red regimes shrink in area from the present to the future, and yellow regimes remain nearly constant in global area. While the random-colored maps are better than the similarity-colored maps for finding specific regimes and their borders, the stoplight-colored maps make it very easy to find the changing regimes in both the present and future predictions as well as to determine the strength and sign of the area change trend.
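The stoplight coloring can be sketched as a comparison of per-regime global area between the present and future classifications. The sketch below is an illustration only: the function names, the relative-change measure, and the `tolerance` threshold separating "nearly constant" from growing or shrinking regimes are assumptions, not values taken from the analysis.

```python
from collections import Counter


def regime_area(labels, cell_areas):
    """Sum the area of all land cells assigned to each regime."""
    totals = Counter()
    for regime, area in zip(labels, cell_areas):
        totals[regime] += area
    return totals


def stoplight(present_labels, future_labels, cell_areas, tolerance=0.05):
    """Color each regime by the sign of its global area change.

    'green'  -- regime grows in area from present to future
    'red'    -- regime shrinks in area from present to future
    'yellow' -- regime area remains nearly constant
    tolerance is a hypothetical cutoff on relative change (assumed here).
    Cell areas are assumed to be strictly positive.
    """
    present = regime_area(present_labels, cell_areas)
    future = regime_area(future_labels, cell_areas)
    colors = {}
    for regime in set(present) | set(future):
        before, after = present[regime], future[regime]
        # Scale by the larger of the two areas so the change lies in [-1, 1].
        change = (after - before) / max(before, after)
        if change > tolerance:
            colors[regime] = "green"
        elif change < -tolerance:
            colors[regime] = "red"
        else:
            colors[regime] = "yellow"
    return colors


# Four equal-area cells; one cell moves from regime 1 to regime 9.
colors = stoplight([1, 1, 9, 26], [1, 9, 9, 26], [1.0, 1.0, 1.0, 1.0])
```

Each land cell in the present and future maps would then be drawn in the color of the regime it occupies, so that a shrinking regime appears red in both panels even where its cells have not yet changed state.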
In the ensemble average for the Northern Hemisphere winter, the significant changes are the same as for the individual BAU runs shown in Figure 6. In the stoplight-colored maps, the increase in area of the Antarctic and Greenland fall/spring #2 (cluster 7) and summer (cluster 11) regimes is shown as a large green area in Antarctica and western Greenland, while the decrease in area of the cooler and drier fall/spring #1 regime (cluster 32) is shown as a red central region in Antarctica and eastern Greenland. Notice that in Antarctica, the present map (left panel) has a larger red region than the future map (right panel) while the future map has a larger green region surrounding the red region than the present map. Similarly, since the Siberian/Canadian winter regime (cluster 26) shrinks from present to future, it has a reddish-orange color in both maps. The yellow regions experienced little, if any, area changes.
In Northern Hemisphere summer, the global results for the ensemble average are again consistent with those for the individual runs shown in Figure 7. The shrinking coldest and driest Antarctic winter regime (cluster 6) appears in red in the center of Antarctica while the growing Antarctic and Greenland fall/spring #1 regime (cluster 32) appears as a green coastline around Antarctica. The shrinking arid winter regime (cluster 20) is shown in red along the coastlines of Asia and North America and as red regions in the Andes and southern Australia. The warm desert margin regime marked by particularly low precipitation (cluster 16) appears as dark green regions scattered across the globe. In the future, this regime begins to dominate parts of the southern United States, Mexico, northern China, and southeastern Russia.
Multivariate Spatio-Temporal Clustering (MSTC) is a powerful technique for analyzing and comparing results from fully coupled GCMs. Using the full range of available data, it defines an exhaustive set of recurring climate regimes. These regimes form a “skeleton” through the “observations” throughout the occupied or realized portion of climate phase space formed by the fields or variables being considered. MSTC discretizes these continuous variables simultaneously by partitioning the variance (multivariance) in an equal fashion. Therefore, the resulting climate regimes are of approximately equal volume and are regularly distributed in phase space.
Since it runs on the largest supercomputers, MSTC can handle large volumes of complex multivariate data. It is a good change-detection technique because it can quickly find and quantify trends in long time series data. Since the derived climate regimes provide a basis for comparison, MSTC facilitates direct comparison of ensemble members and ensemble and temporal averages. Moreover, by mapping all land cells to discrete climate states, one can study the dynamical behavior of any part of the system by way of its climate state occupancy. As a result, MSTC is useful for model developers who want to decompose the complex behavior of the model. In addition, MSTC enables a researcher to more easily grasp the multivariate behavior of the modeled climate system and the resulting impacts on global water and energy cycles.
Appropriately derived regimes can further serve as a basis for comparison of the following: ensemble members to ensemble and temporal averages, different model scenarios, one model to another, models to measurements, and one set of measurements to another. Appropriate methods for determining regimes for comparison of time series data are shown in the green and blue boxes in Table 2. Regimes are called centroids since they represent the cluster centers in phase space. Normal clustering analyzes the full multivariance, derives equal-multivariate centroids, and classifies all observations into the corresponding clusters. This works for a single time series, multiple time series (like the ensembles presented here), and for ensemble average time series as illustrated by the first row (shown in green). Since ensemble members were clustered together (middle column), they may be intercompared; however, none of the three types of time series may be compared with each other since their centroids were derived independently.
The green diagonal in Table 2 represents a one-pass clustering or classification of each time series using the centroids obtained from normal clustering for the same time series. This is a self-referential classification because it is performed as a part of normal clustering. A single time series may be compared to an ensemble in which it is contained by performing a one-pass clustering using the centroids derived from that ensemble (blue box in the first column). Again this is self-referential. An ensemble average time series can be compared to the ensemble members from which it was obtained by performing a one-pass clustering using the multiple time series centroids (blue box in the third column) because the ensemble average time series will be fully contained within the variance of the collection of ensemble members. This method was used on both the ensemble average and the temporal averages presented here so that they shared the common climate regimes derived from the entire ensemble.
On the other hand, classifying an entire ensemble or even an ensemble average using centroids from a single ensemble member is likely to provide a poor basis for comparison since the multivariate range of the single time series may not fully encapsulate that of the other two types of time series. These methods are shown in red in the second row of Table 2. Similarly, classifying a single ensemble member or the entire ensemble using centroids derived from only the ensemble average is problematic because the average is unlikely to have the multivariate range of the time series data from which it was constructed. These methods are shown in red in the fourth row of Table 2. When the methods contained in blue or red boxes are employed, the input data must be normalized using the means and standard deviations of the dataset from which the centroids were derived (see footnote c in Table 2).
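The normalization requirement can be made concrete with a minimal sketch of a one-pass classification. The helper names are hypothetical, and the use of Euclidean distance in standardized space is an assumption made for illustration. The essential point is that the new observations are standardized with the means and standard deviations of the dataset that produced the centroids, not with their own statistics.

```python
from math import dist


def standardize(point, means, stds):
    """Express a raw observation in standard-deviation units of the source dataset."""
    return [(x - m) / s for x, m, s in zip(point, means, stds)]


def classify(points, centroids, source_means, source_stds):
    """Assign each raw observation to its nearest centroid in a single pass.

    centroids    -- cluster centers, already in the standardized space of the
                    dataset from which they were derived
    source_means -- per-variable means of that same source dataset
    source_stds  -- per-variable standard deviations of that same source dataset
    Nearest is measured by Euclidean distance (an assumption of this sketch).
    """
    labels = []
    for point in points:
        z = standardize(point, source_means, source_stds)
        nearest = min(range(len(centroids)), key=lambda k: dist(z, centroids[k]))
        labels.append(nearest)
    return labels


# Two variables (e.g., temperature and precipitation) and two toy centroids.
source_means = [10.0, 100.0]
source_stds = [5.0, 50.0]
centroids = [[-1.0, -1.0], [1.0, 1.0]]
labels = classify([[5.0, 50.0], [16.0, 140.0]], centroids, source_means, source_stds)
```

Standardizing the input with its own means and standard deviations instead would shift and rescale it relative to the centroids, so the same physical conditions could be assigned to different regimes.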
The present analysis of an ensemble of BAU runs highlights the utility of MSTC for climate change detection and long time series comparisons. In addition, it exposes some of the dangers of analyzing only ensemble averages. Averaging multiple runs effectively reduces the climate variability, resulting in a time series unlike any of the runs from which it was derived. While ensemble averages are good for finding the strongest interannual trends, they are not good for diagnosing model behavior or understanding climate variability.
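A toy example shows why averaging across runs suppresses variability. The two "runs" below are made-up numbers, not PCM output, chosen so that their fluctuations are out of phase. Each run visits both a low and a high state, yet the ensemble average visits neither.

```python
from statistics import fmean, pstdev


def ensemble_average(runs):
    """Average the runs at each time point, as is done for an ensemble mean."""
    return [fmean(values) for values in zip(*runs)]


# Hypothetical runs whose anomalies are out of phase with one another.
run_a = [1.0, 3.0, 1.0, 3.0]
run_b = [3.0, 1.0, 3.0, 1.0]

average = ensemble_average([run_a, run_b])

spread_a = pstdev(run_a)          # variability within one run
spread_average = pstdev(average)  # variability remaining after averaging
```

Here both runs have the same nonzero spread, while the averaged series is constant. In the terms used above, the extreme states that each run occasionally occupies are absent from the ensemble average.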
By generating only 32 regimes using MSTC and plotting global regime evolution curves for the entire simulated time period, strong predicted interannual trends were revealed. These include an increase in global desertification, a decrease in the cold, dry high-latitude winter conditions typically experienced in North America and Asia, and significant warming in Antarctica and western Greenland. Additionally, temporal averages indicate a reduction in precipitation at midlatitudes in central Asia and North America during the Northern Hemisphere winter. During the Northern Hemisphere summer, hot and dry conditions typical of desert margins spread across central Asia and the United States, particularly in the south, and the cold and dry conditions typical of Arctic continental coastlines, the Andes in South America, and southern Australia are displaced as warmer and wetter conditions invade these regions. While almost all regions warmed, the overall tendency for moisture is one of drying; the ability of the model to put moisture into the atmosphere could not keep up with the increased moisture-holding capacity of the warmer atmosphere on a global basis. Impacts due to these predicted environmental changes are of utmost concern to environmental-policy decision makers. (All results from the analysis of this BAU ensemble from PCM, including color figures and animations, are available online at http://climate.ornl.gov/pcm.)
The authors wish to thank Michael Wehner at Lawrence Berkeley National Laboratory for his long-term efforts in extracting individual fields from model output and providing an archive accessible to research projects like ours. This research used resources of the Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract Number DE-AC05-00OR22725. Support was also provided by a research and development grant from the Science Directorate of NASA/MSFC.
* Corresponding author address: Forrest M. Hoffman, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831. email@example.com