## Abstract

The authors analyzed climate data, seasonal averages of precipitation and of maximum, mean, and minimum temperature over the years 1960–90, from 18 stations spread around the island of Puerto Rico in the Caribbean, to determine whether these data distinguish the existence of climate zones in Puerto Rico. An R-mode principal components analysis (PCA) with varimax rotation was applied to the seasonal data in order to reduce their dimensionality. The first five principal components, found by cross validation to be statistically significant, account for 99% of the variability in the 16 variables included in the analysis. These five components are related to annual variation in mean and minimum temperature (first PC), annual maximum temperature (second PC), and spring, summer, and fall precipitation (third through fifth PCs). A self-organizing map, an artificial neural network algorithm, was then employed to classify the first five PC scores in an optimal fashion. The neural network classified the scores into four climatic zones, each with a distinct geographic coverage in Puerto Rico. One zone, marked by the highest mean and minimum annual temperatures, is located along the northern, eastern, and southern coasts of Puerto Rico. The stations assigned to the second zone are also from relatively low altitudes in the northern and eastern parts of the island, but they are not located along the immediate coastline; this zone is marked by intermediately high mean and minimum temperatures. The third zone consists of stations from high altitudes in the central mountain range and is characterized by the lowest annual mean and minimum temperatures. To the south of the third zone, a fourth zone is identified, marked by the highest annual maximum temperatures.

## 1. Introduction

Puerto Rico is the smallest and easternmost of the islands in the Greater Antilles chain in the Caribbean (Fig. 1). The island is situated at a latitude of 18°N and longitudes between 66° and 67°W. It is almost rectangular in shape, about 180 km from east to west, with a maximum north–south extent of about 60 km. The central parts are dominated by a mountain range (Cordillera Central), whereas the plains to the north and south of the mountains are generally flat and low lying. The lowlands in the north are about 15 km wide, but those in the south are much narrower. The highest peak in Puerto Rico, Cerro de Punta, rises to 1350 m.

The climate in Puerto Rico is tropical. Easterly waves occurring from May through November (the summer period) are responsible for most of the rainfall in Puerto Rico, although cold fronts, which penetrate to the eastern Caribbean from December to April (the winter period), can occasionally bring much rain. Most of this rainfall is orographic in nature: moisture-laden air from the ocean is carried by the trade winds, which occasionally embed tropical cyclones that produce copious amounts of precipitation. The moisture transported by the trade winds is associated with the intertropical convergence zone (ITCZ) and its belt of cumulonimbus clouds. During winter, when the thermal equator is farthest south, the ITCZ is located between 0° and 5°S, south of the Caribbean Sea. In summer, the thermal equator shifts north and the ITCZ lies between 6° and 10°N over the Caribbean Sea (Etter et al. 1987). Precipitation is more abundant in the northern parts of the island, whereas the southern shore of Puerto Rico is much drier.

The question of natural climatic zones in Puerto Rico has been addressed by former State Climatologist R. J. Calvesbert (1970). He identified six climatic zones in Puerto Rico: the north coastal, south coastal, northern slopes, southern slopes, eastern interior, and western interior zones. One additional zone comprised the islands east of Puerto Rico (Vieques and Culebra).

This study is based on climatic records from 18 stations dispersed throughout Puerto Rico (Fig. 1; Table 1). We analyzed 31-yr means (1960–90) of seasonal precipitation and of maximum, mean, and minimum temperature at each of these stations in order to determine whether any distinct geographic zones marked by climatically similar conditions can be identified in Puerto Rico during this interval. We employed R-mode principal components analysis with varimax rotation, together with an artificial neural network algorithm (a self-organizing map), to assess the existence of natural clusters in these data. Several recent studies have adopted a similar strategy to classify long climatic series from India and the continental United States into climatic regions (e.g., Gadgil and Joshi 1983; Fovell and Fovell 1993; DeGaetano 1996). These studies were based on a combination of R-mode principal components analysis and various cluster analysis techniques.

## 2. Material and methods

Of the 78 stations in Puerto Rico at which monthly climate data have been recorded more or less continuously since the beginning of the twentieth century, 18 had sufficiently complete temperature and precipitation records (>85%) over the interval 1960–90 for inclusion in this study (Fig. 1; Table 1). The data have undergone quality control at the National Climatic Data Center, Asheville, North Carolina. One station, San Sebastian, had temperature and precipitation data only from 1965 onward, but it was still included in the analyses because its series nearly covers the full interval.

The proportions of missing monthly observations range between 1.5% and 11.8% for precipitation (average 5.6%; Table 1) and between 1.5% and 13.4% for temperature (average 9.0%). The 31-yr seasonal means per station were computed over the years for which observations were available. The monthly observations were grouped into four seasons: winter (January–March), spring (April–June), summer (July–September), and fall (October–December). The analysis was thus based on a total of 16 variables (four seasons by four climate variables).
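As an illustration, averaging over only the available observations can be sketched as follows. The array shapes and missing-data fraction are hypothetical (synthetic data for one station and one variable), and NaN is used to encode missing months:

```python
import numpy as np

# Sketch with hypothetical numbers: 31 years x 12 months of one variable
# (e.g., precipitation in mm) at one station, with ~6% of months missing.
rng = np.random.default_rng(0)
monthly = rng.gamma(2.0, 50.0, size=(31, 12))          # synthetic observations
monthly[rng.random(monthly.shape) < 0.06] = np.nan     # missing months as NaN

seasons = {"winter": [0, 1, 2],    # Jan-Mar
           "spring": [3, 4, 5],    # Apr-Jun
           "summer": [6, 7, 8],    # Jul-Sep
           "fall":   [9, 10, 11]}  # Oct-Dec
# Seasonal mean per station, computed over the available observations only
seasonal_means = {name: float(np.nanmean(monthly[:, cols]))
                  for name, cols in seasons.items()}
```

Applying the same computation to each of the four climate variables at each station yields the 16 seasonal variables used in the analysis.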

Principal components analysis (PCA) is the most common multivariate statistical technique for reducing the dimensionality of multivariate data (Joliffe 1986; Jackson 1991). Through orthogonal rotations of the coordinate axes of the original variables to new positions along major clusters of data points in multivariate space, a few new axes [principal components (PCs)] are generally found to account for a major share of the variability in the data. The relationships between those variables that contribute significantly to the leading principal components are used as a basis for interpreting the physical or other processes that have shaped the data. We extracted the PCs from the matrix of correlations among the climatic time series, because the precipitation and temperature data are not measured on the same scale. The use of the correlation matrix instead of the covariance matrix means that all variables have unit variance, so that their variability can be compared independently of the original scales.
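A minimal sketch of correlation-matrix PCA (standardize each variable, then eigendecompose), using numpy and synthetic station data rather than the study's dataset:

```python
import numpy as np

def pca_from_correlation(data):
    """PCA on the correlation matrix: standardize each variable
    (zero mean, unit variance), then eigendecompose.

    data : (n_stations, n_variables) array.
    Returns eigenvalues (descending), loadings (eigenvectors), and scores.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)      # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                     # PC scores of the stations
    return eigvals, eigvecs, scores

# Synthetic example: 18 "stations", 4 strongly correlated "seasonal" variables
rng = np.random.default_rng(0)
base = rng.normal(size=(18, 1))
data = base + 0.1 * rng.normal(size=(18, 4))
eigvals, loadings, scores = pca_from_correlation(data)
share = eigvals[0] / eigvals.sum()           # variance share of the first PC
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables, and each eigenvalue divided by that sum is the proportion of variance accounted for by its PC.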

Cross validation (Krzanowski 1987) was used to obtain a quantitative assessment of the optimum number of PCs. In this technique, a parameter, *W*_{m}, representing the increase in predictive information obtained by adding principal component *m*, is computed. The dimensionality of a problem is determined as the highest value of *m* for which *W*_{m} is greater than 0.9 (Krzanowski 1987). We applied a varimax rotation, a type of orthogonal rotation, to those PCs that were found by cross validation to be statistically significant. The varimax rotation maximizes the variance of the squared loadings in each principal component.

We applied a bootstrap variant of PCA with varimax rotation to determine the influence of the various time series on the PCs that were found to be significant. The bootstrap is equally applicable to unrotated PCs. It belongs to the family of so-called computer-intensive methods in statistics (Diaconis and Efron 1983). We generated 95% confidence intervals for the PC loadings from the bootstrap distribution based on 1000 bootstrap replicates of the dataset, each generated by randomly resampling the observations with replacement. Loadings whose confidence intervals do not include zero are interpreted as significant. The use of the bootstrap for generating confidence intervals for PC loadings was demonstrated by Diaconis and Efron (1983), but, as far as we know, this is the first application of the bootstrap to varimax-rotated PCs.
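A sketch of the percentile-bootstrap procedure for PC loadings, here simplified to the first unrotated PC and synthetic data; the sign-alignment step is an assumption of ours (eigenvector signs are arbitrary, so replicates must be aligned before percentiles are taken):

```python
import numpy as np

def first_pc_loadings(x):
    """Loadings (eigenvector) of the first PC of the correlation matrix."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    return vecs[:, np.argmax(vals)]

def bootstrap_loading_ci(data, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence intervals for first-PC loadings.
    Observations (stations) are resampled with replacement; each replicate
    is sign-aligned with the full-sample solution."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    ref = first_pc_loadings(data)
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]   # resample with replacement
        v = first_pc_loadings(sample)
        boot[b] = v if v @ ref >= 0 else -v         # sign alignment
    lo = np.percentile(boot, 100 * alpha / 2, axis=0)
    hi = np.percentile(boot, 100 * (1 - alpha / 2), axis=0)
    significant = (lo > 0) | (hi < 0)               # CI excludes zero
    return lo, hi, significant

# Synthetic example: three strongly correlated "temperature" series
rng = np.random.default_rng(1)
common = rng.normal(size=(30, 1))
data = common + 0.2 * rng.normal(size=(30, 3))
lo, hi, sig = bootstrap_loading_ci(data, n_boot=200)
```

The same resampling loop can wrap a varimax-rotated solution instead of the raw first eigenvector; the alignment step then also has to match rotated components across replicates.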

Artificial neural networks (ANN), a branch of artificial intelligence, have not yet found wide application in the earth sciences. Their potential utility stems from their ability to learn relationships between a set of input signals and a corresponding set of output signals (supervised learning). The output signals may be either continuous variables or integer values, such as labels of predefined classes into which the input signals are to be assigned. ANN may also be employed to classify a number of multivariate vectors into clusters whose number may not be known a priori (unsupervised learning).

ANN are computer systems formed by simple, highly interconnected processing elements, so-called neurons. Neural networks learn by self-adjusting a set of parameters, using some algorithm to minimize the error between the desired output and the network output. In their learning process, ANN mimic the way the mammalian brain operates, although they are naturally only able to simulate the most elementary functions of the biological neuron (Wasserman 1989). In the earth sciences, a back-propagation neural network has been applied to a problem involving stratigraphic correlation of volcanic ash zones (Malmgren and Nordlund 1996). ANN models have also been applied to problems of well log interpretation (Baldwin et al. 1989; Rogers et al. 1992), identification of linear features in satellite imagery (Penn et al. 1993), and geophysical inversion (Raiche 1991).

The one-dimensional self-organizing map (SOM), used in the determination of clusters in the climate data, is referable to the class of self-organizing neural networks (Kohonen 1988). The SOM network can learn to categorize input vectors by recognizing regularities and correlations among them. The SOM network is adapted to situations where the number of classes into which to categorize the input data is not known a priori (an example of unsupervised learning). A brief technical description of the SOM algorithm is presented in the appendix.

The software used was MATLAB’s Neural Network Toolbox (MathWorks Incorporated) for self-organizing maps (Demuth and Beale 1994). We employed a learning rate starting at one according to the recommendations given in Demuth and Beale (1994). Further details on the SOM algorithm are provided by Kohonen (1988) and Wasserman (1989).

## 3. Results

Figure 2 shows the total precipitation and mean temperature at each of the Puerto Rican stations determined as averages per month for the winter, spring, summer, and fall seasons. Precipitation differs to a considerable degree among the various stations and also exhibits great seasonal contrasts at different stations. Precipitation is lowest during the winter months (Fig. 2a) and highest during either the spring or summer season (Figs. 2b,c). Throughout the year the lowest amounts of rainfall are found at the stations from the southern coast.

As expected, the winter season is coolest at all stations (Fig. 2e), whereas the highest temperatures occur during the summer season (Fig. 2g). The seasonal contrast (summer–winter mean temperatures) ranges between 2.3° (Guayama) and 3.6°C (Cayey). The spring season is slightly warmer than the fall season at all stations (Figs. 2f and 2h).

### a. Principal components analysis

Cross validation indicates that the first five PCs, accounting for 99% of the variation in the data (Table 2), are significant (Fig. 3). The mathematical dimensionality can therefore be reduced from 16 to 5 using PCA, and we consider that the problem of extracting information regarding the most essential relationships among the series can be solved through the use of the first five PCs as a set of new variables.

The first varimax-rotated PC, which explains more than half of the variance (58%), is shown by bootstrapping to contain significant contributions from all seasonal mean and minimum temperatures (Table 2). Since these variables are positively correlated, this PC constitutes an index of mean and minimum temperature differences among the Puerto Rico stations.

The second most important source of variation, which explains 25% of the variance, is represented by a positive correlation between all seasonal maximum temperatures, as shown by the second PC. This PC thus represents a measure of overall maximum temperatures in Puerto Rico.

The remaining PCs, accounting for a total of 16% of the variability, distinguish seasonal variation in precipitation independent of temperature. Here, PC3 is related to spring rainfall, PC4 to summer rainfall, and PC5 to fall rainfall.

### b. Artificial neural network

The first five varimax-rotated PC scores were then used as input variables to the SOM network in order to determine the optimum classification of the stations. Four different configurations of the network were attempted, with 3, 4, 5, and 6 neurons in the SOM, giving rise to the corresponding numbers of zonal divisions. We ran each of these networks for 1000, 2000, . . . , 10 000, 12 000, 14 000, . . . , 20 000 training epochs and monitored the results after each of these numbers of epochs.

As mentioned above, the definition of climatic zones is an example of unsupervised learning, since the optimum number of zones cannot be known a priori. We found that a subdivision into four zones produced the most reasonable solution. Three neurons gave too crude a zonation, whereas solutions with five and six neurons divided the geographically natural cluster obtained for four neurons in a less meaningful way.

Figure 4 documents how the SOM network with four neurons assigns the various stations to the four zones (A, B, C, and D) as a function of the number of training epochs. The assignment of most of the stations does not change as training proceeds, but some stations are allocated to different zones during the initial phases of training. For example, the station Arecibo is assigned to either zone B or zone D during the first 6000 epochs. After 7000 training epochs, however, no further adjustments in allocation take place, so this classification is taken as the optimum fourfold zonation of the stations. The geographical extent of the four zones thus obtained is shown in Fig. 5. Zones B and D consist of four stations each, and zones A and C include five stations each.

### c. Characteristics of the climate zones

Figure 6 displays the configuration of the scores of the stations along varimax-rotated PC axes 1 and 2, 3 and 4, and 4 and 5. The location of the stations along the PC1 axis is directly related to seasonal mean and minimum temperatures; the higher the score, the higher the temperature. This axis clearly distinguishes the stations referable to zones A, B, and C in order from high to low mean and minimum temperatures. The stations from zone D are most similar to zone B with regard to mean and minimum temperatures. Table 3 summarizes the seasonal average climate data in the various zones. The second PC axis (PC2), which is associated with fluctuations in overall seasonal maximum temperature, indicates that the stations from zone D are marked by higher maximum temperatures than the other zones (cf. Table 3).

The configuration of the stations along the axes associated with variation in spring, summer, and fall precipitation (PC axes 3, 4, and 5, respectively; Fig. 6) indicates that precipitation does not contribute much to the differentiation of the climate zones. Some stations do deviate from the average: two stations from zone A are marked by low spring precipitation (PC3), and two stations from zone D show high summer precipitation (PC4). Nevertheless, the amounts of precipitation clearly overlap among the climate zones, which are thus essentially distinguished on the basis of temperature differences among the stations.

## 4. Discussion

It is relevant to ask how the zonation obtained using PCA and the SOM neural network compares with the results of other clustering techniques. Principal coordinate analysis (Gower 1966) is a Q-mode technique that uses a matrix of associations among the samples as the basis for the eigencomputations, and it provides information on the relationships among the samples along principal coordinate axes (eigenvectors). We applied principal coordinate analysis to the original climate dataset to determine the interrelationships among the Puerto Rican stations. The plane formed by the first two coordinate axes accounts for 63% of the association among the stations (Fig. 7). The stations fall into four distinct clusters that conform perfectly with those obtained from the SOM network, which confirms the relevance of the zonation obtained here.
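For reference, classical principal coordinate analysis reduces to double-centering the squared distance matrix and eigendecomposing the result. A minimal sketch with a toy distance matrix (not the study's data):

```python
import numpy as np

def principal_coordinates(dist, k=2):
    """Principal coordinate analysis (classical metric scaling, Gower 1966):
    double-center the squared distance matrix and take the leading
    eigenvectors scaled by the square roots of their eigenvalues."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J        # double-centered inner-product matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]        # eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    coords = vecs[:, :k] * np.sqrt(np.maximum(vals[:k], 0.0))
    explained = vals[:k].sum() / vals[vals > 0].sum()
    return coords, explained

# Toy example: four points in the plane; classical scaling with k = 2
# should reproduce their mutual Euclidean distances exactly.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
coords, explained = principal_coordinates(dist, k=2)
recovered = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
```

The `explained` value is the analog of the 63% figure reported for the first two coordinate axes: the share of the positive eigenvalue sum captured by the retained axes.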

The distinction of zones A, B, and C appears to be controlled mainly by the altitude of the stations and their location in relation to the ocean. As noted above, the stations assigned to zone A are those that show the highest mean and minimum temperatures during the interval 1960–90. They are all situated along the coastline at low altitudes (generally 3–4 m, but one station at 55 m; Table 1). The stations in zone B exhibit intermediately high mean and minimum temperatures. They are situated close to the ocean but not along the immediate shoreline, at altitudes between 3 and 76 m. The stations in zone C show the lowest mean and minimum temperatures over the years analyzed here; they lie at the highest altitudes, in the central mountain range (49–418 m). Zone D was distinguished on the basis of unusually high annual maximum temperatures. The altitudes of the stations in this zone vary considerably (12–61 m). These stations are located leeward of the mountain range with respect to the trade winds and thus gain much heat through adiabatic warming of the descending air.

## REFERENCES

### APPENDIX

#### Technical Description of the One-Dimensional SOM

The architecture of the SOM network is shown in Fig. A1. The SOM uses the so-called instar learning rule of Grossberg (1974) for training the network. The instar consists of *S* neurons, each of which is associated with a set of weight terms, **w**_{i,j}, where 1 ⩽ *i* ⩽ *p* (the number of input variables) and 1 ⩽ *j* ⩽ *S.* These weight vectors have unit length and are initially assigned random values. In effect, the instar neurons serve as “vector detectors.” Unlike many other neural network algorithms, the SOM does not use bias terms. It is through modifications of the weights that learning is accomplished. The number of neurons is set equal to the number of classes into which the input vectors are to be classified.

Before being presented to the network, the input vectors, **x**_{i}, are normalized to unit length. Each of the *N* input vectors, **x**_{i}, is multiplied by the weight matrix to produce the weighted net input to the instar's neuron *j* (net_{j}), that is,

net_{j} = ∑^{*p*}_{*i*=1} *x*_{i}**w**_{i,j},
or in matrix notation,

**NET** = **XW**,
where **X** is the input data matrix (*N* × *p*), **W** is the weight matrix (*p* × *S*), and **NET** is the matrix of net inputs (*N* × *S*). The dot product **x** · **w** is a measure of the similarity between the input and weight vectors when these vectors are normalized (Wasserman 1989). In effect, since the vector lengths equal one, the output from the instar is the cosine of the angle between the input and weight vectors.

An SOM network can have a so-called neighborhood size of either 0, 1, or larger. For a one-dimensional SOM, the neighborhood size will become one as training of the network proceeds, which means that the instar neuron will output an *a*_{j} value of 1 for the winning neuron, that is, the neuron that shows the highest net_{j}, and values of 0.5 for the neighboring neurons.

Training is accomplished through the instar learning rule applied to the input vector **x**_{i} and neuron *j*:

Δ**w**_{j} = lr *a*_{j}(**x**^{′}_{i} − **w**_{j}),
where Δ**w**_{j} is the change in the weight vector of neuron *j* from one generation of the network to the next, and lr is the learning rate. From this expression it is clear that a weight vector continues to learn as long as the neuron is active (the output *a*_{j} from the previous generation is >0) and stops changing once it becomes equal to an input vector (**x**^{′}_{i} = **w**_{j}).

During training, the neuron that has the weight vector closest to an input vector (the winning neuron) is updated to become even more similar to it. This neuron is therefore likely to become the winner when a similar vector is presented later in training. During the initial phases of learning, the neurons' weight vectors take large steps toward the region of multivariate space where the input vectors are located. The learning rate is slowly decreased as training proceeds, from a value of 1 down to nearly 0, and the weight vectors become evenly spread across the space of the input vectors. When the network is trained, the instar responds not only to one specific input vector; the neurons' weight vectors have also been trained to recognize a range of vectors that are normal variations of a desired vector. Thus the weights adjust to average values of input vectors that have a similar orientation in multivariate space (**x**_{i} · **x**^{′}_{j} considerably greater than 0). In this way, similar input vectors are classified into the same group (out of the *S* groups).
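The training loop described above can be sketched as follows. This is a simplified illustration rather than the MATLAB implementation used in the study: neighborhood updates are omitted, and the weights are initialized from a spread of input samples (instead of at random) to avoid dead neurons in this small demo.

```python
import numpy as np

def train_som_1d(X, S, epochs=3000, seed=0):
    """Minimal competitive (instar) learning sketch. Inputs and weights are
    kept at unit length, the winning neuron is the one with the largest net
    input (dot product = cosine similarity for unit vectors), and its weight
    vector is moved toward the input by dw = lr * (x - w)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)    # unit-length inputs
    # Data-based initialization (a simplification of the random init above)
    W = X[np.linspace(0, len(X) - 1, S).astype(int)].copy()
    for epoch in range(epochs):
        lr = 1.0 * (1.0 - epoch / epochs)               # learning rate: 1 -> 0
        x = X[rng.integers(len(X))]                     # present one input vector
        winner = int(np.argmax(W @ x))                  # highest net input wins
        W[winner] += lr * (x - W[winner])               # instar update
        W[winner] /= np.linalg.norm(W[winner])          # keep unit length
    return W

def classify(X, W):
    """Assign each input vector to the neuron with the largest net input."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.argmax(Xn @ W.T, axis=1)

# Two well-separated synthetic clusters; an S = 2 network should learn
# one weight vector per cluster.
rng = np.random.default_rng(42)
a = np.array([1.0, 0.1]) + 0.05 * rng.normal(size=(20, 2))
b = np.array([0.1, 1.0]) + 0.05 * rng.normal(size=(20, 2))
X = np.vstack([a, b])
W = train_som_1d(X, S=2)
labels = classify(X, W)
```

In the study itself the inputs are the five varimax-rotated PC scores per station and *S* ranges from 3 to 6; the decaying learning rate starting at 1 matches the setting reported in section 2.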

## Footnotes

*Corresponding author address:* Professor Björn Malmgren, Dept. of Earth Sciences—Marine Geology, Earth Sciences Center, University of Göteborg, Box 460, SE405 30 Göteborg, Sweden.