Using multivariate discriminant analysis techniques, statistically significant and skillful models are developed for making extended-range forecasts of hurricane activity within specific locations of the North Atlantic basin. These forecasts predict the presence or absence of hurricane activity and not the actual number of storms that will occur within a region. Successful models are developed for predicting intense hurricane activity in both the Gulf of Mexico and the Caribbean subbasins separately. Extended-range forecasts of all hurricane activity are also possible within the Caribbean Sea. More significantly, lead-time forecasts of landfalling hurricanes on the southeastern Atlantic coast of the United States are possible and show a substantial improvement over climatology. Extended-range forecasts of hurricane activity for the northeastern United States and for the Gulf of Mexico are not feasible due, respectively, to the relative lack and abundance of hurricane activity. Cross-validated forecast accuracies range from 78% to 81% for the regions in which successful models can be developed. An all-possible subsets selection algorithm is used to identify the predictor models, while bootstrap techniques are used to assess model significance. Statistical tests using normal approximations are employed to compare cross-validated (hindcast) forecast accuracy to climatology.
Since 1984 (Gray 1984b), advanced seasonal forecasts of North Atlantic hurricane activity have been possible using rules and statistical techniques. While recent improvements have occurred in the statistical methodology used (Elsner and Schmertmann 1993; Hess and Elsner 1994), forecasts still involve seasonal activity for the entire Atlantic basin as opposed to more specific locations. Some researchers have noted relationships between the seasonal forecast predictors and certain United States landfall characteristics (Landsea et al. 1992), but statistical predictive methodology has yet to be incorporated. Consequently, we note that little has been done in localizing variables that may indicate the likelihood of hurricane development and the potential paths that hurricanes will take since the preliminary work of Ballenzweig (1959). Furthermore, as noted by Montgomery and Farrell (1993), while much attention has been given to the real-time public warning process and storm track prediction, the prediction of the time and location of hurricane development has proved far more challenging. Recently, some improvements have occurred in predicting tropical cyclogenesis with a lead time of one or two days (Zehr 1992; Fiorino et al. 1993), but this does not extend to the seasonal timescale.
Continuing, we note that seasonal hurricane prediction has established a firm foundation and has provided some beneficial tools in determining the probable activity within a given season but the very nature of the forecasts renders them somewhat intangible. That is, seasonal forecasts fail to provide enough detail for the public at large. For example, several months in advance of the official start to the hurricane season, a seasonal prediction of six named storms for which two become hurricanes (one of those expected to be intense) is issued; this activity level is substantially below the seasonal average. Although this forecast may verify, coastal dwellers may be lured into a false sense of security due to the prediction of below-average activity. In such a situation, it only requires one major hurricane to cause catastrophic losses and impinge on an otherwise unsuspecting public; Hurricane Alicia of 1983 provided a perfect example of such an occurrence, as this was an intense landfalling hurricane in an otherwise inactive year.
Consequently, the failure to specify location seriously reduces the usefulness of seasonal forecast models, since the accurate prediction of an active season may not prove beneficial to coastal residents if the location of the activity cannot be identified. We note that many active hurricane seasons have occurred this century with few if any landfalling storms (e.g., 1981). Conversely, an inactive season (such as 1983) could prove to be quite damaging if the path of just one of the storms crossed a vulnerable area.
Here we develop seasonal prediction models for forecasting hurricane activity in specific subbasins of the North Atlantic. The statistical methodology we employ is multivariate discriminant analysis, and we find that this methodology produces statistically significant forecasts for hurricane activity in the Caribbean Sea, for intense hurricane activity in the Gulf of Mexico and the Caribbean Sea, and for landfalling hurricanes along the southeast coast of the United States.
We start by arguing for a sensible division of the Atlantic basin in section 2. Data description and experimental design are given in section 3, followed in section 4 by a hurricane climatology for each subbasin or landfall region. The specifics of multivariate discriminant analysis, the model-building algorithm, and the statistical significance tests used are provided in section 5. We conclude by presenting the results in section 6 and a summary and discussion in section 7.
2. Specific subbasins
Having argued that location should play a larger role in seasonal prediction schemes, we must select the specific locations with care. For purposes here, we chose subbasins and landfall regions a priori based on precedent and geography rather than on hurricane climatology or model optimization. The subbasins are identified in Fig. 1 and consist of the Caribbean Sea and the Gulf of Mexico; the landfall regions are the southeast coast and the northeast coast. The Caribbean Sea and the Gulf of Mexico are natural subbasins, while the two East Coast landfall regions consist only of coastal strike zones. The East Coast landfall regions must be restricted to these coastal strike zones since no natural geographic markers exist to designate any natural subbasins. Furthermore, any attempt to create East Coast subbasins would by nature be arbitrary; model accuracy could be improved by simply expanding the subbasin domains. A further justification for allowing both subbasin and landfalling regions in this study was that tropical cyclones of hurricane intensity in either the Gulf of Mexico or the Caribbean Sea rarely fail to hit a populated area (e.g., only four hurricanes in the Gulf of Mexico failed to make landfall during 1950–95), while many storms recurve quite close to the East Coast, never making landfall.
Within each region, we develop a climatology and conduct a search for useful discriminant models to provide long-range forecasts. The available predictors for the models consist of the usual seasonal activity predictors (Gray et al. 1992, 1993) for the Gulf of Mexico and the Caribbean Sea, while additional predictors are required for the East Coast regions. Using an algorithm that chooses a subset of predictors, we find that significant and skillful models can be developed for almost all of the regions. This represents a new result in the field of seasonal hurricane forecasting and suggests new avenues for future exploration.
To develop our prediction models, we used data made available from the National Hurricane Center’s best track dataset, the National Center for Atmospheric Research, and Gray et al. (1992, 1993). The datasets created required spatial and temporal stratification. Independent datasets were created for the four locations of interest: the Gulf of Mexico and the Caribbean Sea (the two subbasins), and the lower and upper Atlantic coastal regions (the two landfall regions). We also stratified the prediction locations by prediction date: some locations had successful models initialized by 1 December of the preceding year, while other locations and climatological considerations necessitated an initialization date of 1 August. We chose the earliest prediction date for which our models obtained statistical significance; except for predicting regular hurricane activity in the Caribbean Sea, all statistically significant models required the use of the 1 August data. The model predicting Caribbean hurricane activity achieved a statistically significant result using 1 December data; this result did not improve with the 1 August data. Finally, we created separate datasets for the two ocean basins according to tropical cyclone intensity: one set for all storms of at least hurricane intensity (≥ 33 m s−1) and one set for storms of only major hurricane intensity (≥ 50 m s−1). We did not attempt to stratify the East Coast landfall regions by storm intensity due to insufficient numbers of landfalling intense storms.
We faced an additional concern in stratifying the datasets by storm intensity since the best track dataset contains a probable high bias in wind speeds for storms in the 1944–69 time period (Landsea 1993). To account for this, we implemented the bias correction suggested by Landsea (1993) by deflating storm wind speed values by 2.5 m s−1 for those tropical cyclones in the 1950–69 time period. Using this criterion, the omission of Hurricane Francelia of 1969 as an intense Caribbean hurricane caused the only modification of our dataset. Hurricane Camille of 1969 was the only other candidate that could have affected our dataset; however, since the storm had a reported pressure of 964 mb and reported sustained winds of 52 m s−1 several hours before making landfall in Cuba, and since it was in a rapidly deepening phase at this time, we retained Camille as an intense Caribbean hurricane. We note that the classification of intense hurricanes chosen here may not necessarily agree with other researchers’ classifications as other factors besides measured wind speeds (e.g., storm surge height) were sometimes chosen to form the classification basis; we utilized the wind speeds in order to maintain an objective classification criterion. Neumann et al. (1993) provide a good reference for determining storm intensities using other criteria.
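The bias correction and the two intensity thresholds described above can be illustrated with a short Python sketch. The function and constant names are hypothetical, and the sketch assumes a single peak wind value per storm; it is not the procedure actually coded for this study.

```python
# Illustrative sketch only: names and input format are assumptions.
HURRICANE_MS = 33.0  # hurricane threshold (m/s), from the text
INTENSE_MS = 50.0    # major hurricane threshold (m/s), from the text
BIAS_MS = 2.5        # deflation applied to 1950-69 storms (m/s), from the text


def corrected_wind(year, peak_wind_ms):
    """Deflate best-track winds for 1950-69; leave other years unchanged."""
    if 1950 <= year <= 1969:
        return peak_wind_ms - BIAS_MS
    return peak_wind_ms


def intensity_class(year, peak_wind_ms):
    """Return 'intense', 'hurricane', or 'below' after the bias correction."""
    wind = corrected_wind(year, peak_wind_ms)
    if wind >= INTENSE_MS:
        return "intense"
    if wind >= HURRICANE_MS:
        return "hurricane"
    return "below"
```

Under this rule alone, a 1969 storm with 52 m s−1 winds would fall just below the intense threshold, which is the situation described for Camille; an exception of that kind would have to be applied by hand.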
The datasets for each region span the time frame from 1950 until 1995. While reliable hurricane track data exists before 1950, incomplete upper-air data records preclude the usage of data before this date. Incomplete archive records also caused a few recent years to be omitted for the East Coast landfall regions. The actual construction of the datasets for each region proceeded as follows.
1) Define the boundaries of the subbasin or the landfall region;
2) determine the number of tropical cyclones of hurricane intensity for each year within those boundaries;
3) omit any tropical cyclones that occurred in that year prior to the initialization date;
4) classify the year as active if one or more hurricanes occurred that year, otherwise classify it as inactive;
5) repeat steps 2–4 for intense hurricanes (excluding the two East Coast regions); and
6) add the appropriate pool of potential predictor covariates for each year.
For step 2 above, we define an occurrence of a hurricane within the predefined boundaries according to whether the location of interest is a subbasin or a landfall region. If the location is a subbasin, we require that at least one-half of the circulation cross the line demarcating the subbasin boundaries (refer to Fig. 1 for these boundary lines). Likewise, for landfall regions, a storm is classified as an occurrence within that region if at least one-half of the circulation crosses the coastline (or barrier islands if appropriate). This definition produces a couple of classification distinctions based on small differences in storm tracks. For example, Hurricane Betsy of 1965 is classified as a landfall on the southeast coast while Hurricane Inez of 1966 is not; even though Inez affected the Florida Keys (which are not within the southeast coast landfall region), it neither produced hurricane conditions on the Florida peninsula nor did at least one-half of its circulation cross the southeast coast landfall zone. On the other hand, Hurricane Betsy’s center of circulation passed directly over the southern Florida peninsula. Note finally that this definition allows a single storm to count as an occurrence in more than one region or subbasin; Hurricane Betsy counts both as a southeast coast landfall and as an intense hurricane in the Gulf of Mexico subbasin.
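The year-classification part of the construction (omitting storms before the initialization date, then labeling each year) can be sketched in Python as follows. The function name and the input format are assumptions made for illustration; the sketch presumes that the judgment of whether a storm occurred within a region has already been made.

```python
from collections import Counter
from datetime import date


def classify_years(storm_dates, first_year, last_year, init_month_day=None):
    """Label each year in [first_year, last_year] as 'active' or 'inactive'.

    storm_dates: list of datetime.date objects, one per hurricane already
        judged to have occurred inside the region (hypothetical format).
    init_month_day: (month, day) of the initialization date within the
        forecast year, e.g. (8, 1); None means no storms are omitted.
    """
    counts = Counter()
    for d in storm_dates:
        if init_month_day is not None and (d.month, d.day) < init_month_day:
            continue  # storm occurred before the initialization date
        counts[d.year] += 1
    return {
        year: "active" if counts[year] >= 1 else "inactive"
        for year in range(first_year, last_year + 1)
    }
```

With made-up dates, `classify_years([date(1950, 7, 10), date(1951, 9, 3)], 1950, 1952, init_month_day=(8, 1))` labels 1950 as inactive because its only storm precedes 1 August. Passing `None` corresponds to an initialization on 1 December of the preceding year, for which no storms need to be omitted.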
In addition, two elements of this construction process require particular attention. First, the boundary regions were chosen by geographical considerations for the subbasins and by arbitrarily dividing the East Coast north and south of 35°N. These boundaries were constructed prior to model selection and validation, so that no need would exist to cross-validate the model statistics over geographical regions. Furthermore, the use of 35°N to segregate the northeastern coastal area has historical precedent (Kocin and Keller 1991). Kocin and Keller (1991) also justify the use of this latitude by the change in the orientation of the coast. The other issue to note is that we classified a year as active or inactive for a region without regard to the total number of storms that occurred within that region. The presence of at least one storm was sufficient to deem that year as active.
The pool of potential predictor covariates for the 1 December datasets consists of the familiar predictors identified by Gray et al. (1992). These predictors are 1) the autumnal rainfall within the Sahelian region of western Africa (RS); 2) the autumnal rainfall within the Gulf of Guinea of western Africa (RG); 3) the forward extrapolated 30-mb stratospheric quasi-biennial oscillation (QBO) (Q30); 4) the forward extrapolated 50-mb QBO (Q50); and 5) the forward extrapolated vertical shear magnitude between the 30- and 50-mb QBO winds (QDIFF). Similarly, the 1 August datasets include the nine hurricane activity predictors of Gray et al. (1993). Besides the five predictors described above (updated with early summer data), the 1 August data include four additional predictors: 1) the Southern Oscillation index (SOI); 2) the eastern equatorial Pacific sea surface temperature anomalies (SSTA); 3) the 200-mb zonal wind anomaly in the Caribbean (ZWA); and 4) the sea level pressure anomalies in the Caribbean (SLPA). The stratospheric QBO values are not forward extrapolated for the 1 August datasets.
The Gray et al. (1992, 1993) predictors were insufficient in assessing and modeling the risk of hurricane activity affecting the eastern United States since these predictors are entirely embedded in the large-scale tropical climate, while much of the East Coast lies well within the effects of midlatitude systems. Consequently, a search for additional predictors was undertaken; identification of candidate predictors was developed by consideration of known physical mechanisms affecting hurricane development and tracking. Since vertical wind shear, enhanced baroclinicity, and sea level pressure anomalies are all well-known factors affecting hurricane development and movement (Gray 1988; Kimberlain 1996), we considered several available predictors that addressed these factors. Some of these additional potential predictors were 1) July monthly mean sea level pressures at several East Coast reporting stations (SLP); 2) the July monthly coastal sea level pressure (JCSLP) averaged over these same East Coast stations; 3) the magnitude of the vertical shear of the average July monthly 700- and 200-mb winds (VS) for several East Coast sounding locations; 4) the least squares estimated meridional component of the gradient of the sea level pressure along the East Coast (SLPG, units of millibars per degree of latitude); and 5) the least squares estimated meridional component of the gradient of the geopotential heights on several constant pressure surfaces (HGxx, units of meters per degree of latitude, xx subscript refers to height level). Consideration of quasigeostrophics (i.e., baroclinic effects) leads somewhat to the choice of the candidate predictors. That is, we examined these pressures, height gradients, vertical shear, and similar factors as these represent some of the key influences in the midlatitudes as compared to the Tropics. Note that this does not represent a complete list of the potential predictors that we examined. 
For example, we also did some preliminary investigation of 500-mb monthly mean steering flows; since these showed no significant predictive value in preliminary examination, we omitted these from the later compilation. Furthermore, note that in constructing these predictors we were unable to obtain monthly upper-air records after 1992 for some reporting stations, which precluded the use of 1993–95 data for the United States east coast.
The VS was computed as follows. First, for a sounding location, we took the observed 700- and 200-mb winds for both the 0000 and 1200 UTC soundings of each day and decomposed each observed wind into its respective u and v components. Next, we calculated the monthly average u and v components for both the 700- and 200-mb winds. Finally, we computed the magnitude of the vertical shear, using the usual Euclidean distance function:

$$\mathrm{VS} = \sqrt{(\bar{u}_{200} - \bar{u}_{700})^{2} + (\bar{v}_{200} - \bar{v}_{700})^{2}}, \tag{1}$$

where the overbars denote the monthly averages and the subscripts denote the pressure level in millibars.
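As an illustration, the three steps of the VS calculation can be sketched in Python. The function names and the (speed, direction) input format are assumptions made for this sketch, as is the use of the meteorological convention in which direction is the bearing the wind blows from; the original computation may have differed in detail.

```python
import math


def uv_components(speed, direction_deg):
    """Convert speed and meteorological direction (degrees the wind blows
    FROM) into eastward (u) and northward (v) components."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def monthly_vertical_shear(soundings_700, soundings_200):
    """Magnitude of the shear between monthly mean 200- and 700-mb winds.

    Each argument is a list of (speed, direction_deg) pairs, one per
    sounding (0000 and 1200 UTC of each day in the month).
    """
    def mean_uv(soundings):
        comps = [uv_components(s, d) for s, d in soundings]
        n = len(comps)
        return sum(u for u, _ in comps) / n, sum(v for _, v in comps) / n

    u7, v7 = mean_uv(soundings_700)
    u2, v2 = mean_uv(soundings_200)
    # Euclidean distance between the two monthly mean wind vectors.
    return math.hypot(u2 - u7, v2 - v7)
```

Note that the components are averaged before the magnitude is taken, so winds of opposing direction within a month partly cancel; averaging daily shear magnitudes instead would generally give a larger value.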
During the time period that these upper-air observations were available, the rawinsonde release points were relocated for some locations; the move of the release point from Miami (MIA) to West Palm Beach (PBI) in 1977 was by far the largest and most serious of these moves. Since the VS calculated from the MIA/PBI soundings later proved to be one of the most important predictors, we ran some statistical tests to determine if any significant quantifiable changes occurred due to the relocation. To perform these tests, we examined whether the mean 500-mb geopotential heights, wind components, and temperatures showed any significant differences between the two sites. Additional levels were not tested so that we could avoid a statistical multiple comparison problem. A standard t test could not be employed due to unequal variances between the two sites, so we employed Mood’s median test (Daniel 1990) to ascertain any significant differences; p values for all tests were greater than 0.1, hence we concluded that the change in the release point had no detectable effect on the observed soundings.
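For readers unfamiliar with Mood’s median test, a minimal two-sample version can be sketched in Python as follows. This sketch assumes the usual chi-square form on a 2 × 2 table without a continuity correction, and it counts values equal to the grand median as "not above"; neither choice is stated in the text, and the function name is hypothetical.

```python
import math
from statistics import median


def moods_median_test(sample_a, sample_b):
    """Two-sample Mood's median test; returns (chi-square, p value)."""
    grand = median(list(sample_a) + list(sample_b))
    a = sum(x > grand for x in sample_a)  # site A, above grand median
    b = sum(x > grand for x in sample_b)  # site B, above grand median
    c = len(sample_a) - a                 # site A, at or below
    d = len(sample_b) - b                 # site B, at or below
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Applied to the values of one variable before and after a station move, a p value above the chosen level (0.1 in the text) would indicate no detectable shift in the median.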
Table 1 lists some of the additional predictors available for the model building process in the southeastern United States coastal region. The hurricane activity classification coefficient for southeastern United States landfalling storms is listed as well. The listing of the additional potential predictors in this table is incomplete and is provided so that our results may be reproduced. Many other candidate predictors were evaluated that are not provided in this table; these included vertical shears, height gradients, and other factors.
We present a climatology of hurricane activity within the Gulf of Mexico, the Caribbean Sea, and the two landfalling regions. Our focus is to lay the foundational understanding necessary to evaluate the utility of our forecast models developed within each prediction region. In consideration of this goal, we describe the expected incidence of (intense) hurricane activity on a regional basis, note differences in this from location to location, and discuss some of the known climatological factors involved.
We begin by noting that climatic conditions that govern the favorability of hurricane formation (e.g., pressure anomalies) and development within specific geographic regions of the North Atlantic basin are not very well understood. Large-scale circulation patterns that exert a control on the eventual paths of tropical cyclones have also received little attention in the literature. Such steering winds are a function of the major (dominant) pressure systems that may often have some persistence throughout a season (Ballenzweig 1959).
Ballenzweig (1959) observed that atmospheric conditions that dictate the favorability of hurricane formation and growth are unique to individual basins. For instance, anomalous easterly flow in the middle and upper troposphere supports increased tropical cyclogenesis and conditions capable of spawning hurricane activity in the eastern North Atlantic and the Gulf of Mexico. On the other hand, an extension of the polar trough in the western Caribbean Sea during the early and late season has a substantial impact on the frequency of tropical cyclogenesis in the Caribbean Sea (Ballenzweig 1959).
Recently, factors related to hurricane development for the entire North Atlantic basin have been discovered (Gray 1984a; Shapiro 1989), but individual regions still lack explanation. Landsea et al. (1992) were able to link western Sahelian monsoon rainfall to the number of intense landfalling hurricanes for the United States. They also noted that the numbers of intense hurricanes have declined substantially since the late 1960s. Landsea and Gray (1992) also showed a substantial link between Caribbean hurricane and intense hurricane activity and African rainfall; increased western African rainfall was associated with far more Caribbean hurricanes.
For the Gulf of Mexico, hurricane activity is largely confined to the period from mid-August to mid-October and is coincident with the peak for the entire Atlantic hurricane season. Hurricane incidence outside of this window is limited and episodic. In fact, it is rather rare for a hurricane to be present in the Gulf of Mexico during June or July or after the middle part of October, but it is not without precedent. Hurricane Audrey (1957) is the only recorded intense hurricane during the early season (before 1 August) in the Gulf of Mexico during the 1950–95 period; it was responsible for a tremendous disaster along the Louisiana coast.
For the period of 1950–95, 68 hurricanes were noted in the Gulf of Mexico for an average of 1.48 per season. Of the 68, 40 made landfall along the United States coast. The Gulf Coast states averaged nearly one hurricane landfall per year during this time. Nevertheless, salient absences of hurricane activity have occurred in this basin. No hurricanes were observed during the following seasons: 1952, 1958, 1962, 1976, 1978, 1981, 1984, 1991, and 1994. Furthermore, a substantial decline in intense hurricane numbers is noted since the early 1970s. Despite this, overall hurricane activity has remained relatively constant. Alternative sources of hurricane formation (e.g., baroclinic developments) may be offsetting the decline (Elsner et al. 1996; Kimberlain 1996).
Figure 2 shows the intraseasonal variability of Gulf of Mexico hurricanes; the depiction includes the intraseasonal frequency distribution for both intense hurricane activity and all hurricane activity. As the empirical distribution indicates, the use of input variables to initialize a model on 1 August is reasonable as the majority of all intense hurricane activity in this basin comes after this date.
A total of 50 hurricanes occurred within our basin boundaries of the Caribbean Sea between 1950 and 1995 for an average of approximately one hurricane per season. The 1950 season had four hurricanes occur in the Caribbean basin, and a number of other years have had three in a single season (1951, 1955, 1961, 1966, 1995). Hurricanes in the Caribbean basin were much more prevalent during the 1950s and 1960s, with only a few years experiencing no hurricane activity. Since the mid-1970s, there has been a substantial reduction in the hurricane incidence in this basin. Given that moderate to strong El Niño events may inhibit hurricane activity within this region (Gray 1984a), the recent extended episodes of El Niño activity may have caused some of this observed reduction in hurricane incidence. Recent African rainfall deficits may also have played a role in this reduction. In fact, no hurricane activity was observed whatsoever during the recent periods of 1982–86 and 1990–94. Additional years devoid of hurricane activity include 1957, 1959, 1962, 1965, 1972, 1973, 1976, and 1977. It is noted that many of these years are coincident with moderate to strong El Niño events. Landsea et al. (1994) provide a good reference for Caribbean hurricane variability on a multiyear timescale. It is possible that the 1995 and 1996 hurricane seasons may represent a sharp return of Caribbean hurricane activity.
Unlike in the Gulf of Mexico, hurricane numbers in the Caribbean Sea have declined markedly since 1970. Likewise, a decrease in intense hurricane activity is evident after 1970. These events are nearly coincident with the reduction in tropical-only hurricane formations noted by Elsner et al. (1996) and the associated African rainfall deficits. Reading (1990) suggested that the 1930s and 1950s were characterized by high levels of tropical cyclone activity within this basin. Furthermore, Reading (1990) revealed that even though frequencies of Caribbean hurricanes had declined during the 1960s through the 1980s, they remained much higher than the low frequencies experienced during the 1870s, 1910s, and 1920s.
Figure 3 shows the intraseasonal variability of hurricane activity in the Caribbean Sea. For the most part, activity is confined to the August–October period; intense hurricane activity is also limited to the same time period with the centroid near the absolute peak of the hurricane season. Since most of the intense hurricane activity occurs after 1 August, a predictive model based on data available at that time retains its utility.
The zone from the east coast of Florida to the Carolinas is particularly vulnerable to hurricane strikes. However, most hurricane landfalls and all intense hurricane landfalls during the 1950–95 period occurred after 1 August but before 15 October, thus validating the use of data to make a 1 August forecast. In fact, for the period of 1950–92 (the period for which we have all available predictor data), 21 of the 23 storms that made landfall did so after 1 August.
For the northeast coast of the United States (the area north of 35°N), a long-term analysis of hurricane threats and landfalls reveals alternating periods of activity (Kocin and Keller 1991). The most recent periods of extensive hurricane activity were the 1890s and then the 1950s and 1960s. Also of note is the relative inactivity between 1900 and 1930 and the recent lull in activity since the mid-1960s.
Between 1950 and 1995, a total of 11 hurricanes made direct landfall on the northeastern landfall region. Of the 11, only one was intense. The origins of the vast majority of the hurricane landfalls in this region were either over the tropical Atlantic (south of 20°N and east of 60°W) or in the western Atlantic (Kocin and Keller 1991). For the purposes intended here, an insufficient number of storms occurred during the period of usable upper-air data (1950–95); consequently, we did not attempt to model landfalling storms for the northeastern United States since climatological accuracy was already too high. We should caution that the relative inactivity (compared to other regions) during our data period and subsequent lack of ability to develop a prediction model does not imply that the northeast coast is immune to the threat of a hurricane nor will the activity level always remain this low. Rather, we were unable to develop a model only because an insufficient number of storms occurred during the reliable upper-air data recording period.
5. Multivariate discriminant analysis
Here we describe the methodology of multivariate discriminant analysis as it pertains to the problem of predicting hurricane activity in subbasins of the North Atlantic. We also describe the predictor selection algorithm and the statistical tests used to ascertain model significance.
For our situation, we have a categorical response variable consisting of two distinct groups (i.e., 1 for a nonhurricane year and 2 for a hurricane year). For each year, we also have the values of several continuously valued candidate predictor variables (covariates). Linear discriminant analysis is a statistical method that seeks to classify categorical data as a linear function of its covariates (Mardia et al. 1979). It is the precise analog of linear regression analysis, except that the dependent (response) variable is now categorical instead of being a continuously valued random variable.
Linear discriminant analysis works by creating a linear function of the covariates for each group. Consider the case where we have two groups (1 and 2) and four covariates (X1, X2, X3, and X4). The methodology works by using the data in a sample to estimate linear functions for each group. Using the notation aij to denote the estimated linear coefficient for the ith group and jth covariate (with ai0 denoting the constant term for group i), the method would yield

$$\mathrm{Score}_i = a_{i0} + a_{i1}X_1 + a_{i2}X_2 + a_{i3}X_3 + a_{i4}X_4, \qquad i = 1, 2. \tag{2}$$

An observation is then classified into group i (either 1 or 2) if the corresponding value of Scorei is the larger of the two values. Given a new observation for which we have values of the covariates but do not know the proper classification, we can use the linear discriminant functions to predict its classification.
A major issue for the procedure described above is the choice of the optimal method to estimate the linear coefficients since it may depend upon the distribution of the variables (Mardia et al. 1979). Furthermore, discriminant methodology is technically a Bayesian classifier, so that the choice of the optimal method should seek to maximize the associated Bayesian classification rule. For the case of only two categories, the classification method developed by Fisher asymptotically maximizes the Bayes classification efficiency regardless of covariate distributions, and the method in itself represents an effective classification method regardless of statistical considerations provided that the two groups have the same population covariance matrices (Hand 1981).
Some mathematical notation is helpful in explaining how the score functions are obtained. Let X be a p × 1 column vector (p predictor variables) that denotes the vector of covariate values for an observation in the sample. Let M1 and M2 be the p × 1 vectors of the means of the sample covariates (the centroids) for groups 1 and 2, respectively. Furthermore, let S1 and S2 be the p × p sample covariance matrices for the two groups. Assuming that each group has the same covariance structure, we also define S as the p × p pooled sample covariance matrix (S is a linear combination of S1 and S2).
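For concreteness, the customary form of the pooled sample covariance matrix is shown below. The text states only that S is a linear combination of S1 and S2, so the specific weights are an assumption based on the standard definition, with n1 and n2 (symbols introduced here) denoting the number of years in groups 1 and 2.

```latex
S = \frac{(n_1 - 1)\, S_1 + (n_2 - 1)\, S_2}{n_1 + n_2 - 2}
```

With these weights, each group contributes in proportion to its degrees of freedom, so the larger group has more influence on the estimate.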
The linear coefficients are hence developed by using Fisher’s method; as used here, this is done by considering the distance between a sample observation and the centroid of all sample observations for that group. The distance metric used in this case is the so-called Mahalanobis distance function, which adjusts the distance in each predictor dimension according to the variance of that predictor so that the measure is scale invariant (Mardia et al. 1979). For group i, this distance measure, in squared form, is

$$D_i^{2} = (X - M_i)^{\mathsf{T}} S^{-1} (X - M_i). \tag{3}$$

An observation would be classified to group i if this function were minimized for group i. Now the function above is not linear in X; however, we can make the expansion

$$(X - M_i)^{\mathsf{T}} S^{-1} (X - M_i) = M_i^{\mathsf{T}} S^{-1} M_i - 2\, M_i^{\mathsf{T}} S^{-1} X + X^{\mathsf{T}} S^{-1} X. \tag{4}$$

In the above expansion, the first term is constant for group i, the second term is linear in X for group i, and the last term is constant across all groups and may thus be discarded for classification purposes. Because minimizing the distance is equivalent to maximizing its negative, the two retained terms, with their signs reversed, supply the constant and the linear coefficients of the score functions. We have thus obtained the estimates for (2) and can use these for predicting new observations.
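The derivation above can be sketched in pure Python for two groups. This is an illustrative sketch, not the code used in this study: the function names are hypothetical, equal prior probabilities are assumed, the pooled covariance uses the customary degrees-of-freedom weighting, and the score for group i is taken as the retained terms of the expansion multiplied by −1/2, so that the largest score corresponds to the nearest centroid.

```python
def mean_vec(rows):
    """Centroid (column means) of a list of observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_cov(g1, g2):
    """Pooled sample covariance matrix of two groups."""
    p = len(g1[0])
    S = [[0.0] * p for _ in range(p)]
    for rows in (g1, g2):
        m = mean_vec(rows)
        for r in rows:
            d = [x - mu for x, mu in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    dof = len(g1) + len(g2) - 2
    return [[s / dof for s in row] for row in S]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - tail) / M[k][k]
    return x


def fit_lda(g1, g2):
    """Return [(constant, coefficients)] for groups 1 and 2."""
    S = pooled_cov(g1, g2)
    model = []
    for g in (g1, g2):
        m = mean_vec(g)
        w = solve(S, m)  # S^{-1} M_i, the linear coefficients
        const = -0.5 * sum(wi * mi for wi, mi in zip(w, m))
        model.append((const, w))
    return model


def classify(model, x):
    """Assign x to the group (1 or 2) with the larger score."""
    scores = [c + sum(wi * xi for wi, xi in zip(w, x)) for c, w in model]
    return 1 if scores[0] >= scores[1] else 2
```

Solving the linear system S w = M_i avoids forming the inverse of S explicitly; the sketch will fail with a division by zero if S is singular, which can happen when predictors are collinear.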
As an example of discriminant analysis, refer to Fig. 4. For simplicity, the graph shows only a two-predictor case; these are the two predictors used in the model for predicting the occurrence of a Caribbean hurricane (of any intensity) initialized from the preceding December. Since we have only two predictor variables, the linear discriminant methodology partitions the plane by a line and assigns observations to the two groups according to which side of the line the observation lies. Here observations above and to the right of the line are allocated as years for which a Caribbean hurricane will occur, while observations below the line are allocated as years for which a Caribbean hurricane will not occur. For this classification, the in-sample error rate, as can be determined by the graph, is 0.196 (9 out of 46 are wrong, giving an accuracy of 80.4%). In general, results are somewhat improved by incorporating additional predictors, but these results are difficult to depict graphically.
For this method to be valid, the groups must share the same true covariance matrix. In the case of multivariate normal data, this method is also optimal (Mardia et al. 1979). Since multivariate normal data rarely if ever occur in practice, an alternative distance function may prove more accurate in certain cases; however, this method works quite well even when the data are far from normal.
Of far more importance to the work here is evaluating how well the discriminant models classify the hurricane activity years and hence determining their predictive abilities. The naïve approach is merely to evaluate the in-sample classification accuracy; however, as is well documented, such accuracy estimates are optimistically biased (equivalently, the in-sample error rate is biased substantially low). Furthermore, the in-sample classification accuracy generally increases as additional predictors are added. Consequently, cross-validation techniques are required to obtain nearly unbiased error rate estimates (Hand 1981).
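The contrast between in-sample and cross-validated accuracy can be sketched as follows. The fragment assumes leave-one-out cross-validation (each year is withheld in turn and hindcast from the remaining years) and, to stay short, substitutes a simple nearest-centroid rule for the discriminant model; both choices, the function names, and the data are illustrative assumptions rather than the procedure used in this study.

```python
def fit_centroids(X, y):
    """Centroid of each class; a stand-in for the fitted discriminant model."""
    model = {}
    for g in set(y):
        rows = [x for x, lab in zip(X, y) if lab == g]
        model[g] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    """Class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda g: sum((a - b) ** 2 for a, b in zip(x, model[g])))

def in_sample_accuracy(X, y):
    """Fit on all observations and score the same observations."""
    model = fit_centroids(X, y)
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(X)

def loo_accuracy(X, y):
    """Withhold each observation in turn, refit, and score the withheld case."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        hits += predict(fit_centroids(X_train, y_train), X[i]) == y[i]
    return hits / len(X)

# Hypothetical one-predictor data with two borderline cases (2.9 and 3.1).
X = [[0.0], [1.0], [2.0], [2.9], [3.1], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(in_sample_accuracy(X, y), loo_accuracy(X, y))  # prints "1.0 0.75"
```

In this toy case the two borderline observations are classified correctly only when they help to fit the rule, which is the optimism that cross-validation removes.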
Since we are interested in maximizing the predictive ability of the models, the model-building algorithm includes a search procedure that considers all possible variable subsets. The algorithm selects the smallest variable subset (in terms of number of predictors) that maximizes the cross-validated classification accuracy. To assess the statistical significance of the selected model, bootstrap techniques are used to find an approximate p value (the significance level) using the cross-validated classification accuracies as the simulated output variable (Efron and Tibshirani 1993). This approach was mandated to avoid selection biases that might occur using other significance tests and to test specifically in accordance with our prediction objectives. The bootstrap distribution is simulated here by randomly sampling (with replacement) the predictor variable vectors for each year while holding the observed classification variable fixed.
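The search and the significance test can be sketched in the same spirit. The fragment below enumerates all predictor subsets, keeps the smallest subset with the highest leave-one-out accuracy, and then estimates a p value by resampling the predictor vectors with replacement while the class labels stay fixed. Several details are assumptions made only for illustration: a nearest-centroid rule stands in for the discriminant model, the subset search is repeated inside every bootstrap replicate, and the p value is taken as the fraction of replicates that match or exceed the observed accuracy. None of the names below come from the original software.

```python
import random
from itertools import combinations

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid rule on the chosen columns."""
    hits = 0
    for i in range(len(X)):
        cents = {}
        for g in set(y):
            rows = [[X[j][c] for c in cols]
                    for j in range(len(X)) if j != i and y[j] == g]
            if rows:  # a class may be empty once case i is withheld
                cents[g] = [sum(col) / len(rows) for col in zip(*rows)]
        xi = [X[i][c] for c in cols]
        pred = min(cents, key=lambda g: sum((a - b) ** 2
                                            for a, b in zip(xi, cents[g])))
        hits += pred == y[i]
    return hits / len(X)

def best_subset(X, y):
    """Smallest predictor subset that maximizes leave-one-out accuracy."""
    p = len(X[0])
    best_acc, best_cols = -1.0, ()
    for k in range(1, p + 1):              # smaller subsets are tried first
        for cols in combinations(range(p), k):
            acc = loo_accuracy(X, y, cols)
            if acc > best_acc:             # strict '>' keeps the smaller subset on ties
                best_acc, best_cols = acc, cols
    return best_cols, best_acc

def bootstrap_p_value(X, y, n_sim=200, seed=0):
    """Fraction of label-fixed resamples whose best accuracy reaches the observed one."""
    rng = random.Random(seed)
    _, observed = best_subset(X, y)
    n = len(X)
    exceed = 0
    for _ in range(n_sim):
        X_star = [X[rng.randrange(n)] for _ in range(n)]  # resample predictors only
        _, acc = best_subset(X_star, y)
        exceed += acc >= observed
    return exceed / n_sim

# Hypothetical data: column 0 separates the classes, columns 1 and 2 are noise.
X = [[0, 5, 1], [1, 2, 7], [2, 8, 3], [3, 1, 6],
     [10, 4, 2], [11, 7, 8], [12, 3, 5], [13, 6, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_subset(X, y))        # prints "((0,), 1.0)"
print(bootstrap_p_value(X, y))
```

Resampling the predictors while holding the labels fixed breaks any real association between them, so the simulated accuracies approximate what the whole selection procedure achieves by chance.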
We also evaluated the skill of the models versus naïve climatology by employing a normal approximation to test the statistical significance of the model classification accuracy versus the best that could be obtained by climatology (Devore 1991). The test statistic z is given by

$$z = \frac{p_m - p_c}{\sqrt{p_c (1 - p_c)/n}},$$
where n is the number of years of data available, pm is the cross-validated model classification accuracy, and pc is the best accuracy that could be obtained from climatology.
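As a numerical illustration, the short Python sketch below evaluates this test under the assumption that it is the standard one-sample test for a proportion, z = (pm − pc) / sqrt[pc(1 − pc)/n], with a one-sided p value. The function name is hypothetical. The counts in the example are those reported later in this section for the Gulf of Mexico intense hurricane model (36 of 46 years correct for the model, 24 of 46 for climatology).

```python
from math import sqrt
from statistics import NormalDist

def z_vs_climatology(model_hits, clim_hits, n):
    """One-sample normal-approximation test of model accuracy against climatology."""
    pm = model_hits / n                       # cross-validated model accuracy
    pc = clim_hits / n                        # best climatological accuracy
    z = (pm - pc) / sqrt(pc * (1 - pc) / n)   # standard error uses pc (null value)
    p_one_sided = 1 - NormalDist().cdf(z)
    return z, p_one_sided

z, p = z_vs_climatology(36, 24, 46)
print(round(z, 2), p < 0.001)  # prints "3.54 True"
```

The same function applied to the other reported counts (37 versus 25 of 46, and 35 versus 25 of 43) gives z values of about 3.55 and 3.09.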
We implemented the model selection algorithm for all forecast locations for both hurricane activity and intense hurricane activity. The algorithm was also implemented for both available forecast lead times: December of the preceding year and August of the current year. The available predictors for selection in December consisted of the usual Gray et al. (1992) forecast variables, while the August available predictors included the usual Gray et al. (1993) variables as well as our new predictors.
As implied by previous discussion, models that improve upon climatology could not be developed for hurricane activity in the Gulf of Mexico or for landfalling hurricanes along the northeastern coast of the United States. Hurricane activity occurs in the Gulf of Mexico nearly every year, while landfalling storms in the northeastern United States are relatively rare events.
For the 1 December initialization, the algorithm identified a successful set of only two predictors for predicting hurricane activity in the Caribbean: the two African rainfall estimates (RG and RS). The cross-validated model classification accuracy was 37/46 or 80.4% correct. Statistical significance tests yielded a bootstrap p value of less than or equal to 0.001 for the observed cross-validation classification accuracy and a normal approximation p value of 0.002 when compared to the climatological accuracy of 27/46 or 58.7%. Table 2 shows the cross-validated (hindcast) summary for each year from 1950 to 1995. Note that no error bias exists; that is, the model forecast errors do not show any consistent pattern. For this model, the cross-validated hindcast error was identical to the in-sample error estimate; refer to Fig. 4 for a graphical depiction of the classification regions.
No other successful forecast models could be initialized by 1 December. However, we were able to identify three useful models using data up to 1 August. In particular, skillful models were possible for the Caribbean Sea and the Gulf of Mexico for predicting the occurrence of intense hurricanes. This still provides good information for the remainder of the season given that, since 1944, only four intense hurricanes have occurred in these basins before 1 August (Landsea 1993). The climatology of intense hurricanes for the entire available record also shows that 98% of the activity takes place after 1 August (Landsea 1993). Furthermore, as discussed earlier, nearly all of the hurricanes striking the southeastern United States coastline have done so after 1 August. Consequently, model predictions should retain the majority of their utility, although earlier prediction dates would be desirable.
For predicting intense hurricanes in the Gulf of Mexico, the algorithm identified a model that obtained a cross-validated hindcast accuracy of 36/46 or 78.3%. In achieving this accuracy, the algorithm selected three predictors for the discriminant model: Q50, Q30, and SOI. The best strategy using climatology would be to predict no intense hurricane for each year, which would give an estimated accuracy of 24/46 or 52.2%. We again obtained bootstrap p values of less than or equal to 0.001, while the approximate z value versus climatology was 3.54, also yielding a p value of less than or equal to 0.001. Table 2 shows the hindcasts by year; again, no notable prediction bias exists.
Slightly more significant results were obtained for predicting intense hurricanes within the Caribbean basin. Our algorithm selected a three-predictor model achieving a cross-validated hindcast accuracy of 37/46 or 80.4% using the following predictors: Q30, RS, and ZWA. The best climatological strategy was the same as for the Gulf of Mexico, namely predicting no intense hurricane activity in any year, which gives an accuracy of 25/46 or 54.3%. The bootstrap p value obtained was less than or equal to 0.001, while the normal approximation z value was 3.55, yielding a p value also less than or equal to 0.001. Again, as indicated in Table 2, no prediction bias was noted in the cross-validated hindcasts.
We were unable to obtain a skillful model for predicting intense landfalling hurricanes in the southeastern United States; however, we did obtain a significant model for landfalling hurricanes of any intensity. The model selected was a five-predictor model using RS, QDIFF, the 700–200-mb vertical shear in the Miami–West Palm Beach area (VS-MIA/PBI), the July monthly sea level pressure in Cape Hatteras (SLP-HAT), and the July monthly East Coast sea level pressure average (JCSLP). Cross-validated hindcast model accuracies stood at 35/43 or 81.4%, which clearly exceeded climatological prediction accuracies of 25/43 or 58.1%. Statistical significance tests of the model yielded a bootstrap p value of 0.0005 and a normal approximation z score versus climatology of 3.09, which has a corresponding p value of 0.001. To illustrate the significance of this model, Fig. 5 shows the simulated bootstrap distribution (based on 4000 simulations) of the hindcast accuracy; as can be seen, the observed value is in the extreme tail of the simulated distribution. As was the case for the other three models, Table 2 shows that no prediction bias exists with this model. To help summarize our results for this section, we present Table 3 for easy reference.
7. Discussion and conclusions
As with any study or result using purely statistical methodology, one must maintain awareness that statistical association does not necessarily imply causality. Nevertheless, we may cautiously interpret the results, particularly where past studies have indicated or implied a physical linking mechanism. Of particular interest with the current results are the different subsets of predictors identified with the different prediction locations. We note that good historical precedent exists to indicate that the results obtained are not at all surprising, especially where the familiar Gray et al. (1992, 1993) predictors are utilized. A more tentative interpretation is required for our new predictors.
Before interpretation of results can commence, a brief discussion of tropical-only (TO) versus baroclinically influenced (BI) hurricanes is required. A TO hurricane is defined as a tropical cyclone that first achieves hurricane intensity (≥33 m s−1) devoid of any enhancing midlatitude baroclinic influences; otherwise, it is a BI hurricane (Hess et al. 1995; Kimberlain 1996). Baroclinic influences can include interactions with midlatitude systems, synoptic-scale forcing caused by upper-level troughs, and initial tropical cyclone genesis caused by baroclinic zone vorticity generation. Hess et al. (1995) were able to show that the seasonal prediction models developed by Gray et al. (1992, 1993, 1994) were actually forecasting only the TO component of the hurricane season. Furthermore, Elsner et al. (1996) noted that 78% of all intense hurricanes were of the TO classification, though later baroclinic influences may have contributed to some of the TO hurricanes obtaining intense hurricane status.
We note first that the 1 December prediction model for Caribbean Sea hurricanes of any intensity was linked entirely to the African rainfall parameters. As the African rainfall values increase relative to the long-term average, the likelihood of a Caribbean hurricane increases. The result is entirely intuitive since the great majority of hurricanes that form or track in the Caribbean Sea are of the TO type. Hence, the increase in African rainfall drives an increase in TO hurricanes that form from easterly waves; these in turn may eventually track over portions of the Caribbean Sea.
In contrast to the prediction of regular Caribbean hurricane activity, we find that the models predicting intense hurricane activity depend much more strongly upon variables measuring the vertical structure of the wind. While the Caribbean intense hurricane activity model still uses an African rainfall parameter, due most likely to the TO nature of Caribbean hurricanes, the other two predictors generally measure the wind environment that would be available to any tropical cyclone present within the Caribbean Sea. We hypothesize that the predictors here consist of two parts: a part to indicate whether or not a hurricane will be present (RS) and a part to indicate the favorability of the storm relative environment (Q30 and ZWA). The possible physical mechanism relating the QBO phase to hurricane development and maintenance has been discussed by Shapiro (1989) and Gray et al. (1992). The most likely explanation involves how well the troposphere physically couples with the lower stratosphere, though debate remains as to the exact nature of the coupling. Nevertheless, the effect upon intense hurricanes is clear. We note again that the QBO does relate to the formation of TO hurricanes that generate intense hurricanes. The ZWA parameter very clearly relates to the vertical wind shear environment that would be felt by a storm; clearly, the better the environment, the more likely a storm could obtain intense hurricane status.
We note an even larger dependence of Gulf of Mexico intense hurricanes upon wind parameters. Here the entire prediction set consists of measurements of the vertical wind structure, though the relationship of the SOI to this structure deserves some comment. As is well known, warm El Niño conditions in the equatorial Pacific enhance convection and result in strong upper-tropospheric jets that traverse the Atlantic. Gray (1984a) documented the effect of warm events upon upper-tropospheric winds: vertical wind shear is greatly enhanced during warm El Niño conditions.
The other two variables in the Gulf of Mexico intense hurricane model, Q50 and Q30, also pertain to vertical wind shear and structure. Noting that hurricanes occur nearly every year in the Gulf of Mexico and that the Gulf of Mexico experiences extensive BI hurricanes in addition to TO hurricanes (Elsner et al. 1996), we hypothesize that the only predictable criterion for developing an intense hurricane within the Gulf of Mexico is the storm-relative environment, since tropical cyclone activity is otherwise so frequent. Since the lack of significant vertical wind shear is an integral part of a favorable hurricane environment, the algorithm’s identification of these variables for predictive purposes has legitimate physical justification.
Interpretation of model results is somewhat more difficult for the southeastern United States coastline. The presence of RS and the QBO shear suggest that the usual tropical-only element in the total number of Atlantic storms plays a role. However, the role of the East Coast sea level pressures and the 700–200-mb vertical shear over south Florida is somewhat more difficult to ascertain. It is possible that the sea level pressures could be a measurement of tropical cyclone steering mechanisms, provided that climatological persistence occurs. We speculate also that the July monthly vertical shear measured over south Florida could indicate the extent of the intrusion of midlatitude synoptic-scale features into the subtropics of the southeastern United States coast. Another possibility is that this shear is measuring the presence or absence of subtropical (or polar) jets within the region. Nevertheless, these predictors do not lend themselves to easy interpretation. For example, Fig. 6 displays the probability contours of a southeast coast landfall using only the VS-MIA/PBI and SLP-HAT variables (note that this will not have the accuracy of a full-variable model). As is evident, the highest likelihood of a landfall is associated with relatively high July monthly Hatteras sea level pressures and high vertical shears over south Florida. The associated mechanisms, if any, pose a difficult research problem.
We summarize this section by noting that our results accord with those of other researchers but apply to more specific problems. As discussed in the climatology section, Ballenzweig (1959) noted that different features accounted for tropical cyclone genesis and motion within different regions. Our models clearly utilize different predictors within each region. Landsea et al. (1992) showed that intense hurricanes making landfall on the United States east coast are related to African rainfall, while no such association holds for the Gulf of Mexico. In accord with these results, our models did not select rainfall as a predictor for intense hurricanes in the Gulf of Mexico but did select it for every other region. The division of hurricanes into TO and BI storms and their relative predictability using Gray et al.’s (1992, 1993) seasonal predictors is also affirmed here. Hess et al. (1995) showed that only the numbers of TO storms are related to these predictors, and we find that our models rely more heavily on these predictors where TO storms dominate.
Finally, we must stress that the prediction models developed here use data that were not idealized for our purposes. We believe it likely that better predictors of hurricane activity within certain regions exist; research is currently underway to modify existing predictors and search for additional predictors. Furthermore, we did not investigate the utility of predicting from other dates. For example, we did not determine whether or not successful predictions are possible using 1 June data (i.e., the data from Gray et al. 1994). Also, other statistical techniques may be better suited to certain regions, such as logistic models [Neter et al. (1989); note that this approach does not yield better results for the prediction locations used here].
What we have shown is that specific and relatively accurate prediction by region is possible using currently available data. Information from these models could be used to heighten the alert status of local authorities within certain regions and allow meteorologists and insurance companies to prepare for (intense) hurricane activity within responsibility areas. We have also shown an entirely new method of making seasonal predictions using existing statistical techniques. It is likely that this is only a start to predictive technology of this form since much work is yet required in this area.
Some funding for this research came from the Risk Prediction Initiative (RPI) and the Florida State University. The sea level pressure and upper-air data for the United States east coast were made available from the National Climatic Data Center via the National Center for Atmospheric Research. Dr. Chris Landsea also provided assistance in identifying properly normalized data.
Corresponding author address: Gregor S. Lehmiller, Statsoft Inc., 2300 East 14th Street, Tulsa, OK 74105.