## 1. Introduction

For all operationally designated tropical and subtropical cyclones, the National Oceanic and Atmospheric Administration’s National Hurricane Center (NHC) issues an official (OFCL) forecast four times daily, at 0000, 0600, 1200, and 1800 UTC. These products include intensity (maximum sustained 1-min surface wind speed) forecasts valid 12, 24, 36, 48, 72, 96, and 120 h after the initial synoptic time. These forecasts, and those from individual models, are compared to the corresponding poststorm best-track intensities only if the system is classified in the best track as a tropical or subtropical cyclone at both the initial and verifying times. Intensity forecasts in which either the actual or the forecast cyclone dissipated, but the other did not, are therefore not verified.^{1}

Figure 1 shows intensity forecasts from two such cases. At 0000 UTC 4 October 2005, Tropical Storm Stan was forecast by the various intensity models and OFCL to reach an intensity of between 80 and 90 kt by 48 h (Fig. 1a). Stan made landfall earlier than forecast and dissipated over the mountainous terrain of Mexico 24 h after the initial time. No forecasts from 36 h onward, including the 36-h OFCL forecast of 90 kt, are verified with the currently used technique. In the second case, at 0600 UTC 11 November 2005, the tropical depression that became Tropical Storm Gamma was forecast to remain at 25-kt intensity for 24 h before dissipating (Fig. 1b). The best track shows that the system remained a tropical cyclone through 5 days, reaching a maximum intensity of 40 kt. Because OFCL called for dissipation after 24 h, the forecasts for later lead times are not verified with the currently used technique.

An alternative method to verify all tropical cyclone intensity forecasts, including those omitted by the currently used method, is presented in the next section. In section 3, OFCL and some model forecast verifications are presented, and conclusions are provided in section 4.

## 2. Methods

Tropical cyclone intensity forecast error is defined as the difference between the forecast and the postseason best-track intensities verifying at the same time. This definition omits cases in which either the tropical cyclone dissipates or the forecast calls for dissipation. To include these cases, an alternative verification method using contingency tables is proposed. Because the best-track intensity is reported in 5-kt intervals, all model forecasts are rounded to the nearest 5 kt. At each forecast time, a matrix (or contingency table) is filled with the count of each forecast–verification pair in the sample (Table 1). The first row and column represent the number of times the tropical cyclone does, or is forecast to, dissipate, respectively. Each subsequent row and column represents increases in intensity in 5-kt increments. Perfect forecasts are along the contingency table’s diagonal. The farther each forecast is from the diagonal, the larger the forecast error is. Thus, a 25-kt forecast is closer to the diagonal than a 65-kt forecast if the tropical cyclone has dissipated. Similarly, a dissipation forecast is closer to the diagonal if the best track showed the intensity to be 25 kt than if the intensity were 65 kt.
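The table-filling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the `"D"` marker for dissipation, the function names, the mapping of 5 kt to index 1, and the sample pairs are all hypothetical. Rows are indexed by best-track category and columns by forecast category.

```python
import math
from collections import Counter

DISSIPATED = "D"  # hypothetical marker for a dissipated (or forecast-dissipated) cyclone


def round_to_5kt(intensity_kt):
    """Round an intensity in kt to the nearest 5 kt (halves round up)."""
    return 5 * math.floor(intensity_kt / 5 + 0.5)


def category(intensity):
    """Map an intensity to a table index: 0 for dissipation, then one index per 5 kt.

    Assumption: 5 kt maps to index 1, 10 kt to index 2, and so on.
    """
    if intensity == DISSIPATED:
        return 0
    return round_to_5kt(intensity) // 5


def build_table(pairs):
    """Count (forecast, best_track) pairs; keys are (best-track row, forecast column)."""
    table = Counter()
    for forecast, best_track in pairs:
        table[(category(best_track), category(forecast))] += 1
    return table


# Illustrative pairs only (not data from the paper): (forecast, best track).
pairs = [(90, DISSIPATED), (DISSIPATED, 40), (45, 45), (47.4, 45), (30, DISSIPATED)]
table = build_table(pairs)

# Correct forecasts lie on the diagonal (row index equals column index).
correct = sum(n for (row, col), n in table.items() if row == col)
print(correct, "of", sum(table.values()), "forecasts fall on the diagonal")
```

Storing only the nonzero cells mirrors the blank (zero) entries of Table 1, and the distance `abs(row - col)` gives the error in 5-kt categories.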

A skill score,

$$S = \frac{C - E}{T - E},$$

where $C$ is the number of correct forecasts, $T$ is the total number of forecasts, and $E$ is the number of forecasts expected to be correct, can be calculated for each contingency table (Panofsky and Brier 1958). The skill score will be unity if all cases are correctly predicted ($T = C$), and less than or equal to zero for no skill ($C \approx 0$). For skill relative to chance,

$$E = \frac{1}{T} \sum_{i} R_i C_i,$$

where $R_i$ and $C_i$ are the total numbers of cases in the $i$th row and $i$th column, respectively. For climatology and persistence, the Statistical Hurricane Intensity Forecast (SHFR; Jarvinen and Neumann 1979) is used, and $E$ is the actual number of correct SHFR forecasts (those along the diagonal).
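As a worked illustration of the skill score, the following Python sketch assumes the standard Panofsky and Brier form $S = (C - E)/(T - E)$, with the chance expectation taken as the sum over categories of (row total × column total) divided by $T$. The function names, the dictionary layout, and the counts are hypothetical and are not taken from the paper.

```python
def skill_score(correct, total, expected):
    """Skill score S = (C - E) / (T - E)."""
    return (correct - expected) / (total - expected)


def chance_expected(table):
    """Number of forecasts expected to be correct by chance: sum_i R_i * C_i / T."""
    total = sum(table.values())
    row_totals, col_totals = {}, {}
    for (row, col), n in table.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    return sum(row_totals[i] * col_totals.get(i, 0) for i in row_totals) / total


# Illustrative 2 x 2 table keyed by (best-track row, forecast column).
table = {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 4}
total = sum(table.values())                                        # T = 10
correct = sum(n for (row, col), n in table.items() if row == col)  # C = 7

e_chance = chance_expected(table)        # (4*5 + 6*5) / 10 = 5.0
s_chance = skill_score(correct, total, e_chance)

# Versus a reference forecast such as SHFR, E is the reference's own number
# of correct forecasts; 6 is a made-up count for illustration.
s_reference = skill_score(correct, total, 6)
print(s_chance, s_reference)
```

Only the value passed as `expected` differs between the two baselines, so the same function serves for skill versus chance and skill versus a reference forecast.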

Because the skill score does not consider the distance from the diagonal in the contingency table, error probability distributions are examined below.

## 3. Results

### a. OFCL forecasts

Since 1988, NHC has archived OFCL and model 12-, 24-, 36-, 48-, and 72-h intensity forecasts; OFCL 96- and 120-h and model 60-, 84-, 96-, 108-, and 120-h forecasts have been available since 2003. OFCL skill scores and their linear trends are shown in Fig. 2; the trends are not shown for the 4- and 5-day forecasts due to the short time series. Skill scores versus chance decrease with forecast time through 48 h (Fig. 2a).

Since 1988, 12-h forecast skill has decreased, 24-h forecast skill has remained about constant, and forecast skill has increased from 36 to 72 h. The skill scores versus SHFR (Fig. 2b) decrease with forecast time through 72 h, and the trends are similar to those versus chance. In the remainder of this manuscript, only skill scores versus SHFR are shown.

The annual probabilities that the OFCL forecast intensity is within three error categories of the best-track intensity (<15 kt, or about one Saffir–Simpson intensity category) or is more than six error categories from it (>30 kt, or about two Saffir–Simpson intensity categories) are shown in Fig. 3. The majority of 12–48-h forecasts are currently within three error categories, and the percentage of these forecasts has been increasing with time at all forecast times except 12 h. Fewer than one-third of all forecasts are now more than six error categories from the best-track intensity, and the annual probabilities of these occurrences have been falling steadily.
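The two probabilities can be read directly from a contingency table by measuring each cell's distance from the diagonal. The Python sketch below is a hedged illustration: it assumes that each row or column step is one 5-kt error category, that "within three" means an error of fewer than three categories (<15 kt), and that "more than six" means an error of more than six categories (>30 kt). The function name and the counts are hypothetical.

```python
def error_category_probabilities(table, near=3, far=6):
    """Return (P[error < near categories], P[error > far categories]).

    `table` maps (best-track row, forecast column) to a count, and the
    error in categories is the distance from the diagonal, |row - col|.
    """
    total = sum(table.values())
    within = sum(n for (row, col), n in table.items() if abs(row - col) < near)
    beyond = sum(n for (row, col), n in table.items() if abs(row - col) > far)
    return within / total, beyond / total


# Illustrative counts only (not data from the paper).
table = {
    (9, 9): 12,  # exact hits
    (9, 11): 4,  # forecast two categories (10 kt) too high
    (0, 6): 2,   # dissipated cyclone, 30-kt forecast: six categories off
    (13, 5): 2,  # forecast eight categories (40 kt) too low
}
p_within, p_beyond = error_category_probabilities(table)
print(p_within, p_beyond)
```

Cells exactly three to six categories from the diagonal fall into neither group, which is why the two probabilities need not sum to one.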

### b. Model forecasts

Of the model intensity guidance, the Statistical Hurricane Intensity Prediction Scheme (SHIPS; DeMaria and Kaplan 1994; DeMaria et al. 2005), the adjusted Geophysical Fluid Dynamics Laboratory Hurricane Model (GFDI; Bender et al. 1993; Kurihara et al. 1998), and the Florida State University Superensemble (FSSE; Krishnamurti et al. 1999, 2000a, b) have skill, and these models are shown here. SHIPS was upgraded in 2000 to include land effects (DSHP); the time series here includes the DSHP forecasts from 2000 onward. In contrast to the OFCL skill scores, the SHIPS skill scores (Fig. 4a) have increased at all forecast times and are approaching the OFCL skill scores. Both GFDI and FSSE (Figs. 4b and 4c, respectively) have shorter time series than OFCL and SHIPS, so any trends shown may not be representative of a true long-term trend. The GFDI skill scores have increased at forecast times through 36 h and decreased at longer forecast times. The short FSSE time series has decreasing skill scores at all forecast times.

Figure 5 shows the probabilities that the forecasts from the three models are within three error categories of, or more than six error categories from, the best-track intensity. The probability of the forecast intensity being within three error categories of the best-track intensity has increased at all forecast times except for the 12-h SHIPS and FSSE forecasts; the probability of the forecast intensity being more than six error categories from the best-track intensity has decreased at all forecast times except for the 12-h SHIPS and FSSE forecasts.

## 4. Conclusions

An alternative method of verifying OFCL and model tropical cyclone intensity forecasts is presented. In the currently used verification technique, forecasts are not verified if either the forecast calls for the tropical cyclone to dissipate or the actual tropical cyclone dissipates. In the new technique, these forecasts are included in a contingency table with all other forecasts. Skill scores and probabilities can be calculated using these contingency tables.

In contrast to the findings of previous studies (DeMaria et al. 2007), OFCL forecasts have been steadily improving at all forecast times except 12 h. This may be because 12-h forecasts are already excellent, with nearly 90% of these forecasts being within three error categories (about one Saffir–Simpson category) of the best-track intensity, and only about 1% being more than six error categories (about two Saffir–Simpson categories) from the best-track intensity. The 12-h model intensity forecasts are not improving as fast as those at later forecast lead times. Currently, about one in twenty 24-h OFCL intensity forecasts is more than six error categories (35 kt or more) from the best-track intensity; Kaplan and DeMaria (2003) show that this is the approximate ratio of all overwater cases that undergo intensity changes of 35 kt or greater within 24 h. When the lead time increases to 2 days, nearly one-fifth of all intensity forecasts are more than six error categories from the best-track intensity, and by 5 days, nearly one-third are thus classified.

NHC OFCL intensity forecasts, as verified with the currently employed technique, have improved slowly since 1990. The new technique, which takes all forecasts into account, suggests that the improvement has been slow but faster than previously shown with the current verification technique.

## Acknowledgments

This study is the result of earlier discussions of forecast verification with James Franklin of NHC and Harold Brooks of the National Severe Storms Laboratory. Mike Jankulak, John Kaplan, Frank Marks, and two anonymous reviewers provided very helpful comments on various versions of this manuscript.

## REFERENCES

Bender, M. A., R. J. Ross, R. E. Tuleya, and Y. Kurihara, 1993: Improvements in tropical cyclone track and intensity forecasts using the GFDL initialization system. *Mon. Wea. Rev.*, **121**, 2046–2061.

DeMaria, M., and J. Kaplan, 1994: A Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic basin. *Wea. Forecasting*, **9**, 209–220.

DeMaria, M., M. Mainelli, L. K. Shay, J. A. Knaff, and J. Kaplan, 2005: Further improvements to the Statistical Hurricane Intensity Prediction Scheme (SHIPS). *Wea. Forecasting*, **20**, 531–543.

DeMaria, M., J. A. Knaff, and C. Sampson, 2007: Evaluation of long-term trends in tropical cyclone intensity forecasts. *Meteor. Atmos. Phys.*, **97**, 19–28.

Jarvinen, B. R., and C. J. Neumann, 1979: Statistical forecast of tropical cyclone intensity. NOAA Tech. Memo. NWS NHC-10, 22 pp.

Kaplan, J., and M. DeMaria, 2003: Large-scale characteristics of rapidly intensifying tropical cyclones in the North Atlantic basin. *Wea. Forecasting*, **18**, 1093–1108.

Krishnamurti, T. N., C. M. Kishtawal, T. LaRow, D. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. *Science*, **285**, 1548–1550.

Krishnamurti, T. N., C. M. Kishtawal, D. W. Shin, and C. E. Williford, 2000a: Improving tropical precipitation forecasts from a multianalysis superensemble. *J. Climate*, **13**, 4217–4227.

Krishnamurti, T. N., and Coauthors, 2000b: Multimodel ensemble forecasts for weather and seasonal climate. *J. Climate*, **13**, 4196–4216.

Kurihara, Y., R. E. Tuleya, and M. A. Bender, 1998: The GFDL hurricane prediction system and its performance during the 1995 hurricane season. *Mon. Wea. Rev.*, **126**, 1306–1322.

Panofsky, H. A., and G. W. Brier, 1958: *Some Applications of Statistics to Meteorology*. The Pennsylvania State University, 224 pp.

Table 1. For 2005, the SHIPS 36-h contingency table and row–column titles. The first row and first column are the forecast and best-track intensities (kt), respectively, with “D” representing dissipation. The remainder is the contingency table in which numbers are the counts of each forecast–best-track intensity pair, with blank spaces representing zeroes. In this example, the forecast intensity was 30 kt 22 times when the tropical cyclone dissipated, and there are 12 cases that are correctly forecast to be 45 kt. Numbers along the diagonal (bold) represent the number of correct forecasts, and the distance from the diagonal corresponds to the size of the error.

^{1} An internal NHC forecast verification that includes these cases is performed, though with a different technique from the one reported here.