An Alternative Tropical Cyclone Intensity Forecast Verification Technique

NOAA/AOML/Hurricane Research Division, Miami, Florida

Abstract

The National Hurricane Center (NHC) does not verify official or model forecasts if those forecasts call for a tropical cyclone to dissipate or if the real tropical cyclone dissipates. A new technique in which these forecasts are included in a contingency table with all other forecasts is presented. Skill scores and probabilities are calculated. Forecast verifications with the currently used technique have shown a slight improvement in intensity forecasts. The new technique, taking into account all forecasts, suggests that the probability of a forecast having a large (>30 kt) error is decreasing, and the likelihood of the error being less than about 10 kt is increasing in time, at all forecast lead times except 12 h when the forecasts are already quite good.

Corresponding author address: Sim D. Aberson, NOAA/AOML/Hurricane Research Division, Miami, FL 33149. Email: sim.aberson@noaa.gov

1. Introduction

For all operationally designated tropical and subtropical cyclones, the National Oceanic and Atmospheric Administration’s National Hurricane Center (NHC) issues an official (OFCL) forecast daily at 0000, 0600, 1200, and 1800 UTC. These products include intensity (maximum sustained 1-min surface wind speed) forecasts valid 12, 24, 36, 48, 72, 96, and 120 h after the initial synoptic time. These forecasts, and those from individual models, are compared to the corresponding poststorm best-track intensities if the system is classified in the best track as a tropical or subtropical cyclone at the initial and verifying times. Some intensity forecasts, those in which either the true or forecast cyclone dissipated but the other did not, are therefore not verified.¹

Figure 1 shows intensity forecasts from two such cases: Tropical Storm Stan at 0000 UTC 4 October 2005 was forecast by the various intensity models and OFCL to reach an intensity of between 80 and 90 kt by 48 h (Fig. 1a). Stan made landfall earlier than forecast and dissipated over the mountainous terrain of Mexico 24 h after the initial time. No forecasts from 36 h onward, including the 36-h OFCL forecast of 90 kt, are verified in the currently used technique. In the second case, at 0600 UTC 11 November 2006, the tropical depression that became Tropical Storm Gamma was forecast to remain at 25-kt intensity for 24 h before dissipation (Fig. 1b). The best track shows that the system remained a tropical cyclone through 5 days, reaching a maximum intensity of 40 kt. Because the OFCL called for dissipation after 24 h, it is not verified with the currently used technique.

An alternative method to verify all tropical cyclone intensity forecasts, including these forecasts missed in the currently used method, is presented in the next section. In section 3, OFCL and some model forecast verifications are presented, and conclusions are provided in section 4.

2. Methods

Tropical cyclone intensity forecast error is defined as the difference between the forecast and the postseason best-track intensities verifying at the same time. This definition omits cases in which either the tropical cyclone dissipates or the forecast calls for dissipation. To include these cases, an alternative verification method using contingency tables is proposed. Because the best-track intensity is reported in 5-kt intervals, all model forecasts are rounded to the nearest 5 kt. At each forecast time, a matrix (or contingency table) is filled with the count of each forecast–verification pair in the sample (Table 1). The first row and column represent the number of times the tropical cyclone does, or is forecast to, dissipate, respectively. Each subsequent row and column represents increases in intensity in 5-kt increments. Perfect forecasts are along the contingency table’s diagonal. The farther each forecast is from the diagonal, the larger the forecast error is. Thus, a 25-kt forecast is closer to the diagonal than a 65-kt forecast if the tropical cyclone has dissipated. Similarly, a dissipation forecast is closer to the diagonal if the best track showed the intensity to be 25 kt than if the intensity were 65 kt.
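
The table construction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code: the names `DISSIPATED`, `category`, and `contingency_table` are hypothetical, dissipation is assigned index 0 and placed directly next to the 5-kt category (the text does not state the lowest intensity category), and rows hold best-track categories while columns hold forecast categories.

```python
from collections import Counter

# Hypothetical sentinel: None marks a dissipated cyclone or a dissipation forecast.
DISSIPATED = None


def category(intensity_kt):
    """Map an intensity (kt) to a table index; index 0 is reserved for dissipation.

    Assumption: the dissipation category sits directly next to the 5-kt category.
    """
    if intensity_kt is DISSIPATED:
        return 0
    # Round to the nearest 5 kt (halves round up), matching the 5-kt best-track steps.
    return int(intensity_kt / 5 + 0.5)


def contingency_table(pairs):
    """Count (forecast_kt, best_track_kt) pairs.

    Keys are (row, column) = (best-track category, forecast category).
    """
    table = Counter()
    for forecast_kt, best_track_kt in pairs:
        table[(category(best_track_kt), category(forecast_kt))] += 1
    return table


def categories_off_diagonal(forecast_kt, best_track_kt):
    """Distance from the diagonal, in 5-kt error categories."""
    return abs(category(forecast_kt) - category(best_track_kt))
```

With this indexing, a 25-kt forecast for a cyclone that dissipated lies 5 categories from the diagonal, whereas a 65-kt forecast lies 13 categories away, consistent with the ordering described in the text.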

A skill score,
$$S = \frac{C - E}{T - E},$$
where C is the number of correct forecasts, T is the total number of forecasts, and E is the number of forecasts expected to be correct, can be calculated for each contingency table (Panofsky and Brier 1958). The skill score will be unity if all cases are correctly predicted (T = C), and less than or equal to zero for no skill (C ≤ E).
The expected number of correct forecasts can be calculated using any standard, such as chance or climatology and persistence. For chance, the expected number is given by
$$E = \frac{1}{T} \sum_{i} R_i C_i,$$
where $R_i$ and $C_i$ are the total numbers of cases in the ith row and ith column, respectively. For climatology and persistence, the Statistical Hurricane Intensity Forecast (SHFR; Jarvinen and Neumann 1979) is used, and E is the actual number of correct SHFR forecasts (those along the diagonal).
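
As an illustration, the skill score can be computed from a contingency table as follows. This sketch assumes the standard Panofsky and Brier (1958) form, S = (C − E)/(T − E), with the chance expectation taken as the sum over i of the product of the ith row and column totals divided by T. The table is assumed to be a dictionary keyed by (row, column), and all function names are hypothetical.

```python
def correct_count(table):
    """Number of forecasts on the diagonal (row index equals column index)."""
    return sum(n for (row, col), n in table.items() if row == col)


def expected_by_chance(table):
    """Expected number correct by chance: sum_i(R_i * C_i) / T."""
    total = sum(table.values())
    row_totals, col_totals = {}, {}
    for (row, col), n in table.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    return sum(r * col_totals.get(i, 0) for i, r in row_totals.items()) / total


def skill_score(correct, total, expected):
    """S = (C - E) / (T - E); undefined when T equals E."""
    return (correct - expected) / (total - expected)


def skill_vs_chance(table):
    """Skill score with chance as the reference standard."""
    total = sum(table.values())
    return skill_score(correct_count(table), total, expected_by_chance(table))


def skill_vs_reference(table, reference_table):
    """Skill score where E is the number of correct reference (e.g. SHFR) forecasts.

    Assumes both tables are built from the same set of cases.
    """
    total = sum(table.values())
    return skill_score(correct_count(table), total, correct_count(reference_table))
```

For a two-category table with counts {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 4}, the row totals are 4 and 6, the column totals are 5 and 5, so E = (4·5 + 6·5)/10 = 5, C = 7, T = 10, and S = (7 − 5)/(10 − 5) = 0.4.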

Since the skill score does not consider the distance from the diagonal in the contingency table, error probability distributions are examined below.

3. Results

a. OFCL forecasts

Since 1988, NHC has archived OFCL and model 12-, 24-, 36-, 48-, and 72-h intensity forecasts; OFCL 96- and 120-h and model 60-, 84-, 96-, 108-, and 120-h forecasts have been available since 2003. OFCL skill scores and their linear trends are shown in Fig. 2; the trends are not shown for the 4- and 5-day forecasts due to the short time series. Skill scores versus chance decrease with forecast time through 48 h (Fig. 2a).

Since 1988, 12-h forecast skill has decreased, 24-h forecast skill has remained about constant, and forecast skill has increased from 36 to 72 h. The skill scores versus SHFR (Fig. 2b) decrease with forecast time through 72 h, and the trends are similar to those versus chance. In the remainder of this manuscript, only skill scores versus SHFR are shown.
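
The text does not say how the linear trends are fitted. A minimal sketch, assuming an ordinary least-squares fit of annual skill score against year, is given below; the function name `linear_trend` is hypothetical.

```python
def linear_trend(years, values):
    """Ordinary least-squares line through (year, value) points.

    Returns (slope, intercept); the slope is the change in value per year.
    Assumption: the trends in the figures are simple least-squares fits.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A positive slope for a given lead time corresponds to increasing skill over the period, and a negative slope to decreasing skill.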

The annual probabilities that the OFCL forecast intensity is within three error categories of the best-track intensity (<15 kt or one Saffir–Simpson intensity category), or more than six error categories from it (>30 kt or two Saffir–Simpson intensity categories), are shown in Fig. 3. The majority of 12–48-h forecasts are currently within three error categories, and the percentage of these forecasts has been increasing with time at all forecast times except 12 h. Less than one-third of all forecasts are now more than six error categories from the best-track intensity, and the annual probability of such errors has been falling steadily.
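
These two probabilities can be read directly off a contingency table by measuring each cell's distance from the diagonal. The sketch below is illustrative only: the function name is hypothetical, the table is assumed to be a dictionary keyed by (row, column) category indices in 5-kt steps, and the boundaries follow the stated thresholds (<15 kt, i.e. fewer than three categories, and >30 kt, i.e. more than six). If "within three error categories" is meant inclusively, the first comparison would be `<=` instead.

```python
def error_category_fractions(table, near=3, far=6):
    """Fractions of forecasts near to and far from the diagonal.

    Returns (fraction with |row - col| < near, fraction with |row - col| > far).
    Assumption: each category step is 5 kt, so near=3 means <15 kt and
    far=6 means >30 kt.
    """
    total = sum(table.values())
    within = sum(n for (row, col), n in table.items() if abs(row - col) < near)
    beyond = sum(n for (row, col), n in table.items() if abs(row - col) > far)
    return within / total, beyond / total
```

Computing these fractions for each year and lead time gives annual series of the kind whose trends are discussed here.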

b. Model forecasts

Of the model intensity guidance, the Statistical Hurricane Intensity Prediction Scheme (SHIPS; DeMaria and Kaplan 1994; DeMaria et al. 2005), the adjusted Geophysical Fluid Dynamics Laboratory Hurricane Model (GFDI; Bender et al. 1993; Kurihara et al. 1998), and the Florida State University Superensemble (FSSE; Krishnamurti et al. 1999, 2000a,b) have skill, and these models are shown here. SHIPS was upgraded in 2000 to include land effects (DSHP); the time series here includes the DSHP forecasts from 2000 onward. In contrast to the OFCL skill scores, the SHIPS skill scores (Fig. 4a) have increased at all forecast times and are approaching the OFCL skill scores. Both GFDI and FSSE (Figs. 4b and 4c, respectively) have shorter time series than OFCL and SHIPS, so any trends shown may not be representative of a true long-term trend. The GFDI skill scores have increased through 36 h and decreased afterward. The short FSSE time series shows decreasing skill scores at all forecast times.

Figure 5 shows the probabilities that forecasts from the three models are within three error categories of, or more than six error categories from, the best-track intensity. The probability of the forecast intensity being within three error categories of the best-track intensity has increased at all forecast times except for the 12-h SHIPS and FSSE forecasts; the probability of the forecast intensity being more than six error categories from the best-track intensity has decreased at all forecast times except for the 12-h SHIPS and FSSE forecasts.

4. Conclusions

An alternative method of verifying OFCL and model tropical cyclone intensity forecasts is presented. In the currently used verification technique, forecasts are not verified if either the forecast calls for the tropical cyclone to dissipate or the real tropical cyclone dissipates. In the new technique, these forecasts are included in a contingency table with all other forecasts. Skill scores and probabilities can be calculated using these contingency tables.

In contrast to previous studies (DeMaria et al. 2007), OFCL forecasts have been steadily improving at all forecast times except 12 h. This may be because 12-h forecasts are excellent, with nearly 90% of these forecasts being within three error categories (about one Saffir–Simpson category) of the best-track intensity, and only about 1% being more than six error categories (about three Saffir–Simpson categories) from the best-track intensity. The 12-h model intensity forecasts are not improving as fast as those at later forecast lead times. Currently, about one in twenty 24-h OFCL intensity forecasts are more than six error categories (35 kt or more) from the best-track intensity; Kaplan and DeMaria (2003) show that this is the approximate ratio of all overwater cases that undergo intensity changes of 35 kt or greater within 24 h. When the lead time increases to 2 days, nearly one-fifth of all intensity forecasts are more than six error categories from the best-track intensity, and by 5 days, nearly one-third are thus classified.

NHC OFCL intensity forecasts using the currently employed technique have improved slowly since 1990. The new technique, taking into account all forecasts, suggests that the improvement has been slow, but faster than previously shown with the current verification technique.

Acknowledgments

This study is the result of earlier discussions of forecast verification with James Franklin of NHC and Harold Brooks of the National Severe Storms Laboratory. Mike Jankulak, John Kaplan, Frank Marks, and two anonymous reviewers provided very helpful comments on various versions of this manuscript.

REFERENCES

  • Bender, M. A., R. J. Ross, R. E. Tuleya, and Y. Kurihara, 1993: Improvements in tropical cyclone track and intensity forecasts using the GFDL initialization system. Mon. Wea. Rev., 121, 2046–2061.

  • DeMaria, M., and J. Kaplan, 1994: A Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220.

  • DeMaria, M., M. Mainelli, L. K. Shay, J. A. Knaff, and J. Kaplan, 2005: Further improvements to the Statistical Hurricane Intensity Prediction Scheme (SHIPS). Wea. Forecasting, 20, 531–543.

  • DeMaria, M., J. A. Knaff, and C. Sampson, 2007: Evaluation of long-term trends in tropical cyclone intensity forecasts. Meteor. Atmos. Phys., 97, 19–28.

  • Jarvinen, B. R., and C. J. Neumann, 1979: Statistical forecast of tropical cyclone intensity. NOAA Tech. Memo. NWS NHC-10, 22 pp.

  • Kaplan, J., and M. DeMaria, 2003: Large-scale characteristics of rapidly intensifying tropical cyclones in the North Atlantic basin. Wea. Forecasting, 18, 1093–1108.

  • Krishnamurti, T. N., C. M. Kishtawal, T. LaRow, D. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548–1550.

  • Krishnamurti, T. N., C. M. Kishtawal, D. W. Shin, and C. E. Williford, 2000a: Improving tropical precipitation forecasts from a multianalysis superensemble. J. Climate, 13, 4217–4227.

  • Krishnamurti, T. N., and Coauthors, 2000b: Multimodel ensemble forecasts for weather and seasonal climate. J. Climate, 13, 4196–4216.

  • Kurihara, Y., R. E. Tuleya, and M. A. Bender, 1998: The GFDL hurricane prediction system and its performance during the 1995 hurricane season. Mon. Wea. Rev., 126, 1306–1322.

  • Panofsky, H. A., and G. W. Brier, 1958: Some Applications of Statistics to Meteorology. The Pennsylvania State University, 224 pp.

Fig. 1.

Operational model and OFCL intensity forecasts and best-track intensity for (a) Tropical Storm Stan, 0000 UTC 4 Oct 2005, and (b) Tropical Depression Gamma, 0600 UTC 11 Nov 2006.

Citation: Weather and Forecasting 23, 6; 10.1175/2008WAF2222123.1

Fig. 2.

OFCL skill scores vs (a) chance and (b) SHFR, and their linear trends.

Fig. 3.

Percentage of total annual OFCL forecasts (a) within three error categories of and (b) more than six error categories from the best-track intensity, and the linear trend of each series.

Fig. 4.

Skill scores of (a) SHIPS, (b) GFDI, and (c) FSSE vs SHFR, and the linear trend of each series.

Fig. 5.

Percentage of total annual (a) SHIPS forecasts within three error categories of, (b) SHIPS forecasts more than six error categories from, (c) GFDI forecasts within three error categories of, (d) GFDI forecasts more than six error categories from, (e) FSSE forecasts within three error categories of, and (f) FSSE forecasts more than six error categories from the best-track intensity, and the linear trend of each series.

Table 1.

For 2005, the SHIPS 36-h contingency table and row–column titles. The first row and first column are the forecast and best-track intensities (kt), respectively, with “D” representing dissipation. The remainder is the contingency table in which numbers are the counts of each forecast–best-track intensity pair, with blank spaces representing zeroes. In this example, the forecast intensity was 30 kt 22 times when the tropical cyclone dissipated, and there are 12 cases that are correctly forecast to be 45 kt. Numbers along the diagonal (bold) represent the number of correct forecasts, and the distance from the diagonal corresponds to the size of the error.


¹ An internal NHC forecast verification that includes these cases is performed, though with a different technique from the one reported here.
