Investigating the Utility of Using Cross-Oceanic Training Sets for Superensemble Forecasting of Eastern Pacific Tropical Cyclone Track and Intensity

Mark R. Jordan II, T. N. Krishnamurti, and Carol Anne Clayson

Department of Meteorology, The Florida State University, Tallahassee, Florida

Abstract

This paper examines how combining training-set forecasts from two separate oceanic basins affects the resulting tropical cyclone track and intensity forecasts in a particular oceanic basin. Atlantic and eastern Pacific training sets for 2002 and 2003 are combined and used to forecast 2004 eastern Pacific tropical cyclones in a real-time setting. These experiments show that the addition of Atlantic training improves the 2004 eastern Pacific forecasts. Finally, a detailed study of training-set and real-time model biases is completed in an effort to determine why cross-oceanic training may have helped in this instance.

Corresponding author address: Carol Anne Clayson, Dept. of Meteorology, and Geophysical Fluid Dynamics Institute, The Florida State University, 404 Love Bldg., Tallahassee, FL 32306. Email: clayson@met.fsu.edu


1. Introduction

Since its inception in 1998, T. N. Krishnamurti’s superensemble technique has led to significant improvements in the forecasting of temperature regimes and precipitation patterns (Krishnamurti et al. 1999). One of the technique’s greatest accomplishments, however, has been its ability to improve tropical cyclone track and intensity forecasts. The superensemble proved its worth in 2004, when it provided the best track and intensity forecasts for Atlantic tropical cyclones at all synoptic times. The superensemble technique differs from other numerical models in that it does not independently model the atmosphere using governing equations, observational data, and differencing techniques. Instead, the superensemble takes many previous forecasts from several different models (the training set) and corrects the biases of each model using a least squares minimization technique. Each model is then weighted using multiple linear regression. These weights and bias corrections are applied to the suite of current model forecasts, and a single superensemble (bias-removed and weighted) forecast is produced. For the experiments in this paper and in real-time practice at The Florida State University, weights and bias corrections are applied to tropical cyclone forecasts uniformly, regardless of cyclone strength or location in a particular basin. Williford (2002) provides more details on how a superensemble forecast is developed.
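To make the procedure concrete, the following Python sketch illustrates the training and forecast steps in the single-predictand form of Krishnamurti et al. (1999), S = Obar + sum_i a_i (F_i - Fbar_i). The data, model count, and variable names are illustrative assumptions, not the operational FSU implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a training set: 100 past forecast instances of a
# single predictand (e.g., 24-h latitude) from four member models, plus
# the verifying observations. All values here are illustrative only.
truth = rng.normal(20.0, 3.0, size=100)
model_bias = np.array([0.8, -0.5, 0.3, -1.2])
models = truth[:, None] + model_bias + rng.normal(0.0, 1.0, size=(100, 4))

# Training: regress observed anomalies on member-model anomalies
# (least squares minimization), following the superensemble form
# S = Obar + sum_i a_i * (F_i - Fbar_i).
obar = truth.mean()
fbar = models.mean(axis=0)
weights, *_ = np.linalg.lstsq(models - fbar, truth - obar, rcond=None)

# Forecast: apply the training-derived means and weights to a new
# (real-time) suite of member forecasts.
new_fcsts = np.array([21.4, 19.9, 20.7, 18.6])
superensemble = obar + (new_fcsts - fbar) @ weights
print(f"superensemble forecast: {superensemble:.2f}")
```

Removing each model's training-period mean handles the bias correction, and the regression coefficients reward the members that tracked the observations most closely during training.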

For tropical cyclone superensemble forecasting, experiments have shown that larger training sets tend to provide better forecasts than smaller ones. However, obtaining sufficiently large training sets can be a challenge. The physical and dynamical characteristics of numerical models are regularly changed, so one potentially faces the problem of using an outdated training set that incorrectly characterizes the biases and relative strengths of an updated model. Furthermore, some tropical cyclone seasons are relatively inactive and, therefore, may not provide a sufficient number of forecast cases for a future season. This paper investigates the potential utility of combining training data from two separate ocean basins, the Atlantic and eastern Pacific, to provide a larger training set for forecasting eastern Pacific tropical cyclones during the 2004 season. Because the same versions of the same numerical models are used for tropical cyclone forecasts in both basins, compatibility is not an issue. Based on the reasoning outlined above, such an experiment could lead to improved tropical cyclone forecasts.

Previous research involving the modification of training sets for improving superensemble forecasts has primarily focused on developing location-specific training sets for tropical cyclone forecasting. Williford (2002) separates Atlantic basin training into subsections based on location within the basin, and superensemble forecasts are then generated for real-time tropical cyclones based on where each cyclone forms. For example, a real-time tropical cyclone that forms and moves through the Gulf of Mexico is forecast using only Gulf of Mexico storms in the training set. The results of those experiments show marginal improvements in the overall tropical cyclone track and intensity forecasts. The main problem with maintaining location-specific training sets is their individually small size, especially because it is difficult to accumulate a large number of instances in each training set before the numerical models undergo significant modifications.

2. Methods

Table 1 defines the acronyms used to identify each model. The numerical models chosen for developing superensemble forecasts for the 2004 eastern Pacific tropical cyclone season are outlined in Table 2. Detailed information about these numerical models can be found online (at www.nhc.noaa.gov/modelsummary.shtml). The experiments presented in this paper were conducted as if real-time hurricane track and intensity forecasts were being developed. Many preliminary experiments were conducted to determine which distribution of models to use, and the distribution in Table 2 represents the collection of models determined to provide the best forecasts.

The early and late designations for the latitude, longitude, and intensity models refer to the synoptic times for which each group of models was used: early models were used for the 12–72-h forecasts, while late models were used for the 84–120-h forecasts. The 2002 and 2003 tropical cyclone forecasts for both the Atlantic and eastern Pacific were used in the training set for forecasting the 2004 eastern Pacific tropical cyclone season. Prior to the beginning of this project, it was determined that no significant changes had been made to the numerical models between 2002 and 2004, except for the GFDI model. Given the large number of changes that are often made to numerical models over a span of a few years, the consistency of the member models is good in this situation, so using a 2002–03 training set for forecasting cyclones in 2004 is appropriate.

No cross validation was conducted in any of the experiments because such methods would defeat the purpose of recreating a real-time scenario. Cross validation involves using the training associated with future storms to determine how forecast-year training may differ from past years’ training sets. While such experiments can be useful in a research mode, they provide no guidance for improving real-time forecasts. In the error calculations, a homogeneous forecast sample was used so that all models and forecasts could be compared equally. The number of cases involved in the error calculations differs from the published error results found on the National Hurricane Center’s Web site because superensemble forecasts are not made for tropical depressions or for storms that have made landfall, regardless of strength. Therefore, the National Hurricane Center has a higher number of cases in its error calculations.
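As a minimal sketch of the homogeneous-sample comparison described above, the snippet below restricts the error statistics to the cases for which every model verified. The model list, the NaN convention for unavailable forecasts, and all values are assumptions for illustration, not the paper’s data.

```python
import numpy as np

# Hypothetical 48-h track errors (km) for a handful of verifying cases;
# NaN marks cases for which a given model produced no forecast.
errors = {
    "FSSE": np.array([55.0, 70.0, np.nan, 90.0, 65.0]),
    "GFDI": np.array([60.0, 85.0, 40.0, 100.0, np.nan]),
    "GUNA": np.array([50.0, np.nan, 45.0, 80.0, 70.0]),
}

# A homogeneous sample keeps only the cases for which every model
# verified, so that all models are compared over identical cases.
stacked = np.vstack(list(errors.values()))
common = ~np.isnan(stacked).any(axis=0)

for name, errs in errors.items():
    rms = np.sqrt(np.mean(errs[common] ** 2))
    print(f"{name}: RMS = {rms:.1f} km over {common.sum()} common cases")
```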

3. Results

The first experiment used the 2002–03 eastern Pacific training set to forecast the 2004 eastern Pacific tropical cyclones. This training set comprised 382 previous forecast instances. For this and all other experiments, the numbers of actual forecasts made for hours 12, 24, 36, 48, 60, 72, 84, 96, 108, and 120 were 138, 116, 99, 82, 69, 53, 43, 33, 23, and 14, respectively. Tables 3–6 show the root-mean-square track and intensity errors for the superensemble along with the errors of other models and forecasts for the 2004 eastern Pacific hurricane season. These tables indicate that the superensemble track forecasts perform relatively well compared with the other models for the 12–48-h forecasts; during this period, superensemble performance only slightly lags that of the ensemble models GUNS and GUNA. During the 60–120-h forecasts, however, superensemble track errors increase significantly compared with the errors of the other models and forecasts, many of which outperform the superensemble during this time frame. The same error pattern is noted in the intensity errors. For the 12–60-h forecasts, the superensemble performed well, alternating between the best and second-best intensity model. However, the 72–120-h forecasts show a significant drop in accuracy for the superensemble, as it trails several of its member models at these times.

To assess the effects of an increased training set on the superensemble forecast, a second experiment was conducted using a combined training set: the 2002–03 eastern Pacific and 2002–03 Atlantic forecasts, totaling 811 forecast instances. Tables 3–6 show that the 12–60-h superensemble track forecasts improve slightly over the model’s performance when using only Pacific training; however, the superensemble still trails the GUNS and GUNA ensemble models during these periods. For the 72–120-h track forecasts, the superensemble shows significant improvement over its performance with Pacific-only training, although it still demonstrates less skill than OFCI, GFDI, and the ensemble models (GUNS and GUNA). On the other hand, superensemble intensity forecasts improve remarkably with the combined training. The superensemble has the overall best intensity forecasts during hours 12–60. During hours 72–120, the superensemble’s skill decreases relative to the other models and forecasts, and it is routinely outperformed by DSHP, SHF5, and OFCI. Even though the skill of the superensemble noticeably decreases during the later hours, the improvement during the early hours significantly improves the superensemble’s overall yearly performance, since most of the verified forecasts occur during the early hours. These results indicate that the use of combined-basin training can be helpful if the models used for both basins are the same.

Because the addition of an Atlantic training set improved the superensemble forecasts, a third experiment was conducted to see whether the 2002–03 Atlantic training set alone would result in better forecasts than the training sets used in the two previous experiments. The Atlantic training set included 429 previous forecast instances. Tables 3–6 show the results of using this training set to forecast eastern Pacific tropical cyclone track and intensity. The track results, as shown in Tables 3 and 4, indicate that the forecast skill for hours 12–60 is not significantly different from the skill seen with combined training. However, the forecasts for hours 72–120 show significant improvement over those made with either combined training or Pacific-only training. While the superensemble still trails GUNS and GUNA in forecast skill, it routinely shows improved skill over the other dynamical, statistical, and subjective forecasts. Superensemble intensity forecasts using the Atlantic training set show characteristics similar to those of the track forecasts: the hour 12–60 forecasts are actually somewhat worse than those using the combined training set, but the forecasts during the later hours show remarkably improved skill over the previous forecasts.

Therefore, the overall results indicate that the best track forecasts are achieved using the Atlantic training set, while the best intensity forecasts are achieved using the combined Atlantic–eastern Pacific training set. Either way, the introduction of Atlantic training appears to have increased the skill of the superensemble over using an eastern Pacific training set alone for forecasting eastern Pacific tropical cyclone tracks and intensities.

Determining whether the addition of an Atlantic training set results in statistically significant improvements in the overall superensemble track and intensity forecasts requires a paired t test. Table 7 shows paired t tests comparing the track and intensity results obtained using only Pacific training with those obtained using combined training and using Atlantic training. These tests were conducted using the average forecast errors at each forecast hour. Four paired t tests were conducted, and the results indicate that the track and intensity improvements using both combined and Atlantic training are significant at or near the 95% confidence level, with corresponding P values ranging from 0.016 to 0.054. Therefore, the improvements seen with the additional training sets are likely systematic in nature rather than random.
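The test pairs the two error series hour by hour, as in the sketch below. The SciPy call is standard; the error values themselves are placeholders, not the numbers behind Table 7.

```python
import numpy as np
from scipy import stats

# Placeholder mean track errors (km) at the ten forecast hours (12-120 h)
# for two training configurations; illustrative values only.
pacific_only = np.array([62., 95., 128., 160., 205., 250., 310., 360., 420., 480.])
combined = np.array([60., 90., 120., 150., 190., 235., 280., 330., 380., 430.])

# Paired test: the two samples are matched hour by hour, so the test is
# applied to the per-hour differences rather than to pooled errors.
t_stat, p_value = stats.ttest_rel(pacific_only, combined)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")
```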

A question arises at this point: Why would using an Atlantic training set, as opposed to an eastern Pacific training set, result in better eastern Pacific superensemble forecasts? It is counterintuitive that superensemble forecasts for a given basin would improve when trained on data from a different basin. It is possible that the sizes of the training sets caused the outcomes seen in these experiments; however, the two training sets were not appreciably different in size, as the Atlantic training set contains only 40 more cases than the eastern Pacific training set. It is also possible that the biases and overall performance of the models and forecasts change depending on the overall synoptic pattern in a particular basin. Unfortunately, to date no detailed studies have been performed as to whether there are nonchaotic patterns associated with model performance at any given time. Finally, a comparison of the Pacific forecasts shows that in many cases the bias-corrected ensemble mean (BCEM) actually performs worse than the ensemble mean (ENSM). This result indicates that improper bias corrections are occurring in the majority of forecasts, and incorrect bias corrections would have a significant impact on the overall superensemble performance. Therefore, the only way, at this time, to answer the original question is to examine the biases of the training-set models during the training and forecast periods, since the weighting of the numerical models used to produce the superensemble forecast is secondary to the primary bias-correction scheme.

Tables 8–13 show the biases of all of the models outlined in Table 2 at all synoptic times. The tables are divided to show the biases of the models in the 2002–03 eastern Pacific training set, the biases in the 2002–03 Atlantic training set, and the actual biases of the 2004 eastern Pacific real-time forecasts. The natural goal is a training set whose biases closely resemble the biases of the models in real time. Instead of examining each table in detail with regard to the sign and magnitude of each model bias in comparison with the actual 2004 biases, Table 14 provides correlation coefficients between the early- and late-time training biases and the 2004 real-time model biases at the corresponding times. For the latitude and longitude model biases, the correlation coefficients indicate a higher correlation between the Atlantic training biases and the 2004 real-time biases in all situations except the early-time longitude correlations. These correlations show that the Atlantic training biases and the 2004 real-time model biases have a strong direct correlation, while the eastern Pacific training biases and the 2004 real-time model biases have a strong inverse correlation. Therefore, based on correlation analysis, one would expect the Atlantic training to provide better track forecasts because, in most cases, the biases associated with that training correlate better with the biases of the real-time models. It is important to note, however, that the bias-correlation improvements may not explain the total improvement in the track errors; the subsequent coefficient weighting, which results from the training biases, may also help improve the track errors when Atlantic training is used.
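A sketch of the correlation computation behind Table 14: each training set’s model biases are paired, model by model, with the 2004 real-time biases, and a Pearson coefficient is computed. The bias values below are placeholders chosen only to show one direct and one inverse correlation, not the values of Tables 8–14.

```python
import numpy as np

# Placeholder early-time latitude biases (km) for the same six models in
# each of the three sets; illustrative values only.
atl_train = np.array([12.0, -8.0, 5.0, -15.0, 20.0, 3.0])
epac_train = np.array([-10.0, 9.0, -4.0, 14.0, -18.0, -2.0])
realtime_2004 = np.array([10.0, -7.0, 6.0, -12.0, 17.0, 4.0])

# Pearson correlation between each training set's biases and the 2004
# real-time biases; a negative r corresponds to an inverse correlation.
r_atl = np.corrcoef(atl_train, realtime_2004)[0, 1]
r_epac = np.corrcoef(epac_train, realtime_2004)[0, 1]
print(f"Atlantic training vs. 2004 real time:        r = {r_atl:+.2f}")
print(f"Eastern Pacific training vs. 2004 real time: r = {r_epac:+.2f}")
```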

Using correlation analysis to explain the outcomes of the various intensity forecasts, however, is somewhat more problematic. At all times, the intensity biases of the real-time models correlate better with the intensity biases of the eastern Pacific training. During the early times, high correlations exist between the real-time model biases and both sets of training biases. Given the nearly one-to-one correlation between the eastern Pacific training biases and the real-time biases, one would reasonably expect a much better intensity forecast using the eastern Pacific training set than using the Atlantic training set (correlation of 0.595). However, even though the use of Pacific training does improve the early-time intensity forecasts, the amount of improvement over the intensity forecasts made with the Atlantic training set is small, as seen in Tables 5 and 6.

During the later times, the correlations between the biases of the two training sets and the real-time model biases are much more similar in magnitude: the eastern Pacific–real-time correlation is approximately 0.46, while the Atlantic–real-time correlation is approximately 0.35. However, even though the eastern Pacific–real-time correlation is higher during this period, the Atlantic training set significantly improves the intensity forecasts compared with forecasts made using the eastern Pacific training set. Therefore, it is reasonable to conclude that the assignment of coefficients in these cases had a significant impact on the forecast skill.

Because correlation analysis does not reveal any information about the magnitude differences among the various sets of biases, the magnitude differences were calculated for the intensity biases to determine whether they help explain why the Atlantic training provided better intensity forecasts. In many cases, the magnitude differences between the Pacific training and Pacific real-time biases and between the Atlantic training and Pacific real-time biases were not substantial. However, the GFDI Pacific real-time biases are much closer in magnitude to the GFDI Atlantic training biases than to the GFDI Pacific training biases. The same trend is noted when comparing the biases of the UKMI and SHF5, though not to the same extent as with the GFDI. Furthermore, a spike in bias magnitude is noted at hours 72 and 84 in the Pacific training biases that is seen in neither the Atlantic training biases nor the Pacific real-time biases. This spike may also help to explain the success of the Atlantic training set in forecasting Pacific tropical cyclone intensity.

4. Conclusions

The overall goal of this experiment was to determine whether multibasin training would be beneficial to superensemble forecasts when the characteristics of the training-set models and the real-time models are similar. Such multibasin training, at least in this instance, does appear to be beneficial. It would be interesting to see whether multibasin training sets continue to produce improved tropical cyclone forecasts in future seasons. The most intriguing result of this experiment, however, is not that multibasin training can be useful in tropical cyclone forecasting; rather, it is that, at least in this case, Atlantic training is more beneficial to eastern Pacific tropical cyclone forecasting than eastern Pacific training is. Furthermore, while similar correlations between training-set biases and real-time model biases appear to improve the forecasts, exceedingly similar correlations are not always necessary to produce competitive forecasts, and higher correlations do not always result in better forecasts. These observations reveal a couple of important traits of superensembles. First, the use of multibasin training does appear to help the forecasts, generally because of the increase in the overall number of training cases. As long as the models used in both basins are the same, multibasin training should continue to be beneficial in reducing errors in the future. Second, the nature of the model-bias calculations in this paper suggests two possibilities: coefficient assignments are not as closely linked to model-bias evaluation as the superensemble method would lead one to believe, or the model-bias calculations, the coefficient assignments, or both are the result of an attempt to impose order on a naturally chaotic system. One way to evaluate whether the latter idea is valid is to examine model bias as it relates to individual tropical cyclones. The key questions involve whether model biases are consistent from cyclone to cyclone, from day to day, and from forecast to forecast.

Future studies should attempt to clarify the nature of biases in numerical models as they relate to tropical cyclone track and intensity. On a smaller scale, revisiting the idea of separating the training based on the location in the Pacific could result in better forecasts since the Pacific does not contain as many subsections as the Atlantic does, thereby allowing for more training instances in each training set. Also, it is reasonable to assume that model biases could be completely different for tropical cyclones near the Mexican coast, where terrain issues are present, as opposed to farther west in the open ocean, where terrain issues are not a factor.

REFERENCES

Krishnamurti, T. N., C. M. Kishtawal, T. LaRow, D. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved skills for weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548–1550.

Rhome, J., cited 2007: Technical summary of the NHC/TPC tropical cyclone track and intensity guidance models. [Available online at http://www.nhc.noaa.gov/modelsummary.shtml.]

Williford, C. E., 2002: Real-time superensemble tropical cyclone prediction. Ph.D. dissertation, The Florida State University, Tallahassee, FL, 144 pp.

Table 1. List of member model acronyms and their associated descriptions (Rhome 2007).

Table 2. Numerical models used in producing superensemble forecasts for the 2004 eastern Pacific tropical cyclone season.

Table 3. The 2004 tropical cyclone RMS track errors (km) for various member models and the ensemble mean (ENSM) of those member models.

Table 4. The 2004 tropical cyclone RMS track errors (km) for the BCEM and the Florida State Superensemble (FSSE) associated with various training sets.

Table 5. The 2004 tropical cyclone RMS intensity errors (m s−1) associated with various member models and the ensemble mean (ENSM) of those member models.

Table 6. The 2004 tropical cyclone RMS intensity errors (m s−1) for the BCEMs and the FSSEs associated with various training sets.

Table 7. Results of using a paired t test to determine the significance of the differences between various paired error sets.

Table 8. Latitude model biases for the early times (km).

Table 9. Longitude model biases for the early times (km).

Table 10. Intensity model biases for the early times (m s−1).

Table 11. Latitude model biases for the late times (km).

Table 12. Longitude model biases for the late times (km).

Table 13. Intensity model biases for the late times (m s−1).

Table 14. Model bias correlation coefficients.
