• Balaguru, K., G. R. Foltz, L. R. Leung, S. M. Hagos, and D. R. Judi, 2018: On the use of ocean dynamic temperature for hurricane intensity forecasting. Wea. Forecasting, 33, 411–418, https://doi.org/10.1175/WAF-D-17-0143.1.

• Balaguru, K., G. R. Foltz, L. R. Leung, J. Kaplan, W. Xu, N. Reul, and B. Chapron, 2020: Pronounced impact of salinity on rapidly intensifying tropical cyclones. Bull. Amer. Meteor. Soc., 101, E1497–E1511, https://doi.org/10.1175/BAMS-D-19-0303.1.

• Bergstra, J. S., R. Bardenet, Y. Bengio, and B. Kégl, 2011: Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems, J. Shawe-Taylor et al., Eds., Information Processing Systems Foundation, Inc., 2546–2554.

• Bergstra, J. S., D. Yamins, and D. D. Cox, 2013a: Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. Proc. 12th Python in Science Conf., Austin, TX, Citeseer, 13–20.

• Bergstra, J. S., D. Yamins, and D. D. Cox, 2013b: Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. Proc. 30th Int. Conf. on Machine Learning, Atlanta, GA, ICML, 115–123.

• Berrisford, P., D. Dee, P. Poli, R. Brugge, K. Fielding, M. Fuentes, and A. Simmons, 2011: The ERA-Interim archive version 2.0. ERA Report Series 1, Tech. Rep., ECMWF, 23 pp.

• Cangialosi, J. P., 2019: National Hurricane Center forecast verification report: 2018 hurricane season. National Hurricane Center, 73 pp., www.nhc.noaa.gov/verification/pdfs/Verification_2018.pdf.

• Cangialosi, J. P., E. Blake, M. DeMaria, A. Penny, A. Latto, E. Rappaport, and V. Tallapragada, 2020: Recent progress in tropical cyclone intensity forecasting at the National Hurricane Center. Wea. Forecasting, 35, 1913–1922, https://doi.org/10.1175/WAF-D-20-0059.1.

• Chaudhuri, S., D. Dutta, S. Goswami, and A. Middey, 2013: Intensity forecast of tropical cyclones over North Indian Ocean using multilayer perceptron model: Skill and performance verification. Nat. Hazards, 65, 97–113, https://doi.org/10.1007/s11069-012-0346-7.

• Cloud, K. A., B. J. Reich, C. M. Rozoff, S. Alessandrini, W. E. Lewis, and L. Delle Monache, 2019: A feed forward neural network based on model output statistics for short-term hurricane intensity prediction. Wea. Forecasting, 34, 985–997, https://doi.org/10.1175/WAF-D-18-0173.1.

• Combot, C., A. Mouche, J. A. Knaff, Y. Zhao, Y. Zhao, L. Vinour, Y. Quilfen, and B. Chapron, 2020: Extensive high-resolution Synthetic Aperture Radar (SAR) data analysis of tropical cyclones: Comparisons with SFMR flights and best track. Mon. Wea. Rev., 148, 4545–4563, https://doi.org/10.1175/MWR-D-20-0005.1.

• Courtney, J. B., and Coauthors, 2019: Operational perspectives on tropical cyclone intensity change. Part 1: Recent advances in intensity guidance. Trop. Cyclone Res. Rev., 8, 123–133, https://doi.org/10.1016/j.tcrr.2019.10.002.

• Cummings, J. A., 2005: Operational multivariate ocean data assimilation. Quart. J. Roy. Meteor. Soc., 131, 3583–3604, https://doi.org/10.1256/qj.05.105.

• DeMaria, M., 2009: A simplified dynamical system for tropical cyclone intensity prediction. Mon. Wea. Rev., 137, 68–82, https://doi.org/10.1175/2008MWR2513.1.

• DeMaria, M., and J. Kaplan, 1994: A Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, https://doi.org/10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

• DeMaria, M., and J. Kaplan, 1999: An updated Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic and eastern North Pacific basins. Wea. Forecasting, 14, 326–337, https://doi.org/10.1175/1520-0434(1999)014<0326:AUSHIP>2.0.CO;2.

• DeMaria, M., M. Mainelli, L. K. Shay, J. A. Knaff, and J. Kaplan, 2005: Further improvements to the Statistical Hurricane Intensity Prediction Scheme (SHIPS). Wea. Forecasting, 20, 531–543, https://doi.org/10.1175/WAF862.1.

• DeMaria, M., J. A. Knaff, and J. Kaplan, 2006: On the decay of tropical cyclone winds crossing narrow landmasses. J. Appl. Meteor. Climatol., 45, 491–499, https://doi.org/10.1175/JAM2351.1.

• DeMaria, M., C. R. Sampson, J. A. Knaff, and K. D. Musgrave, 2014: Is tropical cyclone intensity guidance improving? Bull. Amer. Meteor. Soc., 95, 387–398, https://doi.org/10.1175/BAMS-D-12-00240.1.

• Emanuel, K., S. Ravela, E. Vivant, and C. Risi, 2006: A statistical deterministic approach to hurricane risk assessment. Bull. Amer. Meteor. Soc., 87, 299–314, https://doi.org/10.1175/BAMS-87-3-299.

• Fovell, R. G., and Y. P. Bu, 2015: Improving HWRF track and intensity forecasts via model physics evaluation and tuning. DTC Visitor Program Final Report, Developmental Testbed Center, 28 pp., https://dtcenter.org/sites/default/files/visitor-projects/DTC_report_2013_Fovell.pdf.

• Giffard-Roisin, S., D. Gagne, A. Boucaud, B. Kégl, M. Yang, G. Charpiat, and C. Monteleoni, 2018: The 2018 climate informatics hackathon: Hurricane intensity forecast. Eighth Int. Workshop on Climate Informatics, Boulder, CO, Climate Informatics Hackathon, 4 pp.

• Goh, A. T., 1995: Back-propagation neural networks for modeling complex systems. Artif. Intell. Eng., 9, 143–151, https://doi.org/10.1016/0954-1810(94)00011-S.

• Jones, D. R., 2001: A taxonomy of global optimization methods based on response surfaces. J. Global Optim., 21, 345–383, https://doi.org/10.1023/A:1012771025575.

• Kelly, P., L. R. Leung, K. Balaguru, W. Xu, B. Mapes, and B. Soden, 2018: Shape of Atlantic tropical cyclone tracks and the Indian monsoon. Geophys. Res. Lett., 45, 10746, https://doi.org/10.1029/2018GL080098.

• Knaff, J. A., M. DeMaria, C. R. Sampson, and J. M. Gross, 2003: Statistical, 5-day tropical cyclone intensity forecasts derived from climatology and persistence. Wea. Forecasting, 18, 80–92, https://doi.org/10.1175/1520-0434(2003)018<0080:SDTCIF>2.0.CO;2.

• Knaff, J. A., C. R. Sampson, and B. R. Strahl, 2020: A tropical cyclone rapid intensification prediction aid for the Joint Typhoon Warning Center's areas of responsibility. Wea. Forecasting, 35, 1173–1185, https://doi.org/10.1175/WAF-D-19-0228.1.

• Knapp, K. R., M. C. Kruk, D. H. Levinson, H. J. Diamond, and C. J. Neumann, 2010: The International Best Track Archive for Climate Stewardship (IBTrACS): Unifying tropical cyclone best track data. Bull. Amer. Meteor. Soc., 91, 363–376, https://doi.org/10.1175/2009BAMS2755.1.

• Knapp, K. R., H. J. Diamond, J. P. Kossin, M. C. Kruk, and C. J. Schreck, 2018: International Best Track Archive for Climate Stewardship (IBTrACS) project, version 4. NOAA/National Centers for Environmental Information, accessed 20 April 2021, https://doi.org/10.25921/82ty-9e16.

• Landsea, C. W., and J. L. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. Mon. Wea. Rev., 141, 3576–3592, https://doi.org/10.1175/MWR-D-12-00254.1.

• Lee, C. Y., M. K. Tippett, A. H. Sobel, and S. J. Camargo, 2018: An environmentally forced tropical cyclone hazard model. J. Adv. Model. Earth Syst., 10, 223–241, https://doi.org/10.1002/2017MS001186.

• Lin, N., R. Jing, Y. Wang, E. Yonekura, J. Fan, and L. Xue, 2017: A statistical investigation of the dependence of tropical cyclone intensity change on the surrounding environment. Mon. Wea. Rev., 145, 2813–2831, https://doi.org/10.1175/MWR-D-16-0368.1.

• Lloyd, I. D., and G. A. Vecchi, 2011: Observational evidence for oceanic controls on hurricane intensity. J. Climate, 24, 1138–1153, https://doi.org/10.1175/2010JCLI3763.1.

• Na, W., J. L. McBride, X. H. Zhang, and Y. H. Duan, 2018: Understanding biases in tropical cyclone intensity forecast error. Wea. Forecasting, 33, 129–138, https://doi.org/10.1175/WAF-D-17-0106.1.

• Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, and J. Vanderplas, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.

• Rappaport, E. N., J. G. Jiing, C. W. Landsea, S. T. Murillo, and J. L. Franklin, 2012: The Joint Hurricane Test Bed: Its first decade of tropical cyclone research-to-operations activities reviewed. Bull. Amer. Meteor. Soc., 93, 371–380, https://doi.org/10.1175/BAMS-D-11-00037.1.

• Sampson, C. R., and A. J. Schrader, 2000: The Automated Tropical Cyclone Forecasting System (version 3.2). Bull. Amer. Meteor. Soc., 81, 1231–1240, https://doi.org/10.1175/1520-0477(2000)081<1231:TATCFS>2.3.CO;2.

• Sampson, C. R., and J. A. Knaff, 2009: Southern Hemisphere tropical cyclone intensity forecast methods used at the Joint Typhoon Warning Center. Part III: Forecasts based on a multi-model consensus approach. Aust. Meteor. Oceanogr. J., 58, 19–27, https://doi.org/10.22499/2.5801.003.

• Sampson, C. R., J. L. Franklin, J. A. Knaff, and M. DeMaria, 2008: Experiments with a simple tropical cyclone intensity consensus. Wea. Forecasting, 23, 304–312, https://doi.org/10.1175/2007WAF2007028.1.

• Sharma, N., M. M. Ali, J. A. Knaff, and P. Chand, 2013: A soft-computing cyclone intensity prediction scheme for the western North Pacific Ocean. Atmos. Sci. Lett., 14, 187–192, https://doi.org/10.1002/asl2.438.

• Shimada, U., H. Owada, M. Yamaguchi, T. Iriguchi, M. Sawada, K. Aonashi, and K. D. Musgrave, 2018: Further improvements to the Statistical Hurricane Intensity Prediction Scheme using tropical cyclone rainfall and structural features. Wea. Forecasting, 33, 1587–1603, https://doi.org/10.1175/WAF-D-18-0021.1.

• Simon, A., A. B. Penny, M. DeMaria, J. L. Franklin, R. J. Pasch, E. N. Rappaport, and D. A. Zelinsky, 2018: A description of the real-time HFIP Corrected Consensus Approach (HCCA) for tropical cyclone track and intensity guidance. Wea. Forecasting, 33, 37–57, https://doi.org/10.1175/WAF-D-17-0068.1.

• Su, H., L. Wu, J. H. Jiang, R. Pai, A. Liu, A. J. Zhai, P. Tavallali, and M. DeMaria, 2020: Applying satellite observations of tropical cyclone internal structures to rapid intensification forecast with machine learning. Geophys. Res. Lett., 47, e2020GL089102, https://doi.org/10.1029/2020GL089102.

• Tallapragada, V., L. Bernardet, M. K. Biswas, S. Gopalakrishnan, Y. Kwon, Q. Liu, and X. Zhang, 2014: Hurricane Weather Research and Forecasting (HWRF) model: 2013 scientific documentation. HWRF Development Testbed Center Tech. Rep., 99 pp., http://www.emc.ncep.noaa.gov/gc_wmb/vxt/pubs/HWRFScientificDocumentation2013.pdf.

• Torn, R. D., and C. Snyder, 2012: Uncertainty of tropical cyclone best-track information. Wea. Forecasting, 27, 715–729, https://doi.org/10.1175/WAF-D-11-00085.1.

• Torn, R. D., and M. DeMaria, 2021: Validation of ensemble-based probabilistic tropical cyclone intensity change. Atmosphere, 12, 373, https://doi.org/10.3390/atmos12030373.

• Wada, A., and N. Usui, 2007: Importance of tropical cyclone heat potential for tropical cyclone intensity and intensification in the western North Pacific. J. Oceanogr., 63, 427–447, https://doi.org/10.1007/s10872-007-0039-0.

• Wang, Z., 2014: Characteristics of convective processes and vertical vorticity from the tropical wave to tropical cyclone stage in a high-resolution numerical model simulation of Tropical Cyclone Fay (2008). J. Atmos. Sci., 71, 896–915, https://doi.org/10.1175/JAS-D-13-0256.1.

• Wilks, D., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 100, Academic Press, 648 pp.

• Zhao, H., L. Wu, and W. Zhou, 2009: Observational relationship of climatologic beta drift with large-scale environmental flows. Geophys. Res. Lett., 36, L18809, https://doi.org/10.1029/2009GL040126.
Fig. 1. Data processing and model training flowchart.

Fig. 2. The 24-h intensity change MAE (kt) from MLP, SHIPS, DSHP, LGEM, HWFI, and OFCL, tested on 2010–20 Atlantic TCs over the same 6-hourly locations. While the MLP model was tested using the LOYO method for 2010–18, it was tested truly independently in 2019 and 2020, as if in a real-time mode. The shading around the MLP line denotes the 95% confidence interval based on the bootstrap method with a sample size of 40 and 10 000 repetitions.

Fig. 3. Distribution of 24-h intensity forecast error frequencies from different models: (a) MLP, (b) SHIPS, (c) DSHP, (d) LGEM, (e) HWFI, and (f) OFCL, with respect to the observed 24-h TC intensity change for 2010–18 Atlantic TCs. All frequencies are labeled by numbers in each grid cell. The red bounding boxes highlight large overforecasts (>30 kt) for rapid weakening (RW) events, and the magenta bounding boxes highlight large underforecasts (<−30 kt) for rapid intensification (RI) events. The numbers on top of the red and magenta boxes give the event counts within the boxes.

Fig. 4. Heat map of the correlation coefficients of 24-h model biases among the different models and the consensus for 2010–18. A smaller correlation coefficient indicates a higher degree of independence between a model and the consensus.

Fig. 5. The 6-h intensity change (a) MAE (kt), (b) ME (kt), and (c) R2 from the MLP model for each year of 1982–2017, tested on North Atlantic TCs. The black error bars indicate the STD from 10 experiments with different training–validation splits and different random weight initializations.

Fig. 6. (a) 100 synthetic tracks with intensity shown by line color, randomly selected from 50 000 synthetic tracks. (b) 100 randomly selected observed tracks after 1970. (c) TC lifetime maximum intensity distributions compared between the 50 000 synthetic tracks and 718 observed tracks (1970–2013) in the North Atlantic basin. (d) TC landfall counts at 51 major coastal cities on the U.S. East Coast and the Gulf of Mexico, comparing the 50 000 synthetic tracks and 1753 observed tracks (1851–2013). A TC is counted once if its track passes within a 1° radius of the city center coordinates.


Deep Learning Experiments for Tropical Cyclone Intensity Forecasts

  • a Marine and Coastal Research Laboratory, Pacific Northwest National Laboratory, Seattle, Washington
  • b Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington
  • c Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, Colorado
  • d Earth Systems Science, Pacific Northwest National Laboratory, Richland, Washington
Open access

Abstract

Reducing tropical cyclone (TC) intensity forecast errors is a challenging task that has interested the operational forecasting and research communities for decades. To address this, we developed a deep learning (DL)-based multilayer perceptron (MLP) TC intensity prediction model. The model was trained on the global Statistical Hurricane Intensity Prediction Scheme (SHIPS) predictors to forecast the change in TC maximum wind speed for the Atlantic basin. In the first experiment, a 24-h forecast period was considered. To overcome sample size limitations, we adopted a leave one year out (LOYO) testing scheme, in which a model is trained using data from all years except one and then evaluated on the year that is left out. When tested on 2010–18 operational data using the LOYO scheme, the MLP outperformed other statistical–dynamical models by 9%–20%. Additional independent tests in 2019 and 2020 were conducted to simulate real-time operational forecasts, where the MLP model again outperformed the statistical–dynamical models by 5%–22% and achieved results comparable to those of HWFI. The MLP model also correctly predicted more rapid intensification events than all four operational TC intensity models used for comparison. In the second experiment, we developed a lightweight MLP for 6-h intensity predictions. When coupled with a synthetic TC track model, the lightweight MLP generated a realistic TC intensity distribution in the Atlantic basin. The MLP-based approach therefore has the potential to improve operational TC intensity forecasts and is also a viable option for generating synthetic TCs for climate studies.

Significance statement

Scientists have searched for decades for breakthroughs in tropical cyclone intensity modeling to provide more accurate and timely tropical cyclone warnings. To this end, we developed a deep learning (DL)-based predictive model for North Atlantic 24- and 6-h intensity forecasts. We simulated the 2019 and 2020 tropical cyclones as if in an operational forecast mode and found that the model's 24-h intensity forecasts outperformed some of the most skillful operational models by 5%–22%. In addition, the 6-h intensity model produced realistic intensity labels for synthetic tropical cyclone tracks. These results highlight the potential of deep neural network–based models to improve operational hurricane intensity forecasts and synthetic tropical cyclone generation.

© 2021 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Wenwei Xu, wenwei.xu@pnnl.gov


1. Introduction

Tropical cyclones (TCs) pose a significant socioeconomic threat in the global tropics and subtropics. Accurate prediction of TC tracks and intensities therefore offers high societal benefits, providing actionable information on TC hazards that helps mitigate damage and loss of life. While significant progress has been made over the past few decades in TC track forecasting, TC intensity forecasting has improved only modestly (Rappaport et al. 2012; DeMaria et al. 2014; Balaguru et al. 2018). According to the U.S. National Weather Service's National Hurricane Center (NHC) official error trend (https://www.nhc.noaa.gov/verification/verify5.shtml), the average 24-h track forecast error decreased from 85.4 to 38.9 n mi (1 n mi = 1.852 km) between 1990–99 and 2010–18, a 54% reduction. In contrast, the 24-h intensity mean absolute error (MAE) over the Atlantic region decreased only from 9.8 kt (1 kt ≈ 0.51 m s−1) in 1990–99 to 8.7 kt in 2010–18, an 11% reduction. Intensity forecast improvements have accelerated in the past 5 years (Cangialosi et al. 2020), but consistent forecast skill is still lacking, especially for rapidly intensifying TCs (Knaff et al. 2020; Torn and DeMaria 2021). Further improvement in TC intensity forecasts has been partially hindered by the relatively large uncertainty in best track intensity records (Combot et al. 2020; Landsea and Franklin 2013; Torn and Snyder 2012). At the same time, there is an urgent need for TC intensity modeling breakthroughs to substantially improve our ability to forecast intensity at 1–7-day lead times.

Broadly, TC intensity forecast models can be divided into three categories: dynamical models or physics-based models, statistical models, and statistical–dynamical models, which blend dynamical model output with statistics. In this study we compare our results to several skillful operational models and consensus as listed in Table 1. Improvements in intensity forecasts during recent years have mostly been driven by 1) higher resolution dynamical models, which take advantage of increased computational power, to resolve inner-core dynamics and physics as well as air–sea interactions, 2) enhanced observations that enable better initializations of dynamical models, 3) consensus methods that combine forecasts from multiple dynamical and statistical–dynamical models (Sampson et al. 2008; Sampson and Knaff 2009; Simon et al. 2018), and 4) improvements in track forecasts (Cangialosi et al. 2020). At the same time, dynamical intensity models still have some limitations, such as an incomplete understanding of air–sea interaction processes (Lloyd and Vecchi 2011), the underestimation of convection and convective cloud processes (Fovell and Bu 2015), and the lack of real-time inner core TC observations (Shimada et al. 2018). Hence, statistical models and statistical–dynamical models remain competitive and are included in official intensity forecasts worldwide (Cangialosi 2019; Courtney et al. 2019).

Table 1. Operational TC intensity models and consensus for intercomparisons with MLP.

Historically, multiple linear regression has been the most commonly used method in statistical and statistical–dynamical intensity forecast models. As suggested by DeMaria and Kaplan (1994), the intensity change over a fixed time interval follows an approximately normal distribution with a mean close to zero, so multiple linear regression is a reasonable choice. Nevertheless, air–sea interaction and other processes in the TC core are highly nonlinear, which suggests that nonlinear methods have the potential to improve statistically based intensity forecast models. In particular, deep learning (DL), based on deep neural networks, is capable of learning highly complex and nonlinear relationships involving many predictors. DL has achieved remarkable success in computer vision and pattern recognition in recent years, yet its application to TC intensity prediction has been rare. A few pioneering studies that applied neural networks to forecast intensity (Sharma et al. 2013 for the western North Pacific; Chaudhuri et al. 2013 for the north Indian Ocean) were conducted before graphics processing units (GPUs) were widely used for training, so the trained networks were much smaller in terms of the number of nodes and trainable weights. The few recent studies that used DL to forecast TC intensity in the North Atlantic basin include the following:

  • The 2018 Climate Informatics Hackathon (Giffard-Roisin et al. 2018) received solutions from 35 teams to forecast 24-h intensity for global historical storms, including a few DL solutions leveraging convolutional neural networks on feature maps of atmospheric and oceanic conditions. The results revealed a significant overfitting problem in the submitted DL solutions, indicating the need to test thoroughly on unseen data.
  • Cloud et al. (2019) trained a multilayer perceptron (MLP) to predict Atlantic and eastern Pacific 3–72-h TC intensity change using 18 predictors from Hurricane Weather Research and Forecasting (HWRF) (Tallapragada et al. 2014) reforecast data from 2014 to 2016. Their neural network–based model exhibited excellent rapid intensification (RI) forecast skill, but showed no significant improvement over the observation-adjusted HWRF (HWFI) in terms of MAE.

In this study we took an approach similar to that of Cloud et al. (2019) but combined predictors from a statistical–dynamical model, the Statistical Hurricane Intensity Prediction Scheme (SHIPS) (DeMaria et al. 2005), with an MLP model that was extensively optimized in terms of its architecture and hyperparameters. Our contributions to this domain include the following four elements:

  1. Use of an MLP as an advanced statistical approach to predict 24-h TC intensity change using predictors from SHIPS.
  2. Use of automated optimization techniques to tune the neural network architecture and hyperparameters and fully explore the potential of the MLP in predicting TC intensity.
  3. Application of the MLP method to climate-scale TC studies by coupling a synthetic track model with an MLP-based intensity model to generate synthetic storms.
  4. Release of the data and methods of this study to promote comparable or more advanced development of machine learning (ML) algorithms for TC intensity forecasting.
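To make element 2 above concrete, automated hyperparameter optimization amounts to repeatedly sampling candidate MLP configurations and keeping the one with the best validation score. The sketch below is a generic random search over an illustrative search space; `validation_mae` is a hypothetical stand-in for training the MLP and scoring it on a validation split, not this study's code. Libraries such as Hyperopt (Bergstra et al. 2013a) follow the same pattern but propose candidates more intelligently (e.g., with a tree-structured Parzen estimator).

```python
import random

# Hypothetical MLP search space: layer count, width, learning rate, dropout.
# These ranges are illustrative assumptions, not the study's actual space.
def sample_config(rng):
    return {
        "n_layers": rng.randint(1, 4),
        "units": rng.choice([32, 64, 128, 256]),
        "learning_rate": 10 ** rng.uniform(-4, -2),
        "dropout": rng.uniform(0.0, 0.5),
    }

# Stand-in objective: in practice this would train the MLP on 1982-2017
# predictors and return its validation MAE (kt). Here it is a toy surrogate
# that rewards configurations near an arbitrary "good" region.
def validation_mae(cfg):
    return (abs(cfg["learning_rate"] - 1e-3) * 1e3
            + abs(cfg["units"] - 128) / 64
            + cfg["dropout"])

def random_search(n_trials=50, seed=0):
    """Sample n_trials configurations and return the best one and its score."""
    rng = random.Random(seed)
    best_cfg, best_mae = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        mae = validation_mae(cfg)
        if mae < best_mae:
            best_cfg, best_mae = cfg, mae
    return best_cfg, best_mae

best_cfg, best_mae = random_search(n_trials=200, seed=0)
```

More trials monotonically improve (or tie) the best score for a fixed seed, which is the basic guarantee any such search inherits.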

2. Data

Global data are used for model development, while testing is conducted only in the Atlantic basin. The dataset used for this study consists of SHIPS model inputs as predictors and 24- or 6-h intensity changes as predictands. The data sources are described below. To compare our model's performance to that of other models, we collected operational forecasts from several state-of-the-art models along with the NHC's official forecast.

a. SHIPS predictors

SHIPS, a statistical–dynamical model employed operationally at NHC, uses a multiple linear regression technique that features climatological, persistence, and synoptic predictors (DeMaria et al. 2005). Storm environment predictors are derived from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) and include zonal and meridional wind, shear, vorticity, divergence, etc. Examples of climatological predictors include the climatological sea surface temperature (SST) and the climatological depth of the 20°C isotherm at the TC location and time of the year, both of which are derived from the 2005–10 mean of the Navy Coupled Ocean Data Assimilation (NCODA; Cummings 2005) analyses. Oceanic predictors along the path of the TC include SST and oceanic heat content (OHC) derived from daily or weekly NCODA analyses. Several predictors related to brightness temperature are derived from GOES infrared images, which contain information about the strength and organization of convection (DeMaria et al. 2005). The feature selection process for the 24-h model is described in section 2d, and that for the 6-h model in section 4b. [The short name, long name, mean value, and standard deviation (STD) of the selected predictors for the two models are listed in Tables A1 and A2 in the appendix.] As shown in the appendix, the SHIPS developmental dataset includes more than 100 predictors, many of them available every 6 h, but only about 20% are selected for inclusion in the operational SHIPS forecast model, based on standard multiple linear regression significance tests.
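Significance-test screening of this kind can be sketched as an ordinary least squares fit that retains only the predictors whose coefficient t statistics exceed a cutoff. The example below is a generic illustration on synthetic data; the |t| > 2 threshold, variable names, and data are assumptions, not the SHIPS selection procedure.

```python
import numpy as np

def select_predictors(X, y, t_cutoff=2.0):
    """Keep columns of X whose OLS coefficient t statistic exceeds the cutoff."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # OLS coefficients
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)            # residual variance
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)         # coefficient covariance
    t = beta / np.sqrt(np.diag(cov))
    return [j for j in range(p) if abs(t[j + 1]) > t_cutoff]  # skip intercept

# Synthetic example: 5 candidate predictors, only columns 1 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + rng.normal(size=500)
selected = select_predictors(X, y)
```

With this sample size the two informative columns have very large t statistics and are virtually certain to be retained, while each noise column survives the cut only about 5% of the time.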

The reanalyzed 6-hourly SHIPS predictors, available from 1982 to 2017, are obtained from the National Oceanic and Atmospheric Administration's (NOAA's) Regional and Mesoscale Meteorology Branch (RAMMB) (http://rammb.cira.colostate.edu/research/tropical_cyclones/ships/index.asp, last accessed 29 October 2020). The operational 6-hourly SHIPS predictors from 2010 to 2020 were acquired directly from NHC, but the same dataset is available from RAMMB as well. Because the type of predictor data affects the model evaluation results, it is worth distinguishing between the reanalysis and operational datasets:

  • Reanalysis data (or reforecast data): Reanalysis is a scientific method that combines earth system observations and a numerical model to generate a synthesized estimate of the state of the system. The reanalyzed TC datasets are often released annually or after significant improvements have been made in the numerical models and data assimilation schemes, thereby incorporating more accurate observed track location and environmental conditions. A “perfect prog” approach is used for the SHIPS developmental dataset, in which the final best track positions are used and analysis fields valid at future times are used instead of model forecast fields.
  • Operational data (or real-time data): Unlike the reanalysis dataset, the operational predictors are estimates that are available when the NHC official forecasts are made. The operational dataset may contain inaccurate track information, inaccurate initial intensity and trends of intensity, along with other biases in environmental conditions due to the use of global model forecast fields. As expected, with the same statistical or statistical–dynamical model, TC intensity predictions based on operational data may have larger errors than those based on reanalysis data.

Although operational predictors are less accurate than reanalysis predictors, it is important to use the former for training to ensure that the data distributions in the model development environment and the final deployment environment are consistent. Due to the small sample size of the operational data, for training and validation purposes we included all reanalysis samples (1982–2017) in addition to the operational samples (2010–18). To evaluate the impact of using both reanalysis and operational data for an overlapping set of cases (for years 2010–17), two controlled experiments were conducted, one including the overlapping reanalysis data and the other excluding it. Testing on 2020 data showed that including the overlapping reanalysis cases improves the MLP's predictive skill, indicating that the small sample size is still a limiting factor for the MLP's performance. To further overcome the sample size limitations, we adopted a leave-one-year-out (LOYO) testing strategy, as described in detail in section 3c.

b. Intensity records

The TC track and intensity data for 1982–2019 were obtained from Kerry Emanuel’s Global Tropical Cyclone Data in Network Common Data Form format (ftp://texmex.mit.edu/pub/emanuel/HURR/tracks_netcdf/; last accessed 29 October 2020), which combines data from the NHC and the U.S. Navy’s Joint Typhoon Warning Center. The TC track and intensity data for 2020 were obtained from NOAA’s International Best Track Archive for Climate Stewardship (IBTrACS) (Knapp et al. 2010, 2018). The intensity change, calculated as the target time-step intensity minus current intensity, is used as the predictand in this study. Predictands are matched to predictors based on location and time.
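
The predictand construction described above can be sketched in a few lines; the 6-hourly storm record below is hypothetical, and the function is an illustrative stand-in for the matching step, not the authors' code.

```python
# Sketch of how the predictand is formed: the intensity change is the
# intensity at the target lead time minus the current intensity.

def intensity_change(track, lead_steps):
    """Return (time, dV) pairs, where dV = V(t + lead) - V(t).

    track: list of (time_index, intensity_kt) at 6-hourly steps.
    lead_steps: lead time in 6-h steps (4 for a 24-h change).
    """
    v = dict(track)
    out = []
    for t, vt in track:
        target = t + lead_steps
        if target in v:                      # predictand exists only if the
            out.append((t, v[target] - vt))  # storm survives to the target time
    return out

track = [(0, 35), (1, 40), (2, 50), (3, 65), (4, 80)]  # hypothetical TC
print(intensity_change(track, 4))  # 24-h change from t=0: 80 - 35 = 45 kt
```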

c. SHIPS, DSHP, LGEM, HWFI, and OFCL

To provide a comparison with the MLP forecasts, the operational forecasts from SHIPS, Decay-Statistical Hurricane Intensity Prediction Scheme (DSHP) (DeMaria et al. 2006), Logistic Growth Equation Model (LGEM) (DeMaria 2009), HWFI, and NHC Official Forecasts (OFCL) (Simon et al. 2018) for the Atlantic basin were acquired from the NHC’s operational forecast archive (https://ftp.nhc.noaa.gov/atcf/archive/; last accessed 26 October 2020) (Sampson and Schrader 2000). The DSHP is based on SHIPS but has an inland decay component. LGEM uses a dynamical prediction system, whereby the wind speed growth rate is determined by a subset of SHIPS predictors. OFCL is subjectively determined by NHC forecasters and is based on all the model predictions that are available within about 3 h after synoptic time.

d. Feature selection and data preprocessing

In the standard release of the SHIPS developmental dataset, more than 500 predictors are related to the 24-h intensity change. We first removed the predictors that are only available in the reanalysis, and then removed predictors that depend on TC intensity, such as the ocean age normalized by the maximum wind (NAGE). Most of the remaining features were time-dependent predictors (such as vertical shear) that are provided at each of the time steps of 0, 6, 12, 18, and 24 h. SHIPS's practice is to average the time-dependent predictors along the track (DeMaria et al. 2005). We conducted linear regression experiments using only the 24-h predictors and using only the 0–24-h average predictors, and found that the 24-h predictors resulted in models with smaller predictive errors. This result is consistent regardless of whether reanalysis data or operational data are used for training and testing. As a result, we retained only the 24-h time-step values of the time-dependent predictors. For TC persistence and trend, we retained the current intensity and the last 12-h intensity change (DELV-12), following common practice reported in the literature (Knaff et al. 2003; DeMaria et al. 2005), based on the intensity records described in section 2b. (The 121 final predictors used in this study, along with their statistics, are listed in the appendix, Table A1.) Missing values for the distance to the nearest landmass (DTL) are filled based on the projected track location. For the remaining missing values, we used the reanalysis mean to fill rows with missing predictors. The final 24-h intensity model predictors comprise a total sample size of 56 223 in the reanalysis data from 1982 to 2017 and 2772 in the operational data from 2010 to 2018. The predictors are normalized by removing the reanalysis data mean and dividing by the reanalysis data STD for each predictor.
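
The normalization step above can be sketched as follows; the arrays are hypothetical, and the key point is that the reanalysis statistics, not the operational ones, are applied to both datasets so they share one scale.

```python
import numpy as np

# Minimal sketch of the preprocessing described above: predictors are
# standardized with the *reanalysis* mean and STD, and the same statistics
# are then applied to the operational data.

rean = np.array([[10., 200.], [12., 220.], [14., 240.]])   # reanalysis samples
oper = np.array([[11., 230.]])                             # operational samples

mu, sigma = rean.mean(axis=0), rean.std(axis=0)

rean_norm = (rean - mu) / sigma
oper_norm = (oper - mu) / sigma   # reanalysis statistics, not operational
```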

The features for the 6-h lightweight model are selected differently and are described in section 4b. The final 6-h intensity model predictors comprise a total sample size of 62 590 in the reanalysis data from 1982 to 2017 and 3005 in the operational data from 2010 to 2018. A summary of the final data availability for training, validation, and testing at the basin level is given in Table 2 (the 24-h model) and Table 3 (the 6-h model).

Table 2.

Summary of the availability of data for the 24-h intensity model.

Table 3.

Summary of data availability for the 6-h intensity model.


3. Methods

a. Multilayer perceptron

DL is part of a broader family of ML methods based on artificial neural networks, with architectures including the MLP, deep belief networks, recurrent neural networks, and convolutional neural networks. Here, we start with a relatively simple architecture, the MLP, which, like SHIPS, takes tabular predictors as input. An MLP is a DL model with multiple feed-forward, fully connected hidden layers between an input layer and an output layer. Each hidden layer computes a linear combination of the outputs from the previous layer and applies a nonlinear activation function. Each layer thus transforms its input into a slightly more abstract and composite representation, making it possible for the MLP to learn highly complex and nonlinear relationships involving many predictors. MLPs can be used for both classification and regression.
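
The layer-by-layer computation described above can be sketched as a forward pass; the layer sizes and random weights below are hypothetical and chosen only to illustrate the structure (linear combination followed by a nonlinear activation, with a linear output for regression).

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, layers):
    """layers: list of (W, b); the last layer is linear (regression output)."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)        # linear combination + nonlinear activation
    W, b = layers[-1]
    return h @ W + b               # linear output, e.g., intensity change (kt)

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(9, 16)), np.zeros(16)),    # 9 predictors -> 16 nodes
          (rng.normal(size=(16, 16)), np.zeros(16)),   # second hidden layer
          (rng.normal(size=(16, 1)), np.zeros(1))]     # single regression output
x = rng.normal(size=(1, 9))
print(mlp_forward(x, layers).shape)
```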

b. Neural networks optimization

To optimize the hyperparameters of our model architecture, we use a Bayesian search algorithm called the Tree of Parzen Estimators (TPE) (Bergstra et al. 2011, 2013b), as implemented in the hyperopt Python package (Bergstra et al. 2013a). TPE is a sequential model-based optimization algorithm that uses a history of hyperparameter settings {x0, x1, …, xt} and the corresponding objective function evaluations {f(x0), f(x1), …, f(xt)} to construct an approximation of f, denoted f̃, that is computationally easier to evaluate and optimize. The approximation f̃ is then optimized with respect to an improvement criterion (Jones 2001), which finds xt+1 such that f(xt+1) is likely to be near a minimum and is likely to produce information that, when used to update f̃, will reduce the dissimilarity between f̃ and f. The function f is then evaluated at xt+1, and the value f(xt+1) is used to update f̃. This process continues for a predefined number of iterations or until a stopping condition is reached. In our application, we specified the number of iterations as 500 and allocated 7 days of computing time on a GPU. The optimization algorithm iterated through 257 combinations before reaching the 7-day time limit, and the best performing model was subsequently selected from the search record.

When TPE was used in Experiment I, we searched over the number of hidden layers, the number of nodes in each layer, the activation function in each layer, the batch size, the learning rate, and the regularization coefficient. As the objective function, we used the validation MAE at the epoch when early stopping was reached. Early stopping is triggered when the best validation MAE fails to decrease by more than 0.05 kt over 10 epochs. The values of the hyperparameters we searched over are shown in Table 4.
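
The early-stopping rule stated above (no improvement of more than 0.05 kt in the best validation MAE within 10 epochs) can be sketched as follows; the MAE sequence is hypothetical.

```python
def early_stop_epoch(val_maes, min_delta=0.05, patience=10):
    """Return the epoch (0-based) at which training stops, or None."""
    best = float("inf")
    stale = 0
    for epoch, mae in enumerate(val_maes):
        if mae < best - min_delta:   # a "real" improvement resets the counter
            best = mae
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return epoch
    return None                      # criterion never triggered within the run

maes = [9.0, 8.5, 8.2, 8.18] + [8.17] * 12   # improvement stalls after epoch 2
print(early_stop_epoch(maes))
```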

Table 4.

Hyperoptimization search space.


While TPE was used in Experiment I, the model in Experiment II was lightweight enough to allow for a more thorough, though computationally demanding, search algorithm: grid search, which exhaustively examines every possible combination of hyperparameter choices. The implementation details for the two search algorithms can be found in the experiments and results section.

c. LOYO testing scheme

As shown in Fig. 1, to overcome the challenge of the observation period being too short for training a DL model, we designed a LOYO testing scheme that allows individual years to be tested while the majority of the data are used for training. The idea of LOYO can be illustrated with the example of testing the 24-h intensity model on operational forecasts for the Atlantic 2017 season: we trained and validated the model with all the available data for all years and all basins, including both reanalysis and operational data, except the 2017 Atlantic reanalysis and operational data; the model was then tested on the 2017 Atlantic operational data. To test the model on a different year, e.g., 2018, the model was reinitialized and trained from scratch. In this way, the model was only evaluated on unseen testing data, which provided a fair way to evaluate model performance for different years while leveraging the limited number of observations. Training and validation data follow a 90%–10% split, where the validation data are used for early stopping during training.
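
The LOYO split described above can be sketched as follows; the sample tags and the `loyo_splits` helper are hypothetical illustrations of the scheme (hold out one year/basin from both reanalysis and operational data, test only on that year's operational data, and split the rest 90%–10%).

```python
import random

def loyo_splits(samples, test_year, test_basin="AL"):
    test = [s for s in samples if s["year"] == test_year
            and s["basin"] == test_basin and s["source"] == "operational"]
    # exclude BOTH reanalysis and operational data for the held-out year/basin
    pool = [s for s in samples if not (s["year"] == test_year
                                       and s["basin"] == test_basin)]
    random.shuffle(pool)
    n_val = len(pool) // 10          # 90%-10% train/validation ratio
    return pool[n_val:], pool[:n_val], test

samples = [{"year": y, "basin": b, "source": src}
           for y in (2016, 2017, 2018) for b in ("AL", "EP")
           for src in ("reanalysis", "operational")]
train, val, test = loyo_splits(samples, test_year=2017)
```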

Fig. 1.

Data processing and model training flowchart.

Citation: Weather and Forecasting 36, 4; 10.1175/WAF-D-20-0104.1

Although the LOYO testing scheme solved the data limitation problem, it gave the MLP an advantage over the other operational models being compared, because the MLP was exposed to more years of training data. For example, to test model performance for 2010, the MLP was exposed to a training dataset that includes 2011–18 data, while the operational models back in 2010 did not have access to such data. To address this problem, additional independent tests were conducted for 2019 and 2020 in the first experiment to further validate the intermodel comparisons.

4. Experiments and results

a. 24-h intensity model for operational forecasts

The first experiment trains an MLP model on the 121 selected SHIPS predictors to predict the change in wind speed intensity over a 24-h period. The model training follows the LOYO strategy, where the model is always tested on an unseen year's operational data for the tested basin. The best architecture and hyperparameters were found using the Bayesian TPE algorithm described in section 3b. The optimal network was found to have two hidden layers with 2048 nodes in each layer. To evaluate the MLP model performance, we compared its 24-h intensity change error statistics with those of SHIPS, DSHP, LGEM, HWFI, and OFCL. For a homogeneous comparison, we only kept the events for which operational forecast results are available from all models.

Figure 2 shows that the MLP model outperformed the statistical–dynamical models SHIPS, DSHP, and LGEM in most LOYO-tested years; the 2010–18 mean MAE was lower by 1.91, 1.01, and 0.87 kt relative to the three models, respectively. The 2010–18 multiyear average MAE of the MLP model is slightly lower (by 0.46 kt) than that of HWFI and slightly higher (by 0.18 kt) than that of OFCL. To simulate how the MLP might perform in a real-time operational forecast mode, additional independent tests were conducted using 2019 and 2020 Atlantic operational data. Again, the MLP outperformed all three statistical–dynamical models. In 2019 the MLP had a slightly higher MAE than HWFI (by 5%) but performed comparably with HWFI in 2020. While Fig. 2 uses MAE as the only metric, Table 5 includes four more metrics to evaluate the MLP: mean error (ME), root-mean-square error (RMSE), R2, and prediction STD. As an error metric, RMSE is more sensitive to large errors than MAE. The fact that the MLP has a higher MAE but lower RMSE than HWFI in the 2019–20 independent tests indicates that the MLP is more skillful than HWFI in correctly predicting extreme intensity changes. According to Na et al. (2018), the 24-h intensity change STD in OFCL is only two-thirds of that in reality, indicating that not enough cases of large intensity change are forecast, which is a potential area of improvement. The MLP has the highest STD among all the models in both the LOYO tests (75% of the observed STD) and the independent tests (87% of the observed STD), suggesting that the MLP can predict a wider range of intensity changes than any of the other models compared. In terms of bias, all models have MEs that fluctuate from year to year, which is linked to the fact that all models suffer from systematic biases when forecasting extreme intensity changes, as further explained in the error distribution analysis below.
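
The verification metrics used here (ME, MAE, RMSE, R2, and prediction STD) can be computed as below; the forecast and observation arrays are hypothetical intensity changes in knots.

```python
import numpy as np

def metrics(obs, pred):
    err = pred - obs
    return {
        "ME":   err.mean(),                       # bias
        "MAE":  np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),       # penalizes large errors more
        "R2":   1.0 - (err ** 2).sum() / ((obs - obs.mean()) ** 2).sum(),
        "STD":  pred.std(),                       # spread of the predictions
    }

obs  = np.array([10., -5., 30., 0., -20.])
pred = np.array([ 8., -2., 22., 1., -15.])
for k, v in metrics(obs, pred).items():
    print(f"{k}: {v:.2f}")
```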

Fig. 2.

The 24-h intensity change MAE (kt) from the MLP, SHIPS, DSHP, LGEM, HWFI, and OFCL, tested on 2010–20 Atlantic TCs over the same 6-hourly locations. While the MLP model was tested using the LOYO method for 2010–18, it was tested truly independently in 2019 and 2020, as if in a real-time mode. The shading around the MLP line denotes the 95% confidence interval based on the bootstrap method with sample size 40 and 10 000 repetitions.


Table 5.

The 24-h intensity change model error statistics and model prediction STDs. The MLP model was tested using the LOYO testing scheme during the 2010–18 period and was tested independently in 2019 and 2020. The data for the other models (SHIPS, DSHP, LGEM, and OFCL, as well as HWFI) come from the NHC's operational forecast archive. During the 2010–18 testing period, 2464 6-hourly locations were tested, and the observed 24-h intensity change STD is 17.17 kt. During the 2019–20 independent testing period, 828 6-hourly locations were tested, and the observed 24-h intensity change STD is 19.83 kt.


Figure 3 compares the 24-h intensity forecast error distributions from the MLP, SHIPS, DSHP, LGEM, HWFI, and OFCL for the 2010–18 Atlantic TCs. Na et al. (2018) suggested that the OFCL error is strongly anticorrelated with TC intensity change, tending in particular to produce negative errors (underforecasts) for rapid intensification (RI; a 30-kt or greater intensity increase over a 24-h period) events and positive errors (overforecasts) for rapid weakening (RW; a 30-kt or greater intensity decrease over a 24-h period) events. In our analysis, OFCL made 26 large overforecasts (>30 kt) for RW events and 34 large underforecasts (<−30 kt) for RI events. With knowledge of where land is located, DSHP and HWFI made significantly fewer overforecasts for RW events (21 and 9, respectively) compared to SHIPS (50). By comparison, the MLP model made only 6 large overforecasts during the same period, the best among all models. The MLP model made the same number of large underforecasts for RI events as OFCL, better than all other models.

Fig. 3.

Distribution of 24-h intensity forecast error frequencies from the different models: (a) MLP, (b) SHIPS, (c) DSHP, (d) LGEM, (e) HWFI, and (f) OFCL, with respect to the observed 24-h intensity change for 2010–18 Atlantic TCs. All frequencies are labeled by numbers in each grid cell. The red bounding boxes highlight large overforecasts (>30 kt) for rapid weakening (RW) events, and the magenta bounding boxes highlight large underforecasts (<−30 kt) for rapid intensification (RI) events. The numbers on top of the red and magenta boxes give the event counts within the bounding boxes.


Besides assessing the model error distributions, it is also important to examine the degree of independence (Sampson et al. 2008) among the forecasting models and the consensus. As suggested by Sampson et al. (2008), larger degrees of independence result in larger improvements of the multimodel consensus. Here we calculated correlations between the biases of the different models and the consensus for 2010–18, as shown in Fig. 4. The correlations between the MLP and the other models remain below 0.8, while the correlations among SHIPS, DSHP, and LGEM are above 0.8, indicating that the MLP maintains a relatively higher degree of independence from the SHIPS-related models despite using the same predictors. The correlation between the MLP and HWFI (0.69) is also relatively low, further indicating that the MLP has the potential to improve the consensus.

Fig. 4.

Heat map of correlation coefficients of 24-h model biases among the different models and the consensus for 2010–18. A smaller correlation coefficient indicates a higher degree of independence between models and the consensus.


Although the MLP model is not trained specifically for RI detection, we also assessed its robustness from a different perspective by converting continuous intensity change predictions to binary RI classifications. Table 6 shows RI detection statistics from LOYO tests conducted for the 2017–18 seasons, as well as from independent tests conducted for the 2019–20 seasons. For the 54 observed RI events in the Atlantic basin in 2017–18, the MLP made the most correct predictions of all the models and the fewest missed predictions. Although the MLP produces a slightly higher number of false alarms (FA; 5 versus 3 for HWFI and OFCL), it has the highest Gilbert skill score (GSS) and Peirce skill score (PSS) of all models. The GSS is a metric commonly used in natural disaster prediction that answers the question of how well the model detected RI events relative to the observed RI events (Wilks 2006). The PSS is calculated as the difference between the probability of detection and the probability of false detection (Wilks 2006), and answers the question of how well the model separated the RI events from the non-RI events. For the 50 observed RI events in the Atlantic basin in 2019–20, the MLP made the most correct predictions and has the highest GSS and PSS of all the models except OFCL. It is also interesting to note that SHIPS, DSHP, LGEM, and OFCL made significantly more correct RI detections in the 2019–20 seasons than in the 2017–18 seasons. HWFI made the same number of correct RI detections in 2019–20 as in 2017–18; however, it produced significantly more FAs (9 in 2019–20 compared to 3 in 2017–18). The year-to-year variability in an individual model's performance may be attributed to its continued development and to interannual variability in RI characteristics. Again, the MLP's cross-validation uncertainty is small, with even the lower bound of the 95% confidence interval achieving performance comparable to or better than HWFI, indicating that the MLP's superior skill in RI classification is statistically significant. These results suggest that the MLP is a powerful tool for detecting RI events, consistent with the conclusions of Cloud et al. (2019). However, even though the skill of the MLP is statistically significant, the detection rate is still only around 20%, so considerable improvement is needed to make it a viable guidance model for RI.
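
The two skill scores defined above follow standard forms (Wilks 2006) and can be computed from a 2x2 contingency table; the counts below are hypothetical, not the paper's.

```python
def gilbert_skill_score(hits, fa, misses, cn):
    """GSS (equitable threat score): hits corrected for chance agreement."""
    n = hits + fa + misses + cn
    hits_random = (hits + misses) * (hits + fa) / n   # expected chance hits
    return (hits - hits_random) / (hits + misses + fa - hits_random)

def peirce_skill_score(hits, fa, misses, cn):
    """PSS = probability of detection minus probability of false detection."""
    pod  = hits / (hits + misses)
    pofd = fa / (fa + cn)
    return pod - pofd

hits, fa, misses, cn = 12, 5, 42, 700   # hypothetical RI contingency counts
print(round(gilbert_skill_score(hits, fa, misses, cn), 3))
print(round(peirce_skill_score(hits, fa, misses, cn), 3))
```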

Table 6.

RI detection tested in the Atlantic basin during 2017–18 (LOYO test) and 2019–20 (independent test). The MLP RI statistics come from a majority voting scheme. The numbers in parentheses in the MLP row denote the 95% confidence interval based on the bootstrap method with sample size 40 and 10 000 repetitions.


We conclude that the MLP has significant skill in predicting 24-h intensity change. It outperformed the operational statistical–dynamical models and is comparable to the leading dynamical model. The model's cross-validation uncertainty is small, suggesting that it is robust and can produce reliable real-time operational forecasts. The improved prediction power of the MLP model is due to three factors: 1) the nonlinear activation functions enable the MLP to simulate highly nonlinear and complex processes, which likely describe the TC intensification process better than linear functions; 2) the Bayesian hyperparameter search optimized the architecture and hyperparameters; and 3) the global reanalysis SHIPS predictors available since 1982 provided sufficient samples for proper training of a DL algorithm.

b. Lightweight 6-h intensity model for climate studies

In addition to forecasting changes in wind intensity for operational forecasts, the community also wants to understand the evolution of long-term TC risk associated with climate change. In such applications, environmental predictors from global climate models are often fed into a synthetic TC model, as described by Emanuel et al. (2006) and Lee et al. (2018), to generate synthetic TC tracks along with intensities. Here, the lightweight model is designed to only depend on the large-scale environmental conditions that would be available from a global climate model. In this setting, since the intensity is updated every 6 h, the target becomes a 6-h intensity change. Also, because the interest here is to better predict TC intensities in climate studies instead of in operational forecasting, the testing is performed only on the reanalysis dataset.

We narrowed the list of predictors based on the literature. Lin et al. (2017) demonstrated that reducing the number of predictors from 26 to 11 yields similar R2 values with linear and mixture models for 6-h intensity change predictions. Here, we adopted seven important variables from Lin et al. (2017): the current intensity, last 6-h intensity change, vertical shear, 200-hPa zonal wind, maximum potential intensity, latitude, and longitude. We added two variables based on univariate feature importance analysis: 1000-hPa equivalent potential temperature, and DTL. (The details of the nine features and associated statistics are listed in the appendix, Table A2).

Because the lightweight MLP uses only nine predictors and runs considerably faster than the 24-h model, we were able to conduct an exhaustive grid search instead of a Bayesian search to determine the best neural network architecture. We used GridSearchCV in the Scikit-learn library (Pedregosa et al. 2011) to perform the grid search. In addition to the other hyperparameters, we searched over the number of hidden layers, with choices ranging from two to five, and the number of neurons in each layer, with choices of 8, 16, 32, 64, 128, or 256. Threefold cross validation was conducted during the grid search. The optimal network was found to have five hidden layers with 128, 256, 128, 256, and 256 nodes, respectively. The remaining hyperparameters are specified as follows: Adam optimization, an adaptive learning rate, ReLU activation, L2 regularization with alpha = 0.0005, a maximum of 200 epochs, and no early stopping.
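
A minimal sketch of this search with Scikit-learn's GridSearchCV is shown below. The data are synthetic and the grid is deliberately tiny so the example runs quickly; the paper's grid spans two to five hidden layers with 8–256 neurons per layer.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))                 # nine predictors, as in the text
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

param_grid = {
    "hidden_layer_sizes": [(16,), (16, 16)],  # number of layers and neurons
    "alpha": [5e-4, 1e-3],                    # L2 regularization strength
}
search = GridSearchCV(
    MLPRegressor(activation="relu", solver="adam",
                 learning_rate="adaptive", max_iter=300, random_state=0),
    param_grid, cv=3,                         # threefold cross validation
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print(search.best_params_)
```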

Figure 5 shows the 6-h intensity change testing MAE, ME, and R2 for the 1982–2017 Atlantic basin using the LOYO testing scheme. The MLP consistently performs better than the simple linear regression (LR) model: the MAE from the MLP is lower and the R2 is higher across all years. The 1982–2017 mean MAE from the MLP is 3.26 kt, compared to 3.63 kt from the LR. The 1982–2017 mean R2 from the MLP is 0.42, compared to 0.29 from the LR. Both the MLP and LR models have a negative bias when averaged over the testing years (mean ME for the MLP: −0.21 kt; mean ME for the LR: −0.24 kt); however, the biases are small compared to the observed 6-hourly intensity change STD (6.38 kt). Because the NHC forecast archive does not provide 6-h intensity predictions, our model comparison is limited to the MLP and the LR using the same predictors. The MAE STD from 10 experiments generated by changing the training and validation split and initializing the weights differently is small, as shown by the error bars (mean STD of 0.065 kt), indicating that the MLP model is robust. Again, the improved predictive skill of the MLP for 6-hourly intensity is due to the superiority of the DL algorithm, the automated neural network optimization, and the abundance of more than 60 000 entries of reanalyzed SHIPS predictors available for model training.

Fig. 5.

The 6-h intensity change (a) MAE (kt), (b) ME (kt), and (c) R2 from the MLP model for each year of 1982–2017 tested on North Atlantic TCs. The black error bars indicate the STD from 10 experiments with different training and validation splits and different random weight initializations.


To demonstrate the usefulness and possible applications of the 6-hourly MLP intensity model, we coupled it with a synthetic track model to generate synthetic Atlantic TCs of the current climate. The track model is based on the method of Emanuel et al. (2006) and includes the additional improvement of a spatially varying beta drift, as described by Zhao et al. (2009). For a detailed description of the track model application, see Kelly et al. (2018). We generated 50 000 synthetic tracks with MLP-calculated intensities. While storm longitude, latitude, and large-scale wind come directly from the track model, maximum potential intensity and equivalent potential temperature are calculated from the ERA-Interim monthly reanalysis dataset (Berrisford et al. 2011). To generate realistic and yet random environmental conditions, for each storm a random year between 1979 and 2018 is selected, and the ERA-Interim monthly environmental variables from that year and month are used for the MLP intensity model. Tracks are terminated when the intensity falls below 25 kt, as described by Emanuel et al. (2006). Figures 6a and 6b show that the overall distribution of synthetic TC tracks resembles that of the observed events. Most synthetic tracks intensify in the Caribbean Sea and the Gulf of Mexico, and tracks are effectively terminated when TCs reach the North American continent, which is consistent with the physics of TC intensification. In Fig. 6c, the lifetime maximum intensity distribution from the synthetic TCs is largely similar to the observed distribution. Synthetic TCs can reach a maximum intensity of 148 kt, indicating that the model can produce storms of category-5 strength. Despite this, the model slightly underestimates the number of intense category-3 and above TCs compared to observations. This is likely because monthly mean values, instead of daily or 6-hourly values, are used to represent the large-scale environmental conditions in this particular application, which smooths out the sharp gradients and extreme conditions that are linked to the strongest TCs. Another possible reason is that the small negative bias (−0.21 kt) in the 6-h intensity model may have caused an overall underestimation of lifetime maximum intensity. In future applications, bias correction may help generate more realistic future synthetic TCs; the application demonstrated here, however, is not bias corrected. Even so, the model is able to simulate 1534 category-4 and category-5 TCs out of the 50 000 synthetic storms. Figure 6d compares TC landfall counts at 51 major coastal cities along the U.S. East and Gulf Coasts from the model and observations. While the observed landfalls are limited in number, the synthetic approach generated an abundance of landfall events (20 times more than observed). The synthetic landfalling TC counts correlate well with the observed counts for the 51 cities, with a correlation coefficient of 0.77. This further supports the ability of the model to realistically represent the TC track distribution in a given climate.
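
The landfall-counting rule used in Fig. 6d (a TC is counted once per city if its track passes within a 1° radius of the city center) can be sketched as follows. The coordinates are hypothetical, and a flat latitude–longitude distance is used for simplicity; the paper does not specify the exact geometric test beyond the 1° radius.

```python
def count_landfalls(tracks, city, radius_deg=1.0):
    """Count tracks passing within radius_deg of city = (lat, lon)."""
    clat, clon = city
    count = 0
    for track in tracks:              # track: list of (lat, lon) points
        if any((lat - clat) ** 2 + (lon - clon) ** 2 <= radius_deg ** 2
               for lat, lon in track):
            count += 1                # a track is counted at most once per city
    return count

city = (25.8, -80.2)                  # hypothetical city center
tracks = [
    [(23.0, -78.0), (25.5, -80.0)],   # passes within 1 degree -> counted
    [(20.0, -60.0), (22.0, -65.0)],   # stays far away -> not counted
]
print(count_landfalls(tracks, city))
```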

Fig. 6.

(a) 100 synthetic tracks, with intensity shown by line color, randomly selected from the 50 000 synthetic tracks. (b) 100 randomly selected observed tracks after 1970. (c) TC lifetime maximum intensity distribution comparison between the 50 000 synthetic tracks and 718 observed tracks (1970–2013) in the North Atlantic basin. (d) TC landfall counts at 51 major coastal cities on the U.S. East Coast and the Gulf of Mexico, comparing the 50 000 synthetic tracks and 1753 observed tracks (1851–2013). A TC is counted once if its track passes within a 1° radius of the city center coordinates.


c. Predictor importance ranking

To gain insight into why the MLP model was able to achieve competitive predictive skills using the same predictors available to SHIPS, we ranked the 121 predictors from Experiment I according to the Garson variable importance score. The Garson variable importance score for a predictor is calculated from the absolute values of the products of all the neural network weights that are connected to the predictor (Goh 1995). The scores are subsequently scaled so that the scores from all predictors add up to 1.
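
The Garson calculation can be sketched for a single-hidden-layer network as below; the paper's 24-h model has two hidden layers, for which the same idea chains the absolute weight products through every layer. The weights here are hypothetical.

```python
import numpy as np

def garson_importance(w_in, w_out):
    """w_in: (n_inputs, n_hidden) weights; w_out: (n_hidden,) output weights."""
    c = np.abs(w_in) * np.abs(w_out)          # |w_ij| * |v_j| per connection
    r = c / c.sum(axis=0)                     # each predictor's share per hidden node
    scores = r.sum(axis=1)
    return scores / scores.sum()              # scale so all scores add up to 1

w_in = np.array([[0.8, 0.1],
                 [0.2, 0.9],
                 [0.1, 0.1]])                 # 3 predictors, 2 hidden nodes
w_out = np.array([1.0, 0.5])
print(garson_importance(w_in, w_out).round(3))
```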

Table 7 shows the top 10 most important predictors for the 24-h MLP model. Current intensity (vs0), with an importance score of 0.101, is the most important predictor, which is consistent with Cloud et al. (2019). DTL is the second most important predictor, contributing close to 5% of the predictability. A TC's maximum wind speed typically decreases by around 50% during the first 12 h after landfall, and land interaction has long been known to be important for intensity forecasting (DeMaria et al. 2006; Cangialosi et al. 2020). The tangential wind predictors TWXC and V000 describe the quality of the TC cyclonic structure and are ranked the third and seventh most important predictors. The climatological depth of the 20°C isotherm (CD20) is related to the ocean heat content that controls the energy supply of TCs (Wada and Usui 2007). The uplift vertical velocity of a parcel (VVAC) is related to convective instability and is an important part of the TC intensification process (Wang 2014). DeMaria and Kaplan (1999) showed that air temperature and zonal wind are significant predictors in SHIPS, and in our MLP feature ranking, temperature at 250 and 200 hPa (T250 and T200) as well as 200-hPa zonal wind (U200) made the list of top 10 predictors. Brightness temperature derived from satellite images contains information related to the strength of convection and hence is a good indicator of intensity change (DeMaria et al. 2005; Shimada et al. 2018).

Table 7.

The top 10 most important predictors for 24-h MLP. Relative feature importance scores are determined by the Garson variable importance measure. The feature importance scores of all 121 predictors add up to 1.


The MLP may rank the significance of a predictor differently from a linear method for two reasons: 1) with 121 predictors, the MLP model leverages more information to make predictions than other statistical–dynamical models and therefore may use the predictors differently; and 2) the MLP is a neural network–based nonlinear model, which allows for more complex relationships between the predictors and the predictand than the single-coefficient relationships of linear regression. Predictors previously not shown to be important in a linear method may turn out to be useful in a nonlinear approach, and vice versa. Despite these differences between the MLP and more traditional multivariate linear regression methods, it may still be surprising, for instance, that vertical shear is not among the top 10 most important predictors. We found that vertical wind shear and U200 (ranked tenth in feature importance) have a significant correlation coefficient of 0.55, which indicates that the effects of wind shear are, at least partially, included through the use of U200. Although the predictor importance ranking offers an opportunity to understand how the MLP makes its predictions, we acknowledge that the ranking here does not constitute feature selection recommendations for other modeling practices that are not neural network based.

5. Discussion and conclusions

Two DL experiments were conducted to predict TC intensity change using a DL model, an MLP, trained with predictors that are available from SHIPS, a well-known statistical–dynamical model. In the LOYO tests of the first experiment, the MLP predicted 24-h intensity change with a 20% lower MAE than SHIPS for Atlantic forecasts that used only input available in real time. In the 2019 and 2020 independent tests, where the MLP was tested as if in a real-time operational forecast mode, it again outperformed all three statistical–dynamical models, with an MAE lower than those of SHIPS, DSHP, and LGEM by 22%, 5%, and 8%, respectively. When compared with the leading dynamical model HWFI, the MLP has slightly lower RMSE (by 1%), slightly higher MAE (by 2%), and the same R2, indicating that the MLP’s forecasting skill is broadly comparable to that of HWFI. The MLP also detected RI events more accurately than the other models and had the highest GSS and PSS. In the second experiment, a lightweight MLP using only nine predictors consistently outperformed linear regression for 6-hourly predictions and achieved a model–data R2 of 0.42 on 1982–2017 data, significantly higher than values reported in previous studies (Lin et al. 2017). When coupled with a synthetic track model, the MLP generated realistic synthetic TCs in the Atlantic basin, demonstrating the possibility of using this DL-based intensity model to generate large quantities of synthetic TCs for the current climate and for hypothetical climate scenarios.
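
For reference, the two categorical skill scores used in the RI verification, GSS (also called the equitable threat score) and PSS, follow directly from a 2 × 2 contingency table of forecast versus observed RI events. This is a generic sketch of the standard definitions, not code from this study:

```python
def gilbert_skill_score(hits, misses, false_alarms, correct_negatives):
    """GSS (equitable threat score): hits adjusted for chance hits,
    relative to all forecast or observed events."""
    n = hits + misses + false_alarms + correct_negatives
    chance_hits = (hits + misses) * (hits + false_alarms) / n
    return (hits - chance_hits) / (hits + misses + false_alarms - chance_hits)

def peirce_skill_score(hits, misses, false_alarms, correct_negatives):
    """PSS: probability of detection minus probability of false detection."""
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return pod - pofd
```

Both scores equal 1 for a perfect forecast and 0 for a forecast with no skill relative to chance, which is why they are suited to rare events such as RI.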

The MLP has not only demonstrated competitive predictive skill but also maintained a relatively high level of independence from other operational models because of its DL-based modeling framework. As a result, the MLP could be a meaningful addition to the NHC consensus methods to further improve official forecast skill, thereby helping address a task that has challenged scientists for decades. Previous literature indicates that implementing DL for TC intensity prediction has certain drawbacks, including overfitting and optimization challenges. In this study, we overcame the first challenge by using a LOYO testing scheme that keeps the majority of the data for training/validation while allowing the model to be thoroughly tested on unseen data. To further verify that the MLP’s superior performance is not due to overfitting, we supplemented the LOYO testing with additional independent tests and simulated how the MLP would perform in a real-time operational forecast mode. This study also addressed the optimization problem by implementing two automated optimization methods for the DL architecture and hyperparameters. The techniques and methods used in this study could pave the way for future applications of DL in weather- and climate-related studies.

The methodology and experiments can be further improved. Although trained on a global dataset, the models have been tested only in the Atlantic basin, not on a global scale; it would be interesting to see the predictive skill of the proposed approach in other basins. It would also be interesting to see whether the MLP can be further improved through the addition of more real-time satellite observations of the upper ocean and atmosphere (Balaguru et al. 2020; Su et al. 2020). Another major limitation of the current version of the DL model is that it provides forecasts only for a 24-h period. NHC and the Joint Typhoon Warning Center (JTWC) currently provide intensity forecasts for up to 5 days and are considering extending those to 7 days. The 24-h intensity forecasts rely heavily on persistence, but the longer-range forecasts do not; applying the DL method beyond 24 h is needed to determine its usefulness for operational forecasting. Also, because the longer-range forecasts depend more heavily on the time-dependent predictors, the use of time averaging versus just the value at the forecast time, as in the current DL model, will need to be reevaluated. The MLP-based intensity models may be further improved by extending the DL architecture search beyond five hidden layers, which may result in deeper and more powerful models. To incorporate the methodology in real-time operational forecasts, predictors from models other than SHIPS can be added to further improve performance. Last, the MLP objective function was defined as the MAE for the first experiment, but the model can be tailored to a specific purpose by modifying the objective function; for example, a specialized RI detection model with even more true detections could be created by rewarding true positives in the objective function.
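
As an illustration of the last point, one hypothetical RI-aware objective would up-weight the error on samples whose observed intensity change meets an RI threshold. The 30-kt threshold and the weight value below are illustrative assumptions, not settings from this study:

```python
import numpy as np

# Illustrative RI threshold (kt per 24 h); operational RI definitions
# may differ (this value is an assumption for the sketch).
RI_THRESHOLD = 30.0

def ri_weighted_mae(y_true, y_pred, ri_weight=3.0):
    """Weighted MAE that emphasizes observed RI cases.

    Samples with observed intensity change >= RI_THRESHOLD receive
    weight ``ri_weight`` (a hypothetical tuning knob); all others
    receive weight 1. With ri_weight = 1 this reduces to plain MAE.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.where(y_true >= RI_THRESHOLD, ri_weight, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))
```

Training the MLP with such a loss would trade some overall MAE for a higher probability of detecting RI events, which is the kind of specialization the paragraph above describes.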

Acknowledgments

The operational forecast portion of this research was supported by the Deep Science Agile Initiative at Pacific Northwest National Laboratory (PNNL). It was conducted under the Laboratory Directed Research and Development Program at PNNL. PNNL is a multiprogram national laboratory operated by Battelle for the U.S. Department of Energy under Contract DE-AC05-76RL01830. The synthetic tropical cyclone portion of this research was supported by the Multisector Dynamics program areas of the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research as part of the multiprogram, collaborative Integrated Coastal Modeling (ICoM) project. K.B. acknowledges support from the Regional and Global Modeling and Analysis Program of the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER) and from NOAA’s Climate Program Office, Climate Monitoring Program (Award NA17OAR4310155). The work would not have been possible without the data providers; thanks to RAMMB/CIRA for offering the SHIPS predictors. Thanks to NHC and JTWC for the track and intensity records, and thanks to Dr. K. Emanuel for compiling the global TC records in netCDF format. Also, thanks to NHC for providing the operational forecast archive. We thank Dr. J. Knaff and two anonymous reviewers for their insightful comments that helped improve this manuscript significantly.

Data availability statement

To encourage further development of ML algorithms for TC intensity forecasting using comparable datasets and methods, the processed data and the code to make intensity forecasts with the MLP model are made publicly available. The training, validation, and testing data processed during this study can be downloaded at http://doi.org/10.5281/zenodo.4784610. The associated code is at https://github.com/DOE-ICoM/tropicalcyclone_MLP.

APPENDIX

Predictors’ Details

Table A1 provides a complete list of the 121 predictors used in the 24-h intensity model in Experiment I. Table A2 provides a complete list of the nine predictors used in the lightweight 6-h intensity model in Experiment II. All predictor statistics are derived from the SHIPS reanalysis dataset. The predictor descriptions are adapted from RAMMB (http://rammb.cira.colostate.edu/research/tropical_cyclones/ships/docs/ships_predictor_file_2018.doc).

Table A1.

Complete list of all 121 engineered features used in the 24-h intensity model in Experiment I.
Table A2.

Complete list of all nine predictors for the lightweight 6-h intensity model in Experiment II.

REFERENCES

  • Balaguru, K., G. R. Foltz, L. R. Leung, S. M. Hagos, and D. R. Judi, 2018: On the use of ocean dynamic temperature for hurricane intensity forecasting. Wea. Forecasting, 33, 411–418, https://doi.org/10.1175/WAF-D-17-0143.1.
  • Balaguru, K., G. R. Foltz, L. R. Leung, J. Kaplan, W. Xu, N. Reul, and B. Chapron, 2020: Pronounced impact of salinity on rapidly intensifying tropical cyclones. Bull. Amer. Meteor. Soc., 101, E1497–E1511, https://doi.org/10.1175/BAMS-D-19-0303.1.
  • Bergstra, J. S., R. Bardenet, Y. Bengio, and B. Kégl, 2011: Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems, J. Shawe-Taylor et al., Eds., Information Processing Systems Foundation, Inc., 2546–2554.

  • Bergstra, J. S., D. Yamins, and D. D. Cox, 2013a: Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. Proc. 12th Python in Science Conf., Austin, TX, Citeseer, 13–20.

  • Bergstra, J. S., D. Yamins, and D. D. Cox, 2013b: Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. Proc. 30th Int. Conf. on Machine Learning, Atlanta, GA, ICML, 115–123.

  • Berrisford, P., D. Dee, P. Poli, R. Brugge, K. Fielding, M. Fuentes, and A. Simmons, 2011: The ERA-Interim archive version 2.0. ERA report series 1, Tech. Rep., ECMWF, 23 pp.

  • Cangialosi, J. P., 2019: National Hurricane Center forecast verification report: 2018 hurricane season. National Hurricane Center, 73 pp., www.nhc.noaa.gov/verification/pdfs/Verification_2018.pdf.

  • Cangialosi, J. P., E. Blake, M. DeMaria, A. Penny, A. Latto, E. Rappaport, and V. Tallapragada, 2020: Recent progress in tropical cyclone intensity forecasting at the National Hurricane Center. Wea. Forecasting, 35, 1913–1922, https://doi.org/10.1175/WAF-D-20-0059.1.
  • Chaudhuri, S., D. Dutta, S. Goswami, and A. Middey, 2013: Intensity forecast of tropical cyclones over North Indian Ocean using multilayer perceptron model: Skill and performance verification. Nat. Hazards, 65, 97–113, https://doi.org/10.1007/s11069-012-0346-7.
  • Cloud, K. A., B. J. Reich, C. M. Rozoff, S. Alessandrini, W. E. Lewis, and L. Delle Monache, 2019: A feed forward neural network based on model output statistics for short-term hurricane intensity prediction. Wea. Forecasting, 34, 985–997, https://doi.org/10.1175/WAF-D-18-0173.1.
  • Combot, C., A. Mouche, J. A. Knaff, Y. Zhao, Y. Zhao, L. Vinour, Y. Quilfen, and B. Chapron, 2020: Extensive high-resolution Synthetic Aperture Radar (SAR) data analysis of tropical cyclones: Comparisons with SFMR flights and best track. Mon. Wea. Rev., 148, 4545–4563, https://doi.org/10.1175/MWR-D-20-0005.1.
  • Courtney, J. B., and Coauthors, 2019: Operational perspectives on tropical cyclone intensity change. Part 1: Recent advances in intensity guidance. Trop. Cyclone Res. Rev., 8, 123–133, https://doi.org/10.1016/j.tcrr.2019.10.002.
  • Cummings, J. A., 2005: Operational multivariate ocean data assimilation. Quart. J. Roy. Meteor. Soc., 131, 3583–3604, https://doi.org/10.1256/qj.05.105.
  • DeMaria, M., 2009: A simplified dynamical system for tropical cyclone intensity prediction. Mon. Wea. Rev., 137, 68–82, https://doi.org/10.1175/2008MWR2513.1.
  • DeMaria, M., and J. Kaplan, 1994: A Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, https://doi.org/10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.
  • DeMaria, M., and J. Kaplan, 1999: An updated Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic and eastern North Pacific basins. Wea. Forecasting, 14, 326–337, https://doi.org/10.1175/1520-0434(1999)014<0326:AUSHIP>2.0.CO;2.
  • DeMaria, M., M. Mainelli, L. K. Shay, J. A. Knaff, and J. Kaplan, 2005: Further improvements to the Statistical Hurricane Intensity Prediction Scheme (SHIPS). Wea. Forecasting, 20, 531–543, https://doi.org/10.1175/WAF862.1.
  • DeMaria, M., J. A. Knaff, and J. Kaplan, 2006: On the decay of tropical cyclone winds crossing narrow landmasses. J. Appl. Meteor. Climatol., 45, 491–499, https://doi.org/10.1175/JAM2351.1.
  • DeMaria, M., C. R. Sampson, J. A. Knaff, and K. D. Musgrave, 2014: Is tropical cyclone intensity guidance improving? Bull. Amer. Meteor. Soc., 95, 387–398, https://doi.org/10.1175/BAMS-D-12-00240.1.
  • Emanuel, K., S. Ravela, E. Vivant, and C. Risi, 2006: A statistical deterministic approach to hurricane risk assessment. Bull. Amer. Meteor. Soc., 87, 299–314, https://doi.org/10.1175/BAMS-87-3-299.
  • Fovell, R. G., and Y. P. Bu, 2015: Improving HWRF track and intensity forecasts via model physics evaluation and tuning. DTC visitor program final report, Developmental Testbed Center, 28 pp., https://dtcenter.org/sites/default/files/visitor-projects/DTC_report_2013_Fovell.pdf.

  • Giffard-Roisin, S., D. Gagne, A. Boucaud, B. Kégl, M. Yang, G. Charpiat, and C. Monteleoni, 2018: The 2018 climate informatics hackathon: Hurricane intensity forecast. Eighth Int. Workshop on Climate Informatics, Boulder, CO, Climate Informatics Hackathon, 4 pp.

  • Goh, A. T., 1995: Back-propagation neural networks for modeling complex systems. Artif. Intell. Eng., 9, 143–151, https://doi.org/10.1016/0954-1810(94)00011-S.
  • Jones, D. R., 2001: A taxonomy of global optimization methods based on response surfaces. J. Global Optim., 21, 345–383, https://doi.org/10.1023/A:1012771025575.
  • Kelly, P., L. R. Leung, K. Balaguru, W. Xu, B. Mapes, and B. Soden, 2018: Shape of Atlantic tropical cyclone tracks and the Indian monsoon. Geophys. Res. Lett., 45, 10746, https://doi.org/10.1029/2018GL080098.

  • Knaff, J. A., M. DeMaria, C. R. Sampson, and J. M. Gross, 2003: Statistical, 5-day tropical cyclone intensity forecasts derived from climatology and persistence. Wea. Forecasting, 18, 80–92, https://doi.org/10.1175/1520-0434(2003)018<0080:SDTCIF>2.0.CO;2.
  • Knaff, J. A., C. R. Sampson, and B. R. Strahl, 2020: A tropical cyclone rapid intensification prediction aid for the Joint Typhoon Warning Center’s areas of responsibility. Wea. Forecasting, 35, 1173–1185, https://doi.org/10.1175/WAF-D-19-0228.1.
  • Knapp, K. R., M. C. Kruk, D. H. Levinson, H. J. Diamond, and C. J. Neumann, 2010: The International Best Track Archive for Climate Stewardship (IBTrACS): Unifying tropical cyclone best track data. Bull. Amer. Meteor. Soc., 91, 363–376, https://doi.org/10.1175/2009BAMS2755.1.
  • Knapp, K. R., H. J. Diamond, J. P. Kossin, M. C. Kruk, C. J. Schreck, 2018: International best track archive for climate stewardship (IBTrACS) project, version 4. NOAA/National Centers for Environmental Information, accessed 20 April 2021, https://doi.org/10.25921/82ty-9e16.

  • Landsea, C. W., and J. L. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. Mon. Wea. Rev., 141, 3576–3592, https://doi.org/10.1175/MWR-D-12-00254.1.
  • Lee, C. Y., M. K. Tippett, A. H. Sobel, and S. J. Camargo, 2018: An environmentally forced tropical cyclone hazard model. J. Adv. Model. Earth Syst., 10, 223–241, https://doi.org/10.1002/2017MS001186.
  • Lin, N., R. Jing, Y. Wang, E. Yonekura, J. Fan, and L. Xue, 2017: A statistical investigation of the dependence of tropical cyclone intensity change on the surrounding environment. Mon. Wea. Rev., 145, 2813–2831, https://doi.org/10.1175/MWR-D-16-0368.1.
  • Lloyd, I. D., and G. A. Vecchi, 2011: Observational evidence for oceanic controls on hurricane intensity. J. Climate, 24, 1138–1153, https://doi.org/10.1175/2010JCLI3763.1.
  • Na, W., J. L. McBride, X. H. Zhang, and Y. H. Duan, 2018: Understanding biases in tropical cyclone intensity forecast error. Wea. Forecasting, 33, 129–138, https://doi.org/10.1175/WAF-D-17-0106.1.
  • Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, and J. Vanderplas, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.
  • Rappaport, E. N., J. G. Jiing, C. W. Landsea, S. T. Murillo, and J. L. Franklin, 2012: The joint hurricane test bed: Its first decade of tropical cyclone research-to-operations activities reviewed. Bull. Amer. Meteor. Soc., 93, 371–380, https://doi.org/10.1175/BAMS-D-11-00037.1.
  • Sampson, C. R., and A. J. Schrader, 2000: The automated tropical cyclone forecasting system (version 3.2). Bull. Amer. Meteor. Soc., 81, 1231–1240, https://doi.org/10.1175/1520-0477(2000)081<1231:TATCFS>2.3.CO;2.
  • Sampson, C. R., and J. A. Knaff, 2009: Southern Hemisphere tropical cyclone intensity forecast methods used at the Joint Typhoon Warning Center. Part III: Forecasts based on a multi-model consensus approach. Aust. Meteor. Oceanogr. J., 58, 19–27, https://doi.org/10.22499/2.5801.003.
  • Sampson, C. R., J. L. Franklin, J. A. Knaff, and M. DeMaria, 2008: Experiments with a simple tropical cyclone intensity consensus. Wea. Forecasting, 23, 304–312, https://doi.org/10.1175/2007WAF2007028.1.
  • Sharma, N., M. M. Ali, J. A. Knaff, and P. Chand, 2013: A soft-computing cyclone intensity prediction scheme for the western North Pacific Ocean. Atmos. Sci. Lett., 14, 187–192, https://doi.org/10.1002/asl2.438.
  • Shimada, U., H. Owada, M. Yamaguchi, T. Iriguchi, M. Sawada, K. Aonashi, and K. D. Musgrave, 2018: Further improvements to the Statistical Hurricane Intensity Prediction Scheme using tropical cyclone rainfall and structural features. Wea. Forecasting, 33, 1587–1603, https://doi.org/10.1175/WAF-D-18-0021.1.
  • Simon, A., A. B. Penny, M. DeMaria, J. L. Franklin, R. J. Pasch, E. N. Rappaport, and D. A. Zelinsky, 2018: A description of the real-time HFIP corrected consensus approach (HCCA) for tropical cyclone track and intensity guidance. Wea. Forecasting, 33, 37–57, https://doi.org/10.1175/WAF-D-17-0068.1.
  • Su, H., L. Wu, J. H. Jiang, R. Pai, A. Liu, A. J. Zhai, P. Tavallali, and M. DeMaria, 2020: Applying satellite observations of tropical cyclone internal structures to rapid intensification forecast with machine learning. Geophys. Res. Lett., 47, e2020GL089102, https://doi.org/10.1029/2020GL089102.

  • Tallapragada, V., L. Bernardet, M. K. Biswas, S. Gopalakrishnan, Y. Kwon, Q. Liu, and X. Zhang, 2014: Hurricane Weather Research and Forecasting (HWRF) model: 2013 scientific documentation. HWRF Development Testbed Center Tech. Rep., 99 pp., http://www.emc.ncep.noaa.gov/gc_wmb/vxt/pubs/HWRFScientificDocumentation2013.pdf.

  • Torn, R. D., and C. Snyder, 2012: Uncertainty of tropical cyclone best-track information. Wea. Forecasting, 27, 715–729, https://doi.org/10.1175/WAF-D-11-00085.1.
  • Torn, R. D., and M. DeMaria, 2021: Validation of ensemble-based probabilistic tropical cyclone intensity change. Atmosphere, 12, 373, https://doi.org/10.3390/atmos12030373.

  • Wada, A., and N. Usui, 2007: Importance of tropical cyclone heat potential for tropical cyclone intensity and intensification in the western North Pacific. J. Oceanogr., 63, 427–447, https://doi.org/10.1007/s10872-007-0039-0.
  • Wang, Z., 2014: Characteristics of convective processes and vertical vorticity from the tropical wave to tropical cyclone stage in a high-resolution numerical model simulation of Tropical Cyclone Fay (2008). J. Atmos. Sci., 71, 896–915, https://doi.org/10.1175/JAS-D-13-0256.1.
  • Wilks, D., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 100, Academic Press, 648 pp.

  • Zhao, H., L. Wu, and W. Zhou, 2009: Observational relationship of climatologic beta drift with large-scale environmental flows. Geophys. Res. Lett., 36, L18809, https://doi.org/10.1029/2009GL040126.
