1. Introduction
Tropical cyclones (TCs) are powerful, organized systems that pose a major risk to coastal populations. Though many statistical models provide forecast guidance on future TC intensity change [e.g., the Statistical Hurricane Intensity Prediction Scheme (SHIPS); DeMaria and Kaplan 1999], direct measurement of most predictors used in such models, such as relative humidity or vertical wind shear, is difficult because TCs develop over open ocean far from land-based observing networks (Gray 1979). Many predictors must be inferred through a combination of remote observation and dynamic models of ocean and atmospheric behavior.
Infrared (IR; 10.3–10.7 μm) imagery from geostationary (Geo) satellites such as the Geostationary Operational Environmental Satellites (GOES) provides one of the few regular high-resolution observations of TC behavior over the open ocean with a historical record spanning decades (Knapp and Wilkins 2018; Janowiak et al. 2020). Furthermore, modern Geo IR platforms such as GOES-16 provide observations at even greater spatial and temporal resolution (Schmit et al. 2017). Since cloud-top temperature is related to cloud-top height, low IR temperatures tend to indicate higher cloud tops and thus stronger convection, and convective structures are known to be related to TC intensity (Dvorak 1975; Olander and Velden 2007).
In light of this growing record of satellite observations, a broad array of recent works has explored the wealth of information contained in the spatiotemporal structure of Geo IR imagery. The Dvorak technique and more recent advanced Dvorak technique (ADT) have long related Geo IR imagery to TC intensity (Dvorak 1975; Olander and Velden 2007), and more recent work has leveraged neural networks to improve the nowcasting accuracy of the ADT [artificial intelligence (AI)-enhanced Dvorak technique; Olander et al. 2021]. Here, we define “nowcasting” as estimating the current TC intensity based on intensity estimates up to 6 h prior and IR features up to the current time (0 h). Spatial analyses of IR imagery have been leveraged to improve forecasts of TC eye formation, a process related to intensification (DeMaria 2015; Knaff and DeMaria 2017). The deviation angle variance (DAV) technique, a measure of convective organization in IR imagery, contains valuable information for short-term (≤24 h) TC intensity guidance (Hu et al. 2020). The shape and evolution of Geo IR radial profiles are known to relate to intensity and intensity change, respectively (Sanabia et al. 2014; McNeely et al. 2020). In this work, we utilize the evolution over time of radial profiles (see Fig. 1) to jointly forecast short-term TC intensity and structure changes. We leverage deep autoregressive (AR) generative models to construct interpretable and high-resolution structural probabilistic forecasts, which display entire functions rather than time series of thresholded quantities, such as pixel counts beneath a given temperature threshold.
Concurrent with the rise of high-resolution Geo IR imagery is the growing application of convolutional neural networks (CNNs), powerful tools for performing prediction tasks with images as input. Predicting TC intensity from Geo IR data is an obvious candidate application; indeed, there are dozens of such works in the machine learning literature applying CNNs to this problem, including Pradhan et al. (2018), Combinido et al. (2018), Lee et al. (2020), Tian et al. (2020), Wang et al. (2020), and Zhang et al. (2021). These models achieve reasonable forecast accuracy via the traditional machine learning framework with a CNN taking IR imagery as input to directly predict intensity by, e.g., minimizing the average squared-error loss on independent test data. Explainable AI approaches may then use methods such as layer-wise relevance propagation, saliency maps, and activation maps to better understand how the model produced its point estimate (McGovern et al. 2019; Ebert-Uphoff and Hilburn 2020). For an example of explainable CNN-based TC intensity forecasting in the meteorological literature, see Griffin et al. (2022).
Our proposed pipeline takes a different approach to explainability—one which remains compatible with the above tools for insight into the relationships leveraged by CNNs. Our approach (i) utilizes a dimensionality-reducing functional transformation of IR imagery prior to analysis, and (ii) provides 12-h ensemble forecasts of TC convective structure in addition to TC intensity.
First, we extract scientifically motivated functional features, reducing the dimension of the problem (from 2D images over time to 1D functions over time) in a directly interpretable summary, rather than directly relying on the CNN to extract salient features from (high-dimensional and low-sample size) raw Geo IR imagery. These rich summary functions are derived from the “ORB” suite: organization (e.g., DAV as a function of radius), radial structure (e.g., the radial profiles examined in this work), and bulk morphology (e.g., pixel counts as a function of a temperature threshold). Temporal sequences of radial profiles are highly relevant to both intensity and intensity change (Sanabia et al. 2014; McNeely et al. 2020, 2022). Temporal changes in these sequences of profiles can be visualized via Hovmöller diagrams, which are more readily digestible by users than inferring temporal patterns from animations of satellite imagery.
Second, we provide a probabilistic structural forecast, a prediction of an ensemble of possible TC convective evolution, rather than directly predicting future intensity from past IR structure and TC intensity. Our novel approach to intensity guidance via Geo IR imagery results in interpretable intensity forecasts such as “our model predicts short-term intensification due to the potential emergence of an eye–eyewall structure in the next 12 h.” Though methods such as layer-wise relevance propagation can provide further insight into the CNN’s use of structural forecasts, the IR structural forecasts themselves are the core of our proposed intensity guidance pipeline.
Figure 2 outlines section 3 via a schematic diagram of the structural forecasting to intensity forecasting pipeline. There are three main subsections:
- Section 3a: Structural trajectories via ORB. First, we apply the ORB framework (McNeely et al. 2019, 2020) to observed IR imagery to create a “structural summary” (Fig. 1) of the spatiotemporal evolution of the present and recent past TC structure.
- Section 3b: Structural forecasting with a deep autoregressive generative model. Next, we propagate the observed IR structure up to 12 h forward in time via a deep pixel-autoregressive model, which stochastically simulates an ensemble of possible trajectories of IR radial profiles.
- Section 3c: Forecasting TC intensity via convolutional neural networks. Finally, we input the observed structure, the forecasted structure, and TC intensity up to 6 h prior to the current time into a nowcasting model to estimate the current intensity; we choose CNNs because they are easy to train and commonly used for image data. By filling in the missing t + 6- and t + 12-h structure, we can then extend the nowcasting model from a nowcast for time t (i.e., hour 0) to a forecast at time t + 6 h and then to time t + 12 h.
Section 4 details the results of our prototype forecasting pipeline. The final Geo IR-based TC intensity guidance provides inherent measures of uncertainty and insight into the potential TC structural changes that influence a given forecast. The results in this work use proof-of-concept structural forecasting and a pipeline that relies solely on persistence predictors (i.e., prior intensity estimates) together with observed past and simulated future radial profiles; no environmental factors such as vertical wind shear or ocean heat content are included at this time. We demonstrate that a purely autoregressive prototype achieves a useful degree of forecasting accuracy.
2. Data
Our model relies on two data sources: sequences of Geo IR imagery captured by GOES satellites and past TC intensity. For training and verification (i.e., model selection), we use NOAA’s Hurricane Database 2 (HURDAT2; Landsea and Franklin 2013) because that database provides the postseason best estimates of TC intensities. For forecasting, we rely on operational TC intensity estimates, the CARQ entries from the Naval Research Laboratory’s Automated Tropical Cyclone Forecast (ATCF) operational “A-deck” files (Sampson and Schrader 2000) to assess model performance under real-time conditions.
GOES IR imagery is available through NOAA’s Merged IR (MERGIR) database (Janowiak et al. 2020) at 30 min × 4 km resolution over the North Atlantic (NAL) basin from 2000 to 2020. For each TC, we download ∼2000 km × 2000 km “stamps” of IR imagery centered on the TC location at a 30-min temporal resolution. Figure 1 (left) shows two such stamps after an 800-km radius mask is applied. For this work, we sample the 30-min data at 2-h resolution because of periodic corruption of the imagery in the MERGIR database (Z. Liu 2021, personal communication).
During training, we linearly interpolate TC location and intensity from HURDAT2 to obtain locations and intensities for nonsynoptic times; however, model assessment is restricted to synoptic times. We include TC lifetimes between the first synoptic time at which intensity reaches at least 35 kt (1 kt ≈ 0.51 m s−1) and the last synoptic time at which intensity is at least 35 kt; note that this can result in the inclusion of TCs < 35 kt if the TC decays and then reintensifies.
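As a minimal sketch of the two preprocessing steps described above (not the code used in this work), the following selects the lifetime window bounded by the first and last synoptic times at or above 35 kt and then linearly interpolates intensity onto a 2-h grid. The function names, the toy best-track values, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def lifetime_window(synoptic_kt, threshold_kt=35.0):
    """Return (first, last) synoptic indices with intensity >= threshold."""
    idx = np.flatnonzero(np.asarray(synoptic_kt, dtype=float) >= threshold_kt)
    if idx.size == 0:
        return None  # the system never reached the threshold
    return int(idx[0]), int(idx[-1])

def interpolate_intensity(synoptic_hours, synoptic_kt, step_h=2.0):
    """Linearly interpolate intensity onto a regular grid of step_h hours."""
    n_steps = int(round((synoptic_hours[-1] - synoptic_hours[0]) / step_h))
    grid = synoptic_hours[0] + step_h * np.arange(n_steps + 1)
    return grid, np.interp(grid, synoptic_hours, synoptic_kt)

# Toy best-track record at 6-h synoptic spacing (values are invented).
hours = np.array([0.0, 6.0, 12.0, 18.0, 24.0, 30.0])
kt = np.array([30.0, 35.0, 45.0, 30.0, 40.0, 25.0])

first, last = lifetime_window(kt)
grid_h, grid_kt = interpolate_intensity(hours[first:last + 1], kt[first:last + 1])
```

In this toy record the window keeps the 30-kt value at hour 18 because it lies between two times at or above 35 kt, which mirrors the decay-and-reintensification case noted above.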
Finally, we rely on NHC’s official forecast verification to assess our model’s performance. We also draw on the SHIPS developmental database’s 200–850-hPa vertical wind shear values calculated within a 200–800-km annulus from the TC center as reference during model validation due to the known impact of shear on TC convective structure (DeMaria 2018).
3. Methods
As outlined in Fig. 2, we first construct a summary of IR structural evolution (section 3a). We then train a stochastic autoregressive model, which is an explicit likelihood model (of structural trajectories) that we can use to simulate probable IR structural evolution (section 3b). Finally, we combine observed and forecasted structure with operational intensity estimates up to and including the current time to provide interpretable short-term intensity guidance, based solely on Geo IR imagery and operational intensity estimates (section 3c).
a. Structural trajectories via ORB
Operational forecasting of TC intensity is a human-in-the-loop process and thus places a premium on guidance interpretability. In this spirit, the ORB framework (organization, radial structure, bulk morphology) summarizes 2D imagery via continuous 1D functions to enable static visualization of spatiotemporal patterns in TC development via Hovmöller diagrams (Hovmöller 1949). Our past work focused on the rich quantification of spatial information in Geo IR imagery (McNeely et al. 2019, 2020). More recently, we demonstrated the value of temporal patterns in ORB functions (McNeely et al. 2022), specifically the radial profile.
McNeely et al. (2022) demonstrated a relationship between TC intensity change and Hovmöller diagrams of radial profiles. However, the radial profile, if averaged over all angles, will disregard asymmetry within the original 2D images, which can degrade performance for cases affected by strong vertical wind shear. In this work, we instead compute a separate radial profile for each geographic quadrant (northeast, northwest, southeast, southwest) to capture asymmetries via the differences between quadrants. We use geographic quadrants instead of motion-relative or shear-relative quadrants because the directions of motion and shear are unstable when the magnitudes of those vectors are small.
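The quadrant-based radial profile can be illustrated with a short sketch. It assumes a TC-centered brightness temperature array whose first row is the northern edge and whose first column is the western edge, 4-km pixels, and 20-km radial bins; the orientation convention, the bin width, and the function name are assumptions made for illustration rather than details taken from the ORB code.

```python
import numpy as np

def quadrant_radial_profiles(tb, km_per_pixel=4.0, bin_km=20.0, max_km=800.0):
    """Mean brightness temperature per radial bin in each geographic quadrant."""
    ny, nx = tb.shape
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    rows, cols = np.indices(tb.shape)
    dx = (cols - cx) * km_per_pixel   # positive toward east
    dy = (cy - rows) * km_per_pixel   # positive toward north (row 0 is north)
    radius = np.hypot(dx, dy)
    n_bins = int(max_km // bin_km)
    bin_idx = (radius // bin_km).astype(int)
    masks = {
        "NE": (dx >= 0) & (dy >= 0),
        "NW": (dx < 0) & (dy >= 0),
        "SE": (dx >= 0) & (dy < 0),
        "SW": (dx < 0) & (dy < 0),
    }
    profiles = {}
    for name, mask in masks.items():
        keep = mask & (radius < max_km)
        sums = np.bincount(bin_idx[keep], weights=tb[keep], minlength=n_bins)
        counts = np.bincount(bin_idx[keep], minlength=n_bins)
        with np.errstate(invalid="ignore"):
            profiles[name] = sums / counts  # NaN where a bin holds no pixels
    return profiles

# Toy image: temperature rises with radius, with colder tops in the eastern half.
rows, cols = np.indices((101, 101))
toy = 220.0 + 0.1 * np.hypot(rows - 50, cols - 50)
toy[:, 50:] -= 20.0
profiles = quadrant_radial_profiles(toy, max_km=200.0)
asymmetry = profiles["NW"] - profiles["NE"]
```

An azimuthal average of the toy image would blend the cold eastern half with the warm western half, whereas the difference between the NW and NE profiles recovers the roughly 20-K asymmetry at every radius.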
b. Structural forecasting via deep autoregressive generative model
The crucial step in our guidance framework is the propagation of radial profiles into the near future. The Hovmöller diagram captures the spatiotemporal evolution of the TC over an extended period of time; that is, we can summarize TC development by an easily interpretable image. By treating the structural trajectory as an image, where the y axis corresponds to the passage of time, forecasting radial profiles becomes equivalent to an image completion problem. That is, we predict the missing pixels at the bottom of an image (forecasted structure) given those at the top (observed structure). Image completion is an active research area in machine learning; here we focus on a state-of-the-art model in the class of pixel-autoregressive models (van den Oord et al. 2016b).
The challenge of how to estimate the conditional likelihoods p(x_i | x_{i−1}, …, x_1), where x_i denotes the ith pixel of the image in raster order, has given rise to many flavors of pixel-autoregressive models, including PixelRNN (van den Oord et al. 2016b), PixelCNN (van den Oord et al. 2016a), PixelCNN++ (Salimans et al. 2017), and PixelSNAIL (Chen et al. 2018). This work utilizes the last model, PixelSNAIL. There are two main ingredients in the model: (i) causal convolution and (ii) self-attention. Causal convolution utilizes the same convolutional feature extraction outlined in section 3c but masks each convolution so that each element in the raster sequence only receives information from previously generated elements (e.g., Fig. 3). Purely convolutional models, however, are restricted to small neighborhoods of pixels, leading to only a finite receptive field (area of the source image involved in a given convolution), and thus struggle with long-range dependencies in the conditional p(x_i | x_{i−1}, …, x_1). PixelSNAIL, on the other hand, features a self-attention mechanism that leads to unbounded receptive fields with pinpointed access to information far away in the sequence; see Chen et al. (2018) for details on the PixelSNAIL architecture.
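To make the idea of causal convolution concrete, the sketch below builds the kind of binary mask used in PixelCNN-style masked convolutions: a kernel weight is kept only if it looks at a pixel that precedes the current one in raster order (any row above, or the same row to the left). This is an illustration of the general masking idea under that assumption; it is not the PixelSNAIL implementation, and the function name is hypothetical.

```python
import numpy as np

def causal_mask(kernel_size, include_center=False):
    """Binary mask that zeroes kernel weights on not-yet-generated pixels."""
    k = kernel_size
    c = k // 2
    mask = np.zeros((k, k), dtype=int)
    mask[:c, :] = 1   # every pixel in the rows above the current pixel
    mask[c, :c] = 1   # pixels to the left of the current pixel in its own row
    if include_center:
        mask[c, c] = 1  # later layers may also see the current position
    return mask

mask = causal_mask(3)
```

For a Hovmöller diagram whose rows run from past (top) to future (bottom), such a mask means that a simulated row can depend only on earlier times and on radii already generated within the same time step.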
This autoregressive model enables stochastic simulation of structural trajectories based on Geo IR persistence. For a given synoptic time, we can simulate many trajectories from the observed history and then feed each potential trajectory through the nowcasting model to obtain the associated intensity guidance. Via multiple simulations per forecast time, an ensemble forecast provides a measure of uncertainty in both structural trajectories and intensities while also offering insight into cases where the model over- or underestimates intensity. For example, overestimates may be caused by too-low profile temperatures or overestimated symmetry between quadrants.
The structural forecasting model is trained on TCs from 2000 to 2012, with 2013–20 withheld for testing. We train the model using input radial profiles calculated every 2 h but test on synoptic times. Because AR models are likelihood-based, we can directly calculate and minimize the negative log-likelihood (NLL), a measure of the model’s ability to generalize well on withheld data.
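As a small worked example of the NLL criterion, the sketch below averages the negative log probability that a model assigns to the observed values. It assumes, purely for illustration, that each pixel has been discretized into a few classes and that the model outputs one categorical distribution per pixel; the output distribution actually used by the structural forecasting model is not specified here, and the function name is hypothetical.

```python
import numpy as np

def mean_nll(probs, targets):
    """Average negative log-likelihood (nats per pixel) of integer targets."""
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=int)
    picked = probs[np.arange(len(targets)), targets]  # probability of each truth
    return float(-np.mean(np.log(picked)))

# Two pixels, three classes each (invented probabilities).
probs = [[0.5, 0.25, 0.25],
         [0.1, 0.8, 0.1]]
targets = [0, 1]
nll = mean_nll(probs, targets)
uniform_nll = mean_nll(np.full((2, 3), 1.0 / 3.0), targets)
```

A model that spreads its probability uniformly over the three classes scores log 3 nats per pixel, so a lower value on withheld TCs indicates that the model concentrates probability on what was actually observed.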
c. Nowcasting TC intensity via convolutional neural networks
Traditional linear models are attractive for reasons of interpretability and good performance in low sample size settings. However, linear models often struggle to capture the complex, time-varying processes that drive TCs. It is also unclear how to include the radial profile Hovmöller diagrams as inputs to a linear model without sacrificing interpretability. In this work, we instead consider a simple convolutional neural network to map observed IR trajectories (St) to current intensities (Yt). Because we treat time as a spatial dimension in these diagrams and a structural trajectory is represented as an image, a CNN will leverage both spatial and temporal patterns in the data.
CNNs comprise two main elements: convolutional layers and fully connected layers. The convolutional layers first convolve each input channel (here, each quadrant) with a library of filters (i.e., matrices whose entries are learned parameters); some of these filters may resemble familiar matrices, such as gradient approximators (e.g., Sobel matrices). After each convolutional layer, the image is pooled to reduce the image size and increase the receptive field of the next set of convolutions. In the final step, the results of all convolutions are passed into a fully connected layer that approximates the relationship between the convolutional feature map and the response.
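The two operations described above can be sketched in a few lines of NumPy. The example applies a fixed Sobel filter (standing in for a learned filter) to a toy image and then max-pools the result; the function names and toy image are illustrative assumptions, and a trained CNN would learn its filter entries rather than fix them.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Unpadded cross-correlation, the 'convolution' of deep learning libraries."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 block."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
image = np.tile(np.arange(6, dtype=float), (6, 1))  # values increase left to right
features = conv2d_valid(image, sobel_x)
pooled = max_pool2x2(features)
```

Because the toy image has a constant horizontal gradient, the Sobel response is the same everywhere, and pooling halves each dimension of the feature map without changing that value.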
Like the structural forecasting model, the nowcasting model is trained on TCs from 2000 to 2012, here by minimizing the mean squared error. The model is trained on data with a 2-h resolution (rather than synoptic times alone) with intensities linearly interpolated to those times; we do not include nonsynoptic times in the test TCs (2013–20). The details of the CNN architecture are given in Fig. 4.
d. From nowcasting to forecasting
Section 3c defines a nowcasting model for estimating intensity at time t (i.e., hour 0) by training on postseason (best track or HURDAT2) intensities from −30 to −6 h and imagery from −30 to 0 h. After we have trained and validated the nowcasting model to estimate 0-h intensities, we apply the CNN nowcasting model to TC intensity forecasting. To forecast intensity at time t + 6 h, we need the intensities at times ≤ t (in this work, we use operational intensities drawn from CARQ in the A-deck files when generating TC intensity forecasts) and structural trajectories at times ≤ t + 6 h (observed at times ≤t and simulated at times from t + 2 to t + 6 h). Using the structural forecasting model in section 3b, we simulate many possible trajectories from times t + 2 to t + 6 h. Each of these possible future trajectories is then passed to the nowcasting model to obtain a separate intensity forecast, giving an ensemble of possible intensities.
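The extension from nowcast to ensemble forecast can be sketched as follows. Both components are deliberately simple stand-ins: `simulate_rows` replaces the deep autoregressive model with a Gaussian random walk from the last observed profile, and `toy_nowcast` replaces the CNN with a linear rule. Only the control flow (simulate the missing rows, nowcast each completed trajectory, summarize the ensemble) reflects the procedure described above; all names and constants are assumptions.

```python
import numpy as np

def simulate_rows(observed, n_future, rng, noise_k=2.0):
    """Stand-in sampler: append n_future rows via a Gaussian random walk."""
    rows = list(observed)
    for _ in range(n_future):
        rows.append(rows[-1] + rng.normal(0.0, noise_k, size=observed.shape[1]))
    return np.vstack(rows)

def toy_nowcast(trajectory, prior_kt):
    """Stand-in nowcast: colder cloud tops in the latest row raise intensity."""
    return prior_kt + 0.5 * (220.0 - trajectory[-1].mean())

def ensemble_forecast(observed, prior_kt, n_future=3, n_members=16, seed=0):
    """Nowcast each simulated trajectory; return mean, spread, and members."""
    rng = np.random.default_rng(seed)
    members = np.array([
        toy_nowcast(simulate_rows(observed, n_future, rng), prior_kt)
        for _ in range(n_members)
    ])
    return members.mean(), members.std(ddof=1), members

# 16 observed rows cover -30 h to 0 h every 2 h; 3 future rows reach t + 6 h.
observed = np.full((16, 40), 210.0)
mean_kt, spread_kt, members = ensemble_forecast(observed, prior_kt=60.0)
```

The ensemble mean plays the role of the intensity forecast, while the spread across members gives the accompanying measure of uncertainty.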
Our proposed framework for intensity forecasts at +6 and +12 h has two primary benefits: (i) by providing an additional structural forecast, we provide insight into potential TC evolution predicted by the model, such as deepening convection or the emergence of an eye; (ii) because the structural forecast is stochastic, we can straightforwardly assess the uncertainty in structural evolution over time and the associated uncertainty in the intensity forecasts.
4. Model results
We first demonstrate the performance of our proposed model on specific cases [Hurricanes Jose (2017), Nicole (2016), and Dorian (2019)] in section 4a, discussing both accuracy and the insight provided by structural forecasts at 6- and 12-h lead times. We then assess the performance of the model during 2013–20 in the North Atlantic basin at 6- and 12-h lead times in section 4b.
a. Case studies
We examine Hurricane Jose (2017) because of the presence of high vertical wind shear, which produced convective asymmetries not captured by the azimuthally averaged radial profiles (i.e., not quadrant-based) of McNeely et al. (2022). Hurricane Nicole (2016) was selected because it underwent two rapid intensification and two rapid weakening events. Finally, Hurricane Dorian (2019) was a powerful TC with many in situ observations.
1) Intensities
Figure 5 shows the 6-h forecasts based on 64 independently simulated structural trajectories per synoptic time. Because the structural forecasts are currently based entirely on persistence (no environmental fields, such as 200–850-hPa vertical wind shear, have been included), we expect the guidance to be most useful in the short term (6–12-h time frame). The steadier development of Hurricanes Jose and Dorian is well modeled, but the swift intensity changes exhibited by Hurricane Nicole as well as the rapid intensification period of Hurricane Dorian both prove challenging to capture.
Extending lead time to 12 h increases the variation among individual simulations, but the average simulated intensity continues to roughly track the observed intensities. The rapid intensity change events exhibited by Hurricane Nicole are challenging to forecast with only IR persistence. However, the model follows Hurricane Jose’s evolution relatively well, indicating that the model has value as is at 12-h lead times.
2) Diagnostics
While the end goal of intensity guidance models is ultimately prediction of TC intensity, our structural forecasting pipeline adds valuable diagnostic insight into structural factors contributing to its predictions.
Figure 6 demonstrates the six-step (12-h lead time) structural forecast for Hurricane Dorian valid for 1800 UTC 27 August during a period in which it maintained 45-kt intensity. The final 6 rows of each Hovmöller diagram are simulated from the structural forecast model; in this figure, we average the four quadrants for ease of visualization (see appendix A, Fig. B10, in the online supplemental material for Hurricane Dorian structural forecasts broken down by quadrant). Cloud-top temperature magnitude tends to be underestimated, but the expansion of cloud coverage during this 12-h period is captured across most simulations.
Figure 7 demonstrates the 6-h forecasting guidance available at individual synoptic times. The average simulated profiles in each quadrant tend to track observed profiles reasonably well, although they tend to predict too flat a curve and too symmetric an eye. Figure 8 shows the same information but for the 12-h lead time. Here, model biases tend to be amplified by longer lead times. We note that the emergence of an eye is captured in the trajectory in Fig. 8b, even 12 h out.
Similar figures for Hurricanes Jose and Nicole are provided in the supplemental material. In general, the structural forecast follows the observed profile, even at 12-h lead times. We did not perform any data augmentation during training (e.g., rotation) in order to preserve dominant geographic patterns (e.g., the prevalence of TC convection sheared eastward and northeastward in the North Atlantic), but it is possible that augmentation by rotating TCs would improve simulation fidelity, as it has been shown to improve accuracy in other TC intensity forecasting applications such as Griffin et al. (2022).
b. Model verification
The same models are used to produce 16 simulated trajectories with associated intensity guidance for each synoptic time from 2013 to 2020 at the 6- and 12-h lead times. (We use 16 rather than 64 simulations when validating over the entire 8-yr period for computational reasons.) Intensity predictions provided via averaging the 16 simulations are validated against HURDAT2 best track intensities, and past TC intensity values provided as input to the model come from operational estimates (CARQ) to emulate real-time performance.
Table 1. Overall model verification at n = 16. Trajectory verification of structural forecasts, compared to IR persistence forecasts where the radial profiles are fixed at their 0-h values. Simulation noise (root-mean variance and mean absolute deviation) grows rapidly in the first 6 h; bias increases in magnitude steadily. We note that persistence offers a less biased IR forecast on average, but higher overall errors in structure at all lead times.
Table 2 reports verification statistics for intensity guidance using the traditional definitions for root mean squared error (RMSE), mean absolute error (MAE), and bias. As expected, the negative bias in structural forecasts manifests as a positive bias in intensity guidance.
Table 2. Intensity verification vs HURDAT2 best track intensities from 2013 to 2020 at each lead time.
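For reference, the three verification statistics can be computed as in the sketch below, with bias defined as the mean of forecast minus observed intensity so that a positive bias indicates overestimation. The function name and the toy values are illustrative assumptions and are not results from this work.

```python
import numpy as np

def verification_stats(forecast_kt, observed_kt):
    """RMSE, MAE, and bias (mean of forecast minus observed), all in kt."""
    err = np.asarray(forecast_kt, dtype=float) - np.asarray(observed_kt, dtype=float)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),
    }

# Invented forecast/observation pairs; the errors are +5, -5, 0, and +5 kt.
stats = verification_stats([50.0, 65.0, 80.0, 95.0], [45.0, 70.0, 80.0, 90.0])
```

RMSE penalizes large misses more heavily than MAE, so the gap between the two grows when a few forecasts are badly wrong.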
Tables 3 and 4 assess the performance of our intensity guidance via structural forecasting at 12-h lead times and compare it to the NHC’s official forecast verification from 2013 to 2019 (the years for which verification data were available at the time of writing); note that this is a subset of the times reported in Table 1, consisting of cases where both our structural forecasts and NHC official verification are available. Overall, the RMSE of the structural forecast is about 1.1 kt larger than that of the NHC official forecast, and structural forecasts produce roughly twice the bias in magnitude (1.1 versus −0.6 kt). The structural forecast sees unchanged RMSE with increasing 200–850-hPa vertical wind shear; the bias, however, increases with increasing wind shear (Table 3). This trend is expected, as the model does not include wind shear as a predictor but instead relies on the positive correlation between shear and asymmetry in IR imagery (as captured by radial profiles computed by quadrant). The NHC official forecast error exhibits a similar, if less pronounced, trend in bias with increasing shear. The direction of shear seems more important, with both our model and the NHC official forecast performing most poorly for northwest shear (6% of cases) and best for southwest shear (9% of cases). The northeast and southeast cases dominate the overall model performance since they comprise the remaining 85% of the dataset. The disparity between different shear magnitudes and directions could be alleviated in a model that utilizes environmental predictors.
Table 3. Intensity guidance verification relative to shear: Model verification binned by 200–850-hPa vertical wind shear, reported as RMSE/MAE/bias. The performance of the structural forecasting model does not change meaningfully relative to wind shear magnitude, while the NHC official forecast performs better in higher shear environments. The structural forecast has comparable performance to the NHC official forecasts in low-shear environments. The performance of the structural model does vary with shear direction. Both the NHC forecasts and the structural model produce higher errors for northwest shear (6% of cases).
Table 4. Intensity guidance verification by TC intensity: Model verification split out by intensity and intensity change, reported as RMSE/MAE/bias. Both the structural and NHC official forecasts struggle more with intense storms, which are rarer. The structural forecast has much stronger bias, which is expected due to the heavy influence of persistence features in the absence of environmental predictors. Similarly, both forecasts perform best during maintenance periods (6-h change ≤ 5 kt in magnitude), overestimate during weakening, and underestimate during intensification. The bias is more pronounced in the structural forecast due to the absence of environmental predictors.
Table 4 demonstrates similar error trends for both official forecasts and our structural forecasts. Errors tend to increase with TC intensity and with rate of intensification or weakening. The structural model produces higher bias for weaker TCs and lower bias for stronger TCs. Similarly, the structural model tends to overestimate intensities during weakening and underestimate them during intensification. The model errors are comparable to NHC official forecast errors during periods of maintenance and intensification (although bias is higher); it is periods of weakening which tend to be poorly modeled by the structural forecast. We suspect that the inclusion of environmental information could improve fidelity in weakening cases; see section 6 on “future work directions” for a discussion of such avenues for model improvement.
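The stratification by intensity change can be sketched as follows, using the maintenance definition given above (6-h change of at most 5 kt in magnitude). Which intensity change is used for binning beyond that definition is an assumption here, as are the function names and the toy values, which are invented to mimic the sign of the biases described and are not results from this work.

```python
import numpy as np

def change_category(delta_kt, threshold_kt=5.0):
    """Label a 6-h intensity change as maintenance, intensification, or weakening."""
    if abs(delta_kt) <= threshold_kt:
        return "maintenance"
    return "intensification" if delta_kt > 0 else "weakening"

def bias_by_category(forecast_kt, observed_kt, delta_kt):
    """Mean of forecast minus observed intensity within each change category."""
    groups = {}
    for f, o, d in zip(forecast_kt, observed_kt, delta_kt):
        groups.setdefault(change_category(d), []).append(f - o)
    return {name: float(np.mean(errs)) for name, errs in groups.items()}

forecast = [50.0, 62.0, 80.0, 70.0, 95.0]
observed = [50.0, 60.0, 90.0, 60.0, 100.0]
delta = [0.0, 5.0, 15.0, -10.0, 20.0]   # observed 6-h change (invented)
bias = bias_by_category(forecast, observed, delta)
```

In the toy data the bias is positive for the weakening case and negative for the intensifying cases, the pattern expected of a forecast dominated by persistence.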
c. Variable importance in intensity forecasts
Our model results show that structural forecasts result in 6- and 12-h intensity predictions of comparable accuracy to NHC official forecasts. For insight into how much our model relies on IR inputs and prior intensities when making predictions, we compute a saliency map (also known as pixel attribution) for each input. There are varied definitions for saliency, including occlusion-based approaches such as SHAP explainability values (Lundberg and Lee 2017), LIME values (Ribeiro et al. 2016), and gradient-based approaches.
Figure 9 (center) shows a map of the SHAP importance or contribution of each pixel of the IR observed and forecasted imagery on the 6-h intensity forecast for Hurricane Dorian (2019). The bottom-left panel shows the SHAP values for prior intensity and prior intensity change. The bottom-right panel shows aggregated SHAP values for each input channel. From this result and a similar analysis with SHAP variable importance maps for Hurricane Jose (2017) and Hurricane Nicole (2016) in appendix A and gradient-based saliency maps in appendix B of the supplemental material, we conclude that (i) IR imagery contributes to the intensity forecasts to a degree comparable to persistence features, (ii) forecasted infrared imagery from our deep autoregressive generative model plays a more important role than observed past imagery in the TC intensity forecasts, (iii) the current and past presence/absence of an eye is generally the key feature of a storm, and (iv) the core temperatures outside of the eye play a significant role for intensity forecasting.
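To illustrate what a pixel attribution map measures, the sketch below uses plain occlusion: each pixel is replaced in turn by a baseline value, and the resulting change in the model output is recorded. This is a simpler technique swapped in for illustration; it is neither the SHAP computation nor the gradient-based saliency used for the figures in this work, and the toy model, baseline value, and function names are assumptions.

```python
import numpy as np

def occlusion_saliency(model, image, baseline_value, patch=1):
    """Change in model output when each patch is replaced by a baseline value."""
    base = model(image)
    saliency = np.zeros_like(image, dtype=float)
    for i in range(0, image.shape[0], patch):
        for j in range(0, image.shape[1], patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline_value
            saliency[i:i + patch, j:j + patch] = base - model(occluded)
    return saliency

def toy_model(image):
    """Stand-in model: intensity depends only on the latest (bottom) row."""
    return 0.5 * (220.0 - image[-1].mean())

image = np.full((6, 4), 200.0)   # rows are time steps, columns are radii
saliency = occlusion_saliency(toy_model, image, baseline_value=220.0)
```

Because the toy model reads only the bottom row, all attribution falls on that row, which is the kind of pattern one would look for when asking whether forecasted rows matter more than observed ones.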
5. Discussion and conclusions
This paper demonstrates a novel interpretable approach to short-term TC intensity guidance trained solely on intensity estimates up to 6 h prior to the current time and IR observations up to 0 h. We specifically leverage spatial characteristics of TC convection as captured by radial IR profiles. By forecasting an ensemble of +6- and +12-h trajectories of TC IR structure with radial profiles computed over four geographic quadrants, we obtain reasonable estimates of future +6- and +12-h TC intensity while simultaneously capturing and enabling visualization of signals in convective structure relevant to those future intensities. We focus on interpretable, physically based factors to facilitate understanding of the model’s performance (e.g., upcoming intensification corresponds with decreasing cloud-top temperatures in the structural forecast). The approach outlined here has the potential for further improvement by adopting other network architectures for structural forecasts and by including environmental predictors provided in real time by SHIPS guidance. Though testing on years of cases takes time, an individual forecast for a single TC can be obtained in minutes on a single GPU, indicating the potential for the eventual use of this model as part of the available TC guidance suite in an operational setting.
6. Future work directions
a. Improving the network architecture for structural forecasts
The PixelSNAIL approach provides reasonable simulations of TC IR structural evolution up to 12-h lead times. However, there exists a wealth of alternate deep autoregressive generative models, each of which can be designed and trained in innumerable ways. Likewise, deep autoregressive models are not the only generative models available. Simulation could be carried out via vector autoregression on a low-dimensional projection of profiles (e.g., principal component analysis, Fourier bases, etc.), generative adversarial networks (GANs; Creswell et al. 2018), or transformers [e.g., temporal fusion transformers for multihorizon forecasting (Lim et al. 2021) and spatiotemporal transformers (Grigsby et al. 2021)]. The PixelSNAIL architecture was chosen to demonstrate the value and feasibility of structural forecasting for intensity guidance.
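One of the alternatives mentioned above, vector autoregression on a low-dimensional projection, can be sketched with NumPy alone. The example projects synthetic profiles onto two principal components, fits a first-order vector autoregression by least squares, and simulates forward with Gaussian noise. The synthetic data, the choice of two components and a first-order model, and the function names are all illustrative assumptions; this approach was not evaluated in this work.

```python
import numpy as np

def fit_pca_var(profiles, n_components=2):
    """Fit PCA via SVD, then a least-squares VAR(1) on the component scores."""
    mean = profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(profiles - mean, full_matrices=False)
    basis = vt[:n_components]                      # (k, n_radii)
    scores = (profiles - mean) @ basis.T           # (n_times, k)
    coef, *_ = np.linalg.lstsq(scores[:-1], scores[1:], rcond=None)
    resid = scores[1:] - scores[:-1] @ coef
    return mean, basis, coef, resid.std(axis=0, ddof=1)

def simulate(mean, basis, coef, noise_sd, last_profile, n_steps, rng):
    """Propagate the scores forward and map each step back to a profile."""
    score = (last_profile - mean) @ basis.T
    out = []
    for _ in range(n_steps):
        score = score @ coef + rng.normal(0.0, noise_sd)
        out.append(mean + score @ basis)
    return np.array(out)

# Synthetic profiles: 40 time steps by 30 radii (invented, not IR data).
rng = np.random.default_rng(0)
t = np.arange(40)
radii = np.linspace(0.0, 1.0, 30)
profiles = (220.0 + 10.0 * np.outer(np.sin(0.2 * t), radii)
            + 5.0 * np.outer(np.cos(0.2 * t), radii ** 2)
            + rng.normal(0.0, 0.2, size=(40, 30)))
mean, basis, coef, noise_sd = fit_pca_var(profiles)
simulated = simulate(mean, basis, coef, noise_sd, profiles[-1], n_steps=6, rng=rng)
```

Six simulated steps at 2-h spacing correspond to the 12-h horizon considered here; repeating the call with different random draws would give an ensemble analogous to the one produced by the deep model.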
b. Calibrating the probability distribution of structural forecasts
Our structural forecasts are probabilistic in nature, taking the form of probability distributions over future structural trajectories S>t. In the current work, we apply a standard machine learning approach of fitting a model by minimizing a loss function (in this case the negative log likelihood). A good probabilistic forecast, however, should be conditionally calibrated. That is, the probability of a particular event (in our case, specific radial profiles 6–12 h into the future), given or “conditional on” a particular history of evolution and other predictors, should match the predicted probability of the same event. This is essentially saying that draws from the forecasting model should be indistinguishable from actual observations, if all relevant conditions are the same. Dey et al. (2022) recently proposed a new method for adjusting or “recalibrating” probabilistic forecasts, so that they will have this property. Indeed, one can potentially apply their procedure sequentially to each autoregressive component p(Ziq|Zi−1, …, Z1), for pixel i = 1, …, n, and quadrant q = 1, 2, 3, 4, so as to obtain a conditionally calibrated density over structural trajectories S>t given present and past observations; see the discussion in Dey et al. (2022).
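A common first diagnostic for calibration is the probability integral transform (PIT): for each case, one records the fraction of ensemble members at or below the observed value, and for a calibrated forecast these fractions are approximately uniform on [0, 1]. The sketch below checks only this marginal form of calibration on synthetic data; it is not the recalibration procedure of Dey et al. (2022), which targets conditional calibration, and all names and values are assumptions.

```python
import numpy as np

def pit_values(ensembles, observations):
    """Fraction of ensemble members at or below each observation."""
    ensembles = np.asarray(ensembles, dtype=float)
    observations = np.asarray(observations, dtype=float)
    return np.mean(ensembles <= observations[:, None], axis=1)

def extreme_fraction(pit, tail=0.1):
    """Share of PIT values in the lowest and highest `tail` of [0, 1]."""
    return float(np.mean((pit < tail) | (pit > 1.0 - tail)))

# Synthetic truth and two 50-member ensembles for 2000 cases.
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=2000)
calibrated = rng.normal(0.0, 1.0, size=(2000, 50))      # same spread as truth
overconfident = rng.normal(0.0, 0.5, size=(2000, 50))   # spread is too narrow
frac_cal = extreme_fraction(pit_values(calibrated, truth))
frac_over = extreme_fraction(pit_values(overconfident, truth))
```

The overconfident ensemble places far more observations in the tails than the calibrated one, which appears as a U-shaped PIT histogram and signals that the simulated spread is too small.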
c. Inclusion of environmental variables
The PixelSNAIL model presented here is a purely autoregressive process; that is, it simulates future structural features using only past IR imagery as an input. The inclusion of environmental variables known to impact TCs such as vertical wind shear, atmospheric moisture, or sea surface temperature may improve the accuracy of the forward simulation of radial profiles, particularly of structural evolution beyond 12 h. Such factors can be added to the PixelSNAIL architecture as additional input layers, using values provided by SHIPS rather than values forecasted by the model itself. These inputs would then serve as “guiderails” for simulated structural evolution with potential to better capture the effects of such factors on profile asymmetry. Despite these limitations, our prototype model (which is derived solely from prior and present TC intensity estimates and Geo IR imagery alongside forecasted TC structure using a very simple network architecture) provides reasonable short-term structural and intensity forecasts comparable to NHC forecasts at 6- and 12-h lead times. The inclusion of environmental variables in the nowcasting model is likely to improve its intensity forecasts, which would then be compared to SHIPS forecasts as well as NHC official forecasts, the latter of which are crafted using SHIPS and other guidance.
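One common way to supply scalar predictors to a convolutional model is to broadcast each predictor into a constant-valued channel and stack it with the image-like input. The sketch below illustrates that idea only; it is not an implementation from this work. The input layout (quadrant, time, radius), the array sizes, the per-predictor standardization, and the function name `add_environment_channels` are all assumptions, and the random arrays stand in for real profiles and SHIPS predictors.

```python
import numpy as np

def add_environment_channels(profiles, env):
    """Stack time-varying scalar predictors as extra input channels.

    profiles: (C, T, R) array of radial profiles (channel, time, radius).
    env:      (E, T) array, one row per environmental predictor.
    Returns a (C + E, T, R) array.
    """
    c, t, r = profiles.shape
    if env.shape[1] != t:
        raise ValueError("env must have one value per time step")
    # Standardize each predictor so its scale is comparable across inputs.
    mu = env.mean(axis=1, keepdims=True)
    sd = env.std(axis=1, keepdims=True)
    env_std = (env - mu) / np.where(sd > 0, sd, 1.0)
    # Repeat each value across the radial axis: (E, T) -> (E, T, R).
    env_channels = np.broadcast_to(env_std[:, :, None], (env.shape[0], t, r))
    return np.concatenate([profiles, env_channels], axis=0)

# Hypothetical sizes: 4 quadrants, 48 time steps, 120 radial bins.
rng = np.random.default_rng(1)
profiles = rng.normal(-40.0, 15.0, size=(4, 48, 120))
shear = rng.uniform(0.0, 30.0, size=48)   # stand-in for vertical wind shear
sst = rng.uniform(26.0, 30.0, size=48)    # stand-in for sea surface temperature
stacked = add_environment_channels(profiles, np.stack([shear, sst]))
```

Because the environmental channels are supplied rather than generated, a model conditioned this way would simulate only the profile channels, with the predictor channels held fixed at their provided values.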
Acknowledgments.
Part of this research was done as an independent study in the spring of 2021 while Pavel Khokhlov was a Master in Machine Learning student at Carnegie Mellon University. We are grateful to Microsoft for providing Azure computing resources for this work. The authors thank Katerina Fragkiadaki for a discussion on deep generative networks, and Galen Vincent for many helpful comments on the research. This work is supported in part by NSF DMS-2053804, NSF PHY-2020295, and the C3.ai Digital Transformation Institute.
Data availability statement.
Code to generate Geo IR radial profiles from openly available data can be found at https://github.com/ihmcneely/ORB2sample, which draws from the MERGIR database openly available from NASA at https://disc.gsfc.nasa.gov/datasets/GPM_MERGIR_1/summary. The HURDAT2 best track database is openly available from the NHC at https://www.nhc.noaa.gov/data, while the ATCF operational best tracks (B-deck) are openly available from the National Center for Atmospheric Research at http://hurricanes.ral.ucar.edu/repository/. Official forecast verification files are openly available from the NHC at https://www.nhc.noaa.gov/verification/. Finally, SHIPS developmental data are openly available from the Cooperative Institute for Research in the Atmosphere at https://rammb.cira.colostate.edu/research/tropical_cyclones/ships/developmental_data.asp.
REFERENCES
Chen, X., N. Mishra, M. Rohaninejad, and P. Abbeel, 2018: PixelSNAIL: An improved autoregressive generative model. Proc. 35th Int. Conf. on Machine Learning, Stockholm, Sweden, PMLR, 864–872, https://proceedings.mlr.press/v80/chen18h.html.
Combinido, J. S., J. R. Mendoza, and J. Aborot, 2018: A convolutional neural network approach for estimating tropical cyclone intensity using satellite-based infrared images. 24th Int. Conf. on Pattern Recognition (ICPR), Beijing, China, Institute of Electrical and Electronics Engineers, 1474–1480, https://doi.org/10.1109/ICPR.2018.8545593.
Creswell, A., T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, 2018: Generative adversarial networks: An overview. IEEE Signal Process. Mag., 35, 53–65, https://doi.org/10.1109/MSP.2017.2765202.
DeMaria, M., 2018: SHIPS developmental database file format and predictor descriptions: Developmental Data. Colorado State University, accessed 10 August 2022, https://rammb.cira.colostate.edu/research/tropical_cyclones/ships/developmental_data.asp.
DeMaria, M., and J. Kaplan, 1999: An updated Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic and eastern North Pacific basins. Wea. Forecasting, 14, 326–337, https://doi.org/10.1175/1520-0434(1999)014<0326:AUSHIP>2.0.CO;2.
DeMaria, R., 2015: Automated tropical cyclone eye detection using discriminant analysis. M.S. thesis, Dept. of Computer Science, Colorado State University, 74 pp., https://www.cs.colostate.edu/~anderson/wp/pubs/demaria-2015-ms.pdf.
Dey, B., D. Zhao, J. A. Newman, B. H. Andrews, R. Izbicki, and A. B. Lee, 2022: Calibrated predictive distributions via diagnostics for conditional coverage. arXiv, 2205.14568v2, https://doi.org/10.48550/arXiv.2205.14568.
Dvorak, V. F., 1975: Tropical cyclone intensity analysis and forecasting from satellite imagery. Mon. Wea. Rev., 103, 420–430, https://doi.org/10.1175/1520-0493(1975)103<0420:TCIAAF>2.0.CO;2.
Ebert-Uphoff, I., and K. Hilburn, 2020: Evaluation, tuning, and interpretation of neural networks for working with images in meteorological applications. Bull. Amer. Meteor. Soc., 101, E2149–E2170, https://doi.org/10.1175/BAMS-D-20-0097.1.
Gray, W. M., 1979: Hurricanes: Their formation, structure and likely role in the tropical circulation. Supplement to Meteorology over the Tropical Oceans, D. B. Shaw, Ed., Royal Meteorological Society, 155–218.
Griffin, S. M., A. Wimmers, and C. S. Velden, 2022: Predicting rapid intensification in North Atlantic and eastern North Pacific tropical cyclones using a convolutional neural network. Wea. Forecasting, 37, 1333–1355, https://doi.org/10.1175/WAF-D-21-0194.1.
Grigsby, J., Z. Wang, N. Nguyen, and Y. Qi, 2021: Long-range transformers for dynamic spatiotemporal forecasting. arXiv, 2109.12218v3, https://doi.org/10.48550/arXiv.2109.12218.
Hovmöller, E., 1949: The trough-and-ridge diagram. Tellus, 1, 62–66, https://doi.org/10.3402/tellusa.v1i2.8498.
Hu, L., E. A. Ritchie, and J. S. Tyo, 2020: Short-term tropical cyclone intensity forecasting from satellite imagery based on the deviation angle variance technique. Wea. Forecasting, 35, 285–298, https://doi.org/10.1175/WAF-D-19-0102.1.
Janowiak, J., B. Joyce, and P. Xie, 2020: NCEP/CPC L3 half hourly 4 km global (60S - 60N) merged IR v1 (GPM_MERGIR). NASA Goddard Earth Sciences Data and Information Services Center, accessed 8 February 2022, https://doi.org/10.5067/P4HZB9N27EKU.
Knaff, J. A., and R. T. DeMaria, 2017: Forecasting tropical cyclone eye formation and dissipation in infrared imagery. Wea. Forecasting, 32, 2103–2116, https://doi.org/10.1175/WAF-D-17-0037.1.
Knapp, K. R., and S. L. Wilkins, 2018: Gridded satellite (GridSat) GOES and CONUS data. Earth Syst. Sci. Data, 10, 1417–1425, https://doi.org/10.5194/essd-10-1417-2018.
Landsea, C. W., and J. L. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. Mon. Wea. Rev., 141, 3576–3592, https://doi.org/10.1175/MWR-D-12-00254.1.
Lee, J., J. Im, D.-H. Cha, H. Park, and S. Sim, 2020: Tropical cyclone intensity estimation using multi-dimensional convolutional neural networks from geostationary satellite data. Remote Sens., 12, 108, https://doi.org/10.3390/rs12010108.
Lim, B., S. Ö. Arık, N. Loeff, and T. Pfister, 2021: Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecasting, 37, 1748–1764, https://doi.org/10.1016/j.ijforecast.2021.03.012.
Lundberg, S. M., and S.-I. Lee, 2017: A unified approach to interpreting model predictions. NIPS’17: Proc. 31st Int. Conf. on Neural Information Processing Systems, Long Beach, CA, Association for Computing Machinery, 4768–4777, https://dl.acm.org/doi/10.5555/3295222.3295230.
McGovern, A., R. Lagerquist, D. J. Gagne II, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.
McNeely, T., A. B. Lee, D. Hammerling, and K. Wood, 2019: Quantifying the spatial structure of tropical cyclone imagery. NCAR Tech. Note NCAR/TN-557+STR, 18 pp., https://doi.org/10.5065/5frb-ws04.
McNeely, T., A. B. Lee, K. M. Wood, and D. Hammerling, 2020: Unlocking GOES: A statistical framework for quantifying the evolution of convective structure in tropical cyclones. J. Appl. Meteor. Climatol., 59, 1671–1689, https://doi.org/10.1175/JAMC-D-19-0286.1.
McNeely, T., G. Vincent, A. B. Lee, R. Izbicki, and K. M. Wood, 2022: Detecting distributional differences in labeled sequence data with application to tropical cyclone satellite imagery. arXiv, 2202.02253v3, https://doi.org/10.48550/arXiv.2202.02253.
Olander, T., and C. Velden, 2007: The advanced Dvorak technique: Continued development of an objective scheme to estimate tropical cyclone intensity using geostationary infrared satellite imagery. Wea. Forecasting, 22, 287–298, https://doi.org/10.1175/WAF975.1.
Olander, T., A. Wimmers, C. Velden, and J. P. Kossin, 2021: Investigation of machine learning using satellite-based advanced Dvorak technique analysis parameters to estimate tropical cyclone intensity. Wea. Forecasting, 36, 2161–2186, https://doi.org/10.1175/WAF-D-20-0234.1.
Pradhan, R., R. S. Aygun, M. Maskey, R. Ramachandran, and D. J. Cecil, 2018: Tropical cyclone intensity estimation using a deep convolutional neural network. IEEE Trans. Image Process., 27, 692–702, https://doi.org/10.1109/TIP.2017.2766358.
Ribeiro, M. T., S. Singh, and C. Guestrin, 2016: “Why should I trust you?”: Explaining the predictions of any classifier. Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, Association for Computing Machinery, 1135–1144, https://dl.acm.org/doi/10.1145/2939672.2939778.
Salimans, T., A. Karpathy, X. Chen, and D. P. Kingma, 2017: PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv, 1701.05517v1, https://doi.org/10.48550/arXiv.1701.05517.
Sampson, C. R., and A. J. Schrader, 2000: The Automated Tropical Cyclone Forecasting System (Version 3.2). Bull. Amer. Meteor. Soc., 81, 1231–1240, https://doi.org/10.1175/1520-0477(2000)081<1231:TATCFS>2.3.CO;2.
Sanabia, E. R., B. S. Barrett, and C. M. Fine, 2014: Relationships between tropical cyclone intensity and eyewall structure as determined by radial profiles of inner-core infrared brightness temperature. Mon. Wea. Rev., 142, 4581–4599, https://doi.org/10.1175/MWR-D-13-00336.1.
Schmit, T. J., P. Griffith, M. M. Gunshor, J. M. Daniels, S. J. Goodman, and W. J. Lebair, 2017: A closer look at the ABI on the GOES-R series. Bull. Amer. Meteor. Soc., 98, 681–698, https://doi.org/10.1175/BAMS-D-15-00230.1.
Tian, W., W. Huang, L. Yi, L. Wu, and C. Wang, 2020: A CNN-based hybrid model for tropical cyclone intensity estimation in meteorological industry. IEEE Access, 8, 59 158–59 168, https://doi.org/10.1109/ACCESS.2020.2982772.
van den Oord, A., N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu, 2016a: Conditional image generation with PixelCNN decoders. NIPS’16: Proc. 30th Int. Conf. on Neural Information Processing Systems, Barcelona, Spain, Curran Associates Inc., 4797–4805, https://dl.acm.org/doi/10.5555/3157382.3157633.
van den Oord, A., N. Kalchbrenner, and K. Kavukcuoglu, 2016b: Pixel recurrent neural networks. Proc. 33rd Int. Conf. on Machine Learning, New York, NY, Association for Computing Machinery, 1747–1756, https://dl.acm.org/doi/10.5555/3045390.3045575.
Wang, C., Q. Xu, X. Li, and Y. Cheng, 2020: CNN-based tropical cyclone track forecasting from satellite infrared images. IGARSS 2020 IEEE Int. Geoscience and Remote Sensing Symp., Waikoloa, HI, Institute of Electrical and Electronics Engineers, 5811–5814, https://doi.org/10.1109/IGARSS39084.2020.9324408.
Zhang, C.-J., X.-J. Wang, L.-M. Ma, and X.-Q. Lu, 2021: Tropical cyclone intensity classification and estimation using infrared satellite images with deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 14, 2070–2086, https://doi.org/10.1109/JSTARS.2021.3050767.