## 1. Introduction

Over the past few decades, predicting track and intensity fluctuations of North Atlantic tropical cyclones (TCs) has been a major emphasis for operational forecasters and atmospheric researchers alike. Both track and intensity metrics are definitively connected to the risks of a tropical cyclone, as maximum wind speed (VMAX) is related to the maximum potential destructiveness of a storm’s wind field (e.g., Emanuel 2005; Bell et al. 2000), and storm position is used to identify the regions at risk of experiencing the storm.

Despite the inherent usefulness of track and intensity metrics, they do not yield information on the size or overall strength of TCs. Over the past few Atlantic hurricane seasons, several notable landfalling TCs have produced more damage than would otherwise have been expected from storms of their intensity. Hurricanes Ike (2008) and Irene (2011) both produced in excess of $15 billion (U.S. dollars) in damage across the United States as noted by the National Hurricane Center’s (NHC) tropical cyclone reports (Berg 2009; Avila and Cangialosi 2011) despite being rated as category 2 and category 1 storms, respectively, at the time of landfall on the Saffir–Simpson hurricane wind scale (SSHWS; Saffir 1975; Simpson 1974). More recently, Sandy (2012) only had a VMAX of 70 kt (1 kt = 0.5144 m s^{−1}) when it made landfall as an extraordinarily large post-tropical storm, yet it caused in excess of $50 billion (U.S. dollars) in total damage (Blake et al. 2013). As a result of these recent damaging storms and those from the record-setting 2005 Atlantic hurricane season (e.g., Hurricane Katrina), some researchers have questioned whether or not intensity metrics, such as maximum sustained winds, are best suited to communicate the risks of TCs (Kantha 2006).

Hurricanes Ike, Irene, and Sandy were all very large storms despite their weaker intensities. Therefore, it is distinctly possible that VMAX is not the sole variable tied to damage potential, especially in large tropical cyclones. Irish et al. (2008) found that storm size is significantly correlated to storm surge in Atlantic hurricanes and recommended that although intensity scales adequately categorize wind damage, storm size must be considered to adequately categorize damage from flooding. Of course, storm surge damage is also significantly influenced by nonmeteorological variables such as coastal bathymetry, population density, and property value. Nonetheless, with all else being equal, storm size is thought to have a significant influence over storm surge risks in landfalling TCs. Overall, conclusions such as these underscore the importance of assessing TC structure in real time and also help to validate the usefulness of metrics related to TC size such as operational wind radii.

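Integrated kinetic energy (IKE) is one metric that accounts for the size of the wind field. It is computed by integrating the kinetic energy per unit volume, that is, one-half the square of the wind speed (*U*^{2}) times the density (*ρ*), over a 1-m-deep volume domain of a tropical cyclone (Powell and Reinhold 2007), or more specifically,

$$\mathrm{IKE} = \int_{V} \frac{1}{2}\rho U^{2}\,dV. \tag{1}$$

In most cases, IKE is integrated using Eq. (1) over a portion of the storm volume *V* that contains wind speeds greater than a certain threshold, such as tropical storm force or hurricane force (Powell and Reinhold 2007; Misra et al. 2013).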
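As a rough illustration of the volume integral described above, the following Python sketch sums the kinetic energy of every grid cell whose wind speed exceeds a threshold. The grid layout, the function name `integrated_kinetic_energy`, the uniform cell area, and the default air density are assumptions made for this example; they are not taken from Powell and Reinhold (2007).

```python
def integrated_kinetic_energy(wind_speeds, cell_area_m2, threshold=18.0,
                              density=1.0, depth_m=1.0):
    """Approximate IKE (in TJ) from a gridded wind field.

    wind_speeds  : 2D list of wind speeds (m/s), one value per grid cell
    cell_area_m2 : horizontal area of each cell (m^2); assumed uniform
    threshold    : only cells with U > threshold contribute (m/s)
    density      : air density (kg/m^3); the default is an assumed value
    depth_m      : depth of the volume domain (1 m in the text)
    """
    joules = 0.0
    for row in wind_speeds:
        for u in row:
            if u > threshold:
                # Kinetic energy of one cell: 0.5 * rho * U^2 * dV
                joules += 0.5 * density * u ** 2 * cell_area_m2 * depth_m
    return joules / 1e12  # 1 TJ = 1e12 J


# Hypothetical 2 x 3 wind field on a 10 km x 10 km grid
field = [[12.0, 20.0, 30.0],
         [17.0, 25.0, 40.0]]
ike_ts = integrated_kinetic_energy(field, cell_area_m2=1.0e8)
ike_hurricane = integrated_kinetic_energy(field, cell_area_m2=1.0e8,
                                          threshold=33.0)
```

Raising the threshold from tropical storm force to hurricane force restricts the sum to the inner, stronger part of the wind field, which is why the two quantities can differ greatly for large but weak storms.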

Importantly, unlike maximum sustained wind metrics, IKE responds to changes in the overall size, strength, and intensity of a TC. With respect to the physical processes that govern damage potential, IKE is designed to scale with the stress of the wind on the ocean as well as the wind load that acts upon a structure (Powell and Reinhold 2007). Therefore, IKE and other similar kinetic energy metrics are hypothesized to correspond well to the destructive potential of TCs, particularly with regards to storm surge damage.

For example, Hurricanes Ike, Irene, and Sandy at their respective times of landfall in the United States had extremely high IKE values, despite their low intensities (Fig. 1). Just prior to landfall, each of these three storms had IKE values that would rank in the top 7.5% of IKE values for all hurricane fixes between 1990 and 2011. Sandy, in particular, reached a lifetime maximum of over 400 TJ of IKE just prior to landfall, giving it the second highest maximum IKE value in any Atlantic TC since 1990. Therefore, it is distinctly possible that forecasting IKE, as a complement to existing intensity metrics, could help to better assess the risks of landfalling North Atlantic TCs, particularly for larger and less intense storms such as these three recent examples.

Despite the potential usefulness of real-time IKE updates, there are limited resources currently available to forecasters that are specifically designed to assess the kinetic energy of a tropical cyclone. Therefore, we present a statistical model, named Statistical Prediction of Integrated Kinetic Energy (SPIKE), which can potentially be used in real time for forecasting IKE fluctuations in North Atlantic TCs. Much like the operational Statistical Hurricane Intensity Prediction Scheme (SHIPS) used for forecasting VMAX (DeMaria and Kaplan 1994, 1999; DeMaria et al. 2005), SPIKE utilizes a multivariate linear regression model trained on a blend of environmental and internal storm-driven predictors, in addition to observed persistence metrics, to predict changes of IKE out to 3 days in the future.

## 2. Data

### a. Historical integrated kinetic energy record

To calibrate a statistical regression model for IKE, it is first necessary to obtain a historical record of IKE that can be used as the calibration model’s dependent variable. Gridded analyses of TC wind fields would be ideal for calculating IKE, but unfortunately datasets like H*Wind analyses (Powell et al. 1998) are discontinuous, as they are not available for every 6-hourly storm fix. Alternatively, Powell and Reinhold (2007) developed an approximate set of stepwise equations to calculate estimates of IKE from operational 34-, 50-, and 64-kt wind radii, the radius of maximum wind (RMW), and VMAX. This methodology to approximate IKE from operational wind radii has been used previously to catalog historical IKE levels (Misra et al. 2013) and will be used again here to form a large historical record of IKE values to train and test our statistical model. Specifically, the operational wind radii from the extended best-track dataset (Demuth et al. 2006) are utilized to create a 6-hourly record of IKE values for all North Atlantic TCs over a 22-yr training interval between 1990 and 2011. The resulting IKE values in the historical dataset specifically measure the integrated kinetic energy only over the portion of the wind field where the wind speeds are of tropical storm force or greater (*U* > 18 m s^{−1}). This specific IKE quantity is selected because it is hypothesized to be closely related to storm surge damage potential by Powell and Reinhold (2007).

In total, our historical IKE record comprises 5498 6-hourly IKE values from 291 individual TCs from 1990 to 2011 (one IKE value for each 6-hourly TC fix). For a storm fix to be included in this IKE database, the storm must have maximum sustained winds of tropical storm force or greater, and the storm must be classified as a tropical or subtropical cyclone in the official best-track database (i.e., extratropical and post-tropical lows are not considered). The vast majority of the fixes that meet these criteria, and are therefore included in the historical IKE database, are located over the open ocean. However, there is a small minority of points for which the center of the TC is located over land. These so-called land points occur mostly over islands or very near the coastlines of mainland North America, as storms typically do not retain tropical characteristics and tropical storm intensity long after landfall. We elected not to remove these “land points” in order to include landfalling storms in our training sample, considering that TCs have the most significant societal impacts when they make landfall.
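The two inclusion criteria can be written as a simple filter. This is only a sketch: the record layout and the field names `vmax_kt` and `storm_type` are hypothetical and do not reflect the actual format of the extended best-track dataset. The 34-kt value is the standard threshold for tropical storm force.

```python
TS_FORCE_KT = 34  # tropical storm force threshold for maximum sustained wind


def include_fix(fix):
    """Return True if a 6-hourly fix meets both inclusion criteria."""
    strong_enough = fix["vmax_kt"] >= TS_FORCE_KT
    tropical = fix["storm_type"] in ("tropical", "subtropical")
    return strong_enough and tropical


# Hypothetical fixes; the field names are assumptions for illustration
fixes = [
    {"name": "A", "vmax_kt": 30, "storm_type": "tropical"},       # too weak
    {"name": "B", "vmax_kt": 65, "storm_type": "tropical"},       # kept
    {"name": "C", "vmax_kt": 45, "storm_type": "subtropical"},    # kept
    {"name": "D", "vmax_kt": 70, "storm_type": "extratropical"},  # wrong type
]
kept = [f["name"] for f in fixes if include_fix(f)]
```

Note that no test on land versus ocean position appears in the filter, consistent with the decision to retain the land points.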

The relative frequency distribution of this large sample of 6-hourly historical IKE data points is shown in Fig. 1. It is immediately evident that Atlantic TCs most often have IKE values that do not exceed 25 TJ. The mean IKE value in the dataset is 34.9 TJ, and the standard deviation is 43.0 TJ. In rare instances, TCs can briefly obtain IKE values greater than 300 TJ, typically near the end of their time as a tropical cyclone.

It should be noted that the historical wind radii dataset used to calculate historical IKE values is subject to year-to-year and storm-to-storm inconsistencies. Landsea and Franklin (2013) estimated that uncertainties for operational 34-, 50-, and 64-kt radii were quite large during the 2010 season. In fact, they estimate that on average the error bars for each radius estimate could exceed 30% of the mean size of that radius. Overall, they find that the accuracy of the historical wind radii is limited by the data available in the operational or poststorm analyses, wherein operational radii measured by aircraft reconnaissance have less uncertainty than do those measured by satellite observations alone. Furthermore, Landsea and Franklin (2013) indicate that landfalling storms and more intense storms typically have less uncertainty, as these types of storms are generally observed more frequently and with better equipment, especially landfalling storms, which can be observed from the ground. Therefore, the historical IKE dataset is likely subject to the same inconsistencies of data quality and quantity that are evident within the wind radii data in the extended best-track dataset.

### b. Potential model predictors

In addition to the historical record of IKE, a pool of potential predictors must also be created to establish the relationships that will govern the statistical model. These predictors are carefully selected based on some of the understood relationships between a storm’s environment and the factors that govern the size, strength, intensity, and ultimately kinetic energy of a TC (e.g., Gray 1968; McBride 1995; Hill and Lackmann 2009; Maclay et al. 2008; Musgrave et al. 2012).

These environmental, storm-specific, and persistence predictors are gathered from a combination of the NHC best track dataset (Jarvinen et al. 1984) and the SHIPS developmental dataset (DeMaria and Kaplan 1999). The variables derived from the NHC best track data are storm-specific predictors, such as date, position, duration, intensity, and translational storm motion. The environmental variables, which encompass thermodynamic-, dynamic-, and moisture-related fields known to affect TC behavior, are taken from the aforementioned SHIPS developmental dataset (DeMaria and Kaplan 1999). In all, 31 predictors were considered for this regression exercise (Table 1).

Table 1. Variables considered for use in the SPIKE models. Many of the variables (e.g., RHLO, SHRD, etc.) originated from SHIPS and are averaged over specific areas as noted. Others were specifically created from observations for the SPIKE model. Not all of these variables are used in the final models, as many of the regression coefficients fail significance tests in the backward screening methodology.

It should be noted that the SHIPS developmental dataset, from which the environmental data are obtained, is not a forecast, and instead it utilizes analyses and reanalyses from the National Centers for Environmental Prediction (NCEP) to provide estimates for the observed environmental conditions experienced by each storm from genesis to dissipation. Therefore, this model is trained using a so-called perfect prog approach (e.g., Neumann 1987; DeMaria 2010), wherein observations and analyses are used to drive a statistical model. Obviously, an operational real-time version of SPIKE will require the predictors to be forecasted by numerical models, since the analyses and reanalyses are not available for future time steps in a real-time operational setting. Therefore, the SPIKE models discussed for the remainder of sections 3–5, all of which utilize observations and SHIPS developmental data, are not meant to be forecasts or even hindcasts. Instead these regression models will provide an estimate for the maximum potential skill of SPIKE in the idealistic scenario that the forecasted predictors exactly match the future observations.

## 3. Regression methodology

The distribution of historical 6-hourly IKE values is decidedly non-Gaussian as shown in Fig. 1. In fact, the 6-hourly total IKE values are approximately lognormally distributed, which is similar to the distribution of storm size as measured by the radii of vanishing winds (Dean et al. 2009). Therefore, it would be inappropriate to use linear regression to model total IKE values. Instead, SPIKE will seek to predict changes of IKE from the initial time out to 12 evenly spaced time intervals from 6 to 72 h (0–6, 0–12, …, 0–72 h). Ultimately, these IKE changes are more normally distributed (e.g., Fig. 2), and thus, it becomes more appropriate to use multivariate linear regression. However, in order to create a model for each of these 12 forecast intervals, SPIKE regression models must be calibrated and validated separately for each forecast interval.
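To make the definition of the predictand concrete, the sketch below computes the IKE change from each initial fix out to each of the 12 lead times for a single storm. The function name `ike_changes` and the sample series are hypothetical and serve only as an illustration.

```python
def ike_changes(ike_series, step_hours=6, max_lead_hours=72):
    """Return {lead_hours: [(t_index, delta_ike), ...]} for one storm.

    ike_series : IKE values (TJ) at consecutive 6-hourly fixes of one storm
    """
    n_leads = max_lead_hours // step_hours  # 12 intervals: 6, 12, ..., 72 h
    changes = {}
    for k in range(1, n_leads + 1):
        lead = k * step_hours
        # Change from the initial time t to the verification time t + lead
        changes[lead] = [(t, ike_series[t + k] - ike_series[t])
                         for t in range(len(ike_series) - k)]
    return changes


series = [10.0, 12.0, 15.0, 19.0, 24.0]  # hypothetical IKE record (TJ)
deltas = ike_changes(series)
```

Short-lived storms contribute no samples at the longer lead times (here, every interval beyond 24 h is empty), so the sample size shrinks as the forecast interval grows and each interval needs its own regression.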

Based on the success of the statistical SHIPS model used to predict intensity change relative to other statistical and dynamical models (DeMaria et al. 2005), we utilize a similar regression methodology to construct our SPIKE model for IKE changes. As done by DeMaria and Kaplan (1994), both the dependent and independent variables are normalized prior to training the regression model. Ultimately, by normalizing the predictors, it becomes possible to make a comparison between regression coefficients for various predictors and forecast hours (DeMaria and Kaplan 1994).
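The normalization step can be sketched as a conversion to standard scores. Whether the original work used the population or the sample standard deviation is not stated, so the choice of `pstdev` here is an assumption, as are the function name and the sample values.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to zero-mean, unit-variance standard scores.

    Returns the standardized values together with the mean and standard
    deviation, which are needed later to convert predictions back to TJ.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values], mu, sigma


# Hypothetical predictor values in their native units
shear_kt = [5, 10, 15, 20, 25]
shear_z, shear_mean, shear_std = standardize(shear_kt)
```

Once every variable is expressed in standard deviations, a coefficient of 0.3 on one predictor and 0.1 on another can be compared directly, regardless of the original units.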

To avoid overfitting in our SPIKE regression model, predictor screening must be utilized. Once again, we follow the lead of DeMaria and Kaplan’s SHIPS model (DeMaria and Kaplan 1994, 1999) by using backward screening to objectively select the most skillful predictors. Backward predictor screening is done here by training the model upon all of the predictors for each forecast interval, and then repeatedly removing the single predictor with the least significant regression coefficient one at a time, until all of the remaining regression coefficients are significant at the *p* = 0.01 level. Ultimately, this backward screening methodology retains a smaller subset of predictors that are used in the SPIKE model for each forecast interval. To make the SPIKE prediction model as uniform as possible across the forecast intervals, the same predictors are chosen for all intervals. These predictors are selected if their coefficients are significant at the 99% level for at least half of the forecast intervals. As a result, predictors may be used on intervals when their coefficients are not significant, but as pointed out by DeMaria and Kaplan (1994), when a predictor is not significant, its coefficient becomes small and its influence on the regression model is diminished.
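The two-stage selection described above can be sketched in plain Python. This is an illustrative reconstruction under stated assumptions, not the authors' code: the helper names are hypothetical, the fit omits an intercept because all variables are assumed standardized, and p-values come from a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math
import random
from statistics import NormalDist


def ols_pvalues(X, y):
    """Least-squares fit of y on the columns of X (no intercept).
    Returns (coefficients, two-sided p-values from a normal approximation)."""
    n, k = len(X), len(X[0])
    # Normal equations: A = X'X and c = X'y
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination on [A | I] yields the inverse of A
    M = [A[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    A_inv = [row[k:] for row in M]
    beta = [sum(A_inv[i][j] * c[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    pvals = []
    for i in range(k):
        z = abs(beta[i]) / math.sqrt(s2 * A_inv[i][i])
        pvals.append(2.0 * (1.0 - NormalDist().cdf(z)))
    return beta, pvals


def backward_screen(X, y, names, alpha=0.01):
    """Drop the least significant predictor until all have p <= alpha."""
    keep = list(range(len(names)))
    while keep:
        Xk = [[row[j] for j in keep] for row in X]
        _, pvals = ols_pvalues(Xk, y)
        worst = max(range(len(keep)), key=lambda i: pvals[i])
        if pvals[worst] <= alpha:
            break  # every remaining coefficient is significant
        del keep[worst]
    return [names[j] for j in keep]


def uniform_predictors(selected_by_interval):
    """Keep predictors retained in at least half of the forecast intervals."""
    counts = {}
    for names in selected_by_interval.values():
        for name in names:
            counts[name] = counts.get(name, 0) + 1
    half = len(selected_by_interval) / 2
    return sorted(name for name, cnt in counts.items() if cnt >= half)


# Synthetic data: only x1 truly influences y
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(400)]
y = [0.8 * row[0] + rng.gauss(0, 0.5) for row in X]
selected = backward_screen(X, y, ["x1", "x2", "x3"])
```

In practice `backward_screen` would be run once per forecast interval, and the resulting lists would be passed to `uniform_predictors` to obtain a single predictor set for all 12 models.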

It should also be noted that some of the SHIPS predictors included in Table 1 have missing or null data values scattered throughout our training interval of 1990–2011. Most notably, some of the subsurface ocean predictors (e.g., RD26) are unavailable for entire years at the beginning of the record. Therefore, to account for the limited number of missing predictor values, the regression coefficients are calibrated and the predictors are screened using a sample of all storm fixes that contain no missing data in the training interval. In other words, if a certain storm fix contains even one missing predictor, that data point is not used in the calibration of the model. However, once the predictors are chosen and the coefficients are calculated, we estimate IKE for every storm fix to evaluate model skill. In the case that a predictor is missing, as occurs occasionally in an operational setting, we fill the missing data point with a value equivalent to the sample mean (i.e., exactly zero since the predictors are all normalized).
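The treatment of missing predictors amounts to two different views of the same data: complete cases for calibration and mean-filled rows for evaluation. A minimal sketch follows, in which missing values are represented by `None` and the function names are hypothetical.

```python
def complete_cases(rows):
    """Rows of standardized predictors that contain no missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def fill_missing(rows, fill_value=0.0):
    """Replace missing values with the sample mean, which is exactly
    zero once the predictors have been standardized."""
    return [[fill_value if v is None else v for v in r] for r in rows]


# Hypothetical standardized predictor rows (one row per storm fix)
rows = [[0.5, None], [1.0, -0.2], [None, None]]
calibration_rows = complete_cases(rows)  # used to fit and screen
evaluation_rows = fill_missing(rows)     # used to evaluate every fix
```

Filling with zero means that a missing predictor contributes nothing to the regression sum for that fix, so the estimate falls back on the remaining predictors.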

## 4. Results

This section presents the results of the SPIKE regression model. In addition to estimating fluctuations of IKE, SPIKE will also be adapted to estimate total IKE by incorporating persistence values of kinetic energy. An emphasis will be made in interpreting the physical relationships that drive the regression model in the first section, because a statistical relationship is meaningless without an understanding of the underlying physical processes. Later sections focus on evaluating the calibration and predictive skill of SPIKE with explained variance and mean error statistics as well as a standard bootstrapping exercise. Finally, the last section analyzes SPIKE’s ability to project IKE values during the 2012 Atlantic hurricane season in order to evaluate the model’s skill on a season that is outside of the training interval.

### a. Physical interpretation of selected predictors

The predictors retained through the backward screening exercise are shown in the first column of Table 2. These predictors encompass a wide array of variables ranging from thermodynamical fields such as the depth of the 26°C isotherm to positional variables such as latitude, dynamical values like upper-level divergence, and persistence variables such as past values of IKE. For the sake of simplicity, the variables are referenced as they are abbreviated in Table 1 for the rest of this discussion.

Table 2. Regression coefficients for each predictor in the SPIKE model. The coefficients listed in italic font are significant at the 99% level for that forecast hour. The sample size for the training interval of 1990 to 2011 is shown below. Finally, the shared variance between the SPIKE regression model and observed kinetic energy changes is listed in the bottom row.

The coefficients for these variables at selected forecast intervals are also shown in Table 2. Encouragingly, the sign of most of the predictors’ coefficients does not vary with forecast hour. For example, the coefficient for PIKE is negative in all intervals, suggesting that storms with higher IKE are more likely to have decreasing IKE over time. As seen in the distribution of historical IKE values, TCs most often have low values of IKE; therefore, it is not terribly surprising that higher IKE storms typically weaken. The physical reasoning behind this relationship is tied to the timing of maximum IKE during a TC life cycle. As found by Musgrave et al. (2012), TCs often exhibit storm growth (increasing IKE) through most of their life cycle. North Atlantic TCs, in particular, maintain or increase their size even after reaching maximum intensity (Knaff et al. 2014). As a result, TCs often have their highest levels of IKE late in their life cycle, either prior to landfall or during extratropical transition (ET) when TCs often undergo wind field expansion as the RMW moves outward and the outer wind field accelerates (Evans and Hart 2008). Obviously, following landfall or the completion of ET, TCs typically weaken drastically over the hostile environments of land or the cold northern Atlantic Ocean. Therefore, the negative coefficient of PIKE can be attributed to the negative IKE change of these strong storms, and the fact that weaker storms near genesis typically will gain IKE as they become more mature, provided the environment is not too unfavorable. Similar to PIKE, PDAY has negative coefficients for all forecast intervals. The negative coefficient is tied to the fact that TCs are more likely to have periods of increasing IKE close to the peak of the season (small PDAY), when conditions are typically most favorable for TC development.

The only predictor to have a coefficient that changes sign with forecast hour is dIKE12, wherein the coefficient is positive in the shorter forecast intervals and slightly negative in the longer intervals. The positive dIKE12 coefficients in the first several forecast intervals make sense, as storms typically continue to have increasing IKE in most phases of their life cycle (Musgrave et al. 2012), wherein a growing storm will continue to grow provided the environment remains somewhat favorable. Likewise, if a storm is in an environment unfavorable for IKE growth, the kinetic energy will likely continue to drop, at least in the short term. The slightly negative sign for the longer forecast intervals is more difficult to reason physically, but it should be noted that the coefficient is not significant to begin with, which suggests that the past 12-h IKE change is only helpful for determining upcoming IKE changes in the immediate future.

In terms of the environmental predictors, some of the underlying physical relationships are immediately apparent. For example, the coefficients for VORT and D200 are positive, suggesting a direct relationship between storm growth and each of these fields. In this case, both low-level vorticity and upper-level divergence are well-known conditions that are generally favorable for large-scale organized convection and the formation and development of TCs (e.g., McBride 1995). Similarly, the negative coefficients for MSLP and PENV are expected, as a more intense storm and/or a storm with a larger area of low pressure will typically have higher wind speeds and increased IKE with all else being equal. Likewise, the positive coefficients for RD26 are unsurprising, as TCs induce turbulent mixing and upwelling in the upper levels of the ocean, which cools SSTs through the entrainment of cooler subsurface waters (e.g., Price 1981). This SST cooling mechanism plays a significant role of slowing down TC growth and intensification, especially for slower-moving storms over shallow oceanic mixed layers (Schade and Emanuel 1999). Thus, an environment with a deeper thermocline, and a higher RD26, is more resistant to the negative SST feedback mechanism, making storm growth more favorable. Finally, the positive coefficients for DTL also make sense, as TCs tend to weaken as they approach and eventually cross landmasses.

In some cases, however, the physical processes that govern the regression coefficients are less apparent. For example, the positive coefficients for SHRD and LAT seem somewhat counterintuitive to conventional TC development theories, wherein TCs favor low shear environments as well as warmer oceans, which are typically found in the lower latitudes. However, as discovered by Maclay et al. (2008), TC growth can also be tied to external forcing from trough interactions and baroclinic environments over the higher latitudes. The positive coefficients for REFC, for example, reflect the positive influence of trough interactions on storm growth (Maclay et al. 2008; DeMaria et al. 1993). Likewise, the positive coefficients of LAT and SHRD are likely related to ET. Historically, ET occurs in just under half of all Atlantic TCs (Hart and Evans 2001), typically over the more sheared higher latitudes of the basin. Thus, the wind field expansion from ET and the subsequent increase in IKE over the higher latitudes are likely influencing the signs of these coefficients. Finally, the negative coefficients for RHLO are particularly counterintuitive because in most cases increased low-level humidity is favorable for TC development and also for increased storm size (e.g., Hill and Lackmann 2009). However, this apparent contradiction can be explained by going back to the relationship between increasing IKE and extratropical transition, because storms undergoing ET often have an intrusion of dry air into the storm circulation (Jones et al. 2003), thus decreasing RHLO. Therefore, it is possible that lower RHLO could be associated with expanding wind fields in ET or other similar events, but additional study of the physical relationship between IKE and RHLO is clearly warranted.

### b. Model skill and validation tests, 1990–2011

The shared variance between the SPIKE model and IKE variability is shown for selected forecast intervals during the training period in Table 2. As is the case with SHIPS (DeMaria and Kaplan 1994), the explained variance increases with increasing forecast hour. At first, this appears counterintuitive as forecast skill typically decreases with lead time. However, the average magnitude of IKE change from 1990 to 2011 is much smaller in the shorter forecast intervals than in the larger forecast intervals (9 TJ for 12 h and 32 TJ for 72 h). Considering the errors and biases within the historical archive of operational wind radii, the calculations for observed IKE likely contain biases on the same order as, or even greater than, the IKE changes themselves for the shorter forecast intervals. Therefore, the model will perform poorly at explaining these smaller short-term changes that are dominated by observational biases. Furthermore, the predictors used to train SPIKE in this “perfect prog” approach are observed and not forecasted. Therefore, this exercise is not hurt by forecast biases and errors, which would ostensibly increase with forecast hour. Instead this exercise is a proof of concept for forecasting IKE changes given idealistic perfectly forecasted predictors.

The shared variance scores of SPIKE for predicting integrated kinetic energy changes are all significant considering the large sample sizes from using thousands of storm fixes between 1990 and 2011. The shared variance statistics for SPIKE are particularly impressive at the longer forecast intervals (*r*^{2} = 0.54), where they approach and in some cases exceed the shared variance levels for SHIPS and TC intensity (DeMaria and Kaplan 1994, 1999). Admittedly, the model performs quite poorly in the shorter ranges, especially considering observed predictors are used instead of forecasted predictors.

Although SPIKE is designed to predict the normally distributed quantity of integrated kinetic energy change, it can still be adapted to predict total IKE values (Fig. 3). This is done by adding the estimate of IKE change from SPIKE to the known persistence IKE value. As a result of incorporating persistence into the IKE forecast, the shared variance between SPIKE’s total integrated kinetic energy estimates and the observed totals is significantly higher than the shared variance for its estimates of IKE fluctuations alone. At a forecast interval of 12 h, SPIKE can estimate total IKE with a staggering explained variance of 84%. This shared variance drops off to 70% by 30 h and a still impressive 60% by a forecast interval of 72 h (Fig. 4a).
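The conversion from a predicted change to a total IKE estimate, and the shared variance used to score it, can be sketched as follows. The example assumes that the regression output is a standardized change that must first be converted back to TJ with the training mean and standard deviation of the IKE changes; the function names and numbers are hypothetical.

```python
def total_ike_forecast(ike_now, predicted_change_z, change_mean, change_std):
    """Total IKE (TJ) = persistence value + de-normalized predicted change."""
    predicted_change = predicted_change_z * change_std + change_mean
    return ike_now + predicted_change


def shared_variance(forecast, observed):
    """Squared Pearson correlation between forecasts and observations."""
    n = len(forecast)
    mf = sum(forecast) / n
    mo = sum(observed) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(forecast, observed))
    var_f = sum((f - mf) ** 2 for f in forecast)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_f * var_o)


# Hypothetical case: current IKE of 50 TJ, predicted change of +0.5 standard
# deviations, with training-sample change statistics of 2 +/- 20 TJ
forecast_tj = total_ike_forecast(50.0, 0.5, change_mean=2.0, change_std=20.0)
```

Because the persistence term is known exactly at the initial time, most of the variance in total IKE at short lead times is captured before the regression contributes anything, which is consistent with the much higher shared variance for total IKE than for IKE changes.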

The use of persistence to obtain total IKE projections allows the SPIKE products to take advantage of the inertial nature of IKE quantities. Whereas a point metric like maximum sustained wind can and does change rapidly somewhat regularly (e.g., Kaplan and DeMaria 2003), kinetic energy integrated across the entire wind field does not change as rapidly. Although drastic intensity changes do impact IKE values, rapid intensification (RI) events typically result in a drastic increase of near-surface winds over a small confined area of convection near the center of the storm. Therefore, the impact on IKE during RI is typically small, provided the overall size of the storm’s wind field remains somewhat constant. Consequently, a persistence forecast of total IKE is typically very skillful, especially in a short forecast interval. However, at longer forecast intervals, persistence does not fare nearly as well (*r*^{2} = 25% at 72 h; Fig. 4a). In fact, the SPIKE regression model is more skillful at estimating total IKE values at a 72-h forecast interval than is persistence at forecasting the same quantity on a much shorter 30-h interval. The lack of skill by persistence over time indicates that environmental and storm-specific data must be utilized for longer-term forecasts of IKE.

Mean absolute errors were also calculated for the total SPIKE model by computing the average magnitude of the differences between the SPIKE forecast and the observed total IKE values for each of the storm fixes in the 1990–2011 training interval. These mean errors for the total SPIKE model and for a persistence forecast are plotted in Fig. 4b. It should be noted that the mean errors for the persistence forecast are equal to the mean magnitude of IKE change for each forecast interval since a persistence forecast by definition will predict a change of zero TJ. Therefore, the mean magnitude of errors for the SPIKE model’s total IKE projections is of the same magnitude as the IKE changes themselves within the shortest forecast intervals (e.g., ~10 TJ for the 12-h forecast interval and ~15 TJ for the 24-h forecast interval). This ultimately means that a short-term total IKE forecast is not significantly better than persistence. While unfortunate, this does not come as a surprise, as SPIKE has low shared variance scores in these shorter forecast windows for predicting the more subtle IKE changes (Table 2). However, a persistence forecast has significantly more error than the SPIKE model does in forecast intervals greater than 24 h. In fact, at a forecast window of 48 h, total IKE projections from SPIKE have a mean error magnitude of 18.6 TJ compared to a much higher 24.2 TJ from a persistence forecast. Therefore, the mean error statistics support the conclusions drawn from the shared variance statistics, wherein persistence is a tough forecast to beat within 24 h and the SPIKE model exhibits significant skill over persistence beyond that point.
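The equivalence between the persistence error and the mean magnitude of IKE change can be verified with a few lines of Python. All values below are invented for illustration and are not taken from the SPIKE results.

```python
def mean_absolute_error(forecast, observed):
    """Average magnitude of forecast-minus-observed differences (TJ)."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


# Hypothetical total IKE values (TJ) for four storm fixes
ike_initial = [20.0, 40.0, 60.0, 80.0]     # IKE at the initial time
ike_observed = [30.0, 35.0, 80.0, 70.0]    # IKE at the verification time
model_forecast = [27.0, 38.0, 72.0, 74.0]  # illustrative model output

# A persistence forecast simply carries the initial value forward
mae_persistence = mean_absolute_error(ike_initial, ike_observed)
mae_model = mean_absolute_error(model_forecast, ike_observed)

# Mean magnitude of the observed IKE change over the same interval
mean_abs_change = sum(abs(o - i) for o, i in
                      zip(ike_observed, ike_initial)) / len(ike_observed)
```

Here `mae_persistence` and `mean_abs_change` are identical by construction, so comparing the model error against the typical size of the change is the same as comparing it against persistence.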

Analysis of mean-squared error (MSE) values in the SPIKE model, with respect to those from a persistence forecast, further emphasizes that SPIKE offers a significant skill improvement over a persistence forecast during the training interval (Fig. 5). In a 12-h forecast window, SPIKE exhibits a 14% reduction of MSE values when compared to a persistence forecast. Two-sample bootstrapping tests using the squared errors of the SPIKE model and a persistence forecast suggest that SPIKE’s improvement over persistence is not quite significant at the one-sided *p* = 0.10 level for this short forecast window. However, at 30 h, SPIKE exhibits a 36% improvement over a persistence forecast in terms of MSE, which is significant at the *p* = 0.01 level according to two-sample bootstrapping tests. The percent decrease of MSE in the SPIKE model relative to persistence within the training interval continues to improve across all later forecast intervals, culminating in a 60% reduction of MSE in the 72-h SPIKE model compared to a 72-h persistence forecast.
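The percent reduction in MSE and a one-sided two-sample bootstrap test can be sketched as below. The exact resampling scheme used for SPIKE is not described in the text, so the version here (independent resampling of each set of squared errors, with the p-value taken as the fraction of resamples in which the model is not better) is only one plausible form, and the function names are hypothetical.

```python
import random


def mse_reduction(model_sq_err, persist_sq_err):
    """Percent reduction of MSE relative to a persistence forecast."""
    mse_model = sum(model_sq_err) / len(model_sq_err)
    mse_persist = sum(persist_sq_err) / len(persist_sq_err)
    return 100.0 * (1.0 - mse_model / mse_persist)


def bootstrap_p_value(model_sq_err, persist_sq_err, n_boot=2000, seed=0):
    """One-sided p-value for 'the model MSE is lower than persistence MSE'.

    Each sample of squared errors is resampled with replacement; the p-value
    is the fraction of resamples in which the model MSE is not lower.
    """
    rng = random.Random(seed)
    not_better = 0
    for _ in range(n_boot):
        m = rng.choices(model_sq_err, k=len(model_sq_err))
        p = rng.choices(persist_sq_err, k=len(persist_sq_err))
        if sum(m) / len(m) >= sum(p) / len(p):
            not_better += 1
    return not_better / n_boot


# Hypothetical squared errors (TJ^2)
reduction = mse_reduction([4.0, 4.0], [10.0, 10.0])
p_value = bootstrap_p_value([1.0, 1.0, 1.0, 1.0], [9.0, 9.0, 9.0, 9.0])
```

A small improvement in MSE combined with highly variable squared errors produces many resamples in which the ordering flips, which is how a 14% reduction can fail a significance test that a 36% reduction passes.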

In addition to simply calculating statistics over the calibration interval, some validation exercises are also performed using standard bootstrapping techniques. These bootstrapping exercises are done by training the model over a sample that is created by randomly selecting data points from the overall population of IKE and predictor data (repetition allowed). The regression coefficients from the model trained over this sample are then used over the original population to examine how much skill is lost. In the case of the SPIKE model for kinetic energy change, there is an average decrease in shared variance of 3.7% across all 12 of the forecast intervals. The decrease of skill for the total kinetic energy estimates is less significant, averaging less than 0.5% across all of the forecast intervals. Ultimately, these simple tests indicate that SPIKE should be able to retain predictive skill when using a different sample of data. However, once again, it should be noted that because developmental SHIPS data are used for the predictors, there likely will be a decrease in skill when using forecasted predictors in an operational setting.
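The bootstrap validation procedure can be illustrated with a deliberately simplified, single-predictor regression; SPIKE itself is multivariate. Because the squared correlation of a one-predictor linear model does not depend on the fitted coefficients, skill is measured here as 1 − SSE/SST rather than as shared variance. The data and function names are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def explained_variance(pred, obs):
    """1 - SSE/SST for predictions made with fixed coefficients."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for p, o in zip(pred, obs))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def bootstrap_skill_loss(x, y, n_boot=200, seed=0):
    """Train on resampled data, score on the original data, and return
    (full-sample skill, mean loss of skill across bootstrap samples)."""
    rng = random.Random(seed)
    n = len(x)
    slope, icpt = fit_line(x, y)
    full = explained_variance([slope * a + icpt for a in x], y)
    losses = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # repetition allowed
        s, c = fit_line([x[i] for i in idx], [y[i] for i in idx])
        boot = explained_variance([s * a + c for a in x], y)
        losses.append(full - boot)
    return full, sum(losses) / len(losses)


# Hypothetical data: a linear signal plus deterministic pseudo-noise
x = [float(i) for i in range(50)]
y = [2.0 * a + ((i * 7) % 11 - 5) for i, a in enumerate(x)]
full_skill, mean_loss = bootstrap_skill_loss(x, y)
```

The loss is never negative in this setup, since the coefficients fitted to the full sample minimize the squared error on that sample; a small mean loss indicates that the fitted relationship is not sensitive to the particular storms included.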

### c. Model performance in the 2012 Atlantic hurricane season

The 2012 Atlantic hurricane season consisted of 19 tropical cyclones, 10 of which were hurricanes and 2 that eventually reached major hurricane status. A total of 395 storm fixes were taken from the extended best track dataset to estimate IKE for each storm. Similar to storms in the training interval from 1990 to 2011, the TCs of the 2012 Atlantic season had IKE values less than 25 TJ for most of their lifetimes. However, four storms (Hurricanes Leslie, Nadine, Rafael, and Sandy) all obtained in excess of 100 TJ of IKE, mostly in the latter stages of their life cycles over the mid- and upper latitudes of the basin.

The same SPIKE models that were calibrated on the 1990–2011 data in the previous sections (coefficients listed in Table 2) are utilized to project observed fluctuations of IKE during the 2012 Atlantic hurricane season, as a means of determining SPIKE’s potential predictive skill outside of the training interval. Predictor data for the 2012 season are once again taken from the best track dataset (Jarvinen et al. 1984) and the SHIPS developmental dataset (DeMaria and Kaplan 1999). Therefore, this analysis, which does not use dynamically forecasted predictors, is not meant to assess the operational skill of SPIKE. Instead, this exercise serves as an evaluation of SPIKE’s skill outside of the training interval when given idealistically accurate predictors.

Overall, the SPIKE model exhibits comparable skill during the 2012 season when compared to the 1990–2011 training interval. The explained variance and mean absolute forecast errors are shown in Fig. 6. SPIKE explains an even higher percentage of the observed total IKE variance in 2012 when compared to the training interval. At a forecast interval of 36 h, the 2012 SPIKE model explains an astonishing 83% of the variance of total IKE values, as compared to the 67% explained by SPIKE during the longer training interval. The main reason for this apparent enhanced skill during the 2012 season, when compared to the longer training interval, likely stems from the fact that persistence corresponds extremely well with future IKE values during the 2012 season. In fact, out to 36 h, a simple persistence forecast explains more than 80% of the variance during the 2012 season, suggesting that the kinetic energy levels of TCs during this one season were even more inertial than should be expected on average.

In terms of mean absolute errors, SPIKE does not appear to perform as well during the 2012 season when compared to the 1990–2011 training interval (Fig. 6b). An increase in forecast error should be expected since the 2012 data were not in the training interval that was used to calibrate the regression coefficients in SPIKE. In this case, the error at a forecast window of 72 h was 29.2 TJ during the entire 2012 season, which is a staggering 36% higher than the mean 72-h error during the 22-yr training interval. The highest SPIKE errors during the 2012 season unsurprisingly occurred during Hurricane Sandy. As previously mentioned, Sandy had near-record levels of IKE near the end of its life cycle. The SPIKE model, at all forecast intervals, correctly projected that Sandy would have IKE levels above 150 TJ before reaching the mid-Atlantic coastline. This forecasted level of IKE from SPIKE would have placed Sandy within the top 2.5% of all TC fixes during the training interval. Therefore, in many regards, the SPIKE forecast still would have been more than adequate to categorize the high damage potential of Sandy, which explains why the shared variance score of SPIKE during the 2012 season is not similarly worse than the training interval shared variance levels. Nonetheless, SPIKE was unable to project that Hurricane Sandy would reach near-record IKE values exceeding 400 TJ. As a result, even a very high forecast of 300 TJ for Sandy would contain an enormous error magnitude of more than 100 TJ, contributing to the much higher than expected mean error levels shown in Fig. 6b.

If the final dozen storm fixes of Hurricane Sandy were neglected, the SPIKE model would have significantly less mean error during the 2012 season (Fig. 6b). In fact, the mean error during 2012, excluding these Sandy data points, would be strikingly similar to the mean errors during the full 22-yr training interval at all forecast intervals. MSE statistics (Fig. 7) tell a similar story, wherein the SPIKE model improves upon a persistence forecast during the 2012 season, excluding Sandy, for all forecast hours. Like the MSE reduction analysis done for the training interval in section 4b, SPIKE performs increasingly better than persistence with increasing lead time. The percent reduction of MSE in SPIKE relative to persistence becomes significant at the one-sided *p* = 0.05 level at a forecast interval of 48 h, where SPIKE has 35% less MSE than a persistence forecast. At a forecast interval of 72 h, SPIKE has a reduction of MSE relative to persistence of nearly 45% during the 2012 season, excluding Sandy.
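The sensitivity of a seasonal mean error to a handful of extreme fixes can be seen with a small calculation. The sketch below uses invented numbers (not values from the 2012 season) and a hypothetical helper, `mean_abs_error`, that optionally masks out selected fixes before averaging.

```python
import numpy as np


def mean_abs_error(forecast, observed, exclude=None):
    """Mean absolute error, optionally skipping fixes flagged in `exclude`."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if exclude is None:
        keep = np.ones(forecast.shape, dtype=bool)
    else:
        keep = ~np.asarray(exclude, dtype=bool)
    return float(np.mean(np.abs(forecast[keep] - observed[keep])))


# Invented IKE values (TJ): two ordinary fixes and one extreme fix.
forecast = [10.0, 20.0, 300.0]
observed = [12.0, 18.0, 400.0]
is_extreme = [False, False, True]

mae_all = mean_abs_error(forecast, observed)                 # (2 + 2 + 100) / 3
mae_trimmed = mean_abs_error(forecast, observed, is_extreme)  # (2 + 2) / 2
print(f"MAE, all fixes: {mae_all:.1f} TJ")
print(f"MAE, extreme fix excluded: {mae_trimmed:.1f} TJ")
```

A single 100-TJ miss dominates the mean even though the forecast correctly placed that fix far above the others, which is the same effect the Sandy fixes have on the seasonal error.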

Overall, this exercise suggests that the SPIKE model performs well over a validation dataset such as the 2012 season. Skill relative to persistence increases with increasing lead time, as expected. Within a 24-h forecast window, the high correspondence between SPIKE projections and historical IKE in 2012 can be attributed to the skill of a short-term IKE persistence forecast. However, outside of this 1-day forecast interval, the SPIKE model performs significantly better than a persistence forecast in this validation exercise.

## 5. Conclusions

Although kinetic energy forecasts are uncommon in operations today, IKE is a metric that could potentially be very useful to forecasters. The relationship between kinetic energy metrics and the storm surge damage potential of a tropical cyclone makes forecasting this physical quantity very desirable, especially since IKE better represents damage potential in larger storms such as Sandy than does VMAX. On the other hand, it should be noted that one weakness of kinetic energy metrics is their underestimation of wind damage in smaller but intense storms such as Charley in 2004 (Maclay et al. 2008). Therefore, it is suggested that IKE forecasts should be used in conjunction with existing VMAX forecasts to maximize a forecaster’s ability to assess the wide array of risks for damage in landfalling storms.

The simple statistical regression model designed here adequately estimates changes of IKE in Atlantic TCs out to 72 h. More specifically, the model created here explains as much as 50% of the variance in historical IKE changes out to three days. More impressively, it is found that persistence can be added to SPIKE’s forecast for IKE change to successfully project total values of IKE. In this regard, SPIKE can explain more than 80% of the observed variance in historical total IKE values at a 12-h forecast interval, trailing down to near 60% at 72 h. The increase in skill for projecting total IKE is attributed to the inertial nature of the IKE metric. The fact that persistence is a viable kinetic energy forecast in the short term could be used to the advantage of forecasters in assessing TC risks, especially considering the lack of recent improvements in forecasting the less persistent and notoriously challenging intensity metrics (Rappaport et al. 2009).
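Why adding persistence raises the explained variance can be shown with a short synthetic experiment. In the sketch below, all numbers are invented and the variable names are hypothetical: the predicted change explains about half of the variance of the observed change, but because the current IKE varies far more from case to case than the change does, the total forecast (current value plus predicted change) correlates much more strongly with the observed total.

```python
import numpy as np


def shared_variance(forecast, observed):
    """Squared correlation between forecast and observed values."""
    return float(np.corrcoef(forecast, observed)[0, 1] ** 2)


# Synthetic data (TJ), for illustration only.
rng = np.random.default_rng(1)
n = 1000
ike_now = rng.exponential(30.0, size=n)            # IKE at the initial time
predicted_change = rng.normal(0.0, 10.0, size=n)   # regression output
unexplained = rng.normal(0.0, 10.0, size=n)        # part the model misses
observed_change = predicted_change + unexplained

# Total IKE forecast = persistence + predicted change.
total_forecast = ike_now + predicted_change
total_observed = ike_now + observed_change

r2_change = shared_variance(predicted_change, observed_change)
r2_total = shared_variance(total_forecast, total_observed)
print(f"shared variance, IKE change: {r2_change:.2f}")
print(f"shared variance, total IKE:  {r2_total:.2f}")
```

The forecast error is identical in both cases; only the variance of the quantity being verified changes, which is why the total IKE score is best read alongside a comparison against persistence.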

Validation tests suggest that SPIKE’s high levels of skill should not degrade sharply when used in a forecast environment, provided that the predictors are forecasted accurately. For example, bootstrapping exercises revealed that SPIKE remains skillful at projecting total IKE values in historical storms, even when calibrated by resampled data. Furthermore, validation of SPIKE during the 2012 season reveals that the model can perform reasonably well when given predictor data outside of its training interval, as will be required in an operational setting. However, it should be once again noted that none of these exercises used forecasted predictors. Thus, it is fair to expect SPIKE to have somewhat reduced skill levels in a truly operational setting because the predictors will be imperfectly forecasted for future time steps in nearly all scenarios.

Previous work done by DeMaria (2010) offers some insight with regard to the amount of skill that could be lost by moving from a “perfect prog” approach to an operational approach with imperfectly forecasted predictors. DeMaria (2010) found that the skill of a statistical–dynamical model, such as the Logistic Growth Equation Model (LGEM), did not drastically improve when given perfect large-scale environmental predictors. Since most of the predictors in our SPIKE model are indeed large-scale environmental predictors, this bodes well for the potential skill of the SPIKE model in an operational setting. However, DeMaria (2010) noted that smaller-scale predictors that are more dependent upon track are more likely to affect model skill when moving from a perfect prog approach to an operational approach with forecasted predictors. This is relevant to SPIKE, as MSLP is used in our model. The intensity of a TC obviously is quite dependent on storm track, especially when a storm is nearing a coastal environment. Considering that MSLP is an important predictor in the SPIKE model, it is worthwhile to examine how an imperfect estimation of MSLP will affect the skill of SPIKE. Therefore, we replaced the observed MSLP predictor in our perfect prog approach with an MSLP persistence forecast, leaving all other “perfect” predictors the same. In most cases, a persistence forecast of MSLP will be a poor indication of future MSLP values, as TC intensity tends to change somewhat rapidly, particularly when storms near landfall. Nonetheless, the skill of the SPIKE model decreases by no more than 20% across all forecast intervals, in terms of both explained variance and MSE, when the poor persistence MSLP predictor is used in place of the perfect observed MSLP predictor used throughout the previous sections. In fact, a SPIKE model trained with an MSLP persistence predictor is still significantly better at projecting integrated kinetic energy than an IKE persistence forecast at the one-sided *p* = 0.10 level in the training interval for all forecast intervals exceeding 24 h. This further suggests that the SPIKE model will likely offer an improvement over a persistence forecast of IKE, even when using imperfect operational predictors.
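Constructing a persistence predictor amounts to carrying the initial-time value forward unchanged to the verifying time. The sketch below shows one way to build such pairs from a single storm's sequence of fixes; the helper name, the invented MSLP values, and the assumption of evenly spaced fixes are illustrative only and are not taken from the SPIKE implementation.

```python
import numpy as np


def persistence_pairs(series, lead_steps):
    """Pair each value with the value observed `lead_steps` fixes later.

    Returns (persisted, verifying): persisted[i] is the initial-time value
    used unchanged as the forecast, and verifying[i] is what was actually
    observed `lead_steps` fixes later.
    """
    series = np.asarray(series, dtype=float)
    if lead_steps <= 0 or lead_steps >= series.size:
        raise ValueError("lead_steps must be between 1 and len(series) - 1")
    return series[:-lead_steps], series[lead_steps:]


# Invented MSLP values (hPa) at evenly spaced fixes for one storm.
mslp = [1005.0, 1000.0, 990.0, 980.0, 985.0]

persisted, verifying = persistence_pairs(mslp, lead_steps=2)
for p, v in zip(persisted, verifying):
    print(f"persistence predictor {p:.0f} hPa, observed {v:.0f} hPa")
```

In a perfect prog setting the regression would receive `verifying` as its MSLP predictor; substituting `persisted` mimics having only the initial-time pressure available.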

In summary, this exercise serves as a proof of concept that when given accurately forecasted predictors, it is possible to project IKE in an operational setting. This work can be built upon by using IKE forecasts to statistically estimate storm size and wind field distribution. Future work will also be focused on adapting this model into a statistical–dynamical model that can be used in real time, while also continuing to build upon our understanding of the underlying physical processes that control kinetic energy variability.

Thanks to Drs. Mark Powell, Robert Hart, Phillip Sura, Allan Clarke, Ming Ye, T. N. Krishnamurti, and James O’Brien for their helpful comments and feedback. In addition, we greatly appreciate the helpful comments provided to us during the review process by Dr. Mark DeMaria and an anonymous reviewer. This work was supported by grants from NOAA (Grants NA12OAR4310078, NA10OAR4310215, and NA11OAR4310110) and the USDA (Grant 027865).

## REFERENCES

Avila, L. A., and J. Cangialosi, 2011: Tropical cyclone report: Hurricane Irene (AL092011) 21-28 August 2011. National Hurricane Center, 45 pp. [Available online at http://www.nhc.noaa.gov/data/tcr/AL092011_Irene.pdf.]

Bell, G. D., and Coauthors, 2000: Climate assessment for 1999. *Bull. Amer. Meteor. Soc.*, **81**, 1328–1378, doi:10.1175/1520-0477(2000)081<1328:CAF>2.3.CO;2.

Berg, R., 2009: Tropical cyclone report: Hurricane Ike (AL092008) 1-14 September 2008. National Hurricane Center, 55 pp. [Available online at http://www.nhc.noaa.gov/pdf/TCR-AL092008_Ike_3May10.pdf.]

Blake, E. S., T. B. Kimberlain, R. J. Berg, J. P. Cangialosi, and J. L. Beven II, 2013: Tropical cyclone report: Hurricane Sandy (AL182012) 22-29 October 2012. National Hurricane Center, 157 pp. [Available online at http://www.nhc.noaa.gov/data/tcr/AL182012_Sandy.pdf.]

Dean, L., K. A. Emanuel, and D. R. Chavas, 2009: On the size distribution of Atlantic tropical cyclones. *Geophys. Res. Lett.*, **36**, L14803, doi:10.1029/2009GL039051.

DeMaria, M., 2010: Tropical cyclone intensity change predictability estimates using a statistical-dynamical model. *29th Conf. on Hurricanes and Tropical Meteorology*, Tucson, AZ, Amer. Meteor. Soc., 9C.5. [Available online at https://ams.confex.com/ams/29Hurricanes/techprogram/paper_167916.htm.]

DeMaria, M., and J. Kaplan, 1994: A Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic basin. *Wea. Forecasting*, **9**, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

DeMaria, M., and J. Kaplan, 1999: An updated Statistical Hurricane Intensity Prediction Scheme (SHIPS) for the Atlantic and eastern North Pacific basins. *Wea. Forecasting*, **14**, 326–337, doi:10.1175/1520-0434(1999)014<0326:AUSHIP>2.0.CO;2.

DeMaria, M., J.-J. Baik, and J. Kaplan, 1993: Upper-level eddy angular momentum fluxes and tropical cyclone intensity change. *J. Atmos. Sci.*, **50**, 1133–1147, doi:10.1175/1520-0469(1993)050<1133:ULEAMF>2.0.CO;2.

DeMaria, M., M. Mainelli, L. K. Shay, J. A. Knaff, and J. Kaplan, 2005: Further improvements in the Statistical Hurricane Intensity Prediction Scheme (SHIPS). *Wea. Forecasting*, **20**, 531–543, doi:10.1175/WAF862.1.

Demuth, J. L., M. DeMaria, and J. A. Knaff, 2006: Improvement of Advanced Microwave Sounder Unit tropical cyclone intensity and size estimation algorithms. *J. Appl. Meteor. Climatol.*, **45**, 1573–1581, doi:10.1175/JAM2429.1.

Emanuel, K., 2005: Increasing destructiveness of tropical cyclones over the past 30 years. *Nature*, **436**, 686–688, doi:10.1038/nature03906.

Evans, C., and R. E. Hart, 2008: Analysis of the wind field evolution associated with the extratropical transition of Bonnie (1998). *Mon. Wea. Rev.*, **136**, 2047–2065, doi:10.1175/2007MWR2051.1.

Gray, W. M., 1968: Global view of the origins of tropical disturbances and storms. *Mon. Wea. Rev.*, **96**, 669–700, doi:10.1175/1520-0493(1968)096<0669:GVOTOO>2.0.CO;2.

Hart, R. E., and J. L. Evans, 2001: A climatology of the extratropical transition of Atlantic tropical cyclones. *J. Climate*, **14**, 546–564, doi:10.1175/1520-0442(2001)014<0546:ACOTET>2.0.CO;2.

Hill, K. A., and G. M. Lackmann, 2009: Influence of environmental humidity on tropical cyclone size. *Mon. Wea. Rev.*, **137**, 3294–3315, doi:10.1175/2009MWR2679.1.

Irish, J. L., D. T. Resio, and J. J. Ratcliff, 2008: The influence of storm size on hurricane surge. *J. Phys. Oceanogr.*, **38**, 2003–2013, doi:10.1175/2008JPO3727.1.

Jarvinen, B. R., C. J. Neumann, and M. A. S. Davis, 1984: A tropical cyclone data tape for the North Atlantic Basin, 1886-1983: Contents, limitations, and uses. NOAA Tech. Memo. NWS NHC 22, Coral Gables, FL, 21 pp.

Jones, S. C., and Coauthors, 2003: The extratropical transition of tropical cyclones: Forecast challenges, current understanding, and future directions. *Wea. Forecasting*, **18**, 1052–1092, doi:10.1175/1520-0434(2003)018<1052:TETOTC>2.0.CO;2.

Kantha, L., 2006: Time to replace the Saffir–Simpson hurricane scale? *Eos, Trans. Amer. Geophys. Union*, **87**, 3–6, doi:10.1029/2006EO010003.

Kaplan, J., and M. DeMaria, 2003: Large-scale characteristics of rapidly intensifying tropical cyclones in the North Atlantic basin. *Wea. Forecasting*, **18**, 1093–1108, doi:10.1175/1520-0434(2003)018<1093:LCORIT>2.0.CO;2.

Knaff, J. A., S. P. Longmore, and D. A. Molenar, 2014: An objective satellite-based tropical cyclone size climatology. *J. Climate*, **27**, 455–476, doi:10.1175/JCLI-D-13-00096.1.

Landsea, C. W., and J. L. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. *Mon. Wea. Rev.*, **141**, 3576–3592, doi:10.1175/MWR-D-12-00254.1.

Maclay, K. S., M. DeMaria, and T. H. Vonder Haar, 2008: Tropical cyclone inner-core kinetic energy evolution. *Mon. Wea. Rev.*, **136**, 4882–4898, doi:10.1175/2008MWR2268.1.

McBride, J. L., 1995: Tropical cyclone formation. Global Perspectives on Tropical Cyclones, WMO/TD-693, Rep. TCP-38, World Meteorological Organization, 63–105.

Misra, V., S. DiNapoli, and M. Powell, 2013: The track integrated kinetic energy of Atlantic tropical cyclones. *Mon. Wea. Rev.*, **141**, 2383–2389, doi:10.1175/MWR-D-12-00349.1.

Musgrave, K. D., R. K. Taft, J. L. Vigh, B. D. McNoldy, and W. H. Schubert, 2012: Time evolution of the intensity and size of tropical cyclones. *J. Adv. Model. Earth Syst.*, **4**, M08001, doi:10.1029/2011MS000104.

Neumann, C. J., 1987: Prediction of tropical cyclone motion: Some practical aspects. Preprints, *17th Conf. on Hurricanes and Tropical Meteorology*, Miami, FL, Amer. Meteor. Soc., 266–269.

Powell, M. D., and T. A. Reinhold, 2007: Tropical cyclone destructive potential by integrated kinetic energy. *Bull. Amer. Meteor. Soc.*, **88**, 513–526, doi:10.1175/BAMS-88-4-513.

Powell, M. D., S. H. Houston, L. R. Amat, and N. Morisseau-Leroy, 1998: The HRD real-time hurricane wind analysis system. *J. Wind Eng. Ind. Aerodyn.*, **77–78**, 53–64, doi:10.1016/S0167-6105(98)00131-7.

Price, J. F., 1981: Upper ocean response to a hurricane. *J. Phys. Oceanogr.*, **11**, 153–175, doi:10.1175/1520-0485(1981)011<0153:UORTAH>2.0.CO;2.

Rappaport, E. N., and Coauthors, 2009: Advances and challenges at the National Hurricane Center. *Wea. Forecasting*, **24**, 395–419, doi:10.1175/2008WAF2222128.1.

Saffir, H., 1975: Low cost construction resistant to earthquakes and hurricanes. ST/ESA/23, United Nations, 216 pp.

Schade, L. R., and K. A. Emanuel, 1999: The ocean’s effect on the intensity of tropical cyclones: Results from a simple coupled atmosphere–ocean model. *J. Atmos. Sci.*, **56**, 642–651, doi:10.1175/1520-0469(1999)056<0642:TOSEOT>2.0.CO;2.

Simpson, R. H., 1974: The hurricane disaster potential scale. *Weatherwise*, **27**, 169–186, doi:10.1080/00431672.1974.9931702.