## Abstract

Although tropical cyclone (TC) intensity prediction has improved in recent years, this study investigates an approach that differs from those attempted thus far. Here, the overall environmental setup at genesis is evaluated to determine whether it predisposes a storm to reach its future maximum intensity. Variables retrieved from ERA-Interim are used to generate storm-centered composites at the time of genesis for Atlantic basin, main development region TCs from 1979–2015. Composites are stratified by their maximum attained intensity: tropical depressions (GTD), tropical storms (GTS), minor hurricanes (GMN), or major hurricanes (GMJ). A multiple-parameter linear regression is then used to relate the eventual attained intensity of each TC to the variables obtained at genesis. The regression has an adjusted *r*^{2} of 0.39, which indicates that a statistical relationship is present. Regression coefficients, along with the spatial distribution of variables in the storm-centered composites, indicate that storms that reach higher intensities are associated at genesis with stronger, more compact, low-level vortices, better-defined outflow jets, a more compact region of high midlevel relative humidity, and higher atmospheric water vapor content.

## 1. Introduction

Tropical cyclone (TC) intensity prediction remains a critical issue in operational meteorology despite recent improvements in intensity forecasts (DeMaria et al. 2014). The current literature focuses on factors behind 1) along-track intensity fluctuations and 2) whether tropical cyclogenesis will occur. An area of research missing from the literature and explored here is whether a more favorable environment at genesis might initially predispose a storm to reach its future maximum achieved intensity.

Along-track intensity fluctuations, driven by both external interactions and internal processes, are the focus of the majority of the research on TC intensity (e.g., Alvey et al. 2015). Externally induced fluctuations can occur as TCs traverse thousands of kilometers and interact with differing environments. Such interactions include movement into areas with varying sea surface temperatures, which can alter both surface fluxes and ocean mixing, and the interaction with upper-level troughs that can change both the vertical wind shear and eddy momentum fluxes (e.g., Challa and Pfeffer 1980; Pfeffer and Challa 1981; Holland and Merrill 1984; Molinari and Vollaro 1990; DeMaria 1996; Shay et al. 2000; Shen and Ginis 2003). However, some studies have shown that the environment a TC interacts with along its path does not have a significant impact on intensity (e.g., Hendricks et al. 2010). Regardless, TCs can resist external influences if the storm is associated with high inertial stability (Rappin et al. 2011). Intensity fluctuations can also be induced through internal processes such as eyewall replacement cycles, barotropic instability, and axisymmetrization of vortical hot towers (VHTs) via vortex Rossby waves (e.g., Schubert et al. 1999; Landsea et al. 2004; Montgomery et al. 2006; Sitkowski et al. 2011; Montgomery and Smith 2012). These processes can occur once, multiple times, or not at all along a TC’s path.

One well-known statistical model that attempts to include these external and internal forcings in its TC intensity predictions is the Statistical Hurricane Intensity Prediction Scheme (SHIPS). SHIPS predicts TC intensity changes through multiple regression techniques using climatological, persistence, and synoptic predictors (DeMaria and Kaplan 1994, 1999; DeMaria et al. 2005). The model was used in the creation and tuning of a rapid intensity index (RII) for the Atlantic basin to account for rapid intensification cases (Kaplan and DeMaria 2003; Kaplan et al. 2010, 2015). Research relating the intensity of TCs to the surrounding large-scale environments has also been performed using linear regression modeling (Lee et al. 2015, 2016).

Related to the intensity change prediction problem of an established TC, there is also active research on whether, and how, a tropical disturbance intensifies into a tropical depression (e.g., McBride and Zehr 1981; Pfeffer and Challa 1981; Smith and Montgomery 2012; Komaromi 2013; Zawislak and Zipser 2014b; Helms and Hart 2015). Often, comparisons are made between developing and nondeveloping TCs through composite analyses to further understand distinguishing factors for genesis. These factors include moistening of midlevels, an increase in the cyclonic relative vorticity at inner radii in the lower troposphere, the development of a strong warm core, and intense widespread convection. There are two major pathways for genesis that highlight some of these favorable conditions and provide the disturbances with sufficient surface cyclonic vorticity (Braun et al. 2010; Zawislak and Zipser 2014a). The first pathway is the bottom-up theory, which involves low-level potential vorticity anomalies interacting with an existing mesoscale vortex to enhance the convergence near the surface (Hendricks et al. 2004; Montgomery et al. 2006; Bell and Montgomery 2010). The second is the top-down theory, where stratiform rain evaporates near the surface causing subsidence that advects positive vorticity from aloft to lower levels (Bister and Emanuel 1997; Ritchie and Holland 1997; Simpson et al. 1997). There has also been recent research on the top-down method and the ways that the midlevel vortex can result in a TC. For example, Nolan (2007) focused on the importance of saturation of the mid- to upper levels for intensification of the midlevel vortex. Also, Gjorgjievska and Raymond (2014) showed that convergence near the surface is forced by large gradients of vertical mass flux, which are due to moist convection induced by the midlevel vortex.

Here, the relationship between TCs at genesis and their maximum attained intensity is investigated for TCs within the Atlantic basin main development region (MDR) over a 37-yr period (1979–2015) through a multiple-parameter linear regression analysis of storm-centered composites. Composites are generated using the European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim, hereafter ERA-I; Dee et al. 2011) and are stratified by their maximum attained intensity. In this paper, the methods and results from the regression index will be first discussed. Subsequently, the relationship of the regression index to the spatial distribution shown in storm-centered composites is detailed.

## 2. Data

### a. TC data

Four-times-daily Atlantic basin TC location and intensity track data over a 37-yr period (1979–2015) were obtained from the NHC “best track” hurricane database (HURDAT2; Landsea and Franklin 2013). Because of the subjective nature of HURDAT2, the position, intensity, and pressure data retrieved carry uncertainties. A detailed description of these uncertainties is provided in Torn and Snyder (2012) and in Landsea and Franklin (2013).

Tracks were stratified into four intensity groupings defined as storms that 1) did not intensify beyond tropical depression status (GTD), 2) intensified to tropical storm status (GTS), 3) achieved minor hurricane intensity status (GMN), and 4) reached major hurricane intensity status (GMJ).

Within each grouping, genesis points were extracted at the first instance along a track where a TC achieved tropical depression status. Storm motion was then calculated by taking the one-sided difference between this genesis point and the subsequent center point location.
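As an illustration of the one-sided difference described above, the following is a minimal sketch that assumes a spherical Earth, a 6-h interval between best-track fixes, and a local flat approximation for converting degrees to meters. The function name `storm_motion` and the constant `EARTH_RADIUS_M` are hypothetical and are not taken from the study.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius in meters (assumed constant)


def storm_motion(lat0, lon0, lat1, lon1, dt_hours=6.0):
    """Forward (one-sided) difference storm motion in m/s.

    (lat0, lon0) is the genesis fix and (lat1, lon1) is the next
    best-track fix dt_hours later, both in degrees. Returns (u, v),
    the eastward and northward components of the motion.
    """
    dt_seconds = dt_hours * 3600.0
    mean_lat = math.radians(0.5 * (lat0 + lat1))
    # Zonal distance shrinks with the cosine of latitude.
    dx = EARTH_RADIUS_M * math.cos(mean_lat) * math.radians(lon1 - lon0)
    dy = EARTH_RADIUS_M * math.radians(lat1 - lat0)
    return dx / dt_seconds, dy / dt_seconds


# Hypothetical fixes: a storm drifting west-northwestward.
u, v = storm_motion(12.0, -30.0, 12.3, -31.2)
print(f"u = {u:.2f} m/s, v = {v:.2f} m/s")
```

Under these assumptions, the same (u, v) pair is what would be subtracted from the gridded winds to obtain storm-relative flow.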

Two filters were then applied to the genesis point data. First, genesis points were spatially restricted to the MDR (5°–20°N, 70°–10°W) to eliminate latitude and longitude biases, as well as to include storms that form through similar dynamical processes (McTaggart-Cowan et al. 2008). Second, genesis points were temporally restricted by the number of 6-h periods until a storm reached its maximum intensity. Only storms that took between 1 and 9 days to reach their maximum intensity, the 10th and 90th percentiles, respectively, were included. This restriction was applied to minimize potential biases caused by storm length, which, for example, could introduce external interactions or allow for internal processes to occur that might cause intensity fluctuations.
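The two filters can be sketched as follows. This is an illustrative example only: the `Genesis` record, the function names, and the choice of inclusive bounds are assumptions, while the MDR limits and the 1–9-day window are those stated above.

```python
from dataclasses import dataclass


@dataclass
class Genesis:
    lat: float            # degrees north
    lon: float            # degrees east (negative values are west)
    periods_to_peak: int  # number of 6-h periods from genesis to peak intensity


def in_mdr(g):
    """Spatial filter: 5-20N, 70-10W (bounds treated as inclusive here)."""
    return 5.0 <= g.lat <= 20.0 and -70.0 <= g.lon <= -10.0


def within_time_window(g, min_days=1.0, max_days=9.0):
    """Temporal filter on the time taken to reach maximum intensity."""
    days = g.periods_to_peak * 6.0 / 24.0
    return min_days <= days <= max_days


def subset(points):
    """Keep only genesis points that pass both filters."""
    return [g for g in points if in_mdr(g) and within_time_window(g)]


# Hypothetical genesis points: only the first one passes both filters.
points = [
    Genesis(12.0, -30.0, 12),  # in the MDR, 3 days to peak
    Genesis(25.0, -30.0, 12),  # north of the MDR
    Genesis(12.0, -30.0, 2),   # peaks after only 0.5 days
    Genesis(12.0, -30.0, 40),  # takes 10 days to peak
]
print(len(subset(points)))
```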

Table 1 lists the original and remaining sample sizes of TC genesis points that fit the time and location criteria; 35% of storms remain after subsetting. The spatial distribution of the remaining genesis points is shown in Fig. 1. Hereafter, mention of genesis points refers to these subsetted data.

### b. Numerical model analyses

Storm-centered, storm-relative composites were computed from the 6-hourly ERA-I by extracting data from a 140° longitude × 112° latitude box centered on each genesis point identified. Storm-relative flow at various levels was calculated by subtracting the storm motion from the zonal and meridional wind. The ERA-I dataset is available on 37 pressure levels with a horizontal resolution of 0.7° × 0.7°.

## 3. Methods

### a. Environmental variables

Variables included in this study are those discussed in the literature as being influential in TC genesis: midlevel relative humidity (RH), local (single column), deep-layer vertical wind shear (LWS),^{1} low-level relative vorticity *ζ*, upper-level divergence *D*, sea surface temperature (SST), and total column water (TCW) (Gray 1968, 1979; Emanuel and Nolan 2004; Camargo et al. 2007; Tippett et al. 2011). Units and levels^{2} chosen are found in Table 2. Two measures of moisture are chosen, as the humidity at one atmospheric level is not necessarily correlated with the moisture in an entire atmospheric column.

It is important to understand how asymmetries in the environmental setup might lead to a more viable storm at genesis. Based on azimuthal averages of relative vorticity, the dominant cyclonic circulation extends from the center to about 500 km in ERA-I (see Fig.). Thus, values within twenty-five 5.6° longitude by 5.6° latitude boxes^{3} were averaged (Fig. 2). The inner grid (blue boxes) focuses on the four quadrants of the TC—northwest, northeast, southeast, and southwest. The middle grid (green boxes) focuses on the boundary between the TC and the environment and also incorporates the entire TC center itself (Fig. 2, box M5). The outer grid (pink boxes) focuses on the immediate environment surrounding the storm.
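A box average of this kind could be computed along the following lines. The sketch assumes a storm-centered field stored as a list of rows, grid offsets in degrees relative to the genesis point, half-open box bounds so that adjacent boxes do not share grid points, and a simple unweighted mean. None of these choices is specified in the text, and `box_mean` is a hypothetical name.

```python
def box_mean(field, lat_off, lon_off, south, north, west, east):
    """Unweighted mean of a storm-centered field over one box.

    field[j][i] is the value at latitude offset lat_off[j] and longitude
    offset lon_off[i], in degrees relative to the genesis point. The box
    covers south <= lat offset < north and west <= lon offset < east.
    """
    values = [
        field[j][i]
        for j, la in enumerate(lat_off) if south <= la < north
        for i, lo in enumerate(lon_off) if west <= lo < east
    ]
    return sum(values) / len(values)


# Hypothetical 0.7-degree grid of cell centers spanning +/- 5.6 degrees.
offsets = [(k + 0.5) * 0.7 for k in range(-8, 8)]
# Synthetic field whose value equals the longitude offset of each point.
field = [[lo for lo in offsets] for _ in offsets]

# A 5.6-degree box centered on the storm, and one covering the quadrant
# to the northeast of the center.
center = box_mean(field, offsets, offsets, -2.8, 2.8, -2.8, 2.8)
northeast = box_mean(field, offsets, offsets, 0.0, 5.6, 0.0, 5.6)
print(center, northeast)
```

With 0.7° spacing, each 5.6° box contains 8 × 8 grid points in this layout.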

Spatial differences between all intensity groupings at 95% confidence were determined through bootstrap testing with 1000 iterations. Differences between intensity groupings at the storm center position itself were determined through similar bootstrap testing, but at the 99% confidence level, unless otherwise specified.
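The text does not state which bootstrap variant was used, so the following is only a sketch of one common choice: a percentile bootstrap of the difference in group means, with 1000 resamples drawn with replacement. The function name and the sample values are hypothetical.

```python
import random


def bootstrap_mean_diff(a, b, n_iter=1000, conf=0.95, seed=0):
    """Percentile-bootstrap confidence interval for mean(a) - mean(b).

    Returns (lower, upper, significant), where significant is True when
    the interval excludes zero.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_iter):
        # Resample each group with replacement, keeping its original size.
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    alpha = 1.0 - conf
    lower = diffs[round(alpha / 2.0 * n_iter)]
    upper = diffs[round((1.0 - alpha / 2.0) * n_iter) - 1]
    return lower, upper, not (lower <= 0.0 <= upper)


# Hypothetical box-mean values for two intensity groupings.
group_a = [10.0, 11.0, 12.0, 13.0]
group_b = [0.0, 1.0, 2.0, 3.0]
print(bootstrap_mean_diff(group_a, group_b))
print(bootstrap_mean_diff(group_a, group_b, conf=0.99))
```

Raising `conf` to 0.99 widens the interval, which corresponds to the stricter threshold applied at the storm center.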

### b. Statistical model

This study investigates the relationship between the environmental parameters listed in section 3a at genesis and the maximum intensity achieved by each individual storm as identified in HURDAT2 by applying an empirical multiparameter linear regression:

$$\hat{y} = a + \sum_{i=1}^{n} \beta_{i} X_{i}, \tag{1}$$

where *ŷ* is the predicted intensity with units of knots (kt; 1 kt ≈ 0.51 m s^{−1}), *a* is the intercept term, *n* is the total number of predictors, and *β*_{i} is a regression coefficient (in kt per unit of the specified variable) multiplied by *X*_{i}, the respective average value within a given longitude-by-latitude box shown in Fig. 2.

A forward stepwise selection function was used to reduce the total number of variables in the regression. The goal of variable reduction is to remove meteorological predictors that could strongly correlate with each other and provide a false improvement in correlation (Nelson 2015). The forward stepwise selection function used in this study chooses predictors based upon the Akaike information criterion (AIC; Akaike 1974). AIC is a criterion that weighs the goodness of fit of a model against the number of predictors included in the model and is calculated by

$$\mathrm{AIC} = 2n - 2\ln(L), \tag{2}$$

where *n* is the number of predictors and *L* is the maximum of the likelihood function (Akaike 1974). Therefore, the preferred regression is the one with the lowest AIC (Akaike 1974). Specifically, the stepAIC function from the Modern Applied Statistics with S (MASS) R package was used.
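The selection procedure described above can be sketched as follows. The study used an R function from the MASS package; the Python below is a stand-in written for illustration and is not that function. It assumes Gaussian errors, for which AIC reduces, up to an additive constant, to N ln(RSS/N) + 2k, where N is the number of storms, RSS is the residual sum of squares, and k counts the fitted coefficients including the intercept. All function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rss(cols, y):
    """Residual sum of squares of a least squares fit: intercept plus cols."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)  # normal equations
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def aic(cols, y):
    """Gaussian AIC up to an additive constant: N ln(RSS/N) + 2k."""
    n_obs = len(y)
    return n_obs * math.log(rss(cols, y) / n_obs) + 2 * (len(cols) + 1)


def forward_stepwise(candidates, y):
    """Add the predictor that lowers AIC most; stop when none lowers it."""
    chosen, best = [], aic([], y)
    while True:
        trials = [(aic([candidates[n] for n in chosen + [name]], y), name)
                  for name in candidates if name not in chosen]
        if not trials:
            break
        score, name = min(trials)
        if score >= best:
            break
        chosen.append(name)
        best = score
    return chosen
```

The order of the returned list mirrors the idea of a "first chosen" or "second chosen" predictor used later in the text.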

To reduce the number of parameters in the regression even further, a cross-validation approach was used to remove variables from the stepwise forward selection regression. The mean and one standard deviation of the deviance from 10 tenfold cross-validation simulations as a function of the number of predictors were obtained following methods in Ditchek et al. (2016). From the resulting distribution of deviance values, the point at which the mean cross-validated deviance stops decreasing with the addition of more variables marks the number of predictors to retain; the remaining variables can be excluded from the model.
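A simplified sketch of repeated tenfold cross-validation is given below. It assumes that deviance is the sum of squared prediction errors (the Gaussian case) and, for brevity, compares only an intercept-only model with a one-predictor model; in the approach described above the same loop would be run for each model size. The function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean, stdev


def cv_deviance(x, y, fit, predict, k=10, repeats=10, seed=0):
    """Mean and standard deviation, over repeats, of k-fold CV deviance."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    totals = []
    for _ in range(repeats):
        rng.shuffle(idx)  # new random fold assignment for each repeat
        deviance = 0.0
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            model = fit([x[i] for i in train], [y[i] for i in train])
            deviance += sum((y[i] - predict(model, x[i])) ** 2 for i in test)
        totals.append(deviance)
    return mean(totals), stdev(totals)


def fit_mean(x, y):
    """Intercept-only model, stored as (intercept, slope)."""
    return mean(y), 0.0


def fit_line(x, y):
    """One-predictor least squares line, stored as (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def predict(model, xi):
    intercept, slope = model
    return intercept + slope * xi


# Synthetic, nearly linear data: the extra predictor lowers the deviance.
x = [float(i) for i in range(40)]
y = [1.0 + 2.0 * xi + (0.1 if i % 2 else -0.1) for i, xi in enumerate(x)]
print(cv_deviance(x, y, fit_mean, predict))
print(cv_deviance(x, y, fit_line, predict))
```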

## 4. Results

### a. Regression index

From an initial 150-parameter selection,^{4} the AIC forward stepwise selection technique selected a 74-parameter model with an overall coefficient of determination *r*^{2} of 0.86 and an adjusted *r*^{2} of 0.77. However, the inclusion of this many variables in the regression equation may provide a false increase in the *r*^{2} value.

Applying the cross-validation approach indicates that the regression can be reduced to seven parameters (Fig. 3), since the addition of more variables beyond seven does not reduce the deviance further. The relative importance of these variables and their individual contribution to the overall *r*^{2} is shown in Fig. 4. The term RH_{O7} contributes the least to the total *r*^{2} value (1.98%). This suggests that while RH_{O7} may help the overall performance of the regression, it is not as statistically significant individually as the other variables included in the regression. However, physically it does have considerable meaning and positively adds to the total *r*^{2} value. Thus, this variable is still included in the index.

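The final seven-parameter cross-validated linear regression is given by the following:

$$\hat{y} = a + \beta_{1}\,\mathrm{TCW}_{I1} + \beta_{2}\,\mathrm{LWS}_{M9} + \beta_{3}\,\mathrm{TCW}_{M5} + \beta_{4}\,\mathrm{LWS}_{O4} + \beta_{5}\,\mathrm{RH}_{O7} + \beta_{6}\,\zeta_{M5} + \beta_{7}\,\mathrm{TCW}_{O1}, \tag{3}$$

where subscripts indicate the box where the given variable is selected (see Fig. 2), terms are listed in the order in which they were chosen, and the fitted coefficients are those summarized in Table 3.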

This seven-parameter linear regression has an overall *r*^{2} of 0.41 and an adjusted *r*^{2} of 0.39, with an F statistic of 17.5 and a *p* value of <2 × 10^{−16}. There is a decrease in both the overall *r*^{2} and the adjusted *r*^{2} from the 74-parameter model, but this decrease is associated with the removal of correlated parameters and the resulting reduction in overfitting. The *r*^{2} between the variables in the seven-parameter model is weak, ranging from close to 0.00 to 0.22 with a mean of 0.053 and a standard deviation of 0.058, indicating that variables are not highly correlated and further validating their use in the regression. It is possible that the regression could exhibit artificial skill due to preferential predictor screening (DelSole and Shukla 2009). To specifically verify and test the robustness in skill, the regression would need to be tested on independent data. However, the purpose of the regression presented here is not to serve as an operational metric of forecasting future TC intensity. Rather, the regression is used to indicate key areas in storm-centered composites, as described later, where there is a relationship between environmental variables and future maximum attained intensity.
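For reference, the standard relationships between *r*^{2}, the adjusted *r*^{2}, and the overall F statistic of a linear regression with an intercept can be sketched as below. The sample size of 200 is purely hypothetical and is not the study's sample size, so the printed values are not expected to reproduce the numbers quoted above. The function names are likewise illustrative.

```python
def adjusted_r2(r2, n_obs, n_pred):
    """Adjusted r^2: penalizes r^2 for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)


def f_statistic(r2, n_obs, n_pred):
    """Overall F statistic for a regression with an intercept."""
    return (r2 / n_pred) / ((1.0 - r2) / (n_obs - n_pred - 1))


# Hypothetical sample size, chosen only to exercise the formulas.
n_storms = 200
print(adjusted_r2(0.41, n_storms, 7))
print(f_statistic(0.41, n_storms, 7))
# With the same r^2, a larger predictor count lowers the adjusted value.
print(adjusted_r2(0.41, n_storms, 74))
```

The penalty grows with the number of predictors, which is why the gap between the overall and adjusted *r*^{2} is wider for the 74-parameter model than for the seven-parameter model.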

Using Eq. (3), one can interpret how these terms at genesis individually affect the subsequent intensity with all else equal (Table 3). To achieve a high-intensity TC, at genesis there should be high total column water in the inner northwest quadrant of the TC (box I1), large deep-layer, local vertical shear to the southeast of the TC (box M9), high total column water near the TC genesis point (box M5), weak deep-layer vertical shear to the northeast of the TC (box O4), low midlevel relative humidity to the west-southwest of the TC (box O7), strong positive low-level relative vorticity near the TC genesis point (box M5), and high total column water to the northwest of the TC (box O1). The implications of these relationships to the maximum achieved intensity will be examined in section 4b with storm-centered composites.

To assess the statistical significance of each parameter included in the regression, the standard errors and the *p* values of each predictor are summarized in Table 3. The terms TCW_{I1} and TCW_{O1} have the largest standard errors relative to their coefficients and the highest *p* values of the parameters in the regression. Nevertheless, the *p* values associated with all of the variables for this regression are less than 0.05, including TCW_{I1} and TCW_{O1}. In addition, as shown in Fig. 4, the contributions of TCW_{I1} and TCW_{O1} to the overall *r*^{2} are large and on the same order as the other variables chosen by the model, aside from RH_{O7}. This suggests that despite having a high standard error and relatively higher *p* values, TCW_{I1} and TCW_{O1} are individually statistically significant.

Overall, the model performs well, illustrating a general relationship between the genesis variables chosen and subsequent maximum intensity. The model has an acceptable correlation coefficient for both the overall *r*^{2} and the adjusted *r*^{2} as defined by Brooks and Carruthers (1978).^{5} The regression has a small, near-negligible positive bias of 1.09 × 10^{−13} kt but has high mean absolute error (MAE), root-mean-square error (RMSE) and standard deviation (STD) (Table 3; units of kt). Thus, on a long-term average across this dataset, the regression will not produce much bias, but on an individual storm-by-storm basis, there is potential for strong deviations from the predicted value.
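The error statistics named above can be computed as in the following sketch, which assumes that errors are defined as predicted minus observed intensity and that STD is the sample standard deviation of those errors. The function name and the intensity values are hypothetical. For a least squares fit with an intercept, the in-sample mean error is zero up to rounding, which is consistent with a bias on the order of 10^{−13} kt.

```python
import math


def error_metrics(predicted, observed):
    """Bias, MAE, RMSE, and sample standard deviation of the errors (kt)."""
    errors = [p - o for p, o in zip(predicted, observed)]
    n = len(errors)
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    std = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return {"bias": bias, "mae": mae, "rmse": rmse, "std": std}


# Hypothetical predicted and observed maximum intensities in kt.
predicted = [50.0, 60.0, 70.0, 80.0]
observed = [45.0, 65.0, 70.0, 90.0]
print(error_metrics(predicted, observed))
```

A small bias together with large MAE and RMSE means that positive and negative errors cancel on average while individual errors remain large.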

Figure 5 depicts the predicted intensity compared to the observed intensity. Ideally, the regression should follow the *y* = *x* line (Fig. 5, blue line). Rather, the regression has a smaller slope (Fig. 5, red line), which indicates that the regression underpredicts more intense TCs and overpredicts weaker TCs. However, the regression does depict GTS and GMN well. This is expected when a linear regression model is applied to a nonlinear relationship, which tends to reduce the residuals in the middle of distributions but increases the residuals at the extremes of distributions (e.g., Nelson 2015). Another explanation could be that the strong variance in the GTD cases is skewing the regression line.

To examine the influence of the variance of the GTD cases, all GTD storms were temporarily removed from the dataset. A new multiparameter linear regression was then derived for the non-GTD cases. The new model had an overall *r*^{2} of 0.38 and an adjusted *r*^{2} of 0.35. The fact that the overall and adjusted *r*^{2} are greater for the original seven-parameter linear regression than for the non-GTD linear regression means that the strong variance with GTD cases does not severely impact the performance of the model. Further, the distribution of residuals without the GTD cases (not shown) had a very similar profile to the full, seven-parameter linear regression residuals (Fig. 6b). While removing the GTD cases did make the distribution more symmetric and slightly decreased the number of extreme negative residuals, the changes in the distribution were not drastic enough to warrant the complete removal of the GTD cases. In addition, an adequate statistical model of future TC intensity should incorporate storms that did not intensify in order to understand why some TCs intensify, while others do not. Therefore, a model can be constructed with or without the GTD cases and the overall *r*^{2} will be qualitatively the same.

Collectively, this suggests that in order to decrease the deviance of the GTD cases and increase model performance, more and different environmental parameters should be included. It is possible that the strong variance in the GTD cases is a result of not accounting for other TC parameters. One parameter in particular, which is not included in the derivation of the model, is a parameter describing the convective intensity and/or distribution, which plays a crucial role in the intensification process of TCs (e.g., DeMaria et al. 2012; Kaplan et al. 2015). Hereafter, all model evaluation refers to the original, full dataset, seven-parameter linear regression [Eq. (3)].

Figure 6 contains four panels quantifying the residuals with the regression. The fitted values in Fig. 6a (otherwise known as *ŷ*) are the predicted intensity corresponding to each data point (Holland 2011). The regression residuals tend to fluctuate around zero (red curve in Fig. 6a), but there is an extreme amount of variation in the residuals for each predicted (fitted) value, especially with GTD and GMJ.

The histogram of residuals (Fig. 6b) is approximately normally distributed, having a slight positive skew, which fits the above findings. This indicates that on a long-term average with many data points, the model does reduce the size of the residuals, but there is a signal of strongest variation in the GTD composite.

The standardized residual and theoretical quantiles plot, also known as a quantile–quantile (Q–Q) plot (Fig. 6c), tests how normally distributed the residuals are from the regression. The Q–Q plot shows that the bulk of the data points lie on the Q–Q line, except for two areas near the tails of the line. While not extremely heavy tailed, it shows that the actual negative residuals and actual positive residuals are slightly too negative and positive, respectively. This deviation from the Q–Q line further validates the strong variance in the environment with GMJ and, in particular, GTD. Physically, this means that the environments in which GTD form are highly variable and produce the most error in the model.
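The points of a Q–Q plot can be generated as in the sketch below. It assumes simple z-score standardization of the residuals and plotting positions of (i + 0.5)/N; statistical packages may instead use leverage-adjusted residuals and slightly different plotting positions. The function name and the residual values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pairs of (theoretical normal quantile, standardized residual)."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


# Hypothetical residuals in kt. Points far from the line y = x in the
# tails indicate residuals more extreme than a normal distribution implies.
for theo, samp in qq_points([3.0, -1.0, 0.5, 2.0, -2.5]):
    print(f"{theo:6.2f} {samp:6.2f}")
```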

The last panel in Fig. 6 is an evaluation of the Cook’s distance, which accounts for leverage and strength of the residuals. A point is considered to be heavily influential on the regression line if its Cook’s distance is greater than 0.5 (Holland 2011). In this case, the largest Cook’s distance is just over 0.04. This means that any one individual point can be removed from the dataset without appreciably changing the coefficients or performance of the regression line. This is likely due to the appreciably small bias with strong deviations.
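To make the combination of leverage and residual size concrete, the following sketch computes Cook's distance for a regression with a single predictor, for which the leverage has a closed form. This is a deliberate simplification of the seven-parameter case, and the function name and data are hypothetical. The distance for point *i* is taken as e_i^2 h_i / [p s^2 (1 − h_i)^2], with residual e_i, leverage h_i, p = 2 fitted coefficients, and residual variance s^2.

```python
def cooks_distance(x, y):
    """Cook's distance of each point for a one-predictor least squares fit."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted coefficients: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    # Leverage grows with distance from the mean of the predictor.
    lev = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, lev)]


# Hypothetical data: the last point is far from the others in x and does
# not follow their trend, so it dominates the fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 5.0]
distances = cooks_distance(x, y)
print(max(distances), distances.index(max(distances)))
```

In this synthetic case the last point far exceeds the 0.5 threshold, in contrast to the maximum of about 0.04 reported for the regression.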

It is therefore concluded from Table 3 and Fig. 6 that the bulk of the regression does reduce the size of residuals in a long-term application, which indicates some degree of skill. However, the authors explicitly emphasize that this regression is not an operational statistical model that will accurately predict intensity of a single TC in time and space as defined by maximum attained wind speed. Rather, this multiparameter linear regression serves as an exploratory method of evaluating the potential for specific environmental variables at genesis to affect the subsequent maximum intensity. It also serves as a method of synthesizing the myriad of initial variables in this study into a small subset of variables for further statistical analysis.

### b. Regression-related storm-centered composites

To further understand the contribution of the parameters chosen in Eq. (3) to the eventual attained intensity of TCs, boxes following the coloring scheme applied in Fig. 2 were overlaid on the regions chosen by the seven-parameter linear regression [Eq. (3)] in Figs. 7–13. Only the GMJ and GTD composites are shown, along with their difference field, since the environmental structure directly surrounding the composite center for each variable generated was similar in the GTD, GTS, GMN, and GMJ groupings. Specifically, differences manifested primarily from changes in the magnitude for each variable.

#### 1) Total column water

The seven-parameter, cross-validated linear regression identified TCW as a key variable in three locations: the blue inner northwest box (I1; the first chosen predictor), the green middle center box (M5; the third chosen predictor), and the pink outer northwest box (O1; the seventh chosen predictor).

All three coefficients associated with TCW are positive (Table 3, rows 1, 3, and 7), implying that a more moist environment at genesis leads to an eventual higher intensity storm. This is most clearly seen in Fig. 7c, which indicates that the three boxes are located in an area of significantly higher TCW in the GMJ composite compared to the GTD composite.

As expected, the amount of TCW decreases radially outward from the storm center. The GMJ composite storm center itself has 3.81 kg m^{−2} more water than the GTD composite, significant at the 99% confidence level. This difference is clearly seen by the area occupied by the warmest colors around the center of the storm when comparing Figs. 7a and 7b. Additionally, there is spatially more vertically integrated moisture north of GMJ storms than north of GTD storms.

This moister environment can be more conducive to sustaining deep convection and thus would help spur further development of the storm. Higher TCW can lead to more efficient lift, which aids in mass transport and modulates the secondary circulation. Additionally, higher TCW could imply fewer convective downdrafts, which would allow for boundary layer warming. Higher TCW also increases CAPE. Overall, this should strengthen the vertical velocity of VHTs in the bottom-up method. In the top-down method, higher TCW simply provides more moisture for the process to work with. Moisture to the north and northwest was also found to be a key factor for developing African easterly waves by Brammer and Thorncroft (2015).

#### 2) Local vertical shear

The regression model selected two regions of vertical wind shear: box O4 (Fig. 8, upper pink box) and box M9 (Fig. 8, lower green box). The area surrounding the storm center had similar shear values in each composite.

Box O4 is associated with a negative regression coefficient (Table 3, row 4), implying that wind shear in this area is detrimental to higher eventual attained intensities. In that box, the GTD composite has vertical shear values of around 13–20 m s^{−1} that cover most of the box (upper pink box in Fig. 8b), while the GMJ composite has vertical shear values of 13–15 m s^{−1} that cover only the top third of the box (upper pink box in Fig. 8a). Overall, to the north of the composite storm, moderate to high shear begins at lower latitudes in the GTD composite than in the GMJ composite, indicating that storms that do not intensify beyond tropical depression (TD) status are situated in a more unfavorable large-scale environment than storms that intensify to major hurricane status.

On the other hand, box M9 is associated with a positive regression coefficient (Table 3, row 2). Thus, vertical shear present in that box is positively associated with higher attainable intensities. In the GMJ composite, there is 13–15 m s^{−1} of shear present in box M9 (lower green box in Fig. 8a), while the GTD composite has less than 12 m s^{−1} of shear in box M9 (lower green box in Fig. 8b).

Examining the storm-relative flow at 850 and 200 hPa, the region of high shear in the GMJ composite is explained by the presence of 1) strong, westerly, storm-relative flow at 850 hPa (not shown) associated with a stronger low-level vortex than in the GTD composite (Fig. 11) and 2) strong, northerly flow at 200 hPa associated with a more well-formed outflow jet that extends farther northward before wrapping around the east side of the composite storm (Fig. 9). Therefore, the moderate shear values in box M9 are not indicative of a weaker environmental setup. Rather, they suggest that storms that reach higher intensities have a more defined outflow jet at upper levels and a stronger low-level vortex.

Looking at the order in which the regression chose the wind shear predictors, box M9 was the second predictor chosen while box O4 was the fourth predictor chosen. So, while storms that reach a higher intensity are associated with less shear in the large-scale environment north of the storm, more key in determining future intensity is whether there is a defined outflow jet that wraps around the east side of the storm.

#### 3) Relative humidity

The linear regression also included a relative humidity parameter to the west-southwest of the TC genesis point, as shown by the pink box in Fig. 10. The regression coefficient for RH is negative (Table 3, row 5), indicating that lower midlevel moisture to the west-southwest of the storm is associated with a higher eventual attained intensity. In the GMJ composite, RH values range from 40% to 50% (pink box in Fig. 10a). The GTD composite had values ranging from 35% to 55%, but over half of the box had 50%–55% RH (pink box in Fig. 10b).

These results appear to be counterintuitive since it is well documented that a moist midlevel environment is necessary for TC formation, persistence, and intensification. However, the chosen box is 600–1200 km away from the storm center, located in the environment directly outside the storm’s main cyclonic circulation (see Figs. 14b,e). Thus, the tighter gradient of moisture present in GMJ storms indicates a more compact storm that is more isolated from its environment, rather than indicating a storm with less moisture. This tighter gradient of moisture is directly the result of 500-hPa easterlies (not shown) in the GMJ composite wrapping more strongly around the storm than in the GTD composite. Therefore, in the GTD composite, higher values of RH are advected westward away from the storm center while in the GMJ composite, higher values of RH are advected around the storm center.

Overall, the environment at the storm center had a relative humidity 13.08% higher in the GMJ composite than in the GTD composite, significant at the 99% confidence level. The large-scale environment around and to the north of the storm center has higher relative humidity values in the GMJ composite as seen most clearly in Fig. 10c. Mean, storm-relative, easterly flow in both the GMJ and GTD composites (not shown) acts to entrain the moist (less moist) air to the east of the storm center into midlevels of the GMJ (GTD) composite. This difference could induce less strengthening of the GTD storms, perhaps explaining why GTD fails to intensify beyond TD status.

#### 4) Relative vorticity

The regression model also includes one region of 850-hPa relative vorticity: box M5, which surrounds the genesis point. The green box in Fig. 11 highlights this location.

The ζ_{M5} coefficient is large and positive (Table 3, row 6), indicating that storms that reach stronger intensities are associated with higher relative vorticity values present at the genesis location. This is consistent with results shown in Figs. 11a,b. It is evident that GMJ storms have stronger relative vorticity with tighter gradients on the periphery of the origin location. The difference is shown to be significant by the white contours in Fig. 11c. At the composite storm center, values of vorticity are 1.34 × 10^{−5} s^{−1} higher for GMJ storms than GTD storms, a statistically significant value at the 99% confidence level.

In the large-scale environment, the vortex in both composites is situated within the ITCZ—an area of cyclonic vorticity and associated with deep convection. Also of note, the area of high relative vorticity at the TC origin in the GMJ composite extends and connects to an area of heightened relative vorticity to the storm’s east. This extension is absent in the GTD composite. This suggests that GMJ cases develop in a strong band of continuous vorticity, whereas GTD cases are associated with a discontinuous band of vorticity. As vorticity is often tied to areas of convective activity, GMJ cases may originate from areas of strong, continuous convection off of Africa. While not captured by the regression, it is still a notable feature that distinguishes GMJ from GTD, which is statistically significant.

### c. Other storm-centered composites

Two of the variables identified in Table 2 are not included in the seven-parameter linear regression: sea surface temperature and upper-level divergence. Although these variables lack strong statistical relationships to the eventual attained intensity of TCs, their large-scale spatial structure may still impact the future intensity of a storm.

#### 1) Sea surface temperature

The GMJ composite storm center is situated in waters that are 0.89 K warmer than the GTD composite center (Fig. 12), significant at the 99% confidence level. On a larger scale, the area north of the composite storm center is warmer in the GMJ composite than in the GTD composite. Warmer waters are not only more favorable for genesis but are also more favorable for generating greater instability, more surface fluxes, and a lower central pressure due to latent heat release. In fact, the average central pressure for GMJ is 2.38 hPa lower than GTD, again a statistically significant value at the 99% confidence level. However, high sea surface temperatures alone are often not sufficient to generate a favorable environment, as the warm water layer might be shallow. Since the GMJ and GTD storms, on average, form near the same location and have similar distributions in formation time of year,^{6} it can be assumed that their warm water volume is similar, and thus the higher sea surface temperature in the GMJ composite by itself indicates a more favorable environment for future attainable intensity.

#### 2) Divergence

Upper-level divergence is shown in Fig. 13. The difference field between the GMJ and GTD composite shows little significance when compared to other variables. However, directly surrounding the storm center there is more divergence, and at the storm center there is 0.52 × 10^{−5} s^{−1} more divergence in the GMJ composite than in the GTD composite, not significant at the 99% level. Divergence at the storm center is associated with upper-level outflow. The larger divergence surrounding the GMJ composite is associated with the better-formed outflow jet discussed previously in the vertical shear section and shown in Fig. 9a.

## 5. Azimuthal averages

While specific levels were chosen in this study to generate a multiple-parameter linear regression, differences between GTD and GMJ storms are also found at other levels. To succinctly view this, azimuthal averages were generated by bilinearly interpolating ERA-I data to cylindrical grids centered on each individual TC center radially every 100 km (from 100 to 2000 km outside the storm core) and vertically every 25 hPa (from 1000 to 50 hPa). Regions where GMJ differed significantly from GTD at the 95% confidence level were determined through bootstrap testing with 1000 iterations.
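The ring-averaging step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a single pressure level on a flat Cartesian grid with uniform spacing `dx_km` (the latitude-longitude geometry of ERA-I is ignored), and the function names, the 50-km spacing, and the 72 azimuths per ring are hypothetical choices.

```python
import math


def bilinear(field, x, y):
    """Bilinearly interpolate field[j][i] at fractional grid position (x, y)."""
    i0, j0 = int(math.floor(x)), int(math.floor(y))
    i1, j1 = i0 + 1, j0 + 1
    if i0 < 0 or j0 < 0 or j1 >= len(field) or i1 >= len(field[0]):
        raise ValueError("point lies outside the grid")
    fx, fy = x - i0, y - j0
    low = field[j0][i0] * (1 - fx) + field[j0][i1] * fx
    high = field[j1][i0] * (1 - fx) + field[j1][i1] * fx
    return low * (1 - fy) + high * fy


def azimuthal_mean(field, cx, cy, dx_km, radii_km, n_azimuths=72):
    """Average field on rings of the given radii around grid point (cx, cy)."""
    profile = []
    for r_km in radii_km:
        r = r_km / dx_km  # ring radius in grid units
        total = 0.0
        for k in range(n_azimuths):
            theta = 2.0 * math.pi * k / n_azimuths
            total += bilinear(field, cx + r * math.cos(theta),
                              cy + r * math.sin(theta))
        profile.append(total / n_azimuths)
    return profile


# Synthetic test field: value equals distance (km) from the storm center,
# so the azimuthal mean at radius r should be close to r.
dx_km = 50.0
cone = [[dx_km * math.hypot(i - 50, j - 50) for i in range(101)]
        for j in range(101)]
radii = list(range(100, 2001, 100))  # 100 to 2000 km every 100 km
profile = azimuthal_mean(cone, 50, 50, dx_km, radii)
```

Repeating the same ring average at each pressure level (every 25 hPa from 1000 to 50 hPa in the text) would give a radius-pressure cross section for each storm, which can then be composited by intensity group.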

The azimuthally averaged radial wind for GMJ (Fig. 14a) and GTD (Fig. 14d) shows that storms that reach higher intensities begin with stronger outflow and inflow. Both also extend through a deeper layer in GMJ storms than in GTD storms. This result corroborates the positive coefficient associated with deep-layer wind shear in box M9: a stronger, more well-defined outflow jet aloft is present in GMJ storms than in GTD storms. Increased outflow aloft is also associated with stronger vertical motion at inner radii through mass continuity (not shown).

Figures 14b and 14e show that both TC relative vorticity cores extend radially outward to 600 km. However, the cores of GMJ storms have stronger radial gradients of vorticity and higher maximum cyclonic vorticity, as seen in Fig. 11. The cyclonic core in GMJ storms also extends through a deeper column than in the GTD cases. The upper levels of the TC are characterized by more negative relative vorticity in the GMJ than in the GTD composite. Together, these results suggest that a more intense, isolated cyclonic vortex, with anticyclonic upper levels at outer radii, will more likely develop into a GMJ storm.

Finally, Figs. 14c and 14f show the anomalous potential temperature calculated relative to the mean moist tropical sounding from Dunion (2011). The GMJ storms have a stronger warm core, with a more intense anomaly present in upper levels from 200 to 600 hPa and extending past 1200 km. While the cold anomalies present above the warm core in the GMJ composite are larger than in the GTD composite, implying an elevated tropopause, differences are not significant.
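The significance testing applied to these azimuthal averages (bootstrap with 1000 iterations at the 95% confidence level) is not described in further detail in this section. One common variant, sketched below under that assumption, pools the two groups, resamples with replacement to build a null distribution of mean differences, and flags the observed difference as significant if it falls outside the central 95% of that distribution. The function name, the fixed seed, and the synthetic samples are hypothetical.

```python
import random
from statistics import mean


def bootstrap_diff_significant(a, b, n_iter=1000, level=0.95, seed=0):
    """Two-sided bootstrap test on the difference of means, mean(a) - mean(b).

    Returns (observed_difference, is_significant).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)  # null hypothesis: both groups share one population
    null = []
    for _ in range(n_iter):
        resampled_a = rng.choices(pooled, k=len(a))
        resampled_b = rng.choices(pooled, k=len(b))
        null.append(mean(resampled_a) - mean(resampled_b))
    null.sort()
    tail = (1.0 - level) / 2.0
    lower = null[round(tail * n_iter)]
    upper = null[round((1.0 - tail) * n_iter) - 1]
    return observed, (observed < lower or observed > upper)


# Synthetic samples standing in for one grid point of two composites.
group_a = [10.0 + 0.1 * i for i in range(30)]
group_b = [0.1 * i for i in range(30)]
diff, significant = bootstrap_diff_significant(group_a, group_b)
print(f"difference = {diff:.2f}, significant = {significant}")
```

Applying the test independently at every radius-pressure point yields the kind of significance mask described in the text; no correction for multiple testing is attempted in this sketch.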

Overall, these results show that GMJ storms at genesis have more established primary and secondary circulations.

## 6. Discussion

This study has provided a first look at the possible dependence of the maximum achieved intensity of TCs in the Atlantic basin on environmental conditions at genesis. Specifically, the application of a multiparameter linear regression and cross-validation techniques related four variables in seven locations to storms that reached higher maximum intensities. Again, a more favorable environment at genesis is defined in this study as an environment associated with those conditions favorable for genesis.

Figure 15 is a summary graphic highlighting the variables and boxes chosen: 1) high total column water in the inner northwest quadrant of the TC (box I1), 2) large deep-layer, local vertical shear to the southeast of the TC (box M9), 3) high total column water near the TC genesis point (box M5), 4) weak deep-layer vertical shear to the northeast of the TC (box O4), 5) low midlevel relative humidity to the west-southwest of the TC (box O7), 6) strong positive low-level relative vorticity near the TC genesis point (box M5), and 7) high total column water to the northwest of the TC (box O1), in order as chosen by the regression.

Overall, storms that reach a higher intensity will have a more localized, isolated, and relatively more intense vorticity maximum near the core (Fig. 11) and a stronger, tighter circulation through a deeper column (Figs. 14b,e). As shown in previous studies, the presence of strong relative vorticity not only increases the chances for genesis (Gray 1968, 1979; Emanuel and Nolan 2004; Camargo et al. 2007; Tippett et al. 2011) but also plays a role in the intensification process of TCs via VHT axisymmetrization (Hendricks et al. 2004). Stronger storms are also associated with enhanced, better-defined upper-level outflow that wraps around the east side of the storm at genesis (Figs. 8, 9, and 14a,d). With a more established, stronger circulation overall, there would be more surface fluxes present in the GMJ composite than in the GTD composite, which supports the theory that intensity and surface fluxes are highly connected (Emanuel 1986). While SST was not chosen for the index, there is statistically significantly warmer water located around the GMJ storm (Fig. 12).^{7} This, coupled with the midlevel import of moist air (Fig. 10) into and around the GMJ vortex by the 500-hPa easterlies (not shown), the overall higher moisture in general as seen in the TCW composites (Fig. 7), and the warmer anomalous potential temperature present aloft in GMJ storms at genesis (Figs. 14c,f), could provide a more favorable environment at genesis for storms that reach higher intensities.

As seen in Fig. 15, the regression chose variables in the environment on the periphery of the main TC circulation, as well as down- and upstream of the TC. The dominant flow patterns in the MDR thus act to advect favorable or unfavorable conditions into the vortex. Since GMJ storms are more isolated from their environment than GTD storms, they can resist negative effects from the external advection of environmental properties. The stronger, more resilient vortex for GMJ storms supports the findings of Rappin et al. (2011), where a TC with a stronger core can resist strong external forcings.

One potential interpretation of these results is that tropical depressions with stronger tangential wind or lower pressure at genesis become stronger TCs. The HURDAT2 genesis sustained wind speed (pressure) is only weakly, positively correlated to the eventual intensity sustained wind speed (pressure), with *r* = 0.19 (*r* = 0.17). Correlations listed above give *r*^{2} values of 0.04 and 0.03, respectively. These values are well below the Brooks and Carruthers (1978) threshold and the performance of the regression developed in this study. While ERA-I cannot resolve the radius of maximum winds near the inner core, the maximum tangential wind speed at genesis and at the maximum intensity of the storm was still compared. Values had a correlation coefficient of *r* = 0.48 and an *r*^{2} value of 0.23. The correlation is higher than that in HURDAT2, but it is still below the Brooks and Carruthers (1978) threshold. The higher correlation is perhaps due to the low resolution of ERA-I, which underestimates the radial and tangential wind fields. However, results still imply that storms that reach stronger intensities do not necessarily start out with a deeper pressure or faster tangential wind speeds at genesis.
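The comparison in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below squares each correlation quoted in the text and compares it with the Brooks and Carruthers (1978) *r*^{2} threshold of 0.25 (footnote 5) and with the adjusted *r*^{2} of 0.39 reported for the regression in the abstract. The dictionary keys and the function name are illustrative labels, and comparing a plain *r*^{2} with an adjusted *r*^{2} is only an approximate like-for-like comparison.

```python
R2_THRESHOLD = 0.25        # Brooks and Carruthers (1978) r^2 threshold (footnote 5)
REGRESSION_ADJ_R2 = 0.39   # adjusted r^2 of the regression (abstract)

# Correlations between genesis and maximum-intensity values quoted in the text.
correlations = {
    "HURDAT2 wind speed": 0.19,
    "HURDAT2 pressure": 0.17,
    "ERA-I tangential wind": 0.48,
}


def summarize(r):
    """Return r^2 (rounded to two decimals) and how it compares with both benchmarks."""
    r2 = r ** 2
    return {
        "r2": round(r2, 2),
        "meets_threshold": r2 > R2_THRESHOLD,
        "beats_regression": r2 > REGRESSION_ADJ_R2,
    }


summary = {name: summarize(r) for name, r in correlations.items()}
for name, row in summary.items():
    print(name, row)
```

All three single-variable correlations fall below both benchmarks, which is the basis for the statement that initial vortex strength alone does not explain the eventual intensity.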

The results found in this study indicate an underlying relationship between the environment at genesis and the subsequent achieved intensity. While this study does not focus on along-track factors, it does not suggest that along-track environment interactions are not important. Rather, this study argues that a more well-formed vortex and favorable environmental conditions at genesis enable a vortex to be more resilient to environmental influences at genesis and along its future track. Thus, the structure at genesis can be related to future TC intensity out to nine days.

Although beyond the scope of this study, it is important to note that if a multiparameter linear model such as this were to be used operationally, more extensive development and testing would be needed. Testing might include intrabasin (e.g., MDR vs Gulf of Mexico), basin (e.g., western Pacific vs eastern Pacific), or reanalysis dataset (e.g., ERA-I vs the Climate Forecast System Reanalysis) comparisons of statistical indexes in order to determine whether similar variables at genesis are related to the eventual attained intensity of TCs. Additionally, while this study used local vertical shear as a predictor, it is possible to remove the symmetric circulation, which would generate another predictor to be used, and repeat the statistical analysis. Furthermore, missing from the present statistical analysis is a measure of convection. Other studies that use convection as a predictor use satellite imagery (e.g., Kieper and Jiang 2012; Rozoff et al. 2015; Tao and Jiang 2015). However, this was not ideal for the present study. Although satellite imagery does extend to the beginning of the ERA-I dataset, including a single predictor from an outside dataset, while the rest were from ERA-I, may introduce unforeseen errors. Therefore, in order to ensure consistency with the datasets used in this analysis, satellite imagery representing convection, such as brightness temperatures, was not included. Finally, it would be valuable to construct a metric to classify how well organized (i.e., in terms of the regression results) a storm is at genesis. Perhaps with such a metric, the regression index could be used predictively.

## Acknowledgments

This research is the culmination of a class project started while SDD, TCN, and MR were taking Special Problems in TC Research (AATM 741) taught at the University at Albany by the fourth author. The authors wish to thank John Molinari and Lee Harrison for useful discussions. Also, the authors thank the three anonymous reviewers for helpful comments. The first author was supported by the Department of Defense (DOD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program. The second author was supported by the Atmospheric Sciences Research Center (ASRC) Graduate Fellowship. All authors are grateful to the Department of Atmospheric and Environmental Sciences, University at Albany, for supporting the publication of this work. The ERA-Interim data used for this research were retrieved from the Research Data Archive (RDA), which is maintained by the Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR). The original data are available from the RDA (http://rda.ucar.edu/datasets/ds627.0/).

## REFERENCES

*Handbook of Statistical Methods in Meteorology*. AMS Press, 412 pp.

*26th Conf. on Hurricanes and Tropical Meteorology*, Miami, FL, Amer. Meteor. Soc., 10A.2.

*Meteorology over the Tropical Oceans*, D. Shaw, Ed., Royal Meteorological Society, 155–218.

## Footnotes

^{a}

Additional affiliation: Atmospheric Sciences Research Center, University at Albany, State University of New York, Albany, New York.

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1}

Local vertical wind shear was calculated at each point. Thus, close to the TC, shear is storm dominated. Outside the main storm circulation, shear represents the environmental shear.

^{2}

Multiple levels were tested with similar results. Therefore, levels aligned with those found in the literature were chosen to be used in this study.

^{3}

This size box corresponds to 8 × 8 grid boxes at the 0.7° resolution of ERA-I. Boxes of different sizes were explored and qualitatively similar results were found. Using 8 × 8 grid boxes yielded the highest *r*^{2} value in the regression analysis.

^{4}

For each of the six variables listed in section 3a, there are 25 area-averaged boxes as described in section 3b.

^{5}

Brooks and Carruthers (1978) defined an acceptable correlation as a correlation exceeding 0.5 or an *r*^{2} value exceeding 0.25 for meteorological data.

^{6}

In this dataset, the number of storms in each month is close to a normal distribution with a peak in August–September and a slight positive skew to later months (not shown). Both GMJ and GTD have similar distributions to the overall number of storms, and thus the observed shift in sea surface temperatures in Fig. 12 is not due to seasonal differences.

^{7}

Note that although the stronger circulation associated with GMJ implies that the developing TC might be hindered by upwelling of cooler waters through Ekman pumping, it is important to remember that these analyses are for tropical depressions and not for intense hurricanes. Therefore, the contribution of Ekman pumping to weakening the TC should be inefficient (Price 1981).