## Abstract

A Poisson regression between the observed climatology of tropical cyclogenesis (TCG) and large-scale climate variables is used to construct a TCG index. The regression methodology is objective and provides a framework for the selection of the climate variables in the index. Broadly following earlier work, four climate variables appear in the index: low-level absolute vorticity, relative humidity, relative sea surface temperature (SST), and vertical shear. Several variants in the choice of predictors are explored, including relative SST versus potential intensity and satellite-based column-integrated relative humidity versus reanalysis relative humidity at a single level; these choices lead to modest differences in the performance of the index. The feature of the new index that leads to the greatest improvement is a functional dependence on low-level absolute vorticity that causes the index response to absolute vorticity to saturate when absolute vorticity exceeds a threshold. This feature reduces some biases of the index and improves the fidelity of its spatial distribution. Physically, this result suggests that once low-level environmental vorticity reaches a sufficiently large value, other factors become rate limiting so that further increases in vorticity (at least on a monthly mean basis) do not increase the probability of genesis.

Although the index is fit to climatological data, it reproduces some aspects of interannual variability when applied to interannually varying data. Overall, the new index compares favorably with the genesis potential index (GPI), whose derivation, computation, and analysis are more complex, in part because of its dependence on potential intensity.

## 1. Introduction

We are interested in the relationship between the statistical distribution of tropical cyclone genesis (TCG) and the large-scale climate. If the climate changes, because of either natural or anthropogenic causes, will there be more or fewer tropical cyclones in a given basin? Will their spatial distribution within the basin change? If the distribution of tropical cyclone genesis does change, what climate factors are most important in producing that change and why?

At present, we lack a solid theoretical foundation with which to answer these questions from first principles. Numerical models are becoming able to provide plausible answers as available computational power now permits the use of global high-resolution models that simulate both the global climate and tropical cyclones with some fidelity (e.g., Oouchi et al. 2006; Bengtsson et al. 2007; Gualdi et al. 2008; Zhao et al. 2009). These models are expensive, however, and are subject to all the normal limitations of numerical models: they may be biased, and simulations with them do not automatically provide understanding. Empirical study of the problem using the observational record remains relevant.

Gray (1979) first developed an empirical “index” for tropical cyclogenesis. Gray’s index, and those that have been developed later, are functions of a set of predictors—physical fields to which genesis is believed to be sensitive, usually computed from large-scale data and often averaged over a month or some comparable duration—weighted in such a way that larger values of the index are indicative of a greater probability of genesis. Later investigators, following the same basic approach, have modified Gray’s index by changing either the predictors or the functional dependence of the index on them (e.g., DeMaria et al. 2001; Royer et al. 1998; Emanuel and Nolan 2004; Camargo et al. 2007a; Sall et al. 2006; Bye and Keay 2008; Kotal et al. 2009; Murakami and Wang 2010).

The present study follows the same general approach, with some incremental improvements. We aim to improve both the performance of the index and the degree to which its derivation can be understood and reproduced. To motivate this work, we describe some limitations of the Emanuel and Nolan genesis potential index (GPI; as described in more detail by Camargo et al. 2007a), which we take to be more or less representative of the state of the art. The GPI has been applied widely and successfully to study variations of genesis frequency on various time scales in reanalysis and models (e.g., Camargo et al. 2007a,b; Vecchi and Soden 2007b; Nolan et al. 2007; Camargo et al. 2009; Lyon and Camargo 2009; Yokoi et al. 2009; Yokoi and Takayabu 2009). However, the GPI has the following limitations:

1. Its derivation was partly subjective and thus cannot be easily reproduced.
2. One of its thermodynamic predictors, potential intensity (PI), is a highly derived quantity whose computation requires use of a sophisticated algorithm. It is not clear whether this degree of technical difficulty and theoretical complexity is necessary, or whether equal performance can be obtained with a simpler predictor.
3. The choice of relative humidity (RH) at a single level as the other thermodynamic predictor, though reasonable, is not precisely justified and in practice generally requires use of assimilated humidity fields that are not strongly constrained by observations.
4. Compared with the observed genesis climatology, the GPI has some systematic biases. For example, in the seasons when no tropical cyclones are observed, it continues to predict a nonnegligible probability of genesis, and in some regions of particular interest, such as the tropical Atlantic Main Development Region, the GPI underpredicts the rate of genesis during the peak season.

Of these limitations, the first is perhaps the most significant. The importance of the problem warrants development of an index by a process that is clear, objective, and reproducible. The procedure should make apparent the consequences of the choices made, so that it can be easily varied and adapted to meet the requirements of a particular application, to make use of new observations of the predictors (or different predictors), or for other unforeseen reasons.

The second and third limitations are more minor but still worthy of consideration. PI was chosen by Emanuel and Nolan (2004) to replace the thermodynamic parameter used by Gray. Gray’s thermodynamic parameter is proportional to the difference between upper-ocean heat content [sometimes replaced in later work by sea surface temperature (SST)] and a fixed-threshold value, below which genesis is assumed impossible (Gray 1979). While deep convection does seem to be very roughly parameterizable as depending on SST above some threshold (e.g., Gadgil et al. 1984; Graham and Barnett 1987; Fu et al. 1990; Zhang 1993; Fu et al. 1994; Back and Bretherton 2009), our current understanding is that this threshold should not be fixed but should vary with the mean climate (whether because of anthropogenic or natural causes) because stability arguments show that the threshold is a function of tropical tropospheric temperature (e.g., Sobel et al. 2002; Chiang and Sobel 2002; Su et al. 2003). In particular, Yoshimura et al. (2006) and Knutson et al. (2008) show that the SST threshold for tropical cyclone formation rises in global warming simulations with high-resolution models. PI depends on the mean climate in a way that may be plausibly assumed to capture this dependence, and this is an improvement on Gray’s SST predictor for studying the influence of large-scale climate variability and change on genesis frequency.

However, the computation of PI involves a complex algorithm, which increases the difficulty of computing the index and understanding its behavior. Moreover, the use of PI does not add a great deal of theoretical justification since PI is a theoretical prediction of the maximum tropical cyclone intensity rather than the likelihood of genesis (at which point, by definition, tropical cyclone intensity is at a minimum). It is not clear that simpler predictors could not provide comparable performance. In particular, relative SST (*T*), the difference between the local SST and the mean tropical SST, has been shown to be highly correlated with PI (Vecchi and Soden 2007a; Swanson 2008), as can be explained by straightforward physical arguments (Ramsay and Sobel 2011). Relative SST is similar to Gray’s original SST parameter but is a linear function of SST (no Heaviside function) and allows for change in the mean climate by using tropical mean SST in place of a fixed threshold value.

There is little doubt that free-tropospheric relative humidity is a factor in tropical cyclogenesis (e.g., Gray 1979; Emanuel 1989; Cheung 2004), but the comparative influence of different parts of the vertical profile of relative humidity is not precisely known. Therefore, the choice of level or levels to use as predictors in an index is somewhat arbitrary. Since microwave satellite retrievals of column-integrated water vapor are available, it seems reasonable to consider this quantity as a predictor in place of reanalysis products. Given the possible biases in either satellite retrievals or reanalysis products, it should be noted that the regression can implicitly correct systematic errors in its inputs.

In this study, we address the first three limitations and examine the extent to which choices we make in the development of the index influence its performance (the fourth). It turns out that the change that leads to the most significant improvement in performance is not related to any of the first three issues above but rather involves the vorticity parameter. This result in turn has implications about the physics of genesis. While the presence of environmental vorticity is required for TCG, we find that the sensitivity of TCG to climatological absolute vorticity (AV) is nonlinear. When the climatological absolute vorticity exceeds a threshold, further increase does not increase the climatological likelihood of TCG. Inclusion of this functional dependence in the index significantly improves the index performance.

The paper is organized as follows. Section 2 describes the data and the Poisson regression methodology used to construct the index. Section 3 details the construction of the index. Section 4 examines the properties of the index, including its spatial distribution and basin-scale quantities, dependence on climate variables, seasonal cycle, and interannual variability. A summary and conclusions are given in section 5.

## 2. Data and methodology

### a. Data

All data are represented on a 2.5° × 2.5° longitude–latitude grid extending from 60°S to 60°N. Values of 850-hPa absolute vorticity, 600-hPa relative humidity, and vertical shear (*V*) between the 850- and 200-hPa levels come from the monthly mean values of the National Centers for Environmental Prediction (NCEP)–National Center for Atmospheric Research (NCAR) reanalysis (Kalnay et al. 1996; Kistler et al. 2001) and the 40-year European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) (Uppala et al. 2005) datasets. Climatological means of the variables for both datasets were computed using the common 40-yr period 1961–2000.

The column-integrated relative humidity was calculated following the procedure developed in Bretherton et al. (2004). We obtained retrievals of column-integrated water vapor *W* from Remote Sensing Systems Inc. (see http://www.remss.com) for all available Special Sensor Microwave Imager (SSM/I) satellites (F08, F10, F11, F12, F14, and F15) in the period 1987–2008. Details of the SSM/I retrieval algorithms are given in Wentz and Spencer (1998). The data are provided on a 0.25° × 0.25° grid and are suitable for use over the ocean only. First, the daily average at each ocean grid point was calculated from all valid data; these daily averages were then regridded to a 2.5° × 2.5° grid for the region 60°S–60°N. Following Bretherton et al. (2004), we calculated the daily averaged saturation water vapor path *W*_{*} using the daily temperature and surface pressure data from the ERA-40 and NCEP reanalyses. The saturation specific humidity was calculated at each pressure level and grid point and then vertically integrated for each day of the period in which both the reanalysis and the SSM/I data were available (NCEP: 1987–2008; ERA: 1987–August 2002). The daily column-relative humidity is then defined as the ratio *W*/*W*_{*}. Finally, monthly means and climatological values were calculated.
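The calculation of *W*/*W*_{*} for a single column can be sketched as follows. This is only an illustration, not the procedure of Bretherton et al. (2004): the saturation vapor pressure approximation (Bolton 1980), the trapezoidal integration, and all function names are assumptions introduced here.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)

def saturation_specific_humidity(temp_k, pres_hpa):
    """Saturation specific humidity (kg/kg) over liquid water.

    Assumption: Bolton's (1980) approximation for saturation vapor
    pressure; the source does not say which formula was used.
    """
    t_c = temp_k - 273.15
    e_s = 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))  # hPa
    return 0.622 * e_s / (pres_hpa - 0.378 * e_s)

def saturation_water_vapor_path(temps_k, pres_hpa):
    """W_*: trapezoidal integral of q_s over pressure, divided by g (kg m^-2)."""
    q_s = [saturation_specific_humidity(t, p) for t, p in zip(temps_k, pres_hpa)]
    total = 0.0
    for i in range(len(pres_hpa) - 1):
        dp_pa = abs(pres_hpa[i + 1] - pres_hpa[i]) * 100.0  # hPa -> Pa
        total += 0.5 * (q_s[i] + q_s[i + 1]) * dp_pa
    return total / G

def column_relative_humidity(w, temps_k, pres_hpa):
    """Ratio W / W_* for one column and one day (w in kg m^-2)."""
    return w / saturation_water_vapor_path(temps_k, pres_hpa)
```

In this sketch the daily ratios would then be averaged into monthly means, as described in the text.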

The relative SST is defined as the SST at each grid point minus the mean SST of the 20°S–20°N region (Vecchi and Soden 2007a; Vecchi et al. 2008). The SST product used was version 2 of the National Oceanic and Atmospheric Administration (NOAA) National Climatic Data Center (NCDC) extended reconstruction sea surface temperature (ERSST2; Smith and Reynolds 2004).
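As an illustration of this definition, the following sketch subtracts the 20°S–20°N mean from a gridded SST field. The cosine-latitude weighting of the tropical mean and the use of `None` for land points are assumptions of the sketch, since the text does not specify how the mean is formed.

```python
import math

def relative_sst(sst, lats, lat_bound=20.0):
    """Return SST minus the tropical-mean SST.

    sst  : list of rows, one row per latitude; None marks land or missing data.
    lats : latitude in degrees of each row.
    Assumption: the tropical mean is weighted by cos(latitude).
    """
    weight_sum = 0.0
    weighted_total = 0.0
    for row, lat in zip(sst, lats):
        if abs(lat) > lat_bound:
            continue  # outside the 20S-20N band
        w = math.cos(math.radians(lat))
        for value in row:
            if value is not None:
                weight_sum += w
                weighted_total += w * value
    tropical_mean = weighted_total / weight_sum
    return [[None if v is None else v - tropical_mean for v in row] for row in sst]
```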

The PI was obtained from monthly means of ERSST2, sea level pressure, and vertical profiles of atmospheric temperature and humidity for both NCEP and ERA reanalysis datasets. The algorithm developed by Kerry Emanuel is a generalization of the procedure described in Emanuel (1995) taking into account dissipative heating (Bister and Emanuel 1998, 2002a,b).

The GPI was developed by Emanuel and Nolan (2004), is discussed in detail in Camargo et al. (2007a), and was also used in Camargo et al. (2007b), Nolan et al. (2007), Vecchi and Soden (2007a), and Camargo et al. (2009). The genesis potential index is defined as

$$\mathrm{GPI} = \left|10^{5}\eta\right|^{3/2}\left(\frac{\mathcal{H}}{50}\right)^{3}\left(\frac{\mathrm{PI}}{70}\right)^{3}\left(1 + 0.1V\right)^{-2}, \tag{1}$$

where *η* is the absolute vorticity at 850 hPa in s^{−1}, ℋ is the relative humidity at 600 hPa in percent, PI is the potential intensity in m s^{−1}, and *V* is the magnitude of the vertical wind shear between 850 and 200 hPa in m s^{−1}.
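For reference, a minimal sketch of the GPI at a single grid point is given below. It assumes the commonly quoted Emanuel and Nolan (2004) form, GPI = |10^5 η|^{3/2} (ℋ/50)^3 (PI/70)^3 (1 + 0.1V)^{−2}; the function and argument names are illustrative only.

```python
def gpi(eta, rel_hum, pot_int, shear):
    """Genesis potential index at one grid point.

    eta     : 850-hPa absolute vorticity (s^-1)
    rel_hum : 600-hPa relative humidity (percent)
    pot_int : potential intensity (m s^-1)
    shear   : 850-200-hPa vertical wind shear magnitude (m s^-1)
    """
    return (abs(1.0e5 * eta) ** 1.5
            * (rel_hum / 50.0) ** 3
            * (pot_int / 70.0) ** 3
            * (1.0 + 0.1 * shear) ** -2)
```

With the reference values η = 10^{−5} s^{−1}, ℋ = 50%, PI = 70 m s^{−1}, and zero shear, every factor is one; raising the shear to 10 m s^{−1} reduces the index by a factor of four.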

The tropical cyclone genesis data come from release v02r01 of the International Best Track Archive for Climate Stewardship (IBTrACS; Knapp et al. 2010) for the period 1961–2000. We define genesis locations as the first positions of those storms that eventually reach a maximum sustained wind speed of 15 m s^{−1}. Storms without maximum sustained wind speed data are not included; this exclusion has the largest impact on the North Indian Basin.
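The genesis definition above can be sketched as a simple counting procedure. The track layout (a list of `(lat, lon, month, wind)` tuples per storm) and the placement of grid-cell edges at multiples of 2.5° starting from 60°S and 0°E are hypothetical choices made for this illustration; they are not the IBTrACS file format.

```python
import math

def genesis_counts(storms, threshold=15.0, cell=2.5):
    """Count genesis events per (lat_index, lon_index, month) grid cell.

    storms : list of tracks; each track is a time-ordered list of
             (lat, lon, month, wind) tuples, wind in m s^-1 or None.
    A storm counts only if its maximum recorded wind reaches the threshold;
    storms with no wind data at all are skipped.
    """
    counts = {}
    for track in storms:
        winds = [point[3] for point in track if point[3] is not None]
        if not winds or max(winds) < threshold:
            continue
        lat, lon, month, _ = track[0]  # genesis = first recorded position
        key = (math.floor((lat + 60.0) / cell),
               math.floor((lon % 360.0) / cell),
               month)
        counts[key] = counts.get(key, 0) + 1
    return counts
```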

### b. Poisson regression

Poisson regression is typically used for the modeling of count data such as TCG occurrence (Solow and Nicholls 1990; Elsner and Schmertmann 1993; McDonnell and Holbrook 2004; Mestre and Hallegatte 2009; Villarini et al. 2010). A random variable *N* has a Poisson distribution with expected value *μ* if *N* takes on the values *n* = 0, 1, 2, … , with probability

$$P(N = n) = \frac{e^{-\mu}\mu^{n}}{n!}. \tag{2}$$

Here, for each 2.5° × 2.5° grid cell and calendar month, *N* is the number of TCG events during a 40-yr climatological period. Our goal is to predict the expected value *μ* from a vector **x** of climate variables. A model in which the expected value *μ* depends linearly on the climate variables **x** is unsatisfactory since negative values of *μ* may result. A solution is to use a log-linear model where log*μ* is linearly related to **x**, that is,

$$\log\mu = \mathbf{b}^{T}\mathbf{x}, \tag{3}$$

where **b** is a vector of coefficients, or equivalently,

$$\mu = \exp\left(\mathbf{b}^{T}\mathbf{x}\right). \tag{4}$$

A constant term (intercept) is included in the model by taking one of the elements of **x** to be unity. This model, where the number *N* of TCG events has a Poisson distribution and the logarithm of its expected value is a linear combination of predictors, is a Poisson regression model, a special case of a generalized linear model.

The climate variables and the observed number of TCG events are defined on a 2.5° × 2.5° latitude–longitude grid. To account for the differing area associated with grid points at different latitudes we modify (4) to become

$$\mu = \exp\left(\mathbf{b}^{T}\mathbf{x} + \log\cos\varphi\right), \tag{5}$$

where *φ* is the latitude. The offset term log cos*φ* is a predictor with coefficient one and serves to make the units of exp(**b**^{T}**x**) be the number of TCG events per area.

The log-likelihood *L* of *k* independent observations *N*_{1}, *N*_{2}, … , *N*_{k} drawn from Poisson distributions with means *μ*_{1}, *μ*_{2}, … , *μ*_{k} is, from (2),

$$L = \sum_{i=1}^{k}\left(N_{i}\log\mu_{i} - \mu_{i} - \log N_{i}!\right). \tag{6}$$

The mean *μ*_{i} depends on the associated climate variables **x**_{i} and the coefficients **b** through the relation in (5). Therefore, for specified observations *N*_{i} and climate variables **x**_{i}, the log-likelihood *L* is a function only of the unknown coefficients **b**. The coefficients are found by maximizing the log-likelihood *L* defined in (6).
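A minimal sketch of this maximization is given below, using Newton's method on the Poisson log-likelihood with a fixed offset (here, log cos*φ*). The choice of optimizer is an assumption, as are the function names and the requirement that the first predictor be the constant 1 and that at least one count be nonzero; the text does not state which algorithm or software was used.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def log_likelihood(b, xs, offsets, counts):
    """Poisson log-likelihood with log(mu_i) = b . x_i + offset_i."""
    total = 0.0
    for x, off, n in zip(xs, offsets, counts):
        log_mu = sum(bj * xj for bj, xj in zip(b, x)) + off
        total += n * log_mu - math.exp(log_mu) - math.lgamma(n + 1)
    return total

def fit_poisson(xs, offsets, counts, iters=25):
    """Maximize the log-likelihood by Newton iterations.

    Assumption: xs[i][0] == 1 (intercept), used to pick a starting value.
    """
    p = len(xs[0])
    b = [0.0] * p
    b[0] = math.log(sum(counts) / sum(math.exp(off) for off in offsets))
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]  # negative Hessian of L
        for x, off, n in zip(xs, offsets, counts):
            mu = math.exp(sum(bj * xj for bj, xj in zip(b, x)) + off)
            for j in range(p):
                grad[j] += (n - mu) * x[j]
                for l in range(p):
                    hess[j][l] += mu * x[j] * x[l]
        step = solve(hess, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
    return b
```

Each row of `xs` would hold the predictors for one grid point and month, and `offsets` the corresponding values of log cos*φ*.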

Here, we fit a single Poisson regression model to climatological data for all ocean grid points and months of the year; the subscript *i* indexes grid points and months. Moreover, we include both the NCEP and ERA data in (6), thus allowing the estimation of a single Poisson regression model relating the expected number of TCG events with climate variables. However, fitting both NCEP and ERA data means the observations in (6) are obviously not independent. Therefore, we deflate the number of observations by a factor of 2. This scaling reduces the log likelihood and increases the standard errors of the coefficient estimates.

The maximized log likelihood is one measure of how well the model fits the data. However, since the maximized log likelihood is the result of an optimization, it is positively biased and this bias increases as the number of predictors increases. This bias is reflected in the fact that the maximized log likelihood always improves as the number of predictors increases, regardless of whether the additional predictors would prove useful on independent data. The Akaike information criterion (AIC) corrects for that bias and attempts to avoid selection of useless predictors and overfitting (Akaike 1973). The AIC is defined as

$$\mathrm{AIC} = -2L + 2p, \tag{7}$$

where *p* is the number of parameters in the model. The AIC is oriented so that models with lower AIC are considered superior. The first term rewards model fit while the second term penalizes models with many parameters. However, AIC is a function of the data and therefore random, and as such, should only be used as a guide in predictor selection.
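The trade-off between fit and model size can be illustrated with the standard definition AIC = −2*L* + 2*p*. The log-likelihood values in the usage note are invented for illustration and are not taken from Table 1.

```python
def aic(log_lik, n_params):
    """Akaike information criterion: lower values indicate a preferred model."""
    return -2.0 * log_lik + 2.0 * n_params

def prefer_larger_model(log_lik_small, p_small, log_lik_large, p_large):
    """True if the larger model has the lower AIC."""
    return aic(log_lik_large, p_large) < aic(log_lik_small, p_small)
```

For example, adding one parameter that raises the maximized log-likelihood from −100.0 to −99.5 increases the AIC from 210 to 211, so the extra predictor would not be retained on this criterion; an improvement to −98.0 lowers it to 208.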

The variance of a Poisson distribution is equal to its mean. In practice, data often exhibit greater variability, a property known as overdispersion. The dispersion parameter *σ*^{2} measures the ratio of variance to mean, with *σ*^{2} > 1 corresponding to overdispersion. The dispersion parameter *σ*^{2} is estimated from

$$\sigma^{2} = \frac{1}{k - p}\sum_{i=1}^{k}\frac{\left(N_{i} - \mu_{i}\right)^{2}}{\mu_{i}}. \tag{8}$$

A common way to deal with overdispersion, and the approach taken here, is the so-called quasi-Poisson method, in which the coefficient estimates are the same as in Poisson regression, but their standard errors are inflated to reflect the overdispersion.
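The quasi-Poisson adjustment can be sketched as follows, assuming the usual Pearson estimate of the dispersion (the sum of squared Pearson residuals divided by the residual degrees of freedom) and standard errors scaled by its square root. Both choices are the conventional ones and are assumed here rather than taken from the text.

```python
import math

def dispersion(counts, mus, n_params):
    """Pearson estimate of the variance-to-mean ratio."""
    k = len(counts)
    pearson = sum((n - mu) ** 2 / mu for n, mu in zip(counts, mus))
    return pearson / (k - n_params)

def inflate_standard_errors(std_errs, disp):
    """Scale Poisson standard errors by the square root of the dispersion."""
    return [se * math.sqrt(disp) for se in std_errs]
```

The coefficient estimates themselves are unchanged; only their uncertainty grows when the estimated dispersion exceeds one.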

## 3. Construction of a TCG index

Our goal is to construct an index that reflects the dependence of TCG on large-scale climate variables. Necessary conditions for TCG include sufficient environmental vorticity, humidity, ocean thermal energy, and lack of vertical shear. There are several questions regarding how to include these factors in a TCG index. Here, we consider the following:

1. Is reanalysis relative humidity adequate given that there are relatively few humidity measurements in the oceanic troposphere (Kistler et al. 2001)? Is there any benefit from using column-integrated relative humidity from satellite microwave retrievals?
2. Should relative SST or PI be used to represent the availability of ocean thermal energy?
3. To what extent is the dependence of the number of TCG events on the climate variables log linear?

We first broadly address these questions in the framework of predictor selection, using the AIC to assess how well the index fits the observations. Later, in section 4, we examine these questions in terms of basin-integrated quantities and spatial distributions.

First, we consider the Poisson regression based on four climate variables: absolute vorticity, reanalysis midlevel relative humidity, relative SST, and vertical shear. The simplest Poisson regression model assumes no interactions between the predictor variables, that is, no powers or products of the predictors are included in the model, and it has the form

$$\mu = \exp\left(b + b_{\eta}\eta + b_{\mathcal{H}}\mathcal{H} + b_{T}T + b_{V}V + \log\cos\varphi\right), \tag{9}$$

where *μ* is the expected number of tropical cyclone genesis events per month in a 40-yr period, and *η*, ℋ, *T*, and *V* are, respectively, the absolute vorticity at 850 hPa in 10^{−5} s^{−1}, the relative humidity at 600 hPa in percent, the relative SST in °C, and the vertical shear between the 850- and 200-hPa levels in m s^{−1}; *b* is the constant (intercept) term. We adopt the convention that the coefficient subscript indicates the quantity it multiplies; in the notation of (5), **b** = (*b*, *b*_{η}, *b*_{ℋ}, *b*_{T}, *b*_{V}) and **x** = (1, *η*, ℋ, *T*, *V*). Maximizing the likelihood (6) of the observed number of TCG events given the NCEP and ERA climatological data leads to the coefficient values shown in line 1 of Table 1. Standard errors for the coefficient estimates are computed from regression statistics and from 1000 bootstrap samples and are shown in Table 1, confirming the significance of the coefficients.

The form of the Poisson regression model means that the coefficients can be directly interpreted as sensitivities. Specifically, for a small change *δ***x** in the climate variables, the change *δμ* in the expected number of TCG events is

$$\delta\mu \approx \mu\,\mathbf{b}^{T}\delta\mathbf{x}. \tag{10}$$

That is, for a 0.01 unit change in one of the climate variables, the corresponding coefficient is the percent change in *μ*. For instance, an increase of 1 cm s^{−1} in vertical shear reduces the expected number of TCG events by about 0.15% in this index.
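The arithmetic behind this interpretation can be checked with a short sketch. The shear coefficient of about −0.15 per m s^{−1} used in the usage note is inferred from the 0.15% figure quoted above rather than read from Table 1, and the function name is illustrative.

```python
import math

def percent_change(coef, delta):
    """Exact percent change in mu when one predictor changes by delta.

    Follows from mu = exp(b . x + offset): mu_new / mu_old = exp(coef * delta).
    """
    return 100.0 * (math.exp(coef * delta) - 1.0)
```

With `coef = -0.15`, a change of 0.01 m s^{−1} gives very nearly −0.15%, matching the linearized statement. A change of 1 m s^{−1} gives about −13.9% rather than −15%, which shows that the linear approximation holds only for small changes.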

Addressing the first question regarding the choice of humidity variable, we find that using the SSM/I column-integrated relative humidity rather than the reanalysis relative humidity gives a lower AIC value (line 2 of Table 1), indicating a better fit to observations. However, fitting the NCEP and ERA data separately reveals that using the SSM/I column-integrated relative humidity improves the fit for NCEP data but not for ERA data. Figure 1 shows that overall the ERA relative humidity has a stronger seasonal cycle than NCEP in the Northern Hemisphere. NCEP relative humidity differs considerably from ERA in the Southern Hemisphere during austral winter. The difference between the NCEP and ERA relative humidity does not facilitate the use of a single Poisson regression model. The hemisphere- and basin-averaged column-integrated relative humidities computed using SSM/I water vapor path divided by saturation values derived from NCEP and ERA temperatures are more similar to each other (not shown), indicating better agreement in temperature than humidity, as might be expected.

SST and PI are both variables that can be used to quantify the availability of ocean heat for TCG and either could conceivably be used in the index in place of relative SST. For the purpose of fitting the spatial distribution of genesis probability from the present climatology, SST contains nearly the same information as relative SST; they differ only by the annual cycle of mean tropical SST. As the climate varies (because of either anthropogenic or natural causes), we expect the threshold for deep convection to vary roughly with the tropical mean SST (e.g., Sobel et al. 2002) and the SST threshold for genesis (to the extent that such a thing exists) to vary similarly; thus, we expect relative SST to be more appropriate than absolute SST for capturing the influence of climate variability on the probability of genesis. Recent studies show a strong empirical relation between relative SST and PI (Vecchi and Soden 2007a; Swanson 2008; Vecchi et al. 2008). PI, on the other hand, has the advantage that it includes atmospheric information in addition to SST that may influence the probability of TCG. However, being the theoretical maximum tropical cyclone intensity, PI was not defined with the purpose of characterizing TCG, and while it is plausible to use it as a predictor, there is no strong theoretical basis for doing so. PI also has the disadvantage that it is a highly derived quantity whose computation, compared to relative SST, is more complex and requires more data. From a practical point of view, we find that using PI in place of relative SST in the Poisson regression gives a higher AIC value (line 3 of Table 1).^{1}

We examine the log-linear dependence assumption by adding powers and products of the climate variables to the regression. Adding the 10 possible quadratic powers and products of the four climate variables one at a time to the Poisson regression, we find that including the square of the absolute vorticity reduces AIC the most (line 4 of Table 1). The resulting negative coefficient (−0.16) for the square of absolute vorticity (line 4 of Table 1) means that in this index, increases in absolute vorticity reduce the expected number of TCG events for sufficiently large values of absolute vorticity. This behavior can be understood as the regression attempting to accommodate the lack of TCG events at higher latitudes where values of absolute vorticity are high on average. However, while such a dependence may fit the data, it would not appear to have a physical basis; we do not have a physical reason for associating the reduction in TCG occurrence at higher latitudes with increased absolute vorticity. Rather, a physical explanation would be that the lack of TCG events at high latitudes is due to insufficient ocean thermal energy there. A more attractive explanation of the results is that log*μ* has a nonlinear dependence on absolute vorticity, and including the square of the absolute vorticity in the regression approximates that dependence. Including additional powers of absolute vorticity, as in a series expansion, could give a more physically satisfying dependence. However, including more powers of absolute vorticity in the index would increase its complexity and make the estimation of its parameters less robust. Note also, there is a substantial increase in the dispersion parameter *σ*^{2} (line 4 of Table 1).

To examine further the functional dependence of the number of TCG events on absolute vorticity, we fit the Poisson regression for different ranges of absolute vorticity values. Specifically, for a given value *η*′ of the absolute vorticity, we fit the Poisson regression using data in the range (*η*′ − *δη*) ≤ *η* ≤ (*η*′ + *δη*) and thus obtain regression coefficients that depend on *η*′. To the extent that the coefficient *b*_{η} depends on *η*′, the sensitivity of TCG events to changes in absolute vorticity depends on the value of absolute vorticity, and log*μ* has a nonlinear dependence on absolute vorticity. This procedure is essentially equivalent to computing the partial logarithmic derivative of the number of TCG events with respect to absolute vorticity. The dependence of *b*_{η} on absolute vorticity is shown in Fig. 2, computed using *δη* = 0.5; the coefficient error bars are based on the estimated errors inflated by dispersion. For modest values of the absolute vorticity, *b*_{η} has significant positive values, indicating that TCG increases with increasing absolute vorticity. For larger values of the absolute vorticity, *b*_{η} is not significantly different from zero, indicating that for this range of values, TCG is insensitive to further increases in absolute vorticity. Specifically, for absolute vorticity greater than about 4 × 10^{−5} s^{−1}, further increases in absolute vorticity do not increase the expected number of TCG events. The value of *b*_{η} obtained using all values of the absolute vorticity (light dashed line) is roughly the average of the values of *b*_{η} conditioned on *η*.

This observed dependence of *b*_{η} on the value of *η* motivates our decision to use the quantity min(*η*, 3.7) rather than absolute vorticity in the index; we refer to this quantity as the “clipped” absolute vorticity. Generalized additive models provide a systematic method for including more complex functional dependence (Mestre and Hallegatte 2009; Villarini et al. 2010). The threshold value 3.7 was chosen to maximize the likelihood of the observations and therefore counts in the AIC as a parameter. Although the number of parameters is increased, AIC is smaller (line 5 of Table 1) for the Poisson regression based on the quantity min(*η*, 3.7). The coefficient of min(*η*, 3.7) is 1.12, which is larger than that of *η* (0.56; line 2 of Table 1). We will see later that this feature means that the index based on clipped absolute vorticity responds more strongly to the near-equatorial latitudinal gradient in absolute vorticity without the undesired side effect of generating too many TCG events at high latitudes.
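The effect of clipping can be made concrete with the two vorticity coefficients quoted above (1.12 for the clipped predictor and 0.56 for the unclipped one). The sketch below compares the factor by which *μ* changes as *η* varies, holding all other predictors fixed. It ignores the fact that the other coefficients also differ between the two fits, so it illustrates the functional form only; the function names are hypothetical.

```python
import math

B_CLIPPED = 1.12    # coefficient of min(eta, 3.7), from the text
B_UNCLIPPED = 0.56  # coefficient of eta, from the text
THRESHOLD = 3.7     # in units of 1e-5 s^-1

def clipped(eta, threshold=THRESHOLD):
    """Clipped absolute vorticity min(eta, threshold)."""
    return min(eta, threshold)

def index_ratio(eta_from, eta_to, use_clipped):
    """Factor by which mu changes when eta moves from eta_from to eta_to."""
    if use_clipped:
        return math.exp(B_CLIPPED * (clipped(eta_to) - clipped(eta_from)))
    return math.exp(B_UNCLIPPED * (eta_to - eta_from))
```

Between *η* = 1 and *η* = 3.7, the clipped form increases *μ* by a factor of roughly 20, compared with about 4.5 for the unclipped form, so it responds more strongly to the near-equatorial gradient. Between *η* = 3.7 and *η* = 8, the clipped form gives no further increase, whereas the unclipped form would raise *μ* by another factor of about 11.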

Adding additional powers and products to this set of climate variables does not substantially reduce AIC or change the fitted values. So, we take this model (line 5 of Table 1) as our TCG index and explore its properties in more detail in the next section.

## 4. Properties of the index

### a. Climatological spatial distribution and basin-integrated values

We now examine the properties of the TCG index developed in the previous section, focusing on physically relevant characteristics such as spatial distributions and basin-integrated values. Many of the important features of the index can be seen by examining its annually integrated values shown in Fig. 3. For the most part, there is reasonable agreement, both in spatial structure and in magnitude, between the observations and the indices. However, the magnitude of the NCEP TCG index is weak in the Atlantic main development region. The region of maximum observed density of TCG events in the Northern Pacific extends farther equatorward than is seen in either of the indices. Neither index matches the observed values in the Arabian Sea and the southern part of the North Indian Ocean. Values of the indices in the South Pacific are too large compared to observations while values in the eastern Pacific are too weak. Zonal sums of the observations and indices in Fig. 3d show that the NCEP TCG index, and to a lesser extent the ERA TCG index, is too weak in the Northern Hemisphere. The agreement in the Southern Hemisphere between the zonal sums of the observations and indices masks what we show later to be offsetting biases in the annual cycle. Both indices show a small but nonzero likelihood of TCG on the equator. For comparison we also show zonal sums of the GPI based on NCEP and ERA data.

We examine the seasonal cycle of the TCG index by comparing the basin-integrated climatology of the TCG index with that of the observations; basin domains are defined in Table 2. Figure 4 shows that the TCG index captures the overall TCG seasonal structure in all basins to some extent. No scaling is applied to the basin-integrated TCG index values; the differing basin sizes and differing areas of grid points at different latitudes are included as an offset in the Poisson regression as described previously. For the most part, TCG indices based on the NCEP and ERA data have similar properties. In the Southern Hemisphere, the TCG index representation of the active season is not active enough, and its representation of the inactive season is too active. Failure to capture the peak activity is seen in the South Indian and Australian Basins; peak activity is overestimated in the South Pacific basin, consistent with Fig. 3. TCG index values are too large in all of the Southern basins during the inactive austral winter period. The two peaks in the North Indian seasonal cycle are reproduced. In the central North Pacific, the TCG index overestimates the amplitude during the active season and has its maximum value about a month too late in the calendar year. The peak amplitude of the TCG index in the eastern North Pacific is too weak and phased about a month too late in the calendar year.

Figure 4 also shows the hemispheric and basin-integrated number of observed storms and the index values when the reanalysis relative humidity is used. In this case, the ERA-based index is too active during the peak seasons in the Northern Hemisphere, especially during August in the western North Pacific, central North Pacific, and Atlantic. In the Southern Hemisphere totals, the ERA relative humidity–based index is comparable to observations during the active phase, while the NCEP relative humidity–based index is too weak. This behavior can be understood from the differing seasonal cycles of NCEP and ERA relative humidity shown in Fig. 1.

The observed and modeled number of TCG events per year for the January–March (JFM) and August–October (ASO) seasons using NCEP and ERA data, respectively, are shown in Figs. 5a,b and 6a,b. Peak season spatial distributions of the TCG index are similar to observations. ERA has a more active development region in the Atlantic. NCEP has a more active western North Pacific basin. The negative bias of the TCG index in the eastern North Pacific and the positive bias in the South Pacific are apparent in both datasets. Figures 5e,f and 6e,f show the impact of using reanalysis relative humidity. When using NCEP relative humidity, the NCEP-based index is weaker in the western and eastern North Pacific regions. In contrast, the ERA-based index is stronger in most basins when ERA relative humidity is used.

Using PI in the Poisson regression rather than relative SST leads to basin-integrated index amplitudes that are too low in the Northern Hemisphere and phased too late in the Southern Hemisphere as shown in Fig. 7. This phasing problem is seen in all Southern basins. The problem of low basin-integrated amplitude is the worst in the western North Pacific peak season. Figures 5g,h and 6g,h show that the impact of using PI on the spatial pattern is mostly in its amplitude, with overall index amplitudes being too high in the Southern Hemisphere and too low in the Northern Hemisphere.

If absolute vorticity rather than clipped absolute vorticity is used in the index, the Northern Hemisphere integrated August value is too high, primarily because the index is too high in the western North Pacific and Atlantic, as shown in Fig. 8. Using absolute vorticity rather than clipped absolute vorticity reduces the overall Southern Hemisphere peak values owing to the reduction in the South Indian and Australian basins. Figures 5i,j and 6i,j show that if absolute vorticity rather than clipped absolute vorticity is used, the TCG index is too large on the equator and extends too far northward in the Atlantic and western North Pacific during ASO. Further spatial details for ASO are shown in Fig. 9. Use of the clipped absolute vorticity improves the spatial pattern in the Southern Hemisphere during JFM by shifting positive values of TCG off of the equator and narrowing the spatial distribution latitudinally.

We now compare the new TCG index with the GPI from Camargo et al. (2007a). As mentioned before, the NCEP PI and ERA PI have systematically different amplitudes and, therefore, so do the GPI values that depend on the third power of PI. We find separate multiplicative constants for the NCEP GPI and ERA GPI so that the area-weighted GPI best fits (in the sense of minimizing the sum of squared errors) the observed number of TCG events. Figure 10 shows that overall GPI is too high in the Southern Hemisphere especially during the inactive season. The GPI peak occurs about a month later in the South Indian basin. Amplitudes of the GPI in the western North Pacific and Atlantic match observations well during the active season but are too strong during the inactive months. Presumably, the imposed scaling of the GPI leads to better fitting in the Northern Hemisphere (where the total number of storms is larger) at the expense of fitting the Southern Hemisphere. Given the somewhat arbitrary scaling of the GPI, it is difficult to say whether the GPI is too small in the Northern Hemisphere or too large in the Southern Hemisphere. A more precise statement is that the difference in the number of storms in the Northern and Southern Hemispheres is too small in the GPI. Figures 5k,l and 6k,l show that GPI spatial patterns extend too far poleward, are too close to the equator (features that are consistent with Fig. 3d), and show less difference between JFM and ASO amplitudes.
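The least-squares multiplicative constant described above has a closed form. The following is a minimal sketch, assuming hypothetical lists `gpi` (area-weighted GPI values) and `obs` (observed TCG counts) on matching grid cells and months; the function name and the toy numbers are illustrative only and are not the data used here.

```python
def best_scale(gpi, obs):
    """Return the constant c that minimizes sum((c * g - o)**2)."""
    num = sum(g * o for g, o in zip(gpi, obs))
    den = sum(g * g for g in gpi)
    if den == 0:
        raise ValueError("GPI values are all zero; the scale is undefined")
    return num / den

# Toy values only (hypothetical), one entry per grid cell and month.
gpi = [0.5, 2.0, 4.0, 1.0]
obs = [0.0, 1.0, 2.0, 1.0]
c = best_scale(gpi, obs)
scaled_gpi = [c * g for g in gpi]
```

Because a single constant is fit to all basins at once, basins with more storms dominate the sum of squared errors, which is consistent with the better fit in the Northern Hemisphere noted above.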

### b. Dependence on climate variables

A method for examining the dependence of observed TCG and the TCG index on the individual climate variables is to compute “marginal” functions of a single variable. Marginal functions are constructed by averaging over all the variables except one. For instance, we define the marginal function *N*_{η}(*η*′) for absolute vorticity by

$$N_\eta(\eta') = \langle N \rangle_{\eta = \eta'},$$

where 〈·〉 denotes the average, here taken over all data for which the absolute vorticity *η* equals *η*′. Analogous marginal functions can be defined for relative humidity, relative SST, and vertical shear. The number of TCG events has a log-linear dependence on the climate variables in the Poisson regression model. However, the dependence of the marginal function on the individual variables may not be log linear because of the correlations between the climate variables. Most of the correlations and hence much of the behavior of the marginal functions can be inferred from the latitude dependence of the zonally averaged climate variables.

Figure 11a shows the dependence of *N*_{η}(*η*′) on vorticity as well as a histogram of the values of vorticity. In all the marginal function calculations, the range of the variable in question is divided into 50 equally spaced bins. For small values of absolute vorticity, the marginal function is an increasing function of absolute vorticity. For vorticity greater than about 4 × 10^{−5} s^{−1}, it is a decreasing function. The explanation for this latter behavior lies primarily in the fact that absolute vorticity increases as one moves poleward, and therefore, on average, the largest absolute vorticity values are found at high latitudes where low SST values make TC formation unlikely. The dashed line in Fig. 11a shows the behavior for a model based on absolute vorticity rather than clipped absolute vorticity. Not using clipped absolute vorticity leads to an index that does not respond strongly enough to small values of absolute vorticity near the equator and responds too strongly to large values of absolute vorticity.
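A binned marginal function of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code used for Fig. 11: `values` and `counts` are hypothetical flat lists holding one climate variable and the number of TCG events (observed or modeled) at each grid cell and month, and the choice to average within each bin follows the verbal definition above.

```python
def marginal(values, counts, n_bins=50):
    """Average counts within equally spaced bins of one climate variable.

    Returns (bin centers, bin means, samples per bin). A bin with no
    samples has mean None.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable is constant; bins are undefined")
    width = (hi - lo) / n_bins
    totals = [0.0] * n_bins
    hist = [0] * n_bins
    for v, c in zip(values, counts):
        # The maximum value is assigned to the last bin.
        k = min(int((v - lo) / width), n_bins - 1)
        totals[k] += c
        hist[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    means = [t / h if h else None for t, h in zip(totals, hist)]
    return centers, means, hist
```

Calling the function once with observed counts and once with index values gives the two curves to compare; `hist` corresponds to the histogram of the variable.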

The marginal function for relative humidity is mostly an increasing function of relative humidity (Fig. 11b). However, the number of TCG events decreases for relative humidity near 80% and there are no TCG events for higher values of relative humidity. This behavior can be understood by noting that relative humidity has its largest values on average near the equator and at high latitudes, both regions where there are few TCG events. What seems to be a nonlinear (in log) dependence is really a reflection of the correlation of relative humidity with other variables; near the equator relative humidity is on average a decreasing function of latitude while absolute vorticity is increasing. At high latitudes, relative humidity is an increasing function of latitude while relative SST is decreasing.

The marginal function for relative SST is an increasing function of relative SST except for the very highest values of relative SST where there are no TCG events (Fig. 11c). This regime corresponds to locations near the equator where absolute vorticity is small and there are few TCG events.

For very small values of vertical shear, the index marginal function for vertical shear decreases little as vertical shear increases, and the observed number of storms actually increases slightly (Fig. 11d). This behavior is explained by the fact that in the tropics zonally averaged vertical shear is an increasing function of latitude, thus reductions in vertical shear may correspond to moving closer to the equator and not conditions that are more favorable for TCG. As vertical shear increases further, the marginal function decreases. The “heavy tail” poorly described by the Poisson model may be due to subtropical storms that can form in a high shear environment (Evans and Guishard 2009; Guishard et al. 2009), presumably in many cases deriving energy from the shear via baroclinic eddy dynamics (Davis and Bosart 2003).

### c. Decomposition of the seasonal cycle

The form of the index as the exponential of a sum of factors allows easy quantification of the importance of each of the factors. Figure 12 shows the seasonal cycle of the basin-averaged individual factor anomalies; anomalies are with respect to the annual average. In an overall sense, relative SST variation has the biggest impact on the seasonal cycle followed by vertical shear and finally relative humidity; clipped absolute vorticity has little contribution to the seasonal cycle. In the eastern North Pacific, vertical shear has a larger contribution to the seasonal cycle than does relative SST. For the most part, the seasonal cycles of the three factors are in phase. However, the behavior in the North Indian Basin with its two maxima is more complex with relative SST playing a minor role compared to vertical shear (Gray 1968; Evan and Camargo 2011). There, the reduction in vertical shear results in the premonsoon maximum. The increase in relative humidity following the start of the monsoon is offset by increases in vertical shear and the number of storms decreases. The reduction in vertical shear and continuing high relative humidity results in the second maximum.
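Because the logarithm of the index is a sum of terms, the seasonal anomaly of the log index is the sum of the anomalies of the individual terms. A minimal sketch of that decomposition, assuming hypothetical dictionaries `coef` (regression coefficients) and `monthly` (12 basin-averaged monthly values per variable); the names and structure are illustrative only:

```python
def factor_anomalies(coef, monthly):
    """Seasonal anomaly of each term coef * variable in log(index).

    coef:    {name: regression coefficient}
    monthly: {name: list of 12 basin-averaged monthly values}
    Returns {name: list of 12 anomalies about the annual mean}.
    """
    out = {}
    for name, b in coef.items():
        terms = [b * x for x in monthly[name]]
        annual_mean = sum(terms) / len(terms)
        out[name] = [t - annual_mean for t in terms]
    return out
```

Summing the returned anomalies across factors for a given month recovers the anomaly of the log index for that month, so the relative size of each factor's anomaly measures its contribution to the seasonal cycle.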

### d. Interannual variability

We now examine how well the TCG index developed with climatological data can reproduce basin-averaged interannual variability. We consider the period 1982–2001, a period when TCG observations are good. Since SSM/I data are not available during this period, we use an index based on reanalysis relative humidity. The Poisson regression model is fit to climatological data (line 6 of Table 1) and applied to interannual data.

Tables 3 and 4 show the correlation between the observed and modeled basin-integrated seasonal (3-month) total number of TCG events based on NCEP and ERA data, respectively; Tables 5 and 6 show root-mean-squared (RMS) errors. Correlations were computed only for seasons whose average number of TCG is greater than one; correlation is a poor measure of association for Poisson variables with small expected values. Monte Carlo calculations show that the 95% significance level for correlation between two Poisson variables depends strongly on their mean value when the mean value is less than one. As the mean value increases, the Poisson variables become approximately Gaussian and the 95% significance level for correlation approaches that for Gaussian distributed variables, which for sample size 20 is 0.377. Here, we conservatively consider correlations greater than 0.4 to be significant. The NCEP-based index has 19 season basins with significant correlations while the ERA-based index has 15. For the most part, the TCG indices based on the two datasets show similar correlation levels and seasonality. Neither the NCEP nor the ERA index shows any significant interannual correlation in the North Indian basin. However, NCEP has some significant correlations in the Australian region during austral summer while ERA does not. Correlation levels are roughly comparable to those found in Camargo et al. (2007a) using GPI and different observed genesis data. The RMS errors of the ERA-based index are considerably larger than those of the NCEP-based index, especially in the WNP during boreal summer. The large errors in the ERA-based index can be attributed to the behavior of the ERA 600-hPa relative humidity; Daoud et al. (2009) found large deviations in 850-hPa relative humidity data from NCEP and ERA over the North Atlantic Ocean. Figure 13 shows the NCEP and ERA July–September (JAS) 600-hPa relative humidity averaged over the box 120°E–180°, 0°–30°N. Prior to 1972 the two analyses agree. After 1972, the ERA relative humidity exhibits greater variability and exceeds its 1961–2000 climatological value. Consequently, during the period 1982–2001, the ERA-based index presents values considerably larger than its 1961–2000 climatological value.
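A Monte Carlo estimate of the significance level for the correlation of two Poisson variables can be sketched as below. This is an illustration, not the calculation used here: it assumes a one-sided test with two independent series of equal mean, the function names are hypothetical, and assigning zero correlation to a constant series is a choice made only for this sketch.

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def correlation(x, y):
    """Pearson correlation; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def threshold_95(mean, n_years=20, n_trials=5000, seed=0):
    """95th percentile of correlation between independent Poisson series."""
    rng = random.Random(seed)
    r = []
    for _ in range(n_trials):
        x = [poisson(rng, mean) for _ in range(n_years)]
        y = [poisson(rng, mean) for _ in range(n_years)]
        r.append(correlation(x, y))
    r.sort()
    return r[int(0.95 * n_trials)]
```

Evaluating `threshold_95` for a range of means (for example 0.2, 1, and 10) shows how the threshold changes as the Poisson variables move from the small-count regime toward the approximately Gaussian one.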

Much of the interannual variability is related to El Niño–Southern Oscillation (ENSO). The common period for the NCEP reanalysis and ERA is 1958–2002. For the purpose of compositing, the 11 years with the highest values of the three-month Niño-3.4 index were classified as El Niño years and the 11 years with the lowest values as La Niña years. Table 7 shows the years selected for the composites. Figure 14 shows the NCEP- and ERA-based El Niño–La Niña composite difference maps for JFM and ASO. In JFM, the El Niño–La Niña composite shows a decrease in the western North Pacific and an increase in the central North Pacific, an equatorward shift in the South Pacific, and a decrease in the Australian Basin. In ASO, the El Niño–La Niña composite shows a reduction in the North Indian, western North Pacific, and Atlantic basins, and increases in the central North Pacific and eastern Pacific basins. As shown in Camargo et al. (2007a) for GPI, these shifts in the TCG ENSO composites reflect well the observed TCG behavior in the various basins for JFM and ASO.
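The compositing procedure can be sketched as follows, assuming a hypothetical dictionary `nino34` mapping each year to its three-month Niño-3.4 value and a dictionary `index_by_year` holding the seasonal TCG index at one grid cell; both names and the function signatures are illustrative only.

```python
def enso_years(nino34, n=11):
    """Return (El Niño years, La Niña years) ranked by Niño-3.4 value."""
    ranked = sorted(nino34, key=nino34.get)  # ascending by index value
    return sorted(ranked[-n:]), sorted(ranked[:n])

def composite_difference(index_by_year, warm, cold):
    """Mean over El Niño years minus mean over La Niña years."""
    def mean(years):
        return sum(index_by_year[y] for y in years) / len(years)
    return mean(warm) - mean(cold)
```

Applying `composite_difference` at every grid cell, separately for JFM and ASO, would produce difference maps of the kind described above.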

## 5. Discussion: The role of vorticity

The superior performance of the clipped vorticity relative to vorticity itself is an unexpected result, which may have some significance for our understanding of the physics of the genesis process. It is well known that a finite background low-level absolute vorticity is necessary to the genesis process, as is immediately evident from the fact that genesis almost never occurs within a few degrees of the equator. The precise dependence of the probability of genesis on the value of the background vorticity is less clear. It is plausible, but by no means obvious, that once the vorticity reaches some sufficient value, it no longer is a rate-limiting factor and other aspects of the environment (such as thermodynamic parameters and vertical shear) become more critical. Our results suggest that this is in fact the case.

This conclusion is most likely dependent on the choice of averaging time used to define the environmental fields. On daily time scales, it is certain that the existence of a preexisting tropical depression makes genesis more likely compared to the absence of a depression, and it seems likely that the vorticity of the depression, which presumably in many cases exceeds 4 × 10^{−5} s^{−1}, is one of the factors that makes it so. Nonetheless, inasmuch as it is useful to quantify the probability of genesis based on monthly mean fields, our finding indicates that the probability of genesis does not increase further with low-level absolute vorticity once that variable reaches a threshold value. If one were to use the index derived here—or any similar one derived from monthly or climatological data—for prediction on shorter time scales, it would be important to reconsider this issue and perhaps to modify the derivation of the index to account for the time scale dependence of the vorticity influence.

## 6. Summary and conclusions

The likelihood of tropical cyclone genesis (TCG) being observed depends on features of the large-scale climate. Therefore, changes in climate because of either natural or anthropogenic causes can lead to changes in the likelihood of TCG. Given the incompleteness of the theoretical understanding of TCG, empirical indices are a useful way of encapsulating observed relations between TCG and large-scale climate variables. Here, in the spirit of earlier work (Gray 1979; DeMaria et al. 2001; Royer et al. 1998; Emanuel and Nolan 2004), we construct a TCG index that is a function of climate variables and whose size reflects the probability of genesis.

We construct the index by developing a Poisson regression between the observed monthly number of storms over a 40-yr period and the monthly climatological values of the climate variables. An attractive feature of this approach is that it is objective and hence easily applicable to other datasets. Moreover, the regression methodology provides a natural framework for selecting the variables to be used in the index and assessing the performance of the index. Initially, we take as predictors in the index absolute vorticity at 850 hPa, 600-hPa reanalysis relative humidity, relative SST, and vertical shear between the 850- and 200-hPa levels; relative SST is the difference between the local SST and the mean tropical SST.
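To make the regression step concrete, the sketch below fits a Poisson model with a single predictor and an offset by Newton's method. It is a simplified illustration, not the fitting code used for the index: the single predictor, the function name, and the use of the logarithm of the cosine of latitude as the area offset are assumptions made for this example.

```python
import math

def fit_poisson(x, y, offset, n_iter=50):
    """Fit log(mu) = b0 + b1 * x + offset by Newton's method."""
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi + oi) for xi, oi in zip(x, offset)]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2 x 2 information matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data (hypothetical): counts that follow the model exactly.
lat = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
offset = [math.log(math.cos(math.radians(p))) for p in lat]
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
y = [math.exp(-1.0 + 0.8 * xi + oi) for xi, oi in zip(x, offset)]
b0, b1 = fit_poisson(x, y, offset)
```

The offset enters with a fixed coefficient of one, so the fitted coefficients describe the expected number of events per unit area rather than per grid cell.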

The Poisson regression assumes a log-linear relation between the number of TCG events and the climate variables. This assumption is equivalent to assuming that the sensitivity, as measured by the logarithmic partial derivative, of the number of storms to changes in the individual climate variable is constant. We find that the data do not support this assumption for absolute vorticity. In particular, the sensitivity of the number of TCG events to absolute vorticity is roughly constant and nonzero for values of absolute vorticity less than 4 × 10^{−5} s^{−1}, while for values of absolute vorticity greater than 4 × 10^{−5} s^{−1}, it is close to 0. This property of the data suggests the use of the “clipped” absolute vorticity, defined as the absolute vorticity itself when that quantity is below a threshold and the threshold value otherwise. We find that use of the clipped absolute vorticity in the TCG index improves the fit of the regression and results in more realistic spatial distributions with fewer TCG events near the equator and at high latitudes. Besides the practical value of this result for improving the performance of the index, it suggests a physical interpretation that is relevant to our understanding of the genesis process: while greater low-level ambient (monthly mean) vorticity increases the probability of genesis up to a point, beyond that point it does not continue to do so. While it is likely that increases in local vorticity on the daily time scale would still be a positive factor in genesis, our result may indicate that on a monthly mean basis, once vorticity is sufficiently large, it tends to be the case that other factors (either thermodynamics or vertical shear) become rate limiting.
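The effect of clipping on the sensitivity can be illustrated numerically. In the sketch below, vorticity is expressed in units of 10^{−5} s^{−1} and the threshold of 4 is taken from the text; the coefficient `b_eta`, the `other_terms` placeholder, and the function names are hypothetical and stand in for the fitted regression.

```python
import math

THRESHOLD = 4.0  # absolute vorticity threshold, in units of 1e-5 s^-1

def clipped(eta, threshold=THRESHOLD):
    """Absolute vorticity below the threshold, the threshold otherwise."""
    return min(eta, threshold)

def tcg_index(eta, other_terms=0.0, b_eta=1.0):
    """Log-linear index in which vorticity enters only through clipping."""
    return math.exp(b_eta * clipped(eta) + other_terms)

def sensitivity(eta, b_eta=1.0, d=1e-6):
    """Centered-difference logarithmic derivative with respect to eta."""
    upper = math.log(tcg_index(eta + d, 0.0, b_eta))
    lower = math.log(tcg_index(eta - d, 0.0, b_eta))
    return (upper - lower) / (2 * d)
```

Below the threshold the sensitivity equals the coefficient `b_eta`; above it the sensitivity is zero, which is the saturation behavior described above.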

We make a limited exploration of some alternative predictors in the index. We examine the impact of using SST or PI rather than relative SST and of using satellite-based column-integrated relative humidity rather than the reanalysis products. Using climatological data, relative SST and SST contain nearly the same information, and indices based on them are very similar. However, relative SST better accommodates changes in the mean climate, though perhaps still not optimally. PI also depends on the mean climate and has been observed to be well correlated with relative SST (Vecchi and Soden 2007a; Swanson 2008), a result with a straightforward physical basis (Ramsay and Sobel 2011). However, relative SST is considerably simpler to compute and understand than PI, and in fact, we find that relative SST performs better in the index than PI. The relative advantage of using satellite-based column-integrated relative humidity is mixed compared to using reanalysis humidity at a single level, with a modest positive (negative) impact seen with respect to the NCEP (ERA) reanalysis product.

Overall, the TCG index reproduces much of the observed basin-integrated seasonality and spatial patterns. The index also reproduces well the observed marginal dependence of the number of TCG events on the individual climate variables. This dependence is not log linear owing to the correlations between variables. For the most part, the NCEP- and ERA-based indices have similar properties and common deficiencies. There are errors in the details of the spatial structure in the North Pacific. Index values are too small in the Arabian Sea. The observed number of TCG events in the South Pacific is smaller than that predicted by the index. The magnitude of the NCEP-based TCG index is weak in the Atlantic main development region. Comparison with the GPI, a TCG index based on PI (Emanuel and Nolan 2004; Camargo et al. 2007a), shows that the index developed here has better performance and avoids the complexity associated with PI.

Developing the TCG index using climatological data results in an index that fits the climate–TCG covariability contained in the seasonal cycle and in different geographical regions. Applying the index to interannually varying climate data, we show that the index is also able to reproduce some interannual variability and the spatial shifts associated with ENSO. Future work will apply the index to simulated climate change scenarios.

## Acknowledgments

We thank the anonymous reviewers for their useful comments and suggestions. We thank Larissa Back and Chris Bretherton for making available the scripts for calculation of column-relative humidity. We especially thank Larissa Back for her help and explanations. We also thank Kerry Emanuel and Gabriel Vecchi for discussions. The ECMWF ERA-40 data used in this study were obtained from the ECMWF data server. SSM/I data are produced by Remote Sensing Systems and sponsored by the NASA Earth Science MEaSUREs DISCOVER Project. Data are available online at www.remss.com. MKT is supported by a grant/cooperative agreement from the National Oceanic and Atmospheric Administration (NA05OAR4311004). SJC and AHS acknowledge support from NOAA Grant NA08OAR4320912. The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

## REFERENCES

## Footnotes

^{1} We note that ERA PI is systematically larger than NCEP PI by about 38%; to fit NCEP PI and ERA PI simultaneously in the Poisson regression, we scale ERA PI by the factor 0.72.