## Abstract

An autoregressive model is developed to simulate the climatological distribution of global tropical cyclone (TC) intensity. The model consists of two components: a regression-based deterministic component, which advances the TC intensity in time and depends on the storm state and the surrounding large-scale environment, and a stochastic forcing component. Potential intensity, deep-layer mean vertical shear, and midlevel relative humidity are the environmental variables included in the deterministic component. Given a storm track and its environment, the model is initialized and then iterated along the track. Model performance is evaluated by its ability to represent the observed global and basin distributions of TC intensity as well as lifetime maximum intensity (LMI). The deterministic model alone captures the spatial features of the climatological TC intensity distribution but with intensities that remain below 100 kt (1 kt ≈ 0.51 m s^{−1}). Addition of white (uncorrelated in time) stochastic forcing reduces this bias by improving the simulated intensification rates and the frequency of major storms. The model simulates a realistic range of intensities, but the frequency of major storms remains too low in some basins.

## 1. Introduction

For assessing tropical cyclone (TC) risk, it is necessary to estimate the risk of the most severe storm with a nonnegligible probability of making landfall at a given location. Even a very small probability may be nonnegligible if the impact of the event would be sufficiently high. In most, if not all, locations, the historical record alone is inadequate to characterize these low-probability, high-impact events. This is even more true in a changing climate, when the historical record may not be representative of future conditions.

Global climate models (GCMs) can simulate many features of the TC climatology. Recent improvements in both computational power and understanding of atmospheric dynamics and physics have led to the development of climate models with impressive ability to predict the interannual variability of TC activity (e.g., Vitart et al. 2007; Zhao et al. 2009, 2010; Chen and Lin 2013; Camargo and Wing 2016). Nevertheless, few climate models are capable of simulating the most intense storms, and the simulated distribution of lifetime maximum intensity (LMI) in these models is often deficient (Zhao et al. 2009; Manganello et al. 2012, 2014; Wu et al. 2014; Oouchi et al. 2006). The LMI distribution is a basic measure for understanding and describing the TC climatology in both current and future climates (e.g., Emanuel 2000; Kossin et al. 2014; Zhao et al. 2009; Manganello et al. 2014). Unlike other natural hazards whose rarity increases with intensity—for example, earthquakes (Hristopulos and Mouslopoulou 2013) and tornadoes (Feuerstein et al. 2005)—the LMI distribution is bimodal with peaks located at about 50 kt (1 kt ≈ 0.51 m s^{−1}) and at 110–130 kt (Kossin et al. 2013; Lee et al. 2016). In observations, the storms that undergo rapid intensification (RI) at some point during their lifetime are responsible for the second peak of the LMI distribution (Lee et al. 2016). Therefore, one explanation for the failure of most climate models to simulate the bimodal LMI distribution is their failure to simulate RI, the dramatic strengthening of a TC in a short time. High spatial resolution appears to improve the simulation of intensification rate and the LMI distribution (Manganello et al. 2012; Murakami et al. 2012). Using a coupled GCM with 25-km horizontal grid spacing, Murakami et al. (2015) successfully modeled hurricane intensities in categories 4 and 5 on the Saffir–Simpson scale with frequencies comparable to observations but with a unimodal LMI distribution.

However, apart from any limitations in their fidelity, high-resolution GCM simulations are computationally expensive, and that expense may limit their use in sampling the full range of low-probability, high-impact events. Alternative approaches to estimating TC risk, using statistical or simplified dynamical models, are desirable as a complement to GCM simulations.

Statistical models based on empirical equations are commonly used for prediction of TC activity on seasonal and longer time scales. Instead of simulating storms directly, these models usually predict their frequency or intensity-associated metrics, such as accumulated cyclone energy (ACE) or power dissipation index (PDI), from climate indices and/or large-scale forcing (e.g., Gray 1984; Klotzbach and Gray 2009; Klotzbach and Oliver 2015; Wang et al. 2009; Davis et al. 2015). However, these approaches do not directly calculate the probabilities of the rarest, most extreme landfalling events.

One set of approaches for estimating the risk of low-probability, high-impact events involves generating large numbers of synthetic storms using methods whose computational cost is much lower than that of high-resolution GCMs. Such approaches simulate TC genesis, track, intensity, and size using varying combinations of statistical and dynamical methods. Some models simulate the storm track and intensity using stochastic models trained on the historical hurricane data as a function of location, storm characteristics, and/or environmental parameters. These include the models developed by Hall and Jewson (2007), Hall and Yonekura (2013), Nakamura et al. (2015), and AIR Worldwide Corporation (AIR 2015). These models’ use of the historical record or their choices of environmental parameters may restrict their use to the current climate. The use of fixed sea surface temperature thresholds (e.g., Hall and Jewson 2007; Hall and Yonekura 2013), for example, is likely inappropriate under climate change (e.g., Johnson and Xie 2010; Yoshimura et al. 2006; Knutson et al. 2008). The approach of Emanuel (2006) incorporates more physics, simulating TC intensity with an axisymmetric dynamical model. This dynamical model is coupled to a statistical–dynamical track model and embedded in an explicit three-dimensional representation of the large-scale climate that comes from either a reanalysis dataset or a (possibly low resolution) climate model. The more complete representation of the dependence of storm intensity on the environment makes this approach well suited to the task of quantifying the influence of anthropogenic climate change (e.g., Emanuel 2013).

We are working to develop a new statistical–dynamical downscaling system forced by large-scale environmental fields. Our intent is similar to that of Emanuel (2006) and Emanuel et al. (2006), whose work inspires this research. An essential element is the representation of TC intensity. Our approach here is based on the observed relation between TC intensification and the large-scale environment and thus uses less physics than that of Emanuel (2006) and Emanuel et al. (2006) but more than those of Hall and Yonekura (2013), Nakamura et al. (2015), and AIR (2015). We include an explicit dependence on the local atmospheric environment at each point in the storm’s evolution, requiring an explicit set of three-dimensional fields to represent that environment, as in Emanuel’s approach.

The empirical TC intensity model of Lee et al. (2015) was a first step toward such an intensity model. We pointed out that a multiple linear regression model widely used operationally [e.g., Statistical Hurricane Intensity Prediction Scheme (SHIPS); DeMaria et al. 2005] is essentially a form of statistical–dynamical downscaling that predicts intensity change based on storm characteristics and large-scale environment. With a similar configuration but differences in detail, a short-lead TC intensity model was built in that study. In the present study, we add a stochastic component to the model of Lee et al. (2015) and use it to simulate TC intensity through the complete storm life cycle as a function of ambient forcing with given observed storm tracks. The model is autoregressive, in that the linear stochastic model is initialized at the beginning of the storm and iterated over the TC’s lifetime with subsequent intensity values depending on earlier ones as well as the environment. The stochastic component draws from the empirical errors of the deterministic model and makes it possible for the complete model to simulate storms with realistic intensities.

After developing the stochastic model, we model the TC intensity climatology in the current climate using both the tracks and the environment taken from observation-based datasets. We focus on the global and basin LMI distributions, as well as on the complete distribution of intensity (not just the lifetime maxima).

This study is organized as follows. Section 2 describes the datasets, the environmental variables considered, and the definition of LMI. In section 3, a stochastic system is developed by extending the multiple linear regression intensity model used in Lee et al. (2015), and its performance in simulating the LMI distribution is discussed. The overall model performance in simulating the TC intensity distribution is shown in section 4. Results and findings are summarized in section 5.

## 2. Data

The best-track dataset HURDAT2, produced by the National Hurricane Center (NHC), is used for North Atlantic (ATL) and eastern North Pacific (EPC) hurricanes (Landsea and Franklin 2013; NHC 2013). Data produced by the Joint Typhoon Warning Center (JTWC) are used for storms in the western North Pacific (WPC), Indian Ocean (IO), and Southern Hemisphere Ocean (SH) (Chu et al. 2002; JTWC 2014). The best-track data include 1-min maximum sustained wind, minimum sea level pressure, and storm location every 6 h. Storm LMI is defined as the maximum sustained wind speed during the storm’s life cycle. We use all the recorded data, including storms with tropical depression (TD) strength.

Large-scale environmental variables examined here are calculated from the 2.5° × 2.5° monthly mean European Centre for Medium-Range Weather Forecasts interim reanalysis (ERA-Interim; Dee et al. 2011; ECMWF 2013).^{1} Monthly data are used instead of higher-frequency (e.g., daily) data because we have previously shown that monthly data are adequate for a statistical TC intensity model (Lee et al. 2015). We consider environmental parameters representing the critical conditions for TC intensification: potential intensity (PI; Bister and Emanuel 2002; Camargo et al. 2007), 800–200-hPa deep-layer mean vertical wind shear (SHRD; Chen et al. 2006), 500–300-hPa midlevel and boundary layer mean relative humidity (midRH and RHbl), total column water vapor (rhCol; Tippett et al. 2011), convective available potential energy (CAPE; Bechtold et al. 2014), and 200-hPa zonal wind (U200) and divergence (div200). Additionally, upper-ocean structures from the approximately 1.8° × 1.8° NOAA/NCEP Environmental Modeling Center (EMC) Climate Modeling Branch (CMB) Global Ocean Data Assimilation System (GODAS; Behringer and Xue 2004; IRI Data Library 2013) are used to calculate the upper-100-m mean ocean temperature (T100; Price 2009).

The monthly reanalysis fields are first linearly interpolated to the forecast day and storm location. The interpolated fields are averaged over a disk extending 500 km from the storm center for PI, over a disk with a radius of 1000 km for div200, and over an annulus extending 200–800 km around the storm center for the remaining predictors. Finally, the predictors are averaged over the forecast interval (e.g., 12 h). Section 3 describes the procedure for selecting which environmental variables are used as predictors.
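The area-averaging step can be illustrated with a small sketch. The function names (`great_circle_km`, `annulus_mean`), the unweighted mean over grid points, and the spherical-Earth radius are assumptions made for illustration only; the text does not specify how grid points are weighted.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance (km) between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def annulus_mean(field, lats, lons, clat, clon, r_in_km, r_out_km):
    """Unweighted mean of field[i][j] over grid points whose distance from
    (clat, clon) lies in [r_in_km, r_out_km]; None if no point qualifies."""
    vals = [field[i][j]
            for i, lat in enumerate(lats)
            for j, lon in enumerate(lons)
            if r_in_km <= great_circle_km(clat, clon, lat, lon) <= r_out_km]
    return sum(vals) / len(vals) if vals else None


# Example on a coarse 2.5-degree grid with a constant field
lats = [-5.0, -2.5, 0.0, 2.5, 5.0]
lons = [120.0, 122.5, 125.0, 127.5, 130.0]
field = [[7.0] * len(lons) for _ in lats]
shear = annulus_mean(field, lats, lons, 0.0, 125.0, 200.0, 800.0)  # 200-800-km annulus
pi_avg = annulus_mean(field, lats, lons, 0.0, 125.0, 0.0, 500.0)   # 500-km disk
```

A disk is simply the special case with an inner radius of zero, so the same helper covers the PI and div200 averages.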

Data from 1981 to 1999 are used for the model development, and data from the period 2000–12 are used for evaluating model performance. All the climatological distributions shown here (i.e., all figures) use TC data from 2000 to 2012. Throughout the study, we use the Saffir–Simpson scale to categorize storm strength. Major TCs are defined as category-3–5 storms (LMI > 96 kt), and intense storms are category-4 or category-5 storms (LMI > 113 kt).

## 3. The global stochastic intensity model and LMI distribution

The probabilistic multiple linear regression model (MLR) in Lee et al. (2015) was developed to predict ATL TC intensity at 12-h intervals up to five days in advance. Here we apply the same MLR methodology to TC intensity in all basins. The statistics of storm characteristics and the surrounding environments vary substantially among different basins. For example, storms in the IO tend to be weaker than those in the other four basins owing to their shorter lifetimes (e.g., Lee et al. 2016); PI is usually higher in the WPC than in the ATL because of the warmer sea surface temperature (Emanuel 1986). Despite these differences, we expect the physical laws controlling TC intensification to be the same in all basins. To the extent that those laws can be captured in the statistical relationships between storm behavior and the surrounding environment, we can expect a single MLR to perform well in all basins.

On the other hand, the MLR is at best a crude approximation of the large-scale physics, and its deficiencies likely vary by basin. Allowing the MLR to vary by basin would address such inadequacies to the degree that they can be parameterized by the large-scale environment. Operationally, statistical TC intensity models use different predictors and coefficients in different basins (DeMaria et al. 2005; Knaff et al. 2005; Knaff and Sampson 2009). However, as we will show below, there is no notable disadvantage in using a single global MLR instead of using a set of basin-dependent MLRs, at least with our approach and purpose. Furthermore, for climate downscaling (the ultimate goal of our project), a single global model is the natural and probably better choice because as climate changes, basin differences may change as well. Moreover, fitting basin-dependent MLRs introduces more sampling variability. Therefore, in this study we focus on a global model, and results from basin-dependent models are only mentioned for comparison purposes.

The formal structure of the model is as follows:

where *V* is the TC intensity, *X* represents the environmental variables related to TC intensification, *L* is a linear regression function (deterministic), and *ε* is the stochastic forcing. The subscripts indicate time. For example, *V*_{t} means TC intensity at time *t* while *V*_{t−12h} means TC intensity 12 h before time *t*. This model is essentially a second-order vector autoregressive model with time step equal to 12 h and environmental variables as exogenous inputs. The choice of order 2 follows from allowing the TC intensity at the next time step to depend on both the current intensity and rate of change of the intensity. We refer to the first term on the right-hand side of Eq. (1) as the deterministic part of the model and the second as the stochastic. The deterministic part is constructed using the MLR methodology. For each storm, Eq. (1) is initialized with the first observed intensity and then iterated with stochastic forcing and the observed environment varying along the track.
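For reference, a form of Eq. (1) consistent with the symbol definitions above can be written as follows. This is a sketch: the exact argument list of *L* is an assumption, in particular the time index attached to *X*, since the environmental predictors are averaged over the forecast interval.

```latex
V_{t+12\mathrm{h}} = L\left(V_{t},\, V_{t-12\mathrm{h}},\, X_{t}\right) + \varepsilon_{t+12\mathrm{h}} \qquad (1)
```

Written this way, the dependence on both *V*_{t} and *V*_{t−12h} is what makes the model second order: the pair carries the same information as the current intensity and its change over the previous 12 h.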

In what follows, we describe the development and behavior of the components on the right-hand side of Eq. (1). Definitions of various combinations of models and their acronyms are listed in Table 1.

### a. Short-lead intensity model

In this section, we describe the environmental variables selected to be included in the model and their performance for forecasts with lead times up to 5 days. We begin with the initial pool of predictors representing large-scale forcing listed in section 2, along with storm intensity at the current time and its change over the previous 12 h. Predictors are chosen using forward selection for each basin and lead time (not shown). While the precise predictors chosen can vary by basin and lead time, five predictors are important in all basins and at all leads: initial storm intensity *V*_{0} [i.e., *V*_{t} in Eq. (1)], previous intensification rate dVdt, translation speed trSpeed, difference of PI and initial storm intensity dPI_V_{0}, and SHRD. This agrees with Lee et al. (2015), who found robust dependence of ATL TC intensification on these predictors, regardless of reanalysis dataset. Therefore, these five predictors are initially selected.

Humidity parameters are not selected initially since they are not useful predictors in all basins at all lead times. Nevertheless, as moisture has been found to be an important factor in controlling the level of TC activity in response to climate change (Emanuel 2008; Camargo et al. 2014), we do include midRH, one of the two moisture variables considered. Large-scale subsurface oceanic conditions impact TC intensification (Schade and Emanuel 1999; Wu et al. 2007) and structure (Lee and Chen 2014) through storm self-induced sea surface temperature cooling. Projected changes in subsurface stratification can affect TC activity in a changing climate (Huang et al. 2015). However, the oceanic parameter T100 used here did not significantly improve the performance at any lead time and therefore is not included in the deterministic model.

Intensity changes at landfall are considered separately. Operationally, an empirical exponential decay model is used to predict intensity change during landfall (Kaplan and DeMaria 1995). Here we use separate multiple linear regression models for ocean and near-land points. We define a predictor dLandMask representing the difference in the surface type between initial and forecast time. The surface type LandMask is defined by averaging a 0.5° × 0.5° resolution land–sea mask (the value of land points is 0 while ocean points are −1) in a 300-km radius of the storm location. The 300-km radius is a proxy for the size of the strong storm circulation. Data points with LandMask values greater than −0.5 have more than 50% of the storm circulation over land and are not used. LandMask values between −1 and −0.5 are labeled as near land while those with value of −1 are labeled as ocean.^{2} Two models are then developed: one with six predictors (*V*_{0}, dVdt, trSpeed, dPI_V_{0}, SHRD, and midRH) for ocean points and a second one with dLandMask as an additional predictor for cases with either initial or forecast location in the near-land group. This approach allows different coefficients for the near-land and ocean points (e.g., coefficients for the 12-h forecast are shown in Table 2).
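The surface-type rules above can be summarized in a short sketch. The function names and the sign convention for dLandMask (forecast minus initial) are assumptions for illustration; the thresholds follow the text, with a value of exactly −0.5 treated as near land because only values greater than −0.5 are excluded.

```python
def classify_surface(land_mask):
    """Classify a 300-km-averaged land-sea mask value (land = 0, ocean = -1)."""
    if not -1.0 <= land_mask <= 0.0:
        raise ValueError("LandMask must lie in [-1, 0]")
    if land_mask > -0.5:
        return "land"       # more than 50% of the circulation over land; not used
    if land_mask > -1.0:
        return "near_land"
    return "ocean"


def d_land_mask(mask_initial, mask_forecast):
    """dLandMask predictor; the sign convention here is an assumption."""
    return mask_forecast - mask_initial


def select_model(mask_initial, mask_forecast):
    """Pick which regression applies to a (initial, forecast) pair of points."""
    kinds = {classify_surface(mask_initial), classify_surface(mask_forecast)}
    if "land" in kinds:
        return "excluded"   # data point is not used
    if "near_land" in kinds:
        return "near_land"  # model with dLandMask as an additional predictor
    return "ocean"          # six-predictor ocean model
```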

We first develop a global model for TC intensity without stochastic forcing (called MLR). The MLR is tested using independent data from the period 2000–12, and its overall performance is measured using mean absolute error (MAE) and root-mean-square error (RMSE), calculated against best-track data in each TC basin. MAE and RMSE results are similar to each other and therefore we show only MAE (Fig. 1). The skill of the MLR is compared to that of the persistence model, which predicts no change in the storm intensity. The MAE of the persistence model, in addition to providing a baseline error level, describes the average magnitude of TC intensity change in each basin as a function of time window. Changes in TC intensity are the largest in the WPC, followed by the SH. The frequent and dramatic strengthening of TCs in the WPC and the SH may be related to the inner-core convection, which often is poorly handled by linear statistical intensity prediction models (Knaff et al. 2005). In the IO, the error in the persistence model drops after 84-h lead time, indicating smaller intensity change in long leads. A possible explanation is the shorter lifetime for IO storms. The MLR MAE in each basin is in general less than that of the persistence model, and in that sense the MLR is skillful in all basins. While the MAE of the MLR is the largest in the WPC and SH at almost all lead times (except for 12 h), the MLR provides the greatest advantage over the persistence model and has the lowest normalized error in these basins as well (blue and yellow lines in Fig. 1c). On the other hand, the MLR for the ATL is more accurate but provides less advantage over the persistence model than in other basins. The 12-h MAE in the ATL is even slightly larger than that of the persistence model. The advantage of the MLR increases with increasing lead time in all basins, except the IO. The MLR, trained with global data, does not take into account the shorter lifetime of IO storms.
The dependent MAE (not shown) is close to, and only slightly smaller than, the MAE in Fig. 1b, which gives no indication of overfitting.
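As a minimal sketch of the comparison described above, the snippet below computes the MAE of a model and of the persistence baseline, then their ratio. Treating the normalized error as the ratio of model MAE to persistence MAE is an assumption about how Fig. 1c is constructed, and the sample numbers are made up.

```python
def mean_absolute_error(predicted, observed):
    """Mean absolute error between two equal-length sequences (kt)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def persistence_forecast(initial_intensities):
    """Persistence predicts no change: the forecast equals the initial intensity."""
    return list(initial_intensities)


def normalized_error(model_mae, persistence_mae):
    """Ratio below 1 means the model beats persistence (assumed definition)."""
    return model_mae / persistence_mae


# Made-up values for three forecast cases (kt)
v_initial = [30.0, 40.0, 50.0]
v_observed = [40.0, 45.0, 50.0]
v_model = [36.0, 44.0, 52.0]

mae_persistence = mean_absolute_error(persistence_forecast(v_initial), v_observed)
mae_model = mean_absolute_error(v_model, v_observed)
ratio = normalized_error(mae_model, mae_persistence)
```

Because the persistence forecast is the initial intensity itself, its MAE is also the mean magnitude of the observed intensity change over the forecast window.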

We also developed a set of basin-dependent models (basin-MLR), which have the potential to perform better than the (global) MLR because they allow basin-dependent relations with the large-scale environment. We use the same set of predictors in all basins to maintain model consistency but allow their coefficients to vary by basin. Similar to the global case, the basin-MLR models have smaller MAEs than those of the persistence model (not shown). For WPC and SH storms, the MLR and basin-MLR have similar MAEs. The MLR performs slightly better than the basin-MLR for IO storms for 12–36-h lead forecasts. For long leads, the basin model in the IO has an advantage, presumably because it recognizes the short storm lifetime. Other uncertainties of the basin-MLR for IO are the poor quality of best-track data (especially before 1990; Chu et al. 2002) and the small sample size. The use of global training data in the MLR is expected to help improve both. For ATL storms, the difference between global and basin MAEs is small, approximately 5% at all leads. For EPC storms, the global MLR MAE is about 10% larger than that from basin-MLR for 120-h lead time, which is an error of only 2 kt. We can conclude that the global MLR performs similarly to the basin-MLR in most basins.

We compare the MAE of the MLR and basin-MLR with that of the persistence model only. Direct comparison with operational tools for predicting TC intensity change (e.g., SHIPS) is not entirely appropriate because the MLR uses monthly averaged reanalysis environment information and best-track storm information, while operational tools must use forecast fields and track. Moreover, the purpose of the MLR here is to provide the deterministic part of Eq. (1) rather than to develop a new forecasting tool. In any case, the error level of the MLR model in the ATL is roughly comparable to that of SHIPS (Lee et al. 2015).

### b. Deterministic model—12-h MLR

The deterministic model in Eq. (1) uses a 12-h time step. The predictor coefficients are given in Table 2. As an example of the model’s behavior, the blue line in Fig. 2a shows the predicted lifetime intensity of Typhoon Fanapi. The model in Eq. (1) is initialized at 0600 UTC 13 September 2010 with an initial intensity of 15 kt and dVdt = 0 and is subsequently iterated with the stochastic term *ε*_{t+12h} set to zero. The modeled intensity is close to observations up to 16 September but is lower than the observed intensity after that. Repeating this calculation for all observed TCs during the period 2000–12 shows that the intensification rates of the deterministic model (blue bars in Fig. 3) are never greater than 10 kt (12 h)^{−1} or less than −15 kt (12 h)^{−1}. The probability density function (PDF) of MLR intensification rate has a peak at 5–10 kt (12 h)^{−1} while the observed one is at 0–5 kt (12 h)^{−1}, though with a tail at high intensification rates, which the deterministic model lacks entirely. The deterministic model does not produce storms with LMI larger than 100 kt, and its LMI PDF has a strong peak at about 50 kt (blue dashed line in Fig. 4).
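The iteration along a track can be sketched as below. Several things here are assumptions: the deterministic part is written as *V*_{t} plus a predicted 12-h change (for a linear function this is only a regrouping of terms), intensity is floored at 0 kt, and `toy_change` uses made-up coefficients that are not the Table 2 values.

```python
def iterate_along_track(v0, env_steps, change_fn, noise_fn=None, dvdt0=0.0):
    """Return the intensity series [V_0, V_12h, V_24h, ...] in kt.

    change_fn(v, dvdt, env) -> deterministic 12-h intensity change (kt)
    noise_fn(v)             -> stochastic 12-h increment (kt); None means epsilon = 0
    """
    series = [v0]
    v, dvdt = v0, dvdt0
    for env in env_steps:
        dv = change_fn(v, dvdt, env)
        if noise_fn is not None:
            dv += noise_fn(v)
        v_next = max(v + dv, 0.0)  # assumption: intensity cannot go below 0 kt
        series.append(v_next)
        dvdt = v_next - v          # change over the step just taken, kt (12 h)^-1
        v = v_next
    return series


def toy_change(v, dvdt, env):
    """Hypothetical stand-in for the regression; coefficients are illustrative only."""
    return 0.1 * (env["PI"] - v) - 0.3 * env["SHRD"] + 0.2 * dvdt


# Two 12-h steps in a constant, made-up environment, started at 15 kt with dVdt = 0
environment = [{"PI": 100.0, "SHRD": 10.0}] * 2
deterministic_series = iterate_along_track(15.0, environment, toy_change)
```

Passing a `noise_fn` turns the same loop into one realization of the stochastic model, so a deterministic run is simply the case with the stochastic term set to zero.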

### c. Stochastic model—uncorrelated empirical error

The deterministic model is designed to minimize the squared error of 12-h forecasts by explaining as much variability as can be linearly related to the initial storm information and surrounding environment. The variability unexplained by the MLR may be due to other processes (e.g., nonlinear, mesoscale, or smaller-scale processes) and is accounted for in principle by the stochastic term in Eq. (1). Rather than characterizing the stochastic forcing term by a parametric distribution fitted to the residuals, we draw from the training period errors conditional on the initial intensity *V*_{t}, based on our finding that the error depends on the initial storm intensity (Lee et al. 2015). This stochastic forcing is not correlated in time, and we refer to the resulting model as MLR&wn, where wn stands for white noise.
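One way to draw errors conditional on the initial intensity is to group the training residuals into intensity bins and resample within the matching bin. The text does not say how the conditioning is done, so the binning, the bin edges, and the class name `ConditionalErrorSampler` are all assumptions of this sketch.

```python
import random
from bisect import bisect_right


class ConditionalErrorSampler:
    """Resample 12-h training residuals from the bin that contains V_t."""

    def __init__(self, initial_intensities, residuals, bin_edges, seed=None):
        self.edges = sorted(bin_edges)
        # One pool below the first edge, one between each pair, one above the last
        self.pools = [[] for _ in range(len(self.edges) + 1)]
        for v, err in zip(initial_intensities, residuals):
            self.pools[bisect_right(self.edges, v)].append(err)
        self.rng = random.Random(seed)

    def draw(self, v):
        """Return one residual drawn uniformly from the pool matching intensity v."""
        pool = self.pools[bisect_right(self.edges, v)]
        if not pool:
            raise ValueError("no training residuals in this intensity bin")
        return self.rng.choice(pool)


# Made-up training residuals (kt) and hypothetical bin edges at 34 and 64 kt
sampler = ConditionalErrorSampler(
    initial_intensities=[20.0, 25.0, 50.0, 90.0],
    residuals=[-1.0, 1.0, 5.0, -20.0],
    bin_edges=[34.0, 64.0],
    seed=0,
)
epsilon = sampler.draw(30.0)  # drawn from the residuals of the weakest bin
```

Each call is independent of the previous one, so the sampled forcing is uncorrelated in time even though its spread varies with intensity.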

The gray area in Fig. 2a shows the PDF of Typhoon Fanapi’s lifetime intensity based on 400 realizations of the MLR&wn model. The observed intensity is at the edge of the 90th percentile of the distribution of realizations. Applied to all storms from 2000 to 2012, the MLR&wn simulated global LMI distribution includes major storms but underestimates their frequency. Only a few storms become intense TCs, and the LMI distribution is unimodal (gray lines in Fig. 4). Compared to the observed LMI distribution (black line), the MLR&wn distribution is biased toward zero in part because the lowest LMI in the best-track datasets is 25 kt and in the MLR&wn is 15 kt. This LMI value corresponds to simulated storms that never intensify since 15 kt is the lowest initial intensity in the best track.

### d. Further model improvement—nonlinear terms

Adding stochastic forcing to the MLR extends the upper range of simulated storm intensity from category 2 to category 5. However, the number of major storms is still underpredicted and there is a leftward bias in the simulated LMI distribution. To improve the model further, we reexamine the model assumptions.

We first examine the assumption of a linear relation between the predictors and TC intensity changes. We follow the same method used in Tippett et al. (2011), in which the coefficient for a predictor is allowed to vary with the value of the predictor. To the extent that the variation is small, the linear assumption is valid. Results (not shown) suggest that the linear assumption is adequate for all variables except dVdt and dPI_V_{0}. The coefficient of dVdt is smaller when dVdt is negative (Fig. 5a), perhaps reflecting the fact that strong weakening is limited. When dVdt is positive, its coefficient varies little. A positive relationship between dVdt and its regression coefficient indicates that a quadratic term of dVdt should be included in the MLR. Similarly, both quadratic and cubic terms in dPI_V_{0} should be included (Fig. 5b). With these nonlinear terms, the coefficients of the deterministic model are still estimated in a linear framework at each time step, and the resulting deterministic model is called MLR3 hereafter, with 3 indicating the highest power of predictor used. The deterministic forecast of Typhoon Fanapi by the MLR3 shows some improvement (blue line in Fig. 2b) but still does not capture the rapid intensification on 16–18 September. The MLR3 does not produce storms with LMI larger than 100 kt (black dashed line in Fig. 4), although it performs better than the MLR. In short-lead forecasting mode, the MAE from the MLR3 is close to that of the MLR in all basins with some modest (less than 10%) improvement in the ATL, EPC, IO, and SH basins and a little worsening in the WPC (not shown).
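The reason the coefficients can still be estimated in a linear framework is that the nonlinear terms enter as extra columns of the design matrix. The sketch below builds one such row for the ocean model; the intercept, the column order, and the function names are assumptions made for illustration.

```python
def mlr3_features(v0, dvdt, tr_speed, dpi_v0, shrd, mid_rh):
    """One design-matrix row: intercept, six linear predictors, then the
    quadratic dVdt term and the quadratic and cubic dPI_V0 terms."""
    return [1.0, v0, dvdt, tr_speed, dpi_v0, shrd, mid_rh,
            dvdt ** 2, dpi_v0 ** 2, dpi_v0 ** 3]


def predict(coefficients, features):
    """Linear combination of features; still linear in the coefficients."""
    return sum(c * x for c, x in zip(coefficients, features))


# Made-up predictor values for one 12-h forecast case
row = mlr3_features(v0=45.0, dvdt=5.0, tr_speed=10.0, dpi_v0=60.0, shrd=8.0, mid_rh=55.0)
```

With rows like this stacked for all training cases, ordinary least squares applies unchanged, and the highest power in the row is 3.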

Adding the nonlinear terms to the deterministic part of Eq. (1) leads to a new stochastic model called MLR3&wn, for which the range of the 25th–75th percentiles of the PDF of intensity for Typhoon Fanapi becomes narrower and shifts toward the observed intensity (Fig. 2b). The MLR3&wn successfully represents the first peak of the LMI PDF (gray lines in Fig. 4e), with the nonlinear terms preventing storms from being weaker than 15 kt and correcting the leftward bias in the simulated LMI distribution.

Next, we examine the white noise assumption in the MLR&wn model, in which the stochastic term is uncorrelated in time. In fact, forecast errors could be correlated in time. For example, the 12-h forecast errors of the Geophysical Fluid Dynamics Laboratory hurricane model and Hurricane Weather Research and Forecasting Model both have 12-h lag correlations of about 0.45. On the other hand, the 12-h lag correlation is close to 0.1 in the SHIPS model. Here, the 12-h lag correlation of the MLR model error is small and insignificant (black dotted line in Fig. 6). The lag correlation of *ε*^{2} (black dashed line in Fig. 6) indicates that while the errors are uncorrelated in time, they are not independent. This dependence is related to the fact that the magnitude of errors depends on the initial intensity, which is serially correlated. We plan to include the effect of this correlation in our model in a future study.
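The distinction between uncorrelated and independent errors can be checked by computing the lag correlation of both the errors and their squares. The helper below is a plain Pearson lag correlation; the error values are made up, and in practice the pairs would presumably be formed within each storm rather than across storm boundaries, which this sketch does not handle.

```python
def lag_correlation(series, lag=1):
    """Pearson correlation between series[t] and series[t + lag]."""
    if lag < 1 or len(series) - lag < 2:
        raise ValueError("need lag >= 1 and at least two overlapping pairs")
    x, y = series[:-lag], series[lag:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Made-up 12-h errors (kt): signs flip irregularly, magnitudes change slowly
errors = [1.0, -1.0, -2.0, 2.0, 6.0, -7.0, -8.0, 7.0, 3.0, -2.0]
r_eps = lag_correlation(errors, lag=1)                        # lag-12-h correlation of eps
r_eps_sq = lag_correlation([e * e for e in errors], lag=1)    # same for eps squared
```

With 12-h data, `lag=1` corresponds to the 12-h lag discussed above.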

### e. Quantitative verification—RPSS

We additionally assess the probabilistic quality of the simulated distribution of LMI values. For each storm, we count the number of simulated LMI values in each 5-kt bin from 30 to 150 kt and compare this simulated LMI frequency distribution with the observed LMI value using the rank probability skill score (RPSS). The rank probability score (RPS) measures the sum-squared differences of forecast and observed cumulative occurrence. Smaller values of RPS indicate better forecasts. RPSS compares the average RPS to that of a baseline model, and positive RPSS values indicate more skill than the baseline model. The baseline model used here consists of a deterministic persistence model and the white noise stochastic model, denoted as persistence&wn. RPS and RPSS values are then stratified according to observed LMI value. RPS values from the persistence&wn model (black line in Fig. 7a) increase with increasing observed LMI, indicating the increasing difficulty in predicting LMI. A similar tendency is found in the RPS of the MLR&wn and MLR3&wn models. These two models are both skillful relative to the persistence&wn model (Fig. 7b), although the RPSS values drop with increasing LMI. The advantage of using the MLR as well as MLR3 (either together or individually) is modest for storms with LMI greater than 60 kt. Inclusion of nonlinear terms gives some small improvement (comparing the RPSS values of the MLR3&wn and MLR&wn). The basin-MLR3&wn (the system trained in individual basins; thin light-blue lines) has the same RPSS as the MLR3&wn (red lines), showing that, by this measure, there is no advantage in using basin-dependent models rather than a single global model.
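The scores described above can be sketched as follows. The RPS here is the unnormalized sum of squared differences between cumulative forecast and cumulative observed occurrence; clipping values outside 30–150 kt into the end bins is an assumption, as are the function names.

```python
def bin_probabilities(values, lo=30.0, hi=150.0, width=5.0):
    """Relative frequency of values in each bin; out-of-range values are
    clipped into the first or last bin (an assumption of this sketch)."""
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        k = int((v - lo) // width)
        counts[min(max(k, 0), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def rps(forecast_probs, observed_bin):
    """Sum over bins of (cumulative forecast - cumulative observation)^2."""
    score, cum_forecast = 0.0, 0.0
    for k, p in enumerate(forecast_probs):
        cum_forecast += p
        cum_observed = 1.0 if k >= observed_bin else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score


def rpss(model_scores, baseline_scores):
    """1 - mean(RPS_model) / mean(RPS_baseline); positive means more skill."""
    mean_model = sum(model_scores) / len(model_scores)
    mean_baseline = sum(baseline_scores) / len(baseline_scores)
    return 1.0 - mean_model / mean_baseline


# Three-bin toy case: the observation falls in the middle bin (index 1)
sharp_and_right = rps([0.0, 1.0, 0.0], observed_bin=1)
hedged = rps([0.5, 0.5, 0.0], observed_bin=1)
```

A forecast that puts all its probability in the observed bin scores zero, and probability placed farther from the observed bin is penalized more than probability placed in a neighboring bin.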

One clear deficiency in the MLR3&wn-simulated LMI distribution (as well as in other stochastic simulations) is the missing second peak at high intensities. The observed bimodality in LMI is related to RI and reflects two types of storms: those that undergo RI sometime in their lifetime (RI storms) and those that do not (non-RI storms; Lee et al. 2016). The definition of RI that best achieves this separation is an increase of 35 kt in the maximum wind speed in 24 h. The MLR alone allows storms to intensify by at most 10 kt (12 h)^{−1} (Fig. 3). The MLR3 shows no significant improvement on the intensification rate, but the stochastic error term allows intensification as large as 30 kt (12 h)^{−1}. The simulated frequency of intensification rates between 15 and 20 kt (12 h)^{−1} is quite close to observations, but the frequency of higher intensification rates is still underpredicted. The addition of nonlinear terms primarily improves the distribution of non-RI storms with little change in the distribution of RI storms (not shown). The MLR3&wn reproduces the LMI distribution best in the ATL and EPC (Fig. 8). The deficiency of the model in generating RI storms stands out the most in WPC and SH because the frequencies of RI and intense storm occurrence are greatest there. Models with dependent data (not shown) show similar improvement at each development stage, and the model limitation in representing the second peak of the LMI distribution remains. Similar to the short-lead models, the overall performance of the stochastic models using dependent data is close to that with independent data.
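The RI criterion used to separate the two storm types can be applied to a 12-hourly intensity series as in the sketch below. Treating the 35-kt threshold as inclusive and the function name `undergoes_ri` are assumptions; with 6-hourly best-track data the window would be four steps instead of two.

```python
def undergoes_ri(intensities, threshold_kt=35.0, window_steps=2):
    """True if intensity rises by at least threshold_kt over any window of
    window_steps time steps (two 12-h steps = 24 h)."""
    return any(later - earlier >= threshold_kt
               for earlier, later in zip(intensities, intensities[window_steps:]))


# Made-up 12-hourly intensity series (kt)
ri_storm = undergoes_ri([30.0, 45.0, 70.0, 90.0])
steady_storm = undergoes_ri([30.0, 40.0, 50.0, 60.0, 70.0])
```

A model whose 12-h intensification never exceeds 10 kt can gain at most 20 kt in 24 h, which is why the deterministic model alone cannot produce any RI storms under this definition.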

## 4. TC intensity distribution

The complete distribution of TC intensity *V* (over all 12-h periods, as opposed to LMI) is another meaningful characteristic of the TC intensity climatology. We begin with the PDF and cumulative distribution function (CDF) of *V*. The observed *V* PDF is unimodal, with a peak at around tropical storm (TS) strength, and is right skewed (black line in Fig. 9a). More than 50% of the recorded storm intensities are at or below TS strength (black line in Fig. 9b); only about 20% of storm intensities exceed category 1 on the Saffir–Simpson scale, and less than 10% of the recorded intensities are categories 3–5. Considering only the deterministic components, MLR (blue dashed line) and MLR3 (black dashed line) produce more symmetric *V* PDFs than observed. Both of these models match the observed distribution only for *V* < 34 kt. Addition of the stochastic components (MLR&wn and MLR3&wn) produces intensity PDFs that match the observed peak location and much of the tail behavior, at least for intensities up to 100 kt. The frequency of major storm intensity is still underpredicted, in agreement with the results from the LMI analyses. The simulated intensity PDFs and CDFs have a leftward bias because some of the simulated storm intensities reach 0 kt while 10 kt is the lowest value in the best-track datasets. Differences in *V* distributions are fairly small among the stochastic simulations. The advantages of the models with stochastic forcing and nonlinear terms, while not dramatic, are apparent, especially over the range of 50–100 kt. RPSS analyses (not shown) reflect a similar improvement in probabilistic storm intensity forecasts to that seen in Fig. 7.
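
The class fractions quoted above can be computed from any collection of 12-hourly intensity records with a short helper. The sketch below is illustrative only: the function names are hypothetical, and the class boundaries are the commonly used Saffir–Simpson thresholds in knots, which are assumed here rather than taken from the text.

```python
from bisect import bisect_right

# Lower bounds (kt) of TS and categories 1-5; anything below 34 kt is TD.
# ASSUMPTION: commonly used Saffir-Simpson thresholds, not taken from the text.
LOWER_BOUNDS_KT = [34, 64, 83, 96, 113, 137]
LABELS = ["TD", "TS", "Cat1", "Cat2", "Cat3", "Cat4", "Cat5"]


def category(v_kt):
    """Map a wind speed in knots to its intensity class label."""
    return LABELS[bisect_right(LOWER_BOUNDS_KT, v_kt)]


def category_fractions(intensities_kt):
    """Fraction of 12-hourly intensity records falling in each class."""
    counts = {label: 0 for label in LABELS}
    for v in intensities_kt:
        counts[category(v)] += 1
    n = len(intensities_kt)
    return {label: counts[label] / n for label in LABELS}


def empirical_cdf(intensities_kt, v_kt):
    """Fraction of records with intensity at or below v_kt."""
    return sum(1 for v in intensities_kt if v <= v_kt) / len(intensities_kt)
```

Under these assumed thresholds, the statement that more than half of the records are at or below TS strength corresponds to `empirical_cdf(records, 63) > 0.5`, and the fraction in categories 3–5 is the sum of the `Cat3`, `Cat4`, and `Cat5` entries.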

Observed and modeled TC intensities along historical tracks are shown in Fig. 10 with the intensity indicated by color. Only the maximum intensity at any given location is visible. Since historical tracks are used in all figures, the only differences in occurrence locations are due to a few instances where modeled storms dissipate early. Consistent with other observational studies (e.g., Knapp et al. 2010), the best-track data (Fig. 10a) show the highest intensities (red areas) where many major storms occur: east of Taiwan, northeast of the Philippines, and over the Caribbean Sea and the Gulf of Mexico. These areas are favorable for intense storms because the sea surface temperatures are usually warm, the upper-oceanic structure is favorable (e.g., Lin et al. 2008), and the large-scale vertical shear is usually weak (Chen et al. 2006). Intensities from the MLR3 (Fig. 10c) show a spatial pattern similar to that found in the observations but with weaker maximum intensities. The similarity between Figs. 10a and 10c indicates that the spatial distribution of strong storms is controlled to a large extent by the large-scale environment. One realization from the MLR3&wn is shown in Fig. 10b and indicates that the stochastic forcing improves the strength of the simulated intensity but slightly worsens some aspects of the spatial distribution. This realization has a couple of unrealistically strong storms in high latitudes in the ATL, a category-3 TC in the central Pacific, and almost no major storms in the Gulf of Mexico. The details of these deficiencies differ among realizations: because the stochastic forcing does not depend on the environment, the locations of the problematic storms change from one realization to the next. The persistence&wn model also shows no coherency in the distribution of strong storms (Fig. 10d). In the persistence&wn model, the simulated storm intensities in the EPC and ATL are greater than in other basins because large training errors are more frequent in these two basins than in the other three. The track map from the ensemble mean of the MLR3&wn (not shown) is similar to that of the MLR3.

We plot the frequencies of storms with specified intensity ranges in Fig. 11 to compare the observed and simulated distribution of storm intensities at a given location. Figures 11a,c,e,g show maps of the observed annual frequency of TC intensities rated TD, TS, categories 1 and 2, and categories 3–5, respectively. For each range of intensities, this quantity is the product of two factors: average number of storms per unit area (TC frequency) and the fraction of storms in the given range. The TC frequency is the same in both observations and simulations because only historical tracks are used. The TC frequency term has a large role in setting the spatial pattern, especially for weak storms (Fig. 11a). With a higher intensity threshold, the area with relatively high probabilities is more restricted. Category-1 and category-2 storms (Fig. 11e) are more likely to occur east of Taiwan, northeast of the Philippines, and west of Mexico. Over the eastern ATL and 15–20°S band, the probability of a category-1 or category-2 storm is 2%–5% yr^{−1} (° lat/lon)^{−2}. The global frequency of major storms is low, but the frequency in the WPC is twice as large as in other basins (Fig. 11g). The MLR3&wn captures most of the features seen in observations (Figs. 11b,d,f,h). Both the MLR3 and the persistence&wn have the right pattern as well, especially for weak storms (Fig. 12). However, the quantitative accuracy in MLR3&wn-simulated probabilities is due to the well-simulated TC intensities. MLR3, with only a deterministic component, gives almost zero probabilities for major storms while it overestimates the frequency of TD, TS, and category-1 and category-2 storms. Persistence&wn underestimates all categories except TD.
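
The two-factor decomposition used above can be written compactly. The notation is introduced here for illustration and does not appear in the original text: let $N(\mathbf{x})$ be the average annual number of storms per unit area at location $\mathbf{x}$ (the TC frequency), and let $p_c(\mathbf{x})$ be the fraction of those storms whose intensity falls in range $c$ (TD, TS, categories 1 and 2, or categories 3–5). The mapped quantity is then

$$
F_c(\mathbf{x}) = N(\mathbf{x})\, p_c(\mathbf{x}),
\qquad \text{with units of } \mathrm{yr}^{-1}\,(^{\circ}\,\mathrm{lat/lon})^{-2}.
$$

Because the simulations use the historical tracks, $N^{\mathrm{sim}}(\mathbf{x}) = N^{\mathrm{obs}}(\mathbf{x})$ apart from the few storms that dissipate early, so wherever the observed frequency is nonzero

$$
\frac{F_c^{\mathrm{sim}}(\mathbf{x})}{F_c^{\mathrm{obs}}(\mathbf{x})}
= \frac{p_c^{\mathrm{sim}}(\mathbf{x})}{p_c^{\mathrm{obs}}(\mathbf{x})} .
$$

Any difference between the observed and simulated maps therefore comes from the intensity fractions $p_c$ alone, which is why the broad spatial pattern is reproduced even by models whose intensities are poor.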

In summary, the deterministic (MLR3) model alone is able to represent much of the TC intensity spatial structure. The stochastic terms significantly improve the simulated intensity distribution, at the cost of slightly worsening the simulated spatial structure. MLR3&wn successfully models the observed distribution of TC intensity, including both the overall *V* distribution (PDF and CDF) and the spatial patterns. The intensity PDF is less sensitive to model configuration than the LMI distribution and is therefore a less stringent metric for evaluating our simulation of the TC intensity climatology.

## 5. Summary

We have developed autoregressive (AR) models for simulating the climatological distribution of TC intensity. The AR models consist of a multiple linear regression (MLR) deterministic component and a stochastic component. The deterministic component is an extension of that from our previous study (Lee et al. 2015), which described a model for TC intensity as a function of environmental variables with given observational tracks. Low-frequency (i.e., monthly averaged) reanalysis from ERA-Interim and best-track data from the NHC and JTWC from 1981 to 1999 are used for model development, and data from 2000 to 2012 are used for model verification. We simulate the complete life cycle of TC intensity given observed genesis location, first recorded storm intensity, and track. The key climatological features discussed here are the distributions of lifetime maximum intensity (LMI), as well as the full (not just maxima) distribution of TC intensities *V*.

The model is developed globally. We show that, for our application, there is little disadvantage in using a single global model instead of the basin-dependent ones typically developed for operational statistical forecasting. A practical benefit of the global model is that the sample size for estimating parameters is increased, which is particularly helpful in the Indian Ocean, where there are relatively few storms. Moreover, the fact that a single global model is effective provides some evidence that robust physical relationships between TC intensity and environment are being represented. Furthermore, as our goal is climate downscaling and the differences between basins in future climates might not be the same as they are in the current climate, the extra basin-dependent tuning could potentially introduce additional errors. A minimal set of predictors is used, including three environmental conditions [potential intensity (PI), deep-layer mean shear (SHRD), and midlevel relative humidity (midRH)] and three storm state quantities [initial storm intensity (*V*_{0}), change in storm intensity in previous 12 h (dVdt), and storm translation speed (trSpeed)]. PI enters the model in the form of the difference between PI and *V*_{0} (dPI_V_{0}). The influence of land is included in a simple way.

Without stochastic forcing, the simulated distribution of LMI has a negative bias, containing almost no storms with intensities greater than 100 kt and no simulated intensification rates greater than 10 kt (12 h)^{−1}. The systematic negative bias is significantly reduced by the addition of stochastic forcing. The stochastic forcing is constructed from the residuals of the MLR model. The addition of nonlinear terms for some predictors [dVdt^{2}, (dPI_V_{0})^{2}, and (dPI_V_{0})^{3}] in the deterministic component (MLR3) allows some model storms to intensify more than 15 kt (12 h)^{−1}. Inclusion of a stochastic component (MLR3&wn) further improves the distribution of simulated intensification rates and the modeled number of category-3 storms as well. The number of simulated category-4 and category-5 TCs remains too low, however. As a consequence, the simulated global distribution of LMI is unimodal unlike the observed one, which is bimodal. This model deficiency is related to the model’s failure to simulate rapid intensification. With this caveat, MLR3&wn simulates the LMI distribution reasonably well both globally and regionally. The MLR3&wn model reproduces the full (as opposed to just lifetime maxima) observed distribution of storm intensities well, including the PDF, CDF, and spatial structure. Given suitable models for track and genesis, this autoregressive model (MLR3&wn) could be used to downscale tropical cyclone intensity using environmental variables from a low-resolution climate model, keeping in mind deficiencies in the number of rapidly intensifying and high-intensity storms.
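
The overall structure of the MLR3&wn model summarized above can be sketched as a short iteration along a track. This is a schematic under several assumptions, not the authors' implementation: the function names are hypothetical, the regression is assumed to predict the 12-h intensity change with an intercept, the white-noise term is assumed to be drawn by resampling training residuals, the coefficients must be supplied by the user, and the land term is omitted.

```python
import random


def predictors(v0, dvdt, pi, shrd, mid_rh, tr_speed):
    """Predictor vector for the MLR3 terms named in the text."""
    dpi_v0 = pi - v0  # PI enters as the difference between PI and V0
    return [
        1.0,                                    # intercept (ASSUMPTION)
        dpi_v0, shrd, mid_rh,                   # environmental predictors
        v0, dvdt, tr_speed,                     # storm-state predictors
        dvdt ** 2, dpi_v0 ** 2, dpi_v0 ** 3,    # nonlinear terms (MLR3)
    ]


def simulate_track(v_init, environment, coeffs, residual_pool, rng):
    """Iterate the intensity along a track in 12-h steps.

    environment   -- list of (pi, shrd, mid_rh, tr_speed), one tuple per step
    coeffs        -- regression coefficients, same order as predictors()
    residual_pool -- training residuals in kt per 12 h; empty disables noise
    rng           -- random.Random instance used to resample residuals
    """
    v, dvdt = v_init, 0.0
    series = [v]
    for pi, shrd, mid_rh, tr_speed in environment:
        x = predictors(v, dvdt, pi, shrd, mid_rh, tr_speed)
        dv = sum(c * xi for c, xi in zip(coeffs, x))  # deterministic component
        if residual_pool:
            dv += rng.choice(residual_pool)           # white (uncorrelated) forcing
        v_next = max(v + dv, 0.0)                     # intensity cannot go below 0 kt
        dvdt = v_next - v
        v = v_next
        series.append(v)
    return series
```

Passing an empty `residual_pool` gives the deterministic (MLR3-like) run, while a nonempty pool gives one stochastic realization; repeating the call with different seeds builds an ensemble from which LMI and intensity distributions can be estimated.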

## Acknowledgments

The research was supported by the Office of Naval Research under the research grant of MURI (N00014-16-1-2073). We thank Dr. Dmitri Kondrashov from UCLA for his suggestion on the importance of including nonlinear terms in the linear regression model. Comments and suggestions from Dr. Chris Landsea and an anonymous reviewer are appreciated.

## REFERENCES

*Eighth Symp. on Integrated Observing and Assimilation Systems for Atmosphere, Oceans, and Land Surface*, Seattle, WA, Amer. Meteor. Soc., 2.3. [Available online at https://ams.confex.com/ams/pdfpapers/70720.pdf.]

## Footnotes

^{1} ERA-Interim data were provided on a 0.75° × 0.75° grid, which we interpolated to a 2.5° × 2.5° grid.

^{2} The −0.5 threshold and 300-km radius are chosen arbitrarily, but with these criteria, the model is capable of modeling the land impact on TC intensity change to some degree.