## Abstract

This paper describes the development of a model framework for Forecasts of Hurricanes Using Large-Ensemble Outputs (FHLO). FHLO quantifies the forecast uncertainty of a tropical cyclone (TC) by generating probabilistic forecasts of track, intensity, and wind speed that incorporate the state-dependent uncertainty in the large-scale field. The main goal is to provide useful probabilistic forecasts of wind at fixed points in space, but these require large ensembles [*O*(1000)] to flesh out the tails of the distributions. FHLO accomplishes this by using a computationally inexpensive framework, which consists of three components: 1) a track model that generates synthetic tracks from the TC tracks of an ensemble numerical weather prediction (NWP) model, 2) an intensity model that predicts the intensity along each synthetic track, and 3) a TC wind field model that estimates the time-varying two-dimensional surface wind field. The intensity and wind field of a TC evolve as though the TC were embedded in a time-evolving environmental field, which is derived from the forecast fields of ensemble NWP models. Each component of the framework is evaluated using 1000-member ensembles and four years (2015–18) of TC forecasts in the Atlantic and eastern Pacific basins. We show that the synthetic track algorithm generates tracks that are statistically similar to those of the underlying global ensemble models. We show that FHLO produces competitive intensity forecasts, especially when considering probabilistic verification statistics. We also demonstrate the reliability and accuracy of the probabilistic wind forecasts. Limitations of the model framework are also discussed.

## 1. Introduction

Tropical cyclones (TCs) are complex weather systems that bring flooding, storm surge, high winds, and other hazards to many coastal and island locations. Each year, TCs cause billions of dollars in damage to businesses and property and result in the loss of numerous lives (Pielke et al. 2008). To mitigate such losses and allow vulnerable populations to undertake life-saving preparations, TC forecasts must be provided with sufficient lead time and be reasonably accurate and reliable. Forecasts of TCs have traditionally been separated into two categories: 1) track forecasts, which predict the location of the TC center, and 2) intensity forecasts, which, in the Atlantic basin, predict the 1-min maximum sustained surface wind anywhere in the storm (Landsea and Franklin 2013). The forecasting community has used such metrics to quantify errors in forecast models. Through substantial model improvements, better observations, and improved data assimilation methods, forecast skills for track and intensity have been steadily improving (DeMaria et al. 2014).

One of the advantages of separating TC forecasts into track and intensity is that it allows for straightforward evaluation of model performance. However, there are some significant drawbacks with this approach. First, by definition, deterministic forecasts do not quantify the uncertainty in the track and intensity of a TC, which can be an issue if the intensity of the TC strongly depends on its track. The concept of quantifying forecast uncertainty is not new. In 1992, the National Meteorological Center (NMC) and the European Centre for Medium-Range Weather Forecasts (ECMWF) began running ensemble numerical weather prediction (NWP) systems to forecast the range of future weather conditions, by slightly perturbing the initial conditions and model physics of each member of the ensemble NWP system (Tracton and Kalnay 1993; Molteni et al. 1996). Since then, ensemble forecasts have proven to be vital to advancing probabilistic forecasts of weather, which in turn allow us to quantify weather-related risks (Gneiting and Raftery 2005).

The various components of the TC forecast (e.g., track, intensity, size, rainfall) affect the specific hazards (e.g., wind speed, waves, surge inundation, and riverine flood inundation) that are experienced at a given location. The specific impacts that then result depend on the vulnerability and exposure of the asset at that location. Thus, the ability to quantify the uncertainty of TC wind speed forecasts is of utmost importance, especially for vulnerable communities. Much work on the intrinsic predictability of TCs has focused on the importance of both large-scale influences (Emanuel et al. 2004; Zhang and Tao 2013; Kieu and Moon 2016) and convective-scale processes (Van Sang et al. 2008; Sippel and Zhang 2008; Judt et al. 2016). Initial condition uncertainty in the intensity and inner-core moisture has also been suggested to play a significant role in TC forecast uncertainty (Emanuel and Zhang 2016, 2017). In light of this, deterministic forecasts of TCs may be highly misleading, and probabilistic forecasts that can sample the inherent forecast uncertainty are preferred. It is no surprise, then, that ensemble prediction systems (EPSs) have enhanced probabilistic forecasts of TCs (Majumdar and Finocchio 2010; Hamill et al. 2011). Another approach to generating members of an ensemble is to combine the predictions of many different, independent, models, as is done for some “superensemble” models (Williford et al. 2003; Vijaya Kumar et al. 2003); in fact, even a simple average of the forecasted tracks from a variety of dynamical models has been shown to outperform any of the individual models (Goerss 2000).

Since the wind field of a TC can span hundreds of kilometers, the position and intensity of a TC are insufficient for specifying the conditions at fixed points. In practice, interested parties should be more concerned about the wind speed probability distribution at a single location (pointwise wind speed probabilities), rather than the TC’s maximum wind speed or exact center location. Unfortunately, due to computational constraints, ensemble NWP models are typically run with inadequate horizontal resolution to resolve strong gradients in pressure and temperature that are commonly found in TCs, which often leads to large underestimation of the wind speeds in the simulated TCs (Gentry and Lackmann 2010; Gopalakrishnan et al. 2011). To make matters worse, an ensemble NWP system usually does not have enough ensemble members to flesh out the highest wind speeds of a pointwise wind distribution. Inadequate horizontal resolution and small ensemble size mean that it is impractical to use the (insufficiently) resolved wind fields from ensemble NWP models to produce well-resolved probabilistic, pointwise forecasts of wind speeds.

To address these issues and communicate the pointwise uncertainty of TC winds, DeMaria et al. (2009) developed the Monte Carlo probability^{1} (MCP) model to estimate the pointwise probability of surface winds exceeding the 34-, 50-, and 64-kt (1 kt ≈ 0.51 m s^{−1}) wind speed thresholds. For each forecast, the MCP model generates computationally inexpensive track and intensity realizations by adding random errors to the National Hurricane Center’s official forecasts for track, intensity, and wind radii. The errors are sampled from the official forecast track/intensity/wind radii error distributions over the most recent 5 years. However, this method forces the uncertainty in the model to exactly equal the observed average error; in practice, this usually does not reflect the state-dependent uncertainty for a given forecast. Since the forecast uncertainty for each TC can vary greatly, the climatological uncertainty may poorly reflect the true uncertainty in the track/intensity/wind radii forecasts. DeMaria et al. (2013) improved the MCP model to separate the climatological error distributions into three categories, based on the Goerss predicted consensus error (GPCE), a parameter that measures the extent of the track spread among an ensemble system (Goerss 2007). Goerss (2007) showed that the degree of uncertainty of a forecast could be coarsely predicted based on a low, medium, or high GPCE, and separated the climatological error bins accordingly. While the inclusion of GPCE improved the sharpness of the probabilistic wind forecasts, the MCP model still uses climatologically based forecast errors (stratified by basin) to generate uncertainty in the official forecast. Ensemble-based uncertainties, on the contrary, characterize the state-dependent uncertainty in the large-scale system. Therefore, a model that is able to draw upon the uncertainty represented in the ensemble members of an NWP model, while maintaining the computationally inexpensive components of the MCP model, could improve on probabilistic wind speed forecasts.

We aim to develop a computationally inexpensive pointwise TC wind speed prediction framework that is capable of incorporating the state-dependent forecast uncertainty. To be practical, the framework should be capable of quickly generating the large number of ensemble members necessary to create a robust probabilistic forecast. Our proposed framework will address these concerns by using a large-ensemble prediction system in which the ensemble members are computationally cheap, stochastic realizations reflecting the uncertainty derived from dynamical ensemble models. Furthermore, our framework is designed to readily scale with advancements in the physics, resolution, and size of ensemble NWP models, especially since there is still much room for improvement in forecasting the intensity of TCs (Emanuel and Zhang 2016).

The paper is organized as follows. Section 2 describes the datasets used in this study. A detailed description of Forecasts of Hurricanes Using Large-Ensemble Outputs (FHLO) is provided in section 3, followed by an evaluation of the model framework in section 4. Section 5 concludes with a summary and discussion.

## 2. Data

In this study, we use forecast data from EPSs as the primary inputs to FHLO. We use two EPSs in particular: the 51-member global ensemble of ECMWF, and NOAA’s 21-member Global Ensemble Forecast System (GEFS). Both ensembles are run multiple times per day and provide an estimate of forecast uncertainty. All ensemble forecast data are obtained from an online portal generously made available by the THORPEX Interactive Grand Global Ensemble (TIGGE) project (Bougeault et al. 2010; Swinbank et al. 2016). The data are obtained on a 0.5° × 0.5° resolution grid. The analyzed tracks of TCs in ensemble forecast models are obtained from the TIGGE Model Tropical Cyclone Track Dataset (National Centers for Environmental Prediction et al. 2008), which is obtained from the Research Data Archive (RDA) maintained by the National Center for Atmospheric Research (NCAR).

We also require initial conditions at each initialization time. For the ECMWF ensemble, we use the reanalysis fields from the ERA5 atmospheric reanalysis (Hersbach 2016; Copernicus Climate Change Service 2017). Note that these fields can have significant differences from the analysis fields of the operational ECMWF model. However, we require high-resolution soundings in order to estimate the potential intensity. Since the high-resolution analysis fields of the operational ECMWF model are not available to the public, we have used the ERA5 fields. For the GEFS, we use the analysis fields from the Global Forecast System (GFS) analysis fields at each model initialization time, obtained through the RDA at NCAR (National Centers for Environmental Prediction 2015). All analysis fields are obtained on a 0.25° × 0.25° resolution grid. As part of initialization, we also require initialization of the TC itself. To obtain real-time estimates of TC position, intensity, and wind structure, we obtain automated tropical cyclone forecasting (ATCF) a-decks for each model initialization time (Sampson and Schrader 2000).

To obtain real-time estimates of the sea surface temperature, we use the National Centers for Environmental Information’s 0.25° Optimum Interpolation Sea Surface Temperature (OISST) dataset (Reynolds et al. 2008). The sea surface temperature data is provided daily.

Finally, we use the HURDAT2 best track data to evaluate the performance of the model (Landsea and Franklin 2013).

## 3. The large-ensemble model for tropical cyclones

In this work, we model TCs by assuming that a TC vortex is embedded within an evolving large-scale environmental field that ultimately determines the TC’s intensity. From an ensemble NWP model, such as the GEFS, we estimate environmental quantities relevant to the intensity of a TC, such as the saturation entropy deficit and vertical wind shear (Emanuel et al. 2004). By deriving these quantities from multiple members of an ensemble model, we can sample the internal variability in each environmental field. The track module of the large-ensemble model generates realistic tracks from the set of ensemble TC tracks. These tracks, combined with the various environmental fields along these tracks, serve as input into FAST (Emanuel 2017), a fast TC intensity model that emulates an idealized, axisymmetric TC model (Emanuel et al. 2004). FAST evaluates the intensity, or maximum azimuthal wind speed, along a specified track through one realization of the large-scale environmental field. Finally, the intensity and environmental fields are used as inputs into a parametric surface wind model, to generate a full spatial wind field of the TC (Chavas et al. 2015). Figure 1 summarizes the overall flow of information in the large-ensemble model. Each component of the complete model framework is explained in depth in the following sections.

### a. Synthetic track model

In this section, we describe a track algorithm that draws information from the ensemble track covariance to generate a large number of statistically indistinguishable synthetic tracks. The model is physically motivated by the beta-and-advection model (Marks 1992), which assumes that TCs are advected by some large-scale steering flow. Since we do not expect the large-scale flow to have considerable fluctuations on short (hourly) time scales, we expect there to be some correlation between the translational speed vector from one time step to the next. Note that the forecasted center of TCs is typically output in 3- or 6-h increments.

In light of this, we model the distribution of TC translational speeds using a Markov-chain solution. This means we condition on the previous time step translational speed, *u*_{t−1} and *υ*_{t−1}, where *u*_{t−1} and *υ*_{t−1} are the zonal and meridional translational speeds at time step *t* − 1, respectively, to determine the translational speed at the next discrete time step. Mathematically, this corresponds to *P*(*u*_{t}, *υ*_{t}|*u*_{t−1}, *υ*_{t−1}). To properly describe this conditional distribution, we expand this as

Next, we model both joint probability distributions as a mixture of *k* Gaussian distributions:

where *w*_{i} are the weights of each Gaussian mixture, such that $\u2211i=1k=1$. The quantities *μ*_{i}, *μ*_{j}, **Σ**_{i}, and **Σ**_{j} must be estimated using the track displacements from the ensemble model. In this study, we set *k* = 1, though future work can explore how increasing *k* affects the track model.

Given (*u*_{t−1}, *υ*_{t−1}), we can step the model forward by drawing from the conditional probability above. Integrating forwards in time will generate a synthetic track. The statistical algorithm is relatively fast, as 1000 synthetic tracks can be generated in approximately five minutes on a conventional laptop. For robustness, we require that at least 75% of the global ensemble member tracks have not dissipated before proceeding to the next time step.

Note that this method directly depends on the skill of the ensemble prediction system. While this means that the accuracy of the ensemble covariance algorithm should scale alongside the general accuracy of ensemble NWP models, it also indirectly ties the track algorithm to how accurately the ensemble NWP models simulate the intensity of the analyzed TC. If the TC dissipates too early in the ensemble prediction model, the ensemble track covariance model will generate tracks that also dissipate too early.

Since a potentially unlimited number of synthetic tracks can be generated, robust probabilistic statistics can also be generated. Figure 2 shows an example of the corresponding 75-km strike probability for the forecast of Hurricane Irma (2017), initialized at 0000 UTC 5 September 2017. While we observe that the density of the 75-km strike probability corresponds well with the density of the actual ensemble tracks for this single case, a large sample size with probabilistic evaluation is needed to fully evaluate the quality of the framework’s track predictions. This will be further examined in section 3d.

### b. Intensity model

The intensity model evaluates the surface azimuthal wind speed along a particular track. Though any computationally inexpensive intensity model can be used in the large-ensemble framework, we choose to use the FAST system, a pair of coupled, nonlinear ordinary differential equations that describe the evolution of *V*, the maximum azimuthal wind, and *m*, an inner-core moisture variable that is bounded between 0 and 1 (Emanuel and Zhang 2017; Emanuel 2017). The choice was motivated primarily because the system is framed around physically based parameters that can be easily derived from ensemble fields. The equations are included below:

where *C*_{k} is the surface enthalpy coefficient, *C*_{d} is the surface drag coefficient, *h* is the atmospheric boundary layer depth, *V*_{p} is the potential intensity, *α* is an ocean interaction parameter that varies between 0 and 1, *χ* is the midlevel saturation entropy deficit, *S* is the 250–850-hPa vertical wind shear, *T*_{s} is the surface temperature, *T*_{o} is the outflow temperature, *L*_{υ} is a constant latent heat of vaporization, *R*_{d} is the dry gas constant, $qs*$ is the surface saturation specific humidity, and *ε* is the thermodynamic efficiency [see Emanuel (2017) for further details]. Note that in Emanuel (2017), *χ* = 2.2, which is a typical value for the saturation entropy deficit, but as discussed presently, we will evaluate its time-evolving value from the global ensemble. The behavior of this system is controlled by four key terms: 1) a spinup term that represents intensification of the vortex toward its potential intensity because of surface fluxes, 2) a spindown term from the thermodynamic dampening influence of downdrafts in the inner core, 3) a moistening term that represents surface moisture fluxes from the ocean, and 4) a drying term that mimics eddy entropy fluxes into the TC’s eyewall. Though remarkably simple, the FAST equations do have some limitations, which are discussed in the conclusion.

The FAST equations are integrated forwards in time using a Runge–Kutta fourth-order numerical scheme with a time step of 450 s. As in Emanuel (2017), we set *C*_{k} = 1.2 × 10^{−3}, *h* = 1400 m, *ε* = 0.33, *κ* = 0.1. Thus, given an initial intensity and inner-core moisture, as well as the vertical wind shear, saturation entropy deficit, and potential intensity along a track, we can solve for the time evolution of *V* and *m*. It is worth nothing that in order to properly calculate *α*, the ocean interaction parameter that modulates *V*_{p}, one needs the ocean mixed layer depth *h*_{m} as well as the sub–mixed layer thermal stratification Γ. For simplicity, we choose to use climatological values for *h*_{m} and Γ (Levitus 1982). Future work will incorporate this information from a real-time ocean forecast model. Ocean mixing is switched off whenever the sub–mixed layer depth is larger than the depth of the ocean. Finally, a 0.25° bathymetric dataset is used to determine when the center of the TC is over land, during which the potential intensity is set to zero.

In this framework, the three most important environmental quantities that influence the intensity of a TC are 1) vertical wind shear, 2) saturation entropy deficit, and 3) potential intensity. These dynamic and thermodynamic quantities are defined to be environmental fields, and thus should be evaluated assuming that the considered TC does not exist. To progress with the perspective of modeling a TC in a synthetic environment, we need to remove any effects on these fields that are induced by an analyzed TC.

### c. Environmental quantities

#### 1) Vertical wind shear

To calculate the environmental vertical wind shear, the circulation induced by the TC must be removed. One method, for instance, averages winds over some distance larger than the radius of the inner core of the TC (DeMaria and Kaplan 1994), while other methods set the vorticity and divergence to zero within a specified distance from the TC center, and invert the Poisson equation to find the streamfunction and velocity potential associated with the vortex (Davis et al. 2008; Galarneau and Davis 2013). Since we desire a continuous spatial field of environmental winds **u**_{env}, we choose the latter method, setting the relative vorticity and divergence of the environmental field to zero within an inversion radius of $r*$ from the vortex center. We use $r*=400\u2009km$, where the magnitude of the axisymmetric component of the vortex’s relative vorticity becomes comparable to that of the environmental field. Note that $r*=400\u2009km$ is around the middle range of $r*$ values used by Galarneau and Davis (2013), and close to the median outer radius of TCs, inferred using scatterometer data (Chavas and Emanuel 2010). Fixing $r*$ is perhaps not the best choice, as Galarneau and Davis (2013) showed that the resulting environmental winds are in fact sensitive to $r*$. Future work could include stochastic perturbations to $r*$, or optimizing $r*$ from environmental profiles of relative vorticity. Defining *ζ*_{vortex} and *δ*_{vortex} as the vorticity and divergence identified as part of the vortex, respectively, we have

where **u** is the wind velocity vector, *ψ* is the streamfunction, Φ is the velocity potential, ∇ is the gradient operator, and ∇^{2} is the Laplacian operator. To solve for **u**_{vortex}, we must invert the Laplacian operator on a sphere to obtain the corresponding streamfunction and velocity potential. Details on this inversion are included in the appendix. After calculation of the environmental wind at 250- and 850-hPa, we subtract the two to obtain the environmental vertical wind shear. We chose these two levels based on DeMaria and Kaplan (1994), who found that the vertical wind shear between the 250- and 850-hPa levels correlates well with intensity changes in tropical cyclones, though alternative pressure levels could also be used to calculate the vertical wind shear.

#### 2) Midlevel ventilation

Midlevel ventilation, or the entrainment of low-entropy environmental air into a TC at midlevels, is one pathway by which vertical wind shear can interact with a TC and cause it to weaken (Simpson and Riehl 1958; Tang and Emanuel 2010). In the ventilation hypothesis, vertical wind shear leads to asymmetric processes that induce eddy fluxes and mixing between the TC eyewall and its environment. If the environmental air is sufficiently low in entropy, downdrafts will occur in the eyewall and disrupt warming of the inner core. For a TC with a saturated inner core, the normalized eddy fluxes that result from such ventilation are proportional to *χS* [see the appendix of Tang and Emanuel (2012)], where *S* is the 250–850-hPa vertical wind shear, and *χ* is a scalar that represents the saturation entropy deficit normalized by the air–sea thermodynamic disequilibrium, as in Eq. (8):

The pseudoadiabatic entropy *s* can be approximated following Bryan (2008):

where *c*_{p} is the specific heat of dry at constant pressure, *q* is the specific humidity, $H$ is the relative humidity, *p*_{d} is the dry pressure, *R*_{υ} is the water vapor gas constant, $sm*$ is the inner-core saturation entropy, *s*_{env} is the environmental entropy, $ss*$ is the saturation entropy at the sea surface, and *s*_{b} is the entropy at the boundary layer. To calculate *χ* at a fixed pressure level *p* from gridded data, we first assume that temperature perturbations on pressure surfaces are small, and that the inner core is saturated (Emanuel et al. 2008), such that the numerator becomes

To evaluate these quantities, we assume that the air at the sea surface is saturated and at the same temperature and pressure as the sea surface. We also assume that $sb=sLCL*=sp*$, where $sLCL*$ is the saturation entropy at the lifted condensation level. The first step assumes adiabatic motion from the boundary layer to the lifted condensation level, which is defined as the top of the boundary layer, and the last step follows from moist convective neutrality. Then, we have

While the environmental saturation deficit is typically evaluated at *p* = 600 hPa (Emanuel 2013), consistent with the midlevel ventilation hypothesis, the 600-hPa level is not available through the TIGGE database. Instead, we calculate *χ*_{p} at *p* = 500 hPa and *p* = 700 hPa and take the gridpoint maximum between the two levels. To obtain the *χ* used in FAST, we take the *N*th percentile of the distribution of the saturation entropy deficit within *r*_{env} of the TC center. Since any downdrafts that occur near the core are detrimental to the TC, we take relatively large values of *N*. The saturation entropy deficit typically increases away from the core, since deep convection near the inner core saturates the midlevels, such that if *N* is large enough, we are effectively diagnosing an environmental entropy deficit. To calculate the denominator, we take the median of the air–sea thermodynamic disequilibrium over the inner 200 km from the TC center. When the air–sea disequilibrium is negative, which can occur, for instance, at cold SSTs, we set *χ* = *χ*_{d}. We also cap *χ* to a value of *χ*_{d}. We estimated the optimal values of *N*, *r*_{env}, and *χ*_{d}, by finding values that minimized the mean absolute error in intensity. We found *N* = 90th percentile and *r*_{env} = 1000 km for the Atlantic basin, *N* = 50th percentile and *r*_{env} = 900 km for the eastern Pacific basin, and *χ*_{d} = 4. The large differences in *N* between the two basins is largely a result of differences in climatology; the density of tropical cyclone forecasts maximizes in thermodynamically favorable environments in the Atlantic, while the opposite is true in the eastern Pacific.

#### 3) Potential intensity

The potential intensity of a TC is a theoretical upper bound on its maximum wind speed (Emanuel 1986). Potential intensity has been verified as a reasonable upper bound on the intensity of real TCs (Emanuel 2000), and is defined as

where *k* is the enthalpy of the boundary layer air, and $k*$ is the saturation enthalpy of the sea surface. To calculate potential intensity, an environmental sounding with a high resolution in the vertical is required. Since potential intensity must be evaluated at the eyewall of a TC, we must remove the effect of the global ensemble system’s warm-core anomaly, which acts to reduce the buoyancy of a parcel lifted from the surface. Previous methods that attempt to account for this deficiency use time-lagged potential intensity fields (Emanuel et al. 2004). One weakness of this method is that it ignores any short-term variability of potential intensity. Instead, we opt to smooth out the effects of an analyzed TC. To remove the thermodynamic effect of an analyzed TC, we apply a 9-box smoother across a 10° × 10° grid box centered on the TC, to the temperature, specific humidity, and sea level pressure fields, holding the boundary fixed and smoothing inwards. This first-order approximation successfully removes the potential intensity minimum near the vortex center and allows us to obtain a robust estimate of the environmental potential intensity. In this study, we calculate the spatially varying potential intensity using the analysis (initialization) fields, which have a high vertical resolution. The potential intensity field is then kept fixed throughout the forecast, though the potential intensity of the TC can still change as it moves in space.

### d. Wind model

Since FAST outputs the maximum surface azimuthal wind speed, we use a parametric wind model and another parameterization to obtain the full TC wind field as well as the maximum surface wind speed. We first obtain the axisymmetric wind profile by using the physically based wind model developed by Chavas et al. (2015). We chose this model because of its basis on physical principles, though other wind models, such as the radii-CLIPER model (Knaff et al. 2007), could be used as well. The wind model developed in Chavas et al. (2015) separates the axisymmetric wind field into two regions, a convecting inner region, and a subsiding outer region, and the equations are below for convenience:

where *M* = *rV* + (1/2)*fr*^{2} is the angular momentum per unit mass, *M*_{inner} (*M*_{outer}) is the angular momentum in the inner (outer) region, *r* is the radius from the vortex center, *V* is the azimuthal wind, *r*_{0} is the outer radius (or radius of vanishing wind), *r*_{m} is the radius of maximum winds, *W*_{cool} is the subsidence rate from tropospheric radiative cooling, and *M*_{m}(*r*_{m}, *V*_{m}) is the angular momentum at the radius of maximum wind, where *V*_{m} is the maximum azimuthal wind speed. Because we lack observations of *W*_{cool}, we set 2*C*_{d}/*W*_{cool} = 1 s m^{−1}, for simplicity, though the model does have some dependence on the strength of *W*_{cool} (Chavas and Lin 2016). The full axisymmetric wind field can be resolved by fixing two of the three free parameters, *r*_{m}, *r*_{0}, and *V*_{m}. The maximum azimuthal wind speed *V*_{m} is readily obtained from FAST, and we choose to specify *r*_{0} (details on selecting *r*_{0} are described in section 3e). After obtaining the surface axisymmetric wind field, we apply a second model to obtain the asymmetric component of the wind field. This model is based on the isallobaric wind, which occurs whenever the vortex propagates with respect to the low-level wind, which will happen when there is vertical wind shear **S**. In a reference frame moving with the low-level vortex, vorticity must be increasing downshear of the vortex center, and for this to happen, there must be low-level convergence. We crudely take this into account by representing the vorticity in terms of the maximum azimuthal wind **V** divided by a vortex radial length scale, and the vortex-relative vorticity advection by the shear vector times this vorticity divided by the same length scale. This yields the low-level convergence downshear of the low-level vortex. The associated convergent velocity component is then obtained by integrating over the same length scale. With some empirical adjusting of constants, this results in

where **V** is the axisymmetric wind, **u**_{t} is the translational speed of the vortex, **S** is the 250–850-hPa vertical wind shear (m s^{−1}), *ϕ* is the latitude of storm center, |**V**| is the magnitude of the axisymmetric wind (kt), and **u**_{net} is the net wind at the surface. Finally, the maximum surface wind speed is determined by taking the maximum magnitude of **u**_{net} over the domain.

### e. Initialization and parameter estimation

The final component of FAST pertains to initialization. A poor initialization can lead to significant errors in the short-range forecast period (Emanuel and Zhang 2016, 2017). Since we want to include initial intensity uncertainty in the probabilistic system, we create a synthetic perturbation of the TC intensity analysis over the past 24 h of the analyzed TC (see appendix for details). Uncertainties in the intensity observations are taken from climatological errors (Landsea and Franklin 2013). For each synthetic ensemble member, we use the environmental parameters from the analysis fields to drive FAST, and add a time-varying forcing term to the azimuthal wind [Eq. (1)] such that the modeled maximum surface wind best matches the synthetic perturbation of the observed TC intensity. The initialization period runs from 48 h before the initial forecast time until the initial forecast time, where the forcing term to the azimuthal wind equation then decays in magnitude as $exp[\u2212\u2061(t/t0)2]$, where *t* is the forecast lead time, and *t*_{0} = 1 day.

To initialize the wind field, we take the initial analysis of the maximum extent of the 34-, 50-, and 64-kt winds in each quadrant [obtained from the combined automated response to query (CARQ) lines of the ATCF a-decks], and find the corresponding value of *r*_{0} that allows the modeled asymmetric wind field to best match the analysis. Furthermore, to combat large negative biases in the axisymmetric wind model at radii where 3 ≲~ *r*/*r*_{m} ≲~ 6 (see Fig. 9 in Chavas et al. 2015), we add a shape parameter *k* to the axisymmetric wind profile, that is, for *r* > *r*_{m}, *V*(*r*, *k*) = *V*(*r*)^{k}. Then, to initialize the wind field, we find *r*_{0} and *k* such that the full asymmetric wind field best matches the analysis radii in each quadrant. Finally, if an official forecast of the radii in each quadrant is available, the optimal *r*_{0} and *k*, given the official forecast of intensity, are used to interpolate *r*_{0} and *k* forward in the forecast. Otherwise, *r*_{0} and *k* are kept constant in the forecast. If an initial analysis and forecast of the wind radii do not exist, we set *r*_{0} = 700 km and *k* = 1. This may not be realistic but could be improved upon by using the model developed in Knaff et al. (2017), which predicts the maximum extent of the 34-, 50-, and 64-kt winds. We do not explicitly perturb *r*_{0}, though this could be the subject of future work.

## 4. Evaluation of the large-ensemble model

To robustly evaluate the skill of FHLO, we run 1000-member ensemble reforecasts for all 0000 and 1200 UTC cycle TC forecast cases in the Atlantic and eastern Pacific basins during the years 2015–18 (the choice of 1000 members is described in the following section). Since all of the aforementioned data used to generate a probabilistic forecast are available in real time, these reforecasts can be considered equivalent to late-cycle real-time forecasts. Since the skill of FHLO also depends on the skill of the ensemble, we run two variations of 1000-member ensembles, one using data from the ECMWF ensemble (FHLO-ECMWF) and the other from the GEFS (FHLO-GEFS). We also combine the two into a 2000-member superensemble, which we will denote FHLO-Super. We evaluate the performance of FHLO by using the HURDAT2 best track data as the observed track, intensity, and wind radii of each TC (Landsea and Franklin 2013). For each forecast case, we predict the track distribution, intensity distribution, and probability of exceedance for the 34-, 50-, and 64-kt wind speed thresholds.

### a. Ensemble size

The choice of running 1000-member ensembles is motivated primarily by looking at probability distributions of wind speeds from TCs at a fixed point. To illustrate this, we generate pointwise forecasts of wind speed from Hurricane Maria at San Juan, Puerto Rico, using the ECMWF ensemble initialized on 0000 UTC 18 September 2017 as input into FHLO. For demonstration, we use 100- and 1000-member FHLO-ECMWF ensembles, as well as a 51-member FHLO-ECMWF model that only uses the tracks from the original ECMWF ensemble (RAW-ECMWF). Figure 3 shows the time-varying maximum wind speed from Hurricane Maria at San Juan, Puerto Rico, from the 1000-member FHLO-ECMWF and the 51-member RAW-ECMWF. From the eye test, there is a sampling issue with RAW-ECMWF; there simply are not enough ensemble members to resolve the tail of the distribution. To be more quantitative, we estimate the nondimensional damage ([0, 1]), *f*, that represents the fraction of property lost [see Eq. (1) of Emanuel (2011)]:

where *V* is the maximum wind speed, *V*_{thresh} is the wind speed at which no damage occurs, and *V*_{half} is the wind speed at which half damage occurs. As the right column of Fig. 3 shows, there is an inferred probability of zero for *f* > 0.3 from the 51-member RAW-ECMWF. However, as we increase the size of the ensemble, the right tail of the distribution, which represents the most destructive scenarios, is better resolved. Thus, large-ensemble forecasts are necessary to flesh out the tail of wind distributions, which is often critical to decision making. With only a small number of ensemble members, it is extremely difficult to create smooth PDFs of pointwise wind forecasts. We settled on a 1000-member ensemble, since the marginal return on resolving the tail diminishes with further increases in ensemble size.

### b. Track forecasts

To verify that the synthetic track algorithm produces tracks that are statistically similar to the set of TC tracks of a given EPS, we evaluate the track error distribution and spread from forecasts of 1) the mean of the synthetic tracks, and 2) the mean of the ensemble system. A two-sample Kolmogorov–Smirnov test indicates that the track error distributions of forecasts from both methods are statistically similar at all lead times, regardless of the ensemble, with *α* = 0.05 (not shown).

Figure 4 shows the error between the observed track and the mean of the 1000-member ensemble, which is driven by the tracks of different EPSs. FHLO-GEFS has the smallest initial error for both basins, since the GEFS ensemble relocates the analyzed vortex to the best-guess position at initialization (Liu et al. 2000). In the Atlantic basin, FHLO-ECMWF outperforms all of the other individual models for all other lead times, while FHLO-GEFS has the best performance in the eastern Pacific. The multimodel superensemble, has the lowest mean absolute error (MAE) for almost all lead times in both basins, which is a well-known property of superensembles (Williford et al. 2003; Vijaya Kumar et al. 2003). Of course, the best metrics for the evaluation of a large ensemble are probabilistic metrics. A reliability diagram assesses the observed probabilities as a function of the forecasted probabilities. A model is said to be reliable if the observed frequencies of an event match the observed frequencies of its forecast probability—that is, the model performs well all the time without its forecasts being overconfident or underconfident. A perfectly reliable model falls along the 1:1 line in a reliability diagram. Figure 5 shows the reliability curve for the probability that the center of the TC is within 75-km of a particular grid point (using a 0.1° resolution grid), cumulative over 5 days. In general, the results demonstrate the reliability of the FHLO-ECMWF strike probabilities, as well as the overconfidence of the FHLO-GEFS strike probabilities. These results are similar to those obtained in Titley et al. (2020). We forgo a detailed evaluation of the synthetic tracks in lieu of evaluating the wind speed exceedance probabilities in section 4d, as the latter cannot be accurate if the former is not.

### c. Intensity forecasts

In this section we evaluate the intensity forecasts of FHLO using traditional, deterministic statistics, as well as probabilistic metrics. When evaluating, we do not explicitly terminate a TC when its simulated intensity falls below the tropical depression threshold (17 m s^{−1}). Figure 6 shows a model comparison of the mean absolute error in intensity as a function of time since initialization, for forecasts where the initial intensity is greater than 30 m s^{−1} (the reason for applying this filter is described below). Since the mean absolute error is a deterministic statistic, we take the mean of all the members of each FHLO ensemble as its deterministic model forecast. In both basins, the intensity error is largely the same between FHLO-ECMWF, FHLO-GEFS, and FHLO-Super for the first two days, likely since the large-scale environments of the ECMWF ensemble and GEFS do not diverge significantly for these short-term time scales. The intensity errors begin to diverge around two days, after which we observe that FHLO-ECMWF and FHLO-Super outperform FHLO-GEFS. The FHLO-based intensity forecasts also have a larger absolute error than the Hurricane Weather Research and Forecasting (HWRF) Model, though the FHLO-based ensembles perform comparably to HWRF in the eastern Pacific. The FHLO-based ensembles have a slight negative bias in the first two days of initialization. In general, however, the bias over the sample set is comparable to that of HWRF.

We also construct an idealized lower bound on the intensity error of the FHLO-based ensembles, to understand the best-case performance of a simple intensity model such as FAST. To gain insight into this, we take the exact same forecast cases, but instead use near-perfect initial conditions and analysis fields to derive the environmental parameters. The near-perfect initial conditions are achieved by running the aforementioned initialization procedure using the best track of the TC, forcing the model to the observed intensity. Once the initialization procedure hits the initialization time, the model is allowed to evolve freely. Then, the lower-bound error is computed by the divergence of the FAST model from the best track intensity. This is shown as the dashed black curve in Fig. 6. In the eastern Pacific basin, the lower-bound curve suggests that with improved ensemble forecasts and model initialization, the intensity error of the FAST model will also decrease. This is somewhat less true in the Atlantic basin; the lower-bound curve suggests that better initialization could result in improvements in forecasts with lead times of up to 2 days. For lead times longer than 2 days, however, the lower-bound curve is nearly indistinguishable from the forecast errors. This may be due to low inherent predictability and/or model error stemming from suboptimal parameterization of the ventilation, though further investigation is outside the scope of this study. Note that these bounds are derived using the current framework of the model; improvements to the model itself could lower the bound further. Last, in some sense, this curve can be loosely compared to the model error of FAST. While simple models, such as FAST, are unable to physically resolve the atmosphere and are often less accurate than more complex models, they are computationally inexpensive and can be used in large-ensemble studies such as this one. We believe that FAST makes a relatively good trade-off between simplicity and accuracy, though it is unlikely to be the best we can do.

Next, we evaluate the FHLO-based intensity forecasts using probabilistic metrics. A convenient metric to compare deterministic and probabilistic forecasts is the continuous ranked probability score (CRPS), which is the integrated squared difference between the cumulative distribution function (CDF) of the forecast and the observation:

where *F*(*υ*) is the CDF of the forecast, *υ*_{obs} is the observed intensity, and $1$ is the Heaviside step function. For a deterministic forecast, the CRPS simplifies to the mean absolute error. Figure 7 shows the CRPS as a function of time since initialization, and we see similar patterns in the results as compared to those of the mean absolute error. In the Atlantic basin, the CRPS of the FHLO-based ensembles is close to that of the deterministic HWRF forecast, for forecast lead times shorter than four days. For lead times longer than four days, the CRPS of the FHLO-based ensembles is lower than that of HWRF. In the eastern Pacific, the CRPS of the FHLO-based ensembles is lower than that of HWRF. This suggests that characterizing both the mean state and spread of the large-scale flow is paramount to quantifying the uncertainty in the future intensity of a TC.

Finally, it is worth commenting on the filter we used above, where we only considered forecasts where the initial intensity is greater than 30 m s^{−1}. The filter was specified in order to bypass the issue of cyclogenesis. In general, the mean absolute error in intensity increases as the initial intensity threshold in the evaluation filter is decreased (not shown), for a variety of reasons. While this may point to deficiencies in applying models based on idealized, axisymmetric TC theory to weak disturbances, there may be additional reasons for this behavior. For instance, a model that does not properly capture the observed intensification rate distribution but is tuned to provide sluggish intensification for weak storms could have the smallest mean absolute error in intensity, since most weak storms do not intensify into strong TCs. While the FAST system does capture the observed intensification rate distribution (Emanuel 2017), it has the tendency to overintensify weak disturbances, so long as the midlevel ventilation is not large. Regardless, much work has suggested that the intrinsic predictability of cyclogenesis and subsequent intensification is low (Sippel and Zhang 2008; Zhang and Sippel 2009; Zhang and Tao 2013), and thus TC genesis remains a significant forecasting challenge (Rappaport et al. 2009).

### d. Probabilistic wind speeds

In this section, we evaluate the 34-, 50-, and 64-kt wind speed exceedance probabilities that are generated from the FHLO-based ensembles. For each ensemble member, and at each time step, the full wind field of the TC is estimated. The evaluation procedure is similar to that in DeMaria et al. (2009), where the probabilities are generated by summing the number of ensemble members where the winds exceed a particular threshold, for each individual grid point. Following the evaluation procedure of DeMaria et al. (2009), we do not include forecast cases of extratropical transition. To do this, we truncate any forecasts that extend beyond when the official forecast predicts a transition to an extratropical storm. The wind probabilities should also account for the timing of dissipation, and thus we do not modify the forecast if the analyzed storm in the best track terminates before the end of the forecast, and vice versa. The latter situation is often more problematic for this model. This is because the ensemble tracks depend on how well the ensemble models resolve the TC on their relatively coarse grids. Finally, in this section, we only evaluate forecasts where the initial TC position is equatorward of 30°. This is a crude way to filter for storms that are characteristically more tropical, which is when we expect the FHLO-based ensembles to work the best. The aforementioned wind model also has more difficulty representing highly asymmetric wind fields, which is more likely to be the case for storms that are influenced by baroclinicity. Nevertheless, a large fraction (70% in Atlantic, 99% in the eastern Pacific) of the initial positions of our samples occurred south of 30°. Extending the model verification to latitudes north of 30° is left to future work.

Figure 8 shows an example of the exceedance probabilities for the 64-kt wind speed threshold, cumulative over the 5-day forecast period, for Hurricane Irma. The analyzed maximum extent of 64-kt winds, indicated by the thick black contours, are reasonably within the bounds of the probabilistic forecast, except for a mild along-track error. To obtain robust evaluation statistics, we evaluate all the forecasts by analyzing the associated reliability diagrams, multiplicative bias curves, and the maximum threat scores (Wilks 2011).

Figure 9 displays a reliability diagram for the cumulative exceedance wind probabilities using different wind speed thresholds, separated by basin and cumulative time. The results indicate that in general, the exceedance probabilities generated by the FHLO-Super model are reliable at the examined cumulative times and in both basins. We also observe that for probabilities between 0% and 50%, the 34-kt winds are more overdispersive than those for the 50-kt winds, which in turn are more overdispersive than the 64-kt winds. This is likely due to two factors. First, negative biases at *r*/*r*_{m} ≈ 4 in the axisymmetric wind model cannot be completely eliminated with the shape parameter, though the bias was heavily reduced (not shown). Second, while the TC wind profile is typically dominated by axisymmetric processes near the core, asymmetries often dominate farther away from the core. These asymmetries cannot always be represented by the simple asymmetric model in the aforementioned text. The combined effect of both of these factors is likely to underestimate the radial extent of winds at a particular threshold in each quadrant, where the magnitude of this bias decreases as one moves toward the inner core (i.e., we underestimate *r*_{34}, and less so for *r*_{50}, and even less for *r*_{64}). This observed bias could be addressed by adding more degrees of freedom to the wind model, though this would reduce the simplicity. Using another wind model with more degrees of freedom, such as the radii-CLIPER wind model of Knaff et al. (2007), may also help to alleviate these issues.

The multiplicative bias is defined by $B=\u2211iFi/\u2211iOi$, where *B* is the multiplicative bias, *F*_{i} are the forecasted probabilities, and *O*_{i} is the observation (0 or 1). The bias *B* is a measure of whether the average forecast has probabilities that are too large (*B* > 1), or too small (*B* < 1). The average probabilities of FHLO-Super, shown by Fig. 10, are generally too small for the 34- and 50-kt thresholds, and reasonably unbiased for the 64-kt threshold. The underlying reason for *B* < 1 at the 34- and 50-kt thresholds is likely the same reason for the bias in the reliability diagrams; namely, that *r*_{34} and *r*_{50} are being underestimated in each quadrant. Regardless, the multiplicative biases at a fixed threshold remain relatively constant in time. This suggests that reducing the bias in the wind model could lead to *B* ≈ 1 at all thresholds.

Unfortunately, a large number of grid points across a basin have zero probability and will evaluate to correct nulls, which can strongly influence the bias scores. To remedy this, we use the threat score metric to further evaluate the probabilistic wind forecasts. Correct nulls are not used in the threat score. To calculate the threat score, one defines a threshold probability to determine a categorical forecast (yes or no) and divides the total number of correct forecasts by the sum of the total number of correct forecasts, false positives, and misses. The threat score has a score from 0 (worst) to 1 (best). As a baseline, we perform a homogeneous comparison with NHC’s MCP model. NHC only archives the probabilities accumulated over all storms during a particular cycle, which is an issue when combined with the fact that FHLO forecasts only run as far out as the ensemble tracks. Thus, for any forecast, we must discard any samples beyond the lead time at which the FHLO forecast for any existing TC dissipates. This reduces the number of samples we can use to evaluate FHLO. In the Atlantic (eastern Pacific), we used 348 (161) cycles at a lead time of 6 h, to 61 (20) cycles at a lead time of 120 h, using 0000 and 1200 UTC forecasts from 2015 to 2018. We also create a superensemble between FHLO-Super and the MCP model, simply by averaging the wind probabilities of each model, and denote this as FHLO-MCP.

Figure 11 compares the maximum threat scores of FHLO-Super, the MCP model, and FHLO-MCP. The threat scores generally maximize at a threshold probability of ≈30% (not shown). In the Atlantic basin, the MCP model is superior at shorter lead times (≈0–2 days), especially at the 34- and 50-kt thresholds. One reason this may be is that analyses of the initial position and intensity can be explicitly taken into account in the official forecast. On the other hand, FHLO-Super has higher threat scores at longer lead times (≈3 days), regardless of the threshold. In the eastern Pacific basin, FHLO-Super has larger threat scores at nearly all lead times and thresholds. These results suggest that incorporating state-dependent uncertainties could lead to improvements in long-range, probabilistic, pointwise wind probability forecasts. Taking into account the flow-dependent uncertainty will be extremely important if forecast lead times are extended beyond 5 days. Finally, in the Atlantic basin, the FHLO-MCP superensemble has better threat scores compared to both FHLO-Super and the MCP model, at all lead times and thresholds. While the sample size of the homogeneous comparison between FHLO-Super and the MCP model may be a bit limited, this provides more evidence that combining multiple models can lead to superior forecasts.

As mentioned previously, ensemble models have the tendency to possess negative biases in the intensity of a TC. This may lead to premature dissipation of a TC, resulting in smaller threat scores from forecast misses. We expect that as the ability to resolve TCs in ensemble models increases, the threat scores associated with the FHLO-based ensembles will also increase. Regardless, these results suggest that there is much value in incorporating state-dependent uncertainty in TC forecasts.

## 5. Conclusions

In this study, we developed FHLO, a probabilistic, large-ensemble [*O*(1000) members], TC prediction framework that estimates forecast uncertainty by leveraging the internal variability of the large-scale environment simulated by a global NWP ensemble. We described a method to generate synthetic tracks that are statistically similar to those of an ensemble NWP model. We evaluated the intensity forecasts of the simulated TC along each track using the FAST intensity model, and we then used a physically based wind model to estimate the full wind field. We evaluated the model using four years (2015–18) of reforecasts in the Atlantic and eastern Pacific basins.

We show that the FHLO-based large-ensemble intensity forecasts perform comparably with HWRF, an advanced NWP model. We also evaluate the probability of exceedance for the 34-, 50-, and 64-kt wind speed thresholds using reliability curves, multiplicative bias curves, and threat scores. The results suggest that, notwithstanding some slight biases at low wind speed thresholds, the FHLO-based wind forecasts are skillful and reliable. Pointwise wind forecasts using the FHLO framework are particularly skillful at lead times longer than 3 days, and a combination of FHLO-Super and NHC’s MCP model was shown to have the highest threat scores across all lead times in the Atlantic. These results suggest that it will be important to better characterize the state-dependent uncertainty to continue to improve our long-range forecast skill.

Probabilistic wind speed forecasts combine the uncertainty in both the track and intensity and provide uncertainty quantification of wind speeds at fixed points. As compared to traditional forecast quantities such as the exact storm center or the maximum wind speed anywhere in the storm, pointwise wind speeds are the more relevant TC hazard metric for assessing wind impacts. The uncertainty of wind speeds at a point could also be translated to location-specific vulnerabilities, as is the goal of the Hurricane Risk Calculator (Vigh et al. 2020). Thus, while the traditional track-intensity dichotomy is useful for evaluating and improving our models, it is not the most useful and pertinent metric to the public.

It is also important to stress that large ensembles [*O*(1000) ensemble members] are necessary to provide accurate forecasts of pointwise wind speed probability distributions. This is one of the key advantages of the large-ensemble framework. Furthermore, unlike the model described in DeMaria et al. (2009), the FHLO framework quantifies forecast uncertainty by incorporating the internal variability in the large-scale environment. This is achieved by including uncertainty in the TC’s track, uncertainty in the dynamic and thermodynamic environments, and uncertainty in the initial conditions. The synthetic tracks of the model are generated by sampling from the dispersion among TC tracks of an ensemble NWP model. The dynamic and thermodynamic fields of the ensemble NWP system are also used to generate realistic perturbations to TC-relevant environmental quantities, such as the saturation entropy deficit and vertical wind shear. Initial condition uncertainty is accounted for by incorporating stochastic perturbations to the initial intensity and inner-core moisture.

There is also uncertainty in the observations that are used to evaluate the model. While we evaluate the wind speed probabilities by assuming that all points within the estimated maximum radial extent of a particular wind speed threshold, this may likely not be the case. A better method to evaluate wind speed probabilities is to only use wind observations over land, which are arguably more objective than best track estimates of intensity. Undertaking such an evaluation could be an area of future work.

It is worth discussing the various sources of error and limitations of the FHLO framework. First, while we provide evidence that FAST can reasonably model and forecast TCs, the FAST equations are still an imperfect and idealized representation of the evolution of the intensity of a TC. We expect the intensity module to work best for axisymmetric, surface-flux-driven, mature TCs, and thus less so for tropical disturbances or cyclones that are undergoing extratropical transition. A statistical bias correction could be applied to the intensity component for weak disturbances in light of this issue. Furthermore, in the FAST framework, we consider midlevel ventilation as the only process that dries out the inner core of a TC. The eddy-entropy flux into the eyewall is then approximated to vary linearly with the product of the environmental saturation entropy deficit and the vertical wind shear (Tang and Emanuel 2012). Any departures from this approximation will affect the intensity model. Uncertainty and error can also arise from the methods developed to calculate environmental quantities. The definition of vertical wind shear that is relevant to the TC is problematic, since errors can be introduced through $r*$, as well as the vertical levels by which to calculate shear. In this study, we fixed $r*=400\u2009km$ and used the 250–850-hPa vertical wind shear. These are perhaps not the best approximations, and it may be the case that the “optimal” $r*$ and vertical levels by which to calculate vertical wind shear can vary from storm to storm.

Since TCs are primarily driven by thermodynamic disequilibrium between the sea surface and boundary layer air (Emanuel 1986), accurate observations of the ocean are important to properly calculate environmental quantities, such as the potential intensity. For this study, errors in the estimation of these quantities could have been introduced with the smoothing operators, which were necessary to remove the TC’s influence on the thermodynamic and dynamic fields. Furthermore, we parameterize ocean mixing from upwelling by using climatological mixed layer depth and sub–mixed layer thermal stratification, which effectively smooths out any high-frequency variability in the ocean. This could lead to large errors in intensity forecasts (Emanuel et al. 2004). Future work will incorporate information from a real-time ocean model.

FHLO can also be used to forecast the probability of rapid intensification (RI). While a full evaluation of the skillfulness of the FHLO-based ensemble forecasts to predict RI is left for future work, preliminary evaluation of probabilistic rapid intensification forecasts, regardless of the intensity of the storm, shows some promise (not shown). The evaluation results of the track and intensity forecasts also suggest that combining many more ensemble models, such as the Met Office and Japan Meteorological Agency ensemble systems, could lead to more skillful forecasts, as suggested by Yamaguchi et al. (2012) and Titley et al. (2020). In addition, while we only used one model to predict TC intensity in this study, using a variety of intensity models to generate intensity forecasts may improve the overall model. The only restriction is that the intensity model must be computationally inexpensive. This principle applies equally to the wind field model.

Finally, it is important to note that the skill of the FHLO-based ensemble is derived from the accuracy of the ensemble NWP system. While it is clear that errors in representing the environmental fields will significantly affect the intensity component, errors in representing the TC in the NWP model itself can also affect the FHLO-based ensemble. This can happen when a TC dissipates too early in the ensemble prediction model, such that the track model will generate tracks that also dissipate too early. Though the skill of FHLO is significantly coupled to the quality of the ensemble system, it is likely that the skill of FHLO will progress along with advancements in ensemble NWP models. This is arguably the most intriguing aspect of FHLO: it can be viewed as a framework for bootstrapping an ensemble NWP model.

## Acknowledgments

This work was made possible due to substantial funding support from the NOAA Hurricane Forecast Improvement Project (HFIP) through Grant NA18NWS4680058, entitled “New Frameworks for Predicting Extreme Rapid Intensification.” This work is also based on TIGGE data. TIGGE (The Interactive Grand Global Ensemble) is an initiative of the World Weather Research Programme (WWRP). We thank the NCAR Advanced Study Program Graduate Visitor Program for supporting the first author’s visit to NCAR during Summer 2019. This material is based in part upon work supported by the National Center for Atmospheric Research, which is a major facility sponsored by the National Science Foundation under Cooperative Agreement 1852977. Three anonymous reviewers also provided valuable suggestions that helped to improve the manuscript.

### APPENDIX

#### Model Setup Details

##### a. Inversion on a low-resolution grid

In this study, we remove winds associated with the TC by performing “vortex surgery” on both the ERA5 analysis fields and the forecast ensemble fields, as in Davis et al. (2008). While the analysis fields can be obtained at high grid resolution, the grid resolution (0.5°) of ensemble numerical weather prediction models is inadequate to properly resolve the vorticity maxima of a TC. One way around this is to interpolate between grid points. However, interpolation schemes must always assume some structure about the field (i.e., linear, cubic, etc.). Using spherical harmonics, we can extend the number of basis functions to an arbitrary higher order by assuming that the power spectrum obeys a well-defined power scaling law. Defining *L* as the truncation order of the original field, *L*′ as the truncation order of the higher resolution field, *S*(*l*′) as the desired power at degree *l*′, and *n* as a random coefficient with unit variance, then, the superresoluted field *f* is

where *S* is determined from the power spectrum at the extended spherical harmonics. For simplicity, we use an exponentially decaying fit to the power spectrum and use the fit to extend to higher order of harmonics. Note that this smooth interpolation can also be performed without assuming any structure about the power spectrum, as long as the original resolution is high enough such that the amplitudes of the higher-order spherical harmonics are sufficiently small. In fact, we can even set the power of the higher-order spherical harmonics to zero with relatively little consequence on the resulting environmental wind fields.

##### b. Perturbations to observed intensity

To generate perturbations to the observed intensity, we model the intensity, *V*(*t*), as a Gaussian process with the time-varying observations as the mean and a constant covariance kernel:

where *t*_{1} and *t*_{2} are discrete time points in the observed history, *T* = 1 day, and *σ*^{2} is the variance that represents the uncertainty associated with the observations. We write the Karhunen–Loeve expansion of the intensity as $V\u2061(t)=\mu V\u2061(t)+\u2211i=1n\lambda i\psi i\theta i$, where *n* is the total stochastic dimension, *θ*_{i} is a standard normal random variable, and *λ*_{i} and *ψ*_{i} are the ordered eigenvalues and eigenvectors with respect to the covariance kernel *C*_{V}. We only consider fluctuations of the intensity on time scales of the observations (every 6 h), such that we take *n* = 10 to remove any high-frequency variability. Finally, we take $\sigma \upsilon 2$ as a piecewise constant function:

where *υ* is the observed intensity. These values of *σ*^{2}(*υ*) were chosen as simple approximations to the uncertainty distributions shown in Landsea and Franklin (2013).

## REFERENCES

*Bull. Amer. Meteor. Soc.*

*Mon. Wea. Rev.*

*Geophys. Res. Lett.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*Mon. Wea. Rev.*

*Wea. Forecasting*

*Wea. Forecasting*

*Wea. Forecasting*

*Bull. Amer. Meteor. Soc.*

*J. Atmos. Sci.*

*Mon. Wea. Rev.*

*Wea. Climate Soc.*

*Proc. Natl. Acad. Sci. USA*

*Nat. Hazards*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*Bull. Amer. Meteor. Soc.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Science*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*2016 Fall Meeting*, San Francisco, CA, Amer. Geophys. Union, Abstract NG33D-01

*Quart. J. Roy. Meteor. Soc.*

*Bull. Amer. Meteor. Soc.*

*Wea. Forecasting*

*Wea. Forecasting*

*Mon. Wea. Rev.*

*Climatological Atlas of the World Ocean*

*Wea. Forecasting*

*Quart. J. Roy. Meteor. Soc.*

*Nat. Hazards Rev.*

*Wea. Forecasting*

*Bull. Amer. Meteor. Soc.*

*Tech. Conf. on Hurricanes*, Miami Beach, FL, Amer. Meteor. Soc., D4-1–D4-10

*J. Atmos. Sci.*

*Bull. Amer. Meteor. Soc.*

*J. Atmos. Sci.*

*Bull. Amer. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Wea. Forecasting*

*Quart. J. Roy. Meteor. Soc.*

*Eighth Symp. on Building a Weather-Ready Nation: Enhancing Our Nation’s Readiness, Responsiveness, and Resilience to High Impact Weather Events*

*Mon. Wea. Rev.*

*Statistical Methods in the Atmospheric Sciences*

*Mon. Wea. Rev.*

*Quart. J. Roy. Meteor. Soc.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

## Footnotes

Denotes content that is immediately available upon publication as open access.

^{1}

The National Hurricane Center’s operational version of this model is called the Tropical Cyclone Wind Speed Probabilities Product.