1. Introduction
Global models, such as NOAA’s Global Forecast System (GFS), the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS), and the Met Office (UKMO) Global Unified Model, provide skillful deterministic track forecasts for TCs. Objective performance evaluations show a steady improvement in track forecast skill (Cangialosi 2019), much of which can be attributed to improved performance of these global models. However, these deterministic systems do not provide guidance about the uncertainty in the forecast. Global ensemble systems, such as NOAA’s Global Ensemble Forecast System (GEFS) and ECMWF’s Ensemble Prediction System (EPS), provide probabilistic track forecasts, but lack the horizontal resolution to provide a realistic intensity forecast. Mesoscale ensemble systems, such as the Naval Research Laboratory’s (NRL) recently developed Coupled Ocean–Atmosphere Mesoscale Prediction System-Tropical Cyclone (COAMPS-TC) ensemble, are designed to fill this gap.
The COAMPS-TC deterministic system (Doyle et al. 2012, 2014) is a regional nested model developed by NRL and run operationally at Fleet Numerical Meteorology and Oceanography Center (FNMOC) for the prediction of TC track, intensity, and structure. Details on the COAMPS-TC model configuration are included in section 2. COAMPS-TC is a well-tested model with updates implemented annually, and is used by the National Hurricane Center (NHC), the Joint Typhoon Warning Center (JTWC), and the Department of Defense meteorological and oceanographic (DoD METOC) community. The model is competitive with global deterministic models such as the GFS for track and mesoscale models such as the Hurricane Weather Research and Forecast (HWRF) model for intensity (Doyle et al. 2020). In 2019, COAMPS-TC had the best intensity forecast among all dynamical and statistical models in the Atlantic, bested only by the NHC official forecast, and the second-best 120-h track forecast among dynamical models behind only the ECMWF (Masters 2020).
The COAMPS-TC ensemble is a probabilistic tropical cyclone forecasting system that has been developed to provide operational, real-time guidance to NHC and JTWC. This model accounts for key uncertainties associated with model initial and boundary conditions, and is globally relocatable and configurable in the same way as the deterministic COAMPS-TC model. As will be demonstrated in this paper, the COAMPS-TC ensemble mean track and intensity forecasts are as accurate as or more accurate than those of the deterministic COAMPS-TC model. The ensemble has been developed such that each of the perturbed members represents an equally likely outcome. A characteristic of a well-configured ensemble is to be able to distinguish between high and low uncertainty forecast scenarios. Forecasts associated with greater (lesser) uncertainty should be associated with greater (lower) ensemble spread, as well as higher (lower) ensemble mean error when averaged over a sufficiently large sample. The ability of the COAMPS-TC ensemble to distinguish between high and low uncertainty forecast scenarios will also be discussed in this study.
There have been a number of studies that have employed mesoscale dynamical TC ensemble systems for research purposes. Zhang et al. (2009) found deterministic forecasts initialized from an EnKF analysis can display considerable variability with different lead times, but are capable of predicting the rapid formation and intensification of a hurricane. Torn (2010) found that a 96-member WRF EnKF system run at 36-km horizontal resolution was competitive with the GFDL mesoscale model for TC intensity, but slightly lagged global models in skill for track. Using an improved version of the WRF EnKF ensemble with nested 36-/12-/4-km domains, Torn (2016) compared the effect of atmospheric perturbations, ocean perturbations, and perturbations to the drag Cd and enthalpy Ck exchange coefficients, and the effect on forecast spread in TC intensity. That study found the atmospheric perturbations to provide the greatest contribution to intensity spread. Perturbing Cd was found to be beneficial to intensity spread in the first 48 h of the forecast. Emanuel and Zhang (2017) found errors in tropical cyclone intensity forecasts are dominated by initial-condition errors, as opposed to model error, out to at least a few days. This suggests that perturbations to the initial conditions will be an effective means of producing ensemble variance. Zhang et al. (2014) examined the impact of using GEFS perturbations and a stochastic convective trigger function in generating spread in the HWRF Ensemble Prediction System (HWRF-EPS). They found the HWRF-EPS to be overall underdispersive for track and intensity. While the GEFS perturbations contribute more significantly to spread, the stochastic convective perturbations also appear to be beneficial. Leighton et al. (2018) found the HWRF-EPS to be a useful tool in diagnosing the structural evolution of Hurricane Edouard (2014), comparing the inner-core structure and large-scale environment between rapidly intensifying and nonintensifying ensemble members.
Similarly, Alaka et al. (2019) utilized the HWRF-EPS to discriminate between ensemble members that predicted a U.S. landfall for Hurricane Joaquin (2015) versus members that did not, and to relate differences in the synoptic-scale environment, the TC vortex structure, and the TC location that contributed to these different forecasts.
Previous studies that utilize mesoscale models for TC prediction tend to use either an ensemble Kalman filter (EnKF) based Weather Research and Forecasting (WRF) modeling framework (Zhang et al. 2009; Torn 2010, 2016; Torn and Davis 2012; Weng and Zhang 2012, 2016; Emanuel and Zhang 2017), a variation of the HWRF-EnKF (Aksoy et al. 2012; Aberson et al. 2015; Pu et al. 2016; Lu et al. 2017a,b; Tong et al. 2018), or the HWRF-EPS (Zhang et al. 2014; Leighton et al. 2018; Alaka et al. 2019). An overarching theme in many of these studies is that processes that occur on the mesoscale or on convective scales contribute substantially to TC intensity and structural changes. However, many of these processes are of too small a spatial scale or too short a temporal scale to be reliably captured in a deterministic sense due to short predictability limits at these scales. As such, it is advantageous to represent these processes in a probabilistic spectrum to determine the most likely outcome. With perhaps a few exceptions, the majority of past studies that utilize mesoscale TC ensembles employ a model configuration that works well for a single case study, up to a handful of cases. In general, it is not within the scope of these types of studies to test a large sample for statistical validation. One unique aspect of this study is that the presented results are from an ensemble that is built around a well-tested operational TC model, and that a sufficiently large sample has been tested for statistically meaningful results.
We provide an overview of COAMPS-TC in section 2. The ensemble design is discussed in section 3. The performance of the ensemble is discussed in section 4. Last, a summary is presented in section 5.
2. The COAMPS-TC model
a. Model overview
The COAMPS-TC deterministic system (Doyle et al. 2012, 2014) comprises analysis, initialization, and forecast model subcomponents. The COAMPS-TC atmospheric model features a nonhydrostatic dynamical core. The suite of physical parameterizations for the atmospheric model includes representations for cloud microphysics, boundary layer and free-atmospheric turbulent mixing, surface fluxes, radiation, and deep and shallow convection. COAMPS-TC consists of a fixed outer grid mesh at 36-km horizontal resolution and two storm-following inner grid meshes at 12- and 4-km resolution. There are seven different outer meshes for the various TC basins around the world. The atmospheric model uses 40 vertical levels, with a top near 10 hPa. Initial conditions (ICs) and boundary conditions (BCs) are provided by either the NOAA GFS system or the Navy NAVGEM system. In this paper, we will primarily focus on the configuration using GFS ICs and BCs.
The atmospheric model configuration of the COAMPS-TC ensemble is identical to the 2018 COAMPS-TC deterministic model. For the ensemble, the atmospheric model is currently run in an uncoupled configuration for computational efficiency. Initial-time sea surface temperatures (SSTs) are prescribed from the global model rather than HYCOM. To capture the SST evolution due to upwelling under high-wind conditions, we parameterize the ocean surface cooling rate as a function of the surface wind speed, with cooling only occurring for surface wind speeds ≥ 50 kt (1 kt ≈ 0.51 m s−1). The configuration of the COAMPS-TC atmospheric model and the ensemble is summarized in Table 1.
Table 1. Summary of the COAMPS-TC ensemble configuration.
b. Initialization
As previously mentioned, COAMPS-TC is initialized from global fields from either GFS or NAVGEM. These fields are interpolated from a 0.5° or 0.25° resolution to the COAMPS-TC grid and hydrostatically balanced. This interpolation is performed for the global model analysis, as well as every 6 h of the global model forecast, which is used as lateral boundary conditions for the outermost grid. Currently all COAMPS-TC initializations are cold starts, meaning that we are not running any analysis cycling or data assimilation in the operational system. That said, as will be demonstrated below, our relatively simple system works quite well in terms of accuracy of the track and intensity forecast of the ensemble mean, and in terms of producing meaningful spread about the mean.
A more sophisticated COAMPS-TC analysis and ensemble system that incorporates a cycled EnKF, vortex-scale, and all-sky radiance data assimilation is under development at NRL. Unfortunately, due to time and resource constraints, this system was not yet available to be utilized by the operational system discussed in this study, and will appear in a future study.
For numbered tropical cyclones, a balanced synthetic vortex is inserted in the storm-following 12- and 4-km grid meshes and replaces the global model analysis in these regions. The balanced vortex is based on a modified Rankine radial profile of tangential winds that is fit to the analyzed radius of maximum winds and radius of the 34-kt winds from the operational TC warning message disseminated by JTWC or NHC. The vortex is constructed such that the intensity matches the warning message intensity and contains tangential wind field asymmetries consistent with the warning message TC translational speed and direction. In the vertical, the tangential wind speed is increased by a factor of 1.4 from the surface to 1 km for hurricane-strength TCs and lower factors for weaker TCs. The tangential winds above 1 km decay with increasing height up to a specified height that serves as the top of the vortex. This upper bound for the vortex height increases with intensity. The tangential winds from the vortex are then used to compute the temperature and height fields based on gradient and hydrostatic balance. Outside the inserted balanced vortex, the initial fields are relaxed to the global model analysis.
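The radial wind profile described above can be illustrated with a minimal sketch of a modified Rankine vortex. The text does not give the exact functional form or fitting procedure, so the version below is an assumption: solid-body rotation inside the radius of maximum winds (RMW) and a power-law decay outside it, with the decay exponent chosen so that the profile passes through 34 kt at the analyzed 34-kt wind radius. The function name and arguments are hypothetical, and the asymmetries and vertical structure described in the text are omitted.

```python
import numpy as np

def modified_rankine(r_km, vmax_kt, rmw_km, r34_km):
    """Tangential wind (kt) at radius r_km for a simple modified Rankine vortex.

    Assumes vmax_kt > 34 and r34_km > rmw_km (illustrative sketch only).
    """
    # Decay exponent chosen so the outer profile equals 34 kt at r34_km.
    alpha = np.log(vmax_kt / 34.0) / np.log(r34_km / rmw_km)
    r = np.asarray(r_km, dtype=float)
    # Solid-body rotation inside the RMW.
    inner = vmax_kt * r / rmw_km
    # Power-law decay outside the RMW (np.maximum avoids division by zero at r = 0).
    outer = vmax_kt * (rmw_km / np.maximum(r, rmw_km)) ** alpha
    return np.where(r <= rmw_km, inner, outer)

# Example: a 100-kt TC with a 30-km RMW and a 150-km radius of 34-kt winds.
profile = modified_rankine([0.0, 30.0, 150.0], vmax_kt=100.0, rmw_km=30.0, r34_km=150.0)
```

Fitting the exponent to a single wind radius is only one possible choice; an operational scheme may weight several wind radii or quadrants differently.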
c. Physical parameterizations
Here we highlight a few noteworthy changes to the model physics that have occurred since Doyle et al. (2012, 2014). The drag coefficient for momentum (Cd) has been modified from the Powell et al. (2003), Donelan et al. (2004) and Donelan (2018) relationships, leveling off at 26 m s−1 and decreasing above 42.5 m s−1, as depicted by the red curve in Fig. 1. We refer the reader to Soloviev et al. (2017) for further justification on leveling off and decreasing Cd above a certain wind threshold. The roughness lengths for heat and moisture follow the Tropical Ocean and Global Atmosphere Coupled Ocean–Atmosphere Response Experiment (TOGA COARE; Fairall et al. 2003) formulation.
The convective parameterization has also changed since the Doyle et al. (2014) reference. Subgrid-scale deep convection on the 36- and 12-km meshes is represented using the Kain and Fritsch (1993) parameterization. Since 2018, shallow convection is parameterized on all three meshes following Tiedtke (1989). Results of our testing are generally consistent with Torn and Davis (2012), indicating reduced midlevel temperature bias and slightly improved track forecasts associated with switching to Tiedtke shallow convection. Note that the testing described in section 4 was performed prior to this upgrade, but the forecasts in section 5 include it. For further details on the model dynamics and physical parameterizations, see Doyle et al. (2012, 2014).
3. Ensemble design
a. Overview
The initialization of the COAMPS-TC ensemble is performed in three steps. First, construction of the control (i.e., unperturbed) member initial state is identical to the deterministic system for the atmosphere. The ensemble control member is identical to the deterministic COAMPS-TC, with the exception that the deterministic forecast is fully coupled to the ocean using the Navy Coastal Ocean Model (NCOM) while the ensemble control uses a simple 1D SST cooling parameterization as mentioned in section 2a. Second, the large-scale state from the control member is randomly perturbed to represent analysis uncertainty among the perturbed members. Third, vortex-scale perturbations are added to the control initial TC vortex in the perturbed members. The perturbations are not cycled in time, so the system is cold started for each forecast without the need to spin up the ensemble. This allows for maximum flexibility when prioritizing storms to be simulated while balancing resource constraints, and is in contrast to the HWRF ensemble, which relies upon cycled perturbations from the parent global ensemble, GEFS (Zhang et al. 2014).
b. Synoptic-scale perturbations
The large-scale initial condition perturbations are designed to capture synoptic-scale analysis uncertainty in order to forecast realistic tropical cyclone position spread relative to the model skill at all lead times. These perturbations have been constructed such that they do not require a cycling forecasting system to spin up, and are computationally inexpensive to apply, due to constraints of the operational environment. One drawback of this approach is that perturbations are not flow dependent, and might not grow as quickly during the first few hours of the forecast as those of an ensemble with a cycled data assimilation system. Ideally, a cycled system such as an EnKF will be implemented in future updates as additional resources for real-time runs become available.
Figure 2a shows a single realization of the perturbation field and the standard deviation of potential temperature on model level 20 (~630 hPa) for the western North Pacific basin 36-km domain. Figure 2b demonstrates that the magnitude of the perturbations often varies by latitude, with this example demonstrating large variability of potential temperature in the midlatitudes relative to the tropics. The initial-condition synoptic perturbations are constructed such that their mean is equal to the deterministic analysis, which is our best guess of the initial state of the atmosphere.
c. Lateral boundary perturbations
Collectively, the synoptic-scale initial condition perturbations and the lateral boundary perturbations comprise the “large-scale perturbations.” Example track (Figs. 3a, 4a) and intensity (Figs. 5a, 6a) forecasts produced using only the large-scale perturbations from two TCs are included to demonstrate the utility of these perturbations. Figures 3 and 5 are for Atlantic Hurricane Harvey (2017); Figs. 4 and 6 are for central Pacific Hurricane Lane (2018). It is clear from these forecasts that the large-scale perturbations generate significant spread in track. Track standard deviations at 48 h are 155 km for Harvey (2017) and 104 km for Lane (Table 2), which are similar in size to NHC’s 2/3 probability circles based on error statistics from 2015 to 2019: 128 km for the Atlantic and 120 km for the east Pacific at 48 h (National Hurricane Center 2020).
Table 2. Standard deviation of 48-h track (km) and intensity (kt) forecasts using large-scale perturbations only, vortex/physics perturbations only, and all perturbations. Included are the 0600 UTC 24 Aug 2017 forecast for Hurricane Harvey and the 1200 UTC 21 Aug 2018 forecast for Hurricane Lane.
It should be noted that the large-scale perturbations also result in some meaningful spread in the intensity forecast, particularly beyond ~36 h. This finding is consistent with Torn (2016) and Emanuel and Zhang (2017). The differences in track forecasts result in having the storm in a variety of different environments among the various ensemble members, with some environments more dynamically or thermodynamically favorable to TC intensification than others. It is through this mechanism that initial changes in track lead to changes in intensity at later times. Intensity forecast standard deviation in these experiments at 48 h is 26.7 kt for Harvey and 5.4 kt for Lane. Clearly the 48-h intensity forecast for Harvey was associated with much higher uncertainty than it was for Lane. While the large-scale perturbations appear effective at perturbing the intensity at and beyond 48 h, they do not appear to be sufficient to capture the initial uncertainty in the intensity forecast from 0 to 36 h.
d. Vortex perturbations
To capture uncertainty in the initial state of the tropical cyclone vortex, the COAMPS-TC ensemble system perturbs the initial vortex intensity using a symmetric triangular distribution. The initial intensity is a key parameter of the balanced synthetic vortex, such that perturbing intensity affects the entire structure of the TC vortex inserted into the initial state. A triangular distribution and a Gaussian distribution with a mean of 0.0 kt, a standard deviation of 10.0 kt, and a maximum possible perturbation of 24.5 kt for TCs with an initial intensity ≥ 40 kt are depicted in Fig. 7. The advantage of the triangular distribution is that the maximum perturbation is bounded, whereas perturbations can occasionally be unrealistically large using a Gaussian distribution. Initial tests with the COAMPS-TC ensemble using the Gaussian distribution occasionally resulted in large intensity perturbations which detrimentally impacted the ensemble. In the COAMPS-TC ensemble system the standard deviation of the initial vortex perturbation increases as a function of the control vortex intensity following Landsea and Franklin (2013). The triangular distributions used for the current configuration of the COAMPS-TC ensemble system, along with the maximum perturbation wind speed for each intensity range, are listed in Table 3. Note that the standard deviation of our vortex perturbations is consistent with intensity estimate uncertainty when aircraft reconnaissance data is not available, which is the case for the majority of TCs globally. The ensemble is not currently configured to know whether or not aircraft reconnaissance data was used when NHC or JTWC assigned the initial intensity.
Table 3. Initial vortex intensity triangular distribution.
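The link between the 10.0-kt standard deviation and the 24.5-kt maximum perturbation follows from the geometry of a symmetric triangular distribution, whose standard deviation equals its half-width divided by the square root of 6. The sketch below draws bounded intensity perturbations on that basis. The function names are hypothetical, and the choice of 10 perturbed members (11 members minus the control) is an assumption inferred from the member count quoted in section 4.

```python
import numpy as np

def triangular_half_width(std_kt):
    """Half-width a of a symmetric triangular distribution with std = a / sqrt(6)."""
    return np.sqrt(6.0) * std_kt

def sample_intensity_perturbations(std_kt, n_members, rng):
    """Draw bounded initial intensity perturbations (kt) centered on zero."""
    a = triangular_half_width(std_kt)
    # numpy's triangular(left, mode, right): symmetric about a mode of zero.
    return rng.triangular(-a, 0.0, a, size=n_members)

rng = np.random.default_rng(0)
# Assumed: 10 perturbed members, 10-kt standard deviation.
perts = sample_intensity_perturbations(10.0, 10, rng)
```

With a 10-kt standard deviation, `triangular_half_width` returns about 24.5 kt, matching the bound quoted in the text. A Gaussian with the same standard deviation has no such bound.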
Initial testing of the COAMPS-TC ensemble system showed that the perturbation amplitude in storms with positive initial intensity perturbations tended to decay more rapidly than in storms with negative initial intensity perturbations, with storms with the largest amplitude intensity perturbations decaying the most rapidly. This behavior was partially attributed to the fact that temperature perturbations in the initial balanced TC vortex are directly correlated with the intensity perturbation. Therefore, in the absence of water vapor perturbations, the relative humidity of a vortex with a positive intensity perturbation decreases and the storm is not able to maintain its initial intensity. For this reason, in addition to the initial vortex intensity perturbations, the initial water vapor is adjusted within the vortex to maintain a constant relative humidity with respect to the control vortex.
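A minimal sketch of such a constant-relative-humidity adjustment is given below. It is not the COAMPS-TC implementation: it assumes relative humidity is defined as the ratio of mixing ratio to saturation mixing ratio over liquid water, uses a Magnus-type approximation for saturation vapor pressure (with Bolton's coefficients), and treats pressure as unchanged between the control and perturbed vortex. All function names are hypothetical.

```python
import numpy as np

def saturation_mixing_ratio(temp_k, pres_hpa):
    """Saturation mixing ratio (kg/kg) over liquid water (Magnus-type approximation)."""
    temp_c = np.asarray(temp_k, dtype=float) - 273.15
    es_hpa = 6.112 * np.exp(17.67 * temp_c / (temp_c + 243.5))
    return 0.622 * es_hpa / (pres_hpa - es_hpa)

def adjust_vapor_constant_rh(q_ctrl, temp_ctrl_k, temp_pert_k, pres_hpa):
    """Rescale control water vapor so the perturbed vortex keeps the control RH."""
    # Relative humidity of the control vortex (assumed definition: q / q_sat).
    rh = q_ctrl / saturation_mixing_ratio(temp_ctrl_k, pres_hpa)
    # Same RH applied at the perturbed temperature.
    return rh * saturation_mixing_ratio(temp_pert_k, pres_hpa)

# Example: a 2-K warmer core at 850 hPa needs more vapor to hold RH fixed.
q_new = adjust_vapor_constant_rh(0.015, 300.0, 302.0, 850.0)
```

Without this rescaling, the warmer perturbed vortex would keep the control water vapor and its relative humidity would drop, which is the behavior the text associates with rapid decay of positive intensity perturbations.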
In addition to the initial intensity uncertainty of the TC vortex, there is also uncertainty in the position and structure of the storm. However, testing of the COAMPS-TC ensemble system showed that perturbing either initial position or storm size had a detrimental impact on the evolution of the ensemble forecast. These perturbations tend to collapse over the first 12 h of the forecast. It is hypothesized that these perturbations did not behave in the model as intended because we were not adjusting the TC moisture fields to be consistent with the different vortex positions or sizes. As such, we will continue to test relocating and scaling the size of the initial moisture field to match the perturbed vortex, and the options to perturb initial vortex size and position are disabled in the operational COAMPS-TC ensemble system at this time. However, the vortex intensity perturbations are still enabled in the remainder of experiments in this study as well as the operational ensemble system.
e. Physical parameterization perturbations
Due to the large uncertainty in the value of the momentum exchange coefficient at high wind speeds (Black et al. 2007; Bell et al. 2012) and the sensitivity of tropical cyclone intensity to this “drag coefficient” (Cd; Smith et al. 2014; Torn 2016), a Cd perturbation method was developed for the COAMPS-TC ensemble. In the control member, the function for Cd that decays above a given wind speed is represented by a sixth-order polynomial, which begins decreasing with increasing wind speed above 42.5 m s−1. In the ensemble, we randomly perturb the point at which Cd begins decreasing to be a value between 35 and 53 m s−1, as well as randomly perturb the coefficients of the polynomial to retain a standard deviation within the bounds of Bell et al. (2012). Figure 1 shows the fixed control drag coefficient as well as randomly generated Cd curves for each ensemble member in a single initialization. Once the ensemble initializes, a single Cd curve is used for each member for the entire length of simulation (i.e., perturbed Cd does not change in time). A threshold value of 35 m s−1 was chosen partly because uncertainty in Cd is greatest at higher wind speeds. We also imposed a 35 m s−1 threshold because we wanted to introduce Cd perturbations as a source of intensity spread, particularly for hurricanes, for which we are underdispersive. If the wind threshold is lowered, the area outside the TC subject to the Cd perturbation becomes increasingly large. A small number of tests showed that Cd perturbations at lower wind speeds began to affect the surrounding environment and thereby the track forecast too, and it was decided that this would warrant further study before implementing it. No other physical parameters are perturbed in the ensemble system at this time. However, in the future we hope to explore the impact of a stochastic convective trigger function, as in Zhang et al. (2014).
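The idea of drawing one roll-off wind speed per member and holding the resulting Cd curve fixed for the whole forecast can be sketched as follows. This is an illustration only: the sixth-order polynomial and its coefficient perturbations are not reproduced, a piecewise-linear curve stands in for it, the uniform draw is an assumption (the text says only that the point is perturbed randomly), and every Cd magnitude below is a hypothetical placeholder rather than a value from the model.

```python
import numpy as np

# From the text: Cd levels off at 26 m/s; the roll-off point is perturbed
# within 35-53 m/s (42.5 m/s in the control member).
PLATEAU_START_MS = 26.0
ROLLOFF_RANGE_MS = (35.0, 53.0)

# HYPOTHETICAL placeholder magnitudes, for illustration only (not from the paper).
CD_CALM = 1.0e-3          # drag coefficient at zero wind
CD_PLATEAU = 2.5e-3       # drag coefficient on the plateau
CD_DECAY_PER_MS = 2.0e-5  # linear decrease beyond the roll-off point
CD_FLOOR = 1.0e-3         # lower bound on Cd

def sample_rolloff_speeds(n_members, rng):
    """Draw one roll-off wind speed (m/s) per member (uniform draw is an assumption)."""
    low, high = ROLLOFF_RANGE_MS
    return rng.uniform(low, high, size=n_members)

def drag_coefficient(wind_ms, rolloff_ms):
    """Piecewise-linear stand-in for the polynomial Cd curve."""
    w = np.asarray(wind_ms, dtype=float)
    # Linear rise up to the plateau, then constant.
    rising = CD_CALM + (CD_PLATEAU - CD_CALM) * np.minimum(w, PLATEAU_START_MS) / PLATEAU_START_MS
    # Linear decrease for winds above the member's roll-off point.
    falling = np.maximum(w - rolloff_ms, 0.0) * CD_DECAY_PER_MS
    return np.maximum(rising - falling, CD_FLOOR)

rng = np.random.default_rng(1)
rolloffs = sample_rolloff_speeds(10, rng)  # drawn once, fixed for the forecast
winds = np.arange(0.0, 81.0)
member_cd = [drag_coefficient(winds, u0) for u0 in rolloffs]
```

Because each member's roll-off point is at or above 35 m s−1, the curves in this sketch differ only at hurricane-force winds, which mirrors the stated intent of confining the perturbation to the TC inner core.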
To demonstrate the utility of the vortex perturbations and the physics perturbations (hereafter “vortex/physics perturbations”), we ran the COAMPS-TC ensemble using just the vortex/physics perturbations (with the large-scale perturbations turned off). We group these perturbation types together because they have similar effects on the variance of track and intensity. Track (Figs. 3b, 4b) and intensity (Figs. 5b, 6b) forecasts from this experiment are again shown for Hurricane Harvey (Figs. 3, 5) and Hurricane Lane (Figs. 4, 6). The vortex/physics perturbations contribute to a 48-h track standard deviation of 25 km for Harvey and 16 km for Lane (Table 2). Clearly the vortex/physics perturbations are contributing fairly minimal variance to the track forecast. In contrast, the vortex/physics perturbations add spread to the intensity forecast during the initial 0–36 h, during which time the large-scale-perturbation-only experiments lacked any meaningful intensity spread. By 48 h, the intensity spread generated by the large-scale perturbations is larger than the spread associated with the vortex/physics perturbations for Harvey, while the spread remains larger in the vortex/physics perturbation experiment than in the large-scale perturbation experiment for Lane (Table 2). Note that for Harvey, the best track intensity was revised upward from the operational real-time intensity estimate, so the forecast begins too weak (Fig. 5). However, the intensity forecasts actually captured the intensification rate quite well. Last, the combined effects of applying both large-scale and vortex/physics perturbations are shown to produce the optimal blend of variance in both track and intensity at both early and later lead times (Figs. 3c, 4c, 5c, 6c, Table 2). As such, this is what is done in the operational COAMPS-TC ensemble.
4. Model performance
The COAMPS-TC ensemble is evaluated based upon a set of 181 retrospective forecasts from 2017 using both GFS and NAVGEM ICs and BCs. We use the constraint that 9 out of 11 members must be present at a particular forecast time to compute the ensemble mean. Per ATCF convention, COAMPS-TC initialized off GFS will be hereafter referred to as CTCX and the variant initialized from NAVGEM will be abbreviated COTC. This set of forecasts includes 67 forecasts for 15 western North Pacific TCs, 67 forecasts for 10 Atlantic TCs, and 47 forecasts for 8 eastern North Pacific TCs (Table 4). The forecasts were chosen from 2017 Northern Hemisphere storms, in the basins most pertinent to the customers of the operational system (JTWC and NHC). Longer-lived storms and forecasts with initial times early in a storm’s life cycle were favored for selection in order to maximize the number of validating forecasts at longer lead times. Forecasts are initialized every 24 h for a given storm (versus 6 or 12 h as is done in real time) in order to increase the independence of the test cases. For the evaluation, we will primarily focus on the CTCX variant since that version will be run operationally. There will also be a brief discussion of the COTC ensemble for comparison, as well as the potential advantages of a CTCX/COTC combined ensemble. Last, some statistics for all CTCX ensemble real-time demonstration runs performed from 2014 to 2017 will be shown. Note that minor upgrades to the ensemble were performed annually throughout this timespan, so the error characteristics change (generally improve) with time in the full sample. However, to first order, the ensemble perturbation scheme remained fixed throughout this period.
Table 4. List of storms used for calculating performance and error statistics for the homogeneous sample from 2017.
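The requirement that at least 9 of 11 members be present before an ensemble mean is computed can be sketched as below. The array layout (members along the first axis, lead times along the second, NaN marking a missing member forecast) and the function name are assumptions made for illustration.

```python
import numpy as np

def ensemble_mean(values, min_members=9):
    """Ensemble mean at each lead time; NaN where fewer than min_members are present.

    values: array of shape (n_members, n_lead_times), NaN where a member
    has no forecast (for example, after its storm has dissipated).
    """
    values = np.asarray(values, dtype=float)
    n_valid = np.sum(~np.isnan(values), axis=0)  # members present per lead time
    total = np.nansum(values, axis=0)            # sum over present members
    mean = np.full(values.shape[1], np.nan)
    ok = n_valid >= min_members
    mean[ok] = total[ok] / n_valid[ok]
    return mean

# Example: 11 members, two lead times; three members are missing at the second.
intensity_kt = np.full((11, 2), 50.0)
intensity_kt[:3, 1] = np.nan
mean_kt = ensemble_mean(intensity_kt)  # second lead time is masked (8 members)
```

The same masking would apply to each coordinate of the ensemble mean position, so that track and intensity statistics are built from the same set of valid lead times.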
a. Error characteristics for the CTCX ensemble
We begin by examining the error characteristics for the 2017 version of the CTCX ensemble, comparing the ensemble mean forecast to the unperturbed control member forecast for combined results for all storms in Table 4. Overall, the mean absolute error (MAE) of the ensemble mean position is quite consistent with the MAE of the control (Fig. 8a). While the two lines appear to be almost right on top of each other, improvement of the ensemble mean with respect to the control ranges from −2% (slight degradation) to +4% (improvement), with a weighted average percent improvement of +0.92%. Note that the statistics are computed using 180 cases at the analysis time, which gradually decreases to 74 cases at 120 h as the storm is forecast to dissipate prior to 120 h in many forecasts. As such, the weighted average percent improvement is weighted by number of forecasts valid at each forecast time.
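The case-weighted average described above can be written out explicitly. The sketch assumes percent improvement is defined as the relative reduction in MAE with respect to the control, which is a common convention but is not spelled out in the text; the function names and the numbers in the example are hypothetical.

```python
import numpy as np

def percent_improvement(mae_control, mae_ens_mean):
    """Percent improvement of the ensemble mean over the control (positive is better)."""
    mae_control = np.asarray(mae_control, dtype=float)
    mae_ens_mean = np.asarray(mae_ens_mean, dtype=float)
    return 100.0 * (mae_control - mae_ens_mean) / mae_control

def weighted_average_improvement(mae_control, mae_ens_mean, n_cases):
    """Average the per-lead-time improvement, weighted by the number of valid forecasts."""
    pct = percent_improvement(mae_control, mae_ens_mean)
    return np.average(pct, weights=n_cases)

# Made-up example with two lead times: +2% with 180 cases, -2% with 74 cases.
avg = weighted_average_improvement([100.0, 200.0], [98.0, 204.0], [180, 74])
```

Weighting by sample size keeps the sparsely sampled long lead times from dominating the summary statistic. Lead times where the control MAE is zero would need to be excluded before applying this formula.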
The benefit of running an ensemble versus a deterministic forecast is more dramatic for intensity than it was for track, as evidenced by MAE and the mean error (ME) especially at longer lead times (Fig. 8b). Note that the ME is closer to zero for the ensemble mean, implying a reduction of bias relative to the control. Percent improvement ranges from −2% to +18%, fluctuating with lead time, with a weighted average of 5.82%.
There is some basin dependence of the ensemble mean track and intensity error scores. For track, among the three basins in the test sample, the errors are the lowest in the east Pacific and greatest in the west Pacific for all lead times at and beyond 48 h (Fig. 9a). For intensity, errors are the smallest for short (0–30 h) and longer (78–120 h) lead times in the east Pacific, in contrast to the Atlantic where the errors are smallest at middle (36–72 h) lead times (Fig. 9b). However, consistent with track errors, intensity errors are greatest in the west Pacific at nearly all lead times. These interbasin error characteristics are generally consistent with those of deterministic COAMPS-TC (Doyle et al. 2020). One goal of future model upgrades will be to improve performance in the west Pacific to be more consistent with the other basins. However, the fact that the west Pacific typically has a greater number of intense TCs than the other basins and fewer in situ observations may make this goal difficult to achieve. Also note that the ensemble was run for additional oceanic basins, including in the Southern Hemisphere, the central Pacific, and the Bay of Bengal and Indian Ocean, but there were not enough cases to compute meaningful error statistics. It is possible that in a larger sample, one of these other basins would be associated with greater mean error than the west Pacific, or lesser mean error than the east Pacific or the Atlantic.
b. Spread–skill score and uncertainty discrimination for the CTCX ensemble
Next we examine the spread–skill relationship of the 2017 CTCX ensemble for track and intensity. The ensemble is calibrated such that the spread in the forecast is consistent with the uncertainty in the forecast. Since the uncertainty and associated expected error of the ensemble mean forecast grow with time, the ensemble spread is also expected to increase with time. Ideally, there would be a 1:1 relationship between ensemble spread and ensemble mean error. Additionally, for a well-calibrated ensemble, the mean-squared ensemble-mean error should be equal to the sum of the ensemble forecast error variance and the observation error variance. Accounting for the observation error variance has the effect of increasing the ensemble spread. Most studies do not account for this because the ensemble variance is often much larger than the observation variance, but this is not true for best track intensity, where it has more of an impact. In this study, we account for the observation error when calculating the ensemble spread using the best track error estimates from Landsea and Franklin (2013). For 2017, the track spread–skill relationship for the CTCX ensemble is quite good, with both metrics increasing monotonically very close to a 1:1 relationship (Fig. 10a). Note that α and λ in Eq. (2) were iteratively chosen through experimentation so as to optimize the spread–skill of the track forecast at 120 h.
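The calibration condition stated above, in which the mean-squared error of the ensemble mean equals the ensemble variance plus the observation error variance, can be checked with a short calculation. The sketch below is an assumption about how such a comparison might be coded, not the procedure used in this study: it pools forecasts at a single lead time, uses the unbiased sample variance, and omits any finite-ensemble-size correction. The function names and example numbers are hypothetical.

```python
import numpy as np

def total_spread(members, obs_error_std):
    """Spread including observation error: sqrt(mean ensemble variance + obs variance).

    members: array of shape (n_forecasts, n_members) at one lead time.
    """
    members = np.asarray(members, dtype=float)
    ens_var = np.var(members, axis=1, ddof=1)  # per-forecast ensemble variance
    return np.sqrt(np.mean(ens_var) + obs_error_std ** 2)

def ensemble_mean_rmse(members, verification):
    """RMS error of the ensemble mean against the verifying (best track) values."""
    members = np.asarray(members, dtype=float)
    err = members.mean(axis=1) - np.asarray(verification, dtype=float)
    return np.sqrt(np.mean(err ** 2))

# Made-up intensity example (kt): two forecasts, three members each.
members = [[40.0, 50.0, 60.0], [70.0, 80.0, 90.0]]
spread = total_spread(members, obs_error_std=10.0)
rmse = ensemble_mean_rmse(members, verification=[55.0, 75.0])
```

For a well-calibrated ensemble the two returned quantities would be close at each lead time; a spread smaller than the RMSE indicates an underdispersive ensemble.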
The CTCX ensemble spread for intensity is actually too large at the initial time, but fails to grow much at all during the first 48 h of the forecast (Fig. 10b). By 24 h, the forecast spread is already too small. Ultimately, this results in having a forecast that is quite underdispersive for intensity from 24 to 72 h. This suggests that the perturbations in the ensemble system are not projecting onto the error modes to which intensity is most sensitive. It was found experimentally that perturbing the initial intensity of the vortex contributes to some intensity spread that remains through 120 h. However, perturbing the size of the RMW or the position of the center of the vortex were found to produce a weak bias in the intensity forecast while doing little to increase the spread, and are therefore not used in the current model configuration, as discussed in section 3d. Currently the only physical parameter that is perturbed is the drag coefficient. In future updates, we seek to increase intensity spread in the 24–72-h period by perturbing tendencies in the moist physics, and perhaps also by perturbing the initial moisture field in the vortex. Interestingly, the RMS error of the ensemble mean is quite steady from 30 to 120 h into the forecast, while the intensity spread actually does gradually grow with time. This results in having a spread–skill relationship that is relatively well calibrated from 96 to 120 h.
Monotonically increasing spread–skill relationships demonstrate that the spread in the ensemble increases with forecast lead time as the mean error grows, as expected. However, a slightly more stringent criterion against which ensembles are also evaluated is the system’s ability to discriminate between high-uncertainty, high-error and low-uncertainty, low-error forecast scenarios at a particular lead time. To evaluate uncertainty discrimination, the spread of the ensemble about its mean is compared against the error of the ensemble mean, and binned into equally sized spread quartiles. In Fig. 11, a red dot represents each quartile bin, and an equal number of forecasts are associated with each red dot on any given subpanel. Each subpanel column represents a different forecast time: 24, 72, and 120 h, with uncertainty discrimination for track on the top row and for intensity on the bottom row. The red lines indicate the linear least squares fit to the data, and the black lines indicate the 1:1 relationship. The linear least squares fit should at the very least have a positive slope, as a basic first-order test of the competency of the ensemble to discriminate different forecast scenarios. The ensemble is considered well-constructed if the slope of the least squares line connecting the error quartile points is close to 1 (matching the slope of the 1:1 line). Further details on this approach and the theory behind it are described in Grimit and Mass (2007), with applications of this technique for tropical cyclone prediction described in Komaromi and Majumdar (2014, 2015).
As previously stated, the ensemble is considered well-constructed if the slope of the least squares fit to the error quartiles is close to 1. For the CTCX ensemble, the slope ranges between 0.6 and 1.2 at 24, 72, and 120 h for both track and intensity (Fig. 11), indicating reasonably good uncertainty discrimination. For track, there are mean error bins both above and below the 1:1 line at all lead times, demonstrating that the relationship between spread and error is appropriate at all lead times. As shown earlier (Fig. 8b), the ensemble intensity forecast is nearly unbiased. The uncertainty discrimination diagrams for intensity have a slope near 1, which implies that more uncertain (higher mean error) cases have higher intensity spread, while less uncertain (lower mean error) cases have lower intensity spread, as desired. The best-fit line being well above the 1:1 line implies that there is too little spread for all error bins, both for low-uncertainty and high-uncertainty cases. In other words, the ensemble is underdispersive for intensity at all lead times, consistent with earlier results. However, the underdispersive nature of the intensity forecasts aside, the uncertainty discrimination is quite impressive at 24 and 72 h, with a slope very close to 1.
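The uncertainty discrimination procedure described above can be sketched as follows for a single lead time. This is a minimal illustration under stated assumptions, not the code used to produce Fig. 11: the function name and inputs are hypothetical, the bin value is taken to be the mean ensemble-mean error in each spread quartile, and the fit is an ordinary least squares line through the four bin points.

```python
import numpy as np


def uncertainty_discrimination(spread, error, n_bins=4):
    """Bin cases by ensemble spread and fit a line to the binned errors.

    spread: array (n_cases,) of ensemble spread about the ensemble mean
    error:  array (n_cases,) of ensemble-mean error magnitude
    Returns (bin_spread, bin_error, slope, intercept).
    """
    spread = np.asarray(spread, dtype=float)
    error = np.asarray(error, dtype=float)

    # Sort cases by spread, then split into bins of (nearly) equal size
    order = np.argsort(spread)
    groups = np.array_split(order, n_bins)

    # One point per bin: mean spread versus mean error
    bin_spread = np.array([spread[g].mean() for g in groups])
    bin_error = np.array([error[g].mean() for g in groups])

    # Linear least squares fit; a slope near 1 parallels the 1:1 line
    slope, intercept = np.polyfit(bin_spread, bin_error, 1)
    return bin_spread, bin_error, slope, intercept


# Hypothetical data in which error is exactly twice the spread
demo_spread = np.arange(1.0, 9.0)
bs, be, slope, intercept = uncertainty_discrimination(demo_spread,
                                                      2.0 * demo_spread)
print(bs, be, slope, intercept)
```

A positive slope is the first-order test mentioned above, while the offset of the fitted line from the 1:1 line indicates whether the ensemble is under- or overdispersive across the bins.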
c. The COTC ensemble and the combined CTCX/COTC ensemble
Recall that while the CTCX ensemble uses GFS for ICs/BCs, the COTC ensemble uses NAVGEM for ICs/BCs. For track, the CTCX ensemble mean is clearly more skillful than the COTC ensemble mean at all lead times (Fig. 12a). This is consistent with the fact that deterministic CTCX is more skillful than deterministic COTC, with the improvement of the CTCX ensemble mean over the COTC ensemble mean being quite similar to the improvement of deterministic CTCX over deterministic COTC (Doyle et al. 2020). We have also experimented with generating a combined ensemble composed of five COTC members and six CTCX members, which produces ensemble mean track errors much closer to those of the CTCX ensemble than to those of the COTC ensemble, albeit still slightly inferior to the CTCX ensemble mean alone (Fig. 12a). Overall, the combined CTCX/COTC ensemble degrades the track error by about 4% versus the CTCX ensemble.
For intensity, the COTC ensemble mean (Fig. 12b) is associated with 1–2 kt greater error than the CTCX ensemble at most lead times. However, the CTCX/COTC ensemble mean actually outperforms either variant at virtually all lead times. Percent improvement of the combined ensemble with respect to the CTCX ensemble ranges from −5% to +20%, with a mean improvement of +8%, which corresponds to a fairly substantial reduction in intensity forecast error. Additionally, the spread–skill relationships for track (Fig. 13a) and intensity (Fig. 13b) for the CTCX/COTC ensemble are both superior to those for the CTCX ensemble (Fig. 10). For track, the mean spread and error at each lead time are nearly identical: the mean 12–120-h spread–skill ratio (see footnote 3) is 1.03 for the CTCX ensemble and 1.01 for the CTCX/COTC ensemble. For intensity, while the combined ensemble is still underdispersive at most lead times, the spread–skill relationship is slightly improved versus the CTCX ensemble alone (Fig. 10b), with a spread–skill ratio of 0.89 for the CTCX/COTC ensemble versus 0.86 for the CTCX ensemble. A ratio < 1.0 indicates underdispersion while a ratio > 1.0 indicates overdispersion. The CTCX/COTC ensemble is closer to 1.0 for both position and intensity.
Presently the CTCX/COTC combined ensemble is not run operationally. This ensemble would also likely violate our desire to have each ensemble member depict an equally likely scenario. However, since operational centers including NHC and JTWC are increasingly using multimodel ensembles to make forecasts, and since the intensity and spread–skill results both showed promise, this is an avenue that will perhaps be further explored in the future.
d. The full 2014–17 real-time results
The CTCX ensemble underwent a series of upgrades during the 2014–17 developmental stage, while also being run experimentally in real time. While the 2017 sample is large with 180 cases, the full 2014–17 sample is an order of magnitude larger with 1893 cases. Overall track MAE is consistent between the 2014–17 and just the 2017 samples, with slightly over 100 n mi mean error in the ensemble mean at 72 h and slightly over 200 n mi error at 120 h (Fig. 14a). As was the case with just the 2017 sample, the ensemble mean track is about 2%–3% more skillful than the control member, on average (Fig. 15a). For intensity, results from the 2014–17 sample are also generally consistent with the 2017 sample alone, with errors growing rapidly during the first 48 h before gradually saturating by 120 h (Fig. 14b). The 2017 version is associated with about 1–2 kt lower error, on average, perhaps suggestive of model improvement during the multiyear period (Fig. 9b). It is also possible that the TCs in 2017 were generally “easier” to forecast. For both samples, the ensemble mean is meaningfully more skillful than the control, with 1–2.5 kt lower MAE (Fig. 15b). For the 2014–17 sample, the ensemble mean intensity MAE is approximately 5% better than that of the control at early lead times increasing to 10% at later lead times.
There exists some basin-to-basin variability in the performance of the ensemble over the 2014–17 period. For track, the ensemble consistently had better performance in the Atlantic and the east Pacific than in the west Pacific beyond 36 h (Fig. 14a). For intensity, the COAMPS-TC ensemble performed the best in the Atlantic at most lead times, did generally the worst in the west Pacific, and scored somewhere in between in the east Pacific (Fig. 14b). In the west Pacific, there is likely a causal relationship between the track and intensity errors, where having the storm in the wrong location results in also having the wrong environment and incorrect timing of landfall, compounding intensity errors. These results generally follow the performance of the deterministic CTCX system (Doyle et al. 2020).
5. NRL’s real-time experimental 21-member ensemble
In addition to the 11-member operational version of the COAMPS-TC ensemble currently running at FNMOC, an experimental 21-member version with modified perturbations and physics is also currently running on select storms at NRL. An example forecast from the NRL experimental ensemble for Hurricane Dorian produced at 1200 UTC 29 August 2019 is shown (Fig. 16). While the ensemble control and mean track forecasts were slightly right of the verifying track, the verifying track remained within the 2/3 probability ellipses (the larger ellipses) at each lead time, which are plotted in 24-h increments (Fig. 16a). Note that the uncertainty in the track forecast for Dorian was unusually high at this time, with the 120-h 1/3 and 2/3 probability ellipses much larger than the mean in our testing sample. A 24-h intensity change forecast product for the same Dorian forecast is also shown, indicating high probabilities of intensification at most forecast times (Fig. 16b). The probability of moderate intensification (10–29 kt in 24 h) peaks at ~90% from 18 to 42 h, with the probability of rapid intensification (≥30 kt in 24 h) nearly reaching 20% in the 72–96- and 84–108-h time windows.
An assortment of other forecasts run with NRL’s 21-member experimental ensemble are shown (Fig. 17). Among these forecasts, Typhoon Faxai (Fig. 17a) and Cyclone Kyarr (Fig. 17f) were particularly noteworthy TCs, as Faxai was one of the strongest typhoons to cross Tokyo Bay over the last 50 years and Kyarr was an unusual high-end category 4 TC in the Arabian Sea. Note that for Faxai, Jerry, and Lorenzo, the track uncertainty was generally quite low (Figs. 17a,b,d), while the track uncertainty was comparatively greater for Karen, Mitag, and Kyarr (Figs. 17c,e,f). NRL’s COAMPS-TC ensemble forecasts are available online at https://www.nrlmry.navy.mil/coamps-web/web/ens. In addition to these real-time forecasts, the NRL version will be used as a test bed for future upgrades to the operational ensemble.
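A 24-h intensity change probability product of the kind shown for Dorian could be derived from ensemble output along the following lines. This is a sketch only: it assumes that the probability is the fraction of members falling in each category, that member intensities are available at a fixed 6-h output interval, and that the thresholds are those quoted above (10–29 kt for moderate and ≥30 kt for rapid intensification). The function name and arguments are hypothetical, and the actual product may be constructed differently.

```python
import numpy as np


def intensity_change_probabilities(vmax, hours_per_step=6, window_h=24):
    """Fraction of members in each 24-h intensification category.

    vmax: array (n_members, n_times) of maximum wind speed in kt,
          at a fixed output interval of hours_per_step (an assumption)
    Returns a dict of arrays, one value per 24-h window start time.
    """
    vmax = np.asarray(vmax, dtype=float)
    k = window_h // hours_per_step  # number of output steps per window
    if k < 1 or vmax.shape[1] <= k:
        raise ValueError("forecast is too short for the requested window")

    # Intensity change over each window, for every member
    change = vmax[:, k:] - vmax[:, :-k]

    return {
        # Moderate intensification: at least 10 kt but less than 30 kt
        "moderate": np.mean((change >= 10) & (change < 30), axis=0),
        # Rapid intensification: 30 kt or more
        "rapid": np.mean(change >= 30, axis=0),
    }


# Hypothetical four-member forecast with one 24-h window (0-24 h)
demo = [[50, 50, 50, 50, 65],
        [50, 50, 50, 50, 85],
        [50, 50, 50, 50, 55],
        [50, 50, 50, 50, 95]]
print(intensity_change_probabilities(demo))
```

With an 11- or 21-member ensemble, the resulting probabilities are resolved only in steps of 1/11 or 1/21, which is one reason a larger experimental ensemble may be useful for such products.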
6. Summary
In this manuscript, we have provided an overview of the development and performance of the COAMPS-TC ensemble prediction system. The currently operational COAMPS-TC ensemble consists of one control CTCX member (initialized from the GFS) and 10 perturbed members, comprising an 11-member ensemble. Perturbations are made to the large-scale initial flow fields, the initial vortex intensity, forecast lateral boundary conditions, and the surface drag coefficient. The ensemble system is currently running at 36-/12-/4-km horizontal resolution (see footnote 4) with the two inner domains being vortex following, consistent with the operational version of COAMPS-TC.
Statistics have been presented for a 181-case retrospective sample from the 2017 Atlantic, east Pacific, and west Pacific basins using a fixed version of the model. A few statistics from 1893 real-time forecasts from 2014 to 2017 were also shown, in order to demonstrate that similar error characteristics generally hold over a much larger sample. Verification of the ensemble relative to best track data revealed that the COAMPS-TC ensemble-mean prediction of TC position and intensity was superior to the deterministic control simulation at nearly every lead time to 120 h. For example, the CTCX ensemble mean prediction of intensity was on average 6% better than that of the deterministic control run.
The spread–skill relationship of the COAMPS-TC ensemble for track forecasts was shown to be extremely well calibrated, with the two values nearly equivalent at all lead times. For intensity, both spread and skill increase with increasing forecast lead time. However, intensity spread significantly underestimates the error at most lead times, especially from 24 to 72 h. The ability of the ensemble to discriminate forecast uncertainty at a given lead time was also investigated. It was shown that the ensemble spread is positively correlated with ensemble mean track and intensity forecast error magnitudes at all lead times. As such, the ensemble discriminates among forecast cases with relatively large and relatively small amounts of uncertainty.
The utility of having a combined ensemble with some members running off GFS ICs/BCs (CTCX) and other members running off NAVGEM (COTC) was demonstrated. While the combined ensemble exhibited a slight degradation in track forecast skill versus the CTCX ensemble, the combined ensemble produced the best mean intensity forecast and a superior spread–skill relationship for both track and intensity. NRL will continue exploring the advantages and disadvantages, as well as the operational feasibility of running the CTCX/COTC combined ensemble in real time.
We conclude that the COAMPS-TC ensemble system adds great value to the prediction of tropical cyclones for JTWC and NHC, both in terms of deterministic prediction of track and intensity using the ensemble mean and probabilistic prediction of forecast uncertainty using the ensemble spread. In the future, NRL will continue developing the ensemble to improve track and intensity forecasts with time. The real-time 21-member ensemble run at NRL will serve as a test bed for potential transitions to FNMOC operations further down the road. We will explore physics tendency perturbations, in particular for our parameterized cumulus and boundary layer schemes, in an attempt to improve the spread–skill relationship for intensity. A cycled EnKF that incorporates vortex-scale and all-sky radiance data assimilation is also under development, and will hopefully replace NRL’s experimental ensemble and begin real-time testing in the near future. Similar to the work of Leighton et al. (2018) and Alaka et al. (2019) using the HWRF-EPS, the COAMPS-TC ensemble has also proven to be a useful tool for predictability studies, with a follow-on paper already in the works. We seek to continue leveraging this system both for producing operational forecasts and for scientific study.
Acknowledgments
We acknowledge support by the Chief of Naval Research through the NRL Base Program (PE 61153N), the Office of Naval Research (ONR) Tropical Cyclone Intensity DRI (PE 0601153N), (ONR) Rapid Transition Program (PE 0602435N), and N2N6E (PE 0603207N). We also acknowledge support from the National Oceanic and Atmospheric Administration (NOAA) sponsored Hurricane Forecast Improvement Project (HFIP). We appreciate support for computational resources through a grant of Department of Defense High Performance Computing time from the DoD Supercomputing Resource Center at Stennis, MS, and Vicksburg, MS. COAMPS-TC is a registered trademark of the Naval Research Laboratory.
REFERENCES
Aberson, S. D., A. Aksoy, K. J. Sellwood, T. Vukicevic, and Z. Zhang, 2015: Assimilation of high-resolution tropical cyclone observations with an ensemble Kalman filter using HEDAS: Evaluation of 2008–11 HWRF forecasts. Mon. Wea. Rev., 143, 511–523, https://doi.org/10.1175/MWR-D-14-00138.1.
Aksoy, A., S. Lorsolo, T. Vukicevic, K. J. Sellwood, S. D. Aberson, and F. Zhang, 2012: The HWRF Hurricane Ensemble Data Assimilation System (HEDAS) for high-resolution data: The impact of airborne Doppler radar observations in an OSSE. Mon. Wea. Rev., 140, 1843–1862, https://doi.org/10.1175/MWR-D-11-00212.1.
Alaka, G. J., X. Zhang, S. G. Gopalakrishnan, Z. Zhang, F. D. Marks, and R. Atlas, 2019: Track uncertainty in high-resolution HWRF ensemble forecasts of Hurricane Joaquin. Wea. Forecasting, 34, 1889–1908, https://doi.org/10.1175/WAF-D-19-0028.1.
Barker, D. M., and Coauthors, 2012: The Weather Research and Forecasting Model’s Community Variational/Ensemble DA System: WRFDA. Bull. Amer. Meteor. Soc., 93, 831–843, https://doi.org/10.1175/BAMS-D-11-00167.1.
Bell, M. M., M. T. Montgomery, and K. A. Emanuel, 2012: Air–sea enthalpy and momentum exchange at major hurricane wind speeds observed during CBLAST. J. Atmos. Sci., 69, 3197–3222, https://doi.org/10.1175/JAS-D-11-0276.1.
Black, P. G., and Coauthors, 2007: Air–sea exchange in hurricanes: Synthesis of observations from the Coupled Boundary Layer Air–Sea Transfer Experiment. Bull. Amer. Meteor. Soc., 88, 357–374, https://doi.org/10.1175/BAMS-88-3-357.
Bougeault, P., 1985: The diurnal cycle of the marine stratocumulus layer: A higher-order model study. J. Atmos. Sci., 42, 2826–2843, https://doi.org/10.1175/1520-0469(1985)042<2826:TDCOTM>2.0.CO;2.
Cangialosi, J. P., 2019: 2018 hurricane season. National Hurricane Center Forecast Verification Rep., 73 pp., https://www.nhc.noaa.gov/verification/pdfs/Verification_2018.pdf.
Donelan, M. A., 2018: On the decrease of the oceanic drag coefficient in high winds. J. Geophys. Res. Oceans, 123, 1485–1501, https://doi.org/10.1002/2017JC013394.
Donelan, M. A., B. K. Haus, N. Reul, W. J. Plant, M. Stiassnie, H. C. Graber, O. B. Brown, and E. S. Saltzman, 2004: On the limiting aerodynamic roughness of the ocean in very strong winds. Geophys. Res. Lett., 31, L18306, https://doi.org/10.1029/2004GL019460.
Doyle, J. D., and Coauthors, 2012: Real time tropical cyclone prediction using COAMPS-TC. Advances in Geosciences, C.-C. Wu and J. Gan, Eds., Vol. 28, World Scientific Publishing Company, 15–28.
Doyle, J. D., and Coauthors, 2014: Tropical cyclone prediction using COAMPS-TC. Oceanography, 27, 104–115, https://doi.org/10.5670/oceanog.2014.72.
Doyle, J. D., and Coauthors, 2020: Recent progress and challenges in tropical cyclone intensity prediction using COAMPS-TC. Tropical Meteorology and Tropical Cyclones Symp., Boston, MA, Amer. Meteor. Soc., 1.1, https://ams.confex.com/ams/2020Annual/meetingapp.cgi/Paper/363334.
Emanuel, K. A., and F. Zhang, 2017: The role of inner-core moisture in tropical cyclone predictability and practical forecast skill. J. Atmos. Sci., 74, 2315–2324, https://doi.org/10.1175/JAS-D-17-0008.1.
Fairall, C., E. Bradley, J. Hare, A. Grachev, and J. Edson, 2003: Bulk parameterization of air–sea fluxes: Updates and verification for the COARE algorithm. J. Climate, 16, 571–591, https://doi.org/10.1175/1520-0442(2003)016<0571:BPOASF>2.0.CO;2.
Fu, Q., and K.-N. Liou, 1993: Parameterization of the radiative properties of cirrus clouds. J. Atmos. Sci., 50, 2008–2025, https://doi.org/10.1175/1520-0469(1993)050<2008:POTRPO>2.0.CO;2.
Grimit, E. P., and C. F. Mass, 2007: Measuring the ensemble spread–error relationship with a probabilistic approach: Stochastic ensemble results. Mon. Wea. Rev., 135, 203–221, https://doi.org/10.1175/MWR3262.1.
Jin, Y., W. T. Thompson, S. Wang, and C.-S. Liou, 2007: A numerical study of the effect of dissipative heating on tropical cyclone intensity. Wea. Forecasting, 22, 950–966, https://doi.org/10.1175/WAF1028.1.
Kain, J. S., and J. M. Fritsch, 1993: Convective parameterization for mesoscale models: The Kain–Fritsch scheme. The Representation of Cumulus Convection in Numerical Models, Meteor. Monogr., No. 46, Amer. Meteor. Soc., 165–170.
Komaromi, W. A., and S. J. Majumdar, 2014: Ensemble-based error and predictability metrics associated with tropical cyclogenesis. Part I: Basinwide perspective. Mon. Wea. Rev., 142, 2879–2898, https://doi.org/10.1175/MWR-D-13-00370.1.
Komaromi, W. A., and S. J. Majumdar, 2015: Ensemble-based error and predictability metrics associated with tropical cyclogenesis. Part II: Wave-relative framework. Mon. Wea. Rev., 143, 1665–1686, https://doi.org/10.1175/MWR-D-14-00286.1.
Landsea, C. W., and J. L. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. Mon. Wea. Rev., 141, 3576–3592, https://doi.org/10.1175/MWR-D-12-00254.1.
Leighton, H., S. Gopalakrishnan, J. A. Zhang, R. F. Rogers, Z. Zhang, and V. Tallapragada, 2018: Azimuthal distribution of deep convection, environmental factors, and tropical cyclone rapid intensification: A perspective from HWRF ensemble forecasts of Hurricane Edouard (2014). J. Atmos. Sci., 75, 275–295, https://doi.org/10.1175/JAS-D-17-0171.1.
Louis, J. F., 1979: A parametric model of vertical eddy fluxes in the atmosphere. Bound.-Layer Meteor., 17, 187–202, https://doi.org/10.1007/BF00117978.
Lu, X., X. Wang, Y. Li, M. Tong, and X. Ma, 2017a: GSI-based ensemble-variational hybrid data assimilation for HWRF for hurricane initialization and prediction: Impact of various error covariances for airborne radar observation assimilation. Quart. J. Roy. Meteor. Soc., 143, 223–239, https://doi.org/10.1002/qj.2914.
Lu, X., X. Wang, M. Tong, and V. Tallapragada, 2017b: GSI-based, continuously cycled, dual-resolution hybrid ensemble–variational data assimilation system for HWRF: System description and experiments with Edouard (2014). Mon. Wea. Rev., 145, 4877–4898, https://doi.org/10.1175/MWR-D-17-0068.1.
Masters, J., 2020: The most reliable hurricane models, according to their 2019 performance. Yale Climate Connections, accessed 3 September 2020, https://yaleclimateconnections.org/2020/08/the-most-reliable-hurricane-models-according-to-their-2019-performance/.
Mellor, G. L., and T. Yamada, 1982: Development of a turbulent closure model for geophysical fluid problems. Rev. Geophys. Space Phys., 20, 851–875.
Moskaitis, J. R., P. A. Reinecke, W. A. Komaromi, and J. D. Doyle, 2017: 2016 Real-time COAMPS-TC ensemble. 2017 HFIP Annual Review Meeting, Monterey, CA, Naval Research Laboratory, http://www.hfip.org/events/annual_meeting_jan_2017/presentations/Day1/1120AM-Moskaitis-COAMPS-TC-Ensemble.pdf.
National Hurricane Center, 2020: Definition of the NHC Track Forecast Cone. Accessed 29 September 2020, https://www.nhc.noaa.gov/aboutcone.shtml.
Parrish, D. F., and J. C. Derber, 1992: The National Meteorological Center’s spectral statistical-interpolation analysis system. Mon. Wea. Rev., 120, 1747–1763, https://doi.org/10.1175/1520-0493(1992)120<1747:TNMCSS>2.0.CO;2.
Powell, M. D., P. J. Vickery, and T. A. Reinhold, 2003: Reduced drag coefficient for high wind speeds in tropical cyclones. Nature, 422, 279–283, https://doi.org/10.1038/nature01481.
Pu, Z., S. Zhang, M. Tong, and V. Tallapragada, 2016: Influence of the self-consistent regional ensemble background error covariance on hurricane inner-core data assimilation with the GSI-based hybrid system for HWRF. J. Atmos. Sci., 73, 4911–4925, https://doi.org/10.1175/JAS-D-16-0017.1.
Rutledge, S. A., and P. V. Hobbs, 1983: The mesoscale and microscale structure of organization of clouds and precipitation in midlatitude cyclones. VIII: A model for the “seeder-feeder” process in warm-frontal rainbands. J. Atmos. Sci., 40, 1185–1206, https://doi.org/10.1175/1520-0469(1983)040<1185:TMAMSA>2.0.CO;2.
Smith, R. K., M. T. Montgomery, and G. L. Thomsen, 2014: Sensitivity of tropical-cyclone models to the surface drag coefficient in different boundary-layer schemes. Quart. J. Roy. Meteor. Soc., 140, 792–804, https://doi.org/10.1002/qj.2057.
Soloviev, A. V., R. Lukas, M. A. Donelan, B. K. Haus, and I. Ginis, 2017: Is the state of the air-sea interface a factor in rapid intensification and rapid decline of tropical cyclones? J. Geophys. Res. Oceans, 122, 10 174–10 183, https://doi.org/10.1002/2017JC013435.
Tiedtke, M., 1989: A comprehensive mass flux scheme for cumulus parametrization in large-scale models. Mon. Wea. Rev., 117, 1779–1800, https://doi.org/10.1175/1520-0493(1989)117<1779:ACMFSF>2.0.CO;2.
Tong, M., and Coauthors, 2018: Impact of assimilating aircraft reconnaissance observations on tropical cyclone initialization and prediction using operational HWRF and GSI ensemble–variational hybrid data assimilation. Mon. Wea. Rev., 146, 4155–4177, https://doi.org/10.1175/MWR-D-17-0380.1.
Torn, R. D., 2010: Performance of a mesoscale ensemble Kalman filter (EnKF) during the NOAA high-resolution hurricane test. Mon. Wea. Rev., 138, 4375–4392, https://doi.org/10.1175/2010MWR3361.1.
Torn, R. D., 2016: Evaluation of atmosphere and ocean initial condition uncertainty and stochastic exchange coefficients on ensemble tropical cyclone intensity forecasts. Mon. Wea. Rev., 144, 3487–3506, https://doi.org/10.1175/MWR-D-16-0108.1.
Torn, R. D., and C. A. Davis, 2012: The influence of shallow convection on tropical cyclone track forecasts. Mon. Wea. Rev., 140, 2188–2197, https://doi.org/10.1175/MWR-D-11-00246.1.
Torn, R. D., G. J. Hakim, and C. Snyder, 2006: Boundary conditions for limited-area ensemble Kalman filters. Mon. Wea. Rev., 134, 2490–2502, https://doi.org/10.1175/MWR3187.1.
Wang, S., Q. Wang, and J. Doyle, 2002: Some improvement of Louis surface flux parameterization. Preprints, 15th Symp. on Boundary Layers and Turbulence, Wageningen, Netherlands, Amer. Meteor. Soc., 547–550.
Weng, Y., and F. Zhang, 2012: Assimilating airborne Doppler radar observations with an ensemble Kalman filter for convection-permitting hurricane initialization and prediction: Katrina (2005). Mon. Wea. Rev., 140, 841–859, https://doi.org/10.1175/2011MWR3602.1.
Weng, Y., and F. Zhang, 2016: Advances in convection-permitting tropical cyclone analysis and prediction through EnKF assimilation of reconnaissance aircraft observations. J. Meteor. Soc. Japan, 94, 345–358, https://doi.org/10.2151/jmsj.2016-018.
Zhang, F., Y. Weng, J. A. Sippel, Z. Meng, and C. H. Bishop, 2009: Cloud-resolving hurricane initialization and prediction through assimilation of Doppler radar observations with an ensemble Kalman filter: Humberto (2007). Mon. Wea. Rev., 137, 2105–2125, https://doi.org/10.1175/2009MWR2645.1.
Zhang, Z., V. Tallapragada, C. Kieu, S. Trahan, and W. Wang, 2014: HWRF based ensemble prediction system using perturbations from GEFS and stochastic convective trigger function. Trop. Cyclone Res. Rev., 3, 145–161, https://doi.org/10.6057/2014TCRR03.02.
Footnote 1: The HWRF ensemble is well tested and widely used, but not operational.
Footnote 2: Note that for both of these cases, there are more ensemble members to the right than to the left of the verifying best track. This is consistent with the ensemble (and our deterministic system) exhibiting a slight right bias overall.
Footnote 3: We exclude the earliest lead times since the spinup period artificially inflates the spread–skill ratio.
Footnote 4: An earlier version of the ensemble was run at 27-/9-/3-km horizontal resolution configuration during testing, as presented at HFIP (Moskaitis et al. 2017). However, due to resource limitations and the desire to streamline the deterministic and ensemble systems, a decision was made to subsequently run both systems at 36-/12-/4-km resolution in operations.