1. Introduction
Evaluating the performance of tropical cyclone (TC) forecast models is a challenging problem. Traditionally, TC model performance is examined from a pointlike perspective in which a single set of forecast values pertaining to TCs, including the maximum 10-m wind (VMAX), the minimum sea level pressure (PMIN), and the location of the TC center, is compared against observations provided by the operational hurricane forecast centers in the form of best track data. These metrics are important as they directly measure the destructive power of TCs and can be estimated quite accurately from different sources of observations (see, e.g., Black et al. 1997; Bell et al. 2000; Emanuel 2005, 2008; Webster et al. 2005; Marks et al. 2008; Rogers et al. 2012). For example, dropsonde or airborne measurements from reconnaissance flights can produce independent estimates of the PMIN and VMAX for the verification, while the location of the TC center can be estimated routinely from satellite images.
On the model development level and for detailed evaluation of model performance, simple use of the VMAX or track error metrics does not, however, represent the entire skill of a TC modeling system. A comprehensive appreciation of the TC model skill can be achieved only through the full structural verification such as vertical cross sections, wind radii, storm depth, vertical tilting, or spiral rainbands (Powell and Reinhold 2007; Drake 2012; Irish et al. 2008; Schenkel and Hart 2012; Bao et al. 2012). These structural characteristics are critical components for TC models as they have direct impact on many downstream applications such as storm surge forecasts or wave models. A recent study by Bao et al. (2012) showed that several vertical structure metrics as well as the dynamical consistency implied by the pressure–wind relationship obtained from idealized experiments with the Hurricane Weather Research and Forecasting Model (HWRF) could indeed provide useful insight into the overall performance of the model.
With an increasing number of TC observation platforms and enhancements such as airborne Doppler instruments or imaging radiometers, structural metrics become more important not only for better understanding of TC behavior but also for various needs of operational forecasting and model implementations. Several key questions specific to the performance of TC models that are of particular interest to explore are the following: (i) How realistically is storm size captured in the HWRF? (ii) How do the balance constraints between the mass and the wind fields in HWRF compare to observations? (iii) How well is the planetary boundary layer represented in the model? (iv) How much asymmetry would a storm possess under different shear environments? These questions are of importance for future improvements of any model that should be made for better TC forecasts.
From the operational point of view, the most difficult issue with the structural assessment of TC models is the lack of systematic real-time observations, especially within the inner-core region where extreme wind conditions over a very small area prevent current observing systems from obtaining detailed analysis of finescale processes and structures. Although the structural verifications based on different two-dimensional (2D) vertical or horizontal cross sections are the most exhaustive approach, it is hard in an operational environment to evaluate such 2D structures in real time because the 2D observations are typically not available except when the radar or flight reconnaissance data are accessible. In an ad hoc approach, one can carry out the structural verification for one or two TCs for which comprehensive sets of observations from radars, satellites, or airborne instruments are available. However, verification of only a few cases in general cannot produce representative statistics for the entire season. Temporal discontinuity, spatial inhomogeneity, or observational inaccuracy within the inner-core region of the various observing systems makes verification difficult to perform routinely in an operational setting.
At present, the operational real-time estimates of TC information contain a number of structural parameters including the radius of maximum wind (RMW); the 34-, 50-, and 64-kt (1 kt = 0.5144 m s−1) radii in four different quadrants; the radius of the outermost closed isobar; and the vertical depth of TC central region (specified as shallow, medium, or deep). However, verification of the model performance based on these additional storm size parameters is limited. In fact, these additional pieces of information have not been used thoroughly to evaluate hurricane model performance, mainly because such structural information contains considerable observational uncertainties such that verifications of selected cases have not yet provided results with any statistical significance.
In this study, we wish to focus on the verification of the newly implemented HWRF for the 2012 hurricane season (hereafter referred to as the H212 configuration) relative to the 2011 operational version (hereafter HOPS), based on several criteria including the wind radii verification, the dynamical consistency implied by the pressure–wind relationship, and the initial 6-h changes of VMAX and PMIN. Because of the lack of a continuous observational dataset for evaluating the 2D structures, we will focus on the wind radii information in addition to the traditional track and intensity errors as a proxy to probe the TC structure improvements of the new H212 implementation. While the wind radii information is not sufficient to fully measure the complete structure of TCs, the radii metrics can measure fairly well the relative difference in the storm size statistics between two model configurations when the sample is sufficiently large. A more complete approach would require using other sources of observations to verify different vertical cross sections of the forecasts. However, there are not enough of these datasets to develop extensive statistics of the different HWRF configurations, and therefore we limit our evaluations only to the above wind radii metrics in this study.
The rest of this paper is organized as follows. In the next section, an overview of the new FY2012 HWRF implementation (H212) and of the observational data sources is provided. Sections 3 and 4 present detailed results of different structural verifications. A summary and some conclusions are given in the final section.
2. Experiment descriptions
a. The 2012 HWRF implementation (H212)
In this study, the 2012 operational HWRF was evaluated for implementation during the 2012 hurricane season (Tallapragada et al. 2012). Several upgrades were made to the 2011 operational HWRF system (HOPS) as a result of major model development efforts supported by the National Oceanic and Atmospheric Administration (NOAA) Hurricane Forecast Improvement Project (HFIP; Gall et al. 2012). The HWRF system and related physics are designed specifically for TC forecasting, and consist of several major components including the WRF software infrastructure, the Nonhydrostatic Mesoscale Model (NMM) dynamic core (Janjic 2003; Janjic et al. 2001, 2010), the three-dimensional Princeton Ocean Model (POM) for the Atlantic basin, the one-dimensional POM for the eastern Pacific basin (Yablonsky and Ginis 2009), the National Centers for Environmental Prediction (NCEP) atmosphere–ocean coupler (Gopalakrishnan et al. 2012), and the vortex tracker (Marchok 2002). The H212 configuration is a new high-resolution HWRF with a triple-nest capability, based on the latest community version of the WRF-NMM, version 3.4a. The H212 configuration consists of three nested domains with horizontal grid resolution of 27, 9, and 3 km, respectively. The outermost domain is fixed in time, while the 9- and 3-km domains move with the storm center using a storm tracking algorithm (Trahan 2012). This tracking algorithm is based on a new centroid method that is essential for the high-resolution grids to efficiently and accurately identify and track TCs, and contributes to improved track and intensity forecasts as will be seen in section 4.
The H212 model domains are configured with 43 vertical hybrid pressure-sigma levels (Sangster 1960; Arakawa and Lamb 1977) in the rotated latitude–longitude coordinate with 216 × 432 grid points in the (x, y) direction for the 27-km parent grid, 88 × 170 grid points for the 9-km intermediate grid, and 154 × 272 grid points for the 3-km innermost grid, respectively. The H212 model configuration has a number of important physics upgrades consisting of modifications to the NCEP Global Forecast System (GFS) planetary boundary layer (PBL) based on observational findings (Gopalakrishnan et al. 2012; Zhang et al. 2012), improved Geophysical Fluid Dynamics Laboratory (GFDL) surface physics, improved Ferrier microphysics (Ferrier 1994), and implementation of the new GFS shallow convective parameterization (Hong and Pan 1996). The cloud-permitting 3-km nest is configured to explicitly resolve convection in the inner core of the TCs to be consistent with the higher spatial resolution grids. Lateral boundary conditions are taken from the GFS forecasts and updated every 6 h. The cumulus parameterization is called every 3 min with no call to the convective scheme for the innermost domain. In addition to the physics upgrades, another important component of the HWRF that generates the model initial vortex has been entirely redesigned for the 3-km domain, with improved interpolation algorithms, and storm size/intensity correction procedures (Gopalakrishnan et al. 2012; Liu et al. 2012). The improved vortex initialization component includes corrections to the storm size and to the three-dimensional vortex structure based on observed parameters including RMW, the averaged radius of 34-kt wind (R34), the radius of outermost closed isobar, VMAX, and PMIN. Each of these corrections requires careful rebalancing between the model winds, temperature, pressure, and moisture fields. 
Unlike the HOPS initialization, in which a single size parameter (i.e., RMW) was used for correcting the storm size, H212 required two size parameters: the radius of the outermost closed isobar (or the radius of the averaged 34-kt wind if the vortex is of hurricane strength) and RMW. This storm size correction is applied for both the composite vortex (cold start) and the cycled vortex (warm start) whenever the model storm size obtained from the vortex tracker (Marchok 2002) and the observed radii information do not match. Here R34 at each quadrant is the radial distance from the storm center at which the maximum sustained surface wind speed (using the 1-min average) is 34 kt (and similarly for the 50- and 64-kt radii). More technical details of the new H212 initialization can be found in Gopalakrishnan et al. (2012) and Liu et al. (2012).
The H212 configuration is fully coupled to a three-dimensional ocean model in the Atlantic basin and to a one-dimensional ocean model in the eastern Pacific basin. Assimilation of observational data is based on the community gridpoint statistical interpolation (GSI; Kleist et al. 2009) version V3.4. Data assimilation through HWRF GSI was performed for all storms in H212 instead of only for deep storms as was done in HOPS. It should be noted that no inner-core observations are assimilated in H212. For all experiments described in this report, GSI was used to assimilate only conventional data (in the prepbufr format) in the TC environment. The inner-core observations are not used mostly because of the potential negative impacts on vortex structure due to the static nature of the background error covariances used in the GSI three-dimensional variational data assimilation (3DVAR), which cannot take into account the flow-dependent characteristics required for data assimilation in the TC core region. Our experiments with different inner-core data assimilations showed that assimilation of the conventional data after performing vortex initialization tends to generate larger initial imbalance since the vortex is already close to a balanced state as imposed by the vortex initialization procedure (Liu et al. 2012). The balance constraints built in the GSI may not fully capture TC characteristics, and so assimilating the inner-core data with the GSI could lead to bigger imbalance at the meso- or smaller scales. However, there are plans to assimilate inner-core data whenever such data from airborne reconnaissance missions that target specific hurricanes are available operationally in real time. Usually the amount of airborne data available in real time is limited; the aircraft-based reconnaissance missions are conducted only when a storm poses a high potential threat to the U.S. mainland.
More advanced ensemble Kalman filter (EnKF)–based hybrid ensemble-variational data assimilation techniques are now available in operational GSI (Wang et al. 2013; Li et al. 2013), enabling us to conduct more rigorous inner-core airborne data assimilation sensitivity experiments. The impact of airborne data assimilation on HWRF forecast skills will be presented in our upcoming study.
b. Model data
To have results that could ensure statistical significance for the operational model evaluation, experiments with the H212 model were conducted retrospectively for the 2010–11 hurricane seasons for all storms in both the North Atlantic (NATL) and the eastern North Pacific (EPAC) basins. Model forecasts were run out to 126 h throughout the life cycle of every storm at every 6-h interval. For the model input data, HWRF was initialized with the GFS spectral outputs available at the native resolution of T574L64. Observational data used for assimilation through the HWRF GSI include radiosonde, aircraft reports, surface ship and buoy observations, surface land observations, pilot balloon (pibal) winds, wind profilers, dropsondes, and scatterometer winds. The time window for collecting the observational data was set between ±1.5 h of synoptic time centered at 0000, 0600, 1200, or 1800 UTC. To avoid potential negative impacts of the assimilation of conventional data in the HWRF-GSI system near the TC region, the observational data were reduced within the 3-km domain. Specifically, all inner-core (within 111 km of the TC center) data including the surface synoptic observations (SYNOP), aviation routine weather report (METAR) or surface marine data, the Autonomous Temperature Line Acquisition System (ATLAS) buoy, surface mesonet, and synthetic hurricane data were excluded.
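The data selection described above (a ±1.5-h window around the synoptic time and exclusion of data within 111 km of the TC center) can be illustrated with a minimal Python sketch. This is not the HWRF-GSI code: the function names, the dictionary layout of an observation, the spherical-Earth haversine distance, and the choice to treat a distance of exactly 111 km as inner core are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_obs(obs, analysis_time, tc_lat, tc_lon,
               half_window_h=1.5, core_radius_km=111.0):
    """Keep observations inside the time window and outside the inner core.

    Each observation is a dict with hypothetical keys "time" (datetime),
    "lat", and "lon" (degrees).
    """
    window = timedelta(hours=half_window_h)
    kept = []
    for ob in obs:
        in_window = abs(ob["time"] - analysis_time) <= window
        dist = great_circle_km(ob["lat"], ob["lon"], tc_lat, tc_lon)
        if in_window and dist > core_radius_km:
            kept.append(ob)
    return kept
```

With a list of such dictionaries, `select_obs(obs, datetime(2011, 8, 22, 18), tc_lat, tc_lon)` would return only the observations retained for the 1800 UTC cycle under these assumptions.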
For verification, the postprocessed best track data (bdeck) in the Automated Tropical Cyclone Forecasting (ATCF) format provided by National Hurricane Center (NHC) were used exclusively, along with model-generated tracker outputs (adeck). These datasets provide full information including storm center, VMAX, PMIN, RMW, and radius at 34-, 50-, and 64-kt thresholds from observations and model output every 6 h. While there are some uncertainties in estimating different radius information, the goal of this study is to compare relative performance between the operational version (HOPS) and H212. Provided that the model forecast radii errors and the observational radii errors have no cross correlation, potential inaccuracies of the radius information in the best track data are expected to be the same in both HOPS and H212 and thus are of secondary importance.
3. Verification of storm structure
To first have some general picture of the performance of the new HWRF implementation, Fig. 1 shows the retrospective statistics of the mean track forecast errors, intensity errors, and intensity bias of H212 as compared to HOPS for the NATL and EPAC 2010–11 hurricane seasons. Here, the mean errors are defined as the sample average of the differences between the forecasted and the best track values for the entire 2010–11 seasons, independently for NATL and EPAC basins. While there are some potential issues with the robustness and resistance of such linear averaging, it is expected that the impacts of possible outliers are much reduced with ~500 forecast cycles during the two hurricane seasons in each ocean basin.
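As a minimal sketch of these sample-average statistics, the Python functions below compute a mean absolute error and a mean signed error (bias) over paired forecast and best track values. Treating the "intensity error" as a mean absolute difference and the "bias" as a mean signed difference is an assumption of this sketch, as are the function names; the text only states that the errors are sample averages of forecast-minus-best-track differences.

```python
def mean_abs_error(forecast, observed):
    """Sample mean of |forecast - observed| over paired values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


def mean_bias(forecast, observed):
    """Sample mean of (forecast - observed); negative means underestimation."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)
```

In practice these would be evaluated separately for each lead time and each basin, with one forecast and best track pair per verifying cycle.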
One notices in Fig. 1 that there are significant improvements in the track forecast errors in the NATL basin at lead times >72 h and in the intensity forecast errors beyond 48 h in the EPAC basin with the H212 configuration. While the improvements of the intensity forecasts and bias in the NATL basin are not significant at the 95% confidence level, the reduction of the track forecast error in the NATL basin is noticeable, with the 5-day track errors decreased ~18% relative to HOPS. In relation to the reduction of the intensity forecast errors, the biases are also improved between 12 and 72 h in the NATL basin, although the improvement is not statistically significant at the 95% confidence level. The reduction in bias is noteworthy as any intensity improvement in terms of the absolute errors alone may not be statistically significant, especially when the intensity errors are within the current uncertainty in TC intensity measurements (Torn and Snyder 2012; Landsea and Franklin 2013). For the EPAC basin, an opposite picture is obtained with the most significant improvement obtained for intensity forecasts rather than the track forecasts, with 4- and 5-day intensity errors reduced from 28 and 34 kt to roughly 21 and 20 kt, respectively. Note, however, that the H212 intensity bias in EPAC is larger than that for HOPS at all lead times up to 4 days (Fig. 1f). Stratification of the storm statistics according to the storm initial intensity shows that the worse bias in EPAC is mostly caused by the underestimation of the storm intensity for the weak storms (not shown). While the underestimation of the storm intensity for weak TCs and overestimation of the storm intensity for the strong storms also exist in HOPS (cf. Figs. 8b,d), the larger improvement of H212 for strong storms than for weaker storms appears to have an adverse impact on the overall bias statistics in the EPAC basin as seen in Fig. 1f.
The improvement of the intensity forecast errors in the EPAC with the H212 configuration is remarkable as various reports have shown that the improvement in track forecasting should be larger than that of intensity forecasting (see e.g., DeMaria et al. 2007; DeMaria 2010; NHC). In fact, reports of the intensity and track forecast skill from NHC have shown that there has been virtually no change in intensity forecast errors over the last 30 years despite more than a 50% reduction in track forecast errors. This smaller reduction of the intensity errors as compared to that of the track error is indeed realized in the NATL basin (Fig. 1), which may be related partly to the deficient model physics such as the boundary layer representation, surface layer schemes, or cumulus parameterization. However, the much larger improvement of the intensity forecast in the EPAC appears to indicate that the benefit of the higher resolution and ocean coupling in the H212 configuration is maximized in this basin, as the EPAC storms are less influenced by nearby land interaction. In contrast, the complex topographic interactions in the NATL basin could have masked some of the benefit of the higher resolution and improved physics such that even better track forecasts could not help reduce intensity forecast errors. While such drastic intensity error reductions obtained with the H212 configuration may not directly translate to improved official forecasts, they at least indicate that the H212 configuration can contribute to improved guidance on intensity forecasts in future.
Given the overall improved performance of the H212 compared to HOPS, a question of interest is how the H212 configuration will perform in terms of the storm structure and internal dynamical consistency. As mentioned in the previous section, so far the statistics of most operational models have been examined based on the common metrics associated with track and intensity errors. These metrics are useful but do not necessarily reflect various improvements in storm structure. With reduced track and intensity forecast errors in H212, it is desirable to examine how the structure of TCs in H212 compares to that in HOPS. Specifically, we wish to examine if the storm size, an important indicator for the storm horizontal structure, as well as the dynamical constraints implied by the pressure–wind relationship of H212 are improved relative to HOPS.
a. Radii verification
Figure 2 shows the verification of the mean 34- (R34), 50- (R50), and 64-kt (R64) radii for the NATL and EPAC basins for the 2010–11 hurricane seasons. Here, the R34 at each quadrant is defined as the radius at which the mean tangential wind is equal to 34 kt (and similarly for R50 and R64), and the mean radii are obtained by taking an average of the radii in four different quadrants. It is apparent in Fig. 2 that improvements of the various storm radii are substantial for both basins. Except for the R34 in the NATL basin, the overall reduction in the errors is nearly 80% at all forecast lead times, particularly for R50 and R64. Such a large reduction of the strong wind radii errors indicates that the H212 storms were able to more realistically capture the inner-core region of the storms. This is expected as storm structure and finescale processes are believed to be better resolved with higher resolution. As an illustration of the difference in the storm structure between HOPS and H212, Fig. 3 compares the vertical cross sections of the meridional wind of Hurricane Irene (2011) for a snapshot valid at the initial time 1800 UTC 22 August 2011 and another snapshot at the end of the 5-day forecast. Although VMAX was matched with observation in both configurations at the initial time, it is apparent that the H212 has a more compact inner core with the regime of high tangential flow extending from the surface up to the 700-hPa level and the eye diameter less than 80 km. In contrast, the HOPS vortex has a significantly larger RMW with the inner core tilting more to the east with height. Further expansion of the flow field in the azimuthal direction for this specific cycle shows that the northern sector has the most significant difference in the storm structure relative to HOPS (Fig. 4).
Although the average radii difference appears to be small in this illustration, it is apparent that the new vortex initialization could help take into account different storm structure at different quadrants effectively, in a way that the HOPS configuration could not achieve. Such remarkable changes in the storm structure are observed not only at the initial time but are also consistently preserved during most of the model integrations as seen in this example, which highlights the deficiency of the single VMAX metric in quantifying the model performance.
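The radius definition used in this subsection (the radius at which the wind equals a threshold in each quadrant, followed by a four-quadrant average) can be sketched in Python as follows. This is an illustrative sketch, not the operational tracker: the function names, the use of the outermost crossing, the linear interpolation between radial samples, and the choice to average only over quadrants where the threshold is reached are all assumptions.

```python
def wind_radius(radii, winds, threshold_kt):
    """Outermost radius at which the wind drops through threshold_kt.

    radii  : increasing radial distances (e.g., n mi) for one quadrant
    winds  : wind speed (kt) at each radius
    Returns None when the threshold is never reached, or when the wind is
    still at or above the threshold at the outermost sample.
    """
    if winds[-1] >= threshold_kt:
        return None  # the radius lies beyond the sampled profile
    for i in range(len(radii) - 1, 0, -1):
        w_in, w_out = winds[i - 1], winds[i]
        if w_in >= threshold_kt > w_out:
            # Linear interpolation between the two bracketing samples.
            frac = (w_in - threshold_kt) / (w_in - w_out)
            return radii[i - 1] + frac * (radii[i] - radii[i - 1])
    return None


def mean_radius(quadrant_radii):
    """Average over the quadrants in which a radius is defined."""
    valid = [r for r in quadrant_radii if r is not None]
    return sum(valid) / len(valid) if valid else None
```

Calling `wind_radius` once per quadrant with thresholds of 34, 50, and 64 kt and passing the four results to `mean_radius` would give mean R34, R50, and R64 values comparable in spirit to those verified in Fig. 2.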
The radii error reduction is more apparent in the EPAC basin, which exhibits better storm size statistics for both the outer- (i.e., the 34-kt radius) and the inner-core regions (i.e., the 50- and 64-kt radii). In particular, the R34 error in the EPAC basin shows more than a 95% reduction. For instance, the initial R34 error is reduced from 35 n mi (1 n mi = 1.852 km) to less than 1 n mi in the EPAC basin, whereas the R34 initial error is still roughly 8 n mi in the NATL basin despite the fact that the R34 is used as an input parameter for the vortex initialization in the H212 configuration. The inconsistency of the R34 error in the NATL basin arises mostly because the current initialization algorithm makes use of a combination of the radius of the outermost closed isobar (ROCI) and R34 (when available), and the initial model R34 might not truly match the observed value except for strong storms, for which the R34 is used directly during the construction of the bogus vortex. As the bogus vortex is rescaled to fit the observed intensity, the RMW, and the ROCI, the horizontal structure of the bogus storm appears to be deformed too much, and this leads to inconsistency between the observed R34 and the bogus vortex (Liu et al. 2012; Gopalakrishnan et al. 2012). This is particularly clear for strong storms in the NATL basin, for which the radial rescaling of the storm size based on the ROCI reported in the TC vitals (the real-time estimate of best track data) and the RMW could result in too much deformation of the initial storm size (cf. also Figs. 5 and 6). In addition to this rescaling issue, the 34-kt radii used for vortex initialization from the TC vitals data are slightly different from the postprocessed best track data used in verifying the model results. As such, the 34-kt radius in the NATL basin has some initial errors that cannot be entirely eliminated. In the EPAC basin, the situation is somewhat different as storms tend to be smaller in size.
Therefore, the vortex initialization does not stretch the storm structure too much initially after the size correction, leading to smaller R34 initial errors in the EPAC basin. Furthermore, it is likely that there may have been more variability in the 34-kt wind radii for NATL TCs since there is much more in situ data with which to analyze the radii, such as aircraft, ship/buoy, and land stations. This could lead to more adjustments to the wind radii in the postanalysis relative to the EPAC where there is very little in situ data. Thus, the verified radii data in the NATL basin may have larger inconsistency as compared to the EPAC basin.
Although the radii verification shows an overall improvement for all quadrants, an intriguing issue with the NATL storms is that R34 keeps contracting with time in H212, rendering the overall storm size too small compared to the observations near the end of the 5-day forecasts (Fig. 2). This also happens with HOPS, but is more pronounced with H212. In the EPAC basin, the smaller size of the storms appears to help reduce such rapid development of grid-scale convection within the main circulation, and storm size error does not grow with time. Sensitivity experiments with the newly upgraded simplified Arakawa–Schubert convective parameterization scheme for mesoscale applications (known as MesoSAS; H.-L. Pan 2013, personal communication) showed that the 34-kt radius verification is improved with time in both EPAC and NATL with essentially no shrinking of the 34-kt radius in the forecasts.
For the inner-core radii verification, we found that the R50 and R64 errors are also reduced substantially for both basins, although the percentage of reduction is not as large as for R34. Analysis of each quadrant shows that the southwest quadrant persistently has the best verification as compared with observation, though the reasons for this remain unclear. This may be related to the storm asymmetry that tends to skew toward the northern side of the storm (i.e., larger radii in the northern sectors) due to a number of factors such as the beta effect, the translation speed of most storms, or the dominant westerly shear over the NATL and EPAC basins. As a result, the wind radii are better defined from observations in these quadrants. The substantial improvements of the storm radii in both the outer- and inner-core regions indicate that the storm structure has been incorporated into and maintained in the H212 configuration to a degree that the previous HWRF operational versions could not achieve. Better initial storm size statistics in the H212 configuration together with higher-resolution dynamics and more consistent physics applicable at 3-km horizontal resolution appear to significantly improve the performance of the new implementation.
It should be noted that structural improvements are closely connected to the storm intensity forecast errors because it is not likely for a TC to possess the intensity of category 5 with an inner-core size of 100 km. Various statistical and modeling studies (see, e.g., Knaff and Zehr 2007; Holland 2008) have shown that the storm size and the intensity have a close connection. In this regard, the relationship between the storm intensity (maximum wind speed at 10 m) and storm size is an additional aspect of the storm structural information that can be used to evaluate model performance. Previous studies (e.g., Willoughby and Rahn 2004; Knaff and Zehr 2007; Holland 2007; Kieu et al. 2010) showed that in general there is some connection between the storm intensity and the horizontal structure. The revised pressure–wind relationship proposed by Knaff and Zehr (2007) contains explicit size dependence that can be used to stratify this relationship for different storm structures.
Figure 5 shows the scatterplots of VMAX against normalized R34 for the H212 and HOPS configurations along with the best track data. One notices first from the best track data that there is a significant connection between VMAX and the R34 with a positive correlation of ~0.63 in the NATL basin and ~0.70 in the EPAC basin. These positive correlations imply that storms with larger R34 tend to possess higher intensity. Comparison of the H212 and HOPS scatterplots in Fig. 5 shows that both configurations can capture the intensity-size dependence of storms quite effectively with the overall positive correlation similar to the observed fit. However, HOPS tends to overestimate R34 given the same VMAX in both basins. This overestimation is most apparent in the high wind regime, for which HOPS produced larger R34, especially in the EPAC basin. Such differences between HOPS and the best track data apparently indicate some difficulty in the model dynamics, presumably due to insufficient resolution. In contrast, H212 exhibits a well-marked consistency despite slight underestimation in the weak intensity regime, and the overall best-fit pattern shows a striking improvement. For strong wind speeds, R34 shows much improved correlation with the storm intensity; the correlation between VMAX and R34 in H212 is 0.68 in the NATL basin and 0.72 in the EPAC basin. In addition, the overestimation of the storm size in the high wind regime in both basins tends to be alleviated in H212 as compared to HOPS. Slopes of the best-fit curves for H212 and best track data are similar, suggesting the improved contribution of the dynamics in the H212 configuration.
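The correlation and best-fit slope discussed here can be computed with a short Python sketch such as the one below. It assumes a Pearson correlation coefficient and an ordinary least squares straight-line fit; the text does not specify the form of the best-fit curves or the normalization applied to R34, so both the functions and their names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Applying `pearson_r` and `fit_line` to the (R34, VMAX) pairs of each dataset (best track, HOPS, H212) in each basin would give correlations and slopes that can be compared across configurations in the manner of Fig. 5.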
To further evaluate storm size statistics, radii verifications are stratified based on the storm initial intensity for both basins (Figs. 6 and 7). Here, a storm is defined to be strong (weak) if its initial postanalysis best track VMAX is greater (less) than 50 kt. This 50-kt threshold is somewhat arbitrary but is acceptable as the 50-kt value can distinguish the strong and weak tropical storms. One can see in Fig. 7 that there are no significant changes in the statistics for strong storms as compared to the overall statistics (Fig. 2). The initial radius errors are very similar to those observed in the statistics for the entire season, indicating that the initially strong storms dominate the storm size errors. However, for weak storms, there is a noticeable change. Eliminating the strong storms actually results in somewhat similar statistics of the 34-kt radii in the NATL basin but leads to somewhat larger storm size errors in the EPAC as compared to all-storm statistics (Fig. 6). While the degraded statistics for the weak storms are anticipated as it is generally difficult to predict storm sizes for nonorganized systems, it is of interest to see that weaker storms in the NATL basin have smaller R34 errors at the initial time. Detailed examination of different quadrants shows that the western sector of the NATL storms has the smallest storm size errors while the eastern quadrants have a slightly larger size as compared to observations. Part of the reason for this could be that most of the weak systems develop over the open ocean, where such measurements of storm size could be challenging. Consistent with previous discussions, weaker systems tend to have less deformation in terms of horizontal structure during the vortex initialization step, leading to better R34 statistics at all lead times.
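The stratification by initial intensity can be sketched in Python as below. The record layout (dictionary keys), the function names, and the treatment of a storm with VMAX exactly equal to 50 kt (placed in the weak group here, since the text leaves that case unspecified) are assumptions of this sketch.

```python
def stratify_by_initial_vmax(cycles, threshold_kt=50.0):
    """Split forecast cycles into strong and weak groups.

    Each cycle is a dict with a hypothetical key "initial_vmax_kt" holding
    the best track VMAX at the initial time.
    """
    groups = {"strong": [], "weak": []}
    for cycle in cycles:
        key = "strong" if cycle["initial_vmax_kt"] > threshold_kt else "weak"
        groups[key].append(cycle)
    return groups


def mean_abs_radius_error(cycles):
    """Mean |model R34 - best track R34| over a group, or None if empty."""
    errors = [abs(c["model_r34_nmi"] - c["best_r34_nmi"]) for c in cycles]
    return sum(errors) / len(errors) if errors else None
```

Evaluating `mean_abs_radius_error` on each group returned by `stratify_by_initial_vmax` yields separate strong-storm and weak-storm statistics of the kind shown in Figs. 6 and 7.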
From a model development perspective, the significant storm size improvement seen in Figs. 2–6 is a result of multiple upgrades in the H212 implementation that include a new vortex initialization, use of the higher-resolution third nest (at 3-km grid resolution), better PBL parameterization, as well as several critical bug fixes related to nest movement. It is hard, if at all possible, to attribute the improvement of the storm size to any particular upgrade. However, we speculate that the new vortex initialization method and higher resolution could be the main factors, because the better storm size statistics obtained with the H212 configuration are fairly consistent across a range of experiments conducted at NCEP/EMC during the preimplementation testing and evaluation (not shown). In principle, one can isolate more concretely which factor is the main cause of the storm size improvement by either using the higher-resolution configuration with the old vortex initialization, or implementing the new vortex initialization in the old HWRF configuration. However, the modified vortex initialization is particularly tailored for 3-km resolution, making it impossible to separately evaluate the impact of either higher resolution or better vortex initialization. Besides, the preimplementation test plan did not have enough room for testing each individual factor, since these two changes were required at the same time.
It should be noted that while the radii verification shows promising improvement, there are some uncertainties in calculating different wind radii from the model output. Unlike in idealized situations, real storms often possess significant asymmetry, making it hard to estimate the radial profile of the tangential wind. To eliminate potentially erroneous radii associated with nearby systems, the current H212 radii calculation algorithm imposes an upper bound on the estimated R34 (351 n mi in the current system; see, e.g., Marchok 2002). As such, a very large storm may occasionally have its R34 radii truncated. As more than 95% of the R34 data values are smaller than this upper bound, it is expected that this artificial upper bound should have minimal impact on the robustness of the radii statistics obtained thus far.
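A minimal Python sketch of such a bound, and of the check on how many values fall below it, is given here. It is not the tracker's actual code: the function names and the sample radii are hypothetical, and only the 351 n mi value is taken from the text above.

```python
R34_CAP_NMI = 351.0  # upper bound quoted in the text (n mi)

def cap_radius(r34_nmi, cap=R34_CAP_NMI):
    """Truncate an estimated 34-kt radius at the upper bound."""
    return min(r34_nmi, cap)

def fraction_below(values, cap=R34_CAP_NMI):
    """Fraction of radii strictly smaller than the bound (unaffected by it)."""
    return sum(v < cap for v in values) / len(values)

# Hypothetical raw R34 estimates in n mi (not from the paper).
raw = [120.0, 200.0, 351.0, 400.0, 90.0]
capped = [cap_radius(v) for v in raw]
print(capped, fraction_below(raw))
```

Reporting the fraction of values below the bound alongside the error statistics makes it easy to judge whether the truncation could influence the results for a given sample.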
b. Pressure–wind relationship
The relationship between the model-forecasted PMIN and VMAX is another important measurement for internal consistency of the model dynamics that the simple statistics of VMAX or PMIN errors cannot provide. This is because such pressure–wind relationships (PWR) can characterize a more general internal structure associated with the mass and wind distribution, especially within the inner-core region where the gradient wind balance is well approximated. For example, it would be highly unexpected for a model to produce a VMAX of 50 m s−1 with a corresponding PMIN of 1000 hPa. As discussed in Bao et al. (2012), this PWR is sensitive to different PBL and other physics schemes. As such, the degree to which the model PWR can compare to the observed PWR is a good indicator of the consistent dynamics and structure of TCs.
Results from previous operational HWRF versions showed that the model tended to possess a bias in the PWR in which VMAX was often significantly stronger than observations given the same PMIN (Liu 2009). There are several reasons for such overestimation of VMAX, including the unconstrained PMIN in the vortex initialization procedure, coarser horizontal resolution, inadequate representation of PBL, or unrealistic specification of momentum exchange coefficients in the surface physics. With significant reduction in the track, intensity forecast errors, and the storm size noted for H212 in previous sections, it is of interest to examine if the PWR is also improved in the new implementation.
Figure 8 shows the scatterplot of the model-simulated VMAX and PMIN and the corresponding best-fit observed PWR populated from best track data during the entire 2010–11 hurricane seasons in each basin for both the HOPS (right panels) and H212 (left panels) configurations. One notices that unlike the HOPS configuration, in which the model best-fit PWR is substantially different from the observed PWR, the H212 configuration produces a consistent scattering of VMAX and PMIN as compared to the observation. Except for some particular points that show abnormally low PMIN when VMAX varies from 60 to 80 kt in the NATL basin (Fig. 8a), the overall H212-predicted VMAX and PMIN are well centered on the observed best-fit PWR with R^2 ~ 0.8 for the NATL basin and even higher (0.94) for the EPAC basin. The abnormal points seen in the NATL basin are mostly due to the few very strong cycles of Hurricane Igor (2010) for which the model PMIN did not match well with the observation at the initial time. For example, for the 0000 UTC 21 September 2010 cycle of Igor, the observed PMIN was 920 hPa, but the model PMIN dropped to ~895 hPa. These abnormal values of the PMIN could be related to the initialization algorithm, which makes use of only VMAX, RMW, and ROCI (or R34-kt radius) for constructing the bogus vortex every cycle. A combination of this composite bogus vortex and the recycled vortex from the previous cycle’s 6-h forecast is used to create an initial vortex without use of the PMIN information (Liu et al. 2012). The model PMIN is still unconstrained in H212 mainly because the simple gradient balance that allows us to obtain PMIN given VMAX and RMW tends to underestimate the actual PMIN. In reality, VMAX, PMIN, RMW, and the warm core anomaly are all related to each other due to the internal TC dynamics (Kieu et al. 2010).
For example, use of the gradient wind balance and the thermal wind relationship can show that a VMAX of 50 m s−1 will correspond roughly to a balanced warm core of 14 K, and a PMIN of ~950 hPa. For model implementation, such a connection is reflected entirely in the balanced relationship during the vortex construction. Because of the nongradient wind balance within the PBL, such a constraint on PMIN is clearly not well preserved and, therefore, we leave PMIN to be adjusted freely by the model.
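To make the gradient wind step concrete, the following Python sketch integrates the gradient wind balance, dp/dr = rho (v^2/r + f v), inward from an outer radius for a modified Rankine vortex. This is only an illustration under strong assumptions that are not part of the HWRF initialization: a constant air density, a prescribed outer decay exponent, an environmental pressure of 1010 hPa, a latitude of 20°, and no warm core or boundary layer effects. All names and default values are hypothetical.

```python
import math

OMEGA = 7.292e-5  # Earth's rotation rate (rad/s)

def rankine_wind(r, vmax, rmw, alpha):
    """Modified Rankine tangential wind (m/s) at radius r (m)."""
    if r <= rmw:
        return vmax * r / rmw           # solid-body rotation inside the RMW
    return vmax * (rmw / r) ** alpha    # power-law decay outside the RMW

def balanced_pmin(vmax, rmw, p_env=101000.0, r_out=500.0e3,
                  lat_deg=20.0, rho=1.15, alpha=0.5, n=5000):
    """Central pressure (Pa) from gradient wind balance, midpoint rule.

    Integrates dp/dr = rho * (v**2 / r + f * v) from r = 0 to r_out and
    subtracts the resulting deficit from the environmental pressure.
    """
    f = 2.0 * OMEGA * math.sin(math.radians(lat_deg))
    dr = r_out / n
    deficit = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        v = rankine_wind(r, vmax, rmw, alpha)
        deficit += rho * (v * v / r + f * v) * dr
    return p_env - deficit

# Assumed inputs: VMAX of 50 m/s and an RMW of 30 km.
pmin_pa = balanced_pmin(vmax=50.0, rmw=30.0e3)
print(f"balanced PMIN ~ {pmin_pa / 100.0:.0f} hPa")
```

With these assumed settings the pressure deficit is a few tens of hPa. The result is sensitive to the assumed outer wind profile and density, so it should be read as a qualitative illustration of the balance rather than compared directly with the values quoted above.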
In the EPAC basin, the fit between the H212 PWR and the observed PWR is more pronounced with most of the model-predicted VMAX and PMIN points concentrated around the observed best-fit PWR. This is consistent with the remarkable reduction of the intensity forecast errors seen in Fig. 1 in this basin. The fact that the H212 PWR could fit closely the observed PWR even though the PMIN is not used in the vortex initialization indicates that the higher resolution and better vortex construction are the key components in maintaining the dynamical consistency between the mass and wind fields. The relative similarity between the two best-fit H212 PWR curves for both basins is supportive of the overall improvement in the internal relationships of the H212 implementation evaluated for the 2010–11 hurricane seasons. A more comprehensive examination of the simulated storm dynamical constraints should take into account all parameters including VMAX, PMIN, RMW, translational speed, and storm location (see, e.g., Knaff and Zehr 2007; Holland 2008). However, this stronger constraint requires substantial fitting of the model values that we will examine in upcoming seasons.
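One simple way to quantify how closely model points follow an observed PWR is sketched below in Python. The functional form PMIN = a + b VMAX^2 is an assumption chosen for brevity and is not necessarily the fit used for Fig. 8; the sample pairs and function names are likewise hypothetical. The curve is fitted to the observed pairs, and the coefficient of determination of the model pairs about that curve is then computed.

```python
def fit_pwr(vmax, pmin):
    """Least-squares fit of pmin = a + b * vmax**2; returns (a, b)."""
    x = [v * v for v in vmax]
    n = len(x)
    mx, my = sum(x) / n, sum(pmin) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, pmin))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def r_squared(vmax, pmin, a, b):
    """Coefficient of determination of the points about a + b * vmax**2."""
    my = sum(pmin) / len(pmin)
    ss_res = sum((p - (a + b * v * v)) ** 2 for v, p in zip(vmax, pmin))
    ss_tot = sum((p - my) ** 2 for p in pmin)
    return 1.0 - ss_res / ss_tot

# Hypothetical pairs (VMAX in kt, PMIN in hPa); not from the paper.
obs_v = [30, 50, 70, 90, 110, 130]
obs_p = [1005, 996, 983, 966, 946, 922]
mod_v = [35, 55, 75, 95, 115]
mod_p = [1003, 992, 979, 960, 941]

a, b = fit_pwr(obs_v, obs_p)
r2 = r_squared(mod_v, mod_p, a, b)
print(f"observed fit: PMIN = {a:.1f} + {b:.5f} * VMAX^2, model R^2 = {r2:.2f}")
```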
c. Initial 6-h spinup/spindown
Another aspect of the internal dynamical relationships that could be used to evaluate the model performance is the correlation of changes of VMAX and PMIN during the first 6 h of model integration (termed the initial spinup/spindown of the model vortex). This correlation indicates how much the model initial vortex adjusts to the environment during the first 6-h period. Typically, a model vortex will take some time to adjust to the near-storm environment and rebalance itself during the first few hours. An ideal bogused vortex scheme would require 1) a consistent change of VMAX and PMIN during the initial adjustment, and 2) a minimum time for the balanced adjustment such that the model vortex can develop internal relationships consistent with the model environment as quickly as possible. Physically, the former requirement implies that an increase of VMAX will correspond to a deepening of PMIN (and vice versa), whereas the latter requirement is important as it allows the model vortex to adapt to its environment consistently within the model, especially during the rapid intensification or weakening phases that the TC models tend to have difficulty capturing.
Previous HWRF versions have been experiencing some unrealistic spinup/spindown of VMAX and PMIN during the first 6 h. In addition, there is some range of TC intensity in which HWRF tends to develop an inconsistent evolution of VMAX and PMIN. Figure 9 shows the stratification of the first 6-h changes of VMAX (black bars) and PMIN (gray bars) with respect to the storm initial intensity for both the HOPS (right panels) and H212 (left panels) configurations. One can notice substantial 6-h changes of both VMAX and PMIN for HOPS, especially at the weak and strong storm thresholds in the NATL basin (top panels). For weak storms, the 6-h change of VMAX is largely positive, reaching ~12 kt for the tropical depression phase. This implies that HWRF tends to have significant spinup for weak storms. In contrast, the first 6-h change of VMAX appears to be largely negative for storms with an initial intensity above the category-1 threshold. The large initial spinup/spindown of the HOPS configuration apparently indicates that the model initial condition is not well balanced, leading to the rapid changes seen in Fig. 9.
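The kind of stratification shown in Fig. 9 can be sketched in Python as follows. The bin edges, the record layout (keys vmax0, vmax6, pmin0, pmin6), and the sample values are hypothetical and are not the bins or data used in this study. The sketch averages the 0–6-h changes of VMAX and PMIN within each initial intensity bin and tests whether the two changes are dynamically consistent, that is, whether strengthening winds are paired with falling pressure.

```python
from bisect import bisect_right

# Hypothetical bin edges in kt: TD < 34 <= TS < 64 <= H1-2 < 96 <= H3+.
EDGES = [34, 64, 96]
LABELS = ["TD", "TS", "H1-2", "H3+"]

def mean_changes_by_bin(cases):
    """Mean 0-6 h changes (dVMAX in kt, dPMIN in hPa) per intensity bin."""
    sums = {lab: [0.0, 0.0, 0] for lab in LABELS}
    for c in cases:
        s = sums[LABELS[bisect_right(EDGES, c["vmax0"])]]
        s[0] += c["vmax6"] - c["vmax0"]
        s[1] += c["pmin6"] - c["pmin0"]
        s[2] += 1
    # Empty bins are omitted from the result.
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items() if s[2]}

def is_consistent(dvmax, dpmin):
    """True when the wind and pressure changes do not have the same sign."""
    return dvmax * dpmin <= 0

# Hypothetical forecast cycles (not from the paper).
cases = [
    {"vmax0": 30, "vmax6": 42, "pmin0": 1006, "pmin6": 1002},
    {"vmax0": 25, "vmax6": 33, "pmin0": 1008, "pmin6": 1006},
    {"vmax0": 50, "vmax6": 52, "pmin0": 995, "pmin6": 994},
    {"vmax0": 100, "vmax6": 90, "pmin0": 950, "pmin6": 956},
]

for label, (dv, dp) in mean_changes_by_bin(cases).items():
    print(label, dv, dp, is_consistent(dv, dp))
```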
For the H212 configuration, it is promising to see that the magnitude of 6-h changes for both VMAX and PMIN is much reduced. In addition, there are no prominent spinup or spindown issues for either weak or strong storms. Instead, the first 6-h changes appear to be more uniformly distributed over the entire range of the initial intensity. This is best seen in the EPAC basin (bottom panels) where the storm intensity is generally not as extreme as those in the NATL basin. The overall reduction in the initial spinup/spindown is a clear demonstration that the model initial vortex is better aligned to its environment in the H212 configuration.
4. Summary and conclusions
In this work, performance of the recently upgraded operational HWRF for the 2012 hurricane season (H212) has been presented. Particular focus has been on the storm size verification as well as the dynamical consistency of the vortex structure in the new implementation. Results from the retrospective experiments of H212 for the 2010–11 hurricane seasons showed significant improvement compared to the 2011 operational HWRF (HOPS) in both the North Atlantic (NATL) and eastern Pacific (EPAC) basins. Specifically, verification of 34-, 50-, and 64-kt radii showed large improvements in all cyclone quadrants. Of importance is that reductions in the storm size errors were obtained not only for the initial time but also for the entire forecast lead times, which is consistent with the overall reduction in the intensity forecast errors of H212 as compared to HOPS. The improvement in storm structure at all forecast lead times indicates that H212 has demonstrated its capability in better capturing storm dynamics. The improvements with H212 are mainly attributed to improved vortex initialization and the higher horizontal resolution that allow the model to resolve the inner core of TCs more appropriately.
Further stratification of the storm radii verification with respect to the storm initial intensity shows that there is no significant change in the size statistics for strong storms as compared to overall statistics. For the weak storms, it was found that the H212 configuration has smaller initial storm size errors and the model could maintain the vortex structure well throughout the 5-day forecast in the NATL basin. This storm size verification was critical in helping us diagnose the issues related to too much contraction of the 34-kt radii of the strong storms in the NATL basin at the longer lead times that resulted in very small size of the 34-kt radii near the end of the 5-day forecasts. Our ongoing experiments seem to reveal that such strong contraction of the hurricanes in the NATL basin is mostly related to excessive development of grid-scale convection that tends to increase the inward convergence within the inner-core region.
Additional examination of the relationship between VMAX and PMIN showed that the H212 configuration also has major improvement in the pressure–wind relationship (PWR) compared to corresponding best track data. This improvement indicates that the H212 configuration is capable of preserving the balance between the mass and wind fields consistently, which can be attributed to the increased horizontal resolution of the model as well as improved model physics implemented in the H212 version. Further analysis of the initial 6-h changes of VMAX and PMIN showed that the model imbalance issues are greatly reduced in the H212 configuration. Specifically, the first 6-h changes of both VMAX and PMIN are observed to be reasonably small as compared to the 2011 operational HWRF version.
While the verification conducted thus far is based solely on the best track dataset, which contains subjective uncertainties in the estimation of VMAX, PMIN, and the various radii, it is worth mentioning that our objective in this study is to examine the improvement of the newly implemented H212 configuration relative to HOPS. In this regard, the uncertainties of the best track dataset should be of secondary importance, and the improvements obtained from H212 are expected to carry over to future operational runs. In the upcoming HWRF implementation, the structural verifications will be based not only on the pointlike metrics from the best track dataset but also on more exhaustive and independent 3D observations such as Doppler radar data, satellite, HWIND data, or other available aircraft information.
Acknowledgments
This work was supported by the Hurricane Forecasting Improvement Project (HFIP) of NOAA. The EMC HWRF team would like to acknowledge tremendous help, suggestions, as well as continuous feedback from the HRD, DTC, URI, GFDL, ESRL, and NHC during the HWRF implementation, which helped improve the 2012 HWRF configuration substantially. We would also like to express our sincere thanks to all three anonymous reviewers and the editor for their careful and constructive comments, which helped to improve the manuscript greatly.
REFERENCES
Arakawa, A., and V. R. Lamb, 1977: Computational design of the basic dynamical processes of the UCLA general circulation model. Methods in Computational Physics, J. Chang, Ed., Vol. 17, Academic Press, 173–265.
Bao, J.-W., S. G. Gopalakrishnan, S. A. Michelson, F. D. Marks, and M. T. Montgomery, 2012: Impact of physics representations in the HWRFX on simulated hurricane structure and pressure–wind relationships. Mon. Wea. Rev., 140, 3278–3299, doi:10.1175/MWR-D-11-00332.1.
Bell, G. D., and Coauthors, 2000: Climate assessment for 1999. Bull. Amer. Meteor. Soc., 81, 1328–1378, doi:10.1175/1520-0477(2000)081<1328:CAF>2.3.CO;2.
Black, M. L., J. F. Gamache, H. E. Willoughby, C. E. Samsury, F. D. Marks, and R. W. Burpee, 1997: Airborne radar observations of shear-induced asymmetries in the convective structure of Hurricane Olivia (1994). Preprints, 28th Conf. on Radar Meteorology, Austin, TX, Amer. Meteor. Soc., 577–578.
DeMaria, M., 2010: Tropical cyclone intensity change predictability estimates using a statistical-dynamical model. 29th Conf. on Hurricanes and Tropical Meteorology, Tucson, AZ, Amer. Meteor. Soc., 9C.5. [Available online at https://ams.confex.com/ams/29Hurricanes/techprogram/paper_167916.htm.]
DeMaria, M., J. A. Knaff, and C. Sampson, 2007: Evaluation of long-term trends in tropical cyclone intensity forecasts. Meteor. Atmos. Phys., 97, 19–28, doi:10.1007/s00703-006-0241-4.
Drake, L., 2012: Scientific prerequisites to comprehension of the tropical cyclone forecast: Intensity, track, and size. Wea. Forecasting, 27, 462–472, doi:10.1175/WAF-D-11-00041.1.
Emanuel, K. A., 2005: Increasing destructiveness of tropical cyclones over the past 30 years. Nature, 436, 686–688, doi:10.1038/nature03906.
Emanuel, K. A., 2008: The hurricane–climate connection. Bull. Amer. Meteor. Soc., 89, ES10–ES20, doi:10.1175/BAMS-89-5-Emanuel.
Ferrier, B. S., 1994: A double-moment multiple-phase four-class bulk ice scheme. Part I: Description. J. Atmos. Sci., 51, 249–280, doi:10.1175/1520-0469(1994)051<0249:ADMMPF>2.0.CO;2.
Gall, R., J. Franklin, F. Marks, E. N. Rappaport, and F. Toepfer, 2013: The Hurricane Forecast Improvement Project. Bull. Amer. Meteor. Soc., 94, 329–343, doi:10.1175/BAMS-D-12-00071.1.
Gopalakrishnan, S., and Coauthors, 2012: Hurricane Weather Research and Forecasting (HWRF) Model: 2012 scientific documentation. NCAR Development Testbed Center Report, NCAR, 96 pp. [Available online at http://www.dtcenter.org/HurrWRF/users/docs/index.php.]
Holland, G., 2008: A revised hurricane pressure–wind model. Mon. Wea. Rev., 136, 3432–3445, doi:10.1175/2008MWR2395.1.
Hong, S.-Y., and H.-L. Pan, 1996: Nonlocal boundary layer vertical diffusion in a medium-range forecast model. Mon. Wea. Rev., 124, 2322–2339, doi:10.1175/1520-0493(1996)124<2322:NBLVDI>2.0.CO;2.
Irish, J. L., D. T. Resio, and J. J. Ratcliff, 2008: The influence of storm size on hurricane surge. J. Phys. Oceanogr., 38, 2003–2013, doi:10.1175/2008JPO3727.1.
Janjic, Z., 2003: A nonhydrostatic model based on a new approach. Meteor. Atmos. Phys., 82, 271–285, doi:10.1007/s00703-001-0587-6.
Janjic, Z., J. P. Gerrity Jr., and S. Nickovic, 2001: An alternative approach to nonhydrostatic modeling. Mon. Wea. Rev., 129, 1164–1178, doi:10.1175/1520-0493(2001)129<1164:AAATNM>2.0.CO;2.
Janjic, Z., R. Gall, and M. E. Pyle, 2010: Scientific documentation for the NMM solver. NCAR Tech. Note NCAR/TN-477+STR, 54 pp.
Kieu, C. Q., H. Chen, and D.-L. Zhang, 2010: An examination of the pressure–wind relationship for intense tropical cyclones. Wea. Forecasting, 25, 895–907, doi:10.1175/2010WAF2222344.1.
Kleist, D. T., D. F. Parrish, J. C. Derber, R. Treadon, W.-S. Wu, and S. Lord, 2009: Introduction of the GSI into the NCEP Global Data Assimilation System. Wea. Forecasting, 24, 1691–1705, doi:10.1175/2009WAF2222201.1.
Knaff, J. A., and R. M. Zehr, 2007: Reexamination of tropical cyclone wind–pressure relationships. Wea. Forecasting, 22, 71–88, doi:10.1175/WAF965.1.
Landsea, C. W., and J. L. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. Mon. Wea. Rev., 141, 3576–3592, doi:10.1175/MWR-D-12-00254.1.
Li, Y., X. Wang, M. Xue, and M. Tong, 2013: Assimilation of airborne radar radial velocity data with the GSI based EnKF-3DVAR hybrid system for the prediction of Hurricane Irene (2011). 17th Conf. on Integrated Observing and Assimilation Systems for the Atmosphere, Oceans, and Land Surface (IOAS–AOLS), Austin, TX, Amer. Meteor. Soc., 12.2. [Available online at https://ams.confex.com/ams/93Annual/webprogram/Paper219932.html.]
Liu, Q., 2009: HWRF pressure-wind relationship. NOAA NCEP Environmental Modeling Center HWRF model development Rep. [Available online at http://www.emc.ncep.noaa.gov/HWRF/weeklies/JAN09/Pressure_Wind_Relationship.ppt.]
Liu, Q., X. Zhang, S. Trahan, and V. Tallapragada, 2012: Extending operational HWRF initialization to triple-nest HWRF system. 30th Conf. on Hurricanes and Tropical Meteorology, Ponte Vedra Beach, FL, Amer. Meteor. Soc., 14A.6. [Available online at https://ams.confex.com/ams/30Hurricane/webprogram/Paper204853.html.]
Marchok, T. P., 2002: How the NCEP tropical cyclone tracker works. Preprints, 25th Conf. on Hurricanes and Tropical Meteorology, San Diego, CA, Amer. Meteor. Soc., P1.13. [Available online at https://ams.confex.com/ams/25HURR/techprogram/paper_37628.htm.]
Marks, F. D., P. G. Black, M. T. Montgomery, and R. W. Burpee, 2008: Structure of the eye and eyewall of Hurricane Hugo (1989). Mon. Wea. Rev., 136, 1237–1259, doi:10.1175/2007MWR2073.1.
Powell, M. D., and T. A. Reinhold, 2007: Tropical cyclone destructive potential by integrated kinetic energy. Bull. Amer. Meteor. Soc., 88, 513–526, doi:10.1175/BAMS-88-4-513.
Rogers, R., S. Lorsolo, P. Reasor, J. Gamache, and F. D. Marks, 2012: Multiscale analysis of tropical cyclone kinematic structure from airborne Doppler radar composites. Mon. Wea. Rev., 140, 77–99, doi:10.1175/MWR-D-10-05075.1.
Sangster, W. E., 1960: A method of representing the horizontal pressure force without reduction of station pressures to sea level. J. Meteor., 17, 166–176, doi:10.1175/1520-0469(1960)017<0166:AMORTH>2.0.CO;2.
Schenkel, B. A., and R. E. Hart, 2012: An examination of tropical cyclone position, intensity, and intensity life cycle within atmospheric reanalysis datasets. J. Climate, 25, 3453–3475, doi:10.1175/2011JCLI4208.1.
Tallapragada, V., and Coauthors, 2012: Operational implementation of high-resolution triple-nested HWRF at NCEP/EMC—A major step towards addressing intensity forecast problem. 30th Conf. on Hurricanes and Tropical Meteorology, Ponte Vedra Beach, FL, Amer. Meteor. Soc., 14A.2. [Available online at https://ams.confex.com/ams/30Hurricane/webprogram/Paper205287.html.]
Torn, R. D., and C. Snyder, 2012: Uncertainty of tropical cyclone best-track information. Wea. Forecasting, 27, 715–729, doi:10.1175/WAF-D-11-00085.1.
Trahan, S., 2012: Revised centroid method. NCEP Office Note, 3 pp. [Available online at http://www.emc.ncep.noaa.gov/HWRF/weeklies/JAN12/revised-centroid-method-draft2.pdf.]
Wang, X., D. Parrish, D. Kleist, and J. Whitaker, 2013: GSI 3DVar-based ensemble–variational hybrid data assimilation for NCEP global forecast system: Single-resolution experiments. Mon. Wea. Rev., 141, 4098–4117, doi:10.1175/MWR-D-12-00141.1.
Webster, P. J., G. J. Holland, J. A. Curry, and H.-R. Chang, 2005: Changes in tropical cyclone number, duration, and intensity in a warming environment. Science, 309, 1844–1846, doi:10.1126/science.1116448.
Willoughby, H. E., and M. E. Rahn, 2004: Parametric representation of the primary hurricane vortex. Part I: Observations and evaluation of the Holland (1980) model. Mon. Wea. Rev., 132, 3033–3048, doi:10.1175/MWR2831.1.
Yablonsky, R. M., and I. Ginis, 2009: Limitation of one-dimensional ocean models for coupled hurricane–ocean model forecasts. Mon. Wea. Rev., 137, 4410–4419, doi:10.1175/2009MWR2863.1.
Zhang, J. A., S. Gopalakrishnan, F. D. Marks, R. F. Rogers, and V. Tallapragada, 2012: A developmental framework for improving hurricane model physical parameterizations using aircraft observations. Trop. Cyclone Res. Rev., 1, 419–429, doi:10.6057/2012tcrr04.01.
The aircraft data included the aircraft/pilot report (AIREP/PIREP), tropical cyclone reconnaissance code (RECCO) observations, Aircraft Communications Addressing and Reporting System (MDCRS-ACARS), Tropospheric Airborne Meteorological Data Reporting (TAMDAR) data, and Aircraft Meteorological Data Relay (AMDAR) observations.
By convention, the inner core in this section is understood to be the region enclosed by the 50- and 64-kt radii, which is well bounded by the limit of 150 km defined in GSI inner-core data selection.
Here, the 34-kt radius is normalized as R^0.269, where R is the 34-kt radius and the power of 0.269 is chosen to maximize the correlation between the best track VMAX and the 34-kt radius.