## 1. Introduction

Improving prediction of tropical cyclones (TCs) is important because more accurate forecasting of these storms would lead to improved warning and increased potential to reduce their impact on life and property. Based on potential damage concerns, the primary parameters of interest for the TC prediction are track and intensity, where the track is defined in terms of progression of geographical location of a cyclone’s minimum sea level pressure and the intensity is expressed in terms of maximum near-surface wind speed within the storm (Rappaport et al. 2009; Jarvinen et al. 1984; Harper et al. 2009). The standard operational forecast skill analysis in the United States (http://www.nhc.noaa.gov/verification) indicates that over the last two decades, there has been a marked improvement in operational forecasting of TC tracks, while the intensity forecast skill has exhibited virtually no improvement over the same period (Rappaport et al. 2009). The increased accuracy of track forecasting has been attributed primarily to improved prediction of synoptic- to planetary-scale circulation using global numerical prediction models (Rappaport et al. 2009; McAdie and Lawrence 2000). In contrast, the lack of systematic improvement for the intensity forecast is not well understood. It has been hypothesized that progress has been limited by outstanding challenges of numerical prediction of mesoscale processes and multiscale interactions, in particular regarding the parameterizations of microphysics and planetary boundary layer processes in the models and data assimilation for forecast initialization within core regions of TCs (Gall et al. 2013).

Consequently, the majority of recent research into understanding and improving the prediction of TC intensity has been focused on evaluating and improving the mesoscale numerical models (Davis et al. 2010; Gopalakrishnan et al. 2012; Gall et al. 2013, and references therein). In research as well as in operational practice, the standard verification of the forecast has been to compare the modeled intensity values to the National Hurricane Center’s “best track” (BT) data (http://www.nhc.noaa.gov/data; Landsea and Franklin 2013). These data have been used because they include the most complete, official operational record of estimates of TC intensity for the observed storms. The hypothesis in this study is that the widely accepted verification using the BT intensity data is not optimal for understanding and improving the prediction of TC intensity because it is based on an intensity metric that cannot be well represented by either the observations or the numerical models.

The current National Weather Service’s metric of TC intensity is defined as the maximum 1-min-averaged wind speed at 10-m elevation anywhere within the TC. Although justified for the potential damage concerns (Jarvinen et al. 1984), this metric is formally unverifiable because it refers to the maximum wind speed within a highly turbulent, rapidly evolving, and spatially extensive wind field, which is practically unobservable. An observing technology that would measure variability at temporal and spatial scales of near-surface turbulence over vast areas may be required to overcome this limitation. Such technology currently does not exist. An additional difficulty with using the BT data for the verification is that the estimates of TC intensity are subjective, although guided by the available observations (Landsea and Franklin 2013).

The standard TC intensity metric is also not well represented by the numerical models because the minimum spatial scale within the TC boundary layer is, in theory, given by the Kolmogorov scale and is on the order of 10^{−4} m (Holton 2004). Resolving this or a similar scale in the numerical models is infeasible. It follows that the standard practice of forecast verification in terms of the maximum wind speed difference from the operational estimates of the intensity includes unquantifiable representativeness errors from both the forecast and the BT quantity. Additionally, such verification is incompatible among different models because the forecast maximum speed values refer to different spatial scales, depending on the model resolution. These conditions imply that the verification results based on the BT metric and data cannot correctly inform about the prediction skill, the impact of changes to the forecast models on the skill, or the differences in skill between different models. Given the importance of TC intensity prediction, improving the methodology for evaluating the numerical model forecast for this purpose is highly desirable.

In this study a new multiscale intensity metric (MSI) is developed that enables verification based on quantities that could be well represented by both the observations and models. Properties and utility of the new metric were evaluated using aircraft-based stepped-frequency microwave radiometer (SFMR) surface wind speed observations and mesoscale numerical model data. Theoretical formulation of the multiscale intensity metric is presented in section 2. The metric properties using the observations are discussed in section 3. Section 4 includes application of the new metric to verification of the TC intensity forecasts using the Hurricane Weather Research and Forecasting model (HWRF) data and comparison to the verification using the standard metric. The influence of the metric on assessing the impact of a forecast system change is discussed in section 5. Summary and conclusions are presented in section 6.

## 2. Multiscale intensity metric

*r* is radius, the maximum speed along azimuth for each *r* could be represented by the following expression:

where *r*. The decomposition is truncated at the first harmonic for the following reasons:

- The wavenumbers 0 and 1 are the only structures that are observable by already-available observing technology, and also resolvable by most current models, including some global models. Thus, the prediction of TC intensity that is contained in these wavenumbers could be evaluated directly with respect to the existing observations and would be also compatible among different models.
- The contribution to the maximum speed from the wavenumber-0 and -1 amplitudes alone is phase independent and thus represents a full scalar measure of the low-wavenumber intensity, without the uncertainty with respect to wave superposition that would be present if additional low wavenumbers were included.
- The majority of the horizontal wind energy in the core region of a TC is contained in wavenumbers 0 and 1 (Uhlhorn and Nolan 2012; Reasor et al. 2000).
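The phase independence noted in the list above can be checked numerically. The following is a minimal sketch, assuming the truncated azimuthal expansion has the usual first-harmonic form V0 + V1 cos(θ − ϕ1); the function name `azimuthal_max`, the wavenumber-2 term, and all amplitudes are hypothetical and chosen only for illustration.

```python
import math

def azimuthal_max(v0, v1, phi1, v2=0.0, phi2=0.0, n=3600):
    """Maximum over azimuth of V0 + V1*cos(theta - phi1) + V2*cos(2*theta - phi2).

    The maximum is searched on a regular azimuthal grid of n points.
    """
    return max(
        v0
        + v1 * math.cos(theta - phi1)
        + v2 * math.cos(2.0 * theta - phi2)
        for theta in (2.0 * math.pi * k / n for k in range(n))
    )

V0, V1 = 40.0, 8.0  # hypothetical amplitudes (m/s)

# Wavenumbers 0 and 1 only: the maximum equals V0 + V1 for any phase.
max_phase_a = azimuthal_max(V0, V1, phi1=0.3)
max_phase_b = azimuthal_max(V0, V1, phi1=2.1)

# With an added wavenumber-2 harmonic the maximum depends on the relative phase.
max_aligned = azimuthal_max(V0, V1, phi1=0.0, v2=4.0, phi2=0.0)
max_opposed = azimuthal_max(V0, V1, phi1=0.0, v2=4.0, phi2=math.pi)

print(max_phase_a, max_phase_b)
print(max_aligned, max_opposed)
```

In this sketch the two-component maximum does not change with ϕ1, whereas the same wavenumber-2 amplitude yields different maxima depending on its phase, which is the superposition uncertainty that the truncation avoids.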

The nominal grid resolution that is currently used in most models, including some global models, would be sufficient to resolve

Observation-based values of the low-wavenumber intensity could be obtained from SFMR measurements (Uhlhorn et al. 2007) as reported by operational and research aircraft reconnaissance in the Atlantic basin during the hurricane season (Aberson 2010; Rogers et al. 2013b). Although aircraft reconnaissance data are available for only about 30% of the Atlantic basin TCs (Rappaport et al. 2009), the retrospective SFMR data since 1998 include a significant number of diverse TC cases (Table 1). Complementary retrospective data for the low-wavenumber intensity could be potentially derived using other TC surface wind analyses, such as the Hurricane Wind Analysis System (H*Wind; Powell et al. 1998) and even the global ocean surface wind analyses when the data coverage is sufficient for the regions of interest (Atlas et al. 2011). For demonstration purposes, the multiscale decomposition of the wind field in this study only utilizes SFMR data since they provide the best compromise between resolution and spatial coverage in TC vortex core region (Uhlhorn et al. 2007).

Table 1. List of SFMR low-wavenumber analysis cases based on best-track storm strength.

Using the stochastic representation of the residual intensity does not imply that wavenumber harmonics greater than 1 within TC near-surface wind structure are necessarily stochastic quantities in nature. Although that could be considered in theory, assuming for example the influence of stochastic forcing by the convective processes, the analysis of stochastic nature of the TC wind field is beyond the scope of the current study. In this study the assumptions about the quantities in the expression (2) only reflect the conditions pertaining to the verification of TC intensity prediction using the numerical prediction models.

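Based on this analysis we define the new MSI metric consisting of the deterministic low-wavenumber intensities (the amplitudes of the wavenumber-0 and -1 components) and the stochastic residual intensity, which is represented with a probability density function.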

## 3. Evaluating the multiscale intensity using observations

### a. SFMR surface wind analysis

The methodology for performing the low-wavenumber wind speed analysis using the SFMR measurements is presented in detail in the accompanying Uhlhorn et al. (2013) study. In the following, only a brief summary of the analysis approach is included.

Like aircraft flight-level winds, surface winds measured by SFMR are obtained along a flight track. A typical hurricane reconnaissance pattern consists of flight legs radially toward and away from the storm center. Thus, data sampling is dense in the radial direction, but comparatively sparse in the azimuthal direction. A semispectral representation of the wind field is well suited to this type of sampling pattern, with the radial dimension represented in physical space and the azimuthal dimension in wavenumber space. In the radial direction the analysis considers the scales of quantities resolved by the SFMR from the moving aircraft. Real-time observations are typically provided every 30 s, and each wind speed value represents the highest 10-s-averaged wind speed over each 30-s interval. At typical aircraft ground speeds and considering the SFMR radiative footprint, each wind observation represents an approximately 3-km spatial scale along the flight track and around 1 km in the cross-track dimension. Since consecutive observations are spaced 30 s (~3.5 km) apart, there is a gap of only around 0.5 km between them.

For an individual flight, each radial leg (inbound or outbound from the center) is identified. For each leg, an RMW is found and is used to normalize the observed distance from the center of the storm. Also, an average RMW is computed for all legs. Each radial leg of wind data is then interpolated to a normalized grid (*r** = *r*/RMW) whose spacing depends on the average RMW to maintain consistency with a prescribed fixed resolution. Based on the storm center found for each pass from the wind speed minimum, the azimuth angle relative to storm-motion direction is computed for each observation. The angles are also interpolated to the same radial grid; thus, at each radius, a set of *V*(*θ*) observations are obtained, corresponding to each radial leg, where *V* is wind speed.
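The radial normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the RMW of a leg is taken as the radius of its largest observed speed, the interpolation is linear, and the normalized grid is fixed rather than dependent on the average RMW. The function names and the sample leg are hypothetical.

```python
from bisect import bisect_left

def interp_linear(x, y, xq):
    """Linear interpolation of y(x) at xq; returns None outside the sampled range.

    x must be sorted in ascending order.
    """
    if xq < x[0] or xq > x[-1]:
        return None
    i = bisect_left(x, xq)
    if x[i] == xq:
        return y[i]
    w = (xq - x[i - 1]) / (x[i] - x[i - 1])
    return y[i - 1] + w * (y[i] - y[i - 1])

def leg_to_normalized_grid(radii_km, speeds, grid):
    """Normalize one radial leg by its RMW and interpolate to r* = r/RMW."""
    # Assumption: the leg's RMW is the radius of its largest observed speed.
    i_max = max(range(len(speeds)), key=lambda i: speeds[i])
    rmw = radii_km[i_max]
    r_star = [r / rmw for r in radii_km]
    return rmw, [interp_linear(r_star, speeds, g) for g in grid]

# Hypothetical outbound leg: radius (km) and surface wind speed (m/s).
radii_km = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 60.0]
speeds = [20.0, 35.0, 48.0, 55.0, 50.0, 44.0, 36.0, 28.0]
grid = [0.5, 1.0, 1.5, 2.5, 4.0]

rmw, on_grid = leg_to_normalized_grid(radii_km, speeds, grid)
print(rmw, on_grid)
```

Grid points beyond the sampled radii are returned as `None` rather than extrapolated, so that legs of different length can be combined without inventing data.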

The three analysis parameters (wavenumber-0 amplitude *V*_{0}, wavenumber-1 amplitude *V*_{1}, and phase *ϕ*_{1}) describe the low-wavenumber surface wind field structure, with an associated residual error *e*. This residual error should not be confused with the residual intensity in Eq. (2), which is, by definition, the difference between the estimate of the maximum wind speed and the maximum of the low-wavenumber intensity. In the SFMR analysis the low-wavenumber maximum is found at (*r** = 1, *θ* = *ϕ*_{1}). As expected, this is generally not found at the same location as the maximum observed wind speed over a single flight.

The analysis is truncated at wavenumber 1 because the reconnaissance missions often consist of one single figure-four pattern (Aberson 2010), which provides observations at four azimuthal locations. To unambiguously determine *V*_{0}, *V*_{1}, and *ϕ*_{1} requires a minimum of three observations around the storm. As such, wavenumber 1 is the highest resolvable Fourier component. Also, the analysis is synoptic in nature, as the field over the observation period of several hours (3–6 h) is assumed stationary in order to include all radial flight legs in the analysis. Because the higher-order components have higher temporal variability, the amplitude and phase of these features could not be estimated with a high degree of confidence, even when azimuthal resolution is greater. The mean residual errors for ^{−1} for the analyses of 157 cases that are used in this study (Table 1). A more detailed presentation of the SFMR analysis can be found in the accompanying Uhlhorn et al. (2013) study, which focuses on the diagnostic analysis of first-harmonic TC wind speed asymmetry and relationships to storm motion and environmental shear.
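The requirement of at least three azimuthal observations can be illustrated with a small fit. The sketch below is not the analysis of Uhlhorn et al. (2013); it assumes an ordinary least-squares fit of V(θ) = V0 + a cos θ + b sin θ at a single radius, from which V1 and ϕ1 follow. The function names and the synthetic observations are hypothetical.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("azimuths do not constrain the fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_low_wavenumber(thetas, speeds):
    """Least-squares estimate of (V0, V1, phi1) from speeds at azimuths thetas (rad)."""
    if len(thetas) < 3:
        raise ValueError("need at least three azimuthal observations")
    rows = [(1.0, math.cos(t), math.sin(t)) for t in thetas]
    # Normal equations (A^T A) x = A^T y for the basis [1, cos, sin].
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * v for r, v in zip(rows, speeds)) for i in range(3)]
    v0, a, b = solve_linear(ata, aty)
    return v0, math.hypot(a, b), math.atan2(b, a)

# Synthetic observations at the four azimuths of one figure-four pattern.
thetas = [math.radians(d) for d in (45.0, 135.0, 225.0, 315.0)]
true_v0, true_v1, true_phi1 = 40.0, 8.0, math.radians(60.0)
speeds = [true_v0 + true_v1 * math.cos(t - true_phi1) for t in thetas]

v0_fit, v1_fit, phi1_fit = fit_low_wavenumber(thetas, speeds)
print(v0_fit, v1_fit, math.degrees(phi1_fit))
```

With four azimuths the three parameters are overdetermined by one observation, so a wavenumber-2 amplitude and phase could not be estimated from the same data.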

### b. MSI decomposition using the observations

The SFMR analysis of *V*_{0} and *V*_{1} is performed for all research and operational reconnaissance cases for the period 1998–2011. The total number of cases after quality control is 157 (Table 1). The observation-based estimate of the residual intensity is the difference between the coincident BT intensity estimate and the sum of the low-wavenumber intensities.

As expected for TCs with well-developed system-scale circulations, the low-wavenumber intensity exhibits a good linear fit to the BT intensity (Fig. 1a, black curve) with the slope and intercept regression coefficient values, respectively, of +0.88 and +1.38 and the coefficient of determination of 0.91. The high correlation between the estimate of peak intensity and low-wavenumber (i.e., the system scale) intensity is consistent with the fact that a tropical cyclone, once it reaches a certain level of organization (i.e., tropical storm to hurricane strength), intensifies increasingly on the system scale. For example, Van Sang et al. (2008) showed that at the early stages of development in simulated storms, the flow intensified first on the convective scales in association with localized rotating updrafts. During the rapid intensification stage, significant fluctuations in peak intensity on the order of 10–15 m s^{−1} associated with these rotating updrafts were not strongly manifest in the system-scale intensity metric. Beyond the rapid intensification phase, when their simulated storms were stronger and the axisymmetrization of vortex asymmetry was more efficient (Montgomery and Kallenbach 1997), the peak intensity and system-scale intensity behaved similarly.

The normalized PDF of the observation-based residual intensity (Fig. 1b, black curve) has mean, standard deviation, and mean absolute norm values, respectively, of 4.24, 4.08, and 4.94 m s^{−1}. Consistent with the theoretical analysis in section 2, the distribution is left bounded, non-Gaussian, and positively skewed. The lower bound is at about −11 m s^{−1}, with the total cumulative probability over negative values of only 0.1. Although theoretically possible by the argument of superposition of higher-order harmonics with wavenumber 1, the negative differences between the SFMR-based low-wavenumber intensity and the BT estimate could also be a consequence of uncertainty in the BT intensity, resulting in an underestimate for some cases. The negative values are associated with lower TC intensities, but not exclusively, as evident from the scatter diagram in Fig. 1a (black dots). By design, the uncertainty in the SFMR-based analysis would not cause an overestimate of the low-wavenumber amplitude (Uhlhorn et al. 2013).

Recent studies have suggested that the uncertainty of BT intensity estimates is about 5 m s^{−1} (Landsea and Franklin 2013; Torn and Snyder 2010). This value is approximate because the uncertainty estimates in these studies were not—in fact, could not have been—based on comparison to the observations as the maximum 1-min-averaged wind speed metric is unobservable, as discussed in the introduction. The uncertainty estimate of BT intensity in Landsea and Franklin’s (2013) study was based on subjective estimates by the forecasters, whereas in Torn and Snyder’s (2010) study a lower bound of BT error variance has been estimated objectively using a relatively large sample of differences between BT values for which the aircraft reconnaissance measurements were considered, and the intensity estimates that were based only on the empirical techniques using passive remote sensing observations (the Dvorak techniques; Velden et al. 2006; Knaff et al. 2010). When the reconnaissance measurements are available, although they do not contain the direct measurement of the maximum speed, they are referred to as “fixes” and are given priority over other information for making the subjective BT intensity estimate (Landsea and Franklin 2013). For the computation in Torn and Snyder’s (2010) study it was explicitly assumed that the BT data have negligible bias in this case. The estimates of uncertainty in both studies were shown to vary slightly with the TC intensity. Based on these values, an average uncertainty of 5 m s^{−1} is used in the current study as the best available approximation.

The mean and standard deviation of the residual intensity in Fig. 1b are well within this uncertainty. The result implies that accurate forecasts of the low-wavenumber intensity alone would have the maximum skill achievable with respect to the standard BT intensity metric. The mean error of such forecasts would be only 4.24 m s^{−1} and the mean absolute error would be equal to the mean absolute norm of the residual intensity, 4.94 m s^{−1}. The dependence of maximum achievable skill with respect to the BT intensity on the low-wavenumber intensity alone implies that the residual intensity in the forecast could not further enhance the skill with respect to the BT values but could reduce it if not consistent with the observation-based residual intensity.

In summary, the MSI decomposition based on the SFMR low-wavenumber analysis and BT intensity estimates strongly suggests that practical predictability of TC intensity is determined by the low-wavenumber intensity (wavenumbers 0 and 1 in the wind field at 10 m). This important asymptotic condition for the optimal TC intensity forecast skill could not be derived using the standard intensity metric alone.
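The residual-intensity statistics used in this section can be computed as in the following sketch. It assumes only the definition given in the text (BT intensity minus the sum of the wavenumber-0 and -1 amplitudes); the four cases are hypothetical values, not data from the study.

```python
from statistics import mean, pstdev

def residual_intensity(bt, v0, v1):
    """Residual intensity: BT estimate minus the low-wavenumber intensity V0 + V1."""
    return [b - (a0 + a1) for b, a0, a1 in zip(bt, v0, v1)]

# Hypothetical cases (m/s): BT intensity and SFMR-based amplitudes.
bt = [50.0, 62.0, 35.0, 70.0]
v0 = [40.0, 50.0, 33.0, 58.0]
v1 = [6.0, 7.0, 4.0, 8.0]

eps = residual_intensity(bt, v0, v1)
eps_mean = mean(eps)
eps_std = pstdev(eps)
eps_mean_abs = mean(abs(e) for e in eps)

# A forecast that reproduces V0 + V1 exactly and carries no residual differs
# from the BT value by exactly the residual intensity of each case.
perfect_low_wavenumber_error = [b - (a0 + a1) for b, a0, a1 in zip(bt, v0, v1)]
mae_vs_bt = mean(abs(e) for e in perfect_low_wavenumber_error)

print(eps, eps_mean, eps_std, eps_mean_abs, mae_vs_bt)
```

The last step restates the argument of the section: for a perfect low-wavenumber forecast, the mean absolute error with respect to the BT values equals the mean absolute norm of the residual intensity.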

## 4. Verification of intensity forecasts using the MSI and BT metrics

The MSI decomposition is performed for the forecasts that were produced in the data assimilation and forecast experiments with the experimental version of HWRF (Aksoy et al. 2013; Gall et al. 2013). The intensity forecast verification in the current study is focused exclusively on demonstrating the properties of the intensity metrics and is not intended for evaluation of the forecast system that was used in the experiments.

### a. Forecast data

The experimental data assimilation and forecast system and its application to 83 cases of observed TCs in the Atlantic basin during the 2008–11 hurricane seasons are described in detail in Aksoy et al. (2013). Here, only the properties that are relevant to the current analysis are summarized.

The experimental system is composed of the HWRF Ensemble Data Assimilation System (HEDAS; Aksoy et al. 2013; Vukicevic et al. 2013) and an experimental version of the HWRF (Gopalakrishnan et al. 2011; Zhang et al. 2011). HEDAS is used to produce initial conditions for the high-resolution forecast grid at TC vortex scales. The observations that are assimilated include airborne Tail Doppler Radar (TDR) radial wind speed, SFMR, and flight-level and dropwindsonde measurements from research and operational aircraft reconnaissance (Aberson 2010). In the assimilation and forecasts the model was configured with horizontal grid spacing of 9 and 3 km on the outer and inner domains, respectively. The vortex-following nest motion was used in the forecast with a domain size of about 10° latitude × 10° longitude. The lateral boundary conditions for the outer domain were obtained from NOAA’s operational Global Forecast System (GFS). The HWRF forecast data that are used in the current analysis include up to 120 h of forecast time.

*r*. The Fourier decomposition was not performed for *r* values that included land anywhere within the corresponding circle. As for the observation-based analysis, the forecast value of low-wavenumber intensity is the maximum of

The forecast data used in this study were available with a frequency of 1 h. A total of 2985 samples of

Table 2. The number of SFMR wavenumber analysis cases and coincident forecast–SFMR occurrences by storm category. The storm categories are determined from the best-track maximum wind speed.

### b. Verification of statistical properties of intensity forecast using the MSI decomposition

The following analysis demonstrates the utility of the MSI metric for forecast verification in terms of statistical properties of the intensity at different scales and the relationship of the MSI metric to the standard intensity forecast metric.

Similar to the observation-based analysis, the linear regression between

To investigate the relationship of the low-wavenumber intensities with the maximum in terms of time evolution, the MSI decomposition of hourly forecast data was evaluated for each multiday forecast. The examples of three different TC cases in Fig. 2 show synchronous evolution of the maximum and axisymmetric intensity for different dynamical regimes of TC evolution in the forecast, such as steady state, rapid intensification, and decay. It is evident that the axisymmetric intensity (solid blue curves in Fig. 2) is the dominant contribution to the variability at time scales of 12 h and longer, whereas

The consistency of the low-wavenumber forecast time evolution with the actual TC evolution cannot be verified using the currently available infrequent observations. The results suggest, however, that temporal phase errors of the wavenumber-1 harmonic in the forecast could be an important source of the forecast intensity errors with respect to the BT values that are available with a frequency of 6 h. Regarding the impact of the axisymmetric forecast, the results suggest that this component of the wind speed could have the largest impact on the forecast skill.

The statistical properties of residual intensity are evaluated next. The normalized PDF(^{−1}, respectively. Compared to the observation-based distribution (Fig. 1b, black curve), the forecast residual intensity exhibits significantly smaller variance, is more symmetric, and has a bias of −1.51 m s^{−1} (the difference between mean values of the two distributions). Thus, for the forecast data that are used in this study, the forecast residual intensity is inconsistent with the observation-based estimates when using the BT data. Because of the negative bias and small variance, this inconsistency implies a forecast deficiency in terms of high probability of underestimating the residual intensity of the individual forecasts with respect to the BT metric. Verification using the standard intensity error measures in the next section confirms this conclusion.

### c. Verification using the standard intensity measures

The standard procedure for forecast verification using the BT metric is to compute mean differences and mean absolute differences between the BT and coincident forecast intensity values as a function of forecast lead time over a sample of cases. Clearly, the same procedure could be applied to the verification of low-wavenumber and also residual intensity when the equivalent observation-based estimates are available. Using the subset of forecast values that are coincident with the SFMR analysis for years 2008–11 (Table 2), the standard error measures are computed for the low-wavenumber and residual intensities. The equivalent errors are also computed using the BT metric for the same subset of forecast data. As expected, the sample size for computing the means is different for different forecast lead times and is relatively small, especially at extended lead times. The largest sample size is 45 for the initial forecast time and the smallest is 2 for the lead time of 120 h. Consequently, the current analysis is not expected to provide a robust estimate of the overall forecast skill for the particular forecast system. The goal of the analysis is, instead, to compare the error measures using the two different metrics (the BT and MSI metrics). This comparison is not affected by the sample size.
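The standard error measures described above can be sketched as follows. The sign convention (forecast minus verifying value) is an assumption adopted here so that negative values indicate underestimates; the function name and the sample triples are hypothetical.

```python
from statistics import mean

def errors_by_lead_time(samples):
    """Mean error and mean absolute error per forecast lead time.

    samples: iterable of (lead_time_h, forecast_value, verifying_value).
    """
    grouped = {}
    for lead, forecast, verifying in samples:
        # Assumed convention: error = forecast minus verifying value.
        grouped.setdefault(lead, []).append(forecast - verifying)
    return {
        lead: {
            "n": len(errs),
            "me": mean(errs),
            "mae": mean(abs(e) for e in errs),
        }
        for lead, errs in sorted(grouped.items())
    }

# Hypothetical coincident forecast and verifying intensities (m/s).
samples = [
    (0, 38.0, 42.0), (0, 51.0, 50.0), (0, 30.0, 33.0),
    (24, 45.0, 41.0), (24, 47.0, 52.0),
    (120, 60.0, 55.0),
]

stats = errors_by_lead_time(samples)
for lead, s in stats.items():
    print(lead, s["n"], s["me"], s["mae"])
```

The same function applies unchanged to the BT metric, to each low-wavenumber component, and to the residual intensity; reporting `n` alongside the means keeps the shrinking sample size at long lead times visible.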

It is evident from the expression (7a) that if the component errors differ in sign, the individual error realizations and their mean with respect to the BT metric would be the result of a compensation of these errors. Thus, the forecast bias using the BT metric could be small even when the component errors are large. This property suggests that using the standard intensity metric could result in an inaccurate assessment of forecast skill. Similarly, from the relationship (7b), the mean absolute error with respect to the BT values could indicate higher confidence in the forecast than would be supported based on the underlying component errors. The following verification using the sample forecast data provides an example of such a result. Verification based on experiments for two different versions of HWRF in section 5 provides an example of how the forecast system optimization using only the BT intensity metric could lead to decreased forecast ability by increasing the errors in one component that happen to be more favorable for the error compensation.
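The compensation effect can be illustrated with a few numbers. The sketch assumes, as the text describes, that the error with respect to the BT metric for each case is the sum of the wavenumber-0, wavenumber-1, and residual errors; the error values are hypothetical.

```python
from statistics import mean

def mean_abs(values):
    """Mean absolute value of a sequence of errors."""
    return mean(abs(v) for v in values)

# Hypothetical component errors for three forecast cases (m/s).
e_wn0 = [6.0, 4.0, 7.0]      # axisymmetric component: positive bias
e_wn1 = [-3.0, -2.0, -3.0]   # wavenumber-1 component: negative bias
e_res = [-2.0, -3.0, -3.0]   # residual intensity: negative bias

# Assumed relationship: the error with respect to BT is the sum of components.
e_total = [a + b + c for a, b, c in zip(e_wn0, e_wn1, e_res)]

me_total = mean(e_total)
mae_total = mean_abs(e_total)
mae_components = mean_abs(e_wn0) + mean_abs(e_wn1) + mean_abs(e_res)

print(e_total, me_total, mae_total, mae_components)
```

By the triangle inequality the mean absolute error of the sum can never exceed the sum of the component mean absolute errors, and in this example it is far smaller, so a verification that reports only the total would hide errors of several meters per second in each component.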

The mean error and mean absolute error for the low-wavenumber and residual intensity as a function of forecast lead time for the sample data are displayed in Fig. 3, together with the corresponding errors using the BT metric. Several important properties are evident. First, the mean errors as a function of forecast time for the forecast maximum (with respect to the BT values) and the wavenumber-0 component (with respect to the SFMR analysis) are highly correlated; the corresponding curves are virtually parallel. This result is consistent with the high correlation between the maximum and axisymmetric intensity evolution in the forecast (section 4b) and confirms the hypothesis that the errors with respect to the BT values are dominated by the errors in the axisymmetric component of the forecast wind. Second, the mean errors for the wavenumber-1 (red curve) and residual intensity (green curve) are negative for all forecast times. The negative mean error for the residual intensity is consistent with the differences between the observation- and forecast-based statistical distributions in Fig. 1. Third, the mean error for wavenumber-0 component exhibits a positive tendency for the first 60 h and, because it is initially negative, it changes sign to positive around forecast lead time of 36 h and continues to grow in amplitude. At the same time, the mean error amplitude using the BT metric becomes small. It is clearly evident that for *t* > 36 h the errors using the BT metric are the consequence of compensation between the positive errors for the wavenumber-0 component and the negative errors for the wavenumber-1 and residual intensity. The effect of error compensation is also evident in the mean absolute error measure (Fig. 3b): the errors using the standard metric are of similar amplitude to the wavenumber-0 errors for the forecast times for which the mean error compensation occurs, while the mean absolute errors for the wavenumber-1 and residual intensity are significant. For the forecast hours 54–60, the mean absolute error with respect to the BT metric is even smaller than the mean absolute error for the axisymmetric intensity alone. Thus, the mean absolute error using the BT metric does not reflect the presence of significant errors in the wavenumber-1 and residual intensity.

Regarding the assessment of skill, the verification results in Fig. 3 would lead to different conclusions based on the BT and MSI metrics. Neglecting the error amplitude variations due to small sample size, especially for *t* > 60 h, the results for the BT metric suggest a nearly unbiased forecast after adjustment to the biased initial conditions. Such a result could be interpreted as a basis for confidence in the forecast model not having the systematic errors regarding the intensity evolution and thus the potential for good skill. The results for the MSI metric, however, suggest the exact opposite conclusion: because the forecast shows a tendency toward increasing the positive bias for the axisymmetric component and for underestimating the wavenumber-1 and residual intensities at all forecast times, the forecast model is likely to have systematic errors in representing the intensity evolution and, thus, potential for lower skill. Similarly, using the mean absolute error measure, the BT metric suggests moderate forecast skill with errors of about 7 m s^{−1} for the first 60 h, after the initial adjustment within 12 h (to the initial bias), whereas the MSI metric suggests rather low skill for the wavenumber-1 and residual intensity because the errors are large relative to the observed amplitude of these components (section 4) and moderate-to-low skill for the wavenumber-0 component, considering the accuracy of the observation-based estimate using the SFMR analysis.

In summary, the example of verification using the standard error measures demonstrates the value added by the MSI metric and the suboptimality of the BT metric for assessing the forecast skill. The consequence of the choice of metric when the verification is used to estimate the potential value of a forecast system change is discussed in the next section.

## 5. Influence of the metric on assessing the impact of a forecast system change

The experimental data assimilation and forecast system for HWRF was updated in 2012 using a more advanced version of the model and HEDAS algorithm. The main change to the model was an improvement to the parameterization of planetary boundary layer (S. Gopalakrishnan 2012, personal communication). For testing the influence of intensity metric on the assessment of benefits from the system upgrade, 11 forecasts for Hurricane Earl were used, spanning the period 28 August–4 September 2010. The new forecasts were generated using the updated HEDAS analysis for the initial condition and the updated HWRF version. The model grid and lateral boundary conditions were configured in the same way as for the old version (described in section 4a).

The mean and absolute mean errors as functions of forecast lead time for the old and new forecast system versions were computed using the MSI and BT metrics. The mean forecast errors shown in Figs. 4a and 4c indicate opposing impact of the forecast system changes using the two metrics. For the BT metric (Fig. 4a), a noticeable reduction of forecast bias (for the given small sample) is evident for the new version for all forecast times. In contrast, using the MSI metric (Fig. 4c), a large increase of the bias is shown for the axisymmetric intensity errors, from small-amplitude negative values with the old version to large positive values with the new version, whereas virtually no differences in errors are evident for the wavenumber-1 and the residual intensity. Using the mean absolute error measure (Figs. 4b and 4d), similar conclusions apply: the BT metric suggests an overall improvement, while the MSI metric shows significant increase of the axisymmetric intensity errors and a small change for the wavenumber-1 and residual intensity errors.

The apparent positive impact of the forecast system change by the BT metric is clearly an artifact of error compensation between the axisymmetric component errors and the combined errors of the wavenumber-1 and residual intensity. Specifically, in the new version, the significant increase of positive errors for the axisymmetric intensity is more favorable for compensation with unchanged negative errors for the wavenumber-1 and residual intensity. Thus, while no errors were reduced, using the BT metric alone suggests improvement to the forecast from the forecast system update. The example clearly demonstrates suboptimality of the BT metric and the value added by the MSI metric for assessing the impact of forecast system change on the intensity forecast skill in terms of mean and absolute mean errors.

## 6. Summary and conclusions

In this study a new multiscale intensity (MSI) metric for verification of TC intensity prediction is presented. The study is motivated by a need to overcome difficulties with the current practice for evaluating numerical model prediction of TC intensity that makes use of the standard operational TC intensity metric: the maximum 1-min sustained wind speed at 10-m elevation anywhere within a TC (http://www.nhc.noaa.gov). Although estimating and predicting the extreme value of TC near-surface wind speed is well justified based on potential damage concerns, the standard metric is not optimal for the verification of numerical models because it refers to an extreme value in a highly turbulent and spatially extensive wind field that is practically unobservable and inherently unresolvable by the models. Consequently, the evaluation of model prediction of TC intensity in terms of the difference from the operational subjective estimate of this quantity that is available in the best-track (BT) data by the National Hurricane Center includes unquantifiable representativeness errors for both the models and the BT values. In such circumstances, the verification results cannot correctly inform about the prediction skill and the impact of changes to the model on the skill or about the differences in skill between different models with different grid resolutions. A new metric is therefore introduced in this study that is based on quantities that can be well represented by both the numerical models and the observations.

The MSI metric is formulated using an azimuthal Fourier spectral decomposition of wind speed in polar coordinates centered on the TC vortex center. When applied to the wind speed at 10-m elevation, the decomposition expresses the TC maximum wind speed (the standard BT intensity metric) as the sum of the wavenumber-0 and wavenumber-1 component amplitudes, evaluated at the radius of maximum wind, plus a residual. Based on this decomposition, the MSI metric is defined to consist of the deterministic, observable, and resolvable low-wavenumber intensities (the amplitudes of the wavenumber-0 and wavenumber-1 components) and the stochastic residual intensity. Because it is stochastic, the residual intensity is represented by a probability density function.
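
The decomposition described above can be illustrated with a minimal Python sketch. It is not the analysis procedure used in the study: it assumes that wind speeds are already available at equally spaced azimuths on a single ring at the radius of maximum wind, it uses the ring maximum as a stand-in for the storm maximum, and the function names (`low_wavenumber_amplitudes`, `msi_components`) and the synthetic ring values are hypothetical.

```python
import math


def low_wavenumber_amplitudes(ring_speeds):
    """Return (wavenumber-0, wavenumber-1) amplitudes for one radius ring.

    ring_speeds: wind speeds at equally spaced azimuths (an assumption of
    this sketch; real observations are irregularly sampled).
    """
    n = len(ring_speeds)
    a0 = sum(ring_speeds) / n  # azimuthal mean, i.e., the axisymmetric part
    cos_part = 2.0 / n * sum(
        v * math.cos(2.0 * math.pi * k / n) for k, v in enumerate(ring_speeds)
    )
    sin_part = 2.0 / n * sum(
        v * math.sin(2.0 * math.pi * k / n) for k, v in enumerate(ring_speeds)
    )
    a1 = math.hypot(cos_part, sin_part)  # amplitude, independent of phase
    return a0, a1


def msi_components(ring_speeds):
    """Split the ring maximum into wavenumber-0, wavenumber-1, and residual."""
    a0, a1 = low_wavenumber_amplitudes(ring_speeds)
    residual = max(ring_speeds) - (a0 + a1)
    return a0, a1, residual


# Synthetic ring (not data from the study): mean of 40, wavenumber-1
# amplitude of 8, plus a wavenumber-4 perturbation standing in for
# smaller-scale structure.
n_azimuths = 72
ring = [
    40.0
    + 8.0 * math.cos(2.0 * math.pi * k / n_azimuths - 0.5)
    + 3.0 * math.cos(4.0 * 2.0 * math.pi * k / n_azimuths)
    for k in range(n_azimuths)
]
a0, a1, residual = msi_components(ring)
print(f"wn0={a0:.2f}  wn1={a1:.2f}  residual={residual:.2f}")
```

By construction, the three returned values sum to the ring maximum, which mirrors the way the maximum wind speed is partitioned into low-wavenumber intensities and a residual.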

The observation-based low-wavenumber intensity values are obtained from an optimal estimation analysis of the SFMR observations provided by research and operational TC reconnaissance missions. The observation-based residual intensity is computed as the difference between the coincident BT intensity estimate and the sum of the low-wavenumber intensities. Objective analyses of 157 TC cases from 1998–2011 show that the residual intensity is a non-Gaussian stochastic quantity whose mean, standard deviation, and mean absolute norm are only 4.24, 4.08, and 4.94 m s^{−1}, respectively. The major conclusion from this finding is that the practical predictability of TC intensity with respect to the BT estimates, which have an expected uncertainty of about 5 m s^{−1} (Landsea and Franklin 2013; Torn and Snyder 2010), is determined by the low-wavenumber intensity, because the mean and mean absolute errors of a forecast with accurate low-wavenumber intensity would be the same as those for the residual intensity. This result justifies focusing on the low-wavenumber intensity in TC prediction.

The MSI and BT verification metrics are compared for forecasts produced in data assimilation and forecast experiments for 83 cases of observed TCs from 2008–11, using the Hurricane Weather Research and Forecasting model (HWRF; Aksoy et al. 2013; Gall et al. 2013). The major findings from this verification exercise are as follows:

- Using the MSI metric reveals that the forecasts could have significant errors in all three components of the intensity. For the forecast data used, both the statistical and deterministic verification measures indicate significant errors. For example, the statistical distribution of the forecast residual intensity has a smaller variance and a smaller mean than the observation-based residual, whereas the axisymmetric intensity exhibits a strong positive bias, and the wavenumber-1 intensity is characterized by a significant negative bias and significant mean absolute errors.
- Error compensation may occur between different intensity components if their errors are of opposite sign. For the forecast data used, the axisymmetric errors tend to be positive after the forecast adjustment to the initial condition, whereas the wavenumber-1 and residual-mean intensity errors are negative for all forecast times.
- The error compensation directly affects the verification using the BT metric, because the errors based on this metric are exactly equal to the sum of errors for the wavenumber-0 and -1 components and the residual for each individual realization of the error (every time the forecast is compared to the observation-based values). Consequently, the error compensation directly affects the mean and mean absolute error measures using the BT metric.
- Because of the error compensation, verification using the BT metric could be misleading regarding the forecast skill of the NWP modeling system that is used. For example, for the forecast data employed in this study, the BT metric suggests low forecast bias, while significant biases of opposite sign are present in the low-wavenumber and residual intensity components. Thus, verification using the BT metric may suggest higher forecast model skill than is actually present.
- The error compensation could significantly affect the assessment of the impact of numerical model changes on the forecast skill when only the BT metric is used. This is demonstrated using an example of a recent upgrade of the experimental HWRF system. For this example, the BT metric indicates improvement of the intensity forecast in both the mean and mean absolute measures, whereas the MSI metric shows a significant degradation of the axisymmetric intensity and no change in the wavenumber-1 and residual intensities. Serendipitously, the degradation of the axisymmetric intensity produced more favorable error compensation: the larger positive errors for this component better compensated for the unchanged negative errors of the remaining components.
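
The error compensation summarized in the findings above can be reproduced with a small Python sketch. The per-case error values below are synthetic and chosen only for illustration; they are not results from the study, and the helper names (`mean`, `mean_abs`) are hypothetical. The only property taken from the text is that, for each case, the BT-metric error equals the sum of the axisymmetric, wavenumber-1, and residual errors.

```python
def mean(values):
    """Arithmetic mean (bias when applied to errors)."""
    return sum(values) / len(values)


def mean_abs(values):
    """Mean absolute value (mean absolute error when applied to errors)."""
    return sum(abs(v) for v in values) / len(values)


# Synthetic per-case errors in m/s (forecast minus observation-based value).
# Illustrative numbers only; they are NOT results from the study.
axisymmetric = [6.0, 4.0, 5.0, 7.0]    # positive bias
wavenumber1 = [-3.0, -2.0, -4.0, -3.0]  # negative bias
residual = [-2.0, -3.0, -1.0, -3.0]     # negative bias

# For every case, the BT-metric error is the sum of the component errors.
bt = [a + w + r for a, w, r in zip(axisymmetric, wavenumber1, residual)]

for name, errs in [
    ("axisymmetric", axisymmetric),
    ("wavenumber-1", wavenumber1),
    ("residual", residual),
    ("BT (sum)", bt),
]:
    print(f"{name:>13}: mean={mean(errs):+.2f}  mean_abs={mean_abs(errs):.2f}")
```

With these made-up numbers, the BT-metric mean and mean absolute errors are both below 1 m s^{−1}, even though each component carries a bias of several m s^{−1}. Increasing the positive axisymmetric errors or reducing the negative ones would change the BT-metric scores without indicating which component actually changed.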

In summary, it is demonstrated that the MSI metric would provide a more informative error analysis for understanding and improving TC prediction than would the maximum intensity metric. Because the MSI metric makes use of low-wavenumber components of the TC wind field that can be well represented by both the observations and the models, the forecast verification could include robust estimates of the uncertainty. This could in turn lead to better hypotheses about the error sources and to improvements in modeling and data assimilation.

## Acknowledgments

The authors thank Drs. Sim Aberson, Altug Aksoy, and Sundararaman Gopalakrishnan of HRD/AOML who performed experiments with the HWRF system and provided the forecast data for this study. Thanks also to Dr. Christopher Landsea of NHC for the references regarding the history of best-track data.

## REFERENCES

Aberson, S. D., 2010: 10 years of hurricane synoptic surveillance (1997–2006). *Mon. Wea. Rev.*, **138**, 1536–1549, doi:10.1175/2009MWR3090.1.

Aksoy, A., S. D. Aberson, T. Vukicevic, K. J. Sellwood, and S. Lorsolo, 2013: Assimilation of high-resolution tropical cyclone observations with an ensemble Kalman filter using NOAA/AOML/HRD’s HEDAS: Evaluation of the 2008–11 vortex-scale analyses. *Mon. Wea. Rev.*, **141**, 1842–1865, doi:10.1175/MWR-D-12-00194.1.

Atlas, R., R. N. Hoffman, J. Ardizzone, S. M. Leidner, J. C. Jusem, D. K. Smith, and D. Gombos, 2011: A cross-calibrated, multiplatform ocean surface wind velocity product for meteorological and oceanographic applications. *Bull. Amer. Meteor. Soc.*, **92**, 157–174, doi:10.1175/2010BAMS2946.1.

Davis, C., W. Wang, J. Dudhia, and R. Torn, 2010: Does increased horizontal resolution improve hurricane wind forecasts? *Wea. Forecasting*, **25**, 1826–1841, doi:10.1175/2010WAF2222423.1.

Gall, R., J. Franklin, F. Marks, E. N. Rappaport, and F. Toepher, 2013: The Hurricane Forecast Improvement Project. *Bull. Amer. Meteor. Soc.*, **94**, 329–343, doi:10.1175/BAMS-D-12-00071.1.

Gopalakrishnan, S. G., F. Marks Jr., X. Zhang, J.-W. Bao, K.-S. Yeh, and R. Atlas, 2011: The Experimental HWRF System: A study on the influence of horizontal resolution on the structure and intensity changes in tropical cyclones using an idealized framework. *Mon. Wea. Rev.*, **139**, 1762–1784, doi:10.1175/2010MWR3535.1.

Gopalakrishnan, S. G., S. Goldenberg, T. Quirino, X. Zhang, F. Marks Jr., K.-S. Yeh, R. Atlas, and V. Tallapragada, 2012: Toward improving high-resolution numerical hurricane forecasting: Influence of model horizontal grid resolution, initialization, and physics. *Wea. Forecasting*, **27**, 647–666, doi:10.1175/WAF-D-11-00055.1.

Harper, B. A., J. D. Kepert, and J. D. Ginger, 2009: Guidelines for converting between various wind averaging periods in tropical cyclone conditions. World Meteorological Organization Tech. Rep. WMO/TD 1555, 64 pp.

Holton, J. R., 2004: *An Introduction to Dynamic Meteorology.* Academic Press, 531 pp.

Jarvinen, B. R., C. J. Neumann, and M. A. S. Davis, 1984: A tropical cyclone data tape for the North Atlantic basin, 1886–1983: Contents, limitations, and uses. NOAA Tech. Memo. NWS NHC 22, 21 pp.

Knaff, J. A., D. P. Brown, J. Courtney, G. M. Gallina, and J. L. Beven II, 2010: An evaluation of Dvorak technique–based tropical cyclone intensity estimates. *Wea. Forecasting*, **25**, 1362–1379, doi:10.1175/2010WAF2222375.1.

Landsea, C., and J. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. *Mon. Wea. Rev.*, **141**, 3576–3592, doi:10.1175/MWR-D-12-00254.1.

McAdie, C. J., and M. B. Lawrence, 2000: Improvements in tropical cyclone track forecasting in the Atlantic basin, 1970–98. *Bull. Amer. Meteor. Soc.*, **81**, 989–997, doi:10.1175/1520-0477(2000)081<0989:IITCTF>2.3.CO;2.

Montgomery, M. T., and R. J. Kallenbach, 1997: A theory for vortex Rossby-waves and its application to spiral bands and intensity changes in hurricanes. *Quart. J. Roy. Meteor. Soc.*, **123**, 435–465, doi:10.1002/qj.49712353810.

Powell, M. D., S. H. Houston, L. R. Amat, and N. Morisseau-Leroy, 1998: The HRD real-time hurricane wind analysis system. *J. Wind Eng. Ind. Aerodyn.*, **77–78**, 53–64.

Rappaport, E. N., and Coauthors, 2009: Advances and challenges at the National Hurricane Center. *Wea. Forecasting*, **24**, 395–419, doi:10.1175/2008WAF2222128.1.

Reasor, P. D., M. T. Montgomery, F. D. Marks, and J. F. Gamache, 2000: Low-wavenumber structure and evolution of the hurricane inner core observed by airborne dual-Doppler radar. *Mon. Wea. Rev.*, **128**, 1653–1680, doi:10.1175/1520-0493(2000)128<1653:LWSAEO>2.0.CO;2.

Rogers, R., and Coauthors, 2013a: NOAA’s Hurricane Intensity Forecasting Experiment (IFEX): A progress report. *Bull. Amer. Meteor. Soc.*, **94**, 859–882, doi:10.1175/BAMS-D-12-00089.1.

Rogers, R., P. Reasor, and S. Lorsolo, 2013b: Airborne Doppler observations of the inner-core structural differences between intensifying and steady-state tropical cyclones. *Mon. Wea. Rev.*, **141**, 2970–2991, doi:10.1175/MWR-D-12-00357.1.

Torn, R. D., and C. Snyder, 2010: Uncertainty of tropical cyclone best-track information. *Wea. Forecasting*, **25**, 61–78, doi:10.1175/2009WAF2222255.1.

Uhlhorn, E. W., and D. Nolan, 2012: Observational undersampling in tropical cyclones and implications for estimated intensity. *Mon. Wea. Rev.*, **140**, 825–840, doi:10.1175/MWR-D-11-00073.1.

Uhlhorn, E. W., P. G. Black, J. L. Franklin, M. Goodberlet, J. Carswell, and A. S. Goldstein, 2007: Hurricane surface wind measurements from an operational stepped frequency microwave radiometer. *Mon. Wea. Rev.*, **135**, 3070–3085, doi:10.1175/MWR3454.1.

Uhlhorn, E. W., B. W. Klotz, T. Vukicevic, P. D. Reasor, and R. F. Rogers, 2013: Observed hurricane wind speed asymmetries and relationships to motion and environmental shear. *Mon. Wea. Rev.*, doi:10.1175/MWR-D-13-00249.1, in press.

Van Sang, N., R. K. Smith, and M. T. Montgomery, 2008: Tropical cyclone intensification and predictability in three dimensions. *Quart. J. Roy. Meteor. Soc.*, **134**, 563–582, doi:10.1002/qj.235.

Velden, C., and Coauthors, 2006: The Dvorak tropical cyclone intensity estimation technique: A satellite-based method that has endured for over 30 years. *Bull. Amer. Meteor. Soc.*, **87**, 1195–1210, doi:10.1175/BAMS-87-9-1195.

Vukicevic, T., A. Aksoy, P. Reasor, S. D. Aberson, K. J. Sellwood, and F. Marks, 2013: Joint impact of forecast tendency and state error biases in ensemble Kalman filter data assimilation of inner-core tropical cyclone observations. *Mon. Wea. Rev.*, **141**, 2992–3006, doi:10.1175/MWR-D-12-00211.1.

Zhang, X., T. Quirino, K.-S. Yeh, S. Gopalakrishnan, F. Marks, S. Goldenberg, and S. Aberson, 2011: HWRFx: Improving hurricane forecasts with high-resolution modeling. *Comput. Sci. Eng.*, **13**, 13–21, doi:10.1109/MCSE.2010.121.