1. Introduction
Tropical cyclone (TC) postseason analysis data (i.e., best track) have been used for a wide variety of applications, such as forecast verification and the evaluation of climatological trends (e.g., Emanuel 2005; Webster et al. 2005; Rappaport et al. 2009). Until recently, few studies have taken into account or tried to objectively estimate the inherent uncertainties in TC position and intensity, measured by minimum sea level pressure (SLP) or maximum wind speed, contained in these datasets (e.g., Landsea et al. 2006, 2012).
While the National Oceanic and Atmospheric Administration/National Hurricane Center (NOAA/NHC) has provided a position uncertainty in each of their advisories since 1999, there are no such estimates of the intensity uncertainty. In most ocean basins, there are few in situ observations of TC intensity; therefore, this quantity is estimated using satellite-based techniques. The most widely used and accepted of these methods is the subjective Dvorak technique (Dvorak 1975, 1984), which involves determining the TC center, evaluating the cloud structure, measuring various quantities based on satellite imagery, converting these quantities into a “T number,” and estimating the current intensity (CI) based on a set of prescribed rules. Once a CI score is obtained, the forecaster can translate it into a maximum wind speed or minimum SLP based on a predefined table that varies depending on basin and organization. The interested reader is directed to Velden et al. (2006) for additional details and history of the Dvorak method.
Over time, various authors have verified Dvorak-based intensity guidance against temporally coincident aircraft reconnaissance data (e.g., Sheets and McAdie 1988). More recently, Knaff et al. (2010) performed a comprehensive evaluation of Dvorak maximum wind speed estimates in the Atlantic basin from 1989 to 2008. Their results indicate that the biases in Dvorak maximum wind speed estimates are a function of intensity, latitude, translation speed, size, and 12-h intensity trend, while the RMS error is primarily a function of intensity. Previously, Kossin and Velden (2004) documented a pronounced latitude-dependent bias in Dvorak minimum SLP estimates.
In addition to Dvorak-based intensity estimates, several other algorithms have been developed that estimate TC intensity using microwave data obtained from polar-orbiting satellites. The algorithms developed by Brueske and Velden (2003), Herndon and Velden (2006), and Demuth et al. (2004) use characteristics of microwave band brightness temperature images in the vicinity of TCs, which can measure various aspects of the TC warm core, to estimate minimum SLP and maximum wind speed. Later revisions of the Demuth et al. (2004) method include improved statistical techniques, more dependent data, and quadratic predictors that are meant to reduce errors for strong TCs (Demuth et al. 2006). The main drawback of these methods is that microwave imagers are located on polar-orbiting platforms, and thus they are not always available at synoptic times like Dvorak, which is based on geostationary images.
This manuscript extends the Knaff et al. (2010) work and estimates the uncertainty in TC position, minimum SLP, and maximum wind over a period of time using data available from the NHC. These estimates could be used to establish the expected lower bound for RMS differences between TC intensity forecasts and best-track estimates, provide confidence bounds to reject the null hypothesis of no change in basin-wide TC intensity, and provide observation error variances for data assimilation systems that assimilate TC position and intensity (e.g., Chen and Snyder 2007; Torn and Hakim 2009; Torn 2010), rather than employing vortex repositioning or vortex reconstruction (e.g., Liu et al. 2000; Kurihara et al. 1995).
Section 2 describes the data and method used in this study. In section 3, we present the uncertainty in TC position and intensity, while in section 4 we present lower bounds for TC intensity forecasts and mean intensity as a function of year. A summary and concluding remarks are found in section 5.
2. Data and methods
This study utilizes both subjective uncertainty estimates from the NHC and objectively derived estimates. The climatological TC position uncertainty is estimated by extracting the “position accurate within” value from NHC forecast advisories for both Atlantic and eastern Pacific Basin TCs from 2000 to 2009. This value is subjectively determined by an NHC forecaster based on a number of factors related to the difficulty in finding a center. The lowest position uncertainty values should be associated with times when the satellite image is characterized by a well-defined eye or aircraft reconnaissance data are available. It should be noted that these values apply to the real-time advisory position, not the best-track value. It is likely that the uncertainty in best-track positions is smaller by some unknown factor relative to the advisory value since the position can be retroactively corrected in the best track based on future data. This often occurs for positions determined during the night when only infrared imagery is available to the forecaster.
During the period considered here, NHC issued advisories 3 h after the standard synoptic time (0300, 0900, 1500, and 2100 UTC). In addition, special advisories were issued at 3-h intervals for systems threatening land. If NHC issued an advisory at the synoptic time, the position was assigned the uncertainty from that advisory; otherwise, it was assigned the value from the advisory issued 3 h later (i.e., the 0000 UTC position has the 0300 UTC advisory uncertainty) since the later advisory was based on data from the previous synoptic time. Any best-track entry that did not have an advisory issued within 3 h was not considered in this study. This often occurs when NHC adjusts the time of genesis following the season.
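The assignment rule above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_uncertainty`, the dictionary layout, and the sample advisory values are assumptions made for the example, not part of the NHC data format.

```python
from datetime import datetime, timedelta

def assign_uncertainty(best_track_time, advisories):
    """Return the advisory position uncertainty (n mi) for one best-track time.

    advisories maps advisory issue time -> "position accurate within" value.
    An advisory issued at the best-track time is preferred; otherwise the one
    issued 3 h later is used. None means the fix is excluded from the study.
    """
    if best_track_time in advisories:
        return advisories[best_track_time]
    later = best_track_time + timedelta(hours=3)
    return advisories.get(later)

# Hypothetical advisory values, for illustration only.
advisories = {
    datetime(2005, 8, 28, 3): 20,  # regular 0300 UTC advisory
    datetime(2005, 8, 28, 6): 15,  # advisory issued at a synoptic time
}
u00 = assign_uncertainty(datetime(2005, 8, 28, 0), advisories)   # uses 0300 UTC
u06 = assign_uncertainty(datetime(2005, 8, 28, 6), advisories)   # uses 0600 UTC
u12 = assign_uncertainty(datetime(2005, 8, 28, 12), advisories)  # no advisory
```

Here the 0000 UTC position inherits the 0300 UTC value, the 0600 UTC position uses its own advisory, and the 1200 UTC position is dropped.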




The errors in the Dvorak intensity estimates from the National Environmental Satellite, Data, and Information Service’s (NESDIS) Satellite Analysis Branch (SAB) and the NHC Tropical Analysis and Forecasting Branch (TAFB) are estimated against best-track values (Davis et al. 1984; Jarvinen et al. 1984; Blake et al. 2006; McAdie et al. 2006) for those times with aircraft reconnaissance data during 2000–09. The translation between CI and minimum SLP and maximum wind speed (in kt, where 1 kt = 0.514 m s−1) is given in Table 1. In addition, this study considers microwave-based intensity estimates based on the Demuth et al. (2006) (hereafter CIRA, for Cooperative Institute for Research in the Atmosphere) and Brueske and Velden (2003) (hereafter CIMSS, for Cooperative Institute for Meteorological Satellite Studies) methods. All satellite estimates are obtained from fix files archived at NHC. Only those satellite intensity estimates that occur within 2 h of an aircraft fix are verified against the best-track intensity, which should strongly reflect the in situ measurements from the aircraft. This time criterion results in 938 Dvorak cases to verify, which is roughly 25% of all best-track fixes during this 10-yr period. This particular period was chosen because it provided a large number of cases, had a similar satellite resolution, and had a consistent flight-level to surface wind reduction (Franklin et al. 2003).
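The 2-h matching criterion could be applied roughly as follows. The function name `fixes_near_aircraft` and the sample times are hypothetical; the real fix files carry more fields than a bare timestamp.

```python
from datetime import datetime, timedelta

def fixes_near_aircraft(satellite_times, aircraft_times, window_hours=2.0):
    """Keep satellite fix times within the window of at least one aircraft fix."""
    window = timedelta(hours=window_hours)
    return [t for t in satellite_times
            if any(abs(t - a) <= window for a in aircraft_times)]

# Hypothetical fix times, for illustration only.
satellite = [datetime(2005, 8, 28, 0), datetime(2005, 8, 28, 6),
             datetime(2005, 8, 28, 12)]
aircraft = [datetime(2005, 8, 28, 1, 30), datetime(2005, 8, 28, 14, 30)]
kept = fixes_near_aircraft(satellite, aircraft)
```

Only the 0000 UTC satellite fix survives in this example, because the other two are more than 2 h from the nearest aircraft fix.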
The lookup table for Dvorak CI vs tropical cyclone minimum SLP and maximum wind speed, taken from Dvorak (1984).
3. Uncertainty climatology
a. Position
In a climatological sense, the uncertainty in TC position has an inverse relationship with the intensity of the TC. Figure 1 shows histograms of advisory position uncertainty values for individual Atlantic basin advisories stratified by the best-track Saffir–Simpson category (Simpson 1974). The eastern Pacific results are qualitatively similar and thus are not shown. The tropical depression histogram is characterized by a skewed distribution, with a peak at 30 n mi (1 n mi = 1.852 km) and a broad tail that extends to 90 n mi, but also a number of times where the uncertainty is less than 10 n mi (Fig. 1a). In contrast, the major TC (category 3 and above, maximum wind speed greater than 49.4 m s−1) histograms are sharper and peak at 20 n mi (Figs. 1e,f). The larger uncertainty for weaker systems is likely due to difficulties in identifying a low-level circulation center under nondescript cirrus canopies. In contrast, more intense TCs often contain a well-defined eye in both visible and infrared satellite imagery, which simplifies the identification of the TC center. We also stratified the average uncertainty by advisory time (not shown). The 0900 UTC advisories are characterized by 1 n mi greater uncertainty relative to other advisory times (statistically significant at 95% confidence), which could result from not having visible imagery to identify the center, though this idea would need to be evaluated more extensively.
Histogram of TC position uncertainty obtained from NHC forecast advisories for Atlantic basin TCs from 2000 to 2009 stratified by best-track intensity.
Citation: Weather and Forecasting 27, 3; 10.1175/WAF-D-11-00085.1
The decrease in position uncertainty with increasing intensity is evident when computing the RMS uncertainty over all cases (Fig. 2). For both the Atlantic and eastern Pacific basins, the uncertainty decreases almost linearly from 43 n mi for tropical depressions (TDs) to 16 n mi for category 4 and 5 TCs. Moreover, the Atlantic and eastern Pacific are statistically indistinguishable from each other; therefore, even though these numbers are subjectively determined by NHC forecasters, the values are consistent between the two basins, lending support to the idea that these climatological uncertainty values could be applied to any ocean basin.
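As a minimal sketch of the stratified RMS calculation, assuming each advisory has already been labeled with its best-track category, one could write the following. The category labels and sample values are hypothetical.

```python
import math
from collections import defaultdict

def rms_by_category(records):
    """Return {category: RMS uncertainty} from (category, uncertainty) pairs."""
    squares = defaultdict(list)
    for category, uncertainty in records:
        squares[category].append(uncertainty ** 2)
    return {cat: math.sqrt(sum(vals) / len(vals)) for cat, vals in squares.items()}

# Hypothetical advisory values (n mi), not taken from the NHC archive.
sample = [("TD", 30), ("TD", 40), ("CAT4", 20), ("CAT4", 15)]
rms = rms_by_category(sample)
```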
RMS of NHC position uncertainty estimates from 2000 to 2009 as a function of TC best-track intensity. The top row of numbers shows the number of Atlantic basin cases, while the bottom row is for eastern Pacific cases.
One potential drawback of using the NHC advisory uncertainties for this kind of study is that these criteria are subjective and could change depending on forecaster, procedure, etc. To evaluate how appropriate these position uncertainties are, the differences between the center positions used in the SAB and TAFB Dvorak classifications are computed for all Atlantic basin synoptic times where both are available. This calculation should provide an estimate of the uncertainty in position over a large number of storms and synoptic times.
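The separation between two center fixes can be computed as a great-circle distance. The sketch below uses the haversine formula with an assumed mean Earth radius of 6371 km; the text does not state which distance formula was used, so this is one reasonable choice rather than the method of the study.

```python
import math

EARTH_RADIUS_N_MI = 6371.0 / 1.852  # assumed mean Earth radius, km -> n mi

def separation_n_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance (n mi) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_N_MI * math.asin(math.sqrt(a))

# Hypothetical SAB and TAFB fixes one degree of latitude apart.
d = separation_n_mi(25.0, -80.0, 26.0, -80.0)
```

One degree of latitude is about 60 n mi, which gives a quick sanity check on the result.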
In general, the histograms in Fig. 3 look similar to those in Fig. 1, with weaker storms characterized by greater probability for larger differences, though there are some notable exceptions. In nearly all categories, the SAB–TAFB position differences are peaked at 10 n mi with longer tails in the distribution for weaker systems, while the NHC estimates are peaked around 20 n mi. The RMS of the SAB–TAFB position differences is roughly 5 n mi less than the RMS of the NHC uncertainty when stratified by TC category (Table 2), which suggests that the NHC is overestimating the position uncertainty. It is also possible that the SAB–TAFB difference underestimates position uncertainty because both SAB and TAFB operators use the same data sources to derive position; thus, their positions could be too similar, particularly for weak systems or at night.
Histogram of the difference between the SAB and TAFB Dvorak position estimates for Atlantic basin TCs from 2000 to 2009 stratified by best-track intensity.
Homogeneous comparison between the RMS difference between the SAB and TAFB Dvorak positions and the RMS NHC position uncertainty stratified by best-track intensity for Atlantic basin TCs during 2000–2009 (n mi).
b. Intensity
Prior to describing the error in satellite intensity estimates, a brief summary of the aircraft fixes used in this verification is presented. Figure 4a shows the best-track locations of TCs where there was aircraft reconnaissance within 2 h of the satellite intensity estimate. Most of the fixes were west of 60°W where reconnaissance aircraft can reach a TC. Moreover, the less numerous microwave fixes are fairly well distributed among the larger set of Dvorak verification locations. Figure 4b shows the probability distribution function (PDF) of TC intensity for the times used here and the PDF of all Atlantic basin TCs during the same period. It appears there are comparatively more (fewer) times with aircraft data for category 4 and 5 TCs (TDs) relative to the Atlantic basin climatology, though other categories appear to be of similar frequency. This bias toward more frequent fixes for category 4 and 5 TCs is likely a consequence of their potential impact on human population; thus, they are sampled frequently by aircraft in this area. Furthermore, category 4 and 5 TCs are more likely to occur in this portion of the basin, which is within reach of aircraft.
(a) Location of satellite-estimated verification points that come within ±2 h of aircraft reconnaissance during 2000–09. The gray dots are points with only TAFB and SAB estimates, while the black dots are points with TAFB, SAB, and microwave-based estimates from CIRA and/or CIMSS. (b) Probability distribution function of all best-track maximum wind speed (solid line) and satellite-estimated verification times (dashed) during 2000–09.
Similar to previous Dvorak verification studies, the error in maximum wind speeds estimated via the Dvorak technique is a strong function of TC intensity. Figure 5 shows that for most categories, the individual differences between the SAB maximum wind speed estimate and the best-track value exhibit a normal distribution, with a peak value near zero (the TAFB distribution is qualitatively similar; not shown). The lone exception is category 3 TCs, which are characterized by a lower number of cases (76 over 10 yr; Fig. 5e). The lower number of cases is likely the result of a lack of intensity resolution in the Dvorak method at this Saffir–Simpson category. One notable difference between the different category histograms is the sharpness of the distribution, where more intense TCs are characterized by a broader distribution and are more likely to have large errors, or uncertainty in the intensity estimate. It should be noted that while more intense storms have larger absolute uncertainty, category 3 TCs actually have a smaller uncertainty than tropical storms (TSs) when considering the uncertainty as a percentage of the value itself (12% for category 3 TCs versus 20% for TSs). Nevertheless, this figure suggests that the Dvorak method does fairly well for a vast majority of storms; 90% of the fixes are within 10 kt of the best-track value.
Histogram of SAB maximum wind speed errors (forecast − observations) stratified by best-track intensity for those times when aircraft reconnaissance data are available within 2 h from 2000 to 2009.
The relationship between TC intensity and Dvorak error is clearly evident when comparing the RMS errors in the SAB and TAFB estimates (Fig. 6a). The RMS error in both the SAB and TAFB estimates increases from 7 kt for TD to 12 kt for category 4 and above TCs. In addition, the biases suggest that the Dvorak technique underestimates the intensity of TSs and category 4 and above TCs and overestimates the intensity of category 3 TCs. Although this study uses a smaller sample of cases, the RMS error and bias are quite similar to the Knaff et al. (2010) results (cf. their Fig. 3). Given this similarity, the interested reader is directed to Knaff et al. (2010) for a more detailed analysis of the bias sources observed here.
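For reference, the two verification statistics discussed here can be sketched as below, taking error as the satellite estimate minus the best-track value so that a negative bias means the technique underestimates intensity. The sample winds are hypothetical.

```python
import numpy as np

def rms_and_bias(estimate, best_track):
    """Return (RMS error, bias) with error defined as estimate - best track."""
    err = np.asarray(estimate, dtype=float) - np.asarray(best_track, dtype=float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(err))

# Hypothetical maximum wind speeds (kt) for three verification times.
rmse, bias = rms_and_bias([60, 70, 85], [65, 70, 90])
```

In practice the function would be applied separately to the cases in each Saffir–Simpson category to obtain curves like those described for Fig. 6.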
RMS error (thick line) and bias (thin line) in SAB (solid line) and TAFB (dashed line) for (a) maximum wind speed, (b) CI values, and (c) minimum SLP with respect to best-track data for those times when there is aircraft reconnaissance within 2 h of the estimate for Atlantic basin TCs from 2000 to 2009.
As noted by Knaff et al. (2010), one of the reasons that more intense TCs have larger intensity errors is the relative lack of precision in Dvorak CI numbers at those intensities (cf. Table 1). This idea is tested here by repeating the SAB and TAFB verification processes where the best-track maximum wind speed is converted into a CI number via linear interpolation of the values in Table 1. Indeed, Fig. 6b indicates that the RMS error in SAB and TAFB CI is approximately 0.55 T numbers at all categories, while the sign of the bias is similar to the maximum wind speed. Again, these results are similar to those of Knaff et al. (2010) and support their conclusion as to why there is greater error or uncertainty in Dvorak estimates for more intense TCs.
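The conversion from best-track wind to CI by linear interpolation might look like the following sketch. Table 1 is not reproduced in this text, so the CI and wind pairs below are an assumption: they are the commonly quoted Atlantic values attributed to Dvorak (1984) and should be checked against Table 1 before any real use.

```python
import numpy as np

# Assumed CI-to-wind pairs (kt); verify against Table 1 before relying on them.
CI = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0])
VMAX_KT = np.array([30, 35, 45, 55, 65, 77, 90, 102, 115, 127, 140, 155, 170])

def wind_to_ci(vmax_kt):
    """Linearly interpolate a CI number from maximum wind speed (kt).

    Winds outside the tabulated range are clamped to the end values.
    """
    return np.interp(vmax_kt, VMAX_KT, CI)

ci_cat1 = float(wind_to_ci(65))   # exactly on a table entry
ci_cat3 = float(wind_to_ci(96))   # between CI 5.0 and 5.5
step_kt = np.diff(VMAX_KT)        # wind change per half T number
```

With these assumed values, half a T number corresponds to 5 kt at the weak end of the table and 15 kt at the strong end, which is the loss of precision the paragraph refers to.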
Similar to maximum wind speed, the error in Dvorak estimates of minimum SLP is a strong function of TC intensity. For all categories, the histogram of individual time errors has a Gaussian-like shape, with a peak near zero (Fig. 7). Moreover, more intense TCs are characterized by broader tails to the distribution, which suggests greater possibility of larger error and thus higher uncertainty. This idea is confirmed in the RMS error, which shows a linear increase with TC category (Fig. 6c). Finally, the distributions for category 2 and greater TCs are more skewed toward positive errors, meaning that the actual storm has a lower minimum SLP relative to the Dvorak estimate. Although the source of this bias is beyond the scope of this study, it could be related to the breakdown of the linear relationship between pressure and wind for more intense TCs (e.g., Knaff and Zehr 2007; Holland 2008; Courtney and Knaff 2009).
As in Fig. 5, but for minimum SLP.
During most synoptic times, both SAB and TAFB produce separate Dvorak intensity estimates. We consider these two values to be an ensemble of two members and take the ensemble mean as a consensus intensity estimate and the sample standard deviation given by (2) as a predictor of the magnitude of that estimate’s error. To use the standard deviation in this way, the RMS error and RMS standard deviation should match each other when averaged over a large number of cases (e.g., Murphy 1988). Figures 8a,b show that the maximum wind speed and minimum SLP consensus error is roughly twice as large as the RMS standard deviation at all categories, which indicates that the standard deviation of these two estimates cannot be used to predict the intensity uncertainty. This is likely because the errors in the SAB and TAFB estimates are not independent; other studies have shown that Dvorak estimates are not overly sensitive to the individual doing the analysis (e.g., Mayfield et al. 1988).
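The consistency check between consensus error and ensemble spread can be sketched as follows. The equation for the sample standard deviation is not reproduced in this text, so the usual n − 1 normalization is assumed here; the function name and sample values are also hypothetical.

```python
import numpy as np

def consensus_stats(estimates, best_track):
    """Return (RMS consensus error, RMS standard deviation).

    estimates: array of shape (n_times, n_members), e.g. columns SAB and TAFB.
    best_track: array of shape (n_times,).
    """
    estimates = np.asarray(estimates, dtype=float)
    best_track = np.asarray(best_track, dtype=float)
    consensus = estimates.mean(axis=1)
    spread = estimates.std(axis=1, ddof=1)  # assumed n - 1 normalization
    rms_error = float(np.sqrt(np.mean((consensus - best_track) ** 2)))
    rms_std = float(np.sqrt(np.mean(spread ** 2)))
    return rms_error, rms_std

# Hypothetical SAB and TAFB maximum winds (kt) at three times.
rms_error, rms_std = consensus_stats([[60, 70], [90, 90], [100, 110]],
                                     [75, 85, 115])
```

If the spread were a reliable predictor of the error, the two returned values would be close to each other over a large sample; in this toy case the error is larger than the spread.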
RMS error (solid) and RMS standard deviation (dashed) in TC (a) maximum wind speed and (b) minimum SLP based on the SAB+TAFB consensus for synoptic times with aircraft reconnaissance within 2 h for Atlantic basin TCs as a function of best-track intensity from 2000 to 2009. (c),(d) As in (a),(b), but for the SAB+TAFB+CIRA consensus. (e),(f), As in (a),(b), but for the SAB+TAFB+CIRA+CIMSS consensus. The numbers along the bottom of (a), (c), and (e) give the number of times used in the verification.
An ensemble of estimates including other satellite-based algorithms could potentially provide a better representation of the intensity uncertainty. This possibility is assessed by repeating the consensus calculation above, but for times where there is a SAB, TAFB, and CIRA intensity estimate within 2 h of an aircraft reconnaissance fix from 2004 to 2009 (microwave estimates were not present in fix files prior to that). The relative infrequency of microwave estimates reduces the number of verification times by 84% compared to the SAB and TAFB verifications.
The addition of CIRA intensity estimates to the SAB+TAFB ensemble leads to a better match between the RMS error and standard deviation at several different intensity categories (Figs. 8c,d). This result is obtained because the RMS standard deviation in minimum SLP and maximum wind speed is nearly twice as large as the SAB+TAFB values at similar categories. It is noteworthy that the RMS error in the SAB+TAFB+CIRA consensus is much larger than the SAB+TAFB consensus for category 4 and 5 TCs. The degradation of the consensus for category 4 and 5 TCs comes from a well-documented bias in the CIRA algorithm (Demuth et al. 2006; Walton 2009); thus, the increased spread at this category is the result of a bias, rather than random error. In addition, category 3 TCs are characterized by relatively small RMS minimum SLP errors compared to the standard deviation.
Another measure of whether the sample standard deviation usefully represents the uncertainty is to compare the error magnitude against the standard deviation at individual times (Figs. 9a,b). If the standard deviation is appropriate, then times with larger standard deviations should be characterized by a greater probability of having a large error. Although the slope of the regression line between the error and the standard deviation is less than one, suggesting that the standard deviation is lower than the error, times with larger error are typically characterized by relatively larger standard deviations. Figure 9 suggests that one could use the standard deviation among satellite-based intensity estimates as a crude estimate of the uncertainty at individual times.
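A least squares fit of this kind can be sketched with `numpy.polyfit`. The sketch assumes the standard deviation is regressed on the error magnitude, following the ordering in the figure caption; the text does not state which variable is the predictor, and the sample values are hypothetical.

```python
import numpy as np

def spread_vs_error_fit(abs_error, std):
    """Fit std = slope * |error| + intercept and return (slope, intercept)."""
    slope, intercept = np.polyfit(np.asarray(abs_error, dtype=float),
                                  np.asarray(std, dtype=float), 1)
    return float(slope), float(intercept)

# Hypothetical error magnitudes and ensemble standard deviations (kt).
slope, intercept = spread_vs_error_fit([2.0, 4.0, 6.0, 8.0],
                                       [1.5, 2.5, 3.5, 4.5])
```

A slope below one, as in this toy case, means the spread grows more slowly than the error.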
Standard deviation vs error in TC (a) minimum SLP and (b) maximum wind speed from a SAB+TAFB+CIRA consensus for synoptic times with aircraft reconnaissance within 2 h for Atlantic basin TCs from 2000 to 2009. The solid line gives the best fit to the data. (c),(d) As in (a),(b), but for the SAB+TAFB+CIRA+CIMSS consensus.
Finally, introducing the CIMSS algorithm into the ensemble does not qualitatively change the relationship between the error and the sample standard deviation at any particular category. The RMS error and RMS standard deviation are roughly equivalent for maximum wind speed for up to category 4 TCs, while the minimum SLP RMS standard deviation is too large for category 2 and 3 TCs (Figs. 8e,f). In addition, the slope of the regression line between standard deviation and error at individual times is less than one for both minimum SLP and maximum wind speed (Figs. 9c,d).
4. Application to intensity forecasting
Because of the uncertainty in best-track information, even a forecast that agrees perfectly with reality will, in general, differ from the best-track position and intensity. Thus, when forecast error is defined as the difference between the forecast and best-track information, the best-track uncertainty gives an estimated lower bound on forecast error. These estimates can be used as another benchmark of forecast skill, similar to how NHC uses the Climatology and Persistence (CLIPER; Aberson 1998) and Decay Statistical Hurricane Intensity Forecast (SHIFOR; Knaff et al. 2003) models (e.g., Beven et al. 2008), and in determining whether year-to-year changes in forecast skill are statistically significant.
The following procedure is used to determine the lower bounds in forecast TC position, minimum SLP, and maximum wind speed error for a particular season. Since the uncertainty in these quantities is a function of storm and time, this calculation requires a more sophisticated method than a simple arithmetic mean over the desired times. For each synoptic-time best-track fix characterized by TD or greater intensity, we generate an error in that position, minimum SLP, and maximum wind speed by independently sampling (i.e., no serial correlation) from a normal distribution with mean zero and a time- and storm-dependent standard deviation determined from the following procedure. The position uncertainty is taken from the NHC advisory closest to the best-track time. In contrast, the uncertainty in minimum SLP and maximum wind speed is assigned based on whether there was aircraft reconnaissance within 2 h of the best-track time. If this condition was met, the uncertainty in minimum SLP is set to 2 hPa based on Willoughby et al. (1989); however, it is difficult to assign an uncertainty to the maximum wind speed when aircraft observations are present. Knaff and Zehr (2007) found that using the measured central pressure to estimate the maximum wind speed via the Dvorak pressure–wind relationship resulted in a 12-kt RMS error against the best track; however, this value was 50% smaller when using their revised pressure–wind relationship. Powell et al. (2009) determined an RMS error of 6 and 12 kt between common flight-level wind speed reductions to the surface and Stepped Frequency Microwave Radiometer (SFMR) measurements. Given the wide range of values, the uncertainty in maximum wind speed is set to 9 kt; the sensitivity to this assumption will be explored later. When aircraft data are not present, the uncertainty value is set to the SAB RMS error as a function of intensity (cf. Figs. 6a,c).
Once each best-track position, minimum SLP, and maximum wind speed error has been computed by this sampling method, the mean-absolute or RMS uncertainty is calculated for that set of best-track fixes. For example, one could consider all best-track fixes for a season, or the subset of best-track fixes that was used to validate a particular forecast lead time. To ensure that this procedure does not produce unrealistically high or low mean-absolute or RMS uncertainties due to the sampling procedure, the above process is repeated 10 000 times for the same set of best-track fixes, where the same time- and storm-dependent standard deviations are used to sample from a normal distribution, but the random number is different; this results in different estimates of the error in the best-track values at one time. For each of the 10 000 realizations of the errors for that set of best-track times, the mean-absolute and RMS uncertainties are computed. The 10 000 RMS uncertainty values are then averaged to obtain a mean RMS uncertainty for that set of best-track fixes.
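The sampling procedure described in the two paragraphs above can be sketched for a single scalar quantity such as maximum wind speed. The function name, the fixed seed, and the mix of standard deviations in the example are assumptions made for illustration; the per-fix standard deviations would in practice come from the aircraft and satellite rules described in the text.

```python
import numpy as np

def best_track_lower_bound(sigma, n_realizations=10_000, seed=0):
    """Return (mean of mean-absolute errors, mean of RMS errors).

    sigma: per-fix standard deviations for one set of best-track fixes.
    Each realization draws one independent zero-mean normal error per fix.
    """
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    errors = rng.normal(0.0, sigma, size=(n_realizations, sigma.size))
    mae = np.abs(errors).mean(axis=1)            # one value per realization
    rms = np.sqrt((errors ** 2).mean(axis=1))    # one value per realization
    return float(mae.mean()), float(rms.mean())

# Hypothetical season: 50 fixes at 9 kt (aircraft) and 50 at 12 kt (satellite).
sigma = np.array([9.0] * 50 + [12.0] * 50)
mean_mae, mean_rms = best_track_lower_bound(sigma, n_realizations=2000)
```

For normally distributed errors the mean-absolute value is about 0.8 times the standard deviation, so the mean-absolute lower bound is smaller than the RMS lower bound for the same set of fixes.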
Figure 10 shows the mean-absolute error in NHC official forecasts and the lower bound on forecast error (i.e., uncertainty) as a function of lead time for the 2005 Atlantic season, which is notable for the to-date record number of TCs and the severity of the damage to the United States (Beven et al. 2008). Whereas the track error increases from 15 n mi at 0 h to 260 n mi during the 5-day forecast, the uncertainty is roughly 32 n mi at all lead times. Given that the mean-absolute error is larger than the uncertainty, this figure suggests that there is potential for improvement in position forecasts, even at 12 h. For TC maximum wind speed, the mean-absolute error saturates at 20 kt for 72-h forecasts and beyond, which is similar to statistical models, such as the Statistical Hurricane Intensity Forecast model (SHIFOR; Knaff et al. 2003), while the uncertainty in maximum wind speed is roughly 7.8 kt at all lead times (Fig. 10b). Moreover, this figure indicates that it might be difficult to improve 12-h forecasts, while lead times beyond 72 h could be improved by up to 65%. Although NHC does not issue minimum SLP forecasts, the uncertainty in this metric is 3.9 hPa for all lead times (Fig. 10c).
Mean-absolute error (solid) in NHC official (a) position, and (b) maximum wind speed and (c) minimum SLP forecasts as a function of forecast hour for all storms during the 2005 season. The dashed line is the lower bound in the best-track quantities determined using the method outlined in section 4.
Recalculating these quantities for other TC seasons suggests that the uncertainty in TC position and intensity has some variance from season to season. During 2000–09, the uncertainty in TC position decreased at a rate of 1 n mi yr−1 (Fig. 11a). It is not clear why this might have occurred, though it could be related to NHC forecasters having greater access to scatterometer winds, land-based radars outside of the continental United States, and passive microwave imagery on polar-orbiting satellites (J. Beven and C. Landsea 2011, personal communication). The hypothesis that the decrease in position uncertainty is associated with increasing data available to NHC forecasters is supported by Fig. 12, which shows the average number of independent position fixes in the NHC fix files within 2 h of each synoptic time. During this period, the number of fixes increased from less than three per synoptic time in 2000 to five fixes per synoptic time at the end of the period.
Mean-absolute uncertainty in best-track (a) position, (b) maximum wind speed, and (c) minimum SLP for Atlantic basin TCs as a function of year. (d)–(f) As in (a)–(c), but for the eastern Pacific basin.
Average number of independent fix positions within 2 h of each synoptic time for Atlantic (solid) and eastern Pacific (dashed) TCs as a function of year.
In contrast to position, the uncertainties in maximum wind speed and minimum SLP are not characterized by a statistically significant trend over this 10-yr period (Figs. 11b,c). For maximum wind speed, the uncertainty during any one particular season is within 18% of the mean value of 7.8 kt, while minimum SLP is within 22% of the mean of 4.6 hPa. Much of the year-to-year variability can be explained by the differences in the percentage of times with aircraft reconnaissance during each season, with years with more aircraft times having lower uncertainty (Fig. 13a). The results are fairly insensitive to the uncertainty value used when an aircraft is present. Increasing or decreasing the maximum wind speed uncertainty by 3 kt [consistent with the uncertainty range discussed in Knaff and Zehr (2007) and Powell et al. (2009)] results in a 0.8-kt change in the uncertainty (12% of its value). Moreover, increasing the minimum SLP uncertainty associated with aircraft times by 2 hPa leads to a 0.6-hPa increase in the uncertainty in minimum SLP over the season. As a means of comparison, Landsea et al. (2012) estimated a position and intensity uncertainty of 60 n mi and 12 kt, respectively, for landfalling TCs during the 1886–1930 period; thus, the values obtained here seem appropriate given the substantial increase in observation density during this period.
Average best-track maximum wind speed (solid) and percentage of best-track fixes where an aircraft sampled the TC within 2 h of that time (dashed) for the (a) Atlantic and (b) eastern Pacific basins as a function of season.
The uncertainties in eastern Pacific TC position and intensity show trends similar to those of their Atlantic basin counterparts. Eastern Pacific position uncertainty decreased at a similar rate, though the values themselves are roughly 2–3 n mi larger than for the Atlantic (Fig. 11d). Whereas the average maximum wind speed uncertainty is similar to the Atlantic basin, the minimum SLP uncertainty is 0.8 hPa higher than for the Atlantic (Figs. 11e,f). Moreover, the apparent upward trend is not statistically significant at a reasonable confidence level, though it could be related to the slight upward trend in season-average maximum wind speed during this period (Fig. 13b). Unlike the Atlantic, the year-to-year variability in intensity uncertainty appears to be related to changes in the mean intensity of all TCs during that season (Fig. 13b). Moreover, the eastern Pacific results are fairly insensitive to the choice of maximum wind speed uncertainty at those times when aircraft observations were present due to the relative infrequency of aircraft reconnaissance in this basin.
5. Summary and conclusions
Although TC best-track data are widely used in a number of weather and climate applications, there have been few attempts to document the uncertainty in TC position and intensity. Here, we quantify the uncertainty in these quantities using subjectively defined position uncertainties contained in NHC advisories and by verifying satellite-based intensity estimates against best-track data during times with reconnaissance data over a 10-yr period. The latter is used to define intensity uncertainty because a majority of TC intensity estimates are obtained from satellite-based algorithms and NHC does not provide estimates of intensity uncertainty.
In general, TC position uncertainty decreases as the TC becomes more intense. Moreover, TDs, TSs, and weaker TCs are characterized by greater variability in position uncertainty between different storms and times relative to major TCs, even though the modes of each distribution are within 10 n mi. The lack of variability for major TCs is likely due to the emergence of an eye in satellite images, which makes it much easier to identify the circulation center. There is remarkable consistency in the mean TC position uncertainty for both the Atlantic and eastern Pacific basins, even with less frequent aircraft reconnaissance in the eastern Pacific. This suggests that although these individual position uncertainties are subjective and only apply to advisories, NHC forecasters have been consistent in the aggregate. It also appears that these estimates are 5 n mi greater than the difference between SAB and TAFB Dvorak positions; therefore, either the NHC values are too large or there is too much agreement between SAB and TAFB position estimates since they use the same data to define position.
Similar to TC position, the errors in satellite-based TC intensity estimates, which are used as a proxy for the TC intensity uncertainty, are a function of the intensity itself. With a few exceptions, the SAB and TAFB Dvorak error distribution is nearly Gaussian at all intensities, with a peak around zero and longer tails for more intense TCs. These longer tails translate into larger RMS errors in minimum SLP and maximum wind speed for more intense TCs. This result is qualitatively similar to that of Knaff et al. (2010) and is partially related to the lack of resolution in Dvorak CI values for more intense TCs, which allows the Dvorak technique to have errors of 0.5 T numbers at all intensities.
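The idea of stratifying satellite-estimate errors by intensity can be sketched as follows. This is a minimal illustration and not the procedure used in this study: the function names, the input layout, and the coarse bins are assumptions, with bin edges at the standard 34-, 64-, and 96-kt thresholds for tropical storms, hurricanes, and major hurricanes.

```python
import math
from collections import defaultdict


def intensity_bin(vmax_kt):
    """Assign a coarse intensity class from maximum wind speed (kt)."""
    if vmax_kt < 34:
        return "TD"
    if vmax_kt < 64:
        return "TS"
    if vmax_kt < 96:
        return "CAT1-2"
    return "MAJOR"


def error_stats_by_bin(pairs):
    """Compute sample size, bias, and RMS error per intensity class.

    pairs: iterable of (satellite_estimate_kt, reference_kt), where the
    reference is the verifying value (hypothetical input layout).
    """
    errors = defaultdict(list)
    for estimate, reference in pairs:
        errors[intensity_bin(reference)].append(estimate - reference)

    stats = {}
    for name, errs in errors.items():
        n = len(errs)
        bias = sum(errs) / n
        rmse = math.sqrt(sum(e * e for e in errs) / n)
        stats[name] = {"n": n, "bias": bias, "rmse": rmse}
    return stats
```

Binning on the reference value rather than on the estimate keeps each case in the class it actually belongs to, so that a biased estimate does not move itself into a neighboring bin.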
Given there are multiple satellite-based algorithms to determine intensity, it is possible to combine them as an ensemble, using the ensemble mean as a consensus intensity estimate and the ensemble standard deviation to assign an uncertainty to that estimate. Whereas the standard deviation of a Dvorak-only ensemble was roughly half the RMS error of the ensemble mean in all categories, including microwave-based intensity estimates from CIRA and CIMSS led to a better match between the ensemble standard deviation and the consensus error, both at individual times and averaged over many cases. Unlike the SAB- and TAFB-based Dvorak estimates, which only differ by who is performing the analysis and the intensity trend, the microwave-based estimates are objective, automated, and based on a completely different algorithm; therefore, adding CIRA and CIMSS intensity values is more likely to add diversity to the ensemble of intensity estimates. In the case of category 4 and 5 TCs, this diversity is not necessarily beneficial because the microwave schemes contain a significant bias. Nevertheless, this work suggests that NHC could use the variance among Dvorak and microwave-based intensity estimates to assign a crude intensity uncertainty when all of these estimates are available.
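The consensus approach described above can be sketched in a few lines. The example below is illustrative only: the function names, the dictionary layout, and the use of the sample (n − 1) standard deviation are assumptions, and any input values supplied to it would be hypothetical rather than data from this study. The second function compares the RMS ensemble spread with the RMS error of the ensemble mean over many cases, which is the match between spread and consensus error discussed in the text.

```python
import math
import statistics


def consensus(estimates):
    """Return (ensemble mean, sample standard deviation) of intensity estimates.

    estimates: dict mapping a source label (e.g., "SAB", "TAFB", "CIRA",
    "CIMSS") to an intensity value in consistent units.
    """
    values = list(estimates.values())
    if len(values) < 2:
        raise ValueError("need at least two estimates to compute a spread")
    return statistics.fmean(values), statistics.stdev(values)


def spread_and_error(cases):
    """Compare ensemble spread with the error of the ensemble mean.

    cases: list of (estimates_dict, verifying_value) pairs.
    Returns (RMS spread, RMS error of the ensemble mean).
    """
    variances = []
    squared_errors = []
    for estimates, verifying_value in cases:
        mean, spread = consensus(estimates)
        variances.append(spread ** 2)
        squared_errors.append((mean - verifying_value) ** 2)
    rms_spread = math.sqrt(statistics.fmean(variances))
    rms_error = math.sqrt(statistics.fmean(squared_errors))
    return rms_spread, rms_error
```

If the two returned values are close when averaged over many cases, the ensemble standard deviation is a reasonable stand-in for the uncertainty of the consensus; a spread much smaller than the error indicates an ensemble with too little diversity.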
These TC uncertainty statistics determine a lower bound for RMS differences between position and intensity forecasts and the corresponding best-track information from 2000 to 2009. Over one season, the uncertainty in position and intensity is a weak function of forecast lead time. If forecast error is defined as the difference from the best-track information, the uncertainty in position is several times smaller than the error from day 2 and beyond. The uncertainty in intensity is more significant and is never less than one-third of the error in the NHC official forecasts. While this leaves much room for improvement in intensity, these results also place an important limit on how closely TC intensity forecasts should reproduce best-track values, given that most TC intensity values are based on satellite estimates. In particular, these results suggest it might be difficult for the Hurricane Forecast Improvement Project (HFIP; http://www.hfip.org) to achieve a 50% reduction in intensity errors over the baseline. For lead times less than 72 h, this improvement would actually end up below the uncertainty in maximum wind speed. In the future, it will likely be difficult to reduce the uncertainty in TC intensity without more frequent aircraftlike observations, such as from unmanned vehicles or balloons. In addition, maximum wind speed is an ill-posed, noncontinuous metric; therefore, it will always have uncertainties related to undersampling.
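The comparison between an error-reduction target and the best-track uncertainty can be written as a simple check. In the sketch below, the function name is hypothetical and the baseline errors are placeholder numbers chosen only for illustration; they are not NHC verification statistics. The 7.8-kt floor is the mean Atlantic maximum wind speed uncertainty quoted earlier in the text.

```python
def target_vs_floor(baseline_error_kt, uncertainty_kt, reduction=0.5):
    """Return (target error, True if the target lies below the uncertainty floor)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    target = baseline_error_kt * (1.0 - reduction)
    return target, target < uncertainty_kt


# Placeholder baseline errors (kt) by lead time (h); illustrative values only.
hypothetical_baseline = {24: 10.0, 48: 14.0, 72: 17.0, 120: 20.0}
floor_kt = 7.8  # mean Atlantic maximum wind speed uncertainty from the text

results = {
    lead: target_vs_floor(error, floor_kt)
    for lead, error in hypothetical_baseline.items()
}
```

A lead time whose target falls below the floor is one at which a verified error of that size could not be distinguished from the uncertainty in the best-track values themselves.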
Finally, there is some year-to-year variability in TC position and intensity uncertainty related to the frequency of aircraft reconnaissance and mean intensity for the season. For TC position, there is a steady decrease in uncertainty both in the Atlantic and eastern Pacific basins. While it is unclear what has led to this decline, it is likely related to NHC forecasters having access to additional data sources that can help identify the TC position. In contrast, both TC minimum SLP and maximum wind speed uncertainty are characterized by variability about a mean value. The mean value in the Atlantic is approximately 20% smaller than that in the eastern Pacific, which likely results from more frequent aircraft reconnaissance data. The Atlantic values vary by 8% depending on the assumed uncertainty in maximum wind speed when aircraft data are present (6–12 kt). Although not computed here, it is likely that the uncertainty in other ocean basins is closer to the eastern Pacific than to the Atlantic due to the lack of in situ observations.
Although the techniques outlined here are mainly based on objective estimates, there is some reason to believe that the uncertainties obtained here might be upper bounds on the actual value. In many instances, the intensity estimates are based on more than just the Dvorak value; it is likely that the synthesis of many sources of data leads to lower errors than just SAB and TAFB values. Even though it was not considered here, the intensity uncertainty is probably smaller than the Dvorak error value following an aircraft reconnaissance estimate due to the autocorrelation in TC intensity. At this point, it is not clear how to take this into account, and it could provide a future direction of study. Nevertheless, these results suggest that the uncertainties in best-track position and intensity are not trivial and should be accounted for regardless of application. In the future, it may be necessary to revisit the values presented here based on revisions to satellite-based intensity estimates, new observation platforms, and revisions to the best-track and observation postprocessing.
Acknowledgments
We thank NHC for providing data for this study and Chris Landsea (NHC), Jack Beven (NHC), and John Knaff (CIRA) for providing feedback on the results. Two anonymous reviewers helped clarify some portions of this work. This work is sponsored by the NOAA Hurricane Forecast Improvement Project (HFIP).
REFERENCES
Aberson, S. D., 1998: Five-day tropical cyclone track forecasts in the North Atlantic basin. Wea. Forecasting, 13, 1005–1015.
Beven, J. L., and Coauthors, 2008: Atlantic hurricane season of 2005. Mon. Wea. Rev., 136, 1109–1173.
Blake, E. S., E. J. Gibney, D. P. Brown, M. Mainelli, J. L. Franklin, T. Kimberlain, and G. R. Hammer, 2006: Tropical cyclones of the eastern North Pacific basin 1949–2006. Historical Climatology Series 6-5, National Climate Data Center, 162 pp.
Brueske, K. F., and C. S. Velden, 2003: Satellite-based tropical cyclone intensity estimation using the NOAA-KLM series Advanced Microwave Sounding Unit. Mon. Wea. Rev., 131, 687–697.
Chen, Y., and C. Snyder, 2007: Assimilating vortex position with an ensemble Kalman filter. Mon. Wea. Rev., 135, 1828–1845.
Courtney, J., and J. A. Knaff, 2009: Adapting the Knaff and Zehr pressure–wind relationship for operational use in tropical cyclone warning centers. Aust. Meteor. Oceanogr. J., 58, 167–179.
Davis, M. A. S., G. M. Brown, and P. Leftwich, 1984: A tropical cyclone data tape for the eastern and central North Pacific basins, 1949–1983: Contents, limitations and uses. NOAA Tech. Memo. NWS NHC 25, 16 pp.
Demuth, J. L., M. DeMaria, J. A. Knaff, and T. H. Vonder Haar, 2004: Evaluation of Advanced Microwave Sounding Unit tropical cyclone intensity and size estimation algorithms. J. Appl. Meteor., 43, 282–296.
Demuth, J. L., M. DeMaria, and J. A. Knaff, 2006: Improvement of Advanced Microwave Sounding Unit tropical cyclone intensity and size estimation algorithms. J. Appl. Meteor. Climatol., 45, 1573–1581.
Dvorak, V. F., 1975: Tropical cyclone intensity analysis and forecasting from satellite imagery. Mon. Wea. Rev., 103, 420–430.
Dvorak, V. F., 1984: Tropical cyclone intensity analysis using satellite data. NOAA Tech. Rep. 11, 45 pp. [Available from NOAA/NESDIS, 5200 Auth Rd., Washington, DC 20333.]
Emanuel, K. A., 2005: Increasing destructiveness of tropical cyclones over the past 30 years. Nature, 436, 686–688.
Franklin, J. L., 2011: Comments on estimating maximum surface winds from hurricane reconnaissance measurements. Wea. Forecasting, 26, 774–776.
Franklin, J. L., M. L. Black, and K. Valde, 2003: GPS dropwindsonde wind profiles in hurricanes and their operational implications. Wea. Forecasting, 18, 32–44.
Herndon, D., and C. Velden, 2006: Upgrades to the UW-CIMSS AMSU-based tropical cyclone intensity algorithm. Preprints, 27th Conf. on Hurricanes and Tropical Meteorology, Monterey, CA, Amer. Meteor. Soc., 4B.5. [Available online at http://ams.confex.com/ams/pdfpapers/108186.pdf.]
Holland, G. J., 2008: A revised hurricane pressure–wind model. Mon. Wea. Rev., 136, 3432–3445.
Jarvinen, G. M., B. R. Brown, and M. A. S. Davis, 1984: A tropical cyclone data tape for the North Atlantic basin, 1886–1983: Contents, limitations and uses. NOAA Tech. Memo. NWS NHC 22, 21 pp.
Knaff, J. A., and R. M. Zehr, 2007: Reexamination of tropical cyclone wind–pressure relationships. Wea. Forecasting, 22, 71–88.
Knaff, J. A., M. DeMaria, B. Sampson, and J. M. Gross, 2003: Statistical, 5-day tropical cyclone intensity forecasts derived from climatology and persistence. Wea. Forecasting, 18, 80–92.
Knaff, J. A., D. P. Brown, J. Courtney, G. M. Gallina, and J. L. Beven III, 2010: An evaluation of Dvorak technique–based tropical cyclone intensity estimates. Wea. Forecasting, 25, 1362–1379.
Kossin, J. P., and C. S. Velden, 2004: A pronounced bias in tropical cyclone minimum sea level pressure estimation based on the Dvorak technique. Mon. Wea. Rev., 132, 165–173.
Kurihara, Y., M. A. Bender, R. E. Tuleya, and R. J. Ross, 1995: Improvements in the GFDL hurricane prediction system. Mon. Wea. Rev., 123, 2791–2801.
Landsea, C. W., B. A. Harper, K. Hoarau, and J. A. Knaff, 2006: Can we detect trends in extreme tropical cyclones? Science, 313, 452–454.
Landsea, C. W., S. Feuer, A. B. Hagen, D. A. Glenn, J. Sims, R. Perez, M. Chenoweth, and N. Anderson, 2012: A reanalysis of the 1921–30 Atlantic hurricane database. J. Climate, 25, 865–885.
Liu, Q., T. Marchok, H.-L. Pan, M. Bender, and S. J. Lord, 2000: Improvements in hurricane initialization and forecasting at NCEP with global and regional (GFDL) models. NOAA Tech. Procedures Bull. 472, 7 pp.
Mayfield, M., C. J. McAdie, and A. C. Pike, 1988: Tropical cyclone studies. Part 2—A preliminary evaluation of the dispersion of tropical cyclone position and intensity estimates determined from satellite imagery. Tech. Rep. FCM-R11-1988, Federal Coordinator for Meteorological Services and Supporting Research, 2-1–2-17. [Available from Office of Federal Coordinator for Meteorology, Ste. 1500, 8455 Colesville Rd., Silver Spring, MD 20910.]
McAdie, C. J., C. W. Landsea, C. J. Neumann, J. E. David, E. S. Blake, and G. R. Hammer, 2006: Tropical cyclones of the North Atlantic Ocean 1949–2006. Historical Climatology Series 6-2, National Climate Data Center, 238 pp.
Murphy, J. M., 1988: The impact of ensemble forecasts on predictability. Quart. J. Roy. Meteor. Soc., 114, 463–493.
Powell, M. D., E. W. Uhlhorn, and J. D. Kepert, 2009: Estimating maximum surface winds from hurricane reconnaissance measurements. Wea. Forecasting, 24, 868–883.
Powell, M. D., E. W. Uhlhorn, and J. D. Kepert, 2011: Reply. Wea. Forecasting, 26, 777–779.
Rappaport, E. N., and Coauthors, 2009: Advances and challenges at the National Hurricane Center. Wea. Forecasting, 24, 395–419.
Sheets, R. C., and C. McAdie, 1988: Tropical cyclone studies. Part 1—Preliminary results of a study of the accuracy of satellite-based tropical cyclone position and intensity estimates. Tech. Rep. R11-1988, Federal Coordinator for Meteorological Services and Supporting Research, 1-1–1-49. [Available from Office of the Federal Coordinator for Meteorology, Ste. 1500, 8455 Colesville Rd., Silver Spring, MD 20910.]
Simpson, R. H., 1974: The hurricane disaster potential scale. Weatherwise, 27, 169–186.
Torn, R. D., 2010: Performance of a mesoscale ensemble Kalman filter (EnKF) during the NOAA High-Resolution Hurricane test. Mon. Wea. Rev., 138, 4375–4392.
Torn, R. D., and G. J. Hakim, 2009: Ensemble data assimilation applied to RAINEX observations of Hurricane Katrina (2005). Mon. Wea. Rev., 137, 2817–2829.
Velden, C., and Coauthors, 2006: The Dvorak tropical cyclone intensity estimation technique: A satellite-based method that has endured for over 30 years. Bull. Amer. Meteor. Soc., 87, 1195–1210.
Walton, C. A., 2009: An analysis of tropical cyclone intensity estimates of the Advanced Microwave Sounding Unit (AMSU). B.S. thesis, Dept. of Atmospheric Sciences, University of Miami, 72 pp.
Webster, P. J., G. J. Holland, J. A. Curry, and H.-R. Chang, 2005: Changes in tropical cyclone number, duration and intensity in a warming environment. Science, 309, 1844–1846.
Willoughby, H. E., J. M. Masters, and C. W. Landsea, 1989: A record minimum sea level pressure observed in Hurricane Gilbert. Mon. Wea. Rev., 117, 2824–2828.
Beginning in 2010, NHC adopted a Knaff–Zehr–Courtney pressure–wind relationship (Knaff and Zehr 2007; Courtney and Knaff 2009) to determine the minimum SLP based on the maximum wind speed. As a consequence, there can be large differences in this pressure–wind relationship compared to previous Dvorak-based estimates.
The nonzero difference at 0 h is due to the difference between the advisory and best-track positions.