Official forecasts of tropical cyclone (TC) tracks issued by the China Meteorological Administration (CMA); the Regional Specialized Meteorological Centre in Tokyo, Japan; and the Joint Typhoon Warning Center (JTWC) were used to evaluate the accuracies, biases, and trends of TC track forecasts over the western North Pacific during 2005–14. Overall, the JTWC demonstrated the best forecasting performance, whereas the CMA showed the largest rate of improvement. Two main zones of large forecast errors were identified: a low-latitude zone comprising the South China Sea and the sea region east of the Philippines, and a midlatitude zone comprising the southern Sea of Japan and the sea region east of Japan. When TCs moved into the former zone, the forecast tracks contained both translational speed and direction biases, whereas slow biases predominated in the latter zone. Based on steering flow theory, twelve synoptic flow patterns were identified for the TCs with the largest errors. The two most common were the pattern combining cyclonic circulations, subtropical ridges, and midlatitude troughs (CRT; 26 TCs) and the pattern in which the TC track cannot be explained by the steering flow (NSF; 6 TCs). In the CRT pattern, TCs move northwestward, driven by the cyclonic circulations and the subtropical ridges, and then turn poleward and eastward under the influence of the midlatitude troughs. In the NSF pattern, the storms are embedded in southwesterly flow associated with the cyclonic circulation, and the steering flow suggests that they should turn to the right and move northeastward; instead, they persist in moving northwestward.
1. Introduction
Since the mid-1990s, considerable progress has been made in the operational forecasting of tropical cyclone (TC) tracks, which can be attributed to two major developments (Elsberry 2007). The first is the substantial improvement in the forecasting performance of numerical weather prediction (NWP) models. The accuracies of these models have been enhanced considerably by 1) rapid developments in model physical parameterization schemes; 2) higher model resolutions; 3) greater availability of aircraft, satellite, and other observational data (Aberson 2010; Rappaport et al. 2009); and 4) advances in data assimilation techniques (Hamill et al. 2011a,b). The second development is the operational application of the multimodel consensus technique, which can offset the random errors inherent in the forecasts of individual models (Goerss 2000; Goerss et al. 2004; Elsberry 2014). The development and operational application of this technique have been important factors in the improvement of short-range track forecasts (Gall et al. 2013).
Following the continuous improvement in TC track forecasting, issuing 5-day TC track forecasts has become a global standard (Elliott and Yamaguchi 2014), and extending the forecast lead time to 7 days is possible. However, significant track forecast errors (FEs) still exist, and in some cases the 3-day positional error exceeds 1000 km. For example, for Supertyphoon Lupit (2009), which changed direction several times, some 3-day forecasts issued by the various operational centers (initialized at 1200 and 1800 UTC 23 October and 0000 UTC 24 October 2009) had FEs of >1500 km. The sources of such large FEs remain poorly understood (Elliott and Yamaguchi 2014). Meteorologists have four primary sources of TC track forecasts: official subjective forecasts (OFCs), NWP models, multimodel consensus, and ensemble prediction systems. The latter three are referred to as forecast guidance. Comparing forecasts (both official subjective forecasts and objective guidance) with best-track datasets makes it possible to diagnose and quantify systematic and random errors, which in turn leads to improvements in both the official subjective methods and the numerical models. Hence, the analysis and evaluation of forecast results are important components in improving the accuracy of TC track forecasts.
Most operational centers analyze and evaluate their TC track forecasts annually (RSMC-Tokyo 2014; Chen and Cao 2014; Sopko and Falvey 2014). Independent researchers also conduct error analyses of operational TC track forecasts. Jarrell et al. (1978) analyzed the errors of the 1966–75 TC track forecasts issued by the Joint Typhoon Warning Center (JTWC) over the western North Pacific (WNP). Although the forecasts improved during this period, they contained two primary sources of error: those related to recurvature and those associated with initial positioning. On this basis, it was suggested that NWP models should fully account for the physical mechanisms related to recurvature and that observing systems should be expanded. Thompson et al. (1981) verified the TC track FEs of the now-defunct Eastern Pacific Hurricane Center and reached similar conclusions. Neumann and Pelissier (1981a) studied the 1971–78 OFCs for the Atlantic basin issued by the National Hurricane Center (NHC). After adjusting for year-to-year variations in average TC latitude and translational speed, they found no significant trend in FEs and concluded that the NHC's skill in forecasting Atlantic TC tracks had not improved during the 1970s. They also found that the NHC OFCs contained small right-of-track and slow biases.
Subsequent evaluations of OFC tracks focused on trend analyses of forecast accuracy. McAdie and Lawrence (2000) expanded on Neumann's earlier work and found that during 1979–98 the NHC's 24-, 48-, and 72-h TC track forecasts over the Atlantic basin improved at average annual rates of 1.0%, 1.7%, and 1.9%, respectively; these trends were statistically significant at the 0.05 level. In contrast, Powell and Aberson (2001) examined inferred landfall forecasts at lead times of roughly 12, 24, 36, 48, and 60 h before landfall and found that during 1976–2000 none of the NHC landfall location error trends showed statistically significant improvement. According to Franklin et al. (2003), the results of McAdie and Lawrence (2000) and Powell and Aberson (2001) differed because of differences in verification methodology and in the samples analyzed. Franklin et al. (2003) therefore selected TCs that had threatened the United States during 1970–2001 as their evaluation sample and concluded that the NHC's OFC performance for these TCs had improved significantly over the three decades. Elsberry (2007) evaluated the 72-h FEs of the OFCs issued by various operational centers for 1990–2005 (see Fig. 1 in that study) and showed that during 2000–05 these errors had been reduced significantly in all Northern Hemisphere TC basins. This considerable improvement in 72-h forecasts prompted the JTWC to begin issuing 120-h track forecasts in 2003, and by 2005 the accuracy of the JTWC's 120-h forecasts was comparable to that of its 72-h forecasts a decade earlier (Elsberry 2007).
Several operational centers monitor and forecast TCs over the same ocean basin. In the WNP, for example, five centers [the China Meteorological Administration (CMA); the Regional Specialized Meteorological Center in Tokyo, Japan (RSMC-Tokyo); the JTWC; the Korea Meteorological Administration (KMA); and China's Hong Kong Observatory (HKO)] are responsible for monitoring and forecasting the locations and intensities of all active TCs. Once the performance of each center has been evaluated accurately, a systematic evaluation and error analysis of the OFCs issued by these centers has considerable research and operational value. Such a study is important not only for understanding the performance of the official forecasts but also for determining the current status and development trend of OFCs in a specific basin.
However, the aforementioned studies focused on the accuracy of the OFCs issued by individual operational centers; studies comparing the accuracies of TC track forecasts issued by different centers are rare. According to the World Meteorological Organization (WMO 2013), a comparative analysis of forecast results should use homogeneous samples; that is, for a given TC, forecast initialization time, and lead time, all official forecasts of interest must exist and be verifiable against an existing best-track observation. A larger sample size also makes the verification more robust. In addition, the forecast errors should be calculated with an identical best-track dataset and an identical algorithm. However, the verification samples in the operational centers' annual reports and in the aforementioned studies differed, and the best-track datasets used as references by these centers also contain discrepancies. Hence, using data from the annual reports or the aforementioned studies directly for objective comparisons would be neither systematic nor complete.
Furthermore, to the best of the authors' knowledge, the cases with large TC track forecast errors over the WNP in recent years, and their associated error characteristics and synoptic flow patterns, have not yet been examined systematically.
Therefore, the aims of this work are to 1) determine which TC forecasting and warning center performed best in TC track forecasting over the WNP during 2005–14, 2) identify statistically significant error characteristics, and 3) characterize the synoptic flow patterns over the WNP that accompany large errors in 72-h forecasts.
The remainder of this paper is organized as follows. The best-track and TC track forecast data and evaluation methods are described in section 2. Section 3 presents the evaluation results, characteristics identified through error analysis, synoptic flow patterns associated with larger forecast errors, and an assessment of the forecast performance in two failed cases. Finally, section 4 presents the conclusions.
2. Data and methods
a. Selection of OFCs and best-track datasets
OFCs of TC tracks over the WNP issued during 2005–14 by the CMA, RSMC-Tokyo, and JTWC were selected for this study, with forecast lead times of 24, 48, 72, 96, and 120 h. The RSMC-Tokyo issued relatively few OFCs at the 96- and 120-h lead times (approximately one-tenth as many as either the JTWC or the CMA). Therefore, to retain a larger sample size, only homogeneous samples of the 96- and 120-h forecasts issued by the JTWC and CMA were evaluated. The JTWC began issuing 96- and 120-h forecasts in 2003, whereas the CMA began issuing them in 2008 and 2010, respectively. Hence, the 96- and 120-h forecasts were selected from 2008–14 and 2010–14, respectively. Ultimately, the numbers of homogeneous samples from the three centers for the 24-, 48-, and 72-h forecasts were 3520, 2807, and 2266, respectively, and the numbers of homogeneous CMA–JTWC samples for the 96- and 120-h forecasts were 1093 and 599, respectively (Table 1).
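The homogeneity requirement can be made concrete with a short Python sketch. It assumes, hypothetically, that each center's forecasts are stored in a dictionary keyed by (storm identifier, initialization time as YYYYMMDDHH, lead time in hours) and that best-track fixes are keyed by (storm identifier, valid time); these data structures and names are illustrative and not the archive format of any center.

```python
from datetime import datetime, timedelta

def valid_time(init: str, lead_h: int) -> str:
    """Return the verifying time (YYYYMMDDHH) of a forecast initialized at `init`."""
    t = datetime.strptime(init, "%Y%m%d%H") + timedelta(hours=lead_h)
    return t.strftime("%Y%m%d%H")

def homogeneous_keys(forecasts, best_track):
    """Keys (storm, init, lead_h) forecast by every center and verifiable against the best track.

    forecasts:  {center: {(storm, init, lead_h): (lat, lon)}}   (hypothetical layout)
    best_track: {(storm, valid_time): (lat, lon)}
    """
    # A sample is homogeneous only if every center issued it ...
    common = set.intersection(*(set(fc) for fc in forecasts.values()))
    # ... and a best-track fix exists at its verifying time.
    return {k for k in common if (k[0], valid_time(k[1], k[2])) in best_track}

if __name__ == "__main__":
    # Toy values for a hypothetical storm "TC-A".
    fc = {
        "CMA":  {("TC-A", "2009102312", 24): (20.0, 128.0),
                 ("TC-A", "2009102318", 24): (20.1, 127.8)},
        "JTWC": {("TC-A", "2009102312", 24): (19.9, 128.1)},
    }
    bt = {("TC-A", "2009102412"): (19.8, 127.9)}
    print(homogeneous_keys(fc, bt))
```

Only the forecast issued by both centers and matched by a best-track fix survives; the same intersection would simply include a third center for the 24–72-h samples.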
Currently, four organizations provide best-track datasets for WNP TCs: the RSMC-Tokyo, JTWC, CMA, and the Hong Kong Observatory. The best-track datasets of the different centers generally differ in TC position, intensity, and structure (Song et al. 2010), so the evaluated forecast performance depends on which dataset is chosen as the reference (Yu et al. 2012). Here, the best-track datasets of the RSMC-Tokyo, CMA, and JTWC were each used as the reference for calculating the FEs (Table 2), and the resulting FE trends showed no major differences (Table 3). The RSMC-Tokyo best-track dataset was chosen as the reference for all subsequent statistical analyses because the RSMC-Tokyo undertakes specialized activities in analyzing and forecasting WNP TCs within the framework of the World Weather Watch Program of the WMO.
b. Evaluation metric and methods
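The FE (in km) was the evaluation metric used for the analysis of the OFC performance. It is defined as the great-circle distance between the forecasted and best positions, and is given by (Neumann and Pelissier 1981a)

$$\mathrm{FE} = R \arccos\left[\sin\varphi_f \sin\varphi_b + \cos\varphi_f \cos\varphi_b \cos(\lambda_f - \lambda_b)\right],$$

where $\varphi$ and $\lambda$ represent the latitude and longitude, respectively, of the TC center position; the subscripts $f$ and $b$ denote the forecast and best-track positions; and $R$ is the radius of Earth.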
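For reference, the great-circle FE can be evaluated in a few lines of Python. This is a minimal sketch assuming a mean Earth radius of 6371 km (the radius used operationally may differ) and positions in decimal degrees; the clamp guards against rounding pushing the cosine slightly outside [-1, 1]. For very small separations, the haversine form is numerically better conditioned than the arccos form.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def forecast_error_km(lat_f, lon_f, lat_b, lon_b):
    """Great-circle distance (km) between a forecast and a best-track position (degrees)."""
    phi_f, phi_b = math.radians(lat_f), math.radians(lat_b)
    dlam = math.radians(lon_f - lon_b)
    c = (math.sin(phi_f) * math.sin(phi_b)
         + math.cos(phi_f) * math.cos(phi_b) * math.cos(dlam))
    c = max(-1.0, min(1.0, c))  # keep the acos argument in its domain despite rounding
    return EARTH_RADIUS_KM * math.acos(c)

if __name__ == "__main__":
    # One degree of latitude is roughly 111 km for this radius.
    print(forecast_error_km(21.0, 130.0, 20.0, 130.0))
```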
The cross-track error (CTE) is defined as the minimum distance between the forecast TC position and the interpolated observed track (the line segments connecting consecutive positions of a TC in the best-track dataset). A positive (negative) CTE indicates that the forecast position is to the right (left) of the best track. The along-track error (ATE) is the great-circle distance between the observed TC position and the point where the cross-track line intersects the interpolated best track. The ATE is positive (negative) when the forecast position lies ahead of (behind) the observed cyclone (Shapiro and Neumann 1984).
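The sign conventions for the ATE and CTE can be illustrated with a simplified Python sketch. As an assumption for illustration, it projects positions onto a local flat (equirectangular) plane, takes the storm-motion direction from the previous best-track fix to the verifying fix, and resolves the forecast displacement along and across that direction. This approximates, but is not identical to, the interpolated-track construction of Shapiro and Neumann (1984); the function names are hypothetical.

```python
import math

KM_PER_DEG = 111.195  # km per degree of latitude, assuming R = 6371 km

def to_local_km(lat, lon, lat0, lon0):
    """Equirectangular offset of (lat, lon) from (lat0, lon0): (east_km, north_km)."""
    east = (lon - lon0) * KM_PER_DEG * math.cos(math.radians(lat0))
    north = (lat - lat0) * KM_PER_DEG
    return east, north

def along_cross_track(fcst, obs, obs_prev):
    """Approximate (ATE, CTE) in km for a forecast verifying at best-track fix `obs`.

    Positions are (lat, lon). Storm motion is taken from `obs_prev` to `obs`.
    ATE > 0: forecast ahead of the storm (too fast); CTE > 0: right of track.
    """
    mx, my = to_local_km(*obs, *obs_prev)
    speed = math.hypot(mx, my)
    if speed == 0.0:
        raise ValueError("stationary storm: motion direction undefined")
    ux, uy = mx / speed, my / speed       # unit vector of storm motion
    dx, dy = to_local_km(*fcst, *obs)     # forecast minus observed position
    ate = dx * ux + dy * uy               # component along the motion
    cte = dx * uy - dy * ux               # component toward the right of the motion
    return ate, cte
```

For a storm moving due north, a forecast displaced east of the observed fix has a positive CTE (right of track), and a forecast displaced farther north has a positive ATE (too fast).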
To compare the performance of the CMA, RSMC-Tokyo, and JTWC OFCs, their track FEs were examined with a series of conventional evaluation methods. Specifically, the mean FEs, percentile distributions, and probability density distributions describe, respectively, the forecast accuracy, the forecast stability, and the frequency distribution of errors at a single lead time. In addition, the decomposition into ATEs and CTEs was used to describe the biases in the forecast TC translational speed and direction, while the regional distribution of the FEs described their spatial characteristics.
Neumann and Pelissier (1981b) noted that the probability ellipse is a suitable tool for visually interpreting patterns of bias associated with a bivariate normal distribution. The centroid of the ellipse represents the mean bias, and the size, shape, and orientation of the ellipse describe the dispersion of the errors. Hence, probability ellipses were used in this study to describe the distribution of the FEs.
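A 90% probability ellipse of the kind described by Neumann and Pelissier (1981b) can be computed from the sample covariance of the (CTE, ATE) pairs. The sketch below assumes bivariate normality and uses the fact that, for two degrees of freedom, the chi-square quantile is -2 ln(1 - p); the function name and return layout are illustrative.

```python
import math

def probability_ellipse(cte, ate, p=0.90):
    """Ellipse expected to contain fraction p of bivariate-normal (CTE, ATE) errors.

    Returns (center, semi_major, semi_minor, theta_deg), where theta is the angle of
    the major axis measured counterclockwise from the CTE (x) axis.
    """
    n = len(cte)
    mx, my = sum(cte) / n, sum(ate) / n
    sxx = sum((x - mx) ** 2 for x in cte) / (n - 1)
    syy = sum((y - my) ** 2 for y in ate) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cte, ate)) / (n - 1)
    # Eigenvalues of the 2 x 2 sample covariance matrix.
    half_tr = 0.5 * (sxx + syy)
    disc = math.sqrt(max(half_tr ** 2 - (sxx * syy - sxy ** 2), 0.0))
    lam_major, lam_minor = half_tr + disc, max(half_tr - disc, 0.0)
    # Chi-square quantile with 2 degrees of freedom.
    k = -2.0 * math.log(1.0 - p)
    theta = 0.5 * math.degrees(math.atan2(2.0 * sxy, sxx - syy))
    return (mx, my), math.sqrt(k * lam_major), math.sqrt(k * lam_minor), theta
```

With CTE on the x axis and ATE on the y axis, a major axis near 90° indicates errors dominated by speed (along-track) rather than direction (cross-track), and the ratio of the semi-axes measures how elongated the error cloud is.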
3. Results
a. Mean FEs during 2005–14
The numbers of homogeneous samples and the mean FEs of the OFCs issued by the three operational centers during 2005–14 are listed in Table 1 for each lead time. The three centers had similar 10-yr mean errors for the 24-, 48-, and 72-h TC track forecasts over the WNP, approximately 110, 200, and 300 km, respectively. At the 24-h lead time, the FEs of the JTWC were the smallest, followed by those of the RSMC-Tokyo and CMA. At the 48- and 72-h lead times, the FEs of the JTWC were again the smallest, followed by those of the CMA and RSMC-Tokyo. At the 96- and 120-h lead times (for which the RSMC-Tokyo had too few samples), the FEs of the JTWC were smaller than those of the CMA. The standard deviations of the 24-h FEs of the CMA, RSMC-Tokyo, and JTWC were 83, 72, and 75 km, respectively; those of the 48-h FEs were 145, 143, and 131 km; and those of the 72-h FEs were 221, 230, and 219 km.
The distributions of track FEs for the various lead times are shown in Fig. 1. For all three centers, the mean FE increased with lead time, as did the interpercentile ranges (25th–75th and 5th–95th percentiles). We use the interquartile range of the box plot of FEs as a measure of forecast stability, with a smaller interquartile range indicating a more stable forecast; Fig. 1 thus also shows the stability of the TC track forecasts issued by the various centers. At the 24-, 48-, and 72-h lead times, the mean FEs of the JTWC were smaller than those of the CMA and RSMC-Tokyo, as were its 5th, 25th, 50th, 75th, and 95th percentiles. At the 96- and 120-h lead times, the mean and percentiles of the JTWC FEs were again smaller than those of the CMA. The 75th and 95th percentiles characterize the samples with larger errors, so the proportion of large-error cases was smaller for the JTWC than for the CMA and RSMC-Tokyo, especially at longer lead times. This may be one reason why the mean FEs of the JTWC were smaller than those of the CMA and RSMC-Tokyo. At the 24- and 48-h lead times, the 75th and 95th percentiles of the RSMC-Tokyo FEs were smaller than those of the CMA, while the two centers performed similarly at 72 h. Overall, the JTWC exhibited the best forecasting performance during 2005–14 in terms of both accuracy and stability, and the performances of the CMA and RSMC-Tokyo were slightly poorer but comparable with each other.
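The accuracy and stability summaries used here (mean, selected percentiles, and interquartile range) can be sketched with Python's standard `statistics` module. The percentile convention (`method="inclusive"`, i.e., linear interpolation between order statistics) is an assumption; other conventions give slightly different values for small samples.

```python
import statistics

def error_summary(errors_km):
    """Mean, 5th/25th/50th/75th/95th percentiles, and interquartile range of FEs (km)."""
    # quantiles(n=100) returns the 1st..99th percentiles as 99 cut points.
    q = statistics.quantiles(errors_km, n=100, method="inclusive")
    summary = {"mean": statistics.fmean(errors_km)}
    for p in (5, 25, 50, 75, 95):
        summary[f"p{p}"] = q[p - 1]
    # The interquartile range serves as the stability measure (smaller = more stable).
    summary["iqr"] = summary["p75"] - summary["p25"]
    return summary
```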
b. Trend of mean FEs during 2005–14
The annual numbers of homogeneous samples and the annual mean FEs of the OFCs issued by the three operational centers during 2005–14 are listed in Table 2. In the initial year of evaluation (2005 for the 24-, 48-, and 72-h lead times; 2008 and 2010 for the 96- and 120-h lead times, respectively), the 24-, 48-, 72-, 96-, and 120-h FEs of the TC tracks over the WNP were approximately 100, 190, 290, 550, and 540 km, respectively. By 2014, they were approximately 100, 160, 240, 300, and 430 km, respectively, a considerable reduction at every lead time except 24 h.
The time series of the annual mean FEs are shown in Fig. 2. Linear regression of the annual mean FEs on time showed that the FE trends of all three centers were negative (dashed straight lines in Fig. 2). The CMA's 24-, 48-, 72-, and 96-h FEs decreased by 3.3, 6.3, 8.8, and 37.3 km per year, respectively (Table 3), and the JTWC's FEs at the same lead times decreased by 1.6, 5.5, 11.2, and 44.2 km per year, respectively.
For the RSMC-Tokyo, the linear trends at all lead times were not significant at the 90% confidence level according to the F test (Table 3), indicating that the RSMC-Tokyo's 24-, 48-, and 72-h forecasts did not improve significantly during 2005–14. Thus, only the trends of the CMA and JTWC were considered when analyzing the progress at the various lead times. Among these, the 96-h TC track forecasts over the WNP improved the fastest: their FEs decreased by approximately 40 km per year, a trend significant at the 99% confidence level (F test; Table 3). The accuracy of the 96-h forecasts in 2014 was comparable with that of the 72-h forecasts in 2005 (Table 2).
The accuracies at the other lead times exhibited two different characteristics: the 24-, 48-, and 72-h forecasts improved year by year, whereas the accuracy of the 120-h forecasts fluctuated without any significant trend. Although the interannual trends at the 24-, 48-, and 72-h lead times were significant at the 90% confidence level (F test), the explained variance did not exceed 50% in most cases (Table 3). Time alone therefore explains only part of the variation of the FEs at these lead times, and other factors need to be considered, including differences in the annual number of TCs (Elliott and Yamaguchi 2014) and in the annual average latitude and translational speed of TCs at forecast initialization (Neumann and Pelissier 1981a). However, the investigation of such factors was beyond the scope of this study.
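The trend statistics discussed above (slope, explained variance, and F statistic) can be reproduced for a single center and lead time with an ordinary least-squares sketch such as the following. The function is hypothetical; for a simple linear regression on n annual values, the F statistic has (1, n - 2) degrees of freedom, and with 10 annual values (n - 2 = 8) the 90% critical value is about 3.46.

```python
def linear_trend(years, values):
    """OLS trend of annual mean FEs: (slope, intercept, r_squared, f_statistic).

    The F statistic tests slope = 0 with (1, n - 2) degrees of freedom.
    """
    n = len(years)
    mx, my = sum(years) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    syy = sum((y - my) ** 2 for y in values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    slope = sxy / sxx                    # change in FE per year (km per year)
    intercept = my - slope * mx
    ss_reg = slope * sxy                 # regression (explained) sum of squares
    ss_res = max(syy - ss_reg, 0.0)      # residual sum of squares
    r_squared = ss_reg / syy             # fraction of variance explained by time
    f_stat = float("inf") if ss_res == 0.0 else ss_reg / (ss_res / (n - 2))
    return slope, intercept, r_squared, f_stat
```

A negative slope with a small `r_squared` is exactly the situation described in the text: a statistically detectable improvement that time alone only partly explains.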
Figure 2 also illustrates the substantial progress achieved by the three operational centers in recent years in forecasting WNP TC tracks. For 2010–14, their 5-yr mean FEs at the 24-, 48-, 72-, 96-, and 120-h lead times were 104, 183, 278, 375, and 520 km, respectively (Table 1). Using the historical accuracy of the JTWC's track forecasts as a reference [see Table 6–1 in Sopko and Falvey (2014)], the accuracies of the 48-, 72-, and 96-h forecasts during 2010–14 were comparable to those of the 24-h forecasts during 1996–2000, the 48-h forecasts during 1998–2002, and the 72-h forecasts during 1998–2002, respectively. In other words, during 2010–14, the 2-, 3-, and 4-day forecasts of TC tracks over the WNP were as accurate as the 1-, 2-, and 3-day forecasts of some 12–14 yr earlier, respectively.
Among the operational centers, the CMA showed the most pronounced improvement in forecast accuracy. It ranked relatively low among the three centers before 2010 (Table 2); however, by 2014, the accuracy of its 24-h forecasts matched that of the other two centers (all approximately 100 km). The accuracy of the CMA's 48-h forecasts (150 km) was comparable with that of the JTWC (156 km) and better than that of the RSMC-Tokyo (176 km). At 72 h, the CMA track forecasts (206 km) were more accurate than those of the JTWC (233 km) and RSMC-Tokyo (271 km). All comparisons of forecast accuracy between the operational centers passed the t test at the 95% confidence level.
The box plots of the three operational centers' annual FEs (Fig. 3) indicate the trends in the stability of their track forecasts. The annual FEs at the 120-h lead time fluctuated considerably and, hence, were not analyzed further. In general, the median errors decreased after 2005 (Fig. 3), although year-to-year variations make this improvement hard to see. Comparing the ranges of the median errors in 2005–09 and 2010–14 makes the recent improvement clearer. For 24-h forecasts, the range of median errors changed from 88.40–118.85 km in 2005–09 to 72.90–114.55 km in 2010–14; for 48-h forecasts, from 140.10–211.40 km to 118.30–184.60 km; and for 72-h forecasts, from 210.70–349.70 km to 166.95–319.30 km. The median 96-h FE decreased from 500 km in 2008 to approximately 250 km in 2014. Furthermore, at the 24-, 48-, 72-, and 96-h lead times, the FEs at the 75th and 95th percentiles showed the largest declines among the percentiles. The simultaneous reduction in the annual mean FEs and in the 75th- and 95th-percentile errors indicates that the improvements in TC track forecasting during the study period were achieved mainly by reducing the proportion of cases with very large FEs (i.e., large-error cases).
In addition to the RSMC-Tokyo best-track dataset, the best-track datasets of the CMA and JTWC were separately used as references for calculating the FEs (Table 2). The RSMC-Tokyo dataset yielded the largest number of homogeneous samples, followed by the CMA and JTWC datasets; however, the homogeneous samples based on the RSMC-Tokyo dataset also had the largest FEs, followed by those based on the CMA and JTWC datasets. Table 3 compares the FE trends of the three operational centers. At the 24-, 48-, and 72-h lead times, the FE trends calculated with the different centers' best-track datasets showed no significant differences in explained variance or statistical confidence level. At the 96- and 120-h lead times, the forecasts verified against the CMA best-track dataset showed the greatest improvement in accuracy (Table 3), while those verified against the RSMC-Tokyo and JTWC datasets showed slightly smaller but mutually comparable improvements.
In terms of the forecast accuracies of the three centers, the conclusions derived when using the best-track datasets of the CMA and JTWC were similar to those described earlier; that is, there were continuous improvements in forecasting accuracies by all three centers and the CMA improved the most in short-range forecasts (1–2 days). This indicated that although the selection of different best-track datasets might affect the evaluation results of forecast performance to a certain degree, it did not cause any major differences in the FE trends.
c. Regional distribution characteristics of FEs
The WNP (0°–60°N, 100°E–180°) was divided into a 2° × 2° grid with 1271 grid points (31 latitudes × 41 longitudes). The mean FE at each grid point was calculated, and a nine-point smoother was then applied four times to the resulting field. This gave the regional distributions of the 24-, 48-, 72-, 96-, and 120-h FEs (Fig. 4).
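The gridding and smoothing steps can be sketched as follows. Assigning each error to the nearest grid node, and using an equal-weight 3 × 3 average that skips missing neighbors, are assumptions for illustration; the exact weights of the nine-point smoother and the position used for gridding are not specified in the text.

```python
import math

N_LAT, N_LON = 31, 41  # 0-60N and 100E-180 at 2-degree spacing: 31 x 41 = 1271 nodes

def node_index(lat, lon):
    """Nearest 2-degree node (i, j); nearest-node assignment is an assumption."""
    return math.floor(lat / 2 + 0.5), math.floor((lon - 100) / 2 + 0.5)

def grid_mean(points):
    """points: iterable of (lat, lon, fe_km). Returns an N_LAT x N_LON field (None = no data)."""
    sums = [[0.0] * N_LON for _ in range(N_LAT)]
    counts = [[0] * N_LON for _ in range(N_LAT)]
    for lat, lon, fe in points:
        i, j = node_index(lat, lon)
        if 0 <= i < N_LAT and 0 <= j < N_LON:
            sums[i][j] += fe
            counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else None
             for j in range(N_LON)] for i in range(N_LAT)]

def smooth9(field, passes=4):
    """Apply an equal-weight 3 x 3 (nine-point) average `passes` times.

    Only non-missing neighbors are averaged; missing nodes stay missing.
    """
    ni, nj = len(field), len(field[0])
    for _ in range(passes):
        new = [row[:] for row in field]
        for i in range(ni):
            for j in range(nj):
                if field[i][j] is None:
                    continue
                vals = [field[a][b]
                        for a in range(max(i - 1, 0), min(i + 2, ni))
                        for b in range(max(j - 1, 0), min(j + 2, nj))
                        if field[a][b] is not None]
                new[i][j] = sum(vals) / len(vals)
        field = new
    return field
```

Calling `smooth9(grid_mean(points))` mirrors the sequence described in the text: a per-node mean followed by four smoothing passes.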
As shown in Figs. 4a–c, the 24-h FEs of the three operational centers were large over two main zones. One was located over the southern Sea of Japan and the sea region east of Japan (the midlatitude zone). When TCs move into this zone, complex interactions with midlatitude weather systems lead them to undergo extratropical transition [see Fig. 3 in Kitabatake (2011)]. The other was located over the South China Sea and the sea region east of the Philippines (the low-latitude zone), where WNP TCs frequently form [see Fig. 6 in Chen et al. (1998)].
Most TCs form over the open tropical oceans, where observations are sparse, particularly for defining the precise locations and inner-core structures of the TCs. This lack of information on the initial conditions usually causes large FEs (Cha and Wang 2013; Hendricks et al. 2013), so TCs in their formative stage tend to have larger FEs in short- and medium-range forecasts. The 24-h mean FE for the entire area was approximately 110 km (Table 1). However, for Taiwan and the sea region to its east (which are especially vulnerable to TCs), the FEs were significantly smaller than 110 km. This may be because the TCs that move into that area are generally well developed, and NWP models generally perform better for well-developed TCs than for weak ones (Chen et al. 2013).
The regional distributions of the 48-h TC track FEs were similar to those of the 24-h FEs (Figs. 4d–f). The large 72-, 96-, and 120-h FEs were located mainly in the midlatitude zone (Figs. 4g–m). Notably, the FEs in the low-latitude zone at 72, 96, and 120 h were no longer as prominent as those at 24 h. These characteristics are consistent with the error analysis of ECMWF-IFS forecasts over the WNP by Chen et al. (2013): that is, “the relatively large errors were mostly located in two regions. One is the offshore or land area at high-latitudes (25°–40°N, 115°–135°E). The other is the low-latitude area over the western North Pacific (5°–15°N, 130°–155°E).” The ECMWF-IFS position errors at high latitudes were also larger at longer lead times (e.g., 72 h).
d. Cyclone displacement errors
Following Colle and Charles (2011, their Fig. 5), a quantitative analysis of displacement errors was performed. Figure 5 presents histograms of the forecast positions relative to the observed positions at each lead time for the three OFCs, showing the distribution of displacement errors. For the CMA, the forecast cyclones were displaced toward the southeast at 24 h and toward the southwest from 48 to 120 h. The magnitude of the mean relative position bias was 15 km at 24 h, 30 km at 48 h, 53 km at 72 h, 126 km at 96 h, and 201 km at 120 h. For the RSMC-Tokyo, the forecast cyclones were displaced mainly toward the southwest through 72 h, with mean position biases of 13 km at 24 h, 33 km at 48 h, and 71 km at 72 h, slightly smaller than those of the CMA at 24 h and larger at 48 and 72 h. For both the CMA and the RSMC-Tokyo, the systematic bias increased with lead time.
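The mean displacement (systematic bias) vector can be computed by averaging the forecast-minus-best-track offsets. The sketch below assumes a local flat-Earth conversion (about 111.195 km per degree of latitude, with longitude scaled by the cosine of the best-track latitude) and reports the bias direction as a compass bearing, so that 225° means the forecasts lie, on average, to the southwest of the observed positions. The function name is hypothetical.

```python
import math

KM_PER_DEG = 111.195  # assumes R = 6371 km

def systematic_bias(pairs):
    """Mean forecast displacement: (magnitude_km, bearing_deg).

    pairs: list of ((lat_f, lon_f), (lat_b, lon_b)); bearing 0 = north, 90 = east.
    """
    dx = dy = 0.0
    for (lat_f, lon_f), (lat_b, lon_b) in pairs:
        dx += (lon_f - lon_b) * KM_PER_DEG * math.cos(math.radians(lat_b))  # eastward km
        dy += (lat_f - lat_b) * KM_PER_DEG                                  # northward km
    dx /= len(pairs)
    dy /= len(pairs)
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return math.hypot(dx, dy), bearing
```

Because individual displacements in opposite directions cancel, a small magnitude indicates little systematic bias even when the individual FEs are large, which is the distinction drawn for the JTWC forecasts.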
The JTWC showed a different pattern of systematic bias. Its bias was directed toward the southeast through 48 h and then turned toward the southwest from 72 to 120 h, reaching about 20 km at 48 h and 70 km at 120 h. The JTWC's systematic biases were markedly smaller than those of the CMA and RSMC-Tokyo at each lead time, indicating that the JTWC forecasts had no obvious systematic bias compared with the other two OFCs.
e. Translational speed and direction biases
Figure 6 shows scatterplots of the ATEs and CTEs of the three operational centers' 24-, 48-, 72-, 96-, and 120-h TC track forecasts, together with the probability ellipses that theoretically contain 90% of the errors under a bivariate normal distribution. The mean ATEs and CTEs were relatively small (Table 1). However, t tests at the 0.05 significance level revealed that the TC track forecasts of the three centers contained significant translational speed and direction biases at most lead times (values in bold in Table 1).
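The significance test for a nonzero mean ATE or CTE can be sketched as a one-sample t statistic. This is an illustrative implementation: for large samples, |t| > 1.96 corresponds approximately to the 0.05 level (two-sided). Successive forecasts of the same storm are serially correlated, which reduces the effective sample size, so a plain t test of this kind may overstate significance.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic for the null hypothesis that the mean error (e.g., ATE or CTE) is zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))
```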
A positive (negative) CTE indicates displacement to the right (left) of the observed track, and a positive (negative) ATE indicates that the forecast moves the TC too quickly (slowly). Figure 6 shows that at all lead times the major axis of the probability ellipse was oriented northeast–southwest, meaning that most FEs were combinations of slow and left-of-track or fast and right-of-track biases. The major axis also formed a small angle with the y (ATE) axis, and the ratio of the major to minor axis lengths increased continuously with lead time, except for the JTWC at 120 h. This indicates that most of the track forecasts of the three centers had small translational direction biases but large translational speed biases, and that these characteristics became more apparent at longer lead times.
The regional grid described in section 3c was used to evaluate the spatial distributions of the translational speed and direction biases in the three centers' track forecasts. At each grid point, the mean of each of the four categories of track errors (negative ATEs, positive ATEs, negative CTEs, and positive CTEs) was calculated. After smoothing, this gave the distributions of the translational speed and direction biases in the forecast TC center positions (Fig. 7). The three rows of panels in Fig. 7 correspond to the 24-, 48-, and 72-h lead times.
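The four-category averaging at each grid point can be sketched in the same way as the FE gridding. In this hypothetical helper, each record supplies the position used for gridding (assumed here to be the verifying best-track position) together with its ATE and CTE; errors of exactly zero are counted in the positive categories, an arbitrary choice made only for illustration.

```python
import math
from collections import defaultdict

def category_means(records):
    """Mean ATE-, ATE+, CTE-, CTE+ at each 2-degree node (i, j).

    records: iterable of (lat, lon, ate_km, cte_km).
    """
    acc = defaultdict(lambda: defaultdict(list))
    for lat, lon, ate, cte in records:
        # Nearest node on the 0-60N, 100E-180 grid (same convention as the FE grid sketch).
        node = (math.floor(lat / 2 + 0.5), math.floor((lon - 100) / 2 + 0.5))
        acc[node]["ATE-" if ate < 0 else "ATE+"].append(ate)
        acc[node]["CTE-" if cte < 0 else "CTE+"].append(cte)
    return {node: {cat: sum(v) / len(v) for cat, v in cats.items()}
            for node, cats in acc.items()}
```

Each category field could then be passed through the same nine-point smoothing before plotting.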
The shading in Fig. 7 represents the spatial distribution of the CMA track errors. The solid (dashed) contours enclose the regions where the absolute values of the RSMC-Tokyo (JTWC) errors exceeded 110, 200, and 400 km at the 24-, 48-, and 72-h lead times, respectively. Figure 7 reveals a main characteristic of the errors: among the four categories, the negative ATEs had the largest magnitude. At the 24-h lead time, the magnitudes of the four types of biases were largely comparable, approximately 50–100 km (Figs. 7a–d). At the 48-h lead time, the magnitude of the negative ATEs exceeded those of the other three types (Figs. 7e–h).
Negative ATEs formed the bulk of the errors for the 72-h lead time. Figures 7j–l indicate that the other three types of biases rarely reached magnitudes > 200 km. Thus, for short-range OFCs (1–2 days), all four types of biases appeared frequently and produced FEs of similar magnitudes. However, for the medium-range OFCs of all three operational centers, the FEs caused by slow biases far exceeded those caused by the other types of biases (the 96- and 120-h forecasts have characteristics similar to those of the 72-h forecasts and are not shown).
Figure 7 also shows that the translational speed and direction biases in the track forecasts of all three operational centers had distinct regional distributions. The most prominent feature was that the zone of large negative ATEs was located mainly over Japan and the sea region east of Japan. This indicates that when TCs moved into this region, the positions forecast by the CMA, JTWC, and RSMC-Tokyo tended to lag behind the actual positions (relative to the TCs’ actual direction of motion). TCs embedded in the midlatitude westerlies generally move much faster than TCs steered by the tropical easterlies, and fast-moving TCs tend to have larger forecast errors than slow-moving ones. Forecast centers have applied this relationship when constructing the probability circles issued with their forecast tracks (Narita 2015). This sea region matches the midlatitude zone defined in section 3c very well.
Figures 4 and 7 share common features and should be analyzed together to examine the regional distributions of the corresponding biases. The low- and midlatitude zones (the two regions with large errors) contained different types of biases. All four types of biases contributed to the FEs in the low-latitude zone, with slow biases being the most prominent. In the midlatitude zone, negative ATEs, positive ATEs, and negative CTEs contributed to the FEs; however, at the 72-, 96-, and 120-h lead times (the 96- and 120-h results are not shown), the FEs arose almost entirely from slow biases.
Previous studies on sudden TC track changes (which often result in large FEs) have focused mostly on sudden directional changes (Carr and Elsberry 1995; Wu et al. 2011, 2013; Shi et al. 2014; Luo et al. 2015), and few studies have addressed sudden speed changes. When approaching an island or a mainland area, TCs occasionally experience sudden speed changes (Chen et al. 2002), which can leave insufficient time to complete storm preparations and evacuations.
As stated above, most of the track forecasts made by the CMA, RSMC-Tokyo, and JTWC during 2005–14 had small translational direction biases but large translational speed biases. In other words, the FEs introduced by translational speed biases were larger than those introduced by translational direction biases. For short-range forecasts, both translational speed and direction biases were sources of TC track FEs; for medium-range forecasts, however, the main source was translational speed biases (mainly slow biases). Because TCs often accelerate after entering the midlatitude zone, track forecasts must account for sudden changes in translational speed.
f. Synoptic flow patterns of TCs with the largest error
Figure 8 shows the frequency distributions of the three operational centers’ 72-h track FEs. The left column shows histograms of the official FEs (50-km class intervals) with fitted generalized extreme value distributions. The right column shows the cumulative error within each error interval as a percentage of the total error, with fitted gamma distributions. Both fits passed the t test at the 0.05 significance level. In this study, successful and failed cases were defined as 72-h forecasts with FEs < 300 km and > 600 km, respectively. (The thresholds are based on the range of FE values: the threshold for successful forecasts is the mean FE, and the threshold for failed forecasts is the mean FE plus 1.5 times the standard deviation; both thresholds were rounded to the nearest 100 km.) Figures 8a, 8c, and 8e show that 62% of the 72-h track forecasts over the WNP were successful, whereas approximately 8.5% failed.
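The threshold rule in the parenthetical remark above, and the shares of cases and of cumulative error that follow from it, can be sketched as below. This assumes the sample standard deviation and conventional rounding to the nearest 100 km; the function names are hypothetical, and the distribution fits are not reproduced.

```python
import math
import statistics


def round_to_100(x):
    """Round a nonnegative value to the nearest 100 (halves round up)."""
    return 100.0 * math.floor(x / 100.0 + 0.5)


def classify_forecasts(errors_km, k_sd=1.5):
    """Success/failure thresholds and the resulting case and error shares.

    Successful: FE < round(mean); failed: FE > round(mean + k_sd * sd).
    """
    mean = statistics.mean(errors_km)
    sd = statistics.stdev(errors_km)          # sample standard deviation (assumed)
    success_thr = round_to_100(mean)
    fail_thr = round_to_100(mean + k_sd * sd)
    successes = [e for e in errors_km if e < success_thr]
    failures = [e for e in errors_km if e > fail_thr]
    n, total = len(errors_km), sum(errors_km)
    return {
        "success_threshold": success_thr,
        "failure_threshold": fail_thr,
        "success_fraction": len(successes) / n,      # share of cases
        "failure_fraction": len(failures) / n,
        "success_error_share": sum(successes) / total,  # share of cumulative error
        "failure_error_share": sum(failures) / total,
    }
```

Comparing `failure_fraction` with `failure_error_share` makes explicit why a small number of failed cases can carry a disproportionate part of the total error.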
Although failed cases were in the minority, Figs. 8b, 8d, and 8f show that successful and failed cases accounted for approximately 38% and 19%, respectively, of the total cumulative errors. Even though failed cases numbered only about one-seventh of the successful cases, their errors amounted to half of those of the successful cases and almost one-fifth of the total errors; failed cases therefore significantly affect the mean FEs. This finding is consistent with the conclusions reached by Carr and Elsberry (2000b), Kehoe et al. (2007), and Payne et al. (2007) in their error analyses of failed cases using samples from forecast guidance. Previous studies showed that TC track forecasts tended to have larger errors when 1) WNP TCs were accompanied by cyclonic circulations on time scales ranging from intraseasonal oscillations to synoptic disturbances and 2) crucial steering factors were improperly estimated and interactions were misrepresented in the NWP models (Harr and Elsberry 1991; Carr and Elsberry 2000a,b; Kehoe et al. 2007; Payne et al. 2007; Galarneau and Davis 2013; Wu et al. 2012). Hence, it is necessary to characterize the synoptic flow patterns of the failed cases.
It is well known that the environmental steering flow is typically the most important of all the factors influencing TC motion (Chan and Gray 1982; Carr and Elsberry 1990), accounting for as much as 70%–90% of the motion (Neumann 1992). Here, we classify the synoptic flow patterns of the TCs with the largest errors on the basis of steering flow theory. NCEP–DOE AMIP-II reanalysis (NCEP–DOE 2) data (Kanamitsu et al. 2002) are used to describe the TC environment. The deep-layer mean (DLM) flow of the steering layer is calculated following Eq. (1) of Pike and Neumann (1987).
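For readers unfamiliar with deep-layer-mean steering, a generic pressure-weighted vertical average is sketched below. It is not necessarily identical to Eq. (1) of Pike and Neumann (1987): the trapezoidal weighting and the example levels are assumptions, and the horizontal averaging of winds around the TC center that usually precedes this step is not shown.

```python
def deep_layer_mean(levels_hpa, u, v):
    """Pressure-weighted (trapezoidal) vertical mean of the wind (u, v).

    levels_hpa: pressure levels ordered from the bottom (highest pressure)
        to the top of the layer, e.g. [850, 700, 500, 300, 200] (assumed).
    u, v: wind components at those levels (any consistent units).
    Returns (u_dlm, v_dlm), the layer-mean wind between the first and last levels.
    """
    if len(levels_hpa) < 2 or not (len(levels_hpa) == len(u) == len(v)):
        raise ValueError("need at least two levels and matching u, v lengths")
    depth = levels_hpa[0] - levels_hpa[-1]
    su = sv = 0.0
    for k in range(len(levels_hpa) - 1):
        dp = levels_hpa[k] - levels_hpa[k + 1]   # thickness of the sublayer
        if dp <= 0:
            raise ValueError("pressure must decrease upward")
        su += 0.5 * (u[k] + u[k + 1]) * dp       # trapezoidal integral of u over p
        sv += 0.5 * (v[k] + v[k + 1]) * dp
    return su / depth, sv / depth
```

Weighting by pressure thickness means that deeper sublayers contribute more, so the result approximates the mass-weighted mean flow of the steering layer.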
The statistical results show that, during 2005–14, 60, 59, and 60 TCs had failed 72-h track forecasts issued by the CMA, JTWC, and RSMC-Tokyo, respectively. From these, a homogeneous sample of 56 TCs with failed forecasts was selected. Several TCs had more than one failed forecast; for brevity, we focus on the initial and forecast times at which the largest FE occurred for each TC.
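One way to form the homogeneous sample and keep a single forecast per TC is sketched below. The data layout and the rule of taking the largest FE across all three centers are assumptions for illustration; the text does not specify how the centers’ errors were combined.

```python
def select_homogeneous_failures(failures):
    """Keep TCs that failed at every center, one (largest-FE) forecast each.

    failures: {center: {tc_id: [(init_time, fe_km), ...]}} listing failed
        72-h forecasts per center (hypothetical layout).
    Returns {tc_id: (init_time, fe_km)} for TCs present at every center,
    choosing the forecast with the largest FE across all centers (assumed rule).
    """
    common = set.intersection(*(set(tcs) for tcs in failures.values()))
    selected = {}
    for tc in sorted(common):
        candidates = [rec for tcs in failures.values() for rec in tcs[tc]]
        selected[tc] = max(candidates, key=lambda rec: rec[1])
    return selected
```

Restricting the sample to the intersection keeps the comparison among centers homogeneous, in the sense used throughout the paper.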
The synoptic flow patterns of the TCs with the largest errors are classified on the basis of the DLM steering flow at the initial time t and the forecast time t + 72 h of the largest forecast error. In total, 12 synoptic flow patterns were identified, as shown conceptually in Fig. 9. Charts of the large-scale mean steering flow were obtained by compositing the environmental fields of similar cases. For brevity, we show only the two synoptic patterns with the highest frequencies of occurrence.
1) Flow patterns with the combination of cyclonic circulations, subtropical ridges, and midlatitude troughs (CRT)
There are 26 TCs associated with the CRT flow pattern (Fig. 9a), accounting for almost 50% of all cases. The dominant characteristic of these TC tracks is recurvature.
Figure 10 shows composite steering flow patterns within a ±35° square centered on the TC. In the CRT pattern, at the initial time (Fig. 10a), the TCs are located west of the subtropical ridge and northeast of the cyclonic circulation, and, steered by these two synoptic systems, they move northwestward. Later (Fig. 10b), the TCs turn poleward and eastward under the influence of the midlatitude westerlies. The key synoptic features during this stage are the subtropical ridges and midlatitude troughs.
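Storm-centered compositing of the kind used for Fig. 10 can be sketched as follows, assuming each field is on a regular latitude–longitude grid (for example, 2.5° spacing) and that cases whose ±35° window extends beyond the data domain are skipped. The function is hypothetical, and the u and v components would be composited separately.

```python
def storm_centered_composite(cases, lat0, lon0, dres, half_width_deg=35.0):
    """Average of TC-centered windows extracted from gridded fields.

    cases: list of (field, tc_lat, tc_lon); field[i][j] is the value at
        latitude lat0 + i * dres and longitude lon0 + j * dres.
    Returns (composite, n_used); composite is a (2h + 1) x (2h + 1) list of
    lists with h = round(half_width_deg / dres), centered on the TC.
    """
    h = round(half_width_deg / dres)
    size = 2 * h + 1
    acc = [[0.0] * size for _ in range(size)]
    used = 0
    for field, lat, lon in cases:
        ci = round((lat - lat0) / dres)          # nearest grid row to the TC center
        cj = round((lon - lon0) / dres)          # nearest grid column
        if ci - h < 0 or cj - h < 0 or ci + h >= len(field) or cj + h >= len(field[0]):
            continue                             # window leaves the domain: skip case
        for a in range(size):
            row = field[ci - h + a]
            for b in range(size):
                acc[a][b] += row[cj - h + b]
        used += 1
    if used == 0:
        raise ValueError("no case had a complete window")
    return [[x / used for x in row] for row in acc], used
```

Centering each window on the TC before averaging keeps the storm-relative positions of the ridge, trough, and cyclonic circulation aligned across cases, which a fixed geographic average would blur.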
2) Flow patterns in which the TC tracks cannot be explained by the steering flow (NSF)
Notably, six TCs exhibit a large discrepancy between the steering flow and the TC motion (Fig. 9l).
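The discrepancy between steering and motion that characterizes the NSF pattern could be quantified, for example, as the signed angle between the DLM steering vector and the observed TC motion vector. The sketch below is an illustrative assumption: the text does not state the criterion used, and any threshold applied to the angle would be hypothetical.

```python
import math


def steering_motion_discrepancy(steer_u, steer_v, motion_u, motion_v):
    """Signed angle and vector difference between steering and TC motion.

    Returns (angle_deg, diff): angle_deg is the rotation from the steering
    vector to the motion vector in (-180, 180], positive when the motion is
    counterclockwise of (to the left of) the steering; diff is the magnitude
    of motion minus steering, in the input units.
    """
    cross = steer_u * motion_v - steer_v * motion_u
    dot = steer_u * motion_u + steer_v * motion_v
    angle = math.degrees(math.atan2(cross, dot))
    diff = math.hypot(motion_u - steer_u, motion_v - steer_v)
    return angle, diff
```

For steering toward the northeast and actual motion toward the northwest, as in the NSF pattern, the function returns an angle of +90°, that is, motion well to the left of the steering.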
In this pattern, TCs move northwestward around the western periphery of the subtropical ridge (Fig. 11a) and then become embedded in the southwesterly flow associated with the cyclonic circulation (Fig. 11b). The steering flow suggests that the TCs should turn right and move northeastward, but they actually move westward.
g. Case studies
As shown in section 3f, the CRT and NSF patterns are the two synoptic patterns most commonly associated with the TCs with the largest errors. In this section, for each of these two patterns, we selected the TC with the largest lifetime-mean forecast error to determine the sources of the forecast errors: Fengshen (2008; NSF pattern) and Lupit (2009; CRT pattern). Their synoptic maps are shown in Fig. 12.
Figure 13 shows the 3-day track forecasts for Fengshen (2008) issued by the CMA, JTWC, and RSMC-Tokyo; Fengshen’s best track is indicated by the solid black line. Fengshen formed over the sea southeast of the Philippines, tracked west-northwestward across the Philippines, and eventually made landfall north of Hong Kong. The forecast tracks in Fig. 13 were inaccurate: despite Fengshen’s persistent west-northwestward motion, the forecasts from any given initial position predicted northward movement within 12–24 h. Several OFCs even predicted very sharp poleward turns and recurvature.
When Super Typhoon Lupit (2009) passed through the eastern Philippine Sea, it made a reverse “S” turn (T1 in Fig. 14) and subsequently a sharp turn to the northeast (T2 in Fig. 14) near northern Luzon, Philippines. Figure 14 shows that none of the operational centers accurately captured the initial reverse S turn. The centers’ forecasts then diverged regarding Lupit’s subsequent sharp turn and later movement. The RSMC-Tokyo forecast this sharp turn earlier than the CMA and JTWC did. However, after the recurvature, during Lupit’s sustained acceleration, the CMA and JTWC track forecasts were closer to Lupit’s best track than the RSMC-Tokyo forecasts were.
Figures 15 and 16 show the time series of the 24- and 72-h track FEs, ATEs, and CTEs, respectively, for the two cases. Figures 15a and 16a show that the three operational centers had similar FE time series for Fengshen. Their errors were consistently large: some of the 24-h FEs exceeded 200 km, while the 72-h FEs exceeded 500 km. Throughout Fengshen’s life cycle, the ATEs and CTEs of all three centers had comparable magnitudes (Figs. 15c, 15e, 16c, and 16e). The main source of the FEs was that all three centers almost continually forecast that Fengshen (2008) would turn north or northeast, whereas it actually persisted in moving northwestward.
During Lupit’s reverse S turn, the 24- and 72-h FEs were >200 km and nearly 500 km, respectively. The ATEs and CTEs contributed comparably (Figs. 15d, 15f, 16d, and 16f), indicating that both translational direction and speed biases contributed to the FEs during the S turn. After Lupit (2009) made its sharp turn (T2 in Fig. 14) and accelerated steadily, the FEs increased abruptly, reaching approximately 400 km at 24 h and >2000 km at 72 h; the latter far exceeded the errors caused by the S turn. During this period, the ATE and FE time series coincided, whereas the CTEs were small. Thus, translational speed biases were the main source of the FEs after the sharp turn and subsequent acceleration. In summary, Lupit’s abnormally large FEs arose from two causes: 1) failure to forecast the recurvature accurately and 2) failure to forecast the sustained acceleration following the recurvature; the FEs caused by the latter far exceeded those caused by the former.
4. Conclusions and discussion
In this paper, homogeneous samples of OFCs for TCs over the WNP issued by the CMA, RSMC-Tokyo, and JTWC for 2005–14 were used to evaluate the forecast performance of the three operational centers. Unlike most previous studies, which focused on trends in forecast accuracy, this study used a range of statistical methods (means, percentiles, and regional distributions; probability density distributions; probability ellipses; and ATEs and CTEs) to identify statistically significant biases, trends, and other characteristics. In addition, we investigated the synoptic flow patterns of the TCs with the largest forecast errors on the basis of steering flow theory.
The main conclusions of this study are as follows.
For WNP TC tracks during 2005–14, the 24-, 48-, 72-, 96- (2008–14), and 120-h (2010–14) FEs were approximately 110, 200, 300, 430, and 520 km, respectively. In the five most recent years (2010–14), the mean FEs were approximately 100, 180, 280, 380, and 520 km, respectively. The JTWC delivered the best forecasting performance, while the RSMC-Tokyo and CMA performed comparably to each other and slightly worse than the JTWC.
The accuracies and stabilities of the 24-, 48-, 72-, and 96-h forecasts improved significantly, with the 96-h forecasts improving the most. The forecasting performances of the three operational centers had improved significantly by 2010–14: for t < 4, the t-day track forecasts for 2010–14 were comparable to the (t − 1)-day forecasts from 15 yr earlier. The CMA made the most obvious progress; its 24-, 48-, and 72-h forecasts for 2014 matched or even surpassed those of the RSMC-Tokyo and JTWC.
In terms of spatial distribution, the 72-, 96-, and 120-h FEs had one high-value zone, located in the midlatitudes. The 24- and 48-h FEs had two high-value zones: this midlatitude zone and a low-latitude zone.
Both translational speed and direction biases contributed to the FEs in the low-latitude zone, whereas slow translational speed biases were the main source of FEs in the midlatitude zone.
Most track forecasts by the three operational centers had the characteristics of small translational direction biases but large translational speed biases. For the latter, the magnitudes of the slow biases exceeded those of the fast biases.
Although failed cases were few, they could significantly affect the mean FEs. Based on the 56 TCs with the largest forecast errors, 12 synoptic flow patterns associated with large forecast errors were identified. The most common is the CRT pattern (26 TCs), followed by the NSF pattern (6 TCs). The CRT pattern is characterized by cyclonic circulations, subtropical ridges, and midlatitude troughs: the cyclonic circulations and subtropical ridges force TCs to move northwestward, whereas the subtropical ridges and midlatitude troughs cause them to turn poleward and eastward. The NSF pattern is characterized by a cyclonic circulation; the steering flow suggests that TCs should turn right and move northeastward, but the TCs actually move westward.
Analysis of the two failed cases indicated that their FEs were related to recurvature and subsequent acceleration. The first issue involved TCs that were forecast to recurve but did not, or that were forecast not to recurve but did. The second issue involved forecast translational speeds that were significantly slower than the TCs’ actual translational speeds.
Advances in global and regional models have played a critical role in reducing the errors of operational TC track forecasts (Elliott and Yamaguchi 2014). For the JTWC, the Hurricane Forecast Improvement Program (HFIP), designed to improve TC track forecasting through research on data assimilation, model advances, and statistical postprocessing (Gall et al. 2013), was a major factor in recent improvements in operational TC track forecasts. JTWC (2015) stated that “model and data assimilation improvements since HFIP began have contributed to reduced mean JTWC CONW errors. Combination of above improvements…contributing to lower mean JTWC forecast track error.” Similarly, the CMA attributed its recent improvements in TC track forecasting to the establishment of a tropical cyclone monitoring network made up of satellites, weather radars, and automatic weather stations (Duan et al. 2012), and particularly to improved tropical cyclone ensemble forecast techniques built on multiple dynamical models (CMA 2015). Recently, the RSMC-Tokyo upgraded its Global Spectral Model from TL959L60 to TL959L100 and its Typhoon Ensemble Prediction System from TL319L60 to TL479L60 to further improve its TC track guidance (RSMC-Tokyo 2014).
In this study, the 12 synoptic patterns associated with large forecast errors can be grouped into four categories: 1) sudden adjustment of the environmental flow (types CR2, R1, R2, and NSF), 2) TC recurvature into higher latitudes (types CRT, CR1, RH, and RT), 3) interaction of binary TCs (types BT1, BT2, and BT3), and 4) weak background flow (type WB). However, the detailed physical mechanisms responsible for the large errors still need to be explored. Further studies should investigate what factors cause the recurvature and subsequent acceleration of TCs and how these processes affect the performance of TC track forecasting.
This work is supported by National Natural Science Foundation of China Grants 41230421 and 41675058 and by the 973 Project (2015CB452802) of the Ministry of Science and Technology of China. We thank the editor, Yuqing Wang, and three anonymous reviewers for their helpful comments and suggestions. We also thank Wenli Shi, Lu Yang, and Wen Yang for valuable discussions.