1. Introduction
Research and development (R&D) advancements in tropical cyclone (TC) forecasts using ensemble methods have been transferred into the operational forecasting community over the last few decades (Heming and Goerss 2010). For example, ensemble mean TC track forecasts have been widely used for operational TC track forecasting (Yamaguchi et al. 2015, submitted to WMO Bull.). Simple, weighted, or selective ensemble mean TC track forecasts tend to have smaller position errors than single model–based forecasts. According to Goerss (2000) and Goerss et al. (2004), the improvement rate is approximately 10%–20%. Van der Grijn et al. (2005) defined TC strike probability as the probability that a TC will pass within a 120-km radius of a given location during the 5 days following the forecast initial time, and verified TC strike probability forecasts using ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). Similar TC strike probability forecasts based on ensemble applications have been used in operational weather centers worldwide.
Global medium-range ensemble forecasts from 10 numerical weather prediction (NWP) centers have recently become available for research purposes with a 2-day delay after the initial time through The Observing System Research and Predictability Experiment (THORPEX; Parsons et al. 2015, manuscript submitted to Bull. Amer. Meteor. Soc.) Interactive Grand Global Ensemble (TIGGE; Bougeault et al. 2010; Swinbank et al. 2015). The TIGGE archive makes it possible to conduct an intercomparison study as well as to construct and evaluate a combination of multiple ensembles called a multicenter grand ensemble (MCGE). Various projects, such as the World Meteorological Organization (WMO) North Western Pacific Tropical Cyclone Ensemble Forecast Project (NWP-TCEFP; Yamaguchi et al. 2014) and the WMO Typhoon Landfall Forecast Demonstration Project (WMO-TLFDP; Tang et al. 2012), have demonstrated the value of the MCGE, which is another example of the transfer of R&D advancements into operations.
Many studies evaluating the performance of ensemble TC forecasts have used verification samples in which the TCs existed at the initial time of the forecasts (e.g., Van der Grijn et al. 2005; Majumdar and Finocchio 2010; Yamaguchi et al. 2012). Few studies have verified TCs created during the model integrations on a medium-range time scale up to 2 weeks, such as a verification of TC genesis and subsequent track (hereafter referred to as TC activity). These medium-range TC activity forecasts are important from a perspective of early warnings and would be particularly important for countries located in lower latitudes where a lead time from TC genesis to the landfall may be relatively short.
The ECMWF routinely creates TC activity forecast products based on its medium-range ensemble. Here, TC activity in the medium range is defined as the probability that TCs are present within a certain radius (e.g., 300 km) of a given location during a certain forecast time window. Vitart et al. (2012) verified the reliability of these ECMWF ensemble TC activity forecasts for the North Atlantic, west Pacific, south Indian Ocean, and north Australia basins and showed that the forecasts do not perform equally well among the basins. Belanger et al. (2012) evaluated ECMWF ensemble forecasts for TC genesis or activity on short- to medium-range time scales over the Arabian Sea and the Bay of Bengal. Specifically, Belanger et al. (2012) verified the probability that a vortex of tropical depression strength or greater was present, and showed that the ensemble forecasts were more skillful than the climatological forecasts up to day 10 for both regions. Majumdar and Torn (2014) also demonstrated that the ECMWF ensemble has the potential for the probabilistic prediction of tropical cyclogenesis out to day 5 in the North Atlantic basin. Halperin et al. (2013) analyzed TC genesis forecasts from five global models and demonstrated the utility of the consensus forecasts for the North Atlantic and eastern North Pacific basins.
Vitart et al. (2010) evaluated the skill of the extended-range ECMWF forecasts (32-day ensemble forecasts) in predicting tropical cyclone strike probabilities and found skill up to week 3 over the Southern Hemisphere basins. The skill of the ECMWF 32-day ensemble in predicting TC strike probability over weekly periods was also examined by Vitart et al. (2012), who showed that the ensembles are skillful for the first three weekly periods for all TC basins in the Northern Hemisphere and the Southern Hemisphere. Elsberry et al. (2010, 2011, 2014) and Tsai et al. (2013) have verified the performance of the ECMWF 1-month ensemble forecasts of TCs over weeks 1–4 before the TC actually began. These studies have demonstrated that the ECMWF ensemble can predict many TC formations and subsequent tracks even for the week-4 forecasts.
Whereas TC genesis or activity forecasts have been studied within many contexts as described above, the extent to which the state-of-the-art global medium-range ensembles can forecast TC activity for each TC basin over the world has not yet been investigated. Similarly, the effectiveness of MCGE in TC activity forecasts has not been evaluated in all basins. To provide a fundamental understanding of the skill of TC activity forecasts by the current operational global medium-range ensembles, it is worthwhile to conduct systematic verifications of the ensemble forecasts of TC activity by single-ensemble forecasts and by MCGEs for all TC basins around the world. A study of this kind is of great importance not only for understanding the performance of ensemble forecasting but also for the transfer of R&D advancements into operations.
In the present study, ensemble forecasts of TC activity from short- to medium-range time scales (0–14 days) are verified over seven TC basins using global medium-range ensembles from the ECMWF, Japan Meteorological Agency (JMA), National Centers for Environmental Prediction (NCEP), and the Met Office (UKMO), which are obtained through the TIGGE portal site (http://tigge.ecmwf.int). First, an intercomparison of these four ensembles in terms of the skill of the TC activity forecasts is conducted. Following this, the relative benefits of an MCGE with respect to the best single-model ensemble among the four ensembles are investigated. Here, two MCGEs are created, with MCGE3 being a combination of only the ECMWF, NCEP, and UKMO ensembles and MCGE4 being a combination of all four ensembles. The ensemble sizes of MCGE3 and MCGE4 are 96 and 147, respectively (Table 1). The verification metric used to evaluate the skill of the ensemble forecasts is the Brier skill score (BSS; Wilks 2006), a standard metric for evaluating probabilistic forecasts. Here it is used to determine whether the ensemble forecasts are more skillful than the climatological forecasts, which is crucial from an operational forecasting perspective.
Table 1. Ensemble configurations for the ECMWF (Europe), JMA (Japan), NCEP (United States), and UKMO (United Kingdom) systems during the study period from 2010 to 2013.
This paper is organized as follows. Section 2 describes the data and the verification methodology. Section 3 describes the results of the intercomparison of TC activity forecasts worldwide and discusses the relative benefits of combining multiple ensembles. Section 4 is a summary of this study.
2. Methodology
Ensemble forecasts of TC activity are verified in the North Atlantic (atl), eastern and central Pacific (ecp), western North Pacific (wnp), north Indian Ocean (nin), south Indian Ocean (sin), Australian region (aus), and South Pacific (spc) basins as shown in Fig. 1. The model TCs are tracked using the methodology described in Vitart et al. (2012), where the maximum (or minimum in the Southern Hemisphere) vorticity at 850 hPa with a warm-core structure is tracked every 6 h and both minimum sea level pressure and a maximum wind speed are detected near the vorticity local maximum (or minimum).
Fig. 1. Definitions of the TC basins that are verified in this study. Note that the verification regions are limited to between 25°N and 25°S to focus on TC genesis events.
Citation: Weather and Forecasting 30, 6; 10.1175/WAF-D-14-00136.1
Here, TC activity is defined as the probability that a TC is present within a 300-km radius from a certain location during a 3-day forecast time window. This 3-day time window is applied with a 1-day interval over a forecast length of 2 weeks (i.e., time windows of 0–3, 1–4, 2–5, …, and 11–14 days) to evaluate the performance of the ensemble forecasts for different lead times. Only latitudes between 25°N and 25°S are verified to focus on TC activity that includes TC genesis events.
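As a rough illustration of this definition, the following Python sketch computes a TC activity probability at one location as the fraction of ensemble members that have a tracked TC inside the radius during the window. The track layout `(lead_day, lat, lon)`, the function names, the inclusive window bounds, and the spherical-Earth haversine distance are assumptions made for the example; they are not details of the tracking system used in this study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def forecast_windows(length_days=3, last_day=14):
    """Overlapping windows at 1-day steps: (0, 3), (1, 4), ..., (11, 14)."""
    return [(d, d + length_days) for d in range(last_day - length_days + 1)]


def tc_activity_probability(member_tracks, lat, lon, window, radius_km=300.0):
    """Fraction of members with a TC within radius_km of (lat, lon) in the window.

    member_tracks holds one track per member; each track is a list of
    (lead_day, lat, lon) tuples (hypothetical layout). Window bounds are
    treated as inclusive, which is an assumption of this sketch.
    """
    start, end = window
    hits = 0
    for track in member_tracks:
        if any(start <= day <= end
               and great_circle_km(t_lat, t_lon, lat, lon) <= radius_km
               for day, t_lat, t_lon in track):
            hits += 1
    return hits / len(member_tracks)


# Four made-up members evaluated at 10N, 131E for the 5-8-day window.
members = [
    [(6.0, 10.0, 130.0)],  # inside both the radius and the window
    [(6.0, 10.0, 140.0)],  # inside the window, but roughly 1000 km away
    [(2.0, 10.0, 130.0)],  # close enough, but before the window opens
    [],                    # no TC tracked in this member
]
prob = tc_activity_probability(members, 10.0, 131.0, window=(5, 8))
```

Only the first member counts as a hit, so the probability at this point is one in four. Applying the same function over a grid of locations would give a map of the kind described in the text.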
The ECMWF, JMA, NCEP, and UKMO ensemble forecasts used in this study are available from the TIGGE website (Table 1). Only the 1200 UTC initial times for the forecasts are used in this verification in order to have a longer verification period. Note that the verifications for JMA are only up to the time window of 6–9 days as the forecast length of the JMA ensemble is 9 days. Ensemble forecasts initiated from 1200 UTC 1 January 2010 to 1200 UTC 31 December 2013 (30 June 2013) are verified for the basins in the Northern (Southern) Hemisphere. Note that the forecasts in which TCs do not exist at the initial time of the forecasts are included in the verification as the objective is to detect genesis plus the subsequent track. The numbers of forecasts verified are 1461 [365 × 4 + 1 (leap day)] in the Northern Hemisphere and 1277 in the Southern Hemisphere.
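The forecast counts quoted above follow from one 1200 UTC initialization per day over each verification period. A minimal Python check, in which the helper name `n_daily_forecasts` is introduced only for this illustration:

```python
from datetime import date


def n_daily_forecasts(first, last):
    """Number of daily initial times from first to last, both inclusive."""
    return (last - first).days + 1


# Northern Hemisphere period: 1 January 2010 to 31 December 2013.
n_nh = n_daily_forecasts(date(2010, 1, 1), date(2013, 12, 31))

# Southern Hemisphere period: 1 January 2010 to 30 June 2013.
n_sh = n_daily_forecasts(date(2010, 1, 1), date(2013, 6, 30))
```

Here `n_nh` evaluates to 1461 and `n_sh` to 1277, matching the numbers of verified forecasts given in the text.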


Climatological TC activity estimates have been calculated using the best-track data from the National Hurricane Center (NHC) for atl and ecp, the Joint Typhoon Warning Center (JTWC) for nin and the basins in the Southern Hemisphere, and the Regional Specialized Meteorological Center (RSMC) Tokyo-Typhoon Center for wnp (Table 2). First, daily climatological TC activity is computed using the best-track 6-hourly TC positions within ±15 days of each date. (Note that a shorter time span leads to erratic daily climatological TC activity estimates, especially in TC basins where the annual number of TCs is small.) As in the ensemble forecasts, a threshold distance of 300 km is specified to determine whether the observed TCs affect a certain location (Vitart et al. 2012). Then, climatological TC activity within the 3-day time windows is computed. Similarly, the observed TC activity probability, which is either 0% or 100%, is calculated using the best-track data (independent of the dataset used for climatology). These ensemble, climatological, and observed TC activity probabilities are calculated for 0.5° × 0.5° grid boxes. In these BSS computations, only the grid points with a climatological TC activity probability of the 3-day time window larger than 0 are considered in order to reduce the number of grid points with the correct “no TC activity” forecasts.
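The BSS computation with the grid-point restriction described above can be sketched as follows. This uses the standard definitions BS = mean((p − o)²) and BSS = 1 − BS_forecast / BS_climatology (Wilks 2006); the function names and the flat-array layout of grid points are assumptions made for the example.

```python
import numpy as np


def brier_score(prob, obs):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((prob - obs) ** 2))


def brier_skill_score(fcst_prob, clim_prob, obs):
    """BSS of the forecast relative to climatology.

    Only grid points whose climatological probability is larger than 0
    enter the score, mirroring the restriction described in the text.
    """
    fcst_prob = np.asarray(fcst_prob, dtype=float)
    clim_prob = np.asarray(clim_prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mask = clim_prob > 0.0
    bs_fcst = brier_score(fcst_prob[mask], obs[mask])
    bs_clim = brier_score(clim_prob[mask], obs[mask])
    return 1.0 - bs_fcst / bs_clim


# Made-up values at four grid points; the third point has zero climatology
# and is therefore excluded from the score.
fcst = [0.8, 0.1, 0.0, 0.5]
clim = [0.2, 0.2, 0.0, 0.2]
obs = [1, 0, 0, 0]
bss = brier_skill_score(fcst, clim, obs)
```

A positive value means the forecast beats climatology, a value of 0 means it is no better, and a perfect forecast would score 1.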
Table 2. Sources of the best-track files used to create the daily climatological TC activity in each basin.
Only named TCs are verified in this study. Thus, climatological and observed TC activity probabilities are created based on TCs that had a maximum sustained wind of 35 knots (kt; where 1 kt = 0.51 m s−1) or stronger during their lifetime. According to the definitions of TC scales for each verification basin in Table 3, tropical depressions in the atl, aus, ecp, and wnp; depressions and deep depressions in the nin; and tropical disturbances and depressions in the sin are also not included in the verification. In addition, the TC tracks after extratropical transition are not included in the verification.
Table 3. TC scales for different basins as a function of the max surface wind speed (kt).
Model TCs tend to be weaker than observed TCs because of the relatively low horizontal resolution of the ensemble models. An additional consideration is that the horizontal resolution of the ensemble fields in the TIGGE archive at ECMWF, from which the forecast tracks are created in this study, may be lower than the original model resolution (Table 1). Furthermore, a threshold wind value in the model that is the same as that used for observed TCs (e.g., 35 kt) does not necessarily lead to the largest BSS. Therefore, threshold wind values of 15, 20, 25, 30, and 35 kt are tested for defining the model TC. For a threshold wind value of 30 kt, for example, all model TCs with a maximum wind of 30 kt or greater are regarded as TCs to be verified. Sensitivity of the skill score to the threshold wind values is of interest for maximizing the value of the technique for operational forecasts. Thus, both the Brier score (BS) and the BSS are calculated for each of the five threshold wind values.
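The threshold test can be illustrated with a short Python sketch. It assumes that each model track is a list of `(lead_day, lat, lon, wind_kt)` points and that "maximum wind" refers to the largest wind along the track; both are assumptions of this example, as are the function names and all numerical values.

```python
THRESHOLDS_KT = (15, 20, 25, 30, 35)


def select_model_tcs(tracks, threshold_kt):
    """Keep tracks whose largest wind reaches the threshold (assumed reading)."""
    return [trk for trk in tracks
            if trk and max(point[3] for point in trk) >= threshold_kt]


def best_threshold(bss_by_threshold):
    """Threshold wind value (kt) that gives the largest BSS."""
    return max(bss_by_threshold, key=bss_by_threshold.get)


# Three made-up model tracks: (lead_day, lat, lon, wind_kt).
tracks = [
    [(5.0, 9.0, 135.0, 18.0), (5.25, 9.2, 134.0, 27.0)],  # peaks at 27 kt
    [(6.0, 12.0, 128.0, 38.0)],                           # peaks at 38 kt
    [(7.0, 15.0, 140.0, 12.0)],                           # peaks at 12 kt
]
counts = {thr: len(select_model_tcs(tracks, thr)) for thr in THRESHOLDS_KT}

# Hypothetical BSS values per threshold, for illustration only.
example_bss = {15: 0.021, 20: 0.034, 25: 0.041, 30: 0.038, 35: 0.029}
chosen = best_threshold(example_bss)
```

Lower thresholds admit more model TCs (two tracks at 15–25 kt, one at 30–35 kt in this toy case), which is why the frequency of forecast events, and hence the BSS, depends on the threshold.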
An example of a TC activity probability map for Typhoon Haiyan (2013), which caused massive destruction across the Philippines, is shown in Fig. 2. The observed TC activity probability (using 300 km for the potential impact) is given in Fig. 2a and the climatological probability is shown in Fig. 2d. The initial time of the forecasts for the four global ensembles is 1200 UTC 31 October 2013, which is about 4 days before the genesis (here defined as the formation of a system of tropical storm intensity or stronger) of Haiyan and 8 days before landfall on the Philippines. The time window in Fig. 2 is 5–8 days, so the forecast period is from 1200 UTC 5 November to 1200 UTC 8 November. The forecast TC activity probability is calculated using the ensemble TC tracking dataset from each NWP center. Note that each of the forecasts has TC activity probability areas that are oriented along the path of Haiyan and thus have much smaller areas than the climatological probability map, which is broader in the meridional direction and includes the possibility of recurving TCs almost to Japan. While the ECMWF (Fig. 2b), NCEP (Fig. 2e), and UKMO (Fig. 2f) forecasts and the climatology (Fig. 2d) have probability areas extending across the South China Sea, only JMA (Fig. 2c) has a probability area that more closely matches the western extent of the observed probability in Fig. 2a.
Fig. 2. Example of TC activity probability maps for Typhoon Haiyan for the forecast time window of 5–8 days initiated at 1200 UTC 31 Oct 2013: (a) observed and (d) climatological TC activity probabilities, and forecast probabilities from the (b) ECMWF, (c) JMA, (e) NCEP, and (f) UKMO ensembles. Note that the color bar for (d) is different from the others.
3. Results
a. Intercomparison
The skill of the TC activity forecasts in overlapping time windows from 0–3 to 11–14 days by the ECMWF, JMA, NCEP, and UKMO global medium-range ensembles is shown in Fig. 3. The larger the BSS value is, the more skillful the ensemble forecast is with respect to a climatological forecast. A negative BSS value means that the ensemble forecast is less skillful than climatological forecasts. Note that the BSS has been calculated for a series of threshold wind values (i.e., 15, 20, 25, 30, and 35 kt) to define model TCs and that the largest BSS value among them is plotted for each ensemble, each time window, and each TC basin.
Fig. 3. BSSs of TC activity forecasts in the seven TC basins by the ECMWF (red), JMA (blue), NCEP (yellow), and UKMO (green) ensembles. Each group of histograms along the abscissa for the four models represents 3-day forecast time windows from 0–3 to 11–14 days.
The important result in Fig. 3 is that the positive BSS values extend to the 6–9-day forecast time window for most of the basins and most of these operational global ensembles, which indicates some skill in providing forecast guidance on TC activity that extends into week 2. In general, the ECMWF ensemble tends to have larger BSS values than the other three models, especially at long lead times. More skill (larger BSS values) in atl is likely due to the aircraft reconnaissance and other observations, and in ecp to the limited areas over which TCs form and move. Larger skill in wnp may be related to the fact that the annual number of TCs is larger than in other basins.
The threshold wind values that result in the largest BSS values for each time window and each of the seven TC basins are summarized in Table 4. A threshold wind value corresponding to a tropical storm (i.e., 35 kt) generally gives the largest BSS for the NCEP ensemble in spite of the fact that the horizontal resolution of the NCEP forecast fields archived on the TIGGE website is 1.0° × 1.0°, which is about twice as coarse as the original model resolution (Table 1). However, the NCEP ensemble has large negative BSS values in some basins such as the Australian and South Pacific regions. These negative BSS values are attributed to NCEP ensemble forecasts of TC activity with high probabilities for which no TC activity was observed (false alarms). The NCEP ensemble spread tends to be relatively small at long lead times, which contributes to this lack of skill, as a “false alarm” with high probabilities significantly degrades the BSS. As will be shown by a reliability diagram of TC activity forecasts (see Fig. 7c, described in greater detail below), the NCEP ensemble is overconfident (forecast probability is larger than observed frequency) over large forecast probability ranges, especially where the number of grid points predicting the event (see green box in Fig. 7c, described in greater detail below) is not small.
Table 4. Threshold max wind values (kt) that give the largest BSSs among the five threshold values of 15, 20, 25, 30, and 35 kt tested to define model TCs for each time window (days) and each of the seven TC basins. The numbers in boldface indicate the negative BSSs.
The ECMWF ensemble generally has larger BSS values for most time windows and in most TC basins (Table 4). This better performance of the ECMWF ensemble is consistent with Yamaguchi et al. (2012), as well as the study of Brown and Thepaut (2014) of the accuracy of TC track forecasts by the global deterministic models of these centers.
A unique characteristic of the UKMO ensemble forecasts is the frequency bias for the different wind forecast thresholds and lead times (Fig. 4). This frequency bias is defined as the number of grid points at which the event was forecast relative to the number of grid points at which the event actually occurred. Note that the number of grid points where the event was forecast is weighted with forecast probability. A frequency bias larger (smaller) than 1 means that the ensembles tend to forecast more (fewer) TCs than occurred, and a frequency bias equal to 1 means no bias. For an unbiased forecast system, this value is obtained for the 35-kt forecast threshold (equal to the observed threshold). As indicated in Fig. 4, the frequency of UKMO ensemble TC activity forecasts tends to decrease less with forecast time than for the other centers. For example, the frequencies decrease with time in the western North Pacific for the ECMWF, JMA, and NCEP ensembles while that of the UKMO ensemble is almost constant. By contrast, the frequency bias increases with time in the UKMO ensemble forecasts of TC activity for the eastern and central Pacific.
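One way to read the frequency bias definition above is as the sum of forecast probabilities over all grid points divided by the number of grid points where the event occurred. The Python sketch below follows that reading; the function name and the flat-array layout are assumptions made for the example.

```python
import numpy as np


def frequency_bias(forecast_prob, observed):
    """Probability-weighted count of forecast points over observed points.

    forecast_prob: forecast probabilities in [0, 1], one per grid point.
    observed: 1 where the event occurred and 0 elsewhere.
    """
    forecast_prob = np.asarray(forecast_prob, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n_observed = observed.sum()
    if n_observed == 0:
        raise ValueError("frequency bias is undefined when the event never occurs")
    return float(forecast_prob.sum() / n_observed)


# Made-up values at four grid points, two of which saw the event.
unbiased = frequency_bias([0.5, 0.5, 1.0, 0.0], [1, 0, 1, 0])
underforecast = frequency_bias([0.2, 0.2, 0.0, 0.0], [1, 0, 1, 0])
```

In the first case the weighted forecast count equals the observed count, so the bias is 1. In the second case the bias is well below 1, meaning that fewer events are forecast than occurred.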
Fig. 4. Frequency bias of TC activity forecasts in the seven TC basins by the ECMWF (red), JMA (blue), NCEP (yellow), and UKMO (green) ensembles. The x axis is the wind threshold value (kt) used to define model TCs and the y axis is the frequency bias (1 is no bias). Circle, hourglass, diamond, square, triangle, and inverted triangle symbols are for a forecast time window of 0–3, 2–5, 4–7, 6–9, 8–11, and 10–13 days, respectively. JMA’s results with wind threshold values of 30 and 35 kt are scaled out for the ecp.
Sensitivity to the threshold wind value used to define a model TC may also be illustrated from the reliability diagrams (Wilks 2006) for the UKMO ensemble forecasts for the ecp with a forecast time window of 2–5 days (Fig. 5). Note that the observation frequency is larger than the forecast probability when 35 kt is used as the threshold wind value to define model TCs. However, the observation frequency is clearly smaller than the forecast probability when threshold values of 15–25 kt are utilized. That is, using 35 (15–25) kt as the threshold wind value for defining the model TCs results in forecasts that underestimate (overestimate) the numbers and lifetimes of observed TCs. Similar sensitivity to the threshold wind value is found for the other three global ensemble models.
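The quantities plotted in a reliability diagram can be computed by binning the forecast probabilities and taking the observed frequency in each bin. The following Python sketch shows one generic way to do this; the ten equal-width bins and the function name are assumptions made for the example and are not taken from the figures.

```python
import numpy as np


def reliability_curve(prob, obs, n_bins=10):
    """Return (mean forecast probability, observed frequency, count) per bin.

    Probabilities are grouped into n_bins equal-width bins on [0, 1];
    empty bins are skipped.
    """
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_index = np.digitize(prob, inner_edges)  # values 0 .. n_bins - 1
    curve = []
    for b in range(n_bins):
        in_bin = bin_index == b
        count = int(in_bin.sum())
        if count == 0:
            continue
        curve.append((float(prob[in_bin].mean()),
                      float(obs[in_bin].mean()),
                      count))
    return curve


# Made-up sample: two low-probability and two high-probability forecasts.
curve = reliability_curve([0.05, 0.05, 0.95, 0.95], [0, 1, 1, 1])
```

In this toy sample the low-probability bin has an observed frequency (0.5) well above its mean forecast probability (0.05), which corresponds to the underforecasting situation described for the 35-kt threshold.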
Fig. 5. Reliability diagrams for TC activity forecasts by the UKMO ensemble for the ecp with a forecast time window of 2–5 days with threshold wind values of (a) 35, (b) 30, (c) 25, (d) 20, and (e) 15 kt to define the model TCs. Reliability of these UKMO ensemble forecasts is indicated by the red line with reliability values to the left (right) of the blue dashed line indicating an under- (over-) forecast of the numbers of observed events. Number of grid points (samples) on a log10 scale (ordinate) predicting the event is shown by dashed green boxes, and the number of grid points (samples) that the event actually covered is shown by dashed blue boxes.
Smaller threshold wind values tend to lead to larger BSS values for the JMA ensemble for the atl, ecp, spc, and wnp. These lower threshold wind values may be more appropriate because of the relatively low horizontal resolution of the JMA fields archived on the TIGGE website. Another possibility is that the JMA ensemble tends to predict weaker TCs over these basins, as its frequency biases for a given threshold wind value are smaller than those of the other centers.
b. Multicenter grand ensemble
The relative benefit of an MCGE with respect to the best single-model ensemble among the ECMWF, JMA, NCEP, and UKMO ensembles is shown in Fig. 6. In each time window, the global ensemble with the largest BSS among the four NWP centers is selected and compared with the BSS of the combination of all four ensembles (MCGE4) and with the BSS for a combination of the ECMWF, NCEP, and UKMO ensembles (MCGE3). Note that BSS for the MCGE4 is only calculated to a time window of 6–9 days because the forecast length for the JMA ensemble is 9 days. Note also that the threshold wind values from Table 4, which are different among the four ensembles, are used to create both MCGE3 and MCGE4.
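A minimal sketch of how member-level results from several centers might be pooled into an MCGE probability at one grid point is given below. It assumes that every member of every center carries equal weight, which the text does not state explicitly. The per-center member counts are illustrative values chosen to be consistent with the MCGE3 and MCGE4 sizes of 96 and 147 quoted in the introduction (Table 1 is the authoritative source), and the hit counts are made up.

```python
def pooled_probability(hits_by_center, size_by_center, centers):
    """Equal-weight pooling: total members with a TC hit over total members."""
    total_hits = sum(hits_by_center[c] for c in centers)
    total_members = sum(size_by_center[c] for c in centers)
    return total_hits / total_members


# Illustrative ensemble sizes (assumption; see Table 1 for the actual values).
sizes = {"ECMWF": 51, "JMA": 51, "NCEP": 21, "UKMO": 24}

# Made-up numbers of members with a model TC near one grid point, each
# counted with that center's own threshold wind value.
hits = {"ECMWF": 10, "JMA": 12, "NCEP": 5, "UKMO": 9}

mcge3_centers = ("ECMWF", "NCEP", "UKMO")
mcge4_centers = ("ECMWF", "JMA", "NCEP", "UKMO")

p_mcge3 = pooled_probability(hits, sizes, mcge3_centers)  # 24 / 96
p_mcge4 = pooled_probability(hits, sizes, mcge4_centers)  # 36 / 147
```

Pooling in this way is equivalent to averaging the single-ensemble probabilities with weights proportional to ensemble size, so larger ensembles contribute more to the combined probability.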
Fig. 6. As in Fig. 3, but for MCGEs. The left-most bar is for the single global ensemble model with the largest BSS among the four global ensemble models (bar colors are the same as in Fig. 3), the middle bar is the BSS for MCGE4 that includes all four ensemble models, and the right-most bar is the BSS for MCGE3 that includes just the ECMWF, NCEP, and UKMO ensemble models.
In general, the skill (BSS value) of both MCGEs is larger than that of the best single-model ensemble, which indicates the relative benefit of the MCGEs over the single-model ensemble. In addition, both MCGEs are more skillful than the climatological forecasts for all time windows beyond 1 week in all seven basins, except for nin, where skill only exists through 8–11 days. Indeed, the MCGE3 is capable of providing skillful guidance on TC activity forecasts to 11–14 days for all basins except nin and sin. After 0–3 days, the relative benefit of the MCGE compared to the single-best ensemble is relatively large in the nin, sin, aus, and spc, in part because the BSS for the single-best ensemble becomes relatively small in those basins.
Another benefit of the MCGE compared to the single-best ensemble is an improvement in reliability, which is a component of the BS. For example, the reliability diagrams for the sin for the individual ensembles and the two MCGEs for a time window of 4–7 days are displayed in Fig. 7. Although the observed frequency is smaller than the forecast probability in all single-model ensembles in Figs. 7a–d, the reliability curves are closer to the diagonal line for the two MCGEs in Figs. 7e and 7f. Such an improvement in reliability compared to a single-best ensemble for these MCGEs designed for TC activity is consistent with the MCGE comparisons in the western North Pacific by Yamaguchi et al. (2012) and Matsueda and Nakazawa (2014) of multimodel track forecasts versus the single-best model track forecasts.
Fig. 7. As in Fig. 5, but for reliability diagrams with different ensembles for sin TC activity forecasts with a time window of 4–7 days for the (a) ECMWF, (b) JMA, (c) NCEP, and (d) UKMO single-model ensembles, and by the (e) MCGE3 and (f) MCGE4.
The TC activity in the above comparisons was defined as the probability that a TC is present within a 300-km radius from a certain location during a 3-day forecast time window. The sensitivity to the radius and to the length of the time window is examined by changing the radius from 300 to 100, 500, or 700 km and the length of the time window from 3 to 1, 5, or 7 days. Note that not only ensemble TC activity but also climatological and observed TC activity are recomputed with each new radius and length of the time window for the BSS computations. For example, the BSS for the wnp is shown in Fig. 8. As expected, the BSS values tend to increase with increasing radius and increasing length of the time window, especially for longer lead times. For example, the BSS of MCGE3 for the 11–14-day time window increases from 0.0045 to 0.0354 by changing the radius from 300 to 700 km. Similarly, the BSS of MCGE3 for the radius of 300 km for the longest lead time increases from 0.0045 to 0.0388 by changing the length of the time window from 3 to 7 days. The choices of the radius and the length of the time window that provide the largest BSS for a time window including 14 days (2 weeks) are 700 km and 7 days, respectively. Given that the annual average TC position errors in the western North Pacific are approximately 100 and 500 km for 1- and 5-day forecasts, respectively, a radius of 100 km seems to be too small on a medium-range time scale. These results for the wnp are also found in the other basins and, as expected, indicate that the BSS values become larger with larger radii and longer forecast time windows.
Fig. 8. BSSs of TC activity forecasts in the wnp by MCGE3 with threshold distances of 100 (red), 300 (blue), 500 (yellow), and 700 (green) km and with forecast time windows of (a) 1, (b) 3, (c) 5, and (d) 7 days.
4. Summary
A systematic verification of operational global medium-range ensemble forecasts by ECMWF, JMA, NCEP, and UKMO of TC activity (i.e., genesis plus the subsequent track) is performed for seven TC basins around the world. The skill of these TC activity forecasts relative to a climatological TC activity is assessed for the single-model ensembles, and then the relative benefit of two multicenter grand ensembles (MCGEs) over that of the single-best ensemble is examined. The verification metric used to measure the skill of these TC activity probability forecasts is the Brier skill score (BSS), which is calculated within 3-day time windows over a forecast length of 2 weeks (i.e., from short- to medium-range time scales). In contrast to many studies that have focused on a single TC basin, a global verification is provided for seven TC basins: North Atlantic (atl), eastern and central Pacific (ecp), western North Pacific (wnp), north Indian Ocean (nin), south Indian Ocean (sin), Australian region (aus), and South Pacific (spc). Since the focus of this study is on both the genesis and the subsequent track, all the ensemble forecasts produced during this 2010–13 period have been used, rather than just the cases in which the TCs have existed at the initial time of the forecasts.
Major findings in this study include the following:
In most of these basins, these operational global medium-range ensembles are capable of providing skillful guidance of TC activity forecasts with a forecast lead time extending to week 2.
The MCGEs have more skill (larger BSS) than the best single-model ensemble, which is generally the ECMWF ensemble for most time windows and in most TC basins.
The benefits of the MCGEs are relatively larger in nin, sin, aus, and spc, where the BSS for the single-best ensemble is relatively small.
The reliability of these TC activity forecasts is improved in the MCGEs compared to the reliability of the individual ensembles.
Both the BSS and the reliability are sensitive to the choice of threshold wind values that are used to define model TCs.
The frequency of correct forecasts of the TC activity decreases with increasing forecast interval and this is seen most notably for the ECMWF, JMA, and NCEP ensembles.
There are some limitations in this study. First, two important parameters have been subjectively assigned for the verification: (i) a threshold distance of 300 km has been specified that determines whether the observed or the forecast TC affects a grid point and (ii) a 3-day time window has been allowed in matching the forecast TC to an observed TC. Changing the threshold distance from 300 to 100, 500, or 700 km, or the time window from 3 to 1, 5, or 7 days, changes the BSS values (which increase with larger radii and longer time windows), but does not change the overall conclusions of this study. However, it should be noted that the areal extent of a TC's influence may differ depending on its size, intensity, asymmetry, etc.
Another limitation in this study is the definition of TC genesis. Because this study has only verified those TCs that reached a maximum sustained wind of 35 kt or stronger during their lifetimes, tropical depressions have been excluded from the verification. Since the verification may be sensitive to the definition of genesis timing, and even weak TCs can cause substantial damage, performance of the ensembles in predicting weak TCs will be investigated in a future study.
This study evaluated only four ensembles, while the TIGGE website has six other ensembles. The reason why these four ensembles were selected for this study is that Yamaguchi et al. (2012) had shown they had relatively high skill in western North Pacific TC track forecasting. In addition, the Vitart et al. (2012) TC tracking algorithm used in this study requires a 6-hourly gridpoint value dataset at the surface and in the atmosphere for all ensemble members, so it is computationally expensive to process the dataset and create a TC tracking database over the 4-yr verification period via the TIGGE portal. Nevertheless, it may be valuable to extend this study by increasing the number of ensembles and thereby create an MCGE with more ensemble members.
Further study is required as to why the skill differs so much among the seven TC basins, and what factors contribute to the differences in predictability among the TCs. These aspects of TC activity forecasting will be investigated in the future by analyzing the differences between successful and unsuccessful ensemble members. Given that TC genesis events and intensity changes are often modulated by vertical wind shear associated with large-scale synoptic features (e.g., Dunion 2013; DeMaria et al. 2014), even relatively low-resolution ensemble forecast fields may provide insights regarding these questions.
Several TIGGE participants started providing TC track forecasts in near–real time under the initiative of the WMO Global Interactive Forecast System (GIFS)–TIGGE Working Group (http://www.bom.gov.au/cyclone/cxmlinfo/). Inclusion of tracking information for preexisting TCs may further expedite the operational use of TC activity forecasts. In addition, a WMO Forecast Demonstration Project such as the Severe Weather Forecasting Demonstration Projects (SWFDPs) or a research and development project such as NWP-TCEFP or the WMO-TLFDP may provide good opportunities for the TC research and forecasting communities to transfer their achievements into operations.
Acknowledgments
The authors thank The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) for constructing useful and user-friendly portal sites and providing analysis and forecast data of operational ensemble prediction systems. The authors also thank Koji Kuroiwa, a former chief of the WMO Tropical Cyclone Programme, for constant support and suggestions on this study. Mrs. Penny Jones provided valuable assistance in the preparation of this manuscript.
REFERENCES
Belanger, J. I., P. J. Webster, J. A. Curry, and M. T. Jelinek, 2012: Extended prediction of north Indian Ocean tropical cyclones. Wea. Forecasting, 27, 757–769, doi:10.1175/WAF-D-11-00083.1.
Bougeault, P., and Coauthors, 2010: The THORPEX interactive grand global ensemble. Bull. Amer. Meteor. Soc., 91, 1059–1072, doi:10.1175/2010BAMS2853.1.
Brown, A., and J.-N. Thepaut, 2014: Final report of 29th session of the Working Group on Numerical Experimentation (WGNE-29). WMO, 35 pp. [Available online at http://www.wmo.int/pages/prog/arep/wwrp/new/documents/WGNE_29_Report_Final.pdf.]
DeMaria, M., C. R. Sampson, J. A. Knaff, and K. D. Musgrave, 2014: Is tropical cyclone intensity guidance improving? Bull. Amer. Meteor. Soc., 95, 387–398, doi:10.1175/BAMS-D-12-00240.1.
Dunion, J., 2013: Development of a probabilistic tropical cyclone genesis prediction scheme. Joint Hurricane Testbed Final Rep., NOAA/NHC, 4 pp. [Available online at http://www.nhc.noaa.gov/jht/11-13reports/Final_Dunion_JHT13.pdf.]
Elsberry, R. L., M. S. Jordan, and F. Vitart, 2010: Predictability of tropical cyclone events on intraseasonal timescales with the ECMWF monthly forecast model. Asia-Pac. J. Atmos. Sci., 46, 135–153, doi:10.1007/s13143-010-0013-4.
Elsberry, R. L., M. S. Jordan, and F. Vitart, 2011: Evaluation of the ECMWF 32-day ensemble predictions during 2009 season of western North Pacific tropical cyclone events on intraseasonal timescales. Asia-Pac. J. Atmos. Sci., 47, 305–318, doi:10.1007/s13143-011-0017-8.
Elsberry, R. L., H.-C. Tsai, and M. S. Jordan, 2014: Extended-range forecasts of Atlantic tropical cyclone events during 2012 using the ECMWF 32-day ensemble predictions. Wea. Forecasting, 29, 271–288, doi:10.1175/WAF-D-13-00104.1.
Goerss, J. S., 2000: Tropical cyclone track forecasts using an ensemble of dynamical models. Mon. Wea. Rev., 128, 1187–1193, doi:10.1175/1520-0493(2000)128<1187:TCTFUA>2.0.CO;2.
Goerss, J. S., C. R. Sampson, and J. M. Gross, 2004: A history of western North Pacific tropical cyclone track forecast skill. Wea. Forecasting, 19, 633–638, doi:10.1175/1520-0434(2004)019<0633:AHOWNP>2.0.CO;2.
Halperin, D. J., H. E. Fuelberg, R. E. Hart, J. H. Cossuth, P. Sura, and R. J. Pasch, 2013: An evaluation of tropical cyclone genesis forecasts from global numerical models. Wea. Forecasting, 28, 1423–1445, doi:10.1175/WAF-D-13-00008.1.
Heming, J., and J. S. Goerss, 2010: Track and structure forecasts of tropical cyclones. Global Perspectives on Tropical Cyclones, J. C.-L. Chan and J. D. Kepert, Eds., World Scientific Series on Asia-Pacific Weather and Climate, Vol. 4, World Scientific Press, 287–323.
Majumdar, S. J., and P. M. Finocchio, 2010: On the ability of global ensemble prediction systems to predict tropical cyclone track probabilities. Wea. Forecasting, 25, 659–680, doi:10.1175/2009WAF2222327.1.
Majumdar, S. J., and R. D. Torn, 2014: Probabilistic verification of global and mesoscale ensemble forecasts of tropical cyclogenesis. Wea. Forecasting, 29, 1181–1198, doi:10.1175/WAF-D-14-00028.1.
Matsueda, M., and T. Nakazawa, 2014: Early warning products for severe weather events derived from operational medium-range ensemble forecasts. Meteor. Appl., 22, 213–222, doi:10.1002/met.1444.
Swinbank, R., and Coauthors, 2015: The TIGGE project and its achievements. Bull. Amer. Meteor. Soc., doi:10.1175/BAMS-D-13-00191.1, in press.
Tang, X., X. Lei, and H. Yu, 2012: WMO Typhoon Landfall Forecast Demonstration Project (WMO-TLFDP): Concept and progress. Trop. Cyclone Res. Rev., 1, 89–96.
Tsai, H.-C., R. L. Elsberry, M. S. Jordan, and F. Vitart, 2013: Objective verifications and false alarm analyses of western North Pacific tropical cyclone event forecasts by the ECMWF 32-day ensemble. Asia-Pac. J. Atmos. Sci., 49, 409–420, doi:10.1007/s13143-013-0038-6.
Van der Grijn, G., J. E. Paulsen, F. Lalaurette, and M. Leutbecher, 2005: Early medium-range forecasts of tropical cyclones. ECMWF Newsletter, No. 102, Reading, United Kingdom, 7–14. [Available online at http://old.ecmwf.int/publications/newsletters/pdf/102.pdf.]
Vitart, F., A. Leroy, and M. C. Wheeler, 2010: A comparison of dynamical and statistical predictions of weekly tropical cyclone activity in the Southern Hemisphere. Mon. Wea. Rev., 138, 3671–3682, doi:10.1175/2010MWR3343.1.
Vitart, F., F. Prates, A. Bonet, and C. Sahin, 2012: New tropical cyclone products on the web. ECMWF Newsletter, No. 130, Reading, United Kingdom, 17–23. [Available online at http://old.ecmwf.int/publications/newsletters/pdf/130.pdf.]
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.
Yamaguchi, M., T. Nakazawa, and S. Hoshino, 2012: On the relative benefits of a multi-centre grand ensemble for tropical cyclone track prediction in the western North Pacific. Quart. J. Roy. Meteor. Soc., 138, 2019–2029, doi:10.1002/qj.1937.
Yamaguchi, M., T. Nakazawa, and S. Hoshino, 2014: North West Pacific tropical cyclone ensemble forecast project. Trop. Cyclone Res. Rev., 3, 193–201.
1 A consensus of deterministic model TC tracks is considered an ensemble method in this study.