Abstract

This paper is the first of a two-part series of papers that employs the data-mining approach to analyze tropical cyclone (TC) movement in the western North Pacific Ocean. Part I unravels conditions under which TCs tend to recurve, and Part II uncovers conditions leading to TCs making landfall. Here in Part I, a detailed study is carried out into TC recurvature over the South China Sea and western North Pacific. The investigation focuses on the unraveling of rules governing TC recurvature hidden in TC data. The historical TC track database comprises recurving TCs and straight movers. Potential parameters affecting TC recurvature are categorized into three groups: large-scale circulation, circulations surrounding TCs, and variables characterizing TCs. The tree construction algorithm, C4.5, is applied to classify recurving and straight-moving TCs. Parameters measuring large-scale circulation patterns and characterizing TCs play significant roles in building the classification tree. Altogether, 18 rules are discovered from the processed database. Most of the 18 rules can be explained by existing theories and are supported by various empirical findings on TC recurvatures. Rules governing TC recurvature discovered by the present study contain quantitative descriptions of factors such as composite wind fields, geopotential heights, and deep-layer mean winds that are essential to the understanding, interpretation, and prediction of TC recurvatures.

1. Introduction

A tropical cyclone (TC) is an extreme weather event that can cause tremendous loss of human life and property through torrential rainfall, flash floods, and strong winds. TC track (Holland 1983; Chan 1984; Holland 1984; Carr and Elsberry 1990; Holland and Lander 1993; Chan 2005) and intensity (Velden and Leslie 1991; Wang and Wu 2004; Wong and Chan 2004; Emanuel 2005; Mandal et al. 2007) are two issues of paramount importance in TC research involving the study of TC mechanisms, observations, simulations, and short- and long-term predictions.

Tropical cyclone recurvature and landfall, in particular, are of significant scientific, social, and economic concern (Bao and Sadler 1983; Krishnamurti et al. 1992; O’Shay and Krishnamurti 2004). Until recently, the largest error in the prediction of TC tracks was caused by TC recurvature, especially sudden or sharp recurvatures (George and Gray 1977; Peak and Elsberry 1986; Holland and Wang 1995). TCs cause most of their damage during or after landfall, and the landfall process is taken to be the process of the entire TC system moving over land (Tuleya and Kurihara 1978). On average, the coast of China is struck annually by about six TC landfalls (Chan and Shi 2000; Liu and Chan 2003; Goh and Chan 2010). TCs forming in the South China Sea (SCS) and western North Pacific Ocean (WNP) are steered northward or northwestward when the subtropical high is strong and shifts westward (Ho et al. 2004; Gao et al. 2009; Goh and Chan 2010). These TCs may make landfall along the Chinese coast if the steering current is persistently strong and westward (Ho et al. 2004; Gao et al. 2009; Goh and Chan 2010). However, westward-moving TCs will turn toward the north and then the northeast if the steering current changes direction from westward to eastward due to their interactions with large-scale circulation patterns (e.g., a subtropical high, monsoon systems, and midlatitude westerlies) (Harr and Elsberry 1991, 1995; Chen et al. 2009). TCs will move far away from the coast without making landfall over China due to the reversal of the steering flow (Ho et al. 2004; Gao et al. 2009; Goh and Chan 2010). TC recurvature and landfall are both dramatically influenced by large-scale circulation (George and Gray 1977; Hodanish and Gray 1993; Liu and Chan 2003; Fudeyasu et al. 2006).

Although process models can be built to predict TC recurvature and landfall, they generally fall short in capturing the intricacies of the underlying mechanisms of such movements (Krishnamurti et al. 1992; Holland and Wang 1995; Li and Chan 1999; Davis et al. 2008). Guided by meteorological knowledge, we, however, might be able to unravel these mechanisms from historical TC tracks via data-mining methods. Regularities thus uncovered can in turn enhance our understanding of TC movements. Therefore, in this two-part series of papers, we will employ data-mining methods to unravel mechanisms of TC recurvature and landfall. This paper is the first part of the series and intends to discover the rules governing TC recurvature from archived historical TC data. In Zhang et al. (2013, Part II of this study), we attempt to mine rules for TC landfall.

TC recurvature is a special type of TC track, turning from westward toward the north and eventually to the northeast in the Northern Hemisphere (Riehl and Shafer 1944; George and Gray 1977; JTWC 1988; Dobos and Elsberry 1993; O’Shay and Krishnamurti 2004). Large forecast errors typically occur when a storm that has been predicted to recurve continues on a westward track. Similarly, the sudden recurvature of a storm from its predicted westward track also causes large forecast errors. Forecasters frequently face this uncertainty because nearly half of all WNP TCs recurve at some point along their tracks (Dobos and Elsberry 1993). Therefore, a comprehensive understanding of the mechanisms affecting TC recurvature is of great theoretical and operational significance. The results will also play an essential role in disaster preparation, management, and mitigation. Furthermore, the accurate prediction of TC recurvature enables timely forewarnings for people dwelling in coastal regions.

Since the 1950s, numerous statistical and observational studies of the physical mechanisms of TC recurvature have been conducted. The subtropical high, monsoon systems, and the midlatitude westerlies are regarded as the most important synoptic systems controlling TC recurvature (George and Gray 1977; Evans et al. 1991; Harr and Elsberry 1991; Krishnamurti et al. 1992; Holland and Wang 1995; Chen et al. 2009). Synoptically, favorable conditions for recurvature include the penetration of a westerly trough and the eastward retreat of the subtropical high (Riehl and Shafer 1944; George and Gray 1977; Holland 1984; Elsberry 1990; Evans et al. 1991). On the contrary, typical synoptic patterns for nonrecurvature are a strong subtropical high sitting poleward of the TC and a major westerly trough located far west of the TC (George and Gray 1977; Holland and Wang 1995). In the study area, the East Asian summer monsoon (EASM) is the most influential and essential component of the Asian climate systems (Chen and Chang 1980; Tao and Chen 1987; Lau et al. 1988; Ding 1992; Wang and Wu 2004; Wang et al. 2008) largely due to the orographic forcing: huge thermal contrasts between the world’s largest continent, Eurasia, and the largest ocean basin, the Pacific. It is also strongly influenced by the world’s highest landform, the Tibetan Plateau. Meteorologically, there are five categories of EASM indices: the east–west thermal contrast index and the north–south thermal contrast index, which are constructed according to the vertical shear of zonal winds; the shear vorticity (often expressed by a north–south gradient of the zonal winds); the southwesterly monsoon indices, which directly gauge the strength of the low-level East Asian monsoon winds using the 850-hPa southwesterly winds; and the South China Sea monsoon indices (Wang et al. 2008). 
The WNP subtropical high (WNPSH) is a kind of large-scale circulation pattern exerting the most dramatic influence on TC movement, including TC recurvature through the steering flow (Kasahara 1959; George and Gray 1977; Elsberry and School 1987; Evans et al. 1991; Harr and Elsberry 1991, 1995; Holland and Wang 1995; Chen et al. 2009). Kasahara (1959) first noticed that the position of the subtropical high plays a crucial role in TC movement. He suggested that the vector mean of the 500- and 700-hPa steering flows should be used for forecasting TC movement. The 500-hPa geopotential heights have been widely used to measure the WNPSH (Zhang and Yu 1998; Sun and Ying 1999; Sui et al. 2007; Zhou et al. 2009). In addition, the National Climate Center in China (NCC) announced the monthly indices to describe the westward extension as well as the north edge, and the intensity of the WNPSH, based on the monthly mean 500-hPa geopotential heights. The midlatitude westerlies also exert a significant impact on TC recurvature through upper-tropospheric zonal winds (~200 hPa) (George and Gray 1977; Guard 1977). For example, it has been observed that when the base of the westerlies lowers considerably west of a TC in connection with an eastward-moving midlatitude trough and remains low, northward recurvature will occur (Riehl and Shafer 1944).

The outer circulations of TCs (e.g., 5°–7° radius from the TC center) have been found to be the indicators of TC recurvature from observational studies. For example, the 200-hPa zonal and meridional winds (George and Gray 1977; Hodanish and Gray 1993), 300-hPa zonal wind (Burroughs and Brand 1973), 400-hPa zonal wind (Hodanish and Gray 1993), 500-hPa zonal wind (George and Gray 1977; Chan et al. 1980; Hodanish and Gray 1993), and the deep-layer mean flow (Holland 1984; Fitzpatrick 1992) appear to play an important role in determining TC recurvature within a certain period (e.g., 24 h) preceding recurvature. Statistical methods have also been employed to formulate prediction schemes for TC recurvature, including discriminant analysis (Leftwich 1980; Lage 1982; Ford et al. 1993), regression analysis (Leftwich 1980; Lage 1982), and empirical orthogonal functions (Lage 1982; Ford et al. 1993).

In addition to the observational analyses discussed above, dynamic models, from the barotropic (Evans et al. 1991) and baroclinic (Krishnamurti et al. 1992; Holland and Wang 1995) perspectives have long been utilized to study TC recurvature. The vorticity advection (Elsberry 1990), momentum exchange between a TC and its environment (Krishnamurti et al. 1992; Holland and Wang 1995; Li and Chan 1999; Chan et al. 2002), and diabatic heating (Holland and Wang 1995; Chan et al. 2002) provide explanations for TC recurvature through dynamic modeling. Although these studies advance our understanding of TC recurvature, unraveling the intricate conditions under which TCs tend to recurve remains a big challenge in TC forecasting.

Unlike dynamic modeling, data mining (DM), also referred to as knowledge discovery from data, is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad and Stolorz 1997; Miller and Han 2001; Leung 2010). The volume of TC-related data is huge with a large number of fields (attributes). The growth rate of TC-related datasets appears to outstrip the amount of data that traditional analysis methods can handle. Useful knowledge (e.g., structures, processes, relationships, regularities, and patterns) underlying the mechanisms of TC recurvature is often hidden in the historical TC database. Therefore, the DM approach is suitable for unraveling from the TC database rules governing the recurvature of TCs throughout their movements.

In recent years, a number of DM algorithms have been employed to unravel TC tracks and intensities from TC databases (Harr and Elsberry 1991; Harr and Elsberry 1995; Gaffney and Smyth 1999; Camargo et al. 2004; Gaffney 2004; Camargo et al. 2007; Gaffney et al. 2007; Lee et al. 2007; Camargo et al. 2008; Cheng et al. 2008; Kossin et al. 2010). However, the emphasis of these works is largely placed on the application of clustering methods to the historical TC tracks. Additionally, decision-tree methods [e.g., classification and regression tree (CART) and the C4.5 algorithm] have seldom been employed to analyze TC recurvature. Tree-based algorithms (e.g., CART and C4.5) are widely used for classification because of their simplicity, interpretability, and adaptability in processing datasets with errors or missing values, as well as their ability to help unravel rules and regularities (Breiman et al. 1984; Quinlan 1993; Fayyad and Stolorz 1997). TC recurvature is treated as a binary classification problem (recurvature and nonrecurvature) in this study. The primary objective of this paper is to unravel rules governing TC recurvature through a decision-tree-based algorithm—the C4.5 algorithm—for classification (Quinlan 1993) from the macroscopic (e.g., the subtropical high, monsoon systems, and westerlies) and microscopic (i.e., the outer radius ambient wind fields of TCs) perspectives.

In what follows, the study area and data source are described in section 2. The basics of the C4.5 algorithm, a data-mining method, are introduced in section 3. In section 4, we give the analysis results and their interpretations. The paper is then concluded by a summary and discussion in section 5.

2. Study area and data source

The study area of this research covers the SCS and WNP. All of the TCs that occurred in these study areas from 2000 to 2009 are employed as a basis for rule discovery. The TC-related dataset consists of two major classes: TC best-track data and meteorological data.

The TC best-track data are provided by the Japan Meteorological Agency (JMA) Regional Specialized Meteorological Center Tokyo (RSMC Tokyo). These postanalysis best-track data consist of 6-hourly estimates of TC position (latitude and longitude), minimum central pressure (MCP), and 10-min maximum sustained wind speed for all TCs in the WNP basin, including the SCS, from 1951 to the present. There are 2552 TC sample points obtained from 102 TC tracks, 63 of which are recurving tracks and the remaining 39 of which are straight-moving tracks (Fig. 1). Because TC recurvature is influenced by the internal and external atmospheric conditions of TCs prior to recurvature, the TC observations after recurvature are excluded from the following analysis. It should be noted that the recurvature points of all 63 recurving TC tracks are over the ocean. In this study, TCs that recurved over land are excluded because the mechanisms for recurvature over the ocean and over land differ due to TC–land interaction and the cutting off of water vapor for postlandfall TCs. Thus, our TC sample set comprises recurvers over the ocean and nonrecurvers (straight movers). Recurvers are required to possess the following characteristics: 1) the TC is sustained for at least 72 h in its best track, 2) the recurvature point is the westernmost point (minimum longitude) of the entire track, and 3) the latitude of the point following the recurvature point is not less than the latitude of the recurvature point (i.e., no looping tracks or equatorward-recurving TCs). For the recurving TC tracks, the 1560 observations prior to recurvature are employed as samples and labeled class 0. For the straight-moving TC tracks, all 992 observations are labeled class 1.
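The three recurver characteristics listed above can be expressed as a simple screening function. The following Python sketch is illustrative only and is not the procedure used to build the sample set: the track representation (tuples of hours since the first fix, latitude in °N, and longitude in °E), the function name, and the extra guard that rejects tracks whose westernmost fix is the first or last fix are all assumptions introduced here.

```python
from typing import List, Optional, Tuple

# One best-track fix: (hours since first fix, latitude in deg N, longitude in deg E).
# This layout is a hypothetical convenience, not the RSMC Tokyo file format.
Fix = Tuple[float, float, float]


def find_recurvature_index(track: List[Fix],
                           min_duration_h: float = 72.0) -> Optional[int]:
    """Return the index of the candidate recurvature point, or None.

    Criteria taken from the text:
      1) the TC lasts at least 72 h in its best track;
      2) the recurvature point has the minimum longitude of the track;
      3) the next fix is not equatorward of the recurvature point.
    """
    if len(track) < 2:
        return None
    # Criterion 1: lifetime of at least min_duration_h hours.
    if track[-1][0] - track[0][0] < min_duration_h:
        return None
    # Criterion 2: westernmost fix of the whole track.
    idx = min(range(len(track)), key=lambda i: track[i][2])
    # Assumption (not in the text): a westernmost fix at either end of the
    # track means there is no westward-then-eastward turn to speak of.
    if idx == 0 or idx == len(track) - 1:
        return None
    # Criterion 3: no looping or equatorward turn right after the point.
    if track[idx + 1][1] < track[idx][1]:
        return None
    return idx
```

Under these assumptions, a track that moves steadily westward returns `None` and would be treated as a straight mover, whereas a track that reaches its westernmost fix mid-track and then moves poleward returns the index of that fix.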

Fig. 1.

The (a) straight-moving and (b) recurving TC tracks during 2000–09 used for analysis. The solid circles and curves are 6-h TC observations and tracks, respectively.

The meteorological variables (e.g., wind fields and geopotential height in different atmospheric layers) are derived from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) Final Analysis (FNL), available at 6-h intervals from 30 July 1999 to the present (Yang et al. 2006). These NCEP FNL Operational Global Analysis data are on continuous 1.0° × 1.0° grids. The GFS is run 4 times per day in near–real time at NCEP. The data are available at the surface and at 26 mandatory (and other pressure) levels from 1000 to 10 hPa. We employ geopotential height and the zonal u and meridional υ winds from the FNL dataset in this study. Although the NCEP–National Center for Atmospheric Research (NCAR) reanalysis data (Kalnay et al. 1996) are available for a longer time period, their 2.5° × 2.5° resolution is coarser than that of the NCEP FNL dataset. Therefore, we derive the meteorological variables from the NCEP FNL dataset rather than the NCEP–NCAR dataset. Because the higher-resolution NCEP FNL dataset is available only from mid-1999 onward, the TC samples are extracted only from 2000 to 2009.

3. Methodology

The C4.5 algorithm is employed to unravel rules for TC recurvature from the potential factors affecting this phenomenon. These factors are categorized into three groups: variables relating to large-scale circulation, variables measuring the circulation patterns surrounding TCs, and variables characterizing TCs (see Table 1). The variables in Table 1 are displayed in abbreviations. In the group “circulation surrounding TC,” uwnd_200 and vwnd_200 are, respectively, the average zonal and meridional winds of 6°–8° radial belts at the 200-hPa level. The other variables in the same group are defined likewise. The five variables chosen to measure the strength and position of the large-scale circulation (i.e., the subtropical high, EASM, westerlies) that largely control TC recurvature are area index (area_IndexSTH), intensity index (inten_IndexSTH), westward extension index of the subtropical high (west_extSTH) in the WNP (for measuring the strength of the subtropical high and the position of the subtropical high ridge), the EASM index in Wang and Fan (1999) (Monsoon_WF), as well as the westerly index (W_Westerly). In what follows, we first give a brief discussion of C4.5 and then examine these potential factors.

Table 1.

The three-group potential attributes influencing TC recurvature.

a. The decision-tree approach

1) Algorithm selection

A decision tree is a typical DM method for unraveling rules and selecting features from databases for decision making (Quinlan 1987). A decision tree is defined as a classification procedure that recursively partitions a dataset into smaller subdivisions based on a set of tests defined at each branch (or node) in the tree. The tree is composed of a root node (formed from all data), a set of internal nodes (splits), and a set of terminal nodes (leaves). Every node other than the root has exactly one parent node, and every nonterminal node has two or more descendant nodes. Within this framework, a dataset is classified by sequentially subdividing it according to the decision framework defined by the tree, and a class label is assigned to each observation according to the leaf node into which the observation falls (Friedl and Brodley 1997). Decision trees offer several advantages over traditional supervised classification procedures, such as maximum likelihood classification. In particular, decision trees are strictly nonparametric and do not require any assumption about the distributions of the input data. In addition, they can handle nonlinear relationships between features and classes, missing values, and numeric and categorical data (Hampson and Volper 1986; Fayyad and Irani 1992). Finally, decision trees are intuitive because the classification structure is explicit and interpretable (Friedl and Brodley 1997).
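To make the root, internal-node, and leaf terminology concrete, the following Python sketch represents a binary decision tree and assigns a class label by walking from the root to a leaf. The class names, the toy tree, and its thresholds are hypothetical illustrations; they are not rules discovered in this study. The attribute names are borrowed from Table 1, and the labels follow the class coding of section 2 (0 for recurving, 1 for straight moving).

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node carrying a class label."""
    label: int


@dataclass
class Split:
    """Internal node testing one numeric attribute against a threshold."""
    attribute: str
    threshold: float
    low: Union["Split", Leaf]    # branch taken when value <= threshold
    high: Union["Split", Leaf]   # branch taken when value > threshold


def classify(node: Union[Split, Leaf], sample: Dict[str, float]) -> int:
    """Follow the tests from the root until a leaf is reached."""
    while isinstance(node, Split):
        value = sample[node.attribute]
        node = node.low if value <= node.threshold else node.high
    return node.label


# Toy tree with made-up thresholds, for illustration only.
toy_tree = Split(
    attribute="uwnd_200", threshold=5.0,
    low=Split(attribute="west_extSTH", threshold=120.0,
              low=Leaf(label=1), high=Leaf(label=0)),
    high=Leaf(label=0),
)
```

Each root-to-leaf path in such a structure reads directly as an if-then rule, which is the property exploited when rules are extracted from a fitted tree.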

The CART model is a classic decision tree [proposed by Breiman et al. (1984)]. In CART, a tree-structured decision space is estimated by recursively splitting the data at each node based on a statistical test that increases the homogeneity of the training data in the resulting descendant nodes (Breiman et al. 1984). Other decision-tree algorithms, such as chi-square automatic interaction detection (CHAID), ID3, and C4.5 (Quinlan 1993), have also been proposed for classification. CHAID and ID3 have limitations when handling continuous variables, which must first be converted into categorical variables. C4.5, however, can handle continuous variables directly. Since the variables employed in this study (e.g., wind fields and indices for large-scale circulation) are mainly continuous, CART and the C4.5 algorithm are employed to classify recurving and nonrecurving TCs.

2) Algorithm description

CART is a nonparametric statistical methodology developed for the classification of categorical or continuous dependent variables. If the dependent variable is categorical, CART produces a classification tree. When the dependent variable is continuous, it produces a regression tree. In a classification and regression tree, the major goal is to produce an accurate set of data classifiers by uncovering the predictive structure of the problem under consideration (Breiman et al. 1984).

In brief, the construction of a CART classification tree centers on the definition of three major concepts. They are 1) the sample-splitting rule, 2) the goodness-of-split criteria, and 3) the criteria for choosing an optimal or final tree for analysis. CART builds trees by applying predefined splitting rules and goodness-of-split criteria at every step in the node-splitting process.

In a highly condensed form, the steps in the tree-building process involve 1) growing a large tree (one with a large number of nodes), 2) combining some of the branches of this large tree to generate a series of subtrees of different sizes (varying numbers of nodes), and 3) selecting an optimal tree via the application of “measures of accuracy of a tree” (Breiman et al. 1984; Tsoi and Pearson 1990; Yohannes and Hoddinott 1999).

In CART, the variable-splitting criteria for nominal targets are the Gini impurity criterion and the Twoing criterion. The two-class problem is simply a special case of the multiclass problem. The impurity function Φ is a function defined on the set of all J-tuples of numbers (p1, p2, p3, … , pJ) satisfying pj ≥ 0, j = 1, 2, … , J, and p1 + p2 + … + pJ = 1, with the following properties:

  • Φ is maximum only at the point (1/J, 1/J, … , 1/J);

  • Φ achieves its minimum only at the points (1, 0, … , 0), (0, 1, … , 0), … , (0, 0, … , 1); and

  • Φ is a symmetric function of p1, p2, p3, … , pJ.

Given a node t with estimated class probabilities p(j | t) representing the likelihood this node belongs to class j, j = 1, … , J, a measure of node impurity given t is defined by

 
$$ i(t) = \Phi\big(p(1 \mid t),\, p(2 \mid t),\, \ldots,\, p(J \mid t)\big), $$

where Φ is an impurity function and the search is made for the optimal split that most effectively reduces the tree impurity to the greatest extent. The impurity function is conventionally selected as

 
formula

Adopting the Gini diversity index, it takes on the form

 
$$ i(t) = \sum_{j \neq i} p(j \mid t)\, p(i \mid t) = 1 - \sum_{j=1}^{J} p^{2}(j \mid t). $$

In the two-class problem, the index is reduced to

 
$$ i(t) = 2\, p(1 \mid t)\, p(2 \mid t). $$

The Gini index has an interesting interpretation. Instead of using the plurality rule to classify objects at a node t, the rule that assigns an object selected at random from the node to class i with probability p(i | t) is employed. The estimated probability that the item is actually in class j is p(j | t). Therefore, the estimated probability of misclassification under this rule is the Gini index:

 
$$ \sum_{j \neq i} p(i \mid t)\, p(j \mid t). $$

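It has been proven that, for any split s that sends a proportion pL of the cases in node t to the left descendant tL and a proportion pR to the right descendant tR, the decrease in impurity Δi(s, t) = i(t) − pL i(tL) − pR i(tR) satisfies Δi(s, t) ≥ 0. Because the impurity function is strictly concave (Breiman et al. 1984), Δi(s, t) = 0 only if p(j | tL) = p(j | tR) = p(j | t), j = 1, … , J.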
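As a numerical illustration of the Gini index and of the impurity decrease Δi(s, t), the following Python sketch estimates p(j | t) by the class frequencies in a node and evaluates a candidate binary split. It is a minimal sketch, not the CART implementation used in this study; the function names are introduced here for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini index 1 - sum_j p(j|t)^2, with p(j|t) estimated by class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[Hashable],
                      left: Sequence[Hashable],
                      right: Sequence[Hashable]) -> float:
    """Decrease in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    p_left = len(left) / n      # proportion of cases sent to the left child
    p_right = len(right) / n    # proportion of cases sent to the right child
    return gini(parent) - p_left * gini(left) - p_right * gini(right)
```

For a node holding two samples of each class, a split that separates the classes perfectly yields a decrease of 0.5, whereas a split whose descendants both keep the parent's class proportions yields a decrease of 0, in line with the equality condition stated above.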

As another powerful classification algorithm, the C4.5 algorithm (Quinlan 1993) is also a supervised learning method based on decision-tree induction. The fundamental strategy is to select an attribute that will best separate samples into individual classes by a measurement. Here, the measurement is the information gain ratio, based on the information-theoretic concept of “entropy.” The primary objective is to find the minimum information required to maintain the least “impurity” of the partitions (Han and Kamber 2006).

Let S be the training set consisting of s data samples and s(Ci) be the number of records in S that belong to class Ci (for i = 1, 2, … , m). The information (entropy) needed to classify S is

 
$$ \mathrm{Info}(S) = -\sum_{i=1}^{m} \frac{s(C_i)}{s} \log_2 \frac{s(C_i)}{s}. $$

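Hence, the amount of information still needed to classify S after it is partitioned into {S1, S2, … , Sv} by attribute A (v is the number of distinct values of attribute A, and |Sj| is the number of samples in subset Sj) is

 
$$ \mathrm{Info}_A(S) = \sum_{j=1}^{v} \frac{|S_j|}{s}\, \mathrm{Info}(S_j), $$

where Info(Sj) denotes the entropy of subset Sj.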

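The information gain of attribute A, that is, the reduction in entropy obtained by partitioning S on A, is computed as

 
$$ \mathrm{Gain}(A) = \mathrm{Info}(S) - \mathrm{Info}_A(S), $$

and the gain ratio used by C4.5 to select the splitting attribute normalizes this gain by the split information:

 
$$ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}, $$

where

 
$$ \mathrm{SplitInfo}(A) = -\sum_{j=1}^{v} \frac{|S_j|}{s} \log_2 \frac{|S_j|}{s}. $$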
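The entropy-based quantities used by C4.5 can be illustrated with a short Python sketch. It assumes the standard definitions of information gain and gain ratio (Quinlan 1993) and takes a candidate partition of the class labels as given; the function names are hypothetical, and the sketch omits the threshold search and pruning that a full C4.5 implementation performs.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Information (in bits) needed to classify a sample drawn from `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels: Sequence[Hashable],
               partitions: List[Sequence[Hashable]]) -> float:
    """Gain ratio of a partition of `labels` induced by some attribute."""
    n = len(labels)
    parts = [p for p in partitions if len(p) > 0]
    # Expected entropy remaining after the partition.
    info_after = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(labels) - info_after
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in parts)
    return gain / split_info if split_info > 0 else 0.0
```

For a continuous attribute, the two partitions would be the samples at or below and above a candidate threshold, and the threshold with the best score would be retained.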

In contrast to other classification algorithms, decision trees (e.g., C4.5 algorithm) have the following advantages (Breiman et al. 1984; Quinlan 1987, 1993; Tsoi and Pearson 1990; Rokach and Maimon 2005):

  1. decision trees are self-explanatory and the logic flow is easy to follow, particularly when they are converted into a set of rules;

  2. decision trees can handle both nominal and numeric attributes;

  3. decision trees are rich enough to represent any discrete-value classifier;

  4. decision trees are capable of handling datasets containing errors; and

  5. decision trees are capable of handling datasets containing missing values.

Decision trees also have some disadvantages, which are summarized as follows (Breiman et al. 1984; Quinlan 1987, 1993; Tsoi and Pearson 1990; Rokach and Maimon 2005). First, most decision-tree algorithms (e.g., ID3 and C4.5) require that the target attribute have discrete or discretized values. Second, because decision trees use the “divide and conquer” method, they tend to perform well when a few highly relevant attributes exist, but they are less effective when many complex interactions are present. Third, the greedy approach may make decision trees sensitive to the training set, irrelevant attributes, and noise.

3) Implementation of the C4.5 algorithm

The C4.5 algorithm is implemented in Weka 3.6.2 (a collection of machine learning algorithms for DM tasks). The algorithms in Weka can either be applied directly to a dataset or called from the user-defined Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization. It is also suitable for the development of new machine learning schemes (the software is open source and is available online at http://www.cs.waikato.ac.nz/ml/weka/index.html). To lower the complexity of the classification tree, the minimum leaf size is set at 50 because the smaller the minimum leaf size is, the more complicated the classification tree becomes.

The objective of cross validation is to evaluate the generalization capability of an algorithm. Cross validation is a statistical method for evaluating learning algorithms by dividing data into learning and testing sets (Stone 1974; Chen et al. 2002; Kantardzic 2003; Han and Kamber 2006). In typical cross validation, the training and validation datasets must cross over in subsequent rounds such that each data point has a chance to be validated. The fundamental cross-validation method is k-fold cross validation. In k-fold cross validation, the whole dataset is first partitioned into k equal-size segments or folds. Then k iterations of training and validation are performed such that, within each iteration, a different fold (segment) of the data is used for validation whereas the remaining k – 1 folds are used for learning. Tenfold cross validation is used for verification in this study. Other parameters are used according to their default settings. The settings are as follows: binarySplits is true, confidenceFactor is 0.25, debug is false, numFolds is 3, reducedErrorPruning is true, saveInstanceData is false, seed is 1, subtreeRaising is true, unpruned is false, and useLaplace is false.
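The fold rotation described above can be sketched in a few lines of Python. This is a generic, unstratified illustration using only the standard library and is not the Weka procedure applied in this study; the function name and the shuffling seed are assumptions, and folds differ in size by at most one sample when the dataset size is not a multiple of k.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, validation indices) for each of the k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation
```

With the 2552 samples of this study and k = 10, each iteration would train on roughly 2297 samples and validate on the remaining 255 or 256, and every sample would be used for validation exactly once.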

b. Indices for large-scale circulation

A number of indices have been proposed to measure the position and intensity of large-scale circulation (e.g., monsoon systems, subtropical high, and midlatitude westerlies) in the SCS and WNP. The indices depicting large-scale circulation belong to potential parameters that influence TC recurvature.

In Wang et al. (2008), 25 existing EASM indices were examined in terms of two observed major modes of interannual variation in the precipitation and circulation anomalies from 1979 to 2006. They ultimately recommended an index, the reversed Wang and Fan index (Monsoon_WF; Wang and Fan 1999) which is virtually identical to the leading principal component of the EASM and greatly facilitates real-time monitoring. The Monsoon_WF index belongs to the shear vorticity index and was first proposed to quantify the variability of the WNP summer monsoon. This index is defined by uwnd_850 in 5°–15°N, 90°–130°E minus uwnd_850 in 22.5°–32.5°N, 110°–140°E, where uwnd_850 denotes the 850-hPa zonal wind. Physically, Monsoon_WF reflects the variations in both the WNP monsoon trough and the subtropical high, which are the key elements of the EASM circulation system (Tao and Chen 1987). The monsoon index in this study (Wang and Fan 1999) is negatively correlated with the strength of the monsoon. Therefore, a low monsoon index (here <−1.732) refers to the relatively high intensity of the summer monsoon. Given the existing findings on monsoon–TC interactions (Lander 1996; Elsberry 2004; Harr and Chan 2004; Chen et al. 2009), a strong summer monsoon in the SCS and WNP tends to provide northward and northeastward components for TC movement. Therefore, the strong monsoon plays a positive role in TC recurvature here. Thus, Monsoon_WF is chosen to measure the status of EASM, which exerts significant influences on TC recurvature (Lander 1996; Elsberry 2004; Harr and Chan 2004; Ko and Hsu 2006; Chen et al. 2009).
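The definition of Monsoon_WF as a difference of two area-averaged 850-hPa zonal winds can be sketched as follows. The grid layout (a list of rows indexed by latitude, then longitude), the unweighted area mean, and the function names are assumptions made for illustration; the sign convention simply follows the definition quoted in the text (southern box minus northern box).

```python
from typing import List, Sequence, Tuple


def box_mean(field: List[List[float]], lats: Sequence[float],
             lons: Sequence[float], lat_range: Tuple[float, float],
             lon_range: Tuple[float, float]) -> float:
    """Unweighted mean of field[i][j] over grid points inside a lat-lon box."""
    values = [field[i][j]
              for i, lat in enumerate(lats)
              for j, lon in enumerate(lons)
              if lat_range[0] <= lat <= lat_range[1]
              and lon_range[0] <= lon <= lon_range[1]]
    if not values:
        raise ValueError("no grid points fall inside the requested box")
    return sum(values) / len(values)


def monsoon_wf(u850: List[List[float]], lats: Sequence[float],
               lons: Sequence[float]) -> float:
    """u850 averaged over 5-15N, 90-130E minus u850 over 22.5-32.5N, 110-140E."""
    south = box_mean(u850, lats, lons, (5.0, 15.0), (90.0, 130.0))
    north = box_mean(u850, lats, lons, (22.5, 32.5), (110.0, 140.0))
    return south - north
```

A latitude-weighted mean (weights proportional to the cosine of latitude) would be a natural refinement on a regular latitude–longitude grid; it is omitted here to keep the sketch short.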

The WNPSH indices announced by NCC have been used in previous studies (Chen et al. 2001). The NCC defined indices to describe the status of the WNPSH. These indices include the WNPSH intensity index, westward extension index, and area index according to the mean 500-hPa geopotential heights in the weather charts published by the China Meteorological Administration (Chen 1999). By considering the preexisting subtropical high indices defined by NCC and other scholars (Lu 2001; Lu et al. 2007; Sui et al. 2007), we define the WNPSH indices as follows: the intensity index is the average geopotential height of the grid points with a geopotential height larger than 5870 gpm in the region (10°–60°N, 100°E–180°); the area index is the number of grid points with a geopotential height larger than 5870 gpm in this region; and the westward extension index is the longitude at the western edge of the 5870-gpm contour. This definition of the westward extension index is in line with that of NCC. The three indices are extracted and calculated using the FNL analysis data.
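The three WNPSH indices defined above can be computed from a gridded 500-hPa geopotential height field along the following lines. This Python sketch is an illustration under stated assumptions: the field is a list of rows indexed by latitude and then longitude (in °E), the western edge of the 5870-gpm contour is approximated by the westernmost grid point exceeding the threshold, and the function and key names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def wnpsh_indices(z500: List[List[float]], lats: Sequence[float],
                  lons: Sequence[float], threshold: float = 5870.0,
                  lat_range: Tuple[float, float] = (10.0, 60.0),
                  lon_range: Tuple[float, float] = (100.0, 180.0)
                  ) -> Dict[str, Optional[float]]:
    """Area, intensity, and westward extension indices of the subtropical high."""
    hits = []  # (longitude, height) of grid points above the threshold
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            inside = (lat_range[0] <= lat <= lat_range[1]
                      and lon_range[0] <= lon <= lon_range[1])
            if inside and z500[i][j] > threshold:
                hits.append((lon, z500[i][j]))
    if not hits:
        # No grid point exceeds the threshold: intensity and extension undefined.
        return {"area": 0, "intensity": None, "west_ext": None}
    return {
        "area": len(hits),                                   # number of grid points
        "intensity": sum(z for _, z in hits) / len(hits),    # mean height (gpm)
        "west_ext": min(lon for lon, _ in hits),             # westernmost longitude
    }
```

How the case of no grid point exceeding 5870 gpm is handled is a design choice of this sketch; here the area index is set to zero and the other two indices are left undefined.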

Rossby (1939) defined a typical westerly index derived from the 500-hPa geopotential height: IR = H35 − H55. It should be noted that IR is the westerly index defined by Rossby, and H35 and H55 represent the 500-hPa geopotential heights at 35° and 55°N, respectively. Here, we calculate the IR from 100°E to 180° to measure the midlatitude westerlies, and the larger the IR, the stronger the westerly.
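The westerly index IR = H35 − H55 restricted to 100°E–180° can be sketched as follows. The sketch assumes that H35 and H55 are zonal means of the 500-hPa geopotential height over the stated longitude range and that the grid contains rows at or near 35° and 55°N; the function name and grid layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def rossby_westerly_index(z500: List[List[float]], lats: Sequence[float],
                          lons: Sequence[float],
                          lon_range: Tuple[float, float] = (100.0, 180.0)
                          ) -> float:
    """Mean 500-hPa height at 35N minus mean height at 55N over lon_range."""
    def nearest_row(target: float) -> int:
        # Index of the grid latitude closest to the target latitude.
        return min(range(len(lats)), key=lambda i: abs(lats[i] - target))

    cols = [j for j, lon in enumerate(lons)
            if lon_range[0] <= lon <= lon_range[1]]
    if not cols:
        raise ValueError("no grid longitudes fall inside lon_range")
    row35, row55 = nearest_row(35.0), nearest_row(55.0)
    h35 = sum(z500[row35][j] for j in cols) / len(cols)
    h55 = sum(z500[row55][j] for j in cols) / len(cols)
    return h35 - h55
```

A larger value corresponds to a stronger meridional height gradient and hence, through geostrophic balance, to stronger midlatitude westerlies.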

c. Outer radius flow surrounding TCs

A variety of indices depicting large-scale circulation patterns have been presented in the previous subsection. Apart from these large-scale synoptic systems surrounding TCs, the outer radius (e.g., 5°–7°, 6°–8°) ambient wind fields around a TC center also play a significant role in determining whether the TC will recurve. Numerous radial belts have been proposed to investigate ambient wind fields in recent decades; for example, the 5°–7° radial belt from the TC center has been identified as the best region for representing the wind fields surrounding TCs (Chan et al. 1980; Chan 1984). Based on their rawinsonde database, Hodanish and Gray (1993) found that octants 1–3 in the 6°–8° radial belt from a TC center are the best region for the wind fields. Fitzpatrick (1992) verified that the upper-tropospheric zonal winds at a 6° radius to the northwest of a TC are probably a crucial factor in determining whether the TC will recurve or remain on a west-northwest course. His results also indicated that the composite deep-layer mean (850–300 hPa) wind fields averaged within a radius of 5°–7° from the TC center provide a significant component of the steering flow for TCs (Fitzpatrick 1992). To choose the most appropriate outer radius for calculating the wind fields, radial belts of 4°–6°, 5°–7°, 6°–8°, and 7°–9°, as well as octants 1–3 in these belts, are tested for classification accuracy.
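Averaging a wind component over a radial belt around the TC center can be sketched as follows. This Python illustration assumes that the radius is measured as great-circle angular distance in degrees and that all grid points in the belt are weighted equally; the restriction to particular octants is omitted, and the function names and grid layout are hypothetical.

```python
import math
from typing import List, Sequence


def angular_distance_deg(lat1: float, lon1: float,
                         lat2: float, lon2: float) -> float:
    """Great-circle angular distance between two points, in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def belt_mean(field: List[List[float]], lats: Sequence[float],
              lons: Sequence[float], center_lat: float, center_lon: float,
              r_inner: float = 6.0, r_outer: float = 8.0) -> float:
    """Mean of field[i][j] over grid points r_inner..r_outer degrees from the center."""
    values = [field[i][j]
              for i, lat in enumerate(lats)
              for j, lon in enumerate(lons)
              if r_inner <= angular_distance_deg(center_lat, center_lon,
                                                 lat, lon) <= r_outer]
    if not values:
        raise ValueError("no grid points fall inside the radial belt")
    return sum(values) / len(values)
```

Applied to the 200-hPa zonal wind on the 1.0° × 1.0° grid with the default 6°–8° belt, such a function would give a quantity analogous to uwnd_200; changing `r_inner` and `r_outer` gives the other belts that are compared.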

d. Variables characterizing TCs

As a result of the atmospheric characteristics of different places, TC movements are largely related to the positions in which the TCs are located. The latitude and longitude of a TC center are therefore potential factors influencing TC recurvature. Several studies claimed that TC recurvature is associated with intensity (Riehl 1972; Evans and McKinley 1998; Knaff 2009). TC intensity is related to the depth of the best deep-layer mean, which largely suggests the direction of TC movement (Dong and Neumann 1986). The latitude, longitude, and intensity (here central pressure in hectopascals) of a TC center are thus chosen as potential parameters for mining rules governing TC recurvature.

To choose the most appropriate radial belt and algorithm, the C4.5 algorithm and CART are compared with respect to their classification performances. The first column in Table 2 represents the widths of the radial circles within which the environmental flows (e.g., the 200- and 500-hPa zonal wind fields) are averaged. The second and third columns indicate the classification accuracy by CART when the minimum leaf sizes are 20 and 50, respectively. The fourth and fifth columns show the classification accuracy of the C4.5 algorithm when the leaf sizes are 20 and 50, respectively. It should be noted that classification accuracy is calculated by inputting all the variables described in Table 1. From Table 2, the radial belt (6°–8°) outperforms the others and the C4.5 algorithm performs better than CART when the other parameters are the same. Therefore, the 6°–8° radial belt is employed to measure the ambient wind fields and the C4.5 algorithm is used to analyze the potential parameters.

Table 2.

The classifying accuracy by the CART and C4.5 algorithms based on different radial belts. The boldface numbers represent the row with maximum classifying accuracy by the C4.5 algorithm. Table entries are in percent.


4. Analysis results and interpretation

a. Results

The results indicate that latitude (lat), longitude (lon), pressure, uwnd_200, uwnd_1000, vwnd_800, vwnd_850, vwnd_1000, area_IndexSTH, west_extSTH, and Monsoon_WF are chosen by the C4.5 algorithm to build the decision tree, stipulating 18 unraveled rules governing TC recurvature (see Fig. 2 and Table 3). Among these variables, longitude is chosen for the splitting of the root node. For the two resultant child nodes of the root node, central pressure (a measurement of TC intensity) is the splitting variable. The average accuracy of TC recurvature prediction by the C4.5 algorithm is 84.364% on the basis of the dependent TC samples. In Fig. 2, 1 means recurvature and 0 means nonrecurvature. The rectangles are leaf nodes, whereas the ellipses or circles are parent nodes. A path from the root node to a leaf node represents a rule that can be used as a reference for TC recurvature prediction.

Fig. 2.

The classification tree governing TC recurvature as constructed by the C4.5 algorithm.


Table 3.

The 18 rules governing TC recurvature.


Taking leaf node 0(263.0/12.0) as an example, the 0 before the parenthesis means nonrecurvature and 263.0 and 12.0 indicate that, among the 263 samples of the leaf node, there are 251 (263 − 12) nonrecurvature samples and 12 recurvature samples, respectively. The first column in Table 3 describes the rules unraveled. The second column depicts the attributes contained in each rule. The third column shows the classification accuracy of each rule. The accuracy is calculated by dividing the number of samples correctly classified by the total number of samples in the leaf node. From the third column, the highest classification accuracy is 0.977 whereas the lowest is 0.530. It is apparent that the longitude of a TC center appears in every rule as it is the first attribute selected by the C4.5 algorithm. For substantiation, interpretations of some of these rules are made in the following subsection.
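The leaf-node notation and the per-rule accuracy described above can be made concrete with a short sketch. The function name and the regular expression are assumptions introduced for this example; only the label format and the accuracy definition come from the text.

```python
import re


def parse_leaf(label):
    """Split a leaf label such as '0(263.0/12.0)' into
    (predicted_class, total_samples, misclassified, accuracy)."""
    m = re.fullmatch(r"\s*([01])\s*\(\s*([\d.]+)\s*/\s*([\d.]+)\s*\)\s*", label)
    if m is None:
        raise ValueError(f"unrecognized leaf label: {label!r}")
    predicted = int(m.group(1))   # 1 = recurvature, 0 = nonrecurvature
    total = float(m.group(2))     # samples reaching the leaf
    wrong = float(m.group(3))     # samples of the other class
    return predicted, total, wrong, (total - wrong) / total


predicted, total, wrong, accuracy = parse_leaf("0(263.0/12.0)")
print(predicted, total - wrong, wrong)          # class, correct, misclassified
print(round(parse_leaf("1(129.0/3.0)")[3], 3))  # accuracy of leaf 1(129.0/3.0)
```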

Previous investigations have shown that some synoptic patterns can identify TC recurvature. For example, TCs under the influence of a weak subtropical high and southward westerly trough tend to recurve, whereas a strong and westward subtropical high causes TCs to move westward or northwestward (Riehl and Shafer 1944; George and Gray 1977; Holland 1984; Elsberry 1990; Evans et al. 1991). In fact, the tree built by the C4.5 algorithm (Fig. 2) contains not only empirical knowledge based on these existing synoptic patterns, but also the rules derived from characteristics such as central pressure, latitude, longitude, and the indices for depicting large-scale circulation.

b. Results verification and interpretation

Verification and interpretation, collectively called pattern evaluation (Han and Kamber 2006), are essential components of data mining. In addition to cross validation in the training and classification step, the decision tree in Fig. 2 is further verified using the RSMC–JMA TC best-track data in 2010 to see whether or not it can correctly classify the newly recurving and nonrecurving TCs.

From the 14 TCs occurring in 2010, seven TC tracks with 134 observations were selected for verification. The requirements for selecting the testing dataset are consistent with those for selecting training datasets, as discussed in section 3; that is, 1) the TC is sustained for at least 72 h, 2) the recurvature point has the minimum longitude, and 3) the latitude of the subsequent point cannot be less than the latitude of the recurvature point. Of the seven selected tracks, four are recurvers and three are straight movers. The fundamental information about the test TC tracks is shown in Table 4. The total classification accuracy of the decision tree is 80.597%, which is only slightly lower than the 84.364% training accuracy. Table 5 shows the confusion matrix of verification. All 66 nonrecurvature TC points are correctly classified, whereas 26 TC points of recurving TCs are classified as the nonrecurvature class. It can be inferred that the straight movers are easier to predict than recurvers.
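The verification accuracy can be reproduced from the counts quoted above. In the sketch below, the number of correctly classified recurvature points (42) is not stated in the text; it is derived as 134 − 66 − 26 under the assumption that all 134 observations appear in the confusion matrix. The dictionary layout and function name are likewise assumptions.

```python
def overall_accuracy(confusion):
    """Fraction of correctly classified points; confusion[actual][predicted]."""
    correct = sum(confusion[c][c] for c in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


confusion = {
    "nonrecurve": {"nonrecurve": 66, "recurve": 0},
    # 42 is derived (134 - 66 - 26), not quoted from the text.
    "recurve": {"nonrecurve": 26, "recurve": 134 - 66 - 26},
}

print(round(100 * overall_accuracy(confusion), 3))  # percent correct
```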

Table 4.

Selected TC tracks in 2010 for verification.

Table 5.

Confusion matrix of the verification.


TC tracks are largely determined by the steering flow, especially the 500-hPa (midlevel troposphere) environmental flow (Holland 1983; Chan 1985, 2005; Holland and Lander 1993; Ngan and Chan 1995). It has been found that the steering level is greatly influenced by the intensity of the TC (Velden and Leslie 1991). The stronger the TC, the deeper the steering level (Holland 1993). The deep-layer mean flow is, in general, the pressure-weighted flow averaged from the 850-hPa layer to the 300-hPa layer (Holland 1984, 1993). Therefore, the deep-layer mean flow determines the TC movement to a large extent (Chan and Gray 1982; Elsberry and School 1987; Carr and Elsberry 1990; Harr and Elsberry 1991). It has been reckoned that deep-layer means (e.g., 850–300 hPa) are most reliable for forecasting (Holland 1993). Therefore, the decision tree with unraveled rules is interpreted from the perspective of the deep-layer mean. Each leaf node has a certain number of samples. The composite deep-layer mean of the samples in a leaf node is obtained within the 60° × 60° latitude–longitude square anchored at the centers of the TC samples. The deep-layer mean flow indicates the steering flow around the TC center. The direction of the steering flow can thus affect whether a TC recurves or not. It should be noted that some of the conditions of a rule can be combined to avoid redundancy. For example, the conditions lon ≤ 146°E and lon ≤ 130°E can be combined to be lon ≤ 130°E in a single rule. However, this combination will leave out the original conditions before lon ≤ 146°E becomes effective in the rule.
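For reference, a pressure-weighted deep-layer mean of a single wind component can be sketched as below. The text does not specify the weighting scheme or the pressure levels used, so the trapezoidal integration over pressure and the sample levels are assumptions of this example, as is the function name.

```python
def deep_layer_mean(levels_hpa, winds):
    """Pressure-weighted mean of a wind component between the lowest and
    highest supplied levels, using trapezoidal integration over pressure.
    Requires at least two distinct levels."""
    pairs = sorted(zip(levels_hpa, winds), reverse=True)  # 850 hPa first
    integral = 0.0
    for (p_low, w_low), (p_high, w_high) in zip(pairs, pairs[1:]):
        integral += 0.5 * (w_low + w_high) * (p_low - p_high)
    depth = pairs[0][0] - pairs[-1][0]  # e.g., 850 - 300 = 550 hPa
    return integral / depth


# Made-up profiles, for illustration only.
levels = [850.0, 700.0, 500.0, 400.0, 300.0]
print(deep_layer_mean(levels, [5.0, 5.0, 5.0, 5.0, 5.0]))  # uniform wind
print(deep_layer_mean([850.0, 300.0], [2.0, 6.0]))         # linear profile
```

Applying the same function to the zonal and meridional components at each grid point, and then averaging over the samples of a leaf node, would give a composite field of the kind shown in the figures.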

The rule with the highest classification accuracy (0.977) is generated by the leaf node 1(129.0/3.0). It can be described as follows: If the longitude of the TC center is to the east of 130°E, the central pressure of the TC center (TC intensity) is smaller than 1006 hPa, the area_Index of the subtropical high is smaller than 314, the average zonal wind within the 6°–8° radius in the 200-hPa layer is >−3.57 m s−1, the western edge of the subtropical high shifts to the west of 133°E, and the longitude of the TC center is to the east of 146°E, then the TC will recurve.

As observed in Fig. 3, the subtropical high is located to the north of the TC center. This is a normal situation because the TC lies to the east of 146°E, and the subtropical high always extends to the west of that longitude. If the TC moves westward by 20° (remaining east of 126°E), it will lie to the west of the subtropical high and will be steered toward recurvature by the northeastward wind in the western part of the subtropical high. This explains why a subtropical high sitting to the north of a TC center can still lead to TC recurvature. It also suggests that, despite a strong subtropical high north of a TC, this rule can still predict recurvature based on variables such as longitude, upper-tropospheric wind, and central pressure (TC intensity).

Fig. 3.

The composite deep-layer mean wind fields (m s−1) and geopotential heights (gpm) in the 500-hPa layer of the samples in leaf node 1(129.0/3.0). The plot is relative to the center of the composite TC [at coordinates (0, 0) with the typhoon symbol]. The thick 5870-gpm contour of geopotential height indicates the center of the subtropical high.


The rule with the lowest classification accuracy (0.530) among all the rules is generated by leaf node 0(87.0/41.0), which can be described as follows: If the longitude of the TC center lies to the east of 130°E, the central pressure of the TC center (TC intensity) is smaller than 1006 hPa, the area_Index of the subtropical high is larger than 314, the Monsoon_WF index is smaller than 6.241, and the latitude of the TC center is to the south of 16°N, then the TC will not recurve. This leaf node contains 87 samples, 41 of which belong to recurving TCs. The samples belonging to recurving TCs have an eastward-shift longitude (>130°E) and relatively equatorward latitude (≤16°N). With the area_Index of the subtropical high being larger than 314, these TC samples are subject to a strong steering flow caused by the subtropical high. It should be noted that the TC samples of recurving TCs in this node are mainly in the very early part of their life span when the large-scale circulation and circulation surrounding TCs are characterized by nonrecurving TCs owing to direct influence of a strong subtropical high. This helps us to understand why 41 TC samples of recurving TCs are misclassified as nonrecurving TCs.

Figure 4 illustrates that the center of the subtropical high shifts to the west of the TC center by more than 30°. The TCs tend to be steered westward under the steering current in the southern part of the subtropical high. Therefore, they will move straight under the direct influence of the steering flow.

Fig. 4.

As in Fig. 3, but for leaf node 0(87.0/41.0).


Taking the rule formed by the path from the root node to the leaf node 1(617.0/26.0) as another example, it can be stated as follows: If the longitude of the TC center is to the east of 130°E, the central pressure is less than 1006 hPa, the number of grids within the 5870-gpm contour of the 500-hPa geopotential heights is less than 314, the 200-hPa zonal wind averaged within the 6°–8° radial belt is mainly eastward, and the western edge of the subtropical high is to the east of 133°E, then the TC will recurve. According to this rule, TCs will recurve under the influence of moderate midlatitude westerlies and a weak and retreating subtropical high. The composite 500-hPa geopotential heights and deep-layer mean wind fields derived from the TC samples belonging to this rule are in agreement with this rule (Fig. 5). The 5870-gpm contours in Fig. 5 illustrate the status of the subtropical high. We can observe that the subtropical high retreats to the east of the TC, and the steering flow surrounding the TCs, illustrated by the deep-layer mean from the 850–300-hPa layer, tends to turn to the north and northeast (George and Gray 1977; Holland 1984, 1993). As a result, this steering flow surrounding the TC center leads the TC to recurve.

Fig. 5.

As in Fig. 3, but for leaf node 1(617.0/26.0).


The rule derived from leaf node 1(120.0/11.0) is stated as follows: If the longitude of the TC center is to the east of 130°E, the central pressure of the TC center (TC intensity) is smaller than 1006 hPa, the area_Index of the subtropical high is smaller than 314, the average zonal wind within the 6°–8° radius in the 200-hPa layer is >−3.57 m s−1, the western edge of the subtropical high shifts to the west of 133°E, the longitude of the TC center is to the west of 146°E, and the monsoon index (Wang and Fan 1999) is <−1.732 (strong summer monsoon), then the TC will recurve. With regard to this rule, the relatively weak subtropical high, the relatively high zonal wind in the 200-hPa layer, and the strong summer monsoon will cause the TC to recurve. The composite deep-layer mean is plotted in Fig. 6a. Although the subtropical high ridge shifts westward beyond the TC center, the TC still tends to recurve because of the strong northward component of the monsoon to the south and east of this TC (depicted by the wind fields in the 850-hPa layer; see Fig. 6b) and its own longitude (130°E < lon ≤ 146°E).

Fig. 6.

The composite wind fields (m s−1) and geopotential heights (gpm) in the (a) 500- and (b) 850-hPa layers of the samples in leaf node 1(120.0/11.0). The thick contours in (a) indicate the center of the subtropical high.


The rule derived from leaf node 0(286.0/20.0) can be stated as follows: If a TC moves to the west of 130°E, the central pressure of the TC center is smaller than 996 hPa, the western edge of the subtropical high shifts to the west of 127°E, and the longitude of the TC center is to the west of 123°E, then the TC will move straight. This rule involves the longitude of the TC center and the status of the subtropical high. The longitude at 123°E is quite close to the Chinese coast. If the subtropical high is poleward of a TC, such steering flow more likely causes the TC to move straight. For this rule, the steering flow is easterly and the subtropical high shifts west of the TC center (depicted by the typhoon symbol) (Fig. 7). The TC will therefore move westward or northwestward under the steering of the prevalent easterly in the southern part of the subtropical high under such large-scale circulation.

Fig. 7.

As in Fig. 3, but for leaf node 0(286.0/20.0).


The rule obtained from leaf node 0(74.0/18.0) is stated as follows: If a TC is located to the east of 130°E, the pressure of the TC center is <1006 hPa, the area index of the subtropical high is larger than 314, and the monsoon index is larger than 6.241 (a weak monsoon), then the TC will not recurve. The conditions of this rule indicate a strong subtropical high, a weak monsoon, and moderate TC intensity. The composite deep-layer mean wind and 500-hPa geopotential height are overlaid in Fig. 8. In Fig. 8, the subtropical high is significantly stronger and situated to the north of the TC center. The steering flow surrounding the TC is largely westward. Therefore, the TC tends to move westward under the influence of the strong subtropical high and the weak monsoon in this rule (figure not shown).

Fig. 8.

As in Fig. 3, but for leaf node 0(74.0/18.0).


The rule derived from leaf node 1(270.0/54.0) is stated as follows: If a TC moves to the west of 130°E, the central pressure of the TC is <996 hPa, the western edge of the subtropical high is to the east of 127°E, the average 1000-hPa zonal wind within the 6°–8° radius from the TC center is <0.898 m s−1, and the average meridional wind within the 6°–8° radius from the TC center in the 1000-hPa layer is <3.624 m s−1, then the TC will recurve. This rule includes the low-level wind fields (i.e., the 1000-hPa layer), which means that the low-level zonal and meridional winds also influence TC movement to some degree. The rule indicates that the western edge of the subtropical high retreats and the low-level winds are weak. We can observe that the subtropical high ridge shifts to the east of 127°E and the deep-layer mean is weak (Fig. 9). The TC moves northward or northeastward along the western part of the subtropical high and eventually recurves according to existing theories on steering flow (Chan 1985; Velden and Leslie 1991; Ngan and Chan 1995).

Fig. 9.

As in Fig. 3, but for leaf node 1(270.0/54.0).


The rule obtained from leaf node 0(110.0/22.0) can be stated as follows: If the longitude of the TC center is to the west of 130°E, the central pressure of the TC center is <996 hPa, the western edge of the subtropical high ridge shifts to the west of 127°E, the TC is located to the east of 123°E, and the monsoon index is larger than −5.168 (a relatively weak monsoon), then the TC will not recurve. This rule indicates that if a TC is located to the east of 123°E, the subtropical high extends westward, and the monsoon is relatively weak, then the TC will move westward or northwestward because of the strong steering flow (see Fig. 10).

Fig. 10.

As in Fig. 3, but for leaf node 0(110.0/22.0).


The conditions of the recurving leaf node 1(74.0/24.0), derived from the same binary split, are similar to those for node 0(110.0/22.0) except that the monsoon index is <−5.168. Such a monsoon index suggests that leaf node 1(74.0/24.0) corresponds to a strong monsoon flow. We use the 850-hPa geopotential height and wind fields to indicate the monsoon flow. Because of the strong northward component of the strong monsoon flow close to the TC (see Fig. 11), the rule formed by leaf node 1(74.0/24.0) leads to recurvature. This result is in full agreement with previous findings on TC–monsoon interaction (Harr and Elsberry 1991, 1995; Chen et al. 2009).

Fig. 11.

The composite deep-layer mean wind fields and geopotential heights in the 850-hPa layer of leaf node 1(74.0/24.0). The thick contours depict the region with a geopotential height > 1500 gpm.


The rule produced by leaf node 0(263.0/12.0) is stated as follows: If a TC moves to the west of 130°E and the central pressure of the TC is >996 hPa, then the TC will not recurve. We can observe in Fig. 12 that the subtropical high lies to the north of the TC center. It should be noted that the TC center is adjacent to the Chinese coast (lon < 130°E). Therefore, the TC will move westward under the influence of the prevailing easterly and will not recurve.

Fig. 12.

As in Fig. 3, but for leaf node 0(263.0/12.0).


Because of space limitations, not all rules are interpreted in this paper. The 10 rules discussed are meant to be taken as examples of interpretation and substantiation. Apart from the modulation of the subtropical high, monsoon systems, and the midlatitude westerlies (Riehl and Shafer 1944; George and Gray 1977; Hodanish and Gray 1993), the intensity, latitude, and longitude of the TC center and the ambient wind of the TC also exert an influence on TC recurvature. Consistent with the existing results concerning the significant modulation of TC movement by the subtropical high (George and Gray 1977; Elsberry and School 1987; Evans et al. 1991; Harr and Elsberry 1991, 1995; Holland and Wang 1995; Chen et al. 2009), the west extension index and the area index of the subtropical high are selected by the C4.5 algorithm to identify TC recurvature. The Monsoon_WF monsoon index (Wang and Fan 1999) is chosen three times to construct the decision tree (Fig. 2). Particular attention should be paid to this monsoon index when measuring the impact of monsoons on TC recurvature. The splitting values in the classification tree can provide references for TC forecasting, since these values best differentiate the recurving and nonrecurving cases in the archived TC data. For example, TC recurvature is highly sensitive to the longitude 130°E because this longitude is chosen by the C4.5 algorithm in the first step to build the classification tree. Additionally, special attention should be paid to the longitude 123°E in relation to TC recurvature because it is a key split value in building the decision tree. Being based on the data-mining results, these critical values will provide useful references for forecasts of TC recurvature in the WNP and SCS basins.

c. Case study

To discuss the potential use of these rules for forecasting TC recurvature, we have conducted a case study using Typhoon Malakas, which formed at 0600 UTC 20 September 2010 and dissipated at 0000 UTC 28 September. During its life span, Malakas first headed westward and then turned northwestward at 1200 UTC 22 September. It turned north and then northeast at 0000 UTC 24 September. We use the decision tree (Fig. 2) to forecast whether this TC will recurve or not, employing the variables quantifying large-scale circulation, wind fields surrounding TCs, and variables characterizing TCs. We start from the genesis of this TC. For the decision tree, the first splitting variable is lon, which represents the longitude of the TC center. The longitude at genesis is 146.5°E, which is larger than 130°E. The genesis observation therefore follows the right branch of the root node, and the next variable is pressure, which represents the central pressure. At TC genesis, the central pressure is 1006 hPa. Therefore, the observation follows the left branch (≤1006) to area_IndexSTH, which is the area index of the subtropical high. The value of area_IndexSTH at genesis is 354, which is >314 (the splitting value of this variable). The observation follows the right branch to Monsoon_WF (the monsoon index). The value of this monsoon index is −1.21, which is much smaller than the splitting value 6.241. The observation follows the left branch to lat (for latitude). The latitude at TC genesis is 19.0°N, which is larger than the splitting value (16°). Eventually, the genesis observation reaches a leaf node [1(349/71)] that represents the class of recurvature. Therefore, at TC genesis, we can infer from our decision tree that Malakas tends to recurve. At 0600 UTC 28 September, as the values of these variables have changed, we should use the updated values to infer the future trend of this TC at this observation.
Since the longitude has changed to 146.0°E, this observation follows the right branch of the root node to the pressure variable. It then follows the left branch to area_IndexSTH because the central pressure is still 1006 hPa. The area_IndexSTH is 447 (>314), and the path follows the right branch to Monsoon_WF. Because the monsoon index at this time is 0.473 (≤6.241), the observation follows the left branch to lat. The latitude is 19°N, which is larger than 16°. The observation follows the right branch of this node to leaf node 1(349/71). Therefore, this observation at 0600 UTC 28 September still suggests TC recurvature. At each point in time during the life span of a TC, we can infer whether it will recurve or not, based on the decision tree and the values of the variables in this tree. For Typhoon Malakas, the tree indicates recurvature for all of the observations prior to recurvature. However, the decision tree cannot indicate the exact time at which the TC will recurve. It can merely infer whether the TC will recurve or not during its subsequent life span based on its current situation (e.g., latitude, longitude, central pressure, and large-scale circulation).
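The path followed in this case study can be traced with a short sketch. It encodes only the sub-branch of the tree that is spelled out in the text (lon > 130°E, pressure ≤ 1006 hPa, area_IndexSTH > 314) and returns None for branches not reproduced here. The function name and the dictionary keys are illustrative choices, and the treatment of values exactly equal to a split (≤ versus >) is assumed where the text does not state it.

```python
def classify_documented_branch(obs):
    """Follow the sub-branch of the decision tree used in the Malakas case.

    Returns a leaf label, or None when the observation leaves the part of
    the tree encoded in this sketch.
    """
    if obs["lon"] <= 130.0:            # root split; western branch not encoded
        return None
    if obs["pressure"] > 1006.0:       # right branch of pressure not encoded
        return None
    if obs["area_IndexSTH"] <= 314:    # weak-high branch not encoded
        return None
    if obs["Monsoon_WF"] > 6.241:      # weak monsoon: nonrecurvature leaf
        return "0(74.0/18.0)"
    if obs["lat"] <= 16.0:             # equatorward samples: nonrecurvature leaf
        return "0(87.0/41.0)"
    return "1(349/71)"                 # recurvature leaf reached by Malakas


genesis = {"lon": 146.5, "pressure": 1006.0, "area_IndexSTH": 354,
           "Monsoon_WF": -1.21, "lat": 19.0}
later = {"lon": 146.0, "pressure": 1006.0, "area_IndexSTH": 447,
         "Monsoon_WF": 0.473, "lat": 19.0}
print(classify_documented_branch(genesis), classify_documented_branch(later))
```

Both observations end at the same recurvature leaf, which matches the inference drawn in the text.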

5. Conclusions and discussion

We have presented in this part of our two-part paper a detailed study of TC recurvature over the SCS and WNP through data mining. This investigation focuses on unraveling, from data, the rules and regularities that govern TC recurvature. The historical TC track database is composed of recurving TCs and straight movers. Potential parameters affecting TC recurvature have been categorized into three groups: large-scale circulation, circulations surrounding TCs, and variables characterizing TCs. The tree construction algorithm, C4.5, has been applied to classify recurving and nonrecurving TCs. Significant variables have been selected from the potential parameters by the C4.5 algorithm to build the classification tree. Altogether, 18 rules have been discovered from the processed database. Furthermore, the built classification tree transforms the qualitative empirical rules to quantitative rules characterized by selected variables and splitting values. Therefore, this classification tree can provide references for the prediction of TC recurvature.

Most of the 18 rules can be explained by existing theories and are supported by various empirical findings on TC recurvatures (Riehl and Shafer 1944; George and Gray 1977; Chan 1984; Elsberry 1990; Evans et al. 1991; Hodanish and Gray 1993; Harr and Elsberry 1995; Holland and Wang 1995). As compared with some existing qualitative findings, such as “a strong subtropical high, together with a weak monsoon causes TCs to recurve,” rules governing TC recurvature discovered by the present study contain quantitative descriptions of factors such as composite wind fields, geopotential heights, and deep-layer mean winds essential to the understanding, interpretation, and prediction of TC recurvatures. Based on fundamental theories about TCs, we have found new local conditions (e.g., latitude and longitude) that have been captured by the rules derived from the decision tree.

To further evaluate the importance of the local conditions, we perform two classification analyses, one with and one without the local conditions while keeping the other variables (i.e., variables characterizing large-scale circulation and TCs) unchanged. Experimental results show that the classification accuracy is 84.364% when latitude and longitude are taken into consideration. However, when these two parameters are omitted, the accuracy drops to 74.2446%. Such results show that the local conditions, namely the position of a TC, play a significant role in the forecasting of TC recurvature when large-scale circulation and circulation surrounding TCs are involved.

Some studies (Gray 1979; Lee et al. 1989; Gray 1998) indicate that the stage (developing, mature, and weakening) of a TC should be taken into account when forecasting TC recurvature. To assess the effects of stage on a forecast, we perform additional analyses by examining the accuracy of recurvature forecasting under different stages of TC development. Because stage information is nominal or ordinal in scale, it cannot be accommodated directly by the C4.5 algorithm. However, on the basis of TC stage information, we can divide the TC samples into three basic groups (developing stage, mature stage, and weakening stage) and use the corresponding TC samples to make our predictions. The classification accuracies for the three groups are 84.364%, 88.4848%, and 89.1266%, respectively. This suggests that forecast accuracy increases as TCs progress from formation to weakening. This appears to be natural because at the developing stage, we use all samples prior to TC recurvature as recurving samples, even though these samples may not have the characteristics of a recurving TC during their early life span. These TC samples, therefore, tend to cause errors in recurvature classification. However, we have more and better information about recurving TCs in the latter part of their life spans. This additional analysis highlights the importance of stage information in forecasting. We will develop in our future study an appropriate method for accommodating both nominal (e.g., TC stage information) and interval-scaled variables (e.g., zonal and meridional wind, geopotential height, and monsoon indices) in the classification algorithm to gain a more thorough understanding of the process.

It should be noted that our basic framework for analyzing TC tracks is based on the steering flow concept. In addition to the steering flow, thermodynamic processes such as diabatic heating affect TC motion, including TC recurvature (Li and Chan 1999; Wu and Wang 2000; Chan et al. 2002). Such processes may bias the influence of steering flow on the direction of a TC track. We will consider these factors in our future study.

Dynamic models have undergone rapid advancements in the last few decades and are now commonly used in TC track and intensity prediction. Rules unraveled by the present study provide some new perspectives and information on TC recurvature that might be useful for the formulation of more comprehensive dynamic models in future research by fine-tuning parameters or incorporating critical variables in model construction. New models using data mining, in turn, might shed light on the discovery of further rules previously hidden in the data. Thus, dynamic modeling and data mining can mutually enrich one another in a complementary and integrated manner.

Acknowledgments

This research was jointly supported by the Geographical Modeling and Geocomputation Program under the Focused Investment Scheme of the Chinese University of Hong Kong, the National Natural Science Foundation of China (Grant 41201045), and the 973 project (2012CB955800) of the Ministry of Science and Technology of China. Work of the third author (JCLC) was supported by the General Research Fund of the Research Grants Council of the HKSAR government with Grant CityU 100210.

REFERENCES

Bao
,
C. L.
, and
J. C.
Sadler
,
1983
:
The speed of recurving typhoons over the western North Pacific Ocean
.
Mon. Wea. Rev.
,
111
,
1280
1292
.
Breiman
,
L.
,
J. H.
Friedman
,
R. A.
Olshen
, and
C. J.
Stone
,
1984
: Classification and Regression Trees. Wadsworth and Brooks, 358 pp.
Burroughs, L. D., and S. Brand, 1973: Speed of tropical storms and typhoons after recurvature in the western North Pacific Ocean. J. Appl. Meteor., 12, 452–458.
Camargo, S. J., A. Robertson, S. Gaffney, and P. Smyth, 2004: Cluster analysis of western North Pacific tropical cyclone tracks. Preprints, 26th Conf. on Hurricanes and Tropical Meteorology, Miami, FL, Amer. Meteor. Soc., 250–251.
Camargo, S. J., A. W. Robertson, S. J. Gaffney, P. Smyth, and M. Ghil, 2007: Cluster analysis of typhoon tracks. Part I: General properties. J. Climate, 20, 3635–3653.
Camargo, S. J., A. W. Robertson, A. G. Barnston, and M. Ghil, 2008: Clustering of eastern North Pacific tropical cyclone tracks: ENSO and MJO effects. Geochem. Geophys. Geosyst., 9, Q06V05, doi:10.1029/2007GC001861.
Carr, L. E., and R. L. Elsberry, 1990: Observational evidence for predictions of tropical cyclone propagation relative to environmental steering. J. Atmos. Sci., 47, 542–546.
Chan, J. C. L., 1984: An observational study of the physical processes responsible for tropical cyclone motion. J. Atmos. Sci., 41, 1036–1048.
Chan, J. C. L., 1985: Identification of the steering flow for tropical cyclone motion from objectively analyzed wind fields. Mon. Wea. Rev., 113, 106–116.
Chan, J. C. L., 2005: The physics of tropical cyclone motion. Annu. Rev. Fluid Mech., 37, 99–128.
Chan, J. C. L., and W. M. Gray, 1982: Tropical cyclone movement and surrounding flow relationships. Mon. Wea. Rev., 110, 1354–1374.
Chan, J. C. L., and J. E. Shi, 2000: Frequency of typhoon landfall over Guangdong Province of China during the period 1470–1931. Int. J. Climatol., 20, 183–190.
Chan, J. C. L., W. M. Gray, and S. Q. Kidder, 1980: Forecasting tropical cyclone turning motion from surrounding wind and temperature fields. Mon. Wea. Rev., 108, 778–792.
Chan, J. C. L., F. M. F. Ko, and Y. M. Lei, 2002: Relationship between potential vorticity tendency and tropical cyclone motion. J. Atmos. Sci., 59, 1317–1336.
Chen, G., 1999: The subtropical high. The Droughts and Floods in Summer in China and Background Fields, Z. Zhao, Ed., China Meteorological Press, 45–52.
Chen, M., J. Han, and P. Yu, 2002: Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng., 8, 866–883.
Chen, T. C., S. Y. Wang, M. C. Yen, and A. J. Clark, 2009: Impact of the intraseasonal variability of the western North Pacific large-scale circulation on tropical cyclone tracks. Wea. Forecasting, 24, 646–666.
Chen, T.-J. C., and C. Chang, 1980: The structure and vorticity budget of an early summer monsoon trough (mei-yu) over southeastern China and Japan. Mon. Wea. Rev., 108, 942–953.
Chen, Y., H. Zhang, R. Zhou, and H. Wu, 2001: Relationship between the ground surface temperature in Asia and the intensity and location of subtropical high in the western Pacific. Chin. J. Atmos. Sci., 25, 515–522.
Cheng, C., N. Hsu, and C. Wei, 2008: Decision-tree analysis on optimal release of reservoir storage under typhoon warnings. Nat. Hazards, 44, 65–84.
Davis, C., and Coauthors, 2008: Prediction of landfalling hurricanes with the Advanced Hurricane WRF model. Mon. Wea. Rev., 136, 1990–2005.
Ding, Y. H., 1992: Summer monsoon rainfalls in China. J. Meteor. Soc. Japan, 70, 373–396.
Dobos, P. H., and R. L. Elsberry, 1993: Forecasting tropical cyclone recurvature. Part I: Evaluation of existing methods. Mon. Wea. Rev., 121, 1273–1278.
Dong, K., and C. J. Neumann, 1986: The relationship between tropical cyclone motion and environmental geostrophic flows. Mon. Wea. Rev., 114, 115–122.
Elsberry, R. L., 1990: International experiments to study tropical cyclones in the western North Pacific. Bull. Amer. Meteor. Soc., 71, 1305–1316.
Elsberry, R. L., 2004: Monsoon-related tropical cyclones in East Asia. East Asian Monsoon, C.-P. Chang, Ed., Series on Meteorology of East Asia, Vol. 2, World Scientific, 463–498.
Elsberry, R. L., and N. P. School, 1987: A Global View of Tropical Cyclones. University of Chicago Press, 185 pp.
Emanuel, K., 2005: Increasing destructiveness of tropical cyclones over the past 30 years. Nature, 436, 686–688.
Evans, J. L., and K. McKinley, 1998: Relative timing of tropical storm lifetime maximum intensity and track recurvature. Meteor. Atmos. Phys., 65, 241–245.
Evans, J. L., G. J. Holland, and R. L. Elsberry, 1991: Interactions between a barotropic vortex and an idealized subtropical ridge. Part I: Vortex motion. J. Atmos. Sci., 48, 301–314.
Fayyad, U., and K. Irani, 1992: The attribute selection problem in decision tree generation. Proc. 10th Natl. Conf. on Artificial Intelligence, San Jose, CA, Association for the Advancement of Artificial Intelligence, 104–110. [Available online at http://www.aaai.org/Papers/AAAI/1992/AAAI92-016.pdf.]
Fayyad, U., and P. Stolorz, 1997: Data mining and KDD: Promise and challenges. Future Gener. Comput. Syst., 13, 99–115.
Fitzpatrick, M. E., 1992: Tropical cyclone motion and recurvature in TCM-90. M.S. thesis, Dept. of Atmospheric Science, Colorado State University, 93 pp.
Ford, D. M., R. L. Elsberry, P. A. Harr, and P. H. Dobos, 1993: Forecasting tropical cyclone recurvature. Part II: An objective technique using an empirical orthogonal function representation of vorticity fields. Mon. Wea. Rev., 121, 1279–1290.
Friedl, M., and C. Brodley, 1997: Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ., 61, 399–409.
Fudeyasu, H., S. Iizuka, and T. Matsuura, 2006: Impact of ENSO on landfall characteristics of tropical cyclones over the western North Pacific during the summer monsoon season. Geophys. Res. Lett., 33, doi:10.1029/2006GL027449.
Gaffney, S. J., 2004: Probabilistic curve-aligned clustering and prediction with regression mixture models. Ph.D. dissertation, University of California, Irvine, 281 pp.
Gaffney, S. J., and P. Smyth, 1999: Trajectory clustering with mixtures of regression models. Proc. Fifth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Diego, CA, ACM, 63–72, doi:10.1145/312129.312198.
Gaffney, S. J., A. Robertson, P. Smyth, S. Camargo, and M. Ghil, 2007: Probabilistic clustering of extratropical cyclones using regression mixture models. Climate Dyn., 29, 423–440.
Gao, S., Z. Meng, F. Zhang, and L. F. Bosart, 2009: Observational analysis of heavy rainfall mechanisms associated with severe Tropical Storm Bilis (2006) after its landfall. Mon. Wea. Rev., 137, 1881–1897.
George, J. E., and W. M. Gray, 1977: Tropical cyclone recurvature and nonrecurvature as related to surrounding wind-height fields. J. Appl. Meteor., 16, 34–42.
Goh, A. Z.-C., and J. C. L. Chan, 2010: An improved statistical scheme for the prediction of tropical cyclones making landfall in south China. Wea. Forecasting, 25, 587–593.
Gray, W. M., 1979: Hurricanes: Their formation, structure and likely role in the tropical circulation. Meteorology over the Tropical Oceans, D. B. Shaw, Ed., Royal Meteor. Soc., 155–218.
Gray, W. M., 1998: The formation of tropical cyclones. Meteor. Atmos. Phys., 67, 37–69.
Guard, C., 1977: Operational application of a tropical cyclone recurvature/non-recurvature study based on 200 mb wind fields. FLEWEACEN Tech. Note JTWC 77-1, 40 pp.
Hampson, S., and D. Volper, 1986: Linear function neurons: Structure and training. Biol. Cybern., 53, 203–217.
Han, J., and M. Kamber, 2006: Data Mining: Concepts and Techniques. 2nd ed. Morgan Kaufmann, 770 pp.
Harr, P. A., and R. L. Elsberry, 1991: Tropical cyclone track characteristics as a function of large-scale circulation anomalies. Mon. Wea. Rev., 119, 1448–1468.
Harr, P. A., and R. L. Elsberry, 1995: Large-scale circulation variability over the tropical western North Pacific. Part I: Spatial patterns and tropical cyclone characteristics. Mon. Wea. Rev., 123, 1225–1246.
Harr, P. A., and J. Chan, 2004: Monsoon impacts on tropical cyclone variability. Review Topic B3e: Tropical Cyclones, Naval Postgraduate School Press, 31 pp. [Available online at http://www.weather.nps.navy.mil/~cpchang/IWM-III/R10-B3e-Tropical%20Cyclones.pdf.]
Ho, C. H., J. J. Baik, J. H. Kim, D. Y. Gong, and C. H. Sui, 2004: Interdecadal changes in summertime typhoon tracks. J. Climate, 17, 1767–1776.
Hodanish, S., and W. M. Gray, 1993: An observational analysis of tropical cyclone recurvature. Mon. Wea. Rev., 121, 2665–2689.
Holland, G. J., 1983: Tropical cyclone motion—Environmental interaction plus a beta-effect. J. Atmos. Sci., 40, 328–342.
Holland, G. J., 1984: Tropical cyclone motion—A comparison of theory and observation. J. Atmos. Sci., 41, 68–75.
Holland, G. J., 1993: Tropical cyclone motion. Global Guide to Tropical Cyclone Forecasting, G. Holland, Ed., WMO, 1–46.
Holland, G. J., and M. Lander, 1993: The meandering nature of tropical cyclone tracks. J. Atmos. Sci., 50, 1254–1266.
Holland, G. J., and Y. Q. Wang, 1995: Baroclinic dynamics of simulated tropical cyclone recurvature. J. Atmos. Sci., 52, 410–426.
JTWC, 1988: 1988 annual tropical cyclone report. Joint Typhoon Warning Center, 216 pp. [Available online at http://www.usno.navy.mil/NOOC/nmfc-ph/RSS/jtwc/atcr/1988atcr.pdf.]
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.
Kantardzic, M., 2003: Data Mining: Concepts, Models, Methods and Algorithms. Wiley-Interscience, 345 pp.
Kasahara, A., 1959: A comparison between geostrophic and non-geostrophic numerical forecasts of hurricane movement with the barotropic steering model. J. Meteor., 16, 371–384.
Knaff, J. A., 2009: Revisiting the maximum intensity of recurving tropical cyclones. Int. J. Climatol., 29, 827–837.
Ko, K. C., and H. H. Hsu, 2006: Sub-monthly circulation features associated with tropical cyclone tracks over the East Asian monsoon area during July–August season. J. Meteor. Soc. Japan, 84, 871–889.
Kossin, J. P., S. J. Camargo, and M. Sitkowski, 2010: Climate modulation of North Atlantic hurricane tracks. J. Climate, 23, 3057–3076.
Krishnamurti, T. N., H. S. Bedi, K. S. Yap, D. Oosterhof, and G. Rohaly, 1992: Recurvature dynamics of a typhoon. Meteor. Atmos. Phys., 50, 105–126.
Lage, T. D., 1982: Forecasting tropical cyclone recurvature using an empirical orthogonal function representation of the synoptic forcing. M.S. thesis, Dept. of Meteorology, Naval Postgraduate School, 77 pp.
Lander, M., 1996: Specific tropical cyclone track types and unusual tropical cyclone motions associated with a reverse-oriented monsoon trough in the western North Pacific. Wea. Forecasting, 11, 170–186.
Lau, K., G. Yang, and S. Shen, 1988: Seasonal and intraseasonal climatology of summer monsoon rainfall over East Asia. Mon. Wea. Rev., 116, 18–37.
Lee, C., R. Edson, and W. M. Gray, 1989: Some large-scale characteristics associated with tropical cyclone development in the North Indian Ocean during FGGE. Mon. Wea. Rev., 117, 407–426.
Lee, J., J. Han, and K. Whang, 2007: Trajectory clustering: A partition-and-group framework. Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data, Beijing, China, ACM, 593–604.
Leftwich, P. W., 1980: An analysis of predictions of recurvature of Atlantic tropical cyclones. Bull. Amer. Meteor. Soc., 61, 1126.
Leung, Y., 2010: Knowledge Discovery in Spatial Data. Springer-Verlag, 360 pp.
Li, Y. S., and J. C. L. Chan, 1999: Momentum transports associated with tropical cyclone recurvature. Mon. Wea. Rev., 127, 1021–1037.
Liu, K. S., and J. C. L. Chan, 2003: Climatological characteristics and seasonal forecasting of tropical cyclones making landfall along the south China coast. Mon. Wea. Rev., 131, 1650–1662.
Lu, R., 2001: Interannual variability of the summertime North Pacific subtropical high and its relation to atmospheric convection over the warm pool. J. Meteor. Soc. Japan, 79, 771–783.
Lu, X. Y., X. Z. Zhang, and J. N. Chen, 2007: The relationship between the interdecadal variability of East Asian summer monsoon’s movement and the spatial distribution pattern of the summer rainfall in East China. Atmospheric and Environmental Remote Sensing Data Processing and Utilization III: Readiness for GEOSS, M. D. Goldberg et al., Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 6684), R6840–R6840.
Mandal, M., U. C. Mohanty, P. Sinha, and M. M. Ali, 2007: Impact of sea surface temperature in modulating movement and intensity of tropical cyclones. Nat. Hazards, 41, 413–427.
Miller, H. J., and J. Han, 2001: Geographic data mining and knowledge discovery: An overview. Geographic Data Mining and Knowledge Discovery, H. J. Miller and J. Han, Eds., Taylor and Francis, 3–32.
Ngan, K. W., and J. C. L. Chan, 1995: Tropical cyclone motion—Steering vs propagation. Preprints, 21st Conf. on Hurricanes and Tropical Meteorology, Miami, FL, Amer. Meteor. Soc., 23–25.
O’Shay, A. J., and T. N. Krishnamurti, 2004: An examination of a model’s components during tropical cyclone recurvature. Mon. Wea. Rev., 132, 1143–1166.
Peak, J. E., and R. L. Elsberry, 1986: Prediction of tropical cyclone turning and acceleration using empirical orthogonal function representations. Mon. Wea. Rev., 114, 156–164.
Quinlan, J., 1987: Decision trees as probabilistic classifiers. Proc. Fourth Int. Workshop on Machine Learning, Irvine, CA, American Association for Artificial Intelligence, 31–37.
Quinlan, J., 1993: C4.5: Programs for Machine Learning. Morgan Kaufmann, 302 pp.
Riehl, H., 1972: Intensity of recurved typhoons. J. Appl. Meteor., 11, 613–615.
Riehl, H., and R. J. Shafer, 1944: The recurvature of tropical storms. J. Atmos. Sci., 1, 42–54.
Rokach, L., and O. Maimon, 2005: Decision trees. Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach, Eds., Springer, 165–192.
Rossby, C., 1939: Relation between variations in the intensity of the zonal circulation of the atmosphere and the displacements of the semi-permanent centers of action. J. Mar. Res., 2, 38–55.
Stone, M., 1974: Cross-validatory choice and assessment of statistical predictions. J. Roy. Stat. Soc., 36B, 111–147.
Sui, C., P. Chung, and T. Li, 2007: Interannual and interdecadal variability of the summertime western North Pacific subtropical high. Geophys. Res. Lett., 34, L11701, doi:10.1029/2006GL029204.
Sun, S., and M. Ying, 1999: Subtropical high anomalies over the western Pacific and its relations to the Asian monsoon and SST anomaly. Adv. Atmos. Sci., 16, 559–568.
Tao, S., and L. Chen, 1987: A review of recent research on the East Asian summer monsoon in China. Monsoon Meteorology, C.-P. Chang and T. N. Krishnamurti, Eds., Oxford Monographs on Geology and Geophysics, Vol. 7, Oxford University Press, 60–92.
Tsoi, A. C., and R. A. Pearson, 1990: Comparison of three classification techniques, CART, C4.5 and multi-layer perceptrons. Proc. Conf. on Advances in Neural Information Processing Systems 3, Denver, CO, NIPS, 963–969. [Available online at http://books.nips.cc/papers/files/nips03/0963.pdf.]
Tuleya, R. E., and Y. Kurihara, 1978: A numerical simulation of the landfall of tropical cyclones. J. Atmos. Sci., 35, 242–257.
Velden, C. S., and L. M. Leslie, 1991: The basic relationship between tropical cyclone intensity and the depth of the environmental steering layer in the Australian region. Wea. Forecasting, 6, 244–253.
Wang, B., and Z. Fan, 1999: Choice of South Asian summer monsoon indices. Bull. Amer. Meteor. Soc., 80, 629–638.
Wang, B., Z. W. Wu, J. P. Li, J. Liu, C. P. Chang, Y. H. Ding, and G. X. Wu, 2008: How to measure the strength of the East Asian summer monsoon. J. Climate, 21, 4449–4463.
Wang, Y., and C. C. Wu, 2004: Current understanding of tropical cyclone structure and intensity changes—A review. Meteor. Atmos. Phys., 87, 257–278.
Wong, M. L. M., and J. C. L. Chan, 2004: Tropical cyclone intensity in vertical wind shear. J. Atmos. Sci., 61, 1859–1876.
Wu, L., and B. Wang, 2000: A potential vorticity tendency diagnostic approach for tropical cyclone motion. Mon. Wea. Rev., 128, 1899–1911.
Yang, F., H. L. Pan, S. K. Krueger, S. Moorthi, and S. J. Lord, 2006: Evaluation of the NCEP Global Forecast System at the ARM SGP site. Mon. Wea. Rev., 134, 3668–3690.
Yohannes, Y., and J. Hoddinott, 1999: Classification and regression trees: An introduction. Tech. Guide 3, International Food Policy Research Institute, 27 pp.
Zhang, J., and S. Yu, 1998: A diagnostic study on the relationship between the assembling of low frequency waves in the Pacific Ocean and the abnormality of the subtropical high. Adv. Atmos. Sci., 15, 247–257.
Zhang, W., Y. Leung, and J. C. L. Chan, 2013: The analysis of tropical cyclone tracks in the western North Pacific through data mining. Part II: Tropical cyclone landfall. J. Appl. Meteor. Climatol., 52, 1417–1432.
Zhou, T., and Coauthors, 2009: Why the western Pacific subtropical high has extended westward since the late 1970s. J. Climate, 22, 2199–2215.