## 1. Introduction

A tropical cyclone (TC) is one of the most devastating weather systems because it can involve multiple hazards over the course of 1 or 2 days. One striking example was Tropical Cyclone Nargis in 2008, which killed 138 366 people in Myanmar and ranks as the second deadliest disaster of the 2000s according to the Centre for Research on the Epidemiology of Disasters (Rodriguez et al. 2009). In addition to high winds and heavy rain, tropical storms can result in life-threatening floods and mudslides. High-quality seamless forecasts, from nowcasting to seasonal forecasts, are needed to mitigate human and property losses.

Seasonal forecasting of TC activity was pioneered by Nicholls (1979) and Gray (1984a,b) in the early 1980s for the Australian and North Atlantic regions, respectively. For the western North Pacific (WNP), issuing seasonal forecasts of the annual number of tropical cyclones and typhoons was first attempted by J. Chan and his colleagues in 1997 (Camargo et al. 2007a). The large-scale atmospheric and oceanic conditions incorporated into their statistical forecast model (Chan et al. 1998, 2001) are El Niño–Southern Oscillation (ENSO), the extent of the Pacific subtropical ridge, the intensity of the Indian–Burma trough, the polar vortex, and the frequency of cold-air intrusions in China. Different predictors are used for the Pacific and South China Sea (Liu and Chan 2003). Recently, various forecast models were developed for specific TC-prone areas in East Asia such as Taiwan (Chu et al. 2007), Korea (Choi et al. 2009), and the East China Sea (Kim et al. 2010). In the meantime, new approaches to predictor selection procedures (Lee et al. 2007; Kwon et al. 2007; Fan and Wang 2009) were proposed. Recent studies (Ho et al. 2009; Kim et al. 2010) have shown that better forecast skill can be achieved by Poisson regression than linear regression when the method was applied to forecasting seasonal TC frequency over the East China Sea.

The WNP TC activity is known to possess several kinds of variations that differ in their time scales. The interannual variations are related to ENSO (Chan 1985, 2000; Chan et al. 1998; Wang and Chan 2002), the biennial variations are related to stratospheric quasi-biennial oscillation (Chan 1985), and the interdecadal variations are related to the Pacific decadal oscillation (Ho et al. 2004) and the Antarctic Oscillation (Ho et al. 2005). ENSO and TC relationships for various ocean basins are reviewed by Chu (2004). Intraseasonal variations have also been reported and related to the Madden–Julian oscillation (MJO; Harr and Elsberry 1995; Nakazawa 2006; Kim et al. 2008; Hsu et al. 2008) and 10–30-day waves (Ko and Hsu 2006, 2009).

The predictability of TC frequency in a limited area relies on factors that control TC trajectories. Over the WNP region, influential large-scale features include the western Pacific subtropical anticyclone and monsoonal flow in the low levels (Harr and Elsberry 1995; Kuo et al. 2001; Liu and Chan 2002) and tropical upper-tropospheric troughs (TUTTs) in the upper levels (Sadler 1978; Montgomery and Farrell 1993). Camargo et al. (2007b,c) found that WNP TC trajectories can be grouped into seven clusters. The clusters are sensitive to both genesis location and trajectory patterns. Three of the clusters are ENSO related (Camargo et al. 2007c). During El Niño years, the preferred location of TC genesis shifts southeastward, while during La Niña years it shifts northwestward. The highly populated cluster type in which more recurving trajectories are observed tends to occur more often during La Niña years.

Taiwan is located in an area that frequently experiences TCs. On average about 60% of the annual rainfall in Taiwan is associated with TCs. Thus, the seasonal forecasting of TC activity in the vicinity of Taiwan is of extreme importance to drought mitigation and water resources management for the island. Figure 1a shows the tracks of TCs affecting Taiwan from 1979 to 2007; most of the storms form in the warm pool of the western Pacific and the Philippine Sea. After formation, these storms follow one of three paths: they move northwestward and make landfall in Taiwan or South China; they move northwestward, recurve near 25°N, and approach eastern China, Korea, and Japan; or they move to the open ocean of the North Pacific. The climatological distribution of the corresponding monthly TC frequency is presented in Fig. 1b. A seasonal contrast is evident, with no activity from January to March, an extended active season from June through December, and a peak in August.

The seasonal outlook for typhoons affecting Taiwan is an important forecast item that has been routinely issued by the Central Weather Bureau (CWB) since 2006. The forecast is based on information generated by a least absolute deviation (LAD) multivariate linear regression model (Chu et al. 2007), in which the sum of the absolute values of the residuals is minimized. In the LAD model, the predictand is the total number of tropical cyclones entering an area encompassing Taiwan and its vicinity during the 5-month period from June through October. Five antecedent environmental parameters, namely, sea surface temperature, sea level pressure, precipitable water, low-level relative vorticity, and vertical wind shear in key locations of the tropical WNP, are identified as predictors. Results from cross validation suggest that the statistical model is skillful in predicting regional TC activity. When the sea surface temperatures over the Philippine Sea are warm and an anomalous low-level cyclonic circulation coupled with low-latitude westerly winds across the South China Sea and the Philippine Sea appears in the antecedent May, TC activity near Taiwan tends to be higher in the following typhoon season.

Although the LAD model can produce skillful forecasts, it does not provide information on the likelihood of the range of tropical cyclone counts that may be realized. This specification of likelihood is needed for risk management in estimating a range of potential disaster losses or vulnerability before the commencement of the typhoon season. Another drawback in the LAD approach is that the predictand used is seasonal TC counts, which would be properly represented by a discrete distribution such as a Poisson process because the occurrences of typhoons in a small region are rare and discrete events.

Chu and Zhao (2007) used Bayesian probabilistic forecast models to predict seasonal tropical cyclone activity in the central North Pacific. Bayesian probabilistic models have also been used to predict tropical cyclone activity in the North Atlantic (Elsner and Jagger 2004, 2006). Given the advantage of the probabilistic approach, we adopt the Bayesian regression method for predicting seasonal TC activity near Taiwan. Because the objective of the present study is to develop a forecast procedure for operational centers, the data and computation environments need to be available in real time. The forecast skill of the forecasting system also needs to be evaluated thoroughly. From the point of view of long-term risk management and general public interest, it is desirable to have the forecast of annual total TC counts before the beginning of a year, preferably by January of the target year. However, at present it would be very difficult to obtain a skillful forecast for the Taiwan area based on predictors prior to May, probably because of the chaotic monsoon influences on TC activity. The interannual variability in the air–sea coupled climate system over the tropical Pacific and Asian monsoon region shows a strong quasi-biennial nature, with changes of sign during northern spring (Yasunari 1991). Instead of using the period from June to October, an alternative approach for operational forecasts at present is to change the predictand period to 20 June–30 November and issue the forecast on 15 June. The differences in the predictands with and without the TC counts before 20 June and a real-time forecast example will be discussed in the last section.

The structure of this paper is as follows. The data and data preprocessing are described in section 2. The Bayesian regression model of the TC counts is presented in section 3. Correlation of large-scale variables and TC activity is presented in section 4. The predictor selection procedure and prediction results are presented in section 5, and a discussion and our conclusions appear in section 6.

## 2. Data and data processing

The tropical cyclone series in the vicinity of Taiwan from 1979 to 2007 are taken from the Regional Specialized Meteorological Center (RSMC) best-track data prepared by the Japan Meteorological Agency (information online at http://www.jma.go.jp/jma/jma-eng/jma-center/rsmc-hp-pub-eg/trackarchives.html). This series covers an area between 21°–26°N and 119°–125°E, a fairly limited geographical domain of 5° latitude and 6° longitude.

Monthly mean sea level pressure, wind data at the 850- and 200-hPa levels, relative vorticity data at the 850-hPa level, and total precipitable water over the western North Pacific (0°–30°N) are derived from the National Centers for Environmental Prediction–Department of Energy (NCEP–DOE) Reanalysis-2 dataset [information available online at http://www.cpc.noaa.gov/products/wesley/reanalysis2/index.html; see also Kanamitsu et al. (2002)]. These variables are the same as those used in Chu et al. (2007).

The horizontal resolution of the reanalysis dataset is 2.5° latitude–longitude. Tropospheric vertical wind shear is computed as the square root of the sum of the square of the difference in the zonal wind component between the 850- and 200-hPa levels and the square of the difference in the meridional wind component between the 850- and 200-hPa levels (Clark and Chu 2002). The monthly mean sea surface temperatures, at 2° horizontal resolution, are taken from the Extended Reconstructed Sea Surface Temperature (ERSST) dataset prepared by the National Climatic Data Center (NCDC) and downloaded from the Web site of the National Oceanic and Atmospheric Administration (NOAA) Physical Sciences Division of the Earth System Research Laboratory in Boulder, Colorado (information online at http://www.cdc.noaa.gov/). After examining the correlation between large-scale environmental parameters and TC counts, only the parameters for the month of May are retained as predictors.
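The vertical wind shear definition given above can be written out as a short calculation. The following is a minimal sketch in Python; the function name and the sample wind values are illustrative assumptions, not values from the reanalysis.

```python
import math

def vertical_wind_shear(u850, v850, u200, v200):
    """Magnitude of the 200-850 hPa vector wind difference (same units as inputs)."""
    du = u200 - u850  # difference in the zonal component
    dv = v200 - v850  # difference in the meridional component
    return math.hypot(du, dv)  # sqrt(du**2 + dv**2)

# Hypothetical monthly mean winds (m/s) at a single grid point.
shear = vertical_wind_shear(u850=-3.0, v850=1.0, u200=0.0, v200=5.0)
print(shear)  # 5.0
```

In practice the same expression would be applied at every grid point of the monthly mean wind fields before averaging over the critical region.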

## 3. The Bayesian regression model for TC counts

Given the Poisson intensity parameter $\lambda$ (i.e., the mean seasonal TC rate), the probability mass function (PMF) of $h$ TCs occurring in a unit of observation time (e.g., one season) is

$$P(h \mid \lambda) = \exp(-\lambda)\,\frac{\lambda^{h}}{h!}, \qquad h = 0, 1, 2, \ldots \tag{1}$$

The Poisson mean is simply $\lambda$; so is its variance. The rate $\lambda$ is usually treated as a random variable that is conditional on the predictors. We assume that there are $N$ observations and for each observation there are $K$ relative predictors. We define a latent random $N$-vector $\mathbf{Z}$, such that for each observation $h_i$, $i = 1, 2, \ldots, N$, $Z_i = \log \lambda_i$, where $\lambda_i$ is the relative Poisson intensity for the $i$th observation. Here, $N$ denotes the sample size, which in this study is 29 (1979–2007). The link function between this latent variable and its associated predictors is expressed as $Z_i = \mathbf{X}_i \boldsymbol{\beta} + \varepsilon_i$, where $\boldsymbol{\beta} = [\beta_0, \beta_1, \beta_2, \ldots, \beta_K]'$ is a random vector, the noise $\varepsilon_i$ is assumed to be independent and identically distributed (IID) and normally distributed with zero mean and variance $\sigma^2$, and $\mathbf{X}_i = [1, X_{i,1}, X_{i,2}, \ldots, X_{i,K}]$ denotes the predictor vector. In vector form, the general Poisson linear regression model can be formulated as below:

$$
\begin{aligned}
P(\mathbf{h} \mid \mathbf{Z}) &= \prod_{i=1}^{N} P(h_i \mid Z_i), \qquad h_i \mid Z_i \sim \mathrm{Poisson}\left(h_i \mid e^{Z_i}\right), \\
\mathbf{Z} \mid \boldsymbol{\beta}, \sigma^2, \mathbf{X} &\sim \mathrm{Normal}\left(\mathbf{Z} \mid \mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_N\right),
\end{aligned}
\tag{2}
$$

where, specifically, $\mathbf{X}' = [\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_N']$, $\mathbf{I}_N$ is the $N \times N$ identity matrix, and $\mathbf{X}_i = [1, X_{i,1}, X_{i,2}, \ldots, X_{i,K}]$ is the predictor vector for $h_i$, $i = 1, 2, \ldots, N$. Here, Normal and Poisson stand for the normal distribution and Poisson distribution, respectively. In Eq. (2), $\beta_0$ is referred to as the intercept.
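The Poisson PMF and the log-linear link between the rate and the predictors described above can be illustrated with a few lines of Python. This is a minimal sketch: the coefficient and predictor values are hypothetical placeholders (not fitted values from this study), and the noise term is set to zero for simplicity.

```python
import math

def poisson_pmf(h, lam):
    """P(h | lam) = exp(-lam) * lam**h / h! for a non-negative integer h."""
    return math.exp(-lam) * lam ** h / math.factorial(h)

def poisson_rate(x, beta):
    """Rate exp(X_i . beta) with the noise term set to zero.

    x holds the K predictor values (without the leading 1);
    beta holds the intercept followed by the K coefficients.
    """
    z = beta[0] + sum(b * xk for b, xk in zip(beta[1:], x))
    return math.exp(z)

# Hypothetical coefficients and standardized predictor values.
beta = [1.4, 0.2, 0.1, -0.1]
lam = poisson_rate([0.5, -0.3, 1.0], beta)
pmf = [poisson_pmf(h, lam) for h in range(15)]  # probabilities of 0..14 TCs
```

The list `pmf` is the kind of count distribution that a point estimate of the rate implies; the Bayesian model instead averages such distributions over the posterior samples of the coefficients.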

It is worth noting that the Poisson rate *λ* is a real value while the TC count *h* can only be an integer. Accordingly, *λ* contains more information than *h*. Furthermore, *h* is conditional on *λ*, which is subject to smaller variance than *h*. Taken together, for decision-making purposes *λ* rather than *h* should be used as the forecast quantity of TC activity. We also note that this hierarchical structure is well suited to Bayesian inference. For the details of the Bayesian analysis procedure, readers are referred to section 4 of Chu and Zhao (2007). Bayesian inference requires the posterior distribution, which involves a complex integration of high-dimensional functions. We use a Gibbs sampler implemented in Matlab to carry out the integration. The Gibbs sampler is a widely used Markov chain Monte Carlo (MCMC) method that solves complex integrals by expressing them as expectations for some distribution and then estimating these expectations by drawing samples from that distribution. How a Gibbs sampler works, how it may be used to generate samples of the coefficient parameters *β_i*, and how the quality of the generated samples is evaluated are detailed in Chu and Zhao (2007).
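The statement that the rate has smaller variance than the count, and the idea of estimating a quantity by drawing samples, can both be illustrated with a plain Monte Carlo simulation of the hierarchical model. The sketch below is not a Gibbs sampler; it simply draws from an assumed log-normal rate and the resulting Poisson counts, with a hypothetical latent mean and noise level.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)
mu, sigma = math.log(4.0), 0.3  # hypothetical latent mean and noise level
rates = [math.exp(rng.gauss(mu, sigma)) for _ in range(20000)]
counts = [sample_poisson(lam, rng) for lam in rates]

var_rate = variance(rates)
var_count = variance(counts)
mean_rate = sum(rates) / len(rates)
# Law of total variance: Var(h) = E[lambda] + Var(lambda), which exceeds Var(lambda).
print(var_rate, var_count, mean_rate + var_rate)
```

Under these assumptions the sampled count variance is close to the sum of the mean rate and the rate variance, which is why the count is always the noisier of the two quantities.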

The development procedure of the forecast model is summarized in Fig. 2. The first step is to identify the predictors, which will be described in the next section. Then, data from 29 yr (1979–2007) of TC counts in the area of Taiwan and its vicinity are used to build the Bayesian regression model using the Gibbs sampler. The forecast skill is evaluated using a cross-validation procedure.

## 4. Correlation of large-scale variables and TC activity

Chu et al. (2007) found that the seasonal tropical cyclone activity around Taiwan and its vicinity is modulated by large-scale conditions in May, represented by five environmental variables. The same variables are chosen as predictor candidates in the present study. In this section we describe the geographic locations of the predictor candidates determined by correlation analysis. Our predictor selection procedure will be discussed in the next section.

The variables of the predictor candidates are sea surface temperature (SST), sea level pressure (SLP), precipitable water (PWAT), 850-hPa relative vorticity (Vor850), and vertical wind shear (VWS). The present paper uses NCEP–DOE Reanalysis-2 data during the period of 1979–2007, while Chu et al. (2007) used NCEP–National Center for Atmospheric Research (NCAR) reanalysis (Kalnay et al. 1996) during a longer period (1970–2006). The predictors are formed by the five variables averaged at the grid points where the variable and tropical cyclone activity are correlated at the 95% confidence level. Correlation analysis between the seasonal TC occurrences and the environmental parameters in the preceding May over the tropical WNP is used to identify their physical relationships. If correlations over a particular area of the WNP are found to be statistically significant, the parameter over this critical region is identified as a predictor candidate. For a sample size of 29, this critical value is 0.37 when a two-tailed *t* test is applied. A similar analysis was also applied to other months earlier than the preceding May, but very few grid points show significant correlation. Therefore, the predictors are limited to the variables in the preceding May only.
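The critical correlation of 0.37 quoted above follows from the relation between the correlation coefficient and Student's *t* statistic with N − 2 degrees of freedom. A minimal sketch of that arithmetic is given below; the function name is illustrative, and the critical *t* value is taken from a standard table rather than computed.

```python
import math

def critical_correlation(n, t_crit):
    """Smallest |r| that is significant for sample size n, given the two-tailed t critical value."""
    dof = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + dof)

# 2.052 is the tabulated two-tailed 5% critical value of Student's t with 27 degrees of freedom.
r_crit = critical_correlation(n=29, t_crit=2.052)
print(round(r_crit, 2))  # 0.37
```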

### a. SST

The contour plot for the correlation between TC counts and SST is shown in Fig. 3a, where a large area with significant positive correlations is found in the Philippine Sea and the tropical western Pacific warm pool marked by filled circles. The average of the SST series over the critical regions is chosen as a predictor. Significant correlations are also noted near Taiwan. For the sake of simplicity this area is not included in the predictor.

### b. Vor850

Figure 3b displays the correlation between the seasonal TC counts and the antecedent low-level relative vorticity at 850 hPa (Vor850). Critical regions with significantly high positive correlations are found in a southwest-to-northeast-oriented belt extending from the southern Philippines to the western Pacific. Accordingly, greater cyclonic vorticity anomalies in the preceding May over the critical regions were instrumental for more TC activity around Taiwan.

### c. PWAT

Figure 3c displays the correlation between the seasonal TC count and the precipitable water, for which the critical regions are found mainly over the Philippine Sea. Note that the critical regions in Figs. 3b and 3c approximately coincide with each other and are located to the north of the positive correlations of the SSTs in Fig. 3a. The approximately coincident critical regions revealed in Figs. 3a–c suggest the possibility that enhanced (suppressed) low-level vortex and convective activity are driven by the warm (cold) SST anomalies over the Philippine Sea and the western Pacific warm pool. As a result, these circulation features tend to contribute to greater (lesser) TC frequency near Taiwan in the following season.

### d. SLP

The contour plot for the correlation between the seasonal TC frequency in the vicinity of Taiwan and the May SLP is shown in Fig. 3d. A large area with significant negative correlations is found in the Philippine Sea and the warm pool. The negative correlation area coincides well with the positive correlation area of SST in Fig. 3a. It is noted that the center of the critical region in Fig. 3d is near the equator while the center of the critical region in Fig. 3a is to the north of the equator. This suggests the possibility that the negative SLP might reflect the Rossby wave response to the warm SST anomaly.

### e. VWS

The contour plot for the correlation between the seasonal TC frequency in the vicinity of Taiwan and the May VWS is shown in Fig. 3e. Significant positive correlations are observed over the Indonesian portion of the Maritime Continent, covering the area of the West Caroline Basin, Sulawesi, and the Java Sea. The coincidence of the positive correlation of VWS, the negative correlation of SLP over the western Pacific, and the positive correlation of Vor850 over the Philippine Sea suggests that if the convection over the low-latitude Philippine Sea is strong in May, then seasonal TC activity near and around Taiwan tends to be more active during the following months.

## 5. Predictor selection and prediction results

### a. Predictor selection

The screening procedure for the five candidate predictors is similar to the stepwise regression method used for multivariate linear regression models (e.g., Kim et al. 2010). For multivariate regression models, the importance of a predictor is mainly judged by the Pearson correlation coefficient between the observed and predicted variables and the sum of their absolute errors. For the Bayesian multivariate regression method, however, in addition to the correlation coefficient, the importance of a predictor is judged by the mean and standard deviation of the posterior samples of its coefficient and by the ratio of the number of samples that lie to the left (right) of zero to the total number of iterations when the predictor (e.g., SST) is expected to have a positive (negative) impact on the forecast variable. The ratio is referred to as the Bayesian *p* value. In the regression model a predictor with a smaller *p* value is more important. The posterior probability density functions (PDFs) for the model parameter set are solved using a Gibbs sampler. For simplicity, for all the simulations in this study, we take the first 2000 samples as burn-in and use the following 10 000 samples as the output of the Gibbs sampler. We have the PDFs of all of the predictors besides the intercept term.
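The Bayesian *p* value defined above can be computed directly from a chain of coefficient samples. The following is a minimal sketch in Python; the function name is illustrative, and the synthetic chain (normal draws with mean 0.21 and standard deviation 0.1) merely stands in for real Gibbs sampler output.

```python
import random

def bayesian_p_value(samples, expected_sign, burn_in=2000):
    """Fraction of retained samples that fall on the side of zero opposite to the expected sign."""
    kept = samples[burn_in:]  # discard the burn-in portion of the chain
    if expected_sign > 0:
        wrong = sum(1 for b in kept if b < 0)  # samples to the left of zero
    else:
        wrong = sum(1 for b in kept if b > 0)  # samples to the right of zero
    return wrong / len(kept)

# Synthetic stand-in for 12 000 draws of one coefficient (2000 burn-in + 10 000 retained).
rng = random.Random(1)
chain = [rng.gauss(0.21, 0.1) for _ in range(12000)]
p = bayesian_p_value(chain, expected_sign=+1)
print(p)
```

A value of `p` near zero indicates that almost all of the posterior mass lies on the expected side of zero, which is the sense in which a smaller *p* value marks a more important predictor.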

The correlation coefficient is calculated based on the forecasts obtained from leave-one-out cross validation (LOOCV). The cross-validation (CV) test is a general way to verify the effectiveness of a regression method. LOOCV is a forecast procedure in which a target year is chosen and a model is developed using the remaining 28 yr of data as the training set. The observations from the selected predictors for the target year are then used as inputs to forecast the missing year. This process is repeated successively until all 29 forecasts are made.
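The leave-one-out procedure described above can be sketched as a simple loop. In the example below an ordinary least-squares line with a single predictor is used as a stand-in for the Bayesian regression model, purely to keep the sketch short; the function names and any data passed in are illustrative assumptions.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (stand-in for the Bayesian model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def loocv_predictions(xs, ys):
    """Forecast each year from a model trained on all other years."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # drop the target year
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    return preds

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

With 29 yr of data, `loocv_predictions` returns 29 independent forecasts, and `pearson_r` applied to those forecasts and the observations gives the cross-validated correlation.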

The relative importance of the predictor candidates determined by the stepwise predictor screening procedure is summarized in Table 1. For each step there are three rows listed under each variable. For each variable the following identification numbers are given: 1) Vor850, 2) VWS, 3) SLP, 4) SST, and 5) PWAT. The first row is the correlation between the observed TC counts and the median of the predicted probability distribution, indicated as “50% − *r*” in Table 1. Higher correlation reflects a better prediction. The Bayesian *p* value, mean, and standard deviation of the predicted probability distribution are presented in the third row under columns *p*, *m*, and *s*, respectively. As shown in Table 1, we first perform the prediction using forecast models that have only one predictor. It turns out that Vor850 has the highest correlation 50% − *r* (0.65) with TC counts in Taiwan, a perfect *p* value (0), and a small standard deviation *s* (0.1). Therefore, Vor850 is identified as the primary predictor. In the second step, we repeat the prediction with two predictors at a time while keeping Vor850. The correlation values alone show no discernible improvement over the first step from adding a second predictor. However, the model that includes VWS yields small *p* values for both VWS and Vor850 and, therefore, VWS is the best second predictor. In the third step, we use three predictors while retaining Vor850 and VWS. The prediction is improved slightly by including SLP. It is interesting to see the drop in correlation and the increase in *p* and *s* when SST is included. Therefore, Vor850, VWS, and SLP are finally selected to formulate the Bayesian regression forecast model.

The predicted maximum and average probabilities of the TC counts through a LOOCV are plotted together with the actual observation for each year in Figs. 4a and 4b, respectively. The Pearson correlation between the maximum probability of predictive TC counts and independent observations is 0.672. The correlation between the average predictive TC counts and observations is also 0.672. The average TC predictive counts are computed as the sum of the TC count weighted by its probability density. The median and upper and lower quartiles (the upper 75% and lower 25%) of the predicted TC counts are plotted in Fig. 5. The distance between the upper and lower quartiles determines the central 50% of the predicted TC variations. The correlation between the median of the predictive rate and independent observations is 0.673. The skill of the deterministic forecast of the current model is comparable to that of the LAD model (0.673 versus 0.69). Out of a total of 29 yr, there are only 3 yr (2004, 2005, and 2007) in which the actual TC counts lie outside the predictive central 50% boundaries. Possible reasons for prediction failure will be discussed in the last section.
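The summary statistics used above (the count of maximum probability, the probability-weighted average, and the quartiles) can all be read off a predictive distribution over counts. A minimal sketch follows; the function name and the example distribution are hypothetical, and the quartiles here are taken from the count distribution as a simplifying assumption.

```python
def summarize_pmf(pmf):
    """pmf[h] is the predictive probability of h TCs; returns mode, mean, and quartiles."""
    mode = max(range(len(pmf)), key=lambda h: pmf[h])  # count of maximum probability
    mean = sum(h * p for h, p in enumerate(pmf))       # probability-weighted average

    def quantile(q):
        # Smallest count whose cumulative probability reaches q.
        cum = 0.0
        for h, p in enumerate(pmf):
            cum += p
            if cum >= q:
                return h
        return len(pmf) - 1

    return mode, mean, (quantile(0.25), quantile(0.5), quantile(0.75))

# Hypothetical predictive distribution for counts 0..6.
example = [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05]
mode, mean, quartiles = summarize_pmf(example)
print(mode, round(mean, 2), quartiles)  # 3 3.0 (2, 3, 4)
```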

The reason the parameters *p*, *m*, and *s* in Table 1 can be used to select predictors can be understood by showing the posterior PDFs of the predictors in Fig. 6. The kernel-estimated marginal PDF for the parameter set, *β* and *σ*, is calculated for all the samples by convolving the resulting frequency of the target samples with a smoothing filter. Figure 6 shows that the posterior PDF of Vor850 has the largest mean value *m* (0.21) and the smallest Bayesian *p* value (0.02), which measures the ratio of the number of samples that lie to the left of zero to the total number of iterations. Both Vor850 and VWS show clear positive correlation, while SLP shows clear negative correlation with the TC counts. SLP has the largest *p* value (0.16).

### b. Forecast skill assessment

The accuracy of a probabilistic forecast method can be measured by its reliability, sharpness, and resolution. Reliability measures the agreement between forecast probability and mean observed frequency. Sharpness measures the skill of forecasting probabilities near 0 or 1. Resolution measures the ability of the forecast to resolve the set of sample events into subsets with characteristically different outcomes. Only resolution can be evaluated in the present study due to the nature of the small sample size of the observed seasonal TC counts. We use the relative operating characteristic (ROC) diagram to evaluate resolution.

It is important to know the characteristics of TC counts in terms of probability distribution before evaluating probabilistic forecast skill. The relationship between the TC counts and forecast probabilities is illustrated by the histogram of the 29-yr TC counts and its cumulative probability diagram presented in Figs. 7a and 7b, respectively. The thin dashed line in Fig. 7b is used as a threshold for distinguishing among groups in categorical forecasts. Figure 7a shows that the TC count varies from 1 to 8 in the data from 1979 to 2007. The cumulative probability diagram in Fig. 7b shows that a count of 3 lies slightly above the 30% cumulative probability and a count of 5 lies slightly above 70%. To have a nearly even number of members in each category, as tercile classification implies, we define the below-normal category as TC counts less than or equal to 3 and the above-normal category as counts equal to or larger than 6. The normal category is a very narrow band that includes only counts of 4 and 5.
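The category boundaries stated above, together with the empirical cumulative probability used to choose them, can be expressed compactly. The sketch below uses the thresholds from the text; the list of seasonal counts is a hypothetical sample, not the observed 1979–2007 series.

```python
def tc_category(count):
    """Tercile-style category: BN for counts <= 3, AN for counts >= 6, N for 4 and 5."""
    if count <= 3:
        return "BN"
    if count >= 6:
        return "AN"
    return "N"

def ecdf(counts, value):
    """Empirical cumulative probability: fraction of seasons with count <= value."""
    return sum(1 for c in counts if c <= value) / len(counts)

# Hypothetical seasonal counts, for illustration only.
sample_counts = [1, 2, 3, 3, 4, 4, 5, 5, 6, 8]
print([tc_category(c) for c in sample_counts])
print(ecdf(sample_counts, 3), ecdf(sample_counts, 5))  # 0.4 0.8
```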

To construct the ROC diagram, the range of forecast probabilities is divided into 10 bins (0%–10%, 11%–20%, 21%–30%, etc.). The ROC diagram is constructed by plotting the hit rate (HR) against the false alarm rate (FAR) at the accumulated probability of each of the 10 bins, as in Fig. 8. The curve connecting the 10 dots in Fig. 8 is called the ROC curve, which measures the ability of the forecast model to discriminate between hits and misses in terms of the occurrence probability. The ROC area is defined as the ratio of the area below the ROC curve to the entire plotting area. If the ROC area is 0.5 or less, the forecast model cannot discriminate between high and low occurrence probabilities. The ROC area presented in Fig. 8 is 0.6, suggesting that the Bayesian regression forecast model is moderately skillful in discriminating high and low occurrence probabilities.
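The construction of the ROC curve and its area described above can be sketched as follows. This is a minimal illustration, assuming that a warning is issued whenever the forecast probability reaches a threshold and that the area is estimated by the trapezoidal rule; the function names and test data are hypothetical.

```python
def roc_points(probs, events, thresholds):
    """(false alarm rate, hit rate) pairs when a warning is issued for prob >= threshold."""
    n_event = sum(events)
    n_non = len(events) - n_event
    points = []
    for t in thresholds:
        hits = sum(1 for p, e in zip(probs, events) if e and p >= t)
        false_alarms = sum(1 for p, e in zip(probs, events) if not e and p >= t)
        points.append((false_alarms / n_non, hits / n_event))
    return points

def roc_area(points):
    """Trapezoidal area under the curve after adding the (0, 0) and (1, 1) end points."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

thresholds = [k / 10 for k in range(1, 11)]  # upper edges of the 10 probability bins

# A forecast that separates events from non-events perfectly has area 1.
perfect = roc_area(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], thresholds))
# A forecast that always issues the same probability has area 0.5 (no discrimination).
flat = roc_area(roc_points([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0], thresholds))
print(perfect, flat)  # 1.0 0.5
```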

A biased forecast may still have good resolution. The ROC curve is not sensitive to forecast bias and therefore provides no information about reliability. A good ROC curve suggests that it may be possible to improve the forecast through calibration. Therefore, the ROC can be considered to be a measure of potential usefulness. The model bias and the need for calibration can be seen in Table 2, which shows the forecasted TC counts of maximum probability tabulated against the observed results. A negative bias of the forecast model is clearly evident in Table 2, which means that the forecasted TC counts tend to be lower than the actual occurrences. The bias inherent in the assumed Poisson distribution of TC counts, which will be discussed in the last section, implies that the probabilistic forecast results need to be calibrated and transformed into a categorical forecast. The forecast skills associated with categorical forecasts are easier to understand for most users.

A common practice in forecasting the seasonal outlook of TC counts is to categorize typhoon activity as above normal, normal, or below normal. In principle, the counts at which the empirical cumulative distribution function (ECDF) in Fig. 7b reaches 33% and 67% should be used as the reference values for categorizing the forecast results when the outcome is expressed in terciles. In the present study, the TC counts that are closest to ECDF = 33% are 3 or less (Fig. 7b) and these are classified as below normal (BN). The count closest to ECDF = 67% is 5, so counts of 6 or larger are considered to be above normal (AN). To have nearly even samples in the tercile categories, the normal category (N) should have TC counts of 4 and 5; however, a count of 6 actually corresponds to ECDF = 87%. The uneven probability portions in AN and BN imply that the occurrence probability of AN is naturally less than that of BN when the observed TC counts are simulated by the Poisson distribution. The inconsistency between the actual occurrence rate and the assumed Poisson probability can be adjusted by a simple calibration procedure as follows.

The calibration is done on the basis of the cumulative probability of the predictive categories based on the 29-yr LOOCV forecasts. The cumulative probability of predictive counts is presented in Fig. 9. The cumulative probability of the predictive BN (≤3) and AN (≥6) categories is represented by solid and long-dashed lines, respectively, and the normal category N(4, 5) is represented by a short-dashed line. Figure 9 suggests that based on the predictive probability derived from 29 LOOCV forecast experiments, the lowest probability of BN is 0.11 (11%) and the highest probability of BN is 0.83 (83%), which refers to the points at which ECDF = 0 and 1, respectively, in Fig. 9. For the BN category the cumulative probability is 0.3151 when ECDF = 33% and 0.5721 when ECDF = 67%. Similarly, for category N the ECDF = 33% cumulative probability is 0.2622 and ECDF = 67% cumulative probability is 0.3042. For AN, the ECDF = 33% cumulative probability is 0.1402 and ECDF = 67% cumulative probability is 0.3536. The cumulative probabilities of ECDF = 33% and 67% are used as reference values for determining the predictive likelihood of a specific category. If the cumulative probability of a specific category is lower than the ECDF = 33% reference value, then it is unlikely that such category will occur. In contrast, a category is likely to occur when the predictive cumulative probability of the category is higher than its ECDF = 67% reference value.
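The likely/unlikely rule described above can be written as a small lookup. The sketch below uses the reference values quoted in the text; the function names and the label for probabilities that fall between the two reference values are assumptions introduced here, since the text does not name that case.

```python
# Reference values quoted in the text: (ECDF = 33%, ECDF = 67%) for each category.
REFERENCE = {
    "BN": (0.3151, 0.5721),
    "N": (0.2622, 0.3042),
    "AN": (0.1402, 0.3536),
}

def category_likelihood(category, prob):
    """Compare a predictive category probability with its two reference values."""
    low, high = REFERENCE[category]
    if prob < low:
        return "unlikely"
    if prob > high:
        return "likely"
    return "indeterminate"  # assumed label; the text does not name this case

def category_probs(pmf):
    """Sum a predictive PMF (pmf[h] = P(h TCs)) into BN (<= 3), N (4, 5), and AN (>= 6)."""
    return {"BN": sum(pmf[:4]), "N": sum(pmf[4:6]), "AN": sum(pmf[6:])}

# Hypothetical predictive PMF for counts 0..9, for illustration only.
probs = category_probs([0.02, 0.05, 0.10, 0.15, 0.20, 0.18, 0.14, 0.09, 0.05, 0.02])
print({c: category_likelihood(c, p) for c, p in probs.items()})
```

In this hypothetical case the normal category exceeds its upper reference value while the other two fall between their reference values, so the forecast would lean toward a normal season.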

To further explain the calibration procedure, we present the predictive PDFs of each year during 1979–2007 in Fig. 10. The climatological PDF as presented in Fig. 7a is shown in the background of Fig. 10 in gray. The observed TC count in each year is marked by a filled bar. Each year the predictive cumulative probability of AN, N, and BN categories can be directly calculated based on the PDF. Using the categorical reference values at ECDF = 33% and 67%, we can determine which category is most likely to occur. Thus, the probabilistic forecast result (the predictive PDF) can be converted into a categorical forecast result (categories AN, N, or BN). The contingency table for the category forecast after calibration adjustment is presented in Table 3. The zero number of false forecasts for the opposite category clearly reflects the capability of the Bayesian regression forecast for capturing the categorical forecasts correctly.

## 6. Discussion

Seasonal forecasts of tropical cyclone activity were pioneered by Nicholls (1979) and Gray (1984a,b). For the western North Pacific, Chan et al. (1998) have performed seasonal forecasts of tropical cyclone activity. Skillful forecasts are noted for some basin-wide predictands such as the annual number of typhoons. In the past, while progress was made on forecasting basin-wide seasonal or annual typhoon activity, little attention was paid to forecasting regional activity. The lack of regional information for particular typhoon-threatened subbasin regions poses problems for adequate long-term planning of regional emergency management and hazard mitigation. In particular, prediction of the landfall frequency on specific coastal areas is sorely needed, as many regions in East Asia are vulnerable to typhoons. In this paper, we present a probabilistic model that has been shown to be skillful in predicting seasonal TC numbers for a region. The categorical forecast skill from the Bayesian regression model is better than that achieved by climatology and persistence methods.

### a. Physical interpretation of the predictability

Three climate variables (Vor850, VWS, and SLP) are used in the prediction model. The predictor screening procedure shows that the most important variable is Vor850. Vor850 probably stands out because it captures variations in the ridge position of the westward extension of the western Pacific subtropical high (WPSH) in May, which can be a precursor signaling how the WPSH will evolve in the following months. This speculation is supported by maps of the lag correlation between Vor850 in May and the low-level wind fields over the Philippine Sea and western Pacific in the following months (figures not shown). Note that the correlation is higher during June–August and lower in later months.
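A lag correlation of this kind can be sketched as the Pearson correlation between a May index and the same-year index of a later month, computed across years. The sketch below is illustrative only: the function names and the toy series are hypothetical, a single index per month stands in for the gridded wind fields, and no significance testing is included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def lag_correlations(may_index, later_months):
    """Correlate a May index (one value per year) with each later month.

    later_months maps a month label to a series with one value per year,
    aligned year by year with may_index.
    """
    return {month: pearson_r(may_index, series)
            for month, series in later_months.items()}


# Toy (hypothetical) four-year series, not data from the paper.
may_vor850 = [1.0, 2.0, 3.0, 4.0]
later = {"Jun": [2.0, 4.0, 6.0, 8.0], "Oct": [1.0, -1.0, 1.0, -1.0]}
print(lag_correlations(may_vor850, later))
```

Applying the same calculation at every grid point of a later-month field would yield a correlation map of the type referred to above.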

Variations of the WPSH were found to be influenced by SST and convective activity over the tropical Indian Ocean–western Pacific on the decadal time scale (Hu 1997; Gong and Ho 2002; Zhou et al. 2009). For interannual variations, convection and SST over the Philippine Sea are the major influential factors (Lu 2001; Lu and Dong 2001). ENSO is not a major factor causing systematic interannual variations of the WPSH. The outstanding correlation between Vor850 and TCs affecting Taiwan found in the present study suggests that remote SST variations such as ENSO cannot explain enough of the variance of the track anomalies that are important to Taiwan. The potential SST predictor presented in Fig. 3a is less important than Vor850, VWS, and SLP, which suggests that SST is a secondary factor influencing the seasonal tendency of TC tracks. However, SST can have an indirect influence by affecting convection, which in turn modulates the westward extension of the WPSH (Tu et al. 2009). Such a process can be captured by the selected predictors of the present prediction model. In summary, the WPSH is the key system that modulates TC tracks over the Philippine Sea and the western end of the WNP. Convection and SST over the Philippine Sea near the equator, as captured by VWS and SLP, can significantly influence the WPSH, which in turn influences the tracks of TCs affecting Taiwan.

### b. 2008 and 2009 prediction results

The forecast model developed in this paper was applied to 2008 and 2009 as an operational test. The data from these 2 yr were not used in the model development and evaluation. The forecast results are presented in Figs. 11a and 11b. For 2008, the prediction shows below-normal TC activity, with the maximum probability at only one TC affecting Taiwan (Fig. 11a). However, verification shows normal TC activity, with four TCs affecting Taiwan. For 2009, the prediction shows normal TC activity, with the maximum probability at four TCs, and verification also shows normal TC activity, with four TCs affecting Taiwan; the 2009 prediction therefore matches the observation exactly.

Although Taiwan was affected by four TCs in both 2008 and 2009, the temporal pattern of occurrence was very different in the two years. In 2008, one TC occurred in July and three in September. The September cluster resulted from a strong easterly wave associated with strong easterly trade winds. In 2009, the four TCs were spread evenly, with one each in June, July, August, and October, and there was no obvious clustering. The contrast between these two years suggests that the Bayesian regression model performs well when TC occurrences are evenly distributed in time but not when they are clustered. We note in Fig. 5 that the model performed poorly after 2000, particularly in 2004, 2005, and 2007. In 2004 there was strong MJO modulation of TC activity (Nakazawa 2006; Hsu et al. 2008). In 2005 and 2007, successive TCs influenced by strong easterly waves approached Taiwan within 1 month of each other. These results are consistent with what we found for 2008.

Li and Fu (2006) pointed out that the Rossby wave train in the wake of a preexisting TC creates favorable conditions for successive TCs to occur. In this case, the successive TCs that form in the confluence region of the Pacific easterlies and the Asian monsoon westerlies (Lau and Lau 1990; Chang et al. 1996) cannot be considered independent. Such a phenomenon violates the Poisson assumption for TC counts in the Bayesian regression model (i.e., that the occurrence of TCs in a particular time period is independent of previous occurrences), which may cause prediction failure.
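One simple, commonly used check on the Poisson assumption is the dispersion index, the ratio of the sample variance to the sample mean of the counts, which is close to 1 for Poisson-distributed counts and larger when events cluster. The sketch below is not a diagnostic used in this paper; the function name and the toy counts are hypothetical.

```python
def dispersion_index(counts):
    """Sample variance divided by sample mean of a series of counts.

    Values near 1 are consistent with a Poisson distribution; values well
    above 1 indicate overdispersion, as expected when events are clustered.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("mean count is zero; index is undefined")
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Toy (hypothetical) count series, not observed TC counts.
evenly_spread = [2, 4, 6]
clustered = [0, 0, 9]
print(dispersion_index(evenly_spread), dispersion_index(clustered))
```

A ratio above 1 can also arise from year-to-year changes in the Poisson rate, which a regression model is designed to absorb, so the index is only suggestive of dependence between events.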

### c. Recommendation for real-time operational forecast

In real-time operational forecasting, the monthly mean reanalysis data for May are not available before June, so the presented model cannot meet the strict requirements for an operational forecast. Therefore, for operational practice we recommend issuing the forecast around 10 June to predict the total TC count during the period 20 June–30 November. We repeated the same model development procedure described in this paper and examined the forecast skill. The recommended procedure can produce slightly better forecasts for the years with TCs affecting Taiwan before 20 June.

At the Central Weather Bureau of Taiwan, May and June are considered to be the mei-yu season. The mei-yu front is usually associated with the northern rim of the WPSH. Concurrent with the seasonal development of the South Asian monsoon, the ridge at the western edge of the WPSH starts to move northward, from the Philippine Sea (20°N, 135°E) on pentad 31 (31 May–4 June) to the latitudes of Taiwan (23°N, 129°E) on pentad 35 (20–24 June) (Nagata and Mikami 2010). In other words, Taiwan is less influenced by tropical disturbances from the western Pacific before mid-June. Therefore, the recommendations we propose for operational forecasting are consistent with the large-scale climate conditions. Note that additional deterministic prediction information can be generated by a multivariate linear regression model that has been shown to be skillful by the standard verification procedure (Chu et al. 2007).

For future development, in addition to exploring more potential predictors such as Arctic sea ice and North Pacific indices (Fan 2007; Wang et al. 2007), we plan to expand the statistical forecast model to a hybrid dynamical–statistical configuration similar to what is done for the seasonal forecast of Atlantic hurricane activity using NCEP dynamical seasonal forecasts (Wang et al. 2009). Because the WPSH is the most influential large-scale system that affects the interannual variations of the TC activity near Taiwan, the dynamical forecast system needs to produce reliable forecast information about the WPSH. This is a challenging demand because the variability of WPSH is strongly modulated by the SST and convective activity in the region of the tropical Indian Ocean, Philippine Sea, and western Pacific (Hu 1997; Lu 2001; Lu and Dong 2001; Gong and Ho 2002; Zhou et al. 2009). The negative bias associated with successive TCs suggests that accumulated cyclone energy (ACE) might be a better predictor than TC frequency. Research in this direction is beyond the scope of the current study.

While typhoons bring strong winds, storm surges, and huge waves, they also bring beneficial rainfall to Taiwan, as a majority of the annual rainfall comes from typhoons. If the number of landfalling typhoons is lower than expected in a typhoon season, the likelihood of drought in the following year is high. Many other tropical coastal areas or islands have problems similar to those of Taiwan, namely, natural year-to-year variability in tropical cyclone activity and, as populations have soared, an increasing demand for the freshwater that typhoons supply. It is hoped that the method demonstrated here will also be of value to other areas in East Asia (e.g., the Philippines, China) and Southeast Asia (e.g., Vietnam) in better predicting regional typhoon activity. This could in turn be a vital tool for government agencies engaged in long-lead-time disaster-mitigation planning and water resources management.

## Acknowledgments

Thanks are due to May Izumi for her editorial assistance. This study was supported in part by the Central Weather Bureau under the Hazardous Weather Monitoring and Forecasting Enhancement Project and by the National Science Council of the Republic of China under Grants NSC98-2625-M-052-009 and NSC99-2625-M-052-002-MY3 to the Central Weather Bureau.

## REFERENCES

Camargo, S. J., A. G. Barnston, P. J. Klotzbach, and C. W. Landsea, 2007a: Seasonal tropical cyclone forecasts. *WMO Bull.*, **56**, 297–307.

Camargo, S. J., A. W. Robertson, S. J. Gaffney, P. Smyth, and M. Ghil, 2007b: Cluster analysis of typhoon tracks. Part I: General properties. *J. Climate*, **20**, 3635–3653.

Camargo, S. J., A. W. Robertson, S. J. Gaffney, P. Smyth, and M. Ghil, 2007c: Cluster analysis of typhoon tracks. Part II: Large-scale circulation and ENSO. *J. Climate*, **20**, 3654–3676.

Chan, J. C. L., 1985: Tropical cyclone activity in the northwest Pacific in relation to the El Niño/Southern Oscillation phenomenon. *Mon. Wea. Rev.*, **113**, 599–606.

Chan, J. C. L., 2000: Tropical cyclone activity over the western North Pacific associated with El Niño and La Niña events. *J. Climate*, **13**, 2960–2972.

Chan, J. C. L., J. S. Shi, and C. M. Lam, 1998: Seasonal forecasting of tropical cyclone activity over the western North Pacific and the South China Sea. *Wea. Forecasting*, **13**, 997–1004.

Chan, J. C. L., J. E. Shi, and K. S. Liu, 2001: Improvements in the seasonal forecasting of tropical cyclone activity over the western North Pacific. *Wea. Forecasting*, **16**, 491–498.

Chang, C. P., J. M. Chen, P. A. Harr, and L. E. Carr, 1996: Northwestward-propagating wave patterns over the tropical western North Pacific during summer. *Mon. Wea. Rev.*, **124**, 2245–2266.

Choi, K-S., D-W. Kim, and H-R. Byun, 2009: Statistical model for seasonal prediction of tropical cyclone frequency around Korea. *Asia-Pac. J. Atmos. Sci.*, **45**, 21–32.

Chu, P. S., 2004: ENSO and tropical cyclone activity. *Hurricanes and Typhoons: Past, Present, and Future*, R. J. Murnane and K. B. Liu, Eds., Columbia University Press, 297–332.

Chu, P. S., and X. Zhao, 2007: A Bayesian regression approach for predicting seasonal tropical cyclone activity over the central North Pacific. *J. Climate*, **20**, 4002–4013.

Chu, P. S., X. Zhao, C. T. Lee, and M. M. Lu, 2007: Climate prediction of Taiwan cyclone activity in the vicinity of Taiwan using the multivariate least absolute deviation regression method. *Terr. Atmos. Oceanic Sci.*, **13**, 469–498.

Clark, J. D., and P. S. Chu, 2002: Interannual variation of tropical cyclone activity over the central North Pacific. *J. Meteor. Soc. Japan*, **80**, 403–418.

Elsner, J. B., and T. H. Jagger, 2004: A hierarchical Bayesian approach to seasonal hurricane modeling. *J. Climate*, **17**, 2813–2827.

Elsner, J. B., and T. H. Jagger, 2006: Prediction models for annual U.S. hurricane counts. *J. Climate*, **19**, 2935–2952.

Fan, K., 2007: North Pacific sea ice cover, a predictor for the western North Pacific typhoon frequency? *Sci. China Ser. Dokl. Earth Sci.*, **50**, 1251–1257.

Fan, K., and H. Wang, 2009: A new approach to forecasting typhoon frequency over the western North Pacific. *Wea. Forecasting*, **24**, 974–986.

Gong, D. Y., and C. H. Ho, 2002: Shift in the summer rainfall over the Yangtze River valley in the late 1970s. *Geophys. Res. Lett.*, **29**, 1436, doi:10.1029/2001GL014523.

Gray, W. M., 1984a: Atlantic seasonal hurricane frequency. Part I: El Niño and 30 mb quasi-biennial oscillation influences. *Mon. Wea. Rev.*, **112**, 1649–1668.

Gray, W. M., 1984b: Atlantic seasonal hurricane frequency. Part II: Forecasting its variability. *Mon. Wea. Rev.*, **112**, 1669–1683.

Harr, P. A., and R. L. Elsberry, 1995: Large-scale circulation variability over the tropical western North Pacific. Part I: Spatial patterns and tropical cyclone characteristics. *Mon. Wea. Rev.*, **123**, 1225–1246.

Ho, C. H., J. J. Baik, J. H. Kim, D. Y. Gong, and C. H. Sui, 2004: Interdecadal changes in summertime typhoon tracks. *J. Climate*, **17**, 1767–1776.

Ho, C. H., J. H. Kim, H. S. Kim, C. H. Sui, and D. Y. Gong, 2005: Possible influence of the Antarctic Oscillation on tropical cyclone activity in the western North Pacific. *J. Geophys. Res.*, **110**, D19104, doi:10.1029/2005JD005766.

Ho, C. H., H. S. Kim, and P. S. Chu, 2009: Seasonal prediction of tropical cyclone frequency over the East China Sea through a Bayesian Poisson-regression method. *Asia-Pac. J. Atmos. Sci.*, **45**, 45–54.

Hsu, H. H., Y. L. Chen, A. K. Lo, C. H. Hung, W. S. Kau, and C. C. Wu, 2008: Intraseasonal oscillation–tropical cyclone coupling in the western North Pacific during the 2004 typhoon season. *Recent Progress in Atmospheric Sciences: Applications to the Asia-Pacific Region*, K. N. Liou and M. D. Chou, Eds., World Scientific, 49–65.

Hu, Z. Z., 1997: Interdecadal variability of summer climate over East Asia and its association with 500 hPa height and global sea surface temperature. *J. Geophys. Res.*, **102**, 19403–19412.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471.

Kanamitsu, M., W. Ebisuzaki, J. Woollen, S. K. Yang, J. J. Hnilo, M. Fiorino, and G. L. Potter, 2002: NCEP–DOE AMIP-II reanalysis (R-2). *Bull. Amer. Meteor. Soc.*, **83**, 1631–1643.

Kim, H. S., C. H. Ho, P. S. Chu, and J. H. Kim, 2010: Seasonal prediction of summertime tropical cyclone activity over the East China Sea using the least absolute deviation regression and the Poisson regression. *Int. J. Climatol.*, **30**, 210–219.

Kim, J. H., C. H. Ho, H. S. Kim, C. H. Sui, and S. K. Park, 2008: Systematic variation of summertime tropical cyclone activity in the western North Pacific in relation to the Madden–Julian oscillation. *J. Climate*, **21**, 1171–1191.

Ko, K. C., and H. H. Hsu, 2006: Sub-monthly circulation features associated with tropical cyclone tracks over the East Asian monsoon area during July–August season. *J. Meteor. Soc. Japan*, **84**, 871–889.

Ko, K. C., and H. H. Hsu, 2009: ISO modulation on the submonthly wave pattern and recurving tropical cyclones in the tropical western North Pacific. *J. Climate*, **22**, 582–599.

Kuo, H. C., J. H. Chen, R. T. Williams, and C. P. Chang, 2001: Rossby waves in zonally opposing mean flow: Behavior in the northwest Pacific summer monsoon. *J. Atmos. Sci.*, **58**, 1035–1050.

Kwon, H. J., W. J. Lee, S. H. Won, and E. J. Cha, 2007: Statistical ensemble prediction of the tropical cyclone activity over the western North Pacific. *Geophys. Res. Lett.*, **34**, L24805, doi:10.1029/2007GL032308.

Lau, K. H., and N. C. Lau, 1990: Observed structure and propagation characteristics of tropical summertime synoptic scale disturbances. *Mon. Wea. Rev.*, **118**, 1888–1913.

Lee, W. J., J. S. Park, and H. J. Kwon, 2007: A statistical model for prediction of the tropical cyclone activity over the western North Pacific. *Asia-Pac. J. Atmos. Sci.*, **43**, 175–183.

Li, T., and B. Fu, 2006: Tropical cyclogenesis associated with Rossby wave energy dispersion of a preexisting typhoon. Part I: Satellite data analyses. *J. Atmos. Sci.*, **63**, 1377–1389.

Liu, K. S., and J. C. L. Chan, 2002: Synoptic flow patterns associated with small and large tropical cyclones over the western North Pacific. *Mon. Wea. Rev.*, **130**, 2134–2142.

Liu, K. S., and J. C. L. Chan, 2003: Climatological characteristics and seasonal forecasting of tropical cyclones making landfall along the South China coast. *Mon. Wea. Rev.*, **131**, 1650–1662.

Lu, R., 2001: Interannual variability of the summertime North Pacific subtropical high and its relation to atmospheric convection over the warm pool. *J. Meteor. Soc. Japan*, **79**, 771–783.

Lu, R., and B. W. Dong, 2001: Westward extension of North Pacific subtropical high in summer. *J. Meteor. Soc. Japan*, **79**, 1229–1241.

Montgomery, M. T., and B. F. Farrell, 1993: Tropical cyclone formation. *J. Atmos. Sci.*, **50**, 285–310.

Nagata, R., and T. Mikami, 2010: Response of the summer atmospheric circulation over East Asia to SST variability in the tropical Pacific. *Int. J. Climatol.*, **30**, 813–826.

Nakazawa, T., 2006: Madden–Julian oscillation activity and typhoon landfall on Japan in 2004. *Sci. Online Lett. Atmos.*, **2**, 136–139.

Nicholls, N., 1979: A possible method for predicting seasonal tropical cyclone activity in the Australian region. *Mon. Wea. Rev.*, **107**, 1221–1224.

Rodriguez, J., F. Vos, R. Below, and D. Guha-Sapir, 2009: Annual disaster statistical review 2008. Centre for Research on the Epidemiology of Disasters (CRED), Brussels, Belgium, 33 pp. [Available online at http://www.cred.be/sites/default/files/ADSR_2009.pdf].

Sadler, J. C., 1978: Mid-season typhoon development and intensity changes and the tropical upper tropospheric trough. *Mon. Wea. Rev.*, **106**, 1137–1152.

Tu, J. Y., C. Chou, and P. S. Chu, 2009: The abrupt shift of typhoon activity in the vicinity of Taiwan and its association with western North Pacific–East Asian climate change. *J. Climate*, **22**, 3617–3628.

Wang, B., and J. C. L. Chan, 2002: How strong ENSO events affect tropical storm activity over the western North Pacific. *J. Climate*, **15**, 1643–1658.

Wang, H. J., J. Q. Sun, and K. Fan, 2007: Relationships between the North Pacific Oscillation and the typhoon/hurricane frequencies. *Sci. China Ser. Dokl. Earth Sci.*, **50**, 1409–1416.

Wang, H., J. K. E. Schemm, A. Kumar, W. Wang, L. Long, M. Chelliah, G. Bell, and P. Peng, 2009: A statistical forecast model for Atlantic seasonal hurricane activity based on the NCEP dynamical seasonal forecast. *J. Climate*, **22**, 4481–4500.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences*. 2nd ed. Academic Press, 627 pp.

WMO, 2002: *Standardised Verification System (SVS) for Long-Range Forecasts (LRF)*. *Manual on the GDPS*, WMO No.-485, Vol. 1, 6 pp.

Yasunari, T., 1991: The monsoon year—A new concept of the climatic year in the Tropics. *Bull. Amer. Meteor. Soc.*, **72**, 1131–1138.

Zhou, T., and Coauthors, 2009: Why the western Pacific subtropical high has extended westward since the late 1970s. *J. Climate*, **22**, 2199–2215.

Correlation coefficient (50% − *r*) of the median probability and the Bayesian *p* value (*p*), mean (*m*), and standard deviation (*s*) of the predicted probability distribution of the predictors in each stage of predictor screening.

Contingency table for the observed and forecasted TC counts from 0 to 10. The forecasted counts presented here are the counts that have the maximum probability in the forecasted probability distribution.

Contingency table for the tercile category forecasted by the Bayesian model after calibration using the 29-yr LOOCV forecast results.