## 1. Introduction

Tropical cyclones are natural events with highly destructive impacts. Hurricanes Mitch and Georges, which made landfall over North America in 1998, showed once again the importance of increasing the accuracy of tropical cyclone track prediction. The cyclone track depends in a nonlinear way on many parameters, such as the large-scale wind and pressure fields, the surface conditions, and the field of moist convection (see, e.g., Holland 1983; Elsberry 1995). Most tracks are reasonably regular but, at times, surprising loops or sudden changes in direction or velocity are observed. Numerical weather prediction (NWP) models forecasting tropical cyclones and their tracks have substantially improved during the last decade. In particular, ensemble methods, where they have been introduced, reduce the influence of the initial uncertainty on the forecasts (see, e.g., Zhang and Krishnamurti 1997). In addition to the improvements of NWP models in predicting tropical cyclone motion, there is also a need to improve the performance of empirical models. In this connection, Leslie and Fraedrich (1990) point out the advantages of combining NWP and empirical schemes, and error recycling methods have been proposed as a further improvement (Fraedrich et al. 2000).

The purpose of this paper is to present an analog forecast scheme that, in contrast to the Hurricane Analog model (HURRAN; see Hope and Neumann 1970), is self-adapting. Fraedrich and Rückert (1998) developed this scheme to predict time series of nonlinear chaotic systems. Motivated by their results, we apply it to tropical cyclone track forecasts and extend it to ensemble forecasts (Sievers 1998). For verification purposes these forecasts are compared with the predictions of the CLIPER model (Climatology and Persistence; Neumann 1972; Neumann and Leftwich 1977), which is generally used as a reference model. The outline of this note is as follows. In section 2, the basic idea and structure of the analog model and its extension to ensemble forecasting are presented. Section 3 applies the method to data in the Atlantic and east Pacific basins. Finally, section 4 summarizes the results and highlights the important features.

## 2. Self-adapting analog forecast scheme

In principle, forecasting future values of a time series from a set of measurements is possible if patterns of these values have a one-to-one correspondence to states of an underlying dynamical system. The embedding of the dynamical system in a state space spanned by measured variables (see, e.g., Sauer et al. 1991) provides the theoretical background for analyzing and forecasting dynamical systems. Besides other statistical models, one option is to use an analog scheme, which rests on the assumption that events with equal (or at least similar) initial states develop in equal (or at least similar) ways. In HURRAN (Hope and Neumann 1970), an analog to a given forecast situation is defined as a tropical cyclone from the historical file having temporal and spatial characteristics similar to a current storm. After numerous trial-and-error model runs, the authors accepted as analogs those tropical cyclones passing within 240 km of the current cyclone, moving within 22.5° of its direction, and occurring within 15 Julian days of it. This approach is extended here, following Fraedrich and Rückert (1998), who developed a method that iteratively reduces a user-defined forecast error by suitably fitting metric weights for the components of the reconstructed states entering the analog scheme. In this sense, the analog scheme adapts itself to an optimal prediction in the dependent dataset. The analog forecast model is built in the following four steps.

*Step A:* The first step consists of reconstructing the state space and defining the error measure used. For adapting the best weights, the library of states with zero mean and unit standard deviation is split into a dependent and an independent dataset. The dependent dataset is used for model building, the independent dataset for model verification. The weights are adapted by minimizing an individual forecast error, *e*(*t*_{j}), which characterizes the distance between the observed and the *j*-analog position of the cyclone track for all lead times *i* = +6, +12, . . . , +72 h [Eq. (1)]. Here, [*x*_{1}(*t*_{j}), *x*_{2}(*t*_{j})] always denotes the zonal and meridional position (longitude and latitude) of the *j*-analog state, which is defined as **x**(*t*_{j}) = [*x*_{1}(*t*_{j}), . . . , *x*_{D}(*t*_{j})] with the embedding dimension, *D*, in the dependent dataset; [*x*_{1}(*t*_{0}), *x*_{2}(*t*_{0})] is the position of the observed initial state. The individual error, *e*(*t*_{j}), has a close relation to the verification error *E* [see Eq. (6)], but it is only used as a sorting condition in step C.
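As an illustration of step A, the following minimal Python sketch standardizes each state component and evaluates an individual forecast error for one analog. The function names, the use of the population standard deviation, and the choice of a plain Euclidean position distance summed over the 12 lead times are assumptions made for this sketch; they are not taken from Eq. (1).

```python
import math
from statistics import mean, pstdev

# Lead times +6, +12, ..., +72 h as stated in the text.
LEAD_TIMES_H = list(range(6, 73, 6))


def standardize(columns):
    """Scale each component (one list per component) to zero mean, unit std.

    Assumption: the population standard deviation is used.
    """
    scaled = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled.append([(v - mu) / sigma for v in col])
    return scaled


def individual_error(observed_track, analog_track):
    """Sum of position distances between observed and analog track.

    Each track is a list of (x1, x2) positions, one entry per lead time.
    Assumption: Euclidean distance per lead time, summed over lead times.
    """
    return sum(
        math.hypot(obs[0] - ana[0], obs[1] - ana[1])
        for obs, ana in zip(observed_track, analog_track)
    )
```

In this form a smaller value of `individual_error` marks a better analog, which is all that the sorting in step C requires.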

*Step B:* In the second step the evaluation of the analog forecasting scheme is described. Steps A and B together form the basis for the analog forecast. To this end, a metric *d*(**x**(*t*_{0}), **x**(*t*_{j})) is defined [Eq. (2)] between the states **x**(*t*_{j}) = [*x*_{1}(*t*_{j}), . . . , *x*_{D}(*t*_{j})] and **x**(*t*_{0}) = [*x*_{1}(*t*_{0}), . . . , *x*_{D}(*t*_{0})], using the real positive metric weights, *G*_{k}, and the embedding dimension, *D* (see section 3b). The metric applies the hyperbolic tangent function to the squared differences of the state components; this function is chosen because it reduces the overlearning effects and saves computational costs. The problem is to find the optimal weights *G*_{k} (that is, to change the state space optimally) so that the analogs found in the state space deliver the best forecasts.
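A weighted metric of the kind described in step B can be sketched as follows. The sketch assumes that the metric is the square root of the weighted sum of the hyperbolic tangent of the squared component differences; the square root and the function name `tanh_metric` are assumptions, and the square root does not change the ranking of neighbors.

```python
import math


def tanh_metric(x0, xj, weights):
    """Weighted distance between two states of equal dimension D.

    Assumed form: sqrt( sum_k G_k * tanh((x0_k - xj_k)**2) ).
    x0, xj: state vectors; weights: positive metric weights G_k.
    """
    total = sum(
        g * math.tanh((a - b) ** 2)
        for g, a, b in zip(weights, x0, xj)
    )
    return math.sqrt(total)
```

With all weights set to one and small differences, the value is close to the Euclidean distance; a single very distant component contributes at most its weight *G*_{k} to the sum.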

*Step C:* The scheme is improved by a learning rule that optimizes the metric weights, modifying Fraedrich and Rückert (1998) by including ensemble forecasts with *N* ensemble members (ensemble size). In the following we have to distinguish between the expressions “near” and “best neighbor” of the reference state. The former denotes an analog state with a small metric *d* [see Eq. (2)], the latter one with a small corresponding forecast error *e*(*t*_{j}) [see Eq. (1)]. The (*N* + 1)-nearest neighbors **x**(*t*_{1}), . . . , **x**(*t*_{N+1}) of the observed state **x**(*t*_{0}) are identified within the dependent dataset. From these (*N* + 1)-nearest neighbors (a) the *N*-nearest neighbors are selected and (b) with respect to their individual error *e*(*t*_{j}) the *N*-best neighbors are chosen, discarding the neighbor with the highest individual error. Note that the number *N* of the nearest and the best neighbors is defined by the ensemble size. The metric weights *G*_{k} are adapted by comparing the squared distances, *f*_{k} and *b*_{k}, of the *N*-nearest (*n* = 1, . . . , *N*) with the *N*-best (*m* = 1, . . . , *N*) ensemble members from the observed state **x**(*t*_{0}), which yields the updated weights *G*^{′}_{k}. If the squared difference [*x*_{k}(*t*_{0}) − *x*_{k}(*t*_{j})]^{2} is small, tanh[*x*_{k}(*t*_{0}) − *x*_{k}(*t*_{j})]^{2} is approximately [*x*_{k}(*t*_{0}) − *x*_{k}(*t*_{j})]^{2}, but for large values the hyperbolic tangent function is limited by one.
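The neighbor selection of step C can be sketched in a few lines. The function below only separates the *N*-nearest from the *N*-best neighbors; the weight update itself is not part of the sketch. The function name and the argument layout (one distance and one individual error per candidate state) are assumptions for illustration.

```python
def select_neighbors(distances, errors, n_members):
    """Return (nearest, best) index lists for one observed state.

    distances[j]: metric d between the observed state and candidate j
    errors[j]:    individual forecast error e(t_j) of candidate j
    n_members:    ensemble size N
    """
    # Rank all candidates by the metric and keep the (N + 1) nearest.
    order = sorted(range(len(distances)), key=lambda j: distances[j])
    candidates = order[:n_members + 1]

    # (a) The N nearest neighbors: drop the most distant candidate.
    nearest = candidates[:n_members]

    # (b) The N best neighbors: drop the candidate with the largest error.
    worst = max(candidates, key=lambda j: errors[j])
    best = [j for j in candidates if j != worst]
    return nearest, best
```

For example, with `distances = [0.5, 0.1, 0.9, 0.3, 0.2]`, `errors = [10, 50, 5, 20, 30]`, and `n_members = 3`, the nearest set is `[1, 4, 3]` and the best set is `[4, 3, 0]`: candidate 1 is close in the metric but has the largest forecast error.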

*Step D:* This last step applies the learning rule iteratively to optimize the metric, starting with the Euclidean metric *G*_{k} = 1 (for *k* = 1, . . . , *D*), in principle until a certain threshold is reached. Here, no fixed threshold is used; instead, a more subjective method is applied, which iterates the scheme 400 times and then uses the metric weights that achieve a minimum in the dependent dataset error [Σ *e*(*t*_{j}) for all ensemble members averaged over the dependent dataset]. Once the metric weights are optimally adapted, the forecast analogs are searched for in this optimal phase space.
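The iteration of step D amounts to running the learning rule a fixed number of times and keeping the weights with the smallest dependent dataset error. The following sketch shows only this bookkeeping. The callables `update` (one application of the learning rule) and `dataset_error` (the dependent dataset error for given weights) are hypothetical placeholders that the caller has to supply.

```python
def iterate_weights(initial_weights, update, dataset_error, n_iterations=400):
    """Apply `update` repeatedly and keep the weights with minimum error.

    initial_weights: starting weights, e.g. [1.0] * D for the Euclidean metric
    update:          hypothetical callable, weights -> new weights
    dataset_error:   hypothetical callable, weights -> dependent dataset error
    """
    weights = list(initial_weights)
    best_weights, best_error = list(weights), dataset_error(weights)
    for _ in range(n_iterations):
        weights = update(weights)
        error = dataset_error(weights)
        if error < best_error:
            best_weights, best_error = list(weights), error
    return best_weights, best_error
```

Keeping the best weights seen so far, rather than the last ones, matches the description that the weights achieving the minimum error within the 400 iterations are used.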

*Ensemble mean forecasts:* The ensemble mean forecast of the zonal and meridional position *x*_{1,2}(*t* + *i*) is defined by the arithmetic mean over all *N* ensemble members at lead time *i*. The ensemble size *N* is chosen by running the scheme several times with a different number *N* to obtain the optimal ensemble size, *N*_{Basin}.
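The ensemble mean described above can be written compactly. The sketch assumes that each ensemble member provides one (*x*_{1}, *x*_{2}) position per lead time and that a plain arithmetic mean of the coordinates is adequate; the function name is an assumption.

```python
def ensemble_mean_track(member_tracks):
    """Average the member positions at each lead time.

    member_tracks: list of N tracks; each track is a list of (x1, x2)
    positions, one per lead time, with equal length for all members.
    """
    n = len(member_tracks)
    return [
        (sum(p[0] for p in step) / n, sum(p[1] for p in step) / n)
        for step in zip(*member_tracks)  # one tuple of N positions per lead time
    ]
```

For two members with tracks `[(0.0, 0.0), (2.0, 2.0)]` and `[(2.0, 4.0), (4.0, 6.0)]` the result is `[(1.0, 2.0), (3.0, 4.0)]`.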

## 3. Tropical cyclone tracks: Data, model building, and forecasting

Independent ensemble mean forecasts of the self-adapting analog scheme are made after the model building process. The results are compared with two different analog models (the best adapted analog, *N* = 1, and the ensemble based on the Euclidean metric) and the CLIPER reference model.

### a. Data

The dataset for the Atlantic and east Pacific basins (National Hurricane Center, Miami, FL) contains the following parameters: the zonal and meridional position of the cyclone center, the date and time (UTC), maximum sustained wind speed (kt), and central pressure (hPa) every 6 h. The central pressure is not always available in the datasets and is therefore not used in this study. Due to data inhomogeneities, the wind speed in the east Pacific basin dataset is also not used; in the east Pacific basin, the wind speed of the presatellite era (up to 1969) enters in classes of 25, 45, and 75 kt, though later a more detailed range of wind speed classes is available. All data are the so-called best track data, that is, the final archived track of a storm after all information has been examined. The accuracy of the data has changed since the beginning of the registration of tropical cyclones due to better monitoring facilities (e.g., satellites). However, the need for large datasets calls for using the less accurate data (Aberson 1998). Thus, the Atlantic basin set contains 28 643 observations from 1886 to 1996; the east Pacific basin set holds 16 188 observations in the 1949–96 period (for more details, see Jarvinen et al. 1984; Brown and Leftwich 1982).

### b. Model building: Embedding dimension, ensemble size, and metric weights

Before building the self-adapting analog scheme the dimension of the phase space needs to be defined. The embedding theorem (see Sauer et al. 1991) requires a sufficient embedding dimension *D* = 2*D*_{a} + 1, where *D*_{a} is the attractor dimension of the underlying dynamical system. It guarantees that *D* observed variables span a state space that completely embeds the dynamical system. The only available estimates on the dimension of tropical cyclone tracks suggest *D*_{a} ∼ 8 (Fraedrich and Leslie 1989), which leads to an embedding dimension of *D* ∼ 17. This dimension, derived for the Australian region, is applicable to other basins, assuming that cyclone track forecasts in the Australian region are more difficult than in other regions. So the dimension used is expected to be an upper limit. Each entry in the dataset provides the following four parameters: zonal and meridional position, the time, and the maximum sustained wind speed. The displacements for a time lag of 6 h are used up to 24 h in the past, which leads to two positions and 16 displacements. Furthermore, the Julian date and, for the Atlantic basin, the current maximum sustained wind speed are included. Thus the Atlantic basin phase space has 20 dimensions and the east Pacific basin phase space has 19. The components (listed in Table 1) are used as input parameters in the self-adapting analog scheme.

In the next step the dataset is divided into a dependent dataset and an independent verification set. The verification set contains all states from 1989 to 1996, with a total of 1566 usable states in the Atlantic basin and 2019 in the east Pacific basin. A state is considered usable if the lifetime of the individual storm equals or exceeds the forecast period (72 h) plus 24 h for defining the state. The dependent datasets consist of the remaining record, with 12 671 states in the Atlantic basin (beginning in 1886) and 4701 states since 1949 in the east Pacific basin. The optimal number of ensemble members (ensemble size) is derived for both basins by applying the self-adapting analog scheme several times and increasing the number of ensemble members from 1 to 30. The optimal ensemble size is reached when the corresponding self-adapting scheme achieves the best performance within the verification dataset. The number of ensemble members for the Atlantic basin is *N*_{ATL} = 18, and for the east Pacific basin it is *N*_{EP} = 22. A time lag of at least 5 days (120 h) between the observation and analog or between two analogs is used to find the nearest independent neighbors in space but not in time. Now, the scheme is run 400 times to find the optimal metric weights. In Fig. 1 the dependent dataset error, that is, the sum of the individual forecast errors for all ensemble members averaged over the dependent dataset, is shown versus iteration step for the Atlantic basin to illustrate the performance gain from applying the self-adapting scheme.
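The two selection rules stated in this paragraph, the usability criterion and the minimum time lag, can be expressed as simple predicates. The constant and function names are assumptions for this sketch, and times are taken to be hours on a common time axis.

```python
FORECAST_PERIOD_H = 72  # forecast period stated in the text
HISTORY_H = 24          # past hours needed to define a state
MIN_LAG_H = 120         # minimum lag between observation and analog, or two analogs


def is_usable(storm_lifetime_h):
    """A state is usable if the storm lives at least 72 h + 24 h."""
    return storm_lifetime_h >= FORECAST_PERIOD_H + HISTORY_H


def independent_in_time(candidate_time_h, accepted_times_h):
    """True if the candidate is at least 120 h away from every accepted time.

    accepted_times_h holds the observation time and the times of the
    analogs that have already been accepted.
    """
    return all(abs(candidate_time_h - t) >= MIN_LAG_H for t in accepted_times_h)
```

A candidate analog would be accepted only if `independent_in_time` holds, so that neighbors are close in state space without being consecutive states of the same track segment.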

The metric weights obtained by the self-adapting scheme show similar behavior in both basins (Fig. 2): The most important component is the zonal displacement during the last 6 h, followed by the meridional displacement during the last 6 h. The other zonal and meridional displacements, the Julian date, and the maximum sustained wind speed in the Atlantic basin are more important than the positions. The main difference between the basins is that in the east Pacific basin the weights for the zonal movement are larger than the ones for meridional displacements.

### c. Forecast and verification

The forecasts are verified using the error 〈*E*_{ANALOG}〉 between the predicted position and the observed best track position. The great circle distance (km) is computed from (*x*_{0}, *y*_{0}), the observed zonal and meridional best track position, and (*x*_{f}, *y*_{f}), the forecast position. In addition, the skill score *s* is introduced to compare the performance of the analog scheme with a reference model 〈*E*_{ref}〉. A positive skill score denotes that the analog model is better than the reference, and vice versa.
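Both verification measures can be sketched in Python. The great circle distance is computed here with the haversine formula and a mean Earth radius of 6371 km, and the skill score is taken as the relative error reduction against the reference in percent. Both choices are assumptions consistent with the description above; they are not necessarily the exact expressions used in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lon0, lat0, lon_f, lat_f):
    """Haversine distance in km between two (lon, lat) points in degrees."""
    phi0, phi_f = math.radians(lat0), math.radians(lat_f)
    dphi = phi_f - phi0
    dlam = math.radians(lon_f - lon0)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi0) * math.cos(phi_f) * math.sin(dlam / 2.0) ** 2)
    # Clamp against rounding slightly above 1 for near-antipodal points.
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def skill_score_percent(mean_error_analog, mean_error_ref):
    """Assumed form: 100 * (E_ref - E_analog) / E_ref.

    Positive values mean the analog model beats the reference.
    """
    return 100.0 * (mean_error_ref - mean_error_analog) / mean_error_ref
```

With this definition, a mean analog error of 80 km against a reference error of 100 km gives a skill of 20%, and 110 km against 100 km gives −10%.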

Two sets of forecast experiments are performed: (i) The ideal forecasts utilize initial best tracks and (ii) the simulated operational forecasts employ noisy initial tracks, because under operational conditions, best track data are not available. For both forecasts i and ii the errors are determined by best track comparison. To simulate real-time conditions, Gaussian white noise is added to the best track as initial uncertainties before employing the analog scheme and the CLIPER reference. The white noise intensity is prescribed by the mean distance between best track and operational data, and the related wind speeds, that is, 35 km and 10 knots (Neumann 1981).
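The perturbation of the initial positions can be sketched as follows. The sketch assumes positions in a local Cartesian frame in km and isotropic Gaussian noise in both components, so that the displacement length follows a Rayleigh distribution with mean σ√(π/2); the standard deviation is chosen such that this mean equals the quoted 35 km. The frame, the isotropy, and the function name are assumptions, and the wind speed perturbation is omitted.

```python
import math
import random

MEAN_POSITION_OFFSET_KM = 35.0  # mean best track vs. operational distance from the text

# Assumption: isotropic Gaussian noise, so the mean displacement is
# sigma * sqrt(pi / 2) (Rayleigh distribution).
SIGMA_KM = MEAN_POSITION_OFFSET_KM / math.sqrt(math.pi / 2.0)


def perturb_positions(positions_km, rng, sigma_km=SIGMA_KM):
    """Add independent Gaussian white noise to each (x, y) position in km."""
    return [
        (x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))
        for x, y in positions_km
    ]
```

Passing a seeded generator such as `random.Random(0)` makes a simulated operational experiment repeatable.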

### d. Results

The results of the forecast experiments are presented for the independent verification set for best track data (Fig. 3) and simulated operational data (Fig. 4). Figures 3a–d show improvements obtained by the self-adapting analog ensemble forecasts (plus signs) over the best adapted analog model (triangles) and the ensemble forecasts based on the Euclidean metric (squares). Compared with CLIPER (diamonds) the self-adapting analog scheme reveals positive (negative) skill in the east Pacific (Atlantic) basin. More details are noted in the following:

In the east Pacific (Fig. 3d) and Atlantic (Fig. 3b) basins the great circle error of the ensemble mean forecasts (plus signs) obtained by the self-adapting model grows from about 120 km (190 km) at 24 h to 450 km (780 km) at 72 h.

The self-adapting analog ensemble forecasts (plus signs) gain 35% skill over the best adapted analog model (triangles) for all lead times. Compared with the ensemble forecasts based on the Euclidean metric (squares) the skill gain decreases with lead time from 35% at 24 h to 10% at 72 h for both basins. That is, both the optimal ensemble size and the best weights substantially improve the analog forecasts.

Compared to the performance of the reference models, the self-adapting ensemble model performed quite differently in the two basins. In the east Pacific basin, the skill score is positive and varies between 15% and 20% (Fig. 3c). This corresponds to a great circle error reduction of up to 120 km. At a lead time of 72 h, the self-adapting ensemble forecasts gain 12 h compared to the forecast of the EP-CLIPER model (Fig. 3d). In the Atlantic basin, however, a small negative skill score is found. It increases from −10% at 12 h to −3% at 72 h (Fig. 3a) enhancing the great circle error by about 25 km at 72 h (Fig. 3b). We have no explanation for the different performance behavior in the two basins. A factor might be related to the more climatological tracks in the east Pacific compared to the Atlantic. Also the meridional component in the Atlantic, which has larger weights in the scheme than the zonal component (and vice versa in the east Pacific), might cause the difference (see Fig. 2).

The simulated operational trials (with white noise being added to the initial best tracks) yield the following results: In both basins the absolute forecast errors of the self-adapting and the CLIPER model are increased compared to the results using independent best track data. The forecasts of the self-adapting analog model, however, are better than those of CLIPER using the same simulated operational dataset. In particular, in the Atlantic basin the ensemble mean forecasts of the self-adapting model achieve a distinct improvement over CLIPER (Fig. 4a). In the east Pacific basin, there is also an improvement over CLIPER; the skill score decreases up to 36 h and increases again for larger lead times (Fig. 4b).

## 4. Discussion and conclusions

A self-adapting analog forecast scheme is utilized for ensemble predictions of tropical cyclone tracks in the Atlantic and east Pacific basins. Starting with the Euclidean metric and a given set of states defined by the best track data, the model learns how to weight the components of the predictor states by minimizing the forecast error. The weights, which result from the self-adapting scheme, are an indication of the importance of the corresponding components. They show that the displacements, the intensity, and the season are more important for the analog search than the actual position of the cyclone center. Unlike the east Pacific basin where movements are more zonal, the meridional components are more important in the Atlantic basin with recurvature occurring frequently.

When comparing different analog models, it has been shown that both ensemble forecasting and metric adaption lead to substantial forecast improvement. Comparing the self-adapting ensemble forecasts with the CLIPER reference predictions using best track data as an independent verification set shows different results for the two basins with positive (negative) skill in the east Pacific (Atlantic). Applying the CLIPER and the self-adapting analog model to simulated operational data with white noise added to the initial best tracks, the self-adapting ensemble forecasts provide positive skill scores for both basins (compared to CLIPER), showing that the self-adapting ensemble model is not as sensitive to noisy data as the CLIPER-type model. Nevertheless, the real value of the model needs to be tested under operational conditions.

## Acknowledgments

The detailed and helpful comments of the two reviewers are appreciated. Cooperation with institutions in Australia has been supported by the Max Planck Prize.

## REFERENCES

Aberson, S. D., 1998: Five-day tropical cyclone track forecasts in the North Atlantic basin. *Wea. Forecasting,* **13,** 1050–1056.

Brown, G. M., and P. W. Leftwich, 1982: A compilation of eastern and central North Pacific tropical cyclone data. NOAA Tech. Memo. NWS NHC 16, 15 pp. [Available from National Technical Information Service, 5285 Port Royal Rd., Springfield, VA 22151.]

Elsberry, R. L., 1995: Tropical cyclone motion. Tech. Doc. WMO/TD-No. 693, World Meteorological Organization, 289 pp.

Fraedrich, K., and L. M. Leslie, 1989: Estimates of cyclone track predictability. I: Tropical cyclones in the Australian region. *Quart. J. Roy. Meteor. Soc.,* **115,** 79–92.

Fraedrich, K., and B. Rückert, 1998: Metric adaption for analog forecasting. *Physica A,* **253,** 379–393.

Fraedrich, K., L. M. Leslie, and R. Z. Morison, 2000: Improved tropical cyclone track predictions by error recycling. *Meteor. Atmos. Phys.,* in press.

Holland, G. J., 1983: Tropical cyclone motion: Environmental interaction plus a beta effect. *J. Atmos. Sci.,* **40,** 328–342.

Hope, J. R., and C. J. Neumann, 1970: An operational technique for relating the movement of existing tropical cyclones to past tracks. *Mon. Wea. Rev.,* **98,** 925–933.

Jarvinen, B. R., C. J. Neumann, and M. A. S. Davies, 1984: A tropical cyclone data tape for the North Atlantic basin, 1886–1983: Contents, limitations, and uses. NOAA Tech. Memo. NWS NHC 22, 24 pp. [Available from National Technical Information Service, 5285 Port Royal Rd., Springfield, VA 22151.]

Leslie, L., and K. Fraedrich, 1990: Reduction of tropical cyclone position errors using an optimal combination of independent forecasts. *Wea. Forecasting,* **5,** 158–161.

Neumann, C. J., 1972: An alternate to the HURRAN (Hurricane Analog) tropical cyclone forecast system. NOAA Tech. Memo. NWS SR-62, 32 pp. [Available from National Technical Information Service, 5285 Port Royal Rd., Springfield, VA 22151.]

Neumann, C. J., 1981: Trends in forecasting the tracks of Atlantic tropical cyclones. *Bull. Amer. Meteor. Soc.,* **62,** 1473–1485.

Neumann, C. J., and P. W. Leftwich, 1977: Statistical guidance of the prediction of eastern North Pacific tropical cyclone motion – Part 1. NOAA Tech. Memo. NWS WR-125, 32 pp. [Available from National Technical Information Service, 5285 Port Royal Rd., Springfield, VA 22151.]

Neumann, C. J., and J. M. Pelissier, 1981a: Models for the prediction of tropical cyclone motion over the North Atlantic: An operational evaluation. *Mon. Wea. Rev.,* **109,** 522–538.

Neumann, C. J., and J. M. Pelissier, 1981b: An analysis of Atlantic tropical cyclone forecast errors, 1970–1979. *Mon. Wea. Rev.,* **109,** 1248–1266.

Pike, A. C., 1987: Statistical prediction of track and intensity for eastern North Pacific tropical cyclones. Preprints, *10th Conf. on Probability and Statistics in Atmospheric Sciences,* Edmonton, AB, Canada, Amer. Meteor. Soc., 40–41.

Pike, A. C., and C. J. Neumann, 1987: The variation of track forecast difficulty among tropical cyclone basins. *Wea. Forecasting,* **2,** 237–241.

Sauer, T., J. A. Yorke, and M. Casdagli, 1991: Embedology. *J. Stat. Phys.,* **65,** 579–615.

Sievers, O., 1998: Analogvorhersage von tropischen Zyklonenzugbahnen mit einem selbst-adaptierenden Modell. Diploma thesis, Meteorologisches Institut, University of Hamburg, Hamburg, Germany, 102 pp. [Available from Meteorologisches Institut, Universität Hamburg, Bundesstr. 55, D-20146 Hamburg, Germany.]

Zhang, Z., and T. N. Krishnamurti, 1997: Ensemble forecasting of hurricane tracks. *Bull. Amer. Meteor. Soc.,* **78,** 2785–2795.

Table 1. Phase space components of the analog model for the Atlantic basin. Note that the maximum wind component (20) is not included in the east Pacific basin; lat in ° north, long in ° east, zonal–meridional displacements in km, wind in m s^{−1}.