## Abstract

Major airstreams in tropical cyclones (TCs) are rarely described from a Lagrangian perspective. Such a perspective, however, is required to account for asymmetries and time dependence of the TC circulation. We present a procedure that identifies main airstreams in TCs based on trajectory clustering. The procedure takes into account the TC’s large degree of inherent symmetry and is suitable for a very large number of trajectories $[\mathcal{O}(10^{6})]$. A large number of trajectories may be needed to resolve both the TC’s inner-core convection and the larger-scale environment. We define similarity of trajectories based on their shape in a storm-relative reference frame, rather than on proximity in physical space, and use the Fréchet distance, which emphasizes differences in trajectory shape, as a similarity metric. To make the use of this elaborate metric feasible, we introduce data compression that approximates the shape of trajectories in an optimal sense. To make clustering of large numbers of trajectories computationally feasible, we reduce dimensionality in distance space by so-called landmark multidimensional scaling. Finally, *k*-means clustering is performed in this low-dimensional space. We investigate the extratropical transition of Tropical Storm Karl (2016) to demonstrate the applicability of our clustering procedure. All identified clusters prove to be physically meaningful and describe distinct flavors of inflow, ascent, outflow, and quasi-horizontal motion in Karl’s vicinity. Importantly, the clusters exhibit gradual temporal evolution, which is notable because the clustering procedure itself does not impose temporal consistency on the clusters. We conclude by discussing TC problems for which applying the clustering procedure appears to be most fruitful.

## 1. Introduction

The notion of coherent airstreams plays a long-standing and fundamental role in our understanding of cyclone dynamics (e.g., Hamilton and Archbold 1945; Riehl 1954; Eliassen and Kleinschmidt 1957). In midlatitude cyclones coherent airstreams are referred to as conveyor belts (e.g., Browning 1971; Carlson 1980). These airstreams are distinctly asymmetric with respect to the cyclone’s center and have been studied extensively based on trajectory calculations (e.g., Wernli and Davies 1997, and subsequent developments). In tropical cyclones (TCs) the major airstreams are usually separated into the primary and the secondary circulation and are traditionally depicted as streamlines in horizontal and radius–height cross sections, respectively (e.g., Riehl 1954). Such a streamline framework may be employed because mature TCs can be considered, to lowest order, as axisymmetric, steady-state systems. For a steady state, streamlines and trajectories are equivalent.

A two-dimensional streamline framework, however, is a poor approximation when transient and/or asymmetric motions become important (e.g., in TCs that undergo substantial structure change or TCs embedded in an environmental flow with considerable vertical shear). So far, relatively few comprehensive trajectory analyses of TCs have been performed (e.g., Cram et al. 2007; Stern and Zhang 2013; Riemer and Laliberté 2015). Trajectory analysis of TCs needs to resolve convection in the inner core and thus requires high temporal and spatial resolution of the underlying data when model output is used to calculate trajectories. This requirement for large datasets may partly impede the application of trajectory analysis to TCs. A further and conceptually more severe impediment is the need for a concise interpretation of the wealth of information contained in the computed trajectories. Regarding the thermodynamic interaction with the environment, Riemer and Laliberté (2015) proposed transformation of three-dimensional (3D) trajectories into a two-dimensional thermodynamic (moist entropy–temperature) space as a succinct depiction of a TC’s secondary circulation. A succinct description of the evolving main airstreams during TC structure change has, to our knowledge, not been proposed yet.

In midlatitude cyclones the conceptual model of conveyor belts is heavily used to organize and diagnose information from trajectory calculations (e.g., Wernli and Davies 1997; Papritz and Schemm 2013; Joos and Forbes 2016). For example, a rather simple set of variables and appropriate thresholds can be used to identify warm conveyor belts as ensembles of trajectories that behave very similarly and that densely populate a compact region in 3D physical space (sometimes referred to as coherent bundles of trajectories; Wernli and Davies 1997). Clustering techniques have been employed more recently in an attempt to identify conveyor-belt trajectories more objectively (e.g., Hart et al. 2015). If trajectory bundles are coherent in 3D physical space, the bundle may be represented simply by its average 3D trajectory or by suitable thresholds of some trajectory-density metric.

In tropical cyclones the large degree of symmetry of the system impedes a definition of coherent airstreams based on similarity in physical space. To illustrate this point, consider an axisymmetric TC, in which some air parcels spiral toward the center in the frictional inflow layer, ascend in the eyewall, and swirl out to larger radii in the outflow layer in the upper troposphere. These air parcels constitute the classical secondary circulation of the TC and thus a physically meaningful class of trajectories. In 3D physical space, however, individual trajectories of this secondary circulation may differ substantially because they may start and end at very different azimuths. In addition, for exact axisymmetry the average 3D trajectory of this class collapses to a line at the TC’s center.

The goal of this study is to develop an objective classification of major airstreams in the vicinity of TCs based on cluster analysis. The particular problems to be addressed are (i) to appropriately represent both the convective-scale motion in the TC’s inner core and the larger-scale, more quasi-horizontal environmental motions and (ii) to take into account the large degree of symmetry that characterizes a TC. The first problem requires a dense distribution and a high temporal resolution of trajectories in the inner core. To address this problem, downsampling strategies that reduce the density of trajectories outside of the inner core, and thus the total number of trajectories, could be developed and applied. We here take a different approach and use uniform sampling, which yields on the order of 10^{6} trajectories per considered time period. We will present methods (i) to compress the large amount of trajectory data while preserving the main characteristics of the shape of trajectories and (ii) to reduce the dimensionality of the clustering problem and thus make cluster analysis feasible for a very large number of trajectories. The second problem, accounting for the large degree of symmetry, will be addressed by a suitable transformation of trajectories before clustering. The basic idea here is to compute the similarity of trajectories based on their shape and not on their absolute position in 3D physical space. Addressing these two problems constitutes the main methodological development presented in this study.

The applicability of our approach will be demonstrated for the case of a TC that undergoes substantial structure change and that is strongly affected by vertical shear (i.e., a case in which transient and asymmetric motion can be expected to be of crucial importance). This case will be the extratropical transition (ET) of Karl (2016), one of the major cases during the North Atlantic Waveguide and Downstream Impact Experiment (NAWDEX; Schäfler et al. 2018). During ET, TCs lose their tropical characteristics and start to transition into midlatitude cyclones (Jones et al. 2003; Evans et al. 2017). An ET case therefore appears to be a well-suited testbed to demonstrate the applicability of our clustering approach. In addition, our clustering results can be compared with results of a previous study that has investigated the ET of Karl focusing on airstreams associated with latent heat release near the inner core (Euler et al. 2019). In the current study, however, we will consider the more general Lagrangian evolution of air masses, including quasi-horizontal environmental flow. Specifically, we will consider all trajectories within a 2.5° radius of Karl’s center. This radius is expected to capture the bulk of the air masses above the frictional inflow layer that interact with a TC’s inner-core convection within 12 h, which is a typical time scale for TC intensity change due to environmental interaction [as observed in idealized experiments with moderate to strong vertical shear; e.g., Frank and Ritchie (2001); Riemer et al. (2010)].

Section 2 provides an overview of Karl’s ET, describes the model simulation that provides the data underlying this study, and gives information on the calculation of trajectories. Our methods to account for the large degree of symmetry and to handle the very large number of trajectories are detailed in section 3. The results of the cluster analysis are presented in section 4, where it is demonstrated that all clusters are distinct and represent physically meaningful classes of trajectories. A brief analysis of thermodynamic characteristics along these main airstreams (section 5) provides insight into aspects of Karl’s intensification and may illustrate avenues for future analyses based on automatically identified airstreams. Finally, section 6 provides a summary, conclusions, and an outlook on potential future work.

## 2. The ET of Karl: Convection-permitting simulation and trajectory computation

### a. Synoptic overview of Karl’s evolution

Karl originally developed from a tropical wave off the coast of West Africa, was declared a tropical depression^{1} near the Cape Verde Islands on 14 September 2016, and further intensified to tropical-storm strength on 15 September. Tropical Storm Karl moved mainly westward for several days until beginning to interact with an upper-level low on 21 September. During this interaction Karl weakened to a tropical depression and started to move to the northwest. When the upper-level low moved away toward the south, Karl started to move northward and to reintensify to tropical-storm intensity. After recurvature on 23 September, vertical shear increased and Karl accelerated toward the northeast. Karl reached its peak intensity [60 kt (1 kt ≈ 0.51 m s^{−1})] just before the TC became embedded in a midlatitude frontal zone and was declared extratropical on 25 September. A more detailed overview of Tropical Storm Karl is provided by Pasch and Zelinsky (2016). This study considers Karl’s substantial structure changes that occurred on 24 and 25 September (i.e., the intensification of Tropical Storm Karl toward peak intensity in an environment with increasing vertical wind shear until Karl lost its tropical characteristics and was declared extratropical). An illustration of Karl’s track and intensity evolution between 1800 UTC 23 September and 1800 UTC 25 September is provided in Fig. 1, along with the distribution of equivalent potential temperature (*θ*_{e}) in the low to midtroposphere and Karl’s location relative to the midlatitude jet.

During the extratropical stage of ET (Klein et al. 2000) Karl moved rapidly to the northeast and merged with a preexisting weak cyclone. This merger resulted in an unusually strong midlatitude jet streak. Karl has thus been identified as one of the “triggers” for midlatitude impact during the NAWDEX campaign (Schäfler et al. 2018). This impact manifested itself in a severe precipitation event in Northern Europe. Using ensemble sensitivity analysis, it has been demonstrated that this high-impact event exhibited sensitivity to Karl’s evolution during ET (Kumpf et al. 2019).

### b. Convection-permitting COSMO simulation

The data underlying this study are the same as in Euler et al. (2019). A convection-permitting simulation of Tropical Storm Karl (2016) has been performed using the Consortium for Small-scale Modeling (COSMO; Steppeler et al. 2003) model, version 5.04. COSMO is a nonhydrostatic limited-area model for fully compressible flow, which has been designed for both operational numerical weather prediction and scientific applications. Besides Euler et al. (2019), Lentink et al. (2018) have successfully used convection-permitting COSMO simulations to investigate structure changes of a TC during ET.

The horizontal grid spacing of the simulation is 0.025° (~2.8 km) with 49 levels between 0 and 21 km height. COSMO default settings have been used for the parameterization schemes: inter alia a turbulent kinetic energy-based turbulence scheme, a shallow-convection scheme with mass-flux closure after Tiedtke (1989), and a single-moment ice microphysics scheme (including graupel). The parameterization schemes are described in Doms et al. (2011), and more details on the setup of the simulation are given in Euler et al. (2019). Our simulation domain spans from 22° to 54°N and from 40° to 70°W. Boundary and initial conditions have been taken from archived operational analyses from the European Centre for Medium-Range Weather Forecasts. The simulation employs rotated spherical coordinates (Doms and Schättler 2002), with the model equator running approximately through the center of the domain at 35°N. This rotation of the coordinate system minimizes the convergence of the meridians such that the grid spacing, specified in degrees latitude and longitude, is approximately constant. The results will be presented in this native rotated coordinate system, denoted by rlat and rlon. Karl’s geographical position in longitude and latitude is depicted in Figs. 1a, 1b, and 2.

### c. Karl’s structure change in the COSMO simulation

The simulation spans the 66-h period from 0000 UTC 23 September to 1800 UTC 25 September. The general track and intensity evolution of Karl is captured well by the simulation (Fig. 1), as compared to the best track analysis by Pasch and Zelinsky (2016). In the simulation, as discussed in some more detail in Euler et al. (2019), Karl moves somewhat faster and reaches somewhat higher intensity, with a position and intensity error at the end of the simulation of approximately 200 km and 7 m s^{−1}, respectively. Despite these differences, we deem the simulation to be sufficiently realistic for the purpose of the current study.

Euler et al. (2019) identified four key stages in the evolution of Karl during the simulated period. Early in the simulation (Fig. 2a), Tropical Storm Karl is situated in a moist tropical environment with low vertical shear from the southwest. Inner-core convection mostly occurs in the downshear-left quadrant, and the horizontal winds exhibit a prominent wavenumber-1 pattern with maximum winds to the right of the shear vector. This period has been dubbed the tropical storm (TS) stage, represented by 1200 UTC 24 September. Subsequently, Karl starts to intensify in the COSMO simulation, notwithstanding increasing vertical wind shear, and the inner-core updrafts in the downshear-left quadrant become more vigorous (Fig. 2b). Karl starts to approach a low-level baroclinic zone to its northwest, but interaction with this feature remains weak. Instead, at this stage, Karl exhibits clear characteristics of a TC in moderate shear, and the stage has been dubbed the intensification (IS) stage. The representative time for this stage is 2100 UTC 24 September.

The remainder of the simulation is characterized by prominent interaction of Karl with the baroclinic zone and thus the midlatitude flow. Vertical wind shear further increases, and Karl’s convection in the downshear-left quadrant starts to merge with banded convection along the low-level baroclinic zone (Fig. 2c). Prominent descent in the near-core region occurs in the upshear-left quadrant. Overall, Karl starts to lose its tropical characteristics, and this stage is thus referred to as the intermediate (IM) stage, represented by 0600 UTC 25 September. Finally, Karl starts to substantially distort the baroclinic zone, is affected by very strong vertical shear (above 20 m s^{−1}), and a dipole of vertical motion is evident with ascent in the downshear-left and descent in the upshear-left quadrant (Fig. 2d). This stage is referred to as the extratropical (XT) stage, represented by 1200 UTC 25 September. While Euler et al. (2019) focused on trajectories associated with near-core vertical motion, our analysis here will consider the behavior of all trajectories in the vicinity of Karl, thus also including quasi-horizontal environmental flow.

### d. Trajectory calculation

Data for trajectory calculation are available every 5 min from the simulation. Trajectories are calculated using a fourth-order Runge–Kutta scheme with linear interpolation in space and time and a 2 s time step. Trajectories are seeded with a horizontal spacing equivalent to the grid spacing (i.e., every 0.025° within a radius of 2.5° around the storm center). The center is defined by a low-level centroid of the Okubo–Weiss parameter [Okubo 1970; Weiss 1991; see Euler et al. (2019) for details]. Trajectories are seeded on pressure levels every 20 hPa, with a starting level of 10 hPa above the surface and extending up to the highest level with pressure greater than 150 hPa. This seeding strategy results in approximately 1.3 × 10^{6} trajectories for each considered seeding time. Trajectories are seeded at the four stages TS, IS, IM, and XT and are computed for 6 h forward and backward in time. The final positions of the forward computation will hereafter be referred to as “end positions” and the final positions of the backward computation as “initial positions.” Trajectory positions are stored at the same times as the underlying model data (i.e., every 5 min). Each trajectory is thus represented by 145 data points.
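As a hedged sanity check of the numbers above, the following Python sketch (i) counts seeding points for a circular 2.5° region on a 0.025° grid with pressure levels every 20 hPa, assuming for simplicity a uniform surface pressure of about 1010 hPa so that the lowest seeding level is 1000 hPa everywhere, and (ii) shows a classical fourth-order Runge–Kutta step with the velocity linearly interpolated in time between two output times. The snapshot velocity functions are hypothetical analytic stand-ins for the spatially interpolated COSMO wind fields; this is an illustration, not the trajectory tool used for the study.

```python
import numpy as np

# Output interval and integration window as stated in section 2d.
OUTPUT_INTERVAL_MIN = 5
HOURS_EACH_WAY = 6
N_POINTS = 2 * HOURS_EACH_WAY * 60 // OUTPUT_INTERVAL_MIN + 1  # 145 stored positions


def seed_count(radius_deg=2.5, spacing_deg=0.025, p_start_hpa=1000.0,
               dp_hpa=20.0, p_top_hpa=150.0):
    """Horizontal grid points within the seeding radius times the number of
    pressure levels from p_start_hpa upward with p > p_top_hpa.
    Assumption: one uniform starting level (surface pressure ~1010 hPa)."""
    n = int(round(radius_deg / spacing_deg))
    i, j = np.meshgrid(np.arange(-n, n + 1), np.arange(-n, n + 1))
    n_horizontal = int(np.count_nonzero(i**2 + j**2 <= n**2))
    levels = np.arange(p_start_hpa, p_top_hpa, -dp_hpa)  # 1000, 980, ..., 160 hPa
    return n_horizontal * levels.size


def time_interpolated(u0, u1, t0, t1):
    """Velocity linearly interpolated in time between two snapshots.
    u0, u1: hypothetical callables giving the velocity at position x at t0 and t1."""
    def velocity(x, t):
        w = (t - t0) / (t1 - t0)
        return (1.0 - w) * u0(x) + w * u1(x)
    return velocity


def rk4_step(x, t, dt, velocity):
    """One classical fourth-order Runge-Kutta step for dx/dt = velocity(x, t)."""
    k1 = velocity(x, t)
    k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = velocity(x + dt * k3, t + dt)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

Under these assumptions the count comes out at roughly 1.35 × 10^{6}, consistent with the approximately 1.3 × 10^{6} trajectories quoted above; one 5-min output interval corresponds to 150 RK4 steps of 2 s.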

The presented results are qualitatively robust to small changes in our choice of spatiotemporal scales. Sensitivity tests were performed with trajectory integration times of 11 and 10 h and with seeding radii of 2.4° and 2.2°. Only when the radius is reduced to 2.2°, and only for the XT stage, does a qualitative change in the identified clusters occur. This change can be explained by the fact that a cluster that largely comprises trajectories at relatively large radii at seeding time [i.e., outside of the 2.2° radius (cluster ATRC)] is no longer identified; instead, an alternative cluster is identified that is a mix of two existing clusters (clusters INUP and OUT; see below for a description of the respective clusters).

## 3. Clustering under symmetry and large numbers of trajectories

Any cluster analysis crucially depends on how similarity in the underlying data is defined. The first steps described below serve to adapt the definition of similarity to the problem of trajectories in TCs. We then describe our strategy to handle the very large number of trajectories that underlies our analysis, as well as our choice of cluster algorithm. Finally, we discuss the representation of the obtained clusters in physical space, which is a nontrivial task in our case because clustering is performed in a transformed, normalized space (see below); trajectories in the same cluster thus do not necessarily form coherent bundles in physical space.

### a. Defining similarity of trajectories in the TC context

#### 1) Accounting for translation and symmetries

As discussed in the introduction, TCs exhibit a relatively large degree of symmetry and we argue that similarity of trajectories within TCs is best measured by similarity of the shape of the trajectory and not primarily by proximity of trajectories in physical space. To account for the underlying approximate symmetries the following steps are performed to preprocess the data before clustering (Fig. 3):

1. Taking into account TC motion, trajectories are considered in a storm-relative frame of reference. The storm center, which constitutes the origin of this frame of reference, is defined by the low-level center as described above (section 2d).
2. Taking translational invariance into account, the seeding locations of all trajectories are moved to the same location in the horizontal plane, arbitrarily defined as the origin.
3. Taking rotational invariance into account, trajectories are rotated in the horizontal plane such that the direction of motion of all trajectories at seeding time is aligned with the same direction, arbitrarily defined as the positive rlon axis.
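A minimal NumPy sketch of these three preprocessing steps, assuming trajectories stored as an array of shape (N, T, 3) holding rlon, rlat, and a vertical coordinate, and assuming that the direction of motion at seeding time is estimated from a centered difference around the seeding index (the text does not specify how this direction is computed). Rotating directly in rlon–rlat degrees relies on the approximately uniform grid spacing of the rotated coordinate system (section 2b). The function name `to_shape_frame` is hypothetical.

```python
import numpy as np


def to_shape_frame(traj, center, i_seed):
    """Storm-relative, translated, and rotated trajectories.

    traj:   (N, T, 3) positions (rlon, rlat, vertical coordinate)
    center: (T, 2) storm-center position (rlon, rlat) at the same times
    i_seed: index of the seeding time (middle point for +/-6 h trajectories)
    """
    out = np.array(traj, dtype=float)
    out[:, :, :2] -= center[None, :, :]              # storm-relative frame
    out[:, :, :2] -= out[:, i_seed:i_seed + 1, :2]   # seeding location -> origin
    # Horizontal direction of motion at seeding time (centered difference).
    d = out[:, i_seed + 1, :2] - out[:, i_seed - 1, :2]
    ang = np.arctan2(d[:, 1], d[:, 0])[:, None]
    c, s = np.cos(ang), np.sin(ang)
    x, y = out[:, :, 0].copy(), out[:, :, 1].copy()
    # Rotate by -ang so that the initial motion points along +rlon.
    out[:, :, 0] = c * x + s * y
    out[:, :, 1] = -s * x + c * y
    return out
```

Two trajectories with the same shape but different position and heading map onto the same transformed trajectory; the vertical coordinate is left untouched here.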

#### 2) Normalization and Fréchet distance

Taking into account the vastly different length scales for horizontal and vertical motion, the transformed trajectories are normalized by the standard deviation of the trajectory coordinates in the respective directions (following Hart et al. 2015). Standard deviations in rlat and rlon are reasonably similar (within 5%–10%), so normalization may be performed after the transformation of trajectories, even though the rotation mixes the two horizontal directions. Specifically, in the following we consider deviations of the trajectories from their seeding location, normalized, for the sake of symmetry, by the mean of the standard deviations at the start and end times of the trajectories. Importantly, considering deviations only implies that the absolute height of the trajectories is not taken into account to define similarity, but only height changes (i.e., the characteristics of ascent and descent).
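One plausible reading of this normalization, sketched in NumPy: deviations from the seeding location are divided by a horizontal scale (rlon and rlat pooled into one scale) and a vertical scale, each taken as the mean of the standard deviations at the first and last output times. Pooling the two horizontal axes and the function name `normalize_deviations` are assumptions for illustration.

```python
import numpy as np


def normalize_deviations(traj_t, i_seed):
    """Deviations from the seeding location, scaled per direction.

    traj_t: (N, T, 3) trajectories after the storm-relative transformation.
    Horizontal (both axes pooled) and vertical deviations are each divided by
    the mean of their standard deviations at the first and last times.
    """
    # Subtracting the seeding location removes absolute height; only changes remain.
    dev = traj_t - traj_t[:, i_seed:i_seed + 1, :]
    sig_h = 0.5 * (dev[:, 0, :2].std() + dev[:, -1, :2].std())
    sig_v = 0.5 * (dev[:, 0, 2].std() + dev[:, -1, 2].std())
    return dev / np.array([sig_h, sig_h, sig_v])
```

After this scaling, the averaged start/end standard deviation equals one in each direction, so horizontal and vertical displacements contribute comparably to the Fréchet distance.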

A widely used metric to measure the distance between trajectories is the Fréchet distance, which we also employ in this study. While the Euclidean distance is based on the pointwise distance between air parcel locations at each given time step, the (discrete) Fréchet distance does not consider such pointwise distances at a given time (Fig. 4). Instead, the Fréchet distance considers reparameterizations (in time) of the trajectories and then seeks the minimum, over all reparameterizations, of the maximum distance between the trajectories (Fréchet 1906; Eiter and Mannila 1994). More illustratively, consider two air parcels that travel along the *path* of their respective trajectory with arbitrary velocities.^{2} Let the two air parcels be connected by an imaginary string. The Fréchet distance equals the length of the shortest string that still allows both air parcels to travel along their respective paths when their velocities are optimally matched. Thus, the Fréchet distance emphasizes the shape of trajectories while discounting differences in their temporal evolution. Due to these characteristics, we consider the Fréchet distance to be the preferred, if not the optimal, distance metric in the context of this study.
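The discrete Fréchet distance can be computed with the dynamic-programming recursion of Eiter and Mannila (1994), which fills an M × M table of coupling distances for two trajectories of M points each. The sketch below is a plain NumPy illustration, not the implementation used in this study; the helper `lockstep_max_distance` (a hypothetical name) computes a same-time Euclidean comparison for contrast.

```python
import numpy as np


def discrete_frechet(p, q):
    """Discrete Frechet distance between polylines p (m, d) and q (n, d)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pointwise distances
    m, n = d.shape
    ca = np.empty((m, n))  # ca[i, j]: best coupling of p[:i+1] and q[:j+1]
    ca[0, 0] = d[0, 0]
    for i in range(1, m):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, n):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, m):
        for j in range(1, n):
            # advance along p, along q, or along both; keep the smallest max
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return float(ca[-1, -1])


def lockstep_max_distance(p, q):
    """Maximum same-time Euclidean distance (requires equal lengths)."""
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return float(np.max(np.linalg.norm(diff, axis=-1)))
```

For two parcels that follow the same straight path, one moving early and one late, e.g. `[(0, 0), (1, 0), (2, 0), (2, 0)]` and `[(0, 0), (0, 0), (1, 0), (2, 0)]`, the discrete Fréchet distance is 0, while the lockstep distance is 1: only the difference in timing separates them.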

### b. Handling very large numbers of trajectories

Investigating a very large number of trajectories (here: order of 10^{6}) makes the computation of trajectory similarity by pairwise comparison of their distance during clustering computationally extremely expensive. To make a cluster analysis feasible, we first use a suitable approximate representation of trajectories to reduce the sheer amount of data and then a suitable reduction of dimensionality to make the comparison of the distances between trajectories computationally feasible.

#### 1) Data compression

High-frequency oscillations around the more slowly varying “mean” path of trajectories should not affect the overall similarity of trajectory shapes. In our case, such oscillations occur, for example, as gravity waves in the outflow layer, which manifest as vertical fluctuations around an approximately constant mean trajectory height (not shown). Filtering out the high-frequency oscillations allows trajectories to be represented by fewer data points than in the original dataset (i.e., it reduces the amount of required data).

Filtering out high-frequency oscillations while maintaining sharp but persistent turns of trajectories (i.e., large curvature) can be achieved in an optimal sense by using the Visvalingam–Whyatt algorithm (Visvalingam and Whyatt 1993), a well-known algorithm originally developed in the field of cartography. This algorithm considers the triangles that are formed by three consecutive points along a (discrete) trajectory (see Fig. 5 for illustration). The main idea of the Visvalingam–Whyatt algorithm is to eliminate triangles with small area. For trajectories sampled at constant time intervals, as is the case here, a small triangle area indicates slow velocities and/or small curvature. Removing all middle points^{3} along trajectories that are associated with triangles with small areas thus preserves the overall geometry of the trajectory while reducing the number of data points needed for its representation.
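A simple, unoptimized sketch of Visvalingam–Whyatt simplification for 3D trajectories: the trajectory is reduced to a prescribed number of points (11 in this study) by repeatedly removing the interior point whose triangle with its current neighbors has the smallest area, always keeping the endpoints. Practical implementations typically use a priority queue and may enforce monotonically increasing effective areas; those refinements are omitted, and the function names are hypothetical.

```python
import numpy as np


def triangle_area(a, b, c):
    """Area of the triangle spanned by three 3D points."""
    return 0.5 * float(np.linalg.norm(np.cross(b - a, c - a)))


def visvalingam_whyatt(points, n_keep):
    """Simplify a 3D polyline to n_keep points (endpoints always retained).

    Returns the retained points and their indices in the original trajectory.
    """
    pts = np.asarray(points, dtype=float)
    keep = list(range(len(pts)))
    while len(keep) > max(n_keep, 2):
        # Effective area of each interior point with its current neighbors.
        areas = [triangle_area(pts[keep[i - 1]], pts[keep[i]], pts[keep[i + 1]])
                 for i in range(1, len(keep) - 1)]
        del keep[1 + int(np.argmin(areas))]
    return pts[keep], keep
```

For an L-shaped path, collinear points have zero triangle area and are removed first, so reducing to three points keeps the start, the corner, and the end; a 145-point helix reduces to 11 points that retain both endpoints.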

Subjectively, we choose to reduce the number of points representing the trajectory from 145 to 11. This choice is a trade-off between accuracy^{4} and the need for data compression when using the costly (discrete) Fréchet distance, which scales as $\mathcal{O}(M^{2}\log M)$, where *M* is the number of data points representing the trajectory.

#### 2) Reduction of dimensionality

Essentially, any cluster algorithm requires information about the pairwise distances between all trajectories to define individual clusters and the cluster membership of individual trajectories. The computation of these pairwise distances scales as $\mathcal{O}(cN^{2})$, where *c* is the cost of computing the distance metric and *N* is the number of trajectories. For large *N*, this computation is prohibitively expensive, particularly if an elaborate metric such as the Fréchet distance is used (large *c*). A main goal here is to make feasible the clustering of a large number of trajectories using an elaborate distance metric. To this end, we dramatically reduce the computational cost by approximating these pairwise distances by Euclidean distances in a low-dimensional space and by using the costly Fréchet distance with respect to a subset of the trajectories only.

The first key idea of the approximation is to transform trajectories into a low-dimensional space in which the pairwise distances between trajectories are as similar as possible to those between the original trajectories.^{5} This transformation may be performed by a technique that is well known in the field of information visualization: so-called multidimensional scaling (MDS; Kruskal 1964, here illustrated in Fig. 6a). MDS is conceptually similar to how principal component analysis is usually applied in a meteorological context (e.g., Wilks 2011, chapter 11). MDS, however, performs the reduction of dimensionality by retaining only the leading eigenvalues of a matrix derived from the pairwise distances (in classical MDS, the double-centered matrix of squared distances), whereas principal component analysis considers the covariance matrix. Consequently, MDS requires knowledge of all pairwise distances, which would render the approach infeasible for our application. The need to eliminate this prohibitive requirement introduces a second key idea: to apply MDS only to a small subset of *l* “landmark” trajectories (with *l* ≪ *N*). This approximate version of MDS, so-called landmark MDS (LMDS; De Silva and Tenenbaum 2004, here illustrated in Fig. 6b), requires the calculation of pairwise distances between the *l* landmarks only. The remaining trajectories are approximately embedded into the low-dimensional space using triangulation with these landmarks as reference points (i.e., the distances between each remaining trajectory and all landmarks are approximately preserved). This latter step requires the calculation of the distances between all remaining trajectories and the *l* landmarks. Pairwise distances between all remaining trajectories are then computed at low computational cost in the low-dimensional space. A more detailed and technical description of LMDS can be found in De Silva and Tenenbaum (2004).
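A NumPy sketch of classical MDS on the landmarks followed by the distance-based triangulation step of LMDS as described by De Silva and Tenenbaum (2004). It assumes the distances (for example, Fréchet distances) have already been computed: `d_landmarks` holds the *l* × *l* distances among landmarks and `d_to_landmarks` the distances from every trajectory to every landmark. Function names are hypothetical. For points that truly live in a Euclidean space of dimension *k*, the embedding reproduces all pairwise distances exactly, which gives a convenient check; for Fréchet distances it is only an approximation.

```python
import numpy as np


def classical_mds(d_sq, k):
    """Top-k eigenpairs of the double-centered matrix B = -1/2 H D^2 H."""
    l = d_sq.shape[0]
    h = np.eye(l) - np.full((l, l), 1.0 / l)     # centering matrix
    b = -0.5 * h @ d_sq @ h
    evals, evecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]          # keep the k largest
    return evals[order], evecs[:, order]


def landmark_mds(d_landmarks, d_to_landmarks, k=3):
    """Embed all trajectories into k dimensions from landmark distances only.

    d_landmarks:    (l, l) distances among the landmarks
    d_to_landmarks: (n, l) distances from each trajectory to each landmark
    Assumes the k leading eigenvalues are positive.
    """
    d_sq = d_landmarks ** 2
    evals, evecs = classical_mds(d_sq, k)
    l_pinv = evecs / np.sqrt(evals)              # columns v_i / sqrt(lambda_i)
    mean_sq = d_sq.mean(axis=0)                  # mean squared distance per landmark
    # Triangulation: x = -1/2 * L^# (delta_x - delta_mean), for all rows at once.
    return -0.5 * (d_to_landmarks ** 2 - mean_sq) @ l_pinv
```

In use, landmarks might be drawn at random from the trajectories, for example with `rng.choice(n, size=1000, replace=False)`; only the *l* × *l* and *N* × *l* distance matrices are ever evaluated with the costly metric.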

There are several strategies to define the landmarks. Following De Silva and Tenenbaum (2004), we use the simplest and computationally most efficient strategy and sample landmarks uniformly in the low-dimensional space. To yield the best low-dimensional representation, the number of landmarks *l* should be as high as computationally possible. Here, we choose *l* = 1000. A further choice to be made is the dimension *k* of the low-dimensional space. In a similar application, albeit for streamlines of steady-state flows, Theisel and Rössl (2012) have successfully applied MDS. Following Theisel and Rössl (2012), we set *k* = 3. Using *k* = 15, which includes all eigenvalues that are at least 5% of the magnitude of the largest eigenvalue, leads to virtually identical cluster statistics (not shown).

Computing the pairwise distances is the limiting factor in a cluster analysis. As noted above, without reduction of dimensionality this computation scales as $\mathcal{O}(cN^{2})$. Using LMDS, the leading term in the scaling is $\mathcal{O}(clN)$, which implies a reduction of computational cost by a factor of *N*/*l*, here approximately 10^{3}. Still, on an Intel Core i7-6800K CPU at 3.4 GHz, our cluster analysis for a single stage takes approximately 1 day. A further term in the scaling of landmark multidimensional scaling is $\mathcal{O}(l^{3})$. This cubic dependence on *l* severely limits the number of landmarks: substantially increasing our choice of *l* = 10^{3} would be computationally unreasonable.
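A back-of-the-envelope check of these scalings with the numbers used here (*N* ≈ 1.3 × 10^{6} from section 2d and *l* = 10^{3}), keeping only leading-order terms and ignoring constant factors such as the symmetry of the distance matrix:

$$
\frac{\mathcal{O}(cN^{2})}{\mathcal{O}(clN)} \sim \frac{N}{l} \approx \frac{1.3\times10^{6}}{10^{3}} \approx 1.3\times10^{3},
\qquad
l^{3} = \left(10^{3}\right)^{3} = 10^{9}.
$$

Doubling *l* would multiply the $\mathcal{O}(l^{3})$ term by a factor of 8, which illustrates why the number of landmarks cannot be increased much further.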

### c. Clustering and representation of clusters in physical space

#### 1) Clustering

Clustering is performed in the low-dimensional space. Even so, the very large number of trajectories considered in this study leaves only two choices of cluster algorithm: DBSCAN and *k*-means. Other clustering algorithms [e.g., hierarchical clustering, used for trajectory clustering by Hart et al. (2015)] would be prohibitively expensive. Because the representation of trajectories in low-dimensional space is compact and dense (Fig. 7a), DBSCAN, which is a density-based cluster algorithm, yields only a single cluster (not shown). We thus adopt the only suitable algorithm for our problem: the widely used *k*-means algorithm. In contrast to DBSCAN, *k*-means by design identifies a prescribed number of clusters. The seemingly arbitrary partitioning of a dense cloud of data into individual clusters (Fig. 7b) is a well-known feature of *k*-means. We will demonstrate below, however, that the resulting clusters are indeed physically distinct and meaningful.
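For reference, a compact NumPy sketch of Lloyd's *k*-means algorithm with k-means++ initialization and several restarts, applied to points in the low-dimensional embedding. The restart count, the random seed, and the function names are illustrative assumptions, not settings reported for this study. The broadcasted distance computation needs memory proportional to *N*·*k*, so for about 10^{6} points it may need to be chunked or replaced by a library implementation such as scikit-learn's `KMeans`.

```python
import numpy as np


def _kmeans_pp(x, k, rng):
    """k-means++ seeding: pick new centers with probability ~ squared distance."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.asarray(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.asarray(centers)


def kmeans(x, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = _kmeans_pp(x, k, rng)
        for _it in range(max_iter):
            labels = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            # Move each center to the mean of its members (keep it if empty).
            new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            converged = np.allclose(new, centers)
            centers = new
            if converged:
                break
        d2 = ((x[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(x)), labels].sum()  # within-cluster sum of squares
        if best is None or inertia < best[2]:
            best = (labels, centers, inertia)
    return best
```

Restarting from several k-means++ initializations and keeping the lowest inertia reduces, but does not eliminate, the risk of converging to a poor local minimum.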

To select the number of clusters at each of Karl’s stages, we considered several metrics of cluster quality. The so-called Calinski–Harabasz index, which is essentially the ratio of the intercluster to the overall intracluster variance, turned out to yield the sharpest signal. With this metric, distinct peaks indicate the preferred number of clusters. Such peaks are found for the TS, IS, and XT stages, indicating 5, 6, and 5 clusters,^{6} respectively (Fig. 8). At the IM stage, the Calinski–Harabasz index is basically flat between 2 and 6 clusters. Temporal consistency with the other stages suggests a choice of 5 or 6 clusters, and we have chosen 6 clusters at the IM stage. Increasing the number of clusters from 5 to 6 turned out to leave the mean representation of 4 clusters virtually unchanged but split the fifth cluster into two separate clusters^{7} that can both be identified at the previous IS stage as well (not shown).
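A NumPy sketch of the Calinski–Harabasz index in its standard form: the between-cluster dispersion divided by (*k* − 1) over the within-cluster dispersion divided by (*N* − *k*). scikit-learn's `calinski_harabasz_score` uses the same definition and can serve as a cross-check. In practice the index would be evaluated for each candidate number of clusters and the peak selected; the function name here is hypothetical.

```python
import numpy as np


def calinski_harabasz(x, labels):
    """Between-cluster dispersion/(k-1) divided by within-cluster dispersion/(n-k)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n = len(x)
    clusters = np.unique(labels)
    k = clusters.size
    overall_mean = x.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = x[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))
```

For the four 1D points 0, 1, 10, and 11 split into two pairs, the between-cluster dispersion is 100 and the within-cluster dispersion is 1, giving an index of 100 / (1/2) = 200; tighter or more widely separated clusters raise the index.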

#### 2) Representation in physical space

Representing the clusters in physical space is a nontrivial task. The similarity of trajectories within a cluster pertains to the transformed space (i.e., after translation and rotation of individual trajectories). In physical space, however, individual trajectories within one cluster may differ substantially. As discussed in the introduction, the naive approach of representing the clusters by a mean trajectory in physical space may therefore often be misleading. The results below will thus be presented by showing the distribution of start and end points of trajectories in physical space, as well as the cluster-mean trajectory in normalized, transformed space. This mean trajectory is well suited to represent the *shape* of trajectories in a given cluster. The physical interpretation is that air masses within a given cluster move from the initial positions to the end positions following a path that is represented by the shape of the cluster-mean trajectory.
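A minimal sketch of this representation, assuming full-resolution normalized trajectories of shape (N, T, 3), the corresponding physical-space trajectories, and integer cluster labels. It collects the cluster-mean shape together with the initial (*t* = −6 h) and end (*t* = +6 h) positions for plotting. Whether the mean is taken over compressed or full-resolution trajectories is not stated in the text; full resolution is assumed here, and the function name `summarize_clusters` is hypothetical.

```python
import numpy as np


def summarize_clusters(traj_norm, traj_phys, labels, k):
    """Per-cluster mean shape (normalized space) and initial/end positions (physical space).

    traj_norm: (N, T, 3) normalized, transformed trajectories
    traj_phys: (N, T, 3) the same trajectories in physical coordinates
    labels:    (N,) cluster index in 0..k-1
    """
    summary = {}
    for c in range(k):
        members = labels == c
        summary[c] = {
            "size": int(members.sum()),
            "mean_shape": traj_norm[members].mean(axis=0),  # (T, 3) cluster-mean shape
            "initial_positions": traj_phys[members, 0, :],   # end of backward computation
            "end_positions": traj_phys[members, -1, :],      # end of forward computation
        }
    return summary
```

The mean shape is meaningful only in the transformed space, whereas the initial and end positions are plotted in physical space; this mirrors the separation described above.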

## 4. Clustering results and temporal evolution of main airstreams

We first present the clusters that are identified during the TS stage. Subsequently, we describe how these clusters evolve with time. In addition, a qualitatively new cluster will be described that emerges with the increase of vertical wind shear, and thus storm-relative flow, during the IS stage.

### a. Clusters during the TS stage

The clusters identified during the TS stage represent (i) quasi-horizontal, cyclonic swirling around the center, (ii) spiraling inflow, rapid ascent, and anticyclonic outflow, and (iii) slow, corkscrew-like ascent. These clusters meet the expectations that a tropical meteorologist may have for airstreams in the vicinity of a weak TC in a nearly quiescent environment and can thus be considered to be physically meaningful.

#### 1) Quasi-horizontal, cyclonic swirling

Figure 9 illustrates the two clusters that represent quasi-horizontal, cyclonic swirling. The clusters are distinguished by the magnitude of the cyclonic curvature of trajectories (Fig. 9b). The cluster with smaller curvature will be referred to as _{small}CURV and the cluster with larger curvature as _{large}CURV. On average, trajectories in _{small}CURV and _{large}CURV undergo hardly any vertical displacement within the considered time period of 12 h (Fig. 9e). Trajectories in _{large}CURV start and end at smaller radii and encircle the center almost completely, whereas trajectories in _{small}CURV start and end at larger radii and describe, on average, a third of a circle (Figs. 9b,d,f).
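Statements such as "encircle the center almost completely" or "a third of a circle" can be quantified by the azimuth a trajectory sweeps around the storm center. The sketch below assumes storm-relative horizontal positions with the center at the origin and output frequent enough that the azimuth changes by less than π between successive points. Positive values (counterclockwise) correspond to cyclonic motion in the Northern Hemisphere. The function names are hypothetical.

```python
import numpy as np


def swept_azimuth(x, y):
    """Net azimuth (radians) swept around the origin by a storm-relative path.

    np.unwrap removes the 2*pi jumps of arctan2, so multiple revolutions accumulate.
    """
    theta = np.unwrap(np.arctan2(y, x))
    return theta[-1] - theta[0]


def circle_fraction(x, y):
    """Fraction of a full circle described; > 0 is counterclockwise (cyclonic in the NH)."""
    return swept_azimuth(x, y) / (2.0 * np.pi)
```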

The start and end positions of _{large}CURV are located in an annulus around the center, approximately between 1° and 2° radius (Figs. 9a,c). In contrast, the start positions of trajectories in _{small}CURV exhibit a bimodal distribution, split into a northern branch that moves west of the center toward the south and a southern branch that moves east of the center toward the north, mostly outside of 2° radius. Trajectories mostly occur up to 6 and 8 km height in _{large}CURV and _{small}CURV, respectively (Figs. 9d,f). Trajectories below 1.5 km height tend to move radially inward, consistent with the existence of a frictional inflow layer.

#### 2) Inflow, rapid ascent, and anticyclonic outflow

Figure 10 illustrates the two clusters that represent inflow, rapid ascent, and anticyclonic outflow. One cluster comprises spiraling inflow and rapid ascent (hereafter referred to as INUP) and one cluster the outflow anticyclone (hereafter referred to as OUT; Figs. 10b,e). In INUP, the majority of trajectories originate from below 2 km height and from within 4° radius, then rise into the upper troposphere (on average to 12 km height) and remain during the considered time period within 2° radius from the center (Figs. 10a,c,d,f). Trajectories in OUT originate from the upper troposphere, mostly between 9 and 12 km height and predominantly from above and from the southwest of the center. At the end of the TS stage, most of this outflow air is found to the downshear-right between 1° and 4° radius and at heights between 8 and 14 km.

#### 3) Slow, corkscrew-like ascent

In the fifth cluster, trajectories ascend slowly while swirling cyclonically around the center (Figs. 11b,e). This movement implies corkscrew-like ascent and thus the cluster will be referred to as CORK. Trajectories in CORK originate from close to the center (within 1°–2° radius) and predominantly from below 8 km height (Figs. 11a,d).

Trajectories remain within this radial range and, despite their location in the inner core, ascend on average by only 2 km during the 12 h period considered (Figs. 11d,e,f). This trajectory behavior is consistent with the observation by Euler et al. (2019) that a substantial part of Karl’s inner-core vertical mass flux does not reach the outflow layer at this stage. Air parcels that detrain from and entrain into inner-core updrafts and that spend most of the 12 h period in the eye may contribute to this cluster. It is interesting to note that the characteristics of Karl’s inner-core convection manifest themselves as two distinct clusters, INUP and CORK, in the current cluster analysis.

### b. Temporal evolution of clusters

The cluster analysis presented herein has been performed independently for each of Karl’s stages. Subsequently, clusters are tied together across consecutive stages, using the minimum of the pairwise Fréchet distances between the cluster-mean trajectories as an objective criterion.^{8} The analysis thus does not, per se, require a gradual, coherent temporal evolution of the individual clusters. It is therefore notable that the identified clusters do exhibit such a gradual evolution, which strongly supports the notion that the clusters represent physically meaningful airstreams.
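The matching criterion can be sketched with the *discrete* Fréchet distance, computed by the standard dynamic-programming recursion over a monotone coupling of the two polylines. This variant is assumed here for illustration; the exact Fréchet implementation used in the study is not specified in this section. Each cluster-mean trajectory at the later stage is tied to the earlier-stage cluster with the smallest distance. The function names are hypothetical.

```python
import numpy as np


def discrete_frechet(p, q):
    """Discrete Fréchet distance between polylines p (n, d) and q (m, d)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)  # pointwise distances
    n, m = d.shape
    ca = np.empty((n, m))  # ca[i, j]: best coupling cost for prefixes p[:i+1], q[:j+1]
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            # monotone coupling: advance along p, along q, or along both
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return ca[-1, -1]


def match_clusters(means_prev, means_next):
    """Map each next-stage cluster index to the most similar previous-stage cluster index."""
    return {j: int(np.argmin([discrete_frechet(q, p) for p in means_prev]))
            for j, q in enumerate(means_next)}
```

The quadratic cost of the recursion is unproblematic here because only a handful of cluster-mean trajectories, each with few points, are compared.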

#### 1) Clusters with quasi-horizontal motion

During all of Karl’s stages considered herein, the mean trajectories of _{large}CURV and _{small}CURV remain very similar (Figs. 12b,e and 13b,e). Besides these similarities, small gradual changes with time are evident. One gradual change is a decrease in cyclonic curvature. In addition, for _{large}CURV, gradual changes comprise an increase in mean height and increasing ascent during the second part of the trajectory. For _{small}CURV, there is a gradual increase of the length of the mean trajectory, reflecting the increase of vertical shear from the TS to the XT stage. With increasing strength of the environmental flow, trajectories originate and end at larger distance from the center (Figs. 13d,f).

The spatial distributions of the start and end positions exhibit more prominent but still gradual changes. Most notably, in _{large}CURV, the distributions of the start and end positions develop an increasing degree of asymmetry with time (Figs. 12a,c) and extend to larger radii and heights (Figs. 12d,f). During the IM and the XT stage, the end positions indicate an increasing transport of air masses into the downshear region. The increasing asymmetry and increasing downshear transport are a consequence of the gradual increase of vertical shear from the TS to the XT stage. Increasing vertical shear implies increasing storm-relative flow (e.g., Willoughby et al. 1984), which in turn implies (i) an increasing deformation of storm-relative streamlines and (ii) a decreasing area within a dividing streamline (i.e., less air is rotationally constrained to circulate around the vortex center; e.g., Riemer and Montgomery 2011).
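Point (ii) can be illustrated with the idealized configuration of a vortex embedded in a uniform storm-relative flow *U*. The sketch assumes a Rankine wind profile (solid-body rotation inside the radius of maximum wind *r*_max, *v* = *v*_max *r*_max/*r* outside). This is an idealization, not Karl's actual wind profile, and all numerical values are hypothetical. For a cyclonic vortex and *U* < *v*_max, a stagnation point lies where the tangential wind balances *U*, at *r*_s = *v*_max *r*_max/*U* (to the left of the storm-relative flow in the Northern Hemisphere). The closed-streamline region bounded by the dividing streamline through this point scales with *r*_s and therefore shrinks as *U* increases.

```python
import numpy as np


def rankine_wind(r, vmax, rmax):
    """Idealized Rankine tangential wind: solid body inside rmax, ~1/r outside."""
    return np.where(r <= rmax, vmax * r / rmax, vmax * rmax / np.maximum(r, 1e-12))


def stagnation_radius(vmax, rmax, u_env):
    """Radius where the outer Rankine wind equals the uniform storm-relative flow."""
    if not 0.0 < u_env < vmax:
        raise ValueError("requires 0 < u_env < vmax")
    return vmax * rmax / u_env


def total_velocity(x, y, vmax, rmax, u_env):
    """Cyclonic (counterclockwise) vortex plus uniform flow u_env in +x (not valid at r = 0)."""
    r = np.hypot(x, y)
    vt = rankine_wind(r, vmax, rmax)
    return u_env - vt * y / r, vt * x / r
```

For example, with hypothetical values *v*_max = 20 m s⁻¹ and *r*_max = 50 km, doubling *U* from 5 to 10 m s⁻¹ halves *r*_s from 200 to 100 km.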

In _{small}CURV, the bimodal distribution of the start and end positions becomes more distinct with time (Figs. 13a,c). In strong vertical shear (IM and XT stage), the start and end positions clearly exhibit distinct upper- and lower-tropospheric branches (Figs. 13d,f). The occurrence of these two branches (i.e., a lack of midtropospheric trajectories in this cluster) can be explained by the general kinematic structure of vertical-shear flows (Willoughby et al. 1984; Riemer and Montgomery 2011). Tropospheric-deep vertical shear implies that the storm-relative environmental flow changes sign at some midtropospheric steering level, with downshear flow in the upper troposphere and upshear flow in the lower troposphere. Small storm-relative environmental flow in the vicinity of the steering level, combined with Karl’s vortical flow, evidently leads to trajectory behavior that is distinct from that in the lower and upper troposphere, where the combination of Karl’s vortical flow and prominent storm-relative environmental flow leads to similar curvature of trajectories. The configuration of storm-relative environmental flow further implies that air masses tend to move upshear in the lower troposphere and downshear in the upper troposphere, relative to Karl’s center. Consequently, the branch with initial positions downshear (-left) and end positions upshear (Figs. 13a,c) can be identified with the lower-tropospheric branch. Accordingly, air masses in the upper-tropospheric branch move from up- to downshear. Figure 13 further indicates that the lower-tropospheric branch dominates.

Notably, with the advent of considerable vertical shear at the IS stage, a qualitatively new cluster emerges and persists through the IM and the XT stage. At the IS stage, the mean trajectory of this cluster exhibits little vertical motion and curvature, with some cyclonic curvature during the second half of the trajectory (Figs. 14b,e). Trajectories originate from the mid- to upper troposphere (mostly between 6 and 12 km height) and from a widespread region upstream of the center (Figs. 14a,d). Trajectory end positions are mostly right of shear and between 1° and 3° radius (Figs. 14c,f). The mean trajectories at the IM and the XT stage are similar to the IS stage but (i) exhibit more cyclonic curvature during the later part of the trajectory, (ii) start from gradually lower heights, and (iii) indicate weak descent (Figs. 14b,e). The start positions exhibit more distinct changes and are found at larger radii in the upshear region during the IM and the XT stage (Figs. 14a,d). In contrast, the end positions change relatively little and appear much more localized than the start positions. In reference to this attracting nature of the trajectory behavior, we term this cluster ATRC.

Cluster ATRC represents air masses that are transported toward the storm center from large radii. With increasing storm-relative flow (IM and XT stage), trajectories originate from larger radii upshear. Closer to the center, the trajectories are increasingly influenced by the storm’s cyclonic flow. In particular, trajectories enter a region of confluent flow (in terms of flow topology: an attracting manifold; Riemer and Montgomery 2011), which arises from the superposition of vortical and storm-relative flow (cf., e.g., Fig. 18 in Willoughby et al. 1984). Arguably, this confluence organizes the trajectories originating from a widespread region upshear into the more localized region of end positions observed in Fig. 14c.

#### 2) Clusters with rapid ascent

As discussed above, the in-up-out circulation of Karl at the TS stage is represented by inflow and rapid ascent in cluster INUP and upper-tropospheric anticyclonic flow in cluster OUT. Cluster INUP persists through the IS and the IM stage, with strongly curved cyclonic inflow and subsequent rapid ascent (Figs. 15b,e), but is no longer identified at the XT stage. With the increasing vertical shear during the IS and the IM stage, the source region of the inflow becomes increasingly asymmetric and extends out to larger radii (Figs. 15a,d), with inflow from relatively large radii originating predominantly from right of shear. The end positions of the trajectories remain in the vicinity of the center but gradually shift downshear and to larger radii (Figs. 15c,f). These changes are consistent with the increase of shear-induced storm-relative flow in the lower and upper troposphere (i.e., below and above a midlevel steering level). Furthermore, it is interesting to note that trajectories reach lower heights at the IM stage than before (Figs. 15e,f). This change is consistent with the increasingly detrimental thermodynamic environment in which Karl moves during this stage (Euler et al. 2019).

At the IS, the IM, and the XT stage, upper-tropospheric anticyclonic flow in cluster OUT is no longer separated from the rapid ascent below (Fig. 16). In the low-shear environment at the TS stage, upper-tropospheric flow is weak enough that the outflow anticyclone may remain in the vicinity of the center (Fig. 10). The increased vertical shear during the later stages implies, however, that outflow air is rapidly advected downshear (Figs. 16b,c). Compared with cluster INUP at the earlier stages (the TS, the IS, and the IM stage), cluster OUT at these later stages exhibits less pronounced cyclonically curved inflow (cf. Figs. 16b and 15b), earlier rapid ascent (cf. Figs. 16d and 15d), and a distinctly larger downshear displacement in the upper troposphere (cf. Figs. 16c,f and 15c,f). Similar to cluster INUP, the start positions of trajectories become increasingly asymmetric and the outflow gradually reaches lower altitude (Figs. 16a,e).

It is interesting to note that the distinction between clusters INUP and OUT diminishes in the strong-shear environments of the IM and the XT stage: the start height of the mean trajectory of cluster OUT gradually decreases because increasingly more trajectories start at low levels and in the frictional inflow layer, in particular during the XT stage (Figs. 16d,e). In addition, the mean trajectory starts ascending at later times. Arguably, the lack of a distinct INUP cluster at the XT stage is partly due to these diminishing differences.

#### 3) Cluster with slow, corkscrew-like inner-core ascent

At all stages, cluster CORK exhibits mean trajectories that swirl cyclonically around the center at small radii and that ascend slowly (Figs. 17b,e). The ascent increases from the TS to the XT stage. Above the frictional inflow layer, trajectories start mostly below 6–8 km height and within 2° radius. Within the inflow layer, trajectories also originate from larger radii. From the TS to the XT stage, trajectories originate from increasingly large radii within the inflow layer (Figs. 17a,d), and the fraction of inflow-layer trajectories increases (Fig. 17d). Within the inflow layer, trajectories mostly originate from right of shear and exhibit an increasing tendency to end in the downshear region (Figs. 17a,c).

## 5. Avenues for further analysis: Mean thermodynamic characteristics of clusters

The analysis of Karl’s major airstreams provides a kinematic perspective of the evolution. An important further component of a more complete analysis is how thermodynamic properties evolve along these airstreams. To gain more insight into specific aspects of Karl’s evolution, and also to illustrate avenues for further analysis based on automatically identified major airstreams, we here consider the evolution of temperature *T* and moist entropy along trajectories of selected clusters. A more comprehensive thermodynamic analysis is beyond the scope of this study.

The evolution of thermodynamic properties along airstreams can be described succinctly by the associated mass flux in thermodynamic space (Kjellsson et al. 2014; Riemer and Laliberté 2015), here namely temperature (*T*)–moist-entropy space. For simplicity, we use *θ*_{e} as moist-entropy variable [calculated following Bolton (1980)]. The interested reader is referred to Eq. (1) in Riemer and Laliberté (2015) for a formal definition of the thermodynamic mass flux vector. In a nutshell, mass flux directed from low to high *T* and *θ*_{e} manifests an increase of *T* and *θ*_{e} along trajectories, respectively. Accordingly, the mass flux is directed in the opposite direction when *T* or *θ*_{e} decreases along trajectories. Due to a strong dependence on pressure, and thus height, *T* may serve to a first approximation as vertical coordinate. Two specific examples: (i) mass flux from high to low *T* at constant *θ*_{e} (e.g., around *θ*_{e} = 350 K in Fig. 18a) signifies moist-adiabatic ascent and associated cooling by expansion, and (ii) mass flux from low to high *θ*_{e} at high *T* (e.g., for *T* > 290 K and *θ*_{e} < 350 K in all panels of Fig. 18) signifies nonconservative increase of *θ*_{e} by surface fluxes and turbulent vertical mixing in the boundary layer.
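The text states that *θ*_{e} is calculated following Bolton (1980) but does not say which of Bolton's formulas for the temperature at the lifting condensation level, *T*_L, is used. The sketch below assumes Bolton's Eq. (21) for *T*_L (from temperature and vapor pressure) together with his Eq. (43) for *θ*_{e}. Vapor pressure is derived from the mixing ratio as *e* = *pr*/(*ε* + *r*). Inputs are temperature in K, pressure in hPa, and mixing ratio in kg kg⁻¹; the function names are hypothetical.

```python
import numpy as np

EPS = 0.622  # ratio of gas constants of dry air and water vapor


def vapor_pressure_hpa(p_hpa, r_kgkg):
    """Water vapor pressure (hPa) from pressure and mixing ratio."""
    return p_hpa * r_kgkg / (EPS + r_kgkg)


def theta_e_bolton(t_k, p_hpa, r_kgkg):
    """Equivalent potential temperature (K), Bolton (1980) Eqs. (21) and (43)."""
    e = vapor_pressure_hpa(p_hpa, r_kgkg)
    t_l = 2840.0 / (3.5 * np.log(t_k) - np.log(e) - 4.805) + 55.0  # LCL temperature, Eq. (21)
    r = 1000.0 * r_kgkg  # Bolton's formula uses mixing ratio in g/kg
    theta = t_k * (1000.0 / p_hpa) ** (0.2854 * (1.0 - 0.28e-3 * r))
    return theta * np.exp((3.376 / t_l - 0.00254) * r * (1.0 + 0.81e-3 * r))  # Eq. (43)
```

For warm, moist boundary-layer air (300 K, 1000 hPa, 20 g kg⁻¹), this yields *θ*_{e} of roughly 359 K, in the range of the high-*θ*_{e} values discussed below.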

We first consider the thermodynamic characteristics that are associated with the intensification of Karl in increasing vertical shear. Vertical shear increasing from small to moderate values tends to organize convection in weak tropical cyclones (e.g., Molinari et al. 2004) and has been demonstrated to increase inner-core vertical mass flux during the onset of ET (Davis et al. 2008). Euler et al. (2019) have shown that Karl’s intensification is associated with such an increased vertical mass flux of relatively high-*θ*_{e} air (here illustrated in Figs. 18a,b vs c,d; the increased mass flux is most directly evident in the mid- to upper troposphere, i.e., for *T* < 270 K). The new aspect revealed by the current analysis is that rapid, tropospheric-deep convection (the combined clusters INUP and OUT; Figs. 18a,c) is not the only contributor to this increased mass flux: the slower, corkscrew-like ascent (cluster CORK; Figs. 18b,d) also makes a substantial contribution. A closer inspection (not shown) reveals that the mass flux in the mid- to upper troposphere increases from the TS to the IS stage by approximately 50% in the combined clusters INUP and OUT and by approximately 100%–150% in cluster CORK. Arguably, in the case of Karl, increasing vertical shear invigorates not only deep convective ascent but also the more slowly ascending airstream.
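Mass-flux comparisons of this kind can be approximated by binning mass-weighted Lagrangian tendencies of *θ*_{e} and *T* in (*θ*_{e}, *T*) space. The sketch below is a simplified version under stated assumptions. The formal definition is Eq. (1) of Riemer and Laliberté (2015), and the normalization used there (e.g., by bin width) may differ. Each trajectory is assumed to represent a fixed air mass, the output interval is assumed constant, and the function name is hypothetical. Moist-adiabatic ascent then appears as flux toward lower *T* at nearly constant *θ*_{e}.

```python
import numpy as np


def thermo_mass_flux(temp, theta_e, mass, dt, th_edges, t_edges):
    """Time-mean, mass-weighted tendencies binned in (theta_e, T) space.

    temp, theta_e : (n_traj, n_times) values along trajectories [K]
    mass          : (n_traj,) air mass represented by each trajectory [kg] (assumed)
    dt            : constant output interval [s]
    Returns (flux_theta, flux_t), each of shape (len(th_edges) - 1, len(t_edges) - 1).
    """
    th_mid = 0.5 * (theta_e[:, 1:] + theta_e[:, :-1])  # segment midpoints
    t_mid = 0.5 * (temp[:, 1:] + temp[:, :-1])
    weight = np.broadcast_to(np.asarray(mass, dtype=float)[:, None], t_mid.shape)
    dth = (weight * np.diff(theta_e, axis=1) / dt).ravel()
    dtemp = (weight * np.diff(temp, axis=1) / dt).ravel()
    bins = [th_edges, t_edges]
    flux_theta, _, _ = np.histogram2d(th_mid.ravel(), t_mid.ravel(), bins=bins, weights=dth)
    flux_t, _, _ = np.histogram2d(th_mid.ravel(), t_mid.ravel(), bins=bins, weights=dtemp)
    n_steps = t_mid.shape[1]  # average over the trajectory period
    return flux_theta / n_steps, flux_t / n_steps
```

Summing the magnitude of the flux over bins with *T* < 270 K for two stages would give a rough analog of the mid- to upper-tropospheric comparison described above.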

Thermodynamic characteristics of selected clusters also elucidate aspects of Karl’s evolving structure during the onset of ET. During the XT stage (Figs. 18e,f), the thermodynamic mass flux of the combined clusters INUP and OUT and that of cluster CORK remain qualitatively the same as during the tropical stages of Karl (the TS and IS stages, Figs. 18a–d). From this perspective, Karl’s ET at this stage does not imply fundamental changes of the thermodynamic characteristics along these ascending airstreams. Quantitatively, however, ascent during the XT stage occurs with substantially (8–10 K) lower *θ*_{e} and ends at substantially higher *T* (mostly *T* > 230 K instead of *T* < 220 K) than during the tropical stages. We further note that cluster CORK exhibits two ascending branches: one at *θ*_{e} > 350 K and one at *θ*_{e} < 345 K (Fig. 18f). Euler et al. (2019) have demonstrated that Karl’s inner core, notwithstanding Karl’s weak intensity, acts as a containment vessel that transports high-*θ*_{e} air into the increasingly hostile environment during ET. The current analysis indicates that this high-*θ*_{e} air contributes to latent heat release during the XT stage as part of the CORK airstream. Further analysis of such intriguing but more detailed features of Karl’s ET, however, is beyond the scope of the current study.

In contrast, the thermodynamic characteristics along the quasi-horizontal airstreams in cluster ATRC and cluster _{small}CURV exhibit substantial qualitative changes (Fig. 19). During the tropical stages, here exemplified by the IS stage (Figs. 19a,b), the thermodynamic mass flux is generally small and will not be further considered here. During the XT stage, however, cluster ATRC exhibits prominent warming at approximately constant *θ*_{e} (for *θ*_{e} < 340 K, Fig. 19c) and cluster _{small}CURV exhibits a prominent increase of *θ*_{e} at high *T* for low-*θ*_{e} air (for *θ*_{e} < 330 K, Fig. 19d). Arguably, these features are due to adiabatic compression of descending air masses along the baroclinic zone (cluster ATRC) and due to surface fluxes and turbulent mixing in the boundary layer (cluster _{small}CURV). Further exploration of the significance of these features and comparison with other ET cases are potentially fruitful tasks for future studies.

## 6. Summary and conclusions

This study presents a trajectory-clustering procedure that is tailored to TCs and handles a very large number of trajectories [$O(10^6)$]. Clustering is performed in a normalized, transformed space to take into account the large degree of axisymmetry and translational invariance of the TC circulation. The Fréchet distance, which emphasizes differences in the shape of trajectories, is used to define similarity between transformed trajectories. Adequately resolving trajectories in a TC’s convectively dominated inner core requires high spatiotemporal resolution. Considering the larger-scale flow at the same time may readily yield a very large number of trajectories, for which a cluster analysis may become computationally prohibitive.

To make cluster analysis computationally feasible, we perform data compression with the Visvalingam and Whyatt (1993) algorithm, which reduces the number of points representing a trajectory while approximately preserving the general shape of the trajectory in an optimal sense. A further reduction of dimensionality is achieved by landmark multidimensional scaling (De Silva and Tenenbaum 2004). Exact pairwise distances between trajectories, which are required for any clustering algorithm, are calculated for a small subset of trajectories only. These so-called landmarks are used to construct a low-dimensional space that optimally approximates the pairwise distances. For the remaining trajectories, distances are calculated only with respect to the landmarks. The remaining trajectories are then approximately embedded in the low-dimensional space using distance-based triangulation, with the landmarks as reference points. Clustering of all trajectories is then performed with the *k*-means algorithm, based on the approximated distances between trajectories in the low-dimensional space. These approximations reduce the computational cost of the clustering procedure by a factor of $O(10^3)$. Only the clustering procedure is subject to approximations; the analyses presented herein show the full, unapproximated trajectories.
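The two dimensionality-reduction steps can be sketched as follows. The first is a simple O(*n*²) Visvalingam–Whyatt variant that repeatedly drops the interior point with the smallest effective (triangle) area while keeping the end points. The second is landmark MDS following De Silva and Tenenbaum (2004): classical MDS on the landmarks, then triangulation of every item from its distances to the landmarks. The sketch assumes that the distance arrays (e.g., Fréchet distances) have already been computed. For non-Euclidean distances only the leading positive eigenvalues are usable. The function names, and the choice of landmarks, are hypothetical.

```python
import numpy as np


def triangle_area(a, b, c):
    """Area of triangle (a, b, c) in any dimension, via the Gram determinant."""
    u, v = b - a, c - a
    gram = np.dot(u, u) * np.dot(v, v) - np.dot(u, v) ** 2
    return 0.5 * np.sqrt(max(gram, 0.0))


def visvalingam_whyatt(points, n_keep):
    """Drop the interior point with the smallest effective area until n_keep points remain."""
    pts = np.asarray(points, dtype=float)
    keep = list(range(len(pts)))
    while len(keep) > max(n_keep, 2):  # end points are never removed
        areas = [triangle_area(pts[keep[i - 1]], pts[keep[i]], pts[keep[i + 1]])
                 for i in range(1, len(keep) - 1)]
        del keep[1 + int(np.argmin(areas))]
    return pts[keep]


def landmark_mds(d_landmarks, d_to_all, dim):
    """Landmark MDS embedding.

    d_landmarks : (m, m) distances among the m landmarks
    d_to_all    : (m, n) distances from each landmark to every item
    Returns an (n, dim) embedding of all items.
    """
    delta = np.asarray(d_landmarks, dtype=float) ** 2
    m = delta.shape[0]
    centering = np.eye(m) - np.ones((m, m)) / m
    b = -0.5 * centering @ delta @ centering  # classical MDS on the landmarks
    evals, evecs = np.linalg.eigh(b)
    top = np.argsort(evals)[::-1][:dim]
    evals, evecs = evals[top], evecs[:, top]
    if np.any(evals <= 0):
        raise ValueError("fewer than `dim` positive eigenvalues; reduce dim")
    pinv = evecs.T / np.sqrt(evals)[:, None]  # (dim, m) pseudo-inverse of landmark coordinates
    delta_mean = delta.mean(axis=1)  # mean squared distance from each landmark
    # distance-based triangulation: x = -1/2 * pinv @ (delta_x - delta_mean)
    return (-0.5 * pinv @ (np.asarray(d_to_all, dtype=float) ** 2 - delta_mean[:, None])).T
```

With *m* landmarks and *n* trajectories, only *m*² + *mn* distances are needed instead of *n*², which is where the large savings come from when *m* ≪ *n*.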

The ET of Tropical Storm Karl (2016) is investigated to demonstrate the applicability of our clustering procedure. This case has been a focus of a major field campaign and has been linked to a high-impact weather event in the downstream region (Schäfler et al. 2018; Kumpf et al. 2019). In addition, Euler et al. (2019) have investigated inner-core airstreams in Karl that are associated with latent heat release. The focus of the current study is on the more general behavior of airstreams in the vicinity of Karl, specifically within a cylinder of 2.5° radius and over a temporal evolution of 12 h. The choice of these scales is motivated by the typical scales for environmental interaction of TCs and associated intensity change. The presented results are representative of this combination of typical scales. A major change to one of these scales, however, may change the physical problem under consideration and lead to qualitatively different results.

Our cluster analysis is applied to four key stages of Karl’s evolution, as defined by Euler et al. (2019). At all stages, the analysis yields distinct and physically meaningful clusters. The clusters represent (i) quasi-horizontal cyclonic swirling of air masses around the center, subdivided into clusters with large and small trajectory curvature, (ii) the in-up-out secondary circulation, subdivided into clusters with inflow and rapid ascent, and rapid ascent and outflow, and (iii) slowly ascending air masses in the inner core that describe corkscrew-like ascent. An additional cluster emerges with the advent of substantial vertical wind shear and thus storm-relative flow. In this cluster, air masses pass Karl on quasi-horizontal paths with little curvature. Notably, the identified clusters exhibit a gradual and coherent evolution from stage to stage. Our cluster analysis, however, is performed independently for each stage and thus does not, per se, require such a gradual, coherent evolution. We therefore interpret the observed gradual temporal evolution as further evidence that the cluster analysis yields physically meaningful results.

The identified clusters provide an objective and automatable overview of the main airstreams in the vicinity of TCs. Analysis based on such clusters may be particularly useful when intercomparing several different cases or when analyzing many cases (e.g., in an ensemble framework). The approach proposed herein lacks the details found in studies that focus on more specific processes (e.g., Cram et al. 2007; Stern and Zhang 2013; Euler et al. 2019). Such studies, however, usually define airstreams subjectively by introducing threshold criteria for several variables that are tailored to the intensity and structure of a specific TC. In addition, such threshold-based approaches tend to neglect those air parcels that do not fulfill the specific criteria. Taking into account the full Lagrangian solution (i.e., all trajectories within a given region), as in the approach presented herein, may be beneficial for some applications.

A brief analysis of thermodynamic characteristics along main airstreams provides insight into aspects of Karl’s intensification in increasing vertical shear and of structure change during Karl’s ET. While a more comprehensive analysis is beyond the scope of this study, this brief analysis may illustrate avenues for future analyses based on automatically identified main airstreams. Such Lagrangian analyses seem most fruitful for TC problems in which flow asymmetries and the interaction with environmental air have been demonstrated to be of crucial importance; for example, the interaction of mature TCs with vertical shear (e.g., Simpson and Riehl 1958; Frank and Ritchie 2001; Tang and Emanuel 2010; Riemer et al. 2010, 2013), the development and intensification of TCs in shear (e.g., Rios-Berrios et al. 2016a,b, 2018), and ET. Ideally, the objective identification of a TC’s main airstreams may provide a framework for future trajectory analyses similar to the conveyor-belt paradigm for midlatitude cyclones.

## Acknowledgments

The research leading to these results has been done within the subproject “A4—Evolution and predictability of storm structure during extratropical transition of tropical cyclones” of the Transregional Collaborative Research Center SFB/TRR 165 “Waves to Weather” funded by the German Research Foundation (DFG). The code to perform the cluster analysis will be available at https://github.com/wavestoweather. Parts of this research were conducted using the supercomputer Mogon and/or advisory services offered by Johannes Gutenberg University Mainz (hpc.uni-mainz.de), which is a member of the Alliance for High Performance Computing (AHRP) in Rhineland Palatinate (www.ahrp.info) and the Gauss Alliance e.V.

## REFERENCES

*Mon. Wea. Rev.*

*Weather*

*Mon. Wea. Rev.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*Handbuch der Physik*, S. Flügge, Ed., Vol. 48, Springer, 1–154

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Mon. Wea. Rev.*

*Rend. Circ. Matemat. Palermo*

*Quart. J. Roy. Meteor. Soc.*

*Mon. Wea. Rev.*

*Wea. Forecasting*

*Quart. J. Roy. Meteor. Soc.*

*J. Atmos. Sci.*

*Wea. Forecasting*

*Psychometrika*

*IEEE Trans. Vis. Comput. Graph.*

*Mon. Wea. Rev.*

*J. Atmos. Sci.*

*Deep-Sea Res.*

*Tellus*

*Tropical Meteorology*

*Atmos. Chem. Phys.*

*J. Atmos. Sci.*

*Atmos. Chem. Phys.*

*Atmos. Chem. Phys.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*Bull. Amer. Meteor. Soc.*

*Tech. Conf. on Hurricanes*, Miami Beach, FL, Amer. Meteor. Soc., D4-1–D4-10

*Meteor. Atmos. Phys.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*IEEE Trans. Vis. Comput. Graph.*

*Mon. Wea. Rev.*

*Cartogr. J.*

*Physica D*

*Quart. J. Roy. Meteor. Soc.*

*Statistical Methods in the Atmospheric Sciences*. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp

*J. Atmos. Sci.*

## Footnotes

Denotes content that is immediately available upon publication as open access.

This article is included in the Waves to Weather (W2W) Special Collection.

^{1}

By the National Hurricane Center of the U.S. National Oceanic and Atmospheric Administration.

^{2}

The Fréchet distance includes a monotonicity constraint: traversal along each trajectory may not move backward, that is, the traversal velocities may not be negative in the direction of the trajectory path.

^{3}

In our case: except for the seeding point.

^{4}

On average, the difference between simplified and original trajectories in our case decreases exponentially with increasing number of points. Our choice of 11 points lies in the transition region between the rapid decrease of accuracy for smaller numbers and the high accuracy of the simplification achieved with a higher number (>20) of points.

^{5}

The distance metric in low-dimensional space may be different from the original metric. We here use Euclidean distance in low-dimensional space. The final results of our cluster analysis are essentially insensitive to the choice of this distance metric.

^{6}

When clustering is performed using Euclidean distance, the Calinski–Harabasz index tends to indicate a lower optimal number of clusters (for the IS stage, e.g., four instead of six clusters). Our interpretation of this tendency is that the Fréchet distance does indeed provide a sharper distinction between trajectory shapes, which further supports the use of the Fréchet distance in this application.

^{7}

Below, these two clusters will be referred to as clusters INUP and CORK, respectively.

^{8}

The only exception is cluster OUT and cluster ATRC at the transition from the TS to the IS stage. Cluster ATRC at the IS stage is most similar to cluster OUT at the TS stage. The mean trajectory of cluster OUT, however, clearly exhibits anticyclonic curvature throughout the whole trajectory, whereas the mean trajectory of cluster ATRC first exhibits very little curvature and then exhibits cyclonic curvature during the later part (Fig. 14b). Due to this qualitative difference we introduce cluster ATRC as a new, distinct cluster at the IS stage. For cluster OUT at the IS stage, cluster OUT at the TS stage is the most similar cluster. Therefore, and due to the close Lagrangian connection of the in-up-out circulation, we define cluster OUT at the IS stage as the continuation of cluster OUT at the TS stage.