## Abstract

Using trajectories from acoustically tracked (RAFOS) floats in the Gulf of Mexico, we construct a geography of its Lagrangian circulation within the 1500–2500-m layer. This is done by building a Markov-chain representation of the Lagrangian dynamics. The geography is composed of weakly interacting provinces that constrain the connectivity at depth. The main geography includes two provinces of near-equal areas separated by a roughly meridional boundary. The residence time is about 4.5 (3.5) years in the western (eastern) province. The exchange between these provinces is effected through a slow cyclonic circulation, which is well constrained in the western basin by preservation of *f*/*H*, where *f* is the Coriolis parameter and *H* is depth. Secondary provinces of varied shapes covering smaller areas are identified with residence times ranging from about 0.4 to 1.2 years or so. Except for the main provinces, the deep Lagrangian geography does not resemble the surface Lagrangian geography recently inferred from satellite-tracked drifter trajectories. This implies disparate connectivity characteristics with potential implications for pollutant (e.g., oil) dispersal at the surface and at depth. Support for our results is provided by a Markov-chain analysis of satellite-tracked profiling (Argo) floats, which, while forming a smaller dataset and having seemingly different water-following characteristics than the RAFOS floats, replicate the main aspects of the Lagrangian geography. Our results find further validation in independent results from a chemical tracer release experiment.

## 1. Introduction

The oil spill produced by the *Deepwater Horizon* drilling rig explosion in May 2010 (Lubchenco et al. 2012) has motivated great interest in the Lagrangian circulation of the Gulf of Mexico (GoM). This is reflected in the execution in recent years of a number of field campaigns observing its surface Lagrangian circulation. A main reason for investigating the surface Lagrangian circulation is found in the very tangible effects it had on the evolution of the oil slick that emerged from the ocean floor (Olascoaga and Haller 2012). The main campaigns have been the Grand Lagrangian Deployment (GLAD) in July 2012 (Olascoaga et al. 2013; Poje et al. 2014; Beron-Vera and LaCasce 2016) and the Lagrangian Submesoscale Experiment (LASER) in February 2016 (Miron et al. 2017; Novelli et al. 2017). These two campaigns contributed to nearly duplicating the satellite-tracked surface drifter database existing prior to the oil spill, which consisted mainly of drifter trajectories from the National Oceanic and Atmospheric Administration (NOAA) Global Drifter Program (GDP; Lumpkin and Pazos 2007) and the Surface Current Lagrangian-Drifter Program (SCULP; Sturges et al. 2001; Ohlmann and Niiler 2005); see Miron et al. (2017) for more details.

Large amounts of oil were reported to stay submerged and to persist for months without substantial biodegradation (Camilli et al. 2010). Yet, the effects that the deep Lagrangian circulation had on the submerged oil remained elusive, which directly or indirectly motivated the execution of experiments to also observe the Lagrangian circulation at depth. From 2011 to 2015, one experiment featured an important deployment of acoustically tracked floats (Hamilton et al. 2016). This augmented the existing submerged float database, consisting mainly of profiling floats from a dedicated experiment (Weatherly et al. 2005) and routine sensing of the deep global ocean (Roemmich et al. 2009). Another experiment involved the release of a chemical tracer at depth in July 2012 near the *Deepwater Horizon* site and its subsequent sampling over the course of one year (Ledwell et al. 2016). An aspect of the deep Lagrangian circulation highlighted by the dedicated profiling float experiment (Weatherly et al. 2005) was the restricted communication between the eastern and western GoM basins and also a cyclonic circulation at about 900 m in the southwestern sector. Analysis of the acoustically tracked float trajectories in the western basin (Pérez-Brunius et al. 2018) from the recent experiment (Hamilton et al. 2016) further revealed the existence of a cyclonic boundary current below 900 m and a cyclonic gyre over the abyssal plain consistent with numerical studies (Oey and Lee 2002) and the analysis of hydrographic data (DeHaan and Sturges 2005) and deep-water moorings (Tenreiro et al. 2018). Direct inspection of the same acoustically tracked float trajectories (Pérez-Brunius et al. 2018), as well as rough estimates of connectivity between the eastern and western basins (Hamilton et al. 2016), suggests that the exchange between them occurs along the boundary following a cyclonic circulatory motion. In turn, the analysis of the dispersion of the chemical tracer released at depth in the eastern basin (Ledwell et al. 2016) concluded that homogenization by stirring and mixing is substantially faster in the GoM than in the open ocean. The main source of energy in the deep eastern basin is presumably provided by the Loop Current by inducing a deep flow through baroclinic instabilities, deep eddies, and topographic Rossby waves, which can transfer energy toward the western basin (Sheinbaum et al. 2016; Hamilton et al. 2016; Donohue et al. 2016).

The goal of this paper is to shed new light on the deep Lagrangian circulation in the GoM by using probabilistic tools from nonlinear dynamical systems. These are applied to the above acoustically tracked float trajectories with a focus on connectivity. Investigating connectivity with the probabilistic nonlinear dynamics tools summarizes to the analysis of the eigenvectors of a transfer operator approximated by a matrix of probabilities of transitioning between boxes of a grid, which provides a discrete representation of the Lagrangian dynamics (Froyland et al. 2014b). Markov-chain representations of this type had originally been used to approximate almost-invariant sets in nonlinear dynamical systems using short-run trajectories (Hsu 1987; Dellnitz and Junge 1999; Froyland 2005), and in the ocean context to determine the extent of Antarctic gyres in two- (Froyland et al. 2007) and three-space (Dellnitz et al. 2009) dimensions. This eigenvector method has been recently applied on drifter data to construct a geography of the surface Lagrangian circulation (Miron et al. 2017). A Lagrangian geography is composed of dynamical provinces that delineate weakly interacting basins of attraction for almost-invariant attractors, which imposes constraints on connectivity. Here we construct a geography for the deep Lagrangian circulation, providing firm support to earlier inferences from the direct inspection of float trajectories and, furthermore, revealing a number of aspects transparent to traditional Lagrangian data analysis.

## 2. Data

The main dataset analyzed in this paper is composed of trajectories produced by a total of 152 quasi-isobaric acoustically tracked RAFOS (Sound Fixing And Ranging or SOFAR, spelled backward) floats (Rossby et al. 1986) deployed in the GoM (Hamilton et al. 2016). Starting in 2011, 121 floats ballasted for 1500 m and 31 floats for a lower depth of 2500 m were deployed in the following 2 years. Each float recorded position fixes three times daily, with record lengths varying between 7 days and 1.5 years. Given that the estimated position uncertainty is on the order of 5 km, this sampling is more frequent than required, so we here consider daily interpolated trajectories. The float deployment during the first 2 years of a 4-yr-long program was performed by several U.S. (Woods Hole Oceanographic Institution, Leidos Corporation, University of Colorado) and Mexican (Centro de Investigación Científica y de Educación Superior de Ensenada) teams sponsored by the U.S. Bureau of Ocean Energy Management (BOEM). The records of the last floats deployed ended in summer 2015. The recorded trajectories of all floats are shown in Fig. 1 (deployment locations and final positions are indicated in blue and red, respectively). The trajectories cover a region bounded for the most part by the 1750-m isobath (dashed lines in Fig. 1 indicate, from outside to inside, the 1500-, 1750-, and 2500-m isobaths). Note that while the floats are too deep to escape the GoM through the Straits of Florida (where the maximum depth is roughly 700 m), they are capable of escaping through the Yucatan Channel (where the maximum depth is about 2000 m). Mooring measurements suggest that the latter is indeed possible as they have revealed a countercurrent between 500 and 1750 m on the western and eastern sides of the Yucatan Channel (Sheinbaum et al. 2002). However, no float is seen to travel into the Caribbean Sea. Confinement of the 1500- and 1750-m floats within the region bounded by the 1750-m isobath suggests predominantly columnar motion. This is confirmed by the analysis presented below, which ignores the depth of the floats to maximize the number of available trajectories.

We also analyze trajectories recorded by all available (60) profiling floats in the GoM from the Argo Program (Roemmich et al. 2009). Unlike RAFOS float positions, Argo float positions are recorded every 10 days after the float descends down to a parking depth of 1000 m, where it drifts for 9 days, and further to 2000 m to begin profiling temperature and salinity in its ascent back to the surface. The trajectories of the Argo floats roughly sample the same area as the RAFOS floats, albeit much less densely. However, the way that the Argo floats sample the deep Lagrangian circulation differs from that of the RAFOS floats, which remain at all times parked at a fixed depth. Despite this difference, we show that Argo floats replicate some important aspects of the deep circulation inferred using the RAFOS floats.

A third set of independent data considered is composed of concentrations of chemical tracer from a release experiment (Ledwell et al. 2016). In the experiment, a 25-km-long streak of CF_{3}SF_{5} was injected on an isopycnal surface about 1100 m deep and 150 m above the bottom, along the continental slope of the northern GoM, about 100 km southwest of the *Deepwater Horizon* oil well, where oil was detected at depth after its explosion. The tracer was sampled between 5 and 12 days after release, and again 4 and 12 months after release.

## 3. Theory

### a. Transfer operator and transition matrix

Let be a closed flow domain on the plane. We assume that the Lagrangian dynamics is governed by an advection–diffusion process. A tracer initialized at position (represented by a delta measure ) therefore evolves passively for units of time to a probability density , with . The function is clearly nonnegative, and satisfies

The function is a (bounded) stochastic kernel; see, for example, sections 5.7 and 11.7 of Lasota and Mackey (1994) for a discussion of stochastic kernels and advection–diffusion equations, respectively. To evolve a general initial density forward *T* units of time, we define a Markov operator, known as the Perron–Frobenius operator, or more generally a *transfer operator*, , as

The density is the result of evolving forward *T* units of time under the advection–diffusion dynamics.

Note that the only time dependence is the duration of time *T*. In particular, we do not model variation of the advection–diffusion dynamics as a function of initial time. This is appropriate for a probabilistic description of the dynamics, as done in statistically stationary turbulence (Orszag 1977), yet it is also a consequence of the nature of the dataset considered here. In either case the significance of the time homogeneity assumption can only be assessed a posteriori, as we do here.

A discretization of the transfer operator can be attained using a Galerkin approximation referred to as Ulam’s method (Ulam 1960). In the context of ocean dynamics, such approximations have been obtained from simulated trajectories from ocean models in Froyland et al. (2007) and Dellnitz et al. (2009) [a related continuous-time version is discussed in Khatiwala et al. (2005)] and from observed drifter trajectories in Maximenko et al. (2012) and van Sebille et al. (2012). Ulam’s method involves partitioning the domain *X* into a grid of *N* connected boxes and projecting functions in onto the finite-dimensional approximation space , where for and 0 otherwise is the indicator function of set . Define by

To calculate the projected action of on , we compute

The (*i*, *j*)th entry of the array represents an approximation of the kernel for . We assume from now on that the grid is regular, that is, for all . Because represents the density obtained by evolving forward *T* units of time,

is the proportion of tracer beginning in that is found in after *T* units of time [“proportion” because of integration over and the property (1)], averaged over the initial tracer in [“averaged” because of the integration over and division by ].

If we are presented with tracer data in the form of trajectories of individual tracer particles, by considering a sufficiently large number of particles over a total time horizon , we can estimate the entries of as (see appendix A)

The *transition matrix * defines a Markov-chain representation of the dynamics, with the entries equal to the conditional transition probabilities between boxes, which are represented by the states of the chain.

The forward evolution of the discrete representation of , , is calculated under left multiplication, that is,

We note that the discrete evolution described by introduces additional diffusion with magnitude of the order of the box diameters (Froyland 2013).

### b. Ergodicity, mixing, attracting sets, residence time, and retention time

Because the transition matrix is row stochastic, that is, , is a *right* eigenvector with eigenvalue , that is, . The eigenvalue is the maximum eigenvalue of . The associated nonunique *left* eigenvector is normalized to a probability vector () which is invariant (because ) and nonnegative (by the Perron–Frobenius theorem; Horn and Johnson 1990).

We call *irreducible* (or ergodic) if for each there is an such that . All states of an irreducible Markov chain communicate, the eigenvalue is simple, and the corresponding left eigenvector is strictly positive (Horn and Johnson 1990). We call *aperiodic* (or mixing) if there is an *i* such that . For aperiodic one has for *any* initial probability vector .

Suppose that is irreducible on some class of states . We call *S* an *absorbing closed communicating class* if for all , and for some (cf. Froyland et al. 2014b). The set forms an approximate time-asymptotic forward-invariant attracting set for trajectories starting in .

Markov-chain theory also provides a very simple means of determining the mean residence time in a collection of boxes. For , let be the transition matrix restricted to the subset of indices corresponding to boxes whose union is *A*. The mean time of a trajectory initialized in box to move out of *A*, also known as the mean time to hit the complement of *A*, is given by the solution of the linear equation [see, e.g., Norris (1998) and Dellnitz et al. (2009) for its use in the context of ocean dynamics]:

where is the identity matrix. The mean value of within *A* is a measure of the residence time of the entire set.

Another time scale related to residence time is the time-asymptotic retention time. As before, let be the set for which we wish to allocate a retention time and denote the restriction of to boxes whose union is *A*. If is mixing, the leading positive eigenvalue of , , is strictly less than 1. If one conditions on the fact that trajectories have already remained in *A* sufficiently long (so they are distributed like , the leading left eigenvector of ), then the probability of remaining in *A* for one more application of is . The probability of a point in *A* (distributed according to ) remaining in *A* for applications of is

Hence, the mean retention time for points in *A* distributed according to is

Note that if the mean value of , , is computed according to , that is, , then .

### c. Lagrangian geography from almost-invariant decomposition

Revealing those regions in which trajectories tend to stay for a long time before entering another region is key to assessing connectivity in a flow. Such forward time-asymptotic *almost*-invariant sets and their corresponding backward-time basins of attraction can be framed (Froyland et al. 2014b) by inspecting eigenvectors of with .

The magnitude of the eigenvalues quantifies the geometric rates at which eigenvectors decay. Those left eigenvectors with *λ* closest to 1 are the slowest to decay and thus represent the most long-lived transient modes (Froyland 1997; Pikovsky and Popovych 2003). For a given , a forward time-asymptotic almost-invariant set will be identified with the support of similarly valued and like-sign elements in the left eigenvector. Regions where the magnitude of the left eigenvector is greatest are the most dynamically disconnected and take the longest times to transit to other almost-invariant sets.

The multiple backward-time basins of attraction are identified by boxes where the corresponding right eigenvectors take approximately constant values [see Koltai (2011) for the simpler single basin case]. Decomposition of the ocean flow into weakly disjoint basins of attraction for time-asymptotic almost-invariant attracting sets using the above eigenvector method has been shown (Froyland et al. 2014b; Miron et al. 2017) to form the basis of a *Lagrangian geography* of the ocean, where the boundaries between basins are determined from the Lagrangian circulation itself, rather than from arbitrary geographical divisions.

We note that the eigenvector method differs from the flow network approach (Rossi et al. 2014; Ser-Giacomi et al. 2015). The eigenvector method analyzes time-asymptotic aspects of the dynamics through spectral information from the generating Markov chain, while the flow network approach computes various graph-based quantities for finite-time durations to study flow dynamics.

## 4. Results

### a. Building a Markov-chain model

To discretize the deep-ocean Lagrangian dynamics in the GoM, we laid down on the region *X* spanned by the trajectories of the RAFOS floats in Fig. 1 a grid with boxes of roughly 25 km a side. The size of the boxes was selected to maximize the grid’s resolution while each individual box is sampled by enough trajectories (recall that the float’s position is determined with a precision of about 5 km, so the uncertainty area around a float’s location is roughly 8 times smaller than the area of a box in the grid). Figure 2 shows the number of floats per box in the grid, independent of time over the entire 2011–15 period (left) and within each season in this period (right). Regions not visited by floats are found in each season, particularly in winter. Ignoring time, there are on average 86 floats per box, with one box having as many as 286 floats and seven boxes having only one float. Overall, while the float coverage may not be dense enough to carry out a time-dependent analysis, it is sufficient in space to build a Markov-chain model that assumes time homogeneity (Maximenko et al. 2012; van Sebille et al. 2012; Miron et al. 2017; McAdam and van Sebille 2018).

Before getting into the specifics of the computation of the transition matrix defining the Markov chain, it is important to note that formula (5) for does not require the number of trajectories that sample the boxes in the partition to be equal for each box. Thus, the nonuniform sampling of the RAFOS floats is not an impediment for computing . Nevertheless, to make sure that this does not introduce any biases in our analysis, we have also computed various values by randomly choosing a fixed number (50) of trajectories per bin and by randomly removing up to 50% of the float trajectories. The resulting values were found to produce results that could not be distinguished from those produced by the computed using all available trajectories, as done as follows. (Explicit evidence of the robustness of the eigenvector analysis in provided in appendix B.)

Using formula (6), we computed the entry of the transition matrix by counting the number of floats that starting in box on any day landed in box after days within the entire record of float data, which lasts years. A transition time days in general guarantees interbox communication. Furthermore, days is larger than the Lagrangian decorrelation time scale, which we estimated to be of about 5 days for half decorrelation consistent with earlier estimates (cf. LaCasce 2008). Markovian dynamics can be expected to approximately hold as there is negligible memory farther than 7 days into the past. A similar reasoning was applied in applications involving surface drifters (Maximenko et al. 2012; van Sebille et al. 2012; Miron et al. 2017; McAdam and van Sebille 2018), in which case the transition time was taken to be shorter due to the shorter decorrelation time near the ocean surface (LaCasce 2008). The results presented below were found to be insensitive to variations of *T* in the range 7–21 days.

### b. Assessing communication within the Markov chain

A Markov chain can be seen as a directed graph with vertices corresponding to states in the chain, and directed arcs corresponding to one-step transitions of positive probability. This allows one to apply Tarjan’s algorithm (Tarjan 1972) to assess communication within a chain. Specifically, the Tarjan algorithm takes such a graph as input and produces a partition of the graph’s vertices into the graph’s strongly connected components. A directed graph is strongly connected if there is a path between all pairs of vertices. A strongly connected component of a directed graph is a maximal strongly connected subgraph and by definition also a maximal communicating class of the underlying Markov chain.

Applying the Tarjan algorithm to the directed graph associated with the Markov chain derived using the float trajectory data, we found a total of four maximal communicating classes. Each one of these classes is indicated with a different color in Fig. 3. One class, denoted , is large, composed of the majority of the states in the chain or boxes of the partition (935 out of a total of ). From direct computation, for all , , that is, is closed. The classes in the complement of are small, with two formed by a single state and another one formed by nine states. Direct computation shows that for some *m* for all , . This reveals that is absorbing. Because is communicating and closed, restricted to , that is, , is irreducible and provided that no state repeatedly occurs, it will be mixing, that is, a unique, limiting invariant probability density will be supported on . Because is absorbing, its small complement can be safely ignored from the analysis without affecting the results by replacing with .

### c. Forward evolution of the probability density of a tracer

Figure 4 shows selected snapshots of the forward evolution of the probability density of a tracer initially uniformly distributed within the layer between 1500 and 2500 m in the GoM under the discrete action of the underlying flow map. At the coarse-grained level given by the grid defined above, this is defined by (7) with and as derived using the float data. Because we used 7-day-long trajectory pieces to construct , one application of is equivalent to evolving in forward time for *T* = 7 days. Note that the density eventually settles on a nonuniform invariant distribution. Indeed, the differences between the distributions acquired after 5 months and 5 years of evolution are already small under any similarity measure.

The regions where the density distribution of Fig. 4 locally maximizes represent vertical “outwelling” sites. Likewise, there is vertical “inwelling” in the regions where the density locally minimizes. Volume conservation in either case implies vertical motion. While the direction of this motion cannot be determined from the analysis of the float trajectories on a single layer, some sense may be made of its magnitude by comparing area change estimates obtained using the deep floats and the satellite-tracked surface drifter data employed in Miron et al. (2017).

Let be the area of box of the domain on which the float trajectories lie. After one application of , equivalently 7 days, we have . If the flow were area-preserving, we would expect . At depth, area dispersion () corresponds to vertical inwelling, while area concentration () corresponds to vertical outwelling. At the surface, area dispersion and concentration correspond to up- and downwelling, respectively. In Fig. 5 we show probability density function (PDF) estimates of relative area change inferred using deep float data (red) and surface drifter data (blue). The surface relative area change is computed over 7 days and using a partition into boxes of similar size as at depth, approximately 625 km^{2}. Note that both PDFs peak at 0, with the drifter PDF showing longer tails toward large positive values and higher negative values than the float PDF. Overall area preservation is seen to dominate equally at the surface and depth, while a larger tendency to disperse and concentrate areas is observed at the surface than at depth. Area preservation prevalence at depth is consistent with flow developing along isopycnals (Ledwell et al. 2016). But we note that RAFOS floats are isobaric and that the depth of the deep isopycnals can drop by a few hundred meters from west to east (Sturges and Lugo-Fernandez 2005).

### d. Analysis of the Markov chain’s eigenspectrum

We now proceed to determine the level of connectivity within the horizontal domain in the layer visited by the deep floats by applying the eigenvector method on the matrix . The eigenspectrum of is composed of a total of eigenvalues. As noted above, only eigenvectors of corresponding to eigenvalues close to unity are relevant, and because is very sparse, one may use Lanczos-type methods (Lehoucq et al. 1998) to compute the largest eigenvalues. With this in mind, we show in Fig. 6 the eigenspectrum of restricted to . This top 10% portion of the eigenspectrum includes 10 eigenvalues, yet not all 10 associated eigenvectors might need to be taken into account. Indeed, thinking of as a decay rate for a signed density revealed by the *n*th left eigenvector of , if a large spectral gap between two collections of eigenvalues is present, then the densities revealed by the eigenvectors associated with eigenvalues prior to the gap will decay much more slowly and survive over much longer time scales than the densities revealed by the eigenvectors associated with the eigenvalues after the gap. Therefore, the presence of a pronounced eigengap provides rationale for stopping eigenvector analysis. However, inspection of Fig. 6 does not reveal any gap that strongly suggests a cutoff for analysis except, perhaps, at or . Yet, only the gap at may be considered significant with respect to the uncertainty of the eigenvalue computation. The gray shading in Fig. 6 represents an uncertainty of the computation of as measured by the median absolute deviation about in an ensemble of 1000 realizations computed from transition matrices constructed using randomly perturbed float trajectories with 5-km amplitude, corresponding to the float positioning accuracy. This uncertainty measure grows noticeably beyond , making the gap there not significant. This suggests, in the absence of evidence establishing any better criterion, that the analysis should not be extended beyond the sixth eigenvector. It must be noted that we have so far implicitly referred to the real eigenspectrum of . Interspersed among the dominant eigenvalues (in absolute magnitude) are complex-conjugate eigenvalue pairs. We will discuss a relevant complex eigenvector pair after discussing the real eigenvectors, restricted to the first six as just articulated.

As expected from the assessment of communication within the Markov chain associated with , to numerical precision its largest eigenvalue, , is simple. The corresponding left and right eigenvectors are shown in the left and right panels of Fig. 7, respectively. The right eigenvector is flat as required by row stochasticity of , which is exactly satisfied given that no floats exit the domain *X* on which is defined (the physical significance of this domain being closed is discussed below). The left eigenvector is positive within the absorbing closed communicating class of states of the Markov chain, which covers most of the boxes of the partition (cf. Fig. 3), and vanishes elsewhere. Furthermore, the structure of the left eigenvector resembles the distribution of the probability density in the rightmost panel of Fig. 4. More precisely, this is nearly equal to the left eigenvector when normalized by the area of the boxes in the partition, confirming its invariant, limiting nature.

The top left and right panels of Fig. 8 respectively show the left and right eigenvectors associated with the second-largest eigenvalue of , . Note that the left eigenvector takes a single sign within each side of a zero-level curve that partitions the floats’ domain into two regions. Two basins of attraction are identified by the right eigenvector. These are given by the regions where the right eigenvector is approximately flat, that is, approximately looks like **1**. Splitting the domain into two regions between which there is weak interaction, the western (eastern) region constitutes a basin of attraction for the attractors revealed by the left eigenvector in the western (eastern) side of the domain. To make the connection between forward-time attractors and backward-time basins of attraction explicit, the eigenvectors have been assigned (in this and the subsequent figure) signs such that positive (negative) portions of the right eigenvector map to positive (negative) portions of the left eigenvector under repeated right multiplication by . These attractors are not invariant but rather will retain tracer for a finite period of time [intramixing time scales, loosely referred to as invariance time scales in Miron et al. (2017), can be estimated by thinking of as a decay rate as noted above; more insightful measures of the invariance time of a set are provided by the residence and retention times in the set, which are considered below].

Inspection of the left eigenvector associated with the 3rd largest eigenvalue of , , reveals further attracting sets, but with shorter invariance time scale (Fig. 8, middle-left panel). One such set in particular highlights a tendency of the Lagrangian motion to circulate along the 1500-m isobath on the western side of the domain for tracers initially covering a large sector in the center of the domain. The latter is revealed in the right eigenvector (Fig. 8, middle-right panel), which is nearly flat in the noted central sector. The right eigenvector is also approximately flat on two regions flanking this sector which are weakly connected through a very narrow channel running along the southern edge of the domain. The two regions cover the entire domain, forming two basins of attraction for almost-invariant sets in the regions of the left eigenvector with like sign.

Additional almost-invariant attracting sets (with shorter invariance time scales) and corresponding basins of attraction are revealed by the left–right eigenvector pairs associated with the fourth to sixth eigenvalues of (see, e.g., the fifth pair in the bottom row of Fig. 8). Patching together these and the above basins of attraction, the various Lagrangian geographic partitions shown in Fig. 9 are obtained.

### e. Lagrangian geography

Rather than thresholding right eigenvectors as in prior applications (Froyland et al. 2014b; Miron et al. 2017), the various provinces in each Lagrangian geography constructed here were automatically obtained by applying a *k*-means clustering algorithm (Kaufman and Rousseeuw 1990) that minimizes squared Euclidean distance as outlined in Algorithm 1 of Froyland (2005), but with the weighted fuzzy clustering replaced with *k*-means clustering. The main geographic partition in the left panel of Fig. 9 was obtained by seeking clusters in the second right eigenvector of . Refined geographic partitions, shown in the middle and right panels, resulted by considering more eigenvectors in the relevant set suggested by the eigenvalue uncertainty computation and eigengap inspection. The refined partition in the right panel of Fig. 9 was obtained by seeking clusters in the second through sixth right eigenvector. The middle panel shows an intermediate partition resulting from seeking clusters in the second through fourth right eigenvector. Seeking clusters from the second eigenvector is an obvious choice that follows from direct inspection of this eigenvector. Seeking clusters from the second through *K*th eigenvectors assumes that each right eigenvector adds a new province to the geography at a time. The silhouette value (Rousseeuw 1987) is a measure of how similar an object is to its own cluster compared to other clusters, with a large *s* indicating that the object is well matched to its own cluster and poorly matched to neighboring clusters. The mean *s* equals 0.91, 0.54, and 0.56 for the clusters identified using , 4, and 6 when the first 2, 4, and 6 right eigenvectors are taken into account, respectively. Larger *k* in the latter two cases can produce more consistent clusters, albeit much smaller and hence with shorter residence or retention times, so we have not considered them.

In the two-eigenvector geography, two large provinces, one western (WE) and another one eastern (ES), split the domain nearly in half. The four-eigenvector geography incorporates two provinces in WE: a small northern subprovince (WN) and another southern subprovince (WS) even smaller. The six-eigenvector geography incorporates to WE these same small subprovinces and another, much larger, central subprovince (WC). Province ES is not modified by the four-eigenvector geography, while the six-eigenvector geography alters it by the addition of a small northern subprovince (EN).

As constructed, the provinces of the above Lagrangian geographies only weakly dynamically interact. This imposes constraints on connectivity within the 1500–2500-m layer in the GoM. More specifically, the communication between any two provinces is constrained locally by the level of invariance of the attractors contained within each of them and remotely by that of any attractors outside of the provinces but sufficiently close to them.

The level of communication among provinces can be assessed by the computation of forward-time conditional transition probabilities between pair of provinces. Over 7 days, the mean interprovince probability percentages are 98.5%, 97.5%, and 94.3% for the two-, four-, and six-eigenvector geographic partitions, respectively. Note that these are high, indicating weak dynamical interaction among provinces. Note also that the percentages decrease as the number of provinces in the geography increases. This reflects in part that communication within large provinces is less constrained than across their boundaries. The resulting transition matrices restricted to the various geographies are not symmetric, revealing the asymmetric nature of the Lagrangian dynamics in time. The asymmetry grows with the number of provinces in the partition, as can be expected.

From left to right, Fig. 10 shows residence time estimates according to formula (8) within each of the provinces in the two- to six-eigenvector Lagrangian geographies. While the basins of attraction are not necessarily highly invariant under forward-time dynamics, we record relatively high residence times. Note that decreases outward down to zero at the boundaries of the provinces. As expected, the maximum values tend to be attained where the right eigenvectors of locally maximize (in absolute magnitude). These regions, as described above, correspond to basins that are attracted toward forward-time almost-invariant attracting sets. The small values of attained in the periphery of the provinces (basins of attraction corresponding to those attractors) simply indicate that mixing with neighboring provinces begins at their boundaries.

The residence time calculation shows a west–east asymmetry in the two-eigenvector geography, with the WE province having longer residence times than the ES province. Specifically, the mean residence time in WE (ES) province is of about 4.53 (3.38) years. It must be mentioned that the mean here is taken with respect to a uniform probability distribution, that is, it is computed as an average according to Lebesgue (area) measure. As noted earlier in the paper, mean residence times computed using (8) coincide with retention times in (10) based on the likelihood of a trajectory to survive in a given set if the mean in the former is taken according to the probability distribution given by the leading left eigenvector of restricted to the set in question. The resulting mean residence (or, equivalently, retention) times are 4.74 and 3.74 years for the WE and ES provinces, respectively, which are very similar to the values stated above.

The provinces in the four- and six-eigenvector geographies have shorter residence times. For instance, on average within the WE, WS, WC, WW, ES, and EN provinces in the six-eigenvector partition these are about 0.74, 0.90, 0.48, 0.64, 1.18, and 0.38 years, respectively. Shorter residence times in sets covering smaller areas are expected. But note the short residence time of the WC province despite its large coverage.

The direct evolution of tracers with further revealed that the slow exchange between the main provinces is executed from east (west) to west (east) through a northern (southern) corridor. This suggests a slow cyclonic circulation in the deep GoM. Such a preferred circulatory motion is consistent with the peculiar shape of the geography in the western side of the domain, which includes an enclave around which tracers will tend to circulate before the exchange of material is effected. This explains the residence time asymmetry of the main provinces.

The west–east residence time asymmetry can be further realized by computing the time it takes on average to hit or reach a given province starting in another province. This can be done using (8) with the region A set to the *complement* of the target province. The result of this calculation for the six-eigenvector geographic partition is shown in Table 1. The top row shows source provinces and the left column target provinces. Consider, for example, the bottom row. The mean time to hit EN in the eastern basin starting on WC in the western side of the domain is 6.42 years. Consider now the second-to-top row. To reach WC from EN, it takes on average 1.91 years. Consistent with west–east residence time asymmetry, it takes more than 3 times as long to reach EN from WC. Clearly, the mean time to reach a given province starting in the same province is 0. Note that WS has not been included in the table as this province is never reached from outside.

## 5. Validation

### a. Chemical tracer

The deep Lagrangian geography constructed here and the surface Lagrangian geography computed by Miron et al. (2017) are globally different on the overlapping domains, suggesting that the surface Lagrangian motion is to a large extent decoupled from the deep Lagrangian motion. An important exception is the partition by a roughly meridional boundary of the surface and deep domains into two basins of attraction for almost-invariant attractors revealed by the inspection of eigenvectors of the corresponding transition matrices with the second-largest nonunity eigenvalue [cf. the left panel of Fig. 9 and the left panel of Fig. 6 of Miron et al. (2017)].

The restricted connection at depth between the eastern and western GoM was suggested by the behavior of profiling floats parked at about 900 m launched in the eastern side, which tended to stay there, and those launched in the western side, which remained there for a long period of time (Weatherly et al. 2005). Here we provide support for the significance of the eigenvector methodology applied to the deep GoM domain at a deeper level using the observed evolution of the chemical tracer injected near the *Deepwater Horizon* oil rig during the field experiment described by Ledwell et al. (2016).

The right panels of Fig. 11 show the distribution taken by the chemical tracer 4 (top) and 12 (bottom) months after release. The release site lies about 100 km southwest of the cross, indicating the location of the *Deepwater Horizon* rig. The circles are colored according to the amount of tracer found during in situ casts, integrated vertically between 1000 and 2500 m. The colored background is a smoothed interpolated map based on the station data. Note the tendency of the tracer to spread over the eastern side of the domain. We may compare this tendency with the dominant (second) left eigenvector, representing the dominant forward-time almost-invariant set structure. After 12 months from release, little tracer is seen to have traversed the zero-level set of the left eigenvector of with the largest eigenvalue (indicated by a solid line). And note that absolutely no tracer at all was detected during the in situ casts made well in the western side of the domain. This is consistent with the expected fate of a tracer probability density initiated on the main eastern province of the Lagrangian geography constructed from the float data.

This expectation is confirmed by the evolution of a tracer probability started from a source location near the chemical tracer release site under the action of the transition matrix (Fig. 11, left panels). Consistent with the chemical tracer evolution, the (synthetic) tracer probability spreads similarly over the eastern side of the domain without significantly crossing the zero-level set of the largest nontrivial left eigenvector.

It must be noted, however, that as the tracer probability is continually being evolved under , it will eventually spread over the western basin along its northern boundary of the domain, describing a cyclonic circulatory pattern as noted above consistent with inferences made from the direct inspection of float trajectories, the analysis of hydrography and mooring data, and numerical simulations (Hurlburt and Thompson 1980; Oey and Lee 2002; Mizuta and Hogg 2004; DeHaan and Sturges 2005; Bracco et al. 2016; Hamilton et al. 2016; Pérez-Brunius et al. 2018; Tenreiro et al. 2018). Also consistent with direct inspection of trajectories (Hamilton et al. 2016; Pérez-Brunius et al. 2018), and high-resolution modeled fields (Bracco et al. 2016), some tracer will circulate cyclonically around the enclave inside the main western province of the Lagrangian geography.

### b. Profiling floats

Additional independent observational support for the significance of the results obtained from the analysis of the Markov-chain model derived using the RAFOS floats is provided by the analysis of a Markov-chain model constructed using Argo profiling floats drifting at an average parking depth of 1000 m. At a shallower depth, the Argo trajectories sample a similar horizontal domain of the GoM as the RAFOS trajectories, but less densely (there are only 60 Argo floats in the database analyzed). Also, the temporal coverage of the Argo floats is not as ample as the RAFOS floats. With these differences in mind, we constructed a matrix of probabilities of the Argo floats to transitioning between the boxes of a grid similar to that used with the RAFOS floats. The transition time was set to 7 days as in RAFOS floats analysis, which required us to interpolate the original 10-day trajectories. The Markov chain associated with the resulting was found to be characterized by an absorbing closed communicating class of states spanning most boxes of the partition. Thus, was found to be the only eigenvalue of on the unit circle, implying the existence of a limiting invariant probability vector. The structure of the left–right eigenvector pair for the second eigenvalue is shown in Fig. 12. Albeit more noisy and gappy, this pair shows similarities with that of the computed using the RAFOS floats (Fig. 8, top row). Indeed, a partition of the domain nearly into two basins of attraction is evident. This adds confidence to the RAFOS float analysis. It also suggests a tendency of the motion to be preferentially columnar in the deep GoM (recall that the Argo floats ascend to transmit positions at the surface while the RAFOS floats remain parked at a fixed level at all times during an experiment).

## 6. Discussion

### a. Cyclonic circulation and f/H

The cyclonic circulation in the western side of the GoM domain is well described by complex eigenvectors of . Let be a complex-conjugate left eigenvector pair of with , where . After *k* applications of , . Viewing each complex component of as a vector in , if the latter represents a rotation of each such vectors by an angle [cf. Froyland et al. (2014a), example 4.5, and the discussion immediately after]. Furthermore, returns to after applications of . Because need not be an integer, will not in general exactly coincide with , which represents almost-cyclic sets. Figure 13 shows where corresponds to the leading (for which and ) for several *k* over an almost cycle (with period years). Independent of whether or is considered, note the cyclonic rotation described in the western main province of the Lagrangian geography.

Rectification of topographic Rossby waves has been identified as a driver for the cyclonic circulation along the boundary (Hurlburt and Thompson 1980; Oey and Lee 2002; Mizuta and Hogg 2004; DeHaan and Sturges 2005). In linear, unforced, inviscid, barotropic and quasigeostrophic dynamics (Gill 1982), the vorticity changes only when there is motion across *f*/*H* contours, where *f* is the Coriolis parameter (twice the local vertical component of Earth’s angular velocity) and *H* is the fluid depth. As such, *f*/*H* provides a restoring force, supporting topographic Rossby waves, which through nonlinear interaction can be rectified to give rise to a mean flow directed mainly along *f*/*H* isolines (de Verdiere 1979). Here we test *f*/*H* conservation using the Markov-chain model and further assess its effect on the evolution of probability tracer densities in the domain.

Specifically, let be the mean value of *f*/*H* inside box in the partition. To first-order approximation, must be preserved along tracer trajectories entering box . To check this, we propagate backward the observable *f*/*H* by *k* steps with the transition matrix . Define , the *i*th component of the backward propagation of *f*/*H*, averaged over those boxes that the forward propagation of intersects. If *f*/*H* were exactly preserved along trajectories, then one would expect .

Figure 14 shows after applications of (corresponding to 1-yr forward evolution). Note that *f*/*H* is preserved with 25% error or less along the western and southern boundaries of the domain and over a large region of the eastern side of the domain. Because the restoring force provided by *f*/*H* is largest where *f*/*H* varies rapidly, tracer trajectories initially on the western boundary will be constrained to run along that boundary as the gradient of *f*/*H* across isobaths there is large (*f* is relatively constant in the domain). A similar behavior may be expected for trajectories starting on the southern boundary across which *f*/*H* changes rapidly. However, *f*/*H* is not uniformly preserved along this boundary. Indeed, boxes where is large are interspersed among boxes where this is small. As a consequence, trajectories starting on that boundary are not expected to be so constrained to run along it. In the eastern portion of the domain where is small, the bottom is relatively flat. As a result, trajectories starting will unlikely follow any particular *H* isoline while nearly conserving *f*/*H*. Moreover, this will tend to occupy the domain in question, which resembles the positive side of second left eigenvector of (cf. Fig. 8, upper panels). Finally, in the large western region where is large, the trajectories will wander unrestrained within the region, which itself resembles quite well the negative side of the second left eigenvector of .

The expected behavior of tracer trajectories deduced from the Markov-chain model is corroborated by observed float trajectory patterns. This is shown in Fig. 15 for groups of float trajectories that have gone through selected sites along the boundary of the domain (left) and the center of the western side of the domain (right). Note, in the left panel, how the red and orange trajectories tend to run along the western boundary, while the green trajectories are not so constrained to doing so along the southern boundary. Observe too in this panel how the blue trajectories cover the eastern side of the domain, consistent with *f*/*H* being preserved in that region. Finally, note that the trajectories in the right panel loop around in a largely unrestricted manner.

### b. Homogenization

The analysis of the chemical tracer injected at depth suggested that homogenization in the GoM is more rapid than in the open ocean (Ledwell et al. 2016). While the Markov-chain model constructed here does not predict uniform homogenization in the long run, it supports a limiting, invariant distribution which does not reveal a preferred region for accumulation but rather a multitude of different small regions where some accumulation is possible. The highly structured texture of this distribution suggests partial homogenization in the long run. This can be quite fast. For instance, for a tracer released in the eastern side of the domain, it can take as little as 1 year or so to spread over that portion of the domain (cf. Fig. 10), consistent with the good agreement between the forward evolution of a tracer probability and the observed chemical tracer spreading (recall Fig. 11). This corresponds well with the mean time required for the EN province to hit the WE province, which is of 0.86 years (cf. Table 1 and Fig. 9). A rough estimate of mean lateral eddy diffusivity is 2.4 × 10^{4} m^{2} s^{−1}, computed as the area of the eastern province (approximately *N*/2 = 473 times the area of the individual boxes, about 25^{2} km^{2}) divided by 1 year. Consistent with the idea of fast homogenization, this is quite big, on the order of values estimated using surface drifters in the vicinity of the Gulf Stream (Zhurbas and Oh 2004).

### c. Ventilation

Below 1000 m the GoM is filled with oxygen-rich water which is isolated from the diffusive inflow of oxygen from the surface by the presence of a layer of oxygen-poor water (Nowlin et al. 2001). As standard deep-water formation is very unlikely due to the extreme cooling and salinity increase required for the surface layer to sink, ventilation of the deep GoM has been argued to be accomplished via horizontal transport of oxygen-rich water from the Caribbean Sea (Rivas et al. 2005).

The tendency of the Lagrangian motion as inferred from the Markov-chain model constructed using the RAFOS floats to conserve area more effectively than that using surface drifters, together with the similarities of the dominant eigenvectors of the transition matrices built using RAFOS and Argo floats, is consistent with the above observations in that Lagrangian motion within the 1500–2500-m layer is predominantly horizontal. However, the Markov-chain model does not represent exchanges through the boundary of the domain as the RAFOS floats deployed inside the domain do not escape the domain.

Yet, the above does not rule out the possibility that floats deployed outside the domain eventually enter the domain. Indeed, the Argo floats suggest that this might happen at about 1000 m. This is shown in Fig. 16 for a few Argo float trajectories that start in the Caribbean Sea. We say “might” because we do not know for certain that the Argo floats penetrate the GoM domain at their parking depth or at a shallower level in their ascent and descent. Yet, ventilation might indeed take place more effectively at a shallower level. This is suggested by hydrographic observations which situate the core of the North Atlantic Deep Water (NADW) mass inside the Caribbean Sea between 1200 and 1300 m (Hamilton et al. 2018).

### d. Mean circulation

We close the discussion by investigating if the results obtained using the Markov-chain model could have ever been revealed by simply considering advection–diffusion using mean velocities deduced from the RAFOS float trajectories. This is a valid question inasmuch as our assumption of time homogeneity of the statistics could be thought to be represented in this simpler way. The top panel of Fig. 17 shows ensemble-mean streamlines computed by integrating a steady velocity field resulting from averaging the float velocities in each box of the grid used to construct the Markov-chain model. While the streamlines suggest a cyclonic flow along the periphery of the domain especially on its western side, which is consistent with the transfer operator analysis and also direct inspection of float trajectories (Hamilton et al. 2016), it is difficult to find a correspondence among the many sources and sinks with the various local minima and maxima of the limiting, invariant distribution of the Markov-chain model (cf. the rightmost panel of Fig. 4 or the left panel of Fig. 7). The differences with the Markov-chain model results are most clearly evidenced when the evolution of a tracer under the corresponding flow and subjected to diffusion is compared with the evolution of a tracer probability density under the transition matrix . The middle and bottom panels of Fig. 17 show snapshots of the evolution of a narrow Gaussian (normalized to represent a probability density) initially near the chemical tracer injection site for two diffusivity choices. The middle panels show the case of pure stirring since the simulation uses a diffusivity of 7 × 10^{−10} m^{2} s^{−1}, a molecular value for CF_{3}SF_{5} at mean ocean temperature estimated by Vardner and Loose (2016). The bottom panels use an eddy diffusivity of 50 m^{2} s^{−1}, estimated by an anonymous reviewer from the product of a characteristic length of 5 km (estimated from the resolution of the RAFOS floats’ positions) and a characteristic velocity of 0.1 m s^{−1} [a value that can be considered typical and consistent with diffusion diagrams of Okubo (1971) for the surface ocean and arguably also useful for the quite energetic deep GoM]. In the first (advective) case, the tracer is mainly stirred (stretched and folded) by eddy structures of the mean streamlines, which also trap the tracer near the initialization site. With eddy diffusivity, the mean flow eddies are much less effective in retaining the tracer near the initialization region. The tracer spreads more rapidly, yet not as widely as a probability density from a source located initially near the same location under action of or, nearly equivalently, the chemical tracer injected nearby (cf. Fig. 11). The substantial differences between these results and those from the Markov-chain model should not come as a big surprise since the Markov-chain model, while time homogeneous, is derived from trajectories sustained by time-varying deep-ocean currents and represent, in a statistical sense, the advection–diffusion dynamics of such a time-dependent flow. The mean flow streamlines plus a simple eddy diffusion do not represent this variability and are unable to reproduce the observed tracer spreading.

## 7. Summary and concluding remarks

Analyzing acoustically tracked (RAFOS) float data in the Gulf of Mexico (GoM), we have constructed a geography of its Lagrangian circulation within the deep layer between 1500 and 2500 m, revealing aspects of the circulation transparent to standard Lagrangian data examination as well as confirming, and thus providing firm support to, other aspects already noted from direct inspection of the float trajectories. The analysis was done by applying a probabilistic technique that enables the study of long-term behavior in a nonlinear dynamical system using short-run trajectories. The Lagrangian geography is inferred from the inspection of the eigenvectors of a transfer operator approximated by a transition probability matrix of the floats to moving over 1 week between boxes of a grid laid down on the domain visited by the floats. Such a transition matrix provides a Markov-chain representation of the Lagrangian dynamics.

The basic geography has a single dynamical province which constitutes the backward-time basin of attraction for a time-asymptotic invariant attracting set, which is revealed by the unique left eigenvector of with unit eigenvalue. This suggests that the residence time for tracers within the 1500–2500-m layer is very long. This result together with the tendency of the motion to preserve areas more effectively at depth than at the surface as inferred using satellite-tracked drifters suggest that transport and mixing is predominantly lateral.

Lateral transport and mixing inside the layer scrutinized do not happen unrestrainedly. Indeed, left eigenvectors of with eigenvalues close to unity reveal almost invariant sets that attract Lagrangian tracers originating in disconnected regions where the corresponding right eigenvectors are nearly flat. These backward-time basins of attraction define the provinces of a nontrivial Lagrangian geography, which, because they are only weakly dynamically interacting, impose constraints on connectivity and thus on lateral transport and mixing of tracers.

The simplest nontrivial geographical partition includes two nearly equal-area western and eastern provinces. Tracers initially released within these main provinces tend to remain confined within a few years, with the western province retaining tracers for longer than the eastern province. Communication between the provinces is accomplished through a cyclonic flow confined to the periphery of the domain, which was shown to be highly constrained by conservation of *f*/*H*, where *f* is the Coriolis parameter and *H* is depth, in the western side of the domain. Smaller secondary provinces of different shapes with residence times shorter than 1 year or so were also identified, imposing further restrictions on connectivity at shorter time scales.

Except for the main provinces, the secondary provinces identified do not resemble those of the surface Lagrangian geography recently inferred from satellite-tracked drifter trajectories. This implies disparate connectivity characteristics with possible implications for pollutant (e.g., oil) dispersal at the surface and depth.

The evolution of a chemical tracer from a release experiment as well as the analysis of a smaller set of Argo profiling floats were shown to provide independent support for the Lagrangian geography derived using the RAFOS floats. It is quite remarkable that the RAFOS and Argo floats produced similar Markov chain representations of the Lagrangian dynamics of the deep GoM given the different sampling characteristics (parking depth, temporal coverage) of these two observational platforms.

The good agreement between the results from the RAFOS and Argo float analyses suggests that the probabilistic tools employed here applied on the global Argo float array may provide important insight into the abyssal circulation of the World Ocean.

## Acknowledgments

We acknowledge Amy Bower’s group at Woods Hole Oceanographic Institution for RAFOS float preparation, data acquisition, and final processing, which made the float trajectory dataset possible. We thank Alexis Lugo-Fernandez for helping us access the acoustically tracked float data, which are currently available from the National Oceanic and Atmospheric Administration (NOAA)/Atlantic Ocean and Meteorological Laboratory (AOML) subsurface float observations page (http://www.aoml.noaa.gov/phod/float_traj/index.php). The profiling float data were collected and made freely available through SEA scieNtific Open data Edition (SEANOE) at http://www.seanoe.org (doi:10.17882/42182) by the International Argo Program and the national programs that contribute to it (http://www.argo.ucsd.edu, http://argo.jcommops.org). The Argo Program is part of the Global Ocean Observing System (http://www.goosocean.org). The chemical tracer data are publicly available through the Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC) at https://data.gulfresearchinitiative.org (doi:10.7266/N79P2ZK2, doi:10.7266/N75X26VQ, and doi:10.7266/N7251G4Q). Support for this work was provided by the Gulf of Mexico Research Initiative (P.M., F.J.B.V., and M.J.O.) as part of CARTHE, the Consejo Nacional de Ciencia y Tecnología (CONACyT)–Secretaría de Energía (SENER) Grant 201441 (F.J.B.V., M.J.O., and P.M.) as part of the Consorcio de Investigación del Golfo de México (CIGoM), and the Australian Research Council (ARC) Discovery Project DP150100017 (G.F.).

### APPENDIX A

#### Derivation of (6)

Denote by the time indices at which particle positions may be available. At time *t*, denote by the number of particle positions available at time *t*, and denote the particle indices at time *t* by . Finally, for each and , denote the *q*th particle position at time *t* by . The corresponding particle positions *T* time units later denoted by . We approximate

where *δ* is a Dirac delta; thus, with probability 1,^{A1} a trajectory initialized at (at time *t*, although we disregard this information) lands at after *T* units of time.

We approximate the integral of an arbitrary function *f* over by sampling trajectories:

Putting these two estimates together we have [by setting ],

or (6).

### APPENDIX B

#### Robustness under Dataset Truncations

Evidence of the robustness of the eigenvector method results is provided in Fig. B1, which shows from left to right six-eigenvector geography partitions using transition matrices constructed by randomly excluding 10%, 25%, and 50% of RAFOS trajectories (recall that trajectories from only 152 floats are available for analysis). With a 10% reduction of the data, all regions obtained from the analysis of the complete dataset are present. The only difference is the boundary of the WE province, a transition region between the eastern and western deep GoM characterized initially by lower data density and residence time. Reducing the dataset by 25% and 50% leads to the separation of the WE province into a central region and a band around the WC province. The cyclonic circulation inferred from the analysis of the full dataset around the WC province is still represented. Using fewer trajectories eventually leads to the loss of information in some bins, for example, inside the WC province. Nonetheless, all Lagrangian provinces of the complete dataset are represented considering half of the data.

## REFERENCES

*Chaos*,

**24**, 033126, https://doi.org/10.1063/1.4892530.

*Atmosphere-Ocean Dynamics*. Academic Press, 662 pp.

*Matrix Analysis*. Cambridge University Press, 561 pp.

*Cell-to-Cell Mapping: A Method of Global Analysis for Nonlinear Systems*. Applied Mathematical Sciences, Vol. 64, Springer, 354 pp.

*Finding Groups in Data: An Introduction to Cluster Analysis*. Wiley, 342 pp.

*Proc. Eighth AIMS Int. Conf.*, Dresden, Germany, American Institute of Mathematical Sciences, 854–863, https://doi.org/10.3934/proc.2011.2011.854.

*Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics*. 2nd ed. Applied Mathematical Sciences, Vol. 97, Springer, 474 pp.

*ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems by Implicitly Restarted Arnoldi Methods*. Society for Industrial and Applied Mathematics, 160 pp.

*Lagrangian Analysis and Prediction of Coastal and Ocean Dynamics*, A. Griffa et al., Eds., Cambridge University Press, 39–67, https://doi.org/10.1017/CBO9780511535901.003.

*Markov Chains*. Cambridge University Press, 254 pp., https://doi.org/10.1017/CBO9780511810633.

*Fluid Dynamics*, R. Balian and J.-L. Peube, Eds., Gordon and Breach, 235–374.

*Circulation in the Gulf of Mexico: Observations and Models*.

*Geophys. Monogr.*, Vol. 161, Amer. Geophys. Union, 347 pp., https://doi.org/10.1029/GM161.

*A Collection of Mathematical Problems*. Interscience Tracts in Pure and Applied Mathematics, No. 8, Interscience, 150 pp.

_{3}SF

_{5}in pure water and artificial seawater

*Circulation in the Gulf of Mexico: Observations and Models*,

*Geophys. Monogr.*, Vol. 161, Amer. Geophys. Union, 315–324, https://doi.org/10.1029/161GM22.

## Footnotes

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{A1}

If one wishes to have a smooth kernel approximation, one could, for example, replace each with tight Gaussians ; for the box radius, this will have negligible impact on the resulting estimates for .