## Abstract

The thermohaline structure of the Southern Ocean is deeply influenced by the Antarctic Circumpolar Current (ACC), through which water masses of the World Ocean are advected, transformed, and redistributed to the other basins. Describing and visualizing the complex 3D pattern of this circulation and its associated tracer distribution remains a challenge. Here, a simple framework is presented to analyze the Southern Ocean thermohaline structure. A functional principal component analysis (PCA) is applied to temperature *θ* and salinity *S* profiles to determine the main spatial patterns of their variations. Using the Southern Ocean State Estimate (SOSE), this study determines the vertical modes describing the Southern Ocean thermohaline structure between 5 and 2000 m. The first two modes explain 92% of the combined *θ*–*S* variance, thus providing a surprisingly good approximation of the thermohaline properties in the Southern Ocean. The first mode (72% of total variance) accurately describes the north–south property gradients. The second mode (20%) mostly describes salinity at 500 m in the region of Antarctic Intermediate Water formation. These two modes present circumpolar patterns that can be closely related to standard frontal definitions. By projecting any given hydrographic profile onto the SOSE-based modes, it is possible to determine its position relative to the fronts. The projection is successfully applied to the hydrographic profiles of the WOCE SR3 section. The Southern Ocean thermohaline decomposition provides an objective way to define water mass boundaries and their spatial variability and has useful applications for comparing model output with observations.

## 1. Introduction

The Southern Ocean contains a latitude band with no meridional boundary, allowing the Antarctic Circumpolar Current (ACC) to flow from west to east around Antarctica. This nearly zonal current is organized into three major fronts that deeply influence the distribution of properties (Talley et al. 2011): from north to south, the Subantarctic Front (SAF), the Polar Front (PF), and the Southern ACC Front (SACCF). The Southern Ocean is traditionally bounded to the north by the Subtropical Front (STF), a major water mass boundary between the warm, salty subtropical surface waters and the cold, fresh subantarctic surface waters (Orsi et al. 1995; Belkin and Gordon 1996). South of the southern boundary of the ACC (SBdy) lie the three subpolar gyre systems of the Weddell–Enderby Basin, the Australian–Antarctic Basin, and the Ross Basin. The location and intensity of ACC fronts significantly influence the exchange of heat, salt, and nutrients in the World Ocean (Rintoul and Sokolov 2001) as part of the global meridional overturning circulation (MOC).

Several criteria have been proposed in the literature to determine the position and characteristics of ACC fronts. Historically, they relied on phenomenological indicators (Wyrtki 1960; Deacon 1982; Pollard et al. 2002) such as the position of temperature minimum or salinity maximum layers. Tracking specific hydrographic values at given depths was later preferred, as it is generally less ambiguous (Park et al. 1993; Orsi et al. 1995; Belkin and Gordon 1996). Refined atlases of hydrographic properties in the Southern Ocean were used in the early 1990s to draw a circumpolar picture of ACC fronts (Orsi et al. 1995), which remains a widely accepted reference to date. The density gradient resulting from temperature and salinity gradients across ACC fronts generally drives a geostrophic current; a front is therefore often associated with a swift, deep-reaching current with a clear signature in sea surface height (SSH). Sokolov and Rintoul (2007) proposed frontal definitions based on specific SSH values for the Australian–Antarctic Basin sector, later generalized to the entire Southern Ocean by Sokolov and Rintoul (2009). In their SSH-based view, they defined eight circumpolar fronts instead of the three proposed by Orsi et al. (1995), significantly complicating the picture of ACC fronts. These fronts are prone to merging and splitting because of the intense eddy activity along the ACC. A number of frontal definitions were later proposed based on the same idea (Böning et al. 2008; Sallée et al. 2008; Langlais et al. 2011; Chapman 2014; Kim and Orsi 2014), and satellite SSH observations proved very useful for tracking fronts and eddies.

However, the notion of well-defined circumpolar fronts structuring the ACC has been increasingly debated, as it cannot account for some major features of the ACC. Distinct differences are observed in the ACC structure between basins, with fronts more intense in the Atlantic and Indian sectors than in the Pacific sector. In several sectors, such as around the Kerguelen and Campbell Plateaus, the surface and deep signatures of ACC fronts can diverge by several hundred kilometers (Park et al. 2008; Smith et al. 2013). Also, water mass fronts are not always unambiguously associated with frontal jets that can be tracked from the surface. Strong gradients in temperature and salinity can be density compensating (e.g., Graham et al. 2012), and the magnitude of the gradients depends largely on the local strength of the topographic control or of the mesoscale variability, so it can vary significantly in time. The idea of an ACC structured by circumpolar fronts conveys an essentially barotropic view in which the fronts act as an effective barrier to meridional exchange, yet the ACC has a strong baroclinic signature that supports the southern MOC (Marshall and Speer 2012; Peña-Molino et al. 2014).

The distribution of temperature and salinity gives some clues as to how the 3D structure is organized. At the surface, temperature shows circumpolar contours similar to the STF, SAF, and PF, but farther south its gradients are too weak to control the stratification (Dong et al. 2006). The salinity distribution better describes the fronts of the Antarctic zone (AZ; SACCF; SBdy). Pollard et al. (2002) observed that the main features of the ACC zonation can be related to the stratification, specifically to whether the stratification just below the mixed layer is controlled by the vertical gradient of temperature, of salinity, or of both acting together. Similarly, Carmack (2007) pointed out that the upper layers of high-latitude seas are permanently stratified by salinity (beta oceans), while the upper layers of subtropical seas are permanently stratified by temperature (alpha oceans). More recently, Stewart and Haine (2016) proposed maps of the distribution of alpha, beta, and transition zone oceans. Comparing the salinity distribution at the surface and at ~600 m best illustrates the complex, nonbarotropic nature of the ACC (Fig. 1). At the surface, salinity closely covaries with temperature, but at ~600 m it presents a very different structure. A low-salinity trough present in the Indian and Atlantic basins widens in the Pacific, providing a clear signature of the subduction of Antarctic Intermediate Water (AAIW). In the Pacific, the low-salinity signal extends north, creating an exchange window with subtropical waters (Iudicone et al. 2007). In this region, observations diverge most dramatically from standard definitions of the SAF and STF, challenging the reality of their circumpolar nature.

In this study, we analyze the shape of temperature and salinity profiles using an objective statistical method based on functional principal component analysis (PCA) in the multivariate case (Ramsay and Silverman 2005). Classical PCAs are widely used in climate science in the form of empirical orthogonal functions (EOFs). They have also been used to handle several oceanographic variables simultaneously in order to delineate water masses (de Boer and van Aken 1995; Lindegren and Josefson 1998). Yet, in this paper, the PCA is applied on a spline functional model of temperature and salinity profiles instead of applying it directly to the profiles themselves (hence the term functional). The extension of univariate functional PCA to multivariate functional data is of high practical relevance to reveal joint variation of the different variables (Happ and Greven 2015). A multivariate functional PCA provides an effective way to understand the structure of a multivariate system such as the 3D temperature–salinity structure of the Southern Ocean.

Functional PCA has become increasingly important in recent years and has been applied to various datasets in many fields. In medical sciences, for example, it has been used to explore human cornea curvature images (Locantore et al. 1999) and brain scan images (Viviani et al. 2005). It has also helped to describe handwriting dynamics (Ramsay 2000; Hosseinkashi and Shafie 2009), characterize Internet pageview dynamics (Moyer et al. 2015), classify the size distribution of a zooplankton community (Nerini and Ghattas 2007), and explore the variations of cadmium concentrations in Pacific oysters (Feng et al. 2011). Here, to our knowledge for the first time, a functional PCA is used in physical oceanography.

This paper is organized as follows: Section 2 describes the hydrographic datasets. Section 3 describes the functional PCA method and how it has been applied to Southern Ocean hydrographic data. Section 4 presents the results of the decomposition, focusing in particular on the first two modes of variability, which together explain more than 90% of the hydrographic variance in the upper 2000 m of the Southern Ocean. The last two sections conclude on the potential of the functional PCA method and discuss the implications of our results for our understanding of the Southern Ocean stratification and frontal structure.

## 2. Data

### a. The Southern Ocean state estimate

The main dataset consists of potential temperature and practical salinity (*θ*–*S*) profiles from the Southern Ocean State Estimate (SOSE).^{1} SOSE is a 4D estimate of the Southern Ocean physical state, obtained by constraining an eddy-permitting simulation based on the Massachusetts Institute of Technology General Circulation Model (MITgcm; Marshall et al. 1997) to remain close to available Southern Ocean observations (Mazloff et al. 2010). SOSE was preferred over hydrographic data or more standard climatological products because it is provided on a regular spatial and temporal grid and yet is, by construction, physically and kinematically consistent (Wunsch and Heimbach 2007).

The model grid has a ⅙° resolution and covers the Southern Ocean south of 25°S. We considered only data south of 30°S to avoid possible boundary effects. Moreover, 30°S is the most inclusive definition that fully encompasses all Southern Ocean phenomena north of the STF in each basin (Talley et al. 2011). A functional PCA could be run on the full-resolution SOSE product, but such resolution is not needed because the focus is on large-scale variations. We therefore use only one data point every five grid points in both latitude and longitude, reducing the number of grid points by a factor of 25. SOSE provides 5-day means, which we average in consecutive pairs, giving 36 ten-day outputs for the period from 1 January to 26 December 2007.
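The spatial subsampling and pairwise time averaging described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the processing code used in this study. It assumes the 5-day means are held in an array ordered as (time, depth, latitude, longitude), that latitude is in degrees north (negative in the south), and that the 73 five-day means of 2007 (365/5) are paired consecutively with the unpaired last record dropped; the helper name `subsample_and_average` is hypothetical.

```python
import numpy as np


def subsample_and_average(field, lat, stride=5, lat_max=-30.0):
    """Hypothetical helper: keep latitudes south of lat_max, take every
    `stride`-th grid point in latitude and longitude, and average
    consecutive pairs of 5-day means into 10-day means.

    field : array (time, depth, lat, lon) of 5-day means
    lat   : array (lat,), degrees north
    """
    keep = lat <= lat_max
    sub = field[:, :, keep, :][:, :, ::stride, ::stride]
    lat_sub = lat[keep][::stride]
    # Pair consecutive 5-day means; an odd leftover record is dropped (assumption).
    n_pairs = sub.shape[0] // 2
    paired = sub[: 2 * n_pairs].reshape(n_pairs, 2, *sub.shape[1:])
    return paired.mean(axis=1), lat_sub


# Usage with synthetic data: 73 five-day means on a small grid.
lat = np.linspace(-78.0, -25.0, 10)
field = np.random.default_rng(0).normal(size=(73, 3, 10, 12))
ten_day, lat_sub = subsample_and_average(field, lat)
print(ten_day.shape)  # (36, 3, 2, 3)
```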

The vertical grid has 42 unevenly spaced levels extending from the surface (5 m) to the bottom (5575 m). The statistical method requires profiles over a fixed depth range. Considering that ACC fronts rarely cross regions shallower than 2000 m (e.g., Sokolov and Rintoul 2009) and that most existing hydrographic data (Argo profiles) extend to about 2000 m, we applied the method to *θ*–*S* profiles between 5 and 2000 m (28 levels). Regions shallower than 2000 m (e.g., the Kerguelen, Falkland, and Campbell Plateaus or the Antarctic continental shelf) were excluded from the analysis.
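A minimal sketch of this depth selection, assuming profiles stored as (level, point) arrays together with a per-point bottom depth; the helper name is hypothetical. Levels deeper than 2000 m are dropped, and whole columns are discarded wherever the ocean is shallower than 2000 m.

```python
import numpy as np


def select_upper_ocean(depth, profiles, bottom_depth, z_max=2000.0):
    """Hypothetical helper.

    depth        : (nz,) level depths in m, positive downward
    profiles     : (nz, npoints) theta or S values
    bottom_depth : (npoints,) local ocean depth in m
    Returns the retained depths, the retained profiles, and the column mask.
    """
    level_mask = depth <= z_max          # keep levels in the 5-2000 m range
    column_mask = bottom_depth >= z_max  # drop plateaus and shelves shallower than z_max
    return depth[level_mask], profiles[level_mask][:, column_mask], column_mask


depth = np.array([5.0, 15.0, 500.0, 1900.0, 2000.0, 2500.0])
profiles = np.arange(18.0).reshape(6, 3)
bottom = np.array([1500.0, 3000.0, 2000.0])
z, prof, cols = select_upper_ocean(depth, profiles, bottom)
print(z.size, prof.shape)  # 5 (5, 2)
```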

### b. WOCE SR3 section

The World Ocean Circulation Experiment (WOCE) coordinated numerous hydrographic sections across the World Ocean in the 1990s. We used the WOCE SR3 section (Rintoul et al. 1997; Sokolov and Rintoul 2002) as a case study. It has been widely used and lies at a longitude representative of the zonally averaged Southern Ocean. This section was sampled from Tasmania to Antarctica along the 140°E meridian on board the *Aurora Australis* between December 1994 and February 1995. Only profiles deeper than 2000 m were retained, yielding 44 stations with temperature and salinity sampled every 2 m (Fig. 1).

## 3. Method

The goal is to analyze the simultaneous variations of *θ* and *S* profiles as a function of depth *z*. For each spatial position, an observation is defined as two vertical profiles of two different tracers (*θ* and *S*), described as continuous curves through a decomposition in a B-spline basis. We then analyze these curves with the method termed functional PCA by Ramsay and Silverman (2005). The term EOF is used by oceanographers and atmospheric scientists. EOF analysis and PCA are the same method, except that the term EOF is mostly used for analyses that seek the main spatial structures in a two-dimensional dataset (space), with time as the sampling dimension. In our case, we seek vertical structures in a one-dimensional dataset (depth), with space and time as the sampling dimensions. The PCA allows us both to decouple the phenomena that govern the structure of the curves and to quantify the amount of variance that they induce in the water column structure. The analysis is applied to the coefficients of the B-spline expansion, hence in a functional space. It transforms the original variables into new, uncorrelated variables that are linear combinations of the original ones and concentrate the variance of the system. These new variables are the principal components (PCs) and represent the most significant modes of data variation. Throughout this section, the reader can refer to the nomenclature table (Table 1), which lists every matrix, vector, and scalar with its size and a short definition.

### a. Transformation of discrete profiles into functions

Vertical sampling differs from one dataset to another, which rules out most conventional statistical methods that operate on large data tables. To be compared, datasets must first be brought onto a common vertical grid. This is achieved by linearly interpolating the SR3 and SOSE data onto a uniform vertical grid of 100 levels spaced 20 m apart. The profiles are then projected onto a chosen basis {*ϕ*_{1}, . . . , *ϕ*_{K}} of finite dimension *K*, which allows us to express a profile as a linear combination:

$$x_n(z) = \sum_{k=1}^{K} \alpha_{n,k}\,\phi_k(z), \tag{1}$$

where the *α*_{n,k} are coefficients estimated by least-squares regression on the available data. In other words, the discrete profiles are smoothed, turning them into continuous curves *x*_{n}(*z*) that can be compared at any depth. For this study the functions *ϕ*_{k} form a B-spline basis. B-splines are polynomial segments of degree three joined end to end at argument values (knots) and fitted to the data under continuity and differentiability constraints. Other bases can be relevant as well, but this one is well suited to nonperiodic functions and offers very fast computation and great flexibility (Ramsay et al. 2013). The number of segments controls the flexibility of the spline (Fig. 2). By default, we used *K* = 80 spline functions to represent a 2000-m-deep profile.
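The two steps above (linear interpolation onto the uniform grid, then a least-squares fit of cubic B-splines) can be sketched with NumPy and SciPy's `make_lsq_spline`. This is an illustration under stated assumptions, not the authors' code: the uniform grid is assumed to run from 5 to 2000 m in 100 levels, the interior knots are assumed equally spaced with clamped (repeated) end knots, and the helper names are hypothetical.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline


def clamped_knots(z_top, z_bot, n_basis, degree=3):
    """Knot vector giving exactly n_basis B-splines of the given degree:
    (degree + 1) repeated end knots plus n_basis - degree - 1 equally
    spaced interior knots (equal spacing is an assumption)."""
    interior = np.linspace(z_top, z_bot, n_basis - degree + 1)[1:-1]
    return np.r_[[z_top] * (degree + 1), interior, [z_bot] * (degree + 1)]


def profile_to_coefficients(z_raw, x_raw, n_basis=80, n_levels=100):
    """Interpolate a raw profile onto a uniform grid, then fit the B-spline
    expansion x(z) = sum_k alpha_k phi_k(z) by least squares.
    Returns the K coefficients alpha_k and the fitted spline."""
    z_grid = np.linspace(5.0, 2000.0, n_levels)   # assumed grid bounds
    x_grid = np.interp(z_grid, z_raw, x_raw)      # z_raw must be increasing
    knots = clamped_knots(z_grid[0], z_grid[-1], n_basis)
    spline = make_lsq_spline(z_grid, x_grid, knots, k=3)
    return spline.c, spline


# Synthetic temperature profile sampled every 2 m, as for SR3.
z_raw = np.arange(5.0, 2001.0, 2.0)
theta_raw = 2.0 + 8.0 * np.exp(-z_raw / 300.0)
coef, spline = profile_to_coefficients(z_raw, theta_raw)
print(coef.shape)  # (80,)
```

The returned `spline` can be evaluated at any depth, which is what makes profiles from different vertical grids directly comparable.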

The proposed method can either keep the vertical complexity of the data (high number of basis functions) or discard it by smoothing the profile (low number of basis functions), which is useful when the data are heavily corrupted by noise (Nerini et al. 2010). A projection onto a basis of *K* = 20 functions would be sufficient to estimate the SOSE profiles well, given their coarse vertical resolution and smooth curvature. However, the number of basis functions to use also depends on the dataset to be projected onto the modes afterward. A smaller number of basis functions is appropriate for observations with sparse vertical sampling and noise. Here, the case study (WOCE SR3 section) consists of clean, high-resolution profiles (one sample every 2 m). These profiles can be estimated with a high number of basis functions to capture finescale structure without adding another source of error. We find that using 80 B-splines per profile leads to a negligible loss on the SR3 data, with mean absolute errors of 10^{−3} psu for salinity and 7 × 10^{−3} °C for temperature. The estimation error of the SOSE profiles with *K* = 80 B-splines is close to zero. This projection from numerous raw data to a basis with a reasonable number *K* of coefficients can also be useful as a first step toward dimension reduction.
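The trade-off between smoothing and fidelity can be checked by comparing the mean absolute fitting error, the diagnostic quoted above for SR3, for different values of *K*. A minimal sketch, assuming a synthetic profile with 150-m finescale oscillations, a 100-level grid from 5 to 2000 m, and equally spaced interior knots; the helper names are hypothetical.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline


def clamped_knots(z_top, z_bot, n_basis, degree=3):
    """Clamped knot vector yielding n_basis cubic B-splines (equal spacing assumed)."""
    interior = np.linspace(z_top, z_bot, n_basis - degree + 1)[1:-1]
    return np.r_[[z_top] * (degree + 1), interior, [z_bot] * (degree + 1)]


def mean_abs_fit_error(z, x, n_basis):
    """Mean absolute difference between the data and their least-squares
    B-spline fit with n_basis basis functions."""
    spline = make_lsq_spline(z, x, clamped_knots(z[0], z[-1], n_basis), k=3)
    return np.mean(np.abs(spline(z) - x))


z = np.linspace(5.0, 2000.0, 100)
x = 2.0 + 8.0 * np.exp(-z / 300.0) + 0.3 * np.sin(2.0 * np.pi * z / 150.0)
e20 = mean_abs_fit_error(z, x, 20)   # smooth fit: finescale largely filtered out
e80 = mean_abs_fit_error(z, x, 80)   # close fit: finescale retained
print(e20, e80)
```

With noisy observations the same comparison cuts the other way: the smaller basis then filters noise rather than signal.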

### b. Functional principal component analysis

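For each spatial position, *θ* and *S* curves are projected onto a B-spline basis with *K* coefficients each. We therefore have a data table **C** with *N* rows and *L* = 2 × *K* columns containing the coefficients of that decomposition. Each row constitutes a vector **c**_{n} such that

$$\mathbf{c}_n = \left(\alpha^{\theta}_{n,1}, \ldots, \alpha^{\theta}_{n,K}, \alpha^{S}_{n,1}, \ldots, \alpha^{S}_{n,K}\right)^{\mathsf{T}}, \tag{2}$$

where *α*^{θ}_{n,k} and *α*^{S}_{n,k} are the coefficients of Eq. (1) for the *θ* and *S* profiles, so that **c**_{n} has *L* = 2 × *K* elements merging the coefficients of both the temperature and salinity profiles. The mean observation can be computed with

$$\bar{\mathbf{c}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{c}_n, \tag{3}$$

where its *k*th element *c̄*_{k} corresponds to the mean of the *N* coefficients of column *k* in **C**. As in Eq. (1), a mean functional profile can be obtained for both salinity and temperature with

$$\bar{\theta}(z) = \sum_{k=1}^{K} \bar{c}_k\,\phi_k(z), \qquad \bar{S}(z) = \sum_{k=1}^{K} \bar{c}_{K+k}\,\phi_k(z). \tag{4}$$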
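Assembling the coefficient table, the mean observation, and the mean profiles takes only a few array operations. A minimal sketch, assuming the *θ* and *S* coefficients are already available as two *N* × *K* arrays and that both share one clamped cubic knot vector; the helper names are hypothetical. Because the B-spline expansion is linear, the curve built from the mean coefficients equals the mean of the individual curves.

```python
import numpy as np
from scipy.interpolate import BSpline


def build_coefficient_table(coef_theta, coef_salt):
    """Stack theta and S coefficients (each N x K) into the N x 2K table C;
    return C, its column means (mean observation), and the anomaly table."""
    C = np.hstack([coef_theta, coef_salt])
    c_mean = C.mean(axis=0)
    return C, c_mean, C - c_mean


def mean_profiles(c_mean, knots, degree=3):
    """Mean theta and S curves: the first K entries of c_mean weight the
    basis for theta, the last K entries weight it for S."""
    K = c_mean.size // 2
    return BSpline(knots, c_mean[:K], degree), BSpline(knots, c_mean[K:], degree)


# Toy example: N = 50 profiles, K = 8 cubic B-splines on [5, 2000] m.
rng = np.random.default_rng(1)
K = 8
knots = np.r_[[5.0] * 4, np.linspace(5.0, 2000.0, K - 2)[1:-1], [2000.0] * 4]
coef_theta, coef_salt = rng.normal(size=(50, K)), rng.normal(size=(50, K))
C, c_mean, C_anom = build_coefficient_table(coef_theta, coef_salt)
theta_bar, salt_bar = mean_profiles(c_mean, knots)
```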

Let us now consider the anomaly coefficient matrix **C̃**, obtained by subtracting the mean observation from each row of **C**. The functional PCA consists of finding the unique decomposition of the matrix **V** by solving the following eigenvalue problem:

$$\mathbf{W}^{1/2}\,\mathbf{M}^{1/2}\,\mathbf{V}\,\mathbf{M}^{1/2}\left(\mathbf{W}^{1/2}\right)^{\mathsf{T}}\mathbf{b}_l = \lambda_l\,\mathbf{b}_l, \tag{5}$$

where **b**_{l} is the *l*th eigenvector associated with the eigenvalue *λ*_{l}, and the metric **W** and the weighting matrix **M** are defined below. These *L* eigenvectors can then be ranked by decreasing eigenvalue, so that the first vertical mode corresponds to the eigenvector with the largest eigenvalue, and so on. The matrix **V** is the *L* × *L* cross-covariance matrix of the coefficients in **C̃**, computed with

$$\mathbf{V} = \frac{1}{N-1}\,\tilde{\mathbf{C}}^{\mathsf{T}}\tilde{\mathbf{C}}. \tag{6}$$

Because each row of **C̃** merges centered coefficients from the *θ* and *S* decompositions, the covariance matrix has a block structure that can also be expressed as

$$\mathbf{V} = \begin{pmatrix} \mathbf{V}_{\theta\theta} & \mathbf{V}_{\theta S} \\ \mathbf{V}_{S\theta} & \mathbf{V}_{SS} \end{pmatrix}, \tag{7}$$

where **V**_{Sθ} denotes the *K* × *K* covariance matrix between the coefficients of variable *S* and those of variable *θ*.
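The cross-covariance matrix and its four blocks follow directly from the centered coefficient table. A short sketch, assuming an (*N* − 1) normalization and the *θ*-then-*S* column ordering used above; the helper name is hypothetical. The off-diagonal blocks are transposes of each other, so only one of them carries independent information.

```python
import numpy as np


def cross_covariance(C_anom):
    """L x L covariance of centered coefficients and its four K x K blocks
    (theta coefficients in the first K columns, S in the last K)."""
    N, L = C_anom.shape
    K = L // 2
    V = C_anom.T @ C_anom / (N - 1)
    blocks = {
        "theta_theta": V[:K, :K], "theta_S": V[:K, K:],
        "S_theta": V[K:, :K], "S_S": V[K:, K:],
    }
    return V, blocks


rng = np.random.default_rng(5)
C = rng.normal(size=(100, 6))
C_anom = C - C.mean(axis=0)
V, blocks = cross_covariance(C_anom)
```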

Because B-splines do not form an orthonormal basis, the *L* × *L* matrix **W** ensures the metric equivalence between the functional problem (working on functions) and its discrete version (working on coefficients). It is constructed by blocks as follows:

$$\mathbf{W} = \begin{pmatrix} \mathbf{W}_{\theta} & \mathbf{0} \\ \mathbf{0} & \mathbf{W}_{S} \end{pmatrix}, \qquad \left(\mathbf{W}_{i}\right)_{jk} = \int \phi_j(z)\,\phi_k(z)\,dz, \tag{8}$$

where the matrices **W**_{i} contain the scalar products of the basis functions {*ϕ*_{1}, . . . , *ϕ*_{K}}, which here are integrals over the depth range of the profiles. They can easily be computed or approximated numerically. In this case, **W**_{θ} = **W**_{S} because the same B-spline basis has been used for the *θ* and *S* basis expansions. Similarly, the block diagonal weighting matrix **M** of size *L* × *L* is used to balance the coefficient values when decomposing the variance of both temperature and salinity data. It plays the role of the normalization step that is usual in standard PCA when variables do not have the same units.
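The scalar products in **W**_{i} can be computed exactly with Gauss–Legendre quadrature on each knot interval: the product of two cubic pieces is a polynomial of degree six, which a four-point rule integrates exactly. A sketch assuming a clamped cubic knot vector on [5, 2000] m; the helper names are hypothetical. A useful sanity check is that the entries of **W**_{i} sum to the length of the depth interval, because clamped B-splines sum to one everywhere on it.

```python
import numpy as np
from scipy.interpolate import BSpline


def gram_matrix(knots, degree=3, n_gauss=4):
    """K x K matrix of scalar products int phi_j(z) phi_k(z) dz of the
    B-spline basis defined by `knots`, by Gauss-Legendre quadrature on each
    knot interval (exact for cubic splines when n_gauss >= 4)."""
    K = len(knots) - degree - 1
    basis = BSpline(knots, np.eye(K), degree)      # evaluates all K basis functions at once
    nodes, weights = np.polynomial.legendre.leggauss(n_gauss)
    W = np.zeros((K, K))
    breaks = np.unique(knots)
    for a, b in zip(breaks[:-1], breaks[1:]):
        z = 0.5 * (b - a) * nodes + 0.5 * (a + b)  # map nodes from [-1, 1] to [a, b]
        w = 0.5 * (b - a) * weights
        phi = basis(z)                             # shape (n_gauss, K)
        W += phi.T @ (w[:, None] * phi)
    return W


def block_metric(W_single):
    """Block-diagonal metric for stacked (theta, S) coefficients, W_theta = W_S."""
    zeros = np.zeros_like(W_single)
    return np.block([[W_single, zeros], [zeros, W_single]])


K = 10
knots = np.r_[[5.0] * 4, np.linspace(5.0, 2000.0, K - 2)[1:-1], [2000.0] * 4]
W_single = gram_matrix(knots)
W = block_metric(W_single)
```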

As in Eq. (1), an eigenvector **b** with associated eigenvalue *λ* generates two eigenfunctions (*ξ*^{θ}, *ξ*^{S}) (called vertical modes in this paper) that can be expressed in the physical *θ*–*S* space with

$$\xi^{\theta}(z) = \sum_{k=1}^{K} f_k\,\phi_k(z), \qquad \xi^{S}(z) = \sum_{k=1}^{K} f_{K+k}\,\phi_k(z), \tag{9}$$

where the coefficients

$$\mathbf{f} = \mathbf{M}^{-1/2}\,\mathbf{W}^{-1/2}\,\mathbf{b} \tag{10}$$

are a normalized version of the eigenvector **b** with respect to the metric **W**. Here, we used the fact that **W** is a symmetric positive definite matrix that has a unique Cholesky decomposition **W** = (**W**^{1/2})^{T}**W**^{1/2}, where **W**^{1/2} is an invertible upper triangular matrix; **M** is a diagonal matrix, so its inverse square root is readily defined.
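Putting the pieces together, the eigenvalue problem, the eigenfunction coefficients **f**, and the PCs can be sketched with NumPy and SciPy as below. This is a minimal illustration, not the authors' implementation. It assumes **M** = diag(*m*_{θ}**I**_{K}, *m*_{S}**I**_{K}) with user-supplied weights *m*_{θ} and *m*_{S} (their values are not specified here), an (*N* − 1) normalization for **V**, and takes **W**^{1/2} as the transpose of the lower Cholesky factor returned by NumPy. The PCs are computed as **Y** = **C̃** **M**^{1/2}(**W**^{1/2})^{T}**B**, which is the projection consistent with these definitions: the PC variances equal the eigenvalues, and **Y** combined with the eigenfunction coefficients recovers **C̃** exactly.

```python
import numpy as np
from scipy.linalg import block_diag, solve_triangular


def functional_pca(C_anom, W_single, m_theta, m_salt):
    """Multivariate functional PCA on centered B-spline coefficients.

    C_anom   : (N, 2K) centered coefficient table (theta block, then S block)
    W_single : (K, K) Gram matrix of the B-spline basis (same for theta and S)
    m_theta, m_salt : scalar weights forming the diagonal matrix M (assumed)
    Returns eigenvalues (descending), eigenvectors B, eigenfunction
    coefficients F (columns f_l), and PCs Y (columns y_l).
    """
    N, L = C_anom.shape
    K = L // 2
    V = C_anom.T @ C_anom / (N - 1)
    m_half = np.sqrt(np.r_[np.full(K, m_theta), np.full(K, m_salt)])
    R = np.linalg.cholesky(block_diag(W_single, W_single)).T   # W = R^T R, R upper
    A = R @ (m_half[:, None] * V * m_half[None, :]) @ R.T      # symmetric L x L
    lam, B = np.linalg.eigh(A)                                 # ascending order
    order = np.argsort(lam)[::-1]
    lam, B = lam[order], B[:, order]
    F = solve_triangular(R, B, lower=False) / m_half[:, None]  # f = M^-1/2 R^-1 b
    Y = (C_anom * m_half[None, :]) @ R.T @ B                   # PCs
    return lam, B, F, Y


rng = np.random.default_rng(2)
K = 5
G = rng.normal(size=(K, K))
W_single = G @ G.T + K * np.eye(K)      # stand-in SPD Gram matrix for the demo
C = rng.normal(size=(200, 2 * K))
C_anom = C - C.mean(axis=0)
lam, B, F, Y = functional_pca(C_anom, W_single, m_theta=0.5, m_salt=4.0)
explained = 100.0 * lam / lam.sum()
```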

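In the same way, observations can be projected into a space of small dimension by computing the PCs, collected in the matrix

$$\mathbf{Y} = \tilde{\mathbf{C}}\,\mathbf{M}^{1/2}\left(\mathbf{W}^{1/2}\right)^{\mathsf{T}}\mathbf{B}, \tag{11}$$

where the columns of **B** are the eigenvectors **b**_{l}. For example, the *N* rows of the first two columns of **Y** (**y**_{1}, **y**_{2}) provide the best 2D mapping of the profiles in PC space, associated with the highest projected variance *λ*_{1} + *λ*_{2}; an observation is then no longer described by its B-spline coefficients but by its coordinates on the eigenfunctions.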

### c. Computation of the Southern Ocean modes

To compute the vertical modes of the Southern Ocean, we use *θ*–*S* profiles from SOSE for the year 2007, as explained in section 2a. The cross-covariance matrix of the SOSE profiles is represented in Fig. 3. Temperature at any depth is highly correlated (>0.83) with temperature at every other depth (block **V**_{θθ}). Surface salinity is anticorrelated with deep salinity; the transition is located at ~500 m (zero contour of block **V**_{SS}). Block **V**_{θS} (or **V**_{Sθ}) reveals that temperature from 0 to 500 m is highly correlated (0.8) with surface salinity, whereas salinity at 500 m has null correlation with temperature at any depth. The functional PCA finds the unique decomposition of this matrix that objectively decorrelates the structures just described.

The modes have been determined from the SOSE product, but they define a basis onto which any observations can be projected. Once the observations are projected into a space of small dimension *q* < *L*, each salinity and temperature curve can be reconstructed from only these *q* dimensions using the corresponding eigenfunctions computed with SOSE. As a case study, the WOCE SR3 hydrographic profiles are projected onto the modes and reconstructed with only one or two dimensions in section 4b.
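Projecting independent profiles (here, SR3) onto modes computed from another dataset (here, SOSE) only requires the SOSE mean coefficients, the weights, the metric factor, and the eigenvectors. A minimal sketch with hypothetical helper names, assuming eigenfunction coefficients **f** = **M**^{-1/2}**W**^{-1/2}**b** and PCs obtained by projecting weighted anomalies as **y** = (**c** − **c̄**)**M**^{1/2}(**W**^{1/2})^{T}**B**. The toy modes below are random orthonormal vectors, used only to check the algebra; the reconstructed coefficients would be turned back into profiles with Eq. (1).

```python
import numpy as np
from scipy.linalg import block_diag, solve_triangular


def project(c_new, c_mean, m, R, B):
    """PCs of new coefficient vectors (rows of c_new) on precomputed modes:
    y = (c - c_mean) M^{1/2} R^T B, with R the upper Cholesky factor of W."""
    return ((c_new - c_mean) * np.sqrt(m)) @ R.T @ B


def reconstruct(y, c_mean, F, q):
    """Coefficients rebuilt from the mean plus the first q modes only."""
    return c_mean + y[:, :q] @ F[:, :q].T


# Toy modes built from random orthonormal eigenvectors (illustration only).
rng = np.random.default_rng(3)
K = 4
G = rng.normal(size=(K, K))
W = block_diag(G @ G.T + K * np.eye(K), G @ G.T + K * np.eye(K))
R = np.linalg.cholesky(W).T
m = np.r_[np.full(K, 0.5), np.full(K, 4.0)]
B = np.linalg.qr(rng.normal(size=(2 * K, 2 * K)))[0]
F = solve_triangular(R, B, lower=False) / np.sqrt(m)[:, None]
c_mean = rng.normal(size=2 * K)
c_new = rng.normal(size=(3, 2 * K))

y = project(c_new, c_mean, m, R, B)
c_two_modes = reconstruct(y, c_mean, F, q=2)   # PC1 + PC2 only
```

Keeping all *L* modes returns the original coefficients exactly, while a truncated *q* keeps only the dominant vertical structure.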

## 4. Results

### a. Vertical modes

A total of 98.83% of the variance is captured by the first six modes (Table 2), and the first two PCs alone account for 92.41%. The effect of the different modes can be displayed by adding or subtracting the eigenfunctions to or from the notional mean profiles (*θ̄*, *S̄*), such that

$$\theta^{\pm}(z) = \bar{\theta}(z) \pm \tau\,\xi^{\theta}(z), \qquad S^{\pm}(z) = \bar{S}(z) \pm \tau\,\xi^{S}(z), \tag{12}$$

where *τ* is a suitable multiple, as shown in Fig. 4. This representation is equivalent to the display of variables in standard PCA. It presents the effect of adding (+ curve) and subtracting (− curve) a suitable multiple of each mode.

The first mode explains 72.52% of the variance and involves a modification of the whole water column in both temperature and salinity. The variation in temperature explained by PC1 goes from an almost unstratified, cold water column when PC1 is negative to a warm, strongly stratified column when PC1 is positive. The salinity variation shows an inversion at 610 m. The − curve presents fresher waters in the first 200 m, while the + curve has a maximum at 200 m and a minimum at 1000 m. Temperature and salinity contribute almost equally to this PC, with a slight advantage for temperature (61.37%).

The second mode represents 19.89% of the total variance. It is largely driven by salinity, with a relative contribution of 85.7%. When PC2 increases, temperature decreases at all depths, with slightly larger variations in the top 150 m. PC2 influences salinity from near the surface to ~1500 m, salinity increasing with PC2. The maximum effect of PC2 on salinity is found around 500-m depth. Note that the − curve has less salinity variation near the surface than the + curve, which means that larger PC2 values are associated with stronger salinity gradients just below the mixed layer.
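The variance percentages quoted in this section follow from the eigenvalues as *λ*_{l}/Σ*λ*. For the split between temperature and salinity within a mode, the text does not give a formula; one plausible definition, assumed here, is the share of the squared eigenvector **b**_{l} carried by its *θ* block versus its *S* block. A sketch with hypothetical helper names:

```python
import numpy as np


def variance_percent(lam):
    """Percentage of total variance explained by each mode."""
    lam = np.asarray(lam, dtype=float)
    return 100.0 * lam / lam.sum()


def relative_contribution(b):
    """Assumed definition: share (%) of the squared eigenvector held by its
    theta block (first half) and by its S block (second half)."""
    b = np.asarray(b, dtype=float)
    K = b.size // 2
    theta_part = np.sum(b[:K] ** 2)
    salt_part = np.sum(b[K:] ** 2)
    total = theta_part + salt_part
    return 100.0 * theta_part / total, 100.0 * salt_part / total


print(variance_percent([3.0, 1.0]))  # [75. 25.]
theta_share, salt_share = relative_contribution([0.6, 0.0, 0.0, 0.8])
```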

Higher modes represent a smaller fraction of the total variance (7.59% in total), although they may be of large relative importance in parts of the vertical profiles, such as near the surface. PC3 has a significant influence on temperature below 250 m and, for salinity, an inversion at 300 m, meaning that increasing PC3 freshens the surface while increasing salinity at intermediate depth. PC4 and PC6 have large signatures in surface temperature, whereas PC5 has a very strong signature in surface salinity.

The sensitivity to the number of B-spline basis functions was tested by redoing the PCA with only *K* = 20 basis functions (see Table 2). There is almost no effect on the variance explained by each of the first six modes, showing that the PCA is robust to the B-spline decomposition. The variance associated with PC5 and PC6 with a close fit (80 B-splines) is slightly higher than with 20 splines (1.05% vs 0.99% and 0.59% vs 0.58%). These changes are very small, but they suggest that increasing the vertical resolution of the model might shift a little more variance to the higher modes. An analysis extending deeper than 2000 m should not produce different modes, as properties change more smoothly there. If a functional PCA is performed on profiles that exclude the mixed layer (e.g., profiles extending from 400 to 2000 m), the variance explained by modes higher than PC4 is redistributed among PC1 to PC4.

Because the PC1 and PC2 modes alone capture more than 90% of the total variance, it is natural to attempt a description of the thermohaline structure of the Southern Ocean based only on these two modes. We can map profiles in the PC1/PC2 plane, where the origin corresponds to the notional mean *θ*–*S* profile and each point represents a single *θ*–*S* profile obtained as a linear combination of the first two modes. The distribution of SOSE profiles in the PC1/PC2 plane is displayed in the central panel of Fig. 5 and shows a characteristic comma shape. To illustrate how temperature and salinity profiles vary along this comma shape, we now turn to the WOCE SR3 hydrographic profiles.

### b. A case study: The WOCE SR3 section

Figure 5 allows us to visualize the shape of any hydrographic observation relative to the modes. Here, the black points are the profiles of the WOCE SR3 section, which we first transformed into functions (Fig. 2) and then projected onto the SOSE modes to identify the features of *θ* and *S* responsible for the strongest components of variability: PC1 and PC2. In Fig. 5, we add black segments representing the fronts defined in section 5 as a visual aid. In the PC1/PC2 plane, each point contains the whole vertical structure of the temperature and salinity profiles. Here, the points display a continuous comma shape, sorting the stations from the south [−1, 1] to the north [3, 1]. With the help of the SR3 profiles (Fig. 5, black dots), we identify the subpolar region south of the SBdy [−1, 0.6], which forms an angle with the Antarctic zone. The Antarctic zone is then delineated by PC2 alone, PC1 being almost constant at −1 from the SBdy to the PF. The regime changes drastically in the polar frontal zone (PFZ, between the PF and SAF), from a constant PC1 on the left side of the plot to PC1 and PC2 increasing together on the right side. The PFZ is the transition area where temperature and salinity contribute equally to stratification (Pollard et al. 2002). North of the SAF, temperature dominates the stratification, whereas south of the PF salinity dominates. In PC space the PFZ spans a large range (−1 to −0.2 for PC1) but only ~4° of latitude along the section (Fig. 6). Finally, the SAF and STF positions are consistent with the projection of SR3.

Hydrographic observations projected onto the SOSE-based modes can be reconstructed with only the main modes of variation, as illustrated here with the SR3 section. Four SR3 profiles are plotted with their PC1 + PC2 reconstruction in Figs. 5a–d; they show that the information carried by modes higher than two is mostly surface stratification. More precisely, the reconstructions of the SR3 section with PC1 only and with PC1 + PC2 are shown in Figs. 6d–g. PC1 does not capture the surface temperature stratification (Fig. 6d) characteristic of the Antarctic Surface Water (AASW). It only contains the change in stratification at ~52°S, in both temperature and salinity. The reconstructed temperature profiles are almost homogeneous in the south at ~2°C and stratified in the north, with warmer waters at the surface. The reconstructed salinity shows an inversion of stratification similar to the one described by the vertical mode (Fig. 4b): fresher surface waters in the south and, north of 50°S, saltier surface waters with a minimum at ~1000 m. Adding PC2 reveals more details in the temperature structure of the AASW, but it mainly resolves the characteristic low-salinity tongue that was entirely missed by PC1 alone (Figs. 6f,g). PC2 also strengthens the salinity stratification at the surface of the AASW and adds detail to the high-salinity surface waters north of 52°S. The temperature markers of the southernmost fronts (PF, SACCF, and SBdy) are poorly estimated by PC1 + PC2, whereas the salinity reconstruction marks these fronts with steeper slopes. The northernmost fronts (SAF and STF) are equally well marked in the reconstructions of both variables, with steeper temperature slopes and salinity changes from the surface to 750 m. Figure 6a complements the main panel of Fig. 5: it displays PC1 and PC2 of the SR3 profiles and marks four values (red segments) that best match the climatological front contours (see section 5).
The reconstruction of the section with PC1 + PC2 + PC3 mainly shows an improvement of the temperature field under the winter water between 53° and 62°S (Fig. S1 in the supplemental material).

### c. Spatial distribution of modes

By displaying the spatial distribution of the PCs averaged over 2007 (Fig. 7), we can identify the different water mass regimes defined by the modes. The first two PCs have similar shapes from 30°S to the PF but different amplitudes. In the Antarctic zone, PC1 is almost constant at −1 and is therefore no longer discriminating. It is PC2 that reveals additional structures, consistent with recent maps of mean dynamic topography (Rio et al. 2011). We identify, for example, the Fawn Trough Current (Roquet et al. 2009), the Weddell and Ross Gyres, and even the multiple jets at 140°E described by Sokolov and Rintoul (2002). The South Pacific Gyre appears as null values of PC1 and minimum values of PC2, with contours oriented in opposite directions: PC1 contours trend northward with longitude, while PC2 contours trend southward. Finally, the boundary formed by the STF south of the Agulhas system is intense in both PC1 and PC2.

Higher PCs show less zonal, more complex structures. PC3 is a variation related to the depth of the pycnocline (Fig. 4c). Positive values correspond to the deepest pycnocline (Agulhas system and Indian basin), whereas negative values correspond to the shallowest pycnocline (Weddell Sea and Pacific and Atlantic subtropical waters). The ACC shows a slightly positive signal, while the Pacific and the Ross Gyre show null values (Fig. 7c). This third mode especially differentiates the southern parts of the three basins, with positive values in the Indian, null values in the Pacific, and negative values in the Atlantic.

Negative values of PC4 associate cooler surface waters with fresher waters deeper than 600 m. In contrast, PC5 shows a strong low-salinity signal at the surface (− curve), salinity accounting for 93.8% of the mode (Fig. 4e). This can be related to seasonal sea ice melt, and thus to the surface release of freshwater in the ACC, as the map of PC5 shows negative values around the ice limit (Fig. 7e). The maximum values of PC5 highlight the Weddell and Ross Gyres as well as the sluggish zone west of the Kerguelen Plateau. PC6 is harder to interpret but could be a seasonal thermocline mode, as its mode representation suggests (Fig. 4f).

### d. Relationship between modes and hydrographic properties

One may ask which single criterion, such as a tracer at a given depth, could best describe the whole vertical structure of the ocean. To address this, the correlation between the spatial distribution of each of the first two modes and temperature or salinity at every depth is computed (Fig. 8). The black profile is the mean, and the gray envelope is bounded by the maximum and minimum correlation over all time steps, illustrating the seasonal variability around the temporal average. These variations of correlation during the year are negligible except for salinity at the surface, showing that the seasonal cycle has almost no influence on the correlations. PC1 and temperature have a correlation >0.76 at every depth, with a maximum of 0.98 at 250 m, while PC2 and temperature have a low correlation at every depth (between −0.57 and −0.15). The correlation between PC1 and salinity is high (0.95) at the surface, vanishes at 610 m, and decreases to a minimum (−0.8) at 1355 m. This vertical variation is consistent with the expression of PC1 in the salinity profiles (Fig. 4). The correlation between PC2 and salinity shows a maximum at 610 m and null values at the surface and at depth, in contrast to the correlation between PC1 and salinity. This illustrates the orthogonality between modes. To summarize, temperature at 250 m and salinity at 25 and 1355 m are the fields best correlated with PC1 (72.52% of the total variance), whereas salinity at 610 m correlates very well with PC2 (19.89% of the total variance). This is consistent with Fig. 3, which reveals a null correlation between salinity at ~600 m and temperature at any depth.
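The correlation profiles and their seasonal envelope can be sketched as follows, assuming PC maps stored as (time, point) arrays and tracer fields as (time, depth, point) arrays. The text does not say whether the black profile is the time mean of per-time-step correlations (assumed here) or the correlation of time-averaged fields; the helper name is hypothetical.

```python
import numpy as np


def correlation_by_depth(pc, tracer):
    """pc     : (T, P) PC values at P grid points for T time steps
    tracer : (T, nz, P) temperature or salinity at nz depths
    Returns the time mean, minimum, and maximum over time of the spatial
    Pearson correlation between the PC map and the tracer map at each depth."""
    T, nz, _ = tracer.shape
    r = np.empty((T, nz))
    for t in range(T):
        for k in range(nz):
            r[t, k] = np.corrcoef(pc[t], tracer[t, k])[0, 1]
    return r.mean(axis=0), r.min(axis=0), r.max(axis=0)


rng = np.random.default_rng(4)
pc = rng.normal(size=(36, 500))
# Three synthetic "depths": perfectly correlated, anticorrelated, unrelated.
tracer = np.stack([2.0 * pc + 1.0, -pc, rng.normal(size=pc.shape)], axis=1)
r_mean, r_min, r_max = correlation_by_depth(pc, tracer)
```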

At the surface, salinity is lowest in the south, following the ice limit, and this field is positively correlated with PC1 (Fig. 1a). At 610 m, the salinity minimum shifts northward, revealing the low-salinity channel characteristic of PC2 (Fig. 1b). Note that the salinity minimum at 610 m extends over the whole South Pacific basin, which is not the case for the PC2 field (Fig. 7b); this is why the correlation between the two is only 0.91 rather than 1. It also clearly delineates the Agulhas system and leakage. In summary, PC1 separates the subtropical and subantarctic zones well (i.e., it captures the STF), as seen in the temperature at 250 m and the salinity at 25 m. It also reveals the SAF, best seen in the temperature at 250 m and the salinity at 1355 m. PC2 adds the PF and SACCF, which appear in the salinity at 610 m.

### e. Analysis of residuals

In this section, we describe how the residuals between the original profiles of *θ* and *S* and the profiles reconstructed from PC1 alone and from PC1 + PC2 are distributed. We use the root-mean-square error (RMSE) to represent these residuals in the vertical (Fig. 9) and in the horizontal (Fig. 10). The total RMSEs (Fig. 9, solid lines) are largest at the surface and reach a first minimum at 250 m for temperature. PC2 clearly improves the salinity estimate from 25 to 1500 m but does not modify it at the surface, whereas the temperature profiles are corrected over the whole water column, as expected from Fig. 4b. A second maximum is found at ~600 m, more marked in the PC1 + PC2 RMSEs (blue). These two maxima (surface and 600 m) can be isolated by computing the RMSE separately for profiles north and south of 52°S. Indeed, the 600-m bump does not appear in the RMSE south of 52°S for either temperature or salinity (dashed lines). Moreover, the salinity RMSE south of 52°S clearly contains the surface error, which is the seasonal variability due to sea ice captured by PC5 (Figs. 4e and 7e). In contrast, the RMSE north of 52°S contains the 600-m bump and most of the seasonal variability in temperature. The RMSE maximum at 600 m is therefore concentrated north of 52°S, and Fig. 10 indicates that it is mainly due to the Subantarctic Mode Water (SAMW) lying north of the SAF in the Australian–Antarctic sector. This area is also highlighted by positive PC3 values in Fig. 7c. North of 52°S, PC2 does not correct the 600-m maximum of the temperature RMSE. South of 52°S and deeper than 250 m, the RMSE is concentrated in the Weddell Sea (Fig. 10c). Figure 10 indicates that, as expected, PC2 corrects the profiles south of the STF below 250 m well. The ACC band is nearly unaffected by this error (Figs. 10c,d), and PC1 + PC2 gives a good estimate of its vertical structure while disregarding the surface seasonal variability. Therefore, PC1 + PC2 is sufficient to detect ACC fronts, but it may be less accurate for the STF in the Indian and Atlantic basins.
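
The RMSE profiles of Fig. 9 can be approximated with the sketch below. It assumes the profiles are stored as rows of a matrix `X` on a common depth grid and uses a plain SVD (a discretized PCA) as a stand-in for the functional decomposition; `fit_modes`, `reconstruct`, and `rmse_by_level` are hypothetical helpers. A boolean latitude mask reproduces the north/south of 52°S split.

```python
import numpy as np


def fit_modes(X):
    """Mean profile and orthonormal modes (rows) of X, shape (n_profiles, n_levels)."""
    mean = X.mean(axis=0)
    _, _, modes = np.linalg.svd(X - mean, full_matrices=False)
    return mean, modes


def reconstruct(X, mean, modes, k):
    """Rebuild every profile from its first k PCs only."""
    pcs = (X - mean) @ modes[:k].T
    return mean + pcs @ modes[:k]


def rmse_by_level(X, X_hat, mask=None):
    """RMSE at each level, optionally restricted to the profiles selected by mask."""
    err = X - X_hat
    if mask is not None:
        err = err[mask]
    return np.sqrt(np.mean(err**2, axis=0))
```

With a latitude vector `lat` (negative in the Southern Hemisphere), `rmse_by_level(X, reconstruct(X, mean, modes, 2), lat > -52.0)` would give the PC1 + PC2 residual north of 52°S. For a joint *θ*–*S* matrix, the *θ* and *S* columns would need to be split and converted back to physical units before each RMSE is computed.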

## 5. Front detection

One potential application of the PC decomposition is the objective detection of fronts. Some PC contours lie visibly close to the climatological frontal paths. Here, we quantify how well PC contours can serve as proxies for frontal positions. We consider the climatological positions of four Southern Ocean fronts: the reference STF position is taken from Orsi et al. (1995), and the SAF, PF, and SACCF positions are from Kim and Orsi (2014), who define fronts using SSH and validate them with Argo profiles. For each of the four circumpolar fronts, we first extract the PC1 and PC2 values at the grid points closest to the climatological front position. We then compute the median of the extracted PC values, as well as the interquartile range (IQR), i.e., the difference between the third and first quartiles (Fig. 11 and Table 3). We also extract the gradients of PC1 and PC2 along each circumpolar front. The average PC gradient along each front (Table 3) is larger for PC1 along the STF and SAF and larger for PC2 along the PF and SACCF. This means that, on average, PC1 contours are better estimators of the STF and SAF positions, whereas PC2 contours work better for the PF and SACCF.
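
The median, IQR, and along-front gradient statistics can be sketched as follows, assuming a PC field on a regular longitude–latitude grid and a front path given as a list of points. Nearest-grid-point matching, the 111.2 km per degree metric, and the name `front_statistics` are simplifying assumptions made for illustration; longitude wraparound is not handled, so the front and grid longitudes must use the same convention.

```python
import numpy as np

KM_PER_DEG = 111.2  # approximate length of one degree of latitude (assumption)


def front_statistics(pc_map, lon, lat, front_lon, front_lat):
    """Median, IQR, and mean gradient magnitude of a PC field along a front.

    pc_map : (n_lat, n_lon) PC values on a regular grid
    lon, lat : grid coordinates in degrees
    front_lon, front_lat : points of a climatological front path
    """
    lon, lat = np.asarray(lon, float), np.asarray(lat, float)
    front_lon, front_lat = np.asarray(front_lon, float), np.asarray(front_lat, float)
    # Nearest grid indices for every front point
    i = np.abs(lat[None, :] - front_lat[:, None]).argmin(axis=1)
    j = np.abs(lon[None, :] - front_lon[:, None]).argmin(axis=1)
    values = pc_map[i, j]
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    # Gradient per degree, then converted to per kilometre
    dpc_dlat, dpc_dlon = np.gradient(pc_map, lat, lon)
    dy = dpc_dlat / KM_PER_DEG
    dx = dpc_dlon / (KM_PER_DEG * np.cos(np.deg2rad(lat))[:, None])
    grad = np.hypot(dx, dy)[i, j]
    return median, q3 - q1, grad.mean()
```

Calling the function once with the PC1 map and once with the PC2 map for the same front gives the two mean gradients whose comparison indicates which mode better tracks that front.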

The selected PC values (Table 3) are shown in Fig. 5 (black segments) and Fig. 6 (red segments). The STF contour (PC1 = 1.05) crosses several SR3 profiles in Figs. 5 and 6, which could indicate that the section runs along the front or that meanders and eddies are present. Rintoul and Sokolov (2001) describe a recirculation between 44° and 48°S, with water from the Tasman Sea flowing westward at 44°S. The SAF defined by PC1 = −0.36 is correctly placed according to the property indicators of Orsi et al. (1995), that is, *S* < 34.2 below 300 m and *θ* > 4°–5°C at 400 m (see Figs. 6a–c). The PC values for the PF and SACCF are also consistent with Orsi et al. (1995). The PF (PC2 = −0.31) is found near the subsurface *θ* = 2°C isotherm, and the SACCF where *θ* = 0°C along the *θ* minimum at 150 m and *S* > 34.73 along the *S* maximum at 800 m. Thus, the front positions defined with the vertical modes computed from year 2007 of SOSE match the frontal property indicators on SR3, which was occupied in 1994/95. This suggests that PC1 and PC2 are robust.
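
Once contour values are fixed, locating a station relative to the fronts reduces to comparing its PC1 value with thresholds. The sketch below uses the two PC1 values quoted above (1.05 for the STF, −0.36 for the SAF) and assumes PC1 increases northward, which follows from its positive correlation with temperature at every depth; the helper names are hypothetical. The PF and SACCF, defined on PC2, are left out because the direction of the PC2 gradient across them is not restated here. Counting sign changes along a section shows how one threshold can be crossed several times, as for the STF on SR3.

```python
import numpy as np

# PC1 contour values used for the fronts in the text (Table 3)
STF_PC1 = 1.05
SAF_PC1 = -0.36


def zone_from_pc1(pc1):
    """Label stations by PC1, assuming PC1 increases northward (warmer water)."""
    pc1 = np.asarray(pc1, dtype=float)
    return np.where(
        pc1 > STF_PC1,
        "north of STF",
        np.where(pc1 > SAF_PC1, "between SAF and STF", "south of SAF"),
    )


def threshold_crossings(values, threshold):
    """Indices i such that the threshold is crossed between stations i and i + 1."""
    s = np.sign(np.asarray(values, dtype=float) - threshold)
    return np.nonzero(s[:-1] * s[1:] < 0)[0]
```

More than one index returned for the STF threshold along a single section would flag the kind of repeated crossing attributed above to alignment with the front, meanders, or eddies.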

Figure 11 gives a visual impression of the match between selected PC contours and standard frontal definitions. This method assumes that the zonation is defined by the same criteria at every longitude. The contours on the PC maps are not uniformly tight along the ACC (Figs. 7 and 11), indicating that water mass boundaries weaken in some places.

The STF band is narrow only in the Atlantic and along the Agulhas Return Current, whereas the eastern Indian and Pacific basins show smaller gradients (wider gray band). In the western Pacific, the front forms a right angle that has also been described with SSH contouring (Sokolov and Rintoul 2009), and in the western Atlantic it lies ~5° south of the position of Orsi et al. (1995). The wide STF area south of Australia may be due to the poor representation of this area by PC1 (Fig. 10a). It could also relate to the multiple STF jets in this area suggested by James et al. (2002). PC1 follows the SAF well, with some discrepancies: PC1 contours surround the Crozet Archipelago and widen southeast of the Campbell Plateau, in the eastern Pacific, and in the Argentine Basin. The largest path difference is in the Argentine Basin, where PC1 = −0.36 lies ~5° north of the SAF of Kim and Orsi (2014). However, a more southerly contour of PC1 or PC2 (Figs. 7a,b) reproduces the spike characteristic of the Falkland Current, which suggests that different dynamics govern the SAF between 300° and 320°E. The correlation between PC2 and salinity at 610 m is of interest: this is the depth at which the subsurface salinity minimum appears north of the SAF on SR3 (Fig. 6b) and where temperature begins to dominate the contribution to stratification. Therefore, the minimum of PC2 (or of salinity at 610 m) is a good indicator of the SAF if the SAF is defined solely as the limit north of which temperature dominates the contribution to stratification. Indeed, between the Kerguelen and Challenger Plateaus the SAF lies in the middle of the PC2 minimum-value channel (Fig. 12); elsewhere (in the Atlantic and Pacific basins) it follows contours that mark the southern limit of this low-value channel. In the Pacific and Atlantic, this subsurface salinity minimum lies farther north, far from the velocity-based front definitions (Fig. 12). This recalls the concept of "not so circumpolar" fronts; as Pollard et al. (2002, p. 1) put it, "It is not the fronts that are circumpolar, but the total ACC transport and scalar properties of the salinity and temperature fields."

The PF is well delimited by the −1 contour of PC1 (black contour in Fig. 11), yet the whole Antarctic zone has a PC1 value of approximately −1. Thus, the PF marks the northern limit of the region where the thermal mode no longer discriminates between water masses: it is a transition between the thermal mode to the north and the haline mode to the south. This is also consistent with Pollard et al. (2002), who describe the PF as the northern limit of the region where salinity dominates stratification. As mentioned above, PC2 reveals the frontal structures in the Antarctic zone, starting at the PF. PC2 fits the PF remarkably well with tight contours, except west of the Kerguelen Plateau, where its contours widen, cross the plateau, and become concentrated again. The −1 contour of PC1 passes entirely north of Kerguelen and reappears farther south, like the PC2 contours. In other words, around Kerguelen the haline mode places the PF south of the island, as do Kim and Orsi (2014) and Park et al. (2014), whereas the thermal mode follows a path north of the plateau, similar to the definition of Orsi et al. (1995) and many others. The other notable difference is between South Georgia and the Falkland Islands, where the PC1 contour coincides with the front of Kim and Orsi (2014), while the PC2 contours lie north of it (Fig. 11).

The SACCF is marked by an intense subsurface salinity gradient and is generally well captured by PC2; discrepancies occur where the spread of reoccurrence rates of the Kim and Orsi (2014) frontal indicators is largest. In addition, PC3 and PC4 are accurate indicators of the southern boundary of the ACC (SB), with some differences between the two. PC3 shows an intrusion on the eastern side of the Ross Sea (Fig. 7, 0 contour of PC3), whereas the 0 contour of PC4 closely follows the SB described by Orsi et al. (1995), without the intrusion of ACC water into the Ross Gyre.

## 6. Conclusions

In this study, we used functional PCA to decompose the vertical structure of Southern Ocean hydrographic profiles in order to establish a simple classification of them. The originality of the analysis is that it accounts for the functional nature of *θ*–*S* profiles. The functional approach allows any profile, from a model or from observations, to be compared, and the resulting decomposition offers an informative view of the Southern Ocean circulation. It compares the shapes of the profiles and identifies which information is contained in each variable, temperature or salinity. The main modes of variation are directly related to the joint *θ*–*S* distribution. They do not capture the seasonal variance because the profiles extend to 2000 m, and spatial variations at depth are larger than the variations in the seasonal surface layer.
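
The joint *θ*–*S* decomposition can be approximated in discretized form as sketched below. This is not the implementation used in this study: a true functional PCA expands each profile on a basis of smooth functions, whereas this sketch simply interpolates profiles to a fine common depth grid, scales each variable by a single standard deviation so that *θ* and *S* are comparable, weights each level by the square root of its layer thickness so that dot products approximate depth integrals, and concatenates the two variables before an SVD. The function names and the scaling choice are assumptions.

```python
import numpy as np


def interp_profile(depth_obs, values_obs, depth_grid):
    """Linearly interpolate one cast onto a common depth grid (depths increasing)."""
    return np.interp(depth_grid, depth_obs, values_obs)


def discretized_fpca(theta, salt, depth_grid):
    """Joint theta-S PCA on a common depth grid.

    theta, salt : (n_profiles, n_levels) already interpolated to depth_grid
    Returns the mean, modes (orthonormal rows), PCs, explained-variance
    fractions, the per-variable scales, and the quadrature weights.
    """
    # sqrt(layer thickness) so that dot products approximate depth integrals
    w = np.sqrt(np.gradient(np.asarray(depth_grid, dtype=float)))
    scales = (theta.std(), salt.std())  # one scale per variable (assumption)
    X = np.hstack([theta / scales[0] * w, salt / scales[1] * w])
    mean = X.mean(axis=0)
    u, s, modes = np.linalg.svd(X - mean, full_matrices=False)
    pcs = u * s  # equal to (X - mean) @ modes.T
    explained = s**2 / np.sum(s**2)
    return mean, modes, pcs, explained, scales, w
```

The first two entries of `explained` play the role of the 72.52% and 19.89% quoted for PC1 and PC2, although a discretized analysis of this kind is not guaranteed to reproduce those exact values.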

We found that the main mode of variation is the north–south *θ*–*S* gradient contrasting Antarctic and subtropical waters (PC1, 72.52%). This primarily thermal mode has a spatial distribution directly related to temperature at 250 m and salinity at 25 m. It captures the STF and SAF well but varies little in the Antarctic zone, whose northern edge delineates the PF. The second mode (19.89%) is instead associated with salinity variations between 25 and 1500 m. This haline mode reveals the structures of the Antarctic zone (PF and SACCF) and is best correlated with salinity at 610 m. These PC maps (Fig. 7), combined with the vertical modes (Fig. 4), are a useful new type of indicator for describing water mass structure and defining fronts. We have not provided a fully objective method for locating frontal structures, since our analysis relies on climatological fronts (section 5). Nevertheless, the PC values used to depict the frontal paths are consistent with the classical property indicators of ACC fronts of Orsi et al. (1995).

The PF is the only circumpolar front where PC1 and PC2 contours coincide (Fig. 7). PC1 varies only north of the PF and is nearly constant south of it, allowing PC2 to dominate the Antarctic zone. Pollard et al. (2002) proposed a zonation controlled primarily by the ACC and secondarily by the changing balance of stratification: temperature dominates the stratification north of the SAF, temperature and salinity contribute equally in the PFZ, and salinity dominates south of the PF. It is interesting that our objective statistical method yields PCs whose spatial patterns can be connected to the zonation of Pollard et al. (2002).

In some areas, PC1 and PC2 act against each other, revealing the strong baroclinicity of the flow. This is particularly the case west of the Kerguelen Plateau, southeast of the Challenger Plateau, in the southeast Pacific, and in the Argentine Basin (Fig. 7). These areas also correspond to wider fronts in Fig. 11 and to high values of baroclinic cross-stream transport (Peña-Molino et al. 2014, their Fig. 7). Indeed, the water mass fronts that appear on the PC maps (Figs. 7, 11) vary in width with longitude, revealing a weakening and strengthening of the water mass boundaries, especially for the SAF in the Pacific basin.

The residuals associated with modes higher than two (section 4e) fall into two types: the surface seasonal variability and the strong gradients of subtropical waters at ~600 m (Figs. 9, 10). For example, the fifth mode captures the freshwater input to the surface from melting ice. Further investigation of the higher modes is needed to identify the physical processes that govern them, which would be useful for studying the temporal variability of ocean structures.

The functional PCA decomposition is particularly powerful in the Southern Ocean, probably because of the well-organized nature of its thermohaline structure. The ACC organizes the interbasin flow by receiving, mixing, and redistributing waters from the three other basins, and this is strikingly well captured by the functional PCA. With 3.4% of the total variance, the third mode clearly differentiates the structures of the three subtropical basins. A 3D representation of the PCA similar to Fig. 5, but with PC3 as the third dimension, would show a clear separation between basin properties north of the SAF; we leave this for a future study. In principle, nothing prevents adding more than two variables to the analysis (e.g., oxygen and nutrients), and the maximum profile depth could be set to a value other than 2000 m. A different maximum depth (e.g., 300 m) or a more restricted geographic area (e.g., a smaller latitude range) would give more detailed, local information about the water mass distribution.

Note that the SOSE-based modes are not specific to the SOSE hydrographic distribution. A similar calculation of thermohaline modes could of course be performed directly on available hydrographic profiles; however, it would probably not produce significantly different modes, both because SOSE is constrained by available observations and because we focused here on robust large-scale features of the Southern Ocean thermohaline structure, namely the existence and depth of minima below the mixed layer. The SOSE-based modes can thus be used as Southern Ocean modes to classify any *θ*–*S* observation and locate it relative to the main ACC fronts, as we did for the WOCE SR3 section.
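
Projecting an independent cast onto the SOSE-based modes amounts to a dot product, once the cast has been brought to the same depth grid, scaling, and weighting used to compute the modes. The sketch below assumes orthonormal modes stored as rows (as returned by an SVD) and that `mean`, `scales`, and `w` come from that same decomposition; `project_profile` is a hypothetical name.

```python
import numpy as np


def project_profile(theta, salt, mean, modes, scales, w, n_modes=2):
    """PC coordinates of one cast already interpolated to the modes' depth grid."""
    # Apply exactly the same scaling and weighting as in the decomposition
    x = np.concatenate([np.asarray(theta) / scales[0] * w,
                        np.asarray(salt) / scales[1] * w])
    # Rows of `modes` are orthonormal, so projection is a dot product
    return modes[:n_modes] @ (x - mean)
```

The resulting (PC1, PC2) pair can then be compared with the frontal contour values, which is how a station such as one from SR3 would be placed relative to the fronts.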

Being able to objectively determine the type of a cast in terms of water mass properties is valuable for many oceanographic problems. For instance, it can be extremely useful for data calibration. In frontal regions, the usual calibration methods are uncertain (Wong et al. 2003); our approach could help by locating any profile relative to a climatology in PC space, with the advantage of taking the whole vertical shape of the profile into account. Another perspective of this work is to correlate PC modes with SSH variations in order to reconstruct temperature and salinity profiles from altimetry observations, similarly to the gravest empirical mode projections (Meijers et al. 2011), with the advantage that functional PCA does not need to assume an equivalent barotropic structure for the Southern Ocean. This method also has the potential to robustly assess changes in heat and salt content and could be used to study the spatiotemporal variability of the fronts.
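
As an illustration of the SSH perspective only, the sketch below assumes a linear relation between each PC and SSH, fitted by least squares, and then rebuilds *θ* and *S* from the predicted PCs. The linear form, the function names, and the reuse of `mean`, `modes`, `scales`, and `w` from a discretized decomposition are all hypothetical; a nonlinear fit could equally be used.

```python
import numpy as np


def fit_pc_on_ssh(ssh, pcs):
    """Least-squares fit of pcs[:, k] ~ a_k + b_k * ssh for every mode k.

    Returns coef with coef[0] = intercepts a_k and coef[1] = slopes b_k.
    """
    ssh = np.asarray(ssh, dtype=float)
    A = np.column_stack([np.ones_like(ssh), ssh])
    coef, *_ = np.linalg.lstsq(A, pcs, rcond=None)
    return coef


def reconstruct_from_ssh(ssh_value, coef, mean, modes, scales, w):
    """Predict PCs from one SSH value and rebuild theta and S profiles."""
    pcs = coef[0] + coef[1] * ssh_value
    x = mean + pcs @ modes[: coef.shape[1]]
    n = w.size
    # Undo the quadrature weights and the per-variable scaling
    theta = x[:n] / w * scales[0]
    salt = x[n:] / w * scales[1]
    return theta, salt
```

Because the fitted relation is linear, this sketch cannot represent any curvature in the PC-SSH relationship; whether such a simple model is adequate would have to be tested against observations.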

To our knowledge, this is the first time functional PCA has been applied to a problem in physical oceanography. Compared with classical PCA methods (i.e., EOF analysis in standard meteorological terminology), functional PCA adds flexibility, enabling us to simultaneously analyze datasets with unstructured sampling or heterogeneous data types. More generally, functional PCA has considerable potential for data analysis in the climate sciences.

## Acknowledgments

The authors acknowledge the Scripps Institution of Oceanography for providing the Southern Ocean State Estimate data (http://sose.ucsd.edu/). Computational resources for the SOSE were provided by NSF XSEDE Resource Grant OCE130007. The authors acknowledge the National Oceanic and Atmospheric Administration (NOAA) for providing the World Ocean Circulation Experiment (WOCE) hydrographic data (https://www.nodc.noaa.gov/woce/).

## REFERENCES

*Ninth Int. AAAI Conf. on Web and Social Media*, Oxford, United Kingdom, 75–82. [Available online at https://opus.lib.uts.edu.au/bitstream/10453/43941/1/Moyer%2615InfluenceRedditOnWiki.pdf.]

*Functional Data Analysis.* 2nd ed. Springer, 426 pp.

*Descriptive Physical Oceanography: An Introduction.* 6th ed. Academic Press, 555 pp.

## Footnotes

Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/JPO-D-16-0083.s1.

¹ For simplicity, we will refer to potential temperature as "temperature" in the following.