## 1. Introduction

In 2011 an array of floating ocean surface buoys (drifters) was deployed in the Sargasso Sea to assess the lateral diffusivity of oceanic processes (Shcherbina et al. 2015). Each drifter was equipped with a global positioning system (GPS) receiver recording locations every 30 min. Addressing the primary goal of understanding the processes controlling lateral diffusivity requires significant processing of the drifter positions, including removing mean flow, accounting for the large-scale strain field, and analyzing the residual spectra for hints of a dynamical process. However, it quickly became clear that the GPS position data, which can be accurate to within a few meters (Wide Area Augmentation System T and E Team 2016), were contaminated by outliers with position jumps of hundreds of meters or more. Prior to analysis, outliers must be removed from the position data and gaps must be interpolated to keep the positions synchronized in time across the drifter array.

The basic problem is ubiquitous: observations from GPS receivers return observed positions *x*_{i} at times *t*_{i} that differ from the true positions *x*_{true}(*t*_{i}) by some noise *ε*_{i} ≡ *x*_{i} − *x*_{true}(*t*_{i}) with variance *σ*^{2}. The goal of *smoothing* is to find the true position *x*_{true}(*t*_{i}) that is not contaminated by the noise, whereas the goal of *interpolating* is to find the true position *x*_{true}(*t*) between observation times. The approach taken here is to use smoothing splines. This approach is relatively broad (Handcock et al. 1994; Nychka 2000), and is related to the methods in Yaremchuk and Coelho (2015) and Elipot et al. (2016) for smoothing drifter trajectories, as discussed later.

The model path *x*(*t*) is specified using interpolating B-splines *X*^{K}(*t*) such that

$$x(t) = \sum_{m=1}^{N} \xi^{m} X_{m}^{K}(t), \qquad (1)$$

where *K* is the order (degree *S* = *K* − 1) of the spline. For *N* observations we construct *N* B-splines such that *x*(*t*_{i}) = *x*_{i} for appropriately chosen coefficients *ξ*^{m}. To smooth the data, we choose new coefficients *ξ*^{m} that minimize the penalty function

$$\phi = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{x_i - x(t_i)}{\sigma}\right]^{2} + \frac{\lambda_T}{t_N - t_1}\int_{t_1}^{t_N}\left(\frac{d^{T}x}{dt^{T}}\right)^{2} dt, \qquad (2)$$

where we have introduced the tension parameter *λ*_{T} ≥ 0. If *λ*_{T} = 0 then *ϕ* = 0 and *x*(*t*_{i}) = *x*_{i}, but if *λ*_{T} → ∞ then this forces *x*(*t*) to a *T*th-order polynomial (e.g., when *T* = 2, the model is forced to be a straight line because it has no second derivative). The resulting path *x*(*t*) is known as a smoothing spline and was first introduced in modern form by Reinsch (1967), but according to De Boor (1978) the idea dates back to Whittaker (1923). Once *S* and *T* are chosen, the smoothing spline has one free parameter (*λ*_{T}) and its optimal value can be found by minimizing the expected mean-square error when the true value of *σ* is known (Craven and Wahba 1978).

Three issues must be addressed before smoothing splines are applied to GPS data:

- how to choose *S* and *T*, and how these choices affect the recovered power spectrum,
- how to modify the spline fit to accommodate the non-Gaussian errors of GPS receivers, and
- how to identify and remove outliers.

To address these issues but also to serve as a practical guide to other practitioners, we review B-splines in section 2 and introduce the canonical interpolating spline as the underlying model for path *x*(*t*) in (1). We demonstrate the effect that choosing *S* has on the high-frequency slope of the power spectrum of the interpolated fit.

Section 3 takes a broad look at smoothing splines and the assumptions they make on the underlying process. Many of the ideas presented in this section are known to the statistics community, so here we present these ideas from a more physical perspective. We show that the penalty function in (2) can be formulated as a maximum-likelihood problem and that applying tension is equivalent to assuming a Gaussian distribution on the tensioned derivative of the underlying process.

Section 4 uses ensembles from synthetic data that mimic the oceanographic data to test a number of choices that must be made. We establish that setting *T* = *S* is a reasonable choice. We show how the tension parameter can be chosen a priori (without optimization of the mean-square error) when the *effective sample size* (which we define later) can be estimated from the data. This estimate for effective sample size can be used to reduce the coefficients *ξ*^{i} in the spline fit without increasing mean-square error.

The second half of the paper addresses issues specific to GPS position errors. In section 5 we discuss the assumptions of stationarity and isotropy required for bivariate smoothing splines. In section 6 we show that GPS errors are not Gaussian distributed but rather are *t* distributed, and we show how to modify our method for a *t* distribution. Section 7 addresses how to modify our method to make smoothing splines robust to outliers. We compare with alternative methods and conclude in sections 8 and 9, respectively.

A major outcome of this work is the implementation of MATLAB classes for generating B-splines, interpolating splines, smoothing splines, and a class specific to smoothing GPS data (https://github.com/JeffreyEarly/GLNumericalModelingKit). These classes are highlighted throughout in relevant sections.

## 2. Interpolating spline

Assume that we are given *N* observations of a particle position (*t*_{i}, *x*_{i}) with no errors. The simplest form of interpolation is a nearest-neighbor method that assigns the particle the position of the observation nearest in time. The resulting interpolated function *x*(*t*) is a piecewise polynomial of order *K* = 1 (piecewise constant), shown in the top row of Fig. 1. The next level of sophistication is to assume a constant velocity between any two observations and to use that to interpolate positions between observations, as shown in the second row of Fig. 1. This means we now have a piecewise constant function *dx*/*dt* that represents the velocity of the particle, shown in the second row, second column, of Fig. 1. This is a piecewise polynomial function of order *K* = 2.
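The two lowest-order interpolants described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the MATLAB implementation that accompanies the paper; the function names and the sample data are hypothetical.

```python
from bisect import bisect_right

def nearest_neighbor(t_obs, x_obs, t):
    """Order K = 1 (piecewise constant): position of the observation nearest in time."""
    j = bisect_right(t_obs, t)
    if j == 0:
        return x_obs[0]
    if j == len(t_obs):
        return x_obs[-1]
    # Choose whichever neighboring observation is closer in time.
    return x_obs[j - 1] if t - t_obs[j - 1] <= t_obs[j] - t else x_obs[j]

def linear(t_obs, x_obs, t):
    """Order K = 2 (piecewise linear): constant velocity between observations."""
    j = min(max(bisect_right(t_obs, t), 1), len(t_obs) - 1)
    u = (x_obs[j] - x_obs[j - 1]) / (t_obs[j] - t_obs[j - 1])  # velocity on this interval
    return x_obs[j - 1] + u * (t - t_obs[j - 1])

# Hypothetical observations (times in arbitrary units, positions in meters).
t_obs = [0.0, 1.0, 3.0]
x_obs = [0.0, 2.0, 6.0]
```

The velocity `u` is constant on each interval, which is the piecewise constant *dx*/*dt* referred to in the text.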

It is less obvious how to proceed to a polynomial of order *K* = 3. With *N* data points we can construct a piecewise constant acceleration (the second derivative) using the *N* − 2 independent accelerations computed from finite differencing, but where to place *knot points* that define the boundaries of the regions and how to maintain continuity is less clear. The approach taken here is to use B-splines.

### a. B-Splines

A B-spline (or basis spline) of order *K* (degree *S* = *K* − 1) is a piecewise polynomial that maintains nonzero continuity across *S* knot points. The knot points are a nondecreasing collection of points in time denoted by *τ*_{i}. The basic theory is well documented in De Boor (1978), but here we present a reduced version tailored to our needs.

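The *m*th B-spline of order *K* = 1 is defined as

$$X_{m}^{1}(t) \equiv \begin{cases} 1 & \text{if } \tau_{m} \le t < \tau_{m+1}, \\ 0 & \text{otherwise}. \end{cases}$$

Given *P* knot points we can construct *P* − 1 B-splines of order *K* = 1, although if a knot point is repeated it results in a spline that is zero everywhere. To represent an interpolating function *x*(*t*) for the *N* observations of a particle position (*t*_{i}, *x*_{i}) we define *N* + 1 knot points as

$$\tau_{m} = \begin{cases} t_{1} & m = 1, \\ t_{m-1} + \dfrac{t_{m} - t_{m-1}}{2} & 1 < m \le N, \\ t_{N} & m = N + 1, \end{cases}$$

which creates *N* independent basis functions that provide support for the region *t*_{1} ≤ *t* ≤ *t*_{N} (provided that the last spline is defined to include the last knot point). The interpolating function *x*(*t*) is defined as

$$x(t) \equiv \sum_{m=1}^{N} \xi^{m} X_{m}^{1}(t),$$

where the coefficients *ξ*^{m} are found by solving *X*_{m}^{1}(*t*_{i})*ξ*^{m} = *x*_{i}.

All higher-order (*K* > 1) B-splines are defined by recursion,

$$X_{m}^{K}(t) \equiv \frac{t - \tau_{m}}{\tau_{m+K-1} - \tau_{m}} X_{m}^{K-1}(t) + \frac{\tau_{m+K} - t}{\tau_{m+K} - \tau_{m+1}} X_{m+1}^{K-1}(t).$$

Knot points that are repeated *T* times result in a total of *T* − 1 splines of order 1 that are everywhere zero. This has the effect of introducing discontinuities in the derivatives for higher-order splines. For our purposes, we use this feature to prevent higher-order splines from crossing boundaries. For *K* = 2 order splines we use *N* + 2 knot points at locations

$$\tau_{m} = \begin{cases} t_{1} & m \le 2, \\ t_{m-1} & 2 < m \le N, \\ t_{N} & m > N, \end{cases}$$

which results in *N* second-order B-splines, and the interpolating function *x*(*t*) is defined as

$$x(t) \equiv \sum_{m=1}^{N} \xi^{m} X_{m}^{2}(t),$$

where the coefficients *ξ*^{m} are found by solving *X*_{m}^{2}(*t*_{i})*ξ*^{m} = *x*_{i}.

In general, for splines of *even* order, we create *N* + *K* knot points with

$$\tau_{m} = \begin{cases} t_{1} & m \le K, \\ t_{m-K/2} & K < m \le N, \\ t_{N} & m > N, \end{cases} \qquad (7)$$

and for splines of *odd* order, we create *N* + *K* knot points with

$$\tau_{m} = \begin{cases} t_{1} & m \le K, \\ t_{m-(K+1)/2} + \dfrac{t_{m-(K-1)/2} - t_{m-(K+1)/2}}{2} & K < m \le N, \\ t_{N} & m > N. \end{cases} \qquad (8)$$

This results in *N* splines for the *N* data points such that the interpolated function *x*(*t*) crosses all *N* observations (*t*_{i}, *x*_{i}). The path *x*(*t*) is the *canonical interpolating spline* of order *K*. Examples are shown in Fig. 1.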
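The recursive construction can be illustrated with a short sketch. It assumes the standard Cox–de Boor recursion from De Boor (1978) and, for the *K* = 2 case, knots placed at the observation times with each end point repeated once; the function name `bspline` and the observation times are hypothetical, and this is not the BSpline class that accompanies the paper.

```python
def bspline(m, K, knots, t):
    """Value at time t of the m-th (0-based) B-spline of order K on the given knots."""
    if K == 1:
        last = knots[-1]
        # Half-open interval, except that the final nonempty interval includes the last knot.
        inside = knots[m] <= t < knots[m + 1]
        at_end = t == last and knots[m] < knots[m + 1] == last
        return 1.0 if inside or at_end else 0.0
    value = 0.0
    left = knots[m + K - 1] - knots[m]
    if left > 0:  # repeated knots give zero-width terms, which are skipped
        value += (t - knots[m]) / left * bspline(m, K - 1, knots, t)
    right = knots[m + K] - knots[m + 1]
    if right > 0:
        value += (knots[m + K] - t) / right * bspline(m + 1, K - 1, knots, t)
    return value

# Hypothetical observation times; N = 4 observations.
t_obs = [0.0, 1.0, 2.5, 4.0]
K = 2
knots = [t_obs[0]] + t_obs + [t_obs[-1]]  # N + 2 knots, end points repeated
N = len(t_obs)

# Matrix of basis functions evaluated at the observation times.
basis_at_obs = [[bspline(m, K, knots, ti) for m in range(N)] for ti in t_obs]
```

For *K* = 2 each basis function is a "hat" centered on one observation, so `basis_at_obs` is the identity matrix and the interpolating coefficients are simply the observed positions. For higher orders the matrix is banded rather than diagonal and the coefficients require a linear solve.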

The knot placements in (7) and (8) are equivalent to the *not-a-knot* boundary conditions described in De Boor (1978) and used in the cubic spline implementation in MATLAB. In the usual formulation of the not-a-knot boundary condition, the knot positions do not change as a function of spline order, and therefore additional constraints must be added at each order—especially the requirement that the highest derivative maintain continuity near the boundaries. In the formulation here, these constraints are implicit in (7) and (8).

### b. Numerical implementation

The root class in our suite of MATLAB classes is the BSpline class, which evaluates a complete B-spline basis set given a set of knot points. This class was used to generate Fig. 2.

The interpolating spline used to generate Fig. 1 is implemented in the InterpolatingSpline class, a subclass of BSpline. This class generates interpolating splines of arbitrary order given a set of data points (*t*_{i}, *x*_{i}), thus generalizing the cubic spline command built into MATLAB.

### c. Synthetic data

To test the spline fits, we generate synthetic velocities whose power spectrum has the Matérn form

$$S(\omega) = \frac{A^{2}}{(\omega^{2} + \lambda^{2})^{p/2}},$$

where *ω* is the frequency, *A* sets the amplitude, *p* > 1 sets the high-frequency slope, and *λ* sets the frequency below which the signal looks increasingly white. This spectrum has finite amplitude at low frequencies and power-law falloff at high frequencies, two physically realistic properties observed in ocean surface drifters (Sykulski et al. 2016). Trajectories from this velocity spectrum are generated using the “maternoise” function available in jLab (Lilly 2019). In our experiments, the parameter *A* is chosen such that the square root of velocity variance in each direction is *u*_{rms} = 0.20 m s^{−1} and the damping scale is *λ*^{−1} = 30 min. Values of *p* are varied with *p* = 2, 3, and 4 so that the high-frequency spectrum is proportional to *ω*^{−2}, *ω*^{−3}, and *ω*^{−4}. Velocities are sampled every minute and are integrated to get positions. Figure 3 shows an example velocity spectrum of the signal with *p* = 2.

The position data are contaminated with (white) Gaussian noise with *σ* = 10 m, a value chosen to resemble GPS errors. For all experiments we use a range of *strides*, that is, subsampled versions of the underlying process as input into the spline fits. A stride of 100 indicates that the signal is subsampled to 1 in every 100 data points. This lets us evaluate the quality of fit against different strides. In analyzing the quality of fits, we use velocities when computing the power spectrum, but report mean-square errors from positions.
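The contamination and subsampling steps can be sketched as follows. This is a minimal illustration only: the underlying path here is a placeholder steady drift rather than a Matérn process, and the function name, seed, and array sizes are hypothetical.

```python
import random

def contaminate_and_subsample(t, x_true, stride, sigma=10.0, seed=0):
    """Keep 1 in every `stride` samples and add white Gaussian noise (std sigma, meters)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    idx = range(0, len(t), stride)
    t_sub = [t[i] for i in idx]
    x_obs = [x_true[i] + rng.gauss(0.0, sigma) for i in idx]
    return t_sub, x_obs

# Placeholder signal: positions sampled every minute along a steady 0.2 m/s drift.
t = [60.0 * k for k in range(1000)]
x_true = [0.2 * tk for tk in t]

# A stride of 100 keeps one observation every 100 minutes.
t_sub, x_obs = contaminate_and_subsample(t, x_true, stride=100)
```

Because the true positions are retained alongside the noisy ones, the mean-square error of any fit to `x_obs` can be computed directly against `x_true`.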

### d. Spline degree S

We first examine a synthetic signal *uncontaminated* by noise, to examine the role of the spline degree *S* on the interpolated fit. As noted in Craven and Wahba (1978), the degree of the spline sets its roughness. In terms of the power spectrum, this corresponds to the high-frequency slope as can be seen in Fig. 3, which shows fits with *S* = 1, …, 4. Setting *S* = 1 produces a high-frequency falloff in the spline fit of *ω*^{−2}. Although this appears to be a desirable feature when fitting to a process with true slope *ω*^{−2}, the mean-square error is consistently higher (as indicated in the legend of Fig. 3).

The bottom panel of Fig. 3 shows the coherence between the spline fit and the true signal. A coherence of 1 indicates that the signals are perfectly matched at a given frequency, while a coherence of 0 indicates that the signals are unrelated. There is no discernible difference in coherence between spline fits with *S* = 1, …, 4. The coherence quickly drops to near zero at the same frequency in all four cases. The implication here is that the spline fits are essentially producing noise at frequencies above the loss of coherence. This is why the spline fits with shallower slopes (with more variance at high, incoherent frequencies) produce a larger overall mean-square error than those with steeper slopes (with less variance at high, incoherent frequencies). The conclusion here is that smoother is better: it is better to use an unnecessarily high-order spline to avoid adding extra noise at high frequencies.

## 3. Smoothing spline

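Smoothing starts from an assumption about the distribution of the errors *ε*_{i} ≡ *x*_{i} − *x*_{true}(*t*_{i}). The canonical example in one dimension (e.g., Press et al. 1992) is to assume errors are independently drawn from a Gaussian with the following probability distribution:

$$p_{g}(\epsilon|\sigma_{g}) = \frac{1}{\sigma_{g}\sqrt{2\pi}} \exp\left(-\frac{\epsilon^{2}}{2\sigma_{g}^{2}}\right), \qquad (10)$$

where *σ*_{g} is the standard deviation. This assumption alone places no assumptions on the signal, only on the structure of the noise.

Given *N* observations, the probability of the errors implied by a model path *x*(*t*) is

$$P = \prod_{i=1}^{N} p_{g}\left(x_{i} - x(t_{i})|\sigma\right), \qquad (11)$$

where *σ* = *σ*_{g}. Maximizing the probability function in (11) is the same as minimizing the magnitude of the argument of its exponential; this is the logarithmic likelihood (up to a constant), called the penalty function,

$$\phi = \sum_{i=1}^{N}\left[\frac{x_{i} - x(t_{i})}{\sigma}\right]^{2}. \qquad (12)$$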
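The link between the Gaussian likelihood and least squares can be checked numerically. The sketch below shows that the negative log likelihood differs from half the sum of squared normalized residuals only by a constant that does not depend on the residuals, so both have the same minimizer. The residual values are hypothetical.

```python
import math

def gaussian_pdf(eps, sigma):
    """Zero-mean Gaussian probability density with standard deviation sigma."""
    return math.exp(-eps**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def neg_log_likelihood(residuals, sigma):
    """Negative log of the joint probability of independent Gaussian errors."""
    return -sum(math.log(gaussian_pdf(e, sigma)) for e in residuals)

def half_sum_of_squares(residuals, sigma):
    """Least-squares penalty built from residuals normalized by sigma."""
    return 0.5 * sum((e / sigma) ** 2 for e in residuals)

sigma = 10.0
res_a = [3.0, -12.0, 7.5, 0.5]   # residuals x_i - x(t_i) of one candidate model (hypothetical)
res_b = [1.0, -2.0, 2.5, -0.5]   # residuals of a second candidate model (hypothetical)

# The offset depends only on N and sigma, not on the model.
const = len(res_a) * math.log(sigma * math.sqrt(2 * math.pi))
gap_a = neg_log_likelihood(res_a, sigma) - half_sum_of_squares(res_a, sigma)
gap_b = neg_log_likelihood(res_b, sigma) - half_sum_of_squares(res_b, sigma)
```

Both `gap_a` and `gap_b` equal `const`, so ranking candidate models by likelihood is the same as ranking them by the sum of squared residuals.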

### a. Smoothing-spline penalty function

The model used here is the canonical interpolating spline of order *K* described in section 2. We have chosen our knot points such that the model intersects the observations and this certainly maximizes (11) [and minimizes (12)] because all the errors are zero, but the resulting distribution of errors (a delta function at zero) does not resemble the assumed Gaussian distribution. Thus, additional constraints are required if the assumed error distribution is to be recovered.

The smoothing spline augments the penalty function of (12) by adding a global constraint on the *T*th derivative of the resulting function as in (2). If *λ*_{T} → 0 then this reduces to the least squares fit in (12), but if *λ*_{T} → ∞ then this forces the model to a *T*th-order polynomial.
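The two limits of the tension parameter can be demonstrated with a discrete analog of the penalty function. The sketch below is a Whittaker-style smoother that penalizes squared finite differences of order *T* instead of the integral of a spline derivative, so it illustrates the behavior but is not the spline method of this paper. All names and data values are hypothetical.

```python
def difference_matrix(n, order):
    """Rows of the order-th finite-difference operator acting on n points."""
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def discrete_smoother(x, sigma, lam, order):
    """Minimize (1/N) sum(((x_i - z_i)/sigma)^2) + lam * sum((D z)^2) over z."""
    n = len(x)
    d = difference_matrix(n, order)
    w = 1.0 / (n * sigma**2)
    # Normal equations: (w I + lam D^T D) z = w x
    a = [[lam * sum(row[i] * row[j] for row in d) + (w if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(a, [w * xi for xi in x])

x = [0.0, 12.0, 15.0, 33.0, 38.0, 52.0]  # hypothetical noisy positions (m)
no_tension = discrete_smoother(x, sigma=10.0, lam=0.0, order=2)
stiff = discrete_smoother(x, sigma=10.0, lam=1e4, order=2)
```

With zero tension the solution passes through every observation, and with very large tension on the second difference the solution collapses onto a straight line, mirroring the two limits described above.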

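To interpret the first term of (2), consider a motionless particle at true position *x*_{0}. Using the *N* relevant observations *x*_{i}, the *sample mean* $\bar{x} = N^{-1}\sum_{i} x_{i}$ estimates the true position *x*_{0}. The unbiased *sample variance* estimates the variance *σ*^{2} of the noise, and is given by

$$\hat{\sigma}^{2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_{i} - \bar{x}\right)^{2}. \qquad (13)$$

Now consider the opposite extreme in which the particle is moving so fast (or the observations are so sparse) that each observation is independent of its neighbors. In this case, each observation must be considered separately, so the sample mean at time *t*_{i} is simply the observation *x*_{i}.

In practice the number of observations contributing to an estimate lies between 1 and *N*. Here we use the term *effective sample size*, denoted by *n*_{eff}, to describe the typical number of observations being used to estimate either the particle’s position or the variance of the noise at any given time. In this context, the first term of (2) is proportional to an ensemble of multiple estimates of the sample variance,

$$\left(1 - \frac{1}{n_{\mathrm{eff}}}\right)\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left[x_{i} - x(t_{i})\right]^{2}. \qquad (14)$$

To interpret the second term of (2), take *T* = 1 so that the smoothing spline is a constraint on velocity. When averaged over the integration time, the integral produces the root-mean-square velocity *u*_{rms} such that the second term scales as *λ*_{1}*u*_{rms}^{2}. For tension on the *T*th derivative, this means *λ*_{T} scales like

$$\lambda_{T} = \frac{1 - 1/n_{\mathrm{eff}}}{\left(x_{\mathrm{rms}}^{(T)}\right)^{2}}, \qquad (15)$$

where *x*^{(T)}_{rms} is the root-mean-square value of the *T*th derivative of the physical process. As discussed in section 4, both *n*_{eff} and *x*^{(T)}_{rms} can be estimated from the data, so that an a priori estimate of *λ*_{T} can be made.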

### b. Smoothing-spline maximum likelihood

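Consider the case *T* = 1. The assumption is that at *any given instant* (and not just the times of observation) we expect the model velocity to be Gaussian. We discretize the problem by sampling the velocity *Q* times at *t*_{q} = *t*_{1} + *q*Δ*t*_{q}, where Δ*t*_{q} = (*t*_{N} − *t*_{1})/(*Q* − 1) and *q* = 0, …, *Q* − 1. The maximum likelihood is thus stated as

$$P = \prod_{i=1}^{N} p_{g}\left(x_{i} - x(t_{i})|\sigma\right) \left[\prod_{q=0}^{Q-1} p_{g}\left(\left.\frac{dx}{dt}(t_{q})\right|\sigma_{u}\right)\right]^{\gamma}, \qquad (16)$$

where *σ*_{u} is the standard deviation of the velocity and we have included the factor *γ* to set the relative weighting between the two distributions, although it could be absorbed into the definition of *σ*_{u}. The corresponding penalty function is

$$\phi = \frac{1}{2}\sum_{i=1}^{N}\left[\frac{x_{i} - x(t_{i})}{\sigma}\right]^{2} + \frac{\gamma}{2}\sum_{q=0}^{Q-1}\left[\frac{1}{\sigma_{u}}\frac{dx}{dt}(t_{q})\right]^{2} + C, \qquad (17)$$

where *C* is a constant. Setting *γ* = *N*/*Q* and renormalizing the penalty function by 2/*N* (which has no effect on the location of its minimum), (17) can be written, up to an additive constant, as

$$\phi = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{x_{i} - x(t_{i})}{\sigma}\right]^{2} + \frac{1}{\sigma_{u}^{2} Q}\sum_{q=0}^{Q-1}\left[\frac{dx}{dt}(t_{q})\right]^{2},$$

which is a discrete form of (2) with *λ*_{1} = 1/*σ*_{u}^{2}.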

There is an important special case when tension is applied at the same order as the spline, *T* = *S*. In this case the spline is piecewise constant for *x*^{(T)} with exactly *N* − *T* unique values. The parameter *γ* = *N*/(*N* − *T*) ≈ 1 and (16) can be simplified. This case is appealing because only the *N* − *T* unique values of the derivative *x*^{(T)} that can be computed from *N* data points are being used for tension, which is not the case when *T* < *S*.

This maximum-likelihood perspective shows that adding tension to the penalty function is equivalent to assuming a higher-order derivative in the model (e.g., velocity if *T* = 1) is Gaussian. This is therefore making an assumption about the underlying *physical process* of the model. This is in contrast to the first term, which is entirely a statement about *measurement noise*.

Writing the smoothing spline as a maximum-likelihood condition (16) suggests that if the underlying physical process has a nonzero mean value in tension, the fit will not behave as expected. However, smoothing splines can be easily modified to accommodate a mean value in tension, as shown in appendix A.

### c. Optimal parameter estimation

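For a given choice of *T* and *λ*_{T}, the minimum solution to (2) can be found analytically [see Teanby (2007) and our appendix A]. Once the solution is found the smoothing matrix **S**_{λ} is defined as the matrix that takes observations **x** and maps them to their smooth values,

$$\hat{\mathbf{x}} = \mathbf{S}_{\lambda}\mathbf{x}.$$

The free parameter *λ*_{T} is a relative weighting between the two terms in (2). Choosing its optimal value can be done by minimizing the expected mean-square error (Craven and Wahba 1978),

$$\mathrm{MSE}(\lambda) = \frac{1}{N}\left\|\left(\mathbf{S}_{\lambda} - \mathbf{I}\right)\mathbf{x}\right\|^{2} + \frac{2\sigma^{2}}{N}\mathrm{Tr}\,\mathbf{S}_{\lambda} - \sigma^{2}, \qquad (19)$$

where ‖·‖^{2} is the squared Euclidean norm, Tr indicates the trace, and **I** is the identity matrix.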
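The expected mean-square error can be evaluated for any linear smoother once its smoothing matrix is known. The sketch below assumes the standard form of the Craven and Wahba (1978) estimator for white noise of known variance, (1/*N*)‖(**S** − **I**)**x**‖² + (2*σ*²/*N*)Tr **S** − *σ*², and applies it to two extreme smoothing matrices rather than to a spline: the identity (no smoothing) and a global average. The observations are hypothetical.

```python
def expected_mse(s, x, sigma):
    """Expected mean-square error of the linear smoother s applied to observations x."""
    n = len(x)
    smooth = [sum(s[i][j] * x[j] for j in range(n)) for i in range(n)]
    misfit = sum((smooth[i] - x[i]) ** 2 for i in range(n)) / n
    trace = sum(s[i][i] for i in range(n))
    return misfit + 2.0 * sigma**2 * trace / n - sigma**2

def identity(n):
    """Smoothing matrix that returns the observations unchanged."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mean_matrix(n):
    """Smoothing matrix that replaces every observation by the sample mean."""
    return [[1.0 / n] * n for _ in range(n)]

# Hypothetical noisy observations (m) of a motionless particle, with sigma = 10 m.
x = [11.0, -9.0, 13.0, -12.0, 4.0, -7.0]
sigma = 10.0

mse_interp = expected_mse(identity(len(x)), x, sigma)   # equals sigma^2: all noise is kept
mse_mean = expected_mse(mean_matrix(len(x)), x, sigma)  # smaller: noise is averaged down
```

In practice the same function would be minimized over the tension parameter, with the smoothing matrix recomputed for each trial value.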

A significant amount of the literature on smoothing splines is devoted to minimizing the mean-square error when the variance *σ*^{2} is *not* known. Craven and Wahba (1978) and Wahba (1978) use cross validation to estimate *σ* and minimize mean-square error. Recent work comparing different estimators shows no single technique to be optimal (Lee 2003). For our application, however, the errors in GPS data can be relatively easily established, as shown in section 6.

Following the interpretation of *ϕ* in section 3a, the first term in (19) is an ensemble of sample variances, and therefore (13), (14), and (19) can be combined to relate that term to the effective sample size.

The matrix **S**_{λ}**Σ** is the covariance matrix with the squared standard error along the diagonal and thus the mean-square standard error is given by (1/*N*)Tr(**S**_{λ}**Σ**). The variance of the sample mean is known to scale inversely with the number of samples being used to estimate the mean. We use this to define the effective sample size of the variance of the mean,

$$\frac{\sigma^{2}}{n_{\mathrm{eff}}} \equiv \frac{1}{N}\mathrm{Tr}\left(\mathbf{S}_{\lambda}\mathbf{\Sigma}\right). \qquad (21)$$

For a given *λ*, the mean-square error can be expressed by combining (19)–(21), and it scales as *σ*^{2}/*n*_{eff}. Although not shown here, an empirical analysis indicates that the effective sample size can be used to estimate the *λ*_{T} necessary for optimal tension without minimizing the expected mean-square error.

The definition of effective sample size used here is related to, but not the same as, the notion of degrees of freedom used in Cantoni and Hastie (2002) and references therein.

## 4. Spline order, tension order, and the spectrum

With a model path [(1)], a penalty function [(2)], and a minimization condition [(19)], we have all of the primary pieces to create a smoothing-spline interpolant to the data. However, a number of choices still must be made. In this section we use synthetically generated data to represent our physical process, and contaminate the process with Gaussian noise as described in section 2c. We test our ability to recover the signal and examine the effects of changing the spline and tension order on the mean-square error and the resulting spectrum.

The results of this section are empirical, and we acknowledge upfront that any conclusions reached *may* depend on our particular choice of physical model generating the signal. Nevertheless, our expectation is that the conclusions are “*O*(1)” correct and are applicable, at least, to our GPS-tracked drifter dataset.

### a. Tension degree T

Given a smoothing spline of degree *S*, the tension in the penalty function (2) can be applied at any degree *T* ≤ *S*. We use the synthetic data for the three different slopes to empirically establish the relationship between the tension degree *T* and the spline degree *S*.

For *S* = 1, …, 5 and all *T* ≤ *S* we minimize the mean-square error against the true values. The minimization is performed for 200 ensembles of noise and signal with three slopes (*ω*^{−2}, *ω*^{−3}, and *ω*^{−4}) and five different strides. For a given slope, stride, and realization of noise, we identify the minimum mean-square error across *S* and *T* and compare all values of *S* and *T* as a percentage increase relative to that minimum. After aggregating across slopes, strides, and ensembles, the 68% confidence range is shown in Table 1. The table shows that setting *T* = *S* is not always optimal but it is never significantly worse than the optimal choice. Thus for the remainder of the paper we set *T* = *S*.

Table 1. The 68th-percentile range of increase in mean-square error from the optimal fit.

### b. Loss of coherence

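The frequency at which the fit loses coherence with the true signal is set by the effective sample size *n*_{eff} because this indicates how many points are being used to estimate the true value. Therefore, the loss of coherence occurs at the *effective Nyquist*, which we define as

$$f_{\mathrm{eff}} \equiv \frac{1}{2 n_{\mathrm{eff}} \Delta t},$$

where Δ*t* is the time between observations.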

### c. Reduced spline coefficients

One practical consideration when working with large datasets is the computational cost of creating the spline fit, which is limited by the rate of solving for the spline coefficients. It is beneficial to reduce knot points (and therefore total splines) where possible. A reasonable strategy is that when the effective sample size is large, as measured by (21), we avoid placing a knot point at every data point—essentially “skipping” data points.

To test this idea, we find the optimal fit over a range of different strides (which varies the effective sample size) and increase the number of skipped knot points until the mean-square error starts to rise. We find that we can skip max[1, floor(2*n*_{eff}/3)] knot points without sacrificing precision. The column labeled “optimal mse” in Table 2 indicates the optimal fit in which one knot point is created for every observation point, whereas the reduced degrees of freedom (“reduced dof”) column indicates a fit in which the number of knot points is reduced. In some cases the optimal mean-square error improves with fewer knot points. This means that, when handling large datasets, we can reduce the number of splines being used if the effective sample size is large, and we can “chunk” the data (split them into multiple independent pieces) when the effective sample size is small.
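The knot-reduction rule can be written as a small helper. The sketch below interprets max[1, floor(2*n*_{eff}/3)] as the step between retained knot locations, so that a value of 1 keeps a knot at every observation; that interpretation, the function names, and the choice to always keep the final observation time are assumptions.

```python
import math

def knot_skip(n_eff):
    """Step between retained knot locations: max[1, floor(2 n_eff / 3)]."""
    return max(1, math.floor(2 * n_eff / 3))

def reduced_knot_times(t_obs, n_eff):
    """Observation times retained as candidate knot locations."""
    step = knot_skip(n_eff)
    picked = t_obs[::step]
    if picked[-1] != t_obs[-1]:
        picked.append(t_obs[-1])  # keep the end point so the fit spans the whole record
    return picked

# Hypothetical example: ten observation times and an effective sample size of 6.
times = [float(k) for k in range(10)]
coarse = reduced_knot_times(times, n_eff=6.0)
```

When the effective sample size is near 1 the step is 1 and nothing is removed, which matches the requirement that sparse or fast-moving data keep a knot at every observation.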

Table 2. Mean-square error (mse) and effective sample size for a range of strides and smoothing-spline methods.

### d. Interpolation condition

To estimate *λ*_{T} from (15), we must estimate the mean-square value of a derivative of the process and the effective sample size *n*_{eff}. We argue that effective sample size should vary based on the relative size of the measurement errors to the speed of motion. For example, if the position errors are only 1 m, but a particle typically travels 10 m between measurements, then it is hardly justifiable to increase the tension so that the smoothing spline misses the observation points by 1 m. There is not enough statistical evidence to suggest that the particle did not go right through the observation point. On the other hand, if the position errors are 1 m, but the particle typically travels 10 cm between measurements, nearby measurements provide more information about the particle’s true position during that time, so our estimate of the particle’s true position is closer to a mean of the nearby observations.

A change in position, Δ*x*, is statistically significant if it exceeds the position errors *σ* by some factor. Assuming the physical process has a characteristic velocity scale, *u*_{rms}, we use this concept to define Γ as

$$\Gamma \equiv \frac{\sigma}{u_{\mathrm{rms}} \Delta t}, \qquad (24)$$

where Δ*t* is the typical time between observations. This argument suggests that effective sample size should be proportional to Γ; that is,

$$n_{\mathrm{eff}} = \max\left(1, C\,\Gamma^{m}\right), \qquad (25)$$

where *C* and *m* are unknown constants, and we prevent the effective sample size from dropping below 1. Intuitively this means as long as the particle does not move too far between observations, nearby observations help to estimate the true position of the particle.
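The two worked cases in the text can be put into numbers. The sketch below takes Γ to be the ratio of the position error to the distance traveled between observations, *σ*/(*u*_{rms}Δ*t*), which is an assumption consistent with the argument above. The constants `c` and `m` are placeholders set to 1 purely for illustration and are not fitted values from the paper.

```python
def gamma_ratio(sigma, u_rms, dt):
    """Position error relative to the typical distance traveled between observations."""
    return sigma / (u_rms * dt)

def effective_sample_size(gamma, c=1.0, m=1.0):
    """Scaling n_eff = max(1, c * gamma**m); c and m are placeholder constants."""
    return max(1.0, c * gamma**m)

# Case 1: 1 m errors, particle travels 10 m between measurements.
g_fast = gamma_ratio(sigma=1.0, u_rms=10.0, dt=1.0)
# Case 2: 1 m errors, particle travels 10 cm between measurements.
g_slow = gamma_ratio(sigma=1.0, u_rms=0.1, dt=1.0)

n_fast = effective_sample_size(g_fast)  # floor of 1: each observation stands alone
n_slow = effective_sample_size(g_slow)  # many neighbors contribute to each estimate
```

In the first case the errors are small compared with the motion, so the effective sample size sits at its floor of 1. In the second case the errors dominate and neighboring observations are effectively averaged.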

To test the relationship between Γ and effective sample size, we compute the optimal smoothing spline for a range of values of Γ (created by subsampling the signal) for three different spectral slopes (*ω*^{−2}, *ω*^{−3}, and *ω*^{−4}). The values of *u*_{rms} in (24) are estimated using the method described in appendix C.

The scaling law can be interpreted as a test of statistical significance. If each position estimate is an average over ⟨*n*⟩ observations *x*_{i}, then the particle changes position by ⟨*n*⟩*u*_{rms}Δ*t* between samples. Applying the two-sample *z* test, two positions will be considered different for *z* > *z*_{min}, so that the constant *C* in (25) can be related to *z*, a measure of statistical significance.

### e. Optimal fits

Table 2 summarizes the key results of this section by applying a smoothing spline (*S* = 3) to 200 ensembles with three different slopes (*ω*^{−2}, *ω*^{−3}, and *ω*^{−4}) and five different strides. When the algorithm uses true values, uncontaminated by noise, we consider the process to be “unblinded,” in contrast to “blind” methods, in which the algorithm only uses noisy data. The second and third columns show the effective sample size and average mean-square error when the smoothing spline is applied using the true values (i.e., unblinded) to minimize the mean-square error; this is the lower bound. The fourth column shows average increase in mean-square error when reducing the number of spline coefficients as documented in section 4c. There is almost no change in mean-square error, and therefore, all subsequent methods (whether blind or unblinded) use this technique. The fifth column uses (26) from section 4d to provide a (blind) initial guess of the tension parameter. The results are mixed: a typical increase in mean-square error is 30%–50% when the effective sample size is large. While this seems large, this is a small fraction of the total noise variance; for example, an optimal mean-square error of 6 m^{2} increases to 8 m^{2} when the total variance is 100 m^{2}. Nearly optimal fits can be found using (19), as shown in the last column of the table.

### f. Numerical implementation

The numerical implementation of the methods in this section is available in the SmoothingSpline class, which subclasses BSpline. This class is initialized with three required parameters: a set of data points (*t*_{i}, *x*_{i}) and an error distribution.

## 5. Bivariate smoothing splines and stationarity

Up to this point we have considered univariate data, (*t*_{i}, *x*_{i}), but GPS position data are fundamentally bivariate. The term “bivariate” in the context of splines is often used to denote splines defined on two independent variables—however, in this context we define bivariate to mean two dependent variables (e.g., *x* and *y*) and one independent variable (e.g., *t*).

Isotropy of the underlying process does *not* mean that *λ*_{x} should necessarily equal *λ*_{y}.

Therefore, to assume isotropy in *λ*_{T} and use a bivariate smoothing spline, the mean velocity from the underlying process must be removed. What qualifies as mean and fluctuation rarely has a clear answer, but a reasonable option is letting a polynomial of degree *T* + 1 define the mean. This has the added benefit of removing a constant nonzero tension value, which as shown in section 3b, changes the problem formulation.

It is stationarity, not isotropy, that requires removing the mean velocity. The effective sample size is shown to be dependent on rms velocity, so if velocity varies in time, then the optimal effective sample size varies as well. This means not only do smoothing splines require stationarity in the tensioned variable *x*^{(T)} as shown in section 3b, but they also require stationarity in the velocity *x*^{(1)} to be effective. This last requirement can be solved by either removing the mean (as suggested here), or segmenting observations into locally stationary chunks.

### a. Assessing errors

Removing the mean or some other low-passed version of the data means the total smoothing matrix is a combination of the low-passed and high-passed smoothing matrices. Once this matrix is computed, it can be used to compute the standard errors.

The *mean* component of the flow is removed using a simple polynomial fit.

### b. Numerical implementation

The BivariateSmoothingSpline class is initialized with data (*t*_{i}, *x*_{i}, *y*_{i}) and a distribution. For a spline of degree *S* = *T*, a spline of degree *S* + 1 is used to remove the mean in each direction. With a Gaussian distribution this is simply a least squares polynomial fit. By assumption, the residual data are stationary and isotropic, so the tension parameter *λ*_{T} is applied equally in each direction. Minimization is performed on the sum of the expected mean-square errors in each direction.

## 6. GPS dataset

The primary dataset considered here is nine surface drifters deployed in the Sargasso Sea in the summer of 2011 (Shcherbina et al. 2015). In the past, such drifters used the Argos positioning system, which has significantly poorer temporal coverage and position accuracy (Elipot et al. 2016), but recently most surface drifters have employed GPS receivers and transmitted their data back through Argos or Iridium satellites.

The GPS receiver sits on the surface drifter and collects position data, but because of atmospheric conditions or ocean waves, the receivers are sometimes unable to obtain a position, or when they do, it is highly inaccurate. Despite nominal accuracies of a few meters, it is often the case that some positions are off by more than 1000 m, as can be seen in Fig. 8. Applying a smoothing-spline fit using the method in section 3 produces an extremely poor fit, with clear overshoots to bad data points.

### GPS error distribution

We characterize the GPS errors by considering data from a motionless GPS receiver allowed to run for 12 h. The GPS receiver used in this test is not the same as the one used for the drifters (because it was no longer available) but should produce errors similar enough for this analysis.

The positions recorded by the motionless GPS receiver are assumed to have isotropic errors with mean zero, which means the positions themselves are the errors. The PDF of the combined *x* and *y* position errors is shown in Fig. 6.

The error distribution is first fit to a zero-mean Gaussian PDF (10). The maximum-likelihood fit is found by computing the standard deviation of the sample, which is found to be *σ* ≈ 10 m and shown as the gray line in Fig. 6. However, it is clear the error distribution shows much longer tails than the Gaussian PDF.

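The Student’s *t* distribution is a generalization of the Gaussian that produces longer tails and is defined as

$$p_{t}(\epsilon|\nu, \sigma_{s}) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\sigma_{s}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{\epsilon^{2}}{\sigma_{s}^{2}\nu}\right)^{-(\nu+1)/2}, \qquad (31)$$

where the *σ*_{s} parameter scales the distribution width and the *ν* parameter sets the number of degrees of freedom [as *ν* → ∞, (31) becomes a Gaussian]. The variance is *σ*^{2} = *σ*_{s}^{2}*ν*/(*ν* − 2) for *ν* > 2. Minimizing the Anderson–Darling test statistic to find the best-fit *t* distribution to the data, we find parameters *σ*_{s} ≈ 8.5 m and *ν* ≈ 4.5, shown as the black line in Fig. 6. Different choices in GPS receivers and using the Kolmogorov–Smirnov test result in very similar parameters; that is, *σ*_{s} ≈ 8–10 m and *ν* ≈ 4–6.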
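The practical difference between the two error models shows up in the tails. The sketch below evaluates the scaled Student's *t* density in its standard form, with the best-fit parameters quoted above, and compares it with the *σ* = 10 m Gaussian at a 50 m error. The function names are hypothetical.

```python
import math

def student_t_pdf(eps, sigma_s, nu):
    """Scaled Student's t density with width sigma_s and nu degrees of freedom."""
    norm = math.gamma((nu + 1) / 2) / (
        math.sqrt(nu * math.pi) * sigma_s * math.gamma(nu / 2))
    return norm * (1 + eps**2 / (nu * sigma_s**2)) ** (-(nu + 1) / 2)

def gaussian_pdf(eps, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-eps**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def student_t_variance(sigma_s, nu):
    """Variance of the scaled t distribution; finite only for nu > 2."""
    if nu <= 2:
        raise ValueError("variance is not finite for nu <= 2")
    return sigma_s**2 * nu / (nu - 2)

sigma_s, nu = 8.5, 4.5  # best-fit parameters quoted in the text
total_sigma = math.sqrt(student_t_variance(sigma_s, nu))  # about 11.4 m

# How much more likely a 50 m error is under the t model than under the Gaussian.
tail_ratio = student_t_pdf(50.0, sigma_s, nu) / gaussian_pdf(50.0, 10.0)
```

The ratio is in the hundreds, which is why a fit that assumes Gaussian errors is pulled strongly toward large position errors that the *t* distribution treats as plausible noise.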

These *position* error distributions imply a combined *distance* error distribution by computing *d* = (*x*^{2} + *y*^{2})^{1/2}. The distance distribution for *t*-distributed noise is computed numerically and is shown by the black line. Around 95% of distance errors are within 30 m.

Figure 7 shows the autocorrelation function of the GPS position errors and the 99% confidence intervals. We find a rough empirical fit to be *ρ*(*τ*) = exp[max(−*τ*/*t*_{0}, −*τ*/*t*_{1} − 1.35)], where *t*_{0} = 100 s and *t*_{1} = 760 s, which reflects an initially rapid falloff in correlation, followed by a slower decline. The smallest sampling interval of the GPS drifters in question is 30 min, at which the correlation is indistinguishable from zero according to Fig. 7. It is therefore safe to assume the errors are uncorrelated for our real-data example. Although the drifter sampling rate allows us to avoid further discussion of the autocorrelation function of GPS errors, accounting for autocorrelation is a relatively easy extension (and is implemented in the code).
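The empirical fit is simple enough to evaluate directly. The sketch below codes the stated formula with the quoted time scales and evaluates it at the 30-min drifter sampling interval. The function name is hypothetical.

```python
import math

def gps_error_autocorrelation(tau, t0=100.0, t1=760.0):
    """Empirical fit rho(tau) = exp[max(-tau/t0, -tau/t1 - 1.35)], with tau in seconds."""
    return math.exp(max(-tau / t0, -tau / t1 - 1.35))

rho_zero = gps_error_autocorrelation(0.0)          # perfectly correlated at zero lag
rho_1min = gps_error_autocorrelation(60.0)         # fast branch, exp(-0.6)
rho_30min = gps_error_autocorrelation(30 * 60.0)   # slow branch, a few percent
```

At short lags the fast branch (−*τ*/*t*_{0}) is the larger argument, and beyond roughly 155 s the slow branch takes over. At a 30-min lag the fitted correlation is only a few percent.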

The smoothing-spline algorithms described in section 3 are modified to use the *t* distribution as described in appendix B. Table 3 shows that the conclusions reached for Gaussian data in section 3 still apply with *t*-distributed data.

Table 3. As in Table 2, but with noise following a *t* distribution.

## 7. Minimization with outliers

The goal here is to find a smooth solution in the presence of outliers, that is, points that do not appear to be of the known error distribution for the GPS receiver shown in section 6. These points are obviously problematic, as can be seen in Fig. 8, where individual data points jump hundreds of meters and even several kilometers away from their neighbors. Errors of this size are inconsistent with the noise analysis of the preceding section, so the goal here is to find a model path *x*(*t*) robust to this uncharacterized noise. What makes outliers “obvious” to the eye is that they appear as unexpectedly large motions, inconsistent with the other motion for that path. The smoothing-spline formulation is therefore useful, as it assumes the motion at some order (e.g., acceleration) is Gaussian, as shown in section 3b. In the nine drifters we are analyzing here, one drifter shows no obvious outliers, suggesting the issue may be related to how the antennas are configured. This particular drifter serves as a useful point of comparison.

Minimizing with the expected mean-square error (19) produces a fit so poor it is not worth showing. Because outliers add enormous amounts of variance, the expected mean-square error vastly underestimates the spline tension, essentially chasing every outlier shown in Fig. 8. Because some of the noise is uncharacterized, a method such as cross validation might be effective. The orange line in Fig. 8 uses a smoothing-spline fit, assuming Student’s *t*-distributed errors, but minimized with cross validation. This fit performs relatively well, but, when compared with drifter 7, it is clear it still chases some outliers. The goal in this section is to develop a method robust to outliers in cases where we know something about the noise.

We define a robust distribution, *p*_{robust}, that includes the known noise distribution, *p*_{noise}, plus an unknown (or assumed) form of an outlier distribution, *p*_{outlier},

$$p_{\text{robust}}(\epsilon) = (1-\alpha)\,p_{\text{noise}}(\epsilon) + \alpha\,p_{\text{outlier}}(\epsilon), \tag{33}$$

where *α* is the fraction of outliers. We use the *t* distribution for *p*_{noise} with parameters found from the GPS errors in section 6. The distribution of *p*_{outlier} is also set to be a *t* distribution, but with *ν* = 3 and *σ* = 50*σ*_{gps}, which roughly matches the total variance of the observed outliers. In our tests we varied *α* from 0 up to 0.25, approximately the range of observed outliers from the drifter datasets.
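Synthetic errors with a prescribed outlier fraction can be drawn by sampling from the two component distributions. The sketch below is a minimal illustration and makes two assumptions that are not confirmed by the text: that *σ*_{gps} is the 8.5 m scale parameter found in section 6, and that each sample is an outlier independently with probability *α*. The function name is hypothetical.

```python
import numpy as np

def sample_robust_errors(n, alpha, sigma_noise=8.5, nu_noise=4.5,
                         sigma_outlier=50 * 8.5, nu_outlier=3.0, rng=None):
    """Draw n errors: a fraction alpha (on average) comes from the outlier t distribution."""
    rng = np.random.default_rng() if rng is None else rng
    is_outlier = rng.random(n) < alpha
    noise = sigma_noise * rng.standard_t(nu_noise, size=n)
    outliers = sigma_outlier * rng.standard_t(nu_outlier, size=n)
    return np.where(is_outlier, outliers, noise), is_outlier

errors, is_outlier = sample_robust_errors(10_000, alpha=0.10,
                                          rng=np.random.default_rng(1))

# Typical magnitudes of the two populations
median_noise = np.median(np.abs(errors[~is_outlier]))
median_outlier = np.median(np.abs(errors[is_outlier]))
```

The two medians make the separation of scales explicit: typical outliers are tens of times larger than typical noise, yet the heavy tails of both distributions overlap.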

Throughout our attempts to smooth the noisy GPS data we tried many different approaches to modifying smoothing splines for robustness to outliers, but ultimately found enormous gains are made by simply discarding outliers while minimizing the expected mean-square error (19). The results of this approach are shown in section 7a, and we document our method to reliably estimate the outlier distribution in section 7b.

### a. Robust minimization

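Outliers add enormous amounts of variance, which causes minimization of the expected mean-square error (19) to underestimate the spline tension (*λ*_{T} too small). Our method excludes the outliers from the calculation of (19), where outliers are defined as points unlikely to arise with the known noise distribution. The *ranged expected mean-square error* replaces *σ*^{2} in (19) with the variance of the noise distribution integrated only over the range from cdf^{−1}(*β*/2) to cdf^{−1}[1 − (*β*/2)], and discards the rows of **S**_{λ} where (**S**_{λ} − **I**)**x** < cdf^{−1}(*β*/2) or (**S**_{λ} − **I**)**x** > cdf^{−1}[1 − (*β*/2)].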
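The cutoffs and the range-restricted variance can be computed from the noise distribution alone. The sketch below uses SciPy's *t* distribution with the section 6 fit and *β* = 1/100; the residual values are made up for illustration, and the restricted second moment is computed without renormalization, which is an assumption about the exact form used.

```python
import numpy as np
from scipy import stats, integrate

sigma_s, nu = 8.5, 4.5
noise = stats.t(df=nu, scale=sigma_s)     # known noise distribution
beta = 1 / 100

# Residuals outside [z_min, z_max] are treated as outliers
z_min, z_max = noise.ppf(beta / 2), noise.ppf(1 - beta / 2)

def in_range(residuals):
    """Boolean mask of residuals consistent with the noise distribution."""
    residuals = np.asarray(residuals)
    return (residuals >= z_min) & (residuals <= z_max)

# Second moment of the noise PDF over the retained range (assumed, unnormalized form)
ranged_second_moment, _ = integrate.quad(lambda z: z**2 * noise.pdf(z), z_min, z_max)

# Hypothetical residuals (smoothed minus observed), in meters
residuals = np.array([-3.0, 12.0, -250.0, 40.0, 5.0])
keep = in_range(residuals)
```

With these parameters the cutoff is a little under 40 m, so the −250 m and 40 m residuals are excluded while the others are retained.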

To test this approach we generated data as before but allowed a certain percentage of outliers *α* to be generated with an outlier distribution following (33). We considered five values of *β* (1/50, 1/100, 1/200, 1/400, and 1/800) as well as *β* = 0, which is just (19). Testing across a number of ensembles with outlier ratios *α* = 0.0, 0.05, 0.10, and 0.25 we found that *β* = 1/100 is overall the best choice.

### b. Full-tension solution and outlier distribution

The *full-tension* solution is defined as the maximum allowable value of *λ* given the known noise distribution. That is, the spline fit is pulled away from the observations so that the distribution of observed errors *x*_{i} − *x*(*t*_{i}) matches the expected distribution *p*_{noise}(*ε*). In cases where the effective sample size *n*_{eff} is large, the full-tension solution approximately matches the optimal (minimal mean-square error) solution. In cases in which the effective sample size is small, the full-tension solution is more akin to a low-pass solution [as increasing *λ* is equivalent to decreasing

In the simplest case where there are no outliers, the full-tension solution can be found by requiring the sample variance match the variance of *p*_{noise}(*ε*). When outliers are present, a more robust method of estimation is required. After some experimentation, we found the most reliable method of achieving full tension is to minimize the Anderson–Darling test of *p*_{noise}(*ε*) on the interquartile range of observed errors. This method can be used to estimate the outlier distribution and further refine both the full-tension solution and the range over which the expected mean-square error is computed.

We assume the outlier distribution is a *t* distribution with *ν* = 3 and that the outlier fraction satisfies *α* < 0.5. If the spline is in full tension, then the observed total variance can be used to find *σ*_{o} for the outlier distribution. From (33), the total variance is the sum of the noise variance weighted by 1 − *α* and the outlier variance weighted by *α*, which, for a given value of *α*, can be solved for *σ*_{o}. Our method uses 100 values of *α* logarithmically spaced from 0.01 to 0.5 and chooses the value that minimizes the Anderson–Darling test. With an estimate for *p*_{robust}(*ε*), the full-tension solution can be refined by minimizing the Anderson–Darling test of *p*_{robust}(*ε*) on the interquartile range of observed errors. This iterative process converges very quickly to a good estimate for the outlier distribution and the full-tension solution.

### c. Extension to bivariate data

The strategies in this section are relatively easily extended to bivariate data. All error distributions are assumed isotropic, and the outlier distribution can be estimated by including the errors from both independent directions. The ranged expected mean-square error calculation defined in section 7a uses the *distance* of the error for its cutoff to remain invariant under rotation.

Application of this method to one of the GPS drifters (drifter 6) is shown in Fig. 8. Although it is impossible to know exactly how well the spline fit performed, comparison with drifter 7 (with no apparent outliers) suggests our method successfully avoids chasing outliers.

### d. Numerical implementation

The GPSSmoothingSpline class inherits from the BivariateSmoothingSpline class and assumes errors follow the *t* distribution found in section 6. The class projects latitude and longitude using a transverse Mercator projection with the central meridian set to the center of the dataset.

## 8. Discussion

The methods discussed in this paper are related to other methods used to smooth and interpolate drifter trajectories.

Yaremchuk and Coelho (2015) formulate a cost function, their (9), based on PDFs of the drifter accelerations and the GPS errors. Setting their *μ* = 1 (they choose *μ* = 0.9), this is equivalent to the special case of (18) when *S* = *T* = 2, where they have implicitly chosen *λ*_{T} by assuming an infinite effective sample size, *n*_{eff}. Their method for isolating outliers is nearly equivalent to the iteratively reweighted least squares method detailed in appendix B using a weight function similar to Tukey’s biweight, (B6).

Elipot et al. (2016) apply their method to the Argos-tracked surface drifters, whose positions are significantly noisier than GPS positions but whose errors also follow a *t* distribution. They assume a linear model for positions, equivalent to assuming *S* = *T* = 2 with *λ*_{T} → ∞. In the numerical implementation of this paper, this special case is implemented in the ConstrainedSpline class. The time-dependent weight function used in Elipot et al. (2016) requires manually specifying a weight for each point used, and this method is therefore somewhat different from the approach taken here.

Another technique used for smoothing and interpolating drifter positions is kriging (Hansen and Poulain 1996); however, its relationship to smoothing splines is less clear. In response to a study empirically comparing kriging with smoothing splines (Laslett 1994), Handcock et al. (1994) point out that kriging and smoothing splines are just two specific parameter choices of a more general class of splines defined by their covariance functions. In the context of the maximum-likelihood equation for smoothing splines [(18)], this generalization could be modeled by including a covariance structure on the physical process.

Overall, the method of this paper (in a loose sense) generalizes a number of existing approaches for interpolation, especially in terms of flexibly allowing different levels of smoothness and tension, and in terms of application to non-Gaussian noise structures.

## 9. Conclusions

The method in this paper solves our problem of finding smoothed, interpolated positions from a noisy GPS drifter dataset with outliers. In more general terms, for signals with second-order structure similar to a Matérn process we found that

- the spline degree *S* should be set to a value higher than the high-frequency spectral slope of the process (section 2) and
- the optimal tension parameter can be estimated a priori (section 4).

For GPS data, there appear to be three key steps for using smoothing splines:

Thanks are given to Miles Sundermeyer, whose drifters were used in this analysis. The work of J. J. Early was funded by ONR through the Scalable Lateral Mixing and Coherent Turbulence Departmental Research Initiative (LatMix) and National Science Foundation Award 1658564. The work of A. M. Sykulski was funded by the Engineering and Physical Sciences Research Council (Grant EP/R01860X/1).

# APPENDIX A

## Numerical Implementation

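The B-splines are generated using the algorithm described in De Boor (1978) with knot points determined by (7) and (8). The matrix **X** contains the values of the *m*th B-spline at time *t*_{i}. The column vector *ξ*^{m} represents the coefficients of the splines such that positions at time *t*_{i} are given by

$$\hat{\mathbf{x}} = \mathbf{X}\boldsymbol{\xi}. \tag{A1}$$

We allow the tensioned derivative to have a nonzero mean *μ*_{u} and take *T* = 1 for this calculation. The discretized penalty function is

$$\phi = \frac{1}{2}(\mathbf{x} - \mathbf{X}\boldsymbol{\xi})^{\mathsf T}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \mathbf{X}\boldsymbol{\xi}) + \frac{\lambda_1}{2}(\mathbf{V}\boldsymbol{\xi} - \mu_u\boldsymbol{\iota})^{\mathsf T}(\mathbf{V}\boldsymbol{\xi} - \mu_u\boldsymbol{\iota}), \tag{A2}$$

where **Σ** denotes the covariance matrix describing the measurement errors and we absorbed several constants into *λ*_{1}. The matrix **V** contains the *T*th derivative of the B-splines at times *t*_{q}. To find the coefficients that minimize this function, we take the derivative with respect to ***ξ***, set it to zero, and solve for ***ξ***,

$$\boldsymbol{\xi} = \left(\mathbf{X}^{\mathsf T}\boldsymbol{\Sigma}^{-1}\mathbf{X} + \lambda_1\mathbf{V}^{\mathsf T}\mathbf{V}\right)^{-1}\left(\mathbf{X}^{\mathsf T}\boldsymbol{\Sigma}^{-1}\mathbf{x} + \lambda_1\mu_u\mathbf{V}^{\mathsf T}\boldsymbol{\iota}\right), \tag{A3}$$

where ***ι*** is a vector of 1s. The operation **V**^{T}***ι*** essentially integrates the *m* splines and results in a column vector with the integrated values. The smoothing matrix

$$\mathbf{S}_\lambda = \mathbf{X}\left(\mathbf{X}^{\mathsf T}\boldsymbol{\Sigma}^{-1}\mathbf{X} + \lambda_1\mathbf{V}^{\mathsf T}\mathbf{V}\right)^{-1}\mathbf{X}^{\mathsf T}\boldsymbol{\Sigma}^{-1} \tag{A4}$$

maps the observations **x** to their smoothed values when *μ* = 0.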
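The penalized least-squares solve in this appendix can be sketched numerically with SciPy's `BSpline`. This is a minimal illustration, not the authors' MATLAB implementation: it assumes zero mean tension, a diagonal error covariance with a single standard deviation `sigma`, uniformly spaced clamped knots rather than the knot placement of (7) and (8), and a simple rectangle-rule quadrature for the penalty. All function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(knots, degree, t, derivative=0):
    """Matrix whose (i, m) entry is the m-th B-spline (or its derivative) at t[i]."""
    n_splines = len(knots) - degree - 1
    columns = []
    for m in range(n_splines):
        coeffs = np.zeros(n_splines)
        coeffs[m] = 1.0
        columns.append(BSpline(knots, coeffs, degree)(t, nu=derivative))
    return np.column_stack(columns)

def smoothing_matrix(t, knots, degree, sigma, lam, T=2, n_quad=400):
    """Matrix mapping observations to smoothed values for a penalty on the T-th derivative."""
    X = bspline_basis(knots, degree, t)
    t_q = np.linspace(t[0], t[-1], n_quad)             # quadrature points
    V = bspline_basis(knots, degree, t_q, derivative=T)
    W = np.eye(len(t)) / sigma**2                      # inverse error covariance
    penalty = V.T @ V * (t_q[1] - t_q[0])              # approximates the integral
    return X @ np.linalg.solve(X.T @ W @ X + lam * penalty, X.T @ W)

degree = 3
knots = np.concatenate([np.zeros(degree), np.linspace(0.0, 1.0, 10), np.ones(degree)])
t = np.linspace(0.01, 0.99, 50)

S_loose = smoothing_matrix(t, knots, degree, sigma=1.0, lam=1e-8)
S_stiff = smoothing_matrix(t, knots, degree, sigma=1.0, lam=1.0)

# A straight line has zero second derivative, so it passes through unchanged
line = 2.0 + 3.0 * t
smoothed_line = S_stiff @ line
```

The trace of the smoothing matrix, a common measure of effective degrees of freedom, decreases as the tension parameter grows, which is the behavior the tension parameter is meant to control.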

# APPENDIX B

## Iteratively Reweighted Least Squares

Using the *t* distribution is challenging because it does not result in a linear solution for the coefficients as in (A3). One solution is to use a search algorithm to directly look for maximum values. Alternatively, one can use iteratively reweighted least squares (IRLS).

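The IRLS method reweights the *σ*_{g} in (10), so that the resulting distribution looks like the desired distribution, for example, (31). With the recollection that *ε*_{i} ≡ *x*_{i} − *x*(*t*_{i}, ***ξ***), the minimization condition *dp*_{g}/*dξ* = 0 implies that

$$\sum_i \frac{\epsilon_i}{\sigma_g^2}\frac{d\epsilon_i}{d\boldsymbol{\xi}} = 0, \tag{B1}$$

whereas for the *t* distribution this implies

$$\sum_i \frac{(\nu+1)\,\epsilon_i}{\nu\sigma_s^2 + \epsilon_i^2}\frac{d\epsilon_i}{d\boldsymbol{\xi}} = 0. \tag{B2}$$

The two conditions are equivalent if

$$\frac{1}{\sigma_g^2} = \frac{\nu+1}{\nu\sigma_s^2 + \epsilon_i^2}, \tag{B3}$$

which assumes that *ε*_{i} is already known, which initially it is not. So the method becomes iterative: one starts with *ε*_{i} determined from the Gaussian fit and then determines a new *ε*_{i} after reweighting *σ*_{g}. This method iterates until *σ*_{g} stops changing. We can rewrite (B3) as a function of *ε*_{i},

$$\sigma_g^2 = \frac{\nu}{\nu+1}\,\sigma_s^2\left(1 + \frac{\epsilon_i^2}{\nu\sigma_s^2}\right). \tag{B4}$$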

From (B4) it is clear that if *ε*_{i} < *σ*_{s} then it is reweighted to a smaller value, making the observation point more strongly weighted. On the other hand, if *ε*_{i} > *σ*_{s}, then its relative weighting decreases, and it is treated more as an outlier.

More generally, the weight function *w*(*z*) for a PDF *p*(*z*) is found by setting −∂_{z} log *p*(*z*) equal to −∂_{z} log *p*_{g}(*z*) of a Gaussian PDF, where *w*(*z*) replaces *σ*_{g}^{2}. The result is

$$w(z) = -\frac{z}{\partial_z \log p(z)}. \tag{B5}$$

With this reweighting, **Σ**^{−1} in (A4) is replaced with the diagonal matrix [1/*w*(*ε*_{i})] populated with the reweighted values for each observation, while expressions involving **S**_{λ}**Σ** continue to use **Σ**, where the variance is assumed to be that of the *t* distribution.

The smoothing-spline solution *does* depend on the initial value of *w*(*ε*_{i}) used in the IRLS method. However, we find that for uniform initial weightings (e.g., all values start with the square root of the variance), the differences are not statistically significant from other initial values.
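The iteration can be illustrated on a small linear least-squares problem. The sketch below is not the authors' code: it fits a straight line rather than a spline, and it uses the variance-like weight obtained by equating the score functions of the Gaussian and *t* distributions, *w*(*ε*) = [*ν*/(*ν* + 1)]*σ*_{s}^{2}[1 + *ε*^{2}/(*νσ*_{s}^{2})]. The data, outlier positions, and function names are made up for illustration.

```python
import numpy as np

def t_weight(eps, nu, sigma_s):
    """Variance-like weight that makes a Gaussian fit mimic a t distribution."""
    return nu / (nu + 1) * sigma_s**2 * (1 + eps**2 / (nu * sigma_s**2))

def irls(A, x, nu, sigma_s, max_iter=100, tol=1e-8):
    """Iteratively reweighted least squares for the linear model x ~ A @ xi."""
    w = np.full(len(x), sigma_s**2 * nu / (nu - 2))   # uniform start: the t variance
    for _ in range(max_iter):
        w_inv = 1.0 / w
        xi = np.linalg.solve(A.T @ (w_inv[:, None] * A), A.T @ (w_inv * x))
        w_new = t_weight(x - A @ xi, nu, sigma_s)     # reweight from new residuals
        if np.max(np.abs(w_new - w)) < tol * np.max(w):
            break
        w = w_new
    return xi

rng = np.random.default_rng(3)
nu, sigma_s = 4.5, 8.5
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])             # straight-line model
truth = np.array([0.0, 100.0])
x = A @ truth + sigma_s * rng.standard_t(nu, size=t.size)
x[[5, 20, 40]] += 5000.0                              # three gross outliers

xi_ols = np.linalg.lstsq(A, x, rcond=None)[0]         # ordinary least squares
xi_irls = irls(A, x, nu, sigma_s)
```

Points with large residuals receive a large effective variance and therefore little influence, so the reweighted fit stays close to the underlying line while ordinary least squares is pulled toward the outliers.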

# APPENDIX C

## Estimating the Variance of the Signal

Our method requires good estimates of the root-mean-square velocity *u*_{rms} of the signal, to determine the effective sample size and variance of the tensioned derivative. Our approach is to compute the power spectrum of the signal at the derivative of interest, and sum the variance that is statistically significantly greater than the expected variance of the noise.

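Given a time series *x*_{n} at times *t*_{n} = *n*Δ, where *n* = 1, …, *N*, we estimate the mean of its *m*th derivative by performing a least squares fit to a polynomial in time. The *detrended* time series is defined as the original series minus this fit. The frequencies *f*_{k} of its discrete Fourier transform are given by *f*_{k} = *k*/(*N*Δ). By Plancherel’s theorem, the variance of the detrended series equals the sum of its power spectrum over these frequencies, and the power spectrum of the *m*th derivative of the process is computed from the same transform. The noise, *ε*_{i}, has total variance *σ*^{2}, and the *m*th derivative of the noise has a correspondingly scaled power spectrum. The spectral estimates follow a *χ*^{2} distribution with 2 degrees of freedom, so we choose the threshold based on the 95th percentile of the expected distribution. And thus only the variance above this threshold is summed, with the threshold parameter *q* ≈ 20 for the 95th-percentile confidence.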
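A spectral estimate of this kind can be sketched as follows. The details are assumptions made for illustration and may differ from the authors' formulas: the trend is a polynomial of degree *m*, the derivative spectrum is obtained by multiplying by (2π*f*)^{2m}, the noise is white with variance *σ*^{2}, and a frequency is kept when its spectral value exceeds *q* times the noise spectrum. The test signal and function name are hypothetical.

```python
import numpy as np

def signal_derivative_rms(x, dt, m=1, sigma=1.0, q=20.0):
    """RMS of the m-th derivative from spectral bins that exceed q times the noise."""
    n = len(x)
    t = dt * np.arange(1, n + 1)
    detrended = x - np.polyval(np.polyfit(t, x, m), t)      # remove polynomial trend
    f = np.fft.fftfreq(n, d=dt)                             # two-sided frequencies
    df = 1.0 / (n * dt)
    spectrum = dt / n * np.abs(np.fft.fft(detrended))**2    # sums (times df) to the variance
    omega_2m = (2 * np.pi * f)**(2 * m)
    signal_spec = omega_2m * spectrum
    noise_spec = omega_2m * sigma**2 * dt                   # white-noise assumption
    significant = signal_spec > q * noise_spec
    rms = np.sqrt(np.sum(signal_spec[significant]) * df)
    return rms, detrended, spectrum, df

# Synthetic position record: 1 km sinusoid sampled every 30 min, 10 m noise
rng = np.random.default_rng(4)
n, dt, amplitude, sigma = 1024, 1800.0, 1000.0, 10.0
f0 = 16 / (n * dt)
t = dt * np.arange(1, n + 1)
x = amplitude * np.sin(2 * np.pi * f0 * t) + sigma * rng.normal(size=n)

u_rms, detrended, spectrum, df = signal_derivative_rms(x, dt, m=1, sigma=sigma)
u_rms_true = amplitude * 2 * np.pi * f0 / np.sqrt(2)
```

For this test signal the estimate `u_rms` should be close to the analytic RMS velocity `u_rms_true`, because the noise-dominated frequencies are excluded from the sum.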

## REFERENCES

Bracco, A., J. H. LaCasce, C. Pasquero, and A. Provenzale, 2000: The velocity distribution of barotropic turbulence. *Phys. Fluids*, **12**, 2478, https://doi.org/10.1063/1.1288517.

Cantoni, E., and T. Hastie, 2002: Degrees-of-freedom tests for smoothing splines. *Biometrika*, **89**, 251–263, https://doi.org/10.1093/biomet/89.2.251.

Craven, P., and G. Wahba, 1978: Smoothing noisy data with spline functions. *Numer. Math.*, **31**, 377–403, https://doi.org/10.1007/BF01404567.

De Boor, C., 1978: *A Practical Guide to Splines*. Vol. 27. Springer-Verlag, 348 pp.

Elipot, S., R. Lumpkin, R. Perez, J. J. Early, and A. M. Sykulski, 2016: A global surface drifter data set at hourly resolution. *J. Geophys. Res. Oceans*, **121**, 2937–2966, https://doi.org/10.1002/2016JC011716.

Green, P. J., and B. W. Silverman, 1993: *Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach*. Chapman and Hall, 184 pp.

Handcock, M. S., K. Meier, and D. Nychka, 1994: Kriging and splines: An empirical comparison of their predictive performance in some applications: Comment. *J. Amer. Stat. Assoc.*, **89**, 401–403, https://doi.org/10.2307/2290838.

Hansen, D. V., and P.-M. Poulain, 1996: Quality control and interpolations of WOCE-TOGA drifter data. *J. Atmos. Oceanic Technol.*, **13**, 900–909, https://doi.org/10.1175/1520-0426(1996)013<0900:QCAIOW>2.0.CO;2.

Laslett, G. M., 1994: Kriging and splines: An empirical comparison of their predictive performance in some applications. *J. Amer. Stat. Assoc.*, **89**, 391–400, https://doi.org/10.1080/01621459.1994.10476759.

Lee, T. C. M., 2003: Smoothing parameter selection for smoothing splines: A simulation study. *Comput. Stat. Data Anal.*, **42**, 139–148, https://doi.org/10.1016/S0167-9473(02)00159-7.

Lilly, J. M., 2019: A data analysis package for MATLAB, version 1.6.6. Jekyll, http://www.jmlilly.net/jmlsoft.html.

Lilly, J. M., A. M. Sykulski, J. J. Early, and S. C. Olhede, 2017: Fractional Brownian motion, the Matérn process, and stochastic modeling of turbulent dispersion. *Nonlinear Processes Geophys.*, **24**, 481–514, https://doi.org/10.5194/npg-24-481-2017.

Nychka, D., 2000: Spatial process estimates as smoothers. *Smoothing and Regression: Approaches, Computation and Application*, M. G. Schimek, Ed., John Wiley and Sons, 393–424.

Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1992: *Numerical Recipes in C: The Art of Scientific Computing*. 2nd ed. Cambridge University Press, 994 pp.

Reinsch, C. H., 1967: Smoothing by spline functions. *Numer. Math.*, **10**, 177–183, https://doi.org/10.1007/BF02162161.

Shcherbina, A. Y., and Coauthors, 2015: The LatMix summer campaign: Submesoscale stirring in the upper ocean. *Bull. Amer. Meteor. Soc.*, **96**, 1257–1279, https://doi.org/10.1175/BAMS-D-14-00015.1.

Sykulski, A. M., S. C. Olhede, J. M. Lilly, and E. Danioux, 2016: Lagrangian time series models for ocean surface drifter trajectories. *J. Roy. Stat. Soc.*, **65**, 29–50, https://doi.org/10.1111/rssc.12112.

Teanby, N. A., 2007: Constrained smoothing of noisy data using splines in tension. *Math. Geol.*, **39**, 419–434, https://doi.org/10.1007/s11004-007-9104-x.

Wahba, G., 1978: Improper priors, spline smoothing and the problem of guarding against model errors in regression. *J. Roy. Stat. Soc.*, **40B**, 364–372, https://doi.org/10.1111/J.2517-6161.1978.TB01050.X.

Whittaker, E. T., 1923: On a new method of graduation. *Proc. Edinburgh Math. Soc.*, **41**, 63–75, https://doi.org/10.1017/S0013091500077853.

Wide Area Augmentation System T and E Team, 2016: Global positioning system (GPS) standard positioning service (SPS) performance analysis report. William J. Hughes Technical Center Tech. Rep. 92, 147 pp., https://www.nstb.tc.faa.gov/reports/PAN92_0116.pdf.

Yaremchuk, M., and E. F. Coelho, 2015: Filtering drifter trajectories sampled at submesoscale resolution. *IEEE J. Oceanic Eng.*, **40**, 497–505, https://doi.org/10.1109/JOE.2014.2353472.