1. Introduction
In 2011 an array of floating ocean surface buoys (drifters) was deployed in the Sargasso Sea to assess the lateral diffusivity of oceanic processes (Shcherbina et al. 2015). Each drifter was equipped with a global positioning system (GPS) receiver recording locations every 30 min. Addressing the primary goal of understanding the processes controlling lateral diffusivity requires significant processing of the drifter positions, including removing mean flow, accounting for the large-scale strain field, and analyzing the residual spectra for hints of a dynamical process. However, it quickly became clear that the GPS position data, which can be accurate to within a few meters (Wide Area Augmentation System T and E Team 2016), were contaminated by outliers with position jumps of hundreds of meters or more. Prior to analysis, outliers must be removed from the position data and gaps must be interpolated to keep the positions synchronized in time across the drifter array.
The basic problem is ubiquitous: observations from GPS receivers return observed positions xi at times ti that differ from the true positions xtrue(ti) by some noise εi ≡ xi − xtrue(ti) with variance σ². The goal of smoothing is to find the true position xtrue(ti) that is not contaminated by the noise, whereas the goal of interpolating is to find the true position xtrue(t) between observation times. The approach taken here is to use smoothing splines. This approach is relatively broad (Handcock et al. 1994; Nychka 2000), and is related to the methods in Yaremchuk and Coelho (2015) and Elipot et al. (2016) for smoothing drifter trajectories, as discussed later.
Three issues must be addressed before smoothing splines are applied to GPS data:
how to choose the spline degree S and the tension degree T, and how these choices affect the recovered power spectrum,
how to modify the spline fit to accommodate the non-Gaussian errors of GPS receivers, and
how to identify and remove outliers.
To address these issues and to serve as a practical guide for other practitioners, we review B-splines in section 2 and introduce the canonical interpolating spline as the underlying model for path x(t) in (1). We demonstrate the effect that choosing S has on the high-frequency slope of the power spectrum of the interpolated fit.
Section 3 takes a broad look at smoothing splines and the assumptions they make on the underlying process. Many of the ideas presented in this section are known to the statistics community, so here we present these ideas from a more physical perspective. We show that the penalty function in (2) can be formulated as a maximum-likelihood problem and that applying tension is equivalent to assuming a Gaussian distribution on the tensioned derivative of the underlying process.
Section 4 uses ensembles of synthetic data that mimic the oceanographic data to test a number of choices that must be made. We establish that setting T = S is a reasonable choice. We show how the tension parameter can be chosen a priori (without optimization of the mean-square error) when the effective sample size (which we define later) can be estimated from the data. This estimate for effective sample size can be used to reduce the number of coefficients ξi in the spline fit without increasing mean-square error.
The second half of the paper addresses issues specific to GPS position errors. In section 5 we discuss the assumptions of stationarity and isotropy required for bivariate smoothing splines. In section 6 we show that GPS errors are not Gaussian distributed but rather are t distributed, and we show how to modify our method for a t distribution. Section 7 addresses how to modify our method to make smoothing splines robust to outliers. We compare with alternative methods and conclude in sections 8 and 9, respectively.
A major outcome of this work is the implementation of MATLAB classes for generating B-splines, interpolating splines, smoothing splines, and a class specific to smoothing GPS data (https://github.com/JeffreyEarly/GLNumericalModelingKit). These classes are highlighted throughout in relevant sections.
2. Interpolating spline
Assume that we are given N observations of a particle position (ti, xi) with no errors. The simplest form of interpolation is a nearest-neighbor method that assigns the position of the particle to the nearest observation in time. The resulting interpolated function x(t) is a polynomial of order K = 1 (piecewise constant), shown in the top row of Fig. 1. The next level of sophistication is to assume a constant velocity between any two observations and to use that to interpolate positions between observations, as shown in the second row of Fig. 1. This means we now have a piecewise constant function dx/dt that represents the velocity of the particle, shown in the second row, second column, of Fig. 1. This is a polynomial function of order K = 2.
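The two lowest orders can be sketched in a few lines. The following is a minimal Python illustration using NumPy, not the authors' MATLAB classes; the function names and sample values are hypothetical and chosen only for the example.

```python
import numpy as np

def interp_nearest(t_obs, x_obs, t):
    """Order K = 1: piecewise constant, value of the nearest observation in time."""
    t_obs = np.asarray(t_obs, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Index of the nearest observation time for each query time
    idx = np.abs(t[:, None] - t_obs[None, :]).argmin(axis=1)
    return x_obs[idx]

def interp_linear(t_obs, x_obs, t):
    """Order K = 2: constant velocity between neighboring observations."""
    return np.interp(t, t_obs, x_obs)

# Hypothetical observations (times and positions in arbitrary units)
t_obs = np.array([0.0, 1.0, 2.0, 3.0])
x_obs = np.array([0.0, 2.0, 1.0, 4.0])
t = np.array([0.4, 1.4, 2.6])

x_nearest = interp_nearest(t_obs, x_obs, t)
x_linear = interp_linear(t_obs, x_obs, t)
# Piecewise-constant velocity dx/dt implied by the K = 2 fit (N - 1 values)
velocity = np.diff(x_obs) / np.diff(t_obs)
```

With N observations the K = 2 fit carries N − 1 independent velocities, which is the counting argument that the text extends to accelerations for K = 3.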

An example of interpolating between seven data points using a spline function of order K. The data points are shown as circles, and the interpolated function is shown as solid black lines. We show four different orders of interpolation K = 1, …, 4 (rows) and their nonzero derivatives (columns). The thin vertical gray lines are the knot points.
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

It is less obvious how to proceed to a polynomial of order K = 3. With N data points we can construct a piecewise constant acceleration (the second derivative) using the N − 2 independent accelerations computed from finite differencing, but where to place knot points that define the boundaries of the regions and how to maintain continuity is less clear. The approach taken here is to use B-splines.
a. B-Splines
A B-spline (or basis spline) of order K (degree S = K − 1) is a piecewise polynomial that maintains nonzero continuity across S knot points. The knot points are a nondecreasing collection of points in time denoted by τi. The basic theory is well documented in De Boor (1978), but here we present a reduced version tailored to our needs.
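As an illustration of how a B-spline basis can be evaluated from a knot sequence, the sketch below uses the standard Cox-de Boor recursion in Python. It is not the authors' BSpline class, and the clamped knot sequence in the usage lines is an assumption made for the example rather than the knot placement defined in the paper.

```python
import numpy as np

def bspline_basis(knots, K, t):
    """Evaluate all B-splines of order K (degree K - 1) on `knots` at times t.

    Returns an array of shape (len(t), len(knots) - K). Intervals are treated
    as half-open, so every spline evaluates to zero at the final knot.
    """
    knots = np.asarray(knots, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Order 1: indicator function of each knot interval [tau_i, tau_{i+1})
    B = ((t[:, None] >= knots[None, :-1]) & (t[:, None] < knots[None, 1:])).astype(float)
    for k in range(2, K + 1):
        n = len(knots) - k
        B_new = np.zeros((len(t), n))
        for i in range(n):
            left_den = knots[i + k - 1] - knots[i]
            right_den = knots[i + k] - knots[i + 1]
            # Terms with a zero denominator (repeated knots) are dropped
            if left_den > 0:
                B_new[:, i] += (t - knots[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (knots[i + k] - t) / right_den * B[:, i + 1]
        B = B_new
    return B

# Hypothetical clamped knot sequence for a cubic (K = 4) basis on [0, 4)
knots = [0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4]
t = np.linspace(0.0, 3.99, 50)
B = bspline_basis(knots, 4, t)
```

Inside the knot range the basis functions are nonnegative and sum to one at every time, which is a convenient sanity check on any implementation.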

The B-splines and derivatives (columns) for orders K = 1, …, 4 (rows).
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

The knot placements in (7) and (8) are equivalent to the not-a-knot boundary conditions described in De Boor (1978) and used in the cubic spline implementation in MATLAB. In the usual formulation of the not-a-knot boundary condition, the knot positions do not change as a function of spline order, and therefore additional constraints must be added at each order—especially the requirement that the highest derivative maintain continuity near the boundaries. In the formulation here, these constraints are implicit in (7) and (8).
b. Numerical implementation
The root class in our suite of MATLAB classes is the BSpline class, which evaluates a complete B-spline basis set given a set of knot points. This class was used to generate Fig. 2.
The interpolating spline used to generate Fig. 1 is implemented in the InterpolatingSpline class, a subclass of BSpline. This class generates interpolating splines of arbitrary order given a set of data points (ti, xi), thus generalizing the cubic spline command built into MATLAB.
c. Synthetic data

(top) The velocity spectrum of a synthetic Lagrangian velocity generated from the Matérn (black). The blue, red, and orange lines show the spectrum of the interpolating spline fit to the data with a stride of 100 for S = 1, …, 4, respectively. (bottom) The coherence between the smoothed velocities and the true velocity. The dashed vertical line denotes the Nyquist frequency of the strided data.
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

The position data are contaminated with (white) Gaussian noise with σ = 10 m, a value chosen to resemble GPS errors. For all experiments we use a range of strides, that is, subsampled versions of the underlying process as input into the spline fits. A stride of 100 indicates that the signal is subsampled to 1 in every 100 data points. This lets us evaluate the quality of fit against different strides. In analyzing the quality of fits, we use velocities when computing the power spectrum, but report mean-square errors from positions.
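The contamination and striding steps can be sketched as follows in Python. The underlying signal here is a simple random walk used as a placeholder, which is an assumption for illustration only; the paper generates its signal from a Matérn process. The sample count and time step are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder signal: a random walk in position (an assumption for illustration;
# the paper uses a Matern process instead).
n = 10_000
dt = 1.0
t = dt * np.arange(n)
x_true = np.cumsum(rng.standard_normal(n))

# White Gaussian noise with sigma = 10 m, as in the text
sigma = 10.0
x_obs = x_true + sigma * rng.standard_normal(n)

# A stride of 100 keeps 1 in every 100 data points
stride = 100
t_sub, x_sub = t[::stride], x_obs[::stride]

# Nyquist frequency of the strided data (cycles per unit time)
f_nyquist = 1.0 / (2.0 * stride * dt)
```

Keeping the full-resolution x_true alongside the strided observations is what allows the mean-square error of a fit to be evaluated against the true positions.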
d. Spline degree S
We first consider a synthetic signal uncontaminated by noise to isolate the role of the spline degree S in the interpolated fit. As noted in Craven and Wahba (1978), the degree of the spline sets its roughness. In terms of the power spectrum, this corresponds to the high-frequency slope, as can be seen in Fig. 3, which shows fits with S = 1, …, 4. Setting S = 1 produces a high-frequency falloff in the spline fit of ω⁻². Although this appears to be a desirable feature when fitting to a process with true slope ω⁻², the mean-square error is consistently higher (as indicated in the legend of Fig. 3).
The bottom panel of Fig. 3 shows the coherence between the spline fit and the true signal. A coherence of 1 indicates that the signals are perfectly matched at a given frequency, while a coherence of 0 indicates that the signals are unrelated. There is no discernible difference in coherence between spline fits with S = 1, …, 4. The coherence quickly drops to near zero at the same frequency in every case. The implication here is that the spline fits are essentially producing noise at frequencies above the loss of coherence. This is why the spline fits with shallower slopes (with more variance at high, incoherent frequencies) produce a larger overall mean-square error than those with steeper slopes (with less variance at high, incoherent frequencies). The conclusion here is that smoother is better: it is better to use an unnecessarily high-order spline to avoid adding extra noise at high frequencies.
3. Smoothing spline
a. Smoothing-spline penalty function
The model used here is the canonical interpolating spline of order K described in section 2. We have chosen our knot points such that the model intersects the observations and this certainly maximizes (11) [and minimizes (12)] because all the errors are zero, but the resulting distribution of errors (a delta function at zero) does not resemble the assumed Gaussian distribution. Thus, additional constraints are required if the assumed error distribution is to be recovered.
The smoothing spline augments the penalty function of (12) by adding a global constraint on the Tth derivative of the resulting function as in (2). If λT → 0 then this reduces to the least squares fit in (12), but if λT → ∞ then the Tth derivative is forced to zero everywhere, which forces the model to a Tth-order polynomial (degree T − 1).
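The two limits can be checked numerically with a discrete analogue of a tension penalty. The Python sketch below is a generic penalized least squares on a uniform grid, with finite differences standing in for the Tth derivative. It is not the paper's equation (2) or its B-spline model, and the function name, weights, and test signal are all assumptions made for illustration.

```python
import numpy as np

def penalized_fit(x_obs, T, lam):
    """Minimize sum((x_obs - x)**2) + lam * sum((D_T x)**2) on a uniform grid,
    where D_T is the T-th finite-difference operator."""
    n = len(x_obs)
    D = np.diff(np.eye(n), n=T, axis=0)   # (n - T) x n difference matrix
    A = np.eye(n) + lam * D.T @ D
    return np.linalg.solve(A, x_obs)

rng = np.random.default_rng(1)
t = np.arange(30.0)
x_obs = 0.5 * t + rng.standard_normal(t.size)   # hypothetical noisy track

T = 2
x_rough = penalized_fit(x_obs, T, lam=0.0)   # no tension: reproduces the data
x_stiff = penalized_fit(x_obs, T, lam=1e9)   # strong tension: nearly a straight line
# Least squares polynomial of degree T - 1 for comparison
line = np.polyval(np.polyfit(t, x_obs, T - 1), t)
```

With zero tension the fit passes through every observation, and with very large tension it approaches the least squares polynomial of degree T − 1, mirroring the two limits described above.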
To interpret the first term of (2), consider a motionless particle at true position x0. Using the N relevant observations xi, the sample mean
Now consider the opposite extreme in which the particle is moving so fast (or the observations are so sparse) that each observation is independent of its neighbors. In this case, each observation must be considered separately, so the sample mean at time ti is
b. Smoothing-spline maximum likelihood
There is an important special case when tension is applied at the same order as the spline, T = S. In this case the spline is piecewise constant for x(T) with exactly N − T unique values. The parameter γ = N/(N − T) ≈ 1 and (16) can be simplified. This case is appealing because only the N − T unique values of the derivative x(T) that can be computed from N data points are being used for tension, which is not the case when T < S.
This maximum-likelihood perspective shows that adding tension to the penalty function is equivalent to assuming a higher-order derivative in the model (e.g., velocity if T = 1) is Gaussian. This is therefore making an assumption about the underlying physical process of the model. This is in contrast to the first term, which is entirely a statement about measurement noise.
Writing the smoothing spline as a maximum-likelihood condition (16) suggests that if the underlying physical process has a nonzero mean value in tension, the fit will not behave as expected. However, smoothing splines can be easily modified to accommodate a mean value in tension, as shown in appendix A.
c. Optimal parameter estimation
For a given choice of T and λT, the minimum solution to (2) can be found analytically [see Teanby (2007) and our appendix A]. Once the solution is found the smoothing matrix
A significant amount of the literature on smoothing splines is devoted to minimizing the mean-square error when the variance σ² is not known. Craven and Wahba (1978) and Wahba (1978) use cross validation to estimate σ and minimize mean-square error. Recent work comparing different estimators shows no single technique to be optimal (Lee 2003). For our application, however, the errors in GPS data can be relatively easily established, as shown in section 6.
The definition of effective sample size used here is related to, but not the same as, the notion of degrees of freedom used in Cantoni and Hastie (2002) and references therein.
4. Spline order, tension order, and the spectrum
With a model path [(1)], a penalty function [(2)], and a minimization condition [(19)], we have all of the primary pieces to create a smoothing-spline interpolant to the data. However, a number of choices still must be made. In this section we use synthetically generated data to represent our physical process, and contaminate the process with Gaussian noise as described in section 2c. We test our ability to recover the signal and examine the effects of changing the spline and tension order on the mean-square error and the resulting spectrum.
The results of this section are empirical, and we acknowledge upfront that any conclusions reached may depend on our particular choice of physical model generating the signal. Nevertheless, our expectation is that the conclusions are “O(1)” correct and are applicable, at least, to our GPS-tracked drifter dataset.
a. Tension degree T
Given a smoothing spline of degree S, the tension in the penalty function (2) can be applied at any degree T ≤ S. We use the synthetic data for the three different slopes to empirically establish the relationship between the tension degree T and the spline degree S.
For S = 1, …, 5 and all T ≤ S we minimize the mean-square error against the true values. The minimization is performed for 200 ensembles of noise and signal with three slopes (ω⁻², ω⁻³, and ω⁻⁴) and five different strides. For a given slope, stride, and realization of noise, we identify the minimum mean-square error across S and T and compare all values of S and T as a percentage increase relative to that minimum. After aggregating across slopes, strides, and ensembles, the 68% confidence range is shown in Table 1. The table shows that setting T = S is not always optimal but it is never significantly worse than the optimal choice. Thus for the remainder of the paper we set T = S.
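The bookkeeping behind this comparison can be sketched as follows in Python. The mean-square errors are made-up numbers used only to show the calculation, not results from the paper. Taking the 68% range as the central 16th to 84th percentiles is an assumption about how the range is defined.

```python
import numpy as np

def percent_increase_over_best(mse):
    """mse has shape (n_realizations, n_choices); each column is one (S, T) pair.
    Returns the percentage increase of every entry over that row's minimum."""
    mse = np.asarray(mse, dtype=float)
    best = mse.min(axis=1, keepdims=True)
    return 100.0 * (mse - best) / best

# Made-up mean-square errors: 3 realizations x 3 hypothetical (S, T) choices
mse = np.array([[4.0, 5.0, 4.4],
                [6.0, 6.0, 9.0],
                [2.0, 2.5, 2.2]])
increase = percent_increase_over_best(mse)

# Central 68% range for each (S, T) choice (assumed definition of the range)
lo, hi = np.percentile(increase, [16, 84], axis=0)
```

Normalizing by the per-realization minimum before aggregating keeps slopes and strides with very different absolute errors on a common scale.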
The 68th-percentile range of increase in mean-square error from the optimal fit.


b. Loss of coherence

(top) The uncontaminated velocity spectrum of the signal (black) and velocity spectrum of the noise (red). The observed signal is the sum of the two. The blue, red, and orange lines show the spectrum of the smoothing spline that is best fit to the observations with all, 1/10th, and 1/100th of the data, respectively. (bottom) The coherence between the smoothed signals and the true signal. The vertical dashed lines show the effective Nyquist computed using (23).
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

c. Reduced spline coefficients
One practical consideration when working with large datasets is the computational cost of creating the spline fit, which is limited by the rate of solving for the spline coefficients. It is beneficial to reduce knot points (and therefore total splines) where possible. A reasonable strategy is that when the effective sample size is large, as measured by (21), we avoid placing a knot point at every data point—essentially “skipping” data points.
To test this idea, we find the optimal fit over a range of different strides (which varies the effective sample size) and increase the number of skipped knot points until the mean-square error starts to rise. We find that we can skip max[1, floor(2neff/3)] knot points without sacrificing precision. The column labeled “optimal mse” in Table 2 indicates the optimal fit in which one knot point is created for every observation point, whereas the reduced degrees of freedom (“reduced dof”) column indicates a fit in which the number of knot points is reduced. In some cases the optimal mean-square error improves with fewer knot points. This means that, when handling large datasets, we can reduce the number of splines being used if the effective sample size is large, and we can “chunk” the data (split them into multiple independent pieces) when the effective sample size is small.
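The rule of thumb max[1, floor(2neff/3)] is simple to encode. The Python sketch below treats the result as a stride between retained knots and always keeps a knot at the final observation. Both choices are assumptions made for illustration and may differ from how the authors' classes place reduced knots.

```python
import math

def knot_stride(n_eff):
    """Rule of thumb from the text: max(1, floor(2 * n_eff / 3))."""
    return max(1, math.floor(2.0 * n_eff / 3.0))

def reduced_knot_indices(n_obs, n_eff):
    """Indices of observations that keep a knot, always retaining the last one
    (interpreting the rule as a stride is an assumption)."""
    step = knot_stride(n_eff)
    idx = list(range(0, n_obs, step))
    if idx[-1] != n_obs - 1:
        idx.append(n_obs - 1)
    return idx

# Hypothetical example: 10 observations with an effective sample size of 6
indices = reduced_knot_indices(10, 6.0)
```

When the effective sample size is near 1 the stride collapses to 1 and every observation keeps its knot, so the reduction only takes effect where neighboring observations are largely redundant.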
Mean-square error (mse) and effective sample size for a range of strides and smoothing-spline methods.


d. Interpolation condition
To estimate λT from (15), we estimate the mean-square value of a derivative of the process,
To test the relationship between Γ and effective sample size, we compute the optimal smoothing spline for a range of values of Γ (created by subsampling the signal) for three different spectral slopes (ω−2, ω−3, and ω−4). The value

Effective sample size from the standard error vs Γ.
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

e. Optimal fits
Table 2 summarizes the key results of this section by applying a smoothing spline (S = 3) to 200 ensembles with three different slopes (ω⁻², ω⁻³, and ω⁻⁴) and five different strides. When the algorithm uses true values, uncontaminated by noise, we consider the process to be “unblinded,” in contrast to “blind” methods, in which the algorithm only uses noisy data. The second and third columns show the effective sample size and average mean-square error when the smoothing spline is applied using the true values (i.e., unblinded) to minimize the mean-square error; this is the lower bound. The fourth column shows the average increase in mean-square error when reducing the number of spline coefficients as documented in section 4c. There is almost no change in mean-square error, and therefore all subsequent methods (whether blind or unblinded) use this technique. The fifth column uses (26) from section 4d to provide a (blind) initial guess of the tension parameter. The results are mixed: a typical increase in mean-square error is 30%–50% when the effective sample size is large. While this seems large, it is a small fraction of the total noise variance; for example, an optimal mean-square error of 6 m² increases to 8 m² when the total variance is 100 m². Nearly optimal fits can be found using (19), as shown in the last column of the table.
f. Numerical implementation
The numerical implementation of the methods in this section is available in the SmoothingSpline class, which subclasses BSpline. This class is initialized with three required parameters: the observation times ti, the observed positions xi, and an error distribution.
5. Bivariate smoothing splines and stationarity
Up to this point we have considered univariate data, (ti, xi), but GPS position data are fundamentally bivariate. The term “bivariate” in the context of splines is often used to denote splines defined on two independent variables—however, in this context we define bivariate to mean two dependent variables (e.g., x and y) and one independent variable (e.g., t).
Therefore, to assume isotropy in λT and use a bivariate smoothing spline, the mean velocity from the underlying process must be removed. What qualifies as mean and fluctuation rarely has a clear answer, but a reasonable option is letting a polynomial of degree T + 1 define the mean. This has the added benefit of removing a constant nonzero tension value, which, as shown in section 3b, changes the problem formulation.
Stationarity also requires removing the mean velocity. The effective sample size is shown to be dependent on rms velocity, so if the velocity varies in time, then the optimal effective sample size varies as well. This means that smoothing splines not only require stationarity in the tensioned variable x(T), as shown in section 3b, but also require stationarity in the velocity x(1) to be effective. This last requirement can be satisfied either by removing the mean (as suggested here) or by segmenting the observations into locally stationary chunks.
a. Assessing errors
Removing the mean or some other low-passed version of the data means the total smoothing matrix is a combination of the low-passed and high-passed smoothing matrices. Once this matrix is computed, it can be used to compute the standard errors.
b. Numerical implementation
The BivariateSmoothingSpline class is initialized with data (ti, xi, yi) and a distribution. For a spline of degree S = T, a spline of degree S + 1 is used to remove the mean in each direction. With a Gaussian distribution this is simply a least squares polynomial fit. By assumption, the residual data are stationary and isotropic, so the tension parameter λT is applied equally in each direction. Minimization is performed on the sum of the expected mean-square errors in each direction.
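For Gaussian errors, the mean-removal step amounts to a least squares polynomial fit in each direction, which can be sketched in Python as follows. This is not the BivariateSmoothingSpline class. The function name is hypothetical, and the polynomial degree is left as a free parameter rather than tied to the spline degree.

```python
import numpy as np

def remove_polynomial_mean(t, x, y, degree):
    """Least squares polynomial fit in each direction.

    Returns the residuals (x', y') and the fitted means (x_mean, y_mean).
    """
    t = np.asarray(t, dtype=float)
    # Center and scale time to keep the polynomial fit well conditioned
    s = (t - t.mean()) / (t.max() - t.min())
    x_mean = np.polyval(np.polyfit(s, x, degree), s)
    y_mean = np.polyval(np.polyfit(s, y, degree), s)
    return x - x_mean, y - y_mean, x_mean, y_mean

# Hypothetical track that is exactly polynomial, so the residuals should vanish
t = np.linspace(0.0, 10.0, 50)
x = 1.0 + 2.0 * t - 0.3 * t**2
y = -4.0 + 0.5 * t
x_res, y_res, x_mean, y_mean = remove_polynomial_mean(t, x, y, degree=2)
```

The residuals are what a single, isotropic tension parameter would then be applied to, and the fitted means are added back after smoothing.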
6. GPS dataset
The primary dataset considered here is nine surface drifters deployed in the Sargasso Sea in the summer of 2011 (Shcherbina et al. 2015). In the past, such drifters used the Argos positioning system, which has significantly poorer temporal coverage and position accuracy (Elipot et al. 2016), but recently most surface drifters have employed GPS receivers and transmitted their data back through Argos or Iridium satellites.
The GPS receiver sits on the surface drifter and collects position data, but because of atmospheric conditions or ocean waves, the receivers are sometimes unable to obtain a position, or when they do, it is highly inaccurate. Despite nominal accuracies of a few meters, it is often the case that some positions are off by more than 1000 m, as can be seen in Fig. 8. Applying a smoothing-spline fit using the method in section 3 produces an extremely poor fit, with clear overshoots to bad data points.
a. GPS error distribution
We characterize the GPS errors by considering data from a motionless GPS receiver allowed to run for 12 h. The GPS receiver used in this test is not the same as the one used for the drifters (because it was no longer available) but should produce errors similar enough for this analysis.
The positions recorded by the motionless GPS are assumed to have isotropic errors with mean zero, which means the positions themselves are the errors. The PDF of the combined x and y position errors is shown in Fig. 6.

(top) The position error distribution of the motionless GPS. The gray or black curves are the best-fit Gaussian or t distribution, respectively. (bottom) The distance error distribution with the corresponding expected distributions from the Gaussian and t distributions. The vertical line in the bottom panel shows the 95% error of the t distribution.
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

The error distribution is first fit to a zero-mean Gaussian PDF (10). The maximum-likelihood fit is found by computing the standard deviation of the sample, which is found to be σ ≈ 10 m and shown as the gray line in Fig. 6. However, it is clear the error distribution shows much longer tails than the Gaussian PDF.
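The Gaussian fit and the tail comparison can be illustrated with synthetic errors. In the Python sketch below the errors are drawn from a scaled Student's t distribution whose degrees of freedom and scale are hypothetical values chosen for the example, not the parameters fitted in the paper.

```python
import numpy as np

def gaussian_sigma_mle(errors):
    """Maximum-likelihood sigma of a zero-mean Gaussian: the rms of the errors."""
    errors = np.asarray(errors, dtype=float)
    return np.sqrt(np.mean(errors**2))

def tail_fraction(errors, sigma, n_sigma=3.0):
    """Fraction of samples farther than n_sigma * sigma from zero."""
    return np.mean(np.abs(errors) > n_sigma * sigma)

rng = np.random.default_rng(2)
nu = 5.0                                          # hypothetical degrees of freedom
errors = 8.0 * rng.standard_t(nu, size=200_000)   # hypothetical scale in meters

sigma = gaussian_sigma_mle(errors)
observed = tail_fraction(errors, sigma)
gaussian_expected = 0.0027                        # P(|z| > 3) for a Gaussian
```

For heavy-tailed errors the observed fraction beyond three standard deviations exceeds the Gaussian expectation, which is the kind of mismatch the long tails in Fig. 6 indicate.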
Figure 7 shows the autocorrelation function of the GPS position errors and the 99% confidence intervals. We find a rough empirical fit to be ρ(τ) = exp[max(−τ/t0, −τ/t1 − 1.35)], where t0 = 100 s and t1 = 760 s, which reflects an initially rapid falloff in correlation, followed by a slower decline. The smallest sampling interval of the GPS drifters in question is 30 min, at which the correlation is indistinguishable from zero according to Fig. 7. It is therefore safe to assume the errors are uncorrelated for our real-data example. Although the drifter sampling rate allows us to avoid further discussion of the autocorrelation function of GPS errors, accounting for autocorrelation is a relatively easy extension (and is implemented in the code).
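The empirical fit is easy to evaluate directly. The Python sketch below encodes the stated form with t0 = 100 s and t1 = 760 s. The function and variable names are hypothetical, and the lag is assumed to be nonnegative and in seconds.

```python
import numpy as np

T0 = 100.0   # s, fast initial decay scale from the text
T1 = 760.0   # s, slow decay scale from the text

def gps_error_autocorrelation(tau):
    """rho(tau) = exp(max(-tau/t0, -tau/t1 - 1.35)) for lag tau >= 0 in seconds."""
    tau = np.asarray(tau, dtype=float)
    return np.exp(np.maximum(-tau / T0, -tau / T1 - 1.35))

# Lag at which the two branches cross: tau/t0 = tau/t1 + 1.35
tau_cross = 1.35 / (1.0 / T0 - 1.0 / T1)

# Value of the fit at the 30-min drifter sampling interval
rho_30min = gps_error_autocorrelation(30.0 * 60.0)
```

The fast branch governs lags up to roughly 155 s, after which the slow branch takes over, and at a 30-min lag the fitted correlation is about 0.024.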

The autocorrelation function of the GPS positioning error, with 99% confidence intervals shown in gray. The correlation at a drifter sampling period of 30 min is indistinguishable from zero.
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

The smoothing-spline algorithms described in section 3 are modified to use the t distribution as described in appendix B. Table 3 shows that the conclusions reached for Gaussian data in section 4 still apply with t-distributed data.
7. Minimization with outliers
The goal here is to find a smooth solution in the presence of outliers, that is, points that do not appear to follow the known error distribution for the GPS receiver shown in section 6. These points are obviously problematic, as can be seen in Fig. 8, where individual data points jump hundreds of meters and even several kilometers away from their neighbors. Errors of this size are inconsistent with the noise analysis of the preceding section, so the goal here is to find a model path x(t) robust to this uncharacterized noise. What makes outliers “obvious” to the eye is that they appear as unexpectedly large motions, inconsistent with the other motion for that path. The smoothing-spline formulation is therefore useful, as it assumes the motion at some order (e.g., acceleration) is Gaussian, as shown in section 3b. Of the nine drifters we are analyzing here, one drifter shows no obvious outliers, suggesting the issue may be related to how the antennas are configured. This particular drifter serves as a useful point of comparison.

GPS position data for a 40-h window from drifter 6. The points are the recorded positions, and the black line is the optimal fit using the ranged expected mean-square error. Data points with less than 0.01% chance of occurring are highlighted and are deemed outliers. The light gray line is the optimal smoothing-spline fit for drifter 7, which has no apparent outliers and was released a few hundred meters from drifter 6. The orange line is the smoothing-spline fit assuming t-distributed errors but using cross validation to minimize λT.
Citation: Journal of Atmospheric and Oceanic Technology 37, 3; 10.1175/JTECH-D-19-0087.1

Minimizing with the expected mean-square error (19) produces a fit so poor it is not worth showing. Because outliers add enormous amounts of variance, the expected mean-square error vastly underestimates the spline tension, essentially chasing every outlier shown in Fig. 8. Because some of the noise is uncharacterized, a method such as cross validation might be expected to be effective. The orange line in Fig. 8 uses a smoothing-spline fit, assuming Student’s t distributed errors, but minimized with cross validation. This fit performs relatively well, but, when compared with drifter 7, it is clear that it still chases some outliers. The goal in this section is to develop a method robust to outliers in cases where we know something about the noise.
Throughout our attempts to smooth the noisy GPS data we tried many different approaches to modifying smoothing splines for robustness to outliers, but ultimately found enormous gains are made by simply discarding outliers while minimizing the expected mean-square error (19). The results of this approach are shown in section 7a, and we document our method to reliably estimate the outlier distribution in section 7b.
a. Robust minimization
To test this approach we generated data as before but allowed a certain percentage of outliers α to be generated with an outlier distribution following (33). We considered five values of β (1/50, 1/100, 1/200, 1/400, and 1/800) as well as β = 0, which is just (19). Testing across a number of ensembles with outlier ratios α = 0.0, 0.05, 0.10, and 0.25 we found that β = 1/100 is overall the best choice.
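The contaminated noise used in such a test can be sketched as follows. This is a minimal illustration, not the generator used for the experiments: the function names, the degrees of freedom, and the scale parameters are placeholder assumptions, and the wide t distribution used for the outliers stands in for (33), which is not reproduced here.

```python
import math
import random

def student_t(rng, nu, scale):
    """Draw one Student's t variate: a standard normal over sqrt(chi2_nu / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return scale * z / math.sqrt(chi2 / nu)

def contaminated_errors(n, alpha, rng, nu=4.5, scale=1.0,
                        outlier_nu=3.0, outlier_scale=50.0):
    """Return (errors, is_outlier); each error is an outlier with probability alpha.

    All default parameter values are placeholders chosen for illustration.
    """
    errors, is_outlier = [], []
    for _ in range(n):
        flag = rng.random() < alpha
        if flag:
            errors.append(student_t(rng, outlier_nu, outlier_scale))
        else:
            errors.append(student_t(rng, nu, scale))
        is_outlier.append(flag)
    return errors, is_outlier

rng = random.Random(0)
for alpha in (0.0, 0.05, 0.10, 0.25):
    errors, flags = contaminated_errors(1000, alpha, rng)
    print(alpha, sum(flags) / len(flags))
```

Adding these errors to a synthetic path, and repeating over an ensemble, gives the kind of dataset on which different values of β can be compared.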
b. Full-tension solution and outlier distribution
The full-tension solution is defined as the maximum allowable value of λ given the known noise distribution. That is, the spline fit is pulled away from the observations so that the distribution of observed errors xi − x(ti) matches the expected distribution pnoise(ε). In cases where the effective sample size neff is large, the full-tension solution approximately matches the optimal (minimal mean-square error) solution. In cases in which the effective sample size is small, the full-tension solution is more akin to a low-pass solution [as increasing λ is equivalent to decreasing
In the simplest case where there are no outliers, the full-tension solution can be found by requiring the sample variance match the variance of pnoise(ε). When outliers are present, a more robust method of estimation is required. After some experimentation, we found the most reliable method of achieving full tension is to minimize the Anderson–Darling test of pnoise(ε) on the interquartile range of observed errors. This method can be used to estimate the outlier distribution and further refine both the full-tension solution and the range over which the expected mean-square error is computed.
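The outlier-free case can be illustrated with a short sketch. To keep the example self-contained, a discrete Whittaker (1923) smoother with a second-difference penalty is used as a stand-in for the smoothing spline; the function names and the bisection bracket are assumptions made for this illustration. The tension λ is increased until the mean-square residual matches the known noise variance σ².

```python
import math
import random

def whittaker_smooth(y, lam):
    """Minimize sum (y - z)^2 + lam * sum (second difference of z)^2 over z."""
    n = len(y)
    # A = I + lam * D^T D is symmetric, positive definite, and pentadiagonal.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for a in range(3):
            for b in range(3):
                A[r + a][r + b] += lam * stencil[a] * stencil[b]
    z = list(y)
    # Gaussian elimination restricted to the band (no pivoting needed here).
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = A[i][k] / A[k][k]
            for j in range(k, min(k + 3, n)):
                A[i][j] -= f * A[k][j]
            z[i] -= f * z[k]
    # Back substitution on the remaining upper band.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(i + 3, n)):
            z[i] -= A[i][j] * z[j]
        z[i] /= A[i][i]
    return z

def mean_square_residual(y, z):
    return sum((yi - zi) ** 2 for yi, zi in zip(y, z)) / len(y)

def full_tension_lambda(y, sigma, lo=1e-6, hi=1e8, iterations=60):
    """Bisect in log(lam) until the mean-square residual equals sigma^2."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if mean_square_residual(y, whittaker_smooth(y, mid)) < sigma ** 2:
            lo = mid  # fit is still too close to the data: increase tension
        else:
            hi = mid
    return math.sqrt(lo * hi)

rng = random.Random(0)
sigma = 1.0
y = [5.0 * math.sin(2.0 * math.pi * i / 100.0) + rng.gauss(0.0, sigma)
     for i in range(200)]
lam = full_tension_lambda(y, sigma)
print(lam, mean_square_residual(y, whittaker_smooth(y, lam)))
```

The bisection relies on the residual variance increasing monotonically with λ. With outliers present the sample variance is no longer a reliable target, which is why a more robust criterion is needed in that case.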
c. Extension to bivariate data
The strategies in this section are relatively easily extended to bivariate data. All error distributions are assumed isotropic, and the outlier distribution can be estimated by including the errors from both independent directions. The ranged expected mean-square error calculation defined in section 7a uses the distance of the error for its cutoff to remain invariant under rotation.
Application of this method to one of the GPS drifters (drifter 6) is shown in Fig. 8. Although it is impossible to know exactly how well the spline fit performed, comparison with drifter 7 (with no apparent outliers) suggests our method successfully avoids chasing outliers.
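The reason for using the distance of the error as the cutoff can be seen in a small sketch. The function names and the error values below are hypothetical; the point is only that a cutoff on distance flags the same points after the coordinate axes are rotated, whereas a cutoff applied to each component separately does not.

```python
import math

def radial_outliers(ex, ey, cutoff):
    """Flag errors whose distance from the fit exceeds the cutoff."""
    return [math.hypot(x, y) > cutoff for x, y in zip(ex, ey)]

def componentwise_outliers(ex, ey, cutoff):
    """Flag errors when either component alone exceeds the cutoff."""
    return [abs(x) > cutoff or abs(y) > cutoff for x, y in zip(ex, ey)]

def rotate(ex, ey, angle):
    """Rotate the error vectors counterclockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * x - s * y for x, y in zip(ex, ey)],
            [s * x + c * y for x, y in zip(ex, ey)])

# Hypothetical errors (m) in the two directions, and a hypothetical cutoff (m).
ex = [3.0, -8.0, 40.0, 1.0, -30.0]
ey = [4.0, 6.0, 5.0, -2.0, -30.0]
cutoff = 35.0
rx, ry = rotate(ex, ey, math.radians(45.0))

print(radial_outliers(ex, ey, cutoff) == radial_outliers(rx, ry, cutoff))
print(componentwise_outliers(ex, ey, cutoff) == componentwise_outliers(rx, ry, cutoff))
```

In this example the distance-based flags are unchanged by the rotation, while the component-wise flags change: the point (−30, −30) lies 42 m from the fit but is only flagged component-wise once the axes are rotated.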
d. Numerical implementation
The GPSSmoothingSpline class inherits from the BivariateSmoothingSpline class and assumes errors follow the t distribution found in section 6. The class projects latitude and longitude using a transverse Mercator projection with the central meridian set to the center of the dataset.
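The projection step can be sketched as follows. This is not the code of the GPSSmoothingSpline class: it uses the spherical form of the transverse Mercator projection with an assumed mean Earth radius and unit scale factor, it takes the "center of the dataset" to be the midpoint of the longitude range, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical approximation)

def transverse_mercator(lats_deg, lons_deg, k0=1.0):
    """Project to (x, y) in meters; central meridian at the mid-longitude.

    y is measured from the equator. Assumes the dataset does not straddle
    the antimeridian.
    """
    lon0 = math.radians(0.5 * (min(lons_deg) + max(lons_deg)))
    xs, ys = [], []
    for lat_deg, lon_deg in zip(lats_deg, lons_deg):
        lat = math.radians(lat_deg)
        dlon = math.radians(lon_deg) - lon0
        b = math.cos(lat) * math.sin(dlon)
        xs.append(EARTH_RADIUS_M * k0 * math.atanh(b))
        ys.append(EARTH_RADIUS_M * k0 * math.atan2(math.tan(lat), math.cos(dlon)))
    return xs, ys

# Two hypothetical positions one degree of longitude apart.
xs, ys = transverse_mercator([32.0, 32.0], [-73.0, -72.0])
print(xs, ys)
```

Centering the meridian on the data keeps the distortion of the projection small over the extent of a drifter array, so that distances in (x, y) can be treated as isotropic.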
8. Discussion
The methods discussed in this paper are related to other methods used to smooth and interpolate drifter trajectories.
Yaremchuk and Coelho (2015) formulate a cost function, their (9), based on PDFs of the drifter accelerations and the GPS errors. Setting their μ = 1 (they choose μ = 0.9), this is equivalent to the special case of (18) when S = T = 2, where they have implicitly chosen λT by assuming an infinite effective sample size, neff. Their method for isolating outliers is nearly equivalent to the iteratively reweighted least squares method detailed in appendix B using a weight function similar to Tukey’s biweight, (B6).
Elipot et al. (2016) apply their method to the Argos-tracked surface drifters, whose positions are significantly noisier than GPS positions but whose errors also follow a t distribution. They assume a linear model for positions, equivalent to assuming S = T = 2 with λT → ∞. In the numerical implementation of this paper, this special case is implemented in the ConstrainedSpline class. The time-dependent weight function used in Elipot et al. (2016) requires manually specifying a weight for each point used, and this method is therefore somewhat different from the approach taken here.
Another technique used for smoothing and interpolating drifter positions is kriging (Hansen and Poulain 1996); however, its relationship to smoothing splines is less clear. In response to a study empirically comparing kriging with smoothing splines (Laslett 1994), Handcock et al. (1994) point out that kriging and smoothing splines are just two specific parameter choices of a more general class of splines defined by their covariance functions. In the context of the maximum-likelihood equation for smoothing splines [(18)], this generalization could be modeled by including a covariance structure on the physical process.
Overall, the method of this paper (in a loose sense) generalizes a number of existing approaches for interpolation, especially in terms of flexibly allowing different levels of smoothness and tension, and in terms of application to non-Gaussian noise structures.
9. Conclusions
The method in this paper solves our problem of finding smoothed, interpolated positions from a noisy GPS drifter dataset with outliers. In more general terms, for signals with second-order structure similar to a Matérn process we found that
the spline degree S should be set to a value higher than the high-frequency spectral slope of the process (section 2) and
the optimal tension parameter can be estimated a priori (section 4).
For GPS data, there appear to be three key steps for using smoothing splines:
Acknowledgments
Thanks are given to Miles Sundermeyer, whose drifters were used in this analysis. The work of J. J. Early was funded by ONR through the Scalable Lateral Mixing and Coherent Turbulence Departmental Research Initiative (LatMix) and National Science Foundation Award 1658564. The work of A. M. Sykulski was funded by the Engineering and Physical Sciences Research Council (Grant EP/R01860X/1).
APPENDIX A
Numerical Implementation
The B-splines are generated using the algorithm described in De Boor (1978) with knot points determined by (7) and (8). The matrix
APPENDIX B
Iteratively Reweighted Least Squares
Using the t distribution is challenging because it does not result in a linear solution for the coefficients as in (A3). One solution is to use a search algorithm to directly look for maximum values. Alternatively, one can use iteratively reweighted least squares (IRLS).
From (B4) it is clear that if εi < σs then it is reweighted to a smaller value, making the observation point more strongly weighted. On the other hand, if εi > σs, then its relative weighting decreases, and it is treated more as an outlier.
The smoothing-spline solution does depend on the initial value of w(εi) used in the IRLS method. However, we find that for uniform initial weightings (e.g., all values start with the square root of the variance), the resulting solution does not differ in a statistically significant way from the solutions obtained with other initial values.
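The reweighting idea can be illustrated on a much simpler problem than the spline fit. The sketch below applies IRLS to a straight-line fit with t-distributed errors, using the standard effective variance for a t distribution, σ²(ν + ε²/σ²)/(ν + 1), which is below σ² when |ε| < σ and above it otherwise. This form is assumed for illustration and may be written differently from (B4); the function names and data are hypothetical.

```python
import random

def weighted_line_fit(t, y, variances):
    """Weighted least squares for y = a + b t with per-point error variances."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    tbar = sum(wi * ti for wi, ti in zip(w, t)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (ti - tbar) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - tbar) * (yi - ybar) for wi, ti, yi in zip(w, t, y))
    b = sxy / sxx
    return ybar - b * tbar, b

def irls_line_fit(t, y, sigma, nu, iterations=50):
    """IRLS for t-distributed errors, starting from uniform weights."""
    variances = [sigma ** 2] * len(y)
    for _ in range(iterations):
        a, b = weighted_line_fit(t, y, variances)
        residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
        # Small residuals get a smaller effective variance (more weight);
        # large residuals get a larger one and are treated more as outliers.
        variances = [sigma ** 2 * (nu + (r / sigma) ** 2) / (nu + 1.0)
                     for r in residuals]
    return weighted_line_fit(t, y, variances)

rng = random.Random(0)
t = [float(i) for i in range(50)]
y = [1.0 + 2.0 * ti + rng.gauss(0.0, 0.5) for ti in t]
for i in (10, 25, 45):
    y[i] += 200.0  # three gross outliers

a_ols, b_ols = weighted_line_fit(t, y, [1.0] * len(y))
a_irls, b_irls = irls_line_fit(t, y, sigma=0.5, nu=4.0)
print(a_ols, b_ols)
print(a_irls, b_irls)
```

Each pass solves an ordinary weighted least squares problem, so the linear solution for the coefficients is retained within every iteration even though the t-distributed likelihood itself is not quadratic.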
APPENDIX C
Estimating the Variance of the Signal
Our method requires good estimates of the root-mean-square velocity urms of the signal, to determine the effective sample size and variance of the tensioned derivative. Our approach is to compute the power spectrum of the signal at the derivative of interest, and sum the variance that is statistically significantly greater than the expected variance of the noise.
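A minimal sketch of this estimate for the first derivative (velocity) is given below. Several simplifications are assumed and are not taken from the paper: positions are evenly sampled, the position noise is white with known standard deviation σ, spectral leakage is ignored, each periodogram ordinate of the noise is treated as exponentially distributed about its expected level, and the noise floor is subtracted from each significant frequency. The function names are hypothetical.

```python
import cmath
import math
import random

def periodogram(v):
    """Per-frequency variance contributions P_k = |V_k|^2 / M^2 of a series v."""
    m = len(v)
    out = []
    for k in range(m):
        vk = sum(v[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
        out.append(abs(vk) ** 2 / m ** 2)
    return out

def signal_velocity_variance(x, dt, sigma, confidence=0.95):
    """Sum the velocity variance that stands significantly above the noise."""
    v = [(x[n + 1] - x[n]) / dt for n in range(len(x) - 1)]
    m = len(v)
    p = periodogram(v)
    # Exceedance factor for an exponentially distributed ordinate.
    factor = -math.log(1.0 - confidence)
    total = 0.0
    for k in range(1, m):  # k = 0 is the mean and is skipped
        # Expected level of differenced white noise at this frequency.
        noise = (sigma ** 2 / m) * (2.0 - 2.0 * math.cos(2.0 * math.pi * k / m)) / dt ** 2
        if p[k] > factor * noise:
            total += p[k] - noise
    return total

rng = random.Random(0)
n_obs, amplitude, sigma = 257, 30.0, 1.0
x = [amplitude * math.sin(2.0 * math.pi * 4.0 * n / 256.0) + rng.gauss(0.0, sigma)
     for n in range(n_obs)]
estimate = signal_velocity_variance(x, dt=1.0, sigma=sigma)
truth = (2.0 * amplitude * math.sin(math.pi * 4.0 / 256.0)) ** 2 / 2.0
print(math.sqrt(estimate), math.sqrt(truth))  # estimated and true urms
```

Because some noise ordinates exceed the threshold by chance, an estimate of this kind is biased slightly high; a stricter confidence level reduces that bias at the cost of missing weak signal variance.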
REFERENCES
Bracco, A., J. H. LaCasce, C. Pasquero, and A. Provenzale, 2000: The velocity distribution of barotropic turbulence. Phys. Fluids, 12, 2478, https://doi.org/10.1063/1.1288517.
Cantoni, E., and T. Hastie, 2002: Degrees-of-freedom tests for smoothing splines. Biometrika, 89, 251–263, https://doi.org/10.1093/biomet/89.2.251.
Craven, P., and G. Wahba, 1978: Smoothing noisy data with spline functions. Numer. Math., 31, 377–403, https://doi.org/10.1007/BF01404567.
De Boor, C., 1978: A Practical Guide to Splines. Vol. 27. Springer-Verlag, 348 pp.
Elipot, S., R. Lumpkin, R. Perez, J. J. Early, and A. M. Sykulski, 2016: A global surface drifter data set at hourly resolution. J. Geophys. Res. Oceans, 121, 2937–2966, https://doi.org/10.1002/2016JC011716.
Green, P. J., and B. W. Silverman, 1993: Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, 184 pp.
Handcock, M. S., K. Meier, and D. Nychka, 1994: Kriging and splines: An empirical comparison of their predictive performance in some applications: Comment. J. Amer. Stat. Assoc., 89, 401–403, https://doi.org/10.2307/2290838.
Hansen, D. V., and P.-M. Poulain, 1996: Quality control and interpolations of WOCE-TOGA drifter data. J. Atmos. Oceanic Technol., 13, 900–909, https://doi.org/10.1175/1520-0426(1996)013<0900:QCAIOW>2.0.CO;2.
Laslett, G. M., 1994: Kriging and splines: An empirical comparison of their predictive performance in some applications. J. Amer. Stat. Assoc., 89, 391–400, https://doi.org/10.1080/01621459.1994.10476759.
Lee, T. C. M., 2003: Smoothing parameter selection for smoothing splines: A simulation study. Comput. Stat. Data Anal., 42, 139–148, https://doi.org/10.1016/S0167-9473(02)00159-7.
Lilly, J. M., 2019: A data analysis package for MATLAB, version 1.6.6. Jekyll, http://www.jmlilly.net/jmlsoft.html.
Lilly, J. M., A. M. Sykulski, J. J. Early, and S. C. Olhede, 2017: Fractional Brownian motion, the Matérn process, and stochastic modeling of turbulent dispersion. Nonlinear Processes Geophys., 24, 481–514, https://doi.org/10.5194/npg-24-481-2017.
Nychka, D., 2000: Spatial process estimates as smoothers. Smoothing and Regression: Approaches, Computation and Application, M. G. Schimek, Ed., John Wiley and Sons, 393–424.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1992: Numerical Recipes in C: The Art of Scientific Computing. 2nd ed. Cambridge University Press, 994 pp.
Reinsch, C. H., 1967: Smoothing by spline functions. Numer. Math., 10, 177–183, https://doi.org/10.1007/BF02162161.
Shcherbina, A. Y., and Coauthors, 2015: The LatMix summer campaign: Submesoscale stirring in the upper ocean. Bull. Amer. Meteor. Soc., 96, 1257–1279, https://doi.org/10.1175/BAMS-D-14-00015.1.
Sykulski, A. M., S. C. Olhede, J. M. Lilly, and E. Danioux, 2016: Lagrangian time series models for ocean surface drifter trajectories. J. Roy. Stat. Soc., 65, 29–50, https://doi.org/10.1111/rssc.12112.
Teanby, N. A., 2007: Constrained smoothing of noisy data using splines in tension. Math. Geol., 39, 419–434, https://doi.org/10.1007/s11004-007-9104-x.
Wahba, G., 1978: Improper priors, spline smoothing and the problem of guarding against model errors in regression. J. Roy. Stat. Soc., 40B, 364–372, https://doi.org/10.1111/J.2517-6161.1978.TB01050.X.
Whittaker, E. T., 1923: On a new method of graduation. Proc. Edinburgh Math. Soc., 41, 63–75, https://doi.org/10.1017/S0013091500077853.
Wide Area Augmentation System T and E Team, 2016: Global positioning system (GPS) standard positioning service (SPS) performance analysis report. William J. Hughes Technical Center Tech. Rep. 92, 147 pp., https://www.nstb.tc.faa.gov/reports/PAN92_0116.pdf.
Yaremchuk, M., and E. F. Coelho, 2015: Filtering drifter trajectories sampled at submesoscale resolution. IEEE J. Oceanic Eng., 40, 497–505, https://doi.org/10.1109/JOE.2014.2353472.