## 1. Introduction

Tides are a major driver of oceanic variability. They are forced by the gravitational effects of the moon and sun and have a ubiquitous presence throughout the global ocean. The tide-generating potential was described harmonically by Sir George Howard Darwin in 1883 (Doodson and Lamb 1921) and was further developed by Doodson, with the Darwin symbols for tidal constituents (e.g., O_{1} and M_{2}) still in widespread use. The ubiquity of tides in oceanic data has motivated the development of techniques for determining tidal parameters. Two types of methods are frequently used to analyze tides: discrete Fourier transform–based methods and least squares–based harmonic analysis. Discrete Fourier transform–based methods use the energy contained in discrete frequency bands to diagnose the amplitude of tidal constituents. Least squares harmonic analysis has been used for decades (Munk and Hasselmann 1964; Zetler et al. 1965) to estimate the amplitude and phase of tidal signals at known tidal frequencies.

Tidal signals can be separated into two components: the relatively predictable barotropic tide and the more variable baroclinic tide (Ray and Mitchum 1996). The predictability of the barotropic tide is a consequence of its stable phase and amplitude due to its large-scale, rapid propagation, and the regularity of the astronomical forcing. Classical harmonic analysis at tidal constituent frequencies is effective for analyzing time series of sea surface height or bottom pressure that are dominated by the barotropic tide and are characterized by sharp, narrow peaks in the frequency domain at tidal constituent frequencies. The interannual lunar nodal cycle (18.61 years) and cycle of lunar perigee (8.85 years), which are not directly resolvable in typically short tidal records, cause tidal modulations that affect the interpretation of tidal records (Haigh et al. 2011). The t_tide package, for example, accounts for these cycles using nodal corrections (Pawlowicz et al. 2002). However, other processes can modulate the tidal peaks.

In the case of the baroclinic tide, which is more variable in amplitude and phase, propagation through varying stratification or nonlinear interaction with other waves leads to amplitude and/or phase modulation via the transfer of energy to the internal wave continuum spectrum and a loss of coherence with the astronomical forcing (Chiswell 2002; Rainville and Pinkel 2006). Interaction with eddy fields, background currents, and the seasonal cycle in stratification all cause internal tides to vary in time (Ray and Zaron 2011). This variability spreads the tidal energy across a band of frequencies centered at the tidal forcing frequency, forming a tidal cusp (Munk et al. 1965); the spreading of energy in the frequency domain can pose challenges to describing the predictable tidal component. The component of tidal energy resulting from interaction with other processes in the ocean has been referred to by different names in the literature, including the incoherent tide (e.g., Eich et al. 2004), the nonstationary tide (e.g., Ray and Zaron 2011), and more recently the non-phase-locked tide (Zaron 2019). We have chosen the term “non-phase-locked” because it is associated purely with tides and their generating potential, whereas the other terms overlap with wave and statistics terminology. Nevertheless, these other terms are reasonable, as this component of the tides is incoherent with astronomical forcing and is nonstationary in time.

Throughout this paper, *model* refers to the series of basis functions (harmonics) at frequencies chosen to approximate (or *to model*) a given time series, while the coefficients of these harmonics are referred to as model parameters. Solving for these parameters to find the best estimate of the underlying tidal component of the observations is the goal of least squares tidal harmonic analysis. The choice of basis functions is central to the technique, and the designation of these functions as a model for observations follows standard least squares terminology (e.g., Wunsch 1996). Conventional least squares tidal harmonic analysis models a time series as a sum of sinusoids at tidal frequencies, with amplitudes and phases optimized to best fit observations. This contrasts with analysis via the discrete Fourier transform in several important ways: Fourier analysis requires evenly sampled time series and decomposes a signal into components at evenly spaced frequencies determined by the record length and sample rate. The Fourier transform is periodic at the record length (“fundamental”) and is band limited at the Nyquist frequency. Additionally, the Fourier transform does not allow for a separate component of noise. Harmonic techniques for analyzing tidal time series overcome these limitations of Fourier analysis. They allow for estimates that do not exactly match all observations and that provide a unique solution in the presence of noise and potential nonorthogonality between basis functions, for example, when the frequencies of basis sinusoids differ by less than the fundamental frequency. This is solved as an inverse problem in which the fit is expected to differ from the observations by some residual. Data may be irregularly spaced and fit to arbitrary basis functions, including sinusoids of any frequency, unconstrained by periodicity over the record length and not band limited by the sampling rate. 
Additionally, harmonic analysis allows for a noise component with prior statistics, generally mean and autocovariance. We denote the autocovariance and its matrix representation with the standard terms “covariance” and “covariance matrix,” respectively, as we do not discuss any cross-covariance quantities for which such shorthand might be confusing.
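The contrast with Fourier analysis can be illustrated with a minimal sketch of a conventional least squares harmonic fit to irregularly sampled data. The sampling times, amplitudes, noise level, and the helper name `harmonic_design_matrix` are illustrative assumptions and are not taken from any tidal package.

```python
import numpy as np

def harmonic_design_matrix(t, freqs):
    """Columns are cos(2*pi*f*t) for every f, followed by sin(2*pi*f*t) for every f."""
    t = np.asarray(t, dtype=float)
    phase = 2.0 * np.pi * np.outer(t, freqs)          # shape (N, M)
    return np.hstack([np.cos(phase), np.sin(phase)])  # shape (N, 2M)

rng = np.random.default_rng(0)
# Irregular sample times (hours) over 30 days; an FFT could not use these directly.
t = np.sort(rng.uniform(0.0, 720.0, size=400))
f_m2, f_s2 = 1.0 / 12.4206012, 1.0 / 12.0             # cycles per hour
truth = 1.0 * np.cos(2 * np.pi * f_m2 * t) + 0.4 * np.sin(2 * np.pi * f_s2 * t)
y = truth + 0.05 * rng.standard_normal(t.size)        # synthetic white noise

H = harmonic_design_matrix(t, [f_m2, f_s2])
x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)         # ordinary least squares
a_m2, a_s2, b_m2, b_s2 = x_hat                        # cosine terms, then sine terms
```

The two fitted frequencies are not Fourier frequencies of the 720-h record, and the fit leaves a nonzero residual, which is the flexibility described above.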

At frequencies outside the tidal bands, ocean data tend to be spectrally red, with greater power at lower frequencies (Munk et al. 1965). When finite duration records with steep spectra are analyzed, high-amplitude, low-frequency processes with periods that do not match the series length can alter estimated spectral power at higher frequencies. Spectral leakage is problematic for characterizing internal tides, especially in regions where mesoscale variability is much stronger than the internal tide (Ray and Zaron 2016). Low-frequency variability, tidal cusps, and high-frequency noise must all be accounted for either explicitly or as a residual term. In other words, total signal variability is modeled as sinusoids at given frequencies added to a residual broadband background that has power at all frequencies, which is characterized by a residual (or noise) covariance matrix. Our approach is to choose basis functions and prior statistical assumptions about the signal and noise components (quantified in covariance matrices) that match the expected variability in the observations as well as computational cost allows; such constraints bias the model parameters by reducing their variance, which is appropriate when tidal constituents and related components are estimated from limited sampled data with noise. We seek to avoid overfitting and we argue that this produces better results than estimators obtained from methods that use less prior knowledge.

Least squares tidal harmonic analysis has drawbacks. Pawlowicz et al. (2002) identify some challenges including record length requirements for distinguishing some tidal frequencies, the lack of distinction between true tidal lines and background energy at tidal frequencies, and the broadening of spectral lines from estuarine tidal responses and stratification-dependent internal tides. Nevertheless, harmonic analysis has been widely adopted for tides because it is well suited for signals with a weak noise component relative to the tidal signal. Pawlowicz et al. (2002), expanding upon earlier code for tidal harmonic analysis (Foreman 1977; Foreman and Henry 1989; Foreman et al. 2009) and employing MATLAB, created the widely used t_tide package, incorporating methods to mitigate known drawbacks to classical harmonic analysis. These methods include nodal corrections and inference of unresolvable constituents to account for the long record lengths required for resolution under classical harmonic analysis, as well as three algorithms to provide confidence intervals to account for nontidal energy at tidal frequencies. Other authors have expanded the t_tide procedure (Leffler and Jay 2009; Codiga 2011) or have modified it for specific dynamical regimes, such as tides in the presence of river outflow (Matte et al. 2013).

The twofold problem of accurately estimating tidal variability with a component that is not phase locked to astronomical forcing in the presence of spectrally colored noise while minimizing spectral leakage has motivated us to develop a new tidal harmonic analysis package, red_tide, that accounts for tidal cusps and red background spectra. Appendix A describes how to access the package, which may be modified to accommodate individual needs. To emphasize features specific to our method, we do not presently incorporate nodal corrections like those used in t_tide, though these corrections may be incorporated in a future version. The primary scientific motivation for developing the package is to support detailed analysis of highly variable baroclinic tides. For example, such tides are expected to be an important part of the signal at spatial scales of

The rest of this study is organized as follows. We begin by outlining the linear algebraic and statistical methods that underpin our tidal analysis in section 2. We then apply these methods to synthetic time series, first by highlighting specific features of the method, including application to a step function to demonstrate aliased signals (section 3a), and then by analyzing tide-like synthetic series to show the effect of prior statistical assumptions on model parameters (sections 3b–d). Two examples of observational data follow in section 4, with comparisons to t_tide. A summary and discussion follow in section 5.

## 2. Methods

The basic framework of weighted least squares estimation used in red_tide is outlined in a number of references (e.g., Wunsch 1996; Menke 2018). Here we provide a review of this framework, formulated for tidal harmonic analysis of records with arbitrarily structured noise, with notation following that of Ide et al. (1997) with a few modifications. Note that tidally driven components of a time series are considered “signal” while nontidal processes are referred to as “noise” for the purpose of distinguishing them in the context of harmonic analysis. These so-called noise terms may include instrument error as well as nontidal processes in the ocean, such as submesoscale eddies and the internal wave continuum.

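A time series **y** of length *N* is modeled as the sum of sinusoids of tidal and nontidal frequencies plus a residual:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{r}, \tag{1}$$

where **H** is an *N* × 2*M* regressor matrix (i.e., the model basis functions), **x** are the 2*M* model parameters, and **r** represents the *N*-element residual time series. The columns of **H** are sines and cosines of prescribed frequencies *ω*_{m}, for *m* = 1, 2, …, *M*, such that Eq. (1) can be expressed as

$$y(t_n) = \sum_{m=1}^{M} \left[ a_m \cos(\omega_m t_n) + b_m \sin(\omega_m t_n) \right] + r(t_n), \qquad n = 1, 2, \ldots, N, \tag{2}$$

where the coefficients *a*_{m} and *b*_{m} are the elements of **x**. The model parameters **x** are estimated by $\hat{\mathbf{x}}$, the maximum a posteriori (MAP) estimate. We assume that **x** and **r** have independent Gaussian distributions which satisfy the conditions under which the MAP estimate gives the same result as the posterior mean. By Bayes' theorem, the posterior probability of **x** given observations **y** is proportional to the product of the prior probability distribution of **x** and the likelihood of **y** given **x**:

$$P(\mathbf{x}|\mathbf{y}) = \frac{P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x})}{P(\mathbf{y})}, \tag{3}$$

where *P*(**y**) is not a function of **x** and therefore can be omitted, since it is not relevant to the optimization. For zero-mean Gaussian **x** and **r**,

$$P(\mathbf{x}|\mathbf{y}) \propto \exp\left[-\tfrac{1}{2}(\mathbf{y}-\mathbf{H}\mathbf{x})^{T}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{H}\mathbf{x})\right]\exp\left(-\tfrac{1}{2}\mathbf{x}^{T}\mathbf{P}^{-1}\mathbf{x}\right). \tag{4}$$

The matrices **R** = 〈**rr**^{T}〉 (size *N* × *N*) and **P** = 〈**xx**^{T}〉 (size 2*M* × 2*M*) are the covariance matrices of **r** and **x**, respectively (**P** is thus a hyperparameter of the Gaussian prior for **x**). This expression as well as its logarithm,

$$\ln P(\mathbf{x}|\mathbf{y}) = -\tfrac{1}{2}(\mathbf{y}-\mathbf{H}\mathbf{x})^{T}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{H}\mathbf{x}) - \tfrac{1}{2}\mathbf{x}^{T}\mathbf{P}^{-1}\mathbf{x} + \mathrm{const}, \tag{5}$$

is maximized by the MAP estimate. Because *P*(**x**|**y**) follows a Gaussian distribution, its mean (the Bayes estimator) equals its mode (the MAP estimator) and can therefore be solved as a maximization problem (Van Trees 2001). At the mode, the partial derivative of Eq. (5) with respect to **x** vanishes:

$$\mathbf{H}^{T}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{H}\hat{\mathbf{x}}) - \mathbf{P}^{-1}\hat{\mathbf{x}} = 0. \tag{6}$$

Isolating the term containing **y**, and solving for $\hat{\mathbf{x}}$, gives

$$\hat{\mathbf{x}} = \left(\mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H} + \mathbf{P}^{-1}\right)^{-1}\mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{y}, \tag{7}$$

with posterior covariance matrix

$$\left(\mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H} + \mathbf{P}^{-1}\right)^{-1}. \tag{8}$$

The model parameters **x** are assumed to have a Gaussian probability distribution function (PDF) resulting from the Gaussian distributions of the prior and likelihood function. Therefore, the posterior PDF of *x*_{m}, the *m*th element of **x**, is a Gaussian with a mean given by the *m*th element of $\hat{\mathbf{x}}$ and a variance given by the *m*th element of the diagonal of the matrix in Eq. (8).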
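The estimator referred to as Eq. (7) and the matrix referred to as Eq. (8) can be sketched in a few lines of NumPy, assuming the standard Gaussian MAP form, in which the estimate is (**H**^{T}**R**^{−1}**H** + **P**^{−1})^{−1}**H**^{T}**R**^{−1}**y** and the posterior covariance is (**H**^{T}**R**^{−1}**H** + **P**^{−1})^{−1}. The function name `map_estimate` and the synthetic inputs are hypothetical and are not the red_tide interface.

```python
import numpy as np

def map_estimate(H, y, R, P):
    """Return the Gaussian MAP estimate of x and its posterior covariance."""
    Rinv_H = np.linalg.solve(R, H)            # R^-1 H without forming R^-1
    A = H.T @ Rinv_H + np.linalg.inv(P)       # H^T R^-1 H + P^-1
    cov = np.linalg.inv(A)                    # posterior covariance
    x_hat = cov @ (Rinv_H.T @ y)              # relies on R being symmetric
    return x_hat, cov

rng = np.random.default_rng(1)
t = np.arange(200.0)                          # hourly samples
omega = 2 * np.pi / 12.4206012                # M2 angular frequency (rad/h)
H = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
x_true = np.array([0.8, -0.3])                # illustrative a and b
y = H @ x_true + 0.1 * rng.standard_normal(t.size)

R = 0.1**2 * np.eye(t.size)                   # white residual covariance
P = np.eye(2)                                 # unit prior variance for a and b
x_hat, cov = map_estimate(H, y, R, P)
sigma = np.sqrt(np.diag(cov))                 # posterior standard deviations
```

Solving with **R** rather than inverting it explicitly is a numerical convenience in this sketch; for large records a banded or sparse solver would be the natural substitute.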

If **x** was not expected to have a Gaussian distribution, a different expression for […] **x** with the posterior Gaussian PDF. This approach is implemented due to its flexibility and simplicity compared with analytical solutions, which are not always in closed form, or with more complicated approximations such as piecewise linear discretization of PDFs (e.g., Lourens and van Geer 2016). The tidal amplitude is one such quantity, where *A*_{m} = (*a*_{m}^{2} + *b*_{m}^{2})^{1/2}, which follows a noncentral *χ* distribution when the standard deviations of *a*_{m} and *b*_{m} are equal. Another is the phase *ϕ*_{m}, defined by tan(*ϕ*_{m}) = −*b*_{m}/*a*_{m}, whose statistics are also not Gaussian. Whenever uncertainty bounds are given for quantities derived from the Bayesian methods outlined above or for quantities that are functions of them, we refer to them as *credible intervals* in accordance with Bayesian terminology for the interval in which an estimated parameter lies with the stated probability (Lee 1997). These are analogous to *confidence intervals*, the uncertainty bounds on quantities derived from frequentist methods. Because Pawlowicz et al. (2002) use the term *confidence intervals*, and because t_tide is not derived using a Bayesian framework, we refer to t_tide output and other non-Bayesian quantities as having confidence intervals when comparing them to red_tide output and its credible intervals.
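One simple way to obtain credible intervals for quantities that are nonlinear in the model parameters is to draw samples from the Gaussian posterior and transform each draw. The sketch below illustrates that idea for the amplitude and phase of a single constituent. It is an assumption made for illustration and is not taken from the red_tide source; the function name and inputs are hypothetical.

```python
import numpy as np

def amp_phase_intervals(x_hat, cov, n_draws=20000, level=0.95, seed=0):
    """Credible intervals for A = sqrt(a^2 + b^2) and phi with tan(phi) = -b/a."""
    rng = np.random.default_rng(seed)
    a, b = rng.multivariate_normal(x_hat, cov, size=n_draws).T
    amp = np.hypot(a, b)
    phi = np.arctan2(-b, a)
    # Measure phase relative to the point estimate so that wrapping at
    # +/- pi does not split the distribution.
    phi0 = np.arctan2(-x_hat[1], x_hat[0])
    dphi = np.angle(np.exp(1j * (phi - phi0)))
    lo, hi = 50.0 * (1.0 - level), 50.0 * (1.0 + level)   # percentile bounds
    return np.percentile(amp, [lo, hi]), phi0 + np.percentile(dphi, [lo, hi])

# Illustrative posterior: a = 0.8, b = -0.3, standard deviation 0.01 for each.
amp_ci, phi_ci = amp_phase_intervals(np.array([0.8, -0.3]), 0.01**2 * np.eye(2))
```

Percentiles of the transformed draws need no closed-form distribution, which is why a sampling approach remains usable when the standard deviations of *a*_{m} and *b*_{m} differ.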

Because **y** is modeled as the sum of sinusoids and a noise component, its expected power spectrum *S*_{yy}(*f*) may be interpolated to fitted frequencies and used to construct **P**. This approach, however, results in doubly counting energy, as **r** contributes to the variance of **y** at all frequencies, including frequencies modeled by **Hx**. Because the energy of a tidal peak and cusp are typically much higher than the background noise and may in fact be underestimated in *S*_{yy}(*f*) due to peak broadening from spectral averaging, the double counting of energy will be small around most prominent tidal peaks. For cases where tidal signals are comparable in energy to **r** at tidal frequencies, **P** can be reduced by the appropriate amount. In sections 3c and 3d, the correct partition of energy into **P** versus **R** is possible because we have perfect knowledge of the underlying synthetic processes. For real data for which we lack perfect knowledge, such as those in section 4, the approach we use for convenience is to assume wide-sense stationary noise so that we can obtain the residual spectrum *S*_{rr}(*f*) from the Fourier transform of any column of **R**. We then subtract it from *S*_{yy}(*f*) to construct **P** in order to obtain a more accurate partition of signal versus noise energy. We also assume throughout that **y** is a wide-sense stationary time series; this means that the elements of **x** are assumed to be uncorrelated and therefore that **P** is diagonal throughout [see Bendat and Piersol (2010) for nonstationary data analysis and double frequency spectra, the continuous analog to a nondiagonal **P**].
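A rough sketch of building a diagonal **P** from the difference between the data spectrum and the residual spectrum is given below. Several choices here are assumptions made for illustration rather than the package's conventions: the spectra are one-sided with units of variance per unit frequency, each fitted frequency is assigned a bandwidth `df`, the cosine and sine coefficients share the same prior variance, and a small positive floor keeps **P** invertible. The spectra themselves are hypothetical.

```python
import numpy as np

def prior_covariance(f_fit, f_spec, S_yy, S_rr, df, floor=1e-12):
    """Diagonal P from the signal spectrum S_yy - S_rr at the fitted frequencies."""
    S_sig = np.interp(f_fit, f_spec, np.asarray(S_yy) - np.asarray(S_rr))
    var = np.maximum(S_sig * df, floor)           # prior variance of each a_m and b_m
    return np.diag(np.concatenate([var, var]))    # order: all a_m, then all b_m

f_spec = np.linspace(0.01, 0.5, 50)               # cycles per hour
S_rr = 1e-3 * f_spec**-2.0                        # hypothetical red background
cusp = 5.0 * np.exp(-0.5 * ((f_spec - 0.08) / 0.01) ** 2)
S_yy = S_rr + cusp                                # background plus a tidal cusp
P = prior_covariance(np.array([0.07, 0.08, 0.09]), f_spec, S_yy, S_rr, df=0.01)
```

Subtracting `S_rr` before interpolating is what avoids counting the background energy twice at the fitted frequencies.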

Values along the main diagonal of **R** represent the expected variance of the misfits between fitted time series and observations. Off-diagonal elements indicate the covariance at lagged times, with values farther off the main diagonal corresponding to larger time lags. Beyond some time lag, the covariance may be approximated as zero if long-period energy is sufficiently small or explicitly represented in **H**. This approximation is useful, as it limits the memory requirement for large **R**. A diagonal **R** is a special case of this, with nonzero elements only along its main diagonal. This corresponds to an assumption of zero lagged noise correlation or equivalently an assumption of spectrally white noise. This approximation is often made for computational efficiency, as **R** = *σ*^{2}**I** can be replaced in Eq. (7) with the constant *σ*^{2}, which is estimated as *σ*^{2} = 〈**r**^{T}**r**〉/*N*. In cases where the residual **r** may be better approximated as nonwhite noise, other procedures can be used to construct a nondiagonal **R**, including those described in section 2b and appendix B. In the examples that follow, we assume that all residuals have the same variance. This assumption of stationarity corresponds to constant elements along the diagonals of **R**, i.e., a Toeplitz matrix.
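The Toeplitz structure described above can be sketched as follows: a stationary autocovariance sequence fills the diagonals of **R**, lags beyond a chosen cutoff are set to zero, and the white noise case reduces to *σ*^{2}**I**. The function name and the exponential autocovariance are hypothetical choices for illustration.

```python
import numpy as np

def toeplitz_covariance(autocov, n, max_lag=None):
    """Symmetric Toeplitz R with R[i, j] = autocov[|i - j|], zero beyond max_lag.

    autocov must hold at least max_lag + 1 values (or n values if max_lag is None).
    """
    c = np.zeros(n)
    m = n if max_lag is None else min(n, max_lag + 1)
    c[:m] = np.asarray(autocov, dtype=float)[:m]
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return c[lags]                                 # constant along every diagonal

n = 6
autocov = np.exp(-np.arange(n) / 2.0)              # hypothetical decaying covariance
R_banded = toeplitz_covariance(autocov, n, max_lag=2)
R_white = toeplitz_covariance([0.25], n, max_lag=0)   # sigma^2 * I with sigma = 0.5
```

A dense array is used here only for clarity; the memory saving mentioned in the text comes from storing the nonzero diagonals alone. Note that truncating a covariance sequence does not by itself guarantee a positive definite matrix.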

Time series with a nonstationary residual **r**, for example due to time-varying instrumental noise, may have their residual covariance approximated by a non-Toeplitz matrix **R** in order to reduce the impact of the affected segments on the calculation of […] **R** is also not Toeplitz in this case, though for the direct inversion of relatively small **R**, this does not seem to impact performance. Computation may be reduced when analyzing multiple time series if each time series **y**_{n} can be analyzed with the same **H**, **R**, and **P** such that all terms in Eq. (7) before **y** are evaluated once and multiplied by each **y**_{n}.

### a. Relation to least squares

Ordinary least squares (OLS) is the special case of Eq. (7) in which **R** = *σ*^{2}**I** and **P**^{−1} → 0 (Wunsch 1996), and is used in the t_tide package (Pawlowicz et al. 2002). OLS seeks to minimize (**y** − **Hx**)^{T}(**y** − **Hx**), the misfit between the fitted time series and observations. In this case, the solution is

$$\hat{\mathbf{x}} = \left(\mathbf{H}^{T}\mathbf{H}\right)^{-1}\mathbf{H}^{T}\mathbf{y}. \tag{9}$$

The magnitude of the model parameters may also be penalized by minimizing (**y** − **Hx**)^{T}(**y** − **Hx**) + **x**^{T}**W**_{x}**x**, where **W**_{x} is a matrix that weights the relative importance of minimizing model parameter magnitude over misfit. This technique is called ridge regression and reduces overfitting at the expense of bias (Wunsch 1996). The resulting least squares equation is said to be *regularized*:

$$\hat{\mathbf{x}} = \left(\mathbf{H}^{T}\mathbf{H} + \mathbf{W}_{x}\right)^{-1}\mathbf{H}^{T}\mathbf{y}. \tag{10}$$

Additionally, the residual **r** may be weighted by the matrix **W**_{r} such that (**y** − **Hx**)^{T}**W**_{r}(**y** − **Hx**) + **x**^{T}**W**_{x}**x** is minimized, with the corresponding *weighted* regularized least squares equation

$$\hat{\mathbf{x}} = \left(\mathbf{H}^{T}\mathbf{W}_{r}\mathbf{H} + \mathbf{W}_{x}\right)^{-1}\mathbf{H}^{T}\mathbf{W}_{r}\mathbf{y}. \tag{11}$$

Equation (7) has the same form as Eq. (11) with **W**_{x} = **P**^{−1} and **W**_{r} = **R**^{−1}, though the derivation of that solution is distinct from these least squares approaches [see section 3.6.2 of Wunsch (1996) for details]. A weighted least squares estimate (not regularized) is used in the UTide package (Codiga 2011), which implements an iteratively reweighted robust fit corresponding to Eq. (11) with **W**_{x} = 0 and **W**_{r} as a diagonal weighting matrix that deemphasizes outliers and is updated iteratively, the details of which are beyond the scope of this study.
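The relationships among these estimators can be checked numerically. The sketch below assumes the standard closed form of the weighted regularized solution, (**H**^{T}**W**_{r}**H** + **W**_{x})^{−1}**H**^{T}**W**_{r}**y**, and uses a hypothetical helper name and synthetic data. With identity residual weights and no penalty it returns the OLS solution, and with **W**_{r} = **R**^{−1} and a very broad prior it approaches OLS.

```python
import numpy as np

def weighted_regularized_lsq(H, y, W_r=None, W_x=None):
    """Minimize (y - Hx)^T W_r (y - Hx) + x^T W_x x; the defaults give OLS."""
    n, k = H.shape
    W_r = np.eye(n) if W_r is None else W_r
    W_x = np.zeros((k, k)) if W_x is None else W_x
    return np.linalg.solve(H.T @ W_r @ H + W_x, H.T @ W_r @ y)

rng = np.random.default_rng(3)
t = np.arange(100.0)
omega = 2 * np.pi / 12.0
H = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
y = H @ np.array([1.0, 0.5]) + 0.3 * rng.standard_normal(t.size)

x_ols = weighted_regularized_lsq(H, y)                            # no weights
x_ridge = weighted_regularized_lsq(H, y, W_x=50.0 * np.eye(2))    # ridge penalty
# MAP form: W_r = R^-1 and W_x = P^-1, here with a nearly uninformative prior.
x_map = weighted_regularized_lsq(H, y, W_r=np.eye(t.size) / 0.3**2,
                                 W_x=np.eye(2) / 1e8)
```

The ridge solution has a smaller norm than the OLS solution, which is the bias accepted in exchange for reduced overfitting.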

### b. Power-law noise

Away from energetic processes at tidal, seasonal, and inertial frequencies, spectra *S* of ocean time series tend to follow a power law of the form *S* ∝ *f*^{γ}, where *γ* ≤ 0 is the spectral slope and the negative value indicates more energy at lower frequencies (Agnew 1992). For computational efficiency in problems with many observations, we can assume wide-sense stationary noise and construct the residual covariance matrix **R** as a sparse, symmetric Toeplitz matrix with the diagonals calculated from the Fourier transform of *S* ∝ *f*^{γ} per the Wiener–Khinchin theorem, truncated at a user-defined time lag. Users of red_tide have the option of multiplying the covariance by a window function in order to reduce the ringing in the frequency domain that results from an abrupt drop to zero in the truncated covariance. A low-amplitude white spectrum (spectral slope of 0) may also be added to account for observational error, such that the spectrum of **r** has a noise floor at all frequencies. This residual spectrum is not altered within fitted tidal frequency bands due to the impossibility of distinguishing tidal and nontidal energy at the same frequency, which results in model parameter uncertainty. For the typical case of tides that are much more energetic than the background, this effect is small, while in cases where tidal constituents have low energy or a broadband cusp of interest, the relatively larger uncertainty estimates on model parameters reflect the fact that nontidal variance is comparable to tidal variance. The approximation of the residual time series following some modified spectral power law with *γ* < 0 will hereafter be referred to as a *red noise* assumption, even when *γ* is not exactly −2. In addition to the special cases considered above, the noise can exhibit more complicated structure that allows **R** to be tractable while still representing some forms of red noise, for example noise as an autoregressive process (see appendix B).
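The construction described in this subsection can be sketched as follows: a power-law spectrum is transformed to an autocovariance via the Wiener–Khinchin theorem, truncated at a maximum lag, optionally tapered, and given a white floor. The function name, the zero-padding factor, the Hann taper, and the normalization are assumptions made for illustration and are not the red_tide implementation.

```python
import numpy as np

def power_law_covariance(n, dt, gamma=-2.0, variance=1.0, white_var=0.0,
                         max_lag=None, taper=True):
    """Autocovariance at lags 0..n-1 for noise with spectrum S ~ f**gamma."""
    n_fft = 4 * n                                # longer transform limits wrap-around
    f = np.fft.rfftfreq(n_fft, d=dt)
    S = np.zeros_like(f)
    S[1:] = f[1:] ** gamma                       # omit f = 0 (the mean)
    c = np.fft.irfft(S, n=n_fft)[:n]             # Wiener-Khinchin: spectrum -> covariance
    c *= variance / c[0]                         # scale lag 0 to the requested variance
    if max_lag is not None:
        m = max_lag + 1
        # Decaying half of a Hann window: weight 1 at lag 0, small at max_lag.
        w = np.hanning(2 * m + 1)[m:-1] if taper else np.ones(m)
        c[:m] *= w
        c[m:] = 0.0                              # truncate beyond max_lag
    c[0] += white_var                            # white noise adds only at lag 0
    return c

c = power_law_covariance(n=200, dt=1.0, gamma=-2.0, variance=1.0,
                         white_var=0.01, max_lag=50)
lags = np.abs(np.subtract.outer(np.arange(200), np.arange(200)))
R = c[lags]                                      # symmetric Toeplitz covariance matrix
```

Only the first 51 entries of `c` are nonzero, so a sparse or banded representation of **R** would store 51 diagonals rather than the full 200 × 200 array.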

## 3. Application of red_tide to illustrative cases

To compare the performance of red_tide to other widely used fitting procedures, we lay out several illustrative examples of its application to synthetic and physical data. Before more complicated cases are examined, it is worth reviewing a simplified case of tidal harmonic analysis that typifies the separate treatment of signal and noise and demonstrates the consequences of using an incomplete model. A typical ocean time series will have many tidal constituents present, with longer records better able to differentiate nearby frequencies. Because harmonic methods model the data using sinusoids of prescribed tidal frequencies, energy at unmodeled frequencies will remain in the residual time series. Figure 1 depicts the results of modeling a bottom pressure record (examined in greater detail in section 4a) at only two tidal frequencies: the principal lunar semidiurnal (M_{2}) and the principal solar semidiurnal (S_{2}). Figure 1a, with the interval in gray expanded in Fig. 1b, illustrates the misfit between the complicated data and the simple model that arises due to substantial residual energy at unmodeled frequencies. In the frequency domain (Fig. 1c), this is evident by the energetic tidal lines at frequencies that are not included in **H**. For a dataset with well-defined tidal lines with energy much greater than that of the background, conventional harmonic analysis like that employed by t_tide is well suited to addressing this issue by simply including more tidal constituents. The red_tide package is designed for more complicated cases for which single tidal constituents are insufficient as well as cases where nontidal variance is comparable to tidal variance.

### a. Model coefficient covariance, the Gibbs phenomenon, and periodicity

One challenge to harmonic analysis stems from the accurate representation of model coefficient covariance matrix **P** = 〈**xx**^{T}〉. Before further examining realistic cases, we start with a familiar example that demonstrates the effect of the covariance on computed model coefficients for a discrete step function. Modeling the step as a finite set of sinusoids leads to the Gibbs phenomenon, the tendency of the partial sum of a Fourier series to overshoot in the neighborhood of a discontinuity of the modeled function (Hewitt and Hewitt 1979). This phenomenon persists even with the addition of more terms in the partial sum, though the magnitude is reduced. While a discretely sampled time series may be fit well at observation points, band-limited Fourier coefficients do not adequately fit the step function between sampling times. The step function and resulting Gibbs phenomenon serve as a simple but extreme example of a situation present in real data: a process with variance at unmodeled frequencies has a prior that does not adequately describe the process. The variance of a step function is distributed across all frequencies [see Eq. (12) below], but when it is sampled coarsely and reconstructed as a band-limited process, some variance is aliased and the true underlying process is poorly reconstructed despite good agreement at observation times. Therefore, we incorporate assumptions and prior knowledge of a process, including cases in which there are fewer data than parameters that are suspected to be worth estimating.

Another challenge stemming from the limitations of finite sampling and fitting is the inherent periodicity of solutions when a finite record is modeled as the sum of periodic functions; this is not a problem for tidal processes, which have periods much shorter than those of typical observations, but it will affect estimates for explicitly modeled low-frequency processes that may also be of interest. To demonstrate the effect of the estimator on both of these related issues (Gibbs phenomenon and artificial periodicity), we analyze a finite, uniformly sampled record of a step function using the method described in section 2.

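The step function and its power spectrum are

$$y(t) = \operatorname{sgn}(t - t_0), \qquad S_{yy}(f) \propto f^{-2}, \tag{12}$$

where *y*(*t*) is the underlying continuous time series to be analyzed after discrete sampling and sgn(*t* − *t*_{0}) is the sign function with jump discontinuity at *t*_{0}. The power spectrum is proportional to *f*^{−2}; therefore it has a spectral slope of −2.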

In the example, a step function of record length *T* is sampled symmetrically about the jump discontinuity at 1000 evenly spaced times such that all Fourier frequencies, from the fundamental frequency Δ*f* = 1/*T* to the Nyquist frequency *f*_{Ny}, could be computed. Different choices of basis functions and model parameter covariance are used to evaluate the sensitivity of red_tide output to these inputs. If the model includes only frequencies greater than or equal to Δ*f*, spaced at increments of Δ*f*, then the model will have a fundamental periodicity of *T*, the record length, because this is the longest period represented in the model. To reduce this effect, we incorporate frequencies less than the fundamental frequency into **H**, starting at Δ*f*/2 and increasing by intervals of Δ*f*/2. This extends the model periodicity to 2*T*, twice the record length (Fig. 2a), and also reduces Gibbs-like behavior at the beginning and end of the fitted time series.
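The effect of the half-fundamental frequencies on model periodicity can be verified directly. The sketch below, with hypothetical variable names and only a few harmonics per basis for brevity, evaluates each basis at the sample times and again after shifts of *T* and 2*T*.

```python
import numpy as np

def design_matrix(t, freqs):
    """Cosine columns for every frequency, followed by sine columns."""
    phase = 2.0 * np.pi * np.outer(t, freqs)
    return np.hstack([np.cos(phase), np.sin(phase)])

T = 1000.0                                    # record length (arbitrary units)
df = 1.0 / T                                  # fundamental frequency
t = np.arange(1000.0)                         # 1000 evenly spaced sample times
freqs_T = df * np.arange(1, 6)                # df, 2 df, ...: repeats after T
freqs_2T = 0.5 * df * np.arange(1, 11)        # df/2, df, ...: repeats after 2T

H_T = design_matrix(t, freqs_T)
H_T_shift = design_matrix(t + T, freqs_T)
H_2T = design_matrix(t, freqs_2T)
H_2T_shift_T = design_matrix(t + T, freqs_2T)
H_2T_shift_2T = design_matrix(t + 2 * T, freqs_2T)

repeats_on_T = np.allclose(H_T, H_T_shift)            # basis spaced by df
half_repeats_on_T = np.allclose(H_2T, H_2T_shift_T)   # odd multiples of df/2 flip sign
half_repeats_on_2T = np.allclose(H_2T, H_2T_shift_2T)
```

Any fitted series is a linear combination of the columns, so it inherits the same periodicity as the basis.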

Two assumptions for the model parameter covariance **P** = 〈**xx**^{T}〉 are tested: 1) an assumption that the covariance is constant at all frequencies (a spectrally white process), and 2) an assumption that its power spectrum is proportional to *f*^{−2} [a spectrally red process representing the true spectrum in Eq. (12)]. Two sets of basis functions are used: one that is composed of sinusoids at the Nyquist frequency *f*_{Ny} and lower, and one that is composed of sinusoids at frequency 2*f*_{Ny} and lower. Both basis sets are spaced in frequency by Δ*f* = 1/*T*. Discretely sampled sinusoids of frequencies greater than *f*_{Ny} are indistinguishable from sinusoids at the frequencies lower than *f*_{Ny} to which they are aliased. The larger but linearly dependent basis set is used in order to demonstrate the effect that **P** in Eq. (7) has in constraining an underdetermined system; in this case, describing the behavior of a time series near a step discontinuity at times between the sampling interval requires that frequencies greater than *f*_{Ny} be represented, constrained by the expected spectral power of the signal. The residuals are assumed to be uncorrelated (white noise), and the expected residual variance can be calculated from the total spectral power at frequencies above the highest-frequency basis. The expected fraction of residual variance is calculated from the integral of the true spectrum over frequencies not explicitly modeled:

$$\frac{\int_{f_{\mathrm{high}}}^{\infty} f^{-2}\,df}{\int_{f_{\mathrm{low}}}^{\infty} f^{-2}\,df} = \frac{f_{\mathrm{low}}}{f_{\mathrm{high}}}, \tag{13}$$

where *f*_{low} and *f*_{high} are the lowest and highest frequencies of sinusoids in **H**. Frequencies less than *f*_{low} are not included in the integral due to the singularity of *f*^{−2} at *f* = 0 and the approximation of very low-frequency variance as a mean and trend. The total residual variance in this case is small, and hence, the impact of the residual covariance matrix is expected to be negligible and is not examined in this example. The effects of different assumptions of residual covariance are examined later with data for which these effects are noticeable. Both fitted time series in Fig. 2a, periodic on *T* and on 2*T*, incorporate a covariance matrix **P** constructed from a spectrum proportional to *f*^{−2}.
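The role of **P** in resolving aliased basis functions can be seen in a small numerical sketch. Here a sinusoid above the Nyquist frequency and its alias produce identical sampled columns (up to sign for the sine terms), so **H** is rank deficient, yet the regularized system remains solvable and divides the amplitude between the two frequencies in proportion to their prior variances. The frequencies, noise level, and *f*^{−2} prior scaling are illustrative assumptions.

```python
import numpy as np

dt = 1.0                                       # sampling interval
f_ny = 0.5 / dt                                # Nyquist frequency
t = np.arange(1000) * dt
f_above = 1.25 * f_ny                          # above Nyquist
f_alias = 1.0 / dt - f_above                   # frequency it aliases to (0.75 f_ny)

def cos_sin(f):
    return np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)

(c_lo, s_lo), (c_hi, s_hi) = cos_sin(f_alias), cos_sin(f_above)
H = np.column_stack([c_lo, s_lo, c_hi, s_hi])
rank = np.linalg.matrix_rank(H, tol=1e-8)      # 2 rather than 4

rng = np.random.default_rng(4)
noise_var = 0.1**2
y = 1.0 * c_lo + np.sqrt(noise_var) * rng.standard_normal(t.size)

# Red prior: variance proportional to f**-2 at each fitted frequency.
prior_var = np.array([f_alias, f_alias, f_above, f_above]) ** -2.0
A = H.T @ H / noise_var + np.diag(1.0 / prior_var)   # invertible despite rank-2 H
x_hat = np.linalg.solve(A, H.T @ y / noise_var)
```

Without the prior term, `H.T @ H` alone is singular and the split between the two cosine coefficients is arbitrary. With it, the ratio `x_hat[0] / x_hat[2]` equals the ratio of the prior variances.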

To examine the fit near the jump discontinuity, we consider two basis choices: distinguishable basis sinusoids up to the Nyquist frequency and indistinguishable sinusoids up to twice the Nyquist frequency, which are aliased for *f* > *f*_{Ny}. We also consider two choices for model parameter covariance: a constant value and a covariance proportional to the known spectral power of the data. Together, these form four regimes that can demonstrate the effects of basis choice and model parameter covariance on the resulting fitted approximations to the same step data. The OLS approach models the step function as a series of sinusoids up to the Nyquist frequency with no assumptions about the model parameter covariance. This results in the Gibbs phenomenon near the jump discontinuity (Fig. 2b). This result is nearly identical to the partial sum of the Fourier series for this function (not shown), with a slight difference due to least squares’ allowance of nonzero residuals. When **P** corresponds to a spectral slope of −2, as does the true spectrum, the red_tide procedure reduces the Gibbs phenomenon at the expense of greater misfit in the immediate vicinity of the jump discontinuity (Fig. 2c).

The Gibbs phenomenon may be reduced by including additional frequencies. Sinusoids at frequencies greater than the Nyquist frequency will be aliased, however, so the effect of the model parameter covariance is more pronounced than in the former examples. Figure 2d shows the result of fitting a finite record to indistinguishable (resolved and aliased) frequencies while naively assuming a spectrally white process. Without the regularization provided by a sufficiently accurate **P**, the duplicate bases in **H** render it rank deficient and therefore the interpolated fitted time series is unrealistically large in amplitude at unobserved times. Modeling aliased frequencies with the assumption of an accurate (spectrally red) covariance, however, results in a fit that has further reduced the Gibbs phenomenon while also reducing the misfit at sampled times immediately before and after the jump discontinuity (Fig. 2e). This approach (regularizing according to prior statistics) yields the smallest root-mean-square error of the four regimes, both at observation times and at the interpolated higher temporal resolution (Table 1). Though the solution in Fig. 2e does not pass through the observations like those in Figs. 2b and 2d, it nevertheless is the best representation of the underlying sampled process due to its reduction of the Gibbs phenomenon, despite fitting to the same data. Note that the accuracy of these interpolations is only quantifiable here due to knowledge of the true underlying function from which the data are perfectly drawn, which is not possible with real observations.

Table 1. Root-mean-square error (RMSE) for the four step function analyses shown in Figs. 2b–e. RMSE is calculated at observation times in the top row and at a set of times sampled at 100 times the observational resolution in the bottom row.

These refinements to the standard OLS approach demonstrate three advantages of the methods used in red_tide. First, the use of a fundamental frequency lower than that suggested by the record length reduces the effect of periodicity imposed by the model on the solution by lengthening the time scale of this periodicity. Second, the choice of model parameter covariance matrix **P** impacts the solution: a choice of **P** that is more representative of the true behavior of the data reduces the magnitude of the Gibbs phenomenon. Third, frequencies inaccessible to a discrete Fourier transform may be included to more realistically account for variance. The ambiguity of aliased frequencies (those exceeding the Nyquist frequency but at which variance is present) is reduced by using an accurate **P**. We do not examine aliased signals elsewhere in this study.

### b. Synthetic time series

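A tidal sinusoid cos(*ω*_{T}*t* + *ϕ*) may be multiplied by an amplitude envelope *A*(*t*) such that

$$y(t) = A(t)\cos(\omega_{T} t + \phi).$$

In the frequency (*ω*) domain, this product becomes a convolution,

$$\hat{y}(\omega) \propto \hat{A}(\omega) \otimes \left[e^{i\phi}\,\delta(\omega - \omega_{T}) + e^{-i\phi}\,\delta(\omega + \omega_{T})\right],$$

where $\hat{A}(\omega)$ is the Fourier transform of *A*(*t*), and ⊗ denotes convolution. The constant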

To generate the synthetic time series used in this study, a spectrally red modulating function is used to simulate the cusps observed around tidal lines in the spectra of tidally dominated ocean time series, and a random phase is assigned at each frequency before an inverse Fourier transform is applied to produce a synthetic time series with known spectral power. Samples from a red background are added to the time series to simulate broadband nontidal ocean variability. Figure 3 shows an example of three synthetic spectra (top panel) and corresponding hourly time series computed with random phase and truncated at 500 h (bottom panel). The top panel also shows the power spectrum of observations in gray for comparison (these are discussed in section 4a). The phase at each frequency is identical for each time series, so only the spectral power differs. All three have peaks of equal magnitude at the M_{2} frequency. The red background spectrum is proportional to 1/*ω*^{2}, while the modulating spectrum is also red but includes a small frequency *ω*_{0} that is introduced to eliminate the singularity at *ω* = 0. In the red-background time series, there is more variability at low frequencies, as in the real ocean, while the modulated one imitates the interaction of the tide with low-frequency processes. The time series with the spectrally white background has comparatively more power at supertidal frequencies, which is evident in the time domain from the short time-scale noise that is less noticeable in the data with a red background. The data displayed are sampled hourly to be consistent with the choice of the Nyquist frequency, while the length of the time series is set by the choice of the fundamental frequency.
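The random-phase construction described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used for Fig. 3: the cusp shape `m2_cusp`, the value of `F_0`, the relative scaling of tide and background, and all function names are hypothetical.

```python
import numpy as np

F_M2 = 1.9323   # M2 frequency in cycles per day
F_0 = 0.01      # hypothetical small frequency that keeps the cusp finite at its peak

def red_background(f):
    """Background power proportional to f^-2 (for f > 0)."""
    return f ** -2.0

def m2_cusp(f):
    """Assumed red cusp centered on M2; the form used in the paper may differ."""
    return 1.0e3 / ((f - F_M2) ** 2 + F_0 ** 2)

def random_phase_series(n, spectrum, rng):
    """Hourly series of length n whose one-sided power follows `spectrum`."""
    f = np.fft.rfftfreq(n, d=1.0 / 24.0)        # cycles per day for hourly samples
    coeffs = np.zeros(f.size, dtype=complex)    # zero-frequency term stays 0 (zero mean)
    phase = rng.uniform(0.0, 2.0 * np.pi, f.size - 1)
    coeffs[1:] = np.sqrt(spectrum(f[1:])) * np.exp(1j * phase)
    return f, np.fft.irfft(coeffs, n=n)

n = 4096
f, tide = random_phase_series(n, m2_cusp, np.random.default_rng(0))
_, background = random_phase_series(n, red_background, np.random.default_rng(1))
series = tide + background

# The strongest spectral line of the tidal part sits at the grid frequency nearest M2.
f_peak = f[np.argmax(np.abs(np.fft.rfft(tide)))]
```

Only the phases are random here, so repeated calls with different generators give ensemble members that share the same spectral power.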

### c. Effect of noise covariance on amplitude, phase, and uncertainty

The estimated amplitude and phase of the tidal constituents are affected by the spectrum used to construct **R**, also referred to here as the noise covariance or residual covariance. At each frequency represented in **H**, the data are represented twice: explicitly in the frequency domain as model parameters (**x**) and implicitly in the time domain through the residual (**r**). Using **R** to approximate the true covariance of the nontidal component of the data improves the accuracy of the estimated coefficients. This is important because geophysical time series generally do not have flat spectra but rather spectra that decrease with increasing frequency. For example, if data with a red noise term (*S*_{noise} ∝ *f* ^{−2}, like the red and blue curves in Fig. 3) were modeled using a spectrally white **R** (equivalent to the black curve), the variance would be treated disproportionately as signal at low frequencies (where the true noise is actually more energetic than what is given by **R**) and disproportionately as noise at high frequencies (where the true noise is actually less energetic than what is given by **R**). This is the case even if the spectrum corresponding to **R** has the same total energy as the true background.

Here we analyze a 1001-point synthetic record sampled hourly. If there is a low-amplitude, high-frequency signal present in the data, a high assumed noise level at that frequency would limit the detection of that constituent, as shown by the large blue intervals in Fig. 4a at *f* > 2 cpd. A colored spectrum that matches the frequency dependence of the true noise component is an improvement over the simpler assumption of an uncorrelated (spectrally flat) white noise that does not match the true noise component. The former gives a relatively constant ratio of assumed spectrum to true spectrum across frequencies, while the latter gives a frequency-dependent ratio. Under the white noise assumption, the variance of **H****x** is therefore overrepresented at low frequencies, because the noise covariance is too low, and underrepresented at high frequencies, because the noise covariance is too high. Because variance is represented twice at modeled frequencies, the covariance matrices **R** and **P** serve as constraints on the partition of energy between **H****x** and **r** (recall that **R** represents the covariance in time of a wide-sense stationary noise component and therefore has power at all frequencies, including those to which data are fitted).
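One way to construct a spectrally red **R** of the kind discussed here is sketched below in Python. It assumes a stationary noise process, so that **R** is a Toeplitz matrix built from the autocovariance, which is the inverse Fourier transform of the assumed spectrum. The spectrum shape, the triangular taper, the padding factor, and the function names are illustrative assumptions. The result is stored as a dense array for simplicity.

```python
import numpy as np

def red_spectrum(f, f0=0.01):
    """Assumed red spectrum, proportional to f^-2 away from f = 0 (f in cycles per sample)."""
    return 1.0 / (f ** 2 + f0 ** 2)

def noise_covariance(n, spectrum, max_lag, pad=4):
    """Build an n x n Toeplitz R from a power spectrum, truncated and tapered at max_lag."""
    m = pad * n                                   # longer grid limits wraparound effects
    f = np.abs(np.fft.fftfreq(m))                 # symmetric two-sided frequency grid
    acov = np.fft.ifft(spectrum(f)).real          # Wiener-Khinchin: spectrum -> autocovariance
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    taper = np.clip(1.0 - lags / max_lag, 0.0, None)   # triangular window, zero beyond max_lag
    return acov[lags] * taper

R = noise_covariance(n=200, spectrum=red_spectrum, max_lag=50)
```

The triangular taper is chosen in this sketch because it sets all elements beyond `max_lag` to zero while keeping the matrix positive definite. The overall scale of `R` is arbitrary and would need to be matched to the variance of the nontidal component.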

Figure 4 shows the effects of these constraints in the frequency domain (“model space”): as described in section 3b and illustrated in Fig. 3, a synthetic time series is constructed by adding a random background process with a spectral slope of −2 to a modulated semidiurnal (M_{2}) tidal process. In Fig. 4a, the total spectrum (solid black) and its tidal component (signal, dashed black) and background (noise, dotted black) are plotted along with red_tide squared amplitude estimates **P** constructed from the true tidal spectrum, not the total spectral density. Shading indicates 90% credible intervals of amplitude. These intervals are calculated using Monte Carlo sampling from a 10 000-member population obeying the posterior error of *f* = Δ*f* to *f*_{Ny}. This matches the true noise-to-signal ratio. Both red noise analyses (red and orange lines in Fig. 4b), which more closely approximate the true noise component, result in more accurate amplitude estimates at low frequencies where the signal-to-noise ratio is low (red or orange lines versus black dashed line in Fig. 4a at frequencies less than 0.2 cpd), and greater precision at higher frequencies, where the signal-to-noise ratio is high. The calculated spectra of the residual time series are lower for increasingly accurate noise spectra, indicating that more variance is allocated to the model parameters.

Figure 5 compares the performance of the algorithm for both amplitude and phase under these three noise regimes. The ratio of median estimated amplitude (*π*, *π*). This is expected for the phase of a low-amplitude signal in the presence of energetic noise that renders that signal’s phase unrecoverable. At frequencies higher than the tidal cusp, the standard deviation of phase from the white noise analysis approaches this value because it assumes that noise is more energetic than the signal, while the red noise assumptions give results with lower variability, with the full red noise assumption giving the lowest variability.

The lessons learned from the idealized case of a short record with a tidal cusp and red noise also hold for a more complicated and realistic case in which a long record with a proportionately finer frequency resolution is modeled only at frequencies in limited bands. In Figs. 4 and 5, all resolvable frequencies are modeled to illustrate the effect of noise representation on the partition of variance into signal and noise. In practice, however, tidal time series may be several years long, and so the number of data, and hence, the number of resolvable frequencies, can be large enough such that modeling all frequencies explicitly is computationally prohibitive. Furthermore, in such cases the residual covariance matrix **R** cannot be practically constructed at all possible time lags, necessitating instead an approximation like that discussed earlier (truncated, in sparse form, and windowed to reduce spectral ringing). Figure 6 shows results analogous to those in Fig. 4 for an hourly sampled 7-yr-long synthetic time series. In this case, only limited frequency bands are modeled, with the rest of the total variance accounted for by the residual. An accurate model parameter covariance enforces realistic amplitudes and uncertainty on the model parameters by setting a signal-to-noise ratio that approximates the true ratio of tidal energy to nontidal energy as a function of frequency and not only the total, frequency-integrated ratio. This corresponds to the red curve in Fig. 6. The blue curve corresponds to an assumption of white noise: though the total frequency-integrated signal-to-noise ratio is the same for both estimates, the white noise assumption overrepresents the variance of the nontidal component in the semidiurnal band (around 2 cpd), giving larger credible intervals resulting from the unnecessary uncertainty introduced by overestimating the nontidal variance. 
Though the red noise covariance is a windowed, truncated approximation of the true nontidal covariance, its spectrum (dotted red) closely matches the true underlying spectrum of nontidal variability (dashed black). With only 500 frequencies modeled out of the over 60 000 that are fully resolvable, the reconstructed time series accounts for more than 45% of the total variance and estimates tidal and off-tidal amplitudes within error. This shows that modeling all resolvable frequencies directly in

### d. Effect of noise covariance and record length on constituent estimates

To evaluate the impact of the choice of noise covariance and record length on estimated tidal coefficients, we ran ten Monte Carlo experiments. These ten experiments used varying combinations of tidal energy (two regimes) and background structure (five regimes), which are described below. The six runs with more negative spectral slopes (*γ* = −3, −2.5, −2, corresponding to Figs. 7a–c,f–h and 8a–c,f–h) used 10 000 sample time series, while the four runs with less negative spectral slopes (*γ* = −1.5, −1, corresponding to Figs. 7d,e,i,j and 8d,e,i,j) used 50 000 sample time series to obtain stable results. Each time series had 24 000 hourly samples but was only analyzed up to 1500 h in order to investigate the effect of record length on estimated parameters, with the exception of the *γ* = −1 case, which was analyzed up to 5000 h in order to examine convergence at longer record lengths. Three tidal constituents typically seen in observations (K_{1}, M_{2}, and S_{2}) are added onto a synthetic noise background. Analysis for only the M_{2} constituent is shown, as the results for other tidal constituents are qualitatively similar. Only phase varies from one time series to another within a run, with signal and noise amplitude constant for all ensemble members. The phase is varied by randomly selecting from a uniform distribution on the interval [0, 2*π*), and is varied separately for the noise and signal components. Two quantities derived from model parameters are examined to determine the impact of noise covariance and record length: the bias of tidal amplitude estimates as a measure of accuracy, and the variance of model parameters (harmonic coefficients) about the true values as a measure of precision.

Figure 7 shows the ratio of estimated M_{2} constituent amplitude to true amplitude across the ten Monte Carlo experiments:

$$\frac{\hat{A}}{A} = \frac{\sqrt{\hat{a}^{2} + \hat{b}^{2}}}{\sqrt{a^{2} + b^{2}}},$$

where *a* and *b* are, respectively, the sine and cosine coefficients of the tidal constituent in question (here M_{2}), and hats denote estimates. Note that *a* and *b* vary with each Monte Carlo simulation, but the sum of their squares does not; hence, the true amplitude *A* is the same for every simulation. Amplitude estimates are most accurate for **R** constructed from an appropriate power spectrum. Amplitude estimates also improve with increasing record length, though this effect is smaller than the effect of the noise covariance.
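The amplitude ratio can be illustrated with a small Monte Carlo sketch in Python. This is a much simplified stand-in for the experiments described here: it uses a single constituent, white noise, an OLS fit, and an ensemble mean of the ratio, all of which are assumptions made for brevity, and the noise level and ensemble size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
F_M2 = 1.9323 / 24.0                 # M2 frequency in cycles per hour
t = np.arange(500.0)                 # 500 hourly samples
H = np.column_stack([np.sin(2 * np.pi * F_M2 * t),
                     np.cos(2 * np.pi * F_M2 * t)])
A_TRUE = 1.0

ratios = []
for _ in range(1000):
    # Only the phase changes between ensemble members; a^2 + b^2 stays fixed.
    phi = rng.uniform(0.0, 2.0 * np.pi)
    a, b = A_TRUE * np.cos(phi), A_TRUE * np.sin(phi)
    y = H @ np.array([a, b]) + rng.normal(scale=5.0, size=t.size)
    a_hat, b_hat = np.linalg.lstsq(H, y, rcond=None)[0]   # OLS estimate
    ratios.append(np.hypot(a_hat, b_hat) / np.hypot(a, b))

mean_ratio = float(np.mean(ratios))
```

Because the estimated amplitude is the norm of two noisy coefficients, the mean ratio from an unregularized fit of this kind tends to lie somewhat above 1 when the noise is energetic.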

Estimated amplitudes are comparably accurate and in some cases more accurate for spectral slopes less than (steeper than) the true slope when compared to the amplitude accuracy when using the true slope to construct **R**. This is, however, a small effect that appears to diminish with increasing record length. For example, in Figs. 7e and 7j, an assumed spectral slope of −1.5 achieves more accurate results than the true slope of −1, which gives lower amplitude ratios, an effect that is stronger at shorter record lengths. This may be due to the bias of the estimators. The white noise approach differs from OLS only in its use of **P** (a noise slope of 0 means that **R** is a scaled identity matrix that can be treated as a constant and would be canceled out if **P**^{−1} were 0). OLS overestimates tidal amplitudes consistently, while the regularization of the otherwise identical white noise approach typically underestimates tidal amplitudes, except for the case of strong tides and a spectrally steep background (Figs. 7a,b), for which such a white noise assumption is not suitable.

**x** and, like amplitude, *a*^{2} + *b*^{2} = *A*^{2}, where *A* is constant across all Monte Carlo simulations. Therefore, both

As with amplitude, when the random background has a more negative spectral slope, estimates of tidal coefficients are more sensitive to noise covariance choice (Figs. 8a,b,f,g) than they are when the background has a less negative spectral slope (Figs. 8d,e,i,j). Unlike amplitude, coefficients are primarily sensitive to record length when the background noise is relatively unstructured, with increasing sensitivity to the choice of noise covariance as the true noise spectrum becomes steeper. This can be seen in the gradient of the normalized parameter variance in each panel: in Figs. 8a, 8b, 8f, and 8g, where *γ* is more negative, the variance decreases for more accurate noise slopes, whereas in Figs. 8d, 8e, 8i, and 8j, where *γ* is less negative, the variance decreases more strongly with increasing record length. Filled dots are placed at elements corresponding to the lowest value (highest precision) in their respective rows. For all simulations with a background process of spectral slope *γ* ≥ −2 (less steep), precision is highest for the noise assumption that matches the true spectral slope. For *γ* < −2 (steeper), this is also the case for sufficiently long records, and for short records it is still better to match the noise spectrum than to use OLS.

In almost all cases, using Eq. (7) (with **P** and **R**) instead of Eq. (9) (OLS without an explicit noise assumption, labeled “none” in the rightmost column of every panel in Figs. 7 and 8) resulted in more accurate estimates for *A* (ratios closer to 1 in Fig. 7) and more precise estimates for **x** (lower parameter variance in Fig. 8), at least for the cases examined here in which tidal energy is 10–100 times greater than that of noise at the same frequency. The OLS approach, which is widely implemented in tidal harmonic analysis, was comparable to the red_tide approach for *γ* = −1.5 and *γ* = −1, indicating that it is most suitable for tidal analysis of data with a spectral background that is nearly white. Data with steeper background spectra benefit from treating the residual **r** as a spectrally red process by way of the covariance matrix **R**.

## 4. Application to oceanographic data

### a. Bottom pressure

The methods implemented in red_tide seek to address several potential issues that can arise when harmonically analyzing real tidal time series, each of which is presented in isolation in section 3 by using synthetic data. The first demonstration of red_tide on observations uses a bottom pressure record, a dataset with low background noise where record length is sufficient and non-phase-locked energy is much weaker than the phase-locked tide. Bottom pressure measurements from the NOAA Deep-ocean Assessment and Reporting of Tsunamis (DART) (NOAA 2005) are dominated by the largely coherent barotropic tide. The time series examined here originates from site 51406 (8.48°S, 125.03°W) and spans 3 years and 7 months of observations, from 12 February 2011 to 6 September 2014. Many tidal constituents of amplitudes spanning several orders of magnitude are present in this record, as seen by tidal lines in Fig. 9. Pressure measurements have a significantly lower noise level than coastal surface height gauges and have accurate harmonic constituents even over short records (Le Provost 2001). Bottom pressure is therefore useful for evaluating the accuracy of harmonic decomposition in the regime of tidally dominated, low-noise observations, which typically do not pose major problems when calculating constituents. Hourly averaging of 15-s-sampled data further suppresses noise and instrumental artifacts, such as digitization, and does not alias major constituents.

The power spectrum of the bottom pressure time series (Fig. 9) exhibits many prominent peaks, of which 22 are singled out for analysis. These 22 frequency bands together account for more than 99% of the variance of the time series; therefore both the white noise assumption (not shown) and red noise assumption produce essentially identical results. Figure 10 shows the output from red_tide using only the red noise assumption for **R** with spectral slope *γ* = −3/2 alongside t_tide output. Whereas t_tide models only tidal frequencies associated with astronomical parameters, this analysis includes those same frequencies and 30 additional frequencies per constituent in a band of 10 yr^{−1} centered on that tidal frequency in order to capture modulation at annual and semiannual cycles and cusp-like spreading of peaks. These harmonic amplitudes are spaced at Δ*f* = 1/3 yr^{−1}, a frequency step smaller (and hence of higher resolution) than that of the power spectrum, which is coarser in resolution due to segmenting. This corresponds to a 3-yr period, shorter than the record length of 3 years and 206.5 days, in order to ensure the resolution of annually periodic modulation of the main tidal constituents, regardless of the exact record length. The amplitudes are normalized to have units of spectral power for comparison. The results of analysis by t_tide are also normalized and plotted with 90% confidence intervals (the t_tide code, which defaults to 95% confidence intervals, was modified accordingly).

The tidal amplitudes given by t_tide largely match the red_tide results for high-amplitude constituents and fall within the red_tide credible intervals for low-amplitude constituents, while red_tide provides analysis at more frequencies. Focusing on a single cluster of constituents shows this more clearly (Fig. 11). The semidiurnal band, centered about the energetic M_{2} constituent, contains several other well-resolved tidal constituents (Darwin symbols 2N_{2}, *μ*_{2}, N_{2}, *ν*_{2}, *λ*_{2}, L_{2}, S_{2}, and K_{2}) that result from the complicated gravitational tidal forcing potential, many of which are not resolvable when using segmenting methods or are not exactly mapped by a discrete Fourier transform, which would result in spectral leakage. Additionally, the energy of the cusps is explicitly modeled, with intervals that reflect the uncertainty associated with a realistic noise level that is a function of frequency, which becomes important for these low-amplitude components. The characteristics of the non-phase-locked component of the tides can therefore be diagnosed from these cusps. The high signal-to-noise ratio at energetic constituents, on the other hand, means that the noise level is less important at these frequencies; hence, the OLS approach of t_tide works well for these constituents.

### b. High-frequency radar

Like OLS, the red_tide package effectively models data with weak noise and tidal constituents that are highly coherent with astronomical forcing. The data regime for which red_tide is designed includes higher levels of structured noise and tidal energy that cannot be predominantly described by a small number of frequencies. Surface currents, which are driven by wind, tides, eddies, and mean flow, fall under this category.

Observations of surface currents are obtained from a high-frequency radar network (HFRnet; see Terrill et al. 2006). The California Current System (CCS), a region that is well sampled by this network (Roarty et al. 2019), is used to evaluate the harmonic decomposition technique. Radial velocities measured by antenna stations are mapped to a Cartesian grid of zonal and meridional velocities using a least squares fit (Ohlmann et al. 2007). Surface currents are driven by a wide range of dynamics: direct wind forcing, near-inertial motions, interannual variability of the local current system, and tides, including tidal currents and the surface expression of internal tides. This contrasts with bottom pressure, which is dominated by tides, with many more prominent tidal frequencies than appear in surface currents.

Figure 12 shows the averaged rotary power spectrum (spectral power partitioned by rotational polarization) over the grid points in a region of the CCS ranging from 33.7561° to 38.1252°N out to approximately 100 km from the coast (for the formulation of rotary spectra, see Gonella 1972). This formulation is convenient for visualizing the spectral power of surface currents due to the polarized flow resulting from near-inertial oscillations, which at these latitudes occur at frequencies between 1.12 and 1.26 cpd. Tidal peaks in HFR surface currents are pronounced and are comparable to or greater than the energy in the inertial and low-frequency (<0.4 cpd) bands. These spectra are calculated using Welch’s method, and the error bar denotes the ratio of high to low estimates for the power spectrum for a confidence level of 95%; this ratio is constant on logarithmic axes and does not vary with frequency. It is calculated from a chi-squared distribution for degrees of freedom equal to twice the number of windowed segments whose periodograms are averaged to calculate the spectrum (e.g., Bendat and Piersol 2010), divided by 9 from the assumption that neighboring sites are correlated.

An hourly sampled, 9-yr-and-3-month-long (1 January 2012–1 April 2021) HFR surface current time series from 35.5361°N, 121.1776°W, roughly 7 km off the coast of San Luis Obispo County, California, is analyzed in four frequency bands: low frequency (0.00732 cpd and less), S_{1} solar diurnal (centered on 1 cpd), M_{2} lunar semidiurnal (centered on 1.932 cpd), and S_{2} solar semidiurnal (centered on 2 cpd). These frequency bands, shown in Fig. 13a, account for roughly a fifth of the variance of the time series for both zonal velocity *u* and meridional velocity *υ*. Therefore, roughly 80% of the variance is included in the residual time series. Harmonic coefficients from red_tide are calculated using a model parameter covariance built from the domain-averaged power spectrum (solid black line) from 1191 grid points at latitudes ranging from 33.7561° to 38.1252°N, out to about 100 km offshore. The individual power spectrum calculated from the analyzed time series is shown in gray for comparison: smoothed peaks due to low frequency resolution and spectral leakage due to sampling and windowing result in a spectrum of an individual time series that does not capture features that the least squares approaches can. The noise (residual) spectrum has a constant spectral slope of −1 and a corresponding covariance truncated at 300 h lag, resulting in modest spectral ringing (dashed line). Amplitudes given by red_tide are calculated as half the sum of the squared sine and cosine coefficients, (*a*^{2} + *b*^{2})/2. Only results for *u* are shown, as results for *υ* are qualitatively similar; the full two-dimensional character of time series in this region is shown in Fig. 12 to illustrate the frequency-dependent polarization of surface currents, which may be modeled with red_tide. The calculation of rotary coefficients from red_tide output for *u* and *υ* is straightforward, though results are not shown.

As with bottom pressure, harmonic decomposition reveals sharp peaks at tidal frequencies that are not resolved in power spectra with lower frequency resolution due to averaging over time intervals shorter than the record length. Because the choice of basis functions for red_tide is arbitrary, the fundamental frequency of the dataset does not necessarily limit the spacing of modeled frequencies, though in practice the tolerance of basis nonorthogonality and resulting uncertainty will limit the choice of frequencies. The high noise level in these data results in large uncertainty at frequencies around tidal peaks as in Fig. 13b, as indicated by shaded intervals. Despite this, annual modulation of the surface current is evident by the second and third most energetic peaks in the M_{2} band appearing at

## 5. Summary and discussion

The methods outlined here and implemented in red_tide are intended to provide best estimates of tidal amplitudes for data with red background spectra and significant tidal cusps. Red_tide incorporates a red noise covariance and includes additional frequencies beyond those of the astronomical forcing to accommodate data with highly energetic and correlated nontidal components, a weak tidal signal relative to nontidal processes, or a modulated tidal component with energy distributed across a band of frequencies. Short records, for which long-period variance appears trend-like, also benefit from these methods because variance at fitted frequencies is allocated to model parameters according to prior statistical assumptions. The spectrally colored noise covariance is constructed to approximate the spectral properties of the nontidal component of the data, and may be truncated and represented as a sparse matrix for computational efficiency with a window function applied to off-diagonal elements to suppress spectral artifacts that result from truncation. On the other hand, highly coherent tidal records with well-defined peaks and small cusps, such as bottom pressure, are well described by OLS, as the background noise is orders of magnitude lower in amplitude than the tidal signal.

These methods also address time series for which the choice of model covariance impacts results. We demonstrate this with a step function, a simple case that exhibits an extreme mismatch between the fit and the data when an inappropriate model parameter covariance is used, resulting in the well-known Gibbs phenomenon. We have found that when fitting a discrete step function across resolvable frequencies, the assumption of a realistic covariance reduces the magnitude of the Gibbs phenomenon near the jump discontinuity when compared to an assumption of constant covariance matrix

The accuracy and precision of model parameters, given explicitly as a posterior covariance matrix, are impacted by the choice of residual (noise) covariance. Spectrally colored time series may have a residual background that varies over orders of magnitude across tidal bands, necessitating an appropriate noise covariance matrix if all constituents are to be estimated optimally. When the user-specified residual spectral power is significantly lower than the total spectral power of the time series, red_tide allocates more variance to the model parameters. If the energy of the nontidal component is well understood, variance can be realistically allocated between the estimated signal

In summary, red_tide was designed to estimate tidal coefficients while incorporating prior assumptions that accurately account for the spectral structure of underlying noise and allow flexibility in the choice of modeled frequencies, which is important for data with a modulated, non-phase-locked tidal component. Longer records and less strongly correlated noise benefit less from this flexibility. Ordinary least squares is comparatively less suited to computing tidal harmonics from data with spectrally colored noise, especially red noise with a steep spectral slope. The code is available for use and modification, the details of which are in appendix A.

## Acknowledgments.

This work has been supported by a Future Investigators in NASA Earth and Space Science and Technology award (80NSSC19K1342). In addition, Luke Kachelein, Bruce Cornuelle, Sarah Gille, and Matthew Mazloff acknowledge support from the NASA Surface Water and Ocean Topography Science Team (Awards NNX16AH67G and 80NSSC20K1136), and Sarah Gille also acknowledges support from the NASA Ocean Surface Topography Science Team (Award 80NSSC21K1822). We would also like to thank Edward Zaron for his guidance on the evolving nomenclature for modulated tidal phenomena, as well as the anonymous reviewers for their thoughtful comments and suggestions that improved the quality and readability of this study.

## Data availability statement.

Relevant data and scripts, including those used to generate all figures and results in this paper, are publicly available through the University of California, San Diego, library digital collections at https://doi.org/10.6075/J080515G. Readers interested in accessing observational data may visit https://doi.org/10.7289/V5F18WNS for bottom pressure data and https://dods.ndbc.noaa.gov/thredds/hfradar.html for high-frequency radar surface current data.

## APPENDIX A

### Downloading red_tide

The red_tide package is available for download as a GitHub repository at https://github.com/lkach/red_tide and in archived form (see data availability statement). This package is written in the MATLAB language, but translation to other programming languages is welcome and encouraged. It has also been designed to work in the free software GNU Octave language and Octave-specific instructions are provided with the software release. Input for red_tide is flexible, with several options and default settings.

## APPENDIX B

### Noise as an Autoregressive Process

The residual **r** has approximately the same spectrum as **y**, except in the bands of modeled frequencies. With the energetic tidal components removed, a spectrally red residual can be modeled as an autoregressive process AR(*p*), where order *p* is the maximum number of time steps for which the system has memory (von Storch and Zwiers 2003):

$$r_{t} = \sum_{k=1}^{p} \alpha_{k}\, r_{t-k} + \epsilon_{t}$$

for an order-*p* AR process *r* at time *t* with a white noise component *ϵ*. The AR parameters *α*_{k} can be estimated from the Yule–Walker equations (e.g., von Storch and Zwiers 2003). From these, the spectral density of *r* can be estimated by

$$S_{rr}(f) = \frac{\sigma_{\epsilon}^{2}}{\left|1 - \sum_{k=1}^{p} \alpha_{k}\, e^{-2\pi i k f}\right|^{2}},$$

where *f* is frequency (in cycles per time step), and $\sigma_{\epsilon}^{2}$ is the variance of *ϵ*. The spectrum of the AR-modeled residual time series can be used as an estimate of the spectrum of underlying nontidal, non-wind-driven intrinsic variability in the ocean. If an AR model is fitted to **y** at frequencies outside those in **H** and assuming white noise *ϵ*, the coefficients *α*_{k} can be used to build **R** for a second iteration of fitting using this new estimate for the noise covariance matrix. This can be done by taking the inverse Fourier transform of *S*_{rr} to obtain the autocovariance of **r** via the Wiener–Khinchin theorem. **R** is then constructed from the covariance as outlined in section 2b.
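A minimal Python sketch of the AR approach follows. It estimates AR parameters from sample autocovariances with the Yule–Walker equations and evaluates the standard AR spectral density, here tested on a synthetic AR(1) series with a known parameter of 0.8. The function names, the biased autocovariance estimator, and the test series are illustrative assumptions and are not taken from red_tide.

```python
import numpy as np

def yule_walker(r, p):
    """Estimate AR(p) coefficients and innovation variance from sample autocovariances."""
    r = np.asarray(r, dtype=float) - np.mean(r)
    n = r.size
    acov = np.array([r[: n - k] @ r[k:] / n for k in range(p + 1)])
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    alpha = np.linalg.solve(acov[idx], acov[1:])      # Toeplitz system of lags 0..p-1
    noise_var = acov[0] - alpha @ acov[1:]
    return alpha, noise_var

def ar_spectrum(alpha, noise_var, f):
    """AR spectral density at frequencies f (cycles per time step)."""
    k = np.arange(1, alpha.size + 1)
    transfer = 1.0 - np.exp(-2j * np.pi * np.outer(f, k)) @ alpha
    return noise_var / np.abs(transfer) ** 2

# Synthetic red residual: AR(1) with a known coefficient of 0.8 and unit-variance noise.
rng = np.random.default_rng(0)
eps = rng.normal(size=20000)
r = np.zeros_like(eps)
for i in range(1, r.size):
    r[i] = 0.8 * r[i - 1] + eps[i]

alpha, noise_var = yule_walker(r, p=1)
f = np.linspace(0.0, 0.5, 6)
spec = ar_spectrum(alpha, noise_var, f)   # decreases with frequency (red)
```

An inverse Fourier transform of such a spectrum on a two-sided frequency grid would then give the autocovariance from which a Toeplitz **R** could be assembled.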

## APPENDIX C

### Alternative Form of Eq. (7)

Equation (7) can be written in the equivalent form

$$\hat{\mathbf{x}} = \mathbf{P}\mathbf{H}^{\mathrm{T}}\left(\mathbf{H}\mathbf{P}\mathbf{H}^{\mathrm{T}} + \mathbf{R}\right)^{-1}\mathbf{y}. \tag{C1}$$

In tidal analysis there are typically many more data *N* than frequencies *M* that are of interest to model. Therefore the form in Eq. (7) is used in red_tide because **H**^{T}**R**^{−1}**H** has lower computation and memory requirements (4*M*^{2} elements) than **HPH**^{T} (*N*^{2} elements). The *N* × *N* residual covariance matrix inverse, **R**^{−1}, that appears in Eq. (7) does not need to be computed explicitly when using efficient linear system solution algorithms for matrix inversion, which instead directly calculate **H**^{T}**R**^{−1}, which is 2*M* × *N*. Further, a sparse representation of **R** minimizes memory requirements in Eq. (7) compared to the more challenging requirements of the dense *N* × *N* matrix in Eq. (C1).
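The equivalence of the two forms can be checked numerically. The Python sketch below assumes that Eq. (7) is the estimate (**H**^{T}**R**^{−1}**H** + **P**^{−1})^{−1}**H**^{T}**R**^{−1}**y** and that the alternative form inverts **HPH**^{T} + **R**. The test matrices, frequencies, and variable names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 200, 3                                   # N data, M modeled frequencies
t = np.arange(N, dtype=float)
freqs = np.array([0.04, 0.0805, 0.0833])        # cycles per sample, arbitrary
arg = 2.0 * np.pi * np.outer(t, freqs)
H = np.hstack([np.sin(arg), np.cos(arg)])       # N x 2M
P = np.diag(rng.uniform(0.5, 2.0, 2 * M))       # 2M x 2M prior covariance
lags = np.abs(np.subtract.outer(t, t))
R = 0.5 * 0.9 ** lags                           # N x N red (AR(1)-like) covariance
y = rng.normal(size=N)

# Form with a 2M x 2M system: R^-1 H comes from a linear solve, not an explicit inverse.
Rinv_H = np.linalg.solve(R, H)
x_small = np.linalg.solve(H.T @ Rinv_H + np.linalg.inv(P), Rinv_H.T @ y)

# Form with an N x N system.
x_large = P @ H.T @ np.linalg.solve(H @ P @ H.T + R, y)

same = np.allclose(x_small, x_large)
```

Both forms give the same estimate, but the first factors only a 2*M* × 2*M* system once **R**^{−1}**H** is available, which is the cheaper route when *N* is much larger than *M*.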

## APPENDIX D

### Nondimensionalization

**R**. For example, Coles et al. (2011) use a nondiagonal matrix expression for

**R**, the efficient inversion of which is outlined here. The computational resources to directly invert a nondiagonal

**R**are too high to be practical:

**R**is

*N*×

*N*, where

*N*is the length of

**y**, which is much longer than

**x**in practice. Though red_tide uses MATLAB’s default linear equation solving method, a Cholesky lower triangle factorization of the residual covariance matrix

**R**= 〈

**rr**

^{T}〉 may also be used. The residual covariance can be factored as

**R**=

**R**

^{1/2}

**R**

^{T/2}, where

**R**

^{1/2}is lower triangular with the inverse

**R**

^{−1/2}used as a (nonunique) whitening transform:

**R**

^{T/2}

**R**

^{−1}

**R**

^{1/2}=

**I**, these can be substituted into Eq. (7), which simplifies to

The choice of **R**, and thus **R**^{−1/2}, relies on accurately estimating the residual covariance. This may be done by examining the calculated data covariance matrix **yy**^{T} or equivalently the power spectrum *S*_{**yy**} and estimating 〈**rr**^{T}〉.
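The whitening property of the Cholesky factor can be checked with a short sketch. The covariance below is a synthetic exponential model chosen only for illustration. NumPy's general solver stands in for a dedicated triangular solver, and no explicit inverse of **R** is formed.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
t = np.arange(N, dtype=float)

# Assumed nondiagonal residual covariance (illustrative, positive definite).
R = np.exp(-np.abs(t[:, None] - t[None, :]) / 4.0) + 0.05 * np.eye(N)

# R = L L^T with L lower triangular, playing the role of R^{1/2}.
L = np.linalg.cholesky(R)

# R^{T/2} R^{-1} R^{1/2}, computed by solving R Z = L.
identity_check = L.T @ np.linalg.solve(R, L)

# Whitened covariance R^{-1/2} R R^{-T/2}.
whitened_cov = np.linalg.solve(L, np.linalg.solve(L, R).T)

# Whitened ordinary least squares matches generalized least squares.
H = rng.standard_normal((N, 4))             # hypothetical regressors
y = rng.standard_normal(N)                  # hypothetical data
H_w = np.linalg.solve(L, H)                 # R^{-1/2} H
y_w = np.linalg.solve(L, y)                 # R^{-1/2} y
x_whitened = np.linalg.lstsq(H_w, y_w, rcond=None)[0]
x_gls = np.linalg.solve(H.T @ np.linalg.solve(R, H), H.T @ np.linalg.solve(R, y))
```

Because the factorization is not unique, any other square root of **R** (for example, a symmetric one) would whiten the residual equally well. The lower-triangular choice is convenient because triangular systems are cheap to solve.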

**P** can be expressed as **P** = **P**^{1/2}**P**^{T/2}. Define **H**′ = **R**^{−1/2}**HP**^{1/2} such that Eq. (D2) can be written as

**P**^{1/2} that allows

**y**.
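A hedged numerical sketch of this scaling follows. Eqs. (7) and (D2) are not reproduced in this appendix, so the code assumes the standard form **x̂** = (**H**^{T}**R**^{−1}**H** + **P**^{−1})^{−1}**H**^{T}**R**^{−1}**y**, with scaled quantities **H**′ = **R**^{−1/2}**HP**^{1/2}, **y**′ = **R**^{−1/2}**y**, and **x**′ = **P**^{−1/2}**x**. The primed data and model vectors, and all matrices below, are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 120, 6                                # K = 2M model coefficients
t = np.arange(N, dtype=float)

H = rng.standard_normal((N, K))              # hypothetical regressor matrix
C = rng.standard_normal((K, K))
P = C @ C.T + K * np.eye(K)                  # assumed positive definite prior covariance
R = np.exp(-np.abs(t[:, None] - t[None, :]) / 3.0) + 0.1 * np.eye(N)
y = rng.standard_normal(N)                   # hypothetical data

R_half = np.linalg.cholesky(R)               # R = R_half @ R_half.T
P_half = np.linalg.cholesky(P)               # P = P_half @ P_half.T

# Scaled (nondimensional) regressors and data.
H_prime = np.linalg.solve(R_half, H @ P_half)    # R^{-1/2} H P^{1/2}
y_prime = np.linalg.solve(R_half, y)             # R^{-1/2} y

# In scaled variables both covariances become the identity.
x_prime = np.linalg.solve(H_prime.T @ H_prime + np.eye(K), H_prime.T @ y_prime)
x_hat = P_half @ x_prime                     # return to dimensional coefficients

# Reference solution from the assumed unscaled form.
x_ref = np.linalg.solve(
    H.T @ np.linalg.solve(R, H) + np.linalg.inv(P),
    H.T @ np.linalg.solve(R, y),
)
```

Under these assumptions the scaled and unscaled solutions agree, and the scaled system involves only **H**′ and identity matrices.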

## REFERENCES

Agnew, D. C., 1992: The time-domain behavior of power-law noises. *Geophys. Res. Lett.*, **19**, 333–336, https://doi.org/10.1029/91GL02832.

Bendat, J., and A. Piersol, 2010: Nonstationary data analysis. *Random Data*, John Wiley and Sons, 417–472.

Chavanne, C. P., and P. Klein, 2010: Can oceanic submesoscale processes be observed with satellite altimetry? *Geophys. Res. Lett.*, **37**, L22602, https://doi.org/10.1029/2010GL045057.

Chiswell, S. M., 2002: Energy levels, phase, and amplitude modulation of the baroclinic tide off Hawaii. *J. Phys. Oceanogr.*, **32**, 2640–2651, https://doi.org/10.1175/1520-0485-32.9.2640.

Codiga, D. L., 2011: Unified tidal analysis and prediction using the UTide MATLAB functions. URI/GSO Tech. Rep. 2011-01, 59 pp., ftp://www.po.gso.uri.edu/pub/downloads/codiga/pubs/2011Codiga-UTide-Report.pdf.

Coles, W., G. Hobbs, D. J. Champion, R. N. Manchester, and J. P. W. Verbiest, 2011: Pulsar timing analysis in the presence of correlated noise. *Mon. Not. Roy. Astron. Soc.*, **418**, 561–570, https://doi.org/10.1111/j.1365-2966.2011.19505.x.

Doodson, A. T., and H. Lamb, 1921: The harmonic development of the tide-generating potential. *Proc. Roy. Soc. London*, **100**, 305–329, https://doi.org/10.1098/rspa.1921.0088.

Eich, M. L., M. A. Merrifield, and M. H. Alford, 2004: Structure and variability of semidiurnal internal tides in Mamala Bay, Hawaii. *J. Geophys. Res.*, **109**, C05010, https://doi.org/10.1029/2003JC002049.

Foreman, M. G. G., 1977: Manual for tidal heights analysis and prediction. Pacific Marine Science Rep. 77-10, 66 pp.

Foreman, M. G. G., and R. F. Henry, 1989: The harmonic analysis of tidal model time series. *Adv. Water Resour.*, **12**, 109–120, https://doi.org/10.1016/0309-1708(89)90017-1.

Foreman, M. G. G., J. Y. Cherniawsky, and V. A. Ballantyne, 2009: Versatile harmonic tidal analysis: Improvements and applications. *J. Atmos. Oceanic Technol.*, **26**, 806–817, https://doi.org/10.1175/2008JTECHO615.1.

Gonella, J., 1972: A rotary-component method for analysing meteorological and oceanographic vector time series. *Deep-Sea Res. Oceanogr. Abstr.*, **19**, 833–846, https://doi.org/10.1016/0011-7471(72)90002-2.

Haigh, I. D., M. Eliot, and C. Pattiaratchi, 2011: Global influences of the 18.61 year nodal cycle and 8.85 year cycle of lunar perigee on high tidal levels. *J. Geophys. Res.*, **116**, C06025, https://doi.org/10.1029/2010JC006645.

Hewitt, E., and R. E. Hewitt, 1979: The Gibbs-Wilbraham phenomenon: An episode in Fourier analysis. *Arch. Hist. Exact Sci.*, **21**, 129–160, https://doi.org/10.1007/BF00330404.

Hoerl, A. E., and R. W. Kennard, 1970: Ridge regression: Biased estimation for nonorthogonal problems. *Technometrics*, **12**, 55–67, https://doi.org/10.1080/00401706.1970.10488634.

Ide, K., P. Courtier, M. Ghil, and A. C. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential and variational. *J. Meteor. Soc. Japan*, **75**, 181–189, https://doi.org/10.2151/jmsj1965.75.1B_181.

Lee, P. M., 1997: *Bayesian Statistics: An Introduction.* 2nd ed. Arnold Publishers, 351 pp.

Leffler, K. E., and D. A. Jay, 2009: Enhancing tidal harmonic analysis: Robust (hybrid L1/L2) solutions. *Cont. Shelf Res.*, **29**, 78–88, https://doi.org/10.1016/j.csr.2008.04.011.

Le Provost, C., 2001: Ocean tides. *Satellite Altimetry and Earth Sciences*, L.-L. Fu and A. Cazenave, Eds., International Geophysics Series, Vol. 69, Academic Press, 267–303.

Lourens, A., and F. C. van Geer, 2016: Uncertainty propagation of arbitrary probability density functions applied to upscaling of transmissivities. *Stochastic Environ. Res. Risk Assess.*, **30**, 237–249, https://doi.org/10.1007/s00477-015-1075-8.

Matte, P., D. A. Jay, and E. D. Zaron, 2013: Adaptation of classical tidal harmonic analysis to nonstationary tides, with application to river tides. *J. Atmos. Oceanic Technol.*, **30**, 569–589, https://doi.org/10.1175/JTECH-D-12-00016.1.

Menke, W., 2018: *Geophysical Data Analysis: Discrete Inverse Theory.* 4th ed. Elsevier, 352 pp.

Munk, W. H., and K. Hasselman, 1964: Super-resolution of tides. *Studies on Oceanography: A Collection of Papers Dedicated to Koji Hidaka*, The University of Tokyo Press, 339–344.

Munk, W. H., B. Zetler, and G. W. Groves, 1965: Tidal cusps. *Geophys. J. Int.*, **10**, 211–219, https://doi.org/10.1111/j.1365-246X.1965.tb03062.x.

NOAA, 2005: Deep-Ocean Assessment and Reporting of Tsunamis (DART®). NOAA National Centers for Environmental Information, accessed 21 March 2021, https://doi.org/10.7289/V5F18WNS.

Ohlmann, C., P. White, L. Washburn, B. Emery, E. Terrill, and M. Otero, 2007: Interpretation of coastal HF radar–derived surface currents with high-resolution drifter data. *J. Atmos. Oceanic Technol.*, **24**, 666–680, https://doi.org/10.1175/JTECH1998.1.

Pawlowicz, R., R. Beardsley, and S. Lentz, 2002: Classical tidal harmonic analysis including error estimates in MATLAB using T_TIDE. *Comput. Geosci.*, **28**, 929–937, https://doi.org/10.1016/S0098-3004(02)00013-4.

Rainville, L., and R. Pinkel, 2006: Propagation of low-mode internal waves through the ocean. *J. Phys. Oceanogr.*, **36**, 1220–1236, https://doi.org/10.1175/JPO2889.1.

Ray, R. D., and G. T. Mitchum, 1996: Surface manifestation of internal tides generated near Hawaii. *Geophys. Res. Lett.*, **23**, 2101–2104, https://doi.org/10.1029/96GL02050.

Ray, R. D., and E. D. Zaron, 2011: Non-stationary internal tides observed with satellite altimetry. *Geophys. Res. Lett.*, **38**, L17609, https://doi.org/10.1029/2011GL048617.

Ray, R. D., and E. D. Zaron, 2016: M_{2} internal tides and their observed wavenumber spectra from satellite altimetry. *J. Phys. Oceanogr.*, **46**, 3–22, https://doi.org/10.1175/JPO-D-15-0065.1.

Roarty, H., and Coauthors, 2019: The Global High Frequency Radar Network. *Front. Mar. Sci.*, **6**, 164, https://doi.org/10.3389/fmars.2019.00164.

Terrill, E., and Coauthors, 2006: Data management and real-time distribution in the HF-radar national network. *Oceans 2006*, Boston, MA, IEEE, https://doi.org/10.1109/OCEANS.2006.306883.

Van Trees, H. L., 2001: Classical detection and estimation theory. *Detection, Estimation, and Linear Modulation Theory*, Part I, John Wiley and Sons, 19–165.

von Storch, H., and F. W. Zwiers, 2003: *Statistical Analysis in Climate Research.* Cambridge University Press, 484 pp.

Wunsch, C., 1996: *The Ocean Circulation Inverse Problem.* Cambridge University Press, 437 pp.

Zaron, E. D., 2019: Predictability of non-phase-locked baroclinic tides in the Caribbean Sea. *Ocean Sci.*, **15**, 1287–1305, https://doi.org/10.5194/os-15-1287-2019.

Zetler, B. D., M. D. Schuldt, R. W. Whipple, and S. D. Hicks, 1965: Harmonic analysis of tides from data randomly spaced in time. *J. Geophys. Res.*, **70**, 2805–2811, https://doi.org/10.1029/JZ070i012p02805.

Zhao, Z., M. H. Alford, J. Girton, T. M. S. Johnston, and G. Carter, 2011: Internal tides around the Hawaiian Ridge estimated from multisatellite altimetry. *J. Geophys. Res.*, **116**, C12039, https://doi.org/10.1029/2011JC007045.