## Abstract

A new sea level observation network initiated by the World Ocean Circulation Experiment (WOCE) program is delivering hourly data from about 150 ocean stations worldwide. A complete analysis of these data was performed using a least squares method allowing better accuracy over time series virtually unlimited in time (up to 13 yr in this case). In addition to the classical tidal harmonic constants, the power density spectrum of the detided signal was computed. The 95% confidence intervals of the results of the analysis have been systematically estimated. The numerical and statistical methods applied to produce these numbers are presented for three stations as an illustration. One example of application of this new dataset and its error bars is the selection of a tidal constant set, which is useful for validating and comparing the long-period tide models. This selection is performed by eliminating stations where the oceanic continuum spectrum magnitude (around a given tidal frequency) is more than 25% of the corresponding tidal peak amplitude. This study intends to lead to a better exploitation of sea level observations, which contain highly valuable information in the tidal and nontidal domains. The authors conclude that the WOCE Sea Level network must be maintained, in particular for a combined use with the present and future satellite altimetry missions. Thus, the aim of this paper is to present a new analysis of the in situ sea level observations acquired within the WOCE program with a particular emphasis given to the precise determination of the tidal harmonic constants.

## 1. Introduction

Observing, understanding, and predicting sea level variations are currently primary concerns for the scientific community. Satellite altimetry missions provide us the means to observe the sea surface topography globally, accurate to within a few centimeters. Among the various processes responsible for sea surface variations, the oceanic tide represents the major contribution (more than 80% of sea level variations in terms of variance as estimated in TOPEX/Poseidon altimetric data;Ray 1993). It is clear that in high wavenumber regimes like Kuroshio, Gulf Stream, Malvinas, the sea surface variability, on a percentage basis, due to the tides is much less. Thus, in order to access the signal related to other processes such as mean sea level variations and short- and large-scale variability of the ocean circulation, it is necessary to predict the oceanic tide with high accuracy and therefore to determine the characteristics in amplitude and phase of the tidal constituents. In recent years, ocean tide models have been significantly improved, in particular through the assimilation of altimeter data in hydrodynamic models. For the Jason one-centimeter challenge (Ménard et al. 1999 manuscript submitted to *Eos, Trans. Amer. Geophys. Union*) the accuracy of the tide model solutions has to be even more precisely estimated. One way to evaluate this accuracy consists in comparing these solutions with tide gauge data. Thanks to extremely well controlled long-duration measurements, tidal gauge reading is indeed a highly effective technique to measure sea level and in particular to deduce the tidal harmonic constants. Historically deployed in coastal areas for navigation purposes, tide gauges are now a major tool for studying the ocean and its variations (Pugh 1987; IOC 1997).

Until recent years, many altimeter comparisons with tide gauge data have been performed using the 103 standard tide gauge set (ST103; Le Provost 1994, Molines et al. 1994; Andersen et al. 1995; Shum et al. 1997). In 1995, this in situ reference was used as one of the criteria to select the two tide models: CSR3.0 (see Eanes and Wahr 1996) and FES95.2 (see Le Provost et al. 1998) for reprocessing of the TOPEX/Poseidon (T/P) Geophysical Data Records. For these accuracy tests, the considered discrepancies between the different models, in terms of discrimination of their individual constituents, were lower than one millimeter. Unfortunately, the problem of the confidence interval estimates of tidal constant errors was neglected in this comparison. The study was hindered by the unavailability of these error bars and by the lack of access to the initial time series from which the tidal constituents were computed.

Now, with the World Ocean Circulation Experiment (WOCE) and its sea level component, a new and improved tide gauge dataset is available. Data have been acquired from over 150 sites in 25 countries worldwide and constitute a global dataset of unprecedented scope and accuracy. In addition to the high standards of observations, more than 70 stations have now compiled records dating back more than 13 years. This high quality material gives us the opportunity to produce a new and more precise bank of tidal harmonic constants. Furthermore, the long duration of the time series not only allows systematic computation of the error bars relative to the tidal analysis for the first time, but also provides an insight into the nonstationary component of the oceanic tide.

Thus, the aim of this paper is to illustrate through a couple of typical examples the available products of this analysis, namely the usual tidal harmonic constants plus their confidence limits and the power density spectra of the residuals with error bars. In addition, we present one application for the error bars which consists in providing data assimilation codes, such as that implemented for ocean tide modeling by Lyard et al. (1999, manuscript submitted to *J. Geophys. Res.,* hereafter LPP), with a realistic estimate of the error variance in the observations.

This paper is organized as follows: Section 2 presents the WOCE Sea Level data set. Section 3 gives an overview of numerical and statistical methods. Section 4 presents a couple of typical results. Section 5 provides one preliminary application for this new dataset: the selection of a validation set for the *M*_{f} tidal models.

## 2. The WOCE Sea Level dataset

During the recent years, the common in situ reference for global tide studies was referred to the “103 standard tide gauge set.” This was constructed, in 1994, from the original set of 80 tide gauges selected by Cartwright and Ray (1991) with various additions, corrections, and updates (Le Provost 1994). The additional or corrected data have been extracted from either the IAPSO data (Smithson 1992) or the preliminary analysis of the WOCE Sea Level time series (J. M. Molines, personal communication).

The study of the large-scale circulation in the ocean is at origin of WOCE, an international project initiated in 1990 (WRCP 1988). WOCE aims have been to acquire from satellites, ships, and autonomous instruments to obtain a basic description of the physical properties and of the dynamical parameters characteristic of the circulation of the global ocean over a limited period (1990–97). Within the scope of this project, the WOCE Sea Level network has been progressively deployed and delivers hourly time series recorded in 164 stations in the five oceans: 33 in the North Atlantic, 12 in the South Atlantic, 26 in the Indian Ocean, 55 in the North Pacific, 37 in the South Pacific, and 1 station in the Arctic Ocean (the real number of instrumented sites for the WOCE network can be slightly different as some instruments have not been deployed exactly at the same location along the observation duration; namely, the number of stations reduces to 153 if we consider as distinct sites the stations distant from each other by at least 25 km). The global WOCE Sea Level network is presented in Fig. 1. The WOCE project has been split into two phases. First, an acquisition phase from 1990 to 1997, followed by a second phase started in 1997 and scheduled to finish in 2002, consisting in the exploitation of the acquired material. At the time of writing, more than the half of the stations have record lengths exceeding 10 years. This is an exceptionally long duration for a set of this importance, which is one of the main reasons it is so invaluable. In response to a wide demand in the scientific community, this sea level network will be maintained as part of Climate Variability and Predictability (CLIVAR; WRCP 1998) scientific program and the Global Sea Level Observing System/Global Ocean Observing System (GLOSS/GOOS; IOC 1997) plans. Therefore, data distribution for these stations should (and do indeed) continue after the initial deadline of 1997.

Many stations transmit data by satellite thus allowing data collection in near–real time even for uninhabited and rarely visited observations sites, like the St. Paul Island in the South Indian Ocean. There are two delivery centres of data: the Fast Delivery Center at the University of Hawaii Sea Level Center (UHSLC), which assembles and distributes the data transmitted by satellite within 1–3 months; and the Delayed Mode Sea Level Centre at the British Oceanographic Data Centre (BODC), which assembles, quality controls, distributes, and archives all available sea level data within 18–24 months. At the present time, the UHSLC delivers data for 109 stations and the BODC for 105 stations. Data concerning about 50 stations are archived by the two centers. The time series distributed by the UHSLC generally overlap those archived in the BODC but there are some exceptions where time series are complementary such as the Ponta Delgada tidal gauge (37°44.0′N, 25°40.0′W) in the Northern Atlantic: the UHSLC delivers effectively data for the period ranging from 1994 to 1998, whereas the BODC distributes data for the period ranging from 1984 to 1993.

## 3. The method of analysis and statistics

### a. General

Tides are the major contribution to sea surface variations (up to 80% of the sea surface height variability). They are mostly generated by gravitational forces exerted by the Moon and the Sun upon the oceans (like the mean lunar tide *M*_{2} or the lunisolar *K*_{1}). It has been shown that these generative forces derived from a potential (Cajori 1962). Darwin and Adams (1995) described this tide-producing potential as a sum of circular functions of time, based on the apparent lunar and solar motions and leading to a discrete tidal spectrum including about 32 lunar terms. The Darwin harmonic expansion is still widely used, despite more precise expansions have been derived since [unlike Darwin who used the lunar orbit as reference system to express the longitude and the latitude of the Moon, Doodson (1921) expressed these quantities related to the ecliptic; he obtained purely harmonic developments including more than 400 significant constituents]. However, as mentioned before, Darwin’s spectrum, extended by Shureman (1958) to about 70 lunar terms and 59 solar terms, is the most commonly used. It is due to the simplifications offered by the nodal correction concept (see below). Abbreviated, this expansion can be expressed as a sum of long-period terms, diurnal terms, and semidiurnal terms. Each of them depends on parameters, such as *ϕ, I, e, m, T, s, a, p, ξ,* and *ν,* where

*ϕ*is the latitude of the place of observation,*I*is the inclination of the Moon’s orbit to Earth’s equator,*e*is the eccentricity of the Moon’s orbit = 0.0549,*m*is the ratio of mean motion of the Sun to that of the Moon,*T*hour angle of the mean Sun,*s*is the mean longitude of the Moon referred to equinox,*p*is the mean longitude of lunar perigee,*ξ*is the longitude in the Moon’s orbit of the intersection of the lunar orbit with the celestial equator, and*ν*is the longitude in the celestial equator of the intersection of the lunar orbit with the celestial equator.

For example, the major lunar contribution *V*_{M2} (inducing the *M*_{2} tidal constituent) to the tidal potential *V* can be formulated as

where *g* is the mean acceleration of gravity on Earth’s surface; *M*_{M}, *M*_{E} are, respectively, the Moon and Earth mass; *a* is the mean radius of Earth; *ρ* is the distance of the place of observation from the center of Earth; and *c*_{M} is the mean distance Earth to the Moon. Except *I, ξ,* and *ν,* which directly depend upon *N,* the longitude of the Moon’s node, all the terms uniformly vary with time.

The tide-producing potential created by the Sun presents the same analytical form as that of the Moon, except that the parameter *e* has to be replaced by *e*_{1}, (eccentricity of Earth’s orbit). *I* by *ω*_{s} (inclination of the ecliptic to the Earth’s equator), *s* by *h* (mean longitude of the Sun), and *p* by *p*_{1} (mean longitude of solar perigee). Here, *ξ* and *ν* are equal to 0.

Taking into account all the different contributions, the tide-producing potential *V* can generally be written as

where *M* is the number of tidal constituents.

The *V*_{k} arguments contain the linear combination of *T, h, s,* and *p* quantities. These values uniformly vary with time and thus can be expressed as

where *ω*_{k} is the frequency of the tidal constituent *k, t* is the mean solar time calculated from 1 January of the considered year, and *V*_{0k} is the value of *V*_{k} on 1 January of the considered year. The *u*_{k} quantities contain the linear combinations of *ξ* and *ν* parameters, which slowly vary with motion of the lunar ascendant node. The constant *G* is given by

The factor of latitude *λ*_{k} is equal to cos^{2}*ϕ* in the case of the semidiurnal constituents (such as *M*_{2} mentioned above).

The *S*_{k} coefficient is a product of a function of *e* or *m* [noted Φ_{k}(*e, m*)] and a function of the inclination *I* [noted Φ_{k}(*I*)]. The former is referred to as the elliptic factor because it involves the *e* or *m* quantities which are linked to the Keplerian motions of the earth and Moon. The latter is called the obliquity factor as it expresses the inclination of the Moon’s orbit to the plane of the earth’s equator:

As the inclination *I* slowly varies (due to the low variation of the longitude of the Moon’s node), it is possible to approximate the instantaneous value of Φ_{k}(*I*) by its mean value Φ_{k}(*I*) multiplied by a factor called the nodal correction coefficient in amplitude *f*_{k}. Thus, we have

The product of the elliptic factor Φ_{k}(*e, m*) by the mean value Φ_{k}(*I*) of the obliquity factor is known as the mean constituent *S*_{k} and it expresses the mean importance of the constituent *k*:

The concept of nodal corrections (through the *f* and *u* terms, which are not expanded in circular function of time) allows in fact to reduce the number of constituents in the tidal spectrum. This is equivalent to gathering together the constituents, which are not separable over a period shorter or equal to one year, therefore conserving in the spectrum only the most important constituent in the group. The neighboring constituents, of lower amplitude, are taken into account through the nodal correction coefficients, which are applied both to the amplitude and phase of the central constituent. These coefficients are thus responsible for the slow modulation of the amplitude and phase of the major constituent.

### b. Harmonic analysis based on the least squares method

The global WOCE dataset has been analyzed using an harmonic method. This consists in expressing the rise and fall of the tide as a sum of independent constituents accordingly to the tidal potential spectrum. To these astronomical tides, we must add the tides of a nongravitational origin. This is the case for the constituents of radiational origin (for instance *S*_{1}, or partially *S*_{2}), which are directly or indirectly induced by solar radiation through meteorological forcings. Last but not least, in coastal areas nonlinear constituents are generated by the nonlinear processes of the tidal wave propagation in shallow waters (bottom friction, continuity, and advection). As a consequence, nonlinear constituents such as *M*_{4} or *M*_{6} are, respectively, generated by advection of *M*_{2} constituent energy or by friction of *M*_{2} constituent on the bottom. Some of these nonlinear constituents can have exactly the same frequencies as astronomic constituents, in which case they are called mixed constituents (like *μ*_{2} and 2*MS*_{2}; Le Provost 1974;Pugh 1987). In the real world, the harmonic expansion is somehow artificial in that some tidal constituents cannot, dynamically speaking, be dissociated, as is the case for *M*_{2} and *M*_{4} for example. But in practice, it remains a valuable approach. Accordingly to the harmonic method, we express the sea surface elevation at a point (*x, y*) and time *t* by

where

*M*is the number of constituents,*S*_{0}(*x, y*) is the mean sea level,*S*_{k}(*x, y*) is the amplitude of the constituent of index*k,**G*_{k}(*x, y*) is the phase lag relative to Greenwich time,*ω*_{k}is the angular frequency of the constituent of index*k,**V*_{0k}is the astronomical argument at time origine*t*_{0},*f*_{k}(*t*) is the nodal correction coefficient applied to the amplitude of the constituent of index*k,*and*u*_{k}(*t*) is the nodal correction coefficient applied to the phase of the constituent of index*k.*

Among these parameters, *ω*_{k}, *f*_{k}(*t*) and *u*_{k}(*t*) are given by the development of the tidal potential, and therefore only *S*_{k}(*x, y*) and *G*_{k}(*x, y*) still remain to be determined. The amplitude *S*_{k} and the phase lag *G*_{k} are the so called harmonic constants of the constituent *k* of the tide. As ocean tide models become increasingly more accurate, the estimation of the accuracy of the harmonic constants derived from the tide gauge recordings is becoming significant. In particular, the accuracy of the predicted tide depends on the number *M* of constituents included in the analysis. In practice, we have defined an a priori tidal spectrum including the most significant astronomical constituents in the Darwin development (following Shureman 1958) completed by 33 nonlinear constituents. This a priori spectrum all together contains 77 constituents.

The temporal variations of the nodal coefficients have a very long period (18.61 yr), and are therefore traditionally recomputed once for all for a given year of interest in prediction or analysis applications. Nevertheless, in order to avoid any source of inaccuracy in our tidal analysis, even minor ones like the approximation on the variations of these coefficients, we have computed them for each day. In practice we set a new time origin *t*_{0} at the beginning of the day, and we recompute the three astronomical arguments *V*_{0}, *u,* and *f* at this time origin. The coefficients of the nonlinear constituents are deduced from those of the astronomical constituents, using the nonlinear theory on tidal wave propagation in coastal areas (Le Provost 1991).

We favor the least squares method for one main reason, namely that it allows analysis of incomplete time series where the spectral methods require that the time series be completed either by replacing the missing data by zero or by the result of an interpolation where the spectral methods call for the addition of artificial information not in the initial time series. On the contrary, the least squares method generates more processing overhead compared to the other methods. Also for optimisation purposes, the latter generally require to truncate the time series, thus losing information (i.e., spectral methods are really faster when the number of data is equal to the product of elementary prime numbers). Generally speaking, the difference, in terms of efficiency, between the least squares and the spectral methods is not striking when the number of observation is random (and thus likely to be far from the result of a simple product of prime numbers).

The least squares method consists in seeking the best fit of the tide gauge reading *S*_{ob}(*x, y, t*) by a representation *S*_{ap}(*x, y, t*):

In order to determine low amplitude constituents more accurately, we use an iterative process based on the analysis of the residuals [(*S*_{ob}(*x, y, t*) − *S*_{ap}(*x, y, t*)] until certain stop criteria are satisfied. This increases the precision of the linear system resolution by reducing the energy of the second member ɛ. In theory, because of the linearity of the system to be solved, the exact solution should be obtain in one step. However, in the case where the record length is barely sufficient to distinguish two tidal constituents of close frequencies, or where the number of missing data is large, the matrix conditioning may be weak. In such cases, the large magnitude of the main constituents is responsible for a computational pollution of the smaller constituents. The analysis of the residuals allows for a levelling of the right-hand-side vector, and thus for a minimization of the numeric error. Two criteria are checked and the analysis is stopped when at least one becomes true: 1) the number of iterations is greater than 50. This limit is set only to stop the analysis if the process does not converge. In practice, the typical number of iterations is less than 10. 2) The sum of residuals–squares varies less than 10^{−6} cm^{2}, compared to the previous step.

Several criteria essentially based on the record length *τ* of the time series are applied to select the constituents taken into account in the analysis. First, we eliminate non-linear constituents which have the same frequency as some of the constituents induced by the tidal potential. Second, constituents with excessively low frequency, for example, lower than *τ*^{−1}, are discarded. Third, we apply the usual Rayleigh criterion to estimate the minimal observation period necessary to separate two tidal constituents. Two constituents of frequencies *ν*_{i} and *ν*_{j} must satisfy:

When two constituents are not separable over the period *τ,* the following arguments are applied to select the constituent to eliminate 1) in case of two nonlinear constituents, the constituent of expected lower amplitude is ignored; 2) in case of one astronomical constituent and one nonlinear constituent, the former is kept, 3) in case of two astronomical constituents, we introduce additional hypotheses based on physical considerations. Similarities in phase and amplitude exist between the constituents of similar origin and close frequencies (credo of smoothness; Cartwright et al. 1988). Therefore, these purely local facts allow us to complete our harmonic method by the use of a rough response method. This formalism introduces the notion of major (*p*) and secondary (*s*) constituents. The amplitude ratio is referred to as *k*_{sp}, and the phase lag between the two constituents as Δ*G*_{ψ}. They can be deduced either from the tidal generating potential or from a pre-existing set of harmonic constants. Therefore, the analysis of the major constituent *p* leads to the determination of the secondary constituent through to the following constraints:

### c. Estimation of error of tidal parameters

The least squares analysis supplies a set of mean harmonic constants representative of the tidal signal over the whole observation duration *τ.* The results of the analysis are corrupted by the measurements errors and by the error due to the contamination of the tidal signal by the oceanic continuum (contamination must be understood from a tidalist point of view). In addition, the tidal signal, which is usually considered as strictly periodic, can be slightly modulated in time. Effectively, it may be rectified during some short periods (from a few days to a few months) because of the interaction with meteorologically forced processes, or can show some seasonal to annual fluctuations linked to the ocean climatology. Strictly speaking, these phenomena do not induce an error but make it difficult to provide a complete parameterization. We have chosen to produce a mean value (i.e., the value obtained by analysing the whole time series) for the amplitude and phase of each constituent, complemented by a confidence interval based on statistics computed from annual subanalysis. In order to compute the annual values of *S* and *G* for each constituent, we perform the harmonic analysis on the n annual time series obtained by splitting the whole (*n* years) time series. The temporal variations of the harmonic constants are estimated by applying statistical considerations on these samples. For the amplitude and phase of each constituent, we estimate the confidence interval with a confidence level (1 − *α*) such that we could affirm that the mean quantity (amplitude or phase) varies over the interval [*a, b*] with a given likelihood *α* of error. The confidence interval estimation is based on the use of the central limit theorem, which supposes that the distribution of the *n* annual values of *S* or *G* is close to a normal distribution *N*(0, 1). Therefore, we have computed for each constituent of index *k* the standard deviation which is defined for a finite set of *n* separate analysis as

where the quantity *X*^{k}_{j} is either the amplitude *S* or the phase *G* of the tidal constituent of index *k* for the year *j.* We deduce the confidence interval with a confidence level (1 − *σ*), associated to each tidal constant:

where *μ*_{α/2} = 1.96 for *α* = 5% and *μ*_{α/2} = 2.58 for *α* = 1% according to the normal distribution *N*(0, 1).

These confidence intervals depend only on the number of samples. Therefore, it appears clearly that the longer the multiyear time series, the larger the number of samples and thus the more accurate the confidence limits. This underlines the necessity to maintain as long as possible the WOCE Sea Level network. In our case, these limits were estimated with a 95% confidence level. It should be noted that these error bars correspond to the uncertainty regarding the determination of the mean tidal signal and not the interannual variations of the tidal signal. Therefore, to obtain the interannual variations of the tidal signal one can multiply the upper and lower limits by *n**μ*^{−1}_{α/2}.

For different reasons, we have also chosen to compute the Fourier spectrum of the detided signal. The main reason for this is to provide an internal check for the tidal analysis. Namely, we systematically analyze power density spectra of the residuals to check that we correctly remove the tides from the initial signal or not. Peaks in the residuals spectra may indicate that the analysis is not able to extract all of the tidal signal at the considered frequency (problem of separation, omission of constituents in the a priori spectrum, etc.). In addition to the classical data quality control based on the data residuals edition (usually done by the collector and archive center), the examination of the residuals spectrum can tell us about anomalies in the time series. In particular it can help to detect any problem with the time reference (such as an erroneous UTC zone labeling, or clock drift) by checking the level of the residuals signal around the tidal peaks. In order to compute the power spectral density (PSD) of the residuals, we use a discrete fast Fourier transform where missing data have been replaced by a zero value. Squaring the modulus of the Fourier coefficients, we obtain the powers of the residual signal averaged over the frequency interval [*ν* − (½)*δν*;*ν* + (½)*δν*], where *δν* = *τ*^{−1}. This confirms once again the usefulness of long-duration observations in approximating the theoretical case.

The spectra of the sea level records do not only contain the tidal peaks but also the continuum due to the background nontidal variability. Therefore, to estimate the significance of the remaining peaks after analysis we need to systematically determine the error bars associated with this background spectrum. It is hard to identify a single approach that works better than any other. We consequently pursued two distinct approaches to estimate the error bars.

The first approach is based on the assumed statistical properties of the discrete spectral density. Considering *P*_{j} the true spectral density at frequency *j* and *P̃*_{j} the sample spectral density at the same frequency, it supposes that the sample estimate *P̃*_{j} is unbiased (i.e., on average over many realisations, is equal to the true value *P*_{j}) and inconsistent (i.e., the scatter in individual realisations does not decrease as record length increases. Thus *P̃*_{j} from a long record is no more reliable than from a short record. The most important hypothesis is that the expected scatter in sample estimates is as large as the true value. In addition, it supposes that the sample spectral density estimates are chi square distributed with two degrees of freedom. The confidence interval on the power spectra can be expressed as

Therefore, the confidence intervals depend on the sample spectral density at each frequency. This problem of variable length of the confidence intervals is considerably simplified by taking the logarithm

Then the confidence range on the logarithm of the PSD is

It is independent of either *P̃*_{j} or *P*_{j}, and therefore is the same at all frequencies in a logarithm representation. In our case, we chose to estimate these error bars with a 95% confidence level. This led to a unique confidence interval on the PSD of [0.271, 39.525]. The main advantages of this approach are that it does not alter the signal and it generates a unique error bar. The drawback lies in the extent of the error bar, which is globally overpessimistic. Nevertheless, this approach is very useful to test the significance of a peak in the residuals, in that peaks estimated from this criterion are undoubtedly significant. On the other hand, peaks flagged as insignificant could be reestimated.

The second approach consists in filtering the PSD and therefore implies the alteration of the signal. Nevertheless, at this step, most of the high-frequency signal should be removed and therefore the effect of smoothing remains limited. We use a rectangular window of width [*f* − (^{1}/_{6})month; *f* + (^{1}/_{6})month]. A confidence interval relative to the central frequency of each window is computed with a 95% confidence level. This approach supplies information about the level of energy of the oceanic signal around the frequency of the considered tidal constituent. Indeed, the mean PSD of the residuals represents a good approximation of the oceanic continuum. In the following, we will refer to this level of energy as the noise level in the vicinity of the considered tidal constituent. The potential level of the tidal signal contamination by the oceanic continuum can indeed be evaluated by comparing the squared amplitudes for each constituent to the averaged spectra. This is one way of detecting energy transfers between the oceanic continuum and the tidal signal such as through tidal cusps.

## 4. Illustrations for a set of typical stations

In order to illustrate the usefulness and the efficiency but also the limits of the applied method, we selected three stations in separate oceanic basins: Rikitea (southwest Pacific), Pohnpei (northeast Pacific), and Ammassalik (North Atlantic). The complexity of the signals recorded at these distinct stations varies according to local and regional conditions. Indeed, this complexity is related to the bathymetry, the meteorological effects, or the presence of ocean currents in the vicinity.

### a. Rikitea: A low–noise level station

The Rikitea station is located in the west Pacific at 23°07.5′S and at 134°57.2′W. This station is typical of a long-duration observation in the WOCE Sea Level network (13.5 yr from 1985 to 1998). The values of the tidal harmonic constants are presented in Table 1 for the eight major astronomical constituents as well as the fortnightly and monthly constituents. The confidence limits translating the interannual variations are also reported in this table. The tidal constant variations are lower than 0.5 cm for the amplitudes. For the phases, the fluctuations are the order of a few degrees. These values indicate a very stable tidal signal for the Rikitea station. The power density spectrum of the residuals are shown in Figs. 2a–d. The top-hand diagram (see Fig. 2a) shows the power density spectrum of the residuals whereas the Figs. 2b,c, respectively, give enlarged views of the long-period, diurnal, and semidiurnal frequency bands. To make easier the comparison between the tidal constituent signal and the oceanic continuum, we have also plotted vertical lines translating the energy associated with each tidal constituent (i.e., squared physical amplitude of the tidal constituent). Consistently with the Table 1, Fig. 2a exhibits a very low signal in the high frequency band (if we except the tidal peaks themselves, of course). The energy associated with the nontidal signal is lower than 0.1 mm. In the diurnal band (Fig. 2c), all the tidal constituents are significantly above the noise level. If we translate the error bar along the black solid curve (corresponding to the smoothed power density spectrum, e.g., to the oceanic continuum), the energy associated with the tidal constituents appears always higher than the energy of the oceanic continuum. This indicates without doubt the physical significance of the tidal peaks. In the semidiurnal band (Fig. 2d) the picture is not exactly the same. The tidal cusps, that is, the local amplification of the oceanic continuum due to the transfer of energy between the tidal signal and the neighbouring oceanic signal, are responsible for a higher level of residual signal, essentially around the *M*_{2}, *S*_{2}, and *N*_{2} frequencies. Therefore, if we consider the Chi2-based estimator, the nonlinear constituents *M*(*KS*)_{2}, *M*(*SK*)_{2}, and *MSK*_{2} may be not significantly above the surrounding oceanic signal. Nevertheless, when considering the 95% confidence intervals indicating the upper and lower limits of variations in the mean oceanic continuum, we come to the conclusion (see Fig. 3d) that the values for constituents *M*(*KS*)_{2}, *M*(*SK*)_{2}, but not for *MSK*_{2}, are significant. In the low frequencies (see Fig. 2b), due to the over pessimist nature of the first estimator, it would tend to indicate that the *M*_{f} constituent is the only one which is without any doubt above the noise level when all the other long period tides (LPTs) are obscured by the background nontidal variability. On the other hand, the second estimator (see Fig. 3b) indicates that *M*_{f}, but also *M*_{m} and *M*_{qm} are significantly above the noise level. This example is a typical application of the complementary uses of the two estimators.

The mean noise levels deduced from the power spectral density in the vicinity of the major constituents are presented in Table 1. These are a good indicator of the contamination of the tidal signal by the oceanic continuum, and therefore a good estimator of the accuracy of the harmonic constants. For this station, the noise levels are very low (<1 mm), confirming the high quality of the analysed constants.

### b. Pohnpei: A more complex site

The second station shown here is Pohnpei station (6°59′N, 158°14′E) in the Pacific Ocean. Thirteen years of data are available between 1985 and 1998 at the Hawaii center. The BODC has archived sea level data only at the moment for the period starting in 1985 up to 1993. The signal is more complex at this station than it is at Rikitea (see Figs. 4a–d). A much larger amount of energy, corresponding to the nonlinear tidal harmonics, appears in the power density spectrum at the very high frequencies (see Fig. 4a and Fig. 5a). In particular, the energy relative to the quarter diurnal constituents and to the diurnal constituents are of the same order of magnitude. The main harmonic constants are given in Table 2. We can clearly observe the tidal cusps in the semidiurnal frequency band on Fig. 4d and Fig. 5d but also in the diurnal band (see Fig. 4c and Fig. 5c). About 10-mm^{2} energy is associated to these phenomena in the diurnal band, which corresponds to a magnitude 10 times larger than the one measured at Rikitea. This indicates the existence of higher nonlinear processes at Pohnpei. A larger continental shelf at Pohnpei than at Rikitea, combined with a stronger tidal signal in terms of amplitude and currents, may explain this feature. The noise level is about 1 mm for the four semidiurnal constituents and for the major diurnal constituent *K*_{1}. Consequently, the tidal harmonics for minor constituents such as *π*_{1}, *ϕ*_{1} close to *K*_{1} or *MSK*_{2}, *MKS*_{2}, *M*(*SK*)_{2}, *M*(*KS*_{2}) close to *M*_{2} or *T*_{2}, *R*_{2} in the vicinity of *S*_{2} are probably strongly biased. The noise level around these frequencies is effectively of the same order as their amplitudes. For example, using the harmonic analysis method combined with the smoothed tidal spectrum, the amplitude of *ϕ*_{1} is estimated at 0.4 cm, and the noise level is about 0.1 cm. With regard to the second estimator (see Fig. 5d), the energy values for *MSK*_{2}, *M*(*SK*_{2}), and *M*(*KS*_{2}) are significant but *MKS*_{2} is considerably under the lower bound of noise. All the analysed long-period tides (see Fig. 4b) are barely distinguishable from the surrounding noise except for *M*_{f} and *M*_{m}, but only *M*_{f} is undoubtedly above the noise level. Considering the second estimator (see Fig. 5b), we can complete the picture by saying that *M*_{m} is also above the noise level. In addition, we note that *MS*_{f} and *M*_{tm} stand out from the surrounding oceanic signal.

### c. Ammassalik: A “short” record station

Ammasalik station (65°36′N, 37°37′W) in the Northern Atlantic delivers data for the period ranging from 1990 to 1995. Unlike the first two examples, data are exclusively distributed by the BODC. The values of the harmonic constants and their interannual variations are given in Table 3. After detiding, the level of energy associated with the residuals is still high (see Figs. 6a and 7a) Compared to Rikitea the energies of the residuals at the two stations differ by a factor of ten. Therefore, many constituents have no significant values in the different frequency bands (see Figs. 6b–d). In particular, all the diurnal (except *O*_{1}, *P*_{1}, *K*_{1}) and long-period constituents are totally obscured by the nontidal signal (see Fig. 6c) Considering the second estimator (see Fig. 7c), the diurnal problem is completely solved. Namely, all the diurnal constituents are above the upper limit of the confidence interval. But, determining the long-period constituents is impossible even if we use this second estimator (see Fig. 7b). In fact, the oceanic signal levels are about 3 cm around the *M*_{f} and *M*_{m} frequencies, which are more or less higher than the expected amplitudes of the long-period tide amplitudes in this region. When considering the low-frequency spectrum on Fig. 7b, it is clear that there are no significant tidal peaks for *M*_{m} and *M*_{f}. The *MS*_{tm} and *MS*_{m} amplitudes seem to emerge from the oceanic continuum. For these two constituents, this is probably the sign of a strong non-linear constituent (Wunsch 1967). As a result, the harmonic constants of the long-period tides are meaningless which is confirmed by the strong year to year variations for these constants reported in Table 3 (the phase fluctuations are so large that the values have not been reported in the table).

## 5. An example of application of the tidal analysis database: The selection of a validation set for the *M*_{f} tide

Initiated by the T/P program, progress in global tidal modeling has focused largely on the major tidal constituents, which have the main impact on the accuracy of oceanic tide predictions. Progressively, it has been recognized that the prediction spectrum needed to be extended by including minor constituents (see Molines et al. 1994; Shum et al. 1997). Also the practice of using the equilibrium approximation for correcting the LPT has been questioned for similar reasons. Recent developments in tidal models have allowed for the production of some of these additional tidal constituents that need to be validated by being compared with in situ data. For all these new constituents, the lower amplitude of the tidal signal is an extra difficulty compared to the modeling of the main constituents. The picture is even worse for the LPT that are mixed with a strong nontidal signal (typically 1 mm to 1 cm PSD depending on the region;see section 4 for example). As a consequence, the LPT harmonic constants computed from the time series are likely to be polluted to a greater or lesser extent by the nontidal energy at a significant level. In practice, it is not possible to separate the tidal signal and the nontidal signal at the tide frequency. But, as mentioned in section 3c, it is possible to estimate the level of energy of the nontidal signal in the vicinity of the tidal peak from the smoothed PSD. Using this technique, we selected a validation set for the *M*_{f} tidal constituent. To do so, we must find the right balance between an acceptable noise to signal ratio (i.e., nontidal signal to tidal signal ratio) in the data and a satisfactory geographic distribution of the data. The maximum noise to signal ratio was set at 25%. This is rather low even so applying this criterion eliminates almost all of the stations of the mid and high latitudes but leaves nearly untouched the set of stations in the equatorial and tropical bands (see Fig. 8). The conclusion of this application is that this technique is more than useful for selecting significant data for validation and assimilation purposes. Unfortunately, the resulting validation set is not suitable for global validation. A different approach is still to be found for the mid and high-latitude problem. The approach can be applied to *M*_{f} but also to the other LPT, to nonlinear constituents such as *M*_{4} or to low-amplitude tidal constituents such as 2*N*_{2}.

## 6. Conclusions

The time series of the sea level observation network initiated by the WOCE program have been entirely reanalysed in order to produce a highly accurate harmonic constants set. In addition, we have estimated the error bars associated with the constants and computed the Fourier spectra of the residuals (after detiding the sea level records). As with the harmonic constants, the confidence intervals on the residual power density spectra have been estimated, using two different methods. Recently, these products have been made available on the WOCE CD-ROM (WOCE meeting, Halifax, Nova Scotia, 1998). This database has been labeled WSLA 98 (standing for WOCE Sea Level Analysis, 1998). In the future, the WOCE time series will be regularly reanalysed and the products will be made available on the LEGOS anonymous ftp site (for any information contact Techine P.: techine@pontos.cst.cnes.fr).

This work represents an original and unprecedented effort, at this scale, to improve the information delivered with the harmonic constants. For instance, one major application of the data error bars is to supply data assimilation models with realistic estimates of the observation errors. They also make it possible to improve tidal model validation by increasing the precision of the comparison tests. Namely, with the predictable improvements of the tidal models, it will be a great help to have more discriminative tests based on the tide gauge data set. As shown in the above section, the knowing of the nontidal signal energy allows us to estimate whether or not a given tidal constant is significant (assuming, as in the case of *M*_{f}, that a minimum 25% noise to signal ratio is required for a constant to be qualified). More generally, it will allow us to select validation and assimilation data for minor tidal constituent simulations.

In fact, we are just at the beginning of a wider exploitation of the sea level data. We have focused in this paper on the high-frequency terms because our main interest is in tides, but they will be used in conjunction with satellite altimetry for many other scientific investigations on ocean dynamics at interannual to decedal (up to multidecadal) timescales. The sea level observations contain the signature of the ocean’s internal processes, and we need very long time series to study and understand the interannual and decadal variability of the oceans, especially in connection with climate changes. In the context of global ocean dynamics, the application to the monitoring of the El Niño events was at the beginning of the TOGA sea level gauge deployment (Wyrtki 1979). For the long-term monitoring of the mean sea level, tidal gauge measurements appear to be an essential tool when used in conjunction with DORIS readings or/and GPS measurements (Cazenave et al. 1999; Neilan et al. 1998). On the other hand, the now recognized high quality of the WOCE network gives us the capability to control and validate the altimetric missions not only at few calibration sites but on a nearly global scale. In conclusion, the WOCE Sea Level readings are not only a very rich parameter for ocean studies but also play an important role for present and future altimetric missions. There is now a strong consensus in the scientific community to maintain this network in the future and we firmly support this consensus.

## Acknowledgments

We thank the WOCE/TOGA data centers at the University of Hawaii Sea Level Center and at the British Oceanographic Data Centre for delivering tide gauge data.

## REFERENCES

## Footnotes

*Corresponding author address:* Dr. Florent Lyard, LEGOS, UMR 55/66, 18 Avenue E. Belin, 31401 Toulouse, Cedex 4, France.

Email: Florent.Lyard@cnes.fr