## 1. Introduction

During the last 25 years, the study of extreme ocean waves (also known as “rogue waves” or “freak waves”) has experienced a renaissance, triggered by the observation of the 25.6-m-high New Year wave at the Draupner oil rig in 1995 (Haver 2004). By now, there are several known mechanisms to generate much higher waves than predicted by linear theory (Adcock and Taylor 2014; Kharif and Pelinovsky 2003; Slunyaev et al. 2011; Dysthe et al. 2008), most of which rely on either highly nonlinear effects like Benjamin–Feir instability (e.g., Gramstad et al. 2018) or weakly nonlinear corrections to the Rayleigh wave height distribution (e.g., Toffoli et al. 2010).

However, while there is plenty of experimental evidence for these mechanisms in wave tanks and simulations, the relative importance of these processes in the real ocean is still unknown. This is evidenced by the rich spectrum of studies emphasizing different physical causes of rogue waves (Janssen and Bidlot 2009; Toffoli et al. 2010; Gemmrich and Garrett 2011; Xiao et al. 2013; Fedele et al. 2016; Gramstad et al. 2018; McAllister et al. 2019). This has the consequence that, so far, there is no reliable forecast for rogue wave risk (see also Dudley et al. 2019), although there have been some recent efforts (Barbariol et al. 2019).

There are several studies that aim to relate sea state parameters to rogue wave occurrence (Cattrell et al. 2018; Casas-Prat and Holthuijsen 2010; Karmpadakis et al. 2020; Gemmrich and Garrett 2011), but they are limited by the amount of data analyzed (often only one or several storms), their coverage of parameter space (often only one or two parameters), or the sophistication of the analysis (often no uncertainty analysis). To our knowledge, no study has been able to show the dependence of rogue wave occurrence on sea state (or show that it does not exist) with statistical significance throughout a wide regime of sea states.

We attribute this shortcoming to a lack of sufficient amounts of well-curated, accessible data on one hand, and a lack of a sophisticated analysis framework that handles nonlinearities and feature interactions on the other hand. In this study, we address the first issue and present the Free Ocean Wave Dataset (FOWD).

Particularly since the advent of machine-learning competitions—e.g., via the platform “Kaggle” (kaggle.com), where teams compete to find the best-performing machine-learning solutions to domain-specific problems—freely available, high-quality datasets have become an invaluable resource both as benchmarks for machine-learning researchers and as study objects for domain experts. Enabling easy access to domain-specific data allows even non–domain experts to participate in model building, to the benefit of the whole research community. We therefore also see this work as an important stepping-stone toward opening extreme wave research to a wider, potentially more machine-learning-literate, audience.

While we will be using rogue waves as a motivating example throughout this publication, other researchers can and should of course use FOWD to study phenomena other than extreme wave/crest heights (e.g., wave steepness or characteristic shape). In essence, FOWD relates aggregated sea state parameters to individual wave measurements. Applications are therefore plentiful.

As a primary data source for this version of FOWD we use the Coastal Data Information Program (CDIP) buoy data catalog. CDIP is a buoy network consisting primarily of Datawell Directional Waverider buoys for wave monitoring around the coasts of the United States (see, e.g., Behrens et al. 2019). The CDIP catalog (as of November 2020) contains measurements at 161 locations along the west and east coasts of North America and U.S. overseas states and territories like Hawaii, Guam, Puerto Rico, and the Marshall Islands.

Section 2 describes FOWD in detail, particularly which parameters are included, how they are computed, and which quality control processes we employ to validate the results. Section 3 outlines our Python reference implementation that allows us to efficiently process massive amounts of raw data, and section 4 describes the processing of the CDIP buoy data catalog. Section 5 gives an example application in which we look at how rogue wave probabilities vary depending on various sea state parameters. Section 6 gives a summary and concluding remarks.

The FOWD–CDIP dataset is freely available for download (https://doi.org/10.17894/ucph.c589422c-64fd-4585-af31-4571497bcbe5; see also the data availability statement).

## 2. The FOWD specification

At its core, FOWD describes a mechanism to process raw observations (elevation time series and, optionally, directional spectra) into a catalog that maps parameters describing the current sea state *x* to observed wave or crest parameters *y*.

By “wave” we denote the series of surface elevations (relative to the 30-min mean elevation) from a given zero upcrossing to the next zero upcrossing. The crest and trough are then the maximum and minimum elevation of the wave, respectively, and the wave height is the sum of its crest height and trough depth. Some waves might be excluded by quality control criteria; see section 2c.
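The wave definition above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper names `find_upcrossings` and `split_waves` are hypothetical, the input is assumed to already have its mean removed, and zero crossings are located at sample indices rather than by interpolation.

```python
import math


def find_upcrossings(eta):
    """Return indices i where eta goes from negative to non-negative between i and i + 1."""
    return [i for i in range(len(eta) - 1) if eta[i] < 0 <= eta[i + 1]]


def split_waves(eta):
    """Split a zero-mean elevation record into waves (upcrossing to upcrossing)."""
    idx = find_upcrossings(eta)
    waves = []
    for start, stop in zip(idx[:-1], idx[1:]):
        segment = eta[start + 1:stop + 1]  # samples belonging to this wave
        crest = max(segment)               # maximum elevation of the wave
        trough = -min(segment)             # trough depth as a positive number
        waves.append({"crest": crest, "trough": trough, "height": crest + trough})
    return waves


# Example: a sinusoid with 1.5 m amplitude, sampled 20 times per period
eta = [1.5 * math.sin(2 * math.pi * (i + 0.5) / 20) for i in range(100)]
waves = split_waves(eta)
```

For this sinusoid every detected wave has a height slightly below 3 m, because the discrete samples do not hit the exact crest and trough.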

Throughout this study, we characterize extreme waves on the basis of their abnormality index AI = *H*/*H*_{S}, with wave height *H* and spectral significant wave height *H*_{S} = 4(*m*_{0})^{1/2}, where *m*_{0} is the zeroth moment of the spectral density [see also section 2a(2)].

FOWD output files are in netCDF4 format, which is widely used throughout the sciences and allows additional metadata to be attached. Every row in the resulting netCDF4 file represents a single wave and the sea state in which it was recorded.

Section 2a introduces the various quantities included in FOWD output and gives a more in-depth description of the computation of some parameters (where estimation is nonobvious or ambiguous). Section 2b describes the running-window processing approach we use in FOWD. Section 2c lists our quality control (QC) criteria, and section 2d outlines the steps we take to ensure reproducibility of FOWD output files.

### a. Computed quantities

We group all output quantities into four categories:

- *Station metadata* are anything that is specific to the sensor (and is not directly related to waves or the sea state). This includes both metadata describing the raw data source (to ensure reproducibility; more in section 2d) and the conditions in which it was recorded (latitude/longitude and water depth).
- *Wave-specific parameters* are all quantities that describe a single wave, such as wave height or maximum slope. A typical study using FOWD aims to determine how a wave-specific parameter depends on one or several sea state parameters.
- *Aggregated sea state parameters* describe the circumstances in which each wave occurred; that is, they relate to the past sea state of each wave. They are computed from the immediate 10- and 30-min history prior to (but not including) the current wave (see also section 2b for more on this running-window approach). Quantities are computed using only the raw sea surface elevation as input (either directly or by computing a spectrum first).
- *Directional sea state parameters*: Some sensors (like the CDIP buoys) might include additional directional information that is not computable from the raw surface elevation time series. When such directional information (in the form of a directional spectrum) is given, FOWD computes some directional parameters from it and includes them in the output. Note that this does *not* use the same running-window approach as the aggregated sea state parameters. Instead, each wave is mapped to the nearest (in time) available directional measurement. That is, directional information usually includes some information relating to the *future* of the wave. But since directional information is robust to the influence of individual extreme events, we do not consider this a problem.

A complete overview of all computed quantities is shown in Table A1 in the appendix. Here, we outline some important quantities (as suggested in literature) and how they are estimated from the observed time series.

#### 1) Space–time domain transformations

We convert frequencies *f* to wavenumbers *k* (and by extension, periods to wavelengths) through the dispersion relation for linear waves:

$$(2\pi f)^2 = g k \tanh(k D), \tag{1}$$

with water depth *D* and gravitational acceleration *g* = 9.81 m s^{−2}. This also assumes the absence of currents.
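The linear dispersion relation, $(2\pi f)^2 = g k \tanh(kD)$, has no closed-form solution for *k*, so it has to be inverted numerically. The following is a minimal sketch using Newton iteration; the function name `wavenumber`, the tolerance, and the choice of Newton's method are assumptions made for illustration and are not taken from the FOWD reference implementation.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2), as stated in the text


def wavenumber(f, depth, tol=1e-12, max_iter=50):
    """Solve (2 pi f)^2 = g k tanh(k D) for the wavenumber k (rad/m)."""
    omega2 = (2.0 * math.pi * f) ** 2
    k = omega2 / G  # deep-water solution as the first guess
    for _ in range(max_iter):
        t = math.tanh(k * depth)
        residual = G * k * t - omega2
        # derivative of g k tanh(k D) with respect to k
        slope = G * t + G * k * depth * (1.0 - t * t)
        step = residual / slope
        k -= step
        if abs(step) < tol * k:
            break
    return k


k_intermediate = wavenumber(0.1, 50.0)   # 10-s wave in 50 m of water
wavelength = 2.0 * math.pi / k_intermediate
```

In deep water the result approaches $(2\pi f)^2 / g$, which gives a quick plausibility check for any implementation.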

#### 2) Spectral density estimation

To compute spectral quantities, we need to estimate the spectral density $S(f)$ from the raw surface elevation time series. There is no unique way to do this, and any given method is a trade-off between spectral resolution, bias, and variance (noise).

In FOWD, we chose to use Welch’s method (Welch 1967) with a window length of 180 s and a window overlap of 50% using a Hann window (also known as a Hanning window). This corresponds to about 230 measurements per segment in the case of CDIP data with sampling frequency 1.28 Hz. This implies that the 30-min spectra are an average of 19 individual segments and the 10-min spectra are an average of 5 segments. All segments are zero padded to the next highest power of 2. This gives a spectral resolution of 0.005 Hz and a maximum (Nyquist) frequency of 0.64 Hz for 1.28-Hz CDIP data.
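The segment length, zero-padded length, and frequency resolution quoted above follow from simple arithmetic, sketched below. The function name `welch_parameters` and the truncation of 230.4 samples to 230 are assumptions for illustration; the actual rounding used by FOWD is not stated in the text.

```python
import math


def welch_parameters(sampling_rate_hz, window_seconds=180.0):
    """Derive Welch segment parameters from the sampling rate."""
    nperseg = int(window_seconds * sampling_rate_hz)  # samples per segment (truncated)
    nfft = 2 ** math.ceil(math.log2(nperseg))         # zero-pad to the next power of 2
    return {
        "nperseg": nperseg,
        "nfft": nfft,
        "resolution_hz": sampling_rate_hz / nfft,     # spacing of the frequency bins
        "nyquist_hz": sampling_rate_hz / 2.0,         # highest resolvable frequency
    }


params = welch_parameters(1.28)
```

For 1.28-Hz data this yields 230 samples per segment, padded to 256, and hence a resolution of 1.28/256 = 0.005 Hz.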

#### 3) Wave period and steepness

There are several popular approaches to define a dominant wave period for a given sea state. Depending on the application, either peak period, spectral mean period, or mean zero-crossing period may be more appropriate. Also, since we only have access to a noisy estimate of the true spectral density $S(f)$ [...]

[...] *t*_{i} refers to the zero-crossing periods of all waves in the corresponding surface elevation slice (zero crossings determined by linear interpolation) and the expression for [...]

[...] *ϵ* we use the peak wavenumber *k*_{p}, approximated from the peak period (6) and dispersion relation (1), following Serio et al. (2005): [...]

#### 4) Spectral bandwidth and Benjamin–Feir index

Broadness is problematic because of the occurrence of *m*_{4}, the fourth moment of the spectral density $S(f)$. Because of the *f*^{4} term occurring in its estimation, broadness is extremely sensitive to the high-frequency tail of $S(f)$ [...]

[...] *ϵ*, bandwidth *σ* (which could be any of the three definitions above), peak wavenumber *k*_{p}, and depth *D* as [...]

[...] *σ* estimated through both narrowness and peakedness [as defined in (10)].

#### 5) Crest–trough correlation

Tayfun (1990) suggests another key parameter to describe wave height distributions, the correlation coefficient *r* between squared crest height [...] *r* is closely related to spectral bandwidth (as, for narrowband seas, crests and troughs are approximately of the same size, becoming increasingly chaotic/uncorrelated as more harmonics are added). By extension, it is also a measure for the tendency of the sea state to form wave groups (Fig. 1).

[...] *r* via [...] *ω* = 2*πf* is the angular frequency.

#### 6) Spectral partitioning

[...] *f*_{i}), each characteristic for a different physical regime (Table 1). [This is a crude way to perform spectral partitioning as compared with more-sophisticated approaches that take directionality into account (Portilla-Yandún et al. 2016; Portilla-Yandún 2018). However, this simple integral is straightforward to compute and interpret, and can be estimated using only a surface displacement time series.]

Table 1. Frequency bands used by FOWD and their approximate corresponding physical regime [as, e.g., given in Holthuijsen (2010)]. Here, and elsewhere, ID is identifier.

[...] *ρ* = 1024 kg m^{−3} and gravitational acceleration *g* = 9.81 m s^{−2}.

#### 7) Angular integrals

To make it possible to investigate the dependence of waves on phenomena like swell-wind sea crossing angles, we also split directional quantities into five distinct frequency bands, analogously to spectral energy content (Table 1). Since directional spread and wave direction are measured as an angle, we need to take special care when averaging these quantities. Furthermore, we want to weight the directional value at each frequency with the corresponding spectral energy at that frequency, to ensure that the resulting average represents the dominant angle within this frequency band.

[...] *q* (which can be either dominant direction or directional spread) component-wise in Cartesian coordinates, weighted with the spectral density $S(f)$ [...] *f*_{i} again demarcates the boundaries of each frequency band. Then we transform the resulting Cartesian components back to an angle: [...]
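The component-wise averaging described above can be illustrated with a discrete, energy-weighted circular mean. This is a sketch under assumptions: the function name `weighted_mean_angle` is hypothetical, angles are taken in degrees, and the integral over a frequency band is replaced by a weighted sum over discrete frequency bins, which may differ in detail from the expression used by FOWD.

```python
import math


def weighted_mean_angle(angles_deg, weights):
    """Average angles via their Cartesian components, weighted by spectral energy."""
    x = sum(w * math.cos(math.radians(a)) for a, w in zip(angles_deg, weights))
    y = sum(w * math.sin(math.radians(a)) for a, w in zip(angles_deg, weights))
    # Transform the summed components back to an angle in [0, 360)
    return math.degrees(math.atan2(y, x)) % 360.0


# Two bins pointing 10 degrees either side of north, with equal energy
north = weighted_mean_angle([350.0, 10.0], [1.0, 1.0])
# A bin at 0 degrees carrying three times the energy of a bin at 90 degrees
skewed = weighted_mean_angle([0.0, 90.0], [3.0, 1.0])
```

A naive arithmetic mean of 350° and 10° would give 180°, pointing in exactly the wrong direction, which is why the averaging is done on Cartesian components.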

#### 8) Directionality index

[...] *R* (as introduced in Fedele 2015). It is commonly defined as

$$R = \frac{\sigma_\theta^2}{2\nu^2},$$

where *σ*_{θ} is the directional spread (in radians), and *ν* denotes the spectral bandwidth [we use narrowness, as in Fedele et al. (2019)]. This factor *R* makes it possible to compute various directionality-corrected versions of, for example, the Benjamin–Feir index and kurtosis (Fedele 2015; Fedele et al. 2019). In FOWD, we estimate *R* by computing the narrowness of the spectrum as provided by CDIP. Directional spread is computed as outlined above, which we integrate over all frequencies to obtain *σ*_{θ}.

### b. Running-window processing

Usually, studies that investigate extreme wave observations divide all data into blocks of equal length in time, e.g., 30-min chunks, that are then analyzed separately (e.g., Casas-Prat and Holthuijsen 2010; Cattrell et al. 2018). However, the transient nature of the ocean has long been identified as a potential source for systematic error (Adcock and Taylor 2014; Gemmrich and Garrett 2011; Gemmrich et al. 2016), as it is not clear that the wave height distribution is constant within each chunk.

A related consideration is that the estimated quantities must be *agnostic of the future*—that is, look-aheads must be impossible. This property is critical for machine-learning applications, where future state leaking into the training data may completely invalidate the generalization abilities of a machine-learning algorithm.

We have therefore decided to use a running-window approach in FOWD. Here, we iterate through the raw data one zero-upcrossing at a time, computing the characteristic sea state parameters based on the immediate history of every wave. This implies that there is no time gap between the end of the aggregation period and the current wave, at the expense of additional computation time (since the sea state has to be recomputed for every wave).

Picking a window length is always a trade-off between bias (longer windows are more prone to nonstationarity) and variance (shorter windows leave us with less data with which to work). Therefore, all parameters are computed three times:

- The parameters are calculated twice using fixed 30- and 10-min windows. This makes it possible to investigate the stationarity of the current sea state by comparing the values obtained from each window length.
- The parameters are calculated one more time using a variable, data-dependent window as suggested in Boccotti (2000) and used in Fedele et al. (2019). We define the optimal window size *n* to be the one that minimizes

  $$\text{std}\left(\frac{\sigma_{n,i+1}}{\sigma_{n,i}} - 1\right), \tag{24}$$

  where *σ*_{n,i} is the standard deviation of the sea surface elevation in the *i*th chunk with length *n*, applied to the past 12 h of time series. To make this process more robust, we recompute (24) 10 times for each candidate window with a different time offset. FOWD tries a total of 11 different window lengths between 10 and 60 min and selects the one that minimizes the sum of (24) across all trials. This process tends to generate time windows longer than 40 min in most conditions but is also capable of reducing the window size if needed (Fig. 2).

Because the standard deviation of the sea surface elevation *σ* is directly related to significant wave height, we expect this to yield near-optimal window sizes for significant wave height and other slowly drifting quantities (such as mean period and energy content), but suboptimal results for faster drifting parameters (such as steepness, peak period, and kurtosis).
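The data-dependent window selection can be sketched as follows. The function names, the way the time offsets are spaced, and the use of sample counts instead of minutes are assumptions for illustration; the sketch also assumes that no chunk has zero variance, which would cause a division by zero.

```python
import math
import statistics


def nonstationarity_score(eta, n):
    """std(sigma_{n,i+1} / sigma_{n,i} - 1) over consecutive chunks of n samples."""
    sigmas = [statistics.pstdev(eta[i:i + n]) for i in range(0, len(eta) - n + 1, n)]
    ratios = [b / a - 1.0 for a, b in zip(sigmas[:-1], sigmas[1:])]
    return statistics.pstdev(ratios)


def total_score(eta, n, n_offsets=10):
    """Sum the score over several time offsets (offset spacing is an assumption)."""
    shift = max(n // n_offsets, 1)
    return sum(nonstationarity_score(eta[k * shift:], n) for k in range(n_offsets))


def pick_window(eta, candidates):
    """Return the candidate window length with the smallest summed score."""
    return min(candidates, key=lambda n: total_score(eta, n))


# A sinusoid with a period of 10 samples looks stationary for windows
# that contain a whole number of periods.
eta = [math.sin(2 * math.pi * t / 10) for t in range(1200)]
best = pick_window(eta, [97, 100])
```

In this toy record the 100-sample window wins, because every chunk then contains exactly 10 periods and all chunk standard deviations agree.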

### c. Quality control

FOWD uses a combination of QC flags, most of which are inspired by the process suggested in Christou and Ewans (2014). A measurement is discarded if any of the following conditions are met when applied to the past 30-min surface elevation:

- (a) There are any waves with zero-crossing period >25 s.
- (b) The rate of change of the surface elevation *η* exceeds the limit rate of change by a factor of 2 or more at any point; that is, $\left|\frac{\partial \eta}{\partial t}\right| > 2 U_{\mathrm{lim}}$. The limit rate of change *U*_{lim} is defined as $U_{\mathrm{lim}} = 2\pi \frac{\text{std}(\eta)}{\langle T_{d,0}\rangle}\sqrt{2 \ln N}$, with standard deviation std, mean observed zero-crossing period ⟨*T*_{d,0}⟩, and number of waves in the record *N*. This criterion removes records containing waves that are much steeper than the average rate of change std(*η*)/⟨*T*_{d,0}⟩ (that is, records with single, very steep waves) but leaves sea states with many steep waves intact.
- (c) There are 10 consecutive data points of the same value.
- (d) There is any absolute crest or trough elevation that is greater than 8 times the normalized median absolute deviation (MADN) of the surface elevation; that is, $\left|h\right| > 8\kappa \, \text{median}\left[\left|\eta - \text{median}(\eta)\right|\right]$, with *κ* = 1.483, which ensures that MADN converges to standard deviation for Gaussian distributed *η* with growing sample size (see, e.g., Huber and Ronchetti 2009). This criterion permits crest heights and trough depths of up to about 2 times the significant wave height, which should be more than enough for any real signal. [In a linear sea, a crest exceeding 2*H*_{S} would have a probability of exp(−32) ≈ 10^{−14}.]
- (e) Surface elevations are not equally spaced in time (but they may contain “NaN” values).
- (f) The ratio of missing (NaN) data to valid data exceeds 5%.
- (g) There are fewer than 100 individual zero crossings.
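Three of these criteria (rate of change, repeated values, and the MADN limit) can be sketched as below. The function names and the return format are hypothetical, the time derivative is approximated by a simple finite difference, and missing values are assumed to have been handled beforehand; none of this is taken from the FOWD source code.

```python
import math
import statistics


def madn(eta, kappa=1.483):
    """Normalized median absolute deviation of the surface elevation."""
    med = statistics.median(eta)
    return kappa * statistics.median([abs(x - med) for x in eta])


def qc_flags(eta, dt, mean_period, n_waves):
    """Evaluate three QC criteria on one elevation window.

    dt is the sampling interval (s); mean_period and n_waves are the mean
    zero-crossing period (s) and the number of waves in the window.
    """
    # Limit rate of change U_lim, compared with a finite-difference slope
    u_lim = (2 * math.pi * statistics.pstdev(eta) / mean_period
             * math.sqrt(2 * math.log(n_waves)))
    too_steep = any(abs(b - a) / dt > 2 * u_lim for a, b in zip(eta, eta[1:]))

    # 10 consecutive samples with the same value
    run, flat = 1, False
    for a, b in zip(eta, eta[1:]):
        run = run + 1 if a == b else 1
        flat = flat or run >= 10

    # Any elevation beyond 8 x MADN
    spike = max(abs(x) for x in eta) > 8 * madn(eta)
    return {"rate_of_change": too_steep, "flat_run": flat, "madn_spike": spike}


clean = [math.sin(2 * math.pi * i / 8) for i in range(800)]
spiky = list(clean)
spiky[400] = 20.0  # a single unphysical spike
flags_clean = qc_flags(clean, 1 / 1.28, 6.25, 100)
flags_spiky = qc_flags(spiky, 1 / 1.28, 6.25, 100)
```

The clean sinusoid passes all three checks, whereas the single spike trips both the rate-of-change and the MADN criterion.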

All waves that fail QC and are larger than 2 times the significant wave height are written to a log file to allow for manual inspection. In addition, all waves that are larger than 2.5 times the significant wave height are written to the log file, regardless of whether they pass QC. This enables us to evaluate the QC process and tweak thresholds or exclude faulty subdatasets as needed. A brief evaluation of this QC process when applied to the CDIP data is given in section 4b.

### d. Additional metadata and reproducibility

All FOWD output files are self-documenting in the sense that they include all relevant metadata as netCDF4 attributes, both for each variable and the dataset as a whole. Apart from the static metadata documenting the coordinates and parameters (which is the same for every FOWD output file), we also include some metadata related to the processing environment and raw data source to ensure reproducibility. Specifically, each wave record includes the time stamp, file name, and a unique file identifier (UUID) of the raw source file from which it came (see Table A1). The output files also include the exact version of the FOWD processing implementation used to create the file in form of a “git” tag, along with a UUID. That way, we enable users to reproduce any result by allowing them to use the exact same processing version and input file.

## 3. Reference implementation

As part of this work, we supply a Python reference implementation of the FOWD processing toolkit. It makes use of the popular Python packages xarray, numpy, and scipy to process large amounts of input data efficiently. The implementation processes either CDIP netCDF4 files or generic input files in a fixed netCDF4 format. Multiple CDIP deployments (within the same station) can be processed in parallel.

### a. Memory efficiency

Because of FOWD’s running-window approach (see section 2b), FOWD output datasets are about 10 times as big as the input surface elevation time series (since every wave results in about 80 output features). This demands that the processing implementation does not store entire output files in memory.

We achieve this by keeping only the immediate 30-min history of the current processing time in memory. Each new record is flushed to disk using Python’s “pickle” format. After the processing has finished, these pickle files are read back by the main process in chunks, reformatted to the netCDF4 output format, and flushed to disk again. This ensures that the main process uses only a negligible amount of memory while each worker process only keeps the input data in memory. In other words, if the input data fit in memory, processing will succeed.
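The write-then-reassemble pattern can be illustrated with the standard library alone. This is a simplified sketch, not the FOWD implementation: the record layout, file name, and chunk size are made up, and the final conversion to netCDF4 is omitted.

```python
import os
import pickle
import tempfile


def read_chunks(path, chunk_size):
    """Yield lists of at most chunk_size records from a file of pickled records."""
    with open(path, "rb") as fh:
        chunk = []
        while True:
            try:
                chunk.append(pickle.load(fh))
            except EOFError:  # no more records in the file
                break
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


def demo():
    """Write 7 records one at a time, then read them back in chunks of 3."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.pkl")
        with open(path, "ab") as fh:
            for i in range(7):
                # each record goes to disk immediately, so memory use stays flat
                pickle.dump({"wave_id": i, "height": 0.1 * i}, fh)
        return [len(chunk) for chunk in read_chunks(path, chunk_size=3)]


chunk_sizes = demo()
```

Because records are appended one at a time and read back through a generator, neither the writer nor the reader ever holds the full output in memory.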

### b. Testing strategy

In software engineering, automated tests are an invaluable tool to ensure proper functionality of a product. Unfortunately, writing automated tests for processing workflows of physical data is often impossible or infeasible because of the lack of ground-truth answers with which to compare. On the other hand, faulty results are often easy to detect for humans when they fall outside of reasonable physical limits or show the wrong scaling behavior. We have therefore opted for semiautomated *sanity checks* instead of fully automated unit tests for the core processing.

Each sanity check test case generates a random surface elevation time series from a different ground-truth wave spectrum and runs it through the FOWD processing. Here, only the spectral shape is prescribed externally; surface elevations are drawn as harmonics with random phases from the spectrum. The resulting output parameters can then be inspected manually.

Indeed, all of these expectations are met for this particular test case (Fig. 3). Other sanity checks feature idealized spectra, for example, containing just a single harmonic, that allow us to validate parameters that are more difficult to interpret like crest–trough correlation, or idealized directional spectra. Because of these sanity checks, we are confident that the FOWD core processing produces meaningful results.
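Drawing a surface elevation record from a prescribed spectrum with random phases can be sketched as follows. The function names, the flat test spectrum, and the frequency grid are assumptions for illustration; the sanity checks in the reference implementation may be set up differently.

```python
import math
import random
import statistics


def synthesize_elevation(spectrum, df, dt, n_samples, seed=0):
    """Sum harmonics with random phases; spectrum[k] is S(f) at f = (k + 1) * df."""
    rng = random.Random(seed)
    amplitudes = [math.sqrt(2.0 * s * df) for s in spectrum]  # a_k^2 / 2 = S(f_k) df
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in spectrum]
    eta = []
    for i in range(n_samples):
        t = i * dt
        eta.append(sum(a * math.cos(2.0 * math.pi * (k + 1) * df * t + p)
                       for k, (a, p) in enumerate(zip(amplitudes, phases))))
    return eta


def significant_wave_height(eta):
    """H_S = 4 sqrt(m_0), with m_0 estimated as the variance of the elevation."""
    return 4.0 * statistics.pstdev(eta)


n, dt = 512, 1.0 / 1.28
df = 1.0 / (n * dt)       # harmonics fit the record length exactly
spectrum = [1.0] * 40     # flat spectrum up to 0.1 Hz, so m_0 = 40 * df
eta = synthesize_elevation(spectrum, df, dt, n)
hs = significant_wave_height(eta)
```

Since the ground-truth spectrum is known, the recovered significant wave height can be compared directly with $4\sqrt{m_0}$, which is the kind of expectation such a check relies on.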

## 4. Processing of CDIP buoy data

The following sections describe the CDIP input and FOWD output data, analyze QC performance and the impact of FOWD’s running-window processing, and discuss some caveats that apply when using buoy data for extreme wave studies.

### a. Input data and processing

In total, the CDIP catalog spans about 750 years of continuous surface elevation measurements (almost all at sampling rates of 1.28 Hz) and is available in netCDF4 format through a THREDDS server. This amounts to about 270 GByte of raw data.

While CDIP data files also include horizontal displacements and a number of derived quantities (like significant wave height, peak period, and others), we use only the raw vertical surface displacement, station metadata, and directional quantities for processing. This ensures that FOWD is applicable to any instrument that delivers a surface displacement time series (including radar or laser sensors).

We applied only minimal preprocessing to the data, which consists of removing all data that have an error flag set and subtracting the 30-min running mean from the raw vertical surface elevation. After that, we processed all data in about 72 h on 10 cluster nodes in parallel (using the FOWD reference implementation described in section 3). The resulting output dataset has a total (compressed) size of 1.1 TB. We create one output file per CDIP station, with individual file sizes ranging between 1.7 MByte and 38 GByte.

In total, FOWD contains about 4.2 billion individual waves and sea states. An interactive map indicating all data locations and some key statistics is available in the online supplemental material.

### b. Quality control and filtering

As outlined in section 2c, FOWD automatically logs waves failing QC that are higher than 2 significant wave heights, and all waves higher than 2.5 significant wave heights (whether they pass QC or not). This allows us to assemble some higher-order statistics to get an idea of how prevalent quality issues are in the CDIP data and to verify that FOWD’s QC system works as intended.

In total, just under 80 000 waves fail QC (Table 2). About 80% of these QC failures occur in only 5 CDIP locations (of 161). This suggests that relatively few deployments with general quality problems cause a majority of QC failures.

Table 2. The number of times each QC flag was triggered for the whole CDIP catalog. See section 2c for a definition of flags a–g. Note that multiple flags can be active for the same wave.

To investigate this further and isolate faulty deployments, the FOWD implementation includes a postprocessing command that produces plots of all records in the QC logs. These plots show the raw surface elevation of the failing wave and its immediate 30-min history.

After inspecting each of these plots, we decided to blacklist 38 deployments and 4 entire CDIP stations that showed obvious quality problems like frequent spikes, extreme oscillations, unphysical values, or jumps (Table 3). On top of excluding these blacklisted CDIP deployments, we also removed all records in conditions in which buoys are known to be unreliable [similar to McAllister and van den Bremer (2020)]:

- records with 30-min significant wave height smaller than 1 m,
- records with spectral mean frequency higher than 1/3.2 of the Nyquist frequency (for 1.28-Hz data, this is equivalent to filtering all records with a mean wave period below 5 s), and
- records where the relative energy content of frequency band 1 exceeds 10% (extensive low-frequency drift).
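Expressed as code, the three filter conditions amount to a simple predicate per record. The function and argument names below are hypothetical and do not correspond to actual FOWD variable names.

```python
def keep_record(hs_30min, mean_frequency, band1_fraction, sampling_rate=1.28):
    """Return True if a record passes all three filter conditions.

    hs_30min: 30-min significant wave height (m)
    mean_frequency: spectral mean frequency (Hz)
    band1_fraction: relative energy content of frequency band 1 (0 to 1)
    """
    nyquist = sampling_rate / 2.0
    return (
        hs_30min >= 1.0                        # drop very low sea states
        and mean_frequency <= nyquist / 3.2    # drop records dominated by short waves
        and band1_fraction <= 0.10             # drop extensive low-frequency drift
    )


kept = keep_record(hs_30min=2.0, mean_frequency=0.12, band1_fraction=0.02)
```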

Table 3. Blacklisted CDIP deployments that failed visual inspection.

Since FOWD is also intended for use by non–wave experts, it is essential to provide access to a precleaned dataset. Therefore, the filtered FOWD–CDIP dataset is available for download along with the unfiltered one (see the data availability statement).

### c. Impact of running-window processing

After processing the CDIP data, we can now investigate how large a difference FOWD’s running-window processing (as described in section 2b) makes in practice, relative to the usual fixed-window approach.

To this end, we divide the FOWD catalog for one particular CDIP station (with ID 188p1, containing about 30 million waves) into 30-min chunks. The last measurement in each of these chunks (concerning the past 30-min sea state) then represents what would have been obtained for all waves if FOWD did not use running windows.

We can then quantify the influence of the running-window approach by computing the root-mean-square (RMS) difference between this last measurement of every chunk and all other data points in it. To make it easier to compare the different parameters, we divide each by a characteristic scale to obtain a normalized RMS (Table 4).
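For a single chunk and a single parameter, the normalized RMS described above reduces to a few lines. The function name and the example numbers are illustrative only.

```python
import math


def normalized_rms(values, scale):
    """RMS difference between the last value of a chunk and all others, over scale."""
    reference = values[-1]   # what a fixed-window analysis would assign to every wave
    others = values[:-1]
    mean_square = sum((v - reference) ** 2 for v in others) / len(others)
    return math.sqrt(mean_square) / scale


# Running-window significant wave heights (m) within one 30-min chunk
chunk = [1.0, 1.0, 2.0]
result = normalized_rms(chunk, scale=2.0)
```

Here both earlier values differ from the last one by 1 m, so the RMS difference is 1 m and the normalized RMS is 0.5.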

Table 4. Characteristic scale used to normalize root-mean-square residual for each parameter (Fig. 4).

The resulting distribution of the normalized RMS in each chunk shows that, while deviations are typically below 10% of the characteristic scale, they can reach up to 50% in extreme cases (Fig. 4). As expected, some parameters (such as kurtosis and maximum wave height) are much more prone to drift than others (such as significant wave height and spectral energy). However, this result is sensitive to which characteristic scale we choose, so comparisons between parameters remain qualitative.

A particularly important quantity in this context is the significant wave height. If the significant wave height is underestimated with an error of only 5%, a wave with true abnormality index AI = 2 is estimated as a wave with AI = 2.1, which is less than one-half as likely to occur (assuming Rayleigh-distributed waves).
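This sensitivity can be checked with a short calculation, assuming the Rayleigh exceedance probability $P(H/H_S > \mathrm{AI}) = \exp(-2\,\mathrm{AI}^2)$ for $H_S = 4\sqrt{m_0}$. The function name is chosen for illustration.

```python
import math


def rayleigh_exceedance(ai):
    """P(H / H_S > ai) for Rayleigh-distributed wave heights."""
    return math.exp(-2.0 * ai ** 2)


true_ai = 2.0
biased_ai = true_ai / 0.95  # H_S underestimated by 5%
ratio = rayleigh_exceedance(biased_ai) / rayleigh_exceedance(true_ai)
print(f"AI = {biased_ai:.3f}, relative probability = {ratio:.2f}")
```

The ratio evaluates to roughly 0.42, consistent with the statement that the apparent event is less than one-half as likely as the true one.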

We conclude that the running-window approach *can* lead to significantly different results, apart from the more important effect of preventing look-aheads (as discussed in section 2b). In other words, explicitly accounting for a drifting sea state provides an opportunity to reduce bias by a nontrivial amount—although we did not measure how much this approach influences final results or conclusions.

### d. Shortcomings of buoy data

Although any dataset that provides surface elevation measurements can be processed into a FOWD dataset, buoy measurements remain a dominant data source due to their relatively large availability (at least in comparison with radar and laser measurements). Therefore, this section discusses some of the known problems with buoy data, and how they carry over to FOWD and its possible applications.

First and foremost, buoys tend to linearize surface elevations to some degree [see McAllister and van den Bremer (2020, 2019) for a discussion]. This is especially problematic in rough seas with high steepness, because buoys can be dragged through a steep crest or move laterally around it and underestimate the true wave height. Combined with the inherent sampling variability of a point measurement (the two-dimensional wave has to hit the buoy at the crest to be registered at full height; see Benetazzo et al. 2015), wave estimates based on buoy data tend to be too conservative (see also Casas-Prat and Holthuijsen 2010).

This is inconvenient for studies with the goal to estimate absolute rogue wave risk, since one needs to take additional steps to correct for these biases, include other data sources, or accept that the results represent a lower bound for rogue wave risk. However, this is not a problem when estimating the *relative* importance of sea state risk factors, as buoys should be similarly inaccurate across a wide range of different sea states (after the most problematic conditions are filtered; see section 4b—perhaps with the exception of very steep seas). We therefore see no problem with using buoy data for the type of study presented in section 5.

Another issue to keep in mind is *selection bias*. Buoys tend to be placed in locations that are easy to reach and of special interest for humans. This implies that coastal areas are overrepresented, and therefore results derived from the whole dataset will be less representative for open-ocean conditions.

No reasonable amount of one-dimensional time series data can tell us about truly exceptional events. In offshore engineering contexts, an important quantity is the “10 000 year wave,” which is the largest expected wave in a 10 000 yr period. Events of this rarity cannot be estimated with this dataset without additional work (such as further theoretical assumptions, or data augmentation via simulations).

## 5. Example application: Which sea state parameter is the best predictor for rogue wave occurrence?

As an example of an application of FOWD, we look at the connection between sea state and the occurrence of rogue waves to find which sea state parameter is the best predictor for rogue wave activity (where we find the largest change in rogue wave probability when varying the parameter).

[...] *P*(AI) we would expect the next wave to be a rogue wave with probability [...]

Number of waves in the FOWD–CDIP dataset fulfilling various criteria.

This implies that the measured incidence rate of rogue waves across all sea states is lower by about a factor of 5 than is predicted by linear theory. This is not uncommon for buoy data (Casas-Prat and Holthuijsen 2010) and could to some degree be due to the underestimation of extreme waves by buoys (as discussed in section 4d). However, we suspect that this has mostly physical causes. Effects like crest–trough correlations < 1 (as we will see below) or wave breaking can severely limit the formation of rogue waves and are not accounted for in linear theory.

In the following sections, we take a closer look at the conditions under which rogue waves preferentially occur. For this, we use the combined data from all Hawaiian CDIP stations (stations with IDs 098p1, 106p1, 146p1, 165p1, 187p1, 188p1, 198p1, 225p1, 233p1), containing about 200 million waves.

### a. Confounding and roguish sea states

To get a feeling for the data, we investigate correlations between some of the sea state parameters and have a look at the probability density functions of sea states in which we find rogues with AI > 2 and AI > 2.4.

The correlation matrix of the sea state parameters (Fig. 5) provides yet another important sanity check for FOWD, since many parameters are correlated by definition (such as BFI, which is computed based on steepness and spectral bandwidth). Furthermore, it serves as an important reminder that there are many nonobvious correlations, such as the one between spectral bandwidth and mean period. Any conclusion we draw about the influence of a parameter on rogue wave activity thus has to take possible confounders into account.
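To illustrate what "correlated by definition" means in practice, the following sketch uses purely synthetic data (not FOWD) and a hypothetical BFI-like proxy, `steepness / bandwidth`, which is a stand-in and not the FOWD definition of BFI. Even when steepness and bandwidth are drawn independently, the derived quantity correlates with both of its inputs.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)

# Synthetic, independent "steepness" and "bandwidth" samples (not FOWD data).
steepness = [rng.uniform(0.01, 0.08) for _ in range(5000)]
bandwidth = [rng.uniform(0.2, 0.8) for _ in range(5000)]

# Hypothetical BFI-like proxy: grows with steepness, shrinks with bandwidth.
bfi_proxy = [s / b for s, b in zip(steepness, bandwidth)]

print(f"corr(steepness, bandwidth) = {pearson(steepness, bandwidth):+.2f}")
print(f"corr(steepness, bfi_proxy) = {pearson(steepness, bfi_proxy):+.2f}")
print(f"corr(bandwidth, bfi_proxy) = {pearson(bandwidth, bfi_proxy):+.2f}")
```

A parameter that looks predictive in isolation may therefore only be inheriting the signal of one of its ingredients, which is why confounders matter for the univariate analysis that follows.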

The probability density functions of roguish seas (Fig. 6) indicate several potential controlling parameters for rogue wave occurrence, where the distribution of seas containing a rogue wave differs substantially from that of all waves (with, e.g., skewness, spectral bandwidth, and maximum wave height being promising candidates). This analysis, while intuitively approachable, yields little quantitative insight into the relative importance of each parameter, and it neglects the influence of sample size effects. The following section addresses this through a simple analytical Bayesian parameter estimation.

### b. Estimation of rogue wave probabilities with uncertainties

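We want to estimate the rogue wave probability *p*. As the first step, we assume that the occurrence of *n*^{+} rogue waves and *n*^{−} nonrogue waves in a given sea state is drawn randomly with some rogue wave probability *p*. Then *n*^{+} follows a binomial distribution:

$$n^{+} \sim \mathrm{Binomial}(n^{+} + n^{-},\, p).$$

Our goal is to infer *p* from measurements of *n*^{+} and *n*^{−}. For *p*, we encode prior information by assuming a beta prior, given by

$$p \sim \mathrm{Beta}(\alpha_0,\, \beta_0),$$

with parameters *α*_{0} and *β*_{0}, which we choose as *α*_{0} = 1 and *β*_{0} = 10 000, roughly representing the expected order of magnitude *O*(*p*) ≈ 10^{−4} (this is just a weakly informative prior to constrain *p* to the right order of magnitude; the exact values have no influence on the conclusions of this analysis).

The posterior distribution of *p* is then again a beta distribution,

$$p \mid n^{+}, n^{-} \sim \mathrm{Beta}(\alpha_0 + n^{+},\, \beta_0 + n^{-}) \tag{33}$$

(the beta prior of *p* is conjugate to the binomial likelihood of *n*^{+}).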

This posterior is simple to evaluate analytically. In particular, we can use widely available library functions to compute the narrowest credible interval (the highest posterior density interval) for *p*. This allows us to quantify our uncertainty in *p* based on the number of available samples, expressed as, for example, the 95% credible interval.
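As a minimal sketch of this step, the snippet below assumes the standard conjugate update, that is, a posterior Beta(*α*_{0} + *n*^{+}, *β*_{0} + *n*^{−}), and approximates the narrowest 95% interval from posterior samples using only the Python standard library. This is not the reference implementation; the function name, the sample size, and the example counts are hypothetical.

```python
import random

ALPHA_0 = 1.0       # prior parameters quoted in the text
BETA_0 = 10_000.0


def posterior_summary(n_rogue, n_nonrogue, level=0.95, n_samples=100_000, seed=0):
    """Posterior mean and sample-based highest-density interval for p.

    Assumes the conjugate update Beta(ALPHA_0 + n_rogue, BETA_0 + n_nonrogue).
    """
    alpha = ALPHA_0 + n_rogue
    beta = BETA_0 + n_nonrogue
    mean = alpha / (alpha + beta)

    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(n_samples))

    # Narrowest window that contains a fraction `level` of the sorted samples.
    window = int(level * n_samples)
    best_start = min(
        range(n_samples - window),
        key=lambda i: samples[i + window] - samples[i],
    )
    return mean, samples[best_start], samples[best_start + window]


# Hypothetical counts for one sea state (not taken from FOWD).
mean, lower, upper = posterior_summary(n_rogue=50, n_nonrogue=500_000)
print(f"mean = {mean:.2e}, 95% interval = [{lower:.2e}, {upper:.2e}]")
```

With more waves at the same underlying rate, the interval narrows, which is the sense in which the credible interval reflects the amount of available data.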

To investigate the influence of the sea state on the rogue wave probability *p*, we split each sea state parameter into 15 equally sized bins. We assume that, within each bin, *p* is independently and identically distributed (iid) with a distribution according to (33), and we evaluate the mean and credible interval of *p* independently for each bin. We also exclude bins that contain fewer than 10 rogue wave events (i.e., where *n*^{+} < 10) to eliminate overly uncertain estimates. As a result, we can study how *p* behaves as a function of each sea state parameter and quantify our uncertainty based on how much data we have in each regime.
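The binning procedure can be sketched as follows. The snippet assumes that "equally sized" means equal-width bins between the smallest and largest observed value, and that the per-bin estimate is the mean of the posterior Beta(*α*_{0} + *n*^{+}, *β*_{0} + *n*^{−}); both are assumptions of this illustration, as are the function name and the synthetic data.

```python
import random

ALPHA_0 = 1.0
BETA_0 = 10_000.0


def binned_rogue_probability(values, is_rogue, n_bins=15, min_rogues=10):
    """Posterior-mean rogue wave probability per parameter bin.

    values:   sea state parameter, one entry per wave
    is_rogue: booleans, True where the wave is a rogue wave
    Returns a list of (bin_lower, bin_upper, n_rogue, n_nonrogue, p_mean).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all parameter values are identical")
    n_plus = [0] * n_bins
    n_minus = [0] * n_bins

    for value, rogue in zip(values, is_rogue):
        # Clamp so that the maximum value falls into the last bin.
        index = min(int((value - lo) / width), n_bins - 1)
        if rogue:
            n_plus[index] += 1
        else:
            n_minus[index] += 1

    results = []
    for i in range(n_bins):
        if n_plus[i] < min_rogues:
            continue  # too few rogue waves for a meaningful estimate
        total = n_plus[i] + n_minus[i]
        p_mean = (ALPHA_0 + n_plus[i]) / (ALPHA_0 + BETA_0 + total)
        results.append((lo + i * width, lo + (i + 1) * width,
                        n_plus[i], n_minus[i], p_mean))
    return results


# Synthetic example (not FOWD data): rogue probability rises with the parameter.
# The sample is far smaller than the roughly 200 million waves used in the text,
# so the prior still pulls the estimates noticeably toward 1e-4.
rng = random.Random(1)
values = [rng.random() for _ in range(400_000)]
is_rogue = [rng.random() < 2e-4 + 8e-4 * v for v in values]

for lower, upper, n_plus, n_minus, p_mean in binned_rogue_probability(values, is_rogue):
    print(f"[{lower:.2f}, {upper:.2f}): n+ = {n_plus:3d}, p = {p_mean:.2e}")
```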

We stress that this uncertainty is based on the assumption that *p* is iid beta distributed within each bin, which is clearly not the case if we acknowledge that *p* depends on more than one parameter. Therefore, these uncertainties can only serve as an indicator of whether there are enough data to make a statement about this marginalized version of the true, multivariate distribution of *p*. In other words, they indicate how confident we can be in the best estimate of *p* for this dataset if we can only measure one parameter at a time.

The results of this process show a clear, highly significant dependence of the rogue wave probability on some sea state parameters, and the lack of such a dependence on others (Fig. 7). In particular, we find the following:

- Surface elevation kurtosis, relative maximum wave height, and skewness are the strongest predictors for rogue wave risk. For relative maximum wave height, *P*(AI > 2) ranges between 2.9 × 10^{−5} and 1.0 × 10^{−3}. So if an up-to-date, in situ surface elevation time series is available, these parameters are able to quantify rogue wave risk with a factor of about 35 in variation.
- Crest–trough correlation and spectral bandwidth (peakedness) are the strongest spectral predictors, with *P*(AI > 2) varying between 2.4 × 10^{−5} and 1.4 × 10^{−4} for crest–trough correlation, that is, almost one order of magnitude in variation from the spectrum alone.
- The Tayfun wave height distribution (Tayfun 1990; Tayfun and Fedele 2007) seems to be an excellent baseline for rogue wave activity.
- There is, at this level of detail, only a minor dependency of rogue wave occurrence on directional spread, Benjamin–Feir index, significant wave height, and steepness.
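As a quick check on the quoted ranges, the ratio between the largest and smallest binned probabilities for relative maximum wave height and for crest–trough correlation (using the rounded values given above) is

$$\frac{1.0 \times 10^{-3}}{2.9 \times 10^{-5}} \approx 34.5, \qquad \frac{1.4 \times 10^{-4}}{2.4 \times 10^{-5}} \approx 5.8,$$

which agrees, up to rounding of the quoted values, with the stated factor of about 35 and with "almost one order of magnitude," respectively.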

So, in this first analysis, it seems that bandwidth effects are the dominant modifier of rogue wave risk, whereas nonlinear effects (at least those governed by steepness and BFI) seem to play only a minor corrective role in comparison. However, it is important to keep in mind that we are only looking at one set of stations and only one sea state parameter at a time.

## 6. Conclusions

FOWD is a free ocean wave dataset that relates wave point measurements to the conditions in which the wave occurred and that is optimized for use in data-mining and machine-learning applications. In the previous sections, we describe which quantities are included in our wave catalog FOWD and how they are computed, and which steps we take to ensure quality and reproducibility (section 2). We describe the reference implementation and the steps we take to be able to process massive amounts of data at the terabyte scale (section 3). We summarize the processing of the CDIP buoy data catalog and analyze the quality of the resulting catalog (section 4). We apply additional filtering to remove problematic measurements. By visual inspection, we find that the resulting dataset is of high quality. Last, we study the occurrence probability of rogue waves depending on the sea state in an example application, where we have been able to demonstrate that certain parameters are much better predictors than others (section 5). We find that, based on analyzing only one sea state parameter at a time, rogue wave risk can vary by at least one order of magnitude. The estimated rogue wave probabilities are consistent with those found in earlier studies based on observations and simulations (e.g., Fedele et al. 2016, 2017).

The strongest parameters in this analysis are surface elevation skewness/kurtosis and maximum relative wave height of the past record. This is of little surprise when taking into account how many rogue waves occur in rapid succession (Table 5), but the importance of kurtosis and skewness could also be evidence for the role of second- and third-order (weakly) nonlinear contributions (Mori and Janssen 2006; Gemmrich and Garrett 2011; Christou and Ewans 2014). The most important spectral parameters are spectral bandwidth and crest–trough correlation, which is compatible with the finding in Cattrell et al. (2018) that spectral bandwidth is important (although we disagree with the conclusion that rogue waves *cannot* be predicted from characteristic parameters).

On the other hand, we were unable to detect any noteworthy dependency of rogue wave risk on directional spread [hypothesized, e.g., by Gramstad et al. (2018) and McAllister et al. (2019)], wave steepness (which is evidence against the importance of weakly nonlinear corrections), or Benjamin–Feir index (one of two parameters used by ECMWF’s freak wave forecast; see Janssen and Bidlot 2009). This does of course *not* prove that such dependencies do not exist, just that they are not detectable in this limited dataset (of Hawaiian stations) and by univariate analysis (i.e., considering one parameter at a time). A more sophisticated analysis is needed, which is precisely what we want to enable with FOWD.

We believe that this work represents an important motivation and contribution to enable physical insight into ocean waves through sophisticated data-driven methods. Downstream studies can either process their own raw data—because of the flexibility of the FOWD specification and reference implementation—or make use of the already processed CDIP data.

Extreme probabilistic events such as rogue waves are notoriously difficult to analyze statistically in a robust, meaningful way. By lowering the bar of entry for non–wave experts, we hope to enable new, powerful descriptive and predictive approaches to ocean wave phenomena.

## Acknowledgments

Dion Häfner received funding from the Danish Hydrocarbon Research and Technology Centre (DHRTC). We thank Øyvind Breivik and Mika Malila for their valuable comments during the drafting stage of FOWD. We thank three anonymous reviewers for their constructive and insightful remarks. Raw data were furnished by the Coastal Data Information Program (CDIP), Integrative Oceanography Division, operated by the Scripps Institution of Oceanography, under the sponsorship of the U.S. Army Corps of Engineers and the California Department of Parks and Recreation. Computational resources were provided by DC^{3}, the Danish Center for Climate Computing.

## Data availability statement

Filtered and unfiltered versions of the FOWD–CDIP data are available for download at https://doi.org/10.17894/ucph.c589422c-64fd-4585-af31-4571497bcbe5. The exact version of the FOWD reference implementation used throughout this study (v0.5.2) is available at https://doi.org/10.5281/zenodo.4628203. The current version can be found at https://github.com/dionhaefner/FOWD. The scripts used to generate the plots and statistics in this paper are available at https://gist.github.com/dionhaefner/51ef93980a87d6b6bb557599b79582da.

## APPENDIX

### Complete Overview of All FOWD Quantities

See Table A1 for an exhaustive list of all quantities included in FOWD.

Table A1. All quantities included in FOWD output files. Quantities marked with a dagger are further explained throughout section 2a.

## REFERENCES

Adcock, T. A. A., and P. H. Taylor, 2014: The physics of anomalous (‘rogue’) ocean waves. *Rep. Prog. Phys.*, **77**, 105901, https://doi.org/10.1088/0034-4885/77/10/105901.

Barbariol, F., J.-R. Bidlot, L. Cavaleri, M. Sclavo, J. Thomson, and A. Benetazzo, 2019: Maximum wave heights from global model reanalysis. *Prog. Oceanogr.*, **175**, 139–160, https://doi.org/10.1016/j.pocean.2019.03.009.

Behrens, J., J. Thomas, E. Terrill, and R. Jensen, 2019: CDIP: Maintaining a robust and reliable ocean observing buoy network. *2019 IEEE/OES 12th Current, Waves and Turbulence Measurement*, San Diego, CA, IEEE/OES, https://doi.org/10.1109/CWTM43797.2019.8955166.

Benetazzo, A., F. Barbariol, F. Bergamasco, A. Torsello, S. Carniel, and M. Sclavo, 2015: Observation of extreme sea waves in a space–time ensemble. *J. Phys. Oceanogr.*, **45**, 2261–2275, https://doi.org/10.1175/JPO-D-15-0017.1.

Boccotti, P., 2000: *Wave Mechanics for Ocean Engineering.* Elsevier, 520 pp.

Casas-Prat, M., and L. H. Holthuijsen, 2010: Short-term statistics of waves observed in deep water. *J. Geophys. Res.*, **115**, C09024, https://doi.org/10.1029/2009JC005742.

Cattrell, A. D., M. Srokosz, B. I. Moat, and R. Marsh, 2018: Can rogue waves be predicted using characteristic wave parameters? *J. Geophys. Res. Oceans*, **123**, 5624–5636, https://doi.org/10.1029/2018JC013958.

Christou, M., and K. Ewans, 2014: Field measurements of rogue water waves. *J. Phys. Oceanogr.*, **44**, 2317–2335, https://doi.org/10.1175/JPO-D-13-0199.1.

Dudley, J. M., G. Genty, A. Mussot, A. Chabchoub, and F. Dias, 2019: Rogue waves and analogies in optics and oceanography. *Nat. Rev. Phys.*, **1**, 675–689, https://doi.org/10.1038/s42254-019-0100-0.

Dysthe, K., H. E. Krogstad, and P. Müller, 2008: Oceanic rogue waves. *Annu. Rev. Fluid Mech.*, **40**, 287–310, https://doi.org/10.1146/annurev.fluid.40.111406.102203.

Fedele, F., 2015: On the kurtosis of deep-water gravity waves. *J. Fluid Mech.*, **782**, 25–36, https://doi.org/10.1017/jfm.2015.538.

Fedele, F., J. Brennan, S. Ponce de León, J. Dudley, and F. Dias, 2016: Real world ocean rogue waves explained without the modulational instability. *Sci. Rep.*, **6**, 27715, https://doi.org/10.1038/srep27715.

Fedele, F., C. Lugni, and A. Chawla, 2017: The sinking of the El Faro: Predicting real world rogue waves during Hurricane Joaquin. *Sci. Rep.*, **7**, 11188, https://doi.org/10.1038/s41598-017-11505-5.

Fedele, F., J. Herterich, A. Tayfun, and F. Dias, 2019: Large nearshore storm waves off the Irish coast. *Sci. Rep.*, **9**, 15406, https://doi.org/10.1038/s41598-019-51706-8.

Fenton, J. D., 1988: The numerical solution of steady water wave problems. *Comput. Geosci.*, **14**, 357–368, https://doi.org/10.1016/0098-3004(88)90066-0.

Gemmrich, J., and C. Garrett, 2011: Dynamical and statistical explanations of observed occurrence rates of rogue waves. *Nat. Hazards Earth Syst. Sci.*, **11**, 1437–1446, https://doi.org/10.5194/nhess-11-1437-2011.

Gemmrich, J., J. Thomson, W. E. Rogers, A. Pleskachevsky, and S. Lehner, 2016: Spatial characteristics of ocean surface waves. *Ocean Dyn.*, **66**, 1025–1035, https://doi.org/10.1007/s10236-016-0967-6.

Gramstad, O., E. Bitner-Gregersen, K. Trulsen, and J. C. Nieto Borge, 2018: Modulational instability and rogue waves in crossing sea states. *J. Phys. Oceanogr.*, **48**, 1317–1331, https://doi.org/10.1175/JPO-D-18-0006.1.

Haver, S., 2004: A possible freak wave event measured at the Draupner Jacket 1 January 1995. *Rogue Waves 2004*, Brest, France, IFREMER, http://www.ifremer.fr/web-com/stw2004/rw/fullpapers/walk_on_haver.pdf.

Holthuijsen, L. H., 2010: *Waves in Oceanic and Coastal Waters.* Cambridge University Press, 404 pp.

Huber, P. J., and E. M. Ronchetti, 2009: *Robust Statistics.* 2nd ed. John Wiley and Sons, 380 pp.

Janssen, P. A. E. M., 2003: Nonlinear four-wave interactions and freak waves. *J. Phys. Oceanogr.*, **33**, 863–884, https://doi.org/10.1175/1520-0485(2003)33<863:NFIAFW>2.0.CO;2.

Janssen, P. A. E. M., and J.-R. Bidlot, 2009: On the extension of the freak wave warning system and its verification. ECMWF Tech. Memo. 588, 44 pp., https://doi.org/10.21957/uf1sybog.

Karmpadakis, I., C. Swan, and M. Christou, 2020: Assessment of wave height distributions using an extensive field database. *Coastal Eng.*, **157**, 103630, https://doi.org/10.1016/j.coastaleng.2019.103630.

Kharif, C., and E. Pelinovsky, 2003: Physical mechanisms of the rogue wave phenomenon. *Eur. J. Mech.*, **22B**, 603–634, https://doi.org/10.1016/j.euromechflu.2003.09.002.

Longuet-Higgins, M. S., 1952: On the statistical distribution of the height of sea waves. *J. Mar. Res.*, **11**, 245–266.

McAllister, M. L., and T. S. van den Bremer, 2019: Lagrangian measurement of steep directionally spread ocean waves: Second-order motion of a wave-following measurement buoy. *J. Phys. Oceanogr.*, **49**, 3087–3108, https://doi.org/10.1175/JPO-D-19-0170.1.

McAllister, M. L., and T. S. van den Bremer, 2020: Experimental study of the statistical properties of directionally spread ocean waves measured by buoys. *J. Phys. Oceanogr.*, **50**, 399–414, https://doi.org/10.1175/JPO-D-19-0228.1.

McAllister, M. L., S. Draycott, T. A. Adcock, P. H. Taylor, and T. S. Bremer, 2019: Laboratory recreation of the Draupner wave and the role of breaking in crossing seas. *J. Fluid Mech.*, **860**, 767–786, https://doi.org/10.1017/jfm.2018.886.

Mori, N., and P. A. E. M. Janssen, 2006: On kurtosis and occurrence probability of freak waves. *J. Phys. Oceanogr.*, **36**, 1471–1483, https://doi.org/10.1175/JPO2922.1.

Ochi, M. K., and E. N. Hubble, 1976: Six-parameter wave spectra. *15th Int. Conf. on Coastal Engineering*, Honolulu, HI, ASCE, 301–328, https://doi.org/10.1061/9780872620834.018.

Portilla-Yandún, J., 2018: The global signature of ocean wave spectra. *Geophys. Res. Lett.*, **45**, 267–276, https://doi.org/10.1002/2017GL076431.

Portilla-Yandún, J., A. Salazar, and L. Cavaleri, 2016: Climate patterns derived from ocean wave spectra. *Geophys. Res. Lett.*, **43**, 11 736–11 743, https://doi.org/10.1002/2016GL071419.

Serio, M., M. Onorato, A. R. Osborne, and P. A. E. M. Janssen, 2005: On the computation of the Benjamin-Feir index. *Nuovo Cimento*, **28C**, 893–903, https://doi.org/10.1393/ncc/i2005-10134-1.

Slunyaev, A., I. Didenkulova, and E. Pelinovsky, 2011: Rogue waters. *Contemp. Phys.*, **52**, 571–590, https://doi.org/10.1080/00107514.2011.613256.

Tayfun, M. A., 1990: Distribution of large wave heights. *J. Waterw. Port Coastal Ocean Eng.*, **116**, 686–707, https://doi.org/10.1061/(ASCE)0733-950X(1990)116:6(686).

Tayfun, M. A., and F. Fedele, 2007: Wave-height distributions and nonlinear effects. *Ocean Eng.*, **34**, 1631–1649, https://doi.org/10.1016/j.oceaneng.2006.11.006.

Toffoli, A., O. Gramstad, K. Trulsen, J. Monbaliu, E. Bitner-Gregersen, and M. Onorato, 2010: Evolution of weakly nonlinear random directional waves: Laboratory experiments and numerical simulations. *J. Fluid Mech.*, **664**, 313–336, https://doi.org/10.1017/S002211201000385X.

Welch, P., 1967: The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. *IEEE Trans. Audio Electroacoust.*, **15**, 70–73, https://doi.org/10.1109/TAU.1967.1161901.

Xiao, W., Y. Liu, G. Wu, and D. K. P. Yue, 2013: Rogue wave occurrence and dynamics by direct simulations of nonlinear wave-field evolution. *J. Fluid Mech.*, **720**, 357–392, https://doi.org/10.1017/jfm.2013.37.

Young, I. R., 1995: The determination of confidence limits associated with estimates of the spectral peak frequency. *Ocean Eng.*, **22**, 669–686, https://doi.org/10.1016/0029-8018(95)00002-3.