1. Introduction
Uncertainties in the prediction of wind and wave extremes challenge the design and construction of marine systems. The design and construction of these systems rely on accurate statistical analyses of historical datasets, which provide extreme value return period estimates (i.e., values exceeded on average once every N years, where N is the defined return period). Inferential statistical approaches are based on fitting a specified probability distribution to the available data series of observations and then, with preliminary hypotheses, estimating the extremes corresponding to the return period. In disciplines such as atmospheric and ocean sciences or hydrology, extreme value analysis (EVA) has become common practice (Ferreira and Soares 1998; Coles 2001; Caires 2011, 2016). Instead of considering the whole dataset, EVA concentrates on the tail of the probability distribution function. For such analyses, it is desirable to have long continuous observational records. In practice, however, such records are seldom available. This hinders a reliable evaluation of the probability of occurrence of the extreme values (EV) (Young 1999; Alves and Young 2003; Holthuijsen 2007; Zieger et al. 2009; Aarnes et al. 2012; Breivik et al. 2013, 2014). Satellite measurements are a potentially promising data source with which to estimate wind and wave extremes. They provide global coverage, and their temporal duration, across multiple platforms, is now approximately 30 years (Zieger et al. 2009; Young et al. 2017). However, the spatial sampling of such systems is sparse enough that storm events may be missed or underestimated (Young et al. 2011). In addition, there are questions about the accuracy of satellite measurements under extreme conditions (Young et al. 2017; Young and Donelan 2018). The different instrument biases may also affect the EV estimates. In contrast, buoy, ship, and platform measurements are limited in space but may have long time series.
Although buoy and platform measurements are often regarded as “ground truth” and are used to calibrate and validate both models (Bidlot et al. 2002) and satellites (Zieger et al. 2009; Young et al. 2017), questions remain about their ability to measure extreme wind speeds and wave heights (Large et al. 1995; Zeng and Brown 1998; Taylor and Yelland 2001; Howden et al. 2008; Bender et al. 2010; Jensen et al. 2015). A solution to these constraints is to use numerical reanalyses. These provide long-term continuous time series on which to perform EVA, but are affected by analysis states and model biases (Caires et al. 2004; Sterl 2004; Stopa and Cheung 2014; Aarnes et al. 2015).
To address the stochastic variability in global weather systems, meteorological agencies have adopted numerical weather prediction (NWP) models of atmosphere–ocean dynamics that include a probabilistic element. These models adopt an ensemble of operational forecasts that allows the estimation of uncertainties in forecasts, and thus of the atmosphere–ocean system predictability (Palmer and Hagedorn 2006). Weather and climate are, by their nature, characterized by uncertainty. In such hydrodynamically complex systems, significantly diverging states arise from slightly different initial conditions, as shown by the pioneering work of Lorenz (1963).
The ensemble approach deals with uncertainties not only in determining future states of the atmosphere, but also in calculating an analysis (Epstein 1969), that is, the initial state of the model. Probabilistic methods and ensemble forecasts are now commonly used in weather prediction (Lewis 2005). In the past few decades the atmosphere–ocean coupled global Ensemble Prediction System at the European Centre for Medium-Range Weather Forecasts (ECMWF) has been continually developed and upgraded, in terms of model physics, assimilation methods, ingestion of observations and numerical resolution (Molteni et al. 1996).
The current ECMWF Integrated Forecast System (IFS) is based on a four-dimensional (time and the three spatial dimensions) variational assimilation scheme (4DVAR) to estimate the initial state (Rabier et al. 1998). This is combined in an ensemble data assimilation (EDA) procedure where an ensemble of 4DVAR analyses is created that reflects the uncertainty of the analysis (Isaksen et al. 2010). The EDA in turn is blended with singular vector estimates (Buizza and Palmer 1995) to generate an ensemble of initial conditions that contain the fastest growing modes to ensure a realistic ensemble spread with lead time. A variable-resolution approach is employed to allocate more resources to the first 10 days when forecasts are more accurate, and less (typically half the resolution) for the remainder, to day 15 (Buizza et al. 2007) and in some cases even further.
Ensemble prediction systems were not developed with the intention of extreme value estimation, but Breivik et al. (2013) demonstrated a practical application to ocean wind and wave extremes estimation. Their work was focused on the northeast Atlantic Ocean and the North Sea. Using a dataset of operational ensemble forecasts from 1999 to 2009 they introduced four criteria with which to validate an EVA and then found the 100-yr return period values of 10-m neutral wind speed
Breivik et al. (2014) extended the approach to the global scale and applied EVA to find the 100-yr return period values over the world’s oceans. Results were again compared with ERA-Interim and observational data showing comparable results but with much reduced confidence intervals compared to traditional EVA.
The objective of this work is to further develop this approach for wind and wave extreme value analysis from ensemble forecasts. The present methodology considers five different forecast lead times for a larger synthesized dataset, and thus tightens the estimated confidence levels compared to those found by Breivik et al. (2014). This is achieved by selecting peaks from each of the ensemble forecasts to ensure independence. The synthesized dataset has an equivalent length of 750 years [3 times longer than in Breivik et al. (2014)]. Thus, EVA can be achieved without the need to fit and extrapolate a probability distribution function to the data—for a 100-yr return period, the estimated wave height/wind speed is “in-sample.” The approach employs operational forecasts from a relatively short period (6 yr) to ensure that the dataset is stationary and that the model resolution remains unchanged. Finally, the resulting extreme value estimates of wind speed and wave height are compared with traditional approaches (peaks over threshold) for buoy data at specific sites and global values from satellite altimeter data and long duration model reanalysis (ERA-Interim) databases.
The paper is structured as follows: in section 2, a summary of traditional methods for EVA shows the potential for model ensemble EVA. Section 3 describes the methodology, the criteria followed, and the datasets used. Section 4 presents the results of the EVA with a comparison with different datasets, traditional long-term statistical methods, and EVA of buoy measurements. The discussion in section 5 leads to the conclusions (section 6), outlining the strengths and limitations in estimating ocean wind and wave extremes from model ensembles.
2. Estimation of wind and wave extremes
A large body of literature and research has been devoted to estimating
Based on three years of satellite altimeter data, a first evaluation of wind speed
Similarly, EVA of wind speed and significant wave height can be based on model data obtained from hindcasts or reanalyses (Aarnes et al. 2012, 2015; Caires and Sterl 2005). The goal of a reanalysis is to produce a dataset that is statistically homogeneous in space over a long time period that is not affected by model changes and thus statistically stationary. These reanalyses rely on data assimilation from in situ measurements and satellite records. An example of the impact is the improvement of results for the Southern Ocean due to satellite measurement assimilation (Caires et al. 2004). In an intercomparison of the different wind and wave reanalysis datasets, Caires et al. (2004) showed that the accuracy of such models has improved, particularly those produced by ECMWF.
ECMWF has generated a series of increasingly sophisticated reanalyses, starting with ERA-15 (Gibson et al. 1997), which covered the period 1979–93. This was followed by ERA-40 (Uppala et al. 2005), which covered the period 1957–2002. Several global extreme value analyses of significant wave height have been based on the ERA-40 dataset. The Royal Netherlands Meteorological Institute (KNMI) Atlas was based on ERA-40 model results, and produced results that generally underestimate wind speed and wave height extremes (Sterl and Caires 2005). Because of changes in data assimilation and model resolution over time (Sterl and Caires 2005), the consistency and hence the reliability of the results from the model were limited. Today the most commonly used reanalysis for EVA is ERA-Interim (Dee et al. 2011), which covers the period from 1979 until today. ERA-Interim has been used to evaluate ocean extremes and potential trends of wind and waves (Aarnes et al. 2012, 2015) even though reanalysis datasets have limitations due to inhomogeneity in data assimilation (Sterl 2004). Moreover, ERA-Interim still underestimates wind speed and wave height, and according to Stopa and Cheung (2014) particular attention must be paid to the analysis of the upper percentiles of the data, which may not be well represented by the model.
EVAs of datasets are also affected by the methodology used to estimate return periods. The initial distribution method (IDM) is a common approach that fits a probability density function (PDF) to the whole body of data available, and extrapolates the chosen PDF to the desired return period. The IDM is commonly used when the size of the available dataset is relatively small. As the IDM uses all the data, it can return stable estimates in such situations. However, there are a number of limitations to this approach. First, the extrapolation of the tail of the PDF to the desired probability level (return period) is done with a fit dominated by the bulk of the data in the body of the PDF. Hence, the accuracy with which the tail is defined is questionable. Second, there is no theoretical basis for the selection of the appropriate PDF form to fit to the data.
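The IDM extrapolation described above can be illustrated with a minimal sketch. It assumes a Gumbel distribution fitted by the method of moments and treats every sample as independent; the distribution choice, the fitting method, the function names, and the synthetic series are all assumptions made for illustration and are not taken from the cited studies.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(values):
    """Return (location, scale) of a Gumbel distribution by the method of moments."""
    scale = statistics.stdev(values) * math.sqrt(6.0) / math.pi
    location = statistics.mean(values) - EULER_GAMMA * scale
    return location, scale


def idm_return_value(values, return_period_years, samples_per_year):
    """Extrapolate the fitted distribution to the N-year level.

    The N-year value is exceeded on average once in N years, i.e. with
    probability 1 / (N * samples_per_year) per (assumed independent) sample.
    """
    location, scale = fit_gumbel_moments(values)
    p_non_exceed = 1.0 - 1.0 / (return_period_years * samples_per_year)
    return location - scale * math.log(-math.log(p_non_exceed))


# Synthetic 6-yr, 6-hourly series (hypothetical data, not model output).
rng = random.Random(0)
series = [2.0 + rng.expovariate(1.0) for _ in range(6 * 1461)]
hs100 = idm_return_value(series, 100, 1461)
```

Because the fit uses the mean and standard deviation of the whole series, the extrapolated value is controlled by the body of the data rather than by its tail, which is the first limitation noted above.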
Extreme value analysis based on independent maxima (or minima), as opposed to IDM, which considers the entire dataset, falls into two categories: 1) asymptotic models and 2) threshold models (Coles 2001). The extrema must be independent and identically distributed (i.i.d.): independent, so that observations at any particular time are not correlated with other data points close in space or time, and identically distributed in order to apply common inference statistics theorems. For asymptotic models the extreme value theorem indicates that a generalized extreme value (GEV) distribution can be fitted to the dataset, and the distribution of asymptotic block maxima (or minima) be used to find the desired return levels. The time series is typically constructed by extracting the annual maxima (AM). The lack of long measurement time series, as well as the fact that only one value is retained per year, often prevents robust estimates of the desired return periods with AM.
The small datasets resulting from AM are partly overcome by threshold models. A peaks-over-threshold (POT) approach (Pickands 1975) fits peaks above a threshold u to a generalized Pareto (GP) distribution (Coles 2001). This method allows the selection of a larger number of extreme values compared to the AM method, improving statistical inferences. There is, however, no theoretically robust approach to selecting the threshold u. Once the peaks are selected, attention must be paid to ensure each of these values is independent (i.e., not selected from the same storm). A common practice is to consider 48-h storm independence; that is, the peaks must be at least 48 h apart (Lopatoukhin et al. 2000; Caires and Sterl 2005). The POT method is often strongly dependent on the threshold selection, and this in turn depends on the geographical location, time period, and dataset itself (Caires and Sterl 2005; Holthuijsen 2007). The ultimate choice tends to be a trade-off between bias and variance (Caires 2016). That is, a low threshold may lead to bias but will result in a larger dataset and reduced variance in the fit to the PDF.
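The peak selection step with a 48-h independence criterion can be sketched as follows. This is a minimal greedy declustering under stated assumptions: the function name, the largest-first selection order, and the example series are illustrative and do not reproduce the procedure of any cited study.

```python
def select_independent_peaks(times_h, values, threshold, min_separation_h=48.0):
    """Return (time, value) peaks above threshold at least min_separation_h apart.

    Greedy declustering: candidates are visited from largest to smallest and a
    candidate is kept only if it is far enough from every peak already kept.
    """
    candidates = [(v, t) for t, v in zip(times_h, values) if v > threshold]
    candidates.sort(reverse=True)
    kept = []
    for value, time in candidates:
        if all(abs(time - t_kept) >= min_separation_h for t_kept, _ in kept):
            kept.append((time, value))
    return sorted(kept)


# Hypothetical 6-hourly series (hours, wave height in m) containing two storms.
times = [0, 6, 12, 18, 24, 30, 72, 78, 84]
hs = [1.0, 3.5, 5.0, 4.2, 2.0, 1.5, 3.0, 4.8, 2.5]
peaks = select_independent_peaks(times, hs, threshold=2.8)
```

In this example five values exceed the threshold, but only one peak per storm is retained. Lowering the threshold would add more peaks (less variance) at the cost of including values that are less clearly extreme (more bias).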
In each of the approaches described above (IDM, AM, POT), the measured time series is shorter than the desired return period. As a result, the PDF needs to be extrapolated to the desired probability level (return period). If the time series were longer than the desired return period, extrapolation would not be necessary and the EV could be directly determined, in-sample, from the tail of the empirical PDF of the recorded data or from a theoretical PDF fitted to the data.
3. Wind and wave extremes from ensemble model predictions
Breivik et al. (2013, 2014) showed that at advanced lead times the individual ensemble members of the ECMWF Ensemble Prediction System (ENS) correlate only weakly, and hence represent independent realizations of the wind or wave field. As such, it is possible to synthesize a dataset the equivalent of hundreds of years. The desired return period can then be determined as above without extrapolation of the PDF. That is, using a direct return estimate (DRE) from an in-sample measure of the empirical distribution, Breivik et al. (2013, 2014) calculated 100-yr return estimates of 10-m wind speed and significant wave height
To answer these caveats, the methodology adopted here is based on the ECMWF ENS dataset (which consists of forecasts issued twice daily at 0000 and 1200 UTC) over a 6-yr period, from 3 March 2010 to 29 February 2016. In this period the spatial resolution was unchanged at approximately 32 km for the atmosphere and about 54 km for the wave field. Every forecast is composed of 50 ensemble members derived from slightly perturbed initial conditions, plus one control member modeled with the best estimate of the initial conditions, and one HRES (higher resolution) member that is the most accurate forecast run at higher spatial and temporal resolution. We chose to focus on the 50-member dataset that we have available. The present analysis could be extended to include the control member, which has the same resolution as the ensemble. The HRES member is not considered as it is from a model run at twice the resolution of the ensemble.
In this study we focus on the ensemble of 50 perturbed members interpolated on a 1.0° × 1.0° global grid, excluding the ice-infested areas (south of 70°S and north of 80°N). ENS forecasts are taken at advanced lead times to assure i.i.d. data, as detailed in section 3a. In contrast to Breivik et al. (2013, 2014), who selected values at a fixed lead time of 10 days, we select the maximum value of the five lead times from day 9 to day 10 (+216, +222, +228, +234, and +240 h). The objective is to estimate wind and wave extremes for a 100-yr return period
The operation of the ECMWF Ensemble Prediction System and the EVA methodology adopted are shown diagrammatically in Fig. 1. A new set of ensemble forecasts is issued twice daily (0000 and 1200 UTC). As noted above, 50 perturbed ensemble members are generated, each with slightly different initial conditions. As shown in Fig. 1a, these forecasts will diverge in time. If these ensemble forecasts remain representative of the true wind/wave climate (both means and extremes) and they diverge to the extent that they are not significantly correlated, they will each represent a possible realization of the wind/wave field (in other words, a random draw from climatology). If this is the case, they could all be used in an EVA. Breivik et al. (2014) selected values at a lead time of 10 days (+240 h). In the analysis here, we consider lead times from day 9 to day 10 (+216 to +240 h). Over this 30-h period we select the maximum wind speed/wave height (Fig. 1b) from each member.

Procedure for advanced lead times ensemble forecasts pooling. Construction of the synthesized dataset to find direct return estimates out of the 1000 highest peaks. (a) Each ensemble forecast diverges with time. (b) For each ensemble member, the maximum value is selected from lead times +216 to +240 h. (c) The selected maximum values are representative of a 30-h period. Each period is independent, as shown by the vertical dotted lines. (d) The selected maximum values represent independent realizations of extreme wind speed/wave height and are pooled for the EVA.
Citation: Journal of Climate 31, 21; 10.1175/JCLI-D-18-0217.1
Thus, the 30-h maximum is kept for each of the 50 ensemble members. It is, however, important to note that the values obtained for two consecutive forecasts, which commenced 12 h apart for, say, ensemble member 1 (ENS1), are not related as they are started from initial states 12 h apart and then diverge as the forecast evolves (out to days 9 and 10). Indeed, the perturbations added to each member at analysis time are random (see discontinuities in Fig. 1c). This is quite different from traditional EVA (e.g., POT, AM, IDM) in which values are selected from a continuous time series. As will be shown below, it is thus necessary to assign some representative time interval to each of these independent values to arrive at an equivalent length of our dataset.
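The pooling step described above can be sketched as follows. The nested-list layout `forecasts[f][m][k]`, the function name, and the example values are hypothetical; only the five lead times and the per-member maximum come from the text.

```python
LEAD_TIMES_H = (216, 222, 228, 234, 240)


def pool_member_maxima(forecasts):
    """Flatten forecasts[f][m][k] into one list of per-member maxima.

    f indexes the forecast start, m the ensemble member, and k the
    lead time in LEAD_TIMES_H.
    """
    pooled = []
    for members in forecasts:
        for lead_time_values in members:
            if len(lead_time_values) != len(LEAD_TIMES_H):
                raise ValueError("expected one value per lead time")
            pooled.append(max(lead_time_values))
    return pooled


# Two forecast starts with two members each (hypothetical wave heights in m).
forecasts = [
    [[2.1, 2.4, 2.2, 2.0, 1.9], [3.0, 3.3, 3.9, 3.5, 3.1]],
    [[1.2, 1.1, 1.4, 1.6, 1.5], [4.0, 3.8, 3.6, 3.2, 3.0]],
]
pooled = pool_member_maxima(forecasts)
```

Each forecast start and member contributes exactly one value, so the pooled list has (number of forecasts) × (number of members) entries, with no time ordering implied between them.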
Such an analysis is valid, and thus provides relevant results, only if the sample of data has specific characteristics (Coles 2001). These characteristics can be represented by the following methodological criteria.
a. Methodology criteria
Breivik et al. (2013, 2014) proposed four criteria required for statistical validity of the ensemble data when applied to EVA:
- No significant correlation exists between ensemble members at advanced lead times (i.e., i.i.d. ensemble members).
- Model climatology is comparable to the observed climatology distribution.
- There is no spurious trend in the dataset due to model updates.
- Forecasts are representative for a specific time interval.
1) Criterion 1: i.i.d. ensemble members

In Eq. (1):
- the anomaly correlation coefficient (ACC) is computed between the series of a variable x from two ensemble members (x represents 10-m wind speed and significant wave height in the present application);
- the two ensemble members are randomly chosen (in this case ENS1 and ENS50);
- t is the time level, from 1 to T (T = 2192 days × 2 = 4384 values; the total of the 2010–16 6-yr dataset at two forecasts per day); and
- the anomalies are computed by subtracting the monthly mean for each month of each year of the 2010–16 ENS dataset.
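As a rough illustration of this kind of test, the sketch below computes monthly anomalies and a centered anomaly correlation between two series. It assumes the common definition of the centered anomaly correlation (the Pearson correlation of the anomalies); this may differ in detail from Eq. (1), and the function names and the (year, month) keys are hypothetical.

```python
import math


def monthly_anomalies(values, month_keys):
    """Subtract from each value the mean of all values sharing its (year, month) key."""
    sums, counts = {}, {}
    for v, key in zip(values, month_keys):
        sums[key] = sums.get(key, 0.0) + v
        counts[key] = counts.get(key, 0) + 1
    return [v - sums[key] / counts[key] for v, key in zip(values, month_keys)]


def centered_anomaly_correlation(anom_a, anom_b):
    """Pearson correlation of two anomaly series of equal length."""
    mean_a = sum(anom_a) / len(anom_a)
    mean_b = sum(anom_b) / len(anom_b)
    dev_a = [a - mean_a for a in anom_a]
    dev_b = [b - mean_b for b in anom_b]
    numerator = sum(a * b for a, b in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(a * a for a in dev_a) * sum(b * b for b in dev_b))
    return numerator / denominator


# Hypothetical values for two members over two months.
keys = [(2010, 3), (2010, 3), (2010, 4), (2010, 4)]
anom_1 = monthly_anomalies([1.0, 3.0, 10.0, 20.0], keys)
anom_2 = monthly_anomalies([4.0, 2.0, 25.0, 15.0], keys)
acc = centered_anomaly_correlation(anom_1, anom_2)
```

Removing the monthly mean first matters: without it, two members would appear correlated simply because they share the same seasonal cycle.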
Testing was performed initially at the same three reference grid points for the North Atlantic Ocean and the North Sea as used by Breivik et al. (2013) (Fig. 2):
- P40: Ekofisk Oil Field, WMO code LF5U (56.50°N, 3.20°E),
- P35: Heidrun Oil Field, WMO code LF3N (65.30°N, 7.30°E),
- B16: K5 buoy, WMO code 64045 (59.10°N, 11.40°W).

Locations used for testing the methodology criteria: P40—Ekofisk Oil Field, WMO code LF5U (56.50°N, 3.20°E); P35—Heidrun Oil Field, WMO code LF3N (65.30°N, 7.30°E); and B16—K5 buoy, WMO code 64045 (59.10°N, 11.40°W).
The ACC [Eq. (1)] for


As expected, the correlation is higher for shorter lead times and is higher for the open ocean sites, compared to enclosed seas (P40). The reason why lower anomaly correlations are found in the North Sea compared to the North Atlantic is the reduced swell propagation to this more sheltered region. As such, locally generated wind sea will largely define the wave climate. Wind sea has lower predictability at long lead times than swell (see Fig. 4).
The results of Table 1 are comparable to the values found by Breivik et al. (2013). The small differences found may be due to the different approach used in the subtraction of seasonality in the data, and to the different time spans of the datasets. The present dataset spans the period 2010 to 2016, while Breivik et al. (2013) analyzed data from 1999 to 2009.
The ACC [Eq. (1)] was computed for both

Centered anomaly correlation:

Centered anomaly correlation:
To test whether the ensemble members were identically distributed, a quantile–quantile (Q–Q) comparison was performed for the three test sites of Fig. 2. The Q–Q plots and scatterplots are shown in Fig. 5. Again, results are shown for the two ensemble members ENS1 and ENS50. The large amount of scatter confirms that

Quantile–quantile plots of ENS1 against ENS50 at +216-h lead time for the test locations: (a),(b) P40 in the North Sea and (c),(d) P35 and (e),(f) B16 in the North Atlantic, showing (left)
For comparison with the findings by Breivik et al. (2013), Fig. 5 presents the three North Atlantic sites shown in Fig. 2. The same analysis was carried out at a total of 33 locations around the globe (Fig. 6), yielding results consistent with those in Fig. 5.
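A quantile–quantile comparison of two samples can be sketched as below. The interpolation rule, the number of quantiles, and the function names are assumptions chosen for illustration; they are not necessarily those used to produce Fig. 5.

```python
def empirical_quantiles(values, probabilities):
    """Empirical quantiles by linear interpolation between order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in probabilities:
        position = p * (n - 1)
        lower = int(position)
        upper = min(lower + 1, n - 1)
        weight = position - lower
        result.append(ordered[lower] * (1 - weight) + ordered[upper] * weight)
    return result


def qq_pairs(sample_a, sample_b, n_quantiles=99):
    """Pairs of matching quantiles; points near the 1:1 line indicate similar distributions."""
    probs = [(i + 1) / (n_quantiles + 1) for i in range(n_quantiles)]
    return list(zip(empirical_quantiles(sample_a, probs),
                    empirical_quantiles(sample_b, probs)))


# Hypothetical samples standing in for two ensemble members.
member_a = [1.0, 2.0, 3.0, 4.0, 5.0]
member_b = [5.0, 3.0, 1.0, 4.0, 2.0]
pairs = qq_pairs(member_a, member_b)
```

The two example samples hold the same values in a different order, so every pair lies on the 1:1 line even though a scatterplot of the paired values would show no relationship. This is the combination (identical distribution, no correlation) that the i.i.d. criterion requires.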

Test locations over the global dataset domain.
2) Criterion 2: Modeled and observed climatology
Wind and wave model data (ENS operational forecasts) must represent the observed climatology to be used for an EV estimation. Hence a second check of the data is performed involving comparisons of the following: 1) monthly ENS
To test the model climatology we first investigate the satellite dataset recently calibrated, validated and extended until 1 April 2015 by Young et al. (2017). The objective of the comparison is to evaluate the monthly mean and 90th percentile values of
The comparison is performed for the ensemble (1° × 1° resolution) at the three locations mentioned in section 1 (Fig. 2). We selected 2° × 2° satellite regions that surround the model locations. The maximum normalized difference values of
Comparison of normalized monthly average differences between the ENS forecasts and satellite (SAT) measurements (Young et al. 2017). The maximum normalized differences


Comparison of (left)
A comparison of mean monthly values was also performed using ERA-Interim for the same period as the ENS forecasts (3 March 2010 to 29 February 2016). The ERA-Interim reanalysis provides data at the same spatial resolution (on a 1.0° grid) as the ENS, and the temporal resolution is every 6 h. Table 4 shows the maximum normalized difference values
Comparison of normalized monthly average differences between the ENS forecast dataset and ERA-Interim. The maximum normalized differences


Q–Q plots of ensemble member 1 at +216-h lead time and ERA-Interim for the testing locations—(a),(b) P40 North Sea; (c),(d) P35 North Atlantic; and (e),(f) B16 North Atlantic—comparing (left)
A comparison with monthly 90th percentiles was undertaken to evaluate the climatology of the extremes. Figure 9 shows the normalized

Comparison of (left)
The comparison for an additional six locations, chosen from the total tested (Fig. 6), is shown for three Northern Hemisphere points in Fig. 10 and three Southern Hemisphere points in Fig. 11. Differences between the model and satellite climatologies are shown at location NWPO1 (Figs. 10a,b). This is an area visited by tropical cyclones. Because of the coarse resolution (32 km × 32 km for the atmospheric component of IFS), the model is unable to reproduce the intensity of these extreme events and we see significant differences in the monthly 90th percentiles of significant wave height. It is also interesting to note the differences between the model and the satellite monthly 90th percentiles at location NIO1 (Figs. 10e,f) during the monsoon months. This is globally the location where the largest differences were found.

Comparison of (left)

As in Fig. 10, but for the Southern Hemisphere: (a),(b) SO1 Southern Ocean; (c),(d) SEPO1 southeast Pacific Ocean; and (e),(f) SAO3 South Atlantic Ocean (codes refer to test locations in Fig. 6).
The results for the Southern Hemisphere (Fig. 11) show that the satellite and model monthly 90th percentiles are in good agreement, with normalized differences generally below 10%.
Both the altimeter and ERA-Interim comparisons were performed for all the locations in Fig. 6 around the globe. Although there are some regional variations (e.g., both models underestimate wind speeds and wave heights in the Somali jet), the IFS generally reproduces the climatology (mean and 90th percentile) well. There remains a question about how well any model reproduces the extremes above the 90th percentile. However, a rigorous global comparison is not possible as altimeter data are also questionable at such extremes (e.g., undersampling). The ultimate test will be the magnitudes of the resulting extreme value estimates (see section 4).
3) Criterion 3: Spurious trend due to model updates
The ECMWF IFS was significantly upgraded in February 2010 and March 2016 with major changes to the horizontal resolution. To minimize the impact from model updates, we selected only forecasts between March 2010 and February 2016 as the horizontal resolution remained unchanged in this period.
4) Criterion 4: Representative time interval







As noted in section 3 and in Fig. 1, this representative interval should not be confused with the 12-h interval between forecasts in the IFS. Although forecasts are initiated every 12 h, we take values at long lead times (+216 to +240 h). Therefore, values from successive forecasts are independent. If forecasts were issued every 24 h (for example), it would not change the representative interval
4. Extreme value analysis estimates
Here we compare global return period estimates obtained using the ensemble analysis described in section 3 with the initial distribution method applied to the same short-duration model data and POT estimates obtained with longer-duration ERA-Interim and buoy datasets.
a. ENS direct return estimates
To undertake EVA using direct return estimates of 100-yr return period values for significant wave height,

Thus from a 6-yr ensemble archive we obtain a 750-yr equivalent time period. The 100-yr return period value can then be directly estimated from the (in sample) ranked dataset. The ratio
Figure 12 shows global values of

ENS 2010–16 direct return estimates of 100-yr significant wave height
Tropical cyclone paths in both the Atlantic and Pacific tend to track east to west and then turn north along the western shorelines of these basins. The paths of these low pressure systems appear to be reproduced in Fig. 12 with the extreme waves in both the Atlantic and Pacific extending from high latitudes toward the equator along the western boundaries of each basin. Also, baroclinic instability could contribute to the variability of the extremes in these areas due to the contrast of cold air over land with warm air over the ocean. Globally, the values are comparable to the estimates for the 2003–12 period of Breivik et al. (2014) but slightly lower. There are two possible causes for this difference. First, in the present analysis, we select the highest value of the five lead times compared to Breivik et al. (2014), who selected only the value at one lead time. However, in these two cases the probability levels at the 100-yr event are also different [7.5/1000 for the present analysis compared to 2.29/1000 for Breivik et al. (2014)]. Second, the actual simulation periods are different [2010–16 for the present analysis and 2003–12 for Breivik et al. (2014)].
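The 7.5/1000 probability level quoted above can be checked with a short worked sketch. It assumes that the equivalent length is the product of the number of forecasts (4384), the number of members (50), and the 30-h representative interval, which reproduces the 750-yr figure given in the text. The linear interpolation between the 7th and 8th highest values and the function names are illustrative assumptions.

```python
HOURS_PER_YEAR = 365.25 * 24


def equivalent_length_years(n_forecasts, n_members, interval_h):
    """Equivalent record length if every pooled value represents interval_h hours."""
    return n_forecasts * n_members * interval_h / HOURS_PER_YEAR


def direct_return_estimate(pooled_values, equivalent_years, return_period_years):
    """In-sample return value: the level exceeded equivalent_years / return_period times."""
    rank = equivalent_years / return_period_years  # e.g. 7.5: between 7th and 8th highest
    ordered = sorted(pooled_values, reverse=True)
    if rank < 1 or rank > len(ordered):
        raise ValueError("return period is not in-sample")
    lower = int(rank)  # 1-based rank of the higher-valued neighbour
    weight = rank - lower
    higher_value = ordered[lower - 1]
    lower_value = ordered[min(lower, len(ordered) - 1)]
    return higher_value * (1 - weight) + lower_value * weight


years = equivalent_length_years(4384, 50, 30.0)  # about 750 yr
# Hypothetical pooled values 1..20; the 100-yr level sits at rank 7.5.
estimate = direct_return_estimate(list(range(1, 21)), 750.0, 100.0)
```

With a 750-yr equivalent record, the 100-yr level lies between the 7th and 8th highest pooled values, so only the largest peaks need to be retained and no distribution has to be fitted or extrapolated.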
Similarly, we derived 100-yr return values of neutral 10-m wind speed,

As in Fig. 12, but for 100-yr wind speed
There is also a suggestion of a band of higher
The 95% confidence intervals of the direct return estimates were computed from 500 bootstraps of the 1000 highest peaks considered in the EVA (Breivik et al. 2014; Breivik and Aarnes 2017). As outlined by Breivik and Aarnes (2017), tail estimation where only the k highest values in a bootstrap sequence are required can be greatly economized by keeping only a subset of the original dataset consisting of the

Width of ENS 2010–16 direct return estimate 95% confidence intervals for 100-yr significant wave height
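A plain percentile bootstrap of a tail statistic can be sketched as follows. This is a simplified illustration only: it resamples the retained peaks directly and does not implement the economized procedure of Breivik and Aarnes (2017). The function names, the fixed seed, and the choice of the 8th highest value as the statistic are assumptions.

```python
import random


def eighth_highest(sample):
    """Example tail statistic: the 8th highest value of the sample."""
    return sorted(sample, reverse=True)[7]


def bootstrap_interval(peaks, statistic, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval of statistic(peaks)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(peaks, k=len(peaks))  # sampling with replacement
        estimates.append(statistic(resample))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    low = estimates[int(tail * (n_boot - 1))]
    high = estimates[int(round((1.0 - tail) * (n_boot - 1)))]
    return low, high


# Hypothetical set of 1000 retained peaks.
peaks = [float(i) for i in range(1000)]
low, high = bootstrap_interval(peaks, eighth_highest)
```

The width `high - low` is the quantity of the kind mapped in the confidence interval figures. It shrinks as more independent peaks populate the tail.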
The confidence limits for

As in Fig. 14, but for 100-yr wind speed
b. ENS initial distribution method
As noted above, the present analysis uses an archive of six years’ worth of ECMWF ensemble forecasts issued twice daily with 50 members. To make an assessment of how the extreme value analysis of the ensemble compares against a more traditional approach of a 6-yr time series we have performed an IDM analysis where we have selected two ensemble members and one lead time from each forecast (+222 h for ENS1 and +240 h for ENS50). This represents the same amount of data as a 6-yr time series with 6-hourly resolution (which is the temporal resolution of for example ERA-Interim). A 6-yr time series would be too short for traditional EVA. A POT analysis is not feasible with such a short record, let alone AM (which would consist of only six values). As noted in section 2, the IDM approach has a number of theoretical limitations and hence we do not advocate its use. Nevertheless, for such a short time series, it would be the only alternative and it is included here as a comparison to the DRE drawn from the same climatology. A decorrelation value of 3 h was chosen for the IDM analysis (Vinoth and Young 2011). As there is no agreed theoretical choice for the IDM PDF, a goodness-of-fit test was performed for each of the three testing points P40, P35, and B16 (Fig. 2) and to the additional locations of Fig. 6. The best fits were achieved with a Gumbel distribution for

ENS 2010–16 100-yr return value estimates of

ENS 2010–16 100-yr return value estimates of
The other significant difference compared to the DRE is the fact that the IDM shows the maximum values in both hemispheres of comparable magnitude. This is not surprising, as previous studies (e.g., Young 1994, 1999) have shown that the mean monthly values of
c. ERA-Interim POT method
It is common practice to determine extremes (design sea states) using reanalysis datasets such as ERA-Interim. As we have a relatively long reanalysis dataset (38 yr), it is possible to undertake a POT analysis. As this has a far sounder theoretical basis, it represents a more compelling comparison than the IDM approach.
As the ERA-Interim dataset is long, this analysis also tests whether adopting the relatively short 6-yr dataset for the DRE introduces any significant bias associated with a time period which is not representative of the general climate. The POT EVA was applied to the full ERA-Interim dataset (1979–2017) with a threshold at each location chosen at the local 99.8th percentile. An exponential distribution was then fitted to the data, with the results appearing in Figs. 18 and 19 for

ERA-Interim 1979–2017 POT 100-yr return period significant wave height

ERA-Interim 1979–2017 POT 100-yr return period wind speed
The spatial distributions are remarkably similar to the DRE results (Figs. 12 and 13). As for the DRE, the maximum values are located in the Northern Hemisphere and the maxima tend to be displaced toward the western sides of the North Atlantic and North Pacific, consistent with the locations of the major storm tracks.
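As an illustration of the POT procedure used for ERA-Interim in this subsection (threshold at the local 99.8th percentile, exponential fit to the excesses), a minimal Python sketch for a single grid point is given here. The function names, the 6-hourly time step, and the use of all exceedances without declustering are assumptions made for illustration and are not taken from the authors' code. With an exponential tail, the T-yr return value follows from x_T = u + σ ln(λT), where u is the threshold, σ the mean excess over the threshold, and λ the mean number of exceedances per year.

```python
import math
import random

HOURS_PER_YEAR = 8766.0  # 365.25 days x 24 h


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def pot_exponential_return_value(series, step_hours=6.0, pct=99.8,
                                 return_years=100.0):
    """Return value from an exponential fit to threshold excesses.

    Assumption: every value above the threshold is treated as an
    independent exceedance (no declustering is applied in this sketch).
    """
    threshold = percentile(sorted(series), pct)
    excesses = [x - threshold for x in series if x > threshold]
    # Maximum-likelihood estimate of the exponential scale is the mean excess.
    scale = sum(excesses) / len(excesses)
    record_years = len(series) * step_hours / HOURS_PER_YEAR
    rate = len(excesses) / record_years  # exceedances per year
    return threshold + scale * math.log(rate * return_years)


if __name__ == "__main__":
    # Synthetic 38-yr, 6-hourly series (hypothetical data, not ERA-Interim).
    rng = random.Random(0)
    synthetic = [rng.expovariate(1.0) for _ in range(55518)]
    print(pot_exponential_return_value(synthetic))
```

In a gridded application the function would be called once per grid point, so that the threshold is local, as in the analysis described above.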
The maximum values of
d. In situ measurements—NDBC buoys
The ENS return values are here compared with buoy data. In this case, we use selected buoys from the U.S. National Data Buoy Center (NDBC). Buoys were selected on the criteria that the mooring water depth is more than 300 m, that the buoy is more than 100 km offshore (to ensure the model data are not influenced by land, considering the relatively coarse native resolution of the model of about 54 km for the wave field and 32 km for the atmosphere, here interpolated to 1.0° × 1.0°

NDBC buoys that meet the requirements for EVA comparison to ENS model forecasts.
NDBC buoys record wave height and wind speed every hour. To compare these measurements to the 6-h forecasts of the ENS dataset used in the EVA, we average the buoy data over a ±2-h window centered at synoptic times, 0000–0600 and 1200–1800 UTC (Bidlot et al. 2002). We then select peaks above the 90th percentile, with a separation time greater than 48 h. The time separation ensures storm independence following Lopatoukhin et al. (2000). A peaks-over-threshold analysis was then applied, with the selected peaks fitted to a generalized Pareto distribution. The results are shown in Table 5 for
DRE comparison with EVA of buoy data measurements averaged over the synoptic times. EVA of 100-yr return values of

DRE comparison with EVA of buoy data measurements averaged over the synoptic times. EVA of 100-yr return values of 10-m wind speed (

Regarding the winds, the largest difference is found for Hawaii (WMO51003). Some of the islands are smaller than the model resolution. Thus, the model cannot correctly represent the topography of the area, which may provide shelter for in situ wind measurements. In general, there is good agreement between DRE results from model ensembles and EVA results from buoys.
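The buoy processing chain described in this subsection (±2-h averaging around synoptic times, peak selection above the 90th percentile with more than 48 h between peaks, and a generalized Pareto fit) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the hourly series is assumed to be gap-free and to start at a synoptic hour, and the generalized Pareto parameters are obtained by the method of moments, which is a stand-in chosen for brevity because the fitting method is not specified in the text.

```python
import math


def synoptic_means(hourly, half_window=2, step=6):
    """Average hourly values over +/- half_window h around every step-th hour.

    Assumption: hourly[0] falls on a synoptic time and there are no gaps.
    """
    means = []
    for centre in range(0, len(hourly), step):
        window = hourly[max(0, centre - half_window): centre + half_window + 1]
        means.append(sum(window) / len(window))
    return means


def decluster_peaks(values, threshold, step_hours=6.0, min_sep_hours=48.0):
    """Indices of exceedances separated by more than min_sep_hours.

    Candidates are visited from largest to smallest, so the highest value
    of each storm is the one retained.
    """
    candidates = sorted((i for i, v in enumerate(values) if v > threshold),
                        key=lambda i: values[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) * step_hours > min_sep_hours for j in kept):
            kept.append(i)
    return sorted(kept)


def gpd_return_value(peaks, threshold, record_years, return_years=100.0):
    """Return value from a method-of-moments generalized Pareto fit."""
    excesses = [p - threshold for p in peaks]
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((y - mean) ** 2 for y in excesses) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)          # xi
    scale = 0.5 * mean * (1.0 + ratio)   # sigma
    lam_t = (n / record_years) * return_years  # expected peaks in T years
    if abs(shape) < 1e-6:
        # Exponential limit of the generalized Pareto distribution.
        return threshold + scale * math.log(lam_t)
    return threshold + scale / shape * (lam_t ** shape - 1.0)
```

In use, the threshold passed to `decluster_peaks` would be the 90th percentile of the synoptic means, and `record_years` the length of the buoy record after averaging.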
5. Discussion
The present work extends the method by Breivik et al. (2013, 2014) to estimate wind and wave extremes from operational forecast model ensembles. We focused on the significant wave height and wind speed from a 6-yr archive (2010–16) of the ECMWF operational forecast model. We derived direct return estimates (DREs) of
The global results for both wind speed and wave height direct return estimates are in good agreement with a POT analysis of the 38-yr-long ERA-Interim dataset. Similarly, the results are in good agreement with POT analyses of buoy records at a range of locations. Both of these comparisons provide strong validation for the approach.
The
Furthermore, the higher correlations between ensemble members found in tropical areas suggest that particular attention should be paid to the evaluation of the
As mentioned, in the DRE of
The present work also compared the ENS IDM estimates with an IDM approach applied to ERA-Interim. The ERA-Interim
6. Conclusions
This study has extended the approach of Breivik et al. (2013, 2014) for estimating wind and wave extremes from the atmosphere–wave model ensembles of the ECMWF operational forecasts. The main advantage of the ensemble approach presented here is the ability to synthesize a long-duration equivalent dataset for EVA. Considering ensemble members at advanced lead times as independent realizations of atmospheric and ocean surface wave states allows the evaluation of extreme values with classical extreme value methods (Coles 2001). This paper demonstrates the potential of this innovative approach in evaluating wind and wave extremes over the global oceans. With the expected increase in NWP model resolution over the next decade, further improvement is to be expected in model estimates of the ocean and atmosphere (Bauer et al. 2015), which will further strengthen the present approach. A focus on tropical cyclone winds and wave heights in ensemble forecasts is the natural next step in order to understand whether increased resolution could correctly reproduce these extreme events and hence provide reliable EVAs.
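The idea of a long-duration equivalent dataset can be made concrete with a short Python sketch. It assumes, as in the IDM comparison above, that each pooled forecast value stands for a 6-h interval, and it reads the return value directly from the empirical tail of the pooled sample. The function names and the interpolated-quantile convention are illustrative assumptions, not the authors' implementation, and the sample size in the example is hypothetical rather than the size of the archive used in this study.

```python
HOURS_PER_YEAR = 8766.0  # 365.25 days x 24 h


def equivalent_years(n_values, hours_per_value=6.0):
    """Equivalent record length when each pooled value represents a fixed interval."""
    return n_values * hours_per_value / HOURS_PER_YEAR


def direct_return_estimate(pooled, return_years=100.0, hours_per_value=6.0):
    """Empirical quantile of the pooled sample at the T-yr exceedance level.

    Assumption: the pooled values are independent draws from a stationary
    climate, so the T-yr return value is the level exceeded with probability
    hours_per_value / (return_years * HOURS_PER_YEAR) per value.
    """
    n = len(pooled)
    if equivalent_years(n, hours_per_value) < return_years:
        raise ValueError("equivalent record is shorter than the return period")
    p = 1.0 - hours_per_value / (return_years * HOURS_PER_YEAR)
    ranked = sorted(pooled)
    pos = (n - 1) * p
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    # Linear interpolation between the two bracketing order statistics.
    return ranked[lo] + (pos - lo) * (ranked[hi] - ranked[lo])


if __name__ == "__main__":
    # Hypothetical count: 50 members, 2 forecasts per day, 6 yr, one lead time.
    n_pooled = int(50 * 2 * 365.25 * 6)
    print(equivalent_years(n_pooled))
```

Because the estimate relies on only the few largest values in the pooled sample, confidence limits (e.g., from a bootstrap) would normally accompany it.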
Stationarity has been assumed in this work; however, this may not always be the case for return value estimates of atmospheric and oceanic variables, as shown by Young et al. (2011). Future EVA should consider possible increases or decreases of significant wave height and wind speed extremes for the different ocean basins (Hemer et al. 2013; Wang et al. 2014; Aarnes et al. 2015) and integrate the present approach with climate projections.
IRY gratefully acknowledges the support of the Australian Research Council through Grants DP130100215 and DP160100738. ØB gratefully acknowledges support from the Research Council of Norway through the project ExWaMar (Grant 256466) and from the European Research Area for Climate Services through the ERA-4CS project WINDSURFER. The authors would also like to acknowledge Ole Johan Aarnes for supplying the code used by Breivik et al. (2014) to calculate their confidence limits.
REFERENCES
Aarnes, O. J., Ø. Breivik, and M. Reistad, 2012: Wave extremes in the northeast Atlantic. J. Climate, 25, 1529–1543, https://doi.org/10.1175/JCLI-D-11-00132.1.
Aarnes, O. J., S. Abdalla, J.-R. Bidlot, and Ø. Breivik, 2015: Marine wind and wave height trends at different ERA-Interim forecast ranges. J. Climate, 28, 819–837, https://doi.org/10.1175/JCLI-D-14-00470.1.
Alves, J. H. G. M., and I. R. Young, 2003: On estimating extreme wave heights using combined Geosat, Topex/Poseidon and ERS-1 altimeter data. Appl. Ocean Res., 25, 167–186, https://doi.org/10.1016/j.apor.2004.01.002.
Bauer, P., A. Thorpe, and G. Brunet, 2015: The quiet revolution of numerical weather prediction. Nature, 525, 47–55, https://doi.org/10.1038/nature14956.
Bender, L., N. Guinasso Jr., J. Walpert, and S. D. Howden, 2010: A comparison of methods for determining significant wave heights applied to a 3-m discus buoy during Hurricane Katrina. J. Atmos. Oceanic Technol., 27, 1012–1028, https://doi.org/10.1175/2010JTECHO724.1.
Bidlot, J.-R., D. J. Holmes, P. A. Wittmann, R. Lalbeharry, and H. S. Chen, 2002: Intercomparison of the performance of operational ocean wave forecasting systems with buoy data. Wea. Forecasting, 17, 287–310, https://doi.org/10.1175/1520-0434(2002)017<0287:IOTPOO>2.0.CO;2.
Breivik, Ø., and O. J. Aarnes, 2017: Efficient bootstrap estimates for tail statistics. Nat. Hazards Earth Syst. Sci., 17, 357–366, https://doi.org/10.5194/nhess-17-357-2017.
Breivik, Ø., O. J. Aarnes, J.-R. Bidlot, A. Carrasco, and Ø. Saetra, 2013: Wave extremes in the northeast Atlantic from ensemble forecasts. J. Climate, 26, 7525–7540, https://doi.org/10.1175/JCLI-D-12-00738.1.
Breivik, Ø., O. J. Aarnes, S. Abdalla, J.-R. Bidlot, and P. A. Janssen, 2014: Wind and wave extremes over the world oceans from very large ensembles. Geophys. Res. Lett., 41, 5122–5131, https://doi.org/10.1002/2014GL060997.
Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF Ensemble Prediction System. Mon. Wea. Rev., 125, 99–119, https://doi.org/10.1175/1520-0493(1997)125<0099:PFSOEP>2.0.CO;2.
Buizza, R., and T. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 1434–1456, https://doi.org/10.1175/1520-0469(1995)052<1434:TSVSOT>2.0.CO;2.
Buizza, R., J.-R. Bidlot, N. Wedi, M. Fuentes, M. Hamrud, G. Holt, and F. Vitart, 2007: The new ECMWF VAREPS (Variable Resolution Ensemble Prediction System). Quart. J. Roy. Meteor. Soc., 133, 681–695, https://doi.org/10.1002/qj.75.
Caires, S., 2011: Extreme value analysis: Wave data. Joint WMO/IOC Technical Commission for Oceanography and Marine Meteorology (JCOMM) Tech. Rep. 57, 33 pp., https://www.oceanbestpractices.net/handle/11329/367.
Caires, S., 2016: A comparative simulation study of the annual maxima and the peaks-over-threshold methods. J. Offshore Mech. Arctic Eng., 138, 051601, https://doi.org/10.1115/1.4033563.
Caires, S., and A. Sterl, 2005: 100-year return value estimates for ocean wind speed and significant wave height from the ERA-40 data. J. Climate, 18, 1032–1048, https://doi.org/10.1175/JCLI-3312.1.
Caires, S., A. Sterl, J. Bidlot, N. Graham, and V. Swail, 2004: Intercomparison of different wind–wave reanalyses. J. Climate, 17, 1893–1913, https://doi.org/10.1175/1520-0442(2004)017<1893:IODWR>2.0.CO;2.
Coles, S., 2001: An Introduction to Statistical Modeling of Extreme Values. Springer, 208 pp.
Dee, D., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
Durrant, T. H., D. J. Greenslade, and I. Simmonds, 2013: The effect of statistical wind corrections on global wave forecasts. Ocean Modell., 70, 116–131, https://doi.org/10.1016/j.ocemod.2012.10.006.
Epstein, E. S., 1969: Stochastic dynamic prediction. Tellus, 21, 739–759, https://doi.org/10.3402/tellusa.v21i6.10143.
Ferreira, J., and C. G. Soares, 1998: An application of the peaks over threshold method to predict extremes of significant wave height. J. Offshore Mech. Arctic Eng., 120, 165–176, https://doi.org/10.1115/1.2829537.
Furevik, B. R., and H. Haakenstad, 2012: Near-surface marine wind profiles from rawinsonde and NORA10 hindcast. J. Geophys. Res., 117, D23106, https://doi.org/10.1029/2012JD018523.
Gibson, J., P. Kållberg, S. Uppala, A. Hernandez, A. Nomura, and E. Serrano, 1997: ERA description. ECMWF, 72 pp., https://www.ecmwf.int/en/elibrary/9584-era-description.
Gulev, S. K., and V. Grigorieva, 2004: Last century changes in ocean wind wave height from global visual wave data. Geophys. Res. Lett., 31, L24302, https://doi.org/10.1029/2004GL021040.
Hemer, M. A., Y. Fan, N. Mori, A. Semedo, and X. L. Wang, 2013: Projected changes in wave climate from a multi-model ensemble. Nat. Climate Change, 3, 471–476, https://doi.org/10.1038/nclimate1791.
Hersbach, H., and D. Dee, 2016: ERA5 reanalysis is in production. ECMWF Newsletter, No. 147, ECMWF, Reading, United Kingdom, p. 7, https://www.ecmwf.int/en/newsletter/147/news/era5-reanalysis-production.
Holthuijsen, L. H., 2007: Waves in Oceanic and Coastal Waters. Cambridge University Press, 404 pp.
Howden, S., D. Gilhousen, N. Guinasso, J. Walpert, M. Sturgeon, and L. Bender, 2008: Hurricane Katrina winds measured by a buoy-mounted sonic anemometer. J. Atmos. Oceanic Technol., 25, 607–616, https://doi.org/10.1175/2007JTECHO518.1.
Isaksen, L., J. Haseler, R. Buizza, and M. Leutbecher, 2010: The new ensemble of data assimilations. ECMWF Newsletter, No. 123, ECMWF, Reading, United Kingdom, 17–21.
Jensen, R. E., V. R. Swail, R. H. Bouchard, R. E. Riley, T. J. Hesser, M. Blaseckie, and C. MacIsaac, 2015: Field laboratory for ocean sea state investigation and experimentation: FLOSSIE: Intra-measurement evaluation of 6N wave buoy systems. 14th Int. Workshop on Wave Hindcasting and Forecasting and Fifth Coastal Hazard Symposium, Key West, FL, WMO/IOC JCOMM, Vol. A1, http://www.waveworkshop.org/14thWaves/Papers/WW14%20FLOSSIE%20Jensen%20et%20al.pdf.
Large, W., J. Morzel, and G. Crawford, 1995: Accounting for surface wave distortion of the marine wind profile in low-level ocean storms wind measurements. J. Phys. Oceanogr., 25, 2959–2971, https://doi.org/10.1175/1520-0485(1995)025<2959:AFSWDO>2.0.CO;2.
Lewis, J. M., 2005: Roots of ensemble forecasting. Mon. Wea. Rev., 133, 1865–1885, https://doi.org/10.1175/MWR2949.1.
Lopatoukhin, L., V. Rozhkov, V. Ryabinin, V. Swail, A. Boukhanovsky, and A. Degtyarev, 2000: Estimation of extreme wind wave heights. WMO/TD-No. 1041, 73 pp.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119, https://doi.org/10.1002/qj.49712252905.
Palmer, T., and R. Hagedorn, Eds., 2006: Predictability of Weather and Climate. Cambridge University Press, 718 pp.
Pickands, J., III, 1975: Statistical inference using extreme order statistics. Ann. Stat., 3, 119–131, https://doi.org/10.1214/aos/1176343003.
Pineau-Guillou, L., F. Ardhuin, M.-N. Bouin, J.-L. Redelsperger, B. Chapron, J.-R. Bidlot, and Y. Quilfen, 2018: Strong winds in a coupled wave–atmosphere model during a North Atlantic storm event: Evaluation against observations. Quart. J. Roy. Meteor. Soc., 144, 317–332, https://doi.org/10.1002/qj.3205.
Powell, M. D., P. J. Vickery, and T. A. Reinhold, 2003: Reduced drag coefficient for high wind speeds in tropical cyclones. Nature, 422, 279–283, https://doi.org/10.1038/nature01481.
Rabier, F., J.-N. Thépaut, and P. Courtier, 1998: Extended assimilation and forecast experiments with a four-dimensional variational assimilation system. Quart. J. Roy. Meteor. Soc., 124, 1861–1887, https://doi.org/10.1002/qj.49712455005.
Ranjha, R., M. Tjernström, A. Semedo, G. Svensson, and R. M. Cardoso, 2015: Structure and variability of the Oman coastal low-level jet. Tellus, 67A, 25285, https://doi.org/10.3402/tellusa.v67.25285.
Reistad, M., Ø. Breivik, H. Haakenstad, O. J. Aarnes, B. R. Furevik, and J.-R. Bidlot, 2011: A high-resolution hindcast of wind and waves for the North Sea, the Norwegian Sea, and the Barents Sea. J. Geophys. Res. Oceans, 116, C05019, https://doi.org/10.1029/2010JC006402.
Sterl, A., 2004: On the (in)homogeneity of reanalysis products. J. Climate, 17, 3866–3873, https://doi.org/10.1175/1520-0442(2004)017<3866:OTIORP>2.0.CO;2.
Sterl, A., and S. Caires, 2005: Climatology, variability and extrema of ocean waves: The web-based KNMI/ERA-40 wave atlas. Int. J. Climatol., 25, 963–977, https://doi.org/10.1002/joc.1175.
Stopa, J. E., and K. F. Cheung, 2014: Intercomparison of wind and wave data from the ECMWF Reanalysis Interim and the NCEP Climate Forecast System Reanalysis. Ocean Modell., 75, 65–83, https://doi.org/10.1016/j.ocemod.2013.12.006.
Taylor, P. K., and M. J. Yelland, 2001: Comments on “On the effect of ocean waves on the kinetic energy balance and consequences for the inertial dissipation technique.” J. Phys. Oceanogr., 31, 2532–2536, https://doi.org/10.1175/1520-0485(2001)031<2532:COOTEO>2.0.CO;2.
Uppala, S. M., and Coauthors, 2005: The ERA-40 Re-Analysis. Quart. J. Roy. Meteor. Soc., 131, 2961–3012, https://doi.org/10.1256/qj.04.176.
Vinoth, J., and I. Young, 2011: Global estimates of extreme wind speed and wave height. J. Climate, 24, 1647–1665, https://doi.org/10.1175/2010JCLI3680.1.
von Storch, H., and F. Zwiers, 1999: Statistical Analysis in Climate Research. Cambridge University Press, 485 pp.
Wang, X. L., Y. Feng, and V. R. Swail, 2014: Changes in global ocean wave heights as projected using multimodel CMIP5 simulations. Geophys. Res. Lett., 41, 1026–1034, https://doi.org/10.1002/2013GL058650.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. International Geophysics Series, Vol. 100. Academic Press, 676 pp.
WMO, 1998: Guide to wave analysis and forecasting. 2nd ed. WMO-No. 702, 159 pp., https://library.wmo.int/pmb_ged/wmo_702.pdf.
Young, I. R., 1994: Global ocean wave statistics obtained from satellite observations. Appl. Ocean Res., 16, 235–248, https://doi.org/10.1016/0141-1187(94)90023-X.
Young, I. R., 1999: Seasonal variability of the global ocean wind and wave climate. Int. J. Climatol., 19, 931–950, https://doi.org/10.1002/(SICI)1097-0088(199907)19:9<931::AID-JOC412>3.0.CO;2-O.
Young, I. R., and G. Holland, 1996: Atlas of the Oceans: Wind and Wave Climate. Pergamon, 241 pp.
Young, I. R., and M. Donelan, 2018: On the determination of global ocean wind and wave climate from satellite observations. Remote Sens. Environ., 215, 228–241, https://doi.org/10.1016/j.rse.2018.06.006.
Young, I. R., S. Zieger, and A. V. Babanin, 2011: Global trends in wind speed and wave height. Science, 332, 451–455, https://doi.org/10.1126/science.1197219.
Young, I. R., J. Vinoth, S. Zieger, and A. V. Babanin, 2012: Investigation of trends in extreme value wave height and wind speed. J. Geophys. Res., 117, C00J06, https://doi.org/10.1029/2011JC007753.
Young, I. R., E. Sanina, and A. Babanin, 2017: Calibration and cross-validation of a global wind and wave database of altimeter, radiometer and scatterometer measurements. J. Atmos. Oceanic Technol., 34, 1285–1306, https://doi.org/10.1175/JTECH-D-16-0145.1.
Zeng, L., and R. A. Brown, 1998: Scatterometer observations at high wind speeds. J. Appl. Meteor., 37, 1412–1420, https://doi.org/10.1175/1520-0450(1998)037<1412:SOAHWS>2.0.CO;2.
Zieger, S., J. Vinoth, and I. R. Young, 2009: Joint calibration of multiplatform altimeter measurements of wind speed and wave height over the past 20 years. J. Atmos. Oceanic Technol., 26, 2549–2564, https://doi.org/10.1175/2009JTECHA1303.1.
Zieger, S., A. V. Babanin, and I. R. Young, 2014: Changes in ocean surface wind with a focus on trends in regional and monthly mean values. Deep-Sea Res. I, 86, 56–67, https://doi.org/10.1016/j.dsr.2014.01.004.
See the list of major model updates maintained by ECMWF (https://goo.gl/2G5A7x).