1. Introduction
Extreme value analysis (EVA) techniques are widely used to estimate extreme magnitudes in many branches of science, and specifically in ocean engineering. The basic principle of EVA is that, from a long-term record of a particular random variable, it is possible to infer statistically the values that will be exceeded over a time horizon beyond the analyzed record (Castillo 1988; Coles et al. 2001; Goda 1992; Gumbel 2012). This is possible because, under specific conditions, the ordered extreme values (hence the empirical probability distribution) follow an asymptotic behavior that can be modeled in the framework of the generalized extreme value (GEV) distribution (Fisher and Tippett 1928; Goda 1989; Bierlaire et al. 2008; Coles et al. 2001; Ewans and Jonathan 2008; Embrechts et al. 1997). Naturally, the application of EVA is a very delicate subject, because it is in essence an extrapolation exercise. This is especially critical in engineering applications, where the estimated values are generally intended for design purposes (Ewans and Jonathan 2008): underestimation implies the risk of structural failure, while overestimation is associated with higher, sometimes unfeasible, implementation costs (Bitner-Gregersen et al. 1995). The application of EVA therefore requires strict control of the regularity conditions on the tail of the distribution. In particular, the main lines of action for modeling extremes concern the conditions that the random variables be independent and identically distributed (e.g., Gumbel 1958; Vinoth and Young 2011). To satisfy independence in EVA, two well-known approaches are typically followed for sampling the extremes: the annual maxima series (AMS) and the peak over threshold (POT) method. The advantage of AMS is that block maxima have negligible dependence, but this comes at the cost of excluding a significant amount of data that might be relevant for the distribution of the extremes (Huang et al. 2015; Leadbetter et al. 2012). A smaller sample also implies a larger variance, hence more uncertainty in the fitted parameters. These shortcomings can be tackled to some extent with the POT method, in which a larger sample is obtained by setting a threshold above which values are considered extremes (Coles et al. 2001; Hundecha et al. 2008). A trade-off exists, however, in the selection of the threshold: a low value may contaminate the distribution of extremes with regular occurrences (Caires and Sterl 2005), while too high a threshold results in a small sample and thus the same shortcomings as AMS. Setting the threshold is therefore a critical but intricate task in the POT method (Jonathan et al. 2008; see also section 4).
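The POT sampling step can be illustrated with a simple runs-declustering routine: threshold exceedances separated by fewer than `min_gap` time steps are treated as one storm, and only the storm peak is kept, so that the retained peaks are approximately independent. This is a minimal sketch assuming a regularly sampled Hs series; the function name `decluster_peaks` and the separation criterion are hypothetical and are not the declustering rule used in this study.

```python
from typing import List, Optional, Tuple


def decluster_peaks(
    hs: List[float], threshold: float, min_gap: int
) -> List[Tuple[int, float]]:
    """Return (index, value) of storm peaks above `threshold` (runs declustering).

    Exceedances separated by fewer than `min_gap` consecutive non-exceeding
    steps belong to the same cluster; only the cluster maximum is kept.
    """
    peaks: List[Tuple[int, float]] = []
    current: Optional[Tuple[int, float]] = None  # running maximum of the open cluster
    steps_below = min_gap  # so that the first exceedance always opens a cluster
    for i, h in enumerate(hs):
        if h > threshold:
            if current is None or steps_below >= min_gap:
                if current is not None:
                    peaks.append(current)  # close the previous cluster
                current = (i, h)
            elif h > current[1]:
                current = (i, h)  # same cluster, new maximum
            steps_below = 0
        else:
            steps_below += 1
    if current is not None:
        peaks.append(current)
    return peaks


# Toy series: the first two bursts merge (gap of one step), the last one is separate.
print(decluster_peaks([0, 3, 4, 0, 5, 0, 0, 0, 6, 2], threshold=2.5, min_gap=2))
# [(4, 5), (8, 6)]
```

For 6-hourly data such as ERA-Interim, `min_gap=8` would correspond to a 48-h separation; that value is only an example, not a recommendation.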
While the implementation of EVA methods generally focuses on the condition of independence, less attention is usually given to the condition that the data be identically distributed. The consequences of overlooking this condition can be serious, because in such a case the condition of homogeneity is also violated (Caires and Sterl 2005; Jonathan et al. 2008; Mackay et al. 2010). This is typically the case for environmental variables, which are indeed expected to be nonidentically distributed. For instance, owing to their inherent seasonal variability, they usually comprise more than one data population (Coles and Walshaw 1994; Forristall 2004; Jones et al. 2016; Vinoth and Young 2011). A straightforward and illustrative example is given by the two monsoon regimes in the Indian Ocean. In summer, moist, warm, and relatively strong southwesterly winds blow from the sea, causing the well-known rainy season of the area. In winter, cold and dry northeasterly winds blow from the Tibetan Plateau, imposing cold and dry weather conditions (e.g., Cadet 1979). In practical terms, with a sufficiently careful selection, the extreme values of such time series can easily be made to fulfill independence, using either AMS or POT. However, two different populations are clearly involved in the analysis. These two populations have different distributions, means, and standard deviations, and certainly also different behaviors in the extremes. Therefore, the working sample is not homogeneous, and it is clearly not identically distributed. This issue has to be taken into account in any statistical analysis, and in particular in the implementation of EVA methods, because the identically distributed condition is a fundamental requirement (Forristall 2004; Jones et al. 2016; Young et al. 2012).
In wave data analysis, these problems have been addressed by several authors (Davison and Smith 1990; Jonathan et al. 2008; Jones et al. 2016; Mackay et al. 2010; Vinoth and Young 2011). In particular, the covariate effects related to directionality and seasonality are very conspicuous in wave data (Ewans and Jonathan 2008; Morton et al. 1997). To account for directionality using integral parameters (Hs and θm), the common practice is to subdivide the Hs values according to directional sectors, assuming that every sector contains a homogeneous sample (e.g., Coles and Walshaw 1994; Forristall 2004; Morton et al. 1997). However, this is in general not the case, because the marginal distributions of the different sectors are generally not independent of each other (e.g., Chavez-Demoulin and Davison 2005; Jonathan et al. 2008). This dependence usually manifests itself as a smooth transition between sectors. Jonathan et al. (2008) have shown that incorporating directional covariate effects into the analysis of a heterogeneous sample produces results different from those obtained when directionality is not accounted for. They concluded that the directional model better explains the observed variability and provides relatively unbiased estimates. Carter and Challenor (1981) suggest that the return values calculated from a random sampling of the entire population are less than or equal to those obtained when the subpopulations are sampled proportionally. These arguments are generally used to justify covariate models. In turn, Jones et al. (2016) argue that an integrated model (nonstationary or nondirectional) is superior to a method based on subsets, for two main reasons. The first is the loss of statistical efficiency of the estimated parameters because of the smaller sample size. The second is that the model parameters in the covariate analysis are estimated assuming that the subsets are independent of each other and that each subset is homogeneous with respect to the covariates. However, these assumptions are not necessarily fulfilled, or are difficult to verify. As a consequence, users are generally confronted with the fact that return values estimated from a covariate model are typically not consistent with those obtained from the integrated one (Anderson et al. 2001). The question therefore remains of how to reconcile those differences in the design process. Although some guidelines exist for tackling this aspect in engineering manuals [e.g., International Association of Classification Societies (IACS); IACS 2000], there is no general agreement on whether a model considering covariates provides better estimates (Mackay et al. 2010). Nevertheless, there is a consensus that both approaches are subject to possibly substantial uncertainties.
In ocean engineering, the working variable for the implementation of EVA methods is the significant wave height Hs. However, Hs is only an integral parameter of the more comprehensive variable for wave characterization, the wave spectrum (e.g., Holthuijsen 2007). Hs is therefore by definition a composite variable. If Hs is complemented with the mean wave period Tm and the mean direction θm, as is usually the case, it is sometimes possible to identify the presence of different populations in the data (Coles et al. 2001; Gerson 1975; Kuwashima and Hogben 1986; Ochi 2005). A complicating aspect, however, is that in most cases a single spectrum is not unimodal but bimodal or multimodal (e.g., Hegermiller et al. 2017). In such cases, integral parameters (e.g., Hs, Tm, θm) do not provide a fair representation of the actual, complex sea state (Portilla-Yandún 2018). By contrast, the wave spectrum contains detailed information about the energy distribution in frequency and direction (e.g., Cavaleri et al. 2007). In fact, in the wave spectrum it is possible to identify packets of energy that belong to different wave systems, each originating from a single meteorological event (e.g., Sverdrup and Munk 1947; Pierson et al. 1955). These wave systems can be detected automatically using spectral partitioning techniques (Portilla-Yandún et al. 2009) and further classified according to their long-term cluster characteristics (Portilla-Yandún et al. 2015b). In this paper, we use such techniques to identify different populations in wave data series more consistently, and to assess their statistical properties with regard to the application of EVA, in particular the identically distributed condition. In section 2, we describe the data used and briefly summarize the partitioning and long-term spectral characterization methods. For the implementation of EVA, we use the POT method, discussing the general challenges associated with the definition of the threshold, also for the partitioned series. In section 3, we present the main results and give an overview of the different situations found at different locations, depending on the local wave climate. In section 4, we provide a general discussion of the statistical assumptions made in the analysis and of possible solutions for practical applications. Finally, in section 5 we summarize our main conclusions.
2. Methodology
a. Data used
The present analysis is based on long-term spectral wave data from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim archive (see Dee et al. 2011). ERA-Interim contains global data with a spatial resolution of about 110 km and a time coverage from 1979 to 2015 at 6-h intervals, making a total of 54 056 spectra per grid point. The spectra are discretized into 30 frequencies, from 0.035 to 0.55 Hz in geometric steps of 1.1, and 24 directions from 7.5° to 352.5°. Although the method is equally applicable to model and observational data, the main reason for using model data here is that introducing the method is substantially facilitated by the homogeneity of the model data, the availability of long records at single locations, and the availability of global coverage, from which clear examples can be selected to illustrate different possible situations.
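The grid dimensions quoted above can be cross-checked in a few lines. The sketch assumes that the record runs from 0000 UTC 1 January 1979 to 1800 UTC 31 December 2015 inclusive (an assumption about the exact endpoints), which reproduces the 54 056 spectra per grid point, and it confirms that 30 geometric steps of 1.1 starting at 0.035 Hz end close to 0.55 Hz.

```python
from datetime import datetime, timedelta

# Frequency grid: 30 bins starting at 0.035 Hz, each 1.1 times the previous one
freqs = [0.035 * 1.1 ** k for k in range(30)]
# Direction grid: 24 bins of 15 degrees, centered from 7.5 to 352.5 degrees
dirs = [7.5 + 15.0 * j for j in range(24)]


def count_records(start: datetime, end: datetime, step_hours: int = 6) -> int:
    """Number of records from start to end, both inclusive, at a fixed time step."""
    return (end - start) // timedelta(hours=step_hours) + 1


# Assumed first and last analysis times of the record
n_spectra = count_records(datetime(1979, 1, 1, 0), datetime(2015, 12, 31, 18))
print(f"highest frequency: {freqs[-1]:.3f} Hz")  # 0.555 Hz
print(f"directions: {dirs[0]} to {dirs[-1]} deg")
print(f"spectra per grid point: {n_spectra}")  # 54056
```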
b. Implementation of the POT method
c. Wave spectral partitioning and wave spectral statistics
The wave spectrum describes the distribution of energy in frequency and direction. From this distribution, it is possible to identify the energy packets that converge to the observation point, originating from different meteorological events in different places and times. This is done with partitioning methods (see Portilla-Yandún et al. 2009). Furthermore, it has been shown that the long-term distribution of partitions at any particular place is not random, but responds to the local physical, geographical, and meteorological conditions that characterize the area (Portilla-Yandún et al. 2015b). For instance, in the open ocean, long (low-frequency) swells with narrow spectral distributions are typically found, whereas in storm areas broader spectra at higher frequencies are the norm. The directionality of waves is likewise not random, but depends on the specific generation conditions dominating the area.
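To make the idea of partitioning concrete, the following is a minimal sketch of a steepest-ascent (watershed-type) partition of a discrete frequency–direction spectrum: every bin is assigned to the local energy peak reached by repeatedly moving to its highest neighbor, with the direction axis treated as periodic. It is an illustration only and not the operational scheme of Portilla-Yandún et al. (2009); practical partitioning schemes typically add steps such as smoothing and the merging of adjacent partitions, which are omitted here. The function names and the synthetic spectrum are hypothetical.

```python
import math
from typing import Dict, Iterator, List, Tuple

Bin = Tuple[int, int]  # (frequency index, direction index)


def neighbours(i: int, j: int, nf: int, nd: int) -> Iterator[Bin]:
    """8-connected neighbors; the direction index wraps around (periodic)."""
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) != (0, 0) and 0 <= i + di < nf:
                yield i + di, (j + dj) % nd


def partition(E: List[List[float]]) -> List[List[int]]:
    """Label each bin of E[f][d] with the id of the peak reached by steepest ascent."""
    nf, nd = len(E), len(E[0])
    uphill: Dict[Bin, Bin] = {}
    for i in range(nf):
        for j in range(nd):
            best = (i, j)  # stays (i, j) if the bin is a local maximum
            for ni, nj in neighbours(i, j, nf, nd):
                if E[ni][nj] > E[best[0]][best[1]]:
                    best = (ni, nj)
            uphill[(i, j)] = best
    peak_ids: Dict[Bin, int] = {}
    labels = [[0] * nd for _ in range(nf)]
    for i in range(nf):
        for j in range(nd):
            node = (i, j)
            while uphill[node] != node:  # energy strictly increases, so this ends
                node = uphill[node]
            labels[i][j] = peak_ids.setdefault(node, len(peak_ids))
    return labels


# Synthetic spectrum with two wave systems on a 10 x 12 (frequency x direction) grid
E = [[math.exp(-((i - 2) ** 2 + (j - 3) ** 2) / 2)
      + 0.8 * math.exp(-((i - 7) ** 2 + (j - 9) ** 2) / 2)
      for j in range(12)] for i in range(10)]
labels = partition(E)
print(len({lab for row in labels for lab in row}))  # 2 partitions
```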
To illustrate both the principle of long-term spectral wave systems and a clear case of superimposed, nonidentically distributed time series, we consider the example in Fig. 1. The site is in the eastern equatorial Pacific (EEP; Fig. 1a), whose spectral indicators allow a direct interpretation of the local wave conditions. Figure 1b shows the long-term distribution of partitions in spectral space and the corresponding clusters associated with different wave systems (WS). In this example, we observe five of them, all very clearly defined and each of a different origin. Their interpretation is as follows: WS1 and WS2 correspond to remotely generated southerly and northerly swells, respectively; WS3 is related to the trade winds; WS4 belongs to the Panama wind jet; and WS5 corresponds to the westerlies [for further details on the wave climate in the area, see Portilla-Yandún et al. (2015a)]. Figure 1c shows the wave height distributions of each of these systems, and that of the total, in boxplot format. The red crosses, outside the interquartile range (IQR), give a preliminary idea of the possible extremes. Note that in this particular case both the central values and the extreme conditions of the total are larger than those of any individual series, which points to simultaneous occurrences of different wave systems. Figure 1d shows the seasonal variability of Hs, indicating that each wave system responds to a different season: WS1 peaks in May, WS2 in January, WS3 in November, WS4 in February, and WS5 in October. For interpretation, the local wind conditions are given in Fig. 1e, showing that the dominant winds are the trades, typically from the southwest in the EEP, but with moderate magnitudes.
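The observation that the total exceeds every individual series follows from how partition wave heights combine. Using the standard spectral definition Hs = 4√m0 (not restated in this section), where the zeroth moment m0 is additive over non-overlapping partitions, simultaneous systems combine in quadrature. The short sketch below assumes that the partitions account for the whole spectrum; `total_hs` is a hypothetical helper.

```python
import math
from typing import Sequence


def total_hs(partition_hs: Sequence[float]) -> float:
    """Total Hs from the Hs of non-overlapping partitions.

    With Hs = 4 * sqrt(m0) and m0 additive over partitions:
    Hs_total = 4 * sqrt(sum((Hs_i / 4) ** 2)) = sqrt(sum(Hs_i ** 2)).
    """
    return math.sqrt(sum(h * h for h in partition_hs))


# Two simultaneous systems of 3 m and 4 m form a 5 m sea state,
# higher than either system alone.
print(total_hs([3.0, 4.0]))  # 5.0
```

Because the combination is in quadrature, the total is never smaller than its largest partition, so joint occurrences push the total above the individual series.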
Given their different origins, spectral characteristics, and seasonality, these five wave systems can be considered neither homogeneous nor identically distributed (i.e., they comprise different data populations). Therefore, the total Hs series does not fulfill the basic requirements for the application of EVA. Note also that working with subsets based on directional sectors will not solve the problem, because the directional sector from 15° to 75° contains both WS1 and WS2, while the sector from 75° to 135° contains both WS2 and WS5. A similar issue arises with subsets based on seasons, because the pairs WS2–WS4 and WS3–WS5 occur in overlapping seasons. The level of simultaneous occurrence of these wave systems is indicated in Fig. 1f, which shows the joint probability of the different pairs exceeding a threshold of 1.0 m, a relatively high value for this site. In this case, the largest joint probability is that of the pair WS1–WS3, at 2.70%. All these aspects have implications for the application of EVA, which we consider in the following section.
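A joint exceedance probability such as the 2.70% quoted for WS1–WS3 can be estimated as the fraction of time steps at which both systems exceed the threshold simultaneously. The sketch below assumes that each wave system is stored as an Hs series aligned in time, with `None` where the system is absent from the spectrum, and that the probability is taken relative to all time steps; the data layout, function name, and toy values are illustrative, not those used to produce Fig. 1f.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def pairwise_joint_exceedance(
    series: Dict[str, List[Optional[float]]], threshold: float
) -> Dict[Tuple[str, str], float]:
    """Fraction of time steps in which both systems of each pair exceed `threshold`.

    series[name][t] is the Hs of a wave system at time t, or None when the
    system is absent; all series must have the same length.
    """
    names = sorted(series)
    n = len(series[names[0]])
    out: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(names, 2):
        both = sum(
            1
            for ha, hb in zip(series[a], series[b])
            if ha is not None and hb is not None and ha > threshold and hb > threshold
        )
        out[(a, b)] = both / n
    return out


toy = {
    "WS1": [1.2, 0.5, 1.5, None],
    "WS2": [1.1, 1.3, None, 0.2],
    "WS3": [1.05, 1.4, 1.6, 1.1],
}
print(pairwise_joint_exceedance(toy, threshold=1.0))
# {('WS1', 'WS2'): 0.25, ('WS1', 'WS3'): 0.5, ('WS2', 'WS3'): 0.5}
```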
3. Analysis of extremes in the context of long-term spectral wave systems
a. Eastern equatorial Pacific case
Following the implementation of EVA at the site just described (EEP), Fig. 2 shows the empirical (i.e., derived from recorded data) and theoretical (i.e., GPD) probabilities of the total Hs, as well as those of the individual wave systems. The fitting parameters are shown within the corresponding panels (see also Table A1 in the appendix for the actual values). The curve fitting is in general good, both for the total and for the individual series. The red dashed lines indicate the confidence limits according to Eq. (2). However, since the fitted parameters differ between datasets, so do the curves, and the extremes project to different values. For instance, for the 99th percentile we obtain the values given in Table 1.
Fitting parameters for the wave systems in the EEP.
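The projections discussed here rest on fitting a GPD to the threshold exceedances and converting the fit into return levels. The sketch below is a minimal, dependency-free illustration: it uses the probability-weighted-moment (PWM) estimator rather than maximum likelihood purely for brevity (the estimator actually used for Fig. 2 is not stated in this section), adopts the sign convention of the text (κ > 0 long tailed, κ < 0 short tailed), and assumes `rate` independent peaks per year above the threshold u. Function names and the synthetic data are hypothetical.

```python
import math
import random
from typing import List, Tuple


def fit_gpd_pwm(exceedances: List[float]) -> Tuple[float, float]:
    """PWM estimates (kappa, sigma) of a GPD fitted to y = Hs - u.

    kappa > 0 gives a long (heavy) tail, kappa < 0 a short (bounded) tail.
    The PWM estimator is best suited to kappa < 0.5.
    """
    y = sorted(exceedances)
    n = len(y)
    a0 = sum(y) / n  # estimate of E[Y]
    # unbiased estimate of E[Y * (1 - F(Y))]
    a1 = sum((n - j) / (n - 1) * y[j - 1] for j in range(1, n + 1)) / n
    kappa = 2.0 - a0 / (a0 - 2.0 * a1)
    sigma = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return kappa, sigma


def return_level(u: float, kappa: float, sigma: float, rate: float, years: float) -> float:
    """Hs exceeded on average once in `years`, with `rate` independent peaks per year above u."""
    m = rate * years  # expected number of peaks within the horizon
    if abs(kappa) < 1e-9:  # exponential-tail limit
        return u + sigma * math.log(m)
    return u + sigma / kappa * (m ** kappa - 1.0)


# Synthetic check: exceedances drawn from a GPD with kappa = 0.1, sigma = 1.0 (inverse CDF).
rng = random.Random(1)
sample = [10.0 * ((1.0 - rng.random()) ** -0.1 - 1.0) for _ in range(20_000)]
kappa_hat, sigma_hat = fit_gpd_pwm(sample)
print(round(kappa_hat, 2), round(sigma_hat, 2))  # close to 0.1 and 1.0
print(return_level(u=3.0, kappa=kappa_hat, sigma=sigma_hat, rate=5.0, years=100.0))
```

Setting `rate * years = 1 / (1 - p)` turns the same expression into the p-th quantile of the peak distribution; for example, `rate=1.0, years=100` gives the 99th percentile.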
These discrepancies arise naturally, because the total Hs is a composite of the different individual conditions, each with different characteristics. This is illustrated in Fig. 3a, which shows the Q–Q plot of the total Hs.
We observe that in the extremes the total Hs is not only composed of a collection of different single wave systems (i.e., pure contamination), but also of a mixture of single, bimodal, and multimodal conditions (i.e., superposition). Notably, the maximum and a large number of the highest extremes result from the combination WS1–WS3. Similarly, other high events derive from the combination WS1–WS4–WS2. This is not unexpected, since Fig. 1c already showed that WS1, WS3, and WS4 dominate the extremes, with the pair WS1–WS3 associated with relatively high joint occurrences at high values (Fig. 1f).
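The labels attached to the extremes in Fig. 3a (single, bimodal, or multimodal combinations) can be reproduced conceptually by checking which wave systems are active whenever the total Hs exceeds a threshold. In the sketch below, a system counts as active when its Hs is at least `active_min`; that criterion, the data layout, the function name, and the toy values are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def classify_extremes(
    total: List[float],
    systems: Dict[str, List[Optional[float]]],
    threshold: float,
    active_min: float,
) -> Dict[int, str]:
    """Map each time step where total Hs > threshold to the active wave systems.

    A system is active at t when its Hs (None if absent) is >= active_min.
    One name means a unimodal state; two or more mean superposition.
    """
    labels: Dict[int, str] = {}
    for t, hs in enumerate(total):
        if hs <= threshold:
            continue
        active = [
            name
            for name in sorted(systems)
            if systems[name][t] is not None and systems[name][t] >= active_min
        ]
        labels[t] = "-".join(active) if active else "none"
    return labels


total = [2.0, 3.1, 3.7, 1.0]
systems = {
    "WS1": [1.5, 3.0, 2.0, 0.8],
    "WS3": [1.0, 0.3, 2.9, None],
    "WS4": [None, 0.1, 1.2, 0.5],
}
print(classify_extremes(total, systems, threshold=3.0, active_min=1.0))
# {1: 'WS1', 2: 'WS1-WS3-WS4'}
```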
On the other hand, since most practical applications seek the Hs projection for a specific time horizon (e.g., 50 or 100 years), we also evaluate the implications for this variable. This is shown in Fig. 3b, in which the total Hs indeed displays the largest values and projections, with all individual projections producing lower values. However, the slopes of these lines, which are related to the individual distributions of the extremes, are also important. For instance, WS4 is particularly interesting because, despite being short tailed (negative κ), its slope in Fig. 3b is significantly larger than those of the others (see Table 1 and Table A1 in the appendix for the specific values). This arises because, although WS4 generally occurs with relatively low values (see Fig. 1c), its extremes can be large and relatively recurrent. Physically, this is due to the exposure of the point to the Panama jet, whereas for the other systems this is a shadow area. Note also that, because of the larger slope of WS4, the projection of this system will surpass that of the total Hs at some horizon beyond 100 years. Therefore, while Fig. 3 suggests that the extremes are dominated by the pair WS1–WS3 (gray dots in Fig. 3a), with systematic contamination from other combinations, it also shows that WS4 is involved in some of the largest extremes and is thus potentially overlooked. Since this situation affects the whole distribution, an overestimation by the total Hs is likely, but this cannot be definitively inferred from the projected values of Hs alone. We discuss this aspect further in section 4, suggesting a possible way to tackle such uncertainties in the context of specific applications.
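The statement that the projection of one system may overtake that of the total beyond some horizon can be made quantitative by locating where two return-level curves intersect. The sketch assumes the POT/GPD return-level expression with parameters (u, κ, σ, peaks per year) for each series, and that the curves cross at most once in the search interval; the parameter values in the example are invented and are not those of Table A1.

```python
import math
from typing import Optional, Tuple

Params = Tuple[float, float, float, float]  # (u, kappa, sigma, peaks per year)


def return_level(p: Params, years: float) -> float:
    """POT/GPD return level exceeded on average once in `years`."""
    u, kappa, sigma, rate = p
    m = rate * years
    if abs(kappa) < 1e-9:
        return u + sigma * math.log(m)
    return u + sigma / kappa * (m ** kappa - 1.0)


def crossover_years(
    system: Params, total: Params, lo: float = 1.0, hi: float = 1e4, tol: float = 1e-6
) -> Optional[float]:
    """Horizon (years) at which the system's return level reaches the total's.

    Bisection on log(years); assumes the difference changes sign at most once
    in [lo, hi]. Returns None if the system stays below the total up to hi.
    """
    def diff(years: float) -> float:
        return return_level(system, years) - return_level(total, years)

    if diff(lo) >= 0:
        return lo
    if diff(hi) < 0:
        return None
    a, b = math.log(lo), math.log(hi)
    while b - a > tol:
        mid = 0.5 * (a + b)
        if diff(math.exp(mid)) < 0:
            a = mid
        else:
            b = mid
    return math.exp(b)


# Hypothetical parameters, not taken from Table A1: the system has the steeper curve.
system = (2.0, 0.0, 1.0, 2.0)
total = (3.0, 0.0, 0.5, 5.0)
print(round(crossover_years(system, total), 2))  # 9.24
```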
Naturally, the situation just described is not general, but specific to the considered site and to the distributions of the individual wave system time series. What is general is that the total Hs series is ill-suited to represent and project extremes. We illustrate this under different conditions in the following subsections.
b. Arabian Sea case
For a different situation, let us consider the wave conditions in the Arabian Sea (AS; 12°N, 63.41°E), represented in Fig. 4. The spectral indicators are interpreted as in the previous case. Figure 4b shows the existence of six wave systems. WS1 is related to the southeasterly trade winds in the area. WS2 and WS5 correspond to southern swells, and WS3 and WS4 are driven by the winter and summer monsoons, respectively (see the wind rose in Fig. 4e). WS6 corresponds to occasional northwesterly winds. Note that the Hs values of WS4 are significantly higher than those of the other series, both in regular occurrences and in the extremes (Fig. 4c). Indeed, the distribution of WS4 is short tailed (see Table A2 in the appendix), so there are actually no extremes beyond the IQR. It is also clear in this case that the distribution of the total is not representative of any of the individual series, but merely a poor mixture of all of them.
From the seasonal pattern in Fig. 4d, we observe that WS4 is specific to the austral winter (summer monsoon), together with the southern swells WS2 and WS5, while WS3 corresponds to the boreal winter regime (winter monsoon). As mentioned before, these two seasons are very marked in the region (see Fig. 4e). The implication for the computation of extremes is shown in Fig. 5, which presents the extreme value probability distributions of these different series (WS6 is omitted for conciseness).
As in the previous case, the curve fitting is good, with each series projecting to different values. In turn, Fig. 6 shows the composition of the extremes in the Q–Q plot (Fig. 6a) and the return period projections (Fig. 6b). As expected, the extremes of the total are heavily dominated by WS4. Only below a wave height of 5 m does a bimodal condition appear, and WS4 is still involved in it. As a result, in contrast to the EEP, where the total was enhanced by the combinations of systems, here the return period projection of the total closely follows that of a single system (WS4). See the fitting parameters in the appendix (Table A2) for more details. This is a convenient situation in the context of EVA, because it generates fewer conflicts between the total and the partial Hs projections, thus providing greater confidence in the results.
c. North Atlantic case
In the previous two cases, under completely different conditions, we observed that the distribution of the total wave height is such that its projection is not smaller than those of the components. However, assuming that this is a general rule would be wishful thinking: as anticipated by WS4 in the EEP, the extremes of an individual series can have a slope large enough to overtake that of the total at some projection horizon. From the design point of view this may be a very unfortunate situation, but examples of it can easily be found in the oceans. We consider a location in the North Atlantic Ocean (NA; 32°N, 57.65°W), an area prone to severe meteorological and wave conditions. Figure 7 offers an overview of the local spectral wave climate. The point is located in a storm generation area exposed to cyclonic conditions. Therefore, we find four wave components distributed around the whole circle of directions (Fig. 7b). WS2 is the dominant one, related to the prevailing westerly winds (see Fig. 7e). WS1 comes from a roughly opposite direction (southeasterly), with recurrent conditions and occasionally large extremes as well. WS3 is related to northeasterly winds, and WS4 is generally low and related to occasional southerly conditions. From Fig. 7c, we observe that the major extremes come from WS2, with contributions also from WS1 and WS3. Figure 7d shows that the individual systems have a rather complex seasonal pattern, with WS2 related to the boreal winter, WS1 belonging to the fall, and WS3 displaying one peak in the spring and another in the fall. WS4 is rather uniform, but with higher values in winter and spring. Figure 7f shows a high probability of joint occurrences among WS1, WS2, and WS3. For instance, the joint probability of the pair WS2–WS3 is 7.87% and that of WS1–WS2 is 6.14%, suggesting a possibly high level of physical dependence among the wave systems due to the cyclonic conditions.
As in the previous examples, Fig. 8 shows the extreme probability distributions of the total Hs and those of the individual series, in general indicating a good fit. Figure 9 presents the Q–Q plot of the total Hs with the composition of its extremes (Fig. 9a), and the individual return period projections (Fig. 9b).
The outlook of Fig. 9a is significantly different from that of Fig. 3a. In the EEP, the total Hs involves both superposition (bimodal and multimodal states) and contamination by unimodal systems. At this location, in contrast, contamination is the main issue, with a minor level of superposition (black stars). From Fig. 9a, it is also apparent that the behavior is similar to that in the Arabian Sea, with one of the series (WS2) dominating the total and some contamination from WS1 and WS3. However, WS1 has two particular characteristics that complicate the issue. The first is that its slope is significantly larger than both that of WS2 and that of the total. The second is that, in contrast to the others, its shape parameter κ is positive, indicating a long-tailed distribution, hence the concave shape of the fitting curve (see Table A3 in the appendix for the actual κ and
4. Discussion
Many arguments can be given for or against the use of integral or covariate models in EVA. In the present study, we advocate covariate approaches based on spectral partitioning. The fundamental reason is that the spectral wave climate clearly shows, in each case, the coexistence of different data populations following different distributions, which contribute differently to the total Hs. Therefore, an integral model will always violate the identically distributed condition to some extent. In spite of a possibly good fit, EVA theory and the GPD do not apply under these conditions. The different cases presented here illustrate that integral models lead to different scenarios for the return period projections relative to those of the individual series. In most cases, the rather intuitive principle holds that, for a given Hs, the exceedance probability of the total is larger than that of any of the components (e.g., Carter and Challenor 1981; Forristall 2004; Wimmer et al. 2006). Unfortunately, this principle is not general, as demonstrated by the North Atlantic case. From the analyzed cases, this appears to be a natural consequence of the shape of the distributions in the extremes, which follow different slopes inherent to their physical and hence statistical characteristics. In general, every location in the ocean behaves differently because of the particular conditions that shape its wave climate. Projections of individual Hs values, together with the wave climate parameters analyzed here, are available for the whole ERA-Interim grid on the Global Spectral Wave Climate (GLOSWAC) website (https://modemat.epn.edu.ec/nereo/).
From the practical point of view, users are often confronted with the differences between integral and covariate models and seek to reconcile them. However, a sensitive aspect of covariate EVA models based on spectral partitioning is the interdependence of the wave systems. Although the spectral identification method used here is skillful enough to clearly differentiate wave systems of different origin, some level of dependence among the different populations is possible. The projections derived here for the individual components (Figs. 3b, 6b, and 9b) imply independence, but this condition may be difficult to assess (e.g., Jones et al. 2016). In the EEP, for instance, northerly swells (WS2) are features of the North Pacific Ocean, while the Panama jet (WS4) is strongly linked to Caribbean dynamics. At first sight, they may be thought to be independent; from a global perspective, however, both are related to the boreal winter and are hence probably dependent. The same reasoning applies to any combination of wave systems (not only pairs). More challenging still is the fact that the contamination of extreme values in the total Hs is not only due to single (i.e., unimodal) wave systems, but sometimes involves a rich mixture of single, bimodal, and multimodal states (see, e.g., Fig. 3a). For two physically independent wave systems, the total probability can be obtained from the joint probability of the single components, which then factorizes [i.e., P(X ∩ Y) = P(X)P(Y)]. Otherwise, the conditional probability provides a better estimate [i.e., P(X ∩ Y) = P(X|Y)P(Y)]. However, P(X|Y) cannot easily be derived for environmental variables, for the reasons just explained. Therefore, at present there is no definitive answer on how to estimate the total probability.
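For the independent case, one commonly used combination, sketched here as an assumption rather than as the method of this study, treats the annual non-exceedance probabilities 1 − 1/T_i of the individual systems at a common level h as independent, with T_i defined as the inverse of the annual exceedance probability, so that the probability that no system exceeds h in a year is their product. Because the total Hs at any instant is at least the Hs of its largest partition, the event "some system exceeds h" is contained in "the total Hs exceeds h"; the result is therefore only a lower bound on the exceedance probability of the total and does not account for superposition. The function name is hypothetical.

```python
from typing import Sequence


def combined_return_period(periods: Sequence[float]) -> float:
    """Return period of 'at least one system exceeds h', assuming independence.

    periods (non-empty) holds the return period in years of each system at
    the same level h, i.e. 1 / P(annual maximum of that system exceeds h).
    """
    p_none = 1.0  # probability that no system exceeds h in a given year
    for t in periods:
        p_none *= 1.0 - 1.0 / t
    return 1.0 / (1.0 - p_none)


# Two independent systems, each with a 100-year return period at the same h:
print(round(combined_return_period([100.0, 100.0]), 2))  # 50.25
```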
Alternatively, since EVA has direct practical applications, other variables can also be used to assess the differences between integral and covariate approaches by evaluating their target effects. For instance, in the design of offshore structures the actual working variable is the mechanical load, with Hs, or even the wave spectrum, being only an intermediate input. In such a case, the response of the dynamic system can be assessed (e.g., via structural modeling) using the actual spectra (ground truth) and the estimates from both the integral and covariate EVA models. Such an approach can help to understand and possibly reconcile the differences.
We illustrate these criteria for the North Atlantic case, for which the corresponding parameters are presented in Fig. 10.
Figure 10 shows the variability of the parameters κ and σ*, as well as the projected return value for the time horizon covered by the series (37 years), which we denote Hs37. For conciseness, we present this only for the covariates WS1 and WS2. For WS1, the value of κ is positive (heavy tail) in all cases. In general, a swap between positive and negative κ is associated with difficult fitting. We observe a large region of stability in κ and σ* for thresholds from about 1.5 to 3.19 m (the latter being the selected threshold). However, for lower thresholds the projected Hs37 tends first to be slightly underestimated and then significantly overestimated (particularly in the stability zone) relative to the recorded maximum (i.e., the optimal fit). For higher thresholds, the quality of the fit at the largest values (Hs37) increases (bottom panels), while some stability and confidence are sacrificed (κ and σ*). Nevertheless, the higher threshold is preferred because it better represents the extremes and produces a robust projection, while the number of events is still sufficiently large (95 events; see Fig. 8b). For WS2, we observe that the higher the threshold, the lower κ, indicating a stronger asymptotic behavior for higher thresholds (a desirable characteristic). However, the stability of κ and σ* is compromised for thresholds higher than 4.7 m, without any improvement in the projected value (Hs37). In this case, we also observe that for very low thresholds κ is positive, with Hs37 being underestimated (two undesirable characteristics), whereas higher thresholds improve the stability, the asymptotic behavior (more negative κ), and the fit to the extremes.
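The threshold-stability diagnostics of Fig. 10 can be sketched as a loop over candidate thresholds: for each u, fit a GPD to the peaks above u and record κ, the modified scale σ*, and the return value for the record length (Hs37 for a 37-year record). The sketch takes σ* to be the usual modified scale σ − κu, which should stay roughly constant across thresholds if the GPD model holds (the definition is not restated in this section), assumes declustered peaks as input, and skips thresholds that leave fewer peaks than years in the record, in line with the minimum sample size suggested in section 5. It uses a compact PWM estimator rather than whatever estimator produced Fig. 10; all names and the synthetic data are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def fit_gpd_pwm(y: List[float]) -> Tuple[float, float]:
    """PWM estimates (kappa, sigma) of a GPD fitted to exceedances y (kappa > 0: long tail)."""
    y = sorted(y)
    n = len(y)
    a0 = sum(y) / n
    a1 = sum((n - j) / (n - 1) * y[j - 1] for j in range(1, n + 1)) / n
    return 2.0 - a0 / (a0 - 2.0 * a1), 2.0 * a0 * a1 / (a0 - 2.0 * a1)


def threshold_scan(
    peaks: List[float], thresholds: List[float], n_years: float
) -> List[Dict[str, float]]:
    """Fit a GPD above each threshold; report kappa, sigma* and the n_years return value."""
    rows = []
    for u in thresholds:
        exc = [p - u for p in peaks if p > u]
        n = len(exc)
        if n < max(n_years, 3):  # too few peaks for this threshold
            continue
        kappa, sigma = fit_gpd_pwm(exc)
        # n peaks in n_years: the n_years return level is exceeded once on average
        if abs(kappa) < 1e-9:
            hs_horizon = u + sigma * math.log(n)
        else:
            hs_horizon = u + sigma / kappa * (n ** kappa - 1.0)
        rows.append({"u": u, "n": n, "kappa": kappa,
                     "sigma_star": sigma - kappa * u,  # modified scale
                     "hs_horizon": hs_horizon})
    return rows


# Synthetic peaks drawn from a GPD with kappa = 0.1, sigma = 1.0 above 0 m
rng = random.Random(2)
peaks = [10.0 * ((1.0 - rng.random()) ** -0.1 - 1.0) for _ in range(50_000)]
rows = threshold_scan(peaks, [0.0, 1.0, 2.0], n_years=37)
for r in rows:
    print({k: round(v, 3) for k, v in r.items()})  # kappa and sigma_star stay near 0.1 and 1.0
```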
5. Summary and conclusions
Using spectral partitioning, and its further clustering by long-term spectral statistics, we show that wave data are composed of different populations, which differ in origin, seasonality, and distribution of the extremes, among other statistical characteristics. This implies that an integrated (noncovariate) model does not satisfy the identically distributed condition required for the implementation of EVA.
The overall behavior of the integrated series relative to its components cannot be predicted a priori, because it depends entirely on the characteristics of the individual series and their interrelations (e.g., superposition, contamination). We showed three of many possible cases. In the first (eastern equatorial Pacific), regular superposition produces a total projection that is always higher than those of the components (with potential overestimation of the extremes by the total). In the second (Arabian Sea), one of the individual series overwhelms the others in the extremes, dominating the total by imposing its distribution. This results in minor differences between the integrated and covariate models. The third example (North Atlantic) is less intuitive: although one of the series dominates the extremes, another has a heavy (possibly unbounded) tail and, despite its typically lower values, its projection exceeds that of the total beyond a horizon of 60 years, suggesting potential underestimation of the extremes by the total. Results for the full ERA-Interim grid are available on the GLOSWAC website (https://modemat.epn.edu.ec/nereo/).
We find that the loss of statistical efficiency due to the reduced sample size is not crucial in these covariate models. Statistical efficiency is in fact affected more by the threshold selected in each case. Once a minimum number of samples has been defined (no fewer than the number of years in the series, i.e., analogous to AMS), a trade-off exists between statistical efficiency (i.e., narrower confidence limits) and the robustness of the fit in the extremes (i.e., stability of the projected values). The appropriate threshold is the one that best satisfies both conditions.
Although the covariate approach is more rigorous with respect to the identically distributed condition and provides robust projections for the individual components, it yields a direct estimate of the total probability only when the involved systems are physically independent (i.e., via the joint probability). Integrated models, in turn, are limited by ignoring the existence of different data populations. Therefore, regarding the specification of guidelines and standards for end users, we conclude that further research is needed to understand and possibly reconcile the existing discrepancies.
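Under the independence assumption stated above, the annual probability that no component exceeds a level x is the product of the component distribution functions, F(x) = F_1(x) F_2(x) ... F_K(x), and the combined T-year level solves F(x) = 1 - 1/T. The minimal sketch below assumes Gumbel (zero-shape GEV) fits to the annual maxima of each component, with hypothetical parameters, and solves for the level by bisection. It addresses exceedance of a level by any component, not the integrated sea state, and the function names are illustrative only.

```python
import math


def gumbel_cdf(x, mu, beta):
    """Annual non-exceedance probability for a Gumbel (GEV, shape 0) fit."""
    return math.exp(-math.exp(-(x - mu) / beta))


def combined_cdf(x, components):
    """P(no component exceeds x in a year), assuming independent components."""
    p = 1.0
    for mu, beta in components:
        p *= gumbel_cdf(x, mu, beta)
    return p


def combined_return_level(components, T, tol=1e-9):
    """Solve combined_cdf(x) = 1 - 1/T by bisection (cdf increases with x).

    Bracket: at the smallest location the cdf is at most exp(-1), so the
    lower bound is valid for T >= 2; the upper bound lies far in all tails.
    """
    target = 1.0 - 1.0 / T
    lo = min(mu for mu, _ in components)
    hi = max(mu for mu, _ in components) + 50.0 * max(b for _, b in components)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if combined_cdf(mid, components) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical (location, scale) pairs for two independent wave systems.
systems = [(4.0, 0.5), (5.0, 0.8)]
print(f"combined 100-yr level: {combined_return_level(systems, 100.0):.2f}")
```

For two identical Gumbel components the product is again Gumbel, shifted by beta ln 2, which gives a simple analytical check of the numerical solution.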
Acknowledgments
This work was partially carried out during a research visit of J. Portilla at ISMAR-CNR, Venice (Grant STM-CUP B56C18002220005). We acknowledge the fruitful interactions with Luigi Cavaleri, Francesco Barbariol, and Alvise Benetazzo from ISMAR. E. Jácome acknowledges Ph.D. funding from EPN (Contract EPN-0017-PO-FIM-2019). We acknowledge the insightful comments of the anonymous reviewers that helped improve the final version of the manuscript.
APPENDIX
Fitting Parameters at the Selected Locations
The values for the fitting parameters in these equations are shown in Table A1 for the eastern equatorial Pacific Ocean location, Table A2 for the Arabian Sea location, and Table A3 for the North Atlantic Ocean location.
Table A1. EVA fitting parameters in the eastern equatorial Pacific location.
Table A2. EVA fitting parameters in the Arabian Sea location.
Table A3. EVA fitting parameters in the North Atlantic location.
REFERENCES
Anderson, C. W., D. J. T. Carter, and P. D. Cotton, 2001: Wave climate variability and impact on offshore design extremes. Shell International Rep., 99 pp.
Benetazzo, A., F. Fedele, S. Carniel, A. Ricchi, E. Bucchignani, and M. Sclavo, 2012: Wave climate of the Adriatic Sea: A future scenario simulation. Nat. Hazards Earth Syst. Sci., 12, 2065–2076, https://doi.org/10.5194/nhess-12-2065-2012.
Bierlaire, M., D. Bolduc, and D. McFadden, 2008: The estimation of generalized extreme value models from choice-based samples. Transp. Res., 42B, 381–394, https://doi.org/10.1016/j.trb.2007.09.003.
Bitner-Gregersen, E. M., E. H. Cramer, and F. Korbijn, 1995: Environmental description for long-term load response of ship structures. Proc. Fifth Annual Int. Offshore and Polar Engineering Conf., The Hague, Netherlands, International Society of Offshore and Polar Engineers, 353–360.
Boccotti, P., 2000: Wave Mechanics for Ocean Engineering. Vol. 64. Elsevier, 520 pp.
Borgman, L. E., and D. T. Resio, 1982: Extremal statistics in wave climatology. Topics in Ocean Physics, Vol. 80, North-Holland Publishing, 439–471.
Cadet, D., 1979: Meteorology of the Indian summer monsoon. Nature, 279, 761–767, https://doi.org/10.1038/279761a0.
Caires, S., and A. Sterl, 2005: 100-year return value estimates for ocean wind speed and significant wave height from the ERA-40 data. J. Climate, 18, 1032–1048, https://doi.org/10.1175/JCLI-3312.1.
Caires, S., and M. Van Gent, 2008: Extreme wave loads. Proc. 27th Int. Conf. on Ocean, Offshore and Arctic Engineering, Estoril, Portugal, American Society of Mechanical Engineers, 945–953, https://doi.org/10.1115/OMAE2008-57947.
Carter, D. J. T., and P. G. Challenor, 1981: Estimating return values of environmental parameters. Quart. J. Roy. Meteor. Soc., 107, 259–266, https://doi.org/10.1002/qj.49710745116.
Castillo, E., 1988: Extreme Value Theory in Engineering. Academic Press, 389 pp.
Cavaleri, L., and Coauthors, 2007: Wave modelling—The state of the art. Prog. Oceanogr., 75, 603–674, https://doi.org/10.1016/j.pocean.2007.05.005.
Chavez-Demoulin, V., and A. Davison, 2005: Generalized additive modelling of sample extremes. J. Roy. Stat. Soc., 54C, 207–222, https://doi.org/10.1111/j.1467-9876.2005.00479.x.
Coles, S. G., and D. Walshaw, 1994: Directional modelling of extreme wind speeds. J. Roy. Stat. Soc., 43C, 139–157, https://doi.org/10.2307/2986118.
Coles, S. G., J. Bawa, L. Trenner, and P. Dorazio, 2001: An Introduction to Statistical Modeling of Extreme Values. Vol. 208. Springer, 208 pp.
Davison, A. C., and R. L. Smith, 1990: Models for exceedances over high thresholds. J. Roy. Stat. Soc., 52B, 393–425, https://doi.org/10.1111/j.2517-6161.1990.tb01796.x.
Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
Embrechts, P., C. Klüppelberg, and T. Mikosch, 1997: Modelling Extremal Events for Insurance and Finance. Springer, 648 pp.
Ewans, K., and P. Jonathan, 2008: The effect of directionality on northern North Sea extreme wave design criteria. J. Offshore Mech. Arctic Eng., 130, 041604, https://doi.org/10.1115/1.2960859.
Fedele, F., and F. Arena, 2010: Long-term statistics and extreme waves of sea storms. J. Phys. Oceanogr., 40, 1106–1117, https://doi.org/10.1175/2009JPO4335.1.
Ferreira, J. A., and C. Guedes Soares, 1998: An application of the peaks over threshold method to predict extremes of significant wave height. J. Offshore Mech. Arctic Eng., 120, 165–176, https://doi.org/10.1115/1.2829537.
Fisher, R. A., and L. H. C. Tippett, 1928: Limiting forms of the frequency distribution of the largest or smallest member of a sample. Math. Proc. Cambridge Philos. Soc., 24, 180–190, https://doi.org/10.1017/S0305004100015681.
Forristall, G. Z., 2004: On the use of directional wave criteria. J. Waterw. Port Coastal Ocean Eng., 130, 272–275, https://doi.org/10.1061/(ASCE)0733-950X(2004)130:5(272).
Gerson, M., 1975: The techniques and uses of probability plotting. Statistician, 24, 235–257, https://doi.org/10.2307/2987921.
Goda, Y., 1989: On the methodology of selecting design wave height. 21st Int. Conf. on Coastal Engineering, Torremolinos, Spain, ASCE, 899–913, https://doi.org/10.1061/9780872626874.068.
Goda, Y., 1992: Uncertainty of design parameters from viewpoint of extreme statistics. J. Offshore Mech. Arctic Eng., 114, 76–82, https://doi.org/10.1115/1.2919962.
Gonzáles, F., 2013: Modelización estadística de eventos extremos de oleaje y nivel del mar. Ph.D. thesis, Universidad de Las Palmas de Gran Canaria, 195 pp.
Gumbel, E. J., 1958: Statistical Theory of Floods and Droughts. Vol. 12. Institution of Water Engineers and Scientists, 28 pp.
Gumbel, E. J., 2012: Statistics of Extremes. Courier Corporation, 375 pp.
Hegermiller, C. A., J. A. A. Antolinez, A. Rueda, P. Camus, J. Perez, L. H. Erikson, P. L. Barnard, and F. J. Mendez, 2017: A multimodal wave spectrum–based approach for statistical downscaling of local wave climate. J. Phys. Oceanogr., 47, 375–386, https://doi.org/10.1175/JPO-D-16-0191.1.
Holthuijsen, L. H., 2007: Waves in Oceanic and Coastal Waters. Vol. 20. Cambridge University Press, 404 pp.
Huang, W. K., M. L. Stein, D. J. McInerney, S. Sun, and E. J. Moyer, 2015: Estimating changes in temperature extremes from millennial scale climate simulations using generalized extreme value (GEV) distributions. arXiv, https://arxiv.org/abs/1512.08775v3.
Hundecha, Y., A. St-Hilaire, T. B. M. J. Ouarda, S. El Adlouni, and P. Gachon, 2008: A nonstationary extreme value analysis for the assessment of changes in extreme annual wind speed over the Gulf of St. Lawrence, Canada. J. Appl. Meteor. Climatol., 47, 2745–2759, https://doi.org/10.1175/2008JAMC1665.1.
IACS, 2000: Standard wave data. IACS Note 34, 4 pp.
Jonathan, P., K. Ewans, and G. Forristall, 2008: Statistical estimation of extreme ocean environments: The requirement for modelling directionality and other covariate effects. Ocean Eng., 35, 1211–1225, https://doi.org/10.1016/j.oceaneng.2008.04.002.
Jones, M., D. Randell, K. Ewans, and P. Jonathan, 2016: Statistics of extreme ocean environments: Non-stationary inference for directionality and other covariate effects. Ocean Eng., 119, 30–46, https://doi.org/10.1016/j.oceaneng.2016.04.010.
Kuwashima, S., and N. Hogben, 1986: The estimation of wave height and wind speed persistence statistics from cumulative probability distributions. Coastal Eng., 9, 563–590, https://doi.org/10.1016/0378-3839(86)90004-9.
Leadbetter, M. R., G. Lindgren, and H. Rootzén, 2012: Extremes and Related Properties of Random Sequences and Processes. Springer Science and Business Media, 336 pp.
Mackay, E. B. L., P. G. Challenor, and A. S. Bahaj, 2010: On the use of discrete seasonal and directional models for the estimation of extreme wave conditions. Ocean Eng., 37, 425–442, https://doi.org/10.1016/j.oceaneng.2010.01.017.
Morton, I. D., J. Bowers, and G. Mould, 1997: Estimating return period wave heights and wind speeds using a seasonal point process model. Coastal Eng., 31, 305–326, https://doi.org/10.1016/S0378-3839(97)00016-1.
Ochi, M. K., 2005: Ocean Waves: The Stochastic Approach. Vol. 6. Cambridge University Press, 32 pp.
Pierson, W., G. Neumann, and R. James, 1955: Practical Methods for Observing and Forecasting Ocean Waves by Means of Wave Spectra and Statistics. U.S. Navy Hydrographic Office, 284 pp.
Portilla-Yandún, J., 2018: Open access atlas of global spectral wave conditions based on partitioning. Proc. 37th Int. Conf. on Ocean, Offshore and Arctic Engineering, Madrid, Spain, American Society of Mechanical Engineers, V11BT12A051, https://doi.org/10.1115/OMAE2018-77230.
Portilla-Yandún, J., F. J. Ocampo-Torres, and J. Monbaliu, 2009: Spectral partitioning and identification of wind sea and swell. J. Atmos. Oceanic Technol., 26, 107–122, https://doi.org/10.1175/2008JTECHO609.1.
Portilla-Yandún, J., A. L. Caicedo, R. Padilla-Hernández, and L. Cavaleri, 2015a: Spectral wave conditions in the Colombian Pacific Ocean. Ocean Modell., 92, 149–168, https://doi.org/10.1016/j.ocemod.2015.06.005.
Portilla-Yandún, J., L. Cavaleri, and G. P. Van Vledder, 2015b: Wave spectra partitioning and long term statistical distribution. Ocean Modell., 96, 148–160, https://doi.org/10.1016/j.ocemod.2015.06.008.
Simiu, E., and N. A. Heckert, 1996: Extreme wind distribution tails: A “peaks over threshold” approach. J. Struct. Eng., 122, 539–547, https://doi.org/10.1061/(ASCE)0733-9445(1996)122:5(539).
Sverdrup, H. U., and W. Munk, 1947: Wind, Sea and Swell: Theory of Relations for Forecasting. Vol. 601. U.S. Navy Hydrographic Office, 44 pp.
Tayfun, M., and F. Fedele, 2007: Wave-height distributions and nonlinear effects. Ocean Eng., 34, 1631–1649, https://doi.org/10.1016/j.oceaneng.2006.11.006.
Vinoth, J., and I. Young, 2011: Global estimates of extreme wind speed and wave height. J. Climate, 24, 1647–1665, https://doi.org/10.1175/2010JCLI3680.1.
Walton, T. L., 2000: Distributions for storm surge extremes. Ocean Eng., 27, 1279–1293, https://doi.org/10.1016/S0029-8018(99)00052-9.
Wimmer, W., P. Challenor, and C. Retzler, 2006: Extreme wave heights in the North Atlantic from altimeter data. Renewable Energy, 31, 241–248, https://doi.org/10.1016/j.renene.2005.08.019.
Young, I. R., J. Vinoth, S. Zieger, and A. Babanin, 2012: Investigation of trends in extreme value wave height and wind speed. J. Geophys. Res., 117, C00J06, https://doi.org/10.1029/2011JC007753.