## Abstract

The development of numerical wave prediction models for hindcast applications allows a detailed description of wave climate in locations where long-term instrumental records are not available. Wave hindcast databases (WHDBs) have become a powerful tool for the design of offshore and coastal structures, offering important advantages for the statistical characterization of wave climate all over the globe (continuous time series, wide spatial coverage, constant time span, homogeneous forcing, and more than 60-yr-long time series). However, WHDBs present several deficiencies reported in the literature. One of these deficiencies is related to typhoons and hurricanes, which are inappropriately reproduced by numerical models. The main reasons are (i) the difficulty of specifying accurate wind fields during these events and (ii) the insufficient spatiotemporal resolution used. These difficulties make the data related to these events appear as “outliers” when compared with instrumental records. These bad data distort results from calibration and/or correction techniques. In this paper, several methods for detecting the presence of typhoon and/or hurricane data are presented, and their automatic outlier identification capabilities are analyzed and compared. All the methods are applied to a global wave hindcast database, and results are compared with existing hurricane and buoy databases in the Gulf of Mexico, Caribbean Sea, and North Atlantic Ocean.

## 1. Introduction

In the last decade, the traditional approach to climatology based on observations has evolved toward a state-of-the-art data assimilation system, which is used to reprocess all past environmental observations in combination with numerical models consistent with atmospheric equations. The improved methodology allows us to obtain the best estimate of the state and evolution of the atmosphere. It can also be considered as a reintegration of our knowledge about the atmosphere into an easily accessible global atmospheric reanalysis database. This source of information provides different climate variables, such as wind fields, in a regular grid.

These atmospheric reanalysis databases can be subsequently reprocessed using wind wave models, which allow the simulation of the wave generation and propagation processes all over the globe. As in the meteorological case, these models provide consistent datasets to define the wave climatology. However, since wave models do not incorporate wave instrumental observations, the resulting databases are called wave hindcast rather than reanalysis.

In recent years, the importance of wave hindcast databases for the design of offshore and coastal structures has increased considerably. The main reason is their ability to provide a detailed description of wave climate (i.e., long continuous time series with wide spatial coverage) in locations where long-term instrumental records are not available. However, hindcast models use (i) several simplifying assumptions of reality and (ii) discrete forcing fields consisting of surface winds at different times, and for these reasons hindcast results present differences when compared with instrumental data (buoys and/or satellites; Caires and Sterl 2005; Cavaleri and Sclavo 2006). Besides, if the orography is complex, the hindcast inaccuracy becomes more evident (Cavaleri and Bertotti 2004) as a result of inappropriate spatial and temporal resolution and an inaccurate description of wind fields.

An additional problem related to wave hindcast databases is their bad performance during hurricanes and typhoons. These inconsistencies are produced because of the difficulty of specifying accurate wind fields and the scarcity of high-quality wave measurements during these events. Thus, to better capture ocean surface behavior when hurricanes and typhoons occur, models with higher spatial and temporal resolution must be used. These models take advantage of (i) the advances made in recent years in the analysis of the time and space evolution of surface wind fields, especially in North Atlantic basin hurricanes (Powell et al. 1998), and (ii) the high-quality wind datasets from remote sensing systems. However, these models are computationally expensive, and they should only be used when and where the global wave hindcast does not appropriately reproduce the wave climate (i.e., during those hurricanes and typhoons that produce important discrepancies between hindcast results and instrumental data).

Coastal management and design demand the appropriate definition of the wave climate. This requirement has resulted in an increased interest in collecting information through instrumental devices (i.e., buoys and satellites). For example, the National Oceanic and Atmospheric Administration (NOAA) National Data Buoy Center (NDBC) has a fairly dense array of moored data buoys around the United States. In addition, several satellite missions [Skylab, GEOS-3, *Seasat*, *Geosat*, the Ocean Topography Experiment (TOPEX)/Poseidon, the *European Remote Sensing Satellites-1* and *-2* (*ERS-1* and *ERS-2*), the *GEOSAT Follow-On* (*GFO*), *Jason-1*, the *Environmental Satellite* (*Envisat*), and *Jason-2*] incorporate altimetry sensors for the evaluation of different ocean climate variables with a high level of precision (i.e., ±3 cm; Krogstad and Barstow 1999). These measurements are considerably more accurate than wave hindcast databases (WHDBs). However, there are also several shortcomings to be considered, such as disruptions of normal operation due to failures and temporally and spatially inhomogeneous records, which limit their use to certain regions, mostly related to developed countries. These reasons have motivated an increased interest in developing different wave generation models, such as the Wave Model (WAM) developed by the Wave Model Development and Implementation Group (WAMDI; WAMDI Group 1988) or Wave Watch (Tolman 1997, 1999, 2002). These models try to reproduce wave generation and propagation processes using wind fields as input data (Caires et al. 2004; Pilar et al. 2008; Dodet et al. 2010).

Since instrumental (buoys and/or satellites) and hindcast sources of information have advantages and drawbacks (Cavaleri and Sclavo 2006), several authors have attempted to combine both types of information. Caires and Sterl (2005) establish a nonparametric correction based on analogs taken from a learning dataset. Cavaleri and Sclavo (2006) obtain calibrated decadal time series at a large number of points over the Mediterranean Sea, using the overall information from models, buoys, and satellites. Tomás et al. (2008) include spatial correlation in the calibration process, proposing a spatial calibration procedure based on empirical orthogonal functions and a nonlinear transformation of the spatial–time modes. Mínguez et al. (2011) propose a calibration method based on a nonlinear regression problem in which the corresponding correction parameters vary smoothly along the possible wave directions by means of cubic splines. This procedure works on a point-to-point basis including wave direction, but without considering the spatial correlation between neighboring nodes. However, none of these approaches provides a rational criterion to detect data associated with hurricanes and typhoons, which should be treated with care within the calibration process. Note that failing to exclude these outlying observations may severely distort calibration results. Besides, these data should be treated and analyzed separately for the results to be fully reliable. Efforts in this direction can be found in Cardone et al. (1976, 1996). This outlier detection task is of great importance if hindcast database information is used for maximum significant wave height analysis, especially for the design of coastal protection and offshore structures, because unidentified hurricane data may lead to underestimation of the maximum significant wave heights associated with given return periods, thus compromising safety and functionality.

Because of the difficulties of defining the wave climate, we are forced to work with mathematical and statistical models, such as those proposed in this paper. Nevertheless, mathematical and statistical models are simplifications of reality, and their results must be used with caution. For instance, it is known that in certain regions of the world, hurricane data may be present in instrumental records. Therefore, it is interesting to have statistical methods to automatically detect and/or remove outliers and other unduly influential observations. This would protect the results of the analysis from the influence of these rare events. Note that the techniques proposed in this paper would allow deciding “where” and “when” specific numerical models for hurricanes and typhoons should be used instead of wave hindcast databases.

There is a large amount of literature on outlier detection; see, for example, the books by Hawkins (1980), Belsley et al. (1980), Cook and Weisberg (1982), Atkinson (1985), Chatterjee and Hadi (1988), and Barnett and Lewis (1994), and the articles by Pregibon (1981), Gray and Ling (1984), Gray (1986), Cook (1986), Jones and Ling (1988), Weissfeld and Schneider (1990a,b), Schwarzmann (1991), Paul and Fung (1991), Simonoff (1991), Nyquist (1992), Hadi and Simonoff (1993), Atkinson (1984), Peña and Yohai (1995), Barrett and Gray (1997), Mayo and Gray (1997), Billor et al. (2001), and Wisnowski et al. (2001). As can be seen in these books and articles, the literature has focused mainly on the area of least squares linear regression. Other statistical models and estimation methods, such as reweighted techniques (Luceño 1997, 1998a,b), nonlinear methods (Castillo et al. 2004), heteroscedastic models (Cheng 2011), or some robust estimators (Rousseeuw and Leroy 1987; Rousseeuw and Van Driessen 1999) have received comparatively less attention.

The aim of this paper is twofold: first to present several outlier detection techniques for hurricanes and typhoons, and second to compare results from those techniques giving some recommendations.

The paper is organized as follows. In section 2, the dataset used for this study is described. Section 3 presents four different methods for outlier detection. In section 4, the functioning of the different methods is illustrated through several examples using data from the Gulf of Mexico, the Caribbean Sea, and the North Atlantic Ocean. Finally, in section 5 relevant conclusions are drawn and some recommendations are given.

## 2. Data sources

For this study we have used the following database information:

Significant wave height data from 43 buoys from NOAA/NDBC (http://www.ndbc.noaa.gov/) over the Gulf of Mexico, the Caribbean Sea, and the Atlantic Ocean. The main characteristics of the buoys used are given in Table 1, and their locations are shown in Fig. 1.

Atlantic Hurricane Database (HURDAT): This database consists of an ASCII (text) file containing the 6-hourly center locations (latitude and longitude in tenths of degrees) and intensities (maximum 1-min surface wind speeds in knots and minimum central pressures in millibars) for all tropical storms and hurricanes from 1851 to 2009 (Jarvinen et al. 1984; Landsea et al. 2004, 2008). Figure 1 shows the hurricane tracks from Atlantic HURDAT database and the tracks of some Atlantic storms.

Global Ocean Waves (GOW): This is a global wave hindcast from 1948 onward developed by the Environmental Hydraulics Institute “IH Cantabria.” It uses the third-generation model Wave Watch III (Tolman 1997, 1999) forced by 6-hourly wind fields from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) atmosphere model. The GOW database has different spatial scales: (i) a global grid at 1.5° × 1° (longitude–latitude) spatial resolution, (ii) an Atlantic coast grid at 0.5° × 0.5° spatial resolution, and (iii) a Caribbean coast grid at 0.25° × 0.25° spatial resolution.

To increase the confidence in wave hindcast databases, results must be postprocessed and validated with instrumental data (buoys and/or satellites). For this task, hindcast versus instrumental data pairs coincident in time and space must be selected. In this particular case, and because of the hindcast homogeneity both in time and space, database information is interpolated to the buoy positions and to the times when buoy data are recorded. These data pairs are used for validation and calibration. The aim of this paper is to propose methods for detecting data pairs associated with hurricanes and typhoons prior to validating and/or applying any calibration–correction technique.
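The time–space matching described above can be sketched as a simple interpolation of the hindcast series to the buoy observation times. All arrays below are illustrative assumptions, not actual GOW or NDBC data:

```python
import numpy as np

# Hypothetical 6-hourly hindcast times (hours) and significant wave heights,
# and irregular buoy observation times and measurements.
t_hindcast = np.arange(0.0, 48.0, 6.0)
hs_hindcast = np.array([1.2, 1.5, 2.1, 2.8, 2.4, 1.9, 1.6, 1.4])
t_buoy = np.array([3.0, 9.5, 15.0, 27.2, 40.1])
hs_buoy = np.array([1.4, 1.9, 2.6, 2.3, 1.5])

# Linear interpolation of the hindcast to the buoy observation times
# produces the (x_i, y_i) data pairs used for validation/calibration.
hs_interp = np.interp(t_buoy, t_hindcast, hs_hindcast)
pairs = np.column_stack([hs_interp, hs_buoy])  # columns: hindcast x_i, buoy y_i
print(pairs.shape)  # (5, 2)
```

In a real application the spatial interpolation to the buoy position (from the surrounding grid nodes) would precede this temporal step.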

An example of these data and the hurricane effect on hindcast validation is shown in Fig. 2, where the instrumental and hindcast significant wave height records at buoy 42059 (eastern Caribbean) are plotted. Note in Fig. 2a that the hindcast time series appropriately captures the magnitude and temporal evolution of the instrumental significant wave height record; however, there are clear discrepancies when hurricane events occur, especially during Dean 2007 and Omar 2008. This effect is also shown in the scatterplot (Fig. 2b), where instrumental and hindcast data recorded during these tropical storms present important discrepancies, which would affect the calibration process and detract from the good performance of the hindcast if they were not accounted for appropriately. This paper does not try to detect and remove all data related to hurricanes, but only those data that differ substantially between hindcast and instrumental records. In Fig. 2b there are many data points recorded during the occurrence of different tropical storms where the hindcast performs appropriately. The reason for this behavior is that hurricane wave generation is a local effect. As shown in Fig. 2c, there are four tropical storm tracks passing within 2° of the buoy location; however, there are considerable discrepancies during only two of these events:

Dean 2007 evolved from east to west and passed over the buoy location on 18 August. At that time, its hurricane category was H5. This is why discrepancies during this event are so high.

Noel 2007 formed close to the buoy location, and was an extratropical storm at the time it passed close to the buoy on 25 October. The maximum category during this event was tropical or subtropical storm. For these reasons, discrepancies may be considered to be within tolerable limits.

Gustav 2008 was analogous to Noel 2007; its category was tropical or subtropical depression at the time it passed close to the buoy location on 25 August.

Omar 2008 reached category H1 on 15 October, when it passed close to the buoy location, increasing up to category H4 on 16 October, 500 km away from the buoy location, also producing remarkable discrepancies.

## 3. Outlier detection techniques

In this section, we begin by considering the weighted general linear regression model and then present different methods to deal with outliers.

### a. Weighted least squares

Consider the standard linear regression model
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{1}$$
where $\mathbf{Y} = (y_1, y_2, \ldots, y_n)^T$ is an $n \times 1$ response variable vector, $\mathbf{X}$ is an $n \times k$ matrix of predictor variables often called the "design matrix," $\boldsymbol{\beta}$ is a $k \times 1$ vector of regression coefficients or parameters, and $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$ is an $n \times 1$ vector of random errors assumed to be jointly normally distributed random variables, $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{V})$, where $\mathbf{V}$ is a positive definite variance-covariance matrix.

Regression parameters $\boldsymbol{\beta}$ are usually estimated using the weighted least squares (WLS) method:
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}\,(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T\mathbf{W}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}), \tag{2}$$
where $\mathbf{W} = \mathbf{V}^{-1}$. For Eq. (1), WLS coincides with the maximum likelihood (ML) estimation method. Note that, for homoscedastic models, $\mathbf{W}$ corresponds to the identity matrix (i.e., $w_{ii} = 1$; $i = 1, \ldots, n$; $w_{ij} = 0$; $i, j = 1, \ldots, n$ and $i \ne j$), and Eq. (2) becomes the traditional least squares (LS) method. However, we include matrix $\mathbf{W}$ in the formulation so that regression formulas remain valid for the reweighting approach presented in section 3b. Fitting results are (Draper and Smith 1981) the following:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\mathbf{Y}, \tag{3}$$
$$\hat{\mathbf{Y}} = \mathbf{P}\mathbf{Y}, \tag{4}$$
$$\mathbf{P} = \mathbf{X}(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}, \tag{5}$$
$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I}_n - \mathbf{P})\mathbf{Y}, \tag{6}$$
$$\operatorname{cov}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}, \tag{7}$$
$$\operatorname{cov}(\hat{\mathbf{Y}}) = \sigma^2\mathbf{P}\mathbf{V}, \tag{8}$$
$$\operatorname{cov}(\mathbf{e}) = \sigma^2(\mathbf{I}_n - \mathbf{P})\mathbf{V}, \tag{9}$$
where the hat refers to estimates, and
$$\operatorname{var}(e_i) = \sigma^2\,\upsilon_{ii}\,(1 - p_{ii}), \tag{10}$$
where $\upsilon_{ii}$ and $p_{ii}$ are the $i$th diagonal elements of $\mathbf{V}$ and of the projection matrix $\mathbf{P}$, respectively. The residual mean square estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{\mathbf{e}^T\mathbf{W}\mathbf{e}}{n - k}. \tag{11}$$
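For concreteness, the closed-form WLS quantities of this section can be sketched with a few lines of numpy; the design matrix, weights, and parameter values below are illustrative assumptions on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.uniform(0.5, 6.0, n)])  # design matrix
v = 0.1 + 0.05 * X[:, 1]          # assumed known heteroscedastic variances v_ii
W = np.diag(1.0 / v)              # weights w_ii = 1 / v_ii
beta_true = np.array([0.2, 1.1])
y = X @ beta_true + rng.normal(0.0, np.sqrt(v))

XtW = X.T @ W
beta_hat = np.linalg.solve(XtW @ X, XtW @ y)      # (X'WX)^{-1} X'W y
P = X @ np.linalg.solve(XtW @ X, XtW)             # projection matrix
e = y - P @ y                                     # residuals
sigma2_hat = e @ W @ e / (n - k)                  # residual mean square
print(beta_hat, sigma2_hat)
```

Note that the trace of the projection matrix equals the number of parameters, a useful consistency check on the implementation.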

#### 1) Differences between influential observations and outliers

Influential observations can be defined, according to Belsley et al. (1980), as those observations having an excessive impact on the calculated values of some estimates. There are numerous influence measures in the literature, which according to Chatterjee and Hadi (1986) can be classified into five groups based on 1) residuals, 2) the prediction matrix, 3) volume of confidence ellipsoids, 4) influence functions, and 5) partial influence. In contrast, outliers are data that cannot be explained by the model, because they are produced under different dynamics than regular data. One can find outliers that are influential, as well as outliers that are not. Some outliers present large residuals and therefore are easy to detect. However, it is important to realize that some outliers may have small residuals because they have large influence on the parameter estimates; when outliers of this type appear in groups, they are often more difficult to detect even though they are very influential. Finally, there may be some outliers with small residuals that are not influential; these are also difficult to detect, but they are much less important.

Figures 2a,b show (i) the significant wave height evolution in time and (ii) the scatterplots corresponding to buoy 42059 (eastern Caribbean) and hindcast interpolated data. According to these plots, many outliers related to hurricanes seem to have large residuals, but a moderate influence on the fitted regression model.

#### 2) Influence measures

To assess the effect of outliers associated with hurricanes on the estimators, we use different influence measures, some of them based on the deletion approach (i.e., the influence of the $i$th observation on a given estimator is calculated comparing results using all data versus results obtained removing the $i$th observation from the dataset). We have considered the following statistics, which are valid only for diagonal $\mathbf{W}$ so that $w_{ii} = \upsilon_{ii}^{-1}$:

- The $i$th diagonal element of the projection matrix $\mathbf{P}$,
  $$p_{ii} = w_{ii}\,\mathbf{x}_i(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{x}_i^T, \tag{12}$$
  where $\mathbf{x}_i$ is the $i$th row of the design matrix. It represents the amount of leverage of the response value $y_i$ on the corresponding response estimate $\hat{y}_i$. Note that $0 \le p_{ii} \le 1$. High leverage points in regression (i.e., points that are outlying in the $x$ space) should be further examined (Hoaglin and Welsch 1978).
- Internally studentized residuals,
  $$z_i = \frac{e_i}{\hat{\sigma}\sqrt{\upsilon_{ii}(1 - p_{ii})}}, \tag{13}$$
  and externally studentized residuals, a second version of the studentized residuals in (13) in which $\hat{\sigma}$ is replaced by $\hat{\sigma}_{(i)}$, where $\hat{\sigma}_{(i)}^2$ is the estimator of $\sigma^2$ when the $i$th observation is omitted:
  $$t_i = \frac{e_i}{\hat{\sigma}_{(i)}\sqrt{\upsilon_{ii}(1 - p_{ii})}}. \tag{14}$$
  Large values of the two studentized residuals are related to outliers in the response-factor space and represent points not well fitted by the model.
- The standardized squared modulus of the difference between the vector estimate $\hat{\boldsymbol{\beta}}$ for the whole set of data and the same vector $\hat{\boldsymbol{\beta}}_{(i)}$ when the $i$th observation is omitted,
  $$D_i = \frac{(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{(i)})^T\,\mathbf{X}^T\mathbf{W}\mathbf{X}\,(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{(i)})}{k\,\hat{\sigma}^2}. \tag{15}$$
  This measure is based on the sensitivity curve (Chatterjee and Hadi 1986).
- The weighted squared standardized distance (WSSD; Daniel and Wood 1980) of the $i$th observation in the $x$ space, which combines the estimated coefficients $\hat{\beta}_j$ and the distances of the predictor values $x_{ij}$ from their means $\bar{x}_j$, where $\hat{\sigma}^2$ is an estimate of $\sigma^2$.
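As an illustration, the leverage, studentized residuals, and a deletion-based distance can be computed for an ordinary LS fit ($w_{ii} = 1$) as follows; the data and the planted outlier are synthetic, and the closed-form leave-one-out identities avoid refitting $n$ times:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0.0, 5.0, n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.3, n)
y[10] += 4.0                                     # planted outlier

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T
p = np.diag(P)                                   # leverages p_ii
e = y - P @ y
k = X.shape[1]
s2 = e @ e / (n - k)

z = e / np.sqrt(s2 * (1.0 - p))                  # internally studentized
s2_i = ((n - k) * s2 - e**2 / (1.0 - p)) / (n - k - 1)  # sigma^2 with case i deleted
t = e / np.sqrt(s2_i * (1.0 - p))                # externally studentized
cook = z**2 * p / (k * (1.0 - p))                # Cook's-distance-type measure
print(int(np.argmax(np.abs(t))))                 # index of the planted outlier
```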

#### 3) Heteroscedastic transformations

When the homoscedastic assumption (constant variance) does not hold, it is often possible to transform the response variable to stabilize the variance by using the transformation
$$\tilde{Y} = Y^{1-\gamma}, \tag{19}$$
for some appropriate value of $\gamma$. This value of $\gamma$ can be estimated using two different methods:

1. Including the transformation in (19) within a nonlinear LS model, so that the estimated value $\hat{\gamma}$ is obtained jointly with the regression parameters.
2. Fitting, by LS, the relationship between the sample standard deviations $s_j$ and the sample means $\bar{y}_j$ of sets of repeated observations,
$$\log s_j = c + \gamma \log\bar{y}_j, \tag{20}$$
whose slope provides the estimate $\hat{\gamma}$.

The second alternative is preferable, if one can find sets of repeated observations, because it allows using the solutions given in section 3a. Consequently, heteroscedastic data can be analyzed using WLS, an appropriate transformation of the response variable, or a combination of both. We also show next that weights can be recalculated iteratively to match them with the observed standardized residuals.
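A minimal sketch of the repeated-observations alternative, with synthetic groups generated so that the standard deviation grows as a power $\gamma$ of the mean (the group structure and constants are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
gamma_true = 0.7
means = np.linspace(1.0, 8.0, 20)                # group means (repeated conditions)
groups = [rng.normal(m, 0.2 * m**gamma_true, 500) for m in means]

# Slope of log(sd) vs log(mean) over the groups estimates gamma.
log_mean = np.log([g.mean() for g in groups])
log_sd = np.log([g.std(ddof=1) for g in groups])
gamma_hat, _ = np.polyfit(log_mean, log_sd, 1)   # slope = estimate of gamma
print(round(gamma_hat, 2))

# The variance-stabilizing transformation of Eq. (19) is then y -> y**(1 - gamma_hat).
```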

### b. Reweighted least squares

The aim of many outlier detection methods is to determine whether an observation should be considered as an outlier or not, without allowing for intermediate situations. In contrast, the reweighted least squares (RWLS) method aims at empirically determining a weight $0 \le w_{ii} \le 1$ for every observation, ranging continuously from 0 for observations that are completely unreliable to 1 for observations that are completely reliable. This can be attained by applying the following recursive procedure:

Step 0: Set $w_{ii} = 1$; $i = 1, \ldots, n$.

Step 1: Compute the weighted least squares regression solving Eq. (2).

Step 2: Compute new weights from the residuals of the last fit.

Steps 1 and 2 are repeated until convergence.

A key issue for the successful application of this algorithm is the new weight computation in step 2. From the different formulas proposed in the literature (Huber 1981; Chatterjee and Mächler 1997; Luceño 1998b), we choose Tukey's biweight:
$$w_{ii} = \begin{cases} \left[1 - \left(\dfrac{z_i^*}{c}\right)^2\right]^2, & |z_i^*| \le c \\[6pt] 0, & |z_i^*| > c, \end{cases} \tag{21}$$
where $z_i^* = e_i/\sigma^*$ is a standardized residual based on the scaled median absolute deviation estimator of $\sigma$, $\sigma^* = \operatorname{median}_i|e_i - \operatorname{median}(e)|/c^*$, with $c^* = 0.6745$ (for consistency of $\sigma^*$).

Within the RWLS scheme, outliers related to hurricanes and typhoons are characterized with low *w _{ii}* weights. Note that in addition to its multiple outlier detection capabilities, reweighting also provides a better performance on model estimation, because the influence of potential outliers is removed from the final estimates.
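The recursive procedure of steps 0–2 can be sketched as follows; the biweight tuning constant c = 4.685 is a common choice and an assumption here, since the text leaves it unspecified, and the data with planted "hurricane" outliers are synthetic:

```python
import numpy as np

def rwls(X, y, c=4.685, tol=1e-8, max_iter=50):
    """RWLS with Tukey's biweight and MAD scale (minimal sketch)."""
    n = len(y)
    w = np.ones(n)                                   # step 0
    for _ in range(max_iter):
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)   # step 1: WLS fit
        e = y - X @ beta
        s = np.median(np.abs(e - np.median(e))) / 0.6745   # scaled MAD estimate of sigma
        u = e / (c * s)
        w_new = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)  # step 2
        if np.max(np.abs(w_new - w)) < tol:
            break
        w = w_new
    return beta, w

rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.uniform(0.0, 5.0, n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(0.0, 0.2, n)
y[:5] += 6.0                                         # planted "hurricane" outliers
beta, w = rwls(X, y)
print(beta, w[:5])                                   # outliers receive w_ii near 0
```

The final weights play the role described in the text: observations with small final $w_{ii}$ are the candidate hurricane/typhoon outliers, while the fit itself is protected from their influence.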

### c. Nonlinear weighted least squares

Regression models presented previously allow the treatment of nonlinear and/or heteroscedastic problems using adequate transformations and/or weighting.

Tomás et al. (2008) and Mínguez et al. (2011) show that potential nonlinear power-type relationships between instrumental and hindcast data, together with a heteroscedastic variance, provide very good calibration results. For this reason, an outlier detection method based on a nonlinear heteroscedastic regression model is presented.

An intrinsically (nonlinearizable) nonlinear regression model can be written as
$$y_i = f_\mu(\mathbf{x}_i; \boldsymbol{\beta}) + \varepsilon_i; \quad i = 1, \ldots, n, \tag{22}$$
where the function $f_\mu$ is known and nonlinear in the parameter vector $\boldsymbol{\beta}$, and $\varepsilon_i$; $i = 1, \ldots, n$ are jointly normally distributed errors as in Eq. (1).

As in (2), the standard nonlinear weighted least squares (NWLS) method, for diagonal $\mathbf{W}$, can be stated as
$$\min_{\boldsymbol{\beta}}\;\sum_{i=1}^{n} w_{ii}\,[y_i - f_\mu(\mathbf{x}_i; \boldsymbol{\beta})]^2, \tag{23}$$
where $n$ is the number of observations. Note that, analogously to the linear case, nonlinear regression models can also be used including weights in the formulation.

For wave hindcast data, a simple scatterplot of hindcast versus instrumental data allows observing how the variance of the regression model changes over the regression function. Consequently, we consider a nonlinear heteroscedastic regression model in which the standard deviation $\sigma_i$ of the $i$th error is a function of the predictor variable $x_i$:
$$\sigma_i = f_\sigma(x_i; \boldsymbol{\theta}), \tag{24}$$
where $\boldsymbol{\theta}$ is a new $s \times 1$ vector of coefficients or parameters. If the parameter vector $\boldsymbol{\theta}$ were known, estimation of the parameter vector $\boldsymbol{\beta}$ could be based on the NWLS method in (23). However, the values of $\boldsymbol{\theta}$ are usually unknown, and can be estimated using maximum likelihood methods. Thus, assuming that random errors are uncorrelated and normally distributed random variables, each with mean zero and standard deviation given by (24), the whole set of model parameters ($\boldsymbol{\beta}$ and $\boldsymbol{\theta}$) can be jointly estimated maximizing the log-likelihood function:
$$\ell(\boldsymbol{\beta}, \boldsymbol{\theta}) = -\frac{1}{2}\sum_{i=1}^{n}\left\{\log(2\pi) + 2\log f_\sigma(x_i; \boldsymbol{\theta}) + \left[\frac{y_i - f_\mu(\mathbf{x}_i; \boldsymbol{\beta})}{f_\sigma(x_i; \boldsymbol{\theta})}\right]^2\right\}. \tag{25}$$

The estimates $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\theta}}$ that maximize the log-likelihood function in (25), and solve (23), allow calculating the residual vector $\mathbf{e}$, which is defined as
$$e_i = y_i - f_\mu(\mathbf{x}_i; \hat{\boldsymbol{\beta}}); \quad i = 1, \ldots, n. \tag{26}$$

Observe that the maximization of the log-likelihood function can be done using any of the available solvers for nonlinear programming, possibly subject to bounds on variables. One such solver is MINOS (Murtagh and Saunders 1998) under the General Algebraic Modeling System (GAMS; Brooke et al. 1998), which allows for upper and lower bounds on the parameters to be estimated, and uses a reduced-gradient algorithm (Wolfe 1963) combined with the quasi-Newton algorithm described in Murtagh and Saunders (1978). Another option is the trust-region reflective algorithm in Matlab's fmincon function, which is also capable of dealing with upper and lower bounds; for details about the method see Coleman and Li (1994) and Coleman and Li (1996). To improve convergence properties, both the gradient and Hessian of the objective function are calculated analytically (see the appendix for details).
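As an illustration, the maximization of the log-likelihood in (25) can be sketched with a generic bound-constrained solver; here scipy's `minimize` plays the role that MINOS or fmincon play in the text, and the particular parameterizations of f_mu and f_sigma, like the data, are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data from an assumed power-law mean f_mu(x) = b0 * x**b1 and
# linearly growing dispersion f_sigma(x) = t0 + t1 * x.
rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0.5, 8.0, n)
b_true, t_true = (1.2, 0.9), (0.1, 0.05)
y = b_true[0] * x**b_true[1] + rng.normal(0.0, t_true[0] + t_true[1] * x)

def neg_loglik(p, x, y):
    b0, b1, t0, t1 = p
    mu = b0 * x**b1
    sig = t0 + t1 * x
    # Negative of Eq. (25): log(2*pi) + 2*log(f_sigma) + standardized residual squared
    return 0.5 * np.sum(np.log(2 * np.pi) + 2 * np.log(sig) + ((y - mu) / sig) ** 2)

# Bounds keep f_sigma positive over the data range.
res = minimize(neg_loglik, x0=[1.0, 1.0, 0.2, 0.1], args=(x, y),
               bounds=[(None, None), (None, None), (1e-6, None), (1e-6, None)])
b0, b1, t0, t1 = res.x
e = y - b0 * x**b1          # residual vector of Eq. (26)
print(np.round(res.x, 2))
```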

Following the analogy between WLS and NWLS, it is also possible to apply reweighting strategies within nonlinear regression models, which enhances the quality of parameter estimates by reducing the effect of possible existing outliers, at the cost of increased computational time and a somewhat more difficult nonlinear fit.

#### 1) Residual covariance matrix and studentized residuals

Using a first-order Taylor series expansion of the function in (26) around the optimal estimated parameter vector $\hat{\boldsymbol{\beta}}$, the estimated differential residual vector is obtained as
$$d\mathbf{e} = d\mathbf{y} - \mathbf{J}\,d\hat{\boldsymbol{\beta}}, \tag{27}$$
where $\mathbf{J}$ is the $n \times k$ Jacobian matrix $\partial f_\mu/\partial\boldsymbol{\beta}$ evaluated at $\hat{\boldsymbol{\beta}}$. It readily follows that
$$d\mathbf{e} = (\mathbf{I}_n - \mathbf{J}\mathbf{S}_\beta)\,d\mathbf{y} = \mathbf{R}\,d\mathbf{y}, \tag{28}$$
where the $k \times n$ matrix $\mathbf{S}_\beta$ contains the derivatives of vector $\hat{\boldsymbol{\beta}}$ with respect to $\mathbf{y}$ evaluated at the optimal solution, matrix $\mathbf{I}_n$ is the $n$-dimensional identity matrix, and matrix $\mathbf{R} = \mathbf{I}_n - \mathbf{J}\mathbf{S}_\beta$ is the so-called residual *sensitivity* matrix.

Integration of (28) allows obtaining the first-order linear approximation to the (nonlinear in $\mathbf{y}$) transformation (26) from $\mathbf{y}$ to $\mathbf{e}$:
$$\mathbf{e} \approx \mathbf{R}\mathbf{y} + \mathbf{k}, \tag{29}$$
where $\mathbf{k}$ is the integration constant vector.

The corresponding estimated residual covariance matrix is
$$\boldsymbol{\Omega} = \mathbf{R}\,\boldsymbol{\Sigma}_\varepsilon\,\mathbf{R}^T, \tag{30}$$
where matrix $\boldsymbol{\Sigma}_\varepsilon$ is the error covariance matrix provided by (24):
$$\boldsymbol{\Sigma}_\varepsilon = \operatorname{diag}\left[f_\sigma^2(x_1; \hat{\boldsymbol{\theta}}), \ldots, f_\sigma^2(x_n; \hat{\boldsymbol{\theta}})\right]. \tag{31}$$

Therefore, considering (28), the general expression for matrix $\boldsymbol{\Omega}$ is
$$\boldsymbol{\Omega} = (\mathbf{I}_n - \mathbf{J}\mathbf{S}_\beta)\,\boldsymbol{\Sigma}_\varepsilon\,(\mathbf{I}_n - \mathbf{J}\mathbf{S}_\beta)^T, \tag{32}$$
where matrices $\mathbf{J}$ and $\mathbf{S}_\beta$ depend on the selected $f_\mu(\mathbf{x}; \boldsymbol{\beta})$ and $f_\sigma(x; \boldsymbol{\theta})$ functions. Note that (32) is a nonlinear equivalent to (9). The studentized residuals are then
$$z_i = \frac{e_i}{\sqrt{\Omega_{i,i}}}; \quad i = 1, \ldots, n, \tag{33}$$
where $\Omega_{i,i}$ is the $i$th diagonal element of $\boldsymbol{\Omega}$.

Vector ** z** provides the studentized residuals, and hence can be used straightforwardly for outlier identification as in the linear case.

#### 2) Sensitivity matrix from sensitivity analysis

Section 3c(1) shows that the estimated residual covariance matrix **Ω** is obtained from the residual sensitivity matrix, which in turn requires the matrix of derivatives of the parameter estimates with respect to the data. This matrix is obtained below, based on sensitivity analysis results reported in Castillo et al. (2006).

For the maximum likelihood estimation problem, which is an unconstrained nonlinear optimization problem, the Karush–Kuhn–Tucker (KKT) first-order optimality conditions at its optimal solution $\hat{\boldsymbol{\eta}}$ (Bazaraa et al. 1993; Luenberger 1984) reduce to
$$\nabla_{\boldsymbol{\eta}}\,\ell(\hat{\boldsymbol{\eta}}; \mathbf{y}) = \mathbf{0}, \tag{34}$$
where $\boldsymbol{\eta} = (\boldsymbol{\beta}, \boldsymbol{\theta})$, and $\nabla_{\boldsymbol{\eta}}$ stands for the vector of partial derivatives (of $\ell$) with respect to $\boldsymbol{\eta}$.

This condition establishes that the gradient of the objective function with respect to $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ at the optimal solution $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\theta}}$ must be zero.

To obtain sensitivity equations, we perturb or modify $\mathbf{y}$ so that $\hat{\boldsymbol{\eta}}$ is modified accordingly to continue satisfying the KKT conditions (Castillo et al. 2006). After manipulating the resulting expressions, the required sensitivity equation reduces to the following linear system of equations:
$$\nabla_{\boldsymbol{\eta}\boldsymbol{\eta}}\,\ell\;d\boldsymbol{\eta} + \nabla_{\boldsymbol{\eta}\mathbf{y}}\,\ell\;d\mathbf{y} = \mathbf{0}, \tag{35}$$
where $\nabla_{\boldsymbol{\eta}\boldsymbol{\eta}}\,\ell$ [$(k+s) \times (k+s)$] and $\nabla_{\boldsymbol{\eta}\mathbf{y}}\,\ell$ [$(k+s) \times n$] constitute the Hessians of the log-likelihood with respect to parameters and data, so that
$$\frac{\partial\hat{\boldsymbol{\eta}}}{\partial\mathbf{y}} = -\left(\nabla_{\boldsymbol{\eta}\boldsymbol{\eta}}\,\ell\right)^{-1}\nabla_{\boldsymbol{\eta}\mathbf{y}}\,\ell. \tag{36}$$
Note that (36) is a nonlinear equivalent to (3) with unit weights.

Expression (35) allows deriving sensitivities of the parameter estimates with respect to the data. Under mild regularity conditions that are often satisfied (Coles 2001; Castillo et al. 2005), $\nabla_{\boldsymbol{\eta}\boldsymbol{\eta}}\,\ell$ (the Fisher information matrix) is invertible, and (35) has a unique solution. The matrix $\partial\hat{\boldsymbol{\eta}}/\partial\mathbf{y}$ can be partitioned in two different blocks associated with the mean and standard deviation parameter functions, respectively:
$$\frac{\partial\hat{\boldsymbol{\eta}}}{\partial\mathbf{y}} = \begin{pmatrix} \partial\hat{\boldsymbol{\beta}}/\partial\mathbf{y} \\[2pt] \partial\hat{\boldsymbol{\theta}}/\partial\mathbf{y} \end{pmatrix}. \tag{37}$$
The first block contains the derivatives of $\hat{\boldsymbol{\beta}}$ with respect to $\mathbf{y}$, which allows obtaining the residual sensitivity matrix using Eq. (28).

From a computational point of view, inversion of the Hessian matrix is not needed, because it can be easily factorized using LU algorithms. Sensitivities are thus obtained using forward and backward substitution. Note that for the calculation of all sensitivities, second-order derivatives of the log-likelihood function with respect to parameters and data are needed. They can be obtained numerically by finite differences or analytically. For the analytical case, a detailed derivation of Jacobians and Hessians with respect to parameters and data is given in the appendix. Although the analytical results seem complex, we prefer this approach because it is easy to implement in any programming language, and it avoids possible numerical problems arising from finite differences. In addition, to calculate studentized residuals, only the computation of the diagonal elements of the **Ω** matrix is required, which considerably reduces the computational time.
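A minimal numerical sketch of this computation, under the simplifying assumption of a linear mean and constant σ so that the Hessian blocks of the negative log-likelihood can be written analytically; the LU factorization replaces explicit inversion, and the first block of sensitivities is checked against its known closed form:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(5)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0.5, 5.0, n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(0.0, 0.3, n)

beta = np.linalg.solve(X.T @ X, X.T @ y)       # ML estimate of the mean parameters
e = y - X @ beta
sigma = np.sqrt(e @ e / n)                     # ML estimate of sigma

# Hessian blocks of the negative log-likelihood at eta = (beta0, beta1, sigma).
H_bb = X.T @ X / sigma**2
H_bs = 2.0 * X.T @ e / sigma**3                # ~0 at the optimum (X'e = 0)
H_ss = 2.0 * n / sigma**2
H_pp = np.block([[H_bb, H_bs[:, None]], [H_bs[None, :], np.array([[H_ss]])]])
# Cross derivatives with respect to the data y.
H_py = np.vstack([-X.T / sigma**2, -2.0 * e[None, :] / sigma**3])

# Sensitivity system: H_pp * d(eta) = -H_py * d(y), solved via one LU factorization.
S = lu_solve(lu_factor(H_pp), -H_py)           # d(eta_hat)/d(y), shape (3, n)
# First block: d(beta_hat)/d(y) equals the closed-form (X'X)^{-1} X'
print(np.allclose(S[:2], np.linalg.solve(X.T @ X, X.T)))
```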

### d. Minimum covariance determinant estimator

A different method capable of detecting outliers is the minimum covariance determinant estimator (Rousseeuw and Van Driessen 1999), which is used in this paper for comparison purposes. The minimum covariance determinant estimator (MCD) method looks for the *h* observations out of *n* whose classical covariance matrix has the lowest possible determinant. This method allows us to calculate a robust distance:

$$\mathrm{RD}(\mathbf{x}_i) = \sqrt{(\mathbf{x}_i - \hat{\boldsymbol{\mu}}_{\mathrm{MCD}})^T\,\hat{\boldsymbol{\Sigma}}_{\mathrm{MCD}}^{-1}\,(\mathbf{x}_i - \hat{\boldsymbol{\mu}}_{\mathrm{MCD}})}, \tag{39}$$
where $\hat{\boldsymbol{\mu}}_{\mathrm{MCD}}$ and $\hat{\boldsymbol{\Sigma}}_{\mathrm{MCD}}$ are the robust MCD location and scatter estimates, respectively, so as to determine whether the associated observation $i$ is an outlier or not. Under the normal assumption, the outliers correspond to those values whose robust distances are larger than a given cutoff value, usually defined as $\sqrt{\chi^2_{p,1-\alpha}}$ for some small $0 < \alpha < 1$, where $p$ is the number of variables. The robust distance in (39) is a robustification of the Mahalanobis distance.
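A sketch of this detection scheme using scikit-learn's `MinCovDet` on synthetic hindcast-versus-buoy style pairs; the planted points mimicking hurricane data and the chosen α are illustrative assumptions:

```python
import numpy as np
from sklearn.covariance import MinCovDet
from scipy.stats import chi2

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(0.5, 5.0, n)
y = 1.05 * x + rng.normal(0.0, 0.25, n)
data = np.column_stack([x, y])
data[:5, 1] += 5.0                       # planted outliers (buoy >> hindcast)

mcd = MinCovDet(random_state=0).fit(data)
d2 = mcd.mahalanobis(data)               # squared robust distances
cutoff = chi2.ppf(1 - 0.025, df=2)       # alpha = 0.025, p = 2 variables
outliers = np.flatnonzero(d2 > cutoff)   # indices exceeding the chi-squared cutoff
print(outliers[:5])                      # contains the planted indices 0..4
```

Note that `mahalanobis` returns squared distances, so the squared cutoff $\chi^2_{p,1-\alpha}$ is used directly rather than its square root.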

## 4. Case study

In this section we illustrate the performance of the methods presented in section 3. We have applied them to the 43 buoys from the NDBC given in Table 1 and shown in Fig. 1. In this application we only deal with two variables: $y_i$ corresponds to the $i$th value of the response variable (buoy data), and $x_i$ is the predictor variable (interpolated hindcast data) corresponding to the $i$th observation. However, the methods presented in the paper are valid for multivariate analysis. We could, for example, use more than one function of $X$ in the regression Eqs. (1) or (22). Consequently, we have investigated some of these more complex models, but we will only show results for those models we have found to work best.

Before performing the analysis, the particular regression models we have chosen are presented:

- For the WLS method (section 3a), the response variable is transformed using Eq. (19) and the estimate $\hat{\gamma}$ is calculated based on Eq. (20). Because the relationship between $X$ and $Y$ is approximately linear, we apply the same power transformation $1 - \gamma$ to the covariate $X$ and the response $Y$, which leads to the following regression model:
  $$Y^{1-\gamma} = \beta_0 + \beta_1 X^{1-\gamma} + \varepsilon. \tag{40}$$
  This model is linear with respect to $\beta_0$ and $\beta_1$ and nonlinear with respect to $\gamma$. However, because the estimate of $\gamma$ is obtained previously rather than using a nonlinear iteration, model (40) can be considered linear for practical purposes. The RWLS method (section 3b) is applied using Eq. (40).
- For the NWLS model (section 3c), power-type parameterizations are used for the mean and dispersion functions $f_\mu$ and $f_\sigma$.
- Transformed data $Y^{1-\gamma}$ and $X^{1-\gamma}$ are also used within the MCD framework (section 3d).

Note that, prior to deciding the particular regression model for each case, alternative expressions were considered, particularly to check whether other transformations of *X* and *Y* could be useful. We only report those giving the best performance.

### a. Detailed results for eastern Caribbean buoy 42059

We first analyze some detailed results for buoy 42059 (eastern Caribbean), shown in Fig. 2. We have applied the WLS (section 3a), RWLS (section 3b), NWLS (section 3c), and MCD (section 3d) methods. For the WLS and NWLS, outliers are identified using the internally studentized residuals *z*_{i} given in (13) and (33), respectively. In both cases, a case is identified as an outlier if |*z*_{i}| > Φ^{−1}(1 − *α*/2). Results for different significance levels *α* = {0.1, 0.05, 0.01, 0.001, 0.0001} are shown in Figs. 3a,b, where the data removed for each significance level are highlighted using different dot markers. Table 2 also provides the number of data points detected as outliers for each significance level, and the computational time in seconds. Note that the models have been run on a portable computer with one processor clocking at 2.39 GHz and 3.25 GB of RAM. From all these results, the following observations are pertinent:

- Both WLS and NWLS provide similar results. The numbers of outliers detected by the two methods for each significance level are almost the same.

- The WLS method requires the evaluation of the optimal *γ* value in transformation (19) for the homoscedastic assumption to hold. However, once this value is calculated, the problem is easily solvable using (3)–(11), which requires little computational time. On the other hand, the nonlinear version requires solving an optimization problem, which takes longer, although it is easily handled using standard nonlinear mathematical programming techniques.

- Since the RWLS method iteratively updates the weights associated with each case, the detection criterion is established as a function of the final weights *w*_{ii}. Outliers relevant for calibration purposes are those whose weights are lower than about 0.2 (note that 0 ≤ *w*_{ii} ≤ 1; *i* = 1, … , *n*). RWLS also appropriately detects the most relevant outliers (see Fig. 3c). The computational time increases slightly with respect to WLS, but decreases considerably with respect to NWLS. The iterative process usually converges in a few iterations (e.g., for this particular buoy, it requires six iterations).

- It is also important to realize that RWLS avoids the dichotomy of outlier versus "not outlier" for each particular case in the sample. Instead, the fitted regression line is estimated by giving each case a weight (0 ≤ *w*_{ii} ≤ 1) according to our empirically determined degree of credibility in the goodness of each case in the sample.

- Hurricane data related to Dean (2007) and Omar (2008) (see Fig. 2), where the discrepancies are remarkable, are correctly detected with both the WLS and NWLS methods using a significance level *α* = 0.0001, as well as with the RWLS using a weight threshold of *w* = 0.2.
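The detection rule |*z*_{i}| > Φ^{−1}(1 − *α*/2) based on internally studentized residuals can be sketched as follows. This is an illustrative implementation, not the paper's code; the design matrix is assumed to already include the intercept column:

```python
import numpy as np
from scipy.stats import norm

def studentized_outliers(X, y, alpha=1e-4):
    """Flag cases whose internally studentized residual exceeds the
    two-sided normal cutoff |z_i| > Phi^{-1}(1 - alpha/2).

    Illustrative sketch; X is the (n x k) design matrix including
    the intercept column.
    """
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
    e = y - H @ y                                  # ordinary residuals
    s2 = (e @ e) / (n - k)                         # residual variance
    z = e / np.sqrt(s2 * (1.0 - np.diag(H)))       # internally studentized
    return z, np.abs(z) > norm.ppf(1.0 - alpha / 2.0)
```

Lowering *α* raises the cutoff, so fewer (only the most extreme) cases are flagged, which mirrors the progression across the significance levels in Figs. 3a,b.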

For comparison purposes, we have also applied the MCD approach (section 3d). Results are given in Fig. 3d and Table 2. The MCD method is applied using the function mcdcov from the MATLAB toolbox LIBRA (Verboven and Hubert 2005), which is an implementation of the fast-MCD algorithm proposed by Rousseeuw and Van Driessen (1999). Note that Fig. 3d shows the data detected using the classical approach based on the Mahalanobis distance along with those for the robust approach. From these results, we conclude that both the classical and robust variants of the MCD approach provide unsatisfactory results: besides detecting data associated with outliers, they also eliminate extreme values of the hindcast and instrumental distributions that lie close to the regression line. These points are appropriately reproduced by the hindcast and are extremely important from the engineering design point of view. In addition, the computational cost is much higher than that of the other methods.

### b. Results for the remaining buoys

Table 3 provides the following information related to the performance of the WLS and NWLS methods on the 43 buoys from the NDBC: the number of cases at each buoy location (*n*), the number of detected outliers for significance levels *α*_{1} = 0.001 and *α*_{2} = 0.0001, the mean and standard deviation of the absolute studentized residuals, the maximum and minimum absolute studentized residuals (|*z*|_{max}, |*z*|_{min}), and the CPU time in seconds. Note that these four residual statistics are computed from the data removed using *α*_{2} = 0.0001.

From results given in Table 3 the following observations are pertinent:

- Both the WLS and NWLS approaches provide satisfactory results on outlier identification in most cases, with the computational time required for WLS being lower.

- In all buoys, and for the same significance level, the number of data points detected using the nonlinear approach is higher, and the maximum absolute studentized residual |*z*|_{max} is also higher, resulting in a more conservative approach, which may produce better postcalibration results.

For comparison purposes, Fig. 4 shows the performance of both the WLS and NWLS methods on three different buoys: 41040, 41046, and 41047. For buoy 41040 both methods perform appropriately, detecting the most relevant outliers. However, whereas the mean, standard deviation, and minimum absolute studentized residual are similar (see the corresponding row in Table 3), the maximum absolute studentized residuals are 11.8317 and 19.2277, respectively, the NWLS |*z*|_{max} value being considerably higher. In this particular case, using *α* = 0.0001, the performance can be considered equivalent from the postcalibration perspective. This effect is also observed in buoys 41047 and 41046. In buoy 41047, the maximum absolute studentized residuals are 4.4024 and 5.6516, relatively close, but in buoy 41046 they are 5.2096 and 9.3842, so the differences increase considerably with respect to buoy 41047. On the other hand, the minimum absolute studentized residuals are very similar in both locations. Thus, the NWLS method provides more conservative detection results at the *α* = 0.0001 level, as shown in Figs. 4c,d, because it includes as outliers those points associated with *H*_{s}^{GOW} between 3 and 4 m and *H*_{s}^{I} around 6 m. Note that both methods are also capable of detecting points associated with negative studentized residuals, as shown in Figs. 4a,b (left side of the regression lines), which may be related to points taken during disruption of normal use of the instrumental device.

An important contribution of our analysis is the assessment of the effect of using the different diagnostic statistics given in (12)–(18) on outlier detection. We have confirmed that the most appropriate statistic for this particular application is the internally studentized residual, since the other statistics detect high-leverage points that are not usually related to hurricanes. The differences between internally and externally studentized residuals are negligible for the buoys considered.

Table 4 provides the following information related to the performance of the RWLS and NWLS methods on the 43 buoys from the NDBC: the number of cases at each buoy location (*n*), the number of detected outliers for weights satisfying 0.2 ≤ *w*_{1} ≤ 0.5 and *w*_{2} ≤ 0.2, the number of detected outliers for significance levels *α*_{1} = 0.001 and *α*_{2} = 0.0001, the mean and standard deviation of the weights, the maximum and minimum weights (*w*_{max}, *w*_{min}), and the CPU time in seconds. Note that these weight statistics are computed from the data removed using the *w*_{2} ≤ 0.2 criterion. This table also shows that the number of iterations required for convergence of the RWLS method is between 5 and 7, and that it is thus computationally faster than NWLS.

The number of outliers detected using RWLS is very similar to the number detected using NWLS, with both methods capable of detecting all the relevant outliers. The differences are due to certain outliers detected by RWLS that are related to lower values of the instrumental dataset. Figure 5 shows the performance of the RWLS method on buoys 41040, 41046, and 41047. Comparing these results with those in Figs. 4b,d,f, it can be observed that the outliers associated with hurricanes (higher values of the instrumental record) detected with NWLS using *α* = 0.0001 and with RWLS using *w* < 0.5 are almost the same. However, RWLS also includes data records related to the medium and lower part of the instrumental distribution, which are not considered outliers by NWLS.

Note that although the computational time increases slightly with respect to the WLS method, the detection capabilities of RWLS should be regarded as an insurance policy that provides (i) better protection against outliers that are more difficult to detect, and (ii) better estimates of the model parameters, because suspected outliers are given small or null weights (see columns *w*_{max} and *w*_{min} in Table 4) depending on our belief in their true outlying nature.
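The idea of iteratively downweighting suspected outliers can be illustrated with a generic iteratively reweighted least squares scheme. This sketch uses Tukey bisquare weights and a MAD scale estimate, which stand in for the paper's specific RWLS weighting rule; function and parameter names are ours:

```python
import numpy as np

def rwls_sketch(X, y, c=4.685, tol=1e-8, max_iter=50):
    """Iteratively reweighted least squares with Tukey bisquare weights.

    Generic robust-regression sketch (the paper's exact RWLS update may
    differ); final weights near 0 mark suspected outliers.
    """
    w = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        W = np.diag(w)
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        r = y - X @ beta_new
        s = 1.4826 * np.median(np.abs(r))      # robust residual scale (MAD)
        if s < 1e-8:
            s = 1.0                            # guard against exact fits
        u = r / (c * s)
        # Bisquare: smooth downweighting, zero weight beyond |u| = 1
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        if np.linalg.norm(beta_new - beta) < tol:
            break
        beta = beta_new
    return beta_new, w
```

The redescending weight function gives gross outliers exactly zero influence on the final fit, while mildly discrepant cases keep intermediate weights rather than facing an all-or-nothing removal decision.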

## 5. Conclusions

Several methods for the automatic identification of outliers when comparing wave hindcast and instrumental time series are analyzed and compared in this paper. We show that these outlying data are mostly related to the presence of typhoons and/or hurricanes and must be removed to avoid distorting postcalibration results. The main conclusions of the study are as follows:

- The best diagnostic statistic for outlier identification purposes in the WLS and NWLS methods is the internally studentized residual.

- Both the WLS and NWLS models perform appropriately in most cases. The WLS method is computationally faster; however, NWLS provides better postcalibration results because it is more conservative for the same significance level, which may be convenient if computational time is not a constraint.

- The RWLS method is also recommended for this specific application since it provides results analogous to NWLS. This method becomes more relevant if there is a special interest in the final regression model parameters beyond outlier detection.

- RWLS and NWLS provide systematic procedures to (i) detect outliers and (ii) remove outliers for calibration purposes. In addition, NWLS allows us to identify those areas where the presence of hurricanes and typhoons is more relevant, which are associated with high values of the maximum studentized residual. This is especially important if wave hindcast time series are intended to be used for engineering purposes.

- Methods based on the minimum covariance determinant (MCD) produce inappropriate results for this particular application. The main reason is the assumption of an underlying multivariate normal pattern, which wave data do not follow even after transforming the variables.

- Our automatic hurricane–typhoon identification procedures allow detecting those areas and periods of time in which it is necessary to carry out a more accurate analysis by increasing the spatial and temporal resolution of the wind fields during these events.

An open question is to assess the importance of using the proposed outlier detection techniques in new calibration studies. However, this effort is beyond the scope of the present paper.

## Acknowledgments

This work was partly funded by projects “GRACCIE” (CSD2007-00067, Programa Consolider-Ingenio 2010) and “AMVAR” (CTM2010-15009) from the Spanish Ministry MICINN, by project C3E (200800050084091) from the Spanish Ministry MAMRM, and by project MARUCA (E17/08) from the Spanish Ministry MF. R. Mínguez is also indebted to the Spanish Ministry MICINN for the funding provided within the “Ramon y Cajal” program. Alberto Luceño acknowledges the support of the Spanish Grant MTM2008-00759. We also thank the editor and reviewers for their very helpful comments and suggestions, which have led to an improved manuscript.

### APPENDIX

#### Derivatives for Sensitivity Matrix Calculations

The analytical derivation of all matrices required for the calculation of the sensitivity matrix is provided below. For this task, the first- and second-order derivatives of the log-likelihood function with respect to the parameters **η** at the optimum must be obtained. Note that all derivations are based on the chain rule.

##### a. First-order derivatives of the log-likelihood function

First-order derivatives of the log-likelihood function with respect to mean (*μ*) and standard deviation (*σ*) parameters are
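Assuming a Gaussian log-likelihood with case-specific mean *μ*_{i} and standard deviation *σ*_{i} (consistent with the normality assumption used for the residual cutoffs), these derivatives take the standard form:

```latex
\frac{\partial \ell_i}{\partial \mu_i} = \frac{y_i - \mu_i}{\sigma_i^{2}},
\qquad
\frac{\partial \ell_i}{\partial \sigma_i} = -\frac{1}{\sigma_i} + \frac{(y_i - \mu_i)^{2}}{\sigma_i^{3}}.
```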

##### b. Second-order derivatives of the log-likelihood function

Second-order derivatives of the log-likelihood function with respect to mean (*μ*) and standard deviation (*σ*) parameters are
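Under the same Gaussian assumption, differentiating the first-order expressions once more gives the standard second-order forms:

```latex
\frac{\partial^{2} \ell_i}{\partial \mu_i^{2}} = -\frac{1}{\sigma_i^{2}},
\qquad
\frac{\partial^{2} \ell_i}{\partial \mu_i \, \partial \sigma_i} = -\frac{2 (y_i - \mu_i)}{\sigma_i^{3}},
\qquad
\frac{\partial^{2} \ell_i}{\partial \sigma_i^{2}} = \frac{1}{\sigma_i^{2}} - \frac{3 (y_i - \mu_i)^{2}}{\sigma_i^{4}}.
```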

In addition, the evaluation of the second-order derivatives of the log-likelihood function with respect to parameters to be estimated and data is required. These are as follows:
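Under the same Gaussian assumption, differentiating the first-order expressions with respect to the data *y*_{i} yields:

```latex
\frac{\partial^{2} \ell_i}{\partial \mu_i \, \partial y_i} = \frac{1}{\sigma_i^{2}},
\qquad
\frac{\partial^{2} \ell_i}{\partial \sigma_i \, \partial y_i} = \frac{2 (y_i - \mu_i)}{\sigma_i^{3}}.
```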