## 1. Introduction

This paper is the second in a two-part series on bootstrap methods for atmospheric science applications (Gilleland 2020, henceforth PI). A common question in atmospheric science concerns extreme values of weather phenomena and how those rare events might change in a future climate. This situation presents hidden challenges that are often overlooked. Even when it is recognized that standard parametric statistical tests might not be appropriate, bootstrap methods are often seen as a fix for any situation. Bootstrap methods still require assumptions, however, and the most commonly used type, the independent and identically distributed (iid) bootstrap, fails to produce accurate results when those assumptions are not met.

Assumptions for the bootstrap procedure are often violated when interest is in the extreme values of a process. First, simply resampling from the data does not allow for sampling values that might occur but have not been observed in the data record. Second, asymptotic arguments for the appropriateness of the resampling procedure do not hold for the usual resampling paradigm when the underlying distribution function is heavy tailed (Resnick 2007, chapter 6).

Fawcett and Walshaw (2012) employ block bootstrap resampling along with a bivariate extreme-value model to make inferences when modeling threshold exceedances, and Heffernan and Tawn (2004) employ a semiparametric bootstrap procedure to make inferences for their bivariate conditional extreme-value model. Kyselý (2002) found that a parametric bootstrap procedure performed fairly well but tended to yield narrower confidence intervals (CI’s) than desired. Schendel and Thongwichian (2015, 2017) advocate the use of the test-inversion bootstrap (TIB; see PI) procedure, but this method can be difficult to implement, especially for nonstationary data.

The main objective of this paper is to describe new R software (R Core Team 2017), available in the “extRemes” (Gilleland and Katz 2016) package, and making use of code from the “distillery” (Gilleland 2017) package, for obtaining accurate CI’s for extreme values.

## 2. Extreme value analysis

Weather situations that have a high impact on human life, infrastructure, and the environment, such as extreme precipitation, severe winds, tornadoes, and hurricanes, are often the main thrust of atmospheric studies. For the rarest events, such as those that occur on average only once every 100 years, it is important to use the correct statistical analyses in order to accurately portray both the risks of these events and the associated uncertainty. In what follows, it is helpful to let *X*_{1}, …, *X*_{n} denote a random sample that represents a physical phenomenon of interest, such as 24-h accumulated rainfall, daily maximum temperature, or streamflow.

Theoretical, asymptotic arguments give justification for modeling maxima taken over very long blocks with the generalized extreme value (GEV) distribution function, the frequency of occurrence of rare events by the Poisson distribution function and subsequently the time between events by the exponential distribution function, and for excesses over a very high threshold by the generalized Pareto (GP) distribution function. The frequency and occurrence of extreme events can be modeled jointly by way of a Poisson point process (PP) framework that can be recharacterized in terms of the GEV distribution function. This same PP framework can be expressed as a marked Poisson process where the marks follow a GP distribution function. Approximately, the GP distribution function informs about the tail of the GEV distribution function so that all three approaches to modeling extreme values are essentially the same. This approximate equivalence can be easily intuited by understanding that, for a high threshold *u*, Pr{max(*X*_{1}, …, *X*_{n}) ≤ *u*} = Pr{*N* = 0}, where *N* is a random variable that represents the number of times that *X*_{i} > *u*, *i* = 1, …, *n*. That is, letting *M*_{n} = max(*X*_{1}, …, *X*_{n}), *N* ~ Poisson(*λ*) and *M*_{n} ~ GEV(*μ*, *σ*, *ξ*).

The Poisson distribution function arises as a limit of the binomial distribution function, which describes the number of successes, *N*_{n}, that occur in a sequence of *n* independent trials. For *p* the probability of a success, i.e., *p* = Pr{*X*_{i} > *u*}, on a given trial with 0 < *p* < 1, the expected value for *N*_{n} is *np*. If the expected number of events, *λ* = *np*, stays constant as the number of trials, *n*, increases, then *p* decreases with *n*; write *p*_{n} to emphasize this relationship. Under this scenario,

$$\Pr\{N_n = 0\} = (1 - p_n)^n = \left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda}$$

for large *n*. That is, the probability distribution function of *N*_{n} is approximately Poisson with rate parameter *λ*. More generally,

$$\Pr\{N_n = k\} \approx \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots, \tag{1}$$

for large *n* and constant rate of occurrence as *n* increases. The Poisson distribution function has the unusual property that its mean is equal to its variance, which is also the rate parameter, *λ*.
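The quality of the Poisson approximation to the binomial can be checked numerically. The following is a minimal Python sketch, not part of the software described in this paper; the rate value `lam = 2.0` and the function names are assumptions chosen only for illustration.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability Pr{N_n = k} for n trials with success probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events for rate parameter lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 2.0  # assumed expected number of threshold exceedances (illustrative value)

# Hold lam = n * p_n fixed while n grows, so that p_n = lam / n shrinks.
for n in (10, 100, 10_000):
    p_n = lam / n
    print(n, binom_pmf(0, n, p_n), poisson_pmf(0, lam))

# Largest pointwise gap between the two probability functions for n = 10,000.
max_abs_diff = max(
    abs(binom_pmf(k, 10_000, lam / 10_000) - poisson_pmf(k, lam)) for k in range(10)
)
print(max_abs_diff)
```

The printed binomial probabilities of zero events approach exp(−*λ*) as *n* increases, which is the limit used in the argument above.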

The GEV distribution function is given by

$$G(z) = \exp\left[-\left\{1 + \xi\,\frac{z - \mu}{\sigma}\right\}_{+}^{-1/\xi}\right], \tag{2}$$

where the subscript + indicates that the value inside {⋅} is set to zero if it is less than zero. The GEV distribution function has three parameters, −∞ < *μ* < ∞, *σ* > 0, and −∞ < *ξ* < ∞. The parameter *μ* is a location parameter that linearly adjusts where the overall mass of the GEV distribution function falls, but it is not equivalent to the mean of the GEV distribution function, which is only defined for *ξ* < 1 and is given by *μ* − *σ*[1 − Γ(1 − *ξ*)]/*ξ*, where Γ(⋅) is the gamma function defined for *x* > 0 by $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$, which reduces to (*x* − 1)! when *x* is a positive integer. The scale parameter, *σ*, relates to the dispersion of the distribution function but is not the same as the standard deviation, which is given by $(\sigma/|\xi|)\sqrt{\Gamma(1 - 2\xi) - \Gamma^2(1 - \xi)}$ for *ξ* < 1/2. Finally, *ξ* is the shape parameter, where *ξ* < 0 gives rise to the reverse Weibull distribution function, which has a finite upper bound. The Fréchet distribution function arises when *ξ* > 0; it has a heavy tail, so that the probability of observing increasingly higher values of *Z* decreases at a polynomial rate. Defined by continuity, the case where *ξ* = 0 yields the light-tailed Gumbel distribution function; namely, *G*(*z*) = exp{−exp[−(*z* − *μ*)/*σ*]}.
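To make the role of the {⋅}_{+} truncation and the Gumbel limit concrete, the GEV distribution function can be evaluated directly. The sketch below is a minimal Python illustration under the standard GEV form; the function name `gev_cdf` is an assumption and is not the interface of any package discussed in this paper.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z); the Gumbel form is used when xi == 0."""
    s = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # z lies outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Heavy-tailed case (xi > 0): the support is bounded below at mu - sigma / xi = -20.
print(gev_cdf(-25.0, 30.0, 10.0, 0.2))
# A very small shape parameter should nearly reproduce the Gumbel case.
print(gev_cdf(40.0, 30.0, 10.0, 1e-6), gev_cdf(40.0, 30.0, 10.0, 0.0))
```

The last line illustrates the "defined by continuity" statement: as *ξ* approaches zero, the general form approaches the Gumbel distribution function.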

The GP distribution function is defined for excesses *Y* = *X* − *u*, conditioned on *X* > *u*, with *u* a high threshold, by

$$H(y) = 1 - \left\{1 + \xi\,\frac{y}{\sigma_u}\right\}_{+}^{-1/\xi}, \quad y > 0, \tag{3}$$

where the *u* subscript on the scale parameter emphasizes that *σ*_{u} > 0 depends on the threshold. As mentioned previously, the GP distribution function approximates the tail of the GEV distribution function, and the scale parameter of the GP distribution function is related to the GEV distribution function by *σ*_{u} = *σ* + *ξ*(*u* − *μ*), where *σ* and *μ* are the parameters from the GEV distribution function associated with block maxima over large blocks from the same underlying random variable that gives rise to the GP distribution function.^{1} The shape parameter is the same for both. Clearly, the location parameter *μ* is not involved in the GP distribution function, which has only two parameters. The threshold *u* can be thought of as a surrogate for the location parameter as it effectively takes on the same role.
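The relation *σ*_{u} = *σ* + *ξ*(*u* − *μ*) can be motivated by a short heuristic calculation. The sketch below assumes, as in the usual asymptotic argument, that for large *z* the tail probability Pr{*X* > *z*} is approximately proportional to {1 + *ξ*(*z* − *μ*)/*σ*}^{−1/*ξ*}; the proportionality constant *c* is a symbol introduced here only for illustration.

$$
\begin{aligned}
\Pr\{X > u + y \mid X > u\}
&\approx \frac{c\,\{1 + \xi (u + y - \mu)/\sigma\}^{-1/\xi}}{c\,\{1 + \xi (u - \mu)/\sigma\}^{-1/\xi}} \\
&= \left\{\frac{\sigma + \xi (u - \mu) + \xi y}{\sigma + \xi (u - \mu)}\right\}^{-1/\xi} \\
&= \left\{1 + \frac{\xi y}{\sigma + \xi (u - \mu)}\right\}^{-1/\xi}
 = \left\{1 + \frac{\xi y}{\sigma_u}\right\}^{-1/\xi}.
\end{aligned}
$$

Under this assumption, the conditional excess has a GP survival function with the same shape parameter *ξ* and with scale *σ*_{u} = *σ* + *ξ*(*u* − *μ*).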

The GP distribution function has mean *σ*_{u}/(1 − *ξ*) for *ξ* < 1 and variance *σ*_{u}^{2}/[(1 − *ξ*)^{2}(1 − 2*ξ*)] for *ξ* < 1/2. The shape parameter, *ξ*, once again determines the tail behavior for the GP distribution function. The upper-bounded beta distribution function arises when *ξ* < 0 and the heavy-tailed Pareto distribution function results when *ξ* > 0. The light-tailed exponential distribution function occurs when *ξ* = 0, defined again by continuity.

Comparison of the distribution functions defined in (1), (2), and (3) helps to see how the three are related. The GEV distribution function has the same form as the Poisson distribution function, but where *λ* varies according to the three parameters; that is, the GEV distribution function is a nonhomogeneous Poisson distribution function. The GP distribution function has a similar form as the exponent portion of the GEV distribution function where the threshold replaces the location parameter.

At the beginning of this section, it was suggested that information about a 100-yr event might be of interest. More generally, suppose interest is in a *T*-yr event; that is, one that is exceeded, *on average*, once every *T* years, or equivalently, with probability *p* = 1/*T* in any given year. It should be noted that the probability of occurrence of such an event over a number of years, *M*, is possibly higher than one might think (cf. Gilleland et al. 2017). For example, suppose a home is for sale by a river. A potential buyer might believe that they would live in the home for 25 years. Suppose further that the potential buyer is risk averse to a 100-yr flood event. The probability of such an event over the span of time is, assuming independence and stationarity in time, given by 1 − (1 − *p*)^{M}, which for this example is 1 − (1 − 1/100)^{25} ≈ 22.22%. That is, the probability of having a 100-yr flood event at least once during the 25-yr time frame is more than 20%. In general, it is not known what the *T*-yr event actually is. That is, the potential buyer is risk averse to the 100-yr flood event, but the assumption is that this buyer knows what the level is for the 100-yr event. The extreme-value distribution functions (EVD’s) given by (1), (2), and (3) are particularly well suited for answering this question. Not only are they the only distribution functions with theoretical support for modeling rare events of this magnitude, but information about such events is easily obtained by inverting these equations. Of course, much uncertainty is involved in the estimated return levels when the return period approaches or even exceeds the length of the data record, which is often the case.
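The flood-risk arithmetic in this example is easy to reproduce. The following minimal Python sketch evaluates 1 − (1 − 1/*T*)^{M} under the same independence and stationarity assumptions; the function name is hypothetical.

```python
def prob_at_least_one(T, M):
    """Probability that a T-yr event occurs at least once in M years,
    assuming independent years and a stationary climate."""
    p = 1.0 / T  # annual exceedance probability
    return 1.0 - (1.0 - p) ** M

# The home-buyer example: a 100-yr flood over a 25-yr residence.
risk = prob_at_least_one(T=100, M=25)
print(f"{100 * risk:.2f}%")
```

Over *M* = *T* = 100 years the same function gives a probability of roughly 63%, which underlines that a 100-yr event is far from guaranteed to occur exactly once per century.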

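Suppose a GEV distribution function is fit to a sample *Z*_{1}, …, *Z*_{n}, where *Z*_{i}, *i* = 1, …, *n*, represent yearly maximum water levels. In this case, the quantiles of this GEV distribution function correspond directly to the return levels, *z*_{p}, which are simply the solutions to the equation *G*(*z*_{p}) = 1 − *p* for *G*(⋅) as defined in (2), which yields

$$z_p = \mu - \frac{\sigma}{\xi}\left[1 - \{-\ln(1 - p)\}^{-\xi}\right], \quad \xi \neq 0. \tag{4}$$

For the GP distribution function, the return level must also account for the rate at which the threshold *u* is exceeded. Suppose the rate *ζ*_{u} = Pr{*X* > *u*} can be estimated, then the value *x*_{m} > *u* that is exceeded, on average, once every *m* observations (e.g., years) for the GP distribution function is given by

$$x_m = u + \frac{\sigma_u}{\xi}\left[(m\,\zeta_u)^{\xi} - 1\right], \quad \xi \neq 0. \tag{5}$$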
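Return levels follow from inverting the GEV and GP distribution functions, and the inversion can be verified with a round trip. The sketch below is a minimal Python illustration using the standard closed forms; the function names and the GP parameter values are assumptions, while the GEV values match the simulation parameters used later in this paper (*μ* = 30, *σ* = 10, *ξ* = 0.2).

```python
import math

def gev_return_level(p, mu, sigma, xi):
    """Level z_p solving G(z_p) = 1 - p for the GEV distribution function."""
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)  # Gumbel case
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def gp_return_level(m, u, sigma_u, xi, zeta_u):
    """Level x_m exceeded on average once every m observations, where
    zeta_u = Pr{X > u} is the threshold exceedance rate."""
    if xi == 0.0:
        return u + sigma_u * math.log(m * zeta_u)  # exponential case
    return u + (sigma_u / xi) * ((m * zeta_u) ** xi - 1.0)

# 100-yr return level for the GEV with mu = 30, sigma = 10, xi = 0.2.
z100 = gev_return_level(0.01, 30.0, 10.0, 0.2)
# Round trip: plugging z100 back into G should recover 1 - p = 0.99.
G_z100 = math.exp(-(1.0 + 0.2 * (z100 - 30.0) / 10.0) ** (-1.0 / 0.2))

# Hypothetical GP parameters, for illustration only.
x100 = gp_return_level(100, u=0.0, sigma_u=5.0, xi=0.1, zeta_u=1.0)
print(z100, G_z100, x100)
```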

To utilize these models, however, the parameters must be estimated from data. Because it is the rare events that are of interest, most of the data are irrelevant and are not used in fitting the distribution function. In order for the asymptotic results for the GEV distribution function to hold for block maxima, the blocks over which the maxima are taken must be very long. In practice, year-long blocks are generally sufficient, and also lend themselves nicely to interpretation when considering return levels from Eq. (4). For the peaks-over-threshold (POT) models (i.e., the GP and PP models), a very high threshold must be chosen. In practice, a trade-off is made between a lower threshold, which allows more data to be used in the fit and so yields a lower variance for the estimates, and a threshold high enough that the assumptions are reasonable, which yields a lower bias.

Generally, the POT models make better use of the data as more data points are used in fitting the EVD’s to them. However, dependence in the values above the threshold must be considered in order to achieve accurate estimates of the standard errors for the parameter and return level estimators. When choosing the block maxima (BM) approach over the POT approach, it is possible that some blocks might not have any truly extreme values, which leads to utilizing nonextreme data in the fitting procedure. Conversely, it is also possible to have an extreme value at one time during the block and another more extreme value later in the same block. In such a case, one of the extreme values is discarded and not used in the fitting procedure. These issues are not present in the POT approaches. On the other hand, the BM approach is less likely to have issues with temporal dependence than the POT methods.

The main methods, and those available with “extRemes,” for estimating the EVD parameters include maximum likelihood (ML), generalized (or penalized) maximum likelihood (GML), L-moments, and Bayesian estimation. Some other moment and fast estimates are also utilized in the literature. This paper focuses solely on ML estimation.

### Maximum-likelihood estimation

If *Z*_{1}, …, *Z*_{n} are independent variables that each follow the same GEV distribution function, the log likelihood for the GEV distribution function parameters is given by Eqs. (6)–(8), where *I*_{A}(*x*) = 1 if *x* ∈ *A* and zero otherwise. Equations (6) and (8) are identical apart from the characteristic functions. The ML estimator (MLE) is the combination of parameter values that maximizes the above log-likelihood function.

Usually, this likelihood is written more simply with just Eq. (6) without the characteristic functions but with the caveat that *ξ* ≠ 0 and 1 + *ξ*(*z*_{i} − *μ*)/*σ* > 0 for *i* = 1, …, *n*; the Gumbel case is then given separately. However, the more convoluted form given in Eqs. (6)–(8) emphasizes the fact that the GEV log likelihood involves characteristic functions that depend on the parameters. Consequently, the regularity conditions that assure that the MLE is asymptotically normally distributed, and thus allow for construction of a fairly simple parametric CI, are not met. Büecher and Segers (2017) show that the asymptotic normality of the MLE holds for parametric families that are differentiable in quadratic mean whose supports depend on the parameter; they also show that the GEV family is not differentiable in the quadratic mean unless *ξ* > −1/2. Smith (1985) already showed that if *ξ* > −1/2, the regularity conditions for the MLE to be asymptotically normally distributed will hold. Similar results hold, of course, for the GP likelihood and PP characterization and so are omitted here.
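The dependence of the support on the parameters can be seen directly when the log likelihood is coded. The following minimal Python sketch implements the simpler, commonly written form of the GEV log likelihood (with the Gumbel case handled separately) and returns −∞ whenever an observation falls outside the support; the function name and the data values are assumptions for illustration, and this is not the code used by “extRemes.”

```python
import math

def gev_loglik(z, mu, sigma, xi):
    """GEV log likelihood for observations z.

    Returns -inf if sigma <= 0 or if any observation violates the support
    constraint 1 + xi * (z_i - mu) / sigma > 0.
    """
    if sigma <= 0.0:
        return -math.inf
    total = -len(z) * math.log(sigma)
    for zi in z:
        s = (zi - mu) / sigma
        if xi == 0.0:
            total += -s - math.exp(-s)  # Gumbel contribution
        else:
            t = 1.0 + xi * s
            if t <= 0.0:
                return -math.inf  # observation outside the support
            total += -(1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)
    return total

z = [25.0, 30.0, 42.0, 55.0]  # made-up annual maxima, for illustration only
print(gev_loglik(z, 30.0, 10.0, 0.2))
# With xi = 0.2 the lower bound is mu - sigma / xi = -20, so -25 is impossible.
print(gev_loglik([-25.0], 30.0, 10.0, 0.2))
```

Because the feasible parameter region changes with the data, a numerical optimizer must treat the −∞ values as a hard constraint, which is one practical face of the nonregularity discussed above.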

While the MLE is perfectly valid as an estimator for the EVD parameters, clearly it is beneficial to have an alternative strategy for constructing CI’s. Even if *ξ* > −1/2, which is often the case, when interest is in return levels that exceed the temporal range of the data (e.g., estimating a 100-yr return level with only 20 years of data), the actual distribution functions for such return levels tend to be asymmetric. So, the assumption of approximate normality will not hold. The profile likelihood method is useful for finding CI’s in this context, but it is a difficult procedure to automate (cf. Gilleland and Katz 2016). Therefore, bootstrap methods are appealing for constructing CI’s for EVD parameters and return levels.

## 3. Bootstrap inference for extreme-value distribution functions

PI provides a thorough review of statistical inference as conducted via bootstrap methods, including the various CI’s calculated in what follows. Bootstrapping for extreme values is challenging for a couple of reasons. The most obvious is that if resampling is carried out using only the observed data, then more extreme values than those observed will not be included in any of the fitting procedures. A more subtle reason will be discussed in section 3c.

It is useful to use simulated data with known distributional forms in order to demonstrate the software. The following code shows how to draw a sample of size 100 from a GEV distribution function with parameters *μ* = 30, *σ* = 10, and *ξ* = 0.2, and the result is assigned to an object called zGEV:
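A draw of this kind amounts to inverse-CDF sampling: uniform random numbers are passed through the GEV quantile function. The minimal Python sketch below illustrates the mechanism only and is not the “extRemes” code; the helper name `rgev` and the seed are assumptions, while the parameter values and the object name zGEV follow the text.

```python
import math
import random

def rgev(n, mu, sigma, xi, rng):
    """Draw n GEV variates by inverting G at uniform random numbers."""
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        y = -math.log(u)
        if xi == 0.0:
            out.append(mu - sigma * math.log(y))  # Gumbel case
        else:
            out.append(mu + (sigma / xi) * (y ** (-xi) - 1.0))
    return out

rng = random.Random(2020)  # arbitrary seed (an assumption, for reproducibility)
zGEV = rgev(100, mu=30.0, sigma=10.0, xi=0.2, rng=rng)
print(min(zGEV), max(zGEV))
```

With *ξ* = 0.2 every simulated value must exceed the lower end point *μ* − *σ*/*ξ* = −20, which provides a quick sanity check on the sample.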

Similarly, to draw a random sample of threshold excesses from a GP distribution function:

Drawing from a point process is slightly more complicated because it is necessary to draw from the nonextreme part of the distribution function in addition to the extreme part. The following code is one way to obtain such a sample:

### a. TIB

For the stationary GEV distribution function, Schendel and Thongwichian (2015) performed a simulation test to demonstrate the utility of the TIB approach in comparison to another recommended, nonbootstrap, technique known as the profile-likelihood method. For this special case, they introduced a fast method for employing the TIB, thereby allowing them to perform such a test of the method. To have a flexible method that works even for nonstationary models, the functions available in “extRemes” do not make use of this fast algorithm, and therefore such a test is not possible without resorting to parallel computing, which would still require an excessive amount of resources to implement. The TIB approach using the interpolation method is performed as below. Note that a GEV must first be fit to the data using fevd from “extRemes”:

The above example finds an estimated 1000-yr return level (specified by

The result of the plot command on the last line above is shown in Fig. 1. In the figure, an accurate interpolation method TIB interval should have a black circle near where both the leftmost vertical blue line crosses the top horizontal blue line (for the lower bound estimate) and where the rightmost vertical blue line crosses the bottom horizontal blue line. For this example, the interval appears to be reasonable and the “true” 1000-yr return level is within the 95% TIB CI.

The interpolation method for the TIB CI is not the recommended approach. A better approach is to use the Robbins–Monro algorithm. The following code performs this algorithm on the same simulation as above:

This time the *α*/2 and 1 − *α*/2 the estimated *p* value can be before the algorithm stops, and

Figure 2 shows the result of the plot command for this example. It makes a similar plot as in Fig. 1, but now the upper and lower bounds are estimated separately. Apart from generally being more accurate than the interpolation method, the Robbins–Monro algorithm enables the estimated *α*. With the interpolation method, only an inspection of the resulting plot gives any indication of the accuracy of the resulting interval.

Fig. 2. As in Fig. 1, but using the Robbins–Monro algorithm instead of the interpolation method. In this case, the black circles are replaced with “l” and “u” symbols specifying lower vs upper. Accurate bounds should have a “u” symbol near where the rightmost vertical blue line crosses the lower horizontal blue line and an “l” symbol near where the leftmost vertical blue line crosses the top horizontal blue line.

Citation: Journal of Atmospheric and Oceanic Technology 37, 11; 10.1175/JTECH-D-20-0070.1

It is also possible to apply the TIB method in order to obtain CI’s for the parameters of the GEV. The code below shows how to do so for the shape parameter, which is difficult to estimate but arguably the most important one to pin down. This time, it is necessary to specify a different nuisance parameter because the default is the shape parameter. The which.one argument now specifies the number of the parameter in the order provided by strip below. In this case, the shape parameter is the third parameter:

For the example above, the 95% TIB CI is approximately (−0.09, 0.35), but note that results may vary because of the necessity for making random draws. The interval includes zero, but mostly includes more strongly positive values, and does include the “true” parameter value of 0.2. To change the confidence level, say to 99%, the alpha argument is used. The code below demonstrates how to perform the same analysis as above, but for 99% CI’s:

For this particular example (results not shown), the resulting 99% TIB CI is given by about (−0.18, 0.29), which also includes zero and a much longer interval below zero. The above intervals use the interpolation method. To use the Robbins–Monro algorithm, the following code for a 95% TIB CI can be used:

For one instance of the above code, the estimated achieved confidence level is close to the desired level at 0.024 for the lower bound and 0.98 for the upper. The 95% TIB CI is estimated to be about (0.04, 0.33). Again, individual results will vary. This 95% TIB CI does not include zero as the interpolation method did, and the Robbins–Monro method should be considered the more accurate. Indeed, the “true” shape parameter is 0.2, so ideally zero would not be in the interval.

For the GP distribution function, the same type of procedure can be carried out to obtain TIB CI’s. A GP distribution function must first be fit to the data, and then a similar analysis is carried out. For this example, however, a less ambitious interval for the 100-yr return level is sought. Because of the way the GP data were simulated, i.e., all values are above the threshold of 0, there is effectively only one data point per year; at least in the way

For this example (output not shown), the estimated 95% TIB CI using the Robbins–Monro algorithm gives an estimated achieved confidence level close to the desired one with an interval of about (8.65, 72.06). The “true” value of the return level is about 11.27, which falls inside the interval.

### b. Nonstationary analysis

Many atmospheric applications involve nonstationarities. It is possible to account for nonstationarity by modeling one or more parameters of the EVD as functions of the covariates (cf. Katz et al. 2002; Katz 2013). The following example takes the annual maximum of summer daily minimum temperature (°F) from Phoenix Sky Harbor Airport (Fig. 3; cf. Gilleland and Katz 2016) and fits a stationary GEV distribution function to the data, and then fits a nonstationary GEV distribution function with a linear trend in the location parameter given by *μ*(year) = *μ*_{0} + *μ*_{1} × year:^{2}

Fig. 3. Annual maximum of summer daily minimum temperatures (°F) at Phoenix Sky Harbor airport.

The function *p* value ≈ 2.19 × 10^{−13}) suggesting that the trend is important. The quantile–quantile (QQ) plots for each fit (not shown) suggest that the assumptions for the model with the linear trend are reasonable, where they might not be for the stationary model.

The penultimate line of code above sets up a special matrix that allows for finding “effective” return levels. The value of 91 corresponds to 1991, one year later than the data range which ends at 1990, and 1900 + 120 = 2020 to give the value for the year 2020.
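With a linear trend in the location parameter, the “effective” return level for a given year is the usual GEV return level evaluated at that year's location parameter. The minimal Python sketch below illustrates this for the covariate values 91 and 120 mentioned above; the parameter values are hypothetical, not the fitted Phoenix estimates, and the function name is an assumption.

```python
import math

def effective_return_level(p, year, mu0, mu1, sigma, xi):
    """GEV return level with location mu(year) = mu0 + mu1 * year (xi != 0 assumed)."""
    mu_t = mu0 + mu1 * year
    y = -math.log(1.0 - p)
    return mu_t - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical parameter values, chosen only to illustrate the calculation.
mu0, mu1, sigma, xi = 70.0, 0.2, 2.0, -0.2

# The covariate is the year minus 1900: 91 -> 1991 and 120 -> 2020.
rl_1991 = effective_return_level(0.01, 91, mu0, mu1, sigma, xi)
rl_2020 = effective_return_level(0.01, 120, mu0, mu1, sigma, xi)
print(rl_1991, rl_2020, rl_2020 - rl_1991)
```

Because only the location parameter depends on the covariate in this model, the difference between the two effective return levels equals *μ*_{1} × (120 − 91) regardless of the return period.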

Next, it is desired to find the 95% TIB CI for the nonstationary model for the “effective” 100-yr return level for the year 1991. They can be found for 2020 using an analogous approach, but for

While the above code finds a reasonable lower bound for the “effective” 100-yr return level for 1991 of about 93°F, it does not find an upper bound. The following code shows how to obtain the parametric CI that assumes normality for the MLE of the 100-yr “effective” return level, which is not a reasonable assumption. Nevertheless, the lower bound agrees pretty closely with the lower bound found by the TIB method at about 93.5°F. The upper bound estimate is given by nearly 96°F:

Given the difficulties with the TIB method, a viable alternative is to perform a regular bootstrap procedure, but using parametric resampling from the fitted GEV distribution function. To perform this type of resampling, the pbooter function can be used, which requires two functions to be defined: one to calculate the statistics of interest and another to simulate data from the fitted nonstationary distribution function. For the latter, “rextRemes” provides a useful shortcut. The first function must have a minimum of two arguments: data and

The result from one implementation of the above code gives a 95th-percentile bootstrap CI of about (93.24°, 96.31°F) for the 100-yr “effective” return level for the year 1991 and about (97.88°, 102.28°F) for the 100-yr “effective” return level for the year 2020. For this example, the CI for the year 1991 “effective” return level is very close to the one found by the normal approximation (classical) interval, which suggests that the distribution function for this 100-yr return level is at least approximately normal.
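The structure of a parametric bootstrap (fit, simulate from the fit, refit, recompute the statistic) can be illustrated with a deliberately simpler model than the nonstationary GEV. The Python sketch below swaps in exponentially distributed excesses (the GP distribution function with *ξ* = 0), whose MLE is just the sample mean, so that no numerical optimizer is needed. All names, the seed, and the simulated data are assumptions; this is not the pbooter interface.

```python
import math
import random

def fit_exponential(y):
    """MLE of the exponential scale parameter: the sample mean."""
    return sum(y) / len(y)

def return_level(scale, m):
    """m-observation return level for exponential excesses over a threshold of 0,
    assuming every observation is an exceedance."""
    return scale * math.log(m)

def parametric_boot_ci(y, m, B, alpha, rng):
    """Percentile CI for the return level via parametric resampling."""
    scale_hat = fit_exponential(y)
    stats = []
    for _ in range(B):
        # Simulate a new sample of the same size from the FITTED model ...
        y_star = [rng.expovariate(1.0 / scale_hat) for _ in range(len(y))]
        # ... then refit and recompute the statistic of interest.
        stats.append(return_level(fit_exponential(y_star), m))
    stats.sort()
    lo = stats[int(math.floor((alpha / 2) * (B - 1)))]
    hi = stats[int(math.ceil((1 - alpha / 2) * (B - 1)))]
    return return_level(scale_hat, m), lo, hi

rng = random.Random(1)  # arbitrary seed
y = [rng.expovariate(1.0 / 5.0) for _ in range(50)]  # made-up excesses, scale 5
est, lo, hi = parametric_boot_ci(y, m=100, B=1000, alpha=0.05, rng=rng)
print(est, lo, hi)
```

Because each resample is simulated from the fitted distribution function rather than drawn from the observed values, the bootstrap samples can contain values more extreme than any in the data, which is the main appeal of the parametric approach for extremes.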

Other methods for communicating risk for nonstationary extreme values are nicely summarized in Cooley (2013). These methods are outside the scope of the present text, but may be very useful. It is hoped that the above information about how to apply the parametric bootstrap can still be useful if other more advanced techniques are preferred.

### c. The m < n bootstrap

It is well known that the asymptotic results that support bootstrap sampling as a valid method for hypothesis testing and CI construction fail in the case of heavy-tail data. As mentioned in PI, the main assumption for the bootstrap method to be valid is that the relationship between *θ*. Importantly, their scaled differences *D*_{n}, respectively, and *G* is their limiting distribution function. Extreme values, in particular, can be problematic for bootstrap resampling because of the heavy-tail case, whose asymptotics require the bootstrap sample size *m* → ∞ but also that *m*/*n* → 0 as *n* → ∞ (e.g.,

For this section, three random samples are drawn from a heavy-tail distribution function. Namely, the GEV distribution function with *μ* = 0, *σ* = 1, and *ξ* = 0.1, 0.5, and 1.5, respectively. The following R code is used to make the simulations:

Figure 4 displays the histograms for each simulation. The first simulation has a heavy tail, but it is not as “heavy” as the other two cases. That is, the GEV distribution function has a heavy tail when the shape parameter is greater than zero, and the distribution function’s tail decays more slowly as the value of this parameter increases. The “true” mean can be derived for each of these distribution functions, and is given by [Γ(1 − 0.1) − 1]/0.1 ≈ 0.69 and [Γ(1 − 0.5) − 1]/0.5 ≈ 1.54 for the first two cases; it is undefined for the third because the mean does not exist whenever *ξ* ≥ 1.
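The theoretical means quoted here can be checked with the gamma function from a standard math library. The following minimal Python sketch uses the GEV mean formula *μ* + *σ*[Γ(1 − *ξ*) − 1]/*ξ*; the function name and the choice to return `None` when the mean does not exist are assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant (Gumbel mean)

def gev_mean(mu, sigma, xi):
    """Mean of the GEV distribution function, or None when it does not exist."""
    if xi >= 1.0:
        return None  # the mean is not finite for xi >= 1
    if xi == 0.0:
        return mu + sigma * EULER_GAMMA
    return mu + sigma * (math.gamma(1.0 - xi) - 1.0) / xi

for xi in (0.1, 0.5, 1.5):
    print(xi, gev_mean(0.0, 1.0, xi))
```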

Fig. 4. Histograms of simulated heavy-tail samples from GEV distribution functions with location parameters equal to zero, scale parameters equal to unity, and shape parameters of 0.1, 0.5, and 1.5, respectively. The theoretical mean for each simulation is approximately 0.69, 1.54, and undefined, respectively. The means for these samples are approximately 0.64, 1.45, and 16.24.

To demonstrate how to perform an *m* < *n* bootstrap, suppose interest is in finding 95% CI’s for the population mean of each of these samples. The sample estimate of the mean for the first two cases is fairly close to the population means given above at about 0.64 and 1.45. While the population mean does not exist for the third sample, given a sample of real values, the *sample* mean can always be estimated, and it is found to be about 16.24 here. First, a function is needed to calculate the statistic of interest which must take the arguments data and

Next, the bootstrap samples are made with the following commands. For each simulation, both the *m* = *n* and

Note that it is the

Table 1 displays the results of the above command. Despite that the seeds are set so that the reader can obtain the same original samples, no seed is set in these bootstrap results, so results may vary from what is displayed in the table. It is immediately clear that the *m* < *n* with *m* = *n* counterpart, as should be expected because of the smaller resample sizes. The estimated bias is also larger in each case.

Table 1. The 95th-percentile bootstrap CI’s for the three heavy-tail simulations from the GEV distribution function with *μ* = 0 and *σ* = 1.

While there is no remedy for the fact that the undefined-mean situation cannot be discerned, the CI’s are so wide that they at least provide a hint that something might be awry. If the GEV distribution function is suspected, then it can be fit to the data and inferences about its shape parameter would then reveal this possibility.^{3}
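The mechanics of the *m* < *n* bootstrap differ from the usual iid bootstrap only in the size of each resample. The Python sketch below is a minimal illustration for the sample mean of a heavy-tailed GEV sample; the sample size *n* = 100, the choice *m* = ⌊√*n*⌋, the seed, and all function names are assumptions made here and are not taken from the analysis reported in Table 1.

```python
import math
import random

def rgev(n, mu, sigma, xi, rng):
    """Inverse-CDF draws from a GEV distribution function (xi != 0 assumed)."""
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        draws.append(mu + (sigma / xi) * ((-math.log(u)) ** (-xi) - 1.0))
    return draws

def boot_mean_ci(x, m, B, alpha, rng):
    """Percentile CI for the mean using B resamples of size m drawn with replacement."""
    means = []
    for _ in range(B):
        resample = [rng.choice(x) for _ in range(m)]
        means.append(sum(resample) / m)
    means.sort()
    lo = means[int(math.floor((alpha / 2) * (B - 1)))]
    hi = means[int(math.ceil((1 - alpha / 2) * (B - 1)))]
    return lo, hi

rng = random.Random(42)  # arbitrary seed
x = rgev(100, mu=0.0, sigma=1.0, xi=0.5, rng=rng)
n = len(x)
m = int(math.floor(math.sqrt(n)))  # assumed rule of thumb so that m/n is small

lo_n, hi_n = boot_mean_ci(x, n, 2000, 0.05, rng)  # usual m = n bootstrap
lo_m, hi_m = boot_mean_ci(x, m, 2000, 0.05, rng)  # m < n bootstrap
print((lo_n, hi_n), (lo_m, hi_m))
```

As in the results discussed above, the *m* < *n* interval is wider than its *m* = *n* counterpart because each resampled mean is based on fewer values.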

## 4. Discussion and conclusions

This paper demonstrates how to use new bootstrap functions available in the R (R Core Team 2017) package “extRemes” (versions ≥ 2.0–12) for extreme-value analysis; the functions in “extRemes” are wrapper functions to bootstrap code from the “distillery” package. Bootstrap methods have been shown to be highly accurate in situations where usual assumptions for more standard intervals may not apply. However, the accuracy depends on utilizing the correct bootstrap methodology for the random sample to which it is applied.

While the “boot” package (Davison and Hinkley 1997; Canty and Ripley 2017) in R provides excellent utility for performing bootstrap resampling and estimating CI’s, the functions in “distillery” make certain operations easier; some of which are not possible with “boot.” For example, PI and this paper demonstrate how to perform a test-inversion bootstrap (TIB), which is currently not available in “boot,” and an *m* < *n* bootstrap, which is less straightforward to do with “boot.”

TIB interval results agree with previous works about their utility for analyzing extreme-value distribution functions, but it is found here that these methods may not be stable when fitting more complex, for example nonstationary, distribution functions. They are also fairly difficult to automate as they often need to be rerun in order to find function arguments that will allow the procedure to converge on an appropriately sized CI. Nevertheless, they represent a theoretically appealing approach, so their availability as a general tool in the “distillery” package might be useful to some research efforts.

Parametric intervals are a good alternative for extreme-value applications in part because it is possible to justifiably simulate values that are more extreme than those observed in the data. Here, it is shown how to apply this approach for the case of nonstationary peak-over-threshold data, which was previously unavailable in “extRemes.”

## Acknowledgments

Support for this manuscript was provided by the National Science Foundation (NSF) through the Regional Climate Uncertainty Program (RCUP) at the National Center for Atmospheric Research (NCAR). NCAR is sponsored by NSF and managed by the University Corporation for Atmospheric Research.

## REFERENCES

Arcones, M. A., and E. Giné, 1989: The bootstrap of the mean with arbitrary bootstrap sample. *Ann. Inst. Henri Poincaré*, **25**, 457–481.

Arcones, M. A., and E. Giné, 1991: Additions and corrections to “The bootstrap of the mean with arbitrary bootstrap sample.” *Ann. Inst. Henri Poincaré*, **27**, 583–595.

Athreya, K. B., 1987a: Bootstrap of the mean in the infinite variance case. *Proc. First World Congress of the Bernoulli Society*, Utrecht, Netherlands, Bernoulli Society, 95–98.

Athreya, K. B., 1987b: Bootstrap of the mean in the infinite variance case. *Ann. Stat.*, **15**, 724–731, https://doi.org/10.1214/aos/1176350371.

Bickel, P. J., and D. A. Freedman, 1981: Some asymptotic theory for the bootstrap. *Ann. Stat.*, **9**, 1196–1217, https://doi.org/10.1214/aos/1176345637.

Bickel, P. J., F. Götze, and W. R. van Zwet, 1997: Resampling fewer than *n* observations: Gains, losses, and remedies for losses. *Stat. Sin.*, **7**, 1–31.

Bücher, A., and J. Segers, 2017: On the maximum likelihood estimator for the generalized extreme-value distribution. *Extremes*, **20**, 839–872, https://doi.org/10.1007/s10687-017-0292-6.

Canty, A., and B. Ripley, 2017: boot: Bootstrap R (S-Plus) Functions, version 1.3-20. R package, http://statwww.epfl.ch/davison/BMA/.

Cooley, D., 2013: Return periods and return levels under climate change. *Extremes in a Changing Climate: Detection, Analysis and Uncertainty*, Springer, 97–114.

Davison, A., and D. Hinkley, 1997: *Bootstrap Methods and Their Application*. Cambridge University Press, 582 pp.

Deheuvels, P., D. M. Mason, and G. R. Shorack, 1993: Some results on the influence of extremes on the bootstrap. *Ann. Inst. Henri Poincaré Probab. Stat.*, **29**, 83–103.

Fawcett, L., and D. Walshaw, 2012: Estimating return levels from serially dependent extremes. *Environmetrics*, **23**, 272–283, https://doi.org/10.1002/env.2133.

Feigin, P., and S. I. Resnick, 1997: Linear programming estimators and bootstrapping for heavy-tailed phenomena. *Adv. Appl. Probab.*, **29**, 759–805, https://doi.org/10.2307/1428085.

Fukuchi, J.-I., 1994: Bootstrapping extremes of random variables. Ph.D. thesis, Iowa State University, 101 pp.

Gilleland, E., 2017: distillery: Method Functions for Confidence Intervals and to Distill Information from an Object, version 1.0-4. R package, https://www.ral.ucar.edu/staff/ericg.

Gilleland, E., 2020: Bootstrap methods for statistical inference. Part I: Comparative forecast verification for continuous variables. *J. Atmos. Oceanic Technol.*, **36**, 2117–2134, https://doi.org/10.1175/JTECH-D-20-0069.1.

Gilleland, E., and R. W. Katz, 2016: extRemes 2.0: An extreme value analysis package in R. *J. Stat. Software*, **72**, 1–39, https://doi.org/10.18637/jss.v072.i08.

Gilleland, E., R. W. Katz, and P. Naveau, 2017: Quantifying the risk of extreme events under climate change. *Chance*, **30**, 30–36, https://doi.org/10.1080/09332480.2017.1406757.

Giné, E., and J. Zinn, 1989: Necessary conditions for the bootstrap of the mean. *Ann. Stat.*, **17**, 684–691, https://doi.org/10.1214/aos/1176347134.

Hall, P., 1990: Asymptotic properties of the bootstrap for heavy-tailed distributions. *Ann. Probab.*, **18**, 1342–1360, https://doi.org/10.1214/aop/1176990748.

Heffernan, J. E., and J. A. Tawn, 2004: A conditional approach for multivariate extreme values (with discussion). *J. Roy. Stat. Soc.*, **66B**, 497–546, https://doi.org/10.1111/j.1467-9868.2004.02050.x.

Katz, R. W., 2013: Statistical methods for non-stationary extremes. *Extremes in a Changing Climate: Detection, Analysis and Uncertainty*, Springer, 15–37.

Katz, R. W., M. B. Parlange, and P. Naveau, 2002: Statistics of extremes in hydrology. *Adv. Water Resour.*, **25**, 1287–1304, https://doi.org/10.1016/S0309-1708(02)00056-8.

Kinateder, J. G., 1992: An invariance principle applicable to the bootstrap. *Exploring the Limits of Bootstrap*, Wiley Series in Probability and Mathematical Statistics, Wiley, 157–181.

Knight, K., 1989: On the bootstrap of the sample mean in the infinite variance case. *Ann. Stat.*, **17**, 1168–1175, https://doi.org/10.1214/aos/1176347262.

Kyselý, J., 2002: Comparison of extremes in GCM-simulated, downscaled and observed central-European temperature series. *Climate Res.*, **20**, 211–222, https://doi.org/10.3354/cr020211.

Lee, S., 1999: On a class of *m* out of *n* bootstrap confidence intervals. *J. Roy. Stat. Soc.*, **61B**, 901–911, https://doi.org/10.1111/1467-9868.00209.

LePage, R., 1992: Bootstrapping signs. *Exploring the Limits of Bootstrap*, Wiley Series in Probability and Mathematical Statistics, Wiley, 215–224.

R Core Team, 2017: R: A language and environment for statistical computing. R Foundation for Statistical Computing, https://www.R-project.org/.

Resnick, S. I., 2007: *Heavy-Tail Phenomena: Probabilistic and Statistical Modeling*. Springer Series in Operations Research and Financial Engineering, Springer, 404 pp.

Schendel, T., and R. Thongwichian, 2015: Flood frequency analysis: Confidence interval estimation by test inversion bootstrapping. *Adv. Water Resour.*, **83**, 1–9, https://doi.org/10.1016/j.advwatres.2015.05.004.

Schendel, T., and R. Thongwichian, 2017: Confidence intervals for return levels for the peaks-over-threshold approach. *Adv. Water Resour.*, **99**, 53–59, https://doi.org/10.1016/j.advwatres.2016.11.011.

Shao, J., and T. Dongsheng, 1995: *The Jackknife and the Bootstrap*. Springer, 123 pp.

Smith, R. L., 1985: Maximum likelihood estimation in a class of nonregular cases. *Biometrika*, **72**, 67–90, https://doi.org/10.1093/biomet/72.1.67.

^{1}

The GP distribution is a *conditional* distribution with the condition that *X* > *u*. Therefore, it does not have a location parameter.

^{2}

When incorporating covariate information into the parameters of the GEV distribution function, it is important to first allow the location parameter to vary. If the inclusion of the covariate term is found to be significant, then inclusion of covariates in the scale parameter can be tested. Generally, it is desirable not to include covariates in the shape parameter, but if there is reason to do so, then they should be included only after including them in the scale parameter. This issue applies to any location-scale family of distributions and not just to extreme-value distribution functions. Consider, for example, a normal distribution, where the mean is also a location parameter and the standard deviation is also a scale parameter. Because the standard deviation involves deviations about the mean, it follows that incorrect specification of the mean (e.g., ignoring a trend in the mean) will be problematic for estimating the standard deviation. This issue is related to one that arises in polynomial regression, where it is well known that fitting a second-order polynomial without any linear term is problematic.

^{3}

Because the extreme-value distributions have three types of tail behavior, one of which is the heavy-tailed case, it is problematic to resample from the data without accounting for the uncertainty in the type of tail. The parametric bootstrap avoids this issue.