## 1. Introduction

Large hail [≥25 mm (1 in.)] can produce significant damage to property and agriculture. However, little is known about the incidence of, or the hazard posed by, hail of the largest diameters. Large hail is the greatest contributor to insured losses from thunderstorms both in the United States and globally (Gunturi and Tippett 2017), producing cumulative and single-event losses that often exceed $1 billion [U.S. dollars (USD); Changnon (2008); Sander et al. (2013); Munich RE (2015)]. Cumulative losses typically arise from one or several days of damaging hail events of smaller magnitude (e.g., St. Louis, Missouri, 2012, USD $1.6 billion total) or from impacts on a number of rural centers, in addition to agricultural losses. Large catastrophic single-event losses typically occur when a larger urban center is struck by hail at or exceeding golf ball–sized diameter [45 mm (1.75 in.)], the size at which damage to structures, windows, and vehicles becomes more frequent (Brown et al. 2015). Recent examples of such catastrophic hailstorms include a USD $4 billion hail event in Phoenix, Arizona [100-mm (4 in.) maximum-diameter hailstones], a USD $900 million hail event impacting Dallas–Fort Worth, Texas, in 2012 [70–89-mm (2.75–3.5 in.) stones; Brown et al. (2015)], and two hailstorms in Texas during the spring of 2016, including one in San Antonio, that together produced USD $4.7 billion in hail losses (Swiss RE 2017). To understand the hazard and damage potential posed by large hail events, several important quantities need to be explored. The likelihood of hail occurrence at a given location provides some guidance in determining this hazard (e.g., Allen and Tippett 2015; Allen et al. 2015).
However, it is not only the likelihood of occurrence but also the size, shape, density, impact velocity, and spatial extent of the stones that determine the scale and nature of the damage (Changnon 1966; Morgan and Towery 1975; Changnon 1977; Nelson and Young 1979; Cox and Armstrong 1981; Cheng et al. 1985; Sánchez et al. 1996; Heymsfield et al. 2014; Brown et al. 2015). These factors are an important component of the potential for economic losses to agriculture and property. The significance of large hail to the United States motivates an analysis of just how large hailstones can get over the country, leveraging both climatology and extrapolation of the likelihood of large hail.

To explore the spatial risk, occurrence, and magnitude of hail, previous work in the United States has leveraged a mixture of insurance data, National Weather Service spotter observations, field campaigns, and weather station data (e.g., Changnon 1977; Cox and Armstrong 1981). The main challenge for many of these studies was the limited spatial coverage of hail observations, which generally provided insufficient resolution to determine hail swaths and the corresponding loss characteristics (Morgan and Towery 1975; Nelson and Young 1979). Obtaining a picture of the hazard over larger parts of the continental United States is difficult, as insurance data are often nonspecific in their spatial extent or combined with other hazards (Changnon 1999; Brown et al. 2015), and spotter observations and field campaigns are relatively few and far between (Strong and Lozowski 1977; Ortega et al. 2009; Blair and Leighton 2012; Heymsfield et al. 2014; Blair et al. 2017). In contrast, the abundance of hail reports for the United States in *Storm Data* (Schaefer and Edwards 1999) has led to a number of climatologies exploring hail occurrence and the hazard posed (Kelly et al. 1985; Changnon 1999; Changnon and Changnon 2000; Schaefer et al. 2004; Doswell et al. 2005; Changnon 2008; Allen et al. 2015; Allen and Tippett 2015). Although these efforts have contributed greatly to our understanding of where hail occurs, temporal and spatial limitations of hail size observations have left the hazard posed to property by large hail unclear (Doswell et al. 2005; Allen and Tippett 2015). Examples of these problems include the tendency for reports to be clustered and duplicated in more heavily populated areas, the concentration of early reports in the Great Plains, and sensitivity to quantization as hail size approaches the arbitrary criteria used to define severe thunderstorms (Schaefer et al. 2004; Doswell et al. 2005; Allen and Tippett 2015).

Considering hail size, rather than occurrence, introduces several further challenges. Arbitrary methods of hail size measurement and a text-based observation system that relies on observers' estimates make subsequent analyses of size and occurrence distributions more difficult (Allen and Tippett 2015). Hail size observations are heavily quantized because sizes are recorded by comparison with reference objects rather than measured directly (Doswell et al. 2005; Blair and Leighton 2012; Blair et al. 2017). This practice leads to both over- and underreporting of the maximum size, as reported sizes are skewed toward those of the reference objects (Heymsfield et al. 2014). It is also questionable whether the largest point observation of hail size correctly reflects the largest hail that occurs in a storm (Bardsley 1990; Blair and Leighton 2012; Blair et al. 2017 and references therein). This problem is exacerbated by the rarity of large hail (Fraile et al. 1992) and by the tendency of storms to produce either a few large stones or a large volume of hail, but not both (Cheng et al. 1985). A further, nonobservational complication is that the importance of hail size differs among economic sectors. To agriculture, hailstones of 12.5 mm (0.5 in.) or larger can be extremely damaging (Changnon 1977; McMaster 1999; Doswell 2001). In contrast, for structures or vehicles, hailstones of 38–50 mm (1.5–2.0 in.) or greater are typically necessary to cause large amounts of damage (Cox and Armstrong 1981; Heymsfield et al. 2014; Brown et al. 2015; Allen and Tippett 2015 and references therein). Thus, estimating the hazard posed by larger hail is challenging, and the result depends heavily on which sector is exposed to the hazard.

In this paper, we focus on the likelihood of hail exceeding the U.S. severe thunderstorm criterion [25 mm (1 in.)]. In particular, we explore the characteristics of the larger-diameter hail that produces the greatest damage to infrastructure and property, and we statistically estimate its probability of occurrence. We do this by applying extreme value theory to hail size observations. Extreme value theory (Frechet 1927; Fisher and Tippett 1928; Gumbel 1958) has changed the way engineers and scientists quantify the hazard associated with rare but extreme events. In particular, the generalized extreme value (GEV) distribution (Jenkinson 1955) has seen widespread application in fields such as hydrology (e.g., rainfall extremes, streamflow), extreme wind speeds, finance, and, more generally, the earth and atmospheric sciences [for more complete reviews, see Palutikof et al. (1999) and Coles (2001)]. However, this approach has rarely been applied to hail because of limited data availability and records too short to estimate return sizes (Cox and Armstrong 1981; Smith and Waldvogel 1989; Bardsley 1990; Fraile et al. 2003). This study leverages the growing temporal extent and quality of the hail observation dataset to estimate the likelihood of given hail sizes using the Gumbel distribution.

The paper is structured as follows. Section 2 describes the hail observations dataset and the selection of the Gumbel distribution as an appropriate distribution to model extreme hail sizes. Section 3 outlines the characteristics of U.S. hail size data and approaches for mitigating the limitations of the data. Section 4 describes the fitted extreme value model developed from these data, while section 5 discusses the estimated return intervals for hail of various sizes and the stability of the fitting approach. In section 6, we interpret these results within the context of providing an analysis of the hazard posed by hail over the United States.

## 2. Datasets and approach

### a. U.S. hail observations

U.S. hail reports were taken from the National Centers for Environmental Information archive (Schaefer and Edwards 1999) for the period 1979–2013. While these data are available for a longer period (1955–2015), changes in the reporting of events affect the dataset more strongly between 1955 and 1979, and there are several years with no reported hail for many locations (Allen and Tippett 2015). Hail reports were gridded onto a 1° × 1° grid to smooth the noisy observational dataset, which otherwise shows large point-to-point variations over the domain considered, although a higher resolution might be appropriate in some locations. Reports were gridded over 3-h periods (e.g., 0000–0300 and 0300–0600 UTC), retaining the largest hail reported in each grid box during each 3-h period. This aggregation prevents repeated inclusion of reports from a single thunderstorm, limits biases arising from the higher reporting frequency over cities, and reduces the limitations associated with the small spatial extent of large hail and sporadic report data. Although the dataset is likely the longest and most complete national hail record (Allen and Tippett 2015), it has significant nonmeteorological inhomogeneities, as described above, and gridded or point results should therefore be interpreted carefully.
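The aggregation step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the report tuple layout `(time_utc, lat, lon, size_mm)`, the helper names `grid_3hourly_max` and `annual_maxima`, and the choice to index each 1° box by the floor of its latitude and longitude are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (time in UTC, latitude, longitude, hail size in mm)
Report = Tuple[datetime, float, float, float]


def grid_3hourly_max(reports: Iterable[Report]) -> dict:
    """Keep the largest hail size per 1 deg x 1 deg box per 3-h UTC window."""
    best = {}
    for t, lat, lon, size in reports:
        cell = (math.floor(lat), math.floor(lon))  # box indexed by its SW corner
        window = (t.date(), t.hour // 3)           # 0000-0259 UTC -> 0, 0300-0559 -> 1, ...
        key = (cell, window)
        best[key] = max(best.get(key, 0.0), size)  # one value per storm window, not per report
    return best


def annual_maxima(gridded: dict) -> Dict[tuple, Dict[int, float]]:
    """Collapse the 3-hourly box maxima to one maximum per box and year."""
    out: Dict[tuple, Dict[int, float]] = defaultdict(dict)
    for (cell, (day, _window)), size in gridded.items():
        out[cell][day.year] = max(out[cell].get(day.year, 0.0), size)
    return out
```

A report at exactly 0300 UTC falls in the second window here; how boundary times were handled in the original processing is not stated.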

### b. Modeling extremes: Generalized extreme value (GEV) distribution

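The GEV distribution has the cumulative distribution function

$$F(x) = \exp\left\{-\left[1 - k\left(\frac{x - \mu}{\sigma}\right)\right]^{1/k}\right\}, \qquad 1 - k\left(\frac{x - \mu}{\sigma}\right) > 0, \tag{1}$$

where *k*, *σ*, and *μ* are known as the shape, scale, and location parameters, respectively. For *k* < 0 and *k* > 0, the distributions are, respectively, Frechet (EV2) and Weibull (EV3).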
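A minimal Python sketch of the GEV CDF in the Hosking et al. (1985) sign convention, F(x) = exp{−[1 − k(x − μ)/σ]^{1/k}}, may help make the three regimes concrete. The function name `gev_cdf` is illustrative. Sign conventions differ between sources: Coles (2001) writes the shape parameter as ξ = −k, whereas `scipy.stats.genextreme` uses a shape `c` with the same sign as *k* here, so fitted shape values should be compared with care.

```python
import math


def gev_cdf(x: float, mu: float, sigma: float, k: float) -> float:
    """GEV CDF in the Hosking et al. (1985) convention.

    k < 0: Frechet (EV2, heavy upper tail, bounded below)
    k > 0: Weibull (EV3, bounded above)
    k = 0: Gumbel (EV1), the limit of the k != 0 form
    """
    if sigma <= 0:
        raise ValueError("scale must be positive")
    z = (x - mu) / sigma
    if k == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 - k * z
    if t <= 0.0:
        # Outside the support: above the upper bound (k > 0) or below the lower bound (k < 0)
        return 1.0 if k > 0 else 0.0
    return math.exp(-t ** (1.0 / k))
```

As *k* shrinks toward zero, `gev_cdf` converges to the Gumbel form used in the rest of this study.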

The three EV limiting patterns of behavior depend on the type of distribution from which the maxima (or minima) are extracted. Since these parent distributions are often unknown, the flexibility of the GEV is particularly appealing, allowing all three parameters (including the shape parameter *k*) to vary. This flexibility also has drawbacks, as limited data make the parameters (in particular *k*) difficult to estimate (e.g., Hosking et al. 1985; Martins and Stedinger 2000). Common applications of the Weibull distribution include analyses of tornado intensity and wind speed (e.g., Pavia and O’Brien 1986; Dotzek et al. 2003). Typical Frechet applications have included, among many others, rainfall maxima and streamflow data (Coles 2001).

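In this study, we set *k* = 0 and consider the simplest Gumbel (type I) distribution (Hosking et al. 1985). Thus, in the type I case (the limit *k* → 0), Eq. (1) simplifies to

$$F(x) = \exp\left[-\exp\left(-\frac{x - \mu}{\sigma}\right)\right]. \tag{2}$$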

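The location parameter *μ* summarizes the location, or shift, of the body of extremes (in this case, the typical annual maximum hail size; strictly, *μ* is the mode of the Gumbel distribution, whose mean is *μ* + *γσ*, with *γ* ≈ 0.5772 the Euler–Mascheroni constant), while the scale parameter *σ* describes its statistical dispersion (the interannual variability of the annual maximum hail size).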
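Return levels follow directly from inverting the Gumbel CDF F(x) = exp[−exp(−(x − μ)/σ)]: the T-yr return level x_T satisfies F(x_T) = 1 − 1/T, so x_T = μ − σ ln[−ln(1 − 1/T)]. The sketch below, with hypothetical function names, also includes the distribution mean μ + γσ mentioned above.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_cdf(x: float, mu: float, sigma: float) -> float:
    """Probability that the annual maximum hail size is <= x."""
    return math.exp(-math.exp(-(x - mu) / sigma))


def return_level(T: float, mu: float, sigma: float) -> float:
    """Size equaled or exceeded on average once every T years: F(x_T) = 1 - 1/T."""
    if T <= 1:
        raise ValueError("return period must exceed 1 yr")
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / T))


def return_period(x: float, mu: float, sigma: float) -> float:
    """Inverse of return_level: 1 / P(annual maximum > x)."""
    return 1.0 / (1.0 - gumbel_cdf(x, mu, sigma))


def gumbel_mean(mu: float, sigma: float) -> float:
    """Mean annual maximum implied by the fit (mu itself is the mode)."""
    return mu + EULER_GAMMA * sigma
```

For example, with hypothetical parameters μ = 50 mm and σ = 15 mm, `return_level(10, 50, 15)` is about 84 mm.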

Typical estimation procedures are maximum likelihood estimation (MLE), L-moments [closely related to probability weighted moments (PWMs)], and more recent hybrid methods such as generalized maximum likelihood (GMLE) and generalized PWM (GPWM), with the latter three performing better with small samples (Hosking et al. 1985; Martins and Stedinger 2000; Coles 2001).

For the purposes of this investigation, we considered a Gumbel (type I) distribution with the MLE and L-moments estimation methods. This decision was based on testing the value of the shape parameter over the continent, which revealed only small departures from zero and nonsignificant likelihood-ratio tests at all but seven grid points over the continental United States (not shown). Combined with the difficulty of fitting three-parameter models, this suggested that the Gumbel approach was preferable given the characteristics of the data. Both MLE and L-moments approaches have been applied to hail over limited areas in the past (Cox and Armstrong 1981; Smith and Waldvogel 1989; Bardsley 1990; Fraile et al. 2003). We focus here on gridded annual maxima, which show a weaker spurious temporal trend than higher-frequency data (Allen and Tippett 2015). The Gumbel estimation technique assumes that the data do not exhibit a trend; otherwise, the trend must be accounted for separately, so this aspect of the data record was explored. Where data are missing, or fewer than 30 years of observations are available, the model is not fitted, as such cases were found to produce overly wide confidence intervals, particularly for long return periods.
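As an illustration of the MLE option, the two-parameter Gumbel likelihood can be maximized without a general-purpose optimizer, because the location parameter has a closed form once σ is known. The sketch below uses only the Python standard library; it is not the authors' code, the helper names are hypothetical, and the 30-yr threshold simply mirrors the rule stated above. In practice, a library routine such as `scipy.stats.gumbel_r.fit` would serve the same purpose.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def fit_gumbel_mle(x: Sequence[float], tol: float = 1e-10,
                   max_iter: int = 200) -> Tuple[float, float]:
    """Maximum likelihood (mu, sigma) for a Gumbel sample.

    Solves the profile score equation for sigma by bisection:
        sigma = mean(x) - sum(x_i w_i) / sum(w_i),  w_i = exp(-x_i / sigma),
    then mu = -sigma * log(mean(w_i)). Values are shifted by min(x) so the
    weights never overflow (the shift cancels in the ratio).
    """
    x = list(x)
    if len(x) < 2 or min(x) == max(x):
        raise ValueError("need at least two distinct values")
    xbar, xmin = statistics.fmean(x), min(x)

    def weights(s: float):
        return [math.exp(-(v - xmin) / s) for v in x]

    def score(s: float) -> float:  # strictly increasing in s; root is the MLE
        w = weights(s)
        return s - xbar + sum(v * wi for v, wi in zip(x, w)) / sum(w)

    lo, hi = 1e-9 * (max(x) - xmin), statistics.stdev(x)
    while score(hi) < 0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    sigma = 0.5 * (lo + hi)
    mu = xmin - sigma * math.log(statistics.fmean(weights(sigma)))
    return mu, sigma


def fit_if_enough_years(annual_max: Sequence[float],
                        min_years: int = 30) -> Optional[Tuple[float, float]]:
    """Skip cells with fewer than `min_years` annual maxima."""
    return fit_gumbel_mle(annual_max) if len(annual_max) >= min_years else None
```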

## 3. Results

### a. U.S. hail size observations

Over the past 35 yr (1979–2013), the majority of the United States east of the Rockies has experienced at least one hail event with a maximum observed size between 75 and 100 mm (3–4 in.), and many places have had hail of 112–125 mm (4.5–5 in.) diameter (Fig. 1a). Within this area, isolated hail events of 150–200 mm (6–8 in.) are scattered from southern Texas to South Dakota. The differences between the 1979–2013 period and the full hail record (Fig. 1b) are mostly in areas where severe thunderstorms producing very large hail are less frequent (Allen et al. 2015; Allen and Tippett 2015), such as the northern plains and the Southeast (Fig. 1a). Extending the record by 24 yr yields a considerable increase in gridded maxima above 125 mm (5 in.) and numerous sites with hail of at least 175 mm (7 in.), suggesting that much of the Great Plains, Midwest, and Northeast is susceptible to extremely large hail events. Large hail is less common in the Southeast and, in general, farther east, where thermodynamic energy (e.g., CAPE) is reduced owing to decreased lapse rates from repeated diurnal mixing (Allen et al. 2015). In addition, the veracity of large hail size reports over the Southeast has been questioned (Cintineo et al. 2012; Allen and Tippett 2015).

The overall number of hail reports has increased remarkably over the past 58 yr (Allen and Tippett 2015). However, maximum hail size displays less of a trend than the number of reports, as many large hail events occurred between 1955 and 1979 (Fig. 1b). Nevertheless, during the past decade, the list of the 10 largest hailstones on record for the continental United States has changed on several occasions (Blair and Leighton 2012; Blair et al. 2017), suggesting that local maximum possible hail sizes may change as the record is extended. This variability is perhaps a result of the increased number of active observers in recent decades. The incompleteness of the record is also unsurprising: at any one location, large hail occurs on a rare subset of hail days, which are themselves a small subset of days in any given year. Thus, without a sufficiently long record, estimates of the maximum size of the hazard can be highly unstable. The impact of this uncertainty can be assessed by comparing the overall maximum hail size on a 1° × 1° grid for 1955–2013 with that for 1979–2013 (Figs. 1a,b). The largest-diameter hail reported in the United States, 203 mm (8 in.), fell in Vivian, South Dakota. This, however, may not reflect the upper bound for hail size, as individual stones in that storm may plausibly have exceeded this value (Blair and Leighton 2012; Blair et al. 2017). The maximum hail size that an updraft can suspend is likely limited by the updraft speed, which is in turn controlled by environmental parameters such as the maximum CAPE and the strength of vertical wind shear in a storm’s formative environment; this limit might not be captured by available observations (Ziegler et al. 1983; Nelson 1983, 1987).
Other potential limiting factors on maximum hail size include the availability of supercooled liquid water, the ambient temperature, and other microphysical effects.

To evaluate the year-to-year consistency in observations of large hail sizes, we examined the mean annual maximum, which partly depends on the number of years in the record with an annual maximum (Fig. 1c). As the record is spatially and temporally sparse for 1955–78 (Allen and Tippett 2015), we focus on 1979–2013. For much of Oklahoma, Kansas, Colorado, Nebraska, and nearby states, the mean annual maximum hail size is 50 mm (2 in.) in diameter or larger, with 70–75 mm (2.75–3 in.) being more common in both Oklahoma and Texas. The mean annual maximum hail size for much of the eastern United States is between 25 and 50 mm (1–2 in.), suggesting that at longer return periods considerably damaging hail is not only possible but likely. The number of years with at least one nonzero hail observation is also examined at each grid point; for most of the plains, Midwest, and Southeast, more than 30 of the 35 yr meet this criterion (Fig. 1d). West of the Rocky Mountains, however, most locations have fewer than 20 annual maxima and thus are not fitted.
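The per-cell diagnostics in this paragraph (mean annual maximum, number of years with a nonzero maximum, and whether a cell qualifies for fitting) can be computed as in the following sketch. It assumes annual maxima stored as a mapping from grid cell to `{year: size_mm}` and averages only over years with a report, which is our reading of the text rather than a documented detail; the function name is hypothetical and the 30-yr threshold follows section 2b.

```python
import statistics
from typing import Dict, Tuple

Cell = Tuple[int, int]


def summarize_cells(annual: Dict[Cell, Dict[int, float]],
                    first_year: int = 1979, last_year: int = 2013,
                    min_years: int = 30) -> Dict[Cell, dict]:
    """Mean annual maximum and count of years with a nonzero maximum, per cell."""
    summary = {}
    for cell, by_year in annual.items():
        # Keep only years inside the analysis period that have a nonzero maximum
        sizes = [s for y, s in by_year.items()
                 if first_year <= y <= last_year and s > 0]
        summary[cell] = {
            "n_years": len(sizes),
            "mean_annual_max_mm": statistics.fmean(sizes) if sizes else None,
            "fitted": len(sizes) >= min_years,  # cells below threshold are not fitted
        }
    return summary
```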

Seasonally, maximum hail size shifts northward during the summer months (Fig. 2a), consistent with the occurrence climatology and the seasonal cycle of CAPE (Allen et al. 2015). Despite this shift, the largest hail sizes are not uncommon throughout the central United States during summer, reflecting climatologically rare events with extreme CAPE and some degree of vertical wind shear (Figs. 2b–d). These events introduce localized peaks in maximum hail size, but their lower overall frequency leads to a smaller mean maximum in summer, reflecting the fact that environmental conditions favorable for larger hail are less frequent during the summer months (Brooks 2013).

Seasonality of maximum hail size from 1979 to 2013 in terms of the (a) peak month of hail size (month with the largest hail size) based on the gridbox mean of nonzero months for the period 1979–2013, (b) March–May (MAM) maximum hail size, (c) June–August (JJA) maximum hail size, and (d) September–November (SON) maximum hail size.

Citation: Monthly Weather Review 145, 11; 10.1175/MWR-D-17-0119.1


An important consideration for the overall hail record, highlighted by Allen and Tippett (2015), is its consistency through time. To evaluate this, the maximum size and mean annual maximum size are broken into two segments: 1979–96 and 1997–2013 (Fig. 3a). The magnitude of the differences in the maxima suggests that changes are neither large nor systematic, particularly in the central United States. A similar comparison for 1955–78 versus 1979–2013 yields an overall similar pattern, with isolated larger maxima reflecting rare events captured by the longer record (not shown). This similarity suggests that the maximum hail size is more sensitive to record length than to changes in reported size between the two segments, which contrasts with the finding that many of the largest observed hailstones have occurred in the most recent decade (Blair and Leighton 2012; Blair et al. 2017). The two findings can be reconciled because each of the individual stones noted by Blair and Leighton (2012) would influence only a small number of the grid boxes used in the current study. In the southeastern United States, there is a slightly greater change in maximum observed hail size over a large area, which could reflect a regional trend in the environment or, potentially, a bias in the reported maximum size arising from recent increases in reports in these regions (Schaefer et al. 2004; Allen and Tippett 2015). For the mean annual maximum (Fig. 3b), there is a noticeable contribution from the increasing number of reports of hail of 25–50-mm (1–2 in.) diameter. There are also suggestions of increases in mean annual maximum hail size over the Southeast and high plains, reflecting greater diligence in collecting hail reports to verify warnings, though these increases are generally small, at 12.5–25 mm (0.5–1 in.).
A Wilcoxon signed-rank test for the difference between the medians (Wilks 2006) shows a significant change (*p* ≤ 0.05) at a substantial number of points, especially in the Southeast. This reflects the large increase in the number of observations in this region over the most recent two decades (where zeros occur during the first period) rather than a trend in size (Allen and Tippett 2015), suggesting that the data are stationary and that a trend does not need to be included in the fitting procedure. The results from this analysis of hail size characteristics suggest that, while there are considerable pitfalls with the record over the continental United States, there are also sufficient data to warrant the development of a hail size model.
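For readers unfamiliar with the test, a minimal standard-library sketch of the two-sided Wilcoxon signed-rank test with the large-sample normal approximation follows. How the values from the two periods were paired in the original analysis is not specified here, so the hypothetical function `wilcoxon_signed_rank` simply takes two equal-length sequences of paired values (`zip` truncates any excess). For small samples or many ties, an exact or tie-corrected test such as `scipy.stats.wilcoxon` would be preferable.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Wilcoxon signed-rank test, normal approximation.

    Returns (W_plus, z, p). Zero differences are dropped and tied |d| get
    average ranks; no tie correction is applied to the variance.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average (1-based) ranks to runs of tied |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return w_plus, z, p
```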

Changes between the period 1979–95 and 1996–2013 in terms of (a) the largest recorded annual maximum hail size and (b) mean annual maximum hail size over the United States. Stippling shows where a Wilcoxon signed-rank test of medians has a *p* value of less than 0.05 and reflects a rejection of the hypothesis of no change in the median between the periods.

Citation: Monthly Weather Review 145, 11; 10.1175/MWR-D-17-0119.1


### b. Model fitting

The empirical cumulative distribution function (CDF; Fig. 4a) shows that hail size is heavily quantized as a result of the use of reference objects, most notably in the 19–25-mm (0.75–1.00 in.) range, reflecting the minimum thresholds for severe hail reports [19 mm (0.75 in.) during 1979–2010, 25 mm (1.00 in.) during 2010–13], golf ball–sized hail [45 mm (1.75 in.)], and, to a lesser extent, other reference objects [e.g., a baseball, 70 mm (2.75 in.)]. These characteristics suggest that care needs to be taken in subsequent model fitting. As the desired model is a Gumbel distribution of the annual maximum hail size over each 1° cell, several approaches are needed to reduce the sensitivity to quantization of the data and to the limited sample size. To address the quantization, the data were dithered, whereby a small random uniform amount is added to, or subtracted from, each observed value before the whole set of observations is used to determine the sample annual maxima (Fig. 4b). To avoid overly large biases at small hail diameters, a linear fitted random uniform correction was developed, which uses a dithering process of the form

(a) Illustration of the impact of capped linear dithering on hail size and (b) the resulting empirical CDF of U.S. hail size observations following the dithering procedure.

Citation: Monthly Weather Review 145, 11; 10.1175/MWR-D-17-0119.1
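The exact capped linear dithering equation is not reproduced in this text, so the following is only a sketch of the general idea under stated assumptions: each reported size receives a uniform random perturbation whose half-width grows linearly with size and is capped at a maximum. The function name and the `slope` and `cap_mm` values are hypothetical placeholders, not the values used in the study. As described above, dithering is applied to individual reports before the annual maxima are extracted.

```python
import random
from typing import Iterable, List, Optional


def capped_linear_dither(sizes_mm: Iterable[float], slope: float = 0.1,
                         cap_mm: float = 6.35, seed: Optional[int] = None) -> List[float]:
    """Jitter quantized hail sizes with a size-dependent uniform amount.

    The half-width of the uniform jitter is slope * size, capped at cap_mm,
    so small stones are perturbed less than large ones. Both parameters are
    illustrative assumptions. Results are kept strictly positive.
    """
    rng = random.Random(seed)
    out = []
    for s in sizes_mm:
        half_width = min(slope * s, cap_mm)  # linear growth with size, then capped
        out.append(max(s + rng.uniform(-half_width, half_width), 1e-6))
    return out
```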


Fitting the Gumbel distribution using the MLE procedure yields a gridded set of scale and location parameters over the continental United States (Figs. 5a,b). Generally higher values of the scale parameter are found across the Great Plains states, particularly over Texas, Oklahoma, and Kansas, reflecting more regular returns of larger hail sizes. To a lesser extent, this is also found over the southeastern United States. The differences between neighboring grid points are considerable over the domain, with parameter estimation errors of 15%–20% (Fig. 5c), reflecting the difficulty of estimating the scale parameter from limited observation sets and its sensitivity to outliers. This variability between nearby grid points is particularly noticeable near locations with large urban populations (e.g., Dallas–Fort Worth, Amarillo, and Lubbock, Texas; Wichita, Kansas; Oklahoma City, Oklahoma), which increase the likelihood of large hail size reports. In contrast to the relatively limited area with high scale parameters, the location parameter is higher over a larger area, including the plains, the Midwest, and the Southeast, with the highest values from central Texas to the Dakotas. The standard error in the location parameter estimates is between 2.5% and 5% over much of the domain, except in locations that receive fewer hail reports, suggesting greater confidence in the expected maximum sizes from the available sample (Fig. 5d). A test of 30 random sets of dithered fits shows minimal to negligible contributions of the dithering procedure to the standard error (not shown). As another test of performance, the mean of the Gumbel distribution, *μ* + *γσ* (where *γ* ≈ 0.5772 is the Euler–Mascheroni constant), was compared with the mean annual maximum hail size used for the fitting at each grid box.

Gumbel distribution parameter estimates and their standard errors for the point fit of dithered annual maxima observations with more than 30 yr for the period 1979–2013. (a) Scale parameter using MLE fitting. (b) Location parameter using MLE fitting. (c) Percentage standard error in scale parameter estimates from MLE. (d) Percentage standard error in location parameter estimates from MLE. (e) Scale parameter using L-moments fitting. (f) Location parameter using L-moments fitting.

Citation: Monthly Weather Review 145, 11; 10.1175/MWR-D-17-0119.1


Comparison of the mean annual maximum hail size at each grid box used for the model fitting and the mean of the derived Gumbel distribution at each point, determined by *μ* + *γσ*, where *γ* ≈ 0.5772 is the Euler–Mascheroni constant.

Citation: Monthly Weather Review 145, 11; 10.1175/MWR-D-17-0119.1
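The percentage standard errors mapped in Fig. 5 can be put in context with the large-sample (inverse Fisher information) approximation for Gumbel maximum likelihood estimates. This is a textbook asymptotic result, not necessarily how the standard errors in Fig. 5 were computed, and the function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_mle_standard_errors(mu: float, sigma: float, n: int) -> dict:
    """Asymptotic standard errors of Gumbel MLEs from the inverse Fisher information.

    Var(mu_hat)    ~ sigma^2 * (1 + 6 (1 - gamma)^2 / pi^2) / n   (about 1.1087 sigma^2 / n)
    Var(sigma_hat) ~ 6 sigma^2 / (pi^2 n)                         (about 0.6079 sigma^2 / n)
    Returns absolute and percentage standard errors.
    """
    var_mu = sigma ** 2 * (1.0 + 6.0 * (1.0 - EULER_GAMMA) ** 2 / math.pi ** 2) / n
    var_sigma = 6.0 * sigma ** 2 / (math.pi ** 2 * n)
    se_mu, se_sigma = math.sqrt(var_mu), math.sqrt(var_sigma)
    return {
        "se_mu": se_mu,
        "se_sigma": se_sigma,
        "pct_se_mu": 100.0 * se_mu / mu,
        "pct_se_sigma": 100.0 * se_sigma / sigma,
    }
```

For 30–35 annual maxima, this approximation gives a percentage standard error of roughly 13%–14% for σ regardless of its value, while the percentage error for μ scales with σ/μ.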


Other fitting approaches, such as probability weighted moments or L-moments, may extract greater confidence from the limited observations when determining the point Gumbel distribution. To evaluate whether the fitting procedure influences the result relative to the MLE approach, identical data were fitted using L-moments; the results suggest that little is gained from the alternative procedure given the existing limitations of the dataset (Figs. 5e,f). This lack of distinction between the two methods is consistent with the earlier analysis of hail-pad return levels by Fraile et al. (2003), and thus we focus here on the results from the MLE approach.
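For comparison with MLE, Gumbel parameters can be estimated from sample L-moments using the relations λ1 = μ + γσ and λ2 = σ ln 2, with the sample L-moments computed from unbiased probability weighted moments. The following is a minimal standard-library sketch with hypothetical function names.

```python
import math
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_l_moments(x: Sequence[float]) -> Tuple[float, float]:
    """First two sample L-moments (lambda1, lambda2) via unbiased PWMs b0, b1."""
    xs = sorted(x)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    b0 = sum(xs) / n
    # b1 = (1/n) * sum_{i=1..n} (i - 1)/(n - 1) * x_(i), with x sorted ascending
    b1 = sum(i * v for i, v in enumerate(xs)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0


def fit_gumbel_lmoments(x: Sequence[float]) -> Tuple[float, float]:
    """Gumbel (mu, sigma) matching lambda1 = mu + gamma*sigma and lambda2 = sigma*ln 2."""
    lam1, lam2 = sample_l_moments(x)
    sigma = lam2 / math.log(2.0)
    return lam1 - EULER_GAMMA * sigma, sigma
```

Unlike MLE, this estimator needs no iteration, which is one reason L-moments are often preferred for short records.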

### c. Return levels and stability analysis

To assess the suitability of the models to produce realistic return periods, several evaluations of performance were needed. First, the return levels and the confidence intervals for four regional locations that are collocated with highly populated observational records were analyzed (Fig. 7). Individual grid data show a larger spread in the potential regional fits for the surrounding grid boxes, reflecting variations in the sample size and relatively infrequent returns of larger hail sizes with decreasing confidence at longer return intervals. The grid box encompassing Oklahoma City is chosen as the long-term station representing the Great Plains (Figs. 7a). Observed annual maxima in the area range between 25 and 127 mm (1–5 in.), with heavy quantization toward both golf ball– [45 mm (1.75 in.)] and baseball-sized diameters [70 mm (2.75 in.)] over the full 35-yr record. Exploring the return levels yields a 127-mm (5 in.) stone at a 40-yr return interval, with the remainder of observations pointing to a stably fitted model. The bounds of the surrounding points indicate that, at the 2-yr interval, 51-mm (2 in.) hail is expected, while at the 10-yr return period, hail is expected to be within 64–114 mm (2.5–4.5 in.) with relatively strong confidence based on the relatively narrow confidence interval of fit results. As would be expected given the paucity of the largest of hail observations, the greatest range in both regions and confidence intervals is seen outside of the 50-yr return levels, with the regional spread as large as between 101 and 178 mm (4–7 in.) at the 200-yr interval. To explore performance over the northern plains, the point nearest Pierre, South Dakota, was examined (Fig. 7b). This point includes, in the observed record, the largest verified hail size observation in the United States of 203 mm (8 in.) and, thus, can be used to explore whether a single outlying observation heavily skews the distribution. 
As for Oklahoma, there is heavy quantization in the golf ball–size category, with the three largest stones found to be 114, 114, and 200 mm (4.5, 4.5, and 8 in.) in diameter. Bounds from the surrounding grid points are tighter than for the Oklahoma case, with a confidence bound range at the 200-yr return level of between 89 and 197 mm (3.5–7.75 in.), which encompasses the record size observed near Vivian, South Dakota. As a third evaluation point to explore performance over the southeast United States, we consider the return levels around Atlanta, Georgia. There is extreme quantization at the golf ball–size level, with the largest observed size of 82.5 mm (3.25 in.). The concentration at smaller hail sizes over a wider area is reflected by the narrower range at longer return levels over the surrounding region and confidence bounds from the surrounding grid points are tighter than for the Oklahoma case, with a 200-yr return level between 76 and 152 mm (3–6 in.). Golf ball–sized [45 mm (1.75 in.)] hail has a 2-yr return period at Atlanta, with values of up to 76 mm (3 in.) expected at intervals as short as 20 yr. Finally, the model is evaluated over the mid-Atlantic and Northeast regions, using the grid closest to Philadelphia, Pennsylvania. Hail sizes in this region are again comparatively smaller, with most hail observed close to the minimal severe thresholds, and the largest sizes on record between golf ball and baseball [45–70 mm (1.75–2.75 in.)]. The wider spread in this region reflects the variations induced by a larger number of reports above 1 in. (25 mm), with the remaining sample at the minimum severe level, which leads to point-to-point variations in the estimation due to a noncontinuous observational distribution despite dithering. Nonetheless, 76-mm (3 in.) hail is certainly possible in Philadelphia and surrounds, with this size stone occurring between the 10- and 200-yr return levels, suggesting at least some degree of regularity. 
The spatial variability of return sizes at given probabilities should not be interpreted on a point basis in the unsmoothed form, as point-to-point sample variation can lead to large variations in the estimated return level, especially at the longest return periods. A 100-yr return level implies a 0.01 annual probability that the maximum hail size within a given 1° × 1° grid box reaches or exceeds that size; at higher resolution, and consequently smaller grid area (not examined in this study), the 100-yr value could be equal to this or smaller. While there is considerable spread in the confidence intervals, there is a strong degree of consistency in hail exceeding 51 mm (2 in.) at the 10-yr return level for all locations, reflecting a hazard to property and vehicles.
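To make the probability statement concrete, the short Python sketch below converts a return period into an annual exceedance probability and into the chance of at least one exceedance over a multiyear window. It assumes independent years, the standard assumption behind annual-maximum return periods; the function names are illustrative only.

```python
def annual_exceedance_probability(period_yr):
    """Annual probability that the T-yr return level is reached or exceeded."""
    return 1.0 / period_yr


def prob_at_least_one(period_yr, window_yr):
    """Chance of at least one exceedance in window_yr independent years."""
    return 1.0 - (1.0 - annual_exceedance_probability(period_yr)) ** window_yr


def median_return_period_of_record_max(n_years):
    """Median return period of the largest of n_years annual maxima.

    The record maximum has CDF F(x)**n, so its median satisfies F = 0.5**(1/n).
    """
    return 1.0 / (1.0 - 0.5 ** (1.0 / n_years))


for period, window in [(100, 35), (100, 100), (50, 35)]:
    print(f"T={period} yr over {window} yr: {prob_at_least_one(period, window):.2f}")
print(f"median return period of a 35-yr record maximum: "
      f"{median_return_period_of_record_max(35):.0f} yr")
```

A 100-yr event has only about a 30% chance of appearing in a 35-yr record, and the largest value in such a record sits, in the median, near the 51-yr return level, so a 35-yr record maximum is expected to lie close to the 50-yr return level rather than the 100- or 200-yr levels.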

Evaluation of the maximum expected size of hail at grid points and the nearby region for given return periods in years as illustrated on a Gumbel plot. Dots represent the raw observations of the point (gray) samples and dithered observations (black). Continuous lines indicate the return curve for the point fitted model on the dithered data (blue), and the range of model fits for the surrounding ±3 grid boxes (48 grid boxes total, in red). Confidence intervals for the point (blue), and the surrounding grid (red) are indicated by the dashed lines. Nearest grid points to (a) Oklahoma City, (b) Pierre, (c) Atlanta, and (d) Philadelphia are shown.

Citation: Monthly Weather Review 145, 11; 10.1175/MWR-D-17-0119.1


Generalizing this analysis to the entire fitted domain, we evaluate the fitted point distribution to determine the hail size (in inches) at the respective return periods (Fig. 8). For most locations east of the Rockies, hail sizes at the 2-yr return level exceed 25 mm (1 in.), and a large majority of grid points in the Great Plains exceed 50 mm (2 in.; Fig. 8b), with values reaching as high as 76–101 mm (3–4 in.) at the 5-yr interval (Fig. 8c). At the 10-yr level, sizes generally range between 76 and 127 mm (3–5 in.), with the higher values mostly confined to grid points in the Great Plains (Fig. 8d). Given the length of the record (35 yr), the 20- and 50-yr return values most closely resemble the maximum observed hail sizes, with higher values at many points (Figs. 8e,f). This is as expected for a fitted distribution, as there is considerable point uncertainty in event occurrence, especially when combined with a number of rarer large observations (Figs. 8a,d,e). The values at these levels range between 76 and 152 mm (3–6 in.), suggesting that the model is representative of the data to which it is fitted, and they extend into the Southeast and Midwest. At longer return periods (100 yr; Fig. 8g), hail sizes of 150 mm (6 in.) are identified for most of the fitted domain outside of the northeastern United States, with the highest values concentrated through the Great Plains and toward the Canadian border. Extending this to an extremely long return period with low confidence, the 200-yr level (Fig. 8h) suggests return diameters of 152–203 mm (6–8 in.) or more over large portions of the Great Plains, including Oklahoma, Kansas, and Texas, consistent with the largest values in the existing hail record.
Across the return periods, there is a low-probability but high-magnitude potential over the northern Great Plains and Midwest, reflecting rarer excursions of environmental parameters favorable to the development of storms producing hail of this diameter. Over much of the domain east of the Rockies, including the Southeast, the region east of the Appalachian Mountains, and the eastern population centers, 200-yr return levels are well in excess of 101 mm (4 in.), suggesting the potential for catastrophic hailstorms in areas that comparatively rarely experience these events. As with all extreme value estimates of return levels, the largest potential errors lie in the outer tails, especially when sample size is limited. Nonetheless, the fact that measurements of 101 mm (4 in.) or greater are not unusual anywhere within the domain (consistent with the estimated return period of 20–50 yr or greater) suggests that the longer return levels are not unreasonable, although they must be viewed with greater uncertainty.

Return hail sizes estimated from the point Gumbel model fitted to dithered data. (a) Maximum observed hail size for each grid point during 1979–2013. Modeled return hail sizes are shown at the (b) 2-, (c) 5-, (d) 10-, (e) 20-, (f) 50-, (g) 100-, and (h) 200-yr intervals, for points with at least 30 annual maxima on the 1° × 1° grid.


The gridbox-to-gridbox variations suggest that spatial differences in observational quality influence the return period estimates more strongly than robust differences in hail size at varying return levels. To offset this, we apply a 2D Gaussian kernel smoother with a bandwidth of *σ* = 1.00 (a 1° smoother) to the return period data to produce a more spatially consistent hazard profile (Fig. 9). The smoothed map of maximum observed hail size for 1979–2013 suggests peak values of approximately 102–127 mm (4–5 in.), with the highest likelihood of these sizes over the central to northern Great Plains, extending into both the upper Midwest and the Southeast (Fig. 9a). Values as high as 100 mm (4 in.) extend through New Mexico into Arizona, and northward into Montana and to the Canadian border. At the 2-yr return level, much of the Great Plains exhibits values of up to 50 mm (2 in.), with a steep gradient east of the Rockies and fairly uniform coverage extending from Montana to southern Maine, south to central Florida, and west into the desert Southwest (Fig. 9b). At the 10-yr level, much of the eastern CONUS has return hail sizes of 51–76 mm (2–3 in.), reaching 101 mm (4 in.) over the central Great Plains (Fig. 9d). At this level, significant hail [≥50 mm (2 in.)] extends into southern New York, Pennsylvania, and New Jersey. Return values increase substantially over the Great Plains at the 20- and 50-yr return levels, with a slower increase over much of the remainder of the eastern CONUS. The smoothed 50-yr return level is qualitatively similar to the maximum observed hail size, the largest difference being a reduced northern extent (reflecting the tendency of the smoothing kernel to flatten absolute point maxima). The smoothing procedure also smears out four grid boxes in Arizona that have enough very large hail measurements to justify fitting the model, and where hail of up to 101 mm (4 in.) has been observed in the recent past. Even after smoothing, hail sizes over 100–125 mm (4–5 in.) are likely over much of the eastern CONUS at return levels beyond 100 yr, with 150 mm (6 in.) appearing to be a likely value for the more convectively prone regions of the Great Plains, Midwest, and Southeast, rising to 175–200 mm (7–8 in.) in the central Great Plains (Figs. 9g,h).
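The smoothing step can be illustrated with a normalized Gaussian kernel on a 2D grid in which unfitted grid boxes are missing. This Python sketch rests on assumptions the text does not state: missing boxes (stored as `None`) neither contribute nor receive values, the kernel is truncated at 3σ, and distances are measured in grid cells, ignoring the east–west shrinking of 1° boxes with latitude. SciPy's `scipy.ndimage.gaussian_filter` smooths complete arrays but does not skip missing cells on its own, hence the explicit weighting here.

```python
import math


def gaussian_smooth(grid, sigma_cells=1.0, truncate=3.0):
    """Normalized Gaussian smoothing of a 2D grid that may contain None.

    Missing cells neither contribute to nor receive smoothed values
    (an assumption; the study's treatment of unfitted boxes is not stated).
    """
    radius = int(math.ceil(truncate * sigma_cells))
    nrow, ncol = len(grid), len(grid[0])
    out = [[None] * ncol for _ in range(nrow)]
    for i in range(nrow):
        for j in range(ncol):
            if grid[i][j] is None:
                continue
            weight_sum = value_sum = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < nrow and 0 <= jj < ncol and grid[ii][jj] is not None:
                        w = math.exp(-(di * di + dj * dj) / (2.0 * sigma_cells ** 2))
                        weight_sum += w
                        value_sum += w * grid[ii][jj]
            # Dividing by the summed weights renormalizes near edges and gaps.
            out[i][j] = value_sum / weight_sum
    return out


# Toy 5 x 5 field of return sizes (mm) with one isolated maximum and one gap.
field = [[50.0] * 5 for _ in range(5)]
field[2][2] = 150.0
field[0][4] = None
smoothed = gaussian_smooth(field, sigma_cells=1.0)
print(round(smoothed[2][2], 1), round(smoothed[2][3], 1))
```

The isolated 150-mm value is pulled well down toward its neighbors while they rise slightly, a small-scale version of the flattening of absolute point maxima noted for the smoothed maps.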

As in Fig. 8, but for hail return sizes as derived from Gaussian kernel smoothing of the raw Gumbel return values using a 1.00 sigma (1°) bandwidth.


To test the modeled hail sizes further, the return periods for hail of 25 mm (1 in.) in diameter are compared with the annual occurrence rate of a hail proxy derived from environmental parameters (Allen et al. 2015). This proxy produces a spatially unbiased hail climatology by combining monthly environmental parameters favorable to hail development (CAPE, 0–3-km storm-relative helicity, convective precipitation, and mean specific humidity in the 0–90-hPa layer above ground level) in a Poisson regression that simulates the monthly frequency of hail ≥25 mm (1 in.). The point comparison suggests that hail of this size has a return period of 1 yr over most of the United States, particularly over the Great Plains, Midwest, and Southeast (Fig. 10a). For comparison, the Gumbel distribution used here cannot provide a return period of less than 1 yr, as the return period is defined as the reciprocal (1/*p*) of an annual exceedance probability *p* ≤ 1, whereas the occurrence model can produce intervals of less than 1 yr (Fig. 10b).
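The different lower limits of the two return-period estimates can be shown directly. In the hedged Python sketch below, the proxy's return period is the inverse of its mean annual occurrence, obtained by summing hypothetical monthly expected counts that stand in for the Poisson-regression output, while the Gumbel return period is 1/(1 − F(x)) for the annual maximum and therefore cannot fall below 1 yr. The Gumbel parameters are illustrative, not fitted values from the study.

```python
import math


def proxy_return_period(monthly_expected_counts):
    """Inverse of the mean annual number of >=25-mm hail occurrences.

    The 12 inputs are hypothetical stand-ins for Poisson-regression output.
    """
    annual_rate = sum(monthly_expected_counts)
    return math.inf if annual_rate == 0 else 1.0 / annual_rate


def gumbel_return_period(size_mm, mu, beta):
    """Return period 1 / (1 - F(x)) of the annual maximum; always >= 1 yr."""
    cdf = math.exp(-math.exp(-(size_mm - mu) / beta))
    return math.inf if cdf >= 1.0 else 1.0 / (1.0 - cdf)


# Hypothetical location averaging four qualifying hail occurrences per year.
monthly = [0.0, 0.0, 0.2, 0.6, 1.0, 1.0, 0.6, 0.4, 0.2, 0.0, 0.0, 0.0]
print(f"proxy:  {proxy_return_period(monthly):.2f} yr")
print(f"Gumbel: {gumbel_return_period(25.0, mu=45.0, beta=12.0):.3f} yr")
```

With several events per year the proxy gives a fraction of a year, whereas the Gumbel value for the same small size saturates just above 1 yr, which is why the two panels of the comparison need different color scales.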

Comparison of return periods over the 1979–2013 climatology from (a) the 1° × 1° gridbox model for 25-mm (1 in.) hail and (b) the inverse of the mean annual occurrence of hail above 25 mm (1 in.), determined using the hail index derived from the North American Regional Reanalysis monthly environmental dataset (Allen et al. 2015). Note that by construction the minimum value of a Gumbel return period here is 1 yr (1/*p*), whereas the occurrence model can produce intervals of less than 1 yr; the color scales for the two panels therefore differ.


Following this positive test, the size threshold for the return period evaluation is raised through the sizes of interest [38, 45, 51, and 76 mm (1.5, 1.75, 2, and 3 in.)]. Even for hail as large as 76 mm (3 in.; Fig. 11d), the minimum return period is between 2 and 5 yr for much of the Great Plains, while hailstones of up to 50 mm (2 in.) in diameter have return periods of 1–3 yr over the Great Plains, Southeast, and Midwest. These hail sizes appear to be less frequent along the Appalachian Mountains and into the Northeast, particularly for sizes in excess of 50 mm (2 in.). For the lower thresholds, much of the domain experiences hail reaching golf ball–sized diameter [45 mm (1.75 in.)], with such an event expected every 1–2 yr throughout the Great Plains (Fig. 11b). These results suggest that hailstones capable of producing considerable damage to structures, vehicles, and property [≥45 mm (1.75 in.); Brown et al. 2015] are relatively commonplace on a yearly basis in the Great Plains and Southeast, reflecting a likely hazard irrespective of the available observations. While this does not necessarily imply certainty at any subgrid-scale location such as a city, given grid boxes of ~100 km × 100 km, it does suggest that these large hail events occur more frequently than may have been anticipated from existing observationally derived climatologies.
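The ~100 km × 100 km approximation for a 1° × 1° box can be checked with the exact area of a latitude–longitude cell on a sphere, A = R²·Δλ·(sin φ₂ − sin φ₁), with Δλ in radians. The Python sketch below assumes a spherical Earth of radius 6371 km; it shows the boxes shrinking from roughly 11 000 km² near 25°N to under 9000 km² near 45°N, so the stated size is approximate, and the area of a single city is a small fraction of any box.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius (assumption: spherical Earth)


def grid_box_area_km2(lat_south_deg, dlat_deg=1.0, dlon_deg=1.0):
    """Area of a lat-lon cell whose southern edge is at lat_south_deg."""
    phi1 = math.radians(lat_south_deg)
    phi2 = math.radians(lat_south_deg + dlat_deg)
    return EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg) * (math.sin(phi2) - math.sin(phi1))


for lat in (25, 35, 45):
    area = grid_box_area_km2(lat)
    print(f"{lat}-{lat + 1}N: {area:8.0f} km^2 (~{math.sqrt(area):.0f} km square)")
```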

Fitted Gumbel return periods on the 1° × 1° grid for the chosen size thresholds: (a) 1.5, (b) 1.75, (c) 2, and (d) 3 in. Note that there is a different scale for the return period values for (a)–(c) as compared to (d) to reflect a longer range of returns for the larger hail sizes.


To evaluate how well these fitted distributions represent quantiles, continental scatter diagrams of observed (percentiles from all hail observations) and modeled (percentiles derived from annual maxima) hail size were next explored (Fig. 12). As the observational data are limited in quantity, we restrict this comparison to the 80th, 90th, 95th, and 98th percentiles at each grid point (corresponding to the 5-, 10-, 20-, and 50-yr return levels), comparing the undithered and dithered observations with the point model, and the dithered observations with the smoothed model. Against both the quantized and dithered observed quantiles, the point model performed well at the 80th, 90th, and 95th percentiles, with high correlations (Fig. 12). At the 98th percentile, the model also appeared to perform relatively well; however, this is harder to assess because the observations are less representative given the limited sample size, which can explain the slight upward bias in the modeled quantiles. Comparison with the dithered data yields a considerably better fit, suggesting that dithering provides a more representative depiction of hail size quantiles over the domain. This supports the conclusion that performance across the United States is very good out to the 20-yr return level, with the model perhaps slightly overestimating the return size at the 50-yr return period if the 35 yr of observations are representative of the true distribution. For the smoothed return levels sampled on a point basis, there is a greater spread in the compared points, with hail size quantiles both under- and overestimated relative to observed quantiles owing to the smoothing (Figs. 12c,f,i,l). Nonetheless, the correlation between the observed and smoothed datasets remains high and significant, with relatively small point variations, particularly at the 80th, 90th, and 95th percentiles.
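The correspondence between percentiles and return levels used above follows from T = 1/(1 − p) for annual maxima, so the 80th, 90th, 95th, and 98th percentiles map to 5, 10, 20, and 50 yr. The Python sketch below computes that mapping, the matching Gumbel quantile, and a plain Pearson correlation of the kind reported for the scatter comparisons; the location parameters and "observed" sizes in the example are hypothetical placeholders, not the study's data.

```python
import math


def percentile_to_return_period(p):
    """Non-exceedance probability p of the annual maximum -> return period (yr)."""
    return 1.0 / (1.0 - p)


def gumbel_quantile(mu, beta, p):
    """Hail size with non-exceedance probability p under Gumbel(mu, beta)."""
    return mu - beta * math.log(-math.log(p))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


for p in (0.80, 0.90, 0.95, 0.98):
    print(f"{p * 100:.0f}th percentile -> {percentile_to_return_period(p):.0f}-yr return level")

# Hypothetical modeled vs. observed 95th-percentile sizes (mm) at five grid points.
modeled = [gumbel_quantile(mu, 12.0, 0.95) for mu in (40.0, 45.0, 50.0, 55.0, 60.0)]
observed = [76.0, 82.0, 89.0, 89.0, 102.0]
print(f"r = {pearson_r(modeled, observed):.2f}")
```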

Comparison of point modeled hail size at the 80th, 90th, 95th, and 98th percentiles and observed hail size for (a),(d),(g),(j) undithered observations; (b),(e),(h),(k) dithered observations; and (c),(f),(i),(l) dithered observations with the smoothed modeled hail size for locations with at least 30 annual maxima observations. Significant Pearson correlations are shown for each comparison.


Finally, the frequency with which observed hail sizes do not exceed the modeled percentiles (ideally, e.g., 95% of observed hail sizes fall below the modeled 95th percentile) is summarized as percentiles of nonexceedance. Each grid point was compared with the fitted model through the range of quantiles over the continental United States and regionally, to identify any localized biases at a given return level. Over the entire domain (Fig. 13a), the gridbox model shows a relatively good fit through the intermediate quantiles from the 80th percentile upward, with upward divergence at the lowest and highest quantiles shown (the 50th and 99th percentiles). This suggests that the gridbox model may overestimate the size of hail for a given return period, consistent with the comparison of the Gumbel mean and the mean annual maximum values. The result at the lower quantile is somewhat unexpected given that this part of the distribution has the most available data for evaluation (though data limitations influence a number of the fitted grid points), but it may also reflect the lower bound on hail size imposed by the severe thresholds, which preclude the recording of smaller diameters (Allen and Tippett 2015). Another potential explanation is that the tendency of the Gumbel distribution to weight toward the center of the dataset skews the fitted curve at the extreme tail and at the lower return frequencies. The values at the higher quantiles also display this divergence, related to limitations in the maximum observed sizes of the distribution. On a regional basis (Fig. 13b), the model appears to perform well over each of the NOAA climate regions (Allen et al. 2015), with similar positive biases over regions that have fewer observations over the record length than the central Great Plains (e.g., the Northeast).
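One way to compute the percentile-of-nonexceedance check is sketched below in Python: for each nominal percentile, take the modeled Gumbel quantile and count the fraction of observations at or below it. A fraction above the nominal level means the modeled size is too large, matching the reading of upward divergence as overestimation. The observations and Gumbel parameters are hypothetical, and the study's exact counting procedure may differ.

```python
import math


def gumbel_quantile(mu, beta, p):
    """Hail size with non-exceedance probability p under Gumbel(mu, beta)."""
    return mu - beta * math.log(-math.log(p))


def nonexceedance_fraction(observed, threshold):
    """Fraction of observed sizes that do not exceed threshold."""
    return sum(1 for x in observed if x <= threshold) / len(observed)


def nonexceedance_table(observed, mu, beta, percentiles):
    """Map each nominal percentile to (modeled quantile, observed nonexceedance)."""
    table = {}
    for p in percentiles:
        q = gumbel_quantile(mu, beta, p)
        table[p] = (q, nonexceedance_fraction(observed, q))
    return table


# Hypothetical annual maxima (mm) at one grid point and illustrative parameters.
obs = [25, 25, 32, 38, 38, 45, 45, 45, 51, 51, 64, 70, 76, 89, 114]
for p, (q, frac) in nonexceedance_table(obs, 40.0, 15.0, (0.5, 0.8, 0.9, 0.95)).items():
    flag = "model high" if frac > p else "model low or matched"
    print(f"p={p:.2f}: modeled {q:5.1f} mm, observed nonexceedance {frac:.2f} ({flag})")
```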

Percentile of nonexceedance plots for point and areal Gumbel modeled tiles compared to the number of tiles not exceeding the observed values at that grid point. (a) Comparison at each of the respective quantiles, shown as box-and-whisker plots. (b) As in (a), but for the NOAA climate regions defined in Allen et al. (2015).


## 4. Discussion

A climatology of large hail occurrence and maximum size potential has been derived from *Storm Data* observations from 1979 to 2013, providing insight into hazard modeling for large hail. The spatial hazard maps of hail size return intervals generated using this approach illustrate that a simple EVD can produce a first-of-its-kind spatial model of observed hail size return intervals for the central and eastern CONUS. However, it has also been demonstrated that the limitations of the observed hail record and of the statistical techniques must be carefully explored to accurately ascertain the hazard and to understand the reasons for point-to-point variability in the results of this comparatively crude approach. Nonetheless, the evaluation conducted here indicates that the EVD model provides a useful analysis of the hazard posed by large hail in the United States and reveals a higher-than-expected potential over large portions of the country compared with the observational climatology: the eastern United States is exposed to hail of up to 76 mm (3 in.) in diameter on a 20–50-yr interval, while the Great Plains are exposed to such hail on a 10-yr recurrence interval and to damaging hail [38–50 mm (1.5–2.0 in.) or greater] every 1–2 yr.

Perhaps the starkest limitation of this technique, and of any assessment of U.S. hail size data, is the significant quantization of reported hail sizes, which results from the limited diversity of reference objects available to observers and from the very basis of the reporting system (Blair and Leighton 2012; Allen and Tippett 2015; Blair et al. 2017). This challenge can be mitigated to some extent by dithering, but this data processing step introduces additional (albeit small) potential errors into the estimation of the fitted distribution parameters. On the other hand, the step allows a fairer evaluation of the modeled quantiles against the quantized observations and likely reflects the errors naturally introduced by observers (Blair and Leighton 2012; Blair et al. 2017). Theoretically and physically, there must be an upper bound to the largest possible hail size at any one location (Knight and Knight 2001), as updraft speed cannot increase without bound in realistic environments. However, for most locations the sample is incomplete or does not capture the narrow swaths of the largest stones in each storm (Blair et al. 2017) and thus may underrepresent the largest sizes. Additionally, the spatial distribution of the observations reflects the distribution of population, not only the actual distribution of hail sizes. This results in errors in the fitted scale and location parameters, particularly where observations are sparse and at longer return periods. While this error can be mitigated by smoothing or possibly by sampling over a wider region to fill out the distribution, such steps may over- or understate the hail size potential. This suggests a need to divorce the hail size model from the observations, perhaps by using environmental distributions (e.g., Brooks et al. 2003; Gilleland et al. 2013), particularly where data are limited or do not exist.

The limitations of the observational data also lead to issues in parameter estimation, as many locations have insufficient samples to explore the characteristics of the tail of the distribution. It is likely that, if additional data were available to constrain the parameter estimates, a more general GEV model would become feasible, which may include tailing behavior toward the Weibull distribution, as in many processes of increasing rarity (Fraile et al. 1992; Dotzek et al. 2009). The point Gumbel model is one possible solution with the currently available data. It is also plausible that the results are sensitive to the spatial resolution of the chosen grid, which merits future investigation. A further complication is that it is not clear how these gridbox results translate to the true probability of experiencing hail at the subgrid scale. Moreover, the nature of the Gumbel as a collector distribution, and its lack of tailing characteristics in the extremes that would provide an upper bound, can lead to over- or underestimation of hail sizes depending on the fit attainable with the existing data. As future data become available, it may therefore be possible to improve on the modeled result here, and the point distribution may converge to a Weibull distribution or be better modeled using a generalized Pareto distribution. At present, however, neither of these approaches produced stable results, owing to large gridbox-to-gridbox variations in the estimated shape parameters. The longest-return observations in the 35-yr record appear to be consistent with the modeled distribution, suggesting that the outer tail is reasonably well captured by the Gumbel model. Despite the limitations of the Gumbel approach, for the 2–100-yr return periods the regional and national performance metrics provide confidence that this model performs well in assessing the threat posed to the United States by large hail events.
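The relationship among the Gumbel, the bounded Weibull-type tail, and the GEV can be made explicit through the GEV return level in the parameterization of Coles (2001): z_T = μ − (σ/ξ)[1 − y^(−ξ)] with y = −ln(1 − 1/T), which reduces to the Gumbel form z_T = μ − σ ln y as ξ → 0 and has a finite upper endpoint μ − σ/ξ when ξ < 0. The Python sketch below is illustrative only; the parameter values are hypothetical, not estimates from this study.

```python
import math


def gev_return_level(mu, sigma, xi, period_yr):
    """T-yr return level of GEV(mu, sigma, xi) in the Coles (2001) form.

    xi == 0 gives the Gumbel case; xi < 0 gives a bounded (Weibull-type) tail.
    """
    y = -math.log(1.0 - 1.0 / period_yr)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def gev_upper_bound(mu, sigma, xi):
    """Largest attainable value: finite only for xi < 0."""
    return mu - sigma / xi if xi < 0 else math.inf


mu, sigma = 45.0, 12.0  # hypothetical parameters (mm)
for xi in (0.0, -0.2):
    levels = [gev_return_level(mu, sigma, xi, t) for t in (10, 100, 1000)]
    print(f"xi={xi:+.1f}: " + ", ".join(f"{z:.0f} mm" for z in levels)
          + f"; upper bound {gev_upper_bound(mu, sigma, xi):.0f} mm")
```

With ξ < 0 the return levels flatten toward the physical cap, whereas the Gumbel keeps growing with ln T; this is the behavior that unstable shape-parameter estimates currently prevent from being resolved grid box by grid box.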

## Acknowledgments

The authors acknowledge support from FM Global in conducting this research and preparing this manuscript. JTA conducted the data analysis, production of figures, and writing of the paper. All authors contributed to the research design and writing and editing of the final paper.

## REFERENCES

Allen, J. T., and M. K. Tippett, 2015: The characteristics of United States hail reports: 1955–2014.

,*Electron. J. Severe Storms Meteor.***10**(3), http://www.ejssm.org/ojs/index.php/ejssm/article/viewArticle/149.Allen, J. T., M. K. Tippett, and A. H. Sobel, 2015: An empirical model relating U.S. monthly hail occurrence to large-scale meteorological environment.

,*J. Adv. Model. Earth Syst.***7**, 226–243, doi:10.1002/2014MS000397.Bardsley, W., 1990: On the maximum observed hailstone size.

,*J. Appl. Meteor.***29**, 1185–1187, doi:10.1175/1520-0450(1990)029<1185:OTMOHS>2.0.CO;2.Blair, S. F., and J. W. Leighton, 2012: Creating high-resolution hail datasets using social media and post-storm ground surveys.

,*Electron. J. Oper. Meteor.***13**(3), 32–45, http://nwafiles.nwas.org/ej/pdf/2012-EJ3.pdf.Blair, S. F., and Coauthors, 2017: High-resolution hail observations: Implications for NWS warning operations.

,*Wea. Forecasting***32**, 1101–1119, doi:10.1175/WAF-D-16-0203.1.Brooks, H. E., 2013: Severe thunderstorms and climate change.

,*Atmos. Res.***123**, 129–138, doi:10.1016/j.atmosres.2012.04.002.Brooks, H. E., J. W. Lee, and J. P. Craven, 2003: The spatial distribution of severe thunderstorm and tornado environments from global reanalysis data.

,*Atmos. Res.***67–68**, 73–94, doi:10.1016/S0169-8095(03)00045-0.Brown, T. M., W. H. Pogorzelski, and I. M. Giammanco, 2015: Evaluating hail damage using property insurance claims data.

,*Wea. Climate Soc.***7**, 197–210, doi:10.1175/WCAS-D-15-0011.1.Changnon, S. A., Jr., 1966: Note on recording hail incidences.

,*J. Appl. Meteor.***5**, 899–901, doi:10.1175/1520-0450(1966)005<0899:NORHI>2.0.CO;2.Changnon, S. A., Jr., 1977: The scales of hail.

,*J. Appl. Meteor.***16**, 626–648, doi:10.1175/1520-0450(1977)016<0626:TSOH>2.0.CO;2.Changnon, S. A., Jr., 1999: Data and approaches for determining hail risk in the contiguous United States.

,*J. Appl. Meteor.***38**, 1730–1739, doi:10.1175/1520-0450(1999)038<1730:DAAFDH>2.0.CO;2.Changnon, S. A., Jr., 2008: Temporal and spatial distributions of damaging hail in the continental United States.

,*Phys. Geogr.***29**, 341–350, doi:10.2747/0272-3646.29.4.341.Changnon, S. A., Jr., and D. Changnon, 2000: Long-term fluctuations in hail incidences in the United States.

,*J. Climate***13**, 658–664, doi:10.1175/1520-0442(2000)013<0658:LTFIHI>2.0.CO;2.Cheng, L., M. English, and R. Wong, 1985: Hailstone size distributions and their relationship to storm thermodynamics.

,*J. Climate Appl. Meteor.***24**, 1059–1067, doi:10.1175/1520-0450(1985)024<1059:HSDATR>2.0.CO;2.Cintineo, J. L., T. M. Smith, V. Lakshmanan, H. E. Brooks, and K. L. Ortega, 2012: An objective high-resolution hail climatology of the contiguous United States.

,*Wea. Forecasting***27**, 1235–1248, doi:10.1175/WAF-D-11-00151.1.Coles, S., 2001:

*An Introduction to Statistical Modeling of Extreme Values.*Springer, 209 pp.Cox, M., and P. R. Armstrong, 1981: A statistical model for the incidence of large hailstones on solar collectors.

,*Sol. Energy***26**, 97–111, doi:10.1016/0038-092X(81)90072-4.Doswell, C. A., III, 2001: Severe convective storms—An overview.

*Severe Convective Storms, Meteor. Monogr.*, No. 50, Amer. Meteor. Soc., 1–26.Doswell, C. A., III, H. E. Brooks, and M. P. Kay, 2005: Climatological estimates of daily local nontornadic severe thunderstorm probability for the United States.

,*Wea. Forecasting***20**, 577–595, doi:10.1175/WAF866.1.Dotzek, N., J. Grieser, and H. E. Brooks, 2003: Statistical modeling of tornado intensity distributions.

,*Atmos. Res.***67–68**, 163–187, doi:10.1016/S0169-8095(03)00050-4.Dotzek, N., P. Groenemeijer, B. Feuerstein, and A. M. Holzer, 2009: Overview of ESSL’s severe convective storms research using the European Severe Weather Database ESWD.

,*Atmos. Res.***93**, 575–586, doi:10.1016/j.atmosres.2008.10.020.Fisher, R. A., and L. H. C. Tippett, 1928: Limiting forms of the frequency distribution of the largest or smallest member of a sample.

,*Math. Proc. Cambridge Philos. Soc.***24**, 180–190, doi:10.1017/S0305004100015681.Fraile, R., A. Castro, and J. Sánchez, 1992: Analysis of hailstone size distributions from a hailpad network.

,*Atmos. Res.***28**, 311–326, doi:10.1016/0169-8095(92)90015-3.Fraile, R., C. Berthet, J. Dessens, and J. L. Sánchez, 2003: Return periods of severe hailfalls computed from hailpad data.

,*Atmos. Res.***67–68**, 189–202, doi:10.1016/S0169-8095(03)00051-6.Frechet, M., 1927: Sur la loi de probabilité’ de l’cart maximum.

,*Ann. Soc. Pol. Math.***6**, 93–117.Gilleland, E., B. G. Brown, and C. M. Ammann, 2013: Spatial extreme value analysis to project extremes of large-scale indicators for severe weather.

,*Environmetrics***24**, 418–432, doi:10.1002/env.2234.Gumbel, E. J., 1958:

*Statistics of Extremes.*Columbia University Press, 375 pp.Gunturi, P., and M. K. Tippett, 2017: Managing severe thunderstorm risk: Impact of ENSO on U.S. tornado and hail frequencies. WillisRe Tech. Rep., 5 pp., http://www.willisre.com/Media_Room/Press_Releases_(Browse_All)/2017/WillisRe_Impact_of_ENSO_on_US_Tornado_and_Hail_frequencies_Final.pdf.

Heymsfield, A. J., I. M. Giammanco, and R. Wright, 2014: Terminal velocities and kinetic energies of natural hailstones. *Geophys. Res. Lett.*, **41**, 8666–8672, doi:10.1002/2014GL062324.

Hosking, J. R. M., J. R. Wallis, and E. F. Wood, 1985: Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. *Technometrics*, **27**, 251–261, doi:10.1080/00401706.1985.10488049.

Jenkinson, A. F., 1955: The frequency distribution of the annual maximum (or minimum) values of meteorological elements. *Quart. J. Roy. Meteor. Soc.*, **81**, 158–171, doi:10.1002/qj.49708134804.

Kelly, D. L., J. T. Schaefer, and C. A. Doswell III, 1985: Climatology of nontornadic severe thunderstorm events in the United States. *Mon. Wea. Rev.*, **113**, 1997–2014, doi:10.1175/1520-0493(1985)113<1997:CONSTE>2.0.CO;2.

Knight, C. A., and N. C. Knight, 2001: Hailstorms. *Severe Convective Storms, Meteor. Monogr.*, No. 50, Amer. Meteor. Soc., 223–248.

Martins, E. S., and J. R. Stedinger, 2000: Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. *Water Resour. Res.*, **36**, 737–744, doi:10.1029/1999WR900330.

McMaster, H., 1999: The potential impact of global warming on hail losses to winter cereal crops in New South Wales. *Climatic Change*, **43**, 455–476, doi:10.1023/A:1005475717321.

Morgan, G. M., Jr., and N. G. Towery, 1975: Small-scale variability of hail and its significance for hail prevention experiments. *J. Appl. Meteor.*, **14**, 763–770, doi:10.1175/1520-0450(1975)014<0763:SSVOHA>2.0.CO;2.

Munich RE, 2015: Severe convective storms and hail—Icy cricket balls from above. Accessed 25 November 2015, http://www.munichre.com/australia/australia-natural-hazards/australia-storm/hailstorm/index.html.

Nelson, S. P., 1983: The influence of storm flow structure on hail growth. *J. Atmos. Sci.*, **40**, 1965–1983, doi:10.1175/1520-0469(1983)040<1965:TIOSFS>2.0.CO;2.

Nelson, S. P., 1987: The hybrid multicellular–supercellular storm—An efficient hail producer. Part II: General characteristics and implications for hail growth. *J. Atmos. Sci.*, **44**, 2060–2073, doi:10.1175/1520-0469(1987)044<2060:THMSEH>2.0.CO;2.

Nelson, S. P., and S. K. Young, 1979: Characteristics of Oklahoma hailfalls and hailstorms. *J. Appl. Meteor.*, **18**, 339–347, doi:10.1175/1520-0450(1979)018<0339:COOHAH>2.0.CO;2.

Ortega, L., T. M. Smith, K. L. Manross, K. A. Scharfenberg, A. Witt, A. G. Kolodziej, and J. J. Gourley, 2009: The Severe Hazards Analysis and Verification Experiment. *Bull. Amer. Meteor. Soc.*, **90**, 1519–1530, doi:10.1175/2009BAMS2815.1.

Palutikof, J., B. Brabson, D. Lister, and S. Adcock, 1999: A review of methods to calculate extreme wind speeds. *Meteor. Appl.*, **6**, 119–132, doi:10.1017/S1350482799001103.

Pavia, E. G., and J. J. O’Brien, 1986: Weibull statistics of wind speed over the ocean. *J. Climate Appl. Meteor.*, **25**, 1324–1332, doi:10.1175/1520-0450(1986)025<1324:WSOWSO>2.0.CO;2.

Sánchez, J., R. Fraile, J. De La Madrid, M. De La Fuente, P. Rodríguez, and A. Castro, 1996: Crop damage: The hail size factor. *J. Appl. Meteor.*, **35**, 1535–1541, doi:10.1175/1520-0450(1996)035<1535:CDTHSF>2.0.CO;2.

Sander, J., J. Eichner, E. Faust, and M. Steuer, 2013: Rising variability in thunderstorm-related U.S. losses as a reflection of changes in large-scale thunderstorm forcing. *Wea. Climate Soc.*, **5**, 317–331, doi:10.1175/WCAS-D-12-00023.1.

Schaefer, J. T., and R. Edwards, 1999: The SPC Tornado/Severe Thunderstorm Database. Preprints, *11th Conf. on Applied Climatology*, Dallas, TX, Amer. Meteor. Soc., 215–220.

Schaefer, J. T., J. J. Levit, S. J. Weiss, and D. W. McCarthy, 2004: The frequency of large hail over the contiguous United States. *14th Conf. on Applied Climatology*, Seattle, WA, Amer. Meteor. Soc., 3.3, https://ams.confex.com/ams/pdfpapers/69834.pdf.

Smith, P. L., and A. Waldvogel, 1989: On determinations of maximum hailstone sizes from hailpad observations. *J. Appl. Meteor.*, **28**, 71–76, doi:10.1175/1520-0450(1989)028<0071:ODOMHS>2.0.CO;2.

Strong, G., and E. Lozowski, 1977: An Alberta study to objectively measure hailfall intensity. *Atmosphere*, **15**, 33–53.

Swiss RE, 2017: Natural catastrophes and man-made disasters in 2016: A year of widespread damages. Swiss RE Rep. 2/2017, 42 pp., http://media.swissre.com/documents/sigma2_2017_en.pdf.

Wilks, D. S., 2006: Comparison of ensemble-MOS methods in the Lorenz ’96 setting. *Meteor. Appl.*, **13**, 243–256, doi:10.1017/S1350482706002192.

Ziegler, C. L., P. S. Ray, and N. C. Knight, 1983: Hail growth in an Oklahoma multicell storm. *J. Atmos. Sci.*, **40**, 1768–1791, doi:10.1175/1520-0469(1983)040<1768:HGIAOM>2.0.CO;2.