Deriving Severe Hail Likelihood from Satellite Observations and Model Reanalysis Parameters Using a Deep Neural Network

Benjamin Scarino, NASA Langley Research Center, Hampton, Virginia
Kyle Itterly, Analytical Mechanics Associates, Hampton, Virginia
Kristopher Bedka, NASA Langley Research Center, Hampton, Virginia
Cameron R. Homeyer, School of Meteorology, University of Oklahoma, Norman, Oklahoma
John Allen, Central Michigan University Institute for Great Lakes Research, Mount Pleasant, Michigan
Sarah Bang, NASA Marshall Space Flight Center, Huntsville, Alabama
Daniel Cecil, NASA Marshall Space Flight Center, Huntsville, Alabama

Open access

Abstract

Geostationary satellite imagers provide historical and near-real-time observations of cloud-top patterns that are commonly associated with severe convection. Environmental conditions favorable for severe weather are thought to be represented well by reanalyses. Predicting exactly where convection and costly storm hazards like hail will occur using models or satellite imagery alone, however, is extremely challenging. The multivariate combination of satellite-observed cloud patterns with reanalysis environmental parameters, linked to Next Generation Weather Radar (NEXRAD) estimated maximum expected size of hail (MESH) using a deep neural network (DNN), enables estimation of potentially severe hail likelihood for any observed storm cell. These estimates are made where satellites observe cold clouds, indicative of convection, located in favorable storm environments. We seek an approach that can be used to estimate climatological hailstorm frequency and risk throughout the historical satellite data record. Statistical distributions of convective parameters from satellite and reanalysis show separation between nonsevere and severe hailstorm classes for predictors that include overshooting cloud-top temperature and area characteristics, vertical wind shear, and convective inhibition. These complex, multivariate predictor relationships are exploited within a DNN to produce a likelihood estimate with a critical success index of 0.511 and Heidke skill score of 0.407, which is exceptional among analogous hail studies. Furthermore, applications of the DNN to case studies demonstrate good qualitative agreement between hail likelihood and MESH. These hail classifications are aggregated across an 11-yr Geostationary Operational Environmental Satellite (GOES) image database from GOES-12/13 to derive a hail frequency and severity climatology, which denotes the central Great Plains, the Midwest, and northwestern Mexico as being the most hail-prone regions within the domain studied.

© 2023 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Publisher’s Note: This article was revised on 18 September 2023 to correct the affiliation of the first author, which was incorrect when originally published.

Corresponding author: Benjamin Scarino, benjamin.r.scarino@nasa.gov

1. Introduction

Deep convection is ubiquitous across the globe and integral to Earth’s climate system. It is responsible for redistributing heat, moisture, and momentum, as well as transporting water vapor into the stratosphere. Deep convection produces beneficial rainfall but can also generate severe weather conditions that adversely impact society, such as damaging wind, large hail, tornadoes, lightning, flooding rainfall, and aviation icing and turbulence (Yost et al. 2018; Mecikalski et al. 2021). Hail damage produces approximately 60% of the average annualized loss of the three primary severe weather hazards in the United States: straight-line winds, hail, and tornadoes (Gunturi and Tippett 2017). Promoting resilience against such hazards on local and global scales is a primary goal of the NASA Applied Sciences Disasters program, which seeks to encourage use of satellite observations to quantify and mitigate risk (NASA Earth Science Applied Sciences Disasters 2021).

Extensive research using rawinsonde, numerical weather prediction, and reanalysis profiles in environments near severe weather events indicates that both the hazards a storm produces and the storm’s structure are by-products of its thermodynamic and kinematic environments. Parameters such as convective available potential energy (CAPE), vertical wind shear, midtropospheric lapse rate, lower-tropospheric mixing ratio, and statistical combinations of these and many other predictors have been found useful for developing climatologies of, and forecasting, severe storms (e.g., Brooks et al. 2003; Johnson and Sugden 2014; Púčik et al. 2015; Prein and Holland 2018; Taszarek et al. 2020). Severe storm environments are often complex and can change significantly in time and space, with variations that may not be adequately captured by observations and models (Coniglio and Parker 2020; Coniglio and Jewell 2022). These complexities lead to some disagreements in reanalyses such as the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2; Gelaro et al. 2017), and the fifth major global reanalysis produced by ECMWF (ERA5; Hersbach et al. 2020), especially near topographic boundaries (Taszarek et al. 2021). Not every storm cell that forms in a favorable severe storm environment produces severe weather, and storms in close proximity do not necessarily produce the same hazards or hazards of the same intensity. Observations of storms from remote sensing instruments, such as ground-based radars, satellite imagers, and lightning detection networks, are therefore needed to better differentiate severe from nonsevere storm cells.

Studies have shown that severe storms exhibit a variety of signatures in satellite observations near in time to reported severe weather events. Cold, textured cloud tops embedded within a warmer, smooth cirrus anvil indicate the presence of strong updrafts that have penetrated through the level of neutral buoyancy, and often into the stratosphere. These overshooting cloud tops (OTs) can be detected and characterized in terms of their tropopause-relative infrared (IR) brightness temperatures (BTs), BT difference (BTD) between the OT and surrounding anvil, OT area, and height, using automated methods (e.g., Marion et al. 2019; Khlopenkov et al. 2021). The presence of an OT is statistically correlated with severe weather reports (Dworak et al. 2012; Bedka and Khlopenkov 2016; Bedka et al. 2018; Khlopenkov et al. 2021). Additionally, ice particles within intense convection and hailstorms scatter microwave radiation emitted by the surface before it can reach a satellite sensor, leading to BT depressions observed by passive microwave imagers such as the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI), the Global Precipitation Measurement Mission (GPM) Microwave Imager (GMI), and the Advanced Microwave Scanning Radiometer 2 (AMSR2) on board the Global Change Observation Mission–Water (GCOM-W1) satellite. Hailstorms generate notable BT depressions in lower-frequency (10–37 GHz) passive microwave observations. These multispectral signals have been recently combined with human-spotter hail reports to derive hail probability for convective feature detections (Bang and Cecil 2019; Bruick et al. 2019; Bang and Cecil 2021).

The Geostationary Operational Environmental Satellite (GOES) series has observed IR BT at 4-km pixel⁻¹ nadir resolution, sufficient for resolving and characterizing OTs, since the launch of GOES-8 in 1994. The GOES-8 to GOES-15 series collected imagery, typically at 15-min intervals over North America, that is useful for determining the climatology of severe storms throughout the diurnal cycle over much of the Western Hemisphere at much higher spatial resolution than is possible from reanalyses. OT detection climatologies have been used to define hailstorm risk for application within the reinsurance industry (e.g., Punge et al. 2017, 2023). The GOES-R satellite series, beginning with GOES-16, which provided initial preoperational imagery in 2017, observes IR BT with its Advanced Baseline Imager (ABI) at 2-km pixel⁻¹ nadir resolution, at 5-min intervals over North America and 10–15-min intervals over a hemispheric (i.e., “full disk”) field of view. This improved ABI image quality represents a discontinuity in the GOES climate data record because spatial resolution has a significant impact on cloud-top appearance and apparent intensity, which can affect severe storm detection algorithm performance (Khlopenkov et al. 2021; Cooney et al. 2021).

Although severe weather can be generated by storms with OTs, many OT-producing storms do not generate it because the environment is not conducive, owing to deficient or excessive wind shear, an unfavorable hodograph shape, or insufficient moisture or instability. Nevertheless, studies analyzing limited samples of data suggest that OTs with colder IR BT and larger area are more likely to be associated with heavy rainfall and severe weather (Dworak et al. 2012; Marion et al. 2019; Sandmæl et al. 2019; Khlopenkov et al. 2021). Establishing the extent to which these findings hold, however, requires more rigorous analysis, for example, one that examines the influence of the regional meteorological environment. This dependence on the environment suggests that a combination of storm environmental conditions and satellite observations could be used to better depict the climatology of severe storms (Cintineo et al. 2018, 2020).

In addition to complexities associated with severe storm detection, knowledge of exactly when and where severe weather has occurred is often lacking. Deep convection can evolve rapidly, and uncertainties in human-spotter report timing and location can confound our ability to develop statistical relationships between cloud properties and severe weather occurrence. Spotter hail reports, for example, are known to be biased toward higher population density, daytime hours, and regions with greater road density (Allen and Tippett 2015; Ortega 2018; Elmore et al. 2022). Hail sizes are frequently underreported due to rapid melting and misreported when estimated by comparison with objects of known size (e.g., a golf ball), and hail events coincident with tornadoes are often omitted (Allen and Tippett 2015). The use of hail diameter as a proxy for storm intensity presents another fundamental uncertainty in hail reports, because diameter fails to sufficiently describe the mass or density of hailstones and is difficult to measure accurately (Doswell et al. 2005; Allen and Tippett 2015). To offset these limitations, the maximum expected size of hail (MESH), derived from a vertical integration of radar reflectivity at horizontal polarization above the freezing level, is commonly employed to estimate hail occurrence and size (Witt et al. 1998; Murillo and Homeyer 2019). MESH has been used to derive high-spatial-resolution hail climatologies over the contiguous United States (CONUS; Cintineo et al. 2012; Murillo et al. 2021; Wendt and Jirak 2021) that are consistent with best-available ground report climatologies (Allen and Tippett 2015; Taszarek et al. 2020).

Long-term GOES satellite imager and MESH climatologies over the CONUS provide a new opportunity to explore satellite-derived statistical properties of hailstorm cloud tops, the environmental characteristics supportive of hailstorms depicted by reanalyses, and the detectability of hailstorms using GOES satellite data, reanalysis data, and a combination of the two. In this paper, we analyze an 11-yr record of GOES-12/13 data, as well as a period during the 2017 warm season (April–August) when GOES-13 and GOES-16 provided nearly coincident measurements during the GOES-16 preoperational checkout period. We focus on the GOES-12/13 data here to demonstrate how hailstorms could be detected using the nearly 24-yr GOES-8 to GOES-15 data record, because these satellites carried imagers with consistent spatiotemporal resolution. The NASA MERRA-2 and ECMWF ERA5 reanalyses are analyzed together to determine the sensitivity of hailstorm detectability to the choice of reanalysis, as well as to other input variations. GOES and reanalysis data are combined and aligned with MESH-identified convective cells within a deep neural network (DNN) to determine the feasibility of discriminating potentially severe hailstorms from subsevere (or nonsevere) hailstorms. The relative importance of various GOES and reanalysis parameters for severe hailstorm discrimination was assessed, and the selected parameters form the basis of the final DNN. With emphasis on satellite-identified storm signatures and their properties, this approach to estimating severe hail likelihood builds on similar deep learning and machine learning efforts that tied radar- and model-identified cell features and lightning detection to hail reports, MESH, severe weather warnings, or a simulated hail-size reference through use of a regularized regression model, a convolutional neural network, or random forest (RF) classifier and RF regression models (Gagne et al. 2017; Czernecki et al. 2019; Gagne et al. 2019; Burke et al. 2020; Gensini et al. 2021; Mecikalski et al. 2021). We show that a likelihood for potentially severe hail can be estimated with relatively high accuracy using a combination of GOES and reanalysis data, exceeding the predictive skill of a DNN model that uses reanalysis or GOES inputs alone. Note that terms like predictive, and other forms of the word, refer to the DNN output, which is commonly called a prediction, and should not be confused with a meteorological forecast. This work aims to develop a DNN model that can estimate severe hail likelihood for satellite-identified convective signatures as an initial step toward global application, which would especially benefit regions without a long-term database of weather radar observations.

2. Data

a. GOES convective cloud-top characteristics

Automated detections of OT and anvil characteristics are derived from 10.7-μm GOES-12/13 and 10.3-μm GOES-16 IR BTs using algorithms described by Khlopenkov et al. (2021) and validated and analyzed by Cooney et al. (2021). GOES IR BT observations are first resampled to a grid and then converted to tropopause-relative values (IR − tropopause) by differencing the IR BT and a tropopause temperature derived from the MERRA-2 reanalysis; this measurement is referred to as a BT score. Tropopause temperature is spatially smoothed and interpolated in time and space to the satellite image time and pixel grid. Pixels with IR − tropopause values up to 35 K warmer than the tropopause are further analyzed as possible anvils and OTs. Local BT-score maxima within anvils, referred to as embedded cold spots (ECSs) throughout this paper, are then identified and serve as OT candidates. Note that the absolute BTD between the ECS and the anvil (ECS − anvil BTD) is at least 1 K (the ECS being colder than the anvil), so hailstorms without an ECS detection have little to no IR BT variation within their anvils. Derivations of IR anvil rating, anvil cloud extent surrounding each ECS, ECS area, and the ECS OT probability are further explained by Khlopenkov et al. (2021). ECS area is defined here as the cumulative area of pixels corresponding to a unique ECS identifier after parallax correction, with one additional condition: each ECS pixel must be colder than the coldest pixel’s IR BT plus 0.75 times the absolute ECS − anvil BTD, which ensures that only relatively prominent portions of each ECS are considered in the area calculation.
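The tropopause-relative screening and the ECS-area rule above can be illustrated with a short NumPy sketch. It is not the Khlopenkov et al. (2021) software: the function names are hypothetical, and we assume a single representative anvil BT per ECS and an ECS mask that is already parallax corrected.

```python
import numpy as np

def ir_minus_tropopause(ir_bt_k, tropopause_t_k):
    """Tropopause-relative IR BT (K); negative values are colder than the tropopause."""
    return np.asarray(ir_bt_k, dtype=float) - np.asarray(tropopause_t_k, dtype=float)

def anvil_candidates(rel_bt_k, max_warm_k=35.0):
    """Pixels no more than 35 K warmer than the tropopause are kept as possible anvil/OT pixels."""
    return np.asarray(rel_bt_k) <= max_warm_k

def ecs_area_km2(ir_bt_k, ecs_mask, anvil_bt_k, pixel_area_km2, factor=0.75):
    """Area of the 'prominent' part of one ECS (hypothetical helper).

    ecs_mask   -- boolean mask of pixels sharing one ECS identifier
    anvil_bt_k -- representative anvil IR BT around the ECS (assumed to be a scalar)
    """
    ir_bt_k = np.asarray(ir_bt_k, dtype=float)
    coldest = ir_bt_k[ecs_mask].min()
    btd = anvil_bt_k - coldest            # magnitude of the ECS - anvil BTD (>= 1 K)
    threshold = coldest + factor * btd    # pixels must be colder than this
    prominent = ecs_mask & (ir_bt_k < threshold)
    return float(prominent.sum()) * pixel_area_km2
```

For a toy scene with a coldest pixel of 210 K and an anvil BT of 230 K, the threshold is 210 + 0.75 × 20 = 225 K, so only pixels colder than 225 K count toward the area.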

GOES-12/13 OT detections during 2007–17 within 23°–50°N, 65°–115°W, a region that includes most of the CONUS, subtropical convectively active regions over northern and western Mexico, the Gulf of Mexico, and the Gulf Stream, are matched with convective cells defined using hourly NEXRAD MESH. GOES-12/13 scanned the CONUS at approximately 15-min intervals during normal operations and at 7.5-min intervals when operated in rapid-scan mode during high-impact weather events. As described in section 3b, GOES-12/13 data can be matched with MESH convective cells when collected within ±7.5 min of the top of the hour. No matches were possible when full-disk scanning limited the imaging frequency to every 30 min, which occurred every 3 h (0000, 0300, 0600, . . ., 2100 UTC). GOES-16 full-disk scans from April through August 2017, during its preoperational period, were also analyzed at 15-min intervals for consistency with typical GOES-12/13 operational scanning. Additionally, GOES-16 1-min Mesoscale Domain Sector data were subsampled to emulate the 5-min CONUS scan interval for case-study analyses. GOES data were acquired from the University of Wisconsin–Madison Space Science and Engineering Center via the Man–Computer Interactive Data Access System-X (McIDAS-X) software package (Lazzara et al. 1999).
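The ±7.5-min matching rule can be written as a small helper using Python’s standard datetime module. The function names are hypothetical, and treating the ±7.5-min boundary as inclusive is our assumption.

```python
from datetime import datetime, timedelta

def nearest_top_of_hour(t):
    """Round a scan time to the nearest top of the hour."""
    floor = t.replace(minute=0, second=0, microsecond=0)
    return floor + timedelta(hours=1) if t - floor >= timedelta(minutes=30) else floor

def match_to_hourly_mesh(scan_time, window_min=7.5):
    """Return the hourly MESH time a GOES scan matches, or None if outside +/- window_min."""
    hour = nearest_top_of_hour(scan_time)
    return hour if abs(scan_time - hour) <= timedelta(minutes=window_min) else None
```

A scan at 1455 UTC matches the 1500 UTC MESH field, whereas a scan at 1452 UTC (8 min early) is not matched.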

b. Maximum expected size of hail

The NOAA NEXRAD network collects volumetric scans of horizontal reflectivity factor at many elevation angles, which are combined to generate the MESH product. Percentiles of the unfiltered MESH are derived by combining isotherm heights from the National Centers for Environmental Prediction Rapid Refresh model with echo-top heights from the 3D Gridded NEXRAD WSR-88D (GridRad) dataset, which are then scaled to observed hail-size reports using an empirically derived power-law fit. Specifically, observed ≥40-dBZ echo-top heights above the freezing level are integrated and scaled to derive a severe hail index (SHI). Power-law relationships between SHI and the 75th and 95th percentiles of ground-reported hail sizes are then fit empirically, yielding the 75th- and 95th-percentile MESH (MESH75 and MESH95; Homeyer and Kumjian 2015; Homeyer and Bowman 2017; Murillo and Homeyer 2019).
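The SHI and power-law steps are described only briefly above. As a rough sketch, the code below follows the original Witt et al. (1998) formulation as we understand it: reflectivity is weighted linearly from 0 at 40 dBZ to 1 at 50 dBZ, height is weighted linearly from 0 at the freezing level to 1 at the −20°C level, and SHI = 0.1 ∫ W_T(H) Ė dH with Ė = 5 × 10⁻⁶ × 10^(0.084Z) W(Z). It is not the GridRad implementation, and the default coefficients of the hypothetical `mesh_power_law` helper (MESH = 2.54 SHI^0.5) are the original Witt et al. fit, not the MESH75 and MESH95 coefficients of Murillo and Homeyer (2019), which would have to be substituted.

```python
import numpy as np

def reflectivity_weight(z_dbz, z_low=40.0, z_high=50.0):
    """W(Z): 0 at or below 40 dBZ, linear ramp to 1 at 50 dBZ and above."""
    return np.clip((z_dbz - z_low) / (z_high - z_low), 0.0, 1.0)

def temperature_weight(h_m, h0_m, hm20_m):
    """W_T(H): 0 below the freezing level H0, linear ramp to 1 at the -20 C height."""
    return np.clip((h_m - h0_m) / (hm20_m - h0_m), 0.0, 1.0)

def severe_hail_index(z_dbz, h_m, h0_m, hm20_m):
    """SHI (J m-1 s-1) = 0.1 * integral of W_T(H) * E_dot(H) dH, trapezoidal rule over the profile."""
    z = np.asarray(z_dbz, dtype=float)
    h = np.asarray(h_m, dtype=float)
    e_dot = 5.0e-6 * 10.0 ** (0.084 * z) * reflectivity_weight(z)  # hail kinetic energy flux, J m-2 s-1
    integrand = temperature_weight(h, h0_m, hm20_m) * e_dot
    return 0.1 * float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(h)))

def mesh_power_law(shi, a=2.54, b=0.5):
    """MESH (mm) = a * SHI**b; defaults are the original Witt et al. (1998) coefficients."""
    return a * shi ** b
```

Because the weights are zero below the freezing level and below 40 dBZ, the integral can simply run over the whole column profile.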

Localized MESH maxima associated with convective cells (see section 3a) are identified from hourly, 2-km GridRad MESH over the CONUS (25°–49°N, 69°–115°W; Bowman and Homeyer 2017). These cells are used as the basis for aggregating GOES and reanalysis data for training and validating the DNN. Case studies shown in this paper take advantage of a recently available 5-min GridRad Severe MESH dataset (University of Oklahoma School of Meteorology 2021).

c. Ground-spotter storm reports

Reports of hail diameter exceeding 1 in. (25.4 mm) were obtained from the NOAA Storm Prediction Center (SPC) Severe Weather Database (https://www.spc.noaa.gov/wcm/) from January 2007 through December 2017 (42 040 total reports). We examine GOES and reanalysis properties as a function of reported hail size, along with MESH and passive microwave hail probabilities from TMI, GMI, and AMSR2. As noted above, hail-size reporting suffers from several inadequacies, compounded by uncertainty regarding the locations of subsevere and non-hail-producing storms. These shortcomings motivate the use of GridRad MESH as a hail proxy.

d. Passive microwave hail detections

Another proxy for hail occurrence is derived from satellite passive microwave radiometer (MWR) data. Ice particles in an atmospheric column scatter the microwave radiation emitted by Earth’s surface, with a magnitude that depends on the particle size, number concentration, and the wavelength band being observed (Cecil 2009). Such scattering lowers the MWR multispectral BTs. Precipitation features, that is, sets of contiguous MWR pixels with low BTs thought to be caused by convection, are first constructed (Nesbitt et al. 2000; Cecil et al. 2005; Liu et al. 2008). The features are defined as contiguous areas with an 85-GHz BT (TMI) or 89-GHz BT (GMI and AMSR2) below 200 K. An algorithm to determine the probability of severe hail was developed based on an empirical relationship between SPC severe hail reports and the multispectral MWR BTs (Bang and Cecil 2019, 2021). The precessing orbits of the TRMM and GPM satellites enable observations of deep convection at all hours, thereby mitigating observation-time biases, whereas AMSR2 flies on the sun-synchronous GCOM-W1 satellite and collects observations only at 0130 and 1330 local time. Microwave hail detections have been used to construct global climatologies on a 2° × 2° grid, normalized by sampling opportunities, which show generally good spatial agreement with climatologies derived from MESH and ground hail reports over the CONUS (Bang and Cecil 2019, 2021). These MWR hail measurements provide an alternative perspective on hailstorm intensity that is useful for comparison with the relationships found from GOES and reanalysis and with the hail-size metrics produced from MESH and the SPC.
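Building precipitation features amounts to connected-component labeling of the cold-BT mask. A minimal sketch using a breadth-first search is shown below; the function name and the 4-connectivity (edge-sharing pixels) are our assumptions, and in practice `scipy.ndimage.label`, whose default structuring element is also 4-connected in 2D, would do the same job.

```python
from collections import deque
import numpy as np

def precipitation_features(bt_k, threshold_k=200.0):
    """Label 4-connected regions where the 85/89-GHz BT is below threshold_k.

    Returns (labels, n_features); labels is 0 outside features and 1..n inside.
    """
    cold = np.asarray(bt_k, dtype=float) < threshold_k
    labels = np.zeros(cold.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(cold)):
        if labels[start]:
            continue  # pixel already belongs to a labeled feature
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < cold.shape[0] and 0 <= cc < cold.shape[1]
                if inside and cold[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = n
                    queue.append((rr, cc))
    return labels, n
```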

e. Global reanalyses

Two state-of-the-art atmospheric reanalyses are employed to explore hail likelihood-estimation sensitivity to meteorological variables derived from models with distinct parameterizations, means of data assimilation, and spatiotemporal resolutions. Despite their coarse resolution relative to convective-scale processes, global reanalyses have been used in numerous studies to identify large-scale environmental conditions favorable for severe weather and to assess predictability of severe weather (Brooks et al. 2003; Brooks 2009; Punge et al. 2014; Allen et al. 2015; Punge et al. 2017; King and Kennedy 2019; Taszarek et al. 2020; Gensini et al. 2021). Validation of these reanalyses against observed soundings suggests strong performance over the CONUS for both thermodynamic and kinematic parameters, particularly outside of the boundary layer (Taszarek et al. 2021). Additionally, the continuous global coverage offered by reanalyses provides a framework for applying the model to regions beyond the CONUS. A set of environmental parameters thought to be useful for identifying hailstorm environments, described in section 3c, was derived from the ERA5 and MERRA-2 reanalyses using the Python package xcape (https://github.com/xgcm/xcape; Lepora et al. 2021). ERA5 is the latest global reanalysis product from the European Centre for Medium-Range Weather Forecasts (ECMWF). It has horizontal grid spacing of 0.25° × 0.25°, 137 vertical levels, and a 1-h output interval (Hersbach et al. 2020). MERRA-2 is the latest global reanalysis product from NASA’s Global Modeling and Assimilation Office (GMAO). It has horizontal grid spacing of 0.5° × 0.625°, 72 vertical levels, and a 3-h output interval (Gelaro et al. 2017).

3. Methods

An 11-yr record of GOES-derived cloud-top properties for ECSs, spatially and temporally aligned with MESH convective cells, model reanalysis parameters, SPC hail-size reports, and MWR hail probability, was assembled using methods described below. We seek to identify cloud-top patterns and environmental parameters consistent with potentially severe hailstorms. A DNN allows for discovery of less intuitive, multidimensional input combinations of such parameters, which leads to enhanced predictive capability. These predictors and their roles in various DNN input sensitivity experiments are defined, as are the DNN architecture and evaluation metrics.

a. MESH cell feature database

A database of cell object characteristics is assembled by tuning the open-source Python package Tracking and Object-Based Analysis of Clouds, version 1.2 (tobac; Heikenfeld et al. 2019), to detect local MESH95 maxima exceeding 10 mm. The location and maximum value of MESH95 for each feature are stored. Feature maxima must be spaced by at least 28 × 28 km² (7 × 7 GOES-12/13 pixels) to facilitate extraction of GOES parameters around each MESH95 maximum. All settings used within tobac, version 1.2, are listed in Table 1.
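tobac’s feature detection (Table 1) is more elaborate than a single threshold. As a simplified, hypothetical stand-in, the NumPy sketch below keeps MESH95 maxima above 10 mm that are the largest value within a square neighborhood, so retained peaks are separated by at least `min_sep_cells` grid cells. The separation is counted in grid cells: 7 cells corresponds to 28 km only on a 4-km grid, and about 14 cells would be needed on the 2-km GridRad grid.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mesh_maxima(mesh95_mm, threshold_mm=10.0, min_sep_cells=7):
    """Return (row, col) of local MESH95 maxima above threshold_mm.

    A cell is kept only if it is the largest value within +/-(min_sep_cells - 1)
    cells, so strict maxima end up at least min_sep_cells apart (ties are all kept).
    """
    field = np.asarray(mesh95_mm, dtype=float)
    half = min_sep_cells - 1
    padded = np.pad(field, half, mode="constant", constant_values=-np.inf)
    window = 2 * half + 1
    local_max = sliding_window_view(padded, (window, window)).max(axis=(-2, -1))
    peaks = (field > threshold_mm) & (field == local_max)
    rows, cols = np.nonzero(peaks)
    return list(zip(rows.tolist(), cols.tolist()))
```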

Table 1.

Feature detection and watershed thresholds used for MESH cell detection in tobac.

The ECS matching requirement is expected to exclude less-confident MESH detections while anchoring DNN predictions to satellite-observed storm cells. Nonsevere MESH95 cells [<1.5 in. (38.1 mm) in diameter; see section 3d(2)] match with an ECS only about 25% of the time, suggesting that most such cells have either warm cloud tops or tops with little IR BT variation (Table 2). Cells with larger (≥1.5 in.) MESH95 diameter, a value corresponding to potentially severe hail, have a 59% match rate with an ECS. The match rate increases to 73% for ≥2-in. (≥50.8 mm) MESH95, indicating that cells with larger estimated hail size tend to have greater BT variation in their anvils. Furthermore, despite an ∼5:1 dominance of warm-season over cold-season matches, the consistent matching frequencies across seasons suggest that the relationship between MESH95 and ECS occurrence is largely independent of seasonality, except for a slightly higher incidence of significantly large hail in the warm season.
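The match rates in Table 2 are simple conditional frequencies. The sketch below computes them for toy data; the function name is hypothetical, and treating the size classes as cumulative (≥2-in. cells also counted in the ≥1.5-in. class) is our assumption.

```python
import numpy as np

def ecs_match_rates(mesh95_mm, has_ecs):
    """Fraction of MESH95 cells matched with an ECS, by (cumulative) size class."""
    mesh = np.asarray(mesh95_mm, dtype=float)
    matched = np.asarray(has_ecs, dtype=bool)
    classes = {
        "nonsevere (<38.1 mm)": mesh < 38.1,
        "potentially severe (>=38.1 mm)": mesh >= 38.1,
        "potentially significant severe (>=50.8 mm)": mesh >= 50.8,
    }
    # NaN marks a class with no cells in the sample
    return {name: float(matched[sel].mean()) if sel.any() else float("nan")
            for name, sel in classes.items()}
```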

Table 2.

ECS match frequencies of nonsevere, potentially severe, and potentially significant severe MESH95 signals with ECSs detected by GOES-12/13.

b. GOES convective cloud-top-property database

A set of GOES-12/13 cloud-top properties is produced at ECS locations from every available scan (371 489) during 2007–17 (Table 3). The same parameters are derived from GOES-16 data during the 2017 warm season (April–August). Properties such as mean anvil height and area of cold cloud are computed using pixels around each ECS. Pixels are parallax corrected based on a cloud-top-height retrieval derived from matching the ECS IR BT with the collocated MERRA-2 temperature and geopotential height profiles. When parallax correction assigns two pixels to the same gridded satellite pixel, only data for the most intense GOES-12/13 pixel (defined by minimum IR − tropopause) are stored. GOES ECS regions detected within ±7.5 min and 28 × 28 km² (30 × 30 km² for GOES-16) of each MESH95 cell maximum are compiled. Note that during periods of GOES-12/13 rapid-scan operation, more than one observation can fall within each 7.5-min window; in that case, the peak intensity among the multiple matches is used for every predictor in Table 3. The satellite and reanalysis parameters (section 3c) are stored alongside MESH data and other positional variables.
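The rule for parallax collisions, keeping the most intense pixel when two pixels map to the same grid cell, can be sketched with a dictionary keyed on grid indices. The record layout and function name are hypothetical.

```python
def resolve_parallax_collisions(pixels):
    """Keep one record per grid cell after parallax correction.

    pixels -- iterable of (row, col, ir_minus_trop_k, record) tuples, where (row, col)
              is the parallax-corrected grid cell. When several pixels land in the
              same cell, the one with the minimum IR - tropopause (most intense) wins.
    """
    kept = {}
    for row, col, score, record in pixels:
        key = (row, col)
        if key not in kept or score < kept[key][0]:
            kept[key] = (score, record)
    return kept
```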

Table 3.

Satellite IR-derived cloud-top parameters evaluated in this study.

A snapshot of several GOES-12/13 and MESH properties is shown in Fig. 1 for a severe storm outbreak on 16 and 17 May 2017 that produced widespread large hail, damaging winds, and tornadoes. At 0355 UTC, MESH95 values exceeded 50 mm, especially within the regions where MESH areas surpassed 250–300 km² along a squall line in Texas and Oklahoma (Figs. 1a,e). The coincident GOES-12/13 scan shows cloud tops 5–10 K colder than the tropopause (Fig. 1b) and highly probable OTs (Fig. 1d) aligned with cells containing MESH95 of 30–50 mm or larger. Several larger-area ECSs (e.g., gray arrows in Fig. 1e) exhibit greater prominence relative to the background anvil temperature (Fig. 1c), suggesting that wider and colder updrafts were more likely to produce severe hail as observed by radar. Note that the northernmost indicated cell in western Oklahoma has rather weak MESH95 despite relatively strong temperature difference, OT probability, and ECS area signatures, which suggests that satellite observations alone may not be sufficient for consistently quantifying active hail events and could benefit from complementary knowledge of the broader storm environment.

Fig. 1.

Snapshot of select properties from 0355 UTC 17 May 2017 during a severe hail, tornado, and wind event in the southern Great Plains: (a) MESH95 and cell areas (white contour thickness is proportional to cell area), (b) IR − tropopause (GOES–MERRA-2) and IR anvil rating ≥ 20 (white contours), (c) ECS − anvil BTD, (d) OT probability, (e) ECS area, and (f) DNN-predicted likelihood of severe hail. Gray arrows highlight particularly prominent ECSs, with strong temperature signatures, high OT probability, and the largest areas.

Citation: Artificial Intelligence for the Earth Systems 2, 4; 10.1175/AIES-D-22-0042.1

c. Reanalysis matching with MESH cells and predictor importance analysis

A suite of convective environmental parameters, listed in Table 4, is derived using MERRA-2 and ERA5. The list includes common parameters for evaluating severe weather. Also included are indices and covariate predictors derived in combination, such as the energy helicity index (EHI; Brooks et al. 1994), which combines CAPE and storm-relative helicity (SRH) into one index for identifying environments favorable for rotating updrafts, and the significant hail parameter (SHIP), which is derived from most unstable CAPE (MUCAPE), mixing ratio, midlevel lapse rate, 500-hPa temperature, and 0–6-km vertical wind shear as follows (NOAA 2022):
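$$\mathrm{SHIP} = \frac{(\mathrm{MUCAPE}\ [\mathrm{J\,kg^{-1}}]) \times (\text{mixing ratio of MU parcel}\ [\mathrm{g\,kg^{-1}}]) \times (\mathrm{MIDLAPSE}\ [^{\circ}\mathrm{C\,km^{-1}}]) \times (-\mathrm{T500}\ [^{\circ}\mathrm{C}]) \times (\mathrm{SHEAR06}\ [\mathrm{m\,s^{-1}}])}{44\,000\,000}$$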
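The equation above translates directly into code. The sketch below is a minimal illustration, assuming scalar or NumPy-array inputs in the listed units and that the 500-hPa temperature enters with a negative sign (as in the operational SPC definition) so that a subfreezing 500-hPa level yields a positive SHIP. The function name `ship` is hypothetical, and any additional conditional adjustments used operationally are omitted.

```python
import numpy as np


def ship(mucape, mu_mixing_ratio, mid_lapse, t500, shear06, divisor=44_000_000.0):
    """Significant hail parameter (SHIP) as written in the equation above.

    mucape          : most unstable CAPE (J kg^-1)
    mu_mixing_ratio : mixing ratio of the most unstable parcel (g kg^-1)
    mid_lapse       : midlevel lapse rate (degC km^-1)
    t500            : 500-hPa temperature (degC); negated here (assumption) so a
                      subfreezing 500-hPa level contributes a positive factor
    shear06         : 0-6-km vertical wind shear (m s^-1)

    Works on scalars or NumPy arrays of matching shape. Conditional
    adjustments applied in some operational versions are not included.
    """
    return (np.asarray(mucape, dtype=float) * mu_mixing_ratio * mid_lapse
            * (-np.asarray(t500, dtype=float)) * shear06) / divisor


# Example: a moderately unstable, strongly sheared environment
print(ship(2000.0, 14.0, 7.5, -12.0, 25.0))  # roughly 1.43
```

Because the inputs are multiplied before division, a weak value in any one factor (e.g., low shear) suppresses SHIP even when the others are large.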
Several steps are performed to resample these parameters from the reanalysis resolution to the GOES satellite grid. To avoid contamination from parameterized convection and to capture the environment that is forcing the convection, the model parameter value of greatest absolute magnitude within an ∼1.5° × 1.8° latitude by longitude area (three by three grid boxes for MERRA-2 and six by seven grid boxes for ERA5) is found for each associated ECS (Brooks et al. 2003). These reanalysis parameters are then temporally interpolated to the satellite pixel scan time. Note that the tropopause height parameter is a standard output of the Khlopenkov et al. (2021) OT detection software, which always relies on MERRA-2. The other resampled reanalysis convective environmental parameters described in this section, aside from total precipitable water, are derived from either MERRA-2 or ERA5, depending on the experiment. Total precipitable water is always derived from MERRA-2.
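The two resampling steps can be sketched as follows, assuming each reanalysis field is a 2D NumPy array indexed by (row, column) and that the ECS has already been mapped to the nearest reanalysis grid box. The helper names are hypothetical; how an even-sized box (six rows for ERA5) is positioned around the ECS is an assumption, and linear interpolation is assumed for the temporal step.

```python
import numpy as np


def neighborhood_extreme(field, row, col, n_rows, n_cols):
    """Value with the largest absolute magnitude (sign retained) in an
    n_rows x n_cols box of reanalysis grid boxes around (row, col).

    Even box sizes cannot be centred exactly; the extra row/column is placed
    after the centre (an assumption). Boxes are truncated at the grid edge.
    """
    r0 = max(row - (n_rows - 1) // 2, 0)
    c0 = max(col - (n_cols - 1) // 2, 0)
    box = np.asarray(field)[r0:r0 + n_rows, c0:c0 + n_cols]
    idx = np.unravel_index(np.argmax(np.abs(box)), box.shape)
    return box[idx]


def interp_to_scan_time(value_before, value_after, t_before, t_after, t_scan):
    """Linear interpolation between two reanalysis times to the satellite scan time."""
    w = (t_scan - t_before) / (t_after - t_before)
    return (1.0 - w) * value_before + w * value_after


# Synthetic fields at two consecutive reanalysis times (times in minutes)
rng = np.random.default_rng(1)
shear_t0 = rng.normal(20.0, 5.0, size=(40, 40))
shear_t1 = rng.normal(20.0, 5.0, size=(40, 40))
# MERRA-2 uses 3 x 3 boxes; an ERA5 call would use n_rows=6, n_cols=7
v0 = neighborhood_extreme(shear_t0, 12, 30, 3, 3)
v1 = neighborhood_extreme(shear_t1, 12, 30, 3, 3)
print(interp_to_scan_time(v0, v1, 0.0, 60.0, 15.0))  # scan 15 min after the first time
```

Keeping the sign while selecting by absolute magnitude lets the same helper serve positive quantities (e.g., shear) and negative ones (e.g., CIN).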
Table 4.

Reanalysis model–derived convective parameters evaluated for this study. The table refers to both ERA5 and MERRA-2 models except where otherwise indicated.

Because several input parameters are composites of multiple variables, a DNN that uses all or most of these parameters would likely contain some degree of input redundancy. Furthermore, overuse of redundant parameters or of those with rotation-direction dependency (i.e., all STP, SCP, EHI, and SRH parameters) heightens the risk of overfitting to a specific environment, thereby limiting general application. As such, the experiments introduced here use parameters (Table 5) selected through recursive assessment of model performance, while minimizing use of inputs with strong regional dependency that would most inhibit global applicability (e.g., left-mover and right-mover dynamics and elevation-influenced lapse rates). The order in which predictors were recursively included as model features, determined by their overall contribution to the model’s critical success index, is shown as superscripts in Table 5. Including latitude and longitude as predictors might help address such unwanted regional influences, but doing so is undesirable for creation of a generalized model. Although the selected parameters likely still contain some regional bias, the dominant use of satellite inputs is expected to dampen such effects. Eight experiments are considered, defined in Table 6, each using only the inputs listed in Table 5.
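One common way to realize a recursive, CSI-driven ranking like the one behind the Table 5 superscripts is greedy forward selection. The sketch below assumes that procedure (the exact recursion used for Table 5 may differ) and takes a hypothetical `score_fn` that, in practice, would train a DNN on the given predictors and return its validation CSI; here a toy additive score stands in for that expensive step.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score_fn: Callable[[List[str]], float],
    max_features: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Greedy forward selection: repeatedly add the candidate predictor that
    gives the highest score (e.g., validation CSI) when combined with the
    predictors already chosen. Returns (predictor, score) in order of inclusion.
    """
    selected: List[str] = []
    ranking: List[Tuple[str, float]] = []
    remaining = list(candidates)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score each remaining candidate when added to the current set.
        scores = {name: score_fn(selected + [name]) for name in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, scores[best]))
    return ranking


# Toy stand-in for "train a DNN on these predictors and return validation CSI"
toy_weight = {"OT probability": 0.20, "ECS area": 0.15, "MUCIN": 0.08, "SHEAR01": 0.05}
print(forward_select(list(toy_weight), lambda feats: sum(toy_weight[f] for f in feats)))
```

Each round costs one model fit per remaining candidate, so the full ranking requires on the order of n²/2 fits for n candidates.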

Table 5.

Predictors found to be optimal for general detection of potentially severe hail ranked (shown by a superscript) by order of their importance as determined by recursive feature analysis.

Table 6.

Summary of input and application constraints for the nine DNN experiments.

d. Deep neural network

Classification of potentially severe hail is accomplished through the training and application of an artificial neural network (ANN), specifically, a DNN. Figure 2 outlines the basic DNN structure from input data X, through hidden layers with activations A, to the final-layer output signal Ŷ. Further explanation of the DNN setup and architecture is provided below.
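A forward pass through the network in Fig. 2 can be sketched in NumPy with the hidden-layer widths reported under Architecture. This is an illustration only: the input count of 13 is inferred from the 13 predictors mentioned in the climatological aggregation subsection, the He-normal weight initialization is an assumption, and no training is performed.

```python
import numpy as np

HIDDEN_SIZES = [75, 38, 17, 12, 11, 8, 7, 4]  # hidden-layer widths from the text
N_INPUTS = 13  # assumed: one input per predictor


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_inputs, hidden_sizes, rng):
    """He-normal weights and zero biases for every layer (initialization is an assumption)."""
    sizes = [n_inputs] + list(hidden_sizes) + [1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(X, params):
    """X (n_samples, n_inputs) -> Y_hat (n_samples,), a likelihood in [0, 1]."""
    A = X
    for W, b in params[:-1]:
        A = relu(A @ W + b)            # hidden activations A^[l]
    W, b = params[-1]
    return sigmoid(A @ W + b).ravel()  # single sigmoid output neuron


rng = np.random.default_rng(0)
params = init_params(N_INPUTS, HIDDEN_SIZES, rng)
X = rng.normal(size=(5, N_INPUTS))  # five standardized (z-score) samples
print(forward(X, params))
```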

Fig. 2.

Basic illustration of an L-layer neural network.

1) Architecture

The DNN output is a likelihood of potentially severe hail (MESH95 ≥ 1.5 in.) at each ECS detection. As such, a sigmoid activation function is used in the output-layer neuron, which is sensible for binary classification problems (Goodfellow et al. 2016). Sigmoid activation in the output layer produces a final prediction in terms of a probability ranging from 0.0 to 1.0. Model performance is assessed at a probability threshold p, at which a prediction is classified as either negative (Ŷ = 0 where Ŷ < p) or positive (Ŷ = 1 where Ŷ ≥ p). How p is determined is discussed below as well as in section 4c. Note that the DNN severe hail likelihoods are linearly scaled such that a prescaled likelihood of 0.8 would, instead, equal 1, which produces clearer visual distinction between lower and higher likelihoods of potentially severe hail.
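The decision rule and the display rescaling can be sketched as below. The threshold rule follows the text; treating the rescaling as division by 0.8 with raw likelihoods above 0.8 clipped to 1 is an assumption about how values beyond the anchor are handled, and the function names are hypothetical.

```python
import numpy as np


def classify(likelihood, p=0.5):
    """Binary decision: 1 (potentially severe) where likelihood >= p, else 0."""
    return (np.asarray(likelihood, dtype=float) >= p).astype(int)


def scale_for_display(likelihood, anchor=0.8):
    """Linear rescaling so a raw likelihood of `anchor` maps to 1.

    Clipping raw values above `anchor` to 1 is an assumption.
    """
    return np.clip(np.asarray(likelihood, dtype=float) / anchor, 0.0, 1.0)


raw = np.array([0.2, 0.5, 0.7, 0.9])
print(classify(raw))           # [0 1 1 1]
print(scale_for_display(raw))  # approximately 0.25, 0.625, 0.875, 1.0
```

Note that the rescaling is applied for display only; classification uses the raw likelihood and threshold p.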

Determination of the optimal DNN hyperparameters is accomplished through a randomized search of the hyperparameter space. These hyperparameters and their values are briefly discussed here and summarized in Table 7. The DNN structure determined for this study has eight hidden layers, with 75, 38, 17, 12, 11, 8, 7, and 4 neurons in layers 1–8, respectively (determined within a random search space of 3–10 hidden layers, with the number of neurons selected randomly within 10–100 without exceeding the size of the previous layer). In general, models with less capacity could not learn the task well enough, and larger models tended to overfit to the training set. Neurons of the hidden layers employ the rectified linear unit (ReLU) activation function, which allows for fast training given the simplicity of its derivative (Glorot et al. 2011). Adam optimization is employed with a minibatch size of 256 and a learning rate of 0.001 (Bengio 2012; LeCun et al. 1998; Kingma and Ba 2015; Masters and Luschi 2018). In addition, early stopping is employed to mitigate possible overfitting (Prechelt 1998). With early stopping and the above learning schedule, the DNN trains for 10–19 epochs, depending on the fold, retaining the learned parameters from the last epoch before skill on the validation set fails to improve over 10 consecutive epochs. Finally, the DNN employs focal loss, a modified version of binary cross-entropy loss in which each sample’s loss is scaled by a factor that down-weights well-classified samples relative to misclassified ones. Because focal loss influences the value of p at which the model performs optimally, the focal loss modulating (γ) and weighting (α) factors are specified as 2.0 and 0.374, respectively, which yields an optimal p near 0.50 (Lin et al. 2020).
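For reference, the focal loss of Lin et al. can be written per sample as FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The NumPy sketch below evaluates its mean with the γ = 2.0 and α = 0.374 values given above. Applying α to the positive class and 1 − α to the negative class follows the Lin et al. convention and is assumed here, since the exact implementation used for the DNN is not specified.

```python
import numpy as np


def binary_focal_loss(y_true, y_prob, gamma=2.0, alpha=0.374, eps=1e-7):
    """Mean binary focal loss (Lin et al. form).

    gamma : modulating factor; down-weights well-classified samples
    alpha : weight for the positive class (1 - alpha for the negative class);
            this class assignment is an assumption
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    # p_t: predicted probability of the true class
    p_t = np.where(y_true == 1.0, y_prob, 1.0 - y_prob)
    alpha_t = np.where(y_true == 1.0, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


y = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.1, 0.4, 0.6])
# With gamma = 0 and alpha = 0.5 the loss reduces to half the binary cross-entropy.
print(binary_focal_loss(y, p), binary_focal_loss(y, p, gamma=0.0, alpha=0.5))
```

The (1 − p_t)^γ term is what shifts the optimal decision threshold relative to plain cross-entropy, which is why γ and α were tuned jointly with p.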

Table 7.

Summary of DNN architecture and hyperparameters.

2) Training and validation sets

The DNN is trained using a random selection of ECS and MESH95 matches drawn in groups of three consecutive days across the 11 years of satellite and reanalysis data. A smaller, similarly random selection is used as a validation set that is monitored throughout the DNN tuning and training process, and an even smaller testing set is set aside for evaluation after learning is finalized to ensure that the model was not inadvertently overfit to the validation set. Only instances where the matched ECS is at least two adjacent pixels in size are considered, in order to minimize the introduction of GOES imager noise. Furthermore, following the method of Murillo et al. (2021), Fisher’s linear discriminant analysis (LDA; Wilks 2006) is performed on the training and validation sets using TPW and SHEAR06 to reduce the number of MESH false alarms, which tend to occur in high-TPW, low-SHEAR06 environments.
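Fisher’s two-class discriminant reduces to a projection vector w = S_w⁻¹(μ₁ − μ₀), where S_w is the within-class scatter matrix and μ₀, μ₁ are the class means. The sketch below computes it for two predictors (TPW and SHEAR06) on synthetic data. Which samples form the two classes and how the discriminant is then used to screen out likely MESH false alarms are assumptions for illustration; the procedure of Murillo et al. (2021) may differ in detail.

```python
import numpy as np


def fisher_lda(X0, X1):
    """Fisher's linear discriminant for two classes.

    Returns the projection vector w and a decision threshold at the midpoint of
    the projected class means (equal priors, an assumption). Samples with
    x @ w > threshold are assigned to class 1.
    """
    X0, X1 = np.asarray(X0, dtype=float), np.asarray(X1, dtype=float)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two class scatter matrices
    S_w = (len(X0) - 1) * np.cov(X0, rowvar=False) + (len(X1) - 1) * np.cov(X1, rowvar=False)
    w = np.linalg.solve(S_w, mu1 - mu0)
    threshold = 0.5 * (mu0 + mu1) @ w
    return w, threshold


# Synthetic (TPW [mm], SHEAR06 [m s^-1]) samples; class labels are hypothetical:
# class 0 = moist, weakly sheared cases prone to MESH false alarms, class 1 = others
rng = np.random.default_rng(2)
X0 = rng.normal([50.0, 10.0], [4.0, 3.0], size=(300, 2))
X1 = rng.normal([30.0, 25.0], [4.0, 3.0], size=(300, 2))
w, thr = fisher_lda(X0, X1)
keep = np.vstack([X0, X1]) @ w > thr  # True where a sample falls on the class-1 side
print(keep[:300].mean(), keep[300:].mean())
```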

Splitting the datasets at the boundaries of three consecutive days lends confidence that all sets are meteorologically distinct. All datasets are standardized to z scores using the mean and standard deviation of the training set. The training set is drawn randomly from 55% of the 11-yr record in 3-day groups, whereas the validation and testing sets are drawn from 36% and 9%, respectively. These fractions roughly correspond to 6 years, 4 years, and 1 year of data for the training, validation, and testing sets, respectively. We limited training to the equivalent of 6 years to keep training data in the majority while minimizing their share of the record, as one way to reduce perceived overfitting in the climatology assessment, which spans the full record. To demonstrate that the model is insensitive to the random composition of the datasets, that is, has low variance, the DNN skill is evaluated with k-fold cross validation using six folds. In other words, the DNN is trained and assessed six separate times using six distinct random compositions of the training, validation, and testing sets to determine the average and standard deviation of the skill metrics across all folds. The fold used to train the model applied to the climatology and case studies (see section 4 and appendix B) was chosen because its training set contains none of the case-study days.
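The block-wise split and training-set standardization described above can be sketched as follows. The sketch assumes each sample carries an integer day index counted from the start of the record; assigning whole 3-day blocks keeps adjacent days in one set, and changing the random seed yields a different fold. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def block_split(day_index, fractions=(0.55, 0.36, 0.09), block_len=3, seed=0):
    """Label samples 'train', 'val', or 'test' by randomly assigning whole
    blocks of `block_len` consecutive days, so each block falls in one set.

    Blocks left after the train and val shares go to test, so fractions[2]
    is implied rather than used directly.
    """
    block_id = np.asarray(day_index) // block_len
    blocks = np.random.default_rng(seed).permutation(np.unique(block_id))
    n_train = int(round(fractions[0] * len(blocks)))
    n_val = int(round(fractions[1] * len(blocks)))
    labels = np.full(block_id.shape, "test", dtype="<U5")
    labels[np.isin(block_id, blocks[:n_train])] = "train"
    labels[np.isin(block_id, blocks[n_train:n_train + n_val])] = "val"
    return labels


def standardize_with_train(X, is_train):
    """z scores for every set, using the training-set mean and standard deviation."""
    mu = X[is_train].mean(axis=0)
    sd = X[is_train].std(axis=0)
    return (X - mu) / sd


days = np.arange(11 * 365)          # one synthetic sample per day
labels = block_split(days, seed=1)  # change `seed` for another fold
X = np.random.default_rng(3).normal(5.0, 2.0, size=(days.size, 4))
Z = standardize_with_train(X, labels == "train")
print({k: float((labels == k).mean()) for k in ("train", "val", "test")})
```

Computing the standardization statistics from the training set alone keeps validation and testing information from leaking into the inputs.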

Cell maxima of MESH95 are used as the truth set Y. A 1.5-in. (38.1 mm) MESH95 threshold marks the boundary between the binary classification categories (Y = 0 or Y = 1): MESH95 less than 1.5 in. is considered nonsevere, and MESH95 of 1.5 in. or more is considered potentially severe. The distribution of the categories is summarized in Table 8. The positive class is labeled potentially severe because MESH is biased high: a 1.5-in. MESH95 (or 1-in. MESH75) is more comparable to 0.75-in. (19.1 mm) hail reported at the ground, which falls short of the National Weather Service definition of severe hail (Murillo et al. 2021). Hail of this size, however, is still considered part of the small-hail category and remains of significant interest to the insurance industry (Murillo et al. 2021; North American Hail Workshop Panel Discussion 2022).

Table 8.

Average and variation (in parentheses) of total, warm-season, and cold-season sampling, as well as the total nonsevere and potentially severe samples across the k-fold training, validation, and testing sets. The last column is the ratio of potentially severe samples to nonsevere samples.

Note that for three of the experiments listed in Table 6 (i.e., GOES-16 + MERRA-2, warm season, and cold season), the DNN training and validation conditions differ from those of the other experiments. For the GOES-16 + MERRA-2 experiment, the model is trained with GOES-13 + MERRA-2 data that exclude 2017 from the k-fold random selections (each fold still representing the equivalent of 6 years of data), and the ECS area parameter is scaled between 0 and 1 based on GOES-12/13 minimum and maximum values. The model is then applied to six folds of 2017 GOES-16 + MERRA-2 data (80% validation set, 20% testing set), with ECS area scaled between 0 and 1 based on GOES-16 minimum and maximum values. The warm- and cold-season experiments use the trained parameters from the GOES-13 + MERRA-2 experiment, but skill is evaluated only for ordinal days 91–273 (warm season) and 1–90 and 274–366 (cold season).
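The season definitions and the sensor-specific ECS-area scaling can be sketched as two small helpers. The function names are hypothetical and the example area values are synthetic; each sensor’s array is scaled by its own minimum and maximum, as described above for GOES-12/13 and GOES-16.

```python
import numpy as np


def warm_season(ordinal_day):
    """True for ordinal days 91-273 (warm season); False for 1-90 and 274-366 (cold season)."""
    d = np.asarray(ordinal_day)
    return (d >= 91) & (d <= 273)


def minmax_scale(x, x_min, x_max):
    """Scale values to [0, 1] using a sensor-specific minimum and maximum."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)


# Synthetic ECS areas (km^2) for two imagers with different pixel resolutions.
# In practice each sensor's minimum and maximum would come from its full record.
area_goes13 = np.array([16.0, 40.0, 120.0, 300.0])
area_goes16 = np.array([4.0, 12.0, 60.0, 180.0])
scaled_13 = minmax_scale(area_goes13, area_goes13.min(), area_goes13.max())
scaled_16 = minmax_scale(area_goes16, area_goes16.min(), area_goes16.max())
print(warm_season([45, 150, 300]), scaled_13, scaled_16)
```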

3) Skill metrics

There are multiple relevant metrics with which the capability of a DNN can be evaluated. These metrics and their relevance to meteorological warnings have been reviewed extensively in previous works and are summarized in this section (Doswell et al. 1990; Schaefer 1990; Barnston 1992; Gerapetritis et al. 1995; Kunz 2007; Barnes et al. 2009; Hyvärinen 2014; Johnson and Sugden 2014; Gensini and Tippett 2019). Skill metrics consider different combinations of true-positive, false-positive, false-negative, and true-negative classifications; therefore, it is important to diversify one’s evaluation criteria to be sure the model is not deficient in any one category. The probability of detection (POD) is the ratio of true positives to the total number of observed severe events, thereby describing the rate of correct severe hail predictions among all actual cases of severe hail. Two metrics describe false alarms (FAs). The first is the FA ratio, the ratio of false-positive predictions to the total number of positive predictions. The second is the FA rate, the ratio of false-positive predictions to the total number of observed nonsevere events. The critical success index (CSI) is the ratio of true-positive predictions to the sum of true-positive, false-positive, and false-negative predictions, and ranges from 0 (no predictive skill) to 1 (perfect prediction). The Heidke skill score (HSS) accounts for all four classification outcomes and can range from −∞ to 1, where values of 0 and below indicate no predictive skill and 1 indicates a perfect prediction. The final metric is the frequency bias index (FBI), often simply called bias in the context of categorical forecasts, which is the ratio of total positive predictions to total observed severe events, where 1 is perfect and values less than (greater than) 1 indicate a tendency toward under- (over-) forecasting. The exact formulations for these skill metrics are summarized by Wilks (2006).
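The standard 2 × 2 contingency-table forms of these metrics are collected below, with a = hits (true positives), b = false alarms, c = misses, and d = correct negatives. This is a sketch using the common textbook formulations; the helper names are hypothetical.

```python
import numpy as np


def contingency(y_true, y_pred):
    """Counts of hits (a), false alarms (b), misses (c), and correct negatives (d)."""
    t = np.asarray(y_true).astype(bool)
    f = np.asarray(y_pred).astype(bool)
    return (int(np.sum(t & f)), int(np.sum(~t & f)),
            int(np.sum(t & ~f)), int(np.sum(~t & ~f)))


def skill_scores(a, b, c, d):
    """Standard 2 x 2 contingency-table skill metrics."""
    return {
        "POD": a / (a + c),        # hits / observed severe
        "FA_ratio": b / (a + b),   # false alarms / predicted severe
        "FA_rate": b / (b + d),    # false alarms / observed nonsevere
        "CSI": a / (a + b + c),
        "HSS": 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
        "FBI": (a + b) / (a + c),  # predicted severe / observed severe
    }


print(skill_scores(50, 25, 25, 100))
```

In this example the FBI of 1 shows an unbiased forecast, even though CSI is only 0.5; the metrics are complementary, which is why several are reported together.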

e. Climatological aggregation

To compare the frequency of the predicted hail events with various observational sources, daily occurrences of severe hail events are counted on an 80 km × 80 km grid for each dataset over CONUS from 2007 through 2017 (Brooks et al. 2003; Doswell et al. 2005; Cintineo et al. 2012; Allen and Tippett 2015; Murillo and Homeyer 2019). Only one detection per day (0000–2300 UTC) is required for a grid cell to count an “event day,” which limits the impact of differences in spatiotemporal resolution and sampling opportunity on the resulting climatologies. Following Murillo and Homeyer (2019), 1σ Gaussian smoothing is applied. Whereas the training, validation, and testing datasets use data collected within ±7.5 min of the top of the hour, the climatology is aggregated from satellite files with scans beginning 15 min after the hour (e.g., 2315 UTC). Owing to the time resolution of the reanalysis data, however, around 55% of the reanalysis inputs used to train the model also appear in the climatology application, but never in exactly the same pairing with the satellite data, and with slightly different time-interpolated values. Therefore, input combinations for application are distinct from training input combinations, although 6 of the 13 predictor values may be similar to those used during training within a given reanalysis grid box.
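The event-day counting and smoothing can be sketched as below, assuming detections have already been projected to x and y coordinates in kilometres on an equal-area grid, all fall inside the grid, and are tagged with a UTC day index. The σ of one grid cell, the 4σ kernel truncation, and zero padding at the domain edges are assumptions; the helper names are hypothetical.

```python
import numpy as np


def event_day_counts(day, x_km, y_km, nx, ny, cell_km=80.0):
    """Number of distinct days with at least one detection in each grid cell.

    Assumes every detection lies inside the nx x ny grid.
    """
    i = (np.asarray(y_km, dtype=float) // cell_km).astype(int)
    j = (np.asarray(x_km, dtype=float) // cell_km).astype(int)
    # One (day, row, column) entry per event day, however many detections it holds
    pairs = np.unique(np.stack([np.asarray(day, dtype=int), i, j], axis=1), axis=0)
    counts = np.zeros((ny, nx))
    np.add.at(counts, (pairs[:, 1], pairs[:, 2]), 1.0)
    return counts


def gaussian_smooth(field, sigma=1.0):
    """Separable Gaussian smoothing with sigma in grid cells (zero padding at edges)."""
    radius = max(1, int(np.ceil(4.0 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(field, dtype=float), radius, mode="constant")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, "valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, "valid")


# Synthetic detections: UTC day index and projected x, y (km)
day = [0, 0, 1, 1]
x = [10.0, 20.0, 10.0, 500.0]
y = [10.0, 30.0, 10.0, 500.0]
n_years = 1
clim = gaussian_smooth(event_day_counts(day, x, y, nx=10, ny=10) / n_years, sigma=1.0)
print(clim.round(2))
```

Two detections on the same day in the same cell count once, which is what makes the climatology insensitive to how often each dataset samples a storm.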

4. Results and discussion

In this section, we discuss results covering five focus areas, beginning broadly before narrowing in on specific applications. We start by documenting the climatological distribution of MESH, hail reports, and MWR hail probability over the study domain. Second, the parameter space of certain satellite and reanalysis inputs is explored as a function of MESH hail size, and correlations of hailstorm detections with input parameters are examined. Third, outcomes of the DNN are presented, including explanation of general predictive capability, demonstration of specific performance metrics for each input sensitivity experiment, and discussion of caveats. Fourth, we highlight the DNN hail likelihood climatology and relate it to the MESH climatology discussed previously. Finally, we end the section with examination of hail likelihood estimates for two case studies (with a third presented in appendix B).

a. Regional hail distribution

The yearly average severe hail event days from MESH and SPC storm reports show a maximum frequency in the central Great Plains that is shifted west of the maximum frequency for nonsevere hail event days (Figs. 3a,c). Ground-spotter reports (Fig. 3b), however, show reduced frequency, likely due to well-known reporting biases (Allen and Tippett 2015). Note that MWR hailstorm detections are rendered as point observations rather than as a grid, because TMI observations do not extend beyond 39° latitude and have greater sampling density at the northernmost extent of the orbit (Fig. 3d). This sampling discrepancy, in addition to time-of-day bias from the AMSR2 sun-synchronous orbit, would bias a gridded climatology relative to patterns depicted in the MESH climatology. Nevertheless, there is a concentration of high-likelihood MWR hail detections in the central Great Plains (red symbols in Fig. 3d), consistent with MESH and SPC. Higher densities of MWR hail detections also occur in regions outside of CONUS, including along the Sierra Madre in northwestern Mexico, northeastern Mexico, and the Gulf Stream.

Fig. 3.

Yearly average severe hail event days observed on an 80 × 80 km² grid (1σ Gaussian smoothing applied) during 2007–17 matched within ±7.5 min and 28 × 28 km² of a GOES-12/13 ECS detection: (a) MESH95 ≥ 1.5 in., (b) SPC hail reports, (c) MESH95 < 1.5 in., and (d) MWR hail detection probability exceeding 0.2. The satellite MWR detections are plotted as symbols, colored blue or red if exceeding a low or high threshold, respectively. Note that TMI observations do not extend beyond 39° latitude and have greater sampling density at the northernmost extent of the orbit.

b. Parameter sensitivity to hail size and reanalysis

The satellite and reanalysis parameter distribution as a function of hail size is complex due, in part, to 1) challenges in depicting the spatiotemporal evolution of severe storm environments by reanalysis, 2) the fact that severe hailstorms can be caused by a broad spectrum of environmental conditions (e.g., low CAPE and high wind shear, or high CAPE and low wind shear; Johnson and Sugden 2014; Gensini et al. 2021; Zhou et al. 2021), and 3) the imprecise process of spatiotemporally matching satellite-based cloud-top properties with hail reports and MESH cells. Nevertheless, value distribution plots stratified by MESH hail-size categories and correlation analyses can reveal patterns indicative of severe hailstorms.

Box-and-whisker distributions of selected GOES-12/13 IR-based parameters show increasing satellite-derived intensity as hail size increases, with varying degrees of overlap among neighboring bins (Fig. 4). For example, the median OT probability within the smallest MESH95 bin (0.5–1.0 in.; i.e., 12.7–25.4 mm) is ∼0.33, increasing to ∼0.8 for ≥3-in. (76.2 mm) MESH95 (Fig. 4a). ECS area is 2–3 times as large (Fig. 4b) and tropopause-relative IR BT (Fig. 4d) is ∼5 K colder for ≥3-in. MESH95 relative to <1-in. MESH95, indicating that colder and wider updrafts are more likely to generate larger hail. Marion et al. (2019) computed OT area using a similar approach and found significant correlation between OT area and tornado intensity. Some parameters, like ECS − anvil BTD, have more gradual and subtle sensitivity to MESH95 size, with an ∼2-K median increase in BTD from the lowest to highest MESH95 bin (Fig. 4c). Sensitivity of these relationships to imager resolution is explored for the 2017 warm season using GOES-13 and GOES-16 data in appendix A, which shows that GOES-16 metrics are largely consistent with GOES-13, although GOES-16 records slightly colder temperatures, smaller ECS area, and greater sensitivity to hail size owing to its higher pixel resolution. It is also important to note that large hail has occurred when satellite parameters are weak, which could be caused by factors such as 1) time differences between GOES and MESH data; 2) uncertainty in MESH data, where a cell may not truly be as intense as depicted by MESH; and 3) the possibility of large hail being produced in a cloud without notable BT variations at cloud top.

Fig. 4.

Box-and-whisker distributions [center line indicates median value, rectangles encompass the interquartile range (IQR), and whiskers extend to 1.5 × IQR; outliers are not shown] of select GOES-12/13 IR cloud-top parameters stratified by MESH95 hail-size bins (matched within ±7.5 min and 28 × 28 km²) during 2007–17: (a) OT probability, (b) ECS area, (c) ECS − anvil BTD, and (d) IR − tropopause.

Some MERRA-2 convective environment parameters also increase in intensity with increasing MESH95 (Fig. 5). Similar results are found for ERA5 but are not shown. From the lowest to highest MESH95 bin, the median MUCIN (Fig. 5a) increases by ∼40 J kg−1 while SHEAR01 (Fig. 5c) decreases by ∼5 m s−1. Furthermore, Fig. 5d shows ∼2 times larger median values of SHIP for ≥2-in. than for <1-in. MESH95 bins. It is intuitive to anticipate more vigorous convective updrafts when parcels overcome large CIN in drier environments, as in the explosive growth associated with the initiation of dryline thunderstorms over Oklahoma, Texas, and Kansas. The pattern may also reflect reduced interaction between convective cells in large-CIN environments, that is, discrete storms favoring larger hail likelihood. Although the negative correlation between MESH and SHEAR01 might seem counterintuitive, it is possibly due to decreased hailstone residence times in highly sheared updrafts (Dennis and Kumjian 2017). As with the satellite analyses above, large MESH95 does occur in many instances where environmental parameters are more muted than expected, such as in Fig. 5b, which presents a challenge to DNN learning capability. Evaluating such parameters in isolation, however, does not capture possible multivariate contributions that the DNN may detect.

Fig. 5.

As in Fig. 4, but for select MERRA-2 convective parameters binned by MESH95: (a) MUCIN, (b) T500, (c) SHEAR01, and (d) SHIP.

A correlation matrix quantifies the relationships between hailstorm detections and satellite and reanalysis parameters (Fig. 6). Specifically, MESH95 maxima, MWR hail probability, and ground-reported hail size are correlated with one another and with the GOES-12/13 and MERRA-2 parameters from Tables 3 and 4. OT probability, ECS area, SHIP, and IR − tropopause are the most strongly correlated with MESH95, with Pearson correlation coefficients (r) in the range of 0.27–0.38. Ignoring the expectedly correlative height variables, and despite a few instances of exceptionally strong correlation among other predictors, for example, OT probability versus ECS area (r = 0.75) and ECS area versus IR − tropopause (r = −0.61), most predictors are relatively independent of one another. While MWR hail probability correlation with the selected parameters is overall comparable to that of MESH95, with some notable exceptions (SHEAR01), significantly weaker relationships are found between reported hail size and reanalysis and satellite parameters, likely a result of uncertainty of hail-size reports (Allen and Tippett 2015) and the inherent absence of subsevere hail reporting. Last, and still with acknowledgment of the same caveats for hail reporting, we see that MESH and MWR hail probability are more than twice as correlated as MESH and SPC hail size, which quantifies the relative patterns seen in Fig. 3.

Fig. 6.

Correlation matrix of select GOES-12/13 and MERRA-2 model reanalysis parameters matched within ±7.5 min and 28 × 28 km² of a MESH95 cell maximum and MESH95 cell areas. Collocated MWR hail probability detections and ground-spotter hail-size reports are also included.

Figure 6 also highlights that certain satellite parameters, that is, OT probability, ECS area, and ECS − anvil BTD, are largely independent of the environment and, therefore, should uniquely contribute to hailstorm discrimination. Meanwhile, some satellite and reanalysis correlations, such as anvil height with 500-hPa temperature (T500) and with TPW, are rather strong (r = 0.71 and 0.64, respectively). A warmer atmosphere results in higher equilibrium levels and higher cloud tops given sufficient instability and moisture.

In summary, multiple satellite and model parameters are shown to be potentially useful predictors of observed hailstorm characteristics. Overall, larger hail is linked to wider and colder updrafts occurring in environments with weaker low-level shear and increased convective inhibition and SHIP. Next, we highlight the outcomes of a DNN model used to establish quantitative links among these parameters to skillfully detect potentially severe hailstorms.

c. DNN validation

1) Association of DNN prediction to truth

Increasing likelihood of potentially severe hail is correlated with increasing MESH, with a greater range of hail size covered by MESH95 than by MESH75 (Fig. 7). This positive correlation between likelihood and MESH lends confidence to the DNN performance. That is, the model is assigning the highest likelihoods to storms with the greatest potential severity; therefore, thresholds of hail likelihood can be tailored to focus on storms producing the largest hail.

Fig. 7.

Validation set distributions of MESH95 and MESH75 at 5% intervals of potentially severe hail likelihood, where potentially severe is defined as 1.5 in. for MESH95 and 1 in. for MESH75. Numbers indicate validation samples per bin interval. The severe hail likelihoods have been linearly scaled such that a prescaled likelihood of 0.8 would, instead, equal 1.

2) Input sensitivity experiment results

The DNN accuracy is assessed using receiver operating characteristic (ROC) curves based on models applied to the validation set (Fig. 8). A summary of the experiment skill scores for likelihoods determined by the p value at the inflection point of the ROC curve is provided in Table 9, which provides not only validation set skill metrics but also those for the training and testing sets. While a ROC curve shows the relationship between POD and FA rate, a performance diagram (PD) incorporates multiple validation metrics: POD, FA ratio, and CSI (Gerapetritis et al. 1995; Roebber 2009). Figure 9 shows a PD for the same ensemble of models shown in Fig. 8. Figure curves and table statistics represent the k-fold-average results (standard deviations are given in parentheses in Table 9).
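A ROC curve and its AUC can be computed from validation labels and likelihoods as sketched below. The text selects p at the inflection point of each ROC curve without giving a formula, so this sketch substitutes a common criterion, maximizing POD minus FA rate (the Peirce skill score, also known as Youden’s J); that substitution, like the helper names, is an assumption.

```python
import numpy as np


def roc_points(y_true, likelihood, thresholds):
    """(FA rate, POD) pairs for each threshold p, predicting positive when likelihood >= p.
    -inf and +inf are added so the curve spans (1, 1) to (0, 0)."""
    t = np.asarray(y_true).astype(bool)
    s = np.asarray(likelihood, dtype=float)
    ps = np.concatenate([[-np.inf], np.asarray(thresholds, dtype=float), [np.inf]])
    pofd, pod = [], []
    for p in ps:
        f = s >= p
        pod.append(np.sum(t & f) / t.sum())
        pofd.append(np.sum(~t & f) / (~t).sum())
    return ps, np.array(pofd), np.array(pod)


def auc(pofd, pod):
    """Trapezoidal area under the ROC curve, sorting by FA rate and then POD."""
    order = np.lexsort((pod, pofd))
    x, y = pofd[order], pod[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


y_true = np.array([0, 1, 0, 1])
score = np.array([0.1, 0.2, 0.3, 0.4])
ps, far, hit = roc_points(y_true, score, np.linspace(0.0, 1.0, 101))
best = np.argmax(hit - far)  # Peirce/Youden criterion (a stand-in for the text's choice)
print(auc(far, hit), ps[best])
```

Sorting ties in FA rate by POD keeps the trapezoid rule from undercounting vertical segments of the curve.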

Fig. 8.

ROC k-fold-average curves illustrating the classification capability of the DNN as applied to the validation set for each experiment (see Table 6). Diamonds designate p = 0.5, and circles mark p increments of 0.1 lower and higher than 0.5.

Table 9.

Average and variation (in parentheses) of skill metrics on training, validation, and testing sets for the different DNN experiments, evaluated with k-fold cross validation. The prediction likelihoods at which the statistics are determined are based on the inflection points of the ROC curves for each fold, the average and standard deviation of which are given in the column marked p.

Fig. 9.

As in Fig. 8, but shown as a performance diagram.

Before discussing the specific performance details, it is notable that the various skill metrics are consistent, for each experiment, across the training, validation, and testing sets. For example, in the GOES-13 + MERRA-2 experiment, CSI and HSS on the training set each exceed those on the validation set by only 0.009 (and exceed those on the testing set by 0.014 and 0.006, respectively). This consistency suggests low variance in the model, meaning that it generalizes well and is not overfit to the training set. Furthermore, because the validation and testing set skill scores are comparable, there is no indication that the model was inadvertently overfit to the validation set during tuning. There is bias in the model (the DNN predictions are not perfect), but it is consistent regardless of the input dataset. Therefore, these results should help alleviate concerns that the climatology results may not be well generalized.

3) Input-sensitivity-experiment discussion

Interpretation of results may be influenced by the skill metric being considered. The GOES-13 + MERRA-2 experiment offers what one might consider a well-rounded performance, as it scores relatively well in terms of CSI, HSS, and area under the curve (AUC) (Fawcett 2006) at 0.511, 0.407, and 0.810, respectively, on the validation set, and 0.506, 0.410, and 0.810, respectively, on the testing set. The GOES-16 + MERRA-2 performance is comparable at 0.501–0.503 CSI, 0.396–0.403 HSS, and 0.797–0.800 AUC on the validation and testing sets. Other recent studies that have sought to detect severe hail events, using MESH, storm reports, or insurance reports as truth, and station data, radiosondes, radar reflectivity, lightning, and/or reanalysis as inputs, achieved CSI values ranging from 0.17 to 0.27, HSS ranging from 0.26 to 0.39, and AUC ranging from 0.77 to 0.78. The scope, goals, and success metrics of these previous studies, however, are unique and not necessarily aligned with those of our study, for example, forecasting is not our intention (Kunz 2007; Gagne et al. 2017; Czernecki et al. 2019; Gensini et al. 2021). One fold of the GOES-13 + MERRA-2 model, with scores of 0.512–0.514 CSI and 0.409–0.434 HSS on the validation and testing sets, is selected to compile the severe hail climatology and to demonstrate results from case studies shown below.

The GOES-13 + ERA5 experiment skill metrics are comparable to those of GOES-13 + MERRA-2. Despite the small CSI advantage of the latter and its overall better k-fold stability, that is, lower variance across the k-fold evaluation, using either reanalysis source along with satellite observations leads to effectively the same outcome. One reason why the MERRA-2 DNN may marginally outperform the ERA5 DNN in this study, either alone or in combination with satellite inputs, might be that the tropopause height, tropopause temperature, and TPW inputs are sourced from MERRA-2 in all instances of their use. Whether an ERA5 DNN that sources ERA5 for those inputs would demonstrate notably higher skill is an area for future study.

In the case of the cold-season analysis, the disparity in skill demonstrated by the ROC curve relative to the PD is notable. The cold-season ROC curve is higher than those from the other satellite + reanalysis experiments, demonstrating the highest AUC of all experiments at 0.823–0.826 and the highest HSS at 0.419. However, the cold season is the poorest performer in terms of CSI at 0.397–0.399. This disparity is due to the high FA ratio of 0.513–0.517 and relatively low FA rate of 0.208–0.219. In the case of the warm season, CSI is closer to that for all seasons at 0.500–0.505, but HSS is worse than for the cold season at 0.393–0.396; these outcomes reflect relatively low POD (0.651–0.660) but relatively favorable FA ratio (0.317–0.318) and FA rate (0.212–0.220).

A pattern similar to the cold-season HSS and CSI discrepancy was reported in a large hail study by Czernecki et al. (2019), who, with a model trained to predict reported severe hail based on radar reflectivity, lightning detection, and ERA5 parameters, demonstrated HSS of 0.383 and CSI of 0.245. In that study, the discrepancy was driven by a large difference between FA rate (∼1%) and FA ratio (∼67%), which may suggest that although nonevents were adequately recognized, the model perhaps attempted relatively few positive predictions or had few opportunities to do so. Relating such prediction opportunities to our cold-season results, it is important to keep in mind that model sampling is dominated by warm-season events and, therefore, the DNN is biased toward such environments. Figure 10 demonstrates this seasonal dependence, revealing diminished MUCIN (Fig. 10a) in the cold season relative to the warm season for increasing MESH95, supporting previous findings on the distinct seasonal characteristics conducive to severe hail occurrence (Púčik et al. 2015). Aside from the higher CIN, a DNN dominated by warm-season inputs learns to associate severe hail with overall reduced SHEAR01 (Fig. 10b) and higher cloud-top heights (Fig. 10c). These differences between the cold- and warm-season distributions, with the exception of OT probability (Fig. 10d), are not conducive to cold-season DNN performance as favorable as that of the warm season. Improving cold-season skill, perhaps with specialized training and increased sampling, is an area that warrants further study.

Fig. 10.

Box-and-whisker distributions of select MERRA-2 and GOES-12/13 parameters during the cold season (blue) and warm season (orange) stratified by MESH95 hail-size bins (matched within ±7.5 min and 28 × 28 km²) during 2007–17: (a) MUCIN, (b) SHEAR01, (c) cloud-top height, and (d) OT probability. The number of matches within each bin is listed below (a) and (b).

Citation: Artificial Intelligence for the Earth Systems 2, 4; 10.1175/AIES-D-22-0042.1

It is important to note that the performance of the DNNs with reanalysis-only inputs is biased toward better predictions because reanalysis parameters are extracted only at ECS locations where convection is actively occurring. These statistics do not reflect cases where model parameters might have indicated a favorable severe storm environment but no storms formed. Therefore, the role of satellite measurements in precisely identifying areas of convection within these favorable environments is perhaps more significant than these results suggest. With this consideration in mind, the experiments that rely only on reanalysis input perform comparably to the GOES-13-only model. Which model is superior depends on the position along the ROC and PD curves. Considering only the optimal p values, however, the MERRA-2–only and ERA5-only models score marginally better than the GOES-13-only model. Performance is notably better when GOES-12/13 inputs are combined with reanalysis.
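The phrase "optimal p values" refers to choosing a single probability threshold from the continuum traced by the ROC and PD curves. The sketch below sweeps candidate thresholds and computes POD and FA rate, which form the ROC axes, along with success ratio (1 − FA ratio), which forms the PD axis. It then estimates AUC with the trapezoidal rule and selects the threshold that maximizes CSI. Several elements are assumptions made for illustration: selecting by maximum CSI, the function names, and the toy data. The paper's own threshold criterion and AUC computation may differ.

```python
def sweep_thresholds(probs, labels, thresholds):
    """Return (p, POD, FA rate, success ratio, CSI) for each candidate threshold p."""
    rows = []
    for p in thresholds:
        a = b = c = d = 0
        for prob, observed in zip(probs, labels):
            predicted = prob >= p  # classify as severe when likelihood reaches p
            if predicted and observed:
                a += 1
            elif predicted:
                b += 1
            elif observed:
                c += 1
            else:
                d += 1
        pod = a / (a + c) if a + c else 0.0
        fa_rate = b / (b + d) if b + d else 0.0
        success_ratio = a / (a + b) if a + b else 1.0  # equals 1 - FA ratio
        csi = a / (a + b + c) if a + b + c else 0.0
        rows.append((p, pod, fa_rate, success_ratio, csi))
    return rows


def roc_auc(rows):
    """Trapezoidal area under the ROC curve (FA rate on x, POD on y)."""
    points = sorted({(0.0, 0.0), (1.0, 1.0)} | {(r[2], r[1]) for r in rows})
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # toy DNN likelihoods
labels = [1, 1, 0, 1, 0, 1, 0, 0]                 # toy MESH95 severe (1) / nonsevere (0)
rows = sweep_thresholds(probs, labels, [0.25, 0.5, 0.75])
best_p = max(rows, key=lambda r: r[4])[0]         # threshold with the highest CSI
print(best_p, roc_auc(rows))  # 0.25 0.875
```

With only three thresholds, the trapezoidal AUC is a coarse approximation. A finer sweep, for example every 0.01, traces the curve more faithfully.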

d. DNN severe hail climatology and case studies

The DNN is applied to the 11-yr database of merged MERRA-2 and GOES-12/13 parameters at ECS pixels to derive severe hail climatologies. ECS or OT pixels with varying hail likelihoods are aggregated on an 80 km × 80 km grid, which matches the grid of the hail climatology truth datasets shown in Fig. 3. Confident OT detections with p > 50% likelihood of producing severe hail (Fig. 11b) reach a maximum occurrence of ∼20–22 events per year across the central Great Plains. This pattern is supported by the MESH95 climatology (see contours of three and six MESH95 hail events per year in Fig. 11). Hail-producing ECSs predicted by the DNN reveal frequent severe hail days (∼50 events per year; Fig. 11a) in these regions. These counts far exceed MESH event days but follow the same general spatial pattern. The excess of ECS event days may arise because ECSs with low OT probability can originate from outflow that is cold relative to the tropopause and lies very near true updraft regions (Cooney et al. 2021). A true updraft OT may therefore have several nearby ECSs that can trigger a moderately high hail probability, and the greater frequency and coverage of these ECSs inflate the counts. When we restrict the analysis to ECSs likely to be true updrafts (Fig. 11b), event-day counts agree much better with those of MESH. Using a higher likelihood threshold (p > 75%), which corresponds to greater MESH (Fig. 7), to compile the climatology (Figs. 11c,d) brings event-day counts into even closer relative agreement.
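The aggregation described above, from pixel-level detections to yearly event days on an 80 km × 80 km grid, can be sketched as follows. This is a minimal illustration that rests on three assumptions. First, detections are given in hypothetical projected x/y coordinates in kilometers. Second, an event day is counted at most once per grid cell per UTC calendar day. Third, the input tuple layout and function name are illustrative rather than the authors' processing code.

```python
from collections import Counter
from datetime import datetime

GRID_KM = 80.0  # grid spacing matching the 80 km x 80 km truth climatologies


def yearly_event_days(detections, p_min=0.5, n_years=11):
    """Average yearly event days per grid cell.

    detections: iterable of (time_utc, x_km, y_km, likelihood) for ECS or OT pixels,
    with x_km/y_km in a (hypothetical) projected coordinate system.
    """
    event_days = set()
    for time_utc, x_km, y_km, likelihood in detections:
        if likelihood <= p_min:  # keep only p > p_min, e.g., p > 50%
            continue
        cell = (int(x_km // GRID_KM), int(y_km // GRID_KM))
        event_days.add((cell, time_utc.date()))  # a cell counts at most once per day
    counts = Counter(cell for cell, _day in event_days)
    return {cell: n / n_years for cell, n in sorted(counts.items())}


detections = [
    (datetime(2017, 5, 16, 20), 10.0, 10.0, 0.6),
    (datetime(2017, 5, 16, 22), 50.0, 70.0, 0.9),   # same cell, same day
    (datetime(2017, 5, 17, 1), 20.0, 5.0, 0.7),     # same cell, next day
    (datetime(2017, 5, 17, 2), 30.0, 30.0, 0.4),    # below threshold
    (datetime(2017, 5, 17, 3), 100.0, 10.0, 0.8),   # neighboring cell
]
print(yearly_event_days(detections, p_min=0.5, n_years=2))
# {(0, 0): 1.0, (1, 0): 0.5}
```

Shifting the day boundary would change only the date key. For example, SPC report days run from 1200 to 1200 UTC. Raising `p_min` to 0.75 mimics the stricter climatology.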

Fig. 11.

Yearly average GOES-12/13 (left) ECS and (right) OT event days detected during 2007–17: (a) ECSs filtered by >50% likelihood of severe hail, (b) OTs filtered by >50% likelihood of severe hail, (c) ECSs filtered by >75% likelihood of severe hail, and (d) OTs filtered by >75% likelihood of severe hail. Black and dotted red contours show regions of three and six hail events per year, respectively, for MESH95 ≥ 1.5 in.

The patterns are also consistent with SPC observations and with findings from the Meteorological Phenomena Identification Near the Ground (MPING) project, which collects crowdsourced precipitation reports from volunteers [Fig. 3b; Fig. 6 in Elmore et al. (2022)]. Other DNN hail maxima occur over the Sierra Madre, northeastern Mexico, and the Gulf Stream (Fig. 11). These regions lie outside NEXRAD range but coincide with regions of MWR hail detections (Fig. 3d). Although radar confirmation is not possible in these areas of Mexico, previous intense-convection climatology studies depict local maxima in these regions (Edwards 2006; Zipser et al. 2006; Farfán et al. 2020).

Case studies reveal relationships among satellite parameters, large-scale environmental conditions, DNN-predicted hail likelihood, ground-spotter hail reports, and 5-min potentially severe hail swaths from MESH95 (≥1.5 in.). These relationships allow a qualitative assessment of DNN performance. The first case, 16–17 May 2017, was introduced in Fig. 1, though without consideration of the broader environmental conditions. There, the most prominent cells (Figs. 1c–e) are generally consistent with areas of ∼60% or greater severe hail likelihood (Fig. 1f). Gray arrows in Fig. 1f highlight a range of likelihood estimates for comparably strong satellite parameters. This likelihood range, which decreases from south to north, is driven by environmental tendencies toward decreasing SHIP and MUCIN but increasing SHEAR01 and TPW at the time of the image (not shown). These patterns are anticorrelated with MESH95 (Fig. 6).

Figure 12 shows the maximum intensity of select reanalysis and satellite variables at each GOES grid pixel over the event, summarizing the strongest convection across the region. A large area of overshooting convection occurred on this day, as shown by widespread swaths of IR − tropopause < 0 K (Fig. 12b), OT probabilities near 1.0 (Fig. 12c), and tropopause-relative cloud-top heights up to 2–3 km above the tropopause (Fig. 12f). MERRA-2 reveals coincident high SHIP and environmental stability along a front (Figs. 12a,e). The DNN appears to combine the GOES and MERRA-2 inputs properly to derive high hail likelihood (Fig. 12g) in regions where high MESH95 was observed and hail or tornadoes were reported (Figs. 12d,h), such as the Texas Panhandle, Minnesota, and Wisconsin. Elsewhere, such as in Iowa, cloud heights, temperatures, and overshooting tops were similarly intense, yet the DNN hail likelihood was not as high because of weaker SHIP. This demonstrates the significant influence of SHIP, the third-most-contributing model predictor (see Table 4).
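The maximum-intensity composites in Fig. 12 collapse a time series of gridded fields into one field by keeping the most intense value at each pixel. The sketch below is minimal and assumes that all scans share one grid and that missing pixels are stored as `None`; the function name is hypothetical. For fields where more negative values indicate stronger convection, such as IR − tropopause, the reducer would be `min` rather than `max`.

```python
def max_intensity_composite(frames, reducer=max, threshold=None):
    """Collapse a list of 2D grids (one per scan time) into a single composite grid.

    reducer: max for fields where larger is more intense (e.g., hail likelihood),
             min for fields where colder or lower is more intense (e.g., IR - tropopause).
    threshold: if given, composite values not exceeding it are masked to None.
    """
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    composite = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            valid = [f[i][j] for f in frames if f[i][j] is not None]
            value = reducer(valid) if valid else None
            if threshold is not None and value is not None and value <= threshold:
                value = None  # e.g., keep only likelihoods exceeding 20%
            row.append(value)
        composite.append(row)
    return composite


# Three hypothetical scans of DNN hail likelihood on a 2 x 2 grid
scans = [
    [[0.10, 0.50], [0.30, None]],
    [[0.40, 0.20], [None, None]],
    [[0.15, 0.60], [0.25, None]],
]
print(max_intensity_composite(scans, threshold=0.2))
# [[0.4, 0.6], [0.3, None]]
```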

Fig. 12.

Maximum intensity of parameters aggregated from 1800 UTC 16 through 0600 UTC 17 May 2017: (a) MERRA-2 SHIP; (b) GOES-13 IR − tropopause; (c) GOES-13 OT probability; (d) SPC severe wind, hail, and tornado reports; (e) MERRA-2 MUCIN; (f) GOES-13 tropopause-relative cloud-top height; (g) DNN-predicted severe hail likelihood exceeding 20%; and (h) 5-min MESH95 exceeding 1.5 in.

Zoomed views of the areas outlined in Figs. 12g and 12h are provided in the top panels of Fig. 13. Results from the DNN applied to scaled GOES-16 parameters [see section 3d(2)], aggregated at GOES-13–like 15-min intervals, are also shown for this case (Fig. 13b). GOES-16 hail likelihoods are comparable to those of rapid-scan GOES-13 (Fig. 13a) in the spatial distribution of low and high likelihoods, although with higher overall intensity. This may be owing to inherent imperfections in transferring a model trained on scaled GOES-13 data to scaled GOES-16 data. On the other hand, the 15-min GOES-16 results appear to better represent the observed MESH (Fig. 13d). They produce fewer false alarms in the western Texas Panhandle, central to eastern Kansas, and eastern Colorado, and they align more precisely with ≥1.5-in. MESH95, which highlights the importance of GOES data in the DNN. The amplification of hail likelihood resulting from the increased spatial and temporal resolution of GOES-16 (Fig. 13c) warrants further study. Rather than applying imperfect scaling, a DNN trained on an extensive record of GOES-16 scans at the native 5-min interval should be developed for applications to GOES-16 measurements.

Fig. 13.

Enhanced views of (a)–(c) DNN-predicted severe hail likelihoods exceeding 20% and (d) 5-min MESH95 exceeding 1.5 in. for areas outlined in black in Figs. 12g and 12h, and, similarly, (e)–(g) DNN-predicted severe hail likelihoods and (h) 5-min MESH95 for areas outlined in black in Figs. 14g and 14h. Likelihoods are determined using rapid-scan GOES-13, 15-min GOES-16, and 5-min GOES-16.

Another case study, on 11 July 2017, highlights a summertime hail, wind, and tornado event across Minnesota, Illinois, and the Dakotas (Fig. 14). Much of the reported severe hail for this case occurred in northern Minnesota, eastern South Dakota, eastern Iowa, and northern Illinois (Fig. 14d). The 5-min MESH95, however, shows additional regions of potentially severe hail (≥1.5 in.; Fig. 14h), such as isolated cells in Kansas. SHIP and 0–1-km wind shear supportive of severe convection (Figs. 14a,e) are aligned with the most intense observed overshooting cloud tops (Figs. 14b,c,f) and severe weather reports (Fig. 14d). The GOES-13 DNN indicates severe hail likelihoods of ∼60% and higher along the primary swaths of MESH95 potentially severe hail, as well as for isolated convective cells in the central Great Plains (Figs. 14g,h). Figures 13e–h show zoomed views of the areas outlined in Figs. 14g and 14h. As in the May case, the 15-min GOES-16 DNN application (Fig. 13f) shows higher overall intensity than the rapid-scan GOES-13 application (Fig. 13e). Again, the 15-min GOES-16 results align more precisely with ≥1.5-in. MESH95. The GOES-16–based DNN application produces well-aligned and enhanced likelihoods along true hail tracks, with fewer false alarms in Fig. 13f than in Fig. 13e, especially in central and northeast Minnesota and northeast Nebraska. The 5-min GOES-16 results (Fig. 13g) are comparable to the 15-min GOES-16 results in this case. DNN performance is also demonstrated for an early-season hail event (5 April 2017) in the southeastern United States in appendix B.

Fig. 14.

Maximum intensity of parameters aggregated from 1200 UTC 11 through 0800 UTC 12 Jul 2017: (a) MERRA-2 SHIP; (b) GOES-13 IR − tropopause; (c) GOES-13 OT probability; (d) SPC severe wind, hail, and tornado reports; (e) MERRA-2 SHEAR01; (f) GOES-13 tropopause-relative cloud-top height; (g) DNN-predicted severe hail likelihood exceeding 20%; and (h) 5-min MESH95 exceeding 1.5 in.

5. Conclusions

Storms that produce potentially severe hail were predicted by training and applying a DNN. The DNN links historical and present-day GOES-based detections and intensity metrics at the satellite-detected updraft scale, together with reanalysis environmental characteristics, to NEXRAD MESH over the CONUS. It generates a severe hail likelihood for each satellite-detected ECS, with ECSs serving as proxies for storm updraft regions. To the authors’ knowledge, this is the first study to apply deep learning to merged reanalysis and individual pixel-scale, satellite-derived parameters to estimate hail likelihood. This study emphasizes hailstorm detection using historical GOES-12/13 data to assess climatological hail risk and highlights statistical performance. It also presents GOES-13 case studies from a period in 2017 with overlapping GOES-16 data. Our goal is to create a model that can produce a hail likelihood for any satellite-identified ECS and coincident reanalysis inputs, as a first step toward global applicability.

A set of parameters was identified that correlates with MESH95 and exhibits relatively little regional dependence. Recursive feature analysis was then performed to rank the parameters’ contributions to potentially severe hailstorm detection. These parameters encapsulate several GOES IR BT variables that define storm-top height and updraft intensity, 0–1-km wind shear, SHIP (which includes most unstable CAPE, mixing ratio, midlevel lapse rate, 500-hPa temperature, and 0–6-km wind shear), CIN, and TPW. These parameters were found to be much better correlated with MESH95 maxima and MWR hail likelihood than with observed hail size, an outcome attributed to uncertainties in hail-size reporting. It was also shown that the DNN severe hail likelihood is strongly correlated with MESH. Various metrics, computed using k-fold cross validation, were used to evaluate the DNN’s predictive capability, and the relative performance of the experiments was shown to depend on which skill metrics are evaluated. The GOES-13 + MERRA-2 model demonstrates exceptionally high and balanced performance compared with other recent, similar efforts, with a CSI of 0.511 and an HSS of 0.407. The GOES-16 + MERRA-2 model performs comparably.
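The k-fold cross validation mentioned above evaluates the model on k disjoint held-out subsets. Every sample is therefore scored exactly once by a model that did not see it during training. The index-splitting step can be sketched in standard-library Python as below. The shuffling seed, the value of k, and the function name are illustrative assumptions, and the fold configuration used for the DNN is not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample appears in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # reproducible shuffle
    folds = [idx[f::k] for f in range(k)]  # k nearly equal, disjoint folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


for train, test in k_fold_indices(10, k=3, seed=1):
    print(len(train), len(test))  # prints 6 4, then 7 3 twice
```

In practice, the model is retrained on each training split and scored on the matching test split, and the k sets of scores are then summarized. Storm-cell samples from the same day or storm are correlated. Grouping samples by day before assigning folds is one common way to limit leakage between folds. This section does not state whether such grouping was used.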

Various experiments with different DNN inputs, along with evaluations during the warm versus cold seasons, were conducted to further understand model capability. The results demonstrate that the choice of MERRA-2 or ERA5 for reanalysis training has minimal impact on final skill, although MERRA-2 performs marginally better. Focusing only on warm-season application improves DNN performance in terms of FA ratio and FA rate, but at the cost of reduced POD. The cold-season success ratio is much lower than that of the warm season, likely because warm-season environments and observations dominate the sample. This discrepancy could likely be mitigated with increased sampling and season-specific training, an area that warrants further study.

Furthermore, a satellite-only DNN was shown to perform slightly worse than a reanalysis-only DNN at optimal p values. Performance improves notably when satellite and reanalysis inputs are used together. An important consideration is that models relying solely on reanalysis are biased toward better predictions because their parameters are extracted where GOES has already detected convection. Therefore, the marginally better performance of MERRA-2–only or ERA5-only models relative to a GOES-13-only model is misleading. The DNN reproduced the general features of the 11-yr CONUS severe hail climatology defined by MESH95, SPC, and satellite MWR, with distinct maxima in hail frequency over the central Plains, the Midwest, and northwestern Mexico. Several case studies also showed good agreement between GOES-13 and GOES-16 hail likelihoods and 5-min resolution MESH95 severe hail. False alarms were fewer when the DNN was applied to GOES-16 data.

Many open questions remain regarding the optimal experimental design for achieving maximum, generalized predictive skill, and these can be better answered as sampling improves. Future efforts to enhance DNN hail predictions may include 1) using MWR hail probabilities to better understand how severe hail environments differ regionally and to allow model application and validation on a global scale; 2) incorporating additional satellite data, such as Geostationary Lightning Mapper flash characteristics and rate or visible-channel parameters such as cloud-top reflectance or spatial texture; 3) incorporating a mesoscale numerical weather prediction model or reanalysis that better resolves the environment ingested into storms; 4) modifying the spatiotemporal interpolation of reanalysis data to the satellite grid (e.g., using the model time step before a satellite ECS detection); and 5) training with a longer data record and incorporating 5-min data from GOES-16, coupled with the more recently available 5-min-resolution GridRad Severe MESH dataset. Pursuing these advances should help further generalize the model and improve likelihood estimates. Such improvements would benefit those interested in better understanding hail risk across the globe. The release of the collocated GOES, reanalysis, and MESH95 dataset at the storm-cell scale enables the community to experiment further with, and improve upon, the DNN model.

Acknowledgments.

An open-source Python package, Tracking and Object-Based Analysis of Clouds (tobac; https://tobac.readthedocs.io/en/latest/index.html), version 1.2, was modified to optimize its feature detection and watershed segmentation functionality for the hourly MESH95 data to determine the locations and areas of extrema. The University of Wisconsin–Madison Space Science and Engineering Center McIDAS-V visualization and analysis software (https://www.ssec.wisc.edu/mcidas/software/v/) was used to check the alignment quality of radar, model, and satellite parameters and for figure creation. McIDAS-X was used to acquire and subset the GOES input data for deriving convective cloud-top parameters. The authors acknowledge the NASA Applied Sciences Disasters Program award (18-DISASTER18-0008), which provided the funding for this work. Author C. Homeyer acknowledges support from NASA Grant 80NSSC20K1410. Author J. Allen acknowledges support from the National Science Foundation (AGS-1855054).

Data availability statement.

The matched ECS and MESH cell dataset used in this paper can be found online (https://science-data.larc.nasa.gov/LaRC-SD-Publications/2023-01-25-001-BRS/). Severe storm reports are available through NOAA’s Storm Prediction Center (https://www.spc.noaa.gov/climo/reports/). Data from the fifth major global reanalysis produced by ECMWF were downloaded from the Copernicus Climate Change Service Climate Data Store (https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=overview). MERRA-2 data were provided by the NASA GMAO (https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_5.12.4/summary). Environmental parameters were derived using Python package xcape (https://github.com/xgcm/xcape). Hourly GridRad volumes used to derive MESH are available online (http://weather.ou.edu/∼chomeyer/MESH/). Five-minute GridRad Severe volumes used to derive Severe MESH are also available online (http://weather.ou.edu/∼chomeyer/GridRad_Severe_MESH/).

APPENDIX A

Parameter Sensitivity to Imager Resolution, Hail Report Type, and Season

GOES-13 and GOES-16 parameters are stratified by MESH95 bins to assess their sensitivity to imager resolution (Fig. A1). Convective intensity increases with MESH95 for all four parameters shown, although the GOES-13 and GOES-16 distributions differ. For GOES-16, the median IR − tropopause ranges from ∼0 K in the 0.5–1.0-in. bin to −6 K in the ≥2-in. bin. For GOES-13, it ranges from 3 K in the 0.5–1.0-in. bin to −3 K in the ≥2-in. bin. GOES-16 IR − tropopause thus shows a comparable relative sensitivity to MESH95 but starts at a lower temperature, likely because a finer-resolution imager can resolve colder IR pixels from intense convection (Khlopenkov et al. 2021; Cooney et al. 2021). ECS area is greater for GOES-13 than for GOES-16 because of the larger pixel size of the former.
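The box-and-whisker stratification in Fig. A1 groups matched samples by MESH95 bin and summarizes each group by its median and interquartile range. The minimal sketch below makes three assumptions. First, it uses 0.5-in. bins starting at 0.5 in., with an open-ended ≥2-in. bin; the intermediate bin edges are assumed here. Second, it computes linear-interpolation quartiles via `statistics.quantiles(..., method="inclusive")`. Third, the sample values are invented. Other quartile conventions give slightly different boxes.

```python
import statistics

# Assumed MESH95 bin edges (in.); the last bin is open-ended (>= 2 in.)
MESH_BINS = [(0.5, 1.0), (1.0, 1.5), (1.5, 2.0), (2.0, float("inf"))]


def stratify_by_mesh(mesh95_in, values, bins=MESH_BINS):
    """Return {bin: (median, q1, q3, n)} for a satellite parameter grouped by MESH95 bin."""
    summary = {}
    for lo, hi in bins:
        group = [v for m, v in zip(mesh95_in, values) if lo <= m < hi]
        if len(group) < 2:  # quartiles need at least two samples
            continue
        q1, _, q3 = statistics.quantiles(group, n=4, method="inclusive")
        summary[(lo, hi)] = (statistics.median(group), q1, q3, len(group))
    return summary


# Invented matches: MESH95 (in.) and IR - tropopause (K)
mesh = [0.6, 0.8, 0.9, 1.2, 1.4, 2.5, 3.0, 2.2]
ir_minus_trop = [2.0, 3.0, 4.0, 0.0, 1.0, -5.0, -7.0, -6.0]
for bin_edges, stats in stratify_by_mesh(mesh, ir_minus_trop).items():
    print(bin_edges, stats)
```

Bins with fewer than two matches are skipped rather than plotted. The same function applies unchanged to MWR hail-probability or SPC hail-size bins once `bins` is redefined.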

To compare the sensitivity of these GOES-12/13 parameters with that from other hail detection datasets, Fig. A2 stratifies GOES-12/13 parameters by MWR hail probability. Although the sample size is much smaller than in the MESH95 analyses (Fig. 4), the GOES-12/13 parameters exhibit similar increases in intensity as MWR hail probability increases. The IR − tropopause interquartile range (IQR) spans roughly −2.5 to 2.5 K in the lowest hail-probability bin and −5.5 to −2.5 K in the highest hail-probability bin. This sensitivity is comparable to that found with MESH95, but median intensity is greater (more negative) across all bins. This implies that MWR hail detections are associated with more intense overshooting convection than MESH cell detections.

GOES-12/13 parameters are stratified by SPC hail size in Fig. A3. For all hail-size bins, the median GOES-12/13 match with SPC reports is colder than the tropopause (Fig. A3d) and has a median OT probability greater than 0.6 (Fig. A3a). However, these high intensities come at the price of lower sensitivity to hail size and substantial IQR overlap among bins. Between the lowest (1.0–1.5 in.) and highest (≥3 in.) reported hail-size bins, the median ECS area differs by only ∼20 km² (Fig. A3b) and the median ECS − anvil BTD by only ∼0.75 K (Fig. A3c). The median IR − tropopause ranges only between approximately −1.5 and −3.5 K (Fig. A3d). These results are consistent with the work of Murillo and Homeyer (2019) and Sandmæl et al. (2019), who did not find a notable correlation between GOES-13/14 parameters and reported hail size.

Fig. A1.

Box-and-whisker distributions of select GOES-13 (blue) and GOES-16 (orange) IR cloud-top parameters stratified by MESH95 hail-size bins (matched within ±7.5 min and 30 × 30 km²) during the 2017 warm season: (a) OT probability, (b) ECS area, (c) ECS − anvil BTD, and (d) IR − tropopause. The number of matches within each bin is displayed below (a) and (b).

Fig. A2.

Box-and-whisker distributions of select GOES-12/13 IR cloud-top parameters stratified by MWR hail-probability bins (matched within ±7.5 min and 28 × 28 km²) during 2007–17: (a) OT probability, (b) ECS area, (c) ECS − anvil BTD, and (d) IR − tropopause.

Fig. A3.

Box-and-whisker distributions of select GOES-12/13 IR cloud-top parameters stratified by ground-spotter hail-size report bins (matched within ±7.5 min and 28 × 28 km²) during 2013–17: (a) OT probability, (b) ECS area, (c) ECS − anvil BTD, and (d) IR − tropopause.

APPENDIX B

Additional Case Study, 5 April 2017

A case study of an early-season severe hail-, tornado-, and wind-producing event across the southeastern United States is shown in Fig. B1. Extremely large MERRA-2 SHIP (Fig. B1a) is aligned with environments of elevated SHEAR01 (Fig. B1e). Combined, these environmental forcings contributed to the formation of severe hailstorms across Alabama, Georgia, and, to a lesser extent, areas of the Ohio River valley (Figs. B1d,h). GOES-13 reveals local regions of cloud tops colder and higher than the tropopause (Figs. B1b,f) aligned with OT probabilities at or near 1 (Fig. B1c). These features contribute to good agreement between the DNN hail likelihood and the swaths of severe reports and potentially severe MESH95 (Figs. B1d,g,h). The DNN also correctly predicts isolated occurrences of potentially severe MESH95 in other areas as likely severe. Enhanced views of Figs. B1g and B1h are shown in Fig. B2, together with results from 15-min (Fig. B2b) and 5-min (Fig. B2c) GOES-16. Figure B2 provides a clearer view of the agreement between rapid-scan GOES-13 (Fig. B2a) and MESH95 swaths (Fig. B2d). The GOES-13 result is reasonably comparable to 5-min GOES-16 (Fig. B2c) along the strongest cell tracks, although with many false alarms elsewhere. The 15-min GOES-16 application (Fig. B2b) also shows relatively more false alarms than it did in the previous case studies. These minor shortcomings might be due to the DNN’s overreliance on Great Plains MESH hail events. As a result, the environmental conditions for strong southeastern U.S. hail events might not be as well represented by the model.

Fig. B1.

Maximum intensity of parameters aggregated from 1300 UTC 5 to 0600 UTC 6 Apr 2017: (a) MERRA-2 SHIP; (b) GOES-13 IR − tropopause; (c) GOES-13 OT probability; (d) SPC severe wind, hail, and tornado reports; (e) MERRA-2 SHEAR01; (f) GOES-13 tropopause-relative cloud-top height; (g) DNN-predicted severe hail likelihood exceeding 20%; and (h) 5-min MESH95 exceeding 1.5 in.

Fig. B2.

Enhanced views of the black-outlined areas in Figs. B1g and B1h from 1700 UTC 5 to 0600 UTC 6 Apr 2017, showing (a)–(c) DNN-predicted severe hail likelihoods exceeding 20% and (d) 5-min MESH95 exceeding 1.5 in. Likelihoods are determined using rapid-scan GOES-13, 15-min GOES-16, and 5-min GOES-16.

REFERENCES

  • Allen, J. T., and M. K. Tippett, 2015: The characteristics of United States hail reports: 1955–2014. Electron. J. Severe Storms Meteor., 10 (3), https://ejssm.com/ojs/index.php/site/article/view/60.

  • Allen, J. T., M. K. Tippett, and A. H. Sobel, 2015: An empirical model relating U.S. monthly hail occurrence to large-scale meteorological environment. J. Adv. Model. Earth Syst., 7, 226–243, https://doi.org/10.1002/2014MS000397.

  • Bang, S. D., and D. J. Cecil, 2019: Constructing a multifrequency passive microwave hail retrieval and climatology in the GPM domain. J. Appl. Meteor. Climatol., 58, 1889–1904, https://doi.org/10.1175/JAMC-D-19-0042.1.

  • Bang, S. D., and D. J. Cecil, 2021: Testing passive microwave-based hail retrievals using GPM DPR Ku-band radar. J. Appl. Meteor. Climatol., 60, 255–271, https://doi.org/10.1175/JAMC-D-20-0129.1.

  • Barnes, L. R., D. M. Schultz, E. C. Gruntfest, M. H. Hayden, and C. C. Benight, 2009: CORRIGENDUM: False alarm rate or false alarm ratio? Wea. Forecasting, 24, 1452–1454, https://doi.org/10.1175/2009WAF2222300.1.

  • Barnston, A. G., 1992: Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score. Wea. Forecasting, 7, 699–709, https://doi.org/10.1175/1520-0434(1992)007<0699:CATCRA>2.0.CO;2.

  • Bedka, K. M., and K. Khlopenkov, 2016: A probabilistic multispectral pattern recognition method for detection of overshooting cloud tops using passive satellite imager observations. J. Appl. Meteor. Climatol., 55, 1983–2005, https://doi.org/10.1175/JAMC-D-15-0249.1.

  • Bedka, K. M., J. T. Allen, H. J. Punge, M. Kunz, and D. Simanovic, 2018: A long-term overshooting convective cloud-top detection database over Australia derived from MTSAT Japanese Advanced Meteorological Imager observations. J. Appl. Meteor. Climatol., 57, 937–951, https://doi.org/10.1175/JAMC-D-17-0056.1.

  • Bengio, Y., 2012: Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K. R. Müller, Eds., Lecture Notes in Computer Science, Vol. 7700, Springer, 437–478, https://doi.org/10.1007/978-3-642-35289-8_26.

  • Bowman, K. P., and C. R. Homeyer, 2017: GridRad—Three-dimensional gridded NEXRAD WSR-88D radar data. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory, accessed 27 October 2019, https://doi.org/10.5065/D6NK3CR7.

  • Brooks, H. E., 2009: Proximity soundings for severe convection for Europe and the United States from reanalysis data. Atmos. Res., 93, 546553, https://doi.org/10.1016/j.atmosres.2008.10.005.

    • Search Google Scholar
    • Export Citation
  • Brooks, H. E., C. A. Doswell III, and J. Cooper, 1994: On the environments of tornadic and nontornadic mesocyclones. Wea. Forecasting, 9, 606618, https://doi.org/10.1175/1520-0434(1994)009<0606:OTEOTA>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Brooks, H. E., J. W. Lee, and J. P. Craven, 2003: The spatial distribution of severe thunderstorm and tornado environments from global reanalysis data. Atmos. Res., 67, 7394, https://doi.org/10.1016/S0169-8095(03)00045-0.

    • Search Google Scholar
    • Export Citation
  • Bruick, Z. S., K. L. Rasmussen, and D. J. Cecil, 2019: Subtropical South American hailstorm characteristics and environments. Mon. Wea. Rev., 147, 42894304, https://doi.org/10.1175/MWR-D-19-0011.1.

    • Search Google Scholar
    • Export Citation
  • Burke, A., N. Snook, D. J. Gagne II, S. McCorkle, and A. McGovern, 2020: Calibration of machine learning–based probabilistic hail predictions for operational forecasting. Wea. Forecasting, 35, 149168, https://doi.org/10.1175/WAF-D-19-0105.1.

    • Search Google Scholar
    • Export Citation
  • Cecil, D. J., 2009: Passive microwave brightness temperatures as proxies for hailstorms. J. Appl. Meteor. Climatol., 48, 12811286, https://doi.org/10.1175/2009JAMC2125.1.

    • Search Google Scholar
    • Export Citation
  • Cecil, D. J., S. J. Goodman, D. J. Boccippio, E. J. Zipser, and S. W. Nesbitt, 2005: Three years of TRMM precipitation features. Part I: Radar, radiometric, and lightning characteristics. Mon. Wea. Rev., 133, 543566, https://doi.org/10.1175/MWR-2876.1.

    • Search Google Scholar
    • Export Citation
  • Cintineo, J. L., T. M. Smith, V. Lakshmanan, H. E. Brooks, and K. L. Ortega, 2012: An objective high-resolution hail climatology of the contiguous United States. Wea. Forecasting, 27, 12351248, https://doi.org/10.1175/WAF-D-11-00151.1.

    • Search Google Scholar
    • Export Citation
  • Cintineo, J. L., and Coauthors, 2018: The NOAA/CIMSS ProbSevere model: Incorporation of total lightning and validation. Wea. Forecasting, 33, 331345, https://doi.org/10.1175/WAF-D-17-0099.1.

    • Search Google Scholar
    • Export Citation
  • Cintineo, J. L., M. J. Pavolonis, J. M. Sieglaff, L. Cronce, and J. Brunner, 2020: NOAA ProbSevere v2.0—ProbHail, ProbWind, and ProbTor. Wea. Forecasting, 35, 15231543, https://doi.org/10.1175/WAF-D-19-0242.1.

    • Search Google Scholar
    • Export Citation
  • Coniglio, M. C., and M. D. Parker, 2020: Insights into supercells and their environments from three decades of targeted radiosonde observations. Mon. Wea. Rev., 148, 48934915, https://doi.org/10.1175/MWR-D-20-0105.1.

    • Search Google Scholar
    • Export Citation
  • Coniglio, M. C., and R. E. Jewell, 2022: SPC mesoscale analysis compared to field-project soundings: Implications for supercell environment studies. Mon. Wea. Rev., 150, 567588, https://doi.org/10.1175/MWR-D-21-0222.1.

    • Search Google Scholar
    • Export Citation
  • Cooney, J. W., K. M. Bedka, K. P. Bowman, K. V. Khlopenkov, and K. Itterly, 2021: Comparing tropopause-penetrating convection identifications derived from NEXRAD and GOES over the contiguous United States. J. Geophys. Res. Atmos., 126, e2020JD034319, https://doi.org/10.1029/2020JD034319.

    • Search Google Scholar
    • Export Citation
  • Czernecki, B., M. Taszarek, M. Marosz, M. Półrolniczak, L. Kolendowicz, A. Wyszogrodzki, and J. Szturc, 2019: Application of machine learning to large hail prediction —The importance of radar reflectivity, lightning occurrence and convective parameters derived from ERA5. Atmos. Res., 227, 249262, https://doi.org/10.1016/j.atmosres.2019.05.010.

    • Search Google Scholar
    • Export Citation
  • Dennis, E. J., and M. R. Kumjian, 2017: The impact of vertical wind shear on hail growth in simulated supercells. J. Atmos. Sci., 74, 641663, https://doi.org/10.1175/JAS-D-16-0066.1.

    • Search Google Scholar
    • Export Citation
  • Doswell, C. A. III, R. Davies-Jones, and D. L. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Wea. Forecasting, 5, 576585, https://doi.org/10.1175/1520-0434(1990)005%3C0576:OSMOSI%3E2.0.CO;2.

  • Doswell, C. A. III, R. Davies-Jones, H. E. Brooks, and M. P. Kay, 2005: Climatological estimates of daily local nontornadic severe thunderstorm probability for the United States. Wea. Forecasting, 20, 577–595, https://doi.org/10.1175/WAF866.1.

  • Dworak, R., K. Bedka, J. Brunner, and W. Feltz, 2012: Comparison between GOES-12 overshooting-top detections, WSR-88D radar reflectivity, and severe storm reports. Wea. Forecasting, 27, 684–699, https://doi.org/10.1175/WAF-D-11-00070.1.

  • Edwards, R., 2006: Supercells of the Serranias Del Burro (Mexico). 23rd Conf. on Severe Local Storms, St. Louis, MO, Amer. Meteor. Soc., P6.2, https://ams.confex.com/ams/pdfpapers/114980.pdf.

  • Elmore, K. L., J. T. Allen, and A. E. Gerard, 2022: Sub-severe and severe hail. Wea. Forecasting, 37, 1357–1369, https://doi.org/10.1175/WAF-D-21-0156.1.

  • Farfán, L. M., B. S. Barrett, G. B. Raga, and J. J. Delgado, 2020: Characteristics of mesoscale convection over northwestern Mexico, the Gulf of California, and Baja California Peninsula. Int. J. Climatol., 41, E1062–E1084, https://doi.org/10.1002/joc.6752.

  • Fawcett, T., 2006: An introduction to ROC analysis. Pattern Recognit. Lett., 27, 861–874, https://doi.org/10.1016/j.patrec.2005.10.010.

  • Gagne, D. J., II, A. McGovern, S. E. Haupt, R. A. Sobash, J. K. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819–1840, https://doi.org/10.1175/WAF-D-17-0010.1.

  • Gagne, D. J., II, S. E. Haupt, D. W. Nychka, and G. Thompson, 2019: Interpretable deep learning for spatial analysis of severe hailstorms. Mon. Wea. Rev., 147, 2827–2845, https://doi.org/10.1175/MWR-D-18-0316.1.

  • Gelaro, R., W. McCarty, and M. J. Suárez, 2017: The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1.

  • Gensini, V. A., and M. K. Tippett, 2019: Global Ensemble Forecast System (GEFS) predictions of days 1–15 U.S. tornado and hail frequencies. Geophys. Res. Lett., 46, 2922–2930, https://doi.org/10.1029/2018GL081724.

  • Gensini, V. A., C. Converse, W. S. Ashley, and M. Taszarek, 2021: Machine learning classification of significant tornadoes and hail in the United States using ERA5 proximity soundings. Wea. Forecasting, 36, 2143–2160, https://doi.org/10.1175/WAF-D-21-0056.1.

  • Gerapetritis, H., J. M. Pelissier, and S. C. Greer, 1995: The critical success index and warning strategy. 17th Conf. on Probability and Statistics in the Atmospheric Sciences, Seattle, WA, Amer. Meteor. Soc., 2.10, https://ams.confex.com/ams/pdfpapers/70691.pdf.

  • Glorot, X., A. Bordes, and Y. Bengio, 2011: Deep sparse rectifier neural networks. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR, Vol. 15, ML Research Press, 315–323, https://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf.

  • Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. MIT Press, 800 pp.

  • Griffin, S., K. M. Bedka, and C. Velden, 2016: A method for calculating the height of overshooting convective cloud tops using satellite-based IR imager and CloudSat Cloud Profiling Radar observations. J. Appl. Meteor. Climatol., 55, 479–491, https://doi.org/10.1175/JAMC-D-15-0170.1.

  • Gunturi, P., and M. K. Tippett, 2017: Managing severe thunderstorm risk: Impact of ENSO on U.S. tornado and hail frequencies. WillisRe Tech. Rep., 5 pp., http://www.columbia.edu/~mkt14/files/WillisRe_Impact_of_ENSO_on_US_Tornado_and_Hail_frequencies_Final.pdf.

  • Heikenfeld, M., P. J. Marinescu, M. Christensen, D. Watson-Parris, F. Senf, S. C. van den Heever, and P. Stier, 2019: tobac 1.2: Towards a flexible framework for tracking and analysis of clouds in diverse datasets. Geosci. Model Dev., 12, 4551–4570, https://doi.org/10.5194/gmd-12-4551-2019.

  • Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.

  • Homeyer, C. R., and M. R. Kumjian, 2015: Microphysical characteristics of overshooting convection from polarimetric radar observations. J. Atmos. Sci., 72, 870–891, https://doi.org/10.1175/JAS-D-13-0388.1.

  • Homeyer, C. R., and K. P. Bowman, 2017: Algorithm description document for version 3.1 of the three-dimensional gridded NEXRAD WSR-88D radar (GridRad) dataset. University of Oklahoma Tech. Rep., 23 pp., http://gridrad.org/pdf/GridRad-v3.1-Algorithm-Description.pdf.

  • Hyvärinen, O., 2014: A probabilistic derivation of Heidke skill score. Wea. Forecasting, 29, 177–181, https://doi.org/10.1175/WAF-D-13-00103.1.

  • Johnson, A., and K. E. Sugden, 2014: Evaluation of sounding-derived thermodynamic and wind-related parameters associated with large hail events. Electron. J. Severe Storms Meteor., 9 (5), https://ejssm.com/ojs/index.php/site/article/view/57.

  • Khlopenkov, K. V., K. M. Bedka, J. W. Cooney, and K. Itterly, 2021: Recent advances in detection of overshooting cloud tops from longwave infrared satellite imagery. J. Geophys. Res. Atmos., 126, e2020JD034359, https://doi.org/10.1029/2020JD034359.

  • King, A. T., and A. D. Kennedy, 2019: North American supercell environments in atmospheric reanalyses and RUC-2. J. Appl. Meteor. Climatol., 58, 71–92, https://doi.org/10.1175/JAMC-D-18-0015.1.

  • Kingma, D. P., and J. L. Ba, 2015: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.

  • Kunz, M., 2007: The skill of convective parameters and indices to predict isolated and severe thunderstorms. Nat. Hazards Earth Syst. Sci., 7, 327–342, https://doi.org/10.5194/nhess-7-327-2007.

  • Lazzara, M. A., and Coauthors, 1999: The Man computer Interactive Data Access System: 25 years of interactive processing. Bull. Amer. Meteor. Soc., 80, 271–284, https://doi.org/10.1175/1520-0477(1999)080<0271:TMCIDA>2.0.CO;2.

  • LeCun, Y., L. Bottou, G. B. Orr, and K. R. Müller, 1998: Efficient BackProp. Neural Networks: Tricks of the Trade, G. B. Orr and K. R. Müller, Eds., Lecture Notes in Computer Science, Vol. 1524, Springer, 9–50, https://doi.org/10.1007/3-540-49430-8_2.

  • Lepora, C., R. Abernathey, N. Henderson, J. T. Allen, and M. K. Tippett, 2021: Future global convective environments in CMIP6 models. Earth’s Future, 9, e2021EF002277, https://doi.org/10.1029/2021EF002277.

  • Lin, T.-Y., P. Goyal, R. Girshick, K. He, and P. Dollár, 2020: Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell., 42, 318–327, https://doi.org/10.1109/TPAMI.2018.2858826.

  • Liu, C., E. J. Zipser, D. J. Cecil, S. W. Nesbitt, and S. Sherwood, 2008: A cloud and precipitation feature database from nine years of TRMM observations. J. Appl. Meteor. Climatol., 47, 2712–2728, https://doi.org/10.1175/2008JAMC1890.1.

  • Marion, G. R., R. J. Trapp, and W. Nesbitt, 2019: Using overshooting top area to discriminate potential for large, intense tornadoes. Geophys. Res. Lett., 46, 12 520–12 526, https://doi.org/10.1029/2019GL084099.

  • Masters, D., and C. Luschi, 2018: Revisiting small batch training for deep neural networks. arXiv, 1804.07612v1, https://doi.org/10.48550/arXiv.1804.07612.

  • Mecikalski, J. R., T. N. Sandmæl, E. M. Murillo, C. R. Homeyer, K. M. Bedka, J. M. Apke, and C. P. Jewett, 2021: A random-forest model to assess predictor importance and nowcast severe storms using high-resolution radar–GOES satellite–lightning observations. Mon. Wea. Rev., 149, 1725–1746, https://doi.org/10.1175/MWR-D-19-0274.1.

  • Murillo, E. M., and C. R. Homeyer, 2019: Severe hail fall and hailstorm detection using remote sensing observations. J. Appl. Meteor. Climatol., 58, 947–970, https://doi.org/10.1175/JAMC-D-18-0247.1.

  • Murillo, E. M., C. R. Homeyer, and J. T. Allen, 2021: A 23-year severe hail climatology using GridRad MESH observations. Mon. Wea. Rev., 149, 945–958, https://doi.org/10.1175/MWR-D-20-0178.1.

  • NASA Earth Science Applied Sciences Disasters, 2021: 2021 annual summary. NASA, 81 pp., https://appliedsciences.nasa.gov/sites/default/files/2022-03/NASA%20Disasters%202021%20Annual%20Summary.pdf.

  • Nesbitt, S. W., E. J. Zipser, and D. J. Cecil, 2000: A census of precipitation features in the tropics using TRMM: Radar, ice scattering, and lightning observations. J. Climate, 13, 4087–4106, https://doi.org/10.1175/1520-0442(2000)013<4087:ACOPFI>2.0.CO;2.

  • NOAA, 2022: Significant Hail Parameter (SHiP). Accessed 1 February 2022, https://www.spc.noaa.gov/exper/soundings/help/ship.html.

  • North American Hail Workshop Panel Discussion, 2022: A new view: Hail science through the lens of early career scientists. 2022 North American Workshop on Hail & Hailstorms, accessed 5 January 2023, https://www.youtube.com/watch?v=OjVq4qyi5tQ&list=PLHgkMmlD5xYULMkfYRa5yMHqY1fBchpYo&index=2&t=126s.

  • Ortega, K. L., 2018: Evaluating multi-radar, multi-sensor products for surface hail-fall diagnosis. Electron. J. Severe Storms Meteor., 13 (1), https://doi.org/10.55599/ejssm.v13i1.69.

  • Prechelt, L., 1998: Automatic early stopping using cross validation: Quantifying the criteria. Neural Networks, 11, 761–767, https://doi.org/10.1016/S0893-6080(98)00010-0.

  • Prein, A. F., and G. J. Holland, 2018: Global estimates of damaging hail hazard. Wea. Climate Extremes, 22, 10–23, https://doi.org/10.1016/j.wace.2018.10.004.

  • Púčik, T., P. Groenemeijer, D. Rýva, and M. Kolář, 2015: Proximity soundings of severe and nonsevere thunderstorms in Central Europe. Mon. Wea. Rev., 143, 4805–4821, https://doi.org/10.1175/MWR-D-15-0104.1.

  • Punge, H. J., K. M. Bedka, M. Kunz, and A. Werner, 2014: A new physically based stochastic event catalog for hail in Europe. Nat. Hazards, 73, 1625–1645, https://doi.org/10.1007/s11069-014-1161-0.

  • Punge, H. J., K. M. Bedka, M. Kunz, and A. Reinbold, 2017: Hail frequency estimation across Europe based on a combination of overshooting top detections and the ERA-Interim reanalysis. Atmos. Res., 198, 34–43, https://doi.org/10.1016/j.atmosres.2017.07.025.

  • Punge, H. J., K. M. Bedka, M. Kunz, S. D. Bang, and K. F. Itterly, 2023: Characteristics of hail hazard in South Africa based on satellite detection of convective storms. Nat. Hazards Earth Syst. Sci., 23, 1549–1576, https://doi.org/10.5194/nhess-23-1549-2023.

  • Roebber, P. J., 2009: Visualizing multiple measures of forecast quality. Wea. Forecasting, 24, 601–608, https://doi.org/10.1175/2008WAF2222159.1.

  • Sandmæl, T. N., C. R. Homeyer, K. M. Bedka, J. M. Apke, J. R. Mecikalski, and K. Khlopenkov, 2019: Evaluating the ability of remote sensing observations to identify significantly severe and potentially tornadic storms. J. Appl. Meteor. Climatol., 58, 2569–2590, https://doi.org/10.1175/JAMC-D-18-0241.1.

  • Schaefer, J. T., 1990: The critical success index as an indicator of warning skill. Wea. Forecasting, 5, 570–575, https://doi.org/10.1175/1520-0434(1990)005<0570:TCSIAA>2.0.CO;2.

  • Taszarek, M., J. T. Allen, T. Púčik, K. A. Hoogewind, and H. E. Brooks, 2020: Severe convective storms across Europe and the United States. Part 2: ERA5 environments associated with lightning, large hail, severe wind and tornadoes. J. Climate, 33, 10 263–10 286, https://doi.org/10.1175/JCLI-D-20-0346.1.

  • Taszarek, M., N. Pilguj, J. T. Allen, V. Gensini, H. E. Brooks, and P. Szuster, 2021: Comparison of convective parameters derived from ERA5 and MERRA-2 with rawinsonde data over Europe and North America. J. Climate, 34, 3211–3237, https://doi.org/10.1175/JCLI-D-20-0484.1.

  • University of Oklahoma School of Meteorology, 2021: GridRad-Severe—Three-dimensional gridded NEXRAD WSR-88D radar data for severe events. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory, accessed 1 February 2022, https://doi.org/10.5065/2B46-1A97.

  • Wendt, N. A., and I. L. Jirak, 2021: An hourly climatology of operational MRMS MESH-diagnosed severe and significant hail with comparisons to Storm Data hail reports. Wea. Forecasting, 36, 645–659, https://doi.org/10.1175/WAF-D-20-0158.1.