Search Results

Showing items 11 - 20 of 22 for Author or Editor: David John Gagne II
Amanda Burke, Nathan Snook, David John Gagne II, Sarah McCorkle, and Amy McGovern

Abstract

In this study, we use machine learning (ML) to improve hail prediction by postprocessing numerical weather prediction (NWP) data from the new High-Resolution Ensemble Forecast system, version 2 (HREFv2). Multiple operational models and ensembles currently predict hail; however, ML models are more computationally efficient and do not require the physical assumptions associated with explicit predictions. Calibrating the ML-based predictions toward familiar forecaster output allows for a combination of higher skill associated with ML models and increased forecaster trust in the output. The observational dataset used to train and verify the random forest model is the Maximum Estimated Size of Hail (MESH), a Multi-Radar Multi-Sensor (MRMS) product. To build trust in the predictions, the ML-based hail predictions are calibrated using isotonic regression. The target datasets for isotonic regression include the local storm reports and Storm Prediction Center (SPC) practically perfect data. Verification of the ML predictions indicates that the probability magnitudes output from the calibrated models closely resemble the day-1 SPC outlook and practically perfect data. The ML model calibrated toward the local storm reports exhibited skill similar to or better than that of the uncalibrated predictions, while decreasing model bias. Increases in reliability and skill after calibration may increase forecaster trust in the automated hail predictions.
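A rough illustration of the calibration step described above: the sketch below fits scikit-learn's IsotonicRegression to map raw ML hail probabilities onto observed event frequencies. The synthetic data and variable names are placeholders, not the HREFv2/MESH data or the exact procedure used in the paper.

```python
# A minimal sketch of isotonic-regression calibration of hail probabilities,
# using scikit-learn. All data below are synthetic placeholders.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Raw (uncalibrated) hail probabilities and binary verification targets,
# deliberately constructed so the raw probabilities overforecast the event.
raw_probs = rng.uniform(0.0, 1.0, size=5000)
observed = (rng.uniform(size=5000) < raw_probs ** 2).astype(int)

# Fit a monotonic mapping from raw probability to calibrated probability.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_probs, observed)

# Apply the mapping to new forecast probabilities.
new_raw = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
print(np.round(calibrator.predict(new_raw), 3))
```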

Free access
Amy McGovern, David John Gagne II, Jeffrey Basara, Thomas M. Hamill, and David Margolin
Full access
David John Gagne II, Amy McGovern, Jeffrey B. Basara, and Rodger A. Brown

Abstract

Oklahoma Mesonet surface data and North American Regional Reanalysis data were integrated with the tracks of over 900 tornadic and nontornadic supercell thunderstorms in Oklahoma from 1994 to 2003 to observe the evolution of near-storm environments with data currently available to operational forecasters. These data are used to train a complex data-mining algorithm that can analyze the variability of meteorological data in both space and time and produce a probabilistic prediction of tornadogenesis given variables describing the near-storm environment. The algorithm was assessed for utility in four ways. First, its probability forecasts were scored. The algorithm did produce some useful skill in discriminating between tornadic and nontornadic supercells as well as in producing reliable probabilities. Second, its selection of relevant attributes was assessed for physical significance. Surface thermodynamic parameters, instability, and bulk wind shear were among the most significant attributes. Third, the algorithm’s skill was compared with the skill of single variables commonly used for tornado prediction. The algorithm did noticeably outperform all of the single variables, including composite parameters. Fourth, the situational variations of the predictions from the algorithm were shown in case studies. They revealed instances both in which the algorithm excelled and in which the algorithm was limited.
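As a loose sketch of the workflow described above, the snippet below trains a probabilistic classifier on near-storm environment attributes and scores its probabilities. An ordinary random forest stands in for the paper's data-mining algorithm, and all features and labels are synthetic placeholders rather than the Mesonet/NARR-derived attributes.

```python
# A minimal sketch: probabilistic tornadogenesis guidance from near-storm
# environment variables, with an ordinary random forest standing in for the
# data-mining algorithm used in the paper. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_storms = 900

# Illustrative environment attributes (e.g., surface thermodynamics, instability,
# bulk wind shear), plus synthetic tornado / no-tornado labels tied to the first two.
X = rng.normal(size=(n_storms, 5))
p_true = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] + X[:, 1])))
y = (rng.uniform(size=n_storms) < p_true).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Probabilistic predictions, scored for calibration (Brier) and discrimination (AUC).
probs = model.predict_proba(X_test)[:, 1]
print("Brier score:", round(brier_score_loss(y_test, probs), 3))
print("AUC:", round(roc_auc_score(y_test, probs), 3))
```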

Full access
Amy McGovern, Kimberly L. Elmore, David John Gagne II, Sue Ellen Haupt, Christopher D. Karstens, Ryan Lagerquist, Travis Smith, and John K. Williams

Abstract

High-impact weather events, such as severe thunderstorms, tornadoes, and hurricanes, cause significant disruptions to infrastructure, property loss, and even fatalities. High-impact events can also positively affect society, for example through the savings that renewable energy provides. Prediction of these events has improved substantially with greater observational capabilities, increased computing power, and better model physics, but there is still significant room for improvement. Artificial intelligence (AI) and data science technologies, specifically machine learning and data mining, bridge the gap between numerical model prediction and real-time guidance by improving accuracy. AI techniques also extract otherwise unavailable information from forecast models by fusing model output with observations to provide additional decision support for forecasters and users. In this work, we demonstrate that applying AI techniques along with a physical understanding of the environment can significantly improve the prediction skill for multiple types of high-impact weather. The AI approach is also a contribution to the growing field of computational sustainability. The authors specifically discuss the prediction of storm duration, severe wind, severe hail, precipitation classification, forecasting for renewable energy, and aviation turbulence. They also discuss how AI techniques can process “big data,” provide insights into high-impact weather phenomena, and improve our understanding of high-impact weather.

Open access
David John Gagne II, Amy McGovern, Sue Ellen Haupt, Ryan A. Sobash, John K. Williams, and Ming Xue

Abstract

Forecasting severe hail accurately requires predicting how well atmospheric conditions support the development of thunderstorms, the growth of large hail, and the minimal loss of hail mass to melting before reaching the surface. Existing hail forecasting techniques incorporate information about these processes from proximity soundings and numerical weather prediction models, but they make many simplifying assumptions, are sensitive to differences in numerical model configuration, and are often not calibrated to observations. In this paper a storm-based probabilistic machine learning hail forecasting method is developed to overcome the deficiencies of existing methods. An object identification and tracking algorithm locates potential hailstorms in convection-allowing model output and gridded radar data. Forecast storms are matched with observed storms to determine hail occurrence and the parameters of the radar-estimated hail size distribution. The database of forecast storms contains information about storm properties and the conditions of the prestorm environment. Machine learning models are used to synthesize that information to predict the probability of a storm producing hail and the radar-estimated hail size distribution parameters for each forecast storm. Forecasts from the machine learning models are produced using two convection-allowing ensemble systems and the results are compared to other hail forecasting methods. The machine learning forecasts have a higher critical success index (CSI) at most probability thresholds and greater reliability for predicting both severe and significant hail.
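The last sentence refers to CSI computed at probability thresholds; the sketch below shows one common way to do that, converting probabilistic hail forecasts to yes/no forecasts at each threshold and computing CSI = hits / (hits + misses + false alarms). The forecast and observation arrays are synthetic placeholders, not output from the paper's ensembles.

```python
# A minimal sketch of threshold-based CSI verification for probabilistic hail
# forecasts. Both arrays are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
probs = rng.uniform(size=2000)        # forecast hail probability per storm
obs = rng.uniform(size=2000) < probs  # synthetic hail occurrence (boolean)

for threshold in np.arange(0.1, 1.0, 0.2):
    forecast = probs >= threshold
    hits = np.sum(forecast & obs)
    misses = np.sum(~forecast & obs)
    false_alarms = np.sum(forecast & ~obs)
    denom = hits + misses + false_alarms
    csi = hits / denom if denom > 0 else np.nan
    print(f"threshold={threshold:.1f}  CSI={csi:.3f}")
```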

Full access
Peter D. Dueben, Martin G. Schultz, Matthew Chantry, David John Gagne II, David Matthew Hall, and Amy McGovern

Abstract

Benchmark datasets and benchmark problems have been a key factor in the success of modern machine learning applications in many scientific domains. Consequently, an active discussion about benchmarks for applications of machine learning has also started in the atmospheric sciences. Such benchmarks allow for the comparison of machine learning tools and approaches in a quantitative way and enable a separation of concerns for domain and machine learning scientists. However, a clear definition of benchmark datasets for weather and climate applications is missing, which leaves many domain scientists confused. In this paper, we equip the domain of atmospheric sciences with a recipe for how to build proper benchmark datasets, present a (nonexclusive) list of domain-specific challenges for machine learning, and discuss where and what benchmark datasets will be needed to tackle these challenges. We hope that the creation of benchmark datasets will help the machine learning efforts in atmospheric sciences to be more coherent and, at the same time, target the efforts of machine learning scientists and experts in high-performance computing toward the most imminent challenges in atmospheric sciences. We focus on benchmarks for the atmospheric sciences (weather, climate, and air-quality applications), but many aspects of this paper will also hold for, or at least transfer to, other areas of the Earth system sciences.

Significance Statement

Machine learning is the study of computer algorithms that learn automatically from data. Atmospheric sciences have started to explore sophisticated machine learning techniques and the community is making rapid progress on the uptake of new methods for a large number of application areas. This paper provides a clear definition of so-called benchmark datasets for weather and climate applications that help to share data and machine learning solutions between research groups to reduce time spent in data processing, to generate synergies between groups, and to make tool developments more targeted and comparable. Furthermore, a list of benchmark datasets that will be needed to tackle important challenges for the use of machine learning in atmospheric sciences is provided.
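As a loose illustration (not the paper's recipe) of the basic ingredients a benchmark dataset pins down, the sketch below defines fixed splits, an agreed target and metric, and a reference baseline score that submitted models would need to beat. The 2 m temperature variables and the MAE metric are hypothetical choices made for this example.

```python
# A minimal, hypothetical benchmark skeleton: fixed train/test splits, an agreed
# target variable and metric, and a reference baseline. Data are synthetic.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

def make_split(n):
    forecast = rng.normal(288.0, 5.0, n)           # raw model 2 m temperature (K)
    observed = forecast + rng.normal(0.0, 2.0, n)  # matched observations (K)
    return pd.DataFrame({"t2m_forecast": forecast, "t2m_observed": observed})

# In a real benchmark these splits would be versioned and distributed with the data.
train, test = make_split(1000), make_split(500)

# Agreed metric: mean absolute error (K). Baseline: the uncorrected forecast itself;
# an ML correction would be fit on `train` and evaluated on `test` with this metric.
baseline_mae = mean_absolute_error(test["t2m_observed"], test["t2m_forecast"])
print(f"Baseline MAE to beat: {baseline_mae:.2f} K")
```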

Free access
Ryan A. Sobash, David John Gagne II, Charlie L. Becker, David Ahijevych, Gabrielle N. Gantos, and Craig S. Schwartz

Abstract

While convective storm mode is explicitly depicted in convection-allowing model (CAM) output, subjectively diagnosing mode in large volumes of CAM forecasts can be burdensome. In this work, four machine learning (ML) models were trained to probabilistically classify CAM storms into one of three modes: supercells, quasi-linear convective systems, and disorganized convection. The four ML models included a dense neural network (DNN), logistic regression (LR), a convolutional neural network (CNN), and semisupervised CNN–Gaussian mixture model (GMM). The DNN, CNN, and LR were trained with a set of hand-labeled CAM storms, while the semisupervised GMM used updraft helicity and storm size to generate clusters, which were then hand labeled. When evaluated using storms withheld from training, the four classifiers had similar ability to discriminate between modes, but the GMM had worse calibration. The DNN and LR had similar objective performance to the CNN, suggesting that CNN-based methods may not be needed for mode classification tasks. The mode classifications from all four classifiers successfully approximated the known climatology of modes in the United States, including a maximum in supercell occurrence in the U.S. Central Plains. Further, the modes also occurred in environments recognized to support the three different storm morphologies. Finally, storm mode provided useful information about hazard type, e.g., storm reports were most likely with supercells, further supporting the efficacy of the classifiers. Future applications, including the use of objective CAM mode classifications as a novel predictor in ML systems, could potentially lead to improved forecasts of convective hazards.

Significance Statement

Whether a thunderstorm produces hazards such as tornadoes, hail, or intense wind gusts is in part determined by whether the storm takes the form of a single cell or a line. Numerical forecasting models can now provide forecasts that depict this structure. We tested several automated algorithms to extract this information from forecast output using machine learning. All of the automated methods were able to distinguish between a set of three convective types, with the simple techniques providing similarly skilled classifications compared to the complex approaches. The automated classifications also successfully discriminated between thunderstorm hazards, potentially leading to new forecast tools and better forecasts of high-impact convective hazards.
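A minimal sketch of the simplest of the four classifiers compared above: a multinomial logistic regression that assigns each CAM storm object a probability of being a supercell, a quasi-linear convective system, or disorganized convection. The two features and the labels are synthetic stand-ins for the hand-labeled storm attributes used in the paper.

```python
# A minimal sketch: logistic regression giving probabilistic storm-mode
# classifications from two illustrative storm-object features. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
modes = ["supercell", "QLCS", "disorganized"]

# Illustrative features: updraft helicity and object length, drawn so the three
# modes occupy loosely separable regions of feature space.
uh = np.concatenate([rng.normal(150, 30, 300), rng.normal(40, 15, 300), rng.normal(20, 10, 300)])
length_km = np.concatenate([rng.normal(25, 8, 300), rng.normal(120, 30, 300), rng.normal(30, 10, 300)])
X = np.column_stack([uh, length_km])
y = np.repeat([0, 1, 2], 300)  # 0 = supercell, 1 = QLCS, 2 = disorganized

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Probabilistic mode classification for one new storm object.
for mode, p in zip(modes, clf.predict_proba([[140.0, 30.0]])[0]):
    print(f"P({mode}) = {p:.2f}")
```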

Restricted access
Amy McGovern, Ann Bostrom, Phillip Davis, Julie L. Demuth, Imme Ebert-Uphoff, Ruoying He, Jason Hickey, David John Gagne II, Nathan Snook, Jebb Q. Stewart, Christopher Thorncroft, Philippe Tissot, and John K. Williams

Abstract

We introduce the National Science Foundation (NSF) AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES). This AI institute was funded in 2020 as part of a new initiative from the NSF to advance foundational AI research across a wide variety of domains. To date AI2ES is the only NSF AI institute focusing on environmental science applications. Our institute focuses on developing trustworthy AI methods for weather, climate, and coastal hazards. The AI methods will revolutionize our understanding and prediction of high-impact atmospheric and ocean science phenomena and will be utilized by diverse, professional user groups to reduce risks to society. In addition, we are creating novel educational paths, including a new degree program at a community college serving underrepresented minorities, to improve workforce diversity for both AI and environmental science.

Full access
Amy McGovern, Ann Bostrom, Marie McGraw, Randy J. Chase, David John Gagne II, Imme Ebert-Uphoff, Kate D. Musgrave, and Andrea Schumacher

Abstract

Artificial intelligence (AI) can be used to improve performance across a wide range of Earth system prediction tasks. As with any application of AI, it is important for AI to be developed in an ethical and responsible manner to minimize bias and other effects. In this work, we extend our previous work demonstrating how AI can go wrong with weather and climate applications by presenting a categorization of bias for AI in the Earth sciences. This categorization can assist AI developers to identify potential biases that can affect their model throughout the AI development life cycle. We highlight examples from a variety of Earth system prediction tasks of each category of bias.

Open access