Browse

You are looking at items 31–40 of 162 for Artificial Intelligence for the Earth Systems.
Israt Jahan, Diego Cerrai, and Marina Astitha

Abstract

Wind gusts are often associated with severe hazards and can cause structural and environmental damage, making gust prediction a crucial element of weather forecasting services. In this study, we explored the use of machine learning (ML) algorithms integrated with numerical weather prediction output from the Weather Research and Forecasting (WRF) Model to align estimates of wind gust potential with observed gusts. We used two ML algorithms, random forest (RF) and extreme gradient boosting (XGB), along with two statistical techniques, a generalized linear model with identity link function (GLM-Identity) and a generalized linear model with log link function (GLM-Log), to predict storm wind gusts for the northeast (NE) United States. We used 61 simulated extratropical and tropical storms that occurred between 2005 and 2020 to develop and validate the ML and statistical models. To assess ML model performance, we compared our results with postprocessed gust potential from WRF. Our findings showed that the ML models, especially XGB, performed significantly better than the statistical models and the Unified Post Processor for the WRF Model (WRF-UPP) and better aligned predicted with observed gusts across all storms. The ML models had difficulty capturing the upper tail of the gust distribution, and the learning curves suggested that XGB was more effective than RF at producing good predictions from fewer storms.
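As a rough illustration of the setup described in this abstract (not the authors' code), the sketch below fits RF, XGB, and a GLM with log link to a hypothetical table of WRF-derived predictors and observed gusts; the file name, predictor names, and hyperparameters are assumptions.

```python
# A rough sketch (not the authors' code): random forest, XGBoost, and a GLM with
# log link fitted to WRF-derived predictors of observed gusts. File name,
# predictor names, and hyperparameters are illustrative assumptions.
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

df = pd.read_csv("wrf_gust_training.csv")  # hypothetical table: one row per station/valid time
features = ["wind_speed_10m", "pbl_height", "friction_velocity", "lapse_rate"]  # assumed predictors
X, y = df[features].values, df["observed_gust"].values

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)               # RF
xgb = XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=0).fit(X, y)   # XGB

# GLM-Log analogue: Gaussian family with a log link (GLM-Identity would keep the default link)
glm_log = sm.GLM(y, sm.add_constant(X),
                 family=sm.families.Gaussian(link=sm.families.links.Log())).fit()

print(rf.predict(X[:5]), xgb.predict(X[:5]), glm_log.predict(sm.add_constant(X)[:5]))
```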

Open access
Free access
L. Raynaud, G. Faure, M. Puig, C. Dauvilliers, J.-N. Trosino, and P. Béjean

Abstract

Detection and tracking of tropical cyclones (TCs) in numerical weather prediction model outputs is essential for many applications, such as forecast guidance and real-time monitoring of events. While this task was automated in the 1990s with heuristic models relying on a set of empirical rules and thresholds, the recent success of machine learning methods at detecting objects in images opens new perspectives. This paper introduces and evaluates the capacity of a convolutional neural network based on the U-Net architecture to detect the TC wind structure, including the maximum wind speed area and the hurricane-force wind speed area, in the outputs of the convective-scale AROME model. A dataset of 400 AROME forecasts over the West Indies domain was entirely hand-labeled by experts, following a rigorous process to reduce heterogeneities. The U-Net performs well on a wide variety of TC intensities and shapes, with an average intersection-over-union metric of around 0.8. Its performance, however, strongly depends on TC strength, and the detection of weak cyclones is more challenging because their structure is less well defined. The U-Net also significantly outperforms an operational heuristic detection model, with a particularly large gain for weak TCs, while running much faster. In the last part, the capacity of the U-Net to generalize to slightly different data is demonstrated in the context of a domain change and a resolution increase. In both cases, the pretrained U-Net achieves performance similar to that obtained on the original dataset.
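For reference, a minimal sketch of the intersection-over-union metric quoted above, applied to binary wind-structure masks; the toy masks are illustrative only.

```python
# Minimal sketch (not the authors' code) of intersection over union for binary
# TC wind-structure masks, the metric reported in the abstract.
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """IoU between two binary masks (e.g., hurricane-force wind areas)."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, true).sum() / union)

# Toy example: two overlapping square footprints on a 100 x 100 grid
a = np.zeros((100, 100), dtype=bool); a[20:60, 20:60] = True
b = np.zeros((100, 100), dtype=bool); b[30:70, 30:70] = True
print(f"IoU = {iou(a, b):.2f}")
```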

Open access
Fraser King, Claire Pettersen, Christopher G. Fletcher, and Andrew Geiss

Abstract

CloudSat’s Cloud Profiling Radar is a valuable tool for remotely monitoring high-latitude snowfall, but its ability to observe hydrometeor activity near the Earth’s surface is limited by a radar blind zone caused by ground clutter contamination. This study presents the development of a deeply supervised U-Net-style convolutional neural network to predict cold season reflectivity profiles within the blind zone at two Arctic locations. The network learns to predict the presence and intensity of near-surface hydrometeors by coupling latent features encoded in clouds aloft of the blind zone with additional context from collocated atmospheric state variables (i.e., temperature, specific humidity, and wind speed). Results show that the U-Net predictions outperform traditional linear extrapolation methods, with low mean absolute error, a 38% higher Sørensen–Dice coefficient, and vertical reflectivity distributions 60% closer to observed values. The U-Net is also able to detect the presence of near-surface cloud with a critical success index (CSI) of 72% and cases of shallow cumuliform snowfall and virga with 18% higher CSI values compared to linear methods. An explainability analysis shows that reflectivity information throughout the scene, especially at cloud edges and at the 1.2-km blind zone threshold, and atmospheric state variables near the tropopause are the most significant contributors to model skill. This surface-trained generative inpainting technique has the potential to enhance current and future remote sensing precipitation missions by providing a better understanding of the nonlinear relationship between blind zone reflectivity values and the surrounding atmospheric state.
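The sketch below illustrates the kind of linear-extrapolation baseline the U-Net is compared against, filling blind-zone range gates below roughly 1.2 km from the lowest two observable gates; the function and its arguments are illustrative assumptions, not the study's implementation.

```python
# A minimal sketch (assumption, not the study's implementation) of a linear-
# extrapolation baseline: blind-zone gates below ~1.2 km are filled by extending
# the gradient of the lowest two observable gates.
import numpy as np

def fill_blind_zone(profile: np.ndarray, heights_km: np.ndarray, blind_top: float = 1.2) -> np.ndarray:
    """Extrapolate reflectivity (dBZ) into gates below `blind_top`.

    Assumes `heights_km` is sorted from the surface upward and matches `profile`.
    """
    filled = profile.copy()
    observable = heights_km >= blind_top
    h0, h1 = heights_km[observable][:2]   # lowest two gates the radar can see
    z0, z1 = profile[observable][:2]
    slope = (z1 - z0) / (h1 - h0)
    blind = ~observable
    filled[blind] = z0 + slope * (heights_km[blind] - h0)
    return filled

# Toy profile: 240-m gates from 0 to 6 km, blind zone masked as NaN
heights = np.arange(0.0, 6.0, 0.24)
profile = np.where(heights >= 1.2, 10.0 - 2.0 * heights, np.nan)
print(fill_blind_zone(profile, heights)[:6])
```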

Significance Statement

Snowfall is a critical contributor to the global water–energy budget, with important connections to water resource management, flood mitigation, and ecosystem sustainability. However, traditional spaceborne remote monitoring of snowfall faces challenges due to a near-surface radar blind zone, which masks a portion of the atmosphere. In this study, a deep learning model was developed to fill in missing data across these regions using surface radar and atmospheric state variables. The model accurately predicts reflectivity, with significant improvements over conventional methods. This innovative approach enhances our understanding of reflectivity patterns and atmospheric interactions, bolstering advances in remote snowfall prediction.

Open access
Tobias Bischoff and Katherine Deck

Abstract

We present a method to downscale idealized geophysical fluid simulations using generative models based on diffusion maps. By analyzing the Fourier spectra of fields drawn from different data distributions, we show how a diffusion bridge can be used as a transformation between a low-resolution and a high-resolution dataset, allowing new high-resolution samples to be generated given specific low-resolution features. The ability to generate new samples allows for the computation of any statistic of interest without additional calibration or training. Our unsupervised setup is also designed to downscale fields without access to paired training data; this flexibility allows for the combination of multiple source and target domains without additional training. We demonstrate that the method enhances resolution and corrects context-dependent biases in geophysical fluid simulations, including in extreme events. We anticipate that the same method can be used to downscale the output of climate simulations, including temperature and precipitation fields, without needing to train a new model for each application, providing significant computational cost savings.
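A generic sketch (not the authors' code) of the radially averaged Fourier power spectrum, the kind of diagnostic used when comparing low- and high-resolution fields; the toy field is a stand-in.

```python
# Generic sketch of a radially averaged (isotropic) Fourier power spectrum,
# a standard diagnostic for comparing fields at different resolutions.
import numpy as np

def radial_power_spectrum(field: np.ndarray) -> np.ndarray:
    """1D isotropic power spectrum of a square 2D field, indexed by integer wavenumber."""
    n = field.shape[0]
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ky, kx = np.indices((n, n)) - n // 2
    k = np.hypot(kx, ky).astype(int)
    counts = np.bincount(k.ravel())
    # average the power over annuli of (roughly) constant wavenumber magnitude
    return np.bincount(k.ravel(), weights=power.ravel()) / np.maximum(counts, 1)

field = np.random.randn(64, 64)          # stand-in for a simulated fluid field
print(radial_power_spectrum(field)[:5])  # power at the five largest scales
```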

Significance Statement

The purpose of this study is to apply recent advances in generative machine learning technologies to obtain higher-resolution geophysical fluid dynamics model output at lower cost compared with direct simulation while preserving important statistical properties of the high-resolution data. This is important because while high-resolution climate model output is required by many applications, it is also computationally expensive to obtain.

Open access
Catharina Elisabeth Graafland, Swen Brands, and José Manuel Gutiérrez

Abstract

The different phases of the Coupled Model Intercomparison Project (CMIP) provide ensembles of past, present, and future climate simulations crucial for climate change impact and adaptation activities. These ensembles are produced using multiple global climate models (GCMs) from different modeling centers with some shared building blocks and interdependencies. Applications typically follow the “model democracy” approach, which might have significant implications for the resulting products (e.g., large bias and low spread). Thus, quantifying model similarity within ensembles is crucial for interpreting model agreement and multimodel uncertainty in climate change studies. The classical methods used for assessing GCM similarity can be classified into two groups: the a priori approach relies on expert knowledge about the components of these models, while the a posteriori approach seeks similarity in the GCMs’ output variables and is thus data driven. In this study, we apply probabilistic network models (PNMs), a well-established machine learning technique, as a new a posteriori method to measure intermodel similarities. The proposed methodology is applied to surface temperature fields of the historical experiments from the CMIP5 multimodel ensemble and different reanalysis gridded datasets. PNMs are able to learn the complex spatial dependency structures present in climate data, including teleconnections operating on multiple spatial scales, characteristic of the underlying GCM. A distance metric built on the resulting PNMs is applied to characterize GCM dependencies. The results of this approach are in line with those obtained with more traditional methods but offer further explanatory potential through probabilistic model querying.
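As a loose, simplified analogue of the a posteriori similarity idea (explicitly not the Bayesian-network machinery used in the paper), the sketch below learns a sparse Gaussian graphical model of spatial dependencies for each GCM's temperature anomalies and compares the resulting edge sets.

```python
# A simplified stand-in (not the paper's method): sparse Gaussian graphical models
# replace the probabilistic network models, and GCMs are compared through the edge
# sets of their learned spatial dependency graphs. Inputs are assumed to be
# (time, gridpoint) surface-temperature anomaly matrices.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def dependency_graph(anomalies: np.ndarray) -> np.ndarray:
    """Binary adjacency matrix of the learned spatial dependency structure."""
    precision = GraphicalLassoCV().fit(anomalies).precision_
    adjacency = np.abs(precision) > 1e-6
    np.fill_diagonal(adjacency, False)
    return adjacency

def gcm_distance(anoms_a: np.ndarray, anoms_b: np.ndarray) -> float:
    """Fraction of gridpoint pairs whose dependency differs between the two models."""
    return float(np.logical_xor(dependency_graph(anoms_a), dependency_graph(anoms_b)).mean())
```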

Significance Statement

The present study proposes the use of probabilistic network models (PNMs) to quantify model similarity within ensembles of global climate models (GCMs). This is crucial for interpreting model agreement and multimodel uncertainty in climate change studies. When applied to climate data (gridded global surface temperature in this study), PNMs encode the relevant spatial dependencies (local and remote connections). Similarities among the PNMs resulting from different GCMs can be quantified and are shown to capture the similar GCM formulations reported in previous studies. Unlike other machine learning methods previously applied to this problem, PNMs are fully explainable (allowing probabilistic querying) and are applicable to high-dimensional gridded raw data.

Open access
Yingkai Sha, Ryan A. Sobash, and David John Gagne II

Abstract

An ensemble postprocessing method is developed for the probabilistic prediction of severe weather (tornadoes, hail, and wind gusts) over the conterminous United States (CONUS). The method combines conditional generative adversarial networks (CGANs), a type of deep generative model, with a convolutional neural network (CNN) to postprocess convection-allowing model (CAM) forecasts. The CGANs are designed to create synthetic ensemble members from deterministic CAM forecasts, and their outputs are processed by the CNN to estimate the probability of severe weather. The method is tested using High-Resolution Rapid Refresh (HRRR) 1–24-h forecasts as inputs and Storm Prediction Center (SPC) severe weather reports as targets. The method produced skillful predictions, with Brier skill score (BSS) increases of up to 20% compared to other neural-network-based reference methods on a testing dataset of HRRR forecasts from 2021. In terms of uncertainty quantification, the method is overconfident but produces meaningful ensemble spreads that can distinguish good from bad forecasts. The quality of the CGAN outputs is also evaluated: the outputs behave similarly to a numerical ensemble, preserving the intervariable correlations and the contributions of influential predictors found in the original HRRR forecasts. This work provides a novel approach to postprocessing CAM output with neural networks that can be applied to severe weather prediction.
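A minimal sketch of the Brier skill score used in the verification above; the forecast probabilities, reference probabilities, and binary observations are supplied by the user.

```python
# Minimal sketch of the Brier score and Brier skill score (BSS) used for
# verification of probabilistic severe weather forecasts.
import numpy as np

def brier_score(prob: np.ndarray, obs: np.ndarray) -> float:
    """Mean squared error of probabilistic forecasts against binary (0/1) outcomes."""
    return float(np.mean((prob - obs) ** 2))

def brier_skill_score(prob: np.ndarray, ref_prob: np.ndarray, obs: np.ndarray) -> float:
    """BSS > 0 means the forecast improves on the reference forecast."""
    return 1.0 - brier_score(prob, obs) / brier_score(ref_prob, obs)
```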

Significance Statement

We use a new machine learning (ML) technique to generate probabilistic forecasts of convective weather hazards, such as tornadoes and hailstorms, from the output of high-resolution numerical weather model forecasts. The new ML system generates an ensemble of synthetic forecast fields from a single forecast, which are then used to train ML models for convective hazard prediction. Using this ML-generated ensemble for training leads to improvements of 10%–20% in severe weather forecast skill compared with other ML algorithms that use only the output of the single forecast. This work is unique in that it explores the use of ML methods both to produce synthetic forecasts of convective storm events and to train ML systems for high-impact convective weather prediction on those synthetic forecasts.

Open access
Çağlar Küçük, Apostolos Giannakos, Stefan Schneider, and Alexander Jann

Abstract

Weather radar data are critical for nowcasting and an integral component of numerical weather prediction models. While weather radar data provide valuable information at high resolution, their ground-based nature limits their availability, which impedes large-scale applications. In contrast, meteorological satellites cover larger domains but with coarser resolution. However, with rapid advancements in data-driven methodologies and modern sensors aboard geostationary satellites, new opportunities are emerging to bridge the gap between ground- and space-based observations, ultimately leading to more skillful and accurate weather prediction. Here, we present a transformer-based model for nowcasting ground-based radar image sequences from satellite data at lead times of up to 2 h. Trained on a dataset reflecting severe weather conditions, the model predicts radar fields occurring under different weather phenomena and shows robustness against rapidly growing or decaying fields and complex field structures. Model interpretation reveals that the infrared channel centered at 10.3 μm (C13) contains skillful information for all weather conditions, while lightning data have the highest relative feature importance in severe weather conditions, particularly at shorter lead times. The model can support precipitation nowcasting across large domains without an explicit need for radar towers, enhance numerical weather prediction and hydrological models, and provide a radar proxy for data-scarce regions. Moreover, the open-source framework facilitates progress toward operational data-driven nowcasting.
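The sketch below shows one common permutation-based way to estimate channel-level feature importance, in the spirit of the model interpretation described above; the array layout, predict function, and skill metric are assumptions, not the authors' setup.

```python
# A common permutation-importance recipe (an assumption, not the authors' exact
# interpretation method): shuffle one input channel at a time across samples and
# record how much a user-supplied skill metric degrades.
import numpy as np

def channel_importance(predict_fn, x, y_true, skill_fn, seed=0):
    """x has shape (samples, height, width, channels); returns the skill drop per channel."""
    rng = np.random.default_rng(seed)
    base_skill = skill_fn(predict_fn(x), y_true)
    drops = []
    for c in range(x.shape[-1]):
        idx = rng.permutation(x.shape[0])
        x_perm = x.copy()
        x_perm[..., c] = x[idx, ..., c]  # break the pairing between channel c and the targets
        drops.append(base_skill - skill_fn(predict_fn(x_perm), y_true))
    return np.array(drops)  # larger drop = more important channel (e.g., C13, lightning)
```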

Significance Statement

Ground-based weather radar data are essential for nowcasting, but data availability limitations hamper usage of radar data across large domains. We present a machine learning model, rooted in transformer architecture, that performs nowcasting of radar data using high-resolution geostationary satellite retrievals, for lead times of up to 2 h. Our model captures the spatiotemporal dynamics of radar fields from satellite data and offers accurate forecasts. Analysis indicates that the infrared channel centered at 10.3 μm provides useful information for nowcasting radar fields under various weather conditions. However, lightning activity exhibits the highest forecasting skill for severe weather at short lead times. Our findings show the potential of transformer-based models for nowcasting severe weather.

Open access
Wei-Yi Cheng, Daehyun Kim, Scott Henderson, Yoo-Geun Ham, Jeong-Hwan Kim, and Robert H. Holzworth

Abstract

The diversity of lightning parameterizations in numerical weather and climate models causes considerable uncertainty in lightning prediction. In this study, we take a data-driven approach to the lightning parameterization problem by combining machine learning (ML) techniques with the rich lightning observations from the World Wide Lightning Location Network. Three ML algorithms are trained over the contiguous United States (CONUS) to predict lightning stroke density in a 1° box from information about the atmospheric variables in the same grid box (local) or over the entire CONUS (nonlocal). The performance of the ML-based lightning schemes is examined and compared with that of the simple, conventional lightning parameterization scheme of Romps et al. We find that all ML-based lightning schemes perform better than the conventional scheme in the regions and seasons with climatologically higher lightning stroke density. To the west of the Rocky Mountains, the nonlocal ML lightning scheme achieves the best overall performance, with lightning stroke density predictions 70% more accurate than those of the conventional scheme. Our results suggest that ML-based approaches have the potential to improve the representation of lightning and other types of extreme weather events in weather and climate models.
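As a rough illustration (not the authors' code), the sketch below places a local ML lightning scheme next to a CAPE times precipitation proxy in the spirit of Romps et al.; the data file, predictor names, and the fitted proportionality constant are assumptions.

```python
# A rough sketch (not the authors' code): a local random-forest lightning scheme
# alongside a CAPE x precipitation proxy baseline. The data file, predictor names,
# and the fitted constant are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("conus_1deg_training.csv")                       # hypothetical gridded table
X = df[["cape", "precip_rate", "cin", "column_humidity"]].values  # assumed local predictors
y = df["wwlln_stroke_density"].values

# Conventional-style baseline: stroke density proportional to CAPE x precipitation rate,
# with the constant fitted to the training data rather than taken from the literature.
proxy = df["cape"].values * df["precip_rate"].values
baseline = proxy * (y.mean() / proxy.mean())

ml_local = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
print("proxy MAE:", np.mean(np.abs(baseline - y)),
      "RF MAE:", np.mean(np.abs(ml_local.predict(X) - y)))
```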

Open access
Harold E. Brooks, Montgomery L. Flora, and Michael E. Baldwin

Abstract

Forecast evaluation metrics have been discovered and rediscovered in a variety of contexts, leading to confusion. We look at measures from the 2 × 2 contingency table and the history of their development and illustrate how different fields working on similar problems have arrived at different approaches to, and perspectives on, the same mathematical concepts. For example, probability of detection (POD) in meteorology, a quantity also historically called prefigurance in that field, is named recall in information science and machine learning and sensitivity or true positive rate in the medical literature. Many of the scores that combine three elements of the 2 × 2 table can be seen as arising either from a Venn diagram perspective or from the Pythagorean means, possibly weighted, of two ratios of performance measures. Although there are algebraic relationships between the two perspectives, the approaches taken by authors led them in different directions, making it unlikely that they would discover scores that naturally arose from the other approach. We close by discussing the importance of understanding the implicit or explicit values expressed by the choice of scores. In addition, we make some simple recommendations about the appropriate nomenclature to use when publishing interdisciplinary work.
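A small sketch of the 2 × 2 contingency-table scores discussed above, annotated with the names they carry in different fields; the example counts are arbitrary.

```python
# Sketch of common 2 x 2 contingency-table scores with their names across fields.
def contingency_scores(hits: int, false_alarms: int, misses: int, correct_negatives: int) -> dict:
    pod = hits / (hits + misses)                 # POD / prefigurance / recall / sensitivity / true positive rate
    sr = hits / (hits + false_alarms)            # success ratio (1 - FAR) / precision
    csi = hits / (hits + misses + false_alarms)  # critical success index / threat score / Jaccard index
    f1 = 2 * pod * sr / (pod + sr)               # F1: the (Pythagorean) harmonic mean of POD and SR
    # note: none of these scores uses correct_negatives, a shared property of these measures
    return {"POD (recall)": pod, "SR (precision)": sr, "CSI": csi, "F1": f1}

print(contingency_scores(hits=50, false_alarms=20, misses=10, correct_negatives=920))
```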

Open access