Browse: Artificial Intelligence for the Earth Systems

Showing items 1–10 of 118.
Mahsa Payami, Yunsoo Choi, Ahmed Khan Salman, Seyedali Mousavinezhad, Jincheol Park, and Arman Pouyaei

Abstract

In this study, we developed an emulator of the Community Multiscale Air Quality (CMAQ) model by employing a 1-dimensional Convolutional Neural Network (CNN) algorithm to predict hourly surface nitrogen dioxide (NO2) concentrations over the most densely populated urban regions in Texas. The inputs for the emulator were the same as those for the CMAQ model, which include emissions, meteorology, and land use/land cover data. We trained the model over June, July, and August (JJA) of 2011 and 2014 and then tested it on JJA of 2017, achieving an Index of Agreement (IOA) of 0.95 and a correlation of 0.90. We also employed temporal 3-fold cross-validation to evaluate the model's performance, ensuring the robustness and generalizability of the results. To gain deeper insights and understand the factors influencing the model's surface NO2 predictions, we conducted a Shapley Additive Explanations analysis. The results revealed that solar radiation reaching the surface, planetary boundary layer height, and NOx (NO + NO2) emissions are the key variables driving the model's predictions. These findings highlight the emulator's ability to capture the individual impact of each variable on the model's NO2 predictions. Furthermore, our emulator outperformed the CMAQ model in computational efficiency, predicting NO2 concentrations more than 900 times faster and enabling the rapid assessment of various pollution management scenarios. This work offers a valuable resource for air pollution mitigation efforts, not just in Texas; with appropriate regional training data, its utility could be extended to other regions and pollutants as well.
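The core operation behind a 1-D CNN emulator like the one described above can be sketched in a few lines. Below is a minimal NumPy illustration of a single valid-mode 1-D convolution, the building block such a network stacks with nonlinearities; the input values and filter weights are purely illustrative, not taken from the paper:

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    """Valid-mode 1-D convolution (cross-correlation): each output is a
    weighted sum over a local window of the input series."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

# Hypothetical hourly input channel (e.g. one meteorological predictor).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, 0.25, 0.25])  # illustrative 3-tap filter
y = conv1d(x, w)                 # → [1.75, 2.75, 3.75]
```

In a real emulator, many such filters are stacked with activations and trained end to end; this shows only the windowed weighted sum that lets the network respond to local structure in the co-located inputs.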

Open access
Zhiwei Zhen, Huikyo Lee, Ignacio Segovia-Dominguez, Meichen Huang, Yuzhou Chen, Michael Garay, Daniel Crichton, and Yulia R. Gel

Abstract

Virtually all aspects of our societal functioning, from food security to energy supply to healthcare, depend on the dynamics of environmental factors. Nevertheless, the social dimensions of weather and climate are noticeably less explored by the artificial intelligence community. By harnessing the strength of geometric deep learning (GDL), we investigate a pressing societal question: the potentially disproportionate impacts of air quality on COVID-19 clinical severity. To quantify air pollution levels, we use aerosol optical depth (AOD) records, which measure the reduction of sunlight due to atmospheric haze, dust, and smoke. We also introduce unique and not yet broadly available NASA satellite records (NASAdat) on AOD, temperature, and relative humidity and discuss the utility of these new data for biosurveillance and climate justice applications, with a specific focus on COVID-19 in the U.S. states of Texas and Pennsylvania. The results indicate that, in general, poorer air quality tends to be associated with higher rates of clinical severity and that, in the case of Texas, this phenomenon particularly stands out in counties characterized by higher socioeconomic vulnerability. This, in turn, raises a concern of environmental injustice in these socioeconomically disadvantaged communities. Furthermore, given that one of NASA's recent long-term commitments is to address such inequitable burdens of environmental harm by expanding the use of Earth science data such as NASAdat, this project is one of the first steps toward developing a new platform integrating NASA's satellite observations with DL tools for social good.
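One way to picture the GDL machinery on county-level data is a single graph-convolution propagation step, in which each county mixes its own features with those of bordering counties. Below is a minimal sketch with a hypothetical three-county adjacency and toy AOD values; this is not the paper's architecture or data, only the basic symmetric-normalized aggregation used by graph convolutional networks:

```python
import numpy as np

# Toy adjacency for three counties in a line (1 = shares a border).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[0.8], [0.5], [0.2]])  # hypothetical per-county AOD feature

A_hat = A + np.eye(3)                # add self-loops so a county keeps its own signal
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
norm_adj = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
H = norm_adj @ X                     # one symmetric-normalized propagation step
```

Stacking such steps with learned weight matrices and nonlinearities lets the model relate a county's clinical-severity outcomes to pollution exposure in its geographic neighborhood.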

Open access
Randy J. Chase, Amy McGovern, Cameron R. Homeyer, Peter J. Marinescu, and Corey K. Potvin

Abstract

The quantification of storm updrafts remains unavailable for operational forecasting despite their inherent importance to convection and its associated severe weather hazards. Updraft proxies, like overshooting-top area from satellite images, have been linked to severe weather hazards but relate to only a limited portion of the total storm updraft. This study investigates whether a machine learning model, namely a U-Net, can skillfully retrieve maximum vertical velocity and its areal extent from 3-dimensional gridded radar reflectivity alone. The machine learning model is trained using simulated radar reflectivity and vertical velocity from the National Severe Storms Laboratory's convection-permitting Warn-on-Forecast System (WoFS). A parametric regression technique using the sinh-arcsinh-normal distribution is adapted to run with U-Nets, allowing for both deterministic and probabilistic predictions of maximum vertical velocity. The best models after a hyperparameter search provided less than 50% root-mean-squared error, a coefficient of determination greater than 0.65, and an intersection over union (IoU) of more than 0.45 on the independent test set composed of WoFS data. Beyond the WoFS analysis, a case study was conducted using real radar data and corresponding dual-Doppler analyses of vertical velocity within a supercell. The U-Net consistently underestimates the dual-Doppler updraft speed estimates by 50%. Meanwhile, the areas of the 5 and 10 m s−1 updraft cores show an IoU of 0.25. While the above statistics are not exceptional, the machine learning model enables quick distillation of 3D radar data related to the maximum vertical velocity, which could be useful in assessing a storm's severe potential.
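The sinh-arcsinh-normal regression mentioned above maps a standard normal variable through a skew- and tail-controlling transform. Below is a minimal sketch of the sampling side, following the Jones and Pewsey (2009) parameterization; the parameter values are illustrative stand-ins, not the network's outputs:

```python
import numpy as np

def shash_sample(mu, sigma, gamma, tau, z):
    """Sinh-arcsinh transform of standard-normal draws z: gamma skews the
    distribution, tau reweights the tails. With gamma=0 and tau=1 this
    reduces exactly to N(mu, sigma**2)."""
    return mu + sigma * np.sinh((np.arcsinh(z) + gamma) / tau)

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
# Toy right-skewed distribution of maximum updraft speed (m/s).
updrafts = shash_sample(15.0, 5.0, 0.4, 1.0, z)
```

A network predicting the four parameters per pixel therefore yields a full probabilistic distribution for maximum vertical velocity, while a summary such as its median provides the deterministic estimate.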

Open access
Charles H. White, Imme Ebert-Uphoff, John M. Haynes, and Yoo-Jeong Noh

Abstract

Super-resolution is the general task of artificially increasing the spatial resolution of an image. The recent surge in machine learning (ML) research has yielded many promising ML-based approaches for performing single-image super-resolution, including applications to satellite remote sensing. We develop a convolutional neural network (CNN) to super-resolve the 1-km and 2-km bands on the GOES-R series Advanced Baseline Imager (ABI) to a common high resolution of 0.5 km. Access to 0.5-km imagery from ABI Band 2 enables the CNN to realistically sharpen lower-resolution bands without significant blurring. We first train the CNN on a proxy task that allows us to use only ABI imagery: degrading the resolution of ABI bands and training the CNN to restore the original imagery. Comparisons at reduced resolution and at full resolution with Landsat-8/9 observations illustrate that the CNN produces images with realistic high-frequency detail that is not present in a bicubic interpolation baseline. Estimating all ABI bands at 0.5-km resolution makes it easier to combine information across bands without reconciling differences in spatial resolution. However, more analysis is needed to determine the impacts on derived products or multispectral imagery that use super-resolved bands. This approach is extensible to other remote sensing instruments that have bands with different spatial resolutions and requires only a small amount of data and knowledge of each channel's modulation transfer function.
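The proxy task described above (degrade, then learn to restore) only needs a way to synthesize low-resolution inputs from the imagery itself. Below is a minimal sketch using plain block-averaging as a stand-in; a faithful pipeline would apply the channel's modulation transfer function rather than a simple box filter:

```python
import numpy as np

def degrade(img, factor=2):
    """Synthesize a low-resolution version of an image by block-averaging
    factor-by-factor pixel blocks (a crude stand-in for the true sensor
    response)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

hi = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a 0.5-km band
lo = degrade(hi)                               # 1-km proxy input, shape (2, 2)
```

Training pairs are then (lo, hi): the CNN sees the degraded field and is optimized to reproduce the original, so no external high-resolution reference imagery is required.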

Open access
Daniel G. Butt, Aaron L. Jaffe, Connell S. Miller, Gregory A. Kopp, and David M. L. Sills

Abstract

In many regions of the world, tornadoes travel through forested areas with low population densities, making downed trees the only observable damage indicator. Current methods in the enhanced Fujita (EF) scale for analyzing tree damage may not reflect the true intensity of some tornadoes. However, new methods have been developed that use the number of downed trees or treefall directions from high-resolution aerial imagery to provide an estimate of maximum wind speed. Treefall Identification and Direction Analysis (TrIDA) maps are used to identify areas of treefall damage and treefall directions along the damage path. Currently, TrIDA maps are generated manually, which is labor-intensive, often taking several days or weeks. To address this, this paper describes a machine learning– and image-processing-based model that automatically extracts fallen trees from large-scale aerial imagery, assesses their fall directions, and produces an area-averaged treefall vector map with minimal initial human interaction. The automated model achieves a median tree-direction difference of 13.3° when compared to the manual tree directions from the Alonsa, Manitoba, tornado, demonstrating its viability relative to manual assessment. Overall, the automated production of treefall vector maps from large-scale aerial imagery significantly speeds up the analysis, reducing the labor required to create a TrIDA map from a matter of days or weeks to a matter of hours.

Significance Statement

The automation of treefall detection and direction is significant to the analyses of tornado paths and intensities. Previously, it would have taken a researcher multiple days to weeks to manually count and assess the directions of fallen trees in large-scale aerial photography of tornado damage. Through automation, analysis takes a matter of hours, with minimal initial human interaction. Tornado researchers will be able to use this automated process to help analyze and assess tornadoes and their enhanced Fujita–scale rating around the world.
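Aggregating individual fall directions into an area-averaged vector requires circular (not arithmetic) averaging, since bearings of 359° and 1° point in nearly the same direction. The sketch below shows only that aggregation step; the detection and direction-assessment model itself is far more involved:

```python
import numpy as np

def mean_fall_direction(deg):
    """Circular mean of treefall bearings in degrees: average the unit
    vectors, then convert back to an angle, handling the 0/360 wrap."""
    rad = np.deg2rad(deg)
    return np.rad2deg(np.arctan2(np.mean(np.sin(rad)),
                                 np.mean(np.cos(rad)))) % 360.0

# Arithmetic mean of [350, 10] is 180 (wrong); circular mean is ~0 (correct).
d = mean_fall_direction([350.0, 10.0])
```

Averaging per-tree vectors over grid cells in this way produces the area-averaged treefall vector map used for subsequent wind-speed estimation.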

Open access
Wei-Yi Cheng, Daehyun Kim, Scott Henderson, Yoo-Geun Ham, Jeong-Hwan Kim, and Robert H. Holzworth

Abstract

The diversity of lightning parameterizations for numerical weather and climate models causes considerable uncertainty in lightning prediction. In this study, we take a data-driven approach to the lightning parameterization problem by combining machine learning (ML) techniques with the rich lightning observations from the World Wide Lightning Location Network. Three ML algorithms are trained over the contiguous United States (CONUS) to predict lightning stroke density in a 1° box based on information about the atmospheric variables in the same grid box (local) or over the entire CONUS (nonlocal). The performance of the ML-based lightning schemes is examined and compared with that of a simple, conventional lightning parameterization scheme from Romps et al. (2014).

We find that all ML-based lightning schemes exhibit performance superior to that of the conventional scheme in the regions and seasons with climatologically higher lightning stroke density. To the west of the Rocky Mountains, the nonlocal ML lightning scheme achieves the best overall performance, with lightning stroke density predictions 70% more accurate than those of the conventional scheme. Our results suggest that ML-based approaches have the potential to improve the representation of lightning and other types of extreme weather events in weather and climate models.
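For context, the conventional baseline cited above, the Romps et al. (2014) proxy, scales flash rate with the product of CAPE and precipitation rate. A minimal sketch follows; the prefactor below is an illustrative placeholder, not the published calibrated constant:

```python
def flash_rate(cape, precip_rate, eta_over_e=3.0e-10):
    """CAPE-times-precipitation lightning proxy: flash rate per unit area
    taken proportional to CAPE (J/kg) times precipitation rate. The
    prefactor eta_over_e is a placeholder value for illustration."""
    return eta_over_e * cape * precip_rate

r1 = flash_rate(1000.0, 5.0)
r2 = flash_rate(2000.0, 5.0)   # doubling CAPE doubles the proxy rate
```

An ML scheme, by contrast, can condition on many atmospheric variables at once and, in the nonlocal case, on fields spanning the entire CONUS rather than a single grid box.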

Open access
Alexander J. DesRosiers and Michael M. Bell

Abstract

Airborne Doppler radar provides detailed and targeted observations of winds and precipitation in weather systems over remote or difficult-to-access regions that can help to improve scientific understanding and weather forecasts. Quality control (QC) is necessary to remove nonweather echoes from raw radar data for subsequent analysis. The complex decision-making ability of the machine learning random-forest technique is employed to create a generalized QC method for airborne radar data in convective weather systems. The model was trained on a manually QCed dataset containing data from the Electra Doppler Radar (ELDORA) in mature and developing tropical cyclones, a tornadic supercell, and a bow echo. Successful classification of ∼96% and ∼93% of weather and nonweather radar gates, respectively, in withheld testing data indicates the generalizability of the method. A dual-Doppler analysis from the genesis phase of Hurricane Ophelia (2005), using data not previously seen by the model, produced a wind field comparable to that from manual QC. The framework demonstrates a proof of concept that can be applied to newer airborne Doppler radars.

Significance Statement

Airborne Doppler radar is an invaluable tool for making detailed measurements of wind and precipitation in weather systems over remote or difficult-to-access regions, such as hurricanes over the ocean. Use of the collected radar data depends strongly on quality control (QC) procedures to classify weather and nonweather radar echoes and to then remove the latter before subsequent analysis or assimilation into numerical weather prediction models. Prior QC techniques require interactive editing and subjective classification by trained researchers and can demand considerable time for even small amounts of data. We present a new machine learning algorithm that is trained on past QC efforts from radar experts, resulting in an accurate, fast technique with far less user input required that can greatly reduce the time required for QC. The new technique is based on the random forest, a machine learning model composed of decision trees, to classify weather and nonweather radar echoes. Continued efforts to build on this technique could benefit future weather forecasts by quickly and accurately quality-controlling data from other airborne radars for research or operational meteorology.
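A random forest classifies each radar gate by majority vote over many decision trees. The toy sketch below uses hand-set decision stumps over hypothetical gate features to show only the voting mechanism; the real model learns many full trees from the manually QCed training data:

```python
import numpy as np

# Hypothetical per-gate features: [reflectivity (dBZ), coherence, velocity proxy].
STUMPS = [(0, 5.0), (1, 0.5), (2, -1.0)]  # (feature index, threshold); toy values

def forest_predict(gate, stumps=STUMPS):
    """Majority vote over decision stumps: return 1 (weather echo) when
    most stumps fire, else 0 (nonweather echo to be removed in QC)."""
    votes = sum(gate[f] > t for f, t in stumps)
    return int(2 * votes > len(stumps))

weather_gate = np.array([12.0, 1.2, 0.3])   # plausible weather signature (toy)
clutter_gate = np.array([-5.0, 0.1, -2.0])  # plausible nonweather signature (toy)
```

In the full method, each gate's feature vector is passed through hundreds of learned trees, and gates voted nonweather are masked out before dual-Doppler analysis.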

Open access
Junsu Kim, Yeon-Hee Kim, Hyejeong Bok, Sungbin Jang, Eunju Cho, and Seung-Bum Kim

Abstract

We developed an advanced post-processing model for precipitation forecasting using a micro-genetic algorithm (MGA). The algorithm determines the optimal combination of three general circulation models: the Korean Integrated Model, the Unified Model, and the Integrated Forecast System model. The MGA calculates optimal weights for the individual models based on a fitness function that considers several measures of model accuracy, including the critical success index (CSI), probability of detection (POD), and frequency bias index. Our optimized multi-model yielded improvements of up to 13% in CSI and 10% in POD compared to the individual models. Notably, when applied to an operational definition that considers precipitation thresholds from the three models and averages the precipitation amounts from the satisfactory models, our optimized multi-model outperformed the current operational model used by the Korea Meteorological Administration by up to 1.0% and 6.8% in CSI and false alarm ratio, respectively. This study highlights the effectiveness of a weighted combination of global models in enhancing forecasting accuracy for regional precipitation. By utilizing the MGA to fine-tune model weights, we achieved superior precipitation prediction compared to that of individual models and existing standard post-processing operations. This approach can significantly improve the accuracy of precipitation forecasts.
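The fitness evaluation in a weighting scheme like this boils down to blending the member forecasts with candidate weights and scoring the result. Below is a minimal sketch of those two pieces (a weighted blend plus CSI) with toy fields; the genetic-algorithm search loop itself is omitted:

```python
import numpy as np

def blend(fields, weights):
    """Weighted combination of stacked model precipitation fields
    (fields has shape (n_models, ...)); weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), fields, axes=1)

def csi(pred, obs):
    """Critical success index for boolean exceedance fields:
    hits / (hits + misses + false alarms)."""
    hits = np.sum(pred & obs)
    misses = np.sum(~pred & obs)
    false_alarms = np.sum(pred & ~obs)
    return hits / (hits + misses + false_alarms)

fields = np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])  # 3 toy model fields
combined = blend(fields, [1.0, 1.0, 2.0])                # → [3.5, 2.0]
```

A GA then proposes weight vectors, scores each blended field against observations with metrics such as CSI and POD, and keeps the best-scoring weights.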

Open access
Sam J. Silva and Christoph A. Keller

Abstract

Explainable artificial intelligence (XAI) methods are becoming popular tools for scientific discovery in the Earth and atmospheric sciences. While these techniques have the potential to revolutionize the scientific process, there are known limitations in their applicability that are frequently ignored. These limitations include that XAI methods explain the behavior of the AI model and not the behavior of the training dataset, and that caution should be used when these methods are applied to datasets with correlated and dependent features. Here, we explore the potential cost of ignoring these limitations with a simple case study from the atmospheric chemistry literature: learning the reaction rate of a bimolecular reaction. We demonstrate that dependent and highly correlated input features can lead to spurious process-level explanations. We posit that the current generation of XAI techniques should largely be used only for understanding system-level behavior, and we recommend caution when using XAI methods for process-level scientific discovery in the Earth and atmospheric sciences.
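The failure mode described above is easy to reproduce in miniature. In the sketch below, the true process uses only x1, but a perfectly correlated duplicate x2 is also supplied as a feature; the least-squares fit splits the effect arbitrarily between the two, so a coefficient-based "explanation" wrongly implicates x2. Variable names and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.standard_normal(500)
x2 = x1.copy()                  # dependent, perfectly correlated feature
y = 3.0 * x1                    # the true process depends on x1 alone

X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# The minimum-norm solution splits the effect: coef ≈ [1.5, 1.5], even
# though x2 plays no role in the underlying process. Predictions remain
# perfect, but the process-level attribution is wrong.
```

The system-level behavior (the fitted predictions) is correct; only the process-level story told by the per-feature attributions is spurious, which is exactly the distinction the abstract draws.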

Open access
Kevin Höhlein, Benedikt Schulz, Rüdiger Westermann, and Sebastian Lerch

Abstract

Statistical postprocessing is used to translate ensembles of raw numerical weather forecasts into reliable probabilistic forecast distributions. In this study, we examine the use of permutation-invariant neural networks for this task. In contrast to previous approaches, which often operate on ensemble summary statistics and dismiss details of the ensemble distribution, we propose networks that treat forecast ensembles as a set of unordered member forecasts and learn link functions that are by design invariant to permutations of the member ordering. We evaluate the quality of the obtained forecast distributions in terms of calibration and sharpness and compare the models against classical and neural network–based benchmark methods. In case studies addressing the postprocessing of surface temperature and wind gust forecasts, we demonstrate state-of-the-art prediction quality. To deepen the understanding of the learned inference process, we further propose a permutation-based importance analysis for ensemble-valued predictors, which highlights specific aspects of the ensemble forecast that are considered important by the trained postprocessing models. Our results suggest that most of the relevant information is contained in a few ensemble-internal degrees of freedom, which may impact the design of future ensemble forecasting and postprocessing systems.
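Permutation invariance of the kind described above can be built in by embedding each ensemble member separately and pooling with an order-independent operation (the Deep Sets recipe). Below is a minimal sketch with fixed illustrative weights; the paper's networks are learned and considerably larger:

```python
import numpy as np

def set_encoder(members, w_phi, w_rho):
    """Deep-Sets-style map: per-member embedding (phi), mean pooling over
    members, then a readout (rho). Mean pooling makes the output identical
    under any reordering of the ensemble members."""
    phi = np.tanh(members[:, None] * w_phi)   # shape (n_members, n_hidden)
    pooled = phi.mean(axis=0)                 # order-independent pooling
    return pooled @ w_rho

ens = np.array([281.2, 282.0, 280.7, 281.9])  # toy temperature ensemble (K)
w_phi = np.array([0.01, -0.02, 0.03])         # illustrative weights
w_rho = np.array([1.0, 0.5, -0.5])
out = set_encoder(ens, w_phi, w_rho)
same = set_encoder(ens[::-1], w_phi, w_rho)   # reordering leaves output unchanged
```

Because the link function depends on the ensemble only through the pooled embedding, the postprocessing model treats the forecast as an unordered set rather than relying on arbitrary member labels.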

Open access