Abstract
With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance versus feature relevance. We demonstrate and visualize different explanation methods, show how to interpret them, and provide a complete Python package (scikit-explain) to allow future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for subfreezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating model impacts of feature groups instead of individual features. Evaluating the feature groups mitigates the impacts of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
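To make the accumulated local effects (ALE) method named above concrete, the following minimal NumPy sketch computes a first-order ALE curve. It is an illustration, not the scikit-explain implementation; the quantile binning and simple unweighted centering are simplifying assumptions, and `model` is any fitted estimator with a `predict` method.

```python
import numpy as np

def ale_1d(model, X, feature, n_bins=20):
    """First-order ALE for column `feature` of the 2D array X."""
    x = X[:, feature]
    # Quantile bin edges so each bin holds roughly equal amounts of data.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    effects = np.zeros(n_bins)
    for k in range(n_bins):
        in_bin = idx == k
        if not in_bin.any():
            continue
        lo, hi = X[in_bin].copy(), X[in_bin].copy()
        lo[:, feature], hi[:, feature] = edges[k], edges[k + 1]
        # Local effect: mean prediction change across the bin, with all
        # other features held at their observed values.
        effects[k] = np.mean(model.predict(hi) - model.predict(lo))
    ale = np.concatenate([[0.0], np.cumsum(effects)])
    return edges, ale - ale.mean()  # simple (unweighted) centering
```

Because each bin perturbs the feature only within its local range, ALE avoids the unrealistic extrapolation that correlated features induce in partial dependence plots, which is one reason the abstract emphasizes the effect of feature correlations on explanations.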
Abstract
Recently, the U.S. Department of Energy (DOE), Office of Science, Biological and Environmental Research (BER), and Advanced Scientific Computing Research (ASCR) programs organized and held the Artificial Intelligence for Earth System Predictability (AI4ESP) workshop series. From this workshop, a critical conclusion reached by the DOE BER and ASCR community is the need to develop a new paradigm for Earth system predictability focused on enabling artificial intelligence (AI) across the field, laboratory, modeling, and analysis activities, called model experimentation (ModEx). BER’s ModEx is an iterative approach that enables process models to generate hypotheses. The developed hypotheses inform field and laboratory efforts to collect measurement and observation data, which are subsequently used to parameterize, drive, and test model (e.g., process-based) predictions. A total of 17 technical sessions were held in this AI4ESP workshop series. This paper discusses the topic of the AI Architectures and Codesign session and associated outcomes. The AI Architectures and Codesign session included two invited talks, two plenary discussion panels, and three breakout rooms that covered specific topics, including 1) DOE high-performance computing (HPC) systems, 2) cloud HPC systems, and 3) edge computing and Internet of Things (IoT). We also provide forward-looking ideas and perspectives on potential research in this codesign area that can be achieved by synergies with the other 16 session topics. These ideas include topics such as 1) reimagining codesign, 2) data acquisition to distribution, 3) heterogeneous HPC solutions for integration of AI/ML and other data analytics like uncertainty quantification with Earth system modeling and simulation, and 4) AI-enabled sensor integration into Earth system measurements and observations. Such perspectives are a distinguishing aspect of this paper.
Significance Statement
This study aims to provide perspectives on AI architectures and codesign approaches for Earth system predictability. Such visionary perspectives are essential because AI-enabled model-data integration has shown promise in improving predictions associated with climate change, perturbations, and extreme events. Our forward-looking ideas guide what is next in codesign to enhance Earth system models, observations, and theory using state-of-the-art and futuristic computational infrastructure.
Abstract
The identification of atmospheric rivers (ARs) is crucial for weather and climate predictions as they are often associated with severe storm systems and extreme precipitation, which can cause large impacts on society. This study presents a deep learning model, termed ARDetect, for image segmentation of ARs using ERA5 data from 1960 to 2020 with labels obtained from the TempestExtremes tracking algorithm. ARDetect is a convolutional neural network (CNN)-based U-Net model, with its structure having been optimized using automatic hyperparameter tuning. Inputs to ARDetect were selected to be the integrated water vapor transport (IVT) and total column water (TCW) fields, as well as the AR mask from TempestExtremes from the previous time step to the one being considered. ARDetect achieved a mean intersection-over-union (mIoU) score of 89.04% for ARs, indicating high accuracy in identifying these weather patterns and performance superior to that of most deep learning–based models for AR detection. In addition, ARDetect can be executed faster than the TempestExtremes method (seconds vs minutes) for the same period. This provides a significant benefit for online AR detection, especially for high-resolution global models. An ensemble of 10 models, each trained on the same dataset but having different starting weights, was used to further improve on the performance produced by ARDetect, thus demonstrating the importance of model diversity in improving performance. ARDetect provides an effective and fast deep learning–based model for researchers and weather forecasters to better detect and understand ARs, which have significant impacts on weather-related events such as floods and droughts.
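For reference, the mean intersection-over-union score reported for ARDetect can be computed as in the sketch below. This is a generic illustration of the metric for binary AR masks; the authors' exact averaging over classes and samples is an assumption.

```python
import numpy as np

def mean_iou(pred, truth):
    """pred, truth: boolean arrays of shape (n_samples, height, width)."""
    inter = np.logical_and(pred, truth).sum(axis=(1, 2))
    union = np.logical_or(pred, truth).sum(axis=(1, 2))
    # Empty union (no AR predicted or observed) counts as a perfect match.
    return float(np.where(union > 0, inter / np.maximum(union, 1), 1.0).mean())
```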
Abstract
Recently, there has been a surge of research on data-driven weather forecasting systems, especially applications based on convolutional neural networks (CNNs). These are usually trained on atmospheric data represented on regular latitude–longitude grids, neglecting the curvature of Earth. We assess the benefit of replacing the standard convolution operations with an adapted convolution operation that takes into account the geometry of the underlying data (SphereNet convolution), specifically near the poles. Additionally, we assess the effect of including the information that the two hemispheres of Earth have “flipped” properties—for example, cyclones circulating in opposite directions—into the structure of the network. Both approaches are examples of physics-informed machine learning. The methods are tested on the WeatherBench dataset, at a resolution of ∼1.4°, which is higher than in many previous studies on CNNs for weather forecasting. For most lead times up to day +10 for 500-hPa geopotential and 850-hPa temperature, we find that using SphereNet convolution or including hemisphere-specific information individually leads to improvement in forecast skill. Combining the two methods typically gives the highest forecast skill. Our version of SphereNet is implemented flexibly and scales well to high-resolution datasets but is still significantly more expensive than a standard convolution operation. Finally, we analyze cases with high forecast error. These occur mainly in winter and are relatively consistent across different training realizations of the networks, pointing to flow-dependent atmospheric predictability.
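The "flipped hemispheres" idea can be illustrated with a small preprocessing sketch: mirror Southern Hemisphere fields in latitude and reverse the sign of quantities such as the meridional wind, so that a single set of convolution weights sees both hemispheres in a common convention. The array layout and the choice of sign-flipped channels here are assumptions, not the paper's exact implementation.

```python
import numpy as np

def fold_hemispheres(field, sign_flip_channels=()):
    """field: (channels, lat, lon) with latitude ordered north to south."""
    nlat = field.shape[1]
    north = field[:, : nlat // 2]
    south = field[:, nlat // 2 :][:, ::-1].copy()  # mirror latitudes
    for c in sign_flip_channels:
        south[c] *= -1.0  # e.g., meridional wind changes sign when mirrored
    return north, south
```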
Abstract
Atmospheric models with typical resolution in the tens of kilometers cannot resolve the dynamics of air parcel ascent, which varies on scales ranging from tens to hundreds of meters. Small-scale wind fluctuations are thus characterized by a subgrid distribution of vertical wind velocity W with standard deviation σW. The parameterization of σW is fundamental to the representation of aerosol–cloud interactions, yet it is poorly constrained. Using a novel deep learning technique, this work develops a new parameterization for σW by merging data from global storm-resolving model simulations, high-frequency retrievals of W, and climate reanalysis products. The parameterization reproduces the observed statistics of σW and leverages learned physical relations from the model simulations to guide extrapolation beyond the observed domain. Incorporating observational data during the training phase was found to be critical for its performance. The parameterization can be applied online within large-scale atmospheric models, or offline using output from weather forecasting and reanalysis products.
Significance Statement
Vertical air motion plays a crucial role in several atmospheric processes, such as cloud droplet and ice crystal formation. However, it often occurs at scales smaller than those resolved by standard atmospheric models, leading to uncertainties in climate predictions. To address this, we present a novel deep learning approach that synthesizes data from various sources, providing a representation of small-scale vertical wind velocity suitable for integration into atmospheric models. Our method demonstrates high accuracy when compared to observation-based retrievals, offering potential to mitigate uncertainties and enhance climate forecasting.
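The target quantity itself is simple to state: σW is the standard deviation of vertical velocity within a model grid column over a short time window. A minimal sketch of deriving such training targets from high-frequency W retrievals follows; the fixed-window approach and window length are assumptions, not the authors' exact procedure.

```python
import numpy as np

def sigma_w(w, samples_per_window):
    """w: 1D array of high-frequency vertical-velocity retrievals."""
    n = (len(w) // samples_per_window) * samples_per_window
    windows = w[:n].reshape(-1, samples_per_window)
    return windows.std(axis=1, ddof=1)  # one sigma_w value per window
```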
Abstract
Vertical profiles of temperature and dewpoint are useful in predicting deep convection that leads to severe weather, which threatens property and lives. Currently, forecasters rely on observations from radiosonde launches and numerical weather prediction (NWP) models. Radiosonde observations are, however, temporally and spatially sparse, and NWP models contain inherent errors that influence short-term predictions of high-impact events. This work explores using machine learning (ML) to postprocess NWP model forecasts, combining them with satellite data to improve vertical profiles of temperature and dewpoint. We focus on different ML architectures, loss functions, and input features to optimize predictions. Because we are predicting vertical profiles at 256 levels in the atmosphere, this work provides a unique perspective on using ML for 1D tasks. Compared to baseline profiles from the Rapid Refresh (RAP), ML predictions offer the largest improvement for dewpoint, particularly in the middle and upper atmosphere. Temperature improvements are modest, but CAPE values are improved by up to 40%. Feature importance analyses indicate that the ML models are primarily improving incoming RAP biases. While additional model and satellite data offer some improvement to the predictions, architecture choice is more important than feature selection in fine-tuning the results. Our proposed deep residual U-Net performs the best by leveraging spatial context from the input RAP profiles; however, the results are remarkably robust across model architecture. Further, uncertainty estimates for every level are well calibrated and can provide useful information to forecasters.
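Because the prediction target is a 256-level column rather than an image, a "deep residual U-Net" of this kind operates on 1D profiles. The block below is a hedged sketch of the sort of residual 1D convolution such a network stacks; the channel counts and kernel size are assumptions, not the authors' tuned architecture.

```python
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """Residual convolution block over the vertical dimension of a profile."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )

    def forward(self, x):  # x: (batch, channels, n_levels), e.g., 256 levels
        return torch.relu(x + self.body(x))
```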
Abstract
Weather forecasting centers currently rely on statistical postprocessing methods to minimize forecast error. This improves skill but can lead to predictions that violate physical principles or disregard dependencies between variables, which can be problematic for downstream applications and for the trustworthiness of postprocessing models, especially when they are based on new machine learning approaches. Building on recent advances in physics-informed machine learning, we propose to achieve physical consistency in deep learning–based postprocessing models by integrating meteorological expertise in the form of analytic equations. Applied to the postprocessing of surface weather in Switzerland, we find that constraining a neural network to enforce thermodynamic state equations yields physically consistent predictions of temperature and humidity without compromising performance. Our approach is especially advantageous when data are scarce, and our findings suggest that incorporating domain expertise into postprocessing models allows the optimization of weather forecast information while satisfying application-specific requirements.
Significance Statement
Postprocessing is a widely used approach to reduce forecast error using statistics, but it may lead to physical inconsistencies. This outcome can be problematic for trustworthiness and downstream applications. We present the first machine learning–based postprocessing method intentionally designed to strictly enforce physical laws. Our framework improves physical consistency without sacrificing performance and suggests that human expertise can be incorporated into postprocessing models via analytic equations.
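One way to enforce a thermodynamic constraint of the kind described is to make one output analytically dependent on the others inside the network, so that mutually inconsistent values cannot occur. The sketch below derives dewpoint from predicted temperature and relative humidity with the Magnus approximation; it illustrates the idea and is not necessarily the equation set the authors enforce.

```python
import torch

def magnus_dewpoint(t_celsius, rh):
    """Dewpoint (deg C) from temperature (deg C) and relative humidity in (0, 1].

    Both arguments are torch tensors, so the layer is differentiable.
    """
    a, b = 17.625, 243.04  # Magnus coefficients (Alduchov and Eskridge)
    gamma = torch.log(rh) + a * t_celsius / (b + t_celsius)
    return b * gamma / (a - gamma)
```

Because such a layer is differentiable, gradients flow through the constraint during training, which is what allows consistency to be enforced strictly rather than merely penalized in the loss function.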
Abstract
This paper explores the application of emerging machine learning methods from image super resolution (SR) to the task of statistical downscaling. We specifically focus on convolutional neural network–based generative adversarial networks (GANs). Our GANs are conditioned on low-resolution (LR) inputs to generate high-resolution (HR) surface winds emulating Weather Research and Forecasting (WRF) Model simulations over North America. Unlike traditional SR models, where LR inputs are idealized coarsened versions of the HR images, WRF emulation involves using nonidealized LR and HR pairs, resulting in shared-scale mismatches due to internal variability. Our study builds upon current SR-based statistical downscaling by experimenting with a novel frequency-separation (FS) approach from the computer vision field. To assess the skill of SR models, we carefully select evaluation metrics and focus on performance measures based on spatial power spectra. Our analyses reveal how GAN configurations influence spatial structures in the generated fields, particularly biases in spatial variability spectra. Using power spectra to evaluate the FS experiments reveals that successful applications of FS in computer vision do not translate to climate fields. However, the FS experiments demonstrate the sensitivity of power spectra to a commonly used GAN-based SR objective function, which helps interpret and understand its role in determining spatial structures. This result motivates the development of a novel partial frequency-separation scheme as a promising configuration option. We also quantify the influence on GAN performance of nonidealized LR fields resulting from internal variability. Furthermore, we conduct a spectrum-based feature-importance experiment, allowing us to explore the dependence of the spatial structure of generated fields on different physically relevant LR covariates.
Significance Statement
We use artificial intelligence algorithms to mimic wind patterns from high-resolution climate models, offering a faster alternative to running these models directly. Unlike many similar approaches, we use datasets that acknowledge the essentially stochastic nature of the downscaling problem. Drawing inspiration from computer vision studies, we design several experiments to explore how different configurations impact our results. We find evaluation methods based on spatial frequencies in the climate fields to be quite effective at understanding how algorithms behave. Our results provide valuable insights into and interpretations of the methods for future research in this field.
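Frequency separation, the computer-vision technique under test here, splits each field into low- and high-frequency parts so that different loss terms (for example, content versus adversarial) can target each band. A minimal sketch using a box low-pass filter follows; the filter type and kernel size are assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn.functional as F

def frequency_separate(field, kernel_size=9):
    """field: (batch, channels, H, W). Returns (low, high) with low + high = field."""
    c, pad = field.shape[1], kernel_size // 2
    # Depthwise box filter acting as a simple low-pass.
    kernel = torch.ones(c, 1, kernel_size, kernel_size,
                        device=field.device, dtype=field.dtype) / kernel_size**2
    low = F.conv2d(F.pad(field, (pad,) * 4, mode="reflect"), kernel, groups=c)
    return low, field - low
```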
Abstract
There is growing use of machine learning algorithms to replicate subgrid parameterization schemes in global climate models. Parameterizations rely on approximations; thus, there is potential for machine learning to aid improvements. In this study, a neural network is used to mimic the behavior of the nonorographic gravity wave scheme used in the Met Office climate model, important for stratospheric climate and variability. The neural network is found to require only two of the six inputs used by the parameterization scheme, suggesting the potential for greater efficiency in this scheme. Use of a one-dimensional mechanistic model is advocated, allowing neural network hyperparameters to be chosen based on emergent features of the coupled system with minimal computational cost, and providing a testbed prior to coupling to a climate model. A climate model simulation, using the neural network in place of the existing parameterization scheme, is found to accurately generate a quasi-biennial oscillation of the tropical stratospheric winds, and correctly simulate the nonorographic gravity wave variability associated with El Niño–Southern Oscillation and stratospheric polar vortex variability. These internal sources of variability are essential for providing seasonal forecast skill, and the gravity wave forcing associated with them is reproduced without explicit training for these patterns.
Significance Statement
Climate simulations are required for providing advice to government, industry, and society regarding the expected climate on time scales of months to decades. Machine learning has the potential to improve the representation of some sources of variability in climate models that are too small to be directly simulated by the model. This study demonstrates that a neural network can simulate the variability due to atmospheric gravity waves that is associated with El Niño–Southern Oscillation and with the tropical and polar regions of the stratosphere. These details are important for a model to produce more accurate predictions of regional climate.
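For a rough sense of scale of such an emulator: a parameterization that maps a handful of column inputs to a gravity-wave forcing can be mimicked by a small fully connected network, as in the hedged sketch below. The two inputs reflect the paper's finding that only two of the scheme's six inputs were needed; the layer widths are assumptions.

```python
import torch.nn as nn

# Small MLP emulator: two retained scheme inputs -> gravity-wave forcing.
emulator = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
```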