Abstract
Statistical post-processing of global ensemble weather forecasts is revisited by leveraging recent developments in machine learning. Verification of past forecasts is exploited to learn systematic deficiencies of numerical weather predictions in order to boost post-processed forecast performance. Here, we introduce PoET, a post-processing approach based on hierarchical transformers. PoET has two major characteristics: 1) the post-processing is applied directly to the ensemble members rather than to a predictive distribution or a functional of it, and 2) the method is ensemble-size agnostic in the sense that the number of ensemble members in training and inference mode can differ. The PoET output is a set of calibrated members that has the same size as the original ensemble but with improved reliability. Performance assessments show that PoET can bring up to 20% improvement in skill globally for 2-m temperature and 2% for precipitation forecasts, and it outperforms the simpler statistical member-by-member method used here as a competitive benchmark. PoET is also applied to the ENS10 benchmark dataset for ensemble post-processing and, for most parameters, provides better results than the other deep learning solutions evaluated. Furthermore, because each ensemble member is calibrated separately, downstream applications should directly benefit from the improvement made to the ensemble forecast by post-processing.
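As an illustration of the member-wise, ensemble-size-agnostic idea described above, the sketch below applies self-attention over the member dimension of a toy ensemble. The layer sizes, feature count, and class names are assumptions for illustration; this is not the authors' PoET architecture.

```python
# Minimal sketch of ensemble-size-agnostic, member-wise post-processing via
# self-attention over the member dimension. Hyperparameters and names are
# illustrative assumptions, not the authors' PoET configuration.
import torch
import torch.nn as nn

class MemberAttentionPostProcessor(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)   # additive correction per member

    def forward(self, x):                   # x: (batch, members, features)
        h = self.encoder(self.embed(x))     # attention over members; no positional
                                            # encoding, so the member count is free
        # assume feature 0 holds the raw member value to be corrected
        return x[..., 0] + self.head(h).squeeze(-1)

model = MemberAttentionPostProcessor(n_features=5)
print(model(torch.randn(8, 51, 5)).shape)   # 51 members in training: (8, 51)
print(model(torch.randn(8, 10, 5)).shape)   # 10 members at inference: (8, 10)
```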
Abstract
There is growing use of machine learning algorithms to replicate subgrid parameterization schemes in global climate models. Parameterizations rely on approximations; thus, there is potential for machine learning to aid improvements. In this study, a neural network is used to mimic the behavior of the nonorographic gravity wave scheme used in the Met Office climate model, important for stratospheric climate and variability. The neural network is found to require only two of the six inputs used by the parameterization scheme, suggesting the potential for greater efficiency in this scheme. Use of a one-dimensional mechanistic model is advocated, allowing neural network hyperparameters to be chosen based on emergent features of the coupled system with minimal computational cost, and providing a testbed prior to coupling to a climate model. A climate model simulation, using the neural network in place of the existing parameterization scheme, is found to accurately generate a quasi-biennial oscillation of the tropical stratospheric winds, and correctly simulate the nonorographic gravity wave variability associated with El Niño–Southern Oscillation and stratospheric polar vortex variability. These internal sources of variability are essential for providing seasonal forecast skill, and the gravity wave forcing associated with them is reproduced without explicit training for these patterns.
Significance Statement
Climate simulations are required for providing advice to government, industry, and society regarding the expected climate on time scales of months to decades. Machine learning has the potential to improve the representation of some sources of variability in climate models that are too small to be directly simulated by the model. This study demonstrates that a neural network can simulate the variability due to atmospheric gravity waves that is associated with El Niño–Southern Oscillation and with the tropical and polar regions of the stratosphere. These details are important for a model to produce more accurate predictions of regional climate.
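For readers unfamiliar with parameterization emulation, the sketch below shows the general shape of such an emulator: a small fully connected network mapping two input profiles per atmospheric column to a drag profile. The level count, layer widths, and variable names are assumptions, not the configuration coupled to the Met Office model.

```python
# Hedged sketch of an MLP emulator for a gravity wave drag scheme that uses
# only two input profiles (e.g., zonal wind and temperature). Sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

n_levels = 70                       # assumed number of model levels
emulator = nn.Sequential(
    nn.Linear(2 * n_levels, 128),   # concatenated wind + temperature profiles
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, n_levels),       # predicted drag (wind tendency) per level
)

u = torch.randn(32, n_levels)       # zonal wind profiles (batch of columns)
T = torch.randn(32, n_levels)       # temperature profiles
drag = emulator(torch.cat([u, T], dim=-1))
print(drag.shape)                   # torch.Size([32, 70])
```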
Abstract
This paper presents the Thunderstorm Nowcasting Tool (ThunderCast), a 24-h, year-round model for predicting the location of convection that is likely to initiate or remain a thunderstorm in the next 0–60 min in the continental United States, adapted from existing deep learning convection applications. ThunderCast utilizes a U-Net convolutional neural network for semantic segmentation trained on 320 km × 320 km data patches with four inputs and one target dataset. The inputs are satellite bands from the Geostationary Operational Environmental Satellite-16 (GOES-16) Advanced Baseline Imager (ABI) in the visible, shortwave infrared, and longwave infrared spectra, and the target is Multi-Radar Multi-Sensor (MRMS) radar reflectivity at the −10°C isotherm in the atmosphere. On a pixel-by-pixel basis, ThunderCast has high accuracy, recall, and specificity but is subject to false-positive predictions resulting in low precision. However, the number of false positives decreases when buffering the target values with a 15 km × 15 km centered window, indicating ThunderCast’s predictions are useful within a buffered area. To demonstrate the initial prediction capabilities of ThunderCast, three case studies are presented: a mesoscale convective vortex, sea-breeze convection, and monsoonal convection in the southwestern United States. The case studies illustrate that the ThunderCast model effectively nowcasts the location of newly initiated and ongoing active convection, within the next 60 min, under a variety of geographical and meteorological conditions.
Significance Statement
In this research, a machine learning model is developed for short-term (0–60 min) forecasting of thunderstorms in the continental United States using geostationary satellite imagery as inputs for predicting active convection based on radar thresholds. Pending additional testing, the model may be able to provide decision-support services for thunderstorm forecasting. The case studies presented here indicate the model is able to nowcast convective initiation with 5–35 min of lead time in areas without radar coverage and anticipate future locations of storms without additional environmental context.
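The sketch below shows a minimal U-Net-style segmentation network of the kind described above, with four input channels and a single per-pixel probability output. The depth, channel counts, and patch size are illustrative assumptions rather than the ThunderCast configuration.

```python
# Minimal two-level U-Net sketch for binary segmentation of 4-channel
# satellite patches; not the ThunderCast model itself.
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=4):
        super().__init__()
        self.enc1 = block(in_ch, 32)
        self.enc2 = block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)
        self.out = nn.Conv2d(32, 1, 1)    # per-pixel convection probability

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return torch.sigmoid(self.out(d))

patch = torch.randn(2, 4, 64, 64)     # satellite bands stacked as channels
print(TinyUNet()(patch).shape)        # torch.Size([2, 1, 64, 64])
```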
Abstract
Low-level marine clouds play a pivotal role in Earth’s weather and climate through their interactions with radiation, heat and moisture transport, and the hydrological cycle. These interactions depend on a range of dynamical and microphysical processes that result in a broad diversity of cloud types and spatial structures, and a comprehensive understanding of cloud morphology is critical for continued improvement of our atmospheric modeling and prediction capabilities moving forward. Deep learning has recently accelerated our ability to study clouds using satellite remote sensing, and machine learning classifiers have enabled detailed studies of cloud morphology. A major limitation of deep learning approaches to this problem, however, is the large number of hand-labeled samples that are required for training. This work applies a recently developed self-supervised learning scheme to train a deep convolutional neural network (CNN) to map marine cloud imagery to vector embeddings that capture information about mesoscale cloud morphology and can be used for satellite image classification. The model is evaluated against existing cloud classification datasets and several use cases are demonstrated, including: training cloud classifiers with very few labeled samples, interrogation of the CNN’s learned internal feature representations, cross-instrument application, and resilience against sensor calibration drift and changing scene brightness. The self-supervised approach learns meaningful internal representations of cloud structures and achieves comparable classification accuracy to supervised deep learning methods without the expense of creating large hand-annotated training datasets.
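A minimal sketch of the few-label use case described above: a simple linear classifier fitted on frozen embedding vectors from a pretrained self-supervised CNN. The embeddings, labels, and sizes below are synthetic stand-ins.

```python
# Sketch of training a classifier with very few labeled samples on top of
# frozen self-supervised embeddings. Data here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))   # one 128-d vector per cloud scene
labels = rng.integers(0, 4, size=500)      # e.g., 4 mesoscale morphology classes

# Keep only a handful of labeled samples for training, mirroring the
# "very few labeled samples" scenario.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, train_size=40, stratify=labels, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```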
Abstract
Precipitation nowcasting is essential for weather-dependent decision-making, but it remains a challenging problem despite active research. The combination of radar data and deep learning methods has opened a new avenue for research. Radar data are well suited for precipitation nowcasting due to the high space–time resolution of the precipitation field. On the other hand, deep learning methods allow the exploitation of possible nonlinearities in the precipitation process. Thus far, deep learning approaches have demonstrated equal or better performance than optical flow methods for low-intensity precipitation, but nowcasting high-intensity events remains a challenge. In this study, we have built a deep generative model with various extensions to improve nowcasting of heavy precipitation intensities. Specifically, we consider different loss functions and how the incorporation of temperature data as an additional feature affects the model’s performance. Using radar data from KNMI and 5–90-min lead times, we demonstrate that the deep generative model with the proposed loss function and temperature feature outperforms other state-of-the-art models and benchmarks. Our model, with both loss function and feature extensions, is skillful at nowcasting precipitation for high rainfall intensities, up to 60-min lead time.
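One generic way to bias training toward heavy rainfall, in the spirit of the loss-function extension discussed above, is to weight errors by observed intensity. The threshold and weight below are illustrative assumptions, not the paper's exact loss.

```python
# Illustrative intensity-weighted loss that up-weights errors at high rain
# rates; a generic example, not necessarily the loss proposed in the paper.
import torch

def weighted_mae(pred, target, threshold=10.0, heavy_weight=5.0):
    """Mean absolute error with extra weight where observed rain (mm/h)
    exceeds `threshold`."""
    weights = 1.0 + (heavy_weight - 1.0) * (target >= threshold).float()
    return (weights * (pred - target).abs()).mean()

obs = torch.tensor([0.0, 2.0, 15.0, 40.0])    # observed rain rates (mm/h)
fcst = torch.tensor([0.5, 1.0, 8.0, 25.0])    # nowcast rain rates
print(weighted_mae(fcst, obs))
```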
Abstract
Extreme wildfires continue to be a significant cause of human death and biodiversity destruction within countries that encompass the Mediterranean Basin. Recent worrying trends in wildfire activity (i.e., occurrence and spread) suggest that wildfires are likely to be highly impacted by climate change. To facilitate appropriate risk mitigation, it is imperative to identify the main drivers of extreme wildfires and assess their spatiotemporal trends, with a view to understanding the impacts of the changing climate on fire activity. To this end, we analyze the monthly burnt area due to wildfires over a region encompassing most of Europe and the Mediterranean Basin from 2001 to 2020 and identify high fire activity during this period in eastern Europe, Algeria, Italy, and Portugal. We build an extreme quantile regression model with a high-dimensional predictor set describing meteorological conditions, land-cover usage, and orography, for the domain. To model the complex relationships between the predictor variables and wildfires, we make use of a hybrid statistical deep learning framework that allows us to disentangle the effects of vapor pressure deficit (VPD), air temperature, and drought on wildfire activity. Our results highlight that while VPD, air temperature, and drought significantly affect wildfire occurrence, only VPD affects wildfire spread. Furthermore, to gain insights into the effect of climate trends on wildfires in the near future, we focus on the extreme wildfires in August 2001 and perturb VPD and temperature according to their observed trends. We find that, on average over Europe, trends in temperature (median over Europe: +0.04 K yr⁻¹) lead to a relative increase of 17.1% and 1.6% in the expected frequency and severity, respectively, of wildfires in August 2001; similar analyses using VPD (median over Europe: +4.82 Pa yr⁻¹) give respective increases of 1.2% and 3.6%. Our analysis finds evidence suggesting that global warming can lead to spatially nonuniform changes in wildfire activity.
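For reference, extreme quantile regression is built on the quantile ("pinball") loss. The minimal NumPy sketch below evaluates it at a high quantile level; it is not the paper's full hybrid statistical deep-learning framework, and the data are invented placeholders.

```python
# Generic pinball (quantile) loss, the building block of quantile regression.
import numpy as np

def pinball_loss(y_true, y_pred, tau=0.95):
    """Average pinball loss at quantile level tau."""
    diff = y_true - y_pred
    return np.mean(np.where(diff >= 0, tau * diff, (tau - 1) * diff))

y = np.array([0.0, 5.0, 120.0, 2.0])     # e.g., monthly burnt area (placeholder)
q = np.array([1.0, 4.0, 80.0, 1.5])      # predicted 95th-percentile values
print(pinball_loss(y, q, tau=0.95))
```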
Abstract
Recently, there has been a surge of research on data-driven weather forecasting systems, especially applications based on convolutional neural networks (CNNs). These are usually trained on atmospheric data represented on regular latitude-longitude grids, neglecting the curvature of the Earth. We assess the benefit of replacing the standard convolution operations with an adapted convolution operation that takes into account the geometry of the underlying data (Spherenet convolution), specifically near the poles. Additionally, we assess the effect of building into the structure of the network the information that the two hemispheres of the Earth have "flipped" properties (for example, cyclones circulating in opposite directions). Both approaches are examples of physics-informed machine learning. The methods are tested on the WeatherBench dataset at a resolution of ∼1.4°, which is higher than in many previous studies on CNNs for weather forecasting. For most lead times up to day +10 for 500 hPa geopotential and 850 hPa temperature, we find that using Spherenet convolution or including hemisphere-specific information individually improves forecast skill. Combining the two methods typically gives the highest forecast skill. Our version of Spherenet is implemented flexibly and scales well to high-resolution datasets, but it is still significantly more expensive than a standard convolution operation. Finally, we analyze cases with high forecast error. These occur mainly in winter and are relatively consistent across different training realizations of the networks, pointing to flow-dependent atmospheric predictability.
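One simple way to view the hemispheric symmetry exploited above is as a preprocessing step that flips the southern hemisphere in latitude so both hemispheres can share convolution weights. The sketch below illustrates that transformation only; it is an assumption-laden reading of the idea, not the authors' network modification.

```python
# Sketch: split a global field into hemispheres and flip the southern one in
# latitude, so a single set of convolution weights sees both in the same
# orientation. Grid size is an arbitrary assumption.
import numpy as np

def split_and_align_hemispheres(field):
    """field: (lat, lon) array on a regular grid, north pole first.
    Returns the two hemispheres with the southern one flipped in latitude."""
    n_lat = field.shape[0]
    north = field[: n_lat // 2]
    south = field[n_lat // 2:][::-1]      # reverse latitude order
    return north, south

z500 = np.random.rand(128, 256)           # e.g., 500 hPa geopotential (placeholder)
nh, sh = split_and_align_hemispheres(z500)
print(nh.shape, sh.shape)                 # (64, 256) (64, 256)
```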
Abstract
In this study, we introduce a self-supervised deep neural network approach to classify satellite images into independent classes of cloud systems. The driving question of the work is to understand whether our algorithm can capture cloud variability and identify distinct cloud regimes. Ultimately, we want to achieve generalization such that the algorithm can be applied to unseen data and thus help automatically extract information important to atmospheric science and renewable energy applications from the ever-increasing satellite data stream. We use cloud optical depth (COD) retrieved from postprocessed high-resolution Meteosat Second Generation (MSG) satellite data as input for the network. The network’s architecture is based on DeepCluster, version 2, and consists of a convolutional neural network and a multilayer perceptron, followed by a k-means algorithm. We explore the network’s training capabilities by analyzing the centroids and feature vectors found from progressive minimization of the cross-entropy loss function. By making use of additional MSG retrieval products based on multichannel information, we derive the optimum number of classes needed to determine independent cloud regimes. We test the network’s capabilities on COD data from 2013 and find that the trained neural network gives insights into the cloud systems’ persistence and transition probability. The generalization on 2015 data shows good skill of our algorithm on unseen data, but results depend on the spatial scale of the cloud systems.
Significance Statement
This study uses a self-supervised deep neural network to identify distinct cloud systems from cloud optical depth satellite images over central Europe. Satellite-retrieved products support the physical interpretation of the identified cloud classes and help optimize the number of identified classes. The trained neural network gives insights into cloud systems’ persistence and transition probability. The generalization capacity of the deep neural network with unseen data is promising but depends on the spatial scale of cloud systems.
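The clustering stage of a DeepCluster-style pipeline can be sketched as follows: k-means on the CNN + MLP feature vectors yields pseudo-labels that drive the cross-entropy training step. The feature vectors and the number of classes below are assumptions for illustration, not the study's settings.

```python
# Sketch of the DeepCluster-style step: cluster embedding vectors with k-means
# and use the cluster assignments as pseudo-labels for cross-entropy training.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 64))     # stand-in CNN + MLP feature vectors

k = 8                                      # assumed number of cloud regimes
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
pseudo_labels = kmeans.labels_             # targets for the cross-entropy loss
centroids = kmeans.cluster_centers_        # regime centroids to interpret

print(np.bincount(pseudo_labels))          # patches assigned to each regime
```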
Abstract
Snow is an important component of Earth’s climate system, and snowfall intensity and variation often significantly impact society, the environment, and ecosystems. Understanding monthly and seasonal snowfall intensity and variations is challenging because of multiple controlling mechanisms at different spatial and temporal scales. Using 65 years of in situ snowfall observations, we evaluated seven machine learning algorithms for modeling monthly and seasonal snowfall in the Lower Peninsula of Michigan (LPM) based on selected environmental and climatic variables. Our results show that the Bayesian additive regression tree (BART) has the best fitting (R² = 0.88) and out-of-sample estimation skill (R² = 0.58) for monthly mean snowfall, followed by the random forest model. BART also demonstrates strong estimation skills for large monthly snowfall amounts. Both the BART and random forest models suggest that topography, local/regional environmental factors, and teleconnection indices can significantly improve the estimation of monthly and seasonal snowfall amounts in the LPM. These statistical models based on machine learning algorithms can incorporate variables at multiple scales and address nonlinear responses of snowfall variations to environmental/climatic changes. This demonstrates that multiscale machine learning techniques provide a reliable and computationally efficient approach to modeling snowfall intensity and variability.
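As a concrete illustration of the kind of out-of-sample model comparison described above, the sketch below cross-validates a random forest regressor on synthetic stand-in predictors. The feature set and data are invented placeholders, not the study's 65-yr station record, and BART itself is not shown.

```python
# Illustrative out-of-sample evaluation of one of the compared model types
# (random forest regression) on synthetic placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 10))   # stand-ins for topographic, environmental, and teleconnection predictors
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=600)  # snowfall proxy

rf = RandomForestRegressor(n_estimators=300, random_state=0)
scores = cross_val_score(rf, X, y, cv=5, scoring="r2")
print("out-of-sample R^2:", scores.mean())
```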
Abstract
With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance versus feature relevance. We demonstrate and visualize different explanation methods, how to interpret them, and provide a complete Python package (scikit-explain) to allow future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for sub-freezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating model impacts of feature groups instead of individual features. Evaluating the feature groups mitigates the impacts of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
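A minimal example of two of the explainability families discussed above, applied to a simple tree model: permutation importance for global explainability and SHAP values for local explainability. This uses scikit-learn and the shap package directly rather than the scikit-explain wrapper described in the paper; the model and data are synthetic.

```python
# Global (permutation importance) and local (SHAP) explanations for a toy
# tree model on synthetic data; a sketch, not the paper's scikit-explain API.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Global explainability: permutation importance (one of the methods SAGE unifies).
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("permutation importances:", perm.importances_mean.round(3))

# Local explainability: SHAP values for individual predictions.
shap_values = shap.TreeExplainer(model).shap_values(X[:10])
print("SHAP values shape:", shap_values.shape)   # (10 samples, 5 features)
```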