Limitations of XAI Methods for Process-Level Understanding in the Atmospheric Sciences

Sam J. Silva, Department of Earth Sciences and Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, California (https://orcid.org/0000-0001-6343-8382)

and

Christoph A. Keller, Morgan State University, Baltimore, Maryland, and NASA Global Modeling and Assimilation Office, Greenbelt, Maryland

Open access


Abstract

Explainable artificial intelligence (XAI) methods are becoming popular tools for scientific discovery in the Earth and atmospheric sciences. While these techniques have the potential to revolutionize the scientific process, there are known limitations in their applicability that are frequently ignored. These limitations include that XAI methods explain the behavior of the AI model and not the behavior of the training dataset, and that caution should be used when these methods are applied to datasets with correlated and dependent features. Here, we explore the potential cost associated with ignoring these limitations with a simple case study from the atmospheric chemistry literature – learning the reaction rate of a bimolecular reaction. We demonstrate that dependent and highly correlated input features can lead to spurious process-level explanations. We posit that the current generation of XAI techniques should largely only be used for understanding system-level behavior and recommend caution when using XAI methods for process-level scientific discovery in the Earth and atmospheric sciences.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Sam J. Silva, samsilva@usc.edu


1. Introduction

Explainable artificial intelligence (XAI) is a set of artificial intelligence (AI) techniques that can provide human-understandable explanations for the decision-making processes and predictions of AI models. In the atmospheric sciences, XAI has seen rapid adoption in conjunction with the rise in popularity of AI techniques more generally (e.g., Barnes et al. 2020; Holzinger et al. 2022; McGovern et al. 2019). As they are currently used, these XAI methods aid in scientific discovery, aim to promote trust and transparency in AI prediction systems, and broadly lead to improved understanding of the strengths and limitations of a particular AI application (e.g., Ebert-Uphoff and Hilburn 2020; Mamalakis et al. 2022; McGovern et al. 2020; Molina et al. 2021; Silva et al. 2022; Toms et al. 2020).

There are several important caveats in using these XAI methods. Principal among them is that the explanations derived tell users about the AI model and do not necessarily tell users anything useful about the dataset used to train the model (e.g., Molnar 2019; Molnar et al. 2021). In addition, many of these techniques assume that model input features are independent (e.g., Lundberg and Lee 2017). Both of these caveats are commonly violated when XAI methods are used in the Earth and atmospheric sciences. The purpose of this study is to explore the potential pitfalls in ignoring these known issues.

Variables in the atmosphere often exhibit high levels of dependence that can manifest in data as high correlation. Leveraging this correlation for predictive skill has provided a major benefit to many modern machine learning tasks, where the exact details of the relationship being studied are not as critical as accurate prediction (e.g., Geiss et al. 2022; Rasp et al. 2018; Silva et al. 2021b). However, these machine learning methods are increasingly being used for scientific discovery. In such cases, the methods are used to predict a quantity of interest on the basis of a set of these correlated inputs. The behavior of that model is then interrogated using XAI methods, the results of which are often used for hypothesis generation surrounding a particular study case (e.g., Silva et al. 2022; Wang et al. 2022).

Here, we investigate the impact of feature dependence on commonly used explainability techniques. Using predictions of atmospheric chemical reaction processes with boosted regression trees as a case study, we demonstrate that correlated and dependent features lead to spurious process-level explanations. This effect is sufficiently large as to potentially lead to incorrect scientific understanding of the underlying process. In the highlighted case here, a user would attribute chemical reactions to the fundamentally wrong compounds. These results are broadly applicable to the use of explainability methods with tree-based machine learning models across the Earth and atmospheric sciences and suggest caution in their interpretation.

2. Atmospheric chemistry case study

To investigate the impact of correlated and dependent features on XAI methods, we use simulated atmospheric chemistry data. These data are a useful case study because they represent a real machine learning use case with an active research community (e.g., Ivatt and Evans 2020; Keller and Evans 2019; Kelp et al. 2022; Sturm et al. 2023; Sturm and Wexler 2022), and the system represents a wide range of correlations across chemical species. The dataset we used was generated through coarse-resolution simulations from the NASA GEOS composition forecast modeling system (GEOS-CF; Keller et al. 2021), which predicts the global distribution of hundreds of chemical species using the GEOS-Chem chemistry mechanism (Bey et al. 2001). We sample a single time step of the simulated global 3D distribution of atmospheric chemical species at approximately 2° × 2° spatial resolution (c48) with 72 vertical levels at 1230 UTC 2 June. We then subset the model output to 17 of the species contained within the SuperFast chemical mechanism, a computationally efficient and tractable use case (Cameron-Smith et al. 2006; Kelp et al. 2022; Silva et al. 2021a). After removing duplicate rows, the dataset has 922 896 samples for each chemical species, each sample corresponding to a grid box in the 3D model. The variables in this dataset are known to be chemically interdependent, with dependencies based on the structure of the GEOS-CF model. This is evident in the correlation plots in Fig. 1, which show a histogram of all cross correlations in the dataset (excluding self-correlations) and a heat map of all correlation combinations. The nonidentity correlations vary from −0.6 to 0.8, centered near a correlation of 0. This is broadly consistent with the highly coupled nature of atmospheric chemistry, with certain variables strongly dependent on others through direct chemical reactions. It is worth noting that these correlations, while high, are not extreme: correlations of up to 0.8 are reasonably common in the atmospheric sciences.

Fig. 1.

(left) Correlation histogram and (right) heat map of all variables in the atmospheric chemistry case study dataset. The histogram additionally shows the distributions for OH and MP individually.

Citation: Artificial Intelligence for the Earth Systems 3, 1; 10.1175/AIES-D-23-0045.1
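The cross-correlation diagnostics in Fig. 1 can be reproduced with a few lines of NumPy. The snippet below is a minimal sketch using synthetic stand-in data, since the real species-by-grid-box matrix comes from the GEOS-CF output; the variable names are illustrative only.

```python
import numpy as np

# Toy stand-in for the species-by-grid-box matrix; the real columns are the
# GEOS-CF species concentrations. Four correlated columns built from one base.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 1))
X = np.hstack([base + 0.5 * rng.normal(size=(1000, 1)) for _ in range(4)])

corr = np.corrcoef(X, rowvar=False)                 # species-by-species matrix
offdiag = corr[~np.eye(corr.shape[0], dtype=bool)]  # drop self-correlations
```

A histogram of `offdiag` and a heat map of `corr` correspond to the two panels of Fig. 1.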

The target prediction quantity investigated here is the bimolecular reaction rate from a particular reaction. Bimolecular reactions are generally of the following form:
A + B → C, (1)
rate = k_A,B × [A] × [B], (2)
where A and B are chemical reactants and C is a chemical product. The reaction rate (molecules cm−3 s−1) is calculated as the product of the concentrations of chemical species A and B along with the reaction-rate constant k_A,B (cm3 molecules−1 s−1). In this case study, we use the oxidation of methyl hydroperoxide (MP; CH3OOH) by the hydroxyl radical (OH) to various product chemical species:
MP + OH → products, (3)
rate = k_OH,MP × [OH] × [MP]. (4)
This reaction is present in most chemical mechanisms; MP itself is a chemical intermediate in the oxidation of methane in the atmosphere. Both OH and MP are known to be strongly dependent on other species in the chemical system and thus have moderately high correlations with these other species (see Fig. 1). The rate constant used was generated following the SuperFast mechanism, assuming a constant temperature of 300 K.

We ultimately chose this system and prediction task because of the relative simplicity of interpreting the true functional form we are trying to learn (multiplication) and the high levels of dependence across the atmospheric chemical space.
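As a concrete illustration of the rate calculation, the target quantity can be computed directly from species concentrations. The numerical values below are hypothetical (the true rate constant follows the SuperFast mechanism at 300 K, and the concentrations come from the GEOS-CF simulation); the log transform matches the target used for training.

```python
import numpy as np

# Hypothetical values for illustration; the true rate constant follows the
# SuperFast mechanism at 300 K, and concentrations come from GEOS-CF.
k_oh_mp = 3.8e-12                  # rate constant (cm^3 molecules^-1 s^-1)
oh = np.array([1.0e6, 5.0e5])      # [OH] (molecules cm^-3)
mp = np.array([2.0e10, 8.0e9])     # [MP] (molecules cm^-3)

rate = k_oh_mp * oh * mp           # bimolecular rate (molecules cm^-3 s^-1)
log_rate = np.log10(rate)          # log-transformed target used for training
```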

3. Boosted regression trees and explainability

We predicted the reaction rate using boosted regression trees, implemented with the extreme gradient boosting (XGBoost) software package (Chen and Guestrin 2016). Boosted regression trees, and XGBoost in particular, have been used widely across the atmospheric chemistry space and Earth and atmospheric sciences more generally (e.g., Batunacun et al. 2021; Dietz et al. 2019; Ivatt and Evans 2020; Lee et al. 2019; Silva et al. 2020).

We investigate three widely used XAI methods applied to XGBoost models: gain, permutation feature importance, and Shapley additive explanations (SHAP) analysis. Gain and SHAP analysis methods are available directly from the XGBoost library, and we implement permutation feature importance ourselves for this work. We describe these methods in brief here. Each of them provides a quantitative measure of feature importance that is sensitive to the training dataset and feature distributions. For more detail on explainability techniques, see Molnar (2019) and McGovern et al. (2019).

Gain is an importance metric that is unique to tree-based models. It represents the improvement in accuracy brought to a model by the splits on a given variable within the tree structure. The addition of a split on a variable can (and frequently does) improve the accuracy of a model relative to a tree without that split. At a high level, that improvement in accuracy (with the inclusion of a regularization term) is what is used to calculate gain.

Permutation feature importance explains the model behavior by ranking the variables by their contribution to an accurate prediction. The accuracy of a trained model is assessed by randomly permuting input variables one at a time. The influence of that random permutation on the model predictive skill is quantified across many prediction cases. The average difference in error between permuted cases and the original dataset is used as the importance ranking for each individually permuted variable. In this way the variables ranked as “most important” are those that when permuted lead to the worst predictions. Permutation feature importance is model agnostic and works on all widely used machine learning models (e.g., McGovern et al. 2019).
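A minimal, model-agnostic implementation of permutation feature importance might look like the following. The helper name and the toy model are our own illustration, not the code used in this study.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, rng=None):
    """Rank features by the mean increase in MSE when each column is shuffled."""
    rng = np.random.default_rng(rng)
    base_mse = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # permute one feature at a time
            importances[j] += (np.mean((predict(Xp) - y) ** 2) - base_mse) / n_repeats
    return importances

# Toy check: the target depends only on column 0, so only column 0 should
# register as important when permuted.
X = np.random.default_rng(0).normal(size=(500, 3))
y = 2.0 * X[:, 0]
imp = permutation_importance(lambda Xq: 2.0 * Xq[:, 0], X, y, rng=1)
```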

SHAP analysis is an XAI technique that uses methods rooted in the game theory literature to explain individual model predictions (Lundberg et al. 2019, 2020). In contrast to gain and permutation feature importance, which provide one bulk explanation for the entire model, SHAP analysis provides information at the individual prediction level. This ability to explain individual predictions, termed “local explainability,” has made SHAP analysis a popular choice for interrogating machine learning methods in the Earth and atmospheric sciences (e.g., Batunacun et al. 2021; Wang et al. 2021). For each model prediction, SHAP values can be calculated for each input feature. These SHAP values quantify the influence of every feature toward either increasing or decreasing the final predicted quantity using a local surrogate modeling method. Higher-magnitude SHAP values indicate that a variable has a larger influence on the resulting prediction. Averaging the magnitude of all SHAP values leads to a bulk, or “global,” importance ranking that can be compared with those from gain and permutation feature importance. The detailed derivation and mathematical calculation of SHAP values for machine learning models can be found in Lundberg et al. (2019, 2020). As with permutation feature importance, SHAP analysis is model agnostic and works on all widely used machine learning models; the implementation in XGBoost is optimized for computational efficiency (Lundberg and Lee 2017).
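The game-theoretic idea behind SHAP can be made concrete for the two-feature product function this case study targets. The sketch below computes exact Shapley values for f(x1, x2) = x1 × x2 under the interventional (marginal) expectation, which assumes independent features; the function and background values are our own toy illustration.

```python
import numpy as np

def shap_product(x1, x2, bg1, bg2):
    """Exact Shapley values for f(x1, x2) = x1 * x2, with conditional
    expectations replaced by marginal (interventional) expectations over a
    background dataset, i.e., assuming the two features are independent."""
    v_empty = np.mean(bg1) * np.mean(bg2)  # nothing known: E[f]
    v1 = x1 * np.mean(bg2)                 # only x1 known
    v2 = np.mean(bg1) * x2                 # only x2 known
    v12 = x1 * x2                          # both known: f(x1, x2)
    phi1 = 0.5 * ((v1 - v_empty) + (v12 - v2))
    phi2 = 0.5 * ((v2 - v_empty) + (v12 - v1))
    return phi1, phi2

bg = np.array([0.0, 2.0])                  # toy background with mean 1
phi1, phi2 = shap_product(2.0, 3.0, bg, bg)
```

By the efficiency property, phi1 + phi2 equals f(x) − E[f]. When the features are dependent, these marginal expectations no longer match the true conditionals, which is one root of the caveats discussed in this paper.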

4. Model training and XAI experiments

We trained a boosted regression tree to predict the reaction rate following Eq. (2) from chemical species concentrations. To explore the impact of correlated input features, we selected all species from the SuperFast mechanism with correlations greater than 0.5 with either MP or OH as input features to the prediction model. The final species feature list was (CH2O, CH4, CO, H2O, H2O2, HO2, MO2, MP, OH). We log-transformed the target prediction quantity (reaction rate) for improved predictive skill. The dataset contains 922 896 samples and was randomly split into training, validation, and test sets with ratios of 0.85/0.1/0.05. Model hyperparameters selected include setting the maximum depth of a tree (“max_depth”) to 12 and the learning rate (“eta”) to 0.1. We trained the model for 50 boosting iterations using a mean square error cost function. All other hyperparameters were kept at the package default values. Results discussed further in this paper are solely based on the test dataset.

We evaluated the impact of variable correlation on the prediction of atmospheric chemical reaction rates through comparison with a random baseline. In this random baseline, the relationships between all variables are broken through randomly reordering each variable in the original dataset. This was achieved by an individual row-wise permutation of each input feature (here, chemical species) by a randomly shifted index. This breaks all correlation and dependence in the dataset but importantly preserves the individual variable distributions (Gaussian, lognormal, etc.). From there, the reaction rate is recomputed and used as a new target for training.
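The row-wise shifting described above can be implemented with np.roll; the function below is our own minimal sketch of the procedure, not the study's exact code.

```python
import numpy as np

def randomize_columns(X, rng=None):
    """Shift each column by an independent random offset, breaking all
    inter-feature dependence while preserving every marginal distribution."""
    rng = np.random.default_rng(rng)
    Xr = X.copy()
    for j in range(X.shape[1]):
        shift = int(rng.integers(1, X.shape[0]))  # nonzero row-wise shift
        Xr[:, j] = np.roll(Xr[:, j], shift)
    return Xr

# Two perfectly correlated toy columns: after randomization each marginal is
# unchanged, but the row-wise pairing between the columns is broken.
X = np.column_stack([np.arange(10.0), np.arange(10.0)])
Xr = randomize_columns(X, rng=0)
```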

We posit that the difference in the results of the XAI methods between the original and randomized baseline will inform us as to the potential scale of the issue associated with ignoring feature correlations in XAI analyses.

5. Results and discussion

We trained an XGBoost model to predict the MP + OH reaction rate on both the original dataset and the randomized baseline. Overall performance is very good for both cases. Predictions of the original dataset had a relative mean square error of 0.008 and a correlation coefficient of 0.999. The randomized dataset showed even better predictive skill, with a mean square error on the test dataset of 0.006 and a correlation coefficient of 0.999. Scatterplots of the respective models’ predictive skill are shown in Fig. 2 (residual plots are shown in Fig. S1 in the online supplemental material). This very high skill for both models is not particularly surprising given the relative ease of learning multiplication with high-capacity machine learning models.

Fig. 2.

Performance scatterplots of the log10 of the reaction-rate (molecules cm−3 s−1) predictions for this case study for (left) the original dataset and (right) the randomized baseline.


We then ran the three XAI methods described above to generate global feature importance rankings for this prediction task. Results are summarized in Fig. 3 for both the original and randomized datasets. The most striking result is the distinction between the importance rankings in the original dataset and the randomized baseline. A particular input feature, the compound HO2, is shown to be important for all three methods in the original dataset despite having no explicit connection to the target quantity described in Eq. (2). In the randomized baseline experiments, only OH and MP are ranked as important. All other variables, which are not used in the direct calculation, are correctly given trivially low importance rankings.

Fig. 3.

Average importance across the three methods. The original dataset is shown in black, and the randomized baseline is shown in red. Differences between the original dataset and randomized baseline importance values are highlighted with gray lines. Values very close to zero have both the original and randomized points overplotted.


The results shown in Fig. 3 demonstrate the distinction between process- and system-level explanations. HO2 is an oxidant and is tightly coupled to the system of chemical reactions in the atmosphere and does correlate well with the rate (Seinfeld and Pandis 2016). It is correct to state that HO2 has an important influence on the reaction rate predicted here through the influence that HO2 has on the overall atmospheric chemical system. However, HO2 is not important for the specific instantaneous process of the reaction between OH and MP, as predicted here. In this way, these XAI methods are only useful as system-level explanations given this particular correlated input distribution.

In the random baseline scenario, the importance rankings for MP and OH differ. This difference exists even though in Eq. (2) the reaction rate is calculated while mathematically weighting both variables equally through their product. This difference in importance demonstrates how many XAI methods depend on the distribution (variance, etc.) of an individual variable and how differences in distributions manifest in the final explainability result. If both variables were randomly sampled from the same distribution (e.g., Gaussian) with the same variance and mean, their importance would be identical.

The spurious results from the overall importance rankings are echoed in so-called local explanations from the SHAP analysis. Local explanations describe individual model predictions, which can be very useful for studying model behavior at a high level of detail. Using the local SHAP analysis method, we tracked the number of times that both OH and MP appeared in the top three most important variables for a given prediction, as assessed by the highest-magnitude SHAP values. In ∼91% of individual prediction explanations, either OH or MP was not present in the top three most important variables, with at least one commonly replaced by HO2. In the random baseline, SHAP analysis assigns the top two most important variables to OH and MP in nearly all cases.
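The top-three bookkeeping can be sketched as follows, with hypothetical SHAP values standing in for the real per-prediction output (the real analysis uses nine species over the full test set).

```python
import numpy as np

# Hypothetical SHAP values for four features (columns) over two predictions
# (rows); purely illustrative numbers, not results from the study.
features = ["H2O2", "HO2", "MP", "OH"]
shap_values = np.array([
    [0.50, 0.90, 0.20, 0.80],  # HO2 and H2O2 crowd MP out of the top three
    [0.05, 0.10, 0.70, 0.60],  # MP and OH both rank in the top three
])

# Indices of the three highest-magnitude SHAP values per prediction.
top3 = np.argsort(-np.abs(shap_values), axis=1)[:, :3]
i_mp, i_oh = features.index("MP"), features.index("OH")
frac_missing = np.mean([not {i_mp, i_oh} <= set(row) for row in top3])
```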

In this case study, knowing the true functional form of the target variable makes it easy to identify the model trained on the original dataset as untrustworthy from a process-level perspective. The challenge arises if we imagine this scenario as one in which we did not know the functional form of the reaction rate a priori. If we were using this technique for scientific discovery, the results shown in Fig. 3 for the correlated distribution could very easily lead a reasonable scientist astray. It is reasonable to assume that an unknown reaction in the atmosphere is dependent on HO2 as a reactant. HO2 is a very common reactant in the atmosphere, participating in many reactions in current models of atmospheric chemistry (Silva et al. 2021a). If the drivers of the reaction rate were unknown and we used XAI techniques to try to discover them at a process level, variable dependence would lead to an incorrect answer.

It is important to note that the issue demonstrated here cannot be fixed through more advanced XAI techniques currently available. In the correlated dataset example, the model trained on the original dataset makes high-quality predictions using information from HO2. The issue is that HO2 is only important at a system level as an indicator of active oxidation chemistry. HO2 is not important for the specific reaction process—it is not part of the reaction calculation. The consequences of this are that any correct post hoc XAI technique will be fundamentally out of line with the scientific process being studied. This would even be true if we were to use certain methods that aim to help address issues of feature dependence in XAI techniques. These include updated but computationally expensive calculations of SHAP (e.g., Aas et al. 2020) and block permutation feature importance (e.g., McGovern et al. 2019). These spurious results are directly a consequence of XAI techniques providing explanations for AI models and not necessarily the training datasets. The incorrect explanation is due to the XGBoost model behavior and not the XAI method chosen.

6. Implications for XAI in atmospheric sciences

We explored the impact of feature dependence on XAI methods in the Earth and atmospheric sciences. Using a simple but realistic example from atmospheric chemistry, we demonstrated that correlated data lead to incorrect process-level explanations with boosted regression trees. These incorrect explanations exist despite very high model accuracy, highlighting the challenge that high-quality predictions and scientifically reasonable explanations are insufficient evidence that a machine learning model is correctly representing the processes being studied.

This work reinforces that XAI techniques inform us about AI model behavior and that good predictive skill of an AI model is not a sufficient criterion for assuming that the model correctly represents the data-generation properties of the training dataset. Instead of learning specific processes, under atmospherically relevant correlations the model is learning systemwide behavior. This does not render XAI methods useless or invalidate the predictive skill of the model, and it in fact directly reflects the properties of the training dataset. However, this result does demonstrate that these XAI techniques should not be used on their own for scientific discovery and that instead these techniques should be used as part of a suite of analytical approaches to ensure robust process-level scientific discovery (physical modeling, causal inference, etc.). We believe that these results are broadly applicable across the Earth and atmospheric sciences and suggest caution in using and interpreting the results of XAI methods.

Acknowledgments.

Author Silva acknowledges NASA Grant 80NSSC23K0523 for supporting this work. Resources supporting the GEOS model simulations were provided by the NASA Center for Climate Simulation at the Goddard Space Flight Center (https://www.nccs.nasa.gov/services/discover).

Data availability statement.

All data and analysis code are publicly available online (https://doi.org/10.5281/zenodo.10520095).

REFERENCES

  • Aas, K., M. Jullum, and A. Løland, 2020: Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. arXiv, 1903.10464v3, https://doi.org/10.48550/arXiv.1903.10464.

  • Barnes, E. A., B. Toms, J. W. Hurrell, I. Ebert‐Uphoff, C. Anderson, and D. Anderson, 2020: Indicator patterns of forced change learned by an artificial neural network. J. Adv. Model. Earth Syst., 12, e2020MS002195, https://doi.org/10.1029/2020MS002195.

  • Batunacun, R. Wieland, T. Lakes, and C. Nendel, 2021: Using Shapley additive explanations to interpret extreme gradient boosting predictions of grassland degradation in Xilingol, China. Geosci. Model Dev., 14, 1493–1510, https://doi.org/10.5194/gmd-14-1493-2021.

  • Bey, I., and Coauthors, 2001: Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. J. Geophys. Res., 106, 23 073–23 095, https://doi.org/10.1029/2001JD000807.

  • Cameron-Smith, P., J.-F. Lamarque, P. Connell, C. Chuang, and F. Vitt, 2006: Toward an Earth system model: Atmospheric chemistry, coupling, and petascale computing. J. Phys.: Conf. Ser., 46, 343–350, https://doi.org/10.1088/1742-6596/46/1/048.

  • Chen, T., and C. Guestrin, 2016: XGBoost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, Association for Computing Machinery, 785–794, https://doi.org/10.1145/2939672.2939785.

  • Dietz, S. J., P. Kneringer, G. J. Mayr, and A. Zeileis, 2019: Low-visibility forecasts for different flight planning horizons using tree-based boosting models. Adv. Stat. Climatol. Meteor. Oceanogr., 5, 101–114, https://doi.org/10.5194/ascmo-5-101-2019.

  • Ebert-Uphoff, I., and K. Hilburn, 2020: Evaluation, tuning, and interpretation of neural networks for working with images in meteorological applications. Bull. Amer. Meteor. Soc., 101, E2149–E2170, https://doi.org/10.1175/BAMS-D-20-0097.1.

  • Geiss, A., S. J. Silva, and J. C. Hardin, 2022: Downscaling atmospheric chemistry simulations with physically consistent deep learning. Geosci. Model Dev., 15, 6677–6694, https://doi.org/10.5194/gmd-15-6677-2022.

  • Holzinger, A., R. Goebel, R. Fong, T. Moon, K.-R. Müller, and W. Samek, Eds., 2022: xxAI—Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers. Lecture Notes in Computer Science, Vol. 13200, Springer International Publishing, 397 pp., https://doi.org/10.1007/978-3-031-04083-2.

  • Ivatt, P. D., and M. J. Evans, 2020: Improving the prediction of an atmospheric chemistry transport model using gradient-boosted regression trees. Atmos. Chem. Phys., 20, 8063–8082, https://doi.org/10.5194/acp-20-8063-2020.

  • Keller, C. A., and M. J. Evans, 2019: Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. Geosci. Model Dev., 12, 1209–1225, https://doi.org/10.5194/gmd-12-1209-2019.

  • Keller, C. A., and Coauthors, 2021: Description of the NASA GEOS composition forecast modeling system GEOS-CF v1.0. J. Adv. Model. Earth Syst., 13, e2020MS002413, https://doi.org/10.1029/2020MS002413.

  • Kelp, M. M., D. J. Jacob, H. Lin, and M. P. Sulprizio, 2022: An online‐learned neural network chemical solver for stable long‐term global simulations of atmospheric chemistry. J. Adv. Model. Earth Syst., 14, e2021MS002926, https://doi.org/10.1029/2021MS002926.

  • Lee, Y., D. Han, M.-H. Ahn, J. Im, and S. J. Lee, 2019: Retrieval of total precipitable water from Himawari-8 AHI data: A comparison of random forest, extreme gradient boosting, and deep neural network. Remote Sens., 11, 1741, https://doi.org/10.3390/rs11151741.

  • Lundberg, S., and S.-I. Lee, 2017: A unified approach to interpreting model predictions. arXiv, 1705.07874v2, https://doi.org/10.48550/arXiv.1705.07874.

  • Lundberg, S., G. G. Erion, and S.-I. Lee, 2019: Consistent individualized feature attribution for tree ensembles. arXiv, 1802.03888v3, https://doi.org/10.48550/arXiv.1802.03888.

  • Lundberg, S., and Coauthors, 2020: From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell., 2, 56–67, https://doi.org/10.1038/s42256-019-0138-9.

  • Mamalakis, A., E. A. Barnes, and I. Ebert-Uphoff, 2022: Carefully choose the baseline: Lessons learned from applying XAI attribution methods for regression tasks in geoscience. arXiv, 2208.09473v1, https://doi.org/10.48550/arXiv.2208.09473.

  • McGovern, A., R. Lagerquist, D. J. Gagne II, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.

  • McGovern, A., R. Lagerquist, and D. J. Gagne II, 2020: Using machine learning and model interpretation and visualization techniques to gain physical insights in atmospheric science. Eighth Int. Conf. on Learning Representations, Online, ICLR, https://ai4earthscience.github.io/iclr-2020-workshop/papers/ai4earth16.pdf.

  • Molina, M. J., D. J. Gagne, and A. F. Prein, 2021: A benchmark to test generalization capabilities of deep learning methods to classify severe convective storms in a changing climate. Earth Space Sci., 8, e2020EA001490, https://doi.org/10.1029/2020EA001490.

  • Molnar, C., 2019: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Leanpub, 320 pp.

  • Molnar, C., and Coauthors, 2021: General pitfalls of model-agnostic interpretation methods for machine learning models. arXiv, 2007.04131v2, https://doi.org/10.48550/arXiv.2007.04131.

  • Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115.

  • Seinfeld, J. H., and S. N. Pandis, 2016: Atmospheric Chemistry and Physics: From Air Pollution to Climate Change. 3rd ed. John Wiley and Sons, 1152 pp.

  • Silva, S. J., D. A. Ridley, and C. L. Heald, 2020: Exploring the constraints on simulated aerosol sources and transport across the North Atlantic with island-based sun photometers. Earth Space Sci., 7, e2020EA001392, https://doi.org/10.1029/2020EA001392.

  • Silva, S. J., S. M. Burrows, M. J. Evans, and M. Halappanavar, 2021a: A graph theoretical intercomparison of atmospheric chemical mechanisms. Geophys. Res. Lett., 48, e2020GL090481, https://doi.org/10.1029/2020GL090481.

    • Search Google Scholar
    • Export Citation
  • Silva, S. J., P.-L. Ma, J. C. Hardin, and D. Rothenberg, 2021b: Physically regularized machine learning emulators of aerosol activation. Geosci. Model Dev., 14, 30673077, https://doi.org/10.5194/gmd-14-3067-2021.

    • Search Google Scholar
    • Export Citation
  • Silva, S. J., C. A. Keller, and J. Hardin, 2022: Using an explainable machine learning approach to characterize Earth system model errors: Application of SHAP analysis to modeling lightning flash occurrence. J. Adv. Model. Earth Syst., 14, e2021MS002881, https://doi.org/10.1029/2021MS002881.

    • Search Google Scholar
    • Export Citation
  • Sturm, P. O., and A. S. Wexler, 2022: Conservation laws in a neural network architecture: Enforcing the atom balance of a Julia-based photochemical model (v0.2.0). Geosci. Model Dev., 15, 34173431, https://doi.org/10.5194/gmd-15-3417-2022.

    • Search Google Scholar
    • Export Citation
  • Sturm, P. O., A. Manders, R. Janssen, A. Segers, A. Wexler, and H. X. Lin, 2023: Advecting superspecies: Efficiently modeling transport of organic aerosol with a mass-conserving dimensionality reduction method. J. Adv. Model. Earth Syst., 15, e2022MS003235, https://doi.org/10.1029/2022MS003235.

    • Search Google Scholar
    • Export Citation
  • Toms, B. A., E. A. Barnes, and I. Ebert‐Uphoff, 2020: Physically interpretable neural networks for the geosciences: Applications to Earth system variability. J. Adv. Model. Earth Syst., 12, e2019MS002002, https://doi.org/10.1029/2019MS002002.

    • Search Google Scholar
    • Export Citation
  • Wang, S. S.-C., Y. Qian, L. R. Leung, and Y. Zhang, 2021: Identifying key drivers of wildfires in the contiguous US using machine learning and game theory interpretation. Earth’s Future, 9, e2020EF001910, https://doi.org/10.1029/2020EF001910.

    • Search Google Scholar
    • Export Citation
  • Wang, S. S.-C., Y. Qian, L. R. Leung, and Y. Zhang, 2022: Interpreting machine learning prediction of fire emissions and comparison with FireMIP process-based models. Atmos. Chem. Phys., 22, 34453468, https://doi.org/10.5194/acp-22-3445-2022.

    • Search Google Scholar
    • Export Citation

  • Aas, K., M. Jullum, and A. Løland, 2020: Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. arXiv, 1903.10464v3, https://doi.org/10.48550/arXiv.1903.10464.

  • Barnes, E. A., B. Toms, J. W. Hurrell, I. Ebert‐Uphoff, C. Anderson, and D. Anderson, 2020: Indicator patterns of forced change learned by an artificial neural network. J. Adv. Model. Earth Syst., 12, e2020MS002195, https://doi.org/10.1029/2020MS002195.

  • Batunacun, R. Wieland, T. Lakes, and C. Nendel, 2021: Using Shapley additive explanations to interpret extreme gradient boosting predictions of grassland degradation in Xilingol, China. Geosci. Model Dev., 14, 1493–1510, https://doi.org/10.5194/gmd-14-1493-2021.

  • Bey, I., and Coauthors, 2001: Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. J. Geophys. Res., 106, 23 073–23 095, https://doi.org/10.1029/2001JD000807.

  • Cameron-Smith, P., J.-F. Lamarque, P. Connell, C. Chuang, and F. Vitt, 2006: Toward an Earth system model: Atmospheric chemistry, coupling, and petascale computing. J. Phys.: Conf. Ser., 46, 343–350, https://doi.org/10.1088/1742-6596/46/1/048.

  • Chen, T., and C. Guestrin, 2016: XGBoost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, Association for Computing Machinery, 785–794, https://doi.org/10.1145/2939672.2939785.

  • Dietz, S. J., P. Kneringer, G. J. Mayr, and A. Zeileis, 2019: Low-visibility forecasts for different flight planning horizons using tree-based boosting models. Adv. Stat. Climatol. Meteor. Oceanogr., 5, 101–114, https://doi.org/10.5194/ascmo-5-101-2019.

  • Ebert-Uphoff, I., and K. Hilburn, 2020: Evaluation, tuning, and interpretation of neural networks for working with images in meteorological applications. Bull. Amer. Meteor. Soc., 101, E2149–E2170, https://doi.org/10.1175/BAMS-D-20-0097.1.

  • Geiss, A., S. J. Silva, and J. C. Hardin, 2022: Downscaling atmospheric chemistry simulations with physically consistent deep learning. Geosci. Model Dev., 15, 6677–6694, https://doi.org/10.5194/gmd-15-6677-2022.

  • Holzinger, A., R. Goebel, R. Fong, T. Moon, K.-R. Müller, and W. Samek, Eds., 2022: xxAI—Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers. Lecture Notes in Computer Science, Vol. 13200, Springer International Publishing, 397 pp., https://doi.org/10.1007/978-3-031-04083-2.

  • Ivatt, P. D., and M. J. Evans, 2020: Improving the prediction of an atmospheric chemistry transport model using gradient-boosted regression trees. Atmos. Chem. Phys., 20, 8063–8082, https://doi.org/10.5194/acp-20-8063-2020.

  • Keller, C. A., and M. J. Evans, 2019: Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. Geosci. Model Dev., 12, 1209–1225, https://doi.org/10.5194/gmd-12-1209-2019.

  • Keller, C. A., and Coauthors, 2021: Description of the NASA GEOS composition forecast modeling system GEOS-CF v1.0. J. Adv. Model. Earth Syst., 13, e2020MS002413, https://doi.org/10.1029/2020MS002413.

  • Kelp, M. M., D. J. Jacob, H. Lin, and M. P. Sulprizio, 2022: An online‐learned neural network chemical solver for stable long‐term global simulations of atmospheric chemistry. J. Adv. Model. Earth Syst., 14, e2021MS002926, https://doi.org/10.1029/2021MS002926.

  • Lee, Y., D. Han, M.-H. Ahn, J. Im, and S. J. Lee, 2019: Retrieval of total precipitable water from Himawari-8 AHI data: A comparison of random forest, extreme gradient boosting, and deep neural network. Remote Sens., 11, 1741, https://doi.org/10.3390/rs11151741.

  • Lundberg, S., and S.-I. Lee, 2017: A unified approach to interpreting model predictions. arXiv, 1705.07874v2, https://doi.org/10.48550/arXiv.1705.07874.

  • Lundberg, S., G. G. Erion, and S.-I. Lee, 2019: Consistent individualized feature attribution for tree ensembles. arXiv, 1802.03888v3, https://doi.org/10.48550/arXiv.1802.03888.

  • Lundberg, S., and Coauthors, 2020: From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell., 2, 56–67, https://doi.org/10.1038/s42256-019-0138-9.

  • Mamalakis, A., E. A. Barnes, and I. Ebert-Uphoff, 2022: Carefully choose the baseline: Lessons learned from applying XAI attribution methods for regression tasks in geoscience. arXiv, 2208.09473v1, https://doi.org/10.48550/arXiv.2208.09473.

  • McGovern, A., R. Lagerquist, D. J. Gagne II, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.

  • McGovern, A., R. Lagerquist, and D. J. Gagne II, 2020: Using machine learning and model interpretation and visualization techniques to gain physical insights in atmospheric science. Eighth Int. Conf. on Learning Representations, Online, ICLR, https://ai4earthscience.github.io/iclr-2020-workshop/papers/ai4earth16.pdf.

  • Molina, M. J., D. J. Gagne, and A. F. Prein, 2021: A benchmark to test generalization capabilities of deep learning methods to classify severe convective storms in a changing climate. Earth Space Sci., 8, e2020EA001490, https://doi.org/10.1029/2020EA001490.

  • Molnar, C., 2019: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Leanpub, 320 pp.

  • Molnar, C., and Coauthors, 2021: General pitfalls of model-agnostic interpretation methods for machine learning models. arXiv, 2007.04131v2, https://doi.org/10.48550/arXiv.2007.04131.

  • Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115.

  • Seinfeld, J. H., and S. N. Pandis, 2016: Atmospheric Chemistry and Physics: From Air Pollution to Climate Change. 3rd ed. John Wiley and Sons, 1152 pp.

  • Silva, S. J., D. A. Ridley, and C. L. Heald, 2020: Exploring the constraints on simulated aerosol sources and transport across the North Atlantic with island-based sun photometers. Earth Space Sci., 7, e2020EA001392, https://doi.org/10.1029/2020EA001392.

  • Silva, S. J., S. M. Burrows, M. J. Evans, and M. Halappanavar, 2021a: A graph theoretical intercomparison of atmospheric chemical mechanisms. Geophys. Res. Lett., 48, e2020GL090481, https://doi.org/10.1029/2020GL090481.

  • Silva, S. J., P.-L. Ma, J. C. Hardin, and D. Rothenberg, 2021b: Physically regularized machine learning emulators of aerosol activation. Geosci. Model Dev., 14, 3067–3077, https://doi.org/10.5194/gmd-14-3067-2021.

  • Silva, S. J., C. A. Keller, and J. Hardin, 2022: Using an explainable machine learning approach to characterize Earth system model errors: Application of SHAP analysis to modeling lightning flash occurrence. J. Adv. Model. Earth Syst., 14, e2021MS002881, https://doi.org/10.1029/2021MS002881.

  • Sturm, P. O., and A. S. Wexler, 2022: Conservation laws in a neural network architecture: Enforcing the atom balance of a Julia-based photochemical model (v0.2.0). Geosci. Model Dev., 15, 3417–3431, https://doi.org/10.5194/gmd-15-3417-2022.

  • Sturm, P. O., A. Manders, R. Janssen, A. Segers, A. Wexler, and H. X. Lin, 2023: Advecting superspecies: Efficiently modeling transport of organic aerosol with a mass-conserving dimensionality reduction method. J. Adv. Model. Earth Syst., 15, e2022MS003235, https://doi.org/10.1029/2022MS003235.

  • Toms, B. A., E. A. Barnes, and I. Ebert‐Uphoff, 2020: Physically interpretable neural networks for the geosciences: Applications to Earth system variability. J. Adv. Model. Earth Syst., 12, e2019MS002002, https://doi.org/10.1029/2019MS002002.

  • Wang, S. S.-C., Y. Qian, L. R. Leung, and Y. Zhang, 2021: Identifying key drivers of wildfires in the contiguous US using machine learning and game theory interpretation. Earth’s Future, 9, e2020EF001910, https://doi.org/10.1029/2020EF001910.

  • Wang, S. S.-C., Y. Qian, L. R. Leung, and Y. Zhang, 2022: Interpreting machine learning prediction of fire emissions and comparison with FireMIP process-based models. Atmos. Chem. Phys., 22, 3445–3468, https://doi.org/10.5194/acp-22-3445-2022.

  • Fig. 1.

    (left) Correlation heat map and (right) histogram of all variables in the atmospheric chemistry case study dataset. The histogram on the right shows the distribution for both OH and MP individually.

  • Fig. 2.

    Performance scatterplots of the log10 of the reaction-rate (molecules cm−3 s−1) predictions for this case study for (left) the original dataset and (right) the randomized baseline.

  • Fig. 3.

    Average importance across the three methods. The original dataset is shown in black, and the randomized baseline is shown in red. Differences between the original dataset and randomized baseline importance values are highlighted with gray lines. Values very close to zero have both the original and randomized points overplotted.
