Abstract
Visible and infrared radiance products of geostationary orbiting platforms provide virtually continuous observations of Earth. In contrast, low-Earth orbiters observe passive microwave (PMW) radiances at any location much less frequently. Prior literature demonstrates the ability of a machine learning (ML) approach to build a link between these two complementary radiance spectra by predicting PMW observations using infrared and visible products collected from geostationary instruments, which could potentially deliver a highly desirable synthetic PMW product with nearly continuous spatiotemporal coverage. However, current ML models lack the ability to provide a measure of uncertainty of such a product, significantly limiting its applications. In this work, Bayesian deep learning is employed to generate synthetic Global Precipitation Measurement (GPM) Microwave Imager (GMI) data from Advanced Baseline Imager (ABI) observations with attached uncertainties over the ocean. The study first uses deterministic residual networks (ResNets) to generate synthetic GMI brightness temperatures with a mean absolute error as low as 1.72 K at the ABI spatiotemporal resolution. Then, for the same task, we use three Bayesian ResNet models to produce a comparable amount of error while providing previously unavailable predictive variance (i.e., uncertainty) for each synthetic data point. We find that the Flipout configuration provides the most robust calibration between uncertainty and error across GMI frequencies, and then demonstrate how this additional information is useful for discarding high-error synthetic data points prior to use by downstream applications.
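The predictive variance described above comes from treating network weights as distributions and averaging over stochastic forward passes. The following is a minimal numpy sketch of that idea using a hypothetical toy "posterior" over a single weight, not the paper's Bayesian ResNet or the Flipout estimator itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_forward(x, n_samples=200):
    """Toy stand-in for a Bayesian network: each forward pass draws a
    weight from an assumed posterior, here w ~ N(2.0, 0.1^2)."""
    w = rng.normal(2.0, 0.1, size=n_samples)
    return w * x  # one prediction per posterior sample

x = 1.5
samples = stochastic_forward(x)
pred_mean = samples.mean()      # point prediction (e.g., a brightness temperature)
pred_var = samples.var(ddof=1)  # predictive variance = per-point uncertainty
```

Downstream filtering then amounts to discarding predictions whose `pred_var` exceeds a chosen threshold.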
Abstract
Ships inside the Arctic basin require high-resolution (1–5 km), near-term (days to semimonthly) forecasts for guidance on scales of interest to their operations where forecast model predictions are insufficient due to their coarse spatial and temporal resolutions. Deep learning techniques offer the capability of rapid assimilation and analysis of multiple sources of information for improved forecasting. Data from the National Oceanic and Atmospheric Administration’s Global Forecast System, Multi-scale Ultra-high Resolution Sea Surface Temperature (MEaSUREs), and the National Snow and Ice Data Center’s Multisensor Analyzed Sea ice Extent (MASIE) were used to develop the sea ice extent deep learning forecast model, over the freeze-up periods of 2016, 2018, 2019, and 2020 in the Beaufort Sea. Sea ice extent forecasts were produced for 1–7 days in the future. The approach was novel for sea ice extent forecasting in using forecast data as model input to aid in the prediction of sea ice extent. Model accuracy was assessed against a persistence model. While the average accuracy of the persistence model dropped from 97% to 90% for forecast days 1–7, the deep learning model accuracy dropped only to 93%. A k-fold (fourfold) cross-validation study found that on all except the first day, the deep learning model, which includes a U-Net architecture with an 18-layer residual neural network (Resnet-18) backbone, does better than the persistence model. Skill scores improve with lead time, reaching 0.27. The model demonstrated success in predicting changes in ice extent of significance for navigation in the Amundsen Gulf. Extensions to other Arctic seas, seasons, and sea ice parameters are under development.
Significance Statement
Ships traversing the Arctic require timely, accurate sea ice location information to successfully complete their transits. After testing several potential candidates, we have developed a short-term (7 day) forecast process using existing observations of ice extent and sea surface temperature, with operational forecasts of weather and oceanographic variables, and appropriate machine learning models. The process included using forecasts of atmospheric and oceanographic conditions, as a human forecaster/analyst would. The models were trained for the Beaufort Sea north of Alaska using data and forecasts from 2016 combined with 2018–20. The results showed improvement in short-term forecasts of ice locations over current methods and also demonstrated correctly predicted changes in the sea ice that are important for navigation.
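The skill score quoted in the abstract measures improvement over the persistence baseline, normalized by how much a perfect forecast would improve on it. A short sketch of that standard formula, using the day-7 accuracies reported above (the exact normalization used by the authors is an assumption here):

```python
def skill_score(model_acc, reference_acc, perfect=1.0):
    """Generic skill score: improvement over a reference forecast,
    normalized by the improvement a perfect forecast would give."""
    return (model_acc - reference_acc) / (perfect - reference_acc)

# Day-7 accuracies quoted in the abstract (fraction of grid cells correct):
# deep learning model 93%, persistence 90%
ss = skill_score(0.93, 0.90)  # 0.3, consistent with the reported 0.27
```

A score of 0 means no improvement over persistence; 1 would mean a perfect forecast.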
Abstract
Skillful weather prediction on subseasonal to seasonal time scales is crucial for many socioeconomic ventures. But forecasting, especially extremes, on these time scales is very challenging because the information from initial conditions is gradually lost. Therefore, data-driven methods are discussed as an alternative to numerical weather prediction models. Here, quantile regression forests (QRFs) and random forest classifiers (RFCs) are used for probabilistic forecasting of central European mean wintertime 2-m temperatures and cold wave days at lead times of 14, 21, and 28 days. ERA5 reanalysis meteorological predictors are used as input data for the machine learning models. For the winters of 2000/01–2019/20, the predictions are compared with a climatological ensemble obtained from E-OBS observational data. The evaluation is performed as full distribution predictions for continuous values using the continuous ranked probability skill score and as binary categorical forecasts using the Brier skill score. We find skill at lead times up to 28 days in the 20-winter mean and for individual winters. Case studies show that all used machine learning models are able to learn patterns in the data beyond climatology. A more detailed analysis using Shapley additive explanations suggests that both random forest (RF)-based models are able to learn physically known relationships in the data. This underlines that RF-based data-driven models can be a suitable tool for forecasting central European mean wintertime 2-m temperatures and the occurrence of cold wave days.
Significance Statement
Because of the chaotic nature of weather, it is very complicated to make predictions with traditional numerical methods 2–4 weeks in advance. Therefore, we use alternative, interpretable methods that “learn” to find statistically relevant patterns in meteorological data that can be used for forecasting central European mean surface wintertime temperatures and cold wave days. These methods are part of the so-called machine learning methods that do not rely on the traditional numerical equations anymore. We test our methods for 20 winters between 2000/01 and 2019/20 against a static weather prediction consisting of the past 30 winters. For single winters and in a mean over the 20 predicted winters, we find improved predictions up to 4 weeks in advance.
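The binary cold-wave-day forecasts above are scored with the Brier skill score against a climatological reference. A minimal numpy sketch of that metric with invented illustrative numbers (not data from the study):

```python
import numpy as np

def brier_score(p_forecast, outcome):
    """Mean squared error of probabilistic forecasts against 0/1 outcomes."""
    return np.mean((np.asarray(p_forecast) - np.asarray(outcome)) ** 2)

def brier_skill_score(p_model, p_climatology, outcome):
    """BSS > 0 means the model beats the climatological reference."""
    return 1.0 - brier_score(p_model, outcome) / brier_score(p_climatology, outcome)

# Hypothetical cold-wave-day probabilities vs. observed 0/1 days
obs = [1, 0, 0, 1, 0]
clim = [0.2] * 5                    # static climatological frequency
model = [0.7, 0.1, 0.2, 0.8, 0.1]   # sharper, better-placed probabilities
bss = brier_skill_score(model, clim, obs)  # positive: skill over climatology
```

The continuous ranked probability skill score used for the full-distribution forecasts follows the same "1 minus score ratio" construction.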
Abstract
Emulating numerical weather prediction (NWP) model outputs is important to compute large datasets of weather fields in an efficient way. The purpose of the present paper is to investigate the ability of generative adversarial networks (GANs) to emulate distributions of multivariate outputs (10-m wind and 2-m temperature) of a kilometer-scale NWP model. For that purpose, a residual GAN architecture, regularized with spectral normalization, is trained against a kilometer-scale dataset from the AROME Ensemble Prediction System (AROME-EPS). A wide range of metrics is used for quality assessment, including pixelwise and multiscale Earth-mover distances, spectral analysis, and correlation length scales. The use of wavelet-based scattering coefficients as meaningful metrics is also presented. The GAN generates samples with good distribution recovery and good skill in average spectrum reconstruction. Important local weather patterns are reproduced with a high level of detail, while the joint generation of multivariate samples matches the underlying AROME-EPS distribution. The different metrics introduced describe the GAN’s behavior in a complementary manner, highlighting the need to go beyond spectral analysis in generation quality assessment. An ablation study then shows that removing variables from the generation process is globally beneficial, pointing at the GAN limitations to leverage cross-variable correlations. The role of absolute positional bias in the training process is also characterized, explaining both accelerated learning and quality-diversity trade-off in the multivariate emulation. These results open perspectives about the use of GAN to enrich NWP ensemble approaches, provided that the aforementioned positional bias is properly controlled.
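One of the distribution-recovery metrics named above, the pixelwise Earth-mover distance, has a simple closed form in one dimension: the mean absolute difference between sorted samples. A self-contained sketch with synthetic stand-in data (not AROME-EPS output):

```python
import numpy as np

def emd_1d(a, b):
    """Earth-mover (Wasserstein-1) distance between two equal-size 1-D
    samples: mean absolute difference of the sorted values."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    return np.mean(np.abs(a - b))

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, 10_000)       # stand-in for NWP pixel values
generated = rng.normal(0.2, 1.0, 10_000)  # stand-in for GAN pixel values
d = emd_1d(real, generated)               # close to the mean shift of 0.2
```

Applied per pixel and at multiple spatial scales, this is how distribution mismatch between the emulator and the ensemble is quantified.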
Abstract
Exploring the climate impacts of various anthropogenic emissions scenarios is key to making informed decisions for climate change mitigation and adaptation. State-of-the-art Earth system models can provide detailed insight into these impacts but have a large associated computational cost on a per-scenario basis. This large computational burden has driven recent interest in developing cheap machine learning models for the task of climate model emulation. In this paper, we explore the efficacy of randomly wired neural networks for this task. We describe how they can be constructed and compare them with their standard feedforward counterparts using the ClimateBench dataset. Specifically, we replace the serially connected dense layers in multilayer perceptrons, convolutional neural networks, and convolutional long short-term memory networks with randomly wired dense layers and assess the impact on model performance for models with 1 million and 10 million parameters. We find that models with less-complex architectures see the greatest performance improvement with the addition of random wiring (up to 30.4% for multilayer perceptrons). Furthermore, of 24 different model architecture, parameter count, and prediction task combinations, only one had a statistically significant performance deficit in randomly wired networks relative to their standard counterparts, with 14 cases showing statistically significant improvement. We also find no significant difference in prediction speed between networks with standard feedforward dense layers and those with randomly wired layers. These findings indicate that randomly wired neural networks may be suitable direct replacements for traditional dense layers in many standard models.
Significance Statement
Modeling various greenhouse gas and aerosol emissions scenarios is important for both understanding climate change and making informed political and economic decisions. However, accomplishing this with large Earth system models is a complex and computationally expensive task. As such, data-driven machine learning models have risen in prevalence as cheap emulators of Earth system models. In this work, we explore a special type of machine learning model called randomly wired neural networks and find that they perform competitively for the task of climate model emulation. This indicates that future machine learning models for emulation may significantly benefit from using randomly wired neural networks as opposed to their more-standard counterparts.
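Randomly wired networks replace a serial stack of dense layers with a random directed acyclic graph of layers, where each node aggregates the activations of its randomly chosen predecessors. The following is an illustrative numpy sketch of that construction (a toy forward pass with untrained weights, not the ClimateBench models):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_wiring(n_nodes, p=0.5):
    """Random DAG: an edge i -> j (i < j) exists with probability p;
    each node also keeps a link from its predecessor so no node is orphaned."""
    edges = {j: [j - 1] for j in range(1, n_nodes)}
    for j in range(2, n_nodes):
        for i in range(j - 1):
            if rng.random() < p:
                edges[j].append(i)
    return edges

def forward(x, edges, n_nodes, width=8):
    """Each node sums its incoming activations, then applies a dense + ReLU
    layer (random weights here; trained weights in a real model)."""
    acts = [np.maximum(x @ rng.normal(0, 0.1, (x.shape[-1], width)), 0)]
    for j in range(1, n_nodes):
        agg = sum(acts[i] for i in edges[j])
        acts.append(np.maximum(agg @ rng.normal(0, 0.1, (width, width)), 0))
    return acts[-1]

out = forward(np.ones((4, 3)), random_wiring(5), 5)  # batch of 4 inputs
```

Because edges only run from lower- to higher-numbered nodes, the graph is acyclic by construction and a single left-to-right pass suffices.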
Abstract
Physics-based simulations of Arctic sea ice are highly complex, involving transport between different phases, length scales, and time scales. Consequently, numerical simulations of sea ice dynamics have a high computational cost and model uncertainty. We employ data-driven machine learning (ML) to make predictions of sea ice motion. The ML models are built to predict present-day sea ice velocity given present-day wind velocity and previous-day sea ice concentration and velocity. Models are trained using reanalysis winds and satellite-derived sea ice properties. We compare the predictions of three different models: persistence (PS), linear regression (LR), and a convolutional neural network (CNN). We quantify the spatiotemporal variability of the correlation between observations and the statistical model predictions. Additionally, we analyze model performance in comparison to variability in properties related to ice motion (wind velocity, ice velocity, ice concentration, distance from coast, bathymetric depth) to understand the processes related to decreases in model performance. Results indicate that a CNN makes skillful predictions of daily sea ice velocity with a correlation up to 0.81 between predicted and observed sea ice velocity, while the LR and PS implementations exhibit correlations of 0.78 and 0.69, respectively. The correlation varies spatially and seasonally: lower values occur in shallow coastal regions and during times of minimum sea ice extent. LR parameter analysis indicates that wind velocity plays the largest role in predicting sea ice velocity on 1-day time scales, particularly in the central Arctic. Regions where wind velocity has the largest LR parameter are regions where the CNN has higher predictive skill than the LR.
Significance Statement
We build and evaluate different machine learning (ML) models that make 1-day predictions of Arctic sea ice velocity using present-day wind velocity and previous-day ice concentration and ice velocity. We find that models that incorporate nonlinear relationships between inputs (a neural network) capture important information (i.e., have a higher correlation between observations and predictions than do linear and persistence models). This performance enhancement occurs primarily in deeper regions of the central Arctic where wind speed is the dominant predictor of ice motion. Understanding where these models benefit from increased complexity is important because future work will use ML to elucidate physically meaningful relationships within the data, looking at how the relationship between wind and ice velocity is changing as the ice melts.
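The linear-regression baseline described above amounts to a least-squares fit of ice velocity against its predictors, scored by the correlation between predictions and observations. A self-contained sketch on synthetic data, using the rough "ice drifts at about 2% of wind speed" free-drift rule of thumb purely as an illustrative data generator (not the study's data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in: ice velocity ~ 2% of wind velocity plus noise
wind = rng.normal(0, 8, 500)                  # m/s, one component
ice = 0.02 * wind + rng.normal(0, 0.02, 500)  # m/s

# Least-squares fit of the LR baseline and its prediction correlation
A = np.column_stack([wind, np.ones_like(wind)])
coef, *_ = np.linalg.lstsq(A, ice, rcond=None)
pred = A @ coef
corr = np.corrcoef(pred, ice)[0, 1]
```

The size of the fitted wind coefficient plays the role of the "LR parameter" analyzed in the abstract; the CNN differs by learning spatial, nonlinear versions of this relationship.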
Abstract
Digital twins are a transformative technology that can significantly strengthen climate adaptation and mitigation decision-making. By providing dynamic, virtual representations of physical systems that make intelligent use of multidisciplinary data and high-fidelity simulations, they equip decision-makers with the information they need, when they need it, marking a step change in how we extract value from data and models. While digital twins are commonplace in some industrial sectors, they are an emerging concept in the environmental sciences and practical demonstrations are limited, partly due to the challenges of representing complex environmental systems. Collaboration on challenges of mutual interest will unlock digital twins’ potential. To bridge the current gap between digital twins for industrial sectors and those of the environment, we identify the need for “environment aware” digital twins (EA-DT) that are a federation of digital twins of environmentally sensitive systems with weather, climate, and environmental information systems. As weather extremes become more frequent and severe, the importance of building weather, climate, and environmental information into digital twins of critical systems such as cities, ports, flood barriers, energy grids, and transport networks increases. Delivering societal benefits will also require significant advances in climate-related decision-making, which lags behind other applications. Progress relies on moving beyond heuristics, and driving advances in the decision sciences informed by new theoretical insights, machine learning, and artificial intelligence. To support the use of EA-DTs, we propose a new ontology that stimulates thinking about application and best practice for decision-making so that we are resilient to the challenges of today’s weather and tomorrow’s climate.
Abstract
In November 2021, the Artificial Intelligence for Earth System Predictability (AI4ESP) workshop was held, which involved hundreds of researchers from dozens of institutions. There were 17 sessions held at the workshop, including one on ecohydrology. The ecohydrology session included various breakout rooms that addressed specific topics, including 1) soils and belowground areas; 2) watersheds; 3) hydrology; 4) ecophysiology and plant hydraulics; 5) ecology; 6) extremes, disturbance and fire, and land-use and land-cover change; and 7) uncertainty quantification methods and techniques. In this paper, we investigate and report on the potential application of artificial intelligence and machine learning in ecohydrology, highlight outcomes of the ecohydrology session at the AI4ESP workshop, and provide visionary perspectives for future research in this area.
Abstract
Machine learning algorithms are able to capture complex, nonlinear, interacting relationships and are increasingly used to predict agricultural yield variability at regional and national scales. Applying explainable artificial intelligence (XAI) methods to such algorithms may enable better scientific understanding of drivers of yield variability. However, XAI methods may provide misleading results when applied to spatiotemporally correlated datasets. In this study, machine learning models are trained to predict simulated crop yield from climate indices, and the impact of cross-validation strategy on the interpretation and performance of the resulting models is assessed. Using data from a process-based crop model allows us to then comment on the plausibility of the “explanations” provided by XAI methods. Our results show that the choice of evaluation strategy has an impact on (i) interpretations of the model and (ii) model skill on held-out years and regions, after the evaluation strategy is used for hyperparameter tuning and feature selection. We find that use of a cross-validation strategy based on clustering in feature space achieves the most plausible interpretations as well as the best model performance on held-out years and regions. Our results provide the first steps toward identifying domain-specific “best practices” for the use of XAI tools on spatiotemporal agricultural or climatic data.
Significance Statement
“Explainable” or “interpretable” machine learning (XAI) methods have been increasingly used in scientific research to study complex relationships between climatic and biogeoscientific variables (such as crop yield). However, these methods can return contradictory, implausible, or ambiguous results. In this study, we train machine learning models to predict maize yield anomalies and vary the model evaluation method used. We find that the evaluation (cross validation) method used has an effect on model interpretation results and on the skill of resulting models in held-out years and regions. These results have implications for the methodological design of studies that aim to use XAI tools to identify drivers of, for example, crop yield variability.
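The cross-validation strategy highlighted in the abstract above, holding out clusters of samples that are similar in feature space rather than random samples, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the synthetic data, the use of KMeans for clustering, and the choice of a random-forest model are all assumptions made for the sketch.

```python
# Illustrative sketch of cross-validation based on clustering in feature space:
# cluster the samples by their (climate-index) features, then hold out one
# whole cluster per fold. Data and model choices here are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                         # stand-in climate indices
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)   # stand-in yield anomaly

# Assign each sample to a feature-space cluster; clusters become CV groups,
# so similar (correlated) samples cannot straddle the train/test split.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

scores = []
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=clusters):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # R^2 per held-out cluster
```

Compared with random K-fold splits, this grouping prevents near-duplicate neighboring samples from appearing in both training and test sets, which is the leakage mechanism the abstract warns about for spatiotemporally correlated data.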