Abstract
Global deep learning weather prediction models have recently been shown to produce forecasts that rival those from physics-based models run at operational centers. It is unclear whether these models have encoded atmospheric dynamics or merely perform pattern matching that produces the smallest forecast error. Answering this question is crucial to establishing the utility of these models as tools for basic science. Here, we subject one such model, Pangu-Weather, to a set of four classical dynamical experiments that do not resemble the model training data. Localized perturbations to the model output and the initial conditions are added to steady time-averaged conditions to assess the propagation speed and structural evolution of signals away from the local source. Perturbing the model physics by adding a steady tropical heat source results in a classical Matsuno–Gill response near the heating and planetary waves that radiate into the extratropics. A localized disturbance on the winter-averaged North Pacific jet stream produces realistic extratropical cyclones and fronts, including the spontaneous emergence of polar lows. Perturbing the 500-hPa height field alone yields adjustment from a state of rest to one of wind–pressure balance over ∼6 h. Localized subtropical low pressure systems produce Atlantic hurricanes, provided the initial amplitude exceeds about 4 hPa, and setting the initial humidity to zero eliminates hurricane development. We conclude that the model encodes realistic physics in all experiments and suggest that it can be used as a tool for rapidly testing a wide range of hypotheses.
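The perturbation experiments described above can be illustrated with a short sketch. The snippet below is a minimal example and not the authors' code: it adds a localized Gaussian bump to the 500-hPa geopotential height of a steady time-averaged initial state and steps a forward model; `load_time_mean_state`, `pangu_step`, and `diagnose_response` are hypothetical placeholders for the data loading, the model's 6-h forward operator, and the analysis of the response.

```python
# Minimal sketch of one perturbation experiment (placeholders, not the paper's code).
import numpy as np

def gaussian_bump(lats, lons, lat0, lon0, amp, width_deg):
    """Localized perturbation (e.g., geopotential height in meters)."""
    la, lo = np.meshgrid(lats, lons, indexing="ij")
    r2 = (la - lat0) ** 2 + (lo - lon0) ** 2
    return amp * np.exp(-r2 / (2.0 * width_deg ** 2))

lats = np.linspace(90, -90, 721)           # 0.25-degree global grid
lons = np.linspace(0, 359.75, 1440)

state = load_time_mean_state()             # hypothetical: steady climatological fields
state["z500"] += gaussian_bump(lats, lons, lat0=40.0, lon0=180.0,
                               amp=100.0, width_deg=5.0)   # illustrative amplitude

for step in range(8):                      # 48 h of 6-h steps
    state = pangu_step(state)              # hypothetical wrapper around the model
    diagnose_response(state, step)         # e.g., track how the signal spreads
```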
Abstract
Increases in wildfire activity and the resulting impacts have prompted the development of high-resolution wildfire behavior models for forecasting fire spread. Recent progress in using satellites to detect fire locations further provides the opportunity to use measurements toward improving fire spread forecasts from numerical models through data assimilation. This work develops a physics-informed approach for inferring the history of a wildfire from satellite measurements, providing the necessary information to initialize coupled atmosphere–wildfire models from a measured wildfire state. The fire arrival time, which is the time the fire reaches a given spatial location, acts as a succinct representation of the history of a wildfire. In this work, a conditional Wasserstein generative adversarial network (cWGAN), trained with WRF–SFIRE simulations, is used to infer the fire arrival time from satellite active fire data. The cWGAN is used to produce samples of likely fire arrival times from the conditional distribution of arrival times given satellite active fire detections. Samples produced by the cWGAN are further used to assess the uncertainty of predictions. The cWGAN is tested on four California wildfires occurring between 2020 and 2022, and predictions for fire extent are compared against high-resolution airborne infrared measurements. Further, the predicted ignition times are compared with reported ignition times. An average Sørensen’s coefficient of 0.81 for the fire perimeters and an average ignition time difference of 32 min suggest that the method is highly accurate.
Significance Statement
To initialize coupled atmosphere–wildfire simulations in a physically consistent way from satellite measurements of active fire locations, it is critical that the states of the fire and the atmosphere are consistent with each other at the start of the forecast. If known, the history of a wildfire may be used to develop an atmospheric state matching the wildfire state determined from satellite data, in a process known as spinup. In this paper, we present a novel method for inferring the early-stage history of a wildfire from satellite active fire measurements. Here, inference of the fire history is performed in a probabilistic sense, and physics is further incorporated through the use of training data derived from a coupled atmosphere–wildfire model.
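As a concrete illustration of the sampling step described above, the sketch below draws repeated samples from a trained conditional generator and summarizes the sample mean and spread of the inferred fire arrival times. The interface `G(z, c)`, with `c` a gridded satellite active-fire detection map, is an assumption for illustration, not the paper's code.

```python
# Sketch: sample fire arrival times from a trained cWGAN generator and
# summarize prediction uncertainty. Shapes and the generator interface are assumed.
import torch

def sample_arrival_times(G, detections, n_samples=200, latent_dim=64):
    """detections: tensor of shape (1, 1, H, W); returns mean and std maps."""
    G.eval()
    samples = []
    with torch.no_grad():
        for _ in range(n_samples):
            z = torch.randn(1, latent_dim)          # latent draw
            samples.append(G(z, detections))        # arrival-time map, (1, 1, H, W)
    stack = torch.cat(samples, dim=0)               # (n_samples, 1, H, W)
    return stack.mean(dim=0), stack.std(dim=0)      # point estimate + uncertainty
```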
Abstract
The prediction of tropical rain rates from atmospheric profiles poses significant challenges, mainly due to the heavy-tailed distribution exhibited by tropical rainfall. This study introduces overparameterized neural networks not only to forecast tropical rain rates but also to explain their heavy-tailed distribution. The investigation is separately conducted for three rain types (stratiform, deep convective, and shallow convective) observed by the Global Precipitation Measurement satellite radar over the west and east Pacific regions. Atmospheric profiles of humidity, temperature, and zonal and meridional winds from the MERRA-2 reanalysis are considered as features. Although overparameterized neural networks are well known for their “double descent phenomenon,” little has been explored about their applicability to climate data and capability of capturing the tail behavior of data. In our results, overparameterized neural networks accurately estimate the rain-rate distributions and outperform other machine learning methods. Spatial maps show that overparameterized neural networks also successfully describe the spatial patterns of each rain type across the tropical Pacific. In addition, we assess the feature importance for each overparameterized neural network to provide insight into the key factors driving the predictions, with low-level humidity and temperature variables being the overall most important. These findings highlight the capability of overparameterized neural networks in predicting the distribution of the rain rate and explaining extreme values.
Significance Statement
This study aims to introduce the capability of overparameterized neural networks, a type of neural network with more parameters than data points, in predicting the distribution of tropical rain rates from gridscale environmental variables and explaining their tail behavior. Rainfall prediction has been a topic of importance, yet it remains a challenging problem for its heavy-tailed nature. Overparameterized neural networks correctly captured rain-rate distributions and the spatial patterns and heterogeneity of the observed rain rates for multiple rain types, which could not be achieved by any other previous statistical or machine learning frameworks. We find that overparameterized neural networks can play a key role in general prediction tasks, with potential expanded applicability to other domains with heavy-tailed data distribution.
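To make the notion of overparameterization concrete, the toy sketch below builds an MLP whose parameter count far exceeds the number of training samples; the sample count, feature count, and layer sizes are illustrative assumptions, not the study's configuration.

```python
# Toy illustration of an overparameterized regression network (sizes are assumed).
import torch.nn as nn

n_samples, n_features = 5_000, 120            # e.g., stacked vertical profiles
model = nn.Sequential(
    nn.Linear(n_features, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 1),                       # predicted rain rate
)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters vs {n_samples:,} samples "
      f"(overparameterized: {n_params > n_samples})")
```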
Abstract
Weather predictions two to four weeks in advance, called the subseasonal timescale, are highly relevant for socio-economic decision makers. Unfortunately, the skill of numerical weather prediction models at this timescale is generally low. Here, we use probabilistic Random Forest (RF)-based machine learning models to postprocess the Sub-seasonal to Seasonal (S2S) reforecasts of the European Centre for Medium-Range Weather Forecasts (ECMWF). We show that these models slightly improve the forecasts of wintertime Central European mean 2-meter temperature in a 20-winter mean at lead times of 14, 21, and 28 days, compared both to ECMWF's S2S reforecasts with a lead-time-dependent mean bias correction and to RF-based models using only reanalysis data as input. Predictions of the occurrence of cold wave days are improved at lead times of 21 and 28 days. Forecasts of continuous temperatures show better skill than forecasts of the binary occurrence of cold wave days. Furthermore, we analyze whether the skill depends on the large-scale flow configuration of the atmosphere at initialization, as represented by Weather Regimes (WR). We find that the WR at the start of the forecast influences the skill and its evolution across lead times. These results can be used to assess the conditional improvement of forecasts initialized during one WR in comparison to forecasts initialized during another WR.
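One common way to obtain a probabilistic prediction from a random forest, sketched below, is to treat the individual trees' predictions as an ensemble and derive quantiles and cold-wave-day probabilities from them. The variable names (`X_train`, `t2m_train`, `coldwave_threshold`) are placeholders, and whether this matches the study's exact probabilistic formulation is not stated in the abstract.

```python
# Sketch: per-tree predictions of a random forest used as an ensemble distribution.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, n_jobs=-1)
rf.fit(X_train, t2m_train)                    # X: S2S-reforecast-based predictors (placeholder)

tree_preds = np.stack([t.predict(X_test) for t in rf.estimators_], axis=0)
t2m_median = np.median(tree_preds, axis=0)
t2m_q10, t2m_q90 = np.percentile(tree_preds, [10, 90], axis=0)
p_coldwave = (tree_preds < coldwave_threshold).mean(axis=0)   # probability of a cold wave day
```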
Abstract
We developed and applied a machine-learned discretization for one-dimensional (1D) horizontal passive scalar advection, which is an operator component common to all chemical transport models (CTMs). Our learned advection scheme resembles a second-order accurate, three-stencil numerical solver but differs from a traditional solver in that coefficients for each equation term are output by a neural network rather than being theoretically derived constants. We subsampled higher-resolution simulation results—resulting in up to 16× larger grid size and 64× larger time step—and trained our neural-network-based scheme to match the subsampled integration data. In this way, we created an operator that has low resolution (in time or space) but can reproduce the behavior of a high-resolution traditional solver. Our model shows high fidelity in reproducing its training dataset (a single 10-day 1D simulation) and is similarly accurate in simulations with unseen initial conditions, wind fields, and grid spacing. In many cases, our learned solver is more accurate than a low-resolution version of the reference solver, but the low-resolution reference solver achieves greater computational speedup (500× acceleration) over the high-resolution simulation than the learned solver is able to (18× acceleration). Surprisingly, our learned 1D scheme—when combined with a splitting technique—can be used to predict 2D advection and is in some cases more stable and accurate than the low-resolution reference solver in 2D. Overall, our results suggest that learned advection operators may offer a higher-accuracy method for accelerating CTM simulations as compared to simply running a traditional integrator at low resolution.
Significance Statement
Chemical transport modeling (CTM) is an essential tool for studying air pollution, but CTM simulations require long computing times. Modeling pollutant transport (advection) is the second most computationally intensive part of the model. Decreasing the resolution reduces the advection computing time but also decreases accuracy. We employed machine learning to reduce the resolution of advection while maintaining accuracy, and we verified the robustness of our solver in several generalization testing scenarios. In our 2D simulations, our solver ran up to 100 times faster with fair accuracy. Integrating our approach into existing CTMs will allow broadened participation in the study of air pollution and related solutions.
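The idea of a neural-network-parameterized stencil can be sketched as follows. This mirrors the description above (coefficients output by a small network instead of fixed constants), but the choice of inputs and the exact form of the update term are assumptions, not the authors' implementation.

```python
# Conceptual sketch of a learned three-point advection stencil on a periodic 1D grid.
import torch
import torch.nn as nn

coeff_net = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))

def learned_advection_step(c, u, dt, dx):
    """c, u: 1D tensors of scalar concentration and wind; returns c after one coarse step."""
    c_l, c_r = torch.roll(c, 1), torch.roll(c, -1)
    u_l, u_r = torch.roll(u, 1), torch.roll(u, -1)
    features = torch.stack([c_l, c, c_r, u_l, u, u_r], dim=-1)   # (N, 6) local inputs
    a = coeff_net(features)                                      # (N, 3) learned coefficients
    combo = a[:, 0] * c_l + a[:, 1] * c + a[:, 2] * c_r          # learned stencil combination
    return c - dt / dx * u * combo                               # illustrative update form
```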
Abstract
Forecasters routinely calibrate their confidence in model forecasts. Ensembles inherently estimate forecast confidence but are often underdispersive, and ensemble spread does not strongly correlate with ensemble-mean error. The misalignment between ensemble spread and skill motivates new methods for “forecasting forecast skill” so that forecasters can better utilize ensemble guidance. We have trained logistic regression and random forest models to predict the skill of composite reflectivity forecasts from the NSSL Warn-on-Forecast System (WoFS), a 3-km ensemble that generates rapidly updating forecast guidance for 0–6-h lead times. The forecast skill predictions are valid at 1-, 2-, or 3-h lead times within localized regions determined by the observed storm locations at analysis time. We use WoFS analysis and forecast output and NSSL Multi-Radar/Multi-Sensor composite reflectivity for 106 cases from the 2017 to 2021 NOAA Hazardous Weather Testbed Spring Forecasting Experiments. We frame the prediction task as a multiclassification problem, where the forecast skill labels are determined by averaging the extended fraction skill scores (eFSSs) for several reflectivity thresholds and verification neighborhoods and then converting to one of three classes based on where the average eFSS ranks within the entire dataset: POOR (bottom 20%), FAIR (middle 60%), or GOOD (top 20%). Initial machine learning (ML) models are trained on 323 predictors; reducing to 10 or 15 predictors in the final models only modestly reduces skill. The final models substantially outperform carefully developed persistence- and spread-based models and are reasonably explainable. The results suggest that ML can be a valuable tool for guiding user confidence in convection-allowing (and larger-scale) ensemble forecasts.
Significance Statement
Some numerical weather prediction (NWP) forecasts are more likely to verify than others. Forecasters often recognize situations where NWP output should be trusted more or less than usual, but objective methods for “forecasting forecast skill” are notably lacking for thunderstorm-scale models. Better estimates of forecast skill can benefit society through more accurate forecasts of high-impact weather. Machine learning (ML) provides a powerful framework for relating forecast skill to the characteristics of model forecasts and available observations over many previous cases. ML models can leverage these relationships to predict forecast skill for new cases in real time. We demonstrate the effectiveness of this approach to forecasting forecast skill using a cutting-edge thunderstorm prediction system and logistic regression and random forest models. Based on this success, we recommend the adoption of similar ML-based methods for other prediction models.
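The labeling and classification setup described above translates almost directly into code: averaged eFSS values are converted to three classes by their rank within the dataset (bottom 20% POOR, middle 60% FAIR, top 20% GOOD), and a classifier is trained on forecast and observation predictors. In the sketch below, `mean_efss`, `X_predictors`, and `X_new` are placeholder arrays and the hyperparameters are assumed.

```python
# Sketch: percentile-based skill labels and a random forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

p20, p80 = np.percentile(mean_efss, [20, 80])
labels = np.where(mean_efss < p20, "POOR",
         np.where(mean_efss > p80, "GOOD", "FAIR"))

clf = RandomForestClassifier(n_estimators=300, class_weight="balanced")
clf.fit(X_predictors, labels)                 # e.g., the 10-15 selected predictors
class_probs = clf.predict_proba(X_new)        # forecast-skill class probabilities
```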
Abstract
Precipitation values produced by climate models are biased due to the parameterization of physical processes and limited spatial resolution. Current bias-correction approaches usually focus on correcting lower-order statistics (mean and standard deviation), which makes it difficult to capture precipitation extremes. However, accurate modeling of extremes is critical for policymaking to mitigate and adapt to the effects of climate change. We develop a deep learning framework, leveraging information from key dynamical variables impacting precipitation, to also match higher-order statistics (skewness and kurtosis) for the entire precipitation distribution, including extremes. The deep learning framework consists of a two-part architecture: a U-Net convolutional network to capture the spatiotemporal distribution of precipitation and a fully connected network to capture the distribution of higher-order statistics. The joint network, termed UFNet, can simultaneously improve the spatial structure of the modeled precipitation and capture the distribution of extreme precipitation values. Using climate model simulation data and observations that are climatologically similar but not strictly paired, the UFNet identifies and corrects the climate model biases, significantly improving the estimation of daily precipitation as measured by a broad range of spatiotemporal statistics. In particular, UFNet significantly reduces the underestimation of extreme precipitation values seen with current bias-correction methods. Our approach constitutes a generalized framework for correcting other climate model variables, improving the accuracy of climate model predictions while using a simpler and more stable training process.
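As an illustration of constraining higher-order statistics, the sketch below defines a moment-matching penalty over the mean, standard deviation, skewness, and kurtosis of a precipitation field. The actual loss used by UFNet is not specified in the abstract, so this is only an assumed form of such a constraint.

```python
# Sketch: a moment-matching penalty for higher-order statistics (assumed form).
import torch

def standardized_moments(x, eps=1e-8):
    """Mean, std, skewness, and kurtosis of a flattened precipitation field."""
    mu, sigma = x.mean(), x.std() + eps
    z = (x - mu) / sigma
    return torch.stack([mu, sigma, (z ** 3).mean(), (z ** 4).mean()])

def moment_matching_loss(corrected, observed):
    """Penalize mismatch in the first four moments between corrected and observed fields."""
    return torch.sum((standardized_moments(corrected) - standardized_moments(observed)) ** 2)
```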
Abstract
Seasonal hypoxia is a recurring threat to ecosystems and fisheries in the Chesapeake Bay. Hypoxia forecasting based on coupled hydrodynamic and biogeochemical models has proven useful for many stakeholders, as these models excel in accounting for the effects of physical forcing on oxygen supply, but may fall short in replicating the more complex biogeochemical processes that govern oxygen consumption. Satellite-derived reflectances could be used to indicate the presence of surface organic matter over the Bay. However, teasing apart the contribution of atmospheric and aquatic constituents from the signal received by the satellite is not straightforward. As a result, it is difficult to derive surface concentrations of organic matter from satellite data in a robust fashion. A potential solution to this complexity is to use deep learning to build end-to-end applications that do not require precise accounting of the satellite signal from the atmosphere or water, phytoplankton blooms, or sediment plumes. By training a deep neural network with data from a vast suite of variables that could potentially affect oxygen in the water column, improvement of short-term (daily) hypoxia forecasts may be possible. Here, we predict oxygen concentrations using inputs that account for both physical and biogeochemical factors. The physical inputs include wind velocity reanalysis information, together with 3D outputs from an estuarine hydrodynamic model, including current velocity, water temperature, and salinity. Satellite-derived spectral reflectance data are used as a surrogate for the biogeochemical factors. These input fields are time series of weekly statistics calculated from daily information, starting 8 weeks before each oxygen observation was collected. To accommodate this input data structure, we adopted a model architecture of long short-term memory networks with eight time steps. At each time step, a set of convolutional neural networks are used to extract information from the inputs. Ablation and cross-validation tests suggest that among all input features, the strongest predictor is the 3D temperature field, with which the new model can outperform the state-of-the-art by ∼20% in terms of median absolute error. Our approach represents a novel application of deep learning to address a complex water management challenge.
Significance Statement
This study presents a novel approach that combines deep learning and hydrodynamic model outputs to improve the accuracy of hypoxia forecasts in the Chesapeake Bay. By training a deep neural network with both physical and biogeochemical information as input features, the model accurately predicts oxygen concentration at any depth in the water column 1 day in advance. This approach has the potential to benefit stakeholders and inform adaptation measures during the recurring threat of hypoxia in the Chesapeake Bay. The success of this study suggests the potential for similar applications of deep learning to address complex water management challenges. Further research could investigate the application of this approach to different forecast lead times and other regions and ecosystem types.
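The architecture described above, a CNN applied at each of eight weekly time steps whose features feed an LSTM, can be sketched as follows. The channel counts, pooling, and layer sizes are assumptions for illustration, not the study's exact configuration.

```python
# Sketch: CNN encoder per weekly time step feeding an LSTM (assumed shapes).
import torch
import torch.nn as nn

class HypoxiaNet(nn.Module):
    def __init__(self, in_channels=8, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),          # -> 16 * 4 * 4 features per step
        )
        self.lstm = nn.LSTM(input_size=16 * 4 * 4, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                    # dissolved oxygen concentration

    def forward(self, x):                                   # x: (batch, 8 weeks, channels, H, W)
        feats = torch.stack([self.encoder(x[:, t]) for t in range(x.shape[1])], dim=1)
        out, _ = self.lstm(feats)                           # sequence of weekly features
        return self.head(out[:, -1])                        # prediction from the last step
```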
Abstract
Explainable artificial intelligence (XAI) methods shed light on the predictions of machine learning algorithms. Several different approaches exist and have already been applied in climate science. However, the usual absence of ground-truth explanations complicates their evaluation and comparison, subsequently impeding the choice of an XAI method. Therefore, in this work, we introduce XAI evaluation in the climate context and discuss different desired explanation properties, namely, robustness, faithfulness, randomization, complexity, and localization. To this end, we chose previous work as a case study in which the decade to which annual-mean temperature maps belong is predicted. After training both a multilayer perceptron (MLP) and a convolutional neural network (CNN), multiple XAI methods are applied and their skill scores in reference to a random uniform explanation are calculated for each property. Independent of the network, we find that XAI methods such as Integrated Gradients, layerwise relevance propagation, and input times gradients exhibit considerable robustness, faithfulness, and complexity while sacrificing randomization performance. Sensitivity methods, i.e., gradient, SmoothGrad, NoiseGrad, and FusionGrad, match the robustness skill but sacrifice faithfulness and complexity for the randomization skill. We find architecture-dependent performance differences regarding the robustness, complexity, and localization skills of different XAI methods, highlighting the necessity for research-task-specific evaluation. Overall, our work offers an overview of different evaluation properties in the climate science context and shows how to compare and benchmark different explanation methods, assessing their suitability based on strengths and weaknesses for the specific research problem at hand. By that, we aim to support climate researchers in the selection of a suitable XAI method.
Significance Statement
Explainable artificial intelligence (XAI) helps to understand the reasoning behind the prediction of a neural network. XAI methods have been applied in climate science to validate networks and provide new insight into physical processes. However, the increasing number of XAI methods can overwhelm practitioners, making it difficult to choose an explanation method. Since XAI methods’ results can vary, uninformed choices might cause misleading conclusions about the network decision. In this work, we introduce XAI evaluation to compare and assess the performance of explanation methods based on five desirable properties. We demonstrate that XAI evaluation reveals the strengths and weaknesses of different XAI methods. Thus, our work provides climate researchers with the tools to compare, analyze, and subsequently choose explanation methods.
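The skill-score construction mentioned above can be expressed compactly: an evaluation metric computed for an XAI method is referenced against the same metric computed for a random uniform explanation. The exact normalization used in the study is not given here, so the sketch below is an assumed form.

```python
# Sketch: skill score of an XAI property metric relative to a random uniform baseline.
def skill_score(metric_method, metric_random, metric_perfect=1.0):
    """1 = perfect; 0 = no better than a random uniform explanation (assumed form)."""
    return (metric_method - metric_random) / (metric_perfect - metric_random)

# Example: a faithfulness value of 0.7 for an XAI method vs 0.4 for the random baseline.
print(skill_score(0.7, 0.4))   # -> 0.5
```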