Abstract
Machine learning algorithms are able to capture complex, nonlinear, interacting relationships and are increasingly used to predict agricultural yield variability at regional and national scales. Using explainable artificial intelligence (XAI) methods applied to such algorithms may enable better scientific understanding of drivers of yield variability. However, XAI methods may provide misleading results when applied to spatiotemporally correlated datasets. In this study, machine learning models are trained to predict simulated crop yield from climate indices, and the impact of cross-validation strategy on the interpretation and performance of the resulting models is assessed. Using data from a process-based crop model allows us to then comment on the plausibility of the “explanations” provided by XAI methods. Our results show that the choice of evaluation strategy has an impact on (i) interpretations of the model and (ii) model skill on held-out years and regions, after the evaluation strategy is used for hyperparameter tuning and feature selection. We find that use of a cross-validation strategy based on clustering in feature space achieves the most plausible interpretations as well as the best model performance on held-out years and regions. Our results provide the first steps toward identifying domain-specific “best practices” for the use of XAI tools on spatiotemporal agricultural or climatic data.
Significance Statement
“Explainable” or “interpretable” machine learning (XAI) methods have been increasingly used in scientific research to study complex relationships between climatic and biogeoscientific variables (such as crop yield). However, these methods can return contradictory, implausible, or ambiguous results. In this study, we train machine learning models to predict maize yield anomalies and vary the model evaluation method used. We find that the evaluation (cross validation) method used has an effect on model interpretation results and on the skill of resulting models in held-out years and regions. These results have implications for the methodological design of studies that aim to use XAI tools to identify drivers of, for example, crop yield variability.
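As a rough illustration of the cross-validation strategy based on clustering in feature space mentioned in the abstract above, the following minimal sketch groups samples by similarity of their features and then holds out one cluster per fold. The choice of k-means, the deterministic initialization, and the toy feature values are assumptions made for illustration; the abstract does not specify the clustering algorithm or the features.

```python
def kmeans(points, k, n_iter=20):
    """Tiny k-means on a list of feature tuples; returns one cluster label per point."""
    # Deterministic initialization: take k evenly spaced points as starting centers.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_folds(labels):
    """Yield (train_idx, test_idx) pairs, holding out one feature-space cluster at a time."""
    for held_out in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == held_out]
        train = [i for i, lab in enumerate(labels) if lab != held_out]
        yield train, test


# Hypothetical two-dimensional climate-index features for six samples.
features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = kmeans(features, k=2)
folds = list(cluster_folds(labels))
```

Because each test fold contains only samples that are dissimilar in feature space from the training samples, the validation score is less likely to be inflated by near-duplicate neighbors than under a purely random split.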
Abstract
In this study, an effective method of estimating the volume transport of the Kuroshio Extension (KE) is proposed using surface geostrophic flow inferred from satellite altimetry and vertical stratification derived from climatological temperature/salinity (T/S) profiles. Based on velocity measurements by a subsurface mooring array across the KE, we find that the vertical structure of horizontal flow in this region is dominated by the barotropic and first baroclinic normal modes, and that this structure is also well described by the leading mode of empirical orthogonal functions (EOFs) of the observed velocity profiles. Further analysis demonstrates that the projection coefficient of moored velocity onto the superimposed vertical normal mode can be represented by the surface geostrophic velocity derived from satellite altimetry. Given this relationship, we propose a dynamical method to estimate the volume transport across the KE jet, which is well verified against both ocean reanalysis and repeated hydrographic data. This finding implies that, in regions where the currents exhibit a quasi-barotropic structure, satellite altimetry observations and climatological T/S profiles are sufficient to estimate the volume transport across any section.
Significance Statement
The Kuroshio Extension (KE) plays an important role in the midlatitude North Pacific climate system. To better understand the KE dynamics and their influences, it is very important to estimate the KE transport. However, direct observation is very difficult in this area. Combining a subsurface mooring array and climatological temperature/salinity data, the vertical structure of the KE is explored in this study using mode decomposition methods. The relationship between the vertical structure of the zonal velocity and surface geostrophic flow observed by satellite altimetry in the KE region is further investigated. Based on this relationship, the KE transport can be well estimated by using satellite altimetry observations and historical hydrographic observations.
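The idea of estimating transport from surface velocity and a fixed vertical structure can be sketched as below. This is a simplified illustration, not the authors' method: it assumes the velocity on the section is separable as u(y, z) = u_s(y) F(z), with F normalized to 1 at the surface, and it uses a rectangle-rule integral. The function name, the profile values, and the grid spacings are all hypothetical.

```python
def volume_transport(surface_velocity, mode_profile, dz, dy):
    """Integrate u(y, z) = u_s(y) * F(z) over a section; returns m^3 s^-1.

    surface_velocity: cross-section surface speeds u_s (m s^-1), one per horizontal cell
    mode_profile:     vertical structure F(z), dimensionless, one per depth layer
    dz, dy:           layer thickness and horizontal cell width (m)
    """
    # Depth integral of the vertical structure function, in metres.
    mode_integral = sum(mode_profile) * dz
    # Along-section integral of surface velocity, in m^2 s^-1.
    surface_integral = sum(surface_velocity) * dy
    return mode_integral * surface_integral


# Hypothetical inputs: three 50 km cells across a jet, four 500 m layers.
u_surface = [0.5, 1.0, 0.5]
structure = [1.0, 0.5, 0.25, 0.0]
transport = volume_transport(u_surface, structure, dz=500.0, dy=50_000.0)
transport_sv = transport / 1.0e6  # 1 Sv = 10^6 m^3 s^-1
```

In this separable form, the depth integral needs to be computed only once from climatological stratification, so a time series of transport follows directly from a time series of altimetric surface velocity.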
Abstract
Across the globe, there has been an increasing interest in improving the predictability of subseasonal hydrometeorological forecasts, as they play a valuable role in medium- to long-term planning in many sectors, such as agriculture, navigation, hydropower, and emergency management. However, these forecasts still have very limited skill at the monthly time scale; hence, this study explores the possibilities for improving forecasts through different pre- and postprocessing techniques at the interface with a Precipitation–Runoff–Evapotranspiration Hydrological Response Unit Model (PREVAH). Specifically, this research aims to assess the benefit of European weather regime (WR) data within a hybrid forecasting setup, a combination of a traditional hydrological model and a machine learning (ML) algorithm, to improve the performance of subseasonal hydrometeorological forecasts in Switzerland. The WR data contain information about the large-scale atmospheric circulation in the North Atlantic–European region, and thus allow the hydrological model to exploit potential flow-dependent predictability. Four hydrological variables are investigated: total runoff, baseflow, soil moisture, and snowmelt. The improvements in the forecasts achieved with the pre- and postprocessing techniques vary with catchments, lead times, and variables. Adding WR data has clear benefits, but these benefits are not consistent across the study area or among the variables. The usefulness of WR data is generally observed for longer lead times, e.g., beyond the third week. Furthermore, a multimodel approach is applied to determine the “best practice” for each catchment and improve forecast skill over the entire study area. This study highlights the potential and limitations of using WR information to improve subseasonal hydrometeorological forecasts in a hybrid forecasting system in an operational mode.
Abstract
Analyses of the Northern Hemisphere’s sea level pressure, air surface temperature, and lower-stratospheric ozone during the period 1900–2019 reveal coherence in their temporal variability. The coherence is heterogeneously distributed over the globe, and the patterns of ozone impact on the pressure and temperature are different. More specifically, the strongest ozone influence on the sea level pressure is found in the main “centers of action,” that is, the Aleutian low and the region of NAO formation. The ozone influence is localized mainly in the latitudinal belt 40°–75°N, where the ozone mixing ratio at 70 hPa is reduced during most of the twentieth century (relative to the first decade of the twenty-first century). We attribute this peculiarity of the ozone spatial distribution to the energetic particles trapped in Earth’s radiation belts, which activate ion-molecular reactions of ozone production in the region of the Regener–Pfotzer ionization maximum. Consequently, the spatial–temporal variations of the lower-atmospheric ionization could be a good explanation for the irregularly distributed ozone and its regionally specific impact on the climatic variables.
Significance Statement
We tried to understand the regional character of the Northern Hemisphere’s winter weather conditions. The latter is usually attributed to the North Atlantic Oscillation (NAO), but we actually do not know the factors impacting the NAO variability itself. We found that, at multiannual time scales, the surface pressure is only weakly related to the temperature variations, whereas its correlation with the ozone at 70 hPa is unexpectedly strong—especially in the active regions of the weather phenomena formation. We attribute the ozone variability itself to the variable intensity of energetic particles precipitating in the lower atmosphere—where they activate ion-molecular reactions producing ozone. This finding opens new horizons for understanding the regionality of atmospheric variation at different time scales.
Abstract
It has been proposed that air pollution increases the updraft speeds of warm-phase convective clouds by reducing their supersaturation and, thereby, enhancing their buoyancy. Observations from the GoAmazon field campaign, sampled using subjective criteria, have been offered as evidence for this warm-phase invigoration. Here, we reexamine those GoAmazon observations using objective sampling criteria and find no indication that air pollution increases warm-phase updraft speeds. In addition, the observations yield no statistically significant relationship between aerosol concentrations and either moist-convective vertical velocity or reflectivity in either the lower or upper troposphere.
Abstract
The moist static energy (MSE) budget is widely used to understand moist atmospheric thermodynamics. However, the budget is not exact, and the accuracy of the approximations that yield it has not been examined rigorously in the context of large-scale tropical motions (horizontal scales ≥ 1000 km). A scale analysis shows that these approximations are most accurate in systems whose latent energy anomalies are considerably larger than the geopotential and kinetic energy anomalies. This condition is satisfied in systems that exhibit phase speeds and horizontal winds on the order of 10 m s−1 or less. Results from a power spectral analysis of data from the DYNAMO field campaign and ERA5 qualitatively agree with the scaling, although they indicate that the neglected terms are smaller than what the scaling suggests. A linear regression analysis of the MJO events that occurred during DYNAMO yields results that support these findings. It is suggested that the MSE budget is accurate in the tropics because motions within these latitudes are constrained to exhibit small fluctuations in geopotential and kinetic energy as a result of weak temperature gradient (WTG) balance.
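For reference, moist static energy is conventionally defined as h = c_p T + g z + L_v q. The short sketch below evaluates it and compares a latent energy anomaly with a kinetic energy anomaly, in the spirit of the scale analysis described in the abstract above. The constants are standard approximate values, and the anomaly magnitudes (1 g kg^-1 of moisture, 10 m s^-1 of wind) are illustrative choices rather than numbers taken from the study.

```python
CP = 1004.0  # specific heat of dry air at constant pressure, J kg^-1 K^-1
G = 9.81     # gravitational acceleration, m s^-2
LV = 2.5e6   # latent heat of vaporization, J kg^-1


def moist_static_energy(temperature, height, specific_humidity):
    """Return h = cp*T + g*z + Lv*q in J kg^-1 (T in K, z in m, q in kg kg^-1)."""
    return CP * temperature + G * height + LV * specific_humidity


def kinetic_energy(speed):
    """Return kinetic energy per unit mass, 0.5*u^2, in J kg^-1."""
    return 0.5 * speed ** 2


# Near-surface tropical air: 300 K, sea level, 18 g kg^-1 of water vapor.
h_surface = moist_static_energy(300.0, 0.0, 0.018)

# Illustrative anomaly magnitudes (assumed, not from the study).
latent_anomaly = LV * 0.001            # 1 g kg^-1 moisture anomaly
kinetic_anomaly = kinetic_energy(10.0)  # 10 m s^-1 wind
ratio = latent_anomaly / kinetic_anomaly
```

With these assumed magnitudes the latent term exceeds the kinetic term by more than an order of magnitude, which is the regime in which the abstract says the approximations behind the budget are most accurate.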
Abstract
The Argo array provides nearly 4000 temperature and salinity profiles of the top 2000 m of the ocean every 10 days. Still, Argo floats will never be able to measure the ocean at all times, everywhere. Optimized Argo float distributions should match the spatial and temporal variability of the many societally important ocean features that they observe. Determining these distributions is challenging because float advection is difficult to predict. Using no external models, transition matrices based on existing Argo trajectories provide statistical inferences about Argo float motion. We use the 24 years of Argo locations to construct an optimal transition matrix that minimizes estimation bias and uncertainty. The optimal array is determined to have a 2° × 2° spatial resolution with a 90-day time step. We then use the transition matrix to predict the probability of future float locations of the core Argo array, the Global Biogeochemical Array, and the Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) array. A comparison of transition matrices derived from floats using Argos system and Iridium communication methods shows the impact of surface displacements, which is most apparent near the equator. Additionally, we demonstrate the utility of transition matrices for validating models by comparing the matrix derived from Argo floats with that derived from a particle release experiment in the Southern Ocean State Estimate (SOSE).
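The transition-matrix idea in the abstract above can be sketched as follows: count how often a float that starts a time step in one grid cell ends it in another, normalize each row into probabilities, and then apply the matrix to a probability distribution to project float locations forward. In the study the cells would be 2° × 2° boxes and the step 90 days; the cell labels, the sample displacements, and the function names here are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(start_cells, end_cells):
    """Estimate P[i][j], the probability that a float in cell i is in cell j one step later."""
    counts = defaultdict(Counter)
    for i, j in zip(start_cells, end_cells):
        counts[i][j] += 1
    # Normalize each row of counts so that its probabilities sum to 1.
    return {
        i: {j: n / sum(row.values()) for j, n in row.items()}
        for i, row in counts.items()
    }


def step(prob, matrix):
    """Advance a probability distribution over cells by one time step."""
    out = defaultdict(float)
    for i, p in prob.items():
        # Cells with no observed departures have no row and contribute nothing.
        for j, p_ij in matrix.get(i, {}).items():
            out[j] += p * p_ij
    return dict(out)


# Hypothetical cell labels for four float displacements over one time step.
starts = ["A", "A", "A", "B"]
ends = ["A", "B", "B", "B"]
P = transition_matrix(starts, ends)

# Probability of where a float now in cell A will be after one step.
future = step({"A": 1.0}, P)
```

Repeated calls to `step` give multi-step forecasts. Note that cells never observed as a starting point have no row in this sketch, so probability entering them is lost on the next step; a real application would need to handle such cells explicitly.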
Abstract
We explore the possible role of plant–atmosphere feedbacks in accelerating forest expansion using a simple example of forest establishment. We use an unconventional experimental design to simulate an initial forest establishment and the subsequent response of climate and nearby vegetation. We find that the forest’s existence produces favorable nearby growing-season conditions that would promote forest expansion. Specifically, we consider a hypothetical region of forest expansion in modern Alaska. We find that the forest acts as a source of heat and moisture for plants to the west, leading them to experience earlier springtime temperatures, snowmelt, and growth. Summertime cooling and cloud formation over the forest also drive a circulation change that reduces summertime cloud cover south of the forest, increasing solar radiation reaching plants there and driving warming. By isolating these vegetation–atmosphere interactions as the mechanisms of increased growth, we demonstrate the potential for forest expansion to be accelerated in a way that has not been highlighted before. These simulations illuminate two separate mechanisms that lead to increased plant growth nearby: 1) springtime heat advection and 2) summertime cloud feedbacks and circulation changes; both have implications for our understanding of past changes in forest cover and the predictability of biophysical impacts from afforestation projects and climate change–driven forest-cover changes. By examining these feedbacks, we seek to gain a more comprehensive understanding of past and potential future land–atmosphere interactions.
Significance Statement
This study investigates whether the emergence of a high-latitude forest could influence the way water and energy are exchanged between the land and atmosphere in a way that impacts nearby growing conditions and subsequent forest expansion. We use a computer model to simulate a climate with and without forest establishment in the high latitudes and test the response of plants surrounding the forest to the two different climates. We find that a forest is indeed able to spur neighboring plant growth by modifying regional climate and producing more favorable growing conditions for surrounding vegetation. Specifically, forest establishment can bring better growing conditions to plants adjacent to it by warming the air and altering nearby circulation and cloud cover.
Abstract
Geostationary observations provide measurements of the cloud liquid water path (LWP), permitting continuous observation of cloud evolution throughout the daylit portion of the diurnal cycle. Relative to LWP derived from microwave imagery, these observations have biases related to scattering geometry, which systematically varies throughout the day. Therefore, we have developed a set of bias corrections using microwave LWP for the Geostationary Operational Environmental Satellite-16 and -17 (GOES-16 and GOES-17) observations of LWP derived from retrieved cloud-optical properties. The bias corrections are defined based on scattering geometry (solar zenith, sensor zenith, and relative azimuth) and low cloud fraction. We demonstrate that over the low-cloud regions of the northeast and southeast Pacific, these bias corrections drastically improve the characteristics of the retrieved LWP, including its regional distribution, diurnal variation, and evolution along short-time-scale Lagrangian trajectories.
Significance Statement
Large uncertainty exists in cloud liquid water path derived from geostationary observations, which is caused by changes in the scattering geometry of sunlight throughout the day. This complicates the use of geostationary data to analyze the time evolution of clouds. Therefore, microwave imagery observations of liquid water path, which do not depend on scattering geometry, are used to create a set of corrections for geostationary data that can be used in future studies to analyze the time evolution of clouds from space.
Abstract
The societies of the coastal regions of the Greater Horn of Africa (GHA) experience two distinct rainy seasons: the generally wetter “long” rains in the boreal spring and the generally drier “short” rains in the boreal fall. The GHA rainfall climatology is unique for its latitude both in its aridity and in the dynamical differences between its two rainy seasons. This study explains the drivers of the rainy seasons through the climatology of moist static stability, estimated as the difference between surface moist static energy hs
and midtropospheric saturation moist static energy