Abstract
Satellite-based remotely sensed observations of snow cover fraction (SCF) can have data gaps in spatially distributed coverage from sensor and orbital limitations. We mitigate these limitations in the example fine-resolution Moderate Resolution Imaging Spectroradiometer (MODIS) data by gap-filling using auxiliary 1-km datasets that either aid in downscaling from coarser-resolution (5 km) MODIS SCF wherever not fully covered by clouds, or else by themselves via regression wherever fully cloud covered. This study’s prototype predicts a 1-km version of the 500-m MOD10A1 SCF target. Due to noncollocatedness of spatial gaps even across input and auxiliary datasets, we consider a recent gap-agnostic advancement of partial convolution in computer vision for both training and predictive gap-filling. Partial convolution accommodates spatially consistent gaps across the input images, effectively implementing a two-dimensional masking. To overcome reduced usable data from noncollocated spatial gaps across inputs, we innovate a fully generalized three-dimensional masking in this partial convolution. This enables a valid output value at a pixel even if only a single valid input variable and its value exist in the neighborhood covered by the convolutional filter zone centered around that pixel. Thus, our gap-agnostic technique can use significantly more examples for training (∼67%) and prediction (∼100%), compared with less than 10% for the previous partial convolution. We train an example simple three-layer legacy super-resolution convolutional neural network (SRCNN) to obtain downscaling and regression component performances that are better than baseline values of either climatology or MOD10C1 SCF as relevant. Our generalized partial convolution can enable multiple Earth science applications like downscaling, regression, classification, and segmentation that were hindered by data gaps.
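The generalized three-dimensional masking described in this abstract can be sketched roughly as follows: a per-channel validity mask accompanies the input stack, and a pixel yields a valid output as long as any input channel has a valid value inside the filter window. This is an illustrative numpy sketch, not the authors' code; the function name, shapes, and the sum-rescaling convention are assumptions.

```python
import numpy as np

def partial_conv3d(x, mask, weight):
    """Gap-aware convolution sketch. x and mask have shape (C, H, W);
    weight has shape (C, kH, kW). mask[c, i, j] = 1 where x is valid,
    0 inside a data gap. Because the mask is per-channel (3-D rather
    than a single 2-D image mask), a pixel produces output whenever
    ANY channel has a valid value in the filter window."""
    C, H, W = x.shape
    kH, kW = weight.shape[1:]
    ph, pw = kH // 2, kW // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    mp = np.pad(mask, ((0, 0), (ph, ph), (pw, pw)))
    out = np.full((H, W), np.nan)
    out_mask = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            xw = xp[:, i:i + kH, j:j + kW]
            mw = mp[:, i:i + kH, j:j + kW]
            n_valid = mw.sum()
            if n_valid > 0:  # valid output if any valid input in window
                # rescale by the ratio of full window size to valid count,
                # the usual partial-convolution normalization
                out[i, j] = (weight * xw * mw).sum() * (C * kH * kW) / n_valid
                out_mask[i, j] = 1.0
    return out, out_mask
```

With a single valid pixel in one channel, only output pixels whose window reaches that pixel are marked valid; a 2-D mask shared across channels would instead invalidate the whole image.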
Abstract
In the hydrological sciences, the outstanding challenge of regional modeling requires capturing common and event-specific hydrologic behaviors driven by rainfall spatial variability and catchment physiography during floods. The overall objective of this study is to develop robust understanding and predictive capability of how rainfall spatial variability influences flood peak discharge relative to basin physiography. A machine-learning approach is used on a high-resolution dataset of rainfall and flooding events spanning 10 years, with rainfall events and basins of widely varying characteristics selected across the continental United States. It overcomes major limitations in prior studies that were based on limited observations or hydrological model simulations. This study explores first-order dependencies in the relationships between peak discharge, rainfall variability, and basin physiography, and it sheds light on these complex interactions using a multidimensional statistical modeling approach. Among different machine-learning techniques, XGBoost is used to determine the significant physiographical and rainfall characteristics that influence peak discharge through variable importance analysis. A parsimonious model with low bias and variance is created that can be deployed in the future for flash flood forecasting. The results confirm that, although the spatial organization of rainfall within a basin has a major influence on basin response, basin physiography is the primary driver of peak discharge. These findings have unprecedented spatial and temporal representativeness in terms of flood characterization across basins. An improved understanding of subbasin scale rainfall spatial variability will aid in robust flash flood characterization as well as in identifying basins that could most benefit from distributed hydrologic modeling.
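The variable-importance analysis this abstract relies on can be illustrated generically with permutation importance: shuffle one predictor at a time and measure the loss in predictive skill. The paper uses XGBoost's own importance measures; the numpy-only sketch below substitutes a simple least-squares model and synthetic data, and every name and coefficient is an assumption for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, rng):
    """Importance of each column of X: the increase in mean squared
    error when that column's values are randomly shuffled."""
    base = np.mean((predict(X) - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        imp[j] = np.mean((predict(Xp) - y) ** 2) - base
    return imp

rng = np.random.default_rng(0)
# Synthetic stand-in: "peak discharge" driven mostly by feature 0
# (say, a physiographic descriptor), weakly by feature 1 (a rainfall
# variability metric); feature 2 is pure noise.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
imp = permutation_importance(lambda A: A @ beta, X, y, rng)
```

The dominant driver shows the largest skill drop when shuffled, which is the logic behind ranking physiographic versus rainfall predictors.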
Abstract
Hydrologic predictions in rural watersheds are important but also challenging due to data shortage. Long short-term memory (LSTM) networks are a promising machine learning approach and have demonstrated good performance in streamflow predictions. However, due to their data-hungry nature, most LSTM applications focus on well-monitored catchments with abundant and high-quality observations. In this work, we investigate predictive capabilities of LSTM in poorly monitored watersheds with short observation records. To address three main challenges of LSTM applications in data-scarce locations, i.e., overfitting, uncertainty quantification (UQ), and out-of-distribution prediction, we evaluate different regularization techniques to prevent overfitting, apply a Bayesian LSTM for UQ, and introduce a physics-informed hybrid LSTM to enhance out-of-distribution prediction. Through case studies in two diverse sets of catchments with and without snow influence, we demonstrate that 1) when hydrologic variability in the prediction period is similar to the calibration period, LSTM models can reasonably predict daily streamflow with Nash–Sutcliffe efficiency above 0.8, even with only 2 years of calibration data; 2) when the hydrologic variability in the prediction and calibration periods is dramatically different, LSTM alone does not predict well, but the hybrid model can improve the out-of-distribution prediction with acceptable generalization accuracy; 3) L2 norm penalty and dropout can mitigate overfitting, and Bayesian and hybrid LSTM have no overfitting; and 4) Bayesian LSTM provides useful uncertainty information to improve prediction understanding and credibility. These insights have vital implications for streamflow simulation in watersheds where data quality and availability are a critical issue.
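The two regularizers this abstract evaluates against overfitting, an L2 norm penalty on the weights and dropout on hidden states, can be written framework-agnostically. The numpy sketch below shows only these two mechanisms, not an LSTM or the authors' Bayesian/hybrid models; the penalty strength and dropout rate are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(42)

def l2_penalty(params, lam):
    """Weight-decay term added to the training loss: lam * sum ||W||^2.
    Penalizing large weights discourages overfitting short records."""
    return lam * sum(np.sum(W ** 2) for W in params)

def dropout(h, p, training=True):
    """Inverted dropout on a hidden-state vector h: zero a random
    fraction p during training, scale survivors by 1/(1-p) so the
    expected activation is unchanged; identity at prediction time."""
    if not training or p == 0.0:
        return h
    keep = (rng.random(h.shape) >= p).astype(h.dtype)
    return h * keep / (1.0 - p)

# illustrative weights and loss
W = [np.ones((2, 2)), np.full((2,), 0.5)]
loss = 1.0 + l2_penalty(W, lam=1e-2)  # data loss plus L2 penalty
h = dropout(np.ones(1000), p=0.5)     # ~half zeroed, survivors scaled to 2
```

Keeping dropout active at prediction time (Monte Carlo dropout) is one common route to the kind of uncertainty estimates a Bayesian LSTM provides, though the paper's Bayesian treatment is its own method.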
Abstract
Soil moisture (SM) links the water and energy cycles over the land–atmosphere interface and largely determines ecosystem functionality, positioning it as an essential player in the Earth system. Despite its importance, accurate estimation of large-scale SM remains a challenge. Here we leverage the strength of a neural network (NN) and the fidelity of long-term measurements to develop a daily multilayer cropland SM dataset for China from 1981 to 2013, implemented for a range of different cropping patterns. The training and testing of the NN for the five soil layers (0–50 cm, 10-cm depth each) yield R² values of 0.65–0.70 and 0.64–0.69, respectively. Our analysis reveals that precipitation and soil properties are the two dominant factors determining SM, but cropping pattern is also crucial. In addition, our simulations of alternative cropping patterns indicate that winter wheat followed by fallow will largely alleviate the SM depletion in most parts of China. On the other hand, cropping patterns of fallow in the winter followed by maize/soybean seem to further aggravate SM decline in the Huang-Huai-Hai region and southwestern China, relative to prevalent practices of double cropping. This may be due to their low soil porosity, which results in more soil water drainage, as opposed to the case in which winter crop roots help maintain SM. This multilayer cropland SM dataset with granularity of cropping patterns provides an important alternative and is complementary to modeled and satellite-retrieved products.
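The R² scores this abstract reports for NN training and testing are the standard coefficient of determination; a minimal sketch of that metric (illustrative only, not the authors' evaluation code):

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Coefficient of determination: fraction of observed variance
    explained by the predictions. 1.0 is perfect; 0.0 matches a
    constant predictor at the observed mean."""
    ss_res = np.sum((y_obs - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)  # total variance
    return 1.0 - ss_res / ss_tot
```

An R² of 0.65–0.70 per layer means roughly two-thirds of the variance in measured soil moisture is captured by the NN estimates.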
Abstract
This study compares the performance of Global Ensemble Forecast System (GEFS) and European Centre for Medium-Range Weather Forecasts (ECMWF) precipitation ensemble forecasts in Brazil and evaluates different analog-based methods and a logistic regression method for postprocessing the GEFS forecasts. The numerical weather prediction (NWP) forecasts were evaluated against the Physical Science Division South America Daily Gridded Precipitation dataset using both deterministic and probabilistic forecasting evaluation metrics. The results show that the ensemble precipitation forecasts performed generally well in the east and poorly in the northwest of Brazil, independent of the models and the postprocessing methods. While the raw ECMWF forecasts performed better than the raw GEFS forecasts, analog-based GEFS forecasts were more skillful and reliable than both raw ECMWF and GEFS forecasts. The choice of a specific postprocessing strategy had less impact on the performance than the postprocessing itself. Nonetheless, forecasts produced with different analog-based postprocessing strategies were significantly different from one another and were more skillful than, and as reliable and sharp as, forecasts produced with the logistic regression method. The approach considering the logarithm of current and past reforecasts as the measure of closeness between analogs was identified as the best strategy. The results also indicate that postprocessing using analog methods with a long-term reforecast archive improved raw GEFS precipitation forecasting skill more than logistic regression with a short-term reforecast archive. In particular, the postprocessing dramatically improves the GEFS precipitation forecasts when the forecasting skill is low or below zero.
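The analog postprocessing idea this abstract evaluates, including the winning log-space closeness measure, can be sketched in a few lines: rank archived reforecasts by their distance to today's forecast in log space and return the matching observations as a calibrated ensemble. This numpy sketch is illustrative only; the function name, the single-variable distance, and the epsilon guarding log(0) for zero precipitation are assumptions, not the paper's exact formulation.

```python
import numpy as np

def analog_forecast(current_fcst, past_fcsts, past_obs, n_analogs=3):
    """Analog method sketch: find the archived reforecasts closest to
    the current forecast in log space (cf. the abstract's logarithm
    closeness strategy) and return the observations that verified on
    those analog dates as the postprocessed ensemble."""
    eps = 0.1  # offset so zero-precipitation values survive the log
    d = np.abs(np.log(past_fcsts + eps) - np.log(current_fcst + eps))
    idx = np.argsort(d)[:n_analogs]  # dates of the closest analogs
    return past_obs[idx]

# toy archive: reforecast values paired with the observations that verified
past_f = np.array([0.0, 2.0, 5.0, 9.0, 10.5, 30.0])
past_o = np.array([0.0, 1.0, 6.0, 8.0, 12.0, 25.0])
ens = analog_forecast(10.0, past_f, past_o)  # analogs 10.5, 9.0, 5.0
```

Because the ensemble is built from verified observations rather than model output, it inherits the observed climatology, which is one reason analog methods with a long reforecast archive calibrate raw NWP forecasts so effectively.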