Abstract
Resampling methods such as cross validation or bootstrap are often employed to estimate the uncertainty in a loss function due to sampling variability, usually for the purpose of model selection. In models that require nonlinear optimization, however, the existence of local minima in the loss function landscape introduces an additional source of variability that is confounded with sampling variability. In other words, some portion of the variability in the loss function across different resamples is due to local minima. Given that statistically sound model selection is based on an examination of variance, it is important to disentangle these two sources of variability. To that end, a methodology is developed for estimating each, specifically in the context of K-fold cross validation, and neural networks (NN) whose training leads to different local minima. Random effects models are used to estimate the two variance components: that due to sampling and that due to local minima. The results are examined as a function of the number of hidden nodes and the variance of the initial weights, with the latter controlling the “depth” of local minima. The main goal of the methodology is to increase statistical power in model selection and/or model comparison. Using both simulated and realistic data, it is shown that the two sources of variability can be comparable, casting doubt on model selection methods that ignore the variability due to local minima. Furthermore, the methodology is sufficiently flexible so as to allow assessment of the effect of any other NN parameters on variability.
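As a rough illustration of the idea, the sketch below applies a standard two-way random-effects (ANOVA, method-of-moments) decomposition to a hypothetical K × M matrix of cross-validation losses, with rows indexing CV folds (sampling variability) and columns indexing independent NN trainings from different initial weights (local-minima variability). The specific random effects model and estimator used in the paper may differ.

```python
# Hypothetical sketch: method-of-moments estimates of two variance
# components from a K x M matrix of CV losses. Rows = CV folds
# (sampling variability); columns = NN trainings from different random
# initial weights (local-minima variability). Not the authors' code; a
# two-way random-effects model without interaction is assumed.
import numpy as np

def variance_components(loss):
    """loss: array of shape (K, M); returns (var_fold, var_init, residual variance)."""
    K, M = loss.shape
    grand = loss.mean()
    fold_means = loss.mean(axis=1)          # one mean per CV fold
    init_means = loss.mean(axis=0)          # one mean per initialization
    # Mean squares from the two-way ANOVA decomposition
    ms_fold = M * np.sum((fold_means - grand) ** 2) / (K - 1)
    ms_init = K * np.sum((init_means - grand) ** 2) / (M - 1)
    resid = loss - fold_means[:, None] - init_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((K - 1) * (M - 1))
    # Method-of-moments estimators of the variance components
    var_fold = max((ms_fold - ms_err) / M, 0.0)   # sampling variability
    var_init = max((ms_init - ms_err) / K, 0.0)   # local-minima variability
    return var_fold, var_init, ms_err

# Example: K = 10 folds, M = 20 random initializations of synthetic losses
rng = np.random.default_rng(0)
losses = (1.0
          + 0.2 * rng.standard_normal((10, 1))     # fold effects
          + 0.1 * rng.standard_normal((1, 20))     # initialization effects
          + 0.05 * rng.standard_normal((10, 20)))  # residual noise
print(variance_components(losses))
```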
Abstract
An accurate characterization of the water content of snowpack, or snow water equivalent (SWE), is necessary to quantify water availability and constrain hydrologic and land surface models. Recently, airborne observations (e.g., lidar) have emerged as a promising method to accurately quantify SWE at high resolutions (scales of ∼100 m and finer). However, the frequency of these observations is very low, typically once or twice per season in the Rocky Mountains of Colorado. Here, we present a machine learning framework that is based on random forests to model temporally sparse lidar-derived SWE, enabling estimation of SWE at unmapped time points. We approximated the physical processes governing snow accumulation and melt as well as snow characteristics by obtaining 15 different variables from gridded estimates of precipitation, temperature, surface reflectance, elevation, and canopy. Results showed that, in the Rocky Mountains of Colorado, our framework is capable of modeling SWE with a higher accuracy when compared with estimates generated by the Snow Data Assimilation System (SNODAS). The mean value of the coefficient of determination R² using our approach was 0.57, and the root-mean-square error (RMSE) was 13 cm, which was a significant improvement over SNODAS (mean R² = 0.13; RMSE = 20 cm). We explored the relative importance of the input variables and observed that, at the spatial resolution of 800 m, meteorological variables are more important drivers of predictive accuracy than surface variables that characterize the properties of snow on the ground. This research provides a framework to expand the applicability of lidar-derived SWE to unmapped time points.
Significance Statement
Snowpack is the main source of freshwater for close to 2 billion people globally and needs to be estimated accurately. Mountainous snowpack is highly variable and is challenging to quantify. Recently, lidar technology has been employed to observe snow in great detail, but it is costly and can only be used sparingly. To counter that, we use machine learning to estimate snowpack when lidar data are not available. We approximate the processes that govern snowpack by incorporating meteorological and satellite data. We found that variables associated with precipitation and temperature have more predictive power than variables that characterize snowpack properties. Our work helps to improve snowpack estimation, which is critical for sustainable management of water resources.
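A minimal sketch of the kind of random forest regression described above, using scikit-learn. The predictor names, input file, and hyperparameters are placeholders rather than the 15 variables and tuning actually used in the study.

```python
# Hypothetical sketch: regress lidar-derived SWE on gridded predictors
# and rank variable importance. Column names and hyperparameters are
# illustrative, not the authors'.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

df = pd.read_csv("swe_training_pixels.csv")   # assumed table of 800-m pixels
predictors = ["accum_precip", "mean_temp", "ndsi", "elevation", "canopy_cover"]
X_train, X_test, y_train, y_test = train_test_split(
    df[predictors], df["lidar_swe_cm"], test_size=0.2, random_state=0)

rf = RandomForestRegressor(n_estimators=500, random_state=0, n_jobs=-1)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)
print("R2:", r2_score(y_test, pred))
print("RMSE (cm):", mean_squared_error(y_test, pred) ** 0.5)
# Relative importance of meteorological vs. surface predictors
print(dict(zip(predictors, rf.feature_importances_)))
```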
Abstract
We investigate the predictability of the sign of daily southeastern U.S. (SEUS) precipitation anomalies associated with simultaneous predictors of large-scale climate variability using machine learning models. Models using index-based climate predictors and gridded fields of large-scale circulation as predictors are utilized. Logistic regression (LR) and fully connected neural networks using indices of climate phenomena as predictors produce neither accurate nor reliable predictions, indicating that the indices themselves are not good predictors. Using gridded fields as predictors, an LR and convolutional neural network (CNN) are more accurate than the index-based models. However, only the CNN can produce reliable predictions that can be used to identify forecasts of opportunity. Using explainable machine learning, we identify which variables and grid points of the input fields are most relevant for confident and correct predictions in the CNN. Our results show that the local circulation is most important, as represented by the maximum relevance of 850-hPa geopotential heights and zonal winds for making skillful, high-probability predictions. Corresponding composite anomalies identify connections with El Niño–Southern Oscillation during winter and the Atlantic multidecadal oscillation and North Atlantic subtropical high during summer.
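For illustration only, a minimal Keras sketch of a CNN of the general type described, mapping gridded circulation fields to the probability of a positive daily precipitation anomaly. The grid size, channels, and architecture are assumptions, not the authors' model.

```python
# Hypothetical sketch: a small CNN that maps gridded circulation fields
# (e.g., 850-hPa geopotential height and zonal wind as channels) to the
# probability that the daily SEUS precipitation anomaly is positive.
import tensorflow as tf
from tensorflow.keras import layers, models

n_lat, n_lon, n_channels = 32, 64, 2   # assumed input grid and variables

model = models.Sequential([
    layers.Input(shape=(n_lat, n_lon, n_channels)),
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # P(positive anomaly)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
# Predictions far from 0.5 can then be screened as candidate
# "forecasts of opportunity".
```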
Abstract
In the last decade, much work in atmospheric science has focused on spatial verification (SV) methods for gridded prediction, which overcome serious disadvantages of pixelwise verification. However, neural networks (NN) in atmospheric science are almost always trained to optimize pixelwise loss functions, even when ultimately assessed with SV methods. This establishes a disconnect between model verification during versus after training. To address this issue, we develop spatially enhanced loss functions (SELF) and demonstrate their use for a real-world problem: predicting the occurrence of thunderstorms (henceforth, “convection”) with NNs. In each SELF we use either a neighborhood filter, which highlights convection at scales larger than a threshold, or a spectral filter (employing Fourier or wavelet decomposition), which is more flexible and highlights convection at scales between two thresholds. We use these filters to spatially enhance common verification scores, such as the Brier score. We train each NN with a different SELF and compare their performance at many scales of convection, from discrete storm cells to tropical cyclones. Among our many findings are that (i) for a low or high risk threshold, the ideal SELF focuses on small or large scales, respectively; (ii) models trained with a pixelwise loss function perform surprisingly well; and (iii) nevertheless, models trained with a spectral filter produce much better-calibrated probabilities than a pixelwise model. We provide a general guide to using SELFs, including technical challenges and the final Python code, as well as demonstrating their use for the convection problem. To our knowledge this is the most in-depth guide to SELFs in the geosciences.
Significance Statement
Gridded predictions, in which a quantity is predicted at every pixel in space, should be verified with spatially aware methods rather than pixel by pixel. Neural networks (NN), which are often used for gridded prediction, are trained to minimize an error value called the loss function. NN loss functions in atmospheric science are almost always pixelwise, which causes the predictions to miss rare events and contain unrealistic spatial patterns. We use spatial filters to enhance NN loss functions, and we test our novel spatially enhanced loss functions (SELF) on thunderstorm prediction. We find that different SELFs work better for different scales (i.e., different-sized thunderstorm complexes) and that spectral filters, one of the two filter types, produce unexpectedly well calibrated thunderstorm probabilities.
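As a hedged example of the neighborhood-filter idea, the sketch below smooths both the binary convection targets and the predicted probabilities over a square window before computing a pixelwise Brier score; the window size and implementation details are assumptions, not the authors' published loss code.

```python
# Hypothetical sketch of a neighborhood-type spatially enhanced Brier
# score: targets and predictions are averaged over a square
# neighborhood before squared differences are taken.
import tensorflow as tf

def neighborhood_brier_loss(half_width=2):
    """Return a Keras-style loss; window is (2*half_width + 1) pixels square."""
    k = 2 * half_width + 1

    def loss(y_true, y_pred):
        # y_true, y_pred: (batch, ny, nx, 1), values in [0, 1]
        kernel = tf.ones((k, k, 1, 1)) / float(k * k)
        y_true_s = tf.nn.conv2d(y_true, kernel, strides=1, padding="SAME")
        y_pred_s = tf.nn.conv2d(y_pred, kernel, strides=1, padding="SAME")
        return tf.reduce_mean(tf.square(y_true_s - y_pred_s))

    return loss

# model.compile(optimizer="adam", loss=neighborhood_brier_loss(half_width=2))
```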
Abstract
Near-surface wind is difficult to estimate using global numerical weather and climate models, because airflow is strongly modified by underlying topography, especially that of a country such as Switzerland. In this article, we use a statistical approach based on deep learning and a high-resolution digital elevation model to spatially downscale hourly near-surface wind fields at coarse resolution from ERA5 reanalysis from their original 25-km grid to a 1.1-km grid. A 1.1-km-resolution wind dataset for 2016–20 from the operational numerical weather prediction model COSMO-1 of the national weather service MeteoSwiss is used to train and validate our model, a generative adversarial network (GAN) with gradient penalized Wasserstein loss aided by transfer learning. The results are realistic-looking high-resolution historical maps of gridded hourly wind fields over Switzerland and very good and robust predictions of the aggregated wind speed distribution. Regionally averaged image-specific metrics show a clear improvement in prediction relative to ERA5, with skill measures generally better for locations over the flatter Swiss Plateau than for Alpine regions. The downscaled wind fields demonstrate higher-resolution, physically plausible orographic effects, such as ridge acceleration and sheltering, that are not resolved in the original ERA5 fields.
Significance Statement
Statistical downscaling, which increases the resolution of atmospheric fields, is widely used to refine the outputs of global reanalysis and climate models, most commonly for temperature and precipitation. Near-surface winds are strongly modified by the underlying topography, generating local flow conditions that can be very difficult to estimate. This study develops a deep learning model that uses local topographic information to spatially downscale hourly near-surface winds from their original 25-km resolution to a 1.1-km grid over Switzerland. Our model produces realistic high-resolution gridded wind fields with expected orographic effects but performs better in flatter regions than in mountains. These downscaled fields are useful for impact assessment and decision-making in regions where global reanalysis data at coarse resolution may be the only products available.
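A minimal PyTorch sketch of the gradient-penalized Wasserstein (WGAN-GP) critic loss named above. The critic network, the conditioning on ERA5 and topography, and the penalty weight are placeholders; only the penalty structure follows the standard WGAN-GP recipe.

```python
# Hypothetical sketch of the WGAN-GP critic loss used for downscaling.
import torch

def critic_loss(critic, real_hr, fake_hr, lambda_gp=10.0):
    # Wasserstein part: the critic should score real high-res wind fields
    # above generated ones
    w_loss = critic(fake_hr).mean() - critic(real_hr).mean()

    # Gradient penalty on random interpolates between real and fake fields
    eps = torch.rand(real_hr.size(0), 1, 1, 1, device=real_hr.device)
    interp = (eps * real_hr + (1 - eps) * fake_hr).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    gp = ((grad_norm - 1.0) ** 2).mean()

    return w_loss + lambda_gp * gp
```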
Abstract
NOAA global surface temperature (NOAAGlobalTemp) is NOAA’s operational global surface temperature product, which has been widely used in Earth’s climate assessment and monitoring. To improve the spatial interpolation of monthly land surface air temperatures (LSATs) in NOAAGlobalTemp from 1850 to 2020, a three-layer artificial neural network (ANN) system was designed. The ANN system was trained by repeatedly randomly selecting 90% of the LSATs from ERA5 (1950–2019) and validating with the remaining 10%. Validations show clear improvements of ANN over the original empirical orthogonal teleconnection (EOT) method: the global spatial correlation coefficient (SCC) increases from 65% to 80%, and the global root-mean-square difference (RMSD) decreases from 0.99° to 0.57°C during 1850–2020. The improvements in SCCs and RMSDs are larger in the Southern Hemisphere than in the Northern Hemisphere and are larger before the 1950s and where observations are sparse. The ANN system was finally fed with observed LSATs, and its output over the global land surface was compared with that from the EOT method. Comparisons demonstrate similar improvements of ANN over the EOT method: the global SCC increased from 78% to 89%, the global RMSD decreased from 0.93° to 0.68°C, and the LSAT variability quantified by the monthly standard deviation (STD) increased from 1.16° to 1.41°C during 1850–2020. While the SCC, RMSD, and STD at the monthly time scale have been improved, long-term trends remain largely unchanged because the low-frequency component of LSAT in ANN is identical to that in the EOT approach.
Significance Statement
A spatial interpolation method based on an artificial neural network greatly improves the accuracy of land surface air temperature reconstruction, reducing root-mean-square error and increasing spatial coherence and variability over the global land surface from 1850 to 2020.
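A minimal sketch, assuming a generic feature matrix X and target LSAT values y, of the repeated 90%/10% train-validate procedure described above with a small fully connected network; the actual predictors and network configuration of the NOAAGlobalTemp system are not reproduced here.

```python
# Hypothetical sketch of repeated 90/10 training and validation of a
# small fully connected network on ERA5-derived LSAT samples.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def repeated_validation(X, y, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    rmses = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        n_train = int(0.9 * len(y))            # 90% train, 10% validate
        tr, va = idx[:n_train], idx[n_train:]
        ann = MLPRegressor(hidden_layer_sizes=(64,),   # one hidden layer
                           max_iter=500, random_state=0)
        ann.fit(X[tr], y[tr])
        rmses.append(mean_squared_error(y[va], ann.predict(X[va])) ** 0.5)
    return float(np.mean(rmses))
```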
Abstract
A deep learning model is presented to nowcast the occurrence of lightning at a 5-min time resolution 60 min into the future. The model is based on a recurrent-convolutional architecture that allows it to recognize and predict the spatiotemporal development of convection, including the motion, growth, and decay of thunderstorm cells. The predictions are performed on a stationary grid, without the use of storm object detection and tracking. The input data, collected from an area in and surrounding Switzerland, comprise ground-based radar data, visible/infrared satellite data and derived cloud products, lightning detection, numerical weather prediction, and digital elevation model data. We analyze alternative loss functions, class-weighting strategies, and model features, providing guidelines for future studies to select loss functions optimally and to properly calibrate the probabilistic predictions of their models. On the basis of these analyses, we use focal loss in this study but conclude that it provides only a small benefit over cross entropy, which is a viable option if recalibration of the model is not practical. The model achieves a pixelwise critical success index (CSI) of 0.45 to predict lightning occurrence within 8 km over the 60-min nowcast period, ranging from a CSI of 0.75 at a 5-min lead time to a CSI of 0.32 at a 60-min lead time.
Significance Statement
We have developed a method based on artificial intelligence to forecast the occurrence of lightning at 5-min intervals within the next hour from the forecast time. The method utilizes a neural network that learns to predict lightning from a set of training images containing lightning detection data, weather radar observations, satellite imagery, weather forecasts, and elevation data. We find that the network is able to predict the motion, growth, and decay of lightning-producing thunderstorms and that, when properly tuned, it can accurately determine the probability of lightning occurring. This is expected to permit more informed decisions to be made about short-term lightning risks in fields such as civil protection, electricity-grid management, and aviation.
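As a hedged illustration, the sketch below shows a standard binary focal loss of the kind discussed, which down-weights easy pixels relative to cross entropy (gamma = 0 recovers cross entropy); the alpha and gamma values are illustrative, not those used in the study.

```python
# Hypothetical sketch of a binary focal loss for pixelwise lightning
# probabilities.
import tensorflow as tf

def focal_loss(alpha=0.25, gamma=2.0):
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        # Probability assigned to the true class at each pixel
        p_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
        alpha_t = tf.where(tf.equal(y_true, 1.0), alpha, 1.0 - alpha)
        # Easy pixels (p_t near 1) are down-weighted by (1 - p_t)**gamma
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
    return loss

# model.compile(optimizer="adam", loss=focal_loss())
```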
Abstract
Tropical cyclone (TC) track forecasts derived from dynamical models inherit the errors of those models. In this study, a neural network (NN) algorithm was proposed for postprocessing TC tracks predicted by the Global Ensemble Forecast System (GEFS) for lead times of 2, 4, 5, and 6 days over the western North Pacific. The hybrid NN is a combination of three NN classes: 1) a convolutional NN that extracts spatial features from GEFS fields; 2) a multilayer perceptron, which processes TC positions predicted by GEFS; and 3) a recurrent NN that handles information from previous time steps. A dataset of 204 TCs (6744 samples), which formed from 1985 to 2019 (June–October) and survived for at least six days, was separated into various track patterns. TCs in each track pattern were distributed uniformly to the validation and test datasets, each containing 10% of the TCs in the entire dataset, and the remaining 80% were allocated to the training dataset. Two NN architectures were developed, with and without a shortcut connection. Feature selection and hyperparameter tuning were performed to improve model performance. The results show that mean track error and dispersion could be reduced, particularly with the shortcut connection, which also corrected the systematic speed and direction bias of GEFS. Although a reduction in mean track error was not achieved by the NNs for every forecast lead time, improvement can be foreseen upon calibration to reduce overfitting, and the performance encourages further development of the present application.
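A minimal PyTorch sketch of a hybrid network of the general shape described above: a CNN encodes GEFS fields, an MLP encodes the GEFS-predicted TC position, a GRU carries information across forecast steps, and a shortcut connection adds the GEFS position so the head learns a track correction. Layer sizes and details are assumptions, not the authors' architecture.

```python
# Hypothetical sketch of a CNN + MLP + RNN hybrid for TC track postprocessing.
import torch
import torch.nn as nn

class HybridTrackNet(nn.Module):
    def __init__(self, n_channels=4, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                       # encodes GEFS fields
            nn.Conv2d(n_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden), nn.ReLU())
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU())  # GEFS TC position
        self.rnn = nn.GRU(2 * hidden, hidden, batch_first=True)    # across lead times
        self.head = nn.Linear(hidden, 2)                # correction (dlat, dlon)

    def forward(self, fields, gefs_pos):
        # fields: (batch, time, channels, ny, nx); gefs_pos: (batch, time, 2)
        b, t = fields.shape[:2]
        f = self.cnn(fields.flatten(0, 1)).view(b, t, -1)
        p = self.mlp(gefs_pos)
        h, _ = self.rnn(torch.cat([f, p], dim=-1))
        # Shortcut connection: predicted track = GEFS track + learned correction
        return gefs_pos + self.head(h)
```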
Abstract
Many deep learning technologies have been applied to the Earth sciences. Nonetheless, the difficulty of interpreting deep learning results still hinders their application to studies of climate dynamics. Here, we applied a convolutional neural network to understand El Niño–Southern Oscillation (ENSO) dynamics from long-term climate model simulations. The deep learning algorithm successfully predicted ENSO events with a high correlation skill (∼0.82) for a 9-month lead. To interpret deep learning results beyond the prediction, we present a “contribution map” to estimate how much each grid box and variable contributes to the output and a “contribution sensitivity” to estimate how much the output changes in response to small perturbations of the input variables. The contribution map and sensitivity are calculated by modifying the input variables fed to the pretrained deep learning model, an approach quite similar to occlusion sensitivity. Based on these two methods, we identified three precursors of ENSO and investigated their physical processes in El Niño and La Niña development. In particular, it is suggested here that the roles of each precursor are asymmetric between El Niño and La Niña. Our results suggest that the contribution map and sensitivity are simple approaches but can be powerful tools for understanding ENSO dynamics, and they might also be applied to other climate phenomena.
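For illustration, a minimal occlusion-style sketch of a contribution map: mask one patch of one input variable at a time, rerun the pretrained model, and record how much the predicted ENSO index changes. The patch size, fill value, and model interface are assumptions, and the paper's contribution map may be computed differently.

```python
# Hypothetical sketch of an occlusion-style contribution map.
import numpy as np

def contribution_map(predict, x, patch=4, fill=0.0):
    """predict: callable mapping (1, ny, nx, n_vars) -> scalar index;
    x: (ny, nx, n_vars). Assumes ny and nx are multiples of patch."""
    base = predict(x[None])
    ny, nx, nv = x.shape
    contrib = np.zeros((ny // patch, nx // patch, nv))
    for v in range(nv):
        for i in range(0, ny, patch):
            for j in range(0, nx, patch):
                x_occ = x.copy()
                x_occ[i:i + patch, j:j + patch, v] = fill   # occlude one box of one variable
                contrib[i // patch, j // patch, v] = base - predict(x_occ[None])
    # Positive values: occluding that box lowered the predicted index,
    # i.e., the box contributed positively to the prediction.
    return contrib
```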
Abstract
A simple yet flexible and robust algorithm is described for fully partitioning an arbitrary dataset into compact, nonoverlapping groups or classes, sorted by size, based entirely on a pairwise similarity matrix and a user-specified similarity threshold. Unlike many clustering algorithms, there is no assumption that natural clusters exist in the dataset, although clusters, when present, may be preferentially assigned to one or more classes. The method also does not require data objects to be compared within any coordinate system but rather permits the user to define pairwise similarity using almost any conceivable criterion. The method therefore lends itself to certain geoscientific applications for which conventional clustering methods are unsuited, including two nontrivial and distinctly different datasets presented as examples. In addition to identifying large classes containing numerous similar dataset members, it is also well suited for isolating rare or anomalous members of a dataset. The method is inductive in that prototypes identified in a representative subset of a larger dataset can be used to classify the remainder.
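As a hedged sketch of how a similarity-threshold partitioning of this general kind could be realized (not the paper's exact algorithm): repeatedly take the unassigned member with the most unassigned neighbors above the threshold, form a class from it and those neighbors, and continue until every member is assigned, sorting classes by size at the end.

```python
# Hypothetical sketch of one greedy threshold-based partitioning scheme,
# illustrating the inputs (pairwise similarity matrix, threshold) rather
# than reproducing the paper's algorithm.
import numpy as np

def partition(similarity, threshold):
    """similarity: symmetric (n, n) matrix; returns classes sorted by size."""
    n = similarity.shape[0]
    unassigned = set(range(n))
    classes = []
    while unassigned:
        members = list(unassigned)
        # Count neighbors above threshold within the still-unassigned pool
        sub = similarity[np.ix_(members, members)] >= threshold
        seed = members[int(np.argmax(sub.sum(axis=1)))]
        group = [j for j in members if j == seed or similarity[seed, j] >= threshold]
        classes.append(sorted(group))
        unassigned -= set(group)
    # Largest class first; small or singleton classes flag rare/anomalous members
    return sorted(classes, key=len, reverse=True)
```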