Abstract
Globally available environmental observations (EOs), specifically from satellites and coupled Earth system models, represent some of the largest datasets of the digital age. As the volume of global EOs continues to grow, so does the potential of these data to help Earth scientists discover trends and patterns in Earth systems at large spatial scales. To leverage global EOs for scientific insight, Earth scientists need targeted and accessible exposure to skills in reproducible scientific computing and spatiotemporal data science, and to be empowered to apply their domain understanding to interpret data-driven models for knowledge discovery. The Generalizable, Reproducible, Robust, and Interpreted Environmental (GRRIEn) analysis framework was developed to prepare Earth scientists with an introductory statistics background and limited/no understanding of programming and computational methods to use global EOs to successfully generalize insights from local/regional field measurements across unsampled times and locations. GRRIEn analysis is generalizable, meaning results from a sample are translated to landscape scales by combining direct environmental measurements with global EOs using supervised machine learning; robust, meaning that the model shows good performance on data with scale-dependent feature and observation dependence; reproducible, based on a standard repository structure so that other scientists can quickly and easily replicate the analysis with a few computational tools; and interpreted, meaning that Earth scientists apply domain expertise to ensure that model parameters reflect a physically plausible diagnosis of the environmental system. This tutorial presents standard steps for achieving GRRIEn analysis by combining conventions of rigor in traditional experimental design with the open-science movement.
Significance Statement
Earth science researchers in the digital age are often tasked with pioneering big data analyses, yet have limited formal training in statistics and computational methods such as databasing or computer programming. Earth science researchers often spend tremendous amounts of time learning core computational skills, and making core analytical mistakes, in the process of bridging this training gap, putting the reputability of observational geostatistical research at risk. The GRRIEn analytical framework is a practical guide introducing community standards for each phase of the computational research pipeline (dataset engineering, model training, and model diagnostics) to promote rigorous, accessible use of global EOs in Earth systems research.
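As a concrete illustration of the supervised-learning step at the heart of a GRRIEn analysis, the minimal Python sketch below pairs hypothetical field measurements with co-located EO-derived predictors and fits an interpretable model; the file name, column names, and model choice are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of a GRRIEn-style supervised step: field measurements paired
# with co-located global EO predictors (hypothetical file and column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

df = pd.read_csv("field_samples_with_eo_features.csv")  # assumed layout
X = df[["ndvi", "land_surface_temp", "precip_annual"]]   # EO-derived predictors
y = df["soil_organic_carbon"]                            # field-measured target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)

print("held-out R^2:", r2_score(y_test, model.predict(X_test)))
# Inspect standardized coefficients so domain experts can judge physical plausibility.
print(dict(zip(X.columns, model[-1].coef_)))
```

Keeping this script, the input data, and its outputs in a standard repository structure is what makes the analysis reproducible in the sense used above.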
Abstract
Two distinct features of anthropogenic climate change, warming in the tropical upper troposphere and warming at the Arctic surface, have competing effects on the midlatitude jet stream’s latitudinal position, often referred to as a “tug-of-war.” Studies that investigate the jet’s response to these thermal forcings show that it is sensitive to model type, season, initial atmospheric conditions, and the shape and magnitude of the forcing. Much of this past work focuses on studying a simulation’s response to external manipulation. In contrast, we explore the potential to train a convolutional neural network (CNN) on internal variability alone and then use it to examine possible nonlinear responses of the jet to tropospheric thermal forcing that more closely resembles anthropogenic climate change. Our approach leverages the idea behind the fluctuation–dissipation theorem, which relates the internal variability of a system to its forced response but has so far been used only to quantify linear responses. We train a CNN on data from a long control run of the CESM dry dynamical core and show that it is able to skillfully predict the nonlinear response of the jet to sustained external forcing. The trained CNN provides a quick method for exploring the jet stream sensitivity to a wide range of tropospheric temperature tendencies and, considering that this method can likely be applied to any model with a long control run, could be useful for early-stage experiment design.
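To make the mapping from a forcing pattern to a scalar jet response more concrete, the sketch below defines a small CNN that regresses a two-dimensional temperature-tendency field onto a jet-latitude shift. The grid size, layer choices, and the random placeholder data are assumptions for illustration; they are not the architecture or training data used in the study.

```python
# Illustrative (not the authors') CNN mapping a 2D temperature-tendency
# pattern to a scalar jet-latitude shift; grid size and layers are assumed.
import numpy as np
import tensorflow as tf

n_lat, n_lon = 64, 128
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_lat, n_lon, 1)),
    tf.keras.layers.Conv2D(16, 5, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted jet-latitude shift (degrees)
])
model.compile(optimizer="adam", loss="mse")

# Placeholder samples standing in for internal-variability states and jet anomalies.
X = np.random.randn(256, n_lat, n_lon, 1).astype("float32")
y = np.random.randn(256, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```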
Abstract
The objective of this paper is to employ machine learning (ML) and deep learning (DL) techniques to obtain, from input data (storm features) available in or derived from the HURDAT2 database, models capable of simulating important hurricane properties (e.g., landfall location and wind speed) consistent with historical records. In pursuit of this objective, a trajectory model providing the storm center in terms of longitude and latitude and intensity models providing the central pressure and maximum 1-min wind speed at 10-m elevation were created. The trajectory and intensity models are coupled and must be advanced together, 6 h at a time, as the features that serve as inputs to the models at any given step depend on predictions at the previous time steps. Once a synthetic storm database is generated, properties of interest, such as the frequencies of large wind speeds, may be extracted from any part of the simulation domain. The coupling of the trajectory and intensity models obviates the need for an intensity decay model inland of the coastline. Prediction results are compared with historical data, and the efficacy of the storm simulation models is evaluated at four sites: New Orleans, Louisiana; Miami, Florida; Cape Hatteras, North Carolina; and Boston, Massachusetts.
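The coupled, autoregressive advancement of the trajectory and intensity models can be summarized schematically as below. The two predict_* functions are hypothetical stand-ins, not the trained ML/DL models from the paper; only the 6-h stepping structure, in which each model consumes the other's previous prediction, reflects the description above.

```python
# Schematic coupling of trajectory and intensity models advanced together
# in 6-h steps. The predict_* functions are placeholders for illustration.
def predict_track_step(lat, lon, wind, pressure):
    # placeholder: drift northwest, nudged slightly by intensity
    return lat + 0.3 + 0.001 * wind, lon - 0.4

def predict_intensity_step(lat, lon, wind, pressure):
    # placeholder: weaken slowly at higher latitudes
    return wind - 0.5 * (lat > 30), pressure + 0.2 * (lat > 30)

def simulate_storm(lat, lon, wind, pressure, n_steps=40):
    track = [(lat, lon, wind, pressure)]
    for _ in range(n_steps):               # each step represents 6 h
        lat, lon = predict_track_step(lat, lon, wind, pressure)
        wind, pressure = predict_intensity_step(lat, lon, wind, pressure)
        track.append((lat, lon, wind, pressure))
    return track

synthetic_storm = simulate_storm(lat=25.0, lon=-75.0, wind=45.0, pressure=990.0)
```

Running many such coupled simulations from sampled initial conditions is what builds the synthetic storm database from which site-specific wind statistics can be extracted.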
Abstract
Many urban coastal communities are experiencing more profound flood impacts due to accelerated sea level rise that sometimes exceed their capacity to protect the built environment. In such cases, relocation may serve as a more effective hazard mitigation and adaptation strategy. However, it is unclear how urban residents living in flood-prone locations perceive the possibility of relocation and under what circumstances they would consider moving. Understanding the factors affecting an individual’s willingness to relocate because of coastal flooding is vital for developing accessible and equitable relocation policies. The main objective of this study is to identify the key considerations that would prompt urban coastal residents to consider permanent relocation because of coastal flooding. We leverage survey data collected from urban areas along the East Coast, assessing attitudes toward relocation, and design an artificial neural network (ANN) and a random forest (RF) model to find patterns in the survey data and indicate which considerations impact the decision to consider relocation. We trained the models to predict whether respondents would relocate because of socioeconomic factors, past exposure and experiences with flooding, and their flood-related concerns. Analyses performed on the models highlight the importance of flood-related concerns that accurately predict relocation behavior. Some common factors among the model analyses are concerns with increasing crime, the possibility of experiencing one more flood per year in the future, and more frequent business closures resulting from flooding.
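A minimal sketch of the random forest half of this workflow is given below: a classifier is trained on survey-style predictors and permutation importance is used to rank which considerations drive the predicted willingness to relocate. The feature names, synthetic responses, and label construction are assumptions for illustration, not the study's survey data or ANN/RF configurations.

```python
# Illustrative sketch: random forest on hypothetical survey responses, with
# permutation importance ranking the drivers of predicted relocation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = rng.integers(1, 6, size=(n, 4))  # Likert-scale answers (assumed encoding)
features = ["crime_concern", "floods_per_year", "business_closures", "income_bracket"]
y = (X[:, 0] + X[:, 1] + rng.normal(0, 1, n) > 6).astype(int)  # synthetic label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```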
Abstract
Accurate cloud-type identification and coverage analysis are crucial in understanding Earth’s radiative budget. Traditional computer vision methods rely on low-level visual features of clouds for estimating cloud coverage or sky conditions. Several handcrafted approaches have been proposed; however, scope for improvement still exists. Newer deep neural networks (DNNs) have demonstrated superior performance for cloud segmentation and categorization. These methods, however, need expert engineering intervention in the preprocessing steps—in the traditional methods—or human assistance in assigning cloud or clear-sky labels to a pixel for training DNNs. Such human mediation imposes considerable time and labor costs. We present the application of a new self-supervised learning approach to autonomously extract relevant features from sky images captured by ground-based cameras, for the classification and segmentation of clouds. We evaluate a joint embedding architecture that uses self-knowledge distillation plus regularization. We use two datasets to demonstrate the network’s ability to classify and segment sky images—one with ∼85 000 images collected from our ground-based camera and another with 400 labeled images from the WSISEG-Database. We find that this approach can discriminate full-sky images based on cloud coverage, diurnal variation, and cloud-base height. Furthermore, it semantically segments the cloud areas without labels. The approach shows competitive performance in all tested tasks, suggesting a new alternative for cloud characterization.
Significance Statement
Cloud macrophysical properties such as cloud-base height and coverage determine the amount of incoming radiation, mostly solar, and outgoing radiation, partly reflected from the sun and partly emitted from the Earth system, including the atmosphere. When this radiative budget is out of balance, it can affect our climate. Reporting sky conditions or cloud coverage from ground-based sky-imaging equipment is crucial in understanding Earth’s radiative budget. We present the application of a novel artificial intelligence approach to autonomously extract relevant features from sky images, for the characterization of atmospheric conditions. Unlike previous strategies, this novel approach requires reduced human intervention, suggesting a new path for cloud characterization.
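The sketch below outlines the general shape of a self-distillation joint-embedding setup of the kind described above, in which a student network is trained to match a slowly updated teacher across two augmented views of the same image. The toy encoder, augmentations, temperatures, and EMA rate are assumptions for illustration and omit details such as output centering; this is not the network used in the study.

```python
# Highly simplified self-distillation joint-embedding sketch (toy data).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(out_dim=64):
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256),
                         nn.ReLU(), nn.Linear(256, out_dim))

student, teacher = make_encoder(), make_encoder()
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
tau_s, tau_t, ema = 0.1, 0.04, 0.996

for step in range(10):                            # toy loop with random "sky images"
    imgs = torch.randn(16, 3, 32, 32)
    view1 = imgs + 0.1 * torch.randn_like(imgs)   # stand-in augmentations
    view2 = imgs + 0.1 * torch.randn_like(imgs)

    s_out = student(view1)
    with torch.no_grad():
        t_out = teacher(view2)

    # Cross-entropy between sharpened teacher and student output distributions.
    loss = torch.sum(-F.softmax(t_out / tau_t, dim=-1) *
                     F.log_softmax(s_out / tau_s, dim=-1), dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Teacher follows the student via an exponential moving average.
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(ema).add_((1 - ema) * ps)
```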
Abstract
Earth system models struggle to simulate clouds and their radiative effects over the Southern Ocean, partly due to a lack of measurements and targeted cloud microphysics knowledge. We have evaluated biases of downwelling shortwave radiation (SWdown) in the ERA5 climate reanalysis using 25 years (1995–2019) of summertime surface measurements, collected on the Research and Supply Vessel (RSV) Aurora Australis, the Research Vessel (R/V) Investigator, and at Macquarie Island. During October–March daylight hours, the ERA5 simulation of SWdown exhibited large errors (mean bias = 54 W m⁻², mean absolute error = 82 W m⁻², root-mean-square error = 132 W m⁻², and R² = 0.71). To determine whether we could improve these statistics, we bypassed ERA5’s radiative transfer model for SWdown with machine learning–based models using a number of ERA5’s gridscale meteorological variables as predictors. These models were trained and tested with the surface measurements of SWdown using a 10-fold shuffle split. An extreme gradient boosting (XGBoost) and a random forest–based model setup had the best performance relative to ERA5, both with a near-complete reduction of the mean bias error, a decrease in the mean absolute error and root-mean-square error by 25% ± 3%, and an increase in the R² value of 5% ± 1% over the 10 splits. Large improvements occurred at higher latitudes and in cyclone cold sectors, where ERA5 performed most poorly. We further interpret our methods using Shapley additive explanations. Our results indicate that data-driven techniques could have an important role in simulating surface radiation fluxes and in improving reanalysis products.
Significance Statement
Simulating the amount of sunlight reaching Earth’s surface is difficult because it relies on a good understanding of how much clouds absorb and scatter sunlight. Relative to summertime surface observations, the ERA5 reanalysis still overestimates the amount of sunlight entering the Southern Ocean. We taught some models how to predict the amount of sunlight entering the Southern Ocean using 25 years of surface observations and a small set of meteorological variables from ERA5. By bypassing the ERA5’s internal simulation of the absorption and scattering of sunlight, we can drastically reduce biases in the predicted surface shortwave radiation. Large improvements in cold sectors of cyclones and closer to Antarctica were observed in regions where many numerical models struggle to simulate the amount of incoming sunlight correctly.
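The evaluation strategy described above can be sketched as follows, with a random forest standing in for the XGBoost and random forest setups and synthetic arrays in place of the ERA5 predictors and surface SWdown observations; the predictor set and data are assumptions for illustration only.

```python
# Sketch of a 10-fold shuffle-split evaluation with a random forest regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 5))   # e.g., cloud cover, water vapour, solar zenith angle (assumed)
y = 300 * np.exp(-X[:, 0]) + 20 * X[:, 1] + rng.normal(0, 30, n)  # synthetic SWdown target

cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
maes, r2s = [], []
for train_idx, test_idx in cv.split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    maes.append(mean_absolute_error(y[test_idx], pred))
    r2s.append(r2_score(y[test_idx], pred))

print(f"MAE over 10 splits: {np.mean(maes):.1f} +/- {np.std(maes):.1f}")
print(f"R^2 over 10 splits: {np.mean(r2s):.2f}")
```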
Abstract
We assess the suitability of unpaired image-to-image translation networks for bias correcting data simulated by global atmospheric circulation models. We use the Unsupervised Image-to-Image Translation (UNIT) neural network architecture to map between data from the HadGEM3-A-N216 model and ERA5 reanalysis data in a geographical area centered on the South Asian monsoon, which has well-documented serious biases in this model. The UNIT network corrects cross-variable correlations and spatial structures but creates bias corrections with less extreme values than the target distribution. By combining the UNIT neural network with the classical technique of quantile mapping (QM), we can produce bias corrections that are better than either alone. The UNIT+QM scheme is shown to correct cross-variable correlations, spatial patterns, and all marginal distributions of single variables. The careful correction of such joint distributions is of high importance for compound extremes research.
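For reference, the classical quantile-mapping half of the UNIT+QM scheme can be written in a few lines: each model value is mapped onto the value with the same quantile in the reference (reanalysis) distribution. The sketch below uses synthetic gamma-distributed precipitation and illustrates only the marginal correction, not the UNIT network or the joint-distribution behavior discussed above.

```python
# Minimal empirical quantile-mapping sketch on synthetic precipitation data.
import numpy as np

def quantile_map(model_train, ref_train, model_new, n_quantiles=100):
    q = np.linspace(0, 1, n_quantiles)
    model_q = np.quantile(model_train, q)
    ref_q = np.quantile(ref_train, q)
    # Interpolate each new model value from model quantiles to reference quantiles.
    return np.interp(model_new, model_q, ref_q)

rng = np.random.default_rng(0)
model_precip = rng.gamma(shape=1.5, scale=4.0, size=5000)  # biased model (synthetic)
ref_precip = rng.gamma(shape=2.0, scale=3.0, size=5000)    # "observed" reference (synthetic)
corrected = quantile_map(model_precip, ref_precip, model_precip)
print(corrected.mean(), ref_precip.mean())
```

Because quantile mapping acts on one variable and one grid point at a time, it cannot by itself repair cross-variable correlations or spatial structure, which is the gap the UNIT network is used to fill.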
Abstract
The Multi-Radar Multi-Sensor (MRMS) system produces a suite of hydrometeorological products that are widely used for applications such as flash flood warning operations, water resource management, and climatological studies. The MRMS radar-based quantitative precipitation estimation (QPE) products face greater challenges in the western United States than in the eastern two-thirds of the contiguous United States (CONUS) because of terrain-related blockages and gaps in radar coverage. Further, orographic enhancement of precipitation often occurs, which is highly variable in space and time and difficult to accurately capture with physically based approaches. A deep learning approach was applied in this study to understand the correlations between several interacting variables and to obtain a more accurate precipitation estimation in these scenarios. The model presented here is a convolutional neural network (CNN), which uses spatial information from small grids of several radar variables to predict an estimated precipitation value at the central grid point. Several case analyses are presented along with a yearlong statistical evaluation. The CNN model 24-h QPE shows higher accuracy than the MRMS radar QPE for several cool-season atmospheric river events. Areas of consistent improvement from the CNN model are highlighted in the discussion along with areas where the model can be further improved. The initial findings from this work help set the foundation for further exploration of machine learning techniques and products for precipitation estimation as part of the MRMS operational system.
Significance Statement
This study explores the development and use of a deep learning model to generate precipitation fields in the complex terrain of the western United States. Generally, the model is able to improve on the statistical performance of existing radar-based precipitation estimation methods for several case studies and over a long-term period in 2021. We explore the patterns associated with certain areas of strong performance and suggest potential means of improving areas with weaker performance. These initial results indicate the potential of deep learning to supplement radar-based approaches in areas with observational limitations.
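The patch-to-point formulation described above can be illustrated with a small CNN that takes a stack of radar-variable grids centered on a point and regresses the precipitation value at that central grid point. The patch size, channel names, layer choices, and random placeholder data are assumptions for illustration; this is not the operational MRMS model.

```python
# Illustrative patch-to-point CNN: small grids of several radar variables
# regressed onto precipitation at the central grid point (toy data).
import numpy as np
import tensorflow as tf

patch, channels = 9, 4   # e.g., reflectivity, echo-top height, ZDR, brightband flag (assumed)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(patch, patch, channels)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="relu"),  # nonnegative precipitation estimate
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(512, patch, patch, channels).astype("float32")  # placeholder patches
y = np.random.rand(512, 1).astype("float32")                       # placeholder gauge values
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
```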
Abstract
This study focuses on assessing the representation and predictability of North American weather regimes, which are persistent large-scale atmospheric patterns, in a set of initialized subseasonal reforecasts created using the Community Earth System Model, version 2 (CESM2). The k-means clustering algorithm was used to extract four key North American (10°–70°N, 150°–40°W) weather regimes within ERA5 reanalysis, which were then used to interpret CESM2 subseasonal forecast performance. Results show that CESM2 can recreate the climatology of the four main North American weather regimes with skill but exhibits biases during later lead times with overoccurrence of the West Coast high regime and underoccurrence of the Greenland high and Alaskan ridge regimes. Overall, the West Coast high and Pacific trough regimes exhibited higher predictability within CESM2, partly related to El Niño. Despite biases, several reforecasts were skillful and exhibited high predictability during later lead times, which could be partly attributed to skillful representation of the atmosphere from the tropics to extratropics upstream of North America. The high predictability at the subseasonal time scale of these case-study examples was manifested as an “ensemble realignment,” in which most ensemble members agreed on a prediction despite ensemble trajectory dispersion during earlier lead times. Weather regimes were also shown to project distinct temperature and precipitation anomalies across North America that largely agree with observational products. This study further demonstrates that unsupervised learning methods can be used to uncover sources and limits of subseasonal predictability, along with systematic biases present in numerical prediction systems.
Significance Statement
North American weather regimes are large-scale atmospheric patterns that can persist for several days. Their skillful subseasonal (2 weeks or greater) prediction can provide valuable lead time to prepare for temperature and precipitation anomalies that can stress energy and water resources. The purpose of this study was to assess the climatological representation and subseasonal predictability of four key North American weather regimes using a research subseasonal prediction system and clustering analysis. We found that the Pacific trough and West Coast high regimes exhibited higher predictability than other regimes and that skillful representation of conditions across the tropics and extratropics can increase predictability during later lead times. Future work will quantify causal pathways associated with high predictability.
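Regime extraction of the kind described above reduces, in essence, to clustering flattened anomaly maps. The sketch below applies k-means with four clusters to synthetic geopotential-height anomalies over an assumed North American grid; the grid dimensions and placeholder data are assumptions, standing in for the ERA5 fields used in the study.

```python
# Sketch of weather-regime extraction: k-means on flattened anomaly maps.
import numpy as np
from sklearn.cluster import KMeans

n_days, n_lat, n_lon = 3000, 25, 45                 # assumed grid over 10°–70°N, 150°–40°W
z500_anom = np.random.randn(n_days, n_lat, n_lon)   # placeholder daily anomalies
X = z500_anom.reshape(n_days, -1)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
regime_labels = km.labels_                                        # daily regime assignment
regime_patterns = km.cluster_centers_.reshape(4, n_lat, n_lon)    # composite regime maps
print(np.bincount(regime_labels) / n_days)                        # climatological frequencies
```

Forecast performance can then be assessed by assigning each reforecast day to its nearest cluster center and comparing the predicted regime sequence and occurrence frequencies with those from reanalysis.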
Abstract
This study proposes and assesses a methodology to obtain high-quality probabilistic predictions and uncertainty information of near-landfall tropical cyclone–driven (TC-driven) storm tide and inundation with limited time and resources. Forecasts of TC track, intensity, and size are perturbed according to quasi-random Korobov sequences of historical forecast errors with assumed Gaussian and uniform statistical distributions. These perturbations are run in an ensemble of hydrodynamic storm tide model simulations. The resulting set of maximum water surface elevations is reduced in dimensionality using Karhunen–Loève expansions and then used as a training set to develop a polynomial chaos (PC) surrogate model from which global sensitivities and probabilistic predictions can be extracted. The maximum water surface elevation is extrapolated over dry points incorporating energy head loss with distance to properly train the surrogate for predicting inundation. We find that the surrogate constructed with third-order PCs using elastic net penalized regression with leave-one-out cross validation provides the most robust fit across training and test sets. Probabilistic predictions of maximum water surface elevation and inundation area by the surrogate model at 48-h lead time for three past U.S. landfalling hurricanes (Irma in 2017, Florence in 2018, and Laura in 2020) are found to be reliable when compared to best track hindcast simulation results, even when trained with as few as 19 samples. The maximum water surface elevation is most sensitive to perpendicular track-offset errors for all three storms. Laura is also highly sensitive to storm size and has the least reliable prediction.
Significance Statement
The purpose of this study is to develop and evaluate a methodology that can be used to provide high-quality probabilistic predictions of hurricane-induced storm tide and inundation with limited time and resources. This is important for emergency management purposes during or after the landfall of hurricanes. Our results show that sampling forecast errors using quasi-random sequences combined with machine learning techniques that fit polynomial functions to the data are well suited to this task. The polynomial functions also have the benefit of producing exact sensitivity indices of storm tide and inundation to the forecasted hurricane properties such as path, intensity, and size, which can be used for uncertainty estimation. The code implementing the presented methodology is publicly available on GitHub.
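The overall surrogate workflow can be sketched as follows: the ensemble of maximum water-surface-elevation fields is reduced with PCA (a discrete Karhunen–Loève expansion), and the leading modes are regressed on the perturbed storm parameters with third-order polynomials fitted by elastic net with leave-one-out cross validation. Ordinary polynomial features stand in here for the orthogonal polynomial chaos basis, and all data are synthetic; parameter names and dimensions are assumptions for illustration.

```python
# Rough sketch of a KL-reduction + polynomial surrogate workflow (synthetic data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_params, n_points = 19, 4, 1000   # e.g., track offset, speed, intensity, size (assumed)
theta = rng.uniform(-1, 1, size=(n_samples, n_params))   # perturbed storm parameters
eta = rng.normal(size=(n_samples, n_points))             # placeholder max elevation fields

pca = PCA(n_components=5).fit(eta)
modes = pca.transform(eta)                    # KL coefficients per ensemble member

surrogates = [
    make_pipeline(PolynomialFeatures(degree=3),
                  ElasticNetCV(cv=LeaveOneOut(), max_iter=50000)).fit(theta, modes[:, k])
    for k in range(modes.shape[1])
]

# Predict the full field for a new parameter set by reconstructing from the modes.
theta_new = rng.uniform(-1, 1, size=(1, n_params))
modes_new = np.column_stack([s.predict(theta_new) for s in surrogates])
eta_new = pca.inverse_transform(modes_new)
```

Because the surrogate is polynomial in the storm parameters, global sensitivity indices and probabilistic predictions can be obtained cheaply by evaluating it over many sampled parameter sets.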