Abstract
The implications of definitions of excessive rainfall observations on machine-learning model forecast skill are assessed using the Colorado State University Machine Learning Probabilities (CSU-MLP) forecast system. The CSU-MLP uses historical observations along with reforecasts from a global ensemble to train random forests to probabilistically predict excessive rainfall events. Here, random forest models are trained using two distinct rainfall datasets: one composed of fixed-frequency (FF) average recurrence interval exceedances and flash flood reports, and the other a compilation of flooding and rainfall proxies (Unified Flood Verification System; UFVS). Both models generate 1–3-day forecasts and are evaluated against a climatological baseline to characterize their overall skill as a function of lead time, season, and region. Model comparisons suggest that regional frequencies in excessive rainfall observations contribute to when and where the ML models issue forecasts, and subsequently to their skill and reliability. Additionally, the spatiotemporal distribution of observations has implications for ML model training requirements, notably how long an observational record is needed to obtain skillful forecasts. Experiments reveal that shorter-trained UFVS-based models can be as skillful as longer-trained FF-based models. In essence, the UFVS dataset exhibits a more robust characterization of excessive rainfall and its impacts, and machine learning models trained on more representative datasets of meteorological hazards may not require as extensive training to generate skillful forecasts.
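The abstract above evaluates probabilistic forecasts against a climatological baseline; that kind of comparison is commonly expressed with the Brier skill score. A minimal sketch, using toy probabilities and outcomes rather than anything from the study:

```python
# Brier skill score against a climatological reference forecast.
# Positive BSS means the probabilistic forecast beats climatology.

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, climo_prob):
    """BSS = 1 - BS_forecast / BS_climatology."""
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([climo_prob] * len(outcomes), outcomes)
    return 1.0 - bs / bs_ref

outcomes = [1, 0, 0, 1, 0, 0, 0, 0]                      # observed exceedance events (toy)
forecast = [0.8, 0.1, 0.2, 0.7, 0.1, 0.0, 0.2, 0.1]      # forecast probabilities (toy)
climo = sum(outcomes) / len(outcomes)                    # climatological event frequency
bss = brier_skill_score(forecast, outcomes, climo)       # ≈ 0.84 for these numbers
```

The climatological reference is the constant forecast equal to the long-run event frequency, which is exactly the baseline a trained model must beat to be called skillful.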
Abstract
Probabilistic forecasts derived from ensemble prediction systems (EPS) have become the standard basis for many products and services produced by modern operational forecasting centres. However, statistical post-processing is generally required to ensure forecasts have the desired properties expected for probability-based outputs. Precipitation, a core component of any operational forecast, is particularly challenging to calibrate due to its discontinuous nature and the extreme skew in rainfall amounts. A skillful forecasting system must maintain accuracy for low-to-moderate precipitation amounts, but preserve resolvability for high-to-extreme rainfall amounts, which, though rare, are important to forecast accurately in the interest of public safety. Existing statistical and machine-learning approaches to rainfall calibration address this problem, but each has drawbacks in design, training approaches, and/or performance. We describe RainForests, a machine-learning approach for calibrating ensemble rainfall forecasts using gradient-boosted decision trees. The model is based on the ecPoint system recently developed at ECMWF by Hewson and Pillosu (2021), but uses machine-learning models in place of the semi-subjective decision trees of ecPoint, along with some other improvements to the model structure. We evaluate RainForests on the Australian domain against some simple benchmarks, and show that it outperforms standard calibration approaches both in overall skill and in accurately forecasting high rainfall conditions, while being computationally efficient enough to be used in an operational forecasting system.
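The gradient-boosted decision trees at the core of RainForests can be illustrated with a deliberately tiny boosting loop over two-leaf regression stumps. The data, learning rate, and squared-error loss here are invented for illustration and bear no relation to the operational model:

```python
# Toy gradient boosting: each round fits a depth-1 stump to the current
# residuals and adds a scaled copy of it to the ensemble prediction.

def fit_stump(x, residuals):
    """Find the split on x minimising squared error of a two-leaf fit."""
    best = None
    for thr in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda xi: lmean if xi <= thr else rmean

def boost(x, y, rounds=20, lr=0.3):
    """Additive ensemble of stumps fitted to successive residuals."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # toy predictor (e.g. an ensemble-mean value)
y = [0.1, 0.0, 0.2, 2.0, 2.1, 1.9]   # toy step-like rainfall response
model = boost(x, y)
```

Operational systems use mature libraries rather than hand-rolled loops, but the residual-fitting structure is the same idea the abstract invokes.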
Abstract
In this study, we introduce an ensemble approach to provide a probabilistic seasonal outlook of the length and seasonal rainfall anomaly of the wet season over Florida using the observed variations of the onset date of the season at a granularity of ∼10 km grid resolution (the spatial resolution of the observed rainfall data used for this work). The time series of daily precipitation at the grid resolution of NASA’s Global Precipitation Measurement mission is randomly perturbed 1000 times to account for the uncertainty that synoptic- to mesoscale variations introduce into the diagnosis of the onset and demise dates of the wet season. The strong co-variability of the onset date with the seasonal length and seasonal rainfall anomaly of the wet season is then leveraged to provide the seasonal outlooks by monitoring the onset date of the wet season. This simple seasonal outlook is effective in predicting extreme tercile and even extreme pentile anomalies across Florida. We suggest that the proposed approach to the seasonal outlook of the wet season of Florida provides a viable alternative in the absence of strong external forcing like ENSO or tropical Atlantic variability, which potentially limits the predictability of numerical climate models used for seasonal prediction.
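The perturbation-ensemble idea above can be sketched in a few lines: perturb the daily rainfall series many times and re-diagnose the onset each time, turning a single deterministic date into a distribution. The onset criterion here (first day a 5-day running total exceeds a threshold) and the noise model are placeholders, not the study's actual diagnostics:

```python
import random

def onset_day(rain, window=5, threshold=20.0):
    """Hypothetical onset: first day whose 5-day running total reaches the threshold."""
    for d in range(len(rain) - window + 1):
        if sum(rain[d:d + window]) >= threshold:
            return d
    return None

def onset_distribution(rain, n_members=1000, noise=0.3, seed=42):
    """Re-diagnose onset for many randomly perturbed copies of the series."""
    rng = random.Random(seed)
    days = []
    for _ in range(n_members):
        perturbed = [max(0.0, r * (1 + rng.uniform(-noise, noise))) for r in rain]
        d = onset_day(perturbed)
        if d is not None:
            days.append(d)
    return days

rain = [0, 1, 0, 2, 1, 3, 8, 9, 7, 10, 12, 6, 5]   # toy daily rainfall (mm)
days = onset_distribution(rain)                     # spread of plausible onset days
```

The spread of `days` is what supports a probabilistic, rather than single-date, outlook.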
Abstract
A skill baseline for five-day, 34-, 50-, and 64-knot (1 kt = 0.514 m s−1) tropical cyclone (TC) wind radii forecasts is described. The Markov Model CLiper (MMCL) generates a sequence of 12-h forecasts out to a forecast length limited only by the length of the forecast track and intensity. The model employs a climatology of TC size based on infrared satellite imagery, a Markov chain, and a basin-specific drift. MMCL uses the initial wind radii and initial forecast track and intensity as input. Unlike the previously developed wind radii climatology and persistence model (DRCL), which reverts to a climatological size and shape after approximately 48 h, MMCL retains more of its initial size and asymmetry and is likely more palatable for use in operational forecasting. MMCL runs operationally in the western North Pacific basin, the North Indian Ocean, and the Southern Hemisphere for the Joint Typhoon Warning Center (JTWC) in Pearl Harbor, Hawaii. This work also describes the development of Atlantic and eastern North Pacific versions of MMCL. MMCL’s formulation allows unlimited extension of forecast lead time without reverting to a generic climatological size and shape. Independent forecast comparisons between MMCL and DRCL for the 2020–2022 seasons demonstrate that MMCL’s mean absolute errors are generally smaller and its biases closer to zero in the North Atlantic and eastern North Pacific basins and in the Southern Hemisphere. This validation includes a few example forecasts and demonstrates that MMCL can be used both as a baseline for assessing wind radii forecast skill and for operational use.
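The Markov-chain structure described above, in which each 12-h step blends persistence, climatology, and a basin-specific drift, can be sketched as an AR(1)-style update. The coefficients below are illustrative only, not MMCL's fitted values:

```python
# Schematic 12-h Markov step for a wind radius forecast: a high persistence
# weight (alpha) means the forecast retains its initial size longer before
# relaxing toward climatology, which is MMCL's advantage over DRCL.

def march(radius_nm, climo_nm, steps, alpha=0.95, drift_nm=1.0):
    """Iterate 12-h updates: r <- alpha*r + (1-alpha)*climo + drift."""
    out = [radius_nm]
    for _ in range(steps):
        radius_nm = alpha * radius_nm + (1 - alpha) * climo_nm + drift_nm
        out.append(radius_nm)
    return out

# Five-day forecast (ten 12-h steps) of a 34-kt wind radius starting at
# 150 n mi against a 100 n mi climatological size:
traj = march(150.0, 100.0, steps=10)
```

Because the update rule is stationary, the chain can be iterated to any lead time, matching the abstract's point that forecast length is limited only by the available track and intensity forecast.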
Abstract
We describe the development of the wave component in the first global-scale coupled operational forecast system using the Unified Forecasting System (UFS) at the National Oceanic and Atmospheric Administration (NOAA), part of the US National Weather Service (NWS) operational forecasting suite. The operational implementation of the atmosphere-wave coupled Global Ensemble Forecast System version 12 (GEFSv12) in September 2020 was a critical step in NOAA’s transition to the broader community-based UFS framework. GEFSv12 represents a significant advancement, extending forecast ranges and empowering the NWS to deliver advanced weather predictions with extended lead times for high-impact events. The integration of a coupled wave component with higher spatial and temporal resolution and optimized physics parameterizations notably enhanced forecast skill and predictability, particularly benefiting winter storm predictions of wave heights and peak wave periods. This successful endeavor encountered challenges that were addressed by the simultaneous development of new features that enhanced wave model forecast skill and product quality and facilitated by a multidisciplinary team collaborating with NOAA’s operational forecasting centers. The GEFSv12 upgrade marks a pivotal shift in NOAA’s global forecasting capabilities, setting a new standard in wave prediction. We also describe the coupled GEFSv12-Wave component impacts on NOAA operational forecasts, and ongoing experimental enhancements, which altogether represent a substantial contribution to NOAA’s transition to the fully-coupled UFS framework.
Abstract
Reanalysis proximity vertical profile attributes associated with long-track tornadoes [LTTs; pathlength ≥48 km (30 mi)] and short-track tornadoes [STTs; pathlengths <48 km (30 mi)] for a total of 48 212 tornadoes with pathlengths ≥0.16 km (0.1 mi) from 1979–2022 in the United States were examined. Both longer- and shorter-track tornadoes were associated with vast ranges of mixed-layer convective available potential energy, together with relatively low mixed-layer lifted condensation level heights and minimal convective inhibition. A large range of 500–9000-m wind speeds and bulk wind differences, 500–3000-m streamwise vorticities, storm-relative helicities, and storm-relative wind speeds were found for STTs. In stark contrast, LTTs only occurred when these kinematic attributes were larger in amplitude through the troposphere, supporting previously documented associations between observed longer-track tornado pathlengths and faster-propagating parent storms. A novel parameter, heavily weighted by kinematic parameters and lightly weighted by thermodynamic parameters, outperformed the significant tornado parameter in differentiating environments that were more supportive of both LTTs and tornadoes rated <EF5. The high correlation (R2 = 0.79) between tornado pathlength and Bunkers’ approximate tornado duration (pathlength/VBunkers) calls for improved understanding of mesocyclone periodicities, which impact tornado longevity, to improve tornado pathlength diagnoses and forecasts. Pragmatically, diagnosing LTT environments using vertical profile attributes is perhaps not so much a problem of determining when there might be higher expectations for LTTs, but rather of determining when there might be lower expectations for LTTs, e.g., weaker kinematic attributes in the lower troposphere.
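A composite parameter that weights kinematic ingredients heavily and thermodynamic ingredients lightly, as the abstract describes, can be sketched as below. The normalization constants and weights are invented for illustration; they are not the published formulation:

```python
# Hypothetical kinematically weighted composite: ingredients are normalised
# by rough "typical" values and combined with a heavy kinematic weight.
# All constants here are illustrative placeholders.

def composite(srh_0_3km, bulk_shear_ms, mlcape_jkg,
              w_kin=0.8, w_thermo=0.2):
    kin = (srh_0_3km / 150.0) * (bulk_shear_ms / 20.0)   # kinematic term
    thermo = mlcape_jkg / 1500.0                          # thermodynamic term
    return w_kin * kin + w_thermo * thermo

strong_kinematic = composite(300.0, 40.0, 1500.0)   # large SRH and shear
weak_kinematic = composite(150.0, 20.0, 3000.0)     # large CAPE, modest shear
```

With this weighting, doubling CAPE barely moves the parameter, while doubling the shear and helicity moves it a lot, which mirrors the abstract's finding that LTT environments are distinguished kinematically rather than thermodynamically.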
Abstract
Prediction of severe convective storms at timescales of 2–4 weeks is of interest to forecasters and stakeholders due to their impacts on life and property. Prediction on this timescale is challenging, since the large-scale weather patterns that drive this activity begin to lose dynamic predictability beyond week 1. Previous work on severe convective storms at the subseasonal timescale has mostly focused on observed relationships with teleconnections; the skill of numerical weather prediction forecasts of convection-related variables has been comparatively less explored. In this study over the United States, a forecast evaluation of variables relevant to the prediction of severe convective storms is conducted using Global Ensemble Forecast System Version 12 reforecasts at lead times up to four weeks. We find that kinematic and thermodynamic fields are predicted with skill out to week 3 in some cases, while composite parameters struggle to achieve meaningful skill into week 2. Additionally, using a novel method of weekly summations of daily maximum composite parameters, we suggest that aggregation of certain variables may assist in providing additional predictability beyond week 1. These results should serve as a reference for forecast skill for the relevant fields and help inform the development of convective forecasting tools at timescales beyond current operational products.
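The weekly-summation method mentioned above can be sketched directly: take the daily maximum of a composite parameter, then sum over consecutive 7-day blocks so a week-long total, rather than any single day, carries the predictive signal. The values below are invented:

```python
# Aggregate daily maxima of a composite severe-weather parameter into
# weekly totals, smoothing over day-to-day timing errors at long leads.

def weekly_sums(daily_max, days_per_week=7):
    """Sum daily maxima over consecutive 7-day blocks."""
    return [sum(daily_max[i:i + days_per_week])
            for i in range(0, len(daily_max), days_per_week)]

daily_max = [0.1, 0.0, 1.2, 2.5, 0.3, 0.0, 0.4,   # week 1 (toy values)
             0.0, 0.2, 0.1, 3.1, 1.0, 0.6, 0.0]   # week 2 (toy values)
totals = weekly_sums(daily_max)                   # one total per week
```

The aggregation trades timing precision for robustness, which is why it can retain skill at leads where daily composite parameters do not.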
Abstract
Producing a quantitative snowfall forecast (QSF) typically requires a model quantitative precipitation forecast (QPF) and snow-to-liquid ratio (SLR) estimate. QPF and SLR can vary significantly in space and time over complex terrain, necessitating fine-scale or point-specific forecasts of each component. Little Cottonwood Canyon (LCC) in Utah’s Wasatch Range frequently experiences high-impact winter storms and avalanche closures that result in substantial transportation and economic disruptions, making it an excellent testbed for evaluating snowfall forecasts. In this study, we validate QPFs, SLR forecasts, and QSFs produced by or derived from the Global Forecast System (GFS) and High-Resolution Rapid Refresh (HRRR) using liquid precipitation equivalent (LPE) and snowfall observations collected during the 2019/20–2022/23 cool seasons at the Alta–Collins snow-study site (2945 m MSL) in upper LCC. The 12-h QPFs produced by the GFS and HRRR underpredict the total LPE during the four cool seasons by 33% and 29%, respectively, and underpredict 50th, 75th, and 90th percentile event frequencies. Current operational SLR methods exhibit mean absolute errors of 4.5–7.7. In contrast, a locally trained random forest algorithm reduces SLR mean absolute errors to 3.7. Despite the random forest producing more accurate SLR forecasts, QSFs derived from operational SLR methods produce higher critical success indices since they exhibit positive SLR biases that offset negative QPF biases. These results indicate an overall underprediction of LPE by operational models in upper LCC and illustrate the need to identify sources of QSF bias to enhance QSF performance.
Significance Statement
Winter storms in mountainous terrain can disrupt transportation and threaten life and property due to road snow and avalanche hazards. Snow-to-liquid ratio (SLR) is an important variable for snowfall and avalanche forecasts. Using high-quality historical snowfall observations and atmospheric analyses, we developed a machine learning technique for predicting SLR at a high mountain site in Utah’s Little Cottonwood Canyon that is prone to closure due to winter storms. This technique produces improved SLR forecasts for use by weather forecasters and snow-safety personnel. We also show that current operational models and SLR techniques underforecast liquid precipitation amounts and overforecast SLRs, respectively, which has implications for future model development.
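The QSF = QPF × SLR decomposition in the abstract above can be sketched as follows. The SLR estimator here is a crude stand-in for the locally trained random forest, with invented coefficients:

```python
# Quantitative snowfall forecast (QSF) from a liquid-equivalent forecast
# (QPF) and a snow-to-liquid ratio (SLR) estimate. The SLR function is a
# hypothetical placeholder: the study trains a random forest on local data.

def estimate_slr(temp_c, wind_speed_ms):
    """Toy SLR: colder, calmer conditions favour higher ratios (illustrative only)."""
    slr = 12.0 + max(0.0, -temp_c) * 0.5 - wind_speed_ms * 0.2
    return max(5.0, slr)

def qsf_mm(qpf_mm, temp_c, wind_speed_ms):
    """Snowfall = liquid equivalent times snow-to-liquid ratio."""
    return qpf_mm * estimate_slr(temp_c, wind_speed_ms)

# 10 mm of liquid at -10 C with 5 m/s wind: SLR = 12 + 5 - 1 = 16,
# giving 160 mm of snow.
snow_mm = qsf_mm(10.0, -10.0, 5.0)
```

The multiplicative structure is what makes the abstract's bias-compensation finding possible: a positive SLR bias can offset a negative QPF bias and still yield a usable snowfall total.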
Abstract
Accurate tropical cyclogenesis (TCG) prediction is important because it allows national operational forecasting agencies to issue timely warnings and implement effective disaster prevention measures. In 2020, the Korea Meteorological Administration employed a self-developed operational model called the Korean Integrated Model (KIM). In this study, we verified KIM’s TCG forecast skill over the western North Pacific. Based on 9-day forecasts, TCG in the model was objectively detected and classified as well-predicted, early formation, late formation, miss, or false alarm by comparing formation times and locations with those of 46 tropical cyclones (TCs) from June to November in 2020–21 documented by the Joint Typhoon Warning Center. The prediction of large-scale environmental conditions relevant to TCG was also evaluated. The results showed that KIM’s probability of detection was comparable to or better than previously reported statistics for other numerical weather prediction models. The intrabasin comparison revealed that the probability of detection was highest in the Philippine Sea, followed by the South China Sea and central Pacific. The best TCG prediction performance in the Philippine Sea was supported by unbiased forecasts of the large-scale environment. The missed and false alarm cases in all three regions had the largest prediction biases in the large-scale lower-tropospheric relative vorticity. Excessive false alarms may be associated with prediction biases in the vertical gradient of equivalent potential temperature within the boundary layer. This study serves as a primary guide for national forecasters and is useful to model developers for further refinement of KIM.
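The detection-and-classification step above, matching predicted genesis events to observed TCs by formation time and location, can be sketched as below. The matching thresholds are illustrative, not those used in the study:

```python
# Classify a predicted genesis event against an observed TC: a prediction
# must fall within a distance window to count as a match, then its timing
# error decides the category. Window sizes are hypothetical placeholders.

def classify(pred_time_h, pred_dist_km, obs_time_h,
             time_window_h=24, dist_km=500):
    """Label one predicted genesis event relative to one observed TC."""
    if pred_dist_km > dist_km:
        return "false alarm"          # no spatial match
    dt = pred_time_h - obs_time_h
    if abs(dt) <= time_window_h:
        return "well-predicted"
    return "early formation" if dt < -time_window_h else "late formation"
```

Observed TCs never matched by any prediction would then be counted as misses, completing the five categories listed in the abstract.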