Search Results
Showing 1–10 of 40 items for Author or Editor: Craig S. Schwartz
Abstract
Analyses with 20-km horizontal grid spacing were produced from continuously cycling three-dimensional variational (3DVAR), ensemble square root Kalman filter (EnSRF), and “hybrid” variational–ensemble data assimilation (DA) systems over a domain spanning the conterminous United States. These analyses initialized 36-h Weather Research and Forecasting (WRF) Model forecasts containing a large convection-allowing 4-km nested domain; the 4-km forecasts were initialized by downscaling the 20-km 3DVAR, EnSRF, and hybrid analyses. Overall, hybrid analyses initialized the best 4-km precipitation forecasts.
Furthermore, whether 4-km precipitation forecasts could be improved by initializing them with true 4-km analyses was assessed. As it was computationally infeasible to produce 4-km continuously cycling ensembles over the large 4-km domain, several “dual-resolution” hybrid DA configurations were adopted where 4-km backgrounds were combined with 20-km ensembles to produce 4-km hybrid analyses. Additionally, 4-km 3DVAR analyses were produced.
In both hybrid and 3DVAR frameworks, initializing 4-km forecasts with true 4-km analyses, rather than downscaled 20-km analyses, yielded superior precipitation forecasts over the first 12 h. Differences between forecasts initialized from 4-km and downscaled 20-km hybrid analyses were smaller for 18–36-h forecasts, but there were occasionally meaningful differences. Continuously cycling the 4-km backgrounds and using static background error covariances with larger horizontal length scales in the hybrid led to better forecasts. All hybrid-initialized forecasts, including those initialized from downscaled 20-km analyses, were more skillful than forecasts initialized from 4-km 3DVAR analyses, suggesting the analysis method was more important than analysis resolution.
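For readers less familiar with the “hybrid” terminology used throughout these abstracts: in a hybrid system, the analysis increment is drawn from a weighted combination of a static, climatological background error covariance and a flow-dependent, localized ensemble covariance. A standard generic form (the weights and symbols here are illustrative, not taken from the paper) is:

```latex
% Hybrid background-error covariance: convex combination of a static
% climatological covariance B_s and a localized ensemble covariance.
\mathbf{B}_{\mathrm{hyb}} = \beta_s \,\mathbf{B}_s
  + \beta_e \left( \mathbf{P}^{e} \circ \mathbf{C} \right),
  \qquad \beta_s + \beta_e = 1,
```

where \(\mathbf{P}^{e}\) is the sample covariance of the (here, 20-km) ensemble, \(\mathbf{C}\) is a localization matrix, and \(\circ\) denotes the elementwise (Schur) product. The “static background error covariances with larger horizontal length scales” mentioned above correspond to \(\mathbf{B}_s\).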
Abstract
Two sets of global, 132-h (5.5-day), 10-member ensemble forecasts were produced with the Model for Prediction Across Scales (MPAS) for 35 cases in April and May 2017. One MPAS ensemble had a quasi-uniform 15-km mesh while the other employed a variable-resolution mesh with 3-km cell spacing over the conterminous United States (CONUS) that smoothly relaxed to 15 km over the rest of the globe. Precipitation forecasts from both MPAS ensembles were objectively verified over the central and eastern CONUS to assess the potential benefits of configuring MPAS with a 3-km mesh refinement region for medium-range forecasts. In addition, forecasts from NCEP’s operational Global Ensemble Forecast System were evaluated and served as a baseline against which to compare the experimental MPAS ensembles. The 3-km MPAS ensemble most faithfully reproduced the observed diurnal cycle of precipitation throughout the 132-h forecasts and had superior precipitation skill and reliability over the first 48 h. However, after 48 h, the three ensembles had more similar spread, reliability, and skill, and differences between probabilistic precipitation forecasts derived from the 3- and 15-km MPAS ensembles were typically statistically insignificant. Nonetheless, despite fewer benefits of increased resolution for spatial placement after 48 h, 3-km ensemble members explicitly provided potentially valuable guidance regarding convective mode throughout the 132-h forecasts while the other ensembles did not. Collectively, these results suggest both strengths and limitations of medium-range high-resolution ensemble forecasts and reveal pathways for future investigations to improve understanding of high-resolution global ensembles with variable-resolution meshes.
Abstract
As high-resolution numerical weather prediction models are now commonplace, “neighborhood” verification metrics are regularly employed to evaluate forecast quality. These neighborhood approaches relax the requirement that perfect forecasts must match observations at the grid scale, contrasting traditional point-by-point verification methods. One recently proposed metric, the neighborhood equitable threat score, is calculated from 2 × 2 contingency tables that are populated within a neighborhood framework. However, the literature suggests three subtly different methods of populating neighborhood-based contingency tables. Thus, this work compares and contrasts these three variants and shows they yield statistically significantly different conclusions regarding forecast performance, illustrating that neighborhood-based contingency tables should be constructed carefully and transparently. Furthermore, this paper shows how two of the methods use inconsistent event definitions and suggests a “neighborhood maximum” approach be used to fill neighborhood-based contingency tables.
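To make the suggested “neighborhood maximum” approach concrete, here is a minimal sketch assuming a regular grid, a square neighborhood, and threshold-exceedance events (the footprint shape and implementation details are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def neighborhood_max_contingency(fcst, obs, thresh, radius):
    """Populate a 2x2 contingency table via the 'neighborhood maximum'
    approach: an event occurs at a grid box if the field meets the
    threshold anywhere within `radius` grid boxes of it."""
    size = 2 * radius + 1  # square neighborhood; a circular footprint also works
    f_event = maximum_filter(fcst, size=size) >= thresh
    o_event = maximum_filter(obs, size=size) >= thresh
    hits = np.sum(f_event & o_event)
    misses = np.sum(~f_event & o_event)
    false_alarms = np.sum(f_event & ~o_event)
    correct_negs = np.sum(~f_event & ~o_event)
    return hits, misses, false_alarms, correct_negs

def equitable_threat_score(hits, misses, false_alarms, correct_negs):
    """Standard ETS computed from the neighborhood-based table."""
    n = hits + misses + false_alarms + correct_negs
    hits_random = (hits + misses) * (hits + false_alarms) / n
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)
```

Because both fields are filtered with the same neighborhood maximum before the point-by-point comparison, forecast and observed events share a single, consistent definition, which is the crux of the paper's recommendation.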
Abstract
Four convection-permitting Weather Research and Forecasting Model (WRF) forecasts were produced in an attempt to replicate the record-breaking rainfall across the Colorado Front Range between 1200 UTC 11 September and 1200 UTC 13 September 2013. A nested WRF domain with 4- and 1-km horizontal grid spacings was employed, and sensitivity to initial conditions (ICs) and microphysics (MP) parameterizations was examined. Rainfall forecasts were compared to gridded observations produced by National Weather Service River Forecast Centers and gauge measurements from the Community Collaborative Rain, Hail and Snow Network (CoCoRaHS). All 1-km forecasts produced 48-h rainfall exceeding 250 mm over portions of the Colorado Front Range and were more consistent with observations than the 4-km forecasts. While localized sensitivities to both ICs and MP were noted, systematic differences were not attributable to the varied ICs or MP schemes. At times, the 1-km forecasts produced precipitation structures similar to those observed, but none of the forecasts successfully captured the observed mesoscale evolution of the entire rainfall event. Nonetheless, as all 1-km forecasts produced torrential rainfall over the Colorado Front Range, these forecasts could have been useful guidance for this event.
Abstract
Analyses with 20-km horizontal grid spacing were produced from parallel continuously cycling three-dimensional variational (3DVAR), ensemble square root Kalman filter (EnSRF), and “hybrid” variational–ensemble data assimilation (DA) systems between 0000 UTC 6 May and 0000 UTC 21 June 2011 over a domain spanning the contiguous United States. Beginning 9 May, the 0000 UTC analyses initialized 36-h Weather Research and Forecasting Model (WRF) forecasts containing a large convection-permitting 4-km nest. These 4-km 3DVAR-, EnSRF-, and hybrid-initialized forecasts were compared to benchmark WRF forecasts initialized by interpolating 0000 UTC Global Forecast System (GFS) analyses onto the computational domain.
While important differences regarding mean state characteristics of the 20-km DA systems were noted, verification efforts focused on the 4-km precipitation forecasts. The 3DVAR-, hybrid-, and EnSRF-initialized 4-km precipitation forecasts performed similarly regarding general precipitation characteristics, such as timing of the diurnal cycle, and all three forecast sets had high precipitation biases at heavier rainfall rates. However, meaningful differences emerged regarding precipitation placement as quantified by the fractions skill score. For most forecast hours, the hybrid-initialized 4-km precipitation forecasts were better than the EnSRF-, 3DVAR-, and GFS-initialized forecasts, and the improvement was often statistically significant at the 95% confidence level. These results demonstrate the potential of limited-area continuously cycling hybrid DA configurations and suggest additional hybrid development is warranted.
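For reference, the fractions skill score (FSS; Roberts and Lean 2008) used for the placement verification compares neighborhood fractional coverages of threshold exceedance in the forecast and observations:

```latex
\mathrm{FSS} = 1 -
\frac{\sum_{i=1}^{N}\left(P_{f,i}-P_{o,i}\right)^{2}}
     {\sum_{i=1}^{N}P_{f,i}^{2}+\sum_{i=1}^{N}P_{o,i}^{2}},
```

where \(P_{f,i}\) and \(P_{o,i}\) are the fractions of grid boxes within the neighborhood of point \(i\) that exceed the precipitation threshold in the forecast and observations, respectively; FSS ranges from 0 (no skill) to 1 (perfect placement at the neighborhood scale).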
Abstract
Hourly accumulated precipitation forecasts from deterministic convection-allowing numerical weather prediction models with 3- and 1-km horizontal grid spacing were evaluated over 497 forecasts between 2010 and 2017 over the central and eastern conterminous United States (CONUS). While precipitation biases varied geographically and seasonally, 1-km model climatologies of precipitation generally aligned better with those observed than 3-km climatologies. Additionally, during the cool season and spring, when large-scale forcing was strong and precipitation entities were large, 1-km forecasts were more skillful than 3-km forecasts, particularly over southern portions of the CONUS where instability was greatest. Conversely, during summertime, when synoptic-scale forcing was weak and precipitation entities were small, 3- and 1-km forecasts had similar skill. These collective results differ substantially from previous work finding that 4-km forecasts had springtime precipitation forecast skill comparable to that of 1- or 2-km forecasts over the central–eastern CONUS. Additional analyses and experiments suggest the greater benefits of 1-km forecasts documented here could be related to higher-quality initial conditions than in prior studies. However, further research is needed to confirm this hypothesis.
Abstract
“Neighborhood approaches” have been used in two primary ways to postprocess and verify high-resolution ensemble output. While the two methods appear deceptively similar, they define events over different spatial scales and yield fields with different interpretations: the first produces probabilities interpreted as likelihood of event occurrence at the grid scale, while the second produces probabilities of event occurrence over spatial scales larger than the grid scale. Unfortunately, some studies have confused the two methods, while others did not acknowledge multiple possibilities of neighborhood approach application and simply stated, “a neighborhood approach was applied” without supporting details. Thus, this paper reviews applications of neighborhood approaches to convection-allowing ensembles in hopes of clarifying the two methods and their different event definitions. Then, using real data, it is demonstrated how the two approaches can yield statistically significantly different objective conclusions about model performance, underscoring the critical need for thorough descriptions of how neighborhood approaches are implemented and events are defined. The authors conclude by providing some recommendations for application of neighborhood approaches to convection-allowing ensembles.
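A minimal sketch of the two event definitions, as commonly implemented for convection-allowing ensembles, may help distinguish them (square neighborhoods and threshold-exceedance events are assumed; the function names follow the literature's “NEP”/“NMEP” shorthand but the code itself is illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter

def nep(members, thresh, radius):
    """Method 1 (neighborhood ensemble probability): average, over members,
    of each member's fractional event coverage within the neighborhood.
    Interpreted as the probability of the event AT the grid scale."""
    size = 2 * radius + 1
    binary = (members >= thresh).astype(float)  # members: (n_mem, ny, nx)
    coverage = np.stack([uniform_filter(b, size=size) for b in binary])
    return coverage.mean(axis=0)

def nmep(members, thresh, radius):
    """Method 2 (neighborhood maximum ensemble probability): fraction of
    members with an event ANYWHERE in the neighborhood. Interpreted as the
    probability of the event WITHIN the neighborhood, i.e., over a spatial
    scale larger than the grid scale."""
    size = 2 * radius + 1
    any_event = np.stack([maximum_filter(m, size=size) >= thresh
                          for m in members])
    return any_event.mean(axis=0)
```

The two fields can differ substantially for the same ensemble, which is why “a neighborhood approach was applied” is an incomplete description on its own.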
Abstract
Cutoff lows are often associated with high-impact weather; therefore, it is critical that operational numerical weather prediction systems accurately represent the evolution of these features. However, medium-range forecasts of upper-level features using the Global Forecast System (GFS) are often subjectively characterized by excessive synoptic progressiveness, i.e., a tendency to advance troughs and cutoff lows too quickly downstream. To better understand synoptic progressiveness errors, this research quantifies seven years of 500-hPa cutoff low position errors over the globe, with the goal of objectively identifying regions where synoptic progressiveness errors are common and how frequently these errors occur. Specifically, 500-hPa features are identified and tracked in 0–240-h 0.25° GFS forecasts during April 2015–March 2022 using an objective cutoff low and trough identification scheme and compared to corresponding 500-hPa GFS analyses. In the Northern Hemisphere, cutoff lows are generally underrepresented in forecasts compared to verifying analyses, particularly over continental midlatitude regions. Features identified in short- to long-range forecasts are generally associated with eastward zonal position errors over the conterminous United States and northern Asia, particularly during the spring and autumn. Similarly, cutoff lows over the Southern Hemisphere midlatitudes are characterized by an eastward displacement bias during all seasons.
Significance Statement
Cutoff lows are often associated with high-impact weather, including excessive rainfall, winter storms, and severe weather. GFS forecasts are often subjectively noted to advance cutoff lows over the United States too quickly downstream, limiting forecast skill in potentially impactful scenarios. Therefore, this study quantifies the position error characteristics of cutoff lows in recent GFS forecasts. Consistent with these typically anecdotal impressions of cutoff low position errors, this analysis demonstrates that cutoff lows over North America and central Asia are generally associated with an eastward position bias in medium- to long-range GFS forecasts. These results suggest that additional research to identify both environmental conditions and potential model deficiencies that may exacerbate this eastward bias would be beneficial.
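To illustrate how the zonal component of these position errors can be quantified once a forecast feature has been matched to its analysis counterpart, consider the sketch below (the feature identification and tracking scheme is the study's own; this arithmetic is a generic illustration):

```python
import numpy as np

def zonal_position_error_km(lon_fcst, lon_anal, lat):
    """East-west position error (km) between a forecast cutoff low and its
    matched analysis counterpart; positive values mean the forecast feature
    is too far east, i.e., the progressive bias described above."""
    r_earth = 6371.0  # mean Earth radius, km
    dlon = (lon_fcst - lon_anal + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return np.deg2rad(dlon) * r_earth * np.cos(np.deg2rad(lat))
```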
Abstract
Using the Weather Research and Forecasting Model, 80-member ensemble Kalman filter (EnKF) analyses with 3-km horizontal grid spacing were produced over the entire conterminous United States (CONUS) for 4 weeks using 1-h continuous cycling. For comparison, similarly configured EnKF analyses with 15-km horizontal grid spacing were also produced. At 0000 UTC, 15- and 3-km EnKF analyses initialized 36-h, 3-km, 10-member ensemble forecasts that were verified with a focus on precipitation. Additionally, forecasts were initialized from operational Global Ensemble Forecast System (GEFS) initial conditions (ICs) and experimental “blended” ICs produced by combining large scales from GEFS ICs with small scales from EnKF analyses using a low-pass filter. The EnKFs had stable climates with generally small biases, and precipitation forecasts initialized from 3-km EnKF analyses were more skillful and reliable than those initialized from downscaled GEFS and 15-km EnKF ICs through 12–18 and 6–12 h, respectively. Conversely, after 18 h, GEFS-initialized precipitation forecasts were better than EnKF-initialized precipitation forecasts. Blended 3-km ICs reflected the respective strengths of both GEFS and high-resolution EnKF ICs and yielded the best performance considering all times: blended 3-km ICs led to short-term forecasts with skill and reliability similar to or better than those initialized from unblended 3-km EnKF analyses, and to ~18–36-h forecasts of quality comparable to that of GEFS-initialized forecasts. This work likely represents the first time a convection-allowing EnKF has been continuously cycled over a region as large as the entire CONUS, and results suggest blending high-resolution EnKF analyses with low-resolution global fields can potentially unify short-term and next-day convection-allowing ensemble forecast systems under a common framework.
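The “blending” step lends itself to a compact illustration: keep wavelengths longer than a cutoff from the global ICs and shorter wavelengths from the EnKF analysis. The sketch below uses a sharp spectral cutoff on a single 2D field for clarity; the study's actual low-pass filter and cutoff wavelength may differ:

```python
import numpy as np

def blend_ics(global_field, enkf_field, dx_km, cutoff_km):
    """Blend two 2D fields on the same grid: large scales (wavelengths
    >= cutoff_km) from the global field, small scales from the
    high-resolution EnKF analysis."""
    ny, nx = global_field.shape
    ky = np.fft.fftfreq(ny, d=dx_km)[:, None]  # cycles per km
    kx = np.fft.fftfreq(nx, d=dx_km)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    low_pass = (k <= 1.0 / cutoff_km)          # True for the large scales
    f_global = np.fft.fft2(global_field)
    f_enkf = np.fft.fft2(enkf_field)
    blended = np.where(low_pass, f_global, f_enkf)
    return np.real(np.fft.ifft2(blended))
```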
Abstract
A feed-forward neural network (NN) was trained to produce gridded probabilistic convective hazard predictions over the contiguous United States. Input fields to the NN included 174 predictors, derived from 38 variables output by 497 convection-allowing model forecasts, with observed severe storm reports used for training and verification. These NN probability forecasts (NNPFs) were compared to surrogate-severe probability forecasts (SSPFs), generated by smoothing a field of surrogate reports derived with updraft helicity (UH). NNPFs and SSPFs were produced each forecast hour on an 80-km grid, with forecasts valid for the occurrence of any severe weather report within 40 or 120 km, and 2 h, of each 80-km grid box. NNPFs were superior to SSPFs, producing statistically significant improvements in forecast reliability and resolution. Additionally, NNPFs retained more large-magnitude probabilities (>50%) than SSPFs since NNPFs did not use spatial smoothing, improving forecast sharpness. NNPFs were most skillful relative to SSPFs when predicting hazards on larger scales (e.g., 120 vs 40 km) and in situations where using UH was detrimental to forecast skill. These included model spinup, nocturnal periods, and regions and environments where supercells were less common, such as the western and eastern United States and high-shear, low-CAPE regimes. NNPFs trained with fewer predictors were more skillful than SSPFs, but not as skillful as the full-predictor NNPFs, with predictor importance being a function of forecast lead time. Placing NNPF skill in the context of existing baselines is a first step toward integrating machine learning–based forecasts into the operational forecasting process.
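For background, the SSPF baseline follows the widely used surrogate-severe procedure: threshold a coarse-grid UH field to form binary “surrogate reports,” then smooth them into probabilities. A minimal sketch, with placeholder threshold and smoothing parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sspf(uh_max, uh_thresh=75.0, sigma_gridpoints=1.5):
    """Surrogate-severe probability forecast on an 80-km grid: grid boxes
    where maximum updraft helicity meets the threshold become binary
    surrogate reports, which an isotropic Gaussian then spreads into
    smooth probabilities. Threshold and sigma are placeholders, not the
    values tuned in the paper."""
    surrogate = (uh_max >= uh_thresh).astype(float)
    return gaussian_filter(surrogate, sigma=sigma_gridpoints)
```

The NNPFs avoid this explicit spatial smoothing, which is why they can retain the sharper, larger-magnitude probabilities noted above.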