Abstract
The skillful prediction of monthly-scale rainfall over small regions like Taiwan remains a challenge for the meteorological community. Taiwan, a subtropical island in Asia, regularly experiences rainfall extremes that lead to landslides and flash floods in and near the mountains and to flooding over low-lying plains, particularly during the summer monsoon season [June–September (JJAS)]. In September 2020, NOAA/NCEP implemented the Global Ensemble Forecast System, version 12 (GEFSv12), to support stakeholders for subseasonal forecasts and hydrological applications. The present study evaluates the performance of GEFSv12 for monthly rainfall and associated extreme rainfall (ER) events over Taiwan during JJAS against CMORPH. GEFSv12 shows a marginal improvement over GEFS-SubX in depicting the East Asian summer monsoon index (EASMI). The raw GEFSv12 rainfall products are calibrated with a quantile–quantile (QQ) mapping technique to further improve prediction skill. The results reveal that the spatial patterns of climatological features (mean, interannual variability, and coefficient of variation) of summer monsoon monthly rainfall over Taiwan from QQ-GEFSv12 are more similar to CMORPH than those from Raw-GEFSv12. Raw-GEFSv12 has an enormous wet bias and overforecasts wet days, while QQ-GEFSv12 is close to the observations. The prediction skill (correlation coefficient and index of agreement) of GEFSv12 in depicting the summer monsoon monthly rainfall is significantly high (>0.5) over most parts of Taiwan, particularly during the peak monsoon months of September and August, followed by June and July. The calibration method significantly reduces the overestimation (underestimation) of wet (ER) events in the ensemble mean and probabilistic ensemble forecasts. The predictability of extreme rainfall events (>50 mm day⁻¹) also improves significantly.
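The quantile–quantile calibration mentioned in this abstract can be illustrated with a minimal empirical sketch. The abstract does not describe the study's actual implementation, so everything here is an assumption made for illustration: the function names, the linear interpolation between sorted values, and the clamping of forecasts that fall outside the training range. A forecast value is located within the forecast climatology and replaced by the observed value at the same cumulative probability.

```python
from bisect import bisect_right


def empirical_quantile(sorted_values, p):
    """Linearly interpolated quantile of an ascending list for p in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def cdf_position(sorted_values, x):
    """Cumulative probability of x within an ascending list, clamped to [0, 1]."""
    if x <= sorted_values[0]:
        return 0.0
    if x >= sorted_values[-1]:
        return 1.0
    hi = bisect_right(sorted_values, x)  # first index whose value exceeds x
    lo = hi - 1
    frac = (x - sorted_values[lo]) / (sorted_values[hi] - sorted_values[lo])
    return (lo + frac) / (len(sorted_values) - 1)


def quantile_map(forecast_value, forecast_climatology, observed_climatology):
    """Map a raw forecast to the observed value at the same cumulative probability."""
    fc = sorted(forecast_climatology)
    ob = sorted(observed_climatology)
    return empirical_quantile(ob, cdf_position(fc, forecast_value))
```

With a hypothetical forecast climatology that is twice as wet as the observed one, for example `quantile_map(6.0, [2, 4, 6, 8, 10], [1, 2, 3, 4, 5])`, the median forecast is pulled down to the observed median. Real rainfall data contain many zeros and long tails, which a production scheme would need to treat explicitly; this sketch does not.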
Abstract
Typhoon Morakot struck Taiwan during 7–9 August 2009 and became the deadliest tropical cyclone (TC) in five decades by producing up to 2635 mm of rain in 48 h, breaking the world record. The extreme rainfall of Morakot resulted from the strong interaction among several favorable factors that occurred simultaneously. These factors from large scale to small scale include the following: 1) weak environmental steering flow linked to the evolution of the monsoon gyre and consequently slow TC motion; 2) a strong moisture surge due to low-level southwesterly flow; 3) asymmetric rainfall and latent heating near southern Taiwan to further reduce the TC’s forward motion as its center began moving away from Taiwan; 4) enhanced rainfall due to steep topography; 5) atypical structure with a weak inner core, enhancing its susceptibility to the latent heating effect; and 6) cell merger and back building inside the rainbands associated with the interaction between the low-level jet and convective updrafts. From a forecasting standpoint, the present-day convective-permitting or cloud-resolving regional models are capable of short-range predictions of the Morakot event starting from 6 August. At longer ranges beyond 3 days, larger uncertainty exists in the track forecast and an ensemble approach is necessary. Due to the large computational demand at the required high resolution, the time-lagged strategy is shown to be a feasible option to produce useful information on rainfall probabilities of the event.
Abstract
The Advanced Research version of the Weather Research and Forecasting (WRF-ARW) Model is used to investigate the influence of an easterly wave (EW) on the genesis of Typhoon Hagupit (2008) in the western North Pacific. Observational analysis indicates that the precursor disturbance of Hagupit is an EW that can be detected at least 7 days prior to the typhoon genesis. In the control experiment, the genesis of the typhoon is well captured. A sensitivity experiment is conducted by filtering out the synoptic-scale (3–8-day) signals associated with the EW. The absence of the EW eliminates the typhoon genesis. Two mechanisms are proposed regarding the effect of the EW on the genesis of Hagupit. First, the background cyclonic vorticity of the EW could induce the small-scale cyclonic vorticities to merge and develop into a system-scale vortex. Second, the EW provides a favorable environment in situ for the rapid development of the typhoon disturbance through a positive moisture–convection feedback.
Abstract
The National Hurricane Center (NHC) uses a variety of guidance models for its operational tropical cyclone track, intensity, and wind structure forecasts, and as baselines for the evaluation of forecast skill. A set of the simpler models, collectively known as the NHC guidance suite, is maintained by NHC. The models comprising the guidance suite are briefly described and evaluated, with details provided for those that have not been documented previously. Decay-SHIFOR is a modified version of the Statistical Hurricane Intensity Forecast (SHIFOR) model that includes decay over land; this modification improves the SHIFOR forecasts through about 96 h. T-CLIPER, a climatology and persistence model that predicts track and intensity using a trajectory approach, has error characteristics similar to those of CLIPER and D-SHIFOR but can be run to any forecast length. The Trajectory and Beta model (TAB), another trajectory track model, applies a gridpoint spatial filter to smooth winds from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model. TAB model errors were 10%–15% lower than those of the Beta and Advection model (BAM), the model it replaced in 2017. Optimizing TAB’s vertical weights shows that the lower troposphere’s environmental flow provides a better match to observed tropical cyclone motion than does the upper troposphere’s, and that the optimal steering layer is shallower for higher-latitude and weaker tropical cyclones. The advantages and disadvantages of the D-SHIFOR, T-CLIPER, and TAB models relative to their earlier counterparts are discussed.
Significance Statement
This paper provides a comprehensive summary and evaluation of a set of simpler forecast models used as guidance for NHC’s operational tropical cyclone forecasts, and as baselines for the evaluation of forecast skill; these include newer techniques that extend forecasts to 7 days and beyond.
Abstract
While storm identification and tracking algorithms are used both operationally and in research, there exists no single standard technique to objectively determine performance of such algorithms. Thus, a comparative skill score is developed herein that consists of four parameters, three of which constitute the quantification of storm attributes—size consistency, linearity of tracks, and mean track duration—and the fourth that correlates performance to an optimal postevent reanalysis. The skill score is a cumulative sum of each of the parameters normalized from zero to one among the compared algorithms, such that a maximum skill score of four can be obtained. The skill score is intended to favor algorithms that are efficient at severe storm detection, i.e., high-scoring algorithms should detect storms that have higher current or future severe threat and minimize detection of weaker, short-lived storms with low severe potential. The skill score is shown to be capable of successfully ranking a large number of algorithms, both between varying settings within the same base algorithm and between distinct base algorithms. Through a comparison with manually created user datasets, high-scoring algorithms are verified to match well with hand analyses, demonstrating appropriate calibration of skill score parameters.
Significance Statement
With the growing number of options for storm identification and tracking techniques, it is necessary to devise an objective approach to quantify performance of different techniques. This study introduces a comparative skill score that assesses size consistency, linearity of tracks, mean track duration, and correlation to an optimal postevent reanalysis to rank diverse algorithms. This paper will show the capability of the skill score at highlighting algorithms that are efficient at detecting storms with higher severe potential, as well as those that closely resemble human-perceived storms through a comparison with manually created user datasets. The novel methodology will be useful in improving systems that rely on such algorithms, for both operational and research purposes focusing on severe storm detection.
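The comparative skill score described above (four parameters, each normalized from zero to one among the compared algorithms, then summed) can be sketched as follows. This is only an illustration under stated assumptions: min-max scaling is assumed as the normalization, every parameter is assumed to be already oriented so that larger means better, and the function names and algorithm labels are hypothetical. The paper's actual definitions of the four parameters are not reproduced here.

```python
def normalize_across(values):
    """Min-max scale values to [0, 1]; identical inputs all map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def comparative_skill_scores(parameters):
    """Sum the normalized parameters for each algorithm.

    parameters maps an algorithm name to a tuple of raw parameter values,
    all oriented so that a larger value is better (an assumption here).
    With four parameters the best possible total is 4.0.
    """
    names = list(parameters)
    n_params = len(parameters[names[0]])
    totals = {name: 0.0 for name in names}
    for k in range(n_params):
        scaled = normalize_across([parameters[name][k] for name in names])
        for name, score in zip(names, scaled):
            totals[name] += score
    return totals


def rank_algorithms(parameters):
    """Return algorithm names ordered from highest to lowest skill score."""
    totals = comparative_skill_scores(parameters)
    return sorted(totals, key=totals.get, reverse=True)
```

Because each parameter is scaled relative to the other algorithms in the comparison, the resulting scores are only meaningful within one comparison set; adding or removing an algorithm can change every score.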
Abstract
The skillful anticipation of tornadoes produced by quasi-linear convective systems (QLCSs) is a well-known forecasting challenge. This study was motivated by the possibility that warning accuracy of QLCS tornadoes depends on the processes leading to tornadogenesis, namely, one that is dominated by an apparent release of horizontal shearing instability [shearing instability dominant (SID)] and one by a pre-tornadic mesocyclone [pre-tornadic mesocyclone dominant (PMD)] and its associated generative mechanisms. The manual classification of the genesis of 530 QLCS tornadoes as either SID or PMD was performed using heuristic, yet process-driven criteria based on single-Doppler radar (WSR-88D) data. This included 214, 213, and 103 tornadoes that occurred during 2019, 2017, and 2016, respectively. As a function of tornadogenesis process, 36% were classified as SID, and 60% were classified as PMD; the remaining 4% could not be classified. Approximately 30% of the SID cases were operationally warned prior to tornadogenesis, compared to 44% of the PMD cases. PMD tornadoes were also more common during the warm season and displayed a diurnal, midafternoon peak in frequency. Finally, SID cases were more likely to be associated with QLCS tornado outbreaks but tended to be slightly shorter lived. A complementary effort to investigate environmental characteristics of QLCS tornadogenesis revealed differences between SID and PMD cases. MLCAPE was relatively larger for warm-season SID cases, and 0–3-km SRH was relatively larger in warm-season PMD cases. Additionally, pre-tornadic frontogenesis was more prominent for cool-season SID cases, suggestive of a more significant role of the larger-scale meteorological forcing in vertical vorticity that fosters tornadogenesis through SID processes.
Abstract
In an ensemble Kalman filter, when the analysis update of an ensemble member is computed using error statistics estimated from an ensemble that includes the background of the member being updated, the spread of the resulting ensemble systematically underestimates the uncertainty of the ensemble mean analysis. This problem can largely be avoided by applying cross validation: using an independent subset of ensemble members for updating each member. However, in some circumstances cross validation can lead to the divergence of one or more ensemble members from observations. This can culminate in catastrophic filter divergence in which the analyzed or forecast states become unrealistic in the diverging members. So far, such instabilities have been reported only in the context of highly nonlinear low-dimensional models. The first known manifestation of catastrophic filter divergence caused by the use of cross validation in an NWP context is reported here. To reduce the risk of such filter divergence, a modification to the traditional cross-validation approach is proposed. Instead of always assigning the ensemble members to the same subensembles, the members forming each subensemble are randomly chosen at every analysis step. It is shown that this new approach can prevent filter divergence and also brings a cycling ensemble data assimilation system containing divergent members back to a state consistent with Gaussianity. The randomized subensemble approach was implemented in the operational global ensemble prediction system at Environment and Climate Change Canada on 1 December 2021.
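The randomized subensemble idea can be illustrated with a small bookkeeping sketch that covers only the membership assignment, not the Kalman update itself. The function names are hypothetical, and it is an assumption of this sketch that each member is updated with statistics from all members outside its own subensemble; the abstract only states that an independent subset is used.

```python
import random


def random_subensembles(n_members, n_subensembles, rng):
    """Randomly partition member indices into subensembles of near-equal size."""
    if n_subensembles < 2:
        raise ValueError("cross validation needs at least two subensembles")
    members = list(range(n_members))
    rng.shuffle(members)
    # Deal the shuffled members out round-robin so sizes differ by at most one.
    return [sorted(members[i::n_subensembles]) for i in range(n_subensembles)]


def cross_validation_sources(subensembles):
    """For each member, list the members outside its own subensemble.

    Assumption: error statistics for a member's update come from every
    other subensemble, so the member's own background is never included.
    """
    sources = {}
    for i, group in enumerate(subensembles):
        others = sorted(
            m for j, other in enumerate(subensembles) if j != i for m in other
        )
        for member in group:
            sources[member] = others
    return sources


def assignments_per_cycle(n_members, n_subensembles, n_cycles, seed=0):
    """Draw a fresh random partition at every analysis step."""
    rng = random.Random(seed)
    return [random_subensembles(n_members, n_subensembles, rng)
            for _ in range(n_cycles)]
```

Calling `random_subensembles` once and reusing the result at every step corresponds to the traditional fixed assignment; calling it anew at each analysis step, as in `assignments_per_cycle`, corresponds to the randomized variant.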
Abstract
The skill of NOAA’s official monthly U.S. precipitation forecasts (issued in the middle of the prior month) has historically been low, having shown modest skill over the southern United States, but little or no skill over large portions of the central United States. The goal of this study is to explain the seasonal and regional variations of the North American subseasonal (weeks 3–6) precipitation skill, specifically the reasons for its successes and its limitations. The performances of multiple recent-generation model reforecasts over 1999–2015 in predicting precipitation are compared to uninitialized simulation skill using the atmospheric component of the forecast systems. This parallel analysis permits attribution of precipitation skill to two distinct sources: one due to slowly evolving ocean surface boundary states and the other to faster time-scale initial atmospheric weather states. A strong regionality and seasonality in precipitation forecast performance is shown to be analogous to skill patterns dictated by boundary forcing constraints alone. The correspondence is found to be especially high for the North American pattern of the maximum monthly skill that is achieved in the reforecast. The boundary forcing of most importance originates from tropical Pacific SST influences, especially those related to El Niño–Southern Oscillation. We discuss physical constraints that may limit monthly precipitation skill and interpret the performance of existing models in the context of plausible upper limits.
Significance Statement
Skillful subseasonal precipitation predictions have societal benefits. Over the United States, however, NOAA’s official U.S. monthly precipitation forecast skill has been historically low. Here we explore origins for skill of North American week-3 to week-6 precipitation predictions. Skill arising from initial weather states is compared to that arising from ocean surface boundary states alone. The monthly and seasonally varying pattern of U.S. monthly precipitation skill is appreciably derived from boundary constraints, linked especially with El Niño–Southern Oscillation. Forecasts of opportunity are identified, despite the low skill of monthly precipitation forecasts on average. Potential limits of monthly precipitation skill are explored that provide insight on the juxtaposition of “skill deserts” over the central United States with high skill regions over western North America.
Abstract
A forecast “bust” or “dropout” can be defined as an intermittent but significant loss of model forecast performance. Deterministic forecast dropouts are typically defined in terms of the 500-hPa geopotential height (Φ500) anomaly correlation coefficient (ACC) in the Northern Hemisphere (NH) dropping below a predefined threshold. This study first presents a multimodel comparison of dropouts in the Navy Global Environmental Model (NAVGEM) deterministic forecast with the ensemble control members from the Environment and Climate Change Canada (ECCC) Global Ensemble Prediction System (GEPS) and the National Centers for Environmental Prediction (NCEP) Global Ensemble Forecast System (GEFS). Then, the relationship between dropouts and large-scale pattern variability is investigated, focusing on the temporal variability and correlation of flow indices surrounding dropout events. Finally, three severe dropout events are examined from an ensemble perspective. The main findings of this work are the following: 1) forecast dropouts exhibit some relation between models; 2) although forecast dropouts do not have a single cause, the most severe dropouts in NAVGEM can be linked to specific behavior of the large-scale flow indices, that is, they tend to follow periods of rapidly escalating volatility of the flow indices, and they tend to occur during intervals where the Arctic Oscillation (AO) and Pacific North American (PNA) indices are exhibiting unusually strong interdependence; and 3) for the dropout events examined from an ensemble perspective, the NAVGEM ensemble spread does not provide a strong signal of elevated potential for very large forecast errors.
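The threshold-based dropout definition can be sketched with a simple anomaly correlation calculation. This is a simplified illustration, not the study's procedure: it uses the uncentered form of the ACC, applies no latitude (area) weighting, treats the fields as flat lists of gridpoint values, and leaves the threshold as a caller-supplied value because the abstract does not state one. All function names are hypothetical.

```python
import math


def anomaly_correlation(forecast, analysis, climatology):
    """Uncentered anomaly correlation between a forecast and a verifying analysis.

    All three inputs are equal-length sequences of gridpoint values.
    No area weighting is applied (a simplification made in this sketch).
    """
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    a_anom = [a - c for a, c in zip(analysis, climatology)]
    numerator = sum(f * a for f, a in zip(f_anom, a_anom))
    denominator = math.sqrt(
        sum(f * f for f in f_anom) * sum(a * a for a in a_anom)
    )
    if denominator == 0.0:
        raise ValueError("anomalies are identically zero")
    return numerator / denominator


def find_dropouts(acc_series, threshold):
    """Return indices of forecasts whose ACC falls below the threshold."""
    return [i for i, acc in enumerate(acc_series) if acc < threshold]
```

For a series of daily ACC values, `find_dropouts(acc_series, threshold)` returns the positions of the candidate dropout events, which could then be compared across models or against flow indices.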
Abstract
We evaluate the short-term weather forecast performance of three flavors of artificial neural networks (NNs): feedforward backpropagation, radial basis function, and generalized regression. To prepare the NNs for application in an operational setting, we tune NN hyperparameters using over two years of historical data. Five objective guidance products serve as predictors to the NNs: North American Mesoscale and Global Forecast System model output statistics (MOS) forecasts, the High-Resolution Rapid Refresh (HRRR) model, National Weather Service forecasts, and the National Blend of Models product. We independently test NN performance using 96 real-time forecasts of temperature, wind, and precipitation across 11 U.S. cities made during the WxChallenge, a weather forecasting competition. We demonstrate that all NNs significantly improve short-range weather forecasts relative to the traditional objective guidance aids used to train the networks. For example, 1-day maximum and minimum temperature forecast error is 20%–30% lower than that of MOS. However, NN improvement over multiple linear regression for short-term forecasts is not significant. We suggest this may be attributed to the small number of training samples, the operational nature of the experiment, and the short forecast lead times. Regardless, our results are consistent with previous work suggesting that applying NNs to model forecasts can have a positive impact on operational forecast skill and that NNs will become valuable tools when integrated into the forecast enterprise.
Significance Statement
We used approximately two years of historical weather data and objective forecasts for a number of cities to tune a series of artificial neural networks (NNs) to forecast 1-day values of maximum and minimum temperature, maximum sustained wind speed, and quantitative precipitation. We compare forecast error against common objective guidance and multiple linear regression. We found that the NNs exhibit about 25% lower error than common objective guidance for temperature forecasting and 50% lower error for wind speed. Our results suggest that NNs will be a valuable contributor to improving weather forecast skill when adopted into the existing forecast enterprise.
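As a rough illustration of one of the three network types named above, a generalized regression NN can be written in its textbook form as a Gaussian-kernel weighted average of the training targets (the Nadaraya–Watson estimator). The sketch below is an assumption-laden example, not the authors' implementation: the function names, the smoothing parameter `sigma`, and every number in the sample data are hypothetical, with two made-up guidance forecasts of maximum temperature standing in for the five predictors used in the study.

```python
import math

def grnn_predict(train_x, train_y, query, sigma=1.0):
    """Predict a target as a Gaussian-kernel weighted mean of training targets.

    train_x: list of predictor vectors (e.g., guidance forecasts per day)
    train_y: list of observed values for those days
    query:   predictor vector for the day to forecast
    sigma:   kernel width, the single hyperparameter that would be tuned
    """
    weights = []
    for x in train_x:
        dist2 = sum((xi - qi) ** 2 for xi, qi in zip(x, query))
        weights.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Query is far from every training sample; fall back to the mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

def mean_absolute_error(predictions, observations):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predictions, observations)) / len(predictions)

# Hypothetical training data: two guidance forecasts of maximum temperature
# (degrees Celsius) per day, with the observed maximum as the target.
guidance = [[20.0, 21.0], [25.0, 26.0], [30.0, 29.0]]
observed = [20.5, 25.2, 29.8]

forecast = grnn_predict(guidance, observed, query=[26.0, 26.5], sigma=2.0)
print(round(forecast, 2))
```

Because the prediction is a weighted mean, it always lies between the smallest and largest training targets, so a small training sample limits how far such a network can depart from simpler baselines such as multiple linear regression.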