Search Results
Showing 1–10 of 31 items for Author or Editor: S. Richardson (Articles, all content).
Abstract
The relative operating characteristic (ROC) curve is a popular diagnostic tool in forecast verification, with the area under the ROC curve (AUC) used as a verification metric measuring the discrimination ability of a forecast. Along with calibration, discrimination is deemed a fundamental probabilistic forecast attribute. In particular, in ensemble forecast verification, AUC provides a basis for the comparison of the potential predictive skill of competing forecasts. While this approach is straightforward when dealing with forecasts of common events (e.g., probability of precipitation), the interpretation of AUC can turn out to be overly simplistic or misleading when focusing on rare events (e.g., precipitation exceeding some warning criterion). How should we interpret the AUC of ensemble forecasts when focusing on rare events? How can changes in the way probability forecasts are derived from the ensemble forecast affect AUC results? How can we detect a genuine improvement in terms of predictive skill? Based on verification experiments, a critical eye is cast on the AUC interpretation to answer these questions. In addition to the traditional trapezoidal approximation and the well-known binormal fitting model, we discuss a new approach that embraces the concept of imprecise probabilities and relies on the subdivision of the lowest ensemble probability category.
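To make the trapezoidal approximation concrete, here is a minimal sketch (hypothetical names and synthetic data, not taken from the study) that builds ROC points by thresholding an ensemble-derived probability forecast at each probability category and integrates with the trapezoidal rule:

```python
import numpy as np

def roc_auc_trapezoidal(probs, obs, thresholds):
    """Trapezoidal AUC from probability forecasts and binary outcomes.

    probs      : forecast probabilities, e.g., the fraction of ensemble
                 members predicting the event (1D array).
    obs        : 1D binary array, 1 where the event occurred.
    thresholds : decision thresholds defining the ROC points.
    """
    hr, far = [1.0], [1.0]                 # threshold-zero end of the curve
    for t in sorted(thresholds):
        warn = probs >= t
        hits = np.sum(warn & (obs == 1))
        misses = np.sum(~warn & (obs == 1))
        fa = np.sum(warn & (obs == 0))
        cn = np.sum(~warn & (obs == 0))
        hr.append(hits / (hits + misses))
        far.append(fa / (fa + cn))
    hr.append(0.0)
    far.append(0.0)                        # threshold above 1: never warn
    # Points run from (1, 1) down to (0, 0); negate the decreasing-x integral.
    return -np.trapz(hr, far)

# Synthetic rare-event example with a 20-member ensemble (21 categories)
rng = np.random.default_rng(0)
p = rng.integers(0, 21, size=5000) / 20.0
y = (rng.random(5000) < 0.02 * (1 + 9 * p)).astype(int)
print(roc_auc_trapezoidal(p, y, thresholds=np.arange(1, 21) / 20.0))
```

With a rare event, the ROC is sampled at only a handful of points clustered near the origin, which is one reason the trapezoidal AUC can understate discrimination and why the alternatives discussed in the abstract (binormal fitting, subdividing the lowest probability category) are of interest.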
Abstract
Previous work found that cold pools in ordinary convection are more sensitive to the microphysics scheme when the lifting condensation level (LCL) is higher owing to a greater evaporation potential, which magnifies microphysical uncertainties. In the current study, we explore whether the same reasoning can be applied to supercellular cold pools. To do this, four perturbed-microphysics ensembles are run, each using an environment with a different LCL. Similar to ordinary convection, the sensitivity of supercellular cold pools to the microphysics increases with higher LCLs, though the physical reasoning for this increase in sensitivity differs from that proposed in previous work. Using buoyancy budgets along parcel trajectories that terminate in the cold pool, we find that negative buoyancy generated by microphysical cooling is partially countered by a decrease in environmental potential temperature as the parcel descends. This partial erosion of negative buoyancy as parcels descend is most pronounced in the low-LCL storms, which have steeper vertical profiles of environmental potential temperature in the lower atmosphere. When this erosion is accounted for, the strength of the strongest cold pools in the low-LCL ensemble is reduced, resulting in a narrower distribution of cold pool strengths. This narrower distribution is indicative of reduced sensitivity to the microphysics. These results suggest that supercell behavior and supercell hazards (e.g., tornadoes) may be more predictable in low-LCL environments.
Significance Statement
Thunderstorms typically produce “pools” of cold air beneath them owing in part to the evaporation of rain and melting of ice produced by the storm. Past work has found that in computer simulations of thunderstorms, the cold pools that form beneath thunderstorms are sensitive to how rain and ice are modeled in the simulation. In this study, we show that in the strongest thunderstorms that are capable of producing tornadoes, this sensitivity is reduced when the humidity in the lowest few kilometers above the surface is increased. Exploring why the sensitivity is reduced when the humidity increases provides a deeper understanding of the relationship between humidity and cold pool strength, which is important for severe storm forecasting.
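As background for the buoyancy-budget argument above, a standard simplified definition of parcel buoyancy relative to the environment is sketched below (our notation; pressure-perturbation and hydrometeor-loading terms are neglected for clarity):

```latex
% b(z): parcel buoyancy against the environmental density potential
% temperature profile \bar{\theta}_\rho(z) (simplified form; perturbation
% pressure and condensate loading neglected)
b(z) \;=\; g\,
  \frac{\theta_{\rho,\mathrm{parcel}} - \bar{\theta}_{\rho}(z)}
       {\bar{\theta}_{\rho}(z)}
```

Because b is measured against the environmental profile, a descending parcel that was chilled by microphysical cooling becomes less negatively buoyant wherever the environmental potential temperature itself decreases toward the surface; the steeper low-level environmental profiles of the low-LCL storms are what make this erosion strongest there.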
Abstract
The effects of observation errors on rank histograms and reliability diagrams are investigated using a perfect model approach. The three-variable Lorenz-63 model was used to simulate an idealized ensemble prediction system (EPS) with 50 perturbed ensemble members and one control forecast. Observation errors were introduced by adding normally distributed noise to the true state at verification time. Besides these simulations, a theoretical analysis was also performed. One of the major findings was that rank histograms are very sensitive to the presence of observation errors, leading to overpopulated upper- and lowermost ranks. This sensitivity was shown to grow with ensemble size. Reliability diagrams were far less sensitive in this respect. The resulting U-shaped rank histograms can easily be misinterpreted as indicating too little spread in the ensemble prediction system. To account for this effect when real observations are used to assess an ensemble prediction system, normally distributed noise based on the verifying observation error can be added to the ensemble members before the statistics are calculated. The method has been tested for the ECMWF ensemble forecasts of ocean waves and forecasts of the geopotential at 500 hPa. The EPS waves were compared with buoy observations from the Global Telecommunication System (GTS) for a period of almost 3 yr. When the buoy observations were taken as the true value, more than 25% of the observations appeared in the two extreme ranks for the day 3 forecast range. This number was reduced to less than 10% when observation errors were added to the ensemble members. Ensemble forecasts of the 500-hPa geopotential were verified against the ECMWF analysis. When analysis errors were neglected, the maximum number of outliers was more than 10% for most areas except for Europe, where the analysis errors are relatively smaller. Introducing noise to the ensemble members, based on estimates of analysis errors, reduced the number of outliers, particularly in the short range, where a peak around day 1 more or less vanished.
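A minimal sketch of the noise-addition correction described above, assuming Gaussian observation errors of known standard deviation (array shapes and names are ours, not the paper's):

```python
import numpy as np

def rank_histogram(ens, obs, obs_err_std=0.0, rng=None):
    """Rank histogram, optionally perturbing members with observation noise.

    ens         : (n_cases, n_members) ensemble forecasts.
    obs         : (n_cases,) verifying observations.
    obs_err_std : observation-error standard deviation; if positive,
                  normally distributed noise of this size is added to the
                  members before ranking (the correction described above).
    """
    rng = rng or np.random.default_rng(0)
    if obs_err_std > 0.0:
        ens = ens + rng.normal(0.0, obs_err_std, size=ens.shape)
    ranks = np.sum(ens < obs[:, None], axis=1)       # 0 .. n_members
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

# Idealized illustration: a statistically perfect 50-member ensemble
# verified against noisy observations overpopulates the extreme ranks
# unless the members are perturbed as well.
rng = np.random.default_rng(1)
mu = rng.normal(size=(5000, 1))                  # forecast-distribution center
truth = mu[:, 0] + rng.normal(size=5000)         # truth drawn from same dist.
members = mu + rng.normal(size=(5000, 50))       # exchangeable members
obs = truth + rng.normal(0.0, 0.5, size=5000)    # observation = truth + error
raw = rank_histogram(members, obs)
fixed = rank_histogram(members, obs, obs_err_std=0.5)
print(raw[[0, -1]].sum() / raw.sum())      # inflated extreme-rank fraction
print(fixed[[0, -1]].sum() / fixed.sum())  # closer to the flat 2/51
```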
Abstract
Recent high-resolution numerical simulations of supercells have identified a feature referred to as the streamwise vorticity current (SVC). Some have presumed the SVC to play a role in tornadogenesis and tornado maintenance, though observations of such a feature have been limited. To this end, 125-m dual-Doppler wind syntheses and mobile mesonet observations are used to examine three observed supercells for evidence of an SVC. Two of the three supercells are found to contain a feature similar to an SVC, while the other supercell contains an antistreamwise vorticity ribbon on the southern fringe of the forward flank. A closer examination of the two supercells with SVCs reveals that the SVCs are located on the cool side of boundaries within the forward flank that separate colder, more turbulent flow from warmer, more laminar flow, similar to numerical simulations. Furthermore, the observed SVCs are similar to those in simulations in that they appear to be associated with baroclinic vorticity generation and have similar appearances in vertical cross sections. Aside from some apparent differences in the location of the maximum streamwise vorticity between simulated and observed SVCs, the SVCs seen in numerical simulations do appear to be realistic. The SVC, however, may not be essential for tornadogenesis, at least for weak tornadoes, because the supercell that did not have a well-defined SVC produced at least one brief, weak tornado during the analysis period.
Abstract
Early awareness of extreme precipitation can provide the time necessary to make adequate event preparations. At the European Centre for Medium-Range Weather Forecasts (ECMWF), one tool that condenses the forecast information from the Integrated Forecasting System ensemble (ENS) is the extreme forecast index (EFI), an index that highlights regions that are forecast to have potentially anomalous weather conditions compared to the local climate. This paper builds on previous findings by undertaking a global verification, throughout the medium-range forecast horizon (out to 15 days), of the ability of the EFI for water vapor transport [integrated vapor transport (IVT)] and precipitation to capture extreme observed precipitation. Using the ECMWF ENS for winters 2015/16 and 2016/17 and daily surface precipitation observations, the relative operating characteristic is used to show that the IVT EFI is more skillful than the precipitation EFI in forecast week 2 over Europe and western North America. It is the large-scale nature of the IVT, its higher predictability, and its relationship with extreme precipitation that make it potentially useful in these regions, which, in turn, could provide earlier awareness of extreme precipitation. Conversely, at shorter lead times the precipitation EFI is more useful, although the IVT EFI can provide synoptic-scale understanding. For the whole globe, the extratropical Northern Hemisphere, the tropics, and North America, the precipitation EFI is more useful throughout the medium range, suggesting that precipitation processes not captured by the IVT (e.g., tropical convection) are important. Following these results, the operational implementation of the IVT EFI is currently being planned.
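For reference, a commonly cited form of the index (hedged: the operational ECMWF definition and its numerical implementation may differ in detail) is

```latex
% Extreme forecast index: integrated departure of the forecast probability
% distribution F_f from the model climate, weighted toward the tails.
\mathrm{EFI} \;=\; \frac{2}{\pi} \int_{0}^{1}
    \frac{p - F_f(p)}{\sqrt{p\,(1 - p)}}\; dp
```

where F_f(p) is the fraction of ensemble members lying below the pth quantile of the model climate, so that EFI = 0 for a climatological forecast and EFI approaches ±1 when the entire ensemble lies beyond the climate extremes.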
Abstract
Subseasonal forecast skill is not homogeneous in time, and prior assessment of the likely forecast skill would be valuable for end users. We propose a method for identifying periods of high forecast confidence using atmospheric circulation patterns, with an application to southern Australia precipitation. In particular, we use archetypal analysis to derive six patterns, called archetypes, of daily 500-hPa geopotential height (Z500) fields over Australia. We assign Z500 reanalysis fields to the closest-matching archetype and subsequently link the archetypes to precipitation for three key regions in the Australian agriculture and energy sectors: the Murray Basin, southwest Western Australia, and western Tasmania. Using a 20-yr hindcast dataset from the European Centre for Medium-Range Weather Forecasts subseasonal-to-seasonal prediction system, we identify periods of high confidence as those when hindcast Z500 fields closely match an archetype according to a distance criterion. We then compare precipitation hindcast accuracy during these confident periods with that during normal periods. Considering all archetypes, we show that there is greater skill during confident periods for lead times of less than 10 days in the Murray Basin and western Tasmania, and of greater than 6 days in southwest Western Australia, although these conclusions are subject to substantial uncertainty. By breaking down the skill results for each archetype individually, we highlight how skill tends to be greater than normal for those archetypes associated with drier-than-average conditions.
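The assignment step lends itself to a simple sketch (hypothetical: the study's actual distance criterion, normalization, and any use of archetype mixture weights may differ):

```python
import numpy as np

def assign_archetype(field, archetypes, max_dist=None):
    """Assign a flattened Z500 field to its closest-matching archetype.

    field      : (n_gridpoints,) flattened geopotential-height field.
    archetypes : (n_archetypes, n_gridpoints) archetype patterns.
    max_dist   : optional distance criterion; matches farther than this
                 are rejected, flagging a low-confidence period.
    Returns (archetype_index, distance), with index None if rejected.
    """
    dists = np.linalg.norm(archetypes - field, axis=1)
    k = int(np.argmin(dists))
    if max_dist is not None and dists[k] > max_dist:
        return None, float(dists[k])      # no confident match
    return k, float(dists[k])
```

Applied day by day to hindcast Z500 fields, the days that pass the distance criterion define the high-confidence periods whose precipitation skill is then compared against the remainder.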
Abstract
From time to time atmospheric flows become organized and form coherent long-lived structures. Such structures may propagate, remain quasi-stationary, or recur in place. We investigate the ability of principal component analysis (PCA) and archetypal analysis (AA) to identify long-lived events, excluding propagating forms. Our analysis is carried out on the Southern Hemisphere midtropospheric flow represented by geopotential height at 500 hPa (Z500). The leading basis patterns of Z500 for PCA and AA are similar and describe structures representing (or similar to) the southern annular mode (SAM) and the Pacific–South American (PSA) pattern. Long-lived events are identified here from sequences of 8 days or longer during which the same basis pattern dominates for PCA or AA. AA identifies more long-lived events than PCA using this approach. The most commonly occurring long-lived event for both AA and PCA is the annular SAM-like pattern; the second most common is the PSA-like Pacific wave train. For AA, the flow at any given time is approximated as weighted contributions from each basis pattern, which lends itself to metrics for discriminating among basis patterns. These show that the longest long-lived events are in general better expressed than shorter events. Case studies of long-lived events featuring a blocking structure and an annular structure show that both PCA and AA can identify and discriminate the dominant basis pattern that most closely resembles the flow event.
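The detection rule reduces to a run-length search over the daily labels of the dominant basis pattern; a minimal sketch with a toy example (our naming, not the paper's code):

```python
import numpy as np

def long_lived_events(labels, min_days=8):
    """Runs where the same basis pattern dominates for >= min_days.

    labels : 1D integer array, index of the dominant PCA/AA pattern per day.
    Returns a list of (pattern, start_day, length) tuples.
    """
    events, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_days:
                events.append((int(labels[start]), start, i - start))
            start = i
    return events

# Toy series: a 10-day spell of pattern 0 embedded among other days
labels = np.array([1, 2] + [0] * 10 + [3, 1, 1])
print(long_lived_events(labels))   # [(0, 2, 10)]
```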
Abstract
On the afternoon and evening of 22 May 2002, high-resolution observations of the boundary layer (BL) and a dryline were obtained in the eastern Oklahoma and Texas panhandles during the International H2O Project. Using overdetermined multiple-Doppler radar syntheses in concert with a Lagrangian analysis of water vapor and temperature fields, the 3D kinematic and thermodynamic structure of the dryline and surrounding BL has been analyzed over a nearly 2-h period. The dryline is resolved as a strong (2–4 g kg⁻¹ km⁻¹) gradient of water vapor mixing ratio that resides in a nearly north–south-oriented zone of convergence. Maintained through frontogenesis, the dryline is also located within a gradient of virtual potential temperature, which induces a persistent, solenoidally forced secondary circulation. Initially quasi-stationary, the dryline retrogrades to the west during the early evening and displays complicated substructures, including small wavelike perturbations that travel from south to north at nearly the speed of the mean BL flow. A second, minor dryline has characteristics similar to the first but weaker gradients and circulations. The BL adjacent to the dryline exhibits complicated structures, consisting of combinations of open cells, horizontal convective rolls, and transverse rolls. Strong convergence and vertical motion at the dryline act to lift moisture, and high-based cumulus clouds are observed in the analysis domain. Although the top of the analysis domain is below the lifted condensation level height, vertical extrapolation of the moisture fields generally agrees with cloud locations. Mesoscale vortices that move along the dryline induce a transient eastward dryline motion due to the eastward advection of dry air following misocyclone passage. Refractivity-based moisture and differential reflectivity analyses are used to help interpret the Lagrangian analyses.
Abstract
Convective inhibition (CIN) is one of the parameters used by forecasters to determine the inflow layer of a convective storm, but little work has examined the best way to compute CIN. One decision that must be made is whether to lift parcels following a pseudoadiabat (removing hydrometeors as the parcel ascends) or reversible moist adiabat (retaining hydrometeors). To determine which option is best, idealized simulations of ordinary convection are examined using a variety of base states with different reversible CIN values for parcels originating in the lowest 500 m. Parcel trajectories suggest that ascent over the lowest few kilometers, where CIN is typically accumulated, is best conceptualized as a reversible moist adiabatic process instead of a pseudoadiabatic process. Most inflow layers do not contain parcels with substantial reversible CIN, despite these parcels possessing ample convective available potential energy and minimal pseudoadiabatic CIN. If a stronger initiation method is used, or hydrometeor loading is ignored, simulations can ingest more parcels with large amounts of reversible CIN. These results suggest that reversible CIN, not pseudoadiabatic CIN, is the physically relevant way to compute CIN and that forecasters may benefit from examining reversible CIN instead of pseudoadiabatic CIN when determining the inflow layer.
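For readers who want the computation spelled out: CIN is the positive-signed integral of negative parcel buoyancy below the level of free convection, and the pseudoadiabatic-versus-reversible choice enters only through how the parcel profile is constructed (condensate removed as it forms, or retained so that its loading reduces buoyancy). A minimal sketch, assuming the lifted-parcel and environmental virtual (or density) temperature profiles are already available:

```python
import numpy as np

G = 9.81  # gravitational acceleration, m s^-2

def cin(z, tv_parcel, tv_env):
    """Convective inhibition (J kg^-1, returned as a positive number).

    z         : heights (m), increasing.
    tv_parcel : lifted-parcel virtual temperature profile (K); lifting it
                pseudoadiabatically or reversibly is what distinguishes
                pseudoadiabatic from reversible CIN.
    tv_env    : environmental virtual temperature profile (K).
    """
    b = G * (tv_parcel - tv_env) / tv_env     # parcel buoyancy, m s^-2
    neg = np.minimum(b, 0.0)                  # keep only inhibiting segments
    return -np.trapz(neg, z)                  # integrate and flip the sign
```

(An operational implementation would also truncate the integral at the level of free convection; this sketch simply ignores any positively buoyant layers.)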
Abstract
While chaos ensures that probabilistic weather forecasts cannot always be “sharp,” it is important for users and developers that they are reliable. For example, they should be neither overconfident nor underconfident. The “spread–error” relationship is often used as a first-order assessment of the reliability of ensemble weather forecasts. It states that the ensemble standard deviation (a measure of forecast uncertainty) should match the root-mean-square error of the ensemble mean (when averaged over a sufficient number of forecast start dates). It is shown here that this relationship is now largely satisfied at the European Centre for Medium-Range Weather Forecasts (ECMWF) for ensemble forecasts of the midlatitude, midtropospheric flow out to lead times of at least 10 days when averaged over all flow situations throughout the year. This study proposes a practical framework for continued improvement in the reliability (and skill) of such forecasts. This involves the diagnosis of flow-dependent deficiencies in short-range (∼12 h) reliability for a range of synoptic-scale flow types and the prioritization of modeling research to address these deficiencies. The approach is demonstrated for a previously identified flow type: a trough over the Rockies with warm, moist air ahead. The mesoscale convective systems that can ensue are difficult to predict and, by perturbing the jet stream, are thought to lead to deterministic forecast “busts” for Europe several days later. The results here suggest that jet stream spread is insufficient during this flow type, and the ensemble forecasts thus unreliable. This is likely to mean that the uncertain forecasts for Europe may, nevertheless, still be overconfident.
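The first-order check itself is straightforward to compute; a minimal sketch (our naming; the (M + 1)/M finite-ensemble correction sometimes applied to the spread is omitted for brevity):

```python
import numpy as np

def spread_and_error(ens, truth):
    """Spread-error diagnostic for one lead time.

    ens   : (n_cases, n_members) ensemble forecasts.
    truth : (n_cases,) verifying analyses or observations.
    For a reliable ensemble, the two returned numbers should match when
    averaged over enough forecast start dates.
    """
    rmse = np.sqrt(np.mean((ens.mean(axis=1) - truth) ** 2))   # error
    spread = np.sqrt(np.mean(ens.var(axis=1, ddof=1)))         # spread
    return spread, rmse
```

Stratifying the same diagnostic by synoptic-scale flow type, rather than averaging over the whole year, is what exposes the flow-dependent deficiencies the framework targets.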