Search Results
9 items for Author or Editor: Marion P. Mittermaier
Abstract
Skill is defined as actual forecast performance relative to the performance of a reference forecast. It is shown that the choice of reference (e.g., random or persistence) can affect the perceived performance of the forecast system. Two scores, the equitable threat score (ETS) and the odds ratio benefit skill score (ORBSS), were chosen to show the impact of using a persistence forecast, first using some simple hypothetical scenarios and second for actual forecasts from the Met Office Unified Model (UM) of precipitation, total cloud cover, and visibility during 2006. Overall persistence offers a sterner test of true forecast added value and accuracy, but using a more realistic reference may come at a cost. Using persistence introduces an additional degree of freedom to the skill assessment, which may be rather variable for “weather parameters.” Ultimately, the aim of any forecasting system should be to achieve a substantive separation between the inherent skill of the reference (which represents basic predictability) and the actual forecast.
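The idea of skill as performance relative to a reference can be illustrated with a minimal sketch. It assumes the standard 2 x 2 contingency-table definition of the ETS (in which the reference is random chance) and the generic skill-score form (score minus reference score, divided by perfect score minus reference score). The function and argument names are illustrative and are not taken from the paper; swapping in a persistence forecast's score as `score_ref` is one way to see how the choice of reference changes perceived skill.

```python
def equitable_threat_score(hits, false_alarms, misses, correct_negatives):
    """ETS from a 2 x 2 contingency table, with random chance as the reference."""
    n = hits + false_alarms + misses + correct_negatives
    # Hits expected from a random forecast with the same marginal totals.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    denom = hits + false_alarms + misses - hits_random
    if denom == 0:
        return float("nan")  # undefined, e.g. when the event never occurs or is forecast
    return (hits - hits_random) / denom


def skill_score(score, score_ref, score_perfect=1.0):
    """Generic skill: improvement over a reference, scaled by the maximum possible improvement."""
    return (score - score_ref) / (score_perfect - score_ref)


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    ets_forecast = equitable_threat_score(50, 25, 25, 900)
    ets_persistence = equitable_threat_score(35, 40, 40, 885)
    print("ETS of forecast:", round(ets_forecast, 3))
    print("ETS of persistence:", round(ets_persistence, 3))
    # Skill of the forecast measured against persistence rather than chance.
    print("Skill vs persistence:", round(skill_score(ets_forecast, ets_persistence), 3))
```

A forecast that looks skillful against random chance can show a much smaller margin once a persistence forecast with its own positive ETS is used as the reference.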
Abstract
Routine verification of deterministic numerical weather prediction (NWP) forecasts from the convection-permitting 4-km (UK4) and near-convection-resolving 1.5-km (UKV) configurations of the Met Office Unified Model (MetUM) has shown that it is hard to consistently demonstrate an improvement in skill from the higher-resolution model, even though subjective comparison suggests that it performs better. In this paper the use of conventional metrics and precise matching (through extracting the nearest grid point to an observing site) of the forecast to conventional synoptic observations in space and time is replaced with the use of inherently probabilistic metrics such as the Brier score, ranked probability, and continuous ranked probability scores applied to neighborhoods of forecast grid points. Three neighborhood sizes were used: ~4, ~12, and ~25 km, which match the sizes of the grid elements currently used operationally. Six surface variables were considered: 2-m temperature, 10-m wind speed, total cloud amount (TCA), cloud-base height (CBH), visibility, and hourly precipitation. Any neighborhood has a positive impact on skill, either in reducing the skill deficit or enhancing the skillfulness over and above the single grid point. This is true for all variables. An optimal neighborhood appears to depend on the variable and threshold. Adopting this probabilistic approach enables easy comparison to future near-convection-resolving ensemble prediction systems (EPS) and also enables the optimization of postprocessing to maximize the skill of forecast products.
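The neighborhood approach can be sketched as follows, under stated assumptions: the deterministic forecast is a 2-D grid held as a list of rows, the neighborhood is a square window centered on the grid point nearest the observing site, and the forecast probability is simply the fraction of neighborhood points meeting a threshold. The names `neighborhood_probability`, `half_width`, and the sample values are hypothetical, and the paper's exact neighborhood construction may differ.

```python
def neighborhood_probability(field, i, j, half_width, threshold):
    """Fraction of grid points in a square window around (i, j) that meet the threshold.

    The window is clipped at the domain edges. half_width=0 gives the single
    nearest grid point, i.e. a probability of exactly 0 or 1.
    """
    n_rows, n_cols = len(field), len(field[0])
    rows = range(max(i - half_width, 0), min(i + half_width + 1, n_rows))
    cols = range(max(j - half_width, 0), min(j + half_width + 1, n_cols))
    values = [field[r][c] for r in rows for c in cols]
    return sum(v >= threshold for v in values) / len(values)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes (0 or 1)."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical hourly precipitation field (mm); rain is forecast one point away from the site.
    field = [
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 2.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    observed = 1  # rain was observed at the site located at grid point (2, 2)
    for half_width in (0, 1, 2):
        p = neighborhood_probability(field, 2, 2, half_width, threshold=1.0)
        print(half_width, round(p, 3), round(brier_score([p], [observed]), 3))
```

In this toy case the single grid point scores the worst possible Brier score for a near miss, while the neighborhoods earn partial credit, which is the kind of effect the abstract describes.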
Abstract
Synoptic observations are often treated as error-free representations of the true state of the real world. For example, when observations are used to verify numerical weather prediction (NWP) forecasts, forecast–observation differences (the total error) are often entirely attributed to forecast inaccuracy. Such simplification is no longer justifiable for short-lead forecasts made with increasingly accurate higher-resolution models. For example, at least 25% of t + 6 h individual Met Office site-specific (postprocessed) temperature forecasts now typically have total errors of less than 0.2 K, which are comparable to typical instrument measurement errors of around 0.1 K. In addition to instrument errors, uncertainty is introduced by measurements not being taken concurrently with the forecasts. For example, synoptic temperature observations in the United Kingdom are typically taken 10 min before the hour, whereas forecasts are generally extracted as instantaneous values on the hour. This study develops a simple yet robust statistical modeling procedure for assessing how serially correlated subhourly variations limit the forecast accuracy that can be achieved. The methodology is demonstrated by application to synoptic temperature observations sampled every minute at several locations around the United Kingdom. Results show that subhourly variations lead to sizeable forecast errors of 0.16–0.44 K for observations taken 10 min before the forecast issue time. The magnitude of this error depends on spatial location and the annual cycle, with the greater errors occurring in the warmer seasons and at inland sites. This important source of uncertainty consists of a bias due to the diurnal cycle, plus irreducible uncertainty due to unpredictable subhourly variations that fundamentally limit forecast accuracy.
Abstract
Monitoring precipitation forecast skill in global numerical weather prediction (NWP) models is an important yet challenging task. Rain gauges are inhomogeneously distributed, providing no information over large swathes of land and the oceans. Satellite-based products, on the other hand, provide near-global coverage at a resolution of ∼10–25 km, but limitations on data quality (e.g., biases) must be accommodated. In this paper the stable equitable error in probability space (SEEPS) is computed using a precipitation climatology derived from the Tropical Rainfall Measuring Mission (TRMM) TMPA 3B42 V7 product and a gauge-based climatology and then applied to two global configurations of the Met Office Unified Model (UM). The representativeness and resolution effects on an aggregated SEEPS are explored by comparing the gauge scores, based on extracting the nearest model grid point, with those computed by upscaling the model values to the TRMM grid and extracting the TRMM grid point nearest the gauge location. The sampling effect is explored by comparing the aggregate SEEPS for this subset of ∼6000 locations (dictated by the number of gauges available globally) with all land points within the TRMM region of 50°N and 50°S. The forecast performance over the oceanic areas is compared with performance over land. While the SEEPS computed using the two different climatologies should never be expected to be identical, using the TRMM climatology provides a means of evaluating near-global precipitation using an internally consistent dataset in a climatologically consistent way.
Abstract
Ice clouds are an important yet largely unvalidated component of weather forecasting and climate models, but radar offers the potential to provide the necessary data to evaluate them. First in this paper, coordinated aircraft in situ measurements and scans by a 3-GHz radar are presented, demonstrating that, for stratiform midlatitude ice clouds, radar reflectivity in the Rayleigh-scattering regime may be reliably calculated from aircraft size spectra if the “Brown and Francis” mass–size relationship is used. The comparisons spanned radar reflectivity values from −15 to +20 dBZ, ice water contents (IWCs) from 0.01 to 0.4 g m⁻³, and median volumetric diameters between 0.2 and 3 mm. In mixed-phase conditions the agreement is much poorer because of the higher-density ice particles present. A large midlatitude aircraft dataset is then used to derive expressions that relate radar reflectivity and temperature to ice water content and visible extinction coefficient. The analysis is an advance over previous work in several ways: the retrievals vary smoothly with both input parameters, different relationships are derived for the common radar frequencies of 3, 35, and 94 GHz, and the problem of retrieving the long-term mean and the horizontal variance of ice cloud parameters is considered separately. It is shown that the dependence on temperature arises because of the temperature dependence of the number concentration “intercept parameter” rather than mean particle size. A comparison is presented of ice water content derived from scanning 3-GHz radar with the values held in the Met Office mesoscale forecast model, for eight precipitating cases spanning 39 h over southern England. It is found that the model predicted mean IWC to within 10% of the observations at temperatures between −30° and −10°C but tended to underestimate it by around a factor of 2 at colder temperatures.
Abstract
A major challenge of any operational cloud seeding project is the evaluation of the results. This paper describes the development of verification techniques based on data collected during the first South African operational rainfall enhancement project in which hygroscopic flares were used to seed the bases of convective storms. Radar storm properties as well as historical rainfall records were used in exploratory studies. The storm-scale analyses are viewed as extremely important, because individual storms are the units that are seeded. Their response to seeding has to be consistent with that of the seeded group in a randomized experiment using the same seeding technology before a positive effect on area rainfall can be expected. Sixty storms were selected for seeding, mostly early in their lifetimes. This permits a time-of-origin analysis in which the group of seeded storms can be compared to a “control” group of unseeded storms from the time they were first identified as 30-dBZ radar storm volumes. One such control group was obtained by selecting unseeded storms by using certain threshold criteria obtained from the seeded storms. Another control group was obtained by simply selecting the 60 largest storms from the set of unseeded storms meeting the threshold criteria. Yet another control group was obtained by matching the seeded storms, in the first 20 min of their lifetimes, before seeding effects can be expected, with a corresponding set of unseeded storms. Comparisons with the National Precipitation Research Programme’s randomized hygroscopic flare seeding experiment database show consistency in the way seeded storms reacted toward producing more rainfall. The analyses on historic rainfall suggest trends in the same direction, but it is shown that one has to be careful in interpreting these trends. The importance of quantitatively linking storm-scale seeding effects to apparent area effects is highlighted.
Abstract
The International Verification Methods Workshop was held online in November 2020 and included sessions on physical error characterization using process diagnostics and error tracking techniques; exploitation of data assimilation techniques in verification practices, e.g., to address representativeness issues and observation uncertainty; spatial verification methods and the Model Evaluation Tools, as unified reference verification software; and meta-verification and best practices for scores computation. The workshop reached out to diverse research communities working in the areas of high-impact weather, subseasonal to seasonal prediction, polar prediction, and sea ice and ocean prediction. This article summarizes the major outcomes of the workshop and outlines future strategic directions for verification research.
Abstract
Recent advancements in numerical weather prediction (NWP) and the enhancement of model resolution have created the need for more robust and informative verification methods. In response to these needs, a plethora of spatial verification approaches have been developed in the past two decades. A spatial verification method intercomparison was established in 2007 with the aim of gaining a better understanding of the abilities of the new spatial verification methods to diagnose different types of forecast errors. The project focused on prescribed errors for quantitative precipitation forecasts over the central United States. The intercomparison led to a classification of spatial verification methods and a cataloging of their diagnostic capabilities, providing useful guidance to end users, model developers, and verification scientists. A decade later, NWP systems have continued to increase in resolution, including advances in high-resolution ensembles. This article describes the setup of a second phase of the verification intercomparison, called the Mesoscale Verification Intercomparison over Complex Terrain (MesoVICT). MesoVICT focuses on the application, capability, and enhancement of spatial verification methods to deterministic and ensemble forecasts of precipitation, wind, and temperature over complex terrain. Importantly, this phase also explores the issue of analysis uncertainty through the use of an ensemble of meteorological analyses.
Abstract
As part of the second phase of the spatial forecast verification intercomparison project (ICP), dubbed the Mesoscale Verification Intercomparison in Complex Terrain (MesoVICT) project, a new set of idealized test fields is prepared. This paper describes these new fields and their rationale and uses them to analyze a number of summary measures associated with distance and geometric-based approaches. The results provide guidance about how they inform about performance under various scenarios. The new case comparisons are grouped into four categories: (i) pathological situations such as when a variable is zero valued at all grid points; (ii) circular events aimed at evaluating how different methods handle contrived situations, such as equal but opposite translations, the presence of multiple events of same/different size, boundary effects, and the influence of the positioning of events in the domain; (iii) elliptical events representing simplified scenarios that mimic commonly encountered weather phenomena in complex terrain; and (iv) cases aimed at analyzing how the verification methods handle small-scale scattered events, very large events with holes (e.g., a small portion of clear sky on a cloudy overcast day), and the presence of noise in one or both fields. Results show that all analyzed measures perform poorly in the pathological setting. They are either not able to provide a result at all or they instigate a special rule to prescribe a value resulting in erratic results. The analysis also showed that methods provide similar information in many situations, but that each has its positive properties along with certain unique limitations.