Search Results
You are looking at 11 - 20 of 34 items for
- Author or Editor: Barbara G. Brown
Abstract
Several spatial forecast verification methods have been developed that are suited for high-resolution precipitation forecasts. They can account for the spatial coherence of precipitation and give credit to a forecast that does not necessarily match the observation at any particular grid point. The methods were grouped into four broad categories (neighborhood, scale separation, features based, and field deformation) for the Spatial Forecast Verification Methods Intercomparison Project (ICP). Participants were asked to apply their new methods to a set of artificial geometric and perturbed forecasts with prescribed errors, and a set of real forecasts of convective precipitation on a 4-km grid. This paper describes the intercomparison test cases, summarizes results from the geometric cases, and presents subjective scores and traditional scores from the real cases.
All the new methods could detect bias error, and the features-based and field deformation methods were also able to diagnose displacement errors of precipitation features. The best approach for capturing errors in aspect ratio was field deformation. When comparing model forecasts with real cases, the traditional verification scores did not agree with the subjective assessment of the forecasts.
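The disagreement between traditional scores and subjective assessment can be illustrated with a toy geometric case. The sketch below is not taken from the ICP test set: the grid size, feature shape, and the helper names `make_field`, `csi`, and `centroid` are all assumptions made for illustration. It shows a forecast feature with the right size and shape but a pure displacement error, which a gridpoint score such as the critical success index rates as having no skill.

```python
def make_field(nrows, ncols, row0, col0, height, width):
    """Binary field containing one rectangular 'precipitation' feature."""
    return [[1 if row0 <= r < row0 + height and col0 <= c < col0 + width else 0
             for c in range(ncols)]
            for r in range(nrows)]


def csi(forecast, observed):
    """Critical success index: hits / (hits + misses + false alarms)."""
    hits = misses = false_alarms = 0
    for f_row, o_row in zip(forecast, observed):
        for f, o in zip(f_row, o_row):
            if f and o:
                hits += 1
            elif o:
                misses += 1
            elif f:
                false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


def centroid(field):
    """Mean (row, col) position of the event grid points."""
    pts = [(r, c) for r, row in enumerate(field) for c, v in enumerate(row) if v]
    n = len(pts)
    return sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n


# Hypothetical case: identical features, forecast shifted 10 columns east.
observed = make_field(20, 40, 8, 5, 4, 6)
forecast = make_field(20, 40, 8, 15, 4, 6)

score = csi(forecast, observed)
shift = centroid(forecast)[1] - centroid(observed)[1]
print(f"CSI = {score}, centroid displacement = {shift} grid lengths")
```

The gridpoint score only reports that nothing overlapped, whereas a features-based diagnostic such as the centroid shift recovers the size of the displacement.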
Abstract
Advancements in weather forecast models and their enhanced resolution have led to substantially improved and more realistic-appearing forecasts for some variables. However, traditional verification scores often indicate poor performance because of the increased small-scale variability so that the true quality of the forecasts is not always characterized well. As a result, numerous new methods for verifying these forecasts have been proposed. These new methods can mostly be classified into two overall categories: filtering methods and displacement methods. The filtering methods can be further delineated into neighborhood and scale separation, and the displacement methods can be divided into features based and field deformation. Each method gives considerably more information than the traditional scores, but it is not clear which method(s) should be used for which purpose.
A verification methods intercomparison project has been established in order to glean a better understanding of the proposed methods in terms of their various characteristics and to determine what verification questions each method addresses. The study is ongoing, and preliminary qualitative results for the different approaches applied to different situations are described here. In particular, the various methods and their basic characteristics, similarities, and differences are described. In addition, several questions are addressed regarding the application of the methods and the information that they provide. These questions include (i) how the method(s) inform performance at different scales; (ii) how the methods provide information on location errors; (iii) whether the methods provide information on intensity errors and distributions; (iv) whether the methods provide information on structure errors; (v) whether the approaches have the ability to provide information about hits, misses, and false alarms; (vi) whether the methods do anything that is counterintuitive; (vii) whether the methods have selectable parameters and how sensitive the results are to parameter selection; (viii) whether the results can be easily aggregated across multiple cases; (ix) whether the methods can identify timing errors; and (x) whether confidence intervals and hypothesis tests can be readily computed.
Abstract
An important focus of research in the forecast verification community is the development of alternative verification approaches for quantitative precipitation forecasts, as well as for other spatial forecasts. The need for information that is meaningful in an operational context and the importance of capturing the specific sources of forecast error at varying spatial scales are two primary motivating factors. In this paper, features of precipitation as identified by a convolution threshold technique are merged within fields and matched across fields in an automatic and computationally efficient manner using Baddeley’s metric for binary images.
The method is carried out on 100 test cases, and 4 representative cases are shown in detail. Results of merging and matching objects are generally positive in that they are consistent with how a subjective observer might merge and match features. The results further suggest that the Baddeley metric may be useful as a computationally efficient summary metric giving information about location, shape, and size differences of individual features, which could be employed for other spatial forecast verification methods.
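For readers unfamiliar with Baddeley's metric, a minimal brute-force sketch follows. It assumes the commonly cited form of the delta metric, in which the distance from each grid point to the nearest event is capped at a cutoff and the two capped distance maps are compared with an L_p average. The values `p=2` and `cutoff=5.0`, the treatment of an empty field, and the helper names are assumptions; the abstract does not state the settings used in the paper.

```python
import math


def distance_map(binary, cutoff):
    """Distance from every grid point to the nearest event point, capped at cutoff."""
    events = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    dmap = []
    for r in range(len(binary)):
        out_row = []
        for c in range(len(binary[0])):
            if events:
                d = min(math.hypot(r - er, c - ec) for er, ec in events)
            else:
                d = cutoff  # assumption: an empty field is "far" everywhere
            out_row.append(min(d, cutoff))
        dmap.append(out_row)
    return dmap


def baddeley_delta(a, b, p=2, cutoff=5.0):
    """L_p average of the difference between two capped distance maps."""
    da, db = distance_map(a, cutoff), distance_map(b, cutoff)
    n = len(a) * len(a[0])
    total = sum(abs(x - y) ** p
                for row_a, row_b in zip(da, db)
                for x, y in zip(row_a, row_b))
    return (total / n) ** (1.0 / p)


def point_field(n, r, c):
    """Hypothetical n x n binary image with a single event pixel at (r, c)."""
    return [[1 if (i, j) == (r, c) else 0 for j in range(n)] for i in range(n)]


obs = point_field(11, 5, 5)
near = point_field(11, 5, 6)  # shifted by one grid length
far = point_field(11, 5, 8)   # shifted by three grid lengths

print(baddeley_delta(obs, near), baddeley_delta(obs, far))
```

Because the metric is computed from distance maps rather than overlap, it responds to location, shape, and size differences together, which is what makes it a candidate criterion for deciding which features to merge or match. A production implementation would use a distance transform instead of the brute-force search shown here.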
Numerous new methods have been proposed for using spatial information to better quantify and diagnose forecast performance when forecasts and observations are both available on the same grid. The majority of the new spatial verification methods can be classified into four broad categories (neighborhood, scale separation, features based, and field deformation), which themselves can be further generalized into two categories of filter and displacement. Because the methods make use of spatial information in widely different ways, users may be uncertain about what types of information each provides, and which methods may be most beneficial for particular applications. As an international project, the Spatial Forecast Verification Methods Inter-Comparison Project (ICP; www.ral.ucar.edu/projects/icp) was formed to address these questions. This project was coordinated by NCAR and facilitated by the WMO/World Weather Research Programme (WWRP) Joint Working Group on Forecast Verification Research. An overview of the methods involved in the project is provided here with some initial guidelines about when each of the verification approaches may be most appropriate. Future spatial verification methods may include hybrid methods that combine aspects of filter and displacement approaches.
Abstract
Which of two competing continuous forecasts is better? This question is often asked in forecast verification, as well as climate model evaluation. Traditional statistical tests seem to be well suited to the task of providing an answer. However, most such tests do not account for some of the special underlying circumstances that are prevalent in this domain. For example, model output is seldom independent in time, and the models being compared are geared to predicting the same state of the atmosphere, and thus they could be contemporaneously correlated with each other. These types of violations of the assumptions of independence required for most statistical tests can greatly impact the accuracy and power of these tests. Here, this effect is examined on simulated series for many common testing procedures, including two-sample and paired t and normal approximation z tests, the z test with a first-order variance inflation factor applied, and the newer Hering–Genton (HG) test, as well as several bootstrap methods. While it is known how most of these tests will behave in the face of temporal dependence, it is less clear how contemporaneous correlation will affect them. Moreover, it is worthwhile knowing just how badly the tests can fail so that if they are applied, reasonable conclusions can be drawn. It is found that the HG test is the most robust to both temporal dependence and contemporaneous correlation, as well as the specific type and strength of temporal dependence. Bootstrap procedures that account for temporal dependence stand up well to contemporaneous correlation and temporal dependence, but require large sample sizes to be accurate.
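The effect of temporal dependence on a simple test can be sketched as follows. This is an illustrative version of a paired z test on the absolute-error differential, with an optional first-order variance inflation factor of the form (1 + r1) / (1 - r1), where r1 is the lag-1 autocorrelation of the differential. The choice of absolute error as the loss, the function names, and the toy error series are assumptions; the abstract does not give the exact formulation used in the study, and this is not the Hering–Genton test.

```python
import math
from statistics import NormalDist, mean


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    m = mean(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den if den else 0.0


def paired_z_test(err_a, err_b, inflate=True):
    """Two-sided z test that the mean absolute-error differential is zero.

    With inflate=True the variance of the mean is multiplied by
    (1 + r1) / (1 - r1) to allow for first-order temporal dependence.
    """
    d = [abs(a) - abs(b) for a, b in zip(err_a, err_b)]
    n = len(d)
    d_bar = mean(d)
    var = sum((v - d_bar) ** 2 for v in d) / (n - 1)
    r1 = lag1_autocorr(d) if inflate else 0.0
    vif = (1 + r1) / (1 - r1)
    z = d_bar / math.sqrt(var * vif / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, vif


# Hypothetical forecast errors: model A's errors drift slowly in time,
# so the differential series is strongly autocorrelated.
err_a = [1.0] * 5 + [2.0] * 5 + [3.0] * 5 + [2.0] * 5
err_b = [1.5] * 20

z_naive, p_naive, _ = paired_z_test(err_a, err_b, inflate=False)
z_adj, p_adj, vif = paired_z_test(err_a, err_b, inflate=True)
print(f"naive:    z = {z_naive:.2f}, p = {p_naive:.3f}")
print(f"inflated: z = {z_adj:.2f}, p = {p_adj:.3f}, VIF = {vif:.1f}")
```

In this toy series the unadjusted test declares a significant difference at the 5% level while the adjusted test does not, which is the kind of failure under temporal dependence that the abstract refers to. Note that pairing the errors accounts for contemporaneous correlation only through the differencing step.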
Abstract
Recent research to improve forecasts of in-flight icing conditions has involved the development of algorithms to apply to the output of numerical weather prediction models. The abilities of several of these algorithms to predict icing conditions, as verified by pilot reports (PIREPs), are compared for two numerical weather prediction models (Eta and the Mesoscale Analysis and Prediction System) for the Winter Icing and Storms Program 1994 (WISP94) time period (25 January–25 March 1994). Algorithms included in the comparison were developed by the National Aviation Weather Advisory Unit [NAWAU, now the Aviation Weather Center (AWC)], the National Center for Atmospheric Research’s Research Applications Program (RAP), and the U.S. Air Force. Operational icing forecasts (AIRMETs) issued by NAWAU for the same time period are evaluated to provide a standard of comparison. The capabilities of the Eta Model’s explicit cloud liquid water estimates for identifying icing regions are also evaluated and compared to the algorithm results.
Because PIREPs are not systematic and are biased toward positive reports, it is difficult to estimate standard verification parameters related to overforecasting (e.g., false alarm ratio). Methods are developed to compensate for these attributes of the PIREPs. The primary verification statistics computed include the probability of detection (POD) of yes and no reports, and the areal and volume extent of the forecast region.
None of the individual algorithms were able to obtain both a higher POD and a smaller area than any other algorithm; increases in POD are associated in all cases with increases in area. The RAP algorithm provides additional information by attempting to identify the physical mechanisms associated with the forecast icing conditions. One component of the RAP algorithm, which is designed to detect and forecast icing in regions of “warm” stratiform clouds, is more efficient at detecting icing than the other components. Cloud liquid water shows promise for development as a predictor of icing conditions, with detection rates of 30% or more in this initial study. AIRMETs were able to detect approximately the same percentage of icing reports as the algorithms, but with somewhat smaller forecast areas and somewhat larger forecast volumes on average. The algorithms are able to provide guidance with characteristics that are similar to the AIRMETs and should be useful in their formulation.
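The two detection statistics mentioned above can be computed directly from matched forecast and PIREP pairs. The following is a minimal sketch, assuming each report has already been matched to a yes/no icing forecast at its location; the function name and the sample reports are hypothetical. The false alarm ratio is deliberately not computed, since the abstract notes that the non-systematic nature of PIREPs makes it unreliable.

```python
def pod_yes_no(reports):
    """Probability of detection of 'yes' and of 'no' icing reports.

    reports: iterable of (forecast_icing, pirep_icing) boolean pairs.
    Returns (POD_yes, POD_no); a value is NaN when there are no reports
    of that type.
    """
    yes_detected = yes_total = no_detected = no_total = 0
    for forecast_icing, pirep_icing in reports:
        if pirep_icing:
            yes_total += 1
            yes_detected += 1 if forecast_icing else 0
        else:
            no_total += 1
            no_detected += 0 if forecast_icing else 1
    pod_yes = yes_detected / yes_total if yes_total else float("nan")
    pod_no = no_detected / no_total if no_total else float("nan")
    return pod_yes, pod_no


# Hypothetical sample: 6 positive reports (4 inside the forecast region)
# and 4 negative reports (3 outside the forecast region).
sample = ([(True, True)] * 4 + [(False, True)] * 2 +
          [(False, False)] * 3 + [(True, False)] * 1)

pod_yes, pod_no = pod_yes_no(sample)
print(f"POD(yes) = {pod_yes:.2f}, POD(no) = {pod_no:.2f}")
```

Both statistics condition on the observation, so they remain interpretable even when positive reports are overrepresented. Pairing them with the area or volume of the forecast region guards against rewarding a forecast that simply covers the whole domain.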
Abstract
While traditional verification methods are commonly used to assess numerical model quantitative precipitation forecasts (QPFs) using a grid-to-grid approach, they generally offer little diagnostic information or reasoning behind the computed statistic. On the other hand, advanced spatial verification techniques, such as neighborhood and object-based methods, can provide more meaningful insight into differences between forecast and observed features in terms of skill with spatial scale, coverage area, displacement, orientation, and intensity. To demonstrate the utility of applying advanced verification techniques to mid- and coarse-resolution models, the Developmental Testbed Center (DTC) applied several traditional metrics and spatial verification techniques to QPFs provided by the Global Forecast System (GFS) and operational North American Mesoscale Model (NAM). Along with frequency bias and Gilbert skill score (GSS) adjusted for bias, both the fractions skill score (FSS) and Method for Object-Based Diagnostic Evaluation (MODE) were utilized for this study with careful consideration given to how these methods were applied and how the results were interpreted. By illustrating the types of forecast attributes appropriate to assess with the spatial verification techniques, this paper provides examples of how to obtain advanced diagnostic information to help identify what aspects of the forecast are or are not performing well.
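As an illustration of how a neighborhood method reports skill as a function of spatial scale, here is a minimal pure-Python sketch of the fractions skill score in its usual form, FSS = 1 - Σ(Pf - Po)² / (ΣPf² + ΣPo²), where Pf and Po are the fractions of grid points exceeding a threshold within a square neighborhood. The edge handling (windows truncated at the domain boundary), the toy fields, and the helper names are assumptions and do not reproduce the DTC configuration.

```python
def neighborhood_fractions(binary, half_width):
    """Fraction of event points in the square window around each grid point.

    Assumption: windows are truncated at the domain edge rather than padded.
    """
    rows, cols = len(binary), len(binary[0])
    fractions = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [binary[i][j]
                      for i in range(max(0, r - half_width), min(rows, r + half_width + 1))
                      for j in range(max(0, c - half_width), min(cols, c + half_width + 1))]
            out_row.append(sum(window) / len(window))
        fractions.append(out_row)
    return fractions


def fss(forecast, observed, threshold, half_width):
    """Fractions skill score for one threshold and one neighborhood size."""
    f_bin = [[1 if v >= threshold else 0 for v in row] for row in forecast]
    o_bin = [[1 if v >= threshold else 0 for v in row] for row in observed]
    pf = neighborhood_fractions(f_bin, half_width)
    po = neighborhood_fractions(o_bin, half_width)
    sq_diff = sum((f - o) ** 2 for f_row, o_row in zip(pf, po)
                  for f, o in zip(f_row, o_row))
    ref = (sum(f * f for row in pf for f in row) +
           sum(o * o for row in po for o in row))
    return 1.0 - sq_diff / ref if ref else float("nan")


def rain_block(n, row0, col0, size, amount):
    """Hypothetical n x n field with one square rain area of the given amount."""
    return [[amount if row0 <= r < row0 + size and col0 <= c < col0 + size else 0.0
             for c in range(n)] for r in range(n)]


observed = rain_block(12, 5, 3, 2, 10.0)
forecast = rain_block(12, 5, 7, 2, 10.0)  # same feature, displaced 4 columns

for half_width in (0, 2, 4):
    print(half_width, round(fss(forecast, observed, 5.0, half_width), 3))
```

With no neighborhood the displaced feature scores zero, and the score rises once the window becomes wide enough to contain both features. This scale dependence is the diagnostic information that a single grid-to-grid statistic does not provide.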
Abstract
Focusing on afternoon thunderstorms in Taiwan during the warm season (May–October) under weak synoptic forcing, this study applied the Taiwan Auto-NowCaster (TANC) to produce 1-h likelihood nowcasts of afternoon convection initiation (ACI) using a fuzzy logic approach. The primary objective is to design more useful forecast products with uncertainty regions of predicted thunderstorms to provide nowcast guidance of ACI for forecasters. Four sensitivity tests on forecast performance were conducted to improve the usefulness of nowcasts for forecasters. The optimal likelihood threshold (Lt) for ACIs, which is the likelihood value that best corresponds to the observed ACIs, was determined to be 0.6. Because of the high uncertainty on the exact location or timing of ACIs in nowcasts, location displacement and temporal shifting of ACIs should be considered in operational applications. When a spatial window of 5 km and a temporal window of 18 min are applied, the TANC displays moderate accuracy and satisfactory discrimination with an acceptable degree of overforecasting. The nonparametric Mann–Whitney test indicated that the performance of the TANC substantially surpasses the competing Space and Time Multiscale Analysis System–Weather Research and Forecasting Model, which serves as a pertinent reference for short-range (0–6 h) forecasts at the Central Weather Bureau in Taiwan.
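One way to picture the effect of the spatial and temporal windows is the relaxed matching sketched below. This is a hypothetical scheme, not the TANC verification procedure: it assumes nowcast cells and observed initiations are reduced to points (x in km, y in km, time in minutes), that a cell counts as a "yes" nowcast when its likelihood reaches the threshold Lt, and that a match requires both the distance and the time difference to fall within the windows. Only the values 0.6, 5 km, and 18 min come from the abstract.

```python
import math


def within(a, b, max_km, max_min):
    """True if points a and b (x_km, y_km, t_min) fall inside both windows."""
    return (math.hypot(a[0] - b[0], a[1] - b[1]) <= max_km
            and abs(a[2] - b[2]) <= max_min)


def relaxed_scores(nowcasts, observations, lt=0.6, max_km=5.0, max_min=18.0):
    """POD and false alarm ratio under spatial and temporal tolerance.

    nowcasts: list of (x_km, y_km, t_min, likelihood)
    observations: list of (x_km, y_km, t_min)
    """
    yes = [(x, y, t) for x, y, t, likelihood in nowcasts if likelihood >= lt]
    hits = sum(any(within(o, f, max_km, max_min) for f in yes)
               for o in observations)
    false_alarms = sum(not any(within(f, o, max_km, max_min) for o in observations)
                       for f in yes)
    pod = hits / len(observations) if observations else float("nan")
    far = false_alarms / len(yes) if yes else float("nan")
    return pod, far


# Hypothetical nowcast cells and observed initiations.
nowcasts = [(10.0, 10.0, 0.0, 0.8), (30.0, 30.0, 6.0, 0.7), (50.0, 50.0, 0.0, 0.4)]
observations = [(13.0, 13.0, 12.0), (50.0, 50.0, 0.0), (80.0, 80.0, 0.0)]

strict = relaxed_scores(nowcasts, observations, max_km=0.0, max_min=0.0)
relaxed = relaxed_scores(nowcasts, observations)
print("strict  POD, FAR:", strict)
print("relaxed POD, FAR:", relaxed)
```

In this toy case the first initiation is a few kilometers and 12 min away from a high-likelihood cell, so it only counts as detected once the windows are applied.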
Abstract
Recent advancements in numerical weather prediction (NWP) and the enhancement of model resolution have created the need for more robust and informative verification methods. In response to these needs, a plethora of spatial verification approaches have been developed in the past two decades. A spatial verification method intercomparison was established in 2007 with the aim of gaining a better understanding of the abilities of the new spatial verification methods to diagnose different types of forecast errors. The project focused on prescribed errors for quantitative precipitation forecasts over the central United States. The intercomparison led to a classification of spatial verification methods and a cataloging of their diagnostic capabilities, providing useful guidance to end users, model developers, and verification scientists. A decade later, NWP systems have continued to increase in resolution, including advances in high-resolution ensembles. This article describes the setup of a second phase of the verification intercomparison, called the Mesoscale Verification Intercomparison over Complex Terrain (MesoVICT). MesoVICT focuses on the application, capability, and enhancement of spatial verification methods to deterministic and ensemble forecasts of precipitation, wind, and temperature over complex terrain. Importantly, this phase also explores the issue of analysis uncertainty through the use of an ensemble of meteorological analyses.
Abstract
As part of the second phase of the spatial forecast verification intercomparison project (ICP), dubbed the Mesoscale Verification Intercomparison in Complex Terrain (MesoVICT) project, a new set of idealized test fields is prepared. This paper describes these new fields and their rationale and uses them to analyze a number of summary measures associated with distance and geometric-based approaches. The results provide guidance about how they inform about performance under various scenarios. The new case comparisons are grouped into four categories: (i) pathological situations such as when a variable is zero valued at all grid points; (ii) circular events aimed at evaluating how different methods handle contrived situations, such as equal but opposite translations, the presence of multiple events of same/different size, boundary effects, and the influence of the positioning of events in the domain; (iii) elliptical events representing simplified scenarios that mimic commonly encountered weather phenomena in complex terrain; and (iv) cases aimed at analyzing how the verification methods handle small-scale scattered events, very large events with holes (e.g., a small portion of clear sky on a cloudy overcast day), and the presence of noise in one or both fields. Results show that all analyzed measures perform poorly in the pathological setting. They are either not able to provide a result at all or they instigate a special rule to prescribe a value resulting in erratic results. The analysis also showed that methods provide similar information in many situations, but that each has its positive properties along with certain unique limitations.
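Two of the situations listed above, the pathological empty field and equal but opposite translations of a circular event, can be sketched with a simple distance measure. The example uses the Hausdorff distance as a representative distance-based summary; the grid size, radius, shifts, and helper names are assumptions and are not the MesoVICT test fields themselves.

```python
import math


def circle_field(n, center_r, center_c, radius):
    """n x n binary field with a filled circular event."""
    return [[1 if math.hypot(r - center_r, c - center_c) <= radius else 0
             for c in range(n)] for r in range(n)]


def event_points(field):
    return [(r, c) for r, row in enumerate(field) for c, v in enumerate(row) if v]


def hausdorff(a, b):
    """Hausdorff distance between the event sets of two binary fields.

    Returns None when either field has no events, since the distance to an
    empty set is undefined (the pathological case).
    """
    pa, pb = event_points(a), event_points(b)
    if not pa or not pb:
        return None

    def directed(p, q):
        return max(min(math.hypot(x[0] - y[0], x[1] - y[1]) for y in q) for x in p)

    return max(directed(pa, pb), directed(pb, pa))


n = 31
observed = circle_field(n, 15, 15, 3)
shift_east = circle_field(n, 15, 20, 3)   # translated +5 columns
shift_west = circle_field(n, 15, 10, 3)   # translated -5 columns
empty = [[0] * n for _ in range(n)]

print(hausdorff(observed, shift_east), hausdorff(observed, shift_west))
print(hausdorff(observed, empty))
```

The measure assigns the same value to the two opposite translations, so it carries no directional information, and it cannot return a number at all when one field is empty. Any special rule substituted for the undefined case is a design decision that should be reported alongside the scores.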