Spatial Forecast Verification Methods Inter-Comparison Project (ICP)
Theme Description:
The Spatial Forecast Verification Methods Inter-Comparison Project (ICP) is a coordinated international effort to critically compare the numerous recently proposed methods for evaluating forecast performance in a spatial context. The papers in this collection apply many of the new spatial methods to the ICP test cases, which comprise geometric, contrived (perturbed), and real quantitative precipitation forecasts.
The articles will be presented below as they are published.
Editor:
Bill Gallus
Collection coordinator:
Eric Gilleland
Spatial Forecast Verification Methods Inter-Comparison Project (ICP)
Numerous new methods have been proposed for using spatial information to better quantify and diagnose forecast performance when forecasts and observations are both available on the same grid. The majority of the new spatial verification methods can be classified into four broad categories (neighborhood, scale separation, features based, and field deformation), which themselves can be further generalized into two categories of filter and displacement. Because the methods make use of spatial information in widely different ways, users may be uncertain about what types of information each provides, and which methods may be most beneficial for particular applications. The Spatial Forecast Verification Methods Inter-Comparison Project (ICP; www.ral.ucar.edu/projects/icp), an international effort coordinated by NCAR and facilitated by the WMO/World Weather Research Programme (WWRP) Joint Working Group on Forecast Verification Research, was formed to address these questions. An overview of the methods involved in the project is provided here with some initial guidelines about when each of the verification approaches may be most appropriate. Future spatial verification methods may include hybrid methods that combine aspects of filter and displacement approaches.
Abstract
Image warping for spatial forecast verification is applied to the test cases employed by the Spatial Forecast Verification Intercomparison Project (ICP), which includes both real and contrived cases. A larger set of cases is also used to investigate aggregating results for summarizing forecast performance over a long record of forecasts. The technique handles the geometric and perturbed cases with nearly exact precision, as would be expected. A statistic, dubbed here the image warp statistic (IWS), is proposed for ranking multiple forecasts and tested on the perturbed cases. IWS rankings for perturbed and real test cases are found to be sensible and physically interpretable. A powerful result of this study is that the image warp can be employed using a relatively sparse, preset regular grid without having to first identify features.
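A minimal sketch of the core idea, assuming a simple penalized least-squares formulation: displacements defined on a sparse, preset control grid are optimized to morph the forecast toward the observation, and the penalized minimum serves as an IWS-like score. The penalty weight, optimizer, and interpolation choices are illustrative assumptions, not the published IWS definition.

```python
# Sketch: morph the forecast toward the observation with displacements
# defined on a sparse, preset control grid (a hedged stand-in for the
# paper's image warp, not the published IWS).
import numpy as np
from scipy.ndimage import map_coordinates, zoom
from scipy.optimize import minimize

def warp(field, ctrl_dx, ctrl_dy):
    """Deform `field` using a dense displacement field interpolated
    from coarse control-grid displacements."""
    ny, nx = field.shape
    dy = zoom(ctrl_dy, (ny / ctrl_dy.shape[0], nx / ctrl_dy.shape[1]), order=1)
    dx = zoom(ctrl_dx, (ny / ctrl_dx.shape[0], nx / ctrl_dx.shape[1]), order=1)
    yy, xx = np.mgrid[0:ny, 0:nx]
    return map_coordinates(field, [yy + dy, xx + dx], order=1, mode="nearest")

def iws_like(obs, fcst, n_ctrl=4, penalty=0.1):
    """Penalized minimum squared error after warping: an IWS-like score."""
    k = n_ctrl * n_ctrl

    def cost(p):
        dx, dy = p[:k].reshape(n_ctrl, n_ctrl), p[k:].reshape(n_ctrl, n_ctrl)
        resid = obs - warp(fcst, dx, dy)
        # Squared-error term plus a penalty discouraging large deformations.
        return np.mean(resid ** 2) + penalty * np.mean(dx ** 2 + dy ** 2)

    return minimize(cost, np.zeros(2 * k), method="Powell").fun

# Toy test in the spirit of the geometric cases: a square displaced eastward.
obs = np.zeros((64, 64)); obs[24:40, 24:40] = 1.0
fcst = np.roll(obs, 6, axis=1)
print(iws_like(obs, fcst))   # much smaller than np.mean((obs - fcst)**2)
```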
Abstract
Verification methods for high-resolution forecasts have been based either on filtering or on objects created by thresholding the images. The filtering methods do not easily permit the use of deformation, while identifying objects based on thresholds can be problematic. In this paper, a new approach is introduced in which the observed and forecast fields are broken down into a mixture of Gaussians, and the parameters of the Gaussian mixture model fit are examined to identify translation, rotation, and scaling errors. The advantages of this method relative to the traditional filtering or object-based methods are discussed, and the resulting scores are interpreted on a standard verification dataset.
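A hedged sketch of the approach: grid points above a threshold are treated as point samples, a Gaussian is fit to each field, and translation, scaling, and rotation errors are read off the fitted means and covariances. The threshold, single-component fit, and absence of intensity weighting are simplifications of the paper's mixture-model procedure.

```python
# Sketch: fit one Gaussian per field to the wet area and compare parameters.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gaussian(field, threshold=0.1, seed=0):
    ys, xs = np.nonzero(field > threshold)
    pts = np.column_stack([xs, ys]).astype(float)   # (x, y) samples
    return GaussianMixture(n_components=1, random_state=seed).fit(pts)

obs = np.zeros((64, 64)); obs[20:36, 20:44] = 1.0   # 24-wide, 16-tall blob
fcst = np.roll(obs, 8, axis=1)                      # shifted 8 points east

g_o, g_f = fit_gaussian(obs), fit_gaussian(fcst)
translation = g_f.means_[0] - g_o.means_[0]         # ~(8, 0): displacement
w_o, v_o = np.linalg.eigh(g_o.covariances_[0])      # axis lengths/directions
w_f, v_f = np.linalg.eigh(g_f.covariances_[0])
scaling = np.sqrt(w_f / w_o)                        # ~1: no size error
cosang = np.clip(abs(v_f[:, 1] @ v_o[:, 1]), 0.0, 1.0)
rotation = np.degrees(np.arccos(cosang))            # ~0: no rotation error
print(translation, scaling, rotation)
```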
Abstract
Several spatial forecast verification methods have been developed that are suited for high-resolution precipitation forecasts. They can account for the spatial coherence of precipitation and give credit to a forecast that does not necessarily match the observation at any particular grid point. The methods were grouped into four broad categories (neighborhood, scale separation, features based, and field deformation) for the Spatial Forecast Verification Methods Intercomparison Project (ICP). Participants were asked to apply their new methods to a set of artificial geometric and perturbed forecasts with prescribed errors, and a set of real forecasts of convective precipitation on a 4-km grid. This paper describes the intercomparison test cases, summarizes results from the geometric cases, and presents subjective scores and traditional scores from the real cases.
All the new methods could detect bias error, and the features-based and field deformation methods were also able to diagnose displacement errors of precipitation features. The best approach for capturing errors in aspect ratio was field deformation. For the real cases, the traditional verification scores did not agree with the subjective assessments of the forecasts.
Abstract
Bias-adjusted threat and equitable threat scores were designed to account for the effects of placement errors in assessing the performance of under- or overbiased forecasts. These bias-adjusted performance measures exhibit bias sensitivity. The critical performance ratio (CPR) is the minimum fraction of added forecasts that are correct for a performance measure to indicate improvement if bias is increased. In the opposite case, the CPR is the maximum fraction of removed forecasts that are correct for a performance measure to indicate improvement if bias is decreased. The CPR is derived here for the bias-adjusted threat and equitable threat scores to quantify bias sensitivity relative to several other measures of performance including conventional threat and equitable threat scores. The CPR for a bias-adjusted equitable threat score may indicate the likelihood of preserving or increasing the conventional equitable threat score if forecasts are bias corrected based on past performance.
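For the conventional (unadjusted) threat score TS = a/(a+b+c), with a hits, b false alarms, and c misses, the CPR can be derived directly: adding df forecast points of which a fraction alpha verify changes the score to (a + alpha*df)/(a+b+c + (1-alpha)*df), which exceeds the original exactly when alpha > a/(2a+b+c). The sketch below computes the conventional scores and this derived CPR; the CPRs for the bias-adjusted scores in the paper are more involved and are not reproduced here.

```python
# Conventional scores from a 2x2 contingency table, plus the CPR derived
# above for the plain threat score.
def scores(hits, false_alarms, misses, correct_negatives):
    n = hits + false_alarms + misses + correct_negatives
    ts = hits / (hits + false_alarms + misses)                 # threat score
    chance = (hits + misses) * (hits + false_alarms) / n       # random hits
    ets = (hits - chance) / (hits + false_alarms + misses - chance)
    bias = (hits + false_alarms) / (hits + misses)             # freq. bias
    cpr_ts = hits / (2 * hits + false_alarms + misses)         # derived CPR
    return {"TS": ts, "ETS": ets, "bias": bias, "CPR_TS": cpr_ts}

print(scores(hits=50, false_alarms=30, misses=40, correct_negatives=880))
```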
Abstract
High-resolution forecasts may be quite useful even when they do not match the observations exactly. Neighborhood verification is a strategy for evaluating the “closeness” of the forecast to the observations within space–time neighborhoods rather than at the grid scale. Various properties of the forecast within a neighborhood can be assessed for similarity to the observations, including the mean value, fractional coverage, occurrence of a forecast event sufficiently near an observed event, and so on. By varying the sizes of the neighborhoods, it is possible to determine the scales for which the forecast has sufficient skill for a particular application. Several neighborhood verification methods have been proposed in the literature in the last decade. This paper examines four such methods in detail for idealized and real high-resolution precipitation forecasts, highlighting what can be learned from each of the methods. When applied to idealized and real precipitation forecasts from the Spatial Verification Methods Intercomparison Project, all four methods showed improved forecast performance for neighborhood sizes larger than grid scale, with the optimal scale for each method varying as a function of rainfall intensity.
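One widely used neighborhood method is the fractions skill score (FSS) of Roberts and Lean (2008), which compares the fractional coverage of threshold exceedances over square neighborhoods of increasing size. A minimal sketch, with illustrative choices for the threshold and boundary handling:

```python
# Fractions skill score over square neighborhoods.
import numpy as np
from scipy.ndimage import uniform_filter

def fss(obs, fcst, threshold, n):
    """FSS for an n x n neighborhood: 1 - MSE(fractions) / reference MSE."""
    p_o = uniform_filter((obs >= threshold).astype(float), size=n, mode="constant")
    p_f = uniform_filter((fcst >= threshold).astype(float), size=n, mode="constant")
    mse = np.mean((p_f - p_o) ** 2)
    ref = np.mean(p_f ** 2) + np.mean(p_o ** 2)
    return 1.0 - mse / ref if ref > 0 else np.nan

rng = np.random.default_rng(0)
obs = rng.gamma(0.5, 2.0, size=(128, 128))     # synthetic "rain" field
fcst = np.roll(obs, 5, axis=0)                 # displaced forecast
for n in (1, 5, 11, 21):                       # skill rises with scale
    print(n, round(fss(obs, fcst, threshold=2.0, n=n), 3))
```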
Abstract
In this study, a recently introduced feature-based quality measure called SAL, which provides information about the structure, amplitude, and location of a quantitative precipitation forecast (QPF) in a prespecified domain, is applied to different sets of synthetic and realistic QPFs in the United States. The focus is on a detailed discussion of selected cases and on the comparison of the verification results obtained with SAL and some classical gridpoint-based error measures. For simple geometric precipitation objects it is shown that SAL adequately captures errors in the size and location of the objects, but not in their orientation. The artificially modified (so-called fake) cases illustrate that SAL has the potential to distinguish between forecasts where intense precipitation objects are either isolated or embedded in a larger-scale low-intensity precipitation area. The real cases highlight that a quality assessment with SAL can lead to contrasting results compared to the application of classical error measures and that, overall, SAL provides useful guidance for identifying the specific shortcomings of a particular QPF. It is also discussed that verification results with SAL and other error measures should be interpreted with care when considering large domains, which may contain meteorologically distinct precipitation systems.
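A sketch of the amplitude (A) and first location (L) components, following their standard definitions: A is the normalized difference of domain-mean precipitation, and the first part of L is the distance between the fields' centers of mass, normalized by the largest distance in the domain. The structure (S) component requires object identification and is omitted here.

```python
# Amplitude (A) and first location (L) components of SAL.
import numpy as np

def sal_a_l1(obs, fcst):
    d_o, d_f = obs.mean(), fcst.mean()               # domain-mean precip
    A = 2.0 * (d_f - d_o) / (d_f + d_o)              # amplitude, in [-2, 2]

    def com(f):                                      # center of mass (y, x)
        yy, xx = np.mgrid[0:f.shape[0], 0:f.shape[1]]
        return np.array([(yy * f).sum(), (xx * f).sum()]) / f.sum()

    d_max = np.hypot(*obs.shape)                     # largest domain distance
    L1 = np.linalg.norm(com(fcst) - com(obs)) / d_max
    return A, L1

obs = np.zeros((100, 100)); obs[40:60, 30:50] = 5.0
fcst = 1.5 * np.roll(obs, 15, axis=1)                # stronger and displaced
print(sal_a_l1(obs, fcst))                           # A = 0.4, L1 ~ 0.106
```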
Abstract
Three spatial verification techniques are applied to three datasets. The datasets consist of a mixture of real and artificial forecasts, and corresponding observations, designed to aid in better understanding the effects of global (i.e., across the entire field) displacement and intensity errors. The three verification techniques, each based on well-known statistical methods, have little in common and so present different facets of forecast quality. It is shown that a verification method based on cluster analysis can identify “objects” in a forecast and an observation field, thereby allowing for object-oriented verification in the sense that it considers displacement, missed forecasts, and false alarms. A second method compares the observed and forecast fields, not in terms of the objects within them, but in terms of the covariance structure of the fields, as summarized by their variogram. The last method addresses the agreement between the two fields by inferring the function that maps one to the other. The map—generally called optical flow—provides a (visual) summary of the “difference” between the two fields. A further summary measure of that map is found to yield useful information on the distortion error in the forecasts.
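A minimal sketch of the variogram idea: the empirical variogram summarizes how field variance grows with separation distance, and it is insensitive to a pure displacement of the field, which is what makes it a comparison of texture rather than location. A directional, integer-lag version is used here for brevity.

```python
# Empirical variogram: half the mean squared increment at each lag.
import numpy as np

def variogram(field, max_lag=20):
    """Directional (x axis), integer lags; real use averages directions."""
    return np.array([0.5 * np.mean((field[:, h:] - field[:, :-h]) ** 2)
                     for h in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
obs = rng.gamma(0.5, 2.0, size=(128, 128))
fcst = np.roll(obs, 7, axis=1)                 # pure displacement error
print(np.abs(variogram(obs) - variogram(fcst)).max())  # small: same texture
```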
Abstract
The composite method is applied to verify a series of idealized and real precipitation forecasts as part of the Spatial Forecast Verification Methods Intercomparison Project. The test cases range from simple geometric shapes to high-resolution (∼4 km) numerical model precipitation output. The performance of the composite method is described as it is applied to each set of forecasts. In general, the method performed well because it was able to relay information concerning spatial displacement and areal coverage errors. Summary scores derived from the composite means and the individual events displayed relevant information in a condensed form. The composite method also showed an ability to discern performance attributes from high-resolution precipitation forecasts from several competing model configurations, though the results were somewhat limited by the lack of data. Overall, the composite method proved to be most sensitive in revealing systematic displacement errors, while it was less sensitive to systematic model biases.
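A hedged sketch of the compositing idea only (the details differ from the published composite method): each case is recentered on its observed feature and averaged, so a systematic displacement error appears as an offset between the composite-mean forecast and observed fields.

```python
# Compositing sketch: recenter each case on its observed feature and average.
import numpy as np
from scipy.ndimage import center_of_mass, shift

def composite(fields, centers, shape=(64, 64)):
    target = (shape[0] / 2.0, shape[1] / 2.0)
    acc = np.zeros(shape)
    for f, c in zip(fields, centers):
        acc += shift(f, (target[0] - c[0], target[1] - c[1]), order=1)
    return acc / len(fields)

rng = np.random.default_rng(2)
obs_list, fcst_list, centers = [], [], []
for _ in range(30):                        # 30 cases, random event locations
    y, x = rng.integers(20, 44, size=2)
    o = np.zeros((64, 64)); o[y-4:y+4, x-4:x+4] = 1.0
    obs_list.append(o)
    fcst_list.append(np.roll(o, 5, axis=1))          # systematic 5-pt error
    centers.append(center_of_mass(o))

comp_o = composite(obs_list, centers)
comp_f = composite(fcst_list, centers)
# Systematic eastward displacement survives averaging: offset ~5 points.
print(center_of_mass(comp_f)[1] - center_of_mass(comp_o)[1])
```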
Abstract
A field verification measure for precipitation forecasts is presented that combines distance and amplitude errors. It is based on an optical flow algorithm that defines a vector field that deforms, or morphs, one image to match another. When the forecast field is morphed to match the observation field, then for any point in the observation field, the magnitude of the displacement vector gives the distance to the corresponding forecast object (if any), while the difference between the observation and the morphed forecast is the amplitude error. Similarly, morphing the observation field onto the forecast field gives displacement and amplitude errors for forecast features. If observed and forecast features are separated by more than a prescribed maximum search distance, they are not matched to each other, but they are considered to be two separate amplitude errors: a missed event and a false alarm. The displacement and amplitude error components are combined to produce a displacement and amplitude score (DAS). The two components are weighted according to the principle that a displacement error equal to the maximum search distance is equivalent to the amplitude error that would be obtained by a forecast and an observed feature that are too far apart to be matched. The new score, DAS, is applied to the idealized and observed test cases of the Spatial Verification Methods Intercomparison Project (ICP) and is found to accurately measure displacement errors and quantify combined displacement and amplitude errors reasonably well, although with some limitations due to the inability of the image matcher to perfectly match complex fields.
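A sketch of the combination step only, since the optical flow morphing itself is beyond a short example. The displacement and amplitude parts are normalized so that a displacement equal to the maximum search distance counts the same as the amplitude error of an unmatched feature, per the weighting principle stated above; the specific normalization constants here are illustrative assumptions, not the published DAS coefficients.

```python
# DAS-style combination of displacement and amplitude error components.
import numpy as np

def das(displacements, amplitude_errors, d_max, a_unmatched):
    dis = np.mean(displacements) / d_max              # displacement part
    amp = np.sqrt(np.mean(np.square(amplitude_errors))) / a_unmatched
    return dis + amp

# Toy numbers: features displaced ~10 of an allowed 40 grid points, with
# small residual amplitude errors relative to a typical unmatched error.
print(das(np.array([8.0, 12.0, 10.0]), np.array([0.2, -0.1, 0.3]),
          d_max=40.0, a_unmatched=1.0))
```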