Abstract
Image warping for spatial forecast verification is applied to the test cases employed by the Spatial Forecast Verification Intercomparison Project (ICP), which include both real and contrived cases. A larger set of cases is also used to investigate aggregating results for summarizing forecast performance over a long record of forecasts. The technique handles the geometric and perturbed cases with nearly exact precision, as would be expected. A statistic, dubbed here the image warp statistic (IWS), is proposed for ranking multiple forecasts and tested on the perturbed cases. IWS rankings for the perturbed and real test cases are found to be sensible and physically interpretable. A powerful result of this study is that the image warp can be employed using a relatively sparse, preset regular grid without having to first identify features.
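The warping idea lends itself to a compact illustration. Below is a minimal Python sketch of deforming a forecast field with displacements defined only on a sparse, preset regular grid of control points, with no prior feature identification; the bilinear displacement interpolation and the before/after RMSE diagnostic are illustrative choices, not the published IWS.

```python
# Minimal sketch: warp a forecast field using displacements given on a sparse,
# preset regular grid of control points (no feature identification needed).
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.ndimage import map_coordinates

def warp(field, ctrl_y, ctrl_x, dy, dx):
    """Warp `field` with displacements (dy, dx) defined at control points on
    the regular grid (ctrl_y x ctrl_x), interpolated to every grid point."""
    ny, nx = field.shape
    interp_dy = RegularGridInterpolator((ctrl_y, ctrl_x), dy)
    interp_dx = RegularGridInterpolator((ctrl_y, ctrl_x), dx)
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    pts = np.stack([yy.ravel(), xx.ravel()], axis=-1)
    # Pull-back warp: sample the field at the displaced coordinates.
    coords = np.stack([yy.ravel() + interp_dy(pts), xx.ravel() + interp_dx(pts)])
    return map_coordinates(field, coords, order=1, mode="nearest").reshape(ny, nx)

# Toy case: a feature displaced 5 grid points east. A uniform control-point
# offset realigns it; the RMSE drop shows the error was pure displacement.
obs = np.zeros((64, 64)); obs[28:36, 28:36] = 1.0
fcst = np.zeros((64, 64)); fcst[28:36, 33:41] = 1.0
cy = cx = np.linspace(0, 63, 5)                    # sparse 5x5 control grid
dy = np.zeros((5, 5)); dx = np.full((5, 5), 5.0)   # sample 5 points to the east
warped = warp(fcst, cy, cx, dy, dx)
rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
print(rmse(fcst, obs), rmse(warped, obs))          # error before vs. after warp
```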
Abstract
Verification methods for high-resolution forecasts have been based either on filtering or on objects created by thresholding the images. The filtering methods do not easily permit the use of deformation, while identifying objects based on thresholds can be problematic. In this paper, a new approach is introduced in which the observed and forecast fields are decomposed into a mixture of Gaussians, and the parameters of the fitted Gaussian mixture model are examined to identify translation, rotation, and scaling errors. The advantages of this method are discussed relative to the traditional filtering and object-based methods, and the resulting scores are interpreted on a standard verification dataset.
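As an illustration of the mechanics, the sketch below fits Gaussian mixtures to toy observed and forecast fields and reads translation and scaling errors off the fitted parameters. Sampling grid points with probability proportional to intensity is one convenient way to turn a field into point data for scikit-learn's GaussianMixture; it is an assumption here, not necessarily the authors' fitting procedure.

```python
# Minimal sketch: fit Gaussian mixtures to observed and forecast fields and
# diagnose translation (difference of means) and scaling (ratio of covariance
# eigenvalues) errors from the fitted parameters.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_mixture(field, n_components=1, n_samples=5000, seed=0):
    """Sample grid points with probability proportional to intensity, then
    fit a Gaussian mixture to the sampled point locations."""
    rng = np.random.default_rng(seed)
    yy, xx = np.indices(field.shape)
    pts = np.column_stack([yy.ravel(), xx.ravel()])
    w = field.ravel().clip(min=0)
    idx = rng.choice(len(pts), size=n_samples, p=w / w.sum())
    return GaussianMixture(n_components=n_components, random_state=seed).fit(pts[idx])

obs = np.zeros((64, 64)); obs[20:30, 20:30] = 1.0
fcst = np.zeros((64, 64)); fcst[25:35, 30:44] = 1.0   # shifted and stretched in x
gm_o, gm_f = fit_mixture(obs), fit_mixture(fcst)
print("translation:", gm_f.means_[0] - gm_o.means_[0])
print("scaling:", np.linalg.eigvalsh(gm_f.covariances_[0])
               / np.linalg.eigvalsh(gm_o.covariances_[0]))
```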
Abstract
Several spatial forecast verification methods have been developed that are suited for high-resolution precipitation forecasts. They can account for the spatial coherence of precipitation and give credit to a forecast that does not necessarily match the observation at any particular grid point. The methods were grouped into four broad categories (neighborhood, scale separation, features based, and field deformation) for the Spatial Forecast Verification Methods Intercomparison Project (ICP). Participants were asked to apply their new methods to a set of artificial geometric and perturbed forecasts with prescribed errors, and a set of real forecasts of convective precipitation on a 4-km grid. This paper describes the intercomparison test cases, summarizes results from the geometric cases, and presents subjective scores and traditional scores from the real cases.
All the new methods could detect bias error, and the features-based and field deformation methods were also able to diagnose displacement errors of precipitation features. The best approach for capturing errors in aspect ratio was field deformation. When comparing model forecasts with real cases, the traditional verification scores did not agree with the subjective assessment of the forecasts.
Abstract
High-resolution forecasts may be quite useful even when they do not match the observations exactly. Neighborhood verification is a strategy for evaluating the “closeness” of the forecast to the observations within space–time neighborhoods rather than at the grid scale. Various properties of the forecast within a neighborhood can be assessed for similarity to the observations, including the mean value, fractional coverage, occurrence of a forecast event sufficiently near an observed event, and so on. By varying the sizes of the neighborhoods, it is possible to determine the scales for which the forecast has sufficient skill for a particular application. Several neighborhood verification methods have been proposed in the literature in the last decade. This paper examines four such methods in detail for idealized and real high-resolution precipitation forecasts, highlighting what can be learned from each of the methods. When applied to idealized and real precipitation forecasts from the Spatial Verification Methods Intercomparison Project, all four methods showed improved forecast performance for neighborhood sizes larger than grid scale, with the optimal scale for each method varying as a function of rainfall intensity.
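One widely used neighborhood measure of fractional coverage is the fractions skill score (FSS). The sketch below, with illustrative threshold and neighborhood sizes, shows how skill recovers as the neighborhood grows past a displacement error.

```python
# Minimal sketch of a fractional-coverage neighborhood score in the spirit of
# the fractions skill score: threshold both fields, average the binary fields
# over square neighborhoods, and compare the resulting fractions.
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, n):
    """Fractions skill score for a square neighborhood of n x n grid points."""
    pf = uniform_filter((fcst >= threshold).astype(float), size=n, mode="constant")
    po = uniform_filter((obs >= threshold).astype(float), size=n, mode="constant")
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)   # no-overlap reference error
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

# A displaced feature scores poorly at grid scale but improves once the
# neighborhood exceeds the displacement distance.
obs = np.zeros((64, 64)); obs[30:34, 20:24] = 5.0
fcst = np.zeros((64, 64)); fcst[30:34, 28:32] = 5.0   # 8-point eastward shift
for n in (1, 5, 9, 17, 33):
    print(n, round(fss(fcst, obs, threshold=1.0, n=n), 3))
```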
Abstract
In this study, a recently introduced feature-based quality measure called SAL, which provides information about the structure, amplitude, and location of a quantitative precipitation forecast (QPF) in a prespecified domain, is applied to different sets of synthetic and realistic QPFs in the United States. The focus is on a detailed discussion of selected cases and on the comparison of the verification results obtained with SAL and some classical gridpoint-based error measures. For simple geometric precipitation objects, it is shown that SAL adequately captures errors in the size and location of the objects, but not in their orientation. The artificially modified (so-called fake) cases illustrate that SAL has the potential to distinguish between forecasts where intense precipitation objects are either isolated or embedded in a larger-scale low-intensity precipitation area. The real cases highlight that a quality assessment with SAL can lead to contrasting results compared to the application of classical error measures and that, overall, SAL provides useful guidance for identifying the specific shortcomings of a particular QPF. It is also noted that verification results with SAL and other error measures should be interpreted with care when considering large domains, which may contain meteorologically distinct precipitation systems.
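For concreteness, the sketch below computes the amplitude component and the first part of the location component from domain means and intensity-weighted centroids, following the standard SAL definitions; the structure component, which requires identifying precipitation objects, is omitted here.

```python
# Minimal sketch of SAL's amplitude (A) and first location (L1) components.
import numpy as np

def sal_a_l1(fcst, obs):
    # Amplitude: normalized difference of domain-mean precipitation, in [-2, 2].
    df, do = fcst.mean(), obs.mean()
    a = (df - do) / (0.5 * (df + do))
    # Location (part 1): distance between intensity-weighted centroids,
    # normalized by the largest distance in the domain (the diagonal).
    yy, xx = np.indices(obs.shape)
    cf = np.array([np.average(yy, weights=fcst), np.average(xx, weights=fcst)])
    co = np.array([np.average(yy, weights=obs), np.average(xx, weights=obs)])
    l1 = np.linalg.norm(cf - co) / np.hypot(*obs.shape)
    return a, l1

obs = np.zeros((64, 64)); obs[20:30, 20:30] = 2.0
fcst = np.zeros((64, 64)); fcst[20:30, 36:46] = 3.0   # displaced, too intense
print(sal_a_l1(fcst, obs))   # positive A (overforecast), nonzero L1
```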
Abstract
Three spatial verification techniques are applied to three datasets. The datasets consist of a mixture of real and artificial forecasts, and corresponding observations, designed to aid in better understanding the effects of global (i.e., across the entire field) displacement and intensity errors. The three verification techniques, each based on well-known statistical methods, have little in common and so present different facets of forecast quality. It is shown that a verification method based on cluster analysis can identify “objects” in a forecast and an observation field, thereby allowing for object-oriented verification in the sense that it considers displacement, missed forecasts, and false alarms. A second method compares the observed and forecast fields, not in terms of the objects within them, but in terms of the covariance structure of the fields, as summarized by their variogram. The last method addresses the agreement between the two fields by inferring the function that maps one to the other. This map, generally called optical flow, provides a (visual) summary of the “difference” between the two fields. A further summary measure of that map is found to yield useful information on the distortion error in the forecasts.
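The variogram comparison is straightforward to sketch. The example below estimates the empirical semivariance gamma(h) = 0.5 E[(Z(s+h) - Z(s))^2] for a few lags; restricting to a single direction along grid rows and using synthetic gamma-distributed fields are illustrative simplifications.

```python
# Minimal sketch: compare the covariance structure of two fields via their
# empirical variograms along grid rows.
import numpy as np

def variogram(field, lags):
    """Empirical semivariance at each lag (grid points, x direction)."""
    return np.array([0.5 * np.mean((field[:, h:] - field[:, :-h]) ** 2)
                     for h in lags])

rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=1.0, size=(64, 64))
fcst = 0.5 * obs + 0.5 * rng.gamma(2.0, 1.0, size=(64, 64))  # different texture
lags = [1, 2, 4, 8, 16]
print(variogram(obs, lags))
print(variogram(fcst, lags))  # similar sills and ranges imply similar texture
```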
Abstract
The composite method is applied to verify a series of idealized and real precipitation forecasts as part of the Spatial Forecast Verification Methods Intercomparison Project. The test cases range from simple geometric shapes to high-resolution (∼4 km) numerical model precipitation output. The performance of the composite method is described as it is applied to each set of forecasts. In general, the method performed well because it was able to relay information concerning spatial displacement and areal coverage errors. Summary scores derived from the composite means and the individual events displayed relevant information in a condensed form. The composite method also showed an ability to discern performance attributes from high-resolution precipitation forecasts from several competing model configurations, though the results were somewhat limited by the lack of data. Overall, the composite method proved to be most sensitive in revealing systematic displacement errors, while it was less sensitive to systematic model biases.
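The compositing idea can be sketched briefly: center a window on each observed event, average the forecast field over those windows across many cases, and read systematic displacement off the composite maximum. Defining the event by the field maximum and the window half-width used below are illustrative assumptions, not the published configuration.

```python
# Minimal sketch of event compositing for diagnosing systematic displacement.
import numpy as np

def composite(fcsts, obss, half=16):
    """Mean forecast field in windows centered on each observed maximum."""
    acc = np.zeros((2 * half + 1, 2 * half + 1))
    for f, o in zip(fcsts, obss):
        fp = np.pad(f, half)                        # keep windows in bounds
        cy, cx = np.unravel_index(np.argmax(o), o.shape)
        acc += fp[cy:cy + 2 * half + 1, cx:cx + 2 * half + 1]
    return acc / len(fcsts)

# Cases with a consistent 6-point eastward forecast displacement: the
# composite maximum sits ~6 points east of the window center.
rng = np.random.default_rng(1)
obss, fcsts = [], []
for _ in range(50):
    y, x = rng.integers(10, 40, size=2)
    o = np.zeros((64, 64)); o[y:y + 4, x:x + 4] = 1.0
    f = np.zeros((64, 64)); f[y:y + 4, x + 6:x + 10] = 1.0
    obss.append(o); fcsts.append(f)
comp = composite(fcsts, obss)
dy, dx = np.unravel_index(np.argmax(comp), comp.shape)
print(dy - 16, dx - 16)   # systematic displacement estimate (~0, ~+6)
```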
Abstract
A field verification measure for precipitation forecasts is presented that combines distance and amplitude errors. It is based on an optical flow algorithm that defines a vector field that deforms, or morphs, one image to match another. When the forecast field is morphed to match the observation field, then for any point in the observation field, the magnitude of the displacement vector gives the distance to the corresponding forecast object (if any), while the difference between the observation and the morphed forecast is the amplitude error. Similarly, morphing the observation field onto the forecast field gives displacement and amplitude errors for forecast features. If observed and forecast features are separated by more than a prescribed maximum search distance, they are not matched to each other, but they are considered to be two separate amplitude errors: a missed event and a false alarm. The displacement and amplitude error components are combined to produce a displacement and amplitude score (DAS). The two components are weighted according to the principle that a displacement error equal to the maximum search distance is equivalent to the amplitude error that would be obtained by a forecast and an observed feature that are too far apart to be matched. The new score, DAS, is applied to the idealized and observed test cases of the Spatial Verification Methods Intercomparison Project (ICP) and is found to accurately measure displacement errors and quantify combined displacement and amplitude errors reasonably well, although with some limitations due to the inability of the image matcher to perfectly match complex fields.
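The stated weighting principle can be written down directly: normalize the displacement component by the maximum search distance and the amplitude component by the error an unmatched feature pair would incur, so that one unit of each is equivalent. The sketch below takes precomputed displacement and amplitude errors as inputs rather than reproducing the optical-flow matching itself; the numbers and the unit amplitude scale are illustrative assumptions, not the published formula.

```python
# Minimal sketch of the DAS weighting principle: a displacement equal to the
# maximum search distance costs the same as an unmatched feature's amplitude
# error, so normalize each component by that equivalence and sum.

def das(dis, amp, d_max, amp_unmatched):
    """Combine displacement and amplitude errors on a common scale."""
    return dis / d_max + amp / amp_unmatched

# Assumption for this toy scale: a missed unit-intensity feature contributes
# an amplitude error of ~1, so amp_unmatched = 1.0 makes a full-search-distance
# displacement equivalent to a miss.
print(das(dis=20.0, amp=0.3, d_max=40.0, amp_unmatched=1.0))   # 0.5 + 0.3
```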
Abstract
Advancements in weather forecast models and their enhanced resolution have led to substantially improved and more realistic-appearing forecasts for some variables. However, traditional verification scores often indicate poor performance because of the increased small-scale variability, so the true quality of the forecasts is not always well characterized. As a result, numerous new methods for verifying these forecasts have been proposed. These new methods can mostly be classified into two overall categories: filtering methods and displacement methods. The filtering methods can be further delineated into neighborhood and scale separation, and the displacement methods can be divided into features based and field deformation. Each method gives considerably more information than the traditional scores, but it is not clear which method(s) should be used for which purpose.
A verification methods intercomparison project has been established in order to glean a better understanding of the proposed methods in terms of their various characteristics and to determine what verification questions each method addresses. The study is ongoing, and preliminary qualitative results for the different approaches applied to different situations are described here. In particular, the various methods and their basic characteristics, similarities, and differences are described. In addition, several questions are addressed regarding the application of the methods and the information that they provide. These questions include (i) how the method(s) inform performance at different scales; (ii) how the methods provide information on location errors; (iii) whether the methods provide information on intensity errors and distributions; (iv) whether the methods provide information on structure errors; (v) whether the approaches have the ability to provide information about hits, misses, and false alarms; (vi) whether the methods do anything that is counterintuitive; (vii) whether the methods have selectable parameters and how sensitive the results are to parameter selection; (viii) whether the results can be easily aggregated across multiple cases; (ix) whether the methods can identify timing errors; and (x) whether confidence intervals and hypothesis tests can be readily computed.
Abstract
The authors use a procedure called the method for object-based diagnostic evaluation, commonly referred to as MODE, to compare forecasts made from two models representing separate cores of the Weather Research and Forecasting (WRF) model during the 2005 National Severe Storms Laboratory and Storm Prediction Center Spring Program. Both models, the Advanced Research WRF (ARW) and the Nonhydrostatic Mesoscale Model (NMM), were run without a traditional cumulus parameterization scheme on horizontal grid lengths of 4 km (ARW) and 4.5 km (NMM). MODE was used to evaluate 1-h rainfall accumulation from 24-h forecasts valid at 0000 UTC on 32 days between 24 April and 4 June 2005. The primary variable used for evaluation was a “total interest” derived from a fuzzy-logic algorithm that compared several attributes of forecast and observed rain features such as separation distance and spatial orientation. The maximum value of the total interest obtained by comparing an object in one field with all objects in the comparison field was retained as the quality of matching for that object. The median of the distribution of all such maximum-interest values was selected as a metric of the overall forecast quality.
Results from the 32 cases suggest that, overall, the configuration of the ARW model used during the 2005 Spring Program performed slightly better than the configuration of the NMM model. The primary manifestation of the differing levels of performance was fewer false alarms, forecast rain areas with no observed counterpart, in the ARW. However, it was noted that the performance varied considerably from day to day, with most days featuring indistinguishable performance. Thus, a small number of poor NMM forecasts produced the overall difference between the two models.
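A toy version of the total-interest calculation conveys the mechanics: map each attribute difference to an interest value in [0, 1], combine the interests with weights, keep the maximum interest per object, and take the median of those maxima (MMI). The ramps, weights, and attribute values below are illustrative, not MODE's operational configuration.

```python
# Minimal sketch of a fuzzy-logic "total interest" in the spirit of MODE,
# using two attributes: centroid distance and orientation-angle difference.
import numpy as np

def ramp(x, good, bad):
    """Interest 1 at x <= good, falling linearly to 0 at x >= bad."""
    return float(np.clip((bad - x) / (bad - good), 0.0, 1.0))

def total_interest(dist, angle_diff, w_dist=0.7, w_angle=0.3):
    return (w_dist * ramp(dist, good=10.0, bad=80.0)
            + w_angle * ramp(angle_diff, good=10.0, bad=90.0))

# Attribute table: forecast objects (rows) vs. observed objects (columns),
# as (centroid distance in grid points, orientation difference in degrees).
pairs = {(0, 0): (12.0, 5.0), (0, 1): (70.0, 40.0),
         (1, 0): (60.0, 80.0), (1, 1): (25.0, 15.0)}
interest = np.zeros((2, 2))
for (i, j), (d, a) in pairs.items():
    interest[i, j] = total_interest(d, a)
max_interest = interest.max(axis=1)           # best match per forecast object
print(max_interest, np.median(max_interest))  # per-object quality and MMI
```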