Search Results

You are looking at 1 - 10 of 16 items for

  • Author or Editor: Eric Gilleland
  • All content
Eric Gilleland

Abstract

This paper proposes new diagnostic plots that exploit the asymmetry of the mean error distance (MED) measure for binary images. This asymmetry yields a new concept of false alarms and misses suited to the spatial setting, where a forecast need not match the observation perfectly to count as a hit or a correct negative. Additionally, three previously proposed geometric indices that provide complementary information about forecast performance are used to produce further diagnostic plots. The diagnostics are applied to previously analyzed case studies from the Spatial Forecast Verification Intercomparison Project (ICP) to facilitate comparison with more complicated methods. Relatively new test cases from the Mesoscale Verification Intercomparison over Complex Terrain (MesoVICT) project are also employed to support future comparisons. The proposed techniques are found to provide useful information about forecast model behavior through a succinct, easy-to-implement method that can complement other measures of forecast performance.

Full access
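The asymmetry the first abstract relies on can be seen in a small sketch. It assumes the common definition MED(A, B) = (1/n_A) Σ_{x∈A} d(x, B), where d(x, B) is the distance from event point x to the nearest event point of B; the function name and the toy grids are hypothetical. Read loosely, a large MED(observed, forecast) means observed events lie far from any forecast event (miss-like behavior), while a large MED(forecast, observed) means forecast events lie far from any observed event (false-alarm-like behavior). The paper's exact diagnostic definitions may differ.

```python
import numpy as np

def mean_error_distance(a, b):
    """MED(a, b): average, over the event points of binary image a, of the
    Euclidean distance to the nearest event point of binary image b."""
    pa = np.argwhere(a)  # (row, col) of every event point in a
    pb = np.argwhere(b)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both images need at least one event point")
    # Brute-force pairwise distances between points of a and points of b.
    diff = pa[:, None, :] - pb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return float(dist.min(axis=1).mean())

# Hypothetical example: the forecast covers only the middle of the observed area.
obs = np.zeros((10, 10), dtype=bool)
obs[2:8, 2:8] = True
fcst = np.zeros((10, 10), dtype=bool)
fcst[4:6, 4:6] = True

med_fo = mean_error_distance(fcst, obs)  # every forecast point lies on an observed event
med_of = mean_error_distance(obs, fcst)  # much of the observed area is far from the forecast
print(med_fo, med_of)
```

Here MED(forecast, observed) is zero while MED(observed, forecast) is positive, so the pair of values separates "the forecast missed part of the event" from "the forecast placed events where none occurred," which a single symmetric distance cannot do.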
Eric Gilleland

Abstract

A mathematical displacement metric, Baddeley’s Δ, is examined for verifying gridded forecasts against gridded observations using the Spatial Forecast Verification Methods Intercomparison Project test cases. Results are compared with those of several other recently proposed approaches. The metric performs similarly to other displacement methods, complementing neighborhood techniques.

Full access
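As an illustration of how Baddeley’s Δ compares two binary fields, the sketch below uses the standard form Δ = [(1/N) Σ_x |w(d(x, A)) − w(d(x, B))|^p]^(1/p), where d(x, ·) is the distance from grid point x to the nearest event point and w(t) = min(t, c) is a cutoff transform. The choices p = 2 and the cutoff values are assumptions for illustration, not the settings used in the paper, and distances are computed by brute force rather than with a fast distance transform.

```python
import numpy as np

def distance_map(img):
    """Euclidean distance from every grid point to the nearest event point of img."""
    ny, nx = img.shape
    pts = np.argwhere(img)
    if len(pts) == 0:
        raise ValueError("image needs at least one event point")
    grid = np.indices((ny, nx)).reshape(2, -1).T  # all (row, col) grid points
    diff = grid[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1).reshape(ny, nx)

def baddeley_delta(a, b, p=2.0, c=np.inf):
    """Baddeley's Delta between binary images a and b with cutoff w(t) = min(t, c)."""
    da = np.minimum(distance_map(a), c)
    db = np.minimum(distance_map(b), c)
    return float(np.mean(np.abs(da - db) ** p) ** (1.0 / p))

# Hypothetical example: an observed square and a forecast shifted 3 columns right.
obs = np.zeros((12, 12), dtype=bool)
obs[4:8, 4:8] = True
fcst = np.zeros((12, 12), dtype=bool)
fcst[4:8, 7:11] = True

print(baddeley_delta(obs, fcst))        # no cutoff
print(baddeley_delta(obs, fcst, c=2.0))  # distances beyond 2 grid points are capped
```

Because the comparison is made on whole distance maps rather than gridpoint-by-gridpoint matches, Δ grows with displacement even when the two features do not overlap at all; the cutoff c limits how much far-away grid points can contribute.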
Eric Gilleland

Abstract

Which model is best? Many challenges exist when testing competing forecast models, especially those with high spatial resolution. Spatial correlation, double penalties, and small-scale errors are just a few such challenges, and many new methods have been developed in recent decades to tackle them. The spatial prediction comparison test (SPCT), which was developed for general spatial fields and applied to wind speed, is applied here to precipitation fields, which pose many unique challenges: they are not normally distributed, they contain numerous zero-valued grid points, and their verification results are particularly sensitive to small-scale errors and double penalties. The SPCT addresses one important issue in spatial forecast verification by accounting for spatial correlation. Important for precipitation forecasts, the test requires no distributional assumptions, is easy to perform, and can be applied efficiently to either gridded or nongridded spatial fields. The test compares loss functions between two competing forecasts; any such function can be used, but most still suffer from the limitations of traditional gridpoint-by-gridpoint assessment techniques. Therefore, two new loss functions for the SPCT are introduced here to address these concerns: the first is based on distance maps and the second on image warping. Results are consistent with other spatial assessment methods but provide a relatively straightforward mechanism for comparing forecasts with a statistically powerful test. Combined with these loss functions, the SPCT provides a new mechanism for appropriately testing which of two competing precipitation models is best and whether the result is statistically significant.

Full access
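The SPCT works on a loss-differential field D(s) = L(F1, O)(s) − L(F2, O)(s) and tests whether its mean is zero. The sketch below builds such a field from a distance-map loss. The specific loss |d_F(s) − d_O(s)| and the function names are hypothetical illustrations and may not match the paper's definition, and the spatially correlated variance estimate that the actual test requires is omitted, so this shows only the quantity being tested, not the test itself.

```python
import numpy as np

def distance_map(img):
    """Euclidean distance from every grid point to the nearest event point of img."""
    ny, nx = img.shape
    pts = np.argwhere(img)
    grid = np.indices((ny, nx)).reshape(2, -1).T
    diff = grid[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1).reshape(ny, nx)

def distance_map_loss(fcst, obs):
    """Hypothetical gridpoint loss: absolute difference of the two distance maps."""
    return np.abs(distance_map(fcst) - distance_map(obs))

# Hypothetical fields: forecast 1 is displaced by 1 column, forecast 2 by 3 columns.
obs = np.zeros((12, 12), dtype=bool)
obs[4:8, 4:8] = True
f1 = np.zeros_like(obs)
f1[4:8, 5:9] = True
f2 = np.zeros_like(obs)
f2[4:8, 7:11] = True

# Loss-differential field; a negative mean favors forecast 1.
D = distance_map_loss(f1, obs) - distance_map_loss(f2, obs)
print(D.mean())
```

Unlike a gridpoint squared-error loss, this loss is informative even where neither forecast overlaps the observed feature, which is the limitation of traditional losses that the abstract refers to.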
Eric Gilleland

Abstract

When making statistical inferences, bootstrap resampling methods are often appealing because they make less stringent assumptions about the distribution of the statistic(s) of interest. However, the procedures are not free of assumptions. This paper addresses a specific situation, common in the atmospheric sciences, where the standard bootstrap is not appropriate: comparative forecast verification of continuous variables. In this setting, the question is which of two weather or climate models is better in terms of some type of average deviation from observations. The series to be compared are generally strongly dependent, which invalidates the most basic bootstrap technique. This paper also introduces new bootstrap code from the R package “distillery” that makes it easy to implement appropriate paired-difference-of-means bootstrap procedures for dependent data.

Restricted access
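One standard way to respect serial dependence when bootstrapping a paired difference of means is to resample blocks of consecutive values rather than individual values. The sketch below uses a circular block bootstrap on the loss differential d_t = |e1_t| − |e2_t|. It is a Python illustration, not the “distillery” R code; the block length of 10, the AR(1) test series, and the function names are assumptions chosen for the example.

```python
import numpy as np

def circular_block_bootstrap_ci(d, block_len, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean of a dependent series d (circular block bootstrap)."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    n = d.size
    n_blocks = int(np.ceil(n / block_len))
    offsets = np.arange(block_len)
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        # Consecutive indices from each start, wrapping past the end of the series.
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n] % n
        means[b] = d[idx].mean()
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

def ar1(phi, size, rng):
    """Simulate a first-order autoregressive series with unit-variance innovations."""
    e = rng.normal(size=size)
    x = np.empty(size)
    x[0] = e[0]
    for t in range(1, size):
        x[t] = phi * x[t - 1] + e[t]
    return x

# Hypothetical errors of two forecasts against the same observations.
rng = np.random.default_rng(42)
n = 300
err1 = ar1(0.7, n, rng)
err2 = 1.3 * ar1(0.7, n, rng)
d = np.abs(err1) - np.abs(err2)  # paired loss differential (absolute error)
lo, hi = circular_block_bootstrap_ci(d, block_len=10)
print(d.mean(), lo, hi)
```

If the interval excludes zero, the difference in mean absolute error would be judged significant at level alpha. Resampling whole blocks keeps short-range autocorrelation inside each block, which is exactly what the basic independent-resampling bootstrap destroys.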
Eric Gilleland

Abstract

This paper is the sequel to a companion paper on bootstrap resampling, and it reviews bootstrap methodology for making statistical inferences in atmospheric science applications where the assumptions needed by the most commonly used resampling procedures are often not met. In particular, this sequel addresses extreme-value analysis applications, with a discussion of the challenges of finding accurate bootstrap methods in this context. New bootstrap code from the R packages “distillery” and “extRemes” is introduced. It is further found that one approach to obtaining accurate confidence intervals in this setting is not well suited to cases in which the distribution of the random sample is not stationary.

Restricted access
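To make the extreme-value setting concrete, the sketch below computes a parametric bootstrap confidence interval for a 100-year return level. As a deliberate simplification it swaps in a Gumbel distribution (the GEV with zero shape) fitted by the method of moments, rather than the GEV fits and procedures in “extRemes” or “distillery”; the data and function names are hypothetical. The return level solves F(z) = 1 − 1/T for the Gumbel CDF F(z) = exp(−exp(−(z − μ)/σ)), giving z_T = μ − σ ln(−ln(1 − 1/T)).

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(x):
    """Method-of-moments Gumbel fit; returns (location, scale)."""
    scale = np.std(x, ddof=1) * np.sqrt(6.0) / np.pi
    loc = np.mean(x) - EULER_GAMMA * scale
    return loc, scale

def return_level(loc, scale, period):
    """Level exceeded on average once every `period` blocks (e.g., years)."""
    return loc - scale * np.log(-np.log(1.0 - 1.0 / period))

def parametric_bootstrap_rl(x, period=100, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for the return level, assuming stationarity."""
    rng = np.random.default_rng(seed)
    loc, scale = fit_gumbel_moments(x)
    est = return_level(loc, scale, period)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.gumbel(loc, scale, size=len(x))  # resample from the fitted model
        reps[b] = return_level(*fit_gumbel_moments(xb), period)
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return float(est), float(lo), float(hi)

# Hypothetical stationary annual maxima.
maxima = np.random.default_rng(1).gumbel(30.0, 5.0, size=50)
est, lo, hi = parametric_bootstrap_rl(maxima)
print(est, lo, hi)
```

Every bootstrap sample here is drawn from one fixed distribution, so the procedure presumes stationarity; this is the kind of assumption the abstract warns can fail when the sample's distribution changes over time.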
Eric Gilleland, Johan Lindström, and Finn Lindgren

Abstract

Image warping for spatial forecast verification is applied to the test cases employed by the Spatial Forecast Verification Intercomparison Project (ICP), which include both real and contrived cases. A larger set of cases is also used to investigate aggregating results to summarize forecast performance over a long record of forecasts. As would be expected, the technique handles the geometric and perturbed cases with nearly exact precision. A statistic, dubbed here the image warp statistic (IWS), is proposed for ranking multiple forecasts and is tested on the perturbed cases. IWS rankings for the perturbed and real test cases are found to be sensible and physically interpretable. A powerful result of this study is that the image warp can be employed using a relatively sparse, preset regular grid without first having to identify features.

Full access
David Ahijevych, Eric Gilleland, Barbara G. Brown, and Elizabeth E. Ebert

Abstract

Several spatial forecast verification methods have been developed that are suited for high-resolution precipitation forecasts. They can account for the spatial coherence of precipitation and give credit to a forecast that does not necessarily match the observation at any particular grid point. The methods were grouped into four broad categories (neighborhood, scale separation, features based, and field deformation) for the Spatial Forecast Verification Methods Intercomparison Project (ICP). Participants were asked to apply their new methods to a set of artificial geometric and perturbed forecasts with prescribed errors, and a set of real forecasts of convective precipitation on a 4-km grid. This paper describes the intercomparison test cases, summarizes results from the geometric cases, and presents subjective scores and traditional scores from the real cases.

All the new methods could detect bias error, and the features-based and field deformation methods were also able to diagnose displacement errors of precipitation features. Field deformation was the best approach for capturing errors in aspect ratio. For the real cases, the traditional verification scores did not agree with the subjective assessment of the model forecasts.

Full access
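Why traditional scores struggle with displaced features can be shown with a geometric case in the spirit of the ICP tests. The sketch builds an elliptical observed feature and two displaced forecasts, then computes the critical success index (CSI), a traditional gridpoint score. The grid size, ellipse dimensions, and displacements are hypothetical and are not the ICP specifications.

```python
import numpy as np

def ellipse(ny, nx, cy, cx, ry, rx):
    """Binary field that is True inside an axis-aligned ellipse."""
    y, x = np.ogrid[:ny, :nx]
    return ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0

def critical_success_index(fcst, obs):
    """CSI = hits / (hits + misses + false alarms), counted gridpoint by gridpoint."""
    hits = np.sum(fcst & obs)
    misses = np.sum(~fcst & obs)
    false_alarms = np.sum(fcst & ~obs)
    return hits / (hits + misses + false_alarms)

ny, nx = 100, 120
obs = ellipse(ny, nx, 50, 40, 10, 20)
near = ellipse(ny, nx, 50, 50, 10, 20)  # displaced 10 grid points: partial overlap
far = ellipse(ny, nx, 50, 90, 10, 20)   # displaced 50 grid points: no overlap

print(critical_success_index(near, obs))
print(critical_success_index(far, obs))
```

The far forecast scores zero, and so would any forecast displaced by 60 or 100 grid points, so the score cannot say how far off the feature is. Displacement-type methods (features based and field deformation) are designed to recover that distance.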
Eric Gilleland, Thomas C. M. Lee, John Halley Gotway, R. G. Bullock, and Barbara G. Brown

Abstract

An important focus of research in the forecast verification community is the development of alternative verification approaches for quantitative precipitation forecasts, as well as for other spatial forecasts. The need for information that is meaningful in an operational context and the importance of capturing the specific sources of forecast error at varying spatial scales are two primary motivating factors. In this paper, features of precipitation as identified by a convolution threshold technique are merged within fields and matched across fields in an automatic and computationally efficient manner using Baddeley’s metric for binary images.

The method is carried out on 100 test cases, and 4 representative cases are shown in detail. Results of merging and matching objects are generally positive in that they are consistent with how a subjective observer might merge and match features. The results further suggest that the Baddeley metric may be useful as a computationally efficient summary metric giving information about location, shape, and size differences of individual features, which could be employed for other spatial forecast verification methods.

Full access
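The convolution threshold step that identifies features can be sketched as smoothing the field with a disc-shaped averaging kernel and then thresholding the smoothed field to obtain a binary object mask. The radius, threshold, and toy field below are hypothetical; in practice the objects would then be the connected regions of the mask (labeling is omitted here), and merging and matching would follow.

```python
import numpy as np

def convolution_threshold(field, radius, threshold):
    """Average `field` over a disc of the given radius, then threshold it.

    Grid points outside the domain are treated as zero (zero padding).
    """
    r = int(np.ceil(radius))
    padded = np.pad(field.astype(float), r, mode="constant")
    ny, nx = field.shape
    acc = np.zeros((ny, nx))
    count = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= radius * radius:  # offset lies inside the disc
                acc += padded[r + dy:r + dy + ny, r + dx:r + dx + nx]
                count += 1
    smoothed = acc / count
    return smoothed >= threshold

# Hypothetical precipitation field: one coherent rain area plus a single noisy point.
field = np.zeros((20, 20))
field[5:12, 5:12] = 10.0
field[17, 17] = 10.0

mask = convolution_threshold(field, radius=1, threshold=5.0)
print(mask[8, 8], mask[17, 17])
```

Smoothing before thresholding is what lets the method keep the coherent rain area while discarding the isolated point, which would otherwise become a spurious one-pixel object.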
Eric Gilleland, David Ahijevych, Barbara G. Brown, Barbara Casati, and Elizabeth E. Ebert

Abstract

Advancements in weather forecast models and their enhanced resolution have led to substantially improved and more realistic-appearing forecasts for some variables. However, traditional verification scores often indicate poor performance because of the increased small-scale variability, so the true quality of the forecasts is not always characterized well. As a result, numerous new methods for verifying these forecasts have been proposed. Most of these methods fall into two overall categories: filtering methods and displacement methods. The filtering methods can be further divided into neighborhood and scale separation methods, and the displacement methods into features-based and field deformation methods. Each method gives considerably more information than the traditional scores, but it is not clear which method(s) should be used for which purpose.

A verification methods intercomparison project has been established to gain a better understanding of the proposed methods in terms of their various characteristics and to determine which verification questions each method addresses. The study is ongoing, and preliminary qualitative results for the different approaches applied to different situations are described here. In particular, the various methods and their basic characteristics, similarities, and differences are described. In addition, several questions are addressed regarding the application of the methods and the information that they provide. These questions include (i) what the methods reveal about performance at different scales; (ii) how the methods provide information on location errors; (iii) whether the methods provide information on intensity errors and distributions; (iv) whether the methods provide information on structure errors; (v) whether the approaches can provide information about hits, misses, and false alarms; (vi) whether the methods do anything that is counterintuitive; (vii) whether the methods have selectable parameters and how sensitive the results are to parameter selection; (viii) whether the results can be easily aggregated across multiple cases; (ix) whether the methods can identify timing errors; and (x) whether confidence intervals and hypothesis tests can be readily computed.

Full access
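As one concrete instance of the neighborhood (filtering) category, the sketch below computes the fractions skill score (FSS) of Roberts and Lean, a widely used neighborhood method; it is offered only as an illustration of that category, not as one of the methods described in this abstract. Event fractions are computed in an n × n window around each grid point (n assumed odd, zero padding at the edges), and FSS = 1 − MSE(P_f, P_o) / [mean(P_f²) + mean(P_o²)]. The fields and window sizes are hypothetical.

```python
import numpy as np

def neighborhood_fraction(binary, n):
    """Fraction of event points in the n x n window centred on each grid point."""
    r = n // 2
    padded = np.pad(binary.astype(float), r, mode="constant")
    ny, nx = binary.shape
    total = np.zeros((ny, nx))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            total += padded[r + dy:r + dy + ny, r + dx:r + dx + nx]
    return total / (2 * r + 1) ** 2

def fss(fcst, obs, n):
    """Fractions skill score for binary forecast and observed fields, window size n."""
    pf = neighborhood_fraction(fcst, n)
    po = neighborhood_fraction(obs, n)
    mse = np.mean((pf - po) ** 2)
    ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / ref

# Hypothetical fields: the forecast feature sits 6 columns right of the observed one.
obs = np.zeros((24, 24), dtype=bool)
obs[10:14, 5:9] = True
fcst = np.zeros_like(obs)
fcst[10:14, 11:15] = True

for n in (1, 5, 9):
    print(n, fss(fcst, obs, n))
```

At n = 1 the score reduces to a gridpoint comparison and the non-overlapping forecast gets no credit; larger windows give partial credit for a near miss, which is how neighborhood methods show performance as a function of scale.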
Eric Gilleland, Amanda S. Hering, Tressa L. Fowler, and Barbara G. Brown

Abstract

Which of two competing continuous forecasts is better? This question is often asked in forecast verification, as well as in climate model evaluation. Traditional statistical tests seem well suited to providing an answer. However, most such tests do not account for some of the special circumstances that are prevalent in this domain. For example, model output is seldom independent in time, and the models being compared are geared to predicting the same state of the atmosphere, so they can be contemporaneously correlated with each other. Such violations of the independence assumptions required by most statistical tests can greatly affect the accuracy and power of these tests. Here, this effect is examined on simulated series for many common testing procedures, including two-sample and paired t tests and normal approximation z tests, the z test with a first-order variance inflation factor applied, and the newer Hering–Genton (HG) test, as well as several bootstrap methods. While it is known how most of these tests behave in the face of temporal dependence, it is less clear how contemporaneous correlation affects them. Moreover, it is worth knowing just how badly the tests can fail so that, if they are applied, reasonable conclusions can be drawn. The HG test is found to be the most robust to both temporal dependence and contemporaneous correlation, and to the specific type and strength of temporal dependence. Bootstrap procedures that account for temporal dependence stand up well to contemporaneous correlation and temporal dependence but require large sample sizes to be accurate.

Open access
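One of the procedures named above, the z test with a first-order variance inflation factor, can be sketched on a paired loss-differential series d_t. Assuming the series behaves roughly like an AR(1) process with lag-1 autocorrelation r₁, the variance of its mean is inflated by approximately (1 + r₁)/(1 − r₁) relative to the independent case. The simulated series and the function name are hypothetical, and the sketch does not reproduce the HG test.

```python
import math
import numpy as np

def z_test_vif(d):
    """Two-sided z test of mean(d) = 0 with a first-order (AR(1)) variance inflation factor."""
    d = np.asarray(d, dtype=float)
    n = d.size
    dc = d - d.mean()
    r1 = float(np.sum(dc[1:] * dc[:-1]) / np.sum(dc ** 2))  # lag-1 autocorrelation estimate
    vif = (1 + r1) / (1 - r1)
    se_naive = float(d.std(ddof=1)) / math.sqrt(n)  # standard error assuming independence
    se = se_naive * math.sqrt(vif)                  # inflated standard error
    z = float(d.mean()) / se
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided normal p-value
    return {"r1": r1, "vif": vif, "se_naive": se_naive, "se": se, "z": z, "p": p}

# Hypothetical loss-differential series with strong positive autocorrelation.
rng = np.random.default_rng(7)
n = 400
e = rng.normal(size=n)
d = np.empty(n)
d[0] = e[0]
for t in range(1, n):
    d[t] = 0.8 * d[t - 1] + e[t]
d += 0.1  # small true mean difference between the two forecasts

res = z_test_vif(d)
print(res)
```

With positively autocorrelated differences the naive standard error is too small, so an uncorrected test rejects far too often; the inflation factor widens it, but it captures only first-order dependence, which is one reason more robust alternatives are compared in the paper.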