Search Results

Showing 1–10 of 18 items for Author or Editor: Eric Gilleland
Eric Gilleland

Abstract

When making statistical inferences, bootstrap resampling methods are often appealing because they require less stringent assumptions about the distribution of the statistic(s) of interest. However, the procedures are not free of assumptions. This paper addresses a specific situation, frequent in the atmospheric sciences, in which the standard bootstrap is not appropriate: comparative forecast verification of continuous variables. In this setting, the question to be answered is which of two weather or climate models is better in the sense of some type of average deviation from observations. The series to be compared are generally strongly dependent, which invalidates the most basic bootstrap technique. This paper also introduces new bootstrap code from the R package “distillery” that facilitates easy implementation of appropriate paired-difference-of-means bootstrap procedures for dependent data.
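A paired-difference-of-means bootstrap for dependent series can be sketched with a circular block bootstrap, which resamples contiguous blocks (wrapping around the end of the record) so that temporal dependence within blocks is preserved. The sketch below is in Python rather than the paper's R package “distillery”; the function name, defaults, and block-length choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cbb_mean_diff_ci(x, y, block_len=20, n_boot=2000, alpha=0.05, seed=0):
    """Circular block bootstrap CI for the mean of paired differences d = x - y.

    Minimal sketch of a paired-difference-of-means bootstrap for
    temporally dependent series.  block_len should be chosen to cover
    the dependence range of d; the value here is only a placeholder.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(x, float) - np.asarray(y, float)
    n = d.size
    n_blocks = int(np.ceil(n / block_len))
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        # random block start points; indices wrap circularly around the series
        starts = rng.integers(0, n, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)[None, :]) % n
        boot_means[b] = d[idx.ravel()[:n]].mean()
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return d.mean(), (lo, hi)
```

A percentile interval is used here for simplicity; other interval types (e.g., basic or BCa) could be substituted on top of the same resampling scheme.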

Free access
Eric Gilleland

Abstract

This paper is the sequel to a companion paper on bootstrap resampling that reviews bootstrap methodology for making statistical inferences for atmospheric science applications where the necessary assumptions are often not met for the most commonly used resampling procedures. In particular, this sequel addresses extreme-value analysis applications with discussion on the challenges for finding accurate bootstrap methods in this context. New bootstrap code from the R packages “distillery” and “extRemes” is introduced. It is further found that one approach for accurate confidence intervals in this setting is not well suited to the case when the random sample’s distribution is not stationary.
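For the extreme-value setting, one common (if imperfect) approach is a parametric bootstrap around a fitted generalized extreme value (GEV) distribution. The Python sketch below uses SciPy rather than the R packages “distillery” and “extRemes” named in the abstract; the function name and the 20-year return-level target are illustrative assumptions, and the sketch assumes a stationary sample, which is exactly the case the abstract flags as problematic when violated.

```python
import numpy as np
from scipy.stats import genextreme

def gev_return_level_boot_ci(sample, period=20, n_boot=200, alpha=0.05, seed=0):
    """Parametric-bootstrap confidence interval for a GEV return level.

    Illustrative sketch only: fit a GEV, then repeatedly simulate from
    the fitted distribution, refit, and collect the resulting return
    levels.  Assumes the sample is stationary.
    """
    rng = np.random.default_rng(seed)
    c, loc, scale = genextreme.fit(np.asarray(sample, float))
    rl = genextreme.ppf(1 - 1 / period, c, loc, scale)  # point estimate
    boot = np.empty(n_boot)
    for i in range(n_boot):
        resamp = genextreme.rvs(c, loc=loc, scale=scale,
                                size=len(sample), random_state=rng)
        cb, lb, sb = genextreme.fit(resamp)
        boot[i] = genextreme.ppf(1 - 1 / period, cb, lb, sb)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return rl, (lo, hi)
```

Note that SciPy's genextreme uses the shape convention c = −ξ relative to the usual extreme-value parameterization.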

Free access
Eric Gilleland

Abstract

Which model is best? Many challenges exist when testing competing forecast models, especially for those with high spatial resolution. Spatial correlation, double penalties, and small-scale errors are just a few such challenges. Many new methods have been developed in recent decades to tackle these issues. The spatial prediction comparison test (SPCT), which was developed for general spatial fields and applied to wind speed, is applied here to precipitation fields, which pose unique challenges: they are not normally distributed, they contain numerous zero-valued grid points, and their verification results are particularly sensitive to small-scale errors and double penalties. The SPCT yields a statistical test that solves one important issue for verifying forecasts spatially by accounting for spatial correlation. Importantly for precipitation forecasts, the test requires no distributional assumptions, is easy to perform, and can be applied efficiently to either gridded or nongridded spatial fields. The test compares loss functions between two competing forecasts; any such function can be used, but most still suffer from the limitations of traditional gridpoint-by-gridpoint assessment techniques. Therefore, two new loss functions for the SPCT are introduced here that address these concerns: the first is based on distance maps and the second on image warping. Results are consistent with other spatial assessment methods but provide a relatively straightforward mechanism for comparing forecasts with a statistically powerful test. The SPCT combined with these loss functions provides a new mechanism for appropriately testing which of two competing precipitation models is better, and whether the result is statistically significant.
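The core of the SPCT is a test on the field of loss differentials between the two forecasts. The Python function below is a deliberately simplified sketch: spatial correlation is handled by a crude lag-1 effective-sample-size correction rather than the parametric covariance model of the actual test, and all names are illustrative.

```python
import numpy as np
from scipy import stats

def spct_sketch(forecast_a, forecast_b, obs, loss=np.abs):
    """Very simplified sketch of an SPCT-style test.

    Tests whether the mean loss differential between two forecasts is
    zero, inflating the standard error for spatial correlation via a
    crude effective-sample-size correction based on lag-1 correlation.
    The actual SPCT fits a spatial covariance model to the
    loss-differential field instead.
    """
    d = loss(forecast_a - obs) - loss(forecast_b - obs)  # loss differential field
    dflat = d.ravel()
    n = dflat.size
    # lag-1 correlation along one axis (very rough stand-in for a
    # proper spatial covariance estimate)
    r1 = np.corrcoef(d[:, :-1].ravel(), d[:, 1:].ravel())[0, 1]
    n_eff = n * (1 - r1) / (1 + r1) if abs(r1) < 1 else 1.0
    se = dflat.std(ddof=1) / np.sqrt(max(n_eff, 1.0))
    z = dflat.mean() / se
    p = 2 * stats.norm.sf(abs(z))
    return z, p
```

A negative z favors forecast A (smaller loss) under this sign convention; the loss argument is where a distance-map or warping-based loss would be substituted.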

Full access
Eric Gilleland

Abstract

This paper proposes new diagnostic plots that take advantage of the lack of symmetry in the mean-error distance measure (MED) for binary images. The plots yield a new concept of false alarms and misses appropriate to the spatial setting, where the measure does not require perfect matching for a hit or correct negative. Additionally, three previously proposed geometric indices that provide complementary information about forecast performance are used to produce useful diagnostic plots. The diagnostics are applied to previously analyzed case studies from the spatial forecast verification Intercomparison Project (ICP) to facilitate comparison with more complicated methods. Relatively new test cases from the Mesoscale Verification Intercomparison over Complex Terrain (MesoVICT) project are also employed for future comparisons. The proposed techniques are found to provide useful information about forecast-model behavior by way of a succinct, easy-to-implement method that complements other measures of forecast performance.
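As a rough illustration of the asymmetry the diagnostics exploit, MED(A, B) can be computed from a distance transform. This Python sketch reflects one common reading of the measure's basic form (the average distance from each event point of B to the nearest event point of A); it is not the paper's code.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def med(a, b):
    """Mean-error distance MED(A, B): average, over event points of B,
    of the Euclidean distance to the nearest event point of A.

    Asymmetric: MED(A, B) != MED(B, A) in general, which is the
    property the diagnostic plots exploit.  Assumes both images
    contain at least one event (True) point.
    """
    # distance_transform_edt gives distance to the nearest zero, so
    # invert a to get distance to the nearest event point of a
    dist_to_a = distance_transform_edt(~a.astype(bool))
    return dist_to_a[b.astype(bool)].mean()
```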

Full access
Eric Gilleland

Abstract

A mathematical displacement metric, Baddeley’s Δ, is examined for verifying gridded forecasts against gridded observations using the Spatial Forecast Verification Methods Intercomparison Project test cases. Results are compared with several other new approaches. The metric performs similarly to other displacement methods, complementing neighborhood techniques.
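Baddeley's Δ can be sketched from the distance transforms of the two binary images; the cutoff transform w(t) = min(t, c) and the function below are an illustrative reading of the metric's standard definition, not the verification code used in the study.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def baddeley_delta(a, b, p=2, cutoff=None):
    """Baddeley's Delta metric for two binary images.

    Computes the L^p average, over all grid points, of the difference
    between the (optionally cutoff-transformed) distance maps of A and
    B, using w(t) = min(t, c).  Symmetric in A and B, and zero only
    when the images agree.  Assumes each image has at least one event
    (True) point.
    """
    da = distance_transform_edt(~a.astype(bool))
    db = distance_transform_edt(~b.astype(bool))
    if cutoff is not None:
        da = np.minimum(da, cutoff)
        db = np.minimum(db, cutoff)
    return (np.mean(np.abs(da - db) ** p)) ** (1.0 / p)
```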

Full access
Eric Gilleland
,
Johan Lindström
, and
Finn Lindgren

Abstract

Image warping for spatial forecast verification is applied to the test cases employed by the Spatial Forecast Verification Intercomparison Project (ICP), which includes both real and contrived cases. A larger set of cases is also used to investigate aggregating results for summarizing forecast performance over a long record of forecasts. The technique handles the geometric and perturbed cases nearly exactly, as would be expected. A statistic, dubbed here the image warp statistic (IWS), is proposed for ranking multiple forecasts and tested on the perturbed cases. IWS rankings for perturbed and real test cases are found to be sensible and physically interpretable. A powerful result of this study is that the image warp can be employed using a relatively sparse, preset regular grid without having to first identify features.
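As a toy illustration of the deformation idea (not the paper's nonrigid image warp or its IWS), one can search rigid integer translations of the forecast and split the total error into a displacement part and a residual post-warp part. All names below are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def best_rigid_shift(forecast, obs, max_shift=5):
    """Brute-force search over integer translations of the forecast,
    returning the shift minimizing mean squared error, plus the MSE
    before and after warping.  A real image warp fits a smooth
    nonrigid deformation on a sparse control grid instead.
    """
    base_mse = np.mean((forecast - obs) ** 2)
    best, best_mse = (0, 0), base_mse
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            warped = nd_shift(forecast, (dy, dx), order=1,
                              mode="constant", cval=0.0)
            mse = np.mean((warped - obs) ** 2)
            if mse < best_mse:
                best_mse, best = mse, (dy, dx)
    return best, base_mse, best_mse
```

The gap between base_mse and best_mse is the part of the error attributable to displacement; best_mse is the residual amplitude/structure error.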

Full access
Eric Gilleland
,
Domingo Muñoz-Esparza
, and
David D. Turner

Abstract

When testing hypotheses about which of two competing models, say A and B, is better, the difference is often not significant. An alternative, complementary approach is to measure how often model A is better than model B, regardless of how slight or large the difference. The hypothesis concerns whether or not the percentage of time that model A is better than model B exceeds 50%. One generalized test statistic that can be used is the power-divergence statistic, which encompasses many familiar goodness-of-fit test statistics, such as the log-likelihood-ratio and Pearson chi-squared tests. Theoretical results justify using the chi-squared distribution with k − 1 degrees of freedom for the entire family of test statistics, where k is the number of categories. However, these results assume that the underlying data are independent and identically distributed, an assumption that is often violated. Empirical results demonstrate that the reduction to two categories (i.e., model A is better than model B versus model B is better than model A) yields a test that is reasonably robust to even severe departures from temporal independence, as well as to contemporaneous correlation. The test is demonstrated on two example verification sets: 6-h forecasts of eddy dissipation rate (m2/3 s−1) from two versions of the Graphical Turbulence Guidance model, and 12-h forecasts of 2-m temperature (°C) and 10-m wind speed (m s−1) from two versions of the High-Resolution Rapid Refresh model. The novelty of this paper lies in demonstrating the utility of the power-divergence statistic in the face of temporally dependent data, as well as in the emphasis on testing the “frequency of better” alongside more traditional measures.
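The two-category version of the test is straightforward to sketch, since SciPy exposes the power-divergence family directly; the function below, including its handling of ties (dropped), is an illustrative assumption rather than the paper's code.

```python
import numpy as np
from scipy.stats import power_divergence

def frequency_of_better_test(err_a, err_b, lambda_="log-likelihood"):
    """Test whether model A beats model B more than 50% of the time.

    Uses the power-divergence statistic with k = 2 categories, so the
    reference distribution is chi-squared with k - 1 = 1 degree of
    freedom.  lambda_ selects the family member (e.g.,
    "log-likelihood" for the G-test, "pearson" for Pearson's
    chi-squared).  Ties are dropped in this sketch.
    """
    err_a, err_b = np.asarray(err_a), np.asarray(err_b)
    better_a = err_a < err_b   # cases where A has smaller error
    better_b = err_b < err_a   # cases where B has smaller error
    counts = np.array([better_a.sum(), better_b.sum()])
    n = counts.sum()
    # null hypothesis: each model is better half the time
    stat, p = power_divergence(counts, f_exp=[n / 2, n / 2], lambda_=lambda_)
    return counts[0] / n, stat, p
```

The returned frequency is the observed proportion of cases in which model A is better; a small p-value with a frequency above 0.5 favors A.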

Open access
David Ahijevych
,
Eric Gilleland
,
Barbara G. Brown
, and
Elizabeth E. Ebert

Abstract

Several spatial forecast verification methods have been developed that are suited for high-resolution precipitation forecasts. They can account for the spatial coherence of precipitation and give credit to a forecast that does not necessarily match the observation at any particular grid point. The methods were grouped into four broad categories (neighborhood, scale separation, features based, and field deformation) for the Spatial Forecast Verification Methods Intercomparison Project (ICP). Participants were asked to apply their new methods to a set of artificial geometric and perturbed forecasts with prescribed errors, and a set of real forecasts of convective precipitation on a 4-km grid. This paper describes the intercomparison test cases, summarizes results from the geometric cases, and presents subjective scores and traditional scores from the real cases.

All the new methods could detect bias error, and the features-based and field deformation methods were also able to diagnose displacement errors of precipitation features. The best approach for capturing errors in aspect ratio was field deformation. When comparing model forecasts with real cases, the traditional verification scores did not agree with the subjective assessment of the forecasts.

Full access
Eric Gilleland
,
Thomas C. M. Lee
,
John Halley Gotway
,
R. G. Bullock
, and
Barbara G. Brown

Abstract

An important focus of research in the forecast verification community is the development of alternative verification approaches for quantitative precipitation forecasts, as well as for other spatial forecasts. The need for information that is meaningful in an operational context and the importance of capturing the specific sources of forecast error at varying spatial scales are two primary motivating factors. In this paper, features of precipitation as identified by a convolution threshold technique are merged within fields and matched across fields in an automatic and computationally efficient manner using Baddeley’s metric for binary images.

The method is carried out on 100 test cases, and 4 representative cases are shown in detail. Results of merging and matching objects are generally positive in that they are consistent with how a subjective observer might merge and match features. The results further suggest that the Baddeley metric may be useful as a computationally efficient summary metric giving information about location, shape, and size differences of individual features, which could be employed for other spatial forecast verification methods.
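The convolution-threshold stage of such feature identification can be sketched in a few lines of Python; the smoothing kernel (a uniform filter), threshold, and function name below are illustrative assumptions, not the specific configuration used in the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter, label

def identify_features(field, smooth_radius=2, threshold=1.0):
    """Convolution-threshold feature identification.

    Smooths the field (convolution with a simple uniform kernel),
    thresholds the result, and labels the connected components as
    features.  This is only the first stage: a merge/match stage
    would then pair features within and across fields (e.g., using
    Baddeley's metric for binary images).
    """
    size = 2 * smooth_radius + 1
    smoothed = uniform_filter(field, size=size, mode="constant")
    mask = smoothed >= threshold
    labels, n_features = label(mask)  # connected-component labeling
    return labels, n_features
```

Smoothing before thresholding merges nearby small-scale cells into coherent features, which is what makes the subsequent matching step tractable.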

Full access
Eric Gilleland
,
David Ahijevych
,
Barbara G. Brown
,
Barbara Casati
, and
Elizabeth E. Ebert

Abstract

Advancements in weather forecast models and their enhanced resolution have led to substantially improved and more realistic-appearing forecasts for some variables. However, traditional verification scores often indicate poor performance because of the increased small-scale variability so that the true quality of the forecasts is not always characterized well. As a result, numerous new methods for verifying these forecasts have been proposed. These new methods can mostly be classified into two overall categories: filtering methods and displacement methods. The filtering methods can be further delineated into neighborhood and scale separation, and the displacement methods can be divided into features based and field deformation. Each method gives considerably more information than the traditional scores, but it is not clear which method(s) should be used for which purpose.

A verification methods intercomparison project has been established in order to glean a better understanding of the proposed methods in terms of their various characteristics and to determine what verification questions each method addresses. The study is ongoing, and preliminary qualitative results for the different approaches applied to different situations are described here. In particular, the various methods and their basic characteristics, similarities, and differences are described. In addition, several questions are addressed regarding the application of the methods and the information that they provide. These questions include (i) how the method(s) inform performance at different scales; (ii) how the methods provide information on location errors; (iii) whether the methods provide information on intensity errors and distributions; (iv) whether the methods provide information on structure errors; (v) whether the approaches have the ability to provide information about hits, misses, and false alarms; (vi) whether the methods do anything that is counterintuitive; (vii) whether the methods have selectable parameters and how sensitive the results are to parameter selection; (viii) whether the results can be easily aggregated across multiple cases; (ix) whether the methods can identify timing errors; and (x) whether confidence intervals and hypothesis tests can be readily computed.

Full access