Search Results

1–10 of 10 items for Author or Editor: Keith F. Brill in Weather and Forecasting
Keith F. Brill

Abstract

Performance measures computed from the 2 × 2 contingency table of outcomes for dichotomous forecasts are sensitive to bias. The method presented here evaluates how the probability of detection (POD) must change as bias changes so that a performance measure improves at a given value of bias. A critical performance ratio (CPR) of the change in POD to the change in bias is derived for a number of performance measures. If a change in POD associated with a bias change satisfies the CPR condition, the performance measure will indicate an improved forecast. If a perfect measure of performance existed, it would always exhibit its optimal value at a bias of one. Actual performance measures are susceptible to bias, possibly indicating a better forecast for bias values not equal to one. The CPR is applied specifically to assess the conditions under which several commonly used performance measures improve toward a more favorable value as bias increases or decreases through one. All of the performance measures evaluated are found to have quantifiable bias sensitivity. Finally, the CPR is applied to the analysis of a performance requirement and of bias sensitivity in a geometric model.
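
To make the CPR concrete (an illustrative sketch, not taken from the article): writing the contingency table as hits a, false alarms b, misses c, and correct negatives d, the threat score can be expressed in terms of POD and bias, and setting its total differential to zero gives the critical ratio of POD change to bias change. The Python sketch below verifies that relationship numerically; all values are hypothetical.

```python
# Illustrative sketch (not from the article): bias sensitivity of the
# threat score TS = a / (a + b + c), written in terms of POD and bias.
# With o = a + c observed events: a = POD * o and a + b = bias * o,
# so TS = POD / (bias + 1 - POD).

def threat_score(pod, bias):
    return pod / (bias + 1.0 - pod)

def cpr_threat_score(pod, bias):
    # Setting the total differential of TS to zero gives the critical
    # performance ratio dPOD/dbias = POD / (bias + 1).
    return pod / (bias + 1.0)

pod, bias, db = 0.6, 1.2, 1e-6
cpr = cpr_threat_score(pod, bias)

ts0 = threat_score(pod, bias)
ts_at_cpr = threat_score(pod + cpr * db, bias + db)      # POD change exactly at the CPR
ts_above = threat_score(pod + 2 * cpr * db, bias + db)   # POD change exceeding the CPR

print(ts_at_cpr - ts0)  # ~0: TS unchanged to first order
print(ts_above - ts0)   # > 0: TS improves
```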

Full access
Keith F. Brill and Fedor Mesinger
Full access
Keith F. Brill and Matthew Pyle

Abstract

Critical performance ratio (CPR) expressions are derived for the eight conditional probabilities associated with the 2 × 2 contingency table of outcomes for binary (dichotomous “yes” or “no”) forecasts. Two are shown to be useful in evaluating the effects of hedging as it approaches random change. The CPR quantifies how the probability of detection (POD) must change as the frequency bias changes so that a performance measure (or conditional probability) indicates an improved forecast at a given value of frequency bias. If yes forecasts are increased randomly, the probability that an added forecast is correct (a hit) is given by the detection failure ratio (DFR). If the DFR for a performance measure is greater than the CPR, the forecast is likely to be improved by the random increase in yes forecasts. Thus, the DFR provides a benchmark for the CPR in the case of frequency bias inflation. If yes forecasts are decreased randomly, the probability of removing a hit is given by the frequency of hits (FOH). If the FOH for a performance measure is less than the CPR, the forecast is likely to be improved by the random decrease in yes forecasts. Therefore, the FOH serves as a benchmark for the CPR if the frequency bias is decreased. The closer the FOH (DFR) comes to being less (greater) than or equal to the CPR, the more likely it is that decreasing (increasing) the frequency bias will improve the performance measure. It is shown that randomly increasing yes forecasts for a forecast that is already better than a randomly generated forecast can improve the threat score but is not likely to improve the equitable threat score. The equitable threat score is therefore recommended instead of the threat score whenever possible.
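
As a worked example of these benchmarks (hypothetical counts, not from the paper), the expected effect of randomly converting “no” forecasts to “yes” can be computed directly from the contingency table: each added yes forecast is a hit with probability equal to the DFR. The sketch below shows the threat score improving while the equitable threat score degrades, for a forecast that is better than random.

```python
# Illustrative sketch (not the paper's code): expected effect of randomly
# inflating "yes" forecasts on the threat score (TS) and the equitable
# threat score (ETS). The counts are hypothetical.

def scores(a, b, c, d):
    n = a + b + c + d
    ts = a / (a + b + c)
    a_random = (a + b) * (a + c) / n              # hits expected by chance
    ets = (a - a_random) / (a + b + c - a_random)
    return ts, ets

a, b, c, d = 45, 5, 35, 15      # better than random: a = 45 > (a+b)(a+c)/n = 40
dfr = c / (c + d)               # detection failure ratio: chance a randomly
                                # added "yes" turns out to be a hit
m = 20                          # "no" forecasts flipped to "yes" at random
a2, b2 = a + dfr * m, b + (1 - dfr) * m           # expected hits / false alarms added
c2, d2 = c - dfr * m, d - (1 - dfr) * m

print(scores(a, b, c, d))       # (~0.529, ~0.111)
print(scores(a2, b2, c2, d2))   # (~0.648, ~0.086): TS up, ETS down
```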

Full access
Keith F. Brill and Fedor Mesinger

Abstract

Bias-adjusted threat and equitable threat scores were designed to account for the effects of placement errors in assessing the performance of under- or overbiased forecasts. These bias-adjusted performance measures nevertheless exhibit bias sensitivity. The critical performance ratio (CPR) is the minimum fraction of added forecasts that must be correct for a performance measure to indicate improvement when the bias is increased; conversely, it is the maximum fraction of removed forecasts that may be correct for a performance measure to indicate improvement when the bias is decreased. The CPR is derived here for the bias-adjusted threat and equitable threat scores to quantify their bias sensitivity relative to several other performance measures, including the conventional threat and equitable threat scores. The CPR for a bias-adjusted equitable threat score may indicate the likelihood of preserving or increasing the conventional equitable threat score when forecasts are bias corrected on the basis of past performance.
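
Although the paper derives closed-form CPR expressions, the bias sensitivity of any measure expressible in terms of POD and frequency bias can also be estimated numerically. The finite-difference sketch below is illustrative only; the conventional threat score stands in for the bias-adjusted measures, whose formulas are not reproduced here.

```python
# Illustrative numerical CPR estimator (not the paper's derivation):
# for a measure M(POD, bias), the CPR is the POD change per unit bias
# change that leaves M unchanged, i.e. -(dM/dbias) / (dM/dPOD).

def numerical_cpr(measure, pod, bias, eps=1e-6):
    dm_dbias = (measure(pod, bias + eps) - measure(pod, bias - eps)) / (2 * eps)
    dm_dpod = (measure(pod + eps, bias) - measure(pod - eps, bias)) / (2 * eps)
    return -dm_dbias / dm_dpod

def threat_score(pod, bias):
    return pod / (bias + 1.0 - pod)   # TS = a/(a+b+c) in (POD, bias) form

print(numerical_cpr(threat_score, 0.6, 1.2))  # ~0.2727, i.e. POD / (bias + 1)
```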

Full access
Matthew E. Pyle and Keith F. Brill

Abstract

A fair comparison of quantitative precipitation forecast (QPF) products from multiple forecast sources using performance metrics based on a 2 × 2 contingency table, together with an assessment of the statistical significance of the differences, requires accounting for the differing frequency biases to which the performance metrics are sensitive. A simple approach to addressing differing frequency biases modifies the 2 × 2 contingency table values using a mathematical assumption that determines the change in hit rate when the frequency bias is adjusted to unity. Another approach uses quantile mapping to remove the frequency bias of the QPFs by matching the frequency distribution of each QPF to the frequency distribution of the verifying analysis or point observations. If these two methods consistently yielded the same result when assessing the statistical significance of differences between two QPF sources, then verification software could apply the simpler approach, and existing 2 × 2 contingency tables could be used for statistical significance computations without recovering the original QPF and verifying data required for the bias removal approach. However, the two methods do not always agree, and this study provides evidence for continued application and wider adoption of the bias removal approach.
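
The quantile-mapping approach can be sketched in a few lines (an illustrative stand-in, not the study's verification code): each forecast value is replaced by the value at the same empirical quantile of the verifying distribution, which drives the frequency bias toward one at every threshold. The distributions and threshold below are hypothetical.

```python
import numpy as np

# Illustrative quantile-mapping sketch (not the study's code): replace
# each forecast value with the verifying value at the same empirical
# quantile, removing the frequency bias at every threshold.

def quantile_map(qpf, verifying):
    qpf = np.asarray(qpf, dtype=float)
    ranks = np.argsort(np.argsort(qpf))        # 0 .. n-1 rank of each forecast
    quantiles = (ranks + 0.5) / qpf.size
    return np.quantile(verifying, quantiles)

rng = np.random.default_rng(0)
analysis = rng.gamma(0.4, 8.0, size=5000)      # hypothetical 24-h amounts (mm)
qpf = 1.5 * rng.gamma(0.4, 8.0, size=5000)     # wet-biased forecasts
mapped = quantile_map(qpf, analysis)

threshold = 25.0                               # mm
for f in (qpf, mapped):                        # frequency bias before/after (~1 after)
    print((f >= threshold).sum() / (analysis >= threshold).sum())
```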

Full access
Bruce A. Veenhuis and Keith F. Brill

Abstract

Quantitative precipitation forecast (QPF) applications often demand accumulations of precipitation over both long- and short-duration time intervals. It is desirable that the shorter-duration forecasts sum to the longer-duration accumulations spanning the same period. In the context of calibration, it is further desirable that both the subinterval and the longer-interval accumulations be similarly corrected to have near-unit frequency bias on a spatial domain. This study examines two methods of achieving these goals for 6- and 24-h accumulation intervals: 1) the accumulation method bias-corrects the 6-h forecasts and accumulates them to create the 24-h accumulations, and 2) the disaggregation method bias-corrects the 24-h accumulation and then proportionately disaggregates it back into 6-h accumulations. The experiments are done retrospectively so that a “perfect” bias correction is possible for each method. The results show that neither method fully accomplishes the stated calibration goal, because QPF placement and/or timing errors contribute to frequency bias in the course of accumulation or disaggregation. However, both methods can improve the frequency bias of both the subinterval and longer-interval accumulations. The choice of method may hinge most strongly on the relative tolerance for bias in the subinterval accumulations versus the longer-interval accumulation.
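
The difference between the two methods is easiest to see at a single point. The sketch below is purely illustrative (the stand-in calibration factors are hypothetical; the study derives its corrections retrospectively over a spatial domain), but it shows the bookkeeping of each method and that the disaggregated 6-h amounts sum exactly to the corrected 24-h total.

```python
import numpy as np

# Illustrative sketch of the two strategies at a single grid point.

qpf_6h = np.array([2.0, 10.0, 6.0, 2.0])   # raw 6-h accumulations (mm)

# 1) Accumulation method: bias-correct each 6-h amount, then sum.
corrected_6h = 0.8 * qpf_6h                # stand-in 6-h calibration factor
total_from_6h = corrected_6h.sum()

# 2) Disaggregation method: bias-correct the 24-h total, then share it
#    out in proportion to the raw 6-h amounts.
corrected_24h = 0.9 * qpf_6h.sum()         # stand-in 24-h calibration factor
disaggregated_6h = corrected_24h * qpf_6h / qpf_6h.sum()

print(total_from_6h, corrected_6h)               # 16.0 mm over 24 h
print(disaggregated_6h.sum(), disaggregated_6h)  # sums exactly to 18.0 mm
```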

Restricted access
Keith F. Brill, Anthony R. Fracasso, and Christopher M. Bailey

Abstract

This article explores the potential advantages of using a clustering approach to distill the information contained within a large ensemble of forecasts in the medium-range time frame. A divisive clustering algorithm based on the one-dimensional discrete Fourier transform is described and applied to the 70-member combination of the 20-member National Centers for Environmental Prediction (NCEP) Global Ensemble Forecast System (GEFS) and the 50-member European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble. Cumulative statistical verification indicates that clusters selected objectively for having the largest number of members do not perform better than the ECMWF ensemble mean. However, including a cluster in a blended forecast to maintain continuity or to nudge toward a preferred solution may be a reasonable strategy in some situations. In such cases, a cluster may be used to sharpen a forecast that is weakly depicted by the ensemble mean but favored in consideration of continuity, consistency, collaborative thinking, and/or the trend in the guidance. Clusters are often useful for depicting forecast solutions that are not apparent from the ensemble mean but are supported by a subset of ensemble members. A specific case is presented to demonstrate the possible utility of the clustering approach in the forecasting process.
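
The paper's algorithm is not reproduced here, but its general shape can be sketched as follows (an illustrative stand-in under stated assumptions: each member field is reduced to a few low-order DFT coefficients, and groups are split recursively on the coefficient with the largest spread; the ensemble data, coefficient count, and minimum cluster size are all hypothetical).

```python
import numpy as np

# Illustrative stand-in only: reduce each ensemble member to a few
# low-order 1D DFT coefficients, then cluster divisively by splitting
# each group at the median of its most variable coefficient.

def dft_features(members, n_coef=4):
    """members: (n_members, n_points) fields, e.g. flattened height anomalies."""
    coefs = np.fft.rfft(members, axis=1)[:, 1:n_coef + 1]
    return np.concatenate([coefs.real, coefs.imag], axis=1)

def divisive_cluster(features, idx=None, min_size=8):
    idx = np.arange(features.shape[0]) if idx is None else idx
    if idx.size < 2 * min_size:                # too small to split further
        return [idx]
    f = features[idx]
    j = np.argmax(f.var(axis=0))               # coefficient with largest spread
    left = f[:, j] <= np.median(f[:, j])       # divide the group at the median
    return (divisive_cluster(features, idx[left], min_size)
            + divisive_cluster(features, idx[~left], min_size))

rng = np.random.default_rng(1)
members = rng.standard_normal((70, 360))       # hypothetical 70-member ensemble
clusters = divisive_cluster(dft_features(members))
print([c.size for c in clusters])              # member counts per cluster
```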

Full access
David R. Novak, Keith F. Brill, and Wallace A. Hogsett

Abstract

An objective technique for determining forecast snowfall ranges consistent with the risk tolerance of users is demonstrated. The forecast snowfall ranges are based on percentiles from probability distribution functions that are assumed to be perfectly calibrated. A key feature of the technique is that the snowfall range varies dynamically with the spread of the ensemble forecasts at a given forecast projection, for a particular case and location. Furthermore, the technique allows users to choose their risk tolerance, quantified in terms of the expected false alarm ratio for forecasts of the snowfall range. The technique is applied to the 4–7 March 2013 snowstorm at two locations (Chicago, Illinois, and Washington, D.C.) to illustrate its use under different forecast uncertainties. The snowfall range derived from the Weather Prediction Center Probabilistic Winter Precipitation Forecast suite is found to be statistically reliable for the day 1 forecast during the 2013/14 season, providing confidence in the practical applicability of the technique.
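
Under the stated assumption of perfectly calibrated distributions, the range construction reduces to choosing percentiles. The sketch below is illustrative (hypothetical member values; the operational product uses WPC's calibrated probabilistic suite): a central range excluding a total probability r should verify outside the range about a fraction r of the time, which is how a user's risk tolerance, related in the paper to an expected false alarm ratio, enters.

```python
import numpy as np

# Illustrative sketch: a snowfall range from calibrated ensemble
# percentiles. With a perfectly calibrated distribution, a central range
# covering fraction (1 - r) of the probability verifies outside the
# range about a fraction r of the time. All values are hypothetical.

def snowfall_range(members, risk_tolerance):
    lo = 100 * risk_tolerance / 2
    return np.percentile(members, [lo, 100 - lo])

rng = np.random.default_rng(2)
chicago = rng.gamma(6.0, 1.5, size=50)      # hypothetical member snowfalls (in.)
washington = rng.gamma(2.0, 1.5, size=50)   # larger relative spread

for site, members in (("Chicago", chicago), ("Washington", washington)):
    lo, hi = snowfall_range(members, risk_tolerance=0.25)
    print(f"{site}: {lo:.1f}-{hi:.1f} in.")  # wider range where spread is larger
```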

Full access
Norman W. Junker, James E. Hoke, Bruce E. Sullivan, Keith F. Brill, and Francis J. Hughes

Abstract

This paper assesses the performance of the National Meteorological Center (NMC) Nested Grid Model (NGM) from March 1988 through March 1990 and of the NMC Medium-Range Forecast (MRF) model in two 136-day tests: one during summer, made up of two 68-day periods (19 July–25 September 1989 and 20 June–28 August 1990), and one during winter and early spring (12 December 1989–26 April 1990). Seasonal and geographical variations of precipitation bias and threat score are discussed for each model. Differences in model performance in predicting various amounts of precipitation are described.

The performance of the NGM and MRF varied by season, geographic area, and precipitation amount. The bias of the models varied significantly during the year. The NGM and MRF overpredicted the frequency of measurable precipitation (≥0.01 in.) across much of the eastern half of the United States during the warm season. Both models, however, underpredicted the frequency of ≥0.50-in. amounts across the South during the cool season.

The smooth orography in both models had a strong impact on their precipitation forecasts. Each model overpredicted the frequency of heavier precipitation over the southern Appalachians, over portions of the Gulf-facing upslope areas east of the Rocky Mountains, and to the lee of the Cascade and Sierra ranges of the West. The NGM underpredicted the frequency of heavier amounts on the Pacific-facing windward side of the Cascade Range of Oregon and Washington.

Model performance also seems to be related to the synoptic situation. Threat scores were higher when the midlevel westerlies were more active, with the highest threat scores found north of the most frequent track of cyclones during the cool season.

Full access
David R. Novak, Christopher Bailey, Keith F. Brill, Patrick Burke, Wallace A. Hogsett, Robert Rausch, and Michael Schichtel

Abstract

The role of the human forecaster in improving upon the accuracy of numerical weather prediction is explored using multiyear verification of human-generated short-range precipitation forecasts and medium-range maximum temperature forecasts from the Weather Prediction Center (WPC). Results show that human-generated forecasts improve upon raw deterministic model guidance. Over the past two decades, WPC human forecasters achieved a 20%–40% improvement over the North American Mesoscale (NAM) model and the Global Forecast System (GFS) for the 1 in. (25.4 mm) (24 h)−1 threshold for day 1 precipitation forecasts, with a smaller, but statistically significant, 5%–15% improvement over the deterministic ECMWF model. Medium-range maximum temperature forecasts also exhibit statistically significant improvement over GFS model output statistics (MOS), and the improvement has been increasing over the past 5 yr. The quality added by humans for forecasts of high-impact events varies by element and forecast projection, with generally large improvements when the forecaster makes changes ≥8°F (4.4°C) to MOS temperatures. Human improvement over guidance for extreme rainfall events [3 in. (76.2 mm) (24 h)−1] is largest in the short-range forecast. However, human-generated forecasts failed to outperform the most skillful downscaled, bias-corrected ensemble guidance for precipitation and maximum temperature available near the same time as the human-modified forecasts. Thus, as additional downscaled and bias-corrected sensible weather element guidance becomes operationally available, and with the support of near-real-time verification, forecaster training, and tools to guide forecaster interventions, a key test is whether forecasters can learn to make statistically significant improvements over the most skillful of this guidance. Such a test can help determine to what degree, and how quickly, the role of the forecaster will change.

Full access