Search Results

Showing 1–10 of 13 items for Author or Editor: Barbara G. Brown in Weather and Forecasting.
Barbara G. Brown and Allan H. Murphy

Abstract

Fire-weather forecasts (FWFs) prepared by National Weather Service (NWS) forecasters on an operational basis are traditionally expressed in categorical terms. However, to make rational and optimal use of such forecasts, fire managers need quantitative information concerning the uncertainty inherent in the forecasts. This paper reports the results of two studies related to the quantification of uncertainty in operational and experimental FWFs.

Evaluation of samples of operational categorical FWFs reveals that these forecasts contain considerable uncertainty. The forecasts also exhibit modest but consistent biases which suggest that the forecasters are influenced by the impacts of the relevant events on fire behavior. These results underscore the need for probabilistic FWFs.

The results of a probabilistic fire-weather forecasting experiment indicate that NWS forecasters are able to make quite reliable and reasonably precise credible interval temperature forecasts. However, the experimental relative humidity and wind speed forecasts exhibit considerable overforecasting and minimal skill. Although somewhat disappointing, these results are not too surprising in view of the fact that (a) the forecasters had little, if any, experience in probability forecasting; (b) no feedback was provided to the forecasters during the experimental period; and (c) the experiment was of quite limited duration. More extensive experimental and operational probability forecasting trials as well as user-oriented studies are required to enhance the quality of FWFs and to ensure that the forecasts are used in an optimal manner.
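As a rough illustration of the kind of check involved, the sketch below computes the empirical coverage and mean width of credible-interval forecasts; the 75% credible level and all interval and temperature values are hypothetical, not taken from the experiment.

```python
# Minimal sketch: checking reliability of credible-interval forecasts.
# All values are hypothetical; a real evaluation would use the experiment's
# forecast sample.
forecasts = [(58, 66), (60, 70), (55, 63), (62, 71), (59, 67)]  # (lower, upper) temps
observed = [64, 72, 58, 66, 61]                                 # verifying temperatures

nominal = 0.75  # credible level attached to each interval (assumed)
hits = sum(lo <= ob <= hi for (lo, hi), ob in zip(forecasts, observed))
coverage = hits / len(observed)
mean_width = sum(hi - lo for lo, hi in forecasts) / len(forecasts)

# Reliable: observed coverage near the nominal level. Precise: narrow intervals.
print(f"nominal {nominal:.0%}, observed coverage {coverage:.0%}, "
      f"mean width {mean_width:.1f} degrees")
```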

Full access
Allan H. Murphy, Barbara G. Brown, and Yin-Sheng Chen

Abstract

A diagnostic approach to forecast verification is described and illustrated. This approach is based on a general framework for forecast verification. It is “diagnostic” in the sense that it focuses on the fundamental characteristics of the forecasts, the corresponding observations, and their relationship.

Three classes of diagnostic verification methods are identified: 1) the joint distribution of forecasts and observations and conditional and marginal distributions associated with factorizations of this joint distribution; 2) summary measures of these joint, conditional, and marginal distributions; and 3) performance measures and their decompositions. Linear regression models that can be used to describe the relationship between forecasts and observations are also presented. Graphical displays are advanced as a means of enhancing the utility of this body of diagnostic verification methodology.
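As a minimal sketch of this distributions-oriented view (with hypothetical categorical data, not the Minneapolis sample analyzed below), the joint distribution of forecasts and observations and one of its factorizations can be estimated by simple counting:

```python
# Minimal sketch: estimate the joint distribution p(f, x) of categorical
# forecasts f and observations x, then form the calibration-refinement
# factorization p(x | f) * p(f). The data and categories are hypothetical.
from collections import Counter

pairs = [("rain", "rain"), ("rain", "dry"), ("dry", "dry"),
         ("dry", "dry"), ("rain", "rain"), ("dry", "rain")]

n = len(pairs)
joint = {fx: c / n for fx, c in Counter(pairs).items()}        # p(f, x)
p_f = {f: c / n for f, c in Counter(f for f, _ in pairs).items()}  # p(f)
cond_x_given_f = {fx: joint[fx] / p_f[fx[0]] for fx in joint}  # p(x | f)

for (f, x), p in sorted(cond_x_given_f.items()):
    print(f"p(x={x} | f={f}) = {p:.2f}")
```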

A sample of National Weather Service maximum temperature forecasts (and observations) for Minneapolis, Minnesota, is analyzed to illustrate the use of this methodology. Graphical displays of the basic distributions and various summary measures are employed to obtain insights into distributional characteristics such as central tendency, variability, and asymmetry. The displays also facilitate the comparison of these characteristics among distributions: for example, between distributions involving forecasts and observations, among distributions involving different types of forecasts, and among distributions involving forecasts for different seasons or lead times. Performance measures and their decompositions are shown to provide quantitative information regarding basic dimensions of forecast quality such as bias, accuracy, calibration (or reliability), discrimination, and skill. Information regarding both distributional and performance characteristics is needed by modelers and forecasters concerned with improving forecast quality. Some implications of these diagnostic methods for verification procedures and practices are discussed.

Full access
Gregory Thompson, Roelof T. Bruintjes, Barbara G. Brown, and Frank Hage

Abstract

The purpose of the Federal Aviation Administration’s Icing Forecasting Improvement Program is to conduct research on icing conditions both in flight and on the ground. This paper describes a portion of the in-flight aircraft icing prediction effort through a comprehensive icing prediction and evaluation project conducted by the Research Applications Program (RAP) at the National Center for Atmospheric Research. During this project, in-flight icing potential was forecast using algorithms developed by RAP, the National Weather Service’s National Aviation Weather Advisory Unit, and the Air Force Global Weather Center in conjunction with numerical model data from the Eta, MAPS, and MM5 models. Furthermore, explicit predictions of cloud liquid water were available from the Eta and MM5 models and were also used to forecast icing potential.

To compare subjectively the different algorithms, predicted icing regions and observed pilot reports were viewed simultaneously on an interactive, real-time display. To measure objectively the skill of icing predictions, a rigorous statistical evaluation was performed in order to compare the different algorithms (details and results are provided in Part II). Both the subjective and objective comparisons are presented here for a particular case study, whereas results from the entire project are found in Part II. Statistical analysis of 2 months’ worth of data suggests that further advances in temperature- and relative-humidity-based algorithms are unlikely. Explicit cloud liquid water predictions, however, show promising results, although such predictions are still relatively new in operational numerical models.

Full access
Christopher A. Davis, Barbara G. Brown, Randy Bullock, and John Halley-Gotway

Abstract

The authors use a procedure called the method for object-based diagnostic evaluation, commonly referred to as MODE, to compare forecasts made from two models representing separate cores of the Weather Research and Forecasting (WRF) model during the 2005 National Severe Storms Laboratory and Storm Prediction Center Spring Program. Both models, the Advanced Research WRF (ARW) and the Nonhydrostatic Mesoscale Model (NMM), were run without a traditional cumulus parameterization scheme on horizontal grid lengths of 4 km (ARW) and 4.5 km (NMM). MODE was used to evaluate 1-h rainfall accumulation from 24-h forecasts valid at 0000 UTC on 32 days between 24 April and 4 June 2005. The primary variable used for evaluation was a “total interest” derived from a fuzzy-logic algorithm that compared several attributes of forecast and observed rain features such as separation distance and spatial orientation. The maximum value of the total interest obtained by comparing an object in one field with all objects in the comparison field was retained as the quality of matching for that object. The median of the distribution of all such maximum-interest values was selected as a metric of the overall forecast quality.
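A simplified sketch of this matching logic follows; the attributes, interest functions, and weights are illustrative stand-ins, not the actual MODE configuration used in the study:

```python
# Minimal sketch of fuzzy-logic object matching: each (forecast, observed)
# object pair gets a "total interest" from weighted attribute interests; each
# forecast object keeps its best match, and the median of those maxima
# summarizes overall forecast quality. All settings below are hypothetical.
import statistics

def total_interest(fcst, obs, weights=(0.6, 0.4)):
    dist = abs(fcst["x"] - obs["x"])              # separation distance (km)
    angle = abs(fcst["orient"] - obs["orient"])   # orientation difference (deg)
    i_dist = max(0.0, 1.0 - dist / 200.0)         # interest decreases with distance
    i_angle = max(0.0, 1.0 - angle / 90.0)
    w_d, w_a = weights
    return (w_d * i_dist + w_a * i_angle) / (w_d + w_a)

forecast_objs = [{"x": 40.0, "orient": 30.0}, {"x": 300.0, "orient": 80.0}]
observed_objs = [{"x": 55.0, "orient": 25.0}, {"x": 150.0, "orient": 60.0}]

max_interest = [max(total_interest(f, o) for o in observed_objs)
                for f in forecast_objs]
print("median of maximum interest:", statistics.median(max_interest))
```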

Results from the 32 cases suggest that, overall, the configuration of the ARW model used during the 2005 Spring Program performed slightly better than the configuration of the NMM model. The primary manifestation of the differing levels of performance was fewer false alarms (forecast rain areas with no observed counterpart) in the ARW. However, the performance varied considerably from day to day, with most days featuring indistinguishable performance. Thus, a small number of poor NMM forecasts produced the overall difference between the two models.

Full access
David Ahijevych, Eric Gilleland, Barbara G. Brown, and Elizabeth E. Ebert

Abstract

Several spatial forecast verification methods have been developed that are suited for high-resolution precipitation forecasts. They can account for the spatial coherence of precipitation and give credit to a forecast that does not necessarily match the observation at any particular grid point. The methods were grouped into four broad categories (neighborhood, scale separation, features based, and field deformation) for the Spatial Forecast Verification Methods Intercomparison Project (ICP). Participants were asked to apply their new methods to a set of artificial geometric and perturbed forecasts with prescribed errors, and a set of real forecasts of convective precipitation on a 4-km grid. This paper describes the intercomparison test cases, summarizes results from the geometric cases, and presents subjective scores and traditional scores from the real cases.

All the new methods could detect bias error, and the features-based and field deformation methods were also able to diagnose displacement errors of precipitation features. The best approach for capturing errors in aspect ratio was field deformation. When comparing model forecasts with real cases, the traditional verification scores did not agree with the subjective assessment of the forecasts.

Full access
Eric Gilleland, David Ahijevych, Barbara G. Brown, Barbara Casati, and Elizabeth E. Ebert

Abstract

Advancements in weather forecast models and their enhanced resolution have led to substantially improved and more realistic-appearing forecasts for some variables. However, traditional verification scores often indicate poor performance because of the increased small-scale variability so that the true quality of the forecasts is not always characterized well. As a result, numerous new methods for verifying these forecasts have been proposed. These new methods can mostly be classified into two overall categories: filtering methods and displacement methods. The filtering methods can be further delineated into neighborhood and scale separation, and the displacement methods can be divided into features based and field deformation. Each method gives considerably more information than the traditional scores, but it is not clear which method(s) should be used for which purpose.

A verification methods intercomparison project has been established in order to glean a better understanding of the proposed methods in terms of their various characteristics and to determine what verification questions each method addresses. The study is ongoing, and preliminary qualitative results for the different approaches applied to different situations are described here. In particular, the various methods and their basic characteristics, similarities, and differences are described. In addition, several questions are addressed regarding the application of the methods and the information that they provide. These questions include (i) how the method(s) inform performance at different scales; (ii) how the methods provide information on location errors; (iii) whether the methods provide information on intensity errors and distributions; (iv) whether the methods provide information on structure errors; (v) whether the approaches have the ability to provide information about hits, misses, and false alarms; (vi) whether the methods do anything that is counterintuitive; (vii) whether the methods have selectable parameters and how sensitive the results are to parameter selection; (viii) whether the results can be easily aggregated across multiple cases; (ix) whether the methods can identify timing errors; and (x) whether confidence intervals and hypothesis tests can be readily computed.

Full access
Barbara G. Brown, Gregory Thompson, Roelof T. Bruintjes, Randy Bullock, and Tressa Kane

Abstract

Recent research to improve forecasts of in-flight icing conditions has involved the development of algorithms to apply to the output of numerical weather prediction models. The abilities of several of these algorithms to predict icing conditions, as verified by pilot reports (PIREPs), are compared for two numerical weather prediction models (Eta and the Mesoscale Analysis and Prediction System) for the Winter Icing and Storms Program 1994 (WISP94) time period (25 January–25 March 1994). Algorithms included in the comparison were developed by the National Aviation Weather Advisory Unit [NAWAU, now the Aviation Weather Center (AWC)], the National Center for Atmospheric Research’s Research Applications Program (RAP), and the U.S. Air Force. Operational icing forecasts (AIRMETs) issued by NAWAU for the same time period are evaluated to provide a standard of comparison. The capabilities of the Eta Model’s explicit cloud liquid water estimates for identifying icing regions are also evaluated and compared to the algorithm results.

Because PIREPs are not systematic and are biased toward positive reports, it is difficult to estimate standard verification parameters related to overforecasting (e.g., false alarm ratio). Methods are developed to compensate for these attributes of the PIREPs. The primary verification statistics computed include the probability of detection (POD) of yes and no reports, and the areal and volume extent of the forecast region.
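A minimal sketch of this verification style, using hypothetical reports and a hypothetical forecast-region lookup:

```python
# Minimal sketch: PODs of yes and no pilot reports against a forecast icing
# region. Because PIREPs are biased toward positive reports, PODs are
# computed rather than false alarm ratio. All data are hypothetical.
def pod(reports, verifies):
    """Fraction of reports that the forecast handled correctly."""
    return sum(1 for r in reports if verifies(r)) / len(reports)

yes_pireps = ["A", "B", "C", "D"]   # reports of icing
no_pireps = ["E", "F"]              # explicit no-icing reports
in_forecast_region = {"A": True, "B": True, "C": False, "D": True,
                      "E": False, "F": True}

pod_yes = pod(yes_pireps, lambda r: in_forecast_region[r])       # want high
pod_no = pod(no_pireps, lambda r: not in_forecast_region[r])     # want high
print(f"POD(yes) = {pod_yes:.2f}, POD(no) = {pod_no:.2f}")
```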

None of the individual algorithms were able to obtain both a higher POD and a smaller area than any other algorithm; increases in POD are associated in all cases with increases in area. The RAP algorithm provides additional information by attempting to identify the physical mechanisms associated with the forecast icing conditions. One component of the RAP algorithm, which is designed to detect and forecast icing in regions of “warm” stratiform clouds, is more efficient at detecting icing than the other components. Cloud liquid water shows promise for development as a predictor of icing conditions, with detection rates of 30% or more in this initial study. AIRMETs were able to detect approximately the same percentage of icing reports as the algorithms, but with somewhat smaller forecast areas and somewhat larger forecast volumes on average. The algorithms are able to provide guidance with characteristics that are similar to the AIRMETs and should be useful in their formulation.

Full access
Jamie K. Wolff, Michelle Harrold, Tressa Fowler, John Halley Gotway, Louisa Nance, and Barbara G. Brown

Abstract

While traditional verification methods are commonly used to assess numerical model quantitative precipitation forecasts (QPFs) using a grid-to-grid approach, they generally offer little diagnostic information or reasoning behind the computed statistic. On the other hand, advanced spatial verification techniques, such as neighborhood and object-based methods, can provide more meaningful insight into differences between forecast and observed features in terms of skill with spatial scale, coverage area, displacement, orientation, and intensity. To demonstrate the utility of applying advanced verification techniques to mid- and coarse-resolution models, the Developmental Testbed Center (DTC) applied several traditional metrics and spatial verification techniques to QPFs provided by the Global Forecast System (GFS) and operational North American Mesoscale Model (NAM). Along with frequency bias and Gilbert skill score (GSS) adjusted for bias, both the fractions skill score (FSS) and Method for Object-Based Diagnostic Evaluation (MODE) were utilized for this study with careful consideration given to how these methods were applied and how the results were interpreted. By illustrating the types of forecast attributes appropriate to assess with the spatial verification techniques, this paper provides examples of how to obtain advanced diagnostic information to help identify what aspects of the forecast are or are not performing well.
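As one concrete example of a neighborhood technique, a generic fractions skill score computation is sketched below on synthetic grids; the threshold and neighborhood size are arbitrary, and this is not the DTC's implementation:

```python
# Minimal sketch of the fractions skill score (FSS): compare neighborhood
# rain-coverage fractions instead of point-by-point matches.
import numpy as np

def fractions(binary, n):
    """Fraction of 'yes' grid points in an n x n neighborhood of each point."""
    padded = np.pad(binary.astype(float), n // 2, mode="constant")
    out = np.empty(binary.shape, dtype=float)
    for i in range(binary.shape[0]):
        for j in range(binary.shape[1]):
            out[i, j] = padded[i:i + n, j:j + n].mean()
    return out

def fss(fcst, obs, thresh=1.0, n=3):
    pf = fractions(fcst >= thresh, n)
    po = fractions(obs >= thresh, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

rng = np.random.default_rng(0)
fcst = rng.gamma(0.5, 2.0, size=(20, 20))   # synthetic QPF grid (mm)
obs = np.roll(fcst, 2, axis=1)              # same field displaced 2 points
print(f"FSS = {fss(fcst, obs):.2f}")        # 1 = perfect, 0 = no skill
```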

Full access
Hui-Ling Chang, Barbara G. Brown, Pao-Shin Chu, Yu-Chieng Liou, and Wen-Ho Wang

Abstract

Focusing on afternoon thunderstorms in Taiwan during the warm season (May–October) under weak synoptic forcing, this study applied the Taiwan Auto-NowCaster (TANC) to produce 1-h likelihood nowcasts of afternoon convection initiation (ACI) using a fuzzy logic approach. The primary objective is to design more useful forecast products with uncertainty regions of predicted thunderstorms to provide nowcast guidance of ACI for forecasters. Four sensitivity tests on forecast performance were conducted to improve the usefulness of nowcasts for forecasters. The optimal likelihood threshold (Lt) for ACIs, which is the likelihood value that best corresponds to the observed ACIs, was determined to be 0.6. Because of the high uncertainty on the exact location or timing of ACIs in nowcasts, location displacement and temporal shifting of ACIs should be considered in operational applications. When a spatial window of 5 km and a temporal window of 18 min are applied, the TANC displays moderate accuracy and satisfactory discrimination with an acceptable degree of overforecasting. The nonparametric Mann–Whitney test indicated that the performance of the TANC substantially surpasses the competing Space and Time Multiscale Analysis System–Weather Research and Forecasting Model, which serves as a pertinent reference for short-range (0–6 h) forecasts at the Central Weather Bureau in Taiwan.
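A rough sketch of hit counting under such spatial and temporal tolerance windows, with hypothetical event locations and times:

```python
# Minimal sketch: a predicted initiation counts as a hit if an observed
# initiation occurs within 5 km and 18 min, mirroring the tolerance windows
# described above. Event lists (x, y in km; t in minutes) are hypothetical.
import math

predicted = [(12.0, 40.0, 60), (80.0, 15.0, 90)]
observed = [(14.5, 42.0, 72), (200.0, 5.0, 30)]

def within(p, o, dx=5.0, dt=18):
    dist = math.hypot(p[0] - o[0], p[1] - o[1])
    return dist <= dx and abs(p[2] - o[2]) <= dt

hits = sum(any(within(p, o) for o in observed) for p in predicted)
print(f"{hits} of {len(predicted)} predicted initiations verified "
      f"within 5 km and 18 min")
```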

Full access
Elizabeth E. Ebert, Laurence J. Wilson, Barbara G. Brown, Pertti Nurmi, Harold E. Brooks, John Bally, and Matthias Jaeneke

Abstract

The verification phase of the World Weather Research Programme (WWRP) Sydney 2000 Forecast Demonstration Project (FDP) was intended to measure the skill of the participating nowcast algorithms in predicting the location of convection, rainfall rate and occurrence, wind speed and direction, severe thunderstorm wind gusts, and hail location and size. An additional question of interest was whether forecasters could improve the quality of the nowcasts compared to the FDP products alone.

The nowcasts were verified using a variety of statistical techniques. Observational data came from radar reflectivity and rainfall analyses, a network of rain gauges, and human (spotter) observations. The verification results showed that the cell tracking algorithms predicted the location of the strongest cells with a mean error of about 15–30 km for a 1-h forecast, and were usually more accurate than an extrapolation (Lagrangian persistence) forecast. Mean location errors for the area tracking schemes were on the order of 20 km.
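For context, a Lagrangian persistence forecast simply extrapolates a cell along its recent motion vector; the sketch below, with hypothetical positions, shows how a location error for such a baseline is computed:

```python
# Minimal sketch of a Lagrangian persistence baseline: advect the current
# cell position along its recent motion vector, then measure the distance
# to the verifying position. All values (km) are hypothetical.
import math

cell_xy = (100.0, 50.0)    # current cell position
motion = (25.0, -10.0)     # estimated motion over the past hour (km/h)
lead_h = 1.0

fcst = (cell_xy[0] + motion[0] * lead_h, cell_xy[1] + motion[1] * lead_h)
verified = (118.0, 44.0)   # observed position one hour later

err = math.hypot(fcst[0] - verified[0], fcst[1] - verified[1])
print(f"1-h extrapolation location error: {err:.1f} km")
```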

Almost all of the algorithms successfully predicted the frequency of rain throughout the forecast period, although most underestimated the frequency of high rain rates. The skill in predicting rain occurrence decreased very quickly into the forecast period. In particular, the algorithms could not predict the precise location of heavy rain beyond the first 10–20 min. Using radar analyses as verification, the algorithms' spatial forecasts were consistently more skillful than simple persistence. However, when verified against rain gauge observations at point locations, the algorithms had difficulty beating persistence, mainly due to differences in spatial and temporal resolution.

Only one algorithm attempted to forecast gust fronts. The results for a limited sample showed a mean absolute error of 7 km h⁻¹ and a mean bias of 3 km h⁻¹ in the speed of the gust fronts during the FDP. The errors in sea-breeze front forecasts were half as large, with essentially no bias. Verification of the hail associated with the 3 November tornadic storm showed that the two algorithms that estimated hail size and occurrence successfully diagnosed the onset and cessation of the hail to within 30 min of the reported sightings. The time evolution of hail size was reasonably well captured by the algorithms, and the predicted mean and maximum hail diameters were consistent with the observations.

The Thunderstorm Interactive Forecast System (TIFS) allowed forecasters to modify the output of the cell tracking nowcasts, primarily using it to remove cells that were insignificant or diagnosed with incorrect motion. This manual filtering resulted in markedly reduced mean cell position errors when compared to the unfiltered forecasts. However, when forecasters attempted to adjust the storm tracks for a small number of well-defined intense cells, the position errors increased slightly, suggesting that in such cases the objective guidance is probably the best estimate of storm motion.

Full access