Search Results

You are looking at 1–10 of 12 items for:

  • Author or Editor: Caren Marzban
  • Weather and Forecasting
Caren Marzban

Abstract

A set of 14 scalar, nonprobabilistic measures—some old, some new—is examined in the rare-event situation. The set includes measures of accuracy, association, discrimination, bias, and skill. It is found that all measures considered herein are inequitable in that they induce under- or overforecasting. One condition under which such bias is not induced (for some of the measures) is when the underlying class-conditional distributions are Gaussian (normal) and equivariant (i.e., of equal variance).
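
For concreteness, measures of this kind are computable from a 2 × 2 contingency table. The sketch below is not from the paper; the counts and variable names are illustrative assumptions:

    # A few scalar, nonprobabilistic measures computed from a 2x2
    # contingency table for a rare event; the counts are made up.
    a, b, c, d = 10, 40, 5, 945  # hits, false alarms, misses, correct rejections
    n = a + b + c + d

    pod = a / (a + c)            # probability of detection (discrimination)
    far = b / (a + b)            # false-alarm ratio
    bias = (a + b) / (a + c)     # frequency bias; > 1 signals overforecasting
    csi = a / (a + b + c)        # critical success index (accuracy)

    # Heidke skill score: fraction correct relative to random chance.
    e = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    hss = (a + d - e) / (n - e)

    print(f"POD={pod:.3f} FAR={far:.3f} bias={bias:.3f} CSI={csi:.3f} HSS={hss:.3f}")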

Full access
Caren Marzban

Abstract

The receiver operating characteristic (ROC) curve is a two-dimensional measure of classification performance. The area under the ROC curve (AUC) is a scalar measure gauging one facet of performance. In this short article, five idealized models are utilized to relate the shape of the ROC curve, and the area under it, to features of the underlying distribution of forecasts. This allows for an interpretation of the former in terms of the latter. The analysis is pedagogical in that many of the findings are already known in more general (and more realistic) settings; however, the simplicity of the models considered here allows for a clear exposition of the relation. For example, although in general there are many reasons for an asymmetric ROC curve, the models considered here clearly illustrate that an asymmetry in the ROC curve can be attributed to unequal widths of the distributions. Furthermore, it is shown that AUC discriminates well between “good” and “bad” models, but not among good models.
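
As a rough illustration (a sketch under assumed Gaussian class-conditional distributions, not the paper's five models), unequal widths produce a visibly asymmetric ROC curve:

    import numpy as np
    from scipy.stats import norm

    # ROC curve and AUC for Gaussian class-conditional forecast distributions.
    # Unequal widths (sigma0 != sigma1) yield an asymmetric ROC curve.
    mu0, sigma0 = 0.0, 1.0   # forecasts given a nonevent
    mu1, sigma1 = 1.5, 2.0   # forecasts given an event (wider distribution)

    t = np.linspace(-10, 10, 1001)                 # decision thresholds
    hit = 1 - norm.cdf(t, mu1, sigma1)             # P(forecast > t | event)
    false_alarm = 1 - norm.cdf(t, mu0, sigma0)     # P(forecast > t | nonevent)

    # Both rates decrease with the threshold, hence the sign flip in trapz.
    auc = -np.trapz(hit, false_alarm)
    print(f"AUC = {auc:.3f}")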

Full access
Caren Marzban

Abstract

The distinction between forecast quality and economic value in a cost–loss formulation is well known. Also well known is their complex relationship, even with some instances of a reversal between the two, where higher quality is associated with lower economic value, and vice versa. It is reasonable to expect such counterintuitive results when forecast quality and economic value—both multifaceted quantities—are summarized by single scalar measures. Diagrams are often used to display forecast quality in order to better represent its multidimensional nature. Here, it is proposed that economic value be displayed as a region on a plot of hit rate versus false-alarm rate. Such a display obviates any need to summarize economic value by a scalar measure. The choice of the axes is motivated by the relative operating characteristic (ROC) diagram, and so this manner of displaying economic value is useful for deterministic as well as probabilistic forecasts.
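
One standard ingredient for such a display is the cost–loss value score (Richardson's 2000 formulation); whether this is the paper's exact construction is an assumption. A sketch of the score over the (false-alarm rate, hit rate) plane:

    import numpy as np

    # The standard cost-loss value score evaluated over a grid of
    # (false-alarm rate, hit rate) points; the region where it is positive
    # could be shaded on a ROC-style diagram. The base rate and the user's
    # cost/loss ratio below are illustrative assumptions.
    s = 0.1        # base rate (climatological event frequency)
    alpha = 0.3    # user's cost/loss ratio

    F, H = np.meshgrid(np.linspace(0, 1, 201), np.linspace(0, 1, 201))
    value = (np.minimum(alpha, s) - F * alpha * (1 - s)
             + H * s * (1 - alpha) - s) / (np.minimum(alpha, s) - s * alpha)

    print(f"fraction of the (F, H) plane with positive value: {(value > 0).mean():.2f}")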

Full access
Caren Marzban and Arthur Witt

Abstract

The National Severe Storms Laboratory has developed algorithms that compute a number of Doppler radar and environmental attributes known to be relevant for the detection/prediction of severe hail. Based on these attributes, two neural networks have been developed for the estimation of severe-hail size: one for predicting the severe-hail size in a physical dimension, and another for assigning a probability of belonging to one of three hail-size classes. Performance is assessed in terms of multidimensional (i.e., nonscalar) measures. It is shown that the network designed to predict severe-hail size outperforms the existing method. Although the network designed for classifying severe-hail size produces highly reliable and discriminatory probabilities for two of the three hail-size classes (the smallest and the largest), forecasts of midsize hail, though highly reliable, are mostly nondiscriminatory.
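
A minimal sketch of the two-network setup, with random data standing in for the radar and environmental attributes; the library, architecture, and all names below are assumptions, not the paper's configuration:

    import numpy as np
    from sklearn.neural_network import MLPClassifier, MLPRegressor

    # Two networks: one regresses severe-hail size, the other assigns
    # probabilities to three hail-size classes. All data are synthetic.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))                  # 8 stand-in attributes
    size = np.abs(X[:, 0] + rng.normal(size=500))  # stand-in hail size
    cls = np.digitize(size, [0.5, 1.5])            # three size classes

    reg = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0).fit(X, size)
    clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0).fit(X, cls)

    print("predicted size:", reg.predict(X[:1]))
    print("class probabilities:", clf.predict_proba(X[:1]).round(2))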

Full access
Caren Marzban and Scott Sandgathe

Abstract

The verification of a gridded forecast field, for example, one produced by numerical weather prediction (NWP) models, cannot be performed on a gridpoint-by-gridpoint basis; that type of approach would ignore the spatial structures present in both forecast and observation fields, leading to misinformative or noninformative verification results. A variety of methods have been proposed to acknowledge the spatial structure of the fields. Here, a method is examined that compares the two fields in terms of their variograms. Two types of variograms are examined: one examines correlation on different spatial scales and is a measure of texture; the other type of variogram is additionally sensitive to the size and location of objects in a field and can assess size and location errors. Using these variograms, the forecasts of three NWP model formulations are compared with observations/analysis, on a dataset consisting of 30 days in spring 2005. It is found that within statistical uncertainty the three formulations are comparable with one another in terms of forecasting the spatial structure of observed reflectivity fields. None, however, produces the observed structure across all scales, and all tend to overforecast the spatial extent and also forecast a smoother precipitation (reflectivity) field. A finer comparison suggests that the University of Oklahoma 2-km resolution Advanced Research Weather Research and Forecasting (WRF-ARW) model and the National Center for Atmospheric Research (NCAR) 4-km resolution WRF-ARW slightly outperform the 4.5-km WRF-Nonhydrostatic Mesoscale Model (NMM), developed by the National Oceanic and Atmospheric Administration/National Centers for Environmental Prediction (NOAA/NCEP), in terms of producing forecasts whose spatial structure is closer to that of the observed field.
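
A minimal sketch of the underlying computation (the classical empirical variogram along one grid axis; the estimator actually used in the paper may differ):

    import numpy as np

    # Empirical variogram of a gridded field along one axis:
    #   gamma(h) = 0.5 * mean[(Z(x + h) - Z(x))^2]
    # A smoother field has a smaller variogram at short lags.
    def variogram(field, max_lag):
        return np.array([0.5 * np.mean((field[:, h:] - field[:, :-h]) ** 2)
                         for h in range(1, max_lag + 1)])

    rng = np.random.default_rng(0)
    obs = rng.normal(size=(64, 64))
    fcst = 0.5 * (obs + np.roll(obs, 1, axis=1))   # a smoother "forecast"

    print("obs :", variogram(obs, 3).round(2))
    print("fcst:", variogram(fcst, 3).round(2))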

Full access
Caren Marzban and Scott Sandgathe

Abstract

A statistical method referred to as cluster analysis is employed to identify features in forecast and observation fields. These features qualify as natural candidates for events or objects in terms of which verification can be performed. The methodology is introduced and illustrated on synthetic and real quantitative precipitation data. First, it is shown that the method correctly identifies clusters that are in agreement with what most experts might interpret as features or objects in the field. Then, it is shown that the verification of the forecasts can be performed within an event-based framework, with the events identified as the clusters. The number of clusters in a field is interpreted as a measure of scale, and the final “product” of the methodology is an “error surface” representing the error in the forecasts as a function of the number of clusters in the forecast and observation fields. This allows for the examination of forecast error as a function of scale.
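
A reduced sketch of the idea: cluster rainy grid points and read the error off matched cluster centers. The clustering method, matching rule, and data below are all assumptions, not the paper's algorithm:

    import numpy as np
    from sklearn.cluster import KMeans

    # Identify "objects" as clusters of rainy grid points, then compare
    # forecast and observed cluster centers; the number of clusters k acts
    # as a scale parameter. Matching by coordinate sort is deliberately crude.
    def centers(field, k, thresh=0.1):
        pts = np.argwhere(field > thresh).astype(float)
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pts)
        return km.cluster_centers_[np.lexsort(km.cluster_centers_.T)]

    y, x = np.mgrid[0:60, 0:60]
    obs = (np.exp(-((x - 15) ** 2 + (y - 20) ** 2) / 40.0)
           + np.exp(-((x - 45) ** 2 + (y - 40) ** 2) / 40.0))
    fcst = np.roll(obs, (4, 2), axis=(0, 1))       # forecast = displaced blobs

    for k in (1, 2):
        err = np.linalg.norm(centers(fcst, k) - centers(obs, k), axis=1).mean()
        print(f"k={k}: mean center displacement ~ {err:.1f} grid lengths")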

Full access
Caren Marzban and Scott Sandgathe

Abstract

Modern numerical weather prediction (NWP) models produce forecasts that are gridded spatial fields. Digital images can also be viewed as gridded spatial fields, and as such, techniques from image analysis can be employed to address the problem of verification of NWP forecasts. One technique for estimating how images change temporally is called optical flow, where it is assumed that temporal changes in images (e.g., in a video) can be represented as a fluid flowing in some manner. Multiple realizations of the general idea have already been employed in verification problems as well as in data assimilation. Here, a specific formulation of optical flow, called Lucas–Kanade, is reviewed and generalized as a tool for estimating three components of forecast error: intensity and two components of displacement, direction and distance. The method is illustrated first on simulated data, and then on a real dataset consisting of 418 twenty-four-hour forecasts of sea level pressure, spanning 2 April 2008–2 November 2009, from one member [the Global Forecast System (GFS)–fifth-generation Pennsylvania State University–National Center for Atmospheric Research Mesoscale Model (MM5)] of the University of Washington’s Mesoscale Ensemble system. The simulation study confirms (and quantifies) the expectation that the method correctly assesses forecast errors. Results on the real dataset reveal a significant intensity bias in the subtropics, especially in the southern California region. They also expose a systematic east-northeast or downstream bias of approximately 50 km over land, possibly due to the treatment of terrain in the coarse-resolution model.
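
A minimal, single-window rendering of the Lucas–Kanade step (the paper's generalization also extracts an intensity component and works locally; the fields below are synthetic):

    import numpy as np

    # The displacement (u, v) that carries the observed field onto the
    # forecast is the least-squares solution of the optical-flow constraint
    #   Ix*u + Iy*v + It = 0
    # pooled over the whole grid. A sketch of the idea, not the paper's code.
    def lucas_kanade(f0, f1):
        Iy, Ix = np.gradient(f0)                       # spatial gradients
        A = np.column_stack([Ix.ravel(), Iy.ravel()])
        b = -(f1 - f0).ravel()                         # "temporal" difference
        uv, *_ = np.linalg.lstsq(A, b, rcond=None)
        return uv                                      # (u, v) in grid units

    y, x = np.mgrid[0:80, 0:80]
    obs = np.exp(-((x - 40) ** 2 + (y - 40) ** 2) / 200.0)
    fcst = np.exp(-((x - 43) ** 2 + (y - 41) ** 2) / 200.0)  # displaced ~(3, 1)

    print("estimated (u, v):", np.round(lucas_kanade(obs, fcst), 2))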

Full access
Caren Marzban and Gregory J. Stumpf

Abstract

A neural network is developed to diagnose which circulations detected by the National Severe Storms Laboratory’s Mesocyclone Detection Algorithm yield damaging wind. In particular, 23 variables characterizing the circulations are selected to be used as the input nodes of a feed-forward, supervised neural network. The outputs of the network represent the existence/nonexistence of damaging wind, based on ground observations. A set of 14 scalar, nonprobabilistic measures and a set of two multidimensional, probabilistic measures are employed to assess the performance of the network. The former set includes measures of accuracy, association, discrimination, and skill, while the latter consists of reliability and refinement diagrams. Two classification schemes are also examined.

It is found that a neural network with two hidden nodes outperforms a neural network with no hidden nodes when performance is gauged with any of the 14 scalar measures, except for one measure of discrimination, for which the result is reversed. The two classification schemes perform comparably to one another. As for the performance of the network in terms of reliability diagrams, it is shown that the process by which the outputs are converted to probabilities allows for the forecasts to be completely reliable. Refinement diagrams complete the representation of the calibration–refinement factorization of the joint distribution of forecasts and observations.
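
For reference, the ingredients of a reliability diagram: binned forecast probabilities versus observed frequencies, with the bin counts supplying the refinement part. The data below are synthetic and deliberately well calibrated; the paper's probabilities come from a neural network:

    import numpy as np

    # Reliability: within each probability bin, compare the mean forecast
    # with the observed relative frequency. The bin counts n form the
    # refinement (sharpness) part of the factorization.
    rng = np.random.default_rng(0)
    p = rng.random(5000)                          # forecast probabilities
    y = (rng.random(5000) < p).astype(int)        # outcomes consistent with p

    edges = np.linspace(0, 1, 11)
    idx = np.digitize(p, edges[1:-1])
    for i in range(10):
        m = idx == i
        print(f"bin {edges[i]:.1f}-{edges[i + 1]:.1f}: n={m.sum():4d}  "
              f"mean fcst={p[m].mean():.2f}  obs freq={y[m].mean():.2f}")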

Full access
Caren Marzban, Stephen Leyton, and Brad Colman

Abstract

Statistical postprocessing of numerical model output can improve forecast quality, especially when model output is combined with surface observations. In this article, the development of nonlinear postprocessors for the prediction of ceiling and visibility is discussed. The forecast period is approximately 2001–05, involving data from hourly surface observations and from the fifth-generation Pennsylvania State University–National Center for Atmospheric Research Mesoscale Model. The statistical model for mapping these data to ceiling and visibility is a neural network. A total of 39 such neural networks are developed, one for each of 39 terminal aerodrome forecast stations in the northwest United States. These postprocessors are compared with a number of alternatives, including logistic regression and model output statistics (MOS) derived from the Aviation Model/Global Forecast System. It is found that the performance of the neural networks is generally superior to that of logistic regression and MOS. Depending on the comparison, different measures of performance are examined, including the Heidke skill statistic, cross-entropy, relative operating characteristic curves, discrimination plots, and attributes diagrams. The extent of the improvement brought about by the neural network depends on the measure of performance and on the specific station.
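
Of the measures named above, cross-entropy is the least graphical; here is a minimal sketch of its use for comparing two probabilistic systems (synthetic stand-ins, not the paper's forecasts):

    import numpy as np

    # Cross-entropy (negative mean log likelihood) for probabilistic
    # forecasts of a binary event; lower is better.
    def cross_entropy(p, y, eps=1e-12):
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    rng = np.random.default_rng(0)
    truth = rng.random(2000)                      # true event probabilities
    y = (rng.random(2000) < truth).astype(int)    # observed outcomes
    p_sharp = truth                               # calibrated and sharp
    p_hedged = 0.5 * truth + 0.25                 # pulled toward climatology

    print("sharp :", round(cross_entropy(p_sharp, y), 3))
    print("hedged:", round(cross_entropy(p_hedged, y), 3))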

Full access
Caren Marzban, E. De Wayne Mitchell, and Gregory J. Stumpf

Abstract

It is argued that the strength of a predictor is an ill-defined concept. At best, it is contingent on many assumptions, and, at worst, it is an ambiguous quantity. It is shown that many of the contingencies are met (or avoided) only in a bivariate sense, that is, one independent variable (and one dependent variable) at a time. Several such methods are offered, after which data produced by the National Severe Storms Laboratory’s Tornado Detection Algorithm are analyzed to address the question of which storm-scale vortex attributes based on Doppler radar constitute the “best predictors” of tornadoes.
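
One bivariate method of the kind described above, sketched with synthetic data; the ranking measure here (a simple correlation) is an assumption, and the paper considers several such measures:

    import numpy as np

    # Rank candidate predictors one at a time by the absolute correlation
    # between each attribute and the binary event. All data are synthetic.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))                 # 5 candidate attributes
    y = (X[:, 2] + 0.5 * X[:, 0] + rng.normal(size=1000) > 1).astype(int)

    strength = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    for j in np.argsort(strength)[::-1]:
        print(f"predictor {j}: |r| = {strength[j]:.2f}")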

Full access