Search Results

You are looking at 1 - 10 of 28 items for Author or Editor: Caren Marzban
Caren Marzban

Abstract

The transformation of a real, continuous variable into an event probability is reviewed from the Bayesian point of view, after which a Gaussian model is employed to derive an explicit expression for the probability. In turn, several scalar (one-dimensional) measures of performance quality and reliability diagrams are computed. It is shown that if the optimization of scalar measures is of concern, then prior probabilities must be treated carefully, whereas no special care is required for reliability diagrams. Specifically, since a scalar measure gauges only one component of performance quality, a multidimensional entity, it is possible to find the critical value of prior probability that optimizes that scalar measure; this value of “prior probability” is often not equal to the “true” value as estimated from group sample sizes. Optimum reliability, however, is obtained when prior probability is equal to the estimate based on group sample sizes. Exact results are presented for the critical values of “prior probability” that optimize the fraction correct, the true skill statistic, and the reliability diagram, but the critical success index and the Heidke skill statistic are treated only graphically. Finally, an example based on surface air pressure data is employed to illustrate the results in regard to precipitation forecasting.
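
As an illustration of the kind of Gaussian model described above, the sketch below (not the paper's exact derivation; all parameter values are hypothetical) transforms a continuous variable x into an event probability via Bayes' theorem with Gaussian class-conditional densities.

```python
import numpy as np

def event_probability(x, mu0, sigma0, mu1, sigma1, prior1):
    """Posterior probability of the event given x, via Bayes' theorem
    with Gaussian class-conditional densities (illustrative sketch)."""
    # Class-conditional likelihoods p(x | no event) and p(x | event).
    like0 = np.exp(-0.5 * ((x - mu0) / sigma0) ** 2) / (sigma0 * np.sqrt(2 * np.pi))
    like1 = np.exp(-0.5 * ((x - mu1) / sigma1) ** 2) / (sigma1 * np.sqrt(2 * np.pi))
    # Bayes: p(event | x) = pi1 p(x|event) / [pi1 p(x|event) + pi0 p(x|no event)].
    return prior1 * like1 / (prior1 * like1 + (1 - prior1) * like0)

# Example: prior estimated from group sample sizes (e.g., 28 rain cases in 100),
# applied to a surface pressure value; all numbers are made up.
p = event_probability(x=1012.0, mu0=1015.0, sigma0=5.0,
                      mu1=1008.0, sigma1=5.0, prior1=0.28)
```

Varying `prior1` away from the sample-based estimate is precisely the knob the abstract discusses: it shifts the forecast probabilities and hence the value of any scalar measure computed from them.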

Caren Marzban

Abstract

A set of 14 scalar, nonprobabilistic measures—some old, some new—is examined in the rare-event situation. The set includes measures of accuracy, association, discrimination, bias, and skill. It is found that all measures considered herein are inequitable in that they induce under- or overforecasting. One condition under which such bias is not induced (for some of the measures) is when the underlying class-conditional distributions are Gaussian (normal) and equivariant.
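
For concreteness, here is a minimal sketch of a few standard scalar measures of the kind surveyed (not necessarily the paper's exact set of 14), computed from a 2x2 contingency table; the rare-event counts in the example are invented.

```python
def contingency_measures(a, b, c, d):
    """A few standard scalar measures from a 2x2 contingency table:
    a = hits, b = false alarms, c = misses, d = correct rejections."""
    n = a + b + c + d
    return {
        "fraction_correct": (a + d) / n,
        "bias": (a + b) / (a + c),          # >1 indicates overforecasting
        "csi": a / (a + b + c),             # critical success index
        "tss": a / (a + c) - b / (b + d),   # true skill statistic (Peirce)
        "heidke": 2 * (a * d - b * c) /
                  ((a + c) * (c + d) + (a + b) * (b + d)),
    }

# Rare-event example: the event occurs in only 30 of 1000 cases.
print(contingency_measures(a=20, b=50, c=10, d=920))
```

Recomputing these measures while shifting the decision threshold (and hence the balance of b and c) shows the induced under- or overforecasting that the abstract describes.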

Caren Marzban

Abstract

The receiver operating characteristic (ROC) curve is a two-dimensional measure of classification performance. The area under the ROC curve (AUC) is a scalar measure gauging one facet of performance. In this short article, five idealized models are utilized to relate the shape of the ROC curve, and the area under it, to features of the underlying distribution of forecasts. This allows for an interpretation of the former in terms of the latter. The analysis is pedagogical in that many of the findings are already known in more general (and more realistic) settings; however, the simplicity of the models considered here allows for a clear exposition of the relation. For example, although in general there are many reasons for an asymmetric ROC curve, the models considered here clearly illustrate that an asymmetry in the ROC curve can be attributed to unequal widths of the distributions. Furthermore, it is shown that AUC discriminates well between “good” and “bad” models, but not between good models.
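
A minimal sketch of one such idealized model, assuming the standard binormal setup (Gaussian forecast distributions for events and non-events, with hypothetical parameters): unequal widths produce the asymmetric ROC curve discussed in the abstract, and the closed-form AUC follows from the normal CDF.

```python
import numpy as np
from scipy.stats import norm

# Binormal ROC sketch: forecasts for non-events ~ N(mu0, sigma0),
# for events ~ N(mu1, sigma1); unequal sigmas give an asymmetric curve.
mu0, sigma0, mu1, sigma1 = 0.0, 1.0, 1.5, 2.0   # hypothetical values

thresholds = np.linspace(-6, 8, 200)
false_alarm_rate = 1 - norm.cdf(thresholds, loc=mu0, scale=sigma0)
hit_rate = 1 - norm.cdf(thresholds, loc=mu1, scale=sigma1)

# Closed-form area under the binormal ROC curve.
auc = norm.cdf((mu1 - mu0) / np.sqrt(sigma0**2 + sigma1**2))
```

Setting sigma0 = sigma1 recovers a symmetric curve; fixing AUC while trading off the separation (mu1 - mu0) against the widths illustrates why AUC alone cannot distinguish between good models.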

Caren Marzban

Abstract

The temperature forecasts of the Advanced Regional Prediction System are postprocessed by a neural network. Specifically, 31 stations are considered, and for each a neural network is developed. The nine input variables to the neural network are forecast hour, model forecast temperature, relative humidity, wind direction and speed, mean sea level pressure, cloud cover, and precipitation rate and amount. The single dependent variable is observed temperature at a given station. It is shown that the model temperature forecasts are improved in terms of a variety of performance measures. An average of 40% reduction in mean-squared error across all stations is accompanied by an average reduction in bias and variance of 70% and 20%, respectively.
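
A rough sketch of this kind of station-wise postprocessing, using scikit-learn's MLPRegressor on placeholder data; the architecture, training details, and data below are guesses for illustration, not the network actually used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical training set for one station: the nine predictors named in the
# abstract (forecast hour, model temperature, RH, wind direction and speed,
# MSLP, cloud cover, precipitation rate and amount) and observed temperature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))              # placeholder model-output predictors
y = X[:, 1] + 0.1 * rng.normal(size=500)   # placeholder observed temperature

# One small network per station, as in the abstract; the size is an assumption.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
)
model.fit(X, y)
corrected = model.predict(X)   # postprocessed (bias-corrected) temperatures
```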

Caren Marzban

Abstract

Sensitivity analysis (SA) generally refers to an assessment of the sensitivity of the output(s) of some complex model with respect to changes in the input(s). Examples of inputs or outputs include initial state variables, parameters of a numerical model, or state variables at some future time. Sensitivity analysis is useful for data assimilation, model tuning, calibration, and dimensionality reduction; and there exists a wide range of SA techniques for each. This paper discusses one special class of SA techniques, referred to as variance based. As a first step in demonstrating the utility of the method in understanding the relationship between forecasts and parameters of complex numerical models, here the method is applied to the Lorenz'63 model, and the results are compared with an adjoint-based approach to SA. The method has three major components: 1) analysis of variance, 2) emulation of computer data, and 3) experimental–sampling design. The role of these three topics in variance-based SA is addressed in generality. More specifically, the application to the Lorenz'63 model suggests that the Z state variable is most sensitive to the b and r parameters, and is mostly unaffected by the s parameter. There is also evidence for an interaction between the r and b parameters. It is shown that these conclusions are true for both simple random sampling and Latin hypercube sampling, although the latter leads to slightly more precise estimates for some of the sensitivity measures.
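
A simplified sketch of variance-based SA on Lorenz'63: Latin hypercube samples of the parameters (s, r, b), a scalar forecast quantity Y (final-time Z), and a crude binning estimator of the first-order indices Var(E[Y | theta_i]) / Var(Y). The parameter ranges are assumptions, and the paper's emulation and ANOVA machinery are not reproduced here.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import qmc

def lorenz_z(params, t_end=1.0):
    """Integrate Lorenz'63 and return Z at the final time (the 'forecast' Y)."""
    s, r, b = params
    rhs = lambda t, u: [s * (u[1] - u[0]),
                        u[0] * (r - u[2]) - u[1],
                        u[0] * u[1] - b * u[2]]
    sol = solve_ivp(rhs, (0, t_end), [1.0, 1.0, 1.0], rtol=1e-8)
    return sol.y[2, -1]

# Latin hypercube sample of (s, r, b); ranges around (10, 28, 8/3) are assumed.
n = 512
sample = qmc.LatinHypercube(d=3, seed=0).random(n)
theta = qmc.scale(sample, l_bounds=[8, 25, 2], u_bounds=[12, 31, 3.5])
Y = np.array([lorenz_z(th) for th in theta])

def first_order_index(x, y, bins=16):
    """Crude first-order index: bin x, average y per bin, take the variance."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    cond_means = np.array([y[idx == k].mean() for k in range(bins)])
    return cond_means.var() / y.var()

for name, col in zip("srb", theta.T):
    print(name, round(first_order_index(col, Y), 3))
```

Swapping the LatinHypercube sampler for simple random sampling (`rng.uniform`) reproduces the comparison the abstract mentions between the two designs.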

Caren Marzban

Abstract

The distinction between forecast quality and economic value in a cost–loss formulation is well known. Also well known is their complex relationship, which even includes instances of a reversal between the two, where higher quality is associated with lower economic value, and vice versa. Such counterintuitive results are to be expected when forecast quality and economic value, both multifaceted quantities, are summarized by single scalar measures. Diagrams are often used to display forecast quality in order to better represent its multidimensional nature. Here, it is proposed that economic value be displayed as a region on a plot of hit rate versus false-alarm rate. Such a display obviates any need to summarize economic value by a scalar measure. The choice of the axes is motivated by the relative operating characteristic (ROC) diagram, and so this manner of displaying economic value is useful for deterministic as well as probabilistic forecasts.
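
A sketch of the underlying cost-loss arithmetic, written from first principles rather than taken from the paper: the relative economic value of a forecast with hit rate H and false-alarm rate F, for a user with cost/loss ratio C/L and a given base rate. The proposed display then amounts to shading the region of the (F, H) plane where this value is positive; the numbers below are hypothetical.

```python
import numpy as np

def value_score(H, F, base_rate, cost_loss_ratio):
    """Relative economic value in the standard cost-loss model.
    Expenses are in units of the loss L, with a = C/L and s = base rate."""
    s, a = base_rate, cost_loss_ratio
    # Forecast user: pays C on hits and false alarms, L on misses.
    expense_forecast = F * (1 - s) * a + H * s * a + (1 - H) * s
    # Climatology: always protect (cost a) or never protect (expected loss s).
    expense_climate = min(a, s)
    expense_perfect = s * a
    return (expense_climate - expense_forecast) / (expense_climate - expense_perfect)

# Shade the (F, H) region of positive value for one hypothetical user.
F, H = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 1, 101))
positive_value = value_score(H, F, base_rate=0.3, cost_loss_ratio=0.2) > 0
```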

Caren Marzban and Scott Sandgathe

Abstract

The verification of a gridded forecast field, for example, one produced by numerical weather prediction (NWP) models, cannot be performed on a gridpoint-by-gridpoint basis; that type of approach would ignore the spatial structures present in both forecast and observation fields, leading to misinformative or noninformative verification results. A variety of methods have been proposed to acknowledge the spatial structure of the fields. Here, a method is examined that compares the two fields in terms of their variograms. Two types of variograms are examined: one examines correlation on different spatial scales and is a measure of texture; the other type of variogram is additionally sensitive to the size and location of objects in a field and can assess size and location errors. Using these variograms, the forecasts of three NWP model formulations are compared with observations/analysis, on a dataset consisting of 30 days in spring 2005. It is found that within statistical uncertainty the three formulations are comparable with one another in terms of forecasting the spatial structure of observed reflectivity fields. None, however, produce the observed structure across all scales, and all tend to overforecast the spatial extent and also forecast a smoother precipitation (reflectivity) field. A finer comparison suggests that the University of Oklahoma 2-km resolution Advanced Research Weather Research and Forecasting (WRF-ARW) model and the National Center for Atmospheric Research (NCAR) 4-km resolution WRF-ARW slightly outperform the 4.5-km WRF-Nonhydrostatic Mesoscale Model (NMM), developed by the National Oceanic and Atmospheric Administration/National Centers for Environmental Prediction (NOAA/NCEP), in terms of producing forecasts whose spatial structures are closer to that of the observed field.
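
A minimal sketch of the texture-type variogram only, computed by brute force on a single 2D field; the paper's second, object-sensitive variogram is not reproduced here, and the random pair subsampling is just to keep the toy cheap.

```python
import numpy as np

def isotropic_variogram(field, max_lag=20, n_pairs=20000):
    """Empirical variogram of a 2D field: gamma(h) = 0.5 * mean[(Z(p) - Z(q))^2]
    over point pairs at (rounded) separation h. Brute-force sketch."""
    ny, nx = field.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    pts = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
    z = field.ravel()
    rng = np.random.default_rng(0)
    i = rng.integers(0, z.size, n_pairs)
    j = rng.integers(0, z.size, n_pairs)
    lags = np.rint(np.hypot(*(pts[i] - pts[j]).T)).astype(int)
    sq = 0.5 * (z[i] - z[j]) ** 2
    return np.array([sq[lags == h].mean() if np.any(lags == h) else np.nan
                     for h in range(1, max_lag + 1)])

# Compare the texture of forecast and observed fields (hypothetical arrays):
# gamma_f, gamma_o = isotropic_variogram(forecast), isotropic_variogram(observed)
```

A forecast variogram lying below the observed one at short lags is the signature of the oversmoothing noted in the abstract.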

Caren Marzban and V. Lakshmanan

Abstract

Gandin and Murphy (GM) have shown that if a skill score is linear in the scoring matrix, and if the scoring matrix is symmetric, then in the two-event case there exists a unique, “equitable” skill score, namely, the True Skill Score (or Kuipers’s performance index). As such, this measure is treated as preferable to other measures because of its equitability. However, in most practical situations the scoring matrix is not symmetric due to the unequal costs associated with false alarms and misses. As a result, GM’s considerations must be reexamined without the assumption of a symmetric scoring matrix. In this note, it will be proven that if the scoring matrix is nonsymmetric, then there does not exist a unique performance measure, linear in the scoring matrix, that would satisfy any constraints of equitability. In short, there does not exist a unique, equitable skill score for two-category events that have unequal costs associated with a miss and a false alarm.
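
For reference, a hedged sketch of the setup (the notation is mine, not GM's): a score linear in the scoring matrix, the equitability constraint that constant forecasts receive zero expected score, and the true skill score that these conditions single out in the symmetric 2x2 case, written in terms of hits a, false alarms b, misses c, and correct rejections d.

```latex
% Joint frequencies p_{ij} of forecast i and observation j; scoring matrix s_{ij}.
\[
  S \;=\; \sum_{i,j} s_{ij}\, p_{ij}
  \qquad \text{(a skill score linear in the scoring matrix)}
\]
% Equitability: every constant forecast receives zero expected score,
\[
  \sum_{j} s_{ij}\, \bar{p}_{j} \;=\; 0 \quad \text{for each } i,
  \qquad \bar{p}_{j} = \text{marginal frequency of observation } j.
\]
% With a symmetric scoring matrix, these constraints (plus normalization)
% single out the true skill score:
\[
  \mathrm{TSS} \;=\; \frac{a}{a+c} \;-\; \frac{b}{b+d}
  \;=\; \frac{ad - bc}{(a+c)(b+d)}.
\]
```

The note's point is that once s_{01} and s_{10} are allowed to differ (unequal costs for false alarms and misses), this system no longer has a unique solution.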

Caren Marzban and Arthur Witt

Abstract

The National Severe Storms Laboratory has developed algorithms that compute a number of Doppler radar and environmental attributes known to be relevant for the detection/prediction of severe hail. Based on these attributes, two neural networks have been developed for the estimation of severe-hail size: one for predicting the severe-hail size in a physical dimension, and another for assigning a probability of belonging to one of three hail size classes. Performance is assessed in terms of multidimensional (i.e., nonscalar) measures. It is shown that the network designed to predict severe-hail size outperforms the existing method for predicting severe-hail size. Although the network designed for classifying severe-hail size produces highly reliable and discriminatory probabilities for two of the three hail-size classes (the smallest and the largest), forecasts of midsize hail, though highly reliable, are mostly nondiscriminatory.
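
A small sketch of the reliability-diagram ingredient of such an assessment: bin the forecast probabilities and compare each bin's mean forecast with the observed relative frequency. The data here are synthetic and perfectly reliable by construction; the paper's multidimensional assessment is richer than this.

```python
import numpy as np

def reliability_curve(prob, occurred, n_bins=10):
    """Bin forecast probabilities and return, per bin, the mean forecast and
    the observed frequency; reliable forecasts lie near the diagonal."""
    edges = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    mean_forecast = np.array([prob[idx == k].mean() for k in range(n_bins)])
    observed_freq = np.array([occurred[idx == k].mean() for k in range(n_bins)])
    return mean_forecast, observed_freq

# Hypothetical probabilities for the "midsize hail" class versus 0/1 outcomes.
rng = np.random.default_rng(0)
p = rng.uniform(size=1000)
o = (rng.uniform(size=1000) < p).astype(float)   # reliable toy data
mf, of = reliability_curve(p, o)
```

Discrimination, by contrast, concerns whether the distributions of p differ between occurrences and non-occurrences, which is why probabilities can be reliable yet nondiscriminatory, as the abstract reports for midsize hail.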

Caren Marzban and Scott Sandgathe

Abstract

A statistical method referred to as cluster analysis is employed to identify features in forecast and observation fields. These features qualify as natural candidates for events or objects in terms of which verification can be performed. The methodology is introduced and illustrated on synthetic and real quantitative precipitation data. First, it is shown that the method correctly identifies clusters that are in agreement with what most experts might interpret as features or objects in the field. Then, it is shown that the verification of the forecasts can be performed within an event-based framework, with the events identified as the clusters. The number of clusters in a field is interpreted as a measure of scale, and the final “product” of the methodology is an “error surface” representing the error in the forecasts as a function of the number of clusters in the forecast and observation fields. This allows for the examination of forecast error as a function of scale.
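
A toy sketch of the idea, with k-means standing in for whatever clustering the paper employs, and with the same number of clusters used in both fields (the paper's error surface varies the two counts independently); the fields and counts are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_field(field, k, seed=0):
    """Cluster the wet grid points of a field in (y, x, value) space;
    each cluster stands in for an 'object' or 'event' in the field."""
    ys, xs = np.nonzero(field > 0)
    data = np.column_stack([ys, xs, field[ys, xs]]).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
    return km.cluster_centers_

def error_at_scale(fcst, obs, k):
    """Toy error at 'scale' k: match each forecast cluster to its nearest
    observed cluster and average the centroid distances."""
    cf, co = cluster_field(fcst, k), cluster_field(obs, k)
    d = np.linalg.norm(cf[:, None, :] - co[None, :, :], axis=2)
    return d.min(axis=1).mean()

# One slice of the "error surface" (fcst and obs are hypothetical 2D arrays):
# errors = [error_at_scale(fcst, obs, k) for k in range(1, 21)]
```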
