

User-Oriented Two-Dimensional Measure of Effectiveness for the Evaluation of Transport and Dispersion Models

Institute for Defense Analyses, Alexandria, Virginia

Abstract

A two-dimensional measure of effectiveness for comparing hazardous material transport and dispersion model predictions and field observations has been developed. This measure is used for comparing predictions and observations paired in space and time, and the components of this measure—overprediction, or false-positive fraction, and underprediction, or false-negative fraction—can illuminate strengths and weaknesses of a model in ways that many one-dimensional measures cannot. Comparisons of predictions of short-range field observations are used to illustrate features of this measure of effectiveness, including its computation based on dosages or based on a dosage threshold of interest. With this user-oriented measure of effectiveness, statistically significant resolution of transport and dispersion model performance differences as a function of downwind range and meteorological stability category grouping, as well as between different models, is described. Evaluation of probabilistic prediction outputs is also demonstrated with this measure of effectiveness. The user-oriented measure of effectiveness can be evaluated for different assumed hazardous materials and human effects of interest and, as such, can relate the quality of a prediction for agents of greatly varying toxicity. A quantitative method that can aid in the communication of a user's risk tolerance is identified and allows one to divide the two-dimensional measure of effectiveness space into regions of relative acceptability for the particular user and application.

Corresponding author address: Dr. Steve Warner, Institute for Defense Analyses, 4850 Mark Center Drive, Alexandria, VA 22311-1882. swarner@ida.org


Introduction

The need to assess the accuracy of hazardous material transport and dispersion models continues to be of great importance. Applications for these models as planning aids, and even as “real time” emergency response tools, continue to increase (Petty 2000). Past studies have compared the predictions of transport and dispersion models with field observations using a variety of statistical quantities. Statistical measures of bias, scatter, and correlation have been discussed (Hanna et al. 1993) and have been applied typically to quantities derived from the field observations, for example, maximum dosage along a sampler arc or estimated plume width at a given downwind range. To a large degree, derived quantities have been used for comparisons because it was recognized that measures of bias, scatter, and correlation applied to point-to-point comparisons, that is, observations and predictions paired in space and time, could indicate very poor model performance, given small plume displacements (Hanna 1988). That is, the predicted and observed “plumes” could have the same shape and size, and yet point-to-point comparisons would indicate that the model performs poorly in terms of the above statistical measures simply because, for example, the input wind direction was in error by a few degrees. In addition, for typical air-pollution applications, models might be required to predict the maximum concentrations at a certain distance without regard to direction of the plume. Therefore, the above statistical measures applied to derived quantities (e.g., the maximum) would be deemed acceptable for assessing model performance in air-quality or similar studies. In a similar way, when used as a tool for future planning, one imagines that a transport and dispersion model that gets the overall size, shape, and maximum values correct might be good enough, especially when considering the impracticality of predicting the detailed wind field weeks or months into the future.

Recently developed transport and dispersion models that incorporate access to more complete weather information, to include numerical weather predictions/forecasts, have led some to consider their usage as real-time, or at least near-real-time, emergency response aids (Nasstrom et al. 2000). For such applications, the actual location of the hazardous material, not just its size and shape, is of critical importance. Thus, observations and predictions, paired in space and time, must be compared, and measures to assess such an examination should be identified.

Recent studies of the European Tracer Experiment (ETEX) have included a figure of merit in space (FMS), defined as the overlap area between the prediction and observation divided by the total predicted and observed areas, all above some threshold concentration. For ETEX, FMS was used to compare the predictions of several models (Mosca et al. 1998). At its core, FMS compares observations and predictions paired in space and time; however, the FMS method, when applied to actual field observations, requires interpolation over the set of sampled observations and, as such, can be subject to “artificial” sensitivities introduced by the type of interpolation technique used. Furthermore, FMS does not distinguish between regions of over- and underprediction (i.e., regions of overprediction and underprediction are “weighted” identically). For some applications, describing the model's accuracy separately in terms of regions of over- and underprediction would provide insight, in particular, because a model user may assess the risk associated with model over- and underpredictions very differently.

Model validation efforts are meant to examine the accuracy of a given model's predictions within an “operational” context. Such validation analysis should include a metric by which field trial observations and predictions can be compared within the context of the specific application—a measure of effectiveness (MOE). The ideal MOE would faithfully capture and portray model performance and would convey to the users a certain degree of confidence (high or low) in the model, taking into account their particular application.

In this paper, an MOE that allows for point-to-point comparisons of predictions and observations and can provide for straightforward communication of a model's relative worth (accuracy) to a user who is concerned with the size, shape, and location of the hazard is identified and described.

Description of two-dimensional MOE

Fundamental features of any comparison of hazard prediction model output to observations are the over- and underprediction regions. We define the false-negative region as the region in which a hazard is observed but not predicted and the false-positive region as the region in which a hazard is predicted but not observed. Figure 1 shows one possible interpretation of these regions—the observed and predicted areas in which a prescribed dosage is exceeded. This view can be extended to consider the marginal over- and underpredicted dosages, as will be discussed below. In any case, numerical estimates of the false-negative region (AFN), the false-positive region (AFP), and the overlap region (AOV) characterize this conceptual view.

The MOE that we shall introduce has two dimensions. The x axis corresponds to the ratio of the overlap region to the observed region, and the y axis corresponds to the ratio of the overlap region to the predicted region. When these mathematical definitions are algebraically rearranged, one recognizes that the x axis corresponds to 1 minus the false-negative fraction and the y axis corresponds to 1 minus the false-positive fraction,
$$\mathrm{MOE} = (x, y) = \left(\frac{A_{OV}}{A_{OB}},\; \frac{A_{OV}}{A_{PR}}\right) = \left(1 - \frac{A_{FN}}{A_{OB}},\; 1 - \frac{A_{FP}}{A_{PR}}\right), \qquad (1)$$
where AFN = region of false negative, AFP = region of false positive, AOV = region of overlap, APR = region of the prediction, and AOB = region of the observation.
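As a concrete illustration of Eq. (1), here is a minimal sketch in Python (the function and variable names are ours, chosen for illustration; this is not code from the paper):

```python
def moe(a_ov, a_fn, a_fp):
    """Two-dimensional MOE of Eq. (1) from the overlap (A_OV),
    false-negative (A_FN), and false-positive (A_FP) regions."""
    a_ob = a_ov + a_fn  # observed region = overlap + false negative
    a_pr = a_ov + a_fp  # predicted region = overlap + false positive
    return a_ov / a_ob, a_ov / a_pr

print(moe(3.0, 1.0, 2.0))  # (0.75, 0.6)
print(moe(1.0, 0.0, 0.0))  # (1.0, 1.0): perfect agreement
```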

Characteristics of the MOE

Consistent with the above algebraic rearrangement, Fig. 2 shows the region of false negative decreasing from left to right and the region of false positive decreasing from bottom to top. Figure 2 demonstrates some of the key characteristics of the two-dimensional (2D) MOE space. We begin with the (1, 1) point located at the upper-right corner. Here, both plumes overlap entirely (no false-negative or false-positive fraction), and, thus, the model would achieve perfect agreement with the field trial. Point (0, 0) signifies that there is no region of overlap, and, thus, the model disagrees completely with the field trial. The 2D MOE includes directional effects; that is, the prediction of the location of a hazard, not just the shape and size of the plume, is critical to obtaining a high MOE “score.”

Along the line x = 1, the prediction completely envelops the observation. Along the line y = 1, the observation completely envelops the prediction. The “purple” diagonal line represents the situation in which the prediction and the observation have identical “total” sizes [i.e., x = y implies from Eq. (1) that AOB = APR]. As one traverses this diagonal line from (1, 1) toward (0, 0), the fraction of overlap area between the predicted and observed plumes decreases.

Figure 3 suggests an additional interpretation of the 2D MOE. In this figure, the gold circular region represents the estimate of the MOE for some set of fictional model predictions and field trial observations. The point estimate, perhaps the vector mean value of several similar trials, would be found approximately at the center of this region, and the overall size of the region represents the uncertainty associated with the point estimate of the MOE.

If a second set of model predictions was compared with “model A,” several conclusions might be anticipated. The second model's MOE estimate might be found in the region shaded “orange” (lower left). This would imply that model A performs significantly better—both its false-positive and false-negative fractions are lower. As an alternative, the second model might lead to an estimate in the green region (upper right), an indication that model A is the poorer performer (for this set of field trial observations). Last, the new model predictions might lead to an MOE value that is located in one of the gray regions. The implication here is that a user would have to make a determination as to the trade-off between false positive and false negative before deciding which model was most appropriate for his or her specific application.

Computation of the MOE

Although the MOE has been introduced in terms of the three areas AFN, AFP, and AOV, it is not necessary to have actual physical areas to compute the components of the MOE. Rather, AFN, AFP, and AOV can be computed directly from the predictions and field trial observations paired in space and time. For the dosage-based MOE, the false-positive region is the dosage predicted in a region but not observed. Therefore, for AFP (as shown in Fig. 4a), one first considers all of the samplers at which the prediction is of greater value than the observation. Next, one sums the differences between the predicted and observed dosages at those samplers. Based on the samplers that contained observed values that were larger than the predicted values, one can similarly compute AFN. Then AOV is calculated by considering all samplers and summing the dosages associated with the minimum predicted or observed value. Restating the above mathematically, let
$$A_{OV}(i) = \min[d_p(i),\, d_o(i)], \qquad A_{FN}(i) = \max[d_o(i) - d_p(i),\, 0], \qquad A_{FP}(i) = \max[d_p(i) - d_o(i),\, 0], \qquad (2)$$

where $d_p(i)$ and $d_o(i)$ denote the predicted and observed dosages at sampler $i$;
then
$$A_{OV} = \sum_i A_{OV}(i), \qquad A_{FN} = \sum_i A_{FN}(i), \qquad A_{FP} = \sum_i A_{FP}(i). \qquad (3)$$

These estimates can be made on a linear scale, as shown in Fig. 4a for a Project Prairie Grass field trial, or on a logarithmic scale. If concentration information were available in place of dosages, an analogous procedure could be used to compute concentration-based MOE values.
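The dosage-based computation of Eqs. (2) and (3) reduces to a few array operations. A sketch in Python with numpy, assuming the predictions and observations are supplied as arrays paired by sampler:

```python
import numpy as np

def moe_components_dosage(pred, obs):
    """Dosage-based A_OV, A_FN, and A_FP of Eqs. (2)-(3)."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    a_ov = np.minimum(pred, obs).sum()        # overlap dosage at each sampler
    a_fn = np.maximum(obs - pred, 0.0).sum()  # dosage observed but not predicted
    a_fp = np.maximum(pred - obs, 0.0).sum()  # dosage predicted but not observed
    return a_ov, a_fn, a_fp
```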

In addition to the more general technique described above, one can compute an MOE value based on an identified threshold (e.g., concentration or dosage) of interest as notionally illustrated in Fig. 1. First, one considers the predictions and observations at each of the samplers. If both the prediction and observation are above the threshold, it is considered overlap at that sampler. If the prediction is below the threshold and the observation is above, a false negative is assessed at that sampler. In a similar way, a false positive is assessed when the prediction is above the threshold and the observation is not. Restating this mathematically, given a set of samplers with observations and predictions and a threshold T, one can partition this set into four subsets—OV, FN, FP, and BELOW:
$$\begin{aligned} \mathrm{OV} &= \{i : d_o(i) \ge T \ \text{and} \ d_p(i) \ge T\}, \\ \mathrm{FN} &= \{i : d_o(i) \ge T \ \text{and} \ d_p(i) < T\}, \\ \mathrm{FP} &= \{i : d_o(i) < T \ \text{and} \ d_p(i) \ge T\}, \\ \mathrm{BELOW} &= \{i : d_o(i) < T \ \text{and} \ d_p(i) < T\}. \end{aligned} \qquad (4)$$
Then
$$A_{OV} = N(\mathrm{OV}), \qquad A_{FN} = N(\mathrm{FN}), \qquad A_{FP} = N(\mathrm{FP}), \qquad (5)$$

where $N(\cdot)$ denotes the number of samplers in the subset.
It is possible to modify the above definition of AOV to include the number of elements (samplers) in the BELOW set. To be consistent with the conceptual view illustrated in Fig. 1, AOV was defined as in Eq. (5).

Figure 4b illustrates this procedure using a sulfur dioxide (SO2) dosage threshold of 60 mg s m−3 that was consistent with the sampler sensitivity for the Project Prairie Grass field trial (Barad 1958). This procedure is analogous to assessing an area-based MOE at a contour level (e.g., as illustrated conceptually in Fig. 1) using area interpolation of observations and predictions.
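For the threshold-based variant of Eqs. (4) and (5), the components are sampler counts; a sketch under the same assumptions as the dosage-based example above:

```python
import numpy as np

def moe_components_threshold(pred, obs, threshold):
    """Threshold-based A_OV, A_FN, and A_FP of Eqs. (4)-(5):
    each sampler falls into exactly one of OV, FN, FP, or BELOW."""
    p = np.asarray(pred) >= threshold
    o = np.asarray(obs) >= threshold
    a_ov = int(np.sum(p & o))    # both prediction and observation above
    a_fn = int(np.sum(~p & o))   # observed above, predicted below
    a_fp = int(np.sum(p & ~o))   # predicted above, observed below
    return a_ov, a_fn, a_fp      # samplers below in both form the BELOW set
```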

Relationship of MOE to standard statistical measures

This section describes and illustrates the mathematical relationships between the 2D MOE and several one-dimensional measures, including FMS, fractional bias, and a measure of scatter between observations and predictions.

Figure of merit in space

FMS is defined as the ratio of the intersection of the observed and predicted areas to the union of the observed and predicted areas, at a fixed time and above a defined threshold concentration (Mosca et al. 1998). In terms of the MOE nomenclature of false-positive, false-negative, and overlap regions, the FMS can be rewritten as shown in Eq. (6). Mosca et al. (1998) define FMS as a percentage and, therefore, multiply Eq. (6) by 100:
$$\mathrm{FMS} = \frac{A_{OV}}{A_{FN} + A_{OV} + A_{FP}}. \qquad (6)$$
We note that the right-hand side of Eq. (6) is actually a more general definition of FMS: it is not restricted to physical areas but can be applied to any region measure, for example, summed concentrations at all samplers. Now, from Eq. (1),
$$A_{OB} = \frac{A_{OV}}{x}, \qquad A_{PR} = \frac{A_{OV}}{y}. \qquad (7)$$
Therefore,
$$A_{FN} = A_{OB} - A_{OV} = A_{OV}\,\frac{1 - x}{x}, \qquad A_{FP} = A_{PR} - A_{OV} = A_{OV}\,\frac{1 - y}{y}. \qquad (8)$$
These definitions of AFN and AFP are then substituted into Eq. (6), and, following algebraic rearrangement, one obtains
$$\mathrm{FMS} = \frac{xy}{x + y - xy}. \qquad (9)$$
Some users of hazardous material transport and dispersion models might consider false positives and false negatives very differently from each other. For many applications, false positives would be much more acceptable to the user than false negatives (which could result in decisions that directly lead to death or injury). Equation (10) is an example of a user scoring function that takes the above-mentioned risk tolerance into consideration. In basic terms, this equation describes a modified FMS that includes coefficients CFN and CFP to weight the false-negative and false-positive regions, respectively. We refer to this notional user scoring function as the risk-weighted FMS (RWFMS):
$$\mathrm{RWFMS} = \frac{A_{OV}}{C_{FN}A_{FN} + A_{OV} + C_{FP}A_{FP}}, \qquad (10)$$
where CFN and CFP are greater than 0.

It may be true that, for some applications (e.g., technical model validation), the weightings for false negatives and false positives are considered irrelevant or are set equal (CFN = CFP). As developed here, the implicit coefficient associated with AOV is 1.0. Therefore, the notion of equal weights for AFN and AFP (i.e., CFN = CFP) is insufficient for the complete specification of RWFMS. That is, the precise RWFMS values will depend on the values chosen for CFN and CFP and not just on their ratio.

Algebraic relationships similar to those discussed for FMS can be applied to RWFMS to yield the following:
$$\mathrm{RWFMS} = \frac{xy}{C_{FN}\,y(1 - x) + xy + C_{FP}\,x(1 - y)}. \qquad (11)$$

Figure 5a shows contours of RWFMS (i.e., isolines) in the 2D MOE space for CFN = CFP = 1. Figure 5b similarly illustrates the case in which AFN is weighted by a factor of 10 relative to AFP and a factor of 5 relative to AOV; that is, CFN = 5 and CFP = 0.5.
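Equation (11) makes RWFMS simple to evaluate anywhere in the MOE space. A short sketch (Python; names are ours), with a check that the unweighted case reduces to the FMS of Eq. (9):

```python
def rwfms(x, y, c_fn=1.0, c_fp=1.0):
    """Risk-weighted FMS of Eqs. (10)-(11) in MOE coordinates (x, y)."""
    return x * y / (c_fn * y * (1.0 - x) + x * y + c_fp * x * (1.0 - y))

def fms(x, y):
    """Figure of merit in space, Eq. (9)."""
    return x * y / (x + y - x * y)

print(rwfms(0.5, 0.5))                    # 0.333...: equals FMS when c_fn = c_fp = 1
print(fms(0.5, 0.5))                      # 0.333...
print(rwfms(0.5, 0.5, c_fn=5, c_fp=0.5))  # 0.154: false negatives penalized
```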

The contours of RWFMS can be used as the basis for scoring the performance of a model and for coloring the MOE space according to the RWFMS score. Figure 6 provides an example of user coloring of the MOE space, based on RWFMS for several values of CFN and CFP (e.g., red is “bad” and green is “good”). At an RWFMS of 0.0, this coloring scheme incorporates pure red. As the user-defined RWFMS increases from 0.0 to 0.50, the intensity of green increases linearly. For instance, at an RWFMS value of 0.5, there are equal intensities of red and green (hence, yellow). In a similar way, for RWFMS values between 0.50 and 1.0, the red intensity is reduced linearly with increasing RWFMS value. At an RWFMS value of 1.0, the coloring used is pure green.
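A sketch of that red-to-green mapping (Python; RGB components in [0, 1], following the linear ramps described above, with the blue component assumed to be zero):

```python
def rwfms_color(score):
    """Map an RWFMS score in [0, 1] to an (R, G, B) triple: pure red at 0,
    equal red and green (yellow) at 0.5, pure green at 1."""
    green = min(score / 0.5, 1.0)                       # ramps up on [0, 0.5]
    red = 1.0 if score <= 0.5 else (1.0 - score) / 0.5  # ramps down on [0.5, 1]
    return (red, green, 0.0)

for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, rwfms_color(s))
```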

Fractional bias

A hazardous material transport and dispersion model might be applied to problems for which the actual location of the hazard or direction of the plume is of no particular importance. For example, such a model might be used to study potential future outcomes of an accidental or intentional release. In these cases, the actual weather (e.g., wind speed and direction) in the far future, associated with the planning, cannot be known with any certainty. For these applications it is desirable to have a scoring function that simply compares the sizes of the predicted and observed regions. In essence, model users in these cases would want a model that minimizes the overall model bias.

Fractional bias (FB), defined below as in ASTM (2000), has been used to evaluate transport and dispersion models for such circumstances:
$$\mathrm{FB} = \frac{\overline{C}_p - \overline{C}_o}{0.5\,(\overline{C}_p + \overline{C}_o)}, \qquad (12)$$
where $C$ is the quantity of interest (e.g., dosage), the subscripts $p$ and $o$ denote the model prediction and the observation, respectively, and an overbar (e.g., $\overline{C}$) denotes the average. To begin to explore the relationship between FB and the MOE, we recall that the 2D MOE is defined as
$$\mathrm{MOE} = (x, y) = \left(\frac{A_{OV}}{A_{OB}},\; \frac{A_{OV}}{A_{PR}}\right). \qquad (13)$$
Next, consider the ratio
$$\frac{x}{y} = \frac{A_{OV}/A_{OB}}{A_{OV}/A_{PR}} = \frac{A_{PR}}{A_{OB}}. \qquad (14)$$
We then consider points in the 2D MOE space that lie on the diagonal line, that is, the line y = x. Then,
$$x = y \;\Longrightarrow\; A_{PR} = A_{OB}. \qquad (15)$$
Therefore, this diagonal in the 2D MOE space consists of the points that incorporate “equal size” predictions and observations, that is, no bias, or FB = 0. Let us assume a hypothetical requirement that APR and AOB must be within a factor of s of each other, with s > 1. This requirement is stated mathematically by requiring
$$\frac{1}{s} \le \frac{A_{PR}}{A_{OB}} = \frac{x}{y} \le s. \qquad (16)$$

Figure 7 shows isolines of this FB figure of merit (FBFOM), in MOE space, for various values of the parameter s.

A coloring scheme (red to green, as discussed previously) for the 2D MOE space, using FBFOM, can be formulated (Warner et al. 2001b), with the results shown in Fig. 8.

One can also relate the fractional bias to the components of the 2D MOE as follows. From Eq. (12) we note that
$$\mathrm{FB} = \frac{\dfrac{1}{n}\sum_{i=1}^{n} C_p^{(i)} - \dfrac{1}{n}\sum_{i=1}^{n} C_o^{(i)}}{0.5\left[\dfrac{1}{n}\sum_{i=1}^{n} C_p^{(i)} + \dfrac{1}{n}\sum_{i=1}^{n} C_o^{(i)}\right]}, \qquad (17)$$
where $n$ is the number of data points used in the comparisons, $C_o^{(i)}$ refers to the $i$th observed concentration, and $C_p^{(i)}$ similarly refers to the $i$th predicted concentration. Then, the numerator (dropping the factors of $1/n$, which cancel between numerator and denominator) can be rearranged as in Eq. (18):
$$\sum_{i=1}^{n} C_p^{(i)} - \sum_{i=1}^{n} C_o^{(i)} = (A_{FP} + A_{OV}) - (A_{FN} + A_{OV}) = A_{FP} - A_{FN}. \qquad (18)$$
In a similar way, the denominator in Eq. (17) can be reduced to
$$0.5\left[\sum_{i=1}^{n} C_p^{(i)} + \sum_{i=1}^{n} C_o^{(i)}\right] = 0.5\,(A_{FN} + A_{FP} + 2A_{OV}). \qquad (19)$$
Thus, Eq. (20) results:
$$\mathrm{FB} = \frac{2\,(A_{FP} - A_{FN})}{A_{FN} + A_{FP} + 2A_{OV}}. \qquad (20)$$
Substituting for AFP and AFN from Eq. (8) into Eq. (20) leads to
$$\mathrm{FB} = \frac{2\left[A_{OV}\dfrac{1 - y}{y} - A_{OV}\dfrac{1 - x}{x}\right]}{A_{OV}\dfrac{1 - x}{x} + A_{OV}\dfrac{1 - y}{y} + 2A_{OV}}, \qquad (21)$$
and, after algebraic simplification,
$$\mathrm{FB} = \frac{2\,(x - y)}{x + y}. \qquad (22)$$
Further rearrangement of Eq. (22) yields
$$y = \left(\frac{2 - \mathrm{FB}}{2 + \mathrm{FB}}\right)x, \qquad (23)$$
which shows that isolines of constant FB in the 2D MOE space are straight rays through the origin (Fig. 7) with slope, m:
$$m = \frac{2 - \mathrm{FB}}{2 + \mathrm{FB}}. \qquad (24)$$
Within the context of the FB figure of merit, for FB ≥ 0, m = 1/s from Eq. (16), and, for FB < 0, m = s.
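A small numerical check (Python; names are ours) of Eqs. (22) and (24): every MOE point (x, y) lies on the constant-FB ray through the origin with slope m:

```python
def fb_from_moe(x, y):
    """Fractional bias in MOE coordinates, Eq. (22)."""
    return 2.0 * (x - y) / (x + y)

def fb_isoline_slope(fb):
    """Slope of the constant-FB ray through the origin, Eq. (24)."""
    return (2.0 - fb) / (2.0 + fb)

x, y = 0.4, 0.6
m = fb_isoline_slope(fb_from_moe(x, y))
print(m)                       # 1.5
print(abs(y - m * x) < 1e-12)  # True: (x, y) lies on the ray y = m x
```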

Measure of scatter

Quite often, measures such as mean-square error or normalized-mean-square error are used to characterize the differences between observed and predicted quantities—the scatter, so to speak. Similar to the way in which bias between a prediction and observation can be portrayed in the MOE space, as discussed above, it is desirable to have a measure of scatter that can be likewise portrayed. For this purpose, we define a specialized version of a measure of scatter—normalized absolute difference (NAD)—between observations and predictions:
$$\mathrm{NAD} = \frac{\sum_{i=1}^{n}\left|C_o^{(i)} - C_p^{(i)}\right|}{\sum_{i=1}^{n}\left(C_o^{(i)} + C_p^{(i)}\right)}. \qquad (25)$$
As with FB, one can express NAD in terms of the false-negative, false-positive, and overlap regions, and, after substitution and algebraic simplification using Eq. (8) [similar to Eqs. (18) and (19)], NAD is related to the MOE components as follows:
$$\mathrm{NAD} = \frac{A_{FN} + A_{FP}}{A_{FN} + A_{FP} + 2A_{OV}}. \qquad (26)$$
After additional rearrangement, one obtains
$$\mathrm{NAD} = \frac{x + y - 2xy}{x + y}. \qquad (27)$$
Also, NAD is related to FMS, Eq. (9), as follows:
$$\mathrm{FMS} = \frac{1 - \mathrm{NAD}}{1 + \mathrm{NAD}}. \qquad (28)$$
Isolines of NAD in the 2D MOE space are shown in Fig. 9.
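Equations (27) and (28) can be verified numerically; a sketch (Python; names are ours):

```python
def nad_from_moe(x, y):
    """Normalized absolute difference in MOE coordinates, Eq. (27)."""
    return (x + y - 2.0 * x * y) / (x + y)

def fms_from_moe(x, y):
    """Figure of merit in space in MOE coordinates, Eq. (9)."""
    return x * y / (x + y - x * y)

x, y = 0.7, 0.3
nad = nad_from_moe(x, y)
# Eq. (28): FMS = (1 - NAD) / (1 + NAD)
print(abs(fms_from_moe(x, y) - (1.0 - nad) / (1.0 + nad)) < 1e-12)  # True
```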

Example applications of the 2D MOE

This section provides a few example applications of the MOE to predictions of short-range field observations. Comparisons of predictions from the U.S. Defense Threat Reduction Agency's Hazard Prediction and Assessment Capability (HPAC; SAIC 2001), which includes the second-order closure-integrated puff (SCIPUFF) model as its transport and dispersion engine, and the U.S. Department of Energy's National Atmospheric Release Advisory Center (NARAC; Nasstrom et al. 2000) model with the Project Prairie Grass field observations (Warner et al. 2001c) and with controlled, computer-simulated releases (Warner et al. 2001a) have been completed using the MOE.

Model comparisons with Project Prairie Grass

Project Prairie Grass field trials were conducted during the summer of 1956 in north-central Nebraska near the town of O'Neill (Barad 1958). The primary objective of Project Prairie Grass was to determine the rate of diffusion of a neutrally buoyant tracer gas as a function of meteorological conditions. These experiments involved continuous 10-min releases of SO2 from a near-surface point source. Downwind SO2 concentrations were sampled along five concentric, semicircular arcs located 50, 100, 200, 400, and 800 m away from the gas source. The samplers were arranged at 2° intervals along the 50-, 100-, 200-, and 400-m arcs (91 samplers per arc) and at 1° intervals for the 800-m arc (i.e., 181 samplers along the 800-m arc). The Project Prairie Grass experiments represent a relatively well-defined, well-known, and classic standard for the evaluation of transport and dispersion models. As such, these experiments are ideal for the initial demonstration of the MOE concepts.

A total of 70 releases were conducted during the Project Prairie Grass experiment. Of these 70 releases, 19 were eliminated from further consideration because crucial wind or sampler concentration information was missing (14 releases), because the source height was different and no turbulence fit at that height was reported (4 releases), or, in one case, because the maximum observed concentration was extremely small (<0.5 mg m−3). The 51 trials that were included in this study were numbered 5, 7–28, 32–46, 48–51, and 54–62 in the original Project Prairie Grass report (Barad 1958). The detailed protocol for the computation of SCIPUFF and NARAC predictions of the Project Prairie Grass releases is discussed further in Warner et al. (2001e).

Figure 10 displays SCIPUFF MOE values for the five Project Prairie Grass arcs. The point estimate for the MOE value for each arc is represented by the vector average obtained from the individual MOEs calculated for the 51 Project Prairie Grass releases that were examined and lies approximately at the center of the given colored cluster. The colored clusters correspond to the approximate 95% confidence region associated with the MOE point estimate. These approximate confidence regions were computed using the bootstrap percentile method and 10 000 bootstrap samples (Efron and Tibshirani 1993). For example, MOE values were computed at each arc for each of the 51 releases that were examined. Resampling with replacement (“bootstrap”) of the 51 MOE vectors was done; that is, 10 000 sets of 51 vector resamples were created. From these sets, 10 000 vector averages (i.e., MOE values) were calculated and used to estimate the confidence region associated with the original MOE point estimate.
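A minimal sketch of that resampling procedure (Python with numpy; the array shapes and names are ours, and the per-trial MOE vectors below are fictional):

```python
import numpy as np

def bootstrap_moe_cloud(moe_vectors, n_boot=10_000, seed=0):
    """Bootstrap percentile method (Efron and Tibshirani 1993): resample
    the per-trial MOE vectors with replacement and return the cloud of
    resampled vector means used to trace the confidence region."""
    rng = np.random.default_rng(seed)
    moe_vectors = np.asarray(moe_vectors)       # shape (n_trials, 2)
    n = len(moe_vectors)
    idx = rng.integers(0, n, size=(n_boot, n))  # n_boot resamples of size n
    return moe_vectors[idx].mean(axis=1)        # one vector mean per resample

fictional = np.random.default_rng(1).uniform(0.2, 0.8, size=(51, 2))
cloud = bootstrap_moe_cloud(fictional)
print(cloud.shape)  # (10000, 2)
```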

For the MOE based on total dosage, model performance degrades at the longer ranges (Fig. 10a). This result is statistically significant, because the 800-m arc (red) MOE confidence region is completely separated from the 50-m arc (dark blue) MOE confidence region. For the MOEs based on a dosage threshold of 60 mg s m−3, similar results, albeit with smaller differences, can be seen (Fig. 10b).

MOE values for individual trials were created by combining the arc results. To create values that would be more closely related to actual area-based measures, the MOE computations were weighted by the intersampler distances for the individual arcs. First, each sampler was associated with a short line segment centered at the sampler location and having the same length as the intersampler distance for that arc. The intersampler spacing was computed as rθ, where r is the distance to the arc (e.g., 800 m) and θ is the angular separation between samplers (in radians). This procedure led to the following intersampler distances, rounded to two decimal places (in meters): 1.75, 3.49, 6.98, 13.96, and 13.96 for the 50-, 100-, 200-, 400-, and 800-m arcs, respectively. Next, the summed dosages used to define AOV, AFN, and AFP for each individual arc (or, for some specified threshold, the numbers of samplers used to define AOV, AFN, and AFP) were multiplied by the corresponding intersampler distance. Adding the values for the five arcs together forms the AOV, AFN, and AFP estimates for the entire (all arcs) trial. In this way, contributions from each arc to an overall “area based” MOE were estimated. This weighting scheme should not be considered to be a general technique, but rather to be a natural approach for this specific arc-based sampling space. For a densely sampled field trial like Project Prairie Grass, one can also consider area interpolation as a method of “sampler weighting” that would lead to area-based MOE values.
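The intersampler weights themselves are a one-line computation; a sketch (Python with numpy) reproducing the distances quoted above:

```python
import numpy as np

# Arc radii (m) and angular sampler separations (2 deg on the 50-400-m
# arcs, 1 deg on the 800-m arc) for Project Prairie Grass:
radii = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
theta = np.radians([2.0, 2.0, 2.0, 2.0, 1.0])
weights = radii * theta  # intersampler distance r*theta for each arc
print(np.round(weights, 2))  # [ 1.75  3.49  6.98 13.96 13.96]
```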

Figures 11a and 11b show MOE 95% confidence regions for the 51 Project Prairie Grass trials, based on total dosage and a dosage threshold of 60 mg s m−3, respectively. Figures 11c and 11d show MOE 95% confidence regions for the same 51 Project Prairie Grass trials as a function of stability category grouping. Stability category assignments, which were previously developed for Project Prairie Grass by Irwin and Rosu (1998), were used to group trials with similar characteristics in terms of atmospheric stability. For this display, stability category assignments of 1, 2, and 3 were considered “unstable”; assignments of 5, 6, and 7 were considered “stable”; and trials assigned 4 were considered “neutral.”

For both types of MOE values—those based on total dosage and those based on a dosage threshold—Fig. 11 suggests that the model performed best during the trials associated with the more unstable atmospheric conditions (red). In terms of total dosage (Figs. 11a,c), the false-negative fraction does not change much across stability conditions; however, the false-positive fraction steadily increases from unstable to neutral to stable. For the dosage threshold-based MOEs (Figs. 11b,d), predictions of trials conducted under stable conditions led to increased false-positive and decreased false-negative fractions relative to the other trials. That is, the model overpredicted the region above the threshold and, in this sense, might be considered somewhat conservative. For the neutral trials, the false-positive fraction was minimized, but at the expense of an increased false-negative fraction relative to the other stability conditions.

Probabilistic predictions

To address inherent uncertainties associated with real observations, some transport and dispersion models provide predictions of ensemble mean values, as well as probabilistic-based predictions. In this view, observations of concentration, for example, are seen as individual realizations from some population—the ensemble (Venkatram 1988; ASTM 2000). The SCIPUFF model can provide probabilistic predictions of hazardous material transport and dispersion (Sykes et al. 1996). This section illustrates the application of the user-oriented MOE to assess probabilistic-based prediction outputs.

Figure 12 provides a view of SCIPUFF's capability to produce probabilistic outputs. The predicted contours are associated with the probability that a dosage of 60 mg s m−3 is exceeded. For example, the 0.1 probability contour, the outer light purple contour, is meant to encompass the region in which 9 of 10 plume realizations will lie. Therefore, by this nomenclature, the smaller probability values, like 0.1, lead to notionally fatter plumes.

Figure 13 shows approximate 95% confidence regions for MOE estimates based on probabilistic prediction outputs and the mean-value prediction (yellow cluster) for the 51 Project Prairie Grass field trials that were examined (Warner et al. 2001b). The 0.01 probability values are always associated with the widest predictions (most conservative), and the 0.999 probability values are always associated with the narrowest predictions. Figures 13a and 13b show the same MOE confidence regions superimposed on two notional user colorings—one that weights the false-positive and false-negative fractions equally and one that weights the false-negative fraction 10 times as heavily as the false-positive fraction.

Based on the RWFMS user colorings described in Fig. 13, the following notional evaluation is possible. For equal weighting of the false-negative, false-positive, and overlap fractions (i.e., CFN = CFP = 1), the probabilistic predictions in the range between 0.01 and 0.90, as well as the mean value predictions, lead to acceptable (i.e., within the green user-colored space) model performance. For the conservative CFN = 5 and CFP = 0.5 user coloring, the 0.01 probability prediction provides the only acceptable performance (of those examined in this study). This example illustrates how the MOE, in conjunction with an agreed-upon scoring function (or coloring), can be used to tune a model parameter (the probability parameter in this case) to meet a user's requirement.

MOEs assessed in dosage regimes of significance to humans

For many applications, users of transport and dispersion model predictions are more concerned with the effects associated with a hazardous material release and less concerned with specific amounts of material in particular locations. In essence, some model users will be concerned with numbers of people affected, or potentially affected, by a release. One can examine observations and predictions by considering the lethality or effects of a presumed agent. For example, a standard probit curve (or some other model) might be used to assess the effects of a given exposure (Finney 1971). We note, however, that lethality and effects models may be subject to poorly known thresholds and probit slopes (assuming a probit model). There may be uncertainties in the knowledge of the source purity and unknown correction factors for applying data to a general population versus military personnel. Determining actual casualty levels for a specific scenario would also need to include factors such as the means of exposure (respiratory or percutaneous) and other parameters relating to that exposure (e.g., breathing rate, the degree of population sheltering, and the effect of clothing). Such a lethality effects model would allow dosages to be converted to a fractional population effect, that is, the fraction of an exposed population, at that dosage level, that would be expected to become a casualty. Assume that a given hazardous material has LCt50 = l and probit slope = α (for our notional probit effects model), where at LCt50, by definition, one-half of the exposed population would be expected to die. Next, for any dosage (or time-integrated concentration) d, let LE(d) be the fraction of the exposed population that dies. Then, for any sampler i, consider the marginal contributions:
$$\begin{aligned} A_{OV}(i) &= \min\{\mathrm{LE}[d_p(i)],\, \mathrm{LE}[d_o(i)]\}, \\ A_{FN}(i) &= \max\{\mathrm{LE}[d_o(i)] - \mathrm{LE}[d_p(i)],\, 0\}, \\ A_{FP}(i) &= \max\{\mathrm{LE}[d_p(i)] - \mathrm{LE}[d_o(i)],\, 0\}, \end{aligned} \qquad (29)$$

where $d_p(i)$ and $d_o(i)$ are the predicted and observed dosages at sampler $i$.
One then obtains AOV, AFN, and AFP by summing the marginal contributions of Eq. (29).
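A sketch of this effects-based computation (Python with numpy, using scipy for the normal CDF). The probit parameterization below, with the slope expressed in probits per decade of dosage so that LE(LCt50) = 0.5, is our assumption for illustration; the text specifies only an LCt50 (or OE50) and a probit slope:

```python
import numpy as np
from scipy.stats import norm

def le(dosage, lct50, probit_slope):
    """Fraction of the exposed population affected at the given dosage
    under an assumed log10-probit dose-response model."""
    d = np.maximum(np.asarray(dosage, dtype=float), 1e-300)  # guard log(0)
    return norm.cdf(probit_slope * np.log10(d / lct50))

def moe_components_effects(pred, obs, lct50, probit_slope):
    """Marginal contributions of Eq. (29), summed over samplers."""
    le_p = le(pred, lct50, probit_slope)
    le_o = le(obs, lct50, probit_slope)
    a_ov = np.minimum(le_p, le_o).sum()
    a_fn = np.maximum(le_o - le_p, 0.0).sum()
    a_fp = np.maximum(le_p - le_o, 0.0).sum()
    return a_ov, a_fn, a_fp
```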

For the notional effects model shown in Fig. 14, small differences between actual and predicted dosages near LCt50 can have a dramatic impact on the outcome. On the other hand, for larger predicted and observed dosages (well beyond LCt50) or for smaller predicted and observed dosages (well below LCt50), substantial relative differences do not necessarily have much impact in terms of human effects. This technique allows one to assess the MOE in the regime that is of particular interest to the user and application. For example, if a model predicts a dosage of 10−15 mg s m−3 and the observation is really 10−12 mg s m−3, and neither level has any impact on humans, one could question the significance of this “3-orders-of-magnitude difference.”

By incorporating the lethality of the release, as outlined above, one can convert the x and y axes of the MOE to “fraction of the population inadvertently exposed” (false negative) and “fraction of the population unnecessarily warned” (false positive), respectively. We illustrate this with the Project Prairie Grass data and predictions. For these short-range, densely sampled observations, interpolation between samplers is relatively straightforward—we used a Delaunay triangulation procedure (Warner et al. 2001b). Therefore, one can compute MOE values in terms of actual areas, that is, false-positive, false-negative, and overlap areas. Next, one can assume an underlying population distribution—we chose spatially uniform for this illustration. At this point then, the x and y axes of our MOE space are converted to the fraction of the population inadvertently exposed to some effects level of interest and the fraction of the population unnecessarily warned.

Figure 15a presents an overlay of MOE estimates for the comparisons of SCIPUFF and NARAC predictions of the Project Prairie Grass field trials on a notional user-coloring scheme. Shown are MOE estimates for the NARAC and SCIPUFF predictions based on ocular and lethal effects for some notional agent. Sulfur dioxide was the agent actually released during the Project Prairie Grass field trials, but for this demonstration we assumed a nerve agent–like material in its place. To generate the MOE values of Fig. 15, a probit model was used with “ocular” defined as ocular effects of OE50 = 30 mg s m−3 with a probit slope = 12 and “lethal” defined as LCt50 = 4000 mg s m−3 with a probit slope of 12. For this situation (Fig. 15a), assuming green is acceptable, one might conclude that both models and both levels of effects, ocular and lethal, are satisfactory, at least at short range. Of course, at longer ranges, one imagines that MOE performance based on ocular effects might degrade substantially, because these ocular effects are expected to extend to ranges where, for example, uncertainties in wind field direction and speed are much more significant. Figure 15b presents the four SCIPUFF and NARAC comparative MOE estimates from our earlier discussion where, in this case, the FBFOM user coloring is applied. Here only models with MOE values near the diagonal are considered acceptable, because it is on the diagonal that the observed and predicted regions are of identical size. In this case, at least for ocular effects, our notional user might prefer the SCIPUFF predictions.

Testing for differences between two-dimensional MOE values

Figure 15 also shows that the 95% confidence regions for the corresponding NARAC and SCIPUFF MOE values are completely separate, suggesting statistically significant differences. These differences can be further quantified by computing p values associated with an appropriate hypothesis test, as discussed below. First, the 51 individual MOE vector differences between various model predictions are computed; for example, the vector differences between the NARAC (ocular) and the SCIPUFF (ocular) MOE values are calculated. If two sets of model predictions were identical, then all 51 vector differences would be (0, 0). For this study, the null hypothesis is that the two models being compared are equivalent. Therefore, any MOE vector difference is expected to be equally likely to reside in any of the four quadrants, defined as positive x, positive y (+, +); positive x, negative y (+, −); negative x, positive y (−, +); and negative x, negative y (−, −). Given this null hypothesis, one tests how unlikely the observed result is by simulating the appropriate permutations in the following way. First, the quadrant with the most MOE vector differences is identified, and the number of differences in that quadrant is noted. For example, for SCIPUFF (lethal) minus NARAC (lethal), 42 of 51 vector differences occupy the “−, +” quadrant, consistent with Fig. 15 and the notion that the SCIPUFF (lethal) predictions resulted in a larger false-negative and smaller false-positive fraction than the NARAC (lethal) predictions. Next, we simulate results for equivalent model predictions by creating 100 000 samples of 51 drawn from the uniform integer distribution on [1, 4], that is, a multinomial distribution with equal likelihood for each of the four outcomes. These 100 000 samples of 51 correspond to our simulated vector differences for equivalent models. For each sample of 51, the numbers of “1”s, “2”s, “3”s, and “4”s that were randomly selected are determined. The maximum observed number (e.g., 42 from above) is then compared with the corresponding maximum number associated with each simulated sample. The number of simulated samples that contain a maximum value that is greater than or equal to the observed maximum is determined and is denoted N. The estimated p value is then computed as N divided by 100 000. For suitably low p values, one might reject the null hypothesis of equivalence between the values being compared (Sprent and Smeeton 2001). The two-dimensional, four-quadrant hypothesis test described here is a natural extension of the one-dimensional sign test (Sprent 1998). When comparing SCIPUFF and NARAC MOE values based on ocular and lethal effects, the resulting maximum numbers for MOE vector differences are 42 and 36, respectively, and reside in the (−, +) quadrant. In both cases, the resulting p values are less than 1 × 10−5, strongly supporting the notion that the differences between SCIPUFF and NARAC shown in Fig. 15 are statistically significant.
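A sketch of that Monte Carlo test (Python with numpy; `diffs` is an assumed (n, 2) array of per-trial MOE vector differences between two models):

```python
import numpy as np

def four_quadrant_p_value(diffs, n_sim=100_000, seed=0):
    """Two-dimensional, four-quadrant sign test described in the text.
    Under the null hypothesis of equivalent models, each vector difference
    is equally likely to fall in any of the four quadrants."""
    diffs = np.asarray(diffs)
    n = len(diffs)
    # Encode each difference's quadrant as an integer 0-3 (signs of x, y):
    quadrant = 2 * (diffs[:, 0] > 0) + (diffs[:, 1] > 0)
    observed_max = np.bincount(quadrant, minlength=4).max()
    rng = np.random.default_rng(seed)
    sims = rng.integers(0, 4, size=(n_sim, n))  # equal-likelihood quadrants
    sim_max = np.array([np.bincount(s, minlength=4).max() for s in sims])
    return np.mean(sim_max >= observed_max)     # fraction at least as extreme
```

For an input with 42 of 51 differences in a single quadrant, none of the simulated samples reaches the observed maximum, consistent with the p values below 1 × 10−5 reported above.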

Conclusions

A two-dimensional MOE has been proposed for the evaluation of transport and dispersion models. In model-to-field-trial comparisons this user-oriented measure of effectiveness has consistently resolved important model performance features. Statistically significant resolution of model performance differences as a function of downwind range and meteorological stability category grouping was described for SCIPUFF predictions. Also, differences between models—SCIPUFF and NARAC—could be easily discerned and characterized with the MOE.

By applying a lethality/effects “filter,” one can compute MOE values that relate the goodness of a prediction for presumed agents of greatly varying toxicity (e.g., ocular vs lethal effects). A quantitative method that can aid in the communication of a user's risk tolerance has been described—the coloring of the two-dimensional MOE space in terms of a user's potential scoring function.

The above features may make this MOE of particular value with respect to validation studies. For instance, the specific application and user will dictate the effects level of interest and the associated risk tolerance (therefore, describing a user-coloring scheme). This type of user-oriented MOE, when properly employed, necessarily involves the user and a specific application. This early involvement of the potential user is often a critical missing element during a validation effort.

Future related efforts will focus on expanding the application of this MOE to predictions of longer-range observations (e.g., the European Tracer Experiment; Mosca et al. 1998), observations within an urban environment (URBAN 2000; Allwine et al. 2002), and interior building releases (Platt et al. 2002) and as a diagnostic aid for model intercomparisons (Warner et al. 2001d).

Acknowledgments

This research is sponsored by the Defense Threat Reduction Agency, with Dr. Allan Reiter as project monitor. The authors thank Drs. Steven R. Hanna and Joseph C. Chang of George Mason University for numerous helpful discussions. The views expressed in this paper are solely those of the authors. No official endorsement by the Department of Defense is intended or should be inferred.

REFERENCES

  • Allwine, K. J., , J. H. Shinn, , G. E. Streit, , K. L. Clawson, , and M. Brown. 2002. Overview of URBAN 2000: A multiscale field study of dispersion through an urban environment. Bull. Amer. Meteor. Soc. 83:521536.

    • Search Google Scholar
    • Export Citation
  • ASTM 2000. Standard guide for statistical evaluation of atmospheric dispersion model performance. American Society for Testing and Materials, Designation D 6589-00, 17 pp. [Available from ASTM, 100 Barr Harbor Dr., PO Box C700, West Conshohocken, PA, 19428.].

    • Search Google Scholar
    • Export Citation
  • Barad, M. L. Ed.,. 1958. Project Prairie Grass, a field program in diffusion. Vols. I and II. Geophysical Res. Papers 59, Rep. AFCRC-TR-58-235, 439 pp.

    • Search Google Scholar
    • Export Citation
  • Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, No. 57, Chapman and Hall, 436 pp.

    • Search Google Scholar
    • Export Citation
  • Finney, D. J. 1971. Probit Analysis. Cambridge University Press, 333 pp.

  • Hanna, S. R. 1988. Air quality model evaluation and uncertainty. J. Air Pollut. Control Assoc. 38:406412.

  • Hanna, S. R., , J. C. Chang, , and D. G. Strimaitis. 1993. Hazardous model evaluation with field observations. Atmos. Environ. 27A:22652285.

    • Search Google Scholar
    • Export Citation
  • Irwin, J. S., and M-R. Rosu. 1998. Comments on draft practices for statistical evaluation of atmospheric dispersion models. Preprints, 10th Joint Conf. on the Applications of Air Pollution Meteorology, Phoenix, AZ, Amer. Meteor. Soc., 6–10.

    • Search Google Scholar
    • Export Citation
  • Mosca, S., , G. Graziani, , W. Klug, , R. Bellasio, , and R. Bianconi. 1998. A statistical methodology for the evaluation of long-range dispersion models: An application to the ETEX exercise. Atmos. Environ. 32:43074324.

    • Search Google Scholar
    • Export Citation
  • Nasstrom, J. S., , G. Sugiyama, , J. M. Leone Jr., , and D. L. Ermak. 2000. A real-time atmospheric dispersion modeling system. Preprints, 11th Joint Conf. on the Applications of Air Pollution Meteorology, Long Beach, CA, Amer. Meteor. Soc., 84–89.

    • Search Google Scholar
    • Export Citation
  • Petty, R. 2000. User requirements for dispersion modeling. Proc. Workshop on Multiscale Atmospheric Dispersion Within the Federal Community, Silver Spring, MD, Office of the Federal Coordinator for Meteorological Services and Supporting Research, 1-1–1-3.

    • Search Google Scholar
    • Export Citation
  • Platt, N., , S. Warner, , and J. F. Heagy. 2002. Application of two-dimensional user-oriented measure of effectiveness to interior building releases. Proc. Sixth Annual George Mason University Transport and Dispersion Modeling Workshop, Fairfax, VA, Defense Threat Reduction Agency, CD-ROM. [Available from School of Computational Sciences, MS 5C3, 103 Science & Technology I, George Mason University, 4400 University Drive, Fairfax, VA 22030-4444.].

    • Search Google Scholar
    • Export Citation
  • SAIC 2001. The hazard prediction and assessment capability (HPAC) user's guide version 4.0. Science Applications International Corporation (SAIC) for Defense Threat Reduction Agency (DTRA) HPAC-UGUIDE-02-U-RAC0, 598 pp. [Available from Defense Threat Reduction Agency, 6801 Telegraph Road, Alexandria, VA, 22310-3398.].

    • Search Google Scholar
    • Export Citation
  • Sprent, P. 1998. Data Driven Statistical Methods. Chapman and Hall, 406 pp.

  • Sprent, P., and N. C. Smeeton. 2001. Applied Nonparametric Statistical Methods. 3d ed. Chapman and Hall, 461 pp.

  • Sykes, R. I., S. F. Parker, and R. S. Gabruk. 1996. SCIPUFF—A generalized hazard prediction model. Preprints, Ninth Joint Conf. on the Applications of Air Pollution Meteorology, Atlanta, GA, Amer. Meteor. Soc., 184–188.

  • Venkatram, A. 1988. Inherent uncertainty in air quality modeling. Atmos. Environ. 22:1221–1227.

  • Warner, S., and Coauthors. 2001a. Evaluation of transport and dispersion models: A controlled comparison of Hazard Prediction and Assessment Capability (HPAC) and National Atmospheric Release Advisory Center (NARAC) predictions. Institute for Defense Analyses Paper P-3555, 251 pp. [Available from Steve Warner, Institute for Defense Analyses, 4850 Mark Center Drive, Alexandria, VA 22311-1882.]

  • Warner, S., N. Platt, and J. F. Heagy. 2001b. Application of user-oriented measure of effectiveness to HPAC probabilistic predictions of Prairie Grass field trials. Institute for Defense Analyses Paper P-3586, 275 pp. [Available from Steve Warner, Institute for Defense Analyses, 4850 Mark Center Drive, Alexandria, VA 22311-1882.]

  • Warner, S., N. Platt, and J. F. Heagy. 2001c. User-oriented measures of effectiveness for the evaluation of transport and dispersion models. Proc. Seventh Int. Conf. on Harmonisation within Atmospheric Dispersion Modelling for Regulatory Purposes, Belgirate, Italy, JRC-EI, 24–29.

  • Warner, S., and Coauthors. 2001d. Model intercomparison with user-oriented measures of effectiveness. Proc. Fifth Annual George Mason University Transport and Dispersion Modeling Workshop, Fairfax, VA, Defense Threat Reduction Agency, CD-ROM. [Available from School of Computational Sciences, MS 5C3, 103 Science & Technology I, George Mason University, 4400 University Drive, Fairfax, VA 22030-4444.]

  • Warner, S., and Coauthors. 2001e. User-oriented measures of effectiveness for the evaluation of transport and dispersion models. Institute for Defense Analyses Paper P-3554, 797 pp. [Available from Steve Warner, Institute for Defense Analyses, 4850 Mark Center Drive, Alexandria, VA 22311-1882.]

Fig. 1. Conceptual view of overlap (AOV), false-negative (AFN), and false-positive (AFP) regions that are used to construct the user-oriented MOE.

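For concreteness, the three regions of Fig. 1 combine into the two MOE coordinates as in the minimal Python sketch below, assuming x = AOV/(AOV + AFN) and y = AOV/(AOV + AFP); the function name and sample numbers are illustrative, not taken from the paper's software.

```python
# A minimal sketch of the 2D MOE construction from the Fig. 1 regions,
# assuming x = AOV/(AOV + AFN) and y = AOV/(AOV + AFP).

def moe_2d(a_ov: float, a_fn: float, a_fp: float) -> tuple:
    """Return (x, y): x penalizes false negatives, y penalizes false positives."""
    x = a_ov / (a_ov + a_fn)  # fraction of the observed region also predicted
    y = a_ov / (a_ov + a_fp)  # fraction of the predicted region also observed
    return x, y

# Example: 60 units of overlap, 20 missed (false negative), 40 false positive.
print(moe_2d(60.0, 20.0, 40.0))  # -> (0.75, 0.6); perfect agreement is (1, 1)
```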

Fig. 2. Key characteristics of the two-dimensional MOE space.


Fig. 3. Interpretation of MOE comparisons: exclusionary zones.


Fig. 4. Illustration of MOE component computations for a Project Prairie Grass field observation and SCIPUFF (Sykes et al. 1996) model prediction. The computations compare predicted and observed dosages (mg s m⁻³) for Project Prairie Grass field trial 8 along the 800-m arc: based on (a) dosage and (b) a threshold dosage of 60 mg s m⁻³.

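A hedged sketch of the per-sampler arithmetic Fig. 4 illustrates: panel (a) builds the components from the dosage values themselves and panel (b) from threshold exceedances. The helper names are ours, not from the paper's software.

```python
# Sketch of the two component computations in Fig. 4, assuming paired
# predicted/observed dosages at the same sampler locations.

def dosage_components(pred, obs):
    """AOV, AFN, AFP from paired predicted/observed dosages (cf. panel a)."""
    a_ov = sum(min(p, o) for p, o in zip(pred, obs))        # shared dosage
    a_fn = sum(max(o - p, 0.0) for p, o in zip(pred, obs))  # underprediction
    a_fp = sum(max(p - o, 0.0) for p, o in zip(pred, obs))  # overprediction
    return a_ov, a_fn, a_fp

def threshold_components(pred, obs, thresh=60.0):
    """AOV, AFN, AFP as exceedance counts (cf. panel b, 60 mg s m^-3)."""
    a_ov = sum(1 for p, o in zip(pred, obs) if p >= thresh and o >= thresh)
    a_fn = sum(1 for p, o in zip(pred, obs) if p < thresh and o >= thresh)
    a_fp = sum(1 for p, o in zip(pred, obs) if p >= thresh and o < thresh)
    return a_ov, a_fn, a_fp
```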

Fig. 5. Relationship between RWFMS and the MOE. Isolines of RWFMS in the 2D MOE space: (a) CFN = 1 and CFP = 1; (b) CFN = 5 and CFP = 0.5.

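Why RWFMS traces isolines in MOE space: assuming RWFMS takes the cost-weighted figure-of-merit form AOV/(AOV + CFN·AFN + CFP·AFP) (consult the text for the exact definition), substituting AFN/AOV = (1 − x)/x and AFP/AOV = (1 − y)/y makes it a function of the MOE point alone, as in this sketch.

```python
# Sketch under the assumption RWFMS = AOV / (AOV + CFN*AFN + CFP*AFP);
# written purely in the MOE coordinates (x, y).

def rwfms(x, y, c_fn=1.0, c_fp=1.0):
    return 1.0 / (1.0 + c_fn * (1.0 - x) / x + c_fp * (1.0 - y) / y)

# With CFN = CFP = 1 (panel a) this reduces to AOV/(AOV + AFN + AFP):
# e.g., AOV = 60, AFN = 20, AFP = 40 gives (x, y) = (0.75, 0.6) and 0.5.
assert abs(rwfms(0.75, 0.6) - 0.5) < 1e-12
```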

Fig. 6. User coloring of MOE space: RWFMS.


Fig. 7. Fractional bias isolines in the 2D MOE space for some values of the parameter s.

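A sketch of why FB also collapses onto curves in MOE space: the total observed and predicted quantities can be written AOB = AOV/x and APR = AOV/y, so FB = 2(AOB − APR)/(AOB + APR) reduces to 2(y − x)/(y + x). Reading s as the factor between predicted and observed totals, which gives |FB| = 2(s − 1)/(s + 1), is our assumption.

```python
# Fractional bias expressed in the MOE coordinates (x, y).

def fb(x, y):
    """Fractional bias at MOE point (x, y); 0 on the x = y diagonal."""
    return 2.0 * (y - x) / (y + x)

def fb_isoline(s):
    """|FB| value for a factor-of-s difference in totals (our assumption)."""
    return 2.0 * (s - 1.0) / (s + 1.0)

print(fb(0.75, 0.6))    # about -0.22: the predicted total exceeds the observed
print(fb_isoline(1.5))  # 0.4, the |FB| isoline for a factor of 1.5
```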

Fig. 8. Relationship between FB and the MOE: examples of FBFOM user coloring for s = 1.15, 1.5, and 2.


Fig. 9. Isolines of NAD in the 2D MOE space. Note that lower values of NAD correspond to less scatter and, hence, better model performance.

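A sketch, assuming NAD is the normalized absolute difference (AFN + AFP)/(AOB + APR): substituting AOB = AOV/x and APR = AOV/y reduces it to (x + y − 2xy)/(x + y), so NAD, like FB, is constant along curves in the 2D MOE space and is 0 at the perfect corner (1, 1).

```python
# NAD expressed in the MOE coordinates (x, y), under the assumed definition
# NAD = (AFN + AFP) / (AOB + APR).

def nad(x, y):
    return (x + y - 2.0 * x * y) / (x + y)

# Consistency check against the component form with AOV=60, AFN=20, AFP=40:
x, y = 60.0 / 80.0, 60.0 / 100.0
assert abs(nad(x, y) - (20.0 + 40.0) / (80.0 + 100.0)) < 1e-12
```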

Fig. 10. MOE 95% confidence regions for SCIPUFF predictions of 51 Project Prairie Grass releases as a function of distance to the arc: (a) MOE based on total dosage and (b) MOE based on a dosage threshold of 60 mg s m⁻³. The clusters of 9500 bootstrap samples correspond to the 800- (red), 400- (pink), 200- (lightest blue), 100- (blue), and 50-m arcs (darkest blue). There is considerable overlap among the 50–400-m-arc MOE confidence regions, especially in (b).

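A sketch of percentile-style bootstrap resampling (Efron and Tibshirani 1993) consistent with the caption's 9500 samples: resample whole trials with replacement, re-aggregate the MOE components, and plot the cloud of recomputed MOE points. Names and the pooling choice here are illustrative.

```python
# Bootstrap cloud of MOE points from per-trial component sums.
import random

def bootstrap_moe_cloud(per_trial, n_boot=9500, seed=1):
    """per_trial: list of (AOV, AFN, AFP) tuples, one tuple per release."""
    rng = random.Random(seed)
    cloud = []
    for _ in range(n_boot):
        sample = [rng.choice(per_trial) for _ in per_trial]  # with replacement
        a_ov = sum(t[0] for t in sample)
        a_fn = sum(t[1] for t in sample)
        a_fp = sum(t[2] for t in sample)
        cloud.append((a_ov / (a_ov + a_fn), a_ov / (a_ov + a_fp)))
    return cloud  # a 95% confidence region encloses 95% of these points
```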

Fig. 11. MOE 95% confidence regions for SCIPUFF predictions of 51 Project Prairie Grass releases: (a) MOE based on total dosage and 51 trials, (b) MOE based on a dosage threshold of 60 mg s m⁻³ and 51 trials, (c) MOE based on total dosage as a function of stability category grouping, and (d) MOE based on a dosage threshold of 60 mg s m⁻³ as a function of stability category grouping. For this study, stability category groupings were defined as follows, using the Irwin and Rosu (1998) Project Prairie Grass stability class assignments 1–7: unstable = 1, 2, and 3 (red); neutral = 4 (blue); and stable = 5, 6, and 7 (green).


Fig. 12. View of SCIPUFF probabilistic prediction outputs: contours of the probability that dosage exceeds 60 mg s m⁻³. This HPAC (SCIPUFF) prediction was made for Project Prairie Grass release number 43 and corresponds to the predicted probability that the SO2 dosage at a sampler 1.5 m above ground exceeds 60 mg s m⁻³. See Warner et al. (2001b) for details.


Fig. 13. Overlay of MOE estimates (based on a 60 mg s m⁻³ threshold) for SCIPUFF probabilistic predictions and 51 Project Prairie Grass trials on notional RWFMS user coloring: (a) CFN = CFP = 1 and (b) CFN = 5 and CFP = 0.5. Confidence regions (95%) for MOE estimates based on SCIPUFF probability predictions of 0.01, 0.50, 0.80, 0.85, 0.90, 0.95, and 0.99 are shown, as well as SCIPUFF mean-value (ensemble average) predictions (yellow cluster).

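A sketch of how each plotted probability level can define a predicted region from output like Fig. 12: call a sampler "predicted hazardous" when the predicted exceedance probability is at least p, then score it against the observed exceedances. Names are illustrative; see Warner et al. (2001b) for the actual procedure.

```python
# MOE components from a probabilistic prediction, thresholded at level p.

def components_from_probability(prob_exceed, obs, p, thresh=60.0):
    """prob_exceed[i] is the predicted P(dosage > thresh) at sampler i."""
    pred_hot = [q >= p for q in prob_exceed]
    obs_hot = [o >= thresh for o in obs]
    a_ov = sum(1 for ph, oh in zip(pred_hot, obs_hot) if ph and oh)
    a_fn = sum(1 for ph, oh in zip(pred_hot, obs_hot) if not ph and oh)
    a_fp = sum(1 for ph, oh in zip(pred_hot, obs_hot) if ph and not oh)
    return a_ov, a_fn, a_fp

# Small p gives a large, cautious region (few false negatives, more false
# positives); large p gives the reverse, sweeping out a curve in MOE space.
```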

Fig. 14. Notional dose-response curve (for a probit-like model). Moderate dosage differences between the prediction and the observation near LCt50 (red dashed arrows) can have a dramatic impact on the fraction of the population affected, whereas substantial relative dosage differences associated with larger predicted and observed dosages (well beyond LCt50; green dashed arrows) or smaller predicted and observed dosages (well below LCt50) do not necessarily have much impact on the estimated fraction of the affected population.

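A sketch of a probit-style dose-response curve (cf. Finney 1971), using the notional parameters quoted for Fig. 15: fraction affected = Phi(b·log10(D/LCt50)), with probit slope b per decade of dosage and Phi the standard normal CDF. The functional form is our reading of "probit-like"; the paper's exact parameterization may differ.

```python
# Probit-style dose-response sketch with the notional Fig. 15 parameters.
import math

def fraction_affected(dosage, lct50=4000.0, slope=12.0):
    z = slope * math.log10(dosage / lct50)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# Near LCt50 the curve is steep, which is the caption's point:
print(fraction_affected(3000.0))  # ~0.07, well below half affected
print(fraction_affected(4000.0))  # 0.5 by construction
print(fraction_affected(5000.0))  # ~0.88, well above half affected
```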

Fig. 15. SCIPUFF and NARAC MOE 95% confidence regions based on 51 Project Prairie Grass field trials, considering ocular and lethal effects of a notional agent: (a) overlay on RWFMS coloring with CFP = CFN = 0.5 and (b) overlay on FBFOM coloring with s = 1.25. The MOE confidence regions shown are based on probit models for ocular effects with OE50 = 30 mg s m⁻³ and for lethal effects with LCt50 = 4000 mg s m⁻³, with both effects models assuming a probit slope of 12. MOE clusters based on NARAC predictions (black and red), MOE clusters based on SCIPUFF predictions (blue and brown), MOE clusters based on ocular effects (black and blue), and MOE clusters based on lethal effects (red and brown) are shown.

