• Baldwin, M. E., and J. S. Kain, 2004: Examining the sensitivity of various performance measures. Preprints, 17th Conf. on Probability and Statistics in the Atmospheric Sciences, Seattle, WA, Amer. Meteor. Soc., CD-ROM, 2.9.

• Barker, E. H., 1992: Design of the Navy’s multivariate optimum interpolation analysis system. Wea. Forecasting, 7, 220–231.

• Bernardet, L. R., L. D. Grasso, J. E. Nachamkin, C. A. Finley, and W. R. Cotton, 2000: Simulating convective events using a high-resolution mesoscale model. J. Geophys. Res., 105, 14963–14982.

• Biltoft, C. A., 1998: Dipole Pride 26: Phase II of Defense Special Weapons Agency transport and dispersion model validation. DPG Doc. DPG-FR-98-001, prepared for Defense Threat Reduction Agency by Meteorology and Obscurants Divisions, West Desert Test Center, U.S. Army Dugway Proving Ground, Dugway, UT, 77 pp.

• Casati, B., G. Ross, and D. B. Stephenson, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141–154.

• Chang, J. C., P. Franzese, K. Chayantrakom, and S. R. Hanna, 2003: Evaluations of CALPUFF, HPAC, and VLSTRACK with two mesoscale field datasets. J. Appl. Meteor., 42, 453–466.

• Colle, B. A., C. F. Mass, and D. Ovens, 2001: Evaluation of the timing and strength of MM5 and Eta surface trough passages over the eastern Pacific. Wea. Forecasting, 16, 553–572.

• Daley, R., and E. Barker, 2001: NAVDAS Source Book 2001: NRL Atmospheric Variational Data Assimilation System. NRL/PU/7530—01-441, Naval Research Laboratory, Monterey, CA, 163 pp.

• Davies, H. C., 1976: A lateral boundary formulation for multi-level prediction models. Quart. J. Roy. Meteor. Soc., 102, 405–418.

• Doyle, J. D., M. A. Shapiro, Q. Jiang, and D. L. Bartels, 2005: Large amplitude mountain wave breaking over Greenland. J. Atmos. Sci., 62, 3106–3126.

• DTRA, 1999: HPAC hazard prediction and assessment capability, version 3.2. Defense Threat Reduction Agency, Alexandria, VA, 406 pp.

• Ebert, E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202.

• Efron, B., 1987: Better bootstrap confidence intervals. J. Amer. Stat. Assoc., 82, 171–185.

• Efron, B., and R. J. Tibshirani, 1993: An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, No. 57, Chapman and Hall, 436 pp.

• Hanna, S. R., and R. Yang, 2001: Evaluations of mesoscale models’ simulations of near-surface winds, temperature gradients, and mixing depths. J. Appl. Meteor., 40, 1095–1104.

• Hanna, S. R., J. C. Chang, and D. G. Strimaitis, 1993: Hazardous gas model evaluation with field observations. Atmos. Environ., 27A, 2265–2285.

• Harvey Jr., L. O., K. R. Hammond, C. M. Lusk, and E. F. Mross, 1992: Application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120, 863–883.

• Hodur, R. M., 1997: The Naval Research Laboratory’s Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS). Mon. Wea. Rev., 125, 1414–1430.

• Hogan, T. F., M. S. Peng, J. A. Ridout, and W. M. Clune, 2002: A description of the impact of changes to NOGAPS convection parameterization and the increase in resolution to T239L30. NRL Memo. Rep. NRL/MR/7530-02-52, Naval Research Laboratory, Monterey, CA, 10 pp.

• Jiang, Q., J. D. Doyle, and R. B. Smith, 2005: Blocking, descent and gravity waves: Observations and modeling of a MAP northerly föhn event. Quart. J. Roy. Meteor. Soc., 131, 675–701.

• Louis, J-F., 1979: A parametric model of vertical eddy fluxes in the atmosphere. Bound.-Layer Meteor., 17, 187–202.

• Manobianco, J., and P. A. Nutter, 1999: Evaluation of the 29-km Eta Model. Part II: Subjective verification over Florida. Wea. Forecasting, 14, 18–37.

• Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? The results of two years of real-time numerical weather prediction over the Pacific Northwest. Bull. Amer. Meteor. Soc., 83, 407–430.

• Mellor, G. L., and T. Yamada, 1974: A hierarchy of turbulence closure models for planetary boundary layers. J. Atmos. Sci., 31, 1791–1806.

• Mosca, S., G. Graziani, W. Klug, R. Bellasio, and R. Bianconni, 1998: A statistical methodology for the evaluation of long-range dispersion models: An application to the ETEX exercise. Atmos. Environ., 32, 4307–4324.

• Murphy, A. H., 1988: Skill scores based on the mean square error and their relationships to the correlation coefficient. Mon. Wea. Rev., 116, 2417–2424.

• Nachamkin, J. E., 2004: Mesoscale verification using meteorological composites. Mon. Wea. Rev., 132, 941–955.

• Pullen, J. D., J. P. Boris, T. Young, G. Patnaik, and J. Iselin, 2005: A comparison of contaminant plume statistics from a Gaussian puff and urban CFD model for two large cities. Atmos. Environ., 39, 1049–1068.

• Rife, D. L., and C. A. Davis, 2005: Verification of temporal variations in mesoscale numerical wind forecasts. Mon. Wea. Rev., 133, 3368–3381.

• Sykes, R. I., S. F. Parker, D. S. Henn, C. P. Cerasoli, and L. P. Santos, 1998: PC-SCIPUFF version 1.2PD, technical documentation. ARAP Rep. 718, Titan Research and Technology Division, Titan Corp., Princeton, NJ, 172 pp.

• Warner, S., N. Platt, and J. F. Heagy, 2004: User-oriented two-dimensional measure of effectiveness for the evaluation of transport and dispersion models. J. Appl. Meteor., 43, 58–73.

• Warner, T. T., R-S. Sheu, J. F. Bowers, R. I. Sykes, G. C. Dodd, and D. S. Henn, 2002: Ensemble simulations with coupled atmospheric dynamic and dispersion models: Illustrating uncertainties in dosage simulations. J. Appl. Meteor., 41, 488–504.

• Watson, T. B., R. E. Keislar, B. Reese, D. H. George, and C. A. Biltoft, 1998: The Defense Special Weapons Agency Dipole Pride 26 field experiment. NOAA/Air Resources Laboratory Tech. Memo. ERL ARL-225, 90 pp.

• White, G. B., J. Paegle, W. J. Steenburgh, J. D. Horel, R. T. Swanson, L. K. Cook, D. J. Onton, and J. G. Miles, 1999: Short-term forecast validation of six models. Wea. Forecasting, 14, 84–108.

• Zepeda-Arce, J., E. Foufoula-Georgiou, and K. K. Droegemeier, 2000: Space–time rainfall organization and its role in validating quantitative precipitation forecasts. J. Geophys. Res., 105, 10129–10146.
    Fig. 1.

    Terrain elevation (m) at the DP26 experiment located at the Yucca Flat test site in NV. Whole-air samplers are depicted by the thick lines, with 30 samplers per line. Solid circles represent surface meteorological MEDA stations, while the open triangles represent the four possible gas release sites. The soundings used in this study were released from BJY. The map covers an area of 30 km × 35 km with the southwest corner roughly corresponding to 36.9°N, 116.3°W. This figure was taken from Chang et al. (2003) through the courtesy of J. C. Chang.

    Fig. 2.

    The nested grids used for the atmospheric model simulation are shown. Horizontal grid spacing for the innermost grid (grid 4) was 1 km. Spacings for the remaining grids were 3, 9, and 27 km, respectively.

    Fig. 3.

    Statistics comparing the predicted and observed wind speed at 10 m AGL are shown. The statistics for each forecast hour were averaged over all available MEDA stations for all of the 14 DP26 trials. (a) Bias statistics and (b) RMS (m s−1), with dashed and solid lines indicating coarse and fine grid values, respectively. (c) The average observed wind speed over all of the MEDA stations (m s−1).

    Fig. 4.

Same as in Fig. 3, but for the 10-m wind direction (a) bias and (b) RMS (°).

    Fig. 5.

    Upper-air statistics comparing the atmospheric forecasts against the soundings released from BJY (Fig. 1). Bias and RMS errors are represented by thin and thick lines, respectively. Errors on grid 4 and grid 1 are represented by solid and dashed lines, respectively. Errors from all forecast times (0–18 h) are combined because of the lack of soundings at a given time. (a) Temperature errors (°C), (b) speed errors (m s−1), and (c) direction errors (°).

    Fig. 6.

    Scatter diagrams of predicted vs observed 3-h integrated maximum SF6 dosages (ppt h) along the whole-air sampling lines. HPAC forecasts using atmospheric forcing from the (a) coarse grid and (b) fine grid.

    Fig. 7.

    The measure of effectiveness 95% confidence regions for the 14 coarse grid (black) and fine grid (gray) releases based on a dosage threshold of 30 ppt h.

    Fig. 8.

    The 12-h forecast from grid 1 and the accompanying MEDA observations, both valid at 1200 UTC 8 Nov 1996. Model topography (m) is shaded and forecast winds (m s−1) are depicted by the thin barbs. The only grid point within the DP26 region was located near (37.1°N, 116.2°W). Observed winds (m s−1) are depicted by thick barbs. For all barbs, each half barb increment equals 2.5 m s−1. Note that data from additional MEDA stations beyond those used in the RMS and bias statistics were available at this time.

    Fig. 9.

    As in Fig. 8, except that topography and wind forecasts on grid 4 are displayed.

    Fig. 10.

    Three-hour HPAC forecasts of total SF6 dosage (ppt h) from the 8 Nov trial are depicted according to the legend in (a). Atmospheric input is from (a) grid 1, (b) grid 4, and (c) the observations. Note that the gas cloud in (a) advected off the grid after 1.21 h. The distance scale at the bottom of each figure represents 10 km.


Evaluation of Dispersion Forecasts Driven by Atmospheric Model Output at Coarse and Fine Resolution

Jason E. Nachamkin, John Cook, and Mike Frost
Naval Research Laboratory, Monterey, California

Daniel Martinez and Gary Sprung
Computer Sciences Corporation, Monterey, California

Abstract

Lagrangian parcel models are often used to predict the fate of airborne hazardous material releases. The atmospheric input for these integrations is typically supplied by surrounding surface and upper-air observations. However, situations may arise in which observations are unavailable and numerical model forecasts may be the only source of atmospheric data. In this study, the quality of the atmospheric forecasts for use in dispersion applications is investigated as a function of the horizontal grid spacing of the atmospheric model. The Coupled Ocean–Atmosphere Mesoscale Prediction System (COAMPS) was used to generate atmospheric forecasts for 14 separate Dipole Pride 26 trials. The simulations consisted of four telescoping one-way nested grids with horizontal spacings of 27, 9, 3, and 1 km, respectively. The 27- and 1-km forecasts were then used as input for dispersion forecasts using the Hazard Prediction Assessment Capability (HPAC) modeling system. The resulting atmospheric and dispersion forecasts were then compared with meteorological and gas-dosage observations collected during Dipole Pride 26. Although the 1-km COAMPS forecasts displayed considerably more detail than those on the 27-km grid, the RMS and bias statistics associated with the atmospheric observations were similar. However, statistics from the HPAC forecasts showed the 1-km atmospheric forcing produced more accurate trajectories than the 27-km output when compared with the dosage measurements.

Corresponding author address: Jason E. Nachamkin, Naval Research Laboratory, 7 Grace Hopper Ave., Monterey, CA 93943. Email: jason.nachamkin@nrlmry.navy.mil


1. Introduction

Increased computing capabilities are currently facilitating a rapid increase in the spatial resolution of available atmospheric forecast guidance. As horizontal grid spacings approach 1 km, the resulting forecasts become increasingly complex, producing detailed accounts of specific weather phenomena. Some of these forecasts have demonstrated considerable accuracy, especially in cases involving fixed physiographic features. Nachamkin (2004) showed that mistral forecasts were well correlated with satellite wind observations up to 66 h in advance. Other case studies indicate that high-resolution grids can resolve sea breezes (Manobianco and Nutter 1999), mountain waves (Doyle et al. 2005; Jiang et al. 2005), frontal and convective phenomena (Bernardet et al. 2000; Colle et al. 2001), and detailed weather in complex terrain (Mass et al. 2002; Hart et al. 2005). All of these weather phenomena have profound effects on the short-term fate of atmospheric contaminants. Thus, forecast performance metrics are needed to properly gauge the uncertainty and any value-added improvements that arise from high-resolution forecasts.

Performance gains arising from increased resolution are notoriously difficult to measure quantitatively. Zepeda-Arce et al. (2000), Ebert and McBride (2000), Warner et al. (2002), Warner et al. (2004), Nachamkin (2004), and Casati et al. (2004) all discuss variability and scale in relation to verification. Essentially, as forecast detail increases, slight position or timing errors greatly impact most current evaluation methods that rely on comparisons at fixed points. Baldwin and Kain (2004) found that the equitable threat score, which measures the degree of overlap between the observed and predicted phenomena, rapidly decreases with small errors in position and timing. Absolute error measures, such as the root-mean-square (RMS), are also sensitive to small displacements in areas where atmospheric conditions are highly variable (Murphy 1988). White et al. (1999) compared forecasts from the global-scale Medium Range Forecast model with forecasts from several mesoscale models and found few significant differences in the RMS scores. In some cases, the global model actually outscored the higher-resolution forecasts.

Rife and Davis (2005) compensated for high variance by isolating significant wind shift events over the southwestern U.S. desert as diagnosed in observed surface station time series. By accounting for phase shifts in time, the method partially mitigated the effects of increased variability. Their results showed modest gains in forecast skill at high resolutions, especially in complex terrain. However, these gains were small relative to measures of the maximum possible skill. Poor predictability at low wind speeds also contributes to generally high random error scores regardless of the grid spacing (Hanna and Yang 2001). All numerical models have trouble predicting speed and direction in light and variable wind regimes. However, the actual vector displacement error, and thus the dosage error, may be much smaller than the direction errors indicate because of the low wind speeds.

Given the many problems associated with high-resolution pointwise verification, evaluating the final dispersion forecasts created from the atmospheric model input should prove useful. In addition to winds, other prognostic variables such as temperature, boundary layer height, and turbulent mixing may all be used as input to the dispersion model (Pullen et al. 2005). Dispersion forecasts act as integrators, combining the cumulative effects of multiple atmospheric variables along the contaminant trajectories. Since trajectories are influenced by the spatial error correlations, the forecast quality can be evaluated in ways pointwise statistics cannot depict. The high-resolution atmospheric forecasts may have large RMS scores, but if the errors are small in scale and uncorrelated along the plume trajectory, their effects will be significantly mitigated.

In this study, atmospheric forecasts from the Coupled Ocean–Atmosphere Mesoscale Prediction System (COAMPS; Hodur 1997) were used to drive dispersion forecasts using the Second-Order Closure Integrated Puff algorithm (SCIPUFF; Sykes et al. 1998) as implemented within the Hazard Prediction Assessment Capability (HPAC) modeling system (DTRA 1999). The dispersion forecasts were then compared with observed measurements of sulfur hexafluoride (SF6) taken in conjunction with the Dipole Pride 26 (DP26) field project (Biltoft 1998; Watson et al. 1998), while the meteorological forecast variables were verified against surface mesonet and upper-air observations. The COAMPS-driven HPAC forecasts were also compared with a corresponding set of dispersion forecasts performed by Chang et al. (2003), wherein HPAC was driven directly by the DP26 wind observations. To gauge the effect of atmospheric model resolution on the dispersion forecasts, all of the evaluations above were conducted for two COAMPS grids: one with a horizontal grid spacing of 27 km and the other with a spacing of 1 km.

2. Data and methods

a. Field experiment

Field data for DP26 were collected at the Yucca Flat test site located in southern Nevada (Fig. 1). The experiment was carried out in a north–south-oriented high desert valley that generally sloped southward across the domain. Ridges along the western and eastern sides of the domain extended up to 1200 and 900 m above the valley floor, respectively. Morning drainage and afternoon upslope flows were well constrained for investigating dispersion along the valley. During November 1996, 21 SF6 releases were conducted over 14 trials. In each experiment, approximately 10–20 kg of gas were released instantaneously from a height of roughly 6 m above the ground. Release sites, indicated by the open triangles S2, S3, N2, and N3 in Fig. 1, were selected based on the direction of the prevailing wind. Most experiments consisted of a single release, but a few consisted of two consecutive releases approximately 90 min apart. Gas concentration measurements were collected along three sampler arrays, indicated by the thick solid lines in Fig. 1. Each array consisted of 30 whole-air samplers located 1.5 m above ground level and spaced at approximately 250-m intervals. At each sampler, average gas concentrations were collected at 15-min intervals over a period of 3 h. To maximize the utility of the collection process, sampling on the line farthest from the release point was delayed by 30 min. Sampling was terminated after 3 h, and the resulting accumulated totals were used to represent the observed gas dosages in this study.

Meteorological parameters were collected from eight meteorological data (MEDA) mesonet stations, indicated by “M” in Fig. 1, at 15-min intervals. Wind observations at 10 m above ground level (AGL), humidity, and temperature observations at 2 m AGL were all recorded. Upper-air measurements were collected at one rawinsonde (UCC) and two pilot balloon (UCC and BJY) sites at 3–12-h intervals. Quality assurance assessments conducted by Chang et al. (2003) included 1) data screens for inconsistencies in time, units, and the missing data indicator; 2) examinations of the spatial surface wind distributions for visibly questionable data; 3) comparisons of concentration and wind data from neighboring instruments; and 4) inspections of the whole-air sampler quality indicators. These indicated the majority of the field data were of satisfactory quality. However, radiosonde winds in the lowest 500 m above ground level were found to be questionable because of tracking problems in strongly sheared environments. The suspect winds were typically at levels below the 850-hPa height; thus the upper-air statistics in this study focused on levels at and above 850 hPa. Since most of the DP26 soundings terminated between 500 and 400 hPa, the statistics were terminated at the 500-hPa level. Additional quality control procedures conducted in the present study revealed that the MEDA temperature measurements exhibited very little diurnal variation relative to nearby synoptic stations. These readings were considered to be in error as large diurnal variations are likely in the high desert during the autumn. Thus for the near-surface comparisons, only the wind statistics were investigated.

b. Atmospheric forecast model

COAMPS forecasts were generated for each DP26 trial using the grid configuration shown in Fig. 2. Horizontal grid spacings were 27, 9, 3, and 1 km on each of the one-way nested telescoping grids. Sixty vertical levels were employed, with the lowest level at 10 m above ground level. To resolve the boundary layer, 15 of the vertical levels resided in the lowest 1500 m of the atmosphere. Each forecast was integrated to 18 h using the full COAMPS nonhydrostatic physics. The gas releases typically corresponded with the 6–12-h forecasts, with no releases occurring prior to the 6-h forecast.

The model equations were defined on terrain-following sigma coordinates. Turbulent kinetic energy was predicted using a level 2.5 Mellor and Yamada (1974) scheme, while the surface layer was parameterized based on Louis (1979). The terrain was interpolated from the digital terrain elevation data (DTED) supplied by the National Imagery and Mapping Agency. Boundary conditions were supplied by the Navy Operational Global Atmospheric Prediction System (NOGAPS) (Hogan et al. 2002) at 6-h intervals using a Davies (1976) scheme. COAMPS was initialized “cold” from the NOGAPS analysis using the multivariate optimal interpolation (MVOI) analysis (Barker 1992). As mentioned above, COAMPS was given at least 6 h of forecast lead time prior to a DP26 trial release in order to build high-resolution boundary layer circulations on the fine grid.

c. Dispersion model

The dispersion forecasts were made using HPAC version 4.04, which uses SCIPUFF to drive the dispersion calculations. SCIPUFF is based on a Gaussian puff dispersion formulation in which contaminant clouds are followed in a Lagrangian sense. Parcels expand and eventually split after reaching a specified size as the contaminant disperses. Vertical splitting is allowed, although the boundary layer inversion acts as an upper bound. Both the mean and the variance of the concentration field are predicted within SCIPUFF.

To perform the parcel advection calculations, HPAC requires a three-dimensional, gridded wind field, which in this study was supplied by the COAMPS forecasts at 1-h intervals. In addition to the winds, three-dimensional forecasts of humidity, geopotential height, and temperature, and a two-dimensional boundary layer height field were also supplied at 1-h intervals. The remaining boundary layer parameters were calculated within HPAC using the COAMPS-supplied fields. Aside from the atmospheric model input, the HPAC simulations in this study were set up using a configuration nearly identical to that of the DP26 study conducted by Chang et al. (2003), allowing direct comparisons between dispersion forecasts from the two studies. Chang et al. (2003) used the Stationary Wind Fit and Turbulence (SWIFT) diagnostic model within HPAC to incorporate the observed MEDA winds into the dispersion forecast. The high spatial and temporal concentration of surface and sounding observations surrounding the SF6 releases provides a quality ground truth dataset from which the best possible dispersion forecast can be derived.

d. Verification technique

The meteorological forecasts were verified against the DP26 (Fig. 1) surface wind and upper-air wind, temperature, and moisture observations. Basic RMS and bias errors were calculated on the coarsest (nest 1) and finest (nest 4) grids as defined by
\[
\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\bigl(x_i^{p} - x_i^{o}\bigr), \quad (1)
\]
\[
\mathrm{RMS} = \biggl[\frac{1}{N}\sum_{i=1}^{N}\bigl(x_i^{p} - x_i^{o}\bigr)^{2}\biggr]^{1/2}, \quad (2)
\]
where x represents an atmospheric scalar quantity, N is the total number of forecast–observation pairs, and the superscripts p and o refer to the predicted and observed values, respectively.

Forecasts were bilinearly interpolated to the observation points, and since both the coarse and fine grids covered the entire DP26 network, data grouping (verifying nest 1 only over the area covered by nest 4) was not required.
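As a concrete illustration, the bias and RMS of Eqs. (1) and (2) reduce to a few array operations on matched forecast–observation pairs. The numbers below are hypothetical, not DP26 data:

```python
import numpy as np

# Hypothetical matched pairs for one scalar (e.g., 10-m wind speed, m/s);
# in the study, forecasts were bilinearly interpolated to the MEDA stations.
x_p = np.array([4.2, 3.1, 5.0, 2.8])  # predicted
x_o = np.array([3.8, 3.5, 4.4, 3.0])  # observed

bias = np.mean(x_p - x_o)                 # Eq. (1): mean error
rms = np.sqrt(np.mean((x_p - x_o) ** 2))  # Eq. (2): root-mean-square error
```

The bias retains the sign of systematic over- or underprediction, while the RMS penalizes all departures regardless of sign.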

Line-maximum dosage statistics were collected following Chang et al. (2003) and Hanna et al. (1993). The primary quantity being measured was the maximum 3-h integrated dosage along a given sample line. This measure inherently contains some uncertainty as the sampling lines were about 7.5 km long, and the bulk of the SF6 crossed the network over a 3-h period. However, Zepeda-Arce et al. (2000) and Nachamkin (2004) indicate such compromises between measurement precision and statistical accuracy are often necessary for meaningful statistics. Scatterplots comparing observed and predicted maximum dosage were employed, as was a subset of quantitative performance metrics considered by Hanna et al. (1993). The fractional bias (FB), normalized mean-square-error (NMSE), and the fraction of predictions within a factor of two of the observations (FAC2) were calculated as shown below:
\[
\mathrm{FB} = \frac{\overline{C_{o}} - \overline{C_{p}}}{0.5\,\bigl(\overline{C_{o}} + \overline{C_{p}}\bigr)}, \quad (3)
\]
\[
\mathrm{NMSE} = \frac{\overline{\bigl(C_{o} - C_{p}\bigr)^{2}}}{\overline{C_{o}}\;\overline{C_{p}}}, \quad (4)
\]
\[
\mathrm{FAC2} = \text{fraction of pairs for which } 0.5 \le C_{p}/C_{o} \le 2, \quad (5)
\]
where Co represents the maximum integrated observed dosage along a sampling line and Cp represents the maximum integrated dosage predicted by HPAC using the atmospheric model forcing. The overbar represents the average over all data values from all trials. In addition, the correlation coefficient and the Spearman rank correlation between the observed and predicted dosages were calculated as shown below:
\[
R = \frac{\overline{\bigl(C_{o} - \overline{C_{o}}\bigr)\bigl(C_{p} - \overline{C_{p}}\bigr)}}{\sigma_{o}\,\sigma_{p}}, \quad (6)
\]
\[
\rho = 1 - \frac{6\sum d^{2}}{n\bigl(n^{2} - 1\bigr)}, \quad (7)
\]
where σo and σp represent the standard deviations of the observed and predicted dosages, d represents the difference in the statistical rank between the observations and forecasts, and n is the number of observation/prediction pairs. Both correlation calculations measure the level of forecast accuracy when the bias is removed; however, the rank correlation is less sensitive to clustered sets of outliers. Following Chang et al. (2003), the mean μ and standard deviation σ for all of the above measures were estimated using bootstrap resampling (Efron 1987). The 95% confidence intervals were defined as
\[
\mathrm{CI}_{95\%} = \mu \pm t_{95\%}\,\sigma, \quad (8)
\]
where n is the number of resamples, which in this study was 1000, and t95% is the Student’s t value at the 95% confidence level with n − 1 degrees of freedom.
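A minimal NumPy sketch of Eqs. (3)–(8) follows. The function and variable names are illustrative (not from HPAC or Chang et al. 2003), the rank calculation assumes no tied dosage values, and a fixed value of about 1.96 stands in for the Student's t quantile at 999 degrees of freedom:

```python
import numpy as np

def dosage_metrics(c_o, c_p):
    """Paired line-maximum 3-h dosages: observed c_o, predicted c_p."""
    fb = (c_o.mean() - c_p.mean()) / (0.5 * (c_o.mean() + c_p.mean()))  # Eq. (3)
    nmse = np.mean((c_o - c_p) ** 2) / (c_o.mean() * c_p.mean())        # Eq. (4)
    ratio = c_p / c_o
    fac2 = np.mean((ratio >= 0.5) & (ratio <= 2.0))                     # Eq. (5)
    r = (np.mean((c_o - c_o.mean()) * (c_p - c_p.mean()))
         / (c_o.std() * c_p.std()))                                     # Eq. (6)
    rank = lambda a: np.argsort(np.argsort(a))  # 0-based ranks; assumes no ties
    d = rank(c_o) - rank(c_p)
    n = len(c_o)
    rho = 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))               # Eq. (7)
    return fb, nmse, fac2, r, rho

def bootstrap_ci(stat, c_o, c_p, n_resamples=1000, t95=1.96, seed=0):
    """Eq. (8): mu +/- t95*sigma over bootstrap resamples of the pairs."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_resamples):
        idx = rng.integers(0, len(c_o), len(c_o))  # resample pairs with replacement
        vals.append(stat(c_o[idx], c_p[idx]))
    mu, sigma = np.mean(vals), np.std(vals, ddof=1)
    return mu - t95 * sigma, mu + t95 * sigma
```

For example, `bootstrap_ci(lambda o, p: dosage_metrics(o, p)[0], c_o, c_p)` would bracket the fractional bias.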

In this study, the geometric mean bias and geometric variance [MG and VG in Chang et al. (2003)] were not calculated because of the large number of forecasts with dosage underestimates, particularly those using the coarse grid. MG and VG depend on logarithmic functions of concentration and are strongly influenced by small values; at zero concentration these statistics are undefined. Removing the bad forecasts from the sample using a minimum concentration threshold could mitigate this problem, but the number of samples would then vary significantly between the grids. We felt it better to keep the number of samples consistent, given that only 42 forecast–observation concentration pairs were available from the 14 separate trials and three sampling lines. Thus, the limit of detection for the whole-air samplers was applied only to the observed concentrations in the scatterplots, for consistency with the plots shown in Chang et al. (2003). Watson et al. (1998) reported that the minimum limit of detection for the DP26 whole-air samplers was 10 ppt, which corresponds to an integrated 3-h dosage of 30 ppt h.

Because of the limitations in the line-maximum dosage statistics, a separate set of statistics was derived using the two-dimensional user-oriented measure of effectiveness (MOE) outlined by Warner et al. (2004). This metric considers all dosage measurements above a specified threshold along the sampling arcs to estimate the overlap between the observed and predicted clouds. In the present study, statistics were taken for all samplers for each of the 14 trials where either the observed (“obs”) or predicted (“fcst”) 3-h dosage exceeded threshold (“thresh”) values of 30, 100, and 500 ppt h. At each threshold, the number of false positives (fcst ≥ thresh; obs < thresh), false negatives (fcst < thresh; obs ≥ thresh), and overlaps (fcst ≥ thresh; obs ≥ thresh) were recorded, and contingency statistics were collected as described below:
\[
\mathrm{TS} = \frac{A_{\mathrm{OV}}}{A_{\mathrm{OV}} + A_{\mathrm{FN}} + A_{\mathrm{FP}}}, \quad (9)
\]
\[
\mathrm{bias} = \frac{A_{p}}{A_{o}} = \frac{A_{\mathrm{OV}} + A_{\mathrm{FP}}}{A_{\mathrm{OV}} + A_{\mathrm{FN}}}, \quad (10)
\]
where AOV, AFN, and AFP represent the number of overlap, false-negative, and false-positive forecasts, and Ap and Ao represent the total number of predictions and observations, respectively. The threat score (TS), also known as the critical success index (CSI) or the figure of merit in space (FMS; Mosca et al. 1998), measures the degree of overlap between the observations and the forecasts and ranges from 0 to 1. The bias is similar to the fractional bias in that it measures the extent to which the concentrations are under- or overpredicted.
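Eqs. (9) and (10) reduce to simple counts over the samplers. The sketch below tallies the contingency categories at one threshold; the dosage arrays are illustrative, not DP26 measurements:

```python
import numpy as np

def contingency(dose_fcst, dose_obs, thresh):
    """Per-sampler 3-h dosage contingency counts at one threshold (ppt h)."""
    f = np.asarray(dose_fcst) >= thresh
    o = np.asarray(dose_obs) >= thresh
    a_ov = int(np.sum(f & o))    # overlaps:        fcst >= thresh, obs >= thresh
    a_fn = int(np.sum(~f & o))   # false negatives: fcst <  thresh, obs >= thresh
    a_fp = int(np.sum(f & ~o))   # false positives: fcst >= thresh, obs <  thresh
    ts = a_ov / (a_ov + a_fn + a_fp)      # Eq. (9): threat score (CSI/FMS)
    bias = (a_ov + a_fp) / (a_ov + a_fn)  # Eq. (10): A_p / A_o
    return a_ov, a_fn, a_fp, ts, bias
```

In the study these counts were accumulated separately at the 30, 100, and 500 ppt h thresholds.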

In addition to these statistics, the results were displayed in the two-dimensional MOE space described by Warner et al. (2004). The MOE diagram is similar to the receiver operating characteristic (ROC) (e.g., Harvey et al. 1992) in that the two-dimensional plotting space is defined by contingency measures of the observed and predicted distributions. For the MOE, the x axis is defined as the ratio of the overlap region to the observed region (AOV/Ao), and the y axis is defined by the ratio of the overlap region to the predicted region (AOV/Ap). In this space, a perfect score is defined at x = y = 1 where both plumes identically overlap. Along the diagonal defined by y = x, the plume coverage is identical but the area of overlap decreases to 0 at x = y = 0. All points to the left of the diagonal are characterized by a greater number of false negatives than false positives (bias < 1.0), and those to the right of the diagonal have high biases (bias > 1.0). In general, forecasts with equal coverage and greater overlap are considered to be superior; thus, projections along the diagonal that are closer to (1, 1) are considered to be better.
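In code, the MOE coordinates follow directly from the same exceedance counts. A short sketch of the construction described above (an illustration; names are not from the paper):

```python
import numpy as np

def moe_point(fcst, obs, thresh):
    """Return the (x, y) MOE coordinates after Warner et al. (2004):
    x = A_OV / A_o and y = A_OV / A_p, with the 'areas' counted here as
    numbers of samplers exceeding the dosage threshold."""
    f = np.asarray(fcst, dtype=float)
    o = np.asarray(obs, dtype=float)
    a_ov = np.sum((f >= thresh) & (o >= thresh))  # overlap
    a_p = np.sum(f >= thresh)                     # predicted exceedances
    a_o = np.sum(o >= thresh)                     # observed exceedances
    return a_ov / a_o, a_ov / a_p
```

A perfect forecast returns (1, 1); a forecast with no overlap returns (0, 0).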

The statistical confidence region for the average MOE vector was estimated using the Efron and Tibshirani (1993) bootstrap percentile method as employed by Warner et al. (2004). A total of 10 000 sets of 14 MOE vector resamples were created, and from these sets 10 000 vector-average MOE values were calculated. The 95% confidence regions were estimated from the resulting scatter around the original MOE vector average.
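The percentile bootstrap described above can be sketched as follows (a minimal illustration with hypothetical MOE points; the study resampled its 14 trial vectors 10 000 times):

```python
import numpy as np

def bootstrap_mean_moe(moe_xy, n_boot=10_000, seed=0):
    """Resample the per-trial (x, y) MOE vectors with replacement,
    average each resample, and take percentile bounds on the scatter
    of the resulting vector averages."""
    rng = np.random.default_rng(seed)
    xy = np.asarray(moe_xy, dtype=float)
    n = xy.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))          # resampled trial indices
    means = xy[idx].mean(axis=1)                        # (n_boot, 2) vector averages
    lo, hi = np.percentile(means, [2.5, 97.5], axis=0)  # 95% bounds per axis
    return means, lo, hi
```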

3. Atmospheric model evaluation

The statistics for the surface MEDA stations for all of the COAMPS forecasts are shown in Figs. 3 and 4. Since 10 of the 14 DP26 forecasts were initialized at 1200 UTC, the statistics generally reflect the model response to the diurnal cycle starting from 1200 UTC (0400 PST). Separate statistics taken solely from the forecasts initialized at 1200 UTC (not shown) bear this out. The model performance was consistent with recent findings by White et al. (1999) and Mass et al. (2002) in that increased grid resolution did not result in large decreases in bias and RMS errors. Instead, errors often fluctuated through the diurnal cycle with varying magnitudes and signs on each grid.

Wind speed biases (Fig. 3) were generally within ±1 m s−1 on both grids, with a tendency for negative biases during the 3–9-h forecasts on the fine grid and a general positive bias on the coarse grid. Speed RMS errors (Fig. 3b) were similar on both grids, with a slight increase at 9 h on the fine grid in response to the negative bias. Observed wind speeds were often at or below 2.5 m s−1 (5 kt) during most of the forecasts (Fig. 3c). Wind directions in such light and variable regimes are often unpredictable (Hanna and Yang 2001), and as a result both grids exhibited large direction RMS errors during the weak wind periods (Fig. 4). Although direction RMS errors were lower on the fine grid during the 3-, 9-, 12-, and 18-h forecasts, none of these improvements were statistically significant at the 95% confidence level. Near-surface direction errors were apparent in the analysis because the MEDA stations were not ingested by the MVOI analysis; instead, the analyzed surface winds were determined from the NOGAPS first-guess field. The negative direction biases reflect local biases in the NOGAPS winds, as the entire DP26 area was easily contained within one NOGAPS grid cell. In general, NOGAPS and COAMPS wind direction biases are close to zero when averaged over large (continental) areas. As the forecasts progressed, the directional biases became positive within 3 h on the fine grid and 9 h on the coarse grid before becoming negative again after the initial response. Using the COAMPS fields as the background first guess in the MVOI may have mitigated these systematic problems, but the intense computing requirements of generating the multiple high-resolution forecasts needed for such a data assimilation cycle necessitated a compromise.

Comparison of the atmospheric model statistics with other studies is difficult because of the limited geographical and temporal sample. Eight stations were sampled covering an area approximately 40 km by 40 km, and some systematic error correlations are likely, especially on the coarse grid. The overall averaged errors for all stations and times (Table 1) can be compared with similar case study samples conducted by Hanna and Yang (2001). Although the Hanna and Yang studies were conducted over larger areas, the speed and direction errors in this study are similar to their results. Total averaged wind speed RMS values in their study varied from 1.6 to 2.5 m s−1 while direction RMS errors ranged over 51°–76°.

The upper-air statistics (Fig. 5) were combined for all forecasts because of the small number of observations at any given time. In total, about 50 soundings were used, most of which were launched between the 8- and 14-h forecasts. Since all of the soundings were launched from the same site, the statistics are representative of the local area errors. For most forecasts, lower-tropospheric temperature forecasts tended to be cold by between 0.5° and 1.25°C on both grids, though the bias on the fine grid was slightly smaller. Boundary layer wind speeds tended to be too low on both grids as well. These errors are consistent with the temperature biases in that they reflect the possible underestimation of the downward mixing of warmer, higher momentum air from aloft. Wind direction biases tended to be within ±10° while the RMS fluctuated between 40° and 50°. Direction biases on the fine grid tended to be negative close to the surface, which is consistent with the slight negative bias in the surface statistics (Table 1). Direction RMS errors were similar on both grids. Aside from the near-surface wind direction bias, the only other notable difference between the two grids was in the speed RMS errors. Grid 4 had slightly higher RMS values through the column, especially above 700 mb. The increased RMS errors may have resulted from the increased resolution of circulations associated with the complex topography. Even slight errors in the placement and structure of mountain waves, mountain–valley circulations, and other complex features can result in increased error, even if the general nature of the predicted phenomena is consistent with the observations. However, for dispersion applications the increased error magnitude may be a worthy compromise if the scale of the error, and hence the error correlation, along the trajectories is reduced.

4. Dispersion forecast evaluation

Taken alone, the RMS and bias statistics indicate that the fine grid forecast errors were comparable to or at times even greater in magnitude than those on the coarse grid. However, these statistics do not measure the scale or the degree of correlation of the along-trajectory errors. Weakly correlated errors that are small relative to the scale of the trajectory may partially cancel one another along the track, leading to less of an impact on the dispersion forecast. Comparison of line-maximum dosages indicates that the dispersion forecasts using the high-resolution model forcing generally performed the best. Scatterplots (Fig. 6) indicate that 16 missed forecasts occurred when the coarse grid forcing was used as opposed to four from the fine grid. A miss was defined by zero or negligible 3-h maximum predicted dosages along a given sample line coincident with observed values greater than 30 ppt h. The standard and rank correlations between the predicted and observed values were both higher with the fine grid forcing (Table 2). The rank correlations were low for both grids because of the large number of outliers. However, rank correlations from the coarse grid forcing were slightly negative because of the cluster of zero-dosage forecasts.

Missed forecasts occurred during 11 of the 14 trials and always involved the two sampling lines most distant from the source. The misses on the fine grid occurred during three separate trials, and in each case the plumes driven by the coarse and fine grid winds closely resembled one another. At no time did the fine grid incur a miss with a corresponding hit on the coarse grid. The fine grid forcing provided the greatest advantage during the morning, when winds appeared to be most influenced by local topographically induced drainage flows. Performance was more comparable during the afternoon, when the domain-average wind speed was above 2.5 m s−1.

Overforecasts were also a problem on both grids (Table 2), but tended to be worse with the fine grid forcing. Trials 3 and 4, both of which occurred during the morning, exhibited the highest dosage overestimates. Predicted dosages exceeded 16 000 ppt h while the observations were close to 5000 ppt h. In these cases the high dosages resulted from COAMPS winds that were predicted to be too weak. In general, the high dosage biases are consistent with the low wind speed biases on grid 4.

The high dosage biases with the fine grid were associated with higher forecast dosage standard deviations as well as degraded NMSE and FB scores (Table 2) relative to the coarse grid. Although the differences in the FB scores were statistically significant at the 95% confidence level, the NMSE differences were not. Unfortunately, both the NMSE and FB statistics were affected by the lack of observations. The forecast plumes that missed the network were not sampled as false alarms; thus the NMSE was reduced and the FB was positively skewed. In all likelihood, the NMSE would be higher and the FB more negative with the coarse grid forcing had the missed forecasts been properly sampled. Given these caveats, the FB and NMSE statistics should be used only when the cores of the predicted and observed plumes are both known to be fully within the observation network. The FAC2 measure is more robust in that it is sensitive only to the location of the observed plumes. As long as the observed plumes remain within the network, complete forecast misses are correctly scored as non-FAC2 values. The FAC2 correctly diagnoses the improved trajectory forecasts from the fine grid, as indicated by the scatterplots in Fig. 6. The fine grid FAC2 values were about 40% above those on the coarse grid (Table 2), and these differences were statistically significant at the 95% level.
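For reference, the three paired-dosage scores discussed in this paragraph can be written as follows (standard Hanna-style definitions; the FB sign convention here, positive for overprediction, matches the usage in the text, but conventions vary in the literature):

```python
import numpy as np

def fb_nmse_fac2(fcst, obs):
    """Fractional bias, normalized mean-square error, and fraction of
    predictions within a factor of 2 of the observations (FAC2)."""
    f = np.asarray(fcst, dtype=float)
    o = np.asarray(obs, dtype=float)
    fb = (f.mean() - o.mean()) / (0.5 * (f.mean() + o.mean()))  # > 0: overprediction
    nmse = np.mean((f - o) ** 2) / (f.mean() * o.mean())
    ratio = f / o                                   # assumes nonzero observations
    fac2 = np.mean((ratio >= 0.5) & (ratio <= 2.0))
    return fb, nmse, fac2
```

Note how a forecast of zero with a nonzero observation contributes to all three scores only if the pair is actually sampled, which is the undersampling caveat raised above.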

Given the limitations in the line-maximum statistics, the MOE diagnostics are useful for their dependence on the area of overlap. Like FAC2, those forecasts that missed the network contain zero overlaps and are thus correctly recorded as x = y = 0. The relative superiority of the grid-4 forcing is reflected in the MOE 95% confidence regions for the 30 ppt h dosage threshold (Fig. 7), as the distribution of dosage estimates indicates a greater overlap between the predicted and observed plumes. Although the grid-4 point cluster lies to the right of the diagonal because of the positive bias, Warner et al. (2004) illustrated that false-positive errors are less deleterious if they enclose the observed plume, as some of the grid-4 forecasts did. Dosage forecasts from both grids contained considerable scatter, and although some of the grid-4 forecasts were very accurate, others contained major errors. Overall, 7 of the 14 original MOE vectors favored the grid-4 forcing while only 2 of the 14 favored grid 1. The rest were not decisive in that increases in the area of overlap were accompanied by corresponding increases in the area of false negatives or positives. Following Warner et al. (2004), the probability that a random draw would yield improved scores for 7 of 14 forecasts was 0.15, or 15%, indicating some degree of confidence that the grid-4 forcing was an improvement. The MOE results are reflected in the threat scores (Table 3) in that the values are considerably higher for grid 4 at all thresholds. The bias scores are also higher for grid 4, though again the grid-1 bias was artificially reduced because of the undersampling problem discussed previously.

The HPAC forecasts evaluated by Chang et al. (2003) reflect the best-case scenario, in which high-density observations are available in the immediate vicinity of the release. To the extent that the meteorological variables were represented by the observations, the dosage differences between the two experiments provide an estimate of the uncertainty incurred from using the atmospheric model forecasts as input to HPAC. The observationally forced forecasts contained less scatter and no missed forecasts, and the observational FAC2 value of 0.595 (Chang et al. 2003) was considerably above those reported in Table 2. Clearly, the atmospheric model did not mimic the MEDA observations, and it should not be expected to do so. In operational situations, the number of relevant observations within a few kilometers of a given release will likely be lower than in the current field experiment. Chang et al. (2003) showed that HPAC forecasts varied considerably when data from even one MEDA station were withheld. The utility of the atmospheric model forecasts will vary in relation to, among other things, the complexity of the terrain and the number and proximity of the observations. If no observations exist, or if they are not available, the atmospheric model forecasts may represent the only available meteorological input. Even if one or two observations exist, they may be located 50–60 km from the affected area. In regions of complex terrain, these observations may not represent the winds affecting the contaminant migration. Under these conditions the atmospheric model may provide useful information for the dispersion forecast.

5. Forecast example

The resolution-dependent behavior of the atmospheric forecasts, as well as the limitations of the statistical measurements, can be illustrated by examining individual cases. In this section, the atmospheric and dispersion forecasts are investigated for trial 3, which was conducted during the early morning of 8 November 1996. SF6 was released from site N3 at 1200 UTC (0400 PST) and tracked from north to south across the network (Figs. 8–10). COAMPS was initialized at 0000 UTC and the 12–16-h forecasts were used as input to HPAC. The 12-h forecasts from the coarse and fine grids as well as the mesonet observations are shown in Figs. 8 and 9. The observed winds (thick barbs in Figs. 8, 9) depict a basic nocturnal drainage pattern, with flow descending from high terrain and converging along the valley floor (note that the topography is best depicted on the high-resolution grid in Fig. 9).

On grid 4 (Fig. 9), the flow structure was generally well simulated despite some large errors at individual stations. Over the western portions of the network the descending northerly and northwesterly winds were well predicted. However, the two northeasterly wind observations just east of the valley axis (near 37.00°N, 116.05°W) were not depicted in the model. Here, the model may have underestimated the strength of the drainage on the eastern side of the network, resulting in the eastward displacement of the convergence zone between the opposing flows. Predicted wind speeds were generally 1–2 m s−1 too low over the northern portion of the network, though speeds on the valley floor were very close to the observations.

On grid 1, only one grid point was located directly within the network near 37.10°N, 116.20°W in Fig. 8. Although the HPAC fields are influenced by the four surrounding COAMPS grid points through the bilinear interpolation, the atmospheric forcing was very smooth. The entire valley was essentially a subgrid-scale feature on the 27-km grid, and only a very general north-to-south slope remained. The predicted synoptic northeasterly flow verified well against the MEDA stations in the eastern third of the network. However, the northwesterly drainage flow farther west was not predicted because of the highly smoothed model topography.

The general track of the HPAC plume was better simulated by the fine grid forcing (Fig. 10b), though the dosages were higher and more widespread than observed because of the weak wind bias. Note that the simulated winds on grid 4 were nearly calm over the plume track just south and west of N3 while observed winds were between 2.5 and 5 m s−1 (Fig. 9). The HPAC plume driven by the mesonet winds (Fig. 10c) represents the best approximation of the observed track because of the high spatial and temporal frequency of the wind observations. The dosages were comparable to the whole-air sampler observations, though the sampled plume was about 2 km broader at line 2 (∼10 km south of N3) than the HPAC simulation. The coarse grid plume (Fig. 10a) was linear and relatively featureless, reflecting the uniform wind field. Dosage values were lower than those from the fine grid forcing because of the higher winds. However, the cloud completely missed all but the closest of the sampling lines south of the release point, and instead extended southwestward, exiting the western edge of the HPAC grid approximately 25 km from the starting point.

6. Summary and conclusions

The results of this study indicate that mesoscale models provide useful information for dispersion applications, and that increasing the grid resolution can improve the trajectory forecasts. Dosage statistics show that HPAC forecasts driven by the high-resolution COAMPS output had fewer missed forecasts, greater overlap between the predicted and observed plumes, and a greater number of predictions within a factor of 2 of the observations. Visual inspection of the dosage trajectories also indicated that the high-resolution output produced more realistic results. The FB and NMSE statistics did not indicate significant improvement primarily because of the large number of missed forecasts with the coarse grid forcing. Error magnitudes were underestimated because the plumes missed the sampling network. The MOE and FAC2 estimates were mostly unaffected by the undersampling errors because of their dependence on the area of overlap. Dosage overestimates were less of a problem with the coarse grid forcing because of a negative wind speed bias on the fine grid. Bias errors of this nature are generally correctable through adjustments to the physics packages.

Although the dosage statistics showed improvement with increasing grid resolution, wind speed and direction statistics derived from instantaneous comparisons at the surface and upper-air stations did not. Murphy (1988) showed that the RMS error is dependent upon the individual variances of the two fields being compared, as well as the correlation between the fields. As grid resolution increases, forecast variance increases and correlations between the predicted and observed fields generally decrease. As a result, RMS scores become increasingly sensitive to small displacements and quasi-random errors. Thus, as Rife and Davis (2005) point out, RMS scores are not good indicators of forecast skill at high resolution. Furthermore, dispersion forecasts are sensitive to integrated errors along a trajectory. Uncorrelated errors may largely cancel one another along a trajectory, and instantaneous, pointwise statistics do not measure error correlations. Errors on coarse grids are likely to be correlated over large distances, simply because of the area encompassed by each grid cell.
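Murphy's decomposition can be restated explicitly (standard form, with overbars denoting means, s the standard deviations, and r_{fo} the forecast-observation correlation):

```latex
\mathrm{MSE} = (\bar{f} - \bar{o})^2 + s_f^2 + s_o^2 - 2\, s_f s_o r_{fo}
```

Higher resolution raises s_f without a compensating rise in r_{fo}, so the MSE (and hence the RMS) can grow even when the added variance is physically realistic.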

Realistic depiction of the topography likely played a major role in the improved forecasts. The entire valley was a subgrid-scale feature on the coarse grid, and any localized topographically forced flows were not well simulated. The general flow field was far more realistically depicted by the fine grid, though some of the mesoscale features were misplaced. Since topography and other fixed physiographic features such as coastlines and even urban areas essentially represent a boundary value problem, resolving them improves the forecast if the synoptic-scale flow is well simulated. Mass et al. (2002) point out that if the synoptic pattern is not well simulated then resolution of the mesoscale could lead to large errors. Clearly, more verification tests of this nature are required. The synoptic flow was generally well simulated over the DP26 experiment, perhaps because the experiments were conducted on relatively quiescent days.

The high-resolution model forcing produced useful results, but dispersion forecasts driven directly by the mesonet winds were superior. These results are not surprising given the high time and space resolution of the observations: soundings were released within 30 km and 1 h of the release point, and eight mesonet stations supplied wind information at 15-min intervals. Operationally, however, observations will not always be so plentiful. Many areas are not well instrumented, power and communications may be disrupted in an emergency, and in battlefield situations no observations may exist at all. In these cases, the atmospheric model forecasts may be the only tool available. Models are also the only tools available to provide predictions for planning purposes.

In the future, the fusion of atmospheric model analyses with the available observations will provide improved forcing for dispersion forecasts. Until recently, the implementation of atmospheric analyses near the surface was not feasible because computer constraints required relatively coarse horizontal grid spacings (greater than 5 km). At that resolution, large inconsistencies often exist between the simulated and the true topography: entire mountains and valleys are not represented, and significant aliasing occurs if the observations are not heavily averaged. Furthermore, differences between the simulated and true ground height require assumptions to correct the boundary layer wind and temperature profiles. High-resolution grids (grid spacings less than 5 km) mitigate the problem significantly by resolving the topography. New three-dimensional variational assimilation systems like the Naval Research Laboratory Atmospheric Variational Data Assimilation System (NAVDAS; Daley and Barker 2001) provide realistic covariance functions that vary according to physiographic and even meteorological differences. Isolated observations are better incorporated because the covariance functions can be more realistically simulated. This surface data analysis is currently being implemented in NAVDAS to provide real-time, realistic flow fields that can be used directly in HPAC and the Joint Effects Model.

Acknowledgments

This research is supported by the Defense Threat Reduction Agency (DTRA) through Program Element B04MSB1001. The NOGAPS data archival was supported in part by a grant of high-performance computing (HPC) time from the U.S. Department of Defense Major Shared Resource Center, Stennis Space Center, Mississippi. The work was performed on a Sun F12000 computer. Computing time was also supported by an HPC grant from Fleet Numerical Meteorological and Oceanographic Center (FNMOC) as part of their Distributed Center for computing. The work was performed on an SGI Origin supercomputer. The DP26 data were supplied by the George Mason University (GMU) dispersion data archive, and Dr. Joseph C. Chang of the Homeland Security Institute provided valuable input through scientific discussions and assistance with the dataset. Three anonymous reviewers made valuable suggestions to improve this work.

REFERENCES

  • Baldwin, M. E., and J. S. Kain, 2004: Examining the sensitivity of various performance measures. Preprints, 17th Conf. on Probability and Statistics in the Atmospheric Sciences, Seattle, WA, Amer. Meteor. Soc., CD-ROM, 2.9.

  • Barker, E. H., 1992: Design of the Navy’s multivariate optimum interpolation analysis system. Wea. Forecasting, 7, 220–231.

  • Bernardet, L. R., L. D. Grasso, J. E. Nachamkin, C. A. Finley, and W. R. Cotton, 2000: Simulating convective events using a high-resolution mesoscale model. J. Geophys. Res., 105, 14963–14982.

  • Biltoft, C. A., 1998: Dipole Pride 26: Phase II of Defense Special Weapons Agency transport and dispersion model validation. DPG Doc. DPG-FR-98-001, prepared for Defense Threat Reduction Agency by Meteorology and Obscurants Divisions, West Desert Test Center, U.S. Army Dugway Proving Ground, Dugway, UT, 77 pp.

  • Casati, B., G. Ross, and D. B. Stephenson, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141–154.

  • Chang, J. C., P. Franzese, K. Chayantrakom, and S. R. Hanna, 2003: Evaluations of CALPUFF, HPAC, and VLSTRACK with two mesoscale field datasets. J. Appl. Meteor., 42, 453–466.

  • Colle, B. A., C. F. Mass, and D. Ovens, 2001: Evaluation of the timing and strength of MM5 and Eta surface trough passages over the eastern Pacific. Wea. Forecasting, 16, 553–572.

  • Daley, R., and E. Barker, 2001: NAVDAS Source Book 2001: NRL Atmospheric Variational Data Assimilation System. NRL/PU/7530-01-441, Naval Research Laboratory, Monterey, CA, 163 pp.

  • Davies, H. C., 1976: A lateral boundary formulation for multi-level prediction models. Quart. J. Roy. Meteor. Soc., 102, 405–418.

  • Doyle, J. D., M. A. Shapiro, Q. Jiang, and D. L. Bartels, 2005: Large amplitude mountain wave breaking over Greenland. J. Atmos. Sci., 62, 3106–3126.

  • DTRA, 1999: HPAC hazard prediction and assessment capability, version 3.2. Defense Threat Reduction Agency, Alexandria, VA, 406 pp.

  • Ebert, E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202.

  • Efron, B., 1987: Better bootstrap confidence intervals. J. Amer. Stat. Assoc., 82, 171–185.

  • Efron, B., and R. J. Tibshirani, 1993: An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, No. 57, Chapman and Hall, 436 pp.

  • Hanna, S. R., and R. Yang, 2001: Evaluations of mesoscale models’ simulations of near-surface winds, temperature gradients, and mixing depths. J. Appl. Meteor., 40, 1095–1104.

  • Hanna, S. R., J. C. Chang, and D. G. Strimaitis, 1993: Hazardous gas model evaluation with field observations. Atmos. Environ., 27A, 2265–2285.

  • Harvey Jr., L. O., K. R. Hammond, C. M. Lusk, and E. F. Mross, 1992: Application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120, 863–883.

  • Hodur, R. M., 1997: The Naval Research Laboratory’s Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS). Mon. Wea. Rev., 125, 1414–1430.

  • Hogan, T. F., M. S. Peng, J. A. Ridout, and W. M. Clune, 2002: A description of the impact of changes to NOGAPS convection parameterization and the increase in resolution to T239L30. NRL Memo. Rep. NRL/MR/7530-02-52, Naval Research Laboratory, Monterey, CA, 10 pp.

  • Jiang, Q., J. D. Doyle, and R. B. Smith, 2005: Blocking, descent and gravity waves: Observations and modeling of a MAP northerly föhn event. Quart. J. Roy. Meteor. Soc., 131, 675–701.

  • Louis, J-F., 1979: A parametric model of vertical eddy fluxes in the atmosphere. Bound.-Layer Meteor., 17, 187–202.

  • Manobianco, J., and P. A. Nutter, 1999: Evaluation of the 29-km Eta Model. Part II: Subjective verification over Florida. Wea. Forecasting, 14, 18–37.

  • Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? The results of two years of real-time numerical weather prediction over the Pacific Northwest. Bull. Amer. Meteor. Soc., 83, 407–430.

  • Mellor, G. L., and T. Yamada, 1974: A hierarchy of turbulence closure models for planetary boundary layers. J. Atmos. Sci., 31, 1791–1806.

  • Mosca, S., G. Graziani, W. Klug, R. Bellasio, and R. Bianconni, 1998: A statistical methodology for the evaluation of long-range dispersion models: An application to the ETEX exercise. Atmos. Environ., 32, 4307–4324.

  • Murphy, A. H., 1988: Skill scores based on the mean square error and their relationships to the correlation coefficient. Mon. Wea. Rev., 116, 2417–2424.

  • Nachamkin, J. E., 2004: Mesoscale verification using meteorological composites. Mon. Wea. Rev., 132, 941–955.

  • Pullen, J. D., J. P. Boris, T. Young, G. Patnaik, and J. Iselin, 2005: A comparison of contaminant plume statistics from a Gaussian puff and urban CFD model for two large cities. Atmos. Environ., 39, 1049–1068.

  • Rife, D. L., and C. A. Davis, 2005: Verification of temporal variations in mesoscale numerical wind forecasts. Mon. Wea. Rev., 133, 3368–3381.

  • Sykes, R. I., S. F. Parker, D. S. Henn, C. P. Cerasoli, and L. P. Santos, 1998: PC-SCIPUFF version 1.2PD, technical documentation. ARAP Rep. 718, Titan Research and Technology Division, Titan Corp., Princeton, NJ, 172 pp.

  • Warner, S., N. Platt, and J. F. Heagy, 2004: User-oriented two-dimensional measure of effectiveness for the evaluation of transport and dispersion models. J. Appl. Meteor., 43, 58–73.

  • Warner, T. T., R-S. Sheu, J. F. Bowers, R. I. Sykes, G. C. Dodd, and D. S. Henn, 2002: Ensemble simulations with coupled atmospheric dynamic and dispersion models: Illustrating uncertainties in dosage simulations. J. Appl. Meteor., 41, 488–504.

  • Watson, T. B., R. E. Keislar, B. Reese, D. H. George, and C. A. Biltoft, 1998: The Defense Special Weapons Agency Dipole Pride 26 field experiment. NOAA/Air Resources Laboratory Tech. Memo. ERL ARL-225, 90 pp.

  • White, G. B., J. Paegle, W. J. Steenburgh, J. D. Horel, R. T. Swanson, L. K. Cook, D. J. Onton, and J. G. Miles, 1999: Short-term forecast validation of six models. Wea. Forecasting, 14, 84–108.

  • Zepeda-Arce, J., E. Foufoula-Georgiou, and K. K. Droegemeier, 2000: Space–time rainfall organization and its role in validating quantitative precipitation forecasts. J. Geophys. Res., 105, 10129–10146.

Fig. 1.

Terrain elevation (m) for the DP26 experiment at the Yucca Flat test site in Nevada. Whole-air samplers are depicted by the thick lines, with 30 samplers per line. Solid circles represent surface meteorological MEDA stations, and open triangles represent the four possible gas release sites. The soundings used in this study were released from BJY. The map covers an area of 30 km × 35 km, with the southwest corner roughly corresponding to 36.9°N, 116.3°W. This figure was taken from Chang et al. (2003) through the courtesy of J. C. Chang.

Citation: Journal of Applied Meteorology and Climatology 46, 11; 10.1175/2007JAMC1570.1

Fig. 2.

The nested grids used for the atmospheric model simulation are shown. Horizontal grid spacing for the innermost grid (grid 4) was 1 km. Spacings for the remaining grids were 3, 9, and 27 km, respectively.


Fig. 3.

Statistics comparing the predicted and observed wind speed at 10 m AGL. The statistics for each forecast hour were averaged over all available MEDA stations for all 14 DP26 trials. (a) Bias and (b) RMS error (m s−1), with dashed and solid lines indicating coarse- and fine-grid values, respectively. (c) The average observed wind speed (m s−1) over all of the MEDA stations.


Fig. 4.

As in Fig. 3, but for the 10-m wind direction (a) bias and (b) RMS error (°).


Fig. 5.

Upper-air statistics comparing the atmospheric forecasts with the soundings released from BJY (Fig. 1). Bias and RMS errors are represented by thin and thick lines, respectively; errors on grid 4 and grid 1 are represented by solid and dashed lines, respectively. Errors from all forecast times (0–18 h) are combined because too few soundings were available at any single time. (a) Temperature errors (°C), (b) wind speed errors (m s−1), and (c) wind direction errors (°).


Fig. 6.

Scatter diagrams of predicted vs observed 3-h integrated maximum SF6 dosages (ppt h) along the whole-air sampling lines. HPAC forecasts using atmospheric forcing from the (a) coarse grid and (b) fine grid.


Fig. 7.

Measure-of-effectiveness 95% confidence regions for the 14 coarse-grid (black) and fine-grid (gray) releases, based on a dosage threshold of 30 ppt h.


Fig. 8.

The 12-h forecast from grid 1 and the accompanying MEDA observations, both valid at 1200 UTC 8 Nov 1996. Model topography (m) is shaded, and forecast winds (m s−1) are depicted by thin barbs. The only grid point within the DP26 region was located near 37.1°N, 116.2°W. Observed winds (m s−1) are depicted by thick barbs. For all barbs, each half-barb increment equals 2.5 m s−1. Note that data from additional MEDA stations, beyond those used in the RMS and bias statistics, were available at this time.


Fig. 9.

As in Fig. 8, except that topography and wind forecasts on grid 4 are displayed.


Fig. 10.

Three-hour HPAC forecasts of total SF6 dosage (ppt h) from the 8 Nov trial, depicted according to the legend in (a). Atmospheric input is from (a) grid 1, (b) grid 4, and (c) the observations. Note that the gas cloud in (a) advected off the grid after 1.21 h. The distance scale at the bottom of each panel represents 10 km.


Table 1.

Summary of atmospheric model errors in comparison with the MEDA winds measured at 10 m AGL. Values have been averaged over all forecast times (0–18 h, at 3-h intervals) and all 14 DP26 trials.

Table 2.

Summary of performance measures comparing the HPAC forecasts, using atmospheric forcing from grid 1 and grid 4, with the whole-air samplers for the maximum dosage (ppt h) anywhere along a sampling line. Measures include the fraction of predictions within a factor of 2 of the observations, the normalized mean square error, the fractional bias, the correlation between the observed and forecast dosages, and the Spearman rank correlation of the same quantities. The average, standard deviation, and highest and second-highest values are also shown. The sample size is 42.

Table 3.

Summary of performance measures comparing the HPAC forecasts, using atmospheric forcing from grid 1 and grid 4, with the whole-air samplers for the indicated dosage thresholds. The threat score and bias are defined by Eqs. (9) and (10) in the text. Each score represents the average over the 14 Dipole Pride trials.


1. COAMPS is a registered trademark of the Naval Research Laboratory.

2. Results for the 100 and 500 ppt h thresholds were similar, though the number of points was considerably reduced.

3. Gridded observations were obtained from the archive at George Mason University, Fairfax, Virginia.
