This work was supported through NSSL and NOAA Earth System Research Laboratory’s Global Systems Division (GSD) director’s discretionary funds and through NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA08OAR4320904, U.S. Department of Commerce. Dr. Kim Elmore is thanked for his very helpful discussion of confidence intervals. Tara Jensen and everyone else associated with the use of MET at the DTC are thanked for their help in using MET. Drs. Jack Kain and Fred Carr are thanked for their insight into model verification. Model Evaluation Tools (MET) was developed at the National Center for Atmospheric Research (NCAR) through a grant from the U.S. Air Force Weather Agency (AFWA). NCAR is sponsored by the U.S. National Science Foundation. The CAPS forecasts were supported by the NOAA CSTAR grant to CAPS, and were produced at the National Institute of Computational Science (NICS) at the University of Tennessee, and at the Oklahoma Supercomputing Center for Research and Education (OSCER). Kevin Thomas and Fanyou Kong were instrumental in producing the forecasts.
Baldwin, M. E., and J. S. Kain, 2006: Sensitivity of several performance measures to displacement error, bias, and event frequency. Wea. Forecasting, 21, 636–648.
Casati, B., 2010: New developments of the intensity-scale technique within the Spatial Verification Methods Intercomparison Project. Wea. Forecasting, 25, 113–143.
Casati, B., G. Ross, and D. B. Stephenson, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141–154.
Clark, A. J., and Coauthors, 2012: An overview of the 2010 Hazardous Weather Testbed Experimental Forecast Program Spring Experiment. Bull. Amer. Meteor. Soc., 93, 55–74.
Doswell, C. A., III, R. Davies-Jones, and D. L. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Wea. Forecasting, 5, 576–585.
DTC, 2011: MET: Version 3.0 Model Evaluation Tools user's guide. Developmental Testbed Center, Boulder, CO, 209 pp. [Available at http://www.dtcenter.org/met/users/docs/overview.php.]
Easterling, D. R., and P. J. Robinson, 1985: The diurnal variation of thunderstorm activity in the United States. J. Climate Appl. Meteor., 24, 1048–1058.
Ebert, E. E., 2008: Fuzzy verification of high-resolution gridded forecasts: A review and proposed framework. Meteor. Appl., 15, 51–64.
Gilleland, E., D. Ahijevych, B. G. Brown, B. Casati, and E. E. Ebert, 2009: Intercomparison of spatial forecast verification methods. Wea. Forecasting, 24, 1416–1430.
Gilleland, E., D. Ahijevych, B. G. Brown, and E. E. Ebert, 2010: Verifying forecasts spatially. Bull. Amer. Meteor. Soc., 91, 1365–1373.
Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923.
Hu, M., M. Xue, and K. Brewster, 2006a: 3DVAR and cloud analysis with WSR-88D level-II data for the prediction of the Fort Worth, Texas, tornadic thunderstorms. Part I: Cloud analysis and its impact. Mon. Wea. Rev., 134, 675–698.
Hu, M., M. Xue, J. Gao, and K. Brewster, 2006b: 3DVAR and cloud analysis with WSR-88D level-II data for the prediction of the Fort Worth, Texas, tornadic thunderstorms. Part II: Impact of radial velocity analysis via 3DVAR. Mon. Wea. Rev., 134, 699–721.
Kain, J. S., and Coauthors, 2010: Assessing advances in the assimilation of radar data and other mesoscale observations within a collaborative forecasting–research environment. Wea. Forecasting, 25, 1510–1521.
Lichtenfeld, S., M. A. Maier, A. J. Elliot, and R. Pekrun, 2009: The semantic red effect: Processing the word red undermines intellectual performance. J. Exp. Soc. Psychol., 45, 1273–1276.
Mittermaier, M., and N. Roberts, 2010: Intercomparison of spatial forecast verification methods: Identifying skillful spatial scales using the fractions skill score. Wea. Forecasting, 25, 343–354.
Roberts, N. M., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97.
Schwartz, C. S., and Coauthors, 2009: Next-day convection-allowing WRF model guidance: A second look at 2-km versus 4-km grid spacing. Mon. Wea. Rev., 137, 3351–3372.
Stephenson, D. B., B. Casati, C. A. T. Ferro, and C. A. Wilson, 2008: The extreme dependency score: A non-vanishing measure for forecasts of rare events. Meteor. Appl., 15, 41–50.
Vasiloff, S. V., and Coauthors, 2007: Improving QPE and very short term QPF: An initiative for a community-wide integrated approach. Bull. Amer. Meteor. Soc., 88, 1899–1911.
Wallace, J. M., 1975: Diurnal variations in precipitation and thunderstorm frequency over the conterminous United States. Mon. Wea. Rev., 103, 406–419.
Wurman, J., D. Dowell, Y. Richardson, P. Markowski, D. Burgess, L. Wicker, and H. Bluestein, 2012: The Second Verification of the Origins of Rotation in Tornadoes Experiment: VORTEX2. Bull. Amer. Meteor. Soc., 93, 1147–1170.
Xue, M., D.-H. Wang, J.-D. Gao, K. Brewster, and K. K. Droegemeier, 2003: The Advanced Regional Prediction System (ARPS) storm-scale numerical weather prediction and data assimilation. Meteor. Atmos. Phys., 82, 139–170.
Xue, M., and Coauthors, 2009: CAPS realtime multi-model convection-allowing ensemble and 1-km convection-resolving forecasts for the NOAA Hazardous Weather Testbed 2009 Spring Experiment. Preprints, 23rd Conf. on Weather Analysis and Forecasting/19th Conf. on Numerical Weather Prediction, Omaha, NE, Amer. Meteor. Soc., 16A.2. [Available online at http://ams.confex.com/ams/pdfpapers/154323.pdf.]
Xue, M., and Coauthors, 2010: CAPS realtime storm scale ensemble and high resolution forecasts for the NOAA Hazardous Weather Testbed 2010 Spring Experiment. Preprints, 25th Conf. on Severe Local Storms, Denver, CO, Amer. Meteor. Soc., 7B.3. [Available online at https://ams.confex.com/ams/25SLS/webprogram/Paper176056.html.]
A caveat: the climatology varies as the verification domain is moved from location to location (Hamill and Juras 2006).
For each forecast hour, the datasets were aggregated for each combination of neighborhood width and threshold, following Mittermaier and Roberts (2010). Given the FBS, FSS, and N values for each dataset, the summations in the numerator and denominator of Eq. (1) were recovered for the individual datasets and summed; the aggregated summations were then used to compute the aggregate FSS.
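The aggregation step described above can be sketched as follows. The function name and the (FBS, FSS, N) tuple layout are illustrative, not taken from MET; the sketch simply recovers each dataset's worst-case (denominator) summation from FBS/(1 − FSS), pools numerators and denominators across datasets, and then forms the final score.

```python
def aggregated_fss(stats):
    """Aggregate the FSS over several datasets.

    `stats` is a list of (FBS, FSS, N) tuples, one per dataset. For each
    dataset, N * FBS recovers the numerator summation of Eq. (1), and
    N * FBS / (1 - FSS) recovers the worst-case (denominator) summation.
    """
    numerator = sum(n * fbs for fbs, fss, n in stats)
    # FBS / (1 - FSS) is the reference (worst-case) FBS for that dataset;
    # note this recovery breaks down for a perfect forecast (FSS == 1, FBS == 0).
    denominator = sum(n * fbs / (1.0 - fss) for fbs, fss, n in stats)
    return 1.0 - numerator / denominator
```

Pooling the summations before dividing weights each dataset by its grid size N, so the aggregate FSS equals the FSS that would be obtained from the concatenated grids.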
Frequency biases depend strongly on the microphysical scheme used in a model, so these bias results and their effects on the verification metrics are specific to these models; different cloud microphysics schemes might yield different findings.
The bias-adjusted GSS from Mesinger (2008) was also computed (not shown) and showed smaller differences between CN and C0 through FHs 6–8.
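For reference, the unadjusted GSS (equitable threat score) underlying this comparison can be computed from a 2 × 2 contingency table as sketched below; the function name is illustrative, and the Mesinger (2008) bias adjustment itself, which replaces the hit count with a bias-corrected estimate, is not reproduced here.

```python
def gss(hits, false_alarms, misses, correct_negatives):
    """Gilbert skill score (equitable threat score) from a 2x2 contingency table."""
    total = hits + false_alarms + misses + correct_negatives
    # Hits expected by random chance, given the forecast and observed frequencies.
    hits_random = (hits + false_alarms) * (hits + misses) / total
    return (hits - hits_random) / (hits + false_alarms + misses - hits_random)
```

Because hits_random grows with the number of forecast events, the GSS penalizes overforecasting relative to the plain threat score, which is why a frequency-bias adjustment such as Mesinger's matters when comparing models with different biases.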
From this point forward the base-rate bar graphs are excluded from the results and discussion, but they remain in the figures for the interested reader.
In the ISS plots, the left ordinate represents the spatial scale of the binary forecast errors, not simply the neighborhood size as in the neighborhood method.