• Benjamin, S., S. S. Weygandt, J. M. Brown, and G. DiMego, 2011: Beyond the 2011 Rapid Refresh: Hourly updated numerical weather prediction guidance from NOAA for aviation from 2012–2020. Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 11.1. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191236.htm.]

  • Burghardt, B. J., C. Evans, and P. J. Roebber, 2014: Assessing the predictability of convection initiation in the high plains using an object-based approach. Wea. Forecasting, 29, 403–418, doi:10.1175/WAF-D-13-00089.1.

  • Cai, H., M. Steiner, J. Pinto, P. He, S. Dettling, D. Albo, J. H. Gotway, and B. Brown, 2009: Scientific assessment and diagnostic evaluation of CoSPA 0–8 hour blended forecasts. The World Weather Research Program Symposium on Nowcasting and Very Short Range Forecasting, Whistler, BC, Canada, WMO, 7.2. [Available online at https://www.wmo.int/pages/prog/arep/wwrp/new/documents/WSN09_program_booklet.pdf.]

  • Cai, H., M. Steiner, J. Pinto, B. G. Brown, and P. He, 2011: Assessment of numerical weather prediction model storm forecasts using an object-based approach. Preprints, 24th Conf. on Weather Analysis and Forecasting/20th Conf. on Numerical Weather Prediction, Seattle, WA, Amer. Meteor. Soc., 8A.5. [Available online at http://ams.confex.com/ams/91Annual/webprogram/Paper182479.html.]

  • Carbone, R. E., J. D. Tuttle, D. Ahijevych, and S. B. Trier, 2002: Inferences of predictability associated with warm season precipitation episodes. J. Atmos. Sci., 59, 2033–2056, doi:10.1175/1520-0469(2002)059<2033:IOPAWW>2.0.CO;2.

  • Casati, B., G. Ross, and D. B. Stephenson, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141–154, doi:10.1017/S1350482704001239.

  • Case, J. L., S. V. Kumar, J. Srikishen, and G. J. Jedlovec, 2011: Improving numerical weather predictions of summertime precipitation over the southeastern United States through a high-resolution initialization of the surface state. Wea. Forecasting, 26, 785–807, doi:10.1175/2011WAF2222455.1.

  • Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

  • Curtis, R. A., S. S. Weygandt, T. G. Smirnova, S. Benjamin, P. Hofmann, E. P. James, and D. A. Koch, 2010: High Resolution Rapid Refresh (HRRR): Recent enhancements and evaluation during the 2010 convective season. Preprints, 25th Conf. on Severe Local Storms, Denver, CO, Amer. Meteor. Soc., 9.2. [Available online at https://ams.confex.com/ams/25SLS/techprogram/paper_175722.htm.]

  • Davis, C., B. G. Brown, and R. Bullock, 2006a: Object-based verification of precipitation forecasts. Part I: Methods and application to mesoscale rain areas. Mon. Wea. Rev., 134, 1772–1784, doi:10.1175/MWR3145.1.

  • Davis, C., B. G. Brown, and R. Bullock, 2006b: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

  • Davis, C., B. G. Brown, R. Bullock, and J. Halley-Gotway, 2009: The method for object-based diagnostic evaluation (MODE) applied to numerical forecasts from the 2005 NSSL/SPC Spring Program. Wea. Forecasting, 24, 1252–1267, doi:10.1175/2009WAF2222241.1.

  • Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, Tracking, Analysis and Nowcasting—A radar-based methodology. J. Atmos. Oceanic Technol., 10, 785–797, doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.

  • Dupree, W., D. Morse, M. Chan, X. Tao, C. Reiche, H. Iskenderian, and M. Wolfson, 2009: The 2008 CoSPA forecast demonstration. Preprints, Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Phoenix, AZ, Amer. Meteor. Soc., P1.1. [Available online at https://ams.confex.com/ams/pdfpapers/151488.pdf.]

  • Ebert, E. E., 2008: Fuzzy verification of high resolution gridded forecasts: A review and proposed framework. Meteor. Appl., 15, 51–64, doi:10.1002/met.25.

  • Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202, doi:10.1016/S0022-1694(00)00343-7.

  • Gilleland, E., D. Ahijevych, B. G. Brown, B. Casati, and E. E. Ebert, 2009: Intercomparison of spatial forecast verification methods. Wea. Forecasting, 24, 1416–1430, doi:10.1175/2009WAF2222269.1.

  • Greene, D. R., and R. A. Clark, 1972: Vertically integrated liquid water—A new analysis tool. Mon. Wea. Rev., 100, 548–552, doi:10.1175/1520-0493(1972)100<0548:VILWNA>2.3.CO;2.

  • Huang, L., and Z. Meng, 2014: Quality of the target area for metrics with different nonlinearities in a mesoscale convective system. Mon. Wea. Rev., 142, 2379–2397, doi:10.1175/MWR-D-13-00244.1.

  • Johnson, A., and X. Wang, 2013: Object-based evaluation of a storm-scale ensemble during the 2009 NOAA Hazardous Weather Testbed Spring Experiment. Mon. Wea. Rev., 141, 1079–1098, doi:10.1175/MWR-D-12-00140.1.

  • Johnson, A., X. Wang, F. Kong, and M. Xue, 2013: Object-based evaluation of the impact of horizontal grid spacing on convection-allowing forecasts. Mon. Wea. Rev., 141, 3413–3425, doi:10.1175/MWR-D-13-00027.1.

  • Phillips, C., J. Pinto, M. Steiner, R. Rasmussen, N. Oien, and R. Bateman, 2008: Statistical assessment of explicit model forecasts of convection using a new object-based approach. Preprints, 13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., 11.5. [Available online at https://ams.confex.com/ams/pdfpapers/134547.pdf.]

  • Pinto, J., W. Dupree, S. Weygandt, M. Wolfson, S. Benjamin, and M. Steiner, 2010: Advances in the Consolidated Storm Prediction for Aviation (CoSPA). Preprints, 14th Conf. on Aviation, Range, and Aerospace Meteorology, Atlanta, GA, Amer. Meteor. Soc., J11.2. [Available online at https://ams.confex.com/ams/pdfpapers/163811.pdf.]

  • Pinto, J., J. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh model’s ability to predict mesoscale convective systems using object-based evaluation. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

  • Roberts, N. M., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97, doi:10.1175/2007MWR2123.1.

  • Skamarock, W. C., 2004: Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Wea. Rev., 132, 3019–3032, doi:10.1175/MWR2830.1.

  • Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp. [Available online at http://www2.mmm.ucar.edu/wrf/users/docs/arw_v2_070111.pdf.]

  • Skok, G., J. Tribbia, and J. Rakovec, 2010: Object-based analysis and verification of WRF Model precipitation in the low- and midlatitude Pacific Ocean. Mon. Wea. Rev., 138, 4561–4575, doi:10.1175/2010MWR3472.1.

  • Thompson, G., R. Rasmussen, and K. Manning, 2004: Explicit forecasts of winter precipitation using an improved microphysics scheme. Part I: Description and sensitivity analysis. Mon. Wea. Rev., 132, 519–542, doi:10.1175/1520-0493(2004)132<0519:EFOWPU>2.0.CO;2.

  • Weygandt, S. S., T. G. Smirnova, S. G. Benjamin, K. J. Brundage, S. R. Sahm, C. R. Alexander, and B. E. Schwartz, 2009: The High Resolution Rapid Refresh (HRRR): An hourly updated convection resolving model utilizing radar reflectivity assimilation from the RUC/RR. Preprints, 23rd Conf. on Weather Analysis and Forecasting/19th Conf. on Numerical Weather Prediction, Omaha, NE, Amer. Meteor. Soc., 15A.6. [Available online at https://ams.confex.com/ams/23WAF19NWP/techprogram/paper_154317.htm.]

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.

  • Wolff, J. K., M. Harrold, T. Fowler, J. H. Gotway, L. Nance, and B. G. Brown, 2014: Beyond the basics: Evaluating model-based precipitation forecasts using traditional, spatial, and object-based methods. Wea. Forecasting, 29, 1451–1472, doi:10.1175/WAF-D-13-00135.1.

  • Wolfson, M. M., and Coauthors, 2008: Consolidated Storm Prediction for Aviation (CoSPA). Integrated Communications, Navigation and Surveillance Conf., Bethesda, MD, IEEE, 1–19, doi:10.1109/ICNSURV.2008.4559190.

  • Xiao, Q., and J. Sun, 2007: Multiple-radar data assimilation and short-range quantitative precipitation forecasting of a squall line observed during IHOP_2002. Mon. Wea. Rev., 135, 3381–3404, doi:10.1175/MWR3471.1.


Object-Based Evaluation of a Numerical Weather Prediction Model’s Performance through Forecast Storm Characteristic Analysis

  • 1 National Center for Atmospheric Research, Boulder, Colorado
  • 2 U.S. Army Research Laboratory, White Sands Missile Range, New Mexico

Abstract

Traditional pixel-versus-pixel forecast evaluation scores such as the critical success index (CSI) provide a simple way to compare the performance of different forecasts; however, they offer little information on how to improve a particular forecast. This paper demonstrates the additional information that an object-based forecast evaluation tool such as the Method for Object-Based Diagnostic Evaluation (MODE) can provide when assessing numerical weather prediction models’ convective storm forecasts. Forecast storm attributes evaluated by MODE in this paper include storm size, intensity, orientation, aspect ratio, complexity, and number of storms. Three weeks of the High Resolution Rapid Refresh (HRRR) model’s precipitation forecasts during the summer of 2010 over the eastern two-thirds of the contiguous United States were evaluated to demonstrate the methodology. It is found that the HRRR model forecast convective storm characteristics rather well, both as a function of time of day and as a function of storm size, although significant biases exist, especially in storm number and storm size. Another interesting finding is that the model’s ability to forecast new storm initiation varies substantially by region, probably as a result of differing skill in forecasting convection driven by different forcing mechanisms (i.e., diurnal heating vs synoptic-scale frontal systems).

Current affiliation: U.S. Army Research Laboratory, White Sands Missile Range, New Mexico.

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

Corresponding author address: Huaqing Cai, U.S. Army Research Laboratory, White Sands Missile Range, NM 88002. E-mail: huaqing.cai.civ@mail.mil


1. Introduction

Traditional pixel-versus-pixel forecast verification scores based on contingency-table statistics have been widely used in forecast evaluations for many years, owing their popularity to the ease with which statistics can be generated and forecasts compared. However, single-number scores such as the critical success index (CSI; Wilks 2006) offer little diagnostic information as to why a particular forecast may be good or bad and thus provide little useful feedback on how to improve the forecast. Recognizing the limitations of pixel-versus-pixel evaluation methods, researchers have developed a number of alternative tools (e.g., Ebert and McBride 2000; Casati et al. 2004; Davis et al. 2006a,b; Ebert 2008; Roberts and Lean 2008; Burghardt et al. 2014; Clark et al. 2014). In fact, a special issue of Weather and Forecasting was dedicated to intercomparisons of various spatial-verification methods (e.g., Gilleland et al. 2009). In this paper, we document a detailed evaluation of a numerical weather prediction (NWP) model’s storm forecasts using an object-based forecast evaluation tool, the Method for Object-Based Diagnostic Evaluation (MODE; Davis et al. 2006a,b, 2009). The MODE technique is applied to convective storm forecasts produced by the High Resolution Rapid Refresh model (HRRR; Weygandt et al. 2009; Benjamin et al. 2011), developed by the Global Systems Division (GSD) of the National Oceanic and Atmospheric Administration’s (NOAA) Earth System Research Laboratory (ESRL) with major funding from the Federal Aviation Administration (FAA). Although the HRRR has been used to support aviation weather prediction and the severe storm community for many years (e.g., Wolfson et al. 2008; Curtis et al. 2010), its capability to forecast convective storms has yet to be thoroughly evaluated in a systematic way through an object-based approach, except for a recent study by Pinto et al. (2015). Therefore, the focus of this paper is on showing how object-based forecast evaluation tools such as MODE can provide diagnostic insights into the HRRR model’s ability to forecast convective storms.

MODE identifies objects in the forecast and observed fields by applying a convolution filter and a threshold. It can be applied to a number of fields, including accumulated rainfall, reflectivity, and vertically integrated liquid water [VIL; kg m−2; Greene and Clark (1972)]. The convolution process smooths the original field, while the threshold determines the objects. A number of attributes for each object can then be readily calculated, including size, intensity, aspect ratio, orientation, and complexity. The forecast and observed objects can also be matched using a fuzzy-logic algorithm. Two kinds of forecast performance statistics, namely metrics with or without matching of objects, can be derived from MODE output. The latter simply compiles and compares single-object attribute statistics from both forecasts and observations without matching each individual forecast object with its corresponding observed object; the former first matches the forecast objects with observed objects and then calculates performance statistics such as the percentage of forecast objects that matched observed objects. Davis et al. (2006a,b) showed that both kinds of metrics are useful for evaluating NWP model storm forecast performance. Their study was based on precipitation predictions made with the Weather Research and Forecasting (WRF) Model (Skamarock et al. 2005). Storm number, orientation, and aspect ratio as a function of storm size were among the storm attributes investigated. They found that the WRF Model (http://wrf-model.org) was good at reproducing storm characteristics except for small storms, for which some model bias was noted. Phillips et al. (2008) evaluated both WRF and the Fifth-generation Pennsylvania State University–National Center for Atmospheric Research (NCAR) Mesoscale Model (MM5) using another object-based storm analysis tool, Thunderstorm Identification Tracking Analysis and Nowcasting (TITAN; Dixon and Wiener 1993). They analyzed the number of storms and mean storm size as a function of time of day and found that WRF produced more storms than MM5 during the late afternoon storm-initiation period (~1700–2000 UTC). Also using TITAN, Pinto et al. (2015) performed a detailed assessment of HRRR’s ability to predict mesoscale convective systems (MCSs) over different regions of the contiguous United States (CONUS). Despite model bias correction and VIL threshold optimization to best match forecast and observed MCSs, a significant model bias was found in the number of MCSs predicted by the HRRR. To the best of the authors’ knowledge, Pinto et al. (2015) is the only other object-based evaluation of the HRRR in the literature besides the work reported in this paper. Naturally, our research shares many similarities with Pinto et al. (2015), but it also differs significantly in the following aspects. 1) We used the HRRR output of simulated radar VIL directly, without a VIL bias correction, since we are interested in HRRR’s original ability to forecast convective storms without any adjustment of model output aimed at better evaluation results. In contrast, VIL bias correction and threshold optimization were imperative to the study by Pinto et al. (2015) because identification of MCSs is highly sensitive to the VIL bias and threshold. 2) Our evaluation includes all convective storms, not only the MCSs that were the focus of Pinto et al. (2015). 3) We used MODE, a well-developed community model-evaluation tool supported by NCAR, instead of TITAN.
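
To make the convolution–threshold procedure concrete, the following is a minimal sketch in Python of the object-identification step described above. It is an illustration of the technique, not the MET/MODE source code; the function name and arguments are our own, and the radius and threshold values anticipate the choices discussed in section 2.

```python
# Minimal sketch of MODE-style object identification (not the MET source):
# convolve the raw field with a circular filter, threshold the smoothed
# field, and label the connected regions as storm objects.
import numpy as np
from scipy import ndimage

def identify_objects(vil, radius_gridpts=3, threshold=3.5):
    """Label storm objects in a 2-D VIL field (kg m**-2)."""
    # Flat circular (disk) kernel, normalized so the convolution is a
    # local areal average of the field.
    y, x = np.ogrid[-radius_gridpts:radius_gridpts + 1,
                    -radius_gridpts:radius_gridpts + 1]
    disk = (x**2 + y**2 <= radius_gridpts**2).astype(float)
    disk /= disk.sum()

    # Smoothing gives the objects smooth outlines and removes interior holes.
    smoothed = ndimage.convolve(vil, disk, mode="constant", cval=0.0)

    # Thresholding the smoothed field defines the objects; connected
    # regions of the mask become individual storms.
    labels, n_objects = ndimage.label(smoothed >= threshold)
    return labels, n_objects
```

Attribute statistics are then computed from the raw VIL values inside each labeled region, since MODE retains the original intensities within an object.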

Following the pioneering work of Davis et al. (2006a,b), a number of researchers have applied MODE to various forecast evaluation problems to address their own particular needs (e.g., Davis et al. 2009; Cai et al. 2009, 2011; Skok et al. 2010; Case et al. 2011; Johnson and Wang 2013; Johnson et al. 2013; Burghardt et al. 2014; Clark et al. 2014; Wolff et al. 2014). For instance, MODE was used to evaluate WRF Model precipitation forecasts over the low- and midlatitude Pacific Ocean (Skok et al. 2010), demonstrate the improvement of summertime precipitation forecasts over the southeastern United States through high-resolution initialization of the surface state (Case et al. 2011), diagnose the impact of horizontal grid spacing on convection-allowing forecasts (Johnson et al. 2013), and assess a storm-scale ensemble during the 2009 NOAA Hazardous Weather Testbed Spring Experiment (Johnson and Wang 2013). Most recently, a new version of MODE capable of tracking objects through time (known as MODE time domain) was employed by Clark et al. (2014) to evaluate differences in model performance caused by different microphysics schemes used in the WRF. Although applied most frequently to high-resolution limited-area models, MODE has also been used to assess model precipitation forecasts at medium (~5–20 km) and coarse (>20 km) resolutions in global models (e.g., Wolff et al. 2014). One unique aspect of Wolff et al. (2014) is its comprehensive evaluation of precipitation forecasts from multiple models using multiple thresholds and multiple verification metrics, both traditional and object based.

Our initial attempt to evaluate the HRRR using MODE for aviation weather was documented by Cai et al. (2009, 2011). In this paper, we follow the work of Davis et al. (2006a,b) and expand upon the work of Cai et al. (2009, 2011) by performing a detailed evaluation of storm characteristics forecast by the HRRR model, which builds upon the Advanced Research version of WRF (ARW), to address the following questions:

  1. How well is the model able to forecast new storm initiation?
  2. How well is the model able to reproduce the diurnal variations of storm characteristics such as storm number, size, intensity, orientation, aspect ratio, and complexity?
  3. How well is the model able to forecast storm characteristics as a function of storm size?
  4. What is the regional variation in terms of the model performance?

All of the above questions will be addressed using single-object statistics compiled from forecasts and their corresponding observations; no matching and merging is required in this analysis. Although it is well known that model performance can vary with geographical location, most MODE-based studies still aggregate verification results over fairly large domains (e.g., Johnson and Wang 2013; Clark et al. 2014), whose subregions may be controlled by different large-scale dynamics. This paper differs from most previous MODE studies by specifically comparing and contrasting MODE analyses over three different domains (a large domain and two subdomains within it) in order to shed light on the regional variation of model performance. A similar approach was taken by Pinto et al. (2015) in their object-based assessment of the HRRR using TITAN.

The rest of the paper is organized as follows. Section 2 describes the dataset and methodology. Sections 3 and 4 present the object-based evaluation results of the 4-, 8-, and 12-h convective storm forecasts by the HRRR model (for a large domain and two subdomains, respectively). Section 5 summarizes this paper and suggests future work.

2. Dataset and methodology

The 2010 version of the HRRR model used in this study is an hourly updated, convection-allowing ARW with 3-km horizontal grid spacing, initialized from the 13-km radar-enhanced Rapid Update Cycle (RUC) model. The parent RUC model includes a convective parameterization, but the HRRR, owing to its finer grid spacing, does not. [For details of the HRRR model configuration, the reader is referred to http://ruc.noaa.gov/hrrr/; see also Weygandt et al. (2009) and Benjamin et al. (2011).] The HRRR has received many updates since 2010; therefore, the evaluation results presented in this paper do not necessarily apply to the current HRRR. A caveat of this kind accompanies any model evaluation, because model version and configuration greatly affect model performance. Thus, the objective of this paper is not only to evaluate a particular model per se, but also to demonstrate what diagnostic information an object-based forecast evaluation tool such as MODE can provide for evaluating model forecasts.

The HRRR model forecasts are postprocessed to produce VIL (Greene and Clark 1972), which is intended to mimic VIL derived from radar. Radar-derived VIL (i.e., the observed VIL) is commonly used in the aviation weather community to represent storms (e.g., Wolfson et al. 2008; Dupree et al. 2009; Pinto et al. 2010, 2015); therefore, it is important to evaluate how well the HRRR forecast VIL compares with the VIL observed by radar. There are, however, two VIL fields in the HRRR output. One is named VIL and is calculated directly from the HRRR microphysics; the other is named radar VIL, a misnomer since it is not derived from radar as the name would suggest but rather from the HRRR-simulated radar reflectivity. The radar VIL from HRRR is computed in three steps (Pinto et al. 2010). First, simulated radar reflectivity is calculated at each model grid point using the equations given by Thompson et al. (2004). Second, the simulated radar reflectivity at each grid point is converted to liquid water content according to a formula derived by Greene and Clark [(1972); see their Eq. (7)]. Finally, the radar VIL in each column is obtained by integrating the liquid water content vertically from model bottom to top. The same procedure used to convert three-dimensional simulated radar reflectivity to radar VIL in the HRRR is employed by the Massachusetts Institute of Technology (MIT)–Lincoln Laboratory to obtain observed VIL at 1-km horizontal resolution from radar measurements (Wolfson et al. 2008; Dupree et al. 2009). The observed radar VIL was interpolated onto the 3-km HRRR model grid using bilinear interpolation so that it could serve as the truth field for model evaluation (J. Pinto 2015, personal communication). Of the two VIL fields from the HRRR, the simulated radar VIL agrees better with the observed radar VIL because, in the first step described above, the algorithm was optimized to best match the simulated radar reflectivity with the observed radar reflectivity (Pinto et al. 2010). For this reason, the simulated radar VIL from the HRRR serves as the forecast VIL, which is the basis for the subsequent analysis. Hereafter, the simulated radar VIL will simply be called the forecast VIL.
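
The three-step radar VIL computation can be summarized in a short sketch. This is not the HRRR postprocessing code: it assumes the simulated reflectivity from step 1 is already available, and it uses the standard published form of the Greene and Clark (1972) reflectivity–liquid water relation rather than the exact constants of the HRRR implementation.

```python
# Minimal sketch of steps 2 and 3 of the radar VIL computation described
# above, assuming step 1 (simulated reflectivity from the Thompson et al.
# 2004 equations) has already produced a 3-D dBZ array.
import numpy as np

def radar_vil(refl_dbz, layer_depth_m):
    """Integrate liquid water content vertically to obtain VIL (kg m**-2).

    refl_dbz      : simulated reflectivity, shape (nz, ny, nx), in dBZ
    layer_depth_m : layer thicknesses, shape (nz,) or (nz, ny, nx), in m
    """
    # Step 2: convert reflectivity factor Z (mm**6 m**-3) to liquid water
    # content M (kg m**-3); M = 3.44e-6 * Z**(4/7) is the standard form of
    # the Greene and Clark (1972) relation.
    z_linear = 10.0 ** (refl_dbz / 10.0)
    lwc = 3.44e-6 * z_linear ** (4.0 / 7.0)

    # Step 3: integrate M through each column from model bottom to top.
    if np.ndim(layer_depth_m) == 1:
        layer_depth_m = np.asarray(layer_depth_m)[:, None, None]
    return np.sum(lwc * layer_depth_m, axis=0)
```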

The model verification domain covers the eastern two-thirds of the CONUS, which is smaller than the full HRRR domain in 2010 (see Fig. 1). This domain is also the large domain used for all the analyses in section 3. Two subdomains, the upper Midwest and the Southeast, are used to highlight the varying model performance in different geographical regions in section 4. The dataset spans a 3-week period during the summer of 2010: 12–18 June, 4–10 July, and 16–22 July. MODE is run for all hourly HRRR forecasts with nominal forecast lead times of 4, 8, and 12 h (i.e., without considering the latency of when forecasts become available in real time). Two parameters in MODE control the identification of storm objects: 1) the radius of influence for the convolution filter and 2) the VIL threshold. The forecast VIL field first goes through the convolution filter, which ensures the storm objects have smooth boundaries and no holes in them; a VIL threshold is then applied to create the storm objects. In this study, we considered applications for aviation purposes [i.e., we are only interested in convective storms; see Wolfson et al. (2008) and Pinto et al. (2010, 2015)] and followed the suggestions of Davis et al. (2006a) in choosing a VIL threshold of 3.5 kg m−2, which corresponds to a radar reflectivity of ~38–44 dBZe (where e in dBZe indicates water equivalent; the e is implied for all subsequent instances of dBZ), and a convolution radius of 3 times the grid spacing (i.e., 3Δx = 9 km), which produces smooth-outlined, realistic-looking storm objects based on our subjective inspection. The VIL threshold of 3.5 kg m−2 was chosen because 1) it is the VIL value that has been used to represent convective storms in aviation weather for many years (e.g., Wolfson et al. 2008; Dupree et al. 2009; Pinto et al. 2010, 2015) and 2) it is the threshold for real-time verification of the FAA Consolidated Storm Prediction for Aviation (CoSPA; Wolfson et al. 2008; Dupree et al. 2009). For the convolution radius, sensitivity tests of 3, 4, and 5 times the grid spacing were conducted. Based on an NWP model’s effective resolution of perhaps 6–8 times its grid spacing (Skamarock 2004), Davis et al. (2006a) applied a convolution radius of 4 times the grid spacing in their study. A survey of the literature reveals that researchers have used convolution radii ranging from 2 to 8 times the grid spacing and hourly accumulated rainfall thresholds of ~6.25–10.0 mm for selecting convective storm objects in convection-allowing models with horizontal grid spacings of ~4 km, depending on their specific needs (e.g., Case et al. 2011; Johnson and Wang 2013; Clark et al. 2014). As Wolff et al. (2014) pointed out, when selecting the threshold and convolution radius for identifying objects, “it is important to first determine the features of interest, and then select a set of MODE parameters that best capture the intended areas, prior to evaluation.” Selecting the proper parameters is therefore an iterative process that often relies heavily on the researcher’s subjective judgment. Our tests of convolution radii of 3, 4, and 5 times the grid spacing followed this procedure. We found that a convolution radius of 3 times the grid spacing retained most of the small storms, which was essential for diagnosing the timing of convection initiation (CI), while still creating realistic-looking storm objects without holes in them. Convolution radii of 4 and 5 times the grid spacing also produced realistic-looking storm objects but failed to keep many of the smaller storms. Therefore, a convolution radius of 3 times the grid spacing was adopted in this study.

Fig. 1.

Examples showing the domain and subdomains and the storms (objects) identified by MODE in both forecast and observed VIL fields: (a) 4-h VIL forecast produced by the HRRR model valid at 0300 UTC 10 Jul 2010 and (b) objects identified by MODE based on (a); (c) observed VIL at 0300 UTC 10 Jul 2010 and (d) objects identified by MODE based on (c). All of the MODE analyses in section 3 are based on the whole domain: the eastern two-thirds of the CONUS. The two subdomains used in section 4, the upper Midwest and Southeast, are drawn as white boxes in (a).


In addition to convolution and thresholding (the two processes in MODE that control the scale of the objects that are resolved), Davis et al. (2006a) also applied a minimum object size threshold of 25 grid squares in their statistical MODE analysis. This simple threshold ensures that objects too small to be realistically resolved by NWP models are removed from the analysis. The same minimum object size (i.e., 25 grid squares) is imposed in this paper, which should mitigate any concern that a convolution radius of 3 times the grid spacing might be too small, since small storms that survive the convolution process can still be discarded in the statistical analysis by the storm size threshold. Finally, because both the forecast and observed VIL go through the same convolution and thresholding processes (with the minimum storm size limitation imposed), our results should be relatively robust for convective storm forecasting. If a lower VIL threshold were used, so that stratiform rain areas were also included, a different set of results could be obtained; this is another area for potential future research.

MODE can be freely downloaded from the NCAR Developmental Testbed Center’s website (http://www.dtcenter.org/met/users/index.php) as part of the Model Evaluation Tools (MET). The single-storm attributes analyzed in this paper are briefly described below, a schematic illustrating them is shown in Fig. 2, and a sketch of how such attributes can be computed follows the list [for a detailed discussion of all object properties in MODE, the reader is referred to Davis et al. (2006a)]:

  • area—this attribute is a simple measure of an object’s size conditioned on the chosen threshold (illustrated as the gray region in Fig. 2);
  • intensity—the original intensity of VIL inside an object is retained and its distribution is expressed in terms of percentiles by MODE; we use the median intensity (i.e., the 50th percentile) to represent an object’s intensity for this study;
  • axis angle—this attribute is used to represent object orientation; as shown in Fig. 2, the axis angle is defined as the angle between east and the major axis of an object; the angle can only vary within ±90°, with positive angles indicating southwest–northeast-oriented storms and negative angles indicating southeast–northwest-oriented storms;
  • aspect ratio—the ratio of the length of the minor axis to the length of the major axis is defined as the aspect ratio of an object (Fig. 2 shows it as b/a); for example, small storms tend to be fairly circular in shape with an aspect ratio close to 1; and
  • complexity—this attribute is defined as the difference in area between the convex hull and the original shape (i.e., the hatched area in Fig. 2) divided by the area of the convex hull (the hatched area plus the gray area in Fig. 2). The convex hull of a shape may be visualized as the shape enclosed by a rubber band stretched around the original shape as shown by the red line in Fig. 2. An object’s complexity range is 0–1; a complexity of zero indicates an object’s shape is convex. A regularly shaped storm, such as a circular storm, will have a small complexity value. On the other hand, a storm with a zigzag boundary or a bow shape will have a large complexity value.
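
As noted above, the sketch below illustrates how these attributes could be computed for one labeled object from the identification step. MODE’s internal definitions differ in detail (e.g., its major and minor axes come from a fitted rectangle, per Fig. 2); approximating the axis angle and aspect ratio from the principal components of the pixel coordinates is an assumption of this sketch.

```python
# Minimal sketch (not MODE's implementation) of the object attributes
# described above, for one object in the labels array produced earlier.
# Assumes objects exceed the 25-grid-square minimum, so the covariance
# and convex hull computations are well conditioned.
import numpy as np
from scipy.spatial import ConvexHull

def object_attributes(labels, vil, obj_id):
    rows, cols = np.nonzero(labels == obj_id)
    # Treat columns as eastward x and negated rows as northward y.
    pts = np.column_stack([cols, -rows]).astype(float)

    area = len(rows)                             # size in grid squares
    intensity = np.median(vil[rows, cols])       # 50th-percentile VIL

    # Principal axes of the pixel-coordinate covariance matrix; the
    # leading eigenvector approximates the major axis.
    evals, evecs = np.linalg.eigh(np.cov(pts, rowvar=False))
    major = evecs[:, 1]
    angle = np.degrees(np.arctan2(major[1], major[0]))
    if angle > 90.0:                             # fold into (-90, 90]
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    aspect_ratio = np.sqrt(evals[0] / evals[1])  # ~ minor/major length

    # Complexity: (hull area - object area) / hull area; 0 for a convex
    # shape. The hull of pixel centers slightly underestimates the
    # pixelated area, which is adequate for a sketch.
    hull_area = ConvexHull(pts).volume           # .volume is area in 2-D
    complexity = (hull_area - area) / hull_area

    return dict(area=area, intensity=intensity, angle=angle,
                aspect_ratio=aspect_ratio, complexity=complexity)
```
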
Fig. 2.

A schematic illustrating the storm attributes used in this paper. The thick black line outlines the storm (object). The convex hull of the object, which resembles a rubber band circling the object, is represented by the red line. The rectangle enclosing the object is represented by the dashed line. The length of the major axis is denoted by a; the minor axis by b. The hatched area is the area of convex hull minus the area of the object.


For a certain forecast lead time, all of the forecast and observed objects valid at the same time of day from the 3-week period are binned together to study the diurnal variation of storm characteristics. Similarly, all of the forecast and observed storms for a certain forecast lead time can be binned together according to storm size in order to study variations of storm characteristics as a function of storm size. In doing so, the forecast and observed storm characteristics can be compared directly in a statistical sense as a function of time of day or as a function of storm size. No direct matching between forecast and observed objects is required since we are only interested in obtaining statistics of forecast storm attributes versus observed storm attributes.
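
A minimal sketch of this aggregation is given below, assuming the per-object attributes have been gathered into a pandas DataFrame; the column names ('source' distinguishing forecast from observed objects, 'lead_h', 'valid_hour_utc', and 'area') are illustrative, not taken from the original analysis.

```python
# Minimal sketch of binning object attributes by time of day or by storm
# size for one forecast lead time; column names are illustrative.
import pandas as pd

def diurnal_percentiles(df, lead_h, attribute):
    """Percentiles of an attribute by valid hour, forecast vs observed."""
    subset = df[df["lead_h"] == lead_h]
    return (subset.groupby(["source", "valid_hour_utc"])[attribute]
                  .quantile([0.25, 0.50, 0.75, 0.90])
                  .unstack())

def size_binned_median(df, lead_h, attribute, size_bins):
    """Median of an attribute binned by object area (grid squares)."""
    subset = df[df["lead_h"] == lead_h].copy()
    subset["size_bin"] = pd.cut(subset["area"], bins=size_bins)
    return subset.groupby(["source", "size_bin"],
                          observed=True)[attribute].median()
```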

3. Diagnosis of forecast versus observed storms for the large domain

As discussed in section 1, object-based forecast evaluation tools such as MODE provide diagnostic information about forecast performance. Therefore, when object-based forecast evaluation is combined with standard pixel-versus-pixel verification scores such as CSI, it may be possible to identify reasons why a particular forecast yielded a high or low CSI score (Wolff et al. 2014). This provides important feedback to model developers on where to focus their efforts to improve the model, and it helps model users decide when to trust a forecast.

For comparison, we have calculated the CSI score and bias for 4-, 8-, and 12-h HRRR convective storm forecasts as a function of time of day for the same 3-week period used in the MODE analysis (Fig. 3); the standard contingency-table definitions of these two scores are sketched after the list below. The results illustrate the following about model performance in forecasting convective storms:

  1. The CSI scores are highest during nighttime (0500–1500 UTC). It is hypothesized that this might be the result of a few large storms persisting nocturnally, which leads to more hits for a pixel-versus-pixel verification method when storm location errors are not too large.
  2. The CSI scores are lowest during the rapid storm-initiation period (1700–2200 UTC). It is hypothesized that this was caused by developing smaller storms, which are less likely to be matched with observations even when location errors are relatively small.
  3. The CSI scores are notably lower for longer lead times.
  4. The bias for all forecast lead times is smaller than one, indicating underforecasting in terms of total storm area, which is consistent with Fig. 3c. Also, the bias decreases (i.e., even more underforecasting) with increasing forecast lead time.
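
For reference, the sketch below gives the standard contingency-table definitions of the CSI and bias scores plotted in Fig. 3 (see Wilks 2006), computed from binary forecast and observed VIL masks at the 3.5 kg m−2 threshold; details of the operational scoring, such as the handling of grids with no storms, are assumptions of this sketch.

```python
# Standard grid-point contingency-table scores: CSI and frequency bias,
# computed from binary forecast/observed masks at a VIL threshold.
import numpy as np

def csi_and_bias(forecast_vil, observed_vil, threshold=3.5):
    f = forecast_vil >= threshold
    o = observed_vil >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    csi = hits / (hits + misses + false_alarms)
    # Bias < 1 indicates underforecasting of total storm area.
    bias = (hits + false_alarms) / (hits + misses)
    return csi, bias
```
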
Fig. 3.

(a) CSI, (b) bias, and (c) total area of storms for the 3-week period during the summer of 2010 for 4-, 8-, and 12-h HRRR convective storm forecasts (i.e., VIL threshold of 3.5 kg m−2) as a function of time of day. The verification domain is the whole domain shown in Fig. 1.


Since the CSI scores in Fig. 3a were calculated using a 3-km grid spacing and a convective-scale VIL threshold (i.e., 3.5 kg m−2), it is not surprising that the scores are so low (e.g., Pinto et al. 2010; Huang and Meng 2014). Also note in Fig. 3a that when CSI scores are extremely low, large percentage changes, sometimes exceeding 100%, are not uncommon. The real forecast improvement may be much less dramatic than a doubling of the CSI score suggests, since such a doubling can be caused by only a few additional overlapping pixels between the forecast and observed storms; for example, a CSI increase from 0.02 to 0.04 is a 100% improvement yet may reflect only a handful of extra hits. By using an object-based forecast evaluation tool alongside the CSI scores, we hope to diagnose why the CSI scores behave in certain ways (Wolff et al. 2014). If the conclusions based on CSI scores or other traditional measures-based verification scores can be backed up by object-based forecast evaluation, this would bolster confidence in the use of very low CSI scores for convective forecast evaluation, a practice still employed in convective storm research (e.g., Pinto et al. 2010; Huang and Meng 2014).

a. Number and area of storms as a function of time of day

This section analyzes model performance from the perspective of its ability to reproduce the number and size of storms. The analysis of the total number of storms as a function of time of day provides insight into how well the model handles new storm initiation, while the storm size distribution analysis yields further information on the model’s ability to capture storm evolution. Hereafter, all analyses as a function of time of day (i.e., the diurnal cycle) bin together all forecasts with the same lead time and valid time during the 3-week period. All analyses as a function of storm size bin together all forecasts with the same lead time, so forecasts with various valid times are analyzed together as long as they share the same lead time.

The total number and total area of storms for 4-, 8-, and 12-h HRRR forecasts compared with observations are shown in Fig. 4 for the large domain. The observed total number and total area of storms differ slightly among forecast lead times in Fig. 4, most noticeably the small dip at 2200 UTC in Figs. 4c and 4e. This was caused by a few 8- and 12-h forecasts that were missing during the 3-week period, which often happened when the HRRR was not able to finish its current forecast cycle before a new cycle had to be started. Since data aggregation was based on available forecasts, when a forecast was missing its corresponding observation was also discarded from the analysis; thus, the observed fields for different forecast lead times ended up with small but noticeable differences. Because no more than two or three forecasts were missing at the 8- and 12-h lead times, their impact on the analysis was negligible, as indicated by the reasonable diurnal cycle obtained in Fig. 4.

Fig. 4.

Comparison of total number of storms and total area of storms summed over all forecasts with the same lead time in the large domain as a function of time of day between the forecast and observed storms during the same 3-week period as in Fig. 3: (a) 4-h HRRR total number of storms, (b) 4-h HRRR total area of storms, (c) 8-h HRRR total number of storms, (d) 8-h HRRR total area of storms, (e) 12-h HRRR total number of storms, and (f) 12-h HRRR total area of storms. Solid curves represent the model; dotted curves represent the observations. The two thin vertical black lines represent 1600 and 1800 UTC. The ratios of the rate of change of the total number of storms in the observations vs the forecast between 1600 and 1800 UTC are given in (a),(c), and (e).


A clear diurnal cycle in the total number and area of storms can be seen in Fig. 4 for both forecast and observed storms, which suggests that the HRRR was able to reproduce the diurnal cycle rather well. Both the total number and total area of storms reach a minimum during nighttime and early morning (~0800–1600 UTC) and a maximum during late afternoon and early evening (~2100–2300 UTC). The underforecasting of the total number and total area of storms is evident and worsens with longer lead times. This is consistent with the CSI scores and bias in Figs. 3a and 3b, which also show worse performance at longer forecast lead times.

The analysis of the total number of storms as a function of time of day in Figs. 4a, 4c, and 4e also yields interesting information regarding the model’s ability to forecast new storm initiation. A careful examination of the total number of storms from 1600 to 1800 UTC in Figs. 4a, 4c, and 4e suggests a delay or lack of new storm initiation by the HRRR model, indicated by the notable lag in the ramp-up of the number of predicted storms compared to the observed storms. For instance, the ratios of the observed versus forecast rate of change of the total number of storms during 1600–1800 UTC for the 4-, 8-, and 12-h forecasts are 1.9, 2.6, and 2.3, respectively, which means the model produced only about half or fewer of the new storms initiated during this period. It is important to emphasize that in Figs. 4a, 4c, and 4e the total number of storms is summed over a fairly large domain; convection initiation timing errors may therefore be obscured by varying model performance over different subregions. These timing errors can be diagnosed more accurately using smaller subdomains, as will be shown in section 4.

Because the total number and total area of storms could be biased by a single forecast, it is imperative to investigate the distribution of the number and area of storms for each individual forecast. Figure 5 shows box plots of the number of storms per forecast as a function of time of day for 4-, 8-, and 12-h HRRR forecasts and their corresponding observations for the whole MODE analysis domain. As in Fig. 4, a clear diurnal cycle is evident in both the forecast and observed numbers of storms, with a large number of storms (median value of ~100) during late afternoon and early evening (~2000–2400 UTC) and a small number of storms (median value of ~40) during nighttime (~0800–1600 UTC). The spread of the number of storms is smaller during nighttime than during daytime because there are fewer storms at night. Although the general diurnal trend in the number of storms is captured fairly well by the model, the underforecasting is evident, especially for longer forecast lead times and for the period just after convection initiation. The rather small spread in Figs. 5b and 5c for the 8- and 12-h HRRR forecasts compared with the observations during 2100–2300 UTC is another indication that longer-lead-time forecasts perform worse than the 4-h forecasts. The 4-h forecast’s apparently better agreement with the observations might be attributed to radar-reflectivity assimilation in the HRRR model (Weygandt et al. 2009). This is consistent with studies showing that the assimilation of radar-reflectivity data has limited positive effects beyond 6–9 h, although a different radar data assimilation technique was used in that work (Xiao and Sun 2007).

Fig. 5.

Box plots of the number of storms per forecast as a function of time of day for the 3-week period for (a) 4-, (b) 8-, and (c) 12-h HRRR forecasts. The top of the box represents the 75th percentile, the line inside the box represents the 50th percentile, and the bottom of the box represents the 25th percentile. Whiskers denote the normal data range. Outliers, which are defined as beyond ±2.7 std devs, are represented by the plus sign.


In addition to the analysis of the number of storms as a function of time of day presented in Figs. 4 and 5, we performed a similar analysis of storm area. The 25th, 50th, 75th, and 90th percentiles of storm area for 4-, 8-, and 12-h HRRR forecasts are shown in Fig. 6. Notice that the largest storms tend to occur during nighttime (~0500–1600 UTC), which may explain why the CSI scores in Fig. 3a are better at night: it is easier for larger storms to be matched in a pixel-versus-pixel verification (e.g., Wolff et al. 2014). Likewise, when storm sizes are small (i.e., during ~1700–2000 UTC), the CSI scores are lower because it is more difficult to obtain hits for small storms (e.g., Case et al. 2011). It is interesting to note, based on Fig. 6, that the medium (50th percentile) and small (25th percentile) storm sizes do not exhibit as strong a diurnal cycle as the large storms (90th percentile) do. The 4-h forecast agrees best with the observations in terms of storm size distribution, especially at night, although there is significant overforecasting of large storms around 1200 UTC. The decrease in forecast storm size relative to observed storm size during 1800–2400 UTC is a result of late or missed new storm initiation by the model: since storms need time to grow, a delay in new storm initiation drags down the storm size percentile curves (as shown in Fig. 6). Finally, notice the seemingly unrealistic dip in the 90th-percentile storm size between 1000 and 1300 UTC in Fig. 6, which was mainly a sampling issue caused by a limited number of very large storms of considerably different sizes during the nighttime period. The fact that the dip does not appear at lower percentiles in Fig. 6 supports this explanation and excludes the possibility that the dip was caused by missing forecasts or data processing issues.

Fig. 6.

Percentile plots of storm size per forecast as a function of time of day for the 3-week period. Shown are percentile (i.e., 25%, 50%, 75%, and 90%) plots of storm size for (a) 4-, (b) 8-, and (c) 12-h HRRR forecasts. Black lines represent the HRRR; gray lines represent the observations.


b. Storm orientation, aspect ratio, intensity, and complexity as a function of time of day

Percentile plots of storm orientation, aspect ratio, intensity, and complexity from both the HRRR model and the observations for the 8-h forecast over the whole domain are shown in Fig. 7. The 4- and 12-h forecasts were also analyzed but are not shown since similar results were obtained. The storm orientation, represented by the major-axis angle in Fig. 7a, varies greatly in both forecast and observed storms; however, their distributions agree rather well, and there is no apparent diurnal variation in orientation. As for the aspect ratio in Fig. 7b, forecast storms tend to have larger values than observed storms throughout the diurnal cycle, suggesting that the model tends to produce storms that are too circular. This finding is consistent with the MODE evaluation of the ARW at 4-km horizontal grid spacing by Johnson and Wang (2013). A diurnal cycle in aspect ratio, present in both the observations and the model, can be seen in Fig. 7b: storms tend to be more elongated (and also larger; see Fig. 6) during the nighttime, corresponding to smaller aspect ratios. Once again, the relatively large difference between forecast and observed aspect ratios during late afternoon and early evening is an indirect indication of the model’s trouble in handling storm initiation, although a more direct diagnosis of convection initiation is obtained from the number of storms. Since model storms need time to grow, a delay or lack of new storm initiation in the model produces fewer large storms later on, which perhaps causes the largest deviation between forecast and observed aspect ratios during 1800–2400 UTC for the 25th and 50th percentiles.

Fig. 7.

Percentile plots of (a) storm angle, (b) aspect ratio, (c) complexity, and (d) intensity per forecast as a function of time of day for the 8-h HRRR forecast during the 3-week period. Black lines represent the HRRR; gray lines represent the observations.


Similar to the aspect ratio, storm complexity derived from the HRRR model also shows a diurnal cycle and a bias (Fig. 7c), with the larger storms showing some diurnal variation. Large storms tend to be more complex than small storms; therefore, it is expected that complexity generally increases during nighttime, when large storms dominate. The observed storms are generally more complex than the forecast storms for all storm sizes, and the deviation is greatest for the larger storms.

The diurnal variation of storm intensity is evident in Fig. 7d, especially for lower-intensity storms. Storm intensity decreases dramatically from ~1700 UTC and then increases around 2100 UTC. The model’s delay in new storm initiation slows the intensity ramp-up of the forecast storms compared to the observed storms. Storm intensity reaches its peak during nighttime, when larger, more mature storms dominate. It is also noted that forecast storms tend to have reduced median intensity during the storm-initiation period (~1800–2400 UTC) compared to other periods. Overall, the model bias in storm intensity as a function of time of day is quite small (Fig. 7d), which suggests very good calibration of model aspects such as the microphysics scheme and the VIL diagnostic.

c. Storm characteristics as a function of storm size

Histograms of storm size for 4-, 8-, and 12-h HRRR forecasts compared with observations for the whole MODE domain are shown in Fig. 8. Generally, the model underforecasts the number of storms for all storm sizes and all lead times (consistent with Fig. 4). Notice also the absence of large forecast storms at the 8- and 12-h lead times in Figs. 8b and 8c, which corroborates the subjective assessment that the HRRR is challenged to maintain larger MCSs (S. Weygandt 2011, personal communication) and is consistent with what Pinto et al. (2015) found in their study. Overall, the 4-h forecast has the best agreement with the observations. Similar observations can be made from the storm size distributions for the two subdomains shown in the next section, which suggests that this diagnosis is not region specific.

Fig. 8. Histograms of storm size produced by the HRRR model compared with the observations for (a) 4-, (b) 8-, and (c) 12-h forecasts during the 3-week period. Filled black rectangles represent the HRRR model; gray rectangles represent the observations.
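Building size distributions like those in Fig. 8 reduces to binning object areas. The sketch below is illustrative only: the bin edges are assumptions, not the paper's actual choices.

```python
import numpy as np

def size_histogram(sizes, edges=(1, 50, 100, 250, 500, 1000, 2500, 5000, 10000)):
    """Count storm objects per size bin (sizes in grid squares).
    The bin edges here are assumptions for illustration."""
    counts, _ = np.histogram(sizes, bins=edges)
    labels = [f"{a}-{b}" for a, b in zip(edges[:-1], edges[1:])]
    return dict(zip(labels, counts))

# Hypothetical object sizes from one forecast valid time.
print(size_histogram([30, 75, 75, 400, 3200]))
```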

Storm orientation, aspect ratio, intensity, and complexity as a function of storm size for the 8-h forecast are shown in Fig. 9. The 4- and 12-h forecasts were also analyzed but are not shown because they yielded similar results. Generally, good agreement between the forecast and observed storm characteristics is found across the storm size spectrum. Small storms tend to have small angles, large aspect ratios, low complexity, and low median intensity, whereas large storms tend to have large angles, small aspect ratios, high complexity, and high median intensity. A high bias in aspect ratio (Fig. 9b) and a low bias in complexity (Fig. 9c) are noted for the forecast storms, again suggesting that the HRRR storms tend to be more circular than the observed storms. This conclusion is consistent with Fig. 7b, where the storms were instead binned by forecast valid time. Given the relatively coarse horizontal grid spacing of the HRRR for resolving thunderstorms, it is not surprising that the forecast storms are more circular than the radar-observed storms. For storm intensity (Fig. 9d), a low bias for large storms is observed. The conclusions based on Fig. 9 are very similar to those of Davis et al. (2006a,b): the model reproduces the storm characteristics (i.e., convective mode) rather well, although biases do exist for certain storm properties.

Fig. 9. Percentile plots of (a) storm angle, (b) aspect ratio, (c) complexity, and (d) intensity as a function of storm size for 8-h HRRR forecasts during the 3-week period.

4. Diagnosis of forecast versus observed storms for the two subdomains

Regional variation in model performance has long been recognized from subjective model evaluations; however, it has yet to be adequately addressed in previous MODE-based evaluations. In this section, we select two subdomains inside the whole domain of Fig. 1, the upper Midwest and the Southeast, and run the MODE analysis on each to highlight the dramatic difference in HRRR performance between geographical regions.

a. Number of storms as a function of time of day and storm size distribution

In section 3, we showed that analyzing the forecast versus observed total number of storms as a function of time of day can shed light on the model's ability to forecast new storm initiation. This technique works best when the verification domain is not too large: over a large domain, regions where initiation is too early and regions where it is too late can cancel each other out.
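The storm-count comparison itself is simple once objects have been identified. The sketch below uses hypothetical object lists to show how a lagged forecast ramp-up relative to the observations appears directly as delayed initiation; the hour values are invented for illustration.

```python
from collections import Counter

def storms_per_hour(valid_hours):
    """Count identified storm objects by valid hour (UTC)."""
    return Counter(valid_hours)

# Hypothetical valid hours of forecast and observed objects: the forecast
# ramp-up starting ~2 h late mimics the Southeast initiation delay.
fcst = storms_per_hour([19, 20, 20, 21, 21, 22])
obs = storms_per_hour([17, 18, 18, 19, 20, 21])
for hour in range(17, 23):
    print(f"{hour:02d} UTC  fcst={fcst.get(hour, 0)}  obs={obs.get(hour, 0)}")
```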

The total number of storms as a function of time of day for the upper Midwest and Southeast subdomains is shown in Fig. 10. Note the striking difference in model performance in predicting the total number of storms between the upper Midwest (Figs. 10a,c,e) and the Southeast (Figs. 10b,d,f). The diurnal variation in the total number of storms is much stronger in the Southeast than in the upper Midwest, suggesting that different forcing mechanisms govern storm initiation and evolution in the two subdomains. The timing of new storm initiation was handled much better by the model in the upper Midwest than in the Southeast. In fact, all forecasts for the upper Midwest showed increases in the total number of storms beginning at 1800 UTC, nearly simultaneous with the observations, indicating fairly good initiation timing in this subdomain, although the 4- and 8-h forecasts overforecast the total number of storms by 2200 UTC. In contrast, all HRRR forecasts for the Southeast exhibited a significant delay in, or lack of, new storms beginning at 1700 UTC, and severe underforecasting of the total number of storms is evident in Figs. 10b, 10d, and 10f after 1600 UTC. For both subdomains, forecasts with shorter lead times tend to be more accurate.

Fig. 10. As in Fig. 4, but for the total number of storms in the (a),(c),(e) upper Midwest and (b),(d),(f) Southeast subdomains shown in Fig. 1.

A subjective assessment by the authors of the large-scale environments of the upper Midwest and Southeast subdomains suggests that storms in the upper Midwest were dominated by synoptic-scale forcing (mostly fronts moving through the region), which yields large, organized precipitation systems, whereas storms in the Southeast were driven by local diurnal heating, producing scattered, isolated storms during the 3-week period analyzed here. This subjective evaluation is corroborated by the storm size distributions for both subdomains shown in Fig. 11. As expected, the Southeast has no storms larger than 2500 grid cells, in marked contrast to the upper Midwest, where some storms reached much larger sizes. Further evidence that local diurnal heating drove the Southeast storms is the strong diurnal cycle in the total number of storms (Figs. 10b,d,f), which mirrors the diurnal heating cycle. In contrast, the weak (if any) diurnal variation in the total number of storms in the upper Midwest suggests that forcing mechanisms other than diurnal heating were at work. Indeed, our subjective analysis found that the storms in the upper Midwest subdomain were mostly associated with large MCSs propagating through the west–east corridor described by Carbone et al. (2002).

Fig. 11. As in Fig. 8, but for storm size histograms for 4-, 8-, and 12-h forecasts in the (a),(c),(e) upper Midwest and (b),(d),(f) Southeast subdomains. Filled black rectangles represent the HRRR model; gray rectangles represent the observations.

One feature common to both the upper Midwest and Southeast subdomains in Fig. 11 is that, at longer forecast lead times, the model has fewer large storms than the observations. The same conclusion was drawn from Fig. 8, which used data from the whole domain. Maintaining large storms once they form thus appears to be a challenge for longer-lead HRRR forecasts regardless of geographical region; the model tends to decay them too quickly. It should be emphasized that this kind of diagnostic information could only be obtained from an object-based forecast evaluation tool such as MODE.

b. Storm characteristics as a function of storm size

Percentile plots of storm orientation, aspect ratio, complexity, and intensity for the 4-, 8-, and 12-h HRRR forecasts as a function of storm size were also produced for the two subdomains (not shown). The results are similar to those in Fig. 9, which is based on the whole domain. Model performance with respect to storm characteristics as a function of storm size is thus fairly consistent across the two geographical regions and the whole verification domain, with one exception: storm intensity in the Southeast correlates more strongly with storm size than in the upper Midwest or over the whole domain (i.e., the larger the storm, the stronger its intensity in the Southeast).

The absence of significant differences in storm characteristics as a function of storm size between the upper Midwest and Southeast subdomains suggests that storm characteristics are governed mostly by storm size: storms of similar size tend to have similar characteristics regardless of how they initially formed.

5. Summary and future work

Object-based forecast evaluation tools like MODE can diagnose forecast performance in ways that provide more insight than traditional pixel-versus-pixel contingency-table verification. Based on the MODE analysis of HRRR convective storm forecasts during a 3-week period in the summer of 2010 over the eastern two-thirds of the contiguous United States, the HRRR satisfactorily forecast the storm properties examined (storm number, size, intensity, orientation, aspect ratio, and complexity) as a function of both storm size and time of day. However, significant biases and/or time lags did exist for certain storm characteristics. Specifically, the model tended to underforecast the total number and total area of convective storms, and its ability to forecast new storm initiation differed greatly between the two subregions examined, especially at longer forecast lead times. Although the HRRR had trouble initiating ordinary thunderstorms in the Southeast, where afternoon diurnal heating is the major forcing mechanism, it did rather well forecasting new storms in the upper Midwest, where storms are more typically driven by synoptic-scale frontal boundaries.

Another interesting MODE diagnosis concerns the HRRR's handling of large MCSs: the model produced MCSs of comparable size to those observed in the 4-h forecast but failed to maintain them, resulting in much smaller storms in the 8- and 12-h forecasts.

The combined evaluation of CSI scores and the MODE analysis confirmed the hypothesis that the diurnal variation in CSI was largely a result of storm size variation through the diurnal cycle. The fact that both the MODE evaluation and the CSI scores reached the same conclusion, namely that forecasts with shorter lead times outperformed those with longer lead times, suggests that even very small CSI differences can be trusted, provided the difference is consistent and has a meaningful physical explanation.
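For reference, the CSI itself is a simple function of contingency-table counts. The sketch below states it explicitly; the counts in the example are hypothetical.

```python
def csi(hits, misses, false_alarms):
    """Critical success index: hits / (hits + misses + false alarms)."""
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")

# Hypothetical pixel-based contingency counts for one valid time.
print(csi(hits=120, misses=60, false_alarms=40))  # -> ~0.545
```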

The focus of this paper has been on diagnosing model performance issues. The most significant contributions toward improving the model are the diagnoses of the convective initiation difficulties in the Southeast and of the general problem of maintaining large storms. Close collaboration among NOAA/GSD, NCAR's Research Applications Laboratory, and MIT–Lincoln Laboratory during this research ensured that the evaluation of the 2010 HRRR led to improvements in later versions of the model.

Future work will evaluate model forecasts for matched objects using MODE. A simple score such as CSI, computed from matched and unmatched objects, could serve as a measure of forecast performance. In addition to object-based CSI scores, a number of matched-object attributes could be analyzed, the most important being the centroid distance. The centroid distance of matched objects provides a quantitative estimate of the forecast displacement error, a quantity that traditional pixel-versus-pixel verification cannot provide but that has important implications for how to use model storm forecasts most effectively, especially for aviation and severe weather applications of the HRRR.
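To make the proposed matched-object metrics concrete, the sketch below computes a centroid displacement in kilometers and an object-based CSI that treats matched pairs as hits, unmatched forecast objects as false alarms, and unmatched observed objects as misses. The mask inputs, the 3-km grid spacing, and the function names are assumptions for illustration, not MODE's implementation.

```python
import numpy as np

def centroid(mask):
    """Centroid (row, col) of a binary object in grid coordinates."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

def displacement_km(fcst_mask, obs_mask, dx_km=3.0):
    """Centroid displacement of a matched forecast/observed object pair.
    dx_km is the grid spacing (3 km assumed here to match the HRRR grid)."""
    frow, fcol = centroid(fcst_mask)
    orow, ocol = centroid(obs_mask)
    return dx_km * float(np.hypot(frow - orow, fcol - ocol))

def object_csi(n_matched, n_unmatched_fcst, n_unmatched_obs):
    """Object-based CSI: matched pairs are hits, unmatched forecast objects
    are false alarms, and unmatched observed objects are misses."""
    denom = n_matched + n_unmatched_fcst + n_unmatched_obs
    return n_matched / denom if denom else float("nan")
```

Aggregating displacement_km over all matched pairs would yield the displacement-error statistics envisioned above, for example as a function of lead time or region.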

Acknowledgments

The authors thank John Halley Gotway and Peiqi He at the National Center for Atmospheric Research (NCAR) for help with the installation and operation of MODE. James Pinto at NCAR kindly provided Fig. 2. Matthias Steiner put tremendous effort into reshaping this paper, and Barbara Brown provided insightful comments during its early drafts. Their help is greatly appreciated.

Thanks also go to NOAA/GSD and MIT–Lincoln Laboratory for providing the HRRR forecasts and the observed VIL datasets, respectively. The constructive and critical reviews of three anonymous reviewers improved this manuscript tremendously. Finally, John Raby and Stephen Kirby at the U.S. Army Research Laboratory performed a technical review of this paper.

This research was performed in response to requirements and funding by the Federal Aviation Administration (FAA). HC was also partially supported by the U.S. Army Research Laboratory. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or the U.S. Army.

REFERENCES

  • Benjamin, S., Weygandt, S. S., Brown, J. M., and DiMego, G., 2011: Beyond the 2011 Rapid Refresh: Hourly updated numerical weather prediction guidance from NOAA for aviation from 2012–2020. Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 11.1. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191236.htm.]

  • Burghardt, B. J., Evans, C., and Roebber, P. J., 2014: Assessing the predictability of convection initiation in the high plains using an object-based approach. Wea. Forecasting, 29, 403–418, doi:10.1175/WAF-D-13-00089.1.

  • Cai, H., Steiner, M., Pinto, J., He, P., Dettling, S., Albo, D., Gotway, J. H., and Brown, B., 2009: Scientific assessment and diagnostic evaluation of CoSPA 0–8 hour blended forecasts. The World Weather Research Program Symposium on Nowcasting and Very Short Range Forecasting, Whistler, BC, Canada, WMO, 7.2. [Available online at https://www.wmo.int/pages/prog/arep/wwrp/new/documents/WSN09_program_booklet.pdf.]

  • Cai, H., Steiner, M., Pinto, J., Brown, B. G., and He, P., 2011: Assessment of numerical weather prediction model storm forecasts using an object-based approach. Preprints, 24th Conf. on Weather Analysis and Forecasting/20th Conf. on Numerical Weather Prediction, Seattle, WA, Amer. Meteor. Soc., 8A.5. [Available online at http://ams.confex.com/ams/91Annual/webprogram/Paper182479.html.]

  • Carbone, R. E., Tuttle, J. D., Ahijevych, D., and Trier, S. B., 2002: Inferences of predictability associated with warm season precipitation episodes. J. Atmos. Sci., 59, 2033–2056, doi:10.1175/1520-0469(2002)059<2033:IOPAWW>2.0.CO;2.

  • Casati, B., Ross, G., and Stephenson, D. B., 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141–154, doi:10.1017/S1350482704001239.

  • Case, J. L., Kumar, S. V., Srikishen, J., and Jedlovec, G. J., 2011: Improving numerical weather predictions of summertime precipitation over the southeastern United States through a high-resolution initialization of the surface state. Wea. Forecasting, 26, 785–807, doi:10.1175/2011WAF2222455.1.

  • Clark, A. J., Bullock, R. G., Jensen, T. L., Xue, M., and Kong, F., 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

  • Curtis, R. A., Weygandt, S. S., Smirnova, T. G., Benjamin, S., Hofmann, P., James, E. P., and Koch, D. A., 2010: High Resolution Rapid Refresh (HRRR): Recent enhancements and evaluation during the 2010 convective season. Preprints, 25th Conf. on Severe Local Storms, Denver, CO, Amer. Meteor. Soc., 9.2. [Available online at https://ams.confex.com/ams/25SLS/techprogram/paper_175722.htm.]

  • Davis, C., Brown, B. G., and Bullock, R., 2006a: Object-based verification of precipitation forecasts. Part I: Methods and application to mesoscale rain areas. Mon. Wea. Rev., 134, 1772–1784, doi:10.1175/MWR3145.1.

  • Davis, C., Brown, B. G., and Bullock, R., 2006b: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

  • Davis, C., Brown, B. G., Bullock, R., and Halley-Gotway, J., 2009: The method for object-based diagnostic evaluation (MODE) applied to numerical forecasts from the 2005 NSSL/SPC Spring Program. Wea. Forecasting, 24, 1252–1267, doi:10.1175/2009WAF2222241.1.

  • Dixon, M., and Wiener, G., 1993: TITAN: Thunderstorm Identification, Tracking, Analysis and Nowcasting—A radar-based methodology. J. Atmos. Oceanic Technol., 10, 785–797, doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.

  • Dupree, W., Morse, D., Chan, M., Tao, X., Reiche, C., Iskenderian, H., and Wolfson, M., 2009: The 2008 CoSPA forecast demonstration. Preprints, Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Phoenix, AZ, Amer. Meteor. Soc., P1.1. [Available online at https://ams.confex.com/ams/pdfpapers/151488.pdf.]

  • Ebert, E. E., 2008: Fuzzy verification of high resolution gridded forecasts: A review and proposed framework. Meteor. Appl., 15, 51–64, doi:10.1002/met.25.

  • Ebert, E. E., and McBride, J. L., 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202, doi:10.1016/S0022-1694(00)00343-7.

  • Gilleland, E., Ahijevych, D., Brown, B. G., Casati, B., and Ebert, E. E., 2009: Intercomparison of spatial forecast verification methods. Wea. Forecasting, 24, 1416–1430, doi:10.1175/2009WAF2222269.1.

  • Greene, D. R., and Clark, R. A., 1972: Vertically integrated liquid water—A new analysis tool. Mon. Wea. Rev., 100, 548–552, doi:10.1175/1520-0493(1972)100<0548:VILWNA>2.3.CO;2.

  • Huang, L., and Meng, Z., 2014: Quality of the target area for metrics with different nonlinearities in a mesoscale convective system. Mon. Wea. Rev., 142, 2379–2397, doi:10.1175/MWR-D-13-00244.1.

  • Johnson, A., and Wang, X., 2013: Object-based evaluation of a storm-scale ensemble during the 2009 NOAA Hazardous Weather Testbed Spring Experiment. Mon. Wea. Rev., 141, 1079–1098, doi:10.1175/MWR-D-12-00140.1.

  • Johnson, A., Wang, X., Kong, F., and Xue, M., 2013: Object-based evaluation of the impact of horizontal grid spacing on convection-allowing forecasts. Mon. Wea. Rev., 141, 3413–3425, doi:10.1175/MWR-D-13-00027.1.

  • Phillips, C., Pinto, J., Steiner, M., Rasmussen, R., Oien, N., and Bateman, R., 2008: Statistical assessment of explicit model forecasts of convection using a new object-based approach. Preprints, 13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., 11.5. [Available online at https://ams.confex.com/ams/pdfpapers/134547.pdf.]

  • Pinto, J., Dupree, W., Weygandt, S., Wolfson, M., Benjamin, S., and Steiner, M., 2010: Advances in the Consolidated Storm Prediction for Aviation (CoSPA). Preprints, 14th Conf. on Aviation, Range, and Aerospace Meteorology, Atlanta, GA, Amer. Meteor. Soc., J11.2. [Available online at https://ams.confex.com/ams/pdfpapers/163811.pdf.]

  • Pinto, J., Grim, J., and Steiner, M., 2015: Assessment of the High-Resolution Rapid Refresh model's ability to predict mesoscale convective systems using object-based evaluation. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

  • Roberts, N. M., and Lean, H. W., 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97, doi:10.1175/2007MWR2123.1.

  • Skamarock, W. C., 2004: Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Wea. Rev., 132, 3019–3032, doi:10.1175/MWR2830.1.

  • Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Barker, D. M., Wang, W., and Powers, J. G., 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp. [Available online at http://www2.mmm.ucar.edu/wrf/users/docs/arw_v2_070111.pdf.]

  • Skok, G., Tribbia, J., and Rakovec, J., 2010: Object-based analysis and verification of WRF Model precipitation in the low- and midlatitude Pacific Ocean. Mon. Wea. Rev., 138, 4561–4575, doi:10.1175/2010MWR3472.1.

  • Thompson, G., Rasmussen, R., and Manning, K., 2004: Explicit forecasts of winter precipitation using an improved microphysics scheme. Part I: Description and sensitivity analysis. Mon. Wea. Rev., 132, 519–542, doi:10.1175/1520-0493(2004)132<0519:EFOWPU>2.0.CO;2.

  • Weygandt, S. S., Smirnova, T. G., Benjamin, S. G., Brundage, K. J., Sahm, S. R., Alexander, C. R., and Schwartz, B. E., 2009: The High Resolution Rapid Refresh (HRRR): An hourly updated convection resolving model utilizing radar reflectivity assimilation from the RUC/RR. Preprints, 23rd Conf. on Weather Analysis and Forecasting/19th Conf. on Numerical Weather Prediction, Omaha, NE, Amer. Meteor. Soc., 15A.6. [Available online at https://ams.confex.com/ams/23WAF19NWP/techprogram/paper_154317.htm.]

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.

  • Wolff, J. K., Harrold, M., Fowler, T., Gotway, J. H., Nance, L., and Brown, B. G., 2014: Beyond the basics: Evaluating model-based precipitation forecasts using traditional, spatial, and object-based methods. Wea. Forecasting, 29, 1451–1472, doi:10.1175/WAF-D-13-00135.1.

  • Wolfson, M. M., and Coauthors, 2008: Consolidated Storm Prediction for Aviation (CoSPA). Integrated Communications, Navigation and Surveillance Conf., Bethesda, MD, IEEE, 1–19, doi:10.1109/ICNSURV.2008.4559190.

  • Xiao, Q., and Sun, J., 2007: Multiple-radar data assimilation and short-range quantitative precipitation forecasting of a squall line observed during IHOP 2002. Mon. Wea. Rev., 135, 3381–3404, doi:10.1175/MWR3471.1.