Previous studies examining convection-allowing models (CAMs), as well as NOAA/Hazardous Weather Testbed Spring Forecasting Experiments (SFEs), have typically emphasized “day 1” (12–36 h) forecast guidance. These studies have found a distinct advantage in CAMs relative to models that parameterize convection, especially for fields strongly tied to convection such as precipitation. During the 2014 SFE, “day 2” (36–60 h) forecast products from a CAM ensemble provided by the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma were examined. Quantitative precipitation forecasts (QPFs) from the CAPS ensemble, known as the Storm Scale Ensemble Forecast (SSEF) system, are compared to those from NCEP’s operational Short Range Ensemble Forecast (SREF) system, which provides lateral boundary conditions for the SSEF, to assess whether the CAM ensemble maintains its advantage through forecast hours 36–60. Equitable threat scores (ETSs) were computed for precipitation thresholds ranging from 0.10 to 0.75 in. for each SSEF and SREF member, as well as the ensemble means, for 3-h accumulation periods. The ETS difference between the SSEF and SREF peaked during hours 36–42. Probabilistic forecasts were evaluated using the area under the receiver operating characteristic curve (ROC area); the SSEF had higher ROC areas, especially at thresholds ≥ 0.50 in. Additionally, time–longitude diagrams of diurnally averaged rainfall were constructed for each SSEF and SREF ensemble member. Spatial correlation coefficients between forecasts and observations in time–longitude space indicated that the SSEF depicted the diurnal cycle much better than the SREF, which underforecasted precipitation and lagged the observed peak by 3 h. A small minority of SREF members, however, depicted the diurnal cycle well.
Historically, warm season quantitative precipitation forecasts (QPFs) have been especially challenging for numerical weather prediction (NWP) models. While NWP forecasts for fields such as 500-hPa heights have improved, the skill of warm season QPFs has exhibited little change over time (e.g., Fritsch et al. 1998). Improvements in warm season QPFs would not only help with predictions of hazards such as flash floods (e.g., Vasiloff et al. 2007), which have accounted for roughly 2500 fatalities in the United States over the past half century (NOAA 2015), but would also benefit agriculture (e.g., through improved irrigation management), transportation industries, government, and emergency management (e.g., Sukovich et al. 2014). Recognizing the gap in skill for QPFs relative to other variables, Roebber et al. (2004) discussed a number of avenues for closing this gap, including increasing model resolution sufficiently to explicitly depict convection and utilizing ensembles to depict the high degree of forecast uncertainty often associated with convection. However, due to computational limitations, it has only been very recently that operational models with sufficient resolution to explicitly depict convection [hereafter referred to as convection-allowing models (CAMs)1] have become available, and assessing and improving their capabilities is a rich area of research.
Recent work has shown that CAMs provide advantages relative to models that parameterize convection for several aspects of QPF. The advantages include an improved depiction of the diurnal precipitation cycle (Clark et al. 2007, 2009; Weisman et al. 2008; Berenguer et al. 2012) and better representation of the observed convective mode (e.g., Done et al. 2004; Kain et al. 2006). Additionally, Clark et al. (2009) found a distinct advantage using objective verification metrics in a small-membership CAM-based ensemble relative to a much larger convection-parameterizing ensemble. Furthermore, Roberts and Lean (2008), Schwartz et al. (2009), and Clark et al. (2010) also show improved precipitation forecasts in CAMs relative to coarser models, but illustrate that in some cases to see the improvements, spatial scales larger than the model grid spacing need to be considered using neighborhood-based objective metrics.
Other recent work has examined how CAM guidance is perceived relative to convection-parameterizing guidance by forecasters in simulated operational forecasting environments. For example, during the 2010 and 2011 NOAA/Hazardous Weather Testbed (HWT) Spring Forecasting Experiments (SFEs), QPF products from a CAM-based ensemble were compared to guidance from the operational Short Range Ensemble Forecast (SREF) system by a group of participants led by Weather Prediction Center (WPC; formerly the Hydrometeorological Prediction Center) forecasters. The WPC-led group found such an advantage in the CAM-based ensemble that they viewed the guidance as “transformational” to warm season QPFs (Clark et al. 2012). Evans et al. (2014) also conducted an experiment in a simulated forecasting environment, finding that forecasters felt CAM-based guidance added perceived value in QPFs relative to operational models that parameterized convection for an extreme heavy rainfall event related to a tropical storm.
From the aforementioned studies, it has become clear that CAM-based guidance provides important gains in QPF relative to convection-parameterizing guidance, but little work has been done to extend these comparisons past the day 1 (12–36 h) forecast period. For example, until 2014, NOAA/HWT SFEs had only examined CAM-based guidance extending to 36 h. Starting in 2014, however, the Center for Analysis and Prediction of Storms (CAPS) ran the Storm Scale Ensemble Forecast (SSEF) system to 60 h to test its performance through the day 2 period. At such long forecast lead times, infiltration of the lateral boundary conditions (LBCs) begins to have a relatively large influence on the forecasts; thus, whether a CAM-based ensemble can still maintain its advantage warrants further investigation. Therefore, the main purpose of this study is to objectively analyze these forecasts to evaluate whether the advantages relative to convection-parameterizing guidance translate to these longer lead times. For this purpose, the 60-h SSEF system forecasts are compared to those from the 16-km grid-spacing SREF ensemble using a variety of objective metrics for evaluating deterministic and probabilistic forecasts.
2. Data and methodology
a. SSEF and SREF ensemble description
Forecast precipitation for 3-h periods was examined from the convection-allowing SSEF and the convection-parameterizing SREF, which had 4- and 16-km grid spacing configurations, respectively. The SSEF system was provided to support the 2014 NOAA/HWT Spring Forecasting Experiment and consisted of model integrations conducted from 21 April through 6 June. Because CAPS did not run the SSEF ensemble members on most weekend days during April–June, 30 days during the 2014 SFE period had full datasets available from both ensembles: 24–29 April; 1–2, 5–9, 12–16, 19–23, and 26–30 May; and 2–3 June. The 2100 UTC SREF initializations were used, and the SSEF was initialized 3 h later at 0000 UTC (Kong et al. 2014). Observed precipitation data were derived from the NCEP stage IV dataset (Baldwin and Mitchell 1997), which was on a 4-km grid.
The SREF ensemble data (Du et al. 2014) were available on a 32-km grid from NCEP’s archives (available upon request). At the time, the 21-member ensemble consisted of 7 members from the Nonhydrostatic Multiscale Model on the B grid (NMMB; Janjić 2005, 2010; Janjić and Black 2007; Janjić et al. 2011; Janjić and Gall 2012), 7 members from the WRF Nonhydrostatic Mesoscale Model (NMM; Janjić 2003), and 7 members from the Advanced Research WRF Model (ARW; Skamarock et al. 2008). Each set of seven members had one control member, three positive perturbations, and three negative perturbations. The NMMB perturbations were generated with a breeding cycle (e.g., Toth and Kalnay 1997) initialized at 2100 UTC; the bred perturbations were added to and subtracted from the control, creating six different perturbed analyses. The ARW members used the ensemble transform with rescaling methodology (ETR; Ma et al. 2014) to generate perturbations, while the NMM members used a blend of ETR and breeding.
Physics parameterizations in the SREF system consisted of the MYJ planetary boundary layer scheme (Mellor and Yamada 1982; Janjić 1990, 2002), the MRF planetary boundary layer scheme (Troen and Mahrt 1986; Hong and Pan 1996), and the Noah land surface model (Ek et al. 2003). Surface layer parameterizations consisted of the MYJ surface layer scheme (Mellor and Yamada 1982; Janjić 2002) and the Monin–Obukhov scheme with the Janjić Eta Model (Monin and Obukhov 1954; Paulson 1970; Dyer and Hicks 1970; Webb 1970; Janjić 2002). Radiation schemes consisted of the GFDL shortwave (SW; Lacis and Hansen 1974) and GFDL longwave (LW) schemes (Fels and Schwarzkopf 1975; Schwarzkopf and Fels 1991). Microphysical parameterizations consisted of the scheme used in the GFS (Zhao and Carr 1997), the Ferrier scheme (Ferrier et al. 2002), and the WSM6 scheme (Ferrier et al. 2002; Hong and Lim 2006). The convection parameterization schemes consisted of the Kain–Fritsch (KF; Kain and Fritsch 1998), the Betts–Miller–Janjić (BMJ; Betts and Albrecht 1987; Janjić 2002), and the simplified Arakawa–Schubert (SAS; Arakawa 2004) schemes. Full specifications of the SREF are given in Table 1.
The SSEF system was generated using the WRF Model (Skamarock et al. 2008) run by CAPS for the 2014 NOAA/HWT Spring Forecasting Experiment (Kong et al. 2014). During 2014, the SSEF system had 20 members with 4-km grid spacing that were initialized on weekdays at 0000 UTC and integrated for 60 h over a continental United States (CONUS) domain from late April to the beginning of June. Initial condition (IC) analysis backgrounds and LBCs (3-h updates) for the control member were taken from the NAM analyses and forecasts, respectively. Radial velocity and reflectivity data from up to 140 Weather Surveillance Radar-1988 Doppler (WSR-88D) radars, along with other high-resolution observations, were assimilated into the ICs using the ARPS three-dimensional variational data assimilation (3DVAR; Xue et al. 2003; Gao et al. 2004) and cloud analysis system (Xue et al. 2003; Hu et al. 2006; Gao and Xue 2008). IC perturbations were derived from evolved (through 3 h) perturbations of the 2100 UTC initialized SREF members and added to the control member ICs. For each perturbed member, the forecast of the SREF member that supplied the IC perturbations also supplied the LBCs. For the purposes of this study, only the 12 members consisting of the control member and the 11 members with IC/LBC perturbations were utilized; the other eight SSEF members were run with the same ICs/LBCs as the control but with different physics parameterizations to study physics sensitivities.
Because a subset of the 21 SREF member forecasts is used as the LBCs for the SSEF members (except in the control), the two systems are inherently linked to each other. With the LBCs infiltrating into much of the domain interior, particularly by the latter half of the 60-h forecasts, we are essentially testing whether—given similar driving data (i.e., the LBCs)—the convection-allowing grid spacing can still provide an advantage. Table 2 shows detailed specifications of the 12 SSEF members used for this study. Nine out of the 12 members used the Noah land surface model (Ek et al. 2003), as was the case in the SREF ensemble. PBL schemes include the MYJ, Mellor–Yamada–Nakanishi–Niino (MYNN; Nakanishi 2000, 2001; Nakanishi and Niino 2004, 2006), Yonsei University (YSU; Noh et al. 2003), and quasi-normal scale elimination (QNSE; Sukoriansky et al. 2005) schemes. The microphysical parameterization consisted of the Thompson scheme (Thompson et al. 2004), Morrison scheme (Morrison et al. 2005), the WRF double-moment 6-class scheme (WDM6; Morrison et al. 2005; Lim and Hong 2010), and the double-moment Milbrandt and Yau scheme (MY2; Milbrandt and Yau 2005).
Before any verification metrics were computed, the SREF, SSEF, and 3-h observed NCEP stage IV precipitation data were interpolated onto the 32-km SREF grid using a neighbor budget interpolation (e.g., Accadia et al. 2003). In addition, because the SREF domain is larger than the SSEF domain, which covers only the CONUS, a mask was applied so that only points within the SSEF domain, over land east of the Rocky Mountains, and within the United States were considered (Fig. 1). This restriction reflects the relative lack of reliable WSR-88D observations over the mountains, over water, and outside the United States.
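For illustration, the remapping step can be approximated with a conservative block average. This is a simplified stand-in for the full neighbor budget interpolation of Accadia et al. (2003), which additionally handles misaligned grids; the factor of 8 corresponds to the 4-to-32-km remap, and the toy field below is an assumption for the example.

```python
import numpy as np

def block_average(fine, factor):
    """Average a fine grid onto a coarser grid, factor x factor at a time.

    Simplified, conservative stand-in for the neighbor budget
    interpolation (Accadia et al. 2003): each coarse cell receives the
    mean of the block of fine cells it contains, so the domain-total
    precipitation is preserved.
    """
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return fine.reshape(ny // factor, factor,
                        nx // factor, factor).mean(axis=(1, 3))

# Toy example: remap an 8x8 "4-km" field to a single "32-km" cell
fine = np.arange(64, dtype=float).reshape(8, 8)
coarse = block_average(fine, 8)
```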
b. Forecast evaluation metrics
The first metric that was used to evaluate the precipitation forecasts from each ensemble was the equitable threat score (ETS; Schaefer 1990). ETS measures the fraction of observed and/or forecast events that were correctly predicted, adjusted for hits associated with random chance. The ETS was calculated using contingency table elements computed from every grid point in the 32-km grid-spacing analysis domain for each ensemble member every 3 h as follows: ETS = (H − Hcha)/(H + FA + M − Hcha), where H is the number of hits (the model correctly forecast precipitation to exceed a given threshold); Hcha = (H + FA)(H + M)/N is the number of hits expected by chance for N total grid points; FA is the number of false alarms (the model forecast precipitation to exceed the threshold, but the observed precipitation did not); and M is the number of misses (the model did not forecast precipitation to exceed the threshold, but the observed precipitation did). An ETS of 1 is perfect, and scores at or below zero represent no forecast skill.
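For concreteness, the ETS computation can be sketched as follows; this is a minimal NumPy illustration of the formula above, not the authors' verification code.

```python
import numpy as np

def ets(forecast, observed, threshold):
    """Equitable threat score (Schaefer 1990) over all grid points.

    H = hits, FA = false alarms, M = misses; the hits expected by
    chance are Hcha = (H + FA) * (H + M) / N for N total grid points.
    """
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    fa = np.sum(f & ~o)
    miss = np.sum(~f & o)
    n = f.size
    h_cha = (hits + fa) * (hits + miss) / n
    denom = hits + fa + miss - h_cha
    return float((hits - h_cha) / denom) if denom > 0 else 0.0
```

A perfect forecast yields an ETS of 1 regardless of the chance correction, since the numerator and denominator collapse to the same value.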
In addition to computing the ETS for individual ensemble members, ETSs were also computed for ensemble mean precipitation forecasts. Ensemble means were computed using the probability matching technique (Ebert 2001). This technique assumes that the best spatial representation of the precipitation field is given by the ensemble mean and that the best probability density function (PDF) of rain rates is given by the ensemble member QPFs for all n ensemble members. To compute the probability matched mean, the precipitation forecasts from the ensemble members for every grid point are ranked in order from largest to smallest, keeping every nth value. The precipitation forecasts from the ensemble mean forecast are similarly ranked from largest to smallest, keeping every value. Then, the grid point with the highest value in the ensemble mean QPF field is reassigned to the highest QPF value in the ensemble member QPF distribution. Next, the grid point with the second highest value in the ensemble mean QPF field is reassigned to the second highest value in the ensemble member QPF distribution. This process is then repeated for all of the rankings, ending with the lowest. ETSs were then calculated from these probability matched means for both the SSEF and SREF for forecast hours 3–60.
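The reassignment procedure described above can be sketched in a few lines; this is an illustrative NumPy version of the Ebert (2001) technique, not the operational implementation.

```python
import numpy as np

def probability_matched_mean(members):
    """Probability matched ensemble mean (Ebert 2001), sketched.

    members: array of shape (n_members, n_points). The spatial pattern
    is taken from the simple ensemble mean; the amplitude distribution
    is taken from the pooled member QPFs, keeping every nth value.
    """
    n, npts = members.shape
    mean = members.mean(axis=0)
    # Pool all member values, rank largest to smallest, keep every nth
    pooled = np.sort(members.ravel())[::-1][::n]   # length == npts
    # Largest mean grid point gets the largest pooled value, and so on
    order = np.argsort(mean)[::-1]
    pmm = np.empty(npts)
    pmm[order] = pooled
    return pmm

# Tiny two-member, three-point example (hypothetical values)
members = np.array([[0.0, 2.0, 4.0],
                    [1.0, 3.0, 5.0]])
pmm = probability_matched_mean(members)
```

The result keeps the ensemble mean's spatial pattern but restores member-like rain-rate amplitudes, countering the smoothing of the plain mean.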
Finally, hypothesis testing was conducted to evaluate whether the SSEF forecasts were significantly more accurate than the SREF forecasts. The hypothesis testing was conducted for all 20 accumulation periods spanning 60 forecast hours using the resampling method of Hamill (1999). To apply this method, the test statistic used to look at the difference in accuracy of the 3-h precipitation forecast ending at hour hr (where hr is a given forecast hour) is ETSSSEFhr − ETSSREFhr. The null hypothesis Ho is ETSSSEFhr − ETSSREFhr = 0.00. The alternative hypothesis Ha is ETSSSEFhr − ETSSREFhr ≠ 0.00. The significance level used was α = 0.05 and resampling was done 1000 times for each hypothesis test. In addition, the forecasts were corrected for bias before the resampling hypothesis tests were conducted. This was done for each threshold by calculating the average bias of the two ensemble means, and then finding the precipitation threshold at which the bias of each ensemble equals the average bias from the original precipitation threshold. More detailed information about the resampling method can be found in Hamill (1999).
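Hamill (1999) resamples the contingency-table elements between the two forecast sources; the following is a deliberately simplified sketch of that paired-resampling idea, applied here to per-period score differences rather than the full contingency tables, with all names hypothetical.

```python
import numpy as np

def resample_test(scores_a, scores_b, n_resamples=1000, seed=0):
    """Paired resampling test in the spirit of Hamill (1999), simplified.

    Randomly flipping the sign of each period's score difference builds
    a null distribution for the mean difference under H0 (no difference);
    the return value is a two-sided p value.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = diffs.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, diffs.size))
    null = (signs * diffs).mean(axis=1)
    return float(np.mean(np.abs(null) >= abs(observed)))
```

With alpha = 0.05 as in the text, a returned p value below 0.05 would reject H0 in favor of a real accuracy difference.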
In addition to the deterministic forecasts, probabilistic quantitative precipitation forecasts (PQPFs) for both the SSEF and SREF were generated for all five precipitation thresholds for 3-h QPFs. Probabilities were computed as the ratio of the number of members exceeding the specified threshold to the total number of members. The probabilistic forecasts were evaluated using the area under the receiver operating characteristic curve (ROC area; Mason 1982), which measures the ability to discriminate between events (exceedances of a specified threshold) and nonevents (failures to exceed a specified threshold). It is calculated by computing the area under a curve constructed by plotting the probability of detection (POD) versus the probability of false detection (POFD). The area under the curve is computed using the trapezoidal method (Wandishin et al. 2001) employing the probabilities 0.05–0.95, in increments of 0.05. Values of the ROC area range from 0 to 1, with a value of 0.5 indicating no forecast skill and values above 0.5 indicating positive forecast skill. Similar to the ETS, statistical significance was tested for the ROC areas using the resampling method (Hamill 1999).
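The trapezoidal ROC area calculation can be sketched as follows; the probability levels are the 0.05–0.95 increments named above, and the end points (0, 0) and (1, 1) close the curve. This is an illustrative sketch, not the authors' code.

```python
import numpy as np

def roc_area(probs, events, levels=np.arange(0.05, 1.0, 0.05)):
    """Area under the ROC curve by the trapezoidal method
    (Wandishin et al. 2001), using probability levels 0.05-0.95.

    probs: ensemble-derived forecast probabilities at grid points;
    events: boolean observed exceedances at the same points.
    """
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=bool)
    n_ev = max(events.sum(), 1)
    n_non = max((~events).sum(), 1)
    pods, pofds = [1.0], [1.0]            # end point at (POFD, POD) = (1, 1)
    for level in sorted(levels):
        hit = probs >= level
        pods.append((hit & events).sum() / n_ev)
        pofds.append((hit & ~events).sum() / n_non)
    pods.append(0.0)                      # end point at (0, 0)
    pofds.append(0.0)
    area = 0.0
    for i in range(len(pods) - 1):        # trapezoids between successive points
        area += 0.5 * (pods[i] + pods[i + 1]) * (pofds[i] - pofds[i + 1])
    return area
```

A forecast whose probabilities perfectly separate events from nonevents scores 1, while a constant probability everywhere scores 0.5 (no discrimination).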
To analyze the diurnal precipitation cycle, the 3-h QPFs were averaged over each forecast hour for each ensemble member, the probability matched means, and the stage IV observations. Latitudinal averages of forecast and observed 3-h precipitation were then computed and plotted in time–longitude space (i.e., Hovmoeller diagrams) for each ensemble member and the means. A Hovmoeller diagram depicting the difference between the model forecast and the observed precipitation was also constructed for each ensemble member. Spatial correlation coefficients between the forecasts and observations were computed for each 24-h forecast period (hours 12–36 and 36–60) for each ensemble member in order to quantify how well the model forecast precipitation corresponded to the observed diurnal cycle. This method is similar to that used in Clark et al. (2007, 2009).
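The spatial correlation in time–longitude space reduces to a Pearson correlation over the flattened fields; the sinusoidal toy diagram below (a hypothetical diurnal cycle) illustrates how a phase lag lowers the coefficient.

```python
import numpy as np

def hovmoeller_correlation(forecast, observed):
    """Pattern correlation between forecast and observed precipitation
    in time-longitude space, as in Clark et al. (2007, 2009).

    forecast, observed: 2D arrays (time, longitude) of latitudinally
    averaged 3-h precipitation; returns the Pearson correlation of the
    flattened fields.
    """
    f = np.asarray(forecast, dtype=float).ravel()
    o = np.asarray(observed, dtype=float).ravel()
    return float(np.corrcoef(f, o)[0, 1])

# Hypothetical diurnal cycle; shifting it by one 3-h step mimics the
# phase lag seen in most SREF members
t = np.arange(8)
obs = np.sin(2.0 * np.pi * t / 8.0).reshape(2, 4)
lagged = np.roll(obs.ravel(), 1).reshape(2, 4)
```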
3. Results

a. Equitable threat scores

Figures 2a–d depict the ETSs from the 0.10- to the 0.75-in. threshold for each of the ensemble members as a function of forecast hour. Generally, ETSs were fairly low, with values above 0.2 occurring only at the lower thresholds and at forecast hours 3–12, mainly in the SSEF. However, these low values are consistent with the results from previous work that focused on the day 1 period (e.g., Clark et al. 2007). The SSEF outperformed the SREF during the day 1 period up to hour 36, with ETS differences of around 0.05. SSEF member ETSs had a broad maximum near 1200 UTC (forecast hours 12 and 36) and a broad minimum around 0000 UTC (forecast hour 24). This pronounced diurnal cycle in the SSEF member ETSs, especially at thresholds ≥ 0.50 in., was likely associated with morning mesoscale convective system (MCS) activity: because the SSEF members can explicitly depict large organized convective systems and their associated precipitation, these systems produced the peaks in ETS. The diurnal cycle in the ETSs was much less pronounced in the SREF ensemble, likely because of its inability to depict these types of convective systems.
The SSEF continued to have ETSs 0.02–0.05 points higher than the SREF in the day 2 forecast period from hours 36 to 60, with a pronounced diurnal cycle. The difference in ETS between the SSEF and SREF ensembles was greater in the 36–42-h forecast period than in the later periods. However, a definite benefit persists all the way out to forecast hour 60, as the mean ETS from the SSEF is always higher than the mean ETS from the SREF.
Significant differences between the ETSs of the probability matched means (indicated by red, dashed, vertical lines in Fig. 2) were more frequent at higher thresholds and generally occurred during the first 18 h of the forecasts and between hours 33 and 42. It is possible that a larger sample size or the consideration of larger spatial scales (e.g., Clark et al. 2010) would result in more times with significant differences.
b. Area under the ROC curve
Figures 3a–d depict the area under the ROC curve for all 60 forecast hours for the 0.10-, 0.25-, 0.50-, and 0.75-in. thresholds. The green shading represents statistically significant hours in favor of the SSEF. At the 0.10-in. threshold, both the SSEF and SREF have a similar amount of skill up to forecast hour 24. After that, the SSEF has a slightly greater ROC area, even during the day 2 period, with the SSEF ROC area being significantly higher at hours 42–48. At thresholds of 0.25 in. or greater, the SSEF outperforms the SREF by a much wider margin. The SSEF has significantly higher ROC area values up to hour 48 for thresholds of ≥0.25 in. At the 0.25-in. threshold, both ensembles have positive skill, but the SSEF consistently has a ROC area of about 0.1 greater than the SREF, including during the entire day 2 period, although hours 48–60 are not significant. The gap widens further, although the forecast skill starts to decrease at the 0.50-in. threshold. The SREF has almost no skill in the day 2 period, while the SSEF has some positive skill all the way out to forecast hour 60. During the day 2 period at the 0.75-in. threshold, the SREF has essentially no skill, while the SSEF still has a distinct positive amount of forecast skill, although the ROC area values are lower than values at the lower thresholds. Similar to the ETSs, ROC area values for all four thresholds show a distinct advantage for the SSEF during the day 2 period. Unlike the ETS results, there was not a more pronounced difference in ROC area values for forecast hours 36–42 versus the remainder of the day 2 period. For both ensembles, the ROC area values decreased sharply after hour 48, with only a fraction of those hours being statistically significant for the SSEF. When compared to the ETS results, there were many more hours with significant differences in favor of the SSEF using ROC area. 
The larger and more frequent differences for ROC area likely occur because the ETS evaluates single deterministic forecasts from the individual members or the ensemble mean and requires gridpoint matches; if the forecast is wrong at a grid point, it is fully penalized. In contrast, the ROC area evaluates probabilistic forecasts that incorporate information from all ensemble members, so if only one or two members have a correct forecast at a grid point, the forecast is only partially penalized and receives some credit for being correct. Thus, because of the inherent uncertainty associated with longer-range, high-resolution precipitation forecasts, it is easier for a superior ensemble system to receive credit using probabilistic measures that account for forecast uncertainty.
c. Hovmoeller diagrams and related metrics
Hovmoeller diagrams were created for each SSEF and SREF ensemble member, but only the probability matched means and selected members are displayed in Figs. 4 and 5. The SSEF ensemble mean is a representative depiction of the latitudinal average forecast precipitation field of all SSEF members, as the variation between members was small (not shown). There was more member-to-member variability in the SREF ensemble, but Fig. 5 is fairly representative of what the Hovmoeller diagrams of most of the SREF ensemble members looked like, with the exception of some NMM and NMMB members. These exceptions will be addressed later in this paper. In general, the SSEF better represented the observed precipitation, although there was slight overforecasting evident in the eastern areas. During the day 2 period, a diurnal cycle was clearly evident in the SSEF, with both forecast and observed precipitation maxima occurring around forecast hour 48. In the SREF, it can be seen that much of the precipitation is smoothed out, especially in the day 2 period. Furthermore, there is a 3-h phase lag relative to observations evident in the SREF mean Hovmoeller diagram. This phase lag was observed with most of the SREF ensemble members, but not all of them. No phase lag was observed in any of the SSEF members.
As noted earlier in this section, a few SREF members performed noticeably better than the SSEF ensemble mean during both the day 1 and day 2 periods: four from the NMM and two from the NMMB. Figure 6 shows a Hovmoeller diagram for nmm_n3, one of the six better-performing SREF members. There is no phase lag in the nmm_n3 forecasts, and a coherent diurnal cycle is evident even during the day 2 period. The other five better-performing SREF members had similar Hovmoeller diagrams, with well-defined diurnal cycles (not shown). These six members also stand out in the spatial correlation coefficients computed for the day 2 period, displayed in Fig. 7 along with the other SSEF/SREF members: while the SSEF members had much higher values than most of the SREF members, the six SREF members indicated in the plot had slightly higher coefficients than the SSEF members. In addition, the better-performing SREF members accurately depicted precipitation amounts across the area of analysis without a phase lag (not shown). Four of the six used the SAS convective parameterization scheme, but it is still not certain exactly why these six SREF members performed better.
Domain-averaged precipitation amounts are shown in Fig. 8 for all of the SSEF and SREF members in red and blue, respectively, along with observations in black. The phase lag can easily be seen, as the maxima in the SREF forecast precipitation are at hours 21 and 45, while the maxima in the SSEF forecasts and stage IV observations are both at hours 24 and 48. In addition, most of the SREF members greatly underforecasted the precipitation during both the day 1 and day 2 periods. Although the SSEF overforecasted throughout the 60 h, its forecasts matched the observed precipitation fairly well, even in the day 2 period.
d. Individual cases
The ROC area for both ensembles was computed for each of the 30 cases to examine the distribution of ROC area differences between the SSEF and SREF during the day 2 forecast period. Figure 9 shows this distribution of ROC area differences (SSEF − SREF) for forecast hours 51 and 57. For both forecast hours, the median difference is around 0.1, but during forecast hour 57 there is a larger spread, with differences ranging from −0.164 to 0.498. For both forecast hours, in about 15% of the cases, the SREF performed slightly better during the day 2 period. In a few instances, especially when large convective systems were present, the SSEF far outperformed the SREF during the day 2 period. Selected forecasts and observations from two of these cases are displayed in Fig. 10. Figures 10a and 10b show the 51-h forecast PQPFs valid at 0300 UTC on 29 April for the 0.50-in. threshold (color shading), ROC area, and 3-h observed precipitation greater than 0.50 in. (overlaid in red stippling on PQPF forecast maps). Figures 10c and 10d display the same parameters as Figs. 10a and 10b except they display the 57-h forecast valid at 0900 UTC on 4 June.
There were a few differences in the synoptic setup and severe weather frequencies between the two cases. The 28–29 April case featured a vertically stacked closed low from the surface to around 300 hPa over southeast Nebraska and 50–80-kt (where 1 kt = 0.51 m s−1) 500-hPa winds over the region of interest displayed in Figs. 10a,b. At the surface, temperatures were only in the upper 70s to lower 80s °F across Alabama and Mississippi, with dewpoints in the mid- to upper 60s °F. A weak cold front was advancing from west to east across the southeastern United States. Over 100 tornadoes were reported during this severe weather outbreak across the Southeast, although none were rated higher on the Enhanced Fujita (EF) scale than EF2 (not shown). The SREF failed to depict where the heaviest precipitation would fall during the selected 3-h period on 29 April. None of the area with observed 3-h precipitation ≥ 0.50 in. had PQPFs greater than 0.2 for the 0.50-in. threshold. In addition, the SREF did not pick up on the southern part of the observed line of thunderstorms, having zero probabilities for a large area of Alabama and Mississippi at 0300 UTC on 29 April. On the other hand, the SSEF was able to predict the location of the main line of thunderstorms extending from eastern Mississippi to central Tennessee. Even the southern end of observed 3-h precipitation ≥ 0.50 in. corresponded fairly well to the southern end of the SSEF PQPFs for the forecast initialized 51 h before the event.
In contrast to the 29 April case, there were no well-defined upper-level features during the 3–4 June case across the plains and the midwestern United States. Zonal flow was observed in the upper levels, but there were 40–50-kt winds at 500 hPa over Nebraska, Iowa, and northern Missouri. Unlike the 28–29 April outbreak, a large temperature gradient existed across the area as a front extended from west to east across Nebraska and Iowa. South of the front, temperatures rose to around 90°F on 3 June and dewpoints were around 70°F, while north of the front, temperatures were only in the 60s °F. Thunderstorms developed during the day in Nebraska and Iowa on 3 June and then moved into northern Missouri during the early morning hours on 4 June. Again, the SREF did not accurately predict where the bulk of the precipitation fell, missing the area of northern Missouri and central Illinois where 3-h precipitation amounts were ≥0.50 in. In addition, the SREF had a large area of false alarms in Iowa, where PQPFs of 0.4–0.5 were forecast but no 3-h precipitation observations of ≥0.50 in. occurred. Conversely, the SSEF was able to predict the spatial extent of the precipitation associated with an MCS and had fewer false alarms in Iowa, as forecast probabilities were only in the 0.1–0.2 range. In the area where precipitation ≥ 0.50 in. was observed in the June case, the SSEF had probabilities greater than 0.6 that verified. The SSEF was also able to accurately depict the eastern extension of the MCS into Illinois, where the SREF had zero probability of 3-h precipitation ≥ 0.50 in.
In both cases, the SREF failed to depict where the heaviest precipitation would fall during the selected 3-h period as none of the area with observed 3-h precipitation ≥ 0.50 in. had PQPFs greater than 0.2 for the 0.50-in. threshold. In addition, the SREF did not pick up on the southern part of the observed line of thunderstorms in both the April and June cases, having zero probabilities for large areas where precipitation amounts of ≥0.50 in. were observed. The ROC area differences of 0.20–0.35 during these cases along with the forecasts and observations clearly show that the CAM SSEF ensemble had a definite forecast advantage during the day 2 period, with the advantage most recognizable when large-scale convection was present.
4. Summary and discussion
The 3-h QPFs during the day 2 forecast period from the 32-km grid-spacing SREF ensemble were compared to QPFs from the 4-km grid-spacing CAM SSEF ensemble. The forecasts were initialized at 2100 and 0000 UTC on 30 days during the 2014 NOAA/HWT Spring Forecasting Experiment from 24 April to 3 June. Some of the SREF members provide lateral boundary conditions for the SSEF. The goal of this study was to see whether the SSEF CAM ensemble QPFs outperform those from the convection-parameterizing SREF ensemble during the day 2 (36–60 h) forecast period, as previous work has found a distinct advantage in QPFs in CAMs relative to models that parameterize convection during the day 1 (12–36 h) period. The analysis was done by computing ETSs for both ensembles, using spatial correlation coefficients and Hovmoeller diagrams to examine the diurnal precipitation cycle, evaluating PQPFs by computing the ROC area, implementing hypothesis testing on the ETS and ROC area, and examining two specific cases. Results are summarized below.
ETSs computed for the 0.10–0.75-in. thresholds were fairly low (≤0.2), but the ETSs from the SSEF were consistently 0.02–0.05 points higher during the day 2 period, with the advantage peaking during forecast hours 36–42, likely associated with morning MCS activity. Hypothesis testing supported this, as significant differences were found at many of the thresholds during the 36–42-h forecast period.

The ROC area showed an even more pronounced advantage for the SSEF, especially at the higher thresholds. While the ROC area of the SSEF was only a few hundredths of a point higher at the 0.10-in. threshold, the average difference was about 0.10–0.15 points at the higher thresholds. In addition, forecast hours 36–48 had significant differences at all thresholds ≥0.25 in.

Hovmöller diagrams of latitudinally averaged QPFs for each ensemble member, compared with a diagram of the observed precipitation in time–longitude space, showed that the SSEF overforecast precipitation but modeled the diurnal cycle well, as evidenced by much higher mean spatial correlation coefficients than the SREF. Most of the SREF members had a 3-h phase lag in their QPFs. Unexpectedly, however, six SREF members modeled the diurnal cycle well and had higher spatial correlation coefficients than the SSEF members. Four of these six better-performing SREF members used the SAS convective parameterization scheme; further investigation is needed to determine why these six members performed so much better than the rest of the SREF.

Finally, day 2 PQPFs at the 0.50-in. threshold from two severe weather cases (28–29 April and 3–4 June) were examined to see how the SSEF and SREF forecasts would appear to an operational forecaster during a high-impact event. The SSEF greatly outperformed the SREF in both cases, with ROC areas 0.20–0.35 points higher for the selected forecast hours.
After examining all of the results, it is clear that objective evaluation metrics such as the ETS favor the SSEF into the day 2 period. At all five thresholds examined, the ETS difference is most pronounced in the early morning time frame (when some hours have statistically significant differences), when larger-scale convective systems tend to play more of a role in the forecast; the SSEF can explicitly depict these features, while the SREF cannot. Even out to hour 60, the ETS difference in favor of the SSEF ranges from about 0.02 to 0.05, with the larger differences at the higher thresholds. At thresholds above 0.50 in., the SREF showed essentially no skill during the day 2 period. The ETSs of the SSEF also gradually decreased as the threshold increased, but some skill remained even at the 1.00-in. threshold.
For the ROC area, as with the ETSs, the advantage of the SSEF grew as the precipitation threshold increased, and it persisted through the day 2 period, during which the SREF showed no skill in terms of ROC area at thresholds greater than 0.50 in. The main difference from the ETS results was the absence of a pronounced performance maximum during the early morning hours; instead, there was a consistent difference between the SSEF and SREF throughout the day 2 period, with virtually all hours having statistically significant differences up to forecast hour 48 (at thresholds ≥ 0.25 in.).
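The ROC area used above can be computed by sweeping probability thresholds, plotting the probability of detection (POD) against the probability of false detection (POFD) at each threshold, and integrating the resulting curve. A sketch under the assumption that the forecast probabilities are the fraction of ensemble members exceeding the precipitation threshold (function name and inputs are illustrative, not the study's code):

```python
import numpy as np

def roc_area(probabilities, observed_event):
    """Area under the ROC curve for probabilistic forecasts of a binary
    event (e.g., 3-h precipitation >= 0.50 in.). probabilities: forecast
    event probabilities in [0, 1]; observed_event: matching booleans."""
    p = np.ravel(probabilities)
    o = np.ravel(observed_event).astype(bool)
    n_event = max(np.sum(o), 1)
    n_null = max(np.sum(~o), 1)
    pod = [1.0]   # start at (POFD, POD) = (1, 1): warn everywhere
    pofd = [1.0]
    for t in np.linspace(0.0, 1.0, 11):
        warn = p > t
        pod.append(np.sum(warn & o) / n_event)
        pofd.append(np.sum(warn & ~o) / n_null)
    pod.append(0.0)   # end at (0, 0): never warn
    pofd.append(0.0)
    # Trapezoidal integration of POD over POFD (points run from (1,1) to (0,0))
    area = 0.0
    for i in range(len(pod) - 1):
        area += (pofd[i] - pofd[i + 1]) * (pod[i] + pod[i + 1]) / 2.0
    return area
```

Perfect discrimination yields an area of 1.0 and a no-skill forecast yields 0.5, which is the baseline against which the SREF's ROC areas at high thresholds showed "no skill."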
In general, the SSEF depicted a more coherent diurnal cycle than the SREF over the analysis domain. The SREF ensemble mean did not represent the diurnal cycle well, mainly because of its inability to explicitly depict convection, especially during the day 2 period, and it significantly underforecast precipitation in the eastern part of the domain. The only result in favor of the SREF, and an unexpected one, came from the six NMM/NMMB members that slightly outperformed the SSEF in terms of spatial correlation coefficients.
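The spatial correlation coefficients referred to here are Pearson correlations between the forecast and observed rainfall fields in time–longitude (Hovmöller) space. A minimal sketch, with hypothetical array inputs (time × longitude grids of diurnally and latitudinally averaged precipitation):

```python
import numpy as np

def hovmoller_correlation(fcst_tl, obs_tl):
    """Pearson correlation between forecast and observed rainfall in
    time-longitude space. Inputs are 2-D arrays (time x longitude) of
    diurnally and latitudinally averaged precipitation."""
    f = np.ravel(fcst_tl).astype(float)
    o = np.ravel(obs_tl).astype(float)
    f = f - f.mean()
    o = o - o.mean()
    return float(np.dot(f, o) / np.sqrt(np.dot(f, f) * np.dot(o, o)))
```

As an illustration of how the metric penalizes timing errors, a forecast whose 24-h diurnal harmonic is otherwise perfect but lagged by 3 h correlates with the observations at only cos(2π · 3/24) ≈ 0.71, which is how the 3-h phase lag in most SREF members degrades their scores.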
Future work should continue to examine the forecast range at which convection-allowing guidance demonstrates increased PQPF skill relative to coarser modeling systems. The NOAA/HWT Spring Forecasting Experiments are an ideal venue for these tests, since both researchers and operational forecasters can be involved in the evaluations. The preliminary results from this study suggest that extending an operational convection-allowing ensemble to at least 60 h would yield noticeable benefits.
Acknowledgments. This work was made possible by a Presidential Early Career Award for Scientists and Engineers (PECASE). Additional support was provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA11OAR4320072, U.S. Department of Commerce. CAPS SSEF forecasts were supported by the NOAA Collaborative Science, Technology, and Applied Research (CSTAR) program with supplementary support from NSF Grant AGS-0802888, using computing resources at the National Science Foundation XSEDE National Institute for Computational Sciences (NICS) at the University of Tennessee.
It is generally believed that grid spacing of at most 4 km is required for CAMs to adequately resolve the bulk circulations within organized convective systems (e.g., Weisman et al. 1997).