This study addresses the uncertainty of High-Resolution Rapid Refresh (HRRR) quantitative precipitation forecasts (QPFs), which were recently added to the operational hydrologic forecasting framework. We examine the uncertainty features of HRRR QPFs for an Iowa flooding event that occurred in September 2016. Our evaluation of HRRR QPFs is based on conventional QPF verification and on an analysis of mean areal precipitation (MAP) with respect to forecast lead time. The QPF verification results show that the precipitation forecast skill of HRRR drops significantly during short lead times and then decreases gradually at longer lead times. The MAP analysis also demonstrates that the QPF error increases sharply during short lead times and starts decreasing slightly beyond the 4-h lead time. We found that the variability of QPF error measured in terms of MAP decreases as basin scale becomes larger and as lead time becomes longer. The effects of QPF uncertainty on hydrologic prediction are quantified through hillslope-link model (HLM) simulations using hydrologic performance metrics (e.g., Kling–Gupta efficiency). The simulation results agree to some degree with those of the MAP analysis: the performance achieved with the QPF forcing decreases at 1–3-h lead times and starts increasing at 4–6-h lead times. The best performance, obtained at the 1-h lead time, does not seem acceptable because of a large overestimation of the flood peak, along with an erroneous early peak that does not appear in the streamflow observations. This study provides further evidence that HRRR has a well-known weakness at short lead times and that the QPF uncertainty (e.g., bias), described as a function of forecast lead time, should be corrected before the QPF is used in hydrologic prediction.
Weather-related natural disasters have become more frequent and extreme, likely because of climate change and its accompanying effects (e.g., Meehl et al. 2000; Milly et al. 2002; Van Aalst 2006; Lehmann et al. 2015). These disasters have substantially endangered public safety and weakened community resilience in recent years. Accurate hydrologic prediction is a critical factor in preventing and managing the threats posed by increasing risks and vulnerability to extreme weather and water events and in mitigating their impacts (e.g., loss of human life and economic losses). However, operational hydrologic forecasting is challenging because prediction skill is limited by imperfect models and numerous uncertainties in model parameters and input data, which make it difficult to describe the interactions of complex human–nature systems (e.g., Li et al. 2009; Pagano et al. 2014).
During the last several years, the Iowa Flood Center (IFC) has been developing and operating a fully automated flood forecasting system for the entire state of Iowa (Krajewski et al. 2017). As part of the system's development, the IFC has deployed a distributed hydrologic model capable of representing the physical processes that transform rainfall into runoff. The model, called the hillslope-link model (HLM), decomposes the landscape into channels and hillslopes with an average element size of 0.1 km². In the HLM, soil properties and land cover determine the split between runoff and infiltration. For evapotranspiration, the HLM uses climatological estimates derived from 12 years of North American Land Data Assimilation System data (Mitchell et al. 2004). The HLM is mainly driven by the IFC's radar-derived quantitative precipitation estimates (QPEs), generated in real time with space and time resolutions of 0.5 km and 5 min (see, e.g., Seo and Krajewski 2015; Seo et al. 2015). For comparison, we also drive the HLM with the Multi-Radar Multi-Sensor (MRMS) QPE (Zhang et al. 2016), which has 1-km and 1-h resolutions. Similarly, in 2016, the U.S. National Water Center implemented the National Water Model (NWM), which it continues to test operationally. The NWM is also a distributed hydrologic model that simulates and forecasts streamflow over the entire United States (e.g., Maidment 2017; Lin et al. 2018). It contains land surface model components that require meteorological forcing data (e.g., incoming shortwave and longwave radiation, humidity, temperature, pressure, wind speed, and precipitation) to simulate terrestrial hydrologic processes. The precipitation data used for the NWM vary with the forecast model cycle (e.g., MRMS for "analysis and assimilation"). The HLM and NWM share some similarities, mainly that both are based on terrain data (see Quintero and Krajewski 2018).
The operation of the HLM and NWM reveals the increasing demand for high-resolution hydrologic modeling and forecasting. These systems enable researchers to describe more detailed aspects of the interactions between atmosphere and land surface that have not been explored by conventional approaches (e.g., lumped and mesoscale models). This distributed modeling effort can complement current hydrologic guidance at National Weather Service (NWS) forecast locations and expand forecast capabilities and guidance coverage in underserved locations (Cosgrove et al. 2015, 2016).
The IFC recently added quantitative precipitation forecasts (QPFs) generated by the High-Resolution Rapid Refresh (HRRR) model to the HLM forcing stream to extend streamflow forecast lead times (e.g., Arduino et al. 2005; Li et al. 2017). The HRRR model provides precipitation forecasts up to 18-h lead times with space and time resolutions of 3 km and 1 h, which meet the requirements of the IFC's high-resolution distributed hydrologic modeling. The NWM operational configurations of the "analysis and assimilation" and "short range" model cycles (http://water.noaa.gov/about/nwm) also use the HRRR QPF product. While the QPE products in the HLM and NWM forcing streams have often been evaluated in terms of precipitation accuracy as well as their impact on flood prediction (e.g., Chen et al. 2013; Seo et al. 2013; Gochis 2016; Krajewski et al. 2017), the utility of HRRR for flood prediction has not been widely examined. The uncertainty structure of HRRR QPF is still unknown (even though the uncertainty in QPF is generally larger than that in QPE), and its potential effect on hydrologic prediction is not yet well understood. In this study, therefore, we aim to improve this understanding by evaluating the precipitation forecast skill of HRRR and the potential effects of its uncertainty on flood prediction. We used a significant Iowa flooding event that occurred in September 2016 to analyze and evaluate the HRRR model predictions. This case study allowed us to develop a framework to validate numerical weather predictions of variables closely related to hydrologic processes (e.g., precipitation).
This paper is structured as follows. In section 2, we provide information on the flooding event, as well as the precipitation (QPE and QPF) and streamflow data used in this study. We also provide a brief description of the HLM in that section. Section 3 describes the analysis framework and evaluation metrics we used for the HRRR QPF assessment and error characterization. In section 4, we present the evaluation results on the HRRR QPF error and its propagation through rainfall–runoff processes. Section 5 summarizes and discusses our main findings and the limitations of this study, as well as required future work.
2. Flooding event, data, and model
In September 2016, extremely heavy rainfall caused significant river and flash flooding in northeastern and central Iowa (see Fig. 1 for daily precipitation analysis). A tropical air mass interacting with a stationary front triggered several rounds of heavy storms in these areas. During the period of 20–23 September, rainfall totals reported for the region were 80–200 mm on average, and for some localities more than 250 mm. A number of towns near major rivers (e.g., Cedar and Wapsipinicon Rivers) and their tributaries in northeastern and central Iowa experienced significant flooding, causing the evacuation of thousands of residents. The river stages on the major rivers were comparable to those of the devastating Iowa flood of 2008 (e.g., Mutel 2010; Smith et al. 2013). Below, we provide more details on the meteorological and hydrologic aspects of this event, as well as collected precipitation and streamflow datasets used to analyze the uncertainty of HRRR QPF for the selected flooding event.
a. Flooding event summary
On 21 September, a stationary frontal system produced multiple rounds of showers and thunderstorms in northeastern Iowa near Mason City, as shown in Fig. 1a. These thunderstorms developed into a supercell that brought significant amounts of rain over the Shell Rock River. The supercell stayed over the river for more than 3 h and produced 200–250 mm of rain, resulting in flash flooding in Floyd County. On 22 September, the frontal system moved southwestward over central Iowa and delivered 80–120 mm of rain to the Ames area, as shown in Fig. 1b. The rain then spread and moved to northeast Iowa again, delivering 80–150 mm of heavy rain over the same area that the supercell had hit the night before. This resulted in new flash flooding and record water levels on the Shell Rock River at Shell Rock. On 23 September, another round of widespread showers and heavy storms generated by the frontal system delivered more than 130 mm of rain near Cedar Rapids, as shown in Fig. 1c. This heavy rain resulted in the second-highest crest ever recorded on the Cedar River at Cedar Rapids on 30 September. The flooding during this period resulted from intense, slow-moving storms that extensively covered the areas shown in Fig. 1, combined with abnormally wet soil conditions during August 2016.
b. Precipitation and streamflow data
HRRR is a real-time, convection-allowing atmospheric model that is updated hourly and runs on a 3-km grid initialized with radar data assimilation (Benjamin et al. 2009; Alexander et al. 2010). HRRR depends fully on its parent models, the radar-assimilating Rapid Refresh and the radar-enhanced Rapid Update Cycle. The HRRR model provides hourly 3-km QPF for up to 18 h (it also creates an experimental subhourly product, but we used the hourly product in this study). We collected the HRRR QPF product for the 6-month period of May–October 2016, which contains the significant flooding events. We used the entire 6-month product for the conventional verification of HRRR precipitation forecast skill and examined the product for the flooding period to evaluate its impact on streamflow prediction.
MRMS integrates base radar data across the conterminous United States with satellite, lightning, and rain gauge observations, as well as atmospheric environmental data (Zhang et al. 2016). MRMS generates a suite of weather and QPE products (e.g., rainfall rate, accumulation, and precipitation type) with time resolutions ranging from 2 min to 1 day and a spatial resolution of approximately 1 km. Because of their national coverage and high resolution, MRMS products are extensively used to strengthen severe weather warnings and forecasting. In particular, MRMS QPE is not only fed into the HLM and NWM but is also used by other operational hydrologic and weather models for improved flash flood and weather forecasting (e.g., Gourley et al. 2017). In this study, we used the rain gauge–corrected MRMS QPE (hereafter, MRMS-GC) as a reference to assess the forecast skill and performance of HRRR QPF. MRMS-GC is the "1-h local gauge bias-corrected radar precipitation accumulations" product, referred to as "Q3GC_SHSR_1H" in Zhang et al. (2016). We collected the hourly MRMS-GC product for the same 6-month period.
The IFC product is a radar-only (i.e., without rain gauge correction) QPE, generated by combining data from seven Weather Surveillance Radar-1988 Doppler (WSR-88D) radars that cover the entire Iowa domain. The IFC QPE algorithms (e.g., Seo et al. 2011) construct a composite rain-rate map every 5 min after applying polarimetric data quality control (Seo et al. 2015). The algorithms then generate hourly accumulations using an advection procedure (Seo and Krajewski 2015) to correct radar temporal sampling errors. We used the IFC product to simulate streamflow with the HLM and compared the simulation results with those driven by MRMS-GC.
We also collected USGS streamflow observations for the evaluation of hydrologic model simulations driven by the multiple precipitation forcing products (e.g., QPE and QPF). As shown in Fig. 1, the heavy storms hit the domain of the Cedar River basin and caused significant flooding in the area. Therefore, we selected two major USGS stations within the Cedar River basin and acquired stream discharge data for the flooding period. In Fig. 1c, we present the two selected USGS stations, Waterloo and Cedar Rapids, with the basin boundary. The Cedar River basin is one of the largest basins in Iowa, with an upstream catchment area of approximately 17 000 km².
c. Hydrologic model
The HLM builds on the concept of decomposing the landscape into hillslopes and channels (Mantilla and Gupta 2005). The HLM allows for a flexible structure and represents the physical processes of runoff generation and water transport, including initial abstraction, infiltration, overland flow, percolation, baseflow, and channel routing. The HLM is calibration-free; that is, a common set of parameters determined a priori applies to all hillslopes. Each hillslope contains four water storage components: channel storage, water ponded on the hillslope surface, effective water depth in the topsoil layer, and effective water depth in the hillslope subsurface. The mass conservation equations for these storages are formulated as ordinary differential equations. Channel streamflow comprises several flow components: 1) overland flow from the water ponded on the hillslope surface, 2) interflow from the water in the topsoil layer, and 3) baseflow from the hillslope subsurface. Mass transport for each channel link in the network is described by a power-law relation that expresses flow velocity as a function of discharge and drainage area (Ayalew et al. 2014). More details about the HLM equations, configuration, and numerical solver are provided in Small et al. (2013) and Krajewski et al. (2017).
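To make the power-law link description more concrete, the following minimal Python sketch routes water through a single channel link. It is not the HLM implementation: the velocity form v = v_r (q/q_r)^λ1 (A/A_r)^λ2 follows the description above, but the parameter values, the treatment of the link as one nonlinear reservoir, and the explicit Euler time stepping are illustrative assumptions.

```python
"""Simplified sketch of one power-law channel link (not the operational HLM).

Assumed form: v = v_r * (q / q_r)**lam1 * (A / A_r)**lam2, with hypothetical
parameter values. The link acts as a single nonlinear reservoir whose outflow
q relaxes toward its inflow at the rate v / L.
"""
from dataclasses import dataclass


@dataclass
class Link:
    length_m: float       # channel length L (m)
    area_km2: float       # upstream drainage area A (km^2)
    v_r: float = 0.3      # hypothetical reference velocity (m/s)
    q_r: float = 1.0      # reference discharge (m^3/s)
    a_r: float = 1.0      # reference drainage area (km^2)
    lam1: float = 0.2     # hypothetical discharge exponent
    lam2: float = -0.1    # hypothetical drainage-area exponent

    def velocity(self, q: float) -> float:
        """Flow velocity (m/s) as a power law of discharge and drainage area."""
        q = max(q, 1e-6)  # floor keeps velocity positive at zero flow so the link can fill
        return self.v_r * (q / self.q_r) ** self.lam1 * (self.area_km2 / self.a_r) ** self.lam2

    def dq_dt(self, q: float, inflow: float) -> float:
        """Rate of change of outflow: (v / L) * (inflow - q)."""
        return self.velocity(q) / self.length_m * (inflow - q)


def route(link: Link, inflow: list[float], dt_s: float, q0: float = 0.0) -> list[float]:
    """Integrate the link ODE with explicit Euler steps of dt_s seconds."""
    q, out = q0, []
    for q_in in inflow:
        q += dt_s * link.dq_dt(q, q_in)
        out.append(q)
    return out
```

For example, `route(Link(length_m=500.0, area_km2=10.0), [5.0] * 2000, dt_s=60.0)` shows the outflow rising monotonically toward a constant 5 m³/s inflow. The HLM itself solves coupled hillslope and channel ODEs with a dedicated numerical solver (Small et al. 2013); explicit Euler is only a stand-in that stays stable here because dt·v/L is much smaller than 1.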
In this section, we describe the evaluation approaches used to quantify the precipitation forecast skill of HRRR QPF and the effects of its error or uncertainty on streamflow prediction. Figure 2 illustrates a schematic view of the evaluation procedures in this study. First, we assess the HRRR precipitation forecasts using common verification skill scores (see, e.g., Schaefer 1990) as a function of forecast lead time. Before applying the hydrologic model for streamflow simulations, we also characterize basin-scale uncertainties of the QPF forcing data. We use statistical metrics based on a hydrologically meaningful quantity, mean areal precipitation (MAP; Johnson et al. 1999), which is closely connected with streamflow generation. We then drive the HLM with the forcing products (e.g., MRMS QPE and HRRR QPF) and evaluate the streamflow simulated with HRRR QPF input with respect to HRRR lead time and basin scale. We do not use any calibration procedure in the HLM simulation. In real-time operations, it is common to drive a hydrologic model with a combination of QPE up to the present time and QPF for the next few hours or days to generate hydrologic forecasts. In this study, however, we separate the QPE and QPF forcing data and drive the model with each individually. This separation prevents the errors contributed by the QPE and QPF products from blending in the hydrologic forecasts and allows us to examine the errors driven solely by the QPF product.
In the analysis procedures shown in Fig. 2, we use MRMS-GC as reference to assess the prediction capability of HRRR QPF. The MRMS-GC product incorporates hourly rain gauge data from the Hydrometeorological Automated Data System (HADS; Kim et al. 2009). Evaluating this reference requires an independent hourly gauge network that thoroughly covers the study domain. Unfortunately, no such network exists in the area, so we were not able to evaluate the reference product at the hourly scale. Instead, we performed a simple comparison with the results presented in Zhang et al. (2016), which reported that MRMS-GC and Stage IV showed similar performance at the daily scale for the cold months of 2014, while MRMS-GC showed larger errors in the warm months. We evaluated both Stage IV and MRMS-GC for our study period using the NWS Cooperative Observer Program (COOP; Mosbacher et al. 1989) network, which reports daily precipitation. We found that the daily mean absolute errors (MAE) of MRMS-GC (about 1.24 mm) and Stage IV were almost equivalent. This MAE is comparable to the cold-month value (approximately 2.54 mm) reported by Zhang et al. (2016), whose cold-month performance was better than their warm-month performance. We also examined an hourly gauge interpolation analysis based on HADS and found that gauge interpolation did not effectively capture rainfall spatial variability at the hourly scale. These findings may justify the use of MRMS-GC as reference in this study. We note that radar–gauge merging schemes (e.g., Velasco-Forero et al. 2009; Nanding et al. 2015), which preserve the spatial variability of the radar field while maintaining the accuracy of rain gauge measurements, might be options for obtaining a more reliable reference product. However, we think that these merging schemes may not yield better results than MRMS-GC (local gauge bias correction) for the study area because of the low density of HADS rain gauges in the Cedar River basin (e.g., 18 gauges in a 17 000 km² area).
a. Skill scores for QPF verification
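We use four skill scores frequently employed in forecast verification to objectively assess the performance of precipitation forecasts: hit rate (HR), false alarm rate (FAR), frequency bias (FB), and Gilbert skill score (GSS). We calculate these scores from a contingency table that counts forecast successes and failures against the observations (event or nonevent), as presented in Table 1. Using the entries of Table 1, the skill scores are defined as

$$\mathrm{HR} = \frac{TP}{TP + FN}, \tag{1}$$

$$\mathrm{FAR} = \frac{FP}{TP + FP}, \tag{2}$$

$$\mathrm{FB} = \frac{TP + FP}{TP + FN}, \tag{3}$$

$$\mathrm{GSS} = \frac{TP - TP_r}{TP + FN + FP - TP_r}, \qquad TP_r = \frac{(TP + FN)(TP + FP)}{TP + FN + FP + TN}, \tag{4}$$

where $TP_r$ is the number of hits expected by random chance.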
The values of TP, FN, FP, and TN in Table 1 are determined by counting the number of grid cells (in MRMS-GC and HRRR QPF) identified as rain or no-rain; a grid cell is classified as rain when its rainfall value exceeds a specified threshold. The HR is equivalent to the probability of detection (POD) and is the ratio of the number of correctly forecasted rain grid cells to the total number of observed rain grid cells. The FAR measures forecast failure as the proportion of false alarms among all forecasted rain grid cells. The FB is the ratio of the total number of forecasted rain grid cells to the total number of observed rain grid cells; an FB value greater than unity indicates overprediction of the rainfall coverage. To evaluate overall forecast skill, we use the GSS, which is similar to the critical success index (CSI; Donaldson et al. 1975), instead of the CSI itself, because the CSI is sometimes biased and depends on the occurrence rate of the rain events being forecasted. Schaefer (1990) provides a detailed discussion of the GSS and CSI.
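The contingency counting and Eqs. (1)–(4) can be sketched in Python as follows, assuming the observed (MRMS-GC) and forecast (HRRR) fields have already been matched cell by cell and flattened into lists. The function names are hypothetical, and the strict `>` comparison mirrors "exceeds a specified threshold" in the text.

```python
"""Categorical verification scores from a 2x2 contingency table (sketch).

Assumes obs and fcst are equal-length lists of matched grid-cell rainfall (mm).
"""


def contingency(obs: list[float], fcst: list[float], threshold: float) -> tuple[int, int, int, int]:
    """Count TP, FN, FP, TN; a cell is 'rain' when its value exceeds the threshold."""
    tp = fn = fp = tn = 0
    for o, f in zip(obs, fcst):
        o_rain, f_rain = o > threshold, f > threshold
        if o_rain and f_rain:
            tp += 1          # hit
        elif o_rain:
            fn += 1          # miss
        elif f_rain:
            fp += 1          # false alarm
        else:
            tn += 1          # correct negative
    return tp, fn, fp, tn


def skill_scores(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """HR, FAR, FB, and GSS; raises ZeroDivisionError if no rain is observed or forecast."""
    n = tp + fn + fp + tn
    hits_random = (tp + fn) * (tp + fp) / n  # hits expected by chance
    return {
        "HR": tp / (tp + fn),                 # probability of detection
        "FAR": fp / (tp + fp),                # false alarms among forecast rain cells
        "FB": (tp + fp) / (tp + fn),          # > 1 means rain area is overforecast
        "GSS": (tp - hits_random) / (tp + fn + fp - hits_random),
    }
```

For instance, matching `obs = [0, 3, 6, 12, 0, 2]` with `fcst = [1.5, 0, 7, 11, 0, 3]` at a 1-mm threshold gives three hits, one miss, one false alarm, and one correct negative, so HR = 0.75, FAR = 0.25, and FB = 1.0, while the chance correction lowers the GSS to 1/7.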
b. Statistical metrics for hydrologic evaluation
As shown in Fig. 2, we divide the hydrologic evaluation of HRRR QPF into two parts based on 1) the analysis of mean areal precipitation estimated for basin units used in the HLM and 2) the HLM streamflow simulation driven by the QPF forcing. As references for both evaluations, we compute MAP using MRMS-GC and simulate streamflow discharge with MRMS-GC forcing.
We based our calculation of the verification scores of Eqs. (1)–(4) on binary classification (i.e., rain or no rain), which does not reflect the magnitude of the precipitation forecast errors that directly affects rainfall–runoff processes. Because precipitation forecasts contain significant uncertainties and are generally biased (e.g., Buizza et al. 2005), we introduce metrics describing the errors/uncertainties of HRRR QPF that might affect errors in streamflow generation. The statistical metrics for the MAP analysis include bias B, correlation coefficient R, and root-mean-square error (RMSE). The correlation coefficient describes agreement or linear dependence between reference and forecasts, and the RMSE measures the differences between them. The bias describes a systematic tendency of the precipitation forecasts that can produce erroneous water volume in hydrologic prediction. In this study, we define the bias as a multiplicative error term:

$$B = \frac{\sum_{t}\sum_{s} R_{\mathrm{HRRR}}(s,t)}{\sum_{t}\sum_{s} R_{\mathrm{MRMS\text{-}GC}}(s,t)}, \tag{5}$$

where $R_{\mathrm{HRRR}}(s,t)$ and $R_{\mathrm{MRMS\text{-}GC}}(s,t)$ denote rainfall values at time step $t$ and grid location $s$ within a specific basin.
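A minimal sketch of the MAP computation and the three MAP metrics is given below. It assumes that each time step is a flat list of grid values, that a boolean mask selects the cells inside a basin, and that all cells have equal area; computing R and RMSE on the basin MAP time series is our illustrative choice, since the text does not spell out those formulas. Under the equal-area assumption, the ratio of MAP sums equals the grid-sum ratio of Eq. (5).

```python
"""Sketch: basin MAP time series and B, R, RMSE against the reference."""
import math


def map_series(fields: list[list[float]], mask: list[bool]) -> list[float]:
    """Mean areal precipitation per time step over the masked (in-basin) cells."""
    n = sum(mask)
    return [sum(v for v, inside in zip(field, mask) if inside) / n for field in fields]


def map_metrics(fcst: list[float], ref: list[float]) -> dict[str, float]:
    """Multiplicative bias, Pearson correlation, and RMSE of forecast vs reference MAP."""
    n = len(ref)
    bias = sum(fcst) / sum(ref)  # same cells and equal areas, so this matches Eq. (5)
    mf, mr = sum(fcst) / n, sum(ref) / n
    cov = sum((f - mf) * (r - mr) for f, r in zip(fcst, ref))
    sf = math.sqrt(sum((f - mf) ** 2 for f in fcst))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    rmse = math.sqrt(sum((f - r) ** 2 for f, r in zip(fcst, ref)) / n)
    return {"B": bias, "R": cov / (sf * sr), "RMSE": rmse}
```

A forecast that is exactly twice the reference MAP at every step yields B = 2 and R = 1, which illustrates why B and R are reported together: correlation alone cannot reveal a volume error.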
To assess the impact of the QPF forcing on streamflow prediction, we define three evaluation metrics: Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe 1970), R, and Kling–Gupta efficiency (KGE; Gupta et al. 2009). NSE describes the predictive power of hydrologic models and has been widely used (e.g., Krause et al. 2005; Seo et al. 2013) to evaluate model performance. NSE is defined as

$$\mathrm{NSE} = 1 - \frac{\sum_{t}\left(S_t - O_t\right)^2}{\sum_{t}\left(O_t - \bar{O}\right)^2}, \tag{6}$$

where S and O denote simulated and reference (observed) streamflow, and $\bar{O}$ is the mean of O. NSE ranges from negative infinity to 1.0. Negative NSE values imply that the mean of the reference streamflow is a better predictor than the model simulation, whereas values close to 1.0 indicate more accurate model performance. KGE is an alternative metric proposed to address a deficiency of NSE, namely, that the peak tends to be underestimated when NSE is used as the objective in optimization (e.g., calibration). A more detailed comparison of NSE and KGE can be found in Gupta et al. (2009). In Eq. (7), we calculate KGE from the correlation and the ratios of the means and standard deviations of simulated and observed streamflow. The index can be decomposed into easy-to-understand terms:

$$\mathrm{KGE} = 1 - \sqrt{(\rho - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}, \tag{7}$$

where ρ, α, and β denote the correlation, the ratio of standard deviations ($\sigma_s/\sigma_o$), and the ratio of means ($\mu_s/\mu_o$) between simulated and observed streamflow, respectively. As with NSE, a KGE value close to 1.0 indicates close agreement with the observations.
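Both efficiency metrics, NSE and the KGE decomposition of Eq. (7), can be sketched in a few lines of standard-library Python. This is an illustrative implementation of the textbook definitions, not the code used in this study; population standard deviations are used for α, which gives the same ratio as sample standard deviations.

```python
"""Sketch of NSE and KGE for simulated (sim) vs observed (obs) streamflow."""
import math
from statistics import fmean, pstdev


def nse(sim: list[float], obs: list[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / sum of squared anomalies."""
    mo = fmean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(sim: list[float], obs: list[float]) -> float:
    """Kling-Gupta efficiency from correlation (rho), variability ratio (alpha), mean ratio (beta)."""
    ms, mo = fmean(sim), fmean(obs)
    ss, so = pstdev(sim), pstdev(obs)
    cov = fmean([(s - ms) * (o - mo) for s, o in zip(sim, obs)])
    rho = cov / (ss * so)
    alpha = ss / so
    beta = ms / mo
    return 1.0 - math.sqrt((rho - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A doubled hydrograph shows how the two metrics differ: `sim = [2, 4, 6, 8]` against `obs = [1, 2, 3, 4]` is perfectly correlated, yet NSE = −5 while KGE = 1 − √2 ≈ −0.41, because KGE attributes the error separately to variability (α = 2) and volume (β = 2) rather than to timing (ρ = 1).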
In Fig. 3, we present the accumulated rainfall maps of MRMS-GC and HRRR QPF for the flooding period of 14–23 September 2016. We use the MRMS-GC product as reference because it was corrected for errors using hourly rain gauge observations (Zhang et al. 2014). For the HRRR maps in Fig. 3, we accumulated the products with the same lead time issued at different times and show the resulting maps for 1-, 2-, 5-, and 6-h lead times. Visual inspection reveals both similarities and significant discrepancies between MRMS and HRRR. While the overall patterns (relatively low rain in the south and high rain in the north of the domain) are similar, the exact locations of heavy rain and the spatial distribution of rainfall differ considerably. HRRR QPF appears to be biased (notably, overestimated at short lead times), depending on lead time and location. Heavy rainfall events might have larger conditional biases (conditioned on magnitude, season, storm type, and other attributes) that can significantly affect flood forecasting (e.g., Brown et al. 2012).
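Building lead-time-specific maps such as those in Fig. 3 requires regrouping the hourly HRRR runs so that each valid hour is taken from the run issued exactly L hours earlier. The sketch below assumes a hypothetical in-memory dictionary keyed by (issue time, lead hours) that holds the hourly QPF ending at the valid time; the storage layout and function names are illustrative, not part of any HRRR interface.

```python
"""Sketch: assemble a constant-lead-time QPF series from hourly HRRR runs."""
from datetime import datetime, timedelta
from typing import Optional


def constant_lead_series(qpf: dict[tuple[datetime, int], float], lead_h: int,
                         valid_times: list[datetime]) -> list[Optional[float]]:
    """For each valid time t, take the forecast issued at t - lead_h with lead lead_h.

    `qpf` is a hypothetical store; missing runs return None so gaps stay visible.
    """
    return [qpf.get((t - timedelta(hours=lead_h), lead_h)) for t in valid_times]


def accumulate(series: list[Optional[float]]) -> float:
    """Event accumulation over the available hours (missing hours are skipped)."""
    return sum(v for v in series if v is not None)
```

Applied per grid cell, `accumulate(constant_lead_series(qpf, 2, hours))` would give a 2-h lead time accumulation map. Returning None instead of zero for a missing run keeps data gaps distinguishable from genuinely dry forecasts.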
a. Conventional QPF verification
We calculated the skill scores of HRRR QPF using Eqs. (1)–(4). We used the MRMS QPE product, MRMS-GC, as reference ("observation" in Table 1) because ground observations (e.g., rain gauge measurements) are limited, and it is hard to evaluate skill extensively over a large domain from a limited number of locations. Since the reference product has a higher spatial resolution (1 vs 3 km), we assigned each HRRR value to all MRMS grid cells located within that HRRR grid cell to obtain a one-to-one match between the two products. We used multiple rainfall thresholds (1, 2, 5, and 10 mm) and classified every matched grid cell in both products as rain or no-rain. We then counted all the categories shown in Table 1 for the 10-day flooding event and estimated the scores at each MRMS grid cell. Because of the short sample period, the maps of the calculated scores exhibited spatially discontinuous patterns (e.g., some spikes and sinks in southeastern Iowa), particularly for HR and FB (maps not shown). To resolve this issue, we extended the sampling period to six months (May–October), which included the main flooding event, and this eliminated all spikes and sinks in the maps. Figure 4 shows the skill scores spatially averaged over the entire domain shown in Fig. 3 for both the 10-day and 6-month periods with respect to forecast lead time. In Fig. 4, the forecast skill decreases with increasing rainfall threshold and as the forecast lead time becomes longer. The decreasing pattern and magnitude of the observed FB and overall skill (GSS) are comparable to those reported in Moser et al. (2015). The accuracy of HRRR QPF also does not seem sufficient to justify its use in hydrologic prediction: the GSS in Fig. 4b is only about 30% even at the 1-h lead time, where the score is at its best.
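The one-to-one matching described above, in which each 3-km HRRR value is copied to the 1-km MRMS cells it contains, amounts to block replication. The sketch below assumes, hypothetically, that the two grids are aligned and exactly nested at a 3:1 ratio; in practice HRRR and MRMS use different map projections, so a nearest-cell lookup would replace this simple repetition.

```python
"""Sketch: replicate a coarse (e.g., 3-km) grid onto an aligned fine (1-km) grid."""


def replicate_to_fine(coarse: list[list[float]], factor: int = 3) -> list[list[float]]:
    """Copy each coarse cell value into the factor x factor fine cells it contains.

    Assumes aligned, exactly nested grids (a simplifying, hypothetical setup).
    """
    return [
        [value for value in row for _ in range(factor)]  # repeat each value along a row
        for row in coarse
        for _ in range(factor)                             # repeat each row downward
    ]
```

Replication, rather than interpolation, leaves the HRRR values themselves unchanged, so the categorical counts in Table 1 reflect only the forecast and not a smoothing step.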
The sudden jumps and drops in Fig. 4b beyond the 15-h lead time arise because the HRRR forecast horizon was extended from 15 to 18 h in late August; QPF data with the full 18-h lead time make up only about 40% of the entire analysis period.
b. Hydrologic evaluation
The QPF verification scores shown in Fig. 4 were obtained from binary rainfall classification and do not provide direct evidence of the QPF's effects on streamflow prediction. In this section, we focus on MAP, one of the most important hydrologic variables in streamflow generation (see, e.g., Quintero et al. 2016), and characterize the statistical error structure of HRRR QPF. We also drive the HLM with HRRR QPF and investigate the contribution of QPF error to streamflow generation.
1) Mean areal precipitation
Figure 5 shows statistical properties of the HRRR QPF error (B, R, and RMSE) with respect to lead time; here we present the analysis results up to the 6-h lead time. For this analysis, we accumulated rainfall over the Cedar River basin shown in Fig. 1c for the same period as in Fig. 3. The bias is defined as a multiplicative term of QPF error in Eq. (5), and a value greater than 1.0 implies overestimation by HRRR QPF relative to MRMS. In Fig. 5, all nested drainage areas ranging from 0.1 to 17 000 km² within the Cedar River basin are included in the box plots, and the statistical features of QPF error are compared across basin scales. As Fig. 5 shows, HRRR QPF tends to overestimate at short lead times, and both the overestimation and the high variability at small scales gradually decrease with longer lead times. This behavior agrees to some extent with the widely acknowledged fact that numerical weather prediction (NWP) models have difficulty making predictions at short lead times because they do not capture the initial precipitation distribution and amounts well; the models perform better as they dynamically resolve the large-scale flow (e.g., Lin et al. 2005). The RMSE behaves similarly to B, whereas R behaves in the opposite manner: for smaller basins, the variability of R increases with longer lead times, and R values tend to be somewhat higher as forecast lead time increases.
In Fig. 6, we aggregate the results presented in Fig. 5. Figure 6 shows the statistical error structure of forecasted MAP characterized by forecast lead time and drainage basin scale. For the metrics shown in Fig. 6, the prediction skill generally improves as basin scale grows larger, although there is an exception with the bias. We can also observe that the skill is not sufficiently good at the initial lead times; it approaches its best performance at 5- or 6-h lead time, and then decreases. Since we based this analysis on a specific event that induced significant flooding in Cedar Rapids, Iowa, the error characteristics may vary depending on the season, geographic location, and precipitation regime.
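One way to condense the per-basin box-plot statistics of Fig. 5 into a lead time × basin scale summary like Fig. 6 is to group metric values by lead time and by decade of drainage area and take the median. The grouping key and the use of the median are our assumptions for illustration; the text does not specify how Fig. 6 was aggregated.

```python
"""Sketch: summarize per-basin metrics by lead time and decade of drainage area."""
import math
from collections import defaultdict
from statistics import median


def aggregate_by_scale(records):
    """Median metric per (lead_h, decade exponent of drainage area).

    records: iterable of (lead_h, area_km2, value). Exponent 2, for example,
    collects areas in [100, 1000) km^2 (an illustrative binning choice).
    """
    groups = defaultdict(list)
    for lead_h, area_km2, value in records:
        exponent = math.floor(math.log10(area_km2))
        groups[(lead_h, exponent)].append(value)
    return {key: median(vals) for key, vals in sorted(groups.items())}
```

Logarithmic bins suit this basin because the nested drainage areas span about five orders of magnitude (0.1 to 17 000 km²), so linear bins would place nearly all hillslope links in the first bin.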
We drove the IFC HLM with the individual forcings of MRMS-GC, IFC, and HRRR QPF and compared the simulated hydrographs with observed streamflow. We determined the initial model states from a spinup run driven continuously by the MRMS-GC product for the previous 6 months. Figure 7 shows the simulation results driven by the QPE products (MRMS and IFC) to demonstrate the HLM's capability relative to the streamflow observations at the two USGS stations, Waterloo and Cedar Rapids. We also present the NWM "analysis and assimilation" cycle in Fig. 7. In this figure, the hydrographs simulated with the MRMS and IFC products are very similar, but both show an early recession and two consecutive peaks when compared with the USGS streamflow observations. We investigated the double peaks closely and concluded that they were caused by a separation of the rainfall event to which the actual streamflow did not respond sensitively. This is also associated with the HLM's early recession noted above. We discuss the event separation and the resulting peaks later in this section. The NWM results in Fig. 7 show that the falling limbs at both locations agree well with the USGS observations because the observation data were assimilated to initialize the forecast model cycles (e.g., short and medium range). The NWM also yields two peaks at Cedar Rapids, similar to those of the HLM simulations, but the first NWM peak is relatively sharp. We speculate that the sharp peak arises from data assimilation (e.g., updating the model state with streamflow observations that differ considerably from the model forecasts). For Waterloo, the NWM significantly overestimated the peak but did not show the double peak seen in the HLM simulations. Table 2 lists the statistical metrics (NSE, R, and KGE) for the streamflow simulations shown in Fig. 7.
Table 2 shows that the NSE and KGE values of HLM simulations driven by MRMS and IFC are comparable to those of NWM, for which streamflow observation data were used for calibration. This enabled us to use the MRMS simulation results as reference for ungauged locations where USGS stations do not exist.
Figure 8 illustrates simulation results driven by HRRR QPF at 1-, 2-, 5-, and 6-h lead times, with the USGS observations and the MRMS-driven simulation shown as references. We used the MRMS simulation as an alternative reference because USGS streamflow observations are not available at all of the catchment outlets used in the MAP analysis in Fig. 5. USGS observations exist only at a few designated locations, which does not allow us to explore the impact of QPF error on flood prediction across catchment scales. For locations without USGS observations, we extracted the MRMS-driven simulated hydrographs for all hillslope links (almost 40 000 locations) within the Cedar River basin and used them to quantify the performance of flood prediction driven by HRRR QPF. In Figs. 8a and 8c, the 1- and 2-h lead time results show erroneous early peaks before 23 September as well as significant overestimation at the actual peak time shown in the MRMS simulation. In contrast, the 5- and 6-h lead time results do not generate the early peak seen with the 1- and 2-h lead time QPF, but the peaks at both stations appear underestimated and delayed. Tables 3 and 4 present the performance metrics of the simulations shown in Figs. 8a and 8c. For these metrics, we used the USGS streamflow observations as reference; in the subsequent analysis, the MRMS-driven results serve as reference for ungauged locations. Tables 3 and 4 show that the performance decreases at short lead times and starts increasing from the 4-h lead time, which agrees to some extent with the MAP results presented in Fig. 6. We speculate that the performance at the 1-h lead time differs markedly from that at the other initial lead times (e.g., 2 and 3 h) because of radar data assimilation in the HRRR initialization (e.g., Benjamin et al. 2009).
To inspect these features of the simulated hydrographs, we present time series of MAP for the basins upstream of Waterloo and Cedar Rapids in Figs. 8b and 8d. The numeric values in Figs. 8b and 8d indicate the accumulated MAP for the two rain events. For the first rain event at Cedar Rapids, a MAP difference of about 60 mm between MRMS and the 2-h lead time QPF corresponds to a large difference in water volume over the 17 000-km² catchment (60 mm × 17 000 km² ≈ 1.0 × 10⁹ m³) and led to the erroneous peak. Figure 9a illustrates the rainfall accumulated over time and space for the first event as represented on the river networks; the color at a given location on the network represents the MAP over the catchment area upstream of that location. Figure 9a demonstrates that the erroneous early peaks generated by the 1- and 2-h lead time QPF (Fig. 8) were mainly caused by major overestimation in the northwestern upstream areas of the Cedar River basin. In Figs. 8b and 8d, the second rain event shows two rainfall bursts separated by about a day, which explains why this event generated two consecutive peaks, given the rapid recession simulated by HLM. Figure 9b presents the rainfall accumulated over the river networks for the second event. It shows significant differences in accumulated rainfall (e.g., over the Shell Rock River basin and upstream basins near Waterloo) and in its spatial distribution between MRMS and HRRR QPF at different lead times, which led to the overestimation (1 and 2 h) and underestimation (5 and 6 h) of the peaks. Stream locations near the basin outlet (Cedar Rapids) show relatively small MAP compared with some upstream tributaries because averaging over the large contributing area smooths out localized heavy rainfall.
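The MAP mapped onto the river network in Fig. 9 is an area-weighted average of rainfall over each location's upstream catchment. A minimal sketch of that accumulation is given below. It assumes a tree-structured network in which each link drains to exactly one downstream link (or to none at the outlet), with a local hillslope area and rainfall depth per link. The dictionary layout and function name are hypothetical and do not reflect the HLM's actual input format.

```python
from collections import deque


def upstream_map(downstream, area_km2, precip_mm):
    """Area-weighted upstream MAP (mm) for every link of a tree-shaped network.

    downstream: dict link -> downstream link (None at the outlet)
    area_km2:   dict link -> local hillslope area draining into the link
    precip_mm:  dict link -> rainfall depth over that hillslope
    """
    # Start each link with its own hillslope contribution (units: km^2 * mm).
    vol = {k: area_km2[k] * precip_mm[k] for k in downstream}
    acc_area = {k: area_km2[k] for k in downstream}
    # Count the links draining directly into each link.
    n_up = {k: 0 for k in downstream}
    for k, d in downstream.items():
        if d is not None:
            n_up[d] += 1
    # Kahn's topological order: a link is processed only after all its upstream links.
    queue = deque(k for k, n in n_up.items() if n == 0)
    while queue:
        k = queue.popleft()
        d = downstream[k]
        if d is None:
            continue
        vol[d] += vol[k]
        acc_area[d] += acc_area[k]
        n_up[d] -= 1
        if n_up[d] == 0:
            queue.append(d)
    return {k: vol[k] / acc_area[k] for k in downstream}
```

For instance, if a headwater link with 1 km² of hillslope receives 100 mm, a second headwater link with 3 km² receives 20 mm, and both drain into a 1-km² link that receives no rain, the downstream link's MAP is (100 + 60 + 0)/5 = 32 mm, lower than that of the wetter tributary.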
Figure 10 demonstrates the statistical performance of hydrologic simulations driven by HRRR QPF with respect to forecast lead time and basin scale. The number of drainage areas included in Fig. 10 is the same as in Fig. 5, and we present results up to 6-h lead time. Some negative ranges of NSE and KGE are not fully shown at short lead times because negative values were widely distributed, particularly at smaller basin scales. Figure 10 does not include the KGE components α and β presented in Tables 3 and 4. Similar to the results shown in Fig. 5 and Tables 3 and 4, the performance gradually decreases at the initial lead times of 2 and 3 h and then starts increasing at 4–6-h lead times, particularly at smaller basin scales. No clear scale-dependent property emerges, but large-scale basins show less variability and better performance, except for the largest basin scale at 3–6-h lead times.
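Summaries such as those in Fig. 10 group scores by basin scale and lead time. As an illustration only, the sketch below bins basins by drainage area and reports the median and interquartile range (IQR) of a score (e.g., KGE) for each bin and lead time. The bin edges, the function name, and the use of median and IQR as summary statistics are assumptions, not the paper's plotting procedure.

```python
import numpy as np


def summarize_by_scale(area_km2, lead_h, score, bin_edges_km2):
    """Median and IQR of a score for each (area bin, lead time) pair.

    Bin index b follows np.digitize: bin_edges_km2[b-1] <= area < bin_edges_km2[b].
    Returns dict (bin index, lead time) -> (median, IQR).
    """
    area_km2 = np.asarray(area_km2, dtype=float)
    lead_h = np.asarray(lead_h)
    score = np.asarray(score, dtype=float)
    bins = np.digitize(area_km2, bin_edges_km2)  # scale bin of each basin
    out = {}
    for b in np.unique(bins):
        for lead in np.unique(lead_h):
            sel = (bins == b) & (lead_h == lead)
            if not sel.any():
                continue
            q25, q50, q75 = np.percentile(score[sel], [25, 50, 75])
            out[(int(b), int(lead))] = (float(q50), float(q75 - q25))
    return out
```

A narrower IQR in the large-area bins would correspond to the reduced variability of large-scale basins noted above.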
5. Summary and discussion
To enhance hydrologic prediction capabilities, high-resolution QPF (e.g., HRRR) has recently been incorporated into operational hydrologic forecasting procedures based on distributed hydrologic modeling (e.g., Maidment 2017; Krajewski et al. 2017). In this study, we evaluated HRRR QPF in terms of precipitation forecast skill and the effects of forecast uncertainty on flood prediction, using an Iowa flooding event that occurred in September 2016. We statistically quantified the uncertainty of HRRR QPF and the performance of hydrologic simulations driven by the QPF product. For the precipitation forecast evaluation, we used conventional verification skill scores employed in a number of meteorological studies (e.g., Schaefer 1990; Moser et al. 2015). For the hydrologic evaluation, we compared MAP estimated from HRRR QPF with that from the gridded reference (MRMS) and examined hydrologic simulation results driven separately by HRRR and MRMS.
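The GSS (Schaefer 1990) is computed from a 2 × 2 contingency table of forecast and observed threshold exceedances, with the hits expected by chance removed: GSS = (H − H_r)/(H + M + F − H_r), where H_r = (H + M)(H + F)/N, H, M, and F are hits, misses, and false alarms, and N is the number of grid cells. A minimal sketch follows, assuming forecast and reference fields on a common grid and a single exceedance threshold applied with ≥; the function name is hypothetical, and the thresholds used in this study are not restated here.

```python
import numpy as np


def gilbert_skill_score(fcst, obs, threshold):
    """Gilbert skill score (equitable threat score) for exceeding a rain threshold."""
    f = np.asarray(fcst, dtype=float) >= threshold  # forecast "yes" cells
    o = np.asarray(obs, dtype=float) >= threshold   # observed "yes" cells
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    n = f.size
    # Hits expected from a random forecast with the same event frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    return float((hits - hits_random) / denom) if denom > 0 else float("nan")
```

A perfect forecast scores 1, a forecast no better than chance scores 0, and negative values indicate worse-than-random placement of rain.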
Verification of the precipitation forecasts reveals that skill generally decreases as the forecast lead time increases (Fig. 4). The best skill is achieved at 1-h lead time, although a GSS of about 30% is not high enough to justify its use in streamflow forecasting, and skill drops significantly thereafter. This implies that improvement at 1-h as well as longer lead times is necessary before the QPF product can be applied to hydrologic prediction. The MAP analysis shown in Figs. 5 and 6 clarifies the quantitative uncertainty features of HRRR QPF with respect to catchment scale and forecast lead time. Overall, the bias and variability of HRRR QPF at smaller-scale basins steadily decrease as forecast lead times become longer (e.g., up to 6-h lead time). This indicates that the prediction capability of numerical models is limited at short lead times because of the challenge of model initialization (e.g., Lin et al. 2005). HRRR appears to mitigate the initialization issue somewhat through radar data assimilation (Benjamin et al. 2009), because the QPF uncertainty at 1-h lead time was remarkably lower than at other initial lead times such as 2, 3, and 4 h (see, e.g., Fig. 6). Three-dimensional radar reflectivity data created as part of the MRMS data suite are assimilated into the HRRR model to initialize storm information at the 3-km scale and thus to improve predictions of ongoing convection at early forecast lead times. Although we did not evaluate the performance of the radar data assimilation in HRRR, Pinto et al. (2015) reported that the model predicted too many convective systems within 4-h lead time over the Great Plains.
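The bias and variability of QPF error with respect to lead time can be summarized from matched MAP series. The sketch below assumes the error is the MAP difference (QPF minus MRMS, in mm) at matching valid times and reports its mean (bias) and sample standard deviation (variability) for each lead time. The function name and input layout are hypothetical, and the paper's Figs. 5 and 6 may use a different error measure or normalization.

```python
import numpy as np


def map_error_stats(map_qpf_by_lead, map_ref):
    """Bias (mean error) and variability (sample std) of MAP error per lead time.

    map_qpf_by_lead: dict lead time (h) -> QPF MAP values (mm) at the same
                     valid times as map_ref
    map_ref:         reference (e.g., MRMS) MAP values (mm)
    """
    ref = np.asarray(map_ref, dtype=float)
    stats = {}
    for lead, values in sorted(map_qpf_by_lead.items()):
        err = np.asarray(values, dtype=float) - ref  # QPF minus reference
        stats[lead] = (float(err.mean()), float(err.std(ddof=1)))
    return stats
```

Computing these statistics separately for each catchment would yield the lead-time and scale dependence discussed above.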
In operational hydrologic forecasting, a combination of the QPE available at the present time and the QPF for the next few hours or days is used to drive a hydrologic model. This procedure is repeated at each forecast time to generate streamflow forecasts. In this study, however, we drove the HLM separately with each QPE and QPF forcing product (see Fig. 2) to examine the hydrologic prediction errors contributed solely by QPF uncertainty. Each QPF forcing product was constructed from forecasts of the same lead time issued at successive times. This setup may exaggerate the prediction errors, and errors of the same magnitude may not occur in actual operations. The performance metrics of the hydrologic simulations (Tables 3, 4) demonstrate that the result driven by 1-h lead time QPF is better than the results for other lead times, and that performance drops sharply at the initial lead times (e.g., 2 and 3 h). The best performance, achieved at 1-h lead time, does not seem acceptable because it significantly overestimates the flood peak and generates an early erroneous peak that is not detectable in the streamflow observations (Fig. 8). The hydrologic simulations show a tendency similar to that in the MAP analysis: the QPF-driven simulation performance decreases sharply at 1–3-h lead times and starts increasing from 4- through 6-h lead times. The scale dependence shown in the MAP analysis (Fig. 5) is not clearly detected in the hydrologic simulations, although large-scale basins show better performance and smaller variability, particularly at 5- and 6-h lead times (Fig. 10).
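Assembling a forcing product from forecasts of the same lead time issued at successive times can be sketched as follows. The sketch assumes each forecast cycle is stored as a list of hourly accumulations indexed by lead time (index 0 holding the 1-h lead time) and that each value is valid at the end of its accumulation hour. This layout and the function name are hypothetical and do not reflect the actual HRRR file structure.

```python
from datetime import timedelta


def lead_time_series(forecasts, lead_h):
    """Build a single-lead-time QPF series from successive forecast cycles.

    forecasts: dict issue time (datetime) -> list of hourly QPF values,
               where index 0 is the 1-h lead time
    lead_h:    lead time in hours (1-based)
    Returns a dict valid time -> QPF value, ordered by valid time.
    """
    series = {}
    for issue, hourly in forecasts.items():
        if len(hourly) >= lead_h:  # skip cycles that do not reach this lead time
            valid = issue + timedelta(hours=lead_h)
            series[valid] = hourly[lead_h - 1]
    return dict(sorted(series.items()))
```

Because every value in the resulting series comes from the same lead time, errors in the simulated hydrograph can be attributed to that lead time alone, which is the intent of the setup described above.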
We based this study on a specific event that caused significant flooding in Cedar Rapids, Iowa. The HRRR error characteristics may vary with season, geographic location, and precipitation regime (see, e.g., Pinto et al. 2015). A comprehensive understanding therefore requires further study that collects multiyear products and examines the error features of HRRR QPF. It is also known that persistence-based precipitation forecasts tend to outperform numerical weather predictions at very short lead times (e.g., Wilson et al. 1998; Lin et al. 2005), although the range of these short lead times may vary with the forecast scheme and model.
In future work, we will identify the lead time at which the performance transition between HRRR QPF and persistence-based (e.g., advection) forecasts occurs by comparing their forecast skills. This will help researchers and forecasters decide whether to replace or blend model QPF with persistence-based QPF to improve prediction skill at short lead times. The evaluation framework presented in this study demonstrates that QPF error is a function of forecast lead time. QPF error (e.g., bias) should therefore be corrected separately for each forecast lead time before the QPF is applied to hydrologic prediction, and the required error characterization (e.g., bias and variability) can serve as the basis of an ensemble nowcasting framework (e.g., Bowler et al. 2006).
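The two steps outlined above, locating the lead time at which model QPF overtakes a persistence-based forecast and correcting bias separately for each lead time, can be sketched as follows. This is a hypothetical illustration: the skill values would come from a verification score such as GSS, and the bias correction uses a single multiplicative factor per lead time (reference total divided by QPF total). That factor is a simple choice made for this sketch, not a procedure proposed in the study, and all function names are hypothetical.

```python
import numpy as np


def crossover_lead(leads_h, skill_model, skill_persistence):
    """First lead time at which model skill reaches or exceeds persistence skill."""
    for lead, m, p in zip(leads_h, skill_model, skill_persistence):
        if m >= p:
            return lead
    return None  # model never catches up within the evaluated lead times


def lead_bias_factors(qpf_by_lead, ref):
    """Multiplicative bias factor sum(ref) / sum(QPF) for each lead time.

    Assumes the QPF totals are nonzero for every lead time.
    """
    ref = np.asarray(ref, dtype=float)
    return {lead: float(ref.sum() / np.asarray(q, dtype=float).sum())
            for lead, q in qpf_by_lead.items()}


def correct_qpf(qpf, lead, factors):
    """Scale a QPF series by the factor estimated for its own lead time."""
    return np.asarray(qpf, dtype=float) * factors[lead]
```

With hypothetical skill curves in which persistence leads at 1 and 2 h and the two are tied at 3 h, `crossover_lead` returns 3; below that lead time, replacing or blending with the persistence forecast would be the candidate strategy.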
This study was supported by the Iowa Flood Center at the University of Iowa and the Hydrometeorology Testbed (HMT) Program within NOAA/OAR Office of Weather and Air Quality under Grant NA17OAR4590131. The authors are grateful to the University Information Technology Services and IFC staff who facilitated the HLM simulations using the clusters of high-performance computing resources.