1. Introduction
Streamflow forecasting is an essential aspect of water resources management (Adams 2016; Demargne et al. 2009; Pagano et al. 2014; Sene 2010). Accurate and timely streamflow forecasts assist watershed stakeholders in making decisions about water resources over a range of time scales. Over the extended range (months to seasons), streamflow forecasting is often used to predict water supply, with a particular focus on the prospects for drought (Hisdal et al. 2004). Long-range streamflow forecasts also enable strategic planning for the maintenance of minimum environmental flows in rivers (Nicolle et al. 2014). Over shorter time scales (days to weeks), streamflow forecasting is primarily used to guide flood early warning systems in large river basins (Alfieri et al. 2013; Bartholmes et al. 2009). In smaller basins (<500 km²), subdaily (i.e., hours ahead) streamflow forecasts are often needed to provide flash flood guidance during periods of extreme rainfall (Reed et al. 2007).
In addition to the aforementioned applications, there is nascent interest among the agricultural community in using short- to medium-range hydrometeorological forecasts to support operational decision-making in agriculture (Buda et al. 2013; Easton et al. 2017). Several forecasting systems have recently been developed that provide advance warning of rainfall–runoff events with the potential to wash off recently applied fertilizers and manures (Easton et al. 2017). In the agricultural water quality arena, wash-off events are termed “incidental nutrient losses” (Preedy et al. 2001), and these nutrient transfers pose immediate risks to surface water quality (Kleinman and Sharpley 2003). The quickflow component of streamflow (also known as direct runoff), which includes fast-responding processes like interflow and overland flow (Robinson and Ward 2017), has been the focus of several runoff forecasting systems developed for agricultural decision-making (Drohan et al. 2019; Goering 2013). However, since quickflow is rarely the target of experimental or operational streamflow forecasting (cf. Ghimire et al. 2020), there is a need for research that provides insight into the quality of quickflow forecasts issued for decision-making purposes.
While hydrological models have seldom been employed in quickflow forecasting, several studies have used models to examine the disproportionate contribution of quickflow to high streamflows. Many of these studies have applied the Sacramento Soil Moisture Accounting Model (SAC-SMA) to investigate quickflow controls on the flow duration curve (Chouaib et al. 2018; van Werkhoven et al. 2008; Yilmaz et al. 2008), as the structure of SAC-SMA enables streamflow to be separated into slow components like baseflow and fast components like direct runoff, surface runoff, and interflow (Burnash 1995). Moreover, SAC-SMA is a key element of the hydrologic modeling system employed by the National Weather Service (NWS), and the model is widely used in operational streamflow forecasting across the United States (Adams 2016). While most operational streamflow forecasts are still produced using lumped models, a gridded version of SAC-SMA is also used by NWS River Forecast Centers to produce flash flood guidance (Schmidt et al. 2007) and by the National Severe Storms Laboratory to predict flash flooding (Gourley et al. 2017). Given the extensive use of SAC-SMA in hydrometeorological simulations, an opportunity exists to test the model’s utility in forecasting quickflow.
In this study, we examined whether a gridded version of SAC-SMA could be applied to short-range (24–72 h) forecasting of quickflow in Mahantango Creek, a mixed-use watershed in east-central Pennsylvania (PA). To do this, we used the Hydrology Laboratory–Research Distributed Hydrologic Model (HL-RDHM) software to simulate the hydrology of the Mahantango Creek watershed and a small headwater interior basin over a 15-yr period from 2003 to 2017 (including a 1-yr model warm-up). We then applied the calibrated model to quickflow forecasting in both watersheds from 21 July 2017 through 28 October 2019. For the purposes of this study, we define short-range forecasts as forecasts with lead times of up to 3 days. The overarching goal of the work was to assess the potential for quickflow forecasting to inform short-term management decisions in agricultural watersheds. The specific objectives of the study were twofold: 1) assess the performance of HL-RDHM in simulating the quickflow component of streamflow, and 2) ascertain the quality of short-range deterministic forecasts of quickflow. Because HL-RDHM was calibrated only at the outlet of Mahantango Creek, we were also interested in determining the potential for HL-RDHM to simulate and forecast quickflow in the interior basin.
2. Study area
The study focuses on Mahantango Creek, a 420-km² mixed land-use watershed that discharges to the Susquehanna River roughly 40 km north of Harrisburg, PA (Fig. 1). The Mahantango Creek watershed, which lies within the forecasting domain of the Middle Atlantic River Forecast Center (MARFC) in State College, PA, has a long history of hydrometeorological monitoring and research that supports the current study. Since 1930, the U.S. Geological Survey (USGS) has routinely gauged stream discharge at the watershed outlet (Dalmatia, PA). In the north-central portion of Mahantango Creek, the U.S. Department of Agriculture’s Agricultural Research Service (ARS) continues to conduct hydrological and water quality studies at the 7.3-km² WE-38 watershed (Fig. 1), one of 23 benchmark experimental watersheds maintained by ARS (Bryant et al. 2011). Since its inception in 1968, WE-38 has been the locus of many plot- and field-scale studies examining the role of variable source area hydrology in producing quickflow (Buda et al. 2009; Gburek et al. 2006; Needelman et al. 2004).
3. Datasets
a. Hydrometeorological observations
We used continuous stream discharge and precipitation datasets (1968 to present) to corroborate historical model simulations and verify short-range forecasts. In the WE-38 watershed (Fig. 1), precipitation was measured at three permanent rain gauges (Buda et al. 2011b; Lu et al. 2015), and discharge was measured using a broad-crested weir in tandem with a 90° v-notch weir (Buda et al. 2011a). All hydrometeorological measurements in WE-38 were recorded at 5-min intervals. Mean areal precipitation in WE-38 was estimated using the Thiessen polygon method (Thiessen 1911). Daily and 15-min discharge data were also obtained from the outlet of Mahantango Creek at Dalmatia, PA (USGS Gauge 01555500) using the “dataRetrieval” package (version 2.7.9; DeCicco et al. 2021) in R (version 4.0.3; R Core Team 2021). For the 2017–19 forecast verification period, all subdaily hydrometeorological data from WE-38 and Mahantango Creek were aggregated to 24-h totals ending at 1200 UTC (expressed in millimeters for precipitation, streamflow, and quickflow).
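The two preprocessing steps described above, Thiessen-weighted mean areal precipitation and aggregation of subdaily records into 24-h totals ending at 1200 UTC, can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the gauge identifiers and polygon areas are hypothetical, and the sketch assumes that each timestamp is a naive UTC datetime marking the end of its recording interval.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def thiessen_mean_areal_precip(gauge_depths, polygon_areas):
    """Area-weighted mean of gauge depths (mm); weights are Thiessen polygon areas.

    Both arguments are dicts keyed by a (hypothetical) gauge identifier.
    """
    total_area = sum(polygon_areas.values())
    return sum(gauge_depths[g] * polygon_areas[g] / total_area for g in polygon_areas)


def day_ending_1200utc(ts):
    """Return the 1200 UTC instant that closes the 24-h window containing ts.

    Assumes ts marks the END of its interval, so a value stamped exactly
    12:00 belongs to the window that closes at that instant.
    """
    noon = ts.replace(hour=12, minute=0, second=0, microsecond=0)
    return noon if ts <= noon else noon + timedelta(days=1)


def daily_totals_1200utc(records):
    """Sum (timestamp, value) pairs into 24-h totals ending at 1200 UTC."""
    totals = defaultdict(float)
    for ts, value in records:
        totals[day_ending_1200utc(ts)] += value
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    areas = {"gauge_a": 2.0, "gauge_b": 3.0, "gauge_c": 2.3}  # km^2, hypothetical
    depths = {"gauge_a": 1.0, "gauge_b": 2.0, "gauge_c": 0.5}  # mm, hypothetical
    print(round(thiessen_mean_areal_precip(depths, areas), 3))

    start = datetime(2018, 7, 23, 11, 55)
    five_min = [(start + timedelta(minutes=5 * k), 0.1) for k in range(4)]
    print(daily_totals_1200utc(five_min))
```

Treating the timestamp as the interval end is the design choice that decides which day a reading taken exactly at 1200 UTC falls into; a dataset stamped at interval start would need the comparison shifted.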
b. Estimating quickflow observations
Reasonable estimates of a ranged from 0.935 in WE-38 to 0.950 in Mahantango Creek, and we used these values of a as proxies for the filter parameter f in (1). Using daily streamflow data, we then implemented the Lyne and Hollick filter (number of passes = 1) using the BaseflowSeparation function in the “EcoHydRology” package (version 0.4.12.1; Fuka et al. 2018) in R (version 4.0.3).
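For readers unfamiliar with the filter, a commonly cited single-pass form of the Lyne and Hollick recursion is q_t = f·q_{t−1} + ((1 + f)/2)·(Q_t − Q_{t−1}), where Q is total streamflow and q is the filtered quickflow; baseflow is then Q − q. The Python sketch below implements that recursion under stated assumptions (quickflow starts at zero and is constrained to 0 ≤ q_t ≤ Q_t at each step, with the constrained value carried forward). It is not the EcoHydRology BaseflowSeparation code, whose initialization and constraint handling may differ.

```python
import numpy as np


def lyne_hollick_single_pass(streamflow, f=0.95):
    """One forward pass of the Lyne and Hollick recursive digital filter.

    Returns (quickflow, baseflow) in the units of `streamflow`.
    Assumptions of this sketch: quickflow is initialized at zero and is
    clipped to [0, streamflow] at every step before being fed back.
    """
    q = np.asarray(streamflow, dtype=float)
    quick = np.zeros_like(q)
    for t in range(1, q.size):
        raw = f * quick[t - 1] + 0.5 * (1.0 + f) * (q[t] - q[t - 1])
        quick[t] = min(max(raw, 0.0), q[t])  # keep quickflow physically bounded
    baseflow = q - quick
    return quick, baseflow
```

Passing the values reported above (f = 0.935 for WE-38, f = 0.950 for Mahantango Creek) reproduces the single-pass setting described in the text; for example, a daily series of 1, 3, and 2 mm with f = 0.95 yields quickflow of 0, 1.95, and 0.8775 mm.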
4. Methodology
a. Distributed hydrologic model
We used HL-RDHM to simulate the hydrology of Mahantango Creek. HL-RDHM is a physically based, spatially distributed hydrologic modeling system that can be executed using grid cells or subbasins as the basis for computation (Koren et al. 2004). Typically, HL-RDHM is implemented using a gridded model structure that is based on the Hydrologic Rainfall Analysis Project (HRAP) grid (Greene and Hudlow 1982). Each HRAP grid cell has a nominal resolution of 4 km × 4 km, which corresponds directly with NOAA’s multisensor precipitation products (Adams 2016). Within each HRAP grid cell, the Sacramento Soil Moisture Accounting model with Heat Transfer (SAC-HT) is used to represent rainfall–runoff processes (Koren et al. 2006, 2014), and the SNOW-17 model (Anderson 2002, 2006) is used to model snow accumulation and melt. In essence, each cell modeled by SAC-HT acts as a hillslope that is capable of generating a range of slow- and fast-responding runoff components that discharge to streams. Slow-responding runoff components in SAC-HT include supplemental and primary baseflow, while fast-responding components include direct runoff, surface runoff, and interflow. In this study, the surface runoff and interflow components in SAC-HT were assumed to represent quickflow, which is consistent with studies by Finnerty et al. (1997) and Khakbaz et al. (2011) that used SAC-SMA (Burnash 1995), a precursor to SAC-HT. Hillslope and channel routing were then computed using the kinematic wave model (Koren et al. 2004).
To force HL-RDHM, we used gridded observations of near-surface temperature and precipitation over the period 2003–18. The gridded precipitation data consisted of hourly multisensor precipitation estimates (MPEs) that were generated by combining multiple radar products and hourly rain gauge data (Rafieeinasab et al. 2015; Zhang et al. 2011). Note that the three USDA-operated rain gauges in WE-38 were not included in the dataset used to generate MPEs. Hourly gridded temperature data, which were used by HL-RDHM to estimate snow accumulation and melt, consisted of observations from several networks, including METAR (Meteorological Aerodrome Reports), USGS, and the NWS Cooperative Observer Program (COOP). Precipitation and temperature forcing data were defined by the 4-km HRAP grid. To better represent the hydrology of small basins like WE-38, we implemented HL-RDHM in Mahantango Creek using a 1/2 HRAP (∼2 km) grid resolution. In this case, the same temperature and precipitation forcing was applied to each of the four 1/2 HRAP cells that were found within a given HRAP cell (e.g., Reed et al. 2007).
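Applying the same forcing to each of the four 1/2 HRAP cells inside a parent HRAP cell amounts to nearest-neighbor replication. The NumPy sketch below assumes that the forcing grid is an array whose last two axes are (rows, columns) and that each parent cell maps exactly onto a 2 × 2 block of sub-cells; the HRAP polar stereographic projection and any HL-RDHM file formats are deliberately ignored, and the function name is hypothetical.

```python
import numpy as np


def to_half_hrap(forcing):
    """Copy each HRAP cell value into its 2 x 2 block of 1/2 HRAP cells.

    `forcing` has (rows, cols) on the 4-km grid as its last two axes;
    leading axes such as time are preserved. Values are copied rather than
    divided because precipitation depth (mm) and temperature apply equally
    to every sub-cell of the parent cell.
    """
    grid = np.asarray(forcing, dtype=float)
    return grid.repeat(2, axis=-2).repeat(2, axis=-1)
```

Because values are copied, the areal mean of each forcing field is unchanged by the refinement; only the model state and parameters can then vary at the finer resolution.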
To set up HL-RDHM in Mahantango Creek, we conducted a modest calibration targeting streamflow prediction at the USGS gauge near the watershed outlet in Dalmatia, PA (Fig. 1). Initial calibration was guided by a priori parameter estimates from previous studies (Koren et al. 2000, 2004). Additional manual calibration focused on three key facets of the simulation: 1) decreasing overall and monthly biases in the water balance, 2) minimizing flow interval biases to best represent the partitioning of baseflow and quickflow, and 3) reducing RMSEs of simulated surface runoff amounts. Greater emphasis was given to parameters affecting water storage (i.e., upper and lower zone tension water capacities) and evapotranspiration, while less attention was given to fine-tuning parameters influencing runoff and channel routing. Similar to Reed et al. (2004) and Smith et al. (2012a), we were interested in ascertaining the performance of HL-RDHM simulations in a small interior basin like WE-38 when a modest calibration effort was applied to a larger parent basin like Mahantango Creek. As such, no explicit calibration was conducted in the WE-38 watershed. Rather, an indirect calibration procedure was implemented wherein scalar multipliers derived from the calibration at the outlet of Mahantango Creek were used to uniformly adjust all parameters in the grid cells within Mahantango Creek.
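The indirect calibration described above can be pictured as multiplying each a priori parameter grid by a single basin-wide factor, which preserves the spatial pattern of the a priori estimates. The sketch below assumes exactly that interpretation; the parameter names (UZTWM, LZTWM are SAC-SMA tension water capacities) are used only as illustrative keys, and the grid values and multiplier are hypothetical, not values from this study.

```python
import numpy as np


def apply_scalar_multipliers(a_priori, multipliers):
    """Scale every cell of each a priori parameter grid by one basin-wide factor.

    a_priori: dict mapping parameter name -> 2D array of a priori values.
    multipliers: dict mapping parameter name -> scalar from the outlet calibration.
    Parameters without a multiplier keep their a priori values (factor of 1).
    """
    return {
        name: np.asarray(grid, dtype=float) * multipliers.get(name, 1.0)
        for name, grid in a_priori.items()
    }


if __name__ == "__main__":
    grids = {  # hypothetical a priori values (mm)
        "UZTWM": np.array([[40.0, 60.0], [50.0, 45.0]]),
        "LZTWM": np.array([[100.0, 150.0], [120.0, 110.0]]),
    }
    print(apply_scalar_multipliers(grids, {"UZTWM": 1.5}))
```

Because every cell is scaled by the same factor, ratios between cells (for example, between cells inside WE-38 and elsewhere in the parent basin) are unchanged, which is why an interior basin inherits the parent-basin adjustment without being calibrated itself.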
We used 15 years (2003–17) of hourly streamflow data to calibrate and validate HL-RDHM in the Mahantango Creek watershed. The first year of data (2003) was used as a warm-up period for HL-RDHM, while the subsequent 14 years of data (2004–17) were used to judge the performance of streamflow and quickflow predictions. Model calibration was carried out over the 9-yr period from 2004 to 2012, while model validation was performed during the 5-yr period from 2013 to 2017. We used the “hydroGOF” package (version 0.4-0; Zambrano-Bigiarini 2020) in R (version 4.0.3) to estimate a suite of goodness-of-fit measures (see Table A1 in the appendix) between simulations and observations of streamflow and quickflow. For the sake of simplicity, we reported goodness-of-fit measures for the entire 14-yr simulation period (2004–17). While we calculated a wide range of model performance measures (Table A1), our interpretation of model performance mainly focused on the metrics in Table 1.
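The core Table 1 metrics can be written in a few lines. The NumPy sketch below is not hydroGOF itself; it assumes the hydroGOF sign convention for PBIAS, 100·Σ(sim − obs)/Σobs, so that positive values indicate overestimation (as PBIAS is interpreted in this study). Note that Moriasi et al. (2007) define PBIAS with the opposite sign. RSR is taken as RMSE divided by the sample standard deviation of the observations, and R² as the squared Pearson correlation.

```python
import numpy as np


def gof_summary(sim, obs):
    """NSE, PBIAS (%), RSR, and R2 for paired simulated and observed series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    err = sim - obs
    # NSE: benchmark is the observed mean (single-valued internal climatology).
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    # PBIAS: positive values mean the model overestimates total volume.
    pbias = 100.0 * np.sum(err) / np.sum(obs)
    # RSR: RMSE standardized by the sample standard deviation of observations.
    rsr = np.sqrt(np.mean(err ** 2)) / np.std(obs, ddof=1)
    r = np.corrcoef(sim, obs)[0, 1]
    return {"NSE": nse, "PBIAS": pbias, "RSR": rsr, "R2": r ** 2}
```

A constant simulation equal to the observed mean scores NSE = 0, which is what makes internal climatology the natural no-skill benchmark for this metric.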
Metrics used to judge the performance of HL-RDHM simulations of daily streamflow and quickflow. Interpretations based on R2, PBIAS, RSR, and NSE utilized the performance evaluation criteria of Moriasi et al. (2007) and Moriasi et al. (2015). Interpretations based on KGE″ were adapted from the criteria of Towner et al. (2019). For the NSE and KGE″ scores, single-valued internal climatology (i.e., using the mean of all daily flow observations as a constant prediction) was used as the benchmark for judging model efficiency (Knoben et al. 2019; Schaefli and Gupta 2007; Seibert 2001).
We included the newer KGE″ measure in our assessment because quickflow is an intermittent runoff generation process with recurrent days of zero flow. Frequent zero-flow days push the observed mean toward zero, which can produce unrealistic values of the bias ratio (the ratio of the simulated to the observed mean) in the traditional KGE (Santos et al. 2018). The KGE″ score avoids this issue by using the standard deviation of the observations to standardize the difference between the simulation mean and the observation mean (Clark et al. 2021; Tang et al. 2021). As such, we used KGE″ to inform the performance of quickflow simulations by HL-RDHM.
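A minimal sketch of KGE″ follows, assuming the form KGE″ = 1 − √[(r − 1)² + (α − 1)² + β_n²], with α = σ_sim/σ_obs and β_n = (μ_sim − μ_obs)/σ_obs, which is consistent with the description above and with the ideal values (r = α = 1, β = 0) given in the table captions. Population standard deviations are used throughout; that choice, and the function name, are assumptions of this sketch.

```python
import numpy as np


def kge_double_prime(sim, obs):
    """KGE'' with a standardized bias term; returns (score, r, alpha, beta_n).

    r: Pearson correlation; alpha: ratio of standard deviations (sim/obs);
    beta_n: (mean_sim - mean_obs) / sd_obs, whose ideal value is 0.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    # Standardizing by sd_obs stays finite even when mean_obs is near zero,
    # unlike the traditional bias ratio mean_sim / mean_obs.
    beta_n = (sim.mean() - obs.mean()) / np.std(obs)
    score = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + beta_n ** 2)
    return score, r, alpha, beta_n
```

For an intermittent series such as 0, 0, 2, 0, 0, 4 mm, a simulation shifted up by 1 mm keeps r = α = 1, and the penalty comes entirely from β_n = 1/σ_obs, so the score stays bounded even though the observed series is dominated by zeros.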
To assess the simulations of flow magnitudes independent of timing issues, we also used “hydroGOF” (Zambrano-Bigiarini 2020) to compute performance metrics for flow duration curves (FDCs) and quickflow runoff duration curves (RDCs). We used NSE and KGE″ scores to assess the fit between simulated and observed FDCs (e.g., Euser et al. 2013) and RDCs. We also examined a set of FDC signatures (see Table A2 and Fig. A1 in the appendix) that quantified biases in four regions of the FDC (Ley et al. 2015; Yilmaz et al. 2008): peak flows (0%–5%), high flows (5%–20%), intermediate flows (20%–70%), and low flows (70%–100%). We were mainly interested in peak- and high-flow regions of the FDC given the controls of quickflow on these flow regimes (Yokoo and Sivapalan 2011). Based on Ley et al. (2016) and the PBIAS criteria in Table 1, biases in any region of the FDC were judged satisfactory if they did not exceed ±15%.
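The FDC regional biases can be illustrated by sorting each series, assigning exceedance probabilities, and comparing flow volumes within each exceedance band. The sketch below uses Weibull plotting positions i/(n + 1) and a simple percent volume bias per region; both choices are assumptions, and the calculation is not the exact set of Yilmaz et al. (2008) signature formulas (which, for example, treat low flows on a log scale). The ±15% check from the text is applied in the usage note.

```python
import numpy as np

# FDC regions from the text, as (lower, upper] exceedance probabilities.
FDC_REGIONS = {
    "peak": (0.00, 0.05),
    "high": (0.05, 0.20),
    "intermediate": (0.20, 0.70),
    "low": (0.70, 1.00),
}


def exceedance_probabilities(n):
    """Weibull plotting positions i / (n + 1) for ranks i = 1..n."""
    return np.arange(1, n + 1) / (n + 1.0)


def fdc_region_biases(sim, obs):
    """Percent volume bias of the simulated FDC within each region.

    Both series must have the same length so that their sorted flows share
    exceedance probabilities. A region whose observed flows sum to zero
    (possible for intermittent series) would make its bias undefined.
    """
    q_sim = np.sort(np.asarray(sim, dtype=float))[::-1]
    q_obs = np.sort(np.asarray(obs, dtype=float))[::-1]
    if q_sim.size != q_obs.size:
        raise ValueError("sim and obs must have the same length")
    p = exceedance_probabilities(q_obs.size)
    biases = {}
    for name, (lower, upper) in FDC_REGIONS.items():
        mask = (p > lower) & (p <= upper)
        biases[name] = 100.0 * (q_sim[mask].sum() - q_obs[mask].sum()) / q_obs[mask].sum()
    return biases
```

A region would then be judged satisfactory when `abs(biases[name]) <= 15`. Because both curves are sorted independently, this comparison ignores timing errors entirely, which is the point of evaluating FDCs alongside the hydrograph metrics.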
Goodness-of-fit statistics for daily streamflow simulations by the HL-RDHM for the USGS gauging station near the outlet of Mahantango Creek (MCW; parent basin) and the WE-38 weir (interior subbasin). Shown to the right of the KGE″ score are its three essential elements, including r (Pearson correlation coefficient), α (variability ratio), and βr (bias ratio). Ideal values of r and α are 1, while the ideal value of βr in the newly formulated KGE″ is 0.
b. Quickflow forecasts and verification strategy
We used HL-RDHM to generate daily deterministic (i.e., single-valued) quickflow forecasts for the Mahantango Creek and WE-38 watersheds from 21 July 2017 through 28 October 2019. All quickflow forecasts were issued once daily at 1200 UTC for lead times of 24, 48, and 72 h. To initialize quickflow forecast runs, the SAC-HT and SNOW-17 states were updated daily using hourly observed precipitation and temperature grids generated by MARFC; no data assimilation was performed to match observed streamflow at the outlet of Mahantango Creek. All quickflow forecasts generated by HL-RDHM were forced by archived quantitative precipitation forecasts (QPFs) produced at MARFC. Because archived temperature forecasts were not available, we used observed temperatures as a proxy for forecast temperatures in the forecasting experiment. In this study, QPFs were provided at 6-h intervals, while quickflow forecasts were made hourly by HL-RDHM. Each forecast was then expressed as a daily sum over the 24-h period ending at its valid time (i.e., 24, 48, or 72 h after issuance).
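Converting an hourly forecast trace into the three daily lead-time totals is a reshape-and-sum. The sketch below assumes, hypothetically, that element k of the trace holds the quickflow depth for hour k + 1 after the 1200 UTC issue time; the source does not describe HL-RDHM's output layout, so this indexing is illustrative only.

```python
import numpy as np


def lead_time_daily_totals(hourly_forecast, n_days=3):
    """Sum an hourly forecast trace into consecutive 24-h totals.

    hourly_forecast[k] is assumed to be the depth for hour k + 1 after
    issuance, so the returned totals correspond to the 24-, 48-, and 72-h
    lead times when n_days = 3.
    """
    trace = np.asarray(hourly_forecast, dtype=float)
    needed = 24 * n_days
    if trace.size < needed:
        raise ValueError(f"need at least {needed} hourly values, got {trace.size}")
    return trace[:needed].reshape(n_days, 24).sum(axis=1)
```

Each of these totals can then be paired with the observed 24-h total ending at the same 1200 UTC valid time.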
Verification of quickflow forecasts in Mahantango Creek focused on an array of categorical performance measures that could be calculated from binary forecasts. Following the guidance of WWRP (2008), we generated a set of categorical variables by stratifying continuous daily quickflow forecasts according to seven levels of runoff intensity: 0.1, 0.2, 0.5, 1, 2, 5, and 10 mm day−1. Essentially, quickflow was converted to a binary variable that indicated whether or not quickflow was detected at a given level of intensity. For each lead time, we developed performance diagrams (Roebber 2009) that graphically displayed the probability of detection (POD), success ratio (SR), frequency bias (BIAS), and critical success index (CSI) by quickflow exceedance threshold (see Table A3 in the appendix for definitions). The performance diagram aids in forecast verification because it utilizes the geometric relationship between POD, SR, BIAS, and CSI to illustrate forecast quality in one graph. Good forecasts have POD, SR, BIAS, and CSI values close to one, so that a perfect forecast plots in the top right corner of the graph.
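The four scores on a performance diagram come from a 2 × 2 contingency table built at each intensity threshold. The sketch below treats an event as daily quickflow strictly exceeding the threshold, which is an assumption (the text speaks of events that "exceeded" each threshold); function names are hypothetical, and no library verification API is implied.

```python
import numpy as np

# Runoff intensity thresholds (mm/day) listed in the text.
THRESHOLDS = (0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0)


def contingency_counts(fcst, obs, threshold):
    """2 x 2 counts for the binary event 'daily quickflow > threshold'."""
    f = np.asarray(fcst, dtype=float) > threshold
    o = np.asarray(obs, dtype=float) > threshold
    hits = int(np.sum(f & o))
    false_alarms = int(np.sum(f & ~o))
    misses = int(np.sum(~f & o))
    correct_negatives = int(np.sum(~f & ~o))
    return hits, false_alarms, misses, correct_negatives


def performance_diagram_scores(hits, false_alarms, misses):
    """POD, SR, BIAS, and CSI; correct negatives do not enter these scores."""
    pod = hits / (hits + misses)
    sr = hits / (hits + false_alarms)  # success ratio = 1 - false alarm ratio
    bias = (hits + false_alarms) / (hits + misses)
    csi = hits / (hits + false_alarms + misses)
    return {"POD": pod, "SR": sr, "BIAS": bias, "CSI": csi}
```

Looping over `THRESHOLDS` for each lead time yields the points plotted on the diagrams. The geometry the diagram exploits follows directly from these definitions: BIAS = POD/SR, and CSI = 1/(1/SR + 1/POD − 1), so lines of constant BIAS and CSI can be drawn on POD–SR axes.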
Goodness-of-fit statistics for aggregate daily flow duration curves (FDCs) simulated by the HL-RDHM, as well as four distinct FDC zones: peak flows (0%–5%), high flows (5%–20%), intermediate flows (20%–70%), and low flows (70%–100%). Statistics are given for the USGS gauging station near the outlet of Mahantango Creek (MCW; parent basin) and the WE-38 weir (interior subbasin). See Table A1 in the appendix for more details on these metrics.
We also evaluated the Peirce skill score (PSS; Peirce 1884) by lead time for each of the seven runoff intensity levels. The PSS assesses how well the forecast system distinguishes quickflow events that exceed a given intensity threshold from those that do not [see appendix and Wilks (2011) for more details on the PSS]. The PSS is an equitable skill score (Hogan et al. 2010) that accounts for all components of the 2 × 2 contingency table (see Fig. A2 and Table A3 in the appendix). Fundamentally, PSS represents the difference between POD (the fraction of observed quickflow events that were correctly forecast as events) and POFD (the fraction of observed nonevents that were incorrectly forecast as events); perfect forecasts achieve a PSS of 1, no-skill forecasts (e.g., random or constant forecasts) have an expected PSS of 0, and negative values indicate performance worse than no skill. The PSS is widely used in the categorical verification of precipitation forecasts (WWRP 2008), and it has also been used to verify flood early warning systems in Europe (Alfieri et al. 2013; Bartholmes et al. 2009) and to verify global forecasts of monthly flow extremes (Candogan Yossef et al. 2012).
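Because PSS is simply POD − POFD, it can be computed directly from the four contingency-table counts. The short sketch below (function name hypothetical, counts in the example invented for illustration) also shows two properties discussed in the text: a constant "always event" forecast scores exactly 0, and when correct negatives are plentiful, extra false alarms barely change POFD, so PSS stays close to POD.

```python
def peirce_skill_score(hits, false_alarms, misses, correct_negatives):
    """PSS = POD - POFD for one 2 x 2 contingency table."""
    pod = hits / (hits + misses)                              # hit rate
    pofd = false_alarms / (false_alarms + correct_negatives)  # false alarm rate
    return pod - pofd


if __name__ == "__main__":
    # Always forecasting an event: POD = POFD = 1, so PSS = 0 (no skill).
    print(peirce_skill_score(hits=12, false_alarms=88, misses=0, correct_negatives=0))
    # Rare events with many correct negatives: 10 false alarms move POFD only
    # to 10/810, so PSS remains close to POD = 0.8.
    print(peirce_skill_score(hits=8, false_alarms=10, misses=2, correct_negatives=800))
```

An algebraically equivalent form, (hits·CN − FA·misses)/[(hits + misses)(FA + CN)], makes the use of all four table entries explicit.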
In addition to verifying binary forecasts of quickflow, we examined multicategory forecasts of naturally ordered quickflow classes (Livezey 2012; Wilks 2011), as such classes would likely form the basis for an early warning system that could be applied to agricultural decision-making. To do this, we used observed quickflow RDCs in Mahantango Creek and WE-38 to classify daily quickflow into four categories based on runoff frequency and magnitude: low (<0.1 mm day−1), medium (0.1–1 mm day−1), high (1–5 mm day−1), and very high (>5 mm day−1). For each basin, the very high-flow category (>5 mm day−1) generally represented daily quickflow events with exceedance probabilities less than 2.5%, while the high-flow category (1–5 mm day−1) captured daily quickflow events with exceedance probabilities ranging from 2.5% to 10%. The medium flow category (0.1–1 mm day−1) included more frequently occurring, but smaller quickflow events with exceedance probabilities between 10% and 35%. Using the resultant 4 × 4 contingency tables for each lead time (see Figs. A3 and A4 in the appendix), we then calculated the Gerrity skill score (GSS; Gerrity 1992) to quantify the accuracy of issuing quickflow forecasts in the correct magnitudinal category. The GSS is a highly desirable metric for multicategory forecasts because it gives more weight to correct forecasts of low-probability events than correct forecasts of high-probability events. Moreover, the GSS is relatively unaffected by biases that might arise from outliers or nonlinear trends in the pairwise data. Like PSS, GSS is an equitable score (Gandin and Murphy 1992), with a value of 1 indicating perfect skill.
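The multicategory procedure can be sketched as: bin daily quickflow into the four classes, build the 4 × 4 table, and score it with Gerrity's weights. The sketch below follows the standard Gerrity (1992) construction using sample observed class frequencies p_r, odds ratios a_r = (1 − Σ_{q≤r} p_q)/Σ_{q≤r} p_q, and symmetric weights s_ij = [Σ_{r<i} 1/a_r − (j − i) + Σ_{r≥j} a_r]/(K − 1) for i ≤ j (1-based indices, r running to K − 1). Assigning a value equal to a class boundary to the upper class is an assumption, as are the function names; this is not the "verification" package code.

```python
import numpy as np

# Class boundaries (mm/day) from the text: low | medium | high | very high.
# Values equal to a boundary go to the upper class (an assumption).
CLASS_EDGES = (0.1, 1.0, 5.0)


def quickflow_category(values):
    """Map daily quickflow to classes 0=low, 1=medium, 2=high, 3=very high."""
    return np.digitize(np.asarray(values, dtype=float), CLASS_EDGES)


def contingency_table(fcst, obs, k=4):
    """k x k counts; rows = forecast class, columns = observed class."""
    table = np.zeros((k, k))
    np.add.at(table, (quickflow_category(fcst), quickflow_category(obs)), 1)
    return table


def gerrity_skill_score(table):
    """Gerrity (1992) skill score for a k x k table (rows forecast, cols observed).

    The first and last observed classes must both be non-empty; otherwise
    some odds ratios a_r are zero or infinite.
    """
    counts = np.asarray(table, dtype=float)
    k = counts.shape[0]
    joint = counts / counts.sum()
    p_obs = joint.sum(axis=0)          # sample climatology of observed classes
    cum = np.cumsum(p_obs)[:-1]        # P(obs class <= r) for r = 1..k-1
    a = (1.0 - cum) / cum              # odds ratios a_r
    s = np.empty((k, k))
    for i in range(k):
        for j in range(i, k):
            # 0-based form of the weights: sum_{r<i} 1/a_r - (j - i) + sum_{r>=j} a_r
            s[i, j] = (np.sum(1.0 / a[:i]) - (j - i) + np.sum(a[j:])) / (k - 1)
            s[j, i] = s[i, j]
    return float(np.sum(joint * s))
```

The weights are built so that any forecast statistically independent of the observations has an expected score of 0 and a perfect forecast scores 1; rare classes receive large diagonal weights, which is how the GSS rewards correct forecasts of low-probability events. For a 2 × 2 table the score reduces to the PSS.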
We used the “verification” package (version 1.42; Gilleland 2015) in R (version 4.0.3) to verify all quickflow forecasts. To characterize the uncertainty of the PSS and GSS calculations, we estimated 95% confidence intervals using the circular block bootstrapping technique (Politis and Romano 1992); a minimum of 1000 replications was used to obtain a sufficiently large sample distribution of the PSS and GSS statistics. Typically, the skill scores of two or more forecasts (e.g., a modeled forecast and a naïve or unskilled reference forecast) are compared by assessing the degree of overlap between their respective confidence intervals. In accordance with recommendations by WWRP (2008), we instead calculated confidence intervals for the mean difference between the skill scores (PSS and GSS) of the quickflow forecasting system and those of a persistence reference forecast. A persistence forecast (Stanski et al. 1989) is simply the quickflow observed on the day the forecast is issued, persisted for all lead times. In short-range streamflow forecasting, persistence is frequently used as a reference (or benchmark) forecast (Harrigan et al. 2018; Pappenberger et al. 2015). Instances where the entire 95% confidence interval lay above zero (i.e., its lower bound exceeded zero) suggested that the quickflow forecasts issued by HL-RDHM were significantly better than persistence forecasts.
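The paired bootstrap comparison can be sketched as follows: resample days in contiguous blocks that wrap around the end of the record (the circular block bootstrap), recompute the PSS of both forecasts on the same resampled days, and take percentiles of their difference. The block length of 7 days, the default seed, the strict "> threshold" event definition, and all function names are hypothetical choices of this sketch; the source does not report a block length.

```python
import numpy as np


def _pss(fcst_event, obs_event):
    """PSS from boolean arrays; NaN if a resample lacks events or nonevents."""
    hits = np.sum(fcst_event & obs_event)
    misses = np.sum(~fcst_event & obs_event)
    false_alarms = np.sum(fcst_event & ~obs_event)
    correct_negatives = np.sum(~fcst_event & ~obs_event)
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        return np.nan
    return hits / (hits + misses) - false_alarms / (false_alarms + correct_negatives)


def circular_block_indices(n, block_length, rng):
    """Indices for one circular block bootstrap resample of length n."""
    n_blocks = -(-n // block_length)  # ceiling division
    starts = rng.integers(0, n, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_length)[None, :]) % n  # wrap around
    return idx.ravel()[:n]


def pss_difference_ci(model, reference, obs, threshold,
                      block_length=7, n_boot=1000, level=0.95, seed=0):
    """Percentile CI for PSS(model) - PSS(reference) on paired resamples."""
    model_ev = np.asarray(model, dtype=float) > threshold
    ref_ev = np.asarray(reference, dtype=float) > threshold
    obs_ev = np.asarray(obs, dtype=float) > threshold
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = circular_block_indices(obs_ev.size, block_length, rng)
        diffs[b] = _pss(model_ev[idx], obs_ev[idx]) - _pss(ref_ev[idx], obs_ev[idx])
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.nanpercentile(diffs, [tail, 100.0 - tail])
    return float(lo), float(hi)
```

Resampling blocks rather than single days preserves the day-to-day autocorrelation of quickflow within each block, and scoring both forecasts on identical resamples keeps the comparison paired. Under the decision rule in the text, a lower bound greater than zero would indicate that the model significantly outperforms the reference.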
5. Results and discussion
a. Performance of HL-RDHM simulations
1) Streamflow simulation
Comparisons of observed and simulated daily streamflow hydrographs (2004–17) are presented in the online supplemental material for Mahantango Creek (Figs. S1 and S2) and WE-38 (Figs. S3 and S4), while Fig. 2 focuses on specific results for 2011. Model performance measures for daily streamflows are reported in Table 2; hourly performance statistics are given in Table S1.
It is worth noting that HL-RDHM was only calibrated using streamflow observations at the Mahantango Creek outlet, while parameters for WE-38 were adjusted using scalar multipliers that were derived from the parent basin. Thus, the calibrations for WE-38 were indirect. Even so, the RSR value of 0.66 at WE-38 was satisfactory based on the criteria in Table 1, while the RSR value of 0.52 at Mahantango Creek was considered good. In terms of bias, HL-RDHM tended to slightly overestimate streamflow in Mahantango Creek (PBIAS = 4.1%), while in WE-38, HL-RDHM overestimated streamflow to a much greater degree (PBIAS = 20.9%).
In addition to quantifying errors in HL-RDHM streamflow simulations, we also evaluated measures of association and hydrologic performance metrics (Table 2; Fig. 3). As shown in Fig. 3, R2 values of 0.72 at Mahantango Creek and 0.68 at WE-38 indicated the strength of the linear association between observed and simulated streamflows was satisfactory (Table 1). Additionally, the intercepts of their respective regression lines were close to zero (Fig. 3), providing further evidence of the correspondence between simulations and observations (Krause et al. 2005). While the slope of the best-fit regression line was roughly 1:1 in Mahantango Creek, the slope for simulations in WE-38 was 0.72, reflecting the overestimation bias indicated by PBIAS. A notable trend in Fig. 3 was the propensity for greater deviations between observed and simulated streamflows at higher flows, which could be due to the occasional underestimation of high streamflows by HL-RDHM, as well as event-based simulation errors like hydrograph timing and peak-flow prediction (see supplemental material for more information on these metrics). Nevertheless, model efficiency measures like NSE generally indicated good performance in Mahantango Creek (NSE = 0.72) and satisfactory performance in WE-38 (NSE = 0.56) according to the criteria in Table 1. Higher values of KGE″ suggested the potential for slightly better model performance in both watersheds (Table 2).
We also used performance metrics to determine if HL-RDHM could replicate the observed flow frequency distribution in addition to representing the hydrograph (Westerberg et al. 2011). As shown in Fig. 4, aggregate FDCs at the outlets of Mahantango Creek and WE-38 were well modeled by HL-RDHM. This inference was supported by high NSE and KGE″ values indicating very good correspondence (Table 1) between observed and simulated FDCs (Table 3).
Concerning the different FDC zones, HL-RDHM showed mostly good performance (Table 1) in simulating all flow regimes except low flows in Mahantango Creek (Table 3), with generally low biases (<±15%) in the high-flow and peak-flow regions of the FDC (Fig. 4) that reflect quickflow runoff processes. Notably, however, the high flows in WE-38 were not well simulated, as indicated by negative values of NSE (NSE = −1.90) and KGE″ (KGE″ = −0.70) and an RSR value much greater than 0.70 (RSR = 1.70). Other flow regimes, including peak flows, intermediate flows, and low flows, were generally well simulated in WE-38 (Table 3). Even so, biases in the high-flow and peak-flow regions of the FDC were quite high in WE-38, with the poorly simulated high-flow region having the largest positive bias (FMV = 47.2%; Fig. 4).
2) Quickflow simulation
Although HL-RDHM was calibrated for streamflow prediction in Mahantango Creek, we were interested in judging whether the model could represent the quickflow component of streamflow, as quickflow was the main focus of the forecasting experiments. In general, the performance metrics in Table 4 suggested that HL-RDHM provided good representations of quickflow at the calibrated outlet location in Mahantango Creek, while HL-RDHM did not perform quite as well in the uncalibrated interior WE-38 subbasin (Table 1). For instance, the RSR value for Mahantango Creek was less than 0.70, while RSR was slightly higher than 0.70 for WE-38. In Mahantango Creek, quickflow was modestly overestimated (PBIAS = 10.5%). In contrast, HL-RDHM substantially overestimated quickflows in WE-38 (PBIAS = 69.0%).
Goodness-of-fit statistics for daily quickflow simulations by the HL-RDHM for the USGS gauging station near the outlet of Mahantango Creek (MCW; parent basin) and the WE-38 weir (interior subbasin). Shown to the right of the KGE″ score are its three essential elements, including r (Pearson correlation coefficient), α (variability ratio), and βr (bias ratio). Ideal values of r and α are 1, while the ideal value of βr in the newly formulated KGE″ is 0.
Analysis of association measures and hydrologic performance metrics provided further insight into the ability of HL-RDHM to simulate quickflow. Similar to the streamflow simulations, the strength of the linear association between observed and simulated quickflows (Fig. 5) was deemed satisfactory (Table 1), with R2 values of 0.75 at Mahantango Creek and 0.63 at WE-38. Notably, the slope of the best-fit regression line was roughly 1:1 in Mahantango Creek with an intercept approaching zero (Fig. 5). In WE-38, however, the slope of the regression line was 0.64 (Fig. 5), highlighting a tendency for HL-RDHM to overestimate quickflow as indicated by the high PBIAS (Table 4).
Given the amplified bias, which was particularly evident in WE-38, we used KGE″ to inform the performance of HL-RDHM quickflow simulations (Table 1). In Mahantango Creek, quickflow simulations were very good (KGE″ = 0.81), while quickflow simulations in WE-38 were good (KGE″ = 0.65). The slightly poorer quickflow simulations in WE-38 highlighted some of the challenges in simulating small watershed hydrology with adjusted parameter fields from larger, calibrated parent basins like Mahantango Creek. As Reed et al. (2004) noted in their studies of distributed hydrologic models, smaller basins tend to have greater hydrologic variability than larger basins. Indeed, the variability ratio (α) element of the KGE″ score (Table 4) revealed that HL-RDHM tended to overestimate daily variations in quickflow in WE-38, indicating an inability to match quickflow dynamics at the scale of WE-38. In spite of the higher variability and bias ratios in WE-38, we noted that quickflow simulations were still generally satisfactory.
Similar to the FDC analyses for streamflow, we assessed the ability of HL-RDHM to simulate the distribution of quickflow runoff frequencies (Table 5; Fig. 6).
Goodness-of-fit statistics for aggregate daily quickflow runoff duration curves (RDCs) simulated by the HL-RDHM. Statistics are given for the USGS gauging station near the outlet of Mahantango Creek (MCW; parent basin) and the WE-38 weir (interior subbasin).
Unlike the FDC analyses (Fig. 4), we did not divide the RDC into different zones. Instead, we only quantified performance measures for the aggregate RDCs (Fig. 6). In general, HL-RDHM performed better at the outlet of Mahantango Creek, with simulated RDCs attaining high KGE″ values and an RSR value less than 0.70. At WE-38, KGE″ values indicated good, though somewhat weaker, performance in simulating the aggregate quickflow RDC. It is worth noting that the overestimation bias in the upper zone of the RDC in WE-38 (exceedance probabilities < 20%) was readily apparent (Fig. 6). This overestimation bias was also evident in the high-flow and peak-flow zones of the FDC in WE-38 (Fig. 4); these are the FDC zones where quickflow has the greatest influence (Yokoo and Sivapalan 2011; Yaeger et al. 2012). Even so, the capacity of HL-RDHM to satisfactorily simulate the aggregate quickflow RDCs in both basins suggested an ability to represent the frequency and magnitude of quickflow.
b. Verification of short-range quickflow forecasts
Given the satisfactory representation of quickflow hydrographs and runoff frequency distributions by HL-RDHM, we used the calibrated model to generate 24-, 48-, and 72-h forecasts of quickflow in Mahantango Creek and WE-38. The forecasting period extended roughly 27 months from 21 July 2017 through 28 October 2019 (n = 830 daily observations). Table 6 details the number of quickflow events that exceeded each intensity threshold during the forecasting period.
Number of quickflow events observed for each of the seven quickflow intensity thresholds used in the forecasting experiment. Event totals are reported separately for Mahantango Creek (MCW; parent basin) and the WE-38 weir (interior subbasin).
In general, measurable quickflow was observed on 39% of the forecast days in Mahantango Creek and on 45% of the forecast days in WE-38. As shown in Fig. 7, the majority of these runoff events occurred during an exceptionally wet period that encompassed the warm season of 2018 (Winter et al. 2020), with several extreme events in mid- to late July.
1) Performance diagrams
We used performance diagrams (Roebber 2009) as an initial step in the verification of quickflow forecasts at Mahantango Creek and WE-38 (Fig. 8).
In general, quickflow forecasts issued by HL-RDHM tended to perform as well as or better than persistence quickflow forecasts at all seven runoff intensity levels, especially at lead times of 48 and 72 h (Fig. 8). Because of the strong autocorrelation in persistence forecasting (Ghimire and Krajewski 2020; Ghimire et al. 2020), especially at short lead times, the performance diagrams showed all persistence quickflow forecasts generally plotting along the 1:1 line (BIAS ≈ 1), indicating little or no frequency bias. Moreover, persistence forecasts at the 24-h lead time tended to slightly outperform HL-RDHM forecasts at lower intensity levels (≤1 mm day−1); at higher intensity levels (>1 mm day−1), the HL-RDHM forecasts were increasingly skillful. At lead times beyond 24 h, HL-RDHM quickflow forecasts outperformed persistence at all quickflow intensity levels, owing to the higher POD (discrimination), SR (reliability), and CSI (accuracy) scores attained by HL-RDHM forecasts relative to persistence. The superiority of persistence quickflow forecasts at 24-h lead times, especially in Mahantango Creek, is consistent with the findings of Ghimire and Krajewski (2020) and Ghimire et al. (2020) showing that short-range persistence forecasts are often difficult to surpass at larger watershed scales.
Quickflow forecasts generated by HL-RDHM showed a consistent positive frequency bias (Bias > 1; see Table A3) at all seven quickflow thresholds, with forecasts in WE-38 generally having a greater frequency bias than Mahantango Creek across all lead times and thresholds. In WE-38, the positive frequency bias was most apparent at the higher runoff thresholds (>1 mm day−1), which was in line with the positive biases noted in the quickflow simulations for WE-38, particularly in the quickflow RDC (Table 4; Fig. 6). Essentially, the positive frequency biases in both basins suggested a tendency for HL-RDHM to overforecast quickflow, which likely led to the high POD scores relative to those of persistence quickflow forecasts, especially at later lead times. Categorical verification of QPFs in Mahantango Creek and WE-38 (see supplemental material) did not reveal the same degree of positive frequency bias as the daily quickflow forecasts, although slight overforecasting of precipitation, which was most apparent at 24- and 48-h lead times, may have contributed to the overforecasting of quickflow. Notably, QPFs were relatively unbiased at 72-h lead times in both watersheds, with a tendency to actually underforecast (Bias < 1) the frequency of the highest precipitation amounts (>10 mm day−1).
2) Peirce skill scores
In addition to performance diagrams, we also plotted the PSS by lead time across the seven runoff intensity levels (Fig. 9).
In Mahantango Creek and WE-38, PSS values for HL-RDHM quickflow forecasts were consistently positive at all lead times and intensity levels, indicating skillful quickflow forecasts. Notably, HL-RDHM quickflow forecasts in WE-38 had higher PSS values than those in Mahantango Creek: roughly 90% of PSS values exceeded 0.5 in WE-38, compared with only 62% in Mahantango Creek. The higher PSS values in WE-38 were likely due to the strong overforecasting bias for quickflow in WE-38, which inflated POD scores (Fig. 8); because PSS is the difference between POD and the probability of false detection (POFD), a larger POD raises PSS as long as the accompanying increase in false alarms is comparatively small. As noted previously, QPFs in both watersheds did not exhibit the same level of overforecasting bias as quickflow forecasts (see supplemental material). Even so, we cannot dismiss the potential contribution of QPF biases to quickflow forecasting biases, as QPFs are often a significant source of error in hydrological forecasting (Cuo et al. 2011). PSS values for persistence quickflow forecasts were generally similar in Mahantango Creek and WE-38 (Fig. 9). While PSS values for 24-h persistence forecasts exceeded those attained by HL-RDHM at the lower quickflow intensities (≤1 mm day−1), HL-RDHM tended to outperform persistence at higher quickflow intensities (>1 mm day−1), with a notable trend toward increasing relative skill at all intensities for the 48- and 72-h lead times.
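Because PSS = POD − POFD, overforecasting raises the score only when the gain in POD outweighs the accompanying rise in POFD. The worked example below uses contingency-table counts a (hits), b (false alarms), c (misses), and d (correct rejections); the counts are hypothetical, chosen for illustration with 40 observed events in 100 days, and are not taken from Figs. 8 or 9.

```latex
\begin{aligned}
\mathrm{PSS} &= \mathrm{POD} - \mathrm{POFD} = \frac{a}{a+c} - \frac{b}{b+d} \\
\text{A: } (a,b,c,d) = (30, 5, 10, 55) &\;\Rightarrow\; \mathrm{Bias} = \tfrac{35}{40} = 0.875,\quad \mathrm{PSS} = \tfrac{30}{40} - \tfrac{5}{60} \approx 0.667 \\
\text{B: } (a,b,c,d) = (36, 12, 4, 48) &\;\Rightarrow\; \mathrm{Bias} = \tfrac{48}{40} = 1.2,\quad \mathrm{PSS} = \tfrac{36}{40} - \tfrac{12}{60} = 0.700
\end{aligned}
```

Forecast B issues more false alarms than forecast A, yet scores higher because its POD gain (0.15) exceeds its POFD gain (about 0.117); with a larger rise in false alarms, the same overforecasting would lower the PSS instead.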
Because of the intrinsic autocorrelation in the 24-h persistence forecasts noted earlier, the PSS values of HL-RDHM quickflow forecasts were not always statistically different from persistence for day-ahead forecasts. In particular, 95% bootstrap percentile confidence intervals for the PSS difference indicated that HL-RDHM quickflow forecasts were not significantly more skillful than persistence at lower quickflow intensities (≤1 mm day−1); however, at higher quickflow intensities (>2 mm day−1), the 95% confidence intervals lay entirely above zero, suggesting that HL-RDHM forecasts were more skillful than persistence. At the 48- and 72-h lead times, quickflow forecasts by HL-RDHM were significantly more skillful than persistence, although the skill of HL-RDHM appeared to diminish somewhat at the highest quickflow intensities (>5 mm day−1). The categorical verifications of quickflow with the PSS were generally in line with continuous verifications of quickflow in Mahantango Creek (Table S5) and WE-38 (Table S7) using MSE skill scores. Results also revealed the strong dependence of persistence skill on basin scale and lead time (Ghimire and Krajewski 2020; Ghimire et al. 2020), especially in larger basins like Mahantango Creek, where 24-h persistence forecasts were most skillful.
3) Gerrity skill scores
Finally, we used the GSS to determine whether HL-RDHM could correctly forecast ordinal categories of quickflow runoff (Fig. 10).
In line with the earlier PSS results, we found positive GSS values for HL-RDHM quickflow forecasts at all lead times in Mahantango Creek and WE-38, with the highest GSS values in WE-38 (Fig. 10). In general, HL-RDHM correctly forecast a greater proportion of very high runoff events (>5 mm day−1) in WE-38 than in Mahantango Creek; this pattern was apparent at all lead times and likely contributed to the higher overall GSS values in WE-38. Notably, day-ahead forecasts by HL-RDHM correctly classified the three extreme quickflow events of 23–25 July 2018 as very high-magnitude events in both WE-38 and Mahantango Creek. At a lead time of 48 h, HL-RDHM misclassified one of the three days as a high runoff event in WE-38, while all three events were correctly forecast in Mahantango Creek. At 72 h, the pattern reversed: two of the three quickflow extremes were correctly forecast in WE-38, but only one of the three in Mahantango Creek, and the missed days were misclassified as high-magnitude events. For lower-magnitude events, HL-RDHM tended to misclassify some medium runoff events (0.1–1 mm day−1) as low- or high-runoff events in both watersheds. Except for the 24-h lead time in Mahantango Creek, GSS values for HL-RDHM forecasts exceeded those obtained with persistence quickflow forecasts.
As with the PSS analyses, we also used circular block bootstrapping to estimate 95% confidence intervals for the mean difference between the GSS of the HL-RDHM quickflow forecasts and the GSS of persistence forecasts. For Mahantango Creek, HL-RDHM quickflow forecasts were not statistically different from persistence forecasts at the 24-h lead time (mean GSS difference = −0.04; 95% CI: −0.11 to 0.01). Indeed, short-range persistence forecasts are often hard for hydrological models to beat, especially in larger basins where the effects of runoff processes extend over longer time scales (Ghimire and Krajewski 2020; Krajewski et al. 2020), and this strength of short-range persistence relative to HL-RDHM was evident in the 24-h quickflow forecasts for Mahantango Creek. Nevertheless, HL-RDHM significantly outperformed persistence (i.e., the 95% confidence intervals lay entirely above zero) in forecasting the correct category of runoff at all lead times in WE-38 and at the 48- and 72-h lead times in Mahantango Creek (Fig. 10).
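A circular block bootstrap resamples the verification period in contiguous blocks that wrap from the end of the record back to the start, so serial dependence between consecutive days is preserved within each block. The sketch below estimates a percentile confidence interval for the difference in a skill score between two forecast sets. The block length (7 days), number of replicates, threshold, function names, and synthetic data are all assumptions for illustration; the block length actually used in the study is not stated here.

```python
from typing import Callable

import numpy as np


def circular_block_bootstrap_ci(
    fcst_a: np.ndarray,
    fcst_b: np.ndarray,
    obs: np.ndarray,
    score: Callable[[np.ndarray, np.ndarray], float],
    block_length: int = 7,   # assumption: weekly blocks
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Mean and percentile CI of score(a) - score(b) from a circular block bootstrap."""
    n = obs.size
    rng = np.random.default_rng(seed)
    n_blocks = -(-n // block_length)  # ceiling division
    offsets = np.arange(block_length)
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        # Consecutive days from each random start, wrapping past the end of the record.
        idx = ((starts[:, None] + offsets) % n).ravel()[:n]
        diffs[k] = score(fcst_a[idx], obs[idx]) - score(fcst_b[idx], obs[idx])
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(diffs.mean()), float(lo), float(hi)


def pss(fcst: np.ndarray, obs: np.ndarray, threshold: float = 1.0) -> float:
    """Peirce skill score (POD - POFD) for exceedances of one threshold."""
    f, o = fcst > threshold, obs > threshold
    pod = np.count_nonzero(f & o) / np.count_nonzero(o)
    pofd = np.count_nonzero(f & ~o) / np.count_nonzero(~o)
    return pod - pofd


# Synthetic example: a noisy hypothetical "model" versus 24-h persistence.
rng = np.random.default_rng(1)
q = rng.gamma(0.4, 2.0, size=366)
obs_v = q[1:]                                   # verifying observations
persistence = q[:-1]                            # previous day's observation
model = obs_v + rng.normal(0.0, 0.3, size=365)  # hypothetical model forecast
print(circular_block_bootstrap_ci(model, persistence, obs_v, pss))
```

If the lower bound of the returned interval lies above zero, the first forecast set is judged significantly more skillful than the second, which is the decision rule described in the text.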
4) Limitations and opportunities for further research
Our study offered important insight into the potential for a gridded SAC-HT model to simulate and forecast quickflow in mixed-use watersheds. However, we also acknowledge that the scope and scale of our study limited our ability to address the full chain of uncertainties that affected the model simulations and forecasts. For instance, we mostly focused on errors in hydrologic predictions and did not quantify uncertainties related to model parameterization or to upstream sources of error such as meteorological forcings. On the latter point, it is widely recognized that QPFs are a significant source of error in streamflow forecasting (Cuo et al. 2011), especially at basin scales finer than the grid scale of the forcing data. Indeed, small watersheds like WE-38 have shorter response times than larger basins like Mahantango Creek (38 versus 84 h; see supplemental material, Fig. S7), which means that quickflow forecasting skill in headwaters depends increasingly on the quality of the QPFs at lead times greater than the response time (Vivoni et al. 2006; Ghimire et al. 2021). While it was beyond the scope of our study to formally address the effects of QPFs on quickflow forecasts, continuous forecast verification in Mahantango Creek and WE-38 did reveal a strong tendency for quickflow forecasting biases to track QPF biases in both watersheds (see supplemental material, Fig. S8); in general, QPFs increasingly underforecast precipitation as lead time increased, and this trend appeared to influence the bias of quickflow forecasts. Even so, quickflow forecasts in both watersheds generally maintained moderate correlations (r > 0.5) with observed quickflow at all lead times, despite declining correlations between QPFs and QPEs as lead times increased from 24 to 72 h (Fig. S9).
In light of these findings, there is clearly a need to more formally consider the effects of QPFs on quickflow forecasting errors as has been done in several recent streamflow and flood forecasting experiments (Seo et al. 2018; Adams and Dymond 2019; Ghimire et al. 2021). In addition to addressing the value of QPFs to quickflow forecasts, there is also a need to examine whether statistical postprocessing techniques (Li et al. 2017; Woldemeskel et al. 2018) could reduce forecasting biases and improve the skill and reliability of short-range quickflow forecasts. Finally, our quickflow forecasts were strictly deterministic in nature, and future studies of this sort should evaluate ensemble forecasting methods (Cloke and Pappenberger 2009) to provide probabilistic predictions of quickflow.
6. Summary and conclusions
In this study, we evaluated whether the NWS’s distributed hydrologic model could produce reasonable representations of quickflow at the watershed outlet and in WE-38, an interior headwater basin. Overall, HL-RDHM provided good to very good simulations of quickflow at the Mahantango Creek outlet, with high degrees of correlation between observed and simulated quickflows, and generally low biases. While quickflow simulations in WE-38 were satisfactory overall, there was a greater tendency for HL-RDHM to overestimate the amount and variability of quickflow at the headwater scale. The inferior performance of HL-RDHM in the interior WE-38 basin relative to that of the larger Mahantango Creek parent basin was consistent with previous intercomparisons of distributed hydrologic models in watersheds of varying sizes (Reed et al. 2004; Smith et al. 2012b). As noted in those studies, improved approaches to defining the distribution of a priori parameters are needed in order to enhance hydrologic simulations in interior basins like WE-38. Even so, the modest calibration applied in this study was sufficient to produce reasonable quickflow simulations for short-range forecasting applications in both basins.
Results of the forecasting experiment generally revealed that quickflow forecasts with HL-RDHM were highly skillful based on a number of different categorical forecast verification measures. Notably, we found that HL-RDHM tended to overforecast the frequency of quickflow for all intensity thresholds and lead times in both watersheds, although the overforecasting bias was most pronounced in WE-38 where large quickflow overestimation biases were also noted during the simulation phase of the study. In spite of these overforecasting biases, HL-RDHM showed an ability to skillfully forecast quickflow across a wide range of quickflow thresholds in WE-38 and Mahantango Creek. Moreover, HL-RDHM showed strong skill in forecasting ordinal categories of quickflow based on runoff magnitude and frequency. With the exception of day-ahead quickflow forecasts in Mahantango Creek, where persistence forecasts were notably difficult to overcome, HL-RDHM quickflow forecasts were significantly better than persistence quickflow forecasts, especially at 48- and 72-h lead times.
Given these promising findings, we see great potential for distributed models like HL-RDHM to provide short-range forecasts of quickflow that could aid operational (i.e., day-to-day) decision-making in agriculture (Easton et al. 2017) and other sectors where quickflow can negatively affect environmental quality. In the agricultural domain, wash-off of manure and fertilizer is dependent not only on the timing of runoff relative to application, but also on the size of the runoff event (Vadas et al. 2011, 2017). All other things being equal, a large storm occurring shortly after nutrients are applied has a greater potential for nutrient wash-off than a smaller storm. As this study demonstrates, the ability of HL-RDHM to correctly forecast naturally ordered categories of quickflow runoff based on magnitude and frequency is important, as these short-range forecasts can provide actionable information that enables farmers and other agricultural decision-makers to avoid applying nutrients when large storms with significant quickflow potential are predicted. While this study assessed the short-range predictability of quickflow, future studies should also consider longer-range forecasts that allow decision-makers to engage in tactical (weekly to monthly) and strategic (seasonal or yearly timeframes) planning.
Acknowledgments.
This work was partially funded with support from the National Institute of Food and Agriculture’s Agriculture and Food Research Initiative (AFRI) under Project 2012-67019-19296. Additional funding was provided by the USDA Agricultural Research Service’s Pasture Systems and Watershed Management Research Unit and USDA’s Natural Resources Conservation Service (NRCS) Conservation Effects Assessment Project. The existence, accuracy, and consistency of the Mahantango Creek watershed data are a testament to the dedication of numerous past and present USDA-ARS employees. We also acknowledge NOAA’s Middle Atlantic River Forecast Center (MARFC) for the modeling and forecasting datasets that were developed specifically for this project. The authors declare no competing financial or nonfinancial interests.
Data availability statement.
The HL-RDHM was developed at NOAA’s NWS Office of Hydrologic Development (OHD), which is now the NWS Office of Water Prediction (OWP). HL-RDHM simulations and forecasts were generated using computational resources at NOAA’s NWS Middle Atlantic River Forecast Center (MARFC) in State College, Pennsylvania. Model outputs and observational datasets are available from the authors upon request.
APPENDIX
Model Performance Metrics and Forecast Verification Metrics
a. Model performance metrics
1) Goodness-of-fit metrics
We evaluated how well HL-RDHM simulations of streamflow and quickflow fit the observations using a suite of standard performance measures from the hydrological modeling community (Table A1).
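Table A1 is not reproduced in this excerpt. Two widely used goodness-of-fit measures are the Nash–Sutcliffe efficiency (NSE) and the Kling–Gupta efficiency (KGE; Gupta et al. 2009); the sketch below computes them using their standard formulations, which we assume, but cannot confirm, match the equations in Table A1. Function names are hypothetical.

```python
import numpy as np


def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of obs from its mean)."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs: np.ndarray, sim: np.ndarray) -> float:
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, and bias ratio beta."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()   # ratio of standard deviations
    beta = sim.mean() / obs.mean()  # ratio of means
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
print(nse(obs, 1.1 * obs), kge(obs, 1.1 * obs))
```

Both scores have an ideal value of 1. A simulation equal to the observed mean gives NSE = 0, and a simulation that is a uniform 10% overestimate of the observations keeps r = 1 but lowers KGE through both alpha and beta.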
Goodness-of-fit metrics used to judge the performance of HL-RDHM simulations in Mahantango Creek, including the name of the metric, the abbreviation/symbol, the general equation for calculating the metric, the range, and relevant reference(s). Abbreviations in the equations include observed values O, simulated values S, the time step i, and the sample size N.
2) Flow duration curve diagnostic signatures
We also conducted a diagnostic evaluation of model fit using flow duration curves (FDCs) that were partitioned by exceedance probability into four segments (Fig. A1): peak flows (0%–5%), high flows (5%–20%), intermediate flows (20%–70%), and low flows (70%–100%) (Table A2).
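The exact signature equations of Table A2 are not reproduced here. As a hypothetical stand-in, the sketch below builds observed and simulated FDCs, assigns exceedance probabilities with the Weibull plotting position i/(n + 1) (an assumption), and reports the percent volume bias of the simulated FDC within each of the four segments listed above. Function names are hypothetical.

```python
import numpy as np

# Exceedance-probability segments from the text, as fractions of time.
SEGMENTS = {
    "peak": (0.00, 0.05),
    "high": (0.05, 0.20),
    "intermediate": (0.20, 0.70),
    "low": (0.70, 1.00),
}


def exceedance_probabilities(n: int) -> np.ndarray:
    """Weibull plotting positions i / (n + 1) for flows ranked from largest to smallest."""
    return np.arange(1, n + 1) / (n + 1)


def segment_percent_bias(obs: np.ndarray, sim: np.ndarray) -> dict[str, float]:
    """Percent bias of simulated FDC volume within each exceedance segment.

    Both records must cover the same days, and the observed flow sum in every
    segment must be nonzero.
    """
    if obs.size != sim.size:
        raise ValueError("obs and sim must have the same length")
    fdc_obs = np.sort(obs)[::-1]  # observed FDC, largest flow first
    fdc_sim = np.sort(sim)[::-1]  # simulated FDC, largest flow first
    ep = exceedance_probabilities(obs.size)
    result = {}
    for name, (lower, upper) in SEGMENTS.items():
        in_seg = (ep > lower) & (ep <= upper)
        obs_sum = fdc_obs[in_seg].sum()
        result[name] = 100.0 * (fdc_sim[in_seg].sum() - obs_sum) / obs_sum
    return result


flows = np.arange(1.0, 101.0)  # synthetic daily flows
print(segment_percent_bias(flows, 1.1 * flows))
```

Because each FDC is sorted independently, this diagnostic compares flow distributions rather than timing; a uniform 10% overestimate yields +10% in every segment, whereas errors in peak or recession behavior show up as differing biases across segments.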
Flow duration curve (FDC) signatures used to judge the performance of simulated FDCs by HL-RDHM in Mahantango Creek, including the name of the metric and the applicable exceedance probabilities (EPs), the abbreviation of the metric, the general equation for calculating the metric, the range, and relevant reference(s). Abbreviations in the equations include streamflow Q, observed values O, and simulated values S.
b. Forecast verification metrics
To verify quickflow forecasts in Mahantango Creek, we used a set of forecast verification metrics and skill scores for binary (i.e., yes/no) forecasts, as well as skill scores developed for multicategory (e.g., none/low/medium/high) forecasts (Wilks 2011).
1) Binary forecasts
2 × 2 contingency table
The 2 × 2 contingency table (Fig. A2) shows the frequency of “yes” and “no” forecasts and observations for a given forecast criterion (e.g., runoff versus no runoff, runoff amounts exceeding a given threshold, etc.).
Correctly forecast events (a) are called hits, while events that were forecast to happen but did not happen (b) are known as false alarms. Events that were forecast not to occur but did occur (c) are called misses. Correct forecasts of nonevents (d) are known as correct rejections. The four combinations of forecasts and observations are collectively known as the joint distribution, and the totals given in the right-hand column and in the bottom row are called the marginal distributions (Wilks 2011; CAWCR 2017). A number of categorical statistics can be estimated from the 2 × 2 contingency table (CAWCR 2017; Hogan and Mason 2012; Wilks 2011), and the performance measures reported in this paper are described and defined in Table A3.
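Table A3 itself is not reproduced in this excerpt. Assuming it uses the standard definitions (Wilks 2011; Hogan and Mason 2012), the binary measures discussed in the results can be computed from the four cell counts as sketched below; the function name is hypothetical.

```python
def binary_scores(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Standard 2 x 2 verification measures.

    a = hits, b = false alarms, c = misses, d = correct rejections.
    """
    pod = a / (a + c)    # probability of detection (hit rate)
    far = b / (a + b)    # false alarm ratio
    pofd = b / (b + d)   # probability of false detection (false alarm rate)
    return {
        "POD": pod,
        "FAR": far,
        "SR": 1.0 - far,            # success ratio
        "CSI": a / (a + b + c),     # critical success index (threat score)
        "Bias": (a + b) / (a + c),  # frequency bias; > 1 means overforecasting
        "POFD": pofd,
        "PSS": pod - pofd,          # Peirce skill score
    }


print(binary_scores(30, 10, 5, 55))
```

For example, a table with a = 30, b = 10, c = 5, and d = 55 gives Bias = 40/35 ≈ 1.14 (overforecasting) and PSS = 30/35 − 10/65 ≈ 0.70, while a table in which forecasts are statistically independent of the observations gives PSS = 0.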
2) Multicategory forecasts
(i) 4 × 4 contingency table
Categorical forecasts are not limited to binary (i.e., yes/no) events. In many instances, three or more categories can be developed that are mutually exclusive (e.g., rain, snow, freezing rain) or naturally ordered (e.g., low, medium, high). In this study, we categorized runoff amounts into the four ordinal categories shown in Fig. A3 below.
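The sketch below shows one way to build the 4 × 4 contingency table from daily forecast and observed quickflow, with forecast categories in rows and observed categories in columns. The category boundaries (0.1, 1, and 5 mm day−1) are an assumption pieced together from the results discussion (medium runoff = 0.1–1 mm day−1; very high runoff > 5 mm day−1) and may not match Fig. A3 exactly; the labels and function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical category edges (mm/day); the exact boundaries in Fig. A3 may differ.
EDGES = np.array([0.1, 1.0, 5.0])
LABELS = ("low", "medium", "high", "very high")


def categorize(quickflow: np.ndarray) -> np.ndarray:
    """Map daily quickflow to ordinal category indices 0-3 (upper edges inclusive)."""
    return np.digitize(quickflow, EDGES, right=True)


def contingency_table(forecast: np.ndarray, observed: np.ndarray, k: int = 4) -> np.ndarray:
    """Counts n[i, j] of days with forecast category i and observed category j."""
    table = np.zeros((k, k), dtype=int)
    np.add.at(table, (categorize(forecast), categorize(observed)), 1)
    return table


fcst = np.array([0.0, 0.5, 2.0, 6.0, 6.0])
obs = np.array([0.0, 0.5, 6.0, 6.0, 0.05])
print(contingency_table(fcst, obs))
```

Off-diagonal cells far from the diagonal (for example, a very high forecast on a low-runoff day) represent the large errors that the GSS penalizes most heavily.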
(ii) Gerrity skill score
For a 4 × 4 contingency table, $I = J = 4$; the joint probability distribution $p(f_i, o_j)$, shown in Fig. A4, is computed as the ratio of each cell count in the 4 × 4 contingency table (Fig. A3) to the sample size $n$, which is equal to the total number of forecast/observation pairs (Wilks 2011). The marginal probabilities for the observations (i.e., the sample climatological probabilities), $p(o_j) = \sum_{i=1}^{4} p(f_i, o_j)$, are then calculated by summing each of the four columns of the joint probability distribution.
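Following Gerrity (1992) as presented by Wilks (2011), the score is completed by building a symmetric scoring matrix $s_{i,j}$ from the cumulative observed climatology and using it to weight the joint probabilities. The standard formulation is written out below for general $J$ ($J = 4$ here); it is assumed, not confirmed, to match the equations in the original appendix figures.

```latex
\begin{aligned}
a_r &= \frac{1 - \sum_{k=1}^{r} p(o_k)}{\sum_{k=1}^{r} p(o_k)}, \qquad r = 1, \ldots, J-1, \\
s_{i,i} &= \frac{1}{J-1}\left[\sum_{r=1}^{i-1} a_r^{-1} + \sum_{r=i}^{J-1} a_r\right], \\
s_{i,j} &= \frac{1}{J-1}\left[\sum_{r=1}^{i-1} a_r^{-1} - (j - i) + \sum_{r=j}^{J-1} a_r\right], \qquad 1 \le i < j \le J, \qquad s_{j,i} = s_{i,j}, \\
\mathrm{GSS} &= \sum_{i=1}^{J} \sum_{j=1}^{J} p(f_i, o_j)\, s_{i,j}.
\end{aligned}
```

Rare categories have small cumulative probabilities and hence large $a_r^{-1}$ terms, so correct forecasts of those categories receive large positive weights, while the $-(j - i)$ term makes the penalty grow with the number of categories separating forecast and observation.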
An important quality of the GSS is that it gives more credit to correct forecasts of rare events than to those of common events and levies greater penalties on larger errors than on smaller ones. Like the PSS (Table A3), the GSS is an equitable score (Gandin and Murphy 1992), meaning that random and constant forecasts both score 0, with an ideal value of 1.
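A minimal NumPy sketch of the GSS follows, assuming the Gerrity (1992) scoring weights as presented by Wilks (2011) and a count table with forecast categories in rows and observed categories in columns. Every observed category must occur at least once, otherwise some weights are undefined; the function names are hypothetical.

```python
import numpy as np


def gerrity_weights(p_obs: np.ndarray) -> np.ndarray:
    """Gerrity (1992) scoring matrix s[i, j] from observed category probabilities."""
    J = p_obs.size
    cum = np.cumsum(p_obs)[:-1]  # sum of p(o_k) for k <= r, r = 1..J-1
    a = (1.0 - cum) / cum        # odds ratios a_r
    s = np.zeros((J, J))
    for i in range(J):           # zero-based i is category i + 1
        inv_sum = np.sum(1.0 / a[:i])        # one-based r = 1 .. i-1
        for j in range(i, J):
            fwd_sum = np.sum(a[j:])          # one-based r = j .. J-1
            s[i, j] = (inv_sum - (j - i) + fwd_sum) / (J - 1)
            s[j, i] = s[i, j]                # symmetric matrix
    return s


def gerrity_skill_score(table: np.ndarray) -> float:
    """GSS from a J x J count table (rows = forecast, columns = observed)."""
    p = table / table.sum()   # joint distribution p(f_i, o_j)
    p_obs = p.sum(axis=0)     # column sums = observed marginals
    return float(np.sum(p * gerrity_weights(p_obs)))


print(gerrity_skill_score(np.diag([50, 30, 15, 5])))
```

A perfectly diagonal table scores 1, and a table whose forecasts are statistically independent of the observations scores 0, consistent with the equitability property described above. With observed probabilities of 0.5, 0.3, 0.15, and 0.05, a correct forecast of the rarest category earns a weight of 8, compared with about 0.43 for the most common one.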
REFERENCES
Adams, T. E., 2016: Flood forecasting in the United States NOAA/National Weather Service. Flood Forecasting: A Global Perspective, T. E. Adams and T. C. Pagano, Eds., Academic Press, 249–310, https://doi.org/10.1016/B978-0-12-801884-2.00010-4.
Adams, T. E., and R. Dymond, 2019: The effect of QPF on real-time deterministic hydrologic forecast uncertainty. J. Hydrometeor., 20, 1687–1705, https://doi.org/10.1175/JHM-D-18-0202.1.
Alfieri, L., P. Burek, E. Dutra, B. Krzeminski, D. Muraro, J. Thielen, and F. Pappenberger, 2013: GloFAS – Global ensemble streamflow forecasting and flood early warning. Hydrol. Earth Syst. Sci., 17, 1161–1175, https://doi.org/10.5194/hess-17-1161-2013.
Anderson, E., 2002: Calibration of conceptual hydrologic models for use in river forecasting. 372 pp., https://www.weather.gov/media/owp/oh/hrl/docs/1_Anderson_CalbManual.pdf.
Anderson, E., 2006: Snow accumulation and ablation model—SNOW-17. User’s manual, NWS, 61 pp., https://www.weather.gov/media/owp/oh/hdsc/docs/TM17.pdf.
Arnold, J. G., and P. M. Allen, 1999: Automated methods for estimating baseflow and ground water recharge from streamflow records. J. Amer. Water Resour. Assoc., 35, 411–424, https://doi.org/10.1111/j.1752-1688.1999.tb03599.x.
Arnold, J. G., P. M. Allen, R. Muttiah, and G. Bernhardt, 1995: Automated base flow separation and recession analysis techniques. Ground Water, 33, 1010–1018, https://doi.org/10.1111/j.1745-6584.1995.tb00046.x.
Barlow, P. M., W. L. Cunningham, T. Zhai, and M. Gray, 2015: U.S. Geological Survey groundwater toolbox, a graphical and mapping interface for analysis of hydrologic data (version 1.0): User guide for estimation of base flow, runoff, and groundwater recharge from streamflow data. USGS Techniques and Methods Rep. 3-B10, 40 pp., https://doi.org/10.3133/tm3B10.
Bartholmes, J. C., J. Thielen, M. H. Ramos, and S. Gentilini, 2009: The European flood alert system EFAS – Part 2: Statistical skill assessment of probabilistic and deterministic operational forecasts. Hydrol. Earth Syst. Sci., 13, 141–153, https://doi.org/10.5194/hess-13-141-2009.
Bennett, N. D., and Coauthors, 2013: Characterising performance of environmental models. Environ. Modell. Software, 40, 1–20, https://doi.org/10.1016/j.envsoft.2012.09.011.
Bryant, R. B., and Coauthors, 2011: U.S. Department of Agriculture Agricultural Research Service Mahantango Creek Watershed, Pennsylvania, United States: Physiography and history. Water Resour. Res., 47, W08701, https://doi.org/10.1029/2010WR010056.
Buda, A. R., P. J. A. Kleinman, M. S. Srinivasan, R. B. Bryant, and G. W. Feyereisen, 2009: Factors influencing surface runoff generation from two agricultural hillslopes in central Pennsylvania. Hydrol. Processes, 23, 1295–1312, https://doi.org/10.1002/hyp.7237.
Buda, A. R., and Coauthors, 2011a: U.S. Department of Agriculture Agricultural Research Service Mahantango Creek Watershed, Pennsylvania, United States: Long-term precipitation database. Water Resour. Res., 47, W08702, https://doi.org/10.1029/2010WR010058.
Buda, A. R., and Coauthors, 2011b: U.S. Department of Agriculture Agricultural Research Service Mahantango Creek Watershed, Pennsylvania, United States: Long-term stream discharge database. Water Resour. Res., 47, W08703, https://doi.org/10.1029/2010WR010059.
Buda, A. R., P. J. A. Kleinman, G. W. Feyereisen, D. A. Miller, P. G. Knight, P. J. Drohan, and R. B. Bryant, 2013: Forecasting runoff from Pennsylvania landscapes. J. Soil Water Conserv., 68, 185–198, https://doi.org/10.2489/jswc.68.3.185.
Burnash, R., 1995: The NWS River Forecast System-catchment modeling. Computer Models of Watershed Hydrology, V. P. Singh, Ed., Water Resources Publications, 311–366.
Candogan Yossef, N., L. P. H. van Beek, J. C. J. Kwadijk, and M. F. P. Bierkens, 2012: Assessment of the potential forecasting skill of a global hydrological model in reproducing the occurrence of monthly flow extremes. Hydrol. Earth Syst. Sci., 16, 4233–4246, https://doi.org/10.5194/hess-16-4233-2012.
CAWCR, 2017: WWRP/WGNE Joint Working Group on forecast verification research. Accessed 27 September 2021, https://www.cawcr.gov.au/projects/verification/.
Chapman, T. G., 1991: Comment on “Evaluation of automated techniques for base flow and recession analyses” by R. J. Nathan and T. A. McMahon. Water Resour. Res., 27, 1783–1784, https://doi.org/10.1029/91WR01007.
Chouaib, W., P. V. Caldwell, and Y. Alila, 2018: Regional variation of flow duration curves in the eastern United States: Process-based analyses of the interaction between climate and landscape properties. J. Hydrol., 559, 327–346, https://doi.org/10.1016/j.jhydrol.2018.01.037.
Clark, M. P., and Coauthors, 2021: The abuse of popular performance metrics in hydrologic modeling. Water Resour. Res., 57, e2020WR029001, https://doi.org/10.1029/2020WR029001.
Cloke, H. L., and F. Pappenberger, 2009: Ensemble flood forecasting: A review. J. Hydrol., 375, 613–626, https://doi.org/10.1016/j.jhydrol.2009.06.005.
Coelho, C. A. S., B. Brown, L. Wilson, M. Mittermaier, and B. Casati, 2019: Forecast verification for S2S timescales. Sub-Seasonal to Seasonal Prediction, A. W. Robertson and F. Vitart, Eds., Elsevier, 337–361.
Criss, R. E., and W. E. Winston, 2008: Do Nash values have value? Discussion and alternate proposals. Hydrol. Processes, 22, 2723–2725, https://doi.org/10.1002/hyp.7072.
Cuo, L., T. C. Pagano, and Q. J. Wang, 2011: A review of quantitative precipitation forecasts and their use in short- to medium-range streamflow forecasting. J. Hydrometeor., 12, 713–728, https://doi.org/10.1175/2011JHM1347.1.
DeCicco, L. A., D. Lorenz, R. M. Hirsch, W. Watkins, and M. Johnson, 2021: dataRetrieval: R packages for discovering and retrieving water data available from U.S. federal hydrologic web services. USGS, https://code.usgs.gov/water/dataRetrieval.
Demargne, J., M. Mullusky, K. Werner, T. Adams, S. Lindsey, N. Schwein, W. Marosi, and E. Welles, 2009: Application of forecast verification science to operational river forecasting in the U.S. National Weather Service. Bull. Amer. Meteor. Soc., 90, 779–784, https://doi.org/10.1175/2008BAMS2619.1.
Déqué, M., 2012: Deterministic forecasts of continuous variables. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, 2nd ed. I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley and Sons, Ltd., 77–94.
Drohan, P. J., and Coauthors, 2019: A global perspective on phosphorus management decision support in agriculture: Lessons learned and future directions. J. Environ. Qual., 48, 1218–1233, https://doi.org/10.2134/jeq2019.03.0107.
Duncan, H. P., 2019: Baseflow separation – A practical approach. J. Hydrol., 575, 308–313, https://doi.org/10.1016/j.jhydrol.2019.05.040.
Easton, Z. M., and Coauthors, 2017: Short-term forecasting tools for agricultural nutrient management. J. Environ. Qual., 46, 1257–1269, https://doi.org/10.2134/jeq2016.09.0377.
Euser, T., H. C. Winsemius, M. Hrachowitz, F. Fenicia, S. Uhlenbrook, and H. H. G. Savenije, 2013: A framework to assess the realism of model structures using hydrological signatures. Hydrol. Earth Syst. Sci., 17, 1893–1912, https://doi.org/10.5194/hess-17-1893-2013.
Finnerty, B. D., M. B. Smith, D.-J. Seo, V. Koren, and G. E. Moglen, 1997: Space-time scale sensitivity of the Sacramento model to radar-gage precipitation inputs. J. Hydrol., 203, 21–38, https://doi.org/10.1016/S0022-1694(97)00083-8.
Fuka, D. R., M. T. Walter, J. A. Archibald, T. S. Steenhuis, and Z. M. Easton, 2018: EcoHydRology: A community modeling foundation for eco-hydrology. R package, version 0.4.12.1, https://cran.r-project.org/src/contrib/Archive/EcoHydRology/.
Gandin, L. S., and A. H. Murphy, 1992: Equitable skill scores for categorical forecasts. Mon. Wea. Rev., 120, 361–370, https://doi.org/10.1175/1520-0493(1992)120<0361:ESSFCF>2.0.CO;2.
Gburek, W. J., B. A. Needelman, and M. S. Srinivasan, 2006: Fragipan controls on runoff generation: Hydropedological implications at landscape and watershed scales. Geoderma, 131, 330–344, https://doi.org/10.1016/j.geoderma.2005.03.021.
Gerrity, J. P., Jr., 1992: A note on Gandin and Murphy’s equitable skill score. Mon. Wea. Rev., 120, 2709–2712, https://doi.org/10.1175/1520-0493(1992)120<2709:ANOGAM>2.0.CO;2.
Ghimire, G. R., and W. F. Krajewski, 2020: Exploring persistence in streamflow forecasting. J. Amer. Water Resour. Assoc., 56, 542–550, https://doi.org/10.1111/1752-1688.12821.
Ghimire, G. R., S. Sharma, J. Panthi, R. Talchabhadel, B. Parajuli, P. Dahal, and R. Baniya, 2020: Benchmarking real-time streamflow forecast skill in the Himalayan region. Forecasting, 2, 230–247, https://doi.org/10.3390/forecast2030013.
Ghimire, G. R., W. F. Krajewski, and F. Quintero, 2021: Scale-dependent value of QPF for real-time streamflow forecasting. J. Hydrometeor., 22, 1931–1947, https://doi.org/10.1175/JHM-D-20-0297.1.
Gilleland, E., 2015: Verification: Weather forecast verification utilities. R package, version 1.42, https://cran.r-project.org/web/packages/verification/index.html.
Goering, D. C., 2013: Decision support for Wisconsin’s manure spreaders: Development of a real-time runoff risk advisory forecast. M.S. thesis, School of Natural Resources and the Environment, The University of Arizona, 265 pp., https://repository.arizona.edu/handle/10150/305874.
Gourley, J. J., and Coauthors, 2017: The FLASH Project: Improving the tools for flash flood monitoring and prediction across the United States. Bull. Amer. Meteor. Soc., 98, 361–372, https://doi.org/10.1175/BAMS-D-15-00247.1.
Greene, D. R., and M. D. Hudlow, 1982: Hydrometeorological grid mapping procedures. Preprints, Int. Symp. on Hydrometeorology, Denver, CO, American Water Resources Association, 20 pp.
Gupta, H. V., S. Sorooshian, and P. O. Yapo, 1999: Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration. J. Hydrol. Eng., 4, 135–143, https://doi.org/10.1061/(ASCE)1084-0699(1999)4:2(135).
Gupta, H. V., H. Kling, K. K. Yilmaz, and G. F. Martinez, 2009: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol., 377, 80–91, https://doi.org/10.1016/j.jhydrol.2009.08.003.
Harrigan, S., C. Prudhomme, S. Parry, K. Smith, and M. Tanguy, 2018: Benchmarking ensemble streamflow prediction skill in the UK. Hydrol. Earth Syst. Sci., 22, 2023–2039, https://doi.org/10.5194/hess-22-2023-2018.
Hisdal, H., L. M. Tallaksen, B. Clausen, E. Peters, and A. Gustard, 2004: Hydrological drought characteristics. Hydrological Drought, Processes and Estimation Methods for Streamflow and Groundwater, L. M. Tallaksen and H. A. J. Van Lanen, Eds., Developments in Water Science, Vol. 48, Elsevier Science, 139–198.
Hogan, R. J., and I. B. Mason, 2012: Deterministic forecasts of binary events. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, 2nd ed. I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley and Sons, Ltd., 31–59.
Hogan, R. J., C. A. T. Ferro, I. T. Jolliffe, and D. B. Stephenson, 2010: Equitability revisited: Why the “equitable threat score” is not equitable. Wea. Forecasting, 25, 710–726, https://doi.org/10.1175/2009WAF2222350.1.
Khakbaz, B., B. Imam, S. Sorooshian, V. I. Koren, Z. Cui, M. B. Smith, and P. Restrepo, 2011: Modification of the National Weather Service Distributed Hydrologic Model for subsurface water exchanges between grids. Water Resour. Res., 47, W06524, https://doi.org/10.1029/2010WR009626.
Kleinman, P. J. A., and A. N. Sharpley, 2003: Effect of broadcast manure on runoff phosphorus concentrations over successive rainfall events. J. Environ. Qual., 32, 1072–1081, https://doi.org/10.2134/jeq2003.1072.
Knoben, W. J. M., J. E. Freer, and R. A. Woods, 2019: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrol. Earth Syst. Sci., 23, 4323–4331, https://doi.org/10.5194/hess-23-4323-2019.
Koren, V., M. Wang, D. Zhang, and Z. Zhang, 2000: Use of soil property data in the derivation of conceptual rainfall-runoff model parameters. 15th Conf. on Hydrology, Long Beach, CA, Amer. Meteor. Soc., 2.16, https://ams.confex.com/ams/annual2000/techprogram/paper_6074.htm.
Koren, V., S. Reed, M. Smith, Z. Zhang, and D.-J. Seo, 2004: Hydrology Laboratory Research Modeling System (HL-RMS) of the US National Weather Service. J. Hydrol., 291, 297–318, https://doi.org/10.1016/j.jhydrol.2003.12.039.
Koren, V., F. Moreda, S. Reed, M. Smith, and Z. Zhang, 2006: Evaluation of a grid-based distributed hydrological model over a large area. IAHS Publ., 303, 47–56.
Koren, V., M. Smith, and Z. Cui, 2014: Physically-based modifications to the Sacramento Soil Moisture Accounting model. Part A: Modeling the effects of frozen ground on the runoff generation process. J. Hydrol., 519, 3475–3491, https://doi.org/10.1016/j.jhydrol.2014.03.004.
Krajewski, W. F., G. R. Ghimire, and F. Quintero, 2020: Streamflow forecasting without models. J. Hydrometeor., 21, 1689–1704, https://doi.org/10.1175/JHM-D-19-0292.1.
Krause, P., D. P. Boyle, and F. Bäse, 2005: Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci., 5, 89–97, https://doi.org/10.5194/adgeo-5-89-2005.
Ley, R., H. Hellebrand, M. C. Casper, and F. Fenicia, 2015: Comparing classical performance measures with signature indices derived from flow duration curves to assess model structures as tools for catchment classification. Hydrol. Res., 47, 1–14, https://doi.org/10.2166/nh.2015.221.
Ley, R., H. Hellebrand, M. C. Casper, and F. Fenicia, 2016: Is catchment classification possible by means of multiple model structures? A case study based on 99 catchments in Germany. Hydrology, 3, 22, https://doi.org/10.3390/hydrology3020022.
Li, W., Q. Duan, C. Miao, A. Ye, W. Gong, and Z. Di, 2017: A review on statistical postprocessing methods for hydrometeorological ensemble forecasting. Wiley Interdiscip. Rev.: Water, 4, e1246, https://doi.org/10.1002/wat2.1246.
Livezey, R. E., 2012: Deterministic forecasts of multi-category events. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, 2nd ed. I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley and Sons, Ltd., 61–75.
Lu, H., R. B. Bryant, A. R. Buda, A. S. Collick, G. J. Folmar, and P. J. A. Kleinman, 2015: Long-term trends in climate and hydrology in an agricultural, headwater watershed of central Pennsylvania, USA. J. Hydrol. Reg. Stud., 4, 713–731, https://doi.org/10.1016/j.ejrh.2015.10.004.
Lyne, V., and M. Hollick, 1979: Stochastic time-variable rainfall-runoff modelling. Institute of Engineers Australia National Conf., Barton, ACT, Australia, Institute of Engineers Australia, 89–93.
Mau, D. P., and T. C. Winter, 1997: Estimating ground-water recharge from streamflow hydrographs for a small mountain watershed in a temperate humid climate, New Hampshire, USA. Ground Water, 35, 291–304, https://doi.org/10.1111/j.1745-6584.1997.tb00086.x.