1. Introduction
Streamflow simulations from hydrologic models contain errors propagated from uncertain forcings, model initial conditions, parameters and structures, and human control of storage and movement of water (Ajami et al. 2007; Doherty and Welter 2010; Gupta et al. 2012; Krzysztofowicz 1999; Montanari and Brath 2004; NRC 2006; Renard et al. 2010; Schaake et al. 2007b; Seo et al. 2006; Wood and Schaake 2008). For risk-based management of water resources and water-related hazards, it is necessary to quantify the uncertainties arising from these sources (Borgomeo et al. 2014; Butts et al. 2004; Georgakakos et al. 2004; Hall and Borgomeo 2013; Hall et al. 2020). Ensemble forecasting has emerged in recent years as the methodology of choice for modeling and communicating forecast uncertainty (Cloke and Pappenberger 2009; Demargne et al. 2014; Demeritt et al. 2010; NRC 2006; Schaake et al. 2007b). In the United States, the National Weather Service (NWS) has recently implemented the Hydrologic Ensemble Forecast Service (HEFS; Demargne et al. 2014) at all River Forecast Centers (RFC) (Lee et al. 2018) following experimental operation at selected RFCs (Hartman et al. 2015; Kim et al. 2018; Wells 2017). To reduce biases in precipitation and temperature forecasts, the HEFS uses the Meteorological Ensemble Forecast Processor (MEFP; Schaake et al. 2007a; Wu et al. 2011; NWS 2017a). To reduce and quantify hydrologic uncertainty in streamflow prediction, the HEFS employs the ensemble postprocessor, EnsPost (Seo et al. 2006; NWS 2017b). In the HEFS, each MEFP-processed forcing ensemble member is input to the chain of hydrologic models in the Community Hydrologic Prediction System (CHPS; Gijsbers et al. 2009; Roe et al. 2010). The resulting ensemble trace of “raw” streamflow forecast may be input to the ensemble postprocessor to produce an ensemble member of postprocessed streamflow forecast. 
The descriptor “post” reflects the fact that postprocessing of the streamflow ensemble occurs after the generation of the raw ensemble streamflow forecast.
EnsPost was developed originally for short-range forecasting of natural flows in headwater basins and models predictive hydrologic uncertainty using a combination of probability matching (PM; Hashino et al. 2002; Madadgar et al. 2014) and a first-order autoregressive model with an exogenous variable, or ARX(1,1) (Bennett et al. 2014; Damon and Guillas 2002), in bivariate normal space (Krzysztofowicz 1999; Seo et al. 2006). EnsPost applies PM and ARX(1,1) at a daily scale only. In reality, the characteristic time scales of error in model-simulated flow may span a range of scales, depending on the residence time of the hydrologic processes involved and the error characteristics of the forcings and the hydrologic models used (Blöschl and Sivapalan 1995). In addition, if the flow is strongly regulated, the errors may be reducible only over a certain range of temporal scales of aggregation due to the altered residence time and storage-outflow relationships. The postprocessing approach used in this work reflects the multiscale nature of the hydrologic and atmospheric processes (Blöschl and Sivapalan 1995; Carlu et al. 2019; Kumar 2011), and of hydrologic modeling and prediction, including parameter estimation (Mizukami et al. 2017), parameter regionalization (Samaniego et al. 2010), model evaluation (Rakovec et al. 2016), and data assimilation (Li et al. 2015).
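To make the ARX(1,1) structure concrete, the following is a minimal sketch in normal space with synthetic data; the normal quantile transform step of EnsPost is omitted, and all series and coefficients are illustrative assumptions, not EnsPost's operational values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic flows assumed already in standard normal space; in EnsPost this
# would follow the normal quantile transform of observed and simulated flow.
n = 2000
z_sim = rng.standard_normal(n)                 # model-simulated flow, normal space
z_obs = np.zeros(n)
for t in range(1, n):                          # "true" ARX(1,1) process
    z_obs[t] = 0.6 * z_obs[t - 1] + 0.3 * z_sim[t] + 0.2 * rng.standard_normal()

# Fit z_obs[t] = a * z_obs[t-1] + b * z_sim[t] + e by least squares
X = np.column_stack([z_obs[:-1], z_sim[1:]])
y = z_obs[1:]
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_e = np.std(y - X @ np.array([a, b]))     # residual standard deviation

# One-step-ahead ensemble conditioned on the last observation and simulation;
# in practice each member would be back-transformed to flow space.
ensemble = a * z_obs[-1] + b * z_sim[-1] + sigma_e * rng.standard_normal(500)
print(a, b, sigma_e)
```

The fitted coefficients recover the generating values, and the residual spread supplies the predictive uncertainty that ensemble members sample from.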
The positive impact of postprocessing raw model simulations of streamflow in ensemble streamflow forecasting has been widely reported (Kim et al. 2018, 2016; Madadgar et al. 2014). Recently, it has also been shown that EnsPost significantly increases skill in ensemble forecasts of outflow from a water supply reservoir in North Texas during significant releases, in addition to that in ensemble inflow forecasts (Limon 2019). With increasing acceptance and adoption of ensemble streamflow forecasting by the operational community, developing more potent postprocessing methods has been a very active area of research. To that end, a number of comparison studies have recently been carried out. For postprocessing of meteorological forecast, Wilks (2006) compared direct model output (Wilks 2006), rank histogram recalibration (Hamill and Colucci 1998), single-integration Model Output Statistics (MOS; Erickson 1996), ensemble dressing (Roulston and Smith 2003), logistic regression (Hamill et al. 2004), nonhomogeneous Gaussian regression (Gneiting et al. 2005), forecast assimilation (Stephenson et al. 2005), and Bayesian model averaging (Raftery et al. 2005). He concluded that logistic regression (Duan et al. 2007; Hamill et al. 2004), ensemble MOS (Gneiting et al. 2005), and ensemble dressing outperform the others. For postprocessing of hydrologic forecast, Boucher et al. (2015) compared the regression and dressing methods using synthetic data. They concluded that the techniques have similar overall performance, and that the regression and dressing methods perform better in terms of resolution and reliability, respectively. Mendoza et al. (2016) used medium-range ensemble streamflow forecasts from the System for Hydrometeorological Applications, Research and Prediction, and compared quantile mapping (Mendoza et al. 2016; Hashino et al. 2006; Piani et al. 
2010; Regonda and Seo 2008; Wood and Schaake 2008; Zhu and Luo 2015), logistic regression, quantile regression (Bjørnar Bremnes 2004; Bogner et al. 2016; Coccia and Todini 2011; Koenker and Bassett 1978) and the general linear model postprocessor (GLMPP; Zhao et al. 2011). They found that no single method performed best in all situations, and that the postprocessors’ performance depended on factors such as soil type, land use, and hydroclimatic conditions of the basin. Ye et al. (2015) developed a canonical events-based GLMPP for postprocessing of streamflow during the dry season. Li et al. (2016) developed and evaluated a new error-modeling method for short-term streamflow forecasting in which a sequence of simple error models, instead of a single complex model, is run through different stages. Recently, Li et al. (2017) and Vannitsem et al. (2018) carried out comprehensive reviews of the application of different postprocessing techniques.
In this paper, we introduce a new multiscale postprocessor for ensemble streamflow prediction for short to long ranges. By short and long ranges, we mean up to several days and at least 1 month ahead, respectively. The proposed technique, referred to herein as MS-EnsPost, is designed to reduce magnitude-dependent biases in raw model-simulated flow, and utilize all available skill that may exist over a range of temporal scales of aggregation in simulated and observed flows. We then comparatively evaluate MS-EnsPost with EnsPost for 139 basins in the services areas of eight RFCs in the continental United States (CONUS). As part of the evaluation, we also address the following research questions:
How does the prediction skill, as measured by MS-EnsPost in reference to climatology, compare among the different RFCs, and among different basins within an RFC?
How does the above skill relate to the hydroclimatology of the basin?
How does MS-EnsPost perform relative to EnsPost in reducing systematic errors and initial condition uncertainties?
This paper is organized as follows. Section 2 describes the study basins and data used. Section 3 describes the methods used. Section 4 presents the results. Section 5 provides the conclusions and future research recommendations.
2. Study basins and data used
The data used for this study are mean daily observed and simulated streamflow. The historical observed mean daily streamflow, referred to as QME in the NWS, is obtained from the U.S. Geological Survey. The focus of this work is on reducing and quantifying hydrologic uncertainty. As such, the model output of interest is the simulated streamflow, which reflects hydrologic uncertainty only, rather than the streamflow forecast, which reflects both meteorological and hydrologic uncertainties (Krzysztofowicz 1999; Seo et al. 2006). The simulated mean daily flow, or SQME, is derived from the simulated instantaneous flow, or SQIN, generated at a 6-h interval using the operational hydrologic models, and the observed forcings of mean areal precipitation, temperature, and potential evapotranspiration. For the remainder of this paper, by daily flow, we mean mean daily flow. The hydrologic models used are the Sacramento (SAC; Burnash et al. 1973) for soil moisture accounting, unit hydrograph (Chow et al. 1988) for surface runoff routing, and SNOW-17 (Anderson 1973) for snow ablation. The MARFC uses the continuous antecedent precipitation index model (API-CONT; Fedora and Beschta 1989; Sittner et al. 1969) instead of SAC. The SQIN time series were produced by the respective RFCs using the Community Hydrologic Prediction System (CHPS; Gijsbers et al. 2009; Roe et al. 2010) based on the RFCs’ historical forcings and calibrated model parameters. The CHPS is the main operational forecasting system at the RFCs, and uses the single (RES-SNGL) and joint (RES-J) reservoir regulation models, and the SSARR reservoir regulation (SSARRESV) model for simulation of reservoir operations (Adams 2016; NWS 2008a,b). Reservoir models were included in the hydrologic modeling of 20 of the 139 locations used in this work. There are about 14 additional locations impacted by reservoir regulations that are not modeled. 
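As a minimal illustration of the SQIN-to-SQME aggregation described above, the sketch below block-averages the four 6-hourly values in each day; the flow values are placeholders, and the actual NWS aggregation convention may differ in detail (e.g., in how instantaneous values are weighted).

```python
import numpy as np

# Three days of 6-hourly instantaneous flow, SQIN (placeholder values, m3/s)
sqin = np.array([10.0, 12.0, 14.0, 12.0,   # day 1
                 11.0, 13.0, 15.0, 13.0,   # day 2
                  9.0, 10.0, 12.0, 11.0])  # day 3

# Simple block average of the four 6-h values in each day -> mean daily flow, SQME
sqme = sqin.reshape(-1, 4).mean(axis=1)
print(sqme)  # day means: 12.0, 13.0, 10.5
```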
Limon (2019) has shown that, for a water supply reservoir in North Texas, the magnitude of reservoir modeling uncertainty may be comparable to that of all other hydrologic uncertainties combined, and may even approach that of the meteorological uncertainty. As such, flow regulations present a large additional challenge to streamflow postprocessing. Experience thus far indicates that at least 20 years’ worth of data is necessary for estimation of the EnsPost parameters (NWS 2017b). The period of record used in this work common to both QME and SQME time series ranged from 12 to 66 years, and exceeded 30 years for over 90% of the basins.
3. Methods used
MS-EnsPost consists of three elements: bias correction, multiscale regression, and ensemble generation. Figure 2 provides a schematic of the data flow and the associated processes. In this section, we describe the above elements and how MS-EnsPost is evaluated.
a. Bias correction
Among the total of K different temporal scales of aggregation, the best-performing scale is identified via leave-one-year-out (or similar) cross validation using a period of record of N years as described below. First, the magnitude-dependent biases are estimated at the K different scales of aggregation using an (N − 1)-yr period of observed flow and matching model simulation. The resulting biases are applied to the simulated daily flow valid on each Julian day of the withheld year. The procedure then identifies the aggregation scale that produces the smallest RMSE over the year in the bias-corrected simulated daily flow by comparing with the verifying observed flow. Once completed for all N years, the leave-one-year-out cross validation produces a total of N different sets of magnitude-dependent biases for simulated daily flow. For a given
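The scale-selection procedure can be sketched as follows under simplifying assumptions: a single multiplicative bias per aggregation scale stands in for the magnitude-dependent biases, moving averages stand in for the temporal aggregation, and the data and function names are hypothetical.

```python
import numpy as np

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def select_scale_loyo(obs, sim, years, scales=(1, 5, 30)):
    """For each held-out year, estimate a bias factor at each aggregation
    scale from the remaining years, apply it to the held-out daily
    simulation, and keep the scale with the smallest RMSE.
    Simplification: one multiplicative bias per scale, whereas the actual
    procedure estimates magnitude-dependent (flow-bin) biases."""
    best = {}
    for yr in np.unique(years):
        train, test = years != yr, years == yr
        errs = {}
        for k in scales:
            kern = np.ones(k) / k                        # k-day moving average
            obs_k = np.convolve(obs[train], kern, mode="valid")
            sim_k = np.convolve(sim[train], kern, mode="valid")
            bias = obs_k.mean() / sim_k.mean()           # bias at scale k
            errs[k] = rmse(bias * sim[test], obs[test])  # applied at daily scale
        best[yr] = min(errs, key=errs.get)
    return best

# Synthetic record: simulation is a damped version of "truth", 3 years x 100 days
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 5.0, 300)
sim = 0.8 * obs + rng.normal(0.0, 0.5, 300)
years = np.repeat([0, 1, 2], 100)
best = select_scale_loyo(obs, sim, years)
print(best)
```

With magnitude-dependent (binned) biases, the estimates at different aggregation scales genuinely differ, and the cross validation identifies the scale that corrects the daily simulation best.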
b. Multiscale regression
Postprocessing generally seeks predictions at the highest possible temporal resolution. High dimensional stochastic modeling necessary for such predictions, however, is a large challenge due to the complexity involved and large data requirements. In the multiscale regression approach used in this work, we solve instead a large number of very low-dimensional statistical modeling problems. Figure 4 illustrates the basic idea behind the approach in the context of predicting LM day-ahead observed daily flow using the model-simulated daily flow valid over the LM day-long prediction horizon, and the observed daily flow LM − 1 days into the past where M refers to the index for the largest aggregation scale. In this approach, rather than predicting
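A minimal sketch of the idea, under assumed data and with hypothetical names: for an aggregation scale of L days, a two-predictor regression relates the L-day-mean observed flow over the prediction horizon to the simulated flow aggregated over the same horizon and the observed flow aggregated over the preceding L days.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
obs = np.cumsum(rng.normal(0.0, 1.0, n)) + 50.0  # persistent "observed" daily flow
sim = obs + rng.normal(0.0, 2.0, n)              # model simulation with error

L = 3  # aggregation scale (days); the full method repeats this for each scale

def predictors(t):
    """Two low-dimensional predictors at day t: simulated flow aggregated
    over the L-day horizon, and observed flow over the preceding L days."""
    return np.array([sim[t:t + L].mean(), obs[t - L:t].mean()])

rows = range(L, n - L)
X = np.array([predictors(t) for t in rows])
y = np.array([obs[t:t + L].mean() for t in rows])  # target: L-day mean observed flow

# Ordinary least squares with an intercept -- one very low-dimensional problem
X1 = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
pred = X1 @ coef
print(np.corrcoef(pred, y)[0, 1])
```

Repeating this per aggregation scale yields a set of small, well-posed regressions instead of one high-dimensional stochastic model.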
c. Error modeling and ensemble generation
The error modeling as described above requires estimation of λ,
To capture the distributional characteristics of multidaily flow, it is necessary to model the temporal dependence of
d. Evaluation
4. Results
This section presents the comparative evaluation results for single-valued and ensemble predictions, and assesses the predictability of streamflow as measured from the ensemble prediction results for different hydroclimatological regions.
a. Single-valued streamflow prediction
Figure 5 shows the RMSE of the raw, bias-corrected, MS-EnsPost ensemble mean, and EnsPost ensemble mean streamflow predictions for lead times of 1–7 days and 1 month for the basins in the service areas of the CBRFC, MARFC, and WGRFC. For similar plots for all other RFCs, the reader is referred to Alizadeh (2019). The above three RFCs are chosen to represent the regions of the largest, intermediate, and most limited predictability among the groups of basins considered in this work. In the figure, we connect the RMSE values for each basin to help assess the relative performance among the four different predictions for each basin. A reduction in RMSE by the bias-corrected prediction over the raw is an indication that significant magnitude-dependent biases exist in the raw model-simulated flow due to parametric or structural errors in the hydrologic models, biases in the forcings, or flow regulations. A reduction in RMSE by the MS-EnsPost ensemble mean prediction over the bias-corrected is due to multiscale regression, and indicates that significant uncertainties exist in the initial conditions of the hydrologic models, or significant hydrologic memory exists in the surface and soil water storages of the basin. Due to the temporal aggregation, the monthly results (rightmost columns in Fig. 5) amplify the relative performance of the bias correction components of MS-EnsPost and EnsPost. The monthly results are hence more reflective of the bias correction operation, which acts over the entire forecast horizon, than the multiscale regression operation, which acts over the range of hydrologic memory only. The results for all 139 basins indicate that MS-EnsPost reduces the RMSE of the raw model predictions of daily flow by 5%–74% and that of the EnsPost predictions by 5%–68%, and that MS-EnsPost is superior to EnsPost for 1-month-ahead streamflow prediction for all basins examined in this work. Below we summarize the main RMSE results for each of the three RFCs.
The results for the other RFCs and the discussion may be found in Alizadeh (2019).
For CBRFC (Fig. 5a), the MS-EnsPost ensemble mean prediction improves over the raw and EnsPost ensemble mean predictions for all basins. The yellow circles in the figure indicate that the basin has flow regulations that are modeled with CHPS. In general, both bias correction and multiscale regression contribute to the improvement by MS-EnsPost. For a number of basins, the reduction in RMSE due to multiscale regression persists to day 4 and beyond, a reflection of the longer hydrologic memory present in the Upper Colorado River basin owing to the snowmelt. For MARFC (Fig. 5b), bias correction generally contributes more to the RMSE reduction by MS-EnsPost than multiscale regression. In Fig. 5, the empty circles indicate that the basin has unmodeled regulated flow. The largest improvement by MS-EnsPost over EnsPost is in the day-1 prediction for RTDP1 (third from the top), which is downstream of Raystown Dam on the Raystown Branch of the Juniata River. Overall, the impact of multiscale regression is rather modest and wears off within the first two days of lead time, an indication that the hydrologic memory in the Juniata River basin in Pennsylvania is relatively short. For the WGRFC basins (Fig. 5c), the improvement by MS-EnsPost over EnsPost is particularly large. These basins are located in the semiarid western part of the Upper Trinity River basin (Kim et al. 2018). As such, they have short memory in surface and soil water storages, and their streams are ephemeral despite relatively large basin size (441–1764 km2). Because EnsPost does not model intermittency of streamflow, its results are particularly poor for the WGRFC basins. MS-EnsPost, on the other hand, is able to address intermittency to a significant extent by aggregating flow, which reduces or removes zero flows at sufficiently large temporal scales. Overall, the reduction in RMSE due to multiscale regression is rather short-lived. The monthly results (rightmost panel in Fig.
5c) for JAKT2 and DCJT2 (second and fourth from the top, respectively) are unexpected in that multiscale regression in MS-EnsPost slightly increased RMSE over magnitude-dependent bias correction alone. The above observation indicates that statistical assimilation of observed streamflow up to a month in aggregation scale does not add skill due to the weak hydrologic memory in streamflow in these basins.
b. Ensemble streamflow prediction
In this section, we comparatively evaluate the MS-EnsPost ensemble streamflow predictions against the EnsPost predictions. To facilitate comparison for a large number of basins, we use “worm” plots in which the mean CRPS of the MS-EnsPost predictions (y axis) versus the EnsPost predictions (x axis) are dot-plotted and connected for lead times of 1–7 days to form a worm for each basin. Figure 6a shows the worm plots in log–log scale for all study basins for each RFC. The lower and upper ends of each worm are associated with day-1 and day-7 predictions for that basin, respectively. If MS-EnsPost improves over EnsPost for 7-day-ahead prediction, the worm stretches downward from the diagonal; the longer the downward stretch, the larger the improvement by MS-EnsPost over EnsPost. If MS-EnsPost does not improve over EnsPost, the worm lies along the diagonal. Figure 6b shows the mean CRPS scatterplots of 1-month-ahead MS-EnsPost predictions of monthly flow versus the EnsPost predictions.
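For reference, the mean CRPS underlying the worm plots can be computed for each ensemble-observation pair from the standard energy form of the CRPS; the sketch below uses synthetic ensembles.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble against a scalar observation, energy form:
    E|X - y| - 0.5 * E|X - X'| over ensemble members X, X'."""
    m = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(m - obs))
    term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))
    return term1 - term2

rng = np.random.default_rng(3)
obs = 10.0
sharp = rng.normal(10.0, 1.0, 200)   # well-centered, sharp ensemble
broad = rng.normal(12.0, 4.0, 200)   # biased, diffuse ensemble
print(crps_ensemble(sharp, obs), crps_ensemble(broad, obs))
```

The mean CRPS plotted for each basin and lead time is simply this quantity averaged over all verification days; a sharper, better-centered ensemble scores lower.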
Figure 6a shows that, for most basins, MS-EnsPost significantly improves over EnsPost. For most MARFC basins, however, little improvement is seen. For some MBRFC basins, MS-EnsPost performed worse than EnsPost for day-1 and day-2 predictions. For RTDP1 of MARFC (the third worm from the top), MS-EnsPost clearly improves over EnsPost. Recall in the single-valued prediction results that MS-EnsPost generally showed significant improvement over EnsPost for regulated flows. The MBRFC results were unexpected in that MS-EnsPost was clearly superior to EnsPost in ensemble mean prediction. A closer examination indicates that, for the four MBRFC basins, the EnsPost ensemble predictions are superior to the MS-EnsPost for the first one or two days of lead time. The above result was traced to reduced reliability (Alizadeh 2019) due to the use of the error statistics of
Overall, the CBRFC, NCRFC, and NWRFC basins show particularly large improvement by MS-EnsPost over EnsPost. For many CBRFC and NWRFC basins, streamflow is fed by snowmelt, which increases hydrologic memory. The CBRFC and NWRFC results indicate that multiscale regression in MS-EnsPost is able to effectively utilize the predictability present in the model-simulated and observed flows over a range of temporal scales of aggregation. For the NCRFC basins, on the other hand, the significant improvement by MS-EnsPost is found to be due more to bias correction than multiscale regression (Alizadeh 2019). For the NERFC basins, MS-EnsPost shows significantly larger improvement over EnsPost for larger basins. For the CNRFC basins, MS-EnsPost significantly improves over EnsPost for some basins, whereas MS-EnsPost and EnsPost perform similarly for the others. The improvement is found to be generally smaller for the coastal basins (Alizadeh 2019). For the WGRFC basins, MS-EnsPost significantly improves over EnsPost. This indicates that bias correction and multiscale regression are effective in addressing the flow-magnitude-dependent biases in raw model-predicted flow and the intermittency of streamflow in the semiarid region.
Decomposition of the mean CRPS [Eq. (22); see Alizadeh (2019) for examples] indicates that, for most basins, the reduction in mean CRPS by MS-EnsPost over EnsPost is due mostly to improved resolution, rather than improved reliability. This is not very surprising in that EnsPost uses empirical probability matching based on the NQT whereas MS-EnsPost relies on approximate distribution modeling via the Box–Cox transformation. If the historical record is long enough to model the tails of the distributions with accuracy, one may expect the ensemble traces sampled from the empirically modeled distributions to be more reliable. To scrutinize the reliability of MS-EnsPost ensemble predictions, we also examined the reliability diagrams (Brown and Seo 2010; Jolliffe and Stephenson 2012; Wilks 2006) and Brier scores (Brier 1950) for a wide range of thresholds [see Alizadeh (2019) for examples]. They indicate that the MS-EnsPost ensembles are generally as reliable as the EnsPost ensembles for the 90th percentile or larger thresholds, but less so for the 50th percentile or smaller thresholds. Not surprisingly, the Box–Cox transform is not as effective as the normal quantile transform, particularly for low flows. For flood and water supply forecasting, performance for larger flows is much more important than that for smaller flows. As such, some deterioration in reliability at lower thresholds may not be a large concern in most applications.
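The two transforms contrasted above can be illustrated on a skewed, flow-like sample: the NQT matches empirical quantiles to the standard normal essentially exactly, whereas the Box–Cox transformation only approximates normality through a fitted power parameter. The sample below is synthetic.

```python
import numpy as np
from scipy import stats

# Skewed, flow-like synthetic sample (strictly positive, as Box-Cox requires)
flows = np.random.default_rng(7).gamma(1.5, 20.0, 2000)

# Box-Cox: parametric power transform toward normality (lambda fitted by MLE)
z_bc, lam = stats.boxcox(flows)

# NQT: empirical probability matching to the standard normal,
# via plotting-position probabilities of the ranks
p = (stats.rankdata(flows) - 0.5) / flows.size
z_nqt = stats.norm.ppf(p)

print(stats.skew(flows), stats.skew(z_bc), stats.skew(z_nqt))
```

The NQT output is symmetric by construction (near-zero skew), while the Box–Cox output is only approximately normal, which is consistent with the reliability differences noted above, especially in the lower tail.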
c. Streamflow predictability
The skill in postprocessed ensemble predictions is bounded by the predictability of streamflow explainable by the forcings (Baldwin et al. 2003; Bengtsson and Hodges 2006; Gebregiorgis and Hossain 2011; Li and Ding 2011; Simmons et al. 1995), hydrologic and reservoir models (Hou et al. 2009; Mahanama et al. 2012; Maurer and Lettenmaier 2004; Schlosser and Milly 2002), and statistical assimilation of streamflow via multiscale regression (Bogner et al. 2016; Sharma et al. 2018) used. In this section, we assess and characterize the predictability of streamflow in different hydroclimatological regions based on the ensemble prediction results presented above, and attribute the gains by MS-EnsPost over EnsPost by assessing the predictability through a skill score (Hou et al. 2009; Westra and Sharma 2010). Figure 7 shows the CRPSS of the MS-EnsPost ensemble predictions for all seasons for lead times of 1–32 days. The reference forecast is the sample climatology of historical observed flow. To assess seasonal variations, we also examined the wet-versus-dry seasonal results. They showed that, except for the CBRFC basins, the CRPSS does not differ much between the two seasons. As such, we only present the combined results, which are necessarily more reflective of the wet season. For the CBRFC basins, the CRPSS is significantly lower for the dry season due to the fact that highly persistent low-flow conditions may be predicted very well with climatology. In Fig. 7, the vertical spread in the CRPSS curves represents the variations in predictability of streamflow among the different basins within each RFC’s service area. It is readily seen that the CBRFC basins, all of which are in the Upper Colorado River basin, exhibit the smallest variations. The largest variations are observed with the NWRFC basins, which encompass the coastal, mountain, and intermountain regions of the Pacific Northwest. For each RFC, there are a small number of basins with conspicuously lower CRPSS.
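The CRPSS with sample climatology as the reference can be sketched as follows with synthetic flows; a skilful forecast ensemble centered near the truth yields a CRPSS well above zero, while a forecast no better than climatology yields a CRPSS near zero.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble vs. a scalar observation (energy form)."""
    m = np.asarray(members, dtype=float)
    return np.mean(np.abs(m - obs)) - 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))

rng = np.random.default_rng(5)
climatology = rng.gamma(2.0, 10.0, 1000)       # historical observed flows as reference

# Verify a forecast ensemble on several days against the climatological reference
n_days, crps_f, crps_c = 50, 0.0, 0.0
for _ in range(n_days):
    truth = rng.gamma(2.0, 10.0)
    forecast = truth + rng.normal(0.0, 3.0, 100)  # skilful ensemble around the truth
    crps_f += crps_ensemble(forecast, truth)
    crps_c += crps_ensemble(climatology, truth)

crpss = 1.0 - crps_f / crps_c                   # skill relative to climatology
print(crpss)
```

In the paper's setting, the climatological ensemble would be conditioned on the day of year rather than pooled, but the skill-score construction is the same.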
They are generally associated with regulated flows which increase hydrologic uncertainty. Because these basins do not represent natural flows, they are treated separately in the analysis below.
Figure 8 shows the resulting pairs of Lhm and CRPSS(∞) for all basins in each RFC as obtained from the MS-EnsPost and EnsPost predictions. For each basin, an arrow connects the EnsPost result to the matching MS-EnsPost result. If MS-EnsPost increases the limiting CRPSS, the arrow points upward. If MS-EnsPost increases the hydrologic memory scale, the arrow points to the right. The longer the arrow, the larger the improvement or deterioration. Accordingly, long arrows pointing in the upper-right direction indicate MS-EnsPost clearly improving over EnsPost. It is seen that, for a number of basins, MS-EnsPost improves the limiting CRPSS but reduces the hydrologic memory scale, resulting in arrows pointing in the upper-left direction. For these basins, the limiting CRPSS for EnsPost is significantly smaller than that for MS-EnsPost whereas the CRPSS for day-1 prediction is as large as or larger than that for MS-EnsPost. In such a case, the EnsPost prediction is likely to yield a larger hydrologic memory than the MS-EnsPost. Such “inflated” hydrologic memory for EnsPost is an artifact of significantly smaller limiting CRPSS and hence, by itself, is not a very useful indicator of predictive skill. Accordingly, one may consider MS-EnsPost inferior to EnsPost only if the arrow points in the lower-left direction.
Figure 8 shows that MS-EnsPost outperforms or performs comparably to EnsPost for all basins, increases the limiting CRPSS for almost all basins, and provides significant additional skill via multiscale regression, particularly for the CBRFC, CNRFC, NCRFC, and NWRFC basins. From Fig. 8, a number of postulations may also be made. The significant increase in Lhm by MS-EnsPost for the CBRFC, CNRFC, NCRFC, and NWRFC basins suggests that there exists significant multiscale hydrologic memory to be exploited for operational hydrologic forecasting via data assimilation. The significant increase in CRPSS(∞) by MS-EnsPost for the MBRFC and NERFC basins suggests that there may exist significant room for improving calibration, hydrologic modeling, and input forcings to reduce hydrologic uncertainties. The WGRFC basin results, on the other hand, suggest limited room for improving predictive skill within the existing modeling and forecasting process, and point to improving model physics as well as soil moisture sensing and its assimilation.
The relative importance of CRPSS(∞) versus Lhm in assessing predictability necessarily varies with the application at hand. For long-range predictions, CRPSS(∞) would be more important whereas, for short-range predictions, Lhm may be just as important. Hence, it is not readily possible to translate the two summary attributes uniquely into a single measure. One may consider, however, the relative positions of the [Lhm, CRPSS(∞)] pairs for MS-EnsPost (i.e., the tips of the arrows) within the x–y plot in Fig. 8, and approximately rank the groups of basins in different RFCs in terms of the collective strength of predictability as measured through MS-EnsPost. The figure indicates that the CBRFC, CNRFC, and NWRFC basins are the most predictable, followed by the NERFC, MARFC, and NCRFC basins, and that the MBRFC and WGRFC basins are the least predictable. The above order reflects what may be garnered visually from Fig. 7, and generally follows the decreasing order of the fraction of precipitation as snowfall, fs, and mean annual precipitation (see Fig. 1). To illustrate, Fig. 9 shows CRPSS(∞) versus mean annual precipitation for basins with fs < 0.3 (Fig. 9a) and Lhm versus fs for all basins (Fig. 9b). Though the scatters are large, CRPSS(∞) for non-snow-dominated basins relates well with mean annual precipitation except for the few very wet coastal basins, and Lhm relates positively with fs for all basins.
5. Conclusions and future research recommendations
We describe a novel multiscale postprocessor for ensemble streamflow prediction, MS-EnsPost, and compare it with the existing postprocessor, EnsPost, in the NWS’s HEFS for 139 basins in the service areas of eight RFCs in the CONUS. MS-EnsPost uses data-driven correction of magnitude-dependent biases in model-simulated flow, multiscale regression to utilize observed and simulated flows over a range of temporal scales of aggregation, and ensemble generation based on parsimonious error modeling. Comparative evaluation of raw and bias-corrected single-valued predictions and ensemble mean predictions from MS-EnsPost and EnsPost shows that MS-EnsPost reduces the RMSE of day-1 to day-7 predictions of daily flow from EnsPost on average by 28%, and that, for most basins, the improvement is due to both bias correction and multiscale regression. Comparative evaluation of ensemble predictions from MS-EnsPost and EnsPost shows that MS-EnsPost reduces the mean CRPS of day-1 to day-7 predictions of daily flow from EnsPost on average by 18%, and that the improvement is due mostly to improved resolution rather than improved reliability. Examination of the CRPSS of ensemble predictions indicates that, for most basins, the improvement by MS-EnsPost over EnsPost is due to both magnitude-dependent bias correction and multiscale regression, which utilizes hydrologic memory more effectively. Comparison of the CRPSS with hydroclimatic indices indicates that the skill in ensemble streamflow predictions from MS-EnsPost is modulated by the fraction of precipitation as snowfall and, for non-snowfall-driven basins, mean annual precipitation.
In addition to improving performance, the development of MS-EnsPost is motivated by the need to reduce data requirements. Streamflow responses have changed or are changing significantly in many parts of the world due to urbanization and climate change (Milly et al. 2008). Changing conditions confront statistical postprocessors with a difficult trade-off: accounting for nonstationarities by dividing the period of record or modeling trends, which would significantly increase sampling uncertainties, versus keeping sampling uncertainties smaller at the expense of introducing biases due to nonstationarities. Owing to its parsimony, one may expect MS-EnsPost to require significantly less data than EnsPost. We are currently assessing the data requirement of MS-EnsPost for possible application under nonstationarity, and the results will be reported in the near future.
Acknowledgments
This material is based upon work supported by the NOAA Climate Program Office under Grant NA15OAR4310109, the NWS COMET Program under UCAR Subaward SUBAWD000020, and the NSF under Grant CyberSEES-1442735. This support is gratefully acknowledged. We thank John Lhotak of CBRFC; Brett Whitin and Ark Henkel of CNRFC; Seann Reed of MARFC; Lisa Holts of MBRFC; Andrea Holz of NCRFC; Erick Boehmler of NERFC; Brad Gillies of NWRFC; and Andrew Philpott, Frank Bell, Paul McKee, Kris Lander, and Mark Null of WGRFC for providing data and help during the course of this work.
REFERENCES
Adams, T. E., III, 2016: Flood forecasting in the United States NOAA/National Weather Service. Flood Forecasting: A Global Perspective, Academic Press, 249–310, https://doi.org/10.1016/B978-0-12-801884-2.00010-4.
Ajami, N. K., Q. Duan, and S. Sorooshian, 2007: An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction. Water Resour. Res., 43, W01403, https://doi.org/10.1029/2005WR004745.
Alizadeh, B., 2019: Improving post processing of ensemble streamflow forecast for short-to-long ranges: A multiscale approach. Ph.D. dissertation, Dept. of Civil Engineering, The University of Texas at Arlington, 125 pp., https://rc.library.uta.edu/uta-ir/bitstream/handle/10106/28663/ALIZADEH-DISSERTATION-2019.pdf?sequence=1.
Anderson, E. A., 1973: National Weather Service River Forecast System—Snow accumulation and ablation model. NOAA Tech. Memo. NWS HYDRO-17, 87 pp., https://www.wcc.nrcs.usda.gov/ftpref/wntsc/H&H/snow/AndersonHYDRO17.pdf.
Baldwin, M. P., D. B. Stephenson, D. W. Thompson, T. J. Dunkerton, A. J. Charlton, and A. O’Neill, 2003: Stratospheric memory and skill of extended-range weather forecasts. Science, 301, 636–640, https://doi.org/10.1126/science.1087143.
Bengtsson, L., and K. I. Hodges, 2006: A note on atmospheric predictability. Tellus, 58A, 154–157, https://doi.org/10.1111/j.1600-0870.2006.00156.x.
Bennett, C., R. Stewart, and J. Lu, 2014: Autoregressive with exogenous variables and neural network short-term load forecast models for residential low voltage distribution networks. Energies, 7, 2938–2960, https://doi.org/10.3390/en7052938.
Berghuijs, W. R., and R. A. Woods, 2016: A simple framework to quantitatively describe monthly precipitation and temperature climatology. Int. J. Climatol., 36, 3161–3174, https://doi.org/10.1002/joc.4544.
Berghuijs, W. R., M. Sivapalan, R. A. Woods, and H. H. Savenije, 2014: Patterns of similarity of seasonal water balances: A window into streamflow variability over a range of time scales. Water Resour. Res., 50, 5638–5661, https://doi.org/10.1002/2014WR015692.
Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338–347, https://doi.org/10.1175/1520-0493(2004)132<0338:PFOPIT>2.0.CO;2.
Blöschl, G., and M. Sivapalan, 1995: Scale issues in hydrological modelling: a review. Hydrol. Processes, 9, 251–290, https://doi.org/10.1002/hyp.3360090305.
Bogner, K., K. Liechti, and M. Zappa, 2016: Post-processing of stream flows in Switzerland with an emphasis on low flows and floods. Water, 8, 115, https://doi.org/10.3390/w8040115.
Borgomeo, E., J. W. Hall, F. Fung, G. Watts, K. Colquhoun, and C. Lambert, 2014: Risk-based water resources planning: Incorporating probabilistic nonstationary climate uncertainties. Water Resour. Res., 50, 6850–6873, https://doi.org/10.1002/2014WR015558.
Boucher, M. A., L. Perreault, F. Anctil, and A. C. Favre, 2015: Exploratory analysis of statistical post-processing methods for hydrological ensemble forecasts. Hydrol. Processes, 29, 1141–1155, https://doi.org/10.1002/hyp.10234.
Box, G. E., and D. R. Cox, 1964: An analysis of transformations. J. Roy. Stat. Soc. B, 26, 211–252.
Box, G. E., and G. M. Jenkins, 1976: Time Series Analysis: Forecasting and Control. Holden Day, 575 pp.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Brown, J. D., and D.-J. Seo, 2010: A nonparametric postprocessor for bias correction of hydrometeorological and hydrologic ensemble forecasts. J. Hydrometeor., 11, 642–665, https://doi.org/10.1175/2009JHM1188.1.
Brown, J. D., J. Demargne, D.-J. Seo, and Y. Liu, 2010: The Ensemble Verification System (EVS): A software tool for verifying ensemble forecasts of hydrometeorological and hydrologic variables at discrete locations. Environ. Modell. Software, 25, 854–872, https://doi.org/10.1016/j.envsoft.2010.01.009.
Budyko, M. I., 1974: Climate and Life. D. H. Miller, Ed., International Geophysics Series, Vol. 18, Academic Press, 508 pp.
Burnash, R. J., R. L. Ferral, and R. A. McGuire, 1973: A generalized streamflow simulation system: Conceptual models for digital computers. Joint Federal and State River Forecast Center, U.S. National Weather Service, and California Department of Water Resources Tech. Rep., 204 pp.
Butts, M. B., J. T. Payne, M. Kristensen, and H. Madsen, 2004: An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow simulation. J. Hydrol., 298, 242–266, https://doi.org/10.1016/j.jhydrol.2004.03.042.
Carlu, M., F. Ginelli, V. Lucarini, and A. Politi, 2019: Lyapunov analysis of multiscale dynamics: the slow bundle of the two-scale Lorenz 96 model. Nonlinear Processes Geophys., 26, 73–89, https://doi.org/10.5194/npg-26-73-2019.
Chapman, D. G., 1956: Estimating the parameters of a truncated gamma distribution. Ann. Math. Stat., 27, 498–506, https://doi.org/10.1214/aoms/1177728272.
Chow, V. T., D. R. Maidment, and L. W. Mays, 1988: Unit hydrograph. Applied Hydrology, McGraw-Hill, 100–118.
Cloke, H., and F. Pappenberger, 2009: Ensemble flood forecasting: A review. J. Hydrol., 375, 613–626, https://doi.org/10.1016/j.jhydrol.2009.06.005.
Coccia, G., and E. Todini, 2011: Recent developments in predictive uncertainty assessment based on the model conditional processor approach. Hydrol. Earth Syst. Sci., 15, 3253–3274, https://doi.org/10.5194/hess-15-3253-2011.
Damon, J., and S. Guillas, 2002: The inclusion of exogenous variables in functional autoregressive ozone forecasting. Environmetrics, 13, 759–774, https://doi.org/10.1002/env.527.
Demargne, J., and Coauthors, 2014: The science of NOAA’s operational hydrologic ensemble forecast service. Bull. Amer. Meteor. Soc., 95, 79–98, https://doi.org/10.1175/BAMS-D-12-00081.1.
Demeritt, D., S. Nobert, H. Cloke, and F. Pappenberger, 2010: Challenges in communicating and using ensembles in operational flood forecasting. Meteor. Appl., 17, 209–222, https://doi.org/10.1002/met.194.
Deutsch, R., 1965: Estimation Theory. Prentice-Hall, 269 pp.
Doherty, J., and D. Welter, 2010: A short exploration of structural noise. Water Resour. Res., 46, W05525, https://doi.org/10.1029/2009WR008377.
Duan, Q., N. K. Ajami, X. Gao, and S. Sorooshian, 2007: Multi-model ensemble hydrologic prediction using Bayesian model averaging. Adv. Water Resour., 30, 1371–1386, https://doi.org/10.1016/j.advwatres.2006.11.014.
Dunne, T., and R. D. Black, 1970: Partial area contributions to storm runoff in a small New England watershed. Water Resour. Res., 6, 1296–1311, https://doi.org/10.1029/WR006i005p01296.
Engman, E. T., and A. S. Rogowski, 1974: A partial area model for storm flow synthesis. Water Resour. Res., 10, 464–472, https://doi.org/10.1029/WR010i003p00464.
Erickson, M., 1996: Medium-range prediction of PoP and Max/Min in the era of ensemble model output. Preprints, Conf. on Weather Analysis and Forecasting, Norfolk, VA, Amer. Meteor. Soc., J35–J38.
Fedora, M., and R. Beschta, 1989: Storm runoff simulation using an antecedent precipitation index (API) model. J. Hydrol., 112, 121–133, https://doi.org/10.1016/0022-1694(89)90184-4.
Fowler, K., M. Peel, A. Western, and L. Zhang, 2018: Improved rainfall-runoff calibration for drying climate: Choice of objective function. Water Resour. Res., 54, 3392–3408, https://doi.org/10.1029/2017WR022466.
Freer, J., K. Beven, and B. Ambroise, 1996: Bayesian estimation of uncertainty in runoff prediction and the value of data: An application of the GLUE approach. Water Resour. Res., 32, 2161–2173, https://doi.org/10.1029/95WR03723.
Freeze, R. A., 1972: Role of subsurface flow in generating surface runoff: 2. Upstream source areas. Water Resour. Res., 8, 1272–1283, https://doi.org/10.1029/WR008i005p01272.
Gan, T. Y., E. M. Dlamini, and G. F. Biftu, 1997: Effects of model complexity and structure, data quality, and objective functions on hydrologic modeling. J. Hydrol., 192, 81–103, https://doi.org/10.1016/S0022-1694(96)03114-9.
Gebregiorgis, A., and F. Hossain, 2011: How much can a priori hydrologic model predictability help in optimal merging of satellite precipitation products? J. Hydrometeor., 12, 1287–1298, https://doi.org/10.1175/JHM-D-10-05023.1.
Gelb, A., 1974: Applied Optimal Estimation. MIT Press, 374 pp.
Georgakakos, K. P., D.-J. Seo, H. Gupta, J. Schaake, and M. B. Butts, 2004: Towards the characterization of streamflow simulation uncertainty through multimodel ensembles. J. Hydrol., 298, 222–241, https://doi.org/10.1016/j.jhydrol.2004.03.037.
Gijsbers, P., L. Cajina, C. Dietz, J. Roe, and E. Welles, 2009: CHPS—An NWS development to enter the interoperability era. 2009 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract IN11A-1041.
Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, https://doi.org/10.1175/MWR2904.1.
Greene, W. H., 2003: Econometric Analysis. 5th ed. Prentice Hall, 1026 pp.
Gupta, H. V., M. P. Clark, J. A. Vrugt, G. Abramowitz, and M. Ye, 2012: Towards a comprehensive assessment of model structural adequacy. Water Resour. Res., 48, W08301, https://doi.org/10.1029/2011WR011044.
Hall, J. W., and E. Borgomeo, 2013: Risk-based principles for defining and managing water security. Philos. Trans. Roy. Soc. London, 371A, 20120407, https://doi.org/10.1098/rsta.2012.0407.
Hall, J. W., and Coauthors, 2020: Risk-based water resources planning in practice: a blueprint for the water industry in England. Water Environ. J., https://doi.org/10.1111/wej.12479, in press.
Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711–724, https://doi.org/10.1175/1520-0493(1998)126<0711:EOEREP>2.0.CO;2.
Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, https://doi.org/10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.
Hartman, R., M. Fresch, and E. Wells, 2015: National Weather Service (NWS) implementation of the Hydrologic Ensemble Forecast Service. 2015 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract H52A-01.
Hashino, T., A. A. Bradley, and S. S. Schwartz, 2002: Verification of probabilistic streamflow forecasts. IIHR Rep. 427, University of Iowa, 125 pp., https://www.iihr.uiowa.edu/wp-content/uploads/2013/06/IIHR427.pdf.
Hashino, T., A. A. Bradley, and S. S. Schwartz, 2007: Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrol. Earth Syst. Sci., 11, 939–950, https://doi.org/10.5194/hess-11-939-2007.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.
Horton, R. E., 1933: The role of infiltration in the hydrologic cycle. Trans. Amer. Geophys. Union, 14, 446–460, https://doi.org/10.1029/TR014i001p00446.
Hou, D., K. Mitchell, Z. Toth, D. Lohmann, and H. Wei, 2009: The effect of large-scale atmospheric uncertainty on streamflow predictability. J. Hydrometeor., 10, 717–733, https://doi.org/10.1175/2008JHM1064.1.
Jolliffe, I. T., and D. B. Stephenson, 2012: Forecast Verification: A Practitioner's Guide in Atmospheric Science. John Wiley & Sons, 292 pp.
Kim, S., and Coauthors, 2016: Integrating ensemble forecasts of precipitation and streamflow into decision support for reservoir operations in north central Texas. 2016 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract H53I-08.
Kim, S., and Coauthors, 2018: Assessing the skill of medium-range ensemble precipitation and streamflow forecasts from the Hydrologic Ensemble Forecast Service (HEFS) for the Upper Trinity River Basin in North Texas. J. Hydrometeor., 19, 1467–1483, https://doi.org/10.1175/JHM-D-18-0027.1.
Kim, S. M., B. L. Benham, K. M. Brannan, R. W. Zeckoski, and J. Doherty, 2007: Comparison of hydrologic calibration of HSPF using automatic and manual methods. Water Resour. Res., 43, W01402, https://doi.org/10.1029/2006WR004883.
Knowles, N., M. D. Dettinger, and D. R. Cayan, 2006: Trends in snowfall versus rainfall in the western United States. J. Climate, 19, 4545–4559, https://doi.org/10.1175/JCLI3850.1.
Koenker, R., and G. Bassett, 1978: Regression quantiles. Econometrica, 46, 33–50, https://doi.org/10.2307/1913643.
Krause, P., D. Boyle, and F. Bäse, 2005: Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci., 5, 89–97, https://doi.org/10.5194/adgeo-5-89-2005.
Krzysztofowicz, R., 1999: Bayesian theory of probabilistic forecasting via deterministic hydrologic model. Water Resour. Res., 35, 2739–2750, https://doi.org/10.1029/1999WR900099.
Krzysztofowicz, R., and K. S. Kelly, 2000: Hydrologic uncertainty processor for probabilistic river stage forecasting. Water Resour. Res., 36, 3265–3277, https://doi.org/10.1029/2000WR900108.
Krzysztofowicz, R., and H. D. Herr, 2001: Hydrologic uncertainty processor for probabilistic river stage forecasting: precipitation-dependent model. J. Hydrol., 249, 46–68, https://doi.org/10.1016/S0022-1694(01)00412-7.
Kumar, P., 2011: Typology of hydrologic predictability. Water Resour. Res., 47, W00H05, https://doi.org/10.1029/2010WR009769.
Lee, H., Y. Liu, J. Brown, J. Ward, A. Maestre, M. A. Fresch, H. Herr, and E. Wells, 2018: Validation of ensemble streamflow forecasts from the Hydrologic Ensemble Forecast Service (HEFS). 2018 Fall Meeting, Washington, D.C., Amer. Geophys. Union, Abstract H31A-05.
Legates, D. R., and G. J. McCabe Jr., 1999: Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res., 35, 233–241, https://doi.org/10.1029/1998WR900018.
Li, J., and R. Ding, 2011: Temporal–spatial distribution of atmospheric predictability limit by local dynamical analogs. Mon. Wea. Rev., 139, 3265–3283, https://doi.org/10.1175/MWR-D-10-05020.1.
Li, M., Q. J. Wang, J. C. Bennett, and D. E. Robertson, 2016: Error reduction and representation in stages (ERRIS) in hydrological modelling for ensemble streamflow forecasting. Hydrol. Earth Syst. Sci., 20, 3561–3579, https://doi.org/10.5194/hess-20-3561-2016.
Li, W., Q. Duan, C. Miao, A. Ye, W. Gong, and Z. Di, 2017: A review on statistical postprocessing methods for hydrometeorological ensemble forecasting. Wiley Interdiscip. Rev.: Water, 4, e1246, https://doi.org/10.1002/wat2.1246.
Li, Z., J. C. McWilliams, K. Ide, and J. D. Farrara, 2015: A multiscale variational data assimilation scheme: Formulation and illustration. Mon. Wea. Rev., 143, 3804–3822, https://doi.org/10.1175/MWR-D-14-00384.1.
Limon, R., 2019: Improving multi-reservoir water supply system operation using ensemble forecasting and global sensitivity analysis. Ph.D. dissertation, The University of Texas at Arlington, 164 pp., http://hdl.handle.net/10106/28115.
Loague, K., and J. E. VanderKwaak, 2004: Physics-based hydrologic response simulation: Platinum bridge, 1958 Edsel, or useful tool. Hydrol. Processes, 18, 2949–2956, https://doi.org/10.1002/hyp.5737.
Madadgar, S., H. Moradkhani, and D. Garen, 2014: Towards improved post-processing of hydrologic forecast ensembles. Hydrol. Processes, 28, 104–122, https://doi.org/10.1002/hyp.9562.