Multiscale Postprocessor for Ensemble Streamflow Prediction for Short to Long Ranges

Babak Alizadeh, Department of Civil Engineering, The University of Texas at Arlington, Arlington, Texas

Reza Ahmad Limon, Department of Civil Engineering, The University of Texas at Arlington, Arlington, Texas

Dong-Jun Seo, Department of Civil Engineering, The University of Texas at Arlington, Arlington, Texas

Haksu Lee, LEN Technologies, Inc., Oak Hill, Virginia

James Brown, Hydrologic Solutions Limited, Southampton, United Kingdom

Abstract

A novel multiscale postprocessor for ensemble streamflow prediction, MS-EnsPost, is described and comparatively evaluated with the existing postprocessor in the National Weather Service’s Hydrologic Ensemble Forecast Service, EnsPost. MS-EnsPost uses data-driven correction of magnitude-dependent bias in simulated flow, multiscale regression using observed and simulated flows over a range of temporal aggregation scales, and ensemble generation using parsimonious error modeling. For comparative evaluation, 139 basins in eight River Forecast Centers in the United States were used. Streamflow predictability in different hydroclimatological regions is assessed and characterized, and gains by MS-EnsPost over EnsPost are attributed. The ensemble mean and ensemble prediction results indicate that, compared to EnsPost, MS-EnsPost reduces the root-mean-square error and mean continuous ranked probability score of day-1 to day-7 predictions of mean daily flow by 5%–68% and by 2%–62%, respectively. The deterministic and probabilistic results indicate that for most basins the improvement by MS-EnsPost is due to both magnitude-dependent bias correction and full utilization of hydrologic memory through multiscale regression. Comparison of the continuous ranked probability skill score results with hydroclimatic indices indicates that the skill of ensemble streamflow prediction with postprocessing is modulated largely by the fraction of precipitation as snowfall and, for non-snow-driven basins, mean annual precipitation.

Current affiliation: Servant Engineering and Consulting, PLLC, Austin, Texas.

© 2020 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Babak Alizadeh, babak.alizadeh@mavs.uta.edu


1. Introduction

Streamflow simulations from hydrologic models contain errors propagated from uncertain forcings, model initial conditions, parameters and structures, and human control of storage and movement of water (Ajami et al. 2007; Doherty and Welter 2010; Gupta et al. 2012; Krzysztofowicz 1999; Montanari and Brath 2004; NRC 2006; Renard et al. 2010; Schaake et al. 2007b; Seo et al. 2006; Wood and Schaake 2008). For risk-based management of water resources and water-related hazards, it is necessary to quantify the uncertainties arising from these sources (Borgomeo et al. 2014; Butts et al. 2004; Georgakakos et al. 2004; Hall and Borgomeo 2013; Hall et al. 2020). Ensemble forecasting has emerged in recent years as the methodology of choice for modeling and communicating forecast uncertainty (Cloke and Pappenberger 2009; Demargne et al. 2014; Demeritt et al. 2010; NRC 2006; Schaake et al. 2007b). In the United States, the National Weather Service (NWS) has recently implemented the Hydrologic Ensemble Forecast Service (HEFS; Demargne et al. 2014) at all River Forecast Centers (RFC) (Lee et al. 2018) following experimental operation at selected RFCs (Hartman et al. 2015; Kim et al. 2018; Wells 2017). To reduce biases in precipitation and temperature forecasts, the HEFS uses the Meteorological Ensemble Forecast Processor (MEFP; Schaake et al. 2007a; Wu et al. 2011; NWS 2017a). To reduce and quantify hydrologic uncertainty in streamflow prediction, the HEFS employs the ensemble postprocessor, EnsPost (Seo et al. 2006; NWS 2017b). In the HEFS, each MEFP-processed forcing ensemble member is input to the chain of hydrologic models in the Community Hydrologic Prediction System (CHPS; Gijsbers et al. 2009; Roe et al. 2010). The resulting ensemble trace of “raw” streamflow forecast may be input to the ensemble postprocessor to produce an ensemble member of postprocessed streamflow forecast. 
The descriptor “post” arises from the fact that post processing of streamflow ensemble forecast occurs after the generation of raw ensemble streamflow forecast.

EnsPost was developed originally for short-range forecasting of natural flows in headwater basins, and models predictive hydrologic uncertainty using a combination of probability matching (PM; Hashino et al. 2002; Madadgar et al. 2014) and a first-order autoregressive model with an exogenous variable, or ARX(1,1) (Bennett et al. 2014; Damon and Guillas 2002), in bivariate normal space (Krzysztofowicz 1999; Seo et al. 2006). EnsPost applies PM and ARX(1,1) at a daily scale only. In reality, the characteristic time scales of error in model-simulated flow may span a range of scales, depending on the residence time of the hydrologic processes involved and the error characteristics of the forcings and the hydrologic models used (Blöschl and Sivapalan 1995). In addition, if the flow is strongly regulated, the errors may be reducible only over a certain range of temporal scales of aggregation due to the altered residence time and storage-outflow relationships. The postprocessing approach used in this work reflects the multiscale nature of the hydrologic and atmospheric processes (Blöschl and Sivapalan 1995; Carlu et al. 2019; Kumar 2011), and of hydrologic modeling and prediction, including parameter estimation (Mizukami et al. 2017), parameter regionalization (Samaniego et al. 2010), model evaluation (Rakovec et al. 2016), and data assimilation (Li et al. 2015).

The positive impact of postprocessing raw model simulations of streamflow in ensemble streamflow forecasting has been widely reported (Kim et al. 2018, 2016; Madadgar et al. 2014). Recently, it has also been shown that EnsPost significantly increases skill in ensemble forecasts of outflow from a water supply reservoir in North Texas during significant releases, in addition to that in ensemble inflow forecasts (Limon 2019). With increasing acceptance and adoption of ensemble streamflow forecasting by the operational community, developing more potent postprocessing methods has been a very active area of research. To that end, a number of comparison studies have recently been carried out. For postprocessing of meteorological forecasts, Wilks (2006) compared direct model output, rank histogram recalibration (Hamill and Colucci 1998), single-integration Model Output Statistics (MOS; Erickson 1996), ensemble dressing (Roulston and Smith 2003), logistic regression (Hamill et al. 2004), nonhomogeneous Gaussian regression (Gneiting et al. 2005), forecast assimilation (Stephenson et al. 2005), and Bayesian model averaging (Raftery et al. 2005). He concluded that logistic regression (Duan et al. 2007; Hamill et al. 2004), ensemble MOS (Gneiting et al. 2005), and ensemble dressing outperform the others. For postprocessing of hydrologic forecasts, Boucher et al. (2015) compared the regression and dressing methods using synthetic data. They concluded that the techniques have similar overall performance, and that the regression and dressing methods perform better in terms of resolution and reliability, respectively. Mendoza et al. (2016) used medium-range ensemble streamflow forecasts from the System for Hydrometeorological Applications, Research and Prediction, and compared quantile mapping (Mendoza et al. 2016; Hashino et al. 2006; Piani et al. 2010; Regonda and Seo 2008; Wood and Schaake 2008; Zhu and Luo 2015), logistic regression, quantile regression (Bjørnar Bremnes 2004; Bogner et al. 2016; Coccia and Todini 2011; Koenker and Bassett 1978), and the general linear model postprocessor (GLMPP; Zhao et al. 2011). They found that no single method performed best in all situations, and that the postprocessors’ performance depended on factors such as soil type, land use, and hydroclimatic conditions of the basin. Ye et al. (2015) developed a canonical events-based GLMPP for postprocessing of streamflow during the dry season. Li et al. (2016) developed and evaluated, for short-term streamflow forecasting, a new method for error modeling in which a sequence of simple error models, instead of a single complex model, is run through different stages. Recently, Li et al. (2017) and Vannitsem et al. (2018) carried out comprehensive reviews on the application of different postprocessing techniques.

In this paper, we introduce a new multiscale postprocessor for ensemble streamflow prediction for short to long ranges. By short and long ranges, we mean up to several days and at least 1 month ahead, respectively. The proposed technique, referred to herein as MS-EnsPost, is designed to reduce magnitude-dependent biases in raw model-simulated flow, and utilize all available skill that may exist over a range of temporal scales of aggregation in simulated and observed flows. We then comparatively evaluate MS-EnsPost with EnsPost for 139 basins in the services areas of eight RFCs in the continental United States (CONUS). As part of the evaluation, we also address the following research questions:

  • How does the prediction skill of MS-EnsPost, as measured in reference to climatology, compare among the different RFCs, and among different basins within an RFC?

  • How does the above skill relate to the hydroclimatology of the basin?

  • How does MS-EnsPost perform relative to EnsPost in reducing systematic errors and initial condition uncertainties?

This paper is organized as follows. Section 2 describes the study basins and data used. Section 3 describes the methods used. Section 4 presents the results. Section 5 provides the conclusions and future research recommendations.

2. Study basins and data used

A total of 139 basins, comprising 11, 13, 7, 19, 13, 28, 42, and 6 in the service areas of the Colorado Basin RFC (CBRFC), California Nevada RFC (CNRFC), Middle Atlantic RFC (MARFC), Missouri Basin RFC (MBRFC), North Central RFC (NCRFC), Northeast RFC (NERFC), Northwest RFC (NWRFC), and West Gulf RFC (WGRFC), respectively, are used (see Fig. 1). The CBRFC, CNRFC, MARFC, and WGRFC basins were used in previous studies of EnsPost (Regonda and Seo 2008; Seo et al. 2006). The other basins were selected by the respective RFCs toward improved ensemble service. The basin areas range from 91 to 30 700 km2 with 48, 41, and 50 basins under 500, between 500 and 1000, and over 1000 km2, respectively. The basins cover a wide range of hydroclimatology as may be seen in mean annual precipitation, the aridity index, and the fraction of precipitation as snowfall (see Fig. 1). The aridity index φ is defined as (Budyko et al. 1974):
φ = E¯/P¯,  (1)
where E¯ and P¯ denote the mean potential evaporation and precipitation (mm day^−1), respectively. Semiarid and arid basins have 2 < φ < 3 and 3 < φ < 7, respectively (Sankarasubramanian and Vogel 2003). The fraction of precipitation as snowfall, f_s, is defined as
f_s = P¯[T < 0°C]/P¯,  (2)
where T denotes the surface air temperature (°C) and P¯[·] denotes the mean precipitation over the days for which the bracketed condition holds. Basins with f_s > 0.4 are considered snow-dominated and tend to be located at higher elevations (Knowles et al. 2006). In general, a semiarid or arid basin has smaller predictability of streamflow than a humid basin, and a snowfall-driven basin has larger predictability than a rainfall-driven basin (Berghuijs et al. 2014; Berghuijs and Woods 2016).
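For concreteness, the two indices above may be computed from daily records as in the following sketch; all values and variable names are illustrative only:

```python
# Illustrative daily records (made-up values): precipitation (mm/day),
# potential evaporation (mm/day), and surface air temperature (deg C).
precip = [5.0, 0.0, 2.0, 8.0, 0.0, 3.0]
pot_evap = [4.0, 6.0, 5.0, 3.0, 7.0, 5.0]
temp_c = [2.0, -3.0, 1.0, -1.0, 4.0, 0.5]

# Aridity index [Eq. (1)]: mean potential evaporation over mean precipitation.
aridity = (sum(pot_evap) / len(pot_evap)) / (sum(precip) / len(precip))

# Fraction of precipitation as snowfall [Eq. (2)]: share of total
# precipitation falling on days with subfreezing surface air temperature.
f_s = sum(p for p, t in zip(precip, temp_c) if t < 0.0) / sum(precip)
```

Here aridity ≈ 1.67 and f_s ≈ 0.44; by the thresholds above, this hypothetical record is neither semiarid nor arid, but would be classified as snow-dominated.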
Fig. 1.

Maps of 139 study basins, showing (a) the aridity index, (b) the fraction of precipitation as snowfall, and (c) mean annual precipitation.

Citation: Journal of Hydrometeorology 21, 2; 10.1175/JHM-D-19-0164.1

The data used for this study are mean daily observed and simulated streamflow. The historical observed mean daily streamflow, referred to as QME in the NWS, is obtained from the U.S. Geological Survey. The focus of this work is on reducing and quantifying hydrologic uncertainty. As such, the model output of interest is the simulated streamflow, which reflects hydrologic uncertainty only, rather than the streamflow forecast, which reflects both meteorological and hydrologic uncertainties (Krzysztofowicz 1999; Seo et al. 2006). The simulated mean daily flow, or SQME, is derived from the simulated instantaneous flow, or SQIN, generated at a 6-h interval using the operational hydrologic models, and the observed forcings of mean areal precipitation, temperature, and potential evapotranspiration. For the remainder of this paper, by daily flow, we mean mean daily flow. The hydrologic models used are the Sacramento (SAC; Burnash et al. 1973) for soil moisture accounting, unit hydrograph (Chow et al. 1988) for surface runoff routing, and SNOW-17 (Anderson 1973) for snow ablation. The MARFC uses the continuous antecedent precipitation index model (API-CONT; Fedora and Beschta 1989; Sittner et al. 1969) instead of SAC. The SQIN time series were produced by the respective RFCs using the Community Hydrologic Prediction System (CHPS; Gijsbers et al. 2009; Roe et al. 2010) based on the RFCs’ historical forcings and calibrated model parameters. The CHPS is the main operational forecasting system at the RFCs, and uses the single (RES-SNGL) and joint (RES-J) reservoir regulation models, and the SSARR reservoir regulation (SSARRESV) model for simulation of reservoir operations (Adams 2016; NWS 2008a,b). Reservoir models were included in the hydrologic modeling of 20 of the 139 locations used in this work. There are about 14 additional locations impacted by reservoir regulations that are not modeled. 
Limon (2019) has shown that, for a water supply reservoir in North Texas, the magnitude of reservoir modeling uncertainty may be comparable to that of all other hydrologic uncertainties combined, and may even approach that of the meteorological uncertainty. As such, flow regulations present a large additional challenge to streamflow postprocessing. Experience thus far indicates that at least 20 years’ worth of data is necessary for estimation of the EnsPost parameters (NWS 2017b). The period of record used in this work common to both QME and SQME time series ranged from 12 to 66 years, and exceeded 30 years for over 90% of the basins.

3. Methods used

MS-EnsPost consists of three elements: bias correction, multiscale regression, and ensemble generation. Figure 2 provides a schematic of the data flow and the associated processes. In this section, we describe the above elements and how MS-EnsPost is evaluated.

Fig. 2.

Schematic of MS-EnsPost elements and dataflow.


a. Bias correction

We define within some time period of interest the multiplicative bias βi in the simulated flow valid at the ith day qis as
β_i = q_i^o/q_i^s,  (3)
where qio denotes the observed flow valid at the ith day. Model-simulated high and low flows generally have different multiplicative biases. Hydrologic models tend to simulate the physical processes that govern high flows relatively more accurately (Dunne and Black 1970; Engman and Rogowski 1974; Freeze 1972; Horton 1933; Loague and VanderKwaak 2004). In addition, most hydrologic models, Continuous API being a prime example, are calibrated to perform better for high flows than for low flows (Fowler et al. 2018; Freer et al. 1996; Gan et al. 1997; Kim et al. 2007; Krause et al. 2005; Legates and McCabe 1999; Nash and Sutcliffe 1970; Smith et al. 2014). The bias correction procedure in MS-EnsPost is designed to address this dependence. Sample estimates of βi at a daily scale are very noisy due to very large variabilities in qio and qis. To obtain stable estimates of the magnitude-dependent bias, the proposed procedure first pairs qio and qis, jointly sorts them in the ascending order of qis, and aggregates the resulting daily flows over different time scales. The temporal aggregation and the attendant noise cancellation greatly reduce the sampling uncertainty in the estimated bias, compared to that without aggregation. The time-aggregated flows are expressed as
a_{k,(j)}^o = Σ_{i=(j_k−1)L_k+1}^{j_k L_k} q_{(i)}^o,  (4)

a_{k,(j)}^s = Σ_{i=(j_k−1)L_k+1}^{j_k L_k} q_{(i)}^s.  (5)
In the above, the subscript (i) signifies that the variable is sorted in the ascending order of q_i^s, L_k denotes the kth time scale of aggregation, j_k denotes the jth aggregation window of the kth scale within the period of record, and a_{k,(j)}^o and a_{k,(j)}^s denote the sorted observed and simulated flows aggregated over the jth time window of the kth time scale, respectively, where the subscript (j) signifies that the aggregation is based on the sorted daily flow. Equations (4) and (5) pool the simulated and observed flows such that, when averaged over the respective aggregation periods, the aggregated simulated flows are similar in magnitude to the conditioning flow q_i^s in Eq. (3). The bias for q_i^s at the kth aggregation scale, β_{k,i}, is given by
β_{k,i} = a_{k,(j)}^o/a_{k,(j)}^s,  i ∈ [(j_k − 1)L_k + 1, j_k L_k].  (6)
In the above, the range for the index i, which is associated with the simulated flow to be bias-corrected, q_i^s, identifies the time scale of aggregation associated with the bias being estimated. In this work, we used L_k = 2^k (days), k = 1, …, 14, for the aggregation scales, but other choices are possible. The largest of these aggregation scales spans almost 45 years, with which one would be applying a single multiplicative bias estimate to all simulated daily flows regardless of their magnitude.
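The joint sorting and windowed aggregation of Eqs. (4)–(6) may be sketched as follows; the function name and toy flows are illustrative only, and the cross-validated selection of the aggregation scale is omitted:

```python
def magnitude_dependent_bias(q_sim, q_obs, window):
    """Multiplicative bias per aggregation window of `window` days (sketch).

    Pairs simulated and observed daily flows, jointly sorts them in
    ascending order of simulated flow, aggregates consecutive windows,
    and returns one bias (aggregated observed / aggregated simulated)
    per window, as in Eqs. (4)-(6).
    """
    pairs = sorted(zip(q_sim, q_obs))  # ascending order of simulated flow
    biases = []
    for start in range(0, len(pairs) - window + 1, window):
        block = pairs[start:start + window]
        agg_sim = sum(s for s, _ in block)
        agg_obs = sum(o for _, o in block)
        biases.append(agg_obs / agg_sim)
    return biases

# Toy flows: the simulation is biased high for low flows and low for high
# flows, so the two windows yield biases below and above unity.
q_sim = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
q_obs = [0.5, 1.0, 1.5, 12.0, 24.0, 36.0]
print(magnitude_dependent_bias(q_sim, q_obs, window=3))  # [0.5, 1.2]
```

The temporal aggregation is what stabilizes the estimates: the per-day ratios q_i^o/q_i^s are far noisier than the windowed ratios returned here.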

Among the total of K different temporal scales of aggregation, the best-performing scale is identified via leave-one-year-out (or similar) cross validation using a period of record of N years as described below. First, the magnitude-dependent biases are estimated at the K different scales of aggregation using an (N − 1)-yr period of observed flow and matching model simulation. The resulting biases are applied to the simulated daily flow valid on each Julian day of the withheld year. The procedure then identifies the aggregation scale that produces the smallest RMSE over the year in the bias-corrected simulated daily flow by comparing with the verifying observed flow. Once completed for all N years, the leave-one-year-out cross validation produces a total of N different sets of magnitude-dependent biases for simulated daily flow. For a given qis, the procedure then arithmetically averages the N different biases associated with the respective time windows that enclose the particular value of qis in the sorted series. The procedure repeats the above steps for all possible values of qis, from which the final single relationship between the simulated flow and the magnitude-dependent bias results. Figure 3 shows the schematic of the magnitude-dependent bias estimation procedure. Alizadeh (2019) provides examples of the resulting relationship of the multiplicative bias versus the magnitude of simulated daily flow.

Fig. 3.

Schematic of the magnitude-dependent bias correction procedure.


b. Multiscale regression

Postprocessing generally seeks predictions at the highest possible temporal resolution. The high-dimensional stochastic modeling necessary for such predictions, however, is a large challenge due to the complexity involved and large data requirements. In the multiscale regression approach used in this work, we solve instead a large number of very low-dimensional statistical modeling problems. Figure 4 illustrates the basic idea behind the approach in the context of predicting L_M day-ahead observed daily flow using the model-simulated daily flow valid over the L_M day-long prediction horizon, and the observed daily flow L_M − 1 days into the past, where M refers to the index for the largest aggregation scale. In this approach, rather than predicting q_i^o, i = 1, …, L_M, using q_i^s, i = 1, …, L_M, and q_i^o, i = −(L_M − 1), …, 0, we predict a_{k,1}^o = Σ_{i=1}^{L_k} q_i^o using a_{k,1}^b = Σ_{i=1}^{L_k} q_i^b and a_{k,0}^o = Σ_{i=−(L_k−1)}^{0} q_i^o for all time scales of aggregation, k = 1, …, M, where q_i^b denotes the bias-corrected model-simulated daily flow β_{k,i} q_i^s [see Eqs. (3) and (6)], and the subscripts 0 and 1 signify the current and one-step-ahead time intervals of the kth time scale, respectively. We then disaggregate the predicted multidaily flow to daily flow using the granular patterns of daily flow in the bias-corrected model-simulated daily flow. The above approach is motivated by the fact that the larger the temporal scale of aggregation is, the more skillful a_{k,1}^b is likely to be (Kim et al. 2018; Limon 2019). Similar approaches have also been used in postprocessing forecasts of precipitation (Kim et al. 2018; Schaake et al. 2007a) and streamflow (Regonda and Seo 2008). In this work, the prediction horizon used, L_M, is 32 days, and the aggregation scales used are 1, 2, 4, 8, 16, and 32 days (see Fig. 4). Depending on the application and the pattern of time-scale-dependent predictability, however, one may choose a different set of scales.

Fig. 4.

Schematic of multiscale regression.


To predict the observed flow at the kth scale, we use the following linear model for the predicted, time-aggregated observed flow at the kth time scale, a_{k,1}^p, where the subscript 1 signifies that the prediction is for a single time step ahead at the kth time scale:
a_{k,1}^p = ϖ_k a_{k,0}^o + (1 − ϖ_k) a_{k,1}^b,  k = 1, …, M.  (7)
In the above, ϖ_k denotes the optimal weight for the time-aggregated observed flow at the kth scale, a_{k,0}^o, where the subscript 0 signifies that a_{k,0}^o is valid at the current time step of the kth time scale. As may be seen in Eq. (7), one may consider multiscale regression a form of statistical fusion of observed streamflow with simulated flow over a range of time scales of aggregation. Significant improvement by multiscale regression is an indication that large uncertainties exist in the initial conditions of the hydrologic models, or that significant hydrologic memory exists in the surface and soil water storages of the basin. The optimal weight ϖ_k in Eq. (7) may be obtained via optimal linear (i.e., maximum likelihood) estimation (Deutsch 1965; Schweppe 1973; Gelb 1974) as
[ϖ_k  (1 − ϖ_k)] = [U^T R^−1 U]^−1 U^T R^−1.  (8)
In the above, U denotes the (2 × 1) unit vector, and R denotes the error covariance matrix:
R = [ Var(a_{k,0}^o − a_{k,1}^o)                          Cov(a_{k,0}^o − a_{k,1}^o, a_{k,1}^b − a_{k,1}^o)
      Cov(a_{k,0}^o − a_{k,1}^o, a_{k,1}^b − a_{k,1}^o)   Var(a_{k,1}^b − a_{k,1}^o) ].  (9)
The predicted daily flow for the ith day from the multiscale regression at the kth time scale, q_{k,i}^p, may be obtained by disaggregating a_{k,1}^p, k = 1, …, M, according to
q_{k,i}^p = (q_i^b/a_{k,1}^b) a_{k,1}^p = (a_{k,1}^p/a_{k,1}^b) q_i^b.  (10)
Equation (10) amounts to adjusting the bias-corrected model-simulated daily flow q_i^b based on how much larger or smaller the predicted time-aggregated flow is relative to the time-aggregated bias-corrected flow, that is, a_{k,1}^p/a_{k,1}^b. Once the disaggregation process is complete for all time scales of aggregation, the final prediction of observed daily flow, q_i^p, i = 1, …, L_M, is constructed from q_{k,i}^p, k = 1, …, M, by choosing for each day i in the prediction horizon, i = 1, …, L_k, the predicted daily flow q_{k,i}^p associated with the smallest k, that is, the smallest time scale of aggregation. In this way, if there are multiple predictions with overlapping prediction horizons, the procedure selects the one associated with the shortest lead time. Figure 4 shows the resulting time scales of aggregation over the prediction horizon of 32 days. Albeit heuristic, the above selection rule is based on the reasonable assumption that the shorter the lead time is, the more skillful q_{k,i}^p is. If the period of record is too short relative to the largest time scale of aggregation, the estimation of the error covariance terms in Eq. (9) may not be possible due to small sample size. In such a case, the largest time scale may have to be reduced or dropped.
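As an illustration of the two key operations, the following sketch evaluates the closed form of Eq. (8) for the 2 × 2 error covariance matrix of Eq. (9) and performs the disaggregation of Eq. (10); the error variances and covariance are assumed to have been estimated from historical pairs, and all names and values are illustrative:

```python
def multiscale_weight(var_persist, var_sim, cov):
    """Optimal weight on the persisted observed aggregate a_{k,0}^o.

    Closed form of Eq. (8) for a 2 x 2 error covariance matrix:
    var_persist is the error variance of the current-period observed
    aggregate, var_sim that of the bias-corrected simulated aggregate,
    and cov their error covariance.
    """
    return (var_sim - cov) / (var_persist + var_sim - 2.0 * cov)

def disaggregate(q_bias_daily, pred_aggregate):
    """Scale the bias-corrected daily flows so they sum to the predicted
    aggregate, preserving the daily pattern, as in Eq. (10)."""
    ratio = pred_aggregate / sum(q_bias_daily)
    return [ratio * q for q in q_bias_daily]

# The less accurate persistence predictor receives the smaller weight.
w = multiscale_weight(var_persist=4.0, var_sim=1.0, cov=0.0)  # 0.2
daily = disaggregate([1.0, 2.0, 3.0], pred_aggregate=12.0)    # [2.0, 4.0, 6.0]
```

Note that the two weights in Eq. (8) sum to one by construction, so only one of them needs to be computed.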

c. Error modeling and ensemble generation

The last element of MS-EnsPost models the error in the above prediction and its temporal structure for ensemble generation, that is, the time-correlated errors in q_i^p, i = 1, …, L_M, from multiscale regression. We define the error ε_i in the predicted daily flow q_i^p valid for the ith day in the prediction horizon as
ε_i = q_i^o − q_i^p,  ε_i ≥ −q_i^p,  (11)
where q_i^o denotes the verifying observed daily flow. In EnsPost, q_i^o and q_i^p are normal quantile-transformed (NQT) empirically (Krzysztofowicz and Kelly 2000; Krzysztofowicz and Herr 2001), which renders ε_i normal in the transformed space (Seo et al. 2006). In MS-EnsPost, q_i^o and q_i^p are Box–Cox-transformed (Box and Cox 1964) to avoid data-intensive empirical distribution modeling while rendering the error in the transformed space, ε˜_i, approximately normal and homoscedastic with respect to q_i^p:
ε˜_i = q˜_i^o − q˜_i^p = [(q_i^o)^λ − 1]/λ − [(q_i^p)^λ − 1]/λ = [(q_i^o)^λ − (q_i^p)^λ]/λ,  ε˜_i ≥ −(q_i^p)^λ/λ.  (12)
In the above, λ denotes the Box–Cox parameter, and q˜_i^o and q˜_i^p denote the transformed observed and predicted daily flows, respectively. The parameter λ is chosen such that the probability density function (PDF) of ε˜_i for any i, or ε˜, conditional on {ε˜ ≥ −(q^p)^λ/λ}, or f_1(ε˜ | ε˜ ≥ −(q^p)^λ/λ), may be approximated with a truncated normal distribution (Robert 1995):
f_1(ε˜ | ε˜ ≥ −(q^p)^λ/λ) = N(m_ε˜, σ_ε˜^2 : q^p),  (13)
where m_ε˜ and σ_ε˜^2 denote the mean and variance of the untruncated distribution, respectively, and q^p denotes the conditioning predicted flow in the truncated distribution. The unknown statistics m_ε˜ and σ_ε˜^2 may be solved for by equating the sample mean μ_ε˜ and variance s_ε˜^2 of actual realizations of ε˜ with the mean and variance of the modeled ε˜ as shown below:
μ_ε˜ = ∫_0^∞ E[ε˜ | ε˜ ≥ −(q^p)^λ/λ] f_2(q^p) dq^p,  (14)

s_ε˜^2 = ∫_0^∞ E[ε˜^2 | ε˜ ≥ −(q^p)^λ/λ] f_2(q^p) dq^p − μ_ε˜^2,  (15)
where E[·] denotes expectation, and f_2 denotes the PDF of q^p. The expectations in Eqs. (14) and (15) may be expressed in terms of m_ε˜, σ_ε˜^2, and α as follows (Greene 2003):
E[ε˜ | ε˜ ≥ −(q^p)^λ/λ] = m_ε˜ + σ_ε˜ ϕ(α)/[1 − Φ(α)],  (16)

E[ε˜^2 | ε˜ ≥ −(q^p)^λ/λ] = m_ε˜^2 + σ_ε˜^2 (1 − {ϕ(α)/[1 − Φ(α)]} × {ϕ(α)/[1 − Φ(α)] − α}),  (17)
where ϕ(·) and Φ(·) denote the standard normal PDF and cumulative distribution function (CDF), respectively, and the standardized lower bound α is given by α = [−(q^p)^λ/λ − m_ε˜]/σ_ε˜. Using Eqs. (14)–(17) and the empirical distribution of q^p, one may solve for m_ε˜ and σ_ε˜^2 numerically. Once m_ε˜ and σ_ε˜^2 are estimated, an ensemble realization of ε˜ conditioned on q_i^p, or ε˜_i(ω), may be generated by inverting the standard normal CDF (Robert 1995):
ε˜_i(ω) = m_ε˜ + σ_ε˜ Φ^−1{Φ(z_i^l) + U(ω)[1 − Φ(z_i^l)]}.  (18)
In the above, U(ω) denotes a sample realization of the [0, 1] uniform random variable, the bracketed term represents Φ{[ε˜_i(ω) − m_ε˜]/σ_ε˜}, and the normalized lower bound of ε˜_i, or z_i^l, is given by z_i^l = [−(q_i^p)^λ/λ − m_ε˜]/σ_ε˜. An ensemble trace of the postprocessed daily flow, q_i^o(ω), may then be obtained from
q_i^o(ω) = q_i^p + ε_i(ω) = [(q_i^p)^λ + λ ε˜_i(ω)]^{1/λ},  ε˜_i(ω) ≥ −(q_i^p)^λ/λ,  (19)
where q_i^o(ω) and ε_i(ω) denote the ensemble realizations of q_i^o and ε_i, respectively.
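The ensemble generation of Eqs. (18) and (19) may be sketched as follows, assuming λ, m_ε˜, and σ_ε˜ have already been estimated; the function name and parameter values are illustrative:

```python
import random
from statistics import NormalDist

def ensemble_member(q_pred, lam, m_err, s_err, rng):
    """One ensemble realization of daily flow per Eqs. (18) and (19) (sketch).

    Samples the Box-Cox-space error from a normal distribution truncated
    below at -q_pred**lam / lam, so that the back-transformed flow is
    non-negative, and then inverts the Box-Cox transform.
    """
    std = NormalDist()
    # Standardized lower bound z^l of the transformed error.
    z_low = (-(q_pred ** lam) / lam - m_err) / s_err
    u = rng.random()  # U(omega): uniform [0, 1) sample
    # Inverse-CDF sampling restricted to the admissible upper tail [Eq. (18)].
    eps = m_err + s_err * std.inv_cdf(std.cdf(z_low) + u * (1.0 - std.cdf(z_low)))
    # Back-transform to flow space [Eq. (19)].
    return (q_pred ** lam + lam * eps) ** (1.0 / lam)

# Illustrative parameter values; lambda, the error mean, and the error
# standard deviation would in practice be estimated per basin.
rng = random.Random(42)
members = [ensemble_member(q_pred=10.0, lam=0.3, m_err=0.0, s_err=0.5, rng=rng)
           for _ in range(1000)]
```

All members are non-negative by construction, and the ensemble spreads around the single-valued prediction.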

The error modeling described above requires estimation of λ, m_ε˜, and σ_ε˜^2 that render the distribution of ε˜ approximately truncated normal given q˜^p [see Eq. (12)]. In addition, Eq. (13) assumes that ε˜ is approximately homoscedastic with respect to q˜^p except near the origin, where the lower bound strongly suppresses variability. In reality, the above assumptions may not be met for all basins. In addition, there may not be enough data points over the tail ends of q˜^p to test the conditional truncated normality or homoscedasticity. In this work, we checked the reasonableness of the above assumptions by examining for each basin the sample moments, normal quantile plots, histograms, and scatterplots of ε˜_i versus q˜_i^p. For those basins showing significant departures from truncated normality or homoscedasticity, we adjusted λ and/or σ_ε˜^2 until the results passed the visual test. Additional research is necessary to improve the objectivity of the error modeling procedure. In this study, we considered only the truncated normal for the conditional PDF of ε˜. Other distributions, such as the truncated gamma (Chapman 1956), are also possible.

To capture the distributional characteristics of multidaily flow, it is necessary to model the temporal dependence of ε˜_i. Due to the large number of basins involved, basin-specific modeling of the error time series (Box and Jenkins 1976) was beyond the scope of this study. Instead, we modeled the error, ε˜_i = q˜_i^o − q˜_i^b, with AR(1) as a first-order approximation for all basins. The use of q˜_i^b rather than q˜_i^p in the above is motivated by the fact that q˜_i^b is not lead-time dependent, and hence greatly simplifies the modeling. The impact of this simplification appears acceptably small for all but a small number of MBRFC basins (see section 4b). To assess the adequacy of AR(1), we carried out structure identification for a small number of basins in the MARFC, NCRFC, NERFC, and NWRFC service areas. The results indicate that the error structures are generally more complex than AR(1), and contain both autoregressive and moving-average components of higher order. This is not very surprising given the very widely varying hydroclimatology of the basins and the varying goodness of the hydrologic modeling. The simplifying choice of AR(1) in this work is additionally motivated by its use in EnsPost, which facilitates direct comparison between MS-EnsPost and EnsPost. Though limited in sample size, the above findings suggest that additional improvement in ensemble prediction of multidaily flow may be possible with improved modeling of the temporal dependence of the prediction error.
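A minimal sketch of generating a temporally correlated error trace with AR(1) follows; the variance-preserving innovation scaling is a standard choice, and all names and values are illustrative:

```python
import random

def ar1_error_trace(n_days, rho, sigma, rng):
    """One temporally correlated error trace from an AR(1) model (sketch):
    eps[i] = rho * eps[i-1] + sqrt(1 - rho**2) * sigma * w[i], with w[i]
    standard normal; the innovation scaling keeps the marginal standard
    deviation of the error at sigma for every day."""
    trace = []
    prev = rng.gauss(0.0, sigma)  # day-1 error drawn from the marginal
    for _ in range(n_days):
        trace.append(prev)
        prev = rho * prev + (1.0 - rho ** 2) ** 0.5 * sigma * rng.gauss(0.0, 1.0)
    return trace

# Illustrative values: a 32-day horizon with lag-1 correlation 0.8 and
# marginal error standard deviation 0.4 in the transformed space.
trace = ar1_error_trace(n_days=32, rho=0.8, sigma=0.4, rng=random.Random(7))
```

One such trace would be added to a transformed-space prediction to produce one ensemble member over the full prediction horizon.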

d. Evaluation

For comparative evaluation of MS-EnsPost, we carried out both single-valued and ensemble verification of MS-EnsPost via leave-two-years-out cross validation using the Ensemble Verification System (EVS; Brown et al. 2010). The leave-one-year-out cross validation results are similar. In single-valued verification, we evaluate the raw and bias-corrected predictions, and ensemble mean predictions from postprocessing with EnsPost and MS-EnsPost. In ensemble verification, we evaluate the ensemble predictions from EnsPost and MS-EnsPost, and assess their skill in reference to sample climatology of historical observed flow. In both, we consider predictions of daily flow with lead times of 1–32 days, and of monthly flow with a lead time of one month. For single-valued predictions, we use root-mean-square error (RMSE) as the primary measure of performance:
$$\mathrm{RMSE}(k) = \sqrt{\frac{1}{n(k)} \sum_{i=1}^{n(k)} \left[ q_i^p(k) - q_i^o(k) \right]^2},$$
where q_i^p(k) denotes the ith day-k prediction of daily flow; q_i^o(k) denotes the verifying observed daily flow; and n(k) denotes the total number of day-k daily flow predictions. For ensemble predictions, we use the mean continuous ranked probability score (CRPS), its decomposition, and the continuous ranked probability skill score (CRPSS) as primary measures (Brown and Seo 2010; Kim et al. 2018). The CRPS represents the integral squared difference between the CDF of the predicted variable, F_Y(q), and that of the verifying observed variable, F_X(q) (i.e., a step function):
$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[ F_Y(q) - F_X(q) \right]^2 dq.$$
The mean CRPS ($\overline{\mathrm{CRPS}}$) is the average of the CRPS values over the individual pairs of ensemble forecasts and observations and reflects the overall quality of an ensemble forecasting system (the smaller, the better). In this work, the mean CRPS is evaluated following Hersbach (2000). The mean CRPS can be decomposed into mean reliability ($\overline{\mathrm{REL}}$), mean resolution ($\overline{\mathrm{RES}}$), and mean uncertainty ($\overline{\mathrm{UNC}}$), or into $\overline{\mathrm{REL}}$ and mean potential CRPS ($\overline{\mathrm{CRPS}}_{\mathrm{POT}}$) (Hersbach 2000):
$$\overline{\mathrm{CRPS}} = \overline{\mathrm{REL}} - \overline{\mathrm{RES}} + \overline{\mathrm{UNC}} = \overline{\mathrm{REL}} + \overline{\mathrm{CRPS}}_{\mathrm{POT}}.$$
A smaller $\overline{\mathrm{REL}}$ indicates more reliable ensembles (desirable), and a larger $\overline{\mathrm{RES}}$ means better resolution (desirable). The $\overline{\mathrm{RES}}$ component ($= \overline{\mathrm{UNC}} - \overline{\mathrm{CRPS}}_{\mathrm{POT}}$) is positive if the ensemble forecast is better than the climatological ensemble forecast (Hersbach 2000). The $\overline{\mathrm{UNC}}$ component reflects climatological uncertainties in the observations and does not relate to forecast attributes. The $\overline{\mathrm{CRPS}}_{\mathrm{POT}}$ ($= \overline{\mathrm{CRPS}} - \overline{\mathrm{REL}}$) represents the $\overline{\mathrm{CRPS}}$ for a perfectly reliable forecast (Hersbach 2000). The CRPSS measures skill relative to climatology, that is, historical traces of observed daily flow valid at the same time of the year as the subject forecast:
$$\mathrm{CRPSS} = \frac{\overline{\mathrm{CRPS}}_{\mathrm{clim}} - \overline{\mathrm{CRPS}}}{\overline{\mathrm{CRPS}}_{\mathrm{clim}}}.$$
Perfect and skill-less ensemble forecasts have CRPSS of unity and zero, respectively.
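To make the verification measures above concrete, the sketch below computes the RMSE of paired single-valued predictions, the sample CRPS of one ensemble forecast via the equivalent energy form CRPS = E|X − y| − ½E|X − X′|, and the CRPSS against a climatological reference. This is a minimal illustration with our own function names, not the EVS implementation.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error of paired single-valued predictions."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def crps_ensemble(members, obs):
    """Sample CRPS of one ensemble forecast vs. a scalar observation,
    using the energy form CRPS = E|X - y| - 0.5*E|X - X'|, which is
    equivalent to the integral of the squared CDF difference for the
    empirical ensemble CDF."""
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return term1 - 0.5 * term2

def crpss(mean_crps_fcst, mean_crps_clim):
    """Skill score relative to climatology: unity is perfect, zero is skill-less."""
    return (mean_crps_clim - mean_crps_fcst) / mean_crps_clim
```

For a whole verification data set, the mean CRPS is simply the average of `crps_ensemble` over all forecast–observation pairs.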

4. Results

This section presents the comparative evaluation results for single-valued and ensemble predictions, and assesses the predictability of streamflow as measured from the ensemble prediction results for different hydroclimatological regions.

a. Single-valued streamflow prediction

Figure 5 shows the RMSE of the raw, bias-corrected, MS-EnsPost ensemble mean, and EnsPost ensemble mean streamflow predictions for lead times of 1–7 days and 1 month for the basins in the CBRFC, MARFC, and WGRFC service areas. For similar plots for all other RFCs, the reader is referred to Alizadeh (2019). The above three RFCs are chosen to represent the regions of the largest, intermediate, and most limited predictability among the groups of basins considered in this work. In the figure, we connect the RMSE values for each basin to help assess the relative performance among the four different predictions for each basin. A reduction in RMSE by the bias-corrected prediction over the raw is an indication that significant magnitude-dependent biases exist in the raw model-simulated flow due to parametric or structural errors in the hydrologic models, biases in the forcings, or flow regulations. A reduction in RMSE by the MS-EnsPost ensemble mean prediction over the bias-corrected is due to multiscale regression, and indicates that significant uncertainties exist in the initial conditions of the hydrologic models, or that significant hydrologic memory exists in the surface and soil water storages of the basin. Owing to the temporal aggregation, the monthly results (rightmost columns in Fig. 5) amplify the relative performance of the bias correction components of MS-EnsPost and EnsPost. The monthly results are hence more reflective of the bias correction operation, which acts over the entire forecast horizon, than of the multiscale regression operation, which acts only over the range of hydrologic memory. The results for all 139 basins indicate that MS-EnsPost reduces the RMSE of the raw model predictions of daily flow by 5%–74% and, when compared to the EnsPost predictions, by 5%–68%, and that MS-EnsPost is superior to EnsPost for 1-month-ahead streamflow prediction for all basins examined in this work. Below we summarize the main RMSE results for each of the three RFCs; the results for the other RFCs and the accompanying discussion may be found in Alizadeh (2019).

Fig. 5a.

RMSE of the raw, bias-corrected, MS-EnsPost ensemble mean and EnsPost ensemble mean predictions for lead times of 1–7 days and 1 month for the basins in the CBRFC’s service area (yellow dots indicate basins with reservoir model included).

Citation: Journal of Hydrometeorology 21, 2; 10.1175/JHM-D-19-0164.1

For CBRFC (Fig. 5a), the MS-EnsPost ensemble mean prediction improves over the raw and EnsPost ensemble mean predictions for all basins. The yellow circles in the figure indicate that the basin has flow regulations that are modeled with CHPS. In general, both bias correction and multiscale regression contribute to the improvement by MS-EnsPost. For a number of basins, the reduction in RMSE due to multiscale regression persists to day 4 and beyond, a reflection of the longer hydrologic memory present in the Upper Colorado River basin owing to snowmelt. For MARFC (Fig. 5b), bias correction generally contributes more to the RMSE reduction by MS-EnsPost than multiscale regression. In Fig. 5, the empty circles indicate that the basin has unmodeled regulated flow. The largest improvement by MS-EnsPost over EnsPost is in the day-1 prediction for RTDP1 (third from the top), which is downstream of Raystown Dam on the Raystown Branch of the Juniata River. Overall, the impact of multiscale regression is rather modest and wears off within the first two days of lead time, an indication that the hydrologic memory in the Juniata River basin in Pennsylvania is relatively short. For the WGRFC basins (Fig. 5c), the improvement by MS-EnsPost over EnsPost is particularly large. These basins are located in the semiarid western part of the Upper Trinity River basin (Kim et al. 2018). As such, they have short memory in surface and soil water storages, and their streams are ephemeral despite relatively large basin size (441–1764 km²). Because EnsPost does not model the intermittency of streamflow, its results are particularly poor for the WGRFC basins. MS-EnsPost, on the other hand, is able to address intermittency to a significant extent by aggregating flow, which reduces or removes zero flows at sufficiently large temporal scales. Overall, the reduction in RMSE due to multiscale regression is rather short-lived. The monthly results (rightmost panel in Fig. 5c) for JAKT2 and DCJT2 (second and fourth from the top, respectively) are unexpected in that multiscale regression in MS-EnsPost slightly increased RMSE over magnitude-dependent bias correction alone. This observation indicates that statistical assimilation of observed streamflow up to a month in aggregation scale does not add skill, owing to the weak hydrologic memory in streamflow in these basins.

Fig. 5b.

As in (a), but for the MARFC basins.

Fig. 5c.

As in (a), but for the WGRFC basins.

b. Ensemble streamflow prediction

In this section, we comparatively evaluate the ensemble streamflow predictions from MS-EnsPost against those from EnsPost. To facilitate comparison for a large number of basins, we use “worm” plots in which the mean CRPS of the MS-EnsPost predictions (y axis) is plotted against that of the EnsPost predictions (x axis) and connected for lead times of 1–7 days to form a worm for each basin. Figure 6a shows the worm plots in log–log scale for all study basins for each RFC. The lower and upper ends of each worm are associated with the day-1 and day-7 predictions for that basin, respectively. If MS-EnsPost improves over EnsPost for 7-day-ahead prediction, the worm stretches downward from the diagonal; the longer the downward stretch, the larger the improvement by MS-EnsPost over EnsPost. If MS-EnsPost does not improve over EnsPost, the worm lies along the diagonal. Figure 6b shows the mean CRPS scatterplots of 1-month-ahead MS-EnsPost predictions of monthly flow versus the EnsPost.

Fig. 6a.

Worm plots (see text for explanation) of mean CRPS of ensemble predictions of daily flow from MS-EnsPost and EnsPost for lead times of 1–7 days.

Fig. 6b.

As in (a), but for 1-month-ahead predictions of monthly flow.

Figure 6a shows that, for most basins, MS-EnsPost significantly improves over EnsPost. For most MARFC basins, however, little improvement is seen. For some MBRFC basins, MS-EnsPost performed worse than EnsPost for day-1 and day-2 predictions. For RTDP1 of MARFC (the third worm from the top), MS-EnsPost clearly improves over EnsPost; recall from the single-valued prediction results that MS-EnsPost generally showed significant improvement over EnsPost for regulated flows. The MBRFC results were unexpected in that MS-EnsPost was clearly superior to EnsPost in ensemble mean prediction. A closer examination indicates that, for the four MBRFC basins, the EnsPost ensemble predictions are superior to the MS-EnsPost for the first one or two days of lead time. This result was traced to reduced reliability (Alizadeh 2019) due to the use of the error statistics of q̃_i^b rather than q̃_i^p for simplicity (see section 3c), and suggests that lead-time-dependent prescription of the error statistics may be necessary for some basins. The positive impact of bias correction may be seen in Fig. 6b for all MBRFC basins; the 1-month-ahead MS-EnsPost predictions of monthly flow are clearly superior to the EnsPost. Note also in Fig. 6b that the improvement by MS-EnsPost over EnsPost is larger for the smaller MBRFC basins. This is because bias correction, rather than multiscale regression, is largely responsible for the improvement by MS-EnsPost, which produces a large positive cumulative impact for prediction of monthly flow.

Overall, the CBRFC, NCRFC, and NWRFC basins show particularly large improvement by MS-EnsPost over EnsPost. For many CBRFC and NWRFC basins, streamflow is fed by snowmelt, which increases hydrologic memory. The CBRFC and NWRFC results indicate that multiscale regression in MS-EnsPost is able to utilize effectively the predictability present in the model-simulated and observed flows over a range of temporal scales of aggregation. For the NCRFC basins, on the other hand, the significant improvement by MS-EnsPost is found to be due more to bias correction than multiscale regression (Alizadeh 2019). For the NERFC basins, MS-EnsPost shows significantly larger improvement over EnsPost for larger basins. For the CNRFC basins, MS-EnsPost significantly improves over EnsPost for some basins whereas MS-EnsPost and EnsPost perform similarly for the others. The improvement is found to be generally smaller for the coastal basins (Alizadeh 2019). For the WGRFC basins, MS-EnsPost significantly improves over EnsPost, indicating that bias correction and multiscale regression are effective in addressing the flow-magnitude-dependent biases in raw model-predicted flow and the intermittency of streamflow in the semiarid region.

Decomposition of the mean CRPS [Eq. (22); see Alizadeh (2019) for examples] indicates that, for most basins, the reduction in mean CRPS by MS-EnsPost over EnsPost is due mostly to improved resolution rather than improved reliability. This is not very surprising in that EnsPost uses empirical probability matching based on the NQT whereas MS-EnsPost relies on approximate distribution modeling via the Box–Cox transformation. If the historical record is long enough to model the tails of the distributions with accuracy, one may expect the ensemble traces sampled from the empirically modeled distributions to be more reliable. To scrutinize the reliability of the MS-EnsPost ensemble predictions, we also examined the reliability diagrams (Brown and Seo 2010; Jolliffe and Stephenson 2012; Wilks 2006) and Brier scores (Brier 1950) for a wide range of thresholds [see Alizadeh (2019) for examples]. They indicate that the MS-EnsPost ensembles are generally as reliable as the EnsPost ensembles for thresholds at or above the 90th percentile, but less so for thresholds at or below the 50th percentile. Not surprisingly, the Box–Cox transform is not as effective as the normal quantile transform, particularly for low flows. For flood and water supply forecasting, performance for larger flows is much more important than that for smaller flows. As such, some deterioration in reliability at lower thresholds may not be a large concern in most applications.
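The threshold-based Brier score referenced above (Brier 1950) can be sketched for a flow-exceedance event as follows: the ensemble yields an exceedance probability, and the score averages the squared difference between that probability and the binary outcome. The threshold and data layout here are illustrative assumptions, not those of the study.

```python
def brier_score(ens_forecasts, observations, threshold):
    """Brier score for the event {flow > threshold}.
    ens_forecasts: list of ensembles (each a list of member values);
    observations: verifying observed flows (same length).
    Smaller is better; 0 is perfect."""
    total = 0.0
    for members, obs in zip(ens_forecasts, observations):
        # Forecast probability of exceedance from the ensemble
        prob = sum(1 for x in members if x > threshold) / len(members)
        # Binary outcome of the observed event
        outcome = 1.0 if obs > threshold else 0.0
        total += (prob - outcome) ** 2
    return total / len(observations)
```

Evaluating this score at, e.g., the 50th and 90th climatological percentiles of observed flow reproduces the kind of threshold-dependent reliability comparison described in the text.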

c. Streamflow predictability

The skill in postprocessed ensemble predictions is bounded by the predictability of streamflow explainable by the forcings (Baldwin et al. 2003; Bengtsso and Hodges 2006; Gebregiorgis and Hossain 2011; Li and Ding 2011; Simmons et al. 1995), the hydrologic and reservoir models (Hou et al. 2009; Mahanama et al. 2012; Maurer and Lettenmaier 2004; Schlosser and Milly 2002), and the statistical assimilation of streamflow via multiscale regression (Bogner et al. 2016; Sharma et al. 2018). In this section, we assess and characterize the predictability of streamflow in different hydroclimatological regions based on the ensemble prediction results presented above, and attribute the gains by MS-EnsPost over EnsPost by assessing the predictability through a skill score (Hou et al. 2009; Westra and Sharma 2010). Figure 7 shows the CRPSS of the MS-EnsPost ensemble predictions for all seasons for lead times of 1–32 days. The reference forecast is the sample climatology of historical observed flow. To assess seasonal variations, we also examined the wet- versus dry-season results. They showed that, except for the CBRFC basins, the CRPSS does not differ much between the two seasons. As such, we present only the combined results, which are necessarily more reflective of the wet season. For the CBRFC basins, the CRPSS is significantly lower for the dry season because highly persistent low-flow conditions may be predicted very well with climatology. In Fig. 7, the vertical spread in the CRPSS curves represents the variations in predictability of streamflow among the different basins within each RFC’s service area. It is readily seen that the CBRFC basins, all of which are in the Upper Colorado River basin, exhibit the smallest variations. The largest variations are observed with the NWRFC basins, which encompass the coastal, mountain, and intermountain regions of the Pacific Northwest. For each RFC, there are a small number of basins with conspicuously lower CRPSS. They are generally associated with regulated flows, which increase hydrologic uncertainty. Because these basins do not represent natural flows, they are treated separately in the analysis below.

Fig. 7.

CRPSS of ensemble predictions of daily flow from MS-EnsPost vs lead time. Reference is sample climatology of historical observed flow.

MS-EnsPost seeks two effects in the CRPSS results: an increase in the limiting CRPSS at very large lead times, CRPSS(|∞|), from bias correction, and an increase in CRPSS above CRPSS(|∞|) at shorter lead times from multiscale regression. The first and second attributes are referred to herein as the limiting CRPSS and the hydrologic memory scale (Schlosser and Milly 2002), respectively. The larger the limiting CRPSS, the more skillful the bias-corrected ensemble prediction relative to climatology. The larger the hydrologic memory scale, the larger the increase in CRPSS due to multiscale regression. The hydrologic memory scale Lhm (days), which represents the predictability of streamflow due to the surface and soil water storages in the basin (Kumar 2011), is defined as
$$L_{hm} = \int_0^{\infty} \rho_{\mathrm{CRPSS}}(|\tau|)\, d\tau,$$
where ρ_CRPSS(|τ|) denotes the normalized CRPSS at lead time τ (days). The normalization forces the CRPSS to approach zero at large lead times; one may hence view ρ_CRPSS(|τ|) as a correlogram with a nugget effect (Norouzi et al. 2018).
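Given a CRPSS-versus-lead-time curve, the hydrologic memory scale defined above can be approximated by normalizing the curve so that it decays to zero at long lead times and integrating numerically with the trapezoidal rule. The particular normalization here (subtracting the limiting CRPSS and rescaling by the day-1 excess) is our own plausible reading of the text, not necessarily the authors' exact formulation.

```python
def hydrologic_memory_scale(lead_days, crpss, crpss_limit):
    """Approximate L_hm = integral of rho_CRPSS(tau) d(tau).
    lead_days: lead times (days, increasing); crpss: CRPSS at those leads;
    crpss_limit: limiting CRPSS at very large lead times.
    Assumed normalization: rho(tau) = (CRPSS(tau) - CRPSS(inf)) / (CRPSS(1) - CRPSS(inf))."""
    denom = crpss[0] - crpss_limit
    if denom <= 0.0:
        return 0.0  # no excess skill over the limiting CRPSS -> no memory
    rho = [(c - crpss_limit) / denom for c in crpss]
    # Trapezoidal rule over the available lead times
    lhm = 0.0
    for i in range(1, len(lead_days)):
        lhm += 0.5 * (rho[i - 1] + rho[i]) * (lead_days[i] - lead_days[i - 1])
    return lhm
```

For a curve decaying exponentially toward its limit with an e-folding time of 5 days, this estimator returns approximately 5 days, as expected.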

Figure 8 shows the resulting pairs of Lhm and CRPSS(|∞|) for all basins in each RFC as obtained from the MS-EnsPost and EnsPost predictions. For each basin, an arrow connects the EnsPost result to the matching MS-EnsPost result. If MS-EnsPost increases the limiting CRPSS, the arrow would point upward. If MS-EnsPost increases the hydrologic memory scale, the arrow would point to the right. The longer the arrow is, the larger the improvement or deterioration is. Accordingly, lengthy arrows pointing in the upper-right direction would indicate MS-EnsPost clearly improving over EnsPost. It is seen that, for a number of basins, MS-EnsPost improves limiting CRPSS but reduces the hydrologic memory scale, resulting in arrows pointing in the upper-left direction. For these basins, the limiting CRPSS for EnsPost is significantly smaller than that for MS-EnsPost whereas the CRPSS for day-1 prediction is as large as or larger than that for MS-EnsPost. In such a case, EnsPost prediction is likely to yield a larger hydrologic memory than the MS-EnsPost. Such “inflated” hydrologic memory for EnsPost is an artifact of significantly smaller limiting CRPSS and hence, by itself, is not a very useful indicator of predictive skill. Accordingly, one may consider MS-EnsPost inferior to EnsPost only if the arrow is pointing in the lower-left direction.

Fig. 8.

Changes in limiting CRPSS and hydrologic memory scale from those of EnsPost to those of MS-EnsPost (see text for explanation).

Figure 8 shows that MS-EnsPost outperforms or performs comparably to EnsPost for all basins, increases limiting CRPSS for almost all basins, and provides significant additional skill via multiscale regression particularly for the CBRFC, CNRFC, NCRFC, and NWRFC basins. From Fig. 8, a number of postulations may also be made. The significant increase in Lhm by MS-EnsPost for the CBRFC, CNRFC, NCRFC, and NWRFC basins suggests that there exists significant multiscale hydrologic memory to be exploited for operational hydrologic forecasting via data assimilation. The significant increase in CRPSS(|∞|) by MS-EnsPost for the MBRFC and NERFC basins suggests that there may exist significant room for improving calibration, hydrologic modeling, and input forcings to reduce hydrologic uncertainties. The WGRFC basin results, on the other hand, suggest limited room for improving predictive skill within the existing modeling and forecasting process, and point to improving model physics as well as soil moisture sensing and its assimilation.

The relative importance of CRPSS(|∞|) versus Lhm in assessing predictability necessarily varies with the application at hand. For long-range predictions, CRPSS(|∞|) would be more important whereas, for short-range predictions, Lhm may be just as important. Hence, it is not readily possible to translate the two summary attributes uniquely into a single measure. One may consider, however, the relative positions of the [Lhm, CRPSS(|∞|)] pairs for MS-EnsPost (i.e., the tips of the arrows) within the x–y plot in Fig. 8, and approximately rank the groups of basins in different RFCs in terms of the collective strength of predictability as measured through MS-EnsPost. The figure indicates that the CBRFC, CNRFC, and NWRFC basins are the most predictable, followed by the NERFC, MARFC, and NCRFC basins, and that the MBRFC and WGRFC basins are the least predictable. The above order reflects what may be garnered visually from Fig. 7, and generally follows the decreasing order of the fraction of precipitation as snowfall, fs, and mean annual precipitation (see Fig. 1). To illustrate, Fig. 9 shows CRPSS(|∞|) versus mean annual precipitation for basins with fs < 0.3 (Fig. 9a) and Lhm versus fs for all basins (Fig. 9b). Though the scatter is large, CRPSS(|∞|) for non-snow-dominated basins relates well with mean annual precipitation except for the few very wet coastal basins, and Lhm relates positively with fs for all basins.

Fig. 9.

(a) Limiting CRPSS vs mean annual precipitation for non-snow-driven basins and (b) hydrologic memory scale vs fraction of precipitation as snowfall.

5. Conclusions and future research recommendations

We describe a novel multiscale postprocessor for ensemble streamflow prediction, MS-EnsPost, and compare it with the existing postprocessor, EnsPost, in the NWS’s HEFS for 139 basins in the service areas of eight RFCs in the CONUS. MS-EnsPost uses data-driven correction of magnitude-dependent biases in model-simulated flow, multiscale regression to utilize observed and simulated flows over a range of temporal scales of aggregation, and ensemble generation based on parsimonious error modeling. Comparative evaluation of raw and bias-corrected single-valued predictions and ensemble mean predictions from MS-EnsPost and EnsPost shows that MS-EnsPost reduces the RMSE of day-1 to day-7 predictions of daily flow from EnsPost on average by 28%, and that, for most basins, the improvement is due to both bias correction and multiscale regression. Comparative evaluation of ensemble predictions from MS-EnsPost and EnsPost shows that MS-EnsPost reduces the mean CRPS of day-1 to day-7 predictions of daily flow from EnsPost on average by 18%, and that the improvement is due more to improved resolution than to improved reliability. Examination of the CRPSS of ensemble predictions indicates that, for most basins, the improvement by MS-EnsPost over EnsPost is due to both magnitude-dependent bias correction and multiscale regression, which utilizes hydrologic memory more effectively. Comparison of the CRPSS with hydroclimatic indices indicates that the skill in ensemble streamflow predictions from MS-EnsPost is modulated by the fraction of precipitation as snowfall and, for non-snowfall-driven basins, mean annual precipitation.

In addition to improving performance, the development of MS-EnsPost is motivated by the need to reduce data requirements. Streamflow responses have changed or are changing significantly in many parts of the world due to urbanization and climate change (Milly et al. 2008). Changing conditions force on statistical postprocessors a difficult trade-off: accounting for nonstationarities by dividing the period of record or by modeling trends, which would significantly increase sampling uncertainty, versus keeping sampling uncertainty smaller at the expense of introducing biases due to nonstationarity. Owing to its parsimony, one may expect MS-EnsPost to require significantly less data than EnsPost. We are currently assessing the data requirement for MS-EnsPost for possible application under nonstationarity, and the results will be reported in the near future.

Acknowledgments

This material is based upon work supported by the NOAA Climate Program Office under Grant NA15OAR4310109, the NWS COMET Program under UCAR Subaward SUBAWD000020, and the NSF under Grant CyberSEES-1442735. This support is gratefully acknowledged. We thank John Lhotak of CBRFC; Brett Whitin and Ark Henkel of CNRFC; Seann Reed of MARFC; Lisa Holts of MBRFC; Andrea Holz of NCRFC; Erick Boehmler of NERFC; Brad Gillies of NWRFC; and Andrew Philpott, Frank Bell, Paul McKee, Kris Lander, and Mark Null of WGRFC for providing data and help during the course of this work.

REFERENCES

  • Adams, T. E., III, 2016: Flood forecasting in the United States NOAA/National Weather Service. Flood Forecasting: A Global Perspective, Academic Press, 249–310, https://doi.org/10.1016/B978-0-12-801884-2.00010-4.

    • Crossref
    • Export Citation
  • Ajami, N. K., Q. Duan, and S. Sorooshian, 2007: An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction. Water Resour. Res., 43, W01403, https://doi.org/10.1029/2005WR004745.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Alizadeh, B., 2019: Improving post processing of ensemble streamflow forecast for short-to-long ranges: a multiscale approach, PhD dissertation, Dept. of Civil Engineering, The University of Texas at Arlington, 125 pp., https://rc.library.uta.edu/uta-ir/bitstream/handle/10106/28663/ALIZADEH-DISSERTATION-2019.pdf?sequence=1.

  • Anderson, E. A., 1973: National Weather Service River Forecast System—Snow accumulation and ablation model. NOAA Tech. Memo. NWS HYDRO-17, 87 pp., https://www.wcc.nrcs.usda.gov/ftpref/wntsc/H&H/snow/AndersonHYDRO17.pdf.

  • Baldwin, M. P., D. B. Stephenson, D. W. Thompson, T. J. Dunkerton, A. J. Charlton, and A. O’Neill, 2003: Stratospheric memory and skill of extended-range weather forecasts. Science, 301, 636640, https://doi.org/10.1126/science.1087143.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bengtsso, L., and K. I. Hodges, 2006: A note on atmospheric predictability. Tellus, 58A, 154157, https://doi.org/10.1111/j.1600-0870.2006.00156.x.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bennett, C., R. Stewart, and J. Lu, 2014: Autoregressive with exogenous variables and neural network short-term load forecast models for residential low voltage distribution networks. Energies, 7, 29382960, https://doi.org/10.3390/en7052938.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Berghuijs, W. R., and R. A. Woods, 2016: A simple framework to quantitatively describe monthly precipitation and temperature climatology. Int. J. Climatol., 36, 31613174, https://doi.org/10.1002/joc.4544.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Berghuijs, W. R., M. Sivapalan, R. A. Woods, and H. H. Savenije, 2014: Patterns of similarity of seasonal water balances: A window into streamflow variability over a range of time scales. Water Resour. Res., 50, 56385661, https://doi.org/10.1002/2014WR015692.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bjørnar Bremnes, J., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338347, https://doi.org/10.1175/1520-0493(2004)132<0338:PFOPIT>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Blöschl, G., and M. Sivapalan, 1995: Scale issues in hydrological modelling: a review. Hydrol. Processes, 9, 251290, https://doi.org/10.1002/hyp.3360090305.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bogner, K., K. Liechti, and M. Zappa, 2016: Post-processing of stream flows in Switzerland with an emphasis on low flows and floods. Water, 8, 115, https://doi.org/10.3390/w8040115.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Borgomeo, E., J. W. Hall, F. Fung, G. Watts, K. Colquhoun, and C. Lambert, 2014: Risk-based water resources planning: Incorporating probabilistic nonstationary climate uncertainties. Water Resour. Res., 50, 68506873, https://doi.org/10.1002/2014WR015558.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Boucher, M. A., L. Perreault, F. Anctil, and A. C. Favre, 2015: Exploratory analysis of statistical post-processing methods for hydrological ensemble forecasts. Hydrol. Processes, 29, 11411155, https://doi.org/10.1002/hyp.10234.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Box, G. E., and D. Cox, 1964: An analysis of transformations. J. Roy. Stat. Soc. B, 26, 211252.

  • Box, G. E., and G. M. Jenkins, 1976: Time Series Analysis: Forecasting and Control. Holden Day, 575 pp.

  • Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 13, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Brown, J. D., and D.-J. Seo, 2010: A nonparametric postprocessor for bias correction of hydrometeorological and hydrologic ensemble forecasts. J. Hydrometeor., 11, 642665, https://doi.org/10.1175/2009JHM1188.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Brown, J. D., J. Demargne, D.-J. Seo, and Y. Liu, 2010: The Ensemble Verification System (EVS): A software tool for verifying ensemble forecasts of hydrometeorological and hydrologic variables at discrete locations. Environ. Modell. Software, 25, 854872, https://doi.org/10.1016/j.envsoft.2010.01.009.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Budyko, M. I., D. H. Miller, and D. H. Miller, 1974: Climate and Life. D. H. Miller, Ed., International Geophysics Series, Vol. 18, Academic Press, 508 pp.

  • Burnash, R. J., R. L. Ferral, and R. A. McGuire, 1973: A generalized streamflow simulation system: Conceptual models for digital computers. Joint Federal and State River Forecast Center, U.S. National Weather Service, and California Department of Water Resources Tech. Rep., 204 pp.

  • Butts, M. B., J. T. Payne, M. Kristensen, and H. Madsen, 2004: An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow simulation. J. Hydrol., 298, 242266, https://doi.org/10.1016/j.jhydrol.2004.03.042.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Carlu, M., F. Ginelli, V. Lucarini, and A. Politi, 2019: Lyapunov analysis of multiscale dynamics: the slow bundle of the two-scale Lorenz 96 model. Nonlinear Processes Geophys., 26, 7389, https://doi.org/10.5194/npg-26-73-2019.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Chapman, D. G., 1956: Estimating the parameters of a truncated gamma distribution. Ann. Math. Stat., 27, 498506, https://doi.org/10.1214/aoms/1177728272.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Chow, V. T., D. R. Maidment, and L. W. Mays, 1988: Unit hydrograph. Applied Hydrology, McGraw-Hill, 100–118.

  • Cloke, H., and F. Pappenberger, 2009: Ensemble flood forecasting: A review. J. Hydrol., 375, 613626, https://doi.org/10.1016/j.jhydrol.2009.06.005.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Coccia, G., and E. Todini, 2011: Recent developments in predictive uncertainty assessment based on the model conditional processor approach. Hydrol. Earth Syst. Sci., 15, 32533274, https://doi.org/10.5194/hess-15-3253-2011.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Damon, J., and S. Guillas, 2002: The inclusion of exogenous variables in functional autoregressive ozone forecasting. Environmetrics, 13, 759774, https://doi.org/10.1002/env.527.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Demargne, J., and Coauthors, 2014: The science of NOAA’s operational hydrologic ensemble forecast service. Bull. Amer. Meteor. Soc., 95, 7998, https://doi.org/10.1175/BAMS-D-12-00081.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Demeritt, D., S. Nobert, H. Cloke, and F. Pappenberger, 2010: Challenges in communicating and using ensembles in operational flood forecasting. Meteor. Appl., 17, 209222, https://doi.org/10.1002/met.194.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Deutsch, R., 1965: Estimation Theory. Prentice-Hall, 269 pp.

  • Doherty, J., and D. Welter, 2010: A short exploration of structural noise. Water Resour. Res., 46, W05525, https://doi.org/10.1029/2009WR008377.

  • Duan, Q., N. K. Ajami, X. Gao, and S. Sorooshian, 2007: Multi-model ensemble hydrologic prediction using Bayesian model averaging. Adv. Water Resour., 30, 13711386, https://doi.org/10.1016/j.advwatres.2006.11.014.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Dunne, T., and R. D. Black, 1970: Partial area contributions to storm runoff in a small New England watershed. Water Resour. Res., 6, 12961311, https://doi.org/10.1029/WR006i005p01296.

  • Engman, E. T., and A. S. Rogowski, 1974: A partial area model for storm flow synthesis. Water Resour. Res., 10, 464–472, https://doi.org/10.1029/WR010i003p00464.

  • Erickson, M., 1996: Medium-range prediction of PoP and Max/Min in the era of ensemble model output. Preprints, Conf. on Weather Analysis and Forecasting, Norfolk, VA, Amer. Meteor. Soc., J35–J38.

  • Fedora, M., and R. Beschta, 1989: Storm runoff simulation using an antecedent precipitation index (API) model. J. Hydrol., 112, 121–133, https://doi.org/10.1016/0022-1694(89)90184-4.

  • Fowler, K., M. Peel, A. Western, and L. Zhang, 2018: Improved rainfall-runoff calibration for drying climate: Choice of objective function. Water Resour. Res., 54, 3392–3408, https://doi.org/10.1029/2017WR022466.

  • Freer, J., K. Beven, and B. Ambroise, 1996: Bayesian estimation of uncertainty in runoff prediction and the value of data: An application of the GLUE approach. Water Resour. Res., 32, 2161–2173, https://doi.org/10.1029/95WR03723.

  • Freeze, R. A., 1972: Role of subsurface flow in generating surface runoff: 2. Upstream source areas. Water Resour. Res., 8, 1272–1283, https://doi.org/10.1029/WR008i005p01272.

  • Gan, T. Y., E. M. Dlamini, and G. F. Biftu, 1997: Effects of model complexity and structure, data quality, and objective functions on hydrologic modeling. J. Hydrol., 192, 81–103, https://doi.org/10.1016/S0022-1694(96)03114-9.

  • Gebregiorgis, A., and F. Hossain, 2011: How much can a priori hydrologic model predictability help in optimal merging of satellite precipitation products? J. Hydrometeor., 12, 1287–1298, https://doi.org/10.1175/JHM-D-10-05023.1.

  • Gelb, A., 1974: Applied Optimal Estimation. MIT Press, 374 pp.

  • Georgakakos, K. P., D.-J. Seo, H. Gupta, J. Schaake, and M. B. Butts, 2004: Towards the characterization of streamflow simulation uncertainty through multimodel ensembles. J. Hydrol., 298, 222–241, https://doi.org/10.1016/j.jhydrol.2004.03.037.

  • Gijsbers, P., L. Cajina, C. Dietz, J. Roe, and E. Welles, 2009: CHPS - An NWS development to enter the interoperability era. 2009 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract IN11A-1041.

  • Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, https://doi.org/10.1175/MWR2904.1.

  • Greene, W. H., 2003: Econometric Analysis. 5th ed. Prentice Hall, 1026 pp.

  • Gupta, H. V., M. P. Clark, J. A. Vrugt, G. Abramowitz, and M. Ye, 2012: Towards a comprehensive assessment of model structural adequacy. Water Resour. Res., 48, W08301, https://doi.org/10.1029/2011WR011044.

  • Hall, J. W., and E. Borgomeo, 2013: Risk-based principles for defining and managing water security. Philos. Trans. Roy. Soc. London, 371A, 20120407, https://doi.org/10.1098/rsta.2012.0407.

  • Hall, J. W., and Coauthors, 2020: Risk-based water resources planning in practice: a blueprint for the water industry in England. Water Environ. J., https://doi.org/10.1111/wej.12479, in press.

  • Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711–724, https://doi.org/10.1175/1520-0493(1998)126<0711:EOEREP>2.0.CO;2.

  • Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, https://doi.org/10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.

  • Hartman, R., M. Fresch, and E. Wells, 2015: National Weather Service (NWS) implementation of the Hydrologic Ensemble Forecast Service. 2015 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract H52A-01.

  • Hashino, T., A. A. Bradley, and S. S. Schwartz, 2002: Verification of probabilistic streamflow forecasts. IIHR Rep. 427, University of Iowa, 125 pp., https://www.iihr.uiowa.edu/wp-content/uploads/2013/06/IIHR427.pdf.

  • Hashino, T., A. A. Bradley, and S. S. Schwartz, 2006: Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrol. Earth Syst. Sci., 11, 939–950, https://doi.org/10.5194/hess-11-939-2007.

  • Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.

  • Horton, R. E., 1933: The role of infiltration in the hydrologic cycle. Trans. Amer. Geophys. Union, 14, 446–460, https://doi.org/10.1029/TR014i001p00446.

  • Hou, D., K. Mitchell, Z. Toth, D. Lohmann, and H. Wei, 2009: The effect of large-scale atmospheric uncertainty on streamflow predictability. J. Hydrometeor., 10, 717–733, https://doi.org/10.1175/2008JHM1064.1.

  • Jolliffe, I. T., and D. B. Stephenson, 2012: Forecast Verification: A Practitioner's Guide in Atmospheric Science. John Wiley & Sons, 292 pp.

  • Kim, S., and Coauthors, 2016: Integrating ensemble forecasts of precipitation and streamflow into decision support for reservoir operations in north central Texas. 2016 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract H53I-08.

  • Kim, S., and Coauthors, 2018: Assessing the skill of medium-range ensemble precipitation and streamflow forecasts from the Hydrologic Ensemble Forecast Service (HEFS) for the Upper Trinity River Basin in North Texas. J. Hydrometeor., 19, 1467–1483, https://doi.org/10.1175/JHM-D-18-0027.1.

  • Kim, S. M., B. L. Benham, K. M. Brannan, R. W. Zeckoski, and J. Doherty, 2007: Comparison of hydrologic calibration of HSPF using automatic and manual methods. Water Resour. Res., 43, W01402, https://doi.org/10.1029/2006WR004883.

  • Knowles, N., M. D. Dettinger, and D. R. Cayan, 2006: Trends in snowfall versus rainfall in the western United States. J. Climate, 19, 4545–4559, https://doi.org/10.1175/JCLI3850.1.

  • Koenker, R., and G. Bassett, 1978: Regression quantiles. Econometrica, 46, 33–50, https://doi.org/10.2307/1913643.

  • Krause, P., D. Boyle, and F. Bäse, 2005: Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci., 5, 89–97, https://doi.org/10.5194/adgeo-5-89-2005.

  • Krzysztofowicz, R., 1999: Bayesian theory of probabilistic forecasting via deterministic hydrologic model. Water Resour. Res., 35, 2739–2750, https://doi.org/10.1029/1999WR900099.

  • Krzysztofowicz, R., and K. S. Kelly, 2000: Hydrologic uncertainty processor for probabilistic river stage forecasting. Water Resour. Res., 36, 3265–3277, https://doi.org/10.1029/2000WR900108.

  • Krzysztofowicz, R., and H. D. Herr, 2001: Hydrologic uncertainty processor for probabilistic river stage forecasting: precipitation-dependent model. J. Hydrol., 249, 46–68, https://doi.org/10.1016/S0022-1694(01)00412-7.

  • Kumar, P., 2011: Typology of hydrologic predictability. Water Resour. Res., 47, W00H05, https://doi.org/10.1029/2010WR009769.

  • Lee, H., Y. Liu, J. Brown, J. Ward, A. Maestre, M. A. Fresch, H. Herr, and E. Wells, 2018: Validation of ensemble streamflow forecasts from the Hydrologic Ensemble Forecast Service (HEFS). 2018 Fall Meeting, Washington, D.C., Amer. Geophys. Union, Abstract H31A-05.

  • Legates, D. R., and G. J. McCabe Jr., 1999: Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res., 35, 233–241, https://doi.org/10.1029/1998WR900018.

  • Li, J., and R. Ding, 2011: Temporal–spatial distribution of atmospheric predictability limit by local dynamical analogs. Mon. Wea. Rev., 139, 3265–3283, https://doi.org/10.1175/MWR-D-10-05020.1.

  • Li, M., Q. J. Wang, J. C. Bennett, and D. E. Robertson, 2016: Error reduction and representation in stages (ERRIS) in hydrological modelling for ensemble streamflow forecasting. Hydrol. Earth Syst. Sci., 20, 3561–3579, https://doi.org/10.5194/hess-20-3561-2016.

  • Li, W., Q. Duan, C. Miao, A. Ye, W. Gong, and Z. Di, 2017: A review on statistical postprocessing methods for hydrometeorological ensemble forecasting. Wiley Interdiscip. Rev.: Water, 4, e1246, https://doi.org/10.1002/wat2.1246.

  • Li, Z., J. C. McWilliams, K. Ide, and J. D. Farrara, 2015: A multiscale variational data assimilation scheme: Formulation and illustration. Mon. Wea. Rev., 143, 3804–3822, https://doi.org/10.1175/MWR-D-14-00384.1.

  • Limon, R., 2019: Improving multi-reservoir water supply system operation using ensemble forecasting and global sensitivity analysis. Ph.D. dissertation, The University of Texas at Arlington, 164 pp., http://hdl.handle.net/10106/28115.

  • Loague, K., and J. E. VanderKwaak, 2004: Physics-based hydrologic response simulation: Platinum bridge, 1958 Edsel, or useful tool. Hydrol. Processes, 18, 2949–2956, https://doi.org/10.1002/hyp.5737.

  • Madadgar, S., H. Moradkhani, and D. Garen, 2014: Towards improved post-processing of hydrologic forecast ensembles. Hydrol. Processes, 28, 104–122, https://doi.org/10.1002/hyp.9562.
