Implications of Ensemble Quantitative Precipitation Forecast Errors on Distributed Streamflow Forecasting

Giuseppe Mascaro, Dipartimento di Ingegneria del Territorio, Università di Cagliari, Cagliari, Italy

Enrique R. Vivoni, School of Earth and Space Exploration and School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, Arizona

Roberto Deidda, Dipartimento di Ingegneria del Territorio, Università di Cagliari, Cagliari, Italy

Abstract

Evaluating the propagation of errors associated with ensemble quantitative precipitation forecasts (QPFs) into the ensemble streamflow response is important to reduce uncertainty in operational flow forecasting. In this paper, a multifractal rainfall downscaling model is coupled with a fully distributed hydrological model to create, under controlled conditions, an extensive set of synthetic hydrometeorological events, assumed as observations. Subsequently, for each event, flood hindcasts are simulated by the hydrological model using three ensembles of QPFs—one reliable and the other two affected by different kinds of precipitation forecast errors—generated by the downscaling model. Two verification tools based on the verification rank histogram and the continuous ranked probability score are then used to evaluate the characteristics of the corresponding three sets of ensemble streamflow forecasts. Analyses indicate that the best forecast accuracy of the ensemble streamflows is obtained when the reliable ensemble QPFs are used. In addition, results underline (i) the importance of hindcasting to create an adequate set of data that span a wide range of hydrometeorological conditions and (ii) the sensitivity of the ensemble streamflow verification to the effects of basin initial conditions and the properties of the ensemble precipitation distributions. This study provides a contribution to the field of operational flow forecasting by highlighting a series of requirements and challenges that should be considered when hydrologic ensemble forecasts are evaluated.

Corresponding author address: Giuseppe Mascaro, Dipartimento di Ingegneria del Territorio, Università di Cagliari, Piazza d’Armi 5, 09123 Cagliari, Italy. Email: gmascaro@unica.it


1. Introduction

Quantitative precipitation forecasts (QPFs) represent one of the main sources of uncertainty in hydrometeorological flood forecasting systems (Seo et al. 2000; Jasper et al. 2002; Arduino et al. 2005), especially in small river basins (∼10²–10³ km²), where the spatiotemporal rainfall distribution is crucial to provide accurate flood forecasts (Warner et al. 2000; Kabold and Suselj 2005). Recently, ensemble forecasting techniques have been adopted to account for uncertainty in QPFs (Krajewski et al. 2005; Ciach et al. 2007; Seo et al. 2000, 2006) and in hydrometeorological flood forecasting systems (Schaake et al. 2007; Verbunt et al. 2007). In this approach, a set or ensemble of precipitation fields, each representing a rainfall scenario, is used to force a hydrological model, which produces an ensemble of streamflow forecasts. The aim of this procedure is usually to build the probability distribution function (PDF) of the basin streamflow and thus provide a probabilistic flood forecast, which can be used, for example, to estimate the exceedance probability of a certain discharge threshold while accounting for rainfall uncertainty.

Characterizing the uncertainty and deficiencies of ensemble QPFs and assessing how they propagate into the basin hydrological response are key to evaluating the predictability of a hydrometeorological forecasting system and to prioritizing improvements aimed at decreasing forecast uncertainty. For this purpose, it is fundamental to evaluate the main attributes of ensemble forecast quality, including bias, sharpness, uncertainty, resolution, and reliability (see section 2a for a definition of these terms). The development of verification techniques testing one or more of these attributes was first carried out by the meteorological community, which has designed methods to evaluate ensemble outputs of numerical weather prediction (NWP) models (Hamill and Colucci 1997, 1998; Wilson et al. 1999; Smith and Hansen 2004; Wilks 2004). One group of methods has been developed to verify probabilistic forecasts, including scalar measures, such as the Brier score (BS) and its decomposition (Brier 1950; Murphy 1973), the ranked probability score (RPS) (Epstein 1969; Murphy 1971), and the continuous ranked probability score (CRPS) (Brown 1974; Matheson and Winkler 1976; Hersbach 2000), as well as graphical tools like the reliability diagram (Wilks 2006) and the relative operating characteristic (Swets 1973; Mason 1982). Another class of verification methods has been specifically designed to test the consistency hypothesis for ensemble forecasts (Anderson 1997), defined as the degree to which ensemble members and observations belong to the same probability distribution. One of the most commonly used methods is the verification rank histogram (VRH), proposed by Anderson (1996), Hamill and Colucci (1997), and Talagrand et al. (1997). An in-depth review of these techniques and forecast attributes is provided by Wilks (2006).

Recently, similar techniques have been explicitly developed to evaluate forecasts of hydrological variables (Bradley et al. 2004; Venugopal et al. 2005; Laio and Tamea 2007) or have been used in a variety of hydrological applications, including analysis of the performance of medium- and long-range ensemble hydrologic predictions (Franz et al. 2003; Hashino et al. 2007; Roulin and Vannitsem 2005; Roulin 2007), evaluation of the uncertainty of the hydrological response simulated by distributed models (Georgakakos et al. 2004), and calibration of hydrologic forecast ensembles (Vrugt et al. 2006; Seo et al. 2006; Wood and Schaake 2008). Additionally, the U.S. National Weather Service (NWS) is currently developing a river verification plan that provides criteria to evaluate the skill of hydrologic forecasts and to characterize the importance of different error sources using the techniques listed above (the report is available online at http://www.nws.noaa.gov/oh/rfcdev/docs/Final_Verification_Report.pdf).

A statistically significant evaluation of an ensemble forecasting system requires one to (i) evaluate the performance on a large number of verification events or hindcasts (Bradley et al. 2004; Seo et al. 2006), (ii) consider events spanning a wide range of hydrometeorological conditions, and (iii) use techniques aimed at analyzing not only the behavior of the ensemble forecast mean but also the entire distribution of the ensemble with respect to the observation, over the large set of forecast cases.

In this paper, we evaluate the propagation of ensemble QPF characteristics into the hydrological response (viz., streamflow) in a hydrometeorological flood forecasting system, using an approach that satisfies the requirements mentioned above. For this purpose, we first create an extensive set of synthetic hydrometeorological events in controlled conditions by coupling a precipitation downscaling model and a fully distributed hydrological model, both operating at regional scales of interest for operational forecasting purposes. For each event, we perform different hindcasts by generating ensemble QPFs with known characteristics (i.e., consistency, overdispersion, and underdispersion), and we evaluate the effects of their propagation into streamflow response. Specifically, we use verification tools based on the VRH and the CRPS to (i) investigate how consistency or deficiencies of the ensemble QPFs propagate into hydrological forecasts and (ii) provide a scalar measure of the forecast quality accounting for reliability, resolution, and uncertainty attributes.

2. Methods

For the sake of clarity, this section is divided into two subsections. In section 2a, a description of the forecast verification terminology used throughout the paper is provided, and the two verification techniques adopted to evaluate the ensemble forecast quality are illustrated. Section 2b describes the hydrometeorological ensemble forecasting system designed for the numerical experiments.

a. Overview of forecast quality attributes, verification rank histogram, and continuous ranked probability score

Using Fig. 1, we provide a brief qualitative description of the attributes of forecast quality mentioned in the introduction. The bias is the degree to which the average forecast value corresponds to the average observed value, and the sharpness is the degree of forecast dispersion (Fig. 1a). The uncertainty is the degree of intrinsic predictability of the observations (Fig. 1b). The resolution refers to the degree to which a forecast system sorts the observed events into different groups (Fig. 1c), whereas the reliability is the degree of correspondence between the forecast probability and the relative frequency of event occurrence for a given probability. The consistency is a measure of reliability introduced specifically for ensemble forecasts: it refers to the condition in which the observation is "indistinguishable from a randomly selected member of an ensemble forecast over a large set of forecast cases" (Anderson 1997).

A graphical method commonly used to test ensemble consistency is the verification rank histogram (VRH). (In the following, the term forecast verification is adopted to designate consistency hypothesis testing through the VRH.) The idea of the VRH is simple. If Q is the scalar predictand variable, then, for each forecast event, the ensemble model provides Nens forecasts Q1, Q2, … , QNens, whereas the corresponding observation is Qobs. If the forecasts and the observation are drawn from the same distribution, the rank of Qobs in the sorted vector Q containing Q1, … , QNens and Qobs assumes any of the values 1, 2, … , (Nens + 1) with the same probability. Therefore, if the rank is calculated for Nev forecast events, a histogram built with the corresponding Nev ranks should be uniform, and any departure from a uniform distribution should only be due to sampling variability. The uniformity of the VRH does not assure a consistent ensemble but is only a necessary condition. As a consequence, the shape of the VRH should be carefully interpreted (Hamill 2001). If the VRH is not uniform, the consistency of the ensemble forecasts has not been verified and the model may be affected by deficiencies. The overpopulation of the lowest or highest ranks reveals the presence of positive or negative bias, respectively. An excess of dispersion (overdispersion) implies overpopulation of the middle ranks, while a lack of variability (underdispersion) produces U-shaped histograms. The ensemble characteristics of underdispersion and overdispersion are qualitatively illustrated in Fig. 1d.
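To make the construction concrete, the following minimal Python sketch (ours, not part of the original study; names are illustrative) builds a VRH from arrays of ensemble forecasts and observations:

```python
import numpy as np

def rank_histogram(ensembles, observations):
    """Verification rank histogram: for each event, rank the observation
    within the ensemble; a consistent ensemble yields a ~flat histogram.

    ensembles:    array of shape (n_events, n_ens)
    observations: array of shape (n_events,)
    """
    n_events, n_ens = ensembles.shape
    ranks = np.empty(n_events, dtype=int)
    for k in range(n_events):
        # rank of the observation among the members, in 1 .. n_ens + 1
        ranks[k] = 1 + np.sum(ensembles[k] < observations[k])
    counts, _ = np.histogram(ranks, bins=np.arange(0.5, n_ens + 2))
    return counts  # length n_ens + 1; ~uniform under consistency

# Toy check: members and "observations" drawn from the same distribution
rng = np.random.default_rng(0)
ens = rng.gamma(2.0, 1.0, size=(1000, 50))
obs = rng.gamma(2.0, 1.0, size=1000)
print(rank_histogram(ens, obs))  # counts fluctuate around 1000/51, i.e., ~20
```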

The other verification tool used in this paper is the continuous ranked probability score (CRPS) (Brown 1974; Matheson and Winkler 1976; Hersbach 2000), a technique providing, for each forecast event, a single scalar measure of forecast quality. The CRPS measures the distance between the ensemble cumulative distribution function (CDF) of the predictand variable and the Heaviside function centered on the observed value. As a result, the CRPS is a negatively oriented score, being smaller for more accurate forecasts and equal to zero for a perfect forecast. Note also that the CRPS has the dimension of the predictand. The CRPS has several attractive properties. In particular, it is defined for continuous predictands (like streamflow or precipitation), and the average of the CRPSs computed over a set of Nev events, denoted $\overline{\mathrm{CRPS}}$, can be decomposed into reliability, resolution, and uncertainty terms, as shown by Hersbach (2000):
$$\overline{\mathrm{CRPS}} = \mathrm{Reli} + \mathrm{CRPS}_{\mathrm{pot}} = \mathrm{Reli} + U - \mathrm{Resol}, \qquad (1)$$
where Reli is the reliability term and CRPSpot is the potential CRPS, which can be further decomposed into a term U accounting for the uncertainty of the observations and a term Resol accounting for the ensemble forecast resolution. According to the decomposition (1), to achieve a better accuracy (i.e., a low $\overline{\mathrm{CRPS}}$), it is desirable to have low values of the reliability and uncertainty terms and a high value of the resolution term. See Hersbach (2000) for details on the CRPS decomposition and the computation of each term.
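For a single event, the CRPS of an empirical ensemble CDF can be evaluated with the well-known kernel identity CRPS = E|X − Qobs| − ½E|X − X′|. The sketch below (ours; Hersbach's multi-event decomposition into Reli, Resol, and U is more involved and is not reproduced here) illustrates the score and its negative orientation:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation: the integral of
    [F_ens(x) - H(x - obs)]^2, computed exactly for the empirical CDF via
    CRPS = E|X - obs| - 0.5 * E|X - X'| (same units as the predictand)."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

print(crps_ensemble([3.0, 3.0, 3.0], 3.0))        # perfect forecast -> 0.0
print(crps_ensemble([1.0, 2.0, 3.0, 4.0], 2.5))   # -> 0.375
```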

b. The hydrometeorological ensemble forecasting system

The hydrometeorological system for flood forecasting proposed here combines a precipitation downscaling model with a distributed hydrological model. The schematic of the system is illustrated in Fig. 2. For each forecast event, ensemble streamflow forecasts are obtained through three steps. NWP model forecasts are first used to determine the precipitation accumulated in a coarse spatiotemporal domain L × L × T (where L refers to space and T to time) containing the study region of interest. In this large domain, NWP model outputs are, in general, characterized by less uncertainty than in smaller domains (Ebert et al. 2003; Ferraris et al. 2002). Subsequently, a statistical downscaling model disaggregates the coarse-scale precipitation, providing an ensemble of spatiotemporal QPFs at a scale λ × λ × τ (where λ refers to space and τ to time) suitable for hydrological modeling. In the final step, the high-resolution downscaled QPFs are used to force a distributed hydrological model that furnishes an ensemble of hydrographs forecasted at multiple nested locations.
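Schematically, the chain for a single forecast event can be sketched as follows (our illustration; `downscale` and `hydro_model` are hypothetical callables standing in for the downscaling and distributed hydrological models introduced below):

```python
def ensemble_flood_forecast(R_coarse, n_ens, downscale, hydro_model):
    """One forecast event: disaggregate the coarse L x L x T precipitation
    R_coarse into n_ens high-resolution QPF members (step 2), then route
    each member through the distributed model (step 3) to obtain an
    ensemble of forecasted hydrographs."""
    qpf_members = [downscale(R_coarse) for _ in range(n_ens)]
    return [hydro_model(qpf) for qpf in qpf_members]
```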

In recent years, the increasing availability of computational resources, improved parameterizations of physical processes, and the adoption of ensemble and data assimilation techniques have contributed to increasing the resolution of QPFs produced by NWP models (e.g., Charba et al. 2003; Lee et al. 2006; Cartwright and Krishnamurti 2007). For example, the National Centers for Environmental Prediction (NCEP) Regional Spectral Model (RSM) (Juang and Kanamitsu 1994) produces ensemble precipitation forecasts at a resolution of 12 km in space and 6 h in time (Yuan et al. 2007). Nevertheless, the skill of high-resolution QPFs produced by NWP models is still limited at their grid resolution, and they cannot be directly used as input for hydrological flood forecasting models. For example, the skill is poor in heterogeneous areas with few heavy precipitation events, as shown by Yuan et al. (2007). In particular, the forecast capability of NWP models is influenced by the limited spatial and temporal coverage of the observations used to initialize the model runs, the chaotic nature of the processes involved, and the underrepresentation of the energy contained in the smaller scales resolved by the models (Skamarock 2004; Frehlich and Sharman 2008). One important consequence of these limitations is that NWP models poorly predict convective rainfall, which is in turn responsible for severe floods and flash floods (Rezacova et al. 2007).

To improve the skill of NWP QPFs, a number of postprocessing techniques have been adopted to recalibrate the forecasts based on multidecadal reforecast datasets [see references cited in Wilks and Hamill (2007)]. An alternative approach to produce high-resolution QPFs is the use of statistical precipitation downscaling models that, starting from NWP forecasts at coarser scales, are able to reproduce small-scale variability of rainfall required for flood and flash-flood forecasts (Wilby et al. 1999; Deidda et al. 1999; Hay and Clark 2003; Friederichs and Hense 2007; among others).

A class of downscaling models is based on scale invariance analysis and multifractal theory (Schertzer and Lovejoy 1987; Over and Gupta 1996; Perica and Foufoula-Georgiou 1996; Venugopal et al. 1999a,b; Deidda 2000). The operational use of downscaling models is typically achieved by means of calibration relations linking the model parameters to a coarse meteorological observable, such as the convective available potential energy (CAPE) or the precipitation volume at the regional scale (e.g., Over and Gupta 1994, 1996; Perica and Foufoula-Georgiou 1996; Deidda 2000; Deidda et al. 2004). As a result, given a coarse meteorological variable from an NWP model, it is possible to robustly determine the downscaling model parameters and generate an ensemble of fine-resolution QPFs over a region of interest, representing different possible statistical realizations of the same coarse-scale condition.

Physically based distributed hydrological models, in turn, take into account variability of hydrometeorological forcing and catchment properties at fine spatial scales (Pessoa et al. 1993; Bell and Moore 2000; Ivanov et al. 2004a,b; Vivoni et al. 2005, 2006, 2007a). Capturing finescale processes is essential for utilizing the information content of the high-resolution rainfall fields provided by downscaling models. Recent improvements in spatially distributed inputs to hydrological models—including hydrometeorological data, topography, vegetation and soil maps—allow adequate parameterization of physically based equations over distributed locations. Applications of distributed hydrological models in flood forecast systems can be found in Jasper et al. (2002), Liu et al. (2005), Gouweleeuw et al. (2005), and Vivoni et al. (2006, 2007b), among others.

The precipitation downscaling model adopted in the hydrometeorological forecasting system was developed by Deidda et al. (1999) and refined by Deidda (2000) and is known as the space–time rainfall (STRAIN) model. The STRAIN model reproduces the scale invariance properties of homogeneous spatiotemporal precipitation fields in a self-similar framework, through a log-Poisson generator, depending on two parameters: β and c. High-resolution precipitation scenarios generated by the STRAIN model are then used as forcing to a distributed hydrological model known as the triangulated irregular networks (TIN)–based Real-time Integrated Basin Simulator (tRIBS). Details on the model can be found in Ivanov et al. (2004a,b) and Vivoni et al. (2007a) and the references therein.

3. Synthetic hindcast experiments

In this section, we first describe the study area and the hydrological model setup, and we then illustrate the generation of the synthetic "observed" events and of the ensemble hindcasts constructed by applying the hydrometeorological forecasting system described above.

We remark that our study focuses on evaluating the propagation of the uncertainty introduced by the downscaled precipitation forecasts into the hydrological response; it is not intended to reproduce specific observed precipitation and streamflow events. When several sources of uncertainty coexist, the verification tools applied here may not readily detect the main sources of ensemble inconsistency. Thus, to isolate the effects due to uncertainty in the downscaled rainfall, all other sources of uncertainty, such as the basin initial state and the model parameterization, were excluded from this study. We emphasize that the experiments were designed to generate downscaled precipitation events with known statistical properties, allowing better control in the interpretation of results, and are based upon numerical models tested and calibrated for the study region.

a. Study area and tRIBS model setup

The Baron Fork basin at Eldon, Oklahoma, was selected as the study region to carry out the hydrometeorological experiments. Figure 3a shows the basin location with respect to the Arkansas Red River basin. Streamflow values are monitored by the U.S. Geological Survey (USGS) at the basin outlet at Eldon and at two interior stream gauges: Peacheater Creek and Dutch Mills (Fig. 3b). Basin topography for the application of the tRIBS model is derived from a USGS 30-m digital elevation model (DEM) using the hydrographic TIN procedure described in Vivoni et al. (2004) (Fig. 3c). The basin is characterized by high terrain variability; a land use distribution consisting mainly of forest (52.2%), croplands (46.3%), and small towns (1.3%); and soil textures ranging from silt loam (94%) to fine sandy loam (6%). The Baron Fork basin was selected because it has been previously simulated using tRIBS in both continuous (Ivanov et al. 2004a,b) and event-based (Vivoni et al. 2006) modes. To explore the dependence of the ensemble forecasts on basin scale, we selected five interior subbasins and the basin outlet, ranging in area from ∼50 to ∼800 km² (Fig. 3b and Table 1), as target forecast points.

In our experiments, the tRIBS model was used as a numerical tool to simulate the hydrological response to the downscaled precipitation products. Given the synthetic nature of the experiments, we selected reliable model parameter values for one summer period and applied these throughout the other summers. Model parameters were based on a long-term simulation (1993–2000) reported by Ivanov et al. (2004b). A minor calibration for summer 2000 was then carried out using multiple-gauge observations. Results of the calibration experiment (Fig. 4) reveal that the model adequately reproduces the observed hydrographs at the multiple interior gauge locations.

With the aim of preserving the regional summer conditions, hourly meteorological data from the Westville station (Fig. 3b) of the Oklahoma Mesonet network were utilized in tRIBS to compute the surface energy fluxes during the simulations.

b. Hydrometeorological ensemble hindcasts

The first step of the hindcast experiments was the generation of a synthetic database of "observed" rainfall and corresponding streamflow during the summer periods (June–August) from 1997 to 2005, so as to simulate events with similar meteorological signatures. The STRAIN model was used to downscale coarse precipitation events, obtained by aggregating 4-km, 1-h Next-Generation Weather Radar (NEXRAD) estimates, under the following assumptions: (i) the existence of scale invariance laws between the coarse scales L = 256 km, T = 16 h, and the fine scales λ = 4 km, τ = 15 min; (ii) β = e⁻¹; and (iii) a single calibration relation c = c(R) provided by the following nonlinear relation:
$$c(R) = c_{\infty} + a\,e^{-\gamma R}, \qquad (2)$$

where R is the precipitation rate at the coarse scale and c∞ = 0.675, a = 0.907, and γ = 0.764, as in Deidda (2000).
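As a worked illustration, and assuming the exponential form of Eq. (2) reconstructed above, the parameter c can be evaluated for the minimum, mean, and maximum coarse rates of the hindcast events used later in section 3b:

```python
import numpy as np

# Assumed parameters of Eq. (2): c(R) = c_inf + a * exp(-gamma * R), R in mm/h
c_inf, a, gamma = 0.675, 0.907, 0.764

def strain_c(R):
    """Intermittency parameter c of the STRAIN generator for coarse rate R."""
    return c_inf + a * np.exp(-gamma * R)

for R in (0.16, 0.87, 4.07):  # mm/h; see section 3b
    print(f"R = {R:.2f} mm/h -> c = {strain_c(R):.3f}")
```

Higher coarse rates yield lower c, consistent with the calibration behavior reported in Deidda (2000).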

The synthetic observed rainfall database was generated as follows:

  1. An area of L = 256 km × L = 256 km centered on the Baron Fork basin was first identified. Subsequently, each summer of years l = 1997, … , 2005 was divided into consecutive T = 16-h-long events with a total of 138 events per summer from 1 June to 1 September. The observed precipitation rates Ri,l (i = 1, … , 138) at the coarse scale L × L × T were derived by aggregation of the NEXRAD data.

  2. For each 16-h-long event i in year l, STRAIN parameter ci,l = c(Ri,l) was determined by means of the calibration relation (2) and utilized to generate a synthetic field at a resolution of 4 km × 4 km × 15 min. No downscaling was performed if Ri,l = 0 and zero precipitation at the fine resolution was assumed throughout the 16 h.

  3. For each year l, the synthetic observed precipitation database was built by concatenating the high-resolution fields downscaled from R1,l, R2,l, … , R138,l. To illustrate this, Fig. 5 shows the time series of the Ri,2000 coarse precipitation rates (time step Δt = 16 h) and of the mean areal precipitation (MAP) over the Baron Fork basin for the corresponding downscaled fields (Δt = 15 min).

The synthetic observed precipitation data in each summer were then used as input for the tRIBS model, which simulated the synthetic observed streamflow database at multiple locations along the channel network.
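The generation loop can be sketched as follows (our skeleton; `toy_downscale` is a crude mean-preserving random placeholder, not the actual STRAIN log-Poisson cascade, which is described in Deidda (2000)):

```python
import numpy as np

C_INF, A, GAMMA = 0.675, 0.907, 0.764  # Eq. (2) parameters
NX, NT = 64, 64   # 256 km / 4 km = 64 cells; 16 h / 15 min = 64 time steps
rng = np.random.default_rng(42)

def toy_downscale(R, c):
    """Placeholder only: a mean-preserving lognormal field standing in for
    the STRAIN generator so that the skeleton runs end to end."""
    w = rng.lognormal(mean=0.0, sigma=c, size=(NX, NX, NT))
    return R * w / w.mean()

def build_observed_summer(R_coarse):
    """Steps 1-3: downscale each of the 138 16-h coarse rates of one summer
    and concatenate the fine-resolution fields along the time axis."""
    fields = []
    for R in R_coarse:
        if R == 0.0:
            fields.append(np.zeros((NX, NX, NT)))  # step 2: no rain stays zero
        else:
            c = C_INF + A * np.exp(-GAMMA * R)     # calibration relation (2)
            fields.append(toy_downscale(R, c))
    return np.concatenate(fields, axis=2)          # step 3: one summer series
```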

Subsequently, hydrometeorological ensemble hindcasts (in the following, the terms hindcast and forecast are used interchangeably) were performed on Nev = 100 events, randomly selected among the large-scale precipitation values Ri,l > 0 used to build the synthetic observed database. The coarse rainfall values range in the interval [0.16, 4.07] mm h⁻¹, with a mean value of 0.87 mm h⁻¹. According to this selection criterion, our analysis included a range of possible rainfall events leading to the occurrence or nonoccurrence of floods in the basin.

For each coarse-scale rainfall event, three ensembles of high-resolution QPFs were generated with the STRAIN model by determining the value of parameter c in accordance with the calibration modes discussed in Mascaro et al. (2008), referred to as the "functional based," "mean based," and "event based" modes. These calibration modes are illustrated in Fig. 6, and their adoption permitted the generation of three sets of ensemble downscaled QPFs with known characteristics: consistency, overdispersion, and underdispersion. Evidence of these characteristics of the ensemble QPFs is reported in the histograms of Fig. 7.

The three types of ensemble downscaled QPFs were used as input for the tRIBS model, which produced three sets of streamflow ensemble hindcasts, labeled the CONS, UNDER, and OVER experiments, respectively. For a given coarse-scale event and for each characteristic of the rainfall forecasts, an ensemble of Nens = 50 QPFs was generated and utilized as input to the hydrological model, resulting in (Nens = 50) × (Nev = 100) × (Nexp = 3) = 15 000 hydrological simulations. This computational effort was carried out on a 64-node high-performance computing cluster available at New Mexico Tech.

4. Evaluation of ensemble streamflow characteristics

a. Consistency of ensemble streamflows

To test the consistency hypothesis of ensemble streamflow forecasts, we used a verification tool based on a probability-space application of the VRH. In the following, the method is first presented for the general case in which ensemble streamflows are produced (e.g., when a hydrological model is run in a Monte Carlo approach for different initial conditions or for different values of the model parameters). Let us consider an observed hydrograph with duration Thydro (Fig. 8a). The method first requires selecting a time interval Tver to identify the observed streamflow events used in the verification. For each event, the hydrological model returns an ensemble of Nens streamflow forecasts. Figure 8b shows the observed and the Nens ensemble hydrographs of a generic event k (k = 1, … , Nev) with duration Tver. It is then possible to extract a specific metric Qm, such as the maximum streamflow volume accumulated over a fixed duration m, for the Nens ensemble members Qjm (j = 1, … , Nens) and for the observation Qobsm. Subsequently, the vector containing these Nens + 1 values is sorted in increasing order and the empirical CDF is built using the Hazen plotting position formula. If r is the position of Qobsm in the sorted vector, we can compute the nonexceedance probability p of this position as (r − 0.5)/(Nens + 1) (Fig. 8c), that is, normalizing the rank r into the range (0, 1). The procedure is repeated for each event, and the VRH is populated by the Nev probabilities.
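A minimal sketch of this normalized-rank computation for one event (our illustration):

```python
import numpy as np

def normalized_rank(members, obs):
    """Probability-space rank of the observed metric: sort the Nens member
    values together with the observation and map the observation's 1-based
    position r to a nonexceedance probability with the Hazen formula
    p = (r - 0.5) / (Nens + 1), uniform in (0, 1) under consistency."""
    pooled = np.sort(np.append(members, obs))
    r = np.searchsorted(pooled, obs) + 1
    return (r - 0.5) / (len(members) + 1)

# One event with Nens = 5 members: the observation ranks 4th of 6 values
print(normalized_rank([2.1, 3.4, 0.9, 5.0, 4.2], 3.9))  # -> (4 - 0.5)/6 = 0.583
```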

The application of the verification procedure to test the consistency of ensemble streamflows in an event-based flood forecasting framework (as in our experiments) is illustrated in Fig. 9. Observed precipitation is available up to t*, when the forecast period begins and the coarse-scale precipitation forecast is issued for the next T hours over an area L × L. The STRAIN model downscales this coarse precipitation, generating an ensemble of high-resolution QPFs. A fixed duration Tver is then used to define the streamflow event. To evaluate the hydrological response due to the rainfall forecasted in the time interval [t*, t* + T], Tver should be greater than the duration of the precipitation event plus the basin response time. A practical way to estimate the basin response time is through the time of concentration Tc (see Kirpich 1940), as sketched below. As a consequence, Tver should be at least equal to (T + Tc).
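For concreteness, one common SI form of the Kirpich (1940) formula is sketched below; the paper does not state which variant was used, so both the formula form and the basin values are our assumptions:

```python
def kirpich_tc_hours(L_m, S):
    """Time of concentration via a common SI form of Kirpich (1940):
    Tc[min] = 0.0195 * L^0.77 * S^(-0.385), with L the channel length (m)
    and S the average slope (m/m); converted here to hours."""
    return 0.0195 * L_m**0.77 * S**(-0.385) / 60.0

# Hypothetical values for a basin of roughly Baron Fork's size
print(f"Tc = {kirpich_tc_hours(55_000, 0.003):.1f} h")  # -> Tc = 13.6 h
```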

The event-based approach adopted in our experiments was based on hydrologic model simulations with downscaled rainfall over T hours, followed by a zero-padding interval of Tc hours (i.e., Tver = T + Tc). Zero padding was used to evaluate the effects of the rainfall occurring in [t*, t* + T] on the subsequent streamflow response, as in Vivoni et al. (2006).

Figure 9b shows an example of the ensemble and observed hydrographs for an event-based forecast. The initial basin condition at the beginning of each forecast is determined by the hydrological model forced with observed precipitation up to t*, and it remains identical for the Nens runs. This was achieved by taking advantage of the hot restart mechanism in the hydrological model.

The verification procedure described here was applied to the Nev = 100 selected events for every subbasin and for each of the CONS, UNDER, and OVER experiments. The streamflow volumes accumulated over the durations m = 1, 16, and 32 h (Q1h, Q16h, and Q32h) were selected as the metrics used to build the histograms.

b. A scalar measure of the forecast accuracy of ensemble streamflow

We used the CRPS to provide an average measure of the accuracy of the ensemble streamflow forecasts, completing the picture of forecast quality. For the Nev = 100 selected events at each subcatchment and for each type of experiment (CONS, UNDER, and OVER), the CRPS was computed for the metrics Q1h, Q16h, and Q32h. Then, we calculated the average CRPS over the Nev events and, in accordance with the decomposition (1), the Reli, Resol, and uncertainty U terms.

To compare the CRPS, Reli, and Resol terms obtained for the different catchments and experiments, we used the skill score, SS (Wilks 2006), which for a generic measure of accuracy A is defined as
$$\mathrm{SS} = \frac{A - A_{\mathrm{ref}}}{A_{\mathrm{perf}} - A_{\mathrm{ref}}}, \qquad (3)$$
where Aperf is the value of the accuracy measure that would be achieved by a perfect forecast and Aref is the measure computed for a set of reference forecasts. The SS was computed for A = CRPS and A = Reli. In the case of the resolution term, since Resol → ∞ for a perfect forecast, we used A = 1/Resol. In all cases, the accuracy measure Aperf for a perfect forecast takes the value of zero, whereas for Aref we assumed the accuracy measure of the CONS experiment. Therefore, a negative percentage of the SS implies worse performance than the reference CONS.
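A small worked example of Eq. (3) (ours; the numbers are hypothetical but chosen to mirror the roughly 9% CRPS skill degradation reported for the UNDER case in section 5b):

```python
def skill_score(A, A_ref, A_perf=0.0):
    """Eq. (3): SS = (A - A_ref) / (A_perf - A_ref). SS is 0 for the
    reference forecast, 1 (100%) for a perfect one, and negative when the
    forecast is worse than the reference (here, the CONS experiment)."""
    return (A - A_ref) / (A_perf - A_ref)

# Hypothetical CRPS values: an UNDER hindcast against the CONS reference
print(f"SS(CRPS) = {skill_score(0.218, 0.200):+.0%}")  # -> SS(CRPS) = -9%
```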

Finally, to evaluate the variation of uncertainty of the sample observations with the basin area, we computed a dimensionless uncertainty: for a given subbasin, we calculated the ratio between the U term and the average value of the observed metric over the Nev = 100 events.

5. Results and discussion

a. Consistency of ensemble streamflows

1) CONS experiment

The rank histograms constructed to test the consistency hypothesis in the CONS experiment are shown in Fig. 10a for the basin outlet. Clearly, the use of consistent ensemble QPFs leads to consistent ensemble streamflows for all the metrics. The same findings were obtained for the other subbasins (not shown). This result is somewhat expected for the following reason. For a given event, the observed rainfall field is one of the possible realizations of the consistent ensemble QPFs. Thus, the corresponding observed streamflow simulated by the hydrological model is equally likely to be any one of the ensemble streamflow outputs. This holds even if the hydrological processes simulated by the model lead to a nonlinear transformation of the inputs.

We now highlight two notes of caution that should be considered when evaluating ensemble forecasts. First, the verification analysis has limits when only the VRH is utilized to evaluate the reliability of ensemble streamflow forecasts. Suppose that two hydrological models are forced with the same consistent ensemble QPFs and return ensemble streamflow forecasts characterized by different sharpness (e.g., the metric Q1h has different ranges). The VRH of the ensemble streamflow would be uniform for both models but would not indicate which model returns the more accurate forecasts. Thus, meeting the consistency hypothesis for the ensemble streamflow forecast is a necessary, but not sufficient, condition for assessing forecast accuracy. For this purpose, the CRPS or another scalar measure may provide additional information about forecast accuracy.

Second, many studies have based the verification of forecasting systems on a limited number of cases, typically selected on the basis of their severity and impacts in a specific basin. Our numerical experiments reveal that a correct evaluation of the forecast capability of a hydrometeorological system requires instead a random selection of the verification events, which will consequently span a wide range of conditions. This need is illustrated in Fig. 11, where histograms derived from the CONS ensemble forecasts are shown for the basin outlet (800 km²) and the smallest subbasin 5 (50 km²). For each catchment, the 100 events were grouped into two subsets containing the 50 highest and the 50 lowest values of the observed metric Qobsm, respectively. Results clearly reveal that, if the verification considers only the most intense flood events, the corresponding VRHs show a negative bias (i.e., the ranks more frequently take high values). On the other hand, the histogram built with the 50 least severe streamflow events is positively biased. These kinds of biases can occur whenever the selection of events is based on the values of the flood observations. Thus, we strongly suggest selecting streamflow events in a random fashion to avoid possible misinterpretation of the ensemble consistency in the streamflow forecasts.

2) UNDER experiment

The VRHs for the UNDER experiment are reported in Fig. 10b for the basin outlet. Results for the other subbasins (not shown) were similar. The histograms show that the U-shaped pattern of underdispersion detected in the ensemble QPFs is not apparent in the hydrological response. In some cases (i.e., a certain metric for a given subbasin), the uniformity hypothesis of the VRHs cannot be rejected, while in other cases this hypothesis fails and the shape of the histograms cannot be interpreted properly (e.g., it lacks clear signatures, such as underdispersion or overdispersion). For example, in the case of the basin outlet, uniformity was verified for Q32h but not for Q1h and Q16h.

Interpretation of the histograms for the UNDER experiment is challenging. However, we suggest two possible explanations. First, as in the experiments of Hamill (2001), where a uniform VRH was obtained despite a nonconsistent ensemble, the VRH uniformity of the ensemble streamflow forecasts in our UNDER experiment may be due to a compensation effect among different ensemble distributions with opposite biases. To show this effect, we divided the 100 events into three subsets on the basis of the values of the STRAIN parameter c used to generate the observed small-scale rainfall events (a sketch of this grouping follows below). In the first subset, we included all the events generated with c < (cmean − 0.1) (which thus have the highest R); in the second subset, those generated with (cmean − 0.1) ≤ c ≤ (cmean + 0.1); and in the third subset, the events with c > (cmean + 0.1) (with the lowest R). Here, cmean (i.e., the mean of the c parameters estimated on the observed events) is the constant value of the parameter used to generate the downscaled QPFs in the UNDER experiment (see Fig. 6).
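A sketch of this grouping (our illustration):

```python
import numpy as np

def split_by_c(c_obs, c_mean, halfwidth=0.1):
    """Indices of the three subsets used to expose the compensation effect:
    low c (highest R), c near c_mean, and high c (lowest R)."""
    c_obs = np.asarray(c_obs)
    low = np.flatnonzero(c_obs < c_mean - halfwidth)
    mid = np.flatnonzero(np.abs(c_obs - c_mean) <= halfwidth)
    high = np.flatnonzero(c_obs > c_mean + halfwidth)
    return low, mid, high
```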

The VRHs of the ensemble streamflows for the three subsets and for the entire set of Nev = 100 events are reported for the CONS and UNDER experiments in Fig. 12 for subbasin 5 and the metric Q1h. In the CONS experiment, the VRHs of the three subsets are uniform. In the UNDER experiment, in contrast, only the second subset has a uniform shape (since its observations were generated with c values close to cmean), whereas the first and the third subsets are affected by negative and positive biases, respectively. The presence of these three patterns causes a compensation effect that leads to a uniform histogram when all the events are pooled. Note that a similar effect obviously also occurs with precipitation, but there the bias effect in the first and third subsets is stronger, and a U-shaped histogram is obtained when the entire set of events is considered (see Fig. 7b).

A second potential factor that affects the shape of the streamflow VRHs of the UNDER experiment is the influence of the basin condition at the beginning of each forecast, as illustrated in Fig. 13. In an operational setting, the antecedent rainfall is a variable that can be used to quantify the initial state since it can be provided by real-time monitoring networks (e.g., rain gauges or weather radar). Figure 13a shows the CONS (top) and UNDER (bottom) ensemble streamflow forecasts at the basin outlet for the event starting at 0100 UTC 1 June 2000 (t* = 480 in the figure). The observed mean areal precipitation is shown at the top of the CONS experiment in the Tc ≅ 12 h prior to t* and in the Tver = 28 h of downscaled (first 16 h) and zero-padding (last 12 h) rainfall. For this event, we observed zero antecedent rainfall and a severe storm during the forecast period. The visual interpretation of the figure suggests that the CONS ensemble streamflows captured the observation much better than the UNDER ensemble. In the UNDER case, the observation did not appear to be drawn from the ensemble distribution, and the rank assumed a high value.

In Fig. 13b, we report the ensemble forecasts made for the 16 h following the event of Fig. 13a (t* = 496). In this case, the antecedent rainfall was very high and had a strong influence on the discharge predicted in the subsequent hours. In both CONS and UNDER cases, each streamflow ensemble member retained the memory of the conditions prior to the forecast and reached the same flood peak. The ensemble precipitation forecasts had low values compared to the antecedent rainfall, so that the use of the underdispersed QPFs did not modify the distribution of the metrics Q1h, Q16h, and Q32h derived from the ensemble streamflow forecasts. As a result, in both cases, the rank of the observation was randomly drawn in the interval [0, 1].

3) OVER experiment

Uniformity of the VRH is also obtained for the OVER experiment (Fig. 10c). The reasons for this finding are explained in section 5c, after the presentation of the CRPS results.

b. The CRPS of the ensemble streamflows

Results from the computation of the CRPS and its decomposition terms over the Nev = 100 events for the basin outlet are summarized in Table 2 (results refer to Qm = Q1h but are similar for the other metrics). The best forecast accuracy, that is, the lowest CRPS, is obtained for the CONS and OVER experiments, which have almost the same CRPS, whereas the accuracy degrades in the UNDER case. Considering that the uncertainty term depends only on the sample observations and is the same for all cases, the lower value of CRPS in the CONS experiment results from lower Reli and higher Resol values.

The computation of the CRPS allows us to further investigate the effect of the antecedent rainfall (AR) on hydrological forecast skill. For this purpose, we sorted the events according to the value of AR and divided them into three subsets, from the lowest to the highest AR. We then computed the CRPS for each subset and experiment. The relation between the CRPS and the mean AR of each subset is reported in Fig. 14. Results reveal that the UNDER experiment returns higher CRPS values than the CONS and OVER cases for the lowest value of the mean AR. The CRPS values are instead similar for the three experiments in the other two subsets, which have increasing antecedent rainfall. This implies that the use of ensemble QPFs affected by errors, as in the UNDER case, worsens streamflow forecast performance as the influence of previous rainfall events decreases (i.e., for low AR).

The skill scores SS of the CRPS, Reli, and Resol terms were computed for all the subbasins and for the outlet, assuming the CONS forecasts as the reference, as discussed in section 4b. Figures 15a and 15b show the relation between subbasin area and the skill scores of the CRPS computed using the metric Q1h in the UNDER and OVER cases. Results for the other metrics Qm (not shown) were similar. In the OVER case, the skill score is slightly worse than in the CONS case, owing to the reasons described in the next subsection. In the UNDER case, the SS of the CRPS decreases, on average, by about 9%, with no particular trend with the basin area.

The reliability and resolution terms are evaluated in Fig. 15b. Results reveal that the use of the different calibration modes does not lead to marked differences in the resolution accuracy measure 1/Resol, with no trend with the basin area. In contrast, the skill score of the reliability term Reli shows a small decrease, with no evident trend with basin area, in the OVER case, whereas it decreases with basin area in the UNDER experiment. Thus, the Reli term of the CRPS decomposition provides additional information about forecast reliability beyond the simple interpretation of the rank histogram, allowing us to establish which forecasts are the most reliable.

Results of the computation of the dimensionless uncertainty are reported in Fig. 15c, which shows that the uncertainty of the sample climatology decreases with the basin scale. This is somewhat expected since the larger the area, the higher the capability of the basin to integrate the variability of the rainfall input. As a result, the distribution of the observed metrics Qobsm is less dispersed in large basins, leading to a lower uncertainty term. The only subbasin that shows an opposite behavior is subbasin 1 (A = 65.05 km²). One reason for this difference is related to the topography of the subbasin, which differs substantially from that of the other subbasins in Table 1 (see Martina and Entekhabi 2006).

c. Overall performance of the CONS, OVER, and UNDER experiments

In our controlled experiments, the CONS experiment has the highest forecast skill by construction. However, the verification tools revealed that the OVER experiment performed similarly to CONS, whereas the UNDER experiment exhibited deteriorated performance. The underlying causes for this result are related to (i) the differences in the intermittency properties of the three kinds of ensemble QPFs and (ii) the smoothing of rainfall variability due to the hydrological processes occurring in the basin.

To clarify this, let us refer to Fig. 6, which illustrates the three calibration modes used to select the parameter c (controlling the intermittency of the downscaled fields) and to generate the three types of ensemble QPFs. For each forecast event k, the value of ck used in the CONS experiment [ck = c(Rk)] differs only slightly from the value ĉk used in the OVER experiment (i.e., the parameter estimated on the single observed event), owing to sampling errors in the estimation of c. The VRHs computed for the consistent and overdispersed ensemble QPFs have different shapes (Figs. 7a and 7c) and are able to discriminate these small sampling errors. Nevertheless, the verification methods applied to the two corresponding ensemble streamflow forecasts are no longer able to detect the small differences associated with the two types of rainfall input. This result can be explained by the smoothing of rainfall variability introduced by the hydrological processes occurring in the basin.

In contrast, the difference between the values of ck adopted in the CONS and UNDER experiments (where ck = cmean) is larger, especially for the events with high and low R. Thus, although the basin smoothing effect is still present, the use of consistent and underdispersed ensemble QPFs leads to ensemble streamflow forecasts that clearly belong to two different distributions, as shown by the higher value of the CRPS in the UNDER experiment.

6. Summary and conclusions

Reducing forecast uncertainty in hydrometeorological prediction systems is a priority involving the efforts of researchers, forecasters, and water managers. This paper contributes to this field by analyzing how errors associated with ensemble QPFs propagate into the ensemble streamflow response produced by hydrological models. For this purpose, we generated an extensive set of synthetic events under controlled conditions, which we assumed as "observations," and performed hydrometeorological hindcasts using consistent, underdispersed, and overdispersed ensemble QPFs. The quality of the ensemble streamflow hindcasts was then evaluated using two verification tools based on the VRH and the CRPS. The results of the numerical experiments can be summarized in the following points:

  1. The verification methods confirm that the ensemble streamflow forecasts generated with consistent QPFs are the most accurate.

  2. A careful selection of hydrometeorological events for verification purposes is extremely important. Indeed, if the verification events used to build the VRH of the streamflow ensemble are chosen on the basis of observed floods, selecting, for instance, only the most severe events, biased histograms may be obtained even if the models produce consistent forecasts. Verification events must instead be selected randomly.

  3. When underdispersed QPFs were used, the VRH of the ensemble streamflow forecasts did not show a clear U-shaped pattern of underdispersion, and the interpretation of the histogram shape is challenging. To address this, we suggested two possible reasons that can explain these results: (i) the compensation effect due to the presence of different distributions with opposite biases in the ensemble forecasts and (ii) the influence of the basin initial condition prior to the forecast.

  4. Hindcasts made with overdispersed QPFs lead to uniform histograms of the corresponding ensemble streamflows and to a CRPS value close to the one obtained with consistent ensembles. This is likely due to the small differences between the intermittency properties of consistent and overdispersed ensemble QPFs, which are attenuated by the basin acting as a spatiotemporal integrator of rainfall variability.

In conclusion, we would like to summarize two important issues. First, on the basis of the numerical experiment results, our study emphasized that, to robustly verify ensemble hydrological forecasts, it is important to use techniques that test the entire distribution of the ensemble forecast members with respect to the observation, such as the VRH and the CRPS. In addition, the experiments permitted us to show some strengths and limitations related to the use of these techniques in hydrological applications. Other methods, including graphical devices (e.g., reliability diagrams and the relative operating characteristic) and simple scalar measures (e.g., the mean absolute error and the Brier score), can be used to further characterize the ensemble forecast attributes.

Second, this study emphasized a number of challenges associated with the verification of hydrological systems that need to be further investigated. In particular, we studied the role of the basin conditions prior to the forecast, which we quantified through the antecedent rainfall (AR), a variable potentially available in operational settings. Our analyses revealed that (i) as AR decreases, the accuracy of streamflow forecasts significantly deteriorates if nonreliable QPFs are used and (ii) results provided by the verification metrics are influenced by the basin initial conditions over the set of verification events and should be carefully interpreted. In this context, it would be desirable to develop or adapt verification metrics that explicitly account for this factor. We acknowledge that the influence of the basin initial conditions should be further analyzed by considering other variables, such as soil moisture, to evaluate antecedent effects on hydrological forecast skill when AR is zero or very low. Future studies should be devoted to exploring this aspect by combining hydrological simulations and coarse soil moisture estimates provided by satellite remote sensors.

Acknowledgments

The authors thank three anonymous reviewers for their in-depth comments and suggestions that significantly helped to improve the quality of the manuscript. The authors thank the Oklahoma Mesonet for providing the data collected at the Westville meteorological station. We acknowledge funding from the Dottorato in Ingegneria del Territorio of the University of Cagliari (Italy) and funding from the NASA Terrestrial Hydrology Program (Grant NNHO5ZDAON). We also acknowledge the New Mexico EPSCoR program for funding the purchase of the computing cluster.

REFERENCES

  • Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. J. Climate, 9 , 15181530.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Anderson, J. L., 1997: The impact of dynamical constraints on the selection of initial conditions on ensemble predictions: Low-order perfect model results. Mon. Wea. Rev., 125 , 29692983.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Arduino, G., Reggiani P. , and Todini E. , 2005: Recent advances in flood forecasting and risk assessment. Hydrol. Earth Syst. Sci., 9 , 280284.

  • Bell, V. A., and Moore R. J. , 2000: The sensitivity of catchment runoff models to rainfall data at different spatial scales. Hydrol. Earth Syst. Sci., 4 , 653667.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bradley, A. A., Schwartz S. S. , and Hashino T. , 2004: Distributions-oriented verification of ensemble streamflow predictions. J. Hydrometeor., 5 , 532545.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Brier, G. W., 1950: Verification of forecasts expressed in terms of probabilities. Mon. Wea. Rev., 78 , 13.

  • Brown, T. A., 1974: Admissible scoring systems for continuous distributions. The Rand Corporation Paper P-5235, 24 pp. [Available from The Rand Corporation, 1700 Main St., Santa Monica, CA 90407-2138].

    • Search Google Scholar
    • Export Citation
  • Cartwright, T. J., and Krishnamurti T. N. , 2007: Warm season mesoscale superensemble precipitation forecasts in the southeastern United States. Wea. Forecasting, 22 , 873886.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Charba, P. J., Reynolds W. D. , McDonald B. E. , and Carter G. M. , 2003: Comparative verification of recent quantitative precipitation forecasts in the National Weather Service: A simple approach for scoring forecast accuracy. Wea. Forecasting, 18 , 161183.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Ciach, G. J., Krajewski W. F. , and Villarini G. , 2007: Product-error-driven uncertainty model for probabilistic quantitative precipitation estimation with NEXRAD data. J. Hydrometeor., 8 , 13251347.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Deidda, R., 2000: Rainfall downscaling in a space-time multifractal framework. Water Resour. Res., 36 , 17791794. doi:10.1029/2000WR900038.

  • Deidda, R., Benzi R. , and Siccardi F. , 1999: Multifractal modeling of anomalous scaling laws in rainfall. Water Resour. Res., 35 , 18531867.

  • Deidda, R., Badas M. G. , and Piga E. , 2004: Space-time scaling in high-intensity Tropical Ocean Global Atmosphere Coupled Ocean-Atmosphere Response Experiment (TOGA-COARE) storms. Water Resour. Res., 40 , W02056. doi:10.1029/2003WR002574.

    • Search Google Scholar
    • Export Citation
  • Ebert, E. E., Damhart U. , Wergen W. , and Baldwin M. , 2003: The WGNE assessment of short-term quantitative precipitation forecasts. Bull. Amer. Meteor. Soc., 84 , 481492.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8 , 985987.

  • Ferraris, L., Rudari R. , and Siccardi F. , 2002: The uncertainty in the prediction of flash floods in the northern Mediterranean environment. J. Hydrometeor., 3 , 714727.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Franz, K. J., Hatmann H. C. , Sorooshian S. , and Bales R. , 2003: Verification of National Weather Service ensemble streamflow predictions for water supply forecasting in the Colorado River basin. J. Hydrometeor., 4 , 11051118.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Frehlich, R., and Sharman R. , 2008: The use of structure functions and spectra from numerical model output to determine effective model resolution. Mon. Wea. Rev., 136 , 15371553.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Friederichs, P., and Hense A. , 2007: Statistical downscaling of extreme precipitation events using censored quantile regression. Mon. Wea. Rev., 135 , 23652378.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Georgakakos, K. P., Seo D-J. , Gupta H. , Schaake J. , and Butts M. B. , 2004: Towards the characterization of streamflow simulation uncertainty through multimodel ensembles. J. Hydrol., 298 , 222241.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gouweleeuw, B. T., Thielen J. , Franchello G. , De Roo A. P. J. , and Buizza R. , 2005: Flood forecasting using medium-range probabilistic weather prediction. Hydrol. Earth Syst. Sci., 9 , 365380.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129 , 550560.

  • Hamill, T. M., and Colucci S. J. , 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125 , 13121327.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., and Colucci S. J. , 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126 , 711724.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hashino, T., Bradley A. A. , and Schwartz S. S. , 2007: Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrol. Earth Syst. Sci., 11 , 939950.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hay, L. E., and Clark M. P. , 2003: Use of statistically and dynamically downscaled atmospheric model output for hydrologic simulations in three mountainous basins in the western United States. J. Hydrol., 282 , 5675.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15 , 559570.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Ivanov, V. Y., Vivoni E. R. , Bras R. L. , and Entekhabi D. , 2004a: Catchment hydrologic response with a fully distributed triangulated irregular network model. Water Resour. Res., 40 , W11102. doi:10.1029/2004WR003218.

  • Ivanov, V. Y., Vivoni, E. R., Bras, R. L., and Entekhabi, D., 2004b: Preserving high-resolution surface and rainfall data in operational-scale basin hydrology: A fully-distributed physically-based approach. J. Hydrol., 298, 80–111, doi:10.1016/j.jhydrol.2004.03.041.

  • Jasper, K., Gurtz, J., and Lang, H., 2002: Advanced flood forecasting in Alpine watersheds by coupling meteorological observations and forecasts with a distributed hydrological model. J. Hydrol., 267, 40–52.

  • Juang, H-M. H., and Kanamitsu, M., 1994: The NMC nested regional spectral model. Mon. Wea. Rev., 122, 3–26.

  • Kobold, M., and Sušelj, K., 2005: Precipitation forecasts and their uncertainty as input into hydrological models. Hydrol. Earth Syst. Sci., 9, 322–332.

  • Kirpich, Z. P., 1940: Time of concentration of small agricultural watersheds. Civ. Eng. (N.Y.), 10, 362.

  • Krajewski, W. F., Ciach, G. J., and Villarini, G. V., 2005: Towards probabilistic quantitative precipitation WSR-88D algorithms: Data analysis and development of ensemble generator model: Phase 4. IIHR-Hydroscience and Engineering, The University of Iowa, Final Rep., 203 pp.

  • Laio, F., and Tamea, S., 2007: Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrol. Earth Syst. Sci., 11, 1267–1277.

  • Lee, M. S., Kuo, Y. H., Barker, D. M., and Lim, E., 2006: Incremental analysis updates initialization technique applied to 10-km MM5 and MM5 3DVAR. Mon. Wea. Rev., 134, 1389–1404.

  • Liu, Z., Martina, M. L. V., and Todini, E., 2005: Flood forecasting using a fully distributed model: Application of the TOPKAPI model to the Upper Xixian Catchment. Hydrol. Earth Syst. Sci., 9, 347–364.

  • Martina, M. L. V., and Entekhabi, D., 2006: Identification of runoff generation spatial distribution using conventional hydrologic gauge time series. Water Resour. Res., 42, W08431, doi:10.1029/2005WR004783.

  • Mascaro, G., Deidda, R., and Vivoni, E. R., 2008: A new verification method to ensure consistent ensemble forecasts through calibrated precipitation downscaling models. Mon. Wea. Rev., 136, 3374–3391.

  • Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.

  • Matheson, J. E., and Winkler, R. L., 1976: Scoring rules for continuous probability distributions. Manage. Sci., 22, 1087–1095.

  • Murphy, A. H., 1971: A note on the ranked probability score. J. Appl. Meteor., 10, 155–156.

  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600.

  • Over, T. M., and Gupta, V. K., 1994: Statistical analysis of mesoscale rainfall: Dependence of a random cascade generator on large-scale forcing. J. Appl. Meteor., 33, 1526–1542.

  • Over, T. M., and Gupta, V. K., 1996: A space-time theory of mesoscale rainfall using random cascades. J. Geophys. Res., 101, 26319–26332.

  • Perica, S., and Foufoula-Georgiou, E., 1996: A model for multiscale disaggregation of spatial rainfall based on coupling meteorological and scaling descriptions. J. Geophys. Res., 101, 26347–26361.

  • Pessoa, M., Bras, R. L., and Williams, E. R., 1993: Use of weather radar for flood forecasting in the Sieve River basin: A sensitivity analysis. J. Appl. Meteor., 32, 462–475.

  • Rezacova, D., Sokol, Z., and Pesice, P., 2007: A radar-based verification of precipitation forecast for local convective storms. Atmos. Res., 83, 211–224.

  • Roulin, E., 2007: Skill and relative economic value of medium-range hydrological ensemble predictions. Hydrol. Earth Syst. Sci., 11, 725–737.

  • Roulin, E., and Vannitsem, S., 2005: Skill of medium-range hydrological ensemble predictions. J. Hydrometeor., 6, 729–744.

  • Schaake, J. C., Hamill, T. M., Buizza, R., and Clark, M., 2007: HEPEX: The Hydrological Ensemble Prediction Experiment. Bull. Amer. Meteor. Soc., 88, 1541–1547.

  • Schertzer, D., and Lovejoy, S., 1987: Physical modeling and analysis of rain and clouds by anisotropic scaling of multiplicative processes. J. Geophys. Res., 92 (D8), 9693–9714.

  • Seo, D-J., Perica, S., Welles, E., and Schaake, J. C., 2000: Simulation of precipitation fields from probabilistic quantitative precipitation forecasts. J. Hydrol., 239, 203–229.

  • Seo, D-J., Herr, H. D., and Schaake, J. C., 2006: A statistical post-processor for accounting of hydrologic uncertainty in short-range ensemble streamflow prediction. Hydrol. Earth Syst. Sci. Discuss., 3, 1987–2035.

  • Skamarock, W. C., 2004: Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Wea. Rev., 132, 3019–3032.

  • Smith, L. A., and Hansen, J. A., 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree. Mon. Wea. Rev., 132, 1522–1528.

  • Swets, J., 1973: The relative operating characteristic in psychology. Science, 182, 990–999.

  • Talagrand, O., Vautard, R., and Strauss, B., 1997: Evaluation of probabilistic systems. Proc. ECMWF Workshop on Predictability, Vol. 125, Reading, United Kingdom, ECMWF, 1–25. [Available from ECMWF, Shinfield Park, Reading, Berkshire RG2 9AX, United Kingdom.]

  • Venugopal, V., Foufoula-Georgiou, E., and Sapozhnikov, V., 1999a: Evidence of dynamic scaling in space-time rainfall. J. Geophys. Res., 104, 31599–31610.

  • Venugopal, V., Foufoula-Georgiou, E., and Sapozhnikov, V., 1999b: A space-time downscaling model for rainfall. J. Geophys. Res., 104, 19705–19721.

  • Venugopal, V., Basu, S., and Foufoula-Georgiou, E., 2005: A new metric for comparing precipitation patterns with an application to ensemble forecasts. J. Geophys. Res., 110, D08111, doi:10.1029/2004JD005395.

  • Verbunt, M., Walser, A., Gurtz, J., Montani, A., and Schär, C., 2007: Probabilistic flood forecasting with a limited-area ensemble prediction system: Selected case studies. J. Hydrometeor., 8, 897–909.

  • Vivoni, E. R., Ivanov, V. Y., Bras, R. L., and Entekhabi, D., 2004: Generation of triangulated irregular networks based on hydrological similarity. J. Hydrol. Eng., 9, 288–302.

  • Vivoni, E. R., Ivanov, V. Y., Bras, R. L., and Entekhabi, D., 2005: On the effects of triangulated terrain resolution on distributed hydrologic model response. Hydrol. Processes, 19, 2101–2122, doi:10.1002/hyp.5671.

  • Vivoni, E. R., Entekhabi, D., Bras, R. L., Ivanov, V. Y., Van Horne, M. P., Grassotti, C., and Hoffman, R. N., 2006: Extending the predictability of hydrometeorological flood events using radar rainfall nowcasting. J. Hydrometeor., 7, 660–677.

  • Vivoni, E. R., Entekhabi, D., Bras, R. L., and Ivanov, V. Y., 2007a: Controls on runoff generation and scale-dependence in a distributed hydrologic model. Hydrol. Earth Syst. Sci., 11, 1683–1701.

  • Vivoni, E. R., Entekhabi, D., and Hoffman, R. N., 2007b: Error propagation of radar rainfall nowcasting fields through a fully distributed flood forecasting model. J. Appl. Meteor. Climatol., 46, 932–940.

  • Vrugt, J. A., Clark, M. P., Diks, C. G. H., Duan, Q., and Robinson, D. A., 2006: Multi-objective calibration of forecast ensembles using Bayesian model averaging. Geophys. Res. Lett., 33, L19817, doi:10.1029/2006GL027126.

  • Warner, T. T., Brandes, E. A., Sun, J., Yates, D. N., and Mueller, C. K., 2000: Prediction of a flash flood in complex terrain. Part I: A comparison of rainfall estimates from radar, and very short range rainfall simulations from a dynamic model and an automated algorithmic system. J. Appl. Meteor., 39, 797–814.

  • Wilby, R. L., Hay, L. E., and Leavesley, G. H., 1999: A comparison of downscaled and raw GCM output: Implications for climate change scenarios in the San Juan River basin, Colorado. J. Hydrol., 225, 67–91.

  • Wilks, D. S., 2004: The minimum spanning tree histogram as a verification tool for multidimensional ensemble forecasts. Mon. Wea. Rev., 132, 1329–1340.

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.

  • Wilks, D. S., and Hamill, T. M., 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390.

  • Wilson, L. J., Burrows, W. R., and Lanzinger, A., 1999: A strategy for verification of weather element forecasts from an ensemble prediction system. Mon. Wea. Rev., 127, 956–970.

  • Wood, A. W., and Schaake, J. C., 2008: Correcting errors in streamflow forecast ensemble mean and spread. J. Hydrometeor., 9, 132–148.

  • Yuan, H., Mullen, S. L., Gao, X., Sorooshian, S., Du, J., and Juang, H-M. H., 2007: Short-range probabilistic quantitative precipitation forecasts over the southwest United States by the RSM ensemble system. Mon. Wea. Rev., 135, 1685–1698.


Fig. 1.

Qualitative description of the attributes of forecast quality and of the ensemble forecast characteristics. Let Q be the predictand variable. (a) Sharpness and bias: Pf1, Pf2, and Pf3 are the PDFs of three ensemble forecasting systems, and 〈Qobs〉 is the average of the observations. Pf2 has the same sharpness as Pf1, but it is positively biased; Pf3 has no bias, but it is less sharp than Pf1. (b) Uncertainty: the PDF of one set of observations has a larger variance, and thus more uncertainty, than that of the other. (c) Resolution: consider the sets of observations obtained after two distinct values, Qf1 and Qf2, have been forecast. A system with high (low) resolution provides distinguishable (flatter) PDFs of the observations following Qf1 and Qf2, respectively. (d) Underdispersion and overdispersion: for each forecast event, the observed value is drawn from Pobs. An underdispersed (overdispersed) forecast is obtained when the ensemble members are drawn from a PDF narrower (wider) than Pobs, since the observation will then fall more likely at one of the tails (in the middle) of the ensemble.
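
To make the dispersion attributes in (d) concrete, the following minimal Python sketch (not part of the original study; the Gaussian distributions, ensemble size, and event count are arbitrary assumptions) shows the opposite signatures that under- and overdispersed ensembles leave in the rank of the observation:

```python
import numpy as np

rng = np.random.default_rng(42)
n_events, n_ens = 1000, 99

def rank_histogram(obs, ens):
    """Counts of the rank of each observation within its ensemble (1..n_ens+1)."""
    n_ens = ens.shape[1]
    # Members strictly below the observation, plus one, give the rank.
    ranks = (ens < obs[:, None]).sum(axis=1) + 1
    return np.bincount(ranks, minlength=n_ens + 2)[1:]

obs = rng.normal(0.0, 1.0, n_events)             # truth drawn from Pobs
under = rng.normal(0.0, 0.5, (n_events, n_ens))  # ensemble PDF too narrow
over = rng.normal(0.0, 2.0, (n_events, n_ens))   # ensemble PDF too wide

print(rank_histogram(obs, under))  # mass piles up in the extreme ranks (U shape)
print(rank_histogram(obs, over))   # mass concentrates in the central ranks (dome)
```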


Fig. 2.

Schematic of the hydrometeorological ensemble forecasting system combining a precipitation downscaling model and a distributed hydrological model. First step: a numerical weather prediction model provides coarse-scale precipitation in a spatiotemporal domain L × L × T over the study region. Second step: a statistical downscaling model generates an ensemble of precipitation fields at the finescale λ × λ × τ. Third step: a distributed hydrological model simulates ensemble hydrographs at multiple stream locations.


Fig. 3.

(a) Baron Fork basin in relation to the Arkansas Red River basin. (b) Basin boundaries with the five nested subbasins, including the USGS streamflow gauges at Baron Fork at Eldon, Peacheater Creek at Christie, and Baron Fork at Dutch Mills. (c) Terrain representation using a triangulated irregular network (TIN) derived from the 30-m digital elevation model (DEM), in which higher triangle density corresponds to more rugged topography (see Vivoni et al. 2007a).


Fig. 4.

An excerpt of the calibration experiment during summer 2000, with precipitation input provided by 4-km, 1-h NEXRAD estimates, illustrating observed and simulated streamflow at the basin outlet at (top) Eldon and at two nested locations: (middle) Peacheater Creek and (bottom) Dutch Mills. The insets in each panel zoom in on two streamflow peaks.


Fig. 5.

Example of the synthetic observed precipitation database for summer 2000. (a) Time series of Ri,2000 (i = 1, … , 138) with time step Δt = 16 h. (b) Time series of the mean areal precipitation (MAP) at Δt = 15 min resolution over the Baron Fork basin, calculated from each precipitation field downscaled from the coarse Ri,2000. (right) Detail for one 16-h-long event.
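
A MAP series of this kind is simply the spatial average of each downscaled field over the basin; a minimal sketch, assuming the fields are stored as a 3D array with a boolean basin mask (both names are hypothetical, not the study's code):

```python
import numpy as np

def mean_areal_precip(fields, basin_mask):
    """Spatially average each gridded rain field over the basin.

    fields     : (n_t, ny, nx) downscaled rain rates at 15-min steps
    basin_mask : (ny, nx) boolean mask of grid cells inside the basin
    """
    return fields[:, basin_mask].mean(axis=1)  # (n_t,) MAP time series
```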


Fig. 6.

Calibration modes derived from the set of synthetic observed rainfall events and used to generate the ensemble QPFs with different characteristics. Asterisks represent the STRAIN parameter values ĉk estimated on each single event k, plotted vs the corresponding coarse-scale mean rainfall Rk. The continuous line is the calibration relation c = c(R) [see Eq. (2)] fitted to the ĉk, and the dashed line is the mean cmean of the ĉk. For each forecast event k, three ensembles of QPFs are generated by selecting the parameter ck according to three different calibration modes: the functional based [ck = c(Rk)], leading to a consistent ensemble; the event based (ck = ĉk), leading to an overdispersed ensemble; and the mean based (ck = cmean), leading to an underdispersed ensemble. For more details, see Mascaro et al. (2008).
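
A minimal sketch of the three calibration modes described above; the log-linear form of c(R) is a hypothetical stand-in for Eq. (2), whose actual form is given in Mascaro et al. (2008), and all names are illustrative:

```python
import numpy as np

def select_parameter(R_k, c_hat_k, R_all, c_hat_all, mode):
    """Choose the downscaling parameter ck for forecast event k.

    R_k              : coarse-scale mean rainfall of the event
    c_hat_k          : parameter estimated on the event itself
    R_all, c_hat_all : per-event estimates from the calibration set
    """
    if mode == "functional":  # ck = c(Rk): consistent ensemble
        # Hypothetical stand-in for Eq. (2): a log-linear fit c = a + b log R.
        b, a = np.polyfit(np.log(R_all), c_hat_all, 1)
        return a + b * np.log(R_k)
    if mode == "event":       # ck = estimate on the event itself: overdispersed
        return c_hat_k
    if mode == "mean":        # ck = cmean over the calibration set: underdispersed
        return float(np.mean(c_hat_all))
    raise ValueError(f"unknown calibration mode: {mode}")
```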


Fig. 7.

Results of the ensemble QPF verification: (a) consistent ensembles generated with the functional-based calibration mode, (b) underdispersed ensembles generated with the mean-based calibration mode, and (c) overdispersed ensembles generated with the event-based calibration mode. The histograms are built with the exceedance probability of the precipitation threshold i* = 50 mm h−1 as univariate predictand and are plotted in 10 bins that group the 100 ranks. The horizontal lines are the 5%, 50%, and 95% quantiles of a uniform distribution. Each histogram is plotted with the empirical CDF of the number of ensemble members tied with the observation, which accounts for the random assignment of the rank. Since this number was zero for all the events in each case, all the ranks were always unequivocally assigned [see Mascaro et al. (2008) for further details].
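
The rank computation behind these histograms can be sketched as follows (illustrative only: the exceedance-fraction predictand and the random tie-breaking follow the caption's description, while function names and details are assumptions):

```python
import numpy as np

def exceedance_probability(field, i_star=50.0):
    """Fraction of finescale space-time cells with intensity above i_star."""
    return float((field > i_star).mean())

def rank_of_observation(obs_field, ens_fields, i_star=50.0, rng=None):
    """Rank of the observed exceedance probability among the Nens + 1 values.

    Ties with ensemble members are broken by random placement, as in the VRH.
    """
    rng = rng if rng is not None else np.random.default_rng()
    p_obs = exceedance_probability(obs_field, i_star)
    p_ens = np.array([exceedance_probability(f, i_star) for f in ens_fields])
    below = int((p_ens < p_obs).sum())
    ties = int((p_ens == p_obs).sum())
    return below + 1 + rng.integers(0, ties + 1)  # random offset among ties
```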


Fig. 8.

Ensemble streamflow forecast verification method. (a) Time series of observed streamflow with duration Thydro, from which a set of Nev intervals (each Tver hours long) is randomly selected. (b) Observed and Nens ensemble hydrographs for a generic event k of length Tver, from which (Nens + 1) values of a metric Qm are computed. (c) Empirical CDF of the sorted vector containing the (Nens + 1) values of Qm extracted from the hydrographs in (b), used to determine the nonexceedance probability pk of the observed value of Qm.
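
Steps (b) and (c) can be sketched compactly as below (a generic metric such as peak flow is assumed; the paper's exact plotting-position and tie conventions may differ):

```python
import numpy as np

def nonexceedance_probability(q_obs_hydro, q_ens_hydros, metric=np.max):
    """Empirical nonexceedance probability pk of the observed metric value.

    q_obs_hydro  : (n_t,) observed hydrograph over the verification window
    q_ens_hydros : (n_ens, n_t) ensemble hydrographs over the same window
    metric       : e.g., peak flow (np.max) or a mean flow over 1, 16, or 32 h
    """
    qm_obs = metric(q_obs_hydro)
    qm_ens = np.apply_along_axis(metric, 1, q_ens_hydros)
    pooled = np.sort(np.append(qm_ens, qm_obs))  # the (n_ens + 1) values of Qm
    # Fraction of pooled values that do not exceed the observed metric.
    return np.searchsorted(pooled, qm_obs, side="right") / pooled.size
```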


Fig. 9.

Event-based approach applied in the synthetic forecast experiments. (a) Rainfall input (plotted as MAP) for the hydrological model. Precipitation is observed up to t*, the time when the coarse precipitation forecast R is provided. The forecasted precipitation is downscaled in the interval [t*, t* + T]; after t* + T, zero rainfall (zero padding) is assumed up to t* + Tver. (b) Ensemble and observed hydrographs returned by the hydrological model. Hydrographs in the interval Tver = T + Tc are used to calculate the rank of the observation according to the verification procedure.
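
The forcing in (a) can be assembled as sketched below, simplified to MAP time series (the model is actually forced with spatially distributed fields; array names and shapes are assumptions):

```python
import numpy as np

def build_rainfall_input(map_obs, map_fcst_ens, n_pad):
    """Rainfall forcing for one synthetic forecast experiment.

    map_obs      : (n_obs,) observed MAP up to the issue time t*
    map_fcst_ens : (n_ens, n_fcst) downscaled ensemble MAP over [t*, t* + T]
    n_pad        : zero-rain steps appended from t* + T up to t* + Tver
    """
    n_ens = map_fcst_ens.shape[0]
    obs_block = np.tile(map_obs, (n_ens, 1))  # same observed past for each member
    pad_block = np.zeros((n_ens, n_pad))      # zero padding after t* + T
    return np.concatenate([obs_block, map_fcst_ens, pad_block], axis=1)
```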


Fig. 10.

Results of the ensemble streamflow verification. VRHs are built from the ensemble streamflows obtained when the tRIBS model is forced with consistent, underdispersed, and overdispersed ensemble QPFs. Results are shown for the basin outlet, and VRHs are computed using the metrics Q1h, Q16h, and Q32h. The 100 ranks are grouped in 10 bins, and the horizontal lines represent the 5%, 50%, and 95% quantiles of a uniform distribution.


Fig. 11.

Effect on the VRH of an a posteriori selection of events based on the value of the observations. Two histograms were built using the events with the (left) 50 lowest and (right) 50 highest observed values, for the (top) basin outlet and (bottom) subbasin 5.


Fig. 12.