1. Introduction
Large-scale hydrological models have proved to be valuable tools for assessing fluctuations in terrestrial water stores and fluxes on continental and global scales (e.g., Dirmeyer 2011; Dirmeyer et al. 2006; Milly et al. 2005). To date, models describing the terrestrial water balance have been developed by different communities, and parallel terminologies and modeling philosophies have emerged (Haddeland et al. 2011). Among the most commonly used terms are global hydrology models (GHMs), focusing on closing the water balance for the purpose of water resource assessment, and land surface models (LSMs), which were historically developed to provide lower boundary conditions for atmospheric circulation models with a focus on the surface water and energy balances. However, many models (both GHMs and LSMs) share essentially the same conceptualization of the water fluxes (Haddeland et al. 2011). Thus, all models that resolve the terrestrial part of the water cycle at global and continental scales will in the following be referred to as large-scale hydrological models.
Various efforts have been made to evaluate large-scale hydrological models, including macroscale studies that compare observed and modeled continental river discharge (e.g., Balsamo et al. 2009; Decharme and Douville 2007; Gerten et al. 2004; Hagemann et al. 2009), as well as studies with relatively detailed spatial and temporal resolution on continental and global scales (e.g., Döll et al. 2003; Hunger and Döll 2008; Troy et al. 2008; Widén-Nilsson et al. 2009; Stahl et al. 2011). Generally the focus is on evaluating a single model, possibly with a new representation of certain processes. Another approach is followed by large model intercomparison exercises that focus less on model evaluation by comparison to observations, and rather more on identifying differences in model dynamics. Examples are the Project for Intercomparison of Land Surface Parameterization Schemes (PILPS) (Henderson-Sellers et al. 1995), the Global Soil Wetness Project (GSWP) (Oki et al. 1999; Dirmeyer et al. 2006; Dirmeyer 2011), and the Water Model Intercomparison Project (WaterMIP) (Haddeland et al. 2011). In general, these studies conclude that there are large differences between the models, which may be caused by incomplete process understanding, different parameter estimates, and imperfect atmospheric forcing data.
Several multimodel evaluation studies not only compare individual models to observations, but also investigate the behavior of the mean of all models, commonly referred to as the ensemble mean. Being widely applied in atmospheric science (e.g., Reichler and Kim 2008; Hagedorn et al. 2005; Palmer et al. 2004), so-called ensemble techniques are also increasingly used in the evaluation of large-scale hydrological models. So far most studies that employed ensemble techniques in the context of large-scale hydrological modeling have focused on the mean annual cycle of monthly discharge from large, continental-scale river basins. Generally these studies show that the uncertainty in river discharge introduced by the use of different atmospheric forcing models (Nohara et al. 2006; Hagemann and Jacob 2007) and different land surface schemes (Materia et al. 2010) can be reduced by ensemble techniques. Several studies have compared soil moisture simulations from the GSWP to monthly observations from a global observation network (e.g., Gao and Dirmeyer 2006; Guo and Dirmeyer 2006; Guo et al. 2007). These studies assessed, among other aspects, the ability of the ensemble members to capture mean values, the phasing of the annual cycle, and the interannual variability, showing that the ensemble mean was closer to the observations than most participating models (Gao and Dirmeyer 2006; Guo and Dirmeyer 2006; Guo et al. 2007).
Relatively few studies evaluated large-scale hydrological models with respect to their ability to capture hydrological extremes, and consequently no standard procedure has been established. Most available studies have focused on the analysis of daily river discharge, partly because daily records span long observational time windows and partly because the daily resolution increases the number of observations, which renders model validation more reliable. Lehner et al. (2006), for example, evaluated the ability of the Water—Global Analysis and Prognosis (WaterGAP) model to capture the average magnitude and return periods of annual flood and drought statistics in Europe based on daily data. They concluded that the model captured average annual low and high flows reasonably well, but had a tendency to overestimate the return periods of extreme events. Similarly, Hirabayashi et al. (2008) compared the estimated return periods of seven disastrous floods around the globe to the results from a global offline simulation with daily resolution and concluded that the return period of the simulated events compared reasonably well to the observed values. However, Hirabayashi et al. (2008) also pointed out that a statistically reliable evaluation of model performance with respect to extremes on large (global) scales is hampered by the scarcity of long-term observations. Recently, Feyen and Dankers (2009) compared the return periods of selected low-flow statistics derived from observed and simulated daily data from rivers across Europe, highlighting deficiencies of the simulations in the frost season. In an accompanying study, Dankers and Feyen (2009) reported that the simulations captured peak flows from large river basins quite well, whereas the performance was at times poor in small catchments. It should be noted that all the above studies are based on data from the Global Runoff Data Centre (GRDC; http://grdc.bafg.de/), which provides a collection of observations from relatively large river basins.
The main focus of the studies summarized above was to investigate the impacts of climate change on hydrological variables. Therefore, in these studies model evaluation was only regarded as a prerequisite to further analysis and thus often received little attention. In contrast, Stahl et al. (2011) focused solely on the evaluation of simulated runoff (7-day running mean) from a regional climate model in Europe with respect to 19 different anomaly levels, ranging from low to high flows. Comparing event dynamics and interannual variability, they found that agreement was lowest for dry anomalies and that model performance was best for moderately wet anomalies.
Studies evaluating multimodel ensembles have focused mainly on mean water balance components and rarely on hydrological extremes. This is partially due to limits from the temporal resolution of the commonly stored summary statistics (e.g., monthly means) and relatively short integrations that preclude a proper analysis of extremes. To overcome such limitations, a major effort was made within the European Framework Project Water and Global Change (WATCH; www.eu-watch.org) to create a multimodel ensemble of large-scale hydrological models with summaries available on a daily resolution. The main objective of this study is to get first insights into the ability of the WATCH multimodel ensemble to capture hydrological extremes, with respect to both their magnitude and interannual variability on a large, continental scale.
The observed data used in this model evaluation exercise comprise time series from a large number of small, nearly natural catchments in Europe that are not nested (see section 2b for details). In contrast to discharge from large river basins, which are often strongly influenced by human activities (Döll et al. 2009), observations from small undisturbed catchments are more likely to represent the natural system behavior. Further, discharge observations from large rivers are bound to suffer from small sample sizes, as there are only a few continental-scale drainage basins. A small sample size increases the risk that observation errors lead to biased results in the model evaluation. It is also interesting to note that the mathematical structure underlying individual grid cells in large-scale models is often comparable to the model structure of so-called lumped catchment models, which are commonly used to model streamflow from small catchments (see Clark et al. 2008, 2011b for a comprehensive overview). One example from the current ensemble is the Global Water Availability Assessment (GWAVA) model (Meigh et al. 1999), which uses the commonly applied lumped Probability Distributed Model (Moore 2007, 1985) to parameterize gridcell processes.
However, the use of streamflow observations from small catchments to evaluate large-scale hydrological models raises several issues. Streamflow observations are prone to measurement errors (e.g., Di Baldassarre and Montanari 2009) that are known to affect the calibration of hydrological models (e.g., Reitan and Petersen-Øverleir 2009; McMillan et al. 2010) and consequently also the performance assessments of large-scale hydrological models. Strategies to incorporate these observational errors into predictive uncertainty, however, are not well established and are subject to ongoing research (e.g., Kavetski et al. 2006; Renard et al. 2010). The model parameters at each grid cell, derived from large-scale maps, are unlikely to perfectly characterize the true catchment properties and this may result in large discrepancies between observed and simulated runoff at the gridcell scale. It is important to note that model parameters such as vegetation and soil properties exhibit high spatial variability (Duan et al. 2006). Maps used to derive model parameters are therefore highly uncertain and parameter estimates based on different map sources may hence result in significant differences in simulated system behavior (Teuling et al. 2009).
One approach to minimize the effect of the large uncertainty in model parameters at the gridcell scale is to focus on spatially aggregated system behavior. For example, in atmospheric sciences it is common to investigate time series of variables that have been averaged over large spatial areas. One example is the assessment of time series of mean global temperature (e.g., Hansen et al. 2006; Macadam et al. 2010). This study adopts this strategy as it aligns with the main objective, which is to evaluate the ability of the WATCH multimodel ensemble to capture key aspects of the interannual variability of runoff in Europe. Importantly, we use data from the level of the grid cell and small catchments, and then aggregate to the larger scale, rather than just using data from continental-scale catchments, for the reasons outlined above.
The remainder of this article is organized as follows: first, the multimodel ensemble of nine large-scale hydrological models and the observed streamflow data are introduced. In the methods section, statistical summaries that represent low, mean, and high flows over large (continental) scales are defined, followed by the introduction of three performance metrics. The results of the analysis are then presented and discussed. The paper concludes with comments on the ability of the multimodel ensemble to simulate European, large-scale hydrology, with special emphasis on low and high river flows.
2. Models and observations
a. Individual models and ensemble mean
Overview of the participating models and their main characteristics. Models written in italic are classified as LSMs. Surface runoff (Qs) is in all instances modeled as saturation or infiltration excess or both; the following abbreviations refer to approaches to parameterize subgrid variability: ARNO (Todini 1996), improved ARNO (Dümenil and Todini 1992), and Probability Distributed Model (PDM) (Moore 1985). Subsurface runoff (Qsb) is either modeled as a function of soil moisture Qsb = Qd = f(Ssoil) or groundwater Qsb = f(Sgw), where f(S) denotes linear or nonlinear model specific functions (“Richards”: N-layer approximation of Richards equation). Adapted from Haddeland et al. (2011).
Brief descriptions of the nine large-scale hydrological models.
The structure underlying most of the models is illustrated in Fig. 1, indicating the different conceptual storages and fluxes. Note that not every model considers all elements of this generalized architecture and the models differ in their representation of the processes.
Simplified conceptualization of state (storage) and flux variables involved in runoff generation. Not all variables are considered in each model. See Table 1 for an overview of the models.
Citation: Journal of Hydrometeorology 13, 2; 10.1175/JHM-D-11-083.1
Despite large differences in the description of subsurface processes, all models simulate Qs (water leaving the grid cell on the surface) and Qsb (water leaving the grid cell below the surface). In Fig. 1, Qsb represents the outflow from groundwater storage (Sgw); however, not all models simulate Sgw. In such cases, the water draining from the lowest soil layer (Qd) is used to represent subsurface runoff (Qsb = Qd; Table 1).
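The generic storage–flux architecture just described can be illustrated with a minimal two-store bucket model. This is a hedged sketch of the structure in Fig. 1, not the formulation of any participating model; the parameter values and process choices (saturation excess for Qs, linear reservoirs for drainage and groundwater outflow) are illustrative assumptions only.

```python
def step(s_soil, s_gw, precip, pet, s_max=150.0, k_d=0.05, k_gw=0.01):
    """One daily step of a minimal two-store runoff model (all values in mm).

    s_soil : soil moisture storage        precip : precipitation (mm/day)
    s_gw   : groundwater storage          pet    : potential ET (mm/day)
    Parameter values are illustrative, not taken from any WATCH model.
    Returns updated stores and the fluxes (Qs, Qsb) in mm/day.
    """
    # Saturation excess: rain that does not fit into the soil store
    # leaves the cell as surface runoff Qs
    infiltration = min(precip, s_max - s_soil)
    q_s = precip - infiltration
    s_soil += infiltration

    # Evapotranspiration limited by available soil moisture
    et = min(pet * s_soil / s_max, s_soil)
    s_soil -= et

    # Drainage from the soil (Qd) recharges the groundwater store Sgw
    q_d = k_d * s_soil
    s_soil -= q_d
    s_gw += q_d

    # Linear groundwater outflow gives subsurface runoff Qsb = f(Sgw);
    # models without Sgw would instead set Qsb = Qd directly
    q_sb = k_gw * s_gw
    s_gw -= q_sb
    return s_soil, s_gw, q_s, q_sb
```

Dropping the groundwater store and routing `q_d` straight to `q_sb` reproduces the Qsb = Qd case listed in Table 1.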
The simulation setup is, except for the time window and the temporal resolution of the stored output, identical to that described by Haddeland et al. (2011). Model runs for the time window 1963–2000, with output data available at daily time steps, were considered. The runs were preceded by a spinup period of 5 yr. All model simulations were carried out on the 0.5° grid defined by the Climate Research Unit (CRU) of the University of East Anglia global land mask. No effort was made to harmonize model parameters, but the models were forced by the same meteorological data—the so-called WATCH Forcing Data (WFD; Weedon et al. 2010, 2011). The WFD are based on the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40; Uppala et al. 2005) interpolated to the 0.5° grid defined by the CRU land mask and then adjusted for elevation differences. Air temperature is bias corrected and shortwave radiation adjusted according to cloud cover and aerosol loading using the CRU data (Mitchell and Jones 2005; New et al. 1999, 2000). Precipitation is bias corrected using the Global Precipitation Climatology Centre full product (GPCCv4) data (Rudolf and Schneider 2005; Schneider et al. 2010; Fuchs 2009) and undercatch corrected (Adam and Lettenmaier 2003). The simulations assumed “naturalized” conditions, which means that direct anthropogenic effects such as dams and water abstraction were not included. This is consistent with the use of observations from undisturbed catchments.
Besides the runoff simulations of individual models, this study also analyzes the arithmetic mean of the runoff simulations of the multimodel ensemble. This mean will in the following be referred to as the “ensemble mean” (or ENSEMBLE) and is treated as a separate model throughout the analysis.
b. Observations
Daily streamflow series from 426 near-natural and spatially independent headwater catchments across Europe were considered. The records cover the time period 1963–2000 and originate from the European Water Archive (EWA)—a database assembled by the European Flow Regimes from International Experimental and Network Data (Euro-FRIEND; http://ne-friend.bafg.de/servlet/is/7413/) project. The EWA is accessible to active members of FRIEND and stored at the GRDC, which also manages data requests. The EWA dataset was recently updated (Stahl et al. 2008) and further complemented by partners from the WATCH project and is described in detail in Stahl et al. (2010). Observed streamflow (m3 s−1) was converted into equivalent runoff rates (mm day−1), which we will refer to as observed runoff. Catchment boundaries and mean catchment elevation, based on a high-resolution digital elevation model, were derived from the pan-European river and catchment database Catchment Characterisation and Modeling 2 (CCM2; Vogt et al. 2007). The majority of the catchments have an area that is considerably smaller (median catchment size 258 km2) than the size of the 0.5° model grid cells (Fig. 2). The size of a grid cell varies, depending on the latitude, between 1065 km2 (at 70°N) and 2387 km2 (at 39.5°N). To compare observations and simulations, each gauging station was assigned to the corresponding grid cell and, in cases with more than one station per grid cell, the area-weighted average of the series was used. This procedure resulted in 298 grid cells with observed runoff series. The spatial density and extent of observed runoff were limited by data availability, with most stations located in central Europe. Figure 3 shows the spatial distribution of the grid cells as well as the boundaries of the corresponding catchments. The median elevation of the catchments is 525 m MSL and the average elevation of the selected grid cells is 439 m MSL.
This systematically lower gridcell elevation may be a result of small headwater catchments being located at higher altitudes, while the grid cells reflect the average elevation of larger areas.
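The two preprocessing steps described above, converting streamflow to equivalent runoff and area-weighting multiple stations within one grid cell, can be sketched as follows. The function names are ours, not from the WATCH processing chain.

```python
def discharge_to_runoff(q_m3s, area_km2):
    """Convert streamflow (m^3 s^-1) to an equivalent runoff rate (mm day^-1)."""
    # m^3/s -> m^3/day, divide by catchment area in m^2, convert m to mm
    return q_m3s * 86400.0 / (area_km2 * 1e6) * 1000.0

def gridcell_runoff(runoff_series, areas_km2):
    """Area-weighted average of several catchment runoff series (mm/day)
    assigned to the same 0.5 degree grid cell."""
    total = sum(areas_km2)
    n = len(runoff_series[0])
    return [sum(a * r[i] for a, r in zip(areas_km2, runoff_series)) / total
            for i in range(n)]
```

For example, `discharge_to_runoff(1.0, 86.4)` returns 1.0 mm/day, since 1 m³/s over 86.4 km² spreads 86 400 m³ of water over the catchment each day.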
Histogram of catchment areas. The vertical dashed lines indicate the range of the size of a 0.5° × 0.5° grid cell between the extremes at the lowest and highest latitudes of the spatial domain.
Map showing grid cells with observations and associated catchment boundaries.
3. Methods
Observed and modeled daily runoff series were aggregated into time series of annual runoff percentiles at five different percentile levels. Low flows are characterized by series of annual 5th percentiles (Q5), mean flows by series of annual 50th percentiles (Q50; i.e., annual medians), and high flows by series of annual 95th percentiles (Q95). The notion of percentiles follows the statistical convention commonly used in the United States (representing cumulative or nonexceedance frequencies) and not the hydrological one commonly used in Europe (representing exceedance frequencies). Extreme high and low values are often prone to measurement errors (Laaha and Blöschl 2007) and, therefore, this study excludes annual maximum and minimum values. To provide insights into the entire flow range, two additional percentile series were introduced to characterize moderately low (Q25) and high (Q75) values. It can be argued that this set of five percentile series is sufficient to characterize the overall flow range, as previous results have demonstrated that the information gain by introducing additional percentile levels is limited for continental-scale analysis (Gudmundsson et al. 2011a). This procedure resulted in a set of five time series of annual runoff percentiles for both observed and modeled runoff in each grid cell. The time series from the individual grid cells were then aggregated using the median to obtain one time series for each runoff percentile, resulting in a total of five time series of average percentile values for both simulated and observed values.
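The construction of the five annual percentile series and their spatial aggregation by the median can be sketched as follows. This is a simplified illustration using NumPy; calendar details, missing data, and the station-to-cell assignment are omitted.

```python
import numpy as np

PERCENTILES = [5, 25, 50, 75, 95]  # Q5 ... Q95, nonexceedance convention

def annual_percentiles(daily, years):
    """Annual runoff percentiles from a daily series.

    daily : 1-D array of daily runoff (mm/day)
    years : same-length array giving the year of each day
    Returns {year: {percentile_level: value}}.
    """
    out = {}
    for yr in np.unique(years):
        vals = daily[years == yr]
        out[yr] = {p: np.percentile(vals, p) for p in PERCENTILES}
    return out

def aggregate_cells(cell_series):
    """Median across grid cells for each year and percentile level,
    yielding one continental-scale series per runoff percentile."""
    years = sorted(cell_series[0])
    return {yr: {p: float(np.median([c[yr][p] for c in cell_series]))
                 for p in PERCENTILES} for yr in years}
```

Applying `annual_percentiles` to every observed and every simulated gridcell series, then `aggregate_cells` over all 298 cells, yields the five observed and five modeled series compared in the results.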
Finally, the relative merits of the individual models were assessed by ranking their performance (e.g., Gleckler et al. 2008; Macadam et al. 2010). A ranking procedure allows for an easy combination of several performance metrics, even if they have different scales (such as R2, Δμ, and Δσ). However, a ranking will not allow insights into the “absolute performance” of the models; rather it allows the models to be ordered from the one that is on average closest to the observations (rank 1) to the most distant one.
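The three performance metrics can be computed as below. The exact definitions are our assumptions inferred from the Table 3 caption: R2 as the squared Pearson correlation, and Δμ and Δσ as relative differences in mean and standard deviation, each with an optimum of 1, 0, and 0, respectively.

```python
import numpy as np

def performance_metrics(obs, sim):
    """Compare an observed and a simulated annual percentile series.

    Definitions assumed, not quoted from the paper:
      R2 = squared Pearson correlation (optimum 1)
      d_mu = relative difference in mean (optimum 0)
      d_sigma = relative difference in standard deviation (optimum 0)
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2                      # temporal agreement
    d_mu = (sim.mean() - obs.mean()) / obs.mean()              # relative bias
    d_sigma = (sim.std(ddof=1) - obs.std(ddof=1)) / obs.std(ddof=1)  # amplitude error
    return r2, d_mu, d_sigma
```

A simulation with the right timing but doubled magnitude would score R2 = 1 while Δμ = Δσ = 1, which is why all three metrics enter the ranking.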
To obtain an overall ranking, the values of the three performance metrics for each model and runoff percentile were summarized in Table 3, where the columns represent the models and the rows the performance metrics derived for each runoff percentile. First, the values of each row were ranked such that the model closest to the optimal value (0 for Δμ and Δσ; 1 for R2) gets rank 1, the next model rank 2, and so on. This procedure results in a new matrix of ranks, which is then summarized to achieve an overall ranking: the sum of ranks for each model (columns) is determined, and the models are ordered from the best-performing model (lowest rank sum) to the model with the lowest performance (highest rank sum). Finally, the rank sums are replaced by the overall ranks.
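The two-step rank-sum procedure can be sketched as follows. One detail is our assumption: ties are broken by model order, which the text does not specify.

```python
def overall_ranking(metric_matrix, optima):
    """Overall model ranking from a matrix of performance metrics.

    metric_matrix : list of rows, each holding one metric value per model
    optima        : optimal value for each row (0 for relative differences,
                    1 for the correlation coefficient)
    Returns one overall rank per model (1 = best). Ties are broken by
    model order, an assumption not stated in the paper.
    """
    n_models = len(metric_matrix[0])
    rank_sums = [0] * n_models
    for row, opt in zip(metric_matrix, optima):
        # Rank models within each row by distance to the optimal value
        order = sorted(range(n_models), key=lambda m: abs(row[m] - opt))
        for rank, model in enumerate(order, start=1):
            rank_sums[model] += rank
    # Order models by rank sum, then replace sums by overall ranks
    final_order = sorted(range(n_models), key=lambda m: rank_sums[m])
    overall = [0] * n_models
    for rank, model in enumerate(final_order, start=1):
        overall[model] = rank
    return overall
```

Transposing `metric_matrix` so that columns hold percentiles instead of models gives the percentile ranking described next.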
Model performance for the five runoff percentiles as measured by the correlation coefficient (R2), the relative difference in mean (Δμ), and the relative difference in standard deviation (Δσ). The best models in each row are in boldface. The column labeled percentile median provides median model performance for each runoff percentile. The three rows labeled model median give median performance for each model. The last row ranks the performance of the models. The last column ranks the overall model performance for a given runoff percentile.
Similarly, the percentiles can be ranked by reorganizing the initial matrix in such a way that the columns represent the runoff percentiles and the rows represent the performance of each model. The top-ranked percentile is then the one that is overall best reproduced by the models. A similar set of performance metrics was used in a parallel study (Gudmundsson et al. 2011c, manuscript submitted to Water Resour. Res.) to quantify the models’ ability to capture the mean annual cycle of runoff with respect to different hydroclimatic regimes as well as the uncertainty of the associated spatial patterns.
4. Results
Figure 4 displays the spatially aggregated time series of observed and modeled runoff percentiles and Fig. 5 shows the mean value of each series. Overall, the models capture the temporal evolution of the interannual variability of observed runoff well. However, there are differences in the mean value as well as in the amplitude of the annual percentile series. For the highest runoff percentile (Q95), the models scatter evenly around the observed values. For all other runoff percentiles, most of the models underestimate the observations and there are, in some instances, also pronounced differences in the amplitude of the series. For example, H08 has a lower amplitude in the Q75 series than any other model, and some models [the hydrological model of the Max Planck Institute for Meteorology (MPI-HM) and Lund–Potsdam–Jena managed Land (LPJmL)] have almost constant values throughout the years for the two lowest runoff percentiles (Q5 and Q25). The LSM Minimal Advanced Treatments of Surface Interaction and Runoff (MATSIRO) is the only model that consistently overestimates the three lowest percentile levels.
Annual time series of observed and modeled runoff percentiles across Europe. Note the different scales of the y axes.
Mean value of the runoff percentiles series (see Fig. 4).
Table 3 quantifies the differences between the observed and modeled runoff percentiles based on the three performance metrics R2, Δμ, and Δσ, and Fig. 6 summarizes the range of the performance metrics for each of the five runoff percentiles. The column “percentile median” in Table 3 provides the median of each performance metric for the different runoff percentiles and corresponds to the horizontal bars in Fig. 6. Numerical values reported in the following paragraph refer to these values if not specified differently. The correlation coefficients (R2), quantifying the similarity of the temporal evolution of observed and modeled runoff percentiles, are on average highest for Q95.
Comparison of model performance for the different runoff percentiles. Performance is measured by (left to right) correlation (R2), relative bias (Δμ), and the relative difference in standard deviation (Δσ) for the five runoff percentiles. (Bar: median, box: interquartile range, and whiskers: range).
Figure 7 summarizes the performance of the individual models. The rows “model median” in Table 3 provide the median performance for each model averaged over all runoff percentiles and correspond to the bars in Fig. 7. The numbers reported in this paragraph refer to these median values if not stated differently. On average the Joint U.K. Land Environment Simulator (JULES) captures the interannual variability of the observed Q95 series best.
As in Fig. 6, but for the participating models.
The last column in Table 3 ranks the ability of the models (including the ENSEMBLE) to reproduce the interannual dynamics of European runoff percentiles. The overall model performance decreases systematically from high (Q95; rank 1) to low (Q5; rank 5) percentiles, implying that the models capture annual high flows better than annual low flows. Note, however, that this ranking is not strictly monotonic (anomaly in the ordering of Q50 and Q75). Interestingly, the tendency for poorer model performance for the low runoff percentiles is not only manifested in a drop in average model performance, but also by an increasing spread in Δμ and Δσ (Fig. 6).
Table 3 also shows the ranking of the models themselves. The ENSEMBLE ranks number one, followed by GWAVA, JULES, and HTESSEL. A careful inspection of Table 3 confirms that the three highest-ranking models are closest to the observations with respect to the correlation coefficient (R2) and the relative difference in mean (Δμ). For the relative difference in standard deviation (Δσ), however, this is not strictly the case, and more midranking models exhibit a closer similarity to the observations. In general, no single performance metric could be identified that clearly explains why some models perform better than others. There is rather a tendency for a uniform decrease in all three criteria from the highest- to the lowest-ranked model.
5. Discussion
The comparison of the five aggregated time series of observed and simulated annual runoff percentiles not only provided insights into the ability of individual models to capture the magnitude and dynamics of annual runoff percentiles, but also allowed for an assessment of the overall performance of the multimodel ensemble. A good model performance with respect to interannual variability of all runoff percentiles (as reflected by relatively high R2) is most likely related to the fact that the dynamics of annual runoff closely follow those of the atmospheric drivers. Shorthouse and Arnell (1997, 1999), for example, have demonstrated the coupling between atmospheric oscillation indices and river flow in Europe, and recently Gudmundsson et al. (2011b) showed that the dominant space–time patterns of European low-frequency runoff variability (variability on time scales longer than 1 yr) were closely related to the corresponding patterns of precipitation and temperature. This dependence of runoff on atmospheric variability suggests that simulated runoff on interannual time scales may be more sensitive to the data product used to force the models than to the parameterization of terrestrial hydrological processes. In fact, it has been previously demonstrated that simulated river discharge from continental-scale basins is highly sensitive to the choice of forcing data (e.g., Nasonova et al. 2011; Materia et al. 2010; Gerten et al. 2008; Hagemann and Jacob 2007).
The models’ ability to capture the interannual variability was contrasted by a systematic underestimation of observed runoff in Europe. In a global analysis of discharge from continental-scale river basins (e.g., Amazon, Congo, and Lena) using a multimodel ensemble comparable to the ensemble used in this study, Haddeland et al. (2011) did not find similar consistent patterns of underestimation. They rather found large regional differences, with a tendency to underestimate observed discharge from river basins at high latitudes. In principle, a bias in the mean can either be attributed to biased atmospheric input variables (e.g., Nasonova et al. 2011; Teutschbein and Seibert 2010) or to a too-rapid depletion of stores through modeled evapotranspiration. The consistency of the underestimation in the present study, however, points toward biased forcing data, for example because local orographic effects on precipitation cannot be resolved within large grid cells of atmospheric reanalysis or interpolated data products. It is, for example, well documented that the ERA-40 data underlying the WFD underestimate precipitation in regions with complex topography (e.g., Adam et al. 2006; Barstad et al. 2009) and the bias correction procedure underlying the WFD does not account for orographic effects on precipitation (Weedon et al. 2010, 2011). Thus, this likely explains some of the biases in simulated runoff. Additional observations would be needed to investigate this further, which is beyond the scope of this study.
One of the most striking results of the model evaluation is the systematic decrease in model performance from wet to dry runoff percentiles (Table 3, Fig. 6). Both Δμ and Δσ are relative measures and the impact of small absolute errors is larger for small observed values. Therefore, both Δμ and Δσ can increase in magnitude for the lower runoff percentiles even if the absolute value of the error is constant. The existence of such effects is to some extent supported by Fig. 5, where the differences in observed and simulated mean values are almost constant throughout the runoff percentiles. This shows that there are only minor differences in the absolute model error between high and low flows. Despite such artifacts there are good reasons for normalizing the model error. The difference between low and high flows is larger than one order of magnitude. Therefore, model errors that are not normalized simply would follow this pattern, rendering interpretations difficult. Further, an error of a particular magnitude will be less relevant for large than for small values. This is especially the case if the error has the same magnitude as the observed quantity itself. In this context, it shall also be emphasized that the ranking of model performance has to be interpreted with caution and is only thought of as guidance for the careful inspection of the performance metrics themselves. Because of the nature of the procedure, small, possibly insignificant, differences may alter the ranking. Therefore, it is likely that neighboring ranks in fact represent broadly comparable performances. An alternative approach to make an average ranking (such as in this study) more reliable is to introduce weights for the different performance metrics such that metrics with a larger spread will have a larger influence on the overall ranking (e.g., Gulden et al. 2008; Gleckler et al. 2008). However, the choice of weights is nontrivial and results may depend on the method selected. 
Therefore, we opted to present only an unweighted ranking.
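The normalization effect described above can be made concrete with a small numeric sketch; the magnitudes below are invented for illustration and are not taken from Table 3 or Fig. 5.

```python
# Illustrative observed magnitudes (mm/day) for three percentile levels
obs = {"Q5": 0.2, "Q50": 1.0, "Q95": 3.0}
abs_error = 0.05  # the same absolute mismatch applied at every level

# Relative error implied by one constant absolute error
rel = {level: abs_error / value for level, value in obs.items()}
# The identical absolute error is a large fraction of Q5
# but only a small fraction of Q95
```

With these assumed numbers, a constant 0.05 mm/day mismatch amounts to 25% of Q5 but under 2% of Q95, mirroring the growth of Δμ and Δσ toward the dry percentiles.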
The large differences in model performance, especially for the lowest runoff percentiles, demonstrate the uncertainty associated with the appropriate mathematical representation of hydrological systems. Resolving this structural uncertainty is a subject of ongoing research (e.g., Gupta et al. 2008; Rosero et al. 2009; Martinez and Gupta 2010; Clark et al. 2011b,a) and is beyond the scope of the current study. Other sources of uncertainty are related to the estimation of model parameters. The models use a wide range of data products to determine soil properties and vegetation characteristics and different models may even have different interpretations of the same data source. For example, Teuling et al. (2009) demonstrated that soil properties derived from three different data products used in the European Land Data Assimilation System (ELDAS) project led to significant differences in the system behavior of a stochastic soil moisture model. The data products used to retrieve model parameters were not harmonized for the present ensemble and, even if some of the models rely on the same input maps, the processing and interpretation of the mapped values to derive the parameters may differ substantially. For example, H08 assumes a uniform soil layer with a depth of 1 m and a field capacity of 15 cm throughout all grid cells (Hanasaki et al. 2008), while the soil parameters of HTESSEL are taken from the Food and Agriculture Organization (FAO) dataset (FAO 2003), and ORCHIDEE determines the parameters of the Van Genuchten equations based on the suggestions of Carsel and Parrish (1988) for U.S. Department of Agriculture (USDA) soil types. A similar diversity of data products and approaches is also the case for other parameters such as vegetation characteristics.
It is regularly observed that hydrological models with mathematical structures comparable to those of the models in the current ensemble have deficiencies in simulating the lowest flows correctly (Smakhtin 2001; Stahl et al. 2011). To date, the reason for high flows being better (and more consistently) simulated than low flows is not fully understood. The fact that four of the five lowest-ranking models overestimate Q95 while showing an increasingly pronounced underestimation of all other runoff percentiles (see Fig. 5) suggests that some models release too much of the incoming precipitation too quickly. Consequently, too little water is stored in soils and aquifers, which in turn may lead to a pronounced underestimation of the lowest flows. The only model to exhibit the opposite behavior is MATSIRO, which reacts too slowly to precipitation: it underestimates the magnitude of high flows and overestimates the low flows.
Most models capture the standard deviation of Q95 relatively well, but large discrepancies are found in the standard deviations of the annual low flows. This may be a result of high flows (and floods) being more directly coupled to atmospheric variability than low flows. Thus, the variance of high flows, as well as the temporal evolution, is likely to be directly related to precipitation variability, whereas low flows are to a much larger extent influenced by terrestrial hydrological processes. Various empirical studies support this. For example, Gudmundsson et al. (2011a) demonstrated, using the same observed dataset that is the basis for this study, that annual high flows have a high degree of synchronization across Europe, reflecting their link to atmospheric variability. Low flows, on the other hand, were found to have a more complex spatial pattern and a lower degree of synchronization, suggesting an increasing influence of catchment processes under dry conditions. Similarly, Bouwer et al. (2008) found that annual maximum river discharges in Europe were more sensitive to variations in the atmospheric forcing than annual mean discharges. It is also noteworthy that statistical moments of mean annual floods have been reported to be significantly correlated to the hydroclimatic conditions, but not to static catchment properties such as geology and soil types (Merz and Blöschl 2009). In summary, these results suggest that continental-scale patterns of runoff response are closely linked to the atmospheric forcing under wet conditions, irrespective of the properties of the catchments. Under dry conditions on the other hand, runoff depends primarily on depleting storages, the extent and properties of which vary strongly with topography and hydrogeology (Smakhtin 2001; Whitehouse et al. 1983) as well as on the antecedent moisture conditions.
The large differences in performance between models are contrasted by the good performance of the ensemble mean (ENSEMBLE). The present study showed that the ENSEMBLE is actually closer to the observed series of annual high flows (Q95) and low flows (Q5) than any individual model with respect to R2, and has a performance comparable to the best models with respect to Δμ and Δσ (Table 3). The superior performance for low and high flows can likely be related to the fact that the percentile series provide robust estimates of annual high and low flows but do not take the actual timing of flow events into account. Accordingly, ensemble techniques appear to increase the reliability of simulations of the terrestrial water cycle with respect to extremes on large spatial and temporal scales. The reason for the superiority of the ENSEMBLE over any individual model is not fully clear, but a possible explanation is that, unless the errors are systematic, the model solutions scatter more or less evenly around the true value, and thus the errors behave like random noise that can be efficiently removed by averaging. Note, however, that in the present study this is only the case for the highest flows (Fig. 4). For climate simulations, such noise arises from the simulated internal climate variability and from uncertainties in the model parameterizations (Reichler and Kim 2008). Similar arguments hold for hydrological systems, where the uncertainty in the “true” physical representation may lead to an even scatter of model errors around the observations and thus increases the reliability of the predictions.
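The noise-averaging argument can be demonstrated with synthetic data (a sketch under an assumed error model, not the study's actual simulations): if nine "models" deviate from a common truth by independent, roughly zero-mean errors, their mean has a smaller error than any member.

```python
import numpy as np

rng = np.random.default_rng(42)
truth = 2.0 + np.sin(np.linspace(0, 4 * np.pi, 200))  # synthetic "observed" series

# Nine synthetic "models": truth plus independent, zero-mean random errors
members = np.stack([truth + rng.normal(0.0, 0.5, truth.size) for _ in range(9)])
ensemble = members.mean(axis=0)

def rmse(sim):
    return np.sqrt(np.mean((sim - truth) ** 2))

# Averaging nine independent errors shrinks their standard deviation by a
# factor of three, so the ensemble mean beats every individual member here.
print(min(rmse(m) for m in members), rmse(ensemble))
```

If instead the errors share a systematic bias, averaging cannot remove it, which is consistent with the ensemble being clearly superior only where the model errors scatter around the observations.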
6. Summary and conclusions
This study assessed the ability of an ensemble of nine large-scale hydrological models to capture the magnitude and the interannual variability of runoff percentiles representing dry, mean, and wet conditions in Europe. In contrast to other studies that evaluate the performance of large-scale hydrological models using only a few continental-scale river basins, this study uses observation-based runoff estimates in 298 grid cells. The gridded runoff was derived from gauged river flow series from 426 small, near-natural catchments, reducing the risk of biased conclusions due to observation error. To minimize the effect of local parameter uncertainty and to focus on the dominant patterns of interannual variability, spatially aggregated time series were analyzed.
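The annual percentile series evaluated here can be sketched as follows, using synthetic daily runoff and the text's convention that Q95 denotes high flows (the 95th percentile of the within-year distribution) and Q5 low flows:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily runoff (mm/day) for 10 years, shaped (years, days)
daily = rng.gamma(shape=2.0, scale=0.5, size=(10, 365))

# One value per year: Q95 (wet conditions), Q50 (mean), Q5 (dry conditions)
q95 = np.percentile(daily, 95, axis=1)
q50 = np.percentile(daily, 50, axis=1)
q5 = np.percentile(daily, 5, axis=1)

# The interannual mean and standard deviation of each series are the
# quantities compared between observations and models in this evaluation.
print(q95.mean(), q50.mean(), q5.mean())
```

Each series captures the year-to-year variability of one flow regime while discarding the exact timing of events within the year.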
Overall, the ensemble members were able to capture the temporal evolution of the interannual variability, measured by the correlation coefficient R2, reasonably well. However, an overall tendency toward underestimation of runoff was found, and both structural issues common to all models and biases in the forcing data are plausible explanations.
Model performance decreases from wet to dry conditions. This change in average model performance is accompanied by an increasing spread in the relative error in the mean (Δμ) as well as in the standard deviation (Δσ) for the low runoff percentiles. One possible explanation is that hydrological systems are more closely coupled to the meteorological forcing under wet conditions, whereas runoff under dry conditions depends more on storage processes whose parameterizations are highly uncertain.
The large differences in performance among the models are contrasted by the fact that the ENSEMBLE, the mean over all models, provides the most reliable estimate of spatially aggregated time series of all annual runoff percentiles. The ensemble mean not only provides a good overall estimator, but is also closer to the series of annual high flows (Q95) and low flows (Q5) than most models. This leads us to caution against the use of a single model in climate impact assessments, which carries a high risk of biased conclusions, and to recommend instead the use of multimodel ensembles.
A principal limitation of this study is the loss of information due to the spatial aggregation in data preprocessing. Possible approaches to gain insight into the spatial patterns of model performance could include the analysis of smaller regions or more "intelligent" data preprocessing to define and extract signals (e.g., the mean annual cycle and leading empirical orthogonal functions) that are expected to be reproduced by the models. These issues are subject to ongoing research and are addressed in a parallel study (Gudmundsson et al. 2011c, manuscript submitted to Water Resour. Res.).
Acknowledgments
This research contributes to the European Union (FP6) funded Integrated Project WATCH (Contract 036946). The provision of streamflow data by all agencies that contributed data to the EWA-FRIEND or to the WATCH project is gratefully acknowledged. We further acknowledge the contribution of Pedro Viterbo and Sandra Gomes from the University of Lisbon and Jan Polcher from the Laboratoire de Météorologie Dynamique (Paris) for providing model results and helpful comments.
REFERENCES
Adam, J. C., and Lettenmaier D. P. , 2003: Adjustment of global gridded precipitation for systematic bias. J. Geophys. Res., 108, 4257, doi:10.1029/2002JD002499.
Adam, J. C., Clark E. A. , Lettenmaier D. P. , and Wood E. F. , 2006: Correction of global precipitation products for orographic effects. J. Climate, 19, 15–38.
Alcamo, J., Döll P. , Henrichs T. , Kaspar F. , Lehner B. , Rösch T. , and Siebert S. , 2003: Development and testing of the WaterGAP 2 global model of water use and availability. Hydrol. Sci. J., 48, 317–337, doi:10.1623/hysj.48.3.317.45290.
Balsamo, G., Beljaars A. , Scipal K. , Viterbo P. , van den Hurk B. , Hirschi M. , and Betts A. K. , 2009: A revised hydrology for the ECMWF model: Verification from field site to terrestrial water storage and impact in the integrated forecast system. J. Hydrometeor., 10, 623–643.
Barstad, I., Sorteberg A. , Flatø F. , and Déqué M. , 2009: Precipitation, temperature and wind in Norway: Dynamical downscaling of ERA40. Climate Dyn., 33, 769–776, doi:10.1007/s00382-008-0476-5.
Best, M. J., and Coauthors, 2011: The Joint UK Land Environment Simulator (JULES), model description—Part 1: Energy and water fluxes. Geosci. Model Dev., 4, 677–699, doi:10.5194/gmd-4-677-2011.
Bondeau, A., and Coauthors, 2007: Modelling the role of agriculture for the 20th century global terrestrial carbon balance. Global Change Biol., 13, 679–706, doi:10.1111/j.1365-2486.2006.01305.x.
Bouwer, L. M., Vermaat J. E. , and Aerts J. C. J. H. , 2008: Regional sensitivities of mean and peak river discharge to climate variability in Europe. J. Geophys. Res., 113, D19103, doi:10.1029/2008JD010301.
Carsel, R. F., and Parrish R. S. , 1988: Developing joint probability distributions of soil water retention characteristics. Water Resour. Res., 24, 755–769.
Clark, D. B., and Coauthors, 2011: The Joint UK Land Environment Simulator (JULES), model description—Part 2: Carbon fluxes and vegetation dynamics. Geosci. Model Dev., 4, 701–722, doi:10.5194/gmd-4-701-2011.
Clark, M. P., Slater A. G. , Rupp D. E. , Woods R. A. , Vrugt J. A. , Gupta H. V. , Wagener T. , and Hay L. E. , 2008: Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resour. Res., 44, W00B02, doi:10.1029/2007WR006735.
Clark, M. P., Kavetski D. , and Fenicia F. , 2011a: Pursuing the method of multiple working hypotheses for hydrological modeling. Water Resour. Res., 47, W09301, doi:10.1029/2010WR009827.
Clark, M. P., McMillan H. K. , Collins D. B. G. , Kavetski D. , and Woods R. A. , 2011b: Hydrological field data from a modeller’s perspective: Part 2: Process-based evaluation of model hypotheses. Hydrol. Processes, 25, 523–543, doi:10.1002/hyp.7902.
Dankers, R., and Feyen L. , 2009: Flood hazard in Europe in an ensemble of regional climate scenarios. J. Geophys. Res., 114, D16108, doi:10.1029/2008JD011523.
Decharme, B., and Douville H. , 2007: Global validation of the ISBA sub-grid hydrology. Climate Dyn., 29, 21–37, doi:10.1007/s00382-006-0216-7.
Di Baldassarre, G., and Montanari A. , 2009: Uncertainty in river discharge observations: A quantitative analysis. Hydrol. Earth Syst. Sci., 13, 913–921, doi:10.5194/hess-13-913-2009.
Dirmeyer, P. A., 2011: A history and review of the Global Soil Wetness Project (GSWP). J. Hydrometeor., 12, 729–749.
Dirmeyer, P. A., Gao X. , Zhao M. , Guo Z. , Oki T. , and Hanasaki N. , 2006: GSWP-2: Multimodel analysis and implications for our perception of the land surface. Bull. Amer. Meteor. Soc., 87, 1381–1397.
Döll, P., Kaspar F. , and Lehner B. , 2003: A global hydrological model for deriving water availability indicators: Model tuning and validation. J. Hydrol., 270 (1–2), 105–134, doi:10.1016/S0022-1694(02)00283-4.
Döll, P., Fiedler K. , and Zhang J. , 2009: Global-scale analysis of river flow alterations due to water withdrawals and reservoirs. Hydrol. Earth Syst. Sci., 13, 2413–2432.
d’Orgeval, T., Polcher J. , and de Rosnay P. , 2008: Sensitivity of the West African hydrological cycle in ORCHIDEE to infiltration processes. Hydrol. Earth Syst. Sci., 12, 1387–1401, doi:10.5194/hess-12-1387-2008.
Duan, Q., and Coauthors, 2006: Model Parameter Estimation Experiment (MOPEX): An overview of science strategy and major results from the second and third workshops. J. Hydrol., 320 (1–2), 3–17, doi:10.1016/j.jhydrol.2005.07.031.
Dümenil, L., and Todini E. , 1992: A rainfall-runoff scheme for use in the Hamburg climate model. Advances in Theoretical Hydrology: A Tribute to James Dooge, J. P. O’Kane, Ed., Elsevier Science, 129–157.
Fader, M., Rost S. , Müller C. , Bondeau A. , and Gerten D. , 2010: Virtual water content of temperate cereals and maize: Present and potential future patterns. J. Hydrol., 384 (3–4), 218–231, doi:10.1016/j.jhydrol.2009.12.011.
FAO, 2003: Digital soil map of the world and derived soil properties. Food and Agriculture Organization of the United Nations, CD-ROM.
Feyen, L., and Dankers R. , 2009: Impact of global warming on streamflow drought in Europe. J. Geophys. Res., 114, D17116, doi:10.1029/2008JD011438.
Fuchs, T., 2009: GPCC Annual report for year 2008. Global Precipitation Climatology Centre Tech. Rep., DWD, 13 pp. [Available online at http://gpcc.dwd.de.]
Gao, X., and Dirmeyer P. A. , 2006: A multimodel analysis, validation, and transferability study of global soil wetness products. J. Hydrometeor., 7, 1218–1236.
Gerten, D., Schaphoff S. , Haberlandt U. , Lucht W. , and Sitch S. , 2004: Terrestrial vegetation and water balance—Hydrological evaluation of a dynamic global vegetation model. J. Hydrol., 286 (1–4), 249–270, doi:10.1016/j.jhydrol.2003.09.029.
Gerten, D., Rost S. , von Bloh W. , and Lucht W. , 2008: Causes of change in 20th century global river discharge. Geophys. Res. Lett., 35, L20405, doi:10.1029/2008GL035258.
Gleckler, P. J., Taylor K. E. , and Doutriaux C. , 2008: Performance metrics for climate models. J. Geophys. Res., 113, D06104, doi:10.1029/2007JD008972.
Gudmundsson, L., Tallaksen L. M. , and Stahl K. , 2011a: Spatial cross-correlation patterns of European low, mean and high flows. Hydrol. Processes, 25, 1034–1045, doi:10.1002/hyp.7807.
Gudmundsson, L., Tallaksen L. M. , Stahl K. , and Fleig A. K. , 2011b: Low-frequency variability of European runoff. Hydrol. Earth Syst. Sci., 15, 2853–2869, doi:10.5194/hess-15-2853-2011.
Gulden, L. E., Rosero E. , Yang Z.-L. , Wagener T. , and Niu G.-Y. , 2008: Model performance, model robustness, and model fitness scores: A new method for identifying good land-surface models. Geophys. Res. Lett., 35, L11404, doi:10.1029/2008GL033721.
Guo, Z., and Dirmeyer P. A. , 2006: Evaluation of the Second Global Soil Wetness Project soil moisture simulations: 1. Intermodel comparison. J. Geophys. Res., 111, D22S02, doi:10.1029/2006JD007233.
Guo, Z., Dirmeyer P. A. , Gao X. , and Zhao M. , 2007: Improving the quality of simulated soil moisture with a multi-model ensemble approach. Quart. J. Roy. Meteor. Soc., 133, 731–747, doi:10.1002/qj.48.
Gupta, H. V., Wagener T. , and Liu Y. , 2008: Reconciling theory with observations: Elements of a diagnostic approach to model evaluation. Hydrol. Processes, 22, 3802–3813, doi:10.1002/hyp.6989.
Haddeland, I., and Coauthors, 2011: Multimodel estimate of the global terrestrial water balance: Setup and first results. J. Hydrometeor., 12, 869–884.
Hagedorn, R., Doblas-Reyes F. J. , and Palmer T. N. , 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting—I. Basic concept. Tellus, 57A, 219–233, doi:10.1111/j.1600-0870.2005.00103.x.
Hagemann, S., and Dümenil L. , 1998: A parametrization of the lateral waterflow for the global scale. Climate Dyn., 14, 17–31, doi:10.1007/s003820050205.
Hagemann, S., and Dümenil Gates L. , 2003: Improving a subgrid runoff parameterization scheme for climate models by the use of high resolution data derived from satellite observations. Climate Dyn., 21, 349–359, doi:10.1007/s00382-003-0349-x.
Hagemann, S., and Jacob D. , 2007: Gradient in the climate change signal of European discharge predicted by a multi-model ensemble. Climatic Change, 81 (Suppl.), 309–327, doi:10.1007/s10584-006-9225-0.
Hagemann, S., Göttel H. , Jacob D. , Lorenz P. , and Roeckner E. , 2009: Improved regional scale processes reflected in projected hydrological changes over large European catchments. Climate Dyn., 32, 767–781, doi:10.1007/s00382-008-0403-9.
Hanasaki, N., Kanae S. , Oki T. , Masuda K. , Motoya K. , Shirakawa N. , Shen Y. , and Tanaka K. , 2008: An integrated model for the assessment of global water resources—Part 1: Model description and input meteorological forcing. Hydrol. Earth Syst. Sci., 12, 1007–1025.
Hansen, J., Sato M. , Ruedy R. , Lo K. , Lea D. W. , and Medina-Elizade M. , 2006: Global temperature change. Proc. Natl. Acad. Sci. USA, 103, 14 288–14 293, doi:10.1073/pnas.0606291103.
Henderson-Sellers, A., Pitman A. J. , Love P. K. , Irannejad P. , and Chen T. H. , 1995: The Project for Intercomparison of Land Surface Parameterization Schemes (PILPS): Phases 2 and 3. Bull. Amer. Meteor. Soc., 76, 489–503.
Hirabayashi, Y., Kanae S. , Emori S. , Oki T. , and Kimoto M. , 2008: Global projections of changing risks of floods and droughts in a changing climate. Hydrol. Sci. J., 53, 754–772.
Hunger, M., and Döll P. , 2008: Value of river discharge data for global-scale hydrological modeling. Hydrol. Earth Syst. Sci., 12, 841–861, doi:10.5194/hess-12-841-2008.
Kavetski, D., Kuczera G. , and Franks S. W. , 2006: Bayesian analysis of input uncertainty in hydrological modeling: 1. Theory. Water Resour. Res., 42, W03407, doi:10.1029/2005WR004368.
Laaha, G., and Blöschl G. , 2007: A national low flow estimation procedure for Austria. Hydrol. Sci. J., 52, 625–644, doi:10.1623/hysj.52.4.625.
Lehner, B., Döll P. , Alcamo J. , Henrichs T. , and Kaspar F. , 2006: Estimating the impact of global change on flood and drought risks in Europe: A continental, integrated analysis. Climatic Change, 75, 273–299, doi:10.1007/s10584-006-6338-4.
Macadam, I., Pitman A. J. , Whetton P. H. , and Abramowitz G. , 2010: Ranking climate models by performance using actual values and anomalies: Implications for climate change impact assessments. Geophys. Res. Lett., 37, L16704, doi:10.1029/2010GL043877.
Manabe, S., 1969: Climate and the ocean circulation. I. The atmospheric circulation and the hydrology of the earth’s surface. Mon. Wea. Rev., 97, 739–774.
Martinez, G. F., and Gupta H. V. , 2010: Toward improved identification of hydrological models: A diagnostic evaluation of the “abcd” monthly water balance model for the conterminous United States. Water Resour. Res., 46, W08507, doi:10.1029/2009WR008294.
Materia, S., Dirmeyer P. A. , Guo Z. , Alessandri A. , and Navarra A. , 2010: The sensitivity of simulated river discharge to land surface representation and meteorological forcings. J. Hydrometeor., 11, 334–351.
McMillan, H., Freer J. , Pappenberger F. , Krueger T. , and Clark M. , 2010: Impacts of uncertain river flow data on rainfall-runoff model calibration and discharge predictions. Hydrol. Processes, 24, 1270–1284, doi:10.1002/hyp.7587.
Meigh, J. R., McKenzie A. A. , and Sene K. J. , 1999: A grid-based approach to water scarcity estimates for eastern and southern Africa. Water Resour. Manage., 13, 85–115, doi:10.1023/A:1008025703712.
Merz, R., and Blöschl G. , 2009: Process controls on the statistical flood moments—A data based analysis. Hydrol. Processes, 23, 675–696, doi:10.1002/hyp.7168.
Milly, P. C. D., Dunne K. A. , and Vecchia A. V. , 2005: Global pattern of trends in streamflow and water availability in a changing climate. Nature, 438, 347–350, doi:10.1038/nature04312.
Mitchell, T. D., and Jones P. D. , 2005: An improved method of constructing a database of monthly climate observations and associated high-resolution grids. Int. J. Climatol., 25, 693–712, doi:10.1002/joc.1181.
Moore, R. J., 1985: The probability-distributed principle and runoff production at point and basin scales. Hydrol. Sci. J., 30, 273–297.
Moore, R. J., 2007: The PDM rainfall-runoff model. Hydrol. Earth Syst. Sci., 11, 483–499, doi:10.5194/hess-11-483-2007.
Nasonova, O. N., Gusev Ye. M. , and Kovalev Ye. E. , 2011: Impact of uncertainties in meteorological forcing data and land surface parameters on global estimates of terrestrial water balance components. Hydrol. Processes, 25, 1074–1090, doi:10.1002/hyp.7651.
New, M., Hulme M. , and Jones P. , 1999: Representing twentieth-century space–time climate variability. Part I: Development of a 1961–90 mean monthly terrestrial climatology. J. Climate, 12, 829–856.
New, M., Hulme M. , and Jones P. , 2000: Representing twentieth-century space–time climate variability. Part II: Development of 1901–96 monthly grids of terrestrial surface climate. J. Climate, 13, 2217–2238.
Nohara, D., Kitoh A. , Hosaka M. , and Oki T. , 2006: Impact of climate change on river discharge projected by multimodel ensemble. J. Hydrometeor., 7, 1076–1089.
Oki, T., Nishimura T. , and Dirmeyer P. , 1999: Assessment of annual runoff from land surface models using Total Runoff Integrating Pathways (TRIP). J. Meteor. Soc. Japan, 77, 235–255.
Palmer, T. N., and Coauthors, 2004: Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER). Bull. Amer. Meteor. Soc., 85, 853–872.
Reichler, T., and Kim J. , 2008: How well do coupled models simulate today’s climate? Bull. Amer. Meteor. Soc., 89, 303–311.
Reitan, T., and Petersen-Øverleir A. , 2009: Bayesian methods for estimating multi-segment discharge rating curves. Stochastic Environ. Res. Risk Assess., 23, 627–642, doi:10.1007/s00477-008-0248-0.
Renard, B., Kavetski D. , Kuczera G. , Thyer M. , and Franks S. W. , 2010: Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors. Water Resour. Res., 46, W05521, doi:10.1029/2009WR008328.
Roeckner, E., and Coauthors, 2003: The atmospheric general circulation model ECHAM 5. Part I: Model description. Max Planck Institute for Meteorology Tech. Rep. 349, 127 pp.
Rosero, E., Yang Z.-L. , Gulden L. E. , Niu G.-Y. , and Gochis D. J. , 2009: Evaluating enhanced hydrological representations in Noah LSM over transition zones: Implications for model development. J. Hydrometeor., 10, 600–622.
Rudolf, B., and Schneider U. , 2005: Calculation of gridded precipitation data for the global land-surface using in-situ gauge observations. Proc. Second Workshop of the International Precipitation Working Group, Monterey, CA, IPWG, 231–247. [Available online at http://gpcc.dwd.de.]
Schneider, U., Becker A. , Meyer-Christoffer A. , Ziese M. , and Rudolf B. , 2010: Global precipitation analysis products of the GPCC. Global Precipitation Climatology Centre Tech. Rep., DWD, 12 pp. [Available online at ftp://ftp.dwd.de/pub/data/gpcc/PDF/GPCC_intro_products_2008.pdf.]
Shorthouse, C., and Arnell N. , 1997: Spatial and temporal variability in European river flows and the North Atlantic oscillation. FRIEND’97—Regional Hydrology: Concepts and Models for Sustainable Water Resource Management, A. Gustard et al., Eds., IAHS, 77–85.
Shorthouse, C., and Arnell N. , 1999: The effects of climatic variability on spatial characteristics of European river flows. Phys. Chem. Earth, 24B (1–2), 7–13, doi:10.1016/S1464-1909(98)00003-3.
Smakhtin, V. U., 2001: Low flow hydrology: A review. J. Hydrol., 240 (3–4), 147–186, doi:10.1016/S0022-1694(00)00340-1.
Stahl, K., Hisdal H. , Tallaksen L. , van Lanen H. , Hannaford J. , and Sauquet E. , 2008: Trends in low flows and streamflow droughts across Europe. UNESCO Tech. Rep., 39 pp.
Stahl, K., and Coauthors, 2010: Streamflow trends in Europe: Evidence from a dataset of near-natural catchments. Hydrol. Earth Syst. Sci. Discuss., 7, 5769–5804, doi:10.5194/hessd-7-5769-2010.
Stahl, K., Tallaksen L. M. , Gudmundsson L. , and Christensen J. H. , 2011: Streamflow data from small basins: A challenging test to high-resolution regional climate modeling. J. Hydrometeor., 12, 900–912.
Takata, K., Emori S. , and Watanabe T. , 2003: Development of the minimal advanced treatments of surface interaction and runoff. Global Planet. Change, 38 (1–2), 209–222, doi:10.1016/S0921-8181(03)00030-4.
Teuling, A. J., Uijlenhoet R. , van den Hurk B. , and Seneviratne S. I. , 2009: Parameter sensitivity in LSMs: An analysis using stochastic soil moisture models and ELDAS soil parameters. J. Hydrometeor., 10, 751–765.
Teutschbein, C., and Seibert J. , 2010: Regional climate models for hydrological impact studies at the catchment scale: A review of recent modeling strategies. Geography Compass, 4, 834–860, doi:10.1111/j.1749-8198.2010.00357.x.
Todini, E., 1996: The ARNO rainfall–runoff model. J. Hydrol., 175 (1–4), 339–382, doi:10.1016/S0022-1694(96)80016-3.
Troy, T. J., Wood E. F. , and Sheffield J. , 2008: An efficient calibration method for continental-scale land surface modeling. Water Resour. Res., 44, W09411, doi:10.1029/2007WR006513.
Uppala, S. M., and Coauthors, 2005: The ERA-40 Re-Analysis. Quart. J. Roy. Meteor. Soc., 131, 2961–3012, doi:10.1256/qj.04.176.
Vogt, J., and Coauthors, 2007: A pan-European river and catchment database. JRC Reference Rep. EUR 22920 EN, 119 pp.
Weedon, G. P., Gomes S. , Viterbo P. , Österle H. , Adam J. C. , Bellouin N. , Boucher O. , and Best M. , 2010: The WATCH forcing data 1958–2001: A meteorological forcing dataset for land surface- and hydrological-models. WATCH Tech. Rep. 22, 41 pp. [Available online at http://www.eu-watch.org.]
Weedon, G. P., and Coauthors, 2011: Creation of the WATCH Forcing Data and its use to assess global and regional reference crop evaporation over land during the twentieth century. J. Hydrometeor., 12, 823–848.
Whitehouse, I., McSaveney M. , and Horrell G. , 1983: Spatial variability of low flows across a portion of the central Southern Alps, New Zealand. J. Hydrol., 20, 123–137.
Widén-Nilsson, E., Gong L. , Halldin S. , and Xu C.-Y. , 2009: Model performance and parameter behavior for varying time aggregations and evaluation criteria in the WASMOD-M global water balance model. Water Resour. Res., 45, W05418, doi:10.1029/2007WR006695.