## 1. Introduction

It has long been recognized by hydrometeorologists that significant improvements in hydrologic forecast lead time and accuracy could be achieved if sufficiently accurate meteorological predictions were available at the watershed scale. While recent advances in numerical weather forecasting have yielded significant improvements in temperature predictions, precipitation estimates often remain elusive because of the nonstationarity and space–time variability of precipitation. Accurate site-specific prediction of precipitation beyond 3 days remains one of the most challenging tasks facing hydrometeorologists.

At the same time, rapid population growth and economic development, along with changing climate and land use/cover, have imposed new challenges on water resources managers in many regions of the world. It is now of utmost urgency to improve water resources management in various sectors (e.g., reservoir operation, water supply, and flood or drought mitigation), which typically requires accurate site-specific forecasts of reservoir inflow and river flow at appropriate lead times. Many studies have examined the potential of hydrologic models for longer lead time (up to 2 weeks ahead) forecasting of hydrologic variables for more efficient water resources management and planning. In most previous studies, the forecast lead time was less than 7 days because model performance usually deteriorates drastically when the lead time increases beyond 3 days (Goswami and O'Connor 2007; Sivakumar et al. 2002; Karunasinghe and Liong 2006; Coulibaly et al. 2001). Some studies have attempted to include downscaled predictors from large-scale climate or weather forecasting models in hydrologic models (Leung et al. 1999; Hay et al. 2000; Bergström et al. 2001). It was observed that daily temperature and precipitation are the principal atmospheric forcing parameters required for hydrologic modeling and that a spatial resolution of 0.125° is generally sufficient to simulate river flows. Another study assessed the impact of the uncertainty inherent in imperfect meteorological predictions on real-time spring flow forecasting (Coulibaly 2003). It was observed that, even with large prediction errors, meteorological forecasts can significantly improve spring flow forecasts up to a 7-day lead time. However, to the best of our knowledge, none of those studies investigated the use of downscaled ensemble weather predictions for longer lead time (week 2) forecasting.
Operational lead times for reservoir operation and planning range from 6 hours to 12 days: the first-week forecasts are mostly used for reservoir operation and scheduling, while the second-week forecasts are used mainly for short-term planning. Accurate, longer-lead site-specific forecasts are not only essential for optimal reservoir management but also critical for flood mitigation. However, this study focuses only on the daily streamflow and reservoir inflow forecasts (up to week 2) routinely used for reservoir operation and planning.

In general, climate or weather forecasting models are run at much coarser resolutions (typically 2° or more) and do not resolve the important mesoscale processes and surface features that control regional- or catchment-scale precipitation. Downscaling methods were initially developed for generating high-resolution data from global climate model (GCM) outputs. Kalnay et al. (1996) suggested that National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) predictions could be used as large-scale predictor fields in a manner analogous to the way a general circulation model would be used in a climate change study. Wilby et al. (2000) examined the hydrological responses in the Animas River basin (Colorado) to dynamically and statistically downscaled model outputs; compared with raw NCEP output, the downscaled climate variables provided more realistic simulations of basin-scale hydrology. Given the large systematic biases and poor skill of NCEP precipitation and temperature estimates in some regions, it is necessary to explore methods that may improve upon these global-scale models (Hay and Clark 2003). These previous studies indicate that hydrologic models driven by downscaled predictors can play a significant role in improving hydrologic forecast accuracy and lead time.

In this study, the selected downscaling methods are used to generate local-scale daily precipitation and temperature series from the ensemble weather predictions provided by the NCEP Global Forecast System (GFS), which has a resolution of 2.5°. Three downscaling models are used and their performances are compared. The downscaled daily meteorological predictions of the two best models are then used as inputs in two different hydrological models to improve week-2 hydrologic forecasting.

## 2. Study area and data description

The study area selected for applying and evaluating the downscaling methods and the hydrological models is the Chute-du-Diable (CDD) basin and the Serpent River basin located in the Saguenay–Lac-Saint-Jean watershed (Fig. 1), which is a well-known flood-prone region in northeastern Canada. There are a large number of reservoirs and dams in the Saguenay watershed and most of the large reservoirs are managed by the Aluminum Company of Canada (ALCAN) for hydroelectric power production. The Chute-du-Diable basin has an area of 9700 km^{2} and is located in the eastern part of the Saguenay watershed. The Serpent River basin is located at the middle of the watershed and has an area of 1760 km^{2}.

This study area was chosen because of the availability of long, good-quality hydrometeorological records for these particular basins. A total of 23 years (1979–2001) of historical total daily precipitation (Prec, in mm), mean maximum temperature (Tmax, in °C), and mean minimum temperature (Tmin, in °C) series for the two basins were obtained from the ALCAN hydrometeorological network and used as predictands in the downscaling models and as additional predictors in the hydrological forecasting models. Specifically, for the Serpent River basin, the precipitation and temperature data were obtained from the Chute-des-Passes (CDP) meteorological station (station ID 7061541) located near the Serpent basin at 49.9°N, 71.25°W. For the Chute-du-Diable basin, the precipitation and temperature data were obtained from both the CDP and the CDD meteorological stations; the latter station is located at 48.75°N, 71.7°W with ID 7061560. For a description of the study area and data, the reader is referred to Coulibaly et al. (2005).

The predictors used for the downscaling models are taken from the NCEP GFS modeling system. The National Oceanic and Atmospheric Administration (NOAA)/Cooperative Institute for Research in Environmental Sciences (CIRES) Climate Diagnostics Center has undertaken a reforecasting project providing retrospective numerical ensemble forecasts. An unchanged version of NCEP's Global Forecast System at T62 resolution is used to generate 15-day real-time forecast scenarios (30 time steps of 12 hours each). Forecasts are run every day from 0000 UTC initial conditions from 1979 to present. The 15-member ensemble forecasts are generated from 15 initial conditions consisting of a reanalysis and seven pairs of bred modes (Hamill et al. 2004). The global latitude–longitude (lat–lon) grid has a large-scale resolution of 2.5° in both longitude and latitude and contains 144 × 73 grid points. The map of the NOAA ensemble forecast grid points and the local meteorological stations is shown in Fig. 2, where the blue points are the grid points and the two red stars are the meteorological stations. The global data were collected directly from the reforecast project FTP server. The 3D ensemble data comprise 12 files for 12 variables per day; for each variable there are 15 forecast ranges (Fr) or time delays, and for each delay there are 15 members. The eight variables processed and used in this study are shown in Table 1. For a more detailed description of the ensemble weather predictions, the reader is referred to Hamill et al. (2004) and Liu et al. (2008).
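The layout of one day's ensemble data can be illustrated with a short sketch (hypothetical array names; the real files are distributed via FTP and random numbers stand in for the forecasts here), extracting the ensemble mean at the grid point nearest a station:

```python
import numpy as np

# Hypothetical in-memory layout of one variable's reforecast for one start day:
# (15 ensemble members, 15 daily forecast ranges, 73 latitudes, 144 longitudes).
ens = np.random.rand(15, 15, 73, 144)

lats = np.linspace(90.0, -90.0, 73)    # 2.5-degree global grid
lons = np.linspace(0.0, 357.5, 144)

def nearest_grid_point(lat, lon):
    """Indices of the 2.5-degree grid point closest to a station."""
    return np.abs(lats - lat).argmin(), np.abs(lons - lon).argmin()

# Chute-du-Diable station (48.75N, 71.7W, i.e. 288.3E on a 0-360 grid).
i, j = nearest_grid_point(48.75, 360.0 - 71.7)

# Ensemble mean over the 15 members at every forecast range for that point
# (the "mean of the members" style predictor discussed in section 3b).
ens_mean = ens[:, :, i, j].mean(axis=0)   # shape: (15 forecast ranges,)
```

The same indexing generalizes to the four grid points surrounding a station, as used in cases 5 and 6 of section 3b.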

Map of meteorological stations (stars) and NOAA ensemble forecast grid points (points in circles).

Citation: Journal of Hydrometeorology 12, 6; 10.1175/2011JHM1366.1


NOAA reforecast ensemble variable fields.

In the hydrological forecasting experiments, in addition to the downscaled data (precipitation and temperature), the historical precipitation and temperature data from the two meteorological stations described above are used as predictors to forecast the short-term Serpent River flows and the Chute-du-Diable reservoir inflows. The observed daily flow data for the Serpent River basin are obtained from a hydrometric station (station ID 062214) located at 49.41°N, 71.22°W. For this station, 11 years of observed daily river flow data are available (1991–2001): the first 8 years (1991–98) are used to calibrate the hydrologic models, and the last 3 years (1999–2001) are used to validate them. The observed daily reservoir inflow data cover the whole Chute-du-Diable catchment (Fig. 1), and 23 years of daily inflow data (1979–2001) are used in this study: the first 18 years (1979–96) are used for model calibration, and the last 5 years (1997–2001) for validation.

## 3. Downscaling ensemble weather forecasts

### a. Downscaling methods

Three data-driven methods are used: (i) a temporal neural network, the time-lagged feed-forward neural network (TLFN); (ii) an elaborate version of genetic programming, evolutionary polynomial regression (EPR); and (iii) a conventional multiple regression method, the statistical downscaling model (SDSM). The three methods were first investigated for downscaling the ensemble weather forecasts using only the mean of the 15 members of each forecast time delay (or lead time) for each variable (Liu et al. 2008). It was found that TLFN and EPR are more efficient downscaling techniques than SDSM for both daily precipitation and temperature. However, it was unclear whether using the mean of the 15 members of each forecast time delay was the best approach for downscaling the ensemble weather forecasts. Therefore, in this study, the three downscaling methods are further investigated using the different downscaling experiments described in section 3b. The description of the downscaling methods is kept brief herein; the reader is referred to Liu et al. (2008). A detailed description of the SDSM model is provided by Wilby et al. (2002).

#### 1) TLFN

The TLFN used in the downscaling experiments is characterized by a short-term memory structure (namely, a tap delay line). The size (or depth) of the memory structure depends on the number of past samples needed to describe the input characteristics in time and has to be determined on a case-by-case basis (Coulibaly et al. 2005). The advantage of the TLFN is that it shares some of the salient properties of feed-forward neural networks but can also capture the temporal patterns present in the input time signals. A key advantage of the TLFN is that it is less complex than conventional time-delay and recurrent networks while offering similar temporal pattern processing capability, and it can still be trained with the classical back-propagation algorithm or a variant thereof (Coulibaly et al. 2005). For a detailed description of the TLFN model used herein, the reader is referred to Coulibaly et al. (2005).

The major changes to the TLFN in this study include the network learning rule, the transfer functions, and the network architecture (number of hidden units and memory structure order), which are problem dependent. The optimal number of hidden units (or processing elements) was determined by automatically varying the number of hidden units from 1 to 50, and the optimal number based on the mean-square error was selected. Cross validation was used to stop network training when the generalization error starts increasing, thus the optimal model identified is assumed to have good generalization potential. Different optimal numbers of hidden units were identified for the three variables (Prec, Tmax, and Tmin) at each meteorological station. The TLFN model parameters are summarized in Table 2. For the downscaling of precipitation, the tap-delay order (or memory depth) was 2, a sigmoid function was used in the hidden layer, and a linear function in the output layer. The optimal number of hidden units was 4 for the CDD station and 10 for the CDP station. For the downscaling of Tmax and Tmin, the memory depth was 2 and 3 for CDD and CDP, respectively, and a hyperbolic tangent sigmoid was used in the hidden layer and a linear function in the output layer. The optimal number of hidden units for Tmax was 2 and 8 for CDD and CDP, respectively, while the optimal number of hidden units for Tmin was 8 and 21 for CDD and CDP, respectively (Table 2). The differences in the optimal number of hidden units reflect the varying complexity of the physical processes controlling local climate (e.g., orographic effect, wind speed/direction, forest, and lake). For all the TLFN models, an adaptive step-size algorithm—namely Delta-Bar-Delta—was used for network training. A salient feature of the Delta-Bar-Delta algorithm is that it includes momentum, and can automatically increase or decrease the learning rate when necessary (Principe et al. 2000). 
The same input variables are used in all the downscaling models, and the selection of those inputs is described in section 3b.
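The tap-delay memory that distinguishes the TLFN from a plain feed-forward network can be sketched as follows (a minimal illustration, not the network used in the study): a predictor series is expanded into the current value plus `depth` past values, and these lagged copies feed the hidden layer together.

```python
import numpy as np

def tap_delay_line(x, depth):
    """Stack the current value and `depth` past values of a series,
    mimicking the short-term memory structure of a TLFN input layer.
    Returns an array of shape (len(x) - depth, depth + 1)."""
    return np.stack([x[depth - d : len(x) - d] for d in range(depth + 1)], axis=1)

x = np.arange(10.0)            # stand-in for a daily predictor series
X = tap_delay_line(x, depth=2) # memory depth 2, as used for precipitation
# First row pairs x[2] with its two predecessors: [2., 1., 0.]
```

Each row of `X` is then one training pattern, so the network sees a short window of the signal's history rather than a single instant.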

TLFN model parameters.

#### 2) EPR

In the EPR method, the general symbolic expression can be written as (Giustolisi and Savic 2005)

*y* = Σ_{*i*=1…*k*} *a*_{i} *z*_{i} + *a*_{0}, (1)

where *y* is the least squares estimate of the target value, *a*_{i} is an adjustable parameter for the *i*th term, *a*_{0} is an optional bias, *k* is the number of terms/parameters of the expression, and *z*_{i} is a transformed variable. In the EPR method, it is necessary to transform Eq. (1) into the following vector form (Giustolisi and Savic 2005):

**Y**_{N×1}(*θ*, **Z**) = **Z**_{N×d} × *θ*_{d×1}, (2)

where **Y**_{N×1}(*θ*, **Z**) is the least squares estimate vector of the *N* target values; *θ*_{d×1} is the vector of *d* = *k* + 1 parameters *a*_{i} and *a*_{0} (*θ*^{T} is the transposed vector); and **Z**_{N×d} is a matrix formed by **I**, the unitary vector for the bias *a*_{0}, and the *k* vectors of variables *Z*_{i} that for fixed *i* are a product of the independent predictor vectors of inputs: **X** = (*X*_{1}, *X*_{2}, … , *X*_{n}) (Liu et al. 2008). The key idea behind the EPR approach is to use an evolutionary search for the exponents of polynomial expressions by means of a GA engine (Giustolisi and Savic 2005). This allows (i) easy computational implementation of the algorithm, (ii) an efficient search for an expression (or formula), (iii) improved control of the complexity of the expression generated, and (iv) a small number of search parameters to be prespecified (Giustolisi and Savic 2005). For a more detailed description of the EPR method, readers are referred to other sources, such as Giustolisi and Savic (2005) and Liu et al. (2008).

### b. Selection of predictors

In order to identify a suitable approach for using the ensemble weather data in the downscaling experiments, six scenarios or cases of input data are explored and their results are compared. This analysis is performed for the CDD station. The first four cases use the predictors only from grid point (4, 3) [(longitude, latitude); see Fig. 2], the fifth case explores the predictors from all four grid points around the CDD station, and the sixth case includes the variables from grid points (4, 3) and (3, 3), which are the nearest to the CDD station. The six cases are as follows:

Case 1: uses the mean of the 15 members of each time delay from each variable as predictors for the downscaling experiment.

Case 2: uses the first member (M1) of each time delay from each variable as predictors for the downscaling.

Case 3: principal component analysis (PCA) of the 15 members of each time delay from each variable is performed, and the first principal components are used as predictors.

Case 4: the means of the 15 members from all the delays of the variables at grid point (4, 3) are calculated, and then the mean series are screened through sensitivity analysis to identify the most significant (or relevant) predictors for each lead time.

Case 5: the fourth case is repeated for all the four grid points [(4, 3), (3, 3), (3, 4), and (4, 4)] around the CDD station (see Fig. 2). The relevant variables identified through the sensitivity analysis are used as predictors for downscaling.

Case 6: sensitivity analysis is applied to the means of the members of each time delay for all the variables from grid points (4, 3) and (3, 3), which are the closest to the CDD station, and then the significant predictors are used as input for the downscaling.
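Case 3's member-wise PCA can be sketched as follows (a simplified illustration with synthetic data; for each variable and time delay, one series summarizes the 15 ensemble members):

```python
import numpy as np

def first_pc(members):
    """First principal component of a (days x 15 members) block,
    as in case 3: one series summarizing the ensemble."""
    centered = members - members.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the PC loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

rng = np.random.default_rng(1)
signal = rng.normal(size=365)                         # shared forecast signal
members = signal[:, None] + 0.1 * rng.normal(size=(365, 15))
pc1 = first_pc(members)
# pc1 is almost perfectly correlated (up to sign) with the shared signal
r = np.corrcoef(pc1, signal)[0, 1]
```

When the members largely agree, as in this toy example, the first component captures nearly all of the ensemble's common variance, which is the rationale for using it as a single predictor.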

The sensitivity analysis involves the selection of the TLFN architecture as explained in section 3a(1) and the use of all the predictors as inputs, the objective being to assess the relative importance of each predictor variable. The procedure measures the relative importance among the predictors by calculating how the model output varies in response to variation of an input variable. The relative sensitivity of the model to each input is calculated by dividing the standard deviation of the output by the standard deviation of the input that is varied to create that output (Coulibaly et al. 2005). The results provide a measure of the relative importance of each input (predictor) in the particular input–output transformation. The network is then retrained with the few most relevant predictor variables.
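The std(output)/std(input) sensitivity measure described above can be sketched as follows (a simplified illustration, not the exact procedure of Coulibaly et al. 2005; the toy model and names are hypothetical):

```python
import numpy as np

def relative_sensitivity(model, X, j, n=50):
    """std(output) / std(varied input): sweep input j across its observed
    range while holding the other inputs at their mean values."""
    x_sweep = np.linspace(X[:, j].min(), X[:, j].max(), n)
    Xs = np.tile(X.mean(axis=0), (n, 1))
    Xs[:, j] = x_sweep
    y = model(Xs)
    return y.std() / x_sweep.std()

# Toy model: output responds 5x more strongly to input 0 than to input 1.
model = lambda X: 5.0 * X[:, 0] + 1.0 * X[:, 1]
X = np.random.default_rng(2).uniform(0, 1, size=(200, 2))
s0 = relative_sensitivity(model, X, 0)  # -> 5.0
s1 = relative_sensitivity(model, X, 1)  # -> 1.0
```

Ranking the inputs by this ratio identifies the predictors to keep before retraining the network.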

Once the predictors for all six cases are selected, they are used to construct downscaling models with the SDSM and TLFN methods. All potentially optimal combinations of the SDSM and TLFN model parameters are tested to obtain the best performance for each model. The performance of the models is evaluated by three statistics: the mean-square error (MSE), the normalized mean-square error (NMSE; NMSE = MSE/variance of desired output), and the correlation coefficient (*r*) between model output and desired output for the test period. The SDSM and TLFN model performances for all six cases are shown in Table 3. When the TLFN models are applied to downscaling Prec, all six cases give very similar results: the NMSE values are all around 0.65 and the correlations all around 0.61, which may suggest that the six cases provide similar information for downscaling precipitation. When the SDSM method is applied to downscaling Prec, based on the NMSE, the results of cases 1, 4, 5, and 6 are similar and better than those of the other two cases; among these four cases, the correlation statistic for case 6 is slightly better than for the others. For downscaling Tmax, the TLFN models perform consistently in all the cases except case 5, where the NMSE is 0.22 compared with around 0.6 or 0.5 in the other cases. The SDSM models perform clearly better in case 6, where the NMSE is 0.08 compared with 0.38 or 0.44 in the other cases. For Tmin, the TLFN performance is very close to that for Tmax, and the SDSM results are better for cases 4, 5, and 6 than for the other cases. The comparative results (Table 3) indicate that the mean of the 15 members of each delay (case 1) used in Liu et al. (2008) is not the optimal approach for downscaling the ensemble weather predictions.
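The three evaluation statistics can be written compactly as follows (a straightforward sketch of the definitions given above; the function name is illustrative):

```python
import numpy as np

def downscaling_scores(obs, sim):
    """MSE, NMSE (MSE normalized by the variance of the desired output),
    and correlation r, as used to rank the six predictor cases."""
    err = sim - obs
    mse = float(np.mean(err ** 2))
    nmse = mse / float(np.var(obs))
    r = float(np.corrcoef(obs, sim)[0, 1])
    return mse, nmse, r

obs = np.array([1.0, 2.0, 3.0, 4.0])
mse, nmse, r = downscaling_scores(obs, obs + 1.0)  # constant +1 bias
# -> mse = 1.0, nmse = 0.8, r = 1.0
```

Note that NMSE = 1 corresponds to a forecast no better than the observed mean, which is why values well below 1 (e.g., 0.05 for temperature) indicate useful skill while values near or above 1 (e.g., 1.57 for SDSM precipitation) do not.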

Performance of downscaling models for Prec, Tmax, and Tmin for all six cases at the CDD station.

Based on model performance and the lower number of predictors used, case 6 is selected as the optimal approach for downscaling the ensemble weather predictions for both the CDD and CDP stations using SDSM, TLFN, and EPR models. Thus for the CDP station, the same screening method (case 6) is applied and significant predictors from the two closest grid points, (3, 4) and (3, 3), are selected. The selected predictors at CDD and CDP stations are presented in Tables 4 and 5, respectively.

Predictors selected in case 6 for downscaling meteorological data at the CDD station; (4, 3)Fr0 indicates grid point (4, 3) and forecast range (Fr) for delay 0 (which is the forecast for day 1).

Predictors selected for downscaling meteorological data at the CDP station.

### c. Downscaling results

Validation statistics in terms of MSE, NMSE, and *r* are used to evaluate the downscaling models' performance. Table 6 shows the performance statistics of the SDSM, TLFN, and EPR models in downscaling Prec, Tmax, and Tmin at the CDD and CDP stations. For the CDD station, none of the three models performs very well in downscaling Prec: the correlation between the downscaled and observed data is no more than 0.62, and the NMSE is more than 0.65. However, TLFN and EPR perform much better than SDSM, with an NMSE of about 0.65 compared with 1.57 for the SDSM results. All three models perform well in downscaling Tmax and Tmin, especially TLFN and EPR, for which the NMSE is approximately 0.05 and the correlation about 0.97. Surprisingly, TLFN and EPR show very similar performance in downscaling precipitation, Tmax, and Tmin. For the CDP station, judging by the statistics, all three models perform quite well in downscaling temperature: the correlations are above 0.95 and the NMSE statistics below 0.1. Still, TLFN and EPR perform slightly better than SDSM.

Downscaling results for Prec, Tmax, and Tmin at CDD and CDP stations.

In the following section, the downscaled daily precipitation and temperature data from SDSM and TLFN are used as input to the hydrologic models to improve the short-term reservoir inflow and river flow forecasting. Although in most cases, TLFN and EPR have similar performance, the TLFN model results are selected because the model is better established than EPR and is well understood by the authors. SDSM model results are also applied for comparison purposes because it is one of the most widely used downscaling models.

## 4. Hydrologic forecasting models

### a. The Hydrologiska Byråns Vattenbalansavdelning (HBV) Model

The HBV model was developed to cover the most important runoff-generating processes by using the most simple and robust structure possible, and it belongs to the class of semidistributed conceptual models (Khan and Coulibaly 2006). The HBV model was developed at the Swedish Meteorological and Hydrological Institute by Bergström (1976) and Lindström et al. (1996) and was named after the abbreviation of Hydrologiska Byråns Vattenbalansavdelning (Hydrological Bureau Water Balance Section). The HBV model has been applied to a wide range of applications including the analysis of extreme floods, effects of land-use change, and impacts of climate change (Arheimer and Brandt 1998; Brandt 1990; Lidén and Harlin 2000; Dibike and Coulibaly 2005).

In general, the input data required for this model are observed precipitation, air temperature, and estimates of potential evaporation. The model has a routine for snow accumulation and snowmelt based on a degree-day relation with an altitude correction of temperature. Air temperature data are used for the calculation of snow accumulation and melt. Air temperature can also be used to calculate potential evapotranspiration, or to adjust potential evapotranspiration when the temperature deviates from normal values. Where elevations in a basin are highly variable, the basin can be subdivided into elevation zones for the snow and soil moisture routines, and each elevation zone can be divided further into different vegetation zones (forested and nonforested areas). The model can be run independently for several subbasins, whose contributions are combined into a full basin output. For further description of the HBV model as used herein, the reader is referred to Khan and Coulibaly (2006). The model was selected for this study because it has been shown to outperform other physically based models for streamflow modeling in the Saguenay watershed (Dibike and Coulibaly 2005, 2007).
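The degree-day snow routine can be illustrated with a minimal sketch (in the spirit of HBV, not the calibrated model used here; `tt` and `cfmax` are the usual threshold-temperature and degree-day-factor parameters, with illustrative values):

```python
def degree_day_snow(precip, temp, tt=0.0, cfmax=3.5):
    """Minimal degree-day snow routine: below threshold tt (deg C),
    precipitation accumulates as snowpack; above it, melt proceeds at
    cfmax * (T - tt) mm/day, capped by the available pack. Returns the
    daily liquid water (rain + melt) passed on to the soil routine."""
    pack, liquid = 0.0, []
    for p, t in zip(precip, temp):
        if t < tt:
            pack += p          # precipitation falls and stays as snow
            melt = 0.0
            rain = 0.0
        else:
            melt = min(pack, cfmax * (t - tt))
            pack -= melt
            rain = p
        liquid.append(rain + melt)
    return liquid

# A cold day stores 10 mm as snow; a +2 deg C day releases 7 mm of melt.
out = degree_day_snow([10.0, 0.0], [-5.0, 2.0])
# -> [0.0, 7.0]
```

The altitude correction mentioned above would simply adjust `temp` per elevation zone before this routine is applied.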

### b. Bayesian neural network (BNN)-based hydrologic model

Recent advances in the application of data-driven methods such as artificial neural networks (ANNs) for rainfall-runoff modeling have shown promising results. Many studies have demonstrated the capacity of ANNs for improved hydrologic modeling and forecasting (see ASCE Task Committee on Application of Artificial Neural Networks in Hydrology 2000 for a review). The main conclusion of those studies is that ANNs can be considered robust modeling alternatives to conceptual and physically based hydrologic models where the data and resources needed to apply the latter are not available. However, there are major limitations in the conventional neural network approach (Coulibaly et al. 2001). One of the main limitations is that the network is trained by minimizing an error function to obtain the best set of parameters starting from an initial random set. A complex model trained this way can fit the training data well yet still produce large errors on new data, because the training process does not account for uncertainty in the model parameters or in the input–output relationship mapped by the network. A salient feature of the Bayesian learning approach for neural networks is that it can overcome this problem and prevent overfitting when dealing with new data. In addition, it can provide predictions with uncertainty estimates in the form of confidence intervals.

The BNN-based hydrologic model used herein was proposed by Khan and Coulibaly (2006) for daily streamflow modeling in the Saguenay watershed. The BNN model was shown to be very competitive with the HBV model for 1-day-ahead streamflow forecast as well as for assessing hydrologic impact of climate change with uncertainty estimate (Khan and Coulibaly 2010). Therefore, the two models (HBV and BNN) appear to be good candidates for investigating the use of downscaled weather predictions for improved hydrologic forecasts in the same study area. For a full description of the BNN model, readers are referred to Khan and Coulibaly (2006).

The major changes to the BNN model include the number of hidden units and the transfer functions, which are problem dependent. The optimal number of hidden units was determined by varying the number of hidden units from 1 to 50, and selecting the number with the lowest MSE. For the modeling of reservoir inflows and streamflows without using downscaled meteorological data, the optimal number of hidden units was 13 and 19, respectively. When TLFN downscaled data are used, the optimal number of hidden units was 16 and 25 for reservoir inflows and streamflows, respectively. Similarly, when SDSM downscaled data are used in BNN, the optimal number of hidden units was 18 and 25 for reservoir inflows and streamflows, respectively. For all the BNN models, a hyperbolic tangent sigmoid was used in the hidden layer and a linear function at the output layer.

### c. Selection of predictors for hydrologic models

Three hydrologic modeling experiments are run for each model based on the following input sets (Fig. 3):

Case 1: observed meteorological data without downscaled data.

Case 2: observed meteorological data and TLFN downscaled data.

Case 3: observed meteorological data and SDSM downscaled data.

In case 1, for daily river flow forecasting from time *t* + 1 to *t* + 14, the following predictors are used: (i) the total daily precipitation of the last 12 days (from *t* to *t* − 11) as 12 separate inputs, (ii) the moving sum of the last four weeks' snowfall, (iii) the mean daily temperature at *t*, (iv) the moving average of the last two weeks' mean daily temperature, (v) the flow at time *t*, and (vi) the months (January–December) as logical inputs to account for seasonal variability. In total, 28 input vectors are used to simulate the river flow. Similarly, for daily reservoir inflow predictions from time *t* + 1 to *t* + 14, the following input variables are used: (i) the total daily precipitation of the last 7 days (from *t* to *t* − 6) as seven separate inputs, (ii) the moving sum of the last 10 weeks' snowfall, (iii) the mean daily temperature at *t* − 1, (iv) the moving average of the last 9 weeks' mean daily temperature, (v) the flow at time *t* − 1, and (vi) the months (January–December) as logical inputs. This experiment represents the conventional (business as usual) approach for river flow or reservoir inflow forecasting in the study area.

Flowchart showing the connections between data sources, downscaling methods, and hydrologic models used in the study.

Citation: Journal of Hydrometeorology 12, 6; 10.1175/2011JHM1366.1


For cases 2 and 3, where downscaled precipitation and temperature data are included in the model inputs, the downscaled precipitation and temperature for *t* + *n* − 1 and *t* + *n* are added to the inputs of both the HBV and BNN models when forecasting the reservoir inflow or river flow at *t* + *n* (*n* = 1, 2, … , 14).
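How a single input row for a *t* + *n* forecast might be assembled in cases 2 and 3 can be sketched as follows (a hypothetical, simplified layout based on the inflow predictors listed above; the actual experiments use the full predictor sets of section 4c, and the downscaled series are assumed to be indexed by target day):

```python
import numpy as np

def forecast_inputs(flow, prec, temp, ds_prec, ds_temp, t, n):
    """One input row for a t+n inflow forecast (hypothetical layout):
    lagged observed precipitation, temperature and flow up to time t,
    plus downscaled precipitation/temperature for days t+n-1 and t+n."""
    lagged_prec = prec[t - 6 : t + 1][::-1]      # last 7 days, newest first
    return np.concatenate([
        lagged_prec,
        [temp[t], flow[t]],
        [ds_prec[t + n - 1], ds_prec[t + n]],
        [ds_temp[t + n - 1], ds_temp[t + n]],
    ])

series = np.arange(30.0)   # stand-in daily records
row = forecast_inputs(series, series, series, series, series, t=10, n=3)
# 7 lagged precipitation values + temperature and flow at t
# + 4 downscaled values aligned with the forecast day = 13 inputs
```

The essential point is the alignment: the observed predictors stop at time *t*, while the downscaled predictors reach forward to the forecast day *t* + *n*.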

### d. Reservoir inflow forecasting results

Each model's performance is evaluated by three statistics: the root-mean-square error (RMSE, in m^{3} s^{−1}), the correlation coefficient (*r*), and the Nash–Sutcliffe model efficiency index (*R*^{2}). From Table 7, for case 1 (without downscaled data), the HBV model performs well in forecasting up to 5-day-ahead inflows, with *r* values all above 0.85 and *R*^{2} statistics about or greater than 0.7, but its performance starts deteriorating beyond 5-day-ahead forecasts. When the TLFN downscaled data are included in the hydrologic forecasting, a significant decrease in the RMSE is obtained for the 6–14-day-ahead forecasts compared with the model without downscaled data. The RMSE decreases by about 12% (day 7) and 21% (day 14), suggesting that the use of weather predictions does improve week-2 reservoir inflow forecasts. However, for the first 5 days there is no improvement, and even some deterioration is observed. This is not surprising: for short-term (1–5-day ahead) forecasts, the memory effect of past streamflow and past meteorological information outweighs the imperfect weather predictions, even downscaled ones, so the downscaled data merely increase the number of predictors without providing useful information to the model. Conversely, for longer-term (week 2) predictions, the memory effect of the historical flow and meteorological information fades, and the downscaled data play a more significant role in preventing a drastic deterioration of the forecast. In particular, for the 14-day-ahead forecasts, the *R*^{2} increases from 0.39 to 0.61, a significant improvement in the model's forecast skill.
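The three verification statistics can be sketched as follows (standard definitions; the Nash–Sutcliffe efficiency is the index denoted *R*^{2} in the text):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of
    the observations from their mean. 1 is a perfect forecast; 0 means
    no better than always forecasting the observed mean."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root-mean-square error, in the units of the series (m3/s here)."""
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

obs = np.array([10.0, 20.0, 30.0, 40.0])
# The observations themselves score NSE = 1; the mean forecast scores 0.
```

This scale explains why the drop from *R*^{2} = 0.7 to 0.39 at long lead times represents a serious loss of skill, and why recovering 0.61 with downscaled inputs matters.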

HBV model performance statistics for up to 14-day-ahead reservoir inflow forecasting.

When the SDSM downscaled data are included in the hydrologic forecasting, there is also some improvement for 6–14-day-ahead forecasts: the RMSE decreases by about 14% on average for week-2 reservoir inflow forecasts. For 1–5-day-ahead forecasting, however, the model performs worse, and the RMSE increases by about 10% to 20%. This may be caused by the poor performance of the SDSM model, especially in downscaling precipitation. The smaller improvement obtained for week-2 forecasts is also probably due to the lower quality of the SDSM downscaled precipitation compared with the TLFN model results.
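The percentage changes quoted throughout are relative RMSE reductions against the no-downscaling baseline (case 1); a one-line helper (hypothetical, not from the paper) makes the convention explicit.

```python
def rmse_change_pct(rmse_base, rmse_new):
    """Percent RMSE reduction relative to the no-downscaling baseline;
    positive values mean the downscaled inputs improved the forecast."""
    return 100.0 * (rmse_base - rmse_new) / rmse_base
```

For example, a baseline RMSE of 100 m^3 s^-1 dropping to 88 m^3 s^-1 is the "about 12%" improvement reported for the day-7 TLFN case.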

Table 8 shows the BNN model forecast results for up to 14-day-ahead reservoir inflows. For case 1 (without downscaled data), the BNN model performs well up to 7-day-ahead forecasting, with *R*^{2} values above 0.70. Over the whole forecast range, the BNN model outperforms the HBV model (Table 8). However, when TLFN downscaled data are included, the model shows no obvious improvement, as the decrease in RMSE is very slight. When SDSM downscaled data are included, the model performs slightly worse, especially for days 5–14. It can be concluded from these results that the BNN provides no obvious improvements when the downscaled data are included in the reservoir inflow forecasting. This may suggest that the BNN model is not able to capture the relevant information from the downscaled weather forecasts that the HBV model is able to capture. Those results are consistent with previous studies showing that physically based and conceptual models appear the most appropriate for the use of meteorological predictions in hydrologic forecasting (Georgakakos and Hudlow 1984; Wilby et al. 2000; Coulibaly 2003).

BNN model performance statistics for up to 14-day-ahead reservoir inflow forecasting.

To further substantiate the hydrologic models’ performance, observed and predicted 8- and 14-day-ahead reservoir inflows from the HBV model are plotted in Fig. 4. It can be seen that the use of TLFN downscaled data significantly improves the peak flow forecasts. Figure 5 shows the scatterplots of 8- and 14-day-ahead reservoir inflow forecasts versus observations using the HBV model. Figure 5 confirms that the use of TLFN downscaled data provides significant improvement compared with the case where no meteorological forecasts are used. The HBV model forecast skill is improved for peak, average, and low flows, suggesting that the large-scale weather predictions provided by the NCEP GFS can be useful for improved week-2 hydrologic forecasts.

Comparison of hydrographs for 8- and 14-day-ahead reservoir inflow forecasts using the HBV model.

Citation: Journal of Hydrometeorology 12, 6; 10.1175/2011JHM1366.1


Scatterplots of observed vs predicted reservoir inflow using the HBV model.

Citation: Journal of Hydrometeorology 12, 6; 10.1175/2011JHM1366.1


### e. River flow forecasting results

Similar to the reservoir inflow forecasting, up to 14-day-ahead Serpent River flows are forecast in this section for the three different cases of input data. The forecasting results using HBV and BNN are presented in Tables 9 and 10, respectively. From Table 9, for case 1, the HBV model performs well in forecasting up to 4-day-ahead river flow: the *r* values are above 0.85 and the *R*^{2} statistics are about or above 0.70. Beyond 4 days ahead, the model performance deteriorates. When TLFN downscaled data are included in the river flow forecasting, the model performance improves for 5–14-day-ahead forecasts compared with the model run without the downscaled data; the RMSE decreases by about 20% on average for week-2 river flow forecasts. When SDSM downscaled data are included, there are also some improvements for 5–14-day-ahead forecasts, with the RMSE decreasing by about 17% on average for week-2 river flow forecasts.

HBV model performance statistics for up to 14-day-ahead river flow forecasting.

BNN model performance statistics for up to 14-day-ahead river flow forecasting.

Table 10 shows the BNN model results in forecasting up to 14-day-ahead river flows. The BNN model performs well for up to 5-day-ahead forecasting: the *R*^{2} values are about or above 0.70 and the *r* statistics are above 0.82. When TLFN or SDSM downscaled data are included, the model shows no improvement; the RMSE even increases slightly. This suggests that further investigation of the BNN model, including the use of a memory structure, is required to understand why it is unable to capture relevant information from the downscaled meteorological data.

To further substantiate the hydrologic models’ performance on river flow forecasting using downscaled weather data, observed and predicted 8- and 14-day-ahead river flows from the HBV model are plotted in Fig. 6. It can be seen that, overall, the use of SDSM downscaled data significantly improves the spring peak flow forecast, which is particularly important in cold and snowy watersheds. Figure 7 shows the scatterplots of 8- and 14-day-ahead river flow forecasts versus observations using the HBV model. Overall, for peak, average, and low flows, there is improved forecast skill up to 14 days ahead when downscaled weather data are used. Those results (Fig. 7) further substantiate the HBV model performance statistics shown in Table 9 and indicate that improved week-2 hydrologic forecasts can be achieved with downscaled weather data from the NCEP GFS.

Comparison of hydrographs for 8- and 14-day-ahead river flow forecasts using the HBV model.

Citation: Journal of Hydrometeorology 12, 6; 10.1175/2011JHM1366.1


Scatterplots of observed vs predicted river flow using the HBV model.

Citation: Journal of Hydrometeorology 12, 6; 10.1175/2011JHM1366.1


### f. Discussion

Although the HBV model results show some improvements for week-2 river flow and reservoir inflow forecasting when downscaled data are used, it is unclear what role the poor accuracy of the downscaled precipitation plays compared with that of the downscaled temperature. To separate the respective impacts of precipitation and temperature, the HBV model is calibrated and tested using only the spring (March–May) data and is thus called the HBV seasonal model. Two runs are performed: in the first, the seasonal model uses only the TLFN downscaled temperature, while in the second, both the TLFN downscaled temperature and precipitation data are used. Given the important role played by temperature in the spring snowmelt process in the study area, it is anticipated that the high accuracy of the downscaled temperature should help the HBV seasonal model achieve better forecast performance.

Table 11 shows the seasonal model performance statistics for spring reservoir inflow forecasting. When only downscaled temperature is used in the HBV seasonal model, the forecast improvements for week 2 range from 16% to 24%. When both the downscaled temperature and precipitation are used, the forecast improvements for week 2 range from 21% to 31%. Those results indicate that about 75% of the improvement in the week-2 spring reservoir inflow forecast can be attributed to the use of the downscaled temperature. Therefore, given that the downscaled temperature is quite accurate, it can be recommended for improved week-2 spring peak flow forecasting in cold and snowy climates. Weather predictions are not only of high value to the general public (Lazo et al. 2009); they can also be of particular importance for hydrologists if properly processed and used. It is shown that even week-2 weather predictions can be very useful for producing significantly improved week-2 hydrologic forecasts (Table 11). The results in Table 11 also suggest that improving the accuracy of the downscaled precipitation will yield further improvement of the hydrologic model forecast skill for week 2. Some suggestions for improving the downscaling of precipitation and the BNN-based hydrologic model are provided in the following conclusion.
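As a rough consistency check (an illustration, not the paper's own calculation), taking the midpoints of the two reported improvement ranges reproduces the roughly 75% attribution to temperature.

```python
# Midpoints of the reported week-2 RMSE improvement ranges (assumed as
# representative values; the paper reports ranges, not single numbers).
temp_only_mid = (16 + 24) / 2      # downscaled temperature only: 16%-24%
temp_precip_mid = (21 + 31) / 2    # downscaled temperature and precipitation: 21%-31%

# Share of the combined improvement attributable to temperature alone.
share = temp_only_mid / temp_precip_mid
print(f"temperature share of improvement: {share:.2f}")
```

The ratio comes out a little above 0.75, consistent with the attribution stated in the text.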

Comparison of HBV seasonal model performance statistics for up to 14-day-ahead spring reservoir inflow forecasting using downscaled temperature only and downscaled temperature and precipitation.

## 5. Conclusions and suggestions for further research

As shown by the above downscaling and hydrologic forecasting experiments, the NCEP GFS ensemble weather predictions can be effectively downscaled and used for improved week-2 hydrologic forecasting. The comparison of the six cases investigated shows that, in this study, the best approach to downscaling ensemble weather predictions is to use the means of the members derived from the two grid points closest to the meteorological station of interest. Improved downscaling results can be obtained with the TLFN, the EPR, or the SDSM. The TLFN and the EPR have quite similar performance and both outperform the SDSM for downscaling both precipitation and temperature, so the EPR model can be a good alternative to the TLFN for downscaling ensemble weather forecasts. However, none of the three models performs very well for downscaling precipitation.

In forecasting up to 14-day-ahead reservoir inflow using the HBV model without downscaled data, the forecast results are good only up to a 5-day lead time. When the TLFN or the SDSM downscaled data are used, no improvement is obtained for first-week (1–5-day) forecasts, whereas significant forecast improvements are achieved for 6–14-day-ahead forecasts. Specifically, the week-2 reservoir inflow forecast improvement ranges from 12% to 21% with TLFN downscaled data and from 7% to 18% with SDSM downscaled data. Similar results are obtained for the Serpent River flow forecasts. These results indicate that even week-2 weather predictions can be very useful for producing significantly improved week-2 hydrologic forecasts. It is also shown that 75% of the improvement in the week-2 spring reservoir inflow forecast can be attributed to the downscaled temperature alone. However, the forecast results also indicate that the BNN model is unable to capture the relevant information from the downscaled weather forecasts that the HBV model is able to capture; this is attributed to the static structure of the BNN model. In future research, the investigation of the Bayesian learning approach for temporal neural networks, such as recurrent neural networks with a dynamic memory structure, is suggested. The downscaling of precipitation also needs improvement; nonparametric methods such as nearest-neighbor methods are suggested for further investigation. Further research should also consider downscaling all ensemble members of the weather predictions to perform hydrologic ensemble forecasting with an uncertainty estimate. This appears to be the next challenging task needed to facilitate the operational use of ensemble weather predictions in real-time hydrologic forecasting.

## Acknowledgments

This work was supported by a joint grant (CRD) from the Natural Sciences and Engineering Research Council (NSERC) of Canada and Hydro-Quebec. The authors acknowledge the contribution of Noel Evora in collecting and preparing the experiment data. The authors wish to thank O. Giustolisi and D. Savic for providing the EPR software used in the study. The authors thank the Aluminum Company of Canada (ALCAN, now ALCOA) for providing some of the experiment data. The ensemble reforecast data are kindly made available by NOAA at http://www.cdc.noaa.gov/reforecast/. The authors are grateful to two anonymous reviewers for their valuable comments, which helped to improve the manuscript.

## REFERENCES

Arheimer, B., and M. Brandt, 1998: Modelling nitrogen transport and retention in the catchments of southern Sweden. *Ambio*, **27**, 471–480.

ASCE Task Committee on Application of Artificial Neural Networks in Hydrology, 2000: Artificial neural networks in hydrology. II: Hydrologic applications. *J. Hydrol. Eng.*, **5**, 124–137.

Bergström, S., 1976: Development and application of a conceptual runoff model for Scandinavian catchments. SMHI Rep. 7, 134 pp.

Bergström, S., B. Carlsson, M. Gardelin, G. Lindstroem, A. Pettersson, and M. Rummukainen, 2001: Climate change impacts on runoff in Sweden—Assessments by global climate models, dynamical downscaling and hydrological modeling. *Climate Res.*, **16**, 101–112.

Brandt, M., 1990: Simulation of runoff and nitrate transport from mixed basins in Sweden. *Nord. Hydrol.*, **21**, 13–34.

Coulibaly, P., 2003: Impact of meteorological predictions on real-time spring flow forecasting. *Hydrol. Processes*, **17**, 3791–3801.

Coulibaly, P., 2004: Downscaling daily extreme temperatures with genetic programming. *Geophys. Res. Lett.*, **31**, L16203, doi:10.1029/2004GL020075.

Coulibaly, P., F. Anctil, and B. Bobee, 2001: Multivariate reservoir inflow forecasting using temporal neural networks. *J. Hydrol. Eng.*, **6**, 367–376.

Coulibaly, P., Y. B. Dibike, and F. Anctil, 2005: Downscaling precipitation and temperature with temporal neural networks. *J. Hydrometeor.*, **6**, 483–495.

Davidson, J. W., D. A. Savic, and G. A. Walters, 2000: Approximators for the Colebrook-White formula obtained through a hybrid regression method. *Proc. 13th Int. Conf. on Computational Methods in Water Resources*, Calgary, Canada, University of Calgary, 983–989.

Dibike, Y. B., and P. Coulibaly, 2005: Hydrologic impact of climate change in the Saguenay watershed: Comparison of downscaling methods and hydrologic models. *J. Hydrol.*, **307**, 145–163.

Dibike, Y. B., and P. Coulibaly, 2007: Validation of hydrological models for climate scenario simulation: The case of Saguenay watershed in Quebec. *Hydrol. Processes*, **21**, 3123–3135.

Georgakakos, K. P., and M. D. Hudlow, 1984: Quantitative precipitation forecast techniques for use in hydrologic forecasting. *Bull. Amer. Meteor. Soc.*, **65**, 1186–1200.

Giustolisi, O., and D. A. Savic, 2005: A symbolic data-driven technique based on evolutionary polynomial regression. *J. Hydroinformatics*, **8**, 207–222.

Goswami, M., and K. O’Connor, 2007: Real-time flow forecasting in the absence of quantitative precipitation forecasts: A multi-model approach. *J. Hydrol.*, **334**, 125–140.

Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. *Mon. Wea. Rev.*, **132**, 1434–1447.

Hay, L. E., and M. P. Clark, 2003: Use of statistically and dynamically downscaled atmospheric model output for hydrologic simulations in three mountainous basins in the western United States. *J. Hydrol.*, **282**, 56–75.

Hay, L. E., R. L. Wilby, and G. H. Leavesley, 2000: A comparison of delta change and downscaled GCM scenarios for three mountainous basins in the United States. *J. Amer. Water Resour. Assoc.*, **36**, 387–397.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471.

Karunasinghe, D., and S. Liong, 2006: Chaotic time series prediction with a global model: Artificial neural network. *J. Hydrol.*, **323**, 92–105.

Khan, M. S., and P. Coulibaly, 2006: Bayesian neural network for rainfall-runoff modeling. *Water Resour. Res.*, **42**, W07409, doi:10.1029/2005WR003971.

Khan, M. S., and P. Coulibaly, 2010: Assessing hydrologic impact of climate change with uncertainty estimates: Bayesian neural network approach. *J. Hydrometeor.*, **11**, 482–495.

Koza, J. R., 1992: *Genetic Programming: On the Programming of Computers by Natural Selection*. MIT Press, 835 pp.

Lazo, J. K., R. E. Morss, and J. L. Demuth, 2009: 300 billion served: Sources, perceptions, uses, and values of weather forecasts. *Bull. Amer. Meteor. Soc.*, **90**, 785–798.

Leung, L. R., A. F. Hamlet, D. P. Lettenmaier, and A. Kumar, 1999: Simulations of the ENSO hydroclimate signals in the Pacific Northwest Columbia River basin. *Bull. Amer. Meteor. Soc.*, **80**, 2313–2329.

Lidén, R., and J. Harlin, 2000: Analysis of conceptual rainfall-runoff modeling performance in different climates. *J. Hydrol.*, **238**, 231–247.

Lindström, G., M. Gardelin, B. Johansson, M. Persson, and S. Bergström, 1996: HBV-96-En areellt fordelad model for vattenkrafthydrologin. Swedish Meteorologic and Hydrologic Institute Rep. RH 12, 97 pp.

Liu, X., P. Coulibaly, and N. Evora, 2008: Comparison of data-driven methods for downscaling ensemble weather forecasts. *Hydrol. Earth Syst. Sci.*, **12**, 615–624.

Principe, J. C., N. R. Euliano, and W. C. Lefebvre, 2000: *Neural and Adaptive Systems: Fundamentals through Simulations*. John Wiley, 647 pp.

Sivakumar, B., A. Jayawardena, and T. Fernando, 2002: River flow forecasting: Use of phase-space reconstruction and artificial neural networks approaches. *J. Hydrol.*, **265**, 225–245.

Wilby, R. L., L. E. Hay, W. J. Gutowski Jr., R. W. Arritt, E. S. Takle, Z. Pan, G. H. Leavesley, and M. P. Clark, 2000: Hydrological responses to dynamically and statistically downscaled climate model output. *Geophys. Res. Lett.*, **27**, 1199–1202.

Wilby, R. L., C. W. Dawson, and E. M. Barrow, 2002: SDSM—A decision support tool for the assessment of regional climate change impacts. *Environ. Modell. Software*, **17**, 145–159.