## 1. Introduction and literature review

Frequency analysis (FA) is an operational tool commonly used in hydrological analysis. It is a crucial step in the analysis of hydrological risk, enabling optimal water resource management and design of hydraulic structures. The procedure generally consists of identifying the probability distribution that best fits the observed data and hence provides adequate estimates of quantiles associated with specified return periods. In practice, this approach is preferred when enough hydrological information is available at the site of interest. However, the technique becomes inefficient at sites where little or no data are available. Regional FA (RFA) is used instead in such cases to estimate quantiles at ungauged sites (e.g., Cunnane 1988; Burn 1990; Castellarin et al. 2001; Hosking and Wallis 2005). In an RFA, to mitigate the lack of data, regional flood quantiles are estimated by transferring information from gauged sites to the ungauged site (e.g., Burn 1990; Guse et al. 2010; Ouarda 2013; Chebana et al. 2014).

Ouarda et al. (2008) provided an overview of the various available RFA methods. Among regional flood quantile estimation methods, regression and index-flood models are equivalent and are superior to other models. The index-flood method makes the basic assumptions that data at different sites within a homogeneous region are independent and follow the same statistical distribution apart from a scale parameter that characterizes each site (e.g., Brath et al. 2001; Sveinsson et al. 2001; Javelle et al. 2002; Chebana and Ouarda 2009). Conversely, the regression model is a simple approach that allows the use of different distributions for different sites in the region (e.g., Pandey and Nguyen 1999; Shu and Burn 2004; Ouarda et al. 2006). Regression models use a transfer function to find a direct relationship between at-site quantiles (outputs) and physio-meteorological variables (predictors or inputs). They are commonly used in RFA because of their ease of implementation, their rapidity, and their good performance. In this regard, numerous models were proposed in RFA using different transfer functions, including the linear regression model (e.g., Holder 1985; Phien et al. 1990; Pandey and Nguyen 1999; Di Prinzio et al. 2011), the generalized linear model (e.g., Nelder and Baker 1972), the generalized additive model (e.g., Chebana et al. 2014), and artificial neural networks (e.g., Abrahart and See 2007; Shu and Ouarda 2007).

The major drawback of regression-based methods is that they generally provide only the mean or the central part of the at-site quantiles. As a result, most regression methods are applied in RFA to provide the conditional mean of the quantile at ungauged sites given the physiographical variables (e.g., Pandey and Nguyen 1999; Ouarda 2013; Wazneh et al. 2013; Ouali et al. 2016). Hence, estimated quantiles at gauged sites are commonly used to calibrate the transfer function of the regression model in RFA and are not directly derived from the hydrological observations. For each quantile *p*, a regression model has to be performed including variable selection and parameter estimation. Generally, only quantiles estimated with long data series are retained for the calibration and the evaluation of the RFA model, while regional information from sites with few data is ignored. In addition, even if the at-site quantiles are estimated with long data series, they are always inaccurate since different sources of uncertainties may occur (Arnell 1989; Girard et al. 2004; Hamed and Rao 2010). Hence, the use of at-site estimated quantiles for the calibration of the RFA model may induce significant biases in the modeled relationships. This makes the evaluation of the model performance more difficult, especially when statistical evaluation criteria such as the mean errors (MEs) and the root-mean-square errors (RMSEs) are computed using the at-site estimated quantiles. The availability of regression techniques that directly provide the conditional quantile (instead of the conditional mean) would be useful in RFA in order to use the raw data for the model calibration (rather than using the estimated quantiles at gauged sites) as well as the appropriate evaluation criteria.

For this purpose, a quantile regression (QR) model is presented for RFA in the present paper. Unlike classical regression approaches, there is no need to perform at-site studies as a step to provide quantiles for the calibration of the regression model. The model proposed in this work presents another advantage: all regional information can be used to calibrate the QR model, including sites with short data records. Actually, even a single local observation can be considered in the QR approach. QR was developed by Koenker and Bassett (1978) in order to model the functional relationship between the predictors and conditional quantiles of the response distribution. More than a decade later, this technique started to receive considerable attention and a variety of applications were carried out in several fields (Koenker and Hallock 2001; Coad and Rao 2008). Examples of such application fields include meteorology (e.g., Ben Alaya et al. 2016), economics (e.g., Melly 2005), medicine (e.g., Gebregziabher et al. 2011), ecology (e.g., Planque and Buffaz 2008), and education (e.g., Hartog et al. 2001). In terms of statistical development, a number of variations of the QR model were proposed in the literature, such as the single-index QR (Wu et al. 2010) and the penalized single-index QR (Alkenani and Yu 2013). Several other studies, for instance, Cheng et al. (2011), Koenker (2011), and Hu et al. (2013), presented advances in QR modeling. Despite all this diversity and progress in the QR literature, little attention was devoted to this approach in the water resources literature. For instance, Sankarasubramanian and Lall (2003) used a QR model with both synthetic and real data to estimate flood quantiles under climate change circumstances. Cannon (2011) developed a neural network QR model in statistical downscaling of precipitation in order to identify the conditional distribution of a given day. In Villarini et al. (2011), an FA of the annual maximum daily precipitation records was performed in which the QR approach was used to investigate the stationarity assumption. A few other relevant studies have also integrated the QR tool when dealing with precipitation analysis, such as Tareghian and Rasmussen (2013) and Choi et al. (2014). In the present paper, the aim is to investigate the applicability, potential, and benefits of the QR technique in the RFA context. The performance of the proposed approach is evaluated through a rigorous comparison with the classical regression model.

To avoid confusion, note that in some studies, for instance, Palmen et al. (2011) and Haddad and Rahman (2012), the terminology of QR refers to the classical regression model that requires estimated at-site quantiles as inputs to provide quantile estimates at ungauged sites.

The hydrological literature abounds with studies dealing with the development of new RFA models. However, much less attention has been dedicated to the development of new evaluation criteria. Traditional evaluation criteria such as the RMSE and the ME are commonly defined in a cross-validation procedure. Such statistical criteria require the availability of at-site quantiles. Therefore, since they are estimated, they are not suitable in the cross-validation framework. For this purpose, an evaluation criterion directly based on raw data (rather than estimated at-site quantiles) is proposed using the Koenker loss function.

The remainder of this paper is organized as follows. Section 2 presents the theoretical background of linear regression as well as the QR models. In section 3, the adopted methodology for the adaptation of the QR model to the RFA framework is presented. To assess the potential of the proposed method, an evaluation criterion is developed. The QR model is then applied to a case study of the southern part of the province of Quebec, Canada. The considered dataset is described in section 4. Obtained results of flood quantiles estimation corresponding to the 10-, 50-, and 100-yr return periods using both LR and QR models are given in section 5. Section 6 includes a discussion of the implication of the findings to future research into the same field. The last section of the paper is dedicated to concluding remarks.

## 2. Theoretical background

In this section, the adopted procedures in the current paper are described. A brief description of the statistical background of regression models is provided herein.

### a. Linear regression model

[…] *p* in (0, 1)?

### b. Quantile regression

Let us consider the following question: if the sample mean is the solution to the problem of minimizing a sum of squared errors [(2)], and the sample median is the solution to the problem of minimizing a sum of absolute residuals [(4)], which optimization problem can have, as a solution, a given sample quantile of order *p*? By looking for the answer to this question, Koenker and Bassett (1978) introduced the QR technique that provides the conditional quantile of the response variable given a set of predictors. Several applications of QR were carried out in the environmental sciences, such as climatic change detection (Chamaillé-Jammes et al. 2007), air pollution prediction (Sousa et al. 2009), and statistical precipitation downscaling (Friederichs and Hense 2007; Cannon 2011). In this paper, we adopt the QR model in the RFA context.

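In its linear form, the QR model expresses the *p*th conditional quantile of the response variable as a linear function of the predictors:

$$Q_p(y \mid \mathbf{x}) = \mathbf{x}^{\mathsf{T}} \boldsymbol{\beta}_p ,$$

where the parameter vector $\boldsymbol{\beta}_p$ is estimated by minimizing the sum of an asymmetric loss function of order *p*, denoted by the Koenker function (KF):

$$\hat{\boldsymbol{\beta}}_p = \arg\min_{\boldsymbol{\beta}} \sum_{k} \rho_p\left(y_k - \mathbf{x}_k^{\mathsf{T}} \boldsymbol{\beta}\right),$$

with $\rho_p$ defined for any real *u* as

$$\rho_p(u) = u\,[p - I(u < 0)] = \begin{cases} p\,u, & u \ge 0 \\ (p - 1)\,u, & u < 0. \end{cases}$$

Minimizing this sum is equivalent to maximizing a likelihood in which the errors follow an asymmetric Laplace distribution (ALD), whose density in standard form is

$$f_p(z) = p\,(1 - p)\exp\left[-\rho_p(z)\right],$$

where *z* is a given random variable. Note that, unlike the normal distribution, the ALD is more appropriate for high peaks and thick tail data. The reader is referred to Kozubowski and Podgórski (1999) for more details.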
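The link between the KF and sample quantiles can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and the sample values are hypothetical, chosen only for illustration. It searches the observations for the value that minimizes the total KF loss and recovers the median for *p* = 0.5 and the upper order statistic for *p* = 0.9.

```python
def koenker_loss(u, p):
    """Check (Koenker) function: rho_p(u) = u * (p - 1[u < 0])."""
    return u * (p - (1.0 if u < 0 else 0.0))


def total_loss(q, sample, p):
    """Sum of KF values of the deviations y - q over the sample."""
    return sum(koenker_loss(y - q, p) for y in sample)


def quantile_by_loss(sample, p):
    """Observation that minimizes the total KF loss of order p."""
    return min(sorted(sample), key=lambda q: total_loss(q, sample, p))


# Hypothetical annual maxima (arbitrary units), for illustration only.
sample = [12.0, 7.5, 9.1, 15.3, 8.2, 11.4, 10.0, 13.7, 6.9, 9.8, 14.1]

median_fit = quantile_by_loss(sample, 0.5)  # 10.0, the 6th of 11 ordered values
q90_fit = quantile_by_loss(sample, 0.9)     # 14.1, the 10th of 11 ordered values
print(median_fit, q90_fit)
```

Restricting the search to the observations is sufficient here because the total loss is piecewise linear and convex in *q*, so a minimizer always exists among the data points.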

The QR model also presents a number of attractive statistical properties that are absent in the LR model, such as the invariance with respect to any monotonic transformation and the robustness against outliers. More theoretical aspects are detailed in Koenker (2005). Another important advantage of the QR model is the simultaneous estimation of quantiles with different orders *p* (e.g., Tokdar and Kadane 2011; Reich and Smith 2013). The simultaneous QR estimation can be seen as a solution to the crossing QR problem by imposing simultaneous noncrossing constraints (Liu and Wu 2011). Unlike the proposed QR-based approach, the classical regression modeling in RFA focuses only on few values of *p* and also requires conducting a new analysis for each new value of *p* (including each time variable selection, transformations, assumptions checking). Table 1 summarizes the main differences between the classical LR model and the QR model.
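The invariance property mentioned above can be illustrated with a short sketch. It assumes a quantile defined as an order statistic (no interpolation) and uses hypothetical flow values; neither choice comes from the paper. The quantile of the log-transformed data equals the log of the quantile, whereas the same is not true of the mean.

```python
import math


def order_quantile(sample, p):
    """Quantile of order p taken as the ceil(n * p)-th order statistic."""
    ordered = sorted(sample)
    k = max(1, math.ceil(len(ordered) * p))
    return ordered[k - 1]


# Hypothetical annual maxima (arbitrary units), for illustration only.
flows = [12.0, 7.5, 9.1, 15.3, 8.2, 11.4, 10.0, 13.7, 6.9, 9.8, 14.1]
p = 0.9

# Quantiles commute with a monotonic transformation such as the logarithm.
log_of_quantile = math.log(order_quantile(flows, p))
quantile_of_logs = order_quantile([math.log(y) for y in flows], p)

# Means do not: the log of the mean exceeds the mean of the logs.
log_of_mean = math.log(sum(flows) / len(flows))
mean_of_logs = sum(math.log(y) for y in flows) / len(flows)

print(log_of_quantile == quantile_of_logs, log_of_mean > mean_of_logs)
```

This is one way to see why a mean-based model fitted on a log scale needs a bias correction when back-transformed, while a quantile-based model does not.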

LR vs QR model characteristics in RFA context.

## 3. Application to RFA

In hydrological RFA, the aim is to estimate flood quantiles at ungauged sites. However, classical regression models provide estimates of the conditional mean of the response variable. Consequently, these models are calibrated using estimated quantiles derived from a local FA that may generate unreliable results. In this section, the adaptation of the above statistical tools within the RFA context is briefly presented.

### a. Regional models

Given *N* hydrologic stations, for a given station *i*, let **Y**_{i} be the hydrological vector of the series of maximum annual streamflows of length *n*_{i}, and let *y*_{ij} be the *j*th observation (maximum annual streamflow at year *j* for station *i*), where *j* = 1, …, *n*_{i}. For each station, let us define the vector **x**_{i} of *m* physio-meteorological variables.

[…] *p* estimated using local FA at a site *i* when the series of observations is of adequate length. Typically, a classical LR is performed using the log-linear regression function: […]

[…] **Y**_{i} at a given site *i*. Once the vector of parameters is estimated, the regional quantile at ungauged site *i*′ can be estimated given the physio-meteorological variables […] (*n*_{i} = 30 or *n*_{i} = 40 years), and these same sites are also retained for the validation of the model in a cross-validation (leave-one-out) procedure. For the remainder, let *S*_{l} denote the number of sites with record length exceeding a certain length *l*. As mentioned above, the estimated at-site quantiles […] *p* at a site *i* is written as […] **x** and observed annual maximum flood records **Y**. Hence, the whole available dataset, even sites with very short records, can be employed in the calibration procedure. The use of the QR method would thus involve much more information than the traditional one. In addition, there is no need to employ a log transformation in the regression model, unlike the classical regression approach, in which the transformation introduces an additional bias.

Figure 1 illustrates the steps involved in regional quantile estimation using both LR and QR approaches. One can identify differences concerning the calibration of the two considered approaches; when dealing with the conventional RFA, one has to carry out an FA at each site containing enough data records within the region of interest. This step can require considerable time and expertise since it includes, for each site, 1) the check of the basic assumptions of FA, including stationarity, independence, and homogeneity; 2) the identification of the frequency distribution that best fits each data series; 3) the estimation of the distribution parameters; and 4) the estimation of the at-site quantile of order *p* (e.g., Chebana et al. 2013). These quantiles are then used in the LR model as the outputs or variables of interest. Using the proposed QR model, as illustrated in Fig. 1, all observed data are directly input into the regression model to get the regional quantile without conducting at-site analysis.
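The pooling idea described above can be sketched as follows. This is an illustrative toy, not the authors' implementation: it assumes a single hypothetical site descriptor, made-up flow values, and a linear quantile function with an intercept. The fit is an exhaustive search over lines through two observations, a simple stand-in for the linear programming solvers normally used for QR. It is exact for small problems because the total KF loss is piecewise linear and convex, so a minimizer passes through at least two observations from different sites.

```python
from itertools import combinations


def koenker_loss(u, p):
    """Check (Koenker) function: rho_p(u) = u * (p - 1[u < 0])."""
    return u * (p - (1.0 if u < 0 else 0.0))


def pool_observations(sites):
    """One (x_i, y_ij) row per annual maximum; x_i is repeated for each year."""
    return [(x, y) for x, series in sites for y in series]


def pooled_loss(b0, b1, rows, p):
    """Total KF loss of the line b0 + b1 * x over all pooled observations."""
    return sum(koenker_loss(y - (b0 + b1 * x), p) for x, y in rows)


def fit_qr_line(rows, p):
    """Exact fit of q_p(x) = b0 + b1 * x by searching lines through two rows."""
    best = None
    for (x1, y1), (x2, y2) in combinations(rows, 2):
        if x1 == x2:  # two years at the same site do not define a line
            continue
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        loss = pooled_loss(b0, b1, rows, p)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best


# Hypothetical region: (descriptor x_i, annual maxima y_ij). Values are made up.
sites = [
    (1.0, [3.1, 2.4, 4.0, 2.9, 3.6, 5.2, 2.7, 3.3]),
    (2.0, [5.0, 6.3, 4.4, 7.1, 5.6]),
    (3.0, [8.2, 6.9, 9.5, 7.7, 10.4, 8.8]),
    (4.0, [11.0]),  # a single observation still contributes to calibration
    (5.0, [12.5, 14.2, 11.8]),
]

rows = pool_observations(sites)
loss, b0, b1 = fit_qr_line(rows, p=0.9)

# Regional estimate at a hypothetical ungauged site with descriptor x = 3.5.
q90_ungauged = b0 + b1 * 3.5
print(len(rows), round(q90_ungauged, 3))
```

Note that no at-site quantile is computed anywhere in this sketch: the calibration uses the raw observations, and short records enter on the same footing as long ones.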

### b. Model quality assessment

One of the objectives of the present research is to evaluate the performance of the QR approach and to compare it to the classical LR model. All evaluation criteria commonly used for this purpose, namely, the Nash criterion (NASH), the RMSE, the relative RMSE (RRMSE), the mean bias (BIAS), and the relative mean bias (RBIAS), are established using at-site estimated quantiles. They consist of calculating the residual errors between the at-site estimated quantiles and the regional estimated ones. This approach considers estimated at-site quantiles as a perfect estimation. Indeed, the total error related to the regional LR model results from two main sources: 1) the at-site estimation error (denoted

Illustration of regional quantile estimation error related to the LR and QR models, compared to the “true” quantile.

Citation: Journal of Hydrometeorology 17, 6; 10.1175/JHM-D-15-0187.1


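The proposed criterion, denoted MPLF, averages the KF values of the deviations between each raw observation and the regional quantile estimate at its site:

$$\mathrm{MPLF}_p = \frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{n_i}\rho_p\left(y_{ij} - \hat{q}_{p,i}\right),$$

where $\rho_p$ is the KF of order *p*, $\hat{q}_{p,i}$ is the regional estimate of the quantile of order *p* at site *i*, and *n* denotes the total number of observations in all stations combined.

The rationale of this criterion is to assess the performance of adopted models through the use of raw observed data. The lower the criterion value, the more suitable the considered model. As indicated in Koenker and Machado (1999), the use of the optimal value of the loss function as a goodness-of-fit measure is a very natural idea commonly used in the robust statistics literature. Based on this argument, the MPLF criterion was calculated by summing up the values of KF computed at each site and then standardizing by the total number of observations at all sites. Thus, this calculation procedure attributes to each site *i* a weight that depends on its number of observations, in other words, giving more importance to sites with long data series.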
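The calculation described above (sum the KF values at each site, then divide by the total number of observations) can be sketched as follows. The function names, the dictionary layout, and the numbers are assumptions made for illustration; they are not taken from the paper.

```python
def koenker_loss(u, p):
    """Check (Koenker) function: rho_p(u) = u * (p - 1[u < 0])."""
    return u * (p - (1.0 if u < 0 else 0.0))


def mplf(site_records, regional_quantiles, p):
    """Sum of KF values over all raw observations, divided by their total count."""
    total = 0.0
    n = 0
    for site, observations in site_records.items():
        q = regional_quantiles[site]
        total += sum(koenker_loss(y - q, p) for y in observations)
        n += len(observations)
    return total / n


# Hypothetical raw records and regional estimates from two competing models.
records = {"A": [1.0, 2.0, 4.0], "B": [5.0]}
model_1 = {"A": 3.0, "B": 4.0}
model_2 = {"A": 6.0, "B": 9.0}

score_1 = mplf(records, model_1, p=0.5)
score_2 = mplf(records, model_2, p=0.5)
print(score_1, score_2)  # the model with the lower value is preferred
```

Because the division is by the total number of observations rather than by the number of sites, a site with three observations counts three times as much as a site with one, which is the weighting described in the text.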

The concept behind this criterion as well as the classical ones is explained graphically in Fig. 3. As indicated in this figure, these criteria (proposed and traditional ones) are calculated based on a cross-validation procedure (e.g., Ouarda et al. 2001). It consists of temporarily removing each site and considering it as an ungauged one. A new flood record value is thus estimated and the ability of each method is then evaluated. This figure illustrates the difference between classical evaluation indices and the proposed one. When the classical approach is used, the error related to at-site estimation is not included in the evaluation procedure. Indeed, the at-site estimated quantiles are considered as references, although they are not observed data. The quality of the at-site estimated quantiles depends strongly on the record lengths (Tasker and Moss 1979). Consequently, in order to ensure a minimum reliability, the RMSE criterion should be calculated over a subset **e** of sites with sufficiently long records, for instance, more than 30 years. Using the MPLF criterion, there is no need to reduce the dataset for either the fitting or the evaluation step.
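The cross-validation loop and the classical RMSE restricted to long records can be sketched as below. Everything here is a simplification for illustration: the "regional model" is a nearest-neighbor stand-in (not the LR or QR model of the paper), the at-site quantile is an order statistic rather than a fitted distribution, and the site data and the minimum record length of 4 are hypothetical.

```python
import math


def order_quantile(sample, p):
    """Quantile of order p taken as the ceil(n * p)-th order statistic."""
    ordered = sorted(sample)
    return ordered[max(1, math.ceil(len(ordered) * p)) - 1]


def predict_nearest(training, x, p):
    """Stand-in regional model: quantile of the closest training site's record."""
    nearest = min(training.values(), key=lambda site: abs(site["x"] - x))
    return order_quantile(nearest["y"], p)


def leave_one_out(sites, p):
    """Treat each site in turn as ungauged and estimate it from the others."""
    estimates = {}
    for name, site in sites.items():
        training = {k: v for k, v in sites.items() if k != name}
        estimates[name] = predict_nearest(training, site["x"], p)
    return estimates


def rmse_long_records(sites, estimates, p, min_length):
    """Classical RMSE against at-site quantiles, kept to long records only."""
    kept = [k for k, s in sites.items() if len(s["y"]) >= min_length]
    squared = [(estimates[k] - order_quantile(sites[k]["y"], p)) ** 2 for k in kept]
    return math.sqrt(sum(squared) / len(squared)), kept


# Hypothetical sites: one descriptor x and a raw record y (values are made up).
sites = {
    "A": {"x": 1.0, "y": [3.0, 4.0, 5.0, 6.0]},
    "B": {"x": 1.2, "y": [4.0, 5.0, 7.0, 8.0, 9.0]},
    "C": {"x": 3.0, "y": [10.0, 12.0]},
    "D": {"x": 3.3, "y": [11.0, 13.0, 15.0, 12.0, 14.0, 16.0]},
}

estimates = leave_one_out(sites, p=0.5)
rmse, kept = rmse_long_records(sites, estimates, p=0.5, min_length=4)
print(estimates, rmse, kept)
```

In this toy case, site C is dropped from the RMSE because its record is too short, which mirrors the loss of information that a raw-data criterion avoids.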

Procedure to evaluate the LR and QR models.


The basic concepts behind the proposed criterion are justified in a similar way as in Koenker and Machado (1999). In this study, the authors developed the

## 4. Case study and study design

The proposed procedure was applied to the dataset from the hydrometric station network of southern Quebec, provided by the Quebec Ministry of the Environment (province of Quebec, Canada). A total of 151 stations, located between 45° and 55°N in the southern part of Quebec, are selected. Two types of variables are considered: physio-meteorological variables and hydrological variables. The physio-meteorological variables are those used previously by Chokmani and Ouarda (2004), namely, the mean basin slope (PMBV; %), the basin area (BV; km^{2}), the proportion of the basin area covered by lakes (PLAC; %), the annual mean total precipitation (PTMA; mm), and the annual mean degree days over 0°C (DJBZ; degree day). Hydrological variables are at-site specific flood quantiles, corresponding to return periods *T* = 10, 50, and 100 years, denoted *Q*_{ST} (m^{3} s^{−1} km^{−2}). For each site, the most appropriate statistical distribution has been identified in order to estimate at-site quantiles for different return periods.
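For reference, under the usual convention for annual maxima, a return period *T* (in years) corresponds to the nonexceedance probability *p* = 1 − 1/*T*. The correspondence below is a standard relation stated here as a reading aid; it is not quoted from the paper.

$$p = 1 - \frac{1}{T}: \qquad T = 10 \Rightarrow p = 0.90, \qquad T = 50 \Rightarrow p = 0.98, \qquad T = 100 \Rightarrow p = 0.99 .$$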

For the proposed approach, we use the same information as for the classical regional estimation methods (the flood record at gauged sites) but in a different way: we use the raw flow data available at each station for the period between 1900 and 2002, rather than using the processed quantile data at the gauged sites. Figure 4 represents the spatial variation of length of historical data records for each site ranging from 15 to 84 years.

Map showing the spatial variation of flood record length at gauged sites.


The present study seeks to address the classical RFA gaps through 1) providing a direct quantile estimation without performing an at-site FA and 2) proposing an adapted evaluation criterion. This is performed in several steps:

1) Apply and compare the considered models (QR and LR) using different criteria; the calibration and the application of both models are performed using the entire dataset.

2) Take into account the at-site quantile estimation quality; modify the data used for the calibration step.

3) Consider a more suitable case for which the LR performs well and the QR advantages are accounted for: the QR model is built and assessed using the entire dataset, and the LR model is built using only sites with record length exceeding 30 years and evaluated using the entire dataset.

4) Compare both models using the MPLF criterion; the concept of this criterion permits the model assessment using the entire dataset.

## 5. Results

Results related to the application of both QR and LR using the whole dataset are initially reported. Then, the next step consists of investigating the effect of long local data series through a comparison of the results of the various models. Finally, comparison results of the two models based on the MPLF criterion are presented.

By considering the entire set of 151 sites, regional estimated quantiles using both LR and QR models are compared with those obtained from the at-site estimated quantiles. Obtained results are illustrated in Fig. 5. It is important to recall that the at-site quantile estimates are considered as reference values; thus, the closer a regional estimate is to the at-site estimate, the more accurate it is deemed to be. A comparison of the two results reveals that the LR reproduces the at-site estimated quantiles more adequately than the QR model, which generally overestimates them. Associated BIAS and RMSE values of each case are also reported. One can see that quantile estimates obtained from the LR model are less biased (i.e., smaller BIAS) and more accurate (i.e., smaller RMSE) than QR quantiles. This finding is expected and can be explained by the definition of these criteria that are, by construction, based on the at-site estimated quantiles. Indeed, regardless of the quality of the at-site processed data, these criteria have often been used to measure the regional flood quantile estimation performance. Recall that the BIAS is related to the deviation from the true value while the RMSE is associated with the model accuracy. Thus, a good estimation quality is systematically related to long data series since the at-site estimation error, denoted

Scatterplots of at-site and regional estimated quantiles using (a),(c),(e) the LR model and (b),(d),(f) the QR model for *Q*_{S10}, *Q*_{S50}, and *Q*_{S100}. Both models are calibrated and evaluated over the entire dataset. Shading of points indicates the length of records.


To quantify the at-site estimation error and to assess the record length effect on the estimation quality, we proceeded by Monte Carlo simulation (e.g., Arora and Singh 1989). For a selected site with a long data series, the appropriate distribution was identified and the quantile of interest […] *Q*_{S10}, *Q*_{S50}, and *Q*_{S100} are 8.77 × 10^{−5}, 1.46 × 10^{−4}, and 2.25 × 10^{−4} m^{3} s^{−1} km^{−2}, respectively. Hence, an average quantification of the total at-site RMSE may be obtained by multiplying the RMSE related to the mean record length by the number of sites, 151 (giving 0.013, 0.022, and 0.034 m^{3} s^{−1} km^{−2}, respectively).

Evaluation of the quadratic error related to the at-site flood quantile estimates for various record lengths. Simulations are performed using Weibull parameters of a fixed site belonging to the dataset. For each record length 100 series were randomly generated using the same Weibull distribution. The quantile of each series is then estimated and compared to the theoretical one (derived from the fixed distribution).
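The simulation design described in this caption can be sketched as follows. The Weibull parameters, the record lengths, and the estimator are assumptions made for illustration: the paper's site parameters are not reproduced here, and the fit uses a simple least-squares line on the Weibull probability plot as a stand-in for whatever estimation method the authors applied.

```python
import math
import random


def weibull_quantile(scale, shape, p):
    """Quantile of order p of a two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def fit_weibull_probability_plot(sample):
    """Least-squares fit of ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale)."""
    ordered = sorted(sample)
    n = len(ordered)
    xs = [math.log(v) for v in ordered]
    # Median-rank plotting positions (Bernard's approximation).
    ys = [math.log(-math.log(1.0 - (k - 0.3) / (n + 0.4))) for k in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    shape = sxy / sxx
    scale = math.exp(mx - my / shape)
    return scale, shape


def simulated_rmse(scale, shape, p, record_length, n_series=100, seed=0):
    """RMSE of the estimated quantile over n_series synthetic records."""
    rng = random.Random(seed)
    true_q = weibull_quantile(scale, shape, p)
    squared_errors = []
    for _ in range(n_series):
        series = [rng.weibullvariate(scale, shape) for _ in range(record_length)]
        est_scale, est_shape = fit_weibull_probability_plot(series)
        estimate = weibull_quantile(est_scale, est_shape, p)
        squared_errors.append((estimate - true_q) ** 2)
    return math.sqrt(sum(squared_errors) / n_series)


# Hypothetical parent distribution; p = 0.99 corresponds to T = 100 years.
for length in (15, 30, 60):
    print(length, simulated_rmse(scale=1.0, shape=2.0, p=0.99, record_length=length))
```

Under these assumptions, the error shrinks as the record length grows, which is the qualitative behavior the figure is meant to show.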


An important question that can be raised at this level is the effect of considering longer records, which systematically implies better at-site quantile estimation quality. In this respect, quantile estimation at ungauged sites was performed using different minimum record lengths *l* belonging to the set {15, 20, 30, 40, 50, 60}. For each *l* value, *S*_{l} sites are used, ranging from 151 sites (associated with *l* = 15 years, i.e., all sites) to only 14 sites (Fig. 7). The differences in the performance criterion RMSE using the LR and QR approaches are presented in Fig. 8. From Figs. 8a and 8b, it can be seen that the LR method leads to a better performance in all considered cases. However, as pointed out previously, from a conceptual viewpoint, the consideration of such an evaluation criterion is in favor of the LR model since both are based on the estimated values of flood quantiles. Thus, it would be advantageous to assess models with an objective tool such as the above-mentioned MPLF.

Bar plot of number of stations. Classes are defined to indicate the number of stations with record lengths exceeding a given minimum.


RMSE of the regional estimators of (a) *Q*_{S50} and (b) *Q*_{S100} as well as the MPLF of the regional estimators of (c) *Q*_{S50} and (d) *Q*_{S100} according to the length of regional data series. Both models are calibrated using sites with record length exceeding *l* years, except (c) and (d), where QR model was calibrated using the entire dataset; QR and LR validation is done using the entire dataset.


Table 2 presents the validation results of both LR and QR models using the proposed observation-based MPLF criterion. It can be seen that QR outperforms the LR model. This finding remains the same when considering different record lengths for the LR calibration. Figures 8c and 8d summarize the results of the LR method and the QR method in terms of MPLF for *Q*_{S50} and *Q*_{S100}, respectively. For all values of *l*, the QR model shows a better performance than the classical model (i.e., smaller MPLF). Note also that, for higher record lengths (meaning fewer considered sites for the calibration step), the LR performance decreases. On the other hand, since QR and the MPLF criterion do not depend on the at-site quantile estimation, the MPLF values associated with the QR approach remain constant.

MPLF values associated to QR and LR approaches. Best results are in boldface.

One of the objectives of the present study was to tangibly assess (in reference to observed data) the performance of the two regional estimation approaches. To achieve this, comparison results using the MPLF criterion were further developed. For each site, the PE given in (14) was calculated and the differences between the LR and the QR approaches are highlighted in Fig. 9 for *Q*_{S10}, *Q*_{S50}, and *Q*_{S100}. This figure indicates that for long data series (dark points) both models behave well, meaning that the errors are almost equal and small compared to those corresponding to short records (light points). Furthermore, it can be seen that QR errors, especially for short record lengths, are smaller than LR errors, which is in accordance with our earlier findings. We have also identified some problematic sites associated with high PE for both models, namely, sites with identification numbers 030415, 073301, 076601, and 080104.

QR piecewise error (QRPE) vs LR piecewise error (LRPE) associated with (a) *Q*_{S10}, (b) *Q*_{S50}, and (c) *Q*_{S100}. Both models are validated over the entire dataset. LR calibration is done using sites with record length exceeding 30 years; QR calibration is done over all sites. Shading of points indicates the length of records.


It is also of interest to verify whether the basin size has an effect on the performance criterion. Figure 10 illustrates the PE of *Q*_{S100} in all sites for the LR model (Fig. 10a) and the QR model (Fig. 10b). The basin size does seem to influence the performance criterion: in both cases (LR and QR), the error was larger for smaller basins than for larger ones. Note also that the QR errors are less scattered than the ones resulting from the classical model, which is consistent with the robustness of the QR model.

Fig. 10. Piecewise error of regional quantile estimates of *Q*_{S100} using the (a) LR and (b) QR models for various basin areas on a logarithmic scale.

## 6. Discussion

This work was motivated by the weaknesses of the classical approach. Most traditional regression-based approaches are designed to answer the question of how to transfer estimated quantiles from gauged sites to ungauged ones. Consequently, these classical approaches depend strongly on the quality of the at-site quantile estimates, which in turn depend on the choice of the distribution function, the parameter estimation method, and the length of the data series.

To address these limitations, the proposed approach asks the following question: why not estimate quantiles at an ungauged site by transferring information from observed hydrological data rather than from estimated quantiles? The answer lies in the principle of the traditional regression model. Because classical regression models give only the conditional mean of the response variable, traditional regression-based approaches in RFA are usually applied to processed quantile data at the gauged sites rather than to raw flow data. The at-site estimation error is often evaluated using simulations. In the present work, the at-site error was assessed using Monte Carlo simulations. The results indicate that, somewhat surprisingly, the at-site error is relatively high for the mean record length. To avoid training regression models with at-site estimated quantiles, the QR model is introduced in the present study in the RFA context. An evaluation criterion that uses raw data as the reference is also proposed to assess the models’ performance. This is the main advantage of the MPLF criterion. Its second advantage is that it includes information not only from long historical records but also from short ones. Unlike the classical approach, it therefore retains all available information.
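A Monte Carlo assessment of at-site error of the kind mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, a Gumbel parent distribution with hypothetical parameters (`LOC`, `SCALE`) fitted by the method of moments; the distribution, the fitting method, and the record lengths used in the study may differ. The sketch only shows how the relative RMSE of an at-site 100-yr quantile estimate changes with record length.

```python
import numpy as np

# Hypothetical Gumbel parent (an assumption, not the distribution used in the study).
LOC, SCALE = 100.0, 30.0
EULER_GAMMA = 0.5772156649015329


def gumbel_quantile(loc, scale, return_period):
    """Gumbel quantile with non-exceedance probability 1 - 1/T."""
    p = 1.0 - 1.0 / return_period
    return loc - scale * np.log(-np.log(p))


def relative_rmse(record_length, return_period=100, n_rep=2000, seed=0):
    """Monte Carlo relative RMSE of the at-site quantile estimate."""
    rng = np.random.default_rng(seed)
    # One synthetic record per row.
    samples = rng.gumbel(LOC, SCALE, size=(n_rep, record_length))
    # Method-of-moments estimates for each synthetic record.
    scale_hat = samples.std(axis=1, ddof=1) * np.sqrt(6.0) / np.pi
    loc_hat = samples.mean(axis=1) - EULER_GAMMA * scale_hat
    q_hat = gumbel_quantile(loc_hat, scale_hat, return_period)
    q_true = gumbel_quantile(LOC, SCALE, return_period)
    return np.sqrt(np.mean((q_hat - q_true) ** 2)) / q_true


errors = {n: relative_rmse(n) for n in (15, 30, 60)}
for n, err in errors.items():
    print(f"record length {n:3d}: relative RMSE of Q100 = {err:.3f}")
```

Under these assumptions the error shrinks as the record lengthens, which is the dependence on record length that the classical approach inherits from its at-site step.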

The results of this study indicate that the proposed criterion performs well. Furthermore, under this criterion each site receives a weight proportional to its record length: the longer the data series, the greater the corresponding weight. This contrasts with classical criteria such as the RMSE, which are not sensitive to data size characteristics, that is, the site record length or the region size.
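The weighting by record length can be made concrete with a short sketch. It assumes that the criterion averages the check (piecewise linear) loss over all observations pooled across sites; this is an assumption about the form of the MPLF made for illustration, not a restatement of its definition. The records and the regional estimates `q_hat` below are synthetic. Under that assumption, the pooled criterion equals a weighted mean of per-site losses with weights equal to each site's share of the total number of observations.

```python
import numpy as np


def check_loss(residuals, tau):
    """Piecewise linear (check) loss for quantile order tau."""
    r = np.asarray(residuals, dtype=float)
    return np.where(r >= 0.0, tau * r, (tau - 1.0) * r)


rng = np.random.default_rng(1)
tau = 0.9

# Two synthetic sites with record lengths 10 and 40 (hypothetical data).
records = [rng.gumbel(100.0, 30.0, size=n) for n in (10, 40)]
q_hat = [150.0, 170.0]  # hypothetical regional quantile estimates per site

# Mean check loss at each site, and each site's share of the pooled sample.
site_losses = np.array(
    [check_loss(y - q, tau).mean() for y, q in zip(records, q_hat)]
)
lengths = np.array([len(y) for y in records])
weights = lengths / lengths.sum()

# Criterion computed over all pooled observations (assumed form).
pooled = np.concatenate(
    [check_loss(y - q, tau) for y, q in zip(records, q_hat)]
).mean()

weighted = np.sum(weights * site_losses)  # matches the pooled value
unweighted = site_losses.mean()           # treats both sites equally

print(weights, pooled, weighted, unweighted)
```

An unweighted site average, by contrast, gives the 10-yr record the same influence as the 40-yr record.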

An alternative application of QR in the RFA framework is to generate data at ungauged sites. The idea is to generate a set of quantiles given only the physio-meteorological variables at a site. Indeed, the QR model can be used to estimate the conditional distribution by generating quantile orders randomly between 0 and 1 (Cannon 2011). The resulting sample is then compared with the observed one using nonparametric tests, namely, the two-sample Kolmogorov–Smirnov and Kruskal–Wallis tests. This procedure was applied in the present work, at the 5% significance level, to some selected sites that were treated as ungauged. The resulting *p* values suggest that, for all chosen sites, the hypothesis that the observed and simulated samples come from the same continuous distribution cannot be rejected. This finding may serve as a basis for several practical applications within the RFA context.
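The generation procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the QR model is taken to be linear and is fitted by linear programming, the helper names `fit_linear_qr` and `simulate_at_site` are hypothetical, and all data are synthetic with a single standardized predictor.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import ks_2samp, kruskal


def fit_linear_qr(X, y, tau):
    """Linear quantile regression of order tau, solved as a linear program."""
    n, p = X.shape
    # Variables: beta (free), u >= 0 and v >= 0 with y - X beta = u - v.
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def simulate_at_site(X, y, x0, n_draws, rng):
    """Draw quantile orders uniformly on (0, 1) and predict each quantile at x0."""
    taus = rng.uniform(0.0, 1.0, size=n_draws)
    return np.array([x0 @ fit_linear_qr(X, y, t) for t in taus])


rng = np.random.default_rng(0)

# Synthetic pooled data from "gauged" sites (hypothetical predictor and response).
n = 200
predictor = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), predictor])
y = 1.0 + 2.0 * predictor + (0.2 + 0.4 * predictor) * rng.standard_normal(n)

# Target site treated as ungauged, with a synthetic "observed" record for comparison.
x0 = np.array([1.0, 0.5])
observed = 2.0 + 0.4 * rng.standard_normal(30)

simulated = simulate_at_site(X, y, x0, n_draws=50, rng=rng)
ks_stat, ks_p = ks_2samp(simulated, observed)
kw_stat, kw_p = kruskal(simulated, observed)
print(f"KS p value = {ks_p:.3f}, Kruskal-Wallis p value = {kw_p:.3f}")
```

A *p* value above the chosen significance level means only that the test does not reject equality of the two distributions; it does not establish it.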

Another important issue concerns the first step of traditional RFA, namely, the delineation of homogeneous regions (DHR). This challenging step consists of grouping sites into homogeneous regions, in other words, identifying groups of stations that have similar hydrological, meteorological, and physiographical behavior. Doing so can significantly improve quantile estimates at ungauged sites. Indeed, DHR-based approaches provide smaller standard errors than approaches applied to the entire region (Gingras and Adamowski 1993). In the current work we opted for two regional (one step) models that estimate regional flood quantiles at ungauged sites without identifying sites whose hydrological behavior is similar to that of the target site, because the proposed methodology aimed only to compare the two regression models. It would be worthwhile to assess the performance of the QR method in association with a number of commonly used DHR methodologies (e.g., Ouali et al. 2016).

Moreover, it should be noted that, as is the case for most RFA models, the proposed approach is based on the stationarity assumption, meaning that estimated quantiles for a given site will not vary significantly over time. This assumption was investigated using the nonparametric Spearman test (Yue et al. 2002; Villarini et al. 2011). Application of this test to the annual maximum streamflow series of each gauged station indicates that 3 of the 151 stations are nonstationary at a significance level of 1% (Table 3), as found by Kouider (2003). Given the small percentage of rejected stations (2%) and to maximize sources of information, these stations were retained in this study, as was the case in previous studies (e.g., Chokmani and Ouarda 2004). Note that at a 5% significance level, 20 stations are found to be nonstationary. In this case, the proposed approach was also applied to the remaining stationary stations only. The results are similar to those obtained with all stations and lead to the same conclusions.

Table 3. Rejected stations at a significance level of 1% using the Spearman test.
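The stationarity screening described above can be sketched with SciPy's `spearmanr`, which returns the rank correlation between time and the series together with a two-sided *p* value. The function name `spearman_trend_test` and both example series are hypothetical; the station data used in the study are not reproduced here.

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_trend_test(annual_maxima, alpha=0.01):
    """Spearman rank test for a monotonic trend in a series of annual maxima."""
    series = np.asarray(annual_maxima, dtype=float)
    time_index = np.arange(series.size)
    rho, p_value = spearmanr(time_index, series)
    return {
        "rho": float(rho),
        "p_value": float(p_value),
        "reject": bool(p_value < alpha),  # True: stationarity rejected at level alpha
    }


# Hypothetical 40-yr series with an almost perfectly monotonic increase.
trending = np.arange(40.0)
trending[[0, 1]] = trending[[1, 0]]

# Hypothetical V-shaped series: clearly not constant, but with no monotonic trend.
v_shaped = np.abs(np.arange(40) - 19.5)

print(spearman_trend_test(trending))
print(spearman_trend_test(v_shaped))
```

The second series illustrates a limit of the test: it targets monotonic trends only, so a symmetric decrease followed by an increase yields a rank correlation of zero and is not flagged.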

Under changing climate conditions, however, flood extremes at a given site could be altered. Hence, we argue that RFA models, including the proposed approach, should be adapted to account for such a context in future work.

Another potential direction for future research is to consider noncrossing constraints in the QR approach. Quantile crossing is a serious modeling problem that may lead to an invalid response distribution. Several solutions have been proposed in the statistical literature. Depending on the goal of the modeling procedure, two main approaches exist: the direct approach (e.g., Bondell et al. 2010; Liu and Wu 2011), used when the focus is on estimating the entire parametric distribution, and the indirect approach (e.g., Hall et al. 1999; Dette and Volgushev 2008; Chernozhukov et al. 2010), used when the interest is in estimating particular quantiles. In the present work, since the focus is on adequately estimating a few quantile orders, an indirect estimation approach is required. This consists of determining the conditional cumulative distribution function and then inverting it to obtain the desired quantile estimates.
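As a small illustration of quantile crossing and of one simple remedy, the sketch below sorts the estimates over a grid of quantile orders. This finite-grid sorting follows the spirit of the rearrangement idea of Chernozhukov et al. (2010); it is offered as an assumption-laden example, not as the procedure intended for future work. The function name `rearrange_quantiles` and the numerical values are hypothetical.

```python
import numpy as np


def rearrange_quantiles(taus, q_hat):
    """Return quantile estimates that are nondecreasing in the quantile order."""
    taus = np.asarray(taus, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    order = np.argsort(taus)
    # Sorting the estimates and assigning them to increasing orders
    # removes any crossing on the grid.
    return taus[order], np.sort(q_hat[order])


# Hypothetical estimates at one site: the 0.99 quantile falls below the 0.98 one.
taus = [0.90, 0.98, 0.99]
q_crossing = [120.0, 150.0, 145.0]

taus_sorted, q_fixed = rearrange_quantiles(taus, q_crossing)
print(dict(zip(taus_sorted, q_fixed)))
```

Sorting leaves the set of estimated values unchanged and only reassigns them to orders, so it is most defensible when crossings are small relative to the estimation error.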

## 7. Conclusions

QR is a tool often used in several fields such as economics, health, ecology, and environmental studies, including hydrology. Nevertheless, it has remained unused in the specific hydrological RFA framework, even though the quantile is a very important quantity to estimate. The LR model is the one commonly employed as an estimation model in RFA, essentially because of its simplicity. In this work, we address the important issue of including at-site FA as a step within RFA and hence of employing estimated at-site quantiles in RFA. The purpose of the present study is first to use observed data directly in RFA through the QR model, and then to evaluate the estimation performance of the two regional models (LR and QR) through an objective criterion.

First, the efficiency of the QR model was investigated in comparison with the classical approach, with the QR model used in RFA to estimate flood quantiles at ungauged sites. In a second step, the proposed criterion was applied within the same framework to ensure a fair model assessment. The methodology was validated on a real case study for different quantile orders. The developments were made using several datasets for each minimum record length *l*. Then, the performance of both models in terms of the MPLF criterion was explored at each site and according to basin area.

Overall, we can conclude that the proposed approach is a promising method for the estimation and evaluation of flood quantiles at sites with short- to medium-length records.

## Acknowledgments

Financial support for this study was graciously provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada. The authors thank the Ministry of the Environment of Quebec (MENVIQ) for having provided the employed datasets.

## REFERENCES

Abrahart, R., and L. See, 2007: Neural network modelling of non-linear hydrological relationships. *Hydrol. Earth Syst. Sci.*, **11**, 1563–1579, doi:10.5194/hess-11-1563-2007.

Alkenani, A., and K. Yu, 2013: Penalized single-index quantile regression. *Int. J. Stat. Probab.*, **2**, 12, doi:10.5539/ijsp.v2n3p12.

Arnell, N. W., 1989: Expected annual damages and uncertainties in flood frequency estimation. *J. Water Resour. Plann. Manage.*, **115**, 94–107, doi:10.1061/(ASCE)0733-9496(1989)115:1(94).

Arora, K., and V. P. Singh, 1989: A comparative evaluation of the estimators of the log Pearson type (LP) 3 distribution. *J. Hydrol.*, **105**, 19–37, doi:10.1016/0022-1694(89)90094-2.

Ben Alaya, M. A., F. Chebana, and T. B. M. J. Ouarda, 2016: Multisite and multivariable statistical downscaling using a Gaussian copula quantile regression model. *Climate Dyn.*, doi:10.1007/s00382-015-2908-3, in press.

Bondell, H. D., B. J. Reich, and H. Wang, 2010: Noncrossing quantile regression curve estimation. *Biometrika*, **97**, 825–838, doi:10.1093/biomet/asq048.

Brath, A., A. Castellarin, M. Franchini, and G. Galeati, 2001: Estimating the index flood using indirect methods. *Hydrol. Sci. J.*, **46**, 399–418, doi:10.1080/02626660109492835.

Bro, R., N. D. Sidiropoulos, and A. K. Smilde, 2002: Maximum likelihood fitting using ordinary least squares algorithms. *J. Chemom.*, **16**, 387–400, doi:10.1002/cem.734.

Burn, D. H., 1990: Evaluation of regional flood frequency analysis with a region of influence approach. *Water Resour. Res.*, **26**, 2257–2265, doi:10.1029/WR026i010p02257.

Cannon, A. J., 2011: Quantile regression neural networks: Implementation in R and application to precipitation downscaling. *Comput. Geosci.*, **37**, 1277–1284, doi:10.1016/j.cageo.2010.07.005.

Castellarin, A., D. Burn, and A. Brath, 2001: Assessing the effectiveness of hydrological similarity measures for flood frequency analysis. *J. Hydrol.*, **241**, 270–285, doi:10.1016/S0022-1694(00)00383-8.

Chamaillé-Jammes, S., H. Fritz, and F. Murindagomo, 2007: Detecting climate changes of concern in highly variable environments: Quantile regressions reveal that droughts worsen in Hwange National Park, Zimbabwe. *J. Arid Environ.*, **71**, 321–326, doi:10.1016/j.jaridenv.2007.05.005.

Chebana, F., and T. B. M. J. Ouarda, 2009: Index flood–based multivariate regional frequency analysis. *Water Resour. Res.*, **45**, W10435, doi:10.1029/2008WR007490.

Chebana, F., T. B. M. J. Ouarda, and T. C. Duong, 2013: Testing for multivariate trends in hydrologic frequency analysis. *J. Hydrol.*, **486**, 519–530, doi:10.1016/j.jhydrol.2013.01.007.

Chebana, F., C. Charron, T. B. M. J. Ouarda, and B. Martel, 2014: Regional frequency analysis at ungauged sites with the generalized additive model. *J. Hydrometeor.*, **15**, 2418–2428, doi:10.1175/JHM-D-14-0060.1.

Cheng, Y., J. G. De Gooijer, and D. Zerom, 2011: Efficient estimation of an additive quantile regression model. *Scand. J. Stat.*, **38**, 46–62, doi:10.1111/j.1467-9469.2010.00706.x.

Chernozhukov, V., I. Fernandez-Val, and A. Galichon, 2010: Quantile and probability curves without crossing. *Econometrica*, **78**, 1093–1125, doi:10.3982/ECTA7880.

Choi, W., R. Tareghian, J. Choi, and C. S. Hwang, 2014: Geographically heterogeneous temporal trends of extreme precipitation in Wisconsin, USA during 1950–2006. *Int. J. Climatol.*, **34**, 2841–2852, doi:10.1002/joc.3878.

Chokmani, K., and T. B. M. J. Ouarda, 2004: Physiographical space-based kriging for regional flood frequency estimation at ungauged sites. *Water Resour. Res.*, **40**, W12514, doi:10.1029/2003WR002983.

Coad, A., and R. Rao, 2008: Innovation and firm growth in high-tech sectors: A quantile regression approach. *Res. Policy*, **37**, 633–648, doi:10.1016/j.respol.2008.01.003.

Cunnane, C., 1988: Methods and merits of regional flood frequency analysis. *J. Hydrol.*, **100**, 269–290, doi:10.1016/0022-1694(88)90188-6.

Dette, H., and S. Volgushev, 2008: Non-crossing non-parametric estimates of quantile curves. *J. Roy. Stat. Soc.*, **70B**, 609–627, doi:10.1111/j.1467-9868.2008.00651.x.

Di Prinzio, M., A. Castellarin, and E. Toth, 2011: Data-driven catchment classification: Application to the pub problem. *Hydrol. Earth Syst. Sci.*, **15**, 1921–1935, doi:10.5194/hess-15-1921-2011.

Friederichs, P., and A. Hense, 2007: Statistical downscaling of extreme precipitation events using censored quantile regression. *Mon. Wea. Rev.*, **135**, 2365–2378, doi:10.1175/MWR3403.1.

Gebregziabher, M., C. Lynch, M. Mueller, G. Gilbert, C. Echols, Y. Zhao, and L. Egede, 2011: Using quantile regression to investigate racial disparities in medication non-adherence. *BMC Med. Res. Methodol.*, **11**, 88, doi:10.1186/1471-2288-11-88.

Gingras, D., and K. Adamowski, 1993: Homogeneous region delineation based on annual flood generation mechanisms. *Hydrol. Sci. J.*, **38**, 103–121, doi:10.1080/02626669309492649.

Girard, C., T. B. M. J. Ouarda, and B. Bobée, 2004: Étude du biais dans le modèle log-linéaire d’estimation régionale. *Can. J. Civ. Eng.*, **31**, 361–368, doi:10.1139/l03-099.

Guse, B., A. H. Thieken, A. Castellarin, and B. Merz, 2010: Deriving probabilistic regional envelope curves with two pooling methods. *J. Hydrol.*, **380**, 14–26, doi:10.1016/j.jhydrol.2009.10.010.

Haddad, K., and A. Rahman, 2012: Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework – Quantile regression vs. parameter regression technique. *J. Hydrol.*, **430–431**, 142–161, doi:10.1016/j.jhydrol.2012.02.012.

Hall, P., R. C. Wolff, and Q. Yao, 1999: Methods for estimating a conditional distribution function. *J. Amer. Stat. Assoc.*, **94**, 154–163, doi:10.1080/01621459.1999.10473832.

Hamed, K., and A. R. Rao, 2010: *Flood Frequency Analysis*. CRC Press, 376 pp.

Hao, L., and D. Q. Naiman, 2007: *Quantile Regression*. Quantitative Applications in the Social Sciences Series, Vol. 149, Sage Publications, 136 pp.

Harrell, F., K. L. Lee, and D. B. Mark, 1996: Tutorial in biostatistics multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. *Stat. Med.*, **15**, 361–387, doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.

Hartog, J., P. T. Pereira, and J. A. Vieira, 2001: Changing returns to education in Portugal during the 1980s and early 1990s: OLS and quantile regression estimators. *Appl. Econ.*, **33**, 1021–1037, doi:10.1080/00036840122679.

Holder, R., 1985: *Multiple Regression in Hydrology*. Institute of Hydrology, 147 pp.

Hosking, J. R. M., and J. R. Wallis, 2005: *Regional Frequency Analysis: An Approach Based on L-Moments*. Cambridge University Press, 244 pp.

Hsiao, C., 2014: *Analysis of Panel Data*. 3rd ed. *Econometric Society Monogr.*, No. 54, Cambridge University Press, 562 pp.

Hu, Y., R. B. Gramacy, and H. Lian, 2013: Bayesian quantile regression for single-index models. *Stat. Comput.*, **23**, 437–454, doi:10.1007/s11222-012-9321-0.

Javelle, P., T. B. M. J. Ouarda, M. Lang, B. Bobée, G. Galéa, and J.-M. Grésillon, 2002: Development of regional flood-duration–frequency curves based on the index-flood method. *J. Hydrol.*, **258**, 249–259, doi:10.1016/S0022-1694(01)00577-7.

Johnston, J., and J. DiNardo, 1972: *Econometric Methods*. McGraw-Hill, 437 pp.

Koenker, R., 2005: *Quantile Regression*. *Econometric Society Monogr.*, No. 38, Cambridge University Press, 366 pp.

Koenker, R., 2011: Additive models for quantile regression: Model selection and confidence bandaids. *Braz. J. Probab. Stat.*, **25**, 239–262, doi:10.1214/10-BJPS131.

Koenker, R., and G. Bassett, 1978: Regression quantiles. *Econometrica*, **46**, 33–50, doi:10.2307/1913643.

Koenker, R., and J. A. Machado, 1999: Goodness of fit and related inference processes for quantile regression. *J. Amer. Stat. Assoc.*, **94**, 1296–1310, doi:10.1080/01621459.1999.10473882.

Koenker, R., and K. Hallock, 2001: Quantile regression: An introduction. *J. Econ. Perspect.*, **15**, 143–156, doi:10.1257/jep.15.4.143.

Kouider, A., 2003: Analyse fréquentielle locale des crues au Quebec. M.S. thesis, Eau Terre Environnement, INRS, Université du Québec, 256 pp. [Available online at http://espace.inrs.ca/365/1/T000342.pdf.]

Kozubowski, T. J., and K. Podgórski, 1999: A class of asymmetric distributions. Actuarial Research Clearing House, Vol. 1, Society of Actuaries, 113–134.

Lee, E. R., H. Noh, and B. U. Park, 2014: Model selection via Bayesian information criterion for quantile regression models. *J. Amer. Stat. Assoc.*, **109**, 216–229, doi:10.1080/01621459.2013.836975.

Liu, Y., and Y. Wu, 2011: Simultaneous multiple non-crossing quantile regression estimation using kernel constraints. *J. Nonparametr. Stat.*, **23**, 415–437, doi:10.1080/10485252.2010.537336.

Melly, B., 2005: Public–private sector wage differentials in Germany: Evidence from quantile regression. *Empir. Econ.*, **30**, 505–520, doi:10.1007/s00181-005-0251-y.

Nelder, J. A., and R. Baker, 1972: Generalized linear models. *J. Roy. Stat. Soc.*, **135A** (3), 370–384.

Ouali, D., F. Chebana, and T. B. M. J. Ouarda, 2016: Non-linear canonical correlation analysis in regional frequency analysis. *Stochastic Environ. Res. Risk Assess.*, **30**, 449–462, doi:10.1007/s00477-015-1092-7.

Ouarda, T. B. M. J., 2013: Hydrological frequency analysis, regional. *Encyclopedia of Environmetrics*, Vol. 3, Wiley, 1311–1315, doi:10.1002/9780470057339.vnn043.

Ouarda, T. B. M. J., C. Girard, G. S. Cavadias, and B. Bobée, 2001: Regional flood frequency estimation with canonical correlation analysis. *J. Hydrol.*, **254**, 157–173, doi:10.1016/S0022-1694(01)00488-7.

Ouarda, T. B. M. J., J. Cunderlik, A. St-Hilaire, M. Barbet, P. Bruneau, and B. Bobée, 2006: Data-based comparison of seasonality-based regional flood frequency methods. *J. Hydrol.*, **330**, 329–339, doi:10.1016/j.jhydrol.2006.03.023.

Ouarda, T. B. M. J., A. St-Hilaire, and B. Bobée, 2008: Synthèse des développements récents en analyse régionale des extrêmes hydrologiques. *J. Water Sci.*, **21**, 219–232, doi:10.7202/018467ar.

Palmen, L., W. Weeks, and G. Kuczera, 2011: Regional flood frequency for Queensland using the quantile regression technique. *Aust. J. Water Resour.*, **15** (1), 47–57.

Pandey, G., and V.-T.-V. Nguyen, 1999: A comparative study of regression based methods in regional flood frequency analysis. *J. Hydrol.*, **225**, 92–101, doi:10.1016/S0022-1694(99)00135-3.

Phien, H. N., B. K. Huong, and P. D. Loi, 1990: Daily flow forecasting with regression analysis. *Water S.A.*, **16** (3), 179–184.

Planque, B., and L. Buffaz, 2008: Quantile regression models for fish recruitment environment relationships: Four case studies. *Mar. Ecol. Prog. Ser.*, **357**, 213–223, doi:10.3354/meps07274.

Reich, B. J., and L. B. Smith, 2013: Bayesian quantile regression for censored data. *Biometrics*, **69**, 651–660, doi:10.1111/biom.12053.

Sankarasubramanian, A., and U. Lall, 2003: Flood quantiles in a changing climate: Seasonal forecasts and causal relations. *Water Resour. Res.*, **39**, 1134, doi:10.1029/2002WR001593.

Shu, C., and D. H. Burn, 2004: Artificial neural network ensembles and their application in pooled flood frequency analysis. *Water Resour. Res.*, **40**, W09301, doi:10.1029/2003WR002816.

Shu, C., and T. B. M. J. Ouarda, 2007: Flood frequency analysis at ungauged sites using artificial neural networks in canonical correlation analysis physiographic space. *Water Resour. Res.*, **43**, W07438, doi:10.1029/2006WR005142.

Sousa, S., J. Pires, F. Martins, M. Pereira, and M. Alvim-Ferraz, 2009: Potentialities of quantile regression to predict ozone concentrations. *Environmetrics*, **20**, 147–158, doi:10.1002/env.916.

Sveinsson, O. G., D. C. Boes, and J. D. Salas, 2001: Population index flood method for regional frequency analysis. *Water Resour. Res.*, **37**, 2733–2748, doi:10.1029/2001WR000321.

Tareghian, R., and P. F. Rasmussen, 2013: Statistical downscaling of precipitation using quantile regression. *J. Hydrol.*, **487**, 122–135, doi:10.1016/j.jhydrol.2013.02.029.

Tasker, G. D., and M. E. Moss, 1979: Analysis of Arizona flood data network for regional information. *Water Resour. Res.*, **15**, 1791–1796, doi:10.1029/WR015i006p01791.

Tokdar, S., and J. B. Kadane, 2011: Simultaneous linear quantile regression: A semiparametric Bayesian approach. *Bayesian Anal.*, **7** (1), 51–72.

Villarini, G., J. A. Smith, M. L. Baeck, R. Vitolo, D. B. Stephenson, and W. F. Krajewski, 2011: On the frequency of heavy rainfall for the Midwest of the United States. *J. Hydrol.*, **400**, 103–120, doi:10.1016/j.jhydrol.2011.01.027.

Wazneh, H., F. Chebana, and T. B. M. J. Ouarda, 2013: Optimal depth-based regional frequency analysis. *Hydrol. Earth Syst. Sci.*, **17**, 2281–2296, doi:10.5194/hess-17-2281-2013.

Wu, T. Z., K. Yu, and Y. Yu, 2010: Single-index quantile regression. *J. Multivariate Anal.*, **101**, 1607–1621, doi:10.1016/j.jmva.2010.02.003.

Ying, Z., S. Jung, and L. Wei, 1995: Survival analysis with median regression models. *J. Amer. Stat. Assoc.*, **90**, 178–184, doi:10.1080/01621459.1995.10476500.

Yu, K., and R. A. Moyeed, 2001: Bayesian quantile regression. *Stat. Probab. Lett.*, **54**, 437–447, doi:10.1016/S0167-7152(01)00124-9.

Yue, S., P. Pilon, and G. Cavadias, 2002: Power of the Mann–Kendall and Spearman’s rho tests for detecting monotonic trends in hydrological series. *J. Hydrol.*, **259**, 254–271, doi:10.1016/S0022-1694(01)00594-7.