## 1. Introduction

Stochastic weather generators are statistical models designed to provide realistic random sequences of atmospheric variables such as precipitation, temperature, and wind speeds (see, e.g., Wilks and Wilby 1999). In particular, precipitation poses a number of challenges including, for instance, its spatiotemporal intermittence, its highly skewed distribution, and its complex stochastic dependencies. For example, maintaining realistic relationships between precipitation events at several sites is particularly important in hydrology. Indeed, streamflow depends strongly on the spatial distribution of precipitation across a watershed, and generated precipitation can be entered directly into a hydrological model to estimate streamflow in a given watershed (Xu 1999). A large number of stochastic precipitation models have been proposed in the literature, including resampling-based approaches (e.g., Buishand and Brandsma 2001), hidden Markov models for occurrence (e.g., Robertson et al. 2004) and for intensity (e.g., Charles et al. 1999), power transformation to normality (e.g., Yang et al. 2005), copula-based approaches (e.g., Bárdossy and Pegram 2009), or artificial neural networks (e.g., Cannon 2008). Wilks and Wilby (1999) and Baigorria and Jones (2010) provided an overview of precipitation models.

At a single site, a commonly used approach for modeling precipitation involves a two-stage model that simulates the occurrence of wet and dry days before simulating precipitation amounts. To preserve the local properties of precipitation (i.e., marginal distributions and temporal correlation), a number of variations of this general method have been proposed in the literature. From a two-state simulation representing wet and dry days, first- or higher-order Markov chains are commonly used to generate the occurrence process. However, this approach may underestimate the observed occurrence of prolonged droughts (Katz and Parlange 1998). Alternatively, wet and dry spell lengths could be simulated alternately from distributions fitted to corresponding observed records (Racsko et al. 1991). Once the days with occurrence of precipitation have been determined, the precipitation amount on wet days can be generated from a statistical distribution fitted to observed precipitation amounts. The short-term autocorrelation of the precipitation amount has been modeled by a parametric autocorrelation function (e.g., Katz and Parlange 1998), by an autoregressive process (e.g., Hutchinson 1995), or more recently using a copula framework (e.g., Serinaldi 2009; Li et al. 2013a).

To properly account for the stochastic dependence between sites, a number of techniques employ two spatial models: one for the precipitation occurrence process and one for the precipitation amount (e.g., Jeong et al. 2012b). To avoid splitting the occurrence and amount processes, Bárdossy and Plate (1992) employed a censored power-transformed Gaussian distribution. Within the same context, Ailliot et al. (2009) combined the latter with a hidden Markov model for daily precipitation. An alternative way to avoid the split between the occurrence and amount processes while reproducing the stochastic dependence structure is to use the uniform marginal distributions of a meta-Gaussian random field. The latter can also be employed in an autoregressive form in order to reproduce both spatial dependence and short-term autocorrelation. This last procedure avoids a sequential simulation conditioned on the simulation of the rainfall random fields at the previous time steps. Serinaldi and Kilsby (2014) opted for combining this autoregressive random field with a generalized additive model whose outputs are parameters of the on-site mixed discrete–continuous marginal distribution of the precipitation process. This modular structure is mathematically rich because it offers a simple way of generating the space–time evolution of discrete–continuous variables. Furthermore, it can be adapted to different areas and can incorporate exogenous forcing covariates, thus making it a valuable tool in hydrometeorology and climate research analyses, which often involve nonnormally distributed random variables such as precipitation, wind speed, cloud cover, and humidity. After recent successful applications in simulating daily rainfall fields over large areas (Serinaldi and Kilsby 2014) and in modeling radar rainfall uncertainties (Villarini et al. 2014), this modular structure is likely to have a growing impact on climate downscaling applications, where stochastic weather generators are routinely adapted for these purposes. Within this context, the aim of the present paper is to propose a multisite probabilistic regression-based model that adapts this approach for daily precipitation downscaling.

Downscaling techniques have been developed to refine atmosphere–ocean global climate models (AOGCMs) data and to provide information at more relevant scales. These techniques include dynamic downscaling, which uses regional climate models (RCMs) over a limited area, and statistical downscaling, which considers statistical relationships between large-scale variables (predictors) and small-scale variables (predictands) (Wilby et al. 1998) and also provides climate information at the equivalent of point climate observations (Wilby et al. 2002). Statistical downscaling techniques represent a good alternative to dynamic methods in cases of limited resources, because of their ease of implementation and their low computational requirements (Benestad et al. 2008; Maraun et al. 2010). Stochastic weather generators can be used for climate change downscaling through appropriate adjustments to their parameters. These adjustments can be accomplished in two ways: (i) through imposed changes in the corresponding monthly statistics or (ii) by controlling the generator parameters by daily variations in simulated atmospheric circulation patterns (Wilks 2010). The considered approach in the present study focuses on the second method, since the modular structure proposed by Serinaldi and Kilsby (2014) allows for the introduction of exogenous forcing covariates.

Precipitation is one of the most important predictands from a downscaling perspective. Maraun et al. (2010) provided an overview of downscaling precipitation techniques. An alternative to stochastic weather generators in statistical downscaling is to find a direct relationship between large-scale predictors and local predictands using a transfer function within a regression framework. For example, a transfer function can include multiple linear regression (Wilby et al. 2002; Hammami et al. 2012; Jeong et al. 2012a,b), empirical orthogonal functions analysis (Huth 2004), canonical correlation analysis (Palutikof et al. 2002; Huth and Pokorná 2004), artificial neural networks (Schoof and Pryor 2001), singular value decomposition (Widmann et al. 2003), generalized linear models (GLMs; Beecham et al. 2014), and generalized additive models (GAMs; Levavasseur et al. 2011). Regression models are successfully used in downscaling, but their major drawback is that they generally reproduce the mean or the central predictions conditional to the selected predictors. Therefore, regression variability is always lower than the observed variability (von Storch 1999). In addition, Wilby et al. (2003) mentioned that regression-based approaches have difficulty preserving spatial dependence among multisite precipitation.

To correctly estimate the temporal variability in a regression model, three main approaches have been proposed in the literature: inflation (Huth 1999), randomization (von Storch 1999; Clark et al. 2004), and expansion (Bürger and Chen 2005). Inflation is usually performed by multiplying the downscaled data by a constant factor, but in this case the spatial correlations between sites can be misrepresented. Randomization consists of adding random noise. In this way, regression and unconditional resampling techniques can be combined in a single hybrid model, which can overcome weaknesses of both approaches (Jeong et al. 2012b). However, Bürger and Chen (2005) indicated that a hybrid approach based on static noise failed to represent local changes in atmospheric variability in a climate change simulation, which is well explained using expanded downscaling (Bürger 1996). Expanded downscaling is applied to multisite predictands by constraining the covariance matrix of the predicted series to be equal to the observed covariance matrix (Cannon 2009). On the other hand, von Storch (1999) suggested that the inflation and expansion approaches are inappropriate, because they implicitly assume that all local variability can be traced back to the large scale, which is not the case in reality.

Given the drawbacks of these three existing techniques in reproducing the observed temporal variability, it is relevant to build the whole conditional distribution in order to capture the variability of the process. In this regard, probabilistic regression approaches have provided useful contributions in downscaling applications. Probabilistic approaches include Bayesian formulation (Fasbender and Ouarda 2010), quantile regression (Bremnes 2004; Friederichs and Hense 2007; Cannon 2011), and regression models where outputs are parameters of the conditional distribution. The last regression approach includes the vector form of generalized linear model (VGLM), the vector form of the generalized additive model (VGAM; Yee and Wild 1996; Yee and Stephenson 2007), and the conditional density estimation network (Williams 1998; Li et al. 2013b). Probabilistic regression approaches have been extended to multisite downscaling by Cannon (2008), following the methodology used in expanded downscaling. But this method is based on the assumption that all spatial dependence structures could be reproduced using synoptic-scale atmospheric predictors. Alternatively, Ben Alaya et al. (2014) proposed a probabilistic Gaussian copula regression (PGCR) model for multisite and multivariable downscaling. However, the PGCR model does not take into account cross correlations lagged in time.

The aim of the present paper is to propose a multisite probabilistic regression-based downscaling model for daily precipitation, namely, a Bernoulli–generalized Pareto multivariate autoregressive (BMAR) model. BMAR specifies the conditional marginal distribution of precipitation for each site through AOGCM predictors, by using a VGLM whose outputs are parameters of the Bernoulli–generalized Pareto distribution. Thus, with this component, the BMAR is able to model the occurrence and the amount of precipitation simultaneously and reproduce the observed temporal variability. In addition, a latent meta-Gaussian autoregressive random field is employed by the BMAR as a stochastic component to extend the probabilistic modeling framework in multisite downscaling tasks. This component allows the BMAR model to reproduce the observed spatial relationships between sites (such as the observed lag-0 and lag-1 cross correlations) and to randomly generate realistic synthetic precipitation series.

The present paper is structured as follows. After a brief presentation of the multisite hybrid statistical downscaling model of Jeong et al. (2012b) as a classical model for comparison, the proposed BMAR model is presented. The BMAR model is then applied to the case of daily precipitation events in the southern part of the province of Quebec, Canada. Reanalysis data are used to assess the potential of the proposed method. After the calibration of the BMAR model, an independent dataset is used to assess the downscaling quality. Based on statistical criteria and climatic indices that describe the frequency, intensity, and duration, results are compared with those obtained using a multisite hybrid model of Jeong et al. (2012b) and the multivariate multiple linear regression model (MMLR). Finally, a discussion and conclusions are given.

## 2. Data and study area

The study area is located in Quebec, between 45° and 60°N and between 60° and 80°W. Nine series of observed daily precipitation events (see Fig. 1) are selected as predictands. These series, provided by Environment Canada’s hydrometeorological network, have been rehabilitated by Mekis and Hogg (1999) and cover the period from 1 January 1961 to 31 December 2000. Table 1 reports the names and latitude–longitude locations of the nine selected meteorological stations. These stations are mapped in Fig. 1 with respect to their numbers as in Table 1.

Table 1. The nine stations used in this study.

Reanalysis data from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) over the period 1961–2000 (Kalnay et al. 1996; Kistler et al. 2001) are used to evaluate the potential of the downscaling method. NCEP–NCAR data are averaged on a daily basis from 6-h data on the original regular 2.5° latitude–longitude grid. Obtained predictors are then linearly interpolated onto the CCCma CGCM3 Gaussian grid (3.75° latitude–longitude grid) and normalized to the reference period 1961–90. The study area is covered by six grid points (see Fig. 1), and for each grid point, 25 NCEP–NCAR predictors are provided (see Table 2). For each day, 150 predictors are thus available. To reduce the number of predictors, a principal component analysis (PCA) is employed and the first components that preserve more than 97% of the variance of the original NCEP–NCAR predictors are then preserved as predictor variables. Finally, data from 1961 to 1990 are used for the calibration, whereas data from 1991 to 2000 are used for the validation.

Table 2. NCEP–NCAR predictors on the CGCM3 grid.

## 3. Methodology

The multisite hybrid downscaling model of Jeong et al. (2012b) and the BMAR model are presented in sections 3a and 3b, respectively. The probabilistic framework for the BMAR model is presented with a description of the conditional Bernoulli–generalized Pareto distribution. Then, a simulation procedure is presented using a latent multivariate autoregressive Gaussian field to reproduce the dependence structure of precipitation results at multiple sites.

### a. Multisite hybrid statistical downscaling of Jeong et al. (2012b)

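Let **X** and **Y** denote the predictor and predictand matrices, of dimensions *n* × *l* and *n* × *m*. The linear relationship between the two matrices is given by

$$
\mathbf{Y} = \mathbf{X}\boldsymbol{\Theta} + \mathbf{E},
$$

where $\boldsymbol{\Theta}$ is the parameter matrix of dimension *l* × *m* and **E** is the residual matrix of dimension *n* × *m*. The parameter matrix $\boldsymbol{\Theta}$ can be estimated using the ordinary least squares (OLS) method:

$$
\hat{\boldsymbol{\Theta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y}.
$$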

#### 1) Precipitation occurrences

[…] *n* × *m* matrix of precipitation occurrence. For a given day *i* = 1, 2, …, *n* and a given site *j* = 1, 2, …, *m*, an element *O*_{ij} of the matrix […] (*n* × *m*) and […] (*l* × *m*) are estimated MMLR parameters. The residual *n* × *m* matrix […] (*n* × *m*). Generated residuals […] *j* and on a day *i*. The value of […] is the normal cumulative distribution function having mean and standard deviation equal to those of the time series of […] *j*. In addition, *p*_{01} is the probability of a wet day following a dry day and *p*_{11} is the probability of a wet day following a wet day. These transition probabilities, *p*_{11} and *p*_{01}, are estimated separately for each observation site. Jeong et al. (2012b) mentioned that the transformed binary series […] *φ*(*j*, *s*) and continuous series *ζ*(*j*, *s*) at any locations *j* and *s* using a simple power function expressed as […]. The parameters *c* and *d* in Eq. (7) have been estimated by minimizing the RMSE among all *m*(*m* − 1)/2 pairs of cross-site correlation coefficients in the observed binary series and transformed binary series […].

#### 2) Precipitation amount

[…] *i* at a site *j*, to transform the precipitation amount vector **Y**_{j} for a site *j* into a normal distribution (Terrell 2003; Yang et al. 2005). Then, using the MMLR model, the transformed precipitation amount matrix […] (*n* × *m*) can be modeled using the following equation: […], where […] (*n* × *m*) is the downscaled deterministic series of an Anscombe residuals matrix. The constant term matrix […] (*n* × *m*) and the parameter matrix […] (*n* × *m*) are estimated MMLR parameters using the OLS method. The residual matrix of the deterministic series of daily precipitation amounts […] (*n* × *m*) can be described by […]. Thereafter, the residual matrix […] (*n* × *m*) is generated from a multivariate normal distribution having error variances […].

### b. BMAR model

In most applications, regression-based models are employed to reproduce the mean or the central part of predictands conditional on a set of selected predictors. The resulting model defines a mapping from predictors to predictand variables. This mapping is more suitable if predictions are generated from a deterministic function that is corrupted by a normally distributed noise process with constant variance (Cannon 2008).

For precipitation, the normality assumption might not be feasible on short time scales. On daily time scales, precipitation is more skewed and is commonly modeled with a gamma distribution (Stephenson et al. 1999; Giorgi et al. 2001; Yang et al. 2005). To handle such situations, the GLM extends linear regression to model the conditional mean of variables that may follow a wide class of distributions, such as the gamma distribution (Coe and Stern 1982; Stern and Coe 1984; Chandler and Wheater 2002). However, the gamma distribution may not be flexible enough to capture all rainfall amount behavior, since rainfall amounts can be heavy tailed at some sites. Wan et al. (2005) showed that a mixed exponential distribution outperformed the gamma distribution over a part of Canada. In general, other alternatives are needed to model extreme amounts, such as a Weibull (WEI) distribution or a generalized Pareto (GP) distribution. Note that these two distributions cannot be used directly in a GLM. For this purpose, VGLMs have been proposed (Yee and Stephenson 2007). Instead of modeling the conditional mean of a distribution, an appropriate probability density function (PDF) is selected, and then a linear regression model is employed whose outputs are the vectors of parameters of this selected PDF. Thus, within a probabilistic regression framework, a VGLM is able to build the whole conditional distribution (Kleiber et al. 2012). In addition, it has a particular advantage in downscaling applications, where it is able to recapture the variability of the process.

#### 1) Bernoulli–GP regression

The Bernoulli–GP probability density function is given by

$$
f(y;\rho,\alpha,\beta)=
\begin{cases}
1-\rho, & y=0,\\[4pt]
\dfrac{\rho}{\alpha}\left(1+\dfrac{\beta y}{\alpha}\right)^{-1/\beta-1}, & y>0,
\end{cases}
$$

where *y* is the precipitation amount; *α* (*α* > 0) and *β* (where 1 + *βy*/*α* > 0) are, respectively, the scale and the shape parameters of the zero-adjusted GP model; and *ρ* (0 ≤ *ρ* ≤ 1) is the probability of precipitation.

The shape parameter *β*_{j} for a site *j* is fixed in time, to guarantee the convergence of the maximum likelihood estimates. For the parameter of the probability of precipitation occurrences, we adopt a logistic regression, which is written as

$$
\rho_j(t)=\frac{1}{1+\exp\left[-\mathbf{c}_j^{T}\mathbf{x}(t)\right]},
$$

where *ρ*_{j}(*t*) is the probability of precipitation occurrence at a site *j* on a day *t*, *c*_{j} is the coefficient of the logistic model, and **x**(*t*) is the value of the predictors at the day *t*. The scale parameters *α*_{j}(*t*) are given by

$$
\alpha_j(t)=\exp\left[\mathbf{d}_j^{T}\mathbf{x}(t)\right],
$$

where *d*_{j} are the coefficients of the model. Hence, the conditional Bernoulli–GP density function for the precipitation *y*_{j}(*t*) on a day *t* and at site *j* is given by

$$
f_t\left[y_j(t)\mid\mathbf{x}(t)\right]=
\begin{cases}
1-\rho_j(t), & y_j(t)=0,\\[4pt]
\dfrac{\rho_j(t)}{\alpha_j(t)}\left[1+\dfrac{\beta_j\,y_j(t)}{\alpha_j(t)}\right]^{-1/\beta_j-1}, & y_j(t)>0.
\end{cases}
$$

Figure 3a shows the steps involved in training the proposed Bernoulli–GP regression model given the calibration data. The coefficients *c*_{j}, *d*_{j}, and *β*_{j} for all sites are obtained following the method of maximum likelihood by minimizing the negative log predictive density (NLPD) cost function (Haylock et al. 2006; Cawley et al. 2007; Cannon 2008),

$$
\mathcal{L}_j=-\sum_{t=1}^{n}\log f_t\left[y_j(t)\mid\mathbf{x}(t)\right],
$$

via the simplex search method of Lagarias et al. (1998). This is a direct search method that does not use numerical or analytic gradients.
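The Bernoulli–GP regression described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the exponential link used to keep the scale parameter positive, and the convention that each predictor vector starts with a constant 1 for the intercept are all assumptions made for the example.

```python
import math

def logistic(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def gp_cdf(y, alpha, beta):
    """Generalized Pareto CDF with scale alpha and shape beta, for y >= 0."""
    if abs(beta) < 1e-12:                 # exponential limit as beta -> 0
        return 1.0 - math.exp(-y / alpha)
    base = 1.0 + beta * y / alpha
    return 1.0 if base <= 0 else 1.0 - base ** (-1.0 / beta)

def bernoulli_gp_pdf(y, rho, alpha, beta):
    """Point mass 1 - rho at zero, GP density weighted by rho above zero."""
    if y < 0:
        return 0.0
    if y == 0:
        return 1.0 - rho
    if abs(beta) < 1e-12:
        return rho * math.exp(-y / alpha) / alpha
    base = 1.0 + beta * y / alpha
    if base <= 0:                         # outside the support when beta < 0
        return 0.0
    return rho / alpha * base ** (-1.0 / beta - 1.0)

def bernoulli_gp_cdf(y, rho, alpha, beta):
    """Mixed CDF: jumps to 1 - rho at zero, then follows the GP CDF."""
    if y < 0:
        return 0.0
    return (1.0 - rho) + rho * gp_cdf(y, alpha, beta)

def nlpd(c, d, beta, X, y):
    """Negative log predictive density for one site.

    c, d : coefficient lists for the occurrence and scale models (assumed layout)
    X    : list of predictor vectors x(t); y : observed daily amounts
    """
    total = 0.0
    for x_t, y_t in zip(X, y):
        rho = logistic(sum(ci * xi for ci, xi in zip(c, x_t)))
        alpha = math.exp(sum(di * xi for di, xi in zip(d, x_t)))
        dens = bernoulli_gp_pdf(y_t, rho, alpha, beta)
        total -= math.log(max(dens, 1e-300))   # guard against log(0)
    return total
```

In practice this cost would be passed to a derivative-free optimizer such as a Nelder–Mead simplex routine; the optimizer itself is left out of the sketch.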

#### 2) Conditional simulation using a latent multivariate autoregressive Gaussian field

The fitted Bernoulli–GP regression model provides the parameters *ρ*_{j}(*t*), *α*_{j}(*t*), and *β*_{j} for each site *j* and for a given day *t* when we have the AOGCM predictors. Then, it is possible to create synthetic predicted series of precipitation by sampling the obtained Bernoulli–GP distribution for each day. In this step, it is important to maintain realistic spatiotemporal intermittence of precipitation at multiple sites. In this paper, the dependence structure between multisite precipitation is reproduced by assuming a multivariate first-order autoregressive model [MAR(1)] for a multivariate latent Gaussian process **z**(*t*) = [*z*_{1}(*t*), …, *z*_{m}(*t*)], defined through the following transformation:

$$
z_j(t)=h\left[y_j(t)\right]=\Phi^{-1}\left\{F_{tj}\left[y_j(t)\right]\right\},
$$

where Φ is the standard normal cumulative distribution function and *F*_{tj} is the Bernoulli–GP cumulative distribution function at time *t* and site *j*. Figure 3b shows the steps involved in obtaining the latent multivariate Gaussian variables over the calibration period. Based on the fitted conditional Bernoulli–GP parameters, the Bernoulli–GP cumulative distribution function is used to express precipitation amounts as cumulative probabilities ranging from 0 to 1. To map *z*_{j}(*t*) onto the full range of the normal distribution, the cumulative probabilities *F*_{tj}[*y*_{j}(*t*)] are randomly drawn from a uniform distribution on [0, 1 − *ρ*_{j}(*t*)] for dry days. Finally, to obtain the set of the latent variables, data are normalized by applying the standard normal inverse cumulative distribution function to the series of cumulative probabilities.
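The normalization step just described can be illustrated with a short Python sketch. It assumes the Bernoulli–GP parameters for the day and site are already known; the function names and the small clamp that keeps the inverse normal CDF finite are choices made for this example, not details taken from the paper.

```python
import math
import random
from statistics import NormalDist

def bernoulli_gp_cdf(y, rho, alpha, beta):
    """Mixed CDF: mass 1 - rho at zero plus rho times the GP CDF."""
    if abs(beta) < 1e-12:                      # exponential limit
        gp = 1.0 - math.exp(-y / alpha)
    else:
        base = 1.0 + beta * y / alpha
        gp = 1.0 if base <= 0 else 1.0 - base ** (-1.0 / beta)
    return (1.0 - rho) + rho * gp

def to_latent(y, rho, alpha, beta, rng):
    """Map one daily amount to a latent standard normal value.

    Wet days use the Bernoulli-GP CDF; dry days are spread uniformly
    over [0, 1 - rho] so the latent variable covers the whole real line.
    """
    if y > 0:
        u = bernoulli_gp_cdf(y, rho, alpha, beta)
    else:
        u = rng.uniform(0.0, 1.0 - rho)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf finite
    return NormalDist().inv_cdf(u)

rng = random.Random(42)
series = [0.0, 3.2, 0.0, 11.5]                 # illustrative amounts (mm)
latent = [to_latent(y, 0.4, 3.0, 0.2, rng) for y in series]
```

With a wet-day probability of 0.4, every dry day maps below the normal quantile of 0.6 and every wet day maps above it, which is what allows the single latent variable to carry both occurrence and amount.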

Let **Z**_{t} = (z_{1t}, z_{2t}, …, z_{mt})^{T} denote the obtained latent Gaussian vector of *z* values at the *m* sites at time *t* = 1, 2, …, *n* after the normalization step. The latent multivariate first-order autoregressive model for **Z**_{t} is given by

$$
\mathbf{Z}_t=\mathbf{A}\,\mathbf{Z}_{t-1}+\mathbf{B}\,\boldsymbol{\varepsilon}_t,
$$

where **A** and **B** are *m* × *m* parameter matrices and *ε*_{t} is a random *m* × 1 noise vector with a standard multivariate normal distribution. The method of moments estimators of the MAR(1) model are given by Bras and Rodríguez-Iturbe (1985):

$$
\hat{\mathbf{A}}=\mathbf{M}_1\mathbf{M}_0^{-1},\qquad
\hat{\mathbf{B}}\hat{\mathbf{B}}^{T}=\mathbf{M}_0-\mathbf{M}_1\mathbf{M}_0^{-1}\mathbf{M}_1^{T},
$$

where **M**_{0} is the sample lag-0 cross-covariance matrix and **M**_{1} is the sample lag-1 cross-covariance matrix. Both **M**_{0} and **M**_{1} can be estimated in a pairwise manner. An element (*k*, *s*) in **M**_{0} can be estimated by noting that the joint distribution of the *z* variables at sites *k* and *s* is bivariate normal, and the correlation coefficient is the only unknown parameter. The elements of **M**_{1} can be estimated using a similar procedure. The matrix […]
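As a worked illustration of the moment estimators for a first-order multivariate autoregressive model, the sketch below computes the two parameter matrices from given lag-0 and lag-1 matrices using plain Python lists. The names `fit_mar1`, `M0`, and `M1`, the use of a Cholesky factor for the noise matrix, and the example numbers are assumptions for this example; the pairwise estimation of the two input matrices is not shown.

```python
import math

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inverse(P):
    """Gauss-Jordan inverse with partial pivoting (adequate for small m)."""
    n = len(P)
    aug = [[float(v) for v in P[i]] + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def fit_mar1(M0, M1):
    """Return (A, B) with A = M1 M0^-1 and B B^T = M0 - M1 M0^-1 M1^T."""
    m = len(M0)
    A = matmul(M1, inverse(M0))
    A_M1t = matmul(A, transpose(M1))
    BBt = [[M0[i][j] - A_M1t[i][j] for j in range(m)] for i in range(m)]
    return A, cholesky(BBt)

# Illustrative two-site example; M1[k][s] relates site k today to site s yesterday.
M0 = [[1.0, 0.6], [0.6, 1.0]]
M1 = [[0.3, 0.2], [0.25, 0.3]]
A, B = fit_mar1(M0, M1)
```

The Cholesky factor is only one valid choice for the noise matrix, since any matrix whose product with its transpose matches the right-hand side reproduces the same lag-0 and lag-1 structure.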

Then, for a new day *t*′ from the validation period, it is possible to randomly generate **z**(*t*′) = [*z*_{1}(*t*′), …, *z*_{m}(*t*′)] using the fitted MAR(1) model. Figure 4 illustrates how the Bernoulli–GP regression model and the MAR(1) model are combined to produce one simulation at a day *t*′. The cumulative probabilities *u*_{j}(*t*′) are obtained by applying the standard normal cumulative distribution function to the *z*_{j}(*t*′) generated at site *j* using the MAR(1) model. The value of the synthetic precipitation time series is then obtained by applying to *u*_{j}(*t*′) the inverse of the conditional Bernoulli–GP cumulative distribution function for day *t*′ and site *j*.
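The simulation of a single day can be sketched as follows. This is an illustrative outline under stated assumptions: the autoregressive matrices `A` and `B` and the per-site parameter tuples are made-up values, and the helper names are hypothetical rather than taken from the paper.

```python
import math
import random
from statistics import NormalDist

def bernoulli_gp_quantile(u, rho, alpha, beta):
    """Inverse of the mixed CDF: zero up to 1 - rho, GP quantile above it."""
    u = min(u, 1.0 - 1e-12)                 # avoid an infinite quantile at u = 1
    if u <= 1.0 - rho:
        return 0.0                          # dry day
    p = (u - (1.0 - rho)) / rho             # probability within the wet part
    if abs(beta) < 1e-12:                   # exponential limit
        return -alpha * math.log(1.0 - p)
    return alpha / beta * ((1.0 - p) ** (-beta) - 1.0)

def simulate_day(z_prev, A, B, params, rng):
    """One MAR(1) step followed by the back-transformation to precipitation.

    z_prev : latent vector of the previous day
    params : list of (rho, alpha, beta) for the new day, one tuple per site
    """
    m = len(z_prev)
    eps = [rng.gauss(0.0, 1.0) for _ in range(m)]
    z = [sum(A[j][k] * z_prev[k] for k in range(m)) +
         sum(B[j][k] * eps[k] for k in range(m)) for j in range(m)]
    phi = NormalDist()
    y = [bernoulli_gp_quantile(phi.cdf(z[j]), *params[j]) for j in range(m)]
    return z, y

# Illustrative two-site setup (all numbers are placeholders).
A = [[0.28, 0.03], [0.11, 0.23]]
B = [[0.95, 0.0], [0.55, 0.77]]
params = [(0.4, 3.0, 0.2), (0.5, 2.0, 0.1)]
rng = random.Random(0)
z, y = simulate_day([0.0, 0.0], A, B, params, rng)
```

Repeating `simulate_day` over the validation period, feeding each day's latent vector into the next call, yields one realization; independent random seeds give the ensemble of realizations.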

### c. Quality assessment of downscaled precipitation

[…] where *n* refers to the number of observations and *t* refers to the day. The mean error (ME) is a measure of accuracy, whereas the RMSE is an inverse measure of accuracy and must be minimized.
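For concreteness, the two criteria can be computed as in the sketch below. The sign convention for the ME (estimated minus observed) is an assumption, since conventions differ; the RMSE does not depend on it.

```python
import math

def mean_error(observed, estimated):
    """Average signed difference; zero means no systematic bias."""
    n = len(observed)
    return sum(e - o for o, e in zip(observed, estimated)) / n

def rmse(observed, estimated):
    """Root-mean-square error; smaller values indicate a closer fit."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)

obs = [0.0, 2.0, 4.0]      # illustrative daily amounts (mm)
est = [1.0, 2.0, 2.0]
me_value = mean_error(obs, est)
rmse_value = rmse(obs, est)
```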

In a second validation approach, several precipitation indices defined in Table 3 are considered. For precipitation amounts, five indices are considered: the mean precipitation of wet days (MPWD), the 90th percentile of daily precipitation (Pmax90), the maximum 1-day precipitation (PX1D), the maximum 3-day precipitation (PX3D), and the maximum 5-day precipitation (PX5D). For precipitation occurrences, three indices are considered: the maximum number of consecutive wet days (WRUN), the maximum number of consecutive dry days (DRUN), and the number of wet days (NWD). The BMAR, hybrid, and MMLR models are then compared by calculating the RMSEs for each of the climatic indices for all sites.

Table 3. Definition of the climatic indices used for the performance assessment of downscaled precipitation.
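Several of the indices listed above can be computed from a daily series with a few lines of Python. The sketch below is illustrative only: the wet-day threshold of 1 mm is an assumption (the exact definitions are those of Table 3), the function names are hypothetical, and the percentile index is omitted because its precise definition is not restated in the text.

```python
def longest_run(flags):
    """Length of the longest consecutive run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def precipitation_indices(series, wet_threshold=1.0):
    """Compute occurrence and amount indices for one period of daily data."""
    wet = [p >= wet_threshold for p in series]
    wet_amounts = [p for p, w in zip(series, wet) if w]

    def max_k_day(k):
        # Largest total over any k consecutive days.
        return max(sum(series[i:i + k]) for i in range(len(series) - k + 1))

    return {
        "NWD": sum(wet),
        "WRUN": longest_run(wet),
        "DRUN": longest_run([not w for w in wet]),
        "MPWD": sum(wet_amounts) / len(wet_amounts) if wet_amounts else 0.0,
        "PX1D": max_k_day(1),
        "PX3D": max_k_day(3),
        "PX5D": max_k_day(5),
    }

example = [0.0, 5.0, 2.0, 0.0, 0.0, 0.0, 10.0, 1.0, 3.0, 0.0]
indices = precipitation_indices(example)
```

The function expects at least five days of data; applying it month by month or season by season gives the series of index values on which the RMSE comparison is based.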

## 4. Results

The BMAR model has been trained for the calibration period (1961–90), using precipitation data series from the nine stations and the 40 predictors obtained by the PCA. The coefficients *c*_{j}, *d*_{j}, and *β*_{j} for each site were estimated by maximum likelihood. Once the parameters of the conditional Bernoulli–GP distribution [*ρ*_{j}(*t*), *α*_{j}(*t*)] have been estimated for each day *t* and for each site *j* over the calibration period, all the obtained conditional marginal distributions were used to obtain the latent variables **z**(*t*) as shown in Fig. 3b and to fit the parameters of the MAR(1) model. Finally, all the fitted BMAR parameters were used to generate precipitation series during the validation period (1991–2000), as shown in Fig. 4. Figure 5 shows an example of the results obtained when using the BMAR model for precipitation at Chelsea station during the year 1991. Figure 5a shows the estimated series of the probability of precipitation occurrences, and Fig. 5b shows both synthetic and observed precipitation series. We can see that the BMAR model provides encouraging results for both precipitation amounts and precipitation occurrences.

Application results of the BMAR model are compared to those of both the hybrid and MMLR models. To show the contribution of the stochastic weather generating scheme in the hybrid model, the MMLR model is employed here without stochastic variation. Note that, for the MMLR model, a day was classified as wet when the deterministic series of the daily probability of precipitation occurrence from the MMLR occurrence model was larger than the threshold value of 0.5. The BMAR and the hybrid models give probabilistic predictions; thus, to obtain stable and robust results for both models, 100 realizations of the precipitation series are generated.

For each station, values of RMSE and ME for the three models are given in Table 4. The RMSE and ME for both BMAR and hybrid were calculated using the conditional mean for each day. From Table 4 it can be seen that, for all stations, BMAR shows the best performance, since it has lower RMSEs and MEs closer to zero than both the hybrid and MMLR. These results demonstrate the effectiveness of the conditional Bernoulli–GP regression component in BMAR in replicating the observed series. Table 4 also indicates that the hybrid model performs better than MMLR in terms of both RMSE and ME. This result is expected because the MMLR model is biased: zero precipitation amounts were included to calibrate the MMLR amount model. In addition, the Anscombe residuals *R* from the observed precipitation amount may not be exactly normally distributed. For this reason, the hybrid model employs a probability mapping technique to correct this bias. However, in terms of ME and RMSE, BMAR not only performs better than the hybrid model but also has the advantage that its probabilistic regression component maps automatically onto the conditional distribution of precipitation. Thus, there is no need to rely on transformation steps or on bias correction procedures (such as a probability mapping technique) when evaluating the BMAR model.

Table 4. Quality assessment of the estimated series for the validation period (1991–2000) for the BMAR, hybrid, and MMLR models. Criteria are ME and RMSE. For the PGCR model, the criteria were calculated from the median of 100 realizations. Boldface indicates the best result.

Figure 6 summarizes the RMSEs of downscaled precipitation climatic indices for each model over the nine weather stations during the validation period (1991–2000). The RMSEs of both the BMAR and hybrid models are calculated using the mean of 100 realizations. For all precipitation amount indices, it can be seen that, in terms of RMSE, BMAR performs better than the two other models for all stations. Thereby, the use of the GP distribution allows the BMAR model to better reproduce observed monthly characteristics of precipitation amounts. It can also be noted that the hybrid model gives better results than the MMLR model for all stations, except for the Pmax90 index, for which it improves the results only for stations Cedars, Drummondville, Donnacona, and Bagotville A.

Results of downscaled precipitation occurrence indices are presented in Fig. 6f for WRUN indices. These results indicate that both the BMAR and hybrid models outperformed the MMLR model in terms of the RMSE for WRUN indices over all stations, and that for the same indices BMAR is slightly better than the hybrid. Figure 6g shows that for all stations BMAR outperformed the two other models in terms of the RMSE of NWD indices, and the hybrid gives better results than MMLR. Finally, for DRUN indices, the same conclusion can be deduced from Fig. 6h, except for Nicolet station, where the hybrid model is slightly better than the BMAR model. Thereby, based on WRUN, NWD, and DRUN, it can be concluded that the logistic regression in the BMAR model does a good job overall in representing the monthly characteristics of precipitation occurrences.

To evaluate the ability of the multivariate autoregressive component in BMAR to reproduce the observed dependence structures in both time and space, scatterplots (Figs. 7a,b) of the lag-0 and lag-1 cross correlation of modeled versus observed precipitation are plotted for the three models. The correlation values of both the BMAR and hybrid models are obtained using the mean of the correlation values calculated from 100 realizations. For lag-0 cross correlation, points correspond to all 36 combinations of pairs of stations, while for lag-1 cross correlation, points correspond to all 81 combinations because lag-1 cross correlations are generally not symmetric. It can be seen that MMLR generally overestimates both the lag-0 and lag-1 cross correlations, with an RMSE equal to 0.3184 for lag-0 cross correlation and 0.1063 for lag-1 cross correlation. MMLR gives the poorest results compared to BMAR and hybrid. This finding is expected since MMLR is not a multisite model. Figure 7a shows that the BMAR and hybrid models preserve the lag-0 cross correlation adequately, with BMAR outperforming the hybrid in terms of RMSE. Finally, from Fig. 7b it can be seen that BMAR reproduces the lag-1 cross correlation more adequately than the hybrid model, and the values of RMSE confirm this result, since they are equal to 0.0495 for BMAR and 0.0795 for the hybrid. In fact, the hybrid model, by its construction, is only able to take into account the lag-1 autocorrelation, unlike BMAR, which is designed to preserve the full lag-1 cross correlation.
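The pair counts quoted above (36 for lag 0 and 81 for lag 1) follow directly from the nine stations, and the lagged cross correlation itself is a plain Pearson correlation between shifted series. The sketch below shows both; the function name and the toy series are assumptions made for illustration.

```python
import math
from itertools import combinations, product

def lag_cross_correlation(x, y, lag=0):
    """Pearson correlation between x(t) and y(t - lag)."""
    a = x[lag:]
    b = y[:len(y) - lag]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

n_sites = 9
# Lag 0 is symmetric, so only unordered pairs of distinct sites are needed.
lag0_pairs = list(combinations(range(n_sites), 2))
# Lag 1 is not symmetric, so every ordered pair (including a site with itself) counts.
lag1_pairs = list(product(range(n_sites), repeat=2))

# Toy series in which x today equals y yesterday.
x = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0]
y = [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
r_lag0 = lag_cross_correlation(x, y, 0)
r_lag1 = lag_cross_correlation(x, y, 1)
```

In the toy example the lag-1 correlation is perfect while the lag-0 correlation is negative, which illustrates why the two quantities have to be assessed separately.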

Finally, joint probabilities of the events when two sites are both dry or both wet on a given day are displayed in Fig. 8. The BMAR adequately simulates these joint probabilities and outperforms both hybrid and MMLR models that provide overestimates of these probabilities. When dealing with “wet” cases, the three models provide underestimates. Nevertheless, the BMAR gives better results. In reality, as described in section 3b(2), in the step of obtaining the latent variables, the cumulative probabilities *F*_{tj}[*y*_{j}(*t*)] are randomly drawn from a uniform distribution on [0, 1 − *ρ*(*t*)]. This implies that this part of the joint distribution is free from having any spatial correlation structure. However, the autoregressive component can indirectly reproduce a part of the spatial dependence structure in this part of the joint distribution, because a value of the generated latent variable for dry days depends on previous days that may depend on generated values in other sites. Nevertheless, to circumvent this problem, Ben Alaya et al. (2014) separated the two processes of the occurrence and amount by considering a latent variable for each process. However, as proposed in the present paper, taking into account the two processes simultaneously makes the model more parsimonious, since the number of latent variables is reduced.

## 5. Discussion

In general, regression-based downscaling from coarse-scale predictors reproduces the mean of the process conditional on the selected predictors. As a consequence, the variability of the regression output is always smaller than the observed variability. Moreover, spatial dependence among multisite local predictand variables is not reproduced accurately by regression mapping from large-scale predictors (Bürger and Chen 2005; Jeong et al. 2012a). In this study, cross-site correlations of multisite precipitation are clearly overestimated by the MMLR model, and this overestimation is evidence that local-scale spatial dependence cannot be reproduced simply by using coarse NCEP–NCAR predictors. For this reason, the hybrid model of Jeong et al. (2012b) added a statistical generation procedure based on a randomization approach, in order to reproduce the unexplained temporal variability and the cross-site correlation of precipitation occurrence and amount among the observation sites. However, this hybrid procedure is based on a static noise model and fails to represent local changes in total precipitation variability in a climate change simulation.

Therefore, this study proposed the BMAR model, whose stochastic generation procedure addresses only the dependence structures. Temporal variability is preserved by the conditional distributions obtained through the probabilistic regression, and this attractive characteristic allows the model to reproduce the observed temporal variability correctly. Eliminating the marginal effect in this way helps to model and understand the spatiotemporal dependence structures effectively, since they are then unrelated to the marginal behavior. To this end, the biggest challenge in the proposed method is to generate a multivariate distribution with uniform margins on the open interval (0, 1) that preserves the spatiotemporal intermittence of several variables after the elimination of marginal distribution effects. The simulated values are then obtained by applying the inverse cumulative distribution function of the conditional distributions. Hence, the proposed solution can be considered similar to a copula-based framework (Chebana and Ouarda 2007; El Adlouni and Ouarda 2008). A copula is a multivariate distribution whose marginals are uniformly distributed on the interval [0, 1]. In the proposed method, random variables in the interval (0, 1) are generated through the latent Gaussian variables obtained by applying the transformation *h*(⋅) [Eq. (17)] to the multisite precipitation data. The same transformation is used in a Gaussian copula. However, in a Gaussian copula the latent variables are modeled through a multivariate Gaussian distribution, whereas in the present work they are modeled through a multivariate Gaussian autoregressive model in order to include the spatiotemporal dependence, more precisely the lag-1 cross correlation. The Gaussian copula was employed by Ben Alaya et al. (2014) in a probabilistic regression model to preserve dependence structures in a multisite and multivariable downscaling perspective. Nevertheless, that approach is limited to preserving only the lag-0 cross correlation. Therefore, the proposed BMAR model can be considered an extension of the Gaussian copula regression framework that accounts for the lag-1 cross correlation when the marginal distributions are specified using the Bernoulli–GP distribution.
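The difference between a Gaussian copula and a latent Gaussian autoregressive field can be made concrete with a small sketch. It uses the textbook first-order multivariate autoregressive construction, z(t) = A z(t − 1) + B e(t), with A and B derived from prescribed lag-0 and lag-1 correlation matrices; whether the estimation in this paper follows exactly this construction is an assumption, and the names `fit_mar1`, `simulate_latent`, and `to_uniform`, as well as the two-site correlation values, are hypothetical.

```python
import math
import numpy as np

def fit_mar1(c0, c1):
    """Return (A, B) for z_t = A z_{t-1} + B e_t with standard normal e_t.

    c0: lag-0 correlation matrix, E[z_t z_t^T].
    c1: lag-1 cross-correlation matrix, E[z_t z_{t-1}^T].
    """
    a = c1 @ np.linalg.inv(c0)
    # Stationarity requires C0 = A C0 A^T + B B^T.
    b = np.linalg.cholesky(c0 - a @ c0 @ a.T)
    return a, b

def simulate_latent(c0, c1, n_days, rng):
    """Simulate the latent Gaussian field, starting from its stationary law."""
    a, b = fit_mar1(c0, c1)
    n_sites = c0.shape[0]
    z = np.empty((n_days, n_sites))
    z[0] = np.linalg.cholesky(c0) @ rng.standard_normal(n_sites)
    for t in range(1, n_days):
        z[t] = a @ z[t - 1] + b @ rng.standard_normal(n_sites)
    return z

def to_uniform(z):
    """Apply the standard normal CDF elementwise to obtain uniform margins."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

# Usage with made-up two-site correlations.
c0 = np.array([[1.0, 0.6], [0.6, 1.0]])
c1 = np.array([[0.4, 0.3], [0.2, 0.4]])
z = simulate_latent(c0, c1, 5000, np.random.default_rng(0))
u = to_uniform(z)
print("sample lag-1 cross correlation:\n", z[1:].T @ z[:-1] / (len(z) - 1))
```

Setting `c1` to zero reduces the sketch to independent daily draws from a multivariate Gaussian with correlation `c0`, which corresponds to the Gaussian copula case that preserves only the lag-0 structure. The uniforms `u` would then be passed through the inverse conditional distribution functions to obtain precipitation values.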

As a direct consequence of eliminating the marginal distribution effect when preserving dependence structures in BMAR, it is straightforward to include observed series of other variables and to extend the model to multivariable tasks. Extending BMAR to variables other than precipitation would require that appropriate distributions be identified and incorporated into the VGLM. However, the stochastic generator procedures would remain the same. For example, the normal distribution could be chosen for temperature; for a normally distributed noise process with nonconstant variance, the conditional density regression for temperature would then have two outputs: one for the conditional mean and one for the conditional variance.
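As a rough illustration of the two-output case for temperature, the sketch below evaluates the negative log-likelihood of a normal distribution whose mean is linear in the predictors and whose variance is log-linear in them. The log link for the variance, the coefficient names, and the function names are assumptions chosen for illustration; the paper does not specify these details.

```python
import math

def normal_outputs(x, mean_coefs, log_var_coefs):
    """Map one predictor vector to the two outputs: conditional mean and variance."""
    mean = sum(c * xi for c, xi in zip(mean_coefs, x))
    # The exponential keeps the variance positive (assumed log link).
    variance = math.exp(sum(c * xi for c, xi in zip(log_var_coefs, x)))
    return mean, variance

def negative_log_likelihood(xs, ys, mean_coefs, log_var_coefs):
    """Sum of normal negative log densities over all days."""
    total = 0.0
    for x, y in zip(xs, ys):
        mean, variance = normal_outputs(x, mean_coefs, log_var_coefs)
        total += 0.5 * (math.log(2.0 * math.pi * variance)
                        + (y - mean) ** 2 / variance)
    return total

# Usage with made-up predictors (first entry is an intercept term).
xs = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.3]]
ys = [3.1, 1.8, 5.9]
print(negative_log_likelihood(xs, ys, [2.5, 2.0], [0.0, 0.3]))
```

Minimizing this quantity over both coefficient vectors would give the fitted conditional distribution, after which the dependence modeling on the transformed scale would proceed as for precipitation.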

The NCEP–NCAR data are used for calibration and validation of the BMAR model. Although NCEP–NCAR data are complete and physically consistent, they are essentially interpolations of observational data based on a dynamical model and are therefore subject to model biases (Hofer et al. 2012). NCEP–NCAR variables that are not assimilated, but are instead generated by the parameterizations of the dynamical model, can deviate significantly from real weather. Using such variables for the calibration and validation of empirical downscaling techniques may cause the modeled relationships between predictors and predictands to deviate significantly from reality, which makes the evaluation of downscaling techniques more difficult. The selection of appropriate large-scale atmospheric predictor variables for the proposed BMAR therefore requires careful consideration. Studying the sensitivity of the BMAR model to NCEP–NCAR predictors is important, not only for a better selection of predictors but also for a more realistic elaboration of future climate scenarios.

## 6. Conclusions

A Bernoulli–GP multivariate autoregressive model is proposed in this paper for simultaneously downscaling AOGCM predictors to daily precipitation at several sites. The BMAR relies on a probabilistic modeling framework in order to predict the conditional distribution of precipitation at a daily time scale using a VGLM applied to the discrete–continuous Bernoulli–GP distribution. Predicting the parameters of the Bernoulli–GP distribution allows for (i) modeling precipitation occurrences and precipitation amounts at the same time, (ii) dealing with the problem of nonnormality of precipitation data, and (iii) reproducing observed temporal variability. To allow a realistic representation of relationships between stations in both time and space, stochastic generator procedures were applied to the VGLM using a latent multivariate autoregressive Gaussian field.

The developed model was then applied to generate daily precipitation series at nine stations located in the southern part of the province of Quebec, Canada. NCEP–NCAR reanalysis data were used as predictors in order to assess the potential of the method, although the final objective is to use AOGCM predictors. Application results of the BMAR model were compared to those obtained using the MMLR and the hybrid model. Results show that the BMAR model outperforms both of these models in terms of RMSE and ME. Moreover, the comparison based on precipitation indices shows that the BMAR model is better able to reproduce precipitation amounts and occurrence characteristics on a seasonal basis. In addition, BMAR provides better preservation of the relationships between multisite precipitation events in both time and space.

Model evaluations suggest that the BMAR model is capable of generating series with realistic spatial and temporal variability. In addition, the proposed model performed better than a multisite hybrid regression-stochastic generator model for most verification statistics. The BMAR model may be a useful tool for multisite precipitation downscaling based on AOGCM data.

## Acknowledgments

We gratefully acknowledge the comments of the Editor Joseph Barsugli, and two reviewers, Francesco Serinaldi and an anonymous reviewer. We acknowledge Eva Mekis from Environment Canada for providing observed datasets of rehabilitated precipitation. The authors would like to acknowledge also the Data Access and Integration (DAI; http://loki.qc.ec.gc.ca/DAI/login-e.php) team for providing the predictors data and technical support. The DAI data download gateway is made possible through collaboration among the Global Environmental and Climate Change Centre (GEC3), the Adaptation and Impacts Research Section (AIRS) of Environment Canada, and the Drought Research Initiative (DRI).

## REFERENCES

Ailliot, P., C. Thompson, and P. Thomson, 2009: Space–time modelling of precipitation by using a hidden Markov model and censored Gaussian distributions. *J. Roy. Stat. Soc.*, **58C**, 405–426, doi:10.1111/j.1467-9876.2008.00654.x.

Baigorria, G. A., and J. W. Jones, 2010: GiST: A stochastic model for generating spatially and temporally correlated daily rainfall data. *J. Climate*, **23**, 5990–6008, doi:10.1175/2010JCLI3537.1.

Bárdossy, A., and E. J. Plate, 1992: Space–time model for daily rainfall using atmospheric circulation patterns. *Water Resour. Res.*, **28**, 1247–1259, doi:10.1029/91WR02589.

Bárdossy, A., and G. Pegram, 2009: Copula based multisite model for daily precipitation simulation. *Hydrol. Earth Syst. Sci.*, **13**, 2299–2314, doi:10.5194/hess-13-2299-2009.

Beecham, S., M. Rashid, and R. K. Chowdhury, 2014: Statistical downscaling of multi-site daily rainfall in a South Australian catchment using a generalized linear model. *Int. J. Climatol.*, **34**, 3654–3670, doi:10.1002/joc.3933.

Ben Alaya, M. A., F. Chebana, and T. Ouarda, 2014: Probabilistic Gaussian copula regression model for multisite and multivariable downscaling. *J. Climate*, **27**, 3331–3347, doi:10.1175/JCLI-D-13-00333.1.

Benestad, R. E., I. Hanssen-Bauer, and D. Chen, 2008: *Empirical-Statistical Downscaling.* World Scientific, 228 pp.

Bras, R. L., and I. Rodríguez-Iturbe, 1985: *Random Functions and Hydrology.* Courier Dover, 559 pp.

Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. *Mon. Wea. Rev.*, **132**, 338–347, doi:10.1175/1520-0493(2004)132<0338:PFOPIT>2.0.CO;2.

Buishand, T. A., and T. Brandsma, 2001: Multisite simulation of daily precipitation and temperature in the Rhine basin by nearest-neighbor resampling. *Water Resour. Res.*, **37**, 2761–2776, doi:10.1029/2001WR000291.

Bürger, G., 1996: Expanded downscaling for generating local weather scenarios. *Climate Res.*, **7**, 111–128, doi:10.3354/cr007111.

Bürger, G., and Y. Chen, 2005: A regression-based downscaling of spatial variability for hydrologic applications. *J. Hydrol.*, **311**, 299–317, doi:10.1016/j.jhydrol.2005.01.025.

Cannon, A. J., 2008: Probabilistic multisite precipitation downscaling by an expanded Bernoulli–gamma density network. *J. Hydrometeor.*, **9**, 1284–1300, doi:10.1175/2008JHM960.1.

Cannon, A. J., 2009: Negative ridge regression parameters for improving the covariance structure of multivariate linear downscaling models. *Int. J. Climatol.*, **29**, 761–769, doi:10.1002/joc.1737.

Cannon, A. J., 2011: Quantile regression neural networks: Implementation in R and application to precipitation downscaling. *Comput. Geosci.*, **37**, 1277–1284, doi:10.1016/j.cageo.2010.07.005.

Cawley, G. C., G. J. Janacek, M. R. Haylock, and S. R. Dorling, 2007: Predictive uncertainty in environmental modelling. *Neural Networks*, **20**, 537–549, doi:10.1016/j.neunet.2007.04.024.

Chandler, R. E., and H. S. Wheater, 2002: Analysis of rainfall variability using generalized linear models: A case study from the west of Ireland. *Water Resour. Res.*, **38**, 1192, doi:10.1029/2001WR000906.

Charles, S. P., B. C. Bates, and J. P. Hughes, 1999: A spatiotemporal model for downscaling precipitation occurrence and amounts. *J. Geophys. Res.*, **104**, 31 657–31 669, doi:10.1029/1999JD900119.

Chebana, F., and T. B. M. J. Ouarda, 2007: Multivariate L-moment homogeneity test. *Water Resour. Res.*, **43**, W08406, doi:10.1029/2006WR005639.

Clark, M., S. Gangopadhyay, L. Hay, B. Rajagopalan, and R. Wilby, 2004: The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. *J. Hydrometeor.*, **5**, 243–262, doi:10.1175/1525-7541(2004)005<0243:TSSAMF>2.0.CO;2.

Coe, R., and R. Stern, 1982: Fitting models to daily rainfall data. *J. Appl. Meteor.*, **21**, 1024–1031, doi:10.1175/1520-0450(1982)021<1024:FMTDRD>2.0.CO;2.

El Adlouni, S., and T. Ouarda, 2008: Study of the joint distribution flow-level by copulas: Case of the Chateauguay River. *Can. J. Civ. Eng.*, **35**, 1128–1137, doi:10.1139/L08-054.

Fasbender, D., and T. B. M. J. Ouarda, 2010: Spatial Bayesian model for statistical downscaling of AOGCM to minimum and maximum daily temperatures. *J. Climate*, **23**, 5222–5242, doi:10.1175/2010JCLI3415.1.

Friederichs, P., and A. Hense, 2007: Statistical downscaling of extreme precipitation events using censored quantile regression. *Mon. Wea. Rev.*, **135**, 2365–2378, doi:10.1175/MWR3403.1.

Giorgi, F., et al., 2001: Regional climate information—Evaluation and projections. *Climate Change 2001: The Scientific Basis*, J. T. Houghton et al., Eds., Cambridge University Press, 585–638.

Hammami, D., T. S. Lee, T. B. M. J. Ouarda, and J. Le, 2012: Predictor selection for downscaling GCM data with LASSO. *J. Geophys. Res. Atmos.*, **117**, D17116, doi:10.1029/2012JD017864.

Haylock, M. R., G. C. Cawley, C. Harpham, R. L. Wilby, and C. M. Goodess, 2006: Downscaling heavy precipitation over the United Kingdom: A comparison of dynamical and statistical methods and their future scenarios. *Int. J. Climatol.*, **26**, 1397–1415, doi:10.1002/joc.1318.

Hofer, M., B. Marzeion, and T. Mölg, 2012: Comparing the skill of different reanalyses and their ensembles as predictors for daily air temperature on a glaciated mountain (Peru). *Climate Dyn.*, **39**, 1969–1980, doi:10.1007/s00382-012-1501-2.

Hutchinson, M., 1995: Stochastic space–time weather models from ground-based data. *Agric. For. Meteor.*, **73**, 237–264, doi:10.1016/0168-1923(94)05077-J.

Huth, R., 1999: Statistical downscaling in central Europe: Evaluation of methods and potential predictors. *Climate Res.*, **13**, 91–101, doi:10.3354/cr013091.

Huth, R., 2004: Sensitivity of local daily temperature change estimates to the selection of downscaling models and predictors. *J. Climate*, **17**, 640–652, doi:10.1175/1520-0442(2004)017<0640:SOLDTC>2.0.CO;2.

Huth, R., and L. Pokorná, 2004: Parametric versus non-parametric estimates of climatic trends. *Theor. Appl. Climatol.*, **77**, 107–112, doi:10.1007/s00704-003-0026-3.

Jeong, D. I., A. St-Hilaire, T. B. M. J. Ouarda, and P. Gachon, 2012a: A multivariate multi-site statistical downscaling model for daily maximum and minimum temperatures. *Climate Res.*, **54**, 129–148, doi:10.3354/cr01106.

Jeong, D. I., A. St-Hilaire, T. B. M. J. Ouarda, and P. Gachon, 2012b: Multisite statistical downscaling model for daily precipitation combined by multivariate multiple linear regression and stochastic weather generator. *Climatic Change*, **114**, 567–591, doi:10.1007/s10584-012-0451-3.

Kalnay, E., et al., 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

Katz, R. W., and M. B. Parlange, 1998: Overdispersion phenomenon in stochastic modeling of precipitation. *J. Climate*, **11**, 591–601, doi:10.1175/1520-0442(1998)011<0591:OPISMO>2.0.CO;2.

Kistler, R., et al., 2001: The NCEP–NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. *Bull. Amer. Meteor. Soc.*, **82**, 247–268, doi:10.1175/1520-0477(2001)082<0247:TNNYRM>2.3.CO;2.

Kleiber, W., R. W. Katz, and B. Rajagopalan, 2012: Daily spatiotemporal precipitation simulation using latent and transformed Gaussian processes. *Water Resour. Res.*, **48**, W01523, doi:10.1029/2011WR011105.

Lagarias, J. C., J. A. Reeds, M. H. Wright, and P. E. Wright, 1998: Convergence properties of the Nelder-Mead simplex method in low dimensions. *SIAM J. Optim.*, **9**, 112–147, doi:10.1137/S1052623496303470.

Levavasseur, G., M. Vrac, D. Roche, D. Paillard, A. Martin, and J. Vandenberghe, 2011: Present and LGM permafrost from climate simulations: contribution of statistical downscaling. *Climate Past Discuss.*, **7**, 1647–1692, doi:10.5194/cpd-7-1647-2011.

Li, C., V. P. Singh, and A. K. Mishra, 2013a: A bivariate mixed distribution with a heavy-tailed component and its application to single-site daily rainfall simulation. *Water Resour. Res.*, **49**, 767–789, doi:10.1002/wrcr.20063.

Li, C., V. P. Singh, and A. K. Mishra, 2013b: Monthly river flow simulation with a joint conditional density estimation network. *Water Resour. Res.*, **49**, 3229–3242, doi:10.1002/wrcr.20146.

Maraun, D., et al., 2010: Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. *Rev. Geophys.*, **48**, RG3003, doi:10.1029/2009RG000314.

Mekis, E., and W. D. Hogg, 1999: Rehabilitation and analysis of Canadian daily precipitation time series. *Atmos.–Ocean*, **37**, 53–85, doi:10.1080/07055900.1999.9649621.

Palutikof, J. P., C. M. Goodess, S. J. Watkins, and T. Holt, 2002: Generating rainfall and temperature scenarios at multiple sites: Examples from the Mediterranean. *J. Climate*, **15**, 3529–3548, doi:10.1175/1520-0442(2002)015<3529:GRATSA>2.0.CO;2.

Racsko, P., L. Szeidl, and M. Semenov, 1991: A serial approach to local stochastic weather models. *Ecol. Modell.*, **57**, 27–41, doi:10.1016/0304-3800(91)90053-4.

Rasmussen, P., 2013: Multisite precipitation generation using a latent autoregressive model. *Water Resour. Res.*, **49**, 1845–1857, doi:10.1002/wrcr.20164.

Robertson, A. W., S. Kirshner, and P. Smyth, 2004: Downscaling of daily rainfall occurrence over northeast Brazil using a hidden Markov model. *J. Climate*, **17**, 4407–4424, doi:10.1175/JCLI-3216.1.

Schoof, J. T., and S. C. Pryor, 2001: Downscaling temperature and precipitation: A comparison of regression-based methods and artificial neural networks. *Int. J. Climatol.*, **21**, 773–790, doi:10.1002/joc.655.

Serinaldi, F., 2009: A multisite daily rainfall generator driven by bivariate copula-based mixed distributions. *J. Geophys. Res.*, **114**, D10103, doi:10.1029/2008JD011258.

Serinaldi, F., and C. G. Kilsby, 2014: Simulating daily rainfall fields over large areas for collective risk estimation. *J. Hydrol.*, **512**, 285–302, doi:10.1016/j.jhydrol.2014.02.043.

Stephenson, D. B., K. Rupa Kumar, F. J. Doblas-Reyes, J. F. Royer, F. Chauvin, and S. Pezzulli, 1999: Extreme daily rainfall events and their impact on ensemble forecasts of the Indian monsoon. *Mon. Wea. Rev.*, **127**, 1954–1966, doi:10.1175/1520-0493(1999)127<1954:EDREAT>2.0.CO;2.

Stern, R., and R. Coe, 1984: A model fitting analysis of daily rainfall data. *J. Roy. Stat. Soc.*, **147A**, 1–34, doi:10.2307/2981736.

Terrell, G. R., 2003: The Wilson–Hilferty transformation is locally saddlepoint. *Biometrika*, **90**, 445–453, doi:10.1093/biomet/90.2.445.

Villarini, G., B.-C. Seo, F. Serinaldi, and W. F. Krajewski, 2014: Spatial and temporal modeling of radar rainfall uncertainties. *Atmos. Res.*, **135–136**, 91–101, doi:10.1016/j.atmosres.2013.09.007.

von Storch, H., 1999: On the use of “inflation” in statistical downscaling. *J. Climate*, **12**, 3505–3506, doi:10.1175/1520-0442(1999)012<3505:OTUOII>2.0.CO;2.

Wan, H., X. Zhang, and E. M. Barrow, 2005: Stochastic modelling of daily precipitation for Canada. *Atmos.–Ocean*, **43**, 23–32, doi:10.3137/ao.430102.

Widmann, M., C. S. Bretherton, and E. P. Salathé Jr., 2003: Statistical precipitation downscaling over the northwestern United States using numerically simulated precipitation as a predictor. *J. Climate*, **16**, 799–816, doi:10.1175/1520-0442(2003)016<0799:SPDOTN>2.0.CO;2.

Wilby, R. L., H. Hassan, and K. Hanaki, 1998: Statistical downscaling of hydrometeorological variables using general circulation model output. *J. Hydrol.*, **205**, 1–19, doi:10.1016/S0022-1694(97)00130-3.

Wilby, R. L., C. W. Dawson, and E. M. Barrow, 2002: SDSM—A decision support tool for the assessment of regional climate change impacts. *Environ. Model. Software*, **17**, 145–157, doi:10.1016/S1364-8152(01)00060-3.

Wilby, R. L., O. Tomlinson, and C. Dawson, 2003: Multi-site simulation of precipitation by conditional resampling. *Climate Res.*, **23**, 183–194, doi:10.3354/cr023183.

Wilks, D. S., 2010: Use of stochastic weather generators for precipitation downscaling. *Wiley Interdiscip. Rev.: Climate Change*, **1**, 898–907, doi:10.1002/wcc.85.

Wilks, D. S., and R. L. Wilby, 1999: The weather generation game: A review of stochastic weather models. *Prog. Phys. Geogr.*, **23**, 329–357, doi:10.1177/030913339902300302.

Williams, P. M., 1998: Modelling seasonality and trends in daily rainfall data. *Adv. Neural Inf. Process. Syst.*, **10**, 985–991.

Xu, C.-y., 1999: From GCMs to river flow: A review of downscaling methods and hydrologic modelling approaches. *Prog. Phys. Geogr.*, **23**, 229–249, doi:10.1177/030913339902300204.

Yang, C., R. E. Chandler, V. S. Isham, and H. S. Wheater, 2005: Spatial–temporal rainfall simulation using generalized linear models. *Water Resour. Res.*, **41**, W11415, 1–13, doi:10.1029/2004WR003739.

Yee, T. W., and C. Wild, 1996: Vector generalized additive models. *J. Roy. Stat. Soc.*, **58B**, 481–493.

Yee, T. W., and A. G. Stephenson, 2007: Vector generalized linear and additive extreme value models. *Extremes*, **10**, 1–19, doi:10.1007/s10687-007-0032-4.