## 1. Introduction

Numerical weather forecast (NWF) models lack the spatial resolution needed to achieve accurate precipitation forecasts at the observatory level, particularly for hydrology applications. Therefore, the use of downscaling methods to bridge the gap between the scales at which NWF models are proficient and the observatory level is a common task in current meteorological practice (Hughes and Guttorp 1994; Wilby and Wigley 1997). Downscaling models take into account local influences, such as topography or slope orientation, that cannot be properly resolved by coarse-resolution NWF models (Obled et al. 2002).

In general, statistical downscaling techniques rely on several assumptions. One of the most important is that there exists a strong relationship between the structure of a set of large-scale predictor fields and the local variable that is being predicted by means of the statistical downscaling model (SDM); see, for instance, Benestad et al. (2008) and references therein. There exist several techniques to formulate SDMs, such as multiple linear regression models, neural networks, weather classification techniques, and many more (Wilby et al. 2004). In general, SDMs constitute a cheap alternative to the dynamical downscaling approach, in which a regional numerical weather forecasting model solves the equations of motion at much higher resolution, nested inside a coarser model (Yuan et al. 2007). The fact that SDMs are computationally simpler than full-featured regional models makes them attractive for several applications. For instance, they are often used to estimate local variables over long periods of time, like the ones commonly used in climate change assessment. The computational simplicity is also valuable in other applications, such as ensemble seasonal forecasting (Feddersen and Andersen 2005) or operational ensemble forecasting (Diomede et al. 2008). These downscaling techniques are also often used as a postprocessing step for numerical weather forecasts, in order to calibrate them (Hamill and Whitaker 2007). The method of analogs is one of the simplest techniques that can be used in statistical downscaling and, despite its simplicity, it has proven competitive with more complicated methods in several studies (Fernández-Ferrero et al. 2009; Wetterhall et al. 2005; Zorita et al. 1995; Zorita and von Storch 1999).
This is to some extent surprising, since several studies have shown that it is seldom easy to find perfect analogs, at least on a global scale (Van den Dool 1994), although this problem is easier to overcome at regional scales (Roebber and Bosart 1998), particularly for short forecast lead times, typically on the order of 1 day or shorter (Hamill and Whitaker 2006; Stephenson et al. 2005).

The use of an analog-based SDM allows several degrees of freedom to configure the downscaling model. There exist several measures of analogy, such as the Euclidean distance, the Mahalanobis distance, the angle between the state vectors in the phase space, anomaly correlation, or distance in the canonical correlation analysis (CCA) space, among many others (Fernández and Sáenz 2003; Matulla et al. 2008; Obled et al. 2002; Toth 1991; Yates et al. 2003; Wetterhall et al. 2005).

In a previous paper (Fernández-Ferrero et al. 2009), the authors showed, after a thorough evaluation of several predictors and alternative SDMs based, for instance, on neural networks, that an analog-based SDM performs well for deterministic quantitative precipitation forecasting (QPF) over the same study area at short ranges. A statistical downscaling model based on the use of analogs followed by a classical CCA allowed prediction of precipitation 6 h ahead at each of the six sensors of the area. The most relevant input variables turned out to be temperature, dewpoint temperature, and persistence (Fernández-Ferrero et al. 2009). Additionally, a principal component analysis showed that the leading EOF accounted for as much as 82% of the overall variability of precipitation in the area covered by the six sensors. The six sensors exhibited similar loading factors, indicating fronts linked to incoming lows as the main mechanism driving precipitation in the area. For this study it has therefore been considered that average values of the statistical indicators over the six sensors provide a clearer insight into the overall performance of the models throughout the whole area; the spatial variability of the performance is not analyzed, since it is not significant over such a small area.

It has long ago been recognized that, particularly for hydrological forecasts, there exists a need to convey measures of uncertainty with the forecasts, so that the end user is able to make a better use of the forecast (Diomede et al. 2008; Kruizinga and Murphy 1983; Krzysztofowicz 1998, 2001).

In the literature, there exist several examples of SDMs designed to develop probabilistic quantitative precipitation forecasts (PQPFs). Gutiérrez et al. (2004) developed a method based on the preselection of a set of *m*-means clusters. The quantization inherent to the use of *m*-means clusters was introduced to speed up the algorithm. Next, a kernel-based probability density function (PDF) estimation around the clusters' centroids was performed to yield probabilistic forecasts. In other cases, some statistical moments are downscaled first and, next, a weather generator yields probabilistic forecasts (Busuioc and von Storch 2003; Marty et al. 2008). Obled et al. (2002) designed a method to provide PQPF by means of analogs. After a preselection of a number of weather situations similar to the date being forecast, they proposed the parametric fit of an incomplete gamma function to the precipitation corresponding to the selected set of analogs. Next, the quantiles of the precipitation values, which represent confidence boundaries, could be obtained from the fitted parametric distribution. This approach of preselecting a set of analogs to produce a probabilistic forecast is very common in the literature (Diomede et al. 2008; Hamill et al. 2004; Kruizinga and Murphy 1983). Since the subset of analogs yields several forecasts of precipitation, an approximation to the probability of precipitation can be obtained from the distribution of analogs through several algorithms (Hamill and Whitaker 2006). In some cases, the estimation of the PQPF is also obtained through the use of an ensemble of inputs obtained from an NWF model in which the initial conditions have been perturbed by means of bred vectors (Gangopadhyay et al. 2005).

Bayes’s theorem is the starting point of a set of techniques used to combine and calibrate forecasts produced by different systems, such as multimodel ensembles (Coelho et al. 2004) or various types of sources of uncertainty in hydrologic forecasting systems (Krzysztofowicz 1998, 2001). The use of Bayes’s theorem allows us to include under the same theoretical framework the calibration, combination, and downscaling in multimodel ensemble systems, by means of the so-called forecast assimilation theory (Stephenson et al. 2005).

The main objective of this work is to evaluate and intercompare the ability of different PQPF systems in a study covering a period of 2.5 yr over the city of Bilbao, Spain, and the surrounding areas. The ultimate practical application of this study would be the management of rainwater in the sewage network of the city of Bilbao (Fernández-Ferrero et al. 2009). The four PQPF models are characterized by different formulations, and the intercomparison of their results is intended to analyze the sensitivity of the performance to increasing levels of complexity in the formulation, with other factors (like the length of the database) kept fixed, since longer hourly datasets of sufficient quality do not exist over the area.

The first downscaling model used is probably the simplest PQPF system that can be built. The probability of exceedance over a set of thresholds is computed from the PDF built from a set of preselected analogs. The analogs are defined on the basis of the analogy of the synoptic state corresponding to the analyses at the start of the period being forecast. The analogy between the analyses' synoptic situations is evaluated by means of the Euclidean distance, which is a common selection in this kind of study (Matulla et al. 2008; Zorita and von Storch 1999). The second PQPF is a Bayesian system in which the prior probability is the one yielded by the previous model. Next, in the last two models, a change in the distance used to evaluate the likelihood within the framework of a Bayesian PQPF system is presented. The distance used in the evaluation of the likelihood includes a term that considers both the distance between the synoptic situations and the difference between the precipitation produced during each of the events. The use of such a distance means that analogs that produced similar quantities of precipitation in the past are more likely to happen again than those synoptic situations that shared a similar synoptic structure but failed to produce a similar amount of precipitation (Roebber and Bosart 1998). The rationale behind this formulation of the likelihood is, therefore, clear. Without the term involving precipitation in the definition of the likelihood, any PQPF system would yield forecasts conditioned only on the similarity of today's synoptic situation to the analogs present in the search library. If a set of analogs produced different amounts of precipitation in the past, it is to be expected that the uncertainty of the forecast will be wide. Conversely, if the precipitation amounts were very similar in the past, it can be expected that they will repeat in the future with high probability.

Section 2 presents the data and methods used in this study. Results are presented in section 3 and, finally, the discussion and conclusions are stated in section 4.

## 2. Data and methods

The study has been performed using 6-hourly precipitation data from six rain gauges placed over Bilbao (Spain) and the surrounding towns, located on both banks of the Nervión-Ibaizabal River. The six rain gauges cover an area of roughly 10 km × 15 km. The names of the six rain gauges are Galindo (Gal), Sifón (Sif), Abusu (Abu), Algorta (Alg), Lamiako (Lam), and Sondika (Son). The data from the six rain gauges cover the period June 2000–December 2003, and 6-hourly averages of precipitation were computed for this study from the original 10-min records.

The European Centre for Medium-Range Weather Forecasts (ECMWF) operational atmospheric analyses corresponding to the start of each of the 6-h-long forecasts (covering the period 1 June 2000–31 December 2003) are used to feed the analog-based SDM. They have been obtained with a horizontal resolution of 1.125° (latitude and longitude) and with a 6-hourly frequency from ECMWF's MARS server. Geopotential height and the zonal and meridional components of the wind from the archive of operational analyses are used at the following isobaric levels: 1000, 850, 700, 500, and 300 hPa. Additionally, temperature and dewpoint temperature are used in the vertical column over the study area, plus precipitation persistence (rainfall observed during the previous 6-hourly interval). Because of the different dimensionality of the different predictors, principal component analysis has been used to extract the leading principal components of each of the fields. For details of the algorithm, the predictors, and the comparison with other methods, the reader is referred to Fernández-Ferrero et al. (2009). The sensitivity of the results to different downscaling models, different predictors, and even a mesoscale integration using the fifth-generation Pennsylvania State University–National Center for Atmospheric Research (PSU–NCAR) Mesoscale Model (MM5) was shown in that previous study. The current paper uses the best of the downscaling models identified there.
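The dimensionality reduction step described above can be sketched as follows. This is a minimal, illustrative implementation via the SVD of a standardized field; the function name `leading_pcs`, the array shapes, and the toy `z500` array are assumptions for the example, not the study's actual configuration:

```python
import numpy as np

def leading_pcs(field, n_pcs=5):
    """Return the leading principal components (time series) of a
    2D array `field` shaped (time, gridpoints), plus the fraction of
    variance explained by each retained mode.

    Each column is standardized before the SVD, so every gridpoint
    contributes comparably regardless of its variance.
    """
    anom = field - field.mean(axis=0)
    std = anom.std(axis=0)
    std[std == 0] = 1.0               # guard against constant columns
    anom = anom / std
    # SVD: anom = U S Vt; PC time series are U*S, EOF patterns rows of Vt
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    var_frac = (s ** 2) / np.sum(s ** 2)   # variance fraction per mode
    return pcs, var_frac[:n_pcs]

# Toy example: 100 six-hourly "maps" of 50 gridpoints
rng = np.random.default_rng(0)
z500 = rng.normal(size=(100, 50))
pcs, var_frac = leading_pcs(z500, n_pcs=5)
```

Each predictor field would be processed this way, and the retained PCs concatenated to span the phase space used by the analog search.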

As explained more thoroughly below, during verification, data from 80 records (13 days) before/after the record being forecast are discarded from the statistical downscaling model’s analog library.

Several precipitation thresholds *u*_{i}, with *i* = 1 … *N*_{U} and *N*_{U} = 11 (0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 3, and 4 mm h^{−1}), have been used to define the precipitation probability. These thresholds have been computed from the precipitation distribution during the period June 2000–December 2003. For a realistic application of the methodology in operational mode, it would be interesting to also use higher thresholds. However, with the current length of the database, this is not feasible, since very intense precipitation events are very scarce in this period. In any case, since the purpose of this study is to evaluate the performance of different methodologies, the set of thresholds is enough to show the validity of the method to produce valid PQPFs and to illustrate the differences in the performance of the four models.

For the previously mentioned thresholds, PQPF is expressed by means of the probability of exceedance of precipitation above each of the thresholds. That is, the PQPF expresses the probability that rainfall is larger than any of the *u*_{i} thresholds. From the probability of exceedance, a discrete version of the PDF of precipitation is obtained. The central points *u**_{i}, with *i* = 1 … *N*_{U} − 1, of the intervals (*u*_{i}, *u*_{i+1}) are used to get a discrete version of the PDF. The central points are always computed at the center of each of the intervals. For the last (open ended) interval, a representative value is used in the same role.

The analogy is evaluated between states of the atmosphere **x**_{t} in the phase space that is being considered. The phase space refers to the principal components (PCs) of the geopotential height, the zonal and meridional wind speed at the isobaric levels, the temperature and dewpoint temperature above the study area, and the precipitation at each site during the previous 6 h. The distance between two states of the atmosphere **x**_{t} and **x**_{t*} is given by the usual Euclidean distance in the phase space spanned by the PCs:

$$d_{tt^*} = \left[ \sum_{j} \left( x_{t,j} - x_{t^*,j} \right)^2 \right]^{1/2} \qquad (1)$$

The use of PCs instead of the fields in the original resolution of the numerical model is customary (Zorita and von Storch 1999) when the resolutions of the numerical weather forecast model and the local observations are very different, as is the case in this study.
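As an illustrative sketch (the names `preselect_analogs` and `library` and the toy dimensions are assumptions for the example), the Euclidean distance in the PC phase space and the preselection of the closest historical analogs could look like:

```python
import numpy as np

def preselect_analogs(x_a, library, n_s=40):
    """Return the indices of the `n_s` states in `library` closest to
    the analysis `x_a`, using the Euclidean distance in PC space.

    x_a     : (n_pcs,) PC coordinates of the analysis
    library : (n_dates, n_pcs) PC coordinates of the historical states
    """
    d = np.sqrt(np.sum((library - x_a) ** 2, axis=1))  # Euclidean distance
    order = np.argsort(d)                              # closest first
    return order[:n_s], d[order[:n_s]]

# Toy library of 5000 historical states described by 12 PCs
rng = np.random.default_rng(1)
library = rng.normal(size=(5000, 12))
x_a = rng.normal(size=12)
idx, dists = preselect_analogs(x_a, library, n_s=40)
```

In an operational setting the rows of `library` near the target date would be excluded, as described above for the verification setup.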

For each of the analyses **x**^{a} for which a forecast will be issued, there are several options to yield a probabilistic quantitative precipitation forecast. The first step consists in the preselection from the database of observations of the *N*_{S} = 40 most similar synoptic situations **x**_{k}^{p} to the one corresponding to the analysis, with *k* = 1 … *N*_{S}. The results have also been checked using *N*_{S} = 20 and *N*_{S} = 30, and the results (not shown) are not sensitive to the value of this parameter.

Each preselected analog has an associated observed precipitation *y*_{k}^{p} (every rain gauge is forecast separately). The set of *N*_{S} preselected analogs can be used to create a first estimation of the probability of precipitation for each of the thresholds. Thus, if *n*_{i} is the number of elements in the subset such that *y*_{k}^{p} > *u*_{i}, a simple estimation of the probability that precipitation exceeds the threshold *u*_{i} is given (Hamill and Whitaker 2006) by

$$P\left( y > u_i \right) = \frac{n_i}{N_S} \qquad (2)$$

From the probability of exceedance of precipitation, it is easy to obtain the probability associated with each of the intervals between the thresholds by subtracting the values corresponding to each of the classes (intervals between the thresholds). This will be referred to as PQPF model A.
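A minimal sketch of this first estimation (model A) follows; the helper name and the toy analog precipitation values are illustrative only:

```python
import numpy as np

def model_a_probabilities(y_analogs, thresholds):
    """Exceedance and per-class probabilities from the precipitation
    observed for the preselected analogs (PQPF model A).

    Returns (p_exceed, p_class): p_exceed[i] = P(y > u_i) = n_i / N_S,
    and p_class[i] is obtained by differencing consecutive exceedance
    probabilities; the last class is open ended.
    """
    y = np.asarray(y_analogs, dtype=float)
    n_s = y.size
    p_exceed = np.array([(y > u).sum() / n_s for u in thresholds])
    # Class probabilities: subtract consecutive exceedance probabilities
    p_class = np.append(-np.diff(p_exceed), p_exceed[-1])
    return p_exceed, p_class

thresholds = [0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 3, 4]
y_analogs = [0.0, 0.0, 0.3, 0.6, 1.1, 2.5, 0.0, 0.08, 4.2, 0.9]
p_exceed, p_class = model_a_probabilities(y_analogs, thresholds)
```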

Let *D*_{i}, *i* = 1 … *N*_{U} − 1, be the average of the *d*_{tt*} distances of all the analogs yielding precipitation equal to or above threshold *u*_{i} but below threshold *u*_{i+1}:

$$D_i = \frac{1}{N_i} \sum_{j=1}^{N_i} d\left( \mathbf{x}^a, \mathbf{x}_j^p \right) \qquad (3)$$

where *N*_{i} is the number of analogs corresponding to the *i*th class, **x**^{a} are the predictors corresponding to the analysis being forecast, and **x**_{j}^{p} are the predictors belonging to the events in the *i*th precipitation class.

The posterior probability that *y* ∈ (*u*_{i}, *u*_{i+1}) (or, equivalently, *y* = *u**_{i}) is given by

$$P\left( u_i^* \mid D_i \right) = \frac{P\left( D_i \mid u_i^* \right) P\left( u_i^* \right)}{\sum_{j} P\left( D_j \mid u_j^* \right) P\left( u_j^* \right)} \qquad (4)$$

The probability derived from this model will be referred to as B in this paper. In the previous equation, the prior probability *P*(*u**_{i}) will be the one corresponding to the A model. The likelihood represents the probability of getting a distance *D*_{i} in those cases in which precipitation belonged to the interval (*u*_{i}, *u*_{i+1}), and it has been computed from the relative frequencies of the distances observed during the whole period [Eq. (5)]. Figure 1 shows the likelihood *P*(*D*_{i}|*u**_{i}) for some of the events and the corresponding normal (lognormal for the low classes) fit of the observed likelihood:

$$P\left( D_i \mid u_i^* \right) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left[ -\frac{\left( D_i - h_i \right)^2}{2 \sigma_i^2} \right] \qquad (6)$$

where *h*_{i} is the mean of the histogram *h*(*D*) of distances of elements in precipitation class *u**_{i} and *σ*_{i} is the corresponding standard deviation.
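The Bayesian update used by model B can be sketched as follows, assuming (for illustration) a Gaussian likelihood per class; all names and numerical values here are hypothetical:

```python
import numpy as np

def bayes_posterior(prior, d_i, h, sigma):
    """Posterior class probabilities from Bayes's theorem.

    prior    : (n_classes,) prior probabilities P(u*_i) (from model A)
    d_i      : (n_classes,) average distance D_i of the analogs per class
    h, sigma : per-class mean and std of the fitted (normal) likelihood
    """
    prior = np.asarray(prior, float)
    d_i = np.asarray(d_i, float)
    like = np.exp(-0.5 * ((d_i - h) / sigma) ** 2)  # normal-fit likelihood
    post = like * prior
    return post / post.sum()                        # normalize over classes

# Toy example with four precipitation classes
prior = np.array([0.5, 0.3, 0.15, 0.05])
d_i = np.array([1.0, 1.2, 2.0, 3.0])
h = np.array([1.1, 1.1, 1.2, 1.3])
sigma = np.array([0.5, 0.5, 0.6, 0.7])
post = bayes_posterior(prior, d_i, h, sigma)
```

The normalizing sum in the denominator plays the role of the evidence term, so the posterior probabilities add to one over the classes.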

The ability of this distance to discriminate between events *u**_{i} and ¬*u**_{i} (not *u**_{i}) inside each of the classes is shown in Fig. 2. It shows the likelihood *P*(*D*_{i}|*u**_{i}) corresponding to events characterized by precipitation in the class *u**_{i}. It also shows events not in that class by means of *P*(*D*_{i}|¬*u**_{i}); the latter is computed from the distances corresponding to the preselected analogs whose precipitation falls outside the *i*th class [Eq. (7)].

The cases labeled as “observed” in Fig. 2 represent the likelihood as computed from Eq. (5), while the curves labeled as “unobserved” represent the likelihood corresponding to preselected analogs that produced precipitation outside the *i*th class, according to Eq. (7). It can be seen that the raw distance between synoptic situations yielding (not yielding) precipitation in class *u**_{i} is not able to properly discriminate between the two complementary precipitation classes: there is too much overlap between the likelihoods corresponding to complementary events.

To overcome this limitation, a new distance is defined that involves both the predictors **x**^{a} and the predictand *u**_{i} as follows:

$$\Delta_i = \left[ D_i^2 + \alpha_i^2 \left( u_i^* - \theta \right)^2 \right]^{1/2} \qquad (8)$$

This is the final distance used to compute the likelihood in two of the Bayesian models. According to this definition of the distance, the term on the left under the square root corresponds to the Euclidean distance *D*_{i} between the analysis corresponding to the date the forecast is being issued, **x**^{a}, and each of the preselected analogs **x**_{k}^{p}. This distance *D*_{i} is the one used by the B Bayesian model explained before. The rightmost term under the square root represents the difference between the precipitation observed during the preselected day *k* (considered in discrete increments given by the precipitation thresholds *u**_{i}) and the precipitation that we consider to be the best estimate (*θ*) corresponding to the date being forecast. The constant *α*_{i} allows us to regulate the importance of the difference between precipitation amounts with respect to the difference between synoptic situations. This distance Δ_{i} reduces to the Euclidean one between synoptic situations when *α*_{i} = 0. The parameter *θ* is obviously unknown at this moment, but procedures to estimate it will be described later. Once it is determined, the posterior probability derived from Bayes's theorem can be obtained, since the likelihood can be computed and applied to the prior probability.

The interpretation of this distance is clear. The first phase of the algorithm (preselection of *N*_{S} analogs) guarantees that the synoptic situations are close to each other and to the analysis corresponding to the date being forecast. For instance, let us suppose that most of the *N*_{S} analogs with similar values of the predictors preselected from the historical library are good analogs and that they produced in the past a similar amount of precipitation. Therefore, the value of *θ* producing the best estimate of precipitation probability would be the one closest to the observed past precipitation. In this case, the prior probability would be sharp under this hypothesis (*N*_{S} good analogs produced a similar amount of precipitation). Additionally, the distances between the predictors would be low (they are good analogs) and the observed precipitation would be the same for all of them, equal to the value of *θ* producing the lowest error. Then, the likelihood would be high and the posterior probability would also be sharp around the precipitation class common to all the preselected past analogs. Therefore, the introduction of the parameter *θ* involves not only the predictor, but the predictand as well, in the downscaling process.

In this study the analysis over each of the rain gauges is performed separately, so that both *u**_{i} and *θ* are scalars. However, there is no problem in considering multivariate vectors **U***_{i} and parameters **Θ**, which would imply a term of the form *α*_{i}^{2}(**U***_{i} − **Θ**) · (**U***_{i} − **Θ**) for a Euclidean distance. The definition of the inner product in the multivariate case can easily be extended to the Mahalanobis distance using *α*_{i}^{2}(**U***_{i} − **Θ**)^{T}Σ^{−1}(**U***_{i} − **Θ**), where Σ is the covariance matrix, or similarly with any other distance common in this application (Matulla et al. 2008).

The rightmost term in Eq. (8) is nonzero for close synoptic situations that produce varying amounts of precipitation. Therefore, when computing the likelihood, this term includes a penalty for those synoptic situations not clustered around a given precipitation event *u**_{i}, while it includes no penalty for those analogs that actually produced similar amounts of rainfall. Figure 3 shows the likelihoods computed using the whole record when the parameter *θ* is substituted with the actual precipitation corresponding to each of the dates. The value of the parameter *α*_{i} needed to get an effective separation of the likelihoods is 4 for the low thresholds and 2 for the higher thresholds (4 mm h^{−1}). It is clearly seen that this approach produces a better discriminating ability between observed and nonobserved events in the likelihoods than the raw Bayesian model described above.
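A sketch of the augmented distance of Eq. (8) follows; the vectorized form over classes and the toy values are illustrative assumptions:

```python
import numpy as np

def delta_distance(d_i, u_star, theta, alpha):
    """Augmented distance of Eq. (8): combines the synoptic distance
    D_i with the mismatch between the class precipitation u*_i and the
    current precipitation estimate theta, weighted by alpha.

    With alpha = 0 it reduces to the plain synoptic distance D_i.
    """
    return np.sqrt(d_i ** 2 + (alpha * (u_star - theta)) ** 2)

d_i = np.array([1.0, 1.0, 1.0])      # identical synoptic distances
u_star = np.array([0.1, 0.5, 2.0])   # class centers (mm/h)
theta = 0.5                          # current best precipitation estimate
delta = delta_distance(d_i, u_star, theta, alpha=4.0)
```

In the toy case the class matching `theta` keeps its synoptic distance, while the mismatching classes are pushed away, which is exactly the penalty described above.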

The parameter *θ* must be estimated, since it is unknown at the moment the forecast is being issued. As *θ* and, consequently, Δ are unknown at this moment, there is no way to compute the likelihood. To proceed, an estimator *T* of *θ* is computed by means of the minimization of an objective function, which represents the square error computed from the posterior probability:

$$E_m^2 = \sum_{i} \left( u_i^* - \theta \right)^2 P\left( u_i^* \mid \Delta_i \right) \qquad (9)$$

Here *E*_{m}^{2} can be decomposed as the sum of the variance of *u**_{i} and the bias of the forecast precipitation (computed from the posterior probability). That is, if ⟨*y*⟩ = ∑_{i} *u**_{i} *P*(*u**_{i}|Δ_{i}) is the mean of the probabilistic forecast and *σ*_{y}^{2} = ∑_{i} (*u**_{i} − ⟨*y*⟩)^{2} *P*(*u**_{i}|Δ_{i}) its variance, the square error is given by

$$E_m^2 = \sigma_y^2 + \left( \langle y \rangle - \theta \right)^2 \qquad (10)$$

At each of the dates being forecast, *E*_{m}^{2} is numerically minimized for the subset of preselected analogs so that the bias term is zero. Therefore, *T* is an unbiased estimator of *θ* for each date. If there are several values of *θ* that yield a zero bias, the one with the lowest variance is selected.
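The estimation procedure can be sketched as a simple grid search over candidate values of the parameter; the Gaussian likelihood and all toy parameter values here are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np

def estimate_theta(u_star, prior, d_i, h, sigma, alpha, candidates):
    """Grid search for the estimator T of theta that minimizes the
    square error E_m^2 computed from the posterior probability.

    For each candidate theta, the augmented distance Delta_i is formed,
    a (Gaussian, illustrative) likelihood is applied to the prior, and
    E_m^2 = sum_i (u*_i - theta)^2 P(u*_i | Delta_i) is evaluated.
    """
    best_theta, best_err = None, np.inf
    for theta in candidates:
        delta = np.sqrt(d_i ** 2 + (alpha * (u_star - theta)) ** 2)
        like = np.exp(-0.5 * ((delta - h) / sigma) ** 2)
        post = like * prior
        post = post / post.sum()                    # posterior per class
        err = float(np.sum((u_star - theta) ** 2 * post))
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta, best_err

u_star = np.array([0.1, 0.5, 2.0])   # class centers (mm/h)
prior = np.array([0.2, 0.6, 0.2])    # model A prior
d_i = np.array([1.0, 0.8, 1.5])      # synoptic distances per class
h, sigma = 1.0, 0.5                  # toy likelihood parameters
t, e = estimate_theta(u_star, prior, d_i, h, sigma,
                      alpha=4.0, candidates=np.linspace(0.0, 2.0, 201))
```

In practice the minimization would be run per forecast date over the preselected analogs, keeping the candidate with zero bias and lowest variance as described above.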

Once the estimate *T* of *θ* is known, the likelihood associated with the forecast can be computed and then the posterior probability is the final PQPF. To check the sensitivity of the results to the actual shape of the likelihoods used, two different functions are used. The first one resembles the form of the likelihood identified from the fit of the observed data during the analysis of the desirable properties of the distance used in the evaluation of the likelihoods in Fig. 3. Therefore, the first probabilistic model [i.e., the Bayesian experimental (BE)] uses as likelihood the following function:

$$P\left( \Delta_i \mid u_i^* \right) = A_i \exp\!\left[ -\frac{\left( \Delta_i - h_i \right)^2}{2 \sigma_i^2} \right] \qquad (11)$$

The parameters *A*_{i}, *h*_{i}, and *σ*_{i} are computed from the distribution of observed (*u**_{i}) events and corresponding Δ distances during the whole observational record. This function is almost symmetric and is based on the observational data.

The second probabilistic model (BI) uses as likelihood a potential (power law) function of Δ_{i} [Eq. (12)], whose parameters *β*_{i} and *γ*_{i} are computed from the distribution of observed events and corresponding Δ_{i} distances during the whole period. The advantage of this potential likelihood is that it is very small for large values of Δ_{i}, in the area where the nonobserved events start to be frequent. The values of *β*_{i} range from 5 in the lowest class (0 mm h^{−1}) to 63 for the highest class (4 mm h^{−1}). The exponent is *n* = 2.75, approximately constant for all the classes.

## 3. Results

As explained in section 2, model A refers to a probabilistic model based on the probability of exceedance, as given by Eq. (2). Model B refers to a simple Bayesian model in which the probability is given by Eq. (4), without involving the predictand (precipitation) in the definition of the likelihood. Similarly, models BE and BI refer to Bayesian models built using the Δ distance described in Eq. (8). Therefore, both BE and BI involve precipitation in the definition of the likelihoods, as given by Eqs. (11) and (12), with probabilities in models BE and BI following Eqs. (13) and (14).

Figure 4 shows the reliability diagram corresponding to the different models (A, B, BE, and BI) for four selected precipitation thresholds (0.1, 0.5, 1, and 2 mm h^{−1}). It can be seen that the model that makes no use of Bayes's theorem (A) deviates from the diagonal more than the rest of the models. The non-Bayesian model (A) tends to underforecast precipitation, particularly below 2 mm h^{−1}. It can also be seen that the models involving precipitation in the computation of the likelihood (BE and BI) visually seem better than the Bayesian model that uses the distance computed from the predictors alone (B), particularly at high precipitation thresholds. These are the most interesting events from the point of view of the management of precipitation in a catchment, which is the main practical objective of this research (Fernández-Ferrero et al. 2009). For these events, the B model tends to overforecast precipitation at medium-to-high thresholds (Fig. 4).

Table 1 shows the Brier score (BS) and the Brier skill score (BSS), computed with respect to the climatological probability, for each of the thresholds used in the study and the four models (A, B, BE, and BI). The BS shows that the performance of the models, as measured by this score, is very similar for all the thresholds. The BSS (Table 1) shows that all the probabilistic models used in this study produce forecasts with better skill than the climatology for all thresholds up to 1.5 mm h^{−1}, and all the models except B work better than climatology for the next threshold (2 mm h^{−1}), too. None of the models is able to produce better forecasts than climatological ones for the highest thresholds. To clarify the behavior of the models, Fig. 5 shows the reliability (top) and the resolution (bottom) components of the BS, which can be decomposed as usual into BS = *U* + Rel − Res, where *U* is the uncertainty term (not shown; equal for all the models), Rel is the reliability term, and Res is the resolution term. It is clear from Fig. 5 that the worst model in terms of reliability is the B model, with the A and BE models being very similar and the BI model showing the lowest (best) values of the reliability component of the BS. Conversely, the resolution term (Fig. 5, bottom) shows that the Bayesian models (B for low precipitation thresholds, and B, BE, and BI for intermediate and high precipitation thresholds) have a lower value of the resolution component of the BS than the simple A model.
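For readers who want to reproduce this kind of verification, a sketch of the Brier score and its standard binned reliability/resolution/uncertainty decomposition follows; the number of bins is an assumption, and with continuous forecast probabilities the decomposition identity holds only up to small within-bin terms:

```python
import numpy as np

def brier_decomposition(p_forecast, outcomes, n_bins=10):
    """Brier score and its reliability/resolution/uncertainty terms,
    BS ~= U + Rel - Res, using binned forecast probabilities.

    p_forecast : forecast probabilities of the event (0..1)
    outcomes   : 0/1 event occurrences
    """
    p = np.asarray(p_forecast, float)
    o = np.asarray(outcomes, float)
    n = p.size
    bs = float(np.mean((p - o) ** 2))
    obar = o.mean()
    unc = float(obar * (1.0 - obar))
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    rel = res = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        nk = mask.sum()
        pk = p[mask].mean()       # mean forecast probability in the bin
        ok = o[mask].mean()       # observed frequency in the bin
        rel += nk / n * (pk - ok) ** 2
        res += nk / n * (ok - obar) ** 2
    return bs, rel, res, unc

# Well-calibrated toy forecasts: outcome drawn with the forecast probability
rng = np.random.default_rng(2)
p = rng.uniform(size=2000)
o = (rng.uniform(size=2000) < p).astype(float)
bs, rel, res, unc = brier_decomposition(p, o)
```

For this calibrated toy case the reliability term is close to zero, which is the behavior the reliability diagrams in Fig. 4 probe.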

The comparison of results from the point of view of the ranked probability score (RPS) shows that all the models behave, overall, in a quite similar manner when it comes to predicting the match between the occurrence of the event and the forecast probabilities. The distribution of the RPS values is highly nonnormal according to a Kolmogorov–Smirnov test (the probability that the distributions of RPS values for all the models are Gaussian is less than 10^{−6}). There are some very small differences in the distributions of the RPS values for the four models (Table 2). For instance, for low and medium quantiles of the RPS distribution, the A model seems to behave better. However, this effect is lost for the high percentiles of the distribution. For the 95th percentile and above, all the Bayesian models show lower values of RPS than the A model. This means that, while A seems to yield better forecasts most of the time, when the forecasts show high RPS values (bad forecasts), they are slightly better with the Bayesian models.
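The RPS compares the cumulative forecast and observed distributions over the ordered precipitation classes; the following is a generic sketch of its computation for a single forecast, not the study's exact implementation:

```python
import numpy as np

def rps(p_class, obs_class):
    """Ranked probability score for one probabilistic forecast.

    p_class   : forecast probabilities over the ordered classes
    obs_class : index of the class that was actually observed
    """
    p = np.asarray(p_class, float)
    f_cum = np.cumsum(p)                 # forecast CDF over classes
    o_cum = np.zeros_like(p)
    o_cum[obs_class:] = 1.0              # observed CDF (step function)
    return float(np.sum((f_cum - o_cum) ** 2))

sharp_right = rps([0.0, 1.0, 0.0], obs_class=1)   # confident and correct
sharp_wrong = rps([1.0, 0.0, 0.0], obs_class=2)   # confident and wrong
```

Because it works on cumulative distributions, the RPS penalizes probability placed in classes far from the observed one more heavily than probability in neighboring classes.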

The stratification of the RPS values by precipitation category is shown in Fig. 6. The top panel shows, by means of a box-and-whiskers diagram, the minimum, the 25th percentile, the median, the 75th percentile, and the maximum value of the RPS for each of the precipitation intervals used in the PQPF. The quantiles are computed from the RPS values corresponding to each of the 6-hourly forecasts. The bottom panel in Fig. 6 shows the number of events in each of the categories. It can be seen that the dry or very low precipitation events (i.e., precipitation lower than 0.1 mm h^{−1}) are the best forecast with any of the probabilistic models, with median values of the RPS and upper quartiles very close to zero. The spread in these classes (i.e., precipitation up to 0.1 mm h^{−1}) is very low. In general, the simplest A model behaves slightly better than the Bayesian models at these precipitation thresholds, since the maximum RPS values corresponding to the A model are the lowest at these thresholds, which are the least interesting ones from a practical point of view. The median of the RPS, the higher quantiles, and the highest values are better captured by the A model at the low precipitation rates. Most of the models yield similar forecasts (statistically speaking, and according to this RPS score) up to the 1 mm h^{−1} threshold. In these intervals it is noticeable, however, that the median of the RPS values is significantly lower for the Bayesian models than for the A model, even though the upper quartile is similar for all of the models. From this threshold onward, the A model systematically yields worse forecasts than any of the Bayesian models. In general, the B model shows a higher spread than the BI or BE models for those high precipitation rate thresholds. This is not necessarily a virtue, since it basically happens in the lower quartile part of the distribution, in which B tends to remain lower than BI or BE. The median corresponding to B tends to be, in general, lower than the median for BE or BI, particularly for intermediate thresholds. It is very likely that the short record in this study is one of the main reasons behind the lack of performance of the Bayesian probabilistic models for high precipitation rates. However, with the short record available (i.e., 3 yr), this cannot strictly be proven.

## 4. Discussion and conclusions

Results of this study show that all the analog-based PQPF models used are able to yield precipitation forecasts with a 6-h range that are more skillful than the climatology.

Four kinds of models have been tested in this study. The first one (A) is simply based on the probability of exceedance, as given by a subset of preselected analogs. The second one (B) is based on using Bayes's theorem to derive a posterior probability from the prior probability given by the A model. The other two models (BE and BI) test whether the predictand (precipitation) can be introduced into the posterior probability through the likelihood in order to build better Bayesian models.

In terms of the components of the Brier score, the BI model shows a lower (i.e., better) reliability term than the alternative models, something also apparent in the reliability diagrams. In terms of resolution, the A model is slightly better than the rest of the models. These opposing trends in reliability and resolution imply that the values of the Brier score are very similar for models A and BI, with model B showing in general the worst behavior of all. However, for medium and intermediate precipitation thresholds, the Bayesian models yield better RPS values.
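The reliability and resolution components referred to here come from the standard decomposition of the Brier score (BS = reliability − resolution + uncertainty). A minimal sketch, assuming NumPy and a simple equal-width binning of the forecast probabilities; the function name and bin count are illustrative choices, not the paper's:

```python
import numpy as np

def brier_decomposition(probs, obs, n_bins=10):
    """Decompose the Brier score of probabilistic forecasts `probs`
    for binary outcomes `obs` into (reliability, resolution,
    uncertainty). Lower reliability and higher resolution are better;
    the identity BS = rel - res + unc holds exactly when forecasts
    within each bin are constant."""
    probs = np.asarray(probs, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n = probs.size
    base = obs.mean()                     # climatological frequency
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rel = res = 0.0
    for b in range(n_bins):
        mask = bins == b
        nk = mask.sum()
        if nk == 0:
            continue
        fk = probs[mask].mean()           # mean forecast in bin
        ok = obs[mask].mean()             # observed frequency in bin
        rel += nk * (fk - ok) ** 2
        res += nk * (ok - base) ** 2
    unc = base * (1.0 - base)
    return rel / n, res / n, unc
```

The opposing trends noted in the text then have a direct reading: model A gains through a larger resolution term, while BI gains through a smaller reliability term, and the two effects nearly cancel in the total Brier score.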

The analog-based PQPF models used in this study yield forecasts that are, in general, less reliable during very intense events. The Brier skill score shows that events above 2 mm h^{−1} cannot be forecast by means of these models better than by using the climatological probability. This can be traced to the long recurrence time of these events, which are not very common in observational databases (Diomede et al. 2008). This is quite problematic, since it points toward the well-known fact that analog methods cannot generate events not recorded in the training database (Obled et al. 2002). Thus, for an operational application of the method, a longer database is needed; this is particularly evident for the higher rainfall thresholds. To this end, previous studies (Hamill and Whitaker 2006; Hamill et al. 2006) already point out that the use of reforecasts can be a solution to this problem. Unfortunately, there exist technological limits to the routine use of this technique by small groups. In further studies, with longer data periods available, higher thresholds will be analyzed.
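The Brier skill score used for this comparison measures the improvement of the forecast over the climatological probability, with BSS ≤ 0 meaning no skill. A minimal sketch, assuming NumPy; the function names are illustrative:

```python
import numpy as np

def brier_score(probs, obs):
    """Mean squared error of probability forecasts for binary outcomes."""
    probs = np.asarray(probs, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((probs - obs) ** 2))

def brier_skill_score(probs, obs):
    """BSS = 1 - BS / BS_clim, where the reference forecast always
    issues the climatological frequency of the event. BSS = 1 is a
    perfect forecast; BSS <= 0 means no improvement over climatology.
    (Assumes the event occurs at least once, so BS_clim > 0.)"""
    obs = np.asarray(obs, dtype=float)
    bs_clim = brier_score(np.full(obs.shape, obs.mean()), obs)
    return 1.0 - brier_score(probs, obs) / bs_clim
```

For rare events such as rates above 2 mm h^{−1}, the climatological reference itself is already a hard benchmark to beat, which is consistent with the negative-to-zero skill reported at those thresholds.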

In this study, the operational analyses have been used as a perfect prognosis of the atmospheric state corresponding to the time interval for which the 6-hourly forecast is issued. It is known that, for short-range PQPF, this is not a serious problem (Fernández-Ferrero et al. 2009; Stephenson et al. 2005), since the state of the atmosphere does not change that fast. For longer time ranges, however, this is a real problem, and a weather forecast model must in any case be run before the PQPF by means of analogs can be issued.

In this study, not all sources of uncertainty have been considered. There exist sources of uncertainty associated with errors in the initial conditions, but also errors due to the different parameterizations of the numerical models.

However, these extra sources of variability are not considered in this study, since all the meteorological fields come from the ECMWF operational analysis. Ideally, all sources of uncertainty should be considered in a real evaluation of the posterior probability (Krzysztofowicz 2001). However, this implies the use of other sources of information (e.g., multimodel ensembles), something that is outside the scope of this study.

The distance Δ_i proposed in this paper tends to assign higher likelihoods to synoptic situations that, being closer, yield similar amounts of precipitation. Thus, it is able to better discriminate the values of the likelihood used in the Bayesian formalism for medium and intermediate precipitation thresholds. However, other factors, such as the length of the available database, imply that the net effect on the results is not very important. The positive influence of this distance can be detected in better reliabilities of the forecast, but only for some of the rainfall thresholds. Further research with longer databases is needed in order to identify the role that the length of the training database plays in the performance of the Bayesian models used.

The results in this paper show that the relative performance of the models depends on the score used and on the rainfall interval that is most important for the forecaster. The A model works best for the lowest precipitation rates, which might be of interest for rain–no rain forecasts. For hydrological forecasts, it is probably better to use the Bayesian models (B, BI, or BE), since they perform better than the simple analog-based one for intense precipitation events. Therefore, users should clearly define their main interests before selecting the final model.

## Acknowledgments

The authors thank the ETORTEK Strategic Research Programme (Department of Industry, Trade and Tourism and Department of Transport and Civil Works of the Basque Government, Basque Meteorological Service-Euskalmet) for its financial support through the EKLIMA21 Project (ETORTEK IE08-217 and IE09-264). Funding was also received from the National R+D+i Plan, Spanish Ministry of Science and Innovation (CGL2008-03321/CLI). The precipitation data were provided by AEMET, Euskalmet, CABB, and DFB. The authors thank the ECMWF for granting access to reanalysis and operational data through the MARS archive system by means of the special project SPESIPRA. Comments by two anonymous reviewers were very helpful and led to a better final version of the paper.

## REFERENCES

Barnett, T. P., and R. W. Preisendorfer, 1987: Origins and levels of monthly and seasonal forecast skill for U.S. air temperature determined by canonical correlation analysis. *Mon. Wea. Rev.*, **115**, 1825–1850.

Benestad, R. E., I. Hanssen-Bauer, and D. Chen, 2008: *Empirical-Statistical Downscaling*. World Scientific Publishing Company, Inc., 215 pp.

Busuioc, A., and H. von Storch, 2003: Conditional stochastic model for generating daily precipitation time series. *Climate Res.*, **24**, 181–195.

Coelho, C. A. S., S. Pezzulli, M. Balmaseda, F. J. Doblas-Reyes, and D. B. Stephenson, 2004: Forecast calibration and combination: A simple Bayesian approach for ENSO. *J. Climate*, **17**, 1504–1516.

Diomede, T., F. Nerozzi, T. Paccagnella, and E. Todini, 2008: The use of meteorological analogues to account for LAM QPF uncertainty. *Hydrol. Earth Syst. Sci.*, **12**, 141–157.

Feddersen, H., and U. Andersen, 2005: A method for statistical downscaling of seasonal ensemble predictions. *Tellus*, **57**, 398–408.

Feddersen, H., A. Navarra, and M. Ward, 1999: Reduction of model systematic error by statistical correction for dynamical seasonal predictions. *J. Climate*, **12**, 1974–1989.

Fernández-Ferrero, A., J. Sáenz, G. Ibarra-Berastegi, and J. Fernández, 2009: Evaluation of statistical downscaling in short range precipitation forecasting. *Atmos. Res.*, **94**, 448–461.

Fernández, J., and J. Sáenz, 2003: Improved field reconstruction with the analog method: Searching the CCA space. *Climate Res.*, **24**, 199–213.

Gangopadhyay, S., M. Clark, and B. Rajagopalan, 2005: Statistical downscaling using k-nearest neighbors. *Water Resour. Res.*, **41**, W02024, doi:10.1029/2004WR003444.

Gutiérrez, J. M., A. S. Cofiño, R. Cano, and M. A. Rodríguez, 2004: Clustering methods in statistical downscaling for short-range weather forecasts. *Mon. Wea. Rev.*, **132**, 2169–2183.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. *Mon. Wea. Rev.*, **134**, 3209–3229.

Hamill, T. M., and J. S. Whitaker, 2007: Ensemble calibration of 500-hPa geopotential height and 850-hPa and 2-m temperatures using reforecasts. *Mon. Wea. Rev.*, **135**, 3273–3280.

Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. *Mon. Wea. Rev.*, **132**, 1434–1447.

Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts: An important dataset for improving weather predictions. *Bull. Amer. Meteor. Soc.*, **87**, 33–46.

Hughes, J. P., and P. Guttorp, 1994: A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. *Water Resour. Res.*, **30**, 1525–1546.

Jolliffe, I. T., and D. B. Stephenson, 2003: *Forecast Verification: A Practitioner’s Guide in Atmospheric Science*. John Wiley and Sons, 240 pp.

Kruizinga, S., and A. H. Murphy, 1983: Use of an analogue procedure to formulate objective probabilistic temperature forecasts in The Netherlands. *Mon. Wea. Rev.*, **111**, 2244–2254.

Krzysztofowicz, R., 1998: Probabilistic hydrometeorological forecasts: Toward a new era in operational forecasting. *Bull. Amer. Meteor. Soc.*, **79**, 243–251.

Krzysztofowicz, R., 2001: The case for probabilistic forecasting in hydrology. *J. Hydrol.*, **249**, 2–9.

Marty, R., I. Zin, and C. Obled, 2008: On adapting PQPFs to fit hydrological needs: The case of flash flood forecasting. *Atmos. Sci. Lett.*, **9**, 73–79.

Matulla, C., X. Zhang, X. L. Wang, J. Wang, E. Zorita, S. Wagner, and H. von Storch, 2008: Influence of similarity measures on the performance of the analog method for downscaling daily precipitation. *Climate Dyn.*, **30**, 133–144.

Obled, C., G. Bontron, and R. Garçon, 2002: Quantitative precipitation forecasts: A statistical adaptation of model outputs through an analogues sorting approach. *Atmos. Res.*, **63**, 303–324.

Roebber, P. J., and L. F. Bosart, 1998: The sensitivity of precipitation to circulation details. Part I: An analysis of regional analogs. *Mon. Wea. Rev.*, **126**, 437–455.

Stephenson, D. B., C. A. S. Coelho, F. J. Doblas-Reyes, and M. Balmaseda, 2005: Forecast assimilation: A unified framework for the combination of multi-model weather and climate predictions. *Tellus*, **57**, 253–264.

Toth, Z., 1991: Intercomparison of circulation similarity measures. *Mon. Wea. Rev.*, **119**, 55–64.

Van den Dool, H. M., 1994: Searching for analogues, how long must we wait? *Tellus*, **46**, 314–324.

Wetterhall, F., S. Halldin, and C. Xu, 2005: Statistical precipitation downscaling in central Sweden with the analogue method. *J. Hydrol.*, **306**, 174–190.

Wilby, R. L., and T. M. L. Wigley, 1997: Downscaling general circulation model output: A review of methods and limitations. *Prog. Phys. Geogr.*, **21** (4), 530–548.

Wilby, R. L., S. P. Charles, E. Zorita, B. Timbal, P. Whetton, and L. O. Mearns, 2004: Guidelines for use of climate scenarios developed from statistical downscaling methods. Tech. Rep., IPCC, 27 pp.

Yates, D., S. Gangopadhyay, B. Rajagopalan, and K. Strzepek, 2003: A technique for generating regional climate scenarios using a nearest-neighbor algorithm. *Water Resour. Res.*, **39**, 1199, doi:10.1029/2002WR001769.

Yuan, H., S. L. Mullen, X. Gao, S. Sorooshian, J. Du, and H. M. H. Juang, 2007: Short-range probabilistic quantitative precipitation forecasts over the southwest United States by the RSM ensemble system. *Mon. Wea. Rev.*, **135**, 1685–1698.

Zorita, E., and H. von Storch, 1999: The analog method as a simple statistical downscaling technique: Comparison with more complicated methods. *J. Climate*, **12**, 2474–2489.

Zorita, E., J. P. Hughes, D. P. Lettenmaier, and H. von Storch, 1995: Stochastic characterization of regional circulation patterns for climate model diagnosis and estimation of local precipitation. *J. Climate*, **8**, 1023–1042.

The BS and BSS for each of the models and thresholds used in the study.

Percentiles of the RPS values corresponding to all the 6-hourly forecasts during the period June 2000–December 2003 for the different models.