## 1. Introduction

During the last decades, statistical methods that use output from numerical weather prediction (NWP) models have become an important tool for producing quantitative forecasts at sites with observations. One reason is that forecasts can be formulated in terms of probabilities even when the available information is deterministic. Probabilistic forecasts are ideally fully specified probability distributions, but in practice those made by statistical methods are almost exclusively probabilities of discrete events, such as the probability of precipitation. That is, continuous (or partly continuous) variables are commonly discretized before statistical methods are applied.

The aim of this article is to make forecasts in terms of quantiles. The *p*th quantile is informally defined as the value where the probability of an observation less than this value is *p.* When probabilities are specified in percent, quantiles are referred to as percentiles. The attention is here restricted to precipitation, which is one of the most difficult parameters to forecast both from a meteorological and statistical point of view. An example of such forecasts is shown in Fig. 1. Here, the probability distributions are presented in terms of the 95th, 75th, 50th, 25th, and 5th percentiles (not all are defined for all days). On the first day, as an example of interpretation, there is a 50% chance of precipitation less than about 10 mm.

With the exception of ensemble forecasting (e.g., Molteni et al. 1996; Toth and Kalnay 1997), the meteorological literature for probabilistic forecasts of continuous variables is quite sparse. Statistical approaches have been applied mainly to calibrate ensemble forecasts. Hamill and Colucci (1998) use Gamma distributions based on either corrected ensembles or means of ensembles for precipitation forecasting. Atger (1999) fits Gaussian distributions to ensembles of geopotential height with various estimators of the variance. In Wilks (2002), mixtures of multivariate Gaussian distributions are fitted to ensembles of surface parameters (transformed if necessary) with possible corrections of bias and spread. The approach proposed here is different in that the problem is solved by means of regression methods. The main advantages are that distributional assumptions are not necessary and that any type of relevant information can be included as predictors; ensembles, for example, are not required.

In section 2 the statistical framework for producing quantile forecasts of precipitation is outlined. Section 3 presents examples of daily precipitation forecasting using single deterministic forecasts and ensemble forecasts as input. Some general comments are given in section 4 followed by conclusions in section 5.

## 2. Statistical methods

Statistical methods for precipitation accumulated over periods of days or less have to take into account that precipitation has a mixed discrete–continuous probability distribution. This difficulty is commonly solved by first estimating the probability of precipitation and then modeling the precipitation amounts given occurrence of precipitation. These two steps can then be combined using the laws of probability. This strategy is, for example, applied by Stern and Coe (1984) in stochastic climate models and by Krzysztofowicz et al. (1993) and Kelly and Krzysztofowicz (2000) for inclusion of precipitation forecasts in stochastic hydrological models.

### a. Probability of precipitation

Let *y*_{1}, … , *y*_{n} denote binary observations that take either the "values" *precipitation* or *no precipitation,* and assume that they are realizations of the conditional random variables *Y*_{1}|**x**_{1}, … , *Y*_{n}|**x**_{n} from Bernoulli distributions. A Bernoulli distribution is fully characterized by one parameter, *π,* which here is the probability of precipitation. The vector variables **x**_{1}, … , **x**_{n} denote predictors and contain the information available about the observations at a given time before they become known. Formally, the model may be written

$$Y_i \,|\, \mathbf{x}_i \sim \mathrm{Bernoulli}(\pi_i), \tag{1}$$

$$\Phi^{-1}(\pi_i) = \alpha_0 + \sum_{j=1}^{J} \alpha_j x_{ij}, \tag{2}$$

for *i* = 1, … , *n,* where *π*_{i} is the probability of precipitation for case *i,* Φ^{−1}( · ) the inverse Gaussian distribution function, *x*_{ij} the *j*th component of **x**_{i}, and *α*_{0}, … , *α*_{J} unknown parameters to be estimated. Probit regression is, like logistic regression, an example of a generalized linear model, and by assuming conditionally independent observations, maximum likelihood estimates of the parameters can be obtained by iterative weighted least squares (e.g., McCullagh and Nelder 1989; Dobson 1990).
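As an illustration, the probit model (2) can also be fitted by directly maximizing the Bernoulli likelihood; this yields the same maximum likelihood estimates as iterative weighted least squares. The sketch below is a minimal NumPy/SciPy implementation; the function names and the synthetic data are illustrative, not part of the original study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_probit(X, y):
    """Maximum likelihood estimates for Phi^{-1}(pi_i) = a0 + sum_j a_j x_ij."""
    Xd = np.column_stack([np.ones(len(y)), X])               # prepend intercept column
    def negloglik(alpha):
        p = np.clip(norm.cdf(Xd @ alpha), 1e-10, 1 - 1e-10)  # keep logs finite
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return minimize(negloglik, np.zeros(Xd.shape[1]), method="BFGS").x

def predict_pop(alpha, X):
    """Forecast probability of precipitation for new predictor values."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return norm.cdf(Xd @ alpha)
```

Because the Bernoulli log likelihood with a probit link is concave, a quasi-Newton search converges to the same solution as the classical fitting algorithm.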

### b. Precipitation amounts given occurrence of precipitation

Let *r*_{1}, … , *r*_{n*} denote observed precipitation amounts of cases with observed precipitation above a given lower threshold and **z**_{1}, … , **z**_{n*} the corresponding predictor values, where **z**_{i} = (*z*_{i1}, … , *z*_{iK})^{T}. For linear quantile functions,

$$q_\theta(\mathbf{z}; \boldsymbol{\beta}) = \beta_0 + \sum_{k=1}^{K} \beta_k z_k, \tag{3}$$

an estimate of the *θ* quantile, *q*_{θ}(**z**; **β**), 0 < *θ* < 1, is obtained by solving the following minimization problem with respect to **β** (Koenker and Bassett 1978):

$$\min_{\boldsymbol{\beta}} \sum_{i=1}^{n^*} \rho_\theta\!\left( r_i - \beta_0 - \sum_{k=1}^{K} \beta_k z_{ik} \right), \tag{4}$$

where the check function *ρ*_{θ}( · ) is defined by

$$\rho_\theta(u) = \begin{cases} \theta u, & u \ge 0, \\ (\theta - 1)u, & u < 0. \end{cases} \tag{5}$$

In some cases it may be necessary to put constraints on the *β*s to avoid crossing quantiles, especially in situations with more than one predictor variable or few data. Computational aspects are discussed in Koenker and D'Orey (1987, 1993), who also provide computer codes, and Portnoy and Koenker (1997).
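The minimization in (4) is a linear program and can therefore be solved exactly with a general-purpose LP solver. The sketch below, assuming SciPy's `linprog` with the HiGHS backend, splits each residual into positive and negative parts; it is an illustrative reformulation, not the specialized algorithms of Koenker and D'Orey.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(Z, r, theta):
    """Solve (4): minimize sum_i rho_theta(r_i - b0 - sum_k b_k z_ik).
    Each residual is written as u_i - v_i with u_i, v_i >= 0, which turns
    the check-function sum into the cost theta*sum(u) + (1-theta)*sum(v)."""
    n = len(r)
    X = np.column_stack([np.ones(n), np.asarray(Z)])   # intercept + predictors
    m = X.shape[1]
    c = np.concatenate([np.zeros(m), theta * np.ones(n), (1 - theta) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])       # X beta + u - v = r
    bounds = [(None, None)] * m + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=np.asarray(r), bounds=bounds, method="highs")
    return res.x[:m]                                   # (b0, b1, ..., bK)
```

For the small datasets typical of site-specific forecasting, the generic LP route is fast enough, but the interior-point and simplex variants discussed by Portnoy and Koenker (1997) scale better.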

The assumption of globally linear quantile functions may be too rigid, and defining constraints on the *β*s to avoid crossing quantiles may be troublesome. One possible solution that largely circumvents these inconveniences is to assume that the quantiles are only locally linear in the neighborhood of a given predictor value **z**. This method will be referred to as local quantile regression (LQR). For a given predictor value **z**, an estimate of the *θ* quantile, *q*_{θ}(**z**; **β**), is now obtained by

$$\min_{\boldsymbol{\beta}} \sum_{i=1}^{n^*} w_i\, \rho_\theta\!\left( r_i - \beta_0 - \sum_{k=1}^{K} \beta_k (z_{ik} - z_k) \right), \tag{6}$$

where *z*_{k} is the *k*th component of **z.** In these equations there are two differences from (3) and (4). First, and most important, each term in the sum of (4) is given a weight *w*_{i}. Second, the predictors are centered around **z,** such that the estimate of the quantile *q*_{θ}(**z**; **β**) equals the estimate of *β*_{0}. This means that possible constraints only need to be put on *β*_{0}, which is much easier than making constraints for all *β*s simultaneously.

The weights are defined by

$$w_i = W\!\left( \frac{\|\mathbf{z}_i - \mathbf{z}\|_2}{h_\lambda(\mathbf{z})} \right), \tag{7}$$

for *i* = 1, … , *n**, where *W*( · ) is a weight function that decreases with increasing distance and is zero for arguments greater than or equal to one (e.g., the tricube function). In this expression, ‖ · ‖_{2} is the Euclidean norm, *λ* ∈ (0, 1] a smoothing parameter that determines the fraction of the data to be used, and *h*_{λ}(**z**) the distance from **z** to the *λn**th nearest predictor value. In practice, this means that (1 − *λ*)100% of the cases have no impact on the fit, and that cases with predictor values close to **z** have greater impact than those farther away. It should also be noted that each predictor variable should be scaled before computing the weights. In the examples of section 3, the inverse of each predictor's standard deviation is used as scaling factor.

When applying LQR, the minimization problem must be solved for each **z** of interest and, hence, it is more computationally demanding than QR. However, Portnoy and Koenker (1997) demonstrate that the computations are comparable in terms of speed with those of least square estimation (linear regression), and this may therefore not be a limiting factor. Recent experiments at the Norwegian Meteorological Institute also indicate that LQR is feasible for operational forecasting using standard personal computers only.
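The weighting step of LQR can be sketched as follows: distances to the scaled predictor values are computed, the bandwidth is set to the distance to the *λn**-th nearest case, and cases beyond it receive zero weight. The tricube weight function used here is an illustrative assumption; any weight function with the properties described above would do.

```python
import numpy as np

def lqr_weights(Zs, z0, lam):
    """Weights w_i for local quantile regression at the point z0.
    Zs : (n*, K) predictor values, already scaled (e.g., by 1/std per column).
    z0 : scaled predictor value of the case to be forecast.
    lam: smoothing parameter in (0, 1], the fraction of data used."""
    d = np.linalg.norm(np.asarray(Zs) - np.asarray(z0), axis=1)
    m = max(1, int(np.ceil(lam * len(d))))
    h = np.sort(d)[m - 1]                        # bandwidth: lam*n-th nearest case
    u = np.divide(d, h, out=np.zeros_like(d), where=h > 0)
    return np.clip(1.0 - u**3, 0.0, None) ** 3   # tricube: zero beyond h
```

The resulting weights multiply the terms of the weighted minimization above, and the whole fit must be recomputed for every forecast point **z**.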

### c. Forecasting

Forecasts for the probability of precipitation are simply made by replacing the regression parameters *α*_{0}, … , *α*_{J} in (2) with estimates and inserting values for the predictors. Note that with this strategy the uncertainties in the parameter estimates are not taken into account, but in practice these are often small and negligible compared to the model assumptions; the linear dependence assumed in (2), for example, might not be optimal.

Let *R* denote a random variable for precipitation (not conditional on the occurrence of precipitation) and *q*_{p} the quantile of interest, that is, *P*(*R* ≤ *q*_{p}) = *p.* In order to estimate *q*_{p}, the correct quantile for precipitation amounts given the occurrence of precipitation must be estimated. Denoting the estimated probability of precipitation by *π̂* and the occurrence of precipitation by *C,* the probability associated with this quantile follows from

$$P(R \le q_p) = (1 - \hat{\pi}) + \hat{\pi}\, P(R \le q_p \,|\, C) = p,$$

so that

$$P(R \le q_p \,|\, C) = 1 - \frac{1 - p}{\hat{\pi}}.$$

That is, if the *p* quantile is of interest, the 1 − (1 − *p*)*π̂*^{−1} quantile in section 2b must be estimated. Note that *π̂* must be greater than 1 − *p* in order to estimate this quantile; otherwise the *p* quantile falls below the precipitation threshold.
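The mapping from the desired unconditional probability *p* to the conditional quantile level of section 2b can be written as a small helper; the function name is illustrative.

```python
def conditional_quantile_level(p, pi_hat):
    """Quantile level to request from the amounts model (section 2b) so that
    the unconditional p quantile of precipitation is obtained.
    Returns None when pi_hat <= 1 - p, in which case the p quantile lies
    below the precipitation threshold."""
    if pi_hat <= 1.0 - p:
        return None
    return 1.0 - (1.0 - p) / pi_hat
```

For example, with *π̂* = 0.8 the unconditional 90th percentile corresponds to the conditional 87.5th percentile, since 1 − 0.1/0.8 = 0.875.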

## 3. Examples of daily precipitation forecasting

In this section the approach is applied to data at Brekke i Sogn (Norway), which is located about 70 km north of Bergen near the outlet of Sognefjorden. This site is strongly exposed to precipitation from the west and southwest and is one of the wettest measuring stations in Norway with approximately 3500 mm yr^{−1} on average.

The data consist of model output from the high-resolution model and the ensemble prediction system (EPS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) for the period November 2000 to June 2002 and corresponding daily observations. In all, there were 525 days, excluding days with missing data. For convenience the examples are restricted to making forecasts 66 h ahead only.

After a description of the procedures for selecting predictors and related issues, three examples of using the NWP data are described. In the first, output from the high-resolution model is applied while the other two demonstrate use of ensembles in different ways.

### a. Model selection procedures

Model selection mainly involves reducing the NWP model output to a few variables such that new forecasts are as good as possible. For the examples using local quantile regression, the smoothing parameter must also be determined. In order to quantify the quality of potential forecast models, measures of goodness and a framework for testing forecasts must be defined. Evaluation of forecasts is here carried out separately for each of the two steps described in section 2 by means of cross-validation. In cross-validation the data are divided in *K* parts (here *K* = 5), where *K* − 1 parts are used for training while the remaining part is used for testing. The part left out for testing is alternated such that in the end there is one forecast for each day.
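The fivefold cross-validation described above can be sketched as follows; the random shuffling and the function name are illustrative choices.

```python
import numpy as np

def kfold_indices(n, K=5, seed=1):
    """Split case indices 0..n-1 into K disjoint test folds.
    Each fold serves once as the test set while the remaining folds are used
    for training, so that in the end there is one out-of-sample forecast
    per case."""
    perm = np.random.default_rng(seed).permutation(n)
    return [np.sort(perm[k::K]) for k in range(K)]
```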

The first step involves selecting predictors for the probit regression. Good probabilistic forecasts for binary events should have forecasted probabilities close to 0 or 1 and at the same time be well calibrated or reliable in the sense that the proportion of observations for an event agrees with the forecasted probability. These properties are summarized by the Brier score (Brier 1950), but it is often more informative to look at these properties separately by a decomposition of the Brier score (Murphy 1973; Wilks 1995). In addition to Brier scores, subjective interpretation of reliability diagrams for the candidates with the lowest (best) Brier scores is used for choosing predictors.
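A sketch of the Brier score together with binned reliability and resolution components in the spirit of the Murphy (1973) decomposition; the binning into ten equal-width probability classes is an illustrative choice.

```python
import numpy as np

def brier_decomposition(f, o, bins=10):
    """Brier score plus binned reliability and resolution components.
    f: forecast probabilities in [0, 1]; o: binary outcomes (0/1)."""
    f, o = np.asarray(f, float), np.asarray(o, float)
    bs = np.mean((f - o) ** 2)                            # Brier score
    obar = o.mean()                                       # climatological frequency
    idx = np.minimum((f * bins).astype(int), bins - 1)    # bin index per forecast
    rel = res = 0.0
    for b in range(bins):
        m = idx == b
        if m.any():
            rel += m.sum() * (f[m].mean() - o[m].mean()) ** 2
            res += m.sum() * (o[m].mean() - obar) ** 2
    n = len(f)
    return bs, rel / n, res / n, obar * (1 - obar)        # last term: uncertainty
```

Small reliability and large resolution correspond to the desired properties of calibrated probabilities close to 0 or 1.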

For the quantile forecasts of the second step, reliability is checked with a chi-square goodness-of-fit test. Let *n*_{0}, *n*_{1}, … , *n*_{I} denote the number of observations in the intervals formed by *I* quantiles and *p*_{0}, *p*_{1}, … , *p*_{I} the corresponding theoretical probabilities. Forecasts produced by a candidate are then defined as not reliable if

$$\sum_{i=0}^{I} \frac{(n_i - n p_i)^2}{n p_i} > \chi^2_{1-\alpha,\, I},$$

where *n* is the total number of observations and *χ*^{2}_{1−α,I} the (1 − *α*) quantile of the *χ*^{2} distribution with *I* degrees of freedom. Informally, candidates are rejected if the fractions of observations between the estimated quantiles are not in accordance with the claimed probabilities. In the examples to follow, the reliability tests are based on the 5th, 25th, 50th, 75th, and 95th percentiles with significance level *α* = 0.05, which was found to be an appropriate level. A weakness of this test is that it does not detect whether observations above (or, equivalently, below) a given quantile are clustered in some way; for example, the observations above the forecasted 95th percentiles might all occur when the NWP model precipitation is large, and, clearly, this is not desirable. By splitting the NWP model precipitation into bins and calculating for each bin the fraction of observations below a given quantile, such features can be revealed. Ideally, all fractions should equal the quantile probability. In the examples, this check is only performed for a few quantiles of the most promising and reliable candidate models. For convenience, the check is carried out here only for NWP model precipitation, but it could be used for other predictors.
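The reliability test above amounts to a Pearson goodness-of-fit statistic; a sketch assuming SciPy for the chi-square quantile:

```python
import numpy as np
from scipy.stats import chi2

def reliability_test(counts, probs, alpha=0.05):
    """Reject reliability if sum (n_i - n p_i)^2 / (n p_i) exceeds the
    (1 - alpha) quantile of the chi-square distribution with I degrees
    of freedom.
    counts: observed numbers n_0..n_I in the I+1 intervals;
    probs : theoretical probabilities p_0..p_I (summing to one)."""
    counts, probs = np.asarray(counts, float), np.asarray(probs, float)
    n, I = counts.sum(), len(counts) - 1
    stat = np.sum((counts - n * probs) ** 2 / (n * probs))
    return stat, bool(stat > chi2.ppf(1 - alpha, I))
```

With the 5th, 25th, 50th, 75th, and 95th percentiles, the intervals have theoretical probabilities (0.05, 0.20, 0.25, 0.25, 0.20, 0.05) and *I* = 5.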

Sharpness is measured here by distances between quantiles, although this does not necessarily give credit for multimodal distributions. Multimodal distributions occur mainly for longer lead times than in our examples, and this somewhat justifies our use. In addition to the mean or median distances between pairs of quantiles, the variation in the length of these intervals and the ability to forecast extreme events are taken subjectively into account in the evaluations of potential forecast models.

### b. Statistical forecasts using output from the high-resolution model

Many parameters of the high-resolution model were available, and these were all interpolated to the site by means of a linear interpolation scheme. In addition, new variables that quantify the large-scale circulation pattern were constructed by using the method of Jones et al. (1993) for computing Lamb's circulation types. In their method the mean sea level pressure is used to compute wind flow (two components) and shear vorticity that are representative for a large area. A set of rules is then applied to determine the Lamb class for each case. Due to the small number of observations in this study, it was not feasible to use the 27 circulation classes directly. Instead, further use was based only on the three basic variables and derived variables of these. A list of all available variables from the high-resolution model is given in Table 1.

For the probability of precipitation, the model selection procedure resulted in a model with the NWP model precipitation (RR), the westerly flow component (*W*), the shear vorticity (*Z*), and RH_{925–500} as predictors. The estimated relationship is shown in Fig. 2 (upper row). Not surprisingly, the probability of precipitation increases with NWP precipitation (RR), but when RR is higher than about 1 mm the westerly flow component *W* has greater impact. It can also be noted that the probability is, as expected, higher for cyclonic weather types (*Z* > 0) than for anticyclonic (*Z* < 0). Further, the mean relative humidity is mainly important in anticyclonic situations.

For precipitation amounts on days with observed precipitation, RR was included in all candidate models without further testing. An initial search for variables was then carried out by applying cross-validation to models using RR in combination with each of the other variables as predictors. A short evaluation revealed that *W* was the most influential predictor next to RR. Various predictor combinations based on these and a few more were then more carefully tested in a new cross-validation with smoothing parameters *λ* ∈ {0.2, 0.3, … , 1}. In all, 17 predictor combinations were tested for each smoothing parameter, and of these, 92 (out of 153) gave well-calibrated forecasts using the chi-square test. Further evaluation was restricted to the models that produced reliable forecasts. First, the empirical distributions of the distances between the 95th and 5th percentiles and between the 75th and 25th percentiles were examined. Only the candidates with the shortest median distances, and to some extent also other quantiles of the distances, were further considered. Next, the empirical distributions of the predicted quantiles were studied, mainly to see how well the extreme cases were predicted. The model selection procedure resulted in using a model with RR, *W,* and southerly flow (*S*) as predictors and smoothing parameter *λ* = 0.7. The estimated relationship is shown in Fig. 2 (lower row) for some choices of *W* and *S.* As expected, the quantiles increase with the NWP model precipitation, and the predicted distributions are most skewed for small precipitation amounts in the NWP model.

### c. Statistical forecasts using all ensemble members separately

The EPS data were restricted to the 50 perturbed members of the parameter total precipitation. As above, the precipitation fields were interpolated to the site by a linear interpolation scheme; that is, no spatial information was made available. Since there was only one predictor, model selection for the probit regression was reduced to finding a suitable transformation or polynomial expansion of the precipitation (RR). A few candidates were tested using the procedure described in section 3a. The final choice was log(RR + 0.1) for probability of precipitation. As anticipated, the estimated relationship shown in Fig. 3 (top) demonstrates that the probability increases with increasing precipitation in the NWP model.

Due to the computational cost of dealing with all ensemble members, local weighting and constraints on the fits were not used to estimate the quantiles in the cross-validation. The latter did not have any impact anyway, since with only one predictor and a large dataset none of the quantiles crossed at any predictor value. A second-degree polynomial of the NWP precipitation (RR) gave the best verification scores for precipitation amounts given the occurrence of precipitation. The fit is shown in Fig. 3 (bottom) and reveals considerable uncertainty for moderate and high precipitation amounts in the NWP model.

In a forecasting situation the proposed methodology would produce 50 values of each quantile. As there is no reason to weigh the ensemble members differently, the average for each quantile would be used for forecasting.

### d. Statistical forecasts using statistics of the ensemble members

Using ensemble members separately and averaging the resulting estimated quantiles may not produce the best forecasts. An alternative approach is to use statistics of the ensemble members directly as predictors. Here, two types of predictors were constructed from each ensemble. The first was the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles (*Q*_{5}, *Q*_{10}, *Q*_{25}, *Q*_{50}, *Q*_{75}, *Q*_{90}, and *Q*_{95}), and the maximum (MAX) and minimum (MIN) NWP model precipitation. The second type were the probabilities of more than 0.1, 1, and 5 mm of precipitation.
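Constructing the two types of predictors from the precipitation values of one 50-member ensemble can be sketched as follows; the variable names are illustrative.

```python
import numpy as np

def ensemble_predictors(members):
    """Summary predictors from one EPS ensemble of precipitation values:
    selected percentiles, minimum/maximum, and threshold exceedance
    probabilities."""
    m = np.asarray(members, float)
    quantiles = np.percentile(m, [5, 10, 25, 50, 75, 90, 95])
    exceed = [(m > t).mean() for t in (0.1, 1.0, 5.0)]   # P(RR > 0.1, 1, 5 mm)
    return {"Q": quantiles, "MIN": m.min(), "MAX": m.max(), "P": exceed}
```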

For the probability of precipitation an initial search for good predictor combinations was carried out by a stepwise backward procedure starting with different sets of predictors. The most promising as well as some subjectively chosen candidates were then tested more extensively. In all, 17 candidates were considered in the cross-validation and evaluated by the verification scores.

The final choice was log(MIN + 0.1), log(*Q*_{50} + 0.1), and log(MAX + 0.1), but others could have been used. In Fig. 4 (top row) contour plots of the fitted probability of precipitation are shown for the medians (*Q*_{50}) 0.5, 1, and 2.5 mm. Although it is difficult to interpret these plots, one might notice that the probability increases, as expected, with increasing median (*Q*_{50}) and that the ensemble maximum has greater impact on the probability than the ensemble minimum, with the exception of situations where ensemble minima are close to zero.

For precipitation amounts given the occurrence of precipitation, 11 different predictor combinations, each with up to three predictors, were tested for smoothing parameters 0.2, 0.3, … , 1. From the 99 candidates, 77 produced reliable predictions when applying the chi-square test. The evaluation resulted in using *Q*_{25} and *Q*_{75} with smoothing parameter *λ* = 0.6. In Fig. 4 (bottom row) estimates of the 5th, 25th, 50th, 75th, and 95th percentiles are shown as a function of the 25th and 75th percentiles of the ensemble. Most notable is the increasing uncertainty with increasing 25th percentile of the ensemble. The nonsmoothness of the 95th percentiles indicates that the smoothing parameter possibly could have been higher.

### e. Evaluation and comparison of final forecasts

In the previous sections, cross-validation was used to determine three statistical forecasting models. To compare these with the raw EPS forecasts, one should ideally test the forecasting models on data not used for model selection. However, due to limited amounts of data, only a new cross-validation was performed. The evaluations were similar to the ones used for model selection with the exception of the quantiles that now were not conditioned on the occurrence of precipitation. The assessment was based on Figs. 5 and 6, which visually quantify reliability and sharpness of both the probability of precipitation forecasts and the quantile/interval forecasts. The 50% and 90% intervals are derived from the 75th and 25th percentiles and the 95th and 5th percentiles. Hereafter, the statistical models that use the high-resolution model, all ensemble members separately, and ensemble statistics will be referred to as Probit/LQR EC, Probit/QR EPS ALL, and Probit/LQR EPS STATS, respectively.

In Fig. 5 (top) the reliability diagrams show that the forecasted probabilities of precipitation from the EPS are generally too high and that the forecasts by Probit EPS ALL are somewhat uncalibrated for low forecasted probabilities of precipitation. The other two models have less systematic deviations from the ideal dashed line. With respect to sharpness or resolution these two are also clearly better than the Probit EPS ALL, as can be seen from the number of forecast probabilities close to 0 or 1.

The reliability of each forecasted percentile is quantified by the fraction of observations less than the percentile and its 95% confidence interval (e.g., Wilks 1995). In Fig. 5 (bottom) this is illustrated for the 5th, 25th, 50th, 75th, and 95th percentiles of each forecast model. Models whose confidence intervals do not cover the solid lines (the claimed probabilities) cannot be regarded as well calibrated. As expected, the raw EPS quantile forecasts are not well calibrated, but, more surprisingly, the QR EPS ALL to some extent has the same property. This can partly be explained by the systematic deviation of low forecasted probabilities of precipitation.
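The quoted confidence intervals for the observed fractions can be computed with the usual Gaussian approximation to the binomial proportion; a minimal sketch:

```python
from math import sqrt

def coverage_interval(k, n, z=1.96):
    """Approximate 95% confidence interval for the fraction of observations
    falling below a forecasted percentile (k successes out of n cases)."""
    phat = k / n
    half = z * sqrt(phat * (1.0 - phat) / n)
    return phat - half, phat + half
```

A percentile forecast is judged well calibrated when this interval covers the claimed probability.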

In Fig. 6 (left) the empirical distributions of the forecasted percentiles are shown. For reference, the five highest observations in the sample were 81, 75, 73, 56, and 54 mm. The raw EPS forecasts have less variation than the statistical models. It can also be noticed that in particular the 5th percentiles of QR EPS ALL are consistently very low. This is due to few probabilities of precipitation above 95%, which implies that the probabilities defining the conditional percentiles are close to 0 and, consequently, that the percentiles are low. The sharpness, as measured by the length of forecast intervals, is shown in Fig. 6 (right). EPS has considerably sharper forecasts than those produced by the statistical models, but as mentioned above, they are not very reliable. LQR EC has the sharpest forecasts of the statistical models, especially for the 90% intervals, where LQR EPS STATS produces wider intervals. It is, however, reasonable to expect that the statistical forecasts based on EPS would have improved if more variables from the EPS were available. These forecasts would also have been comparably better if longer lead times were considered.

## 4. General comments

Careful selection of predictors is crucial to obtain good forecasts but can quickly become time consuming if measures of goodness are not precisely defined. Verification scores for probabilistic forecasts of continuous variables have so far received little attention in the literature. Further research on this topic would be useful, in particular on measures of sharpness and how these influence or penalize variations in the quantiles.

In the examples, the same predictors are used for all quantiles. To further increase flexibility and possibly improve forecasts, one might allow the quantiles to depend on different predictors. This might especially be useful when the information is an ensemble of NWP model forecasts. The amount of information can then effectively be reduced by using the ensemble member corresponding to the desired quantile. A minor disadvantage is that the probability of precipitation must be known to decide which conditional quantile to estimate and, hence, which ensemble member to use.

In this study the local quantile regression uses the same smoothing parameter for all quantiles. Since more data are needed to estimate extreme quantiles well, it might be beneficial to use higher smoothing parameters for these than, for example, for the median. A rule of thumb is given in Yu and Jones (1998). In the same article, another method for estimating conditional quantiles, referred to as the double kernel approach, is also presented; their simulations show that it generally gives smoother quantiles and better performance. Besides, it does not require constraints to avoid crossing quantiles.

## 5. Conclusions

In this paper it is demonstrated how reliable probabilistic precipitation forecasts in terms of quantiles can be made by means of quantile regression. The approach requires no strong distributional assumptions, and the inclusion of information from NWP models is very flexible. It is also straightforward to apply the approach to other variables such as temperature and wind speed.

## REFERENCES

Applequist, S., G. E. Gahrs, R. L. Pfeffer, and X. F. Niu, 2002: Comparison of methodologies for probabilistic quantitative precipitation forecasting. *Wea. Forecasting,* **17,** 783–799.

Atger, F., 1999: The skill of ensemble prediction systems. *Mon. Wea. Rev.,* **127,** 1941–1953.

Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. *Mon. Wea. Rev.,* **78,** 1–3.

Dobson, A. J., 1990: *An Introduction to Generalized Linear Models.* Chapman & Hall, 174 pp.

Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta-RSM ensemble probabilistic precipitation forecasts. *Mon. Wea. Rev.,* **126,** 711–724.

Jones, P. D., M. Hulme, and K. R. Briffa, 1993: A comparison of Lamb circulation types with an objective classification scheme. *Int. J. Climatol.,* **13,** 655–663.

Kelly, K. S., and R. Krzysztofowicz, 2000: Precipitation uncertainty processor for probabilistic river stage forecasting. *Water Resour. Res.,* **36,** 2643–2653.

Koenker, R., and G. Bassett, 1978: Regression quantiles. *Econometrica,* **46,** 33–49.

Koenker, R., and V. D'Orey, 1987: Computing regression quantiles. *J. Roy. Stat. Soc.,* **36C,** 383–393.

Koenker, R., and V. D'Orey, 1993: Computing dual regression quantiles and regression rank scores. *J. Roy. Stat. Soc.,* **43C,** 410–414.

Krzysztofowicz, R., W. J. Drzal, T. R. Drake, J. C. Weyman, and L. A. Giordano, 1993: Probabilistic quantitative precipitation forecasts for river basins. *Wea. Forecasting,* **8,** 424–439.

McCullagh, P., and J. A. Nelder, 1989: *Generalized Linear Models.* Chapman & Hall, 511 pp.

Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. *Quart. J. Roy. Meteor. Soc.,* **122,** 73–119.

Murphy, A. H., 1973: A new vector partition of the probability score. *J. Appl. Meteor.,* **12,** 595–600.

Portnoy, S., and R. Koenker, 1997: The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. *Stat. Sci.,* **12,** 279–300.

Stern, R., and R. Coe, 1984: A model fitting analysis of daily rainfall data (with discussion). *J. Roy. Stat. Soc.,* **147A,** 1–34.

Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. *Mon. Wea. Rev.,* **125,** 3297–3319.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences.* Academic Press, 467 pp.

Wilks, D. S., 2002: Smoothing forecast ensembles with fitted probability distributions. *Quart. J. Roy. Meteor. Soc.,* **128,** 2821–2836.

Yu, K., and M. C. Jones, 1998: Local linear quantile regression. *J. Amer. Stat. Assoc.,* **93,** 228–237.

Table 1. List of predictors from the high-resolution model.