Generating and Calibrating Probabilistic Quantitative Precipitation Forecasts from the High-Resolution NWP Model COSMO-DE

Sabrina Bentzien, Meteorological Institute, University of Bonn, Bonn, Germany

and
Petra Friederichs, Meteorological Institute, University of Bonn, Bonn, Germany


Abstract

Statistical postprocessing is an integral part of an ensemble prediction system. This study compares methods used to derive probabilistic quantitative precipitation forecasts based on the high-resolution version of the German-focused Consortium for Small-Scale Modeling (COSMO-DE) time-lagged ensemble (COSMO-DE-TLE). The investigation covers the period from July 2008 to June 2011 for a region over northern Germany with rain gauge measurements from 445 stations. The investigated methods provide pointwise estimates of the predictive distribution using logistic and quantile regression, and full predictive distributions using parametric mixture models. All mixture models use a point mass at zero to represent the probability of precipitation. The amount of precipitation is modeled by either a gamma, lognormal, or inverse Gaussian distribution. Furthermore, an adaptive tail using a generalized Pareto distribution (GPD) accounts for a better representation of extreme precipitation. The predictive probabilities, quantiles, and distributions are evaluated using the Brier, the quantile verification, and the continuous ranked probability scores. Baseline predictions and covariates are based on first-guess estimates from the COSMO-DE-TLE. Predictive performance is largely improved by statistical postprocessing due to an increase in reliability and resolution. The mixture models show some deficiencies. The inverse Gaussian fails to provide calibrated predictive distributions, whereas the lognormal and gamma mixtures perform well within the bulk of the distribution. Both mixtures provide significantly less skill for the extremal quantiles (0.99–0.999). Their representation is largely improved by incorporating an adaptive GPD tail. Even more stable estimates are obtained if the annual cycle is included in the postprocessing and training is performed on almost 3 yr of data.

Corresponding author address: Sabrina Bentzien, Meteorological Institute, University of Bonn, Auf dem Hügel 20, 53121 Bonn, Germany. E-mail: bentzien@uni-bonn.de

1. Introduction

Precipitation affects not only daily life but also agriculture, hydrology, civil protection, and many other fields of human and economic activity. However, quantitative precipitation forecasting (QPF) still represents a major challenge in numerical weather prediction (NWP), and the skill of QPF is low compared to that for other meteorological variables (Hense and Wulfmeyer 2008; Ebert et al. 2003). The reason is that precipitation results from complex dynamical and microphysical processes that operate on a large range of scales and show high temporal and spatial variability.

High-resolution limited-area mesoscale NWP (HR-NWP) models have been developed to predict weather that has the potential for hazardous impacts, denoted as high-impact weather. High-impact weather in western Europe is related to strong mean winds, severe gusts, and heavy precipitation (Craig et al. 2010), and particularly during summer, these weather situations are often related to deep moist convection. Because of their high resolution of 10 km or less and their nonhydrostatic dynamics, HR-NWP models are able to describe mesoscale processes in an explicit and more detailed way. Today, many meteorological services use HR-NWP models for operational forecasts (Skamarock and Klemp 2008; Saito et al. 2006; Staniforth and Wood 2008; Baldauf et al. 2011; Seity et al. 2011). Although HR-NWP model forecasts lead to more realistic mesoscale structures (Mass et al. 2002), QPF is still affected by large errors, namely systematic biases, displacement errors, and fast error growth.

Because of the large uncertainties, probabilistic prediction is likely to be the best choice for precipitation forecasting. In our study, a multianalysis ensemble is generated from an operational HR-NWP system [notably the German-focused Consortium for Small-Scale Modeling (COSMO-DE) operated by the German Meteorological Service (DWD)] by the technique of lagged averaging forecasts (Hoffman and Kalnay 1983). Operational forecasts are generally started several times within one prediction period from updated initial states. Consequently, an operational HR-NWP system provides more than one forecast for periods shorter than the prediction period. These can be considered as a time-lagged ensemble (TLE). The TLE accounts for uncertainties in the initial and boundary conditions, since every member is initialized with new observational data from the data assimilation cycle and forced with updated lateral boundary conditions.

A TLE is a very pragmatic way to generate an ensemble, since it comes at no additional cost. Forecasts for short-range periods of 1–2 days may indeed be improved by a TLE (Lu et al. 2007), and several studies show the benefit for QPF (Walser et al. 2004; Mittermaier 2007; Yuan et al. 2009). However, a TLE accounts only insufficiently for the different sources of forecast errors (e.g., it ignores model errors). Furthermore, the forecast errors due to the initial and boundary perturbations at different time lags are not independent, since each initial state is obtained from the previous forecast cycle modified by observations. So the perturbations account only for a restricted range of the analysis error statistics, and they are not optimized in terms of error growth, like those derived by, e.g., the singular vector method (Buizza and Palmer 1995) or breeding techniques (Toth and Kalnay 1993).

Standard methods for ensemble generation, such as ensemble Kalman filters (Evensen 1994; Houtekamer and Mitchell 1998), are less developed for HR-NWP models. One aspect of development involves the joint specification of a suitable ensemble of initial and lateral boundary conditions (Torn et al. 2006). Another problem with respect to HR-NWP is the distinct nonlinear character of the relevant microphysical processes (Snyder and Zhang 2003). Peralta et al. (2012) give a description of recent developments in ensemble generation for HR-NWP. They describe a mesoscale ensemble prediction system (EPS) based on COSMO-DE that is being developed at DWD (Gebhardt et al. 2011). COSMO-DE-EPS is a multianalysis and multiphysics ensemble and has been in a preoperational phase since December 2010. By now, there is roughly one year of data available from COSMO-DE-EPS. Therefore, this study concentrates on the TLE. Since a TLE comes at low cost, it defines a suitable baseline in order to assess the benefits of a mesoscale EPS such as COSMO-DE-EPS. The gain in predictive performance of COSMO-DE-EPS with respect to the TLE will be assessed in a forthcoming study.

Any EPS will be unable to completely incorporate all sources of uncertainty. They generally are underdispersive and exhibit systematic biases. Hence, postprocessing must be regarded as an integral part of an EPS in order to obtain reliable forecasts (Hamill and Colucci 1997). Pioneering work on probabilistic mesoscale ensemble prediction has been done by the University of Washington Probcast group (Mass et al. 2009). Their web-based portal provides uncertainty information and probabilistic weather forecasts. In addition to the construction of a mesoscale EPS and the development of a postprocessing scheme that produces reliable and sharp predictive distributions, the web site was constructed in cooperation with psychologists and human interface specialists to help disseminate information in a way that allows users to make better decisions.

An overview of state-of-the-art ensemble postprocessing techniques is given in Wilks (2006a). Rank histogram recalibration (Hamill and Colucci 1997; Hamill et al. 2008; Eckel and Walters 1998) provides calibrated ensemble forecasts. Regression techniques such as logistic regression (LR; Hamill et al. 2004) or quantile regression (QR; Bremnes 2004) directly estimate conditional probabilities or quantiles of a variable of interest. However, both techniques are limited to values that are sufficiently sampled and do not allow extrapolation to extreme or rare events. Moreover, logistic and quantile regression are generally applied for each quantile or probability separately, which leads to a large number of parameters to be estimated. Extended logistic regression (Wilks 2009) and simultaneous quantile regression (Tokdar and Kadane 2012) aim to reduce the number of parameters and to avoid crossing quantile and probability curves, but come with some restrictions on the regression coefficients.

An alternative approach is the estimation of a parametric distribution that requires only a few distribution parameters, for example, nonhomogeneous Gaussian regression (Gneiting et al. 2005). A parametric distribution function allows us to calculate probabilities and quantiles directly from the distribution parameters. This approach requires an a priori assumption on the type of distribution, and the performance strongly depends on how well the assumed distribution fits the data. Other statistical postprocessing techniques are kernel dressing, affine kernel dressing (Bröcker and Smith 2008), and Bayesian model averaging (BMA; Raftery et al. 2005). Sloughter et al. (2007) used BMA for precipitation forecasts, assuming a mixture of a point mass at zero and a gamma distribution. Gamma distributions were also fitted to rainfall amounts by Hamill et al. (2008).

This study investigates several approaches to generate calibrated probabilistic quantitative precipitation forecasts (PQPFs) from an ensemble of high-resolution forecasts. We first investigate logistic and quantile regression. They define baseline approaches that are investigated with respect to training and verification issues, as well as to the spatial and temporal characteristics. Once a suitable baseline is defined, we propose parametric mixture models and discuss their benefits and deficiencies. The mixtures are constructed using a point mass at zero to represent the probability of precipitation and a distribution from the exponential family for the amount of precipitation. Both parts are estimated within the framework of generalized linear models. Furthermore, a mixture model with an adaptive tail distribution based on extreme value theory is used to provide calibrated forecasts for extreme precipitation.

This article is organized as follows. In section 2 we introduce the time-lagged COSMO-DE ensemble and the observational data. Section 3 describes the postprocessing approaches and gives an overview of the probabilistic scores that are used for evaluation. The results are presented and discussed in sections 4 and 5. We conclude the study in section 6.

2. Data

a. The COSMO-DE model

This study is based on the HR-NWP model COSMO-DE (Baldauf et al. 2011), which is operated by DWD. For this study, operational forecasts from July 2008 to June 2011 are used. The model domain of COSMO-DE covers the area of Germany, most of the Alps region, and smaller parts of neighboring countries with a horizontal grid spacing of 2.8 km and 421 × 461 grid boxes. An important process for heavy precipitation on the mesoscale is deep convection. In COSMO-DE it is explicitly resolved by the nonhydrostatic model dynamics, whereas shallow convection remains parameterized. Precipitation sources in the forms of rain, snow, and graupel are modeled within a three-category ice scheme (Reinhardt and Seifert 2006). The data assimilation uses a nudging approach, which includes the assimilation of radar-derived rain rates through latent heat nudging (LHN). This allows for an initialization of convective events at the beginning of the simulation and significantly improves forecasts during the first forecast hours (Stephan et al. 2008). Furthermore, it produces horizontal wind fields that represent a realistic energy spectrum at initialization (Bierdel et al. 2012). The limited-area model COSMO-DE is nested into the coarser grid model COSMO-EU (7 km), which provides hourly boundary conditions for COSMO-DE. COSMO-EU is a regional model for central Europe. It is in turn nested into the global model GME (30 km), and is also updated hourly. Several changes are made concerning the operational setup of the COSMO-DE model for the time period of investigation. The changes include a successive spatial extension and improved brightband detection of the assimilated radar composite, as well as a bias correction for radiosonde measurements. Further changes concern the physical parameterizations of turbulence, advection, and diffusion.

b. The time-lagged ensemble

Hoffman and Kalnay (1983) describe how a TLE may be generated from an operational NWP system. A TLE consists of successively started operational forecasts that cover a common verification period. In our case, the TLE consists of COSMO-DE operational forecasts that are initialized every 3 h and predict a period of 21 h ahead. In this configuration, a prediction period of 12 h is covered by four successively started forecasts.

We further extend the TLE by using the neighborhood method (Theis et al. 2005; Schwartz et al. 2010). To that end, the forecasts within a spatial neighborhood of 5 × 5 grid boxes are used as additional ensemble members. This leads to a COSMO-DE-TLE of 4 × 25 = 100 members. Alternatives to this neighborhood definition [e.g., within a statistical or weather space (cf. Hamill et al. 2008)], as well as investigations with respect to the size of the neighborhood, are not part of this study. This would go beyond the scope of this article and will be the subject of future studies.
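The construction of the 100-member ensemble can be illustrated with a minimal sketch. It assumes, purely for illustration, that the four lagged forecasts are available as nested lists `fields[lag][row][col]` and that the target grid point lies far enough from the domain edge for the full 5 × 5 neighborhood to exist; the function name and data layout are hypothetical and not taken from the operational system.

```python
def tle_members(fields, i, j, half_width=2):
    """Collect time-lagged neighborhood ensemble members at grid point (i, j).

    fields: list of 2D grids (lists of rows), one per lagged forecast run,
            all valid for the same accumulation period (hypothetical layout).
    half_width=2 corresponds to the 5 x 5 neighborhood described in the text.
    The point (i, j) is assumed to be an interior point of the grid.
    """
    members = []
    for grid in fields:
        for di in range(-half_width, half_width + 1):
            for dj in range(-half_width, half_width + 1):
                members.append(grid[i + di][j + dj])
    return members


# Four lagged runs on a toy 9 x 9 grid; the values are synthetic.
fields = [[[float(lag + r + c) for c in range(9)] for r in range(9)]
          for lag in range(4)]
members = tle_members(fields, i=4, j=4)
n_members = len(members)  # 4 lags x 25 neighbors
```

Every lagged run contributes its 25 neighboring grid boxes, so the member count is the product of the number of lags and the neighborhood size.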

We have decided to restrict the analysis to a subdomain of the original COSMO-DE model domain of 160 × 160 grid boxes that covers large parts of northwestern Germany (Fig. 1). This is done for two reasons. First, the amount of data is considerable, and a restriction of the domain largely reduces the computational time. Second, the postprocessing assumes spatial homogeneity. So the subdomain is selected to be not too inhomogeneous and mountainous.

Fig. 1.

Mean 12-h precipitation accumulations in mm (12 h)⁻¹ for (a) station measurements (linearly interpolated) and (b) COSMO-DE forecasts for the period 1 Jul 2008–30 Jun 2011. The stars in (a) represent the station locations.

Citation: Weather and Forecasting 27, 4; 10.1175/WAF-D-11-00101.1

c. Observational data

Rain gauge observations with hourly temporal resolution are taken from the observational network of DWD. Altogether, we use 445 observational sites, which are located in the subdomain of the model domain. Figure 1 shows mean 1200–0000 UTC precipitation accumulations from the rain gauges and from the COSMO-DE-TLE. The gauge measurements are linearly interpolated only for enhanced visibility in Fig. 1a. Both the COSMO-DE-TLE and the station measurements show enhanced precipitation over the mountain ranges in the southern parts of the study area. The highest precipitation rates occur over Harz (51.8°N, 10.6°E) and Thüringer Wald (50.7°N, 10.8°E), followed by Vogelsberg (50.5°N, 9.2°E) and Sauerland/Rothaargebirge (51°N, 8°E). Lower precipitation rates are observed and modeled over the northern parts with mostly flat topography.

3. Methods

To obtain PQPFs from an ensemble, we mainly consider two types of ensemble postprocessing: logistic regression and quantile regression estimate single threshold exceedances and quantiles, while linear mixture models provide a complete parametric distribution function. LR and the mixture models are special cases of generalized linear models (GLMs; McCullagh and Nelder 1998; Fahrmeir and Tutz 1994) which are briefly described in the following. An adaptive tail can be included in the mixture models by the use of the generalized Pareto distribution (GPD). The nonstationary GPD uses a linear regression ansatz for the scale parameter. QR can also be formulated in terms of a linear regression model using the asymmetric Laplace distribution (Yu and Moyeed 2001).

a. Generalized linear models

GLMs extend classical linear regression models on the basis of two assumptions. The distributional assumption states that the response variable Y is conditionally independent given the covariates X and that its conditional distribution belongs to the exponential family. The structural assumption implies a relation between the expectation value μ and the linear predictor η = Zᵀβ of the form
$$\mu = h(\eta) = h(\mathbf{Z}^{T}\boldsymbol{\beta}).$$
Here, h(·) is called the response function, its inverse is the link function, η = h⁻¹(μ), and the design vector Z is a function of the covariates, Z = Z(X). A GLM is fully described by the type of the exponential family, the response or link function, and the design vector.

b. Logistic regression

LR is used to estimate the probability of precipitation (PoP) and the probability of the exceedance of a threshold u (PoTu). Both determine the Bernoulli outcome of a variable K, which is 1 if precipitation Y exceeds u and 0 otherwise. The conditional distribution of K follows a Bernoulli distribution with E[K | X = x] = Pr(Y > u | X = x) = π and Var[K | X = x] = π(1 − π). The conditional probability π is derived as a function of the linear predictor π = h(η), where h(·) is the inverse logit function:
$$\pi = h(\eta) = \frac{\exp(\eta)}{1 + \exp(\eta)}.$$
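As a minimal sketch of this response function, the snippet below evaluates the inverse logit for a linear predictor built from a design vector. The coefficient values and the two-element design vector (intercept plus one predictor) are hypothetical and chosen only for illustration; they are not fitted values from this study.

```python
import math


def inverse_logit(eta):
    """Response function h(eta) = exp(eta) / (1 + exp(eta)), numerically stable."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def predict_probability(beta, z):
    """Exceedance probability pi = h(z^T beta) for a design vector z."""
    eta = sum(b * zk for b, zk in zip(beta, z))
    return inverse_logit(eta)


# Hypothetical coefficients: intercept and slope for a transformed ensemble mean.
beta = [-1.0, 2.0]
p_dry = predict_probability(beta, [1.0, 0.0])  # predictor equal to zero
p_wet = predict_probability(beta, [1.0, 1.5])  # larger first-guess mean
```

With a positive slope, a larger predictor value gives a larger predicted probability, and the output always stays strictly between 0 and 1.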

c. The mixture models

The cumulative distribution function (CDF) of precipitation FY(y | X = x) is described by a combination of two components. The first component represents the PoP given by π0 = Pr(Y > 0 | X = x), which is estimated via LR. The second component accounts for the amount of precipitation. Given that precipitation is above zero, it is assumed to follow the parametric distribution F*. Hereby, the asterisk subscript denotes any suitable distribution that belongs to the exponential family. This yields a predictive CDF of the form
$$F_Y(y \mid X = x) = (1 - \pi_0) + \pi_0\, F_{*}(y \mid X = x), \qquad y \ge 0. \tag{1}$$

From the CDF in (1), one can calculate PoTu for all thresholds u as 1 − FY(u) and the conditional quantiles as $q_\tau = F_Y^{-1}(\tau \mid X = x)$.

The amount of precipitation should be represented by a distribution defined on the positive real axis $\mathbb{R}^{+}$. In this study we use the lognormal, inverse Gaussian, and gamma distributions. The expectation μ* = h(η) is a function of the linear predictor. The variance is estimated by the second-moment estimator as a function of the expectation. The parameters of the conditional distribution f*(·) are then derived by the method of moments.
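The mixture of a point mass at zero and a positive distribution can be sketched as follows, here with a lognormal component whose parameters are obtained from a given mean and variance by the method of moments. The numerical values for the PoP, the mean, and the variance are illustrative assumptions, not estimates from the study, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_params(mean, var):
    """Method-of-moments parameters (mu, sigma) of a lognormal distribution."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def mixture_cdf(y, pop, mu, sigma):
    """F_Y(y) = (1 - pop) + pop * F*(y) for y >= 0, and 0 for y < 0."""
    if y < 0:
        return 0.0
    if y == 0:
        return 1.0 - pop
    return (1.0 - pop) + pop * NormalDist(mu, sigma).cdf(math.log(y))


def mixture_quantile(tau, pop, mu, sigma):
    """Invert the mixture CDF; quantiles inside the point mass are zero."""
    if tau <= 1.0 - pop:
        return 0.0
    p_cond = (tau - (1.0 - pop)) / pop  # level within the wet distribution
    return math.exp(NormalDist(mu, sigma).inv_cdf(p_cond))


pop = 0.4                                        # illustrative PoP
mu, sigma = lognormal_params(mean=3.0, var=9.0)  # illustrative moments in mm
pot5 = 1.0 - mixture_cdf(5.0, pop, mu, sigma)    # probability of exceeding 5 mm
q90 = mixture_quantile(0.9, pop, mu, sigma)      # 0.9 quantile in mm
```

Any quantile level at or below 1 − PoP falls into the point mass and is therefore zero, which is why the median in this example is 0 mm although the 0.9 quantile is positive.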

d. A mixture model with GPD tail

The parametric mixture might well represent the bulk of the distribution but does not necessarily capture the right tail behavior. All proposed distributions in section 3c exhibit an exponential tail pattern of behavior, although several studies show evidence for a heavy tail pattern with regard to precipitation (Friederichs 2010). A misrepresentation of the tail behavior might lead to large prediction errors for extreme precipitation events. Extreme value theory provides an asymptotic theory for the tail of a distribution and, in particular, is developed to make predictions beyond the range of the data. Hence, a very natural extension of the mixture model is to represent the tail of the conditional distribution using a GPD (Coles 2001).

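The GPD is used to model excesses, Z = Y − uτ, over large thresholds, uτ. Extreme value theory proves that under very general conditions Z asymptotically follows a GPD for large u:
$$\Pr(Z \le z \mid Y > u_\tau) \approx 1 - \left(1 + \xi\, \frac{z}{\sigma_u}\right)^{-1/\xi}, \qquad z > 0.$$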
Here, σu denotes the scale parameter and ξ the shape parameter.
We use a relatively simple formulation of a mixture with variable tail behavior, where one of the mixtures defined in section 3c models the range below the conditional τu quantile and a GPD models the excesses above. The probability τu is set to 0.95 in our application. The GPD is additionally conditioned on the covariates by assuming a linear model for the scale parameter with
$$\sigma_u = \mathbf{Z}^{T}\boldsymbol{\beta}_{\sigma}.$$
The shape parameter ξ = ξ0 is kept constant. Note that this approach leads to a probability density function (PDF), which will be discontinuous in y = uτ. The CDF might not be smooth in τu, but is constructed to be strictly monotonically increasing and therefore valid. There are potentially more sophisticated approaches to constructing the mixture via a weight function as proposed by Frigessi et al. (2003) and used in Vrac and Naveau (2007). However, this would go beyond the scope of this article and will be investigated in a separate study.
The complete mixture with GPD in terms of its CDF reads as
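$$F_Y(y \mid X = x) = \begin{cases} (1 - \pi_0) + \pi_0\, F_{*}(y \mid X = x), & 0 \le y \le u_\tau, \\ \tau_u + (1 - \tau_u)\left[1 - \left(1 + \xi\, \dfrac{y - u_\tau}{\sigma_u}\right)^{-1/\xi}\right], & y > u_\tau. \end{cases} \tag{2}$$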
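A minimal sketch of such a spliced distribution is given below. For brevity the bulk is represented by a simple exponential CDF as a stand-in for the mixtures of section 3c, and the scale and shape values of the GPD are hypothetical; only the splicing at the τu = 0.95 quantile follows the description in the text.

```python
import math


def gpd_cdf(z, sigma, xi):
    """GPD distribution function for an excess z (scale sigma > 0, shape xi)."""
    if z <= 0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-z / sigma)  # exponential limit for xi -> 0
    base = 1.0 + xi * z / sigma
    if base <= 0:
        return 1.0                          # beyond the upper end point (xi < 0)
    return 1.0 - base ** (-1.0 / xi)


def spliced_cdf(y, bulk_cdf, u, tau_u, sigma, xi):
    """Bulk CDF up to the threshold u (its tau_u quantile), GPD tail above."""
    if y <= u:
        return bulk_cdf(y)
    return tau_u + (1.0 - tau_u) * gpd_cdf(y - u, sigma, xi)


def spliced_quantile(tau, u, tau_u, sigma, xi):
    """Quantile for tau_u < tau < 1 by inverting the GPD tail (xi != 0)."""
    p = (tau - tau_u) / (1.0 - tau_u)
    return u + sigma / xi * ((1.0 - p) ** (-xi) - 1.0)


def bulk_cdf(y):
    """Stand-in bulk distribution (exponential with mean 4 mm), not the paper's mixture."""
    return 1.0 - math.exp(-y / 4.0) if y > 0 else 0.0


tau_u = 0.95
u = -4.0 * math.log(1.0 - tau_u)   # 0.95 quantile of the stand-in bulk
sigma, xi = 3.0, 0.1               # hypothetical GPD scale and shape
q99 = spliced_quantile(0.99, u, tau_u, sigma, xi)
```

Because the threshold is the τu quantile of the bulk, the CDF is continuous at the threshold even though the density generally is not.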

e. Censored quantile regression

A detailed description of QR is given in Koenker (2005). For an a priori defined probability τ, the quantile function of a response variable Y is modeled as a linear function of Z(X):
$$Q_Y(\tau \mid X) = \mathbf{Z}^{T}\boldsymbol{\beta}_\tau,$$
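where Z = Z(X) is again a design vector. The regression coefficients βτ are estimated so that they minimize a function of weighted absolute errors from i = 1, … , n pairs of observations yi and quantile forecasts $\mathbf{Z}_i^{T}\boldsymbol{\beta}_\tau$:
$$\hat{\boldsymbol{\beta}}_\tau = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - \mathbf{Z}_i^{T}\boldsymbol{\beta}\right).$$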
Hereby, ρτ denotes the so-called check function with ρτ(υ) = τυ if υ ≥ 0, and ρτ(υ) = (τ − 1)υ otherwise.
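The role of the check function can be illustrated with a small sketch: for a constant forecast, the value that minimizes the mean check loss is a sample τ quantile. The observations below are synthetic, and the grid search is only a stand-in for the linear-programming solvers normally used for quantile regression.

```python
def check_loss(v, tau):
    """Check function rho_tau(v): tau * v for v >= 0, (tau - 1) * v otherwise."""
    return tau * v if v >= 0 else (tau - 1.0) * v


def mean_check_loss(q, ys, tau):
    """Mean check loss of a constant quantile forecast q over observations ys."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)


# Synthetic 12-h precipitation sums in mm.
ys = [0.0, 0.0, 0.2, 0.5, 1.1, 1.8, 2.4, 3.9, 6.3, 12.0]
candidates = [k * 0.1 for k in range(0, 131)]
best_q = min(candidates, key=lambda q: mean_check_loss(q, ys, tau=0.75))
```

For τ = 0.75 and ten observations, the minimizer is the eighth-smallest value, so underestimates are penalized three times as heavily as overestimates.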
An important property of the quantile function is the equivariance to monotone transformations (Koenker 2005, p. 39). This equivariance property allows the formulation of a censored QR model for precipitation. Precipitation is bounded by zero and might be represented by a censored process, r(Y*) = max(0, Y*) (Friederichs and Hense 2007). The censored QR is then formulated as
$$Q_Y(\tau \mid X) = \max\left(0, \mathbf{Z}^{T}\boldsymbol{\beta}_\tau\right),$$
where the estimates of the coefficients βτ minimize
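$$\sum_{i=1}^{n} \rho_\tau\!\left(y_i - \max\left(0, \mathbf{Z}_i^{T}\boldsymbol{\beta}_\tau\right)\right). \tag{3}$$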
The optimization of (3) is not straightforward. Here, we follow a three-step procedure proposed by Chernozhukov and Hong (2002) and described in detail in Friederichs and Hense (2007).
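The censored objective itself is simple to evaluate, even though its minimization is not. The sketch below only computes the objective for two hypothetical coefficient vectors on synthetic data; it does not implement the three-step estimation procedure, and all names and numbers are illustrative assumptions.

```python
def check_loss(v, tau):
    """Check function rho_tau(v)."""
    return tau * v if v >= 0 else (tau - 1.0) * v


def censored_quantile(beta, z):
    """Censored linear quantile forecast max(0, z^T beta)."""
    return max(0.0, sum(b * zk for b, zk in zip(beta, z)))


def censored_objective(beta, design, ys, tau):
    """Sum of check losses between observations and censored quantile forecasts."""
    return sum(check_loss(y - censored_quantile(beta, z), tau)
               for z, y in zip(design, ys))


# Synthetic design vectors (intercept, predictor) and observations in mm.
design = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 4.0]]
ys = [0.0, 0.0, 1.0, 3.0]
loss_a = censored_objective([-1.0, 1.0], design, ys, tau=0.9)
loss_b = censored_objective([0.0, 1.0], design, ys, tau=0.9)
```

The first coefficient vector gives a negative linear predictor for the first case, which the censoring maps to zero, so the forecast can never fall below the physical lower bound of precipitation.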

f. Verification

Predictive performance of PoP, PoTu, and the τ quantiles as derived from LR and QR or the mixture models is assessed in terms of the Brier score (BS; Brier 1950) and the quantile verification score (QVS; Gneiting and Raftery 2007; Friederichs and Hense 2007).

The BS represents the mean squared error over a verification sample of size n between the probability forecasts 0 ≤ πi ≤ 1 and the binary observations oi, which are 1 when u is exceeded and 0 otherwise (Wilks 2006b):
$$\mathrm{BS} = \frac{1}{n}\sum_{i=1}^{n} \left(\pi_i - o_i\right)^2 .$$
The BS can be decomposed into a reliability, resolution, and uncertainty component (Murphy 1973). Note that this decomposition is only meaningful if all samples are drawn from a distribution with the same underlying climatological probability (Hamill and Juras 2006).
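The decomposition can be checked numerically with a small sketch. It assumes that the forecasts take only a few discrete probability values, in which case the identity BS = reliability − resolution + uncertainty holds exactly; the forecast and observation values are synthetic.

```python
from collections import defaultdict


def brier_score(probs, obs):
    """Mean squared difference between probability forecasts and binary outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(obs)


def murphy_decomposition(probs, obs):
    """Reliability, resolution, and uncertainty for discrete forecast values."""
    n = len(obs)
    obar = sum(obs) / n                      # climatological base rate
    groups = defaultdict(list)
    for p, o in zip(probs, obs):
        groups[p].append(o)                  # outcomes per forecast value
    rel = sum(len(g) * (p - sum(g) / len(g)) ** 2 for p, g in groups.items()) / n
    res = sum(len(g) * (sum(g) / len(g) - obar) ** 2 for g in groups.values()) / n
    unc = obar * (1.0 - obar)
    return rel, res, unc


probs = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
obs = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
bs = brier_score(probs, obs)
rel, res, unc = murphy_decomposition(probs, obs)
```

A smaller reliability term and a larger resolution term both lower the BS, which is the sense in which postprocessing can improve the score through either component.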
The QVS is defined as the mean of weighted absolute errors between the observation y and the quantile forecast qτ, which are either derived from QR as $Q_Y(\tau \mid X)$ or from one of the mixture models by inverting (1) or (2):
$$\mathrm{QVS} = \frac{1}{n}\sum_{i=1}^{n} \rho_\tau\!\left(y_i - q_{\tau,i}\right).$$
Both scores, BS and QVS, are negatively oriented where zero denotes a perfect forecast and higher scores indicate weaker performance of a forecast. BS and QVS are displayed in terms of skill scores, which describe the relative improvement of a forecast with respect to a reference forecast. The Brier skill score (BSS) is defined as
$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}},$$
and the quantile verification skill score (QVSS) is defined analogously. In this study we choose as a reference the unconditional distribution, often denoted as climatology. The full predictive distributions derived from the mixture models are assessed using the continuous ranked probability score (CRPS; Matheson and Winkler 1976; Hersbach 2000; Gneiting and Raftery 2007). The CRPS is defined as
$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \left[F(t) - H(t - y)\right]^2 \, dt,$$
where H(·) denotes the Heaviside function. The CRPS is also a negatively oriented score where smaller values correspond to better performance.
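The skill score and the CRPS integral can be sketched numerically as follows. The midpoint-rule integration and the uniform test distribution are illustrative choices made here, because for a uniform distribution on [0, 1] the CRPS has the simple closed form (y³ + (1 − y)³)/3 against which the approximation can be checked; neither is part of the study's procedure.

```python
def skill_score(score, score_ref):
    """Skill score 1 - S / S_ref for a negatively oriented score S."""
    return 1.0 - score / score_ref


def crps_numeric(cdf, y, lower, upper, n=20000):
    """Midpoint-rule approximation of the integral of (F(t) - H(t - y))^2 dt.

    The interval [lower, upper] is assumed to contain y and essentially all
    of the probability mass of the predictive distribution.
    """
    h = (upper - lower) / n
    total = 0.0
    for k in range(n):
        t = lower + (k + 0.5) * h
        heaviside = 1.0 if t >= y else 0.0
        total += (cdf(t) - heaviside) ** 2
    return total * h


def uniform_cdf(t):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, t))


crps = crps_numeric(uniform_cdf, y=0.3, lower=0.0, upper=1.0)
crps_exact = (0.3 ** 3 + 0.7 ** 3) / 3.0
bss = skill_score(0.218, 0.25)  # illustrative BS against a climatological BS of 0.25
```

A positive skill score means the forecast improves on the reference, a value of 1 corresponds to a perfect forecast, and negative values indicate a forecast that is worse than the reference.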

All scores are summary measures over a set of observations and forecasts. In this study, climatology and scores are summarized over time, but for each location separately. This is in accordance with Hamill and Juras (2006) and will minimize the effects of varying climatology between the operational sites.

4. Preliminary stages

Ensemble postprocessing requires several a priori choices with regard to model training and verification. First of all, a decision has to be made concerning the temporal setup of training and verification of the postprocessing. An important aspect is a strict separation of training and verification in the sense that the observational data that enter the verification are independent from those used for statistical model training. Furthermore, we want to closely follow an operational setup. In our setup the preceding time period is used for the training of the parameters of the postprocessing. Forecasts are then derived for the following 15 days. This setup is moved through the complete set of time series such that forecasts are derived for each 15-day period. In this way we obtain a series of forecasts that are independently derived from the respective observations used for verification.

The training period should be relatively long for stable parameter estimation. However, if the period is too long, nonstationarities due to the seasonal cycle might deteriorate the predictive skill. A shorter training period is also more robust with respect to changes in the operational setup and model versions of COSMO-DE. A preliminary study has shown that the predictive skill increases with the length of the training period up to 50 days. Inconsistent patterns of behavior for a variety of scores for longer training periods led us to set the training period to 50 days. This results in a training sample of 50 × 445 observations. Since the sampling uncertainty becomes an issue for the extremal quantiles, we included an analysis in section 5d with a training sample that makes use of the complete dataset. Only the prediction period of 15 days extended by 15 days is withheld from the training data. Seasonal changes are implemented in the postprocessing via a sine and cosine with a period of 1 yr. Note, however, that substantial changes in the operational setup of COSMO-DE might impose temporal inhomogeneities that are not captured by the postprocessing.
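The moving training and forecast windows can be sketched as below. The sketch assumes that the first 50 days of the dataset serve only as training data and that the last forecast block is truncated at the end of the period; both are assumptions made for illustration, since the text does not specify how the start and end of the series are handled.

```python
from datetime import date, timedelta


def rolling_windows(start, end, train_days=50, forecast_days=15):
    """Yield (train_start, train_end, fc_start, fc_end) with inclusive dates.

    Each forecast block is preceded by its own training block, so the
    training data never overlap the verification data of the same block.
    """
    fc_start = start + timedelta(days=train_days)
    while fc_start <= end:
        fc_end = min(fc_start + timedelta(days=forecast_days - 1), end)
        train_start = fc_start - timedelta(days=train_days)
        train_end = fc_start - timedelta(days=1)
        yield train_start, train_end, fc_start, fc_end
        fc_start = fc_end + timedelta(days=1)


windows = list(rolling_windows(date(2008, 7, 1), date(2011, 6, 30)))
first = windows[0]
last = windows[-1]
```

With 445 stations pooled, each 50-day training block supplies the 50 × 445 training pairs mentioned above, provided that no observations are missing.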

In a next step, we need to identify an informative choice of predictors. We decided to use summary measures derived from the TLE. This is in accordance with approaches by Hamill et al. (2004) and Wilks (2009), who used the ensemble mean, or Bremnes (2004), who used quantiles and minima and maxima within the ensemble, as well as relative frequencies of precipitation. Here, the following statistical quantities are derived from the TLE precipitation forecasts: the ensemble mean, denoted as the first-guess mean (fgM); first-guess quantile estimates derived from the order statistics of the ensemble (fgQτ); and first-guess probabilities of precipitation (fgPoP) or of threshold exceedance (fgPoTu). The quantiles range from τ = 0.25 to τ = 0.999 and the thresholds are u = 0, 5, 10 mm. These quantities can be regarded as uncalibrated first-guess ensemble forecasts. Note that the first-guess estimates are derived from the TLE including the 5 × 5 gridpoint neighborhood for each location. Note, also, that the postprocessing pools the data from all stations in the considered subregion into one vector and estimates a spatially constant relation between the local predictors and precipitation.

Several studies have applied power transformation to precipitation accumulations. Wilks (2009) used the square root, Sloughter et al. (2007) the third root, and Hamill et al. (2004, 2008) the fourth root of precipitation. Preliminary tests revealed the largest improvements in terms of scores with a third-root transformation. Except for the lognormal GLM, the power transformation is applied to the target precipitation, as well as to the predictor precipitation in terms of fgM and fgQτ.
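The first-guess predictors and the power transformation can be sketched as follows. The particular order-statistic rule for the quantiles (the smallest member whose empirical CDF reaches τ) is an assumption made for illustration, since the text does not specify the estimator, and the ten-member ensemble is synthetic.

```python
import math


def first_guess(members, taus=(0.25, 0.5, 0.9), thresholds=(0.0, 5.0, 10.0)):
    """Uncalibrated summary statistics of one ensemble (precipitation in mm)."""
    n = len(members)
    ordered = sorted(members)
    fg = {"fgM": sum(members) / n}
    for tau in taus:
        # Order-statistic estimate: smallest member with empirical CDF >= tau.
        k = max(0, math.ceil(tau * n) - 1)
        fg[f"fgQ{tau}"] = ordered[k]
    for u in thresholds:
        # Relative frequency of members exceeding the threshold u.
        fg[f"fgPoT{u:g}"] = sum(m > u for m in members) / n
    return fg


def third_root(value):
    """Third-root power transformation for nonnegative precipitation amounts."""
    return value ** (1.0 / 3.0)


members = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0]
fg = first_guess(members)
fgm_transformed = third_root(fg["fgM"])
```

In this naming, `fgPoT0` plays the role of fgPoP, the fraction of members with precipitation above zero.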

5. Results

a. Logistic regression

We start with LR for PoP, PoT5, and PoT10. Figure 2a displays the BSS for the 445 stations and various predictor combinations including the first-guess PoP or PoTu. Overall, the variability of the scores between stations is remarkable, reflecting a large uncertainty in the predictive skill on the one hand, and station-specific characteristics that are not captured in the spatially homogeneous postprocessing on the other. Postprocessing largely increases the predictive skill, particularly for PoP. It turns out that fgM is a more informative predictor than the respective fgPoP or fgPoTu for all thresholds. When the respective fgPoP or fgPoTu is added as a predictor, only a small amount of additional skill is obtained for PoP. The reliability and resolution parts of the BS are displayed in Figs. 2b and 2c. For PoP, a large improvement in reliability is achieved for all LR forecasts. In contrast, an increase in resolution is obtained only for the LR that includes fgM as a predictor. This explains the gain in BSS when fgM is considered. Both PoT5 and PoT10 also show better reliability with respect to the first-guess forecasts, but no significant increase in resolution. The fgM is the most informative predictor, and fgPoP should be included only in the postprocessing of PoP.

Fig. 2.

(a) BSS, (b) reliability, and (c) resolution for all stations (boxplot) and different predictor combinations. The white box refers to the uncalibrated first-guess forecasts. The gray-shaded boxes refer to LR with the corresponding fgPoP or fgPoTu, fgM, and fgPoP or fgPoTu plus fgM as predictors. The legend in (b) is valid for all three plots. The y axis in (b) is restricted for enhanced visibility.


The large improvement in calibration of the postprocessed probabilities can also be seen from the diagrams in Fig. 3. The boxplots denote the observed frequencies conditional on the forecast probabilities at each station. For calibrated forecasts, these boxes should lie on the diagonal line. While the first-guess probabilities (Fig. 3, top) largely overestimate high probabilities, the postprocessed probabilities (Fig. 3, bottom) are well calibrated for PoP. Although the miscalibration is small, high PoT5 values are still overestimated even after the postprocessing. Here, PoT10 shows a better calibration for lower probabilities. The interpretation of calibration for higher probabilities remains difficult, since only 1% of the forecasts are larger than 0.4.

Fig. 3. Reliability of (a) fgPoP, (b) fgPoT5, (c) fgPoT10, and of the LR estimates for (d) PoP with fgM+fgPoP as predictors, (e) PoT5, and (f) PoT10 with fgM as predictor. The boxplots show the observed frequencies conditional on each of the 10 forecast probabilities. The bar plot refers to the frequency of forecasts in each bin. A total of 484 134 pairs of observations and forecasts are used for each diagram.

Figure 4 shows the temporal evolution of the predictive intercept and the regression coefficients (β) of the LR for PoP for each 15-day prediction period. In each case, the training is performed over the preceding 50-day period. All coefficients show a distinct seasonal cycle. The intercept is smaller during Northern Hemisphere winter than during summer, whereas the regression coefficient for fgM is much larger during winter and spring. The latter suggests stronger coherence between fgM and PoP and, hence, greater predictability. The seasonal cycle in the fgPoP regression coefficient is less pronounced. However, fgPoP is more relevant during the summer months than during winter.
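The training schedule described above (a 50-day training window preceding each 15-day prediction period) can be sketched as a generator of index ranges. This is only an illustration: the function name is hypothetical, and the assumption that each training window ends immediately before its prediction period is ours, since the exact alignment is not restated in this section.

```python
def rolling_windows(n_days, train_len=50, pred_len=15):
    """Yield (train_days, pred_days) as half-open ranges of day indices.

    Assumption: the training window ends directly before the prediction
    period, and prediction periods follow each other without gaps.
    """
    start = train_len
    while start + pred_len <= n_days:
        yield range(start - train_len, start), range(start, start + pred_len)
        start += pred_len
```

Fitting the regression separately on every such window yields one set of coefficients per prediction period, which is what a time series of coefficients like the one in Fig. 4 requires.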

Fig. 4. Temporal evolution of (a) the predictive intercept, and the regression coefficients for (b) fgM and (c) fgPoP of the LR for PoP. The gray shading indicates the standard error of the parameter estimates. The three lines refer to the three years of data used.

A distinct seasonal cycle is also observed for the other thresholds. We restrict the presentation to the 5-mm threshold (Fig. 5). The behavior of the intercept and the regression coefficient of fgM for PoT5 is very similar to that for PoP. The influence of fgPoT5 tends to zero for the 5-mm threshold, which is consistent with the fact that fgPoTu adds no predictive skill when included in the postprocessing. Identical results are obtained for the other thresholds.

Fig. 5. As in Fig. 4, but for PoT5.

The mean BSS over all stations amounts to 54% for PoP, 34% for PoT5, and 21% for PoT10. The predictive performance depends on the seasonality with a clear sinusoidal variation for all thresholds (not shown), and highest BSSs during winter and lowest during summer. The smaller predictability during summer is consistent with the fact that summer precipitation is dominated by small-scale convective events, whereas autumn and winter precipitation is generated within larger-scale weather systems.

There is a clear correspondence between the elevation of the station and the BSS, with a generally greater skill for more elevated stations. The correlation between BSS and elevation amounts to 0.36 (0.29, 0.16) for PoP (PoT5, PoT10). Likewise, the mean amount of precipitation received at a station location is positively correlated with the elevation. The correspondence between the amount of precipitation and the BSS might be an issue of signal-to-noise ratio. If one assumes a mean forecast error that is constant across the domain and a signal that is related to the amount of precipitation, then wet grid points tend to have a larger signal-to-noise ratio than dry grid points. Another argument is that the occurrence of orographic precipitation is probably better captured by the model forecasts due to the localization of the forcing. Finally, predictability decreases for rarer events as seen in Fig. 2a. The mean PoP ranges between 0.32 for dry and 0.46 for wet stations, and its correlation with elevation amounts to about 0.5. The correlation is even slightly larger for PoT. The 5-mm threshold corresponds to the unconditional 0.96 quantile for the driest and 0.86 for the wettest stations. An exceedance of the 5-mm threshold is thus significantly less rare for the wettest station and hence more predictable.

b. Quantile regression

We now turn to the QR approach. Figure 6 shows the QVSS for a variety of predictor combinations. Again, the boxplots reflect a large variability between stations. The QVSS is small for the median and the lower range of the distribution. Since the mean PoP amounts to 39%, the 0.25 and 0.5 conditional quantiles are frequently censored (i.e., zero). The QVSS increases with probability τ. However, the uncertainty also largely increases for quantiles above τ = 0.95. The fgM is again a good predictor for all quantiles, even for the small ones. For larger quantiles, postprocessing that uses the respective fgQτ seems to perform slightly better than postprocessing that uses fgM. Additional skill is obtained for large quantiles if fgPoP is included as a covariate. For low quantiles, fgPoP does not provide additional information, since PoP is already large over the training subsample, where the low quantiles are not censored (i.e., where fgPoP is large). Including fgPoP as a covariate introduces large uncertainties with respect to the regression parameter of fgPoP and leads to unstable and potentially wrong parameter estimates (see footnote 1). We thus omitted fgPoP as a covariate for the lower quantiles.
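To make the quantile verification score concrete, the following sketch computes the mean check (pinball) loss of a set of quantile forecasts and a skill score relative to a reference. The in-sample empirical quantile used as the reference, and all function names, are assumptions for illustration; the reference forecast used in the study is defined in an earlier section and is not restated here.

```python
def check_loss(u, tau):
    """Check (pinball) function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def qvs(quantile_forecasts, obs, tau):
    """Mean check loss of the forecasts; smaller is better."""
    total = sum(check_loss(y - q, tau) for q, y in zip(quantile_forecasts, obs))
    return total / len(obs)


def empirical_quantile(values, tau):
    """Simple order-statistic quantile, used here as a climatological reference."""
    s = sorted(values)
    return s[min(int(tau * len(s)), len(s) - 1)]


def qvss(quantile_forecasts, obs, tau):
    """Skill score 1 - QVS / QVS_ref with an in-sample climatological reference."""
    q_ref = empirical_quantile(obs, tau)
    ref = qvs([q_ref] * len(obs), obs, tau)
    return 1.0 - qvs(quantile_forecasts, obs, tau) / ref
```

When most observations are zero, a low quantile is zero for both the forecast and the reference on most days, which gives one way to see why little skill can be gained for censored quantiles.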

Fig. 6. QVSS for all stations (boxplot) and different predictor combinations. The white box refers to the uncalibrated first-guess forecasts. The gray-shaded boxes refer to QR with the fgM, the corresponding fgQτ, and the combination of fgQτ with fgPoP and with fgM as predictors. The y axis is restricted for enhanced visibility.

The temporal evolution of the intercept and the regression coefficients of QR for the 0.95 quantile is displayed in Fig. 7. We use fgQ95+fgPoP as predictors. The intercept shows a clear seasonal cycle with largest values during summer. The variability of the intercept is relatively large. The seasonal variations within the regression coefficient for the fgQ95 predictor are small, although the interannual variance is larger during summer compared to winter. The first-guess quantile already contains a realistic seasonal cycle. Thus, the postprocessing only corrects for a general bias. In contrast, the regression coefficient for fgPoP shows significant seasonal variations. It is large from late spring to early autumn and small otherwise. This suggests that the additional skill obtained with fgPoP is most pronounced during summer.

Fig. 7. Temporal evolution of (a) the predictive intercept, and the regression coefficients for (b) fgQ95 and (c) fgPoP of the QR for the 0.95 quantile. The gray shading indicates the standard error of the parameter estimates.

The seasonal variation of the QVSS is very similar to that of the BSS. Again, the predictive performance is largest during winter and late autumn, and lowest during summer. Generally, the higher quantiles for τ ≥ 0.9 are underestimated, particularly during summer, when convective rainfall events lead to large rainfall amounts. Like the BSS, the QVSS for the lower quantiles is related to the mean amount of precipitation a station receives. This relationship becomes weaker for the upper quantiles. The correlation between the QVSS and the mean amount of precipitation varies between 0.2 and 0.6 for quantiles with 0.25 ≤ τ ≤ 0.95, and decreases to 0.05 and 0.07 for τ ≥ 0.99.

c. Parametric mixture models

A parametric mixture model approach is intended to provide the complete predictive distribution based on a small number of parameters. This requires an a priori assumption about the type of distribution. Here, we assume that the distribution of precipitation follows a mixture of a point mass at zero representing the PoP and a continuous distribution for the amount of precipitation, given that it is above zero (cf. section 3c). As shown in section 5a, PoP is best estimated using LR with predictors fgM+fgPoP. Three mixture distributions are investigated to represent the distribution of precipitation: a mixture using the gamma distribution (LR-Gam), the lognormal distribution (LR-LogN), and the inverse Gaussian distribution (LR-InvG). The link function for all distributions is the identity; that is, the expectation is estimated by a linear predictor.
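The structure of such a mixture can be sketched as follows: the predictive CDF is F(y) = (1 − PoP) + PoP · G(y) for y ≥ 0, where G is the distribution of the wet amounts. The sketch uses a lognormal G because its CDF is available in closed form in the Python standard library; the parameter names (`pop`, `mu`, `sigma`), the fixed parameter values, and the bisection bounds are illustrative assumptions, whereas in the study the parameters depend on the predictors.

```python
import math


def lognormal_cdf(y, mu, sigma):
    """CDF of a lognormal distribution with log-mean mu and log-sd sigma."""
    if y <= 0.0:
        return 0.0
    z = (math.log(y) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def mixture_cdf(y, pop, mu, sigma):
    """Point mass at zero with weight 1 - pop, lognormal for the wet amounts."""
    if y < 0.0:
        return 0.0
    return (1.0 - pop) + pop * lognormal_cdf(y, mu, sigma)


def mixture_quantile(tau, pop, mu, sigma, hi=1e4, tol=1e-9):
    """Invert the mixture CDF by bisection on [0, hi].

    Quantiles with tau <= 1 - pop fall on the point mass and are zero.
    """
    if tau <= 1.0 - pop:
        return 0.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, pop, mu, sigma) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with `pop = 0.39` the median of the mixture is zero, and a threshold probability such as PoT5 is obtained as `1.0 - mixture_cdf(5.0, pop, mu, sigma)`. This is how a single fitted distribution yields forecasts for every threshold and quantile.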

Figure 8 shows the predictive performance of the different mixtures using various predictor combinations in terms of the CRPS. The boxplot represents the distribution of the temporal mean CRPS at the stations. Note that the CRPS is a negatively oriented score with smaller values indicating better performance. Overall, the CRPS shows large variability between stations, but the differences between mixtures are fairly significant (Fig. 8). The lowest median is obtained for LR-Gam with fgM+fgPoP as a predictor. However, the differences with respect to the predictors are also very small. The largest outliers and the largest median are obtained for LR-InvG with fgQ90 as a predictor.
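For readers who want to reproduce a score of this kind, the CRPS of a predictive CDF F for an observation y can be approximated numerically from its definition, the integral of (F(x) − 1{x ≥ y})² over x. The sketch below assumes a nonnegative variable, a user-supplied CDF, and an upper integration limit beyond which F is effectively 1; the function name and the midpoint rule are illustrative choices, not the computation used in the study.

```python
def crps_from_cdf(cdf, y, upper, n=20000):
    """Midpoint-rule approximation of the CRPS on [0, upper].

    cdf:   callable returning the predictive CDF at x
    y:     observed value, assumed to lie in [0, upper]
    upper: integration limit where cdf(upper) is (close to) 1
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        step = 1.0 if x >= y else 0.0  # CDF of a perfect forecast at y
        total += (cdf(x) - step) ** 2
    return total * h
```

Because the squared difference is integrated over the whole range of x with equal weight, errors in a thin upper tail contribute little to the total, so large errors in extreme quantiles can leave the CRPS almost unchanged.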

Fig. 8. CRPS for all stations (boxplot) and different distributions. The gray-shaded boxes refer to the fgM, fgM+fgPoP, fgQ90, and fgQ90+fgPoP as predictors.

To further explore the performance of the different mixtures, we compare BSS and QVSS for selected thresholds and quantiles in Figs. 9 and 10. As predictors, we use fgQ90+fgPoP, since the performance is best in terms of QVSS. The baseline approach is to use the respective LR and QR that performed best. Both LR-Gam and LR-LogN show similar BSS compared to LR for all thresholds (Fig. 9). It seems that LR-Gam reveals less uncertainty in BSS for the high thresholds compared to LR and LR-LogN. The LR-InvG mixture fails to give skillful forecasts, although the CRPS is similar to the other mixtures.

Fig. 9. BSS for all stations (boxplot) and different thresholds. The white box refers to LR, while the gray-shaded boxes refer to LR-Gam, LR-LogN, and LR-InvG with predictors fgQ90+fgPoP.

Fig. 10. As in Fig. 9, but for QVSS. The y axis is restricted for enhanced visibility.

Similar results are obtained when comparing the quantile forecasts (Fig. 10). Again, LR-InvG completely fails for the 0.75–0.99 quantiles, but seems to recover skill for the very high 0.995 and 0.999 quantiles. Here, LR-InvG reveals significant deficiencies in the distributional assumption that are not detected by looking solely at the CRPS, but which are evident in BSS and QVSS. For the higher quantiles the semiparametric approach is superior to all mixtures. Both LR-Gam and LR-LogN fit the bulk of the data well, but fail to correctly describe the tail behavior. Among the mixtures, LR-Gam seems to provide the best performance in terms of QVSS.

Figure 11 shows the 3-month moving average of the QVSS of QR minus the QVSS of LR-Gam for the upper quantiles. The differences are positive for all seasons, confirming that QR provides more skill than the mixture LR-Gam. However, the differences vary over the year, with the smallest values during spring and early summer. The differences also increase for higher quantiles, indicating a general bias in the tail of the LR-Gam mixture distribution that increases with τ.

Fig. 11. The 3-month moving average of QVSS for QR minus QVSS for the LR-Gam mixture.

d. Mixture model with GPD tail

We have seen that LR-Gam represents a sophisticated parametric model for the bulk of the distribution but tends to systematically overestimate large quantiles. We thus turn to the LR-Gam-GPD mixture as introduced in section 3d. The scale parameter of the GPD is estimated using the same predictors as for the LR-Gam mixture, namely fgQ90+fgPoP.

It turns out that the LR-Gam-GPD mixture indeed better represents the high quantiles (Fig. 12). The predictive performance of LR-Gam-GPD for the 0.999 quantile is comparable to the semiparametric approach, with slightly smaller negative outliers. This indicates that LR-Gam-GPD is slightly more robust in parameter estimation. LR-Gam-GPD is clearly superior to the LR-Gam mixture.

Fig. 12. Boxplot of QVSS for all stations. The white boxes refer to the QR approach, while the gray-shaded boxes refer to the LR-Gam mixture (dark gray) and the LR-Gam-GPD mixture (light gray). The gray-bordered boxplots show the results for a 3-yr training period including the annual cycle as a covariate. The y axis is restricted for enhanced visibility.

To assess the role of the size of the training sample, which might be important for the extremal quantiles, we use a training period of 1065 days (i.e., 3 yr excluding the 15-day prediction period extended by a further 15 days, so that 1095 − 30 = 1065 days remain). It now becomes necessary to account for the seasonal cycle. This is achieved by including a sine and a cosine with a period of 1 yr, as well as the interaction terms (i.e., sine and cosine times fgQ90 or fgPoP), as predictors. The extended training period results in only a small increase in predictive performance (Fig. 12).
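The seasonal predictors described above can be sketched as one row of a design matrix. The layout of the row, the function name, and the period of 365.25 days are illustrative assumptions; the text only specifies a sine and cosine with a period of 1 yr plus their interactions with fgQ90 and fgPoP.

```python
import math


def seasonal_design_row(day_of_year, fg_q90, fg_pop, period=365.25):
    """Return one predictor row with annual harmonics and interaction terms."""
    phase = 2.0 * math.pi * day_of_year / period
    s, c = math.sin(phase), math.cos(phase)
    return [
        1.0, fg_q90, fg_pop,     # intercept and first-guess covariates
        s, c,                    # annual cycle of the intercept
        s * fg_q90, c * fg_q90,  # seasonally varying fgQ90 coefficient
        s * fg_pop, c * fg_pop,  # seasonally varying fgPoP coefficient
    ]
```

With this layout, the effective coefficient of fgQ90 on a given day is the sum of its constant coefficient and the two harmonic terms, so a single fit over the long training period can still produce coefficients that vary smoothly through the year.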

However, the effect of a longer training period becomes evident when comparing the regression parameters of the GPD scale parameter and the shape parameter. Figure 13 shows the temporal evolution of the intercept, the regression coefficients for fgQ90 and fgPoP, and the GPD shape parameter as obtained from a 50-day training period. All three parameters show large variations between the prediction periods. This indicates that the uncertainty is considerable. The shape parameter is generally negative, thereby reducing the systematic overestimation of the LR-Gam mixture.
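The role of the sign of the shape parameter can be illustrated with the quantile function of a plain GPD. Note that this is the standard textbook GPD for exceedances above a fixed threshold, not the adaptive mixture of section 3d; the function names and the threshold argument are assumptions made for the illustration.

```python
import math


def gpd_quantile(p, scale, shape, threshold=0.0):
    """Quantile of a GPD for exceedances above `threshold`, for p in [0, 1)."""
    if abs(shape) < 1e-12:
        # Exponential distribution as the limit for shape -> 0
        return threshold - scale * math.log(1.0 - p)
    return threshold + scale / shape * ((1.0 - p) ** (-shape) - 1.0)


def gpd_upper_endpoint(scale, shape, threshold=0.0):
    """A finite upper bound exists only for a negative shape parameter."""
    if shape < 0.0:
        return threshold - scale / shape
    return math.inf
```

For the same scale, a negative shape gives smaller extreme quantiles than a zero or positive shape and bounds them from above, which is consistent with a negative shape estimate counteracting an overestimation of the high quantiles.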

Fig. 13. Temporal evolution of (a) the predictive intercept, the regression coefficients for (b) fgQ90 and (c) fgPoP for the scale parameter, and (d) the shape parameter of the GPD with a 50-day training period. The gray shading indicates the standard error of the parameter estimates. The three lines refer to the three years of data used.

Figure 14 shows the intercept, the fgQ90 and fgPoP regression coefficients, and the GPD shape parameter as obtained using a 1065-day training period. The estimates show fewer variations and a clear seasonal cycle. The shape parameter is nearly constant over time and less negative.

Fig. 14. As in Fig. 13, but for the 1065-day training period.

The training sample of 50 days over large parts of Germany (445 locations) already provides good estimates of the regression coefficients, but also contains large uncertainties. The larger training sample reduces the variations, but does not lead to large improvements in predictive performance. It is possible that the effect of the annual cycle is not sufficiently described by a sine and cosine wave. Moreover, it might be an issue that the model setup of COSMO-DE changed several times during the period of investigation.

Figure 15 compares the seasonal variations in the mean difference between the QVSS for QR and LR-Gam-GPD, similar to Fig. 11. The effect of the GPD can be seen by the decreasing bias in QVSS for higher probabilities τ. While the mean differences in QVSS between QR and LR-Gam range between 0% and 15% (Fig. 11), the differences for LR-Gam-GPD are of the order of ±5%. This indicates that QR and LR-Gam-GPD have similar levels of predictive performance. However, since the sampling uncertainty of the QVSS may become very large for high quantiles, it is difficult to rate the performance of the methods. Which of the methods is to be preferred depends on the application and the user’s needs.

Fig. 15. Three-month moving average of QVSS for QR minus QVSS for the LR-Gam-GPD mixture.

6. Conclusions

We present several approaches to deriving PQPF based on the COSMO-DE time-lagged ensemble. As covariates, we use first-guess estimates for the mean, PoP, PoT, and quantiles, which are estimated from the COSMO-DE-TLE at each station location including a neighborhood of 5 × 5 grid points.

First, we show that statistical postprocessing largely improves the predictive performance of the probability and quantile forecasts when compared to the first-guess estimates. The postprocessing not only calibrates the probability forecasts, but through a combination of predictors also increases their resolution. The value of PoP is best predicted using LR with a combination of first-guess mean and PoP from the ensemble. For higher thresholds, the first-guess mean represents the most informative predictor. The first-guess mean is also a good predictor for conditional quantiles. However, higher quantiles are estimated slightly more effectively with a QR using fgQτ as a predictor.

Predictability is largest during winter and smallest during summer. This is observed for all thresholds and quantiles. The BSS is largest for PoP and decreases for higher thresholds. Due to the frequent occurrence of censoring, predictive skill is very small for low quantiles, but largely increases with the probability of the quantile. The largest QVSS is observed for the 0.95 quantile. Even the very high quantiles show large QVSS, although the uncertainty largely increases.

Parametric mixture models are formulated in order to derive a sophisticated parametric postprocessing. The advantage is a smaller number of parameters, and since a full conditional distribution is estimated, forecasts are easily derived for all thresholds and quantiles, even for the extreme ones. All of the mixtures we propose have a point mass at zero to represent the probability of precipitation. The mixtures differ in their representations of the amount of precipitation. Here, four models are used: a gamma, a lognormal, an inverse Gaussian, and a mixture between gamma and GPD. The latter represents a mixture with an adaptive tail distribution motivated by extreme value theory.

The inverse Gaussian shows some major deficiencies in the range between the conditional 0.75 and 0.99 quantiles, and is not further discussed here. The gamma and lognormal mixtures show a comparable level of performance, although the former seems to provide slightly more skill. Both show reduced predictability for the high quantiles when compared with the semiparametric estimates from QR. This deficit in performance increases with the probability of the quantile and is due to a systematic overestimation. Thus, an extrapolation to extreme values results in a significant overestimation of the extremes. It turns out that the CRPS is relatively insensitive to deviations outside the bulk of the distribution, and other performance measures are needed to assess predictive skill in the tail of the distribution.

The gamma mixture with an adaptive GPD tail successfully corrects for this overestimation. This correction is partly obtained with a GPD shape parameter that is significantly negative most of the time. Although this suggests an upper bound on extreme precipitation, we refrain from an interpretation of the results in terms of physical mechanisms. Further research is needed in this respect, and a more sophisticated formulation of a mixture with an adaptive GPD tail is desirable.

The sampling uncertainty for a 50-day training period becomes an issue for the GPD tail. The uncertainty of the parameter estimates is quite large. An extended training set that uses almost 3 yr of data provides much more stable parameter estimates, but only a very small increase in predictive performance.

The study provides a benchmark model for PQPF based on a deterministic operational prediction system using the method of lagged average forecasting. Postprocessing is indispensable within this TLE, and LR and QR provide skillful point estimates of the predictive distribution. A mixture using a gamma distribution with an adaptive GPD tail is an appropriate parametric alternative that allows for an extrapolation toward high quantiles. However, none of the parametric mixtures outperforms LR or QR. Thus, the user’s needs will dictate which choice of postprocessing is most appropriate. If a complete predictive distribution is required (e.g., in order to sample from this distribution), the LR-Gam-GPD is an appropriate alternative to LR and QR.

Acknowledgments

The authors thank Tom Hamill and two anonymous reviewers for their constructive and helpful comments. Martin Göber kindly provided the rain gauge dataset from the DWD network. This work is funded by Deutscher Wetterdienst in Offenbach, Germany, within the framework of an extramural research program.

REFERENCES

  • Baldauf, M., Seifert A. , Förstner J. , Majewski D. , Raschendorfer M. , and Reinhardt T. , 2011: Operational convective-scale numerical weather prediction with the COSMO model: Description and sensitivities. Mon. Wea. Rev., 139, 38873905.

    • Search Google Scholar
    • Export Citation
  • Bierdel, L., Friederichs P. , and Bentzien S. , 2012: Spatial kinetic energy spectra in the convection-permitting limited-area NWP model COSMO-DE. Meteor. Z., in press.

    • Search Google Scholar
    • Export Citation
  • Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338347.

    • Search Google Scholar
    • Export Citation
  • Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 13.

  • Bröcker, J., and Smith L. A. , 2008: From ensemble forecasts to predictive distribution functions. Tellus, 60A, 663678.

  • Buizza, R., and Palmer T. N. , 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 14341456.

    • Search Google Scholar
    • Export Citation
  • Chernozhukov, V., and Hong H. , 2002: Three-step censored quantile regression, with an application to extramarital affairs. J. Amer. Stat. Assoc., 97, 872882.

    • Search Google Scholar
    • Export Citation
  • Coles, S., 2001: An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics, Springer, 208 pp.

  • Craig, G., and Coauthors, 2010: Weather research in Europe: A THORPEX European plan. WMO/TD-1531, WWRP/THORPEX Rep. 14, 41 pp.

  • Ebert, E. E., Damrath U. , Wergen W. , and Baldwin M. E. , 2003: The WGNE assessment of short-term quantitative precipitation forecasts. Bull. Amer. Meteor. Soc., 84, 481492.

    • Search Google Scholar
    • Export Citation
  • Eckel, F. A., and Walters M. K. , 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Wea. Forecasting, 13, 11321147.

    • Search Google Scholar
    • Export Citation
  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 14310 162.

    • Search Google Scholar
    • Export Citation
  • Fahrmeir, L., and Tutz G. , 1994: Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, 425 pp.

  • Friederichs, P., 2010: Statistical downscaling of extreme precipitation events using extreme value theory. Extremes, 13, 109132.

  • Friederichs, P., and Hense A. , 2007: Statistical downscaling of extreme precipitation events using censored quantile regression. Mon. Wea. Rev., 135, 23652378.

    • Search Google Scholar
    • Export Citation
  • Frigessi, A., Haug O. , and Rue H. , 2003: A dynamic mixture model for unsupervised tail estimation without threshold selection. Extremes, 5, 219236.

    • Search Google Scholar
    • Export Citation
  • Gebhardt, C., Theis S. E. , Paulat M. , and Ben Bouallègue Z. , 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model perturbations and variations of lateral boundaries. Atmos. Res., 100, 168177.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., and Raftery A. E. , 2007: Strictly proper scoring rules, prediction, and estimation. J. Amer. Stat. Assoc., 102, 359378.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., Raftery A. E. , Westveld A. H. III, and Goldman T. , 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 10981118.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., and Colucci S. J. , 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 13121327.

  • Hamill, T. M., and Juras J. , 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 29052923.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., Whitaker J. S. , and Wei X. , 2004: Ensemble reforecasting: Improving medium-range forecasts skill using retrospective forecasts. Mon. Wea. Rev., 132, 14341447.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., Hagedorn R. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 26202632.

    • Search Google Scholar
    • Export Citation
  • Hense, A., and Wulfmeyer V. , 2008: The German priority program SPP1167 “Quantitative Precipitation Forecast.” Meteor. Z., 17, 703705.

    • Search Google Scholar
    • Export Citation
  • Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559570.

    • Search Google Scholar
    • Export Citation
  • Hoffman, R. N., and Kalnay E. , 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus, 35A, 100118.

  • Houtekamer, P. L., and Mitchell H. L. , 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796811.

    • Search Google Scholar
    • Export Citation
  • Koenker, R., 2005: Quantile Regression. Econometric Society Monogr., No. 38, Cambridge University Press, 349 pp.

  • Lu, C., Yuan H. , Schwartz B. E. , and Benjamin S. G. , 2007: Short-range numerical weather prediction using time-lagged ensembles. Wea. Forecasting, 22, 580595.

    • Search Google Scholar
    • Export Citation
  • Mass, C. F., Ovens D. , Westrick K. , and Colle B. A. , 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83, 407430.

    • Search Google Scholar
    • Export Citation
  • Mass, C. F., and Coauthors, 2009: PROBCAST: A web-based portal to mesoscale probabilistic forecasts. Bull. Amer. Meteor. Soc., 90, 10091014.

    • Search Google Scholar
    • Export Citation
  • Matheson, J. E., and Winkler R. L. , 1976: Scoring rules for continuous probability distributions. Manage. Sci., 22, 10871096.

  • McCullagh, P., and Nelder J. A. , 1998: Generalized Linear Models. 2nd ed. Monogr. on Statistics and Applied Probability, Vol. 37, Chapman and Hall/CRC, 511 pp.

  • Mittermaier, M. P., 2007: Improving short-range high-resolution model precipitation forecast skill using time-lagged ensembles. Quart. J. Roy. Meteor. Soc., 133, 14871500.

    • Search Google Scholar
    • Export Citation
  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595600.

  • Peralta, C., Bouallegue Z. B. , Theis S. E. , Gebhardt C. , and Buchhold M. , 2012: Accounting for initial condition uncertainties in COSMO-DE-EPS. J. Geophys. Res., 117, D07108, doi:10.1029/2011JD016581.

    • Search Google Scholar
    • Export Citation
  • Raftery, A. E., Gneiting T. , Balabdaoui F. , and Polakowski M. , 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 11551174.

    • Search Google Scholar
    • Export Citation
  • Reinhardt, T., and Seifert A. , 2006: A three-category ice scheme for LMK. COSMO Newsletter, No. 6, Consortium for Small-Scale Modeling, Offenbach am Main, Germany, 115–120.

  • Saito, K., and Coauthors, 2006: The operational JMA nonhydrostatic mesoscale model. Mon. Wea. Rev., 134, 12661298.

  • Schwartz, C. S., and Coauthors, 2010: Toward improved convection-allowing ensembles: Model physics sensitivities and optimizing probabilistic guidance with small ensemble membership. Wea. Forecasting, 25, 263280.

    • Search Google Scholar
    • Export Citation
  • Seity, Y., Brousseau P. , Malardel S. , Hello G. , Bénard P. , Bouttier F. , Lac C. , and Masson V. , 2011: The AROME-France convective-scale operational model. Mon. Wea. Rev., 139, 976991.

    • Search Google Scholar
    • Export Citation
  • Skamarock, W. C., and Klemp J. B. , 2008: A time-split nonhydrostatic atmospheric model for weather research and forecasting applications. J. Comput. Phys., 227, 34653485.

    • Search Google Scholar
    • Export Citation
  • Sloughter, J. M., Raftery A. E. , Gneiting T. , and Fraley C. , 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 32093220.

    • Search Google Scholar
    • Export Citation
  • Snyder, C., and Zhang F. , 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 16631677.

    • Search Google Scholar
    • Export Citation
  • Staniforth, A., and Wood N. , 2008: Aspects of the dynamical core of a nonhydrostatic, deep-atmosphere, unified weather and climate-prediction model. J. Comput. Phys., 227, 34453464.

    • Search Google Scholar
    • Export Citation
  • Stephan, K., Klink S. , and Schraff C. , 2008: Assimilation of radar-derived rain rates into the convective-scale model COSMO-DE at DWD. Quart. J. Roy. Meteor. Soc., 134, 13151326.

    • Search Google Scholar
    • Export Citation
  • Theis, S. E., Hense A. , and Damrath U. , 2005: Probabilistic precipitation forecasts from a deterministic model: A pragmatic approach. Meteor. Appl., 12, 257268.

    • Search Google Scholar
    • Export Citation
  • Tokdar, S. T., and Kadane J. B. , 2012: Simultaneous linear quantile regression: A semiparametric Bayesian approach. Bayesian Anal., 7, 5172.

    • Search Google Scholar
    • Export Citation
  • Torn, R. D., Hakim G. J. , and Snyder C. , 2006: Boundary conditions for limited-area ensemble Kalman filters. Mon. Wea. Rev., 134, 24902502.

    • Search Google Scholar
    • Export Citation
  • Toth, Z., and Kalnay E. , 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 23172330.

  • Vrac, M., and Naveau P. , 2007: Stochastic downscaling of precipitation: From dry events to heavy rainfalls. Water Resour. Res., 43, W07402, doi:10.1029/2006WR005308.

    • Search Google Scholar
    • Export Citation
  • Walser, A., Lüthi D. , and Schär C. , 2004: Predictability of precipitation in a cloud-resolving model. Mon. Wea. Rev., 132, 560577.

  • Wilks, D. S., 2006a: Comparison of ensemble-MOS methods in the Lorenz ’96 setting. Meteor. Appl., 13, 243256.

  • Wilks, D. S., 2006b: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

  • Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361368.

  • Yu, K., and Moyeed R. A. , 2001: Bayesian quantile regression. Stat. Probab. Lett., 54, 437447.

  • Yuan, H., Lu C. , McGinley J. A. , Schultz P. J. , Jamison B. D. , Wharton L. , and Anderson C. J. , 2009: Evaluation of short-range quantitative precipitation forecasts from a time-lagged multimodel ensemble. Wea. Forecasting, 24, 1838.

    • Search Google Scholar
    • Export Citation
Footnote 1: Incorrect parameter estimates occur if the regression parameter of fgPoP becomes negative. A negative regression parameter would predict larger precipitation quantiles with decreasing PoP, which contradicts the conditions for censoring. The regression model counterbalances this by setting the intercept to an unrealistically large positive value. Instead of omitting fgPoP as a covariate, one might also restrict the regression parameter of fgPoP to values above zero.

Save
  • Baldauf, M., Seifert A. , Förstner J. , Majewski D. , Raschendorfer M. , and Reinhardt T. , 2011: Operational convective-scale numerical weather prediction with the COSMO model: Description and sensitivities. Mon. Wea. Rev., 139, 38873905.

    • Search Google Scholar
    • Export Citation
  • Bierdel, L., Friederichs P. , and Bentzien S. , 2012: Spatial kinetic energy spectra in the convection-permitting limited-area NWP model COSMO-DE. Meteor. Z., in press.

    • Search Google Scholar
    • Export Citation
  • Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338347.

    • Search Google Scholar
    • Export Citation
  • Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 13.

  • Bröcker, J., and Smith L. A. , 2008: From ensemble forecasts to predictive distribution functions. Tellus, 60A, 663678.

  • Buizza, R., and Palmer T. N. , 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 14341456.

    • Search Google Scholar
    • Export Citation
  • Chernozhukov, V., and Hong H. , 2002: Three-step censored quantile regression, with an application to extramarital affairs. J. Amer. Stat. Assoc., 97, 872882.

    • Search Google Scholar
    • Export Citation
  • Coles, S., 2001: An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics, Springer, 208 pp.

  • Craig, G., and Coauthors, 2010: Weather research in Europe: A THORPEX European plan. WMO/TD-1531, WWRP/THORPEX Rep. 14, 41 pp.

  • Ebert, E. E., Damrath U. , Wergen W. , and Baldwin M. E. , 2003: The WGNE assessment of short-term quantitative precipitation forecasts. Bull. Amer. Meteor. Soc., 84, 481492.

    • Search Google Scholar
    • Export Citation
  • Eckel, F. A., and Walters M. K. , 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Wea. Forecasting, 13, 11321147.

    • Search Google Scholar
    • Export Citation
  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 14310 162.

    • Search Google Scholar
    • Export Citation
  • Fahrmeir, L., and Tutz G. , 1994: Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, 425 pp.

  • Friederichs, P., 2010: Statistical downscaling of extreme precipitation events using extreme value theory. Extremes, 13, 109132.

  • Friederichs, P., and Hense A. , 2007: Statistical downscaling of extreme precipitation events using censored quantile regression. Mon. Wea. Rev., 135, 23652378.

    • Search Google Scholar
    • Export Citation
  • Frigessi, A., Haug O. , and Rue H. , 2003: A dynamic mixture model for unsupervised tail estimation without threshold selection. Extremes, 5, 219236.

    • Search Google Scholar
    • Export Citation
  • Gebhardt, C., Theis S. E. , Paulat M. , and Ben Bouallègue Z. , 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model perturbations and variations of lateral boundaries. Atmos. Res., 100, 168177.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., and Raftery A. E. , 2007: Strictly proper scoring rules, prediction, and estimation. J. Amer. Stat. Assoc., 102, 359378.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., Raftery A. E. , Westveld A. H. III, and Goldman T. , 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 10981118.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., and Colucci S. J. , 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 13121327.

  • Hamill, T. M., and Juras J. , 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 29052923.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., Whitaker J. S. , and Wei X. , 2004: Ensemble reforecasting: Improving medium-range forecasts skill using retrospective forecasts. Mon. Wea. Rev., 132, 14341447.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., Hagedorn R. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 26202632.

    • Search Google Scholar
    • Export Citation
  • Hense, A., and Wulfmeyer V. , 2008: The German priority program SPP1167 “Quantitative Precipitation Forecast.” Meteor. Z., 17, 703705.

    • Search Google Scholar
    • Export Citation
  • Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559570.

    • Search Google Scholar
    • Export Citation
  • Hoffman, R. N., and Kalnay E. , 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus, 35A, 100118.

  • Houtekamer, P. L., and Mitchell H. L. , 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796811.

    • Search Google Scholar
    • Export Citation
  • Koenker, R., 2005: Quantile Regression. Econometric Society Monogr., No. 38, Cambridge University Press, 349 pp.

  • Lu, C., Yuan H. , Schwartz B. E. , and Benjamin S. G. , 2007: Short-range numerical weather prediction using time-lagged ensembles. Wea. Forecasting, 22, 580595.

    • Search Google Scholar
    • Export Citation
  • Mass, C. F., Ovens D. , Westrick K. , and Colle B. A. , 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83, 407430.

    • Search Google Scholar
    • Export Citation
  • Mass, C. F., and Coauthors, 2009: PROBCAST: A web-based portal to mesoscale probabilistic forecasts. Bull. Amer. Meteor. Soc., 90, 10091014.

    • Search Google Scholar
    • Export Citation
  • Matheson, J. E., and Winkler R. L. , 1976: Scoring rules for continuous probability distributions. Manage. Sci., 22, 10871096.

  • McCullagh, P., and Nelder J. A. , 1998: Generalized Linear Models. 2nd ed. Monogr. on Statistics and Applied Probability, Vol. 37, Chapman and Hall/CRC, 511 pp.

  • Mittermaier, M. P., 2007: Improving short-range high-resolution model precipitation forecast skill using time-lagged ensembles. Quart. J. Roy. Meteor. Soc., 133, 14871500.

    • Search Google Scholar
    • Export Citation
  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595600.

  • Peralta, C., Bouallegue Z. B. , Theis S. E. , Gebhardt C. , and Buchhold M. , 2012: Accounting for initial condition uncertainties in COSMO-DE-EPS. J. Geophys. Res., 117, D07108, doi:10.1029/2011JD016581.

    • Search Google Scholar
    • Export Citation
  • Raftery, A. E., Gneiting T. , Balabdaoui F. , and Polakowski M. , 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 11551174.

    • Search Google Scholar
    • Export Citation
  • Reinhardt, T., and Seifert A. , 2006: A three-category ice scheme for LMK. COSMO Newsletter, No. 6, Consortium for Small-Scale Modeling, Offenbach am Main, Germany, 115–120.

  • Saito, K., and Coauthors, 2006: The operational JMA nonhydrostatic mesoscale model. Mon. Wea. Rev., 134, 12661298.

  • Schwartz, C. S., and Coauthors, 2010: Toward improved convection-allowing ensembles: Model physics sensitivities and optimizing probabilistic guidance with small ensemble membership. Wea. Forecasting, 25, 263280.

    • Search Google Scholar
    • Export Citation
  • Seity, Y., Brousseau P. , Malardel S. , Hello G. , Bénard P. , Bouttier F. , Lac C. , and Masson V. , 2011: The AROME-France convective-scale operational model. Mon. Wea. Rev., 139, 976991.

    • Search Google Scholar
    • Export Citation
  • Skamarock, W. C., and Klemp J. B. , 2008: A time-split nonhydrostatic atmospheric model for weather research and forecasting applications. J. Comput. Phys., 227, 34653485.

    • Search Google Scholar
    • Export Citation
  • Sloughter, J. M., Raftery A. E. , Gneiting T. , and Fraley C. , 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 32093220.

    • Search Google Scholar
    • Export Citation
  • Snyder, C., and Zhang F. , 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 16631677.

    • Search Google Scholar
    • Export Citation
  • Staniforth, A., and Wood N. , 2008: Aspects of the dynamical core of a nonhydrostatic, deep-atmosphere, unified weather and climate-prediction model. J. Comput. Phys., 227, 34453464.

    • Search Google Scholar
    • Export Citation
  • Stephan, K., Klink S. , and Schraff C. , 2008: Assimilation of radar-derived rain rates into the convective-scale model COSMO-DE at DWD. Quart. J. Roy. Meteor. Soc., 134, 13151326.

    • Search Google Scholar
    • Export Citation
  • Theis, S. E., Hense A. , and Damrath U. , 2005: Probabilistic precipitation forecasts from a deterministic model: A pragmatic approach. Meteor. Appl., 12, 257268.

    • Search Google Scholar
    • Export Citation
  • Tokdar, S. T., and Kadane J. B. , 2012: Simultaneous linear quantile regression: A semiparametric Bayesian approach. Bayesian Anal., 7, 5172.

    • Search Google Scholar
    • Export Citation
  • Torn, R. D., Hakim G. J. , and Snyder C. , 2006: Boundary conditions for limited-area ensemble Kalman filters. Mon. Wea. Rev., 134, 24902502.

    • Search Google Scholar
    • Export Citation
  • Toth, Z., and Kalnay E. , 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 23172330.

  • Vrac, M., and Naveau P. , 2007: Stochastic downscaling of precipitation: From dry events to heavy rainfalls. Water Resour. Res., 43, W07402, doi:10.1029/2006WR005308.

    • Search Google Scholar
    • Export Citation
  • Walser, A., Lüthi D. , and Schär C. , 2004: Predictability of precipitation in a cloud-resolving model. Mon. Wea. Rev., 132, 560577.

  • Wilks, D. S., 2006a: Comparison of ensemble-MOS methods in the Lorenz ’96 setting. Meteor. Appl., 13, 243256.

  • Wilks, D. S., 2006b: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

  • Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361368.

  • Yu, K., and Moyeed R. A. , 2001: Bayesian quantile regression. Stat. Probab. Lett., 54, 437447.

  • Yuan, H., Lu C. , McGinley J. A. , Schultz P. J. , Jamison B. D. , Wharton L. , and Anderson C. J. , 2009: Evaluation of short-range quantitative precipitation forecasts from a time-lagged multimodel ensemble. Wea. Forecasting, 24, 1838.

    • Search Google Scholar
    • Export Citation
  • Fig. 1.

    Mean 12-h precipitation accumulations in mm (12 h)⁻¹ for (a) station measurements (linearly interpolated) and (b) COSMO-DE forecasts for the period 1 Jul 2008–30 Jun 2011. The stars in (a) represent the station locations.

  • Fig. 2.

    (a) BSS, (b) reliability, and (c) resolution for all stations (boxplot) and different predictor combinations. The white box refers to the uncalibrated first-guess forecasts. The gray-shaded boxes refer to LR with the corresponding fgPoP or fgPoTu, fgM, and fgPoP or fgPoTu plus fgM as predictors. The legend in (b) is valid for all three plots. The y axis in (b) is restricted for enhanced visibility.

  • Fig. 3.

    Reliability of (a) fgPoP, (b) fgPoT5, (c) fgPoT10, and of the LR estimates for (d) PoP with fgM+fgPoP as predictors, (e) PoT5, and (f) PoT10 with fgM as predictor. The boxplots show the observed frequencies conditional on each of the 10 forecast probabilities. The bar plot refers to the frequency of forecasts in each bin. A total of 484 134 pairs of observations and forecasts are used for each diagram.

  • Fig. 4.

    Temporal evolution of (a) the predictive intercept, and the regression coefficients for (b) fgM and (c) fgPoP of the LR for PoP. The gray shading indicates the standard error of the parameter estimates. The three lines refer to the three years of data used.

  • Fig. 5.

    As in Fig. 4, but for PoT5.

  • Fig. 6.

    QVSS for all stations (boxplot) and different predictor combinations. The white box refers to the uncalibrated first-guess forecasts. The gray-shaded boxes refer to QR with the fgM, the corresponding fgQτ, and the combination of fgQτ with fgPoP and with fgM as predictors. The y axis is restricted for enhanced visibility.

  • Fig. 7.

    Temporal evolution of (a) the predictive intercept, and the regression coefficients for (b) fgQ95 and (c) fgPoP of the QR for the 0.95 quantile. The gray shading indicates the standard error of the parameter estimates.

  • Fig. 8.

    CRPS for all stations (boxplot) and different distributions. The gray-shaded boxes refer to the fgM, fgM+fgPoP, fgQ90, and fgQ90+fgPoP as predictors.

  • Fig. 9.

    BSS for all stations (boxplot) and different thresholds. The white box refers to LR, while the gray-shaded boxes refer to LR-Gam, LR-LogN, and LR-InvG with predictors fgQ90+fgPoP.

  • Fig. 10.

    As in Fig. 9, but for QVSS. The y axis is restricted for enhanced visibility.

  • Fig. 11.

    The 3-month moving average of QVSS for QR minus QVSS for the LR-Gam mixture.

  • Fig. 12.

    Boxplot of QVSS for all stations. The white boxes refer to the QR approach, while the gray-shaded boxes refer to the LR-Gam mixture (dark gray) and the LR-Gam-GPD mixture (light gray). The gray-bordered boxplots show the results for a 3-yr training period including the annual cycle as a covariate. The y axis is restricted for enhanced visibility.

  • Fig. 13.

    Temporal evolution of (a) the predictive intercept, the regression coefficients for (b) fgQ90 and (c) fgPoP for the scale parameter, and (d) the shape parameter of the GPD with a 50-day training period. The gray shading indicates the standard error of the parameter estimates. The three lines refer to the three years of data used.

  • Fig. 14.

    As in Fig. 13, but for the 1065-day training period.

  • Fig. 15.

    Three-month moving average of QVSS for QR minus QVSS for the LR-Gam-GPD mixture.