  • Agresti, A., 2002: Categorical Data Analysis. 2nd ed. John Wiley & Sons, 734 pp.
  • Ben Bouallègue, Z., 2013: Calibrated short-range ensemble precipitation forecasts using extended logistic regression with interaction terms. Wea. Forecasting, 28, 515–524, doi:10.1175/WAF-D-12-00062.1.
  • Bröcker, J., and L. A. Smith, 2007: Increasing the reliability of reliability diagrams. Wea. Forecasting, 22, 651–661, doi:10.1175/WAF993.1.
  • Christensen, R. H. B., 2013: ordinal: Regression Models for Ordinal Data, version 2013.09-30. R package. [Available online at http://CRAN.R-project.org/package=ordinal.]
  • Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8, 985–987, doi:10.1175/1520-0450(1969)008<0985:ASSFPF>2.0.CO;2.
  • Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, doi:10.1175/MWR2904.1.
  • Hamill, T. M., 2012: Verification of TIGGE multimodel and ECMWF reforecast-calibrated probabilistic precipitation forecasts over the contiguous United States. Mon. Wea. Rev., 140, 2232–2252, doi:10.1175/MWR-D-11-00220.1.
  • Hamill, T. M., C. Snyder, and J. S. Whitaker, 2003: Ensemble forecasts and the properties of flow-dependent analysis-error covariance singular vectors. Mon. Wea. Rev., 131, 1741–1758, doi:10.1175//2559.1.
  • Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, doi:10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.
  • Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, doi:10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.
  • Matheson, J. E., and R. L. Winkler, 1976: Scoring rules for continuous probability distributions. Manage. Sci., 22, 1087–1096, doi:10.1287/mnsc.22.10.1087.
  • Messner, J. W., and A. Zeileis, 2013: crch: Censored Regression with Conditional Heteroscedasticity, version 0.1-0. R package. [Available online at http://CRAN.R-project.org/package=crch.]
  • Messner, J. W., A. Zeileis, G. J. Mayr, and D. S. Wilks, 2014: Heteroscedastic extended logistic regression for postprocessing of ensemble guidance. Mon. Wea. Rev., 142, 448–456, doi:10.1175/MWR-D-13-00271.1.
  • Nelder, J. A., and R. W. M. Wedderburn, 1972: Generalized linear models. J. Roy. Stat. Soc., 135A, 370–384, doi:10.2307/2344614.
  • Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.
  • R Core Team, 2013: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. [Available online at http://www.R-project.org/.]
  • Roulin, E., and S. Vannitsem, 2012: Postprocessing of ensemble precipitation predictions with extended logistic regression based on hindcasts. Mon. Wea. Rev., 140, 874–888, doi:10.1175/MWR-D-11-00062.1.
  • Roulston, M. S., and L. A. Smith, 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30, doi:10.1034/j.1600-0870.2003.201378.x.
  • Ruiz, J. J., and C. Saulo, 2012: How sensitive are probabilistic precipitation forecasts to the choice of calibration algorithms and the ensemble generation method? Part I: Sensitivity to calibration methods. Meteor. Appl., 19, 302–313, doi:10.1002/met.286.
  • Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616–640, doi:10.1214/13-STS443.
  • Scheuerer, M., 2014: Probabilistic quantitative precipitation forecasting using ensemble model output statistics. Quart. J. Roy. Meteor. Soc., 140, 1086–1096, doi:10.1002/qj.2183.
  • Schmeits, M. J., and K. J. Kok, 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. Mon. Wea. Rev., 138, 4199–4211, doi:10.1175/2010MWR3285.1.
  • Thorarinsdottir, T. L., and T. Gneiting, 2010: Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. J. Roy. Stat. Soc., 173A, 371–388, doi:10.1111/j.1467-985X.2009.00616.x.
  • Tobin, J., 1958: Estimation of relationships for limited dependent variables. Econometrica, 26, 24–36, doi:10.2307/1907382.
  • Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158, doi:10.1175/1520-0469(2003)060<1140:ACOBAE>2.0.CO;2.
  • Wilks, D. S., 2006: Comparison of ensemble-MOS methods in the Lorenz ’96 setting. Meteor. Appl., 13, 243–256, doi:10.1017/S1350482706002192.
  • Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361–368, doi:10.1002/met.134.
  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Academic Press, 676 pp.
  • Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390, doi:10.1175/MWR3402.1.

Fig. 4.

Ranked probability skill score (RPSS) relative to the raw ensemble (ensemble relative frequencies within each interval) for different training data lengths and models (see Table 2 for details) and lead time 48 h.

Fig. 5.

Continuous ranked probability skill score (CRPSS) relative to heteroscedastic extended logistic regression (HXLR) and their bootstrap sampling distributions for different predictands, models (see Table 2 for details), and lead time 48 h, aggregated over 10 European locations, and the selected locations Wien—Hohe-Warte and Paris—Orly, respectively. Positive values indicate improvements over HXLR.

Fig. 6.

Continuous ranked probability skill score (CRPSS) of HCLR relative to HXLR for different numbers of climatological quantiles as thresholds, Wien—Hohe-Warte, lead time 48 h, and the predictand variables (left) wind speed and (right) precipitation. Note that the scales in the two panels are different. If two or more quantiles are equal they are merged to one. The shaded areas show the 90% confidence intervals from bootstrapping.

Fig. 7.

Reliability diagrams for predicted probabilities to fall below the first climatological decile P(y ≤ q1 | x) for Wien—Hohe-Warte, lead time 48 h, and different models. Forecasts are aggregated in 0.1 probability intervals. Calibration functions for wind speed are plotted as red “×” and for precipitation amount as green “+” and are only shown for intervals with more than 10 forecasts. Refinement distributions for wind speed are plotted in the bottom-right corner in red and for precipitation in the top-left corner in green. The 95% consistency intervals derived from consistency resampling (Bröcker and Smith 2007) are shown as red and green shaded areas, respectively. Note that because of the frequent zero observations, q1 = q2 = ⋯ = q6 for precipitation, so that P(y ≤ q1 | x) = P(y ≤ q6 | x) = P(y = 0 | x).

Fig. 8.

As in Fig. 7, but for predicted probabilities to fall below the upper climatological decile P(y ≤ q9 | x).


Extending Extended Logistic Regression: Extended versus Separate versus Ordered versus Censored

  • 1 Institute of Meteorology and Geophysics, University of Innsbruck, Innsbruck, Austria
  • 2 Department of Earth and Atmospheric Sciences, Cornell University, Ithaca, New York
  • 3 Department of Statistics, Faculty of Economics and Statistics, University of Innsbruck, Innsbruck, Austria

Abstract

Extended logistic regression is a recent ensemble calibration method that extends logistic regression to provide full continuous probability distribution forecasts. It assumes conditional logistic distributions for the (transformed) predictand and fits these using selected predictand category probabilities. In this study extended logistic regression is compared to the closely related ordered and censored logistic regression models. Ordered logistic regression avoids the logistic distribution assumption but does not yield full probability distribution forecasts, whereas censored regression directly fits the full conditional predictive distributions. The performance of these and other ensemble postprocessing methods is tested on wind speed and precipitation data from several European locations and ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). Ordered logistic regression performed similarly to extended logistic regression for probability forecasts of discrete categories whereas full predictive distributions were better predicted by censored regression.

Denotes Open Access content.

Corresponding author address: Jakob W. Messner, Institute of Meteorology and Geophysics, University of Innsbruck, Innrain 52f, 6020 Innsbruck, Austria. E-mail: jakob.messner@uibk.ac.at

1. Introduction

Important applications such as severe weather warnings or decision making in agriculture, industry, and finance strongly demand accurate weather forecasts. Usually numerical weather prediction (NWP) models are used to provide these weather forecasts. Unfortunately, because of the only roughly known current state of the atmosphere and unknown or unresolved physical processes, NWP models are always subject to error. To estimate these errors many forecasting centers nowadays provide ensemble forecasts. These are several NWP forecasts with perturbed initial conditions and/or different model formulations. However, the perturbed initial conditions do not necessarily represent initial condition uncertainty (Hamill et al. 2003; Wang and Bishop 2003) and some structural deficiencies in the models are also not accounted for. Thus, the ensemble forecasts usually do not represent the full uncertainty of NWP models. Ensemble forecasts therefore typically need to be statistically postprocessed to achieve well-calibrated probabilistic forecasts.

In the past decade a variety of different ensemble postprocessing methods have been proposed. Examples are ensemble dressing (Roulston and Smith 2003), Bayesian model averaging (Raftery et al. 2005), heteroscedastic linear regression (Gneiting et al. 2005), and logistic regression (Hamill et al. 2004). Comparisons of these and other postprocessing methods (Wilks 2006; Wilks and Hamill 2007) showed that logistic regression performs relatively well. Recently, Wilks (2009) extended logistic regression by including the (transformed) predictand thresholds as an additional predictor variable. In addition to requiring fewer coefficients and providing coherent probabilistic forecasts, this extended logistic regression allows derivation of full continuous predictive distributions. Extended logistic regression has been used frequently (Schmeits and Kok 2010; Ruiz and Saulo 2012; Roulin and Vannitsem 2012; Hamill 2012; Ben Bouallègue 2013; Scheuerer 2014; Messner et al. 2014) and has been further extended to additionally account for conditional heteroscedasticity (Messner et al. 2014). Recently, several studies noted that extended logistic regression assumes a conditional logistic distribution for the transformed predictand (Scheuerer 2014; Schefzik et al. 2013; Messner et al. 2014), where this logistic distribution is fitted to selected predictand category probabilities.

In this study we compare (heteroscedastic) extended logistic regression with two closely related regression models from statistics that are particularly popular in econometrics (and more broadly in social sciences):

  1. (Heteroscedastic) ordered logistic regression also provides coherent forecasts of category probabilities. However, it differs from extended logistic regression in that no continuous distribution is assumed or specified by the model.
  2. (Heteroscedastic) censored regression also fits conditional logistic distributions to a transformed predictand but employs the full set of training-data points (as opposed to a set of thresholds) for fitting the model.

The performance of these statistical models is tested on wind speed and precipitation data from 10 European locations and ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). In addition to heteroscedastic extended, ordered, and censored logistic regression, we also test separate logistic regressions (Hamill et al. 2004) and, for wind speed forecasts, heteroscedastic truncated Gaussian regression (Thorarinsdottir and Gneiting 2010).

In section 2 the different statistical models are described in detail. A brief description of the data can be found in section 3. Finally, section 4 presents the results and section 5 provides a summary and discussion.

2. Statistical models

This section describes different statistical models to predict conditional probabilities P(y ≤ qj | x) of a continuous predictand y falling below a threshold qj, given a vector of predictor variables x = (1, x1, x2, …)T (i.e., NWP forecasts). Conditional category probabilities of y to fall between two thresholds qa and qb can then easily be derived with P(qa < y ≤ qb) = P(y ≤ qb | x) − P(y ≤ qa | x).

a. Separate logistic regressions (SLR)

Logistic regression was one of the first statistical methods that were proposed to postprocess ensemble forecasts (Hamill et al. 2004). Originally it is a regression model from the generalized linear model framework (Nelder and Wedderburn 1972) to model the probability of binary responses:
$$P(y \le q \mid \mathbf{x}) = \Lambda(\mathbf{x}^{\mathsf{T}}\boldsymbol{\beta}) \tag{1}$$
where β = (β0, β1, β2, …)T is a coefficient vector and Λ(·) = exp(·)/[1 + exp(·)] is notationally equivalent to the cumulative distribution function of the standard logistic distribution. The coefficient vector β is estimated by maximizing the log-likelihood
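$$\ell = \sum_{i=1}^{N} \log(\pi_i) \tag{2}$$
as a function of β as defined in Eq. (1), where N is the number of events in the dataset and πi is the predicted probability of the ith observed outcome:
$$\pi_i = \begin{cases} P(y \le q \mid \mathbf{x}_i), & y_i \le q \\ 1 - P(y \le q \mid \mathbf{x}_i), & y_i > q \end{cases} \tag{3}$$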
Often separate logistic regressions (i.e., with separate coefficient vectors β) are fitted for several thresholds qj of interest (e.g., Hamill et al. 2004; Wilks 2006; Wilks and Hamill 2007). This implies that the regression lines for different thresholds can cross, so that for some values of the predictor variables x, P(y ≤ qa | x) > P(y ≤ qb | x) although qa < qb, which leads to a nonsensical negative probability for y to fall between qa and qb.
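
The crossing problem can be illustrated with a minimal Python sketch. The coefficient values below are invented for illustration and are not estimates from this study; the sign convention P(y ≤ q | x) = Λ(β0 + β1 x1) and the helper names are likewise assumptions.

```python
import math

def logistic_cdf(t):
    """Standard logistic CDF, Lambda(t) = exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))

def slr_prob(beta, x1):
    """P(y <= q | x) from one separately fitted logistic regression."""
    b0, b1 = beta
    return logistic_cdf(b0 + b1 * x1)

# Hypothetical coefficients for two thresholds q_a < q_b (illustrative only).
beta_a = (1.0, -0.5)   # threshold q_a
beta_b = (3.0, -1.5)   # threshold q_b; different slope, so the lines cross at x1 = 2

def interval_prob(x1):
    """P(q_a < y <= q_b | x) as the difference of the two cumulative probabilities."""
    return slr_prob(beta_b, x1) - slr_prob(beta_a, x1)

for x1 in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"x1 = {x1:.1f}  P(q_a < y <= q_b | x) = {interval_prob(x1):+.3f}")
```

With these made-up coefficients the interval probability is positive for x1 < 2 and negative for x1 > 2, which is the inconsistency that a common slope for all thresholds rules out.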

b. Heteroscedastic extended logistic regression (HXLR)

To avoid these negative probabilities and to reduce the number of regression coefficients Wilks (2009) proposed to include a transformation of the predictand thresholds as an additional predictor variable in logistic regression:
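$$P(y \le q_j \mid \mathbf{x}) = \Lambda[\alpha g(q_j) - \mathbf{x}^{\mathsf{T}}\boldsymbol{\beta}] \tag{4}$$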
where α is an additional coefficient that has to be estimated and the transformation g( ) is a monotone function. Equation (4) also differs from standard logistic regression, where β is estimated separately for each threshold, in that here β is the same for all thresholds. Thus, one interpretation of Eq. (4) is that it defines parallel regression lines in log-odds space with equal slope but different intercepts [θj = αg(qj) − β0]. Figure 1 shows examples of these regression curves schematically.
Fig. 1.

Schematic figure of regression lines fitted with extended, ordered, or censored logistic regression with one predictor variable x1 and J = 3 thresholds (q1, q2, q3).

Citation: Monthly Weather Review 142, 8; 10.1175/MWR-D-13-00355.1

Extended logistic regression not only avoids the problem of crossing regression lines but also allows for computing probabilities for any threshold value qj (and not only the thresholds employed for estimating the model). In other words, Eq. (4) can also be interpreted as a cumulative distribution function that describes a full continuous predictive distribution. After some reformulation (see Messner et al. 2014), Eq. (4) can also be written as
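$$P[g(y) \le g(q_j) \mid \mathbf{x}] = \Lambda\left[\frac{g(q_j) - \mathbf{x}^{\mathsf{T}}\boldsymbol{\beta}/\alpha}{1/\alpha}\right] \tag{5}$$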
which shows that the predictive distribution of the transformed predictand g(y) is a logistic distribution with location parameter xTβ/α and scale parameter 1/α. Thus, the transformation g( ) must be chosen such that the transformed predictand can be assumed to follow a conditional (on the predictors x) logistic distribution.
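To effectively utilize uncertainty information contained in the ensemble spread, Messner et al. (2014) proposed to use additional predictor variables z = (1, z1, z2, …)T (e.g., the ensemble spread) to directly control the dispersion (variance) of the logistic predictive distribution:
$$P(y \le q_j \mid \mathbf{x}) = \Lambda\left[\frac{g(q_j) - \mathbf{x}^{\mathsf{T}}\boldsymbol{\gamma}}{\exp(\mathbf{z}^{\mathsf{T}}\boldsymbol{\delta})}\right] \tag{6}$$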
where γ = (γ0, γ1, γ2, …)T and δ = (δ0, δ1, δ2, …)T are the coefficient vectors that have to be estimated. The exponential function is used as a simple method to ensure positive values (Messner et al. 2014).
The coefficient vectors γ and δ are also estimated by maximizing the log-likelihood function given by Eq. (2). However, the probability of the observed outcome for the multicategorical predictand is
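$$\pi_i = \begin{cases} P(y \le q_1 \mid \mathbf{x}_i), & y_i \le q_1 \\ P(y \le q_j \mid \mathbf{x}_i) - P(y \le q_{j-1} \mid \mathbf{x}_i), & q_{j-1} < y_i \le q_j \\ 1 - P(y \le q_J \mid \mathbf{x}_i), & q_J < y_i \end{cases} \tag{7}$$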
(Messner et al. 2014), where J is the number of thresholds qj that have been selected for the fitting calculation.
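
As a rough illustration of how the heteroscedastic model yields coherent category probabilities, the following Python sketch evaluates a logistic predictive distribution with location xTγ and scale exp(zTδ) and derives the probabilities of the J + 1 intervals. All numerical values, the function names, and the square root default for g( ) are assumptions made for the example, not fitted results.

```python
import bisect
import math

def logistic_cdf(t):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-t))

def hxlr_cdf(q, x, z, gamma, delta, g=math.sqrt):
    """P(y <= q | x) = Lambda([g(q) - x'gamma] / exp(z'delta))."""
    location = sum(xi * gi for xi, gi in zip(x, gamma))
    scale = math.exp(sum(zi * di for zi, di in zip(z, delta)))
    return logistic_cdf((g(q) - location) / scale)

def category_probs(thresholds, x, z, gamma, delta):
    """Probabilities of the J + 1 intervals defined by J ordered thresholds."""
    cum = [hxlr_cdf(q, x, z, gamma, delta) for q in thresholds]
    bounds = [0.0] + cum + [1.0]
    return [hi - lo for lo, hi in zip(bounds[:-1], bounds[1:])]

def observed_category_prob(y, thresholds, x, z, gamma, delta):
    """pi_i: probability assigned to the interval that contains the observation y."""
    k = bisect.bisect_left(list(thresholds), y)  # y <= q_1 gives 0, y > q_J gives J
    return category_probs(thresholds, x, z, gamma, delta)[k]

# Hypothetical predictors and coefficients (illustrative only).
x = (1.0, 2.0)            # (1, ensemble mean of transformed forecasts)
z = (1.0, 0.4)            # (1, ensemble standard deviation)
gamma = (0.3, 0.8)
delta = (-1.0, 0.5)
thresholds = (1.0, 4.0, 9.0)

probs = category_probs(thresholds, x, z, gamma, delta)
print([round(p, 3) for p in probs])
```

Because every cumulative probability comes from the same distribution function, the interval probabilities are nonnegative and sum to one, and `hxlr_cdf` can be evaluated at thresholds that were not used for fitting.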

c. Heteroscedastic ordered logistic regression (HOLR)

Ordered logistic regression—also known as ordered logit, proportional odds logistic regression, or cumulative link model—is a popular regression model from statistics and econometrics for ordinal data, which has not received much attention in meteorology so far. Like extended logistic regression it is an extension of standard logistic regression for multicategorical and ordered predictands. Different from extended logistic regression, separate intercepts θj are fitted for each selected threshold instead of modeling them as a linear function of the (transformed) thresholds:
$$P(y \le q_j \mid \mathbf{x}) = \Lambda(\theta_j - \mathbf{x}^{\mathsf{T}}\boldsymbol{\beta}) \tag{8}$$
where the estimated separate intercepts θj are only constrained to be ordered (θ1 ≤ θ2 ≤ ⋯ ≤ θJ) for ordered thresholds qj. Because the intercepts of the regression lines are fully determined by θj, further intercepts are no longer needed, so that x = (x1, x2, …)T must not contain any constant. Similar to extended logistic regression, β is the same for all thresholds.

The separate intercepts for each threshold imply the estimation of more coefficients than for extended logistic regression. Furthermore, only the probabilities for the thresholds qj employed in the estimation can be derived, so that Eq. (8) does not specify full continuous predictive distributions. In return, ordered logistic regression does not assume a continuous distribution for the transformed predictand. Thus, no (possibly nonexistent) transformation has to be determined to fulfill this assumption.

Similar to heteroscedastic extended logistic regression, a heteroscedastic version of ordered logistic regression also allows control of the scale (variance) of an underlying latent distribution with additional predictor variables (Agresti 2002):
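$$P(y \le q_j \mid \mathbf{x}) = \Lambda\left[\frac{\theta_j - \mathbf{x}^{\mathsf{T}}\boldsymbol{\gamma}}{\exp(\mathbf{z}^{\mathsf{T}}\boldsymbol{\delta})}\right] \tag{9}$$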
Also, note that here no constant is needed in z = (z1, z2, …)T.

Maximum likelihood estimation with the same log-likelihood function as for extended logistic regression [Eqs. (2) and (7)] is used to estimate the coefficients θj, γ, and δ.
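
The ordered model can be sketched in the same way. In the following Python example the intercepts θj are free parameters that only have to be ordered, and neither x nor z contains a constant. The numbers and function names are hypothetical and serve only to show how the category log-likelihood terms are formed.

```python
import math

def logistic_cdf(t):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-t))

def holr_cum_probs(theta, x, z, gamma, delta):
    """P(y <= q_j | x) = Lambda([theta_j - x'gamma] / exp(z'delta)) for each threshold j."""
    if any(a > b for a, b in zip(theta, theta[1:])):
        raise ValueError("intercepts theta_j must be ordered")
    location = sum(xi * gi for xi, gi in zip(x, gamma))
    scale = math.exp(sum(zi * di for zi, di in zip(z, delta)))
    return [logistic_cdf((t - location) / scale) for t in theta]

def log_likelihood_term(category, cum):
    """log(pi_i), where category (0..J) indexes the interval containing y_i."""
    bounds = [0.0] + list(cum) + [1.0]
    return math.log(bounds[category + 1] - bounds[category])

# Hypothetical values; note that x and z carry no constant term here.
theta = (-1.0, 0.5, 2.0)
x, gamma = (2.0,), (0.8,)
z, delta = (0.4,), (0.5,)

cum = holr_cum_probs(theta, x, z, gamma, delta)
print([round(p, 3) for p in cum], round(log_likelihood_term(1, cum), 3))
```

Summing `log_likelihood_term` over all training cases gives the log-likelihood that would be maximized over θj, γ, and δ; the optimization itself is left out of this sketch.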

d. Heteroscedastic censored logistic regression (HCLR)

Above we have shown that extended logistic regression assumes a conditional logistic distribution for the transformed predictand. The maximum likelihood estimation with the log-likelihood function given by Eqs. (2) and (7) fits the selected category probabilities. However, if the predictand is given in continuous form, the model described by Eq. (6) can also be estimated with the log-likelihood function from Eq. (2) with
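$$\pi_i = \frac{1}{\exp(\mathbf{z}_i^{\mathsf{T}}\boldsymbol{\delta})}\, \lambda\left[\frac{g(y_i) - \mathbf{x}_i^{\mathsf{T}}\boldsymbol{\gamma}}{\exp(\mathbf{z}_i^{\mathsf{T}}\boldsymbol{\delta})}\right] \tag{10}$$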
where λ[·] denotes the likelihood function of the standard logistic distribution. The likelihood is notationally identical to the probability density function [i.e., the derivative of Eq. (6) with respect to g(qj)], but differs because it is a function of the parameter vectors γ and δ for a fixed predictand value yi, rather than being a function of yi given fixed values for γ and δ. In this way, the πi employed for fitting the model are not the likelihoods for predictands falling into discrete intervals, but rather the likelihoods that they take on their exact observed values. This model can also be interpreted as a linear regression model with a (heteroscedastic) logistic error distribution.
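Nonnegative variables (e.g., wind speeds or precipitation amounts) are only continuous for positive values and have a natural threshold at 0. This nonnegativity can easily be accommodated using censored regression (first discussed by Tobin 1958, for the Gaussian case) where the πi are replaced by
$$\pi_i = \begin{cases} \Lambda\left[\dfrac{g(0) - \mathbf{x}_i^{\mathsf{T}}\boldsymbol{\gamma}}{\exp(\mathbf{z}_i^{\mathsf{T}}\boldsymbol{\delta})}\right], & y_i = 0 \\ \dfrac{1}{\exp(\mathbf{z}_i^{\mathsf{T}}\boldsymbol{\delta})}\, \lambda\left[\dfrac{g(y_i) - \mathbf{x}_i^{\mathsf{T}}\boldsymbol{\gamma}}{\exp(\mathbf{z}_i^{\mathsf{T}}\boldsymbol{\delta})}\right], & y_i > 0 \end{cases} \tag{11}$$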
in Eq. (2).
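
A minimal Python sketch of the censored log-likelihood may help to see the two kinds of contributions: zero observations contribute the probability mass at or below zero, and positive observations contribute the logistic density of the transformed value. The function names, the square root transformation, and the toy data are assumptions for illustration only.

```python
import math

def logistic_cdf(t):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-t))

def logistic_pdf(t):
    """Standard logistic density, the derivative of logistic_cdf."""
    p = logistic_cdf(t)
    return p * (1.0 - p)

def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def hclr_log_likelihood(y, X, Z, gamma, delta, g=math.sqrt):
    """Sum of log(pi_i) for a logistic model censored at zero."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        location = dot(xi, gamma)
        scale = math.exp(dot(zi, delta))
        if yi <= 0.0:
            # censored case: all probability mass at or below zero
            pi = logistic_cdf((g(0.0) - location) / scale)
        else:
            # uncensored case: density of g(y) at the observed value
            pi = logistic_pdf((g(yi) - location) / scale) / scale
        total += math.log(pi)
    return total

# Toy data (hypothetical): one dry and one wet case.
y = [0.0, 4.0]
X = [(1.0, 0.0), (1.0, 2.0)]
Z = [(1.0, 0.0), (1.0, 0.0)]
ll = hclr_log_likelihood(y, X, Z, gamma=(0.0, 1.0), delta=(0.0, 0.0))
print(round(ll, 4))
```

In practice this function would be handed to a numerical optimizer to estimate γ and δ; the crch R package cited in the references provides such a fit, and the sketch above is not its implementation.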

This heteroscedastic censored logistic regression fits a logistic error distribution with point mass at zero to the transformed predictand. While such an error distribution seems reasonable for square root transformed precipitation amounts (Scheuerer 2014; Schefzik et al. 2013), usually other error distributions are assumed for wind speed. For example Thorarinsdottir and Gneiting (2010) proposed to fit a truncated normal distribution to the untransformed wind speed. In this case, in Eqs. (6) and (10) the logistic distribution is replaced with a truncated normal distribution and g(y) is set to g(y) = y. Note that Thorarinsdottir and Gneiting (2010) also called this model heteroscedastic censored regression although actually the data are considered to be truncated and not censored. In the following we therefore denote this model as heteroscedastic truncated Gaussian regression (HTGR), which we also employ as benchmark model for wind speed.
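
To make the distinction between censoring and truncation concrete, the following Python sketch gives the density and distribution function of a normal distribution truncated at zero. Unlike the censored case there is no point mass at zero; the density is simply renormalized on the positive half line. The parameter names mu and sigma stand for the location and scale of the untruncated normal and are assumptions of this example.

```python
import math

def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def truncated_normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) truncated to y >= 0."""
    if y < 0.0:
        return 0.0
    return normal_pdf((y - mu) / sigma) / (sigma * normal_cdf(mu / sigma))

def truncated_normal_cdf(y, mu, sigma):
    """CDF of N(mu, sigma^2) truncated to y >= 0."""
    if y <= 0.0:
        return 0.0
    lower = normal_cdf(-mu / sigma)  # mass removed by the truncation
    return (normal_cdf((y - mu) / sigma) - lower) / (1.0 - lower)

print(round(truncated_normal_pdf(0.0, 0.0, 1.0), 4))
print(round(truncated_normal_cdf(3.0, 2.0, 1.5), 4))
```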

e. Comparison

Table 1 summarizes the major differences between the four different logistic regression models that were presented above. Extended logistic regression (XLR) and censored logistic regression (CLR) (and their heteroscedastic versions HXLR and HCLR, respectively) are essentially the same models and only differ in their parameter estimation. They have the fewest parameters of the compared models but imply continuous distribution assumptions. Ordered logistic regression (OLR) and its heteroscedastic version (HOLR) avoid this continuous distribution assumption but require estimation of more coefficients than (H)XLR and (H)CLR. With its unconstrained slope estimates, separate logistic regression (SLR) is more flexible than OLR but requires estimation of even more coefficients. Figure 1 shows schematic parallel regression lines for XLR, CLR, or OLR. In contrast to these models, regression curves from SLR would not be constrained to be parallel and so could potentially cross, which would lead to nonsensical negative probabilities.

Table 1.

Overview of the different logistic regression models with respect to their parameterization and the likelihood. Here K is the number of predictor variables (x1, x2, …, z1, z2, …) and J is the number of thresholds qj.

3. Data

To compare the presented ensemble postprocessing methods, we used 10-m wind speed observations (10-min average) and 24-h accumulated precipitation amount from the 10 European weather stations: Wien—Hohe-Warte, Austria (48.249°N, 16.356°E); Paris—Orly, France (48.717°N, 2.383°E); Amsterdam—Schiphol, Netherlands (52.3°N, 4.783°E); Berlin—Tegel, Germany (52.55°N, 13.3°E); Brussels—National, Belgium (50.9°N, 4.533°E); Frankfurt—Main, Germany (50.033°N, 8.583°E); London—Heathrow, United Kingdom (51.467°N, −0.45°E); Lisbon—Geof, Portugal (38.767°N, −9.133°E); Madrid—Barajas, Spain (40.467°N, −3.55°E); and Rome—Fiumicino, Italy (41.8°N, 12.233°E). As input for the statistical models, 10-m wind speed and total precipitation ensemble forecasts from the ECMWF were linearly interpolated from neighboring grid points to the station locations. The data were available from April 2010 to December 2012 (approximately 1000 days) and separate models were fitted for the lead times 24, 48, and 96 h, respectively.

Since the predictands were square root transformed for most regression models (see section 4) we mainly used the mean and standard deviation of square root transformed ensemble forecasts as predictor variables. For HTGR the untransformed predictand is used, following Thorarinsdottir and Gneiting (2010). Consequently we employed the mean and standard deviation of the untransformed ensemble forecasts as input for this model.
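
The predictor variables described above can be computed from a single ensemble forecast as in the following Python sketch. The member values are invented, and the use of the sample standard deviation (denominator n − 1) is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics

def ensemble_predictors(members):
    """Mean and standard deviation of square root transformed and raw ensemble members."""
    roots = [math.sqrt(m) for m in members]
    return {
        "M": statistics.mean(roots),      # mean of sqrt-transformed members
        "S": statistics.stdev(roots),     # spread of sqrt-transformed members
        "Mr": statistics.mean(members),   # mean of untransformed members (for HTGR)
        "Sr": statistics.stdev(members),  # spread of untransformed members (for HTGR)
    }

# Hypothetical four-member ensemble of nonnegative forecasts.
predictors = ensemble_predictors([0.0, 1.0, 4.0, 9.0])
print(predictors)
```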

As thresholds qj we defined J = 9 climatological deciles that are estimated for each location and predictand variable separately. Note that for precipitation several deciles are 0 and are merged to one threshold so that the effective number of thresholds is smaller (e.g., J = 4 in Wien—Hohe-Warte and J = 5 in Paris—Orly for precipitation).
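
The merging of equal deciles can be sketched as follows. The example uses `statistics.quantiles` from the Python standard library with its default interpolation method, which is an assumption because the quantile estimator used in the study is not specified; the observation values are made up.

```python
import statistics

def climatological_thresholds(observations, n=10):
    """Return the n - 1 climatological quantiles with equal values merged to one."""
    cuts = statistics.quantiles(observations, n=n)
    return sorted(set(cuts))

# Hypothetical precipitation record with many dry days (12 of 20 values are zero).
precip = [0.0] * 12 + [0.2, 0.5, 1.0, 1.5, 2.3, 4.0, 7.1, 12.4]
thresholds = climatological_thresholds(precip)
print(len(thresholds), thresholds)
```

For this toy record the five lowest deciles are all zero and collapse into a single threshold, so that J = 5 effective thresholds remain.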

We found the ensemble standard deviation to improve the forecasts of all statistical models, indicating useful spread–skill relationships. Therefore, we only show results for the heteroscedastic models in the following. For separate logistic regressions the product of ensemble mean and spread is included as additional predictor variable (Wilks and Hamill 2007). Table 2 lists the different models that are compared in the following in detail.

Table 2.

List of different statistical models. Here g(y) is the transformation, x are vectors of predictor variables for the location (mean), and z are predictor variables for the scale (variance). The M and S are the mean and standard deviation of square root transformed ensemble forecasts, respectively; and Mr and Sr are the mean and standard deviation of the untransformed ensemble forecasts, respectively. For wind speed forecasts M, S, Mr, and Sr are derived from 10-m wind speed ensemble forecasts and for precipitation forecasts M and S are derived from total precipitation ensemble forecasts.

4. Results

Before comparing the performance of the different ensemble postprocessing methods we show how ordered logistic regression can be used to determine appropriate transformations g( ) for extended logistic regression. The crosses and plus signs in Fig. 2 show the fitted intercepts from ordered logistic regression (HOLR) for two predictands and two selected locations. For both locations and variables these plots suggest that the intercepts can be parameterized as being proportional to the square roots of the thresholds. Thus, we fitted HXLR models with g(y) = √y and added the corresponding HXLR intercept functions as curves in Fig. 2. For both predictand variables and locations the HXLR intercept functions fit the HOLR intercepts reasonably well. Note that similar figures can also be used to compare the intercepts of extended logistic regression with those of separate logistic regression (e.g., Ruiz and Saulo 2012). However, the varying slope coefficients then complicate the comparison.

Fig. 2.

Intercepts θj from heteroscedastic ordered (HOLR) and extended logistic regression (HXLR) relative to threshold values, for the locations Wien—Hohe-Warte and Paris—Orly, lead time 48 h, and the predictands (left) wind speed and (right) 24-h accumulated precipitation amount. For better comparability intercepts are normalized with β1, respectively. The square root is used as transformation for HXLR.

Figure 2 already suggests that HXLR and HOLR predict similarly well. In the following we compare these and the other statistical models more thoroughly. Because all models provide probabilistic forecasts for discrete intervals we mainly employ the ranked probability score (RPS; Epstein 1969; Wilks 2011) to characterize forecast accuracy:
$$\mathrm{RPS} = \sum_{j=1}^{J} \left[ P(y \le q_j \mid x) - I(y \le q_j) \right]^2, \tag{12}$$
where J is the number of thresholds and I(·) is the indicator function. For each model, forecast location, and lead time we applied 10-fold cross validation to get independent training and test datasets. To this end, the data were divided into 10 equally sized blocks, and for each block the RPS was computed for the models trained on the remaining 9 blocks. Consequently, the effective training data length is 9/10 of the full dataset (approximately 900 days).
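To estimate the sampling distribution of the average RPS we computed the means of 250 bootstrap samples. To compare the models with a reference model we finally computed ranked probability skill scores (RPSS):
$$\mathrm{RPSS} = 1 - \frac{\overline{\mathrm{RPS}}}{\overline{\mathrm{RPS}}_{\mathrm{ref}}}, \tag{13}$$
where $\overline{\mathrm{RPS}}_{\mathrm{ref}}$ is the average RPS of appropriate reference forecasts.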
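
As an illustration of the verification procedure described above, the following minimal Python sketch computes the RPS of a single forecast as the sum of squared differences between the forecast cumulative probabilities and the indicator of the observation, and derives a bootstrap distribution of the RPSS from per-forecast scores. The function names `rps` and `bootstrap_rpss` are hypothetical, and the paired resampling of model and reference scores is an assumption; the text does not specify how the bootstrap samples were drawn, and the original computations were carried out in R.

```python
import random

def rps(cum_probs, thresholds, obs):
    """Ranked probability score of a single forecast.

    cum_probs[j] is the forecast probability P(y <= q_j | x),
    thresholds[j] is q_j, and obs is the verifying observation."""
    return sum((p - (1.0 if obs <= q else 0.0)) ** 2
               for p, q in zip(cum_probs, thresholds))

def bootstrap_rpss(rps_model, rps_ref, n_boot=250, seed=1):
    """Bootstrap distribution of RPSS = 1 - mean(RPS) / mean(RPS_ref).

    Assumption: model and reference scores are resampled as pairs."""
    rng = random.Random(seed)
    n = len(rps_model)
    skill = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_model = sum(rps_model[i] for i in idx) / n
        mean_ref = sum(rps_ref[i] for i in idx) / n
        skill.append(1.0 - mean_model / mean_ref)
    return skill

# Toy example: three thresholds and an observation of 2.5,
# so the indicator vector is (0, 0, 1).
thresholds = [1.0, 2.0, 3.0]
score = rps([0.2, 0.5, 0.9], thresholds, obs=2.5)  # 0.2^2 + 0.5^2 + 0.1^2
```

Positive RPSS values then indicate an improvement over the reference, and the spread of the 250 bootstrap values gives an impression of the sampling uncertainty.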

Figure 3 shows the RPSS relative to HXLR for different models, lead times, locations, and predictand variables. HOLR performs equally well or slightly better than HXLR for all locations, lead times, and predictand variables. For precipitation in Paris the forecasts of HXLR and HOLR are nearly identical, which is consistent with Fig. 2, where the HXLR intercept function almost perfectly interpolates the HOLR intercepts. SLR generally performs worse than HXLR; exceptions are wind speed forecasts in Wien for 24- and 96-h lead times and precipitation forecasts in Paris for 24-h lead time. However, note that the RPS [Eq. (12)] does not penalize the partly inconsistent forecasts from SLR. HCLR and HTGR also tend to perform worse than HXLR, especially for wind speed. While HTGR is slightly better than HCLR for Paris, there is no clear preference for either model in Wien or for the aggregated locations. For Fig. 3 we used nine climatological deciles as thresholds qj for estimation and verification. We also tested other numbers of climatological quantiles. However, apart from SLR and HOLR reaching slightly better skill for fewer quantiles, the results are very similar and are therefore not shown.
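
The construction of the thresholds from climatological quantiles can be sketched as follows. This is only an illustration under stated assumptions: the function name `climatological_thresholds` is hypothetical, the quantiles are obtained by linear interpolation between order statistics (one common convention; the text does not say which was used), and the data in the example are synthetic. Equal quantiles are merged into one threshold, which is why fewer than nine thresholds remain when many observations are zero, as for precipitation.

```python
def climatological_thresholds(observations, n_quantiles=9):
    """Distinct climatological quantiles to be used as thresholds q_j.

    n_quantiles=9 yields the deciles (10%, 20%, ..., 90%)."""
    data = sorted(observations)
    n = len(data)
    thresholds = []
    for k in range(1, n_quantiles + 1):
        p = k / (n_quantiles + 1)
        # Linear interpolation between neighboring order statistics
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        q = data[lo] + (pos - lo) * (data[hi] - data[lo])
        # Quantiles equal to the previous threshold are merged into one
        if not thresholds or q > thresholds[-1]:
            thresholds.append(q)
    return thresholds

# Synthetic "precipitation" sample: 60 dry days and 40 wet days
precip = [0.0] * 60 + [float(i) for i in range(1, 41)]
q_precip = climatological_thresholds(precip)

# Synthetic sample without ties keeps all nine deciles
q_wind = climatological_thresholds([float(i) for i in range(1, 101)])
```

In the synthetic precipitation sample the five lowest deciles coincide at zero, so that only five distinct thresholds remain.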

Fig. 3.

Ranked probability skill score (RPSS) relative to heteroscedastic extended logistic regression (HXLR) of (left) wind speed and (right) 24-h accumulated precipitation amount for different models (see Table 2 for details) aggregated over 10 European locations (see text for details) and the selected locations Wien–Hohe-Warte and Paris–Orly. Nine climatological deciles that were computed separately for each forecast location are used as thresholds. Because several thresholds are 0 for precipitation, the effective number of deciles is smaller (e.g., 4 in Wien and 5 in Paris). The effective training data length is approximately 900 days. Positive values indicate improvements over HXLR. The solid circles mark the median and the boxes the interquartile ranges of the 250 values from the bootstrapping approach, the whiskers show the most extreme values that are less than 1.5 times the length of the box away from the box, and empty circles are plotted for values that are outside the whiskers.

Because the statistical models differ considerably in their number of estimated coefficients (SLR: 3J; HOLR: 2 + J; HXLR, HCLR, and HTGR: 4), it is also interesting to compare their performance for different training data lengths. Figure 4 shows RPSS for wind speed and precipitation forecasts for 48-h lead time at Wien and Paris, relative to the raw ensemble interval relative frequencies. As for Fig. 3, the RPS are computed with 10-fold cross validation, but for each test sample only a subset of the remaining data is used for training. Almost all models lose skill with a reduced training dataset. SLR, which has the largest number of coefficients, clearly loses the most. In contrast, HOLR generally exhibits skill reductions comparable to those of HXLR and HCLR in response to decreasing training data length, although more parameters have to be estimated. Interestingly, for wind speed in Paris the skill of HCLR seems not to depend on the training data length, so that HCLR is superior to the other models for short training datasets.
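
The experimental setup of this paragraph can be sketched in a few lines of Python. The names `n_coefficients` and `blocked_cv_splits` are hypothetical. The coefficient counts are those quoted in the text; how the training subset was selected from the remaining data is not stated, so taking the first `train_size` remaining cases is purely an assumption made for illustration.

```python
def n_coefficients(J):
    """Number of estimated coefficients per model for J thresholds."""
    return {"SLR": 3 * J, "HOLR": 2 + J, "HXLR": 4, "HCLR": 4, "HTGR": 4}

def blocked_cv_splits(n, n_folds=10, train_size=None):
    """Yield (train_idx, test_idx) for contiguous, equally sized blocks.

    If train_size is given, only that many of the remaining cases are
    kept for training (assumption: simply the first ones)."""
    bounds = [round(i * n / n_folds) for i in range(n_folds + 1)]
    for f in range(n_folds):
        test_idx = list(range(bounds[f], bounds[f + 1]))
        train_idx = [i for i in range(n)
                     if i < bounds[f] or i >= bounds[f + 1]]
        if train_size is not None:
            train_idx = train_idx[:train_size]
        yield train_idx, test_idx

# With nine deciles as thresholds SLR needs 27 coefficients,
# HOLR 11, and the remaining models 4 each.
counts = n_coefficients(9)

# Full 10-fold cross validation and a variant with reduced training data
full = list(blocked_cv_splits(1000))
reduced = list(blocked_cv_splits(1000, train_size=300))
```

With 1000 cases each test block contains 100 cases and the full training set 900, which corresponds to the effective training data length mentioned for Fig. 3.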

Fig. 4.

Ranked probability skill score (RPSS) relative to the raw ensemble (ensemble relative frequencies within each interval) for different training data lengths and models (see Table 2 for details) and lead time 48 h.

HCLR basically fits the same model as HXLR, with the only difference being that the estimated model parameters optimize either the selected category probabilities (HXLR) or the continuous predictive distribution (HCLR). Since the RPS only measures the quality of the selected category probabilities the better RPS of HXLR in Fig. 3 is not surprising. To compare also the quality of the full predictive distributions we therefore employ the continuous ranked probability score (CRPS; Matheson and Winkler 1976; Hersbach 2000; Wilks 2011) that generalizes the RPS to full predictive distributions:
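$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[ P(y \le q \mid x) - I(y \le q) \right]^2 \, dq. \tag{14}$$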
Rather than deriving closed-form expressions for the integral in Eq. (14), we evaluated it numerically. Figure 5 shows the continuous ranked probability skill score (CRPSS) relative to HXLR for lead time 48 h. Results for the other lead times are very similar and are therefore not shown. In contrast to the RPSS (Fig. 3), the CRPSS clearly favors HCLR for both predictand variables.
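
A numerical evaluation of the CRPS can be sketched as follows. This is a simplified illustration, not the procedure actually used for the results: it assumes an uncensored logistic predictive distribution with hypothetical location `mu` and scale `sigma`, a finite integration range that covers practically all of the probability mass, and the trapezoidal rule. The integral is split at the observation because the indicator function jumps there.

```python
import math

def logistic_cdf(t, mu, sigma):
    """CDF of a logistic distribution with location mu and scale sigma."""
    z = (t - mu) / sigma
    # Evaluate in a form that avoids overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def trapezoid(f, a, b, n_steps):
    """Composite trapezoidal rule for f on [a, b]."""
    h = (b - a) / n_steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n_steps):
        total += f(a + i * h)
    return total * h

def crps_numeric(cdf, obs, lower, upper, n_steps=5000):
    """Approximate the integral of [F(q) - I(obs <= q)]^2 over [lower, upper].

    Requires lower < obs < upper. Below obs the indicator is 0,
    above obs it is 1."""
    below = trapezoid(lambda q: cdf(q) ** 2, lower, obs, n_steps)
    above = trapezoid(lambda q: (1.0 - cdf(q)) ** 2, obs, upper, n_steps)
    return below + above

# Example: predictive distribution with mu = 1.5, sigma = 0.8, observation 2.3
crps_value = crps_numeric(lambda q: logistic_cdf(q, 1.5, 0.8),
                          obs=2.3, lower=-40.0, upper=40.0)
```

For a censored or truncated predictive distribution only `cdf` would have to be exchanged; the integration itself stays the same.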
Fig. 5.

Continuous ranked probability skill score (CRPSS) relative to heteroscedastic extended logistic regression (HXLR) and their bootstrap sampling distributions for different predictands, models (see Table 2 for details), and lead time 48 h, aggregated over 10 European locations, and the selected locations Wien—Hohe-Warte and Paris—Orly, respectively. Positive values indicate improvements over HXLR.

For wind speed, Fig. 5 also shows the CRPSS for HTGR. As in Fig. 3 HCLR and HTGR show similar CRPSS for Wien while HTGR is slightly preferred for Paris and the aggregated locations, which could indicate that the real error distribution is better estimated by a truncated normal than by a censored transformed logistic distribution.

Since HXLR fits the selected category probabilities, it is also of interest how the choice of the thresholds that define these categories affects the quality of the predictive distribution. Figure 6 shows the CRPSS of HCLR relative to HXLR for different numbers of climatological quantiles that are used to fit HXLR. Since HCLR could also be interpreted as HXLR with infinitesimal category intervals, it is not surprising that the CRPS of HXLR and HCLR become more similar with higher numbers of thresholds. Although the patterns look similar for both predictand variables, precipitation forecasts lose much more skill when only a few thresholds are used.

Fig. 6.

Continuous ranked probability skill score (CRPSS) of HCLR relative to HXLR for different numbers of climatological quantiles as thresholds, Wien—Hohe-Warte, lead time 48 h, and the predictand variables (left) wind speed and (right) precipitation. Note that the scales in the two panels are different. If two or more quantiles are equal they are merged to one. The shaded areas show the 90% confidence intervals from bootstrapping.

Finally, Figs. 7 and 8 show reliability diagrams (e.g., Wilks 2011) for the lower and upper climatological deciles, respectively, for 48-h lead time at Wien. With few exceptions the observed conditional relative frequencies of both predictand variables lie within the 95% consistency intervals (Bröcker and Smith 2007), with only minor differences between the different statistical models. The refinement distributions in Figs. 7 and 8 show the frequencies of the predicted probabilities. As for the calibration functions, the refinement distributions differ only slightly between the models. Only for zero precipitation do SLR and HOLR provide slightly sharper forecasts than HXLR and HCLR (i.e., predicted probabilities are more frequently close to 0 and 1).
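
The two ingredients of such a reliability diagram, the calibration function and the refinement distribution, can be computed along the following lines. The sketch follows the settings given in the caption of Fig. 7 (probability intervals of width 0.1, calibration values only for intervals with more than 10 forecasts); the function name `reliability_table` and the toy data are hypothetical, and the consistency intervals of Bröcker and Smith (2007) are not included.

```python
def reliability_table(probs, outcomes, n_bins=10, min_count=10):
    """Per-bin forecast counts (refinement distribution) and observed
    relative frequencies (calibration function).

    probs are predicted event probabilities, outcomes are 0/1 flags
    telling whether the event occurred."""
    counts = [0] * n_bins
    hits = [0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the last bin
        counts[b] += 1
        hits[b] += 1 if o else 0
    table = []
    for b in range(n_bins):
        # Observed frequency only for bins with more than min_count forecasts
        obs_freq = hits[b] / counts[b] if counts[b] > min_count else None
        table.append({"bin": (b / n_bins, (b + 1) / n_bins),
                      "count": counts[b],
                      "obs_freq": obs_freq})
    return table

# Toy data: 20 forecasts of 0.05 (event occurred once),
# 20 forecasts of 0.95 (event occurred 18 times), 5 forecasts of 0.55
probs = [0.05] * 20 + [0.95] * 20 + [0.55] * 5
outcomes = [0] * 19 + [1] + [1] * 18 + [0] * 2 + [1, 0, 1, 0, 1]
table = reliability_table(probs, outcomes)
```

For perfectly reliable forecasts the observed relative frequency in each bin would match the predicted probabilities of that bin apart from sampling variability.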

Fig. 7.

Reliability diagrams for predicted probabilities to fall below the first climatological decile P(y ≤ q1 | x) for Wien–Hohe-Warte, lead time 48 h, and different models. Forecasts are aggregated in 0.1 probability intervals. Calibration functions for wind speed are plotted as red “×” and for precipitation amount as green “+” and are only shown for intervals with more than 10 forecasts. Refinement distributions for wind speed are plotted in the bottom-right corner in red and for precipitation in the top-left corner in green. The 95% consistency intervals derived from consistency resampling (Bröcker and Smith 2007) are shown as red and green shaded areas, respectively. Note that because of the frequent zero observations q1 = q2 = ⋯ = q6 for precipitation, so that P(y ≤ q1 | x) = P(y ≤ q6 | x) = P(y = 0 | x).

Fig. 8.

As in Fig. 7, but for predicted probabilities to fall below the upper climatological decile P(y ≤ q9 | x).

5. Summary and conclusions

Extended logistic regression fits predictand category probabilities by assuming a conditional logistic distribution for the transformed predictand (Scheuerer 2014; Schefzik et al. 2013; Messner et al. 2014). However, for some applications the transformed predictand cannot be assumed to follow a logistic distribution. Moreover, fitting selected category probabilities implies disregarding available information when the predictand is actually given in continuous form.

In this study we compared extended logistic regression with two closely related regression models from statistics and econometrics. Ordered logistic regression is very similar to extended logistic regression but avoids a continuous distribution assumption. On the other hand, censored logistic regression fits the same model as extended logistic regression but uses each individual predictand value in the training dataset instead of the selected category probabilities. As further benchmark models we also employed separate logistic regressions and a truncated Gaussian regression model (Thorarinsdottir and Gneiting 2010). The performance of the different statistical models was tested with wind speed and precipitation data from 10 European locations and ensemble forecasts from the ECMWF. Overall, the logistic distribution assumption seemed to be quite appropriate for the square root–transformed predictands for both predictand variables. Thus, the performance differences between ordered and extended logistic regression were only minor. However, because no continuous distribution has to be assumed, ordered logistic regression should generally be preferred if solely threshold probabilities are required.

Since extended logistic regression fits selected category probabilities, it is actually not surprising that RPS skills are higher for this model than for censored logistic regression, which fits the full continuous predictive distribution. For the same reason it is unsurprising that censored logistic regression performed better than extended logistic regression according to CRPS skill, which evaluates accuracy of the full predictive distributions.

Extended and censored logistic regression assume censored conditional logistic distributions for the transformed predictand. In contrast, wind speed was assumed to follow a truncated normal distribution in Thorarinsdottir and Gneiting (2010). A comparison between censored and truncated regression models showed that the assumption of a truncated normal distribution resulted in slightly better wind speed forecasts than the assumption of a censored transformed logistic distribution.

Our results show that the optimal statistical model strongly depends on the intended application. Ordered logistic regression was best suited for category probability predictions for the forecasts considered here, given sufficiently long training series. When the transformed predictand can be assumed to follow a conditional logistic distribution then extended logistic regression provides equally good category probability forecasts while requiring fewer coefficients and additionally specifying full predictive distributions. However, if the primary interest is in predicting full continuous probability distributions, censored or truncated regression models should be preferred because they use the information contained in the training data more fully.

Acknowledgments

We thank three anonymous reviewers for their valuable comments to improve this manuscript. This study was supported by the Austrian Science Fund (FWF): L615-N10. The first author was also supported by a Ph.D. scholarship from the University of Innsbruck, Vizerektorat für Forschung. Data from the ECMWF forecasting system were obtained from the ECMWF Data Server.

APPENDIX

Computational Details

Our results were obtained on Ubuntu Linux using the statistical software R 2.15.2 (R Core Team 2013). Heteroscedastic extended logistic regression and heteroscedastic censored logistic regression were fitted using the package crch 0.1-0 (Messner and Zeileis 2013). For ordered logistic regression models we used the package ordinal 2012.09-11 (Christensen 2013).

REFERENCES

  • Agresti, A., 2002: Categorical Data Analysis. 2nd ed. John Wiley & Sons, 734 pp.

  • Ben Bouallègue, Z., 2013: Calibrated short-range ensemble precipitation forecasts using extended logistic regression with interaction terms. Wea. Forecasting, 28, 515–524, doi:10.1175/WAF-D-12-00062.1.

  • Bröcker, J., and L. A. Smith, 2007: Increasing the reliability of reliability diagrams. Wea. Forecasting, 22, 651–661, doi:10.1175/WAF993.1.

  • Christensen, R. H. B., 2013: ordinal: Regression Models for Ordinal Data, version 2013.09-30. R package. [Available online at http://CRAN.R-project.org/package=ordinal.]

  • Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8, 985–987, doi:10.1175/1520-0450(1969)008<0985:ASSFPF>2.0.CO;2.

  • Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, doi:10.1175/MWR2904.1.

  • Hamill, T. M., 2012: Verification of TIGGE multimodel and ECMWF reforecast-calibrated probabilistic precipitation forecasts over the contiguous United States. Mon. Wea. Rev., 140, 2232–2252, doi:10.1175/MWR-D-11-00220.1.

  • Hamill, T. M., C. Snyder, and J. S. Whitaker, 2003: Ensemble forecasts and the properties of flow-dependent analysis-error covariance singular vectors. Mon. Wea. Rev., 131, 1741–1758, doi:10.1175//2559.1.

  • Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, doi:10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.

  • Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, doi:10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.

  • Matheson, J. E., and R. L. Winkler, 1976: Scoring rules for continuous probability distributions. Manage. Sci., 22, 1087–1096, doi:10.1287/mnsc.22.10.1087.

  • Messner, J. W., and A. Zeileis, 2013: crch: Censored Regression with Conditional Heteroscedasticity, version 0.1-0. R package. [Available online at http://CRAN.R-project.org/package=crch.]

  • Messner, J. W., A. Zeileis, G. J. Mayr, and D. S. Wilks, 2014: Heteroscedastic extended logistic regression for postprocessing of ensemble guidance. Mon. Wea. Rev., 142, 448–456, doi:10.1175/MWR-D-13-00271.1.

  • Nelder, J. A., and R. W. M. Wedderburn, 1972: Generalized linear models. J. Roy. Stat. Soc., 135A, 370–384, doi:10.2307/2344614.

  • Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.

  • R Core Team, 2013: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. [Available online at http://www.R-project.org/.]

  • Roulin, E., and S. Vannitsem, 2012: Postprocessing of ensemble precipitation predictions with extended logistic regression based on hindcasts. Mon. Wea. Rev., 140, 874–888, doi:10.1175/MWR-D-11-00062.1.

  • Roulston, M. S., and L. A. Smith, 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30, doi:10.1034/j.1600-0870.2003.201378.x.

  • Ruiz, J. J., and C. Saulo, 2012: How sensitive are probabilistic precipitation forecasts to the choice of calibration algorithms and the ensemble generation method? Part I: Sensitivity to calibration methods. Meteor. Appl., 19, 302–313, doi:10.1002/met.286.

  • Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616–640, doi:10.1214/13-STS443.

  • Scheuerer, M., 2014: Probabilistic quantitative precipitation forecasting using ensemble model output statistics. Quart. J. Roy. Meteor. Soc., 140, 1086–1096, doi:10.1002/qj.2183.

  • Schmeits, M. J., and K. J. Kok, 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. Mon. Wea. Rev., 138, 4199–4211, doi:10.1175/2010MWR3285.1.

  • Thorarinsdottir, T. L., and T. Gneiting, 2010: Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. J. Roy. Stat. Soc., 173A, 371–388, doi:10.1111/j.1467-985X.2009.00616.x.

  • Tobin, J., 1958: Estimation of relationships for limited dependent variables. Econometrica, 26, 24–36, doi:10.2307/1907382.

  • Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158, doi:10.1175/1520-0469(2003)060<1140:ACOBAE>2.0.CO;2.

  • Wilks, D. S., 2006: Comparison of ensemble-MOS methods in the Lorenz ’96 setting. Meteor. Appl., 13, 243–256, doi:10.1017/S1350482706002192.

  • Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361–368, doi:10.1002/met.134.

  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Academic Press, 676 pp.

  • Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390, doi:10.1175/MWR3402.1.
