## 1. Introduction

Forecasts of total cloud cover (TCC) are an important part of numerical weather prediction (NWP) both in terms of model feedbacks and with respect to forecast users in areas such as energy demand and production, agriculture, and tourism. In NWP models cloud cover affects the evolution of the model state through feedback loops on radiative fluxes and heating rates (Köhler 2005; Haiden and Trentmann 2016). Predictions of energy demand and production rely in part on TCC forecasts. Photovoltaic energy forecasting in particular relies on accurate predictions of solar irradiance, which is on a day-to-day basis mainly determined by variations in TCC (Taylor and Buizza 2003; Pelland et al. 2013). Observational astronomy depends on reliable TCC forecasts (Ye and Chen 2013). Other applications of TCC forecasts can be found in agriculture, where they may facilitate irrigation scheduling (Diak et al. 1998), in avalanche forecasting, where the amount of radiational cooling influences the stability of snowpacks (McClung 2002), and in leisure activities where cloudiness influences, for example, the amount of sun protection required (Dixon et al. 2008).

(Total) cloud cover is defined as the “portion of the sky cover that is attributed to clouds…” (American Meteorological Society 2015). Obviously, TCC takes values in

The limited skill of direct model output TCC point forecasts is partly due to a representativeness mismatch between models and observations. Areas covered by visual observations typically vary in scale from 10 to 100 km, depending on visibility and topography (Mittermaier 2012). Automated observations as derived from ceilometers measure cloud cover directly overhead. Depending on the wind speed in the cloud layer the scanned area may or may not be representative of the model grid scale. Temporal variability of cloudiness on hourly and subhourly scales presents an additional challenge for predicting instantaneous TCC. As shown by Haiden et al. (2015), the forecast range over which there is positive skill relative to persistence increases from 2–3 days to 5 days if daytime averages rather than instantaneous values of TCC are considered.

The potential benefits of skillful TCC forecasts, together with the relatively low performance of state-of-the-art NWP TCC point forecasts (i.e., forecasts interpolated from the NWP model grid to specific sites), motivate the development of statistical methods to postprocess raw ensemble TCC forecasts. In this study, we focus on global point forecasts of TCC from the ECMWF ensemble forecast system. To take account of the discrete nature of TCC, two discrete statistical postprocessing methods are proposed: a method based on multinomial (or polytomous) logistic regression (MLR; see, e.g., Agresti and Kateri 2011), and a method based on proportional odds logistic regression (POLR; Walker and Duncan 1967; McCullagh 1980; Ananth and Kleinbaum 1997; Messner et al. 2014).

In the field of meteorological forecasting several (postprocessing) approaches based on logistic regression have been proposed over the last 15 years. Applequist et al. (2002) applied logistic regression to produce forecasts of precipitation threshold exceedance probabilities. Hamill et al. (2004) used logistic regression to obtain probabilistic forecasts of temperature and precipitation from ensemble model output statistics. Wilks (2009) proposed extended logistic regression (ELR) as a further development of the approach by Hamill et al. (2004) that provides full predictive distributions from ensemble model output statistics. ELR has been used to postprocess NWP ensemble precipitation (and much less frequently also wind speed) forecasts in many studies. Schmeits and Kok (2010) compared raw ensemble forecasts from a 20-yr ECMWF precipitation reforecast dataset with Bayesian model averaging (BMA; Raftery et al. 2005) and ELR. While ELR outperformed the raw ensemble only slightly in the case of area-mean precipitation amounts, area-maximum forecast skill was significantly improved by ELR. Furthermore, ELR performed considerably better than BMA and as well as a modified BMA approach by Schmeits and Kok (2010). A similar study by Roulin and Vannitsem (2012) showed that applying ELR led to substantially improved skill and mean error of ECMWF precipitation ensemble forecasts for two catchments in Belgium. Likewise, Ben Bouallègue (2013) confirmed the good performance of ELR.

However, there are also studies that reveal the limitations of ELR. In a case study comparing eight different postprocessing methods for (ensemble) precipitation forecasts over South America, ELR ranks in the upper midrange among the methods considered (Ruiz and Saulo 2012). Hamill (2012) showed that ELR improved skill of ECMWF precipitation ensemble forecasts considerably over the United States, but that the multimodel ensemble consisting of the ensemble forecasts from the ECMWF, the Met Office, the National Centers for Environmental Prediction, and the Canadian Meteorological Centre could not be improved much by ELR. Scheuerer (2014) was able to outperform ELR by applying an ensemble model output statistics approach (Gneiting et al. 2005) based on a generalized extreme value distribution. Messner et al. (2014) applied ELR, censored logistic regression, and POLR to ECMWF ensemble wind speed and precipitation forecasts. Their study revealed the good performance of POLR on discrete, categorical sample spaces. However, we are not aware of any study that postprocesses TCC ensemble forecasts based on a logistic regression approach.

First, the TCC dataset used for this study is presented in section 2. This is followed by section 3 that discusses the different forecast models and methods. Section 4 provides an in-depth presentation of the results, which is followed by a brief discussion in section 5.

## 2. Data

As in Hemri et al. (2014), the TCC dataset (T. Haiden 2014, unpublished data) used in this study consists of stationwise daily time series from January 2002 to March 2014 of forecast–observation pairs at 1200 UTC for lead times up to 10 days. As ECMWF forecasts are issued on the global domain, we have selected 3435 surface synoptic observations (SYNOP) stations that cover the entire globe (except for Australia, which does not report at 1200 UTC) as the observational dataset. Stations with unreliable observation time series are detected and removed according to the following scheme, which is a modification of the approach by Pinson and Hagedorn (2012):

Count the number of days with observed values that are equal to the observations from the previous 10 days. If this number exceeds 20% of the length of the time series, a station is considered to be unreliable.

Additionally, remove stations with recorded observations outside the range [0, 1].

After removing the unreliable stations, 3330 stations are left for the following analyses.
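The screening scheme can be sketched as follows. This is only an illustration under one reading of the first rule, namely that a day is counted when its observation equals every observation from the preceding 10 days. The function name `is_unreliable` and the array layout are hypothetical and not taken from the study.

```python
import numpy as np

def is_unreliable(obs, window=10, max_fraction=0.2):
    """Flag a station time series as unreliable (illustrative sketch).

    obs: 1-D array of daily TCC observations as fractions in [0, 1],
         with NaN for missing days (hypothetical layout).
    """
    obs = np.asarray(obs, dtype=float)
    valid = obs[~np.isnan(obs)]
    # Second rule: any recorded value outside [0, 1] disqualifies the station.
    if np.any((valid < 0.0) | (valid > 1.0)):
        return True
    # First rule (assumed reading): count days whose value equals all
    # observations from the previous `window` days.
    stuck = 0
    for t in range(window, len(obs)):
        if np.all(obs[t - window:t] == obs[t]):
            stuck += 1
    # Unreliable if such days exceed 20% of the series length.
    return stuck > max_fraction * len(obs)
```

Comparisons with NaN evaluate to false, so days with missing values in the window are never counted as repeated under this sketch.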

## 3. Methods

### a. Training and verification periods

Prior to introducing the different forecast models, the training periods used for estimation of the parameters of the statistical postprocessing models are presented here along with the corresponding verification periods. In line with Hemri et al. (2014) rather long training periods of up to 5 years are applied. Accordingly, the verification period extends from January 2007 to March 2014. The corresponding training periods are selected in a nonseasonal and in a seasonal way. In case of the nonseasonal approach, for any verification day *x* the corresponding training period covers the five calendar years prior to the day *x*. For instance, for a random verification day *x* in 2009, say 27 June 2009, the corresponding training period lasts from 1 January 2004 to 31 December 2008. The same training period would apply for any other verification day in 2009. In case of the seasonal approach, the blockwise training periods from the nonseasonal approach are additionally differentiated according to the season of the verification day. For this study, we divide the year into two seasons (April–September and October–March).
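The selection of training days described above can be illustrated with a short sketch. The function names are hypothetical; the logic follows the text: five full calendar years before the year of the verification day, optionally restricted to the same half-year season.

```python
from datetime import date

def training_period(verification_day, n_years=5):
    """First and last day of the nonseasonal training block."""
    year = verification_day.year
    return date(year - n_years, 1, 1), date(year - 1, 12, 31)

def season(day):
    """Half-year season used in the seasonal approach."""
    return "Apr-Sep" if 4 <= day.month <= 9 else "Oct-Mar"

def in_training_set(day, verification_day, seasonal=False):
    """True if `day` belongs to the training data for `verification_day`."""
    start, end = training_period(verification_day)
    if not (start <= day <= end):
        return False
    # The seasonal variant keeps only days from the same half-year.
    return (not seasonal) or season(day) == season(verification_day)
```

For the verification day 27 June 2009, `training_period(date(2009, 6, 27))` returns 1 January 2004 and 31 December 2008, in agreement with the example in the text.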

### b. Climatological and uniform forecasts

Climatological and uniform forecasts are used as reference. The climatological forecasts are constructed stationwise in the same way as the seasonal training periods. That is, for each verification day the climatological forecast corresponds to the empirical distribution of all TCC observations in the same season (winter half-year or summer half-year) within the five calendar years prior to the verification day. The uniform forecasts simply assign a probability of

### c. Raw ensemble forecasts

The ECMWF TCC forecasts used in this study are issued daily at 1200 UTC from 1 January 2002 to 20 March 2014 and cover the lead times

### d. MLR and POLR

As stated in section 1 statistical postprocessing methods for TCC should take account of the discrete nature of the reported TCC data. Hence, among the different postprocessing methods those that contain some kind of a “logistic regression” core should be best. Here, we introduce MLR and POLR, which are two different, but closely related models.
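To make the relation between the two models concrete, the following sketch computes category probabilities for the textbook forms of multinomial and proportional odds logistic regression. It is an illustration only and not necessarily the exact parameterization used in this study; the function names, the choice of the last category as reference, and the sign convention of the POLR linear predictor are assumptions.

```python
import numpy as np

def mlr_probs(x, intercepts, coefs):
    """Multinomial logistic regression, last category as reference.

    x: predictors, shape (p,)
    intercepts: shape (J-1,); coefs: shape (J-1, p)
    Returns probabilities for the J ordered categories.
    """
    eta = intercepts + coefs @ x      # one linear predictor per non-reference class
    eta = np.append(eta, 0.0)         # reference category has predictor 0
    eta = eta - eta.max()             # numerical stabilisation
    w = np.exp(eta)
    return w / w.sum()

def polr_probs(x, thresholds, beta):
    """Proportional odds model: logit P(Z <= z_j | x) = theta_j - x . beta.

    thresholds: increasing values theta_1 < ... < theta_{J-1}
    beta: one shared coefficient vector, shape (p,)
    """
    cum = 1.0 / (1.0 + np.exp(-(thresholds - x @ beta)))
    cum = np.append(cum, 1.0)         # P(Z <= z_J) = 1
    return np.diff(cum, prepend=0.0)  # category probabilities from the CDF
```

In these textbook forms, MLR needs (*J* − 1)(*p* + 1) parameters, whereas POLR shares one coefficient vector across categories and needs only *J* − 1 + *p*.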

*K* = 52). Accordingly, the first three predictors in the MLR model are the mean of the ENS runs

*Z* can be written as

*J* − 1 times such that the probabilities sum up to 1. Using a suitable training period, this model can be easily estimated using the function

*p* denotes the number of predictors not counting the intercept. In case of the POLR model we need only

Overview of the different POLR-S model variants, where *I* is the interaction term between

To allow numerically trouble-free verification, any forecast distribution *T* is the length of the training period. The parameter *α* denotes the probability that state *j* is observed at least once during a period of length *T* [i.e.,

### e. Example forecasts

Before discussing the results in section 4, four subjectively selected example forecasts for Vienna, Austria, are presented in Fig. 1 to highlight typical properties of the postprocessing. Vienna was chosen as a location in Europe that is situated in the broad transition zones from maritime to continental in winter, and from Mediterranean to temperate in summer. As a result, it experiences a rich and complex cloud climatology that is additionally modulated by orographic effects due to its proximity to the European Alps. For illustrative purposes, raw ensemble forecasts are compared with the corresponding seasonal POLR forecasts that use the complete set of predictors (POLR-S h). A detailed discussion of the different POLR-S models can be found in section 4. The raw ensemble and POLR-S h bear strong resemblances. However, POLR-S h seems to move some weight from the extremes (0 or 8 octas) toward the more moderate levels of cloudiness (1–7 octas).

## 4. Results

After having introduced the different forecast models, we first evaluate forecast skill of these models. This is followed by an in-depth assessment of calibration and sharpness of a selected set of models. For a fair comparison of verification scores, raw ensemble and postprocessed forecasts have to be mapped to the space of the observations. The function selected to map raw ensemble and postprocessed forecasts to the observation space influences most of the verification measures. Hence, it is important that the mapping function mimics the procedure of TCC observers, who have to give ⅛ as soon as a little cloud appears, even if the TCC is only 1%, and have to give ⅞ as soon as there is a little gap somewhere in the cloud layer. This is ensured by applying a nonequidistant mapping function, for which the details can be found in section a of the appendix.

### a. Forecast skill

Average skill of the different TCC forecast models is assessed using the log score and the continuous ranked probability score (CRPS) averaged over the entire verification period and all stations. As TCC is a discrete variable, the ranked probability score (RPS; Epstein 1969; Murphy 1969) could be used instead of the CRPS. But since the ordered categories of TCC are not equidistant in the dataset at hand (see above and in section a of the appendix), RPS and CRPS would differ slightly. For this study, we have decided to use the CRPS, because it allows direct skill comparison with continuous TCC forecasts, which may become available in future (see also section 5). Both log score and CRPS are proper scoring rules that are negatively oriented (i.e., the lower the score the higher is the forecast skill). While the log score is a local scoring rule that takes only the forecast probability of the materializing observation into account, the CRPS is sensitive to distance in that forecasts with high probabilities attributed to values close to the materializing observation are considered to be skillful (Gneiting and Raftery 2007). Mathematical formulations of both scores are given in section b of the appendix.

According to Table 2, the raw ensemble outperforms climatological and uniform forecasts in terms of CRPS for lead times of 1, 3, and 6 days, but not for 10 days. In terms of log score, it exhibits very poor performance irrespective of the forecast lag. All MLR and POLR models outperform the climatological, uniform, and raw ensemble forecasts in terms of log score and CRPS at all lead times. In case of MLR, the seasonal model slightly outperforms its nonseasonal counterpart in terms of CRPS, while the log score tends to prefer the nonseasonal model. For POLR, log score and CRPS are more consistent in that both scores indicate a slightly better skill of the seasonal model. This is also reflected in Fig. 2, which shows averaged log score and CRPS values including their associated 90% confidence intervals for the raw ensemble, MLR-B, MLR-S, POLR-B, and POLR-S h (i.e., the full model) (cf. Table 1). The 90% confidence intervals are obtained by block bootstrapping (Künsch 1989) with block resamples following a geometric distribution with mean

Means of log scores and CRPS values over the entire verification period and all stations. In each column the best value is shown in boldface.

Means of log scores and CRPS values over the entire verification period and all stations for the raw ensemble, MLR-B, MLR-S, POLR-B, and POLR-S h. The centered 90% confidence intervals have been obtained by block bootstrapping.

Citation: Monthly Weather Review 144, 7; 10.1175/MWR-D-15-0426.1

Weights pooled over all stations and training periods of

To assess the importance of *I*, we perform an in-depth comparison of predictive skill of the models c, e, g, and h. As the mean verification scores are almost equal, statistical testing is required in order to be able to make sound statements on relative model performances. To this end, a stationwise assessment of significant changes in CRPS and/or log score has been performed using block bootstrapping. To combine log score and CRPS, three cases are distinguished:

Deterioration: at least one of the two scores (CRPS or log score) is deteriorated, while the other is not improved.

No clear-cut difference: either both scores indicate no change in forecast skill or one of the two scores is improved, while the other is deteriorated.

Improvement: at least one of the two scores is improved, while the other is not deteriorated.
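The stationwise procedure can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: it resamples a daily series of score differences (new model minus reference model) in blocks of geometrically distributed length and then applies the three-case rule above. The mean block length `mean_block`, the number of resamples, and all function names are placeholders.

```python
import numpy as np

def bootstrap_ci(diff, mean_block=10, n_boot=1000, level=0.90, seed=0):
    """Block-bootstrap confidence interval for the mean of `diff`.

    Block lengths are drawn from a geometric distribution; blocks wrap
    around the end of the series. `mean_block` is a placeholder value.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(diff, dtype=float)
    n = len(diff)
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = np.empty(n, dtype=int)
        t = 0
        while t < n:
            start = rng.integers(n)
            take = min(rng.geometric(1.0 / mean_block), n - t)
            idx[t:t + take] = (start + np.arange(take)) % n
            t += take
        means[b] = diff[idx].mean()
    lo, hi = np.quantile(means, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

def change_sign(diff, **kwargs):
    """-1: significant improvement (scores are negatively oriented),
    +1: significant deterioration, 0: no significant change."""
    lo, hi = bootstrap_ci(diff, **kwargs)
    if hi < 0:
        return -1
    if lo > 0:
        return 1
    return 0

def classify(crps_sign, log_sign):
    """Combine the CRPS and log score results into the three cases."""
    signs = (crps_sign, log_sign)
    if max(signs) == 1 and min(signs) >= 0:
        return "deterioration"
    if min(signs) == -1 and max(signs) <= 0:
        return "improvement"
    return "no clear-cut difference"
```

A station would be classified by calling `classify(change_sign(crps_diff), change_sign(log_diff))` on its two difference series.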

*I* leads to greater improvements in skill than adding

(from left to right) Percentage of stations with a deterioration, no clear-cut difference, or an improvement in skill when adding *I* to POLR-S c resulting in POLR-S g, and when adding

### b. Calibration and sharpness

Keeping the improvement in skill by TCC postprocessing in mind, calibration and sharpness are now assessed in more detail. Calibration is the degree of statistical consistency between predictive distributions and observations, and is verified using the probability integral transform (PIT; Dawid 1984; Diebold et al. 1998; Gneiting et al. 2007). Figure 4 compares the PIT histograms of the raw ensemble, MLR-B, POLR-B, and POLR-S h predictions at forecast lead times of 3, 6, and 10 days. Flat PIT histograms indicate well-calibrated forecast distributions, whereas a

Histograms of the PIT values pooled over all stations and verification days for the raw ensemble, MLR-B, POLR-B, and POLR-S h.

Sharpness refers to how focused a forecast is (Gneiting et al. 2007) and is assessed here by an evaluation of the variances, and the widths of the centered 90% prediction intervals, pooled over all stations and verification days. As shown in Fig. 5, the raw ensemble provides the sharpest forecasts at a forecast horizon of 3 days. At lead times of 6 and 10 days the sharpness of the raw ensemble and the postprocessed forecasts is quite poor. However, all postprocessed models are sharper than the raw ensemble. This result is somewhat surprising in that statistical postprocessing improves both calibration and sharpness. Further insight into this can be obtained by assessing marginal calibration (Gneiting et al. 2007). A forecast is marginally well calibrated if the average predictive cumulative distribution function (CDF) over all verification days equals the empirical CDF of the observations. A marginally well calibrated forecast leads to a horizontal marginal calibration graph. Details on the marginal calibration graph can be found in section c of the appendix. Figure 6 shows such graphs for the climatological, the raw ensemble, and the POLR-S h forecasts for a selection of European stations with different TCC climate. As expected, the climatological forecasts show almost perfect marginal calibration. The raw ensemble exhibits poor marginal calibration, even though it is mapped to the observation space in a sound way (see above and in section a of the appendix). It assigns too much weight to TCC values of 0 or 8 octas irrespective of station and lead time. Brussels provides a good example of this. The most frequently observed TCC value is 7 octas. However, the raw ensemble assigns forecast weight rather to 8 octas as can be seen from the accentuated negative peak in the marginal calibration graph. POLR-S h performs as well as the climatological forecasts in terms of marginal calibration. Hence, postprocessing conveys a significant improvement in marginal calibration.
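The marginal calibration check described above can be sketched in a few lines. The sketch assumes that each forecast is stored as a probability vector over the ordered TCC states and that the graph shows the average predictive CDF minus the empirical CDF of the observations; the sign convention, the function name, and the data layout are assumptions.

```python
import numpy as np

def marginal_calibration(pred_probs, obs_idx):
    """Difference between average predictive CDF and empirical CDF.

    pred_probs: array (n_days, J), forecast probabilities per TCC state
    obs_idx:    array (n_days,), index of the observed state (0..J-1)
    Returns an array of length J; values near zero at every state
    indicate good marginal calibration.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    obs_idx = np.asarray(obs_idx, dtype=int)
    n_states = pred_probs.shape[1]
    # Average predictive CDF over all verification days.
    avg_pred_cdf = np.cumsum(pred_probs.mean(axis=0))
    # Empirical CDF of the observations.
    obs_freq = np.bincount(obs_idx, minlength=n_states) / len(obs_idx)
    emp_cdf = np.cumsum(obs_freq)
    return avg_pred_cdf - emp_cdf
```

Under this convention, a forecast that puts too much weight on the highest state yields negative values at the states just below it, because the average predictive CDF is then too low there.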

Box plots showing the 5%, 25%, 50%, 75%, and 95% quantiles of the empirical distribution of (a) the forecast variances and (b) the widths of the centered 90% prediction intervals pooled over all stations and all verification days for the climatological, the raw ensemble, MLR-B, POLR-B, and POLR-S h forecasts. The horizontal dashed (dotted) line corresponds to the 50% quantile of the empirical distribution of the corresponding statistic of the raw ensemble (climatological) forecasts.

Marginal calibration plots comparing the climatological, the raw ensemble, and the POLR-S h forecasts at lead times of 3, 6, and 10 days at different stations in Europe. The observed climatology over the verification period is visualized by the bar plots showing the relative frequencies of the different TCC classes.

## 5. Discussion and conclusions

Both MLR and POLR prove to be useful methods for postprocessing of raw ensemble TCC forecasts. The results indicate that on average POLR with seasonally estimated model parameters performs best. This postprocessing method clearly improves forecast calibration. To achieve well calibrated forecasts, sharpness has to be reduced at the shorter forecast horizon of 3 days. But surprisingly, sharpness can be improved by postprocessing for the longer forecast lags of 6 and 10 days. Keeping in mind the paradigm stated by Gneiting et al. (2005, 2007) that the goal of statistical postprocessing is to maximize sharpness subject to calibration, the simultaneous improvement in calibration and sharpness is very desirable. This is mostly due to the tendency of the raw ensemble to assign too much weight to cloud cover states of 0 and 8 octas.

The methods presented in this study are designed to postprocess discrete TCC raw ensemble forecasts against SYNOP observations. Depending on the region, TCC observations are recorded automatically or manually, with different observation error characteristics. According to Mittermaier (2012) automated observations may underestimate the amount of high cloud (cirrus), while for human observers there is a tendency to underestimate cloud cover states of 0 and 8 octas. This may partly explain the poor marginal calibration of the raw ensemble when compared to the observations. However, a comparison of results at individual stations with manual observations and with automated observations did not reveal a systematic difference in marginal calibration. As human TCC observers are increasingly replaced by automated observations (Wacker et al. 2015), one would need to know the exact date at which a particular station has been changed from manual to automated for a more detailed analysis of this effect. Currently, SYNOP observations of total cloud cover are mainly automated in western Europe, North America, Australia, New Zealand, Japan, South Africa, and Antarctica. Because of the increasing number of automated stations, continuous TCC observations may become more widely available in the future. As the ECMWF raw ensemble provides TCC forecasts that are continuous on the unit interval, this would allow for continuous verification and postprocessing of TCC raw ensemble predictions and probably further enhance forecast skill. A continuous postprocessing method for predictions of visibility, which is a bounded variable like TCC, has already been implemented by Chmielecki and Raftery (2011).

TCC can be differentiated into low-, medium-, and high-level clouds. Predictive skill of NWP cloud cover forecasts can be different depending on cloud level. For instance, in the lowlands of the greater alpine region, the ECMWF HRES model underestimates persistent low stratus (Haiden and Trentmann 2016). It might be possible to reduce such systematic biases by cloud-level-specific postprocessing. Though a direct inclusion of low-, medium-, and high-level cloud forecasts as predictors in the POLR model [cf. (4)] did not lead to any improvement in forecast skill (results not shown in this paper), further analyses may be beneficial. In particular, a separate postprocessing of each cloud level with training observations differentiated according to cloud level may further increase forecast skill.

To summarize, considering the global set of SYNOP stations covered by this paper, postprocessing of discrete TCC raw ensemble predictions using readily available methods can improve forecast skill significantly. Hence, postprocessing helps to improve the generally low predictive performance of raw ensemble TCC forecasts. Additionally, this study identified the seasonal POLR model as the most skillful TCC postprocessing approach.

## Acknowledgments

S. Hemri gratefully acknowledges the support of the Klaus Tschira Foundation. Furthermore, we are grateful to D. S. Richardson and B. Ingleby of the ECMWF for helpful discussions and inputs. We would like to thank T. Gneiting of the Heidelberg Institute for Theoretical Studies (HITS) and of the Institute for Stochastics at Karlsruhe Institute of Technology for valuable inputs and comments. We are also grateful to M. Scheuerer of the NOAA/ESRL, who performed preliminary analyses on postprocessing of TCC forecasts during his short stay at ECMWF. We would also like to thank those who commented on our presentations on postprocessing of TCC forecasts at the workshops on statistical postprocessing of ensemble forecasts at the HITS in July 2015 and on forecast calibration/verification at the ECMWF in August 2015. Finally, we are grateful to the two anonymous reviewers for their helpful comments.

## APPENDIX

### Methods for Verification

#### a. TCC mapping

The SYNOP observations dataset at hand reports TCC states as values in *Z* according to Table A1.

Mapping of TCC raw ensemble and postprocessed forecasts.^{a}

#### b. Log score and CRPS

*F* and a corresponding observation

*z*, then the log score can be written as

*z* by

*F*. The CRPS is given by

*Z*,

*F* is a discrete probabilistic TCC forecast on the observed space, which is described in section a of the appendix, the CRPS can be calculated using the following:
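As an illustration of how both scores can be evaluated for a discrete forecast, the following sketch uses the standard definitions: the log score is the negative logarithm of the probability assigned to the observed state, and the CRPS is the integral of the squared difference between the forecast CDF and the step function of the observation, which for a step-function CDF reduces to a finite sum over the gaps between neighbouring states. The function names are hypothetical, and the `support` values in the usage note are placeholders, not the mapped values of Table A1.

```python
import numpy as np

def log_score(probs, obs_idx):
    """Negative log of the probability assigned to the observed state."""
    return float(-np.log(np.asarray(probs, dtype=float)[obs_idx]))

def crps_discrete(probs, support, obs_idx):
    """CRPS of a discrete forecast on ordered, possibly nonequidistant states.

    probs:   forecast probabilities for the states, shape (J,)
    support: increasing numerical values of the states, shape (J,)
    obs_idx: index of the observed state
    """
    probs = np.asarray(probs, dtype=float)
    support = np.asarray(support, dtype=float)
    cdf = np.cumsum(probs)
    # Indicator that state j is at or above the observed state.
    step = (np.arange(len(support)) >= obs_idx).astype(float)
    gaps = np.diff(support)           # widths between neighbouring states
    # The forecast CDF is constant on each gap, so the integral is a sum.
    return float(np.sum((cdf[:-1] - step[:-1]) ** 2 * gaps))
```

For example, with the placeholder support `[0.0, 1.0]` and probabilities `[0.5, 0.5]`, an observation of the first state gives a CRPS of 0.25 and a log score of log 2. A forecast that assigns zero probability to the observed state has an infinite log score.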

#### c. Marginal calibration

*υ* in verification period

*V*, then the average predictive CDF for TCC can be written as

For a marginally well-calibrated forecast the graph of

## REFERENCES

Agresti, A., and M. Kateri, 2011: Categorical data analysis. *International Encyclopedia of Statistical Science*, M. Lovric, Ed., Springer, 206–208.

American Meteorological Society, 2015: Cloud cover. Glossary of Meteorology. [Available online at http://glossary.ametsoc.org/wiki/Cloud_cover.]

Ananth, C. V., and D. G. Kleinbaum, 1997: Regression models for ordinal responses: A review of methods and applications. *Int. J. Epidemiol.*, **26**, 1323–1333.

Applequist, S., G. E. Gahrs, R. L. Pfeffer, and X.-F. Niu, 2002: Comparison of methodologies for probabilistic quantitative precipitation forecasting. *Wea. Forecasting*, **17**, 783–799, doi:10.1175/1520-0434(2002)017<0783:COMFPQ>2.0.CO;2.

Ben Bouallègue, Z., 2013: Calibrated short-range ensemble precipitation forecasts using extended logistic regression with interaction terms. *Wea. Forecasting*, **28**, 515–524, doi:10.1175/WAF-D-12-00062.1.

Bonferroni, C. E., 1936: Teoria statistica delle classi e calcolo delle probabilità (Statistical theory of classes and calculating probability). Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3–62.

Buizza, R., J.-R. Bidlot, N. Wedi, M. Fuentes, M. Hamrud, G. Holt, and F. Vitart, 2007: The new ECMWF VAREPS (variable resolution ensemble prediction system). *Quart. J. Roy. Meteor. Soc.*, **133**, 681–695, doi:10.1002/qj.75.

Canty, A., and B. Ripley, 2014: boot: Bootstrap R (S-Plus) Functions, version 1.3-13. R package, accessed 3 November 2015. [Available online at http://CRAN.R-project.org/package=boot.]

Chmielecki, R. M., and A. E. Raftery, 2011: Probabilistic visibility forecasting using Bayesian model averaging. *Mon. Wea. Rev.*, **139**, 1626–1636, doi:10.1175/2010MWR3516.1.

Dawid, A. P., 1984: Present position and potential developments: Some personal views: Statistical theory: The prequential approach. *J. Roy. Stat. Soc.*, **147A**, 278–292, doi:10.2307/2981683.

Diak, G. R., M. C. Anderson, W. L. Bland, J. M. Norman, J. M. Mecikalski, and R. M. Aune, 1998: Agricultural management decision aids driven by real-time satellite data. *Bull. Amer. Meteor. Soc.*, **79**, 1345–1355, doi:10.1175/1520-0477(1998)079<1345:AMDADB>2.0.CO;2.

Diebold, F. X., T. A. Gunther, and A. S. Tay, 1998: Evaluating density forecasts with applications to financial risk management. *Int. Econ. Rev.*, **39**, 863–883, doi:10.2307/2527342.

Dixon, H. G., M. Lagerlund, M. J. Spittal, D. J. Hill, S. J. Dobbinson, and M. A. Wakefield, 2008: Use of sun-protective clothing at outdoor leisure settings from 1992 to 2002: Serial cross-sectional observation survey. *Cancer Epidemiol. Biomarkers Prev.*, **17**, 428–434, doi:10.1158/1055-9965.EPI-07-0369.

Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. *J. Appl. Meteor.*, **8**, 985–987, doi:10.1175/1520-0450(1969)008<0985:ASSFPF>2.0.CO;2.

Gneiting, T., and A. E. Raftery, 2007: Strictly proper scoring rules, prediction, and estimation. *J. Amer. Stat. Assoc.*, **102**, 359–378, doi:10.1198/016214506000001437.

Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. *Mon. Wea. Rev.*, **133**, 1098–1118, doi:10.1175/MWR2904.1.

Gneiting, T., F. Balabdoui, and A. E. Raftery, 2007: Probabilistic forecasts, calibration and sharpness. *J. Roy. Stat. Soc.*, **69B**, 243–268, doi:10.1111/j.1467-9868.2007.00587.x.

Haiden, T., and J. Trentmann, 2016: Verification of cloudiness and radiation forecasts in the greater Alpine region. *Meteor. Z.*, **25**, 3–15, doi:10.1127/metz/2015/0630.

Haiden, T., R. Forbes, M. Ahlgrimm, and A. Bozzo, 2015: The skill of ECMWF cloudiness forecasts. *ECMWF Newsletter*, No. 143, ECMWF, Reading, United Kingdom, 14–19.

Hamill, T. M., 2012: Verification of TIGGE multimodel and ECMWF reforecast-calibrated probabilistic precipitation forecasts over the contiguous United States. *Mon. Wea. Rev.*, **140**, 2232–2252, doi:10.1175/MWR-D-11-00220.1.

Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. *Mon. Wea. Rev.*, **132**, 1434–1447, doi:10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.

Hamill, T. M., R. Hagedorn, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. *Mon. Wea. Rev.*, **136**, 2620–2632, doi:10.1175/2007MWR2411.1.

Hemri, S., M. Scheuerer, F. Pappenberger, K. Bogner, and T. Haiden, 2014: Trends in the predictive performance of raw ensemble weather forecasts. *Geophys. Res. Lett.*, **41**, 9197–9205, doi:10.1002/2014GL062472.

Köhler, M., 2005: Improved prediction of boundary layer clouds. *ECMWF Newsletter*, No. 104, ECMWF, Reading, United Kingdom, 18–22.

Künsch, H. R., 1989: The jackknife and the bootstrap for general stationary observations. *Ann. Stat.*, **17**, 1217–1241, doi:10.1214/aos/1176347265.

McClung, D. M., 2002: The elements of applied avalanche forecasting. Part II: The physical issues and the rules of applied avalanche forecasting. *Nat. Hazards*, **26**, 131–146, doi:10.1023/A:1015604600361.

McCullagh, P., 1980: Regression model for ordinal data (with discussion). *J. Roy. Stat. Soc.*, **42B**, 109–142.

Messner, J. W., G. J. Mayr, D. S. Wilks, and A. Zeileis, 2014: Extending extended logistic regression: Extended versus separate versus ordered versus censored. *Mon. Wea. Rev.*, **142**, 3003–3014, doi:10.1175/MWR-D-13-00355.1.

Mittermaier, M., 2012: A critical assessment of surface cloud observations and their use for verifying cloud forecasts. *Quart. J. Roy. Meteor. Soc.*, **138**, 1794–1807, doi:10.1002/qj.1918.

Molteni, F., R. Buizza, T. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. *Quart. J. Roy. Meteor. Soc.*, **122**, 73–119, doi:10.1002/qj.49712252905.

Murphy, A. H., 1969: On the “ranked probability score.” *J. Appl. Meteor.*, **8**, 988–989, doi:10.1175/1520-0450(1969)008<0988:OTPS>2.0.CO;2.

Pelland, S., G. Galanis, and G. Kallos, 2013: Solar and photovoltaic forecasting through post-processing of the Global Environmental Multiscale numerical weather prediction model.

,*Prog. Photovolt. Res. Appl.***21**, 284–296, doi:10.1002/pip.1180.Pinson, P., and R. Hagedorn, 2012: Verification of the ECMWF ensemble forecasts of wind speed against analyses and observations.

,*Meteor. Appl.***19**, 484–500, doi:10.1002/met.283.Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles.

,*Mon. Wea. Rev.***133**, 1155–1174, doi:10.1175/MWR2906.1.Richardson, D., S. Hemri, K. Bogner, T. Gneiting, T. Haiden, F. Pappenberger, and M. Scheuerer, 2015: Calibration of ECMWF forecasts.

*ECMWF Newsletter*, No. 142, ECMWF, Reading, United Kingdom, 12–16.Ripley, B., and W. Venables, 2014: Feed-forward neural networks and multinomial log-linear models, version 7.3-8, R package, accessed 3 November 2015. [Available online at https://cran.r-project.org/web/packages/nnet/.]

Roulin, E., and S. Vannitsem, 2012: Postprocessing of ensemble precipitation predictions with extended logistic regression based on hindcasts. *Mon. Wea. Rev.*, **140**, 874–888, doi:10.1175/MWR-D-11-00062.1.

Ruiz, J. J., and C. Saulo, 2012: How sensitive are probabilistic precipitation forecasts to the choice of calibration algorithms and the ensemble generation method? Part I: Sensitivity to calibration methods. *Meteor. Appl.*, **19**, 302–313, doi:10.1002/met.286.

Scheuerer, M., 2014: Probabilistic quantitative precipitation forecasting using ensemble model output statistics. *Quart. J. Roy. Meteor. Soc.*, **140**, 1086–1096, doi:10.1002/qj.2183.

Schmeits, M. J., and K. J. Kok, 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. *Mon. Wea. Rev.*, **138**, 4199–4211, doi:10.1175/2010MWR3285.1.

Taylor, J. W., and R. Buizza, 2003: Using weather ensemble predictions in electricity demand forecasting. *Int. J. Forecasting*, **19**, 57–70, doi:10.1016/S0169-2070(01)00123-6.

Venables, W. N., and B. D. Ripley, 2002: *Modern Applied Statistics with S.* 4th ed. Springer, 495 pp.

Wacker, S., and Coauthors, 2015: Cloud observations in Switzerland using hemispherical sky cameras. *J. Geophys. Res. Atmos.*, **120**, 695–707, doi:10.1002/2014JD022643.

Walker, S. H., and D. B. Duncan, 1967: Estimation of the probability of an event as a function of several independent variables. *Biometrika*, **54**, 167–179, doi:10.1093/biomet/54.1-2.167.

Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. *Meteor. Appl.*, **16**, 361–368, doi:10.1002/met.134.

Wilks, D. S., and T. M. Hamill, 2007: Comparisons of ensemble-MOS methods using GFS forecasts. *Mon. Wea. Rev.*, **135**, 2379–2390, doi:10.1175/MWR3402.1.

Ye, Q. Z., and S. S. Chen, 2013: The ultimate meteorological question from observational astronomers: How good is the cloud cover forecast? *Mon. Not. Roy. Astron. Soc.*, **428**, 3288–3294, doi:10.1093/mnras/sts278.