## 1. Introduction

Uncertainty information is essential for an optimal use of a forecast (Krzysztofowicz 1983). Such information can be provided by an ensemble prediction system (EPS) that aims at describing the flow-dependent forecast uncertainty (Leutbecher and Palmer 2008). Several deterministic forecasts are run simultaneously, accounting for uncertainties in the description of the initial state, the model parameterization, and, for limited area models, the boundary conditions. Probabilistic products are derived from an ensemble, tailored to a specific user’s needs. For example, wind forecasts in the form of quantiles at selected probability levels are of particular interest for actors in the renewable energy sector (Pinson 2013).

However, probabilistic products generally suffer from a lack of reliability, the system showing biases and failing to fully represent the forecast uncertainty. Statistical techniques allow users to adjust the ensemble forecast, correcting for systematic inconsistencies (Gneiting et al. 2007). This step, known as calibration, is based on past data and usually focuses on a single or a few aspects of the ensemble forecast. For example, calibration of a wind forecast can be performed by univariate approaches (Bremnes 2004; Sloughter et al. 2010; Thorarinsdottir and Gneiting 2010) or bivariate methods, which account for correlation structures of the wind components (Pinson 2012; Schuhen et al. 2012). These calibration procedures provide reliable predictive probability distributions of wind speed or wind components for each forecast lead time and location independently. Decision-making problems can, however, require information about the spatial and/or temporal structure of the forecast uncertainty. Example applications in the renewable energy sector include the optimal operation of a wind-storage system in a market environment, unit commitment over a control zone, and optimal maintenance planning (Pinson et al. 2009). In other words, scenarios that describe spatiotemporal wind variability are relevant products for end users of wind forecasts.

The generation of scenarios from calibrated ensemble forecasts is a step that can be performed with the use of empirical copulas. The empirical copula approaches are nonparametric and, in comparison with parametric approaches (Keune et al. 2014; Feldmann et al. 2015), simple to implement and computationally cheap. Empirical copulas can be based on climatological records [Schaake shuffle (ScSh); Clark et al. (2004)] or on the original raw ensemble [ensemble copula coupling (ECC); Schefzik et al. (2013)]. ECC, which features the conservation of the ensemble member rank structure from the original ensemble to the calibrated one, has the advantage of being applicable to any location within the model domain without restriction related to the availability of observations. However, unrealistic scenarios can be generated by the ECC approach when the postprocessing indiscriminately increases the ensemble spread to a large extent. Nonrepresentative correlation structures in the raw ensemble are magnified after calibration, leading to unrealistic forecast variability. As a consequence, ECC can deteriorate the ensemble information content when applied to ensembles with relatively poor reliability, as suggested, for example, by the verification results in Flowerdew (2014).

In this paper, a new version of the ECC approach is proposed to overcome the generation of unrealistic scenarios. Focusing on time series, a temporal component is introduced into the ECC scheme accounting for the autocorrelation of the forecast error over consecutive forecast lead times. The assumption of forecast error stationarity, already adopted for the development of fully parametric approaches (Pinson et al. 2009; Schölzel and Hense 2011), is exploited in combination with the structure information of the original scenarios. The new approach based on these two sources of information, past data and ensemble structure, is called *dual*-ensemble copula coupling (d-ECC). Objective verification is performed in order to show the benefits of the proposed approach with regard to the standard ECC.

The manuscript is organized as follows. Section 2 describes the dataset used to illustrate the manuscript as well as the calibration method applied to derive the calibrated quantile forecasts from the raw ensemble. Sections 3 and 4 introduce the empirical copula approaches for the generation of scenarios and discuss in particular the ECC and d-ECC methods. Section 5 describes the verification process for the scenario assessment. Section 6 presents the results obtained by means of multivariate scores and within a product-oriented verification framework.

## 2. Data

### a. Ensemble forecasts and observations

COSMO-DE-EPS is the high-resolution Consortium for Small-Scale Modeling (COSMO) EPS run operationally at DWD. It consists of 20 COSMO-DE forecasts with variations in the initial conditions, the boundary conditions, and the model physics (Gebhardt et al. 2011; Peralta et al. 2012). COSMO-DE-EPS follows the multimodel ensemble approach, with four global models each driving five physically perturbed members. The ensemble configuration implies a clustering of the ensemble members as a function of the driving global model when large-scale structures dominate the forecast uncertainty.

The focus here is on wind forecasts at 100-m height above ground. The postprocessing methods are applied to forecasts of the 0000 UTC run with an hourly output interval and a forecast horizon of up to 21 h. The observation dataset comprises quality-controlled wind measurements from seven stations: Risoe, *FINO1*, *FINO2*, *FINO3*, Karlsruhe, Hamburg, and Lindenberg, as plotted in Fig. 1. The verification period covers three months: March–May 2013.

Figure 2a shows an example of a COSMO-DE-EPS wind forecast at hub height. The forecast is valid on 2 March 2013 at *FINO1* (see Fig. 1). The ensemble members are shown in gray while the corresponding observations are in black. In Fig. 2b, the raw ensemble forecast is interpreted in the form of quantiles.

The quantile $q_\tau$ at probability level $\tau$ (with $0 \le \tau \le 1$) is defined as
$$q_\tau = F^{-1}(\tau),$$
where $F$ is the cumulative probability distribution of the random variable of interest. In this sense, the $n$th sorted member of an ensemble of size $N$ can be interpreted as a quantile forecast at probability level $n/(N+1)$.

In the example shown in Fig. 2, the raw ensemble is not able to capture the observation variability. Calibration aims to correct for this lack of reliability by adjusting the mean and enlarging the spread of the ensemble forecast.
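This quantile reading of an ensemble can be sketched in a few lines; the wind speed values below are hypothetical, not COSMO-DE-EPS data:

```python
import numpy as np

def ensemble_quantile_levels(members):
    """Interpret the sorted members of an ensemble of size N as
    quantile forecasts at probability levels n/(N+1)."""
    x = np.sort(np.asarray(members, dtype=float))
    n = x.size
    levels = np.arange(1, n + 1) / (n + 1)
    return levels, x

levels, q = ensemble_quantile_levels([4.2, 3.1, 5.0, 3.8])
# levels -> [0.2, 0.4, 0.6, 0.8], q -> [3.1, 3.8, 4.2, 5.0]
```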

### b. Calibrated ensemble forecasts

Since COSMO-DE-EPS forecasts have been shown to suffer from statistical inconsistencies (Ben Bouallègue 2013, 2015), calibration has to be applied in order to provide reliable forecasts to the users. The method applied in this study is the bivariate ensemble model output statistics (EMOS) approach of Schuhen et al. (2012), a nonhomogeneous Gaussian regression for wind vectors. The mean and variance of each wind component, as well as the correlation between the two components, characterize the predictive bivariate normal distribution. Corrections applied to the raw ensemble mean and variance are optimized by minimizing the continuous ranked probability score (CRPS; Matheson and Winkler 1976). The calibration coefficients are estimated for each station and each lead time separately (local version of EMOS), based on a training period defined as a 45-day moving window.
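The CRPS-minimization idea behind EMOS can be illustrated with a univariate sketch (the operational method is bivariate); the synthetic data, starting values, and choice of the Nelder–Mead optimizer below are assumptions of this illustration, not the paper's setup:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                    - 1 / np.sqrt(np.pi))

def fit_emos(ens_mean, ens_var, obs):
    """Estimate a, b, c, d in mu = a + b*mean, sigma^2 = c + d*var
    by minimizing the mean CRPS over a training sample."""
    def objective(params):
        a, b, c, d = params
        sigma = np.sqrt(np.maximum(c + d * ens_var, 1e-6))
        return np.mean(crps_normal(a + b * ens_mean, sigma, obs))
    return minimize(objective, x0=[0.0, 1.0, 1.0, 1.0],
                    method="Nelder-Mead").x

# synthetic training data: biased, underdispersive ensemble statistics
rng = np.random.default_rng(1)
truth = rng.normal(8.0, 2.0, 300)
ens_mean = truth + 1.0 + rng.normal(0, 0.5, 300)
ens_var = np.full(300, 0.25)

crps_raw = np.mean(crps_normal(ens_mean, np.sqrt(ens_var), truth))
a, b, c, d = fit_emos(ens_mean, ens_var, truth)
sigma_fit = np.sqrt(np.maximum(c + d * ens_var, 1e-6))
crps_fit = np.mean(crps_normal(a + b * ens_mean, sigma_fit, truth))
```

The fitted coefficients remove the bias and inflate the spread, lowering the mean CRPS relative to the raw statistics.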

The final calibrated products considered here are wind speed quantile forecasts at equidistant probability levels, drawn from the predictive distribution for each station and each forecast lead time.

Information about spatial and temporal dependence structures, which is crucial in many applications, is however no longer available after this calibration step (see Fig. 2c). The next postprocessing step then consists of generating consistent scenarios based on the calibrated samples.

## 3. Generation of scenarios

The generation of scenarios with empirical copulas is briefly described here. For a deeper insight into the methods, the reader is invited to refer to the original article by Schefzik et al. (2013) or to Wilks (2015) and references therein.

Consider the multivariate cumulative distribution function (cdf) $G$ of a random vector $(Y_1, \ldots, Y_L)$. Following Sklar's theorem (Sklar 1959), $G$ can be expressed as
$$G(y_1, \ldots, y_L) = C\bigl(F_1(y_1), \ldots, F_L(y_L)\bigr),$$
where $C$ is a copula that links the $L$-variate cumulative distribution function $G$ to its univariate marginal cdf's $F_1, \ldots, F_L$. In Eq. (6), a joint distribution is represented as univariate margins plus a copula, so the problems of estimating the univariate distributions and estimating the dependence can be treated separately. Univariate calibration provides the marginal cdf's, while the choice of the copula $C$ depends on the application and on the size $L$ of the multivariate problem. We focus here on empirical copulas since they are suitable for problems with high dimensionality.

An empirical copula is based on a set of $L$-tuples of size $N$ with entries in $\{1, \ldots, N\}$, where $L$ is the dimension of the multivariate variable and $N$ is the number of scenarios. Let $\mathbf{q}_l = (q_l^1, \ldots, q_l^N)$ denote the $N$ equidistant quantiles of the calibrated predictive distribution at margin $l$, and let $r_l^i$ denote the rank of the $i$th value of the reference template at margin $l$. The sample $\mathbf{q}$ is rearranged following the dependence structure of the reference template: the postprocessed scenario $i$ at margin $l$ is expressed as
$$\tilde{x}_l^i = q_l^{(r_l^i)}. \quad (7)$$
The multivariate correlation structures are thus generated based on the rank correlation structures of a sample template.

### a. Ensemble copula coupling

ECC takes the raw ensemble forecast **x** as the required template in Eq. (7): the rank structure of the original ensemble members is transferred to the calibrated quantiles.

Based on COSMO-DE-EPS forecasts in Fig. 3a (identical to Fig. 2a), an example of scenarios derived with ECC is provided in Fig. 3b. The increase in spread after the calibration step implies a larger step-to-step variability in the time trajectories. Figure 4 focuses on a single scenario highlighting the difference between the original and postprocessed scenarios.
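The rank rearrangement of Eq. (7) with the raw ensemble as template can be sketched as follows (a minimal illustration; array shapes and variable names are choices of this sketch):

```python
import numpy as np

def ecc(raw, calibrated_quantiles):
    """Ensemble copula coupling: reorder the calibrated quantiles
    (sorted per margin) following the rank order of the raw members.
    raw: (N, L) array of N members over L margins (e.g., lead times).
    calibrated_quantiles: (N, L) array of N quantiles per margin."""
    raw = np.asarray(raw, dtype=float)
    q = np.sort(np.asarray(calibrated_quantiles, dtype=float), axis=0)
    out = np.empty_like(q)
    for l in range(raw.shape[1]):
        ranks = raw[:, l].argsort().argsort()  # 0-based ordinal ranks
        out[:, l] = q[ranks, l]
    return out

raw = np.array([[3., 1., 2.], [1., 2., 3.], [2., 3., 1.]])
cal = np.array([[10., 20., 30.], [30., 10., 20.], [20., 30., 10.]])
scen = ecc(raw, cal)
# scen -> [[30., 10., 20.], [10., 20., 30.], [20., 30., 10.]]
```

Each margin of the output contains exactly the calibrated quantile values, arranged in the rank order of the raw members.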

### b. Dual-ensemble copula coupling

ECC assumes that the ensemble prediction system correctly describes the spatiotemporal dependence structures of the weather variable. This assumption is quite strong and cannot be valid in all cases. On the other hand, based on the assumption of error stationarity, parametric methods have been developed that focus on the covariance structures of the forecast error (Pinson et al. 2009; Schölzel and Hense 2011). We propose a new version of the ECC approach, which is an attempt to combine both types of information: the structure of the original ensemble and the error autocorrelation estimated from past data. Therefore, the new scheme is called dual-ensemble copula coupling, as the copula relies on a dual source of information.

Denote by $\mathbf{e}$ the forecast error, defined as the difference between the ensemble mean forecast and the observation:
$$e_t = \bar{x}_t - y_t,$$
where $\bar{x}_t$ and $y_t$ are the ensemble mean and the observation at lead time $t$, respectively.

Again, we aim here to construct a template [Eq. (7)] in order to establish the correlation structures within the calibrated ensemble. The d-ECC scheme proceeds in the following steps:

- Apply ECC with the original ensemble forecast **x** as the reference sample template, in order to derive a postprocessed ensemble of scenarios.
- Derive the error correction $c^i$ imposed on each scenario $i$ of the reference template by this postprocessing step, that is, the difference between the ECC scenario and the corresponding original member [Eq. (18)].
- *Transformation step*: apply a transformation to the correction $c^i$ of each scenario, based on the estimate of the error autocorrelation and its eigendecomposition, in order to derive the *adjusted corrections* $\check{c}^i$ [Eq. (20)].
- Derive the so-called adjusted ensemble $\check{\mathbf{x}}$, where a scenario of $\check{\mathbf{x}}$ is defined as the combination of the original member and the adjusted error correction: $\check{x}^i = x^i + \check{c}^i$.
- Take $\check{\mathbf{x}}$ as the reference template in Eq. (7), so that the new empirical copula is based on the adjusted ensemble.

The construction of the d-ECC reference template can be related to the so-called *coloring transformation* of a vector of random variables (Kessy et al. 2015).
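The pipeline can be sketched end to end. The transformation step below multiplies the corrections by the symmetric square root of an estimated error autocorrelation matrix, obtained via its eigendecomposition; this is an assumption of the sketch standing in for the exact transformation of Eq. (20):

```python
import numpy as np

def reorder(template, quantiles):
    """Eq. (7): arrange the per-margin sorted quantiles following
    the rank order of the template members."""
    q = np.sort(quantiles, axis=0)
    out = np.empty_like(q)
    for l in range(template.shape[1]):
        out[:, l] = q[template[:, l].argsort().argsort(), l]
    return out

def d_ecc(raw, calibrated_quantiles, error_corr):
    """d-ECC sketch. error_corr: (L, L) estimated temporal
    autocorrelation matrix of the forecast error (from past data)."""
    ecc_scen = reorder(raw, calibrated_quantiles)      # step 1: standard ECC
    corrections = ecc_scen - raw                       # step 2: per-member corrections
    w, v = np.linalg.eigh(error_corr)                  # step 3: eigendecomposition
    transform = v @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ v.T
    adjusted = corrections @ transform.T               # adjusted corrections (assumed form)
    adjusted_ens = raw + adjusted                      # step 4: adjusted ensemble
    return reorder(adjusted_ens, calibrated_quantiles) # step 5: new template

raw = np.array([[3., 1.], [1., 2.], [2., 3.]])
cal = np.array([[10., 40.], [30., 20.], [20., 60.]])
```

With an identity autocorrelation matrix the transformation leaves the corrections unchanged and the scheme reduces to standard ECC, consistent with the limiting cases discussed below.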

## 4. Illustration and discussion of d-ECC

Focusing on a single member, the d-ECC steps are illustrated in Fig. 4. First, the correction associated with each ECC scenario with respect to the corresponding original ensemble member is computed [black line in Fig. 4b; Eq. (18)]. This scenario correction is adjusted based on the assumption of temporal autocorrelation of the error [dashed line in Fig. 4b; Eq. (20)]. This adjusted scenario correction is then superimposed onto the original ensemble forecast before the correlation structure of the adjusted ensemble is drawn again.

The new scheme reduces to the standard ECC in any of the following cases:

- $\mathbf{R} = \mathbf{I}$, where $\mathbf{R}$ is the estimated error autocorrelation matrix and $\mathbf{I}$ the identity matrix, which means that there is no temporal correlation of the forecast error;
- $c^i = \mathbf{0}$, where $\mathbf{0}$ is the null vector, which means that the calibration step does not impact the forecast, the forecast being already well calibrated; and
- $c^i = h\mathbf{J}$, where $h$ is a constant and $\mathbf{J}$ an all-ones vector, which means that the calibration step corrects only for bias errors and the system is free of spread biases.

Consider the covariance between two random vectors $\mathbf{Y}$ and $\mathbf{Z}$ at lead times $k$ and $t$, estimated over past cases. From Eq. (18), recall that a postprocessed scenario is the sum of the original member and its correction, so the expression in Eq. (25) can be rewritten as the sum of the estimated covariances of the original forecasts, the estimated covariances of the corrections, and a cross term $\varepsilon$ at lead times $k$ and $t$, respectively. The term $\varepsilon$ corresponds to the estimated covariances of $x$ and $c$, and can be considered negligible assuming that the original forecast and the corrections are drawn from two independent random processes.

Synthetic observations and forecasts are drawn from bivariate normal distributions, with $\boldsymbol{\mu}$ the mean vector and $\boldsymbol{\Sigma}$ the covariance matrix. The mean vector is set to the null vector, $\boldsymbol{\mu} = (0, 0)^{\mathrm{T}}$, in all cases. The covariance matrix of the observation distribution is set to
$$\boldsymbol{\Sigma}_y = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix},$$
so the distribution has unit variances and a correlation coefficient of 0.5 between the two dimensions. Using this setting results in target quantiles of the calibration process that correspond to the quantiles of the standard normal distribution. The covariance matrix of the forecast distribution is defined as
$$\boldsymbol{\Sigma}_x = \alpha^2 \begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix},$$
with $\alpha$ a spread parameter and $\beta$ a correlation parameter that allow us to simulate deficiencies in spread and correlation of the synthetic ensemble forecasts. Postprocessing using ECC and d-ECC is applied considering 50 ensemble members and a sample of 1000 cases. The impact of the multivariate postprocessing schemes is illustrated by plotting the correlation coefficient between the two dimensions of the process for a range of $\alpha$ and $\beta$ parameters (Fig. 5). The correlation coefficient of the observations is kept constant (0.5), and the correlation of the raw forecasts is modified by varying the parameter $\beta$ from 0.1 to 0.9. The spread parameter $\alpha$ takes the value 0.5 to simulate an underdispersive ensemble, 1 a calibrated ensemble, and 1.5 an overdispersive ensemble.
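Drawing the synthetic observations and forecasts of this idealized setting takes a few lines; the particular parameter values below (α = 0.5, β = 0.3) and the random seed are choices of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_members = 1000, 50
alpha, beta = 0.5, 0.3  # example spread and correlation parameters

mu = np.zeros(2)
sigma_obs = np.array([[1.0, 0.5], [0.5, 1.0]])              # unit variances, corr 0.5
sigma_fc = alpha**2 * np.array([[1.0, beta], [beta, 1.0]])  # forecast covariance

obs = rng.multivariate_normal(mu, sigma_obs, n_cases)
# one synthetic ensemble of 50 members per case
ens = rng.multivariate_normal(mu, sigma_fc, (n_cases, n_members))

corr_obs = np.corrcoef(obs[:, 0], obs[:, 1])[0, 1]
corr_fc = np.corrcoef(ens[..., 0].ravel(), ens[..., 1].ravel())[0, 1]
```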

The correlation structure of the forecast is not modified by applying ECC, as illustrated by the gray dashed line while the gray line shows how the transformation step affects the correlation structure of the forecast: the correlation is increased when the ensemble is underdispersed and decreased in cases of overdispersion. We find that d-ECC appears to be appropriate in the cases of ensemble forecasts with the following combination of characteristics: underdispersion combined with a lack of autocorrelation or overdispersion combined with too strong autocorrelation in the time series.

This investigation could certainly be extended considering more complex idealized studies and developing a rigorous mathematical framework. This would be welcomed as further research and would add additional evidence to the expected behavior of d-ECC. Furthermore, in the remainder of this paper, time series derived with d-ECC are compared to ECC-derived scenarios. A complementary study could aim to estimate the benefits of the dual approach with respect to purely statistical methods that only account for error characteristics estimated from historical data (Pinson et al. 2009; Möller et al. 2013).

Another important aspect of d-ECC is the estimation of the error correlation matrix, which relies on past data and therefore on the choice of an appropriate training period.

Considering again our case study, the scenarios generated with d-ECC based on the COSMO-DE-EPS forecasts are shown in Fig. 3c. The d-ECC-derived scenarios are smoother and subjectively more realistic than the ones derived with ECC in Fig. 3b. In Fig. 4, focusing on a single scenario, it is highlighted that the difference between the original and the d-ECC time trajectories varies gradually from one time interval to the next, while abrupt transitions occur in the case of the ECC scenario, as in this example between hours 15 and 17.

Note that d-ECC does not give the same result as would a simple smoothing of the calibrated scenarios, which would alter the calibrated quantiles **q** of the calibrated ensemble and possibly diminish its reliability. Instead, d-ECC affects the time variability of the scenarios by constructing a new template [Eq. (7)] while leaving the calibrated quantiles **q** untouched.

## 5. Verification methods

### a. Multivariate scores

For an ensemble forecast with members $\mathbf{x}^1, \ldots, \mathbf{x}^N$ and an observation vector $\mathbf{y}$, the energy score (ES; Gneiting et al. 2008) is defined as
$$\mathrm{ES} = \frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}^i - \mathbf{y} \rVert - \frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \lVert \mathbf{x}^i - \mathbf{x}^j \rVert,$$
where $\lVert\cdot\rVert$ represents the Euclidean norm. ES is a generalization of the CRPS to the multivariate case.

The variogram score of order $p$ (pVS; Scheuerer and Hamill 2015) is defined as
$$\mathrm{VS}_p = \sum_{i,j} w_{ij} \left( \lvert y_i - y_j \rvert^p - \frac{1}{N}\sum_{k=1}^{N} \lvert x_k^i - x_k^j \rvert^p \right)^2,$$
with $p$ the order of the variogram, $w_{ij}$ nonnegative weights, and where $i$ and $j$ indicate the $i$th and the $j$th components of the marked vectors, respectively. To focus on rapid changes in wind speed, the weights $w_{ij}$ give more weight to pairs of nearby components; $i$ and $j$ are here forecast lead-time indices.
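Both scores can be computed directly from their definitions. The sketch below handles a single forecast case; the uniform default weights for the variogram score are an assumption of this sketch:

```python
import numpy as np

def energy_score(ens, obs):
    """ES for one case: ens (N, L) members, obs (L,) observation."""
    t1 = np.mean(np.linalg.norm(ens - obs, axis=1))
    diffs = ens[:, None, :] - ens[None, :, :]
    t2 = 0.5 * np.mean(np.linalg.norm(diffs, axis=2))
    return t1 - t2

def variogram_score(ens, obs, p=0.5, weights=None):
    """Variogram score of order p for one case (uniform weights by default)."""
    L = obs.size
    if weights is None:
        weights = np.ones((L, L))
    vy = np.abs(obs[:, None] - obs[None, :]) ** p
    vx = np.mean(np.abs(ens[:, :, None] - ens[:, None, :]) ** p, axis=0)
    return np.sum(weights * (vy - vx) ** 2)

obs = np.array([1.0, 2.0, 4.0])
perfect = np.tile(obs, (4, 1))  # all members equal to the observation
```

A deterministic ensemble identical to the observation scores zero under both measures, which is a convenient sanity check.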

### b. Multivariate rank histograms

The multivariate aspect of the forecast is in a second step assessed by means of rank histograms applied to multidimensional fields (Thorarinsdottir et al. 2016). Two variants of the multivariate rank histogram are applied: the averaged rank histogram (ARH) and the band depth rank histogram (BDRH). The difference between the two approaches lies in the way they derive pre-ranks from the multivariate forecasts. ARH considers the rank averaged over the marginal dimensions, while BDRH assesses the centrality of the observation within the ensemble based on the concept of functional band depth.

The interpretation of ARH is the same as the interpretation of a univariate rank histogram: a flat histogram indicates a reliable forecast, a ∪ shape underdispersion, and a ∩ shape overdispersion.
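The ARH pre-rank can be sketched compactly (tie handling, which the full method treats more carefully, is omitted here):

```python
import numpy as np

def average_rank(ens, obs):
    """Pre-rank of the observation for the averaged rank histogram:
    rank of obs among the N members at each margin, averaged over margins.
    ens: (N, L), obs: (L,). Returns a value in [1, N + 1]."""
    ranks = 1 + np.sum(ens < obs, axis=0)  # per-margin rank of the observation
    return ranks.mean()

ens = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
obs = np.array([2.5, 0.5])
r = average_rank(ens, obs)
# margin 0: rank 3; margin 1: rank 1; averaged pre-rank 2.0
```

Collecting this pre-rank over many cases and histogramming it yields the ARH.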

### c. Product-oriented verification

In addition to the multivariate verification of time series scenarios, the forecasts are assessed within a product-oriented framework. This type of scenario verification follows the spirit of the event-oriented verification framework proposed by Pinson and Girard (2012). Probabilistic forecasts of products that can only be derived from time trajectories are produced and assessed by means of well-established univariate probabilistic scores.

Two types of products derived from forecasted scenarios are examined here. The first one is defined as the mean wind speed over a day (here, a day is limited to the 21-h forecast horizon). The second product is defined as the maximal upward wind ramp over a day, a wind ramp being defined as the difference between two consecutive forecast intervals. For both products, 20 forecasts are derived from the 20 scenarios at each station and each verification day.
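Deriving the two products from a set of scenarios is straightforward; the generic `(N, T)` array layout and the toy trajectories below are choices of this sketch:

```python
import numpy as np

def daily_products(scenarios):
    """scenarios: (N, T) wind speed time trajectories (hourly steps).
    Returns per-scenario daily mean and maximal upward ramp, where a
    ramp is the difference between two consecutive forecast intervals."""
    daily_mean = scenarios.mean(axis=1)
    ramps = np.diff(scenarios, axis=1)
    max_up_ramp = ramps.max(axis=1)
    return daily_mean, max_up_ramp

scen = np.array([[1.0, 3.0, 2.0],
                 [2.0, 2.0, 5.0]])
means, max_ramps = daily_products(scen)
# means -> [2.0, 3.0]; max_ramps -> [2.0, 3.0]
```

Applied to 20 scenarios per station and day, this yields the 20 product forecasts assessed in section 6.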

### d. Bootstrapping

The statistical significance of the results is tested by applying a block-bootstrap approach. Bootstrapping is a resampling technique that provides an estimate of the accuracy of statistical measures and is commonly applied to meteorological datasets (Efron and Tibshirani 1986).

A block-bootstrap approach is applied in the following, which consists of defining a block as a single day during the verification period (Hamill 1999). Each day is considered to be a separate block of fully independent data. The verification process is repeated 500 times, each time using a random sample with replacement of the 92 verification days (March–May 2013). The derived score distributions consequently illustrate the variability of the performance measures over the verification period and not between locations. Boxplots are used to represent the distributions of the performance measures, where the quantiles of the distributions at probability levels of 5%, 25%, 50%, 75%, and 95% are highlighted.
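The day-block resampling can be sketched as follows; the one-score-per-day input format is an assumption of this sketch:

```python
import numpy as np

def block_bootstrap(daily_scores, n_boot=500, seed=0):
    """daily_scores: (D,) one aggregated score per verification day
    (a day = one block). Resample days with replacement n_boot times
    and return the quantiles of the bootstrapped mean score at the
    probability levels used for the boxplots."""
    rng = np.random.default_rng(seed)
    days = np.arange(daily_scores.size)
    means = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(days, size=days.size, replace=True)
        means[b] = daily_scores[sample].mean()
    return np.quantile(means, [0.05, 0.25, 0.50, 0.75, 0.95])

box = block_bootstrap(np.full(92, 1.5))  # degenerate check: constant scores
```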

## 6. Results and discussion

Before applying the verification methods introduced in the previous section, we propose to explore statistically the COSMO-DE-EPS time series variability by means of a spectral analysis, that is, an analysis of the time series in the frequency domain. Such an analysis is useful in order to describe the statistical properties of the scenarios but also has direct implications for users' applications (see below; Vincent et al. 2010). A Fourier transformation is applied to each forecasted and observed scenario and the contributions of the oscillations at various frequencies to the scenario variance are examined (Wilks 2006). In Fig. 7, the mean amplitude of the forecast and observation time series over all stations and verification days is plotted as a function of their frequency components.
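The per-scenario amplitude spectrum can be sketched with the discrete Fourier transform; the hourly sampling interval follows the dataset description, while the normalization convention is a choice of this sketch:

```python
import numpy as np

def amplitude_spectrum(series):
    """One-sided amplitude spectrum of a (mean-removed) time series."""
    series = np.asarray(series, dtype=float)
    n = series.size
    amp = np.abs(np.fft.rfft(series - series.mean())) / n
    freqs = np.fft.rfftfreq(n, d=1.0)  # d = 1 h -> cycles per hour
    return freqs, amp

# sanity check: a pure sine of period 10 h peaks at 0.1 cycles per hour
freqs, amp = amplitude_spectrum(np.sin(2 * np.pi * np.arange(20) / 10))
```

Averaging such spectra over all stations and verification days gives curves of the kind shown in Fig. 7.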

As has already been suggested by the case study, this analysis confirms that ECC considerably increases the variability of the time trajectories with respect to the original ensemble, in particular at high frequencies. The ECC scenario fluctuations are also much larger than the observed ones. Indeed, the amplitude at high frequencies is on average about 2 times larger in ECC time series than in the observations, which explains the visual impression that ECC scenarios are unrealistic. Conversely, scenarios derived with the new copula approach do not exhibit such features. While the original ensemble shows a deficit of variability with respect to the observations, the d-ECC approach allows for improving this aspect of the forecast. This first result, showing that d-ECC scenarios have a mean spectrum similar to that of the observations, is complemented with an objective assessment of the forecasted scenarios based on probabilistic verification measures.

Figure 8 shows the performance of the forecasted time trajectories by means of multivariate scores. The postprocessed scenarios perform significantly better than the raw members in terms of ES (Fig. 8a). In terms of pVS, the d-ECC scenarios are better than the ECC ones and significantly better than the raw ones when *p* = 0.5 (Fig. 8b). For higher orders of the variogram (here *p* = 1; Fig. 8c), the forecast improvement after postprocessing is still clear when using d-ECC while the ECC results are slightly worse than those of the original forecasts.

Figure 9 depicts the results in terms of multivariate rank histograms for ARH (top panel) and BDRH (bottom panel). The raw ensemble shows clear reliability deficiencies (Figs. 9a,d), which motivated the use of postprocessing techniques. Forecasts derived with ECC continue to show underdispersiveness but also too little correlation (Figs. 9b,e) while forecasts derived with d-ECC are better calibrated according to the rank histograms in Figs. 9c,f. Indeed, both plots indicate good reliability among the d-ECC-derived scenarios.

Figure 10 focuses on two products drawn from the time series forecasts: the daily mean wind speed (top panel) and the daily maximal upward ramp (bottom panel). The performance is assessed in terms of CRPS, CRPS reliability, and CRPS resolution, from left to right, respectively. Looking at the results in terms of CRPS, we note the high degree of similarity between Figs. 10a and 10d and Figs. 8a and 8c. As for the ES, postprocessing significantly improves the forecasts of the daily mean product. As for pVS with *p* = 1, d-ECC improves the ramp product with respect to the original while ECC does not generate improved products. The CRPS decomposition allows us to trace the origin of these performance improvements. We see in Figs. 10b,e that the CRPS results are mainly explained by the impact of the postprocessing on the CRPS reliability components. However, focusing on the results in terms of CRPS resolution in Figs. 10c,f, we note that the resolution of the original and d-ECC products are comparable while ECC deteriorates the resolution of the ramp product with respect to the original.

These verification results are interpreted as follows. Calibration corrects for the mean of the ensemble forecast and this is reflected, after the derivation of scenarios, by an improvement in the ES and daily mean product skill. Calibration also corrects for spread deficiencies increasing the variability of the ensemble forecasts. This increase in spread associated with the preservation of the rank structure of the original ensemble, as is the case with the ECC approach, enlarges indiscriminately the temporal variability of the forecasts and leads to a slight deterioration of the pVS and ramp product results.

The d-ECC approach provides scenarios with a temporal variability comparable to that of the observations. In that case, the benefit of the calibration step in terms of reliability (at single forecast lead times) persists at the multivariate level (looking at time trajectories) after the reconstruction of scenarios with d-ECC. The multivariate reliability, or the reliability of derived products, is significantly improved after postprocessing, though it is not perfect for specific derived products. Moreover, d-ECC scenarios perform as well as the original ensemble forecast in terms of resolution. So, unlike ECC, d-ECC is able to generate reliable scenarios with a level of resolution that does not deteriorate with respect to the original ensemble forecasts.

## 7. Conclusions and outlook

A new empirical copula approach is proposed for the postprocessing of calibrated ensemble forecasts. The so-called dual-ensemble copula coupling approach is introduced with a focus on temporal structures of wind forecasts. The new scheme includes a temporal component in the ECC approach accounting for the error autocorrelation of the ensemble members. The estimation of the correlation structure in the error based on past data allows for adjusting the dependence structure in the original ensemble.

Based on COSMO-DE-EPS forecasts, the scenarios derived with d-ECC prove to be qualitatively realistic and quantitatively of superior quality. Postprocessing of wind speed combining EMOS and d-ECC improves the forecasts in many aspects. In comparison to ECC, d-ECC drastically improves the quality of the derived scenarios. Applications that require temporal trajectories will fully benefit from the new approach in that case. As for any postprocessing technique, the benefit of the new copula approach can be weakened by improving the representation of the forecast uncertainty with more efficient member generation techniques and/or by improving the calibration procedure correcting for conditional biases. Meanwhile, with its low additional complexity and computational costs, d-ECC can be considered to be a valuable alternative to the standard ECC for the generation of consistent scenarios from COSMO-DE-EPS.

Though only the temporal aspects have been investigated in this study, the dual-ensemble copula approach could be generalized to any multivariate setting. Further research is however required for the application of d-ECC at scales that are unresolved by the observations. For example, geostatistical tools could be applied for the description of the autocorrelation error structure at the model grid level. Moreover, the mathematical interpretation of the d-ECC scheme developed here would benefit from further theoretical investigation.

This work has been done within the framework of the EWeLiNE project (Erstellung innovativer Wetter- und Leistungsprognosemodelle für die Netzintegration wetterabhängiger Energieträger) funded by the German Federal Ministry for Economic Affairs and Energy. The authors acknowledge the Department of Wind Energy of the Technical University of Denmark (DTU), the German Wind Energy Institute (DEWI GmbH), DNV GL, the Meteorological Institute (MI) of University of Hamburg, and the Karlsruhe Institute of Technology (KIT) for providing wind measurements at stations Risoe, *FINO1* and *FINO3*, *FINO2*, Hamburg and Lindenberg, and Karlsruhe, respectively. The authors are also grateful to Tilmann Gneiting and two anonymous reviewers for helpful and accurate comments on a previous version of this manuscript.

## REFERENCES

Ben Bouallègue, Z., 2013: Calibrated short-range ensemble precipitation forecasts using extended logistic regression with interaction terms. *Wea. Forecasting*, **28**, 515–524, doi:10.1175/WAF-D-12-00062.1.

Ben Bouallègue, Z., 2015: Assessment and added value estimation of an ensemble approach with a focus on global radiation forecasts. *Mausam*, **66**, 541–550.

Bentzien, S., and P. Friederichs, 2014: Decomposition and graphical portrayal of the quantile score. *Quart. J. Roy. Meteor. Soc.*, **140**, 1924–1934, doi:10.1002/qj.2284.

Bremnes, J. B., 2004: Probabilistic wind power forecasts using local quantile regression. *Wind Energy*, **7**, 47–54, doi:10.1002/we.107.

Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. *Mon. Wea. Rev.*, **78**, 1–3, doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.

Bröcker, J., 2012: Evaluating raw ensembles with the continuous ranked probability score. *Quart. J. Roy. Meteor. Soc.*, **138**, 1611–1617, doi:10.1002/qj.1891.

Clark, M., S. Gangopadhyay, L. Hay, B. Rajagopalan, and R. Wilby, 2004: The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. *J. Hydrometeor.*, **5**, 243–262, doi:10.1175/1525-7541(2004)005<0243:TSSAMF>2.0.CO;2.

Efron, B., and R. Tibshirani, 1986: Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. *Stat. Sci.*, **1**, 54–75, doi:10.1214/ss/1177013815.

Feldmann, K., M. Scheuerer, and T. Thorarinsdottir, 2015: Spatial postprocessing of ensemble forecasts for temperature using nonhomogeneous Gaussian regression. *Mon. Wea. Rev.*, **143**, 955–971, doi:10.1175/MWR-D-14-00210.1.

Flowerdew, J., 2014: Calibrating ensemble reliability whilst preserving spatial structure. *Tellus*, **66A**, 22662, doi:10.3402/tellusa.v66.22662.

Gebhardt, C., S. E. Theis, M. Paulat, and Z. Ben Bouallègue, 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model perturbations and variation of lateral boundaries. *Atmos. Res.*, **100**, 168–177, doi:10.1016/j.atmosres.2010.12.008.

Gneiting, T., F. Balabdaoui, and A. E. Raftery, 2007: Probabilistic forecasts, calibration, and sharpness. *J. Roy. Stat. Soc.*, **69B**, 243–268, doi:10.1111/j.1467-9868.2007.00587.x.

Gneiting, T., L. Stanberry, E. Grimit, L. Held, and N. Johnson, 2008: Assessing probabilistic forecasts of multivariate quantities, with applications to ensemble predictions of surface winds. *Test*, **17**, 211–235, doi:10.1007/s11749-008-0114-x.

Hamill, T. M., 1999: Hypothesis tests for evaluating numerical precipitation forecasts. *Wea. Forecasting*, **14**, 155–167, doi:10.1175/1520-0434(1999)014<0155:HTFENP>2.0.CO;2.

Kessy, A., A. Lewin, and K. Strimmer, 2015: Optimal whitening and decorrelation. arXiv.org, 14 pp. [Available online at https://arxiv.org/abs/1512.00809.]

Keune, J., C. Ohlwein, and A. Hense, 2014: Multivariate probabilistic analysis and predictability of medium-range ensemble weather forecasts. *Mon. Wea. Rev.*, **142**, 4074–4090, doi:10.1175/MWR-D-14-00015.1.

Koenker, R., and G. Bassett, 1978: Regression quantiles. *Econometrica*, **46**, 33–50, doi:10.2307/1913643.

Krzysztofowicz, R., 1983: Why should a forecaster and a decision maker use Bayes theorem. *Water Resour. Res.*, **19**, 327–336, doi:10.1029/WR019i002p00327.

Leutbecher, M., and T. N. Palmer, 2008: Ensemble forecasting. *J. Comput. Phys.*, **227**, 3515–3539, doi:10.1016/j.jcp.2007.02.014.

Matheson, J. E., and R. L. Winkler, 1976: Scoring rules for continuous probability distributions. *Manage. Sci.*, **22**, 1087–1096, doi:10.1287/mnsc.22.10.1087.

Möller, A., A. Lenkoski, and T. L. Thorarinsdottir, 2013: Multivariate probabilistic forecasting using ensemble Bayesian model averaging and copulas. *Quart. J. Roy. Meteor. Soc.*, **139**, 982–991, doi:10.1002/qj.2009.

Peralta, C., Z. Ben Bouallègue, S. E. Theis, and C. Gebhardt, 2012: Accounting for initial condition uncertainties in COSMO-DE-EPS. *J. Geophys. Res.*, **117**, D07108, doi:10.1029/2011JD016581.

Pinson, P., 2012: Adaptive calibration of (*u*, *v*)-wind ensemble forecasts. *Quart. J. Roy. Meteor. Soc.*, **138**, 1273–1284, doi:10.1002/qj.1873.

Pinson, P., 2013: Wind energy: Forecasting challenges for its operational management. *Stat. Sci.*, **28**, 564–585, doi:10.1214/13-STS445.

Pinson, P., and R. Girard, 2012: Evaluating the quality of scenarios of short-term wind power generation. *Appl. Energy*, **96**, 12–20, doi:10.1016/j.apenergy.2011.11.004.

Pinson, P., and J. Tastu, 2013: Discrimination ability of the energy score. DTU Tech. Rep., Technical University of Denmark, 16 pp. [Available online at http://orbit.dtu.dk/files/56966842/tr13_15_Pinson_Tastu.pdf.]

Pinson, P., G. Papaefthymiou, B. Klockl, H. Nielsen, and H. Madsen, 2009: From probabilistic forecasts to statistical scenarios of short-term wind power production. *Wind Energy*, **12**, 51–62, doi:10.1002/we.284.

Schefzik, R., T. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. *Stat. Sci.*, **28**, 616–640, doi:10.1214/13-STS443.

Scheuerer, M., and T. M. Hamill, 2015: Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. *Mon. Wea. Rev.*, **143**, 1321–1334, doi:10.1175/MWR-D-14-00269.1.

Schölzel, C., and A. Hense, 2011: Probabilistic assessment of regional climate change in southwest Germany by ensemble dressing. *Climate Dyn.*, **36**, 2003–2014, doi:10.1007/s00382-010-0815-1.

Schuhen, N., T. Thorarinsdottir, and T. Gneiting, 2012: Ensemble model output statistics for wind vectors. *Mon. Wea. Rev.*, **140**, 3204–3219, doi:10.1175/MWR-D-12-00028.1.

Sklar, M., 1959: Fonctions de répartition à n dimensions et leurs marges. *Publ. Inst. Stat. Univ. Paris*, **8**, 229–231.

Sloughter, J., T. Gneiting, and A. E. Raftery, 2010: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. *J. Amer. Stat. Assoc.*, **105**, 25–35, doi:10.1198/jasa.2009.ap08615.

Thorarinsdottir, T., and T. Gneiting, 2010: Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. *J. Roy. Stat. Soc.*, **173A**, 371–388, doi:10.1111/j.1467-985X.2009.00616.x.

Thorarinsdottir, T., M. Scheuerer, and C. Heinz, 2016: Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. *J. Comput. Graph. Stat.*, **25**, 105–122, doi:10.1080/10618600.2014.977447.

Vincent, C., G. Giebel, P. Pinson, and H. Madsen, 2010: Resolving nonstationary spectral information in wind speed time series using the Hilbert–Huang transform. *J. Appl. Meteor. Climatol.*, **49**, 253–267, doi:10.1175/2009JAMC2058.1.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences*. 2nd ed. Academic Press, 627 pp.

Wilks, D. S., 2015: Multivariate ensemble model output statistics using empirical copulas. *Quart. J. Roy. Meteor. Soc.*, **141**, 945–952, doi:10.1002/qj.2414.