1. Introduction
One goal of ensemble forecasting is the quantification of flow-dependent forecast uncertainties. Ensemble forecasts directly taken as output from model-based ensemble prediction systems (EPSs) require postprocessing (calibration) to remove systematic errors and to increase both their reliability and statistical consistency. A variety of methods have recently been developed for the statistical calibration of ensemble forecasts (Gneiting et al. 2005; Raftery et al. 2005; Hamill and Whitaker 2006; Pinson 2012; Alessandrini et al. 2013, among others).
To perform well, each method requires appropriate training data consisting of past forecasts and measurements. Pinson (2012) developed a recursive and adaptive wind vector calibration (AUV) in which only the last forecast–measurement pair is used to update the model coefficients, while the weight of training data from previous model updates exponentially decreases. Hamill and Whitaker (2006) proposed an analog-based approach where the N closest forecasts (analogs) to a current model-based ensemble forecast are searched over a training period and where analyses that correspond to the forecast analogs constitute an analog ensemble. Gneiting et al. (2005) suggested the use of rolling training periods including the N previous days of forecasts and measurements, to estimate the model coefficients of the ensemble model output statistics [EMOS; also see Raftery et al. (2005)].
This study proposes an analog-based EMOS in which the model coefficients are estimated from a subset of the training data, which are the N closest analogs to a current ensemble forecast and their corresponding measurements. The choice of an analog is determined by a metric given an optimized combination of predictors. The analog-based EMOS is tested using ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) EPS and 100-m wind measurements from six towers, a quantity important for the prediction of wind power. The analog-based EMOS is compared against EMOS, AUV, and the analog ensemble, where all postprocessing methods are applied to the raw ECMWF EPS.
2. Data
The forecasts from the global ECMWF EPS consist of 50 perturbed predictions and 1 unperturbed control forecast. The horizontal resolution of the ECMWF EPS has been T639 since 26 January 2010, which corresponds to grid increments of about 30 km. The nearest grid point to the geographical coordinates of each measurement tower is used. Note that bilinear interpolation and the nearest-gridpoint approach result in similar forecast skill for the tower sites considered in this study (not shown). The 3-hourly wind forecasts at 100-m height are taken from the run initialized at 0000 UTC, up to a lead time of 120 h, from February 2010 to June 2014. Data from February 2010 to June 2012 are used as the training period, and data from July 2012 to June 2014 as a 2-yr test period.
The 100-m wind measurements, which are available at a temporal interval of 10 min, are from the onshore towers Cabauw, Netherlands (51.970°N, 4.926°E); Falkenberg (52.167°N, 14.122°E), Karlsruhe (49.093°N, 8.426°E), and Hamburg (53.519°N, 10.103°E), Germany, and the offshore Research Platforms Fino2 (55.004°N, 13.154°E) and Fino3 (55.195°N, 7.158°E). The measurements at Hamburg are influenced by upwind urban and industrial areas, and at Karlsruhe by the surrounding mountains and a nearby forest. The Falkenberg tower is in an area with mixed farmland and forest. The Cabauw tower is in flat terrain. We refer to Junk et al. (2014) for more details on the tower measurements.
3. Methods
a. Verification methods
The continuous ranked probability score (CRPS) is a proper scoring rule to verify the performance of probabilistic predictions; we follow Eq. (21) from Gneiting and Raftery (2007) to calculate the CRPS values. The Brier score [BS; Brier (1950); Eq. (8.36) from Wilks (2011)] is used as a scoring rule to verify ensemble forecasts at single-event thresholds (50th and 95th percentile). To evaluate reliability and sharpness, we present reliability diagrams (Wilks 2011). The observed percentiles are calculated separately for each station, month, and time of day, following the approach proposed by Eckel and Delle Monache (2015, manuscript submitted to Mon. Wea. Rev.). The statistical significance of the scoring rules is assessed with bootstrap resampling (Efron 1979; Bröcker and Smith 2007; Pinson et al. 2010).
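For concreteness, a minimal Python sketch of the two scoring rules is given below. The function names are ours, and the ensemble CRPS uses the kernel representation of Eq. (21) in Gneiting and Raftery (2007).

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of a single ensemble forecast against a scalar observation,
    using the kernel form of Eq. (21) in Gneiting and Raftery (2007):
    CRPS = E|X - y| - 0.5 E|X - X'|."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

def brier_score(probs, outcomes):
    """Mean Brier score of event probabilities against 0/1 outcomes
    (Brier 1950), e.g., for exceedance of the 95th percentile."""
    p, o = np.asarray(probs, dtype=float), np.asarray(outcomes, dtype=float)
    return np.mean((p - o) ** 2)
```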
b. EMOS

EMOS, also termed nonhomogeneous regression (Gneiting et al. 2005), replaces the raw ensemble with a single parametric predictive distribution whose parameters depend on summary statistics of the ensemble. Following Thorarinsdottir and Gneiting (2010), the predictive distribution of wind speed y is a normal distribution truncated at zero,

y \mid \bar{x}, S^2 \sim \mathcal{N}^{0}\big(a + b\,\bar{x},\; c + d\,S^2\big), \quad (1)

where \bar{x} is the ensemble mean wind speed, S^2 is the variance of the wind speed ensemble, and a, b, c, and d are coefficients estimated by minimizing the mean CRPS over the training data. In the standard EMOS configuration, the training data consist of rolling training periods, that is, the N most recent days of forecast–measurement pairs.
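A minimal sketch of a CRPS-minimizing EMOS fit is given below. For brevity it uses a plain (untruncated) normal distribution with its closed-form CRPS, whereas Thorarinsdottir and Gneiting (2010) use a normal truncated at zero; the function names and the Nelder–Mead optimizer are our choices, not those of the original papers.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def crps_gaussian(mu, sigma, y):
    # Closed-form CRPS of a normal predictive distribution
    # (Gneiting et al. 2005).
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

def fit_emos(ens_mean, ens_var, obs):
    """Estimate (a, b, c, d) of Eq. (1) by minimizing the mean CRPS
    over a training set of ensemble statistics and measurements."""
    def mean_crps(theta):
        a, b, c, d = theta
        mu = a + b * ens_mean
        sigma = np.sqrt(np.maximum(c + d * ens_var, 1e-6))  # keep scale positive
        return np.mean(crps_gaussian(mu, sigma, obs))
    return minimize(mean_crps, x0=[0.0, 1.0, 1.0, 1.0],
                    method="Nelder-Mead").x
```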
c. Analog ensemble

Following Hamill and Whitaker (2006) and Delle Monache et al. (2013b), the analog ensemble is constructed independently at each location and lead time. The N past ensemble forecasts that are closest to the current ensemble forecast are searched over the training period, and the measurements corresponding to these forecast analogs constitute the members of the analog ensemble; a 20-member analog ensemble is used in this study. The closeness of a past forecast to the current forecast is quantified with the multivariate metric of Delle Monache et al. (2011),

\| F_t, A_{t'} \| = \sum_{i=1}^{N_v} \frac{w_i}{\sigma_{f_i}} \sqrt{\sum_{j=-\tilde{t}}^{\tilde{t}} \big(F_{i,t+j} - A_{i,t'+j}\big)^2}, \quad (2)

where F_t is the current forecast at lead time t; A_{t'} is a past forecast at lead time t'; N_v is the number of predictors; w_i is the weight of predictor i; \sigma_{f_i} is the standard deviation of the past forecasts of predictor i; and \tilde{t} is the half-width of a time window over which the forecasts are compared. The predictors are the 100-m ensemble mean wind speed, the 100-m ensemble mean wind direction, and the variance of the 100-m wind speed ensemble, with wind-direction differences treated as circular quantities (Jammalamadaka and Sengupta 2001).
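A sketch of the analog search is shown below under simplifying assumptions: the time window of Eq. (2) is omitted (\tilde{t} = 0, so the inner square root reduces to an absolute difference), wind direction is treated as a circular predictor, and all names are illustrative.

```python
import numpy as np

def circular_diff(d1, d2):
    # Smallest angular difference (degrees) for the wind-direction predictor.
    d = np.abs(d1 - d2) % 360.0
    return np.minimum(d, 360.0 - d)

def analog_distances(current, history, weights, stds, circular):
    """Distance of Eq. (2) with the time window omitted, between the
    current forecast and every past forecast. current: (n_pred,);
    history: (n_past, n_pred); stds: standard deviation of each
    predictor over the past forecasts."""
    dist = np.zeros(history.shape[0])
    for i in range(history.shape[1]):
        diff = (circular_diff(current[i], history[:, i]) if circular[i]
                else np.abs(current[i] - history[:, i]))
        dist += weights[i] / stds[i] * diff
    return dist

def find_analogs(current, history, obs_history, weights, stds, circular, n=20):
    # The measurements paired with the n closest forecasts form the
    # analog ensemble.
    idx = np.argsort(analog_distances(current, history, weights,
                                      stds, circular))[:n]
    return idx, obs_history[idx]
```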
To find the optimal weights w_i of each predictor, the CRPS of the 20-member wind speed analog ensemble is minimized over the training period with a brute-force predictor-weighting strategy (Junk et al. 2015); the resulting weights are given in Table 1.
Table 1. Optimized predictor weights (%) of the 100-m ensemble mean wind direction (100-m WD mean), 100-m ensemble mean wind speed (100-m WS mean), and the variance of the 100-m wind speed ensemble (100-m WS var) at all measurement sites. The weights are optimized by minimizing the CRPS of the 20-member wind speed analog ensemble with a brute-force predictor-weighting strategy.
The optimization is based on finding, among all weight combinations that sum to one on a discrete grid, the combination that yields the lowest CRPS of the analog ensemble over the training period.
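A brute-force search in this spirit can be sketched as follows, reusing `find_analogs` and `crps_ensemble` from the sketches above; the grid step and the leave-one-out evaluation are our assumptions, not details from the original study.

```python
import itertools
import numpy as np

def optimize_weights(history, obs_history, stds, circular, step=0.1, n=20):
    """Try all predictor-weight combinations on a coarse grid that sum
    to one and keep the combination whose analog ensemble has the
    lowest mean CRPS over the training period."""
    grid = np.arange(0.0, 1.0 + step / 2, step)
    best_w, best_score = None, np.inf
    cases = np.arange(history.shape[0])
    for w in itertools.product(grid, repeat=history.shape[1]):
        if not np.isclose(sum(w), 1.0):
            continue
        w = np.asarray(w)
        scores = []
        for t in cases:
            mask = cases != t  # exclude the current case from the search
            _, analogs = find_analogs(history[t], history[mask],
                                      obs_history[mask], w, stds,
                                      circular, n=n)
            scores.append(crps_ensemble(analogs, obs_history[t]))
        if np.mean(scores) < best_score:
            best_w, best_score = w, np.mean(scores)
    return best_w, best_score
```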
d. Analog-based EMOS
The EMOS model as proposed by Gneiting et al. (2005) and Thorarinsdottir and Gneiting (2010) estimates the model coefficients from rolling training periods. We propose a new approach for the selection of training data: for a current ensemble forecast, N analogs are searched with the metric shown in Eq. (2), given a set of predictors and their optimized weights. The optimized weights are those listed in Table 1.
For a given lead time, the N analogs and corresponding measurements form the training dataset for estimating the coefficients in Eq. (1). Since the quality of an analog is given by the distance value computed by the metric in Eq. (2), the training data are also weighted with the inverse of this distance. We found that this additional weighting improves the results only marginally (see section 4).
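Combining the two previous sketches, a hypothetical analog-based EMOS fit might look as follows: the N closest analogs form the training set, and each pair enters the CRPS objective with an inverse-distance weight. The names, the plain normal distribution, and the default N are again our assumptions; `analog_distances` and `crps_gaussian` are taken from the earlier sketches.

```python
import numpy as np
from scipy.optimize import minimize

def fit_analog_emos(current, history, ens_mean, ens_var, obs,
                    weights, stds, circular, n=50):
    """Estimate the Eq. (1) coefficients from the n closest analogs,
    weighting each training pair by the inverse of its analog distance."""
    dist = analog_distances(current, history, weights, stds, circular)
    idx = np.argsort(dist)[:n]
    w = 1.0 / np.maximum(dist[idx], 1e-9)  # inverse-distance weights
    w /= w.sum()
    def weighted_crps(theta):
        a, b, c, d = theta
        mu = a + b * ens_mean[idx]
        sigma = np.sqrt(np.maximum(c + d * ens_var[idx], 1e-6))
        return np.sum(w * crps_gaussian(mu, sigma, obs[idx]))
    return minimize(weighted_crps, x0=[0.0, 1.0, 1.0, 1.0],
                    method="Nelder-Mead").x
```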
e. AUV
A comparison of calibration methods using ECMWF EPS 100-m wind forecasts indicated that the bivariate AUV calibration proposed by Pinson (2012) outperforms other methods such as EMOS at almost all towers (Junk et al. 2014). For this reason, we benchmark the analog-based EMOS not only against EMOS and the analog ensemble but also against AUV.
The AUV calibration recursively estimates the wind components via a bivariate bias correction and a univariate variance correction along each wind component in a bivariate normal framework. While EMOS minimizes an objective function based on the CRPS, AUV minimizes an objective function in a recursive maximum likelihood framework. To update the AUV coefficients, only the most recent forecast–measurement pair is used. Exponential forgetting of past forecast–measurement pairs ensures that the coefficients smoothly adapt to changing conditions. The forgetting factor controls the rate of this forgetting and thus the effective length of the training period, with values close to one retaining more past information.
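The full AUV scheme of Pinson (2012) is a bivariate recursive maximum likelihood estimator and is not reproduced here; the following toy sketch only illustrates how a forgetting factor discounts older forecast–measurement pairs in a scalar bias estimate.

```python
def recursive_bias(errors, lam=0.98):
    """Exponentially forgetting running bias estimate; `errors` are
    successive measurement-minus-forecast differences. With forgetting
    factor lam, the effective memory is roughly 1/(1 - lam) pairs."""
    bias, out = 0.0, []
    for e in errors:
        bias = lam * bias + (1.0 - lam) * e  # newest pair gets weight (1 - lam)
        out.append(bias)
    return out
```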
4. Results
As mentioned in the previous section, estimating the EMOS coefficients from the N previous forecasts and measurements has the potential disadvantage that the forecast error distribution over those N forecast–measurement pairs may include information that is not relevant for calibrating the current forecast (Fig. 1, left). The aim of the analog-based EMOS is therefore to select training forecasts (analogs) in the neighborhood of the current forecast, with presumably similar error characteristics (Fig. 1, right).
Fig. 1. Example of the (left) EMOS and (right) analog-based EMOS calibration at Hamburg for the 48-h lead time of the forecast initialized at 0000 UTC 29 Sep 2012. The circles indicate the training data (i.e., pairs of ensemble mean wind speed and corresponding measurement) used to estimate the a and b coefficients in Eq. (1). The red square shows the uncalibrated ensemble mean wind speed forecast, and the blue triangle shows the calibrated ensemble mean. For the analog-based EMOS, the size of each circle is proportional to the inverse of the distance computed with the metric in Eq. (2).
To understand how the analog-based EMOS works compared with EMOS, the distributions of the estimates of the EMOS coefficients are analyzed in Fig. 2. Under the ideal assumption that the analogs are in perfect agreement with the current forecast (infinite training length, a frozen model, and a stationary climate), the ensemble mean would be identical for every training case, and the regression in Eq. (1) would reduce to the conditional distribution of the measurements given the current forecast.
Fig. 2. Distributions of the estimates of the EMOS coefficients a, b, c, and d in Eq. (1) for EMOS and the analog-based EMOS over the test period. The distributions are shown for Karlsruhe for 3–120-h lead times; the shapes of the coefficient distributions are very similar at the remaining sites.
The analog-based EMOS significantly improves the CRPS at almost all sites (Fig. 3). At Hamburg and Karlsruhe, the CRPS decreases by up to 11% compared to EMOS. The analog-based EMOS also outperforms AUV and the analog ensemble at all sites. The Brier score values at the 50th-percentile (common event) and 95th-percentile (rare event) thresholds lead to similar results (Fig. 4). Only for rare events at Fino2 does AUV have higher skill than the analog-based EMOS. However, the statistical significance of the results is lower for rare events because the confidence intervals are considerably wider. Note that a portion of the lower probabilistic skill of the 20-member analog ensemble in terms of the CRPS and BS might be attributed to its smaller ensemble size compared with the 51-member analog-based EMOS (Ferro et al. 2008).
Fig. 3. Continuous ranked probability skill score (CRPSS; %) relative to the EMOS ensemble for the 20-member analog ensemble (Analog), the AUV-calibrated ensemble (AUV), the analog-based EMOS with the optimized predictor weights (AN-EMOS), and the analog-based EMOS with the 100-m ensemble mean wind speed as the only predictor (AN-EMOS-100WS). The CRPSS is presented at each measurement tower for lead times up to 120 h over the test period. The same score calculated over all forecast lead times is shown to the right of each panel. The 95% bootstrap confidence intervals are indicated by the error bars.
Fig. 4. Brier skill score (BSS; %) relative to the EMOS ensemble for wind speeds larger than the (top) 50th- and (bottom) 95th-percentile threshold, for the 20-member analog ensemble (Analog), the AUV-calibrated ensemble (AUV), and the analog-based EMOS ensemble (AN-EMOS), at each measurement tower at 100-m height for 3–120-h lead times over the test period. The boxes indicate 50% and the whiskers 95% bootstrap confidence intervals; the thick black line within each box is the bootstrap median.
The analog-based EMOS is more reliable for rare events than AUV and EMOS, which produce forecast probabilities that are too large compared with the conditional observed frequencies (Fig. 5). The analog ensemble and the analog-based EMOS have overall similar reliability. Reliability diagrams are not shown for common events since all methods yield forecasts with similar reliability and sharpness. We also compared the methods in terms of their statistical consistency with rank histograms and binned-spread/skill diagrams; these results are not presented because all methods produced statistically consistent ensemble forecasts.
Fig. 5. Reliability diagram and sharpness histogram at the 95th-percentile threshold at each measurement tower for 3–120-h lead times over the test period. Each sharpness histogram displays the relative frequency of events in each forecast probability bin. The vertical ranges represent consistency bars calculated with the quantile function of a binomial distribution.
As explained in section 3d, the key component of the analog-based EMOS is the search for past forecast analogs given a set of predictors and their weights optimized over the training period. The ensemble mean wind speed predictor dominates the combination at all sites except at Karlsruhe and Hamburg where wind direction is equally important (Table 1), which may be explained by the stronger wind-direction dependency of ECMWF EPS forecast errors at these sites (not shown).
To show that the skill of the analog-based EMOS strongly depends on the optimized predictor combination, we ran the analog-based EMOS with the ensemble mean wind speed as the only predictor. The CRPS improvements relative to EMOS are close to zero (Fig. 3, black line), which emphasizes the importance of finding the optimal weights of each predictor [also see Junk et al. (2015)]. Thus, a key component of the analog-based EMOS is that information from additional predictors such as wind direction can be considered for building the training data. The additional weighting of the training data with the inverse distance of the analog metric improves the analog-based EMOS only marginally (not shown).
5. Discussion and conclusions
An analog-based ensemble model output statistics (EMOS) method has been proposed to improve EMOS through an analog-based selection of training data. To calibrate an ensemble forecast, past forecast analogs are searched with a multivariate metric, given an optimized predictor combination found by CRPS minimization with a predictor-weighting strategy (Junk et al. 2015). To test the analog-based EMOS for wind energy applications, ensemble forecasts from the ECMWF EPS and wind measurements from six towers in central Europe were used. The analog-based EMOS was compared to EMOS (Gneiting et al. 2005; Thorarinsdottir and Gneiting 2010), the analog ensemble (Hamill and Whitaker 2006), and a recursive and adaptive wind vector calibration (AUV; Pinson 2012), where all postprocessing methods were applied to the raw ECMWF EPS. The analog-based EMOS outperforms EMOS, the analog ensemble, and AUV in terms of the CRPS (with improvements up to 11% with respect to EMOS) and Brier score values for common and rare events. The analog-based EMOS and the analog ensemble are more reliable than EMOS and AUV for rare events.
The skill of the analog-based EMOS strongly depends on optimizing the predictor weights. For future studies, the optimization of the predictor combination could be carried out with additional predictors such as temperature and pressure to possibly improve the skill of the analog-based EMOS. Furthermore, the analog-based EMOS could be compared to the analog ensemble based on only one deterministic model estimate rather than an existing ensemble forecast (Delle Monache et al. 2013b).
We have proposed the analog-based selection for building the training data to generate wind speed predictions with EMOS. This approach could also be applied to EMOS based on a mixture of models as proposed by Lerch and Thorarinsdottir (2013) and Baran and Lerch (2015), or to other methods such as Bayesian model averaging (Raftery et al. 2005) or logistic regression (Wilks 2009). Furthermore, the present study tested the analog-based EMOS only for wind speed at six measurement towers. To generalize the results, more measurement sites should be evaluated and possibly other meteorological variables should be considered.
Acknowledgments
The authors thank the Ministry for Science and Education of Lower Saxony for funds within the research project “Ventus Efficiens” (ZN2988). The authors acknowledge the Karlsruhe Institute of Technology, the Royal Netherlands Meteorological Institute, the Lindenberg Meteorological Observatory—Richard Aßmann Observatory (German Meteorological Service), and the Meteorological Institute of the University of Hamburg for providing the measurements of the onshore masts. The Project Management Jülich and the Federal Maritime and Hydrographic Agency are acknowledged for providing measurements of Fino2 and Fino3. Ensemble predictions were provided by ECMWF. This paper has been improved by valuable comments and suggestions of Jakob Messner (University of Innsbruck) and Jason Knievel (National Center for Atmospheric Research). Furthermore, we thank the editor and two reviewers for their valuable comments and suggestions.
REFERENCES
Alessandrini, S., S. Sperati, and P. Pinson, 2013: A comparison between the ECMWF and COSMO Ensemble Prediction Systems applied to short-term wind power forecasting on real data. Appl. Energy, 107, 271–280, doi:10.1016/j.apenergy.2013.02.041.
Baran, S., and S. Lerch, 2015: Log-normal distribution based Ensemble Model Output Statistics models for probabilistic wind-speed forecasting. Quart. J. Roy. Meteor. Soc., doi:10.1002/qj.2521, in press.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3, doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Bröcker, J., and L. Smith, 2007: Increasing the reliability of reliability diagrams. Wea. Forecasting, 22, 651–661, doi:10.1175/WAF993.1.
Delle Monache, L., T. Nipen, Y. Liu, G. Roux, and R. Stull, 2011: Kalman filter and analog schemes to postprocess numerical weather predictions. Mon. Wea. Rev., 139, 3554–3570, doi:10.1175/2011MWR3653.1.
Delle Monache, L., F. A. Eckel, B. Nagarajan, D. Rife, J. Knievel, T. McClung, and K. R. Searight, 2013a: Optimization of the analog ensemble method. Special Symp. on Advancing Weather and Climate Forecasts: Innovative Techniques and Applications, Austin, TX, Amer. Meteor. Soc., 851. [Available online at https://ams.confex.com/ams/93Annual/webprogram/Paper222187.html.]
Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013b: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.
Efron, B., 1979: Bootstrap methods: Another look at the jackknife. Ann. Stat., 7, 1–26, doi:10.1214/aos/1176344552.
Ferro, C. A., D. S. Richardson, and A. P. Weigel, 2008: On the effect of ensemble size on the discrete and continuous ranked probability scores. Meteor. Appl., 15, 19–24, doi:10.1002/met.45.
Gneiting, T., and A. E. Raftery, 2007: Strictly proper scoring rules, prediction, and estimation. J. Amer. Stat. Assoc., 102, 359–378, doi:10.1198/016214506000001437.
Gneiting, T., A. E. Raftery, A. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, doi:10.1175/MWR2904.1.
Hamill, T., and J. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.
Jammalamadaka, S. R., and A. Sengupta, 2001: Topics in Circular Statistics. Vol. 5. World Scientific Publishing Co. Inc., 336 pp.
Junk, C., L. von Bremen, M. Kühn, S. Späth, and D. Heinemann, 2014: Comparison of postprocessing methods for the calibration of 100-m wind ensemble forecasts at off- and onshore sites. J. Appl. Meteor. Climatol., 53, 950–969, doi:10.1175/JAMC-D-13-0162.1.
Junk, C., L. Delle Monache, S. Alessandrini, G. Cervone, and L. von Bremen, 2015: Predictor-weighting strategies for probabilistic wind power forecasting with an analog ensemble. Meteor. Z., doi:10.1127/metz/2015/0659, in press.
Lerch, S., and T. Thorarinsdottir, 2013: Comparison of non-homogeneous regression models for probabilistic wind speed forecasting. Tellus, 65A, 21206, doi:10.3402/tellusa.v65i0.21206.
Pinson, P., 2012: Adaptive calibration of (u,v)-wind ensemble forecasts. Quart. J. Roy. Meteor. Soc., 138, 1273–1284, doi:10.1002/qj.1873.
Pinson, P., P. McSharry, and H. Madsen, 2010: Reliability diagrams for non-parametric density forecasts of continuous variables: Accounting for serial correlation. Quart. J. Roy. Meteor. Soc., 136, 77–90, doi:10.1002/qj.559.
Raftery, A., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.
Thorarinsdottir, T., and T. Gneiting, 2010: Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. J. Roy. Stat. Soc., 173A, 371–388, doi:10.1111/j.1467-985X.2009.00616.x.
Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361–368, doi:10.1002/met.134.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. Academic Press, 676 pp.