## 1. Introduction

Accurate forecasts of wind speeds and variability are critical for the growth of the wind power industry. Errors in wind speed forecasts lead to errors in forecasting both electricity demand and power supply from wind farms (McSharry et al. 2005). Errors in wind energy prediction can significantly increase production costs and cause loss of income (Fabbri et al. 2005; Cardell and Anderson 2009). The wind power industry needs wind speed forecasts over time scales ranging from a few minutes to several years for a wide range of applications, including turbine blade pitch control, conversion systems control, load scheduling, maintenance scheduling, and resource planning (Sfetsos 2000; Costa et al. 2008). All of these activities involve financial risks that are exacerbated by uncertainties in wind speed forecasts.

Accurate wind forecasts are of particular importance to the U.S. wind power industry. The U.S. Department of Energy’s goal of providing 20% of total power from wind by 2030 is considered an Engineering Grand Challenge (U.S. Department of Energy 2008). At such a high level of market penetration, wind farms must be integrated with the grid (Georgilakis 2008). To be better “grid citizens,” wind farms must guarantee a fixed amount of electricity generation over different time scales (Smith et al. 2004). Over the short-to-medium time horizon, the following three time scales are of interest for operation of the utility system and the structure of the competitive electricity markets (Smith et al. 2004):

- unit-commitment horizon of 1 day to 1 week,
- load-following horizons of 1 h to several hours, and
- regulation horizon of 1 min to 1 h.

For this work, we focus on wind speed forecasts for wind power applications in the load-following horizon. Giebel et al. (2011) provide an excellent review of the available technologies used for this purpose. Currently, such forecasts are obtained almost exclusively from statistical models that study past records to identify patterns to make predictions for the future. Statistical forecasting employs a wide range of methods including linear and nonlinear regressions, autoregressive models, moving-average models, autoregressive moving-average models, autoregressive integrated moving-average models, Kalman filters, and spatial-correlation methods (Duran et al. 2007; Costa et al. 2008; Riahy and Abedi 2008; Lei et al. 2009; Taylor et al. 2009). Recent studies have used artificial intelligence methods employing neural networks, fuzzy logic, support vector machines, and hybrid methods for short-term wind speed forecasting (Oztopal 2006; Singh et al. 2007; Kulkarni et al. 2008; Monfared et al. 2009).

Another option for generating wind speed forecasts is mesoscale numerical weather prediction (NWP) models. NWP models consist of discretized conservation equations of mass, momentum, and energy in the atmosphere. They also include representations for subgrid-scale turbulent transfer and microphysical processes. Starting from an initial state determined by observations, the system of equations is numerically integrated to provide a forecast of atmospheric variables including wind speed and direction for a later time. NWP models require large computing infrastructure and are typically run by national and international agencies. Examples of operational NWP models include the North American Mesoscale model (http://www.emc.ncep.noaa.gov/NAM/), the Global Forecast System model (http://www.emc.ncep.noaa.gov/GFS/), and the Rapid Update Cycle (RUC) model (http://ruc.noaa.gov) at the National Centers for Environmental Prediction, and the European Centre for Medium-Range Weather Forecasts model in Europe.

Deterministic single-valued forecasts from NWP models contain uncertainties primarily due to errors in model initialization and/or model imperfections. These uncertainties can be minimized by conducting ensemble simulations where multiple forecasts are generated by (i) adding small perturbations to the initial conditions, (ii) using different parameterizations for geophysical processes, and (iii) using multiple models (Molteni et al. 1996; Toth and Kalnay 1997; Buizza et al. 2005). The ensemble mean can then be considered the most likely forecast (Buizza et al. 1999; Palmer 2000). Recent studies have proposed improvements over the conventional ensemble averaging method. Linear averaging of outputs from individual ensemble members assumes that the individual forecasts are equiprobable and hence can underestimate uncertainty (Taylor 2004; Taylor and Buizza 2006). To improve the quantification and further reduce the effects of uncertainty, the individual ensemble members can be calibrated against observations. One such calibration technique is Bayesian model averaging (BMA), introduced by Raftery et al. (2005), where weights are calculated for each ensemble member based on their performance during a training period. The weighted average of the member forecasts constitutes the mean forecast. Studies show that BMA-calibrated ensemble forecasts consistently outperform conventional linear-averaged ensemble forecasts (Raftery et al. 2005; Sloughter et al. 2007). In a recent paper that is very relevant to our research, Sloughter et al. (2010) have shown that BMA calibration significantly improves 48-h forecasts of maximum wind speeds in the U.S. Pacific Northwest.

NWP models are rarely used for short-term wind speed forecasting, primarily because of high computing costs (Giebel et al. 2011). However, NWP models have several distinct advantages over statistical models. First, typical statistical models predict the future based on past history. This approach has an implicit assumption of stationarity that may not be applicable in a changing environment. For example, empirically identified relationships that govern wind speed are likely to change with changes in climate or land use–land cover. Second, many statistical models avoid the stationarity problem by using adaptive techniques where model parameters are updated frequently. This approach may also pose a problem if the memory of the system is short, for example, in a dynamic environment dominated by small-scale turbulence where winds are changing fast. Hence, numerical models with prognostic differential equations representing the temporal evolution of atmospheric dynamic and thermodynamic variables are the best option for forecasting wind speeds in changing environments. In fact, a direct comparative study, albeit for longer time scales, has found that appropriately calibrated NWP model ensembles provide better wind power density forecasts than statistical models alone (Taylor et al. 2009). Finally, another major advantage of NWP models is that they can simultaneously provide wind speed forecasts and information on atmospheric turbulence with no added computational costs. The added turbulence forecasting capability can be beneficial for wind turbine operations (Wagner et al. 2010). Hence, there is a need to explore if numerical models can play a role in providing short-term wind speed forecasts.

In this work we attempt to build NWP model-based forecasting systems to predict wind speeds 1 h ahead of time in the load-following horizon. We test the performance of these forecasting systems by comparing with forecasts from persistence, autoregressive (AR), and autoregressive moving-average (ARMA) models. These three statistical time series models are widely used to predict wind speeds in this time horizon (Giebel et al. 2011). We overcome the main hindrance (i.e., high computational costs) by using the Weather Research and Forecasting Single Column Model (WRFSCM), a one-dimensional column model that is computationally much faster than traditional three-dimensional NWP models. We estimate forecast uncertainty by generating ensemble forecasts. Three different ensembles are built and tested: a purely WRFSCM forecast ensemble and two blended ensembles comprising WRFSCM and time series model forecasts. We calibrate these ensembles using the BMA technique to improve the forecasts. We also conduct sensitivity experiments to test if the nature and length of the training period can affect the calibration and further improve the forecasts.

## 2. Materials and methods

### a. Study site and period

In this study we use wind speed data at a height of 90 m from a meteorological tower located at Chalmers Township (40.41°N, 90.72°W) in Illinois. These data are made available by Illinois Wind, a project of the Illinois Institute for Rural Affairs, which collects and disseminates wind data to promote and assist wind energy projects in Illinois. Data from 1 January to 30 June 2006 are used to “train” the ensembles (i.e., to calculate the weights for individual ensemble members). The ensembles are then used to forecast wind speeds for the 1 July 2006–31 December 2007 period and the forecasts are evaluated using the corresponding observations.

Wind speeds are strong functions of atmospheric stability (Sumner and Masson 2006). We compute vertical gradients of equivalent potential temperature *θ*_{e} between the surface and 975 hPa to estimate stability as a function of local standard time (LST). Based on this analysis (Traiteur 2011), we divide the study period into four different regimes: (i) diurnal unstable (0900–1700 LST), (ii) weakly unstable evening transition (1800–2000 LST), (iii) nocturnal stable (2100–0500 LST), and finally (iv) weakly stable morning transition (0600–0900 LST). We estimate four different sets of weights corresponding to these four stability regimes.
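
The regime assignment can be illustrated with a short sketch (the function below is our illustration, not part of the study's code); 0900 LST is treated as the start of the diurnal regime so that each hour maps to exactly one regime, consistent with the 3-h transition and 9-h day/night regime lengths discussed later:

```python
# Illustrative sketch: map a local-standard-time hour to one of the four
# stability regimes defined above. Function name and interface are hypothetical.
def stability_regime(hour_lst):
    if 9 <= hour_lst <= 17:
        return "diurnal unstable"        # 0900-1700 LST
    if 18 <= hour_lst <= 20:
        return "evening transition"      # 1800-2000 LST
    if hour_lst >= 21 or hour_lst <= 5:
        return "nocturnal stable"        # 2100-0500 LST
    return "morning transition"          # 0600-0800 LST
```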

### b. WRFSCM description

We use the WRFSCM to generate the forecasts. WRFSCM is a one-dimensional stand-alone implementation of the WRF mesoscale numerical weather prediction model and is available as a part of the standard WRF distribution (Skamarock et al. 2005; Hacker et al. 2007). WRFSCM runs on a 3 × 3 stencil with periodic lateral boundary conditions in *x* and *y*. It solves the same conservation equations for mass, momentum, energy, and moisture in the atmosphere as the three-dimensional WRF model using a wide range of parameterizations for turbulent mixing and closure, radiation, microphysics, and surface fluxes. The options include different subgrid parameterizations for the surface layer and the planetary boundary layer (PBL) that can simulate different stability regimes.

The model domain is centered on the location of the tower in Chalmers Township. The vertical domain spans from the surface to a height of 12 km and is discretized with 120 levels. With seven to eight layers in the lowest 100 m, the model is capable of adequately resolving the lower PBL where the wind speed sensor is located. The model forecasts are interpolated to 90-m height to compare with the observations. Because of the high vertical resolution of the model (~15 m), the interpolation errors are likely to be small.
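
The interpolation step amounts to simple linear interpolation between adjacent model levels, sketched below with illustrative (not actual model) level heights and wind speeds:

```python
import numpy as np

# Linearly interpolate a forecast wind profile to the 90-m sensor height.
# Level heights (~15-m spacing, as in the model) and speeds are illustrative.
level_heights = np.array([7.5, 22.5, 37.5, 52.5, 67.5, 82.5, 97.5])  # m
wind_speed = np.array([4.0, 5.1, 5.8, 6.3, 6.7, 7.0, 7.25])          # m s^-1

speed_90m = np.interp(90.0, level_heights, wind_speed)
```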

The model is initialized with horizontal winds, potential temperature, moisture, surface pressure, soil temperature, and soil moisture from the hourly updated 20-km-resolution RUC 0-h forecast data for the RUC model grid cell containing the study site. The RUC 0-h forecast is essentially a three-dimensional real-time meteorological analysis dataset that assimilates observations from a wide range of surface, tower, and remote sensing instruments. The model is numerically integrated at a 1-s time step for 1 h to generate the forecast.

In a model intercomparison study, Bosveld et al. (2010) have shown that the performance of WRFSCM with different PBL and surface layer parameterizations is comparable with other state-of-the-art one-dimensional numerical models. We conduct a brief comparison of the WRFSCM with the three-dimensional version of the WRF model for a 7-day period from 0000 UTC 1 August to 0000 UTC 8 August 2008. The model parameterizations are the same as the baseline WRFSCM simulation described below. The three-dimensional simulation is conducted over a 50 km × 50 km domain discretized with 1-km horizontal grid spacing. This domain is successively nested within two coarse-resolution domains: 160 km × 160 km discretized with a 4-km grid and 480 km × 480 km discretized with a 16-km grid.

### c. Ensemble construction

We construct a 21-member ensemble of WRFSCM forecasts (Table 1). The ensemble consists of a baseline run (member 1), 12 initial condition perturbation runs (members 2–13), and 8 model physics variation runs (members 14–21). The number of members in the ensemble is constrained by the availability of model physics options in WRF. The ensemble size can be increased beyond 21 in future experiments if more parameterizations become available. The goal of using a large ensemble is to include all possible sources of variability. We discuss later how the BMA calibration results can be used as a model selection tool to reduce the size of the ensemble by eliminating models that do not significantly contribute to the ensemble mean forecast.

Characteristics of the 21-member WRFSCM ensemble with different initial condition perturbations and surface-layer and boundary layer parameterizations.

We run the baseline case with the Monin–Obukhov surface layer and the Yonsei University (YSU) boundary layer physics schemes and initialize it with the RUC model analysis as described above. We use the Rapid Radiative Transfer Model (RRTM) scheme for longwave radiation, the Dudhia scheme for shortwave radiation, and the Unified Noah land surface model to represent land surface processes. Descriptions of and references for these parameterizations are available in Skamarock et al. (2005).

For the perturbation cases we run the baseline model with different initial conditions. The role of the perturbation runs is to incorporate realistic errors in initialization in the ensemble to produce a realistic estimate of uncertainty in the forecasts. Typically, random numbers are used to perturb the initial conditions for ensemble simulations. However, Roquelaure and Bergot (2008) have shown that forecast uncertainty and errors in initialization are correlated with the intrinsic variability of the initial conditions. Additionally, the calibration process requires that the perturbations for each individual ensemble member must be the same for all the simulations. Following the approach of Roquelaure and Bergot (2008), instead of using random perturbations in the initial vertical profiles, we quantify perturbations as the standard deviation of the data estimated from the 3-month-long study period. The initial conditions are obtained by adding a perturbation equal to ±1 standard deviation to the vertical profiles of one of the following parameters: air temperature, air specific humidity, zonal and meridional winds, and soil moisture and temperature. For example, the initial temperature *T*(*k*) at the *k*th atmospheric level for ensemble member 2 is given by *T*(*k*) = *T*_{0}(*k*) − *σ*(*k*), where *T*_{0}(*k*) and *σ*(*k*) are the unperturbed temperature and the standard deviation of temperature calculated from the RUC analysis, respectively.
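
The perturbation construction can be sketched as follows; the profile and standard-deviation values are illustrative stand-ins for the RUC-derived quantities:

```python
import numpy as np

# Member initial profiles are the unperturbed profile plus or minus one
# climatological standard deviation at every level (values are illustrative).
T0 = np.array([288.0, 286.5, 285.0, 283.2])   # unperturbed temperature (K)
sigma_T = np.array([2.1, 1.9, 1.8, 1.6])      # std dev over training period (K)

T_minus = T0 - sigma_T   # -1 sigma member
T_plus = T0 + sigma_T    # +1 sigma member
```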

To account for uncertainties due to model imperfections, we also conduct a set of eight model physics runs by using various combinations of four surface layer physics schemes and four PBL schemes. While doing so, we make sure that the surface-layer and PBL schemes are compatible with each other.

In addition to the WRFSCM simulations, we also use forecasts from three widely used statistical time series models: persistence, AR, and ARMA models (von Storch and Zwiers 2001). The persistence model assumes that wind speed remains constant for the entire forecast period of 1 h. The AR model forecasts future wind speed as a linear combination of past wind speeds and white noise. The ARMA model superposes an AR model with a moving average of a white-noise series. We conduct sensitivity studies to identify the parameters that produce the best forecasts with these models. Results show that an AR model of order 75 and an ARMA model with order 5 for the autoregressive part and order 50 for the moving-average part produce the lowest mean absolute error.
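
Minimal sketches of the persistence and AR forecasters are given below; the AR fit uses ordinary least squares with a small illustrative order rather than the tuned order 75, and the moving-average component of the ARMA model is omitted for brevity:

```python
import numpy as np

def persistence_forecast(series):
    """Next-hour forecast equals the most recent observation."""
    return series[-1]

def ar_forecast(series, order=2):
    """One-step AR(p) forecast fitted by least squares (with intercept)."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    target = y[order:]
    # Design matrix: rows of [1, y[t-1], ..., y[t-p]]
    X = np.column_stack([np.ones(n - order)] +
                        [y[order - k: n - k] for k in range(1, order + 1)])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    recent = y[-1: -order - 1: -1]   # y[t-1], y[t-2], ...
    return coef[0] + coef[1:] @ recent
```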

Using the numerical and statistical models, we build and test the following three ensembles:

- ensemble 1—21 WRFSCM forecasts,
- ensemble 2—21 WRFSCM forecasts + persistence forecast, and
- ensemble 3—21 WRFSCM forecasts + forecasts from the persistence, AR, and ARMA models.

### d. Ensemble calibration

The forecasts from the individual ensemble members are different estimates of wind speeds for a particular time at the given location. However, these estimates may not be equally likely and hence the ensembles need to be calibrated. We calibrate the ensembles using the BMA technique following the algorithm developed by Raftery et al. (2005) that has been used for many applications. A public-domain version of code for this application is available online (http://cran.r-project.org/web/packages/ensembleBMA). For the sake of brevity we provide only a short description of the algorithm. The BMA method learns from a training dataset which ensemble members are the most efficient in generating the most accurate forecast probability density function (pdf). During this training, BMA iteratively calculates and assigns a weight to each ensemble member to increase the ensemble reliability. The BMA weights and the variance of the BMA pdf are estimated by maximum likelihood (Fisher 1922) from the training data. Due to algebraic simplicity and numerical stability, the log-likelihood function is maximized using the expectation-maximization algorithm (Dempster et al. 1977; McLachlan and Krishnan 1997). While a number of different options are available, we choose the normal-fit, no-bias-correction method with 250 iterations to train our ensembles. Sensitivity studies with the 2-month training window and 1-month evaluation period show that this combination with support restricted to positive wind speeds produces the best fit with the observations. Using data from the 1 January–30 June 2006 period, we estimate four sets of weights for each ensemble member in the three ensembles corresponding to the four different stability regimes discussed earlier.
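
A compact sketch of the EM estimation (normal kernels, a single shared spread parameter, no bias correction) is given below; it is a simplified stand-in for the ensembleBMA package, not the paper's implementation:

```python
import numpy as np

def bma_em(forecasts, obs, n_iter=250):
    """Estimate BMA weights and a common Gaussian spread by EM.

    forecasts: (n_times, n_members); obs: (n_times,). Returns (weights, sigma).
    """
    F = np.asarray(forecasts, float)
    y = np.asarray(obs, float)[:, None]
    n, K = F.shape
    w = np.full(K, 1.0 / K)
    var = np.var(y - F)                     # initial spread estimate
    for _ in range(n_iter):
        # E-step: responsibility of each member for each observation
        dens = np.exp(-0.5 * (y - F) ** 2 / var) / np.sqrt(2 * np.pi * var)
        z = w * dens
        z /= z.sum(axis=1, keepdims=True)
        # M-step: update weights and the shared variance
        w = z.mean(axis=0)
        var = np.sum(z * (y - F) ** 2) / n
    return w, np.sqrt(var)
```

In the full algorithm the log-likelihood is monitored for convergence; the fixed 250 iterations here simply mirror the setting chosen in our training configuration.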

We study the importance of training on the calibrated forecasts by using the following training periods:

- seven training periods of various lengths all ending on 30 June 2006: 2 weeks (17–30 June 2006), 1 month (1–30 June 2006), 2 months (1 May–30 June 2006), 3 months (1 April–30 June 2006), 4 months (1 March–30 June 2006), 5 months (1 February–30 June 2006), and 6 months (1 January–30 June 2006);
- a 2-month moving training period where the ensemble forecast for each day is calibrated with data from the previous 60 days; and
- a 2-week moving training period where the ensemble forecast for each day is calibrated with data from the previous 14 days.
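
The moving-window strategy can be sketched as a simple loop; the `calibrate` stand-in below scores members by inverse mean absolute error so the example stays self-contained (the study uses BMA weights instead):

```python
import numpy as np

def calibrate(train_forecasts, train_obs):
    """Stand-in for BMA training: weight members by inverse MAE."""
    mae = np.abs(train_forecasts - train_obs[:, None]).mean(axis=0)
    w = 1.0 / (mae + 1e-9)
    return w / w.sum()

def moving_window_forecasts(forecasts, obs, window):
    """Recalibrate each forecast using only the preceding `window` samples.

    forecasts: (n_times, n_members); obs: (n_times,).
    """
    out = []
    for t in range(window, len(obs)):
        w = calibrate(forecasts[t - window:t], obs[t - window:t])
        out.append(forecasts[t] @ w)
    return np.array(out)
```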

### e. Forecast evaluation

We conduct ensemble simulations with the WRFSCM and the statistical time series models to forecast 90-m wind speeds at the Chalmers Township tower location for the 1–31 August 2008 period. The linear mean of the forecasts from individual ensemble members constitutes the uncalibrated ensemble forecast. We also calculate the BMA-calibrated ensemble forecasts using the weights estimated from the training simulations. We quantitatively evaluate the performance of the uncalibrated and calibrated ensemble forecasts by comparing them with observations and computing the mean absolute error (MAE) around the median (Gneiting 2011) and bias. Following Sloughter et al. (2010), we also calculate the continuous ranked probability score (CRPS), coverage, and width of the ensemble forecasts. Consider a 21-member calibrated ensemble. If the ensemble members and the observation are exchangeable, the probability that the observation falls below (or above) the ensemble range is 1/22, so the ensemble range corresponds to a 20/22 ≈ 90.9% central prediction interval.
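
The deterministic and probabilistic scores can be sketched as follows; this is a generic implementation of the metrics named above using the standard sample estimator of the CRPS, not the paper's code:

```python
import numpy as np

def ensemble_scores(ens, obs):
    """MAE of the ensemble median, mean bias of the ensemble mean, and the
    sample CRPS, CRPS = E|X - y| - 0.5 E|X - X'|, averaged over times.

    ens: (n_times, n_members); obs: (n_times,).
    """
    ens = np.asarray(ens, float)
    obs = np.asarray(obs, float)
    mae = np.abs(np.median(ens, axis=1) - obs).mean()
    bias = (ens.mean(axis=1) - obs).mean()
    term1 = np.abs(ens - obs[:, None]).mean(axis=1)
    term2 = np.abs(ens[:, :, None] - ens[:, None, :]).mean(axis=(1, 2))
    crps = (term1 - 0.5 * term2).mean()
    return mae, bias, crps
```

For a single-member (point) forecast the second CRPS term vanishes and the score reduces to the absolute error, which is what allows a direct comparison against the persistence, AR, and ARMA forecasts.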

The statistical significance of the performance improvements is estimated using the Student's *t* test at a significance level of *p* < 0.001. For reference purposes, we also compare the performance of the three ensemble forecasts with the persistence, AR, and ARMA forecasts.

## 3. Results

### a. WRFSCM evaluation

Figure 1 compares the outputs from the WRFSCM baseline and WRF three-dimensional simulation. Because of the lack of spatial structure in the horizontal, WRFSCM cannot properly simulate the effects of horizontal transport of momentum through the tower site. Despite this drawback, the performance of WRFSCM is qualitatively comparable to the WRF model. The correlation between the two models is 0.52, significant at *p* < 0.001. The WRFSCM simulations tend to underpredict the wind speeds relative to WRF and the observations. The subsequent sections discuss how BMA calibration can improve the WRFSCM forecast quality by reducing the negative bias.

### b. Forecast evaluation

We show as an example the vertical profiles of wind speed forecasts from all models for 0900 LST 1 August 2008 in Fig. 2. All the simulated profiles are qualitatively similar, increasing logarithmically with height. The corresponding pdf from the BMA-calibrated ensembles for the same period is more evenly distributed around the observed value than the uncalibrated ensemble pdf. Thus, simple visual inspection indicates that BMA calibration improves the ensemble predictability for this example forecast. Obviously, this is just one example forecast and the results for the other forecasts can be different. Since it is impossible to plot and analyze the results of the thousands of individual forecasts, we conduct a comprehensive evaluation of all forecasts by computing error statistics (Table 2).

Example 90-m wind speed forecasts for Chalmers Township at 0900 LST 1 Jul 2006. (left) Forecast wind speed profiles from 21 simulations. The black plus signs represent the observations at 30-, 60-, and 90-m heights. (right) The pdfs of the uncalibrated (gray vertical bars) and calibrated (black line) ensemble forecasts. The arrow on the *x* axis denotes the value of the observation.

Citation: Journal of Applied Meteorology and Climatology 51, 10; 10.1175/JAMC-D-11-0122.1


MAE, mean bias, CRPS, coverage, and width of forecasts under four different environmental stability regimes for the forecasts from the persistence (P), autoregressive (AR), and autoregressive moving-average (ARMA) models, as well as the uncalibrated ensembles 1–3 and those same ensembles calibrated using data from seven different training periods ranging from 2 weeks to 6 months.

The results show that the calibrated ensembles 2 and 3 consistently provide better forecasts than the statistical time series models and the uncalibrated ensembles. We primarily focus on the MAE and CRPS because positive and negative biases in individual forecasts may cancel each other out and lead to a small mean bias. Note that the CRPS reduces to the absolute error for point forecasts (e.g., persistence, AR, and ARMA). The calibrated ensemble-2 forecasts show statistically significant (*p* < 0.001) improvements in MAE of 3%–12% over the statistical time series models and of 19%–44% over the uncalibrated ensembles. Similarly, the CRPS scores show up to 20% improvement in the calibrated ensemble-3 forecasts over the uncalibrated ensembles.

The key to the success of ensembles 2 and 3 is the adaptive blending of numerical and statistical models. Ensemble 1, the ensemble based purely on numerical model forecasts, performs poorly compared to the three time series models. The results show improvement in ensembles 2 and 3, where forecasts from both numerical and statistical models are combined. This improvement increases dramatically when the ensembles are calibrated. Calibrated ensembles 2 and 3 consistently outperform the uncalibrated ensembles as well as the three time series models. We note here that for ensemble 1, calibration does not lead to any statistically significant improvement in MAE and only a small improvement in CRPS, highlighting the importance of the time series models in producing accurate ensemble forecasts.

The calibrated ensembles are a diverse, if skewed, mix of information from both numerical and statistical models. For example, consider ensemble 2 with a 2-month training period (Table 3). The persistence model contributes 68%–73% of the calibration weights in ensemble 2 for different regimes. However, the WRFSCM forecasts also contribute significantly to the ensemble mean. Hence, it can be argued that the additional information from the WRFSCM forecasts leads to improved performance of ensemble 2 relative to the persistence forecast.

Weights calculated from the BMA calibration for the individual ensemble members under four different environmental conditions using the 2-month training period (May–June 2006). All weights <0.01 are reported as 0. Ensemble members with weights <0.01 in all regimes are not shown.

In general, the differences between ensembles 2 and 3 are not statistically significant, implying that the addition of the AR and ARMA model forecasts does not improve the performance of ensemble 3 over ensemble 2. This is because there is no statistically significant difference between the persistence, AR, and ARMA models except during the evening transition regime. Thus, including forecasts from the AR and ARMA models adds little new information to the ensembles.

The performances of individual ensemble members are widely divergent. The calibration weights for ensemble 2 with a 2-month training period (Table 3) show that the persistence model consistently outperforms the numerical models in all four stability regimes. Only 8 of the 21 WRFSCM simulations provide meaningful results with some degree of consistency. Therefore, the 13 remaining WRFSCM configurations can probably be eliminated from the ensemble without sacrificing performance in future experiments. In this sense, BMA calibration can be a valuable tool for model selection in ensemble forecasting. Even though this study evaluates WRFSCM performance over the entire seasonal cycle, it is important to keep in mind that the 13 underperforming WRFSCM configurations might perform better in other locations. Hence, rigorous calibration must be done prior to using BMA as a model selection tool.

As described earlier, we calculate the coverage and width of the 90% central prediction interval to compare across the ensembles. The impacts of calibration on the coverage and width are strongest for ensembles 2 and 3, especially during the nocturnal stable and morning transition regimes. In both cases, the width of the 90% prediction interval decreases, but that decrease is always accompanied by a decrease in coverage as well. Sloughter et al. (2010) found a similar trade-off between a sharper interval and reduced coverage due to BMA calibration.
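
A quantile-based sketch of these diagnostics is given below; the paper derives the interval from the BMA predictive pdf, whereas this simplified stand-in takes the 5th and 95th percentiles of the raw ensemble:

```python
import numpy as np

def interval_diagnostics(ens, obs, level=0.90):
    """Coverage and mean width of the central `level` prediction interval.

    ens: (n_times, n_members); obs: (n_times,).
    """
    lo_q, hi_q = 50.0 * (1.0 - level), 50.0 * (1.0 + level)
    lo = np.percentile(ens, lo_q, axis=1)   # lower bound per forecast time
    hi = np.percentile(ens, hi_q, axis=1)   # upper bound per forecast time
    coverage = np.mean((obs >= lo) & (obs <= hi))
    width = np.mean(hi - lo)
    return coverage, width
```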

### c. Training period sensitivity study

The length of the training period seems to have a fundamental impact on mean forecast accuracy during the morning and evening transition periods (Table 2). For example, in the morning-calibrated ensemble-3 forecasts, the 2-week training period produces a mean absolute error of 1.34 m s^{−1} that is significantly reduced when the training period is increased to 1 month. Further increasing the training period does not significantly reduce the MAE. For the evening transition period, the calibrated ensemble-2 and ensemble-3 forecasts show statistically significant improvement as the training period is increased from 2 weeks to 2 months but no improvement is observed with longer training periods. Hence, a 2-month-long training period is probably optimal for calibrating these ensembles.

In contrast with the transition regimes, a 2-week-long training period seems to be sufficient for the day and night regimes. According to our classification, the morning and evening transition regimes are 3 h long while the day and night regimes are 9 h long. A 2-week training period consists of 42 training data points for the morning and evening regimes but 126 data points for the day and night regimes. Thus, a 2-week training period contains a reasonably large number of data points to calibrate the ensembles during the day and night stability regimes but not enough to calibrate the ensembles during the transition regimes.

The previous results are based on static training windows: the calibration weights are first calculated from a fixed training dataset, and those weights are then used to calculate the mean forecast for all subsequently generated ensembles. We also conduct sensitivity studies with 2-week and 2-month moving training windows where the weights are updated for each forecast using the latest information on model performance. For example, with the 2-month moving training window, each ensemble for a particular forecast period is calibrated with weights calculated from the 2-month period immediately preceding that forecast period.

The outcomes of these sensitivity studies are not shown but summarized here for brevity. All calibrated forecasts lead to a statistically significant improvement over the corresponding uncalibrated forecasts. However, the improvements with moving windows are statistically similar to those obtained with fixed windows. A comparison of the calibration weights using the two different methods reveals a statistically significant positive correlation of 0.89 (*p* < 0.001). This implies the performance of the models does not change with time (e.g., a model performs equally well in the summer as in the winter). This is especially true because only a small subset of the models actually contribute meaningful information to the ensemble mean forecast. Moreover, the ensemble means are dominated by forecasts from statistical models. Hence, updating the weights at every forecast period using a moving training window instead of a fixed window does not improve the mean ensemble performance.

### d. Computational efficiency

The major constraint in using NWP models is computational cost. However, single-column models like WRFSCM are several orders of magnitude faster than traditional three-dimensional models because they have fewer grid points in the horizontal. Each hour-long WRFSCM integration takes a few seconds on a LINUX personal computer (PC). In contrast, a single hourly forecast with the WRF three-dimensional run described earlier takes 21.9 min. Computing an entire 21-member ensemble takes between 30 and 110 s (Fig. 3). The mean wall-clock time is 61 s with 99.7% of the ensembles taking less than 80 s. Calibrating the ensemble, which involves simply computing a weighted mean, takes less than 1 s. Thus, the WRFSCM ensembles are capable of producing forecasts with sufficient lead time. Computational speed can be further increased by parallelizing the code and/or by eliminating ensemble members with low calibration weights.

Histogram of wall-clock times required to generate a 22-member ensemble forecast, including 21 hour-long WRFSCM simulations, on a LINUX PC. The additional time required for the calibration of the ensemble is <1 s.

Citation: Journal of Applied Meteorology and Climatology 51, 10; 10.1175/JAMC-D-11-0122.1


## 4. Conclusions and discussion

In this work we attempt to develop a computationally efficient and accurate method of providing wind speed forecasts over a 1-h time scale for wind energy applications. We find that the best results are obtained from a blended ensemble, consisting of WRFSCM and time series model forecasts, that is calibrated with the BMA technique using data from a potential wind-farm site. This system provides significantly more accurate forecasts than uncalibrated ensembles and common statistical time series models such as persistence, AR, and ARMA under all environmental stability regimes.
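The statistical baselines named above are simple to state. As a minimal sketch (the wind speed series and the least-squares AR(1) fit are illustrative, not the paper's exact configuration):

```python
import numpy as np

def persistence_forecast(series):
    # Persistence: the next hour equals the latest observation.
    return series[-1]

def ar1_forecast(series):
    # AR(1) fitted by ordinary least squares on lag-1 pairs; a simple
    # stand-in for the AR/ARMA baselines, with illustrative coefficients.
    x = np.asarray(series, dtype=float)
    y_prev, y_next = x[:-1], x[1:]
    A = np.vstack([np.ones_like(y_prev), y_prev]).T
    c, phi = np.linalg.lstsq(A, y_next, rcond=None)[0]
    return c + phi * x[-1]

winds = [7.2, 7.5, 7.1, 6.9, 7.3, 7.6, 7.4]  # hypothetical hourly speeds (m/s)
```

Persistence simply carries the last observation forward, while the AR(1) forecast pulls the prediction toward the series mean; the calibrated blended ensemble must beat both to justify its extra cost.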

Physically based models that solve the conservation equations of atmospheric dynamics and thermodynamics have the potential to consistently produce better wind speed forecasts than statistical models, especially in rapidly changing environments such as the turbulent surface layer. However, this potential is rarely realized because of a range of factors, including imperfections in model parameterizations and initialization. This study demonstrates that there is a strong need to improve the ability of numerical models to simulate the stable boundary layer and the stable–unstable transition periods (Draxl et al. 2010; Shin and Hong 2011). Until numerical models improve significantly, statistical models will continue to be the method of choice for forecasting nocturnal wind speeds. One area of improvement that may have an immediate impact is model initialization. Holtslag et al. (2007) found that land–atmosphere interactions are key to better performance of one-dimensional models in stable environments. Hence, initialization with local soil moisture and temperature data, instead of RUC model analyses, is likely to improve the performance of the WRFSCM forecasts. Another major advantage of NWP models is that they can also provide information about atmospheric turbulence, which can be of critical importance in wind farm operations. We are currently investigating this topic for a follow-up paper.

Our forecasting systems were tested over flat terrain. A major limitation of these systems is that single-column models like WRFSCM are likely to perform poorly in complex terrain, where three-dimensional models are essential for simulating the effects of topography on local meteorology. Even though many wind farms are located in regions with complex topography, more and more wind farms are being constructed on relatively flat land. In fact, the Midwest and the Great Plains are among the fastest-growing regions in terms of wind power (U.S. Department of Energy 2011). Hence, this forecasting system can develop into a valuable tool for the U.S. wind power industry.

Currently, 0–3-h wind speed forecasts are generated almost exclusively with statistical models (Giebel et al. 2011). This work shows that a computationally efficient short-term forecasting system can be constructed using an ensemble of numerical models. When appropriately calibrated, this system can be quite skillful in forecasting wind speeds and reducing forecast uncertainty in certain environments. This is a case study, and the system must obviously be tested at other locations. In particular, additional testing with the gamma distribution used by Sloughter et al. (2010) instead of the normal distribution is essential. Even then, the results are likely to trigger a debate in the scientific community about the role of numerical model ensembles in short-term wind speed forecasting.
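Swapping component distributions in BMA changes only the mixture's kernel: the predictive density is a weighted sum of per-member component densities. The sketch below contrasts normal and gamma components; the moment-based gamma parameterization follows the general idea of Sloughter et al. (2010), but all parameter values here are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    # Gaussian density; suitable for variables unbounded on both sides.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gamma_pdf(x, mean, sd):
    # Gamma density parameterized by mean and standard deviation
    # (shape and rate recovered from the moments); respects the
    # nonnegativity of wind speed, unlike the normal.
    if x <= 0.0:
        return 0.0
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return (rate ** shape) * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)

def bma_density(x, member_means, weights, sd, component=normal_pdf):
    # BMA predictive density: weighted mixture of member-centered components.
    return sum(w * component(x, m, sd) for w, m in zip(weights, member_means))
```

With gamma components the mixture assigns zero probability to negative wind speeds, which is the main motivation for testing it in place of the normal distribution.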

## REFERENCES

Bosveld, F. C., P. Baas, and A. A. M. Holtslag, 2010: The Third GABLS SCM Intercomparison and Evaluation Case. Preprints, *19th Symp. on Boundary Layer Turbulence,* Keystone, CO, Amer. Meteor. Soc., 5.2. [Available online at http://ams.confex.com/ams/pdfpapers/172634.pdf.]

Bougeault, P., and P. Lacarrere, 1989: Parameterization of orography-induced turbulence in a mesobeta-scale model. *Mon. Wea. Rev.*, **117**, 1872–1890.

Buizza, R., M. Miller, and T. N. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF Ensemble Prediction System. *Quart. J. Roy. Meteor. Soc.*, **125**, 2887–2908.

Buizza, R., and Coauthors, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. *Mon. Wea. Rev.*, **133**, 1076–1097.

Cardell, J. B., and C. L. Anderson, 2009: Estimating the system costs of wind power forecast uncertainty. *Proc. Power and Energy Society General Meeting,* Calgary, AB, Canada, IEEE.

Costa, A., A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa, 2008: A review on the young history of the wind power short-term prediction. *Renew. Sustain. Energy Rev.*, **12**, 1725–1744.

Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977: Maximum likelihood from incomplete data via the EM algorithm. *J. Roy. Stat. Soc.*, **39B**, 1–38.

Draxl, C., A. N. Hahmann, A. Pena, J. N. Nissen, and G. Giebel, 2010: Validation of boundary-layer winds from WRF mesoscale forecasts with applications to wind energy forecasting. Preprints, *19th Symp. on Boundary Layers and Turbulence,* Keystone, CO, Amer. Meteor. Soc., 1B.1. [Available online at http://ams.confex.com/ams/pdfpapers/172440.pdf.]

Duran, M. J., D. Cros, and J. Riquelme, 2007: Short-term wind power forecast based on ARX models. *J. Energy Eng.*, **133**, 172–180.

Fabbri, A., T. G. S. Roman, J. R. Abbad, and V. H. M. Quezada, 2005: Assessment of the cost associated with wind generation prediction errors in a liberalized electricity market. *IEEE Trans. Power Syst.*, **20**, 1440–1446.

Fisher, R. A., 1922: On the mathematical foundations of theoretical statistics. *Philos. Trans. Roy. Soc. London*, **222A**, 309–368.

Georgilakis, P. S., 2008: Technical challenges associated with the integration of wind power into power systems. *Renew. Sustain. Energy Rev.*, **12**, 852–863.

Giebel, G., R. Brownsword, G. Kariniotakis, M. Denhard, and C. Draxl, 2011: The state-of-the-art in short-term prediction of wind power. Project ANEMOS Deliverable Rep. D1.2, Roskilde, Denmark. [Available online at http://www.prediktor.dk/publ/GGiebelEtAl-StateOfTheArtInShortTermPrediction_ANEMOSplus_2011.pdf.]

Gneiting, T., 2011: Making and evaluating point forecasts. *J. Amer. Stat. Assoc.*, **106**, 746–762.

Hacker, J. P., J. L. Anderson, and M. Pagowski, 2007: Improved vertical covariance estimates for ensemble-filter assimilation of near-surface observations. *Mon. Wea. Rev.*, **135**, 1021–1036.

Holtslag, A. A. M., G. J. Steeneveld, and B. J. H. van de Wiel, 2007: Role of land-surface temperature feedback on model performance for the stable boundary layer. *Bound.-Layer Meteor.*, **125**, 361–376.

Hong, S.-Y., and S.-W. Kim, 2008: Stable boundary layer mixing in a vertical diffusion scheme. Preprints, *18th Symp. on Boundary Layers and Turbulence,* Stockholm, Sweden, Amer. Meteor. Soc., 16B.2. [Available online at http://ams.confex.com/ams/pdfpapers/140120.pdf.]

Janjić, Z. I., 1994: The step-mountain eta-coordinate model: Further developments of the convection, viscous sublayer and turbulent closure schemes. *Mon. Wea. Rev.*, **122**, 927–945.

Janjić, Z. I., 2001: Nonsingular implementation of the Mellor–Yamada level 2.5 scheme in the NCEP meso model. NOAA/NWS/NCEP Office Note 437, 61 pp. [Available online at http://www.emc.ncep.noaa.gov/officenotes/newernotes/on437.pdf.]

Kulkarni, M. A., S. Patil, G. V. Rama, and P. N. Sen, 2008: Wind speed prediction using statistical regression and neural network. *J. Earth Syst. Sci.*, **117**, 457–463.

Lei, M., L. Shiyan, J. Chuanwen, L. Hongling, and Z. Yan, 2009: A review on the forecasting of wind speed and generated power. *Renew. Sustain. Energy Rev.*, **13**, 915–920.

McLachlan, G. J., and T. Krishnan, 1997: *The EM Algorithm and Extensions.* John Wiley and Sons, 274 pp.

McSharry, P. E., S. Bouwman, and G. Bloemhof, 2005: Probabilistic forecasts of the magnitude and timing of peak electricity demand. *IEEE Trans. Power Syst.*, **20**, 1166–1172.

Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. *Quart. J. Roy. Meteor. Soc.*, **122**, 73–119.

Monfared, M., H. Rastegar, and H. M. Kojabadi, 2009: A new strategy for wind speed forecasting using artificial intelligent methods. *Renew. Energy*, **34**, 845–848.

Monin, A. S., and A. M. Obukhov, 1954: Basic laws of turbulent mixing in the surface layer of the atmosphere. *Contrib. Geophys. Inst. Acad. Sci. USSR*, **151**, 163–187.

Nakanishi, M., and H. Niino, 2009: Development of an improved turbulence closure model for the atmospheric boundary layer. *J. Meteor. Soc. Japan*, **87**, 895–912.

NARUC, 2007: FERC Order 890: What does it mean for the West? National Association of Regulatory Utility Commissioners (NARUC), National Wind Coordinating Collaborative (NWCC), and the Western Governors’ Association, 8 pp. [Available online at http://www.nationalwind.org/assets/publications/ferc890.pdf.]

Oztopal, A., 2006: Artificial neural network approach to spatial estimation of wind velocity data. *Energy Convers. Manage.*, **47**, 395–406.

Palmer, T. N., 2000: Predicting uncertainty in forecasts of weather and climate. *Rep. Prog. Phys.*, **63**, 71–116.

Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. *Mon. Wea. Rev.*, **133**, 1160–1174.

Riahy, G. H., and M. Abedi, 2008: Short term wind speed forecasting for wind turbine applications using linear prediction method. *Renew. Energy*, **33**, 35–41.

Roquelaure, S., and T. Bergot, 2008: A local ensemble prediction for fog and low clouds: Construction, Bayesian model averaging calibration, and validation. *J. Appl. Meteor. Climatol.*, **47**, 3072–3088.

Sfetsos, A., 2000: A comparison of various forecasting techniques applied to mean hourly wind speed time series. *Renew. Energy*, **21**, 23–35.

Shin, H. H., and S.-Y. Hong, 2011: Intercomparison of planetary boundary layer parameterizations in the WRF model for a single day from CASES-99. *Bound.-Layer Meteor.*, **139**, 261–281, doi:10.1007/s10546-010-9583-z.

Singh, S., T. S. Bhatti, and D. P. Kothari, 2007: Wind power estimation using artificial neural network. *J. Energy Eng.*, **133**, 46–52.

Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. D. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note TN-468+STR, 88 pp.

Sloughter, J. M., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. *Mon. Wea. Rev.*, **135**, 3209–3220.

Sloughter, J. M., T. Gneiting, and A. E. Raftery, 2010: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. *J. Amer. Stat. Assoc.*, **105**, 25–35, doi:10.1198/jasa.2009.ap08615.

Smith, J. C., E. A. DeMeo, B. Parsons, and M. Milligan, 2004: Wind power impacts on electric power system operating costs: Summary and perspective on work to date. NREL/CP-500-35946, National Renewable Energy Laboratory, 13 pp. [Available online at http://www.nrel.gov/docs/fy04osti/35946.pdf.]

Sukoriansky, S., B. Galperin, and V. Perov, 2006: A quasi-normal scale-elimination model of turbulence and its application to stably stratified flows. *Nonlinear Processes Geophys.*, **13**, 9–22.

Sumner, J., and C. Masson, 2006: Influence of atmospheric stability on wind turbine power performance curves. *J. Sol. Energy Eng.*, **128**, 531–538.

Taylor, J. W., 2004: Forecasting weather variable densities for weather derivatives and energy prices. *Modelling Prices in Competitive Electricity Markets,* D. W. Bunn, Ed., Wiley, 307–330.

Taylor, J. W., and R. Buizza, 2006: Density forecasting for weather derivative pricing. *Int. J. Forecasting*, **22**, 29–42.

Taylor, J. W., P. E. McSharry, and R. Buizza, 2009: Wind power density forecasting using ensemble predictions and time series models. *IEEE Trans. Power Syst.*, **24**, 775–782.

Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. *Mon. Wea. Rev.*, **125**, 3297–3319.

Traiteur, J., 2011: A short-term wind speed forecasting system for wind power applications. M.S. thesis, Dept. of Atmospheric Sciences, University of Illinois at Urbana–Champaign, 76 pp.

U.S. Department of Energy, 2008: 20% wind energy by 2030: Increasing wind energy’s contribution to U.S. electricity supply. DOE/GO-102008-2567, 248 pp. [Available online at http://www.20percentwind.org/20percent_wind_energy_report_revOct08.pdf.]

U.S. Department of Energy, cited 2011: U.S. installed capacity and wind project locations. [Available online at http://www.windpoweringamerica.gov/wind_installed_capacity.asp.]

von Storch, H., and F. W. Zwiers, 2001: *Statistical Analysis in Climate Research.* Cambridge University Press, 484 pp.

Wagner, R., M. S. Courtney, T. J. Larsen, and U. Schmidt Paulsen, 2010: Simulation of shear and turbulence impact on wind turbine performance. Risø DTU National Laboratory for Sustainable Energy, Roskilde, Denmark, 55 pp.