This paper evaluates the quality of neural network classifiers for wind speed and wind gust prediction with prediction lead times between +1 and +24 h. The predictions were realized based on local time series and model data. The selection of appropriate input features was initiated by time series analysis and completed by empirical comparison of neural network classifiers trained on several choices of input features. The selected input features involved time of day, day of year, features from a single wind observation device at the site of interest, and features derived from model data. The quality of the resulting classifiers was benchmarked against persistence for two different sites in Switzerland. Judged on a specific performance measure and on hit and false-alarm rates, the neural network classifiers exhibited superior quality compared with persistence.
Present forecast systems are capable of providing realistic approximations of the dynamics in the atmosphere. To guarantee the stability of the numerical methods, the rather sparse and noisy observations of the atmosphere have to be represented in a consistent manner by a data assimilation process. The consequent diffusion involved in such a data assimilation process down-weights small-scale phenomena in favor of consistency. To avoid this loss of small-scale information, local data-based methods are presented in this paper.
The crucial aspects of the underlying data are unknown statistical distribution densities, ambiguity, and imbalances. Sources of ambiguity are observation errors and the highly dynamical and nonhomogeneous nature of the atmosphere. Strong winds are much less frequent than weak winds. This is manifest as a strong imbalance in the frequencies of strong and weak wind events. According to Kretzschmar (2002), the skew between weak and severe winds is of the order of 100:1.
Previous data-based approaches for local wind prediction include the publications of Sfetsos (2000), Alexiadis et al. (1998), Mohandes et al. (1998), Kariniotakis et al. (1996), and Beyer et al. (1994). According to their results, neural network function approximators outperformed autoregressive models for the prediction of average local wind speeds for several averaging times, prediction leads, and sites. Nevertheless, none of these studies explicitly addressed the issues of wind gusts and severe winds, and none of these studies referred explicitly to the ambiguity and the imbalances involved in the local wind time series.
This paper addresses the prediction of average hourly local ground wind speed and hourly local ground wind gust maximum values for prediction lead times between +1 and +24 h. Often, a user is more interested in intervals or classes of wind values than in the exact values. For this reason, in this paper the local wind prediction task is formulated as a classification task to match the desires of such a user as closely as possible. The application relies on neural network classifiers that were tuned according to a cost-sensitive error function. Neural networks were chosen to account for the ambiguity inherent in the data source, and the cost-sensitive error function was employed to account for the rare occurrence of strongest winds. Neural networks have been reported to be successful in several meteorological applications, for example, Marzban (2003), Reusch and Alley (2002), and Schoof and Pryor (2001).
The paper is organized as follows. Section 2 outlines the concepts involved in the prediction system, section 3 presents the neural network classifiers, section 4 reports on the input feature selection, section 5 gives the results, and section 6 concludes the paper and formulates a brief outlook for future research.
This paper investigates the problem of the prediction of local ground winds with a focus on hourly wind speed averages and hourly wind gust maximum values. This section reports on the concepts involved in this application.
The local wind prediction was realized by several classifiers that predicted the classes of each specific variable (wind speed and wind gust) for two sites in Switzerland, namely, Geneva and Sion. According to Fig. 1, Geneva is located in a rather flat area next to Lake Geneva and Sion is situated in a mountain valley. For the prediction of hourly wind speed and wind gusts with lead times from +1 to +24 h, a total of 96 individual approximators would have to be utilized for the final system to predict each lead of each variable at each site. However, for practical reasons the search for the best strategy on how to design and tune those classifiers was restricted to +1-, +6-, +12-, and +24-h lead times only. Another strategy is to predict only one time step ahead and to use this prediction to calculate the prediction of the following time step, and so on. Consequently, such an iterative approach requires one to predict all variables that may be used as inputs. The main reason for not having used such an iterative approach for this study is that it was not clear at the beginning which input features were relevant. Moreover, according to the results in Kariniotakis et al. (1996), the iterative approaches tested did not outperform the noniterative approaches for the prediction of local winds.
For most applications, the exact wind speed and wind gust values (m s−1) are not of interest—the user is more interested in classes of wind speed and wind gust values. A common standard is to use integer Beaufort values as wind speed and wind gust classes. However, a slightly altered class definition was used, where neither small nor large Beaufort values were distinguished. Table 1 gives the definitions of the wind speed and wind gust classes used in this paper.
For the prediction of wind speed and wind gusts, the higher classes, that is, strong or severe winds, are of major interest. The evaluation of the prediction quality of strong wind speeds and wind gusts was based on hit and false-alarm rates. The following definitions were used. For each class C_j, the hit rate R_{Hit_j} is defined as

R_{\mathrm{Hit}_j} = \frac{n_{c_j}}{|C_j|},     (1)

where n_{c_j} is the number of correct hits for the class C_j, and |C_j| is the cardinality, that is, the number of samples in class C_j. The false-alarm rate R_{FA_j} is defined as

R_{\mathrm{FA}_j} = \frac{n_{f_j}}{n_{t_j}},     (2)

where n_{f_j} is the number of false hits for class C_j (i.e., the number of samples that were assigned to class C_j by the classifier but which belong to another class), and n_{t_j} = n_{c_j} + n_{f_j} is the number of total indications or predictions of class C_j by the classifier. Accordingly, the hit rate R_{Hit_j} gives the probability that class C_j events are detected by the classifier, and, whenever the classifier indicates class C_j, the false-alarm rate R_{FA_j} gives the probability that this indication is wrong. Note that these definitions of hit and false-alarm rates do not tell how many times a specific class C_j is indicated or predicted; they only present relative measures.
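As an illustration, these rates can be computed directly from arrays of predicted and observed class labels. The following NumPy sketch (function and variable names are our own, not part of the original system) returns R_Hit and R_FA for each class:

```python
import numpy as np

def hit_and_false_alarm_rates(predicted, desired, n_classes):
    """Per-class hit and false-alarm rates as defined in (1) and (2).

    predicted, desired: integer class labels, one per sample.
    Returns two lists: hit rates and false-alarm rates per class.
    """
    predicted = np.asarray(predicted)
    desired = np.asarray(desired)
    hit, fa = [], []
    for j in range(n_classes):
        n_c = np.sum((predicted == j) & (desired == j))  # correct hits for C_j
        card = np.sum(desired == j)                      # cardinality |C_j|
        n_t = np.sum(predicted == j)                     # total indications of C_j
        n_f = n_t - n_c                                  # false hits for C_j
        hit.append(n_c / card if card else np.nan)
        fa.append(n_f / n_t if n_t else np.nan)
    return hit, fa
```

Classes that never occur (or are never indicated) have an undefined rate, returned here as NaN.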
The overall quality of the classifiers was judged on a specific performance measure that takes into account that small errors are less severe than large errors. Often, small errors, that is, small differences between the predicted and the desired (or true) class labels, are less critical than large differences. This can be expressed by a performance measure P that relies on a so-called "score matrix," which assigns a weight to each realization of predicted and desired classes. A large performance value P indicates a classifier of high quality, and a low performance value indicates a classifier of low quality. In the score matrix, a small weight accounts for a critical error (i.e., provides a small contribution to P) and a large weight accounts for a less critical or no error (i.e., provides a large contribution to P). Usually the largest value in the score matrix (here 1.0) is used to express a correspondence between predicted and desired classes. Table 2 shows the score matrices for wind speed and wind gusts, which were defined by MeteoSwiss. The performance is calculated according to

P = \frac{1}{|\mathcal{X}|} \sum_{k \in \mathcal{X}} s_{ab},     (3)

where |𝒳| denotes the cardinality of the set of samples 𝒳, the sum runs over all samples k in 𝒳, and s_{ab} is the score matrix element that belongs to the predicted, that is, estimated, class C^{est}_a and the desired class C^{des}_b of the kth sample.
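The performance P can be sketched in the same way. The 3 × 3 score matrix below is a hypothetical example for illustration only; the actual wind speed and wind gust score matrices were defined by MeteoSwiss and are given in Table 2:

```python
import numpy as np

def performance(predicted, desired, score_matrix):
    """Performance P: mean score s_ab over all samples, where a is the
    predicted class index and b the desired class index, cf. (3)."""
    S = np.asarray(score_matrix)
    return float(np.mean([S[a, b] for a, b in zip(predicted, desired)]))

# Hypothetical score matrix: 1.0 on the diagonal (correct prediction),
# smaller weights for larger class-label distances (more critical errors).
S = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
# Two exact hits, one near miss (scores 0.5), one large error (scores 0.0).
P = performance([0, 1, 2, 2], [0, 1, 1, 0], S)
```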
The neural networks employed were conventional feed-forward neural networks (FFNN) that were realized in two variants—directly, as classifiers to match the desires of the user, and indirectly, as function approximators that predicted the exact wind values. In the latter approach, the continuous predictions were transformed into discrete class bins in a postprocessing step by thresholding. Class imbalances were addressed by a cost-sensitive error function.
To evaluate the quality of the final classifiers, the results were benchmarked against persistence. Persistence is a common benchmark, which predicts that the future conditions are identical to the present ones. Note that this method is incapable of predicting any change, for example, the beginning or the end of a wind class period. An attempt to assess the statistical significance of these results was not made at this stage. Because of the absence of specific target performance values or hit and false-alarm rates, the aim of the study was to improve the quality of the classifiers as much as possible with respect to persistence and hit and false-alarm rates. Whether these improvements may be sufficient for specific applications was not a subject of this study.
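For reference, the persistence benchmark amounts to nothing more than shifting the observed class series by the lead time; a minimal sketch:

```python
def persistence_forecast(class_series, lead):
    """Persistence: the class predicted for time t + lead is the class
    observed at time t. The forecast for the first `lead` hours of the
    record is undefined and is simply dropped here."""
    return class_series[:-lead]

# The forecasts align with the later observations:
observed = [2, 2, 3, 3, 2]
forecast = persistence_forecast(observed, lead=1)  # compare with observed[1:]
```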
Cost-sensitive feed-forward neural networks
The aim of this paper is to find a function that maps a characteristic set of actually observed meteorological values (input features) to the true wind speed and wind gust class labels in the future (desired outputs). More realistically, the aim is to approximate such a function. In this paper, this is realized by so-called conventional FFNN, which are capable of approximating any function to any desired degree (Leshno et al. 1993). FFNN are the most frequently employed neural networks. They do not require a mathematical model in order to provide such an approximation; instead, they are trained on sample data by iteratively minimizing an error function. This error function significantly influences the characteristics of the resulting approximation. In order to account for the rare occurrence of strongest winds, a variant of FFNN that relied on a cost-sensitive error function was utilized for this application. Section 3a presents the architecture of FFNN, section 3b describes the error function employed, and section 3c describes the training procedure applied. Further information on FFNN can be found, for example, in Duda et al. (2000), Haykin (1999), or Masters (1993).
FFNN provide function approximations of the form y = Ψ(x), where x denotes the vector of input features and y denotes the output vector. The underlying function Ψ( ) can be visualized in a scheme (i.e., architecture) that may be interpreted as a simplified model of connected neurons, for example, in a human brain. Figure 2 shows such a typical FFNN architecture. The following definitions are used. The kth sample is denoted by (x[k], d[k]), where x = [x_1, …, x_l, …, x_{n_i}]^T are the inputs to each of the n_i input neurons counted by index l, and d = [d_1, …, d_j, …, d_{n_o}]^T are the corresponding desired outputs of each of the n_o output neurons counted by index j. There is one hidden layer with n_h hidden neurons counted by index i; the output values of the hidden neurons are h = [h_1, …, h_i, …, h_{n_h}]^T. Last, y = [y_1, …, y_j, …, y_{n_o}]^T are the output values of the output neurons. The weight between the input and the hidden layer, connecting the input neuron l to the hidden neuron i, is denoted as w^H_{li}, and the weight between the hidden layer and the output layer, connecting the hidden neuron i to the output neuron j, is denoted as w^O_{ij}. The activation functions for the hidden neuron i and the output neuron j are f^H_i( ) and f^O_j( ), respectively. The weights of the biases of the hidden and output neurons are denoted as w^H_{0i} and w^O_{0j}, respectively. In accordance with Fig. 2, the output values of a FFNN are

h_i = f^H_i(\tilde{h}_i) \quad \text{and} \quad y_j = f^O_j(\tilde{y}_j),
where \tilde{h}_i = \sum_{l=0}^{n_i} w^H_{li} x_l and \tilde{y}_j = \sum_{i=0}^{n_h} w^O_{ij} h_i are the inputs to the hidden neuron i and the output neuron j, respectively (with x_0 = 1 and h_0 = 1 accounting for the biases). For the FFNN in this paper, a sigmoidal function was employed for all activation functions:

f(x) = \frac{1}{1 + e^{-x}}.
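A single-hidden-layer forward pass of this kind can be sketched as follows. For readability the biases are carried as separate vectors b rather than as weights on constant inputs x_0 = h_0 = 1; the function names and weight shapes are our own convention:

```python
import numpy as np

def sigmoid(x):
    """Sigmoidal activation f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def ffnn_forward(x, W_h, b_h, W_o, b_o):
    """Feed-forward pass through one hidden layer.

    W_h has shape (n_h, n_i), W_o has shape (n_o, n_h).
    """
    h = sigmoid(W_h @ x + b_h)  # hidden outputs h_i = f(sum_l w^H_li x_l + bias)
    y = sigmoid(W_o @ h + b_o)  # network outputs y_j = f(sum_i w^O_ij h_i + bias)
    return y
```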
The update relies on a cost-sensitive batch variant of the summed squared error (SSE), which is defined as

\mathrm{SSE}_{\alpha} = \sum_{k} \frac{1}{|C_j|^{\alpha}} \sum_{j=1}^{n_o} (e_j[k])^2,

where e_j[k] = y_j[k] − d_j[k] is the difference between the actual response y_j[k] of the jth output neuron and the corresponding desired output value d_j[k], and C_j here denotes the class of the kth sample. The influence of a class on such an error function relates to the number of class members in a given training set. In order to compensate for the skewed influences of different numbers of class members in a dataset, the contribution of each sample to the objective function (and consequently to the corresponding gradients for the free-parameter update) is multiplied by the factor

\frac{1}{|C_j|^{\alpha}}.
This modification is referred to as "class-specific correction (CSC)" (Kretzschmar 2002). The parameter α specifies the influence of these corrections, that is, the strength of the CSC. If α = 1.0, all classes have the same total influence, regardless of their numbers of class members. For α < 1.0 the effect of CSC is weaker; for α > 1.0 the effect of CSC is stronger. For α = 0.0, 1/|C_j|^α = 1.0 and, consequently, CSC has no effect, that is, SSE_α reduces to the common SSE.
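The class-specific correction can be sketched as a per-sample weighting of the squared errors; a minimal NumPy version with names of our own choosing:

```python
import numpy as np

def sse_alpha(y, d, class_labels, alpha=0.5):
    """Cost-sensitive SSE: the squared error of each sample is weighted
    by 1 / |C_j|**alpha, where |C_j| is the number of samples in that
    sample's class. alpha = 0 recovers the plain SSE.

    y, d: arrays of shape (n_samples, n_outputs).
    class_labels: integer class label of each sample.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    labels = np.asarray(class_labels)
    counts = np.bincount(labels)            # |C_j| for each class j
    weights = 1.0 / counts[labels] ** alpha  # CSC factor per sample
    per_sample = np.sum((y - d) ** 2, axis=1)
    return float(np.sum(weights * per_sample))
```

With α = 1.0, a class with 100 times fewer members contributes 100 times more per sample, equalizing the total influence of the classes.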
The update of the weights was realized according to the scaled conjugate gradient (SCG) algorithm (Møller 1993). For the SCG algorithm, the derivatives of the objective SSE_α with respect to the weights w^H_{li} and w^O_{ij} (i.e., the gradients) have to be calculated. The derivatives of SSE_α with respect to w^O_{ij} are summarized in the appendix.
The coding of classification was realized by “1 to c” coding for most experiments. A classifier is 1-to-c coded if each class is represented by an individual output neuron j. In case a sample x belongs to the class Cj, its desired output vector d is defined as dj = 1 and dz = 0, ∀z ≠ j. The winning class was then selected by the winner-takes-all criterion, that is, the input sample is assigned to the class Cj corresponding to the largest output yj.
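In code, 1-to-c coding and the winner-takes-all decision reduce to a one-hot vector and an argmax; a minimal sketch:

```python
import numpy as np

def one_to_c(class_index, n_classes):
    """Desired output vector d: 1 for the sample's class, 0 elsewhere."""
    d = np.zeros(n_classes)
    d[class_index] = 1.0
    return d

def winner_takes_all(y):
    """Assign the input sample to the class with the largest output y_j."""
    return int(np.argmax(y))
```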
The training of the classifiers was terminated according to the performance P defined in (3) by the following procedure. After each training cycle, involving the whole training set once, the performance P was calculated for the classifier on an independent validation set that had not been used for weight update. The training was terminated if the P computed on the validation set did not improve during 50 cycles and the weight realization with highest P was chosen (the number of 50 cycles was determined empirically according to prior experiments). This stopping criterion aims at selecting classifiers with the highest P values computed on the validation set. For each experiment several neural networks with different numbers of hidden neurons were trained and the training of each neural network was repeated 10 times. Each of these trials was initialized randomly. The relative quality of the resulting neural networks was judged by the performance P, the hit rates RHitj, and the false-alarm rates RFAj, and, accordingly, the realizations showing the highest quality were selected. The actual quality of the selected classifiers was then estimated based on an independent test set. All neural networks used for this study were created and trained with a modified version of the Stuttgart Neural Network Simulator (SNNS; SNNS Group 1995).
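The stopping procedure amounts to early stopping with a patience of 50 cycles on the validation performance P. A schematic version is given below; the callables train_cycle and validate are placeholders (our own abstraction) for one pass of SCG training over the training set and for evaluating P on the validation set, respectively:

```python
def train_with_patience(train_cycle, validate, max_cycles=5000, patience=50):
    """Early stopping on validation performance P.

    Keeps the weight realization with the highest P and stops when P has
    not improved for `patience` consecutive cycles.
    """
    best_P, best_w, since_best = -float("inf"), None, 0
    for _ in range(max_cycles):
        w = train_cycle()   # one pass over the training set; returns weights
        P = validate(w)     # performance P on the validation set
        if P > best_P:
            best_P, best_w, since_best = P, w, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_w, best_P
```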
Input feature selection
This section describes the selection of the input features for the classifiers. The selection was performed for the site of Geneva, only. First, a rough input feature selection was realized based on a time series correlation study (see section 4a), then a more thorough empirical input feature selection was realized based on local data, data of surrounding stations, and additional data derived from model analysis (see section 4b). Section 4c provides a discussion of the input feature selection results.
The goal of the preprocessing was to find the smallest set of input features, obtainable at the least financial cost, that comprises the information necessary to adequately represent the processes and states of the relevant local atmospheric conditions. Small input feature sets are usually more likely to lead to classifiers of lower complexity than larger input feature sets. The objective of choosing input features that can be obtained at the least financial cost (if there is such a choice) was motivated by business reasons.
For the training and validation of the classifiers, only 3 yr of data were used in order to keep the computational burden low. More specifically, the hourly observations of the 2 yr 1996 and 1997 (17 544 samples) were used for training, and the hourly observations of the year 1998 (8760 samples) were used for validation and classifier selection. The quality of the selected classifiers was then estimated on the 16 yr of testing data: 1982–95, 1999, and 2000 (140 232 samples). The particular years 1996, 1997, and 1998 were chosen because they were available in the required format at the beginning of the experiments. Additional experiments relying on alternative years for training and validation led to classifiers with comparable performance values and hit and false-alarm rates.
In order to explore the characteristics of the wind data, a spectrum and a correlation analysis of the wind time series were performed.
Figure 3a shows 3 yr (1996–98) of raw wind speed data. The data were recorded with a Schildknecht anemometer that is part of the Swiss automatic weather station network (ANETZ) at Geneva maintained by MeteoSwiss. The horizontal axis represents the samples on an hourly basis, and the vertical axis denotes wind speed (0.1 m s−1). It becomes apparent from this figure that the wind speed values are concentrated at slow speeds and exhibit a clear annual pattern (this is also true for wind gusts).
Figure 3b shows the corresponding discrete power spectrum. The horizontal axis gives the frequencies, and the vertical axis gives the spectral power coefficients. There is a dominant peak around the frequency 0.04, which has its maximum at the spectral power coefficient φ_1092 that corresponds to a frequency of 1/24 (i.e., a period of 24 h). This peak clearly indicates a diurnal influence. Further peaks could also be identified at integer multiples of this frequency, which is a harmonic effect of the Fourier transform. The weights of the Fourier coefficients φ_z were found to be more dominant for coefficients at low frequencies. This is an indication that processes with a large time scale dominate the local wind time series.
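The location of such a diurnal peak can be reproduced on synthetic data; this is an illustration only (the actual spectrum in Fig. 3b was computed from the Geneva observations):

```python
import numpy as np

# Three years of hourly samples with a 24-h cycle plus a little noise.
t = np.arange(3 * 365 * 24)
rng = np.random.default_rng(0)
x = 2.0 + np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)

# Power spectrum of the mean-removed series; frequencies in cycles per hour.
power = np.abs(np.fft.rfft(x - x.mean())) ** 2
freqs = np.fft.rfftfreq(t.size, d=1.0)

peak = freqs[np.argmax(power)]  # dominant frequency, expected near 1/24
```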
Figures 3c–3f show autocorrelations and partial autocorrelations of the hourly wind speed time series, differenced hourly wind speed time series, and 10-min wind speed time series. The horizontal axes give the lag of the coefficients and the vertical axes denote the influence of the correlation coefficients. Figure 3c shows the autocorrelation of hourly wind speed for a whole week (i.e., 168 lags). An autocorrelation series gives the degree of correlation of the present sample (time t_0 = 0) to the samples with lag t_L < t_0. The autocorrelation was found to start with high values for the first lags and then drop to a periodic diurnal oscillation with maxima at multiples of 24 h and minima in between. The corresponding partial autocorrelation for the first 24 h is shown in Fig. 3d. A partial autocorrelation series gives the degree of correlation of the present sample (time t_0 = 0) to the samples with lag t_L < t_0, minus the correlations that are already included in the lags t_z with t_L < t_z < t_0. The partial autocorrelation exhibited a very strong correlation for the first lag, a much smaller correlation for the second lag, and appeared to lead into a diurnal oscillation with rather small correlation values. According to this, the most important information about the actual wind speeds is likely to lie in the first lag, or the most recent observation.
Figure 3e shows the partial autocorrelation of the “differenced” wind speed time series. The differenced wind speed value at lag t is the value of the wind speed at lag t minus the value of the wind speed at lag t − 1. Differencing can remove nonstationarity and can be interpreted as high-pass filter. The differenced time series led to weak but relevant partial autocorrelations at almost every lag.
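Differencing itself is a one-line operation:

```python
import numpy as np

def difference(series):
    """First-order differencing: x'[t] = x[t] - x[t-1]. Removing the
    slowly varying component in this way acts as a high-pass filter."""
    x = np.asarray(series, dtype=float)
    return x[1:] - x[:-1]
```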
The partial autocorrelation of the corresponding 10-min winds is shown in Fig. 3f. As for the hourly wind speed time series, the first lag of the 10-min winds was found to be dominant. When compared with hourly wind speed, the 10-min winds exhibited stronger partial correlations at lags t > 1.
Further investigations on the time series involved cross correlations and partial cross correlations of hourly wind speed and hourly wind gusts recorded at Geneva, correlated with hourly values of wind speed, wind gusts, wind direction, pressure, temperature, humidity, radiation, and rain recorded at the site of Geneva and other sites in Switzerland (viz., Changins, Chasseral, Geneva, Kloten, and Pully). The most significant correlations were found between the Geneva wind speed and wind gust time series. All other correlations were relevant too; however, the correlations between speed or gusts to pressure or temperature were found to be more relevant than the correlations of speed or gusts to wind direction, humidity, radiation, or rain. In comparing the different stations, the correlations were most relevant between Geneva and Changins, which is also situated next to Lake Geneva at similar altitude, and were least relevant between Geneva and Chasseral, which is a mountain station in the Swiss Jura at a higher altitude. All partial cross correlations exhibited a significant first lag. Based on these insights, a first rough input feature selection was performed on a subjective basis. Restricted to these input features, a second empirical input feature selection was made. This is described in the next section.
The search for an accurate input feature set was initiated by time series analysis to perform a first rough selection of input features, as described in the previous subsection. The empirical input feature selection described in this subsection was restricted to these input features only. The quality of different input feature sets was judged by training neural network classifiers on the corresponding training and validation datasets, and by comparing their performance, hit and false-alarm rates (as defined in section 2), and the number of input features involved. This evaluation was restricted to the site of Geneva and to +1-, +6-, +12-, and +24-h lead time predictions. The resulting input features were also used for the Sion data, as reported in the next section (section 5).
The first set of these experiments involved only features from the local wind series observed at the site of interest, plus time and date. These features were considered to be available at the least "cost." Based on these input features, the most salient numbers of time lags and variables were evaluated. In a next step, the resulting "optimal" input feature set was enlarged by groups of other local variables, such as pressure and temperature, at several lags. In case the addition of such a group of input features led to an improvement, the group was added to the optimal input feature set to form a new optimal input feature set. This procedure was repeated for nonlocal observations at surrounding stations, and finally for features derived from the analysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) model that corresponded closest to a +24-h lead time. The reason for relying on the analysis and not on the model prediction was to evaluate the potential of an ideal forecast as additional input to the classifiers. As a result, an input feature set that accounts for performance, size, and involved costs was found for each variable and lead time. The resulting input feature sets are summarized in Table 3.
For both variables and all lead times, the inclusion of time and date proved to be beneficial. For wind speed, no lag times longer than 1 h were necessary. The use of 10-min data was found to be beneficial for wind speed at +1- and +6-h lead times, as well as for wind gusts with +1-h lead times. The prediction of wind gusts with +6- and +12-h lead times required more input features than for wind speed for the same lead times. For a +6-h lead time, wind gusts could be predicted best by utilizing lags covering 24 h. The addition of wind direction as an input feature led to the best results for wind gust prediction with +6- and +12-h lead times. The addition of further lag times, local observations, and observations from surrounding stations led to no improvements. The model data included in these experiments were the first 10 principal components of a 500-hPa height field and an 850-hPa temperature field over Europe from the ECMWF model analysis, expressing 95% of the total variance inherent in these data. Both fields are represented by a 10 × 10 grid ranging from 50° to 41°N and from −1.5° to 12.0°E. Improvements due to these model data could only be found for wind speeds for +12- and +24-h lead times, as indicated in Table 3. Larger numbers of principal components led to no further improvements. All further input features were found to be unnecessary for local wind speed and wind gust prediction, too. In fact, a survey of the corresponding hit and false-alarm rates revealed that, especially for strong winds, sets with lower numbers of input features often led to more reliable hit and false-alarm rates in comparison with their counterparts involving higher numbers of input features.
Based on the results of the input feature selection, it was found that the selection of input features was less crucial than expected. The necessary information for the prediction of local wind series at Geneva was found to be present in several realizations of input feature sets. This was indicated by the fact that many input feature sets differed only slightly in performance, and hit and false-alarm rates. This allowed the selection of rather small sets of mainly local input features.
According to the objectives outlined at the beginning of this section, the best predictor for the prediction of the local wind was identified to be the wind itself at the same location. This resembles the way the local wind is predicted by a human forecaster. In order to produce a prediction, the forecaster not only looks carefully at the time evolution of the local wind, but he usually also looks at the behavior of some stations situated upstream. The reason why these stations do not appear in our choice of predictors is that “upstream” can refer to different stations depending on the wind direction, so that no single station finally shows a good correlation of the observed winds to the future winds at the site of interest. The same applies to pressure gradients that are commonly used as predictors for the wind speed evolution. The choice of the gradient is usually dependent on the weather situation.
The incorporation of numerical models is usually expected to lead to a higher benefit than that indicated by the simulation results. The reasons for this are the following. Although the numerical model involves the dynamics of the entire atmosphere, it has a spatial resolution that is much coarser than the observed local scale. When relying on model analysis data, the neural networks perform a downscaling of the model forecast. For the prediction of local winds, the model data should ideally be valid at the actual time of prediction. For the experiments, however, only the analysis corresponding closest to a +24-h lead time was utilized, which might not be the most adequate model input for, for example, a +12-h wind prediction. The model data involved upper-air analyses only (viz., the 500-hPa height field and the 850-hPa temperature field). It is known that the surface winds are not necessarily well correlated to these upper-air fields. A more promising method could be to correlate the model forecast of surface parameters (such as wind and pressure) to the local wind. However, a numerical model is usually available only a few hours after the observations, whereas the local observations are available almost immediately.
This section presents the results of the neural networks that relied on the input feature sets given in Table 3. All neural networks were trained and validated on independent sets of data and the appropriate number of hidden neurons was specified by the strategy outlined in section 4.
According to the ideas in section 2, FFNN classifiers with CSC (α = 0.5) were compared with FFNN classifiers without CSC (α = 0) and FFNN function approximators. Judged on the performance P, and the hit and false-alarm rates RHitj and RFAj, the FFNN classifiers with CSC showed superior quality for wind speed and wind gust prediction for +1-, +6-, +12-, and +24-h lead times, especially for the prediction of strongest winds. Details can be found in Kretzschmar (2002).
The following sections only report on the results obtained by the FFNN classifiers with CSC (FFNN–CSC). All performance values and rates were calculated on the testing data described in section 4 that involved 16 yr of observations. Sections 5a and 5b benchmark the FFNN–CSC against persistence for the sites of Geneva and Sion, respectively. Section 5c reports on the quality of the Geneva FFNN–CSC when tested on the Sion data. It was found that for the class of strongest winds, the Geneva FFNN–CSC outperformed the Sion FFNN–CSC on the Sion data.
Geneva classifier for Geneva
This section reports on the results at the Geneva site in western Switzerland next to Lake Geneva (see Fig. 1). The results for persistence are given in Table 4, and the results for the FFNN–CSC are given in Table 5. To illustrate these results, several comparisons of the hit and false-alarm rates of persistence and the FFNN–CSC are shown in Figs. 4a–4f for wind gusts. In each figure the vertical axis denotes the hit rate R_Hit (Hit) and the horizontal axis denotes the false-alarm rate R_FA (FA). Persistence is indicated by encircled dots and the FFNN–CSC results are indicated with crosses. An optimal classifier would produce a result in the upper-left corner (i.e., R_Hit = 1 and R_FA = 0). Note that persistence is not capable of predicting a change of an actual wind class; it merely gives probabilities for class endurance. The hit and false-alarm rates are relative measures and do not indicate how many times a specific class is indicated by a classifier over a dataset. For this reason, whenever hit and false-alarm rates are given, the corresponding cardinalities (i.e., the numbers of samples) of each class are also given. For example, for wind speed with a +1-h lead time in Table 5, the value R_Hit6 = 0.71 for class C6 denotes that roughly 172 out of |C6| = 243 samples were predicted correctly, and the value R_FA6 = 0.36 means that 36% of all predictions for class C6 were wrong. As a consequence, the 172 correctly identified samples represent 64% of the total indications of class C6; hence, class C6 was predicted about 269 times for the test set.
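The bookkeeping in this example follows directly from the rate definitions in section 2 (the published rates are rounded, hence the counts are approximate):

```python
# R_Hit6, R_FA6, and |C6| for wind speed, +1-h lead time (Table 5).
hit6, fa6, card6 = 0.71, 0.36, 243

correct = hit6 * card6               # correctly predicted C6 samples, about 172
indications = correct / (1.0 - fa6)  # correct hits are 64% of all C6 predictions
```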
When compared with persistence, the FFNN–CSC show considerably higher performance values on the testing set Ptest for both variables and all leads. This is an indication that for the FFNN–CSC, the class label distances between wrongly predicted classes and observed classes are smaller than those for persistence. Comparing hit and false-alarm rates, the FFNN–CSC almost always show higher hit and lower false-alarm rates than persistence, provided that the classes were predicted at all (i.e., a hit rate of RHitj > 0 was realized). This is also indicated in Figs. 4a–4f where the neural network classifiers produced results closer to the upper-left corner than those of persistence. Especially for longer lead times, classes of strongest winds were not predicted by the FFNN–CSC (i.e., a hit rate of RHitj = 0 was found). For example, the wind speed class C6 was ignored for a +12-h lead time and the wind speed class C5 was ignored for a +24-h lead time. The addition of the model features for wind speed for +12- and +24-h lead times led to a slight improvement for the +12-h lead time and a considerable improvement for the +24-h lead time in comparison with the corresponding feature sets comprising no model data. According to Tables 4 and 5, hit rates decreased and false-alarm rates increased with increasing class cardinality |Ci|, which is also indicated in Figs. 4a–4d that show the +6-h prediction results of the classes C1–C4. This is a clear indication that strong winds (i.e., rare classes) are more difficult to predict than weak winds (i.e., frequent classes).
Sion classifier for Sion
This section reports on the results at the Sion site, which is situated in a Swiss mountain valley (see Fig. 1). Table 6 presents the results of persistence, and the results for the FFNN–CSC are given in Table 7. Several comparisons of the hit and false-alarm rates of persistence and the FFNN–CSC are shown in Figs. 4g–4l for wind gusts. In comparison with the Geneva results, the absolute values of performance and of the hit and false-alarm rates were found to be of lower quality at the Sion site; nevertheless, the relative improvements of the FFNN–CSC with respect to persistence were found to be considerably higher for Sion than for Geneva. For example, for the wind speed class C5 for a +1-h lead time at Geneva, the hit and false-alarm rates for persistence were 0.63 and 0.37, and for the FFNN–CSC were 0.67 and 0.35, respectively, which indicated a slight improvement of the FFNN–CSC compared to persistence. For the wind speed class C5 for a +1-h lead time at Sion, the corresponding rates were 0.53 and 0.47 for persistence, and 0.63 and 0.44 for the FFNN–CSC, which indicates a considerably larger improvement. The addition of the model features for wind speed for +12- and +24-h lead times led to a slight improvement for both leads in comparison with the corresponding feature sets comprising no model data.
Geneva classifier for Sion
Table 8 presents the results of the Geneva classifiers applied to the datasets from Sion. In contrast to the previous experiments, for wind speed at +12- and +24-h lead times the Geneva classifiers trained on the input features listed in Table 3 excluding the model features produced results of higher quality on the Sion data than the Geneva classifiers that included the additional model features. For this reason, only the results of the Geneva classifiers excluding the model features are given.
In comparison with the Sion classifiers, the Geneva classifiers exhibited lower Ptest values on the Sion data. Nevertheless, for wind speed and wind gusts with a +1-h lead time, the Geneva classifiers produced comparable hit and false-alarm rates. In fact, for the wind speed classes C5 and C6 for a +1-h lead time and the wind speed class C6 for a +6-h lead time, the Geneva classifiers led to higher hit rates and lower false-alarm rates on the Sion data than the Sion classifiers themselves. This is attributed to the fact that considerably more samples of the strongest winds were available for the training of the Geneva classifiers (13 C6 wind speed samples in the Geneva training set) than for the Sion classifiers (1 C6 wind speed sample in the Sion training set). However, the assumption that adding samples of the strongest winds from other stations to the Geneva and Sion datasets would improve the classifications was found not to hold. Although increasing the number of strongest-wind samples in the datasets yielded classifiers with higher hit rates for the strongest winds, these classifiers also produced significantly higher false-alarm rates.
Conclusions and outlook
This paper evaluated the quality and potential of neural network classifiers for hourly local wind speed and wind gust prediction. The presented classifiers exhibited superior quality when compared with persistence with respect to performance as well as hit and false-alarm rates for the prediction lead times of +1, +6, +12, and +24 h for wind speed and wind gusts. For the prediction of the strongest winds, satisfactory results could be realized only for a +1-h lead time within these experiments.
The input features selected for the classifiers were several lags of the local wind speed, wind gust, and wind direction time series; time and date features; and additional features from the ECMWF analysis that corresponded closest to a +24-h lead time. Except for the model features, all input features were obtained from a single wind observation device at the site of interest. The features from the analysis were included to test the potential of additional model features. The chosen model features led to improvements for wind speed prediction for +12- and +24-h lead times; nevertheless, the improvements were smaller than expected. The problem of class imbalance was addressed by cost-sensitive error functions. The neural network classifiers relying on cost-sensitive error functions outperformed conventional neural network classifiers and neural network function approximators.
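The idea behind a cost-sensitive error function can be illustrated with a minimal sketch. The inverse-frequency weighting used here is an assumption for illustration only; the actual costs used for the FFNN–CSC are not reproduced in this section.

```python
# Hedged sketch of a cost-sensitive error function: a class-weighted
# cross-entropy in which rare classes (e.g., strong winds) receive larger
# weights. Inverse-frequency weighting is an illustrative assumption.
import math

def cost_sensitive_cross_entropy(probs, target, class_counts):
    """Weighted negative log-likelihood of the target class.
    probs: predicted class probabilities; target: observed class index;
    class_counts: training-set cardinalities |Ci| of all classes."""
    total = sum(class_counts)
    # Rare classes get weights > 1, frequent classes weights < 1.
    weight = total / (len(class_counts) * class_counts[target])
    return -weight * math.log(probs[target])

counts = [100, 80, 5]  # toy cardinalities: class 2 is the rare "strong" class
# A missed rare class costs far more than a missed frequent class at the
# same predicted probability, pushing training toward the rare events.
loss_rare = cost_sensitive_cross_entropy([0.2, 0.1, 0.7], 2, counts)
loss_freq = cost_sensitive_cross_entropy([0.7, 0.1, 0.2], 0, counts)
print(loss_rare > loss_freq)  # True
```

The trade-off reported above follows directly from such a weighting: raising the cost of rare classes increases their hit rates but also encourages the classifier to over-predict them, raising false-alarm rates.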
In the future, the training of two-class classifiers is planned, which would indicate whether future winds will exceed a certain threshold. For this purpose, additional model features beyond the 500-hPa height field and 850-hPa temperature field analyses will also be evaluated. Furthermore, it is planned to benchmark the resulting classifiers against alternative prediction methods, such as conventional downscaling of model forecasts, and to assess the statistical significance of the results.
This project was funded in part by ETH Zurich, Switzerland; MeteoSwiss, Switzerland; and the NCCR Climate Research Consortium. The authors express their gratitude to Nicolaos B. Karayiannis from the University of Houston, Texas, and Hans Richner from ETH Zurich, Switzerland, for their expertise and support of the project.
Corresponding author address: Dr. Ralf Kretzschmar, MeteoSchweiz, Krähbühlstr. 58, Zürich 8044, Switzerland. E-mail: krr@meteoswiss.ch