Nowcasting Foehn Wind Events Using the AdaBoost Machine Learning Algorithm

Michael Sprenger, Institute for Atmospheric and Climate Science, ETH Zürich, Zurich, Switzerland
Sebastian Schemm, Geophysical Institute, University of Bergen, and Bjerknes Centre for Climate Research, Bergen, Norway
Roger Oechslin, Meteotest, Bern, Switzerland
Johannes Jenkner, UBIMET GmbH, Vienna, Austria

Abstract

The south foehn is a characteristic downslope windstorm in the valleys of the northern Alps in Europe that demands reliable forecasts because of its substantial economic and societal impacts. Traditionally, a foehn is predicted based on pressure differences and tendencies across the Alpine ridge. Here, a new objective method for foehn prediction is proposed based on a machine learning algorithm (called AdaBoost, short for adaptive boosting). Three years (2000–02) of hourly simulations of the Consortium for Small-Scale Modeling’s (COSMO) numerical weather prediction (NWP) model and corresponding foehn wind observations are used to train the algorithm to distinguish between foehn and nonfoehn events. The predictors (133 in total) are subjectively extracted from the 7-km COSMO reanalysis dataset based on the main characteristics of foehn flows. The performance of the algorithm is then assessed with a validation dataset based on a contingency table that concisely summarizes the cooccurrence of observed and predicted (non)foehn events. The main performance measures are probability of detection (88.2%), probability of false detection (2.9%), missing rate (11.8%), correct alarm ratio (66.2%), false alarm ratio (33.8%), and missed alarm ratio (0.8%). To gain insight into the prediction model, the relevance of the single predictors is determined, resulting in a predominance of pressure differences across the Alpine ridge (i.e., similar to the traditional methods) and wind speeds at the foehn stations. The predominance of pressure-related predictors is further established in a sensitivity experiment where ~2500 predictors are objectively incorporated into the prediction model using the AdaBoost algorithm. The performance is very similar to the run with the subjectively determined predictors. Finally, some practical aspects of the new foehn index are discussed (e.g., the predictability of foehn events during the four seasons). 
The correct alarm rate is highest in winter (86.5%), followed by spring (79.6%), and then autumn (69.2%). The lowest rates are found in summer (51.2%).

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Michael Sprenger, michael.sprenger@env.ethz.ch


1. Introduction

Foehn flows strongly affect many Alpine valleys, for example, the Rhine valley and the Reuss valley in Switzerland and the Wipp/Inn valley in Austria. The foehn characteristics, especially the strong gusty winds, make a reliable prediction very desirable (see Richner and Hächler 2013 for a recent review). However, the socioeconomic impacts of foehns are not limited to extreme wind gusts. For example, foehns also affect air quality (ozone concentrations and aerosol loadings), snow melting, agriculture, fire outbreaks, cable car operation (e.g., case study by Burri et al. 1999), and wind-induced waves on lakes (Graf et al. 2013). Hence, foehns are an important aspect of operational weather forecasting and nowcasting.

Foehn prediction and nowcasting can be accomplished using several different approaches. At the Swiss national meteorological service (MeteoSwiss; www.meteoswiss.ch), a probabilistic method is applied that originally was introduced by Widmer (1966) and later refined and simplified for operational needs by Courvoisier and Gutermann (1971). This method relies on two pressure gradients across the Alps and also incorporates a pressure tendency. Based on these components it defines a seasonally varying threshold that has to be surpassed to indicate a foehn. The resulting index, called the Widmer foehn index, works rather reliably up to a time window of 36 h (Richner and Hächler 2013). While the Widmer index is designed for forecasting foehn events, the approach developed by Dürr (2008) is designed for foehn nowcasting. Dürr (2008) uses several meteorological parameters characteristic of foehns (wind speed and direction, relative humidity, and potential temperature difference relative to an Alpine crest station) and applies a simple threshold-based approach to derive the occurrence or absence of foehn conditions at individual valley stations. This objective algorithm is operationally applied to 10-min observational data and provides real-time information about foehns at about 30 measurement sites across Switzerland. An interesting aspect of Dürr’s (2008) algorithm is its application to long observational time series. For instance, in Switzerland 10-min observations have been available since 1981 for many measurement sites, hence allowing one to compile a 30-yr climatology of foehn occurrence at these stations and to compare them to a long “reference” foehn time series at the Swiss station Altdorf (Richner et al. 2014; Gutermann et al. 2012). Another statistical approach was recently applied by Drechsel and Mayr (2008) using operational forecasts made by the ECMWF’s Integrated Forecasting System (IFS; www.ecmwf.int).
The focus of their study was the Wipp/Inn valley in Austria, and they showed that reliable foehn forecasts for this region are feasible up to 3 days in advance. As with the Widmer index, pressure differences across the Alpine barrier play a crucial role in their statistical foehn forecasting model. Recently, a further probabilistic foehn diagnosis based on a statistical mixture model was developed by Plavcan and Mayr (2014) and also applied to the Wipp valley.

The accurate representation of small-scale phenomena such as foehns in a high-resolution numerical weather prediction (NWP) model is still challenging. It was recently shown that even the 2.2-km grid spacing of the current Consortium for Small-Scale Modeling (COSMO) NWP model run at MeteoSwiss (Steppeler et al. 2003) is not capable of accurately representing foehn flow in the narrow northern Alpine valleys (Wilhelm 2012). More precisely, Hächler et al. (2011) concluded that foehn flows begin too early in the model, and the wind speed and surface temperature in the valleys are not correctly captured. It is unlikely that foehn events can be resolved in NWP models in the near future. Hence, foehn forecasts will continue to require some sort of statistical postprocessing of the NWP output.

A simple but effective way of coupling statistical methods with NWP-based forecasts is to apply the Widmer index to the model output. In a more modern terminology, this directly leads to a machine learning (ML) approach for foehn prediction (see Hastie et al. 2009). Other names, which essentially mean the same thing, are applied predictive modeling, artificial intelligence, and statistical learning (Kuhn and Johnson 2013). There are many ML methods that could be, and indeed have been, applied within a meteorological context (Hastie et al. 2009), including linear discriminant analysis and model output statistics (MOS; Glahn and Lowry 1972; Zweifel 2016), random forests (Deloncle et al. 2007), the adaptive boosting algorithm (AdaBoost; Perler 2006; Perler and Marchand 2009), support vector machines (Radhika and Shashi 2009), neural networks (Manzato 2005; Kretzschmar et al. 2004), and cubist model trees (McCandless et al. 2015). A concise overview of ML methods in the environmental sciences is given in Hsieh (2009). All of the abovementioned methods have distinct advantages and disadvantages. They are nicely summarized and compared by Perler and Marchand (2009) for thunderstorm detection in Switzerland and by Gagne (2016) for high-impact weather prediction and solar irradiance forecasts in the United States. First of all, there might be a trade-off between classification performance and interpretability (James et al. 2013). From the list of potential ML approaches, boosting methods appear to offer a good balance between predictive power and interpretability of the results. This balance is an important aspect for foehn research, though much less so for pure forecasting purposes. One boosting method, called AdaBoost, is computationally fast, does not react adversely to superfluous predictors, and is also quite robust to overfitting (Hastie et al. 2009).
Another advantage of AdaBoost is that superfluous or collinear predictors are effectively regularized by the algorithm, and the most potent set of predictors is selected. As one disadvantage of boosting, Dietterich (2000) mentions that outliers can have a negative impact on the outcome because during the boosting steps (as outlined in section 3) they might receive too much weight and introduce flaws into the final classification, particularly if the number of outliers (noise in Dietterich’s terminology) exceeds 20%.

The aim of this paper is twofold. The first part presents a new approach to solving the challenge of accurate foehn prediction and nowcasting. The approach is based on the AdaBoost machine learning algorithm (Freund and Schapire 1997). The second part is then devoted to the application of a trained boosted learner to the period 2000–02. Strengths and weaknesses of the method are compared to the more traditional forecasting technique (Widmer index) used at MeteoSwiss. Further, we will show that the algorithm can be used to study some meteorological aspects of foehns, in particular their predictability during the four seasons.

The outline of the article is as follows. First, in section 2 we describe the datasets, which build the basis for the machine learning algorithm examined in detail in section 3. The forecasting algorithm is then applied to the period 2000–02 (section 4a) and a set of suitable predictors are determined. In the following section, the algorithm is compared with the traditional Widmer index (section 4b), the trade-off between the correct and false alarm rates is discussed (section 4c), and predictability issues are considered (section 4d). We conclude with a summary of the main results and an outlook (section 5).

2. Data

a. Building a foehn reference dataset

For the purpose of our study two datasets are required. First, a reliable foehn climatology has to be used that contains the information about whether or not a foehn occurred and serves as the target variable (i.e., predictand) in the ML model. Because we are dealing with a problem of supervised machine learning, such a dataset is required to train and validate the objective foehn classification. To obtain such a dataset, we use the methodology developed by Dürr (2008), and we apply it to the operational measurements provided by MeteoSwiss. This final dataset indicates for each hour during the period between 2000 and 2002 whether foehn conditions occurred in the Reuss valley (Altdorf; 46.87°N, 8.63°E; 438 m MSL) or not. More precisely, the method of Dürr (2008) relies on several observed parameters and for each of them a threshold is defined: 1) wind direction between 120° and 240° (SE–SW), 2) mean wind speed > 3.7 m s−1, 3) wind gust > 6.2 m s−1, 4) relative humidity < 54%, and 5) potential temperature difference as compared with the measurement site in Gütsch at the Alpine crest < −4 K. While the first through the fourth criteria directly apply to the observation site at Altdorf, the final criterion takes into account a second elevated measurement station at Gütsch (46.65°N, 8.60°E; 2283 m MSL). The last criterion assumes that foehn conditions observed in Altdorf originate from the Alpine crest and then descend dry-adiabatically into the foehn valley on the northern side of the Alps. Dürr’s (2008) algorithm is first applied to 10-min observation data, and then the time resolution is reduced to 1-hourly data. To do so we apply a very basic assumption: if more than three 10-min intervals during 1 h are associated with foehn conditions, then the whole hour is classified as a foehn hour.
Furthermore, the methodology allows us to distinguish between pure and mixed foehn cases, the latter being related to air masses in the Reuss valley with foehn characteristics but no severe wind gusts. For our analysis, we exclude these mixed foehn hours and hence restrict the machine learning to pure foehn cases. The mixed cases are attributed to the nonfoehn category.
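The five thresholds and the hourly aggregation rule described above can be sketched as a small classifier. This is an illustrative sketch of Dürr’s (2008) criteria as listed in the text, not the operational implementation; the function names are ours.

```python
def is_foehn_10min(wind_dir, wind_speed, gust, rel_hum, dtheta_crest):
    """True if a 10-min observation at Altdorf satisfies all five
    foehn criteria of Dürr (2008) as listed in the text."""
    return (120.0 <= wind_dir <= 240.0        # 1) wind direction SE-SW
            and wind_speed > 3.7              # 2) mean wind speed (m/s)
            and gust > 6.2                    # 3) wind gust (m/s)
            and rel_hum < 54.0                # 4) relative humidity (%)
            and dtheta_crest < -4.0)          # 5) theta difference to Gütsch (K)

def is_foehn_hour(flags_10min):
    """Hourly aggregation: more than three foehn 10-min intervals
    within the hour classify the whole hour as a foehn hour."""
    return sum(flags_10min) > 3
```

For example, an observation with southerly wind at 5 m s−1, gusts of 8 m s−1, 40% relative humidity, and a −5 K potential temperature difference would be flagged as foehn.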

The objective foehn climatology, obtained using the method of Dürr (2008), agrees well with the foehn times identified by weather forecasters at MeteoSwiss [see Richner and Hächler (2013) and Sprenger et al. (2016) for a historical review of foehn forecasting in Switzerland]. However, this objective foehn identification may still fail in some rare situations. For instance, thunderstorm outflow can mimic the characteristics of a short-lived foehn event during the summer. However, during the main foehn seasons (spring, autumn, and winter) the number of misclassified cases is negligible, and also during the summer manual checks yielded only a few incorrectly classified cases (see section 4 in Dürr 2008). These cases are manually eliminated from the dataset. A further problem with the Dürr (2008) algorithm is its limited skill in detecting dimmerfoehn cases. Dimmerfoehn conditions are characterized by rather high relative humidities in the foehn valleys (Richner and Dürr 2015) and hence do not comply with Dürr’s (2008) relative humidity criterion. However, dimmerfoehn conditions are very rare (see references in Richner and Dürr 2015) and, hence, do not negatively affect the analysis in this study.

Based on the method of Dürr (2008), a total of 1662 foehn hours were identified between 2000 and 2002 in Altdorf. The seasonal cycle is as expected. Most cases are identified during the spring (46.3%), followed by autumn (22%), and then winter (20.5%). Only a small number of cases are identified during the summer (11%), the season when the objective identification exhibits the clearest problems. However, the small number of foehn cases in the summer does not originate from the algorithm’s deficiencies but reflects the overall small number of cases in this season (Gutermann et al. 2012).

b. Meteorological predictors

1) Subjectively identified foehn predictors

After building a reference foehn dataset (section 2a) we determine a large set of potentially useful predictors for foehn nowcasting at various different places in Switzerland. This set includes traditional measures such as pressure tendencies but also moisture and temperature gradients and tendencies (see below). To this aim, we necessarily have to rely on a 4D gridded dataset. The NWP predictors are computed based on the 1-hourly COSMO-7 reanalysis data for the years 2000–02. This regional reanalysis dataset covers the Alpine area with a 7-km horizontal grid having 40 vertical levels. For a detailed description of the COSMO-7 reanalysis dataset, see Jenkner et al. (2010). A general description of the nonhydrostatic COSMO model is provided by Steppeler et al. (2003).

The fields extracted from the COSMO-7 reanalysis include temperature, horizontal and vertical wind components, specific humidity, and pressure. These fields are evaluated either on near-surface model levels or are interpolated onto a set of height or pressure levels. Furthermore, they are interpolated to a number of specific geographical locations, for example, to measurement sites (see below). Note that the main foehn characteristics (wind speed, wind direction, and temperature) at the validating measurement site (Altdorf) are also extracted as predictors, but because of the relatively coarse resolution of the model (compared with the narrow Reuss valley) and other model deficiencies [e.g., the land–atmosphere coupling; see Richner and Hächler (2013) and references therein], foehn flow is not explicitly simulated in the COSMO-7 model. The COSMO-7 reanalysis was already used by Jenkner (2008) and Jenkner et al. (2010) to compile a climatology of fronts and to investigate and verify precipitation forecasts in Switzerland.

The first set of predictors is subjectively determined and obtained by manual analyses of a multitude of foehn events in Altdorf and their representation in the COSMO-7 reanalysis. The first three parameters are related to the foehn winds in the valley and the driving pressure fields: 1) pressure differences between surface stations (the traditional measure), 2) wind speed and direction in the Reuss valley, and 3) the geopotential height in the midtroposphere. Taking into account the dynamical characteristics of foehn flows (see, e.g., Bougeault et al. 1998; Steinacker 2006), we furthermore included 4) precipitation (because foehn flows are often associated with heavy precipitation along the south side of the Alps), 5) relative humidity (because the airflow in the northern foehn valleys is typically very dry), 6) pressure in the northern Alpine forelands (because a low pressure system there might induce the outflow of stable prefoehn air from the foehn valleys), and 7) vertical stratification, expressed as the vertical derivative of the potential temperature (because Kelvin–Helmholtz instability might erode the prefoehn air from the northern valleys and hence allow the foehn flow to descend and reach the valley’s surface level). The complete list of the thereby selected predictors is provided in appendix A.

All these quantities are evaluated at times when foehn conditions are present in Altdorf according to the reference dataset presented in section 2a, and their tendencies are calculated by taking the differences between subsequent time steps (foehn at times ±1 h). The precise locations where the quantities are evaluated were determined through a series of manual foehn analyses and are based on a large number of weather charts. This analysis motivated us to include, for example, reduced surface pressure. More precisely, a strong pressure gradient is discernible over the Alpine crest, and a pronounced pressure difference between Thun, Switzerland, and Domodossola, Italy, was selected as an additional predictor. Similarly, other predictors are chosen (see Table A1 in appendix A). Note that the list presented in appendix A refers to several meteorological fields, four different time instances, four different vertical levels (10 m above the surface and 850, 700, and 500 hPa), and also to many different stations. This large ensemble of predictors appears rather arbitrary, but it is based on manual inspections of several COSMO-7 foehn simulations. Including the whole set of predictors in the analysis is challenging, but makes us confident in the model’s ability to capture the essence of the foehn conditions within a COSMO-7 simulation. Moreover, the documented strength of the AdaBoost machine learning algorithm in isolating the key predictors from a large dataset motivated us to use the entire set of 133 predictors listed in appendix A.
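A cross-Alpine pressure-difference predictor and its ±1-h tendency, as described above for the Thun–Domodossola pair, can be illustrated as follows. This is a minimal sketch under the stated definitions; the hourly pressure series and the function name are hypothetical.

```python
def foehn_pressure_predictors(p_thun, p_domo, t):
    """Return (difference, tendency) predictors at hour t.

    p_thun, p_domo : sequences of hourly reduced surface pressure (hPa)
                     at the two stations (illustrative data).
    """
    dp = p_thun[t] - p_domo[t]                       # cross-Alpine pressure difference
    dp_tend = ((p_thun[t + 1] - p_domo[t + 1])       # tendency: difference between
               - (p_thun[t - 1] - p_domo[t - 1]))    # the values at t+1 h and t-1 h
    return dp, dp_tend
```

With a falling pressure north of the Alps and rising pressure to the south, the tendency becomes negative, signaling a weakening cross-Alpine gradient.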

2) Objectively identified foehn predictors

A second set of predictors is computed but this time the predictors are objectively identified. As for the subjectively identified set of predictors, we rely on the COSMO-7 reanalyses. More specifically, we simply define a large set of predictors that all rely on pressure differences between different measurement stations (but taken from the COSMO-7 dataset to be consistent). Here, we adopt a rather brute-force approach to see whether pressure-related predictors are sufficient, and also to illustrate that the AdaBoost algorithm is able to handle a large number of (partly strongly correlated) predictors. To this aim, we first include 67 pressure values at 67 stations all over Switzerland (see later Fig. 5), where the number 67 and the locations correspond to the operational measurement network of MeteoSwiss. All pressure values are reduced to sea level, and we include them as predictors with 0-, −2-, and +2-h time lags, resulting in 201 predictors. Next, for all predictors with time lag 0 h a tendency is computed as the difference between the values at time ±2 h. This results in an additional 67 predictors, bringing the total to 201 + 67 = 268 predictors. In the final step, all combinations of pressure differences, again taken at time lag 0 h, are included, increasing the number of predictors by 1/2 × 67 × 68 = 2278. The total number of pressure-related predictors hence becomes 268 + 2278 = 2546!
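The predictor bookkeeping above can be reproduced in a few lines. Note that pairing each station with itself is included so that the pair count matches the stated 1/2 × 67 × 68 = 2278; this is a sketch of the counting, not of the actual predictor extraction.

```python
from itertools import combinations_with_replacement

n_stations = 67          # MeteoSwiss operational network
lags = (0, -2, +2)       # time lags in hours

n_lagged = n_stations * len(lags)        # 67 pressures at three lags -> 201
n_tendency = n_stations                  # tendencies: value at +2 h minus -2 h -> 67
# All station-pair pressure differences at lag 0 h; with replacement this
# yields 1/2 * 67 * 68 = 2278 combinations, as stated in the text.
n_pairs = len(list(combinations_with_replacement(range(n_stations), 2)))

total = n_lagged + n_tendency + n_pairs  # 201 + 67 + 2278 = 2546
```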

3. Methodology

In this section the AdaBoost algorithm is presented in some detail (see Freund and Schapire 1997; Hastie et al. 2009). To the best of our knowledge, AdaBoost has not yet been extensively applied in meteorological analyses (Perler and Marchand 2009). There are two essential ingredients to AdaBoost: the meta-algorithm and the weak learner upon which AdaBoost is based. We start with a top-down approach and describe first the meta-algorithm (section 3a), followed by a description of the weak learner applied in this study (section 3b). Then, in section 3c we discuss the method used to validate the algorithm’s performance.

a. General outline of boosting

Hastie et al. (2009) concisely explain the main idea of the boosting algorithm: “AdaBoost, short for Adaptive Boosting, is a machine learning algorithm, formulated by Yoav Freund and Robert Schapire. It is a meta-algorithm, and can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favor of those instances misclassified by previous classifiers.” In other words, the basic idea of AdaBoost is to call a weak classifier repeatedly, and for each call to adjust the weights attributed to the examples. Thereby, the weights of incorrectly classified examples are increased compared to correctly classified ones, so that the new classifier focuses more on incorrect examples.

The flowchart in Fig. 1 summarizes the idea behind the meta-algorithm (redrawn after Freund and Schapire 1997). Initially (step 1), some weights are given to all instances. We will attribute a weight of 2 to all foehn hours and a weight of 1 to all nonfoehn hours. This choice forces the algorithm to favor correctly classified foehn hours rather than discounting misclassified nonfoehn hours (see appendix B for a sensitivity analysis). Based on these weights, the learning dataset is subjected to a weak learner (step 2a), that is, a learner with only “minimal” predictive power. The misclassifications of the weak learner are summarized into a weighted error fraction (step 2b). This error fraction is further transformed into a measure α (weight) and attributed to this weak learner (step 2c). It is also used to adjust the weights of the individual observations (step 2d). All these steps are iterated several times (step 2), and each iteration defines a new weak learner m and a weight αm attributed to it. Finally, all M weak learners are combined (in step 3) into a strong learner, which is referred to as the boosted learner. In our case, M is chosen to be 20.
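The meta-algorithm just described can be condensed into a short sketch. This follows the standard AdaBoost formulation of Freund and Schapire (1997) with the text's initial 2:1 foehn/nonfoehn weighting; `fit_weak` is a placeholder for the weak learner of section 3b, and the exact operational implementation may differ.

```python
import math

def adaboost_train(X, y, fit_weak, M=20, foehn_weight=2.0):
    """Minimal AdaBoost sketch. Labels y are +1 (foehn) / -1 (nonfoehn);
    fit_weak(X, y, w) is assumed to return a classifier g with
    g(x) in {-1, +1}. Foehn hours start with twice the weight."""
    w = [foehn_weight if yi == 1 else 1.0 for yi in y]               # step 1
    s = sum(w); w = [wi / s for wi in w]
    learners, alphas = [], []
    for _ in range(M):                                               # step 2
        g = fit_weak(X, y, w)                                        # step 2a
        err = sum(wi for wi, xi, yi in zip(w, X, y) if g(xi) != yi)  # step 2b
        err = min(max(err, 1e-10), 1.0 - 1e-10)                      # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)                    # step 2c
        w = [wi * math.exp(-alpha * yi * g(xi))                      # step 2d:
             for wi, xi, yi in zip(w, X, y)]                         # boost errors
        s = sum(w); w = [wi / s for wi in w]
        learners.append(g); alphas.append(alpha)

    def boosted(x):                                                  # step 3
        return 1 if sum(a * g(x) for a, g in zip(alphas, learners)) > 0 else -1
    return boosted
```

Misclassified observations gain weight in step 2d, so each subsequent weak learner concentrates on the cases its predecessors got wrong.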

Fig. 1.

Outline of the AdaBoost classifier. First, some weights are attributed to the N observations (step 1), whereby the weights can either be attributed equally to foehn and nonfoehn events or differ between the two categories (here, foehn hours receive twice the weight). Then, M iterations (step 2) are performed to adjust M weak classifiers Gm to the weighted observations (step 2a). In step 2b the performance, expressed as an error value errm, of the weak learner Gm is assessed and determines a weight αm attributed to the weak learner Gm (step 2c). Based on the performance of the weak learner, new weights are then attributed to all observations (step 2d), where the incorrectly classified instances get a larger weight. Finally, after having thus established M weak learners Gm with corresponding weights αm, the boosted strong learner G(x) is defined as a weighted average over all weak learners.

Citation: Weather and Forecasting 32, 3; 10.1175/WAF-D-16-0208.1

b. Definition of weak learner

The weak learners are an essential building block of the AdaBoost meta-algorithm. Their aim is to decide as a first guess whether the set of the previously defined predictors indicate foehn or nonfoehn conditions. There are several possibilities for such a weak learner. Well-known examples are decision stumps, decision trees, and statistical discriminant analysis [see Hastie et al. (2009) for an extensive list of methods]. Here, we rely on a very simple method, namely a modified decision stump.

In our case we use the 133 subjectively determined predictors and the 2546 objectively determined ones, as defined in section 2b, and define for each of them a threshold above or below which the predictor indicates foehn or nonfoehn conditions. For example, consider Fig. 2a, which shows the predictor Δp(ALT − LUZ, −1 h) in appendix A (Table A1) (i.e., the pressure difference between Altdorf and Lucerne, Switzerland, 1 h before the foehn instance at Altdorf). If the pressure difference for all foehn cases and all nonfoehn cases is plotted, the two distributions are well separated with only a small overlap. From Fig. 2a, the threshold separating foehn from nonfoehn cases for this predictor is set to 2 hPa. Obviously, there is the possibility of a misclassification if only this predictor is taken into account. Hence, there might be foehn cases where the pressure difference falls below this threshold (not detected) or nonfoehn cases where it is above the threshold (false alarm). No predictor correctly identifies all foehn cases! However, the thresholds are chosen in an optimal way in the sense that the forecasting error is minimized given the weights attributed to the single observations.1 For instance, if the foehn instances get more weight, then any missed foehn case will be more strongly “punished”; hence, the threshold in Figs. 2a and 2b should move to the left to avoid that. The weights attributed to the observations, on the other hand, depend on the iterative step of the AdaBoost meta-algorithm. As a second example, Fig. 2b shows the wind speed distribution during foehn and nonfoehn events, including a separating threshold value of about 5.5 m s−1.
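The weighted threshold search for a single predictor can be sketched as follows. This is an illustrative sketch, not the paper's exact implementation; for simplicity every predictor here votes foehn above its threshold, whereas in general the voting direction depends on the predictor (e.g., relative humidity votes foehn below its threshold).

```python
def best_threshold(values, labels, weights):
    """Pick the threshold minimizing the weighted error for one
    predictor used as a decision stump (foehn above the threshold).

    values  : predictor values for all observations
    labels  : +1 (foehn) / -1 (nonfoehn)
    weights : observation weights from the AdaBoost iteration
    """
    best_t, best_err = None, float("inf")
    for t in sorted(set(values)):
        # weighted error: missed foehn cases (below t) plus
        # false alarms (nonfoehn cases at or above t)
        err = sum(w for v, yi, w in zip(values, labels, weights)
                  if (yi == 1 and v < t) or (yi == -1 and v >= t))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err
```

Increasing the weights of the foehn observations shifts the optimal threshold toward lower values, exactly the leftward shift described for Figs. 2a and 2b.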

Fig. 2.

Determination of a threshold for the single predictors (a) Δp(ALT − LUZ, −1 h) and (b) vel10m(ALT, −1 h). The predictor values for all nonfoehn (blue) and foehn events (red) are taken and the threshold is set to minimize the error rate, given the weight attributed to all observation instances. Note that the threshold depends on the relative weight of foehn to nonfoehn events (see section 3 for details). This corresponds to a simple decision stump, where finally all observations above the thresholds vote for foehn and all observations below vote against it. Black shading indicates the part that is incorrectly classified based on these single predictors.


So far (in Fig. 2), only two predictors have been considered. Every single weak learner consists of 133 (or 2546) predictors, each with its own threshold (e.g., 2 hPa for the pressure difference and 5.5 m s−1 for the wind speed in Fig. 2). In a decision problem (foehn yes/no) each predictor gives a vote for or against foehn conditions (see Fig. 3). The overall decision of the weak learner is then simply taken as the majority vote, or, more specifically, we take the percentage of predictors that vote for foehn conditions. If this value is above 50%, the weak learner is assumed to vote for foehn conditions. Note that this weak learner is in fact already rather sophisticated. It is not based on a single parameter, but takes into account that foehn conditions are necessarily reflected in several parameters.
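The majority vote of one weak learner can be sketched in a few lines. The sketch generalizes the decision stumps slightly with a per-predictor voting direction (an assumption on our part, since e.g. low relative humidity indicates foehn); the function name is ours.

```python
def weak_learner_vote(values, thresholds, signs):
    """Majority vote over all predictors of one weak learner.

    values     : predictor values for one time instance
    thresholds : per-predictor decision thresholds (cf. Fig. 2)
    signs      : +1 if the predictor votes foehn above its threshold,
                 -1 if below (e.g., relative humidity)
    """
    votes_for = sum((s * (v - t)) > 0
                    for v, t, s in zip(values, thresholds, signs))
    frac = votes_for / len(values)
    # foehn if more than half of the predictors vote for it
    return (1 if frac > 0.5 else -1), frac
```

With the Fig. 2 thresholds, a 3-hPa pressure difference and a 6 m s−1 wind speed would both vote foehn, giving a unanimous weak-learner vote.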

Fig. 3.

Definition of single weak learners for 1100 UTC 16 Oct 2002. For (top) weak learner 1, the 133 predictors are included and vote, as a simple decision stump, for (red) or against (blue) a foehn case. The majority then determines whether the first weak learner classifies the instance as a foehn or nonfoehn event. Based on the performance of the weak learner, as determined from the error value, a weight is attributed to this weak learner. For the subsequent weak learners (5 out of 20 shown here) the weights attributed to the observations are adjusted accordingly, hence giving more impact to incorrectly classified instances, and then the same majority vote method is applied again.


Finally, it is worthwhile to repeat that we are considering a whole class of weak learners (up to 20) because the thresholds depend on the weights attributed to every single observation. In this way, special emphasis is given within one weak learner to the observations that were misclassified in the previous iteration. Moreover, it is not critical that individual weak learners yield a high probability of detection or a small false alarm rate (see section 3c). The algorithm will work as long as the error percentage remains slightly below 50%, that is, as long as the weak learner performs marginally better than chance. Indeed, there are some indications that the algorithm performs better if the weak learner is not too “strong,” as might be the case if some more sophisticated techniques (e.g., discriminant analysis) are applied as a weak learner.

As an illustrative example, Fig. 4 shows the final classification for 1100 UTC 16 October 2002. The weights attributed to the 20 weak learners are given, where votes for foehn conditions are given positive weights and votes against foehn conditions negative weights. The final vote is calculated as the sum of all signed single votes, and in this case yields 0.56, which means foehn conditions are predicted by the boosted learner if a separating threshold of 0.5 is used. This outcome is in agreement with the foehn observation given by Dürr (2008).
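A plausible way to turn the weighted, signed votes into a score between 0 and 1, consistent with the 0.5 decision threshold used here, is to take the share of total learner weight that votes for foehn. The paper does not spell out its exact normalization, so the formula below is an assumption, and the weights are invented for illustration.

```python
import numpy as np

def boosted_score(weights, votes):
    """Share of total learner weight voting for foehn, in [0, 1].
    votes: +1 = vote for foehn, -1 = vote against (assumed convention)."""
    weights = np.asarray(weights, dtype=float)
    votes = np.asarray(votes)
    return weights[votes > 0].sum() / weights.sum()

# Hypothetical weights for five weak learners and their signed votes
score = boosted_score([0.4, 0.2, 0.2, 0.1, 0.1], [1, 1, -1, 1, -1])
is_foehn = score > 0.5
```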

Fig. 4.

The final boosted classifier is taken as an average over all weak learners, weighted according to their performance. In this case, at 1100 UTC 16 Oct 2002 the weighted average becomes 0.56, which will finally classify this instance as a foehn event, provided the discriminating decision threshold is set to 0.5.


c. Model evaluation

Any machine learning task requires a careful distinction between the training dataset and the validation dataset (Hastie et al. 2009). The former is used to adjust all parameters of the weak learners to the learning problem; the parameters are optimally adapted to the foehn instances of this training dataset. The validation dataset, on the other hand, is preferentially an independent dataset used to validate the algorithm. In our case, we follow two different approaches to splitting the original time period 2000–02 into training and validation datasets. The first approach is a random attribution of all hours in 2000–02 to one of the two datasets. This training dataset has 13 112 h (including 858 foehn hours) and the validation dataset 13 192 h (including 804 foehn hours). This approach is very simple, but neglects the autocorrelation of the foehn time series. In fact, foehn conditions typically prevail over many hours (Gutermann 1970; Richner et al. 2014), and hence if two neighboring foehn hours are attributed to different datasets, they can nevertheless be expected to be very similar in their synoptic- and mesoscale structure and hence in their predictor values. To circumvent this autocorrelation “problem,” we additionally apply a second approach: the whole time series is split in the middle (July 2001), with the first half used as the training dataset and the second as the validation dataset. In this way, foehn episodes are fully and continuously captured in either the training or the validation dataset, and not randomly interrupted by the split. Of course, splitting in the middle assumes that the two time periods do not differ with respect to their foehn characteristics; the random splitting mentioned above avoids this assumption. A third approach, in between the two extremes, is discussed in appendix C. Note that another common way of dealing with the autocorrelation is cross validation (Hastie et al. 2009) with resampling based on consecutive data chunks (e.g., a day or a week).
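The splitting strategies discussed above (random by hour, split in the middle, and chunkwise resampling) can be sketched as follows. These are illustrative helper functions, not the authors' code; the 50/50 split ratio and the chunk length are assumptions.

```python
import numpy as np

def split_random(n_hours, seed=0):
    """Random attribution of each hour to training or validation;
    simple, but ignores the autocorrelation of the foehn time series."""
    rng = np.random.default_rng(seed)
    mask = rng.random(n_hours) < 0.5
    idx = np.arange(n_hours)
    return idx[mask], idx[~mask]

def split_middle(n_hours):
    """First half of the period for training, second half for validation."""
    idx = np.arange(n_hours)
    return idx[:n_hours // 2], idx[n_hours // 2:]

def split_blocks(n_hours, block=24, seed=0):
    """Random split of consecutive chunks (e.g., whole days), which keeps
    foehn episodes together and hence respects the autocorrelation."""
    rng = np.random.default_rng(seed)
    n_blocks = -(-n_hours // block)                  # ceiling division
    mask = np.repeat(rng.random(n_blocks) < 0.5, block)[:n_hours]
    idx = np.arange(n_hours)
    return idx[mask], idx[~mask]
```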

After having split the dataset, the boosted learner can be determined as discussed in sections 3a and 3b. Then, its performance has to be assessed based on the validation dataset. The results of this validation will be presented in a contingency table (section 4), which lists all possible outcomes in a 2 × 2 matrix. Four cases can occur (Jolliffe and Stephenson 2011): 1) no foehn occurs and none is predicted (bottom-right element in the contingency table), 2) foehn conditions occur and are predicted (top left), 3) foehn conditions occur but were not predicted (a missed event, i.e., a type II error in statistical decision theory if “no foehn” is taken as the null hypothesis; top right), and 4) no foehn conditions occur but foehn was predicted (a false alarm, i.e., a type I error; bottom left). An overall rating of the algorithm’s performance must combine the correct predictions (the diagonal elements) with the erroneous ones (the off-diagonal elements). A series of studies deal with the correct interpretation of these simple contingency tables (see, e.g., Stephenson 2000; Jolliffe and Stephenson 2011), partly motivated by an obvious misinterpretation of tornado forecasts by Finley in 1884 (Murphy 1996). Here, we follow the approach set out by Murphy and Winkler (1987), where two complementary interpretations are considered. In the first approach, called likelihood-base rate factorization by Murphy and Winkler (1987), we take the single observed events (foehn or nonfoehn) as the starting point and ask: Was the event correctly detected by the algorithm? The second approach [called calibration-refinement factorization in Murphy and Winkler (1987)] takes the predictions (warnings) of the algorithm as the starting point and asks: Was it a correct warning; that is, did the predicted event really occur? We call the first approach event-based statistics and the second one alarm-based statistics, instead of using the terminology suggested in Murphy and Winkler (1987).
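The 2 × 2 contingency table described above can be computed directly from the observed and predicted time series. This is a minimal sketch; the function name and layout conventions are ours.

```python
import numpy as np

def contingency_table(observed, predicted):
    """2 x 2 contingency table with rows = observed (foehn, no foehn) and
    columns = predicted (foehn, no foehn), matching the layout in the text."""
    obs = np.asarray(observed, dtype=bool)
    pred = np.asarray(predicted, dtype=bool)
    hits = int(np.sum(obs & pred))           # top left: observed and predicted
    misses = int(np.sum(obs & ~pred))        # top right: observed, not predicted
    false_alarms = int(np.sum(~obs & pred))  # bottom left: predicted, not observed
    correct_rej = int(np.sum(~obs & ~pred))  # bottom right: neither
    return np.array([[hits, misses], [false_alarms, correct_rej]])

# Toy time series of five hours
table = contingency_table([True, True, True, False, False],
                          [True, True, False, True, False])
```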

Finally, the terminology of the performance measures based on contingency tables has created some confusion in recent years. Here, we take the advice in the corrigendum of Barnes et al. (2007) seriously and clearly define the terms: in the event-based perspective, POD and POFD are used for the probability of detection and the probability of false detection (or false alarm rate), respectively. Furthermore, we use the miss ratio (MR) as the probability that a foehn occurred but was not detected. This is complemented by the analogous alarm-related measures: correct alarm ratio (CAR), false alarm ratio (FAR) (or probability for false alarm), and the missed (or missing) alarm ratio (MAR), where the latter in this study is defined as the probability that no alarm was issued but a foehn event was actually observed.
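Given the four entries of the contingency table, all six measures follow directly from their definitions. The counts below are hypothetical and merely chosen to roughly reproduce the reported percentages; the actual counts are in Table 1.

```python
def verification_scores(hits, misses, false_alarms, correct_rejections):
    """Event- and alarm-based measures as defined in the text; MAR follows
    this study's definition, P(foehn observed | no alarm issued)."""
    pod = hits / (hits + misses)                     # probability of detection
    pofd = false_alarms / (false_alarms + correct_rejections)
    mr = misses / (hits + misses)                    # miss ratio = 1 - POD
    car = hits / (hits + false_alarms)               # correct alarm ratio
    far = false_alarms / (hits + false_alarms)       # false alarm ratio = 1 - CAR
    mar = misses / (misses + correct_rejections)     # missed alarm ratio
    return {"POD": pod, "POFD": pofd, "MR": mr, "CAR": car, "FAR": far, "MAR": mar}

# Hypothetical counts, chosen only to roughly match the reported rates
scores = verification_scores(hits=90, misses=12, false_alarms=46,
                             correct_rejections=1540)
```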

4. Results

a. Performance of the boosted classifier and ranking of predictors

We start with the subjectively selected set of 133 predictors derived from the COSMO-7 reanalysis (section 2) and assume that all hours during 2000–02 are randomly attributed either to the training or to the validation dataset (section 3c). The training dataset was used to adjust all parameters of the weak learners (as outlined in section 3), and the resulting boosted learner was then applied to the validation dataset. If the boosted learner yields a value above 0.5, we consider this a foehn forecast, and otherwise a nonfoehn forecast. The validation results are presented in the following paragraphs. Note that the validation is very strict in the sense that it tolerates no time shift for the occurrence of a foehn event (this is critical mainly at the start and end of a foehn period).

The contingency table (Table 1) for the validation run lists the following values for the “event-based statistics” (Table 2): POD of 88.2%, a corresponding POFD of 2.9% and an MR of 11.8%. Hence, the performance is very good from an “event based” perspective [the results for the alternative alarm-based perspective (Table 3) are presented later in section 4b].

Table 1.

Contingency table for the run with the following settings (see also section 4a): 133 predictors, 20 boosting iterations (weak learners), and an initial relative weight of 2 for foehn events compared to 1 for nonfoehn events.

Table 2.

Summary of detection rates for the event-based perspective (see section 3c). The following notation is used: P(a | b) is the conditional probability for a given b, where a and b are either predicted foehn or nonfoehn events (predicted and not predicted) or correspondingly observed events (observed and not observed).

Table 3.

As in Table 2, but for the alarm-based perspective.


Next, it is worthwhile to look at the rankings of the individual 133 predictors. It is a clear advantage of the AdaBoost algorithm that it allows us to study the “strength” of the individual predictors. The individual predictor ranking, a brief predictor description (as in Table A1), and the cumulative error are listed in Table 4. The cumulative error is calculated in the following way. According to the AdaBoost algorithm, a weight w_i is attributed to each weak learner i. Within each weak learner, which is a majority vote over the 133 predictors, a prediction error (based on the training dataset) is attributed to each predictor: ε_ij, with index j referring to the jth predictor and i to the ith weak learner. The cumulative error for predictor j, as listed in Table 4, is then calculated as the sum of w_i ε_ij over all weak learners i. The most powerful predictor is the pressure difference between Altdorf and Lucerne, evaluated at different time instances relative to foehn conditions in Altdorf. A look at individual case studies (not shown) confirms that a large pressure difference in this region is indeed a very prominent feature during foehn conditions. On the other hand, it is far from intuitive why a pressure contrast across approximately 50 km surpasses cross-Alpine contrasts. Further, the wind speed in Altdorf itself, where we want to predict the foehn, appears in the ranking table only in fourth place. Most likely, this result reflects the deficiency of the COSMO-7 model in realistically representing foehn winds in the narrow Reuss valley (cf. Wilhelm 2012), whereas the pressure field, being representative of a larger area, is less sensitive to the model resolution.
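The cumulative error is a simple weighted sum and can be expressed compactly as a matrix–vector product. The numbers below are toy values for illustration, not the paper's weights or errors.

```python
import numpy as np

def cumulative_errors(learner_weights, predictor_errors):
    """Cumulative error of predictor j: sum over weak learners i of w_i * eps_ij."""
    w = np.asarray(learner_weights, dtype=float)      # shape (n_learners,)
    eps = np.asarray(predictor_errors, dtype=float)   # shape (n_learners, n_predictors)
    return w @ eps

# Toy setting: 3 weak learners, 4 predictors; lower value = stronger predictor
ce = cumulative_errors([0.5, 0.3, 0.2],
                       [[0.1, 0.4, 0.2, 0.5],
                        [0.2, 0.4, 0.1, 0.5],
                        [0.1, 0.3, 0.3, 0.4]])
ranking = np.argsort(ce)      # predictor indices, best first
```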

Table 4.

Ranking of the single predictors for the final boosted foehn classifier (as presented in section 4a; the time lag of the predictor is also given). The cumulative error in the third column corresponds to the summed error values for the predictor: The lower the value, the larger the predictive power (see text for details). Only the 10 predictors with the highest ranks are listed; the highest cumulative errors are 5.658 (not shown).


Motivated by the previous results, we repeat the boosting experiment, but this time we restrict the number of predictors, according to the previous ranking of the predictors. The cumulative errors in Table 4 for the first few predictors are very similar (not shown). For instance, if only the best predictor Δp(ALT − LUZ, −1 h) is kept, the following measures result: POD = 63.2%, POFD = 3.1%, and MR = 36.8%. If the first five predictors in Table 4 are considered, the corresponding values are POD = 81.3%, POFD = 3.1%, and MR = 18.7%; that is, the performance significantly improves. This indicates that, indeed, no single predictor is overpowering all of the others.

In the next experiment, we make use of the objective predictor set, which relies only on pressure differences between stations and their tendencies (see section 2b). This experiment is motivated by the predominance of pressure-related predictors in the rankings shown in Table 4. All other parameters are kept fixed. The performance of the objective classification is then as follows: POD = 91.7%, POFD = 3.2%, and MR = 8.3%. All in all, these numbers are very similar to, or even outperform, the ones with subjective predictors. This indicates again that the most relevant parameters are indeed related to pressure differences, as has already been found based on the subjective predictors. However, in this experiment there was no need to subjectively define reasonable pressure differences in advance: The “best” pressure-related signals are found as a by-product of the algorithm.

The four most powerful pressure differences identified by AdaBoost based on the objectively defined set of predictors are shown in Fig. 5a. Note that none of the four pressure differences agree with the pressure difference Δp(ALT − LUZ), which was the most powerful predictor selected based on the subjectively defined predictors. It is interesting to relate the selected pressure differences to the typical pressure pattern during foehn conditions. A detailed pressure analysis for a single foehn case (13–19 January 1975) is given in Gutermann (1979). For this specific foehn case, strong pressure differences are indeed observed between Napf, Switzerland (NAP), and Robbia, Switzerland (ROB); Pilatus, Switzerland (PIL) and ROB; and Vaduz, Liechtenstein (VAD), and Scuol, Switzerland (SCU) (see inset in Fig. 5b); whereas the selection of Hörnli, Switzerland (HOE) − ROB remains more elusive. Averaging over all foehn instances during the years 2000–02, we find that essentially the same pattern emerges in the COSMO pressure field (Fig. 5b), except for the local low pressure zones along the main foehn valleys (near PIL and VAD). Overall, these results hint at a potential additional application of this approach, as it can be used to determine “optimal” pressure differences (or other signals) in a large collection of predictors. This is a classical “big data” task, because a manual evaluation of all pressure differences would not be feasible.

Fig. 5.

(a) All stations with pressure reduced to sea level height, which are included in the sensitivity run described in section 4. The topography, at 1-km resolution, is given in color shading and additionally the first four predictors in the ranking are drawn. They all correspond to pressure differences between a station on the Alpine south side and four stations on the Alpine north side: NAP − ROB, PIL − ROB, HOE − ROB, and VAD − SCU. (b) The reduced surface pressure (in steps of 0.5 hPa) averaged over all foehn instances during the years 2000–02. As an inset, the detailed pressure analysis of the foehn on 15 Jan 1979 from Gutermann (1979) is shown.


In the following sections we will continue working with the subjective COSMO-based predictors. The subjective set of predictors has the advantage that it brings in more explicitly the experience and expectations of foehn forecasters, and in this sense it fits with this study’s aim of better understanding the meteorological basis of a foehn forecast. However, the results of this section also indicate that the objective predictors could have been used without expecting substantially different outcomes.

b. The boosted foehn index as a forecasting tool

In the previous section, we characterized the algorithm’s performance: POD of 88.2%, POFD of 2.9%, and MR of 11.8%. On the other hand, a foehn forecaster might be more interested in the “alarm based” performance of the algorithm. Here, the most relevant numbers are (see Table 3) the CAR (66.2%), the FAR (33.8%), and the MAR (0.8%). Of course, these characteristic numbers strongly depend on the choice of the threshold used. If the threshold is changed from 0.5 to 0.6, the performance significantly improves (CAR = 80.9%, FAR = 19.1%, and MAR = 2.5%) and the numbers become rather similar to those in Zweifel (2016). The influence of the threshold on the model performance will be further discussed in section 4c.

The fact that the CAR is smaller than the POD can readily be understood by means of Bayes’s theorem, which relates conditional probabilities to the overall probabilities of events. In particular, P(foehn | alarm) = P(foehn)/P(alarm) ⋅ P(alarm | foehn); that is, CAR = P(foehn)/P(alarm) ⋅ POD. Hence, if the AdaBoost index predicts foehn conditions more often than the observed foehn frequency, the CAR is diminished accordingly. In this study (at a threshold of 0.5), the probability of alarm P(alarm) exceeds that of foehn occurrence P(foehn) by a factor of ~1.33, and therefore the CAR is reduced by exactly this factor.
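This Bayes relation can be checked numerically with the paper's values (POD = 88.2% and an overprediction factor of roughly 1.33):

```python
def car_from_pod(pod, alarm_to_foehn_ratio):
    """Bayes's theorem: CAR = P(foehn)/P(alarm) * POD, i.e., the CAR equals
    the POD divided by the overprediction factor P(alarm)/P(foehn)."""
    return pod / alarm_to_foehn_ratio

# Paper's numbers: POD = 0.882, P(alarm)/P(foehn) ~ 1.33
car = car_from_pod(0.882, 1.332)
```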

At first inspection the CAR, FAR, and MAR numbers look somewhat “disappointing” (at the 0.5 threshold), but they must be seen in the perspective of the operational application of the algorithm. Indeed, it is most likely that a foehn forecaster will not only look at one single number (above or below the 0.5 threshold). Instead, he or she will look at a time series of the forecasting index. This is shown in Fig. 6 for one year (2002) and in greater detail for one single month (March 2002). Because such a “visual” validation assumes that the learning and the validation datasets can clearly be separated from each other, we show the outcome for the modified experiment where the learning dataset covers the first half (January 2000–July 2001) and the validation dataset the second half (August 2001–December 2002) of the whole time period (see also appendix C). Therefore, the whole year 2002 is included in the validation dataset—but still exhibiting comparable performance measures as in the randomized splitting (CAR = 58.1%, FAR = 41.9%). Several features are discernible from Fig. 6: 1) if a threshold of 0.5 is chosen, most foehn periods are captured; 2) the signal-to-noise ratio (i.e., foehn to nonfoehn ratio) is very good, with foehn periods clearly standing out as increased values; and 3) the AdaBoost index “oscillates” during foehn periods around the 0.5 threshold (i.e., part of a foehn period might fall below the threshold, but nevertheless is clearly associated with enhanced index values at or near the foehn instance). Considering the time series of the AdaBoost index, a smoothing operator might be appropriate for obtaining a clearer signal. Here, we omit this operation and leave it for the final adjustment to an operational environment.2

Fig. 6.

Example of the performance of the boosted foehn classifier for (a) the whole year 2002 and (b) March 2002. Green labels mark time instances when a foehn event was observed, blue labels forecast foehn events, and red marks the mismatch between the two. (b) The timeline of the forecast value is shown in blue, which corresponds to a forecasted foehn event if the dashed decision threshold (0.5) is surpassed.


In addition to the AdaBoost index, Fig. 7 shows the Widmer foehn index (Widmer 1966), which has been in operational use at MeteoSwiss since 1966. This index underwent several adaptations over the course of the years (Courvoisier and Gutermann 1971) and is therefore optimally tuned to the foehn forecast in the Swiss valleys. MeteoSwiss reports a CAR of 70% for predicting the onset of foehn and 72% for its decay.3 During the validation time period (July 2001–December 2002), the Widmer index performs very well; that is, at the time instances with foehn conditions marked in Fig. 7, the index indeed surpasses its seasonally dependent threshold. The time series of the Widmer index is a rather “analog” signal, which directly reflects its basic definition as a north–south pressure difference. The time series of the AdaBoost index, on the other hand, takes a more discrete, binary shape; in this sense, its interpretation is easier than that of the Widmer index. Note that this time series confirms the findings of Fig. 6: the foehn instances are essentially captured by the AdaBoost index, although it is not yet optimally tuned for foehn forecasting.

Fig. 7.

Time series of the (top) Widmer index and (bottom) AdaBoost index for the time period July 2001–December 2002. Additionally, the thresholds for foehn detection are included as red, dashed lines, and observed foehn events are marked with a gray bar. The time period corresponds to the validation dataset, whereas the training dataset contains the 1.5 yr before. The threshold for the Widmer index varies during the year, with the highest values in winter and smallest in summer as a result of the stronger stratification in the foehn valleys before foehn onset during winter compared with summer; i.e., in winter stronger pressure gradients are needed to displace the stable cold air pools in the valleys.


c. Trade-off between POD and POFD

The timeline diagrams in Figs. 6 and 7 reveal that the foehn forecast and the contingency table (Table 1) depend significantly on the chosen threshold. If the threshold is set below 0.5, there is a higher chance of correctly detecting foehn conditions (higher POD), but the POFD is also raised. Conversely, increasing the threshold above 0.5 decreases the POFD, but correspondingly also decreases the POD. There is no simple solution to this dilemma. Indeed, a very similar problem is encountered, for example, in ensemble weather prediction, where the economic value of a probability forecast strongly depends on the cost/loss ratio (Katz and Murphy 2008). Manzato (2007) argues that an optimal threshold can be determined based on the Peirce skill score (PSS), defined as PSS = POD − POFD, with the threshold maximizing the PSS being the best choice. We find this to be the case for a threshold of 0.4; however, the maximum is very flat (i.e., neighboring thresholds yield nearly the same PSS). More importantly, Manzato (2007) states that the benefit of the PSS-optimized threshold lies in the comparison of different classifiers. In practical applications of a single classifier, on the other hand, the end user typically has to decide whether to accept a lower POD (if the threshold is increased), with the resultant gain of a reduced POFD, or whether an increased POD is valued more highly (if the threshold is decreased), at the cost of an enhanced POFD. A clear distinction must likely be made between the professional end user, who has clear economic considerations at stake, and the public [see also Palmer (2002) for a discussion of the economic value of ensemble forecasts].
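A PSS-maximizing threshold search can be sketched as follows. The function, the threshold grid, and the perfectly separable toy data are illustrative choices of ours.

```python
import numpy as np

def best_pss_threshold(scores, observed, thresholds=None):
    """Decision threshold maximizing the Peirce skill score
    PSS = POD - POFD (cf. Manzato 2007)."""
    scores = np.asarray(scores, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    best_t, best_pss = thresholds[0], -np.inf
    for t in thresholds:
        pred = scores > t
        pod = np.sum(observed & pred) / max(np.sum(observed), 1)
        pofd = np.sum(~observed & pred) / max(np.sum(~observed), 1)
        if pod - pofd > best_pss:
            best_t, best_pss = t, pod - pofd
    return best_t, best_pss

# Perfectly separable toy scores: any threshold between 0.3 and 0.7 is optimal
t_opt, pss_opt = best_pss_threshold([0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
                                    [True, True, True, False, False, False])
```

Note that, as stated in the text, the maximum can be very flat: here every threshold between the two score groups attains the same PSS, and the search simply returns the first one.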

Figure 8a shows the relative frequency of foehn and nonfoehn predictions as the threshold value is varied, and Fig. 8b quantitatively establishes how the different performance measures depend on the threshold. If a threshold of 0 is chosen, foehn conditions are always predicted. Of course, in this case all foehn cases are captured (POD = 1) and no foehn case is missed (MR = 1 − POD = 0), but at the cost that essentially all nonfoehn hours are incorrectly identified as foehn (POFD ~ 1). A slightly different behavior is seen for the alarm-based measures: if an alarm is always issued at the 0 threshold, the probability that an alarm is indeed correct equals the overall frequency of foehn (i.e., CAR = 6.1%). Consequently, a permanent alarm is most often wrong (FAR = 1 − CAR = 93.9%). At least, there was always an alarm when there should have been one (MAR = 0). If the threshold is increased to 0.5, the values change significantly, leading to the more meaningful performance measures discussed so far (see Table 3). Finally, one might increase the threshold to 1, in which case foehn conditions are never predicted. Obviously, the alarm-based quantities lose their meaning in this extreme case. From the event-based perspective, we immediately get POD = 0 and POFD = 0, because if foehn conditions are never predicted, neither correct nor false detections can occur. On the other hand, MR = 1 − POD = 1; that is, every foehn event is missed.

Fig. 8.

(a) Distribution of forecast values for foehn (red) and nonfoehn (blue) events. The vertical dashed line corresponds to the decision threshold (0.5) applied in the evaluation. (b) Trade-off between POD, POFD, and MR, as well as between CAR, FAR, and MAR, depending on the decision threshold of the boosted foehn classifier. For CAR, FAR, and MAR only thresholds up to 0.6 are considered, because for higher values the curves become very “noisy.”


A common way of representing the sensitivity to the threshold is through the use of receiver (or relative) operating characteristics (ROC; see Jolliffe and Stephenson 2011). For the AdaBoost this approach is shown in Fig. 9, with POFD on its horizontal axis and POD on its vertical axis. The different points on the ROC curve differ with respect to the threshold chosen. There are several aspects that can be derived from the ROC curve: 1) there is a trade-off between the sensitivity (POD) and the specificity (POFD) of the index; 2) the distance from the diagonal indicates the accuracy of the index, more specifically the more closely the curve follows the left-hand border and then the top border, the more accurate is the index; and 3) the likelihood ratio (LR) can be derived as the slope at an ROC point, where the LR is given as the POD/POFD ratio. For instance, for a threshold of 0.5 the slope and the LR are rather large (31).

Fig. 9.

ROC values for the AdaBoost index, i.e., dependency of POFD and POD on the decision threshold. Some thresholds along the ROC curve are marked with colored dots.


Quantitatively, the performance of a test is often described by the area beneath the ROC curve: the larger the area, the greater the accuracy of the index. In our case (Fig. 9) this area is large and, hence, clearly above the value of 0.5 that a skill-free (random) classifier would attain. The interpretation of the ROC area can readily be visualized in the following way (Fawcett 2006). Consider the situation where the days in the validation dataset are already correctly split into a foehn and a nonfoehn group. Randomly, a day is selected from the foehn group and a day from the nonfoehn group. For these two days the AdaBoost classification is performed; that is, it is determined whether they fall into the predicted foehn or nonfoehn group. For a perfect foehn index we require that the day from the foehn group also be classified as a foehn day and, correspondingly, the day from the nonfoehn group as a nonfoehn day. The area beneath the ROC curve gives the probability that this is indeed the case for any randomly drawn pair of days (Fawcett 2006); that is, it is the probability that the classifier ranks a randomly chosen positive instance (foehn) higher than a randomly chosen negative instance (nonfoehn).
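This ranking interpretation suggests a direct way to compute the ROC area without constructing the curve itself. The sketch below uses toy index values of ours.

```python
import numpy as np

def auc_by_ranking(scores_foehn, scores_nonfoehn):
    """Probability that a randomly drawn foehn instance receives a higher
    index value than a randomly drawn nonfoehn instance (ties count one-half);
    this probability equals the area under the ROC curve (Fawcett 2006)."""
    f = np.asarray(scores_foehn, dtype=float)[:, None]
    n = np.asarray(scores_nonfoehn, dtype=float)[None, :]
    greater = np.sum(f > n)
    ties = np.sum(f == n)
    return (greater + 0.5 * ties) / (f.size * n.size)

# Toy index values: one foehn hour ranks below one nonfoehn hour,
# so 8 of the 9 possible pairs are ordered correctly
auc = auc_by_ranking([0.9, 0.8, 0.6], [0.7, 0.2, 0.1])
```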

d. Aspects of foehn predictability

An area of particular interest is the performance of the AdaBoost index during different weather scenarios. We start with the seasonal dependency of the predictability (i.e., the index performance during the four seasons). Indeed, the dynamics of winter foehn conditions are not necessarily the same as for summer foehn conditions. We will now examine some of the characteristics of the seasonal foehn cases. The vertically averaged stability in the Po valley, south of the Alps, is typically stronger in winter and autumn than in spring; accordingly, the easterly low-level jet in the Po valley is more pronounced in winter and autumn (Würsch and Sprenger 2015). Furthermore, the foehn occurrence strongly varies from season to season (see section 2a for frequencies). These differences in dynamics and frequency are likely reflected in the predictability of foehn conditions during the different seasons (Richner and Hächler 2013).

The main results for the seasonal dependency of the algorithm are 1) the PODs are 88.5% for spring, 91.3% for summer, 90.6% for autumn, and 90.9% for winter and, hence, no significant differences are discernible; 2) the CARs are 79.6% in spring, 48.8% in summer, 69.2% in autumn, and 86.5% in winter, indicating that objectively predicting foehn conditions in winter and spring is easier than in autumn and particularly in summer;4 and 3) pressure differences yield the highest forecast power (ranking as in Table 4), except for summer, when the lowest cumulative error is found for the predictor vel10m(ALT, 0 h). With respect to the poor predictability of summer foehn conditions (CAR = 48.8%), we can speculate about possible reasons based on the third author's experience as a forecaster at Meteotest (www.meteotest.ch). Indeed, during days with considerable solar irradiance (in summer), a heat low develops over the Alpine region, which then leads to Alpine pumping (Winkler et al. 2006). Often, the pressure field due to the heat low counteracts that of a foehn flow, pushing the foehn back to the inner Alpine valleys or even completely suppressing it (Lotteraner 2009). A predictor Δp(LUZ − LAT), as determined by the AdaBoost algorithm, may lose its relevance in this case. Further, during the night the pressure pattern of the Alpine pumping reverses and can lead to foehnlike wind gusts in valleys. In summer, the breakthrough of the foehn flow in the valleys also strongly depends on the time of day: solar irradiance substantially supports the erosion of the cold-air pool in the valleys. Finally, summer foehn conditions can be distinguished from the foehn conditions in other seasons with respect to their impact on thunderstorms in the Alps. In fact, the summer foehn flow can enhance or suppress thunderstorms, in contrast to its inhibition of storms from autumn to spring.
In summary, there are several aspects in which summer foehn conditions differ from the foehn conditions during the other seasons and why, therefore, the AdaBoost index trained on year-round foehn instances performs poorly during summer.

Further insight into the predictability can be gained if the meteorological fields are compared for correctly identified and missed foehn events (i.e., for all foehn instances in the validation period the composites are separately calculated for cases captured and missed by the AdaBoost classifier). Geopotential height composites at 500 hPa (Fig. 10a) show for both cases a southwesterly flow over the Alpine region. However, the flow is shifted slightly toward a more southerly direction for the cases that are correctly identified. This indicates that deep foehn conditions, characterized by a clear southwesterly flow in the midtroposphere above the Alpine crest, are more predictable than situations where the midtropospheric flow is more zonal, as for instance during shallow foehn events. A very similar pattern of geopotential heights is discernible at 700 hPa (not shown). This clear separation into two midtropospheric flow situations is further supported by the sea level pressure (SLP). Figure 10b shows the difference SLP(identified) − SLP(missed); positive SLP anomalies indicate higher pressure values for the identified cases than for the missed ones. Two distinct patterns can be seen in Fig. 10b: 1) the SLP over the Bay of Biscay is deeper for correctly identified foehn cases, consistent with the fact that a more southerly wind is approaching the Alpine south side, and 2) the SLP difference over the Po valley is rather negligible, or even slightly positive, whereas the corresponding difference on the Alpine north side is clearly negative. Hence, a more pronounced pressure gradient is found across the Alpine ridge for the correctly identified cases. Finally, we look at the precipitation composites, again as a difference between identified and missed cases to highlight the contrast between the two. Such a contrast can be seen over the Alps, where the Alpine crest nicely defines the boundary between the positive precipitation anomaly on the Alpine south side and the dry anomaly on the Alpine north side. This across-Alpine dipole indicates that more precipitation falls south of the Alps for correctly identified cases than for missed ones. Of course, this is consistent with a more southerly flow approaching the Alps and further raises the question of whether the predictability of the foehn by the AdaBoost algorithm depends on the thermodynamic stability (stratification) of the atmosphere over the Po valley. This question is addressed next.

Fig. 10.

(a) Composite of geopotential height (m) at 500 hPa for all observed foehn events that are correctly identified (contour lines) or missed (in color shading) by the AdaBoost classifier. (b) Difference between the identified and missed precipitation composites (color shading; mm h−1) and correspondingly for the sea level pressure composites (contour lines; hPa). As in (a), all observed foehn events are considered.

Citation: Weather and Forecasting 32, 3; 10.1175/WAF-D-16-0208.1

In fact, Würsch and Sprenger (2015) examined the evolution of the foehn air parcels before they reached the northern Alpine valleys. In particular, they considered the minimum height these air parcels attain over the Po valley and, based on the results, distinguished between a Swiss and an Austrian foehn type (Steinacker 2006). Note that all air parcels have to rise to heights of at least 2100 m, corresponding to the Gotthard Pass, to cross the Alpine barrier. One may wonder whether the predictability of foehn conditions, expressed in terms of the AdaBoost index, depends on this foehn type. Since the ascent of the air parcels on the Alpine south side, and hence the foehn type, strongly depends on the stratification over the Po valley, this analysis also gives an indirect hint as to whether the thermodynamic stability of the atmosphere over the Po valley decisively impacts the predictability. Figure 11 shows the result of this analysis: no clear signal is discernible. Irrespective of the minimum height of the air parcels over the Po valley, the AdaBoost index can fall above or below the threshold of 0.5 chosen for correct classification. Of course, this simple analysis can only hint at a possible influence of the thermodynamic stability on the Alpine south side. A refined analysis would have to look more carefully at the vertical temperature profiles, including temperature inversions, and at other parameters affecting air parcel buoyancy (e.g., relative humidity). We refrain from doing so here because it would require an extensive and systematic approach beyond what is feasible in this study.
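The stratification of the AdaBoost index by parcel height underlying Fig. 11 might be computed as follows. The arrays and bin edges are hypothetical stand-ins, not the study's data; the toy values are chosen so that the detection fraction is flat across height bins, mimicking the "no clear signal" result.

```python
import numpy as np

def detection_by_height(index, height, edges, thresh=0.5):
    """Fraction of foehn events with AdaBoost index >= thresh, stratified by
    the air parcels' minimum height over the Po valley (hypothetical inputs)."""
    index = np.asarray(index, float)
    height = np.asarray(height, float)
    fracs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (height >= lo) & (height < hi)
        # NaN marks empty height bins
        fracs.append(float((index[in_bin] >= thresh).mean()) if in_bin.any() else np.nan)
    return fracs

# toy data: detection does not depend on parcel height, as in Fig. 11
idx = [0.8, 0.3, 0.7, 0.2, 0.9, 0.4]          # AdaBoost index per foehn event
hgt = [500, 900, 1400, 1800, 2500, 2900]      # minimum parcel height (m)
print(detection_by_height(idx, hgt, edges=[0, 1000, 2000, 3000]))
```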

Fig. 11.

AdaBoost index for foehn instances as a function of the air parcels’ height over the Po valley [as determined in Würsch and Sprenger (2015); see text for details]. Green dots represent correctly identified foehn events, and red dots missed cases.


5. Conclusions

In this study, the AdaBoost machine learning algorithm was applied to the prediction of foehn conditions in Altdorf within the Swiss Reuss valley. Three years (2000–02) of 1-hourly foehn observations were used as the training and validation dataset. This foehn climatology is taken as the target variable (the “truth”) for the supervised learning problem. The predictors, on the other hand, are derived from a COSMO-7 reanalysis dataset. For the same 3-yr period, 133 predictors are subjectively extracted from the hourly reanalysis data based on the main characteristics of foehn flow. The predictors particularly rely on pressure differences, wind speeds, precipitation, and relative humidities at different time instances relative to the observed foehn and nonfoehn events. Furthermore, the tendencies of all predictors are taken into account. In a second experiment, an objectively identified set of 2546 predictors is used, based solely on the pressure differences and tendencies between 67 stations in Switzerland. After splitting the dataset into a training and a validation dataset and assessing the performance for the latter, the following main conclusions can be drawn from this study:

  • The foehn identification based on the subjectively chosen COSMO-7 predictors works well. About 90% of the foehn instances were correctly identified (POD), and the probability of false detection (POFD, or false alarm rate) remains as low as 3%. A key parameter is the threshold that determines whether a forecast is classified as a foehn or a nonfoehn event. In the reference setting, this threshold is set to 0.5, but it can be adapted to users’ needs. For instance, decreasing the threshold from 0.5 to 0.3 increases the POD from 88.2% to 99.0%, while the POFD increases from 2.8% to 12.0%. This trade-off between correct and incorrect classification is a characteristic feature of the threshold value, which ultimately must be resolved according to the specific application of the forecast.

  • From a forecasting perspective, the correct alarm ratio (CAR), false alarm ratio (FAR), and missed alarm ratio (MAR) are more important than the POD and POFD. In 65% of the cases in which a foehn warning was issued based on the COSMO-7 predictors, a foehn actually occurred (CAR). If no warning was issued, a foehn occurred in only 0.6% of the cases (MAR); however, this low number must be seen in the context of the substantially larger number of nonfoehn hours than foehn hours in the dataset. Furthermore, the performance of these “alarm-based” measures can be substantially improved if a threshold of 0.6 is used; the corresponding measures then become CAR = 80.9%, FAR = 19.1%, and MAR = 2.5%.

  • The most important predictors for foehn conditions in Altdorf are the pressure difference between Altdorf and Lucerne and the wind speed at Altdorf. The fact that the wind speed itself does not take the highest rank indicates that the foehn flow is still not represented with sufficient accuracy in the COSMO-7 model. We expect the pressure difference to be less affected by the limited model resolution: whereas a realistic wind speed would require a sufficiently resolved foehn valley, the north–south pressure field captures the main pattern and strength during foehn periods even at this resolution. The predominant importance of pressure differences as predictors is further highlighted in the brute-force sensitivity experiment, in which pressure values at 67 measurement sites in Switzerland and all pairwise differences between them are taken as predictors (2546 predictors in total). The performance measures in this experiment are rather similar: POD = 92% and POFD = 3%. As an application of this approach, the pressure difference with the highest predictive power can be determined; it turns out to be a difference between central and southeastern Switzerland (as shown in Fig. 5).

  • The quality of the foehn warning depends on the season. During summer, only ~50% of all foehn warnings are actually associated with a foehn observation; in all other seasons the false alarm ratio is considerably lower (≤20%). The correct alarm ratio (CAR) is highest in winter (86%), followed by spring (80%) and autumn (70%). With respect to the POD and POFD, the seasonal differences are rather small; the POD values are all around 90%. In the cases where foehn conditions are correctly identified, the mid- and lower-tropospheric flow impinging upon the Alps is more southwesterly than in the missed cases. Furthermore, in the correctly identified cases a more pronounced pressure contrast is discernible across the Alps. Finally, no clear impact of the air parcels’ height over the Po valley on the performance could be found.
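All of the detection- and alarm-based measures quoted in these conclusions derive from the same 2 × 2 contingency table of observed versus predicted (non)foehn events. A minimal sketch follows; the counts are invented for illustration, chosen only to roughly reproduce the reference-run values quoted in the text.

```python
def contingency_measures(hits, false_alarms, misses, correct_negatives):
    """Performance measures from a 2x2 contingency table.
    hits: foehn observed and predicted; misses: observed but not predicted;
    false_alarms: predicted but not observed; correct_negatives: neither."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    return {
        "POD": a / (a + c),    # probability of detection
        "MR": c / (a + c),     # missing rate (= 1 - POD)
        "POFD": b / (b + d),   # probability of false detection
        "CAR": a / (a + b),    # correct alarm ratio
        "FAR": b / (a + b),    # false alarm ratio (= 1 - CAR)
        "MAR": c / (c + d),    # missed alarm ratio
    }

# hypothetical counts chosen to roughly match the reference run
m = contingency_measures(hits=882, false_alarms=450,
                         misses=118, correct_negatives=15000)
print(round(m["POD"], 3), round(m["CAR"], 3), round(m["POFD"], 3))
```

Changing the decision threshold on the AdaBoost index shifts events between these four cells, which is exactly the POD/POFD trade-off described in the first bullet.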

The objective algorithm described in this study is not yet in operational use. Currently, the plan is to run the algorithm in a preoperational setting, which would allow all parameters to be fine-tuned, particularly the decision threshold. When the AdaBoost index becomes operationally available, it can be compared more systematically with the already operational Widmer index. Furthermore, a simple yes/no classification might only be a starting point: the classification can be extended to distinguish different types of foehn conditions (e.g., dimmerfoehn and/or north foehn events on the Alpine south side). We will pursue these new research questions in the future based on a 10-yr NWP simulation covering the whole of Europe at 2-km grid spacing, as planned in the Convection-Resolving Climate Modeling on Future Supercomputing Platforms project (crCLIM; www.c2sm.ethz.ch/research/crCLIM.html). Further, it will be interesting to extend the AdaBoost foehn index beyond Switzerland, for instance, to see whether it performs equally well in other major foehn regions. In particular, we intend to further investigate meteorological aspects of foehn flow (e.g., its predictability) based on the performance of the AdaBoost and other machine learning techniques.

Acknowledgments

We thank Bruno Dürr and the Alpine Research Group Foehn Rhine Valley/Lake Constance (AGF; www.AGFoehn.org) for making the foehn index available, and the Swiss National Weather Service (MeteoSwiss) for providing the COSMO reanalysis dataset and the algorithm to calculate the Widmer foehn index. Daniel Gerstgrasser, also from MeteoSwiss, helped with some specific aspects of the Widmer index and the frequency of dimmerfoehn. Furthermore, we had fruitful discussions with Donat Perler, who developed an AdaBoost prediction for thunderstorms in Switzerland. David Plavcan made us aware of the study by Lauren Zweifel about probabilistic foehn forecasting. Finally, we thank three anonymous reviewers who helped to improve the manuscript.

APPENDIX A

List of Subjective Predictors Derived from the COSMO NWP Model

Here, we provide a complete list of the subjective predictors fed into the objective classification algorithm [see section 2b(1)]. Note that the number of predictors is rather large, but all of them are based on foehn characteristics. We do not try to preselect the predictors but rely on the AdaBoost algorithm to identify those of real relevance for the foehn classification. It is an advantage of the AdaBoost approach that it can handle this many, partly strongly correlated, predictors. In total, 133 predictors are used, all of them derived from the COSMO reanalysis [see section 2b(1)]. The basic predictor values are listed in Table A1, where A stands for a measurement site and A − B denotes a difference between two measurement sites A and B. A subscript determines the height or pressure level at which the predictor value is determined (10 m above ground; 850, 700, and 500 hPa; if no level is given, the values are taken at the lowest model level). The predictors encompass 1) the pressure difference Δp(A − B); 2) the geopotential height difference Δz(A − B); 3) the surface precipitation precip(A); 4) the relative humidity RH(A); 5) the temperature difference ΔT(A − B); 6) the equivalent potential temperature Θe(A); 7) the wind speed and direction, vel(A) and dir(A), respectively; 8) the wind shear Δυz(A, 1900 − 1700 m) between 1700- and 1900-m heights (correspondingly for other heights); and 9) the vertical stability, expressed as the difference of the potential temperature Δθ(A, 1900 − 1700 m) between two height levels. As a further refinement, the predictors are extracted from the COSMO model at different times relative to the foehn instance (t = 0): one hour before (t = −1 h) and one hour after (t = +1 h) are considered. To include time tendencies explicitly, the difference of the predictors between t = +1 h and t = −1 h is also included.
When a time instance has to be referred to in the text, we adopt the following intuitive notation: for example, for RH(A), the included values are RH(A, t = 0), RH(A, t = −1 h), RH(A, t = +1 h), and RH(A, Δt = 2 h). Finally, we added a 133rd dummy predictor, the reduced surface pressure at 56.5°N, 21.6°E, which we expect to be only very weakly correlated with foehn conditions in Altdorf because of its large horizontal distance. This dummy variable serves as a benchmark against which to judge the predictive power of the other, physically motivated predictors. The complete list of predictors can be found in Table A1.
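The time-instance expansion described above (values at t = 0, t = −1 h, t = +1 h, plus the 2-h tendency) might be built for a single base predictor as follows. This is a sketch of the construction, not the authors' code; the hourly relative-humidity values are invented.

```python
import numpy as np

def build_lagged_predictors(series):
    """For a base predictor time series (e.g., RH at a station), return the
    values at t=0, t=-1h, and t=+1h plus the 2-h tendency, as in appendix A.
    Rows for which a lag is unavailable (first/last hour) are dropped."""
    x = np.asarray(series, float)
    t0 = x[1:-1]       # value at the foehn instance
    tm1 = x[:-2]       # one hour before
    tp1 = x[2:]        # one hour after
    tend = tp1 - tm1   # explicit 2-h tendency
    return np.column_stack([t0, tm1, tp1, tend])

rh = [60, 55, 40, 35, 50]   # invented hourly values of one predictor
P = build_lagged_predictors(rh)
print(P)
```

Applying this expansion to each base predictor quadruples the predictor count, which is why the numbers in Table A1 must be multiplied by 4.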

Table A1.

List of all predictors used in the analysis (see appendix A for a detailed description of the notation). In total 133 predictors are used, where the number in this table must be multiplied by 4 to account for all time instances of the predictors. Additionally, a dummy field is used as a predictor, which is assumed to be only weakly correlated with foehn conditions in Altdorf. The stations used are (with latitude–longitude coordinates; see www.meteoswiss.ch for details): Domodossola (DOM; 46.11°N, 8.28°E); Thun (THU; 46.77°N, 7.62°E); Altdorf (ALT; 46.86°N, 8.63°E); Lucerne (LUZ; 47.03°N, 8.30°E); Zürich MeteoSwiss, Switzerland (SMA; 47.38°N, 8.71°E); Locarno, Switzerland (OTL; 46.17°N, 8.78°E); Bern, Switzerland (BER; 46.93°N, 7.41°E); Magadino, Switzerland (MAG; 46.17°N, 8.88°E); Zürich (ZUE; 47.38°N, 8.53°E); Lugano, Switzerland (LUG; 46.00°N, 8.95°E); Cluses, France (CLU; 46.06°N, 6.58°E); Morgex, Italy (MOR; 45.75°N, 7.06°E); Nantes, France (NAN; 47.21°N, 1.51°W); Szombathely, Hungary (SZO; 47.22°N, 16.62°E); Piotta, Switzerland (PIO; 46.52°N, 8.68°E); Napf (NAP; 47.00°N, 7.93°E); Corvatsch, Switzerland (COV; 46.42°N, 9.77°E); Jungfraujoch, Switzerland (JUN; 46.55°N, 7.98°E); Parc National de la Vanoise, France (PAR; 45.39°N, 7.12°E); Baceno, Italy (BAC; 46.27°N, 8.32°E); Faido, Switzerland (FAI; 46.49°N, 8.80°E); Vaduz (VAD; 47.14°N, 9.51°E); Aosta, Italy (AOS; 45.74°N, 7.31°E); Altstätten, Switzerland (AST; 47.38°N, 9.55°E); Cagliari, Italy (CAG; 39.22°N, 9.11°E); Le Sentier, Switzerland (SEN; 46.65°N, 5.67°E); Piz Martegnas, Switzerland (MAR; 46.58°N, 9.52°E); Arolla, Switzerland (ARO; 46.02°N, 7.48°E); Gütsch, Switzerland (GUE; 46.65°N, 8.60°E); Crap Masegn, Switzerland (CRA; 46.85°N, 9.17°E); Säntis, Switzerland (SAE; 47.25°N, 9.34°E); and Lake Constance, Austria–Germany–Switzerland (LCO; 47.54°N, 9.61°E).


APPENDIX B

Sensitivity to the Number of Boosting Iterations and the Initial Weight Attributed to Foehn Events

In this section, we perform sensitivity tests on two parameters of the boosting algorithm: the number of boosting iterations and the initial weight attributed to the foehn events relative to the nonfoehn events. The sensitivity experiments should be compared to the reference run in section 4a, which used an hourly random splitting of the whole 2000–02 time series. We provide the comparable values for the POD (88.2% in the reference run) and for the CAR (66.2% in the reference run). Note that the reference run used 20 boosting iterations, an initial weight of 2 for foehn conditions relative to nonfoehn events, and a total of 133 predictors. Further, where not explicitly stated otherwise, the splitting of the dataset into learning and validation parts is taken from the reference run, hence avoiding complications due to random sampling errors (see appendix C).

We start with the sensitivity to the number of iterations, which takes the values 5, 10, 20 (reference), and 40. The POD is highest for only 5 iterations (92.9%), remains approximately stable for 10 and 20 iterations (both ~90%), and decreases to 86.6% for 40 iterations. This is somewhat counterintuitive, because one might expect the algorithm to perform better with more iterations. However, an improvement is seen for the CAR, which increases from 52.2% to 70.0% as the number of iterations increases from 5 to 40. Note that this substantial increase in CAR goes along with a rather modest decrease in POD. Correspondingly, the FAR decreases from 47.8% to 30.1% for the same change in iteration number. Overall, this contrasting behavior of POD and FAR reflects the dilemma outlined in section 4c and Fig. 8: improving one performance measure can come at the cost of another. In this specific setting, we gave more weight to the decrease in FAR than to the accompanying decrease in POD.

In the AdaBoost learning process it is possible to attribute different initial weights to the foehn and nonfoehn events (see the schematic diagram in Fig. 1). In the reference run of section 4a, the weight of foehn events relative to nonfoehn events was set to 2, thus favoring a higher POD. This becomes somewhat more pronounced if the relative weight is increased to 10: the POD increases from 88.2% to 90.7%. This slight increase in POD goes along with a small decrease in CAR from 65.1% to 64.2%. Hence, increasing the initial weight from 2 to 10 has nearly no effect on these two performance measures. On the other hand, the effect becomes much clearer if the relative weight of the foehn events is decreased compared with the reference run. For instance, with a relative weight of 1, the POD drops to 83.8%, a considerable worsening, while the CAR increases to 71.9%, a significant improvement.
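The two parameters varied in this appendix, the number of boosting iterations and the initial relative weight of the foehn events, can be sketched with a minimal, didactic AdaBoost reimplementation using one-split decision stumps. This is a stand-in for the authors' implementation, trained on synthetic data; the invented predictor in column 0 plays the role of a pressure difference.

```python
import numpy as np

def train_adaboost(X, y, w0, n_iter):
    """Minimal AdaBoost with decision stumps. y must be -1/+1; w0 are the
    initial observation weights (e.g., 2 for foehn, 1 for nonfoehn)."""
    w = w0 / w0.sum()
    ensemble = []                               # (feature, thresh, sign, alpha)
    for _ in range(n_iter):
        best = None
        for j in range(X.shape[1]):             # coarse exhaustive stump search
            for t in np.percentile(X[:, j], [10, 30, 50, 70, 90]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] >= t, 1, -1)
                    err = w[pred != y].sum()    # weighted classification error
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
        pred = s * np.where(X[:, j] >= t, 1, -1)
        w = w * np.exp(-alpha * y * pred)       # upweight misclassified events
        w /= w.sum()
        ensemble.append((j, t, s, alpha))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] >= t, 1, -1) for j, t, s, a in ensemble)
    return np.where(score >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
X[:, 0] += 1.5 * (rng.random(400) < 0.2)        # informative "pressure" column
y = np.where(X[:, 0] > 0.8, 1, -1)              # synthetic "foehn" flag
w0 = np.where(y == 1, 2.0, 1.0)                 # initial foehn weight of 2

for n_iter in (5, 20, 40):                      # cf. the sensitivity test
    ens = train_adaboost(X, y, w0, n_iter)
    pred = predict(ens, X)
    pod = (pred[y == 1] == 1).mean()
    car = (y[pred == 1] == 1).mean()
    print(n_iter, round(pod, 2), round(car, 2))
```

Rerunning the loop with `w0 = np.ones(len(y))` reproduces the relative-weight-of-1 experiment.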

APPENDIX C

Statistical Significance

In section 4, the reference run was discussed in terms of its predictive power. For instance, the POD was 88.2% and the FAR was 33.8%. These numbers depend on the random split of the hourly dataset during 2000–02 into learning and validation parts (see section 3c).

In a Monte Carlo approach we tested the sensitivity of these results to the splitting. To this end, we repeated the learning and validation runs 100 times, each run characterized by a new randomized splitting. The outcome is shown in Fig. C1, where the relative frequencies of the POD (Fig. C1a) and the FAR (Fig. C1b) are shown; the outcome of the reference run in section 4a is indicated by the vertical lines. The POD of 88.2% for the reference run is quite high, with the Monte Carlo average lying closer to 85.7% (standard deviation of 3.7%). On the other hand, the false alarm ratio of 33.8% in the reference run is somewhat high compared with the Monte Carlo average (mean of 31.3%, standard deviation of 4.3%).
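The Monte Carlo machinery itself is simple to sketch. Here a trivial one-threshold classifier stands in for the full AdaBoost model, and the synthetic data are invented; what matters is the repeated random splitting and the resulting POD distribution.

```python
import numpy as np

def pod_far_for_split(x, y, train_idx, test_idx):
    """Fit a one-threshold classifier on the training part (predict foehn when
    x exceeds the midpoint of the two class means) and evaluate on the rest."""
    thresh = 0.5 * (x[train_idx][y[train_idx] == 1].mean()
                    + x[train_idx][y[train_idx] == 0].mean())
    pred = (x[test_idx] >= thresh).astype(int)
    obs = y[test_idx]
    pod = (pred[obs == 1] == 1).mean()
    far = (obs[pred == 1] == 0).mean() if (pred == 1).any() else np.nan
    return pod, far

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 900), rng.normal(2, 1, 100)])  # predictor
y = np.concatenate([np.zeros(900, int), np.ones(100, int)])         # foehn flag

pods = []
for _ in range(100):                       # 100 random "hourly" splits
    perm = rng.permutation(len(y))
    pod, far = pod_far_for_split(x, y, perm[:500], perm[500:])
    pods.append(float(pod))
print(round(float(np.mean(pods)), 2), round(float(np.std(pods)), 2))
```

The spread of `pods` corresponds to the histogram width in Fig. C1a: it quantifies how much a single random split can move the headline performance numbers.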

Fig. C1.

Monte Carlo simulation showing the (a) POD and (b) FAR. The histograms show the outcome if 100 random splits of the original dataset into training and test samples are performed. The green line corresponds to the reference run discussed in section 4.


As discussed in section 3c, two different approaches for splitting the dataset into learning and validation parts were analyzed: a random hourly split and a split in the middle of the 2000–02 period. Both are extreme in different respects: whereas the random hourly split can be affected by the autocorrelation of the foehn time series, the middle split assumes the periods before and after July 2001 to be similar in their foehn characteristics. A third approach takes the meteorology of the foehn into consideration, namely its lifespan of a few days or at most a week. If the 2000–02 period is split into consecutive 4-week blocks, alternately attributed to the learning and validation datasets, the problems of the two extreme approaches disappear. The outcome, however, does not change substantially; the performance measures are POD = 85.7%, POFD = 2.7%, MR = 14.3%, FAR = 34.0%, CAR = 66.0%, and MAR = 0.9%.
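The alternating 4-week block split can be written down compactly; this sketch assumes a plain hourly index (no calendar handling) and a 28-day block length.

```python
import numpy as np

def four_week_block_split(n_hours, block_hours=24 * 28):
    """Alternately attribute consecutive 4-week blocks of an hourly time
    series to the training and validation datasets; returns two index arrays."""
    block_id = np.arange(n_hours) // block_hours   # which block each hour is in
    train = np.where(block_id % 2 == 0)[0]         # even blocks -> training
    valid = np.where(block_id % 2 == 1)[0]         # odd blocks -> validation
    return train, valid

train, valid = four_week_block_split(3 * 365 * 24)  # three years, hourly
print(len(train), len(valid))
```

Because a block is longer than a typical foehn episode, no single event is torn apart between the training and validation parts, which is the point of this third splitting strategy.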

REFERENCES

  • Barnes, L. R., D. Schultz, E. C. Gruntfest, M. H. Hayden, and C. C. Benight, 2007: False alarms and close calls: A conceptual model of warning accuracy. Wea. Forecasting, 22, 1140–1147, doi:10.1175/WAF1031.1; Corrigendum, 24, 1452–1454, doi:10.1175/2009WAF2222300.1.

  • Bougeault, P., and Coauthors, 1998: Mesoscale Alpine Programme – The science plan. MeteoSwiss, 64 pp. [Available online at www.map.meteoswiss.ch.]

  • Burri, K., P. Hächler, M. Schüepp, and R. Werner, 1999: Der Föhnfall vom April 1993. MeteoSwiss Rep. 196, 89 pp. [Available online at http://www.meteoschweiz.admin.ch/home/service-und-publikationen/publikationen.subpage.html/de/data/publications/1999/9/der-foehnfall-vom-april-1993.html.]

  • Courvoisier, H. W., and T. Gutermann, 1971: Zur praktischen Anwendung des Föhntests von Widmer. MeteoSwiss Rep. 21, 7 pp. [Available online at http://www.agfoehn.org/doc/Courvoisier_1971.pdf.]

  • Deloncle, A., R. Berk, F. D’Andrea, and M. Ghil, 2007: Weather regime prediction using statistical learning. J. Atmos. Sci., 64, 1619–1635, doi:10.1175/JAS3918.1.

  • Dietterich, T. G., 2000: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn., 40, 139–157, doi:10.1023/A:1007607513941.

  • Drechsel, S., and G. Mayr, 2008: Objective forecasting of foehn winds for a subgrid-scale Alpine valley. Wea. Forecasting, 23, 205–218, doi:10.1175/2007WAF2006021.1.

  • Dürr, B., 2008: Automatisiertes Verfahren zur Bestimmung von Föhn in Alpentälern. MeteoSwiss Rep. 223, 22 pp. [Available online at http://www.meteoschweiz.admin.ch/content/dam/meteoswiss/de/Ungebundene-Seiten/Publikationen/Fachberichte/doc/ab223.pdf.]

  • Fawcett, T., 2006: An introduction to ROC analysis. Pattern Recognit. Lett., 27, 861–874, doi:10.1016/j.patrec.2005.10.010.

  • Freund, Y., and R. E. Schapire, 1997: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55, 119–139, doi:10.1006/jcss.1997.1504.

  • Gagne, D. J., II, 2016: Coupling data science techniques and numerical weather prediction models for high-impact weather prediction. Ph.D. dissertation, University of Oklahoma, 185 pp. [Available online at https://shareok.org/handle/11244/44917.]

  • Glahn, H., and D. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

  • Graf, M., M. Sprenger, U. Lohmann, C. Seibt, and H. Hofmann, 2013: Evaluating the suitability of the SWAN/COSMO-2 model system to simulate short-crested surface waves for a narrow lake with complex bathymetry. Meteor. Z., 22, 257–272, doi:10.1127/0941-2948/2013/0442.

  • Gutermann, T., 1970: Vergleichende Untersuchungen zur Föhnhäuigkeit im Rheintal zwischen Chur und Bodensee. Ph.D. dissertation, University of Zürich, Zürich, Switzerland, 69 pp.

  • Gutermann, T., 1979: Der Föhn vom 14. bis 18. Januar 1975 im Bodenseeraum. MeteoSwiss Rep. 90, 42 pp.

  • Gutermann, T., B. Dürr, H. Richner, and S. Bader, 2012: Föhnklimatologie Altdorf: die lange Reihe (1864-2008) und ihre Weiterführung, Vergleich mit anderen Stationen. MeteoSwiss Tech. Rep. 241, 53 pp., doi:10.3929/ethz-a-007583529.

  • Hächler, P., K. Burri, B. Dürr, T. Gutermann, A. Neururer, H. Richner, and R. Werner, 2011: Der Föhnfall vom 8. Dezember 2006 – Eine Fallstudie. MeteoSwiss Rep. 234, 52 pp. [Available online at http://dx.doi.org/10.3929/ethz-a-007319165.]

  • Hastie, T., R. Tibshirani, and J. Friedman, 2009: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics, 745 pp., doi:10.1007/978-0-387-21606-5.

  • Hsieh, W., 2009: Machine Learning Methods in the Environmental Sciences—Neural Networks and Kernels. Cambridge University Press, 349 pp.

  • James, G., D. Witten, T. Hastie, and R. Tibshirani, 2013: An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics, Vol. 103, Springer, 426 pp. doi:10.1007/978-1-4614-7138-7.

  • Jenkner, J., 2008: Stratified verifications of quantitative precipitation forecasts over Switzerland. Ph.D. thesis 17782, ETH Zürich, 108 pp., doi:10.3929/ethz-a-005698830.

  • Jenkner, J., M. Sprenger, I. Schwenk, C. Schwierz, S. Dierer, and D. Leuenberger, 2010: Detection and climatology of fronts in a high-resolution model reanalysis over the Alps. Meteor. Appl., 17, 1–18, doi:10.1002/met.142.

  • Jolliffe, I. T., and D. B. Stephenson, 2011: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. 2nd ed. John Wiley and Sons, 292 pp.

  • Katz, R. W., and A. H. Murphy, 2008: Economic Value of Weather and Climate Forecasts. Cambridge University Press, 240 pp.

  • Kretzschmar, R., P. Eckert, D. Cattani, and F. Eggimann, 2004: Neural network classifiers for local wind prediction. J. Appl. Meteor., 43, 727–738, doi:10.1175/2057.1.

  • Kuhn, M., and K. Johnson, 2013: Applied Predictive Modeling. Springer, 600 pp.

  • Lotteraner, C., 2009: Synoptisch-klimatologische Auswertung von Windfeldern im Alpenraum. Ph.D. dissertation, University of Vienna, 112 pp. [Available online at http://othes.univie.ac.at/6142/.]

  • Manzato, A., 2005: The use of sounding-derived indices for a neural network short-term thunderstorm forecast. Wea. Forecasting, 20, 896–917, doi:10.1175/WAF898.1.

  • Manzato, A., 2007: A note on the maximum Pierce skill score. Wea. Forecasting, 22, 1148–1154, doi:10.1175/WAF1041.1.

  • McCandless, T. C., S. E. Haupt, and G. S. Young, 2015: A model tree approach to forecasting solar irradiance variability. Sol. Energy, 120, 514–524, doi:10.1016/j.solener.2015.07.020.

  • Murphy, A. H., 1996: The Finley affair: A signal event in the history of forecast verification. Wea. Forecasting, 11, 3–20, doi:10.1175/1520-0434(1996)011<0003:TFAASE>2.0.CO;2.

  • Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338, doi:10.1175/1520-0493(1987)115<1330:AGFFFV>2.0.CO;2.

  • Palmer, T. N., 2002: The economic value of ensemble forecasts as a tool for risk assessment: From days to decades. Quart. J. Roy. Meteor. Soc., 128, 747–774, doi:10.1256/0035900021643593.

  • Perler, D., 2006: Automatic weather interpretation using modern classification algorithms. M.S. thesis, Dept. of Computer Science, ETH Zürich, Zürich, Switzerland, 81 pp.

  • Perler, D., and O. Marchand, 2009: A study in weather model output postprocessing: Using the boosting method for thunderstorm detection. Wea. Forecasting, 24, 211–222, doi:10.1175/2008WAF2007047.1.

  • Plavcan, D., and G. J. Mayr, 2014: Automatic and probabilistic foehn diagnosis with a statistical mixture model. J. Appl. Meteor. Climatol., 53, 652–658, doi:10.1175/JAMC-D-13-0267.1.

  • Radhika, Y., and M. Shashi, 2009: Atmospheric temperature prediction using support vector machines. Int. J. Comput. Theor. Eng., 1, 55–58, doi:10.7763/IJCTE.2009.V1.9.

  • Richner, H., and P. Hächler, 2013: Understanding and forecasting Alpine foehn. Mountain Weather Research and Forecasting, F. K. Chow, S. F. J. De Wekker, and B. J. Snyder, Eds., Springer Atmospheric Sciences, Springer, 219–260, doi:10.1007/978-94-007-4098-3.

  • Richner, H., and B. Dürr, 2015: Facts and fallacies related to dimmerfoehn. ETH Zürich Rep., 4 pp., doi:10.3929/ethz-a-010439615.

  • Richner, H., B. Dürr, T. Gutermann, and S. Bader, 2014: The use of automatic station data for continuing the long time series (1864 to 2008) of foehn in Altdorf. Meteor. Z., 23, 159–166, doi:10.1127/0941-2948/2014/0528.

  • Sprenger, M., B. Dürr, and H. Richner, 2016: Foehn studies in Switzerland. From Weather Observations to Atmospheric and Climate Sciences in Switzerland, S. Willemse and M. Furger, Eds., Hochschulverlag, 215–248.

  • Steinacker, R., 2006: Alpiner Föhn - eine neue Strophe zu einem alten Lied. Promet, 32, 3–10.

  • Stephenson, D. B., 2000: Use of the “odds ratio” for diagnosing forecast skill. Wea. Forecasting, 15, 221–232, doi:10.1175/1520-0434(2000)015<0221:UOTORF>2.0.CO;2.

  • Steppeler, J., G. Doms, U. Schättler, H. Bitzer, A. Gassmann, U. Damrath, and G. Gregoric, 2003: Meso-gamma scale forecasts using the nonhydrostatic model LM. Meteor. Atmos. Phys., 82, 75–96, doi:10.1007/s00703-001-0592-9.

  • Widmer, R., 1966: Statistische Untersuchungen über den Föhn im Reusstal und Versuch einer objektiven Föhnprognose für die Station Altdorf. Vierteljahresschr. Naturforsch. Ges. Zürich, 111, 331–375.

  • Wilhelm, M., 2012: COSMO-2 model performance in forecasting foehn: A systematic process-oriented verification. MeteoSwiss Sci. Rep. 89, 58 pp.

  • Winkler, P., M. Lugauer, and O. Reitebuch, 2006: Alpines Pumpen. Promet, 32, 34–42.

  • Würsch, M., and M. Sprenger, 2015: Swiss and Austrian foehn revisited: A Lagrangian-based analysis. Meteor. Z., 24, 225–242, doi:10.1127/metz/2015/0647.

  • Zweifel, L., 2016: Probabilistic foehn forecasting for the Gotthard region based on model output statistics. M.S. thesis, Faculty of Geo- and Atmospheric Sciences, University of Innsbruck, 90 pp. [Available online at diglib.uibk.ac.at.]

1

More specifically, the threshold is obtained in the following way. First, the 5% and 95% percentiles of a predictor are determined, taking into account all time instances of the learning dataset. Then, the range between these two percentiles is split into 100 equidistant bins and a midpoint threshold is attributed to each bin. Finally, the threshold is selected by sequentially testing all of them; i.e., the algorithm chooses the one that minimizes the classification error given the weights attributed to the single observations.
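The threshold-selection procedure of this footnote can be sketched directly: scan the midpoints of 100 equidistant bins between the 5th and 95th percentiles and keep the one minimizing the weighted classification error. This is an illustrative reimplementation on invented two-class data, not the study's code.

```python
import numpy as np

def best_stump_threshold(x, y, w, n_bins=100):
    """Decision-stump threshold as in footnote 1: test the midpoints of
    n_bins equidistant bins between the 5% and 95% percentiles and return
    the one with the smallest weighted classification error."""
    lo, hi = np.percentile(x, [5, 95])
    edges = np.linspace(lo, hi, n_bins + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])       # one candidate per bin
    best_t, best_err = mids[0], np.inf
    for t in mids:
        # stump predicts class 1 when x >= t; error weighted by observation weights
        err = np.sum(w * ((x >= t).astype(int) != y))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])
y = np.concatenate([np.zeros(500, int), np.ones(500, int)])
t, err = best_stump_threshold(x, y, np.ones_like(x) / len(x))
print(round(float(t), 1))  # roughly halfway between the two class means
```

Within AdaBoost, the observation weights `w` change at every boosting iteration, so the selected threshold generally differs from iteration to iteration.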

2

A possible approach could apply a running-mean filter over the AdaBoost index and then use a threshold adapted to this new time series. Or transient gaps lasting 1–2 h between foehn instances are simply dismissed, and the foehn is considered to be continuous. Certainly, the optimal adjustment must be done in close collaboration with operational forecasters who bring with them the necessary experience in foehn prediction.
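The second option of this footnote, dismissing transient 1–2-h gaps between foehn instances, might be implemented as follows. This is a sketch of the idea on an invented hourly yes/no series, not an operational routine.

```python
def close_short_gaps(foehn, max_gap=2):
    """Dismiss transient gaps of up to max_gap hours between foehn instances:
    a short run of 0s enclosed by 1s on both sides is set to 1."""
    out = list(foehn)
    n = len(out)
    i = 0
    while i < n:
        if out[i] == 0:
            j = i
            while j < n and out[j] == 0:
                j += 1                  # j is the first index after the gap
            # fill only interior gaps that are short enough
            if 0 < i and j < n and (j - i) <= max_gap:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out

# a 2-h gap is closed; the 3-h gap and the trailing 0 are left untouched
print(close_short_gaps([1, 1, 0, 0, 1, 0, 0, 0, 1, 0]))
```

The alternative running-mean approach would instead smooth the continuous AdaBoost index before thresholding; both variants should be tuned together with operational forecasters, as noted above.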

3

In the operational forecasting of foehns at MeteoSwiss, the Widmer index is only one (important) part. Several additional criteria are taken into account, e.g., the observed wind speed and direction at several (mountain) stations, the lower-tropospheric stratification, and the origin of the air masses. In particular, these additional criteria depend on the station for which the foehn breakthrough or decay has to be predicted.

4

Note that the AdaBoost classifier is trained on foehn instances from all four seasons. An alternative approach could be to train the classifier independently for each season, i.e., only spring foehn cases are used to train a foehn classifier for spring, and correspondingly for the other seasons. This might result in better performances during the individual seasons, if we assume that the foehn seasonally differs in its characteristics. A drawback to this approach, however, would be the reduced size of the training dataset, in particular for summer with its low frequency of foehn occurrences. Therefore, we refrain from this sensitivity experiment in this study, but intend to do so within the 10-yr dataset of the crCLIM project (see outlook in section 5).

  • Barnes, L. R., D. Schultz, E. C. Gruntfest, M. H. Hayden, and C. C. Benight, 2007: False alarms and close calls: A conceptual model of warning accuracy. Wea. Forecasting, 22, 1140–1147, doi:10.1175/WAF1031.1; Corrigendum, 24, 1452–1454, doi:10.1175/2009WAF2222300.1.

  • Bougeault, P., and Coauthors, 1998: Mesoscale Alpine Programme – The science plan. MeteoSwiss, 64 pp. [Available online at www.map.meteoswiss.ch.]

  • Burri, K., P. Hächler, M. Schüepp, and R. Werner, 1999: Der Föhnfall vom April 1993. MeteoSwiss Rep. 196, 89 pp. [Available online at http://www.meteoschweiz.admin.ch/home/service-und-publikationen/publikationen.subpage.html/de/data/publications/1999/9/der-foehnfall-vom-april-1993.html.]

  • Courvoisier, H. W., and T. Gutermann, 1971: Zur praktischen Anwendung des Föhntests von Widmer. MeteoSwiss Rep. 21, 7 pp. [Available online at http://www.agfoehn.org/doc/Courvoisier_1971.pdf.]

  • Deloncle, A., R. Berk, F. D’Andrea, and M. Ghil, 2007: Weather regime prediction using statistical learning. J. Atmos. Sci., 64, 1619–1635, doi:10.1175/JAS3918.1.

  • Dietterich, T. G., 2000: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn., 40, 139–157, doi:10.1023/A:1007607513941.

  • Drechsel, S., and G. Mayr, 2008: Objective forecasting of foehn winds for a subgrid-scale Alpine valley. Wea. Forecasting, 23, 205–218, doi:10.1175/2007WAF2006021.1.

  • Dürr, B., 2008: Automatisiertes Verfahren zur Bestimmung von Föhn in Alpentälern. MeteoSwiss Rep. 223, 22 pp. [Available online at http://www.meteoschweiz.admin.ch/content/dam/meteoswiss/de/Ungebundene-Seiten/Publikationen/Fachberichte/doc/ab223.pdf.]

  • Fawcett, T., 2006: An introduction to ROC analysis. Pattern Recognit. Lett., 27, 861–874, doi:10.1016/j.patrec.2005.10.010.

  • Freund, Y., and R. E. Schapire, 1997: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55, 119–139, doi:10.1006/jcss.1997.1504.

  • Gagne, D. J., II, 2016: Coupling data science techniques and numerical weather prediction models for high-impact weather prediction. Ph.D. dissertation, University of Oklahoma, 185 pp. [Available online at https://shareok.org/handle/11244/44917.]

  • Glahn, H., and D. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

  • Graf, M., M. Sprenger, U. Lohmann, C. Seibt, and H. Hofmann, 2013: Evaluating the suitability of the SWAN/COSMO-2 model system to simulate short-crested surface waves for a narrow lake with complex bathymetry. Meteor. Z., 22, 257–272, doi:10.1127/0941-2948/2013/0442.

  • Gutermann, T., 1970: Vergleichende Untersuchungen zur Föhnhäufigkeit im Rheintal zwischen Chur und Bodensee. Ph.D. dissertation, University of Zürich, Zürich, Switzerland, 69 pp.

  • Gutermann, T., 1979: Der Föhn vom 14. bis 18. Januar 1975 im Bodenseeraum. MeteoSwiss Rep. 90, 42 pp.

  • Gutermann, T., B. Dürr, H. Richner, and S. Bader, 2012: Föhnklimatologie Altdorf: die lange Reihe (1864–2008) und ihre Weiterführung, Vergleich mit anderen Stationen. MeteoSwiss Tech. Rep. 241, 53 pp., doi:10.3929/ethz-a-007583529.

  • Hächler, P., K. Burri, B. Dürr, T. Gutermann, A. Neururer, H. Richner, and R. Werner, 2011: Der Föhnfall vom 8. Dezember 2006 – Eine Fallstudie. MeteoSwiss Rep. 234, 52 pp., doi:10.3929/ethz-a-007319165.

  • Hastie, T., R. Tibshirani, and J. Friedman, 2009: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics, Springer, 745 pp., doi:10.1007/978-0-387-21606-5.

  • Hsieh, W., 2009: Machine Learning Methods in the Environmental Sciences—Neural Networks and Kernels. Cambridge University Press, 349 pp.

  • James, G., D. Witten, T. Hastie, and R. Tibshirani, 2013: An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics, Vol. 103, Springer, 426 pp., doi:10.1007/978-1-4614-7138-7.

  • Jenkner, J., 2008: Stratified verifications of quantitative precipitation forecasts over Switzerland. Ph.D. thesis 17782, ETH Zürich, 108 pp., doi:10.3929/ethz-a-005698830.

  • Jenkner, J., M. Sprenger, I. Schwenk, C. Schwierz, S. Dierer, and D. Leuenberger, 2010: Detection and climatology of fronts in a high-resolution model reanalysis over the Alps. Meteor. Appl., 17, 1–18, doi:10.1002/met.142.

  • Jolliffe, I. T., and D. B. Stephenson, 2011: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. 2nd ed. John Wiley and Sons, 292 pp.

  • Katz, R. W., and A. H. Murphy, 2008: Economic Value of Weather and Climate Forecasts. Cambridge University Press, 240 pp.

  • Kretzschmar, R., P. Eckert, D. Cattani, and F. Eggimann, 2004: Neural network classifiers for local wind prediction. J. Appl. Meteor., 43, 727–738, doi:10.1175/2057.1.

  • Kuhn, M., and K. Johnson, 2013: Applied Predictive Modeling. Springer, 600 pp.

  • Lotteraner, C., 2009: Synoptisch-klimatologische Auswertung von Windfeldern im Alpenraum. Ph.D. dissertation, University of Vienna, 112 pp. [Available online at http://othes.univie.ac.at/6142/.]

  • Manzato, A., 2005: The use of sounding-derived indices for a neural network short-term thunderstorm forecast. Wea. Forecasting, 20, 896–917, doi:10.1175/WAF898.1.

  • Manzato, A., 2007: A note on the maximum Pierce skill score. Wea. Forecasting, 22, 1148–1154, doi:10.1175/WAF1041.1.

  • McCandless, T. C., S. E. Haupt, and G. S. Young, 2015: A model tree approach to forecasting solar irradiance variability. Sol. Energy, 120, 514–524, doi:10.1016/j.solener.2015.07.020.

  • Murphy, A. H., 1996: The Finley affair: A signal event in the history of forecast verification. Wea. Forecasting, 11, 3–20, doi:10.1175/1520-0434(1996)011<0003:TFAASE>2.0.CO;2.

  • Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338, doi:10.1175/1520-0493(1987)115<1330:AGFFFV>2.0.CO;2.

  • Palmer, T. N., 2002: The economic value of ensemble forecasts as a tool for risk assessment: From days to decades. Quart. J. Roy. Meteor. Soc., 128, 747–774, doi:10.1256/0035900021643593.

  • Perler, D., 2006: Automatic weather interpretation using modern classification algorithms. M.S. thesis, Dept. of Computer Science, ETH Zürich, Zürich, Switzerland, 81 pp.

  • Perler, D., and O. Marchand, 2009: A study in weather model output postprocessing: Using the boosting method for thunderstorm detection. Wea. Forecasting, 24, 211–222, doi:10.1175/2008WAF2007047.1.

  • Plavcan, D., and G. J. Mayr, 2014: Automatic and probabilistic foehn diagnosis with a statistical mixture model. J. Appl. Meteor. Climatol., 53, 652–658, doi:10.1175/JAMC-D-13-0267.1.

  • Radhika, Y., and M. Shashi, 2009: Atmospheric temperature prediction using support vector machines. Int. J. Comput. Theor. Eng., 1, 55–58, doi:10.7763/IJCTE.2009.V1.9.

  • Richner, H., and P. Hächler, 2013: Understanding and forecasting Alpine foehn. Mountain Weather Research and Forecasting, F. K. Chow, S. F. J. De Wekker, and B. J. Snyder, Eds., Springer Atmospheric Sciences, Springer, 219–260, doi:10.1007/978-94-007-4098-3.

  • Richner, H., and B. Dürr, 2015: Facts and fallacies related to dimmerfoehn. ETH Zürich Rep., 4 pp., doi:10.3929/ethz-a-010439615.

  • Richner, H., B. Dürr, T. Gutermann, and S. Bader, 2014: The use of automatic station data for continuing the long time series (1864 to 2008) of foehn in Altdorf. Meteor. Z., 23, 159–166, doi:10.1127/0941-2948/2014/0528.

  • Sprenger, M., B. Dürr, and H. Richner, 2016: Foehn studies in Switzerland. From Weather Observations to Atmospheric and Climate Sciences in Switzerland, S. Willemse and M. Furger, Eds., Hochschulverlag, 215–248.

  • Steinacker, R., 2006: Alpiner Föhn – eine neue Strophe zu einem alten Lied. Promet, 32, 3–10.

  • Stephenson, D. B., 2000: Use of the “odds ratio” for diagnosing forecast skill. Wea. Forecasting, 15, 221–232, doi:10.1175/1520-0434(2000)015<0221:UOTORF>2.0.CO;2.

  • Steppeler, J., G. Doms, U. Schattler, H. Bitzer, A. Gassmann, U. Damrath, and G. Gregoric, 2003: Meso-gamma scale forecasts using the nonhydrostatic model LM. Meteor. Atmos. Phys., 82, 75–96, doi:10.1007/s00703-001-0592-9.

  • Widmer, R., 1966: Statistische Untersuchungen über den Föhn im Reusstal und Versuch einer objektiven Föhnprognose für die Station Altdorf. Vierteljahresschr. Naturforsch. Ges. Zürich, 111, 331–375.

  • Wilhelm, M., 2012: COSMO-2 model performance in forecasting foehn: A systematic process-oriented verification. MeteoSwiss Sci. Rep. 89, 58 pp.

  • Winkler, P., M. Lugauer, and O. Reitebuch, 2006: Alpines Pumpen. Promet, 32, 34–42.

  • Würsch, M., and M. Sprenger, 2015: Swiss and Austrian foehn revisited: A Lagrangian-based analysis. Meteor. Z., 24, 225–242, doi:10.1127/metz/2015/0647.

  • Zweifel, L., 2016: Probabilistic foehn forecasting for the Gotthard region based on model output statistics. M.S. thesis, Faculty of Geo- and Atmospheric Sciences, University of Innsbruck, 90 pp. [Available online at diglib.uibk.ac.at.]

  • Fig. 1.

    Outline of the AdaBoost classifier. First, weights are attributed to the N observations (step 1), whereby the weights can either be equal for all observations or attributed separately to foehn and nonfoehn events. Then, M iterations (step 2) are performed to fit M weak classifiers Gm to the weighted observations (step 2a). In step 2b the performance of the weak learner Gm, expressed as an error value errm, is assessed; it determines a weight αm attributed to the weak learner Gm (step 2c). Based on the performance of the weak learner, new weights are then attributed to all observations (step 2d), with incorrectly classified instances receiving larger weights. Finally, after having thus established M weak learners Gm with corresponding weights αm, the boosted strong learner G(x) is defined as a weighted average over all weak learners.
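    The steps above can be sketched as a generic AdaBoost loop (a textbook sketch, not the code used in this study; `weak_fit` is a hypothetical interface for fitting a weak learner, e.g., a decision stump, to weighted observations):

```python
import numpy as np

def adaboost(X, y, weak_fit, M=20):
    """Minimal AdaBoost loop mirroring steps 1-2d of Fig. 1.

    X : (N, P) predictor matrix, y : labels in {-1, +1}
    weak_fit(X, y, w) : returns a classifier G_m with G_m(X) in {-1, +1},
    fitted to the observations weighted by w.
    """
    N = X.shape[0]
    w = np.full(N, 1.0 / N)          # step 1: initial observation weights
    learners, alphas = [], []
    for _ in range(M):               # step 2: M iterations
        G = weak_fit(X, y, w)        # 2a: fit weak learner to weighted data
        pred = G(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # 2b: weighted error
        err = max(err, 1e-10)        # guard against a perfect weak learner
        alpha = np.log((1.0 - err) / err)          # 2c: learner weight
        w = w * np.exp(alpha * (pred != y))        # 2d: boost misclassified
        w = w / np.sum(w)
        learners.append(G)
        alphas.append(alpha)

    def strong(Xq):                  # G(x): weighted vote of weak learners
        votes = sum(a * g(Xq) for a, g in zip(alphas, learners))
        return np.sign(votes)

    return strong
```

Rescaling the weighted vote to the interval [0, 1], as done for the AdaBoost index in the paper, then allows a decision threshold such as 0.5 to be applied (cf. Fig. 4).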

  • Fig. 2.

    Determination of a threshold for the single predictors (a) Δp(ALP − LUZ, −1 h) and (b) vel10m(ALT, −1 h). The predictor values for all nonfoehn (blue) and foehn events (red) are taken and the threshold is set to minimize the error rate, given the weight attributed to all observation instances. Note that the threshold depends on the relative weight of foehn to nonfoehn events (see section 3 for details). This corresponds to a simple decision stump, where finally all observations above the thresholds vote for foehn and all observations below vote against it. Black shading indicates the part that is incorrectly classified based on these single predictors.

  • Fig. 3.

    Definition of single weak learners for 1100 UTC 16 Oct 2002. For (top) weak learner 1, the 133 predictors are included and vote, as a simple decision stump, for (red) or against (blue) a foehn case. The majority then determines whether the first weak learner classifies the instance as a foehn or nonfoehn event. Based on the performance of the weak learner, as determined from the error value, a weight is attributed to this weak learner. For the subsequent weak learners (5 out of 20 shown here) the weights attributed to the observations are adjusted accordingly, hence giving more impact to incorrectly classified instances, and then the same majority vote method is applied again.

  • Fig. 4.

    The final boosted classifier is taken as an average over all weak learners, weighted according to their performance. In this case, at 1100 UTC 16 Oct 2002 the weighted average becomes 0.56, which will finally classify this instance as a foehn event, provided the discriminating decision threshold is set to 0.5.

  • Fig. 5.

    (a) All stations with pressure reduced to sea level height, which are included in the sensitivity run described in section 4. The topography, at 1-km resolution, is given in color shading and additionally the first four predictors in the ranking are drawn. They all correspond to pressure differences between a station on the Alpine south side and four stations on the Alpine north side: NAP − ROB, PIL − ROB, HOE − ROB, and VAD − SCU. (b) The reduced surface pressure (in steps of 0.5 hPa) averaged over all foehn instances during the years 2000–02. As an inset, the detailed pressure analysis of the foehn on 15 Jan 1979 from Gutermann (1979) is shown.

  • Fig. 6.

    Example of the performance of the boosted foehn classifier for (a) the whole year 2002 and (b) March 2002. Green labels mark time instances when a foehn event was observed, blue labels mark forecast foehn events, and red marks mismatches between the two. In (b), the timeline of the forecast value is shown in blue; a foehn event is forecast if the dashed decision threshold (0.5) is surpassed.

  • Fig. 7.

    Time series of the (top) Widmer index and (bottom) AdaBoost index for the time period July 2001–December 2002. Additionally, the thresholds for foehn detection are included as red, dashed lines, and observed foehn events are marked with a gray bar. The time period corresponds to the validation dataset, whereas the training dataset contains the 1.5 yr before. The threshold for the Widmer index varies during the year, with the highest values in winter and smallest in summer as a result of the stronger stratification in the foehn valleys before foehn onset during winter compared with summer; i.e., in winter stronger pressure gradients are needed to displace the stable cold air pools in the valleys.

  • Fig. 8.

    (a) Distribution of forecast values for foehn (red) and nonfoehn (blue) events. The vertical dashed line corresponds to the decision threshold (0.5) applied in the evaluation. (b) Trade-off between POD, POFD, and MR, as well as between CAR, FAR, and MAR, depending on the decision threshold of the boosted foehn classifier. For CAR, FAR, and MAR only thresholds up to 0.6 are considered, because for higher values the curves become very “noisy.”
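    The scores traded off in this figure follow directly from the contingency table. A sketch of how they could be computed for a given decision threshold (a hypothetical function with assumed names, not the authors' evaluation code; denominators are assumed nonzero):

```python
import numpy as np

def contingency_scores(index, observed, threshold=0.5):
    """Contingency-table scores for a boolean foehn forecast.

    index : forecast values (e.g., the AdaBoost index)
    observed : boolean array of observed foehn flags
    Uses hits (a), false alarms (b), misses (c), correct negatives (d).
    """
    forecast = index >= threshold
    a = np.sum(forecast & observed)       # hits
    b = np.sum(forecast & ~observed)      # false alarms
    c = np.sum(~forecast & observed)      # misses
    d = np.sum(~forecast & ~observed)     # correct negatives
    return {
        "POD": a / (a + c),    # probability of detection
        "POFD": b / (b + d),   # probability of false detection
        "MR": c / (a + c),     # missing rate
        "CAR": a / (a + b),    # correct alarm ratio
        "FAR": b / (a + b),    # false alarm ratio
        "MAR": c / (c + d),    # missed alarm ratio
    }
```

Sweeping `threshold` over the interval [0, 1] and plotting POFD against POD yields the ROC curve shown in Fig. 9.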

  • Fig. 9.

    ROC values for the AdaBoost index, i.e., dependency of POFD and POD on the decision threshold. Some thresholds along the ROC curve are marked with colored dots.

  • Fig. 10.

    (a) Composite of geopotential height (m) at 500 hPa for all observed foehn events that are correctly identified (contour lines) or missed (in color shading) by the AdaBoost classifier. (b) Difference between the identified and missed precipitation composites (color shading; mm h−1) and correspondingly for the sea level pressure composites (contour lines; hPa). As in (a), all observed foehn events are considered.

  • Fig. 11.

    AdaBoost index for foehn instances as a function of the air parcels’ height over the Po valley [as determined in Würsch and Sprenger (2015); see text for details]. Green dots represent correctly identified foehn events, and red dots missed cases.

  • Fig. C1.

    Monte Carlo simulation showing the (a) POD and (b) FAR. The histograms show the outcome if 100 random splits of the original dataset into training and test samples are performed. The green line corresponds to the reference run discussed in section 4.
