Search Results

David Plavcan, Georg J. Mayr, and Achim Zeileis

Abstract

Diagnosing foehn winds from weather station data downwind of topographic obstacles requires distinguishing them from other downslope winds, particularly nocturnal ones driven by radiative cooling. An automatic classification scheme is presented that yields reproducible results and includes information about the (un)certainty of the diagnosis. A statistical mixture model separates foehn and no-foehn winds in a measured time series of wind. In addition to wind speed and direction, it accommodates other physically meaningful classifiers such as the (potential) temperature difference to an upwind station (e.g., near the crest) or relative humidity. The algorithm was tested for Wipp Valley in the central Alps against human expert classification and a previous objective method, which the new method outperforms. Climatologically, using only wind information gives nearly the same foehn frequencies as using additional covariables. A data record length of at least one year is required for satisfactory results. The suitability of mixture models for objective classification of foehn at other locations will have to be tested in further studies.
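
The mixture idea can be illustrated with a minimal sketch: a two-component Gaussian mixture fitted by expectation-maximization (EM) to wind speed alone. The abstract does not state the component distributions or the fitting procedure, so the Gaussian components, the function names, and the example data are assumptions made for illustration only. The posterior probability of the higher-mean component plays the role of the foehn probability, which is how a mixture model carries the (un)certainty of the diagnosis.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(speeds, n_iter=200, min_sigma=1e-3):
    """Fit a two-component Gaussian mixture by EM (illustrative assumption).

    Returns (means, sigmas, weight_high, posteriors); posteriors[i] is the
    estimated probability that speeds[i] belongs to the component with the
    higher mean (the hypothetical "foehn" component).
    """
    ordered = sorted(speeds)
    half = len(ordered) // 2
    # Crude starting values: lower and upper halves of the sorted data.
    mu = [mean(ordered[:half]), mean(ordered[half:])]
    sigma = [max(pstdev(speeds), min_sigma)] * 2
    weight_high = 0.5
    posteriors = [0.5] * len(speeds)

    for _ in range(n_iter):
        # E-step: posterior probability of the high-speed component.
        posteriors = []
        for v in speeds:
            low = (1.0 - weight_high) * gaussian_pdf(v, mu[0], sigma[0])
            high = weight_high * gaussian_pdf(v, mu[1], sigma[1])
            total = low + high
            posteriors.append(high / total if total > 0.0 else 0.5)

        # M-step: weighted means, standard deviations, and mixing weight.
        n_high = sum(posteriors)
        n_low = len(speeds) - n_high
        if n_high < 1e-9 or n_low < 1e-9:
            break  # one component has collapsed; keep the last estimates
        mu = [
            sum((1.0 - p) * v for p, v in zip(posteriors, speeds)) / n_low,
            sum(p * v for p, v in zip(posteriors, speeds)) / n_high,
        ]
        var_low = sum((1.0 - p) * (v - mu[0]) ** 2
                      for p, v in zip(posteriors, speeds)) / n_low
        var_high = sum(p * (v - mu[1]) ** 2
                       for p, v in zip(posteriors, speeds)) / n_high
        sigma = [max(math.sqrt(var_low), min_sigma),
                 max(math.sqrt(var_high), min_sigma)]
        weight_high = n_high / len(speeds)

    return mu, sigma, weight_high, posteriors


# Made-up wind speeds (m/s): five calm and five strong observations.
speeds = [1.0, 1.5, 2.0, 2.5, 3.0, 9.0, 9.5, 10.0, 10.5, 11.0]
means, sigmas, weight_high, foehn_probability = fit_two_component_mixture(speeds)
```

Further classifiers such as wind direction or a temperature difference would enter as additional variables; that extension is not shown here.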

Full access
Reto Stauffer, Georg J. Mayr, Markus Dabernig, and Achim Zeileis

Abstract

Results of many atmospheric science applications are processed graphically. Visualizations are a powerful tool to display and communicate data. However, to create effective figures, a wide range of challenges has to be considered. Therefore, this paper offers several guidelines with a focus on colors. Colors are often used to add or encode information. Colors should (i) allow humans to process the information rapidly, (ii) guide the reader to the most important information, and (iii) represent the data appropriately without misleading distortion. The second and third requirements necessitate tailoring the visualization and the use of colors to the specific purpose of the graphic. A standard way of deriving color palettes is via transitions through a particular color space. Most of the common software packages still provide default palettes derived in the red–green–blue (RGB) color model or “simple” transformations thereof. Confounding perceptual properties such as hue and brightness make RGB-based palettes more prone to misinterpretation. Switching to a color model corresponding to the perceptual dimensions of human color vision avoids these problems. The authors show several practically relevant examples using one such model, the hue–chroma–luminance (HCL) color model, to explain how it works and what its advantages are. Moreover, the paper contains several tips on how to easily integrate this knowledge into software commonly used by the community. The guidelines and examples should help readers to switch over to the alternative HCL color model, which will result in a greatly improved quality and readability of visualized atmospheric science data for research, teaching, and communication of results to society.
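
As a rough illustration of how an HCL palette can be built, the sketch below converts hue, chroma, and luminance to sRGB. It assumes that HCL is taken as the polar form of the CIELUV color space with a D65 reference white; the abstract does not specify this, and the function names and default values are hypothetical. Colors outside the sRGB gamut are simply clipped.

```python
import math

# Reference white D65 (an assumption of this sketch).
XN, YN, ZN = 0.95047, 1.0, 1.08883
_DENOM = XN + 15.0 * YN + 3.0 * ZN
UN, VN = 4.0 * XN / _DENOM, 9.0 * YN / _DENOM


def _gamma_encode(c):
    """Map a linear-light value to an sRGB-encoded value, clipped to [0, 1]."""
    c = min(max(c, 0.0), 1.0)
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def hcl_to_rgb(hue, chroma, luminance):
    """Convert polar CIELUV (hue in degrees) to an sRGB triple in [0, 1]."""
    if luminance <= 0.0:
        return (0.0, 0.0, 0.0)
    h = math.radians(hue)
    u = chroma * math.cos(h)
    v = chroma * math.sin(h)
    # Luminance L* to relative luminance Y.
    if luminance > 8.0:
        y = YN * ((luminance + 16.0) / 116.0) ** 3
    else:
        y = YN * luminance * (3.0 / 29.0) ** 3
    u_prime = u / (13.0 * luminance) + UN
    v_prime = v / (13.0 * luminance) + VN
    if v_prime <= 0.0:
        raise ValueError("chroma too large for this luminance and hue")
    x = y * 9.0 * u_prime / (4.0 * v_prime)
    z = y * (12.0 - 3.0 * u_prime - 20.0 * v_prime) / (4.0 * v_prime)
    # XYZ to linear sRGB, then gamma encoding.
    r = 3.2406 * x - 1.5372 * y - 0.4986 * z
    g = -0.9689 * x + 1.8758 * y + 0.0415 * z
    b = 0.0557 * x - 0.2040 * y + 1.0570 * z
    return tuple(_gamma_encode(c) for c in (r, g, b))


def qualitative_palette(n, chroma=50.0, luminance=70.0):
    """Return n colors with evenly spaced hues at fixed chroma and luminance."""
    return [hcl_to_rgb(360.0 * i / n, chroma, luminance) for i in range(n)]


def to_hex(rgb):
    """Format an sRGB triple in [0, 1] as a hex color string."""
    return "#" + "".join(f"{round(255 * c):02X}" for c in rgb)


palette = [to_hex(color) for color in qualitative_palette(4)]
```

Because only the hue varies in `qualitative_palette`, all colors share the same luminance, so no category appears brighter than the others. Palettes for ordered data would instead vary luminance monotonically.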

Full access
Jakob W. Messner, Georg J. Mayr, and Achim Zeileis

Abstract

Nonhomogeneous regression is often used to statistically postprocess ensemble forecasts. Usually only ensemble forecasts of the predictand variable are used as input, but other potentially useful information sources are ignored. Although it is straightforward to add further input variables, overfitting can easily deteriorate the forecast performance for increasing numbers of input variables. This paper proposes a boosting algorithm to estimate the regression coefficients, while automatically selecting the most relevant input variables by restricting the coefficients of less important variables to zero. A case study with ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) shows that this approach effectively selects important input variables to clearly improve minimum and maximum temperature predictions at five central European stations.
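
The selection mechanism can be sketched with componentwise gradient boosting for a plain squared-error loss on the mean. This is a simplification and not the paper's algorithm, which estimates the coefficients of a nonhomogeneous regression; the function name, step size, and toy data are assumptions. In each iteration only the predictor that best fits the current residuals is updated by a small step, so predictors that are never selected before stopping keep a coefficient of exactly zero.

```python
def boost_linear(columns, y, n_iter=50, step=0.1):
    """Componentwise gradient boosting for squared-error loss (sketch).

    columns: list of predictor columns, each a list of floats, assumed
             to be centered and scaled beforehand.
    y:       response values, assumed to be centered.
    Returns one coefficient per predictor; predictors that were never
    selected keep a coefficient of exactly 0.0.
    """
    coef = [0.0] * len(columns)
    residual = list(y)
    for _ in range(n_iter):
        best_j, best_slope, best_gain = None, 0.0, 0.0
        for j, col in enumerate(columns):
            sxx = sum(v * v for v in col)
            if sxx == 0.0:
                continue
            # Least-squares slope of the residuals on this predictor alone.
            slope = sum(v * r for v, r in zip(col, residual)) / sxx
            gain = slope * slope * sxx  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_slope, best_gain = j, slope, gain
        if best_j is None:
            break  # no predictor improves the fit any further
        # Update only the best predictor, shrunk by the step size.
        coef[best_j] += step * best_slope
        residual = [r - step * best_slope * v
                    for r, v in zip(residual, columns[best_j])]
    return coef


# Toy data: the response depends on the first predictor only.
x1 = [-1.0, -1.0, 1.0, 1.0]
x2 = [-1.0, 1.0, -1.0, 1.0]
y = [2.0 * v for v in x1]
coefficients = boost_linear([x1, x2], y)
```

The number of iterations acts as the tuning parameter: stopping early keeps more coefficients at zero, which is what guards against overfitting when many input variables are offered.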

Full access
Manuel Gebetsberger, Jakob W. Messner, Georg J. Mayr, and Achim Zeileis

Abstract

Raw ensemble forecasts of precipitation amounts and their forecast uncertainty have large errors, especially in mountainous regions where the modeled topography in the numerical weather prediction model and real topography differ most. Therefore, statistical postprocessing is typically applied to obtain automatically corrected weather forecasts. This study applies the nonhomogeneous regression framework as a state-of-the-art ensemble postprocessing technique to predict a full forecast distribution and improves its forecast performance with three statistical refinements. First, a novel split-type approach effectively accounts for unanimous zero precipitation predictions of the global ensemble model of the ECMWF. Additionally, the statistical model uses a censored logistic distribution to deal with the heavy tails of precipitation amounts. Finally, the most suitable link functions for the optimization of regression coefficients for the scale parameter are investigated. These three refinements are tested for 10 stations in a small area of the European Alps for lead times from +24 to +144 h and accumulation periods of 24 and 6 h. Together, they improve probabilistic forecasts for precipitation amounts as well as the probability of precipitation events over the default postprocessing method. The improvements are largest for the shorter accumulation periods and shorter lead times, where the information of unanimous ensemble predictions is more important.

Open access
Markus Dabernig, Georg J. Mayr, Jakob W. Messner, and Achim Zeileis

Abstract

Separate statistical models are typically fit for each forecasting lead time to postprocess numerical weather prediction (NWP) ensemble forecasts. Using standardized anomalies of both NWP values and observations eliminates most of the lead-time-specific characteristics so that several lead times can be forecast simultaneously. Standardized anomalies are formed by subtracting a climatological mean and dividing by the climatological standard deviation. Simultaneously postprocessing forecasts between +12 and +120 h increases forecast coherence between lead times, yields a temporal resolution as high as the observation interval (e.g., up to 10 min), and speeds up computation times while achieving a forecast skill comparable to the conventional method.
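
The transformation described above is simple enough to sketch directly. The function names and the example numbers are hypothetical; the climatological mean and standard deviation are assumed to be given for each value (for instance, per lead time or time of day).

```python
def standardized_anomalies(values, clim_mean, clim_sd):
    """Subtract the climatological mean and divide by the climatological
    standard deviation, element by element."""
    if not (len(values) == len(clim_mean) == len(clim_sd)):
        raise ValueError("all inputs must have the same length")
    return [(v - m) / s for v, m, s in zip(values, clim_mean, clim_sd)]


def restore(anomalies, clim_mean, clim_sd):
    """Invert the transformation to get values back on the original scale."""
    if not (len(anomalies) == len(clim_mean) == len(clim_sd)):
        raise ValueError("all inputs must have the same length")
    return [a * s + m for a, m, s in zip(anomalies, clim_mean, clim_sd)]


# Made-up temperatures (deg C) for two lead times with different climatologies.
forecast = [12.0, 5.0]
clim_mean = [10.0, 8.0]
clim_sd = [2.0, 3.0]
anomalies = standardized_anomalies(forecast, clim_mean, clim_sd)
```

Once forecasts and observations are on this common scale, one regression model can be fitted to all lead times together, and its predictions are mapped back with `restore`.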

Full access
Jakob W. Messner, Georg J. Mayr, Achim Zeileis, and Daniel S. Wilks

Abstract

To achieve well-calibrated probabilistic forecasts, ensemble forecasts are often statistically postprocessed. One recent ensemble-calibration method is extended logistic regression, which extends the popular logistic regression to yield full probability distribution forecasts. Although the purpose of this method is to postprocess ensemble forecasts, usually only the ensemble mean is used as the predictor variable, whereas the ensemble spread is neglected because it does not improve the forecasts. In this study it is shown that when simply used as an ordinary predictor variable in extended logistic regression, the ensemble spread affects the location but not the variance of the predictive distribution. Uncertainty information contained in the ensemble spread is therefore not utilized appropriately. To overcome this drawback, a new approach is proposed in which the ensemble spread is directly used to predict the dispersion of the predictive distribution. With wind speed data and ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) it is shown that by using this approach, the ensemble spread can be used effectively to improve forecasts from extended logistic regression.
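
One way to picture the proposed approach is a logistic predictive distribution whose location depends on the ensemble mean and whose scale depends on the ensemble spread. The sketch below is an assumption-laden illustration, not the paper's specification: the square-root transformation of the threshold, the log link for the scale, the coefficient values, and the function names are all hypothetical.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predictive_cdf(threshold, ens_mean, ens_spread,
                   location_coef, scale_coef, transform=math.sqrt):
    """P(observation <= threshold) under a logistic predictive distribution.

    location = a0 + a1 * ens_mean          (ensemble mean shifts the distribution)
    scale    = exp(b0 + b1 * ens_spread)   (ensemble spread widens the distribution)
    """
    location = location_coef[0] + location_coef[1] * ens_mean
    scale = math.exp(scale_coef[0] + scale_coef[1] * ens_spread)
    return logistic_cdf((transform(threshold) - location) / scale)


# Hypothetical coefficients; same ensemble mean, two different spreads.
location_coef = (0.0, 1.0)
scale_coef = (0.0, 0.5)
p_confident = predictive_cdf(9.0, 2.0, 0.5, location_coef, scale_coef)
p_uncertain = predictive_cdf(9.0, 2.0, 2.0, location_coef, scale_coef)
```

With a larger spread the distribution is wider, so the probability of staying below a threshold above the median is lower (`p_uncertain < p_confident`). If the spread entered only the location term, it would shift the distribution without changing its width.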

Full access
Reto Stauffer, Nikolaus Umlauf, Jakob W. Messner, Georg J. Mayr, and Achim Zeileis

Abstract

Probabilistic forecasts provided by numerical ensemble prediction systems have systematic errors and are typically underdispersive. This is especially true over complex topography with extensive terrain-induced small-scale effects, which cannot be resolved by the ensemble system. To alleviate these errors, statistical postprocessing methods are often applied to calibrate the forecasts. This article presents a new full-distributional spatial postprocessing method for daily precipitation sums based on the standardized anomaly model output statistics (SAMOS) approach. Observations and forecasts are transformed into standardized anomalies by subtracting the long-term climatological mean and dividing by the climatological standard deviation. This removes all site-specific characteristics from the data and makes it possible to fit one single regression model for all stations at once. As the model does not depend on the station locations, it directly allows the creation of probabilistic forecasts for any arbitrary location. SAMOS uses a left-censored power-transformed logistic response distribution to account for the large fraction of zero observations (dry days), the limitation to nonnegative values, and the positive skewness of the data. ECMWF reforecasts are used for model training and to correct the ECMWF ensemble forecasts with the big advantage that SAMOS does not require an extensive archive of past ensemble forecasts as only the most recent four reforecasts are needed, and it automatically adapts to changes in the ECMWF ensemble model. The application of the new method to the central Alps shows that the new method is able to depict the small-scale properties and returns accurate fully probabilistic spatial forecasts.

Full access
Manuel Gebetsberger, Jakob W. Messner, Georg J. Mayr, and Achim Zeileis

Abstract

Nonhomogeneous regression models are widely used to statistically postprocess numerical ensemble weather prediction models. Such regression models are capable of forecasting full probability distributions and correcting for ensemble errors in the mean and variance. To estimate the corresponding regression coefficients, minimization of the continuous ranked probability score (CRPS) has widely been used in meteorological postprocessing studies and has often been found to yield better calibrated forecasts than maximum likelihood estimation. From a theoretical perspective, both estimators are consistent and should lead to similar results, provided that the distributional assumption about the empirical data is correct. Differences between the estimated values indicate a wrong specification of the regression model. This study compares the two estimators for probabilistic temperature forecasting with nonhomogeneous regression, where results show discrepancies for the classical Gaussian assumption. The heavy-tailed logistic and Student’s t distributions can improve forecast performance in terms of sharpness and calibration, and lead to only minor differences between the estimators employed. Finally, a simulation study confirms the importance of appropriate distribution assumptions and shows that for a correctly specified model the maximum likelihood estimator is slightly more efficient than the CRPS estimator.
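
The two objective functions being compared can be written down for the Gaussian case, where the CRPS has a well-known closed form. The sketch below only evaluates the two scores for a single forecast; the function names are illustrative, and the optimization over regression coefficients is not shown.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0)
                    + 2.0 * norm_pdf(z)
                    - 1.0 / math.sqrt(math.pi))


def neg_log_lik_gaussian(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


def mean_score(score, observations, mus, sigmas):
    """Average a score over a sample; minimizing it over the model
    coefficients gives the corresponding estimator."""
    scores = [score(y, m, s) for y, m, s in zip(observations, mus, sigmas)]
    return sum(scores) / len(scores)
```

Both scores are smaller for better forecasts, but the negative log-likelihood grows quadratically with the standardized error while the CRPS grows roughly linearly. The two estimators therefore respond differently to outlying observations, which matters when the assumed distribution does not fit the data.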

Open access
Jakob W. Messner, Georg J. Mayr, Daniel S. Wilks, and Achim Zeileis

Abstract

Extended logistic regression is a recent ensemble calibration method that extends logistic regression to provide full continuous probability distribution forecasts. It assumes conditional logistic distributions for the (transformed) predictand and fits these using selected predictand category probabilities. In this study extended logistic regression is compared to the closely related ordered and censored logistic regression models. Ordered logistic regression avoids the logistic distribution assumption but does not yield full probability distribution forecasts, whereas censored regression directly fits the full conditional predictive distributions. The performance of these and other ensemble postprocessing methods is tested on wind speed and precipitation data from several European locations and ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). Ordered logistic regression performed similarly to extended logistic regression for probability forecasts of discrete categories whereas full predictive distributions were better predicted by censored regression.
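
To make "directly fits the full conditional predictive distributions" concrete, the sketch below gives the log-likelihood contribution of one observation under a left-censored logistic distribution. Censoring at zero, as for precipitation or wind speed, and the function names are assumptions; the abstract gives no formulas.

```python
import math


def log_logistic_cdf(z):
    """Log of the standard logistic CDF, computed in a numerically stable way."""
    if z >= 0.0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_logistic_pdf(z):
    """Log of the standard logistic density exp(-z) / (1 + exp(-z))^2."""
    a = abs(z)  # the density is symmetric around zero
    return -a - 2.0 * math.log1p(math.exp(-a))


def censored_logistic_loglik(y, location, scale, left=0.0):
    """Log-likelihood of one observation under a left-censored logistic model.

    Observations at or below the censoring point contribute the probability
    mass below that point; all others contribute the density at y.
    """
    if y <= left:
        return log_logistic_cdf((left - location) / scale)
    return log_logistic_pdf((y - location) / scale) - math.log(scale)


def total_loglik(observations, locations, scales, left=0.0):
    """Sum of the contributions; maximizing this fits the model."""
    return sum(censored_logistic_loglik(y, m, s, left)
               for y, m, s in zip(observations, locations, scales))
```

In a regression setting, `location` and `scale` would be functions of ensemble predictors, and the coefficients would be chosen to maximize `total_loglik`.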

Full access
Thorsten Simon, Peter Fabsic, Georg J. Mayr, Nikolaus Umlauf, and Achim Zeileis

Abstract

A probabilistic forecasting method to predict thunderstorms in the European eastern Alps is developed. A statistical model links lightning occurrence from the ground-based Austrian Lightning Detection and Information System (ALDIS) detection network to a large set of direct and derived variables from a numerical weather prediction (NWP) system. The NWP system is the high-resolution run (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF) with a grid spacing of 16 km. The statistical model is a generalized additive model (GAM), which is estimated by Markov chain Monte Carlo (MCMC) simulation. Gradient boosting with stability selection serves as a tool for selecting a stable set of potentially nonlinear terms. Three grids from 64 × 64 to 16 × 16 km² and five forecast horizons from 5 days to 1 day ahead are investigated to predict thunderstorms during afternoons (1200–1800 UTC). Frequently selected covariates for the nonlinear terms are variants of convective precipitation, convective available potential energy, relative humidity, and temperature in the midlayers of the troposphere, among others. All models, even for a lead time of 5 days, outperform a forecast based on climatology in an out-of-sample comparison. An example case illustrates that coarse spatial patterns are already successfully forecast 5 days ahead.

Open access