Search Results

You are looking at 1–10 of 11 items for

  • Author or Editor: Andreas P. Weigel
  • All content
Andreas P. Weigel and Simon J. Mason

Abstract

This article builds on the study of Mason and Weigel, in which the generalized discrimination score D was introduced. This score quantifies whether a set of observed outcomes can be correctly discriminated by the corresponding forecasts (i.e., it is a measure of the skill attribute of discrimination). Because of its generic definition, D can be adapted to essentially all relevant verification contexts, ranging from simple yes–no forecasts of binary outcomes to probabilistic forecasts of continuous variables. For most of these cases, Mason and Weigel derived expressions for D, many of which turned out to be equivalent to scores already known under different names. However, no guidance was provided on how to calculate D for ensemble forecasts. This gap is aggravated by the fact that there are currently very few measures of forecast quality that can be applied directly to ensemble forecasts without first deriving probabilities from the ensemble members. This study seeks to close this gap. A definition is proposed of how ensemble forecasts can be ranked; these ranks can then be used as a basis for discriminating between the corresponding observations. Given this definition, formulations of D are derived that are directly applicable to ensemble forecasts.
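
As a rough illustration of the idea (a minimal sketch, not the authors' exact formulation; the pairwise ranking rule and the toy data below are assumptions), two ensembles can be ranked by comparing all pairs of their members, and D can then be estimated as the fraction of observation pairs whose ordering the forecasts reproduce:

```python
import numpy as np

def pair_prob(x, y):
    """P(member of ensemble x < member of ensemble y); ties count half."""
    diff = y[None, :] - x[:, None]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def generalized_discrimination(ens, obs):
    """Fraction of observation pairs whose ordering is matched by the
    pairwise ranking of the corresponding ensembles (ties count half)."""
    hits, pairs = 0.0, 0
    n = len(obs)
    for i in range(n):
        for j in range(i + 1, n):
            if obs[i] == obs[j]:
                continue                      # only distinct outcomes are informative
            p = pair_prob(ens[i], ens[j])
            if p == 0.5:
                hits += 0.5                   # ensembles cannot be ranked: count half
            elif (p > 0.5) == (obs[j] > obs[i]):
                hits += 1.0                   # forecast ranking agrees with observations
            pairs += 1
    return hits / pairs

rng = np.random.default_rng(0)
signal = rng.normal(size=40)
obs = signal + rng.normal(scale=0.5, size=40)
ens = signal[:, None] + rng.normal(scale=0.5, size=(40, 9))   # 9-member ensembles
print(generalized_discrimination(ens, obs))
```

For unskilled forecasts the score converges to 0.5; perfect discrimination yields 1.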

Simon J. Mason and Andreas P. Weigel

Abstract

There are numerous reasons for calculating forecast verification scores, and considerable attention has been given to designing and analyzing the properties of scores that can be used for scientific purposes. Much less attention has been given to scores that may be useful for administrative reasons, such as communicating changes in forecast quality to bureaucrats and providing indications of forecast quality to the general public. The two-alternative forced choice (2AFC) test is proposed as a scoring procedure that is sufficiently generic to be usable on forecasts ranging from simple yes–no forecasts of dichotomous outcomes to forecasts of continuous variables, and that can be used with deterministic or probabilistic forecasts without seriously degrading the more complex information when it is available. Although, as with any single verification score, the proposed test has limitations, it has broad intuitive appeal: the expected score of an unskilled set of forecasts (random guessing or perpetually identical forecasts) is 50%, and the score is interpretable as an indication of how often the forecasts are correct, even when the forecasts are expressed probabilistically and/or the observations are not discrete.
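
In the simplest probabilistic case, a binary outcome, the 2AFC score is the chance that a randomly drawn event/non-event pair is ordered correctly by the forecast probabilities; it then coincides with the (trapezoidal) area under the ROC curve. A minimal sketch with made-up toy data:

```python
import numpy as np

def two_afc_binary(probs, outcomes):
    """2AFC for probabilistic forecasts of a binary event: the chance that,
    given one event and one non-event case, the case with the higher
    forecast probability is the event (ties count half)."""
    p1 = probs[outcomes == 1]                 # forecasts issued before events
    p0 = probs[outcomes == 0]                 # forecasts issued before non-events
    diff = p1[:, None] - p0[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

rng = np.random.default_rng(1)
outcomes = rng.integers(0, 2, size=200)
probs = np.clip(0.5 + 0.3 * (outcomes - 0.5) + rng.normal(0, 0.2, 200), 0, 1)
print(two_afc_binary(probs, outcomes))        # ~0.5 for unskilled forecasts
```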

Andreas P. Weigel, Mark A. Liniger, and Christof Appenzeller

Abstract

Multimodel ensemble combination (MMEC) has become an accepted technique to improve probabilistic forecasts from short- to long-range time scales. MMEC techniques typically widen ensemble spread, thus improving the dispersion characteristics and the reliability of the forecasts. This raises the question of whether the same effect could be achieved more cheaply by rescaling single-model ensemble forecasts a posteriori such that they become reliable. In this study a climate conserving recalibration (CCR) technique is derived and compared with MMEC. With a simple stochastic toy model it is shown that both CCR and MMEC successfully improve forecast reliability. The difference between the two methods is that CCR conserves resolution but inevitably dilutes the potentially predictable signal, while MMEC is in the ideal case able to fully retain the predictable signal and to improve resolution. MMEC is therefore conceptually preferable, particularly since the effect of CCR depends on the length of the data record and on distributional assumptions. In reality, however, multimodels consist of only a finite number of participating single models, and the model errors are often correlated. Under such conditions, and depending on the skill metric applied, CCR-corrected single models can on average have skill comparable to that of multimodel ensembles, particularly when the potential model predictability is low. Using seasonal near-surface temperature and precipitation forecasts of three models of the Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER) dataset, it is shown that the conclusions drawn from the toy-model experiments hold equally in a real multimodel ensemble prediction system. All in all, it is not possible to make a general statement on whether CCR or MMEC is the better method. Rather, optimum forecasts seem to be obtained by a combination of both methods, but only if MMEC is applied first and CCR second. The opposite order (first CCR, then MMEC) is shown to have little effect, at least in the context of seasonal forecasts.
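
The following sketch illustrates the CCR idea under Gaussian assumptions (the estimators and toy data are our assumptions, not necessarily the authors' exact formulation): the ensemble-mean signal is damped according to its correlation with the observations, and the spread is rescaled so that the total variance matches the observed climatology:

```python
import numpy as np

def ccr(fcst, obs):
    """Climate conserving recalibration, minimal Gaussian sketch.
    fcst: (n_years, n_members); obs: (n_years,)."""
    mu = fcst.mean(axis=1)                      # ensemble means
    rho = np.corrcoef(mu, obs)[0, 1]            # ensemble-mean skill
    eps = fcst - mu[:, None]                    # within-ensemble anomalies
    signal = obs.mean() + rho * obs.std() / mu.std() * (mu - mu.mean())
    noise = np.sqrt(1.0 - rho**2) * obs.std() / eps.std() * eps
    return signal[:, None] + noise

rng = np.random.default_rng(2)
truth = rng.normal(size=200)
obs = truth + rng.normal(scale=1.0, size=200)
fcst = truth[:, None] + rng.normal(scale=0.3, size=(200, 9))  # overconfident ensemble
cal = ccr(fcst, obs)
print(obs.std(), cal.std())   # climatological variance approximately conserved
```

By construction the recalibrated forecasts conserve the observed climatological mean and variance, which is the "climate conserving" property.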

Andreas P. Weigel, Reto Knutti, Mark A. Liniger, and Christof Appenzeller

Abstract

Multimodel combination is a pragmatic approach to estimating model uncertainties and to making climate projections more reliable. The simplest way of constructing a multimodel is to give one vote to each model ("equal weighting"), while more sophisticated approaches suggest applying model weights according to some measure of performance ("optimum weighting"). In this study, a simple conceptual model of climate change projections is introduced and applied to discuss the effects of model weighting in more generic terms. The results confirm that equally weighted multimodels on average outperform the single models, and that projection errors can in principle be further reduced by optimum weighting. However, this not only requires accurate knowledge of the single-model skill; the relative contributions of the joint model error and unpredictable noise also need to be known to avoid biased weights. If the applied weights do not appropriately represent the true underlying uncertainties, weighted multimodels perform on average worse than equally weighted ones, a scenario that is not unlikely given that at present there is no consensus on how skill-based weights can be obtained. Particularly when internal variability is large, more information may be lost by inappropriate weighting than could potentially be gained by optimum weighting. These results indicate that for many applications equal weighting may be the safer and more transparent way to combine models. However, even within the presented framework, eliminating models from an ensemble can be justified if they are known to lack key mechanisms that are indispensable for meaningful climate projections.
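
A toy Monte Carlo in the spirit of such a conceptual model (the error scales and the misspecified weights below are assumptions) shows the key effect: correct skill-based weights beat equal weighting, but weights based on wrong skill estimates can be considerably worse than equal weighting:

```python
import numpy as np

# M models = truth + independent errors of different size; compare the mean
# squared error of the multimodel mean under different weighting strategies.
rng = np.random.default_rng(3)
M, n = 5, 50_000
sigma = np.array([1.0, 1.2, 1.5, 2.0, 3.0])            # hypothetical model error scales
truth = rng.normal(size=n)
models = truth[:, None] + sigma * rng.normal(size=(n, M))

def mse(w):
    w = w / w.sum()                                     # normalize the weights
    return np.mean((models @ w - truth) ** 2)

w_opt = 1.0 / sigma**2                                  # ideal skill-based weights
w_bad = 1.0 / np.roll(sigma, 2)**2                      # weights from wrong skill estimates
print("equal  ", round(mse(np.ones(M)), 3))
print("optimum", round(mse(w_opt), 3))                  # best achievable here
print("biased ", round(mse(w_bad), 3))                  # worse than equal weighting
```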

Andreas P. Weigel, Mark A. Liniger, and Christof Appenzeller

Abstract

This note describes how the widely used Brier and ranked probability skill scores (BSS and RPSS, respectively) can be correctly applied to quantify the potential skill of probabilistic multimodel ensemble forecasts. It builds upon the study of Weigel et al., in which a revised RPSS, the so-called discrete ranked probability skill score (RPSSD), was derived, circumventing the known negative bias of the RPSS for small ensemble sizes. Since the BSS is a special case of the RPSS, a debiased discrete Brier skill score (BSSD) can be formulated in the same way. Here, the approach of Weigel et al., which so far was applicable only to single-model ensembles, is generalized to weighted multimodel ensemble forecasts. By introducing an "effective ensemble size" characterizing the multimodel, the new generalized RPSSD can be expressed such that its structure becomes equivalent to the single-model case. This is of practical importance for multimodel assessment studies, where the consequences of varying effective ensemble size need to be clearly distinguished from the true benefits of multimodel combination. The performance of the new generalized RPSSD formulation is illustrated with examples of weighted multimodel ensemble forecasts, both in a synthetic random forecasting context and with real seasonal forecasts of operational models. A central conclusion of this study is that, for small ensemble sizes, multimodel assessment studies should not be carried out on the basis of the classical RPSS alone, since true changes in predictability may be hidden by bias effects, a deficiency that can be overcome with the new generalized RPSSD.
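
A sketch of the bookkeeping this implies (the formulas are reconstructions assumed from the description above, not copied from the paper): a weighted multimodel with member counts M_i and weights w_i behaves like a single model with an effective ensemble size, which can then be plugged into the single-model debiasing:

```python
import numpy as np

def effective_ensemble_size(weights, sizes):
    """Assumed form: the standard effective sample size of weighted members,
    (sum w_i M_i)^2 / (sum w_i^2 M_i)."""
    w, m = np.asarray(weights, float), np.asarray(sizes, float)
    return (w * m).sum() ** 2 / ((w**2) * m).sum()

def debiased_rpss(rps_fc, rps_cl, m_eff, p_clim):
    """Debiased RPSS: the correction D0 is added to the reference score;
    p_clim are climatological category probabilities (assumed form)."""
    P = np.cumsum(p_clim)[:-1]                  # cumulative category probabilities
    d0 = np.sum(P * (1.0 - P)) / m_eff
    return 1.0 - rps_fc / (rps_cl + d0)

w = [0.5, 0.3, 0.2]          # hypothetical model weights
m = [9, 9, 40]               # hypothetical member counts
m_eff = effective_ensemble_size(w, m)
print(round(m_eff, 1))       # no larger than the total member count
print(debiased_rpss(0.21, 0.22, m_eff, [1/3, 1/3, 1/3]))  # toy score values
```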

Andreas P. Weigel, Mark A. Liniger, and Christof Appenzeller

Abstract

The Brier skill score (BSS) and the ranked probability skill score (RPSS) are widely used measures of the quality of categorical probabilistic forecasts. They quantify the extent to which a forecast strategy improves predictions with respect to a (usually climatological) reference forecast. The BSS can be regarded as the special case of the RPSS with two forecast categories. From the work of Müller et al., it is known that the RPSS is negatively biased for ensemble prediction systems with small ensemble sizes, and that a debiased version, the RPSSD, can be obtained quasi empirically by random resampling from the reference forecast. In this paper, an analytical formula is derived to calculate the RPSS bias correction directly for any ensemble size and combination of probability categories, thus allowing an easy implementation of the RPSSD. The correction term itself is identified as the "intrinsic unreliability" of the ensemble prediction system. The performance of this new formulation of the RPSSD is illustrated in two examples. First, it is applied to a synthetic random white noise climate, and then, using the ECMWF Seasonal Forecast System 2, to seasonal predictions of near-surface temperature in several regions of different predictability. In both examples, the skill score is independent of ensemble size, while the associated confidence thresholds decrease as the number of ensemble members and forecast/observation pairs increases.
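
Consistent with this description, the debiased score can be written in the following form (a reconstruction from the abstract, with the correction obtained from the binomial sampling variance of the cumulative ensemble frequencies; the paper gives the exact derivation). For K categories with cumulative climatological probabilities P_k and ensemble size M,

$$\mathrm{RPSS}_D = 1 - \frac{\langle \mathrm{RPS} \rangle}{\langle \mathrm{RPS}_{\mathrm{Cl}} \rangle + D_0}, \qquad D_0 = \frac{1}{M}\sum_{k=1}^{K-1} P_k\,(1 - P_k),$$

so that for K equiprobable categories $D_0 = (K^2-1)/(6KM)$, and for the two-category (Brier) case with event probability $p$, $D_0 = p(1-p)/M$. The correction vanishes as $M \to \infty$, recovering the classical RPSS.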

Simon J. Mason, Michael K. Tippett, Andreas P. Weigel, Lisa Goddard, and Balakanapathy Rajaratnam
Martin Hirschi, Christoph Spirig, Andreas P. Weigel, Pierluigi Calanca, Jörg Samietz, and Mathias W. Rotach

Abstract

Monthly weather forecasts (MOFCs) have been shown to have skill in extratropical continental regions at lead times of up to 3 weeks, in particular for temperature and when averaged over a week. This skill could be exploited in practical applications that exhibit some degree of memory or inertia with respect to their meteorological drivers, potentially even at longer lead times. Many agricultural applications fall into this category because of the temperature-dependent development of biological organisms, which allows simulations based on temperature sums. Most such agricultural models require local weather information at daily or even hourly resolution, however, preventing direct use of the spatially and temporally aggregated information of MOFCs, which may furthermore be subject to significant biases. Using the example of forecasting the timing of life-phase occurrences of the codling moth (Cydia pomonella), a major insect pest in apple orchards worldwide, the authors investigate the application of downscaled weekly temperature anomalies of MOFCs in an impact model requiring hourly input. The downscaling and postprocessing include a daily weather generator and a resampling procedure for creating hourly weather series, as well as a recalibration technique to correct for the original underconfidence of the forecast occurrences of codling moth life phases. Results show a clear skill improvement of up to 3 days in root-mean-square error over the full forecast range when MOFCs are incorporated, as compared with deterministic benchmark forecasts that use climatological information to predict the timing of codling moth life phases.
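
The temperature-sum logic at the core of such phenological models is simple to sketch (the base temperature, threshold, and synthetic weather below are made up for illustration; the codling moth model uses calibrated values): development is the accumulated exceedance of hourly temperature over a base value, and a life phase is predicted once a threshold sum is reached:

```python
import numpy as np

def phase_onset(hourly_temp, base=10.0, threshold=2500.0):
    """Return the index of the first hour at which the degree-hour sum
    (sum of max(T - base, 0)) exceeds the threshold, or None."""
    dd = np.cumsum(np.maximum(hourly_temp - base, 0.0))
    hits = np.nonzero(dd >= threshold)[0]
    return int(hits[0]) if hits.size else None

hours = np.arange(24 * 120)                       # 120 days of hourly time steps
season = 12 + 8 * np.sin(2 * np.pi * hours / (24 * 365) - 0.5)   # slow warming trend
diurnal = 4 * np.sin(2 * np.pi * hours / 24)                     # daily temperature cycle
print(phase_onset(season + diurnal))              # hour index of predicted onset
```

Because the accumulation is driven entirely by temperature, biased or coarse MOFC input shifts the predicted onset, which is why the downscaling and recalibration steps matter.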

Fotini Katopodes Chow, Andreas P. Weigel, Robert L. Street, Mathias W. Rotach, and Ming Xue

Abstract

This paper investigates the steps necessary to achieve accurate simulations of flow over steep, mountainous terrain. Large-eddy simulations of flow in the Riviera Valley in the southern Swiss Alps are performed at horizontal resolutions as fine as 150 m using the Advanced Regional Prediction System. Comparisons are made with surface station and radiosonde measurements from the Mesoscale Alpine Programme (MAP)-Riviera project field campaign of 1999. Excellent agreement between simulations and observations is obtained, but only when high-resolution surface datasets are used and the nested grid configurations are carefully chosen. Simply increasing spatial resolution without incorporating improved surface data gives unsatisfactory results. The sensitivity of the results to initial soil moisture, land use data, grid resolution, topographic shading, and turbulence models is explored. Even with strong thermal forcing, the onset and magnitude of the upvalley winds are highly sensitive to surface processes in areas that are well outside the high-resolution domain. In particular, the soil moisture initialization on the 1-km grid is found to be crucial to the success of the finer-resolution predictions. High-resolution soil moisture and land use data on the 350-m-resolution grid also improve results. The use of topographic shading improves radiation curves during sunrise and sunset, but the effects on the overall flow are limited because of the strong lateral boundary forcing from the 1-km grid where terrain slopes are not well resolved. The influence of the turbulence closure is also limited because of strong lateral forcing and hence limited residence time of air inside the valley and because of the stable stratification, which limits turbulent stress to the lowest few hundred meters near the surface.

Andreas P. Weigel, Fotini K. Chow, Mathias W. Rotach, Robert L. Street, and Ming Xue

Abstract

This paper analyzes the three-dimensional flow structure and the heat budget in a typical medium-sized and steep Alpine valley—the Riviera Valley in southern Switzerland. Aircraft measurements from the Mesoscale Alpine Programme (MAP)-Riviera field campaign reveal a very pronounced valley-wind system, including a strong curvature-induced secondary circulation in the southern valley entrance region. Accompanying radio soundings show that the growth of a well-mixed layer is suppressed, even under convective conditions. Our analyses are based on the MAP-Riviera measurement data and the output of high-resolution large-eddy simulations using the Advanced Regional Prediction System (ARPS). Three sunny days of the measurement campaign are simulated. Using horizontal grid spacings of 350 and 150 m (with a vertical spacing as fine as 20 m), the model reproduces the observed flow features very well. The ARPS output data are then used to calculate the components of the heat budget of the valley atmosphere, first in profiles over the valley base and then as averages over almost the entire valley volume. The analysis shows that the suppressed growth of the well-mixed layer is due to the combined effect of cold-air advection in the along-valley direction and subsidence of warm air from the free atmosphere aloft. It is further influenced by the local cross-valley circulation. This had already been hypothesized on the basis of measurement data and is now confirmed through a numerical model. Averaged over the entire valley, subsidence turns out to be one of the main heating sources of the valley atmosphere and is of comparable magnitude to turbulent heat flux divergence. On the mornings of two out of the three simulation days, this subsidence is even identified as the only major heating source and thus appears to be an important driving mechanism for the onset of thermally driven upvalley winds.
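
Schematically, the budget analyzed here decomposes the heating of the valley atmosphere into advective, subsidence, and turbulent contributions (a standard form of the potential temperature budget; the paper's exact volume-averaged formulation may differ):

$$\frac{\partial \overline{\theta}}{\partial t} \;=\; \underbrace{-\,\overline{u}\,\frac{\partial \overline{\theta}}{\partial x} \;-\; \overline{v}\,\frac{\partial \overline{\theta}}{\partial y}}_{\text{along- and cross-valley advection}} \;\underbrace{-\,\overline{w}\,\frac{\partial \overline{\theta}}{\partial z}}_{\text{subsidence}} \;-\; \frac{\partial \overline{w'\theta'}}{\partial z} \;+\; S_\theta,$$

where the fourth term is the turbulent heat flux divergence and $S_\theta$ collects the remaining sources (e.g., radiative flux divergence). In this notation, the suppressed mixed-layer growth corresponds to the along-valley advection term cooling the valley while the subsidence term warms it from above.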
