Search Results

Showing 1–10 of 12 items for Author or Editor: Christof Appenzeller
Volkmar Wirth, Christof Appenzeller, and Martin Juckes

Abstract

The quasi-horizontal roll-up of unstable stratospheric intrusions into isolated vortices is known to result in specific structures on satellite water vapor images that are characterized by intermingling dark and light filaments. The current paper investigates how these features are generated and how they relate to partly similar features found on concurrent maps of the tropopause height or potential vorticity (PV). The roll-up of a stratospheric intrusion is simulated numerically with an idealized quasigeostrophic model, which focuses on the dynamics induced by anomalies in the height of the tropopause. The upper-tropospheric adiabatic vertical wind is calculated explicitly and is used to simulate water vapor images in the model. These images show qualitatively the same characteristic features as observed. They are generated through a combination of horizontal advection of initial moisture anomalies and the creation of additional moisture anomalies resulting from the upper-tropospheric vertical air motion. The latter is, in turn, induced by the quasi-horizontal motion of the tropopause anomaly. It is suggested that a substantial portion of the spiral-like structures on the water vapor images is likely to reflect the vertical wind induced by the evolution of the intrusion itself. When the tropopause is defined through a fairly low value of PV, it may acquire similar spiraling structures, as it is being advected almost like a passive tracer. On the other hand, for the dynamically active core part of the intrusion, which is located at higher values of PV, one may expect an evolution leading to more compact vortex cores and less structure overall.

Full access
Andreas P. Weigel, Reto Knutti, Mark A. Liniger, and Christof Appenzeller

Abstract

Multimodel combination is a pragmatic approach to estimating model uncertainties and to making climate projections more reliable. The simplest way of constructing a multimodel is to give one vote to each model (“equal weighting”), while more sophisticated approaches suggest applying model weights according to some measure of performance (“optimum weighting”). In this study, a simple conceptual model of climate change projections is introduced and applied to discuss the effects of model weighting in more generic terms. The results confirm that equally weighted multimodels on average outperform the single models, and that projection errors can in principle be further reduced by optimum weighting. However, this not only requires accurate knowledge of the single model skill, but the relative contributions of the joint model error and unpredictable noise also need to be known to avoid biased weights. If weights are applied that do not appropriately represent the true underlying uncertainties, weighted multimodels perform on average worse than equally weighted ones, a scenario that is not unlikely given that at present there is no consensus on how skill-based weights can be obtained. Particularly when internal variability is large, more information may be lost by inappropriate weighting than could potentially be gained by optimum weighting. These results indicate that for many applications equal weighting may be the safer and more transparent way to combine models. However, even within the presented framework, eliminating models from an ensemble can be justified if they are known to lack key mechanisms that are indispensable for meaningful climate projections.
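The weighting trade-off described in this abstract can be reproduced in a few lines. The sketch below is a minimal toy setup in the spirit of, but not identical to, the paper's conceptual model; all distributions, weights, and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_trials = 5, 10000

# Invented toy setup: each model's projection = truth + a model-specific
# systematic bias + unpredictable noise.
truth = rng.normal(0.0, 1.0, n_trials)
bias = rng.normal(0.0, 0.3, (n_models, 1))          # systematic model errors
noise = rng.normal(0.0, 1.0, (n_models, n_trials))  # unpredictable noise
forecasts = truth + bias + noise

equal_w = np.full(n_models, 1.0 / n_models)
# "Biased" weights that do not reflect the true model uncertainties:
wrong_w = np.array([0.7, 0.1, 0.1, 0.05, 0.05])

mse = lambda w: np.mean((w @ forecasts - truth) ** 2)
single_mse = np.mean((forecasts - truth) ** 2, axis=1)

print(f"mean single-model MSE : {single_mse.mean():.3f}")
print(f"equal-weight MSE      : {mse(equal_w):.3f}")
print(f"misweighted MSE       : {mse(wrong_w):.3f}")
```

In this toy world the equally weighted multimodel beats both the average single model and the misweighted combination, because inappropriate weights reduce the noise-averaging benefit of the ensemble.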

Full access
Andreas P. Weigel, Mark A. Liniger, and Christof Appenzeller

Abstract

This note describes how the widely used Brier and ranked probability skill scores (BSS and RPSS, respectively) can be correctly applied to quantify the potential skill of probabilistic multimodel ensemble forecasts. It builds upon the study of Weigel et al., in which a revised RPSS, the so-called discrete ranked probability skill score (RPSSD), was derived, circumventing the known negative bias of the RPSS for small ensemble sizes. Since the BSS is a special case of the RPSS, a debiased discrete Brier skill score (BSSD) could be formulated in the same way. Here, the approach of Weigel et al., which was so far applicable only to single model ensembles, is generalized to weighted multimodel ensemble forecasts. By introducing an “effective ensemble size” characterizing the multimodel, the new generalized RPSSD can be expressed such that its structure becomes equivalent to the single model case. This is of practical importance for multimodel assessment studies, where the consequences of varying effective ensemble size need to be clearly distinguished from the true benefits of multimodel combination. The performance of the new generalized RPSSD formulation is illustrated in examples of weighted multimodel ensemble forecasts, both in a synthetic random forecasting context and with real seasonal forecasts of operational models. A central conclusion of this study is that, for small ensemble sizes, multimodel assessment studies should not be carried out solely on the basis of the classical RPSS, since true changes in predictability may be hidden by bias effects—a deficiency that can be overcome with the new generalized RPSSD.

Full access
André Walser, Marco Arpagaus, Christof Appenzeller, and Martin Leutbecher

Abstract

This paper studies the impact of different initial condition perturbation methods and horizontal resolutions on short-range limited-area ensemble predictions for two severe winter storms. The methodology consists of 51-member ensembles generated with the global ensemble prediction system (EPS) of the European Centre for Medium-Range Weather Forecasts, which are downscaled with the nonhydrostatic limited-area model Lokal Modell. The resolution dependency is studied by comparing three different limited-area ensembles: (a) 80-km grid spacing, (b) 10-km grid spacing, and (c) 10-km grid spacing with a topography coarse grained to 80-km resolution. The initial condition perturbations of the global ensembles are based on singular vectors (SVs), and the tendencies are not perturbed (i.e., no stochastic physics). Two configurations are considered for the initial condition perturbations: (i) the operational SV configuration: T42 truncation, 48-h optimization time, and dry tangent-linear model, and (ii) the “moist SV” configuration: TL95 truncation, 24-h optimization time, and moist tangent-linear model.

Lokal Modell ensembles are analyzed for the European winter storms Lothar and Martin, both occurring in December 1999, with particular attention paid to near-surface wind gusts. It is shown that forecasts using the moist SV configuration predict higher probabilities for strong wind gusts during the storm period compared to forecasts with the operational SV configuration. Similarly, the forecasts with increased horizontal resolution—even with coarse topography—lead to higher probabilities compared with the low-resolution forecasts. Overall, the two case studies suggest that currently developed operational high-resolution limited-area EPSs have great potential to improve early warnings for severe winter storms, particularly when the driving global EPS employs moist SVs.

Full access
Simon C. Scherrer, Christof Appenzeller, Pierre Eckert, and Daniel Cattani

Abstract

The Ensemble Prediction System (EPS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) was used to analyze various aspects of the relation between ensemble spread and forecast skill. It was shown that synoptic-scale upper-air spread measures can be used as first estimators of local forecast skill, although the relation was weaker than expected. The synoptic-scale spread measures were calculated based on upper-air fields (Z500 and T850) over western Europe for the period June 1997 to December 2000. The spread–skill relations for the operational ECMWF EPS were tested using several different spread definitions including a neural network-based measure. It was shown that spreads based on upper-air root-mean-square (rms) measures showed a strong seasonal cycle, unlike anomaly correlation (AC)-based measures. The deseasonalized spread–skill correlations for the upper-air fields were found to be useful even for longer lead times (168–240 h). Roughly 68%–83% of cases with small (large) spread were linked to correspondingly high (low) skill. A comparison with a perfect model approach showed the potential for improving the ECMWF EPS spread–skill relations by up to 25–30 correlation percentage points for long lead times.

Local forecasts issued by operational forecasters for the Swiss Alpine region, as well as station precipitation forecasts for Geneva, were used to test the limits of the synoptic-scale upper-air spread as an estimator of local surface skill. A weak relation was found for all upper-air spread measures used. Although the probabilistic EPS direct model precipitation forecast for Geneva exhibited a considerable bias, the spread–skill relation was recovered at least up to 144 h. A neural network downscaling technique was able to correct the precipitation forecast bias, but did not increase the relation between synoptic-scale spread and surface skill.
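A toy calculation helps explain why even a well-constructed ensemble yields only a moderate spread–skill correlation: the verifying error for a single case is one random draw from the forecast distribution, so it correlates only loosely with the spread. The sketch below assumes an idealized, perfectly dispersed ensemble; all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 2000, 50                      # forecast cases, ensemble members

# Idealized setup: each case has its own true forecast uncertainty sigma;
# the ensemble samples it, and the verifying observation is one more draw
# from the same distribution (a "perfect" ensemble).
sigma = rng.uniform(0.5, 3.0, n)
ens = rng.normal(0.0, sigma[:, None], (n, M))
obs = rng.normal(0.0, sigma)

spread = ens.std(axis=1)                   # rms ensemble spread per case
error = np.abs(ens.mean(axis=1) - obs)     # skill proxy: absolute error

r = np.corrcoef(spread, error)[0, 1]
print(f"spread-error correlation: {r:.2f}")
```

Even though the spread here is, by construction, a perfect predictor of the error *distribution*, the case-by-case correlation comes out well below 1, consistent with the "weaker than expected" relation reported above.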

Full access
Andreas P. Weigel, Mark A. Liniger, and Christof Appenzeller

Abstract

Multimodel ensemble combination (MMEC) has become an accepted technique to improve probabilistic forecasts from short- to long-range time scales. MMEC techniques typically widen ensemble spread, thus improving the dispersion characteristics and the reliability of the forecasts. This raises the question as to whether the same effect could be achieved in a potentially cheaper way by rescaling single model ensemble forecasts a posteriori such that they become reliable. In this study a climate conserving recalibration (CCR) technique is derived and compared with MMEC. With a simple stochastic toy model it is shown that both CCR and MMEC successfully improve forecast reliability. The difference between these two methods is that CCR conserves resolution but inevitably dilutes the potentially predictable signal while MMEC is in the ideal case able to fully retain the predictable signal and to improve resolution. Therefore, MMEC is conceptually to be preferred, particularly since the effect of CCR depends on the length of the data record and on distributional assumptions. In reality, however, multimodels consist only of a finite number of participating single models, and the model errors are often correlated. Under such conditions, and depending on the skill metric applied, CCR-corrected single models can on average have skill comparable to that of multimodel ensembles, particularly when the potential model predictability is low. Using seasonal near-surface temperature and precipitation forecasts of three models of the Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER) dataset, it is shown that the conclusions drawn from the toy-model experiments hold equally in a real multimodel ensemble prediction system. All in all, it is not possible to make a general statement on whether CCR or MMEC is the better method.
Rather, it seems that optimum forecasts can be obtained by a combination of both methods, but only if MMEC is applied first and CCR second. The opposite order (first CCR, then MMEC) is shown to have little effect, at least in the context of seasonal forecasts.
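To make the recalibration idea concrete, the sketch below implements one simple variant of a reliability-restoring rescaling (a regression-based shrink-and-inflate scheme, not necessarily the authors' exact CCR formulation) and applies it to an invented underdispersive toy system.

```python
import numpy as np

rng = np.random.default_rng(3)
n, M = 500, 9

# Invented underdispersive system: the ensemble mean carries some signal,
# but the ensemble spread is far too small to cover the actual error.
signal = rng.normal(0.0, 1.0, n)
obs = signal + rng.normal(0.0, 1.0, n)
ens_mean = signal + rng.normal(0.0, 0.3, n)
ens = ens_mean[:, None] + rng.normal(0.0, 0.2, (n, M))   # underdispersed

def recalibrate(ens, obs):
    """Shrink the ensemble-mean signal toward climatology according to its
    correlation with the observations, then inflate the spread so that the
    forecast climatology matches the observed one (a simple CCR-like variant)."""
    fm = ens.mean(axis=1)
    r = np.corrcoef(fm, obs)[0, 1]
    mu_o, s_o = obs.mean(), obs.std()
    a = r * s_o / fm.std()                        # shrink the signal
    new_mean = mu_o + a * (fm - fm.mean())
    target_spread = s_o * np.sqrt(max(1.0 - r * r, 0.0))
    anom = ens - fm[:, None]
    return new_mean[:, None] + (target_spread / anom.std()) * anom

cal = recalibrate(ens, obs)
# After recalibration, the mean ensemble spread matches the rms error of the
# ensemble mean (a necessary condition for reliability).
print(f"spread/error before: {ens.std(axis=1).mean():.2f} / "
      f"{np.sqrt(np.mean((ens.mean(axis=1) - obs) ** 2)):.2f}")
print(f"spread/error after : {cal.std(axis=1).mean():.2f} / "
      f"{np.sqrt(np.mean((cal.mean(axis=1) - obs) ** 2)):.2f}")
```

Note how the shrink factor dilutes the predictable signal, which is exactly the resolution-versus-reliability trade-off the abstract attributes to CCR.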

Full access
Andreas P. Weigel, Mark A. Liniger, and Christof Appenzeller

Abstract

The Brier skill score (BSS) and the ranked probability skill score (RPSS) are widely used measures to describe the quality of categorical probabilistic forecasts. They quantify the extent to which a forecast strategy improves predictions with respect to a (usually climatological) reference forecast. The BSS can thereby be regarded as the special case of an RPSS with two forecast categories. From the work of Müller et al., it is known that the RPSS is negatively biased for ensemble prediction systems with small ensemble sizes, and that a debiased version, the RPSSD, can be obtained quasi empirically by random resampling from the reference forecast. In this paper, an analytical formula is derived to directly calculate the RPSS bias correction for any ensemble size and combination of probability categories, thus allowing an easy implementation of the RPSSD. The correction term itself is identified as the “intrinsic unreliability” of the ensemble prediction system. The performance of this new formulation of the RPSSD is illustrated in two examples. First, it is applied to a synthetic random white noise climate, and then, using the ECMWF Seasonal Forecast System 2, to seasonal predictions of near-surface temperature in several regions of different predictability. In both examples, the skill score is independent of ensemble size while the associated confidence thresholds decrease as the number of ensemble members and forecast/observation pairs increase.
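The negative bias and its resampling-based correction can be demonstrated directly. The sketch below scores a deliberately skill-free 5-member ensemble in a three-category equiprobable climate: the plain RPSS comes out clearly negative, while replacing the reference RPS by that of ensembles resampled from climatology (the quasi-empirical approach of Müller et al. described above) restores a score near zero. The setup is invented, not the paper's verification data.

```python
import numpy as np

rng = np.random.default_rng(1)
K, M, n = 3, 5, 20000        # equiprobable categories, members, cases

def rps(cum_prob, cum_obs):
    """Mean ranked probability score from cumulative probabilities."""
    return np.sum((cum_prob - cum_obs) ** 2, axis=-1).mean()

def cum_freq(draws):
    """Cumulative category frequencies of an (n, M) member array."""
    counts = np.stack([(draws == k).mean(axis=1) for k in range(K)], axis=1)
    return np.cumsum(counts, axis=1)

# Skill-free setup: observations and members all drawn from climatology.
obs = rng.integers(0, K, n)
cum_obs = (obs[:, None] <= np.arange(K)).astype(float)
cum_ens = cum_freq(rng.integers(0, K, (n, M)))
cum_clim = np.cumsum(np.full(K, 1.0 / K))[None, :]

rps_ens = rps(cum_ens, cum_obs)
rps_clim = rps(cum_clim, cum_obs)
rpss = 1.0 - rps_ens / rps_clim              # negatively biased

# Debiased reference: mean RPS of M-member ensembles resampled at random
# from the climatological distribution.
rps_ref = rps(cum_freq(rng.integers(0, K, (n, M))), cum_obs)
rpss_d = 1.0 - rps_ens / rps_ref             # near zero for a skill-free system

print(f"RPSS  (biased)  : {rpss:.3f}")
print(f"RPSSD (debiased): {rpss_d:.3f}")
```

The analytical formula derived in this paper replaces the resampling step with a closed-form correction term, but the two approaches target the same bias.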

Full access
Andreas P. Weigel, Daniel Baggenstos, Mark A. Liniger, Frédéric Vitart, and Christof Appenzeller

Abstract

Monthly forecasting bridges the gap between medium-range weather forecasting and seasonal predictions. While such forecasts in the prediction range of 1–4 weeks are vital to many applications in the context of weather and climate risk management, surprisingly little has been published on the actual monthly prediction skill of existing global circulation models. Since 2004, the European Centre for Medium-Range Weather Forecasts has operationally run a dynamical monthly forecasting system (MOFC). It is the aim of this study to provide a systematic and fully probabilistic evaluation of MOFC prediction skill for weekly averaged forecasts of surface temperature as a function of lead time, region, and season. This requires the careful setup of an appropriate verification context, given that the verification period is short and ensemble sizes small. This study considers the annual cycle of operational temperature forecasts issued in 2006, as well as the corresponding 12 yr of reforecasts (hindcasts). The debiased ranked probability skill score (RPSSD) is applied for verification. This probabilistic skill metric has the advantage of being insensitive to the intrinsic unreliability due to small ensemble sizes—an issue that is relevant in the present context since MOFC hindcasts only have five ensemble members. The formulation of the RPSSD is generalized here such that the small hindcast ensembles and the large operational forecast ensembles can be jointly considered in the verification. A bootstrap method is applied to estimate confidence intervals.
The results show that (i) MOFC forecasts are generally not worse than climatology and do outperform persistence, (ii) MOFC forecasts are skillful beyond a lead time of 18 days over some ocean regions and to a small degree also over tropical South America and Africa, (iii) extratropical continental predictability essentially vanishes after 18 days of integration, and (iv) even when the average predictability is low there can nevertheless be climatic conditions under which the forecasts contain useful information. With the present model, a significant skill improvement beyond 18 days of integration can only be achieved by increasing the averaging interval. Recalibration methods are expected to be without effect since the forecasts are essentially reliable.
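A bootstrap confidence interval of the kind used in this verification can be sketched in a few lines: resample the forecast cases with replacement and recompute the skill score each time. The per-case scores below are invented placeholders, not MOFC data, and a Brier-type score stands in for the RPSSD.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented verification sample: one Brier score per forecast case, for the
# forecast system and for the climatological reference.
n = 240                                  # forecast/observation pairs
bs_fc = rng.uniform(0.0, 0.5, n)         # per-case Brier score, forecast
bs_cl = rng.uniform(0.1, 0.6, n)         # per-case Brier score, climatology

def skill(idx):
    """Brier skill score computed over the selected cases."""
    return 1.0 - bs_fc[idx].sum() / bs_cl[idx].sum()

# Resample cases with replacement and collect the skill distribution.
boot = np.array([skill(rng.integers(0, n, n)) for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"BSS = {skill(np.arange(n)):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Skill is declared significant when the confidence interval excludes zero; with short verification periods the intervals are wide, which is why the abstract stresses the careful verification setup.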

Full access
Felix Fundel, Andre Walser, Mark A. Liniger, Christoph Frei, and Christof Appenzeller

Abstract

The calibration of numerical weather forecasts using reforecasts has been shown to increase the skill of weather predictions. Here, the precipitation forecasts from the Consortium for Small Scale Modeling Limited Area Ensemble Prediction System (COSMO-LEPS) are improved using a 30-yr-long set of reforecasts. The probabilistic forecasts are calibrated on the exceedance of return periods, independently of available observations. Besides correcting for systematic model errors, issuing forecasts of return periods implicitly captures the spatial and temporal variability in the amplitude of rare precipitation events. These forecast products are especially useful for issuing warnings of upcoming events. A way to visualize these calibrated ensemble forecasts conveniently for end users is proposed, and verification results of the return period–based forecasts for Switzerland are presented.

It is shown that, depending on the lead time and return period, calibrating COSMO-LEPS with reforecasts increases the precipitation forecast skill substantially (a gain of about 1 day in forecast lead time). The largest improvements are achieved during winter months. A reasonable length for the reforecast climatology is estimated, allowing efficient use of this computationally expensive calibration method.
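The core of a return-period-based calibration can be sketched as follows: the reforecast climatology defines, in model space, the threshold of a T-yr event, and the calibrated forecast probability is simply the fraction of ensemble members exceeding that threshold. The data below are invented (a gamma toy climate for one grid point); the operational system's thresholds, member counts, and seasonal stratification are more involved.

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented data: 30 years of daily reforecast precipitation at one grid
# point, and a 16-member ensemble forecast for one (unusually wet) day.
reforecast = rng.gamma(0.4, 8.0, 30 * 365)
ensemble = rng.gamma(0.4, 8.0, 16) * 3.0

def return_level(sample, years, days_per_year=365):
    """Empirical daily value exceeded on average once every `years` years."""
    p_exceed = 1.0 / (years * days_per_year)
    return np.quantile(sample, 1.0 - p_exceed)

for T in (1, 5, 10):
    level = return_level(reforecast, T)
    prob = (ensemble > level).mean()
    print(f"P(exceed {T:2d}-yr model-climate event) = {prob:.2f}")
```

Because the thresholds come from the model's own climatology, the systematic model bias cancels out, which is why this calibration needs no observations.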

Full access
Paul M. Della-Marta, Mark A. Liniger, Christof Appenzeller, David N. Bresch, Pamela Köllner-Heck, and Veruska Muccione

Abstract

Current estimates of the European windstorm climate and their associated losses are often hampered by either relatively short, coarse resolution or inhomogeneous datasets. This study tries to overcome some of these shortcomings by estimating the European windstorm climate using dynamical seasonal-to-decadal (s2d) climate forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). The current s2d models have limited predictive skill of European storminess, making the ensemble forecasts ergodic samples on which to build pseudoclimates of 310–396 yr in length. Extended winter (October–April) windstorm climatologies are created using scalar extreme wind indices considering only data above a high threshold. The method identifies up to 2363 windstorms in s2d data and up to 380 windstorms in the 40-yr ECMWF Re-Analysis (ERA-40). Classical extreme value analysis (EVA) techniques are used to determine the windstorm climatologies. Differences between the ERA-40 and s2d windstorm climatologies require the application of calibration techniques to result in meaningful comparisons. Using a combined dynamical–statistical sampling technique, the largest influence on ERA-40 return period (RP) uncertainties is the sampling variability associated with only 45 seasons of storms. However, both maximum likelihood (ML) and L-moments (LM) methods of fitting a generalized Pareto distribution result in biased parameters and biased RP at sample sizes typically obtained from 45 seasons of reanalysis data. The authors correct the bias in the ML and LM methods and find that the ML-based ERA-40 climatology overestimates the RP of windstorms with RPs between 10 and 300 yr and underestimates the RP of windstorms with RPs greater than 300 yr. A 50-yr event in ERA-40 is approximately a 40-yr event after bias correction. Biases in the LM method result in higher RPs after bias correction although they are small when compared with those of the ML method. 
The climatologies are linked to the Swiss Reinsurance Company (Swiss Re) European windstorm loss model. New estimates of the risk of loss are compared with those from historical and stochastically generated windstorm fields used by Swiss Re. The resulting loss-frequency relationship matches well with the two independently modeled estimates and clearly demonstrates the added value by using alternative data and methods, as proposed in this study, to estimate the RP of high RP losses.
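As an illustration of the extreme value analysis step, the sketch below fits a generalized Pareto distribution to synthetic threshold excesses by maximum likelihood and converts the fit into return levels, using `scipy.stats.genpareto`. The storm counts, rate, and parameters are invented, and no bias correction is applied, so a fit like this would carry the small-sample biases discussed in the abstract.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(6)

# Invented storm-index excesses over a high threshold, with an average of
# ~8 storms per extended winter season over 45 seasons (an ERA-40-like
# sample size).
n_seasons, rate = 45, 8.0
excesses = genpareto.rvs(c=-0.1, scale=1.0, size=int(n_seasons * rate),
                         random_state=rng)

# Maximum-likelihood GPD fit to the threshold excesses (location fixed at 0).
c_hat, _, scale_hat = genpareto.fit(excesses, floc=0)

def return_level(T):
    """Excess exceeded on average once every T seasons."""
    return genpareto.ppf(1.0 - 1.0 / (rate * T), c_hat, loc=0, scale=scale_hat)

for T in (10, 50, 300):
    print(f"{T:3d}-season event: excess = {return_level(T):.2f}")
```

Repeating such a fit over many resampled 45-season datasets (the combined dynamical-statistical sampling mentioned above) is what exposes the bias and sampling uncertainty of the estimated return periods.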

Full access