Bayesian Model Verification of NWP Ensemble Forecasts

Andreas Röpnack, Deutscher Wetterdienst, Offenbach, and Meteorological Institute, University of Bonn, Bonn, Germany
Andreas Hense, Meteorological Institute, University of Bonn, Bonn, Germany
Christoph Gebhardt, Deutscher Wetterdienst, Offenbach, Germany
Detlev Majewski, Deutscher Wetterdienst, Offenbach, Germany


Abstract

Forecasts of convective precipitation have large uncertainties. To consider the forecast uncertainties of convection-permitting models, a convection-permitting ensemble prediction system (EPS) based on the Consortium for Small-scale Modeling (COSMO) model with a horizontal resolution of 2.8 km covering all of Germany is being developed by the Deutscher Wetterdienst (DWD). The deterministic model is named COSMO-DE. Vertical structures of temperature and humidity affect the potential for convective instability. For verification of vertical model profiles, radiosonde data are used. However, the observed state is uncertain by itself because of the well-known limits in observing the atmosphere. In this work the authors use a probabilistic approach, which considers the observation error as well as the model uncertainty to validate multidimensional state vectors (e.g., temperature profiles) of the COSMO-DE-EPS and of two mesoscale ensembles with horizontal resolution of 10 km and parameterized convection. The mesoscale ensembles are the COSMO short-range EPS (COSMO-SREPS) and the COSMO limited-area EPS (COSMO-LEPS). The approach is based on Bayesian statistics and allows for both verification and comparison of ensembles. The investigation period comprises August 2007 for a comparison of the COSMO-DE-EPS with the COSMO-SREPS. A period of 5 days in July 2007 is used to demonstrate the potential of the Bayesian approach for verification by evaluating the COSMO-SREPS and COSMO-LEPS against COSMO-EU analyses. Based on the Bayesian approach, it is shown that the temperature profiles modeled by the COSMO-DE-EPS are more consistent with the observed profiles than those of COSMO-SREPS.

Corresponding author address: Andreas Röpnack, Deutscher Wetterdienst, Frankfurter Str. 135, 63067 Offenbach, Germany. E-mail: andreas.roepnack@dwd.de


1. Introduction

Quantitative precipitation forecasts (QPFs) are among the most challenging applications in numerical weather prediction (NWP). In particular, forecasts of precipitation associated with deep convection have large uncertainties concerning the prediction of the intensity, location, and timing of the respective events (Browning et al. 2007). This is due to the fact that conditional instabilities [e.g., expressed by positive values of convective available potential energy (CAPE)] are released by unresolved-scale events, which themselves are triggered, but not strictly determined, by other flow or boundary properties such as orography, soil moisture, etc. The instabilities of the atmospheric flow evolution strongly amplify small uncertainties embedded in the large-scale flow, in the boundary layer, or in the surface characteristics on time scales of the order of the lifetime of the convective events releasing the instabilities. The inability of NWP models to represent these processes limits, among other things, their capability to forecast the correct diurnal cycle of precipitation (Guichard et al. 2004) and is one reason for the lack of significant improvement in QPFs during the last decades, in contrast to other forecast variables (Hense et al. 2006).

Because of the limited deterministic predictability of small-scale processes in state-of-the-art NWP models, a probabilistic point of view is favorable for QPF. Therefore, the Deutscher Wetterdienst (DWD) is developing an ensemble prediction system (EPS) based on COSMO-DE, the convection-permitting limited-area model of the Consortium for Small-scale Modeling (COSMO). An intermediate version of this COSMO-DE-EPS development is described in Gebhardt et al. (2011). An EPS provides a limited number of samples corresponding to deterministic realizations of the future atmospheric flow development. In COSMO-DE-EPS, this sample is obtained by perturbations of the initial and boundary conditions and of the model physics to account for the different types of uncertainties leading to the forecast uncertainties described above. These perturbations are explained in more detail in section 3. The additional information in the EPS, namely the spread of the possible future paths in the atmospheric flow evolution initiated by the perturbations, reflects the forecast uncertainty in the prediction. This uncertainty can be quantified in terms of probabilities. Richardson (2000) showed that probability forecasts derived from an EPS are of greater benefit for decision making under uncertainty than a single deterministic forecast produced by the same model. As in deterministic forecasting, it has to be assured that the probabilistic forecasts provide information about the future atmospheric state (Murphy and Winkler 1984).

This is the aim of verification of probabilistic forecasts using observations. Note that this is a problem by itself because an EPS provides a sample of forecasts, while nature provides only a single event. This requires the predicted probability density of the future state of the atmosphere to be compared to the single observed state of the atmosphere at the time of verification with the help of a scoring rule (Gneiting and Raftery 2007; Bröcker and Smith 2007). This scoring rule has to have the property of propriety to allow an objective comparison of predictions and observations. Propriety means here that the score will always prefer a more accurate forecast (Bröcker and Smith 2007). A scoring rule is defined as proper if, on average, the best score is achieved only if the forecasted probability/probability density is identical to the observed one (Gneiting and Raftery 2007). An example of an improper scoring rule (see Gneiting and Raftery 2007) is the following construction. A forecast system predicts a Gaussian density fp with expectation μp and standard deviation σp. A possible scoring rule based on observation o could be

S(fp, o) = |o − μp| + σp,    (1)

which honors the closeness of the observations to the expectation and the level of uncertainty of the predictive density. However, a forecaster could cheat the score and obtain systematically smaller, and therefore seemingly better, scores by issuing deterministic forecasts with σp = 0, for which S = |o − μp| is obviously smaller than S(fp, o).
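The cheating behavior can be demonstrated with a small simulation. The sketch below is our own illustrative setup (not from the paper): nature draws standard Gaussian observations, an honest forecaster states the correct predictive density, and a cheater collapses it to a point forecast, yet the cheater receives systematically better (smaller) scores under the improper rule above.

```python
import numpy as np

rng = np.random.default_rng(0)

def improper_score(mu_p, sigma_p, o):
    # improper score: distance of the observation to the predictive
    # mean plus the predictive standard deviation (smaller is better)
    return np.abs(o - mu_p) + sigma_p

# nature draws the observations from N(0, 1)
obs = rng.normal(0.0, 1.0, size=100_000)

honest = improper_score(0.0, 1.0, obs)  # states the true density N(0, 1)
cheat = improper_score(0.0, 0.0, obs)   # deterministic forecast, sigma_p = 0

# the cheater's mean score is smaller by exactly sigma_p = 1,
# although the honest forecast is the correct one
print(honest.mean() - cheat.mean())
```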

Furthermore, the observed state is by itself uncertain because of the limited capabilities of observing the atmosphere. This again has to be taken into account to avoid forecasts being judged as having low skill merely because the verifying observations are of low quality. Currently, errors of observations are an issue in verification research. Bowler (2007) discussed the significant effect of observation errors on verification results. In particular, the effect of observation errors is not negligible when the forecast errors are small (e.g., at short lead times). This is confirmed by Candille and Talagrand (2008), who mentioned that within forecast ranges up to two days the uncertainty of the verification results is of the same order as the uncertainty of the prediction due to the observation error. Candille and Talagrand (2008) introduced a method treating the observations as a probability distribution. However, their work is focused on binary events for given thresholds. We will use a Bayesian statistical approach, which provides a natural way to consider errors of the observations with their full probability density function (PDF) and is not threshold dependent. In the presented Bayesian approach, the uncertainty of the EPS as well as of the observations is taken into account simultaneously in a statistically consistent way, without a restriction on the properties of the underlying PDFs. Using a prior probability allows us to incorporate additional, unconditional prior knowledge. A further important advantage of the method is that we can investigate a multivariate state of a continuous variable. In this study radiosonde measurements are used as verifying observations. Therefore, we define the realizations of the multivariate random variable as temperatures at different vertical levels and/or at different radiosonde stations.
But the theory described below readily generalizes to vectors, which combine different data types like temperature and wind components at single levels, different levels, or different stations.

Radiosonde observations can be considered to be of high quality, but their uncertainties (sometimes called errors) are important in our case. In this work the radiosonde observations are placed into a single column of the NWP model as a function of height, in the same way as these observations are assimilated into the model state. Kitchen (1989) showed that this procedure is acceptable for the synoptic scale. However, for the COSMO-DE running at convection-permitting scales, the drifting of the radiosonde is certainly not negligible. As a result the radiosonde observations are erroneous beyond the standard instrumental error, which has to be taken into consideration. This is done by using the observation errors of the three-dimensional variational data assimilation (3DVAR) system of the DWD. Formally, the drifting effects could readily be incorporated into our verification approach, but this would require an online extraction of the forecasts at the radiosonde sampling points, which is currently beyond our capabilities.

It is often necessary to compare a specific EPS system with another one to decide about the relative quality given the same observational dataset. Additionally, we verify structures or fields (realizations of multivariate random variables).

This is nearly impossible using single-point information of one selected variable. Therefore, the key targets of this work are to present a method

  • to verify and compare ensemble predictions of atmospheric state vectors,

  • to include observational errors, and

  • to allow for relative measures between different EPS systems.

All these aspects are important ingredients for verifying forecasts at resolutions that permit convection and allow us to study the predictability of the convection initiation potential. For a better physical understanding of the prediction of these processes, a multidimensional state vector of the forecast ensemble has to be used, characterizing at least the vertical temperature and moisture structure. The multivariate aspect is defined by several vertical levels, which are treated simultaneously, taking into account the dependencies between the levels such as the vertical gradients. Other driving mechanisms like moisture convergence are not readily available from a single radiosonde ascent but could be estimated better from a network of radiosondes. Therefore, the approach should also be capable of including several radiosonde profiles. The probabilistic approach is based on the Bayesian verification method for climate change simulations by Min et al. (2004) and Min and Hense (2006). We add the extension of multivariate kernel dressing proposed by Schölzel and Hense (2010) to estimate the predictive probability density from the raw ensemble samples in a more flexible way. The latter method needs an estimate of the inverse covariance matrix, which describes the variability of the ensemble. The standard maximum likelihood estimation of the covariance matrix often leads to noninvertible, or singular, matrices. This happens if the sample size of the ensemble used to estimate the covariance matrix is smaller than the dimension of the vectors characterizing the vertical and horizontal temperature and moisture structure. In this work we will use a method recently developed by Friedman et al. (2007), the graphical least absolute shrinkage and selection operator (gLasso), which is specifically designed to estimate nonsingular covariance matrices and their inverses from small samples.
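The singularity problem and the gLasso remedy can be sketched with scikit-learn's GraphicalLasso, an implementation of the Friedman et al. (2007) estimator. This is our illustrative sketch, not the authors' code: the dimension q = 48 mirrors the 3 stations × 8 levels × 2 variables case treated later, the surrogate data are random, and the regularization strength alpha is an arbitrary choice.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
q, n = 48, 20                      # state dimension exceeds the sample size
X = rng.normal(size=(n, q))        # surrogate ensemble-difference sample

# maximum likelihood estimate: rank <= n - 1 < q, hence singular
S = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(S))    # smaller than 48: not invertible

# gLasso: L1-regularized estimate, nonsingular by construction, and the
# inverse (the precision matrix needed for the Mahalanobis distance)
# is returned directly
gl = GraphicalLasso(alpha=0.5).fit(X)
print(np.linalg.matrix_rank(gl.covariance_))   # 48
precision = gl.precision_
```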

Additionally, Gneiting and Raftery (2007) showed that this approach leads to a proper score, allowing an unbiased evaluation of the forecasts with respect to either a climatology or a different forecasting system. We do not consider this study a full-scale verification analysis of the D-Phase ensembles or the COSMO-DE-EPS; that task would require a much larger radiosonde network and forecasts covering much longer time periods. Our study is rather a proof of concept, or pilot study, to demonstrate the strengths and weaknesses of the presented approach.

The outline of the paper is as follows: First, the statistical basics of the Bayesian approach are introduced. Second, in section 3, the data and especially the EPSs used are described. Finally, two different ensemble prediction systems based on the COSMO model, the COSMO limited-area EPS (COSMO-LEPS) and the COSMO short-range EPS (COSMO-SREPS), which are not convection permitting, are verified in section 4 on the basis of analyzed temperature profiles of the operational COSMO model of the DWD for Europe (COSMO-EU) at three radiosonde stations. The COSMO-LEPS and the COSMO-SREPS (hereafter LEPS and SREPS) are mesoscale ensembles based on the COSMO model, but with different driving models. Additionally, the COSMO-DE-EPS, as a convection-permitting short-range ensemble, will be compared with the COSMO-SREPS with parameterized convection.

2. Basic theory

a. Statistical fundamentals

This section will introduce the basic theory. We will denote with f the model forecast of the state vector of dimension q of an NWP model. The difference between the true (but unknown) state vector ft and f is described by the error of the model ε:
f = ft + ε.    (2)

The statistics of the model error are fully described by the PDF g(ε) (Bouttier and Courtier 1999). Here we assume that the expectation of ε is zero, meaning that there is no systematic error of the model or the observations; we will discuss this point at the end. A popular model of the PDF is a Gaussian distribution, in our case a multivariate Gaussian. The model ensemble can be understood as a Monte Carlo procedure to sample this PDF. A convenient way to combine all information and its inherent uncertainties is Bayesian statistics (Berger 1985). The approach based on Bayesian statistics is used for forecast validation because it allows for a more extended notion of probability than classical statistics and a straightforward inclusion of the observational uncertainty.

b. Bayesian statistics

The Bayesian approach “allows you to start with what you already believe (prior) and then to see how new information changes your confidence in that belief (posterior). The Bayes’s theorem says simply that the probability P of the hypothesis H, given the data D, is equal to the probability of the data, given the hypothesis is correct, multiplied by the probability of the hypothesis before obtaining the data, divided by the average probability of the data.” (Malakoff 1999):
P(H|D) = P(D|H) P(H) / P(D).    (3)

This Bayesian approach can be used to assess ensemble forecasts given observations. The complete forecast ensemble at lead time τ represents here the hypothesis H. We will not apply this to the full model state vector, whose dimension q is of order 10^8. The aim of this Bayesian investigation is a multivariate verification of forecasts of vertical temperature and moisture profiles of several ensemble systems at a given set of radiosonde sites and measurement heights. This partly circumvents the problem of dimension reduction, which we will not discuss here in detail; we refer to Jonko et al. (2009) or Hense and Römer (1995).

The multivariate structure is given by temperature and moisture at various vertical levels. Because we will compare several ensemble forecasts generated by EPS i = 1, … , I among each other or with the climatology i = 0 we will consider each EPS as the realization of a discrete random variable mi = mi(τ) at forecast lead τ. Each of these EPS is characterized by the prior probability P[mi(τ)]. This quantifies given knowledge about the forecasting system possibly in a subjective way (e.g., assessed by a questionnaire among professional weather forecasters).

For verification we want to find the evidence of a specific ensemble mi(τ) given the observational data o(τ). This can be expressed as the conditional probability P[mi(τ)|o(τ)], which is also called the posterior probability. The Bayes’s theorem relates the likelihood l[o(τ)|mi(τ)], the prior probability P[mi(τ)] and the posterior probability P[mi(τ)|o(τ)] as
P[mi(τ)|o(τ)] = l[o(τ)|mi(τ)] P[mi(τ)] / r[o(τ)],    (4)
with
r[o(τ)] = Σj l[o(τ)|mj(τ)] P[mj(τ)].    (5)
The marginal probability r[o(τ)] is the sum over all ensembles, here indexed by mj(τ). The posterior evolves from the existing knowledge (the priors) and its modification through the likelihood of the observations (Min et al. 2004), which describes the measurement process including its uncertainties. The likelihood has to be further refined as follows. For simplicity we drop the lead time τ in the following. Each ensemble mi is defined through its realizations fk(i), k = 1, … , Ki, where Ki is the number of ensemble members. Then the likelihood l(o|mi) is the integral of the product of two PDFs: the first, pl(o|f), describes the uncertainty of the observations and the second the uncertainty within the ensemble of the EPS mi:
l(o|mi) = ∫ pl(o|f) p(f|mi) df.    (6)
If we assume for the errors in the observations an unbiased multivariate Gaussian distribution, the conditional probability pl(o|f) can be formulated as
pl(o|f) = (2π)^(−q/2) |Σo|^(−1/2) exp[−½ (o − f)ᵀ Σo⁻¹ (o − f)].    (7)
To calculate p(f|mi) from the raw ensemble forecast, we use the multivariate kernel dressing approach by Schölzel and Hense (2010), who define the predictive PDF for the state vector f of the ensemble as a Gaussian mixture model with dressing covariance matrix Σi summed over all Ki realizations:
p(f|mi) = (1/Ki) Σk (2π)^(−q/2) |Σi|^(−1/2) exp[−½ (f − fk(i))ᵀ Σi⁻¹ (f − fk(i))].    (8)

Here Σo denotes the error covariance matrix of the observations and Σi the error covariance matrix of the EPS mi. The estimation of Σi and Σo is described below.

Inserting Eqs. (7) and (8) into Eq. (6) and applying some linear algebra, the integral can be evaluated analytically with the following result:

l(o|mi) = (1/Ki) (2π)^(−q/2) |Σo + Σi|^(−1/2) Σk exp(−½ MDk²)    (9)

with the following definition:

MDk² = (o − fk(i))ᵀ (Σo + Σi)⁻¹ (o − fk(i)),  k = 1, … , Ki.

This shows that the posterior probability is a function of the Mahalanobis distances (MD) (Maesschalck et al. 2000; Mahalanobis 1936), which describe the variance-weighted distances between the forecasted state vectors and the observation o. The MD is invariant to nonsingular linear transformations of the state vectors and o, meaning that the final results are independent of the actually chosen basis (Sole and Tippett 2007). Among other advantages, this means that one can compare different variable types or variables with largely different variability ranges, as long as the errors are realizations of Gaussian distributed random variables. An additional advantage of the posterior probability is the explicit inclusion of the uncertainty of the forecast ensemble and of the observations.
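In code, the likelihood of Eq. (9) amounts to evaluating a Gaussian mixture whose kernels are broadened by the observation-error covariance. The sketch below is ours (the function name, member values, and covariances are illustrative, not from the paper); it computes the log likelihood via the squared Mahalanobis distances.

```python
import numpy as np

def log_likelihood(obs, members, sigma_o, sigma_i):
    """log l(o | m_i): equally weighted Gaussian mixture centered on the
    ensemble members, each kernel convolved with the observation error."""
    q = obs.size
    cov = sigma_o + sigma_i                    # combined uncertainty
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    d = members - obs                          # (K, q) residuals
    md = np.einsum('kq,qr,kr->k', d, inv, d)   # squared Mahalanobis distances
    log_kernels = -0.5 * (q * np.log(2 * np.pi) + logdet + md)
    m = log_kernels.max()                      # stable log-mean-exp
    return m + np.log(np.exp(log_kernels - m).mean())

# toy two-level temperature profile, three ensemble members (values invented)
obs = np.array([285.0, 278.5])
members = np.array([[284.6, 278.9], [285.4, 278.2], [285.1, 278.6]])
sigma_o = np.diag([0.75, 0.75])    # radiosonde error variances (K^2)
sigma_i = np.diag([0.30, 0.40])    # dressing covariance
print(log_likelihood(obs, members, sigma_o, sigma_i))
```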

The ratio of the posterior of the model ensemble mi to that of a specific reference ensemble mr can be used to compare two ensembles with each other. This ratio is called the Bayes's factor (Kass and Raftery 1995) and will be further discussed in the next section. As reference model mr we can use a specific deterministic model (section 4d) or a specific ensemble (section 4e). In the deterministic case we consider the deterministic forecast as the mean of an artificial one-member ensemble, but still include the uncertainty, represented in this case by the uncertainty of the corresponding ensemble mi.

c. Bayes’s factor

The Bayes’s factor characterizing the relative performance of two ensembles mi and mr or the performance of mi relative to an analysis is defined as the ratio of the posterior probabilities:
Bir = P(mi|o) / P(mr|o) = l(o|mi) P(mi) / [l(o|mr) P(mr)].    (10)

Using the Bayes's factor has the advantage that the marginal probability of the data r(o) cancels out. The Bayes's factor can be used to decide which ensemble is more likely with respect to the posterior probability. Gneiting and Raftery (2007) show that the logarithm of the Bayes's factor is equivalent to the so-called ignorance or logarithmic score. Therefore, the logarithm of the Bayes's factor is also a proper score. A detailed discussion of the meaning of propriety and a listing of proper scores can be found in Bröcker and Smith (2007). Additionally, it can be shown (Bröcker 2009) that any proper score can be decomposed into uncertainty, resolution, and reliability components, similar to the well-established decomposition of the Brier score in the univariate case. This also applies in the multivariate case because the derivation in Bröcker (2009) readily generalizes to it. The actual problem is to estimate the conditional probability density of the observational data vector given a specified forecasted multivariate probability density. Therefore, we will from now on discuss the log of the Bayes's factor.

In the case of an analysis as reference model mr, a log Bir near zero means a nearly perfect forecast, considering the analysis as an approximation of the truth under a given uncertainty. When comparing two ensembles, a log Bir greater than zero describes the case in which the specific model mi is more likely than the reference model mr; numbers less than zero indicate that the reference model is more likely. There is always a range around zero within which the two ensemble prediction systems cannot be distinguished statistically on the basis of the available information.
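Concretely, once the log likelihoods of two ensembles are available, the log Bayes factor is their difference (plus the difference of log priors, which vanishes for equal priors). The helper below is our sketch; the evidence categories follow the conventional 2 ln(B) scale of Kass and Raftery (1995), which the descriptive scales in Table 1 adapt.

```python
def log_bayes_factor(log_lik_i, log_lik_r, log_prior_i=0.0, log_prior_r=0.0):
    # the marginal r(o) cancels in the ratio of posteriors; with equal
    # priors the log Bayes factor is just a difference of log likelihoods
    return (log_lik_i + log_prior_i) - (log_lik_r + log_prior_r)

def evidence(log_b):
    """Descriptive evidence for m_i over m_r on the 2*ln(B) scale
    of Kass and Raftery (1995)."""
    two_ln_b = 2.0 * log_b
    if two_ln_b < 2:
        return "not worth more than a bare mention"
    if two_ln_b < 6:
        return "positive"
    if two_ln_b < 10:
        return "strong"
    return "very strong"

log_b = log_bayes_factor(-3.1, -7.8)   # invented log likelihoods
print(evidence(log_b))                 # 2*ln(B) = 9.4 -> "strong"
```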

Therefore, Table 1 (Kass and Raftery 1995) introduces specific levels of evidence including a narrative description for comparing the performance of ensemble mi versus ensemble mj and the level of confidence of the predictive performance of the ensemble mi when compared to a verifying analysis mr.

Table 1.

Descriptive scales of the Bayes's factor for the comparison of two ensembles after Kass and Raftery (1995). The second column shows the evidence for one ensemble in the case of a comparison of two ensembles, and the third column shows the case of validation, indicating the level of confidence that the forecast accurately reflects the analysis when an ensemble forecast and an analysis are compared.


3. Data

The ensemble data of this investigation are taken from the Demonstration of Probabilistic Hydrological and Atmospheric Simulation of Flood Events in the Alpine Region (D-Phase) project (Arpagaus et al. 2009). It is part of the Mesoscale Alpine Programme (MAP) and was implemented as a forecast demonstration project (FDP) of the World Weather Research Programme (WWRP) of the World Meteorological Organization (WMO). D-Phase is used to investigate the ability to forecast heavy precipitation and related flooding events in the Alpine region. The domain of the D-Phase forecasts covers the area of the Convective and Orographically Induced Precipitation Study (COPS) measurement campaign in the southwestern part of Germany in 2007 (Wulfmeyer et al. 2008).

The ensemble suite contains the COSMO-SREPS and the COSMO-LEPS ensembles. Both ensembles are based on the COSMO model with 10-km grid spacing and 40 vertical levels with parameterized deep convection. The SREPS is initialized by and nested into the forecasts of four global models: the Unified Model (UM) of the Met Office (UKMO), the Integrated Forecast System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF), the Global Model (GME) of the DWD, and the Global Forecast System (GFS) of the National Centers for Environmental Prediction (NCEP; Marsigli et al. 2006). Additionally, four different setups of essential parameterizations are chosen, giving in total 16 ensemble members. The LEPS ensemble is nested into 16 representative members of the global ECMWF ensemble prediction system; similar to the SREPS, different setups of the parameterizations are additionally included. Details can be found in Marsigli et al. (2005). Finally, the COSMO-DE-EPS (hereafter DE-EPS) provides the third regional forecast ensemble. It is under development at the DWD and is a short-range ensemble based on the nonhydrostatic COSMO-DE model. In contrast to the other two, this model is a convection-permitting limited-area model with a horizontal grid spacing of 2.8 km and 50 vertical model levels (Baldauf et al. 2006).

The ensemble data of the DE-EPS are from runs at the DWD with an intermediate version of the DE-EPS, which comprised perturbations of the initial and boundary conditions and of the model physics. Table 2 gives an overview of the specification of the ensemble systems SREPS, LEPS, and DE-EPS including an overview of the physics perturbations.

Table 2.

Ensemble systems of COSMO-SREPS, COSMO-LEPS, and COSMO-DE-EPS. The four global models are: ECMWF global (IFS), DWD global (GME), NCEP global (GFS), and UKMO global (UM). The ensembles contain perturbations of the model physics (e.g., the COSMO-DE-EPS contains five physics perturbations: p1, p2, p3, p4, and p5). An explanation of the namelist parameters can be found in Schättler et al. (2009).


As observations we use radiosonde ascents provided by the DWD during the COPS campaign (Wulfmeyer et al. 2008). During COPS, radiosondes were released every 6 h (0000, 0600, 1200, and 1800 UTC) within the COPS area. In our case we use the stations Stuttgart, Idar-Oberstein, and Nancy (from Météo-France). During the COPS intensive observation periods (IOPs), several additional radiosondes were launched at the German stations at 0500, 0800, 1100, 1500, 1800, and 2100 UTC. As mentioned in the introduction, we do not consider this study a full-scale verification analysis, but rather a pilot study to demonstrate the potential of this Bayesian verification approach.

4. Results

a. Kernel dressing and the estimation of the dressing covariance matrix

A first step in the analysis is the conversion of the raw ensemble predictions into a predictive probability density function. Several methods have been described in the literature (Wilks and Hamill 2007), among them ensemble Gaussian kernel dressing. The multivariate Gaussian kernel dressing was introduced by Schölzel and Hense (2010). Essential to the method is the estimation of the dressing covariance matrix, which describes the error statistics of the internal variability; as internal variability we understand the spread of the ensemble, which we consider an estimate of the uncertainty of the ensemble forecast. There is no attempt in this study to correct the forecasts for systematic errors by calibration to the observations, which is often summarized as postprocessing of the raw forecasts. Here the kernel dressing is the only postprocessing applied to the raw ensemble forecasts. It is done according to the following steps:

  • From a given ensemble at a fixed date and a fixed forecast lead time we calculate all possible differences between the individual realizations, as a prewhitening filter to remove approximately the true signal ft in Eq. (2).

  • These differences are assumed to be realizations of the forecast error scaled by a factor of 1/√2, from which we estimate a first covariance matrix. The factor arises from the fact that the variance of the difference of two independent random variables described by the same probability density is twice the common variance. Therefore, taking the difference between two realizations of the ensemble and scaling the difference with 1/√2 removes the predictable signal to first order and provides another realization of the unpredictable component. More details can be found in Schölzel and Hense (2010).

  • These covariance matrices are averaged over the past 5 days. The daily cycle is taken into account by averaging only those covariance matrices over the past 5 days, which have been estimated at a fixed forecast lead time.

  • This averaging is not based on a large enough sample size in case of treating three stations with eight levels and two variables (q = 3 × 8 × 2 = 48) to guarantee a nonsingular covariance matrix from standard maximum likelihood estimation. Therefore, we use the recently developed gLasso method by Friedman et al. (2007) to estimate the covariance matrix. Details of the method are presented in the appendix.

  • If the covariance matrix has to be calculated for a day located at the beginning of the investigation period then the period for the averaging will be mirrored at the edge of the time range.

  • The averaged matrix defines Σi [Eq. (8)].

  • The radiosonde observations are processed as column observations at eight pressure levels: 1000, 925, 850, 700, 500, 300, 250, and 200 hPa at fixed horizontal coordinates for the verification. The forecasted temperature and moisture values at the model levels are interpolated to these pressure levels in the column above the starting point of the radiosonde.

  • The covariance matrix of the observations Σo is assumed to be diagonal with the variances taken from the DWD data assimilation system. Among others these variances should also account for the uncertainties arising from the no-drift assumption of the radiosondes.
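The covariance steps above (omitting the gLasso regularization and the mirroring at the edges of the period, for brevity) can be sketched as follows; the function name and the toy input are ours.

```python
import numpy as np
from itertools import combinations

def dressing_covariance(daily_ensembles):
    """Estimate the dressing covariance from a list of ensembles
    (each of shape (K, q)) valid at the same forecast lead time on
    consecutive days, via pairwise member differences scaled by 1/sqrt(2)."""
    daily = []
    for ens in daily_ensembles:
        diffs = np.array([(ens[a] - ens[b]) / np.sqrt(2.0)
                          for a, b in combinations(range(len(ens)), 2)])
        daily.append(diffs.T @ diffs / len(diffs))   # per-day covariance
    return np.mean(daily, axis=0)                    # average over the days

# with two members the estimator reduces to the unbiased sample variance
print(dressing_covariance([np.array([[0.0], [2.0]])]))  # [[2.]]
```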

b. A univariate example

The standard kernel dressing (SKD) can be well visualized in the one-dimensional (univariate) case (Bröcker and Smith 2008) together with the evaluation of the likelihood. Figure 1 shows the result for the temperature at 850 hPa at a single grid point in the SREPS ensemble for a 6-h forecast on 15 July 2007.

Fig. 1.

Univariate temperature PDF at one level (850 hPa) at one grid point of the 6-h forecast of SREPS on 15 Jul 2007. The raw ensemble forecasts are marked by “+”. (a) The construction of the kernel dressing prior PDF; (b) the kernel dressing prior PDF together with the Gaussian PDF describing the observations (the dashed curve in both panels shows the classic Gaussian density for the SREPS ensemble defined by the mean and variance of the raw forecasts). (c) The likelihood function of the observations for the kernel as well as the Gaussian prior density (the vertical gray line shows the point at which the likelihoods are evaluated).

Citation: Monthly Weather Review 141, 1; 10.1175/MWR-D-11-00350.1

Here it is obvious how the approach works. The PDF over all possible temperature values is derived from the individual ensemble members by first centering Gaussian densities with identical variances at each single member’s temperature value (gray lines in Fig. 1a). Each bell-shaped curve is a function of all possible temperature values. In a second step, for any fixed temperature value, an unweighted average of the single Gaussian densities is computed, defining the desired PDF (black line in Fig. 1a). For comparison, the Gaussian prior density derived from the ensemble mean and variance is depicted as the dashed black line in Fig. 1a. Depending on the size of the dressing variance, which characterizes the width of the individual Gaussians, and on how strongly non-Gaussian the given ensemble realizations are (e.g., plateaulike), the estimated probability densities may differ markedly from the standard approach. This standard approach amounts to estimating a single Gaussian density from the mean and the variance of the forecast values of all ensemble realizations (Fig. 1b). The final likelihood (Fig. 1c) is calculated from a convolution of the observational PDF (dotted line in Fig. 1b) with one of the prior densities. It depends strongly on the ratio between the ensemble and the observational variance. In this example the ensemble variance is 0.48 K2, the observational variance 0.75 K2, and the kernel dressing variance 0.1 K2. Therefore, the likelihood collapses to a single Gaussian even in the kernel dressing case. Besides the general approach, this example also illustrates the influence and importance of uncertain observations upon the verification.
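For illustration, the univariate construction can be written down directly. The mixture form of the convolution (the dressing and observation variances simply add) is a sketch following Bröcker and Smith (2008); the function and variable names are our own choosing:

```python
import numpy as np

def kernel_dressing_pdf(x, members, dress_var):
    """Unweighted average of Gaussians with identical variance dress_var,
    each centred at one ensemble member (black line in Fig. 1a)."""
    m = np.asarray(members, dtype=float)
    return np.mean(np.exp(-0.5 * (x - m) ** 2 / dress_var)
                   / np.sqrt(2.0 * np.pi * dress_var))

def likelihood(obs, members, dress_var, obs_var):
    """Convolution of the observational Gaussian with the dressed prior:
    again a Gaussian mixture, with the two variances added."""
    m = np.asarray(members, dtype=float)
    v = dress_var + obs_var
    return np.mean(np.exp(-0.5 * (obs - m) ** 2 / v)
                   / np.sqrt(2.0 * np.pi * v))

# variances as in the example: dressing 0.1 K^2, observation 0.75 K^2
# (the member and observation values below are made up for illustration)
p = likelihood(285.3, [284.9, 285.4, 285.8, 286.1], 0.1, 0.75)
```

Because the observational variance (0.75 K2) dominates both the dressing variance (0.1 K2) and the ensemble spread here, the resulting mixture is nearly indistinguishable from a single Gaussian, as noted above.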

The multivariate case proceeds in a similar manner. Instead of a single dressing variance the dressing covariance matrix is used and the predictive probability density is constructed by a finite series of multivariate Gaussian densities centered at the single realizations. Details can be found in Schölzel and Hense (2010).
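A minimal multivariate version, our own sketch, consistent with Schölzel and Hense (2010) in that the dressing and observation covariance matrices add under the convolution:

```python
import numpy as np

def mv_likelihood(obs, members, sigma_dress, sigma_obs):
    """Likelihood of an uncertain observation vector obs (length q) under a
    kernel-dressed ensemble: a mixture of multivariate Gaussians centred at
    the K members, each with covariance sigma_dress + sigma_obs."""
    m = np.atleast_2d(members)                 # (K, q)
    q = m.shape[1]
    cov = sigma_dress + sigma_obs              # (q, q)
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** q * np.linalg.det(cov))
    d = np.asarray(obs) - m                    # (K, q) deviations
    # quadratic (Mahalanobis-type) form for each member
    expo = np.exp(-0.5 * np.einsum("ki,ij,kj->k", d, inv, d))
    return norm * float(np.mean(expo))
```

For q = 1 this reduces exactly to the univariate mixture of the previous subsection.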

c. The multivariate case

This section presents the first results of the full Bayesian verification of the available ensemble forecasts for July 2007, starting with the verification of SREPS and LEPS against the COSMO-EU analysis. Then the comparison of two ensembles, the DE-EPS with the SREPS, is presented. The log Bayes’s factor can be evaluated for each available day and forecast lead time. Note that the multivariate character of the analysis is defined by treating jointly the temperature forecasts and observations at eight pressure levels and three stations. The dimension of the vectors is 24 if not otherwise stated.
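The evaluation itself reduces to a difference of log evidences per day and lead time, followed by averaging over the available days. A small helper (naming is our own):

```python
import numpy as np

def log_bayes_factor(loglik_model, loglik_ref):
    """log B_ir = log p(y | m_i) - log p(y | m_r), evaluated per day and
    forecast lead time. Values near zero mean the forecast is nearly as
    likely as the reference (here the analysis); strongly negative values
    mean the forecast is very unlikely given the reference."""
    return np.asarray(loglik_model, dtype=float) - np.asarray(loglik_ref, dtype=float)

def mean_log_bayes_factor(daily_loglik_model, daily_loglik_ref):
    """Average the log Bayes's factor over the available days, as done for
    the 5-day and 21-day periods in the text."""
    return float(np.mean(log_bayes_factor(daily_loglik_model, daily_loglik_ref)))
```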

In Fig. 2, the time series of the log Bayes’s factor for an NWP ensemble is shown. The Bayes’s factor is below zero because an analysis is used as the reference model, and it decreases with time because the forecast error grows with lead time. At the beginning of the forecast the Bayes’s factor is still near zero, showing a high level of confidence in favor of the model forecasts (see Table 1) in the first few hours. Later, the model skill of the SREPS lies between a medium and a low level of confidence. At the forecast time of 18 h, the figure shows a strong decrease of evidence against the SREPS, indicating a very unlikely forecast of the vertical temperature profile given the vertical profile of the COSMO-EU analysis. The comparison of the forecasted SREPS temperature profiles with the observations shows a strong cold bias in the boundary layer (not shown) that is responsible for this drop in the score. In the case of only one station, the log Bayes’s factor is also close to zero at the 36-h forecast, which indicates a very good forecast. The cause is certainly the short period of only one ensemble run as well as the use of only one radiosonde station, which is not exactly equal to the analysis. The Bayesian approach considers explicitly an error of the observation; theoretically, the forecasted profile could be closer to the observation than the analysis, leading to a log Bayes’s factor greater than zero. The case of three stations has the advantage of an increased representativeness of the result. Finally, using the 0-h forecast from later runs of the SREPS as the reference model has the benefit of verifying the SREPS forecast against its own initial state, but the disadvantage of a lower temporal resolution because the SREPS was initialized only every 24 h. The previous results with the COSMO-EU analysis as the reference model agree with the result in this case.

Fig. 2.

Time series of the Bayes’s factor logBir of the temperature at forecast time (indicated as vv time on the abscissa) with respect to the analysis at the same time. The black solid line shows the Bayes’s factor for one station (Stuttgart) and the dashed line the average over three stations (Stuttgart, Idar-Oberstein, and Nancy). The dotted–dashed line shows the Bayes’s factor for the three stations treated jointly, with the initial state of the SREPS from a later run as the reference model. The gray band describes a significant area (see Table 1).


d. Verification of ensembles

For the investigation in this section the log Bayes’s factor is averaged over 5 days of July 2007. The time of the investigation contains the intensive observation period IOP8b (15 July 2007) of the COPS campaign (Wulfmeyer et al. 2008).

In Fig. 3 the reference model for both SREPS and LEPS is the COSMO-EU analysis. At the forecast start time, the SREPS has a Bayes’s factor below the corresponding value of the LEPS. This can be explained by the initialization of the ensembles: the LEPS is initialized close to an analysis of the global ECMWF-EPS, while the SREPS initial data are interpolated 12-h forecasts of four different global deterministic NWP forecast systems. The Bayes’s factor of the SREPS (Fig. 3a) shows at the 18-h forecast the same decrease of confidence against the analysis as the LEPS at the 6-h forecast (Fig. 3b). This reflects the time shift inherent in the LEPS being initialized 12 h later than the SREPS. The decrease of confidence at the 18-h lead time has also been seen in the case study (Fig. 2) and is caused by a cold bias in the boundary layer occurring in nearly all ensemble forecasts of the investigated period. The cold bias is strongest for start times at the beginning of the period and decreases toward its end. The standard deviation of the SREPS, represented by the vertical bar, is caused by the quite strong variation of the likelihood of the SREPS forecast at this lead time; the standard deviation of the LEPS shows smaller variations. Furthermore, the Bayes’s factor of the LEPS does not decrease as strongly as that of the SREPS, owing to a weaker cold bias of the LEPS.

Fig. 3.

Time series of the log Bayes’s factor logBir of the temperature at forecast time (vv time) averaged over 5 days and three stations for (a) the SREPS and (b) the LEPS. The initial time of the LEPS is 12 h later than the initial time of the SREPS. The gray band describes a significant area (see Table 1). The vertical bar shows the standard deviation of the Bayes’s factor.


The model skills of both ensembles show a “high level of confidence” (Table 1) during the first 18 h of the forecast time, except for the decrease of evidence due to the cold bias in the boundary layer mentioned before. The Bayes’s factors decrease with lead time, but even at the 72-h forecast the model skills of both ensembles show a “medium level of confidence.” However, the standard deviation of the LEPS is smaller than that of the SREPS over the whole period, showing that there is more variability in the ensemble forecasts of the SREPS. This is influenced by the use of the interpolated 12-h forecasts of four different global models as the initial state, which leads partly to very unlikely forecasts of temperature profiles.

e. Comparison of ensembles

The results in this section were calculated individually for the 21 days of August 2007 on which both observations and forecasts are available. The log Bayes’s factor was then averaged over these days to obtain a clearer signal.

Figure 4a shows that the DE-EPS is more likely than the SREPS over the entire forecast range. This result can be explained by the increased resolution of the DE-EPS of 2.8 km horizontally and 50 model levels in the vertical, in contrast to the 10-km grid spacing and 40 model levels of the SREPS. In the DE-EPS, fewer parameterizations are needed and more physical processes (e.g., convection) are explicitly resolved by the model, a possible cause for the prediction of a more likely temperature profile. Another reason for the good performance of the DE-EPS is a false prediction of the vertical structure of the atmosphere by the SREPS in two subsequent ensemble runs. In these two runs, the passage of a ridge (not shown) is predicted too early, leading to a significant warm temperature bias at almost all levels up to 200 hPa. This false prediction is due to the initialization of the SREPS, which uses not an analysis but the 12-h forecasts of the global models as the initial state. This has to be kept in mind when interpreting the clear result for the DE-EPS based on the Bayes’s factor.

Fig. 4.

Time series of the Bayes’s factor of COSMO-DE-EPS at forecast time (vv time) with respect to SREPS. (a) The multivariate case for the temperature at eight vertical levels and (b) the univariate case of the temperature at 850 hPa. The gray band describes a significant area (see Table 1). The vertical bars show the standard deviation of the respective Bayes’s factor.


The evidence for the DE-EPS in the case of arithmetic averaging over the three vertical profiles is strong (see Table 1), but it is even larger (“decisive”) in the case of the simultaneous, joint treatment of the three profiles. Here the dimension of the model state vector is q = 24, with eight levels at three stations. The error covariance matrix is estimated by the graphical lasso method of Friedman et al. (2007), because the maximum likelihood estimate of the covariance matrix is singular as a consequence of the ensemble size Ki = 16 (SREPS) or Ki = 20 (DE-EPS) being smaller than the dimension q of the model state vector. Apparently there are correlations between the stations in the simulations as well as in the observations, which are explicitly resolved in the joint treatment but averaged out in the first case, and which are responsible for the forecast skill. Furthermore, it seems necessary to investigate a longer time period to confirm this evidence for the DE-EPS; the large standard deviation for the 1-month investigation period of August 2007 in Fig. 4a gives an indication of this. Figure 4b shows the results for the univariate case with the 850-hPa temperature as the tested variable. In this case all relevant correlations between the temperature values at various levels and stations are completely lost, and it is not possible to decide which ensemble system is more likely. This clearly shows the advantage of the multivariate approach.

The univariate results of the Bayes’s factor additionally allow us to compare the Bayes’s factor with standard probabilistic scores. Table 3 shows the ignorance score (IGN) and the continuous ranked probability score (CRPS). The skill scores based on the IGN and on the CRPS agree quite well with each other. However, the comparison of these two scores with the Bayes’s factor shows that the Bayes’s factor is neutral at the 12-h forecast lead time, whereas the other two scores show a clear skill for the DE-EPS. The mean square error (MSE) of the ensemble mean is indeed smaller for the DE-EPS than for the SREPS (not shown). However, the standard probabilistic scores do not consider the observation error, which here has the same order of magnitude as the MSE for both ensembles. The Bayes’s factor considers this observation error explicitly, which prevents one EPS from being judged as more likely than the other. For the other forecast lead times, the two EPSs are quite close together.
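For a Gaussian predictive distribution, both scores have closed forms; the following sketch uses the standard formulas (e.g., the CRPS expression given by Gneiting and Raftery 2007) applied to a generic predictive mean and standard deviation:

```python
import math

def ign_gauss(y, mu, sigma):
    """Ignorance score: negative log of the Gaussian predictive density
    N(mu, sigma^2) evaluated at the observation y."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi * sigma**2) + 0.5 * z**2

def crps_gauss(y, mu, sigma):
    """Closed-form CRPS of a Gaussian predictive distribution
    (Gneiting and Raftery 2007)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

Unlike the Bayes’s factor, neither score carries an explicit observation-error term: y enters as a perfect value, which is exactly the difference discussed above.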

Table 3.

Comparison of the Bayes’s factor (see Fig. 4b) with the IGN and the CRPS as skill scores (SS).


In Bayesian statistics it is possible to vary the prior probability P[mi(τ)] for the models under investigation, here the DE-EPS and the SREPS. This prior probability encodes existing knowledge about the prediction systems that is not contained in the actual ensemble forecasts to be analyzed, for instance a probabilistic statement from an earlier verification attempt or a subjective assessment of the ensemble forecasts by human forecasters. Even if such specific prior information is not available, as in our case, studying the statistical results with respect to possible variations in the prior probability P[mi(τ)] is a straightforward way to assess the robustness of the comparison to different, possibly subjective perceptions of prior belief. Therefore, we study the sensitivity of the statistical results with respect to such variations. Because only two models are compared we have
P[mde-eps(τ)] + P[msreps(τ)] = 1.  (11)

If P[mde-eps(τ)] is larger (smaller) than P[msreps(τ)], the personal belief in the DE-EPS (e.g., of a professional forecaster) is higher (lower) than in the SREPS. The effect of varying the prior probabilities on the Bayes’s factor for the comparison of the DE-EPS with the SREPS is shown in Fig. 5. The figure shows that the evidence for the DE-EPS remains higher than for the SREPS even if the prior probability of the DE-EPS (e.g., as ranked by a human forecaster) is as low as 0.2. Note that the case shown in Fig. 4a is included here as the case in which both ensembles have the same prior probability P(mde-eps) = P(msreps) = 0.5.
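The prior sensitivity amounts to adding the log prior odds to the data-driven log Bayes’s factor. A minimal sketch of the sweep shown in Fig. 5; the value of the log Bayes’s factor itself comes from the data, so a placeholder is used here:

```python
import math

def posterior_log_odds(log_b, prior_de_eps):
    """Combine the log Bayes's factor with prior probabilities
    P(m_de-eps) = prior_de_eps and P(m_sreps) = 1 - prior_de_eps [Eq. (11)];
    for equal priors (0.5) the log Bayes's factor is unchanged."""
    return log_b + math.log(prior_de_eps / (1.0 - prior_de_eps))

# sweep the prior from 0.01 to 0.99 as in Fig. 5 (log_b = 2.0 is a placeholder)
sweep = [posterior_log_odds(2.0, p / 100.0) for p in range(1, 100)]
```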

Fig. 5.

The distribution of the Bayes’s factor logBir for the DE-EPS (mde-eps) and the SREPS (msreps) as the prior of mde-eps (msreps) varies from 0.01 to 0.99 (from 0.99 to 0.01). The forecast lead time is denoted by vv.


5. Conclusions

The aim of this analysis based on Bayesian statistics was to present a multivariate method for the verification and comparison of ensemble forecasts. We tested the approach using vertical temperature profiles with respect to the predictability of the convection initiation potential. It was shown that the forecasted temperature profiles of the DE-EPS are much more likely than those of the SREPS, even if the prior belief of a professional forecaster in the DE-EPS is as low as 0.2. This result shows that the short-range ensemble weather forecast from the convection-permitting COSMO-DE model (the DE-EPS) seems to be a valid and useful way to quantify the uncertainty of short-range weather forecasts, at least for the hindcasts performed for August 2007. However, a longer investigation period seems necessary to confirm these results. So far only the temperature at three radiosonde stations within the COPS area has been investigated. Further variables such as the equivalent potential temperature or the temperature lapse rate have to be analyzed to learn more about the predictability of the convection potential. Both quantities are important for the vertical stability of the atmosphere; the first one includes the impact of humidity.

Generally, it has been shown that the statistical model described in this study is appropriate for comparing ensemble systems with each other. The score we propose is a generalization of the ignorance score that takes into account the uncertainty of the observations as well as the spatial correlation structure of the verified forecasts. Considering the observation uncertainty is important because observations generally cannot be assumed to be perfect and do not represent exactly the true state of the atmosphere, as most other verification methods assume. In our case, and for most current regional high-resolution NWP models, the major error source probably stems from the radiosonde measurements, which are assigned to the vertical atmospheric column above the starting position without considering the horizontal drift.

The Bayes’s factor allows for a comprehensive evaluation of the forecast quality of three-dimensional samples by using just one score. Even an extension to temporal–spatial structures is readily possible provided that there is a method available to estimate nonsingular covariance or correlation matrices of high-dimensional state vectors. For this we introduced the method recently developed by Friedman et al. (2007) called the graphical lasso method, which is specifically designed to estimate nonsingular covariance matrices and their inverse from small samples.

Acknowledgments

The authors wish to thank the Deutsche Forschungsgemeinschaft (DFG) for funding the priority program SPP1167 in which this work was embedded. Furthermore, we thank Susanne Theis from the DWD for her assistance with the COSMO-DE-EPS. We are grateful to the D-Phase project from which we received the COSMO-SREPS data used for this investigation. The data of the D-Phase project were provided by the CERA database of the Max Planck Institute for Meteorology (MPI) in Hamburg, Germany. We also thank the anonymous reviewers for their helpful suggestions that improved this work.

APPENDIX

The gLasso Method

The basic starting point for the gLasso method is to estimate a sparse matrix Θ that is the inverse of a covariance matrix and that maximizes the log-likelihood function J of N multivariate normally distributed samples of dimension q, penalized by an additional term Jpen; J is already partially maximized with respect to the mean μ. Here S denotes the standard sample estimate of the covariance matrix, det Θ is the determinant of Θ, and the factor γ controls the influence of the penalty term:

J(Θ) = log det Θ − tr(SΘ) + Jpen(Θ).  (A1)
The Hammersley–Clifford theorem (Wainwright and Jordan 2008, p. 45) proves that two components i and j of a Gaussian-distributed random vector variable are conditionally independent given the remaining components if the entry Θij is zero. Here this Gaussian-distributed random vector variable is the temperature value at various levels. Therefore, it makes sense to require as many entries Θij as possible to be zero as additional information for estimating the covariance matrix and its inverse. Similar procedures applied to the covariance matrix in data assimilation are called localization. This can be achieved by a penalty term that sums the absolute values of the matrix entries, the so-called ℓ1 matrix norm of Θ:

Jpen(Θ) = −γ Σi,j |Θij| = −γ ‖Θ‖1.  (A2)

The penalization of the absolute values guarantees that the maximum of J can be attained at Θij = 0 (Knight and Fu 2000). The procedure of finding the extreme value of J is called the least absolute shrinkage and selection operator (lasso). If the vector components are viewed as nodes of a network that are linked if they are not conditionally independent and unconnected if they are, the method defines a so-called graph (the joint set of nodes and links) or graphical model for the interactions of the vector components; hence the term gLasso. The interpretation of the method is that a nonsingular matrix Θ is estimated whose inverse Θ−1 is as similar as possible to the standard sample covariance matrix S while Θ itself has the fewest nonzero entries, i.e., the smallest graph necessary to explain the covariances among the vector components.

We use the algorithm of Friedman et al. (2007) available through their program package gLasso for the R programming environment (http://www-stat.stanford.edu/~tibs/glasso/).
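As a sketch, the penalized log-likelihood that the gLasso maximizes can be evaluated directly. This only evaluates the objective for a candidate Θ; the maximization itself is performed by the cited R package, or by comparable implementations such as scikit-learn's GraphicalLasso:

```python
import numpy as np

def glasso_objective(theta, S, gamma):
    """Penalized log-likelihood J(Theta) = log det(Theta) - tr(S Theta)
    - gamma * sum_ij |Theta_ij|, to be maximized over positive-definite
    Theta; gamma controls the strength of the sparsity penalty."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:          # not a valid precision matrix
        return -np.inf
    return logdet - np.trace(S @ theta) - gamma * np.abs(theta).sum()
```

For γ = 0 and a nonsingular S the maximizer is simply Θ = S⁻¹; the penalty term shrinks entries of Θ toward zero and keeps the estimate nonsingular even when the sample size N is smaller than the dimension q.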

REFERENCES

  • Arpagaus, M., and Coauthors, 2009: MAP D-Phase: Demonstrating forecast capabilities for flood events in the Alpine region. Report on the WWRP Forecast Demonstration Project D-PHASE, WWRP Joint Scientific Committee, Rep. 78, 79 pp. [Available online at http://www.meteoschweiz.admin.ch/web/de/forschung/publikationen/alle_publikationen/veroeff_78.Par.0001.DownloadFile.tmp/veroeff78.pdf.]

  • Baldauf, M., K. Stephan, S. Klink, C. Schraff, A. Seifert, J. Förstner, T. Reinhardt, and C. J. Lenz, 2006: The new very short range forecast model LMK for the convection-resolving scale. Extended Abstracts, Second THORPEX Int. Science Symp., Part B, Landshut, Bavaria, Germany, WMO, 148–149. [Available online at http://www.pa.op.dlr.de/stiss/proceedings.html.]

  • Berger, J. O., 1985: Statistical Decision Theory and Bayesian Analysis. 2nd ed. Springer Series in Statistics, Vol. XVI, Springer, 617 pp.

  • Bouttier, F., and P. Courtier, 1999: Data assimilation concepts and methods. ECMWF Lecture Notes, NWP Course, 58 pp.

  • Bowler, N. E., 2007: Accounting for the effect of observation errors on verification of MOGREPS. NWP Tech. Rep. 506, 7 pp.

  • Bröcker, J., 2009: Reliability, sufficiency, and the decomposition of proper scores. Quart. J. Roy. Meteor. Soc., 135, 1512–1519.

  • Bröcker, J., and L. Smith, 2007: Scoring probabilistic forecasts: The importance of being proper. Wea. Forecasting, 22, 382–388.

  • Bröcker, J., and L. Smith, 2008: From ensemble forecast to predictive distribution function. Tellus, 60, 663–678.

  • Browning, K. A., and Coauthors, 2007: The convective storm initiation project. Bull. Amer. Meteor. Soc., 88, 1939–1955.

  • Candille, G., and O. Talagrand, 2008: Impact of observational error on the validation of ensemble prediction systems. Quart. J. Roy. Meteor. Soc., 134, 959–971.

  • Friedman, J., T. Hastie, and R. Tibshirani, 2007: Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441.

  • Gebhardt, C., S. E. Theis, M. Paulat, and Z. B. Bouallegue, 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model perturbations and variation of lateral boundaries. Atmos. Res., 100, 168–177.

  • Gneiting, T., and E. A. Raftery, 2007: Strictly proper scoring rules, prediction and estimation. J. Amer. Stat. Assoc., 102, 359–378.

  • Guichard, F., J. C. Petch, and J.-L. Redelsperger, 2004: Modelling the diurnal cycle of deep precipitation convection over land with cloud-resolving models and single-column models. Quart. J. Roy. Meteor. Soc., 130, 3139–3172.

  • Hense, A., and U. Römer, 1995: Statistical analysis of tropical climate anomaly simulations. Climate Dyn., 11, 178–192.

  • Hense, A., G. Adrian, C. Kottmeier, C. Simmer, and V. Wulfmeyer, 2006: The German priority Program SPP1167 PQP Quantitative Precipitation Forecast: An overview. Second Int. Symp. on Quantitative Precipitation Forecasting (QPF) and Hydrology, Boulder, CO, NCAR, 17 pp. [Available online at http://www.mmm.ucar.edu/events/qpf06/QPF/Session7/hense_Simmer_PQP_Overview.pdf.]

  • Jonko, A. K., A. Hense, and J. J. Feddema, 2009: Effect of land cover change on the tropical circulation in a GCM. Climate Dyn., 35, 635–649, doi:10.1007/s00382-009-0684-7.

  • Kass, R. E., and A. E. Raftery, 1995: Bayes factors. J. Amer. Stat. Assoc., 90, 773–795.

  • Kitchen, M., 1989: Representativeness errors for radiosonde observations. Quart. J. Roy. Meteor. Soc., 115, 673–700.

  • Knight, K., and W. Fu, 2000: Asymptotics for lasso-type estimators. Ann. Stat., 28, 1356–1378.

  • Maesschalck, R. D., D. Jouan-Rimbaud, and D. L. Massart, 2000: The Mahalanobis distance. Chemom. Intell. Lab. Syst., 50, 1–18.

  • Mahalanobis, P. C., 1936: On the generalised distance in statistics. Proc. Natl. Inst. Sci. India, 12, 49–55.

  • Malakoff, D., 1999: Statistics: A brief guide to Bayes Theorem. Science, 286, 1461, doi:10.1126/science.286.5444.1461.

  • Marsigli, C., F. Boccanera, A. Montani, and T. Paccagnella, 2005: The COSMO-LEPS mesoscale ensemble system: Validation of the methodology and verification. Nonlinear Processes Geophys., 12, 527–536.

  • Marsigli, C., A. Montani, and T. Paccagnella, 2006: The COSMO-SREPS project. Newsletter of the 28th EWGLAM and 13th SRNWP Meetings, Zurich, Switzerland, 267–274.

  • Min, S.-K., and A. Hense, 2006: A Bayesian approach to climate model evaluation and multi-model averaging with an application to global mean surface temperatures from IPCC AR4 coupled climate models. Geophys. Res. Lett., 33, L08708, doi:10.1029/2006GL025779.

  • Min, S.-K., A. Hense, H. Paeth, and W.-T. Kwon, 2004: A Bayesian decision method for climate change signal analysis. Meteor. Z., 13, 421–436.

  • Murphy, A. H., and R. L. Winkler, 1984: Probability forecasting in meteorology. J. Amer. Stat. Assoc., 79, 489–500.

  • Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 126, 649–667.

  • Schättler, U., G. Doms, and C. Schraff, 2009: A description of the nonhydrostatic regional model LM. Part VII: Users guide. Deutscher Wetterdienst (DWD), Offenbach, Germany, 142 pp.

  • Schölzel, C., and A. Hense, 2010: Probabilistic assessment of regional climate change in Southwest Germany by ensemble dressing. Climate Dyn., 36, 2003–2014, doi:10.1007/s00382-010-0815-1.

  • DelSole, T., and M. K. Tippett, 2007: Predictability: Recent insights from information theory. Rev. Geophys., 45, RG4002, doi:10.1029/2006RG000202.

  • Wainwright, M. J., and M. I. Jordan, 2008: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn., 1, 1–305, doi:10.1561/2200000001. [Available online at http://dl.acm.org/citation.cfm?id=1498840.1498841.]

  • Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble–MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390.

  • Wulfmeyer, V., A. Behrendt, and H.-S. Bauer, 2008: The convective and orographically induced precipitation study. Bull. Amer. Meteor. Soc., 89, 1477–1486.
Save
  • Arpagaus, M., and Coauthors, 2009: MAP D-Phase: Demonstrating forecast capabilities for flood events in the Alpine region. Report on the WWRP Forecast Demonstration Project D-PHASE, WWRP Joint Scientific Committee, Rep. 78, 79 pp. [Available online at http://www.meteoschweiz.admin.ch/web/de/forschung/publikationen/alle_publikationen/veroeff_78.Par.0001.DownloadFile.tmp/veroeff78.pdf.]

  • Baldauf, M., K. Stephan, S. Klink, C. Schraff, A. Seifert, J. Förstner, T. Reinhardt, and C. J. Lenz, 2006: The new very short range forecast model LMK for the convection-resolving scale. Extended Abstracts, Second THORPEX Int. Science Symp., Part B, Landshut, Bavaria, Germany, WMO, 148–149. [Available online at http://www.pa.op.dlr.de/stiss/proceedings.html.]

  • Berger, J. O., 1985: Statistical Decision Theory and Bayesian Analysis. 2nd ed. Springer Series in Statistics, Vol. XVI, Springer, 617 pp.

  • Bouttier, F., and P. Courtier, 1999: Data assimilation concepts and methods. ECMWF—Lecture Notes NWP Course, 58 pp.

  • Bowler, N. E., 2007: Numerical Weather Prediction Accounting for the effect of observation errors on verification of MOGREPS. NWP Tech. Rep. 506, 7 pp.

  • Bröcker, J., 2009: Reliability, sufficiency, and the decomposition of proper scores. Quart. J. Roy. Meteor. Soc., 135, 15121519.

  • Bröcker, J., and L. Smith, 2007: Scoring probabilistic forecasts: The importance of being proper. Wea. Forecasting, 22, 382388.

  • Bröcker, J., and L. Smith, 2008: From ensemble forecast to predictive distribution function. Tellus, 60, 663678.

  • Browning, K. A., and Coauthors, 2007: The convective storm initiation project. Bull. Amer. Meteor. Soc., 88, 19391955.

  • Candille, G., and O. Talagrand, 2008: Impact of observational error on the validation of ensemble prediction systems. Quart. J. Roy. Meteor. Soc., 134, 959971.

    • Search Google Scholar
    • Export Citation
  • Friedman, J., T. Hastie, and R. Tibshirani, 2007: Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432441.

    • Search Google Scholar
    • Export Citation
  • Gebhardt, C., S. E. Theis, M. Paulat, and Z. B. Bouallegue, 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model pertubations and variation of lateral boundaries. Atmos. Res., 100, 168177.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., and E. A. Raftery, 2007: Strictly proper scoring rules, prediction and estimation. J. Amer. Stat. Assoc., 102, 359378.

  • Guichard, F., J. C. Petch, and J.-L. Redelsperger, 2004: Modelling the diurnal cycle of deep precipitation convection over land with cloud-resolving models and single-column models. Quart. J. Roy. Meteor. Soc., 130, 31393172.

    • Search Google Scholar
    • Export Citation
  • Hense, A., and U. Römer, 1995: Statistical analysis of tropical climate anomaly simulations. Climate Dyn., 11, 178192.

  • Hense, A., G. Adrian, C. Kottmeier, C. Simmer, and V. Wulfmeyer, 2006: The German priority Program SPP1167 PQP Quantitative Precipitation Forecast: An overview. Second Int. Symp.on Quantitative Precipitation Forecasting (QPF) and Hydrology, Boulder, CO, NCAR, 17 pp. [Available online at http://www.mmm.ucar.edu/events/qpf06/QPF/Session7/hense_Simmer_PQP_Overview.pdf.]

  • Jonko, A. K., A. Hense, and J. J. Feddema, 2009: Effect of land cover change on the tropical circulation in a GCM. Climate Dyn., 35, 635649, doi:10.1007/s00382-009-0684-7.

    • Search Google Scholar
    • Export Citation
  • Kass, E., and A. E. Raftery, 1995: Bayes factors. J. Amer. Stat. Assoc., 90, 773795.

  • Kitchen, M., 1989: Representativeness errors for radiosonde observations. Quart. J. Roy. Meteor. Soc., 115, 673700.

  • Knight, K., and W. Fu, 2000: Asymptotics for lasso-type estimators. Ann. Stat., 28, 13561378.

  • Maesschalck, R. D., D. Jouan-Rimbaud, and D. L. Massart, 2000: The Mahalanobis distance. Chem. Intell. Lab. Syst., 50, 118.

  • Mahalanobis, P. C., 1936: On the generalised distance in statistics. Proc. Natl. Inst. Sci. India, 12, 4955.

  • Malakoff, D., 1999: Statistics: A brief guide to Bayes Theorem. Science, 286, 1461, doi:10.1126/science.286.5444.1461.

  • Marsigli, C., F. Boccanera, A. Montani, and T. Paccagnella, 2005: The COSMO-LEPS mesoscale ensemble system: Validation of the methodology and verification. Nonlinear Processes Geophys., 12, 527536.

  • Marsigli, C., A. Montani, and T. Paccagnella, 2006: The COSMO-SREPS project. Newsletter of the 28th EWGLAM and 13th SRNWP Meetings, Zurich, Switzerland, 267–274.

  • Min, S.-K., and A. Hense, 2006: A Bayesian approach to climate model evaluation and multi-model averaging with an application to global mean surface temperatures from IPCC AR4 coupled climate models. Geophys. Res. Lett., 33, L08708, doi:10.1029/2006GL025779.

  • Min, S.-K., A. Hense, H. Paeth, and W.-T. Kwon, 2004: A Bayesian decision method for climate change signal analysis. Meteor. Z., 13, 421–436.

  • Murphy, A. H., and R. L. Winkler, 1984: Probability forecasting in meteorology. J. Amer. Stat. Assoc., 79, 489–500.

  • Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 126, 649–667.

  • Schättler, U., G. Doms, and C. Schraff, 2009: A description of the nonhydrostatic regional model LM. Part VII: User's guide. Deutscher Wetterdienst (DWD), Offenbach, Germany, 142 pp.

  • Schölzel, C., and A. Hense, 2010: Probabilistic assessment of regional climate change in Southwest Germany by ensemble dressing. Climate Dyn., 36, 2003–2014, doi:10.1007/s00382-010-0815-1.

  • DelSole, T., and M. K. Tippett, 2007: Predictability: Recent insights from information theory. Rev. Geophys., 45, RG4002, doi:10.1029/2006RG000202.

  • Wainwright, M. J., and M. I. Jordan, 2008: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn., 1, 1–305, doi:10.1561/2200000001. [Available online at http://dl.acm.org/citation.cfm?id=1498840.1498841.]

  • Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble–MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390.

  • Wulfmeyer, V., A. Behrendt, and H.-S. Bauer, 2008: The convective and orographically induced precipitation study. Bull. Amer. Meteor. Soc., 89, 1477–1486.

  • Fig. 1.

    Univariate temperature PDF at one level (850 hPa) and one grid point of the 6-h SREPS forecast on 15 Jul 2007. The raw ensemble forecasts are marked by "+" and the observations by a separate symbol. (a) Construction of the kernel dressing prior PDF. (b) The kernel dressing prior PDF together with the Gaussian PDF describing the observations; the dashed curve in both panels shows the classic Gaussian density for the SREPS ensemble, defined by the mean and variance of the raw forecasts. (c) The likelihood function of the observations for the kernel as well as the Gaussian prior density; the vertical gray line marks the point at which the likelihoods are evaluated.
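    The construction in Fig. 1 can be sketched numerically: the kernel dressing prior is a Gaussian mixture centered on the raw ensemble members, and the likelihood of the uncertain observation follows by convolving each Gaussian (kernel or classic) with the observation-error density. The member values, bandwidth, and observation below are hypothetical illustration numbers, not data from the study.

    ```python
    import numpy as np

    def kernel_dressing_pdf(x, members, h):
        """Kernel dressing prior: equal-weight Gaussian mixture centered on the members."""
        x = np.asarray(x, dtype=float)[..., None]
        return np.mean(np.exp(-0.5 * ((x - members) / h) ** 2)
                       / (h * np.sqrt(2.0 * np.pi)), axis=-1)

    def gaussian_pdf(x, mu, sigma):
        """Classic Gaussian density (dashed curve in Fig. 1a,b)."""
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    members = np.array([285.1, 285.6, 286.0, 286.3, 287.0])  # hypothetical 850-hPa temps (K)
    h = 0.4                                                  # assumed kernel bandwidth (K)
    obs, obs_err = 285.8, 0.5                                # observation and its std error (K)

    # Likelihood of the observation: convolution of prior and obs-error Gaussian
    # widens each kernel (variances add).
    like_kernel = kernel_dressing_pdf(obs, members, np.hypot(h, obs_err))
    like_gauss = gaussian_pdf(obs, members.mean(),
                              np.hypot(members.std(ddof=1), obs_err))
    ```

    The mixture retains multimodal structure from the raw members that the single Gaussian smooths away, which is exactly the contrast panels (a) and (b) illustrate.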

  • Fig. 2.

    Time series of the log Bayes factor logBir of the temperature at forecast time (indicated as vv time on the abscissa) with respect to the analysis at the same time. The solid black line shows the Bayes factor for one station (Stuttgart) and the dashed line the average over three stations (Stuttgart, Idar-Oberstein, and Nancy). The dotted–dashed line shows the Bayes factor for the three stations treated jointly, with the initial state of the SREPS from a later run as the reference model. The gray band marks the significance range (see Table 1).
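    The quantity plotted here is the difference of log predictive densities between the candidate and the reference model at each station; under an assumed independence of stations, the joint treatment sums the per-station values. A minimal sketch with made-up log-likelihood values:

    ```python
    import numpy as np

    def log_bayes_factor(loglik_model, loglik_ref):
        """log B_ir = log p(obs | m_i) - log p(obs | m_r); positive favors m_i."""
        return np.asarray(loglik_model) - np.asarray(loglik_ref)

    # Hypothetical per-station log-likelihoods at one forecast time
    ll_i = np.array([-1.2, -0.8, -1.5])   # candidate ensemble
    ll_r = np.array([-1.6, -1.1, -1.4])   # reference model

    per_station = log_bayes_factor(ll_i, ll_r)  # one value per station (solid line)
    averaged = per_station.mean()               # station average (dashed line)
    joint = per_station.sum()                   # stations treated jointly (dotted-dashed)
    ```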

  • Fig. 3.

    Time series of the log Bayes factor logBir of the temperature at forecast time (vv time), averaged over 5 days and three stations, for (a) the SREPS case and (b) the LEPS case. The initial time of the LEPS is 12 h later than that of the SREPS. The gray band marks the significance range (see Table 1). The vertical bars show the standard deviation of the Bayes factor.

  • Fig. 4.

    Time series of the log Bayes factor of COSMO-DE-EPS at forecast time (vv time) with respect to SREPS: (a) the multivariate case with temperature at eight vertical levels and (b) the univariate (850 hPa) temperature case. The gray band marks the significance range (see Table 1). The vertical bars show the standard deviation of the respective Bayes factor.

  • Fig. 5.

    Distribution of the log Bayes factor logBir for DE-EPS (mde-eps) and SREPS (msreps) as the prior probability of mde-eps (msreps) varies from 0.01 to 0.99 (from 0.99 to 0.01). The forecast lead time is denoted by vv.
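    Sweeping the prior as in Fig. 5 probes how strongly the data reweight the two models, since posterior odds equal the Bayes factor times the prior odds. A generic sketch of that standard Bayesian model-comparison relation (not code from the paper):

    ```python
    import numpy as np

    def posterior_prob(log_B, prior):
        """Posterior probability of model m_i: posterior odds = B * prior odds."""
        B = np.exp(log_B)
        return prior * B / (prior * B + (1.0 - prior))

    priors = np.linspace(0.01, 0.99, 99)
    post = posterior_prob(1.0, priors)  # a log Bayes factor of 1 raises every prior
    ```

    For logBir > 0 the posterior probability of m_i exceeds its prior at every prior value, which is why a Bayes factor consistently above the significance band indicates support for the candidate model regardless of the prior chosen.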
