Search Results

You are looking at 1 - 4 of 4 items for:

  • Author or Editor: Laurence Wilson
  • Monthly Weather Review
Laurence J. Wilson, Stephane Beauregard, Adrian E. Raftery, and Richard Verret


Bayesian model averaging (BMA) has recently been proposed as a way of correcting underdispersion in ensemble forecasts. BMA is a standard statistical procedure for combining predictive distributions from different sources. The output of BMA is a probability density function (pdf), which is a weighted average of pdfs centered on the bias-corrected forecasts. The BMA weights reflect the relative contributions of the component models to the predictive skill over a training sample. The variance of the BMA pdf is made up of two components, the between-model variance and the within-model error variance, both estimated from the training sample. This paper describes the results of experiments with BMA to calibrate surface temperature forecasts from the 16-member Canadian ensemble system. Using one year of ensemble forecasts, BMA was applied for different training periods ranging from 25 to 80 days. The method was trained on the most recent forecast period, then applied to the next day’s forecasts as an independent sample. This process was repeated through the year, and forecast quality was evaluated using rank histograms, the continuous ranked probability score, and the continuous ranked probability skill score. An examination of the BMA weights provided a useful comparative evaluation of the component models, both for the ensemble itself and for the ensemble augmented with the unperturbed control forecast and the higher-resolution deterministic forecast. Training periods around 40 days provided a good calibration of the ensemble dispersion. Both full regression and simple bias-correction methods worked well to correct the bias, except that the full regression failed to completely remove seasonal trend biases in spring and fall. Simple correction of the bias was sufficient to produce positive forecast skill out to 10 days with respect to climatology, which was improved by the BMA.
The addition of the control forecast and the full-resolution model forecast to the ensemble produced modest improvement in the forecasts for ranges out to about 7 days. Finally, BMA produced significantly narrower 90% prediction intervals compared to a simple Gaussian bias correction, while achieving similar overall accuracy.
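
The mixture structure described in the abstract above can be illustrated with a small sketch. It assumes Gaussian component pdfs that share a single within-model standard deviation and weights that sum to one; those choices, the function names, and the numbers are illustrative assumptions, not details taken from the paper.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution (assumed component shape)."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bma_pdf(y, corrected_forecasts, weights, sd):
    """Weighted average of pdfs centered on the bias-corrected forecasts."""
    return sum(w * normal_pdf(y, f, sd)
               for w, f in zip(weights, corrected_forecasts))


def bma_mean_and_variance(corrected_forecasts, weights, sd):
    """Return (mean, between-model variance, within-model variance)."""
    mean = sum(w * f for w, f in zip(weights, corrected_forecasts))
    between = sum(w * (f - mean) ** 2
                  for w, f in zip(weights, corrected_forecasts))
    within = sd ** 2  # common error variance, assumed equal for all members
    return mean, between, within


# Hypothetical three-member example (values invented for illustration).
forecasts = [1.5, 2.0, 3.5]   # bias-corrected temperature forecasts
weights = [0.2, 0.5, 0.3]     # would be estimated from a training sample
mean, between, within = bma_mean_and_variance(forecasts, weights, sd=1.2)
total_variance = between + within
```

In practice the weights and the within-model variance would be estimated from the training sample; here they are simply supplied by hand.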

Full access
Laurence J. Wilson, William R. Burrows, and Andreas Lanzinger


Using a Bayesian context, new measures of accuracy and skill are proposed to verify weather element forecasts from ensemble prediction systems (EPSs) with respect to individual observations. The new scores are in the form of probabilities of occurrence of the observation given the EPS distribution and can be applied to individual point forecasts or summarized over a sample of forecasts. It is suggested that theoretical distributions be fit to the ensemble, assuming a shape similar to the shape of the climatological distribution of the forecast weather element. The suggested accuracy score is simply the probability of occurrence of the observation given the fitted distribution, and the skill score follows the standard format for comparison of the accuracy of the ensemble forecast with the accuracy of an unskilled forecast such as climatology. These two scores are sensitive to the location and spread of the ensemble distribution with respect to the verifying observation.

The new scores are illustrated using the output of the European Centre for Medium-Range Weather Forecasts EPS. Tests were carried out on 108 ensemble forecasts of 2-m temperature, precipitation amount, and wind speed, interpolated to 23 Canadian stations. Results indicate that the scores are especially sensitive to location of the ensemble distribution with respect to the observation; even relatively modest errors cause a score value significantly below the maximum possible score of 1.0. Nevertheless, forecasts were found that achieved the perfect score. The results of a single application of the scoring system to verification of ensembles of 500-mb heights suggest considerable potential of the score for assessment of the synoptic behavior of upper-air ensemble forecasts.

The paper concludes with a discussion of the new scoring method in the more general context of verification of probability distributions.
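
A rough sketch of how such a probability score and its skill score might be computed follows. It assumes a normal distribution fitted to the ensemble and a tolerance window of half-width `half_width` around the observation, so that the probability of a continuous variable is well defined; the window, the function names, and the sample values are assumptions made for illustration and are not specified in the abstract.

```python
import math
import statistics


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def probability_score(members, observation, half_width):
    """Probability that the fitted distribution assigns to a window
    of +/- half_width around the observation (assumed definition)."""
    mean = statistics.fmean(members)
    sd = statistics.stdev(members)
    upper = normal_cdf(observation + half_width, mean, sd)
    lower = normal_cdf(observation - half_width, mean, sd)
    return upper - lower


def skill_score(score_forecast, score_reference, perfect=1.0):
    """Standard skill-score format relative to an unskilled reference."""
    return (score_forecast - score_reference) / (perfect - score_reference)


# Hypothetical 2-m temperature ensemble (values invented for illustration).
ensemble = [9.0, 10.0, 11.0]
well_located = probability_score(ensemble, observation=10.0, half_width=1.0)
badly_located = probability_score(ensemble, observation=13.0, half_width=1.0)
```

The score drops sharply when the observation falls away from the center of the fitted distribution, which is consistent with the sensitivity to location reported in the abstract.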

Full access
Laurence J. Wilson, Stéphane Beauregard, Adrian E. Raftery, and Richard Verret
Full access
Eric Gilleland, Gregor Skok, Barbara G. Brown, Barbara Casati, Manfred Dorninger, Marion P. Mittermaier, Nigel Roberts, and Laurence J. Wilson


As part of the second phase of the spatial forecast verification intercomparison project (ICP), dubbed the Mesoscale Verification Intercomparison in Complex Terrain (MesoVICT) project, a new set of idealized test fields is prepared. This paper describes these new fields and their rationale and uses them to analyze a number of summary measures associated with distance and geometric-based approaches. The results provide guidance on what these measures reveal about performance under various scenarios. The new case comparisons are grouped into four categories: (i) pathological situations such as when a variable is zero valued at all grid points; (ii) circular events aimed at evaluating how different methods handle contrived situations, such as equal but opposite translations, the presence of multiple events of the same or different size, boundary effects, and the influence of the positioning of events in the domain; (iii) elliptical events representing simplified scenarios that mimic commonly encountered weather phenomena in complex terrain; and (iv) cases aimed at analyzing how the verification methods handle small-scale scattered events, very large events with holes (e.g., a small portion of clear sky on a cloudy overcast day), and the presence of noise in one or both fields. Results show that all analyzed measures perform poorly in the pathological setting. They are either not able to provide a result at all or they invoke a special rule to prescribe a value, which leads to erratic results. The analysis also showed that methods provide similar information in many situations, but that each has its positive properties along with certain unique limitations.
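
To make the circular-event and pathological cases concrete, here is a minimal sketch using the Hausdorff distance between binary event fields. The Hausdorff distance is chosen only as a familiar distance-based measure and is not necessarily one the paper analyzes; the grid size, event positions, and function names are hypothetical.

```python
import math


def circular_event(nx, ny, cx, cy, radius):
    """Grid points of an nx-by-ny domain that lie inside a circle."""
    return {(x, y)
            for x in range(nx) for y in range(ny)
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2}


def hausdorff_distance(a, b):
    """Hausdorff distance between two sets of event points.

    Returns None when either field has no events: the measure is
    undefined in that pathological case unless a special rule is added.
    """
    if not a or not b:
        return None

    def directed(p, q):
        # Largest distance from a point of p to its nearest point of q.
        return max(min(math.dist(u, v) for v in q) for u in p)

    return max(directed(a, b), directed(b, a))


# Idealized fields: one observed event and two forecasts displaced by
# equal but opposite translations (all values are illustrative).
observed = circular_event(40, 40, cx=20, cy=20, radius=4)
shifted_right = circular_event(40, 40, cx=26, cy=20, radius=4)
shifted_left = circular_event(40, 40, cx=14, cy=20, radius=4)
empty_field = set()  # variable is zero valued at all grid points

d_right = hausdorff_distance(observed, shifted_right)
d_left = hausdorff_distance(observed, shifted_left)
d_empty = hausdorff_distance(observed, empty_field)
```

In this sketch the two opposite translations yield the same distance, so the measure cannot tell them apart, and the empty field yields no value at all.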

Open access