Search Results
You are looking at 1–10 of 13 items for
- Author or Editor: Laurence J. Wilson
Abstract
Kernel density estimation is employed to fit smooth probabilistic models to precipitation forecasts of the Canadian ensemble prediction system. An intuitive nonparametric technique, kernel density estimation has become a powerful tool widely used in the approximation of probability density functions. The density estimators were constructed using the gamma kernels prescribed by S.-X. Chen, confined as they are to the nonnegative real axis, which constitutes the support of the random variable representing precipitation accumulation.
Performance of kernel density estimators for several different smoothing bandwidths is compared with that of the discrete probabilistic model obtained as the fraction of member forecasts predicting the events, which for this study consisted of threshold exceedances. A propitious choice of the smoothing bandwidth yields smooth forecasts comparable to, or sometimes superior to, the discrete probabilistic forecast, depending on the character of the raw ensemble forecasts. At the same time, more realistic models of the probability density are achieved, particularly in the tail of the distribution, yielding forecasts that can be optimally calibrated for extreme events.
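A minimal sketch, in the spirit of Chen's construction referenced above: the kernel at evaluation point x is a gamma density with shape x/b + 1 and scale b, so no probability mass leaks below zero. The bandwidth, the hypothetical 20-member "ensemble," and the 10-mm threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import gamma

def gamma_kde(x_grid, samples, b):
    """Gamma-kernel density estimate on [0, inf) with bandwidth b.

    The kernel at evaluation point x is a gamma density with shape
    x/b + 1 and scale b, so its support stays on the nonnegative axis.
    """
    x = np.asarray(x_grid, dtype=float)[:, None]   # (n_grid, 1)
    s = np.asarray(samples, dtype=float)[None, :]  # (1, n_members)
    return gamma.pdf(s, a=x / b + 1.0, scale=b).mean(axis=1)

# Hypothetical 20-member precipitation ensemble (mm per 24 h).
members = np.array([0.0, 0.0, 0.3, 0.5, 1.2, 1.8, 2.0, 2.4, 3.1, 3.3,
                    4.0, 4.6, 5.2, 6.0, 7.5, 8.1, 9.9, 12.4, 15.0, 22.3])
grid = np.linspace(0.0, 30.0, 301)
pdf = gamma_kde(grid, members, b=1.5)

# Smooth exceedance probability for an illustrative 10-mm threshold.
p_exceed = pdf[grid >= 10.0].sum() * (grid[1] - grid[0])
```

Unlike the discrete member fraction, the smooth estimate assigns nonzero probability to thresholds beyond the largest member, which is what allows calibration in the tail.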
Abstract
This paper describes validation tests of the Canadian Updateable Model Output Statistics (UMOS) system against the perfect prognosis forecast system and forecasts of weather elements from the operational numerical weather prediction model. Several update experiments were performed using 2-m temperature, 10-m wind direction and speed, and probability of precipitation as predictands. These experiments were designed to evaluate the ability of the UMOS system to provide improved forecasts during the period following a model change, when the development samples contain data from two or more different model versions. Tests were run for about 200 Canadian stations for both summer and winter periods. Independent summer and winter samples were used in the evaluation to compare UMOS forecast accuracy with the direct model output forecasts, the perfect prog forecasts, and MOS forecasts based only on data from the earlier model version. The authors were also able to evaluate forecasts generated with data from a 4-month summer “parallel run” period during which two versions of the model ran concurrently. Results show that the UMOS forecasts are generally superior to both perfect prog and direct model output forecasts for all three weather elements. The UMOS forecasts are particularly responsive to bias changes; most forecast biases could be corrected with relatively little data from the newer model version. Although some of the improvement over perfect prog forecasts is apparently due solely to the use of MOS, the updating brings additional improvements even during the data blending period. The results also suggest that the higher-resolution predictions from the model bring advantages only for the first day of the forecast period. For the day-2 forecasts, the improvement over the much smoother perfect prog forecasts was smaller, especially for probability of precipitation.
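As a rough illustration of the head-to-head evaluation described above, the sketch below computes bias and mean absolute error for competing 2-m temperature forecast sets over a common verification sample. The system names follow the abstract, but the data and error characteristics are synthetic placeholders.

```python
import numpy as np

def bias_and_mae(forecasts, obs):
    """Mean error (bias) and mean absolute error over a verification sample."""
    err = np.asarray(forecasts) - np.asarray(obs)
    return err.mean(), np.abs(err).mean()

rng = np.random.default_rng(1)
obs = rng.normal(0.0, 8.0, 1000)  # 2-m temperatures (deg C), synthetic
systems = {                        # invented error characteristics per system
    "UMOS": obs + rng.normal(0.1, 1.5, 1000),
    "perfect prog": obs + rng.normal(0.8, 2.0, 1000),
    "direct model output": obs + rng.normal(-1.2, 2.2, 1000),
}
for name, fcst in systems.items():
    b, mae = bias_and_mae(fcst, obs)
    print(f"{name:20s} bias={b:+.2f}  MAE={mae:.2f}")
```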
Abstract
A comparatively long period of relative stability in the evolution of the Canadian Ensemble Forecast System was exploited to compile a large homogeneous set of precipitation forecasts. The probability of exceedance of a given threshold was computed as the fraction of ensemble member forecasts surpassing that threshold, and verified directly against observations from 36 stations across the country. These forecasts were stratified into warm and cool seasons and assessed against the observations through attributes diagrams, Brier skill scores, and areas under receiver operating characteristic curves. These measures were deemed sufficient to illuminate the salient features of a forecast system. Particular attention was paid to forecasts of 24-h accumulation, especially the exceedance of thresholds in the upper decile of station climates. The ability of the system to forecast extended dry periods was also explored.
Warm season forecasts for the 90th-percentile threshold were found to be competitive with, and even superior to, those for the cool season when verified over the pooled sample of all stations. The relative skill of the forecasts in the two seasons depends strongly on station location, however. Moreover, the skill of the warm season forecasts rapidly drops below cool season values as the thresholds become more extreme. The verification, particularly for the cool season, is sensitive to the calibration of the gauge reports, which is complicated by the inclusion of snow events in the observational record.
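A hedged sketch of the exceedance-probability verification described above: probabilities are taken as the fraction of members exceeding a threshold and scored with the Brier skill score against sample climatology. The synthetic ensemble, its multiplicative scatter, and the 12-mm threshold (standing in for a 90th-percentile value) are assumptions for illustration only.

```python
import numpy as np

def brier_skill_score(p, y):
    """Brier skill score against the sample climatological frequency."""
    bs = np.mean((p - y) ** 2)
    bs_clim = np.mean((y.mean() - y) ** 2)
    return 1.0 - bs / bs_clim

rng = np.random.default_rng(2)
truth = rng.gamma(0.8, 5.0, size=500)        # "observed" 24-h totals (mm)
# Members scatter multiplicatively around the truth so forecasts carry signal.
ensemble = truth[:, None] * rng.lognormal(0.0, 0.5, size=(500, 16))
threshold = 12.0                              # stand-in for a 90th-percentile value
p = (ensemble > threshold).mean(axis=1)       # fraction of members exceeding
y = (truth > threshold).astype(float)
print(brier_skill_score(p, y))
```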
Abstract
The use of model output statistics (MOS) in operational weather element prediction has been hindered since the mid-1980s by frequent changes in the operational numerical weather prediction models that supply the predictors for the weather element forecasts. Once the model changes, a new archive of model output must be collected for a long enough period that statistically stable equations can be developed. This paper describes a new statistical interpretation system that addresses this problem and permits rapid adaptation of the statistical forecast to changes in the formulation of the driving model. In comparison with traditional MOS development, the new system incorporates two main features. First, the data are stored in the form of the cross-products matrices used in multivariate statistical techniques rather than as raw observations and forecasts. It is these matrices that are updated regularly with new output from the model. Second, equations are developed by a weighted blending of the new and old model data, with weights chosen to emphasize the new model data while including enough old model data in the development to ensure stable equations and a smooth transition to dependence on the new model. This paper describes the design of the new system and presents tests of the equation development method following a major change of the Canadian operational model. Tests were carried out for surface temperature, probability of precipitation, and wind direction and speed for about 200 Canadian stations that have a reliable observation record. For all three elements, the coefficients and predictors selected remained remarkably stable through the transition from dependence on old model data to new model data. Although some degradation of the goodness of fit was noticed during the period when new and old model forecasts were blended, especially for wind, these effects were minor, which means that useful MOS equations could be obtained relatively soon after a change of model. Results from a comparison of forecasts from the new system with operational perfect prog forecasts and direct model output forecasts are the subject of a second paper.
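A minimal sketch of the two features just described, assuming ordinary least squares MOS: the regression sufficient statistics (cross-products matrices X'X and X'y) are accumulated batch by batch instead of storing raw cases, and the normal equations from the old and new models are blended with a weight favoring the new model. The fixed blending weight, matrix shapes, and all values below are illustrative assumptions; the paper chooses weights to manage the transition rather than holding them constant.

```python
import numpy as np

def update_cross_products(XtX, Xty, X_batch, y_batch):
    """Fold a new batch of model output into the stored sufficient statistics."""
    return XtX + X_batch.T @ X_batch, Xty + X_batch.T @ y_batch

def blended_coefficients(XtX_old, Xty_old, XtX_new, Xty_new, w_new=0.7):
    """Solve the normal equations for a weighted blend of old- and new-model data."""
    w_old = 1.0 - w_new
    return np.linalg.solve(w_new * XtX_new + w_old * XtX_old,
                           w_new * Xty_new + w_old * Xty_old)

rng = np.random.default_rng(3)
k = 5                                      # number of predictors (illustrative)
XtX_old = 500.0 * np.eye(k)                # stand-in for archived old-model matrices
Xty_old = 500.0 * rng.normal(size=k)
XtX_new, Xty_new = np.zeros((k, k)), np.zeros(k)
for _ in range(10):                        # e.g., ten days of new-model runs
    X = rng.normal(size=(40, k))
    y = X @ np.array([1.0, -0.5, 0.2, 0.0, 0.3]) + rng.normal(size=40)
    XtX_new, Xty_new = update_cross_products(XtX_new, Xty_new, X, y)
coefficients = blended_coefficients(XtX_old, Xty_old, XtX_new, Xty_new)
```

Because only the cross-products matrices are kept, the update step is cheap and no archive of raw forecast-observation pairs needs to be rebuilt after a model change.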
Abstract
Bayesian model averaging (BMA) has recently been proposed as a way of correcting underdispersion in ensemble forecasts. BMA is a standard statistical procedure for combining predictive distributions from different sources. The output of BMA is a probability density function (pdf), which is a weighted average of pdfs centered on the bias-corrected forecasts. The BMA weights reflect the relative contributions of the component models to the predictive skill over a training sample. The variance of the BMA pdf is made up of two components: the between-model variance and the within-model error variance, both estimated from the training sample. This paper describes the results of experiments with BMA to calibrate surface temperature forecasts from the 16-member Canadian ensemble system. Using one year of ensemble forecasts, BMA was applied for different training periods ranging from 25 to 80 days. The method was trained on the most recent forecast period, then applied to the next day’s forecasts as an independent sample. This process was repeated through the year, and forecast quality was evaluated using rank histograms, the continuous ranked probability score, and the continuous ranked probability skill score. An examination of the BMA weights provided a useful comparative evaluation of the component models, both for the ensemble itself and for the ensemble augmented with the unperturbed control forecast and the higher-resolution deterministic forecast. Training periods of around 40 days provided a good calibration of the ensemble dispersion. Both full regression and simple bias-correction methods worked well to correct the bias, except that the full regression failed to completely remove seasonal trend biases in spring and fall. Simple correction of the bias was sufficient to produce positive forecast skill out to 10 days with respect to climatology, which was improved by the BMA. The addition of the control forecast and the full-resolution model forecast to the ensemble produced modest improvement in the forecasts for ranges out to about 7 days. Finally, BMA produced significantly narrower 90% prediction intervals than a simple Gaussian bias correction, while achieving similar overall accuracy.
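A minimal sketch of the BMA predictive density as the abstract describes it: a weighted mixture of pdfs (Gaussians here, an assumption appropriate for temperature) centered on bias-corrected member forecasts. In practice the weights and spread are fit by maximum likelihood over the training window; the four-member values below are toy numbers chosen only to show the mixture and a 90% prediction interval.

```python
import numpy as np
from scipy.stats import norm

def bma_pdf(y, member_forecasts, weights, sigma):
    """Mixture density: sum_k w_k * N(y; f_k, sigma^2)."""
    components = norm.pdf(y[:, None], loc=member_forecasts[None, :], scale=sigma)
    return components @ weights

forecasts = np.array([1.2, 0.8, 1.5, 0.9])   # bias-corrected members (deg C)
weights = np.array([0.4, 0.3, 0.2, 0.1])     # sum to 1; fit on the training sample
y = np.linspace(-6.0, 9.0, 600)
pdf = bma_pdf(y, forecasts, weights, sigma=1.3)

# 90% prediction interval from the mixture cdf.
cdf = np.cumsum(pdf) * (y[1] - y[0])
low, high = y[np.searchsorted(cdf, 0.05)], y[np.searchsorted(cdf, 0.95)]
```

The between-model spread of the forecasts and the within-model sigma together set the width of the interval, which is how BMA widens an underdispersive ensemble without inflating it uniformly.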
Abstract
Using a Bayesian context, new measures of accuracy and skill are proposed to verify weather element forecasts from ensemble prediction systems (EPSs) with respect to individual observations. The new scores are in the form of probabilities of occurrence of the observation given the EPS distribution and can be applied to individual point forecasts or summarized over a sample of forecasts. It is suggested that theoretical distributions be fit to the ensemble, assuming a shape similar to the shape of the climatological distribution of the forecast weather element. The suggested accuracy score is simply the probability of occurrence of the observation given the fitted distribution, and the skill score follows the standard format for comparison of the accuracy of the ensemble forecast with the accuracy of an unskilled forecast such as climatology. These two scores are sensitive to the location and spread of the ensemble distribution with respect to the verifying observation.
The new scores are illustrated using the output of the European Centre for Medium-Range Weather Forecasts EPS. Tests were carried out on 108 ensemble forecasts of 2-m temperature, precipitation amount, and wind speed, interpolated to 23 Canadian stations. Results indicate that the scores are especially sensitive to the location of the ensemble distribution with respect to the observation; even relatively modest errors cause a score value significantly below the maximum possible score of 1.0. Nevertheless, forecasts were found that achieved the perfect score. The results of a single application of the scoring system to the verification of ensembles of 500-mb heights suggest considerable potential of the score for assessment of the synoptic behavior of upper-air ensemble forecasts.
The paper concludes with a discussion of the new scoring method in the more general context of verification of probability distributions.
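A hedged sketch of the proposed scores under simplifying assumptions: a Gaussian is fitted to the ensemble (a reasonable shape for temperature, whose climatological distribution is roughly Gaussian), the accuracy score is taken as the probability mass the fit assigns to a small window around the observation, and the skill score follows the standard (A - A_clim)/(1 - A_clim) form with 1.0 for a perfect forecast. The window half-width and climatological parameters are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import norm

def accuracy_score(ensemble, obs, half_width=1.0):
    """Probability the distribution fitted to the ensemble assigns to a
    window of +/- half_width around the verifying observation."""
    mu, sd = np.mean(ensemble), np.std(ensemble, ddof=1)
    return norm.cdf(obs + half_width, mu, sd) - norm.cdf(obs - half_width, mu, sd)

def skill_score(ensemble, obs, clim_mu, clim_sd, half_width=1.0):
    """Skill relative to climatology in the standard skill-score format."""
    a = accuracy_score(ensemble, obs, half_width)
    a_clim = (norm.cdf(obs + half_width, clim_mu, clim_sd)
              - norm.cdf(obs - half_width, clim_mu, clim_sd))
    return (a - a_clim) / (1.0 - a_clim)

members = np.array([0.8, 1.5, 2.1, 2.4, 3.0, 3.3, 3.9, 4.4])  # 2-m temps (deg C)
print(accuracy_score(members, obs=2.8))
print(skill_score(members, obs=2.8, clim_mu=1.0, clim_sd=5.0))
```

Both a displaced ensemble (wrong mu) and an overspread one (large sd) lower the accuracy score, which is the joint sensitivity to location and spread the abstract emphasizes.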
Abstract
Statistical models valid May–September were developed to predict the probability of lightning in 3-h intervals using observations from the North American Lightning Detection Network and predictors derived from Global Environmental Multiscale (GEM) model output at the Canadian Meteorological Centre. Models were built with pooled data from the years 2000–01 using tree-structured regression. Error reduction by most models was about 0.4–0.7 of initial predictand variance.
Many predictors were required to model lightning occurrence for this large area. Highest ranked overall were the Showalter index, mean sea level (MSL) pressure, and tropospheric precipitable water. Three-hour changes of 500-hPa geopotential height, 500–1000-hPa thickness, and MSL pressure were highly ranked in most areas. The 3-h average of most predictors was more important than the mean or maximum (minimum where appropriate). Several predictors outranked CAPE, indicating that it must appear with other predictors for successful statistical lightning prediction models.
Results presented herein demonstrate that tree-structured regression is a viable method for building statistical models to forecast lightning probability. Real-time forecasts in 3-h intervals out to 45–48 h were made in 2003 and 2004. The 2003 verification suggests that a hybrid forecast, based on a mixture of the maximum and mean forecast probabilities within a radius of a grid point and on monthly climatology, will improve accuracy. The 2004 verification shows that the hybrid forecasts had positive skill with respect to a reference forecast and performed better than forecasts defined by either the mean or maximum probability at most times. This was achieved even though an increase of resolution and a change of convective parameterization scheme were made to the GEM model in May 2004.
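A minimal sketch of tree-structured regression for lightning probability, using scikit-learn's regression trees in place of the authors' implementation. Fitting a regression tree to a 0/1 predictand makes each leaf value a conditional relative frequency, i.e., a probability of lightning in the interval. The predictors follow the abstract, but the synthetic data and the rule generating the predictand are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
n = 5000
X = np.column_stack([
    rng.normal(2.0, 4.0, n),      # Showalter index
    rng.normal(1012.0, 8.0, n),   # MSL pressure (hPa)
    rng.gamma(4.0, 6.0, n),       # precipitable water (mm)
])
# Synthetic 0/1 predictand: lightning observed in the 3-h interval.
y = ((X[:, 0] < 0.0) & (X[:, 2] > 25.0) & (rng.random(n) < 0.7)).astype(float)

tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=50).fit(X, y)
p = tree.predict(X)               # leaf means, i.e., lightning probabilities
print("fraction of predictand variance explained:",
      1.0 - np.mean((y - p) ** 2) / np.var(y))
```

The printed quantity corresponds to the error reduction relative to initial predictand variance reported in the abstract (about 0.4–0.7 for most of the authors' models).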