• Atger, F., 1999: The skill of ensemble prediction systems. Mon. Wea. Rev., 127, 1941–1953.

• Atger, F., 2003: Spatial and interannual variability of the reliability of ensemble-based probabilistic forecasts: Consequences for calibrations. Mon. Wea. Rev., 131, 1509–1523.

• Atger, F., 2004: Estimation of the reliability of ensemble-based probabilistic forecasts. Quart. J. Roy. Meteor. Soc., 130, 627–646.

• Bonsal, B. R., and Wheaton, E. E., 2005: Atmospheric circulation comparisons between the 2001 and 2002 and the 1961 and 1988 Canadian prairie droughts. Atmos.–Ocean, 43, 163–172.

• Brier, G. W., 1950: Verification of forecasts expressed in terms of probabilities. Mon. Wea. Rev., 78, 1–3.

• Bright, D. R., Weiss, S. J., Wandishin, M. S., Kain, J. S., and Stensrud, D. J., 2004: Evaluation of short-range ensemble forecasts during the 2003 SPC/NSSL Spring Program. Preprints, 22nd Conf. on Severe Local Storms, Hyannis, MA, Amer. Meteor. Soc., P15.5. [Available online at http://ams.confex.com/ams/pdfpapers/68921.pdf.]

• Buizza, R., Hollingsworth, A., Lalaurette, F., and Ghelli, A., 1999a: Probabilistic predictions of precipitation using the ECMWF Ensemble Prediction System. Wea. Forecasting, 14, 168–189.

• Buizza, R., Miller, M., and Palmer, T. N., 1999b: Stochastic representation of model uncertainties in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 125, 2887–2908.

• Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M., and Zhu, Y., 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Wea. Rev., 133, 1076–1097.

• Candille, G., and Talagrand, O., 2005: Evaluation of probabilistic prediction systems for a scalar variable. Quart. J. Roy. Meteor. Soc., 131, 2131–2150.

• Cherubini, T., Ghelli, A., and Lalaurette, F., 2002: Verification of precipitation forecasts over the Alpine region using a high-density observing network. Wea. Forecasting, 17, 238–249.

• Côté, J., Gravel, S., Méthot, A., Patoine, A., Roch, M., and Staniforth, A., 1998: The operational CMC/MRB Global Environmental Multiscale (GEM) model. Part I: Design considerations and formulation. Mon. Wea. Rev., 126, 1373–1395.

• Doswell, C. A., III, 1987: The distinction between large-scale and mesoscale contribution to severe convection: A case study example. Wea. Forecasting, 2, 3–16.

• Eckel, F. A., and Walters, M. K., 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Wea. Forecasting, 13, 1132–1147.

• Feller, W., 1957: An Introduction to Probability Theory and Its Applications. Vol. 1. Wiley, 461 pp.

• Fritsch, J. M., Thomas, I. J., Hansen, O. F., and Hardy, G. G., 1998: Quantitative precipitation forecasting: Report of the Eighth Prospectus Development Team, U.S. Weather Research Program. Bull. Amer. Meteor. Soc., 79, 285–299.

• Gauthier, P., Charette, C., Fillion, L., Koclas, P., and Laroche, S., 1999: Implementation of a 3D variational data assimilation system at the Canadian Meteorological Centre. Part I: The global analysis. Atmos.–Ocean, 37, 103–156.

• Ghelli, A., and Lalaurette, F., 2000: Verifying precipitation forecasts using upscaled observations. ECMWF Newsletter, No. 87, ECMWF, Reading, United Kingdom, 9–17.

• Glahn, H., and Lowry, D. A., 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211.

• Goodison, B. E., 1978: Accuracy of Canadian snow gage measurements. J. Appl. Meteor., 17, 1542–1548.

• Hamill, T. M., and Colucci, S. J., 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711–724.

• Hamill, T. M., and Juras, J., 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923.

• Harvey, L. O., Jr., Hammond, K. R., Lusk, C. M., and Mross, E. F., 1992: The application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120, 863–883.

• Houtekamer, P. L., Lefaivre, L., Derome, J., Ritchie, H., and Mitchell, H. L., 1996: A system simulation approach to ensemble prediction. Mon. Wea. Rev., 124, 1225–1242.

• Jones, S. C., and Coauthors, 2003: Extratropical transition of tropical cyclones: Forecast challenges, current understanding, and future directions. Wea. Forecasting, 18, 1052–1092.

• Lefaivre, L., Houtekamer, P. L., Bergeron, A., and Verret, R., 1997: The CMC Ensemble Prediction System. Proc. Sixth Workshop on Meteorological Operational Systems, Reading, United Kingdom, ECMWF, 31–44.

• Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.

• Metcalfe, J. R., and Goodison, B. E., 1993: Correction of Canadian winter precipitation data. Proc. Eighth Symp. on Meteorological Observations and Instrumentation, Anaheim, CA, Amer. Meteor. Soc., 338–343.

• Metcalfe, J. R., Routledge, B., and Devine, K., 1997: Rainfall measurement in Canada: Changing observational methods and archive adjustment procedures. J. Climate, 10, 92–101.

• Mullen, S. L., and Buizza, R., 2001: Quantitative precipitation forecasts over the United States by the ECMWF Ensemble Prediction System. Mon. Wea. Rev., 129, 638–663.

• Mullen, S. L., and Buizza, R., 2002: The impact of horizontal resolution and ensemble size on probabilistic forecasts of precipitation by the ECMWF Ensemble Prediction System. Wea. Forecasting, 17, 173–191.

• Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600.

• Murphy, A. H., 1986: The attributes diagram: A geometric framework for assessing the quality of probability forecasts. Int. J. Forecasting, 2, 285–293.

• Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281–293.

• Murphy, A. H., and Winkler, R. L., 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338.

• Pellerin, G., Lefaivre, L., Houtekamer, P., and Girard, C., 2003: Increasing the horizontal resolution of ensemble forecasts at CMC. Nonlinear Processes Geophys., 10, 463–468.

• Petterssen, S., 1956a: Motion and Motion Systems. Vol. 1, Weather Analysis and Forecasting, McGraw-Hill, 428 pp.

• Petterssen, S., 1956b: Weather and Weather Systems. Vol. 2, Weather Analysis and Forecasting, McGraw-Hill, 266 pp.

• Richardson, D. S., 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 126, 649–667.

• Ritchie, H., and Beaudoin, C., 1994: Approximations and sensitivity experiments with a baroclinic semi-Lagrangian spectral model. Mon. Wea. Rev., 122, 2391–2399.

• Sanders, F., 1986: Trends in skill of Boston forecasts made at MIT, 1966–84. Bull. Amer. Meteor. Soc., 67, 170–176.

• Sanders, F., and Gyakum, J. R., 1980: Synoptic–dynamic climatology of the “bomb.” Mon. Wea. Rev., 108, 1589–1606.

• Stanski, H. R., Wilson, L. J., and Burrows, W. R., 1990: Survey of common verification methods in meteorology. World Weather Watch Tech. Rep. 8, WMO, Geneva, Switzerland, 114 pp.

• Swets, J. A., and Pickett, R. M., 1982: Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press, 253 pp.

• Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Elsevier Academic, 627 pp.

• Wilson, L. J., 2000: Comments on “Probabilistic predictions of precipitation using the ECMWF Ensemble Prediction System.” Wea. Forecasting, 15, 361–364.

• Yuan, H., Mullen, S. L., Gao, X., Sorooshian, S., Du, J., and Juang, H-M. H., 2005: Verification of probabilistic quantitative precipitation forecasts over the southwest United States during winter 2002/03 by the RSM ensemble system. Mon. Wea. Rev., 133, 279–294.
Fig. 1. Locations of stations supplying observational data. (Stations referred to explicitly have been labeled and/or named.)

Fig. 2. Box plots of observational 24-h precipitation distributions for (a) cool and (b) warm seasons (the whiskers extend beyond the third quartile by 1.5 times the interquartile range). Station means (used to order the stations) are indicated by the heavy solid line.

Fig. 3. Observed 24-h precipitation accumulation: distributions for selected stations.

Fig. 4. Seasonal trend of 24-h precipitation in long-term climate and verification samples: (a) St. John’s, (b) Vancouver, (c) Saskatoon, and (d) Edmonton. Error bars extend vertically from every third long-term climate data point by twice the standard deviation for that point.

Fig. 5. Reliability diagrams for (a) warm and (b) cool season forecasts of 24-h accumulation at the 90th percentile.

Fig. 6. BSSs for (a) warm and (b) cool season forecasts of 24-h accumulation as a function of projection.

Fig. 7. Reliability and resolution for (a) warm and (b) cool season forecasts of 24-h accumulation as a function of projection.

Fig. 8. Rank histograms of (a) warm and (b) cool season forecasts of 24-h accumulation with a 24-h lead time.

Fig. 9. ROC areas for (a) warm and (b) cool season forecasts of 24-h accumulation as a function of projection.

Fig. 10. Reliability diagrams for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John’s.

Fig. 11. BSSs for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John’s.

Fig. 12. Brier components for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John’s.

Fig. 13. ROC areas for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John’s.

Fig. 14. BSS and ROC areas, respectively, of (a), (c) warm and (b), (d) cool season forecasts of 24-h accumulation averaged over all stations.

Fig. 15. Brier components of (a) warm and (b) cool season forecasts of 24-h accumulation averaged over all stations.

Fig. 16. BSSs and components for forecasts of 24-h accumulation: observation adjustment = 1.10.

Fig. 17. Box plots of observational 7-day precipitation distributions for (a) cool and (b) warm seasons.

Fig. 18. Reliability diagrams for (a) warm and (b) cool season forecasts of 7-day precipitation below the 10th percentile.

Fig. 19. BSSs and components for forecasts of 7-day precipitation below the 10th percentile.


A Diagnostic Verification of the Precipitation Forecasts Produced by the Canadian Ensemble Prediction System

  • 1 Meteorological Research Division, Environment Canada, Toronto, Ontario, Canada
  • 2 Meteorological Research Division, Environment Canada, Dorval, Québec, Canada

Abstract

A comparatively long period of relative stability in the evolution of the Canadian Ensemble Forecast System was exploited to compile a large homogeneous set of precipitation forecasts. The probability of exceedance of a given threshold was computed as the fraction of ensemble member forecasts surpassing that threshold, and verified directly against observations from 36 stations across the country. These forecasts were stratified into warm and cool seasons and assessed against the observations through attributes diagrams, Brier skill scores, and areas under receiver operating characteristic curves. These measures were deemed sufficient to illuminate the salient features of a forecast system. Particular attention was paid to forecasts of 24-h accumulation, especially the exceedance of thresholds in the upper decile of station climates. The ability of the system to forecast extended dry periods was also explored.

Warm season forecasts for the 90th percentile threshold were found to be competitive with, even superior to, those for the cool season when verifying across the sample lumping together all of the stations. The relative skill of the forecasts in the two seasons depends strongly on station location, however. Moreover, the skill of the warm season forecasts rapidly drops below cool season values as the thresholds become more extreme. The verification, particularly of the cool season, is sensitive to the calibration of the gauge reports, which is complicated by the inclusion of snow events in the observational record.

Corresponding author address: Syd Peel, 4905 Dufferin St., Toronto, ON M3H 5T4, Canada. Email: syd.peel@ec.gc.ca


1. Introduction

The amount of precipitation anticipated over some future time interval is often the most important information in a weather forecast. This weather element also remains one of the most challenging to predict (Mullen and Buizza 2001; Fritsch et al. 1998; Sanders 1986). The inherent unpredictability of even the occurrence of precipitation is reflected in the inclusion of a probability of precipitation (POP) in North American weather forecasts. Attempts to quantify this unpredictability predate the operational implementation of ensemble prediction systems (EPSs), ranging from a purely subjective reflection of the forecaster’s judgment to more objective estimates based, for example, on a regression model (Glahn and Lowry 1972). The advent of EPSs allowed weather services to supply more comprehensive estimates of the uncertainty intrinsic to a given weather forecast, thereby furnishing much more information than a deterministic forecast can; such information, as noted above, is acutely needed for precipitation forecasts.

An EPS produces a set of precipitation forecasts, each obtained from a perturbation of the initial state, sometimes from different NWP models, in order to capture the uncertainty due to the incomplete description of the initial state of the atmosphere as well as imperfections in the NWP model itself. Numerous investigations have been conducted into the quality of the ensemble forecasts produced operationally, in particular those of the European Centre for Medium-Range Weather Forecasts (ECMWF) (Buizza et al. 1999a; Mullen and Buizza 2001, 2002), the National Centers for Environmental Prediction (NCEP) (Eckel and Walters 1998; Atger 1999, 2003), and the Meteorological Service of Canada (MSC) (Pellerin et al. 2003). A common undercurrent running through such studies is the inevitable compromise that must be made in order to compile a sufficiently large sample of forecasts generated under tolerably homogeneous conditions, thereby justifying the statistical treatment of the problem as the analysis of a stationary system. Restricting the verification sample to periods during which the forecast system did not experience any significant changes and stratifying the data by season are two of the more common techniques by which such homogeneity is approximated.

However, such restrictions reduce the size of the samples available for statistical analysis, increasing the noise arising from sampling effects (Candille and Talagrand 2005). One method of circumventing these constraints on the sample sizes is to verify the EPS forecasts against a surrogate for measurements from in situ instrumentation. Verification studies have, for example, compared EPS forecasts against forecasts from higher-resolution deterministic NWP model output (Buizza et al. 1999a), against the analyses used in NWP assimilation cycles (Atger 2003), and, in the case of precipitation forecasts, against analyses obtained by merging rain gauge data with estimates from radar (Yuan et al. 2005).

In this study we have profited from a period of roughly 3 yr (from August 2001 through November 2004) during which the Canadian EPS did not undergo any modifications. This large forecast sample permitted verification against the observations of a relatively small number of reporting stations across Canada. Not only were these stations deemed to provide reliable in situ precipitation observations, their small number facilitated closer examination and more stringent quality control than would be practicable with larger datasets.

Another common practice in verification studies, ours being no exception, entails the aggregation of forecast–observation pairs from numerous locations to augment the sample available for statistical analysis. This practice can artificially enhance the apparent performance of an EPS, rewarding it for merely discriminating differences in regional climatologies, as pointed out by Hamill and Juras (2006). We followed several of Hamill and Juras’s recommendations in order to avoid this artificial inflation of the verification scores.

In the second section of this paper we briefly describe the Canadian ensemble forecast system. The verification methodology is detailed in section 3, while the results of this verification are presented in section 4. Section 5 concludes with a discussion of these results, along with some conclusions.

2. The Canadian Ensemble Prediction System

The Canadian EPS employs the Spectral Finite Element (SEF) model (Ritchie and Beaudoin 1994) to produce a control forecast. The SEF model also supplies eight members of the ensemble, obtained by randomly perturbing the initializing observations. Different numbers of vertical levels are used in the integrations for different members, as well as different numbers of time steps for the semi-Lagrangian scheme used to model the evolution of the system (Houtekamer et al. 1996; Pellerin et al. 2003). The remaining eight members of the ensemble are generated by the Global Environmental Multiscale (GEM) model (Côté et al. 1998). Only eight independent assimilation cycles are used to produce the analyses required to initialize the integrations for the 16 perturbed members. The mean of these eight analyses is compared with a higher-resolution 3D variational analysis (Gauthier et al. 1999) and a fraction of the difference is added to the initializing analyses of half of the members (Buizza et al. 2005; Pellerin et al. 2003).

In addition to different dynamical models, the Canadian system has access to a variety of parameterizations simulating atmospheric phenomena that cannot be adequately rendered in the resolution of the dynamical model. Hence, for example, deep convection is parameterized by one of the Kuo, Manabe, relaxed Arakawa–Schubert, or modified Kuo schemes. The soil moisture field supplied as a boundary condition is perturbed by ±20% from climate for some of the members, and either a mean orography or a 0.3σ envelope orography is used (Houtekamer et al. 1996; Lefaivre et al. 1997; Pellerin et al. 2003).

In 2001 the resolution of the spectral members was improved from TL95 to TL149, and the resolution of the GEM members was improved from 1.875° to 1.2° (Pellerin et al. 2003; Buizza et al. 2005). In January 2005 the ensemble Kalman filter method was incorporated into the assimilation cycle for the operational EPS. Inasmuch as the Canadian ensemble forecast system underwent no changes during the period from 1 August 2001 through 30 November 2004, this period was chosen for our verification. The EPS outputs, delivered daily and integrated out to 240 h from a 0000 UTC analysis, were verified as probabilistic forecasts of binary events defined by the exceedance of various thresholds. The gridded precipitation forecasts from the EPS members were linearly interpolated to the locations of 36 stations across Canada (Fig. 1), and the probability of a particular event at a given station was obtained as the fraction of member forecasts predicting that event.
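To make the probability calculation concrete, here is a minimal Python sketch of the final step described above, computing an exceedance probability as the fraction of members surpassing a threshold. The function name and the 16 member values are ours, invented for illustration; the member forecasts are assumed to have already been interpolated to the station.

```python
import numpy as np

def exceedance_probability(member_qpf, threshold_mm):
    """Fraction of ensemble member forecasts exceeding `threshold_mm`."""
    member_qpf = np.asarray(member_qpf, dtype=float)
    return float(np.mean(member_qpf > threshold_mm))

# Hypothetical 16-member QPF (mm) already interpolated to a station:
members = [2.1, 0.0, 17.3, 5.6, 22.0, 0.4, 12.8, 15.1,
           3.3, 19.7, 0.0, 8.9, 16.2, 1.1, 25.4, 14.9]
print(exceedance_probability(members, 14.4))  # 7 of 16 members exceed -> 0.4375
```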

3. Methodology

a. Definition of predictands

The events examined included exceedances based on the upper quantiles of the climate distribution, in particular the 90th, 95th, and 99th percentiles. These events were evaluated not only for forecasts of 24-h precipitation, but also for accumulations over intervals of more than 1 day, namely 2, 3, 5, 7, and 10 days. For the longer accumulation intervals we also investigated the ability of the EPS to forecast dry events, the precipitation amount below the 10th percentile in the long-term climate, for example. Exceedance probabilities for physical thresholds, 24-h accumulations ranging from 1 to 5 mm, for example, were also evaluated for individual stations.

b. Observational data

Corroborating observations were obtained from the aforementioned 36 surface stations across Canada; their positions are plotted in Fig. 1. Clearly, the eastern half of the country, particularly the Atlantic provinces and Ontario, is reasonably well sampled. The observing sites sampling Québec and the western half of the country are more sparsely distributed. Starkly apparent is the confinement of the verifying observations to the southern tier of the country. The 6-hourly synoptic reports from these stations were summed to yield the total precipitation amount observed over the relevant accumulation interval. Time series for the observational records from each of these stations were inspected. Egregiously erroneous reports (e.g., ∼800 mm in a 24-h period) were summarily culled from the record, while reports that could not be so easily dismissed (e.g., the observation of more than 153 mm reported at Regina, Saskatchewan, for the 24-h period ending at 0000 UTC 27 June 1975) were confirmed against the official MSC climate archive.

In spite of these steps taken to improve the quality of the observational record, the database is by no means flawless. Suspicious reports of precipitation amounts of 0.5 mm appear sporadically in station records. An uncertainty of ±0.19 mm ascribable to trace reports is quoted in Metcalfe et al. (1997), which prescribes a correction factor of 1.02 for wind-induced error associated with type-B rain gauges deployed in the early 1970s. Errors intrinsic to snow gauge measurements are more complicated, and are discussed in Goodison (1978) and Metcalfe and Goodison (1993).

c. Verification metrics

The metrics used for the verification consisted of the attributes diagram [also referred to as the reliability table; Murphy (1986); Wilks (2006)], the Brier skill score [BSS; Stanski et al. (1990); Wilks (2006)], and the receiver operating characteristic (ROC; Mason 1982; Harvey et al. 1992). For probability forecasts of dichotomous events these three measures afford a fairly complete diagnostic evaluation, probing several of the facets of forecast performance identified in Murphy (1993).

The attributes diagram summarizes the joint distribution of forecasts and observations from the perspective of the observational frequency distribution conditioned on the forecast, rendering a graphical depiction of the reliability, resolution, and skill of the forecasts. The observed frequency of the event is plotted as a function of the forecast probability that the event would occur. Generation of an attributes diagram thus entails partitioning the sample of matched forecast–observation pairs into a number of forecast bins, the position of points on the abscissa determined by the mean of the forecasts within each bin. The highly skewed distributions considered here displace these points away from the center of the bin.

To obtain a stable result, each bin must contain a sufficient number of cases, necessitating sufficiently large bin widths, which imposes a limit on the number of bins (Atger 2004). We therefore face countervailing constraints on the choice of forecast bin widths: bins must be wide enough to guarantee stability, yet narrow enough that enough bins remain to reveal any significant structure. Sharpness can be gleaned from the histogram of the forecast distribution.
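As a concrete illustration of the binning just described, the following Python sketch collects the points of an attributes diagram: each non-empty bin contributes its bin-mean forecast (the abscissa, which for skewed forecast distributions sits away from the bin center), the observed frequency, and the bin population from which sharpness can be read. The function and the default bin count are ours, not the paper's.

```python
import numpy as np

def reliability_points(forecasts, observations, n_bins=10):
    """Per non-empty forecast bin: (bin-mean forecast, observed frequency,
    bin population) for an attributes diagram."""
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)  # dichotomous 0/1 outcomes
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(f, edges[1:-1])          # bin index 0 .. n_bins-1
    points = []
    for k in range(n_bins):
        in_bin = idx == k
        if in_bin.any():
            points.append((f[in_bin].mean(), o[in_bin].mean(), int(in_bin.sum())))
    return points
```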

The BSS (Stanski et al. 1990) derives from the Brier score (BS; Brier 1950), the mean square error of the probabilistic forecasts $f_i$:

$$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N} (f_i - o_i)^2, \qquad (1)$$

where $N$ is the size of the sample and the $o_i$ are the corresponding dichotomous observations, set to zero if the event does not occur, unity if it does. The Brier score can be resolved into contributions from the reliability, resolution, and uncertainty of the forecasts (Murphy 1973; Stanski et al. 1990) (these terms are briefly motivated in section 4):

$$\mathrm{BS} = \underbrace{\frac{1}{N}\sum_{k=1}^{b} N_k \left(\bar{f}_k - \bar{o}_k\right)^2}_{\text{reliability}} - \underbrace{\frac{1}{N}\sum_{k=1}^{b} N_k \left(\bar{o}_k - \bar{o}\right)^2}_{\text{resolution}} + \underbrace{\bar{o}\,(1 - \bar{o})}_{\text{uncertainty}}, \qquad (2)$$

$$\bar{o}_k = \frac{1}{N_k}\sum_{i=1}^{N_k} o_{ki}, \qquad (3)$$

where $b$ is the number of forecast bins, $\bar{f}_k$ is the mean of the forecasts in bin $k$, $\bar{o}_k$ is the mean of the observations corresponding to the forecasts falling within bin $k$, $\bar{o}$ is the mean of all of the observations, and $o_{ki}$ is the $i$th of the $N_k$ observations in bin $k$. Normalizing the reliability and resolution by the uncertainty yields the BSS referenced to the sample climatology (Buizza et al. 2005):

$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\bar{o}\,(1 - \bar{o})}, \qquad (4)$$

$$\mathrm{BSS} = \frac{\text{resolution} - \text{reliability}}{\text{uncertainty}}. \qquad (5)$$

The contributions of the reliability and resolution are manifest in (5), while (4) has the advantage of being independent of how the forecasts were binned and is the score used here.
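The partition (1)–(5) is straightforward to compute. The Python sketch below (our illustration, not the study's code) returns the Brier score, its three components, and both skill-score forms, so the binning dependence of (5) versus (4) can be checked directly.

```python
import numpy as np

def brier_decomposition(forecasts, observations, n_bins=10):
    """Brier score, its Murphy (1973) components per Eqs. (1)-(3), and the
    two skill-score forms of Eqs. (4) and (5). Assumes the event occurs at
    least once in the sample, so the uncertainty term is nonzero."""
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    N = f.size
    bs = float(np.mean((f - o) ** 2))                      # Eq. (1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(f, edges[1:-1])
    obar = o.mean()
    rel = res = 0.0
    for k in range(n_bins):
        in_bin = idx == k
        n_k = in_bin.sum()
        if n_k:
            rel += n_k * (f[in_bin].mean() - o[in_bin].mean()) ** 2
            res += n_k * (o[in_bin].mean() - obar) ** 2
    rel, res = rel / N, res / N                            # Eq. (2) terms
    unc = obar * (1.0 - obar)                              # uncertainty
    bss_4 = 1.0 - bs / unc                                 # Eq. (4): bin independent
    bss_5 = (res - rel) / unc                              # Eq. (5): bin dependent
    return bs, rel, res, unc, bss_4, bss_5
```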

The receiver operating characteristic gauges the discrimination of the forecasts between occurrences and nonoccurrences. The area under the ROC curve depends on the degree of separation of the distribution of forecast probabilities conditional on the occurrence of the event (the signal distribution) from the distribution conditional on nonoccurrence (the noise distribution), thereby gauging the joint distribution of forecasts and observations from the perspective of the forecast frequency distribution conditioned on the observations—the likelihood-base rate factorization of Murphy and Winkler (1987). To calculate the area under the ROC curve we used the binormal model suggested in Swets and Pickett (1982) and evaluated for POP forecasts by Mason (1982) (see also Atger 2004 and Richardson 2000). Using something like the trapezoidal rule to compute areas under untransformed ROC curves assumes piecewise linearity in frequency space, which is unsupported by empirical evidence and makes it more difficult to compare ROC scores (Wilson 2000). Inasmuch as the ROC areas depend on the separation of the signal and noise distributions, sufficient event occurrences are needed in order to estimate the first and second moments of both distributions. Detection of extreme events is often of critical interest, but their incidence is, by definition, infrequent. We have therefore combined samples from numerous stations to augment the sample used to resolve the signal distribution.
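One standard way to realize such a binormal fit is to probit-transform the empirical hit and false-alarm rates, fit the line $z_H = a + b\,z_F$, and take the fitted area $\Phi\!\left(a/\sqrt{1+b^2}\right)$. The sketch below takes this route under our own simplifying assumptions (rates evaluated at the 15 interior thresholds of a 16-member ensemble, clipped away from 0 and 1); it illustrates the approach rather than reproducing the study's computation.

```python
import numpy as np
from scipy.stats import norm

def binormal_roc_area(forecasts, observations, n_members=16):
    """ROC area from a binormal fit in probit space (signal detection
    theory); a sketch, not the computation used in the paper."""
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=int)
    thresholds = np.arange(1, n_members) / n_members
    hit, far = [], []
    for t in thresholds:
        warned = f >= t
        hit.append(warned[o == 1].mean())        # hit rate at this threshold
        far.append(warned[o == 0].mean())        # false alarm rate
    eps = 1e-4                                   # keep rates inside (0, 1)
    z_h = norm.ppf(np.clip(hit, eps, 1 - eps))
    z_f = norm.ppf(np.clip(far, eps, 1 - eps))
    b, a = np.polyfit(z_f, z_h, 1)               # fit z_h = a + b * z_f
    return float(norm.cdf(a / np.sqrt(1.0 + b * b)))
```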

d. Long-term station climates

A long-term precipitation climate was constructed using observations from 36 stations, from 1 January 1972 to 1 August 2001, affording a climate comprising almost 30 yr. The box plots in Figs. 2a and 2b for the long-term climatological distributions of 24-h precipitation at each station highlight the differences in the precipitation climates among these stations. The stations have been ordered from left to right by their mean 24-h precipitation, which has also been plotted. In the cool season the driest stations on the left are all in the prairie provinces: Alberta, Saskatchewan, and Manitoba. The Atlantic sites represent the wettest climates, while the intermediate climates are located in the central provinces of Ontario and Québec. The salient difference in the warm season is the displacement of the Pacific coast stations in British Columbia [Canadian Forces Base Comox (YQQ), Vancouver International Airport (YVR), and Victoria International Airport (YYJ)] from the wet regime on the right of the cool season box plots in Fig. 2a into the dry regime on the left of the warm season box plots in Fig. 2b. Much greater regional variation (particularly as revealed in the station means) is apparent in the cool season precipitation climate.

The cumulative distribution functions (cdf’s) at selected stations are plotted in Fig. 3, from which the dryness of the climate at the prairie site Saskatoon John G. Diefenbaker International Airport, Saskatchewan (YXE), is evident. For example, Saskatoon saw less than 6 mm of rainfall more than 95% of the time, whereas this occurred at YVR and Canadian Forces Base Shearwater (YAW), in Halifax, Nova Scotia, only 82% of the time, and at St. John’s International Airport, Newfoundland (YYT), less than 80% of the time. The precipitation climates of Toronto Pearson International Airport, Ontario (YYZ), and Montréal-Pierre Elliott Trudeau International Airport, Québec (YUL), are intermediate between those of the coasts and the prairies. The cdf’s for the climates at Halifax and Vancouver intersect at a 24-h precipitation accumulation of 6 mm, indicating that cases in which less than 6 mm falls are more frequent at Vancouver, while events exceeding 6 mm are more frequent at Halifax.

Starker differences manifest themselves in seasonal trends of daily precipitation. Plotted in Fig. 4a is the 4-week moving average of daily precipitation observed at St. John’s, which climbs dramatically in the fall with the onset of the hurricane season in the Atlantic basin. While the advent of a tropical storm is rare, significant rainfall events from posttropical storms engendered by dissipating tropical systems are common enough (Jones et al. 2003), and extratropical storms resulting from East Coast cyclogenesis persist through the winter (Sanders and Gyakum 1980; Petterssen 1956a). A minimum is evident in July, but a much more pronounced minimum is evident in Fig. 4b for Vancouver, which receives most of its precipitation in the winter from Pacific lows following the Kuroshio storm track (Sanders and Gyakum 1980; Petterssen 1956a). The seasonal trend in the verification sample for Vancouver approximates the long-term climatology. The seasonal signal in the verification sample at St. John’s is much noisier, which is to be expected given that the long-term signal is itself noisier and has a smaller amplitude than Vancouver’s. It is obvious from Fig. 4c that the seasonal trend at Saskatoon is the reverse of that observed at Vancouver, with a pronounced maximum in the summer due to increased convective activity and little wintertime precipitation under a dry Arctic air mass. Although no overall departure from the long-term climate is evident in the verification samples of these stations, abnormally low precipitation was observed across much of Canada in 2001 and 2002, particularly in parts of the prairies that experienced severe droughts (Bonsal and Wheaton 2005). Evidence of this was captured at Edmonton, Alberta (Fig. 4d), where the precipitation amounts in the verification sample lie below the long-term climate, well below in the spring and early summer.

e. Verification methodology

The BSSs and ROC areas were computed from samples aggregated over a number of stations. Per Hamill and Juras (2006), precipitation amounts corresponding to a particular percentile in the long-term station climates were computed. For example, the 90th percentile of the 24-h precipitation observed at Vancouver over the cool season is 14.4 mm. The forecast probability of exceedance of this 90th percentile threshold was then determined for each record in the verification sample for Vancouver as P(X > 14.4). By the same token, the corresponding binary observation was assigned the value unity if more than 14.4 mm of precipitation was observed, and zero otherwise.

This procedure was repeated for each station in the aggregate sample, the 90th percentile thresholds obtained from each station’s long-term climate. Spatial climatological inhomogeneity was thereby eliminated from the resulting verification database. Considerable temporal inhomogeneity also exists in the observational records (cf. Figs. 4a–d). This was addressed through seasonal stratification of the data, which was partitioned into a warm season, composed of Julian days 121 (1 May) through 300 (27 October), with the remaining data constituting the cool season.
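In code, the station-relative event definition of the two preceding paragraphs might look like the sketch below. The function names, and the assumption that the climate sample passed in is already stratified by season, are ours.

```python
import numpy as np

def station_events(climate_mm, sample_mm, pct=90.0):
    """Binary events defined against a station's own long-term (seasonal)
    climate, so that samples from different stations can be pooled without
    rewarding mere discrimination of regional climatologies (Hamill and
    Juras 2006)."""
    threshold = float(np.percentile(np.asarray(climate_mm, dtype=float), pct))
    events = (np.asarray(sample_mm, dtype=float) > threshold).astype(int)
    return events, threshold   # e.g., threshold = 14.4 mm at Vancouver (cool season)

def season(julian_day):
    """Warm season: Julian days 121 (1 May) through 300 (27 Oct)."""
    return "warm" if 121 <= julian_day <= 300 else "cool"
```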

4. Results

Reliability diagrams for the verification sample obtained by lumping the station data together are presented in Figs. 5a and 5b. The observed frequency of occurrence of the event (exceedance of the 90th percentile for 24-h accumulation determined in the warm and cool seasons of the individual station climates) is plotted as a function of the forecast for that event. The dashed horizontal line at a frequency of 10% corresponds to long-term climatology; the solid horizontal line beneath delineates the observed frequency in the verification climate.

Perfectly reliable forecasts fall on the line with unit slope, corresponding to a vanishing contribution to the reliability term in (2), because on this line the bin-mean forecasts $\bar{f}_k$ equal the corresponding observed means $\bar{o}_k$ (reliability is negatively oriented: smaller values represent better forecasts). Reliability diagrams such as the one in Fig. 5a thus depict the bias in the forecasts, conditional on forecast bin. The resolution term in (2) can be interpreted on the attributes diagram as the vertical distance $|\bar{o}_k - \bar{o}|$ from a point to the horizontal climate line, weighted by the population of the forecast bin. The larger this distance, the better the resolution of the forecast. Resolution indicates the degree to which different forecasts signal different phenomena, regardless of forecast calibration (reliability and resolution are independent of each other). It follows that the line (labeled skill–no skill in Fig. 5a) bisecting the angle between the line of unit slope and the horizontal line marking the sample frequency separates forecasts with skill from those with none (Wilks 2006), because along this line the reliability and resolution terms cancel each other and the BS reduces to the sample uncertainty. The sharpness of the forecasts can be divined from the histograms showing the relative populations of the forecast bins (see the inset in the top-left panel of Fig. 5).

Reliability plots for forecasts at projections of 24, 72, 120, 168, and 240 h are plotted together in Figs. 5a and 5b. For the warm season the 24-h projection demonstrates good reliability in all bins, with a slight underforecasting bias for the smallest probabilities and an increasing overforecasting bias at larger probabilities. The histogram for the forecast bins indicates sharpness in the forecasts; while the preponderance of the forecasts fall within the lowest probability bin, there is an unmistakable local maximum in the forecast population for the highest probability bin. Not surprisingly, the forecasts degrade with increasing forecast projections. The drop in the performance of the 72-h forecasts from that of the 24-h projections is much more pronounced for the higher-probability bins. Indeed, as a consequence of the underforecasting bias in the 24-h forecasts for the lowest-probability bins, there is a slight improvement in the 72-h forecasts in these bins. At a projection of 120 h, little skill remains in the forecasts, which degrade further at 168 and 240 h. The histograms reveal the diminishing sharpness of the forecasts as the populations of the extremal forecast bins consistently shrink while those of the intermediate bins grow; the populations of the highest probability bins of the 168- and 240-h forecasts become so small that noise begins to manifest itself.

Cool season forecasts possess skill out to only 72 h, but do exhibit greater sharpness. This is particularly evident for forecasts with a lead time of 1 day in the highest probability bin; the population of this bin in the cool season is roughly double the corresponding population for the warm season forecasts. As was the case for the warm season, this sharpness disappears with increasing forecast projection as the EPS becomes increasingly hesitant to forecast higher probabilities. At a projection of 240 h, the EPS abstains from forecasting in the highest bin (probabilities ≳ 0.91).

BSSs, plotted as a function of forecast projection in Figs. 6a and 6b, confirm the results in the attributes diagrams. Warm season forecasts for the 90th percentile threshold fall near zero at a projection of 5 days, while the cool season forecasts barely maintain skill to 3 days. Also evident is the deterioration of skill as the events (exceedance of the 90th, 95th, and 99th percentiles) become increasingly rare.

From the plots of the reliability and resolution components of BSS as a function of forecast projection in Figs. 7a and 7b, it is clear that better scores for the warm season are derived from their superior reliability. For the 90th and 95th percentile thresholds the reliability of the warm season forecasts, at a projection of 1 day, is very good indeed. At a projection of 1 day the resolution of the cool season forecasts is markedly better than for those of the warm season. Recalling the geometric interpretation of the resolution as the vertical separation between the point on the reliability plot and the climate reference, Figs. 5a and 5b reveal that for lead times of 1 day the largest contribution to the resolution term in (2) is derived from the highest-probability bin. The overall contribution to the resolution is, however, weighted by the relative population of this bin. Since the population of this bin for the cool season forecasts is easily double that of the warm season, the contribution to the overall resolution from this bin is thus much greater for the cool season. By a projection of 72 h, the difference in the population of this bin between the two seasons has largely disappeared, as has the difference in resolution, which drops off monotonically with increasing lead time. Reliability also worsens, albeit not as rapidly, leveling off at about 3 days.

The reliability diagrams in Figs. 5a and 5b match the archetype of an underdispersive EPS presented in Wilks (2006). This is borne out in the rank histograms [computed in accordance with the prescription of Hamill and Colucci (1998)] for forecasts of 1-day accumulations with lead times of 24 h, plotted for the warm and cool seasons in Figs. 8a and 8b, respectively. While underdispersion is pronounced for forecasts from either season, particularly for the lowest accumulations, cool season forecasts are clearly much more underdispersive. The underdispersion at the lower end of the precipitation range is largely attributable to a reluctance to forecast nil precipitation, especially in the cool season. The relative frequency with which the smallest member forecast was zero was ≈0.302 for the cool season versus 0.637 for the warm season, taking no account of the verifying observation; that is, the cool season forecasts are more than twice as likely as the warm season forecasts to have nonzero quantitative precipitation forecasts (QPFs) from all members, but this is irrespective of whether rain actually occurred and therefore affords no direct evidence on relative seasonal biases in the forecast of nil precipitation. Stratifying on that subset of the sample for which the verifying observation indicated no precipitation, ∼0.488 of the smallest cool season member forecasts were zero, versus 0.845 for the warm season, so there is indeed a much more severe underforecasting bias of nil precipitation in the cool season.
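For reference, a rank histogram in the spirit of Hamill and Colucci (1998) can be computed as sketched below. We break ties between the observation and member forecasts (frequent for zero precipitation) at random, a simplification of their full prescription; the function is ours, for illustration only.

```python
import numpy as np

def rank_histogram(member_qpf, obs, seed=0):
    """Relative frequency of the verifying observation's rank among the
    sorted member forecasts; member_qpf is (n_cases, n_members), obs is
    (n_cases,). A flat histogram indicates a well-dispersed ensemble."""
    rng = np.random.default_rng(seed)
    member_qpf = np.asarray(member_qpf, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_cases, n_members = member_qpf.shape
    counts = np.zeros(n_members + 1, dtype=int)
    for x, y in zip(member_qpf, obs):
        below = int(np.sum(x < y))
        ties = int(np.sum(x == y))
        counts[below + rng.integers(0, ties + 1)] += 1   # random rank among ties
    return counts / n_cases
```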

ROC areas are plotted as functions of projection in Figs. 9a and 9b. The ability to discriminate between event occurrence and nonoccurrence is essentially the same for both seasons at the 90th percentile threshold. The forecasts exhibit useful discriminatory skill to almost 6 days [adopting the criterion that the ROC area exceed 0.70; Bright et al. (2004); Buizza et al. (1999a)]. As the phenomenon being forecast becomes rarer, the performance of the forecasts, measured by ROC areas, tends to improve. Hence, it is seen that the cool season forecasts of the 95th percentile exceedance demonstrate skill to 7 days. For the summer the increase in performance is much smaller. For projections beyond 6 days, the ROC areas actually drop below those for the corresponding forecasts of the 90th percentile exceedance, but increasing levels of noise are asserting themselves, largely obscuring the results for the 99th percentile threshold as the populations of the subsamples corresponding to event occurrence become insufficient to resolve the signal distribution.

Treating subsets of the verification sample from single locations avoids those issues addressed in Hamill and Juras (2006) that can arise in the analysis of aggregated samples, permitting verification of exceedance probabilities for physical thresholds (e.g., the probability of more than 2 mm of rain). Because these samples are much smaller than those aggregated over all 36 stations, the number of forecast bins was reduced to six, to minimize the noise in the attributes diagrams. Verifications were effected at all of the stations in this study for probabilities of 24-h precipitation exceeding 1–5 mm.

St. John’s has one of the wettest climates in this study, with most of its precipitation falling in the cool season (cf. Figs. 2a and 2b, 3, and 4a, as well as the sample frequencies in Figs. 10a and 10b). The attributes diagram again indicates skill in the warm season forecasts (of 24-h accumulation exceeding 3 mm) for lead times of about 120 h (Fig. 10a), whereas the cool season forecasts show skill to 168 h (Fig. 10b). This is reflected in the BSS, which remains positive to 6 days for the warm season, while the cool season forecasts show skill for a lead time of 7 days (Figs. 11a and 11b). The cool season forecasts are sharp for forecast projections of 24 and 72 h, while only the 24-h forecasts possess sharpness in the warm season (refer to the histograms inset in Figs. 10a and 10b). The superiority of the cool season forecasts can, on inspection of the components of BSS plotted in Figs. 12a and 12b, be attributed to superior reliability, which decreases with increasing forecast projection. An underforecasting bias in the lower-probability bins for the 24-h forecasts disappears by a lead time of 72 h, the confidence of the EPS waning with increasing forecast projection. The discriminatory skill measured by the ROC areas (Figs. 13a and 13b) is comparable for the two seasons, indicating skill to roughly 5 days.

BSSs and ROC areas, averaged over all of the stations in the study, are plotted as functions of forecast projection in Figs. 14a–d. The results closely resemble those obtained for the 90th percentile threshold in the aggregate sample (cf. Figs. 6 and 9). There is a weaker dependence upon the threshold than was observed in the aggregate sample upon passing from the 90th to the 95th and 99th percentiles, but this is hardly surprising inasmuch as thresholds of 3 and 5 mm do not lie very far into the tails of the climatological distributions at most stations. Cool season forecasts for a threshold of 1 mm pose an exception to the above observation; their skill was markedly poorer than that of the forecasts at higher thresholds. Inspection of the Brier components plotted in Figs. 15a and 15b shows that this sharp drop in skill is due to a much poorer reliability. While not as pronounced, the reliability of the warm season forecasts at the 1-mm threshold is also quite poor. This poor calibration at the 1-mm level may arise in part from cumulative errors due to trace amounts registered by the precipitation gauges, which are recorded as zero amounts in the 6-h synoptic reports, when in fact as much as 0.19 mm could have actually fallen (Metcalfe et al. 1997).

Calibration of snow gauge measurements poses an even greater challenge. A multiplicative adjustment of 1.02 was applied to all of the observations, which corrects for wind-induced errors in rain gauge reports but will undercorrect some snowfalls, contributing to an apparent overforecasting bias, particularly in the cool season. To gauge the sensitivity of the cool season verification to this effect, the verification was repeated with the observations multiplied by 1.10. This can still underestimate precipitation falling as snow when the wind exceeds 6 m s−1, but that is compensated by overestimation in those cases when the wind drops below 3 m s−1 (Metcalfe and Goodison 1993; Goodison 1978), as well as by the application of this factor to rain events included in the cool stratum. Based on these considerations, a 10% correction was deemed a fair, albeit crude, estimate of the upper bound on the adjustment required to account for undercatchment by the snow gauges. Adjusting the cool season observations by this factor of 1.1 does indeed improve the performance of the EPS forecasts (Fig. 16a) by virtue of their improved reliability (Fig. 16b).

While the brunt of the precipitation delivered by a synoptic low usually falls within 24 h and convective storms typically last an hour or less, significant precipitation events can span several days. Hence, precipitation forecasts from the Canadian EPS were also verified for accumulations over more than 1 day. Verification measures for forecasts of 2-day precipitation accumulations (not shown here) closely resemble those of the 24-h precipitation forecasts already examined. The differences in the BSSs, Brier components, and ROC areas for the 2-day cool season forecasts were more evenly spread among the 90th, 95th, and 99th percentile thresholds than was the case for the forecasts of 24-h accumulation, for which the forecasts at the 90th and 95th percentile thresholds were clustered quite closely together.

Although the absence of precipitation is unremarkable over any given 24-h period, for sufficiently long intervals small precipitation amounts become significant. This is reflected in the box plots of 7-day precipitation amount at each station plotted in Figs. 17a and 17b. While still positively skewed, these box plots have a much more conventional appearance than did those for 24-h accumulations (cf. Figs. 2a and 2b): none of the first quartiles vanish and the means approach the medians as the distributions become increasingly normal, in accordance with the central limit theorem (Feller 1957).

Reliability diagrams are plotted in Figs. 18a and 18b for probability forecasts of 7-day precipitation amount below the 10th percentile (in the long-term climate), using a sample aggregated over a subset of the aforementioned 36 observation stations. An underforecasting bias is apparent in the warm season forecasts, unconditional inasmuch as the bias prevails in all forecast bins. Indeed, this reliability diagram has the signature of an unconditional underforecasting bias [here the events are defined as 1-week intervals over which precipitation accumulation was below the prescribed threshold, in contrast to the threshold exceedances considered in Wilks (2006) and earlier in this paper]. The comparative rarity of the event (as perceived by the Canadian EPS) is clear in the histograms of the forecast bins for both seasons; the population of the lowest probability bin is an order of magnitude larger than the others. The cool season forecasts exhibit surprisingly good reliability, belying BSSs poorer than those for the warm season (cf. Fig. 19a).

Figure 19b reveals that the superior performance of the warm season forecasts is attributable to their higher resolution, the reliability of the cool season forecasts being somewhat better than that of the warm season forecasts [BSSs in Fig. 19a were obtained from (4), not (5), which would yield a negative skill for the cool season forecasts even at the earliest forecast projection of 7 days]. Because of the overwhelming size of the populations of the leftmost bins, their contributions to the Brier scores overshadow contributions from the other bins. Comparing Figs. 18a and 18b shows that the resolution of the cool season forecasts is worse than that of the warm season forecasts in the first bin because the frequency in the cool season of the verification sample falls below the long-term climatology (by definition 10%), reducing the contribution $(\bar{o}_1 - \bar{o})^2$ to the resolution [cf. (2)]. The event frequency in the warm season, on the other hand, exceeds the long-term climatology, magnifying the contribution to the resolution from the data in the lowest bin. Superior discrimination of the warm season forecasts also shows up in the ROC areas (not shown), indicating skill to 9 days in the warm season and 7 days in the cool season.

5. Discussion and conclusions

In our verification sample aggregated over stations across the country, forecasts of 24-h precipitation accumulation showed skill, as measured by the BSS, to 5 days in the warm season and 3 days in the cool season. These results are obtained for the probability of exceeding the 90th percentile in the long-term station climates. For more extreme events (exceedances of thresholds defined by the 95th and 99th climate percentiles), the skill of the forecasts deteriorated as the phenomena became rarer and, thus, more difficult to predict.

Resolving BSS into reliability and resolution, we see that the reliability varies slowly with increasing forecast projection, leveling off after a few days—a behavior typical of ensemble forecasts (Candille and Talagrand 2005; Mullen and Buizza 2001). The deterioration of skill with increasing lead time is largely attributable to decreasing resolution, which decays monotonically with increasing forecast projection, as is echoed in the falloff of the ROC area with projection (discounting the noise evident in the forecasts for the 99th percentile). This is also reflected in the monotonicity of BSS with lead time, behavior that contrasts with that of the NCEP and ECMWF EPSs, both of which exhibited a dip in BSS at a forecast projection of roughly 48 h, as discussed in Mullen and Buizza (2001). The monotonicity in skill of the Canadian EPS lends support to Mullen and Buizza’s contention that this local minimum in skill of the NCEP and ECMWF EPSs may be an artifact of their perturbation strategies, versus the Monte Carlo technique generating perturbations to the observations ingested by the Canadian system.

Direct comparison of the verification of the Canadian EPS forecasts against studies such as those of Mullen and Buizza (2001) is complicated by their use of physical thresholds, versus the quantile thresholds employed here. Use of percentile thresholds circumvented those problems discussed in Hamill and Juras (2006) that can arise when compiling a sufficiently large sample by aggregating observations over an extended area. Mullen and Buizza also employed such an aggregate sample, lumping observational records from stations across the United States, without correcting for these possible distortions. Moreover, the observational data in their verification sample were obtained from a box-averaging technique: several observations were associated with each grid tile and averaged to yield a value representative of the mean precipitation in that tile. This technique effectively maps station observations onto the EPS grid, smoothing them into a signal more directly comparable to the gridded NWP precipitation forecasts of an EPS (Cherubini et al. 2002; Ghelli and Lalaurette 2000). The more sparsely distributed Canadian network renders this approach impractical here. Instead the verification was made against individual station records, which can capture phenomena on a scale much finer than the grid of the EPS, exacerbating underdispersion in the rightmost bin of the order statistics for the rank histograms. This is reflected in inflated confidence on the part of the forecast system and a concomitant degradation of forecast skill. Discussion of the verification of EPS probabilistic QPFs (PQPFs) against box-averaged rain gauge analyses versus individual station reports can be found in Mullen and Buizza (2002).

A striking feature of the Canadian EPS is the relative skill of the warm versus the cool season forecasts. Whereas in other investigations (Mullen and Buizza 2001; Buizza et al. 1999a) forecasts for the cool season had skill superior to that of the warm season forecasts, we found the reverse for the Canadian EPS, at least for the 90th percentile threshold. Superiority of the cool season forecasts may seem intuitively reasonable inasmuch as these are often associated with synoptic-scale disturbances that tend to be better resolved by the NWP model and, therefore, more accurately forecast than the mesoscale phenomena more common in the summer. Nevertheless, uncertainty in the parameterizations used to incorporate subgrid-scale phenomena into the NWP model is well accounted for in the Canadian model by dint of a variety of representations of several subgrid-scale processes, including deep convection, as described in section 2.

This can have the effect of producing precipitation fields whose positions vary widely among the different members, even for forecast projections of 1 day. This, coupled with the overdiffuse convective precipitation fields typical of NWP models, tends to smear the resulting probabilistic forecasts over a wide area. The confidence with which the EPS forecasts precipitation at any given location is therefore blunted, reducing the sharpness. This is reflected in the inferior resolution of the warm season forecasts, while the reverse holds for the reliability (cf. Figs. 7a and 7b). The greater confidence of the cool season forecasts manifests itself in much more severe underdispersion, as is revealed in the rank histograms in Figs. 8a and 8b.

Furthermore, while convective cells are too small to be resolved by the NWP models of the ensemble members, the propensity for warm season convection is preconditioned by synoptic-scale processes [the large-scale forcing referred to in Doswell (1987)]. Barring systematic local forcings such as lake breezes or orographic effects (neither of which is very important for the stations considered in this study), the stochastic nature of the triggers effecting the convection can be faithfully rendered by the suite of parameterizations in the Canadian EPS. The operational implementation in the ECMWF EPS of a stochastic subgrid physics parameterization scheme (Mullen and Buizza 2001; Buizza et al. 1999b) came too late to show up in the summer stratum of the verification sample analyzed by Mullen and Buizza (2001), who posited that the improvements in the wintertime forecasts of that sample were largely attributable to this change in the parameterization scheme of the EPS.

The greater regional variability in the cool season precipitation climate (cf. Figs. 2a and 2b and section 3d) is hardly surprising, since summertime synoptic forcings are typically quite weak across the country (Petterssen 1956b), the main source of rain for most stations being surface-forced showers and thunderstorms, whereas in winter sharp contrasts in climate assert themselves. The Arctic ridge dominates the prairies through much of the winter, the Great Lakes control Ontario's weather, and the Atlantic and Pacific coasts are affected by major storm tracks. It is just such regional variations in climate that can contribute to the kinds of distortions (of scores such as the ROC area and BSS) discussed in Hamill and Juras (2006). Greater spatial variability in cool season precipitation could also be anticipated for the United States, inasmuch as there are strong similarities between the Canadian and American climates, particularly in summer. Hence, for previous verification studies such as Mullen and Buizza (2001) and Buizza et al. (1999b), which analyzed samples lumped over extended regions of the United States without taking into account the effects described in Hamill and Juras (2006), this artificial inflation of skill could be larger for the cool season forecasts, rewarding them for discerning greater spatial climate variability than obtains in the warm season. Of course, there are also significant differences between the precipitation climates of the two countries, especially in light of the seasonal stratification applied in this study. The period of summertime convection is considerably shorter in Canada, particularly on the prairies. Moreover, in the latter stages of the warm season, the impact of posttropical systems on stations along the Atlantic coast of Canada dilutes the signal from summertime convection. How much this difference in precipitation climates might contribute to the difference in relative seasonal performance found in this study, compared with previous verifications of ensemble precipitation forecasts over the United States, is unclear.

For more extreme thresholds (the 95th and 99th percentiles), the resolution and skill of the warm season forecasts fall off more quickly than do those of the cool season forecasts, especially in passing from the 90th percentile to the 95th. The median physical threshold corresponding to the latter percentile was between 10 and 15 mm. The areas enclosed by isohyets of these magnitudes in the EPS members' precipitation forecasts shrink dramatically about the centers of the convective "cells" forecast by the NWP models. By the 99th percentile, the median 24-h accumulation rises to between 25 and 30 mm, and the forecast problem becomes still more challenging, as is reflected in the meager skill for either season.

Considerable caution must be exercised in any interseasonal comparison in light of the sensitivity of the verification to the calibration of the observational record, which is complicated by measurements of precipitation in the form of snow. Nevertheless, the configuration of the Canadian EPS, with its broad spectrum of physical parameterizations designed to simulate the uncertainty in the representation of subgrid-scale processes, may afford a significant improvement in forecasts of the convection that dominates warm season precipitation. This holds only for forecasts that do not venture too far into the tails of the climate distributions: the performance of the warm season forecasts drops off dramatically with increasing threshold, with the consequence that, at higher thresholds, the cool season forecasts become comparable or superior to those of the warm season.

One might expect, a priori, EPS forecasts of multiday precipitation accumulation to benefit from the smoothing that accumulation imposes upon the signal. In particular, summing over more than 1 day's precipitation could mitigate the impact of phasing errors in the positions of features rendered by the NWP models. Furthermore, convection typically peaks in the late afternoon or early evening, making 24-h rainfall totals particularly sensitive to the 0000 UTC boundary demarcating our verification day, which corresponds to local times ranging from around 2100 local daylight saving time on the East Coast to 1700 local daylight saving time on the West Coast. Expectations of significant improvement in the skill of forecasts of summertime multiday accumulations were dashed, however: the performances of the warm season forecasts of 1- and 2-day accumulations (not shown here) were similar. This could be a manifestation of the relatively diffuse spatial signal associated with convective precipitation forecasts in the Canadian EPS, alluded to above, translating into a diffuse temporal signal, particularly for dynamical triggers propagating in the prevailing flow.

Over longer intervals, the Canadian EPS showed some skill in forecasting week-long dry periods (precipitation below the 10th percentile of the long-term climate) in the warm season, while the cool season forecasts showed negligible skill. The superiority of the warm season forecasts derived from their higher resolution (in the sense of the Brier score decomposition), which was double that of the cool season forecasts. Particular care must be exercised in comparing these resolutions, however, since they are largely determined by the amount by which the observed event frequency in the lowest probability bin falls below the event frequency of the verification climate. This event frequency was above the long-term value (10%) in the warm season and below it in the cool season. Hence, the superiority of the warm season forecasts of this event could merely be an artifact of a verification sample containing abnormally dry summers and abnormally wet winters.
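The caveat above, that the resolution term hinges on how far the bin-mean observed frequencies sit from the sample climatology, can be made concrete with a short sketch (Python; the bin count and variable names are illustrative, not this study's code):

```python
import numpy as np

def brier_components(p, y, n_bins=11):
    """Murphy (1973) partition: BS = reliability - resolution + uncertainty.

    p : forecast probabilities in [0, 1]; y : binary outcomes (0 or 1).
    The resolution term rewards bin-mean observed frequencies that depart
    from the sample climatology ybar, which is why a verification sample
    whose climatology drifts from the long-term value (as for the dry-week
    forecasts discussed above) can flatter or penalize it artificially.
    """
    p, y = np.asarray(p, float), np.asarray(y, float)
    ybar = y.mean()
    idx = np.clip(np.digitize(p, np.linspace(0.0, 1.0, n_bins + 1)) - 1,
                  0, n_bins - 1)
    rel = res = 0.0
    for k in range(n_bins):
        sel = idx == k
        if sel.any():
            rel += sel.sum() * (p[sel].mean() - y[sel].mean()) ** 2
            res += sel.sum() * (y[sel].mean() - ybar) ** 2
    return rel / len(y), res / len(y), ybar * (1.0 - ybar)

# Sanity check with a perfectly calibrated toy forecast.
rng = np.random.default_rng(2)
p = rng.uniform(size=20000)
y = (rng.uniform(size=20000) < p).astype(float)
print(brier_components(p, y))  # reliability ~ 0, resolution ~ 1/12, unc ~ 1/4
```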

Long periods of comparative stability in the evolution of the Canadian EPS, such as the one exploited for this investigation, are rare. The introduction of a Kalman filter into the assimilation cycle early in 2005 has been followed by an extension of the integrations out to 16 days, as well as the addition of a second daily integration starting from a 1200 UTC analysis. Upcoming changes include the retirement of the SEF members and the addition of GEM members, bringing their number (including the control) to 21, coupled with changes to the parameterizations and a refinement of the grid. The results of the verification described here can serve as a benchmark against which to gauge the impact of such changes.

Acknowledgments

Part of the observational record was extracted from a database constructed by Gérard Croteau at the Canadian Meteorological Centre. Some of the Canadian EPS forecast data were supplied by Gérard Pellerin at the Canadian Meteorological Centre. ROC areas were computed using code supplied by Gérard Pellerin, adapted from the software of D. D. Dorfman and E. Alf, based on a program listed in Swets and Pickett (1982). The authors also thank Dr. William Burrows, Dr. Thomas Hamill, two anonymous reviewers, and the editors, Dr. Paul Roebber and Dr. Da-Lin Zhang, for their helpful comments and suggestions.

REFERENCES

  • Atger, F., 1999: The skill of ensemble prediction systems. Mon. Wea. Rev., 127, 1941–1953.

  • Atger, F., 2003: Spatial and interannual variability of the reliability of ensemble-based probabilistic forecasts: Consequences for calibrations. Mon. Wea. Rev., 131, 1509–1523.

  • Atger, F., 2004: Estimation of the reliability of ensemble-based probabilistic forecasts. Quart. J. Roy. Meteor. Soc., 130, 627–646.

  • Bonsal, B. R., and E. E. Wheaton, 2005: Atmospheric circulation comparisons between the 2001 and 2002 and the 1961 and 1988 Canadian prairie droughts. Atmos.–Ocean, 43, 163–172.

  • Brier, G. W., 1950: Verification of forecasts expressed in terms of probabilities. Mon. Wea. Rev., 78, 1–3.

  • Bright, D. R., S. J. Weiss, M. S. Wandishin, J. S. Kain, and D. J. Stensrud, 2004: Evaluation of short-range ensemble forecasts during the 2003 SPC/NSSL Spring Program. Preprints, 22nd Conf. on Severe Local Storms, Hyannis, MA, Amer. Meteor. Soc., P15.5. [Available online at http://ams.confex.com/ams/pdfpapers/68921.pdf.]

  • Buizza, R., A. Hollingsworth, F. Lalaurette, and A. Ghelli, 1999a: Probabilistic predictions of precipitation using the ECMWF Ensemble Prediction System. Wea. Forecasting, 14, 168–189.

  • Buizza, R., M. Miller, and T. N. Palmer, 1999b: Stochastic simulation of model uncertainties. Quart. J. Roy. Meteor. Soc., 125, 2887–2908.

  • Buizza, R., P. L. Houtekamer, Z. Toth, G. Pellerin, M. Wei, and Y. Zhu, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Wea. Rev., 133, 1076–1097.

  • Candille, G., and O. Talagrand, 2005: Evaluation of probabilistic prediction systems for a scalar variable. Quart. J. Roy. Meteor. Soc., 131, 2131–2150.

  • Cherubini, T., A. Ghelli, and F. Lalaurette, 2002: Verification of precipitation forecasts over the Alpine region using a high-density observing network. Wea. Forecasting, 17, 238–249.

  • Côté, J., S. Gravel, A. Méthot, A. Patoine, M. Roch, and A. Staniforth, 1998: The operational CMC/MRB Global Environmental Multiscale (GEM) model. Part I: Design considerations and formulation. Mon. Wea. Rev., 126, 1373–1395.

  • Doswell, C. A., III, 1987: The distinction between large-scale and mesoscale contribution to severe convection: A case study example. Wea. Forecasting, 2, 3–16.

  • Eckel, F. A., and M. K. Walters, 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Wea. Forecasting, 13, 1132–1147.

  • Feller, W., 1957: An Introduction to Probability Theory and Its Applications. Vol. 1. Wiley, 461 pp.

  • Fritsch, J. M., I. J. Thomas, O. F. Hansen, and G. G. Hardy, 1998: Quantitative precipitation forecasting: Report of the Eighth Prospectus Development Team, U.S. Weather Research Program. Bull. Amer. Meteor. Soc., 79, 285–299.

  • Gauthier, P., C. Charette, L. Fillion, P. Koclas, and S. Laroche, 1999: Implementation of a 3D variational data assimilation system at the Canadian Meteorological Centre. Part I: The global analysis. Atmos.–Ocean, 37, 103–156.

  • Ghelli, A., and F. Lalaurette, 2000: Verifying precipitation forecasts using upscaled observations. ECMWF Newsletter, No. 87, ECMWF, Reading, United Kingdom, 9–17.

  • Glahn, H., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211.

  • Goodison, B. E., 1978: Accuracy of Canadian snow gage measurements. J. Appl. Meteor., 17, 1542–1548.

  • Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711–724.

  • Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923.

  • Harvey, L. O., Jr., K. R. Hammond, C. M. Lusk, and E. F. Mross, 1992: The application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120, 863–883.

  • Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. Mon. Wea. Rev., 124, 1225–1242.

  • Jones, S. C., and Coauthors, 2003: Extratropical transition of tropical cyclones: Forecast challenges, current understanding, and future directions. Wea. Forecasting, 18, 1052–1092.

  • Lefaivre, L., P. L. Houtekamer, A. Bergeron, and R. Verret, 1997: The CMC Ensemble Prediction System. Proc. Sixth Workshop on Meteorological Operational Systems, Reading, United Kingdom, ECMWF, 31–44.

  • Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.

  • Metcalfe, J. R., and B. E. Goodison, 1993: Correction of Canadian winter precipitation data. Proc. Eighth Symp. on Meteorological Observations and Instrumentation, Anaheim, CA, Amer. Meteor. Soc., 338–343.

  • Metcalfe, J. R., B. Routledge, and K. Devine, 1997: Rainfall measurement in Canada: Changing observational methods and archive adjustment procedures. J. Climate, 10, 92–101.

  • Mullen, S. L., and R. Buizza, 2001: Quantitative precipitation forecasts over the United States by the ECMWF Ensemble Prediction System. Mon. Wea. Rev., 129, 638–663.

  • Mullen, S. L., and R. Buizza, 2002: The impact of horizontal resolution and ensemble size on probabilistic forecasts of precipitation by the ECMWF Ensemble Prediction System. Wea. Forecasting, 17, 173–191.

  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600.

  • Murphy, A. H., 1986: The attributes diagram: A geometric framework for assessing the quality of probability forecasts. Int. J. Forecasting, 2, 285–293.

  • Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281–293.

  • Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338.

  • Pellerin, G., L. Lefaivre, P. Houtekamer, and C. Girard, 2003: Increasing the horizontal resolution of ensemble forecasts at CMC. Nonlinear Processes Geophys., 10, 463–468.

  • Petterssen, S., 1956a: Motion and Motion Systems. Vol. 1, Weather Analysis and Forecasting, McGraw-Hill, 428 pp.

  • Petterssen, S., 1956b: Weather and Weather Systems. Vol. 2, Weather Analysis and Forecasting, McGraw-Hill, 266 pp.

  • Richardson, D. S., 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 126, 649–667.

  • Ritchie, H., and C. Beaudoin, 1994: Approximations and sensitivity experiments with a baroclinic semi-Lagrangian spectral model. Mon. Wea. Rev., 122, 2391–2399.

  • Sanders, F., 1986: Trends in skill of Boston forecasts made at MIT, 1966–84. Bull. Amer. Meteor. Soc., 67, 170–176.

  • Sanders, F., and J. R. Gyakum, 1980: Synoptic–dynamic climatology of the "bomb." Mon. Wea. Rev., 108, 1589–1606.

  • Stanski, H. R., L. J. Wilson, and W. R. Burrows, 1990: Survey of common verification methods in meteorology. World Weather Watch Tech. Rep. 8, WMO, Geneva, Switzerland, 114 pp.

  • Swets, J. A., and R. M. Pickett, 1982: Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press, 253 pp.

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Elsevier Academic, 627 pp.

  • Wilson, L. J., 2000: Comments on "Probabilistic predictions of precipitation using the ECMWF Ensemble Prediction System." Wea. Forecasting, 15, 361–364.

  • Yuan, H., S. L. Mullen, X. Gao, S. Sorooshian, J. Du, and H.-M. H. Juang, 2005: Verification of probabilistic quantitative precipitation forecasts over the southwest United States during winter 2002/03 by the RSM ensemble system. Mon. Wea. Rev., 133, 279–294.

Fig. 1. Locations of stations supplying observational data. (Stations referred to explicitly have been labeled and/or named.)

Fig. 2. Box plots of observational 24-h precipitation distributions for (a) cool and (b) warm seasons (the whiskers extend beyond the third quartile by 1.5 times the interquartile range). Station means (used to order the stations) are indicated by the heavy solid line.

Fig. 3. Observed 24-h precipitation accumulation: distributions for selected stations.

Fig. 4. Seasonal trend of 24-h precipitation in long-term climate and verification samples: (a) St. John's, (b) Vancouver, (c) Saskatoon, and (d) Edmonton. Error bars extend vertically from every third long-term climate data point by twice the standard deviation for that point.

Fig. 5. Reliability diagrams for (a) warm and (b) cool season forecasts of 24-h accumulation at the 90th percentile.

Fig. 6. BSSs for (a) warm and (b) cool season forecasts of 24-h accumulation as a function of projection.

Fig. 7. Reliability and resolution for (a) warm and (b) cool season forecasts of 24-h accumulation as a function of projection.

Fig. 8. Rank histograms of (a) warm and (b) cool season forecasts of 24-h accumulation with a 24-h lead time.

Fig. 9. ROC areas for (a) warm and (b) cool season forecasts of 24-h accumulation as a function of projection.

Fig. 10. Reliability diagrams for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John's.

Fig. 11. BSSs for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John's.

Fig. 12. Brier components for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John's.

Fig. 13. ROC areas for (a) warm and (b) cool season forecasts of 24-h accumulation at St. John's.

Fig. 14. BSS and ROC areas, respectively, of (a), (c) warm and (b), (d) cool season forecasts of 24-h accumulation averaged over all stations.

Fig. 15. Brier components of (a) warm and (b) cool season forecasts of 24-h accumulation averaged over all stations.

Fig. 16. BSSs and components for forecasts of 24-h accumulation: observation adjustment = 1.10.

Fig. 17. Box plots of observational 7-day precipitation distributions for (a) cool and (b) warm seasons.

Fig. 18. Reliability diagrams for (a) warm and (b) cool season forecasts of 7-day precipitation below the 10th percentile.

Fig. 19. BSSs and components for forecasts of 7-day precipitation below the 10th percentile.

1. The ROC areas for cool season forecasts of 5-mm exceedances were not available at some of the stations with dry winter climates because of the inability of the algorithm to fit the signal distribution to a normal curve.
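A fit-free alternative for such cases is the empirical ROC area obtained by the trapezoid rule over the (false alarm rate, hit rate) points. The sketch below (Python, with illustrative names and toy data) is offered as a minimal example of that alternative; it is not the Swets and Pickett (1982) procedure used in this study.

```python
import numpy as np

def roc_area_empirical(probs, events):
    """ROC area by the trapezoid rule over empirical (FAR, hit rate) points,
    obtained by sweeping a decision threshold through the forecast
    probabilities; no distributional fit is required.
    """
    probs = np.asarray(probs, float)
    events = np.asarray(events, bool)
    hit, far = [0.0], [0.0]
    for t in np.unique(probs)[::-1]:       # descending decision thresholds
        decide = probs >= t
        hit.append(decide[events].mean())  # hit rate at this threshold
        far.append(decide[~events].mean()) # false alarm rate
    h, f = np.array(hit), np.array(far)
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * (f[1:] - f[:-1])))

# Toy check: skillful probabilities should give an area well above 0.5.
rng = np.random.default_rng(3)
p = rng.uniform(size=5000)
e = rng.uniform(size=5000) < p ** 2        # events favor high probabilities
print(roc_area_empirical(p, e))
```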

2. In the approximation that daily precipitation amounts are independent.
