Search Results

Showing 1–10 of 14 items for Author or Editor: Christopher A. T. Ferro.
Christopher A. T. Ferro

Abstract

This article considers the Brier score for verifying ensemble-based probabilistic forecasts of binary events. New estimators for the effect of ensemble size on the expected Brier score, and associated confidence intervals, are proposed. An example with precipitation forecasts illustrates how these estimates support comparisons of the performances of competing forecasting systems with possibly different ensemble sizes.
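
The article's own estimators and confidence intervals are not reproduced here; as an illustration of the idea, the sketch below applies the widely cited "fair" (ensemble-size-adjusted) Brier score to synthetic forecasts from two systems with different ensemble sizes. All data and names are illustrative.

```python
import numpy as np

def fair_brier_score(k, m, obs):
    """Ensemble-size-adjusted ("fair") Brier score for binary events.

    k   : number of ensemble members forecasting the event (array)
    m   : ensemble size
    obs : observed outcomes, 0 or 1 (array)

    The correction term removes the expected inflation of the Brier
    score that comes from estimating event probabilities with a
    finite ensemble, so systems with different m can be compared.
    """
    k, obs = np.asarray(k, float), np.asarray(obs, float)
    return np.mean((k / m - obs) ** 2 - k * (m - k) / (m ** 2 * (m - 1)))

# Illustrative comparison of two systems with different ensemble sizes
rng = np.random.default_rng(1)
obs = rng.binomial(1, 0.3, size=500)
k8 = rng.binomial(8, np.where(obs == 1, 0.5, 0.2))    # 8-member system
k50 = rng.binomial(50, np.where(obs == 1, 0.5, 0.2))  # 50-member system
print(fair_brier_score(k8, 8, obs), fair_brier_score(k50, 50, obs))
```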

Christopher A. T. Ferro

Abstract

This article proposes a method for verifying deterministic forecasts of rare, extreme events defined by exceedance above a high threshold. A probability model for the joint distribution of forecasts and observations, and based on extreme-value theory, characterizes the quality of forecasting systems with two key parameters. This enables verification measures to be estimated for any event rarity and helps to reduce the uncertainty associated with direct estimation. Confidence regions are obtained and the method is used to compare daily precipitation forecasts from two operational numerical weather prediction models.
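
For context on why a model-based approach helps, the sketch below estimates one verification measure (the hit rate) directly from counts of exceedances above increasingly high thresholds; the estimates grow noisier as events become rarer, which is the uncertainty the paper's extreme-value model is designed to reduce. All data and names are illustrative.

```python
import numpy as np

def hit_rate_at_rarity(fcst, obs, p):
    """Hit rate for events defined by exceedance of the upper
    p-quantile, estimated directly from counts (noisy for small p)."""
    uf, uo = np.quantile(fcst, 1 - p), np.quantile(obs, 1 - p)
    hits = np.sum((fcst > uf) & (obs > uo))
    return hits / max(np.sum(obs > uo), 1)

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 1.0, size=5000)                  # synthetic "rainfall"
fcst = 0.7 * obs + 0.3 * rng.gamma(2.0, 1.0, 5000)    # correlated forecasts
for p in (0.1, 0.01, 0.001):
    print(p, hit_rate_at_rarity(fcst, obs, p))        # noisier as p shrinks
```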

Christopher A. T. Ferro and David B. Stephenson

Abstract

Verifying forecasts of rare events is challenging, in part because traditional performance measures degenerate to trivial values as events become rarer. The extreme dependency score was proposed recently as a nondegenerating measure for the quality of deterministic forecasts of rare binary events. This measure has some undesirable properties, including being both easy to hedge and dependent on the base rate. A symmetric extreme dependency score was also proposed recently, but this too is dependent on the base rate. These two scores and their properties are reviewed, and the meanings of several properties that have caused confusion, such as base-rate dependence and complement symmetry, are clarified. Two modified versions of the extreme dependency score, the extremal dependence index and the symmetric extremal dependence index, are then proposed and shown to overcome all of these shortcomings. The new measures are nondegenerating, base-rate independent, asymptotically equitable, harder to hedge, and have regular isopleths that correspond to symmetric and asymmetric relative operating characteristic curves.
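
For reference, the sketch below computes the two proposed indices from a 2×2 contingency table using the formulas as they are commonly stated, in terms of the hit rate H and false-alarm rate F; the example table is illustrative, and the paper should be consulted before relying on these expressions.

```python
import numpy as np

def edi_sedi(a, b, c, d):
    """Extremal dependence index (EDI) and symmetric extremal
    dependence index (SEDI) from a 2x2 contingency table:
    a = hits, b = false alarms, c = misses, d = correct rejections."""
    H = a / (a + c)          # hit rate
    F = b / (b + d)          # false-alarm rate
    lH, lF = np.log(H), np.log(F)
    l1H, l1F = np.log(1 - H), np.log(1 - F)
    edi = (lF - lH) / (lF + lH)
    sedi = (lF - lH - l1F + l1H) / (lF + lH + l1F + l1H)
    return edi, sedi

print(edi_sedi(a=20, b=30, c=10, d=940))  # illustrative rare-event table
```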

Christopher A. T. Ferro, Abdelwaheb Hannachi, and David B. Stephenson

Abstract

Anthropogenic influences are expected to cause the probability distribution of weather variables to change in nontrivial ways. This study presents simple nonparametric methods for exploring and comparing differences in pairs of probability distribution functions. The methods are based on quantiles and allow changes in all parts of the probability distribution to be investigated, including the extreme tails. Adjusted quantiles are used to investigate whether changes are simply due to shifts in location (e.g., mean) and/or scale (e.g., variance). Sampling uncertainty in the quantile differences is assessed using simultaneous confidence intervals calculated using a bootstrap resampling method that takes account of serial (intraseasonal) dependency. The methods are simple enough to be used on large gridded datasets. They are demonstrated here by exploring the changes between European regional climate model simulations of daily minimum temperature and precipitation totals for winters in 1961–90 and 2071–2100. Projected changes in daily precipitation are generally found to be well described by simple increases in scale, whereas minimum temperature exhibits changes in both location and scale.
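
A minimal sketch of the quantile-difference idea on synthetic data, using a moving-block bootstrap to respect serial dependence. Note one simplification: the paper constructs simultaneous confidence intervals, whereas this sketch produces only the simpler pointwise intervals.

```python
import numpy as np

def quantile_differences(x, y, probs):
    """Differences between sample quantiles of two periods."""
    return np.quantile(y, probs) - np.quantile(x, probs)

def block_bootstrap_ci(x, y, probs, block=10, n_boot=2000, alpha=0.05, seed=0):
    """Pointwise bootstrap intervals for quantile differences, using a
    moving-block bootstrap to preserve serial (intraseasonal) dependence."""
    rng = np.random.default_rng(seed)

    def resample(z):
        starts = rng.integers(0, len(z) - block + 1, size=len(z) // block + 1)
        return np.concatenate([z[s:s + block] for s in starts])[:len(z)]

    boots = np.array([quantile_differences(resample(x), resample(y), probs)
                      for _ in range(n_boot)])
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2], axis=0)

probs = np.linspace(0.05, 0.95, 19)
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 900)    # e.g., 1961-90 winters (synthetic)
y = rng.normal(0.5, 1.3, 900)    # e.g., 2071-2100 winters (synthetic)
print(quantile_differences(x, y, probs))
print(block_bootstrap_ci(x, y, probs))
```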

Cristina Primo, Christopher A. T. Ferro, Ian T. Jolliffe, and David B. Stephenson

Abstract

Probabilistic forecasts of atmospheric variables are often given as relative frequencies obtained from ensembles of deterministic forecasts. The detrimental effects of imperfect models and initial conditions on the quality of such forecasts can be mitigated by calibration. This paper shows that Bayesian methods currently used to incorporate prior information can be written as special cases of a beta-binomial model and correspond to a linear calibration of the relative frequencies. These methods are compared with a nonlinear calibration technique (i.e., logistic regression) using real precipitation forecasts. Calibration is found to be advantageous in all cases considered, and logistic regression is preferable to linear methods.
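
A minimal sketch contrasting the two approaches on synthetic ensemble counts: a linear, beta-binomial-style shrinkage of the relative frequencies (shown with the simple choice alpha = beta = 1, one special case) versus a logistic regression fitted to the relative frequencies. The setup is illustrative and not the paper's experimental design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic ensemble forecasts: k of m members predict the event
rng = np.random.default_rng(2)
m = 20
p_true = rng.beta(2, 5, size=1000)
k = rng.binomial(m, p_true)
obs = rng.binomial(1, p_true)

# Linear calibration: relative frequencies shrunk toward a prior,
# p = (k + alpha) / (m + alpha + beta); alpha = beta = 1 is one
# simple special case of the beta-binomial form.
alpha = beta = 1.0
p_linear = (k + alpha) / (m + alpha + beta)

# Nonlinear calibration: logistic regression on the relative frequency
X = (k / m).reshape(-1, 1)
p_logistic = LogisticRegression().fit(X, obs).predict_proba(X)[:, 1]

brier = lambda p: np.mean((p - obs) ** 2)
print(brier(k / m), brier(p_linear), brier(p_logistic))
```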

Robin J. Hogan, Christopher A. T. Ferro, Ian T. Jolliffe, and David B. Stephenson

Abstract

In the forecasting of binary events, verification measures that are “equitable” were defined by Gandin and Murphy to satisfy two requirements: 1) they award all random forecasting systems, including those that always issue the same forecast, the same expected score (typically zero), and 2) they are expressible as the linear weighted sum of the elements of the contingency table, where the weights are independent of the entries in the table, apart from the base rate. The authors demonstrate that the widely used “equitable threat score” (ETS), as well as numerous others, satisfies neither of these requirements and only satisfies the first requirement in the limit of an infinite sample size. Such measures are referred to as “asymptotically equitable.” In the case of ETS, the expected score of a random forecasting system is always positive and only falls below 0.01 when the number of samples is greater than around 30. Two other asymptotically equitable measures are the odds ratio skill score and the symmetric extreme dependency score, which are more strongly inequitable than ETS, particularly for rare events; for example, when the base rate is 2% and the sample size is 1000, random but unbiased forecasting systems yield an expected score of around −0.5, reducing in magnitude to −0.01 or smaller only for sample sizes exceeding 25 000. This presents a problem since these nonlinear measures have other desirable properties, in particular being reliable indicators of skill for rare events (provided that the sample size is large enough). A potential way to reconcile these properties with equitability is to recognize that Gandin and Murphy’s two requirements are independent, and the second can be safely discarded without losing the key advantages of equitability that are embodied in the first. This enables inequitable and asymptotically equitable measures to be scaled to make them equitable, while retaining their nonlinearity and other properties such as being reliable indicators of skill for rare events. It also opens up the possibility of designing new equitable verification measures.
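
The finite-sample inequity of ETS is easy to reproduce by simulation. The sketch below scores random but unbiased forecasts (permutations of the observations) at a 2% base rate with a sample size of 1000; the parameter choices echo the abstract, but the experiment itself is only illustrative.

```python
import numpy as np

def ets(a, b, c, d):
    """Equitable threat score (Gilbert skill score) from a 2x2 table:
    a = hits, b = false alarms, c = misses, d = correct rejections."""
    n = a + b + c + d
    a_r = (a + b) * (a + c) / n          # hits expected by chance
    return (a - a_r) / (a + b + c - a_r)

# Score random but unbiased forecasts (a permutation of the
# observations) and estimate the expected ETS: it stays noticeably
# above zero for rare events and modest sample sizes.
rng = np.random.default_rng(3)
base_rate, n, scores = 0.02, 1000, []
for _ in range(5000):
    obs = rng.binomial(1, base_rate, n).astype(bool)
    fcst = rng.permutation(obs)
    a, b = np.sum(fcst & obs), np.sum(fcst & ~obs)
    c, d = np.sum(~fcst & obs), np.sum(~fcst & ~obs)
    scores.append(ets(a, b, c, d))
print(np.nanmean(scores))  # > 0: ETS is only asymptotically equitable
```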

Mxolisi E. Shongwe, Christopher A. T. Ferro, Caio A. S. Coelho, and Geert Jan van Oldenborgh

Abstract

The seasonal predictability of cold spring seasons (March–May) in Europe from hindcasts/forecasts of three operational coupled general circulation models (CGCMs) is investigated. The models used in the investigation are the Met Office Global Seasonal Forecast System (GloSea), the ECMWF System-2 (S2), and the NCEP Climate Forecast System (CFS). Using the relative operating characteristic score and the Brier skill score, the long-term prediction skill for spring 2-m temperature in the lower quintile (20%) is assessed. Over much of central and eastern Europe, the predictive skill is found to be high. The skill of the Met Office GloSea and ECMWF S2 models significantly surpasses that of damped persistence over much of Europe, but the NCEP CFS model outperforms this reference forecast only over a small area. The higher potential predictability of cold spring seasons in eastern relative to southwestern Europe can be attributed to snow effects, as areas of high skill closely correspond with the climatological snow line, and snow is shown in this paper to be linked to cold spring 2-m temperatures in eastern Europe. The ability of the models to represent snow cover during the melt season is also investigated. The Met Office GloSea and the ECMWF S2 models are able to accurately mimic the observed pattern of monthly snow-cover interannual variability, but the NCEP CFS model predicts too short a snow season. Improvements in the snow analysis and land surface parameterizations could increase the skill of seasonal forecasts for cold spring temperatures.
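
As a reminder of one of the headline measures, a minimal sketch of the Brier skill score for a lower-quintile event against a climatological reference forecast of 0.2; the hindcast data below are synthetic placeholders, not model output.

```python
import numpy as np

def brier_skill_score(p_fcst, event, p_clim=0.2):
    """Brier skill score against a climatological forecast for an
    event with climatological probability 0.2 (the lower quintile)."""
    bs = np.mean((p_fcst - event) ** 2)
    bs_ref = np.mean((p_clim - event) ** 2)
    return 1.0 - bs / bs_ref

# Synthetic hindcast set: probabilities of a lower-quintile cold spring
rng = np.random.default_rng(4)
event = rng.binomial(1, 0.2, size=45)                 # 1 = cold spring occurred
p_fcst = np.clip(0.2 + 0.4 * (event - 0.2)
                 + rng.normal(0, 0.1, 45), 0.0, 1.0)  # skillful but imperfect
print(brier_skill_score(p_fcst, event))               # > 0 indicates skill
```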

Philip G. Sansom, Christopher A. T. Ferro, David B. Stephenson, Lisa Goddard, and Simon J. Mason

Abstract

This study describes a systematic approach to selecting optimal statistical recalibration methods and hindcast designs for producing reliable probability forecasts on seasonal-to-decadal time scales. A new recalibration method is introduced that includes adjustments for both unconditional and conditional biases in the mean and variance of the forecast distribution and linear time-dependent bias in the mean. The complexity of the recalibration can be systematically varied by restricting the parameters. Simple recalibration methods may outperform more complex ones given limited training data. A new cross-validation methodology is proposed that allows the comparison of multiple recalibration methods and varying training periods using limited data.

Part I considers the effect on forecast skill of varying the recalibration complexity and training period length. The interaction between these factors is analyzed for gridbox forecasts of annual mean near-surface temperature from the CanCM4 model. Recalibration methods that include conditional adjustment of the ensemble mean outperform simple bias correction by issuing climatological forecasts where the model has limited skill. Trend-adjusted forecasts outperform forecasts without trend adjustment at almost 75% of grid boxes. The optimal training period is around 30 yr for trend-adjusted forecasts and around 15 yr otherwise. The optimal training period is strongly related to the length of the optimal climatology. Longer training periods may increase overall performance but at the expense of very poor forecasts where skill is limited.
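
One plausible reading of the mean adjustments, sketched as an ordinary least-squares fit with an intercept (unconditional bias), a slope on the ensemble mean (conditional bias), and a linear term in time (trend). This is an illustration of the idea under those assumptions, not the paper's exact recalibration method, and the data are synthetic.

```python
import numpy as np

def recalibrate(fcst_mean, obs, t):
    """Fit o = alpha + beta * f + gamma * t + eps by least squares:
    alpha  absorbs unconditional mean bias,
    beta   adjusts conditional bias in the mean,
    gamma  removes a linear time-dependent bias,
    and the residual variance provides a recalibrated spread.
    Restricting parameters (e.g., beta = 1, gamma = 0) recovers
    simpler methods such as plain bias correction."""
    X = np.column_stack([np.ones_like(fcst_mean), fcst_mean, t])
    coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
    resid = obs - X @ coef
    return coef, np.var(resid, ddof=X.shape[1])

rng = np.random.default_rng(5)
t = np.arange(30.0)                        # ~30-yr training period
signal = rng.normal(0, 1, 30)
fcst_mean = 1.5 + 0.8 * signal + 0.02 * t  # biased, drifting forecasts
obs = signal + 0.03 * t + rng.normal(0, 0.3, 30)
print(recalibrate(fcst_mean, obs, t))
```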

Stan Yip, Christopher A. T. Ferro, David B. Stephenson, and Ed Hawkins
Pascal J. Mailier, David B. Stephenson, Christopher A. T. Ferro, and Kevin I. Hodges

Abstract

The clustering in time (seriality) of extratropical cyclones is responsible for large cumulative insured losses in western Europe, though surprisingly little scientific attention has been given to this important property. This study investigates and quantifies the seriality of extratropical cyclones in the Northern Hemisphere using a point-process approach. A possible mechanism for serial clustering is the time-varying effect of the large-scale flow on individual cyclone tracks. Another mechanism is the generation by one “parent” cyclone of one or more “offspring” through secondary cyclogenesis. A long cyclone-track database was constructed for extended October–March winters from 1950 to 2003 using 6-h analyses of 850-mb relative vorticity derived from the NCEP–NCAR reanalysis. A dispersion statistic based on the variance-to-mean ratio of monthly cyclone counts was used as a measure of clustering. It reveals extensive regions of statistically significant clustering in the European exit region of the North Atlantic storm track and over the central North Pacific. Monthly cyclone counts were regressed on time-varying teleconnection indices with a log-linear Poisson model. Five independent teleconnection patterns were found to be significant factors over Europe: the North Atlantic Oscillation (NAO), the east Atlantic pattern, the Scandinavian pattern, the east Atlantic–western Russian pattern, and the polar–Eurasian pattern. The NAO alone is not sufficient for explaining the variability of cyclone counts in the North Atlantic region and western Europe. Rate dependence on time-varying teleconnection indices accounts for the variability in monthly cyclone counts, so a cluster process need not be invoked.
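
Both diagnostics are straightforward to sketch: the variance-to-mean dispersion statistic, and a log-linear Poisson regression of monthly counts on teleconnection indices (fitted here with statsmodels on synthetic stand-ins for the five patterns; all numbers are illustrative).

```python
import numpy as np
import statsmodels.api as sm

def dispersion(counts):
    """Variance-to-mean ratio of monthly cyclone counts; values above 1
    indicate clustering (overdispersion) relative to a Poisson process."""
    counts = np.asarray(counts, float)
    return counts.var(ddof=1) / counts.mean()

# Synthetic monthly counts driven by five teleconnection-like indices
rng = np.random.default_rng(6)
n_months = 318                         # e.g., six Oct-Mar months x 53 winters
indices = rng.normal(size=(n_months, 5))
rate = np.exp(1.0 + indices @ np.array([0.3, 0.15, -0.1, 0.05, 0.1]))
counts = rng.poisson(rate)

print("dispersion:", dispersion(counts))
X = sm.add_constant(indices)           # log-linear Poisson regression
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(fit.params)
```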
