Search Results

Items 1-6 of 6 for:

  • Author or Editor: David B. Stephenson
  • Monthly Weather Review
Ian T. Jolliffe and David B. Stephenson

Abstract

Verification is an important part of any forecasting system. It is usually achieved by computing the value of some measure or score that indicates how good the forecasts are. Many possible verification measures have been proposed, and to choose between them a number of desirable properties have been defined. For probability forecasts of a binary event, two of the best known of these properties are propriety and equitability. A proof that the two properties are incompatible for a wide class of verification measures is given in this paper, after briefly reviewing the two properties and some recent attempts to improve properties for the well-known Brier skill score.
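
As a minimal illustration of the score named in this abstract, the sketch below computes the Brier score for probability forecasts of a binary event, and a Brier skill score measured against a constant climatological forecast. The function names and the use of the sample base rate as the reference forecast are assumptions made for illustration; they are not taken from the paper.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Skill relative to always forecasting the sample base rate (an assumed reference)."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference score is zero: the event always or never occurred")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


outcomes = [1, 0, 0, 1]
perfect = brier_skill_score([1.0, 0.0, 0.0, 1.0], outcomes)
climatology = brier_skill_score([0.5, 0.5, 0.5, 0.5], outcomes)
```

Under this definition a perfect forecast scores 1 and the climatological reference scores 0.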

Marion P. Mittermaier and David B. Stephenson

Abstract

Synoptic observations are often treated as error-free representations of the true state of the real world. For example, when observations are used to verify numerical weather prediction (NWP) forecasts, forecast–observation differences (the total error) are often entirely attributed to forecast inaccuracy. Such simplification is no longer justifiable for short-lead forecasts made with increasingly accurate higher-resolution models. For example, at least 25% of t + 6 h individual Met Office site-specific (postprocessed) temperature forecasts now typically have total errors of less than 0.2 K, which are comparable to typical instrument measurement errors of around 0.1 K. In addition to instrument errors, uncertainty is introduced by measurements not being taken concurrently with the forecasts. For example, synoptic temperature observations in the United Kingdom are typically taken 10 min before the hour, whereas forecasts are generally extracted as instantaneous values on the hour. This study develops a simple yet robust statistical modeling procedure for assessing how serially correlated subhourly variations limit the forecast accuracy that can be achieved. The methodology is demonstrated by application to synoptic temperature observations sampled every minute at several locations around the United Kingdom. Results show that subhourly variations lead to sizeable forecast errors of 0.16–0.44 K for observations taken 10 min before the forecast issue time. The magnitude of this error depends on spatial location and the annual cycle, with the greater errors occurring in the warmer seasons and at inland sites. This important source of uncertainty consists of a bias due to the diurnal cycle, plus irreducible uncertainty due to unpredictable subhourly variations that fundamentally limit forecast accuracy.

Edward C. D. Pope, David B. Stephenson, and David R. Jackson

Abstract

Categorical probabilistic prediction is widely used for terrestrial and space weather forecasting as well as for other environmental forecasts. One example is a warning system for geomagnetic disturbances caused by space weather, which are often classified on a 10-level scale. The simplest approach assumes that the transition probabilities are stationary in time—the homogeneous Markov chain (HMC). We extend this approach by developing a flexible nonhomogeneous Markov chain (NHMC) model using Bayesian nonparametric estimation to describe the time-varying transition probabilities. The transition probabilities are updated using a modified Bayes’s rule that gradually forgets transitions in the distant past, with a tunable memory parameter. The approaches were tested by making daily geomagnetic state forecasts at lead times of 1–4 days and were verified over the period 2000–19 using the rank probability score (RPS). Both HMC and NHMC models were found to be skillful at all lead times when compared with climatological forecasts. The NHMC forecasts with an optimal memory parameter of ~100 days were found to be substantially more skillful than the HMC forecasts, with an RPS skill for the NHMC of 10.5% and 5.6% for lead times of 1 and 4 days ahead, respectively. The NHMC is thus a viable alternative approach for forecasting geomagnetic disturbances and could provide a new benchmark for producing operational forecasts. The approach is generic and is applicable to other forecasts that include discrete weather regimes or hydrological conditions (e.g., wet and dry days).
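
The idea of transition probabilities that gradually forget the distant past can be sketched as follows. This is not the paper's estimator: the exponential discounting of pseudo-counts toward a uniform prior, the function name `forgetful_transition_probs`, and the parameters `memory_days` and `prior` are all assumptions chosen for illustration.

```python
import math


def forgetful_transition_probs(states, n_states, memory_days, prior=1.0):
    """Estimate a transition matrix from a state sequence, down-weighting old transitions.

    Assumed scheme: before each new transition is counted, every pseudo-count
    relaxes toward its prior value by a factor exp(-1 / memory_days).
    """
    decay = math.exp(-1.0 / memory_days)
    counts = [[prior] * n_states for _ in range(n_states)]
    for prev, curr in zip(states, states[1:]):
        for row in counts:
            for j in range(n_states):
                # Shrink the evidence accumulated above the prior.
                row[j] = prior + decay * (row[j] - prior)
        counts[prev][curr] += 1.0
    # Normalize each row so that it is a probability distribution.
    return [[c / sum(row) for c in row] for row in counts]


daily_states = [0, 1, 0, 1, 0, 1]
probs = forgetful_transition_probs(daily_states, n_states=2, memory_days=100)
```

A very large `memory_days` approaches a homogeneous chain estimated from all past transitions, while a small value tracks only recent behavior.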

Cristina Primo, Christopher A. T. Ferro, Ian T. Jolliffe, and David B. Stephenson

Abstract

Probabilistic forecasts of atmospheric variables are often given as relative frequencies obtained from ensembles of deterministic forecasts. The detrimental effects of imperfect models and initial conditions on the quality of such forecasts can be mitigated by calibration. This paper shows that Bayesian methods currently used to incorporate prior information can be written as special cases of a beta-binomial model and correspond to a linear calibration of the relative frequencies. These methods are compared with a nonlinear calibration technique (i.e., logistic regression) using real precipitation forecasts. Calibration is found to be advantageous in all cases considered, and logistic regression is preferable to linear methods.
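
The link between a beta-binomial model and linear calibration can be checked numerically. In the sketch below, `k` of `m` ensemble members forecast the event and the prior is Beta(`alpha`, `beta`); the posterior mean is then a linear function of the relative frequency `k / m`. The function names and the example prior are assumptions for illustration only.

```python
def beta_binomial_calibrate(k, m, alpha, beta):
    """Posterior mean event probability under a Beta(alpha, beta) prior."""
    return (k + alpha) / (m + alpha + beta)


def linear_calibrate(rel_freq, m, alpha, beta):
    """The same estimate written as intercept + slope * relative frequency."""
    slope = m / (m + alpha + beta)
    intercept = alpha / (m + alpha + beta)
    return intercept + slope * rel_freq


# Example: 3 of 10 members forecast the event, with a uniform Beta(1, 1) prior.
bayes = beta_binomial_calibrate(3, 10, 1.0, 1.0)
linear = linear_calibrate(3 / 10, 10, 1.0, 1.0)
```

Both expressions give the same calibrated probability, which pulls the raw relative frequency toward the prior mean `alpha / (alpha + beta)`.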

Pascal J. Mailier, David B. Stephenson, Christopher A. T. Ferro, and Kevin I. Hodges

Abstract

The clustering in time (seriality) of extratropical cyclones is responsible for large cumulative insured losses in western Europe, though surprisingly little scientific attention has been given to this important property. This study investigates and quantifies the seriality of extratropical cyclones in the Northern Hemisphere using a point-process approach. A possible mechanism for serial clustering is the time-varying effect of the large-scale flow on individual cyclone tracks. Another mechanism is the generation by one “parent” cyclone of one or more “offspring” through secondary cyclogenesis. A long cyclone-track database was constructed for extended October–March winters from 1950 to 2003 using 6-h analyses of 850-mb relative vorticity derived from the NCEP–NCAR reanalysis. A dispersion statistic based on the variance-to-mean ratio of monthly cyclone counts was used as a measure of clustering. It reveals extensive regions of statistically significant clustering in the European exit region of the North Atlantic storm track and over the central North Pacific. Monthly cyclone counts were regressed on time-varying teleconnection indices with a log-linear Poisson model. Five independent teleconnection patterns were found to be significant factors over Europe: the North Atlantic Oscillation (NAO), the east Atlantic pattern, the Scandinavian pattern, the east Atlantic–western Russian pattern, and the polar–Eurasian pattern. The NAO alone is not sufficient for explaining the variability of cyclone counts in the North Atlantic region and western Europe. Rate dependence on time-varying teleconnection indices accounts for the variability in monthly cyclone counts, and a cluster process did not need to be invoked.
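
A dispersion statistic built from the variance-to-mean ratio of counts can be sketched as below. The exact form used in the paper is not given in the abstract; the version here (ratio minus one, so that a Poisson process gives a value near zero) is an assumed, commonly used form, and the function name is hypothetical.

```python
import statistics


def dispersion_statistic(counts):
    """Variance-to-mean ratio of counts minus one (assumed form).

    Positive values suggest clustering (overdispersion), values near zero are
    consistent with a Poisson process, and negative values suggest regularity.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero")
    return statistics.variance(counts) / mean - 1.0


regular = dispersion_statistic([2, 2, 2, 2])    # identical monthly counts
clustered = dispersion_statistic([0, 0, 8, 0])  # all cyclones in one month
```

The two example series have the same mean count of 2 per month but opposite signs of the statistic.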

Stefan Siegert, Omar Bellprat, Martin Ménégoz, David B. Stephenson, and Francisco J. Doblas-Reyes

Abstract

The skill of weather and climate forecast systems is often assessed by calculating the correlation coefficient between past forecasts and their verifying observations. Improvements in forecast skill can thus be quantified by correlation differences. The uncertainty in the correlation difference needs to be assessed to judge whether the observed difference constitutes a genuine improvement, or is compatible with random sampling variations. A widely used statistical test for correlation difference is known to be unsuitable, because it assumes that the competing forecasting systems are independent. In this paper, appropriate statistical methods are reviewed to assess correlation differences when the competing forecasting systems are strongly correlated with one another. The methods are used to compare correlation skill between seasonal temperature forecasts that differ in initialization scheme and model resolution. A simple power analysis framework is proposed to estimate the probability of correctly detecting skill improvements, and to determine the minimum number of samples required to reliably detect improvements. The proposed statistical test has a higher power of detecting improvements than the traditional test. The main examples suggest that sample sizes of climate hindcasts should be increased to about 40 years to ensure sufficiently high power. It is found that seasonal temperature forecasts are significantly improved by using realistic land surface initial conditions.
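
One generic way to respect the dependence between two competing forecast systems is to resample forecast-observation triples jointly. The sketch below uses a paired bootstrap to obtain a percentile interval for the correlation difference. It is a swapped-in illustration and not the statistical test proposed in the paper; the function names, the number of resamples, and the fixed seed are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr_diff(obs, fc_a, fc_b, n_boot=2000, level=0.95, seed=0):
    """Point estimate and percentile interval for corr(obs, fc_a) - corr(obs, fc_b).

    The same resampled indices are applied to all three series, so the
    dependence between the two forecast systems is preserved.
    """
    point = pearson(obs, fc_a) - pearson(obs, fc_b)
    rng = random.Random(seed)
    n = len(obs)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        o = [obs[i] for i in idx]
        a = [fc_a[i] for i in idx]
        b = [fc_b[i] for i in idx]
        try:
            diffs.append(pearson(o, a) - pearson(o, b))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with zero variance
    diffs.sort()
    lo_i = int((1.0 - level) / 2.0 * n_boot)
    return point, diffs[lo_i], diffs[n_boot - 1 - lo_i]


obs = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, -0.9, 1.1, 0.6, -0.2]
fc_a = [0.2, -1.0, 0.9, 1.2, -0.1, 0.3, -0.7, 0.8, 0.4, -0.5]
fc_b = [0.9, -0.3, 0.1, 1.0, 0.5, -0.6, -0.2, 0.2, 1.1, 0.4]
point, lower, upper = bootstrap_corr_diff(obs, fc_a, fc_b, n_boot=500)
```

An interval that excludes zero would suggest a difference in correlation skill beyond sampling variation, although with short hindcast records such intervals tend to be wide.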
