Search Results

You are looking at 1 - 10 of 37 items for

  • Author or Editor: David B. Stephenson x
  • Refine by Access: Content accessible to me x
Clear All Modify Search
David B. Stephenson

Abstract

This study investigates ways of quantifying the skill in forecasts of dichotomous weather events. The odds ratio, widely used in medical studies, can provide a powerful way of testing the association between categorical forecasts and observations. A skill score can be constructed from the odds ratio that is less sensitive to hedging than previously used scores. Furthermore, significance tests can easily be performed on the logarithm of the odds ratio to test whether the skill is purely due to chance sampling. Functions of the odds ratio and the Peirce skill score define a general class of skill scores that are symmetric with respect to taking the complement of the event. The study illustrates the ideas using Finley’s classic set of tornado forecasts.

Full access
Ian T. Jolliffe
and
David B. Stephenson

Abstract

Verification is an important part of any forecasting system. It is usually achieved by computing the value of some measure or score that indicates how good the forecasts are. Many possible verification measures have been proposed, and to choose between them a number of desirable properties have been defined. For probability forecasts of a binary event, two of the best known of these properties are propriety and equitability. A proof that the two properties are incompatible for a wide class of verification measures is given in this paper, after briefly reviewing the two properties and some recent attempts to improve properties for the well-known Brier skill score.

Full access
Marion P. Mittermaier
and
David B. Stephenson

Abstract

Synoptic observations are often treated as error-free representations of the true state of the real world. For example, when observations are used to verify numerical weather prediction (NWP) forecasts, forecast–observation differences (the total error) are often entirely attributed to forecast inaccuracy. Such simplification is no longer justifiable for short-lead forecasts made with increasingly accurate higher-resolution models. For example, at least 25% of t + 6 h individual Met Office site-specific (postprocessed) temperature forecasts now typically have total errors of less than 0.2 K, which are comparable to typical instrument measurement errors of around 0.1 K. In addition to instrument errors, uncertainty is introduced by measurements not being taken concurrently with the forecasts. For example, synoptic temperature observations in the United Kingdom are typically taken 10 min before the hour, whereas forecasts are generally extracted as instantaneous values on the hour. This study develops a simple yet robust statistical modeling procedure for assessing how serially correlated subhourly variations limit the forecast accuracy that can be achieved. The methodology is demonstrated by application to synoptic temperature observations sampled every minute at several locations around the United Kingdom. Results show that subhourly variations lead to sizeable forecast errors of 0.16–0.44 K for observations taken 10 min before the forecast issue time. The magnitude of this error depends on spatial location and the annual cycle, with the greater errors occurring in the warmer seasons and at inland sites. This important source of uncertainty consists of a bias due to the diurnal cycle, plus irreducible uncertainty due to unpredictable subhourly variations that fundamentally limit forecast accuracy.

Full access
Ian T. Jolliffe
and
David B. Stephenson
Full access
Thomas J. Bracegirdle
and
David B. Stephenson

Abstract

Statistical relationships between future and historical model runs in multimodel ensembles (MMEs) are increasingly exploited to make more constrained projections of climate change. However, such emergent constraints may be spurious and can arise because of shared (common) errors in a particular MME or because of overly influential models. This study assesses the robustness of emergent constraints used for Arctic warming by comparison of such constraints in ensembles generated by the two most recent Coupled Model Intercomparison Project (CMIP) experiments: CMIP3 and CMIP5. An ensemble regression approach is used to estimate emergent constraints in Arctic wintertime surface air temperature change over the twenty-first century under the Special Report on Emission Scenarios (SRES) A1B scenario in CMIP3 and the Representative Concentration Pathway (RCP) 4.5 scenario in CMIP5. To take account of different scenarios, this study focuses on polar amplification by using temperature responses at each grid point that are scaled by the global mean temperature response for each climate model. In most locations, the estimated emergent constraints are reassuringly similar in CMIP3 and CMIP5 and differences could have easily arisen from sampling variation. However, there is some indication that the emergent constraint and polar amplification is substantially larger in CMIP5 over the Sea of Okhotsk and the Bering Sea. Residual diagnostics identify one climate model in CMIP5 that has a notable influence on estimated emergent constraints over the Bering Sea and one in CMIP3 that that has a notable influence more widely along the sea ice edge and into midlatitudes over the western North Atlantic.

Full access
David B. Stephenson
and
Isaac M. Held

Abstract

The response of the Geophysical Fluid Dynamics Laboratory (GFDL) coupled ocean-atmosphere R15, 9-level GCM to gradually increasing C02 amounts is analyzed with emphasis on the changes in the stationary waves and storm tracks in the Northern Hemisphere wintertime troposphere. A large part of the change is described by an equivalent-barotropic stationary wave with a high over eastern Canada and a low over southern Alaska. Consistent with this, the Atlantic jet weakens near the North American coast.

Perpetual winter runs of an R15, nine-level atmospheric GCM with sea surface temperature, sea ice thickness, and soil moisture values prescribed from the coupled GCM results are able to reproduce the coupled model's response qualitatively. Consistent with the weakened baroclinicity associated with the stationary wave change, the Atlantic storm track weakens with increasing C02 concentrations while the Pacific storm track does not change in strength substantially.

An R15, nine-level atmospheric model linearized about the zonal time-mean state is used to analyze the contributions to the stationary wave response. With mountains, diabatic heating, and transient forcings the linear model gives a stationary wave change in qualitative agreement with the change seen in the coupled and perpetual models. Transients and diabatic heating appear to be the major forcing terms, while changes in zonal-mean basic state and topographic forcing play only a small role. A substantial part of the diabatic response is due to changes in tropical latent heating.

Full access
Full access
Christopher A. T. Ferro
and
David B. Stephenson

Abstract

Verifying forecasts of rare events is challenging, in part because traditional performance measures degenerate to trivial values as events become rarer. The extreme dependency score was proposed recently as a nondegenerating measure for the quality of deterministic forecasts of rare binary events. This measure has some undesirable properties, including being both easy to hedge and dependent on the base rate. A symmetric extreme dependency score was also proposed recently, but this too is dependent on the base rate. These two scores and their properties are reviewed and the meanings of several properties, such as base-rate dependence and complement symmetry that have caused confusion are clarified. Two modified versions of the extreme dependency score, the extremal dependence index, and the symmetric extremal dependence index, are then proposed and are shown to overcome all of its shortcomings. The new measures are nondegenerating, base-rate independent, asymptotically equitable, harder to hedge, and have regular isopleths that correspond to symmetric and asymmetric relative operating characteristic curves.

Full access
Edward C. D. Pope
,
David B. Stephenson
, and
David R. Jackson

Abstract

Categorical probabilistic prediction is widely used for terrestrial and space weather forecasting as well as for other environmental forecasts. One example is a warning system for geomagnetic disturbances caused by space weather, which are often classified on a 10-level scale. The simplest approach assumes that the transition probabilities are stationary in time—the homogeneous Markov chain (HMC). We extend this approach by developing a flexible nonhomogeneous Markov chain (NHMC) model using Bayesian nonparametric estimation to describe the time-varying transition probabilities. The transition probabilities are updated using a modified Bayes’s rule that gradually forgets transitions in the distant past, with a tunable memory parameter. The approaches were tested by making daily geomagnetic state forecasts at lead times of 1–4 days and were verified over the period 2000–19 using the rank probability score (RPS). Both HMC and NHMC models were found to be skillful at all lead times when compared with climatological forecasts. The NHMC forecasts with an optimal memory parameter of ~100 days were found to be substantially more skillful than the HMC forecasts, with an RPS skill for the NHMC of 10.5% and 5.6% for lead times of 1 and 4 days ahead, respectively. The NHMC is thus a viable alternative approach for forecasting geomagnetic disturbances and could provide a new benchmark for producing operational forecasts. The approach is generic and is applicable to other forecasts that include discrete weather regimes or hydrological conditions (e.g., wet and dry days).

Free access
Mathew Alexander Stiller-Reeve
,
David B. Stephenson
, and
Thomas Spengler

Abstract

For climate services to be relevant and informative for users, scientific data definitions need to match users’ perceptions or beliefs. This study proposes and tests novel yet simple methods to compare beliefs of timing of recurrent climatic events with empirical evidence from multiple historical time series. The methods are tested by applying them to the onset date of the monsoon in Bangladesh, where several scientific monsoon definitions can be applied, yielding different results for monsoon onset dates. It is a challenge to know which monsoon definition compares best with people’s beliefs. Time series from eight different scientific monsoon definitions in six regions are compared with respondent beliefs from a previously completed survey concerning the monsoon onset.

Beliefs about the timing of the monsoon onset are represented probabilistically for each respondent by constructing a probability mass function (PMF) from elicited responses about the earliest, normal, and latest dates for the event. A three-parameter circular modified triangular distribution (CMTD) is used to allow for the possibility (albeit small) of the onset at any time of the year. These distributions are then compared to the historical time series using two approaches: likelihood scores, and the mean and standard deviation of time series of dates simulated from each belief distribution.

The methods proposed give the basis for further iterative discussion with decision-makers in the development of eventual climate services. This study uses Jessore, Bangladesh, as an example and finds that a rainfall definition, applying a 10 mm day−1 threshold to NCEP–NCAR reanalysis (Reanalyis-1) data, best matches the survey respondents’ beliefs about monsoon onset.

Full access