Search Results

You are looking at 1 - 10 of 28 items for

  • Author or Editor: Simon Mason
  • All content
Simon J. Mason

Abstract

The Brier and ranked probability skill scores are widely used as skill metrics of probabilistic forecasts of weather and climate. As skill scores, they measure the extent to which a forecast strategy outperforms a (usually simpler) reference forecast strategy. The most widely used reference strategy is that of “climatology,” in which the climatological probability (or probabilities, in the case of the ranked probability skill score) of the forecast variable is issued perpetually. The Brier and ranked probability skill scores are often considered harsh standards. It is shown that the scores are harsh because the expected value of these skill scores is less than 0 if nonclimatological forecast probabilities are issued. As a result, negative skill scores can often hide useful information content in the forecasts. An alternative formulation of the skill scores, based on a reference strategy in which the outcome is independent of the forecast, is equivalent to using randomly assigned probabilities but is not strictly proper. Nevertheless, positive values of the Brier skill score with random guessing as the reference strategy correspond to positive-sloping reliability curves, which is intuitively appealing because of the implication that the conditional probability of the forecast event increases as the forecast probability increases.
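As a rough illustration of the climatology reference discussed above, the following Python sketch (not taken from the paper; the function names and toy data are invented) computes the Brier skill score of a set of probability forecasts for a binary event, with the observed base rate issued perpetually as the reference forecast.

    import numpy as np

    def brier_score(probs, outcomes):
        """Mean squared difference between forecast probabilities and binary outcomes."""
        probs = np.asarray(probs, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        return np.mean((probs - outcomes) ** 2)

    def brier_skill_score(probs, outcomes):
        """Brier skill score with perpetual climatology as the reference strategy.
        The climatological probability is estimated here from the verifying observations.
        """
        outcomes = np.asarray(outcomes, dtype=float)
        climatology = outcomes.mean()                           # base rate of the event
        bs_forecast = brier_score(probs, outcomes)
        bs_reference = brier_score(np.full_like(outcomes, climatology), outcomes)
        return 1.0 - bs_forecast / bs_reference

    # Five probability forecasts of a binary event (1 = event occurred)
    forecasts = [0.8, 0.6, 0.2, 0.1, 0.7]
    observed = [1, 1, 0, 0, 1]
    print(brier_skill_score(forecasts, observed))

A score of 0 corresponds to doing no better than perpetual climatology; the point made in the abstract is that this reference can drive the expected score below 0 even for forecasts that carry useful information.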

Full access
Simon J. Mason and Lisa Goddard

Extreme phases of the El Niño–Southern Oscillation (ENSO) phenomenon have been blamed for precipitation anomalies in many areas of the world. In some areas the probability of above-normal precipitation may be increased during warm or cold events, while in others below-normal precipitation may be more likely. The percentages of times that seasonal precipitation over land areas was above, near, and below normal during the eight strongest El Niño and La Niña episodes are tabulated, and the significance levels of the posterior probabilities are calculated using the hypergeometric distribution. These frequencies may provide a useful starting point for probabilistic climate forecasts during strong ENSO events. Areas with significantly high or low frequencies of above- or below-normal precipitation are highlighted, and attempts are made to estimate the proportion of land areas with significant ENSO-related precipitation signals.
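The hypergeometric significance calculation mentioned above can be sketched in a few lines of Python; the numbers below are purely hypothetical and serve only to show the form of the test, not results from the paper.

    from scipy.stats import hypergeom

    # Hypothetical example: 50 seasons classified into terciles, so about 17 of
    # them are "above normal" by construction.
    n_years = 50      # total seasons in the record
    n_above = 17      # seasons classified as above normal (upper tercile)
    n_enso = 8        # strongest El Niño episodes considered
    k_observed = 6    # El Niño seasons that were above normal at a grid point

    # Probability of k_observed or more above-normal seasons falling in the
    # eight El Niño years purely by chance (hypergeometric survival function).
    p_value = hypergeom.sf(k_observed - 1, n_years, n_above, n_enso)
    print(f"P(X >= {k_observed}) = {p_value:.4f}")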

There is a danger of overstating the global impact of ENSO events because only about 20%–30% of land areas experience significantly increased probabilities of above- or below-normal seasonal precipitation during at least some part of the year. Since different areas are affected at different times of the year, the fraction of global land affected in any particular season is only about 15%–25%. The danger of focusing on the impact of only warm-phase events is emphasized also: the global impact of La Niña seems to be at least as widespread as that of El Niño. Furthermore, there are a number of notable asymmetries in precipitation responses to El Niño and La Niña events. For many areas it should not be assumed that the typical climate anomaly of one ENSO extreme is likely to be the opposite of the other extreme. A high frequency of above-normal precipitation during strong El Niño conditions, for example, does not guarantee a high frequency of below-normal precipitation during La Niña events, or vice versa. On a global basis El Niño events are predominantly associated with below-normal seasonal precipitation over land, whereas La Niña events result in a wider extent of above-normal precipitation.

Full access
Bradfield Lyon and Simon J. Mason

Abstract

Following the onset of the strong El Niño of 1997–98, historical rainfall teleconnection patterns and dynamical model predictions both suggested an enhanced likelihood of drought for southern Africa, but widespread dry conditions failed to materialize. Results from a diagnostic study of NCEP–NCAR reanalysis data are reported here, demonstrating how the large- and regional-scale atmospheric circulations during the 1997–98 El Niño differed from previous events. Emphasis is placed on the January–March 1998 season and comparisons with the strong 1982–83 El Niño, although composites of eight events occurring between 1950 and 2000 are also considered. In a companion paper, simulation runs from three atmospheric general circulation models (AGCMs) and forecasts from three fully coupled models are employed to investigate the extent to which the anomalous atmospheric circulation patterns during the 1997–98 El Niño may have been anticipated.

Observational results indicate that the 1997–98 El Niño displayed significant differences from both the 1982–83 episode and the composite event. An unusually strong Angola low, exceptionally high sea surface temperatures (SSTs) in the western Indian and eastern tropical South Atlantic Oceans, and an enhanced northerly moisture flux from the continental interior and the western tropical Indian Ocean all appear to have contributed to more seasonal rainfall in 1997–98 over much of the southern African subcontinent than in past El Niño events.

Full access
Bradfield Lyon and Simon J. Mason

Abstract

This is the second of a two-part investigation of rainfall in southern Africa during the strong El Niño of 1997/98. In Part I it was shown that widespread drought in southern Africa, typical of past El Niño events occurring between 1950 and 2000, generally failed to materialize during the 1997/98 El Niño, most notably during January–March (JFM) 1998. Here, output from three atmospheric general circulation models (AGCMs) forced with observed sea surface temperatures (SSTs) and seasonal forecasts from three coupled models are examined to assess the extent to which conditions in JFM 1998 could have been anticipated.

All three AGCMs generated widespread drought conditions across southern Africa, similar to those during past El Niño events, and did a generally poor job in generating the observed rainfall and atmospheric circulation anomaly patterns, particularly over the eastern and southern Indian Ocean. In contrast, two of the three coupled models showed a higher probability of wetter conditions in JFM 1998 than for past El Niño events, with an enhanced moisture flux from the Indian Ocean, as was observed. However, neither the AGCMs nor the coupled models generated anomalous stationary wave patterns consistent with observations over the South Atlantic and Pacific. The failure of any of the models to reproduce an enhanced Angola low (favoring rainfall) associated with an anomalous wave train in this region suggests that the coupled models that did indicate wetter conditions in JFM 1998 compared to previous El Niño episodes may have done so, at least partially, for the wrong reasons. The general inability of the climate models used in this study to generate key features of the seasonal climate over southern Africa in JFM 1998 suggests that internal atmospheric variability contributed to the observed rainfall and circulation patterns that year. With the caveat that current climate models may not properly respond to SST boundary forcing important to simulating southern Africa climate, this study finds that the JFM 1998 rainfall in southern Africa may have been largely unpredictable on seasonal time scales.

Full access
Simon J. Mason and Andreas P. Weigel

Abstract

There are numerous reasons for calculating forecast verification scores, and considerable attention has been given to designing and analyzing the properties of scores that can be used for scientific purposes. Much less attention has been given to scores that may be useful for administrative reasons, such as communicating changes in forecast quality to bureaucrats and providing indications of forecast quality to the general public. The two-alternative forced choice (2AFC) test is proposed as a scoring procedure that is sufficiently generic to be usable on forecasts ranging from simple yes–no forecasts of dichotomous outcomes to forecasts of continuous variables, and it can be used with deterministic or probabilistic forecasts without seriously reducing the more complex information when it is available. Although, as with any single verification score, the proposed test has limitations, it does have broad intuitive appeal in that the expected score of an unskilled set of forecasts (random guessing or perpetually identical forecasts) is 50%, and it is interpretable as an indication of how often the forecasts are correct, even when the forecasts are expressed probabilistically and/or the observations are not discrete.
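A minimal sketch of the 2AFC score is given below for the simplest setting, deterministic forecasts of a continuous observed variable; the paper derives versions for many other combinations of forecast and observation types. The function name and toy data are invented.

    import itertools
    import numpy as np

    def two_afc(forecasts, observations):
        """Proportion of pairs of distinct observations whose ordering is
        reproduced by the forecasts (tied forecasts score 0.5)."""
        forecasts = np.asarray(forecasts, dtype=float)
        observations = np.asarray(observations, dtype=float)
        hits, pairs = 0.0, 0
        for i, j in itertools.combinations(range(len(observations)), 2):
            if observations[i] == observations[j]:
                continue                     # only pairs with distinct outcomes count
            pairs += 1
            if observations[i] > observations[j]:
                i, j = j, i                  # orient the pair so observation j is larger
            if forecasts[j] > forecasts[i]:
                hits += 1.0                  # forecasts rank the pair correctly
            elif forecasts[j] == forecasts[i]:
                hits += 0.5                  # tied forecasts: no better than guessing
        return hits / pairs

    # Random guesses score about 0.5; forecasts related to the outcome score higher
    rng = np.random.default_rng(0)
    obs = rng.normal(size=200)
    print(two_afc(rng.normal(size=200), obs))                       # close to 0.5
    print(two_afc(obs + rng.normal(scale=0.5, size=200), obs))      # above 0.5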

Full access
Willem A. Landman and Simon J. Mason

Abstract

The skill of global-scale sea surface temperature forecasts using a statistically based linear forecasting technique is investigated. Canonical variates are used to make monthly sea surface temperature anomaly forecasts using evolutionary and steady-state features of antecedent sea surface temperatures as predictors. Levels of forecast skill are investigated over several months' lead time by comparing the model performance with a simple forecast strategy involving the persistence of sea surface temperature anomalies. Forecast skill is investigated over an independent test period of 18 yr (1982/83–1999/2000), for which the model training period was updated every 3 yr. Forecasts for the equatorial Pacific Ocean are a significant improvement over a strategy of random guessing, and outscore forecasts of persisted anomalies beyond lead times of about one season during the development stages of the El Niño–Southern Oscillation phenomenon, but only outscore forecasts of persisted anomalies beyond 6 months' lead time during its most intense phase. Model predictions of the tropical Indian Ocean outscore persistence during the second half of the boreal winter, that is, from about December or January, with maximum skill during the March–May spring season, but poor skill during the autumn months from September to November. Some loss in predictability of the equatorial Pacific and Indian Oceans is evident during the early and mid-1990s, but forecasts appear to have improved in the last few years. The tropical Atlantic Ocean forecast skill has generally been poor. There is little evidence of forecast skill over the midlatitudes in any of the oceans. However, during the spring months significant skill has been found over the Indian Ocean as far south as 20°S and over the southern North Atlantic as far north as 30°N, both of which outscore persistence from lead times of less than about one season.
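The comparison against persisted anomalies described above can be illustrated with the short, purely hypothetical sketch below; the synthetic data, the 3-month lead, and the use of a simple anomaly correlation are stand-ins and are not the canonical variate model or the verification procedure of the paper.

    import numpy as np

    def persistence_forecast(anomalies, lead):
        # Persist the most recently observed anomaly as the forecast 'lead' months ahead
        return anomalies[:-lead]

    def anomaly_correlation(forecasts, verification):
        # Pearson correlation between forecast and verifying anomalies
        return np.corrcoef(forecasts, verification)[0, 1]

    # Synthetic monthly SST anomalies and a stand-in for model forecasts at a 3-month lead
    rng = np.random.default_rng(1)
    sst_anomaly = np.cumsum(rng.normal(scale=0.1, size=240))
    lead = 3
    model_forecast = sst_anomaly[lead:] + rng.normal(scale=0.3, size=240 - lead)

    verifying = sst_anomaly[lead:]
    print("model skill:      ", anomaly_correlation(model_forecast, verifying))
    print("persistence skill:", anomaly_correlation(persistence_forecast(sst_anomaly, lead), verifying))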

Full access
Simon J. Mason and Nicholas E. Graham

Abstract

The relative operating characteristic (ROC) curve is a highly flexible method for representing the quality of dichotomous, categorical, continuous, and probabilistic forecasts. The method is based on ratios that measure the proportions of events and nonevents for which warnings were provided. These ratios provide estimates of the probabilities that an event will be forewarned and that an incorrect warning will be provided for a nonevent. Some guidelines for interpreting the ROC curve are provided. While the ROC curve is of direct interest to the user, the warning is provided in advance of the outcome and so there is additional value in knowing the probability of an event occurring contingent upon a warning being provided or not provided. An alternative method to the ROC curve is proposed that represents forecast quality when expressed in terms of probabilities of events occurring contingent upon the warnings provided. The ratios used provide estimates of the probability of an event occurring given the forecast that is issued. Some problems in constructing the curve in a manner that is directly analogous to that for the ROC curve are highlighted, and so an alternative approach is proposed. In the context of probabilistic forecasts, the ROC curve provides a means of identifying the forecast probability at which forecast value is optimized. In the context of continuous variables, the proposed relative operating levels curve indicates the exceedence threshold for defining an event at which forecast skill is optimized, and can enable the forecast user to estimate the probabilities of events other than that defined by the forecaster.
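The two ratios on which the ROC curve is based, the probability that an event is forewarned and the probability that a warning is issued for a nonevent, can be sketched as follows; the probability forecasts, outcomes, and warning thresholds are invented for illustration only.

    import numpy as np

    def roc_points(forecast_probs, events, thresholds):
        """Hit rate and false-alarm rate when a warning is issued whenever the
        forecast probability meets or exceeds each threshold."""
        forecast_probs = np.asarray(forecast_probs, dtype=float)
        events = np.asarray(events, dtype=bool)
        hit_rates, false_alarm_rates = [], []
        for t in thresholds:
            warned = forecast_probs >= t
            hit_rates.append(np.mean(warned[events]))            # P(warning | event)
            false_alarm_rates.append(np.mean(warned[~events]))   # P(warning | nonevent)
        return np.array(false_alarm_rates), np.array(hit_rates)

    # Hypothetical probability forecasts of a binary event
    probs = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
    events = np.array([1, 1, 0, 1, 0, 1, 0, 0], dtype=bool)
    far, hr = roc_points(probs, events, thresholds=np.linspace(0.0, 1.0, 11))
    print(np.column_stack([far, hr]))    # points tracing the ROC curve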

Full access
Simon J. Mason and Gillian M. Mimmack

Abstract

Numerous models have been developed in recent years to provide predictions of the state of the El Niño–Southern Oscillation (ENSO) phenomenon. Predictions of the ENSO phenomenon are usually presented in deterministic form, but because of the inherent uncertainty involved, probabilistic forecasts should be provided. In this paper, various statistical methods are used to calculate probabilities for monthly Niño-3.4 anomalies within predefined ranges, or categories. The statistical methods used are predictive discriminant analysis, canonical variate analysis, and various forms of generalized linear models. In addition, probabilistic forecasts are derived from a multiple regression model by using contingency tables and from the model's prediction intervals. By using identical sets of predictors and predictands, the methods are compared in terms of their performance over an independent retroactive forecast period, which includes the 1980s and 1990s. The models outperform persistence and damped persistence as reference forecast strategies at some times of the year. The models have greatest skill in predicting El Niño, although La Niña is predicted with greater skill at longer lead times and with greater reliability. The forecasts for the ENSO extremes are reasonably well calibrated, and so the forecast probabilities are reliable estimates of forecast uncertainty.
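As an illustration of the prediction-interval route to probabilistic forecasts mentioned above, the sketch below converts a hypothetical regression point forecast and its forecast standard error into probabilities for predefined Niño-3.4 anomaly categories. Gaussian forecast errors are assumed for simplicity (regression prediction intervals would more properly use a t distribution), and the category boundaries and numbers are invented.

    import numpy as np
    from scipy.stats import norm

    def category_probabilities(point_forecast, forecast_std, category_edges):
        # Probability of the predictand falling in each category, assuming
        # Gaussian forecast errors centred on the regression point forecast
        cdf = norm.cdf(category_edges, loc=point_forecast, scale=forecast_std)
        return np.diff(np.concatenate(([0.0], cdf, [1.0])))

    # Hypothetical Niño-3.4 anomaly categories (degrees C)
    edges = np.array([-1.0, -0.5, 0.5, 1.0])
    labels = ["< -1.0", "-1.0 to -0.5", "-0.5 to 0.5", "0.5 to 1.0", "> 1.0"]
    probs = category_probabilities(point_forecast=0.8, forecast_std=0.6, category_edges=edges)
    for label, p in zip(labels, probs):
        print(f"{label:>13}: {p:.2f}")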

“All models are wrong, but some are useful.” (G. Box)

Full access
Anthony G. Barnston and Simon J. Mason

Abstract

This paper evaluates the quality of real-time seasonal probabilistic forecasts of the extreme 15% tails of the climatological distribution of temperature and precipitation issued by the International Research Institute for Climate and Society (IRI) from 1998 through 2009. IRI’s forecasts have been based largely on a two-tiered multimodel dynamical prediction system. Forecasts of the 15% extremes have been consistent with the corresponding probabilistic forecasts for the standard tercile-based categories; however, nonclimatological forecasts for the extremes have been issued sparingly. Results indicate positive skill in terms of resolution and discrimination for the extremes forecasts, particularly in the tropics. Additionally, with the exception of some overconfidence for extreme above-normal precipitation and a strong cool bias for temperature, reliability analyses suggest generally good calibration. Skills for temperature are generally higher than those for precipitation, due both to correct forecasts of increased probabilities of extremely high (above the upper 15th percentile) temperatures associated with warming trends, and to better discrimination of interannual variability. However, above-normal temperature extremes were substantially underforecast, as noted also for the IRI’s tercile forecasts.

Full access
Andreas P. Weigel and Simon J. Mason

Abstract

This article builds on the study of Mason and Weigel, in which the generalized discrimination score D was introduced. This score quantifies whether a set of observed outcomes can be correctly discriminated by the corresponding forecasts (i.e., it is a measure of the skill attribute of discrimination). Because of its generic definition, D can be adapted to essentially all relevant verification contexts, ranging from simple yes–no forecasts of binary outcomes to probabilistic forecasts of continuous variables. For most of these cases, Mason and Weigel have derived expressions for D, many of which have turned out to be equivalent to scores that are already known under different names. However, no guidance was provided on how to calculate D for ensemble forecasts. This gap is aggravated by the fact that there are currently very few measures of forecast quality that can be applied directly to ensemble forecasts without requiring that probabilities be derived from the ensemble members prior to verification. This study seeks to close this gap. A definition is proposed of how ensemble forecasts can be ranked; the ranks of the ensemble forecasts can then be used as a basis for attempting to discriminate between the corresponding observations. Given this definition, formulations of D are derived that are directly applicable to ensemble forecasts.
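One plausible reading of the pairwise-ranking idea described above is sketched below: two ensembles are ranked by the fraction of member pairs in which one exceeds the other, and D is then the proportion of observation pairs whose ordering is matched by that ranking. This is an illustrative sketch under those assumptions, not necessarily the exact formulation derived in the paper; the function names and toy data are invented.

    import itertools
    import numpy as np

    def ensemble_rank_statistic(ens_a, ens_b):
        # Fraction of member pairs with b > a (ties count 0.5); values above 0.5
        # mean ensemble b is ranked higher than ensemble a
        a = np.asarray(ens_a, dtype=float)[:, None]
        b = np.asarray(ens_b, dtype=float)[None, :]
        return np.mean((b > a) + 0.5 * (b == a))

    def generalized_discrimination(ensembles, observations):
        # Proportion of observation pairs whose ordering is matched by the
        # pairwise ranking of the corresponding ensembles (ties score 0.5)
        observations = np.asarray(observations, dtype=float)
        hits, pairs = 0.0, 0
        for i, j in itertools.combinations(range(len(observations)), 2):
            if observations[i] == observations[j]:
                continue
            pairs += 1
            if observations[i] > observations[j]:
                i, j = j, i                  # observation j is now the larger one
            f = ensemble_rank_statistic(ensembles[i], ensembles[j])
            if f > 0.5:
                hits += 1.0                  # ensemble j ranked higher, as observed
            elif f == 0.5:
                hits += 0.5
        return hits / pairs

    # Hypothetical 9-member ensembles for 50 cases, weakly related to the observations
    rng = np.random.default_rng(2)
    obs = rng.normal(size=50)
    ens = obs[:, None] + rng.normal(scale=1.0, size=(50, 9))
    print(generalized_discrimination(ens, obs))    # above 0.5 indicates some discrimination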

Full access