Search Results
You are looking at 1 - 10 of 33 items for
- Author or Editor: David B. Stephenson
Abstract
This study investigates ways of quantifying the skill in forecasts of dichotomous weather events. The odds ratio, widely used in medical studies, can provide a powerful way of testing the association between categorical forecasts and observations. A skill score can be constructed from the odds ratio that is less sensitive to hedging than previously used scores. Furthermore, significance tests can easily be performed on the logarithm of the odds ratio to test whether the skill is purely due to chance sampling. Functions of the odds ratio and the Peirce skill score define a general class of skill scores that are symmetric with respect to taking the complement of the event. The study illustrates the ideas using Finley’s classic set of tornado forecasts.
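As an illustration of the approach described in this abstract (a minimal sketch in Python, not code from the paper), the snippet below computes the odds ratio, the skill score derived from it, and an approximate significance test on the log odds ratio for a 2x2 contingency table. The counts are the commonly quoted values from Finley's tornado forecasts and should be treated as illustrative.

```python
# Sketch: odds ratio verification of dichotomous (yes/no) forecasts.
# Counts are the commonly quoted Finley tornado contingency table
# (hits, false alarms, misses, correct rejections); illustrative only.
import math

a, b, c, d = 28, 72, 23, 2680

theta = (a * d) / (b * c)             # odds ratio
orss = (theta - 1) / (theta + 1)      # skill score constructed from the odds ratio

# Significance test on the log odds ratio: under no association, log(theta) = 0,
# with approximate standard error sqrt(1/a + 1/b + 1/c + 1/d).
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = math.log(theta) / se

print(f"odds ratio = {theta:.1f}, odds ratio skill score = {orss:.2f}, z = {z:.1f}")
```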
Abstract
Synoptic observations are often treated as error-free representations of the true state of the real world. For example, when observations are used to verify numerical weather prediction (NWP) forecasts, forecast–observation differences (the total error) are often entirely attributed to forecast inaccuracy. Such simplification is no longer justifiable for short-lead forecasts made with increasingly accurate higher-resolution models. For example, at least 25% of t + 6 h individual Met Office site-specific (postprocessed) temperature forecasts now typically have total errors of less than 0.2 K, which are comparable to typical instrument measurement errors of around 0.1 K. In addition to instrument errors, uncertainty is introduced by measurements not being taken concurrently with the forecasts. For example, synoptic temperature observations in the United Kingdom are typically taken 10 min before the hour, whereas forecasts are generally extracted as instantaneous values on the hour. This study develops a simple yet robust statistical modeling procedure for assessing how serially correlated subhourly variations limit the forecast accuracy that can be achieved. The methodology is demonstrated by application to synoptic temperature observations sampled every minute at several locations around the United Kingdom. Results show that subhourly variations lead to sizeable forecast errors of 0.16–0.44 K for observations taken 10 min before the forecast issue time. The magnitude of this error depends on spatial location and the annual cycle, with the greater errors occurring in the warmer seasons and at inland sites. This important source of uncertainty consists of a bias due to the diurnal cycle, plus irreducible uncertainty due to unpredictable subhourly variations that fundamentally limit forecast accuracy.
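To make the procedure concrete, here is a minimal sketch (synthetic minute-resolution data standing in for the station records, not the authors' code) that pairs each value taken 10 min before the hour with the on-the-hour value it would be verified against, and then splits the differences into a diurnal-cycle bias and an irreducible residual spread.

```python
# Sketch with synthetic data: how much do observations taken 10 min before
# the hour differ from the on-the-hour values they are compared against?
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one month of 1-min temperatures (K): a diurnal cycle plus
# serially correlated AR(1) noise mimicking subhourly variation.
n_min = 30 * 24 * 60
t = np.arange(n_min)
diurnal = 5.0 * np.sin(2 * np.pi * t / (24 * 60))
noise = np.zeros(n_min)
for i in range(1, n_min):
    noise[i] = 0.995 * noise[i - 1] + rng.normal(scale=0.02)
temp = 285.0 + diurnal + noise

obs_before = temp[50::60][:-1]   # values at 10 min before each hour
on_the_hour = temp[60::60]       # values on the following hour
diff = on_the_hour - obs_before

# Split into a bias that depends on the time of day (the diurnal cycle) and
# an irreducible spread from unpredictable subhourly variations.
hour_of_day = np.arange(diff.size) % 24
bias_by_hour = np.array([diff[hour_of_day == h].mean() for h in range(24)])
residual = diff - bias_by_hour[hour_of_day]
print(f"diurnal bias range = {np.ptp(bias_by_hour):.3f} K, "
      f"residual spread = {residual.std(ddof=1):.3f} K")
```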
Abstract
Verification is an important part of any forecasting system. It is usually achieved by computing the value of some measure or score that indicates how good the forecasts are. Many possible verification measures have been proposed, and to choose between them a number of desirable properties have been defined. For probability forecasts of a binary event, two of the best known of these properties are propriety and equitability. A proof that the two properties are incompatible for a wide class of verification measures is given in this paper, after briefly reviewing the two properties and some recent attempts to improve properties for the well-known Brier skill score.
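The propriety half of this result is easy to visualize. The short sketch below (illustrative only, not part of the proof) confirms numerically that the expected Brier score for a binary event with true probability p is minimized by forecasting q = p, which is what makes hedging unattractive under a proper score.

```python
# Sketch: propriety of the Brier score. For a binary event with true
# probability p, the expected score p*(q-1)^2 + (1-p)*q^2 is smallest
# when the forecast probability q equals p.
import numpy as np

p = 0.3
q = np.linspace(0.0, 1.0, 101)
expected_brier = p * (q - 1.0) ** 2 + (1.0 - p) * q ** 2

best_q = q[np.argmin(expected_brier)]
print(f"expected Brier score minimized at q = {best_q:.2f} (true p = {p})")
```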
Abstract
The response of the Geophysical Fluid Dynamics Laboratory (GFDL) coupled ocean-atmosphere R15, nine-level GCM to gradually increasing CO2 amounts is analyzed with emphasis on the changes in the stationary waves and storm tracks in the Northern Hemisphere wintertime troposphere. A large part of the change is described by an equivalent-barotropic stationary wave with a high over eastern Canada and a low over southern Alaska. Consistent with this, the Atlantic jet weakens near the North American coast.
Perpetual winter runs of an R15, nine-level atmospheric GCM with sea surface temperature, sea ice thickness, and soil moisture values prescribed from the coupled GCM results are able to reproduce the coupled model's response qualitatively. Consistent with the weakened baroclinicity associated with the stationary wave change, the Atlantic storm track weakens with increasing CO2 concentrations while the Pacific storm track does not change substantially in strength.
An R15, nine-level atmospheric model linearized about the zonal time-mean state is used to analyze the contributions to the stationary wave response. With mountains, diabatic heating, and transient forcings, the linear model gives a stationary wave change in qualitative agreement with the change seen in the coupled and perpetual models. Transients and diabatic heating appear to be the major forcing terms, while changes in the zonal-mean basic state and topographic forcing play only a small role. A substantial part of the diabatic response is due to changes in tropical latent heating.
Abstract
Statistical relationships between future and historical model runs in multimodel ensembles (MMEs) are increasingly exploited to make more constrained projections of climate change. However, such emergent constraints may be spurious and can arise because of shared (common) errors in a particular MME or because of overly influential models. This study assesses the robustness of emergent constraints used for Arctic warming by comparing such constraints in ensembles generated by the two most recent Coupled Model Intercomparison Project (CMIP) experiments: CMIP3 and CMIP5. An ensemble regression approach is used to estimate emergent constraints in Arctic wintertime surface air temperature change over the twenty-first century under the Special Report on Emissions Scenarios (SRES) A1B scenario in CMIP3 and the Representative Concentration Pathway (RCP) 4.5 scenario in CMIP5. To take account of the different scenarios, this study focuses on polar amplification by using temperature responses at each grid point that are scaled by the global mean temperature response of each climate model. In most locations, the estimated emergent constraints are reassuringly similar in CMIP3 and CMIP5, and the differences could easily have arisen from sampling variation. However, there is some indication that the emergent constraint and polar amplification are substantially larger in CMIP5 over the Sea of Okhotsk and the Bering Sea. Residual diagnostics identify one climate model in CMIP5 that has a notable influence on estimated emergent constraints over the Bering Sea and one in CMIP3 that has a notable influence more widely along the sea ice edge and into midlatitudes over the western North Atlantic.
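The ensemble regression idea can be sketched in a few lines of Python. In the sketch below (synthetic numbers, not the study's data or code), x holds each model's historical value of some observable quantity at a grid point and y its future warming scaled by the global-mean response; a least squares fit across the ensemble gives the emergent constraint, and leverage from the hat matrix gives a simple influence diagnostic.

```python
# Sketch: an "ensemble regression" emergent constraint at one grid point,
# with a simple influence diagnostic. All values are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_models = 20
x = rng.normal(0.0, 1.0, n_models)                  # historical quantity per model
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.5, n_models)  # scaled future response per model

# Ordinary least squares fit across the ensemble
X = np.column_stack([np.ones(n_models), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Constrain the projection with a (hypothetical) observed value of x
x_obs = 0.4
y_constrained = beta[0] + beta[1] * x_obs

# Simple influence diagnostic: leverage from the hat matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

print(f"slope = {beta[1]:.2f}, constrained response = {y_constrained:.2f}")
print("most influential model index:", int(np.argmax(leverage)))
```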
Abstract
Verifying forecasts of rare events is challenging, in part because traditional performance measures degenerate to trivial values as events become rarer. The extreme dependency score was proposed recently as a nondegenerating measure for the quality of deterministic forecasts of rare binary events. This measure has some undesirable properties, including being both easy to hedge and dependent on the base rate. A symmetric extreme dependency score was also proposed recently, but this too is dependent on the base rate. These two scores and their properties are reviewed, and the meanings of several properties that have caused confusion, such as base-rate dependence and complement symmetry, are clarified. Two modified versions of the extreme dependency score, the extremal dependence index and the symmetric extremal dependence index, are then proposed and are shown to overcome all of these shortcomings. The new measures are nondegenerating, base-rate independent, asymptotically equitable, harder to hedge, and have regular isopleths that correspond to symmetric and asymmetric relative operating characteristic curves.
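For reference, here is a minimal sketch of the two proposed indices, written in terms of the hit rate H and false alarm rate F as in their published definitions; the contingency-table counts are invented and the helper function name is illustrative.

```python
# Sketch: extremal dependence index (EDI) and symmetric extremal dependence
# index (SEDI) from a 2x2 contingency table.
import math

def edi_sedi(a, b, c, d):
    """a = hits, b = false alarms, c = misses, d = correct rejections."""
    H = a / (a + c)            # hit rate
    F = b / (b + d)            # false alarm rate
    lH, lF = math.log(H), math.log(F)
    edi = (lF - lH) / (lF + lH)
    sedi = (lF - lH + math.log(1 - H) - math.log(1 - F)) / (
        lF + lH + math.log(1 - H) + math.log(1 - F)
    )
    return edi, sedi

# Illustrative rare-event table
print(edi_sedi(a=10, b=40, c=15, d=935))
```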
Abstract
Categorical probabilistic prediction is widely used for terrestrial and space weather forecasting as well as for other environmental forecasts. One example is a warning system for geomagnetic disturbances caused by space weather, which are often classified on a 10-level scale. The simplest approach assumes that the transition probabilities are stationary in time—the homogeneous Markov chain (HMC). We extend this approach by developing a flexible nonhomogeneous Markov chain (NHMC) model using Bayesian nonparametric estimation to describe the time-varying transition probabilities. The transition probabilities are updated using a modified Bayes’s rule that gradually forgets transitions in the distant past, with a tunable memory parameter. The approaches were tested by making daily geomagnetic state forecasts at lead times of 1–4 days and were verified over the period 2000–19 using the rank probability score (RPS). Both HMC and NHMC models were found to be skillful at all lead times when compared with climatological forecasts. The NHMC forecasts with an optimal memory parameter of ~100 days were found to be substantially more skillful than the HMC forecasts, with an RPS skill for the NHMC of 10.5% and 5.6% for lead times of 1 and 4 days ahead, respectively. The NHMC is thus a viable alternative approach for forecasting geomagnetic disturbances and could provide a new benchmark for producing operational forecasts. The approach is generic and is applicable to other forecasts that include discrete weather regimes or hydrological conditions (e.g., wet and dry days).
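A minimal sketch of the forgetting idea, under assumptions and not the authors' implementation, is to down-weight past transition counts exponentially with an e-folding memory of tau days and to smooth the counts with a uniform Dirichlet prior before normalizing each row. The function name and the random state sequence below are purely illustrative.

```python
# Sketch: nonhomogeneous Markov chain transition probabilities estimated
# from exponentially forgotten past transitions (memory parameter tau).
import numpy as np

def nhmc_probs(states, n_states, tau=100.0, prior=1.0):
    """Estimate the current transition matrix from a daily state sequence,
    forgetting old transitions with an e-folding time of tau days."""
    counts = np.zeros((n_states, n_states))
    decay = np.exp(-1.0 / tau)
    for prev, nxt in zip(states[:-1], states[1:]):
        counts *= decay            # gradually forget earlier transitions
        counts[prev, nxt] += 1.0   # add the latest observed transition
    counts += prior                # uniform Dirichlet prior keeps rows nonzero
    return counts / counts.sum(axis=1, keepdims=True)

# Illustrative use with a made-up 10-level daily state record
rng = np.random.default_rng(2)
states = rng.integers(0, 10, size=2000)
P = nhmc_probs(states, n_states=10, tau=100.0)
one_day_ahead = P[states[-1]]      # forecast distribution for tomorrow's state
```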
Abstract
Often there is a need to consider spatial weighting in methods for finding spatial patterns in climate data. The focus of this paper is on techniques that maximize variance, such as empirical orthogonal functions (EOFs). A weighting matrix is introduced into a generalized framework for dealing with spatial weighting. One basic principle in the design of the weighting matrix is that the resulting spatial patterns are independent of the grid used to represent the data. A weighting matrix can also be used for other purposes, such as to compensate for the neglect of unrepresented subgrid-scale variance or, in the form of a prewhitening filter, to maximize the signal-to-noise ratio of EOFs. The new methodology is applicable to other types of climate pattern analysis, such as extended EOF analysis and maximum covariance analysis. The increasing availability of large datasets of three-dimensional gridded variables (e.g., reanalysis products and model output) raises special issues for data-reduction methods such as EOFs. Fast, memory-efficient methods are required in order to extract leading EOFs from such large datasets. This study proposes one such approach based on a simple iteration of successive projections of the data onto time series and spatial maps. It is also demonstrated that spatial weighting can be combined with the iterative methods. Throughout the paper, multivariate statistics notation is used, simplifying implementation as matrix commands in high-level computing languages.
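A minimal sketch of the iterative idea (under assumptions, not the paper's algorithm verbatim): apply the spatial weights to the anomaly data and find the leading EOF by repeatedly projecting the weighted data onto a spatial map and a time series, a power-iteration scheme that never forms the full covariance matrix. The function name and the random toy data are illustrative.

```python
# Sketch: leading EOF of spatially weighted data by successive projections
# onto a time series and a spatial map, avoiding the covariance matrix.
# Weights w might be grid-cell areas or sqrt(cos(latitude)); here they are
# random stand-ins.
import numpy as np

def leading_weighted_eof(X, w, n_iter=200):
    """X: (n_time, n_space) data; w: (n_space,) spatial weights."""
    Xw = (X - X.mean(axis=0)) * w             # weighted anomalies
    pattern = np.ones(Xw.shape[1])
    for _ in range(n_iter):
        ts = Xw @ pattern                     # project data onto the current map
        pattern = Xw.T @ ts                   # project data onto that time series
        pattern /= np.linalg.norm(pattern)    # renormalize each iteration
    return pattern, Xw @ pattern              # EOF in weighted space, PC time series

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 300))               # toy data: 500 times, 300 grid points
w = rng.uniform(0.5, 1.0, size=300)
eof1, pc1 = leading_weighted_eof(X, w)
```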