A “meta” analysis of the Fractions Skill Score: The limiting case and implications for aggregation

Weather Science, Met Office, Exeter, United Kingdom

Abstract

The Fractions Skill Score (FSS) is arguably one of the most popular spatial verification metrics in use today. The fractions of grid points exceeding a threshold within forecast and observed field neighbourhoods are compared to compute the score. By definition a perfect forecast has an FSS of 1, and a “no skill” forecast has a score of 0.
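
For reference, the score is commonly written in the form below. The notation is an assumption made here for illustration and is not defined in this abstract: $P_{f,i}$ and $P_{o,i}$ denote the forecast and observed neighbourhood fractions at grid point $i$, and $N$ is the number of grid points in the domain.

```latex
\mathrm{FSS} \;=\; 1 \;-\;
\frac{\dfrac{1}{N}\sum_{i=1}^{N}\left(P_{f,i}-P_{o,i}\right)^{2}}
     {\dfrac{1}{N}\left[\sum_{i=1}^{N}P_{f,i}^{2}+\sum_{i=1}^{N}P_{o,i}^{2}\right]}
```

In this form the numerator vanishes when the two fraction fields are identical, giving a score of 1, while the denominator vanishes only when both fraction fields are zero everywhere.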

It is shown that the denominator defines the score’s characteristics. The FSS is undefined when neither the forecast nor the observed field exceeds the threshold. In the limiting case, the FSS for a perfect null (zero) forecast is therefore also undefined, unless a threshold of ≥ 0 is used, in which case it would be 1 (i.e. perfect). Furthermore, the FSS is 0 if either the forecast or the observed field, but not both, fails to exceed the threshold. This symmetry means the score cannot differentiate between what are traditionally referred to as false alarms and misses; supplementary information is required. The FSS is greater than 0 if and only if values exceed the given threshold in both the forecast and the observed field.
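
The limiting cases described above can be reproduced with a minimal sketch. It is not the author's implementation: the function names, the square neighbourhood, the zero padding at the domain edge and the `inclusive` flag (which switches between "> threshold" and "≥ threshold") are all assumptions chosen for illustration.

```python
import math


def neighbourhood_fractions(field, threshold, half_width, inclusive=False):
    """Fraction of points exceeding `threshold` in a square window around each
    grid point. Points outside the domain count as non-exceeding (assumption)."""
    rows, cols = len(field), len(field[0])
    window_size = (2 * half_width + 1) ** 2
    fractions = []
    for i in range(rows):
        row = []
        for j in range(cols):
            count = 0
            for di in range(-half_width, half_width + 1):
                for dj in range(-half_width, half_width + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        value = field[ii][jj]
                        hit = value >= threshold if inclusive else value > threshold
                        count += 1 if hit else 0
            row.append(count / window_size)
        fractions.append(row)
    return fractions


def fss_components(forecast, observed, threshold, half_width, inclusive=False):
    """Return the (numerator, denominator) sums of the FSS, without the 1/N factor."""
    pf = neighbourhood_fractions(forecast, threshold, half_width, inclusive)
    po = neighbourhood_fractions(observed, threshold, half_width, inclusive)
    numerator = 0.0
    denominator = 0.0
    for row_f, row_o in zip(pf, po):
        for f, o in zip(row_f, row_o):
            diff = f - o
            numerator += diff * diff
            denominator += f * f + o * o
    return numerator, denominator


def fss(forecast, observed, threshold, half_width, inclusive=False):
    """FSS = 1 - numerator / denominator; NaN where the denominator is zero."""
    numerator, denominator = fss_components(
        forecast, observed, threshold, half_width, inclusive
    )
    if denominator == 0:
        return math.nan  # undefined: nothing exceeds the threshold in either field
    return 1.0 - numerator / denominator


def single_cell_field(row, col, size=5, value=5.0):
    """Hypothetical test field: zero everywhere except one grid point."""
    field = [[0.0] * size for _ in range(size)]
    field[row][col] = value
    return field


empty = [[0.0] * 5 for _ in range(5)]
rain = single_cell_field(2, 2)
shifted_rain = single_cell_field(2, 3)

both_empty = fss(empty, empty, threshold=1.0, half_width=1)        # undefined
false_alarm = fss(rain, empty, threshold=1.0, half_width=1)        # forecast only
miss = fss(empty, rain, threshold=1.0, half_width=1)               # observed only
perfect = fss(rain, rain, threshold=1.0, half_width=1)             # identical fields
displaced = fss(rain, shifted_rain, threshold=1.0, half_width=1)   # near miss
null_inclusive = fss(empty, empty, threshold=0.0, half_width=1, inclusive=True)
```

Here `false_alarm` and `miss` come out identical, which is the symmetry noted in the text, and `null_inclusive` shows how a threshold of ≥ 0 turns the undefined null case into a perfect score.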

The magnitude of an overall score computed over many forecasts is sensitive to the pooling method. Zero scores are non-trivial: excluding them implies excluding all situations associated with false alarms or misses. Omitting near-zero scores is a more credible decision, but only if it can be proven that these are related to spurious artefacts in the observed field. To avoid ambiguity, for most applications and purposes the components of the FSS should be aggregated separately when computing an overall score.
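
The sensitivity to the pooling method can be illustrated with a short sketch. The numerator and denominator pairs below are made-up illustrative values, not results from the paper, and the function names are hypothetical. Pooling is taken here to mean summing the numerators and denominators separately before forming the score, which is one reading of aggregating the components separately.

```python
import math


def fss_from_components(numerator, denominator):
    """FSS for one case, or NaN when the denominator is zero (undefined)."""
    if denominator == 0:
        return math.nan
    return 1.0 - numerator / denominator


def mean_of_scores(cases, drop_zeros=False):
    """Average the per-case scores, skipping undefined ones.
    With drop_zeros=True, zero scores (false alarms or misses) are also skipped."""
    scores = [fss_from_components(n, d) for n, d in cases]
    scores = [s for s in scores if not math.isnan(s)]
    if drop_zeros:
        scores = [s for s in scores if s > 0]
    return sum(scores) / len(scores) if scores else math.nan


def pooled_score(cases):
    """Sum numerators and denominators over all cases, then form a single score."""
    total_numerator = sum(n for n, _ in cases)
    total_denominator = sum(d for _, d in cases)
    return fss_from_components(total_numerator, total_denominator)


# Illustrative (numerator, denominator) pairs, one per forecast case.
cases = [
    (2.0, 10.0),  # per-case FSS of 0.8
    (3.0, 6.0),   # per-case FSS of 0.5
    (4.0, 4.0),   # per-case FSS of 0: a false alarm or a miss
    (0.0, 0.0),   # undefined: neither field exceeds the threshold
]

mean_with_zeros = mean_of_scores(cases)
mean_without_zeros = mean_of_scores(cases, drop_zeros=True)
pooled = pooled_score(cases)
```

For these values the three approaches give different overall scores, and dropping the zero score raises the mean by discarding the false alarm or miss. The undefined case contributes nothing to either sum in the pooled version, so it needs no special handling there.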

Corresponding author address: Met Office, FitzRoy Road, Exeter, EX1 3PB, United Kingdom. E-mail: marion.mittermaier@metoffice.gov.uk
