The effect of ensemble size on the Mean Squared Error and Spread-Error relationship

Arlan Dirkson, Data Assimilation and Satellite Meteorology, Environment and Climate Change Canada

and
Mark Buehner, Data Assimilation and Satellite Meteorology, Environment and Climate Change Canada

Abstract

Most ensemble verification diagnostics are sensitive to ensemble size, which complicates both the evaluation of a system’s underlying quality and the comparison of different ensemble systems. This study examines how the Mean Squared Error (MSE) of the ensemble mean, and the Spread-Error relationship used to evaluate ensemble consistency, vary as a function of ensemble size. Because the MSE of the ensemble mean (“Error” in Spread-Error) is affected by ensemble size while the average sample ensemble variance (“Spread” in Spread-Error) is not, these effects must be removed from the MSE for a robust assessment of ensemble consistency. Although the dependence of these diagnostics on ensemble size has been addressed sparsely over several decades, gaps remain concerning the assumptions necessary to quantify it. Evidence also suggests that these effects are not widely known or fully understood. The impact of ensemble size is examined by assuming exchangeability between ensemble members, which allows us to derive the MSE and the Spread-Error relationship (expressed as a difference) that would be obtained with an infinite-sized ensemble. Ensemble-size effects are removed from both scores by subtracting the average ensemble variance divided by the ensemble size from the MSE. The unbiased MSE can be used to estimate the error reduction achievable by increasing ensemble size, and it allows an “apples-to-apples” comparison of forecast error across systems. The unbiased Spread-Error relationship eliminates the ensemble-size effects on the original diagnostic, which, when ignored, make ensembles appear more underdispersive than they actually are.
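The correction described in the abstract can be illustrated with a short Python sketch. This is not the authors' code: the function name `ensemble_scores`, the use of NumPy, the synthetic test ensemble, and the choice of the unbiased sample variance (denominator M - 1) are assumptions made for illustration, and the Spread-Error relationship is taken to be the average ensemble variance minus the MSE of the ensemble mean. If members are treated as independent draws from a common distribution, the expected MSE of an M-member ensemble mean exceeds the infinite-ensemble MSE by the expected ensemble variance divided by M, which is why subtracting the average ensemble variance over M removes the effect.

```python
import numpy as np


def ensemble_scores(ens, obs):
    """Return (MSE, MSE_inf, S-E, S-E_inf) for an ensemble and observations.

    ens : array of shape (n_cases, M), M ensemble members per verification case
    obs : array of shape (n_cases,), verifying observations
    S-E is the Spread-Error relationship written as a difference (assumed form):
    average ensemble variance minus MSE of the ensemble mean.
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[1]
    if m < 2:
        raise ValueError("at least two members are needed to estimate the variance")

    # MSE of the ensemble mean ("Error"); its expectation depends on M
    mse = np.mean((ens.mean(axis=1) - obs) ** 2)

    # Average sample ensemble variance ("Spread"); ddof=1 (denominator M - 1) is
    # assumed here so that its expectation does not depend on M
    mean_var = np.mean(ens.var(axis=1, ddof=1))

    # Remove the ensemble-size effect: subtract mean variance / M from the MSE
    mse_inf = mse - mean_var / m

    return mse, mse_inf, mean_var - mse, mean_var - mse_inf


# Hypothetical synthetic check: truth and members are exchangeable draws around a
# common center, so an infinite ensemble would give MSE = sigma**2 and S-E = 0.
rng = np.random.default_rng(0)
n_cases, sigma = 200_000, 1.0
center = rng.normal(size=n_cases)
obs = center + sigma * rng.normal(size=n_cases)

for m in (2, 5, 20):
    ens = center[:, None] + sigma * rng.normal(size=(n_cases, m))
    mse, mse_inf, se, se_inf = ensemble_scores(ens, obs)
    print(f"M={m:2d}  MSE={mse:.3f}  MSE_inf={mse_inf:.3f}  "
          f"S-E={se:+.3f}  S-E_inf={se_inf:+.3f}")
```

For this statistically consistent synthetic ensemble, the printed MSE should be close to sigma**2 * (1 + 1/M) and the uncorrected S-E close to -sigma**2 / M, so smaller ensembles look more underdispersive; MSE_inf and S-E_inf should stay near 1 and 0 for every M, up to sampling noise. The corrected MSE is itself a sample estimate and can come out negative for very small samples.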

© 2025 American Meteorological Society. This is an Author Accepted Manuscript distributed under the terms of the default AMS reuse license. For information regarding reuse and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Arlan Dirkson, arlan.dirkson@ec.gc.ca
