Detecting Improvements in Forecast Correlation Skill: Statistical Testing and Power Analysis

  • 1 Exeter Climate Systems, University of Exeter, Exeter, United Kingdom
  • 2 Earth Sciences Department, Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS), Barcelona, Spain
  • 3 Exeter Climate Systems, University of Exeter, Exeter, United Kingdom
  • 4 ICREA, and Earth Sciences Department, Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS), Barcelona, Spain

Abstract

The skill of weather and climate forecast systems is often assessed by calculating the correlation coefficient between past forecasts and their verifying observations. Improvements in forecast skill can thus be quantified by correlation differences. The uncertainty in the correlation difference needs to be assessed to judge whether the observed difference constitutes a genuine improvement, or is compatible with random sampling variations. A widely used statistical test for correlation difference is known to be unsuitable, because it assumes that the competing forecasting systems are independent. In this paper, appropriate statistical methods are reviewed to assess correlation differences when the competing forecasting systems are strongly correlated with one another. The methods are used to compare correlation skill between seasonal temperature forecasts that differ in initialization scheme and model resolution. A simple power analysis framework is proposed to estimate the probability of correctly detecting skill improvements, and to determine the minimum number of samples required to reliably detect improvements. The proposed statistical test has a higher power of detecting improvements than the traditional test. The main examples suggest that sample sizes of climate hindcasts should be increased to about 40 years to ensure sufficiently high power. It is found that seasonal temperature forecasts are significantly improved by using realistic land surface initial conditions.

Corresponding author address: Stefan Siegert, Exeter Climate Systems, University of Exeter, Exeter EX4 4QF, United Kingdom. E-mail: s.siegert@exeter.ac.uk
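
The two ideas in the abstract, a test for the difference between two correlations that are themselves correlated, and a simulation-based power analysis, can be illustrated with a minimal sketch. The abstract does not name the test, so this sketch assumes Williams's t test for two dependent correlations that share a variable (here the observations), applied one-sided. The function names (`pearson`, `williams_t`, `t_sf`, `estimate_power`), the Gaussian data model, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def williams_t(r_ao, r_bo, r_ab, n):
    """Williams's t statistic for H0: r_bo == r_ao, with n - 3 degrees of freedom.

    r_ao, r_bo: correlations of forecasts A and B with the observations.
    r_ab: correlation between the two forecasts. n: number of forecast cases.
    """
    det = 1 - r_ao**2 - r_bo**2 - r_ab**2 + 2 * r_ao * r_bo * r_ab
    r_bar = (r_ao + r_bo) / 2
    denom = 2 * (n - 1) / (n - 3) * det + r_bar**2 * (1 - r_ab) ** 3
    return (r_bo - r_ao) * math.sqrt((n - 1) * (1 + r_ab) / denom)


def t_sf(t, df, steps=400):
    """Upper-tail probability P(T > t) of Student's t, via Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 0.5 - area) if t >= 0 else min(1.0, 0.5 + area)


def estimate_power(r_ao, r_bo, r_ab, n, alpha=0.05, n_sim=1000, seed=1):
    """Monte Carlo estimate of the one-sided rejection rate (H1: r_bo > r_ao).

    Assumes jointly Gaussian observations and forecasts with the given
    population correlations (an assumption of this sketch).
    """
    rng = random.Random(seed)
    s_a = math.sqrt(1 - r_ao**2)
    c = (r_ab - r_ao * r_bo) / s_a
    d = math.sqrt(1 - r_bo**2 - c**2)  # fails if the correlations are inconsistent
    hits = 0
    for _ in range(n_sim):
        obs, fa, fb = [], [], []
        for _ in range(n):
            o, e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            obs.append(o)
            fa.append(r_ao * o + s_a * e1)
            fb.append(r_bo * o + c * e1 + d * e2)
        t = williams_t(pearson(fa, obs), pearson(fb, obs), pearson(fa, fb), n)
        hits += t_sf(t, n - 3) < alpha
    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical scenario: skill rises from 0.3 to 0.7, forecasts correlated at 0.6.
    for n in (20, 40):
        print(n, estimate_power(0.3, 0.7, 0.6, n))
```

Repeating `estimate_power` over a range of `n` gives a rough way to read off the smallest hindcast length at which the estimated power exceeds a chosen target; the answer depends entirely on the assumed population correlations.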
