
On “Field Significance” and the False Discovery Rate

D. S. Wilks
Department of Earth and Atmospheric Sciences, Cornell University, Ithaca, New York

Abstract

The conventional approach to evaluating the joint statistical significance of multiple hypothesis tests (i.e., “field,” or “global,” significance) in meteorology and climatology is to count the number of individual (or “local”) tests yielding nominally significant results and then to judge the unusualness of this integer value in the context of the distribution of such counts that would occur if all local null hypotheses were true. The sensitivity (i.e., statistical power) of this approach is potentially compromised both by the discrete nature of the test statistic and by the fact that the approach ignores the confidence with which locally significant tests reject their null hypotheses. An alternative global test statistic that has neither of these problems is the minimum p value among all of the local tests. Evaluation of field significance using the minimum local p value as the global test statistic, which is also known as the Walker test, has strong connections to the joint evaluation of multiple tests in a way that controls the “false discovery rate” (FDR, or the expected fraction of local null hypothesis rejections that are incorrect). In particular, using the minimum local p value to evaluate field significance at a level α_global is nearly equivalent to the slightly more powerful global test based on the FDR criterion. An additional advantage shared by Walker’s test and the FDR approach is that both are robust to spatial dependence within the field of tests. The FDR method not only provides a more broadly applicable and generally more powerful field significance test than the conventional counting procedure but also allows better identification of locations with significant differences, because fewer than α_global × 100% (on average) of apparently significant local tests will have resulted from local null hypotheses that are true.

Corresponding author address: D. S. Wilks, Department of Earth and Atmospheric Sciences, Cornell University, Ithaca, NY 14853. Email: dsw5@cornell.edu
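For concreteness, the three procedures the abstract compares (the conventional counting test, the Walker minimum p value test, and the Benjamini-Hochberg FDR criterion) can each be stated in a few lines. The Python sketch below is illustrative rather than drawn from the paper: the function names are invented here, and the binomial form of the counting test assumes independent local tests (with spatial dependence it would instead have to be calibrated, e.g., by Monte Carlo resampling).

```python
import numpy as np
from scipy.stats import binom


def counting_test(p_values, alpha_local=0.05, alpha_global=0.05):
    """Conventional counting test, assuming independent local tests:
    reject the global null if the number of locally significant tests
    exceeds the (1 - alpha_global) quantile of Binomial(K, alpha_local)."""
    p = np.asarray(p_values)
    n_significant = int((p <= alpha_local).sum())
    return n_significant > binom.ppf(1.0 - alpha_global, p.size, alpha_local)


def walker_test(p_values, alpha_global=0.05):
    """Walker test: reject the global null if the smallest of the K local
    p values falls below p_Walker = 1 - (1 - alpha_global)**(1/K)."""
    p = np.asarray(p_values)
    p_walker = 1.0 - (1.0 - alpha_global) ** (1.0 / p.size)
    return p.min() <= p_walker


def fdr_rejections(p_values, alpha_global=0.05):
    """Benjamini-Hochberg FDR criterion: sort the K local p values and find
    the largest sorted value not exceeding (rank / K) * alpha_global; all
    local tests with p values at or below that value are rejected.  Field
    significance holds whenever at least one local test is rejected."""
    p = np.asarray(p_values)
    K = p.size
    sorted_p = np.sort(p)
    below = sorted_p <= (np.arange(1, K + 1) / K) * alpha_global
    if not below.any():
        return np.zeros(K, dtype=bool)  # no rejections: global null retained
    threshold = sorted_p[np.nonzero(below)[0].max()]
    return p <= threshold
```

Note the connection the abstract describes: the Walker threshold 1 - (1 - α_global)^(1/K) is approximately α_global/K for small α_global, which is exactly the rank-1 step of the Benjamini-Hochberg ladder. The Walker test therefore amounts to the FDR global test restricted to the smallest p value, while the FDR test can additionally reject through the higher-rank steps, which is why it is slightly more powerful.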
