Search Results

Author or Editor: D. S. Wilks (showing items 1–10 of 16)
D. S. Wilks

Abstract

Stochastic daily weather time series models (“weather generators”) are parameterized consistent with both local climate and probabilistic seasonal forecasts. Both single-station weather generators and spatial networks of coherently operating weather generators are considered. Only a subset of the parameters for individual station models (proportion of wet days, precipitation mean parameters on wet days, and daily temperature means and standard deviations) is found to depend appreciably on the seasonal temperature and precipitation outcomes, so that extension of the single-station models to coherent multisite weather generators is straightforward. The result allows stochastic simulation of multiple daily weather series, conditional on seasonal forecasts. Example applications involving spatially integrated extreme daily precipitation and snowpack water content illustrate the method.
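
For orientation, the sketch below implements the basic Richardson-type single-station structure such weather generators share: a first-order Markov chain for wet/dry occurrence and a gamma distribution for wet-day amounts. All parameter values are illustrative, not taken from the paper; conditioning on a seasonal forecast amounts to adjusting parameters such as these.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper): Markov-chain transition
# probabilities for precipitation occurrence, and a gamma distribution
# for wet-day amounts. A wet seasonal outlook would shift p01/p11 upward.
p01, p11 = 0.25, 0.60                # P(wet | dry yesterday), P(wet | wet yesterday)
gamma_shape, gamma_scale = 0.8, 8.0  # wet-day precipitation amounts (mm)

def simulate_precip(n_days, wet0=False):
    """Simulate one daily precipitation series (mm)."""
    wet, out = wet0, np.empty(n_days)
    for t in range(n_days):
        wet = rng.random() < (p11 if wet else p01)
        out[t] = rng.gamma(gamma_shape, gamma_scale) if wet else 0.0
    return out

series = simulate_precip(90)  # one 90-day season
print(f"wet-day fraction: {np.mean(series > 0):.2f}")
```

Extending this structure to a coherent multisite network involves driving the station models with spatially correlated random numbers, which is the extension the abstract describes as straightforward.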

D. S. Wilks

Abstract

Ensemble consistency is a name for the condition that an observation being forecast by a dynamical ensemble is statistically indistinguishable from the ensemble members. This statistical indistinguishability condition is meaningful only in a multivariate sense. That is, it pertains to the joint distribution of the ensemble members and the observation. The rank histogram has been designed to assess overall ensemble consistency, but mistakenly employing it to assess only restricted aspects of this joint distribution (e.g., the climatological distribution) leads to the incorrect conclusion that the verification rank histogram is not a useful diagnostic for good behavior of ensemble forecasts. The potential confusion is analyzed in the context of an idealized multivariate Gaussian model of forecast ensembles and their corresponding observations, and it is shown that the rank histogram does correctly assess the consistency of forecast ensembles.
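
The verification rank histogram discussed above is straightforward to compute. The sketch below builds one for synthetic Gaussian ensembles that are consistent by construction, so the histogram should come out approximately flat; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_occasions, n_members = 5000, 10

# Consistent synthetic ensembles: on each occasion the observation and
# the members are drawn from the same distribution (means vary by occasion).
mu = rng.normal(size=n_occasions)
ens = mu[:, None] + rng.normal(size=(n_occasions, n_members))
obs = mu + rng.normal(size=n_occasions)

# Rank of the observation within each sorted ensemble (1 .. n_members+1).
ranks = 1 + np.sum(ens < obs[:, None], axis=1)
hist = np.bincount(ranks, minlength=n_members + 2)[1:]
print(hist / hist.sum())  # approximately uniform for a consistent ensemble
```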

D. S. Wilks

Abstract

The conventional approach to evaluating the joint statistical significance of multiple hypothesis tests (i.e., “field,” or “global,” significance) in meteorology and climatology is to count the number of individual (or “local”) tests yielding nominally significant results and then to judge the unusualness of this integer value in the context of the distribution of such counts that would occur if all local null hypotheses were true. The sensitivity (i.e., statistical power) of this approach is potentially compromised both by the discrete nature of the test statistic and by the fact that the approach ignores the confidence with which locally significant tests reject their null hypotheses. An alternative global test statistic that has neither of these problems is the minimum p value among all of the local tests. Evaluation of field significance using the minimum local p value as the global test statistic, which is also known as the Walker test, has strong connections to the joint evaluation of multiple tests in a way that controls the “false discovery rate” (FDR, or the expected fraction of local null hypothesis rejections that are incorrect). In particular, using the minimum local p value to evaluate field significance at a level α_global is nearly equivalent to the slightly more powerful global test based on the FDR criterion. An additional advantage shared by Walker’s test and the FDR approach is that both are robust to spatial dependence within the field of tests. The FDR method not only provides a more broadly applicable and generally more powerful field significance test than the conventional counting procedure but also allows better identification of locations with significant differences, because fewer than α_global × 100% (on average) of apparently significant local tests will have resulted from local null hypotheses that are true.
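
The FDR criterion discussed above is usually implemented with the Benjamini–Hochberg procedure. A minimal sketch, assuming the local p values arrive as a flat array (the function name and the α = 0.10 default are illustrative, not from the paper):

```python
import numpy as np

def fdr_significant(p, alpha=0.10):
    """Benjamini-Hochberg FDR control over a field of local p values.

    Returns a boolean mask of local tests declared significant; the
    procedure behaves conservatively under the positive spatial
    dependence typical of gridded fields.
    """
    p = np.asarray(p)
    n = p.size
    sorted_p = np.sort(p)
    below = sorted_p <= alpha * np.arange(1, n + 1) / n
    if not below.any():
        return np.zeros(n, dtype=bool)
    p_fdr = sorted_p[np.nonzero(below)[0].max()]  # largest qualifying p
    return p <= p_fdr

# Illustrative field: 20 genuinely small p values among 980 null ones.
rng = np.random.default_rng(2)
p_field = np.concatenate([rng.uniform(0, 1e-3, 20), rng.uniform(0, 1, 980)])
print(fdr_significant(p_field).sum())
```

The returned mask also identifies which individual locations are significant, which is the secondary benefit of the FDR approach noted in the abstract.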

D. S. Wilks

Abstract

Presently employed hypothesis tests for multivariate geophysical data (e.g., climatic fields) require the assumption that the data are serially uncorrelated, spatially uncorrelated, or both. Good methods have been developed to deal with temporal correlation, but generalization of these methods to multivariate problems involving spatial correlation has been problematic, particularly when (as is often the case) sample sizes are small relative to the dimension of the data vectors. Spatial correlation has been handled successfully by resampling methods when the temporal correlation can be neglected, at least according to the null hypothesis. This paper describes the construction of resampling tests for differences of means that account simultaneously for temporal and spatial correlation. First, univariate tests are derived that respect temporal correlation in the data, using the relatively new concept of “moving blocks” bootstrap resampling. These tests perform accurately for small samples and are nearly as powerful as existing alternatives. Simultaneous application of these univariate resampling tests to elements of data vectors (or fields) yields a powerful (i.e., sensitive) multivariate test in which the cross correlation between elements of the data vectors is successfully captured by the resampling, rather than through explicit modeling and estimation.
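
A minimal sketch of the moving-blocks idea for a univariate difference-of-means test; block length and replication count are illustrative, and in practice the block length should reflect the strength of the serial correlation.

```python
import numpy as np

rng = np.random.default_rng(3)

def moving_blocks_resample(x, block_len):
    """One moving-blocks bootstrap replicate of the series x."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]

def block_boot_pvalue(x, y, block_len=5, n_boot=2000):
    """Two-sided p value for a difference of means that respects serial
    correlation; both series are centered to impose the null hypothesis."""
    d_obs = x.mean() - y.mean()
    x0, y0 = x - x.mean(), y - y.mean()
    d_boot = np.array([
        moving_blocks_resample(x0, block_len).mean()
        - moving_blocks_resample(y0, block_len).mean()
        for _ in range(n_boot)
    ])
    return np.mean(np.abs(d_boot) >= abs(d_obs))

def ar1(n, phi=0.6):
    """Illustrative serially correlated (AR(1)) test series."""
    e, out = rng.normal(size=n), np.empty(n)
    out[0] = e[0]
    for t in range(1, n):
        out[t] = phi * out[t - 1] + e[t]
    return out

print(block_boot_pvalue(0.5 + ar1(100), ar1(100)))
```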

D. S. Wilks

Abstract

Maximum covariance analysis (MCA) forecasts of gridded seasonal North American temperatures are computed for January–March 1991 through February–April 2014, using as predictors Indo-Pacific sea surface temperatures (SSTs), Eurasian and North American snow-cover extents, and a representation of recent climate nonstationarity, individually and in combination. The most consistent contributor to overall forecast skill is the representation of the ongoing climate warming, implemented by adding the average of the most recent 15 years’ predictand data to the climate anomalies computed by the MCA. For winter and spring forecasts at short (0–1 month) lead times, best forecasts were achieved using the snow-extent predictors together with this representation of the warming trend. The short available period of record for the snow data likely limits the skill that could be achieved using these predictors, as well as limiting the length of the SST training data that can be used simultaneously.
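
MCA itself reduces to a singular value decomposition of the cross-covariance matrix between the two anomaly fields. A minimal sketch with hypothetical array shapes (the forecast step, regressing predictand expansion coefficients on projected predictors, is omitted):

```python
import numpy as np

def mca(X, Y, n_modes=3):
    """Maximum covariance analysis of anomaly fields X (time x px) and
    Y (time x py): SVD of their cross-covariance matrix."""
    Xa = X - X.mean(axis=0)
    Ya = Y - Y.mean(axis=0)
    C = Xa.T @ Ya / (len(X) - 1)  # cross-covariance matrix, px x py
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :n_modes], Vt[:n_modes].T, s[:n_modes]

# Hypothetical sizes: 40 years of SST predictors and temperature predictands.
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 200))  # e.g., Indo-Pacific SST grid
Y = rng.normal(size=(40, 150))  # e.g., North American temperature grid
u_patterns, v_patterns, s = mca(X, Y)
print(s.round(2))  # covariances explained by the leading modes
```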

D. S. Wilks

Abstract

Climate “normals” are statistical estimates of present and/or near-future climate means for such quantities as seasonal temperature or precipitation. In a changing climate, simply averaging a large number of previous years of data may not be the best method for estimating normals. Here eight formulations for climate normals, including the recently proposed “hinge” function, are compared in artificial- and real-data settings. Although the hinge function is attractive conceptually for representing accelerating climate changes simply, its use is in general not yet justified for divisional U.S. seasonal temperature or precipitation. Averages of the most recent 15 and 30 yr have performed better during the recent past for U.S. divisional seasonal temperature and precipitation, respectively; these averaging windows are longer than those currently employed for this purpose at the U.S. Climate Prediction Center.
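
For concreteness, sketches of two of the competing formulations: a trailing 15-yr average and a least-squares hinge fit. The hinge year of 1975 follows the usual convention in this literature, and the data and exact formulation here are illustrative rather than the paper's.

```python
import numpy as np

def normal_15yr(values):
    """Estimate next year's normal as the mean of the latest 15 years."""
    return np.mean(values[-15:])

def normal_hinge(years, values, hinge_year=1975):
    """Least-squares 'hinge' fit: constant through hinge_year, linear
    afterward; the estimate extrapolates one year past the record."""
    ramp = np.maximum(years - hinge_year, 0)
    A = np.column_stack([np.ones_like(years, dtype=float), ramp])
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coef[0] + coef[1] * max(years[-1] + 1 - hinge_year, 0)

# Synthetic record: flat climate through 1975, warming trend afterward.
rng = np.random.default_rng(5)
years = np.arange(1951, 2011)
temps = 10.0 + 0.02 * np.maximum(years - 1975, 0) + rng.normal(0, 1, years.size)
print(normal_15yr(temps), normal_hinge(years, temps))
```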

D. S. Wilks

Abstract

The performance of the Climate Prediction Center’s long-lead forecasts for the period 1995–98 is assessed through a diagnostic verification, which involves examination of the full joint frequency distributions of the forecasts and the corresponding observations. The most striking results of the verifications are the strong cool and dry biases of the outlooks. These seem clearly related to the 1995–98 period being warmer and wetter than the 1961–90 climatological base period. This bias results in the ranked probability score indicating very low skill for both temperature and precipitation forecasts at all leads. However, the temperature forecasts at all leads, and the precipitation forecasts for leads up to a few months, exhibit very substantial resolution: low (high) forecast probabilities are consistently associated with lower (higher) than average relative frequency of event occurrence, even though these relative frequencies are substantially different (because of the unconditional biases) from the forecast probabilities. Conditional biases, related to systematic under- or overconfidence on the part of the forecasters, are also evident in some circumstances.
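
The ranked probability score used in this verification compares cumulative forecast and observation distributions over the ordered categories. A minimal sketch for a single categorical forecast (the tercile setup and probability values are illustrative):

```python
import numpy as np

def rps(forecast_probs, obs_category):
    """Ranked probability score for one categorical forecast: sum of
    squared differences between cumulative forecast and observation
    distributions (smaller is better)."""
    F = np.cumsum(forecast_probs)
    O = np.cumsum(np.eye(len(forecast_probs))[obs_category])
    return np.sum((F - O) ** 2)

# Illustrative tercile forecast (below/near/above normal), verifying "above".
print(rps([0.20, 0.35, 0.45], obs_category=2))  # 0.3425
```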

D. S. Wilks

Abstract

Special care must be exercised in the interpretation of multiple statistical hypothesis tests—for example, when each of many tests corresponds to a different location. Correctly interpreting results of multiple simultaneous tests requires a higher standard of evidence than is the case when evaluating results of a single test, and this has been known in the atmospheric sciences literature for more than a century. Even so, the issue continues to be widely ignored, leading routinely to overstatement and overinterpretation of scientific results, to the detriment of the discipline. This paper reviews the history of the multiple-testing issue within the atmospheric sciences literature and illustrates a statistically principled and computationally easy approach to dealing with it—namely, control of the false discovery rate.
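
For reference, the FDR procedure referred to here rejects those local null hypotheses whose p values fall below a data-dependent threshold. With N local tests and sorted p values p_(1) ≤ … ≤ p_(N), the threshold in the usual Benjamini–Hochberg form is

    p*_FDR = max_{i=1,…,N} { p_(i) : p_(i) ≤ (i/N) α_FDR },

and all local tests with p ≤ p*_FDR are declared significant (with p*_FDR taken as zero if no p value qualifies).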

D. S. Wilks

Abstract

The minimum spanning tree (MST) histogram is a multivariate extension of the ideas behind the conventional scalar rank histogram. It tabulates the frequencies, over n forecast occasions, of the rank of the MST length for each ensemble, within the group of such lengths that is obtained by substituting an observation for each of its ensemble members in turn. In raw form it is unable to distinguish ensemble bias from ensemble underdispersion, or to discern the contributions of forecast variables with small variance. The use of scaled and debiased MST histograms to diagnose attributes of ensemble forecasts is illustrated, both for synthetic Gaussian ensembles and for a small sample of actual ensemble forecasts. Also presented are adjustments to χ² critical values for evaluating rank uniformity, for both MST histograms and scalar rank histograms, given serial correlation in the forecasts.
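
The raw MST rank described above can be computed directly. A minimal sketch for one forecast occasion (the scaling and debiasing steps the abstract recommends are omitted, ties are ignored, and all sizes are illustrative):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_length(points):
    """Total edge length of the minimum spanning tree whose nodes are
    the rows of `points` (one point per ensemble member)."""
    dist = squareform(pdist(points))
    return minimum_spanning_tree(dist).sum()

def mst_rank(ensemble, obs):
    """Rank (1 .. n_members+1) of the ensemble's MST length within the
    lengths obtained by substituting the observation for each member."""
    base = mst_length(ensemble)
    subs = []
    for i in range(len(ensemble)):
        perturbed = ensemble.copy()
        perturbed[i] = obs
        subs.append(mst_length(perturbed))
    return 1 + np.sum(np.array(subs) < base)

# One synthetic occasion: 10 members, 3 forecast variables.
rng = np.random.default_rng(6)
print(mst_rank(rng.normal(size=(10, 3)), rng.normal(size=3)))
```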
