1. Introduction
Ensemble forecasting is now well established as a technique that is relevant at a variety of spatial scales and lead times (e.g., Du et al. 1997; Eckel and Walters 1998; Hamill and Colucci 1997; Houtekamer et al. 1996; Molteni et al. 1996; Stensrud et al. 1999; Toth and Kalnay 1997). The aim in ensemble forecasting is to approximate the probability distribution reflecting the uncertain components of the forecast system (prominently, initial-state uncertainty) using an ensemble (i.e., a finite collection) of specific plausible initial conditions. If the initial ensemble consists of a random sample from the underlying probability distribution of initial-condition uncertainty, and each ensemble member is integrated forward in time according to a perfect dynamical model, the resulting ensemble of forecasts should represent a random sample from the probability distribution of future-state uncertainty, and the actual state to which the real atmosphere evolves should be yet another random sample from this distribution.
In practice the initial ensemble is not a random sample from the relevant distribution (for a variety of reasons, not least of which is that this distribution is unknown), and the forecast models are not perfect. Therefore, one aspect of interest in the verification of ensemble forecasts is the degree to which the observed (or analyzed) future atmospheric states appear to be plausible members of their forecast ensembles.
For one-dimensional (i.e., scalar, or univariate) forecasts, a popular graphical device for addressing this question is the rank histogram (Anderson 1996; Hamill and Colucci 1997; Harrison et al. 1995). To tabulate a rank histogram, the rank of the observation within the nens + 1 member collection defined by the union of the nens-member ensemble and the observation is determined. Equivalently [provided none of the ensemble members is exactly equal to the analysis; otherwise see Hamill and Colucci (1997)], one is added to the number of ensemble members exceeded in magnitude by the corresponding observation. If the premise is true that the observation and the ensemble members have been drawn from the same distribution, any of the nens + 1 ranks is an equally likely position for the observation on any particular forecast occasion. Collectively, over some number n of forecast occasions, a histogram of these nens + 1 ranks (the rank histogram) will be uniform, or flat, within the limits of a finite sample. Particular deviations from the ideal situation of the observation and ensemble members being drawn from the same distribution are reflected in deviations of the rank histogram from uniformity: positive or negative ensemble biases produce overpopulation of the lowest or highest ranks, respectively; underdispersed ensembles produce U-shaped rank histograms; and overdispersed ensembles result in underpopulation of the extreme ranks, or mound-shaped rank histograms (Hamill 2001).
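The tabulation just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the original study: the function names `observation_rank` and `rank_histogram` are hypothetical, ties between the observation and ensemble members are assumed not to occur, and the synthetic Gaussian data serve only to show that counts are roughly flat when observation and ensemble share a distribution.

```python
import random

def observation_rank(ensemble, observation):
    """Rank (1..nens+1) of the observation within the pooled collection.

    Assumes no ensemble member equals the observation exactly; ties need
    separate handling, as noted in the text.
    """
    return 1 + sum(1 for member in ensemble if member < observation)

def rank_histogram(ensembles, observations):
    """Counts of observation ranks 1..nens+1 over all forecast occasions."""
    nens = len(ensembles[0])
    counts = [0] * (nens + 1)
    for ensemble, obs in zip(ensembles, observations):
        counts[observation_rank(ensemble, obs) - 1] += 1
    return counts

# Synthetic check (assumption: ensemble and observation are drawn from the
# same standard Gaussian distribution, so the histogram should be near flat).
rng = random.Random(0)
n, nens = 2000, 10
ensembles = [[rng.gauss(0.0, 1.0) for _ in range(nens)] for _ in range(n)]
observations = [rng.gauss(0.0, 1.0) for _ in range(n)]
counts = rank_histogram(ensembles, observations)
print(counts)
```

Shifting or rescaling the synthetic ensembles relative to the observations in this sketch would be one way to reproduce the sloped, U-shaped, and mound-shaped histograms described above.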
In a manner similar to the ordinary rank histogram for scalar ensemble forecasts, the MST histogram tabulates the rank of the MST length computed for the nens ensemble members only, within the nens + 1 element distribution consisting of the union of the ensemble-only MST length, with the nens MST lengths obtained by substituting the observation for each one of the ensemble members in turn. That is, one is added to the number of the nens MSTs in which the observation has been substituted for one of the ensemble members, whose lengths are exceeded by that for the MST of the ensemble as actually forecast. [Note that this convention is the reverse of that in Smith (2001) but is consistent with usual practice for rank histograms (e.g., Hamill 2001)]. If the ensemble and the subsequent analysis it is meant to predict have been drawn from the same (K dimensional) probability distribution, then the lengths of the MSTs obtained by substituting the observation for any of the ensemble members should be statistically indistinguishable from the length of the MST computed from the ensemble members only. Over a large number n of forecast occasions, the histogram of these ranks— the MST histogram—should be essentially uniform, or flat.
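The ranking convention for a single forecast occasion can be sketched as follows. This is an illustrative sketch rather than the implementation used for the results in this paper: it assumes Euclidean distances between points, uses Prim's algorithm for the MST (any correct MST algorithm would serve), and the names `mst_length` and `mst_rank` are hypothetical.

```python
import math

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the point outside the tree that is closest to it.
        j = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[j] = True
        total += best[j]
        # Update the distances of the remaining points to the enlarged tree.
        for i in range(n):
            if not in_tree[i]:
                d = math.dist(points[j], points[i])
                if d < best[i]:
                    best[i] = d
    return total

def mst_rank(ensemble, observation):
    """Rank (1..nens+1) of the ensemble-only MST length among the nens+1 lengths.

    One is added to the number of substituted MSTs that are shorter than the
    MST of the ensemble as forecast.
    """
    base = mst_length(ensemble)
    shorter = 0
    for i in range(len(ensemble)):
        substituted = ensemble[:i] + [observation] + ensemble[i + 1:]
        if mst_length(substituted) < base:
            shorter += 1
    return 1 + shorter

# Hypothetical K = 2 example: four members at the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(mst_rank(square, (10.0, 10.0)))  # distant observation
print(mst_rank(square, (0.5, 0.5)))    # observation interior to the ensemble
```

In this toy configuration a distant observation lengthens every substituted tree, so the ensemble-only MST takes the lowest rank, whereas an interior observation shortens every substituted tree and the ensemble-only MST takes the highest rank.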
While the scalar rank histogram and the MST histogram are similar in concept, it should be noted that the MST histogram is not a mathematical generalization of the conventional rank histogram. In particular, the MST histogram does not reduce to the scalar rank histogram in the special case of K = 1 dimension. Indeed, in one dimension the MST length is trivially the range (maximum minus minimum) of the data.
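The one-dimensional result can be verified directly. Writing the m points in sorted order as x_(1) ≤ x_(2) ≤ … ≤ x_(m) (notation introduced here for illustration only), the MST on a line joins each point to its nearest neighbors in that ordering, so the total length telescopes:

```latex
L_{\mathrm{MST}}
  = \sum_{i=1}^{m-1} \left( x_{(i+1)} - x_{(i)} \right)
  = x_{(m)} - x_{(1)} .
```

Substituting the observation for an ensemble member therefore changes the MST length only when either the observation or the deleted member is an extreme value, which is why the K = 1 MST histogram carries different information than the scalar rank histogram.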
The purpose of this paper is to outline some important considerations that bear on the use of the MST histogram and to catalog some typical behaviors under various deviations from perfect ensembles, which result in different types of nonuniform MST histograms. Section 2 details these considerations and typical behaviors in the context of synthetic data. Section 3 applies these to a particular small sample of actual ensemble forecasts. Section 4 considers the question of statistical significance for rank uniformity as a function of ensemble size, sample size, and nonindependence of the ensembles and provides corresponding results for scalar rank histograms. Section 5 provides conclusions.
2. The MST histogram
a. Raw MST histograms
As noted earlier, the solid lines in Fig. 1 indicate MSTs for two ensembles whose members are labeled A–J. The point representing the corresponding observation is labeled O in Figs. 1a and 1b, and the dashed lines show the MSTs that result when the observation is substituted for ensemble member D in each case. In Fig. 1a this substitution results in a shorter MST, with the sum of the lengths of the solid and dashed line segments being 8.0 and 7.5, respectively. The rank of the solid-line MST depends also on the lengths of the other nine MSTs, resulting from each of the other nine ensemble members being replaced by the observation in turn. In Fig. 1a, the lengths of eight of these MSTs are shorter than 8.0, and they are also shorter than the one obtained by replacing point G by the observation, which is very slightly longer than 8.0. Therefore, the rank of the length of the solid MST in Fig. 1a is 10 out of 11. In Fig. 1b, the length 6.3 of the solid MST is shorter than all 10 of the MSTs obtained by replacing an ensemble member A–J by the observation point O, so its rank is 1 out of 11.
The top row in Fig. 2 shows behaviors of MST histograms for unbiased forecasts, that is, for cases where the (vector) means μens and μtruth of the distributions from which the ensemble and the observation are drawn are equal for each of the n forecast occasions (although not necessarily the same from occasion to occasion). Here the MST histogram for σtruth/σensemble = 1 exhibits uniformity, within typical sampling variability for this sample size. Unbiased but overdispersed ensembles (left panels of top row) exhibit overpopulation of the higher ranks, reflecting the preponderance of cases in which the MST length for the ensemble alone is the largest or among the largest of the nens + 1 MSTs for a given forecast. This condition tends to occur for overdispersed ensembles because the observation is often interior to the scatter of the ensemble, as in Fig. 1a, allowing space in the middle of the ensemble to be bridged (e.g., between the groups A–D and E–J in Fig. 1a) through that point, while dropping the segments associated with the omitted point elsewhere in the tree. This condition is accentuated in higher dimensions, where it is increasingly unlikely for an ensemble member to occur near the ensemble mean, because its value in all K dimensions must be near the corresponding mean value simultaneously. Quantitatively, for multivariate normal data (although the qualitative result does not depend on the distribution), the square of the Mahalanobis distance D in Eq. (2) (but between individual data values and their mean) follows the χ2 distribution, with degrees of freedom equal to the dimension K of the space. This is so because the transformation produces K independent standard Gaussian random variables (Mardia et al. 1979), the sum of the squares of which is well known to follow the χ2 distribution.
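The thinning of probability near the mean as K grows can be checked with a short simulation. This is a rough sketch under the assumption that the transformed variables are independent standard Gaussians; the names `squared_distances` and `summarize`, the sample size, and the unit-radius threshold are choices made here for illustration.

```python
import random

def squared_distances(K, n, seed=0):
    """Squared distances from the origin of n points with K standard Gaussian coordinates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(K)) for _ in range(n)]

def summarize(K, n=20000):
    """Sample mean of D^2 and the fraction of points with D < 1."""
    d2 = squared_distances(K, n)
    mean_d2 = sum(d2) / n
    frac_near = sum(1 for v in d2 if v < 1.0) / n
    return mean_d2, frac_near

for K in (1, 2, 10):
    mean_d2, frac_near = summarize(K)
    print(f"K={K:2d}  mean D^2={mean_d2:.2f}  fraction with D<1: {frac_near:.4f}")
```

The sample mean of the squared distance should track K, the mean of a χ2 variable with K degrees of freedom, while the fraction of points within one unit of the mean falls rapidly with increasing dimension.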
Even without ensemble bias, underdispersed ensembles (σtruth/σensemble > 1) characteristically exhibit overpopulation of the smallest ranks. The MST length for the ensemble members alone tends to be the smallest or among the smallest of the nens + 1 MST lengths because, for the remaining MSTs, the substantial distance between the observation and the ensemble is added to the MST length while a shorter segment within the ensemble is deleted (Fig. 1b). However, the observation is also usually well removed from the ensemble when there is a large ensemble mean error due to forecast bias. Thus, raw MST histograms for substantially biased forecasts toward the bottom of Fig. 2 cannot be distinguished from MST histograms for underdispersed ensembles toward the right of Fig. 2. Similarly, the effects of ensemble bias and overdispersion can compensate to a degree, yielding MST histograms that are nearly uniform (e.g., bias = 2 and σtruth/σensemble = 0.8 in Fig. 2).
b. Scaled and bias-adjusted MST histograms
Figure 2 shows that raw MST histograms cannot distinguish between ensemble underdispersion and ensemble bias. Another problem may occur when there are different measurement scales or scales of variability on the different elements of the ensemble vector x. That is, if some of the K elements of x have variances that are very much smaller than the others, the MST will essentially ignore these dimensions because the corresponding terms in Eq. (1) will be small, so that the MST will essentially occupy only a subspace spanned by the high-variance elements.
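A small numerical illustration of this scale dependence follows. It assumes that distances between ensemble members are Euclidean, so that each element contributes its squared difference; the two three-element vectors (temperature, wind speed, cloud cover) and the function name `share_of_squared_distance` are hypothetical and are not taken from the forecast data.

```python
def share_of_squared_distance(a, b):
    """Fraction of the squared Euclidean distance contributed by each element."""
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    total = sum(squared)
    return [s / total for s in squared]

# Hypothetical members: (temperature in deg C, wind speed in m/s, cloud cover).
member_1_pct = (5.0, 4.0, 80.0)    # cloud cover in percent
member_2_pct = (7.0, 6.0, 40.0)
member_1_frac = (5.0, 4.0, 0.80)   # same members, cloud cover as a fraction
member_2_frac = (7.0, 6.0, 0.40)

shares_pct = share_of_squared_distance(member_1_pct, member_2_pct)
shares_frac = share_of_squared_distance(member_1_frac, member_2_frac)
print("cloud share, percent units: ", round(shares_pct[2], 3))
print("cloud share, fraction units:", round(shares_frac[2], 3))
```

The same pair of members is separated almost entirely by cloud cover in one choice of units and almost not at all by it in the other, which is the behavior that the scaling introduced in this subsection is meant to remove.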
Figure 3 illustrates the difference between the scalings in Eq. (5) (Fig. 3a) and Eq. (6) (Fig. 3b) for a hypothetical two-dimensional ensemble of size 50. In Fig. 3a the scaling has transformed both forecast variables to the same (unit) variance but has left the correlation (=0.95) unaffected. According to this scaling, a hypothetical observation O1 is at a distance of 2 (standard deviation units) from the ensemble mean (X), which is plotted at the origin for convenience. Observation O2 is much closer (0.5 standard deviation units) to the ensemble mean although it is outside the main ensemble scatter, and thus further removed from the ensemble mean according to the ensemble dispersion. In Fig. 3b both forecast variables have also been scaled to unit variance, but in addition the scaling in Eq. (6) reflects nearness of points in terms of the ensemble scatter itself, so that the distance [i.e., the Mahalanobis distance; Eq. (2)] from the ensemble mean to O1 is 1.4, while the distance to O2 is 2.2. That is, the Mahalanobis scaling emphasizes distances that are perpendicular to the main directions of scatter in the ensemble, reflecting the fact that points separated in such directions are less alike than points at an equal Euclidean distance apart in directions of the main ensemble scatter. Relative to Fig. 3a, the Mahalanobis scaling has in effect stretched the ensemble in the direction between the upper-left-hand and lower-right-hand corners of Fig. 3b. The result is that the two scaled variables z1 and z2 are uncorrelated and more correctly reflect (in terms of distances within the transformed space) the fact that O1 is inside but at the edge of the ensemble while O2 is near but outside.
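The distances quoted for O1 and O2 can be reproduced approximately with a short calculation. This sketch assumes a unit-variance covariance matrix with correlation exactly 0.95 in place of the sample covariance of the plotted ensemble, and it assumes that O1 lies along the main axis of scatter and O2 perpendicular to it; these positions are chosen here to be consistent with the text and are not read from the figure. The function name `mahalanobis_2d` is hypothetical.

```python
import math

def mahalanobis_2d(point, cov):
    """Mahalanobis distance of a 2-D point from the origin for a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    x, y = point
    # Quadratic form point^T cov^{-1} point, with the 2x2 inverse written out.
    q = (d * x * x - (b + c) * x * y + a * y * y) / det
    return math.sqrt(q)

rho = 0.95                       # correlation quoted in the text
cov = ((1.0, rho), (rho, 1.0))   # assumed unit-variance covariance matrix
s = 1.0 / math.sqrt(2.0)
o1 = (2.0 * s, 2.0 * s)          # Euclidean distance 2 along the main axis (assumed)
o2 = (-0.5 * s, 0.5 * s)         # Euclidean distance 0.5 across it (assumed)

d1 = mahalanobis_2d(o1, cov)
d2 = mahalanobis_2d(o2, cov)
print(round(d1, 1), round(d2, 1))
```

Under these assumptions the two distances come out near 1.4 and 2.2, reversing the ordering of the Euclidean distances 2 and 0.5, in agreement with the values quoted for Fig. 3b.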
Tabulation of MST histograms using the Mahalanobis transformation [Eq. (6)] is recommended in order to judge MST lengths in a way that is consistent with the shape of the ensemble scatter. The rank of the MST length for the scaled and debiased ensemble zi, i = 1, … , nens, is then determined with respect to the MSTs obtained by substituting z0 in turn for each of the zi, and tabulating the MST histogram collectively for all n forecast occasions. In order not to lose the bias information, which will often be an important aspect of the forecast verification exercise, the K biases that are subtracted (angle-bracket term) in Eq. (3) need to be tabulated and presented with the MST histogram.
Figures 4 and 5 show characteristic shapes of the MST histograms derived from bias-corrected and scaled [according to Eq. (6)] ensembles, for ensemble sizes of 10 and 54, respectively. Again, these are results for synthetic, Gaussian ensembles and observations and are presented as functions of ensemble underdispersion (horizontal) and the dimension K. Results for σtruth/σensemble = 1 have been omitted since these result in uniform MST histograms regardless of the ensemble size or dimension. For the larger dimensions K, the results are relatively insensitive to ensemble size, and the MST histograms are similar to the no-bias cases (top row) in Fig. 2. As the dimension increases, the MST histogram is increasingly sensitive to dispersion errors.
Overdispersed ensembles typically contain the observation as an interior point in a K-dimensional “shell” (because the probability of an ensemble member very near the ensemble mean is small in high-dimensional spaces) through which the MST can traverse a distance that would need to be bridged in any case. The result is that the MST excluding the observation is the longest or among the longest, leading to overpopulation of the high ranks. The members of underdispersed ensembles are typically farther from the observation than from each other, so the MST excluding the observation tends to be the shortest or among the shortest, leading to the characteristic overpopulation of the smaller ranks. The effects of ensemble size are more noticeable for smaller-dimension K, particularly for the overdispersed ensembles. Here there is a tendency for hump-shaped MST histograms rather than overpopulated high ranks, since in lower dimensions the ensemble tends to be more of a filled ball rather than a hollow shell, so the MST excluding the observation is often not extraordinarily long or short (Fig. 1a is thus somewhat atypical of K = 2-dimensional MSTs but has been chosen to illustrate the higher-dimensional behavior). This effect extends to higher dimensions for larger ensemble size, for example, K = 4 and nens = 54 in Fig. 5.
3. Example
In this section the foregoing ideas are applied to a small sample of ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) Ensemble Prediction System (EPS) (Molteni et al. 1996). These are nens = 51-member ensembles initialized at 0000 UTC during the winter months of January and February 1997 and December 1997 through February 1998 and compared to the subsequent ECMWF analysis as the “observation.” Forecasts at 180-h lead time for 2-m air temperature, 10-m wind speed, and fractional cloud cover are considered, as interpolated to five locations in the United Kingdom: Birmingham, Bristol, Heathrow (London), Leeds, and Manchester. Since there are forecasts for three weather elements at five locations for each of the n = 149 forecast occasions, the dimension K of the forecast vector x is 15.
Figure 6 shows raw MST rank histograms for these forecasts, with (a) indicating results when the cloud cover is expressed as percent, (b) showing the same results but with cloud cover expressed as a decimal fraction (percent/100), and (c) showing results for the reduced (K = 10) forecasts that include only temperature and wind speed at the five locations. Because of the wide disparity in measurement scales, the ensemble scatter in the five cloud cover dimensions dominates the MSTs summarized in Fig. 6a, whereas expressing cloud cover as decimal fractions (Fig. 6b) results in their being essentially ignored, so that these MSTs are nearly confined to the 10-dimensional subspace spanned by the five temperature and five wind speed variables. This result is confirmed by Fig. 6c, which shows the MST histogram for the K = 10-dimensional forecasts of the temperatures and wind speeds only. Figure 6c is very similar to Fig. 6b, with both exhibiting more extreme overpopulation of the smaller ranks than Fig. 6a. Figure 6d compares the MST ranks for these n = 149 cases, with ranks from Fig. 6c on the horizontal and ranks from Fig. 6b on the vertical. Here the correlation is 0.98, while the corresponding correlations between the points in Fig. 6a and the other two MST histograms are about 0.25.
In order to remove the effects of different measurement scales, and to separate the effects of possible bias and dispersion errors, the same ensemble forecasts were scaled and bias adjusted as described in section 2b. Table 1 shows the 15 bias corrections [angle-bracketed term in Eq. (3)]. These are all negative, indicating underforecasting of all three elements (too cool, calm, and clear, on average) at all five locations, although the absolute magnitudes are generally modest. Figure 7 shows the MST histograms for these forecasts (corresponding to Fig. 6a) when scaled (a) according to the Mahalanobis transformation [Eq. (6)] and (b) by dividing by corresponding ensemble standard deviations only [Eq. (5)]. Both Figs. 7a and 7b indicate that the ensembles are underdispersed, with the Mahalanobis scaling in Fig. 7a reflecting also the effects of the correlations among the forecast elements on the distances between ensemble members. These correlations are substantial, with average correlations among the five sites of 0.988, 0.876, and 0.935 for the temperature, wind, and cloud cover forecasts, respectively.
4. Chi-square tests for histogram uniformity given autocorrelated forecasts
One complication in the application of Eq. (8) to assessing rank uniformity, either for scalar rank histograms or for MST histograms, is that the tabulated critical values from the χ2 distribution pertain to independent sequences of ensembles. This condition implies that sequences of forecasts must exhibit no serial correlation, which of course is often not the case. For example, the daily sequences of 180-h lead time temperature and wind forecasts described in section 3 exhibit lag-1 autocorrelations of approximately 0.5 and 0.4, respectively (the cloud cover forecasts are essentially uncorrelated).
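For reference, a lag-1 autocorrelation of the kind quoted here can be estimated as in the following sketch. The estimator shown is the common sample form, which is assumed rather than taken from this paper, and the scalar first-order autoregressive series is only a stand-in for illustration; it is not the multivariate model of the appendix. The names `lag1_autocorrelation` and `simulate_ar1` are hypothetical.

```python
import math
import random

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of values."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

def simulate_ar1(phi, n, seed=0):
    """Unit-variance scalar AR(1) series: x[t+1] = phi * x[t] + noise."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - phi * phi)  # keeps the stationary variance at 1
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + scale * rng.gauss(0.0, 1.0)
    return out

series = simulate_ar1(phi=0.5, n=5000)
estimate = lag1_autocorrelation(series)
print(round(estimate, 2))
```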
Tabulated critical values from the χ2 distribution can be adjusted to reflect the effects of serial correlation on the sampling variability of MST histograms, using the values provided in Table 2 as functions of the lag-1 autocorrelation ϕ. These have been computed using the simple stochastic model of ensemble behavior described in the appendix, in which the observation is statistically indistinguishable from the ensemble members by construction, and which reflects the Mahalanobis scaling of Eq. (6) through simulation of uncorrelated ensemble members [the submatrices on the diagonal of Eq. (A4), shown later in the appendix, are themselves diagonal]. The resulting adjustments are insensitive to the dimensionality K (K ≥ 2) of the ensembles and depend on the ensemble size only through the degrees-of-freedom parameter of the χ2 distribution, which in this setting is equal to the ensemble size. While a conventional rule of thumb states that there should be sufficient data to have at least five counts in each bin on average (in the present setting, n/nens ≥ 5), the testing approach and the adjustments in Table 2 were found to remain valid for n/nens as small as 2 or less.
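The test can be sketched as follows, assuming that Eq. (8) is the usual Pearson χ2 statistic comparing the nens + 1 bin counts with their common expected value n/(nens + 1). The function names are hypothetical, and the critical value and additive adjustment are left as arguments to be supplied from a χ2 table and from Table 2 or Table 3; no adjustment values are built into the sketch.

```python
def chi_square_uniformity(counts):
    """Pearson chi-square statistic for departure of bin counts from uniformity."""
    n = sum(counts)
    expected = n / len(counts)  # n / (nens + 1) for a rank or MST histogram
    return sum((c - expected) ** 2 / expected for c in counts)

def rejects_uniformity(counts, tabulated_critical, additive_adjustment=0.0):
    """Compare the statistic with a tabulated critical value plus an additive adjustment."""
    return chi_square_uniformity(counts) > tabulated_critical + additive_adjustment

# Hypothetical counts for nens = 10 (11 bins, 10 degrees of freedom).
flat = [100] * 11
skewed = [160, 130, 110, 100, 95, 90, 90, 85, 85, 80, 75]
critical_5pct_df10 = 18.307  # tabulated 5% critical value for 10 degrees of freedom
print(chi_square_uniformity(flat), rejects_uniformity(flat, critical_5pct_df10))
print(round(chi_square_uniformity(skewed), 1), rejects_uniformity(skewed, critical_5pct_df10))
```

Because the adjustments are additive, serial correlation enters only through the `additive_adjustment` argument, leaving the statistic itself unchanged.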
The χ2 values [Eq. (8)] for the example scaled and bias-corrected MST histograms presented in section 3 are included in Fig. 7. Even though the histograms drawn in this figure have been collected into only 13 bins, the χ2 values were computed using the 51 + 1 bin counts separately, as indicated by the subscripts in Fig. 7. The critical levels of
Because the effects of the large correlations among the forecast elements have not been accounted for in Fig. 7b, quantitative interpretation of the χ2 value for that MST histogram is not straightforward. It would be possible to evaluate adjusted χ2 values for particular cases through simulations using Eq. (A1), in which the diagonal submatrices in Eq. (A4) reflected the observed correlations (see appendix).
Finally, Table 3 contains additive adjustments to tabulated χ2 critical values, appropriate to evaluating uniformity of scalar rank histograms. These were tabulated from simulations with the simple stochastic model described in the appendix, with K = 1 so that the submatrices in Eqs. (A3) and (A4) reduce to scalars. Again, dependence on the ensemble size is subsumed in the χ2 critical values through its degrees-of-freedom parameter, and the results are valid for n/nens ≥ 2, at least. Comparison of Tables 2 and 3 shows that the adjustments appropriate to scalar rank histograms are much more sensitive to serial correlation than are the values for MST histograms in Table 2.
5. Conclusions
This paper has examined the MST histogram, a conceptual extension (and not a mathematical generalization) for multidimensional ensemble forecasts of the conventional rank histogram for scalar forecasts. While not a complete verification tool, in the sense that it does not portray the joint distribution of forecasts and observations (Murphy and Winkler 1987), it does provide diagnostic information that may be useful in interpreting and improving ensemble forecasts. Notably, however, the MST histogram does not provide information on the resolution of the forecasts. That is, other things being equal, forecasts with smaller ensemble dispersion (provided it is appropriate to the forecast accuracy) yield more refined probabilities (and thus will be better forecasts to the extent that those refined probabilities are well calibrated, or reliable), but this attribute is not reflected in the MST histogram. This deficiency is also a characteristic of the conventional scalar rank histogram (e.g., Hamill 2001).
The MST histogram presents frequencies of ranks of lengths of ensemble MSTs, relative to the group of such lengths derived by substituting the observation in turn for each of its ensemble members. This convention is consistent with usual practice for scalar rank histograms but is opposite to the original proposal for the MST histogram made by Smith (2001), which results in histograms that are flipped horizontally relative to those described here. In raw form, the MST histogram cannot distinguish ensemble bias from ensemble underdispersion and will downweight or ignore forecast dimensions with small ensemble variability. This paper has advocated computing the MST histograms using forecasts that have been debiased ex post facto and scaled according to the Mahalanobis transformation [Eq. (6)], to eliminate the effects of different ensemble spreads in different dimensions and to account for the effects of correlations within the ensemble on effective distances between ensemble members. The bias information should be retained and reported with the MST histograms.
The behavior of MST histograms has been explored for synthetic Gaussian data, as a function of ensemble over- or underdispersion, ensemble size nens, and data dimension K; but this catalog of behaviors is not exhaustive. As noted by Hamill (2001) in the context of scalar rank histograms, qualitative deviations from these synthetic results may occur for real forecasts, for example, when ensemble properties are not homogeneous within a particular sample of n forecasts.
Adjustments to χ2 values for evaluation of uniformity of the MST histograms to accommodate serial correlation in forecast data have also been presented. These adjustments are generally modest, except for the largest magnitudes of serial dependence. The values in Table 2 pertain to ensembles that have been scaled according to Eq. (6) and are not appropriate to MST histograms in which the effects of ensemble correlation on proximity of ensemble members have not been accounted for. Corresponding χ2 adjustments for assessing uniformity of scalar rank histograms have also been presented.
Verification approaches and other interpretation methods for ensemble forecasts are only just developing. In addition to the scalar rank histogram, alternative ensemble verification methods that recently have been suggested include Bayesian probabilities of the observation given the ensemble distribution (Wilson et al. 1999), scalar performance measures based on economic value (Richardson 2000; Wilks 2001), bounding boxes (Smith 2001), multidimensional scaling (Stephenson and Doblas-Reyes 2000), and time evolution of the ensemble eigenvalues and eigenvectors, and of the ensemble entropy (Stephenson and Doblas-Reyes 2000). Given the intrinsically high dimensionality (Murphy 1991) of ensemble forecast verification, it seems possible that a unified approach to ensemble verification that intelligibly expresses the full joint distribution of forecasts and observations may not be achieved. The MST histogram may develop as one of a number of useful and important diagnostics for ensemble forecasts.
Acknowledgments
I thank ECMWF for supplying the EPS forecast data. The comments of Tom Hamill and two anonymous reviewers have improved the presentation of the paper. This work was supported by NSF under Grant ATM-0221542.
REFERENCES
Ahuja, R., T. Magnanti, and J. Orlin, 1993: Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 846 pp.
Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. J. Climate, 9, 1518–1530.
Atkinson, K. E., 1978: An Introduction to Numerical Analysis. Wiley, 587 pp.
Bras, R. L., and I. Rodriguez-Iturbe, 1985: Random Functions and Hydrology. Addison-Wesley, 559 pp.
Du, J., S. L. Mullen, and F. Sanders, 1997: Short-range ensemble forecasting of quantitative precipitation. Mon. Wea. Rev., 125, 2427–2459.
Eckel, F. A., and M. K. Walters, 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Wea. Forecasting, 13, 1132–1147.
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550–560.
Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312–1327.
Harrison, M. S. J., D. S. Richardson, K. Robertson, and A. Woodcock, 1995: Medium-range ensembles using both the ECMWF T63 and unified models—An initial report. UKMO Tech. Rep. 153, 25 pp. [Available from Met Office Library, London Road, Bracknell, Berkshire RG12 2SZ, United Kingdom.].
Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. Mon. Wea. Rev., 124, 1225–1242.
Mardia, K. V., J. T. Kent, and J. M. Bibby, 1979: Multivariate Analysis. Academic Press, 518 pp.
Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119.
Murphy, A. H., 1991: Forecast verification: Its complexity and dimensionality. Mon. Wea. Rev., 119, 1590–1601.
Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338.
Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 126, 649–667.
Smith, L. A., 2001: Disentangling uncertainty and error: On the predictability of nonlinear systems. Nonlinear Dynamics and Statistics, A. E. Mees, Ed., Birkhäuser, 31–64.
Stensrud, D. J., H. E. Brooks, J. Du, M. S. Tracton, and E. Rogers, 1999: Using ensembles for short-range forecasting. Mon. Wea. Rev., 127, 433–446.
Stephenson, D. B., 1997: Correlation of spatial climate/weather maps and the advantages of using the Mahalanobis metric in predictions. Tellus, 49A, 513–527.
Stephenson, D. B., and F. J. Doblas-Reyes, 2000: Statistical methods for interpreting Monte Carlo ensemble forecasts. Tellus, 52A, 300–322.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 464 pp.
Wilks, D. S., 2001: A skill score based on economic value for probability forecasts. Meteor. Appl., 8, 209–219.
Wilson, L. J., W. R. Burrows, and A. Lanzinger, 1999: A strategy for verification of weather element forecasts from an ensemble prediction system. Mon. Wea. Rev., 127, 956–970.
APPENDIX
A Multivariate Autoregressive Model for Ensemble Forecast Behavior
Fig. 1. Hypothetical example MSTs in K = 2 dimensions. The nens = 10 ensemble members are labeled A–J, and the corresponding observation is O. Solid lines indicate MSTs for the ensemble as forecast, and dashed lines indicate MSTs that result from the observation being substituted for ensemble member D. (a) A configuration that could result from an overdispersed ensemble, where the observation is interior to the point cloud of the ensemble. (b) A configuration that could result from an underdispersed ensemble and/or a substantial ensemble mean error.
Citation: Monthly Weather Review 132, 6; 10.1175/1520-0493(2004)132<1329:TMSTHA>2.0.CO;2
Fig. 2. Behaviors of MST histograms for nens = 10 in K = 10 dimensions, as functions of ensemble bias (vertical) and ensemble underdispersion (horizontal), from independent samples of size n = 1000. Vertical scales on each histogram have been varied for clarity of presentation, with the level of the expected number per bin under uniformity (1000/11 = 91) indicated in each case by the dashed line.
Fig. 3. Comparison of a hypothetical 50-member ensemble in K = 2 dimensions, as scaled by (a) dividing each dimension by the corresponding ensemble std dev [Eq. (5)] and (b) the Mahalanobis transformation [Eq. (6)]. Plots are centered at the ensemble mean (X) and show also two hypothetical observations O1 and O2 in relation to the ensemble.
Fig. 4. Behaviors of scaled and debiased MST histograms for nens = 10, as functions of increasing dimensionality (vertical) and ensemble underdispersion (horizontal), from independent samples of size n = 10 000. Vertical scales on each histogram have been varied for clarity of presentation, with the level of the expected number per bin under uniformity (10 000/11 = 909) indicated in each case by the dashed line.
Fig. 5. As in Fig. 4, but for nens = 54, with each of the 11 bars indicating counts in five consecutive MST histogram bins for clarity of presentation.
Fig. 6. (a) MST histogram for ECMWF EPS forecasts of temperature (°C), wind speed (m s−1), and cloud fraction (%), at Birmingham, Bristol, Leeds, London, and Manchester, United Kingdom (i.e., considering 15-dimensional forecast vectors) at 180-h lead time for the 149 forecasts initialized during Jan and Feb 1997 and Dec 1997–Feb 1998. (b) Results for the same data, except with cloud fractions expressed as %/100, and (c) results omitting the cloud forecasts (10-dimensional forecasts). The ensemble size is 51, each of the 13 bars indicates counts in four consecutive MST histogram bins for clarity of presentation, and the expected number of counts (11.5) under uniformity is indicated by the dashed lines. (d) Scatterplot of MST ranks corresponding to (b) (vertical) and (c) (horizontal) and their correlation over the 149 cases, illustrating domination of the MST lengths by variables with larger scales of variation.
Fig. 7. MST histograms for the 15-dimensional forecasts, as in Fig. 6a, after removal of biases, and standardization to common scales according to (a) the Mahalanobis transformation [Eq. (6)] and (b) division of each ensemble vector element by its ensemble std dev only [Eq. (5)]. Each of the 13 bars indicates counts in four consecutive MST histogram bins for clarity of presentation, and the expected number of counts (11.5) under uniformity is indicated by the dashed lines.
Table 1. Ensemble biases [angle-bracket term in Eq. (3)] over the n = 149 forecasts.
Table 2. Additive corrections to tabulated χ2 critical values to test uniformity of MST histograms as functions of lag-1 autocorrelation ϕ. Corrections for ϕ < 0.4 are negligible.
Table 3. Additive corrections to tabulated χ2 critical values to test uniformity of conventional (scalar) rank histograms as functions of lag-1 autocorrelation ϕ.