1. Introduction
It is a widespread observation in numerical weather prediction and seasonal forecasting that when performing ensemble forecasts the ensemble mean often outperforms most or all of the individual ensemble members (Toth and Kalnay 1997; Hamill and Colucci 1997; Du et al. 1997; Ebert 2001; Hagedorn et al. 2005; Krishnamurti et al. 1999; Casanova and Ahrens 2009; Surcel et al. 2014). A similar behavior is observed in climate modeling when validating an ensemble of different model experiments against observations (Lambert and Boer 2001; Gleckler et al. 2008; Pincus et al. 2008; Knutti et al. 2010; Sillmann et al. 2013; Flato et al. 2013) and when validating air quality models (van Loon et al. 2007; Delle Monache and Stull 2003; McKeen et al. 2005).
In the climate modeling context Christiansen (2018a) recently gave a simple explanation based on nonintuitive properties of high-dimensional spaces: the curse of dimensionality. It was explained not only why the ensemble mean often outperforms most or all of the individual ensemble members but also why the error of the ensemble mean is almost always a factor of $\sqrt{2}$ smaller than the typical error of the individual ensemble members.
The explanation in Christiansen (2018a) is based on general principles and gives a framework from which quantitative and analytical results can be obtained. These principles are valid for all distance measures in high-dimensional spaces. In the analysis of weather forecasts or climate models the high-dimensional space enters when we consider, for example, the root-mean-square error over all the grid points in an extended spatial region as a measure of the difference between models and observations. We then operate in a space with the dimension given by the number of (independent) grid points. Note that distance measures based on simple spatial averages, such as the difference between the modeled and the observed average Northern Hemisphere temperature, are not of high dimensionality, and the results of this paper therefore do not hold for such diagnostics.
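In practice such a diagnostic is simply the root-mean-square difference over the grid points of the region. A minimal sketch in Python; the helper name and the cosine-of-latitude area weighting are our own choices for illustration, not taken from the paper:

```python
import numpy as np

def rmse(field_a, field_b, lats):
    """Area-weighted root-mean-square difference between two fields
    on a regular latitude-longitude grid (lats in degrees)."""
    w = np.cos(np.radians(lats))[:, None] * np.ones_like(field_a)
    return np.sqrt((w * (field_a - field_b) ** 2).sum() / w.sum())

# Example: two fields that differ everywhere by 2 K have an RMSE of 2 K.
lats = np.arange(20.5, 90.0, 1.0)     # 20-90N at 1-degree resolution
a = np.zeros((lats.size, 360))
print(rmse(a, a + 2.0, lats))         # -> 2.0
```

Each grid point contributes one dimension, so a hemispheric field at this resolution already corresponds to a space of tens of thousands of dimensions.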
The properties of high-dimensional spaces often defy our intuition based on two and three dimensions (Bishop 2007; Cherkassky and Mulier 2007; Blum et al. 2018). The relevant properties of high-dimensional spaces for this study are that independent random vectors are almost always orthogonal and that random vectors drawn from the same distribution have almost the same length. In mathematics these properties are known as “waist concentration” and “concentration of measures” (Talagrand 1995; Donoho 2000; Gorban et al. 2016) and they are a cornerstone of statistical mechanics (see Gorban and Tyukin 2018, for a recent discussion). While the expression “curse of dimensionality” is generally used for the properties of high-dimensional spaces, in the present context these properties turn out to be a blessing as they strongly simplify the analysis and make analytical results possible. For the situation studied in Christiansen (2018a) the vectors representing the ensemble members and observations lie in a thin annulus far from the center and are all mutually orthogonal. The ensemble mean, however, is located near the center. Therefore, the ensemble mean, the observation, and an individual ensemble member form an isosceles right triangle, and the error of the ensemble mean will be a factor of $\sqrt{2}$ smaller than the error of an individual ensemble member.
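Both concentration properties are easy to verify numerically. A minimal sketch with NumPy; the dimension, the number of vectors, and the seed are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_vec = 10_000, 50  # dimension and number of random vectors

X = rng.standard_normal((n_vec, N))

# Concentration of measure: the lengths cluster tightly around sqrt(N).
lengths = np.linalg.norm(X, axis=1)
print(lengths.std() / lengths.mean())  # about 0.007, i.e., a very thin annulus

# Waist concentration: pairwise angles cluster tightly around 90 degrees.
U = X / lengths[:, None]
cosines = (U @ U.T)[np.triu_indices(n_vec, k=1)]
angles = np.degrees(np.arccos(cosines))
print(angles.mean(), angles.std())     # mean near 90, spread well below 1 degree
```

The relative spread of the lengths shrinks like $1/\sqrt{2N}$ and the spread of the cosines like $1/\sqrt{N}$, so both effects sharpen as the dimension grows.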
A basic assumption in Christiansen (2018a) is that the ensemble members and the observations are drawn from the same distribution. While this may be a sensible assumption when validating the climatology of climate models it is not necessarily a good assumption in numerical weather prediction. Here, the bias and variance of the model ensemble are expected to depend on the lead time of the forecast. In this paper we analyze the behavior of the ensemble mean in numerical weather prediction, more precisely in the National Oceanic and Atmospheric Administration’s (NOAA) second-generation global ensemble reforecasts (GEFS) (Hamill et al. 2013). We assume a simple statistical model where observations and ensemble members are drawn from distributions with different variances and where the ensemble may be biased relative to observations. This model was briefly discussed in Christiansen (2018a) regarding the relative error of the ensemble mean. With this model and using the generic geometric properties of high-dimensional spaces we obtain further analytical results for the rank of the ensemble mean, the relation between the ensemble mean and the ensemble spread, and the effect of spatial correlations. These results explain the behavior of the ensemble mean as a function of lead time in the reforecasts. For convenience we have used the model analysis as a proxy for the truth but we have tested that only small differences are found using real observations. While the present study focuses on the properties of the ensemble mean, the general simplifying properties of high dimensionality will also be applicable to other (e.g., probabilistic) measures of the forecast skill.
The simple statistical model and the analytic results obtained by using the properties of high-dimensional spaces are discussed in section 2. These results depend on the bias and the variance of both observations and model ensemble. The NOAA global ensemble reforecast dataset is described in section 3. The relation between the simple statistical model and the complex ensemble forecast is discussed in section 4 together with the method to separate bias and the variances in the ensemble forecast. The analytical and numerical results are compared in section 5. The paper finishes with the conclusions in section 6.
2. Analytical results in high-dimensional spaces
In this section we present the analytical results based on the properties of high-dimensional spaces. We begin by describing the notation, the statistical model, and the simplifying effects of high dimensions in section 2a. In sections 2b and 2c we derive analytical results for the relative error and the rank of the ensemble mean. In section 2d we look at the relation between the ensemble mean and the ensemble spread. Finally, in section 2e we consider the modifications to the analytical results when the elements of the observations and the ensemble members are not independent (corresponding to the presence of spatial correlations in the forecasts) and the effective dimension therefore is smaller than the physical dimension.
a. The statistical model and notation
We first focus on an ensemble forecast for a particular lead time. The forecast has K ensemble members, each an N-dimensional vector of grid-point values.
Our basic model for the description of the ensemble forecast assumes that the ensemble members are drawn independently from a distribution with variance $\sigma_{\mathrm{mod}}^2$ in each of the N dimensions and that the observations are drawn from a distribution with variance $\sigma_{\mathrm{obs}}^2$, the two distributions being separated by a bias of length B.
For the weather forecasts we expect to be close to the situation where
For N large the general properties of high-dimensional spaces suggest a situation as shown schematically in Fig. 1. The ensemble members will be situated on a thin annulus with radius proportional to $\sqrt{N}$.
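The $\sqrt{2}$ geometry described above can be reproduced with a few lines of Monte Carlo, here for the special case of Christiansen (2018a) where members and observations come from the same unbiased, unit-variance distribution; all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, trials = 5_000, 10, 200

err_member, err_mean = [], []
for _ in range(trials):
    obs = rng.standard_normal(N)          # observation
    ens = rng.standard_normal((K, N))     # K ensemble members, same distribution
    err_member.append(np.linalg.norm(ens - obs, axis=1).mean())
    err_mean.append(np.linalg.norm(ens.mean(axis=0) - obs))

ratio = np.mean(err_mean) / np.mean(err_member)
print(ratio)  # close to sqrt((1 + 1/K)/2) ~ 0.74; tends to 1/sqrt(2) for large K
```

For a finite ensemble the mean does not sit exactly at the center of the distribution, which is the origin of the $1/K$ correction in the ratio.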
Schematic diagram showing the situation in high-dimensional space when ensemble members are drawn from
Citation: Monthly Weather Review 147, 5; 10.1175/MWR-D-18-0211.1
b. Relative error of the ensemble mean
Let the ensemble members be drawn from a distribution with variance $\sigma_{\mathrm{mod}}^2$ in each of the $N$ dimensions and the observations from a distribution with variance $\sigma_{\mathrm{obs}}^2$, the two distributions being separated by a bias of length $B$. The geometry of Fig. 1 then gives, for large $N$, the squared error of an individual ensemble member,
$$E_{\mathrm{member}}^2 = N\sigma_{\mathrm{obs}}^2 + N\sigma_{\mathrm{mod}}^2 + B^2, \quad (1)$$
and the squared error of the mean of $K$ ensemble members,
$$E_{\mathrm{mean}}^2 = N\sigma_{\mathrm{obs}}^2 + N\sigma_{\mathrm{mod}}^2/K + B^2. \quad (2)$$
The error of the ensemble mean relative to the typical error of an individual ensemble member is therefore
$$\frac{E_{\mathrm{mean}}}{E_{\mathrm{member}}} - 1 = \left(\frac{\sigma_{\mathrm{obs}}^2 + \sigma_{\mathrm{mod}}^2/K + B^2/N}{\sigma_{\mathrm{obs}}^2 + \sigma_{\mathrm{mod}}^2 + B^2/N}\right)^{1/2} - 1, \quad (3)$$
which for $\sigma_{\mathrm{mod}} = \sigma_{\mathrm{obs}}$, $B = 0$, and large $K$ reduces to $1/\sqrt{2} - 1 \approx -0.29$.
The relative error of the ensemble mean as a function of normalized model variance
c. Rank of the ensemble mean
Equations (1) and (2) show that for large N and K the ensemble mean is always better than all the ensemble members. As mentioned in the introduction it is often observed that the ensemble mean outperforms most or all of the individual ensemble members. To get a more nuanced picture for the rank of the ensemble mean we now need to relax the limit of large N.
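The dependence of the rank on the dimension can be illustrated with a small Monte Carlo experiment, again for the unbiased case with equal unit variances; the ensemble size, trial count, and dimensions are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
K, trials = 10, 1000

def frac_better(N):
    """Fraction of ensemble members with a smaller error than the
    ensemble mean, for the unbiased case with equal unit variances."""
    count = 0
    for _ in range(trials):
        obs = rng.standard_normal(N)
        ens = rng.standard_normal((K, N))
        e_mean = np.linalg.norm(ens.mean(axis=0) - obs)
        e_mem = np.linalg.norm(ens - obs, axis=1)
        count += int((e_mem < e_mean).sum())
    return count / (trials * K)

fracs = {N: frac_better(N) for N in (2, 10, 100, 1000)}
print(fracs)  # the fraction shrinks toward zero as the dimension N grows
```

In few dimensions a member quite often beats the mean; in high dimensions the concentration of lengths makes this exceedingly rare.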
In Eq. (5) we now approximate
In Eq. (4) we approximate
The fraction of the ensemble members better than the ensemble mean calculated from this equation is shown in Fig. 3 for
The rank of the ensemble mean (fraction of ensemble members less than ensemble mean) as a function of variance and bias for
We have confirmed numerically that Eq. (7) is a very good approximation even for
d. Ensemble spread and ensemble mean error
e. The effective dimension
In the considerations above we assumed that the elements of the observations and the ensemble members are independent.
We will therefore have to distinguish between the physical dimension N (the number of grid points) and the effective dimension $N_{\mathrm{eff}}$, which is reduced by spatial correlations.
We will here consider how the situation where
We now repeat the rewriting of the multivariate spherical Gaussian as a function of radius (Christiansen 2018a) but now we take the reduced number of independent elements into consideration. We find that the lengths of the vectors in Fig. 1 still depend on the physical dimension N [e.g., in Eqs. (1) and (2)]. However, the widths (standard deviations) will be
For the dot product of two ensemble members we then get the standard deviation
Using these relations we find that for the fraction of the ensemble members better than the ensemble mean, Eq. (7), N should be replaced with the effective dimension $N_{\mathrm{eff}}$.
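The effective number of spatial degrees of freedom can be estimated from the eigenvalues $\lambda_i$ of the correlation matrix as $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ (Bretherton et al. 1999). A sketch for a synthetic one-dimensional AR(1) correlation structure; the variable name `n_eff`, the grid size, and the lag-one correlation are our choices:

```python
import numpy as np

N, rho = 1_000, 0.9  # grid points and lag-one spatial correlation

# Correlation matrix of an AR(1) field: C_ij = rho**|i - j|.
idx = np.arange(N)
C = rho ** np.abs(idx[:, None] - idx[None, :])

# Effective number of degrees of freedom (Bretherton et al. 1999):
lam = np.linalg.eigvalsh(C)
n_eff = lam.sum() ** 2 / (lam ** 2).sum()
print(n_eff)  # about 105 here, far smaller than the physical dimension N = 1000
```

For this correlation model $N_{\mathrm{eff}} \approx N(1 - \rho^2)/(1 + \rho^2)$ for large N, so strong spatial correlations can reduce the dimension by an order of magnitude.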
3. The global ensemble reforecast
In this section we describe the ensemble reforecast data and show how the error and rank of the ensemble mean change as functions of lead time. In section 5 we will use the theoretical results of section 2 to relate the behavior of the ensemble mean to the variances and bias.
For comparison with the analytical results we use data from NOAA’s second-generation global ensemble reforecast system (GEFS; Hamill et al. 2013). The reforecasts were downloaded from https://www.esrl.noaa.gov/psd/forecasts/reforecast2/download.html. The ensemble forecasts consist of one control forecast and 10 perturbed ensemble members with lead times up to 384 h (16 days). The forecasts are available for every day from 1985 to the present. The internal resolution of the model is T254L42 for the first 8 days and T190L42 thereafter.
We have downloaded the forecasts with 3-h intervals for lead times up to 72 h and with 6-h intervals up to 384 h for the period 1985–2017 (33 years). The downloaded horizontal fields are at a 1° × 1° longitude–latitude resolution. Here we mainly consider the 33 forecasts of 2-m temperature for 15 January. Observations are taken as the unperturbed (control) ensemble member at lead time 0. Using the analysis as a proxy for truth is common practice [but not without problems (Bowler et al. 2015)]. However, we have tested that almost identical results are obtained when using either an independent model-based reanalysis or a station observation-based gridded dataset instead of the analysis. The main difference is that in these cases the root-mean-square errors do not vanish for the smallest lead times.
Figure 4 shows the root-mean-square errors of the 2-m temperature for the different ensemble members as a function of lead time for two different 15 January dates (top 1985, bottom 1986). The root-mean-square errors are calculated for the extratropics, 20°–90°N (left), and for a smaller area, 40°–45°N, 10°–40°E (right). As expected the errors increase rapidly during the first few time steps, after which the increase slows down. Saturation is reached at lead times of approximately 100 h for the extratropics and approximately 50 h for the smaller area. Note also that the spread of the ensemble members increases with lead time and is larger for the smaller region than for the extratropics. Figure 4 also shows the root-mean-square errors of the ensemble mean (thick black curves). We observe that for large lead times the error of the ensemble mean is smaller than the errors of almost all individual ensemble members. This effect is clearer for the extratropics than for the smaller region. Close inspection shows that the error of the ensemble mean is smaller than the errors of the individual ensemble members also for the smallest lead times. There is a daily cycle in the forecasts, with peaks at lead times of 12 h, 36 h, etc., which is most pronounced for the smaller region.
The root-mean-square error of the GEFS reforecast of 2-m temperature as a function of lead time for 15 Jan in two different years [(top) 1985, (bottom) 1986]. Thin black curves are the 11 ensemble members, thick black curve the ensemble mean. Full blue curve is the estimate from Eq. (2) and dashed blue curve is the estimate from Eq. (1) with observed values of B,
Figure 5 shows, for the same forecasts as in Fig. 4, the errors relative to the median error of all ensemble members. Note that the relative error of the ensemble mean is always smaller than zero. For the smallest lead times it is close to −1, but it increases fast and is close to zero for lead times around 30–40 h. Then it slowly decreases, with a tendency for convergence toward $1/\sqrt{2} - 1 \approx -0.29$ for the largest lead times.
Relative error of the GEFS reforecast as a function of lead time for the same two forecasts as in Fig. 4. Thin black curves are the 11 ensemble members, thick black curve the ensemble mean. Blue curve is the estimate from Eq. (3) with observed values of
Figures 6 and 7 summarize the situation for all 33 forecasts for 15 January as a function of lead time. Figure 6 shows the relative error of the ensemble mean for all 33 ensemble forecasts together with its average over the 33 forecasts. The behavior we saw for the two individual forecasts is confirmed. Figure 7 shows the rank of the ensemble mean for the 33 ensemble forecasts. The rank is 0 if the error of the ensemble mean is lower than the errors of all 11 individual ensemble members. Likewise, the rank is 11 if the error of the ensemble mean is higher than the errors of all ensemble members. For the smallest and largest lead times the ensemble mean is almost always closer to the observations than all the ensemble members. For lead times between 20 and 100 h the ensemble mean is more often outperformed by one or more ensemble members.
The relative error (RMS) of the ensemble mean as a function of lead time for all the 33 forecasts of the 2-m temperature for 15 Jan. Thin black curves are the 33 individual forecasts. Thick black curve is mean over all 33 forecasts. Blue curve is the theoretical relative error from Eq. (3) with observed values of
The rank of the ensemble mean as a function of lead time based on all the 33 forecasts of the 2-m temperature for 15 Jan. Thin curves are the 33 individual forecasts. Thick black curve is mean over all 33 forecasts and dashed black curves the mean plus/minus one standard deviation. The blue curves are analytic results [Eq. (7)] for
We can directly test if the blessings of high dimensionality are fulfilled for the model ensemble. For each lead time and for each of the 33 forecasts we calculate the angles between pairs of ensemble members. The angles
(top) The distribution of angles
4. Relation between the simple statistical model and ensemble forecasts
The analytical expressions in section 2 were derived for a simple statistical model including model bias and different variances for observations and the model ensemble. In this section we first, in section 4a, discuss how to relate the simple statistical model to the complex ensemble forecast. In section 2 we saw that in the simple statistical model the relative error and the rank of the ensemble mean both depend on the variances of the forecasts and observations and on the bias. In section 4b we discuss how to estimate these quantities to be able to compare the observed ensemble mean to the theory of section 2. The results in this section do not depend on the properties of high-dimensional spaces.
a. The dynamical perspective
There are two sources of errors in ensemble forecasts. The first is related to uncertainties in the initial conditions due to, for example, measurement errors and the scarcity of observations. The second is related to model deficiencies originating, for example, from unresolved scales and physical parameterizations. The effect of these errors and attempts to separate them have been studied by Tribbia and Baumhefner (1988), Stensrud et al. (2000), and Nicolis et al. (2009). See also the review by Palmer (2000). The situation in phase space is illustrated schematically in Fig. 9. The gray cloud illustrates the probability distribution of the initial conditions. The blue clouds indicate the modeled probability distributions at two later times.
Schematic illustration of the time evolution in phase space. The probability distribution of the initial conditions (gray cloud) evolves differently in the model (blue clouds) and the real atmosphere (red clouds) because of model deficiencies. The curves illustrate individual trajectories. The widths of the clouds indicate the variances of the model
In the real atmosphere there is a single true initial condition and a single trajectory in phase space and consequently a single atmospheric state at each later time. However, we can imagine the evolution of the distribution of the initial conditions in the real atmosphere (or in a perfect model). This is illustrated as the red clouds in the figure. It is these distributions—the blue and red clouds—that should be compared for a given lead time when validating the ensemble forecast. The widths of the clouds (in the subspace or projection under consideration; here the near-surface temperature) indicate the variances of the model,
The temporal development of the distributions is given by the Liouville equation (Ehrendorfer 1994, 2006). However, because of the complexity of the atmosphere, in ensemble forecasts the distributions are estimated from a finite number of ensemble members (e.g., Palmer 2000). In Fig. 9 the initial conditions of the ensemble are all located inside the small gray cloud. The blue curves show how the ensemble members develop over time. The figure indicates a strong divergence in the beginning which soon saturates and is followed by a slower divergence. For the real atmosphere the trajectories are illustrated by the red curves.
For each forecast and each lead time we have a model ensemble from which we can estimate the distributional quantities. However, for each forecast and each lead time we only have one realization of the real atmosphere. To estimate the distribution related to the real atmosphere (the red clouds) we need to combine information from many forecasts just as is done in verifications of ensemble forecasts (Toth et al. 2003).
b. Variance-bias decomposition of the error
We now focus on the forecasts for a specific lead time. We are interested only in the bias and variance related to the divergence from the initial conditions and not in the bias and variance related to the climatological year-to-year variations. However, without further assumptions it is not possible to estimate both bias and variance of the observations as we have only one observation for each forecast. We, therefore, consider all forecasts with start day at the same calendar day, here 15 January. We combine these forecasts by removing for every start day the ensemble mean for that day from both the forecast ensemble and observation. This amounts to assuming a model where the different days only differ by a shift that is the same for models and observations. In other words, we consider anomalies with respect to the ensemble mean (which also depends on lead time). Thus, ensemble members are drawn from
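The construction of anomalies with respect to the per-day ensemble mean can be sketched as follows; the "true" parameter values and the small-sample corrections below are our own illustration rather than the paper's estimators:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, years = 2_000, 10, 33
sig_mod, sig_obs, bias = 1.0, 0.8, 0.5   # illustrative "true" values

obs_anom, ens_anom = [], []
for _ in range(years):
    shift = rng.standard_normal(N)       # year-to-year shift shared by model and obs
    ens = shift + bias + sig_mod * rng.standard_normal((K, N))
    obs = shift + sig_obs * rng.standard_normal(N)
    m = ens.mean(axis=0)                 # remove the ensemble mean of that year
    ens_anom.append(ens - m)
    obs_anom.append(obs - m)

ens_anom = np.concatenate(ens_anom)      # shape (years*K, N)
obs_anom = np.asarray(obs_anom)          # shape (years, N)

var_mod = ens_anom.var()                       # estimates sig_mod**2 * (1 - 1/K)
var_obs = obs_anom.var() - var_mod / (K - 1)   # remove the ensemble-mean noise
bias_est = -obs_anom.mean()                    # obs minus ensemble mean gives -bias
print(var_mod, var_obs, bias_est)
```

The shared year-to-year shift cancels exactly in the anomalies, so the remaining statistics reflect only the ensemble variance, the observational variance, and the bias.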
The resulting decomposition is shown in the top panels of Figs. 10 and 11. The total error grows fast for the first 24 h. For lead times smaller than 100 h
Decomposition of the mean square error into variance and bias shown as a function of lead time. For every lead time and year the ensemble mean has been removed from both the ensemble and the observation [see Eq. (9)]. (top)
As in Fig. 10, but for the smaller region defined by 40°–45°N, 10°–40°E.
Figures 10 and 11 (bottom panels) show the ratios
5. Relating the properties of the ensemble mean to variance and bias
We now compare the mean errors and ranks from the GEFS reforecasts—described in section 3—with the analytical expressions obtained in section 2. These expressions depend on the bias and the variances of the ensemble and observations and they therefore also depend on the lead time. The bias and variances were estimated in section 4.
We first consider the errors of the two individual forecasts shown in Fig. 4. Full and dashed blue curves in this figure are estimates of the error of the ensemble mean [Eq. (2)] and the error of an individual ensemble member [Eq. (1)] using B,
When values of the ratio of variances
For individual winters (Fig. 5) we again see a good agreement, with a strong increase in the relative error at the first lead times, a maximum around 30 h, and a decrease toward $1/\sqrt{2} - 1 \approx -0.29$ at the largest lead times.
Inserting the values of the ratio of variances
The daily cycle in both the relative error and the rank of the ensemble mean, which is most clearly seen for the smallest region, is a consequence of the daily cycle in the bias.
In general we see for the smallest lead times a situation corresponding to the truth-plus-error interpretation, where model ensemble members are sampled from a distribution centered about the observations. This is obviously a simple consequence of the initialization and corresponds in our simple model to a vanishing bias and a vanishing observational variance.
We note that the very good agreement between the analytical and empirical results is not due to a spurious cancellation of errors as it is also found when the bias and variance are calculated from one-half of the starting days and the error of the ensemble mean from the other half.
Finally, we consider the relation between the error of the ensemble mean and the ensemble spread. We note that the last term in Eq. (8) is smaller than the other terms both because of the factor
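The expected match between spread and ensemble-mean error for a statistically consistent ensemble, including the finite-ensemble factor $\sqrt{(K+1)/K}$ of Fortin et al. (2014), can be checked by Monte Carlo; the setup below is a generic consistent ensemble with illustrative parameters, not the GEFS model itself:

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, trials = 5_000, 10, 200

err, sprd = [], []
for _ in range(trials):
    state = rng.standard_normal(N)
    truth = state + rng.standard_normal(N)       # verifying truth
    ens = state + rng.standard_normal((K, N))    # statistically consistent ensemble
    err.append(np.sqrt(np.mean((ens.mean(axis=0) - truth) ** 2)))
    sprd.append(np.sqrt(ens.var(axis=0, ddof=1).mean()))

ratio = np.mean(err) / np.mean(sprd)
print(ratio)  # close to sqrt((K + 1)/K) ~ 1.05 (Fortin et al. 2014)
```

When truth and members are exchangeable draws around the same state, the RMSE of the ensemble mean exceeds the spread by exactly this finite-ensemble factor on average.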
6. Conclusions
In ensemble forecasts of weather or climate the ensemble mean often outperforms most or all of the individual ensemble members. This holds when, for example, the root-mean-square error over many grid points is used as the measure of the difference between models and observations. It was demonstrated by Christiansen (2018a) that this is a simple consequence of the nonintuitive geometric properties of high-dimensional spaces, under the assumption that the ensemble members and observations are drawn from the same distribution. While this assumption might hold for the climatology of climate models, it does not hold for short-term weather forecasts, where bias and variance depend strongly on lead time.
In this paper we first considered the properties of the ensemble mean in a simple theoretical model. We then studied the behavior of the ensemble mean as a function of lead time in the GEFS ensemble reforecast system and compared the results to the simple theoretical model. In this simple model the ensemble members and observations are drawn from distributions with different variances and separated by a bias (Fig. 1). Based on this model we extended the work of Christiansen (2018a) and derived an analytical expression for the rank of the ensemble mean relative to the ensemble members. We also found an analytical relation between the ensemble spread and the ensemble mean error. Finally, we derived analytical expressions for the modifications needed when spatial correlations reduce the effective dimension. All these derivations rely heavily on the blessings of high-dimensional spaces, which simplify the calculations as independent vectors are almost always orthogonal and vectors drawn from the same distribution have almost the same length. The analytical expressions depend on the ratio of the bias to the variance of the observations as well as on the ratio between the variances of the ensemble and the observations.
For the GEFS reforecast we found, considering the extratropical 2-m temperature, that the relative error of the ensemble mean increases rapidly at the first lead times and reaches a value of 0 at lead times around 30–40 h (Fig. 6). At the time of initialization the relative error is −1 for large ensembles, as the ensemble members are just observations polluted with noise. For larger lead times the relative error decreases again and approaches the theoretical value of $1/\sqrt{2} - 1 \approx -0.29$.
For comparison with the simple theoretical model we needed the bias and the variances of both observations and models. These quantities were calculated by assuming that the distributions of model ensembles and observations for the same day every year differ only by an offset of the mean. This offset can be different for different years but is the same for model ensembles and observations. Using the variances and bias estimated in this way we found an excellent agreement between the simple theoretical model and the GEFS forecasts regarding the relative error and rank of the ensemble mean as functions of lead time. We also noted a weak relationship between ensemble spread and the ensemble mean error, as predicted by the simple theoretical model.
The results above were obtained when forecasting the 2-m temperature on 15 January, but we have confirmed that we get similar results for other fields, such as the 500-hPa geopotential height and the total cloud cover, and for other dates, such as 15 July. Focusing on smaller regions, and therefore fewer dimensions, we found that the ensemble mean is still in general better than most individual ensemble members but that, as expected, the effect is less pronounced (Fig. 7).
In this paper we analyzed the properties of the ensemble mean by applying a simple statistical model. However, the blessings of high dimensionality may also be applied when analyzing more advanced statistical models or other measures of the forecast skill. The blessings of high dimensionality hold for all distance measures; the only requirement is that the diagnostic is of high dimensionality. Such diagnostics include the root-mean-square error and correlations between model and observations when these are calculated over an extended spatial region. Diagnostics that are not of high dimensionality include those based on simple spatial averages.
Acknowledgments
This work is supported by the NordForsk-funded Nordic Centre of Excellence project (Award 76654) Arctic Climate Predictions: Pathways to Resilient, Sustainable Societies (ARCPATH) and by the project European Climate Prediction System (EUCP) funded by the European Union under Horizon 2020 (Grant 776613). NOAA’s second-generation global ensemble reforecast dataset was downloaded from https://www.esrl.noaa.gov/psd/forecasts/reforecast2/download.html.
REFERENCES
Annan, J. D., and J. C. Hargreaves, 2011: Understanding the CMIP3 multimodel ensemble. J. Climate, 24, 4529–4538, https://doi.org/10.1175/2011JCLI3873.1.
Bishop, C., 2007: Pattern Recognition and Machine Learning. 2nd ed. Springer-Verlag, 740 pp.
Blum, A., J. Hopcroft, and R. Kannan, 2018: Foundations of Data Science. Cornell University, 479 pp., https://www.cs.cornell.edu/jeh/book.pdf.
Bowler, N. E., M. J. P. Cullen, and C. Piccolo, 2015: Verification against perturbed analyses and observations. Nonlinear Processes Geophys., 22, 403–411, https://doi.org/10.5194/npg-22-403-2015.
Bretherton, C. S., M. Widmann, V. P. Dymnikov, J. M. Wallace, and I. Bladé, 1999: The effective number of spatial degrees of freedom of a time-varying field. J. Climate, 12, 1990–2009, https://doi.org/10.1175/1520-0442(1999)012<1990:TENOSD>2.0.CO;2.
Buizza, R., and T. N. Palmer, 1998: Impact of ensemble size on ensemble prediction. Mon. Wea. Rev., 126, 2503–2518, https://doi.org/10.1175/1520-0493(1998)126<2503:IOESOE>2.0.CO;2.
Casanova, S., and B. Ahrens, 2009: On the weighting of multimodel ensembles in seasonal and short-range weather forecasting. Mon. Wea. Rev., 137, 3811–3822, https://doi.org/10.1175/2009MWR2893.1.
Cherkassky, V. S., and F. Mulier, 2007: Learning from Data: Concepts, Theory, and Methods. 2nd ed. John Wiley & Sons, 560 pp.
Christiansen, B., 2018a: Ensemble averaging and the curse of dimensionality. J. Climate, 31, 1587–1596, https://doi.org/10.1175/JCLI-D-17-0197.1.
Christiansen, B., 2018b: Reply to “Comments on ‘Ensemble averaging and the curse of dimensionality.”’ J. Climate, 31, 9017–9019, https://doi.org/10.1175/JCLI-D-18-0416.1.
Delle Monache, L., and R. B. Stull, 2003: An ensemble air-quality forecast over western Europe during an ozone episode. Atmos. Environ., 37, 3469–3474, https://doi.org/10.1016/S1352-2310(03)00475-8.
Donoho, D. L., 2000: High-dimensional data analysis: The curses and blessings of dimensionality. AMS Conf. on Mathematical Challenges of the 21st Century, Los Angeles, CA, American Mathematical Society, 1–32.
Du, J., S. L. Mullen, and F. Sanders, 1997: Short-range ensemble forecasting of quantitative precipitation. Mon. Wea. Rev., 125, 2427–2459, https://doi.org/10.1175/1520-0493(1997)125<2427:SREFOQ>2.0.CO;2.
Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. Mon. Wea. Rev., 129, 2461–2480, https://doi.org/10.1175/1520-0493(2001)129<2461:AOAPMS>2.0.CO;2.
Ehrendorfer, M., 1994: The Liouville equation and its potential usefulness for the prediction of forecast skill. Part I: Theory. Mon. Wea. Rev., 122, 703–713, https://doi.org/10.1175/1520-0493(1994)122<0703:TLEAIP>2.0.CO;2.
Ehrendorfer, M., 2006: The Liouville equation and atmospheric predictability. Predictability of Weather and Climate, T. Palmer and R. Hagedorn, Eds., Cambridge University Press, 59–98, https://doi.org/10.1017/CBO9780511617652.005.
Flato, G., and Coauthors, 2013: Evaluation of climate models. Climate Change 2013: The Physical Science Basis, T. F. Stocker et al., Eds., Cambridge University Press, 741–866, https://doi.org/10.1017/CBO9781107415324.020.
Fortin, V., M. Abaza, F. Anctil, and R. Turcotte, 2014: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeor., 15, 1708–1713, https://doi.org/10.1175/JHM-D-14-0008.1.
Gleckler, P., K. Taylor, and C. Doutriaux, 2008: Performance metrics for climate models. J. Geophys. Res., 113, D06104, https://doi.org/10.1029/2007JD008972.
Gorban, A. N., and I. Y. Tyukin, 2018: Blessing of dimensionality: Mathematical foundations of the statistical physics of data. Philos. Trans. Royal Soc. London, 376A, https://doi.org/10.1098/rsta.2017.0237.
Gorban, A. N., I. Y. Tyukin, and I. Romanenko, 2016: The blessing of dimensionality: Separation theorems in the thermodynamic limit. IFAC-PapersOnLine, 49, 64–69, https://doi.org/10.1016/j.ifacol.2016.10.755.
Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting—I. Basic concept. Tellus, 57A, 219–233, https://doi.org/10.1111/j.1600-0870.2005.00103.x.
Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta-RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312–1327, https://doi.org/10.1175/1520-0493(1997)125<1312:VOERSR>2.0.CO;2.
Hamill, T. M., G. T. Bates, J. S. Whitaker, D. R. Murray, M. Fiorino, T. J. Galarneau, Y. Zhu, and W. Lapenta, 2013: NOAA’s second-generation global medium-range ensemble reforecast dataset. Bull. Amer. Meteor. Soc., 94, 1553–1565, https://doi.org/10.1175/BAMS-D-12-00014.1.
Kainen, P. C., 1997: Utilizing geometric anomalies of high dimension: When complexity makes computation easier. Computer Intensive Methods in Control and Signal Processing, M. Kárný and K. Warwick, Eds., Birkhäuser, 283–294, https://doi.org/10.1007/978-1-4612-1996-5_18.
Knutti, R., R. Furrer, C. Tebaldi, J. Cermak, and G. A. Meehl, 2010: Challenges in combining projections from multiple climate models. J. Climate, 23, 2739–2758, https://doi.org/10.1175/2009JCLI3361.1.
Kolczynski, W. C., D. R. Stauffer, S. E. Haupt, N. S. Altman, and A. Deng, 2011: Investigation of ensemble variance as a measure of true forecast variance. Mon. Wea. Rev., 139, 3954–3963, https://doi.org/10.1175/MWR-D-10-05081.1.
Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548–1550, https://doi.org/10.1126/science.285.5433.1548.
Lambert, S. J., and G. J. Boer, 2001: CMIP1 evaluation and intercomparison of coupled climate models. Climate Dyn., 17, 83–106, https://doi.org/10.1007/PL00013736.
Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409–418, https://doi.org/10.1175/1520-0493(1974)102<0409:TSOMCF>2.0.CO;2.
Marzban, C., R. Wang, F. Kong, and S. Leyton, 2011: On the effect of correlations on rank histograms: Reliability of temperature and wind speed forecasts from finescale ensemble reforecasts. Mon. Wea. Rev., 139, 295–310, https://doi.org/10.1175/2010MWR3129.1.
McKeen, S., and Coauthors, 2005: Assessment of an ensemble of seven real-time ozone forecasts over eastern North America during the summer of 2004. J. Geophys. Res., 110, D21307, https://doi.org/10.1029/2005JD005858.
Nicolis, C., R. A. P. Perdigao, and S. Vannitsem, 2009: Dynamics of prediction errors under the combined effect of initial condition and model errors. J. Atmos. Sci., 66, 766–778, https://doi.org/10.1175/2008JAS2781.1.
Palmer, T. N., 2000: Predicting uncertainty in forecasts of weather and climate. Rep. Prog. Phys., 63, 71–116, https://doi.org/10.1088/0034-4885/63/2/201.
Pincus, R., C. P. Batstone, R. J. P. Hofmann, K. E. Taylor, and P. J. Glecker, 2008: Evaluating the present-day simulation of clouds, precipitation, and radiation in climate models. J. Geophys. Res., 113, D14209, https://doi.org/10.1029/2007JD009334.
Rougier, J., 2016: Ensemble averaging and mean squared error. J. Climate, 29, 8865–8870, https://doi.org/10.1175/JCLI-D-16-0012.1.
Rougier, J., 2018: Comments on “Ensemble averaging and the curse of dimensionality.” J. Climate, 31, 9015–9016, https://doi.org/10.1175/JCLI-D-18-0274.1.
Sanderson, B. M., and R. Knutti, 2012: On the interpretation of constrained climate model ensembles. Geophys. Res. Lett., 39, L16708, https://doi.org/10.1029/2012GL052665.
Sillmann, J., V. V. Kharin, X. Zhang, F. W. Zwiers, and D. Bronaugh, 2013: Climate extremes indices in the CMIP5 multimodel ensemble: Part 1. Model evaluation in the present climate. J. Geophys. Res. Atmos., 118, 1716–1733, https://doi.org/10.1002/jgrd.50203.
Stensrud, D. J., J.-W. Bao, and T. T. Warner, 2000: Using initial condition and model physics perturbations in short-range ensemble simulations of mesoscale convective systems. Mon. Wea. Rev., 128, 2077–2107, https://doi.org/10.1175/1520-0493(2000)128<2077:UICAMP>2.0.CO;2.
Surcel, M., I. Zawadzki, and M. K. Yau, 2014: On the filtering properties of ensemble averaging for storm-scale precipitation forecasts. Mon. Wea. Rev., 142, 1093–1105, https://doi.org/10.1175/MWR-D-13-00134.1.
Talagrand, M., 1995: Concentration of measure and isoperimetric inequalities in product spaces. Inst. Hautes Études Sci. Publ. Math., 81, 73–205, https://doi.org/10.1007/BF02699376.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319, https://doi.org/10.1175/1520-0493(1997)125<3297:EFANAT>2.0.CO;2.
Toth, Z., O. Talagrand, G. Candille, and Y. Zhu, 2003: Probability and ensemble forecasts. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley & Sons, 137–163.
Tribbia, J. J., and D. P. Baumhefner, 1988: The reliability of improvements in deterministic short-range forecasts in the presence of initial state and modeling deficiencies. Mon. Wea. Rev., 116, 2276–2288, https://doi.org/10.1175/1520-0493(1988)116<2276:TROIID>2.0.CO;2.
van Loon, M., and Coauthors, 2007: Evaluation of long-term ozone simulations from seven regional air quality models and their ensemble. Atmos. Environ., 41, 2083–2097, https://doi.org/10.1016/j.atmosenv.2006.10.073.
Wang, X., and S. S. Shen, 1999: Estimation of spatial degrees of freedom of a climate field. J. Climate, 12, 1280–1291, https://doi.org/10.1175/1520-0442(1999)012<1280:EOSDOF>2.0.CO;2.
Whitaker, J. S., and A. F. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. Mon. Wea. Rev., 126, 3292–3302, https://doi.org/10.1175/1520-0493(1998)126<3292:TRBESA>2.0.CO;2.
Wilks, D. S., 2011: On the reliability of the rank histogram. Mon. Wea. Rev., 139, 311–316, https://doi.org/10.1175/2010MWR3446.1.