In high-dimensional spaces, random vectors drawn from the same distribution have almost the same length, and independent vectors are almost always orthogonal. In Christiansen (2018a) I used these properties to explain the ubiquitous observation that the ensemble mean very often has a smaller error than all individual ensemble members when compared to observations. I also explained why the error of the ensemble mean is often about 30% smaller than the typical error of the ensemble members. This behavior is found not only in the validation of climate models but also in ensemble weather forecasts.
Assume the simple statistical model in which the observations $\mathbf{o}$ and the K ensemble members $\mathbf{x}_k$, $k = 1, \ldots, K$, are drawn from the same N-dimensional distribution (N ≫ 1). The properties of high-dimensional spaces then ensure that the ensemble mean, the observations, and an ensemble member form an isosceles right triangle with the ensemble mean at the right angle. The situation is illustrated in Fig. 1. Because the two legs, from the ensemble mean to the observations and from the ensemble mean to the ensemble member, are orthogonal and of equal length, the hypotenuse (the error of the ensemble member) is $\sqrt{2}$ times longer than the error of the ensemble mean. The error of the ensemble mean is therefore about $1/\sqrt{2} \approx 0.71$ times, i.e., about 30% smaller than, the error of the individual ensemble members. High dimensions are encountered when, for example, the error is measured as the root-mean-square over an extended spatial region. Apart from the high dimensions, the explanation also requires that the number of ensemble members is large, so that the ensemble mean is located near the center of the distribution.
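The geometry can be checked with a small Monte Carlo sketch, assuming (for illustration only) that every component of the observations and of the ensemble members is an independent standard Gaussian draw. The function `triangle_demo` and its default sizes are hypothetical choices, not the setup of Christiansen (2018a). The ratio of the ensemble-mean error to a member's error should come out close to 1/√2 (slightly above it, because a finite K-member mean is not exactly at the center), and the angle at the ensemble mean close to 90°.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def sub(u, v):
    """Component-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def triangle_demo(n_dim=5000, n_members=50, seed=1):
    """Assumed model: every component of the observations and members is iid N(0, 1).

    Returns the ratio (ensemble-mean error) / (error of member 0) and the angle,
    in degrees, at the ensemble mean between the observations and member 0.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(n_dim)]

    obs = draw()
    members = [draw() for _ in range(n_members)]
    mean = [sum(col) / n_members for col in zip(*members)]

    ratio = norm(sub(mean, obs)) / norm(sub(members[0], obs))
    to_obs = sub(obs, mean)          # leg from ensemble mean to observations
    to_member = sub(members[0], mean)  # leg from ensemble mean to member 0
    cos_angle = dot(to_obs, to_member) / (norm(to_obs) * norm(to_member))
    return ratio, math.degrees(math.acos(cos_angle))


ratio, angle = triangle_demo()
print(f"error ratio {ratio:.3f} vs 1/sqrt(2) = {1 / math.sqrt(2):.3f}")
print(f"angle at the ensemble mean {angle:.1f} degrees")
```

Increasing `n_dim` tightens both numbers around their limits; decreasing it toward a few tens makes the triangle visibly irregular.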
In his comment, Rougier (2018) claims that there is a logical flaw in the argument above because the ensemble mean is not independent of the individual ensemble members. Actually, the argument does not require that the ensemble mean be orthogonal to the individual ensemble members; it only requires that the ensemble mean is near the center. This is ensured by the law of large numbers, as also discussed in Christiansen (2018a). In any case, as the number of ensemble members increases, the contribution of each individual ensemble member to the ensemble mean vanishes. For example, considering the covariance between an ensemble member $\mathbf{x}_k$ and the ensemble mean $\bar{\mathbf{x}} = \frac{1}{K}\sum_{j=1}^{K}\mathbf{x}_j$, we get $\mathbf{x}_k\cdot\bar{\mathbf{x}} = \frac{1}{K}\sum_{j=1}^{K}\mathbf{x}_k\cdot\mathbf{x}_j \approx \frac{1}{K}\,\mathbf{x}_k\cdot\mathbf{x}_k$ for large N. This term is a factor of K smaller than the comparable terms $\mathbf{x}_k\cdot\mathbf{x}_k$. Also, the correlation between $\mathbf{x}_k$ and $\bar{\mathbf{x}}$ becomes $1/\sqrt{K}$. Likewise, Rougier (2018) claims that the errors are correlated. But the central point of the argument is that, with the observations $\mathbf{o}$, in the limit of large N we have $\|\mathbf{x}_k - \mathbf{o}\|^2 = \|\mathbf{x}_k\|^2 + \|\mathbf{o}\|^2 - 2\,\mathbf{x}_k\cdot\mathbf{o} \approx 2\|\mathbf{o}\|^2$, as the cross-term vanishes because of the orthogonality and because the observations and the ensemble members have the same lengths. In the same way we get $\|\bar{\mathbf{x}} - \mathbf{o}\|^2 \approx \|\mathbf{o}\|^2$ when the ensemble mean is near the center. In other words, it is the orthogonality that separates the joint distribution into the product of the marginal distributions.
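Under the same assumed model of independent standard Gaussian components, the 1/K and 1/√K scalings of the dependence between a member and the ensemble mean can be checked numerically. The sketch below (hypothetical function `member_mean_stats`, arbitrary sizes) computes the inner product of one member with the ensemble mean relative to that member's squared length, and the cosine (uncentered correlation) between the two vectors.

```python
import math
import random


def member_mean_stats(n_dim=4000, n_members=25, seed=2):
    """Assumed model: all member components iid N(0, 1).

    Returns (x_k . xbar) / (x_k . x_k) and the cosine between x_k and xbar
    for member k = 0; for n_dim >> n_members these approach 1/K and 1/sqrt(K).
    """
    rng = random.Random(seed)
    members = [[rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(n_members)]
    mean = [sum(col) / n_members for col in zip(*members)]
    x_k = members[0]

    dot_k_mean = sum(a * b for a, b in zip(x_k, mean))
    dot_k_k = sum(a * a for a in x_k)
    dot_mean_mean = sum(b * b for b in mean)
    return dot_k_mean / dot_k_k, dot_k_mean / math.sqrt(dot_k_k * dot_mean_mean)


K = 25
frac, cosine = member_mean_stats(n_members=K)
print(f"x_k.xbar / x_k.x_k = {frac:.3f}  (1/K = {1 / K:.3f})")
print(f"cosine(x_k, xbar)  = {cosine:.3f}  (1/sqrt(K) = {1 / math.sqrt(K):.3f})")
```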
The arguments up to this point are simple and flawless, and, if there is any doubt, the results above can easily be confirmed with numerical experiments, as was done in Christiansen (2018a).
Rougier (2018) also mentions that some ensemble members are sometimes observed to have smaller errors than the ensemble mean, and that this disagrees with the theory above. But the theory was derived in the limit $N \to \infty$. Christiansen (2018a) did include a numerical analysis showing that for smaller N the ensemble mean will often, but not always, be better than the individual members (Figs. 3 and 5 in Christiansen 2018a). We can relax the requirement on N and still obtain analytical results. In a new paper (Christiansen 2018b, manuscript submitted to Mon. Wea. Rev.) I show that for large N the fraction of ensemble members better than the ensemble mean (i.e., the normalized rank of the ensemble mean) is
where Φ is the cumulative distribution function of the standard Gaussian. As mentioned in that paper, I have confirmed numerically that this is a very good approximation even for small N. Also, in good agreement with the results in Christiansen (2018a), we get normalized ranks of …, 0.005, and 0.07 for N = …, 30, and 10, respectively.
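A brute-force Monte Carlo estimate of the normalized rank can be sketched as follows. It does not implement the analytical approximation of Christiansen (2018b); it only counts, under the assumed model of independent standard Gaussian components, how often a member has a smaller squared error than the ensemble mean. The function name `normalized_rank`, the ensemble size K = 20, and the number of trials are hypothetical illustrative choices, so the estimates need not coincide exactly with the values quoted above, but they should show the rank shrinking rapidly as N grows.

```python
import random


def normalized_rank(n_dim, n_members=20, n_trials=1000, seed=3):
    """Monte Carlo estimate of the fraction of members with a smaller squared
    error than the ensemble mean.

    Assumed model: observations and members are independent draws with iid
    N(0, 1) components in n_dim dimensions.
    """
    rng = random.Random(seed)
    better = 0
    for _ in range(n_trials):
        obs = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        members = [[rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(n_members)]
        mean = [sum(col) / n_members for col in zip(*members)]
        err_mean = sum((m - o) ** 2 for m, o in zip(mean, obs))
        # Count members that beat the ensemble mean in this trial.
        for x in members:
            if sum((a - o) ** 2 for a, o in zip(x, obs)) < err_mean:
                better += 1
    return better / (n_trials * n_members)


for n in (10, 30):
    print(f"N = {n:2d}: estimated normalized rank {normalized_rank(n):.4f}")
```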
Finally, Rougier (2018) questions the applicability of the arguments to the ensemble of climate models. It is true that there are reasons to believe that the simple statistical model above is not fully representative of the ensemble of climate models. This model is often described as the "indistinguishable" interpretation and requires that the ensemble members and the observations are exchangeable. However, Christiansen (2018a) did test this with 17 CMIP5 models by successively swapping the observations and a selected climate model. For each swap I again found that the ensemble mean was better than almost all individual ensemble members and that it was around 30% better than the typical ensemble member. It is also easy to test the relevant properties of the ensemble directly. The lengths of the ensemble members range from 390 to 666 K, with a mean of 514 K.¹ For the observations we get 419 K and for the ensemble mean 25 K. For the angles between the ensemble members we get …, and for the angles between the ensemble members and the observations we get …. This demonstrates that we are very close to the situation illustrated in Fig. 1.
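The length and angle diagnostics quoted above are straightforward to compute. The sketch assumes that each field has already been flattened into a single vector (grid points × calendar months, as in the footnote) and that "centered to their common mean" means subtracting the average over all members and the observations. The function names are hypothetical, and the random fields in the usage lines are synthetic stand-ins, not CMIP5 or observational data.

```python
import math
import random


def length(v):
    """Euclidean length of a flattened field."""
    return math.sqrt(sum(a * a for a in v))


def angle_deg(u, v):
    """Angle between two vectors in degrees (cosine clipped against round-off)."""
    cos = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def ensemble_summary(members, obs):
    """Lengths and mean angles after centering on the common mean of all K + 1 fields.

    Assumption: 'common mean' is the average over the members and the observations.
    """
    fields = members + [obs]
    common = [sum(col) / len(fields) for col in zip(*fields)]
    members = [[a - c for a, c in zip(m, common)] for m in members]
    obs = [a - c for a, c in zip(obs, common)]
    k = len(members)
    mean = [sum(col) / k for col in zip(*members)]

    lengths = [length(m) for m in members]
    pair_angles = [angle_deg(members[i], members[j]) for i in range(k) for j in range(i + 1, k)]
    obs_angles = [angle_deg(m, obs) for m in members]
    return {
        "member_length_min_max_mean": (min(lengths), max(lengths), sum(lengths) / k),
        "obs_length": length(obs),
        "mean_length": length(mean),
        "mean_angle_between_members": sum(pair_angles) / len(pair_angles),
        "mean_angle_members_obs": sum(obs_angles) / k,
    }


# Synthetic stand-in fields (hypothetical data, K = 17 as in the CMIP5 test).
rng = random.Random(5)
n_dim, k = 2000, 17
members = [[rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(k)]
obs = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
for key, value in ensemble_summary(members, obs).items():
    print(key, value)
```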
Of course, the simple statistical model may not be correct from all perspectives, and I did mention in Christiansen (2018a) that the swapping experiment may lack power as a test of the statistical model. But, as the aphorism goes, "all models are wrong but some are useful," and the explanation given in Christiansen (2018a) does succeed in explaining the ubiquitous observations regarding the ensemble mean in a very simple way.
In Christiansen (2018a) the simple statistical model was extended to allow for bias and for different variances of the observations and the ensemble members. I noted that the results regarding the ensemble mean are robust and change only a little for small biases and small differences between the variances. In Christiansen (2018b, manuscript submitted to Mon. Wea. Rev.) I analyze weather forecasts using this extended model. Because the bias and the variances depend on lead time, so does the behavior of the ensemble mean. For short lead times we are in the "truth plus error" interpretation, where the ensemble members are sampled from a distribution centered on the observations. For long lead times we approach the "indistinguishable" interpretation, where the ensemble members and the observations are all considered exchangeable. Again, I find excellent agreement with the analytical results of the extended statistical model in high dimensions.
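The contrast between the two limiting interpretations can be illustrated with a minimal sketch. It deliberately leaves out bias and variance differences, so it is not the extended model of Christiansen (2018a, 2018b); it assumes unit-variance Gaussian components and uses a hypothetical function `error_ratio`. In the "truth plus error" case the ensemble-mean error relative to a typical member falls like 1/√K, whereas in the "indistinguishable" case it tends to 1/√2 (about 30% smaller) for large K.

```python
import math
import random


def error_ratio(interpretation, n_dim=2000, n_members=20, seed=6):
    """Ratio of the ensemble-mean error to the average member error.

    'truth_plus_error': each member is the observations plus iid N(0, 1) noise.
    'indistinguishable': members and observations are independent iid N(0, 1) draws.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(n_dim)]

    obs = draw()
    if interpretation == "truth_plus_error":
        members = [[o + e for o, e in zip(obs, draw())] for _ in range(n_members)]
    elif interpretation == "indistinguishable":
        members = [draw() for _ in range(n_members)]
    else:
        raise ValueError(f"unknown interpretation: {interpretation}")

    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    mean = [sum(col) / n_members for col in zip(*members)]
    typical_member_error = sum(dist(x, obs) for x in members) / n_members
    return dist(mean, obs) / typical_member_error


K = 20
print("truth plus error :", round(error_ratio("truth_plus_error", n_members=K), 3),
      "vs 1/sqrt(K) =", round(1 / math.sqrt(K), 3))
print("indistinguishable:", round(error_ratio("indistinguishable", n_members=K), 3),
      "vs 1/sqrt(2) =", round(1 / math.sqrt(2), 3))
```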
This work is supported by the NordForsk-funded Nordic Centre of Excellence project (Award 76654) Arctic Climate Predictions: Pathways to Resilient, Sustainable Societies (ARCPATH).
The original article that was the subject of this comment/reply can be found at http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-17-0197.1.
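We are looking at the monthly climatology of near-surface temperature. The length is calculated in the space spanned by all 10 512 grid points and the 12 calendar months, i.e., N = 10 512 × 12 = 126 144. The ensemble and the observations have been centered to their common mean. Since a vector in the simple statistical model has a length of approximately $\sigma\sqrt{N}$, where σ is the standard deviation of each component, a length of 500 K corresponds to $\sigma \approx 500/\sqrt{126\,144} \approx 1.4$ K in the simple statistical model.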