1. Introduction
The goal of ensemble forecasting is to predict the probability distribution of the atmospheric state given all sources of forecast uncertainty. Ensemble verification techniques verify either the predicted probabilities of weather events or selected statistical parameters of the predicted probability distribution of the state vector (e.g., Talagrand et al. 1997). In the latter case, the focus is usually on the verification of the first- and second-order moments of the probability distribution. The prediction of the first moment is the ensemble mean forecast, while the predictions of the second-order central moments are the entries of the ensemble covariance (dispersion) matrix. The diagonal entries of this matrix represent the predicted variances of the state vector components, while the off-diagonal entries represent the covariances between pairs of the state vector components. The range of this matrix is the linear (vector) space spanned by the ensemble perturbations, the vectors that represent the differences between the ensemble members and the ensemble mean (e.g., Szunyogh 2014). This vector space can be considered a flow-dependent prediction of the space of forecast uncertainty. The efficiency of an ensemble in capturing the forecast uncertainty can be measured by computing the projection onto that space of the vector that represents the difference between a proxy for the true state, such as an analysis, and the ensemble mean. A number of papers have investigated the performance of operational ensemble forecast systems along these lines (e.g., Molteni and Buizza 1999; Wei and Toth 2003; Buizza et al. 2005; Herrera et al. 2016). Perhaps the most important difference among these studies is how they chose the components of the state vector used in the computation of the verification metrics.
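To make these quantities concrete, the following minimal numpy sketch computes, for a single (local) state vector, the ensemble mean, the ensemble perturbations, the covariance matrix, and the fraction of the analysis-minus-ensemble-mean vector that lies in the space spanned by the perturbations. The function name, array shapes, and the simple fraction-of-squared-difference measure are illustrative assumptions and not the exact verification metrics defined in section 2.

```python
import numpy as np

def local_ensemble_diagnostics(ensemble, analysis):
    """Illustrative first/second-moment and projection diagnostics.

    ensemble : array of shape (K, N), K members by N state-vector components
    analysis : array of shape (N,), proxy for the true state
    """
    K, _ = ensemble.shape

    # Ensemble mean: the prediction of the first moment
    mean = ensemble.mean(axis=0)

    # Ensemble perturbations: differences between the members and the mean
    perts = ensemble - mean                      # shape (K, N)

    # Ensemble covariance (dispersion) matrix: diagonal entries are the
    # predicted variances, off-diagonal entries the predicted covariances
    cov = perts.T @ perts / (K - 1)

    # Difference between the analysis and the ensemble mean
    error = analysis - mean

    # Orthonormal basis of the space spanned by the perturbations
    # (the range of the covariance matrix)
    u, s, _ = np.linalg.svd(perts.T, full_matrices=False)
    basis = u[:, s > s.max() * 1e-10]

    # Fraction of the squared difference that projects onto that space;
    # values near 1 mean the ensemble captures the error pattern well
    captured = np.sum((basis.T @ error) ** 2) / np.sum(error ** 2)

    return mean, cov, captured
```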
This paper presents the latest results of the investigations of our research group into the efficiency of global forecast ensembles in capturing the spatiotemporal evolution of the forecast uncertainty. In these investigations, the diagnostics have been computed for local state vectors, a choice motivated by the data assimilation papers of Ott et al. (2004) and Szunyogh et al. (2005). A local state vector is defined at each grid point of the ensemble dataset. Its components are model gridpoint variables in a highly localized volume of the atmosphere centered at the location associated with the local state vector. The investigations of our group started with an analysis of data generated by a research forecast system based on the model component of the Global Forecast System (GFS) of the National Centers for Environmental Prediction (Kuhl et al. 2007; Satterfield and Szunyogh 2010, 2011). They continued with a recent study by Herrera et al. (2016), in which the diagnostics of the earlier papers were applied to global ensemble forecast data from the world’s leading operational numerical weather prediction centers for January–February 2012.
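As an illustration of the local state vector construction, the sketch below gathers the gridpoint values of a single variable and ensemble member in a small volume centered on a grid point. The grid layout, the single-variable input, and the size of the volume are assumptions made for the example, not the configuration used in the papers cited above.

```python
import numpy as np

def local_state_vector(field, i_lat, j_lon, k_lev, halo=1):
    """Collect gridpoint values in a small volume centered at (i_lat, j_lon, k_lev).

    field : array of shape (n_lev, n_lat, n_lon) for one variable and one member
    halo  : number of grid points included on each side of the center
            (the volume size is an illustrative choice)
    """
    n_lev, n_lat, n_lon = field.shape

    # Vertical and meridional indices are clipped at the domain boundaries
    levs = np.clip(np.arange(k_lev - halo, k_lev + halo + 1), 0, n_lev - 1)
    lats = np.clip(np.arange(i_lat - halo, i_lat + halo + 1), 0, n_lat - 1)
    # Longitude wraps around the globe
    lons = np.arange(j_lon - halo, j_lon + halo + 1) % n_lon

    # The components of the local state vector are the gridpoint values
    # in the localized volume, flattened into a single vector
    block = field[np.ix_(levs, lats, lons)]
    return block.ravel()
```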
The present paper updates the results of Herrera et al. (2016) based on data from January–February 2015, which allows for an assessment of the progress made in the intervening three years. It also documents our first attempt to validate two prognostic relationships found by Satterfield and Szunyogh (2010, 2011) between the raw ensemble forecasts and the actual forecast uncertainty. One of these relationships provides a tool for the routine prediction of the reliability of the ensemble in capturing the uncertain forecast features. The other predicts the 95th percentile value of the forecast error. In what follows, we explain the local diagnostics adapted from Herrera et al. (2016) and the predictive schemes of Satterfield and Szunyogh (2010, 2011) (section 2), describe the operational ensemble data that we analyze (section 3), present the verification results (section 4), and offer our conclusions (section 5).
2. Local diagnostics and predictive schemes
A forecast ensemble samples the flow-dependent multivariate probability distribution of the present and future atmospheric states given the sources of forecast uncertainty. We verify the ensemble-based predictions of the first moment and the second central moments (the mean, variances, and covariances) of the probability distribution of the local atmospheric states.
a. Local state vectors
Let












b. Local ensemble perturbations















c. Diagnostics





1) Bias

























2) Variance

























The two sides of Eq. (17) can be estimated by computing averages of
3) Covariance



























In practice,
d. The predictive linear relationships
The ensemble dimension (E dimension) is a measure of the steepness of the eigenvalue spectrum of
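Although its defining formula is omitted above, the E dimension of Patil et al. (2001) and Oczkowski et al. (2005) can be computed directly from the singular values of the matrix of local ensemble perturbations (equivalently, from the eigenvalues of the local ensemble covariance matrix). The sketch below uses that standard definition; any norm-based scaling of the state-vector components is omitted here.

```python
import numpy as np

def e_dimension(local_perts):
    """E dimension of a local ensemble (standard definition of
    Patil et al. 2001; Oczkowski et al. 2005).

    local_perts : array of shape (K, N) of local ensemble perturbations.
    Returns a value between 1 (all variance in a single direction) and the
    number of linearly independent perturbations (at most K - 1 when the
    perturbations are taken about the ensemble mean).
    """
    # Singular values of the perturbation matrix; their squares are
    # proportional to the eigenvalues of the local covariance matrix
    s = np.linalg.svd(local_perts, compute_uv=False)
    return s.sum() ** 2 / np.sum(s ** 2)
```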
1) The minimum explained variance










2) The 95th percentile value of the forecast error






3. The forecast and verification data
We examine ensemble forecast data that the forecast centers distribute through the THORPEX Interactive Grand Global Ensemble (TIGGE).
a. TIGGE
TIGGE includes operational global model forecasts from 10 major numerical weather prediction centers (Bougeault et al. 2010; Swinbank et al. 2016), but we process forecast data only from the
European Centre for Medium-Range Weather Forecasts (ECMWF),
National Centers for Environmental Prediction (NCEP),
Met Office (UKMO),
Japan Meteorological Agency (JMA),
Korea Meteorological Administration (KMA), and
Meteorological Service of Canada [Canadian Meteorological Centre (CMC)].
Table 1. The main parameters of the investigated ensemble forecast systems.


KMA has moved from using bred vector initial condition perturbations to using perturbations generated by an ensemble transform Kalman filter (ETKF). It also now uses a combination of stochastic kinetic energy backscatter (SKEB) and random parameter (RP) schemes to simulate the effects of random model errors. UKMO also switched to a combination of SKEB and RP schemes, reduced the number of ensemble members from 14 to 11, and shortened the maximum forecast time from 360 to 168 h. These changes to the UKMO ensemble were made as part of the Met Office’s strategy to wind down its focus on week 2 forecasting (R. Swinbank 2014, personal communication). JMA has increased the frequency of its ensemble forecasts from once to twice daily and extended the maximum forecast time from 216 to 264 h.
b. The components of the local state vector
Following Satterfield and Szunyogh (2010, 2011) and Herrera et al. (2016), we choose
In our diagnostic calculations,
4. Results
We present results first for the evolution of
a. Comparison of VS, TV, and TVS
We compute
The evolution of

Fig. 1. Temporal evolution of the diagnostics
The results for NCEP are shown in the right panel of Fig. 1. For this ensemble,
The results on the evolution of

Fig. 2. Temporal evolution of the diagnostics
A unique aspect of the evolution of
b. The evolution of M2
Figures 1 and 2 show a slow but general growing trend of the bias

Fig. 3. Spatial distribution of the bias. Figure 3 can be directly compared to Fig. 10 in Herrera et al. (2016).

Fig. 4. Spaghetti diagrams for the temporal mean of the ECMWF ensemble for January and February 2015. Shown are the diagrams for the 5640-gpm isohypse at 500 hPa at (top left) analysis time and forecast lead times of (top right) 72 h, (bottom left) 120 h, and (bottom right) 360 h. The gray contour lines show the temporal mean of the ensemble members, while the black contour line shows the temporal mean of the ECMWF analyses.

Fig. 5. Spaghetti diagrams for the temporal mean of the 360-h (top) ECMWF, (middle) NCEP, and (bottom) CMC ensemble forecasts for January and February 2015. Shown by gray contour lines is the 5640-gpm isohypse at 500 hPa for the ensemble members and by black contour lines for the ECMWF analyses. (Results are not shown for the remaining ensembles, because they do not provide forecasts at 360-h lead time.)
While the qualitative patterns of behavior of

Fig. 6. Zonal anomalies of the time-mean flow for January and February of (left) 2012 and (right) 2015. Shown by color shading are the zonal anomalies (gpm), and by black contours the temporal mean of the geopotential height at 500 hPa. The heavy dashed line marks the southern boundary of the verification region (30°N). The computation of the time-mean flow is based on ECMWF analyses.
The positive anomalies (ridges) are stronger in the North Atlantic region and over eastern Europe in 2012, as well as in the northeast Pacific region in 2015. The region where
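For clarity, the zonal anomalies shown in the figure above are simply departures of the time-mean 500-hPa geopotential height from its zonal mean. A minimal sketch follows; the array shape and names are assumptions.

```python
import numpy as np

def zonal_anomaly_of_time_mean(z500):
    """Zonal anomalies of the time-mean flow.

    z500 : array of shape (n_time, n_lat, n_lon), 500-hPa geopotential height (gpm)
    """
    time_mean = z500.mean(axis=0)                        # time-mean flow
    zonal_mean = time_mean.mean(axis=-1, keepdims=True)  # average along each latitude circle
    return time_mean - zonal_mean                        # departure from the zonal mean
```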
c. Results for the predictive linear relationships
The explained variance

Fig. 7. Spatial distribution of the temporal mean of
Kuhl et al. (2007) and Satterfield and Szunyogh (2010) observed a negative correlation between
As was done in the previous studies, we prepare estimates of the joint probability distribution function (JPDF) of the explained variance and the E dimension. To obtain estimates of the JPDF, we compute the relative frequencies of the values of the explained variance and the E dimension for discrete bins of the values for all locations
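A minimal sketch of this relative-frequency (two-dimensional histogram) estimate of the JPDF follows, using the bin increments quoted in the figure caption (0.25 for the E dimension and 0.005 for the explained variance); the array names and the use of numpy are assumptions.

```python
import numpy as np

def joint_pdf(e_dim, expl_var, de=0.25, dv=0.005):
    """Estimate the JPDF of the E dimension and the explained variance from
    their values at all locations and times, as relative frequencies over
    discrete bins (bin widths taken from the figure caption)."""
    e_edges = np.arange(0.0, e_dim.max() + de, de)
    v_edges = np.arange(0.0, 1.0 + dv, dv)
    counts, _, _ = np.histogram2d(e_dim.ravel(), expl_var.ravel(),
                                  bins=[e_edges, v_edges])
    # Normalize the counts to relative frequencies
    return counts / counts.sum(), e_edges, v_edges
```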

Fig. 8. The joint probability distribution of the E dimension and the explained variance for the ECMWF ensemble in the NH extratropics at (top left) the analysis time, and forecast times of (top right) 72 h, (bottom left) 120 h, and (bottom right) 360 h. The bin increments are 0.25 for the E dimension and 0.005 for the explained variance. The maximum possible value of the E dimension is 24 for the analysis and 49 for the forecasts.
To find an approximate quantitative relationship between the explained variance and the E dimension, we fit a function of the form of Eq. (21) to the data pairs at each lead time. For the function fitting, we divide the data pairs randomly into training and test datasets: 75% of the data points are assigned to the training dataset and the remaining 25% to the test dataset. The data are ordered by the value of the E dimension and divided into 100 bins containing equal numbers of data points, separately for the training and the test datasets. For each bin, we calculate the mean of the E dimension and the minimum value of the explained variance, and we fit a linear regression to these binned values from the training dataset. The linear regression provides the estimates of the parameters a and b. We use these values of a and b to predict the minimum of the explained variance in the test dataset from the corresponding values of the E dimension.
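The fitting procedure just described can be summarized in a short sketch. The variable names, the use of numpy, and the choice of ordinary least squares are implementation assumptions not stated in the text.

```python
import numpy as np

def binned_linear_fit(predictor, target, bin_stat, n_bins=100,
                      train_frac=0.75, rng=None):
    """Fit bin_stat(target) ~ a * predictor + b on binned training data and
    report the squared correlation of the prediction on binned test data.

    Sketch of the procedure in the text: a random 75/25 split, ordering by
    the predictor, 100 equal-population bins, the bin mean of the predictor
    paired with a bin statistic of the target, and an OLS fit for a and b."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = rng.permutation(predictor.size)
    n_train = int(train_frac * predictor.size)
    train, test = idx[:n_train], idx[n_train:]

    def binned(ix):
        order = ix[np.argsort(predictor[ix])]          # order by the predictor
        chunks = np.array_split(order, n_bins)         # equal-population bins
        x = np.array([predictor[c].mean() for c in chunks])
        y = np.array([bin_stat(target[c]) for c in chunks])
        return x, y

    x_tr, y_tr = binned(train)
    a, b = np.polyfit(x_tr, y_tr, 1)                   # slope a and intercept b
    x_te, y_te = binned(test)
    r2 = np.corrcoef(y_te, a * x_te + b)[0, 1] ** 2    # squared correlation on test bins
    return a, b, r2

# Minimum explained variance as a linear function of the E dimension (Eq. (21)):
#   a, b, r2 = binned_linear_fit(e_dim, expl_var, bin_stat=np.min)
# The 5th-percentile variant discussed below replaces np.min with
#   lambda v: np.percentile(v, 5)
```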
The squared correlation values

Fig. 9. Graphical illustration of the relationship between the E dimension and the minimum of the explained variance in the NH extratropics. The training data are represented by triangles, the test data by open circles, and the fitted linear regression function is shown by the straight black line. The test data would fall on the straight line if the linear model was perfect. Shown are the distributions for (top left) the analysis time, and forecast times of (top right) 72 h, (bottom left) 120 h, and (bottom right) 360 h. The legends show the average correlation values for the training dataset (
Satterfield and Szunyogh (2010) speculated that outliers were the likely cause of a general overprediction of the minimum explained variance. This motivates us to investigate the correlation between the E dimension and the 5th percentile value of the explained variance rather than its minimum. The results are shown in Fig. 10, with the corresponding average values of

Fig. 10. As in Fig. 9, but for the relationship between the E dimension and the 5th percentile of the explained variance in the NH extratropics.
Next, we investigate whether or not Eq. (22) holds for the ECMWF ensemble. The training and test datasets are constructed in the same way as described for the estimation of the parameters a and b in Eq. (21). The available data are divided into 100 bins of equal number of
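Under the assumption that Eq. (22) is likewise linear, now relating a spread-based predictor to the 95th percentile value of the forecast error magnitude, the binned_linear_fit helper sketched earlier in this subsection can be reused. The predictor and error-magnitude names below are illustrative, since the quantities entering Eq. (22) are defined in the omitted equations.

```python
import numpy as np

# Reuses binned_linear_fit from the earlier sketch. 'spread' is an
# ensemble-spread predictor and 'error_mag' the magnitude of the error of
# the ensemble-mean forecast, both 1D arrays over all locations and times
# (illustrative names; the exact quantities are defined by Eq. (22)).
a, b, r2 = binned_linear_fit(spread, error_mag,
                             bin_stat=lambda v: np.percentile(v, 95))
print(f"95th percentile of error ~ {a:.2f} * spread + {b:.2f} (test r^2 = {r2:.2f})")
```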

Fig. 11. As in Fig. 9, but for the relationship between
5. Conclusions
We carried out diagnostic calculations to identify new directions for research and development to improve the design of ensemble forecast systems and the interpretation of the already available ensemble forecast information. The two main specific objectives of the paper were to update the results of Herrera et al. (2016) based on more recent operational ensemble forecast data and to validate two predictive linear relationships found by Satterfield and Szunyogh (2010, 2011) for a research ensemble.
Our main findings regarding the performance of the operational ensemble forecast systems are as follows:
The main characteristics of the ensemble systems of the different centers have not changed significantly between 2012 and 2015. The only exception is the UKMO ensemble, which was redesigned in 2014, shifting its focus from week 2 to week 1 predictions. The performance of this ensemble improved in predicting the magnitude of the forecast uncertainty but degraded in predicting the patterns of the forecast uncertainty.
With respect to the performance measures of the present study, the ECMWF ensemble continues to provide the highest quality forecasts.
All ensembles have major difficulties with predicting the large-scale atmospheric flow in the long (longer than 240 h) forecast range. These difficulties are due to the inability of the ensemble members to maintain the large-scale waves in the forecasts, which presents a stumbling block to extending the skill of numerical weather forecasts into the subseasonal range.
The results show that the current technology of ensemble forecasting can provide reliable prediction of the first and second central moments of the probability distribution of the atmospheric state in the medium forecast range (from about 48–72 to 240 h). The ensembles, however, have serious difficulties at the shorter and longer forecast times. At the shorter forecast times, they cannot capture all uncertain forecast features (
It is somewhat disappointing to see that the increasing level of integration of the ensemble forecasting and data assimilation systems has not yet resulted in better short-range ensemble performance. This result shows the need for continued efforts to improve the ensemble generation techniques. Closing the gap between
The poor performance of the ensembles in maintaining the large-scale variability of the long-range forecasts is most likely due to deficiencies of the models rather than to shortcomings of the ensemble generation techniques. This may be the result of inadequate representation of tropical processes (e.g., MJO), atmosphere–ocean interactions, stratospheric variability, etc., in the models. The ensemble-based diagnostics of the present study can be used in the future to assess the forecast effects of improvements in the model.
The predictive relationships of Satterfield and Szunyogh (2010, 2011) hold well for the operational ensembles and could be utilized for the routine operational prediction of
the reliability of ensemble forecasts in capturing the local structure of the forecast uncertainty and
a near-worst-case scenario (95th percentile value) of the forecast error.
While it is widely accepted that ensembles can be used for the prediction of the forecast uncertainty, it is rarely recognized that the performance of the ensembles is also flow dependent. The first relationship provides a tool to predict this flow dependence. The second relationship can provide particularly valuable information for situations in which the ensemble spread is large. In such situations, the magnitude of the forecast error can vary across a wide range. The relationship provides a forecast of the top of that range. We also verified the two relationships for the other ensembles used in this study. The results (not shown) were similar to those for the ECMWF ensemble. We note that while we do not have a theoretical explanation for the second relationship, there is hope that the latest theoretical developments (Van Schaeybroeck and Vannitsem 2016) will lead to such an explanation.
Acknowledgments
This study was supported by the National Science Foundation (Grant ATM-AGS-1237613). We thank the three anonymous reviewers for their careful reading of our manuscript and their many helpful suggestions. Their comments helped us greatly improve the presentation of our ideas and results.
REFERENCES
Bougeault, P., and Coauthors, 2010: The THORPEX Interactive Grand Global Ensemble (TIGGE). Bull. Amer. Meteor. Soc., 91, 1059–1072, doi:10.1175/2010BAMS2853.1.
Buizza, R., J. Tribbia, F. Molteni, and T. Palmer, 1993: Computation of optimal unstable structures for a numerical weather prediction model. Tellus, 45, 388–407, doi:10.1034/j.1600-0870.1993.t01-4-00005.x.
Buizza, R., P. L. Houtekamer, Z. Toth, G. Pellerin, M. Wei, and Y. Zhu, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Wea. Rev., 133, 1076–1097, doi:10.1175/MWR2905.1.
Herrera, M. A., I. Szunyogh, and J. Tribbia, 2016: Forecast uncertainty dynamics in the THORPEX Interactive Grand Global Ensemble (TIGGE). Mon. Wea. Rev., 144, 2739–2766, doi:10.1175/MWR-D-15-0293.1.
Kuhl, D., and Coauthors, 2007: Assessing predictability with a local ensemble Kalman filter. J. Atmos. Sci., 64, 1116–1140, doi:10.1175/JAS3885.1.
Molteni, F., and R. Buizza, 1999: Validation of the ECMWF Ensemble Prediction System using empirical orthogonal functions. Mon. Wea. Rev., 127, 2346–2358, doi:10.1175/1520-0493(1999)127<2346:VOTEEP>2.0.CO;2.
Oczkowski, M., I. Szunyogh, and D. J. Patil, 2005: Mechanisms for the development of locally low-dimensional atmospheric dynamics. J. Atmos. Sci., 62, 1135–1156, doi:10.1175/JAS3403.1.
Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428, doi:10.1111/j.1600-0870.2004.00076.x.
Patil, D. J., B. R. Hunt, E. Kalnay, J. A. Yorke, and E. Ott, 2001: Local low-dimensionality of atmospheric dynamics. Phys. Rev. Lett., 86, 5878–5881, doi:10.1103/PhysRevLett.86.5878.
Satterfield, E., and I. Szunyogh, 2010: Predictability of the performance of an ensemble forecast system: Predictability of the space of uncertainties. Mon. Wea. Rev., 138, 962–981, doi:10.1175/2009MWR3049.1.
Satterfield, E., and I. Szunyogh, 2011: Assessing the performance of an ensemble forecast system in predicting the magnitude and the spectrum of analysis and forecast uncertainties. Mon. Wea. Rev., 139, 1207–1223, doi:10.1175/2010MWR3439.1.
Swinbank, R., and Coauthors, 2016: The TIGGE project and its achievements. Bull. Amer. Meteor. Soc., 97, 49–67, doi:10.1175/BAMS-D-13-00191.1.
Szunyogh, I., 2014: Applicable Atmospheric Dynamics: Techniques for the Exploration of Atmospheric Dynamics. 1st ed. World Scientific, 588 pp.
Szunyogh, I., E. Kostelich, G. Gyarmati, D. Patil, B. Hunt, E. Kalnay, E. Ott, and J. Yorke, 2005: Assessing a local ensemble Kalman filter: Perfect model experiments with the NCEP global model. Tellus, 57A, 528–545, doi:10.1111/j.1600-0870.2005.00136.x.
Talagrand, O., 1981: A study of the dynamics of four-dimensional data assimilation. Tellus, 33, 43–60, doi:10.1111/j.2153-3490.1981.tb01729.x.
Talagrand, O., R. Vautard, and B. Strauss, 1997: Evaluation of probabilistic prediction systems. Proc. Workshop on Predictability, Reading, United Kingdom, ECMWF, 1–25.
Van Schaeybroeck, B., and S. Vannitsem, 2016: A probabilistic approach to forecast the uncertainty with ensemble spread. Mon. Wea. Rev., 144, 451–468, doi:10.1175/MWR-D-14-00312.1.
Wei, M., and Z. Toth, 2003: A new measure of the ensemble performance: Perturbation versus error correlation analysis (PECA). Mon. Wea. Rev., 131, 1549–1565, doi:10.1175//1520-0493(2003)131<1549:ANMOEP>2.0.CO;2.