1. Introduction
Global circulation models are the main tools used to simulate future climate conditions. There are two main practices for initializing these models, corresponding to predictions on two different time scales. The first practice corresponds to long-term climate projections. In this type of simulation, the climate models are initialized in the preindustrial era (also known as uninitialized runs) and integrated forward in time (usually until 2100). In these simulations, the atmospheric composition in the past is set according to observations, while for the future, several representative concentration pathways (Moss et al. 2008), corresponding to different scenarios of atmospheric composition changes, are used. These climate simulations are expected to provide information about the response of the climate system to different emission scenarios by predicting the changes in the long-term averages (10 yr and more) and the statistics of climate variables under different atmospheric composition scenarios (Collins et al. 2013).
The second practice, which is considered in this work, is near-term (decadal) climate predictions intended to provide information on the dynamics of the climate system in time scales shorter than those of significant changes in the atmospheric concentration and the response time of the climate system to such changes. In this practice, the climate models are initialized with observed conditions close to the prediction period. The expected information from these simulations is the dynamics of the monthly to decadal averages of climate variables (Collins 2007; Meehl et al. 2009, 2014; Warner 2011), which is of great importance for climate services (Cane 2010). Recent studies have demonstrated a potential decadal prediction skill in different regions and for different physical processes (Smith et al. 2007; Keenlyside et al. 2008; Meehl et al. 2009, 2014; Pohlmann et al. 2009).
Despite their relatively short term, decadal climate predictions are still accompanied by large uncertainties, and new methods to improve the predictions and reduce the associated uncertainties are of great interest. One of the main approaches to improving climate predictions is to combine the output from an ensemble of climate models. This approach has two known advantages compared with single-model predictions. First, it was shown that the ensemble average generates improved predictions (Doblas-Reyes et al. 2000, 2003; Hagedorn et al. 2005; Palmer et al. 2004, 2000; Kim et al. 2012); second, the distribution of the ensemble member predictions can provide an estimate of the uncertainties. However, the simple average of climate simulations does not account for the quality differences between the ensemble members; therefore, it is expected that weighting the ensemble members based on their past performances will increase the forecast skill.
Uncertainties in climate predictions can be attributed to three main sources. The first is internal variability: that is, uncertainties due to different initial conditions (either different initialization times or different initialization methods) that were used to run a specific model. The second source is model uncertainties due to different predictions of different models. The third source is forcing scenario uncertainties due to different scenarios assumed for the future atmospheric composition (Hawkins and Sutton 2009). The contribution of these sources to the total uncertainty of the climate system varies with the prediction lead time and is also spatially, seasonally, and averaging-period dependent (Strobach and Bel 2015b). It was shown that, for global and regional decadal climate predictions, scenario uncertainties are negligible compared to the first two sources (Hawkins and Sutton 2009; Cox and Stephenson 2007).
There are two contributions to the internal variability: variability due to different starting conditions and variability due to different initialization methods. Uncertainties due to different starting conditions stem from the chaotic nature of the simulated climate dynamics and cannot be reduced using the ensemble approach. However, uncertainties due to different initialization methods and the model variability can be reduced by weighting the members of the ensemble. The total reduction of the uncertainty depends on the relative contribution of these sources to the total uncertainty.
Bayesian inference is one of the methods that have been used in the past to weight an ensemble of climate models. The core of this method is the calculation of the posterior density, which is proportional to the product of the prior and the likelihood. The Bayesian method fits the probability density function (PDF) of the climate variable to the PDF of the data during a learning period and uses the fitted PDF for future predictions. It does not assign weights to the climate models; instead, it provides an estimate of the PDF of the predicted climate variable. Bayesian inference has been used extensively for projections of future climate (Buser et al. 2009, 2010; Smith et al. 2009; Tebaldi et al. 2005; Tebaldi and Knutti 2007; Furrer et al. 2007; Greene et al. 2006; Murphy et al. 2004; Räisänen et al. 2010) and also for near-term climate predictions (Rajagopalan et al. 2002; Robertson et al. 2004). The use of Bayesian inference has reduced the uncertainties of climate projections and improved near-term predictions. However, this method relies on many assumptions regarding the distribution of the climate variables that are not always valid, making the Bayesian inference subjective and variable dependent.
A second, and more common, method that has been used to improve climate predictions is linear regression (Feng et al. 2011; Chakraborty and Krishnamurti 2009; Doblas-Reyes et al. 2005; Fraedrich and Smith 1989; Kharin and Zwiers 2002; Krishnamurti 1999; Krishnamurti et al. 2000; Pavan and Doblas-Reyes 2000; Peña and van den Dool 2008; Peng et al. 2002; Yun et al. 2005, 2003). The linear regression method does not assign weights to the ensemble members but rather attempts to find a set of coefficients such that the scalar product of the vector of coefficients and the vector of the model predictions yields the minimal sum of squared errors relative to past observations. The same set of coefficients is then used to produce future predictions. Similarly to the Bayesian method, the regression method also relies on a few inherent assumptions, such as the normal distribution of the prediction errors (therefore, defining the optimal coefficients as those minimizing the sum of squared errors) and the independence of the ensemble member predictions.
Sequential learning algorithms (SLAs, also known as online learning algorithms) (Cesa-Bianchi and Lugosi 2006) weight ensemble members based on their past performances. These algorithms were shown to improve long-term climate predictions (Monteleoni et al. 2010, 2011) and seasonal to annual ozone concentration forecasts (Mallet et al. 2009; Mallet 2010). More recently, it was shown that decadal climate predictions of the 2-m temperature can be improved using SLAs and can even become skillful when the climatology is added as a member of the ensemble (Strobach and Bel 2015a). The SLAs have several advantages over the other ensemble methods described above. First, they do not rely on any assumptions regarding the models or the distribution of the climate variables. Second, the weights assigned to the models can be used for model evaluation and for the comparison of different parameterization schemes or initialization methods. Third, the weighted ensemble provides not only predictions but also the associated uncertainties. All these characteristics suggest that the SLAs are suitable for improving the predictions of various climate variables.
Here, we test the performances of SLAs in predicting the previously investigated 2-m temperature (Strobach and Bel 2015a) and three additional climate variables: namely, the zonal and meridional components of the surface wind and the surface pressure. The performances of the SLAs are compared with those of the regression method; the comparison with the Bayesian method is not straightforward and is not included here. We also study the effects of different learning periods and different bias correction methods on the SLA performances. In addition, we consider a new metric, the reliability, to assess the performances of the forecasters. The SLAs are used here in a nontraditional way: namely, the weights of the ensemble members are updated during a learning period, and the predictions are made not only for the next outcome but for the whole time series of the validation period. This type of prediction is different from previous climate predictions made using the SLAs (Monteleoni et al. 2010, 2011) and is also beyond the framework in which the SLAs are expected to perform well. The results of phase 5 of the Coupled Model Intercomparison Project (CMIP5) (Taylor et al. 2009) decadal experiments constitute the ensemble, and the NCEP–NCAR reanalysis data (Kalnay et al. 1996) are considered as the observations. This paper is organized as follows. In section 2, we present the data that we used in this study, including the models and the reanalysis data; we also discuss the different bias correction methods that we used. In section 3, we describe the SLAs and the regression forecasting method as we implemented them, and we provide the details of the climatology that we derived from the reanalysis data. In section 4, we define the metrics used to evaluate the forecasters. In section 5, we present the predictions of the different forecasting methods and evaluate their global and regional performances based on their root-mean-square errors (RMSEs). The global and regional uncertainties and reliabilities of the predictions of the different forecasting methods are presented in sections 6 and 7, respectively. The weights assigned by the SLAs to the different models and to the climatology (all the members of the ensemble) are presented in section 8. The results are discussed and summarized in section 9.
2. Models and data
The decadal experiments were introduced to the Coupled Model Intercomparison Project multimodel ensemble in its fifth phase. The objective of these experiments is to investigate the ability of climate models to produce skillful future climate predictions on a decadal time scale. The climate models in these experiments were initialized with interpolated observational data of the ocean, sea ice, and atmospheric conditions, together with the atmospheric composition (Taylor et al. 2009). The ability of these simulations to produce skillful predictions has not been widely investigated, but it has been shown that they can generate skillful predictions in specific regions around the world (Kim et al. 2012; Kirtman et al. 2013; Doblas-Reyes et al. 2013; Meehl et al. 2009; Pohlmann et al. 2009; Müller et al. 2012; Meehl et al. 2014; Müller et al. 2014; Kruschke et al. 2014).
The CMIP5 decadal experiments were initialized every 5 yr between 1961 and 2011 for 10-yr simulations, with three exceptional experiments that were extended to 30-yr simulations. One of these 30-yr experiments was initialized in 1981 and simulated the climate dynamics until 2011. The output of four variables from this experiment is tested here: surface temperature, zonal and meridional surface wind components, and surface pressure. In what follows, we analyze the monthly means of these variables.
Table 1 shows the eight climate models included in our ensemble. The decadal experiments of the CMIP5 include a set of runs for each of the models, differing by the starting date and the initialization scheme used. We chose, arbitrarily, the first run of each model. As long as the model variability is the main source of uncertainty, the choice of the realization should not be significant for our analysis. Indeed, it was found that, in the CMIP5 decadal experiments, the model variability is the main source of uncertainty, independent of the prediction lead time, as long as the predictions are not bias corrected. Bias correction reduces mainly the model variability; however, the contribution of the model variability remains important (Strobach and Bel 2015b).
Model availability. (Expansions of acronyms are available online at http://www.ametsoc.org/PubsAcronymList.)
The NCEP–NCAR reanalysis data (Kalnay et al. 1996) were used as the observation data for the learning and for the evaluation of the forecasting methods' performances. We are aware of other reanalysis projects (Uppala et al. 2005; Onogi et al. 2007); however, we selected the NCEP–NCAR data based on their wide use (note that assessing the relative quality of the different reanalysis projects is subjective and beyond the scope of this paper). The effects of using different reanalysis data are left for future research.
Bias correction
The predictions made by climate models often suffer from inherent systematic errors (Goddard et al. 2013), and it is common to apply bias correction methods to the model outputs before analyzing them. For long-term climate projections, this procedure is relatively straightforward because a reference period is available. Bias correction in decadal climate predictions is not trivial, not only because there is no clear reference period but also because some of these experiments are known to drift from the initial conditions toward the model's climatology during the first years of the simulation (Meehl et al. 2009).
Here, two bias correction methods and the original data were considered. The original data without any bias correction are denoted as “no correction.” The first bias correction method corresponds to subtracting from each model's results its average over the learning period and adding the climatological average (the average of the NCEP–NCAR reanalysis data for the same period); this method is denoted as the “average correction.” The second bias correction method corresponds to subtracting from each model's results, for each calendar month, the corresponding average over the learning period and adding the NCEP–NCAR reanalysis average for that calendar month during the same learning period; this method is denoted as the “climatology correction.” Neither bias correction method accounts for an explicit time dependence of the bias. However, it is reasonable to assume that, for decadal climate predictions, the bias does not change considerably with time.
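As an illustration of the two corrections, consider the following minimal sketch (Python/NumPy; the array names model and reanalysis and the helper learn_slice are hypothetical, and the monthly series are assumed to be aligned and to start in January):

```python
import numpy as np

def average_correction(model, reanalysis, learn_slice):
    """Subtract the model's learning-period mean and add the reanalysis mean."""
    offset = reanalysis[learn_slice].mean() - model[learn_slice].mean()
    return model + offset

def climatology_correction(model, reanalysis, learn_slice):
    """As above, but with a separate offset for each calendar month."""
    model = np.asarray(model, dtype=float)
    reanalysis = np.asarray(reanalysis, dtype=float)
    corrected = model.copy()
    months = np.arange(len(model)) % 12        # calendar month of each time step
    in_learning = np.zeros(len(model), dtype=bool)
    in_learning[learn_slice] = True
    for m in range(12):
        sel = (months == m) & in_learning      # learning-period steps of month m
        corrected[months == m] += reanalysis[sel].mean() - model[sel].mean()
    return corrected
```

For a 20-yr learning period of monthly means, for example, learn_slice would be slice(0, 240).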
3. Forecasting methods
In this work, we consider three SLAs, introduced below. More thorough descriptions of the SLAs can be found in Cesa-Bianchi and Lugosi (2006) and in Monteleoni and Jaakkola (2003). We also consider the linear regression (REG) method (Krishnamurti et al. 2000) in order to compare the performances of the SLAs with this well-known method. The climatology (CLM) is considered here as the threshold for skillful predictions. For clarity, the equations that describe the forecasting methods omit the spatial indices. However, the forecasting schemes were applied to each of the grid cells independently, thereby allowing the weights (or, in the case of the REG, the coefficients) and the reference climatology to vary spatially. The consideration of the effect of geospatial neighborhoods (McQuade and Monteleoni 2012) is beyond the scope of this manuscript. The data that we used consist of time series of monthly means, and the weights were updated at each time step (i.e., every month) during the learning period.
a. The EWA and the EGA
The SLAs use an ensemble of “experts” (climate models), each of which provides a prediction of a future value of a climate variable, and forecast that value in terms of the weighted average of the ensemble predictions. The process is sequentially repeated, with the weights of the models updated after each measurement according to their prediction skill. We divide the period of the model simulations into two parts: the first part is the learning (or training) period, the data of which are used to update the model weights in the manner described above, and the second part is used for validating and evaluating the forecaster performance. At the end of the learning period, the weights generated by the SLA in the last learning step are kept fixed and used to weight the predictions of the climate models during the validation period.
The deviation of the prediction of model $E_i$ from the observed value defines the loss of that model at each time step. Here, the loss is the squared error, $\ell_{i,t} = (f_{i,t} - y_t)^2$, where $f_{i,t}$ is the prediction of model $i$ for time step $t$ and $y_t$ is the observed (reanalysis) value. The forecast is the weighted average of the ensemble predictions,
$$\hat{y}_t = \sum_{i=1}^{N} w_{i,t} f_{i,t},$$
where $w_{i,t}$ is the weight assigned to model $i$ at time step $t$ and $N$ is the number of ensemble members. In the exponentially weighted average forecaster (EWA), the weights are updated multiplicatively according to the models' losses,
$$w_{i,t+1} = \frac{w_{i,t}\,e^{-\eta\,\ell_{i,t}}}{\sum_{j=1}^{N} w_{j,t}\,e^{-\eta\,\ell_{j,t}}},$$
where $\eta$ is the learning rate of the algorithm. The exponentiated gradient average forecaster (EGA) uses the same multiplicative update with the loss of each model replaced by the gradient of the forecaster's loss with respect to that model's weight, $\ell'_{i,t} = 2(\hat{y}_t - y_t) f_{i,t}$. Therefore, the EWA updates the weight of each model according to its own performance, while the EGA updates the weights according to the models' contributions to the error of the weighted forecast.
Note that, for the first learning step, one has to assign initial weights to the models. Without any a priori knowledge of the models' performances, the natural choice is to assign equal weights to all the models. If the hierarchy of the models is known, it is possible to assign their initial weights accordingly.
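For concreteness, a minimal sketch of the two weight updates described above (Python/NumPy; the array names and the placeholder data are hypothetical, and the learning rate eta follows the definitions in this section):

```python
import numpy as np

def ewa_update(weights, preds, obs, eta):
    """One EWA step: multiplicative update by each model's own squared error."""
    losses = (preds - obs) ** 2            # loss of each ensemble member
    w = weights * np.exp(-eta * losses)
    return w / w.sum()                     # renormalize so the weights sum to 1

def ega_update(weights, preds, obs, eta):
    """One EGA step: multiplicative update by the gradient of the forecaster's loss."""
    forecast = np.dot(weights, preds)      # weighted-ensemble forecast
    grad = 2.0 * (forecast - obs) * preds  # d/dw_i of (forecast - obs)^2
    w = weights * np.exp(-eta * grad)
    return w / w.sum()

# Sequential learning with equal initial weights over T monthly time steps.
N, T, eta = 9, 240, 0.1
weights = np.full(N, 1.0 / N)
rng = np.random.default_rng(0)
preds_series = rng.normal(size=(T, N))     # placeholder model predictions
obs_series = rng.normal(size=T)            # placeholder observations
for t in range(T):
    weights = ega_update(weights, preds_series[t], obs_series[t], eta)
```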

b. The learn-α algorithm
The EWA and the EGA implicitly assume that the identity of the best-performing model does not change with time. The learn-α algorithm (LAA) (Monteleoni and Jaakkola 2003) is based on the fixed-share algorithm (Herbster and Warmuth 1998), which is designed to track the best expert even when its identity changes with time. In the fixed-share algorithm, after the multiplicative update of the weights according to the losses, each model shares a fraction α of its weight equally among the other models; thereby, a model whose past performance was poor can quickly regain a significant weight when it starts performing well. The performance of the fixed-share algorithm depends on the switching rate α, whose optimal value is not known in advance. The LAA, therefore, considers a discrete set of α values, runs a fixed-share forecaster for each of them, and weights these forecasters according to their performances in the same sequential manner, thereby learning the optimal value of α during the learning period.
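To make the mechanism concrete, a minimal sketch of a single fixed-share step, the building block of the LAA, is given below (a simplified illustration under the definitions above; the discrete set of α values and the meta-weighting over the corresponding fixed-share forecasters are omitted):

```python
import numpy as np

def fixed_share_update(weights, preds, obs, eta, alpha):
    """One fixed-share step: an EWA-style loss update followed by weight sharing."""
    losses = (preds - obs) ** 2
    v = weights * np.exp(-eta * losses)
    v /= v.sum()
    n = len(weights)
    # Each model keeps a (1 - alpha) fraction of its weight and shares the rest
    # equally among the other models, allowing the tracked best model to switch.
    return (1.0 - alpha) * v + (alpha / (n - 1)) * (v.sum() - v)
```

With alpha = 0, the update reduces to the EWA; the LAA weights a set of such forecasters, one per candidate value of alpha.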
c. Regression
The regression forecaster (REG) finds, for each grid cell, the vector of coefficients $\beta_i$ that minimizes the sum of squared errors over the learning period, $\sum_{t}\left(y_t - \sum_{i=1}^{N}\beta_i f_{i,t}\right)^2$ (Krishnamurti et al. 2000; Ross 2014). Unlike the weights assigned by the SLAs, the regression coefficients are not restricted to be positive and do not necessarily sum to 1. At the end of the learning period, the coefficients are kept fixed, and the predictions for the validation period are the corresponding linear combinations of the model predictions.
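A minimal sketch of this fit, assuming a learning-period matrix F of model predictions (T months × N members) and a vector y of observations for one grid cell:

```python
import numpy as np

def fit_regression(F, y):
    """Least squares coefficients minimizing ||F @ beta - y||^2 over the learning period."""
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    return beta

def predict_regression(F_future, beta):
    """Linear combination of the model predictions during the validation period."""
    return F_future @ beta
```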
d. Climatology
The climatology (CLM) was derived from the NCEP–NCAR reanalysis data. For each grid cell and each calendar month, the climatology is the average of the reanalysis values of that calendar month over the learning period. The climatology serves two roles in this work: it defines the threshold for skillful predictions (a forecaster whose errors are smaller than those of the climatology is considered skillful), and it is also added as an additional member of the ensemble, which was shown to improve the performances of the SLAs (Strobach and Bel 2015a).
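A minimal sketch of this construction, assuming a monthly reanalysis series obs over the learning period that starts in January:

```python
import numpy as np

def monthly_climatology(obs):
    """Average of the reanalysis values of each calendar month over the learning period."""
    return np.array([np.mean(obs[m::12]) for m in range(12)])

def climatology_forecast(clim, n_months, start_month=0):
    """Repeat the calendar-month averages over the validation period."""
    return np.array([clim[(start_month + t) % 12] for t in range(n_months)])
```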
4. Evaluation metrics
Three metrics are used to evaluate the performances of the forecasters during the validation period. The first is the root-mean-square error (RMSE) of the forecaster's predictions relative to the reanalysis data. In order to compare the errors of a forecaster with those of the reference climatology, we use the RMSE skill score, defined such that positive values correspond to a forecaster RMSE smaller than that of the climatology (i.e., skillful predictions) and negative values correspond to a larger RMSE. The second metric is the uncertainty, quantified by the standard deviation (STD) of the weighted ensemble around the forecaster's prediction. The STD skill score is defined in a similar way, with the equally weighted ensemble (representing no learning) serving as the reference. The third metric is the reliability (REL), which measures the difference between the forecaster's error and its estimated uncertainty; positive values of the REL indicate an overconfident forecaster (an error larger than the estimated uncertainty), and negative values indicate an underconfident one. The statistical significance of the regional skill scores is assessed at each grid cell and is indicated in the corresponding maps.
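A minimal sketch of the three metrics under the definitions above (the exact normalization of the skill scores and of the REL is an assumption here; the relative-difference forms below are one common choice):

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error of the predictions over the validation period."""
    return np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2))

def skill_score(metric_forecaster, metric_reference):
    """Relative difference; positive when the forecaster beats the reference."""
    return 1.0 - metric_forecaster / metric_reference

def weighted_std(weights, preds):
    """STD of the weighted ensemble around the weighted forecast (one time step)."""
    forecast = np.dot(weights, preds)
    return np.sqrt(np.dot(weights, (preds - forecast) ** 2))

def reliability(rmse_value, std_value):
    """Positive values indicate overconfidence (error larger than uncertainty)."""
    return (rmse_value - std_value) / std_value
```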
5. Predictions
a. Global
The simplest measure of the performance of the forecasters is the global average of the root-mean-square error (RMSE) of their predictions during the validation period. Figure 1 shows the globally averaged RMSE of the different forecasters when the climatology is included in the ensemble.
Globally averaged RMSE with climatology.
In the rest of this section, we focus on the case of 20 yr of learning and 10 yr of prediction. This learning period is chosen because it extends well beyond the drift of the models, and it is also long enough to capture the simulated climate dynamics over the time scale of the prediction period. In Table 2, we detail the bias correction that resulted in the smallest globally averaged RMSE for each forecaster and each climate variable.
The optimal bias correction [no correction (nbias), average correction (bias), and climatology correction (mbias)] for each forecaster and each climate variable: the surface temperature T, zonal wind U, meridional wind V, and pressure P.
b. Regional
The spatial variability of the forecasters' performances is also of interest. Figure 2 depicts the spatial distributions of the surface temperature RMSE skill score for the three SLAs and the regression.
Surface temperature RMSE skill score: (top left) EGA, (top right) EWA, (bottom left) LAA, and (bottom right) REG. Positive values correspond to a smaller RMSE than the climatology and vice versa. White dots represent significant improvement, and black dots represent a significantly poorer performance.
The spatial distribution of the RMSE skill score for the zonal and meridional wind components is shown in Figs. 3 and 4, respectively. The errors in the predictions of both wind components have similar characteristics. The EGA shows a lower spatial variability in the errors of the wind component predictions compared with the errors of the surface temperature predictions. The EWA and LAA show similar variability to the one found for the surface temperature. All the SLAs show smaller regions of significantly lower errors than the climatology. The REG shows a poorer performance compared with the climatology over most of the globe.
As in Fig. 2, but for zonal wind.
As in Fig. 2, but for meridional wind.
Figure 5 shows the spatial distribution of the surface pressure RMSE skill score for the three SLAs and the regression.
The SLAs show a positive RMSE skill score over most of the globe for the surface temperature and wind components. The LAA shows the highest score (relative to the other forecasters) for the surface pressure. There are several regions (such as the North Atlantic, north Indian Ocean, and northern Eurasia) where all the SLAs seem to provide a smaller RMSE than the climatology. This suggests that at least some of the models capture processes that result in a deviation from the climatology and that the SLAs are capable of tracking these models.
6. Uncertainties
The RMSE is an important measure of the quality of the predictions; however, the uncertainties associated with the predictions of the forecasters are crucial for a meaningful assessment of the predictions' quality. The uncertainties are quantified here using the standard deviation (STD) of the ensemble. A natural reference for comparing the variance of the ensemble weighted by the forecasters is the variance of the equally weighted ensemble, which represents no learning. It was mentioned earlier that the linear regression does not assign weights to the models in the ensemble but rather attempts to find the linear combination of their predictions that minimizes the sum of squared errors; therefore, the variance of the regression predictions is based on the uncertainty in determining the regression coefficients. In this section, we compare the uncertainties of the three SLAs, the regression, and the equally weighted ensemble. Our analysis proceeds similarly to the analysis of the RMSE: first, we present the globally averaged STD, and then we present the spatial distribution of the STD skill score.
a. Global
Figure 6 shows the globally averaged STD of the different forecasters.
As in Fig. 1, but for the STD.
b. Regional
The uncertainty has a large spatial variability. We focus on the 20-yr learning period, average bias-corrected data, and the ensemble that includes the climatology. The STD skill score shows the temporally averaged variability of the ensemble weighted by the forecasters compared with that of the equally weighted ensemble during the validation period. Figure 7 shows the spatial distribution of the surface temperature STD skill score for the three SLAs and the regression. All the SLAs have a positive skill score over almost all the globe, which reflects the fact that they have smaller uncertainties than the equally weighted ensemble. Over most of the globe, this reduction of the uncertainty is statistically significant.
Spatial distribution of the surface temperature STD skill score: (top left) EGA, (top right) EWA, (bottom left) LAA, and (bottom right) REG. Positive values correspond to a smaller STD than the equally weighted ensemble and vice versa. White dots represent a statistically significant reduction of the STD and black dots represent a statistically significant increase of the STD relative to the STD of the equally weighted ensemble.
7. Reliability
An ideal forecaster should have low errors and low uncertainties; however, an uncertainty that is lower than the error reflects an overconfident forecaster, while an uncertainty that is larger than the error reflects an underconfident forecaster. The difference between the error and the uncertainty is often used to measure the reliability of the predictions. In this work, we use the reliability score defined in section 4.
a. Global
Figure 8 shows the globally averaged REL of the different forecasters for the four climate variables.
b. Regional
The reliability also varies spatially. We focus on the 20-yr learning period, average bias-corrected data, and the ensemble that includes the climatology. The REL shows the temporal average of the reliability of the ensemble weighted by the forecasters during the validation period. Figure 9 shows the spatial distribution of the surface temperature REL for the three SLAs, the regression, and the equally weighted ensemble. The equally weighted ensemble is seen to have higher reliability than the other forecasters over most of the globe. The SLAs and the regression are mostly overconfident; namely, the magnitude of the error is larger than the estimated standard deviation of the results. All the SLAs show higher reliability in the tropics and lower reliability in the midlatitudes and toward the poles. In Figs. 19–21 of the supplementary information, we depict the spatial distribution of the forecasters' reliability for the surface wind and pressure. For all the variables, the equally weighted ensemble shows higher reliability than the other forecasters. For the surface wind components, we find higher reliability over land (except for Antarctica) and lower reliability over the oceans. The surface pressure shows a spatial distribution that resembles the one observed for the surface temperature.
Surface temperature REL: (top left) EGA, (top right) EWA, (middle) the equally weighted ensemble, (bottom left) LAA, and (bottom right) REG. Positive values indicate overconfidence and vice versa.
8. Climatology weights
Some of the results above regarding the skill of the forecasters were explained by the weights assigned to the climatology. Because the climatology outperforms each of the models in the ensemble, the SLAs are expected to assign it a high weight. However, assigning too high a weight to the climatology implies that the forecaster cannot capture deviations from the climatology that stem from physical processes resolved by the models. Ideally, a forecaster should balance the smaller RMSE of the climatology against the additional information available from the other models.
Figures 10, 11, and 12 show the spatial distribution of the weight assigned to the climatology, for each of the four climate variables, by the EGA, EWA, and LAA, respectively. The weights in these figures correspond to the weights assigned at the end of the 20-yr learning period (i.e., the weights used for the predictions). The average bias correction was applied to the data, and the color bar was set to emphasize the differences. The EWA assigns the climatology a weight close to 1 over most of the globe for all the variables considered here. In the eastern tropical Pacific, the climatology is not the only dominant model in the EWA predictions of the surface temperature, zonal wind, and pressure. The LAA also assigns very high weights to the climatology over most of the globe for all the variables. The exceptions are the region between the southern westerlies and the polar easterlies and the northern Atlantic, in which the weight of the climatology is lower in the predictions of the surface temperature, and the Pacific and Atlantic tropics, in which the weight of the climatology is lower in the LAA predictions of the surface zonal wind. Both the weights assigned by the EWA and those assigned by the LAA stem from the fact that these SLAs are designed to track the best expert, which in our ensemble turns out to be the climatology over most of the globe. Note that the EWA tracks the model that had the lowest cumulative loss during the whole learning period, while the LAA tracks the model that had the smallest cumulative loss during the last part of the learning period. In addition, while the EWA assigns a weight of almost 1 to the best model, the LAA assigns it a lower weight because values of α different from zero also obtained nonzero weights.
The EGA assigns lower weights than the EWA and LAA to the climatology over most of the globe for all the climate variables considered here. For the surface temperature, there are still large regions in which the predictions of the EGA are dominated by the climatology. For the surface wind components, the regions dominated by the climatology partially overlap those in which the climatology dominates the EGA's surface temperature predictions; for these variables, the climatology dominates the EGA's predictions over large regions in the tropics and southern Asia. The weight assigned to the climatology by the EGA for the surface pressure is lower in most regions and resulted in a somewhat poorer performance of the EGA in the predictions of this variable. This different performance for the surface pressure may be related to the lower quality of the data for this variable. Unlike the EWA and the LAA, the EGA is not designed to track the best expert but rather to track the measurements. Therefore, the lower weight assigned to the climatology suggests that useful information can be extracted from the models, and their ability to capture some of the processes affecting the climate dynamics on decadal time scales can be quantified by the weight assigned to them by the EGA.
Spatial distribution of the weight assigned to the climatology by the EGA forecaster for surface (a) temperature, (b) zonal wind, (c) meridional wind, and (d) pressure.
As in Fig. 10, but for the EWA forecaster.
As in Fig. 10, but for the LAA forecaster.
The regression does not assign weights to the models in the ensemble. However, one can try to quantify the significance of the climatology by studying the ratio between the magnitude of the climatology coefficient and the magnitudes of the coefficients of the other models. Figure 13 shows the spatial distribution of this relative significance of the climatology in the REG predictions of the four climate variables.
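A minimal sketch of one plausible form of such a ratio (the exact normalization is an assumption here; values near 1 indicate that the climatology coefficient dominates):

```python
import numpy as np

def climatology_significance(beta, clm_index):
    """Magnitude of the climatology coefficient relative to the sum of all magnitudes
    (hypothetical normalization)."""
    beta = np.asarray(beta, dtype=float)
    return np.abs(beta[clm_index]) / np.abs(beta).sum()
```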
As in Fig. 10, but for the relative significance of the climatology in the predictions by the REG forecaster. See section 8 for the exact definition of the significance.
9. Summary and discussion
An ensemble of climate models is known to improve climate predictions and to help better assess the uncertainties associated with them. In this paper, we tested five different methods to combine the results of the decadal predictions of different models: EWA, EGA, LAA, REG, and the equally weighted ensemble. The first three forecasters represent learning algorithms that weight the ensemble models according to their performances during a learning period. The REG attempts to find the linear combination of the model predictions that minimizes the sum of squared errors during the learning period, and the equally weighted ensemble represents no learning.
The learning algorithms were used here to update the weights during the learning period, and the predictions, for the whole time series of the validation period, were made using the weights assigned to the ensemble models at the end of the learning period. This use of the SLAs is different from previous studies and is also beyond the framework in which the SLAs are guaranteed to perform well. Nevertheless, we found that the SLAs performed very well and showed both global and regional skill even in predicting time series that extend long after the learning has ended.
We tried different learning periods and found that a learning period that is at least as long as the prediction period yields better results. In our experiments, learning periods longer than 10 yr ensure that the learning extends well beyond the drift of the models. The globally averaged root-mean-square error of the SLAs was found to be smaller than that of the climatology and of the regression for all the climate variables considered.
The simple average was shown to have larger errors and larger uncertainties than the forecasters that used a learning period to weight or combine the model predictions. Over most of the globe, the SLAs performed better than the regression in terms of the metrics we used to quantify the forecasters' performances. This poorer performance of the regression is associated with the basic assumptions of the linear regression and its oversimplified method of linearly combining the model predictions. The SLAs do not rely on these assumptions and use more advanced methods to weight the models, resulting in smaller errors. The EWA and the LAA were found to be more appropriate in cases in which tracking the best model is of interest. The climatology outperformed all the other models in the ensemble; therefore, the EWA and the LAA converged to it over most of the globe and for all four climate variables. The equally weighted ensemble was shown to be underconfident in most cases, while all the other forecasters were found to be overconfident. However, the measure used for the reliability does not favor forecasters with smaller errors and uncertainties but only forecasters with uncertainties that are close to their errors. Therefore, we believe that a more appropriate reliability score should be defined; however, this is beyond the scope of this work.
Although the globally averaged RMSE of the SLAs is only a few percentage points smaller than that of the climatology, the improvement was shown to be statistically significant. In addition, we found that, in many regions, the improvement is larger. The spatial distribution of the SLAs' performance showed that they are skillful over large contiguous regions. This finding suggests that the models were able to capture some physical processes that resulted in deviations from the climatology and that the SLAs enabled the extraction of this additional information. Similarly, the large regions over which the climatology outperforms the forecasters may suggest that physical processes associated with the climate dynamics affecting these regions are not well captured by the models. The SLAs' performances were much poorer for the surface pressure than for the other variables. This poorer performance might be related to the quality of the model output or to the large fluctuations of this variable. The better predictions of the EWA and LAA (relative to the EGA) for the surface pressure result from their tracking of the climatology; therefore, it is difficult to extract from their predictions new information regarding the physics of the climate system. The reduction of the uncertainties, relative to the equally weighted ensemble, is much more substantial than the reduction of the errors and can reach about 60%–70% globally. The uncertainties considered here are only those associated with the model variability within the ensemble. The internal uncertainties, scenario uncertainties, and other sources of uncertainty were not studied here.
The results presented here are in agreement with previous results [see Meehl et al. (2009) and references therein]. However, in this work, monthly means were considered, whereas previous works considered averages over longer periods, which have smaller fluctuations. Skillful predictions of the SLAs (i.e., errors smaller than those of the climatology) can be observed in the North Atlantic, the north Indian Ocean, northern Eurasia, and some regions of the Pacific Ocean. In addition, the SLAs showed predictive skill for the surface temperature over many land areas, such as northern Eurasia, Greenland, and, to some extent, the Americas. These results suggest that learning algorithms can be used to improve climate predictions and to reduce the uncertainties associated with them.
Acknowledgments
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant [293825]. We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for the CMIP, and we thank the climate modeling groups (listed in Table 1 of this paper) for producing and making available their model output. For the CMIP, the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. E.S. wishes to acknowledge a fellowship from the Israel Water Authority.
REFERENCES
Buser, C. M., H. R. Künsch, D. Lüthi, M. Wild, and C. Schär, 2009: Bayesian multi-model projection of climate: Bias assumptions and interannual variability. Climate Dyn., 33, 849–868, doi:10.1007/s00382-009-0588-6.
Buser, C. M., H. R. Künsch, and C. Schär, 2010: Bayesian multi-model projections of climate: Generalization and application to ENSEMBLES results. Climate Res., 44, 227–241, doi:10.3354/cr00895.
Cane, M. A., 2010: Climate science: Decadal predictions in demand. Nat. Geosci., 3, 231–232, doi:10.1038/ngeo823.
Cesa-Bianchi, N., and G. Lugosi, 2006: Prediction, Learning, and Games. Cambridge University Press, 408 pp.
Chakraborty, A., and T. N. Krishnamurti, 2009: Improving global model precipitation forecasts over India using downscaling and the FSU superensemble. Part II: Seasonal climate. Mon. Wea. Rev., 137, 2736–2757, doi:10.1175/2009MWR2736.1.
Collins, M., 2007: Ensembles and probabilities: A new era in the prediction of climate change. Philos. Trans. Roy. Soc. London, A365, 1957–1970, doi:10.1098/rsta.2007.2068.
Collins, M., and Coauthors, 2013: Long-term climate change: Projections, commitments and irreversibility. Climate Change 2013: The Physical Science Basis, T. F. Stocker et al., Eds., Cambridge University Press, 1029–1136, doi:10.1017/CBO9781107415324.024.
Cox, P., and D. Stephenson, 2007: A changing climate for prediction. Science, 317, 207–208, doi:10.1126/science.1145956.
Doblas-Reyes, F. J., M. Déqué, and J.-P. Piedelievre, 2000: Multi-model spread and probabilistic seasonal forecasts in PROVOST. Quart. J. Roy. Meteor. Soc., 126, 2069–2087, doi:10.1256/smsqj.56704.
Doblas-Reyes, F. J., V. Pavan, and D. B. Stephenson, 2003: The skill of multi-model seasonal forecasts of the wintertime North Atlantic Oscillation. Climate Dyn., 21, 501–514, doi:10.1007/s00382-003-0350-4.
Doblas-Reyes, F. J., R. Hagedorn, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting—II. Calibration and combination. Tellus, 57A, 234–252, doi:10.1111/j.1600-0870.2005.00104.x.
Doblas-Reyes, F. J., and Coauthors, 2013: Initialized near-term regional climate change prediction. Nat. Commun., 4, 1715, doi:10.1038/ncomms2704.
Feng, J., D.-K. Lee, C. Fu, J. Tang, Y. Sato, H. Kato, J. L. Mcgregor, and K. Mabuchi, 2011: Comparison of four ensemble methods combining regional climate simulations over Asia. Meteor. Atmos. Phys., 111, 41–53, doi:10.1007/s00703-010-0115-7.
Fraedrich, K., and N. R. Smith, 1989: Combining predictive schemes in long-range forecasting. J. Climate, 2, 291–294, doi:10.1175/1520-0442(1989)002<0291:CPSILR>2.0.CO;2.
Furrer, R., S. R. Sain, D. Nychka, and G. A. Meehl, 2007: Multivariate Bayesian analysis of atmosphere–ocean general circulation models. Environ. Ecol. Stat., 14, 249–266, doi:10.1007/s10651-007-0018-z.
Goddard, L., and Coauthors, 2013: A verification framework for interannual-to-decadal predictions experiments. Climate Dyn., 40, 245–272, doi:10.1007/s00382-012-1481-2.
Greene, A. M., L. Goddard, and U. Lall, 2006: Probabilistic multimodel regional temperature change projections. J. Climate, 19, 4326–4343, doi:10.1175/JCLI3864.1.
Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting—I. Basic concept. Tellus, 57A, 219–233, doi:10.1111/j.1600-0870.2005.00103.x.
Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. Bull. Amer. Meteor. Soc., 90, 1095–1107, doi:10.1175/2009BAMS2607.1.
Herbster, M., and M. K. Warmuth, 1998: Tracking the best expert. Mach. Learn., 32, 151–178, doi:10.1023/A:1007424614876.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.
Keenlyside, N. S., M. Latif, J. Jungclaus, L. Kornblueh, and E. Roeckner, 2008: Advancing decadal-scale climate prediction in the North Atlantic sector. Nature, 453, 84–88, doi:10.1038/nature06921.
Kharin, V. V., and F. W. Zwiers, 2002: Climate predictions with multimodel ensembles. J. Climate, 15, 793–799, doi:10.1175/1520-0442(2002)015<0793:CPWME>2.0.CO;2.
Kim, H.-M., P. J. Webster, and J. A. Curry, 2012: Evaluation of short-term climate change prediction in multi-model CMIP5 decadal hindcasts. Geophys. Res. Lett., 39, L10701, doi:10.1029/2012GL051644.
Kirtman, B., and Coauthors, 2013: Near-term climate change: Projections and predictability. Climate Change 2013: The Physical Science Basis, T. F. Stocker et al., Eds., Cambridge University Press, 953–1028, doi:10.1017/CBO9781107415324.023.
Krishnamurti, T. N., 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548–1550, doi:10.1126/science.285.5433.1548.
Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. J. Climate, 13, 4196–4216, doi:10.1175/1520-0442(2000)013<4196:MEFFWA>2.0.CO;2.
Kruschke, T., H. Rust, C. Kadow, G. Leckebusch, and U. Ulbrich, 2014: Evaluating decadal predictions of northern hemispheric cyclone frequencies. Tellus, 66A, 22830, doi:10.3402/tellusa.v66.22830.
Leutbecher, M., and T. N. Palmer, 2008: Ensemble forecasting. J. Comput. Phys., 227, 3515–3539, doi:10.1016/j.jcp.2007.02.014.
Mallet, V., 2010: Ensemble forecast of analyses: Coupling data assimilation and sequential aggregation. J. Geophys. Res., 115, D24303, doi:10.1029/2010JD014259.
Mallet, V., G. Stoltz, and B. Mauricette, 2009: Ozone ensemble forecast with machine learning algorithms. J. Geophys. Res., 114, D05307, doi:10.1029/2008JD009978.
McQuade, S., and C. Monteleoni, 2012: Global climate model tracking using geospatial neighborhoods. Proc. 26th AAAI Conf. on Artificial Intelligence, Toronto, Canada, Association for the Advancement of Artificial Intelligence, 335–341.
Meehl, G. A., and Coauthors, 2009: Decadal prediction. Bull. Amer. Meteor. Soc., 90, 1467–1485, doi:10.1175/2009BAMS2778.1.
Meehl, G. A., and Coauthors, 2014: Decadal climate prediction: An update from the trenches. Bull. Amer. Meteor. Soc., 95, 243–267, doi:10.1175/BAMS-D-12-00241.1.
Monteleoni, C., and T. Jaakkola, 2003: Online learning of non-stationary sequences. Adv. Neural Inf. Process. Syst., 16, 1093–1100.
Monteleoni, C., G. A. Schmidt, and S. Saroha, 2010: Tracking climate models. Proc. NASA Conf. on Intelligent Data Understanding, Mountain View, CA, NASA, 1–15.
Monteleoni, C., G. A. Schmidt, S. Saroha, and E. Asplund, 2011: Tracking climate models. Stat. Anal. Data Min., 4, 372–392, doi:10.1002/sam.10126.
Moss, R., and Coauthors, 2008: Towards new scenarios for analysis of emissions, climate change, impacts, and response strategies. IPCC Expert Meeting Rep., 25 pp. [Available online at https://www.ipcc.ch/pdf/supporting-material/expert-meeting-ts-scenarios.pdf.]
Müller, W. A., and Coauthors, 2012: Forecast skill of multi-year seasonal means in the decadal prediction system of the Max Planck Institute for Meteorology. Geophys. Res. Lett., 39, L22707, doi:10.1029/2012GL053326.
Müller, W. A., H. Pohlmann, F. Sienz, and D. Smith, 2014: Decadal climate predictions for the period 1901–2010 with a coupled climate model. Geophys. Res. Lett., 41, 2100–2107, doi:10.1002/2014GL059259.
Murphy, J. M., D. M. H. Sexton, D. N. Barnett, G. S. Jones, M. J. Webb, M. Collins, and D. A. Stainforth, 2004: Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430, 768–772, doi:10.1038/nature02771.
Onogi, K., and Coauthors, 2007: The JRA-25 Reanalysis. J. Meteor. Soc. Japan, 85, 369–432, doi:10.2151/jmsj.85.369.
Palmer, T. N., Č. Branković, and D. S. Richardson, 2000: A probability and decision-model analysis of PROVOST seasonal multi-model ensemble integrations. Quart. J. Roy. Meteor. Soc., 126, 2013–2033, doi:10.1002/qj.49712656703.
Palmer, T. N., and Coauthors, 2004: Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER). Bull. Amer. Meteor. Soc., 85, 853–872, doi:10.1175/BAMS-85-6-853.
Pavan, V., and F. J. Doblas-Reyes, 2000: Multi-model seasonal hindcasts over the Euro-Atlantic: Skill scores and dynamic features. Climate Dyn., 16, 611–625, doi:10.1007/s003820000063.
Peña, M., and H. van den Dool, 2008: Consolidation of multimodel forecasts by ridge regression: Application to Pacific sea surface temperature. J. Climate, 21, 6521–6538, doi:10.1175/2008JCLI2226.1.
Peng, P., A. Kumar, H. van den Dool, and A. G. Barnston, 2002: An analysis of multimodel ensemble predictions for seasonal climate anomalies. J. Geophys. Res., 107, 4710, doi:10.1029/2002JD002712.
Pohlmann, H., J. H. Jungclaus, A. Köhl, D. Stammer, and J. Marotzke, 2009: Initializing decadal climate predictions with the GECCO oceanic synthesis: Effects on the North Atlantic. J. Climate, 22, 3926–3938, doi:10.1175/2009JCLI2535.1.
Räisänen, J., L. Ruokolainen, and J. Ylhäisi, 2010: Weighting of model results for improving best estimates of climate change. Climate Dyn., 35, 407–422, doi:10.1007/s00382-009-0659-8.
Rajagopalan, B., U. Lall, and S. E. Zebiak, 2002: Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles. Mon. Wea. Rev., 130, 1792–1811, doi:10.1175/1520-0493(2002)130<1792:CCFTRA>2.0.CO;2.
Robertson, A. W., U. Lall, S. E. Zebiak, and L. Goddard, 2004: Improved combination of multiple atmospheric GCM ensembles for seasonal prediction. Mon. Wea. Rev., 132, 2732–2744, doi:10.1175/MWR2818.1.
Ross, S. M., 2014: Regression. Introduction to Probability and Statistics for Engineers and Scientists, 5th ed. S. M. Ross, Ed., Academic Press, 357–444, doi:10.1016/B978-0-12-394811-3.50009-5.
Smith, D. M., S. Cusack, A. W. Colman, C. K. Folland, G. R. Harris, and J. M. Murphy, 2007: Improved surface temperature prediction for the coming decade from a global climate model. Science, 317, 796–799, doi:10.1126/science.1139540.
Smith, R. L., C. Tebaldi, D. Nychka, and L. O. Mearns, 2009: Bayesian modeling of uncertainty in ensembles of climate models. J. Amer. Stat. Assoc., 104, 97–116, doi:10.1198/jasa.2009.0007.
Strobach, E., and G. Bel, 2015a: Improvement of climate predictions and reduction of their uncertainties using learning algorithms. Atmos. Chem. Phys., 15, 8631–8641, doi:10.5194/acp-15-8631-2015.
Strobach, E., and G. Bel, 2015b: The contribution of internal and model variabilities to the uncertainty in CMIP5 decadal climate predictions. 35 pp. [Available online at http://arxiv.org/pdf/1508.01609v1.pdf.]
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2009: A summary of the CMIP5 experiment design, 33 pp. [Available online at http://cmip-pcmdi.llnl.gov/cmip5/docs/Taylor_CMIP5_design.pdf.]
Tebaldi, C., and R. Knutti, 2007: The use of the multi-model ensemble in probabilistic climate projections. Philos. Trans. Roy. Soc., A365, 2053–2075, doi:10.1098/rsta.2007.2076.
Tebaldi, C., R. L. Smith, D. Nychka, and L. O. Mearns, 2005: Quantifying uncertainty in projections of regional climate change: A Bayesian approach to the analysis of multimodel ensembles. J. Climate, 18, 1524–1540, doi:10.1175/JCLI3363.1.
Uppala, S. M., and Coauthors, 2005: The ERA-40 re-analysis. Quart. J. Roy. Meteor. Soc., 131, 2961–3012, doi:10.1256/qj.04.176.
Warner, T. T., 2011: Numerical Weather and Climate Prediction. Cambridge University Press, 550 pp.
Yun, W. T., L. Stefanova, and T. N. Krishnamurti, 2003: Improvement of the multimodel superensemble technique for seasonal forecasts. J. Climate, 16, 3834–3840, doi:10.1175/1520-0442(2003)016<3834:IOTMST>2.0.CO;2.
Yun, W. T., L. Stefanova, A. K. Mitra, T. S. V. V. Kumar, W. Dewar, and T. N. Krishnamurti, 2005: A multi-model superensemble algorithm for seasonal climate prediction using DEMETER forecasts. Tellus, 57A, 280–289, doi:10.1111/j.1600-0870.2005.00131.x.