## 1. Introduction

Probabilistic projections of climate change, such as those presented in the 2013 IPCC assessment (IPCC 2013), are based on comprehensive multimodel ensembles of global climate models (GCMs; see, e.g., Taylor et al. 2012) and GCM-driven regional climate models (RCMs; see, e.g., van der Linden and Mitchell 2009; Mearns et al. 2009; Giorgi et al. 2009). In practice, projections are thus usually derived from a series of heterogeneous GCM–RCM model chains. Analyzing and interpreting such datasets is challenging from both a climate science and a statistical point of view (see, e.g., Tebaldi and Knutti 2007; Knutti et al. 2010; Stephenson et al. 2012).

Among the most common and simplest approaches is the calculation of the multimodel mean for a specific quantity of interest, possibly weighting each model by its ability to reproduce the observed climate, as done, for example, by the reliability ensemble average (Giorgi and Mearns 2002).
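To make the weighting idea concrete, the following Python sketch computes an equal-weight and a performance-weighted multimodel mean. The weighting rule (weights inversely proportional to the absolute control-period bias) and the function names are hypothetical illustrations chosen for brevity; they are not the reliability ensemble average of Giorgi and Mearns (2002), which combines model performance and model convergence criteria.

```python
import numpy as np

def multimodel_mean(projections, weights=None):
    """Weighted mean of model projections; equal weights when weights is None."""
    x = np.asarray(projections, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * x) / np.sum(w))

def inverse_bias_weights(control_sim, control_obs, floor=1e-6):
    """Hypothetical performance weights: 1 / |control bias|, floored to avoid division by zero."""
    bias = np.abs(np.asarray(control_sim, dtype=float) - control_obs)
    return 1.0 / np.maximum(bias, floor)
```

With invented projections [2.0, 3.0, 4.0], control simulations [10.5, 11.0, 12.0] and an observed control value of 10.0, the weights are [2, 1, 0.5] and the weighted mean is about 2.57 instead of the equal-weight 3.0. A different metric would give yet another answer, which is the non-uniqueness problem discussed next.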

However, assigning weights to the ensemble members is problematic for various reasons (see, e.g., Tebaldi and Knutti 2007; Knutti et al. 2010; Weigel et al. 2010; Déqué and Somot 2010; Chandler 2013). The choice of metric is not unique, and different weighting schemes may lead to different climate scenarios (Lenderink 2010). Additionally, assigning a single numeric value to an individual model is difficult when a suite of variables in different regions is considered, as model performance varies from variable to variable and from region to region. Since assigning weights ultimately amounts to assigning probabilities to different models, a formal and objective methodology based on sound probabilistic and statistical assessments is desirable.

Therefore, coherent statistical treatments of multimodel ensembles have become increasingly attractive over the last decade. A Bayesian approach is especially suited because the assumptions on the data-generating process and the expert information used in the analysis can be laid out clearly and transparently. Building on Nychka and Tebaldi (2003), who give a formal justification of the reliability ensemble average, Tebaldi et al. (2005) implemented a Bayesian analysis of a GCM multimodel ensemble. Similarly, Greene et al. (2006) chose model weights according to a Bayesian linear model. The approach of Tebaldi et al. (2005) has been generalized in many directions. Instead of analyzing different regions separately, Smith et al. (2009), for example, implement a multivariate analysis for various climatic regions. This approach has been further generalized by Furrer et al. (2007), Jun et al. (2008), Kaufman and Sain (2010), Kang et al. (2012), Salazar et al. (2011), Sain et al. (2011), Furrer et al. (2012), and Geinitz et al. (2015), who analyze fields of multimodel outputs via spatial processes. Tebaldi and Sansó (2009) and Buser et al. (2010b) implement a bivariate extension of Tebaldi et al. (2005) by modeling temperature and precipitation jointly. General considerations on the statistical treatment of multimodel ensembles, with justifications for not using fully Bayesian methods, are given in Rougier et al. (2013) and Chandler (2013).

In Buser et al. (2009), different bias assumptions for climate models are discussed. They compare two prominent assumptions, called “constant bias” and “constant relation” [the latter referred to as “bias correction” by Ho et al. (2012)], and show that climate projections depend substantially on the chosen assumption. The constant bias assumption underlies studies that identify climate change via the difference “scenario minus control” (see, e.g., Maraun 2012). In contrast, under the constant relation assumption, biases are climate-state dependent and change with time (see also Piani et al. 2010; Boberg and Christensen 2012; Christensen and Boberg 2012). Within the constant relation approach, additive and multiplicative biases are considered, which are related to each other. Analyzing the same data as we do in this paper, Kerkhoff et al. (2014) found a preference for the constant relation assumption, although neither assumption seems to hold exactly. This suggests using an approach from Buser et al. (2010a) that provides a continuous transition between constant bias and constant relation through an additional parameter estimated from the data.
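A toy calculation helps to see why the two assumptions can yield different projections. The Python sketch below assumes, purely for illustration, that constant relation can be represented by a fixed linear map obs ≈ a + b·model, so that a simulated change is rescaled by b; under constant bias the simulated change is taken at face value. The linear blend controlled by `kappa` and the function name are our own hypothetical stand-ins for a continuous transition; they are not the parameterization of Buser et al. (2010a).

```python
def projected_change(model_ctrl, model_scen, b, kappa=0.0):
    """
    Toy translation of a simulated change into a projected observed change.

    kappa = 0     -> constant bias: the simulated change is used as is.
    kappa = 1     -> hypothetical constant relation obs = a + b * model:
                     the change is scaled by b (the offset a cancels in differences).
    0 < kappa < 1 -> illustrative linear blend (our assumption, not Buser et al. 2010a).
    """
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    delta_model = model_scen - model_ctrl
    return (1.0 - kappa) * delta_model + kappa * b * delta_model
```

For a simulated warming of 4 °C and b = 0.8 (a model that exaggerates variations relative to the observations), the sketch returns 4.0 °C for kappa = 0, 3.2 °C for kappa = 1, and 3.6 °C for kappa = 0.5, i.e., a smaller signal when part of the simulated change is attributed to a state-dependent bias.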

Besides the aforementioned bias assumptions, the literature on multimodel ensembles mainly relies on two paradigms (Sanderson and Knutti 2012). The first one is usually referred to as the “truth plus error” assumption (see, e.g., Tebaldi et al. 2005; Buser et al. 2009). It assumes that the multimodel ensemble members are centered around the true climate, with differences between the truth and the output from climate models being independent and identically distributed. Under this assumption, the error of the multimodel mean could be reduced to zero with a sufficiently large ensemble of climate models. This is unrealistic because climate models are not independent but share basic structural features. The second assumption is based on the concept of exchangeability (Annan and Hargreaves 2010; Rougier et al. 2013). It assumes that all members of the multimodel ensemble are equally likely and that the true climate can be regarded as an additional member of this ensemble. The difference between the truth and the multimodel mean is then comparable to the difference between a single member and the multimodel mean. More generally, Chandler (2013) presents an approach that assumes exchangeability conditional upon a model consensus representing a collective discrepancy between models and true climate. All these assumptions can be questioned, as there is a complex dependence structure between climate models due to their genealogy (Masson and Knutti 2011).
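The practical difference between the two paradigms can be checked with a small Monte Carlo experiment. In the hypothetical setup below, models scatter with standard deviation `sigma` either around the truth (“truth plus error”) or around a common center from which the truth is simply one more draw (“exchangeable”). The root-mean-square error of the multimodel mean then shrinks like sigma/sqrt(n) in the first case but stays near sigma·sqrt(1 + 1/n) in the second. The function name and setup are illustrative assumptions.

```python
import numpy as np

def rmse_of_multimodel_mean(n_models, paradigm, sigma=1.0, n_rep=20_000, seed=1):
    """Monte Carlo RMSE of the multimodel mean as an estimate of the truth (toy setup)."""
    rng = np.random.default_rng(seed)
    if paradigm == "truth_plus_error":
        # Truth fixed at 0; each model = truth + independent error
        truth = np.zeros(n_rep)
        models = truth[:, None] + sigma * rng.standard_normal((n_rep, n_models))
    elif paradigm == "exchangeable":
        # Truth is one more draw from the distribution that generates the models
        draws = sigma * rng.standard_normal((n_rep, n_models + 1))
        truth, models = draws[:, 0], draws[:, 1:]
    else:
        raise ValueError(f"unknown paradigm: {paradigm}")
    error = models.mean(axis=1) - truth
    return float(np.sqrt(np.mean(error ** 2)))
```

With 100 models, the first paradigm gives an RMSE near 0.1·sigma, whereas the second stays near sigma, so adding models cannot remove the discrepancy between ensemble and truth.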

In this manuscript we introduce and apply a hierarchical model to seasonal and regional temperature averages from the ENSEMBLES project (van der Linden and Mitchell 2009). This project represents a large coordinated effort toward quantifying regional climate change over Europe, making use of observations and model output from 15 GCM–RCM chains; see section 2. The proposed hierarchical model, described in section 3, builds upon methods derived in Buser et al. (2009), Buser et al. (2010a), and Kerkhoff et al. (2014). Key extensions of the model include the following:

- The use of a hierarchical framework that allows for correlations between model chains. Such correlations are particularly relevant if the same RCM is driven by different GCMs (Buser et al. 2010a; A. Fischer et al. 2012). Our approach follows Kerkhoff et al. (2014) and is similar to that of Heaton et al. (2013), who introduced a Bayesian hierarchical analysis for the North American Regional Climate Change Assessment Program (Mearns et al. 2009). The hierarchical model exploits a simple linear relationship between RCMs and their driving GCMs, with coefficients that depend on both the RCM and the GCM component. This is a central assumption of our new model, and it is well supported by the data (see Fig. 1, left panel).
- The consideration of a transient setting rather than restricting attention to a control and a scenario period as in many previous studies (e.g., Buser et al. 2010a; A. Fischer et al. 2012; Heaton et al. 2013). This enables us to account for internal variability on multidecadal time scales. This type of variability is a major contributor to the overall uncertainty in climate projections (Hawkins and Sutton 2009; Deser et al. 2012) and may temporarily mask the effects of anthropogenic changes (i.e., global warming).
- The use of a fully Bayesian analysis rather than the frequentist estimates considered in Kerkhoff et al. (2014). This is essential because only a Bayesian analysis allows us to optimally separate bias changes from the climate signal and to quantify uncertainties in the form of probability distributions. For the Bayesian analysis, a parametric approach to estimating multidecadal variability is preferable to the nonparametric approach used in Kerkhoff et al. (2014). We therefore use a basis of spline functions to represent multidecadal variability.
- The provision of a thorough statistical validation of the out-of-sample predictive performance using independent observations from 1991 to 2012. Although validating the predictive performance on longer time horizons would be more desirable, we think that this short-term validation provides a valuable additional check of our method.
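The linear RCM–GCM relationship in the first item can be examined chain by chain with an ordinary least-squares fit. The sketch below assumes, for illustration, that the inputs are time series of seasonal regional means from one RCM and its driving GCM; the function name is hypothetical, and the fit is only a diagnostic of the kind shown in Fig. 1, not the hierarchical model itself.

```python
import numpy as np

def fit_rcm_on_gcm(gcm, rcm):
    """Least-squares fit rcm ≈ intercept + slope * gcm for one model chain.

    Returns the intercept, the slope, and the residual standard deviation,
    which indicates how well a straight line describes the relationship.
    """
    gcm = np.asarray(gcm, dtype=float)
    rcm = np.asarray(rcm, dtype=float)
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(gcm, rcm, deg=1)
    resid = rcm - (intercept + slope * gcm)
    return float(intercept), float(slope), float(resid.std(ddof=2))
```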

For our model, the use of all GCM and RCM simulations is essential. Kerkhoff et al. (2014) found that RCMs are able to compensate for some of the driving GCMs’ biases in mean temperature and interannual variability, even on scales that are well resolved by GCMs. This key result is evident from Fig. 1 (right panel), which shows that RCM increments or bias “adjustments” (i.e., RCM minus GCM) are anticorrelated with the GCM biases (GCM minus OBS). As a result, the overall biases of the model chains (i.e., RCM minus OBS) tend to be smaller than the GCM biases. While this result highlights the usefulness of model chains, it does not address another key question, namely whether RCMs provide information about how biases develop into the future. Our analysis makes no prior assumption on this question.
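The three quantities compared in Fig. 1 (right panel) are related by a simple identity: chain bias = GCM bias + RCM increment. The sketch below computes them for one observed value and several chains, together with the correlation between GCM biases and increments. The function name is hypothetical, and the example numbers mentioned afterward are invented to mimic the anticorrelation described above; they are not ENSEMBLES results.

```python
import numpy as np

def bias_decomposition(obs, gcm, rcm):
    """Split each chain's bias (RCM minus OBS) into GCM bias and RCM increment."""
    obs = float(obs)
    gcm = np.asarray(gcm, dtype=float)
    rcm = np.asarray(rcm, dtype=float)
    gcm_bias = gcm - obs      # GCM minus OBS
    increment = rcm - gcm     # RCM minus GCM, the bias "adjustment"
    chain_bias = rcm - obs    # RCM minus OBS = gcm_bias + increment
    r = float(np.corrcoef(gcm_bias, increment)[0, 1])
    return gcm_bias, increment, chain_bias, r
```

With obs = 10.0, GCM values [8.5, 9.5, 11.0, 12.0] and RCM values [9.6, 9.9, 10.5, 10.9], the correlation is negative and the mean absolute chain bias (0.475) is well below the mean absolute GCM bias (1.25).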

The remainder of this paper is organized as follows: Section 2 presents the data used, section 3 the methodology, section 4 the resulting climate change projections for four European regions, section 5 a sensitivity analysis using simpler statistical models or only parts of the available climate models, and section 6 an assessment of the predictive performance for the period 1991–2012. Section 7 concludes the study.

## 2. Data

We analyze 2-m temperatures from the ENSEMBLES project (van der Linden and Mitchell 2009; Hewitt and Griggs 2004). In this project, a suite of six GCMs with horizontal resolutions of 100–300 km and 11 RCMs with 25-km resolution provides transient simulations for the period 1961–2099. The simulations use observed greenhouse gas and aerosol concentrations up to 2000 and the SRES A1B scenario (Nakicenovic and Swart 2000) thereafter. Because of limited computational capacity, only 15 of the 66 possible RCM–GCM combinations were run in the ENSEMBLES project (see Table 1).

ENSEMBLES data matrix, with rows indicating the institution providing an RCM and columns indicating the driving GCM. Asterisks highlight the different subsets of the dataset that are used. A single asterisk indicates the subset used in Buser et al. (2010a) and in the rcm approach; here, only one RCM per GCM was used in order to reduce dependence between RCMs potentially arising from common driving GCMs. Two asterisks indicate the constrained ensemble from E. Fischer et al. (2012), which is also used in the con approach from section 5. This subensemble was found to have realistic present-day interannual variability and soil moisture distribution and, in turn, projects increases in interannual summer variability (E. Fischer et al. 2012). (Expansions of acronyms can be found at http://www.ametsoc.org/PubsAcronymList. HadQ0, HadQ3, and HadQ16 are configurations of HadCM3. BCM is the Bergen Climate Model. HCQ0, HCQ3, and HCQ16 are configurations of HadRM3.)

Besides projections from climate models, we use observational data from version 9.0 of the E-OBS gridded observations (Haylock et al. 2008). This archive spans the period 1950–2012, and the station data are already interpolated onto the native grid of the ENSEMBLES RCMs, which makes validating the RCMs particularly easy.

In a first postprocessing step, we aggregated these data seasonally [winter is December–February (DJF), spring is March–May (MAM), summer is June–August (JJA), and autumn is September–November (SON)] and regionally, using only data from land grid points, for eight European climate regions [see Table 2; see also Fig. 4 in Christensen and Christensen (2007) for an illustration]. Because of this aggregation, and because we analyze temperature, we assume that interpolation errors in the E-OBS data can be neglected.
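The seasonal and regional aggregation can be sketched in a few lines of NumPy. Two conventions here are our assumptions, since the text does not state them: the DJF mean of year y uses December of year y − 1 (so the first year has no complete winter and is set to NaN), and land grid points are averaged without area weighting. The array layout and the function name are hypothetical.

```python
import numpy as np

# Month indices with 0 = January; DJF is handled separately below
SEASON_MONTHS = {"MAM": [2, 3, 4], "JJA": [5, 6, 7], "SON": [8, 9, 10]}

def seasonal_regional_means(monthly, land_mask):
    """
    monthly:   array (n_years, 12, n_points) of monthly mean temperatures.
    land_mask: boolean array (n_points,) selecting the land points of one region.
    Returns a dict season -> array (n_years,) of regional seasonal means.
    """
    # Unweighted average over the selected land grid points -> (n_years, 12)
    regional = monthly[:, :, land_mask].mean(axis=2)
    out = {}
    # DJF: December of the previous year plus January and February of this year
    dec_prev = np.concatenate([[np.nan], regional[:-1, 11]])
    out["DJF"] = (dec_prev + regional[:, 0] + regional[:, 1]) / 3.0
    for name, months in SEASON_MONTHS.items():
        out[name] = regional[:, months].mean(axis=1)
    return out
```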

Latitudes and longitudes of the European climate regions considered in this study.

## 3. Methodology

In this section, we introduce and refine a hierarchical model that has been proposed in Kerkhoff et al. (2014). We then estimate its parameters using a Bayesian approach (see, e.g., Gelman et al. 2013). The foundation of this approach is the computation of the posterior density by combining the prior, which specifies information from experts or previous studies, with the likelihood, which specifies the information contained in the data. We first define the parameters of our statistical model, then specify the joint likelihood for the ENSEMBLES dataset and the prior, and finally describe how to explore the resulting posterior.
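As a minimal illustration of combining prior and likelihood, consider a single mean mu with known observational standard deviation sigma and a normal prior. This conjugate case has a closed-form posterior: the posterior precision is the prior precision plus n/sigma², and the posterior mean is the precision-weighted average of the prior mean and the data. It is a textbook example (see, e.g., Gelman et al. 2013), not the hierarchical model of this paper, whose posterior has to be explored by sampling; the function name is hypothetical.

```python
import numpy as np

def normal_posterior(y, sigma, prior_mean, prior_sd):
    """Posterior mean and sd of mu, given y_i ~ N(mu, sigma^2) iid and mu ~ N(prior_mean, prior_sd^2)."""
    y = np.asarray(y, dtype=float)
    # Precisions add: prior precision plus the precision contributed by n observations
    prec = 1.0 / prior_sd ** 2 + y.size / sigma ** 2
    # Precision-weighted combination of prior information and data
    mean = (prior_mean / prior_sd ** 2 + y.sum() / sigma ** 2) / prec
    return float(mean), float(1.0 / np.sqrt(prec))
```

For y = [1, 2, 3], sigma = 1 and a N(0, 1) prior, the posterior is N(1.5, 0.5²); with a very wide prior the posterior mean approaches the sample mean 2, which is the sense in which weakly informative priors let the data dominate.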

### a. Notation

Because of the aforementioned data aggregation (see section 2) we assume independence between different seasons and regions. We therefore analyze each season and region separately, which simplifies notation. We denote by *t*, respectively. For the observations we have data spanning the period 1961–2012. However, we only incorporate the first

### b. Hierarchical modeling of the ENSEMBLES dataset

In this subsection we present the basics of the hierarchical model, then section 3c documents the specification of time-varying parameters, section 3d discusses underlying bias assumptions, and finally section 3e explains how to estimate the parameters in a Bayesian setting.

*t*aswhere

*g*in year

*t*is written aswhere

*t*intoand the GCM standard deviation in year

*t*intoWhile the systematic biases

*t*given its driving GCM

*g*can be written aswhere

### c. Separation of the climatological signal and multidecadal variations

The components

*c*and

*β*terms denoting the unknown parameters of the representation. In particular,

### d. Other time-dependent parameters

In our approach, the variance parameters

### e. Bias assumptions

The main parameters of interest are

*κ*, we now discuss a number of special cases. If

If *κ* denotes an unknown bias parameter between 0 and 1, we have a compromise between constant bias and constant relation as used in Buser et al. (2010a).

Additional flexibility is obtained if we replace the hard constraints *κ* is unknown, except for the restriction to the interval *κ* is in the model or not. However, the posterior distribution of the mean values

### f. The likelihood

Parameters of the Bayesian hierarchical model, their meaning, and the associated equation numbers for their definitions.

### g. Specification of the prior

In the Bayesian approach, we need to attach prior distributions to all unknown parameters. For computational convenience, we take normal priors for all mean and regression parameters, and instead of the variance parameters we work with their inverses, called precisions, for which we use gamma priors (see, e.g., Gelman et al. 2013). Note that the reparameterization in terms of precisions is for computational reasons only, and does not affect the outcome of our analysis. For the bias parameter *κ*, we choose a uniform distribution on [0, 1].

Prior distributions have hyperparameters that need to be specified. For normal priors, we have to choose means and variances, and for the gamma distributions we need to specify shape and rate values. Hyperparameter values for the prior distributions of the hierarchical model parameters are given in Table 4.

Specification of the prior distribution for the parameters of the Bayesian hierarchical model. The first column indicates the parameter considered, the second column its prior distribution, the third and fourth columns are the associated hyperparameter values, and the last column indicates an interval containing 95% probability a priori for each specification, with limits rounded to one decimal.
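The last column of Table 4 reports intervals containing 95% prior probability. For a gamma prior on a precision tau, such an interval can be translated into one for the standard deviation 1/sqrt(tau). The Monte Carlo sketch below assumes the shape–rate parameterization used in the text; NumPy's `Generator.gamma` expects a scale, so 1/rate is passed. The function name is hypothetical, and any shape and rate values plugged in are placeholders, not the values of Table 4.

```python
import numpy as np

def prior_interval_sd(shape, rate, level=0.95, n_draws=200_000, seed=0):
    """Central prior interval for sigma = 1 / sqrt(tau) when tau ~ Gamma(shape, rate)."""
    rng = np.random.default_rng(seed)
    tau = rng.gamma(shape, 1.0 / rate, size=n_draws)  # NumPy: scale = 1 / rate
    alpha = (1.0 - level) / 2.0
    tau_lo, tau_hi = np.quantile(tau, [alpha, 1.0 - alpha])
    # sigma decreases as tau increases, so the interval endpoints swap
    return float(1.0 / np.sqrt(tau_hi)), float(1.0 / np.sqrt(tau_lo))
```

Because the transformation is monotone, transforming the precision quantiles gives the standard-deviation quantiles directly; a wide interval signals a weakly informative prior.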

We use weakly informative priors where possible; they carry little information and thus the posterior is mainly determined by the data. However, for those parameters associated with multidecadal variability (

Furthermore, we assume that the parameters of our hierarchical model are a priori independent. In this way, our formulation of the statistical model in section 3 amounts to using the “truth plus error” assumption for the climate signal and an exchangeability assumption for multidecadal variations. If we wanted to also incorporate the exchangeability assumption for the signal, or a more general framework as in Chandler (2013), we would have to introduce correlations in the prior for

### h. Exploring the posterior

In a Bayesian analysis, conclusions about the parameters are based on the posterior distribution. According to Bayes’ formula, the joint posterior of all model parameters given the data is proportional to the prior times the likelihood of the data. Hence, in principle, the posterior is known. This, however, is of little practical help since computing the marginal posterior distributions of, for example,

We employ a Gibbs sampler with random-walk Metropolis steps since some of the so-called full conditional distributions (e.g., the one for
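To illustrate the sampling scheme, the sketch below runs Metropolis-within-Gibbs on a deliberately simple toy model, y_i ~ N(mu + kappa·x_i, 1/tau), with a normal prior on mu, a gamma prior on tau, and a uniform prior on [0, 1] for kappa. Here mu and tau have closed-form full conditionals and are drawn directly (Gibbs steps), while kappa is updated by a random-walk Metropolis step; proposals outside [0, 1] are rejected because their prior density is zero. The toy model, the priors, the step size, and the function name are our assumptions and are far simpler than the hierarchical model of section 3.

```python
import numpy as np

def metropolis_within_gibbs(y, x, n_iter=4000, step=0.1, seed=0):
    """Toy sampler for y_i ~ N(mu + kappa * x_i, 1 / tau) with kappa ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size
    m0, v0 = 0.0, 100.0   # hypothetical N(m0, v0) prior for mu
    a0, b0 = 1.0, 1.0     # hypothetical Gamma(shape=a0, rate=b0) prior for tau
    mu, tau, kappa = 0.0, 1.0, 0.5

    def log_lik(k, mu, tau):
        e = y - mu - k * x
        return -0.5 * tau * (e @ e)

    draws = np.empty((n_iter, 3))
    for it in range(n_iter):
        # Gibbs step: mu | tau, kappa, y is normal
        prec = 1.0 / v0 + n * tau
        mean = (m0 / v0 + tau * np.sum(y - kappa * x)) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))
        # Gibbs step: tau | mu, kappa, y is gamma (NumPy uses scale = 1 / rate)
        e = y - mu - kappa * x
        tau = rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * (e @ e)))
        # Random-walk Metropolis step for kappa; zero prior density outside [0, 1]
        prop = kappa + step * rng.normal()
        if 0.0 <= prop <= 1.0:
            log_ratio = log_lik(prop, mu, tau) - log_lik(kappa, mu, tau)
            if np.log(rng.uniform()) < log_ratio:
                kappa = prop
        draws[it] = (mu, tau, kappa)
    return draws
```

In practice, the first part of the chain would be discarded as burn-in and convergence checked, for example with multiple chains.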

## 4. Climate change projections

In this section we present the results for the main parameters of interest of our statistical model (i.e., the climatological mean

First, we consider temperature changes for 2060 and 2085 with respect to the year 2000. Posterior densities of these changes are given in Fig. 3 and in Fig. S1 of the supplementary material. These results are consistent with earlier model analyses and show a pronounced positive trend in the climatological signal for Europe. However, the posterior uncertainty increases further into the future. Additionally, a latitudinal and seasonal dependence is noticeable in Fig. 3: in summer (winter), the strongest increases are found in southern (northern) Europe (cf. MD with SC). This is in line with the climate research literature (IPCC 2013).

To assess the results of our Bayesian analysis, Fig. 3 and Fig. S1 also show estimates of the same temperature changes based on a simpler model applied to individual GCM and RCM outputs, indicated by pluses and dashes, respectively. These estimates are obtained by taking the difference between local linear fits to 29-yr GCM and RCM time slices centered on the respective years. Because temperature changes on these scales are dominated by anthropogenic greenhouse gas forcing rather than internal climate variability [see the comment by Laprise (2014)], a smoothing window of 29 yr should be adequate to extract climatological signals without being affected by multidecadal variability. By taking differences, these estimates implicitly make the constant bias assumption. Usually, the posterior density covers the range of the estimates from the simpler model. For the Mediterranean region in winter, however, this is not the case: here, the posterior density is shifted toward somewhat smaller temperature increases. The reason is that our Bayesian analysis attributes part of the temperature change signal found in the climate models to a bias that is not constant.
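The simpler estimates can be sketched as follows: fit a straight line to the 29-yr slice centered on each year of interest, evaluate the fit at the center year, and take the difference. Centering the years before fitting makes the fitted intercept equal to the value at the center year. The function names are hypothetical, and the sketch assumes one value per year, no gaps, and an odd window length.

```python
import numpy as np

def local_linear_value(years, values, center, window=29):
    """Value at `center` of a least-squares line fitted to the `window`-yr slice centered on it."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    half = window // 2
    sel = (years >= center - half) & (years <= center + half)
    if sel.sum() != window:
        raise ValueError("time slice is incomplete or extends beyond the available years")
    # Fit in years relative to the center, so the intercept is the fitted value at the center
    slope, intercept = np.polyfit(years[sel] - center, values[sel], deg=1)
    return float(intercept)

def simple_change_estimate(years, values, ref=2000, target=2060, window=29):
    """Difference of local linear fits: implicitly a constant-bias estimate of the change."""
    return (local_linear_value(years, values, target, window)
            - local_linear_value(years, values, ref, window))
```

Because the same model's output enters both terms, any constant model bias cancels in the difference, which is why such estimates correspond to the constant bias assumption.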

Next, we show in Fig. 4 and Fig. S2 of the supplementary material the posterior densities of the changes in interannual variability, defined as the ratio

In winter (as in spring and autumn; see Fig. S2), changes in interannual variability are small; the posterior density of

In contrast to the climatological mean, the posterior densities of the change in interannual variability do not always span the entire range of the estimates from a simpler model (e.g., the Alps in summer). This indicates that our Bayesian analysis identifies models with very large interannual variability as outliers and downweights them in the estimation of

*t*= 2000, 2060, and 2085; see Fig. 5 and Fig. S7 of the supplementary material. These densities represent the uncertainty about the seasonal and regional average temperature in these years, given the observations and the multimodel ensemble. They are obtained by integrating out the unknown mean

## 5. Sensitivity study

There is no consensus on the likely trends in interannual summer variability in the ENSEMBLES data archive. While some studies find virtually no increase in variability (Buser et al. 2010a), our analysis, as well as other studies that use a subset of the entire ensemble (E. Fischer et al. 2012), shows a robust increase. This section aims to identify the reasons for these differences. To this end, we have set up and fitted a suite of five alternative statistical models that differ from our hierarchical model (presented in section 3 and hereafter denoted hbm) in formulation or data usage.

The first alternative is a reduction of hbm that does not contain multidecadal variability, that is, with

The third alternative focuses on the GCM part of the hbm (i.e., only the GCMs are used whereas all RCMs are omitted). The distribution of the observations and of the GCM outputs is unchanged. As for hbm, we neglect any correlations between GCMs that might occur due to shared knowledge, code, etc., across institutions. This approach is called gcm.

The fourth alternative (hereafter rcm) is similar to con, but it is based on a different subset of RCM simulations, namely the one used in Buser et al. (2010a) (see Table 1). This subset consists of six different RCMs driven by six different GCMs, chosen to avoid potential correlations between models.

As described in section 3, the hbm allows for bias changes *κ* with values between 0 and 1 so that a continuous interpolation between constant bias and constant relation is possible.

Posterior densities of the change in interannual variability are given in Fig. 6 and Fig. S8 of the supplementary material. For comparison, we include the geometric mean of the frequentist estimates from Fig. 4, computed over the subset of the data used by each model.
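The geometric mean is a natural summary for ratios because a halving and a doubling cancel, whereas the arithmetic mean of 0.5 and 2 would suggest an increase of 25%. A minimal sketch, assuming each model contributes one positive ratio estimate (the function name is hypothetical):

```python
import numpy as np

def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed on the log scale for stability."""
    r = np.asarray(ratios, dtype=float)
    if np.any(r <= 0):
        raise ValueError("ratios must be positive")
    return float(np.exp(np.mean(np.log(r))))
```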

In winter, differences between the approaches are usually minor, especially among the hbm, the variant without multidecadal variability (hbm w/o mdv), and nbc, which omits bias changes.

In summer, however, the rcm approach, which relies on the methodology derived in Buser et al. (2010a), projects no increase in interannual variability for southern Europe, in contrast to our hbm (cf. red with dark blue boxplots at the bottom of Fig. 6 and Fig. S8). These differences stem from different data sources and their statistical treatment. For the rcm, a subset of RCMs (those that do not share a common driver) is used to determine the posterior distribution of the change in interannual variability. In contrast, in our hierarchical analysis this change depends essentially on the driving GCMs. Because the ENSEMBLES GCMs project a robust increase in southern Europe (see the light blue boxplots), our hbm projects such increases as well.

More importantly, however, the hbm arrives at the same conclusion as con, which uses the subjectively constrained ensemble from E. Fischer et al. (2012). This constrained ensemble was chosen to ensure realistic interannual variability in the present-day climate. The hbm uses this information implicitly but objectively, and it is reassuring that it projects a similar increase in interannual summer variability.

This also indicates that the choice of models in Buser et al. (2010a) was unfortunate, since a set of models with unrealistic present-day interannual variability was chosen to derive climate change projections. Biases in interannual summer variability stem from soils that become too dry to sustain evaporation (E. Fischer et al. 2012; Bellprat et al. 2013). Likewise, the projected increases in summer variability are interpreted as resulting from the large-scale summer drying of the European continent (Seneviratne et al. 2006; Mueller and Seneviratne 2012). There is thus a physical linkage between the representation of present-day variability and projected changes in variability, and this linkage supports the formulation of our model, which entails a corresponding linkage through the constant relation approach.

We have also analyzed the sensitivity of the change in *κ* that are closer to 1, favoring the constant relation assumption. In contrast, our original analysis (hbm) slightly prefers constant bias (*κ* closer to 0). As already stated in Buser et al. (2009), under constant relation, parts of the climate change signal are attributed to biases, such that the resulting climate change signal is smaller than under constant bias.

## 6. Assessment of the predictive performance

In addition to the sensitivity study, we have assessed the out-of-sample predictive performance of the six statistical models from section 5. For this assessment we use the observations from the period 1991–2012. Since these observations were not used for model fitting, the comparison is genuinely out of sample, and more complex models gain no unfair advantage.

This step is important in any statistical analysis, and it is particularly easy in our Bayesian analysis because it provides predictive distributions of the observed temperatures in each year *t* between 1991 and 2012. For a definition of the posterior predictive distribution, see Eq. (18) and the accompanying text. Similar to Salazar et al. (2011), we use the continuous ranked probability score (CRPS), which assesses both the calibration and the sharpness of the predictive distribution (Matheson and Winkler 1976). The CRPS is expressed in degrees Celsius, and lower values indicate better predictive performance. Because we approximate the posterior by sampling, the predictive distribution becomes a Gaussian mixture, and we can compute the CRPS for each year of the validation period according to Eqs. (5) and (6) of Grimit et al. (2006). Finally, these values are averaged over the period and reported in Table 5.
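For a Gaussian mixture, the CRPS has a closed form based on the identity CRPS(F, y) = E|X − y| − ½E|X − X′| and on the expected absolute value of a normal variable, E|Z| = 2σφ(μ/σ) + μ[2Φ(μ/σ) − 1] for Z ~ N(μ, σ²); this is the type of expression given by Grimit et al. (2006). The sketch below is our own implementation of that identity, not the authors' code; the function names are hypothetical, it assumes strictly positive component standard deviations, and its pairwise term costs O(M²) in the number of mixture components M, so thinning a long posterior sample first may be advisable.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def _expected_abs_normal(mu, var):
    """E|Z| for Z ~ N(mu, var), elementwise; requires var > 0."""
    sd = np.sqrt(var)
    z = mu / sd
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return 2.0 * sd * pdf + mu * (2.0 * cdf - 1.0)

def crps_gaussian_mixture(y, weights, means, sds):
    """CRPS of the mixture sum_i w_i N(means_i, sds_i^2) at observation y (same units as y)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    m = np.asarray(means, dtype=float)
    v = np.asarray(sds, dtype=float) ** 2
    # E|X - y| with X drawn from the mixture
    term1 = np.sum(w * _expected_abs_normal(y - m, v))
    # 0.5 * E|X - X'| with X, X' independent draws from the mixture
    pair = _expected_abs_normal(m[:, None] - m[None, :], v[:, None] + v[None, :])
    term2 = 0.5 * np.sum(np.outer(w, w) * pair)
    return float(term1 - term2)
```

A mean score over the validation period is then simply the average of `crps_gaussian_mixture` over the years 1991–2012; for a single standard normal component and y = 0, the function reproduces the known value 2φ(0) − 1/√π ≈ 0.234.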

Mean CRPS values (°C) over the period 1991–2012 for different Bayesian models: our standard approach presented in this manuscript (hbm), an equivalent model without multidecadal variability (hbm w/o mdv), a Bayesian analysis of the constrained ensemble from E. Fischer et al. (2012) (con), a Bayesian model that uses only the ENSEMBLES GCMs (gcm), a Bayesian model for the RCMs from Buser et al. (2010a) (rcm), and a reduction of the hbm that neglects bias changes (nbc).

Differences between our hierarchical model (hbm) and its alternatives (hbm w/o mdv, con, gcm, rcm, and nbc) are usually small (compare column 3 with columns 4–8 of Table 5 and Table S1 of the supplementary material). For the Alps and the Mediterranean region in summer, however, there are larger differences between the hbm approach and the one without multidecadal variations (hbm w/o mdv; compare the third and fourth columns of Table 5). Here, incorporating multidecadal variations improves the predictive performance. This gain can be attributed to the underlying functional form of the observed time series (see Fig. 9 and Fig. S11 of the supplementary material). Since the inclusion of the basis functions

A shortcoming of this validation exercise is that it is based on short-term projection horizons only, whereas the main climatological interest lies in longer time scales. Hence, it is not meant to select or discard statistical models for climate projections; rather, it serves as a sanity check for the hbm. However, we think that using the CRPS in combination with long historical simulations and observational time series could help in assessing probabilistic climate change projections from different statistical models.

## 7. Conclusions

We have implemented a Bayesian hierarchical statistical model that enables a coherent treatment of climate model results from heterogeneous RCM–GCM multimodel ensembles. Our approach is motivated by and builds upon findings of Buser et al. (2010a) and Kerkhoff et al. (2014).

Using our Bayesian hierarchical model, we derive climate change projections from the ENSEMBLES dataset. Our analyses confirm previous studies (Buser et al. 2010a) that find a robust increase in seasonal mean temperatures for all regions by the end of the twenty-first century. For interannual summer variability, however, our results, in contrast to other studies (Buser et al. 2010a), also indicate a distinct positive trend. In this way, our findings support the conclusions drawn in E. Fischer et al. (2012), but without relying on expert information. The physical interpretation links this trend to soil moisture dynamics, which reinforces the importance of representing land–atmosphere interactions for simulating interannual summer variability.

Furthermore, our results stress the importance of bias assumptions for the outcome of climate change studies. Many assessments neglect bias changes; however, our analysis favors the most general model, which includes bias changes and a compromise between the “constant bias” and the “constant relation” assumptions. Moreover, the anticipated climate change signal depends strongly on the chosen bias assumption, in particular in central and southern Europe during summer. Here, neglecting bias changes leads to weaker warming than an approach that includes bias changes.

Additionally, we investigate the role of internal decadal variability in the outcome of our analysis. Using an orthogonal spline basis, we are able to separate multidecadal variability from climate change signals. In this way, our approach is a fully Bayesian extension of the work of A. Fischer et al. (2012). Assessing the out-of-sample predictive performance with the continuous ranked probability score (CRPS; see Matheson and Winkler 1976), we show that the proposed hierarchical model is competitive and that the inclusion of multidecadal variability can improve near-term projections of climate in southern Europe.

Furthermore, our analysis has important consequences for the design of future RCM–GCM multimodel ensembles. Because the gain in predictive power when going from independent simulations (exactly one GCM drives one RCM) to the full data matrix (where sometimes many RCMs are driven by the same GCM; see, e.g., the ECHAM GCM in Table 1) is rather marginal (cf. the third and seventh columns of Table 5), priority should be given to setting up many independent RCM–GCM simulations, using a large suite of driving GCMs with different RCMs.

## Acknowledgments

This study was funded by the Swiss National Science Foundation under Grant 200021-132316/1. We acknowledge the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com) for providing GCM and RCM simulations and the E-OBS observational datasets, as well as the data providers in the ECA&D project (http://www.ecad.eu). Constructive comments from two anonymous referees are also acknowledged.

## APPENDIX

### Construction of Basis Functions Used in Section 3c

The construction of the basis functions *n* interior knots can be written as linear combinations of *n*, the more flexible the splines are, but the price to pay is an increased estimation error due to the increasing number of unknown parameters.

In section 3c, we use natural splines with equispaced knots at

Our choice of basis functions implies that

The remaining four basis function
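A natural cubic spline basis with given knots can be built with the truncated-power construction found in standard texts (e.g., Hastie, Tibshirani, and Friedman, *The Elements of Statistical Learning*, section 5.2.1) and then orthonormalized. The sketch below uses a QR decomposition for the orthonormalization; the knot positions, the rescaling of years to [0, 1], the use of QR, and the function names are our assumptions, since the exact construction used in this paper is not recoverable from the text above.

```python
import numpy as np

def natural_cubic_spline_basis(x, knots):
    """
    Natural cubic spline basis (truncated-power form) evaluated at x.
    With K knots it returns K columns: 1, x, and K - 2 functions that are
    zero left of the first knot and linear beyond the last knot.
    """
    x = np.asarray(x, dtype=float)
    xi = np.sort(np.asarray(knots, dtype=float))
    K = xi.size
    if K < 3:
        raise ValueError("need at least three knots")

    def d(k):
        # d_k(x) = [(x - xi_k)_+^3 - (x - xi_K)_+^3] / (xi_K - xi_k)
        num = np.maximum(x - xi[k], 0.0) ** 3 - np.maximum(x - xi[-1], 0.0) ** 3
        return num / (xi[-1] - xi[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

def orthonormal_spline_basis(x, knots):
    """Orthonormalize the basis columns over the sample points with a QR decomposition."""
    q, _ = np.linalg.qr(natural_cubic_spline_basis(x, knots))
    return q
```

For example, with the years 1961–2099 rescaled to [0, 1] and six equispaced knots (a placeholder choice), the orthonormal basis has six columns whose span contains the constant and the linear trend; more knots give more flexible splines at the cost of more parameters to estimate.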

## REFERENCES

Annan, J. D., and J. C. Hargreaves, 2010: Reliability of the CMIP3 ensemble. *Geophys. Res. Lett.*, **37**, L02703, doi:10.1029/2009GL041994.

Bellprat, O., S. Kotlarski, D. Lüthi, and C. Schär, 2013: Physical constraints for temperature biases in climate models. *Geophys. Res. Lett.*, **40**, 4042–4047, doi:10.1002/grl.50737.

Boberg, F., and J. H. Christensen, 2012: Overestimation of Mediterranean summer temperature projections due to model deficiencies. *Nat. Climate Change*, **2**, 433–436, doi:10.1038/nclimate1454.

Buser, C. M., H. R. Künsch, D. Lüthi, M. Wild, and C. Schär, 2009: Bayesian multi-model projection of climate: Bias assumptions and interannual variability. *Climate Dyn.*, **33**, 849–868, doi:10.1007/s00382-009-0588-6.

Buser, C. M., H. R. Künsch, and C. Schär, 2010a: Bayesian multi-model projections of climate: Generalization and application to ENSEMBLES results. *Climate Res.*, **44**, 227–241, doi:10.3354/cr00895.

Buser, C. M., H. R. Künsch, and A. Weber, 2010b: Biases and uncertainty in climate projections. *Scand. J. Stat.*, **37**, 179–199, doi:10.1111/j.1467-9469.2009.00686.x.

Chandler, R. E., 2013: Exploiting strength, discounting weakness: Combining information from multiple climate simulators. *Philos. Trans. Roy. Soc. London*, **371A**, 20120388, doi:10.1098/rsta.2012.0388.

Christensen, J. H., and O. B. Christensen, 2007: A summary of the PRUDENCE model projections of changes in European climate by the end of this century. *Climatic Change*, **81**, 7–30, doi:10.1007/s10584-006-9210-7.

Christensen, J. H., and F. Boberg, 2012: Temperature dependent climate projection deficiencies in CMIP5 models. *Geophys. Res. Lett.*, **39**, L24705, doi:10.1029/2012GL053650.

Déqué, M., and S. Somot, 2010: Weighted frequency distributions express modelling uncertainties in the ENSEMBLES regional climate experiments. *Climate Res.*, **44**, 195–209, doi:10.3354/cr00866.

Deser, C., R. Knutti, S. Solomon, and A. S. Phillips, 2012: Communication of the role of natural variability in future North American climate. *Nat. Climate Change*, **2**, 775–779, doi:10.1038/nclimate1562.

Fischer, A. M., A. P. Weigel, C. M. Buser, R. Knutti, H. R. Künsch, M. A. Liniger, C. Schär, and C. Appenzeller, 2012: Climate change projections for Switzerland based on a Bayesian multi-model approach. *Int. J. Climatol.*, **32**, 2348–2371, doi:10.1002/joc.3396.

Fischer, E. M., J. Rajczak, and C. Schär, 2012: Changes in European summer temperature variability revisited. *Geophys. Res. Lett.*, **39**, L19702, doi:10.1029/2012GL052730.

Furrer, R., S. R. Sain, D. Nychka, and G. A. Meehl, 2007: Multivariate Bayesian analysis of atmosphere–ocean general circulation models.

,*Environ. Ecol. Stat.***14**, 249–266, doi:10.1007/s10651-007-0018-z.Furrer, R., , S. Geinitz, , and S. R. Sain, 2012: Assessing variance components of general circulation model output fields.

,*Environmetrics***23**, 440–450, doi:10.1002/env.2139.Geinitz, S., , R. Furrer, , and S. R. Sain, 2015: Bayesian multilevel analysis of variance for relative comparison across sources of global climate model variability.

,*Int. J. Climatol.***35,**433–443, doi:10.1002/joc.3991.Gelman, A., , J. B. Carlin, , H. S. Stern, , D. B. Dunson, , A. Vehtari, , and D. B. Rubin, 2013:

3rd ed. CRC Press, 675 pp.*Bayesian Data Analysis.*Gilks, W. R., , S. Richardson, , and D. J. Spiegelhalter, 1996:

Chapman and Hall, 512 pp.*Markov Chain Monte Carlo in Practice.*Giorgi, F., , and L. O. Mearns, 2002: Calculation of average, uncertainty range, and reliability of regional climate changes from AOGCM simulations via the “reliability ensemble averaging” (REA) method.

*J. Climate,***15,**1141–1158, doi:10.1175/1520-0442(2002)015<1141:COAURA>2.0.CO;2.Giorgi, F., , C. Jones, , and G. R. Asrar, 2009: Addressing climate information needs at the regional level: The CORDEX framework.

,*WMO Bull.***58**, 175–183.Greene, A. M., , L. Goddard, , and U. Lall, 2006: Probabilistic multimodel regional temperature change projections.

,*J. Climate***19**, 4326–4343, doi:10.1175/JCLI3864.1.Grimit, E. P., , T. Gneiting, , V. J. Berrocal, , and N. A. Johnson, 2006: The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification.

,*Quart. J. Roy. Meteor. Soc.***132**, 2925–2942, doi:10.1256/qj.05.235.Hastie, T., , R. Tibshirani, , and J. Friedman, 2009:

*The Elements of Statistical Learning—Data Mining, Inference and Prediction*. 2nd ed. Springer-Verlag, 745 pp.Hawkins, E., , and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions.

,*Bull. Amer. Meteor. Soc.***90**, 1095–1107, doi:10.1175/2009BAMS2607.1.Haylock, M. R., , N. Hofstra, , A. M. G. Klein Tank, , E. J. Klok, , P. D. Jones, , and M. New, 2008: A European daily high-resolution gridded dataset of surface temperature and precipitation for 1950–2006.

,*J. Geophys. Res.***113**, D20119, doi:10.1029/2008JD010201.Heaton, M., , T. Greasby, , and S. R. Sain, 2013: Modeling uncertainty in climate using ensembles of regional and global climate models and multiple observation-based data sets.

*SIAM/ASA J. Uncertainty Quantif.,***1**(1), 535–559, doi:10.1137/12088505X.Hewitt, C. D., , and D. J. Griggs, 2004: Ensembles-based predictions of climate changes and their impacts.

,*Eos, Trans. Amer. Geophys. Union***85**, 566, doi:10.1029/2004EO520005.Ho, C. K., , D. B. Stephenson, , M. Collins, , C. A. T. Ferro, , and S. J. Brown, 2012: Calibration strategies: A source of additional uncertainty in climate change projections.

,*Bull. Amer. Meteor. Soc.***93**, 21–26, doi:10.1175/2011BAMS3110.1.IPCC, 2013:

*Climate Change 2013: The Physical Science Basis.*Cambridge University Press, 1535 pp., doi:10.1017/CBO9781107415324.Jun, M., , R. Knutti, , and D. W. Nychka, 2008: Spatial analysis to quantify numerical model bias and dependence.

,*J. Amer. Stat. Assoc.***103**, 934–947, doi:10.1198/016214507000001265.Kang, E. L., , N. Cressie, , and S. R. Sain, 2012: Combining outputs from the North American Regional Climate Change Assessment Program by using a Bayesian hierarchical model.

,*J. Roy. Stat. Soc.***61C**, 291–313, doi:10.1111/j.1467-9876.2011.01010.x.Kaufman, C. G., , and S. R. Sain, 2010: Bayesian functional ANOVA modeling using Gaussian process prior distributions.

,*Bayesian Anal.***5**, 123–149, doi:10.1214/10-BA505.Kerkhoff, C., , H. R. Künsch, , and C. Schär, 2014: Assessment of bias assumptions for climate models.

,*J. Climate***27**, 6799–6818, doi:10.1175/JCLI-D-13-00716.1.Knutti, R., , R. Furrer, , C. Tebaldi, , J. Cermak, , and G. A. Meehl, 2010: Challenges in combining projections from multiple climate models.

,*J. Climate***23**, 2739–2758, doi:10.1175/2009JCLI3361.1.Laprise, R., 2014: Comment on “The added value to global model projections of climate change by dynamical downscaling: A case study over the continental U.S. using the GISS-ModelE2 and WRF models” by Racherla et al.

,*J. Geophys. Res. Atmos.***119**, 3877–3881, doi:10.1002/2013JD019945.Lenderink, G., 2010: Exploring metrics of extreme daily precipitation in a large ensemble of regional climate model simulations.

,*Climate Res.***44**, 151–166, doi:10.3354/cr00946.Maraun, D., 2012: Nonstationarities of regional climate model biases in European seasonal mean temperature and precipitation sums.

,*Geophys. Res. Lett.***39**, L06706, doi:10.1029/2012GL051210.Masson, D., , and R. Knutti, 2011: Climate model genealogy.

,*Geophys. Res. Lett.***38**, L08703, doi:10.1029/2011GL046864.Matheson, J. E., , and R. L. Winkler, 1976: Scoring rules for continuous probability distributions.

,*Manage. Sci.***22**, 1087–1096, doi:10.1287/mnsc.22.10.1087.Mearns, L. O., , W. J. Gutowski, , R. Jones, , L.-Y. Leung, , S. McGinnis, , A. M. B. Nunes, , and Y. Qian, 2009: A regional climate change assessment program for North America.

,*Eos, Trans. Amer. Geophys. Union***90**, 311, doi:10.1029/2009EO360002.Mueller, B., , and S. I. Seneviratne, 2012: Hot days induced by precipitation deficits at the global scale.

,*Proc. Natl. Acad. Sci. USA***109**, 12 398–12 403, doi:10.1073/pnas.1204330109.Nakicenovic, N., , and R. Swart, Eds., 2000:

*Special Report on Emission Scenarios.*Cambridge University Press, 570 pp.Nychka, D., , and C. Tebaldi, 2003: Comments on “Calculation of average, uncertainty range, and reliability of regional climate changes from AOGCM simulations via the ‘reliability ensemble averaging’ (REA) method.”

,*J. Climate***16**, 883–884, doi:10.1175/1520-0442(2003)016<0883:COCOAU>2.0.CO;2.Piani, C., , J. O. Haerter, , and E. Coppola, 2010: Statistical bias correction for daily precipitation in regional climate models over Europe.

,*Theor. Appl. Climatol.***99**, 187–192, doi:10.1007/s00704-009-0134-9.Rougier, J., , M. Goldstein, , and L. House, 2013: Second-order exchangeability analysis for multimodel ensembles.

,*J. Amer. Stat. Assoc.***108**, 852–863, doi:10.1080/01621459.2013.802963.Sain, S. R., , R. Furrer, , and N. Cressie, 2011: A spatial analysis of multivariate output from regional climate models.

,*Ann. Appl. Stat.***5**, 150–175, doi:10.1214/10-AOAS369.Salazar, E., , B. Sansó, , A. O. Finley, , D. Hammerling, , I. Steinsland, , X. Wang, , and P. Delamater, 2011: Comparing and blending regional climate model predictions for the American Southwest.

,*J. Agric. Biol. Environ. Stat.***16**, 586–605, doi:10.1007/s13253-011-0074-6.Sanderson, B. M., , and R. Knutti, 2012: On the interpretation of constrained climate model ensembles.

,*Geophys. Res. Lett.***39**, L16708, doi:10.1029/2012GL052665.Schär, C., , P. L. Vidale, , D. Lüthi, , C. Frei, , C. Häberli, , M. A. Liniger, , and C. Appenzeller, 2004: The role of increasing temperature variability for European summer heat waves.

,*Nature***427**, 332–336, doi:10.1038/nature02300.Scinocca, J. F., , D. B. Stephenson, , T. C. Bailey, , and J. Austin, 2010: Estimates of past and future ozone trends from multimodel simulations using a flexible smoothing spline methodology.

,*J. Geophys. Res.***115**, D00M12, doi:10.1029/2009JD013622.Seneviratne, S. I., , D. Lüthi, , M. Litschi, , and C. Schär, 2006: Land–atmosphere coupling and climate change in Europe.

,*Nature***443**, 205–209, doi:10.1038/nature05095.Smith, R. L., , C. Tebaldi, , D. Nychka, , and L. O. Mearns, 2009: Bayesian modeling of uncertainty in ensembles of climate models.

,*J. Amer. Stat. Assoc.***104**, 97–116, doi:10.1198/jasa.2009.0007.Stephenson, D. B., , M. Collins, , J. C. Rougier, , and R. E. Chandler, 2012: Statistical problems in the probabilistic prediction of climate change.

,*Environmetrics***23**, 364–372, doi:10.1002/env.2153.Taylor, K. E., , R. J. Stouffer, , and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design.

,*Bull. Amer. Meteor. Soc.***93**, 485–498, doi:10.1175/BAMS-D-11-00094.1.Tebaldi, C., , and R. Knutti, 2007: The use of the multi-model ensemble in probabilistic climate projections.

,*Philos. Trans. Roy. Soc. London***365A**, 2053–2075, doi:10.1098/rsta.2007.2076.Tebaldi, C., , and B. Sansó, 2009: Joint projections of temperature and precipitation change from multiple climate models: A hierarchical Bayesian approach.

,*J. Roy. Stat. Soc.***172A**, 83–106, doi:10.1111/j.1467-985X.2008.00545.x.Tebaldi, C., , R. L. Smith, , D. Nychka, , and L. O. Mearns, 2005: Quantifying uncertainty in projections of regional climate change: A Bayesian approach to the analysis of multimodel ensembles.

,*J. Climate***18**, 1524–1540, doi:10.1175/JCLI3363.1.van der Linden, P., , and J. F. B. Mitchell, 2009: ENSEMBLES: Climate change and its impacts: Summary of research and results from the ENSEMBLES project. Met Office Hadley Centre Tech. Rep., 160 pp.

Vidale, P. L., , D. Lüthi, , R. Wegmann, , and C. Schär, 2007: European summer climate variability in a heterogeneous multi-model ensemble.

,*Climatic Change***81**, 209–232, doi:10.1007/s10584-006-9218-z.Weigel, A. P., , R. Knutti, , M. A. Liniger, , and C. Appenzeller, 2010: Risks of model weighting in multimodel climate projections.

,*J. Climate***23**, 4175–4191, doi:10.1175/2010JCLI3594.1.