## Abstract

Climate scenarios make implicit or explicit assumptions about the extrapolation of climate model biases from current to future time periods. Such assumptions are inevitable because of the lack of future observations. This manuscript reviews different bias assumptions found in the literature and provides measures to assess their validity. The authors explicitly separate climate change from multidecadal variability to systematically analyze climate model biases in seasonal and regional surface temperature averages, using global and regional climate models (GCMs and RCMs) from the Ensemble-Based Predictions of Climate Changes and Their Impacts (ENSEMBLES) project over Europe. For centennial time scales, it is found that a linear bias extrapolation for GCMs is best supported by the analysis: that is, it is generally not correct to assume that model biases are independent of the climate state. Results also show that RCMs behave markedly differently when forced with different drivers. RCM and GCM biases are not additive, and there is a significant interaction component in the bias of the RCM–GCM model chain that depends on both the RCM and GCM considered. This result questions previous studies that deduce biases (and ultimately projections) in RCM–GCM combinations from reanalysis-driven simulations. The authors suggest that the aforementioned interaction component derives from the refined RCM representation of dynamical and physical processes in the lower troposphere, which may nonlinearly depend upon the larger-scale circulation stemming from the driving GCM. The authors’ analyses also show that RCMs provide added value and that the combined RCM–GCM approach yields, in general, smaller biases in seasonal surface temperature and interannual variability, particularly in summer and even for spatial scales that are, in principle, well resolved by the GCMs.

## 1. Introduction

Our current understanding of the earth’s future climate is, to a large extent, based on comprehensive multimodel ensembles that are extensive collections of global (Taylor et al. 2012) and regional climate models (van der Linden and Mitchell 2009; Mearns et al. 2009) forced by prescribed emission pathways (Nakicenovic et al. 2000; Moss et al. 2010). The resulting climate projections are subject to different sources of uncertainty: (i) the scenario uncertainty, which is due to the chosen future emission pathway; (ii) the model uncertainty, which encompasses structural and parametric contributions; and (iii) internal climate variability (Hawkins and Sutton 2009). Multimodel ensembles aim at disentangling these sources of uncertainty.

Still, combining information from multimodel ensembles is challenging. Since our models are only imperfect representations of the truth, each ensemble member entails systematic errors or biases that are apparent when comparing model outputs with observations in a historical period. These biases are generally significant in comparison to projected climate changes; thus, a bias correction is usually applied to remove structural deficiencies (Déqué 2007; Christensen et al. 2008; Buser et al. 2009, 2010; Maraun et al. 2010; Boberg and Christensen 2012; Christensen and Boberg 2012; Ho et al. 2012; Maraun 2012).

In the future, however, no observations are available; therefore, one has to rely on assumptions on how biases extrapolate in a changing climate. Often these assumptions are not explicitly mentioned in the postprocessing step. A common procedure is to assume that bias changes are negligible compared to climate change or, equivalently, that the bias itself is time invariant (Buser et al. 2009; Maraun et al. 2010). In accordance with Buser et al. (2009), we call this assumption “constant bias.” If one then identifies climate change with “scenario minus control,” biases cancel. In Maraun (2012), this assumption is critically reviewed in a pseudoreality framework (Frías et al. 2006; Vrac et al. 2007).

Alternatively, one might assume that biases (linearly) depend on the underlying climate and thus change with time (Déqué 2007; Christensen et al. 2008; Buser et al. 2009). In Buser et al. (2009), this assumption is called “constant relation.” They identified substantial biases in interannual variability for various regional climate models (RCMs) from quantile–quantile (Q–Q) plots versus observations and argued that models overestimating the difference between a warm and a cold year in the control period would likely also overestimate a future warming (and vice versa with an underestimation). This assumption is also equivalent to quantile mapping, a statistical downscaling methodology widely used in impact studies (Maraun et al. 2010).

Ho et al. (2012) discuss two bias assumptions and their influence on European temperature projections. Their “bias correction” assumption considers transformations between the distribution of model output and observations and assumes that this transformation is constant over time: that is, one can derive values that are statistically equivalent to future observations by applying this transformation (derived from data in the control period) to model output in the scenario period. A similar approach has been taken by Piani et al. (2010) for daily precipitation over Europe. It turns out that this assumption is equivalent to constant relation (see section 3 below). For reasons explained in section 3 we think that the second assumption in Ho et al. (2012), called “change factor,” has little practical relevance.

Boberg and Christensen (2012) and Christensen and Boberg (2012) analyze the relation between climate model biases and the anticipated climate change. They find that models having large biases are likely to show large climate change signals. Accordingly, they then propose a regression type of correction to the climate change signal based on the estimated bias in interannual variability. Again this is related to the constant relation assumption.

Physical constraints for bias assumptions are discussed in Bellprat et al. (2013). They find that soil-moisture depletion plays a crucial role for Mediterranean temperature biases and that, at some elevated future temperature level, linear extrapolation assumptions imply nonphysical relations between soil moisture and temperature. To overcome this issue, they propose a transition from a linear to a constant bias with increasing temperatures.

Studies applying more than one assumption showed that the derived climate signal largely depends on the chosen bias assumption (see, e.g., Buser et al. 2009). Therefore, this choice constitutes an additional source of uncertainty in climate change studies. This manuscript sheds light on differences and commonalities between the aforementioned bias assumptions and tests their validity under future climate change.

In contrast to previous studies that solely focus on either regional climate models (RCMs; Christensen et al. 2008; Buser et al. 2009; Boberg and Christensen 2012; Ho et al. 2012; Maraun 2012) or global climate models (GCMs; Christensen and Boberg 2012), we analyze seasonal mean temperature biases in the driving GCMs and the downscaling RCMs from the Ensemble-Based Predictions of Climate Changes and Their Impacts (ENSEMBLES) project (van der Linden and Mitchell 2009) and investigate the added value of RCMs compared to GCMs for spatially aggregated data.

Additionally, our methodology explicitly accounts for internal variability. This type of variability shows considerable power on decadal to multidecadal time scales (Hawkins and Sutton 2009; Deser et al. 2012). In Europe, for example, climate is strongly influenced by the Atlantic multidecadal oscillation (AMO) (Schlesinger and Ramankutty 1994; Kerr 2000; Sutton and Dong 2012). Because of the chaotic nature of the climate system, models and observations exhibit their own realizations of internal climate variability. Differences between climate models and observations are thus not only systematic biases but also due to different realizations of these modes of variability (Laprise 2014). By separating the climate change signal from multidecadal variations, as in Stocker et al. (2014) and other studies (Hawkins and Sutton 2009; Fischer et al. 2012), we are thus able to treat systematic biases coherently.

## 2. Data

Our study uses 2-m temperatures from the ENSEMBLES multimodel ensemble (van der Linden and Mitchell 2009). This is an ensemble of regional climate projections designed to analyze European climate change. Lateral boundary data were provided by both the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40; Uppala et al. 2005) and GCMs in a historical period (1961–2000). In the future (2000–99), however, only results with boundary conditions from GCMs are available. We use a subset of 11 RCMs (only those with data until 2099) from the ENSEMBLES archive. The assumed emission scenario is the Special Report on Emissions Scenarios (SRES) A1B scenario (Nakicenovic et al. 2000).

In the historical period, all RCMs were driven by ERA-40. By using reanalysis data products at the lateral boundaries, we test the ability of RCMs to reproduce the current climate in isolation. We use the European daily high-resolution gridded dataset (E-OBS) for validation (Haylock et al. 2008). These are gridded daily observations for the period 1950–2009. Additionally, six GCMs provided boundary conditions to the RCMs for the period 1961–2099. These GCMs are part of the phase 3 of the Coupled Model Intercomparison Project (CMIP3) ensemble (Meehl et al. 2007). Embedded in the ENSEMBLES multimodel ensemble is the Hadley Centre perturbed-physics ensemble (Collins et al. 2006). Both the Hadley Centre GCM and RCM were run in three different parameter settings corresponding to a standard, low, and high climate sensitivity. This subensemble quantifies the sum of uncertainties due to the chosen parameter configuration and internal climate variability, and the three GCM and RCM versions are here treated as individual models (see the data matrix in Table 1).

To assess internal variability on multidecadal time scales, we additionally investigate a 500-yr preindustrial control integration from the ECHAM GCM (Roeckner et al. 2003; Marsland et al. 2003). Because this model drives many RCMs in the ENSEMBLES project, it should be reasonably representative for the entire ensemble. We only consider land grid points and analyze regional and seasonal temperature averages for eight European climate regions known as the Prediction of Regional Scenarios and Uncertainties for Defining European Climate Change Risks and Effects (PRUDENCE) regions (Christensen and Christensen 2007). These regions characterize Europe’s main climate zones. Each region’s coordinates are given in Table 2.

Because of limitations of space, we mainly focus on Scandinavia (SC), the Alps (AL) and the Mediterranean (MD) and on summer [June–August (JJA)] and winter [December–February (DJF)]. Because of their complex topography and climatic characteristics, we believe that these regions and seasons are particularly challenging for the models to simulate correctly.

## 3. Existing bias assumptions and implications

This section reviews existing bias assumptions more closely; provides further information on commonalities and differences between them; and derives implications that can, in principle, be used to distinguish between the assumptions, based on available data. We focus on GCMs, but subsequent equations can easily be extended to RCMs as well. We think, however, that RCM biases are strongly affected by biases of the driver and the relation between the driver and the RCM. We elaborate on this in section 5b.

### a. Notation

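We denote the regional and seasonal average of the observations in year *t* by *X*_{o,t} and assume that

$$X_{o,t} = \mu_{o,t} + \epsilon_{o,t}, \qquad \mu_{o,t} = \mu_{\mathrm{clim},t} + \nu_{o,t}, \qquad \epsilon_{o,t} \sim \mathcal{N}(0, \sigma_{o,t}^{2}), \tag{1}$$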

where *μ*_{o,t} denotes the mean and *ϵ*_{o,t} is an independent, normally distributed noise component with standard deviation *σ*_{o,t}. The latter characterizes the observed interannual variability. As indicated in Eq. (1), we assume that the mean at time *t* can be decomposed into a slowly varying climatological signal *μ*_{clim,t} arising from external anthropogenic forcing and internal multidecadal variations *ν*_{o,t}.

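Similarly, we can specify the distribution of the output of GCM *g* in year *t* by

$$X_{g,t} = \mu_{g,t} + \epsilon_{g,t}, \qquad \mu_{g,t} = \mu_{\mathrm{clim},t} + \beta_{g,t} + \nu_{g,t}, \qquad \epsilon_{g,t} \sim \mathcal{N}(0, \sigma_{g,t}^{2}), \tag{2}$$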

where *μ*_{g,t} and *ϵ*_{g,t} have the same interpretations as above. As our models are only imperfect portrayals of the truth, we allow for biases in the mean denoted by *β*_{g,t} and multiplicative biases *b*_{g,t} = *σ*_{g,t}/*σ*_{o,t} in interannual variability. Because of the chaotic nature of the climate system, we additionally assume that the GCM multidecadal variations *ν*_{g,t} are not related to the observed ones.

Furthermore, we denote the output of RCM *r* driven by GCM *g* in year *t* by *X*_{rg,t}. Like their driving GCM, RCMs are biased, and we denote biases in the mean *μ*_{rg,t} by *β*_{rg,t} and biases in interannual variability *σ*_{rg,t} by *b*_{rg,t}. However, as RCMs inherit the GCM atmospheric flow at their lateral boundaries, we assume that they have the same multidecadal variability *ν*_{rg,t} as their driving GCM (i.e., *ν*_{rg,t} = *ν*_{g,t}). For the same reason, the additive and multiplicative biases also depend on the driving GCM, as we shall see later.

Estimating *μ*_{o,t} and *σ*_{o,t} for future years is hampered by the absence of observational data. One has, therefore, to combine estimates of *μ*_{g,t} and *σ*_{g,t} with assumptions on how *β*_{g,t} and *b*_{g,t} extrapolate from the past to the future. This is the topic of the present manuscript.
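The notation above can be made concrete with a small simulation. The following is a minimal sketch, assuming a linear forced signal, sinusoidal multidecadal terms, and arbitrary bias values; the function name `simulate_series` and all numbers are hypothetical and are not taken from the ENSEMBLES data.

```python
import numpy as np

def simulate_series(mu_clim, beta=0.0, b=1.0, sigma_o=1.0, nu=None, seed=0):
    """Draw one realization X_t = mu_clim_t + beta + nu_t + eps_t.

    eps_t is independent Gaussian noise with standard deviation b * sigma_o.
    With beta=0 and b=1 the result plays the role of an observation series.
    """
    rng = np.random.default_rng(seed)
    mu_clim = np.asarray(mu_clim, dtype=float)
    nu = np.zeros_like(mu_clim) if nu is None else np.asarray(nu, dtype=float)
    eps = rng.normal(0.0, b * sigma_o, size=mu_clim.shape)
    return mu_clim + beta + nu + eps

years = np.arange(1961, 2100)
mu_clim = 0.03 * (years - years[0])                   # hypothetical forced warming (deg C)
nu_o = 0.2 * np.sin(2 * np.pi * years / 60.0)         # hypothetical observed multidecadal term
nu_g = 0.2 * np.sin(2 * np.pi * years / 60.0 + 1.0)   # GCM term with an unrelated phase

x_obs = simulate_series(mu_clim, nu=nu_o, seed=1)
x_gcm = simulate_series(mu_clim, beta=1.5, b=1.3, nu=nu_g, seed=2)
```

In this sketch the additive bias (`beta`) and the multiplicative bias (`b`) are held fixed in time; passing an array for `beta` would let the additive bias vary with the climate state.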

### b. Constant bias versus constant relation

An obvious assumption is that biases stay constant over time. This assumption is common to most studies on climate change that identify the climate change signal via the difference (scenario − control). In mathematical terms, this is

$$\beta_{g,t} = \beta_{g}, \qquad b_{g,t} = b_{g} \qquad \text{for all } t. \tag{3}$$

Buser et al. (2009) call this the constant bias assumption, and they contrast it with the constant relation assumption that states that the bias in the climatological mean depends linearly on the climate mean state *μ*_{clim,t} and thus changes with time. They showed that the derived climate change signal is sensitive to the adopted bias assumption (see their Fig. 6).

In Buser et al. (2009) the constant relation assumption was formulated based on a discussion of quantile–quantile plots. Ho et al. (2012) have introduced an equivalent assumption called bias correction and, since we believe that their argument is easier to understand, we take this as our starting point. The bias correction assumption states that the biases *β*_{g,t} and *b*_{g,t} may be corrected using some monotone increasing function *h*_{g}(*x*), where *x* stands for seasonal mean temperature. This function is constant in time but depends on GCM *g*. The corrected model output *h*_{g}(*X*_{g,t}) will, however, still have its own multidecadal variability, which is different from the multidecadal variability of the observations. More precisely, it is assumed that *h*_{g}(*X*_{g,t} − *ν*_{g,t}) is statistically indistinguishable from *X*_{o,t} − *ν*_{o,t}. Because we assume normal distributions, the function *h*_{g}(*x*) must be linear: *h*_{g}(*x*) = *c*_{g} + *d*_{g}*x*. The constant relation assumption states therefore that, for some constants *c*_{g} and *d*_{g}, *c*_{g} + *d*_{g}(*X*_{g,t} − *ν*_{g,t}) and *X*_{o,t} − *ν*_{o,t} have the same mean and standard deviation for all years *t*,

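$$c_{g} + d_{g}\,(\mu_{\mathrm{clim},t} + \beta_{g,t}) = \mu_{\mathrm{clim},t}, \qquad d_{g}\,\sigma_{g,t} = \sigma_{o,t}.$$

This is equivalent to

$$b_{g,t} = b_{g} = \frac{1}{d_{g}}, \qquad \beta_{g,t} = (b_{g} - 1)\,\mu_{\mathrm{clim},t} - b_{g} c_{g}. \tag{4}$$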

The multiplicative bias is, thus, again constant, whereas the additive bias depends linearly on the underlying climate *μ*_{clim,t}, provided that *b*_{g} ≠ 1 (i.e., there is some multiplicative bias). If *b*_{g} = 1, constant relation and constant bias coincide.

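Consequently, differences between the two bias assumptions exist only for the mean. If we plug *β*_{g,t} = *β*_{g} from Eq. (3) into Eq. (2), then under constant bias, the mean of GCM *g* in year *t* is

$$\mu_{g,t} = \mu_{\mathrm{clim},t} + \beta_{g} + \nu_{g,t}. \tag{5}$$

Under constant relation, however, the additive bias is *β*_{g,t} = (*b*_{g} − 1)*μ*_{clim,t} − *b*_{g}*c*_{g} [see Eq. (4)]. Plugging this into Eq. (2), it follows that

$$\mu_{g,t} = b_{g}\,(\mu_{\mathrm{clim},t} - c_{g}) + \nu_{g,t}. \tag{6}$$

If we take two time points *t*_{2} > *t*_{1}, denoting a year in the scenario and historical period, respectively, then under constant bias the mean change between these time points is, by Eq. (5),

$$\mu_{g,t_2} - \mu_{g,t_1} = \mu_{\mathrm{clim},t_2} - \mu_{\mathrm{clim},t_1} + \nu_{g,t_2} - \nu_{g,t_1}, \tag{7}$$

whereas under constant relation this change is, by Eq. (6),

$$\mu_{g,t_2} - \mu_{g,t_1} = b_{g}\,(\mu_{\mathrm{clim},t_2} - \mu_{\mathrm{clim},t_1}) + \nu_{g,t_2} - \nu_{g,t_1}, \tag{8}$$

since the constant term −*b*_{g}*c*_{g} cancels. Distinguishing between constant bias and constant relation is thus possible if

$$|b_{g} - 1|\,|\mu_{\mathrm{clim},t_2} - \mu_{\mathrm{clim},t_1}| \gg |\nu_{g,t_2} - \nu_{g,t_1}| \tag{9}$$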

or if we are able to estimate *ν*_{g,t} by smoothing the GCM time series. In the following, we usually opt for the latter, and methods for estimating multidecadal variations from a climatological time series will be discussed in section 4.

Equations (7) and (8) show that we can distinguish between constant bias and constant relation in a multimodel ensemble by plotting simulated climate changes against multiplicative biases *b*_{g} of the ensemble members. Under constant bias, there should be no relation, whereas under constant relation we should find a linear relation. Moreover, after accounting for multidecadal variations, the intercept of this linear relation should be zero. The slope of this relation is then an estimate of the true climate change. However, neither the simulated climate change nor the multiplicative bias is known exactly. Still, one can compute the regression line for estimated climate changes and estimated multiplicative biases and take the slope of this line as an estimate of the true climate change. This has been studied in detail by Boberg and Christensen (2012) for Mediterranean summer and by Christensen and Boberg (2012) for the CMIP5 multimodel ensemble (Taylor et al. 2012). Note, however, that the regression line should go through the origin. Moreover, the slope of the regression line is biased, as there are estimation errors in both the simulated climate change and the multiplicative bias. Thus, it is preferable to correct for multiplicative biases through a more complicated but sound statistical treatment, as in Buser et al. (2009).
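The regression diagnostic described above can be illustrated with idealized numbers. This is a minimal sketch, assuming noise-free changes and known multiplicative biases (neither of which holds in practice); the function `fit_change_vs_bias` and all values are hypothetical.

```python
import numpy as np

def fit_change_vs_bias(b, delta):
    """Least-squares fits of simulated change against multiplicative bias.

    Returns (slope_through_origin, (slope_free, intercept_free)).
    """
    b = np.asarray(b, dtype=float)
    delta = np.asarray(delta, dtype=float)
    slope_origin = np.sum(b * delta) / np.sum(b * b)      # line forced through the origin
    slope_free, intercept_free = np.polyfit(b, delta, 1)  # line with free intercept
    return slope_origin, (slope_free, intercept_free)

true_change = 3.0                              # hypothetical true warming (deg C)
b = np.array([0.8, 0.9, 1.0, 1.2, 1.4])        # hypothetical multiplicative biases

delta_relation = b * true_change               # constant relation: change scales with b_g
delta_bias = np.full_like(b, true_change)      # constant bias: change independent of b_g

slope_rel, (free_slope_rel, free_icpt_rel) = fit_change_vs_bias(b, delta_relation)
_, (free_slope_bias, free_icpt_bias) = fit_change_vs_bias(b, delta_bias)
```

In the constant relation case the fitted line passes through the origin and its slope recovers the assumed true change, whereas in the constant bias case the free fit is horizontal. With noisy estimates of both quantities the slope would be biased, as noted in the text.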

Instead of the mean and the standard deviation, we can evaluate both assumptions from a quantile perspective (see, e.g., Déqué 2007; Buser et al. 2009). The *α* quantile is the value that is exceeded on average in (1 − *α*)*n* out of *n* cases. We denote quantiles of the observations at time *t* by *q*_{o,t}(*α*) and those of GCM *g* by *q*_{g,t}(*α*). For a normal distribution with mean *μ* and standard deviation *σ*, the *α* quantile is equal to *μ* + *σz*(*α*), where *z*(*α*) is the *α* quantile of the standard normal distribution with mean zero and standard deviation one.

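Therefore, under constant bias, it follows that

$$q_{g,t}(\alpha) - \nu_{g,t} = \beta_{g} + (1 - b_{g})\,\mu_{\mathrm{clim},t} + b_{g}\,[q_{o,t}(\alpha) - \nu_{o,t}], \tag{10}$$

whereas under the constant relation we have

$$q_{g,t}(\alpha) - \nu_{g,t} = b_{g}\,[q_{o,t}(\alpha) - \nu_{o,t} - c_{g}]. \tag{11}$$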

Hence, under both assumptions, the quantiles of observations and model output for different values of *α* but fixed time point *t* are on a line with slope *b*_{g}. Under constant relation, the line remains the same for all times after accounting for multidecadal variations, whereas under constant bias the intercept of the line changes with time (see also Fig. 1 of Buser et al. 2009).
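The behavior of the quantile lines can be checked numerically. The sketch below assumes normal distributions, multidecadal variations already removed, and hypothetical parameter values; the names `normal_quantiles` and `qq_line` are illustrative. For constant relation it uses a linear map *h*_{g}(*x*) = *c*_{g} + *x*/*b*_{g}, which follows from matching standard deviations.

```python
from statistics import NormalDist
import numpy as np

def normal_quantiles(mu, sigma, alphas):
    """alpha-quantiles mu + sigma * z(alpha) of a normal distribution."""
    z = np.array([NormalDist().inv_cdf(a) for a in alphas])
    return mu + sigma * z

alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
sigma_o, b_g = 1.0, 1.4     # hypothetical observed std and multiplicative bias
beta_g = 0.5                # hypothetical additive bias (constant bias case)
c_g = -0.5                  # hypothetical offset of h_g (constant relation case)

def qq_line(mu_clim, assumption):
    """Slope and intercept of model quantiles against observed quantiles."""
    q_obs = normal_quantiles(mu_clim, sigma_o, alphas)
    if assumption == "constant_bias":
        mu_g = mu_clim + beta_g
    else:
        # constant relation: c_g + mu_g / b_g must equal mu_clim
        mu_g = b_g * (mu_clim - c_g)
    q_gcm = normal_quantiles(mu_g, b_g * sigma_o, alphas)
    slope, intercept = np.polyfit(q_obs, q_gcm, 1)
    return slope, intercept

for assumption in ("constant_bias", "constant_relation"):
    for mu_clim in (8.0, 11.0):   # hypothetical present and future climate means
        print(assumption, mu_clim, qq_line(mu_clim, assumption))
```

Under both assumptions the slope equals the multiplicative bias; only under constant bias does the intercept move when the climate mean changes.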

However, because observations are only available for a limited period, we adopt a pseudoreality approach (Maraun 2012; Bellprat et al. 2013), select one model as the reference, and study how well the others can predict it. We thus express Eqs. (10) and (11) in terms of GCM quantiles only, using two GCMs *g*_{1} and *g*_{2}. Under constant bias we obtain,

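$$q_{g_2,t}(\alpha) - \nu_{g_2,t} = \beta_{g_2} - \frac{b_{g_2}}{b_{g_1}}\,\beta_{g_1} + \left(1 - \frac{b_{g_2}}{b_{g_1}}\right)\mu_{\mathrm{clim},t} + \frac{b_{g_2}}{b_{g_1}}\,[q_{g_1,t}(\alpha) - \nu_{g_1,t}], \tag{12}$$

whereas under constant relation we find

$$q_{g_2,t}(\alpha) - \nu_{g_2,t} = b_{g_2}\,(c_{g_1} - c_{g_2}) + \frac{b_{g_2}}{b_{g_1}}\,[q_{g_1,t}(\alpha) - \nu_{g_1,t}]. \tag{13}$$

If we again fix *t* but let *α* vary, then under both assumptions the quantiles of GCM *g*_{2} plotted against the quantiles of GCM *g*_{1} should be on a line with slope *b*_{g_2}/*b*_{g_1}. Under constant relation, this line does not change with time, provided we account for multidecadal variations, whereas for constant bias the intercept changes. Equations (12) and (13) will be used to test the validity of the different bias assumptions under climate change in an alternative way.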

### c. The change factor assumption

In addition to the bias correction assumption, Ho et al. (2012) introduced a second assumption called change factor. It is different from constant bias and constant relation. The change factor assumption postulates the existence of a transformation *h* that depends only on the projection horizon *t*_{2} − *t*_{1} and, in particular, not on the climate model. Applying this function, *h*(*X*_{g,t_1}) is assumed to be statistically indistinguishable from *X*_{g,t_2} for any *g* and equivalently for *h*(*X*_{o,t_1}) and *X*_{o,t_2}. Thereby, Ho et al. (2012) implicitly assume that the simulated and observed climate are exchangeable. We have grave doubts about this assumption, as models are per se imperfect representations of the truth. This especially holds in climate research. Climate models show systematic biases because of their finite resolution, unknown or not-well-understood processes that are not accounted for in our models, and also parameterizations for unresolved processes not entirely derived from first principles. Moreover, each model has its own climate sensitivity; thus, *h* being independent of climate model *g* seems doubtful. That is why we think this assumption has little practical relevance, and we do not consider it further.

### d. More general bias assumptions

We can weaken constant relation and constant bias by assuming that biases may change over time but more slowly than *μ*_{o,t} and *σ*_{o,t}. This has been implemented in Buser et al. (2009, 2010) by using a Bayesian approach with informative prior distributions.

The constant bias and constant relation assumptions can be combined in a coherent way (see Buser et al. 2010). In this approach, the two assumptions are the limiting cases for a parameter *κ* ∈ [0, 1]: that is, for *κ* = 0 we have the constant bias case and for *κ* = 1 we have the constant relation assumption. A fully Bayesian hierarchical treatment of this generalization for the ENSEMBLES dataset showed that neither the constant bias nor the constant relation assumption can clearly be rejected from the data. Generally, the posterior distribution of *κ* is flat, favoring neither of the two extreme cases.

Bellprat et al. (2013) provide a physical justification for a combination of constant bias and constant relation. They find that soil-moisture depletion is critical for Mediterranean summer temperatures and imposes physical constraints that are inconsistent with constant relation; thus, when going to very high temperatures, constant bias seems more appropriate.

## 4. Methods

This section describes how we estimate time-varying means, standard deviations, and quantiles of our GCMs in order to validate constant bias and constant relation. Modern statistics offers a wide range of methods for doing this (see, e.g., Hastie et al. 2009). The difficult part is the separation of the climate change signal from internal variability on multidecadal time scales, which is necessary to distinguish between different bias assumptions [see Eqs. (7)–(13) and the comment by Laprise (2014)].

### a. Estimating time-varying means

We employ a nonparametric smoothing technique called local regression (LOESS) that estimates the mean of a time series (*X*_{1}, *X*_{2}, …, *X*_{T}) at time *t* by locally fitting a linear trend to a data window of width 2*k* + 1 years closest to *t*, with *k* denoting the half-window width. The window width determines the temporal scales to be retained and those to be smoothed out. Within the window, a weighting scheme based on the distance to year *t* is used. For details of the LOESS, see appendix A.

Similar to Stocker et al. (2014; see box 12.1 in chapter 12) and other studies (Hawkins and Sutton 2009; Fischer et al. 2012), we separate a slowly varying climate signal from internal multidecadal variability by exploiting the fact that these signals have markedly different temporal scales. We choose a window width of 2*k* + 1 ≈ 100 to estimate the slowly varying mean component *μ*_{clim,t} + *β*_{g,t} of GCM *g*. The remaining residuals are then smoothed with a window width of 2*k* + 1 ≈ 35 to estimate multidecadal variations *ν*_{g,t}.

Hawkins and Sutton (2009) and Fischer et al. (2012) use a polynomial fit to estimate the slowly varying mean component *μ*_{clim,t} + *β*_{g,t} and an unweighted symmetric moving average of 30 yr to estimate multidecadal variations *ν*_{g,t}. Our method has the following advantages: The temporal scales retained in the estimated slowly varying mean are independent of the time-series length *T*, the use of asymmetric windows at the end allows one to estimate multidecadal variations *ν*_{g,t} for all times *t*, and the use of weights allows a better separation between scales that are retained and those that are smoothed out.
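The two-stage smoothing can be sketched as follows. This is not the implementation from appendix A: the tricube weighting, the nearest-neighbor window selection, and the odd window width of 101 (standing in for 2*k* + 1 ≈ 100) are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def local_linear_smooth(x, window):
    """Local linear fit at each time step using the `window` nearest points.

    Tricube-type distance weights are an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n, dtype=float)
    window = min(window, n)
    fitted = np.empty(n)
    for i in range(n):
        # nearest `window` points; the window becomes asymmetric near the ends
        start = min(max(i - window // 2, 0), n - window)
        idx = np.arange(start, start + window)
        dist = np.abs(t[idx] - t[i])
        w = (1.0 - (dist / (dist.max() + 1.0)) ** 3) ** 3
        # weighted least squares for intercept and slope, centered at t[i]
        design = np.column_stack([np.ones(window), t[idx] - t[i]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], x[idx] * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted

def separate_scales(x, slow_window=101, multidecadal_window=35):
    """Split a series into slow signal, multidecadal part, and residual."""
    x = np.asarray(x, dtype=float)
    slow = local_linear_smooth(x, slow_window)
    multidecadal = local_linear_smooth(x - slow, multidecadal_window)
    return slow, multidecadal, x - slow - multidecadal

years = np.arange(1961, 2100)
series = 0.03 * (years - years[0]) + 0.3 * np.sin(2 * np.pi * (years - years[0]) / 60.0)
slow, multidecadal, residual = separate_scales(series)
```

Because the window is defined by a fixed number of neighbors, estimates exist for every year, including the first and last years of the series, at the cost of one-sided windows there.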

We derived the strength of internal variations at multidecadal time scales from a 500-yr preindustrial control integration of the ECHAM GCM, where the first 100 yr were discarded to allow for model spinup (see appendix B for further details). The strength of the multidecadal variability was then estimated via

$$\hat{s}_{\nu} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\hat{\nu}_{t}^{2}}, \tag{14}$$

where $\hat{\nu}_{t}$ denotes the estimated multidecadal variability using LOESS with 2*k* + 1 ≈ 35. In all cases, this quantity is smaller than 0.4°C. As an alternative to our LOESS method, we also estimated the strength of multidecadal variations from the standard deviation of nonoverlapping 30-yr averages, similar to the procedure used in Stocker et al. (2014). Essentially, we found the same upper bound of 0.4°C for the strength of the multidecadal variability derived in this way. In relation to Eq. (9), this indicates that, in principle, a distinction between constant bias and constant relation should be feasible for longer-term forecasting horizons (i.e., 50–100 yr).
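The alternative estimate based on nonoverlapping 30-yr averages can be sketched in a few lines. The white-noise series below is only a hypothetical stand-in for the control integration, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def multidecadal_strength(x, block=30):
    """Standard deviation of non-overlapping `block`-year means."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block
    # drop the incomplete trailing block, then average within each block
    means = x[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)
    return means.std(ddof=1)

rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=400)   # hypothetical stand-in for a 400-yr control series
strength = multidecadal_strength(control)
```

For a real control run, `control` would be the regional and seasonal temperature series after discarding the initial years.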

### b. Estimating time-varying standard deviations and quantiles

For estimating *σ*_{o,t} and *σ*_{g,t} we do not need to separate climatological signals and multidecadal variations. We therefore use a very simple approach that relies on fitting linear trends to windows of 29 yr centered at *t*. The standard deviations *σ*_{o,t} and *σ*_{g,t} of the time series can then be derived from the square root of the residual sum of squares divided by 27, accounting for the use of 2 degrees of freedom to fit a linear trend.
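A minimal sketch of this estimator is given below. The function name `rolling_detrended_std` is hypothetical, and returning NaN where no full centered window exists is a choice made here, not something stated in the text.

```python
import numpy as np

def rolling_detrended_std(x, window=29):
    """Std of residuals about a linear trend fitted in centered windows.

    The residual sum of squares is divided by window - 2 (27 for 29 yr),
    since two parameters are used for the linear trend.
    """
    x = np.asarray(x, dtype=float)
    half = window // 2
    t = np.arange(window, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(half, len(x) - half):
        seg = x[i - half : i + half + 1]
        slope, intercept = np.polyfit(t, seg, 1)
        resid = seg - (intercept + slope * t)
        out[i] = np.sqrt(np.sum(resid ** 2) / (window - 2))
    return out

rng = np.random.default_rng(1)
noisy = 0.02 * np.arange(500) + rng.normal(0.0, 2.0, size=500)  # hypothetical series
sd = rolling_detrended_std(noisy)
```

Applying the function to a model series and to observations and taking the ratio of the two results at a common year gives an estimate of the multiplicative bias.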

Under our assumption of a normal distribution, we immediately obtain estimates of time-varying quantiles from the estimated mean function and standard deviation. Since we also have estimates of the multidecadal variations, we can subtract these from the quantiles.

Alternatively, one can also use nonparametric quantile regression (see Koenker 2005). This method does not rely on the assumption of normal distributions (or any other family of distributions with a fixed shape). In our experience, however, these estimates are strongly influenced by random variations in the data, in particular for *α* close to zero or one, and thus the choice of the smoothing window becomes more delicate. Because of this, we opt for the simpler estimates based on normal distributions.

## 5. Results and discussion

We analyze the validity of constant bias and constant relation and their implications for the ENSEMBLES dataset. Because of the nesting procedure, RCMs inherit biases of the driving GCMs at their lateral boundaries. Separating these effects from those of the RCM itself is challenging. Therefore, we first restrict ourselves to an analysis of the driving GCMs. Later on, we provide additional information on RCMs and possible (statistical) modeling strategies.

### a. Bias assumptions and implications for GCMs

#### 1) Bias assumptions expressed by first and second moments

Both constant relation and constant bias assume that the bias in interannual variability *b*_{g,t} is time invariant. To see how well this is satisfied for our data, we estimate time-varying standard deviations as described in section 4 and then obtain the estimate of the multiplicative bias *b*_{g,t} as the ratio of the estimated standard deviations, *σ*_{g,t}/*σ*_{o,t}.

The *b*_{g,t} estimates for *t* = 1975 (blue) and *t* = 1995 (red) are given in Fig. 1. It seems that the multiplicative bias is, to a first approximation, constant in the observation period. However, differences (e.g., for AL in summer) underline that our simple assumptions do not apply perfectly.

To give a visual impression of the separation of the climatological signal and variability on multidecadal time scales by the method described in section 4, we show next the GCM time series in Figs. 2–4. The blue lines illustrate the estimated mean values *μ*_{g,t}, and red lines show the estimated slowly varying climatological signal (including additive biases), without multidecadal variations. The shaded areas are 95% confidence intervals for each regression function. Comparing red with blue, we get an impression of the strength of the multidecadal oscillations.

Furthermore, Figs. 2–4 give a rough intuition about the appropriateness of our bias assumptions. The constant relation assumption says that the climate change signal from different GCMs should positively correlate with their represented level of interannual variability (see section 3). Comparing Figs. 2–4 (red lines) with the spread along the *y* axis (dots), it seems that the constant relation assumption is indeed reasonable. For example, Action de Recherche Petite Echelle Grande Echelle (ARPEGE) usually has a small interannual variability and, in turn, projects a small temperature increase by the end of the twenty-first century. In contrast, the interannual variability of HadQ0 is larger, and the projected warming over 1961–2099 is also stronger. For HadQ3, this generally does not hold because the model is tuned in such a way that it has a low climate sensitivity; thus, the projected warming is small.

Alternatively, we can analyze the validity of constant bias and constant relation according to Eqs. (7) and (8) and Christensen and Boberg (2012). We therefore plot the projected temperature change for 2035 (blue) and 2095 (red) with respect to 1975 against an estimate of the multiplicative bias (see Fig. 5). The values for 1975, 2035, and 2095 are the corresponding values from the red lines in Figs. 2–4. As indicated in section 3, for the constant relation assumption to hold, simulated temperature changes should linearly depend on the estimated bias in interannual variability. Because we have subtracted estimated multidecadal variations, the linear relations should have zero intercept. Under the constant bias assumption, however, these two quantities are unrelated, resulting in horizontal fits.

Generally, we observe substantial scatter around the fitted lines, indicating only a moderate goodness of fit (see Fig. 5). For SC in summer, multiplicative biases and temperature changes are unrelated (fitted lines are essentially horizontal), supporting the constant bias assumption. For AL and MD, however, a positive temperature–bias relation seems plausible (second and third row). GCMs that show large biases in interannual variability thus tend to overestimate the true climate change. This agrees with our conclusions drawn from the time-series plots: that is, temperature changes on centennial time scales are slightly more compatible with constant relation than constant bias. Our results are also in line with Christensen and Boberg (2012), who found a positive temperature–bias relation in the CMIP5 ensemble for numerous climatic regions. However, Fig. 5 also shows that the fitted lines vary substantially, depending on whether or not we force them to have zero intercept. This is an indication that neither of the two assumptions holds perfectly and that estimating climate change by the slope of the fitted line has a large uncertainty.

#### 2) Bias assumptions and quantiles

An additional way of distinguishing between the two bias assumptions relies on quantile–quantile plots. Figures 6–8 show the estimated HadQ3, ECHAM, and HadQ0 quantiles against those from ARPEGE, for *α* = 0.1, 0.3, …, 0.9 as estimated from LOESS fits to 100-yr windows centered at *t* = 1975, 2005, …, 2095. For details on the estimation procedure see section 4. In particular, we remove most of the multidecadal variations by this procedure. Dashed lines connect quantiles for the same year but different *α*, whereas colored lines connect quantiles for different years but same *α*.

Because ARPEGE generally exhibits a small bias in the control period (see Fig. 1), this procedure is similar to a pseudoreality approach (Maraun 2012), in which ARPEGE is assumed to be the truth.

According to constant bias, dashed and colored lines are expected to form rhomboidal structures. For the comparison of HadQ3 with ARPEGE, we can indeed find these phenomena in summer (Fig. 8). Under constant relation, however, all dashed and colored lines should collapse to a straight line [see Eq. (13)]. This is approximately true for many season–region combinations (see, e.g., Fig. 7). Therefore, in agreement with our conclusions from the previous section, it seems that constant relation is a reasonable assumption on centennial time scales.

### b. Bias assumptions and implications for RCMs

So far we were concerned with the implications of different bias assumptions for GCM output. In this section, we extend the analysis to RCM output.

#### 1) Relations between RCMs and their drivers

For a detailed treatment of RCM biases, we first need to understand the relations between RCMs and their drivers. As an example, we show scatterplots for the Danish Meteorological Institute (DMI) RCM and its drivers in Figs. 9–11. Similar conclusions can be drawn from the other ENSEMBLES RCMs (not shown). For ERA-40-driven runs (red) and GCMs (others) we have data spanning the period 1961–90 and 1961–2099, respectively. Lines indicate a local quadratic regression fit to the data allowing for nonlinear relations (Hastie et al. 2009). To characterize the general properties of the RCM–driver relations, we choose a fairly strong smoothing with a smoothing window containing all points and approximately 3 degrees of freedom for the fit.

Despite the long time period considered, we find only small indications for nonlinearity in the RCM–driver relation. Surprisingly, the DMI is remarkably sensitive to the chosen driver: that is, we observe vertical shifts and changes in the slopes for different drivers. This is most prominent for summer but also true for winter. Examining Figs. 9–11 more closely, we also find changes in the scatter around the fitted lines.

#### 2) RCM and GCM biases

Our findings from the previous section point to important consequences for regional climate modeling in general. When using reanalysis data at the RCM lateral boundaries and comparing these runs with observations, one would like to identify biases stemming solely from the RCM. Assuming that the RCM–GCM bias can be decomposed into stand-alone RCM and GCM parts (and maybe a smaller interaction term), inference from the reanalysis-driven RCMs might be justified (e.g., Boberg and Christensen 2012). However, as Figs. 9–11 suggest, RCMs react markedly differently to different drivers; therefore, biases may also be affected by a change in the driver.

This section sheds light on this question and analyzes whether RCM–GCM biases are indeed additive in the mean and multiplicative in the interannual variability: that is, whether they can be decomposed into dominating RCM and GCM contributions. To this end, we estimate time-varying means and standard deviations from a linear regression over 29-yr periods.
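A minimal sketch of such a windowed estimate is given below. It assumes that the mean is read off the fitted line at the window center and that the interannual standard deviation is taken from the regression residuals; the function name `window_mean_sd` and the synthetic series are hypothetical, and the original analysis may differ in detail.

```python
import numpy as np

def window_mean_sd(years, values, center, width=29):
    """Mean at `center` and residual standard deviation from a linear fit."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    half = width // 2
    in_window = np.abs(years - center) <= half
    # Centered time axis, so the fitted intercept is the mean at `center`
    x = years[in_window] - center
    y = values[in_window]
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    # Two fitted parameters leave n - 2 degrees of freedom
    sd = np.sqrt(np.sum(resid**2) / (x.size - 2))
    return float(intercept), float(sd)

# Synthetic series: linear trend plus an alternating +/-0.5 anomaly
years = np.arange(1961, 1990)  # 29 years centered on 1975
values = 10.0 + 0.03 * (years - 1975) + 0.5 * (-1.0) ** (years - 1961)
mean_1975, sd_1975 = window_mean_sd(years, values, center=1975)
```

Removing the linear trend before computing the standard deviation keeps a forced warming signal within the window from being counted as interannual variability.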

The ordinate in Fig. 12 characterizes the RCM–GCM mean bias that is not accounted for by the GCM; that is, *β*_{rg,t} − *β*_{g,t}, where *t* denotes the center of the period considered. We estimate this bias from the difference in the RCM–GCM and GCM means (i.e., *μ*_{rg,t} − *μ*_{g,t}, with *t* = 1975). Because of the nesting procedure, RCMs inherit their drivers’ internal variability; thus, the values on the ordinate represent pure biases. Contrarily, on the abscissa, we show merely an estimate of the difference between GCM and observation means *μ*_{g,t} − *μ*_{o,t}. These quantities are affected by multidecadal variations in both the GCM and observations. Since, however, multidecadal oscillations are roughly on the order of 0.4°C (see section 4) and, in general, much smaller than *μ*_{g,t} − *μ*_{o,t}, we are confident that the abscissa values are generally dominated by GCM biases *β*_{g,t}.

Assuming that RCM and GCM biases are additive, one expects that quantities on the *y* and *x* axes are uncorrelated (as we already accounted for the GCM bias in the derivation of the RCM bias). This is obviously not the case (see the solid black lines in Fig. 12). We rather observe a negative relation between RCM and GCM biases (i.e., that RCMs at least partially compensate GCM biases). We are a bit surprised that RCMs provide substantial added value even for averages over large spatial domains as analyzed here. Hence, an interaction between RCM and GCM biases seems plausible. For the Mediterranean in winter, an almost perfect compensation of GCM biases occurs. Here, the slope of the regression line is close to −1. An interesting effect is also evident for SC in winter: For the Swedish Meteorological and Hydrological Institute (SMHI) RCM, the RCM–GCM differences amount to between 1° and 5°C, overall leading to a reduction of the driving GCM bias by almost a factor of 2.
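One way to make this reasoning explicit is to write the fitted line in Fig. 12 as a linear relation. The intercept *a* and slope *s* are notation introduced here for illustration only and do not appear in the original analysis.

$$
\beta_{rg,t} - \beta_{g,t} \approx a + s\,\beta_{g,t}
\quad\Longrightarrow\quad
\beta_{rg,t} \approx a + (1 + s)\,\beta_{g,t}.
$$

Additive biases correspond to *s* = 0, in which case the full GCM bias is passed on to the RCM–GCM chain. A slope of *s* = −1 removes the dependence on *β*_{g,t} altogether (complete compensation), and intermediate slopes −1 < *s* < 0 pass on only the fraction 1 + *s* of the GCM bias.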

Similarly, we can investigate the GCM influence on RCM biases in interannual variability. We estimate *σ*_{rg,t} by the same method as *σ*_{g,t} and *σ*_{o,t} and compare the estimated ratio *σ*_{rg,t}/*σ*_{g,t} with the estimated bias in interannual variability *b*_{g,t} (both on log scales; see Fig. 13). In doing so, we investigate RCM multiplicative biases that are not accounted for by the GCMs and their relations to GCM biases. Note that these quantities are unaffected by multidecadal variations.

Because *b*_{rg,t} = *b*_{g,t}*σ*_{rg,t}/*σ*_{g,t}, the RCM corrects the multiplicative bias of the GCM if the points in Fig. 13 lie in the second or fourth quadrant. As before, complete correction occurs if the points are on a line through the origin with slope equal to −1. Figure 13 highlights that RCMs tend to have smaller biases in interannual variability. Furthermore, this negative association indicates that interactions between RCM and GCM biases might contribute to the overall RCM–GCM bias.
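Taking logarithms makes the quadrant argument explicit. The display below only restates the identity from the text on the log scales used in Fig. 13.

$$
\log b_{rg,t} = \log b_{g,t} + \log\frac{\sigma_{rg,t}}{\sigma_{g,t}} .
$$

If the two terms on the right have opposite signs (second or fourth quadrant), they partly cancel, and |log *b*_{rg,t}| is smaller than |log *b*_{g,t}| as long as the RCM term does not overshoot by more than 2|log *b*_{g,t}|. On the line with slope −1 through the origin the terms cancel exactly, so that log *b*_{rg,t} = 0 and *b*_{rg,t} = 1.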

Figures 12 and 13 show that RCM and GCM biases are not additive and that overall RCMs substantially reduce the biases of the driving GCMs in terms of mean seasonal surface temperature and interannual variability, even on spatial scales that are, in principle, well resolved by the driving GCMs. Indeed, the results show that the same RCM may improve the results of different GCMs, although the GCM biases vary strongly from model to model. This is a surprising result. How can it be explained? One might suspect that the respective RCMs have undergone some tuning to compensate for the biases of their driving GCMs. This hypothesis, however, can be rejected for the following reasons. First, the ENSEMBLES project considered RCM simulations driven by both reanalysis and GCMs using the same RCM configurations, and the main goal of the RCM modeling groups was to yield optimal validation for the reanalysis-driven simulations (see Christensen et al. 2010, and references therein). Second, in most cases, the RCM setup had to be defined prior to conducting the GCM-driven simulations, and in many cases the same RCM versions were used to downscale different GCMs. At least in general, a tuning of the ENSEMBLES RCMs with respect to the driving GCMs has thus been neither feasible nor attempted.

To derive a specific interpretation of the aforementioned results, consider an RCM driven by a GCM. On the one hand, the large-scale circulation in the RCM is strongly controlled by the driving GCM, and its performance in terms of upper-level circulation is thus very similar to that of the driving GCM. On the other hand, the RCM is able to develop its own near-surface climate in response to the large-scale forcing. In general, near-surface processes in RCMs are better resolved because of the use of higher horizontal and lower-tropospheric vertical resolution. In addition, RCMs often employ more sophisticated parameterization schemes for boundary layer, snow-cover, and land surface processes. It is also possible that the RCM parameterizations are better tuned (not with respect to the driving GCM but rather with respect to the region considered). The refined representation of the lower troposphere and land surface enables an improved representation of mesoscale effects, such as mountain and land–sea circulations, low-level temperature inversions, and clouds, as well as snow-cover and soil-moisture variations. These factors are often crucial for the near-surface climate. Overall, this yields a satisfactory interpretation of the main results in Figs. 12 and 13 for two reasons: First, the effect of the RCM on the near-surface climate depends upon the driving GCM’s large-scale forcing, and thus there is no reason to expect an additive behavior of GCM and RCM biases. Second, the refined RCM representation of near-surface processes generally improves the simulation of the near-surface climate.

#### 3) Consequences for RCM biases

Because RCMs are very sensitive to the driver and because the associated nonadditivity prohibits simple assumptions on the relation between RCM and GCM biases, we suggest not imposing additional bias assumptions at the level of the RCMs but rather accounting for the RCM–driver dependence through a hierarchical statistical model (see, e.g., Tebaldi et al. 2011) defined by

$$
X_{rg,t} = \xi_{rg} + \phi_{rg}\,X_{g,t} + \epsilon_{rg,t}, \qquad (14)
$$

where *ξ*_{rg} and *ϕ*_{rg} are the intercept and slope, respectively, of the regression line for an RCM versus its driving GCM, and *ϵ*_{rg,t} are independent, normally distributed errors with mean zero and standard deviation *τ*_{rg}.

A slightly more general model would allow *ξ*_{rg}, *ϕ*_{rg}, and *τ*_{rg} to change very slowly with time in order to take the small nonlinearity in the relation between *X*_{rg,t} and *X*_{g,t} into account. Equation (14) implies that the mean and standard deviation of *X*_{rg,t} are given by

$$
\mu_{rg,t} = \xi_{rg} + \phi_{rg}\,\mu_{g,t}, \qquad \sigma_{rg,t} = \sqrt{\phi_{rg}^{2}\,\sigma_{g,t}^{2} + \tau_{rg}^{2}}. \qquad (15)
$$

In particular, the multidecadal variations of an RCM and its driving GCM must agree. We have checked this by looking at the time-series plots of *X*_{rg,t} and of *X*_{g,t} and found very close agreement. This means, in particular, that the residuals in Figs. 9–11 do not exhibit temporal autocorrelations.
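The implied moments can be checked numerically. The sketch below reads the model as a linear regression of the RCM series on its driving GCM, with intercept *ξ*_{rg}, slope *ϕ*_{rg}, and independent normal errors with standard deviation *τ*_{rg}, as described in the text. It further assumes that the errors are independent of the GCM series. All parameter values and the function name `rcm_moments` are hypothetical.

```python
import numpy as np

def rcm_moments(mu_g, sigma_g, xi, phi, tau):
    """Mean and standard deviation of the RCM series implied by the linear model."""
    mu_rg = xi + phi * mu_g
    # Errors independent of the GCM series, so the variances add
    sigma_rg = np.sqrt(phi**2 * sigma_g**2 + tau**2)
    return mu_rg, sigma_rg

rng = np.random.default_rng(0)
mu_g, sigma_g = 15.0, 1.2     # hypothetical GCM mean and standard deviation
xi, phi, tau = 2.0, 0.8, 0.5  # hypothetical regression parameters

# Simulate the GCM series and the RCM series driven by it
x_g = rng.normal(mu_g, sigma_g, size=200_000)
x_rg = xi + phi * x_g + rng.normal(0.0, tau, size=x_g.size)

mu_rg, sigma_rg = rcm_moments(mu_g, sigma_g, xi, phi, tau)
```

Comparing `x_rg.mean()` and `x_rg.std()` with `mu_rg` and `sigma_rg` shows how the slope rescales the GCM variability while the residual term adds variance of its own.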

We can use the expressions in Eq. (15) to investigate what happens if the constant relation assumption holds for a GCM. For *t*_{2} > *t*_{1}, we obtain

and

In other words, the GCM-driven RCM will also tend to overestimate the true climate change but by a factor that is smaller than its multiplicative bias.
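The argument behind this statement can be sketched as follows. The sketch assumes that constant relation means a time-independent multiplicative bias *b*_{g} = *σ*_{g,t}/*σ*_{o,t} that also scales the GCM's climate change signal, that the RCM–GCM relation is linear with slope *ϕ*_{rg} > 0 and residual standard deviation *τ*_{rg}, and that the residuals are independent of *X*_{g,t}. It is a reconstruction under these assumptions rather than the authors' exact expressions.

$$
\mu_{rg,t_2} - \mu_{rg,t_1} = \phi_{rg}\,(\mu_{g,t_2} - \mu_{g,t_1}) = \phi_{rg}\,b_{g}\,(\mu_{\mathrm{clim},t_2} - \mu_{\mathrm{clim},t_1}),
$$

$$
b_{rg,t} = \frac{\sigma_{rg,t}}{\sigma_{o,t}} = \sqrt{\phi_{rg}^{2}\,b_{g}^{2} + \frac{\tau_{rg}^{2}}{\sigma_{o,t}^{2}}} \;\ge\; \phi_{rg}\,b_{g}.
$$

Under these assumptions the RCM scales the climate change signal by *ϕ*_{rg}*b*_{g}, which cannot exceed its own multiplicative bias *b*_{rg,t} and is strictly smaller whenever *τ*_{rg} > 0.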

## 6. Summary and conclusions

We have reviewed different bias assumptions found, for example, in Buser et al. (2009) and Ho et al. (2012); proposed various criteria to check the validity of two opposing extrapolation assumptions for climate model biases; and applied these to the surface temperatures of ENSEMBLES GCM and RCM simulations using validation methodologies and a perfect model approach. Our methodology explicitly accounts for internal climate variability on multidecadal time scales and separates these signals from climate change. Our findings indicate that, on centennial time scales, a linear bias approximation called constant relation is best justified. This conclusion implies that biases depend on the model state and need to be extrapolated from current to future climates when considering climate change. This contrasts with the commonly used constant bias assumption (which assumes that biases do not depend on the climate state).

For the ENSEMBLES RCMs, results show that RCMs behave markedly differently when forced with different drivers. As the driving GCM affects the RCM biases strongly and in a nonadditive way, it is questionable whether much can be learned from reanalysis-driven simulations for RCM–GCM biases (as suggested, e.g., by Boberg and Christensen 2012), although reanalysis-driven simulations are the best means to validate an RCM configuration. Our assessment of RCM and GCM biases in mean surface temperature and interannual variability indicates that no simple assumptions on RCM biases are feasible. In particular, RCM and GCM biases are not additive, and there is a significant interaction component in the bias of the RCM–GCM model chain that depends on both modeling components.

For the ENSEMBLES simulations considered, we have also shown that the RCMs substantially add value to GCMs, even on horizontal scales that are well resolved by GCMs. We believe that this improvement stems from higher horizontal and vertical resolution, as well as from a generally more sophisticated and/or better resolved representation of parameterized physical processes.

Our conclusions have several implications. First, in the design of RCM–GCM model ensembles, care should be taken to appropriately represent this complexity by giving consideration to many RCM–GCM combinations (rather than a few model chains), so as to provide sufficient information on bias assessment and extrapolation. Second, the results suggest that it is feasible to construct a hierarchical statistical model building on the previous studies of Tebaldi et al. (2011) and Buser et al. (2009) that would enable a coherent analysis of an entire matrix of GCM and RCM simulations and thereby overcome issues raised in Buser et al. (2009) when using only independent RCM simulations (i.e., RCM simulations driven by different GCMs). Such a statistical model could easily be applied to other regional climate model projects such as the North American Regional Climate Change Assessment Program (NARCCAP; Mearns et al. 2009) or the Coordinated Regional Climate Downscaling Experiment (CORDEX; e.g., Giorgi et al. 2009; Kotlarski et al. 2014).

## Acknowledgments

This study was funded by the Swiss National Science Foundation under Grant 200021-132316/1. We acknowledge the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com) for providing GCM and RCM simulations and the E-OBS observational datasets, as well as the data providers in the ECA&D project (http://www.ecad.eu).

### APPENDIX A

#### Details on the LOESS Procedure

We estimate the mean at time *t* from a local linear fit to the 2*k* + 1 observations closest to *t* via LOESS. In the middle of the series, if *k* + 1 ≤ *t* ≤ *T* − *k*, we simply obtain a weighted moving average,

where *w*(*z*) denotes a tricubic weight according to

$$
w(z) = \begin{cases} (1 - |z|^{3})^{3}, & |z| < 1, \\ 0, & \text{otherwise.} \end{cases}
$$

See Fig. A1 for an illustration. At the beginning of the time series where *t* ≤ *k*, one minimizes

with respect to *μ* and *γ* and takes the minimizing value for *μ* as the estimate of the mean at time *t*. At the end of the time series, an analogous procedure is used.
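The procedure can be sketched as a weighted local linear fit. The example below is a minimal illustration and not the authors' implementation: the function names are hypothetical, time is indexed from 0 rather than 1, and distances are scaled by the largest distance in the window plus one, which is an assumption chosen so that all 2*k* + 1 observations receive positive weight.

```python
import numpy as np

def tricube(z):
    """Tricubic weight: (1 - |z|^3)^3 for |z| < 1, zero otherwise."""
    z = np.abs(z)
    return np.where(z < 1.0, (1.0 - z**3) ** 3, 0.0)

def loess_mean(x, k):
    """Local linear estimate of the mean from the 2k+1 nearest observations."""
    x = np.asarray(x, dtype=float)
    n = x.size
    width = 2 * k + 1
    mu_hat = np.empty(n)
    for t in range(n):
        # Near the ends the window is shifted so it still holds 2k+1 points
        start = min(max(t - k, 0), n - width)
        idx = np.arange(start, start + width)
        d = idx - t
        w = tricube(d / (np.abs(d).max() + 1))
        # Weighted least squares for mu + gamma * (s - t)
        design = np.column_stack([np.ones(width), d])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], x[idx] * sw, rcond=None)
        mu_hat[t] = coef[0]  # fitted value at s = t
    return mu_hat
```

In the interior the window and weights are symmetric about *t*, so the fitted intercept reduces to a weighted moving average of the observations; near the ends the slope term *γ* corrects for the one-sided window.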

### APPENDIX B

#### Separating a Climatological Signal from Multidecadal Variations

In the LOESS procedure, the choice of *k* determines which fluctuations of the series are considered as part of the mean function and which are considered as noise. To decompose a time series of model outputs *X*_{g,t} into a slowly varying climate signal *μ*_{clim,t} plus bias *β*_{g,t} and additional multidecadal variations *ν*_{g,t}, we first apply a LOESS smoother with a window width of 2*k* + 1 ≈ 100 yr to identify the slowly varying mean component *μ*_{clim,t} + *β*_{g,t}. In a second step, we estimate *ν*_{g,t} by applying LOESS with a window width of 2*k* + 1 ≈ 35 yr to the residuals of the first fit. See also Fig. A1 for an illustration of the tricubic weight function *w*(*z*) for the two smoothers.

The effect of this procedure is best understood by considering the time series as a superposition of harmonic oscillations with frequencies (cycles per year) *ω* ∈ [0, 0.5]. Because the LOESS estimate is linear, the amplitude of each such oscillation is multiplied by the so-called transfer function. Near the beginning and end of the series there is an additional phase shift. Figure B1 shows the transfer functions of the LOESS smoother with window widths 100 (red) and 35 (black) and of the filter, which applies a LOESS smoother with window width 35 to the residuals of a LOESS smoother with window width 100 (blue). In the middle of the series (left) our estimate of the slowly varying climatological signal contains only harmonic oscillations with periods larger than 70 yr and the multidecadal variations contain mainly harmonic oscillations with periods around 50 yr. Oscillations with periods around 100 yr are split up between the two components of the mean, reflecting the ambiguity inherent in any such decomposition. At the beginning of the series (right), higher frequencies are less damped and there is more overlap between the estimates of the slowly varying climate signal and multidecadal variations.
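For the middle of the series, where the smoother is a symmetric weighted moving average, the transfer functions can be computed directly from the filter weights. The sketch below covers only this interior case (no phase shift) and assumes tricubic weights with distances scaled by *k* + 1; the choices *k* = 50 and *k* = 17 stand in for the window widths of about 100 and 35 yr, and the function names are hypothetical.

```python
import numpy as np

def interior_weights(k):
    """Normalized tricubic weights of the interior smoother (width 2k+1)."""
    j = np.arange(-k, k + 1)
    w = (1.0 - np.abs(j / (k + 1.0)) ** 3) ** 3
    return j, w / w.sum()

def transfer(k, omega):
    """Transfer function of the symmetric interior filter at frequencies omega."""
    j, w = interior_weights(k)
    # Symmetric weights give a real transfer function: sum_j w_j cos(2 pi omega j)
    return np.cos(2.0 * np.pi * np.outer(omega, j)) @ w

omega = np.linspace(0.0, 0.5, 501)      # cycles per year
h_clim = transfer(50, omega)            # window of 101 yr
h_35 = transfer(17, omega)              # window of 35 yr
# A 35-yr smoother applied to the residuals of the 101-yr smoother
h_multidecadal = h_35 * (1.0 - h_clim)
```

Because both filters are linear, the residual filter has transfer function 1 − `h_clim`, and applying the second smoother multiplies the two transfer functions. Plotting the three arrays against `omega` should give curves of the kind described for the middle of the series.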

Our selection of the window widths is mainly based on the results of Schlesinger and Ramankutty (1994), Delworth and Mann (2000), and Kerr (2000), who identified significant low-frequency variability at periods around 50 yr. We also tried to identify distinct peaks in the spectrum of a 500-yr preindustrial control integration from ECHAM. However, the estimated spectrum is essentially flat: that is, there is only little evidence for peaks in certain frequency bands (see Fig. B2).

Furthermore, we looked at three ECHAM runs for the period 1961–2099 that differed only with respect to their initial conditions. Hence, these runs should have the same *μ*_{clim,t} + *β*_{g,t} but three independent multidecadal variations of *ν*_{g,t}. We estimated *μ*_{clim,t} + *β*_{g,t} by applying a LOESS smoother to the average of the three series and a window width determined by cross validation (Hastie et al. 2009). This method is based on the data alone and avoids any subjective choice by the data analyst. Next we estimated individual multidecadal variations from the three residuals by a LOESS smoother with window width again determined by cross validation. The results (not shown) did, however, not look convincing, and the window width determined by cross validation varied strongly between regions.

## REFERENCES

*Geophys. Res. Lett.,* **39,** L24705.

*Geophys. Res. Lett.,* **35,** L20709.

*Geophys. Res. Lett.,* **33,** L19807.

*The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. 2nd ed. Springer-Verlag, 745 pp.

**119,** 3877–3881.

*Geophys. Res. Lett.,* **39,** L06706.

*Climate Change 2013: The Physical Science Basis.* Cambridge University Press, 1535 pp. [Available online at http://www.climatechange2013.org/images/report/WG1AR5_ALL_FINAL.pdf.]

*Bayesian Statistics 9,* J. M. Bernardo et al., Eds., Oxford University Press, 639–658.

*Geophys. Res. Lett.,* **34,** L18701.