1. Introduction
Global climate models (GCMs) are our primary tool for understanding the behavior of the global climate system over the coming centuries. They have a relatively coarse resolution (typically 150–300 km) and so are only able to provide climate change information on large spatial scales. Climate change information at higher resolution can be derived by nesting regional climate models (RCMs), which cover a limited area at a higher resolution (50 km or finer), in GCMs. Such nested models have been demonstrated, and are widely used, to provide realistic spatial and temporal detail on how the climate may change locally. However, such future climate projections at the local scale are not necessarily reliable because uncertainty in the representation of processes in the GCMs and RCMs often leads to a range of projected changes, the likelihood of which often cannot be established (Déqué et al. 2007). In addition, natural climate variability limits our ability to quantify a projected change derived from a finite sampling of the baseline and perturbed climates, and it can be sufficiently large to obscure a signal completely (Kendon et al. 2008).
There has been considerable international effort recently to explore ranges of detailed climate projections through the use of multimodel ensembles, for example, in the Ensemble-Based Predictions of Climate Changes and their Impacts (ENSEMBLES; Hewitt and Griggs 2004), Prediction of Regional Scenarios and Uncertainties for Defining European Climate Change Risks and Effects (PRUDENCE; Christensen and Christensen 2007), and the North American Regional Climate Change Assessment Program (NARCCAP; Mearns et al. 2009) projects. This approach consists of combining different GCMs and different RCMs, developed at different modeling centers around the world, to form an ensemble of simulations sampling uncertainties. Ideally such ensembles should be designed to sample the full range of modeling uncertainty; however, in practice, they are assembled on an opportunity basis (with different models designed to sample uncertainty neither systematically nor comprehensively). Furthermore, because of limited resources, it is not possible to perform simulations for all available GCM–RCM combinations, which results in a sparsely filled GCM–RCM matrix. This raises the following question: how should simulations be prioritized to ensure uncertainties are sampled most efficiently? This requires an evaluation of the best sampling strategy for both GCM and RCM simulations, although here we focus on the latter, with the caveat that the number of GCMs needs to be sufficient to capture uncertainty in the large-scale response. An additional consideration, not addressed here, is the extent to which resources should also be apportioned to sampling uncertainty resulting from future emissions.
Pattern-scaling techniques have been widely used in the construction of climate scenarios in the context of a restricted number of climate model simulations (e.g., Mitchell et al. 1999; Christensen et al. 2001; Mitchell 2003; Rummukainen et al. 2003; Harris et al. 2006; Ruosteenoja et al. 2007). Traditionally, such techniques have been used to provide local and regional climate change projections for time periods and emission scenarios that have not been simulated by GCMs, using global-mean temperature change obtained from a simple energy balance model. The assumption underlying these methods is that the geographical pattern of the change is independent of the forcing. Thus, the local response of a climate variable is assumed to be linearly related to the global-mean temperature change, with the scaling coefficient only dependent on spatial position. This condition is largely satisfied for mean temperature, and to a lesser degree for precipitation (Mitchell et al. 1999; Mitchell 2003).
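As an illustrative sketch only, the traditional pattern-scaling assumption described above can be written as a single relation. The symbols ΔV (local change in a climate variable), a (scaling coefficient), x (spatial position), and t (time period or scenario) are notation assumed here for illustration and are not taken from the paper; ΔT_g denotes the global-mean temperature change.

```latex
\Delta V(\mathbf{x}, t) \approx a(\mathbf{x})\,\Delta T_g(t)
```

In this form the geographical pattern a(x) is fixed, and only the amplitude ΔT_g varies with forcing, which is the independence assumption stated above.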
We examine the accuracy of local scaling both for time-mean variables and for measures of variability and extremes, focusing on temperature and precipitation across Europe. We note that a key consideration is noise resulting from internal variability, which has been found to lead to significant apparent errors in pattern scaling, in particular for mean precipitation (Mitchell et al. 1999). We expect internal variability to have an even greater influence on changes in local precipitation extremes. Thus, in this study, we identify where poor scaling skill may be explained by sampling error rather than genuine failure of the scaling assumptions. Recommendations regarding the applicability of the technique for enhancing information from partly filled GCM–RCM matrices are given.
This work also provides guidance on how to prioritize RCM simulations. In particular, we compare local scaling skill for different driving GCMs with differences between different RCMs driven by the same GCM. Where differences between different RCMs are comparatively small, we suggest that priority should be given to sampling different driving GCMs. However, where regional processes are important we may expect significant divergence between different RCMs, and hence there is the need to sample multiple RCMs. We note that in cases where it is important to sample both multiple RCMs and multiple driving GCMs, it may not be necessary for each GCM to run each RCM because this could lead to the double counting of uncertainties. Such subtleties regarding the optimal ensemble design are beyond the scope of the current paper and are difficult to resolve until more systematically designed matrix experiments become available. As such, this paper represents a first attempt at examining the question of how to design multimodel regional projection experiments.
The key questions we aim to address in this paper are as follows:
To what extent does noise resulting from internal variability impact on our ability to assess local scaling skill, for both mean and extremes of temperature and precipitation?
Is local scaling skillful in predicting the local RCM change for different driving GCMs, and hence can it be used to enhance the information from sparsely filled GCM–RCM ensemble matrices?
How do the errors in scaling across different driving GCMs compare to RCM differences, and what are the implications for GCM–RCM ensemble design?
2. Methodology
a. Model experiments
In this study we use data from the third version of the Rossby Centre regional atmospheric climate model (RCA3), which is driven by a three-member initial condition ensemble of the ECHAM5/Max Planck Institute Ocean Model (MPI-OM; Roeckner et al. 2006; Jungclaus et al. 2006) and three members of the perturbed physics ensemble of the Hadley Centre Coupled Model version 3 (HadCM3; Collins et al. 2010). RCA3 has a model domain spanning Europe, with a horizontal resolution of approximately 50 km × 50 km. Its formulation has been described by Samuelsson et al. (2010). We use 29 yr of data common to both GCM driving ensembles from each of the control and future periods, corresponding to the years of 1961–89 and 2071–99 respectively. For the future period, atmospheric constituents follow the Intergovernmental Panel on Climate Change (IPCC) Special Report on Emissions Scenarios (SRES) A1B scenario (Nakićenović et al. 2000).
The three members of the ECHAM5 driving ensemble (ECHAM5-r1, ECHAM5-r2 and ECHAM5-r3) were initiated from three different times within a long control integration, and, thus, while they have the same anthropogenic forcings, they differ in their realizations of internal climate variability. Thus, the ECHAM5-driven RCA3 ensemble allows us to estimate uncertainty in the local climate response resulting from large-scale internal climate variability, because this is inherited through the lateral and surface boundaries from the global model ensemble, as well as small-scale internal variability generated by the RCM. We note that here we are specifically sampling model-simulated internal variability, and an analysis of how representative this is of actual natural variability is beyond the scope of this paper. However, in the context of assessing the skill of local scaling in estimating model-simulated changes, it is the former that is relevant.
The Hadley Centre perturbed physics ensemble used here was generated by varying 31 uncertain parameters in the atmospheric physics and surface schemes of HadCM3 within plausible ranges (Collins et al. 2010). In total, 16 perturbed physics versions of HadCM3 were constructed, which were designed to systematically sample the modeling uncertainties. Two of these, the low- and high-sensitivity experiments HadCM3-Q3 and HadCM3-Q16—those versions sampling the lowest or highest climate sensitivities, respectively, while showing a realistic present-day climate—along with the standard unperturbed model HadCM3-Q0 have been downscaled here. Thus, the HadCM3-driven RCA3 ensemble samples uncertainty in the local climate response resulting from different driving GCMs (with the caveat that these all have the same model structure), as well as internal variability.
In addition to the above RCA3 model experiments, we also analyze data from the Netherlands Meteorological Institute regional atmospheric climate model RACMO2 (Lenderink et al. 2003) and the Max Planck Institute regional model (REMO) (Jacob et al. 2001), both of which are driven by a single member (r3) of the ECHAM5 ensemble, described above. These model experiments have been performed as part of the European ENSEMBLES (Jacob et al. 2008) project. In both cases, the RCM domain spans Europe, with a horizontal resolution of approximately 25 km × 25 km. In the analysis here, the data are spatially aggregated onto the 50-km RCA3 grid.
b. Statistical analysis
We consider the following metrics of the daily precipitation distribution: mean, standard deviation (std dev), wet-day frequency (Fwet), and the 50th, 75th, 90th, 95th, and 99th percentiles of wet days (for a wet-day threshold of 0.1 mm day−1). For daily 2-m temperature we consider the mean, std dev, and the 1st, 5th, 10th, 50th, 90th, 95th, and 99th percentiles. These metrics are calculated at each grid box, for each RCM simulation and corresponding driving GCM, for each of the control and future periods for both winter [December–February (DJF)] and summer [June–August (JJA)].
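The precipitation metrics listed above can be sketched for a single grid box and season as follows. This is a minimal illustration, not the authors' code: the function name `precip_metrics`, the use of the sample standard deviation, the choice of "greater than or equal to" for the wet-day threshold, and NumPy's default percentile interpolation are all assumptions.

```python
import numpy as np

WET_DAY_THRESHOLD = 0.1  # mm/day, the wet-day threshold stated in the text


def precip_metrics(daily_precip, percentiles=(50, 75, 90, 95, 99)):
    """Metrics of a 1-D daily precipitation series (one grid box, one season)."""
    p = np.asarray(daily_precip, dtype=float)
    # Assumption: a day counts as wet when precipitation >= threshold.
    wet = p[p >= WET_DAY_THRESHOLD]
    metrics = {
        "mean": float(p.mean()),
        "std": float(p.std(ddof=1)),      # assumption: sample std dev
        "fwet": wet.size / p.size,        # wet-day frequency
    }
    for q in percentiles:
        # Percentiles are taken over wet days only, as in the text.
        metrics[f"p{q}"] = float(np.percentile(wet, q)) if wet.size else float("nan")
    return metrics
```

The same function would be applied separately to the control and future periods, for each RCM simulation and its driving GCM, before any changes are computed.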
We wish to examine whether the local scaling relationship [Eq. (1)] derived for the ECHAM5 driving GCM can be used to estimate changes in RCA3 for HadCM3 driving models. This corresponds to downscaling from 300 to 50 km, with changes given by the difference between the future and control periods. At each grid box and for each variable, the scaling coefficient AE is calculated by one of the following two methods: 1) as the ratio of the RCM to the GCM change for a single member of the ECHAM5 ensemble, or 2) using least squares linear regression (with zero intercept) applied to the three-member ECHAM5 ensemble. The latter method has the advantage of reducing noise resulting from internal variability (see section 4; Ruosteenoja et al. 2007). In each case, the GCM change corresponds to the same metric of the daily distribution as the RCM change at the nearest GCM grid box and for the same season. The resulting value of AE is then used to predict the RCM change for the GCM change in a given member of the HadCM3 ensemble, and this is compared with the actual simulated response. We also compare AE with the scaling coefficient AH, which is similarly derived using a least squares fit to the HadCM3 ensemble.
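The two ways of obtaining the scaling coefficient can be sketched as below. Eq. (1) is not reproduced in this section, so the sketch assumes the simple multiplicative form (RCM change = coefficient × GCM change) that the text describes; the function names are hypothetical. For a regression through the origin, the least squares coefficient has the closed form Σxy / Σx².

```python
import numpy as np


def coeff_single_member(rcm_change, gcm_change):
    """Method 1: ratio of RCM to GCM change for one ensemble member."""
    return rcm_change / gcm_change


def coeff_zero_intercept(rcm_changes, gcm_changes):
    """Method 2: least squares fit with zero intercept across ensemble members."""
    x = np.asarray(gcm_changes, dtype=float)  # GCM changes (predictor)
    y = np.asarray(rcm_changes, dtype=float)  # RCM changes (predictand)
    # Closed-form solution for y = A * x with no intercept term.
    return float(np.sum(x * y) / np.sum(x * x))


def predict_rcm_change(coeff, gcm_change):
    """Apply the assumed multiplicative scaling to a new driving GCM change."""
    return coeff * gcm_change
```

A coefficient fitted to the three ECHAM5-driven members would then be passed to `predict_rcm_change` together with the GCM change from a HadCM3 member, and the result compared with the simulated RCA3 change at that grid box.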
For mean precipitation and wet-day frequency, we apply local scaling to percentage changes in the metric, while for the standard deviation and percentiles of daily precipitation we consider changes in the natural logarithm. For temperature, we apply scaling to absolute changes for all indices, except standard deviation, for which percentage changes are considered. These transforms were selected because they were found to give optimal scaling performance (section 4). However, we note that applying a logarithmic transform to precipitation is widely used in statistical downscaling methods because this leads to a more Gaussian distribution and improved linearity between local precipitation and large-scale variables. In addition, scaling percentage changes or changes in the natural logarithm of a climate variable implies a relationship between the relative changes in the variable at local and large scales. This is appropriate for precipitation, which can be modeled as a multiplicative process (Sapiano et al. 2006) with relative changes in precipitation given by the sum of the relative changes in the multiplicative factors (viz., specific humidity q and uplift ω). In particular, if changes in precipitation are dominated by large-scale changes in q, we would expect a strong correspondence between the relative changes in precipitation at both the local and large scales.
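The three ways of expressing a change discussed above (absolute, percentage, and change in the natural logarithm) can be sketched as follows. The function names are hypothetical. The log form makes the multiplicative argument explicit: if a quantity is a product of two factors, its log change is exactly the sum of the log changes of the factors.

```python
import math


def absolute_change(control, future):
    """Future minus control value of a metric."""
    return future - control


def percent_change(control, future):
    """Change relative to the control value, in percent."""
    return 100.0 * (future - control) / control


def log_change(control, future):
    """Change in the natural logarithm of a (positive) metric."""
    return math.log(future) - math.log(control)


# Illustrative numbers only: a product q * w changes by the sum of the
# log changes of q and w.
q0, q1, w0, w1 = 2.0, 2.4, 0.5, 0.45
total = log_change(q0 * w0, q1 * w1)
parts = log_change(q0, q1) + log_change(w0, w1)
```

Here `total` and `parts` agree to floating-point precision, which is the sense in which relative changes in the multiplicative factors add.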
We calculate regional average skill in local scaling for eight European regions defined in Fig. 1. These approximately correspond to the regions used in PRUDENCE (Christensen and Christensen 2007), except here we include land and sea grid cells. Regional skill is measured using the inverse nonlinear fraction (InvF) expressed as a bulk measure. This is calculated as the ratio of the regional average RCM change to the regional average nonlinear component. Grid cells where the magnitude of local change is not robust (|SNR| < 5) are excluded, except where the influence of internal variability is explicitly being investigated (section 3).
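A rough sketch of the regional skill measure is given below. It rests on several assumptions that the text does not spell out: the nonlinear component is taken to be the difference between the simulated and scaled RCM change, regional averages are taken over absolute values, and the SNR at each grid cell is supplied by the caller because its definition is not given here. The function name `inv_f` is hypothetical.

```python
import numpy as np


def inv_f(actual_change, predicted_change, snr=None, snr_min=5.0):
    """Bulk inverse nonlinear fraction over the grid cells of one region."""
    actual = np.asarray(actual_change, dtype=float)
    predicted = np.asarray(predicted_change, dtype=float)
    if snr is None:
        keep = np.ones(actual.shape, dtype=bool)
    else:
        # Exclude cells where the local change is not robust (|SNR| < snr_min).
        keep = np.abs(np.asarray(snr, dtype=float)) >= snr_min
    if not keep.any():
        return float("nan")  # no robust cells in this region
    # Assumption: regional averages of absolute values.
    signal = np.mean(np.abs(actual[keep]))
    nonlinear = np.mean(np.abs(actual[keep] - predicted[keep]))
    return float(signal / nonlinear) if nonlinear > 0 else float("inf")
```

Under this reading, InvF = 2 corresponds to a regional scaling error of half the size of the signal, and InvF = 1 to an error as large as the signal itself.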
3. Influence of internal variability
In this section we use local scaling to estimate RCA3 changes for different driving GCMs and examine the contribution of internal variability to any errors in this technique. We use the scaling relationship derived from the ECHAM5-r1-driven integration to estimate RCA3 changes in the two other ECHAM5-driven ensemble members, and we compare these results with the skill in estimating RCA3 changes for HadCM3-driven integrations (Fig. 2). Apparent errors in scaling between the ECHAM5-driven ensemble members reflect internal variability, because the underlying change and therefore the scaling relationship is the same in each ensemble member.
For mean precipitation, local scaling only appears to be skillful (InvF > 2) in estimating changes for different driving GCMs (specifically, HadCM3) for northern regions in winter (red lines, Fig. 2). In winter, the skill in scaling for the HadCM3 driving models is only slightly lower than that for the different ECHAM5 ensemble members (InvF is typically a factor of 2 less, red versus blue lines). Thus, in this case, internal variability significantly contributes to errors in scaling and appears to explain much of the regional variation in scaling performance across Europe. Similarly, for extreme precipitation [95th percentile (P95)] in both seasons, there is evidence that internal variability may dominate, leading to the poor scaling performance in many regions. For mean precipitation in summer, however, the situation is different. In this case, errors in scaling for the HadCM3 driving models are considerable (InvF ≃ 1, corresponding to scaling errors of the order of magnitude of the signal) and are much larger, in relative terms, than those resulting from internal variability, except in the far north [Scandinavia (SC)] and far south [the Iberian Peninsula (IP) and Mediterranean (MD)] of Europe.
The relationship between local scaling skill and SNR for these precipitation metrics is shown in Fig. 3 (upper panels). This confirms that for both mean and extreme precipitation in winter, internal variability at least partially restricts scaling skill. In particular, for those local regions where 1 < SNR < 10, which includes much of Europe,
For temperature, local scaling shows good skill in estimating changes for the HadCM3-driving models, particularly for mean temperature in winter, where InvF > 5 across Europe (Fig. 2). Skill is lower in summer than winter, for both mean and extremes of temperature, and this is not explained by internal variability. In general, there is no clear relationship between scaling performance and internal variability for temperature, with high values of SNR for mean and extreme temperature across Europe in both seasons (Fig. 3). Notably, for all temperature indices, InvF values for scaling across different driving GCMs are always lower than those obtained when internal variability is the only source of uncertainty. Thus, local scaling is never perfect, even when it is showing substantial skill (as is the case for mean winter temperature).
4. Performance of local scaling
The above results suggest that internal variability may lead to substantial errors in scaling for precipitation indices, where the scaling relationship is derived from a single 30-yr climate change experiment, particularly for extremes and in winter. Thus, in this section, we derive the scaling coefficient by fitting a linear regression model (with no intercept) to the three-member ECHAM5 initial condition ensemble. This reduces the effects of internal variability (Ruosteenoja et al. 2007). Additionally, we restrict our analysis of scaling performance to those grid cells where the magnitude of local change is robust, as assessed by the three-member ensemble (section 2b).
For each index of the daily distribution, we examined the performance of local scaling for estimating absolute changes, percentage changes, and changes in the natural logarithm of the index. Results for mean precipitation and temperature are shown in Fig. 4. It can be seen that, for precipitation, either considering percent changes or applying a logarithmic transform leads to improved scaling skill, and the corresponding scatterplots for these transforms support the simple multiplicative scaling function [Eq. (1)] assessed here. In general, the best skill was found on considering percent changes for mean precipitation and wet-day frequency, and on applying a logarithmic transform for standard deviation and percentiles of precipitation. For temperature, the best skill was found on considering absolute changes for all indices except standard deviation for which percent changes are considered. All of the results below correspond to these optimal transforms.
For mean precipitation, local scaling is generally skillful (InvF > 2) in winter where the local change is robust compared to internal variability, while in summer poor scaling skill remains (Fig. 5). In winter, the skill for mean precipitation extends to the tail of the distribution (although increasing internal variability means fewer grid cells contribute at the tail; see Fig. 5). In fact, skill is actually greater for higher moments of the precipitation distribution in some cases (Fig. 6). In this season, of the precipitation indices examined here, changes in precipitation variance (std dev) are estimated most skillfully and changes in the occurrence of wet days (Fwet) least skillfully. [We note that the poor regional skill for mean wintertime precipitation seen for some southern regions in Fig. 6 is due to poor performance in relatively few grid cells (see Fig. 5).] In summer, there is some suggestion that scaling may be more skillful for estimating changes in the occurrence of wet days than for mean precipitation. In this season, changes in precipitation variance and extremes are not robust compared to internal variability, and so scaling performance cannot be assessed for these indices.
For mean temperature, local scaling shows good skill in winter, with InvF ≥ 5 across almost all of Europe (Fig. 5). This skill extends to the upper tail of the distribution over much of Europe, although there is reduced skill for upper percentiles over northern Scandinavia and parts of eastern Europe. In summer, there is a reduced scaling skill for mean temperature and poor skill for upper percentiles in a band across central and northern Europe. For lower percentiles, however, scaling shows good skill in all of the regions (Fig. 6). For temperature variance (std dev), changes are not robust over much of central and southern Europe in winter; while in summer, on restricting the analysis to those grid cells where the local change is robust, poor scaling skill remains in many regions (Fig. 6).
In the above results, the GCM predictor was chosen to be the same index of the daily distribution as the RCM predictand. In Fig. 7, we examine the effect of choosing different GCM predictors. This suggests that mean temperature change in the GCM may be a better predictor of local changes in precipitation (and, in particular, wet-day occurrence) over northeastern regions [SC, mid-Europe (ME), and east Europe (EA)] in winter, and possibly also precipitation over Scandinavia and France in summer; while mean precipitation change in the GCM may be a better predictor of local changes in the variance and upper percentiles of temperature for central European regions [ME, EA, and the Alps (AL)] in summer. These differences in performance may not be significant (although they are not formally assessed here), but are of more interest and in particular may apply more generally where they can be understood on a physical basis (see section 6). Other than these cases, there is no consistent improvement in skill when using a different GCM predictor. We also find no evidence that using a multiple linear regression relationship, with both mean precipitation and temperature change in the GCM as predictors (not shown), gives improved skill in estimating local changes above a simple linear scaling (with one predictor). Thus, these results support the simple approach adopted here, with there being only isolated cases for which it is worth considering a modified scaling relationship.
We note that Fig. 7, to some extent, gives an indication of the relative performance of local scaling versus more traditional pattern scaling, which uses global-mean temperature change ΔTg as the predictor of the local climate response (section 1). Scaling performance using mean temperature change in the GCM as the predictor (red lines, Fig. 7) provides an upper limit on the expected skill on scaling by ΔTg. In particular, although mean temperature change in the GCM scales well with ΔTg across scenarios, this is much less likely to be the case across different driving GCMs (Mitchell 2003). The results here suggest that local scaling (using the GCM change in the given variable as the predictor, black lines) will give significantly improved skill compared to traditional pattern scaling when estimating local changes for different driving GCMs for precipitation indices (both mean and measures of variability) over much of Europe (at least over northwestern and southern regions in winter and central Europe in summer), and measures of temperature variability (particularly over the British Isles and France).
5. Strategies for ensemble design
a. Overview
The results of the previous section indicate that uncertainty in summertime changes in local precipitation across Europe, and also to a lesser extent local temperature across central/northern Europe, cannot be adequately represented by simply scaling changes resolved by GCMs. To capture uncertainty in the local response in each case, it is necessary to perform RCM simulations for a range of different driving GCMs. This then raises the question: how important is it to also sample a range of different RCMs? In this section, we make a first attempt at addressing this question by comparing the skill in scaling for different driving GCMs (InvFGCM) with that for different RCMs driven by the same GCM (InvFRCM); the latter equates to approximating changes in all RCMs driven by the same GCM as being equal. Based on the relative skill, we can identify the relative importance of sampling multiple driving GCMs versus multiple RCMs, in the context of the limited ensemble assessed here. As such, this analysis provides useful insights into how to prioritize RCM simulations for a given set of available GCM simulations, with implications for future, more comprehensive multimodel experiments. The further subtlety of which of the various possible GCM–RCM combinations should be chosen is beyond the scope of the current paper (section 1).
From an analysis of scaling skill, we can identify the following five possible strategies to best prioritize RCM simulations given a constraint of limited resources in a GCM–RCM matrix study:
If |SNR| < 5, then a better sampling of internal variability is needed.
If |SNR| ≥ 5 and InvFRCM ≫ 2 ≥ InvFGCM, then simulations for different driving GCMs for a reduced set of RCMs should be carried out.
If |SNR| ≥ 5 and InvFRCM ≤ 2 ≪ InvFGCM, then priority should be given to sampling different RCMs, using local scaling to estimate changes for other driving GCMs.
If |SNR| ≥ 5 and InvFRCM ≫ 2 ≪ InvFGCM, then uncertainty (which would be sampled by the full GCM–RCM matrix) can be adequately represented by a small number of RCM simulations using local scaling to estimate changes for other driving GCMs.
If |SNR| ≥ 5 and InvFRCM ≤ 2 ≥ InvFGCM, then multiple RCMs and multiple driving GCMs need to be sampled.
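The five strategies listed above amount to a small decision rule, sketched below. This is an illustration under stated assumptions: the function name and return strings are hypothetical, and the "much greater than 2" conditions in the text are collapsed to a simple "greater than 2" test, which is a simplification rather than the paper's criterion.

```python
def ensemble_strategy(snr, invf_rcm, invf_gcm, snr_min=5.0, skill_min=2.0):
    """Map SNR and the two skill measures to one of the five strategies."""
    if abs(snr) < snr_min:
        return "better sampling of internal variability"
    # Assumption: 'much greater than 2' is approximated by '> 2'.
    rcm_skillful = invf_rcm > skill_min  # RCM differences are small
    gcm_skillful = invf_gcm > skill_min  # scaling works across driving GCMs
    if rcm_skillful and not gcm_skillful:
        return "sample driving GCMs for a reduced set of RCMs"
    if gcm_skillful and not rcm_skillful:
        return "sample RCMs; scale for other driving GCMs"
    if rcm_skillful and gcm_skillful:
        return "few RCM simulations; scale for other driving GCMs"
    return "sample multiple RCMs and multiple driving GCMs"
```

For example, a region with a robust change, small RCM differences, and poor scaling skill across driving GCMs (`ensemble_strategy(8.0, 10.0, 1.0)`) falls into the second strategy. In practice a margin around the threshold of 2 would be needed to reflect the "much greater than" wording.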
We note that there have been a number of previous studies comparing the relative contributions of GCM uncertainty and RCM uncertainty to the overall uncertainty in the local climate response (e.g., Déqué et al. 2007). However, large GCM uncertainty per se does not necessarily imply the need to perform RCM simulations for multiple driving GCMs. In particular, where there are significant RCM differences and local scaling is skillful in estimating changes for different driving GCMs (even if the GCM spread is large), it may be better to focus computer resources on sampling different RCMs. Therefore, it is the error from scaling (or other statistical downscaling method) in estimating changes for different GCMs that should be compared to RCM differences when informing ensemble design. Of course, this does not give any indication of the need for multiple GCM simulations to capture uncertainty in the large-scale response, including the need to sample uncertainties in different feedback processes. We also note that the uncertainty we are considering here is specifically modeling uncertainty arising from different GCMs and RCMs, and not uncertainty from different emission scenarios.
In the analysis here we use the scaling relationship derived from the ECHAM5-driven RCA3 ensemble in order to estimate local changes for the HadCM3-driven RCA3 ensemble and for the RACMO and REMO RCMs, both of which are driven by ECHAM5-r3 (Fig. 8). Thus, this represents a limited study, although as discussed below the models assessed here are reasonably representative of the uncertainty range sampled by a larger ensemble. In addition, because the scaling relationships are trained from just three realizations of the same GCM–RCM pair, this actually poses a stringent test on scaling performance. In particular, we may expect scaling relationships derived from a range of GCM–RCM pairs to be more robust. Thus, where the analysis here indicates good scaling performance, we suggest this is very promising for more comprehensive efforts in the future.
We note that errors on estimating RACMO and REMO changes using the scaling relationship from the ECHAM5-driven RCA3 ensemble may in part reflect the differences in resolution of the RCMs (even though the precipitation and temperature fields in RACMO and REMO have been aggregated onto the RCA3 grid). Thus, in order to identify where this may lead to an underestimation of scaling skill, we also consider the skill in approximating changes in REMO as equal to those in RACMO. In this case InvF is given by the ratio of the RCM change to the difference in the changes between the two RCMs (green dotted lines, Fig. 8).
b. Scaling skill for different RCMs versus different driving GCMs
For mean precipitation, using local scaling to estimate changes for the different RCMs assessed here (which equates to assuming that all RCMs give the same change for given boundary forcing) generally leads to errors of less than 50% (InvFRCM > 2; see Fig. 8). Exceptions to this are seen over eastern Europe in winter, the Alps in winter, and Scandinavia in summer (although for the latter two cases few grid cells contribute because of a strong influence from internal variability; see Fig. 5). Internal variability is a major contributor to scaling error for mean precipitation, particularly in winter (section 3), and we anticipate that a significant proportion of this will be locally generated. However, a more comprehensive analysis would be needed to quantify to what extent these RCM differences can be attributed to internal variability.
Significant RCM differences (InvFRCM < 2) in changes in wet-day occurrence are seen in similar regions, namely, northeastern regions in winter and Scandinavia in summer, suggesting that this may be a major contributor to RCM differences in mean changes. For changes in precipitation extremes, significant RCM differences are only evident over the Alps in winter (although, again, few grid cells contribute because of the strong internal variability). In these regions, uncertainty in the local precipitation response may be significantly underestimated on neglecting RCM differences. (Although this assumes that the RCM differences are physically justifiable and not incompatible with their driving simulations, which would need to be assessed.)
For almost all of Europe in summer and southern regions in winter, local scaling skill is greater when estimating precipitation changes for different RCMs than for different driving GCMs (for the models assessed here). By contrast, for changes in precipitation extremes over the Alps in winter, local scaling appears to be more skillful for different driving GCMs than for different RCMs. For changes in wet-day occurrence over northeastern regions in winter and Scandinavia in summer, there are significant RCM differences and scaling also shows limited skill for different driving GCMs.
For temperature indices the results are simpler, with consistent behavior across much of Europe. For mean temperature, in all of the regions and both seasons, assuming different RCMs give the same change (for a given driving GCM) leads to errors of about 10% or less (InvFRCM ≳ 10) for the models assessed here. For temperature extremes, assuming that different RCMs give the same change is also skillful, with differences generally less than 20%. For changes in temperature variance, however, differences between RCMs are greater, particularly over the Alps in winter and central Europe in summer. This suggests that in these regions, uncertainty in local changes in temperature variance may be significantly underestimated on neglecting RCM differences. In general, in summer, the skill in estimating local temperature change for different driving GCMs is consistently lower than that for different RCMs. This is similarly true over southern Europe in winter, except for temperature variance over the Alps; while further north, in winter, scaling shows similarly high skill in estimating temperature changes for different driving GCMs as for different RCMs.
c. Implications for ensemble design
The above results, although limited by the number of models assessed here, provide useful insight into possible strategies for GCM–RCM ensemble design. To capture uncertainty in precipitation change in summer, the results here suggest it is necessary to sample driving GCM uncertainty, but this may be limited to a reduced set of RCMs (Table 1). A possible exception to this may be Scandinavia, where there is some suggestion that sampling RCM differences may be important, although this is currently limited to relatively few grid cells showing a robust change. Here, and more widely for precipitation extremes in summer, better sampling of internal variability is needed (through a larger initial condition ensemble). We note that although sampling more GCM–RCM pairs would indirectly sample internal variability, it would be difficult to disentangle the different sources of uncertainty. Better sampling of internal variability is also needed to capture precipitation changes over southern Europe in winter. Elsewhere for winter, the results here suggest that the ensemble strategy will vary depending on the region and precipitation statistic of interest (Table 1). In particular, sampling driving GCM uncertainty for a reduced set of RCMs may not be the appropriate strategy for capturing precipitation changes over northeastern Europe or the Alps in winter, where sampling RCM differences may be equally or more important. In this season local scaling is more skillful, and it can be used to estimate precipitation changes for untried driving GCMs (with the exception of changes in precipitation occurrence, for which a modified scaling relationship may be appropriate; see section 4).
To capture uncertainty in local temperature change, the above results suggest that priority should be given to sampling different driving GCMs for a reduced set of RCMs (Table 2). Where simulations are not available, however, local scaling is skillful in estimating local changes for different driving GCMs, except over central/northern Europe in summer (although here a modified scaling relationship using a different predictor may be skillful; see section 4). An exception to this general rule is found for changes in temperature variance, for which sampling RCM uncertainty is equally important over central Europe in summer and more important for the Alps in winter; while for much of the rest of Europe better sampling of internal variability is needed.
These results are based on the analysis of four GCMs (viz., ECHAM5 and three perturbed physics variants of HadCM3) and three RCMs (RCA3, RACMO, and REMO). We note that although this is a limited ensemble, the model differences examined here are reasonably representative of those between other GCMs and RCMs. In particular, the difference between the four GCMs used here is found to span a large part of the total uncertainty range sampled by a larger RCA3 ensemble, which includes four other GCMs [Centre National de Recherches Météorologiques Coupled Global Climate Model, version 3 (CNRM-CM3); Bjerknes Centre for Climate Research (BCCR) Bergen Climate Model (BCM), version 2; Community Climate System Model, version 3 (CCSM3); and L’Institut Pierre-Simon Laplace Coupled Model, version 4; see Kjellström et al. 2010], although we may expect a greater spread across the larger Coupled Model Intercomparison Project phase 3 (CMIP3) ensemble (Lind and Kjellström 2008). In the case of the RCMs, there is no evidence to suggest that the models examined here are “clustered” in their responses compared to other RCMs (e.g., Christensen and Christensen 2007); that is, there is no evidence to suggest that we are underestimating uncertainty arising from RCM differences. We note that the differences between different RCMs and GCMs will include a component resulting from internal variability, as well as differences in model formulation.
The fact that local scaling is showing good skill across quite different driving GCMs here gives us confidence that the conclusions regarding scaling performance may apply more generally. Also the suggestion here that it may be possible to rationalize multimodel ensemble experiments by carrying out simulations for a reduced set of RCMs is very promising for future studies. Nevertheless, given the limited ensemble assessed here, it is important that the generality of these results is examined in future studies incorporating a greater breadth of GCM–RCM combinations. In particular, we suggest the need for multiple RCMs, over and above the need for better sampling of internal variability, be critically examined.
6. Discussion and conclusions
In this study we have examined the skill of a local scaling technique in estimating RCM changes for untried driving GCMs and the implications for designing GCM–RCM ensembles given limited resources. The scaling technique has the advantage of being simple and generic, and thus potentially applicable across regions and climate variables. In particular, the results here, assessing skill for a range of precipitation and temperature indices across Europe, support this simple approach and identify only isolated cases for which it is worth considering a modified downscaling relationship.
In general we find that local scaling is never a complete substitute for doing additional RCM runs, with skill always being lower than that obtained when internal variability is the only source of uncertainty. However, it often performs with acceptable skill, with exceptions for particular variables, regions, and seasons that are discussed below. We note the results here correspond to downscaling from 300 to 50 km. If this technique were used to estimate changes at finer scales (e.g., in 25-km RCMs), we would expect some reduction in skill, with an increased influence from internal variability. However, the analysis here, whereby the scaling function is fitted to each grid box individually, is a stringent test of scaling skill. In particular, we find considerable spatial coherence in the scaling coefficient (not shown), and accounting for this in fitting the function could significantly improve scaling performance.
We find that internal variability leads to substantial errors in local scaling for daily precipitation indices, where the scaling relationship is derived from a single 30-yr climate change experiment. As a consequence, scaling relationships should be derived using an ensemble of integrations (by fitting a least squares regression line) and are only likely to be reliable where the magnitude of the local change is robust compared to internal variability. In this paper we train the scaling relationships from three realizations of the same GCM–RCM pair. However, scaling relationships derived from a range of GCM–RCM pairs (specifically a range of different driving GCMs for a given RCM) may be more robust. We note that in this case the GCM–RCM pairs used to determine the scaling relationship would not correspond to identically distributed realizations of the climate, and thus could not simply be used to assess the influence of internal variability. Instead, results from a smaller initial condition ensemble could be used to approximate SNR (because the sample provides an unbiased estimator of the population value) under the assumption that the relative influence from internal variability is the same for each GCM–RCM pair. In this case the appropriate SNR threshold corresponding to scaling errors less than 50% in the multimodel ensemble would depend on the size of this larger ensemble. In particular the SNR threshold would decrease with increasing numbers of GCM–RCM pairs (Kendon et al. 2008).
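As an illustration of fitting a scaling relationship to an ensemble by least squares, the sketch below estimates a single scaling coefficient at one grid cell and a simple signal-to-noise ratio. It assumes a regression line through the origin, a predictor equal to the GCM-scale change in the same variable, and an SNR defined as the ensemble-mean change divided by the intermember standard deviation. These choices, the function names, and the numbers are hypothetical and may differ from the definitions used in the paper.

```python
import numpy as np

def fit_scaling_coefficient(gcm_change, rcm_change):
    """Least squares slope k of a line through the origin: rcm ~ k * gcm."""
    gcm_change = np.asarray(gcm_change, dtype=float)
    rcm_change = np.asarray(rcm_change, dtype=float)
    return np.sum(gcm_change * rcm_change) / np.sum(gcm_change ** 2)

def signal_to_noise(rcm_change):
    """Ensemble-mean change over the sample standard deviation across members."""
    rcm_change = np.asarray(rcm_change, dtype=float)
    return np.abs(rcm_change.mean()) / rcm_change.std(ddof=1)

# Hypothetical changes (%) at one grid cell for three realizations
# of the same GCM-RCM pair.
gcm_change = np.array([8.0, 10.0, 12.0])   # large-scale change in the GCM
rcm_change = np.array([10.0, 12.0, 14.0])  # local change in the RCM

k = fit_scaling_coefficient(gcm_change, rcm_change)
snr = signal_to_noise(rcm_change)

# Estimated local change for an untried driving GCM (hypothetical value).
untried_gcm_change = 15.0
estimated_local_change = k * untried_gcm_change
```

With only three members the coefficient and the SNR are both uncertain, which is why such an estimate would be used only where the local change is robust.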
On restricting the analysis to those grid cells where the local change is robust, we find that local scaling generally shows skill in estimating local changes in precipitation indices (including mean, variance, and extremes) across Europe in winter; and mean and extreme temperature in both seasons, with the exception of hot extremes over central/northern Europe in summer. Thus, for these variables where simulations are not available (i.e., where there are gaps in the GCM–RCM matrix), local scaling can be used to estimate changes in an RCM for untried driving GCMs. We note that the scaling coefficient is RCM dependent (scaling across RCMs simply equates to assuming different RCMs give the same change for a given driving GCM), and thus this technique can only be applied for those RCMs that have downscaled at least one GCM.
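The gap-filling idea can be sketched for a sparsely filled GCM–RCM matrix at a single grid cell. The example assumes a proportional scaling through the origin fitted separately for each RCM. The array layout, the function name, and all values are hypothetical.

```python
import numpy as np

# Large-scale change in each of three driving GCMs (hypothetical values).
gcm_change = np.array([2.0, 3.0, 4.0])

# Local change at one grid cell: rows are RCMs, columns are driving GCMs.
# NaN marks a GCM-RCM combination that has not been simulated.
local_change = np.array([
    [2.4, 3.6, np.nan],        # RCM A: two driving GCMs available
    [1.8, np.nan, np.nan],     # RCM B: one driving GCM available
    [np.nan, np.nan, np.nan],  # RCM C: no simulations at all
])

def fill_matrix(local_change, gcm_change):
    """Fill gaps row by row using an RCM-specific scaling coefficient."""
    filled = local_change.copy()
    for i, row in enumerate(local_change):
        available = ~np.isnan(row)
        if not available.any():
            # No simulation exists for this RCM, so no coefficient can be fitted.
            continue
        x = gcm_change[available]
        k = np.sum(x * row[available]) / np.sum(x ** 2)
        filled[i, ~available] = k * gcm_change[~available]
    return filled

filled = fill_matrix(local_change, gcm_change)
```

The row for the third RCM stays empty, reflecting the point that the coefficient is RCM dependent. The coefficient for the second RCM rests on a single simulation and so would be sensitive to internal variability.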
Local scaling shows poor skill in summer for precipitation indices and temperature variance, and also for upper percentiles of temperature in a band across central and northern Europe. This may be understood from the importance of local processes, such that the local change cannot be adequately represented simply by scaling large-scale changes resolved in the GCM. In particular, we suggest that the poor scaling skill for precipitation in summer is due to the importance of local convective processes, while for temperature indices it is the importance of local soil moisture processes (Rowell and Jones 2006). We note that the latter is also supported by the result that mean precipitation change in the GCM is a better predictor of local temperature change over central Europe in summer. In the case of soil moisture, there is a threshold below which it restricts evaporation, increasing the ratio of sensible to latent heating. This threshold dependence means that the disaggregation property of RCMs (i.e., their ability to sample variability not resolved at GCM scales) will be particularly important. Locally the soil moisture threshold may be exceeded even if this is not the case on the large scale, leading to very different temperature changes at the local and large scales. This may also provide an explanation of why local scaling shows consistently less skill for variance than mean and extremes of temperature. In particular, for the extreme cases when the soil is either very dry or very wet in the GCM, the RCM is likely to produce a similar soil moisture state. However, when the GCM is in an intermediate state, there is more scope for the RCM soil moisture to differ locally. Thus it is changes in temperature variability (rather than the extremes) that are expected to differ most at local and large scales.
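The threshold argument can be illustrated with a toy calculation in which the grid-box mean soil moisture stays above the threshold while some finer-scale cells fall below it. The threshold value and the soil moisture values are hypothetical and are not taken from any of the models discussed here.

```python
import numpy as np

# Hypothetical soil moisture threshold below which evaporation is restricted.
THRESHOLD = 0.30

# Hypothetical soil moisture in the RCM cells lying within one GCM grid box.
local_soil_moisture = np.array([0.20, 0.25, 0.35, 0.40, 0.45, 0.45])

# Large-scale view: only the grid-box mean is resolved.
grid_box_mean = local_soil_moisture.mean()
limited_at_large_scale = grid_box_mean < THRESHOLD

# Local view: each cell is compared with the threshold individually.
limited_locally = local_soil_moisture < THRESHOLD
fraction_limited = limited_locally.mean()
```

In this toy case the large-scale state is not evaporation limited, yet a third of the local cells are, so the local temperature response could differ from a simple scaling of the large-scale response.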
The fact that local scaling is generally more skillful in estimating changes in precipitation variance than the mean in winter, while showing poor skill for wet-day occurrence, may also be understood from a consideration of the underlying processes. In this season, large-scale increases in atmospheric moisture with warming are dominant in driving increases in precipitation across Europe (Kendon et al. 2010). Because changes in atmospheric moisture have a strong impact on the intensity of precipitation events, but less so their occurrence (Trenberth 1999; Kendon et al. 2010), they will more directly affect changes in precipitation extremes and variability than the mean. For precipitation occurrence, changes in relative humidity (RH) are more important, and, because this is essentially an “on–off” process controlled by the exceedance of an RH threshold, the disaggregation property of RCMs is again important. As above for soil moisture, the RH threshold may be exceeded locally even if this is not the case on the large scale, leading to very different changes in precipitation occurrence on local and large scales. This may also explain why temperature change in the GCM is a better predictor of local changes in precipitation occurrence across much of Europe in winter. We note that an additional factor leading to poor local scaling skill for wet-day occurrence over northeastern Europe in winter may be the importance of local snow/ice processes.
Where local processes are important we may expect significant differences in the local climate response between different RCMs. For the models assessed here, this is indeed found to be the case for wet-day occurrence over northeastern Europe in winter and temperature variance over central Europe in summer, with the representation of local land surface processes differing between different RCMs. Significant RCM differences are also seen over the Alps in winter for both precipitation indices and temperature variance. Here, although local orographic processes are important, their interaction with the large-scale conditions is similar for different driving GCMs, and thus local scaling shows skill. Perhaps surprising is the finding that different RCMs lead to relatively smaller differences in the local precipitation response in summer (once the effects of internal variability have been accounted for). This suggests that although local convective processes are important (as evidenced by poor local scaling skill), they are being represented similarly in the different RCMs examined here.
In general, the results here suggest that ensembles should be designed to prioritize the sampling of GCM uncertainty by using a reduced set of RCMs. However, our ability to determine the ensemble strategy for precipitation indices, particularly in summer, is limited by a need for better sampling of internal variability. We note, compared to previous studies (Déqué et al. 2007), that the results here suggest a greater contribution from internal variability to RCM differences in precipitation change. In cases where input from multiple RCMs is needed (e.g., the Alps in winter; see Tables 1 and 2), it may be possible to restrict the number of GCMs being sampled, using local scaling to estimate changes for untried driving GCMs. However, this is not the case for wet-day occurrence over northeastern Europe in winter or temperature variance over central Europe in summer, unless a modified scaling relationship is used.
This study is based on a limited number of models, and thus it is important that the conclusions here are confirmed through analysis of a more comprehensive GCM–RCM ensemble, once this becomes available. Additionally, assessing the relative importance of sampling multiple driving GCMs versus multiple RCMs is only the first step in addressing the difficult issue of ensemble design. Judging the “optimal ensemble design” is a much wider question of which of the possible GCM–RCM combinations should be chosen, accounting for dependencies across the ensemble. This will be difficult to resolve until matrix experiments that are more systematically designed become available.
We note that although this analysis generally supports the use of a reduced set of RCMs in multimodel ensemble approaches, it does not imply that RCMs are providing limited “added value.” The representation of local processes by RCMs is essential both for finescale detail and extremes. However, uncertainty in the local climate response appears in many cases to be driven by uncertainty in the large-scale drivers or to be due to internal variability.
Acknowledgments
We thank Clare Goodess and Markku Rummukainen for very useful comments on an earlier version of this paper, and an anonymous reviewer for very constructive comments. Financial support was provided by the European Commission’s Sixth Framework Programme under Contract GOCE-CT-2003-505539 (ENSEMBLES). Kendon, Jones, and Murphy also gratefully acknowledge funding from the Joint Department of Energy and Climate Change (DECC) and Department for Environment Food and Rural Affairs (Defra) Integrated Climate Programme–DECC/Defra (GA01101). Part of the analysis work was carried out within the Swedish Mistra-SWECIA program. Many of the model simulations with the Rossby Centre regional climate model were performed on the climate computing resource Tornado, funded with a grant from the Knut and Alice Wallenberg Foundation. We also thank the climate modeling groups for their roles in making available the ENSEMBLES multimodel dataset.
REFERENCES
Christensen, J. H., and O. B. Christensen, 2007: A summary of the PRUDENCE model projections of changes in European climate by the end of this century. Climatic Change, 81 (Suppl. 1), 7–30.
Christensen, J. H., J. Räisänen, T. Iversen, D. Bjørge, O. B. Christensen, and M. Rummukainen, 2001: A synthesis of regional climate change simulations—A Scandinavian perspective. Geophys. Res. Lett., 28, 1003–1006.
Collins, M., B. B. B. Booth, B. Bhaskaran, G. R. Harris, J. M. Murphy, D. M. H. Sexton, and M. J. Webb, 2010: Climate model errors, feedbacks and forcings: A comparison of perturbed physics and multi-model ensembles. Climate Dyn., in press, doi:10.1007/s00382-010-0808-0.
Déqué, M., and Coauthors, 2007: An intercomparison of regional climate simulations for Europe: Assessing uncertainties in model projections. Climatic Change, 81 (Suppl. 1), 53–70.
Harris, G. R., D. M. H. Sexton, B. B. B. Booth, M. Collins, J. M. Murphy, and M. J. Webb, 2006: Frequency distributions of transient regional climate change from perturbed physics ensembles of general circulation model simulations. Climate Dyn., 27, 357–375, doi:10.1007/s00382-006-0142-8.
Hewitt, C. D., and D. J. Griggs, 2004: Ensembles-based predictions of climate changes and their impacts. Eos, Trans. Amer. Geophys. Union, 85, doi:10.1029/2004EO520005.
Jacob, D., and Coauthors, 2001: A comprehensive model intercomparison study investigating the water budget during the BALTEX-PIDCAP period. Meteor. Atmos. Phys., 77, 19–43.
Jacob, D., O. B. Christensen, F. J. Doblas-Reyes, C. Goodess, A. Klein Tank, P. Lorenz, and E. Roeckner, 2008: Information on observations, global and regional modelling data availability and statistical downscaling. ENSEMBLES Tech. Rep. 4, 10 pp. [Available online at http://ensembles-eu.metoffice.com/tech_reports/ETR_4_vn1.pdf].
Jungclaus, J. H., and Coauthors, 2006: Ocean circulation and tropical variability in the coupled ECHAM5/MPI-OM. J. Climate, 19, 3952–3972.
Kendon, E. J., D. P. Rowell, R. G. Jones, and E. Buonomo, 2008: Robustness of future changes in local precipitation extremes. J. Climate, 21, 4280–4297.
Kendon, E. J., D. P. Rowell, and R. G. Jones, 2010: Mechanisms and reliability of future projected changes in daily precipitation. Climate Dyn., 35, 489–509, doi:10.1007/s00382-009-0639-z.
Kjellström, E., G. Nikulin, U. Hansson, G. Strandberg, and A. Ullerstig, 2010: 21st century changes in the European climate: Uncertainties derived from an ensemble of regional climate model simulations. Tellus, doi:10.1111/j.1600-0870.2010.00475.x.
Lenderink, G., B. van den Hurk, E. van Meijgaard, A. van Ulden, and J. Cuijpers, 2003: Simulation of present-day climate in RACMO2: First results and model developments. KNMI Tech. Rep. 252, 24 pp. [Available online at http://www.knmi.nl/publications/fulltexts/trracmo2.pdf].
Lind, P., and E. Kjellström, 2008: Temperature and precipitation changes in Sweden: A wide range of model-based projections for the 21st century. SMHI Meteorology and Climatology Rep. 113, 66 pp. [Available online at http://www.smhi.se/polopoly_fs/1.3297!RMK113_rapport_090421.pdf].
Mearns, L. O., W. Gutowski, R. Jones, R. Leung, S. McGinnis, A. Nunes, and Y. Qian, 2009: A Regional Climate Change Assessment Program for North America. Eos, Trans. Amer. Geophys. Union, 90, 311, doi:10.1029/2009EO360002.
Mitchell, J. F. B., T. C. Johns, M. Eagles, W. J. Ingram, and R. A. Davis, 1999: Towards the construction of climate change scenarios. Climatic Change, 41, 547–581.
Mitchell, T. D., 2003: Pattern scaling: An examination of the accuracy of the technique for describing future climates. Climatic Change, 60, 217–242.
Nakićenović, N., and Coauthors, 2000: Emission Scenarios. Cambridge University Press, 599 pp.
Roeckner, E., and Coauthors, 2006: Sensitivity of simulated climate to horizontal and vertical resolution in the ECHAM5 atmosphere model. J. Climate, 19, 3771–3791.
Rowell, D. P., and R. G. Jones, 2006: Causes and uncertainty of future summer drying over Europe. Climate Dyn., 27, 281–299.
Rummukainen, M., and Coauthors, 2003: Regional climate scenarios for use in Nordic water resources studies. Nord. Hydrol., 34, 399–412.
Ruosteenoja, K., H. Tuomenvirta, and K. Jylhä, 2007: GCM-based regional temperature and precipitation change estimates for Europe under four SRES scenarios applying a super-ensemble pattern-scaling method. Climatic Change, 81 (Suppl. 1), 193–208.
Samuelsson, P., and Coauthors, 2010: The Rossby Centre regional climate model RCA3: Model description and performance. Tellus, doi:10.1111/j.1600-0870.2010.00478.x.
Sapiano, M. R. P., D. B. Stephenson, H. J. Grubb, and P. A. Arkin, 2006: Diagnosis of variability and trends in a global precipitation dataset using a physically motivated statistical model. J. Climate, 19, 4154–4166.
Trenberth, K. E., 1999: Conceptual framework for changes of extremes of the hydrological cycle with climate change. Climatic Change, 42, 327–339.
Widmann, M., C. S. Bretherton, and E. P. Salathe, 2003: Statistical precipitation downscaling over the northwestern United States using numerically simulated precipitation as a predictor. J. Climate, 16, 799–816.
Table 1. Summary of the ensemble strategy and local scaling skill for capturing changes in daily precipitation statistics. Results are shown for mean and extreme precipitation and Fwet, where extreme corresponds to the 95th percentile of wet days. For ensemble strategy, “√” indicates sampling priority, where “√” in both columns indicates the need to sample both multiple RCMs and multiple GCMs, and “–” indicates the strategy is not determined (resulting from local change not being robust). For local scaling skill, “√” indicates skillful (InvF > 2), “×” indicates poor skill, “–” indicates skill not assessed (resulting from the local change not being robust), and “(?)” indicates it may be skillful for a modified predictor.
Table 2. Summary of the ensemble strategy and local scaling skill for capturing changes in daily temperature statistics. Results are shown for mean, extreme, and the variance of 2-m temperature, where extreme corresponds to the 95th percentile of the daily distribution. Definitions are as in Table 1.