## 1. Introduction

The Intergovernmental Panel on Climate Change (IPCC) reports that “warming of the climate system is unequivocal” and that “most of the observed increase in global average temperature since the mid–twentieth century is very likely due to observed increase in anthropogenic greenhouse gas concentrations” (Solomon et al. 2007). The influence of anthropogenic forcing has also been detected in changes in observed temperature extremes (Zwiers et al. 2011), and in other components of the climate system, such as the global atmospheric circulation (Gillett et al. 2003), the global distribution of precipitation over land (Zhang et al. 2007), extreme precipitation over Northern Hemispheric lands (Min et al. 2011), humidity (Willett et al. 2007; Santer et al. 2007), and the regional hydrological cycle in the western United States (Barnett et al. 2008). Because of the long effective lifetime of greenhouse gases such as CO_{2} and the large thermal inertia of the oceans, current emissions and those of past decades make it practically impossible to avoid further warming and associated changes in other components of the climate system (Solomon et al. 2009; Matthews and Caldeira 2008; Gillett et al. 2011). Adaptation to a changing climate is therefore necessary.

To adapt effectively to a changing climate, decision makers require as much information as possible about how the climate will evolve in the future. Such information is typically provided by simulations of general circulation models (GCMs) under prescribed future emissions scenarios. GCMs do not simulate the climate perfectly because our understanding of climate processes is incomplete, observations for validation are insufficient, and computing power is limited. Future emissions depend strongly on future socioeconomic development, which is also difficult to predict precisely. Additionally, the chaotic nature of the climate system produces strong short- and long-term variability that overlays the response to forcing. Future changes in climate projected by different GCMs, or by different simulations of the same GCM, therefore differ and are uncertain. Consequently, simulations from multiple GCMs have been used to depict the range of possible changes in the future climate and sometimes to construct probabilistic projections (e.g., Tebaldi and Knutti 2007; Furrer et al. 2007). Since adaptations to climate change are typically implemented at the community level (Field et al. 2007), it is important to develop future scenarios and quantify their uncertainty at that level as well. However, outputs from GCM simulations generally lack sufficient detail, especially in space, to meet this need. Thus, it is often necessary to derive finescale information from coarse-resolution GCM simulations through downscaling, using either regional climate models (RCMs) or statistical methods. As there are differences between the two approaches, as well as among RCMs and among statistical downscaling methods, downscaling to smaller spatial scales necessarily introduces additional uncertainties.

High-resolution probabilistic projections of future climate for North America are lacking, although some efforts have been made to address this problem. Tabor and Williams (2010) interpolated changes projected by the phase 3 Coupled Model Intercomparison Project (CMIP3) models to 10-min resolution, but the interpolated high-resolution data are essentially the same as those of GCM simulations at coarse resolution because the effects of local physical processes such as elevation were not considered. Maurer et al. (2007) applied a quantile mapping approach to CMIP3 GCM simulations to estimate temperature and precipitation changes over North America at high spatial resolution. They estimated cumulative distribution functions (CDFs) of monthly temperatures and precipitation for 1950–99 from observations and model simulations at 2° resolution. They then used the differences between the quantile maps of observations and simulations to adjust their simulations for the twentieth and twenty-first century climates, and interpolated the adjusted values to ⅛° (~12 km) resolution. A drawback of their method is that a dense observational network is required for bias correction, which limits its applicability to regions with sparse observing stations, including much of Canada. To overcome this limitation, ClimateBC combined interpolation (Wang et al. 2005) and elevation adjustment techniques (Daly et al. 1994; Hamann and Wang 2005) to generate 4-km-resolution climate data focused on western Canada, especially British Columbia, the Yukon Territory, the Alaska panhandle, and parts of Alberta and the United States.

Another important effort for North American regional climate change study is the North American Regional Climate Change Assessment Program (NARCCAP; Mearns et al. 2009) in which outputs from multiple GCMs are dynamically downscaled to high resolution over North America using multiple RCMs. A significant advantage of dynamical downscaling is that physical processes that are not well simulated, or not simulated at all, in GCMs are better represented in RCMs, particularly in cases where there is interaction with the surface at finescales (e.g., see Laprise 2008). This may produce not only linear but also nonlinear responses and feedbacks between large- and regional-/local-scale processes and, thus, better account for small-scale variability. While the effort continues, the number of RCM runs currently available is still too small to produce probabilistic projections at high resolution.

Here, we present a statistical approach that makes use of outputs from ensembles of multiple GCMs available from the CMIP3 archive and RCMs already available from NARCCAP to produce a wide range of plausible scenarios. The assumption is that for surface air temperature, the small-scale response to large-scale variability can, to first order, be captured by a linear statistical model, which would imply that the nonlinear part of the small-scale response to large-scale variability is reflected in the residual variability at small scales that is not represented by the statistical model. This approach, and its supporting assumption, enables the construction of future probabilistic projections of the monthly or annual temperature changes at the spatial resolutions of the RCMs. We use the statistical relationship between the RCM output and that from the driving GCM as an emulator to downscale CMIP3 simulations that have not been dynamically downscaled by RCMs to produce multiple climate scenarios. We then estimate the empirical distribution of the downscaled high-resolution temperature projections to come up with probabilistic projections. We also attempt to partition the variance of the climate scenarios into various sources to provide some understanding of the source of the uncertainty of the scenarios. The remainder of the paper is structured as follows. We describe the model data in section 2, and present our methods in section 3. Results are given in section 4 followed by our conclusions and some discussion in section 5.

## 2. Data

High-resolution monthly temperature data come from simulations of RCMs participating in NARCCAP (Mearns et al. 2009). Six RCMs from different research centers participate in this program (Table 1). Ideally, these six RCMs would be driven with boundary conditions from four different GCMs including the National Center for Atmospheric Research (NCAR) Community Climate System Model, version 3 (CCSM3); the Canadian Centre for Climate Modelling and Analysis (CCCma) Coupled General Circulation Model, version 3 (CGCM3); the Geophysical Fluid Dynamics Laboratory Climate Model version 2.1 (GFDL CM2.1); and the Hadley Centre Coupled Model version 3 (HadCM3). In fact, each RCM will be driven by only two GCMs, so that a total of 12 RCM runs will eventually be available. However, at the time of writing, RCM simulations and the driving GCM outputs for both historical and future climate simulations covering 1971–2000 and 2041–70 are available for only five RCM runs: the Canadian Regional Climate Model (CRCM) and the Weather Research and Forecasting model using the Grell scheme (WRFG) driven by a CCSM3 run, CRCM and the Regional Climate Model version 3 (RCM3) driven by a CGCM3 run, and RCM3 driven by a GFDL CM2.1 run. All RCM runs were produced under the IPCC’s Special Report on Emissions Scenarios (SRES) A2 emissions scenario. Monthly mean temperatures from those simulations are used in the current analysis. The NARCCAP RCM runs cover North America, including Canada, the United States, and northern Mexico, at a resolution of about 50 km. The spatial domains and grids used by the different regional climate modeling groups are slightly different and, therefore, to simplify the computations in our analysis, outputs from RCM3 and WRFG are interpolated onto the 45-km grid of CRCM using a nearest gridpoint value approach. Data from coarse-resolution driving GCMs are also interpolated onto the same CRCM grid using an inverse distance-weighting method (Shen et al. 2001).
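The inverse-distance-weighting regridding step can be illustrated with a minimal sketch. The function name, the chordal-distance approximation, and the use of the k nearest source points are illustrative assumptions, not the Shen et al. (2001) implementation:

```python
import numpy as np

def idw_interpolate(src_lat, src_lon, src_val, tgt_lat, tgt_lon, k=4, power=2.0):
    """Inverse-distance-weighted interpolation from scattered source points to
    target points, using the k nearest sources; great-circle separation is
    approximated by chordal distance on the unit sphere."""
    def to_xyz(lat, lon):
        lat, lon = np.radians(lat), np.radians(lon)
        return np.column_stack([np.cos(lat) * np.cos(lon),
                                np.cos(lat) * np.sin(lon),
                                np.sin(lat)])
    src = to_xyz(src_lat, src_lon)      # (n_src, 3)
    tgt = to_xyz(tgt_lat, tgt_lon)      # (n_tgt, 3)
    out = np.empty(len(tgt))
    for i, p in enumerate(tgt):
        d = np.linalg.norm(src - p, axis=1)
        near = np.argsort(d)[:k]        # indices of the k nearest sources
        if d[near[0]] < 1e-12:          # target coincides with a source point
            out[i] = src_val[near[0]]
            continue
        w = 1.0 / d[near] ** power      # inverse-distance weights
        out[i] = np.sum(w * src_val[near]) / np.sum(w)
    return out
```

A nearest-gridpoint scheme, as used here for the RCM3 and WRFG outputs, is the special case k = 1.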

Table 1. Regional climate models.

Monthly mean temperatures from both the historical simulations and the future projections of GCMs were retrieved from the CMIP3 data archive at the Program for Climate Model Diagnosis and Intercomparison (PCMDI). There are outputs from 23 GCMs of different spatial resolutions ranging from 1.125° × 1.125° [Model for Interdisciplinary Research on Climate 3, high-resolution version, MIROC3(hires)] to 5° × 4° (e.g., Goddard Institute for Space Studies Model E-R, GISS-ER). Table 2 provides brief descriptions of the models. The number of runs (ensemble size) available for different GCMs from the PCMDI web site varies from one to nine. We consider two future emission scenarios (SRES A2 and B1). There are 38 runs under the A2 scenario and 44 runs under the B1 scenario covering the twenty-first century. Data for 1971–2000 from the twentieth-century simulations of each model have been used to compute the current climatology because many impact studies use 1971–2000 as a base period. Data for 2001–99 under both emission scenarios have been used to construct future scenarios.

Table 2. Global climate model data availability.

## 3. Methods

Our method involves statistical downscaling of GCM outputs, estimation of the empirical distribution of downscaled scenarios, and partitioning of the estimated uncertainty into different sources. An overview of our procedure is sketched in Fig. 1 and details of each step are outlined below.

### a. Statistical downscaling

Statistical downscaling has been widely used for the construction of high-resolution climate change scenarios (Carter et al. 2007). This approach is based on the premise that while a GCM may not simulate current and future climate very well for some variables and does not provide information at small scales, some aspects of the large-scale information simulated by GCMs are reliable and, as such, may be used to derive smaller-scale information with statistical methods. The end results could potentially be more usable than direct outputs from the GCMs. There are different ways to conduct statistical downscaling, but most methods can be considered to be either linear or nonlinear regressions.

Here, we use a linear regression model to establish a statistical relationship between coarse-resolution GCM-simulated temperatures and higher-resolution RCM-simulated temperatures. A similar approach was used by the U.K. Climate Projections 2009 (UKCP09) with the aim of reducing the overfitting (Murphy et al. 2010) that might occur when more complex statistical models employing multiple predictors are used for downscaling. As different models may have different systematic biases and we are interested only in model-simulated changes in the future, the model-simulated climatology for the period 1971–2000 is removed from the respective model runs, for both GCMs and RCMs, prior to the regression analysis.

In a typical statistical downscaling setting, the relationship between large-scale predictors and a local predictand is established from data for the past and current climates and is then applied to GCM-simulated predictors for the future climate, under the assumption that the statistical relationship holds in the future. In many applications this implies that the regression relationship is assumed to be valid beyond the data range from which the regression was derived. The validity of such a strong assumption is hard to verify. Here, we estimate the regression equation using data from 1971 to 2000 and from 2041 to 2070. This reduces the need to assume that a regression based on the past remains valid in the future, as most of the twenty-first century is covered. Additionally, the predictive error for the near future would be relatively small since the near-future climate is close to the center of the past and future climates. Note, however, that we do assume that the regression relationship derived from outputs of one RCM–GCM pair may be applied to outputs from other GCMs.

The regression model takes the form

*y*_{it} = *β*_{i0} + *β*_{i1}*t* + *β*_{i2}*x*_{it} + *ε*_{it},   (1)

where *y*_{it} and *x*_{it} are temperature anomalies from an RCM simulation and the driving GCM at grid point *i* for a given year *t*, respectively. Note that for each location *i*, a single relationship of the form of Eq. (1) is fitted to all available pairs of RCM–GCM values {(*y*_{it}, *x*_{it}), *t* = 1971, 1972, …, 2000, 2041, 2042, …, 2070} within the two periods. Here, *i* = 1, …, 140 × 115 (the size of the CRCM 45-km grid), *β*_{i0} is the intercept, *β*_{i1} is the slope of a trend in time that allows for differences between RCM and GCM time tendencies under forcing, *β*_{i2} is the regression coefficient that scales the GCM temperature anomalies, and *ε*_{it} is the regression residual. Different regression equations are separately estimated for annual, seasonal, and monthly data. In all cases, we assume that the regression residuals have a Gaussian distribution, which is reasonable for temperatures on these time scales.
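At a single grid point, fitting Eq. (1) is an ordinary least squares problem with two predictors (the year and the GCM anomaly). A minimal numpy sketch, with synthetic series standing in for the RCM and GCM anomalies over the two training periods:

```python
import numpy as np

def fit_downscaling_regression(t, x, y):
    """Fit y_t = b0 + b1*t + b2*x_t + e_t by ordinary least squares at one
    grid point; returns (b0, b1, b2) and the unbiased residual variance."""
    X = np.column_stack([np.ones_like(t), t, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(t) - X.shape[1])   # n - 3 degrees of freedom
    return beta, sigma2

# Synthetic anomalies over the two training periods, 1971-2000 and 2041-70.
years = np.concatenate([np.arange(1971, 2001), np.arange(2041, 2071)]).astype(float)
t = years - years.mean()                 # centered time predictor
rng = np.random.default_rng(0)
x = 0.02 * t + rng.normal(0.0, 0.5, t.size)                    # "GCM" anomalies
y = 0.1 + 0.005 * t + 0.8 * x + rng.normal(0.0, 0.2, t.size)   # "RCM" anomalies
beta, sigma2 = fit_downscaling_regression(t, x, y)
```

In the study this fit is repeated for every grid point, season, and RCM–GCM pair; the sketch shows only the per-point computation.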

We also tried a slightly different regression model in which the time trend is not included and separate regression coefficients and intercepts are estimated for the two periods (1971–2000 and 2041–70), and we tested whether the regression equations for the two periods differ. We found that, overall, there does not appear to be a statistically significant difference in the regression coefficients derived from the two periods. However, the intercepts derived from the two periods may differ, especially for summer temperatures, suggesting that there may be some nonlinearity in the responses of the RCMs to the GCM forcing, perhaps related to differences in the radiative forcing in the RCMs and their driving GCMs. These results suggest that including a time trend in the regression is justified.
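A comparison of regressions across two periods can be framed as a Chow-type test: fit the pooled data and the two periods separately, and compare residual sums of squares via an F statistic. The sketch below is an illustrative construction under that framing, not necessarily the exact test used in the study; the returned statistic is compared against an F critical value with the returned degrees of freedom:

```python
import numpy as np

def chow_f_statistic(X1, y1, X2, y2):
    """F statistic for testing whether one regression fits both periods as
    well as two separate regressions. Returns (F, df1, df2)."""
    def rss(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r
    p = X1.shape[1]                                   # parameters per fit
    rss_pooled = rss(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    rss_separate = rss(X1, y1) + rss(X2, y2)
    df2 = len(y1) + len(y2) - 2 * p
    f_stat = ((rss_pooled - rss_separate) / p) / (rss_separate / df2)
    return f_stat, p, df2

# Toy check: same slope in both periods, but the intercept shifts by 2.0.
rng = np.random.default_rng(1)
x1, x2 = rng.normal(0, 1, 30), rng.normal(0, 1, 30)
X1 = np.column_stack([np.ones(30), x1])
X2 = np.column_stack([np.ones(30), x2])
y1 = 0.5 * x1 + rng.normal(0, 0.2, 30)
y2 = 2.0 + 0.5 * x2 + rng.normal(0, 0.2, 30)
f_shift, _, _ = chow_f_statistic(X1, y1, X2, y2)    # large F: equations differ
```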

Time series of seasonal mean temperature at nearby grid boxes are highly correlated due to the strong spatial dependence that is seen in surface air temperature (e.g., North et al. 2011). It would be desirable to account for spatial dependency among sites by using, for example, the generalized linear modeling (GLM) framework proposed by Chandler (2005). However, the direct application of GLM to our dataset is difficult due to the number of data points involved. Also, the strong spatial covariance structure of temperature implies that nearby points generally cannot provide much additional information (in the statistical sense) that would help to reduce the downscaling prediction error at the RCM point of interest. The large spatial covariance structure of temperature on annual time scales implies that the variability of temperature is dominated by the variation on large spatial scales, which further implies that the eigenspectrum of temperature is steep, with a large fraction of the variance being representable by a small number of EOFs. For example, the first 15 EOFs and their associated time series explain more than 90% of the winter temperature interannual variability simulated by any of the RCMs used in this study (not shown). For similar reasons as with GLMs, approaches that use a dimension reduction procedure, such as EOF truncation, may also not necessarily provide a better solution.

To confirm that the local regression is a suitable approach in the case of surface air temperature, we considered a simplified version of a GLM to determine whether spatial dependence on RCM scales could be exploited to improve the downscaled temperature estimates. Consequently, we divided the whole region into many nonoverlapping subregions of 150 km × 150 km, with each subregion consisting of as many as 17 grid boxes at 45-km resolution. Spatial dependency among the regression residuals from Eq. (1) within subregions was considered by using the seemingly unrelated regression model (Zellner 1962). The resulting downscaled temperatures obtained with this approach were very similar to those obtained using only local regression models of Eq. (1), with virtually identical mean squared errors. More sophisticated spatial approaches, such as the Bayesian approach of Kaufman and Sain (2010), could in principle improve several aspects of the model fitting. However, such approaches would be harder to implement when so many spatial locations need to be considered, and there is always a compromise that needs to be made between the reduction of errors and the smoothness of the regional/local details. Nevertheless, we will consider such approaches in subsequent work to downscale other variables, such as precipitation and near-surface wind speed.

The performance of the regression models is evaluated using a cross-validation procedure within an RCM–GCM pair. This involves 1) using data values for all years but one to establish a regression between the RCM series and the corresponding series from the driving GCM, 2) applying the regression equation to the GCM simulation for the left-out year to “predict” what the RCM would simulate for that year, 3) repeating steps 1 and 2 until every year in the series has been predicted, and 4) comparing the predicted and the RCM-simulated values. To verify the assumption that the regression relationship derived from one pair of RCM–GCM outputs may be applied to other GCMs, we compare statistically downscaled results with those dynamically downscaled by RCMs. This involves 1) applying regressions derived from one RCM–GCM pair to outputs of another GCM, and 2) comparing the statistically downscaled values with those dynamically downscaled by an RCM driven by the outputs of that second GCM. The correlations and the root-mean-square errors (RMSEs) between the RCM dynamically downscaled and statistically downscaled values are computed as measures of regression model performance.
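The leave-one-year-out steps 1–3 above can be sketched as follows, with synthetic series in place of the RCM and GCM output:

```python
import numpy as np

def loo_cross_validate(t, x, y):
    """Leave-one-year-out cross-validation of the fit y ~ 1 + t + x: each
    year is predicted from a regression trained on all other years."""
    n = len(t)
    pred = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k                    # drop year k from training
        X = np.column_stack([np.ones(n - 1), t[keep], x[keep]])
        beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
        pred[k] = beta[0] + beta[1] * t[k] + beta[2] * x[k]
    corr = np.corrcoef(pred, y)[0, 1]               # step 4: skill measures
    rmse = np.sqrt(np.mean((pred - y) ** 2))
    return pred, corr, rmse

rng = np.random.default_rng(2)
t = np.arange(60.0)
x = rng.normal(0.0, 1.0, 60)                        # "GCM" predictor series
y = 0.01 * t + 0.9 * x + rng.normal(0.0, 0.2, 60)   # "RCM" predictand series
pred, corr, rmse = loo_cross_validate(t, x, y)
```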

### b. Estimation of the projected temperature distribution

Suppose that the regression model of Eq. (1) has been fitted at each grid point *i* in the RCM domain (*i* = 1, …, 140 × 115). Suppose also that *x*_{it′j} is a temperature value simulated at time *t*′ by a GCM *j* different from that used to fit model (1), where *t*′ is a time for which an estimate of a downscaled projection *y*_{it′j} from GCM *j* is desired. Here, we will use *t*′ in the range 2001–99, which is the period for which the CMIP3 projections were made, and *j* could represent any CMIP3 projection simulation, most of which have not been downscaled with an RCM for location *i*. Then, assuming that the fitted regression model continues to hold for GCM *j*, the best available estimate of the downscaled projection is given by

*ŷ*_{it′j} = *β̂*_{i0} + *β̂*_{i1}*t*′ + *β̂*_{i2}*x*_{it′j}.   (2)

The error variance associated with this projection is given by

Var(*ŷ*_{it′j} − *y*_{it′j}) = Var(*β̂*_{i0} + *β̂*_{i1}*t*′ + *β̂*_{i2}*x*_{it′j}) + *σ̂*_{εi}^{2},   (3)

where *y*_{it′j} is the projected value that would have been realized had the RCM from the RCM–GCM pair used to train the regression been used to dynamically downscale GCM *j*. Here, *β̂*_{i0}, *β̂*_{i1}, and *β̂*_{i2} are the regression coefficients for location *i* as diagnosed from the original RCM–GCM pair, which were estimated by the ordinary least squares method (Neter et al. 1985). The two components of the error variance are estimated in a straightforward fashion from the values of the GCM *j* simulated predictors *x*_{it′j}, the uncertainty of the estimated regression coefficients, and the estimated residual variance *σ̂*_{εi}^{2}.

Next, we describe the approach that we have used to identify the uncertainty in the downscaled GCM results that originates from the GCMs, RCMs, and downscaling approach combined. For every grid point, five sets of regression equations are separately estimated from the five pairs of RCM–GCM runs. Therefore, for a given projected change by a GCM, there are five projections with different means and variances. As there are a total of 38 GCM temperature projections under the A2 emission scenario, the application of statistical downscaling yields, for each grid point and each year, a total of 5 × 38 = 190 projections with different means and variances. It is difficult to devise an analytic form of a probability distribution to describe these 190 projections because they are not independent, owing to the use of common regression equations, and because it is difficult to estimate the covariance structure among them. However, estimation of temperature at some quantiles may be sufficient for many adaptation studies. We therefore estimate the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles of the projected temperature change using an empirical sampling method. Specifically, we randomly draw 100 values from a normal distribution with the mean and variance obtained from each of the 190 statistically downscaled projections from Eqs. (2) and (3). This produces 19 000 random values of temperature change for a grid point at any given year. These 19 000 values are used to estimate the empirical quantiles.
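The empirical sampling step just described can be sketched as follows: pool 100 normal draws from each projection's mean and error variance [Eqs. (2) and (3)] and read off empirical percentiles. The means and variances below are synthetic stand-ins for the 190 downscaled projections at one grid point and year:

```python
import numpy as np

def empirical_percentiles(means, variances, draws_per_projection=100, seed=0):
    """Pool normal draws from every (mean, variance) pair and return the
    5th, 10th, 25th, 50th, 75th, 90th, and 95th empirical percentiles."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, float)
    sds = np.sqrt(np.asarray(variances, float))
    # one row of draws per projection, then flattened into a single pool
    pool = rng.normal(means[:, None], sds[:, None],
                      size=(len(means), draws_per_projection)).ravel()
    return np.percentile(pool, [5, 10, 25, 50, 75, 90, 95])

# 190 synthetic projections (5 regressions x 38 GCM runs), in degrees C
rng = np.random.default_rng(1)
means = rng.normal(3.0, 0.5, 190)        # projected warming per projection
variances = rng.uniform(0.1, 0.3, 190)   # predictive error variances
q = empirical_percentiles(means, variances)   # from 19 000 pooled draws
```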

Note that this procedure accounts for (i) sampling uncertainty in the regression coefficients [first term in Eq. (3)]; (ii) uncertainty that arises because there are residual differences between the RCM and GCM that reflect the internal variability that is simulated by RCMs independently of their driving models [second term in Eq. (3)]; (iii) uncertainties in the emission scenario forcing and the response to the forcing as the difference in the means from different scenarios; (iv) structural uncertainty in the models, which is better represented by the large ensemble of GCMs than by the small ensemble of RCMs; and, to a limited extent, (v) structural uncertainty in the downscaling model (i.e., to the extent that the linear model that we have chosen fits some RCM–GCM pairs better than other RCM–GCM pairs). Not fully represented is uncertainty due, for example, to nonlinearities in the RCM–GCM relationships that may become apparent under stronger forcing at the end of the twenty-first century.

Assuming that the regression equations derived under the A2 emission scenario are still valid for intermediate times and different forcings (such as the B1 emissions scenario), the empirical distribution for the B1 scenario was obtained in a similar way by applying the regressions to the 44 GCM simulations under the B1 scenario. As temperature increases more in the A2 emission scenario than in the B1 scenario, the range of GCM-simulated temperature changes under the B1 scenario is expected to be within that in the A2 scenario. Therefore, the application of regressions established from the A2 simulations to the B1 simulations should not introduce additional predictive errors.

### c. Quantifying uncertainty by a mixed effect ANOVA model

To understand the relative importance of the different sources contributing to the uncertainty in the high-resolution future projections, we partition the variance of the projected temperature into possible sources, including differences in GCMs, the effects of internal variability simulated by GCMs, differences in the regressions derived from different pairs of RCM–GCM simulations, and the predictive errors of the regression equations. It is difficult to separate the effect of uncertainty due to differences in GCMs from that due to natural internal variability in the simulations because for many GCMs only one simulation is available under a particular emission scenario. We therefore lump the uncertainty due to different GCMs and that due to natural internal variability together, as modeling uncertainty, and consider only three different sources of uncertainty, namely GCMs, RCMs, and prediction error from statistical downscaling. Differences among GCMs and RCMs also reflect differences in forcing as realized in those models. To provide a flavor of the range of uncertainty that might be associated with the choice of emission scenario, we also computed the difference in the median of the temperature scenarios corresponding to the two emission scenarios.

The contributions of the three factors under a given emission scenario to the overall projection uncertainty can be quantified within the framework of a mixed-effects analysis of variance (ANOVA) model with the emission scenario treated as a fixed factor. A natural approach would be to use a full ANOVA model, which contains an emission scenario factor, a GCM modeling factor, and a statistical downscaling factor as the main effects, as well as the interactions among them. Prediction error from statistical downscaling is represented by the error term in this ANOVA model. Treating a factor as fixed implies an assumption that its levels comprise a fixed set of known values, which is appropriate to first order for the emissions factor because emission scenarios are prescribed and do not span the whole population of all plausible future emissions. Treating a factor as a random effect assumes that the experimental levels of this factor are randomly selected from the population of all possible levels, which is infinite. Thus, uncertainty in the projection values caused by a fixed effect is deterministic and is characterized by differences in the mean values at its different levels, while uncertainty caused by a random effect is characterized by its variance across the whole population of levels. To simplify the quantification of uncertainty from the various sources, it is desirable that the experimental design be “orthogonal.” This is achieved by using the largest subset of available climate model simulations such that, for any given GCM represented in the subset, there are equal numbers of A2 and B1 simulations for that GCM. With this restriction, we are able to analyze the downscaled results from a total of 36 GCM runs for each of the A2 and B1 emission scenarios.
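The “orthogonal” subset selection amounts to keeping, for each GCM, an equal number of A2 and B1 runs (the minimum of the two counts). A minimal sketch; the GCM names and run counts below are made up, not the actual CMIP3 availability:

```python
def orthogonal_subset(runs):
    """Given per-GCM run counts under each scenario, keep for every GCM an
    equal number of A2 and B1 runs; GCMs lacking one scenario are dropped."""
    subset = {}
    for gcm, counts in runs.items():
        m = min(counts.get("A2", 0), counts.get("B1", 0))
        if m > 0:
            subset[gcm] = {"A2": m, "B1": m}
    return subset

# Hypothetical availability (placeholder names and counts)
available = {"gcm_a": {"A2": 5, "B1": 3},
             "gcm_b": {"A2": 1, "B1": 4},
             "gcm_c": {"A2": 2},          # no B1 runs -> excluded
             "gcm_d": {"A2": 3, "B1": 3}}
chosen = orthogonal_subset(available)
```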

The model can be written as

*y*_{ijlk} = *μ* + *α*_{i} + *β*_{j} + *γ*_{l} + (*αβ*)_{ij} + (*αγ*)_{il} + (*βγ*)_{jl} + (*αβγ*)_{ijl} + *ε*_{ijlk},   (4)

where *μ* is the grand mean of the projection ensembles; *α*_{i} is the fixed effect of the *i*th forcing scenario, where *i* = 1, 2; *β*_{j} is the random effect on the temperature projection associated with the *j*th GCM simulation, where *j* = 1, …, 36; *γ*_{l} is the random effect on the temperature projection associated with the *l*th regression relation between RCM and GCM, with *l* = 1, …, 5; and *ε*_{ijlk} is the residual term, which represents projection uncertainties caused by the statistical downscaling prediction error, with *k* = 1, …, 100. The two- and three-way interaction terms are also treated as random; that is, the model accounts for the possibility that the effects of the forcing scenarios, choice of GCM, and downscaling uncertainty may not be additive. All random terms are assumed to be mutually independent and normally distributed with mean zero. Note that mixed-effects ANOVA models have been used previously to analyze ensemble climate simulations (e.g., Zwiers 1996).

The variances of the random effects *β*_{j} and *γ*_{l} and of the interaction terms (*αβ*)_{ij}, (*αγ*)_{il}, (*βγ*)_{jl}, and (*αβγ*)_{ijl} are denoted by *σ*_{β}^{2}, *σ*_{γ}^{2}, *σ*_{αβ}^{2}, *σ*_{αγ}^{2}, *σ*_{βγ}^{2}, and *σ*_{αβγ}^{2}, respectively. The remaining uncertainty, which is quantified by *σ*_{ε}^{2}, is attributable to prediction error as discussed above. Thus, the total projection uncertainty *σ*^{2} caused by the random factors is given by

*σ*^{2} = *σ*_{β}^{2} + *σ*_{γ}^{2} + *σ*_{αβ}^{2} + *σ*_{αγ}^{2} + *σ*_{βγ}^{2} + *σ*_{αβγ}^{2} + *σ*_{ε}^{2}.   (5)

A variant of this model that isolates the effect of internal variability includes an additional random effect *ρ*_{m(j)} for the *m*th run of the *j*th GCM, where *m* = 1, 2, 3 in this experiment. Similar assumptions of randomness, normality, and mutual independence apply to *ρ*_{m(j)}. In addition, a similar decomposition of the total uncertainty can be obtained, which provides an indication of the relative importance of the internal variability simulated by GCMs compared with the other factors.
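The additive decomposition of the total random-effect variance can be checked with a small Monte Carlo simulation of the mixed-effects structure. This is a toy sketch with assumed component variances and a single fixed scenario, not the CMIP3/NARCCAP analysis:

```python
import numpy as np

rng = np.random.default_rng(2)
# assumed (toy) variance components for the random effects
s2 = {"gcm": 0.5, "rcm_regression": 0.2, "interaction": 0.05, "resid": 0.1}
nj, nl, nk = 200, 200, 20      # GCM levels, regression levels, residual draws

beta = rng.normal(0, np.sqrt(s2["gcm"]), nj)              # GCM effect
gamma = rng.normal(0, np.sqrt(s2["rcm_regression"]), nl)  # regression effect
inter = rng.normal(0, np.sqrt(s2["interaction"]), (nj, nl))
eps = rng.normal(0, np.sqrt(s2["resid"]), (nj, nl, nk))   # downscaling error

# projections under one fixed scenario: mu + beta_j + gamma_l + (bg)_jl + eps
y = 3.0 + beta[:, None, None] + gamma[None, :, None] + inter[:, :, None] + eps

total = y.var()                # empirical total projection variance
expected = sum(s2.values())    # additive sum of the component variances
```

Because the random effects are independent, the empirical total variance should match the sum of the components up to sampling noise, which is the property the ANOVA partition relies on.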

The ANOVA models given here allow for the formal testing of the contributions of the various terms under the strong assumptions of normality and independence listed above. However, these assumptions are not likely to hold completely, given that the variability analyzed was generated within an “ensemble of opportunity” of climate models. Indeed, the determination of a statistically appropriate approach for interpreting the variation seen in a collection of model simulations, such as that available from CMIP3, remains an open question [e.g., see the discussion in Rougier (2008)]. However, our main aim here is not to make precise inferences or to provide a detailed decomposition of the total uncertainty; rather, our main interest is a pragmatic approach that allows us to assess the bulk contribution of the main factors. The following discussion will therefore focus on the main effect of each factor.

## 4. Results

We have constructed high-resolution temperature scenarios on a monthly basis for both the A2 and B1 emission scenarios and will make them publicly available. To simplify the writing and to avoid repetition, in this paper we only report upon the results for the winter [December–February (DJF)] and summer [June–August (JJA)] seasons based on the A2 scenario. Results for the B1 scenario are similar to those of the A2 scenario though the B1 scenario has a smaller temperature increase.

### a. Validation of the statistical downscaling model

The correlation coefficients in the cross-validated series for CRCM–CGCM3 for winter and summer temperatures are displayed in Fig. 2a. Correlations are high for the series based on simulations from CRCM driven by CGCM3, but lower for those obtained from other RCM–GCM pairs (not shown). Winter temperatures in the CRCM simulation are highly correlated with those in the driving GCM, with the lowest correlation coefficient close to 0.9. Summer temperatures have lower correlation coefficients, especially in the central United States, but the lowest value is still close to 0.75. The difference in the correlation between winter and summer may reflect differences in the influence of land surface processes in the two seasons. A large portion of the land surface is covered by snow or frozen ground in winter, thereby partially decoupling the atmosphere from the land surface, whereas the land surface is much more strongly coupled to the atmosphere in summer. This results in a stronger influence of large-scale variations on small-scale variations of surface air temperature in winter than in summer and, thus, a better match between large- and small-scale variability in winter. This is consistent with the effects of land–atmosphere and land–sea coupling on the surface temperature correlation length scales in the midlatitudes, which are large over midlatitude land surfaces (which have relatively low thermal capacity) and small over midlatitude ocean surfaces (which have high thermal capacity); see, for example, North et al. (2011). Correlations in the center of the CRCM spatial domain are also smaller for both winter and summer, indicating the reduced influence of the GCM boundary forcing on local climate toward the center of the domain.

Even though the spatial patterns are similar, the correlation coefficients in the cross-validated series based on the CRCM driven by CGCM3 or CCSM3 are slightly higher than other RCM–GCM pairs (not shown). The CRCM–CGCM3 pair also has better correlation than CRCM–CCSM3. At least two factors may have contributed to the differences in the magnitudes and spatial patterns of correlation between different RCM–GCM simulations. One is that the CRCM employs a spectral nudging technique that increases the consistency between the large-scale circulation variability in the CRCM and that in the driving model, thus also limiting temperature variations generated internally by the CRCM (Alexandru et al. 2009). The other is that the CRCM physics package is similar to that of the driving CGCM3, possibly explaining why the CRCM–CGCM3 simulation has slightly higher correlation than that in CRCM–CCSM3.

The magnitudes of the RMSEs between the cross-validated series are related to the size of the regression residuals and thus to the magnitude of RCM-generated internal variability; they are important indicators of the relative magnitudes of the predictive errors of the regressions, with larger RMSEs implying larger predictive uncertainty. The RMSEs for CRCM–CGCM3 are shown in Fig. 2b. Compared with results from other RCM–GCM pairs (not shown), RMSEs are generally smaller when downscaling is based on the CRCM–CGCM3 combination. Larger RMSEs tend to appear toward the center of the spatial domain, particularly in summer when the land surface is active.

Another important aspect of validation is to assess how well our statistical downscaling approach can emulate dynamical downscaling. To this end, we compute RMSEs between dynamically and statistically downscaled values and compare these RMSEs with the regression residuals. We first fitted the regression equations based on the CRCM–CGCM3 combination and calculated the regression residuals. We then computed the RMSEs between downscaled values for CGCM3 predicted by the fitted regressions and those obtained when CGCM3 was dynamically downscaled with RCM3 (rather than CRCM, as used for training the regressions). The RMSEs and the ratio between the RMSEs and the regression residuals give a quantitative measure of the ability of the statistical downscaling procedure trained on the CRCM–CGCM3 combination to emulate the output of RCM3 when driven with CGCM3. We similarly assessed how well the regressions derived from the CRCM–CGCM3 combination were able to emulate the output of RCM3 when driven by the GFDL CM2.1. Results for winter and summer temperatures are shown in Figs. 3a and 3b, respectively. We find that statistical downscaling trained on the CRCM–CGCM3 combination emulates RCM3 reasonably well, both when RCM3 was driven with CGCM3 and when it was driven with GFDL CM2.1. The RMSEs and the ratio both become slightly larger when a different driving GCM is involved. The same comparisons have been conducted with other RCM–GCM pairs where possible (not shown), and the results are similar. In general, it appears that in many places the statistical downscaling approach is comparable to the dynamical approach, although the ratio shows that the RMSE can be doubled in some regions.
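The emulation check described above amounts to an RMSE map and its ratio to the training-regression residual RMSE. A minimal sketch, with assumed array shapes (`[years, ny, nx]` for the series, `[ny, nx]` for the residual RMSE) and an illustrative function name:

```python
import numpy as np

def emulation_skill(stat_downscaled, dyn_downscaled, regression_residual_rmse):
    """RMSE between statistically and dynamically downscaled series,
    and its ratio to the residual RMSE of the training regressions.

    A ratio near 1 means the statistical emulator matches the dynamical
    downscaling about as closely as the regressions fit their own
    training RCM; a ratio of 2 means the RMSE is doubled."""
    rmse = np.sqrt(((stat_downscaled - dyn_downscaled) ** 2).mean(axis=0))
    return rmse, rmse / regression_residual_rmse
```

The ratio map is the quantity compared across driving GCMs in Figs. 3a and 3b.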

### b. Projected temperature changes

As described in section 3b, we computed the 10th, 50th, and 90th percentiles from 19 000 simulations for each year in the A2 scenario. We then averaged those percentiles over the three 30-yr periods centered on 2025, 2055, and 2085. Results for winter and summer temperatures are displayed in Figs. 4 and 5, respectively. The 50th percentile represents the median of the projected changes, while the 10th and 90th percentiles give an indication of the average uncertainty range in the projected temperature changes.

Figure 4 displays the 10th, 50th, and 90th percentiles of the projected winter temperature changes relative to the 1971–2000 climatologies for the three 30-yr periods centered on 2025, 2055, and 2085. By definition, there is a 10% probability that the temperature change is smaller than the 10th percentile, and a 10% probability that it is greater than the 90th percentile. In general, temperature increases tend to be greater at high latitudes or in higher-elevation regions. Across North America, the 50th percentile of temperature changes ranges from 0.5° to 2°C, from 1.5° to 4°C, and from 2.5° to 6°C for the three 30-yr periods, respectively. The difference between the 90th and the 10th percentiles decreases from the northeast to the southwest (not shown). It also tends to be larger toward the end of the twenty-first century, indicating larger uncertainty in temperature toward the more distant future.

Figure 5 shows the 10th, 50th, and 90th percentiles of projected summer temperature changes relative to 1971–2000 climatologies for the three 30-yr periods centered on 2025, 2055, and 2085. The greatest temperature increases occur in the interior of the coterminous United States. Smaller temperature increases occur in both the northern and southern extremities of North America. The 50th percentile of temperature changes ranges from 0.5° to 1.5°C, from 1.5° to 3°C, and from 2.5° to 5°C for the three 30-yr periods, respectively. The spread between the 90th and the 10th percentiles is largest toward the center of the United States, and becomes larger toward the end of the twenty-first century (not shown).

### c. Sources of uncertainty

The principal ANOVA results, summarizing the relative contributions to the uncertainty under a given emission scenario of statistical downscaling, the choice of RCM, the choice of GCM, and all interaction terms combined, are presented in Fig. 6. In Fig. 6, the percentages of variance from the different random factors are averaged across North America. The combined interaction terms represent various structural uncertainties among the GCMs, RCMs, emission scenarios, and statistical downscaling.

Figures 6a and 6b show the results based on Eq. (4), in which 36 simulations from both the A2 and B1 emission scenarios were used but the uncertainty due to GCM structural error and that due to internal variability in the GCM simulations were not separated. In this case, the most important contributor to the uncertainty at high resolution is downscaling from GCM outputs to RCM resolution. GCM modeling uncertainty is the second most important factor, and its importance increases with time. Déqué et al. (2006) analyzed dynamically downscaled temperatures from 10 different RCMs for European climate and showed that the uncertainty due to the choice of GCM was larger than other factors for the late twenty-first century. The contributions to the uncertainty from the choice of RCM used to establish the statistical downscaling and from the various structural uncertainties are of similar magnitude. As available RCM simulations are very limited, a large portion of the uncertainty at high resolution appears as the predictive error of the statistical models. This does not necessarily indicate poor performance of the statistical approach, because it reflects small-scale activity in the RCMs that is not apparent in the GCMs. Note, however, that the uncertainty from the three sources (RCMs, GCMs, and statistical downscaling prediction error) becomes comparable by the end of the twenty-first century. Figures 6c and 6d show the results when Eq. (6) is applied to 21 runs from a smaller subset of seven GCMs that had multiple simulations. In this case, we were able to decompose the GCM contribution to the total uncertainty into model-error and internal-variability components. Overall, the contribution from GCM-simulated internal variability is less important than the other factors, which is also consistent with the European results of Déqué et al. (2006).
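Equations (4) and (6) are not reproduced in this section. As a simplified stand-in, the sketch below estimates variance components for a balanced two-way random-effects layout (say, GCM × RCM with replicate runs) via the classical method of moments on expected mean squares, which is the same principle underlying the nested-effects ANOVA decomposition summarized in Fig. 6. The function name and layout are illustrative only.

```python
import numpy as np

def variance_components_two_way(x):
    """Method-of-moments variance components for a balanced two-way
    random-effects layout x[i, j, k]: factor A (e.g. GCM), factor B
    (e.g. RCM), and replicate k."""
    a, b, n = x.shape
    grand = x.mean()
    ma = x.mean(axis=(1, 2))          # factor-A level means
    mb = x.mean(axis=(0, 2))          # factor-B level means
    mab = x.mean(axis=2)              # cell means
    ss_a = b * n * ((ma - grand) ** 2).sum()
    ss_b = a * n * ((mb - grand) ** 2).sum()
    ss_ab = n * ((mab - ma[:, None] - mb[None, :] + grand) ** 2).sum()
    ss_e = ((x - mab[:, :, None]) ** 2).sum()
    ms_a, ms_b = ss_a / (a - 1), ss_b / (b - 1)
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    ms_e = ss_e / (a * b * (n - 1))
    # Solve the expected-mean-square equations of the random-effects
    # model; negative moment estimates are truncated to zero.
    var_e = ms_e
    var_ab = max((ms_ab - ms_e) / n, 0.0)
    var_a = max((ms_a - ms_ab) / (b * n), 0.0)
    var_b = max((ms_b - ms_ab) / (a * n), 0.0)
    return {"A": var_a, "B": var_b, "AxB": var_ab, "error": var_e}
```

Dividing each component by their sum gives the percentage-of-variance attributions that Fig. 6 averages across North America.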

As it is unclear whether the two fixed A2 and B1 emission scenarios span the whole range of plausible emission scenarios, we have treated the emission scenario effect as a fixed factor. This treatment differs from that of Déqué et al. (2006), in which the effect of emission scenario was considered to be random. To get an idea of how temperature changes may differ between these two scenarios, we averaged the temperature change across North America for A2 and B1, respectively, and then computed the differences in the median of the temperature projections between the two scenarios for each of the 30-yr periods. The difference is very small across North America for 2011–40, because the difference between the emission scenarios is small during this period, but it increases slowly thereafter. The differences reach about 0.5° and 0.4°C in winter and summer, respectively, by the middle of this century, and 1.5° and 1.1°C in winter and summer by the end of the twenty-first century, indicating that the difference in the projected temperatures between the two emission scenarios can become as large as the variance due to natural variability and structural errors in both GCMs and RCMs.
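The scenario comparison above (area-average each realization's change field, then difference the scenario medians) can be sketched as follows; the unweighted equal-area mean and the array layout (`[n_realizations, ny, nx]`) are simplifying assumptions for illustration.

```python
import numpy as np

def scenario_median_difference(changes_a2, changes_b1):
    """Difference in median projected 30-yr mean temperature change
    between two scenarios, after averaging each realization's change
    field over the domain (equal-area grid assumed)."""
    area_mean_a2 = changes_a2.mean(axis=(1, 2))   # one value per realization
    area_mean_b1 = changes_b1.mean(axis=(1, 2))
    return np.median(area_mean_a2) - np.median(area_mean_b1)
```

On a latitude–longitude grid, the spatial mean would instead be weighted by cell area (e.g., by the cosine of latitude).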

## 5. Conclusions and discussion

We have presented a framework for the construction of probabilistic projections of high-resolution climate scenarios using multiple GCM ensembles and RCM simulations that are publicly available. This involves the use of multiple GCM ensembles of opportunity, the use of limited RCM downscaled fields, and the use of a statistical downscaling approach to emulate RCMs. We then applied this approach to produce multiple realizations of temperature that are in turn used to estimate empirical probabilistic projections of future temperature for North America. Contributions to the projected spread of future temperature, as represented by the variance from four different sources—GCM model errors, GCM internal natural variability, the use of different RCMs to simulate high-resolution temperature, and statistical downscaling including internal variability—are partitioned using a nested-effects ANOVA model.

Projected future temperature changes have a large spatial variability that increases with time. Winter temperature will increase more in the north than in the south, reflecting polar amplification of temperature change. The largest summer temperature change may not occur in the far north; rather, it may appear in the mountainous and dry regions of the central coterminous United States. For a given emission scenario, downscaling uncertainty is the most important contributor to the projected future changes at high resolution, followed by uncertainty due to GCMs and the use of different RCMs to simulate high-resolution temperature. Uncertainty due to internal variability at GCM spatial resolution plays only a minor role. Differences in emission scenarios yield different projected temperature changes into the future. This difference increases with time. The difference between the A2 and B1 emission scenarios in the median values of projected changes in 30-yr mean temperature is small for the coming 30 yr, but it can become almost as large as the total variance due to internal variability and modeling errors in both the GCM and RCM.

Our high-resolution temperature projections come with several caveats. We are using GCM simulations of opportunity, which means that we may not have sampled the full space of uncertainty due to model structural and parameter errors. We are also limited by the availability of RCM simulations. We plan to update this analysis once all planned RCM runs are available. We expect that the additional simulations could widen the current estimates of the uncertainty range as expressed by differences among empirical quantiles, especially the uncertainty from the regression models. The perturbed physics ensemble approach, coupled with a detailed hierarchical Bayesian description of the multiple sources of uncertainty that affect downscaled climate change projections (Murphy et al. 2010), would likely produce a larger uncertainty range. To improve our estimates, we plan to repeat this work when new simulations produced for the IPCC Fifth Assessment Report and additional RCM simulations for North America become available.

## Acknowledgments

We wish to thank the North American Regional Climate Change Assessment Program (NARCCAP) for providing the data used in this paper. NARCCAP is funded by the National Science Foundation (NSF), the U.S. Department of Energy, the National Oceanic and Atmospheric Administration (NOAA), and the U.S. Environmental Protection Agency’s (EPA) Office of Research and Development. We acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI), and the WCRP’s Working Group on Coupled Modelling (WGCM) for their roles in making available the WCRP CMIP3 multimodel dataset. Support of this dataset is provided by the Office of Science, U.S. Department of Energy. Chad Shouquan Cheng, Lucie Vincent, Seung-ki Min, Gerd Buerger, Trevor Murdock, and three anonymous reviewers provided valuable comments on earlier versions of this manuscript.

## REFERENCES

Alexandru, A., R. D. Elia, R. Laprise, L. Separovic, and S. Biner, 2009: Sensitivity study of regional climate model simulations to large-scale nudging parameters. *Mon. Wea. Rev.*, **137**, 1666–1686.

Barnett, T. P., and Coauthors, 2008: Human-induced changes in the hydrology of the western United States. *Science*, **319**, 1080–1083, doi:10.1126/science.1152538.

Carter, T. R., and Coauthors, 2007: New assessment methods and the characterisation of future conditions. *Climate Change 2007: Impacts, Adaptation and Vulnerability*, M. L. Parry et al., Eds., Cambridge University Press, 133–171.

Chandler, R. E., 2005: On the use of generalized linear models for interpreting climate variability. *Environmetrics*, **16**, 699–715, doi:10.1002/env.731.

Daly, C., R. P. Neilson, and D. L. Phillips, 1994: A statistical–topographic model for mapping climatological precipitation over mountainous terrain. *J. Appl. Meteor.*, **33**, 140–158.

Déqué, M., and Coauthors, 2006: An intercomparison of regional climate simulations for Europe: Assessing uncertainties in model projections. *Climatic Change*, **81**, 53–70, doi:10.1007/s10584-006-9228-x.

Field, C. B., L. D. Mortsch, M. Brklacich, D. L. Forbes, P. Kovacs, J. A. Patz, S. W. Running, and M. J. Scott, 2007: North America. *Climate Change 2007: Impacts, Adaptation and Vulnerability*, M. L. Parry et al., Eds., Cambridge University Press, 617–652.

Furrer, R., R. Knutti, S. R. Sain, D. W. Nychka, and G. A. Meehl, 2007: Spatial patterns of probabilistic temperature change projections from a multivariate Bayesian analysis. *Geophys. Res. Lett.*, **34**, L06711, doi:10.1029/2006GL027754.

Gillett, N. P., F. W. Zwiers, A. J. Weaver, and P. A. Stott, 2003: Detection of human influence on sea level pressure. *Nature*, **422**, 292–294.

Gillett, N. P., V. K. Arora, K. Zickfeld, S. J. Marshall, and W. Merryfield, 2011: Ongoing climate change following a complete cessation of carbon dioxide emissions. *Nat. Geosci.*, **4**, 83–87, doi:10.1038/ngeo1047.

Hamann, A., and T. Wang, 2005: Models of climatic normals for genecology and climate change studies in British Columbia. *Agric. For. Meteor.*, **128**, 211–221.

Kaufman, C. G., and S. R. Sain, 2010: Bayesian functional ANOVA modeling using Gaussian process prior distributions. *Bayesian Anal.*, **5**, 123–150, doi:10.1214/10-BA505.

Laprise, R., 2008: Regional climate modelling. *J. Comput. Phys.*, **227**, 3641–3666.

Matthews, H. D., and K. Caldeira, 2008: Stabilizing climate requires near-zero emissions. *Geophys. Res. Lett.*, **35**, L04705, doi:10.1029/2007GL032388.

Maurer, E. P., L. Brekke, T. Pruitt, and P. B. Duffy, 2007: Fine-resolution climate projections enhance regional climate change impact studies. *Eos, Trans. Amer. Geophys. Union*, **88** (47), 504, doi:10.1029/2007EO470006.

Mearns, L. O., W. J. Gutowski, R. Jones, L.-Y. Leung, S. McGinnis, A. M. B. Nunes, and Y. Qian, 2009: A regional climate change assessment program for North America. *Eos, Trans. Amer. Geophys. Union*, **90**, 311–312, doi:10.1029/2009EO360002.

Min, S.-K., X. Zhang, F. W. Zwiers, and G. C. Hegerl, 2011: Human contribution to more intense precipitation extremes. *Nature*, **470**, 378–381, doi:10.1038/nature09763.

Murphy, J. M., and Coauthors, cited 2010: U.K. Climate Projections Science Report: Climate change projections. Met Office Hadley Centre, Exeter, United Kingdom. [Available online at http://ukclimateprojections.defra.gov.uk/content/view/944/517.]

Neter, J., W. Wasserman, and M. H. Kutner, 1985: *Applied Linear Statistical Models*. Richard D. Irwin, Inc., 1127 pp.

North, G. R., J. Wang, and M. C. Genton, 2011: Correlation models for temperature fields. *J. Climate*, **24**, 5968–5997.

Rougier, J., 2008: Comment on article by Sanso et al. *Bayesian Anal.*, **3**, 45–56.

Santer, B. D., and Coauthors, 2007: Identification of human-induced changes in atmospheric moisture content. *Proc. Natl. Acad. Sci. USA*, **104**, 15 248–15 253, doi:10.1073/pnas.0702872104.

Shen, S., P. Dzikowski, G. Li, and D. Griffith, 2001: Interpolation of 1961–97 daily temperature and precipitation data onto Alberta polygons of ecodistrict and soil landscapes of Canada. *J. Appl. Meteor.*, **40**, 2162–2177.

Solomon, S., D. Qin, M. Manning, M. Marquis, K. Averyt, M. M. B. Tignor, H. L. Miller Jr., and Z. Chen, Eds., 2007: *Climate Change 2007: The Physical Science Basis*. Cambridge University Press, 996 pp.

Solomon, S., G. K. Plattner, R. Knutti, and P. Friedlingstein, 2009: Irreversible climate change due to carbon dioxide emissions. *Proc. Natl. Acad. Sci. USA*, **106**, 1704–1709.

Tabor, K., and J. Williams, 2010: Globally downscaled climate projections for assessing the conservation impacts of climate change. *Ecol. Appl.*, **20**, 554–565.

Tebaldi, C., and R. Knutti, 2007: The use of the multi-model ensemble in probabilistic climate projections. *Philos. Trans. Roy. Soc.*, **365A**, 2053–2075, doi:10.1098/rsta.2007.2076.

Wang, T., A. Hamann, D. Spittlehouse, and S. N. Aitken, 2005: Development of scale-free climate data for western Canada for use in resource management. *Int. J. Climatol.*, **26**, 383–397.

Willett, K. M., N. P. Gillett, P. D. Jones, and P. W. Thorne, 2007: Attribution of observed surface humidity changes to human influence. *Nature*, **449**, 710–712, doi:10.1038/nature06207.

Zellner, A., 1962: An efficient method of estimating seemingly unrelated regression equations and tests of aggregation bias. *J. Amer. Stat. Assoc.*, **57**, 500–509.

Zhang, X., F. W. Zwiers, G. C. Hegerl, F. H. Lambert, N. P. Gillett, S. Solomon, P. Stott, and T. Nozawa, 2007: Detection of human influence on 20th century precipitation trends. *Nature*, **448**, 461–465, doi:10.1038/nature06025.

Zwiers, F. W., 1996: Interannual variability and predictability in an ensemble of AMIP climate simulations conducted with the CCC GCM2. *Climate Dyn.*, **12**, 825–847.

Zwiers, F. W., X. Zhang, and Y. Feng, 2011: Anthropogenic influence on long return period daily temperature extremes at regional scales. *J. Climate*, **24**, 881–892, doi:10.1175/2010JCLI3908.1.