## 1. Introduction

Climate scientists recognize that, although Earth’s surface temperatures have entered a period of general long-term increase, temporary stalling or cooling periods may occur at any spatial scale (Easterling and Wehner 2009; Knight et al. 2009; Meehl et al. 2011; de Elía et al. 2013; Deser et al. 2014). Most long-term trends for the twenty-first century are projected as positive with a high degree of confidence, because anthropogenic greenhouse gases are the dominant factor for that period and their radiative forcing is expected to increase (van Vuuren et al. 2011; Stocker et al. 2013). Short-term trend distributions are tied to this global forcing as well but are also expected to reflect the fast-changing influence of secondary factors like natural forcings (e.g., volcanic aerosols and solar cycles) and unforced natural variability [e.g., the North Atlantic Oscillation (NAO) and the El Niño–Southern Oscillation (ENSO)].

The recent period (~1998–2012) offers an example of temporary warming slowdown on the global scale (Morice et al. 2012). This could have resulted from naturally prevailing La Niña–like conditions in the tropical Pacific (Kosaka and Xie 2013) accompanied by subsurface ocean heat uptake higher than in decades that experienced more warming (Meehl et al. 2011; England et al. 2014; Risbey et al. 2014). The NAO could be implicated as well, at least for the Northern Hemisphere (Li et al. 2013). Other possible causes, involving a decrease in top-of-atmosphere radiative forcing, have been proposed: decreased ozone-depleting substances and methane emissions (Estrada et al. 2013), decreased solar irradiance during cycle 23’s declining phase (Lean 2010; Kopp and Lean 2011), increased sulfur emissions in Asia (Kaufmann et al. 2011), and declining stratospheric water vapor concentrations after 2000 (Solomon et al. 2010). The recent slowdown may be in part a measurement artifact due to incomplete global coverage (Cowtan and Way 2013), but in this case it may be only a matter of time before a 10- or 15-yr global cooling trend actually occurs.

At the regional and local scales, short-term cooling trends are expected to occur with higher probabilities, because interannual and interdecadal variability is generally higher at smaller spatial scales (Deser et al. 2012; Maraun 2013). For North America, Deser et al. (2014) have shown that internal variability can result in as much as ~40% of simulations generated by state-of-the-art global coupled models showing cooling trends over some regions during the period 2010–35.

Although the possibility of future cooling trends in a long-term warming path is undisputed, most studies on the subject are based on direct outputs from climate models, which present differences in many statistical properties when compared with observational products. For example, van Oldenborgh et al. (2013) have examined the Coupled Model Intercomparison Project phase 5 (CMIP5) multimodel ensemble and concluded that the distribution of surface temperature trends (per degree of global warming) for 1950–2011 and north of 45°S is underdispersive; over Canada, their analysis suggests an underforecasting bias, with observed trends almost all within the 20th–99th-percentile bracket of CMIP5 trends. In another study, Bhend and Whetton (2013) also concluded that there were inconsistencies between observed and CMIP3–CMIP5 trends, including for regions in Canada. Other relevant studies of temperature trends at different time and space scales in multimodel ensembles include those of Santer et al. (2011), Sakaguchi et al. (2012), Stott et al. (2013), and Knutson et al. (2013).

Reported inconsistencies between simulated and observed trends are not the only reason why computing future probability of cooling trend (*P*_{cool}) values based on climate model simulations is controversial (Curry 2011; Stephenson et al. 2012). Another important reason is that a “true” value of *P*_{cool} would have to be based on a perfectly representative distribution of trends, which multimodel ensembles cannot guarantee (von Storch and Zwiers 2013). Other issues related to probabilistic projections of climate include the difficulty of assessing reliability given a single real climate trajectory, inferring models’ future skill from past skill, subjectivity in model weighting, model tuning, and anthropogenic forcing uncertainty (Räisänen and Palmer 2001; Tebaldi and Knutti 2007; Yokohata et al. 2013). However, it may be argued that estimations of *P*_{cool} should be provided by climate experts when needed, with limitations clearly stated. The rationale is that otherwise, decisions are made “with implied assessments of relative likelihoods that depart, perhaps significantly, from experts’ best estimates” (Hall et al. 2005, p. 346). Notably, Räisänen and Palmer (2001), Schneider (2002), and Collins (2007) encouraged probabilistic climate forecasting.

This study’s main objective is to estimate *P*_{cool} for short-term durations (from 5 to 25 yr) over Canada during the current period (2006–35) in ways that reflect how numerical climate simulations are actually used by climate service institutions to provide decision makers with local climate scenarios (Huard et al. 2014). A local univariate climate scenario is defined here as a time series that describes a path one climate variable may plausibly follow over a very small area and during a given period of time. Plausibility requires statistical properties consistent with observations for the past segment, continuity at the past–future junction, and physical credibility for the future segment. Climate scenarios are designed by merging statistical properties from numerical model simulations and observational products, with the direct usage of simulations seen as one of the options. No single scenario is a prediction; however, as an ensemble, scenarios aim to cover the range that includes the real future climate trajectory.

A secondary objective is to evaluate whether climate scenario design methods other than the direct use of simulations reduce offsets between observational and simulated *P*_{cool} distributions during the calibration period (1962–2010). Two such alternative methods are used. One method performs statistical adjustment of the simulated interannual variability, while the other involves using a simple autoregressive model to generate variability.

It is important to emphasize local short-term cooling trends in the context of global warming for at least two reasons. First, some decisions do require information related to the climate’s short-term evolution or are based on changes from climate normal, and it may be important to know the range of temperature trends. Second, global warming and consequential environmental changes represent a scientific challenge but also a political issue, implying that it might be tempting for some interest groups to create and spread disinformation. For example, the recent warming slowdown is known to have fueled contrarian views, as discussed by Kaufmann et al. (2011) and Cowtan and Way (2013). Thus it must be clearly demonstrated that short-term cooling trends and long-term global warming represent two intertwined phenomena and that the occurrence of the former is in no way a refutation of the latter (Santer et al. 2011).

The paper is divided as follows. In section 2, datasets and methodological choices are described. Hindcast skill results for 1962–2010 and probabilistic forecasts for 2006–35 are presented in sections 3 and 4, respectively. Section 5 consists of a summary with discussion.

## 2. Data and methods

Numerical model simulations are often used directly as scenarios for making inferences about aspects of Earth’s real future climate (e.g., Deser et al. 2014). This is justified when statistical properties of interest are judged to be realistic. But a more general approach to climate scenario design would be to merge information from simulations and observations, with all the weight on simulations being just one option. The rationale for generalizing the approach is that there exist statistical offsets between simulations and local realizations of nature, and part of this is due to scale mismatch and the inherent imperfections of models. This causes discontinuities at the junction of the past-observed and future-simulated segments, among other problems. However, part of these offsets may also result from limitations in observational products (particularly in northern Canada, where the network of stations is sparse) and from the comparison period being too short for both observations and simulations to represent the full phase space of the local climate. For this and other reasons, there is no scientific consensus on the legitimacy of methods akin to bias correction (Themeßl et al. 2012; Haerter et al. 2011; Ehret et al. 2012).

In this study, *P*_{cool} uncertainty related to the choice of scenario design is addressed by considering three different methods. These are based on the same observational product and simulation ensemble but merge the statistical properties differently. The property of interest here is the short-term trend distribution. Because it is difficult to design climate scenarios in a way that directly ensures their short-term trends will be consistent with observations, methods (other than the direct use of simulations) address the problem indirectly by seeking agreement in interannual variability (standard deviation of residuals around a background state or long-term time-varying trend). Offsets in climate simulation variability have the potential to affect trend detection as well as consistency with observations (Knutson et al. 2013). None of the methods is based on seeking scenario–observation agreement in long-term trends, although this statistical property is strongly linked with short-term trend distributions. Such an agreement could be imposed on the past segment of a scenario simply by borrowing the observational regression. However, extrapolating for the future segment is unlikely to give credible results.

### a. Datasets

#### 1) Numerical simulations

Monthly data from CMIP5 are used (Taylor et al. 2012). The CMIP5 dataset has been chosen because it is recent and well explored by the climate community (Stocker et al. 2013) and reflects a variety of modeling efforts. Nonetheless, this ensemble’s representation of the historical climate has limitations (Sheffield et al. 2013), and models are not completely independent (Räisänen 2007; Masson and Knutti 2011; Stephenson et al. 2012; Knutti et al. 2013). We use 60 simulations from 15 Earth system models (ESMs) and four representative concentration pathways (RCPs). Only ESMs presenting at least one member per RCP at the time of analysis have been selected; these are listed in Table 1. The principle of “model democracy” is applied to the selected ESMs, with only the member with identification code r1i1p1 kept for each ESM and RCP (most models had only one member at the time the analysis was conducted). However, in practice, some models may have more weight than others in this study, since different models from the same center may, in fact, be closely related (GFDL and MIROC each contribute three models in this study).

List of ESMs and associated modeling groups whose RCP experiment simulations with member code r1i1p1 are used in this study. (Here, QCCCE is Queensland Climate Change Centre of Excellence, NIMR/KMA is National Institute of Meteorological Research/Korea Meteorological Administration, MOHC/INPE is Met Office Hadley Centre/Instituto Nacional de Pesquisas Espaciais, and NCC is Norwegian Climate Centre; expansions for the ESMs and the other modeling groups are available at http://www.ametsoc.org/PubsAcronymList.)

#### 2) Observational data

The product used for scenario design is a Natural Resources Canada (NRCan) dataset based on daily observations from Environment Canada’s station network interpolated to a regular 10 km × 10 km grid over Canada (Hutchinson et al. 2009; Hopkinson et al. 2011). Version 2 is used. It contains the daily minimum (*T*_{min}) and maximum (*T*_{max}) temperatures, obtained by applying the Australian National University Splines (ANUSPLIN) package interpolation procedure to station observations. This procedure accounts for terrain elevation but does not take account of any other physical principle. Thus, biases are potentially large where the station network is sparse (e.g., in the Arctic). Although this dataset does not strictly correspond to observations, that term will be used herein for the sake of brevity. Daily mean temperatures (*T*_{mean}) are obtained by the approximation *T*_{mean} = (*T*_{min} + *T*_{max})/2. This approximation was notably used by Jones et al. (1999) and has known limitations (e.g., Zeng and Wang 2012).

### b. Study domain and period

Figure 1 presents the Canadian toponyms referred to in this study. Scenarios are built for 5146 locations over the country, a number obtained by selecting every 50th point of the NRCan grid. Thus, scenarios are produced on an irregular grid, with a sampling distance of the order of ~44 km. This relatively short distance should not be interpreted as a magnification of ESMs’ resolutions (~100–300 km); scenarios are calculated separately at each location, but neighboring locations are climatically not independent. The sampling distance stems from a compromise between limiting computing time and being able to visualize interesting geographical patterns. Each value in a time series corresponds to a single year and may represent the whole year (ANN) or a single season [December–February (DJF), March–May (MAM), June–August (JJA), and September–November (SON)]. For winter values of a given year, December data from the previous year are used, so the average value actually represents three consecutive months.

Statistical calculations of trends are performed for the 2006–35 segment of the scenarios created. During that period, the mean state of the scenarios ensemble follows a fairly linear trend (not shown), with no obvious RCP-based segregation. Figure 2 shows the linear trends for the grid point nearest to Quebec City (46.71°N, 71.21°W) during the periods 2006–35 and 2036–65, for each ESM–RCP combination (annual time series are used, which are obtained by interpolating the simulations at the NRCan grid point closest to the targeted location). Over 2006–35, the uncertainty in the trend appears to depend much more on ESM choice and initial conditions than on RCP choice. The rates of warming generally increase from 2006–35 to 2036–65 in the case of RCP8.5, are about constant for RCP6.0, and decrease for RCP4.5, whereas temperatures for RCP2.6 stabilize at warmer values than currently observed.

As an indication, the *p* value of the two-sample Kolmogorov–Smirnov test between the RCP2.6 and RCP8.5 linear trend distributions is ~0.14 for 2006–35 and <0.01 for 2036–65. At a 5% significance level, this means the null hypothesis that the two samples are drawn from the same parent distribution is rejected for the 2036–65 period only. Applying the Kolmogorov–Smirnov test at all 5146 locations results in rejection of the null hypothesis 77 and 5146 times for 2006–35 and 2036–65, respectively. This supports the assumption that socioeconomic scenarios associated with RCPs do not have a major impact on trends during the study period (i.e., the warming is, in large part, already committed). Hence the dataset can reasonably be considered to include four members per ESM, and trend statistics on the scenarios may be performed without regard for the RCP. This approximation of a small forcing-related uncertainty might not hold when another ensemble of simulations is considered. Time series for the different RCPs during 2006–35 are also practically indistinguishable in terms of two other statistical properties: namely, detrended variance and lag-1 autocorrelation values (not shown).
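As an illustration of the test used above, the two-sample Kolmogorov–Smirnov comparison between trend distributions can be sketched in a few lines. The trend samples below are synthetic stand-ins (not CMIP5 values), and the 5% decision rule mirrors the one applied in the text:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Hypothetical 2006-35 linear-trend samples (degC per decade), one per ESM
trends_rcp26 = rng.normal(0.35, 0.15, size=15)
trends_rcp85 = rng.normal(0.45, 0.15, size=15)

# Two-sample KS test: D_KS statistic and p value
stat, p_value = ks_2samp(trends_rcp26, trends_rcp85)

# Fail to reject H0 at the 5% level -> treat RCPs as one pooled ensemble
pool_rcps = p_value >= 0.05
```

The same `ks_2samp` distance (*D*_{KS}) reappears in section 3b as a measure of disparity between hindcast and observed *P*_{cool} distributions.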

Finally, it is important to mention that the 5146 locations investigated are not all climatically independent. According to a simple test, the maximal number of uncorrelated NRCan time series across Canada during 1962–2010 is six for each season except summer (11). For two time series to be considered uncorrelated, this test requires that the Pearson linear correlation coefficient (*r*_{Pea}) between their respective linear regression residuals be nonsignificant according to Fisher’s *z*-transformation test at the 95% level (Wilks 2006, section 5.4.2).
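The independence test just described can be sketched as follows. This is a hypothetical implementation (the helper `uncorrelated` and the synthetic series are illustrative only): detrend each series by linear regression, correlate the residuals, and apply Fisher's *z*-transformation test at the 95% level:

```python
import numpy as np

def uncorrelated(x, y, z_crit=1.959964):
    """True if the Pearson correlation between the linear-regression
    residuals of x and y is nonsignificant (Fisher z test, two-sided 95%)."""
    t = np.arange(x.size)
    rx = x - np.polyval(np.polyfit(t, x, 1), t)   # regression residuals
    ry = y - np.polyval(np.polyfit(t, y, 1), t)
    r = np.corrcoef(rx, ry)[0, 1]
    z = 0.5 * np.log((1 + r) / (1 - r))           # Fisher z transform
    se = 1.0 / np.sqrt(x.size - 3)                # standard error of z
    return abs(z) / se <= z_crit                  # nonsignificant -> independent

rng = np.random.default_rng(1)
n = 49                                            # 1962-2010 -> 49 yr
a = 0.02 * np.arange(n) + rng.normal(0, 1, n)     # two synthetic series
b = 0.02 * np.arange(n) + rng.normal(0, 1, n)
independent = uncorrelated(a, b)
```

In the study, such pairwise tests are repeated across locations to count the maximal number of mutually uncorrelated NRCan time series.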

### c. Scenario design

#### 1) Method 1: Interpolated model outputs

The first method consists of interpolating each simulation at the 5146 study locations. No statistical adjustment is performed. Spatial interpolation has been used as a form of rudimentary downscaling by Ahmed et al. (2013; see also references therein).

#### 2) Method 2: Statistical adjustment of interannual variability

In the second method, each simulation is adjusted according to its level of agreement with observational variability over a calibration period (1962–2010). The transient background (long-term) trends are still dictated by the simulation. Adjustment is performed independently for each simulation, season, and location. After spatial interpolation, the first step is calculating the time-varying background state of the simulation *B*_{SIM}, as a fourth-order regression (Hawkins and Sutton 2009). The *B*_{SIM} is then subtracted from the simulated time series to get the residual distribution {*r*_{SIM}}. A fourth-order regression is also used to separate the observational residuals {*r*_{OBS}} from the background transient state *B*_{OBS}. A transfer function is next determined, which ensures the mapping of {*r*_{SIM}} into {*r*_{OBS}} for elements of the calibration period, on a quantile basis. The transfer function is then applied to the 1962–2035 residuals, and a scenario is obtained by adding *B*_{SIM} back to the adjusted residuals.

The statistical adjustment of interannual variability is inspired by techniques applied to daily model outputs and known under different names, including quantile mapping (Themeßl et al. 2012) and statistical bias correction (Piani et al. 2010). Here the transfer function is linked with the residuals rather than the actual values, mostly because otherwise different observed and simulated long-term trends would lead to inappropriate variability adjustment (Scherrer et al. 2005).
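A minimal sketch of the method 2 adjustment follows, under the simplifying assumption of a linear background state instead of the paper's fourth-order regression; `quantile_map` and all data below are illustrative, not the authors' code. The transfer function is fitted on calibration-period residuals on a quantile basis, then applied to the full 1962–2035 residual series:

```python
import numpy as np

def quantile_map(r_sim_cal, r_obs_cal, r_sim_all, n_q=101):
    """Empirical transfer function mapping simulated residuals onto the
    observed residual distribution, fitted on the calibration period."""
    q = np.linspace(0.0, 1.0, n_q)
    sim_q = np.quantile(r_sim_cal, q)             # simulated quantiles
    obs_q = np.quantile(r_obs_cal, q)             # observed quantiles
    return np.interp(r_sim_all, sim_q, obs_q)     # quantile-based mapping

rng = np.random.default_rng(2)
years = np.arange(1962, 2036)                     # scenario period, 74 yr
cal = years <= 2010                               # calibration period
b_sim = 0.02 * (years - years[0])                 # background state (linear here)
r_sim = rng.normal(0.0, 1.5, years.size)          # simulated residuals (too variable)
r_obs = rng.normal(0.0, 1.0, int(cal.sum()))      # "observed" residuals, 1962-2010

# Scenario: adjusted residuals added back to the simulated background state
scenario = b_sim + quantile_map(r_sim[cal], r_obs, r_sim)
```

Working on residuals rather than raw values keeps the mapping from conflating variability offsets with trend offsets, as the paragraph above explains.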

#### 3) Method 3: Stochastic interannual variability

In this method, scenarios are built around each simulation’s background state *B*_{SIM} independently, which is determined as in method 2 (fourth-order regression). The same technique is used to obtain *B*_{OBS}, and the corresponding residuals {*r*_{OBS}} are then used to create a first-order autoregressive or AR(1) model (Wilks 2006, section 8.3.1). In such a statistical model, each value *X*_{t} of a time series partially depends on the previous one, *X*_{t−1}, following the equation

*X*_{t} = *μ* + *φ*(*X*_{t−1} − *μ*) + *ε*_{t},

where *φ* corresponds to the lag-1 autoregressive parameter of {*r*_{OBS}}, *μ* is the average of {*r*_{OBS}}, and *ε*_{t} is a random number picked from a distribution {*ε*}. This distribution is assumed to be Gaussian, with *μ*_{ε} = *μ* = 0 and *σ*_{ε}^{2} = (1 − *φ*^{2})*σ*^{2}, *σ* being the standard deviation of {*r*_{OBS}}. Scenario residuals are then generated using the first value of {*r*_{OBS}} as the initial *X*_{0} value and randomly picking from within {*ε*}. Here 50 values beyond the number needed to cover the period 1962–2035 are generated, and only the last 74 values are kept, to eliminate the influence of the arbitrary *X*_{0} value chosen to start the series. Once the time series for the scenario residuals is obtained, it is added to *B*_{SIM} to obtain a *T*_{mean} scenario. Whether the use of a higher-order autoregressive integrated moving average (ARIMA) model would be more appropriate has not been investigated, as was done, for example, by Foster and Rahmstorf (2011) for global temperature.
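The generation procedure above can be sketched as follows (an assumed implementation, not the authors' code; `ar1_residuals` and the synthetic residuals are illustrative only):

```python
import numpy as np

def ar1_residuals(r_obs, n_keep=74, n_spinup=50, seed=0):
    """Generate scenario residuals with an AR(1) model fitted on observed
    residuals; the first n_spinup values are discarded, as in the text."""
    rng = np.random.default_rng(seed)
    mu = r_obs.mean()                                # ~0 for regression residuals
    phi = np.corrcoef(r_obs[:-1], r_obs[1:])[0, 1]   # lag-1 autoregressive parameter
    eps_sd = np.sqrt((1.0 - phi**2) * r_obs.var())   # sigma_eps^2 = (1 - phi^2) sigma^2
    x = r_obs[0]                                     # X_0: first observed residual
    series = np.empty(n_keep + n_spinup)
    for t in range(series.size):
        x = mu + phi * (x - mu) + rng.normal(0.0, eps_sd)  # X_t = mu + phi(X_{t-1} - mu) + eps_t
        series[t] = x
    return series[-n_keep:]                          # keep only the last n_keep values

rng = np.random.default_rng(4)
r_obs = rng.normal(0.0, 1.2, 49)                     # stand-in observed residuals, 1962-2010
residuals_2035 = ar1_residuals(r_obs)                # 74 values covering 1962-2035
```

Adding `residuals_2035` to a simulation's *B*_{SIM} then yields one method 3 scenario; changing the seed yields another.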

Successively using each simulation of our work dataset provides an ensemble of 60 scenarios. Note that for each simulation, any number of scenarios could have been generated just by changing *X*_{0} or the pseudorandom seed. However, using the same number of scenarios as provided by methods 1 and 2 renders the intermethod comparison easier.

#### 4) Methods summary

Table 2 offers an overall view of how simulations and observations are combined to produce scenarios in each method, indicating which dataset is relied upon for the background state, the sequence (alternation of minima and maxima), and the variability. The alternative method referred to in Table 2 is not used to generate future *P*_{cool} values and is described in section 4. Figure 3 illustrates this concretely with the three scenarios that stem from the FGOALS-s2 historical/RCP6.0 simulation, for annual *T*_{mean} values interpolated at the grid point nearest to Yellowknife (62.13°N, 114.46°W). It is not necessary to adjust the manifest offset between simulated and observed long-term averages, because trends and *P*_{cool} values are invariant under a constant shift. However, to facilitate viewing, the method 3 scenario is shown with a constant shift corresponding to the difference between the means of *B*_{SIM} and *B*_{OBS} over the calibration period. The more subtle offset in variability is adjusted for in method 2. In this case, standard deviations of the 1962–2010 residuals for observations, the simulation, and the method 2 scenario are 1.15°, 1.51°, and 1.17°C, respectively. For method 3, the standard deviation of the residuals is 1.30°C; the number of time steps is too small to ensure convergence toward the prescribed standard deviation.

Methods summarized by source [simulations (SIM) or observations (OBS)] for three major scenario characteristics: trend, sequence, and variability. The alternative method is used for hindcast analysis only.

### d. Short-term trend calculations

Once scenarios are created, short-term trend distributions are obtained, with the aim of estimating a probability for the occurrence of a negative trend (*P*_{cool}) for each case location/season/method/duration. Short-term durations considered are 5, 10, 15, 20, and 25 yr. For each case, a trend distribution is obtained by sliding a block of the specified duration across the 2006–35 period and calculating linear trends by least squares regression. This gives 1560 values per case for 5-yr durations (26 blocks × 60 scenarios) and 1260, 960, 660, and 360 values for the 10-, 15-, 20-, and 25-yr durations, respectively. It is important to note that blocks are not independent, because they overlap. As will be discussed later, this lack of independence is taken into account in assessing confidence intervals (CIs). For each distribution, some trends are negative and some are positive, and the fraction of negative values during 2006–35 is directly interpreted as *P*_{cool}.
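The block-counting procedure can be sketched as follows, using synthetic scenarios (`p_cool` is an illustrative helper, not the study's code). The resulting block counts match those quoted above, e.g., 26 × 60 = 1560 for the 5-yr duration:

```python
import numpy as np

def p_cool(scenarios, duration):
    """Fraction of negative least-squares trends over all overlapping
    blocks of a given duration, pooled across all scenarios."""
    n_scen, n_yr = scenarios.shape
    t = np.arange(duration)
    slopes = np.asarray([
        np.polyfit(t, s[start:start + duration], 1)[0]   # linear trend of one block
        for s in scenarios
        for start in range(n_yr - duration + 1)          # overlapping blocks
    ])
    return (slopes < 0).mean(), slopes.size

rng = np.random.default_rng(5)
background = 0.04 * np.arange(30)                        # ~0.4 degC/decade over 2006-35
ensemble = background + rng.normal(0.0, 1.0, (60, 30))   # 60 synthetic annual scenarios

p5, n5 = p_cool(ensemble, 5)                             # 26 blocks x 60 scenarios
p25, n25 = p_cool(ensemble, 25)                          # 6 blocks x 60 scenarios
```

Because interannual noise dominates over 5 yr but averages out over 25 yr, `p25` comes out well below `p5`, the behavior shown for Quebec City in section 4a.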

There are more sophisticated ways to obtain probabilities in situations like this—for example, using Bayesian approaches. However, there is no methodological consensus, and differences between results based on various methods are sometimes large (e.g., Tebaldi and Knutti 2007). It is reasonable to consider that the more frequently negative trends occur in scenarios based on ESMs, the more likely they are to occur in the real world.

## 3. Hindcast skill

### a. Verification rank histograms

A consistent hindcast requires that the observed value’s rank varies with equiprobability among the different projected outcomes (Wilks 2006, section 7.7.2). Such an assessment has been performed for method 1 and every case season/duration during the period 1962–2010, as shown in Fig. 4. Ranks are converted into percentiles. Weights are 1/2*N* for each tail and 1/*N* for each interval between successive scenario *P*_{cool} values, with *N* = 60 being the number of scenarios. Linear interpolation is performed between ranks, and 5-percentile bins are used for the histograms (see van Oldenborgh et al. 2013 for further details). Each location contributes one *P*_{cool} value based on a number of trend values, which depends on the duration. Overlap in periods for calculating trends and spatial correlation limit the effective sample size in two different manners. For example, for the case ANN/10-yr duration, there are around 6 climatically independent regions across the country, and only 4 of the 40 trend values determining the local *P*_{cool} value are completely independent. This indicates that the histograms include some redundant information, rendering the estimation of statistical significance challenging (with respect to the disparities between observation and scenario *P*_{cool} distributions). Nevertheless, rank histograms are probably among the best tools for judging how observations differ from the hindcast ensemble. For each case in Fig. 4, observations (red histogram) differ from the ideal result (black line). But when individual simulations are compared with the other 59, histograms also depart from the ideal result (for each bin, the gray shading ranges from the lowest to the highest frequency resulting from the intersimulation verifications). The general location of the red histogram within the gray shading indicates that observations differ from simulations approximately as much as simulations differ from one another. 
However, for DJF and 15–25-yr durations there is an overforecasting bias (the hindcast shows perceptibly fewer low *P*_{cool} values than the observations). Also, for the case SON/25 yr, observational values occur atypically often within the 80th–95th percentiles. The overall picture is the same with methods 2 and 3 (not shown).
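One plausible reading of the rank-to-percentile conversion described above (an assumption based on the stated weights of 1/2*N* per tail and 1/*N* per interval; the authors' exact implementation follows van Oldenborgh et al. 2013) is:

```python
import numpy as np

def obs_percentile(obs, ensemble):
    """Percentile of an observed value within an N-member ensemble:
    1/(2N) weight in each tail, 1/N per interval, linear interpolation
    between ranked members."""
    e = np.sort(np.asarray(ensemble, dtype=float))
    n = e.size
    # Percentile assigned to each sorted member: 1/2N, 3/2N, ..., (2N-1)/2N
    member_pct = (2.0 * np.arange(n) + 1.0) / (2.0 * n)
    return float(np.interp(obs, e, member_pct, left=0.0, right=1.0))

scen_pcool = np.linspace(0.05, 0.45, 60)   # hypothetical N = 60 scenario P_cool values
pct = obs_percentile(0.25, scen_pcool)     # observed value at the ensemble midpoint
```

Binning such percentiles (5-percentile bins, one value per location) produces the red histograms of Fig. 4; a flat histogram would indicate a consistent hindcast.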

The fairly good verification rank histograms do not highlight the fact that *P*_{cool} geographical patterns vary substantially between observations and individual models. Related correlation coefficients *r*_{Pea} (not shown) are generally small but more often positive than negative. Forcing local scenarios to adopt the observational background state improves *P*_{cool} correlational results substantially. This is discussed in more detail in the next section.

### b. Distributions

In this section, the models’ *P*_{cool} distributions are compared with the corresponding observational values during the calibration period (1962–2010). This procedure’s objective is mainly to assess whether statistically adjusting the variability (with methods 2 and 3) has a beneficial impact in hindcast mode. An alternative scenario design method that consists of reproducing the observed variability around its fourth-order regression with an AR(1) model is also used. This method has not been used beyond the calibration period, because the observed regression would have to be extrapolated, and this procedure has been judged to be too simplistic. The alternative method’s role in the current section is limited to assessing the potential for agreement between scenario and observational *P*_{cool} values that could be reached were their background state the same. Such an agreement during the past period would not guarantee future agreement, but a disagreement would clearly indicate some limitation in the scenario design method.

In Fig. 5, winter and summer 5-yr duration *P*_{cool} distributions for 1962–2010 across Canada are shown for observations (red histograms) as well as for the individual 15 ESMs [gray histograms; black bars connect the 4th and 12th highest frequency values for each 2.5% bin, corresponding roughly to the interquartile range (IQR)]. Each location contributes one *P*_{cool} value. Again, overlapping periods for trend calculation and spatial correlation limit the effective degrees of freedom and hinder clean calculations of statistical significance in the disparities. Disparities themselves are assessed using the Kolmogorov–Smirnov distance (*D*_{KS}), for which a value of 0 means distributions match perfectly, whereas a value of 1 indicates distributions do not even overlap. Minimal, median, and maximal values among the 15 computed *D*_{KS} values are provided on each panel of Fig. 5. For these cases, raw simulations (method 1) often present a *P*_{cool} histogram with a light left tail relative to the observational one. Another noteworthy result is that there is no marked difference between methods 1 and 2, as revealed graphically and by *D*_{KS} values. Method 3 leads to model histograms much closer to one another for cases shown in Fig. 5. One potential explanation is spatial decorrelation of the variability, which produces much richer effective sampling. The difference between results for method 3 and those for the alternative method previously described illustrates how the right background state may contribute to obtaining the right *P*_{cool} values.

For a more complete picture, Fig. 6 presents the *D*_{KS} values for each case season/duration/method/ESM. Each model’s *P*_{cool} distribution is compared with the observational one. Again, each black bar gives the IQR among the 15 ESMs. The green marker corresponds to *D*_{KS} between observations and the multimodel ensemble. Results can be summarized as follows. Adjusting variability with method 2 generally does not bring *P*_{cool} distributions closer to the observed ones. This results partially from compensating effects, whereby inflation and compression of the standard deviation occur in comparable numbers of cases (this is not so in winter, where models produce many more cases of excessive variability than the reverse). Altered trends must also cross the zero line in order to make a difference in *P*_{cool}, so it is in some ways harder to impact *P*_{cool} than to impact the underlying trend distribution. Generating variability with an AR(1) model (in method 3) often—but not always—leads to a narrower IQR, but this does not necessarily co-occur with a general *D*_{KS} decrease. Thus, altering variability and/or lag-1 autocorrelation toward observed values does not systematically lead to closer *P*_{cool} distributions, although very high *D*_{KS} values are often diminished with method 3. The multimodel distribution is found most often within the IQR of individual models and sometimes in the best quartile. There is no case where it is found in the worst quartile.

Results for the alternative method suggest at least two things. First, using the same background state as observations often brings a more substantial improvement than using the same standard deviation. Second, with the same background states, it seems that 4 AR(1) runs per location are enough to achieve convergence toward a stable *P*_{cool} distribution (this is not the case for an individual location). The spread associated with method 3 is thus due more to the simulations having diverse offsets in their background state than to random effects of the AR(1) procedure. One case in which the alternative method clearly results in worse *D*_{KS} values is the DJF/10-yr case (annual results are impacted as well). For this case, scenario histograms are shifted by ~4% toward lower *P*_{cool} values, because of a wide region around Hudson Bay where the observed *P*_{cool} values are much larger than expected from the alternative method (not shown). In this region, the standard deviation of the observed residuals has increased over time, and the AR(1) model cannot systematically reproduce this heteroscedasticity. Method 3 thus obtains good results for the wrong reason (incorrect scedasticity partially compensating for inadequate background states). Another potential inadequacy of an AR(1) model could stem from residuals not being Gaussian (Gluhovsky and Agee 2007).

The availability of metrics to assess how realistic simulated *P*_{cool} distributions are during the recent-past period renders the recourse to unequal model weights tempting. For example, a relative weight equal to 1 − *D*_{KS} could be a simple, defensible choice. However, the principle of model democracy used here has been preserved for two reasons. First, better past skill does not guarantee better future skill, because the relative importance of the physical processes can evolve in a transient climate (Reifen and Toumi 2009). Second, relative model performances vary substantially across seasons and trend durations (but less across methods). For example, each raw simulation is found in the best quartile for at least one case season/duration, and the result for the worst quartile is similar.

In this section, it has been shown that forcing simulated standard deviations to fit the observed values does not lead to systematic benefits or shortcomings in hindcasting *P*_{cool} distributions. This occurs because other statistical properties, such as long-term trend and scedasticity, are also important. Taking this into account, results for 2006–35 (next section) have been computed using each of the three methods, and differences are interpreted as uncertainty related to postprocessing of simulated interannual variability.

## 4. Estimation of 2006–35 *P*_{cool} values

### a. A specific case (Quebec City)

Figure 7 shows smoothed histograms (distributions) of the trends during 2006–35 near Quebec City for the five durations considered. Annual values and method 2 are used for this figure, with bins of size 0.333°C decade^{−1} (the smoothing, which consists of connecting the bin centers, is only meant to facilitate viewing). Unsurprisingly, medians are located in the positive trend zone (between 0.38° and 0.47°C decade^{−1}), distributions widen as duration shortens, and the fraction of the area under the curve located in the negative zone decreases as duration increases. Corresponding *P*_{cool} values are presented in Fig. 8, along with results for each season and for the other scenario design methods. Square symbols indicate the *P*_{cool} value (frequency of occurrence), whereas the associated solid lines illustrate CIs at the 95% level (see the appendix). The gradual decrease in *P*_{cool} as duration increases appears robust across seasons and methods. Values approach 50% for the 5-yr duration, but this threshold is not exceeded (it is, however, exceeded at a few locations for the 5-yr duration and method 3, with the most extreme case at *P*_{cool} = 51.9%). The MAM and DJF seasons generally show the highest *P*_{cool} values, whereas ANN and JJA generally show the lowest.
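The core quantity here, the frequency of negative trends over all scenarios and overlapping blocks, can be sketched as follows. The synthetic ensemble (a 0.4°C decade^{−1} warming plus Gaussian interannual noise) is an assumed stand-in for the 60 postprocessed scenarios, chosen only so that the duration dependence is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble: 60 scenarios of 30 annual values (2006-35),
# built as an assumed 0.4 degC/decade warming plus interannual noise
years = np.arange(30)
ensemble = 0.04 * years + rng.normal(0.0, 0.8, size=(60, 30))

def cooling_probability(ensemble, duration):
    """Fraction of negative linear trends over all scenarios and all
    overlapping blocks of the given duration (in years)."""
    trends = []
    for series in ensemble:
        for start in range(series.size - duration + 1):
            block = series[start:start + duration]
            trends.append(np.polyfit(np.arange(duration), block, 1)[0])
    return float(np.mean(np.asarray(trends) < 0.0))

# P_cool shrinks as the trend duration grows, as in Figs. 7-8
pcool = {d: cooling_probability(ensemble, d) for d in (5, 10, 15, 20, 25)}
```

For the 10-yr duration this yields 60 × 21 = 1260 trend values per location, matching the sample size used in the appendix.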

It is important to mention that the method for estimating CIs accounts only for stochastic uncertainty, through the assumption that each sample is drawn from a Gaussian distribution. Other issues, including model interdependence and RCP representativeness, are not accounted for by the CIs. Dependence on the postprocessing method is, however, explicitly shown in the three panels. The variety of error sources associated with estimating probabilities strongly suggests not interpreting *P*_{cool} values too precisely. Hence, the focus should be placed more on geographical patterns and on comparisons between seasons, methods, and durations than on the *P*_{cool} values themselves. For brevity, CIs are not presented for the remainder of the paper.

### b. Results on the Canadian scale

Probabilities of negative trends have been calculated for the 5146 grid points covering Canada. Figures 9, 10, and 11 show results obtained per season for the 20-yr duration using methods 1, 2, and 3, respectively. Patterns are relatively similar between methods, although maps appear noticeably noisier for method 3 because its synthetic variability is spatially uncorrelated. Roughly speaking, the regions with the highest *P*_{cool} values are parts of the Prairies (as defined in Fig. 1), British Columbia, southern Yukon, and Newfoundland–Labrador for DJF; Newfoundland–Labrador and the eastern Baffin Island area for MAM; land surrounding the Beaufort Sea and a thin segment along the Labrador coast for JJA; and parts of the Prairies for SON. Conversely, the regions with the lowest *P*_{cool} values correspond to parts of Nunavut for DJF and MAM; to southern Quebec, New Brunswick, and parts of the Canadian Cordillera for JJA; and to southeastern Canada and parts of Nunavut and southern Ontario for SON.

Increasing (decreasing) the duration generally preserves the geographical patterns, with an overall decrease (increase) in *P*_{cool} (not shown). This overall variation with duration is illustrated in Table 3, which presents the *P*_{cool} interdecile range [IDR; 10th–90th percentiles] of the distribution of the 5146 values over Canada for each duration/season/method case. As duration increases (for fixed season and method), the IDR moves toward lower probability values and generally widens. With regard to the method's impact (for fixed duration and season), method 3 often results in IDRs with the lowest probability values. Finally, the seasonal influence can be roughly characterized by the following ranking of *P*_{cool} for fixed duration and method: MAM, DJF, JJA, SON, and ANN (from highest to lowest, based on 10th and 90th percentile values). Spring and winter are associated with higher probabilities on the Canadian scale, but this is not necessarily the case at every location.
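The IDR summary is a simple percentile computation over the map of *P*_{cool} values; a minimal sketch, with the grid-point values replaced by an assumed synthetic distribution:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical P_cool values (%) at 5146 grid points for one
# duration/season/method combination (stand-in for a map such as Fig. 9)
pcool_map = np.clip(rng.normal(25.0, 10.0, 5146), 0.0, 100.0)

# Interdecile range (IDR): the 10th and 90th percentiles of the
# distribution of P_cool over Canada
idr_low, idr_high = np.percentile(pcool_map, [10, 90])
```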

Table 3. The 10th–90th percentiles of *P*_{cool} values (%) at 5146 grid points of Canadian inland territory, inferred from 60 scenarios (methods 1–3) during 2006–35.

Results provided here consist of the statistical aggregate of a variety of climate trajectories, which, moreover, stem from models with different (though related) physics formulations. A physical interpretation would necessitate an investigation of the physical processes underlying the trajectories of individual simulations, as performed, for example, by Deser et al. (2014). Such an analysis is beyond the scope of the present paper, but how variability and long-term linear trend patterns correlate with *P*_{cool} patterns has been investigated. Results obtained with the Spearman rank correlation coefficient (*r*_{Spe}) are shown in Table 4 for the 5-yr duration. For each season/method/location case, 30-yr linear regressions are performed on each of the 60 scenarios to determine individual variability and trend values, which are then averaged. The results verify a posteriori that variability is generally linked with *P*_{cool}, although the low number of climatically independent regions implies a low level of statistical significance. The long-term trend appears more important, and the variability-to-trend ratio is an even better indicator (absolute values of *r*_{Spe} are the same whether this ratio or its inverse, the signal-to-noise ratio, is used). Results are qualitatively the same for longer durations but are not shown, since it is trivial that, for example, 25-yr *P*_{cool} values are correlated with 30-yr linear trends. These results suggest, for example, that the low winter (high summer) *P*_{cool} values of the western Arctic Archipelago relative to other regions (Figs. 9–11) may be explained in terms of relatively low (high) ratios of the variability over the background trend.

Table 4. Spearman rank correlation coefficient between 5-yr *P*_{cool} and other statistical properties of the scenarios over 5146 grid points of Canadian inland territory during 2006–35. Variability (standard deviation of linearly detrended time series) and linear trends are calculated for each scenario and then averaged over the 60 scenarios at each grid point before variability/trend ratios are calculated and correlated with *P*_{cool}.
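The rank-correlation diagnostic can be sketched with synthetic per-grid-point statistics. Everything below is an assumed illustration (500 points instead of 5146, and a *P*_{cool} field constructed to depend mainly on the noise-to-signal ratio); the point is only how *r*_{Spe} is formed from ranks and why the ratio can outscore trend or variability alone.

```python
import numpy as np

rng = np.random.default_rng(7)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks
    (no ties expected for continuous data, so double argsort suffices)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical per-grid-point statistics standing in for the 5146 points:
# ensemble-mean 30-yr trend (degC/decade) and detrended variability (degC)
n = 500
trend = rng.normal(0.4, 0.1, n)
variability = rng.normal(0.8, 0.15, n)

# Construct P_cool so that it mostly reflects the variability-to-trend
# ratio (the inverse signal-to-noise ratio), plus a little scatter
ratio = variability / trend
p_cool = 50.0 * ratio / ratio.max() + rng.normal(0.0, 1.0, n)

r_ratio = spearman(ratio, p_cool)   # strongest association
r_trend = spearman(trend, p_cool)   # weaker, negative association
```

Because Spearman correlation is invariant under monotone transformations, |*r*_{Spe}| is identical whether the ratio or its inverse is used, as noted in the text.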

## 5. Summary with discussion

This paper aims to produce probabilistic scenarios for short-term cooling trend occurrence over Canada during 2006–35 and to evaluate the impact of climate variability postprocessing on results. The frequency of negative temperature trends in an ensemble of 60 climate scenarios has been interpreted as the probability of cooling trend (*P*_{cool}), using three different postprocessing methods: direct usage of locally interpolated simulations (method 1), calibration of variability based on a procedure akin to bias correction (method 2), and use of an autoregressive model (method 3).

It has been verified in hindcast mode (during 1962–2010) that *P*_{cool} distributions across Canada are relatively reliable. Indeed, observational distributions do not differ from scenario distributions more than these differ from one another, except for winter at 15–25-yr durations (an atypical overforecasting bias) and for autumn at the 25-yr duration (underforecasting). Hindcast results also show that postprocessing the variability has a slight impact on *P*_{cool} distributions but does not improve reliability much. Moreover, short-term *P*_{cool} reliability appears more limited by inadequacies in model long-term trends than in model variability. Results for 2006–35 suggest that the most influential indicator may be the signal-to-noise ratio, because short-term *P*_{cool} across Canada is more closely correlated with the ratio of the background trend over variability than with the background trend or the variability alone.

Calculations for 2006–35 unsurprisingly lead to higher *P*_{cool} values for shorter durations. For example, probabilities of cooling in mean annual values across Canada decrease from ~40%–46% for 5-yr durations to ~2%–18% for 25-yr durations (interdecile ranges). The case of Quebec City highlights that stochastic uncertainty in *P*_{cool} is considerable and may be larger than that associated with the interannual variability postprocessing method. However, it must be mentioned that a larger ensemble would have led to less stochastic uncertainty and that the assigned number of effective degrees of freedom is somewhat subjective, in part because models are not fully independent.

It must be emphasized that besides the stochastic issue, there is also epistemic uncertainty. Indeed, the CMIP5 simulations used are based on four different representative concentration pathways (RCPs), which do not cover all socioeconomic possibilities. For example, the deployment of a geoengineering scheme involving sulfate aerosols (Keith 2013) or a nuclear conflict (Özdoğan et al. 2013) are two conceivable events for which regional climate impacts could be outside of the range covered by RCP-based simulations. Volcanism and solar activity, not accounted for in RCPs because they are natural forcings, could also impact *P*_{cool} calculations, were they more predictable. One interesting approach would be to use occurrence probabilities for significant volcanic eruptions (Hyde and Crowley 2000) and for the onset of a grand solar minimum (Lockwood 2010).

Despite the limitations just mentioned, it seems clear that, during the coming decades, cooling trends of durations up to 25 yr (and possibly more) do represent plausible outcomes at any location in Canada. This in no way disproves global warming, because these cooling trends occur in simulations whose background state is warming. It also appears that, as a counterpoint, short-term warming trends could occur that are much stronger than the expected range of long-term trends.

## Acknowledgments

We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modelling groups (listed in Table 1 of this paper) for producing and making their model output available. The U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led the development of software infrastructure for CMIP in partnership with the Global Organization for Earth System Science Portals. We also thank Dan McKenney and his team at the Canadian Forest Service of Natural Resources Canada for providing the observational product used here. Finally, we acknowledge the useful work of three anonymous reviewers.

## APPENDIX

### Confidence Intervals for *P*_{cool}

*P*_{cool} is determined by creating a model of the population of trends for each case, based on considerations found in Scheaffer and McClave (1990, section 7.3). To explain the method used, let us consider the sample of trends for the 10-yr duration shown in Fig. 4. This distribution is based on 1260 values, corresponding to 60 independent simulations and 21 blocks falling within 2006–35 (the first 10-yr block being 2006–15), and it has a mean *m* and a variance *s*^{2}. The sample is assumed to be drawn from a population with a Gaussian distribution of mean μ and variance σ^{2}, so that *P*_{cool} of the population is given by

$$P_{\text{cool}} = \int_{-\infty}^{0} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx. \tag{A1}$$

The parameters μ and σ must be estimated from *m* and *s*, and confidence intervals must be set, at the 95% level in this case. The CI for σ^{2} is given by

$$\frac{(n-1)s^2}{\chi^2_{0.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{0.975}}, \tag{A2}$$

where *n* − 1 stands for the number of degrees of freedom and χ^{2}_{α} denotes the critical value of the chi-square distribution with upper-tail probability α. The CI for μ is based on the *t* distribution, appropriate for the mean of a Gaussian distribution. As *n* > 30, the *t* distribution approaches the Gaussian distribution, and the critical values ±1.96 are used:

$$m - 1.96\frac{s}{\sqrt{n}} \le \mu \le m + 1.96\frac{s}{\sqrt{n}}, \tag{A3}$$

where *s* is conservatively replaced by the upper bound of the CI for σ.

Once the CIs for μ and σ are determined, the four combinations of the CI bounds are successively introduced in Eq. (A1), and the two extremal values among these four are used as the CI for *P*_{cool}. Consistency is ensured by checking that *P*_{cool}, calculated as the proportion of negative values in the sample, falls within the corresponding CI. For the case referred to above, the sample has *m* = 0.41°C and *s* = 0.84°C, which gives a CI for *P*_{cool} of [17.2%, 43.9%], whereas the proportion of negative values in the sample is 30.4%.
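The procedure can be sketched in a few lines of standard-library Python. Two choices below are our assumptions, not the paper's: the effective sample size is set to 60 (one value per independent simulation), motivated by the summary's remark that the number of effective degrees of freedom is subjective and by the fact that the 1260 trend values are far from independent; and the chi-square quantile is computed with the Wilson–Hilferty approximation. With these assumptions, the sketch comes close to the interval quoted above.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (accurate
    for large df); p is restricted to the two tails of a 95% CI."""
    z = {0.025: -1.959964, 0.975: 1.959964}[p]
    return df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3

def pcool_ci(m, s, n):
    """95% CI for P_cool under the Gaussian-population model: combine the
    CI bounds for mu and sigma and keep the two extreme values."""
    # chi-square-based CI for sigma^2
    var_lo = (n - 1) * s**2 / chi2_quantile(0.975, n - 1)
    var_hi = (n - 1) * s**2 / chi2_quantile(0.025, n - 1)
    sig_lo, sig_hi = math.sqrt(var_lo), math.sqrt(var_hi)
    # CI for mu, conservatively using the upper bound of the CI for sigma
    half_width = 1.96 * sig_hi / math.sqrt(n)
    bounds = [norm_cdf(-mu / sig)
              for mu in (m - half_width, m + half_width)
              for sig in (sig_lo, sig_hi)]
    return min(bounds), max(bounds)

# Sample from the text: m = 0.41, s = 0.84, with an assumed effective
# sample size of 60 (one value per independent simulation)
lo, hi = pcool_ci(0.41, 0.84, 60)
```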

## REFERENCES

Ahmed, K. F., G. Wang, J. Silander, A. M. Wilson, J. M. Allen, R. Horton, and R. Anyah, 2013: Statistical downscaling and bias correction of climate model outputs for climate change impact assessment in the U.S. northeast. *Global Planet. Change*, **100**, 320–332, doi:10.1016/j.gloplacha.2012.11.003.

Bhend, J., and P. Whetton, 2013: Consistency of simulated and observed regional changes in temperature, sea level pressure and precipitation. *Climatic Change*, **118**, 799–810, doi:10.1007/s10584-012-0691-2.

Collins, M., 2007: Ensembles and probabilities: A new era in the prediction of climate change. *Philos. Trans. Roy. Soc. London*, **A365**, 1957–1970, doi:10.1098/rsta.2007.2068.

Cowtan, K., and R. G. Way, 2013: Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends. *Quart. J. Roy. Meteor. Soc.*, **140**, 1935–1944, doi:10.1002/qj.2297.

Curry, J., 2011: Reasoning about climate uncertainty. *Climatic Change*, **108**, 723–732, doi:10.1007/s10584-011-0180-z.

de Elía, R., S. Biner, and A. Frigon, 2013: Interannual variability and expected regional climate change over North America. *Climate Dyn.*, **41**, 1245–1267, doi:10.1007/s00382-013-1717-9.

Deser, C., R. Knutti, S. Solomon, and A. S. Phillips, 2012: Communication of the role of natural variability in future North American climate. *Nat. Climate Change*, **2**, 775–779, doi:10.1038/NCLIMATE1562.

Deser, C., A. S. Phillips, M. A. Alexander, and B. V. Smoliak, 2014: Projecting North American climate over the next 50 years: Uncertainty due to internal variability. *J. Climate*, **27**, 2271–2296, doi:10.1175/JCLI-D-13-00451.1.

Easterling, D. R., and M. F. Wehner, 2009: Is the climate warming or cooling? *Geophys. Res. Lett.*, **36**, L08706, doi:10.1029/2009GL037810.

Ehret, U., E. Zehe, V. Wulfmeyer, K. Warrach-Sagi, and J. Liebert, 2012: HESS opinions "Should we apply bias correction to global and regional climate model data?" *Hydrol. Earth Syst. Sci.*, **16**, 3391–3404, doi:10.5194/hess-16-3391-2012.

England, M. H., and Coauthors, 2014: Recent intensification of wind-driven circulation in the Pacific and the ongoing warming hiatus. *Nat. Climate Change*, **4**, 222–227, doi:10.1038/nclimate2106.

Estrada, F., P. Perron, and B. Martínez-López, 2013: Statistically derived contributions of diverse human influences to twentieth-century temperature changes. *Nat. Geosci.*, **6**, 1050–1055, doi:10.1038/ngeo1999.

Foster, G., and S. Rahmstorf, 2011: Global temperature evolution 1979–2010. *Environ. Res. Lett.*, **6**, 044022, doi:10.1088/1748-9326/6/4/044022.

Gluhovsky, A., and E. Agee, 2007: On the analysis of atmospheric and climatic time series. *J. Appl. Meteor. Climatol.*, **46**, 1125–1129, doi:10.1175/JAM2512.1.

Haerter, J. O., S. Hagemann, C. Moseley, and C. Piani, 2011: Climate model bias correction and the role of timescales. *Hydrol. Earth Syst. Sci.*, **15**, 1065–1079, doi:10.5194/hess-15-1065-2011.

Hall, J., C. Twyman, and A. Kay, 2005: Influence diagrams for representing uncertainty in climate-related propositions. *Climatic Change*, **69**, 343–365, doi:10.1007/s10584-005-2527-9.

Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. *Bull. Amer. Meteor. Soc.*, **90**, 1095–1107, doi:10.1175/2009BAMS2607.1.

Hopkinson, R. F., D. W. McKenney, E. J. Milewska, M. F. Hutchinson, P. Papadopol, and L. A. Vincent, 2011: Impact of aligning climatological day on gridding daily maximum–minimum temperature and precipitation over Canada. *J. Appl. Meteor. Climatol.*, **50**, 1654–1665, doi:10.1175/2011JAMC2684.1.

Huard, D., D. Chaumont, T. Logan, M.-F. Sottile, R. D. Brown, B. Gauvin St-Denis, P. Grenier, and M. Braun, 2014: A decade of climate scenarios: The Ouranos consortium modus operandi. *Bull. Amer. Meteor. Soc.*, **95**, 1213–1225, doi:10.1175/BAMS-D-12-00163.1.

Hutchinson, M. F., D. W. McKenney, K. Lawrence, J. H. Pedlar, R. F. Hopkinson, E. Milewska, and P. Papadopol, 2009: Development and testing of Canada-wide interpolated spatial models of daily minimum–maximum temperature and precipitation for 1961–2003. *J. Appl. Meteor. Climatol.*, **48**, 725–741, doi:10.1175/2008JAMC1979.1.

Hyde, W. T., and T. J. Crowley, 2000: Probability of future climatically significant volcanic eruptions. *J. Climate*, **13**, 1445–1450, doi:10.1175/1520-0442(2000)013<1445:LOFCSV>2.0.CO;2.

Jones, P. D., M. New, D. E. Parker, S. Martin, and I. G. Rigor, 1999: Surface air temperature and its changes over the past 150 years. *Rev. Geophys.*, **37**, 173–199, doi:10.1029/1999RG900002.

Kaufmann, R. K., H. Kauppi, M. L. Mann, and J. H. Stock, 2011: Reconciling anthropogenic climate change with observed temperature 1998–2008. *Proc. Natl. Acad. Sci. USA*, **108**, 11 790–11 793, doi:10.1073/pnas.1102467108.

Keith, D., 2013: *A Case for Climate Engineering.* MIT Press, 194 pp.

Knight, J., and Coauthors, 2009: Do global temperature trends over the last decade falsify climate predictions? [in "State of the Climate in 2008"]. *Bull. Amer. Meteor. Soc.*, **90** (8), S22–S23.

Knutson, T. R., F. Zeng, and A. T. Wittenberg, 2013: Multimodel assessment of regional surface temperature trends: CMIP3 and CMIP5 twentieth-century simulations. *J. Climate*, **26**, 8709–8743, doi:10.1175/JCLI-D-12-00567.1.

Knutti, R., D. Masson, and A. Gettelman, 2013: Climate model genealogy: Generation CMIP5 and how we got there. *Geophys. Res. Lett.*, **40**, 1194–1199, doi:10.1002/grl.50256.

Kopp, G., and J. L. Lean, 2011: A new, lower value of total solar irradiance: Evidence and climate significance. *Geophys. Res. Lett.*, **38**, L01706, doi:10.1029/2010GL045777.

Kosaka, Y., and S.-P. Xie, 2013: Recent global-warming hiatus tied to equatorial Pacific surface cooling. *Nature*, **501**, 403–407, doi:10.1038/nature12534.

Lean, J., 2010: Cycles and trends in solar irradiance and climate. *Wiley Interdiscip. Rev.: Climate Change*, **1**, 111–122, doi:10.1002/wcc.18.

Li, J., C. Sun, and F.-F. Jin, 2013: NAO implicated as a predictor of Northern Hemisphere mean temperature multidecadal variability. *Geophys. Res. Lett.*, **40**, 5497–5502, doi:10.1002/2013GL057877.

Lockwood, M., 2010: Solar change and climate: An update in the light of the current exceptional solar minimum. *Proc. Roy. Soc. London*, **A466**, 303–329, doi:10.1098/rspa.2009.0519.

Maraun, D., 2013: Bias correction, quantile mapping, and downscaling: Revisiting the inflation issue. *J. Climate*, **26**, 2137–2143, doi:10.1175/JCLI-D-12-00821.1.

Masson, D., and R. Knutti, 2011: Climate model genealogy. *Geophys. Res. Lett.*, **38**, L08703, doi:10.1029/2011GL046864.

Meehl, G. A., J. M. Arblaster, J. T. Fasullo, A. Hu, and K. E. Trenberth, 2011: Model-based evidence of deep-ocean heat uptake during surface-temperature hiatus periods. *Nat. Climate Change*, **1**, 360–364, doi:10.1038/nclimate1229.

Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones, 2012: Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set. *J. Geophys. Res.*, **117**, D08101, doi:10.1029/2011JD017187.

Özdoğan, M., A. Robock, and C. J. Kucharik, 2013: Impacts of a nuclear war in South Asia on soybean and maize production in the Midwest United States. *Climatic Change*, **116**, 373–387, doi:10.1007/s10584-012-0518-1.

Piani, C., J. O. Haerter, and E. Coppola, 2010: Statistical bias correction for daily precipitation in regional climate models over Europe. *Theor. Appl. Climatol.*, **99**, 187–192, doi:10.1007/s00704-009-0134-9.

Räisänen, J., 2007: How reliable are climate models? *Tellus*, **59A**, 2–29, doi:10.1111/j.1600-0870.2006.00211.x.

Räisänen, J., and T. N. Palmer, 2001: A probability and decision-model analysis of a multimodel ensemble of climate change simulations. *J. Climate*, **14**, 3212–3226, doi:10.1175/1520-0442(2001)014<3212:APADMA>2.0.CO;2.

Reifen, C., and R. Toumi, 2009: Climate projections: Past performance no guarantee of future skill? *Geophys. Res. Lett.*, **36**, L13704, doi:10.1029/2009GL038082.

Risbey, J. S., S. Lewandowsky, C. Langlais, D. P. Monselesan, T. J. O'Kane, and N. Oreskes, 2014: Well-estimated global surface warming in climate projections selected for ENSO phase. *Nat. Climate Change*, **4**, 835–840, doi:10.1038/nclimate2310.

Sakaguchi, K., X. Zeng, and M. A. Brunke, 2012: The hindcast skill of the CMIP ensembles for the surface air temperature trend. *J. Geophys. Res.*, **117**, D16113, doi:10.1029/2012JD017765.

Santer, B. D., and Coauthors, 2011: Separating signal and noise in atmospheric temperature changes: The importance of timescale. *J. Geophys. Res.*, **116**, D22105, doi:10.1029/2011JD016263.

Scheaffer, R. L., and J. T. McClave, 1990: *Probability and Statistics for Engineers.* 3rd ed. Duxbury Press, 696 pp.

Scherrer, S. C., C. Appenzeller, M. A. Liniger, and C. Schär, 2005: European temperature distribution changes in observations and climate change scenarios. *Geophys. Res. Lett.*, **32**, L19705, doi:10.1029/2005GL024108.

Schneider, S. H., 2002: Can we estimate the likelihood of climatic scenarios at 2100? *Climatic Change*, **52**, 441–451, doi:10.1023/A:1014276210717.

Sheffield, J., and Coauthors, 2013: North American climate in CMIP5 experiments. Part I: Evaluation of historical simulations of continental and regional climatology. *J. Climate*, **26**, 9209–9245, doi:10.1175/JCLI-D-12-00592.1.

Solomon, S., K. H. Rosenlof, R. W. Portmann, J. S. Daniel, S. M. Davis, T. J. Sanford, and G.-K. Plattner, 2010: Contributions of stratospheric water vapor to decadal changes in the rate of global warming. *Science*, **327**, 1219–1223, doi:10.1126/science.1182488.

Stephenson, D. B., M. Collins, J. C. Rougier, and R. E. Chandler, 2012: Statistical problems in the probabilistic prediction of climate change. *Environmetrics*, **23**, 364–372, doi:10.1002/env.2153.

Stocker, T. F., and Coauthors, 2013: *Climate Change 2013: The Physical Science Basis.* Cambridge University Press, 1535 pp. [Available online at http://www.climatechange2013.org/images/report/WG1AR5_ALL_FINAL.pdf.]

Stott, P., P. Good, G. Jones, N. Gillett, and E. Hawkins, 2013: The upper end of climate model temperature projections is inconsistent with past warming. *Environ. Res. Lett.*, **8**, 014024, doi:10.1088/1748-9326/8/1/014024.

Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. *Bull. Amer. Meteor. Soc.*, **93**, 485–498, doi:10.1175/BAMS-D-11-00094.1.

Tebaldi, C., and R. Knutti, 2007: The use of the multi-model ensemble in probabilistic climate projections. *Philos. Trans. Roy. Soc. London*, **A365**, 2053–2075, doi:10.1098/rsta.2007.2076.

Themeßl, M. J., A. Gobiet, and G. Heinrich, 2012: Empirical-statistical downscaling and error correction of regional climate models and its impact on the climate change signal. *Climatic Change*, **112**, 449–468, doi:10.1007/s10584-011-0224-4.

van Oldenborgh, G. J., F. J. Doblas-Reyes, S. S. Drijfhout, and E. Hawkins, 2013: Reliability of regional climate model trends. *Environ. Res. Lett.*, **8**, 014055, doi:10.1088/1748-9326/8/1/014055.

van Vuuren, D. P., and Coauthors, 2011: The representative concentration pathways: An overview. *Climatic Change*, **109**, 5–31, doi:10.1007/s10584-011-0148-z.

von Storch, H., and F. Zwiers, 2013: Testing ensembles of climate change scenarios for "statistical significance." *Climatic Change*, **117**, 1–9, doi:10.1007/s10584-012-0551-0.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences.* 2nd ed. International Geophysics Series, Vol. 59, Academic Press, 627 pp.

Yokohata, T., and Coauthors, 2013: Reliability and importance of structural diversity of climate model ensembles. *Climate Dyn.*, **41**, 2745–2763, doi:10.1007/s00382-013-1733-9.

Zeng, X., and A. Wang, 2012: What is monthly mean land surface air temperature? *Eos, Trans. Amer. Geophys. Union*, **93**, 156, doi:10.1029/2012EO150006.