## Abstract

A Bayesian statistical model is proposed that combines information from a multimodel ensemble of atmosphere–ocean general circulation models (AOGCMs) and observations to determine probability distributions of future temperature change on a regional scale. The posterior distributions derived from the statistical assumptions incorporate the criteria of bias and convergence in the relative weights implicitly assigned to the ensemble members. This approach can be considered an extension and elaboration of the reliability ensemble averaging method. For illustration, the authors consider the output of mean surface temperature from nine AOGCMs, run under the A2 emission scenario from the Special Report on Emissions Scenarios (SRES), for boreal winter and summer, aggregated over 22 land regions and into two 30-yr averages representative of current and future climate conditions. The shapes of the final probability density functions of temperature change vary widely, from unimodal curves for regions where model results agree (or outlying projections are discounted) to multimodal curves where models that cannot be discounted on the basis of bias give diverging projections. Besides the basic statistical model, the authors consider including correlation between present and future temperature responses, and test alternative forms of probability distributions for the model error terms. It is suggested that a probabilistic approach, particularly in the form of a Bayesian model, is a useful platform from which to synthesize the information from an ensemble of simulations. The probability distributions of temperature change reveal features such as multimodality and long tails that could not otherwise be easily discerned. Furthermore, the Bayesian model can serve as an interdisciplinary tool through which climate modelers, climatologists, and statisticians can work more closely.
For example, climate modelers, through their expert judgment, could contribute to the formulations of prior distributions in the statistical model.

## 1. Introduction

Numerical experiments based on atmosphere–ocean general circulation models (AOGCMs) are one of the primary tools used in deriving projections for future climate change. However, the strengths and weaknesses that individual AOGCMs display have led authors of model evaluation studies to state that “no single model can be considered ‘best’ and it is important to utilize results from a range of coupled models” (McAvaney et al. 2001). In this paper, we propose a probabilistic approach to the synthesis of climate projections from different AOGCMs, in order to produce probabilistic forecasts of climate change.

Determining probabilities of future global temperature change has flourished as a research topic in recent years (Schneider 2001; Wigley and Raper 2001; Forest et al. 2002; Allen et al. 2000). This work has been based for the most part on the analysis of energy balance or reduced climate system models that facilitate many different model integrations and hence allow the estimation of a distribution of climate projections. However, the focus on low-dimensional models prevents a straightforward extension of this work to regional climate change analyses, which are the indispensable input for impacts research and decision making. In this study we draw on more conventional AOGCM experiments in order to address specifically regional climate change. We recognize, though, that producing probabilistic forecasts at the level of resolution of the typical AOGCM is a feat of enormous complexity. The high dimensionality of the datasets, the scarcity of observations in many regions of the globe, and the limited length of the observational records have required ad hoc solutions in detection and attribution studies, and it is hard to structure a full statistical approach able to encompass these methods and add the further dimension of the superensemble dataset. At present, we offer a first step toward a formal statistical treatment of the problem, by examining regionally averaged temperature signals. This way, we greatly simplify the dimensionality of the problem, while still addressing the need for regionally differentiated analyses.

### a. Model bias and model convergence

The reliability ensemble average (REA) method of Giorgi and Mearns (2002) recently quantified the two criteria of *bias* and *convergence* for multimodel evaluation, and produced estimates of regional climate change and model reliabilities through a weighted average of the individual AOGCM results. The REA weights contain a measure of model bias with respect to current climate and a measure of model convergence, defined as the deviation of the individual projection of change with respect to the central tendency of the ensemble. Thus, models with small bias and whose projections agree with the consensus receive larger weights. Models with lesser skill in reproducing the observed conditions and appearing as outliers with respect to the majority of the ensemble members receive less weight. In a subsequent note, Nychka and Tebaldi (2003) show that the REA method is in fact a conventional statistical estimate for the center of a distribution that departs from a Gaussian shape by having heavier tails. Thus, although the REA estimates were proposed by Giorgi and Mearns as a reasonable quantification of heuristic criteria, there exists a formal statistical model that can justify them as an optimal procedure. The research in this paper is motivated by this statistical insight. Here we start from the same data used by Giorgi and Mearns (i.e., 30-yr regional climate averages representative of current and future conditions) and propose statistical models that recover the REA framework but are flexible in the definition of bias and convergence and are easily generalizable to richer data structures. The main results of our analysis consist of probability distributions of temperature change at regional scales that reflect a relative weighting of the different AOGCMs according to the two criteria.

### b. A Bayesian approach to climate change uncertainty

There is interest in the geosciences in moving from single-value predictions to probabilistic forecasts, in so far as presenting a probability distribution of future climate is a more flexible quantification for drawing inferences and facilitating decision making (Reilly et al. 2001; Webster 2003; Dessai and Hulme 2003). Bayesian methods are not the only option when the goal is a probabilistic representation of uncertainty, but are a natural way to do so in the context of climate change projections.

The choice of prior distributions is crucial when adopting a Bayesian approach, representing the step in the analysis most open to subjective evaluation. Within the climate community, different viewpoints have been expressed regarding the use of expert judgment in quantifying uncertainty. For example, Wigley and Raper (2001) explicitly chose prior distributions to be consistent with their own expert judgment of the uncertainty of key parameters such as climate sensitivity, and Reilly et al. (2001) argued that such assessments should be part of the Intergovernmental Panel on Climate Change (IPCC) process for quantifying uncertainty in climate projections. Opposing this view, Allen et al. (2001) argued that “no method of assigning probabilities to a 100-year climate forecast is sufficiently widely accepted and documented in the refereed literature to pass the extensive IPCC review process.” (They suggested that there might be better prospects of success with a 50-yr forecast, since over this time frame there is better agreement among models with respect to both key physical parameters and the sensitivity to different emissions scenarios.) Following this philosophy, Forest et al. (2002) stated that “an objective means of quantifying uncertainty . . . is clearly desirable” and argued that this was achievable by choosing parameters “that produce simulations consistent with 20th-century climate change.” However, their analysis did not use multimodel ensembles. We view the present approach as an extension of the philosophy reflected in the last two quotations: we adopt uninformative prior distributions but use both model-generated and observational data to calculate meaningful posterior distributions. Another point in favor of uninformative prior distributions is that they typically lead to parameter estimates similar to those of non-Bayesian approaches such as maximum likelihood.
However, Bayesian methods are more flexible when combining different sources of uncertainty, such as those derived from present-day and future climate model runs, and we regard this as a practical justification for adopting a Bayesian approach.

### c. Outline

Section 2 contains the description of the basic statistical model and its extensions. Prior distributions for the parameters of interest, and distributional assumptions for the data, conditional on these parameters, are described. We then present analytical approximations to the posterior distribution in order to gain insight into the nature of the statistical model and its results. In section 3 the model is applied to the same suite of AOGCM experiments as in Giorgi and Mearns (2002), and some findings from the posterior distributions are presented. In section 4 we conclude with a discussion of our model assumptions and what we consider promising directions for extending this work. The Markov chain Monte Carlo (MCMC) method used to estimate the posterior distributions is described in the appendix.

## 2. Statistical models for AOGCM projections

Adopting the Bayesian viewpoint, the uncertain quantities of interest become the parameters of the statistical model and are treated as random variables. A *prior probability distribution* for them is specified independently of the data at hand. The *likelihood* component of the statistical model specifies the conditional distribution of the data, given the model parameters. Through Bayes's theorem prior and likelihood are combined into the *posterior distribution* of the parameters, given the data. Formally, let Θ be the vector of model parameters, and *p*(Θ) their prior distribution. The data *D*, under the assumptions formulated in the statistical model, has likelihood *p*(*D*|Θ). Bayes's theorem states that

*p*(Θ|*D*) = *p*(*D*|Θ)*p*(Θ)/*p*(*D*) ∝ *p*(*D*|Θ)*p*(Θ),

where *p*(Θ|*D*) is the posterior distribution of the parameters and forms the basis of any formal statistical inference about them.

When the complexity of *p*(Θ|*D*) precludes a closed-form solution, which is the case in our application, an empirical estimate of the posterior distribution can be obtained through MCMC simulation. MCMC techniques are efficient ways of simulating samples from the posterior distribution, bypassing the need to compute it analytically; inference can then be drawn through smoothed histograms and numerical summaries based on the MCMC samples.
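As a concrete illustration of drawing inference from MCMC samples, the sketch below runs a random-walk Metropolis sampler (one of the simplest MCMC algorithms, not the Gibbs sampler used later in the paper) against a toy log-posterior; the target, step size, and iteration counts are all illustrative choices rather than anything from the analysis itself:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(mu):
    # Toy log-posterior (up to an additive constant): a unit-variance
    # Gaussian centered at 2, standing in for an intractable p(theta | D)
    return -0.5 * (mu - 2.0) ** 2

# Random-walk Metropolis: propose a perturbed value, accept it with
# probability min(1, posterior ratio)
n_iter, step = 50_000, 1.0
samples = np.empty(n_iter)
mu = 0.0
for t in range(n_iter):
    prop = mu + step * rng.normal()          # symmetric proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop                            # accept; otherwise keep mu
    samples[t] = mu

# Inference through sample statistics, as described in the text
kept = samples[10_000:]                      # discard burn-in
print(kept.mean(), kept.std())               # close to the true values 2 and 1
```

The empirical mean and standard deviation of the retained samples approximate the analytically known moments of the target, which is exactly the sense in which MCMC output substitutes for a closed-form posterior.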

### a. The model for a single region

In this section we present the analytical form of our basic statistical model. We list first the distributional assumptions for the data (likelihood), then the priors for all the model parameters. We then present conditional approximations to the posterior that will help to interpret our results. Throughout, let *X_{i}* and *Y_{i}* denote the temperature simulated by AOGCM *i*, seasonally and regionally averaged and aggregated into two 30-yr means representative of current and future climate, respectively.

#### 1) Likelihoods

We assume Gaussian distributions for *X_{i}* and *Y_{i}*:

*X_{i}* ∼ *N*(*μ*, *λ_{i}*^{−1}),   (1)

*Y_{i}* ∼ *N*[*ν*, (*θλ_{i}*)^{−1}],   (2)

where the notation *N*(*μ*, *λ*^{−1}) indicates a Gaussian distribution with mean *μ* and variance 1/*λ*. Here *μ* and *ν* represent the true values of present and future temperature in a specific region and season. A key parameter of interest will be Δ*T* ≡ *ν* − *μ*, representing the expected temperature change. The parameter *λ_{i}*, reciprocal of the variance, is referred to as the *precision* of the distribution of *X_{i}*. To allow for the possibility that *Y_{i}* has different precision from *X_{i}*, we parameterize its distribution by the product *θλ_{i}*, where *θ* is an additional parameter, common to all AOGCMs. A more general model might allow for *θ* to be different in different models, but that is not possible in the present setup given the limited number of data points.

The assumptions underlying (1) and (2) are that the AOGCM responses have a symmetric distribution, whose center is the “true value” of temperature, but with an individual variability, to be regarded as a measure of how well each AOGCM approximates the climate response to the given set of natural and anthropogenic forcings. The assumption of a symmetric distribution around the true value of temperature for the suite of multimodel responses has been implicitly supported by the Coupled Model Intercomparison Project (CMIP) studies (Meehl et al. 2000) where better validation properties have been demonstrated for the mean of a superensemble rather than the individual members. The additional assumption in our model that the single AOGCM's realizations are centered around the true value could be easily modified in the presence of additional data. For example, if single-model ensembles were available, then an AOGCM-specific random effect could be incorporated.

We model the likelihood of the observations of current climate as

*X*_{0} ∼ *N*(*μ*, *λ*_{0}^{−1}).   (3)

Here *μ* is the same as in (1), but *λ*_{0} is of a different nature from *λ*_{1}, . . . , *λ*_{9}. While the latter are measures of model-specific precision, and depend on the numerical approximations, parameterizations, and grid resolutions of each AOGCM, *λ*_{0} is a function of the natural variability specific to the season, region, and time average applied to the observations. In our model we fix the value of the parameter *λ*_{0}, using estimates of regional natural variability from Giorgi and Mearns (2002). The parameter *λ*_{0} could be treated as a random variable as well if our data contained a long record of observations that we could use for its estimation.
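The likelihoods (1)–(3) can be sketched as a small forward simulation; every numeric value below (true temperatures, precisions, the value of *λ*_{0}) is a hypothetical stand-in rather than the paper's data, and the only assumption carried over from the text is that variance is the reciprocal of precision:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" climate values and precisions (illustrative only)
mu, nu, theta = 14.0, 17.5, 1.2          # true present/future regional means
lam = rng.gamma(2.0, 2.0, size=9)        # model-specific precisions lambda_i
lam0 = 25.0                              # observational precision, held fixed

# One draw from each likelihood: (1) for X_i, (2) for Y_i, (3) for X_0;
# standard deviation is 1/sqrt(precision)
X = rng.normal(mu, 1.0 / np.sqrt(lam))          # nine present-day responses
Y = rng.normal(nu, 1.0 / np.sqrt(theta * lam))  # nine future responses
X0 = rng.normal(mu, 1.0 / np.sqrt(lam0))        # observed present-day mean
```

Models with small *λ_{i}* scatter widely around the true values, which is how the model encodes differing AOGCM reliability.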

#### 2) Prior distributions

The model described by (1)–(3) is formulated as a function of the parameters *μ*, *ν*, *θ*, *λ*_{1}, . . . , *λ*_{9}. Following the arguments presented in section 1b, we choose uninformative priors as follows.

- Precision parameters *λ_{i}*, *i* = 1, . . . , 9, have Gamma prior densities [indicated by Ga(*a*, *b*)] of the form *p*(*λ_{i}*) ∝ *λ_{i}*^{*a*−1}*e*^{−*bλ_{i}*}, with *a*, *b* known and chosen so that the distribution will have a large variance over the positive real line. Similarly, *θ* ∼ Ga(*c*, *d*), with *c*, *d* known. These are standard prior choices for the precision parameters of Gaussian distributions. In particular they are *conjugate*, in the sense of having the same general form as the likelihood considered as a function of *λ_{i}*, thus ensuring the computational tractability of the model. We choose *a* = *b* = *c* = *d* = 0.001, which translates into Gamma distributions with mean 1 and variance 1000. By being extremely diffuse, the distributions have the noninformative quality that we require in our analysis.
- True climate means *μ* and *ν*, for present and future temperature respectively, have uniform prior densities on the real line. Even if these priors are improper (i.e., do not integrate to 1), the form of the likelihood model ensures that the posterior is a proper density function. Alternative analyses in which the prior distribution is restricted to a finite, but sufficiently large, interval do not in practice produce different results.
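A quick check of the diffuseness claim, using the stated hyperparameter values and the standard Ga(*a*, *b*) moment formulas (mean *a*/*b*, variance *a*/*b*²); note that numpy parameterizes the Gamma by shape and scale = 1/rate:

```python
import numpy as np

a = b = 0.001
# A Ga(a, b) density has mean a/b and variance a/b**2
print(a / b, a / b ** 2)                 # mean 1, variance 1000: very diffuse
# Draws illustrate the skewness: most are near 0, rare ones are very large
draws = np.random.default_rng(0).gamma(a, 1.0 / b, size=5)
```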

#### 3) Posterior distributions

We apply Bayes's theorem to the likelihood and priors specified above. Up to a normalizing constant, the resulting joint posterior density for the parameters *μ*, *ν*, *θ*, *λ*_{1}, . . . , *λ*_{9} is

*p*(*μ*, *ν*, *θ*, *λ*_{1}, . . . , *λ*_{9}|*D*) ∝ *θ*^{*c*+9/2−1}*e*^{−*dθ*} exp[−*λ*_{0}(*X*_{0} − *μ*)²/2] ∏_{*i*=1}^{9} *λ_{i}*^{*a*} exp{−*λ_{i}*[*b* + (*X_{i}* − *μ*)²/2 + *θ*(*Y_{i}* − *ν*)²/2]}.   (4)

The distribution in Eq. (4) is not a member of any known parametric family, and we cannot draw inference from its analytical form. The same is true for the marginal posterior distributions of the individual parameters, which we cannot compute from (4) through closed-form integrals. Therefore, MCMC simulation is used to generate a large number of sample values from (4) for all parameters, and to approximate all the summaries of interest from sample statistics. We implement the MCMC simulation through a Gibbs sampler, and details of the algorithm are given in the appendix.
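The appendix is not reproduced here, but a minimal sketch of such a Gibbs sampler can be assembled from the conjugate full conditionals implied by the Gaussian likelihoods and Gamma priors above: Gaussian updates for *μ* and *ν*, and Gamma updates for the *λ_{i}* and *θ*. The nine (X, Y) pairs, X0, and λ0 below are illustrative stand-ins, not the paper's data, and the update formulas are our own derivation under those stated assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the nine AOGCM responses and the observation
X = np.array([13.2, 14.1, 13.8, 14.5, 13.9, 14.0, 15.1, 13.7, 14.3])  # present
Y = np.array([16.8, 17.9, 17.2, 18.4, 17.5, 17.7, 19.0, 17.1, 18.0])  # future
X0, lam0 = 14.0, 25.0        # observed mean and its fixed precision
a = b = c = d = 0.001        # diffuse Gamma hyperparameters

n_iter, burn = 20_000, 5_000
mu, nu, theta = X0, Y.mean(), 1.0
lam = np.ones(9)
dT = np.empty(n_iter)

for t in range(n_iter):
    # mu | rest: precision-weighted Gaussian (weights lam0, lam_1..lam_9)
    prec_mu = lam0 + lam.sum()
    mu = rng.normal((lam0 * X0 + (lam * X).sum()) / prec_mu,
                    1.0 / np.sqrt(prec_mu))
    # nu | rest: weighted mean of the Y_i; theta cancels in the mean but
    # not in the precision
    nu = rng.normal((lam * Y).sum() / lam.sum(),
                    1.0 / np.sqrt(theta * lam.sum()))
    # lam_i | rest: conjugate Gamma update (numpy uses shape, scale = 1/rate)
    rate = b + 0.5 * ((X - mu) ** 2 + theta * (Y - nu) ** 2)
    lam = rng.gamma(a + 1.0, 1.0 / rate)
    # theta | rest: conjugate Gamma update with shape c + 9/2
    theta = rng.gamma(c + 4.5,
                      1.0 / (d + 0.5 * (lam * (Y - nu) ** 2).sum()))
    dT[t] = nu - mu           # track the quantity of interest, Delta T

posterior_dT = dT[burn:]      # discard burn-in; summarize via sample stats
```

Smoothed histograms of `posterior_dT` play the role of the temperature-change densities discussed in section 3.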

Here we gain some insight as to how the posterior distribution synthesizes the data and the prior assumptions by fixing some groups of parameters and considering the conditional posterior for the others.

For example, the distribution of *μ* fixing all other parameters is a Gaussian distribution with mean

*μ̃* = (*λ*_{0}*X*_{0} + ∑_{*i*=1}^{9} *λ_{i}X_{i}*)/(*λ*_{0} + ∑_{*i*=1}^{9} *λ_{i}*)   (5)

and variance

(*λ*_{0} + ∑_{*i*=1}^{9} *λ_{i}*)^{−1}.   (6)

In a similar fashion, the conditional distribution of *ν* is Gaussian with mean

*ν̃* = ∑_{*i*=1}^{9} *λ_{i}Y_{i}*/∑_{*i*=1}^{9} *λ_{i}*   (7)

and variance

(*θ* ∑_{*i*=1}^{9} *λ_{i}*)^{−1}.   (8)
The forms (5) and (7) are analogous to the REA results, being weighted means of the nine AOGCMs and the observation, with weights *λ*_{1}, . . . , *λ*_{9}, *λ*_{0}, respectively. As in the case of the REA method, these weights will depend on the data, but a fundamental difference is that in our analysis they are random quantities, and thus we account for the uncertainty in their estimation. Such uncertainty will inflate the width of the posterior distributions of *ν*, *μ*, and thus also Δ*T*.
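A small numeric sketch of this precision-weighted averaging, with hypothetical precisions and temperatures (in the full model the precisions are random, so here we are looking at a single conditional slice):

```python
import numpy as np

# Hypothetical fixed precisions; in the model they are random quantities
lam = np.array([1.0, 1.0, 5.0])          # three stand-in models
X = np.array([13.0, 14.0, 15.0])         # their present-day responses
X0, lam0 = 14.2, 25.0                    # observation and its precision

# Conditional posterior mean of mu: a precision-weighted average of the
# model responses and the observation
mu_tilde = (lam0 * X0 + (lam * X).sum()) / (lam0 + lam.sum())
var_mu = 1.0 / (lam0 + lam.sum())        # conditional variance
# The observation dominates here because lam0 is the largest weight
```

With these numbers the weighted mean lands near the observed value 14.2 rather than the unweighted model average, illustrating how larger precisions translate into larger implicit weights.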

An approximation to the mean of the posterior distribution of the *λ_{i}*s, for *i* = 1, . . . , 9, is

*λ̃_{i}* ≈ (*a* + 1)/{*b* + ½[(*X_{i}* − *μ̃*)² + *θ*(*Y_{i}* − *ν̃*)²]}.   (9)

The form of (9) shows that the individual weight *λ_{i}* is large provided both |*X_{i}* − *μ̃*| and |*Y_{i}* − *ν̃*| are small. These two quantities correspond to the *bias* and *convergence* criteria, respectively. Here |*Y_{i}* − *ν̃*| measures the distance of the *i*th model future response from the overall average response, and so has characteristics similar to the convergence measure in Giorgi and Mearns (2002). The important difference from REA for this model is that the distance is based on the future projection (*Y_{i}*) rather than the temperature change (*Y_{i}* − *X_{i}*). As for the bias term, notice that in the limit, if we let *λ*_{0} → ∞ (we model the observation *X*_{0} as an extremely precise estimate of the true temperature *μ*), then *μ̃* → *X*_{0}, and the bias term becomes in the limit |*X_{i}* − *X*_{0}|, the same definition of bias as in the REA analysis. The form of (9) also reveals how, by choosing *a* = *b* = 0.001, we ensure that the contribution of the prior assumption to (9) is negligible.
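The behavior of this approximation to the posterior weights can be illustrated with three hypothetical models, the third having both a large bias and an outlying projection (all numbers below are invented for illustration):

```python
import numpy as np

a = b = 0.001
theta = 1.0                              # assume no precision inflation
mu_t, nu_t = 14.0, 17.6                  # stand-ins for the estimates
X = np.array([13.8, 14.1, 16.5])         # third model: large bias...
Y = np.array([17.5, 17.8, 21.0])         # ...and an outlying projection

# Approximate posterior mean of lambda_i: large only when BOTH the bias
# term (X_i - mu) and the convergence term (Y_i - nu) are small
lam = (a + 1.0) / (b + 0.5 * ((X - mu_t) ** 2 + theta * (Y - nu_t) ** 2))
weights = lam / lam.sum()                # relative weights of the models
# The biased, outlying third model receives a negligible weight
```

The two well-behaved models split essentially all of the weight between them, while the outlier is discounted, which is the probabilistic analog of the REA downweighting.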

### b. Introducing correlation between present and future climate responses within each AOGCM and robustifying the model

The model presented in section 2a reproduces the basic features of the REA method. In this section we modify our model in two important directions. First, we relax the assumption of independence between *X_{i}* and *Y_{i}*, for a given climate model. We do so by linking *X_{i}* and *Y_{i}* through a linear regression equation; this is equivalent to assuming that (*X_{i}*, *Y_{i}*) are jointly normal, given the model parameters, with an unknown correlation coefficient that is left free to vary, a priori, between −1 and +1.

Thus, adopting a slightly different notation than in section 2a, we express the statistical assumptions for *X_{i}* and *Y_{i}* as follows. Let

*X_{i}* = *μ* + *η_{i}*,   (10)

*Y_{i}* = *ν* + *β_{x}*(*X_{i}* − *μ*) + *ξ_{i}*,   (11)

where *η_{i}* ∼ *N*(0, *λ_{i}*^{−1}) and *ξ_{i}* ∼ *N*[0, (*θλ_{i}*)^{−1}].

In this model we use the priors listed in section 2a for the parameters *μ*, *ν*, *λ*_{1}, . . . , *λ*_{9}, *θ*, and we choose a uniform prior on the real line for *β _{x}*.

In Eq. (11) *β_{x}* introduces a direct (if positive) or inverse (if negative) relation between *X_{i}* − *μ* and *Y_{i}* − *ν*. By using a common parameter we force the sign of the correlation to be the same for all AOGCMs, a constraint that is required because only two data points (current and future temperature response) per model are available for a given region and season. The value of *β_{x}* also has implications for the correlation between *Y_{i}* − *X_{i}*, the signal of temperature change produced by the *i*th AOGCM, and the quantity *X_{i}* − *μ*, the model bias for current temperature. A value of *β_{x}* equal to 1 translates into conditional independence of these two quantities, while values greater than or less than 1 would imply positive or negative correlation between them. (In the REA analysis, Giorgi and Mearns treat these two quantities as independent, after computing empirical estimates of their correlations and finding that they are for the most part low in absolute value and not statistically significant.) But the most important feature of the model with correlation has to do with the form of the posterior distribution for the *λ_{i}* parameters. By an approximation similar to the one used in section 2a for the basic model, we derive that the posterior mean now has the approximate form

*λ̃_{i}* ≈ (*a* + 1)/(*b* + ½{(*X_{i}* − *μ̃*)² + *θ*[*Y_{i}* − *ν̃* − *β̃_{x}*(*X_{i}* − *μ̃*)]²}).   (12)

If *β_{x}* ≈ 1 the form in (12) uses a convergence term similar to the REA method's, measuring the distance between Δ*T_{i}* ≡ *Y_{i}* − *X_{i}* and the consensus estimate of temperature change *ν̃* − *μ̃*. We will compare the results from the basic model and this modified version, in order to assess how differently the two definitions of convergence reflect on the final posterior distributions of regional temperature changes.
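The algebraic point above can be sketched with hypothetical numbers: when *β_{x}* = 1, the residual entering the modified convergence term equals the REA-style distance between each model's change signal and the consensus change (the three models and consensus estimates below are invented for illustration):

```python
import numpy as np

beta_x = 1.0                             # the special case discussed above
theta, a, b = 1.0, 0.001, 0.001
mu_t, nu_t = 14.0, 17.5                  # stand-in consensus estimates
X = np.array([13.5, 14.0, 16.0])
Y = np.array([17.0, 17.6, 19.5])

# Residual of the regression form vs. the REA-style convergence distance:
# Y_i - nu - beta_x*(X_i - mu)  ==  (Y_i - X_i) - (nu - mu)  when beta_x = 1
resid = Y - nu_t - beta_x * (X - mu_t)
rea_dist = (Y - X) - (nu_t - mu_t)
assert np.allclose(resid, rea_dist)      # identical when beta_x = 1

# Approximate posterior mean of lambda_i under the correlation model
lam = (a + 1.0) / (b + 0.5 * ((X - mu_t) ** 2 + theta * resid ** 2))
```

Note that the third model, despite a large bias, has a change signal close to the consensus (ΔT = 3.5), so under this convergence definition only its bias term penalizes it.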

A second concern is the sensitivity of our model to outlying data points. This is a consequence of hypothesizing normal distributions for *X_{i}* and *Y_{i}*, given the model parameters. Traditionally, robustness is achieved by choosing distributions with heavier tails than the Gaussian, and Student's *t* distributions with few degrees of freedom are often a convenient choice because of their flexibility (the degrees-of-freedom parameter allows the degree of departure from the Gaussian to be varied) and computational tractability. Thus, we model *η_{i}* and *ξ_{i}* in Eqs. (10) and (11) above as independently distributed according to a Student's *t* distribution with *ϕ* degrees of freedom. We perform separate analyses for *ϕ* varying in the set {1, 2, 4, 8, 16, 32, 64}, the lower end of the range being associated with heavier-tailed distributions, the higher end being practically equivalent to a Gaussian model. We are not a priori labeling some of the AOGCMs as outliers; rather, we examine the difference in the posterior distributions that ensues as a function of the statistical model assumed for the error terms. Varying the degrees of freedom can accommodate extreme projections to a larger or smaller degree and is equivalent to making the convergence criterion less or more stringent, respectively. This achieves at least qualitatively the same effect as the exponents in the REA weights, by which the relative importance of the bias and convergence terms could be modified.
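The effect of the degrees of freedom on how surprising an outlier is can be sketched by comparing log-densities of a standard (unit-scale) Student's *t*, a simplification of the error model above:

```python
from math import lgamma, log, log1p, pi

def t_logpdf(z, phi):
    # Log-density of a standard Student's t with phi degrees of freedom
    return (lgamma((phi + 1) / 2) - lgamma(phi / 2)
            - 0.5 * log(phi * pi)
            - (phi + 1) / 2 * log1p(z * z / phi))

# An outlying standardized residual (z = 4) is far less surprising under
# few degrees of freedom, so heavy-tailed models penalize it less and the
# implicit convergence criterion becomes less stringent
for phi in (1, 4, 64):
    print(phi, round(t_logpdf(4.0, phi), 3))
```

At *ϕ* = 1 (Cauchy) the log-density at z = 4 is several units larger than at *ϕ* = 64 (essentially Gaussian), which is why low-*ϕ* analyses let outlying AOGCM projections retain more posterior mass.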

## 3. The analysis: Results from the MCMC simulation

### a. Data

We analyze temperature responses under the A2 SRES emission scenario (Nakicenovic et al. 2000), just a portion of the dataset analyzed by the REA method in Giorgi and Mearns (2002). Output from nine different AOGCMs (see Table 1 and references therein) consists of present and future average surface temperatures, aggregated over the 22 regions shown in Fig. 1, two seasons [December–January–February (DJF) and June–July–August (JJA)], and two 30-yr periods (1961–90 and 2071–2100). Thus, we will separately treat (*X_{i}*, *Y_{i}*), *i* = 1, . . . , 9, for each region and season. Observed temperature means for the 22 regions and the 1961–90 period (*X*_{0}) were also taken from Giorgi and Mearns (2002), together with estimates of regional natural variability, interpretable as the standard deviation of the three-decadal average that *X*_{0} represents. We use these estimates (listed in Table 2) to determine the values of the parameter *λ*_{0}. (The dataset can be downloaded from www.cgd.ucar.edu/~nychka/REA.)

### b. Results for a group of representative regions

We present posterior distributions derived from the basic model (section 2a) and the model introducing correlation between *X_{i}* and *Y_{i}* (section 2b), both models assuming Gaussian distributions for the likelihoods. We choose to focus on a representative group of six regions [Alaska (ALA), east North America (ENA), southern South America (SSA), northern Europe (NEU), East Africa (EAF), northern Australia (NAU)], for both the winter (DJF) and summer (JJA) seasons.

#### 1) Temperature change

Figures 2–4 show posterior distributions of temperature change Δ*T* ≡ *ν* − *μ* in the six regions, for the winter and summer seasons, for the basic model (solid lines) and for the model with correlation (dashed lines). For reference, the nine models' individual responses *Y_{i}* − *X_{i}*, *i* = 1, . . . , 9, are plotted along the *x* axis (as diamonds), together with Giorgi and Mearns's REA estimates of Δ*T̃* plus or minus two standard deviations (as a triangle superimposed on a segment). From the relative position of the diamonds one can qualitatively assess a measure of convergence for each model (defined in terms of the individual AOGCM's temperature change response), and discriminate between models that behave as outliers and models that reinforce each other by predicting similar values of temperature change. The bias measure is not immediately recoverable from this representation, so in Table 3 we list values of the bias for each of the nine AOGCMs in the six regions for both summer and winter. A comparison of the densities in Figs. 2–4 with the bias values listed in the table reveals how models having relatively smaller bias receive relatively larger weight. Models that perform well with respect to both criteria are the ones where the probability density function is concentrated. The REA results are consistent overall with the mean and range of the dashed curves. For most of the regions (especially in winter) the solid and dashed lines are in agreement, with the solid lines showing a slightly wider probability distribution. This is expected because of the introduction in (11) of the covariate *X_{i}*, which has the effect of making the distribution of *Y_{i}* more concentrated about *ν*. As a consequence the distribution of the difference *ν* − *μ* is also tighter. For these region–season combinations, and for a majority of those not shown, the two models, with or without correlation, result in almost identical posterior distributions of temperature change. They represent cases in which the nine AOGCMs maintain the same behavior whether judged in terms of their absolute projections of future climate or in terms of their temperature change signals. Those that can be defined as extreme are so in both respects; those that agree with each other, and form the consensus, do so in both respects.

For a few region–season combinations (we show three representatives: EAF, NEU, and NAU in summer) the posterior densities are dramatically different for the two statistical models. This is the consequence of applying two different measures of convergence, with AOGCM results that are “outliers” with respect to one but not with respect to the other. In these instances, it matters which criterion of convergence we apply, and so the REA method is in agreement only with the dashed curves.

Since the introduction of the parameter *β_{x}* heavily influences the outcome of the analysis for some region–season combinations, it deserves careful consideration. For most of the regions its posterior concentrates to the right of zero, indicating a significant positive correlation between present and future temperature responses of all AOGCMs. Figure 5 shows the range of distributions of *β_{x}* for the six regions and two seasons. It is also interesting to notice that most of the mass of these posterior densities is concentrated around 1, supporting (as explained in section 2b) the REA finding of weak correlation between the temperature *change* signal and bias.

The shapes of the densities in Figs. 2–4 vary significantly. Regular, unimodal curves are estimated in regions where there is obvious agreement among models, or where outlying models are downweighted due to large biases. Multimodal curves characterize regions where AOGCMs give disparate predictions, none of which can be discounted on the basis of model bias. These are also most often the cases when the results are sensitive to whether the basic model or the one with correlation is adopted. The value of presenting results in terms of a posterior distribution is obvious in the case of multimodal curves. For these densities the mean and median would not be good summaries of the distribution. Although we do not necessarily see these multimodal results as having an obvious physical interpretation, they definitely highlight the problematic nature of predictions in specific areas of the globe. For these regions our analysis gives more insight than traditional measures of central tendency and confidence intervals, and considering the sensitivity of the results to the statistical model assumptions is extremely relevant.

#### 2) Precision parameters

Figures 6 and 7 summarize the posterior distributions for the precision parameters *λ_{i}*, in the form of box plots (shown on a logarithmic scale, because of their high degree of skewness). In our analysis the *λ_{i}*s are the analog of the model weights in the REA's final estimates. However, *λ_{i}* is for us a random variable, and the scoring of the nine AOGCMs should be assessed here through the relative position of the nine box plots, rather than by comparing point estimates. Comparing these distributions across models for a single region and season may suggest an ordering of the nine AOGCMs: large *λ_{i}*s indicate that the distribution of the AOGCM response is more tightly concentrated around the true climate response. Thus, posterior distributions that are relatively shifted to the right indicate better performance by the AOGCMs associated with them than distributions shifted to the left. However, the large overlap among these distributions indicates that there is substantial uncertainty in the relative weighting of the models and cautions against placing too much faith in their ranking. Further substantiating this finding, we have calculated the posterior means for all *λ_{i}*s and standardized them to percentages. They are listed in Tables 4 and 5. The tables show clearly that the nine AOGCMs are weighted differently in different regions and seasons. This suggests differential skill in reproducing regional present-day climate and a different degree of consensus among models for different regional signals of temperature change, again pointing to the need of evaluating an AOGCM over a collection of regions and types of climate, before placing too much or too little confidence in it. In this perspective, then, Table 6 presents our summary measure of the nine models' relative weighting, by listing the median rank of each model over the 22 regions, separately for the winter and summer seasons. This ranking is also in the same spirit as the overall reliability index produced by REA. (All the results presented here are from the model with correlation, where the posterior means of the *λ_{i}* parameters have closer resemblance to the REA weights.)

#### 3) Inflation–deflation parameter

The parameter *θ* represents the inflation–deflation factor in the AOGCMs' precision when comparing simulations of present-day to future climate. When adopting the model with correlation, the posterior for *θ* is always concentrated over a range of values greater than 1, as a consequence of the tightening effect of the form (11) described earlier. As for the posterior distributions derived from the basic model, we present them for each of the six regions in Fig. 8, in the form of box plots. For some of the regions the range is concentrated over values less than 1, suggesting a deterioration in the precision of the nine AOGCMs. Figure 8 indicates that each region tells a different story, and so does each season. For ENA, the simulations (i.e., their agreement) deteriorate in the future, and this holds true for both seasons. For NAU, especially in summer, and for SSA in both seasons, the contrary appears to be true. For ALA the behavior differs between winter (where a loss of precision is indicated) and summer. For the two other regions, EAF and NEU, the inference is not as clear-cut. We cannot offer any insight into why this is so, our expectation being that the projections in the future, in the absence of the regression structure (11), would be at least as variable around the central mean as the current temperature simulations are, and likely more so. Considering in detail the properties of the AOGCMs for these regions may help to interpret these statistical results. However, we point out that the assumption of a common parameter for all AOGCMs may be too strict, and these results may change if a richer dataset, with single-model ensembles, allowed us to estimate model-specific *θ* factors.

#### 4) Introducing heavy-tail distributions

We estimated alternative statistical models, with heavier-tail characteristics, by adopting Student's *t* distributions for *X*_{i} and *Y*_{i}, as explained in section 2b. Different values of the degrees of freedom (*ϕ* in the notation of section 2b) were tested, from 1 (corresponding to heavy-tail distributions, more accommodating of outliers) to 64 (corresponding to a distribution that is nearly indistinguishable from a Gaussian). By comparing the posterior densities so derived, we can assess how robust the final results are to varying the assumptions on the tails of the distributions of the *X*_{i}s and *Y*_{i}s. For all regions, the overall range and shape of the temperature change distributions are insensitive to the degrees of freedom, and this holds true when testing either the basic model or the model with correlation. Only for a small number of regions does using one or two degrees of freedom produce posterior distributions of climate change that spread their mass toward the more isolated values among the nine AOGCM responses. Even then, the difference from the basic Gaussian model is hardly detectable in a graph comparing the posterior densities from different model formulations. We conclude that the posterior estimates are not substantially affected by the distributional assumptions on the tails of *X*_{i} and *Y*_{i}.

## 4. Discussion, extensions, and conclusions

### a. Overview of temperature change distributions

Figures 9 and 10 look at differences in the estimates of temperature change and uncertainty among regions, in the form of a series of box plots sorted by their median values. This representation is useful for assessing different magnitudes of warming, and different degrees of uncertainty (variability), across regions and between seasons. Notice that all distributions are limited to the positive range, making the case for global warming. Warming in winter is on average higher than in summer, for all regions, and the high latitudes of the Northern Hemisphere are the regions with a more pronounced winter climate change. These are all by now undisputed results from many different studies of climate change (Cubasch et al. 2001). The variability of the distributions differs widely among regions, supporting the notion that for some regions the signal of climate change is stronger and less uncertain than for others. Regions such as ENA, SSA, eastern Asia (EAS), and the Mediterranean (MED) in winter, and West Africa (WAF) and southern Australia (SAU) in summer, show extremely tight distributions, predicting the number of degrees of warming with relative certainty. On the other hand, regions like northern Asia (NAS) and central Asia (CAS) in winter, EAF and west North America (WNA) in summer, and the Amazon (AMZ), central North America (CNA), and NEU in both seasons show a wide range of uncertainty in the degrees of warming. As we already mentioned, the width and shape of these distributions may be taken as a signal of the degree of agreement among AOGCMs over the temperature projection in the regions. Problematic regions, characterized by multimodal or simply diffuse distributions, may point to areas that merit special attention from the climate modeling community.

### b. Advantages of a Bayesian approach

Recent work (Raisanen and Palmer 2001; Giorgi and Mearns 2003) has addressed the need for probability forecasts, but offered only an assessment of the probability of exceeding thresholds, identified as the (weighted) fraction of the ensemble members by which the thresholds in question were exceeded. Moreover, in the case where each ensemble member is equally weighted, issues of model validation are ignored. A differential weighting of the members on the basis of the REA method is more appropriate but depends on a subjective formulation that is difficult to evaluate. In contrast, we think that the Bayesian approach is not only flexible but facilitates an open debate on the assumptions that generate probabilistic forecasts.
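The distinction can be made concrete with a toy example (all numbers hypothetical): the counting approach yields a weighted fraction of discrete ensemble members, while the Bayesian approach yields the tail mass of a continuous posterior distribution.

```python
import numpy as np

# Hypothetical temperature-change projections (deg C) from a nine-member
# ensemble, with illustrative normalized weights.
dT = np.array([2.1, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3, 3.9, 4.6])
w = np.array([0.05, 0.10, 0.15, 0.15, 0.15, 0.15, 0.10, 0.10, 0.05])

threshold = 3.0
# Counting estimate: the weighted fraction of members exceeding the
# threshold; a step function of the nine discrete responses.
p_counting = w[dT > threshold].sum()

# Bayesian estimate: the tail mass of the posterior distribution of the
# true change, mimicked here by hypothetical MCMC samples.
rng = np.random.default_rng(2)
posterior = rng.normal(3.0, 0.5, size=20_000)  # illustrative posterior
p_bayes = (posterior > threshold).mean()
print(round(float(p_counting), 2), round(float(p_bayes), 2))
```

The counting estimate can only jump in increments set by the member weights, whereas the posterior tail probability varies smoothly with the threshold.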

The results in this paper demonstrate how the quantitative information from a multimodel experiment can be organized in a coherent statistical framework based on a limited number of explicit assumptions. Our Bayesian analysis yields posterior probability distributions of all the uncertain quantities of interest. These posterior distributions for regional temperature change and for a suite of other parameters provide a wealth of information about AOGCM reliability and temporal (present to future) correlations. We deem this distributional representation more useful than a point estimate with error bars, because important features such as multimodality or long tails become evident. Also, the uncertainty quantified by the posterior distributions is an important product of this analysis and it is not easily recovered using non-Bayesian methods. We note here that the width of the posterior distributions is also a reflection of the limited number of data points that we used in estimating the parameters of the statistical model, particularly for the AOGCM-specific parameters *λ*_{1}, . . . , *λ*_{9}. Conversely, it is worth underlining that even if we start from extremely diffuse priors, we are able to estimate informative posterior distributions for all parameters.

### c. Sensitivity analysis of the statistical assumptions

The posterior distributions of regional temperature change in many region–season combinations may differ both in variance and shape depending on the statistical model adopted, that is, when introducing the correlation structure between present and future model projections. These sensitivities suggest the need to enrich the experimental setting, so that the statistical model can be formulated as free as possible from constraints. For example, imposing a common correlation parameter *β*_{x} among models is a strong assumption that could be relaxed if more information were available. Similarly, single-model ensembles would make it possible to estimate the internal variability of each AOGCM and would allow for the separation of intramodel versus intermodel variability. In addition, climatological information could be incorporated into the prior. The range of our posteriors for the present and future temperature means would remain the same in the presence of proper, but still uninformative, priors (i.e., either uniform distributions limited to a physically reasonable range, or Gaussian distributions concentrating most of their mass over such ranges). However, a modeling group may have confidence, on the basis of separate experiments, that its results in a particular region are positively biased, or relatively more accurate than in other regions. This kind of information could be applied to the formulation of a different prior for that AOGCM's precision parameter. In our opinion, analyzing the robustness of results to model assumptions is as valuable as the analysis of the results themselves, highlighting the weak parts of the statistical formulation and pointing to the need for closer communication between modelers, climatologists, and statisticians in order to circumscribe the range of sensible assumptions. In this regard the Bayesian model presented here may be viewed as a device to foster more effective collaborations.

A natural extension to the current region-specific model is a model that introduces AOGCM-specific correlation between regions and thus borrows strength from all the regional temperature signals in order to estimate model variability and biases. As we discussed in section 1, working with higher-resolution AOGCM output, both spatially and temporally, will require a much more complex effort. Richer information on AOGCM performance as linked to specific regional/seasonal climate reproductions will have to be incorporated in the likelihood of the AOGCM's responses and/or the priors on the precision parameters. In addition, other climate responses may be considered, jointly with temperature. Precipitation is of course the obvious candidate, and a bivariate distribution of temperature and precipitation is a natural extension.

### d. About reliability criteria and statistical modeling

The criteria of bias and convergence established by Giorgi and Mearns (2002, 2003) and adopted in this work are commonly discussed in the climate change literature as relevant to evaluating climate change projections. However, one can question whether these two criteria are the most relevant for evaluating the reliability of projections of climate change. The ability of a model to reproduce the current climate is usually regarded as a necessary, but not sufficient condition for considering the model's response to future forcings as reliable (McAvaney et al. 2001). The convergence criterion has been advocated by some authors (Raisanen 1997; Giorgi and Francisco 2000) and can be theoretically derived by assuming that the observed AOGCMs represent a random sample from a superpopulation of AOGCMs. A pitfall in applying model convergence is that models may produce similar responses due to similarities in model structures, not because the models are converging on the “true” response. Furthermore, one may argue that extreme projections could be the result of a model incorporating essential feedback mechanisms that the majority of other models ignore. Yet, agreement among many models has been viewed as strengthening the likelihood of climate change (Cubasch et al. 2001; Giorgi et al. 2001a, b). As we noted in section 2b both the form of the REA weights (with the use of two different exponents for the bias and convergence terms) and the specification of the likelihood in our statistical model allow the final result to depend to a lesser degree on the convergence criterion. Besides changing the likelihood assumptions one could impose a prior distribution for the *θ* parameter that concentrates most of its mass on values less than 1. In this way, one posits a lower level of confidence in the precision of the future simulation than of the present, implicitly accepting a less stringent criterion of convergence for the future trajectories. 
The form of (9) shows that a value of *θ* less than 1 would downweight large deviations in the convergence term (*Y*_{i} − *ν*)^{2}, thus achieving an effect similar to that of the exponent in the REA weight.
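A back-of-the-envelope sketch of this effect (all values illustrative): under a Gaussian likelihood, model *i* contributes a factor proportional to exp(−*θλ*_{i}(*Y*_{i} − *ν*)²/2), so *θ* < 1 flattens the penalty for outlying future responses.

```python
import numpy as np

# Illustrative deviations |Y_i - nu| for four hypothetical models, with
# a common precision lam_i = 1 for simplicity.
lam_i = 1.0
dev = np.array([0.5, 1.0, 2.0, 4.0])

# Relative weight retained by each model under the Gaussian kernel
# exp(-0.5 * theta * lam_i * dev**2): theta < 1 penalizes large
# deviations less severely, mimicking the effect of the REA exponent.
for theta in (1.0, 0.5):
    rel_weight = np.exp(-0.5 * theta * lam_i * dev**2)
    print(theta, rel_weight.round(4))
```

Note how the most divergent model (deviation 4) retains far more relative weight under *θ* = 0.5 than under *θ* = 1, i.e., the convergence criterion is applied less stringently.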

From a more general perspective, however, our goal was to extend the REA method, not to reevaluate its criteria. It is expected that other criteria, and more complex combinations of criteria will evolve over time as this statistical work is refined. Much work is being devoted to more sophisticated and extensive methods of model performance evaluation and intermodel comparison (Hegerl et al. 2000; Meehl et al. 2000) and future analyses may modify or extend the two criteria of bias and convergence to account for them.

In conclusion, we view our results as an illustration of the power of bringing statistical modeling to experiments where quantifying uncertainty is an intrinsic concern. Moreover, we hope this work may foster more deliberate analysis that incorporates scientific knowledge of climate modeling at global and regional scales.

## Acknowledgments

This research was supported through the National Center for Atmospheric Research Initiative on Weather and Climate Impact Assessment Science, which is funded by the National Science Foundation. Additional support was provided through NSF Grants DMS-0084375 and DMS-9815344. We are indebted to two anonymous reviewers and the editor for thoughtful and constructive criticisms that have greatly informed and added value to the paper in its final form. Also, we wish to thank Filippo Giorgi, Kevin Trenberth, Tom Wigley, and Art Dempster for helpful and cogent remarks on intermediate drafts.

### APPENDIX

#### Prior-Posterior Update through MCMC Simulation

The joint posterior distributions derived from the models in section 2 are not members of any known parametric family. However, the distributional forms (Gaussian, uniform, and gamma) chosen for the likelihoods and priors are conjugate, thus allowing for closed-form derivation of all full conditional distributions (the distributions of each parameter, as a function of the remaining parameters assuming fixed deterministic values). We list here such distributions, for the robust model that includes a correlation between *X*_{i} and *Y*_{i} in the form of regression equation (11). The variables *s*_{i}, *t*_{i}, *i* = 1, . . . , 9 are introduced here as an auxiliary randomization device, in order to efficiently simulate from the Student's *t* distributions within the Gibbs sampler. They are not essential parts of the statistical model. Fixing *s*_{i} = *t*_{i} = 1, *β*_{x} = 0 allows the recovery of the full conditionals for *λ*_{1}, . . . , *λ*_{9}, *μ*, *ν*, and *θ* of the basic univariate model as a special case:

Above, we have used the following shorthand notation:

The Gibbs sampler can be easily coded so as to simulate iteratively from this sequence of full conditional distributions.

After a series of random drawings during which the MCMC process forgets the arbitrary set of initial values for the parameters (the burn-in period), the values sampled at each iteration represent a draw from the joint posterior distribution of interest, and any summary statistic can be computed to a degree of approximation that is a direct function of the number of sampled values available and an inverse function of the correlation between successive samples. To minimize the latter, we save only one iteration result in every 50, after running the sampler for a total of 500 000 iterations and discarding the first half as the burn-in period. So many iterations are probably not needed for this particular application, but by performing them we eliminate any possibility of bias resulting from too few MCMC iterations. The convergence of the Markov chain to its stationary distribution (the joint posterior of interest) is verified by standard diagnostic tools (Best et al. 1995). [A self-contained version of the MCMC algorithm, implemented in the free software package R (R Development Core Team 2004), is available at www.cgd.ucar.edu/~nychka/REA.]
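To illustrate the mechanics of the sampler, here is a minimal sketch for a stripped-down version of the basic model: hypothetical data, a flat prior on *μ*, Gamma(*a*, *b*) priors on the *λ*_{i}, and none of the paper's remaining parameters. Burn-in and thinning are applied as described above, but on a much smaller scale:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical ensemble of present-day regional means (deg C); the model
# is X_i ~ N(mu, 1/lambda_i), flat prior on mu, Gamma(a, b) on lambda_i.
X = np.array([12.1, 11.8, 12.5, 13.0, 11.5, 12.2, 12.8, 11.9, 12.4])
a, b = 0.01, 0.01            # diffuse gamma prior
n_iter, burn_in, thin = 20_000, 10_000, 50

lam = np.ones_like(X)        # arbitrary starting values
mu = X.mean()
kept = []
for it in range(n_iter):
    # Full conditional for mu: Gaussian, centered at the
    # precision-weighted mean of the X_i, with precision sum(lambda_i).
    prec = lam.sum()
    mu = rng.normal((lam * X).sum() / prec, 1.0 / np.sqrt(prec))
    # Full conditionals for lambda_i: conjugate gamma updates with
    # shape a + 1/2 and rate b + (X_i - mu)^2 / 2 (NumPy takes scale).
    lam = rng.gamma(a + 0.5, 1.0 / (b + 0.5 * (X - mu) ** 2))
    # Discard the burn-in, then keep one draw in every `thin`.
    if it >= burn_in and (it - burn_in) % thin == 0:
        kept.append(mu)

posterior_mu = np.array(kept)
print(posterior_mu.mean().round(2), posterior_mu.std().round(2))
```

The retained draws approximate the posterior of *μ*; posterior summaries such as means, quantiles, or density estimates follow directly from them, exactly as for the full model's output.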

## Footnotes

*Corresponding author address:* Claudia Tebaldi, National Center for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307-3000. Email: tebaldi@ucar.edu