## Abstract

The authors develop statistical data models to combine ensembles from multiple climate models in a fashion that accounts for uncertainty. This formulation enables treatment of model specific means, biases, and covariance matrices of the ensembles. In addition, the authors model the uncertainty in using computer model results to estimate true states of nature. Based on these models and principles of decision making in the presence of uncertainty, this paper poses the problem of superensemble experimental design in a quantitative fashion. Simple examples of the resulting optimal designs are presented. The authors also provide a Bayesian climate modeling and forecasting analysis. The climate variables of interest are Northern and Southern Hemispheric monthly averaged surface temperatures. A Bayesian hierarchical model for these quantities is constructed, including time-varying parameters that are modeled as random variables with distributions depending in part on atmospheric CO_{2} levels. This allows the authors to do Bayesian forecasting of temperatures under different Special Report on Emissions Scenarios (SRES). These forecasts are based on Bayesian posterior distributions of the unknowns conditional on observational data for 1882–2001 and climate system model output for 2002–97. The latter dataset is a small superensemble from the Parallel Climate Model (PCM) and the Community Climate System Model (CCSM). After summarizing the results, the paper concludes with discussion of potential generalizations of the authors’ strategies.

## 1. Introduction

Quantification and treatment of uncertainty in climate forecasting and assessments of impacts of climate change are critical problems. Discussions of the issues can be found in various Intergovernmental Panel on Climate Change (IPCC) publications (also see Allen et al. 2000; Reilly et al. 2001; Webster 2003). Among the many important challenges is the extraction of information from ensembles of climate model runs from multiple climate models in a fashion that enables users to provide uncertainty estimates [see Krishnamurti et al. (1999) for discussion of the use of such “superensembles”].

Bayesian approaches to climate change analysis are receiving increasingly significant attention (e.g., see Berliner et al. 2000a; Moss and Schneider 2000; Tebaldi et al. 2005). For brevity, we defer to those references and general discussions of Bayesian statistics (Berger 1985; Bernardo and Smith 1994) for discussion and motivation of the Bayesian approach in climate science and decision making.

Primary aspects of the strategies presented here are

1. treatment of climate model output as data (after developing the appropriate machinery, we proceed as if ensembles of model output are observations);

2. experimental design of superensembles;

3. focusing on selected aspects of climate as dictated by the forecasting problem at hand; and

4. modeling the role of influences on the climate system (i.e., atmospheric CO_{2}) indirectly as effects on the parameters of stochastic climate models.

The design and analysis of computer experiments are subjects of substantial interest and research. To bring context to the first two aspects above, we offer a simplified and specialized perspective of the area. Santner et al. (2003) provide in-depth discussion and review. Most approaches are Bayesian or have a Bayesian flavor in that they involve the representation of and operation on uncertainty in probabilistic manners. Let *X* denote the unknowns of interest, in our case, the true state of the future climate variables under study, and let *Y* be the potential climate model output we may generate. Assume that both are to be modeled as random and arising from some joint probability distribution, say [*x*, *y*]. Probability theory dictates that such a distribution admits two representations:

where, for example, [*x*|*y*] is the conditional distribution of *X* given *Y* = *y* and [*y*] is the marginal distribution of *Y*. (Note that the equality [*x*|*y*][*y*] = [*y*|*x*][*x*] is actually a statement of Bayes’ theorem.) Typically, a variety of parameters are introduced in building these distributions and past data are incorporated, but we ignore these aspects for now.
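In bracket notation, the two factorizations and the resulting statement of Bayes' theorem can be written out explicitly (this is a restatement of (1) and the identity noted above, not additional material):

```latex
% The joint distribution admits two factorizations:
[x, y] \;=\; [x \mid y]\,[y] \;=\; [y \mid x]\,[x].
% Equating the two right-hand sides and dividing by the marginal [y]
% gives Bayes' theorem:
[x \mid y] \;=\; \frac{[y \mid x]\,[x]}{[y]} .
```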

The lion’s share of the statistical literature in this area involves the development of [*y*] as a function of selected model inputs (e.g., initial conditions, parameters, forcings, etc.). Based on the results of some model runs with specified inputs, inferences are drawn for model output corresponding to other inputs. Note that this has nothing to do with the values of *X*. It remains to construct a probability model [*x*|*y*]. This step is much like the usual development of prior distributions in Bayesian approaches. Indeed, one can in principle skip the step of producing [*y*] and directly construct a predictive model for *X* using “prior” information, including model output, and scientific judgment.

The suggestion in this article focuses on the second factorization of the joint distribution in (1). That is, we formulate a *data model* [*y*|*x*] and a *prior distribution* [*x*]. Once accomplished, we apply Bayes’ theorem to produce the *posterior distribution* [*x*|*y*_{o}], where *y*_{o} is the actual model output we obtain. This distribution serves as the basis for inference and forecasting.

It is natural to ask if one approach is better than the other. The answer depends on how well we can formulate and assess the distributions used and, hence, is surely problem dependent. Computational issues and ease in interpretation are also relevant. In either case, assessments of models and their ability to simulate observable features of climate would obviously play a role, as do beliefs regarding the climate and the qualities of our models for it. However, in our forecasting context, natural approaches to model assessment and calibration based on observations are not directly available, since we have no physical observations of future climate under CO_{2} Special Report on Emissions Scenarios (SRES). Hence, even the cleverest of model–observation comparison and assessment procedures leave uncertainty regarding model-based forecasting.

The use of conditional models [*x*|*y*] or [*y*|*x*] does not indicate a belief in causal relationships. The future climate is not impacted by our model computations (at least ignoring the possibility that model output is used to convince policy makers to act in some fashion, which modifies forcings and thereby impacts future climate). Similarly, our model computations are not determined by the future climate. In more familiar applications of Bayesian updating based on “real” observations, we compute the conditional distribution of the unknowns given those observations, but with no suggestion that those observations *cause* the unknowns to behave in a fashion consistent with that conditional distribution.

We conclude this portion of the discussion with the following perspective, or perhaps warning, regarding the strategy described here. The enterprise of climate forecasting under a fixed emissions scenario can be viewed as answering a “what if” question. Namely, what do we think the climate will be if we assume the given scenario obtains? Our treatment of climate model output as data essentially answers the following question: “Suppose it is the final time of the forecast period and the assumed emissions scenario attained. If we observed the climate model output as real data, what is our reconstruction of the climate?” We do not really believe that climate model output is as informative about the future climate as observations will be, but acting as if it is can lead to overconfidence in the results unless care is taken. We illustrate and clarify the issue in section 4. Further discussion of issues, suggestions, and references on modeling model output can be found in Bowman et al. (1993), Goldstein and Rougier (2005, 2006), Higdon et al. (2005), and Kennedy and O’Hagan (2001).

The third and fourth aspects above also merit clarification. The “climate” as represented in large-scale, coupled models is gigantic in scope and dimension. We believe that specifying uncertainty in model results taken in their entirety is intractable. For example, specifying, estimating, and then using the dependence structure (e.g., covariance matrix) of a random vector of dimension on the order of tens of millions is prohibitively complicated. For similar reasons, communication of the results of climate change studies focuses on features of the climate, such as sea level, suitably averaged surface temperature, etc. Also, studies regarding local and regional impacts typically require information on some subset of the climate. These points suggest that it is the information regarding the behavior of dimension-reduced reflections of the climate that we are able to use, understand, and communicate to scientists, policy makers, and the general public. In this article, forecasts of low-dimensional functions of climate variables are the target of our uncertainty analysis. This means we do not suggest a single model and associated uncertainties that serve to answer all questions. Rather, the analysis may vary with the goals. We return to this issue in section 5.

Our treatments regarding aspect 4 arise in our modeling strategy. We model climate variables as evolving according to a multiscale stochastic process. These models contain collections of parameters changing on differing scales. Our strategy uses covariate information (e.g., atmospheric CO_{2}) to model the values and space–time evolution of such parameters, as opposed to modeling climate variables directly. In part our motivation is the familiar issue of attributing local-in-time variations (e.g., “weather”) to human-induced climate change. We redefine the problem as one involving influences on the parameters of probability distributions as opposed to one involving local-in-time variations of realizations from these distributions (Berliner 2003).

Our strategy relies on Bayesian hierarchical modeling. The modifier hierarchical refers to formulating the probability models for collections of variables in stages as component distributions that imply a full distribution; (1) is a basic example. In our illustration, the climate variables of interest are hemispheric, monthly averaged surface temperatures. This setting has received considerable attention: see Kaufmann and Stern (1997), Smith et al. (2003), and Wigley et al. (1998). We consider two models, the Parallel Climate Model (PCM; Washington et al. 2000) and the Community Climate System Model (CCSM; Dai et al. 2001), and three of the familiar SRES scenarios. Separate forecasting analyses are performed for each scenario.

Section 2 presents a Bayesian formulation of a data model for combining superensembles. Implications of that data model to the design of superensemble experiments are reviewed in section 3. By design, we mean the selections of models and individual ensemble sizes used to create a superensemble. Such selections typically involve compromises between model quality and the allocation of computational resources. The issues are relevant in many contexts (e.g., weather forecasting). We have arranged the presentation so readers more interested in the modeling and example can skip section 3 with no loss in continuity.

In section 4 we develop and analyze models for hemispheric surface temperatures. We combine observational data from the period 1882–2001 with climate model output relevant to the period 2002–97. This exemplifies formal combination of observational data and climate model output into an analysis that accounts for uncertainty in each information source. Section 5 offers discussion of further issues, potential applications, and extensions.

## 2. Data model

### a. Derivation: Univariate case

We first develop a data model for a scalar climate variable *X* at a fixed time (dependence on that time is suppressed in the notation). The theory is described first, followed by some discussion of the implications of the model.

Consider output from *M* climate models, all run based on the same emissions scenario. For each model *m*, *m* = 1, . . . , *M*, assume that an ensemble of size *n*_{m} is available, and let **Y**_{m} denote the *n*_{m} vector of ensemble-member-derived values corresponding to *X* from model *m*. For example, if *X* is globally averaged surface temperature, then the elements of each **Y**_{m} are average surface temperatures computed from ensemble members.

Notation used here includes the following: **Y**|**X**, **Σ** ∼ Gau(**X**, **Σ**) is read “the conditional distribution of **Y** given **X** and **Σ** is Gaussian (multivariate normal) with mean **X** and covariance matrix **Σ**”; **1**_{k} is a *k* vector whose entries are all equal to one; and 𝗜_{k} is the *k* × *k* identity matrix.

Finally, we point out common but potentially confusing conventions. In writing a stage of a hierarchical model, say *Y*|*X*, *θ*, if one assumes that given *X* and *θ*, *Y* is independent of *θ*, it is standard to simply write *Y*|*X*. This does not mean *Y* is unconditionally independent of *θ*. Second, quantities assumed known throughout the modeling are typically not indicated in the conditioning.

We describe a two-stage hierarchical model for the climate model output. Each stage encapsulates different forms of uncertainty. The role of the Gaussian assumptions is discussed later.

In stage 1, we suppose that the means of the outputs of the models contain some *common* quantity *β* relevant to the true climate and model-dependent biases. Let **b** be the *M* vector of the biases:

We assume that given *β* and **b**, **Y**_{1}, . . . , **Y**_{M} are conditionally independent and for each *m*

Note that ensemble members within a model and across models are all assumed to be conditionally independent given *β* and **b**. The variances *σ*^{2}* _{Ym}* are model dependent. For example, they quantify variations around the means that arise due to differing initial and boundary conditions.

In stage 2, we assume that *β* and **b** are conditionally independent given the true climate value *X* and have Gaussian distributions

and

The elements of the prior mean vector **μ** are user-specified prior guesses at the model biases, perhaps based on previous model verification studies. Note that the prior model for the biases is assumed not to depend on *X*. The assumption that the biases are uncorrelated can be relaxed easily.

Probability theory can now be applied to extract the implied model for the data as functions of *X* (see Berger 1985, section 4.6). By multiplying the density functions implicitly defined in stages 1 and 2 and integrating out *β*, we obtain the marginal distribution of **Y**_{1}, . . . , **Y**_{M} conditional on *X* and **b**:

where 𝗖_{mm′} is an *n*_{m} × *n*_{m′} matrix whose entries are all equal to *σ*^{2}_{β} and

Similarly, we can integrate out **b** based on (5), yielding

where

#### 1) Remarks

If we drop all Gaussian assumptions and relax all conditional independence assumptions to conditionally uncorrelated assertions, the means and covariances given in (7) and (9) remain unchanged. This can be argued based on a *random effects* representation of the steps above. Letting all *e*’s indicated below be uncorrelated, zero-mean random errors, we have that ensemble member *k* from model *m* arises as

Computation of means, variances, and covariances can then be done applying their formal definitions. This approach may seem simpler than the full probabilistic arguments above. We emphasized the probability because (i) we use the hierarchy later, and (ii) the formal arguments can be applied in more complex settings to include nonlinearity, nonadditivity of errors, etc., though the random effects analogs may be intractable.
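The covariance structure implied by this random-effects representation can be checked numerically. The following is a minimal sketch (the function name and the numerical values are our own, not from the paper); it builds the covariance matrix of the stacked ensemble members implied by Y_{mk} = β + b_{m} + e_{mk}:

```python
import numpy as np

def superensemble_cov(sigma2_beta, sigma2_b, sigma2_Y, ns):
    """Covariance of stacked ensemble members implied by the random-effects
    representation Y_mk = beta + b_m + e_mk, with var(beta) = sigma2_beta,
    var(b_m) = sigma2_b[m], and var(e_mk) = sigma2_Y[m]."""
    N = sum(ns)
    # the shared beta contributes sigma2_beta to EVERY pair of members,
    # within and across models
    cov = np.full((N, N), sigma2_beta)
    start = 0
    for m, n in enumerate(ns):
        block = slice(start, start + n)
        # the common model bias b_m adds sigma2_b[m] within model m
        cov[block, block] += sigma2_b[m]
        # independent ensemble noise e_mk adds sigma2_Y[m] on the diagonal
        cov[block, block] += np.diag(np.full(n, sigma2_Y[m]))
        start += n
    return cov

# M = 2 models with ensembles of sizes 2 and 3 (illustrative values)
C = superensemble_cov(0.5, [0.2, 0.4], [1.0, 2.0], [2, 3])
print(C)
```

The printout exhibits exactly the structure derived above: off-diagonal entries within a model equal σ²_β + σ²_bm, while entries across models equal σ²_β alone.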

Finally, we view the model stages described here as examples. More informed models can be formulated depending on how the ensemble members are generated. For example, suppose ensemble members are designed in a “two-way layout,” with initial condition serving as one factor and some model parameterization or model physics serving as a second. We can readily write a model that includes different biases for different model parameterizations, say *b*_{j(m)}, *j* = 1, . . . , *J*_{m}, where *J*_{m} is the number of model parameterizations used in model *m*. In general, we can entertain the sorts of models found in the statistical analysis of variance and regression literatures (e.g., Ravishanker and Dey 2002), followed by assigning priors on the modeled effects.

#### 2) Implications and issues

Implications of (7) and (9) include the indicated covariances between any two ensemble members from model *m*, as well as covariances between ensemble members from different models. These covariances arise despite the fact that no models in the hierarchy included any nonzero covariances. The common covariance *σ*^{2}_{β} between any two ensemble members from different models [see 𝗖_{mm′} in (7) and (9)] accrues mathematically from integrating out the one-dimensional quantity *β*. Intuitively, it arises because an unobservable “error” *X* − *β* is present in every ensemble member. Similarly, the further integration yielding (10) adds *σ*^{2}_{bm} to the covariances between ensemble members from the same model because the additional error *b*_{m} − *μ*_{m} is present in each of those ensemble members.

As we justify later, a key implication of (4) is that not even an infinite ensemble from a climate model can provide an exact climate forecast. The fundamental issue is our inclusion of *β* and its relationship to *X*. This idea is best indicated by comparison. If one simply ignores these quantities and averages ensemble members to obtain forecasts and uses sample variances to indicate uncertainty, our claim is that one is estimating *β* + *b*_{m} and *σ*^{2}_{Ym} [see (11)]. Indeed, an “infinite” ensemble from model *m* can provide the value of *β* + *μ*_{m}, but we claim that the relationship of that number to *X* is the subject of the further uncertainty modeled in (13).

### b. Multiple climate variables

We assume that the derived climate variables of interest form a *d*-dimensional vector **X** and that, at a fixed time, each ensemble member is a *d*-dimensional vector (as before, these are computed from the actual model output). The basic structure of (9) carries over to the multivariate case. For brevity, we present only the results here, using the following notation. Let 𝗔 and 𝗕 be matrices of dimensions *p* × *q* and *r* × *s*, respectively. Their *Kronecker product* 𝗔 ⊗ 𝗕 is the *pr* × *qs* matrix (*a*_{ij}𝗕). For a collection of *n* *d*-dimensional vectors, let vec(**Y**) denote the (*n* × *d*)-dimensional vector obtained by stringing out the *n* vectors.
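For readers who prefer code, both notational devices can be illustrated directly (a small sketch; the matrices are illustrative values of our choosing, and `numpy` supplies both operations):

```python
import numpy as np

# Kronecker product: A (2x2) and B (3x3) give the 6x6 block matrix (a_ij * B)
A = np.array([[1, 2],
              [3, 4]])
B = np.eye(3)
K = np.kron(A, B)
print(K.shape)  # (6, 6)

# vec: string out n = 3 members, each a d = 2 dimensional vector,
# into a single (n x d) = 6 dimensional data vector
Y = np.array([[1.0, 2.0],   # first member
              [3.0, 4.0],   # second member
              [5.0, 6.0]])  # third member
vecY = Y.reshape(-1)
print(vecY)  # [1. 2. 3. 4. 5. 6.]
```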

An ensemble of size *n*_{m} from model *m* is a collection of *n*_{m} *d*-dimensional vectors, combined to produce a (*d* × *n*_{m})-dimensional vector vec(**Y**_{m}). These vectors are similarly combined, yielding a (*d* × *N*)-dimensional data vector vec(**Y**), where

We consider multivariate analogs of (3), (4), and (5). Define *d* × *d* covariance matrices **Σ**_{β}, **Σ**_{bm}, and **Σ**_{Ym} to be multivariate extensions of *σ*^{2}_{β}, *σ*^{2}_{bm}, and *σ*^{2}_{Ym}, respectively. Also, let **b**_{m} be the *d* vector of biases for model *m*, with prior means **μ**_{m}. Finally, using the same conditional independence assumptions, we obtain

and

The multivariate generalization of (9) is

where

## 3. Experimental design of superensembles

In much of our formalization of the design problem, we consider the estimation of **X** by the method of generalized least squares (GLS) applied to the data model (16). It follows that good designs make features of the associated covariance matrix of the GLS estimator small. We clarify this with some basic results.

### a. Univariate X and one climate model

We assume *d* = 1 and consider a single model, say model *m*. We can write (16) as

where

That is, the “prior-mean-bias-adjusted” ensemble members have the same marginal distributions and common correlation; they are said to be *exchangeable*. The GLS estimate of *X*, denoted by *X̂ _{m}*, based on an exchangeable ensemble works out to be the simple arithmetic average of the data:

Under Gaussian assumptions, we have

where

Derivations of these claims are discussed later. The indicated mean and variance of *X̂ _{m}* are maintained without Gaussian assumptions.
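These claims can be checked numerically. In this sketch (with values of our choosing), `s2_c` plays the role of the common model error *σ*²_β + *σ*²_bm and `s2_w` of the ensembling variance *σ*²_Ym; the GLS weights collapse to the simple average, and the GLS variance equals *σ*²_β + *σ*²_bm + *σ*²_Ym/*n*:

```python
import numpy as np

# Exchangeable covariance of the bias-adjusted ensemble members:
# Sigma = s2_w * I + s2_c * J  (J = matrix of ones)
s2_c, s2_w, n = 1.09, 1.0, 100
Sigma = s2_w * np.eye(n) + s2_c * np.ones((n, n))

# GLS of X with design vector 1: X_hat = (1' S^-1 1)^-1 * 1' S^-1 Z
one = np.ones(n)
Sinv1 = np.linalg.solve(Sigma, one)
w = Sinv1 / (one @ Sinv1)        # GLS weights on the ensemble members

# the weights reduce to the simple average 1/n ...
print(w[0])                      # 0.01
# ... and the GLS variance is s2_c + s2_w/n (here 1.09 + 0.01 = 1.10)
var_gls = 1.0 / (one @ Sinv1)
print(var_gls)
```

Note that the variance is bounded below by `s2_c` no matter how large `n` becomes, which is exactly the infinite-ensemble floor discussed in the text.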

In concert with standard intuition, increasing ensemble size does decrease uncertainty, as reflected in decreases in var(*X̂*_{m}|*X*). However, the decreases are moderate compared to those associated with uncorrelated observations. Further, as the ensemble size *n*_{m} tends to infinity, our uncertainty regarding *X* as quantified by var(*X̂*_{m}|*X*) is bounded from below by *σ*^{2}_{β} + *σ*^{2}_{bm}. We find this comforting; we would not expect an infinite ensemble from any practical model to produce a perfect climate forecast. This also avoids the potential embarrassment of infinite ensembles from different models yielding different “perfect” forecasts. We note that similar analyses and concerns also apply in general model-based forecasting, including ensemble forecasting in support of numerical weather prediction.

The correlation present in (19) responds to the relative variations in model error reflected in *σ*^{2}_{β} + *σ*^{2}_{bm} and the variation due to ensembling reflected in *σ*^{2}_{Ym}. Note that if *σ*^{2}_{Ym} is very small relative to *σ*^{2}_{β} + *σ*^{2}_{bm}, we can conclude that many ensemble members from a poor model are of little value. This is hardly a novel claim; the point is that our representations quantify what “many,” “poor,” and “little” mean. At the other extreme, an ensemble of size 5 from a hypothetical perfect model (i.e., uncorrelated ensemble members) yields the same variance for the GLS estimate as an *infinite* ensemble if *ρ*_{m} = 0.20 (cf. Berger 1985, exercise 149, p. 307).

Next, suppose we believe *σ*^{2}_{β} + *σ*^{2}_{bm} = 1.09, *σ*^{2}_{Ym} = 1, and that we judge that achieving var(*X̂*_{m}|*X*) = 1.10 is acceptable. It follows that we require an ensemble size of 100. Alternatively, suppose we could improve the model to yield *σ*^{2}_{β} + *σ*^{2}_{bm} = 1.00, without changing *σ*^{2}_{Ym}. Then, an ensemble of size 10 also yields var(*X̂*_{m}|*X*) = 1.10. Such arguments can also be reversed (though the issue of also changing *σ*^{2}_{Ym} is relevant in all comparisons). That is, a very large ensemble from a simple, cheap model that does not grossly degrade model quality may be competitive with a necessarily small ensemble from an expensive though better model. Of course, this notion is discussed by climate scientists; again, we offer a relevant quantification for such discussion.

### b. GLS in the general case

Recalling that vec(**Y**) denotes the vector of superensemble data, and defining **Y** to be the biased-corrected data, we can write the multivariate data model (16) as

where **Σ** is the *dN* × *dN* covariance matrix defined in (16). Applying least squares theory, the GLS estimate of **X** is

Under Gaussian assumptions, we have that the distribution of **X̂** given **X** is Gaussian:

where the *d* × *d* covariance matrix is

The expression in (27) is the theoretical result; its simplification to the expression in (28) is an exercise in linear algebra and reviewed in appendix A.

### c. Design and decision theory

Formal design of superensemble experiments relies on optimization of the expectation of a cost function that combines *inference loss* and the cost of the production and data processing of the superensemble. A quantification of this notion involves a definition of an expected cost or *risk*, say ℛ, of the form

where *L* is inference loss, **X̃** is our forecast of **X**, *E*(·) indicates expectation with respect to **X̃**, and perhaps **X**, and *C* is computational cost. Construction of such a criterion quantifies the trade-off of increasing the *n _{m}*, which presumably decreases our expected inference loss but increases our computational cost.

A common example of an inference loss is *quadratic loss:*

where *K* is some constant, which calibrates the loss to the context of the problem at hand, and 𝗤 is some *d* × *d* positive definite matrix. The role of 𝗤 is to allow for differential treatment of the coordinates of **X** depending on their importance. We let 𝗤 be the identity matrix for this discussion. Note that we generally require specification of *K*; at the least, it is used to ensure that *L* and *C* are measured in the same units. For example, if *C* is in dollars and **X** represents temperature in degrees, then *K* is in dollars per degree squared. (Of course, we could rescale the computational cost to be in degrees squared per dollar.)
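A minimal sketch of this quadratic loss, with illustrative values of our own choosing for *K* and 𝗤 (identity 𝗤 treats all coordinates equally):

```python
import numpy as np

def quadratic_loss(X_tilde, X, K=1.0, Q=None):
    """Quadratic inference loss L = K * (X_tilde - X)' Q (X_tilde - X)."""
    e = np.asarray(X_tilde, dtype=float) - np.asarray(X, dtype=float)
    Q = np.eye(len(e)) if Q is None else np.asarray(Q)  # default: identity
    return K * float(e @ Q @ e)

# forecast (1, 2) against truth (0, 0) with K = 2: loss = 2 * (1 + 4) = 10
print(quadratic_loss([1.0, 2.0], [0.0, 0.0], K=2.0))  # 10.0
```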

There are two fundamental approaches to statistical decision problems (see Berger 1985 for more complete discussion). Operationally, they differ in the choice of the probability distribution under which the expectation in (29) is taken. In *frequentist* decision theory, one considers the random forecast or estimator **X̃**(**Y**) as a function of the yet-to-be-collected data **Y** and computes the expectation of the cost with respect to the conditional distribution of **Y** given **X** holding **X** fixed (i.e., the data model). If possible, one would then choose the best estimator as dictated by *L* and in turn the best design. This is not possible in general, since the result of the expectation is a function of **X**. Hence, some form of restricted optimization or modified criterion is necessary [e.g., minimax estimation, best estimation within the class of unbiased and linear-in-**Y** estimators (i.e., GLS), etc.].

The other approach is Bayesian. First, to find the optimal estimate of **X**, we minimize the expected loss computed under the posterior distribution of **X** given the data **Y**. This is a well-posed optimization problem; for quadratic loss, the result is the posterior mean of **X** given **Y**. For design purposes we then take the expectation of the conditional expected loss (a function of **Y** but not **X**) with respect to the marginal distribution of **Y** and optimize the result over all possible designs.

We begin with the frequentist approach (we call the approach frequentist, despite the fact that our data model is Bayesian, for ease in comparison to the decision theory literature). Note that the selection of GLS estimators as forecasts leads to an expected quadratic loss that is (i) not a function of the data and (ii) simply the variance of the GLS estimator. Hence, optimization over candidate designs is well posed.

#### 1) Illustrations

The following examples all assume quadratic inference loss and *d* = 1. Though the univariate case may be an oversimplification, the results offer qualitative guidance for higher-dimensional cases. Note that knowledge of *σ*^{2}_{β} is not needed in these cases.

In example 1, *M* = 1. Assume each of our *n* ensemble members costs *c* units. From (22) and (29), we find the risk associated with an ensemble of size *n* is

We easily obtain the optimal ensemble size

Of course *n** must be an integer. Interpretations of *n** and the optimized risk *K*(*σ*^{2}_{β} + *σ*^{2}_{b}) + 2*σ*_{Y}√(*Kc*) in terms of *σ*_{Y}, *K*, and *c* are immediate.
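Example 1 can be verified numerically. This sketch (with illustrative values of our choosing) confirms that the continuous optimum *n** = *σ*_Y√(*K*/*c*) agrees with a brute-force search over integer ensemble sizes:

```python
import math

def risk(n, K, s2_model, s2_Y, c):
    """Risk of the example-1 tradeoff: K*(s2_model + s2_Y/n) + c*n,
    where s2_model stands for sigma2_beta + sigma2_b."""
    return K * (s2_model + s2_Y / n) + c * n

# illustrative numbers (our choices, not from the paper)
K, s2_model, s2_Y, c = 4.0, 1.0, 2.0, 0.5

# continuous optimum n* = sigma_Y * sqrt(K / c)
n_star = math.sqrt(s2_Y) * math.sqrt(K / c)
print(n_star)

# brute-force over integers agrees with the continuous optimum here
best_int = min(range(1, 200), key=lambda n: risk(n, K, s2_model, s2_Y, c))
print(best_int)  # 4

# the optimized risk is K*s2_model + 2*sigma_Y*sqrt(K*c)
print(risk(best_int, K, s2_model, s2_Y, c))  # 8.0
```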

In example 2, *M* > 1. Applying (28) for *d* = 1 and assuming each ensemble member from model *m* costs *c*_{m} units, the risk associated with a superensemble of sizes *n*_{1}, . . . , *n*_{M} is

Example 3 involves “allocation designs.” By allocation design we mean specification of the numbers *n*_{m}, assuming that the total computational cost is fixed. Note that basic properties of optimal allocations also carry over to the unconstrained case of example 2, since its solution produces some optimal computational cost and the allocations for that cost must be optimal. Also note that since the total cost is fixed, the optimization can be done without specification of *K*.

For simplicity we assume the cost *c*_{m} is constant across models. Hence, *N* is fixed, and we are to solve the constrained minimization problem

This is a relatively simple integer programming problem. However, to see properties of solutions, we find approximate solutions acting as if the *n*_{m} are continuous. The resulting problem is amenable to the method of Lagrange multipliers. Candidate solutions can be obtained quickly by ignoring the inequality constraints 0 ≤ *n*_{m} ≤ *N* for each *m*. If the resulting candidate satisfies those inequalities, then it is a solution. Hence, we are to maximize

The resulting candidate solution is

In case 1, for each *m*, *σ*^{2}_{Ym} = *σ*^{2}_{bm}. Then the proportion of ensemble members allocated to model *m* is

This is the correct solution if

for each *m*, which we anticipate holding if the *σ*^{2}_{Ym} are all roughly of the same order. On the other hand, if one of these, say *σ*^{2}_{Ym′}, is huge compared to the others, then depending on *N*, the inequality constraint would be active and the optimal solution is to delete model *m*′ from consideration. An alternative is to increase *N* until the constraint is satisfied, though this may be impractical.

In case 2, for each *m*, *σ*^{2}_{Ym} = *σ*^{2}_{Y}. Computing (36), we find

That is, the optimal fraction of *N* to allocate to model *m* is inversely proportional to *σ*^{2}_{bm}. In case 3, for each *m*, *σ*^{2}_{bm} = *σ*^{2}_{b}. We find

This solution appears to be at odds with those of cases 1 and 2. In those cases, the general idea is to direct more ensemble members to the models with “small uncertainty.” In case 3 it appears that one is to direct more ensemble members to the high-uncertainty models. This is essentially correct if *N* is sufficiently large, since then

This makes sense since *σ*^{2}_{bm} is the same for all *m*; the solution tends to make the *σ*^{2}_{Ym}/*n*_{m} small. However, if *N* is small and some *σ*^{2}_{Ym} are so large as to be poorly controlled even if all ensemble members are directed to those *m*, it is better to ignore them and focus on the remaining models. The specifics are delicate in their dependence on *σ*^{2}_{Ym}, *σ*^{2}_{b}, and *N*. For example, say *M* = 2 and set *σ*^{2}_{Y1} = 4, *σ*^{2}_{Y2} = 100, and *σ*^{2}_{b} = 4. For *N* = 4, 10, and 94, the optimal (*n**_{1}, *n**_{2}) are (4, 0), (5, 5), and (19, 75), respectively. These results are indicative of the potential value of combining complex and simple, though perhaps less precise, models.
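The case-3 allocations quoted above can be reproduced by brute force. This sketch is our own code (it adopts the convention that a model given zero members simply drops out of the sum); it maximizes Σ_m [σ²_b + σ²_Ym/n_m]⁻¹ subject to n₁ + n₂ = N:

```python
def objective(ns, s2_b, s2_Y):
    """Sum over models of 1 / (sigma2_b + sigma2_Ym / n_m);
    a model with n_m = 0 contributes nothing."""
    return sum(1.0 / (s2_b + s2_Y[m] / n) for m, n in enumerate(ns) if n > 0)

def best_allocation(N, s2_b, s2_Y):
    """Exhaustive search over integer allocations (n1, N - n1) for M = 2."""
    return max(((n1, N - n1) for n1 in range(N + 1)),
               key=lambda ns: objective(ns, s2_b, s2_Y))

# the case-3 example: sigma2_Y1 = 4, sigma2_Y2 = 100, sigma2_b = 4
s2_b, s2_Y = 4.0, [4.0, 100.0]
for N in (4, 10, 94):
    print(N, best_allocation(N, s2_b, s2_Y))
```

The search recovers (4, 0), (5, 5), and (19, 75): for small *N* the noisy model is dropped entirely, while for large *N* most members go to it.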

#### 2) Extensions

First, we consider the Bayesian design in a special case. Suppose *d* = 1, our data model is Gaussian, and our prior for *X* is also Gaussian with variance *τ* ^{2}. It follows that the posterior variance of *X* conditional on the ensembles does not depend on the data or the prior mean. In particular, the posterior variance is

where *σ*^{2}_{e} is precisely the quantity optimized in the frequentist approach. It is easy to check that var{*X*|vec(**Y**)} is a monotone increasing function of *σ*^{2}_{e}. Hence, the example results in cases 1–3 above are in fact Bayesian designs under these assumptions. Note that they do not depend on *τ*^{2}. This seemingly happy circumstance need not carry over to the multivariate case.
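The precision-addition form of this posterior variance, and its monotonicity in *σ*²_e, can be sketched as follows (illustrative numbers of our own choosing):

```python
def posterior_var(tau2, s2_e):
    """Gaussian prior variance tau2 and Gaussian 'likelihood' variance s2_e:
    precisions add, so var(X | data) = 1 / (1/tau2 + 1/s2_e)."""
    return 1.0 / (1.0 / tau2 + 1.0 / s2_e)

# the posterior variance is smaller than both the prior and data variances ...
print(posterior_var(2.0, 1.1))
# ... and shrinking s2_e (a better design) shrinks the posterior variance,
# so minimizing s2_e over designs also minimizes the Bayesian criterion
print(posterior_var(2.0, 1.0) < posterior_var(2.0, 1.1))  # True
```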

For multivariate **X**, designs should be chosen to make cov(**X̂**|**X**) “small.” To make this mathematically precise, one selects some scalar-valued function of cov(**X̂**|**X**) and finds designs to minimize it. Traditional choices of that function include the trace and determinant of cov(**X̂**|**X**). (Note that choosing the trace enables optimal design without knowledge of **Σ*** _{β}*.) For brevity, we do not pursue this further in this article [see Berliner et al. (1999) for discussion in a weather forecasting context and general references].

Finally, the designs described so far depend on our prior for the model biases. This suggests that they are efficient but sensitive to the specifications of the *σ*^{2}* _{bm}*. An alternative approach that does not involve the prior on the biases is based on the following form of (7):

where 𝗗 is the *N* × (*M* + 1) matrix,

For *M* > 1, we can find the GLS estimator of the vector comprised by *X* and **b** and its covariance matrix. Hence, we can find designs that are good for simultaneously estimating *X* and **b**. Alternatively, viewing **b** as nuisance parameters and noting that the implied var(*X̂*|*X*, **b**) is independent of **b**, we can optimize it directly to obtain designs for estimating *X*.

## 4. Climate forecasting: Hemispheric temperatures

Let **X**^{p} and **X**^{f} denote past and future climate variable trajectories and **Y**^{p} and **Y**^{f} be corresponding data. The main object of interest in Bayesian climate forecasting is the posterior distribution of future climate variables conditional on the data: [**X**^{f}|**Y**^{p}, **Y**^{f}] (45). A useful strategy for developing this object begins with inclusion of **X**^{p}. In general this allows application of dynamics and physical reasoning to relate **X**^{f} to **X**^{p}, rather than attempting to relate **X**^{f} to **Y**^{p} directly. The plan is then to form a data model [**Y**^{p}, **Y**^{f}|**X**^{p}, **X**^{f}] and a process prior of the form [**X**^{f}|**X**^{p}][**X**^{p}]. Bayes’ theorem is applied to complete the analysis. Generally, the implementation of this suggestion involves many modeling steps, introduction of unknown parameters, and computational overhead.

In our bivariate (*d* = 2) example the derived climate variables **X** are hemispheric- and monthly averaged surface temperatures for the period 1882–2097. The years 1882–2001 are designated the observation period. The model data **Y**^{f} for the forecasting period 2002–97 are computed from ensembles from *M* = 2 models, PCM and CCSM, labeled *m* = 1, 2, respectively, and three SRES scenarios, A2, A1B, and B1 (CO_{2} concentrations for these scenarios were obtained from the IPCC Web site). We have *n*_{1} = 4 ensemble members from PCM for each scenario and *n*_{2} = 1 ensemble member for scenarios A1B and B1 only. Finally, we deseasonalized each ensemble member by adjusting it for its monthly means for the first 30 yr of the forecast period.

Our models combine those developed by Y. Kim and L. M. Berliner (2006, unpublished manuscript, hereafter KB) with the data model reasoning for model output described in section 2. First, we develop some notation. Let *t* index months. The observational period includes months *t* = 1, . . . , *P*, *P* = 12 × 120 and the forecast period is months *t* = *P* + 1, . . . , *P* + *F*, *F* = 12 × 96. We let **X**^{p}_{t} and **X**^{f}_{t} be *d* = 2-dimensional vectors of true Northern and Southern Hemispheric temperatures for month *t* in the observation and forecast periods, respectively. Similarly, **Y**^{p}_{t} is the two-dimensional vector of the corresponding observations in month *t*. Finally, vec(**Y**^{f}_{t}) is the vector of hemispheric temperatures for month *t* obtained from the model output. Note that for scenarios A1B and B1, these data vectors are 2 × (4 + 1) = 10 dimensional, but 2 × 4 = 8 dimensional for scenario A2.

### a. Hierarchical model

#### 1) Data models

##### (i) Observations

We assumed that the **Y**^{p}_{t}, *t* = 1, . . . , *P* are conditionally independent given **X**^{p}_{t}, *t* = 1, . . . , *P* and all other model parameters and that for each *t* the conditional distribution of **Y**^{p}_{t} given **X**^{p} and the parameters depends on **X**^{p} only through **X**^{p}_{t}. Preliminary data analysis suggested that the measurement error variance was comparatively large in the early portion of the observation period. Hence, we allowed for a random changepoint, say *t*_{c}, and assumed Gaussian distributions leading to

where *t̃*_{c} is the integer part of *t*_{c} and the notation diag(. . .) denotes a matrix with the indicated quantities on the diagonal and off-diagonal elements all equal to zero. Priors used for the measurement error variances and *t*_{c} are described in appendix B.

##### (ii) Model output

As in the case of the observational data, we assume conditional independence over time of the model output vec(**Y**^{f}_{t}) conditional on **X**^{f} and other model parameters and that vec(**Y**^{f}_{t}) conditional on **X**^{f} and the parameters depends on **X**^{f} only through **X**^{f}_{t}. Next, due to the limited number of ensembles available, we assume that the error covariance matrices **Σ**_{β}, **Σ**_{bm}, and **Σ**_{Ym}, *m* = 1, 2, are constant in time. We assume that the model biases are locally constant in time. We defined five temporal periods: 2002–31, 2032–51, 2052–71, 2072–91, and 2092–97, indexed by *L* = 1, . . . , 5, respectively. For each *L* = 1, . . . , 5, we let **b**^{1}_{L} (**b**^{2}_{L}) be a 2-vector of the Northern Hemisphere (NH) and Southern Hemisphere (SH) biases for model 1 (2).

For each *t* in the forecast time period, *t* = *P* + 1, . . . , *P* + *F*, we assume that

where the model means * β_{t}* are modeled as

The prior model for coordinates of the model biases **b**^{1}_{L} and **b**^{2}_{L}, *L* = 1, . . . , 5 is that they are all independent and identically distributed Gaussians with mean 0 and variance 9; we used this prior for all three scenarios. The priors used for **Σ**_{Y1}, **Σ**_{Y2}, and **Σ**_{β} are discussed in appendix B. Finally, we assume that the observational data and the model output data are conditionally independent throughout time given the true climate variables and the model parameters.

#### 2) Process model

It is well understood that the climate system displays a variety of space–time scales of behavior. Our suggestion is to use multiscale Bayesian hierarchical models, which allow the data to direct the treatment of scales of variation. In this article this is accomplished by using familiar time series models, but with space–time varying parameters. Of course, spatial modeling in two dimensions is not indicative of the issues in higher dimensions, so our real focus is on temporal scales. Furthermore, the example allows demonstration of how both *covariate* information, in our case CO_{2} data and the Southern Oscillation index (SOI), and dynamics of parameter evolution can be combined.

The main process model evolves monthly temperatures via a simple autoregressive time series model with local-in-time parameters. For each month *t* and conditional on indicated parameters, we assume that

where

and where the subscripts *i*, *j*, *k* indicate that the *mean* parameters **α**_{i(t)}, *autoregression* parameters **η**_{j(t)} = {*η*_{j(t)n}, *η*_{j(t)s}}′, and *process error variances* **σ**^{2}_{k(t)} = {*σ*^{2}_{k(t)n}, *σ*^{2}_{k(t)s}}′ are themselves time varying, each on their own time scales. Specifically, the full time period is partitioned into equal length subintervals *i* = 1, . . . , *I*. We assume that for all months *t* in subinterval *i*, **α**_{i(t)} = **α**_{i}. Such local-in-time definitions are similarly applied to the autoregression parameters and process variances; the subscripts *j* and *k* indicate that they can be defined on different temporal subintervals. The NH and SH parameters need not vary on the same time scales, but we do not make such separations here.

Note that the inference and modeling problem is now fairly complex. Specifically, we not only have parameters to model, but lengths of the three sets of local subintervals are to be inferred. KB devote substantial attention to this aspect of model selection and the required calculations. In this article, we simply specify the locality definitions to be those KB found to be best supported by the data in the observational period. Specifically, the mean parameters and process error variances are assumed to vary on an 8-yr period; the autoregressive parameters vary on a 2-yr period.
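The piecewise-constant parameter structure can be sketched as follows. This is not the authors' exact specification: the mean-reverting AR(1) form, the univariate (single hemisphere) simplification, and all numerical parameter values are our illustrative assumptions; only the block lengths (8 yr for means and variances, 2 yr for autoregression) come from the text.

```python
import math
import random

random.seed(0)

YEARS = 16
MONTHS = 12 * YEARS

# Piecewise-constant parameters: one value per block (hypothetical values).
alpha = [0.0, 0.3]                                  # means, per 8-yr block
sigma2 = [0.02, 0.03]                               # error variances, per 8-yr block
eta = [0.5, 0.6, 0.4, 0.7, 0.5, 0.6, 0.4, 0.5]      # AR coefficients, per 2-yr block

def block(t, years):
    """Index of the parameter block containing month t."""
    return t // (12 * years)

x = [0.0] * MONTHS
x_prev = 0.0                                        # X_0, month before the start
for t in range(MONTHS):
    a = alpha[block(t, 8)]
    s = math.sqrt(sigma2[block(t, 8)])
    h = eta[block(t, 2)]
    # Mean-reverting AR(1): X_t = a + h * (X_{t-1} - a) + e_t
    x[t] = a + h * (x_prev - a) + random.gauss(0.0, s)
    x_prev = x[t]
```

The point of the sketch is the bookkeeping: the mean and variance draw from one block index, the autoregression coefficient from a finer one, so different scales of variability are governed by parameters with different temporal localities.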

Finally, to implement (50), we need a distribution for **X**_{0}; that is, true temperatures for the month before our starting time. We simply assumed

Results are very insensitive to this assumption.

#### 3) Parameter models

For brevity we review our priors for the mean, autoregression, and process variance parameters here and provide a more complete discussion for the other parameters in appendix B.

##### (i) Mean parameters

Let *C*_{i} represent global average CO_{2} concentrations averaged over time interval *i*. All CO_{2} values are assumed known in this article. Our model for the mean parameters is a regression onto CO_{2}:

where the errors are modeled as an autoregressive time series as indicated in appendix B. We used this model in both the observation and forecast periods but assume that the **a**_{α} and **b**_{α} are different in those periods. As needed, we use tilde notation to indicate parameters in the forecast period.

##### (ii) Autoregression parameters

We used very similar models for the autoregression parameters but found that the absolute value of the SOI was useful as a predictor during the observational period. [Recall that their interval length of constant values was set at 2 yr (see also Smith et al. 2003).]

For month *t* in the observation period, assume

where the errors **e**_{ηj} are modeled in (B4). This model requires modification in the forecast period. Potential options include 1) obtain ensemble-based estimates of a SOI using the original climate model output; 2) produce a more intense statistical model that forecasts SOI: this is analogous to *parameterization* in numerical modeling; and 3) produce a simple statistical model, motivated by the SOI model, that makes no reference to SOI. We only pursue the third option here as described in appendix B.

##### (iii) Model error variances

Let **λ**_{i} = {ln(*σ*^{2}_{in}), ln(*σ*^{2}_{is})}′ (recall that these variances are assumed to vary on the same scale as the mean parameters). We assumed an autoregressive time series model:

Further details are given in appendix B.

### b. Results

Recall that the process model mean, autoregression, and variance parameters play a special role in our approach. We intend that they statistically control local-in-time climate and weather variations. Also, we seek to explain some of their variability using covariate information. Of particular importance in the forecast period is the behavior of the mean parameters modeled as regressions onto CO_{2} levels. Figure 1 presents temporal plots of 100 posterior realizations of the mean parameters for NH (Fig. 1a) and SH (Fig. 1b). In all cases, these plots indicate very small posterior variances; that is, the ensemble spread is tiny. We suggest that these plots strongly indicate the roles of CO_{2} concentrations in determining climate change. Intuitively, we have “filtered out” local variations, yielding comparatively low uncertainty indications of future climate.

Figure 2 shows analogous results for the autoregression parameters. Recall that the role of these parameters is to represent climatic variations occurring at moderate time scales (i.e., in the observational period, these parameters are modeled based on SOI). Note that their behavior is more variable in the observation period than in the forecast period. We expected this since SOI was available in the observation period; in fact, the level of variations in the forecast period was higher than we expected. This suggests that the models do produce some climate variations on this scale and that our model responded to them. Of course, we do not know if these variations will match future climate. We also note that variations depend on both the scenario used and the hemisphere considered.

Figure 3 presents the results for the process variances. Note that the variances are an order of magnitude higher in NH than in SH. We also see some general indications that the variances grow with CO_{2} in the forecast period. The growth of both the regression and variance parameters toward the latter half of the twenty-first century reflects the notion of increasing variability of medium-range temperature behavior and weather variations due to climate change. We remark that the comparative flatness of variations in these quantities in the early portion of the forecast period could be an artifact of our selection of period 2002–31 as the basis of our deseasonalization of the climate model output.

Table 1 summarizes posterior results for the intercept and slope of the regression of process model means on CO_{2} [see (53)]. Note that for both hemispheres, the slopes during the observation period are an order of magnitude larger than those for the forecast period for all three scenarios. Another trend in each case is that the NH slopes are larger than the SH slopes. Since the units for CO_{2} are parts per million by volume (ppmv), the units for the slopes are °C (ppmv)^{−1}. (To calibrate the results in Table 1, note that, for NH and scenario A1B, the expected increase in the mean of temperature arising from an increase in CO_{2} of 100 ppmv is 0.676° ± 0.013°C, that is, plus and minus one standard deviation.)

We next consider the data model parameters. For the observational period recall that we modeled an unknown changepoint *t*_{c} in the measurement error variance. Its posterior mean is at year 1898.6 with a standard deviation of 4.14 yr. The posterior mean (standard deviation) of the measurement error variance for both NH and SH observations before the changepoint are 0.0015 (0.0003). The corresponding values after the changepoint are 6.3585 × 10^{−5} (3.60 × 10^{−6}) for NH and 6.3510 × 10^{−5} (3.67 × 10^{−6}) for SH.

Tables 2 and 3 summarize posterior results for model biases and the model error covariance matrices. One interesting observation is that PCM appears to have more bias but less variance than CCSM. A second note is that the variability quantified through **Σ**_{β} appears to be significantly smaller than that quantified by **Σ**_{Y1} and **Σ**_{Y2}.

Next, we review temperature forecasting results. In Fig. 4, we have plotted 100 trajectories of NH annually averaged temperatures simulated from the posterior distribution. The plot includes results for all three scenarios. We have also plotted the temperature observations through 2002, but note that they are not separable from the posterior realizations at the scale of this graphic. Figure 5 presents the results for SH. The spread in the trajectories provides a visual indication of variability or uncertainty. We note that the forecasted trends and patterns of increases in temperatures, including dependence on the assumed scenario, are in rough agreement with those currently available at the IPCC Web site.

We now return to the issue mentioned in section 1 regarding potential overconfidence in posterior results. As a prelude, note that for fixed hemisphere, scenario, and model, the posterior standard deviations of the biases are decreasing as the lengths of epochs considered increase. Recalling that the priors were the same in all cases, a portion of this tendency is due to sample size, though the relevant sample size is not the number of ensemble members but rather the lengths of the epochs over which the biases are assumed to be constant. To indicate how such behavior can impact conclusions, Fig. 6 presents results analogous to those of Fig. 4 based on using the same model but with the assumption that the NH and SH biases are constant over the entire forecast period. Note how the basic form of these plots is similar, but toward the end of the century, the spread of the ensemble members in Fig. 6 is roughly one-half that in Fig. 4.

Our choice of five epochs and their lengths was ad hoc and chosen for illustration. We expect more epochs of shorter lengths would indicate even more uncertainty. On the other hand, as the epoch lengths become very small, the priors for the biases become critical and their selection becomes more challenging. However, at least in our case, transferring questions of climate change from actual temperatures to the parameters of the temperature probability distribution appears to have another advantage. Specifically, the inferences on the future means of temperatures as indicated in Fig. 1 remained virtually unchanged in both form and spread when the analysis is based on the assumption that the biases are constant over the entire forecast period.

### c. Computations

Exact computation of the posterior distribution is intractable in our case. All inferences presented above were obtained by Markov chain Monte Carlo (MCMC). This procedure simulates a specially constructed Markov chain whose stationary distribution coincides with the posterior distribution. After a burn-in or transience period, realizations of the chain are (approximately) identically distributed ensembles from the target posterior. They are then used to estimate posterior quantities of interest. The exact algorithm used is a fairly standard implementation of Gibbs’ sampling for time series problems. For brevity, we do not review this further here. Examples and discussion of MCMC in geophysical settings can be found in Berliner et al. (2003b), KB, and Wikle et al. (1998); see Gelman et al. (1995) for a general introduction.

One technical issue is mentioned. Though the MCMC samples are from the posterior distribution of interest, they are correlated, leading to impacts on estimation much like those discussed in section 2a. This is why estimation is based on such seemingly large MCMC ensemble sizes of 5000 iterations after burn-in times conservatively set to 10 000. The 100 trajectories plotted in each of the figures were based on every 50th of our 5000-member ensembles from the posterior.
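The burn-in and thinning arithmetic described above, as a quick sketch:

```python
# Bookkeeping for the MCMC output: discard the burn-in, retain 5000 draws
# for estimation, and plot every 50th retained draw.
burn_in = 10_000     # iterations discarded (conservative transience period)
kept = 5_000         # post-burn-in draws used for estimation
thin = 50            # spacing of the plotted trajectories

retained = list(range(burn_in, burn_in + kept))  # iteration indices kept
plotted = retained[::thin]                       # indices of plotted draws

print(len(retained), len(plotted))  # 5000 100
```

Thinning here is purely for plotting legibility; the full retained sample is what enters the posterior estimates, with its autocorrelation accounted for as discussed in section 2a.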

## 5. Review and discussion

### a. Model output as data

A key feature of our statistical modeling of climate model output as described in section 2 is allowance for model-specific uncertainties regarding the means, biases, and covariance properties of superensemble members. The crucial step to enable uncertainty analysis of superensembles is viewing the model-specific means as random variables with common mean **β**. Further uncertainty is reflected in construction of a distribution for **β** conditional on the true state of the climate variables of interest. We investigated implications of the models in inferring climate variables. Formally, this involved inspection of the marginal distribution of superensemble data conditional on the true climate state. A key result was that ensemble members are correlated in this marginal distribution. The correlation structures implied various limitations on our ability to infer the true climate states, even in the presence of large ensemble sizes.

The data modeling strategy can be generalized in several directions. As mentioned in section 2a, additional sources of model error and ensemble variation can be modeled. Next, though we illustrated the reasoning using Gaussian distributions, the strategy can be implemented with more general models and distributions. We used Gaussian distributions because they are familiar and led to comparatively simple calculations. Further, their results are readily amenable to our discussion of design of superensembles in section 3. Our message, however, is the strategy itself.

A second direction for generalization involves consideration of different model classes. In section 2, all models were assumed to be similar in the sense that we used a single **β** as a common mean. As an alternative example, suppose we consider two model classes: class A models might be expensive, large-scale climate system models such as PCM and CCSM, etc. Suppose class B models are comparatively cheap and simpler climate models, for example, energy balance models; statistical dynamic models, in the sense of climate scientists (see McGuffie and Henderson-Sellers 1997); or statistical models in the sense of section 4. This suggests introduction of model class means, say **β**_{A} and **β**_{B}, and associated model class covariances. In particular, we probably expect **Σ**^{A}_{β} to be “smaller” than **Σ**^{B}_{β}. Note that classes A and B can be further partitioned into subclasses with subclass-specific distributions.

As discussed at the end of section 4b, a critical issue involves the assumed temporal structure of data models. We did not pursue this aspect in more detail in our example because of the small number of ensembles and because the ensembles for each SRES scenario did not show any appreciable spread in time. Hence, we could not meaningfully separate temporal variations in bias from variations in variance without more data and/or very strong prior information dictating the form of such variations.

Temporal structure of the models is also important in designing superensembles targeted for inference of temporal trajectories of climate. Formal consideration of time-varying data model parameters can lead to high-dimensional, complex covariances matrices to be modeled and optimized, even if we wish to infer low-dimensional climate summaries. The difficulties may be somewhat lessened if we can model the requisite covariance matrices as simple functions of time (e.g., an unknown univariate function of time multiplying an unknown but constant matrix). In the case of large climate models, allowance for unknown covariances with arbitrary temporal evolution is problematic due to anticipated small ensemble sizes.

### b. Design

Section 3 provided a formalization of the problem of selecting models and ensemble sizes for the generation of superensembles. The formalization involved optimization of decision theoretic criteria based on losses accruing from errors in inference and experimental costs. Our illustrations focused on inferences judged by quadratic inference loss and hence relied on generalized least squares (GLS) estimation. Computational results were reviewed for the case of a univariate climate variable. We reiterate that these issues are also of interest in other contexts (e.g., numerical weather forecasting).

Our analyses in section 3c offer intuition regarding the behavior of optimal designs. More general treatment in the context of multivariate climate variables involves more intense computational challenges, but not more intense conceptualization. Though we do not envision application of the approach in the case of millions of variables, cases involving hundreds can be addressed. Optimal designs can be computed relatively cheaply, at the least by numerical search. Furthermore, suboptimal but comparatively “good” designs may be obtained even more easily.

As posed here, design required specification of various data model covariance matrices and information regarding model biases. Selection of these quantities can be based on past model assessments. When feasible, study of the sensitivity of optimal designs with respect to those specifications is recommended. Also, since analyses such as those presented in section 4 produce posterior distributions of these covariances and biases, they can be viewed as preliminary studies to develop information about important model behaviors. Indeed, using either our simple model or some other simple model, the archived output from many models widely available to the climate community can be used to gain some information about the data model parameters. Such information seems very relevant and of interest to climate forecasters. Finally, recall that we recommended severe reduction in the dimension of **X** from that of the dimension of climate system models. This enables both formal analysis combining information sources as well as problem-specific optimal design. On the other hand, generation of ensemble members from massive models remains too expensive to imagine running different ensembles for every interesting climate and climate change impacts study. We suggest that community superensembles from class A models may be generated optimally for some selection of **X** thought to be robust for many potential uses. With our techniques, users with special problems and needs can then design an “experiment” consisting of samples from this (or any) archive, augmented by superensembles from class B models designed to depend on specific goals.

### c. Overlapping observations and model output

In principle, improved estimation of model output data models and their parameters can be obtained by *overlapping* observation data times and model output times. In some cases, this is easy in our framework, at least under the assumption that the observations and model outputs are conditionally independent. Furthermore, running the Bayesian model without overlap and seeing how well the results match observations, up to measurement error, can serve as a model diagnostic. However, in our specific setting, the real observations correspond to a climate under the influence of true forcings, while model output was generated under SRES scenarios. Hence, differences occurring in an overlap period may be due to both model error and forcing variations. More intense modeling would be required to try to separate these effects.

### d. Concluding remarks

The Bayesian climate forecasting approach outlined at the outset of section 4 serves as a target for analysis. Two key points merit discussion. First, our forecast distribution (45) is actually the marginal predictive distribution obtained formally by integrating the full posterior

[**X**^{f}, **X**^{p}|**Y**^{p}, **Y**^{f}], (56)

with respect to **X**^{p}. Note that (56) means that under the model, **Y**^{f} is informative about **X**^{p}, as in backcasting or reanalysis. However, using climate model output generated from different CO_{2} scenarios to infer **X**^{p} makes little sense to us. In our example, the posterior distribution of the past was not affected by **Y**^{f} computed under any of the scenarios, in large part due to our allowing the process model parameters and their priors to be different in the observation and forecast periods.

The second key point is that, since the climate model output is treated as data, the Bayesian approach relies on specification of a prior process model [**X**^{f}], or in our case, [**X**^{f}|**X**^{p}] since we use past data. This means that results may vary with selection of these priors. A related issue in the multimodel context is selection of which model to use as the prior as opposed to a data source. One potential guide in this selection involves the plausibilities of the models as stand-alone climate forecasters. For example, we used our simple model because it offers a flexible structure with parameters that can be readily influenced by data and related to covariates of interest. However, we would not propose generation of ensembles of output from the forecast distribution [**X**^{f}|**Y**^{p}] based on our simple model.

This topic introduces an element of subjectivity to the strategy, but this has already arisen in our data modeling. For this reason, it may be tempting to suggest that climate forecasters avoid explicit methods, such as the strategy suggested here. We counter that simple use of ensemble means and variances to construct forecasts and forecast error quantifications or confidence intervals coincides with implicit data models, which assume the ensemble members are independent, identically distributed (and unbiased) observations of climate.

A related suggestion to manage the issue of selection of a process model prior is to maintain the data modeling much as presented here, but to use an *uninformative*, typically uniform, process prior for **X*** ^{f}*. When this is implementable, the results would be useful benchmarks. Indeed, we used uninformative priors on some model parameters introduced in section 4 (see appendix B). Further, the formulas for GLS estimation presented in section 3 [e.g., (25), (28)] are Bayesian posterior means and covariances for climate based on such uniform priors. However, their implementation requires either knowledge and/or estimation of all data model covariances. Hence, one either assumes their values (objectively?) or faces the additional problems, including loss of desirable properties of GLS and accounting for uncertainty due to covariance estimation.

An argument for maintaining informed priors for **X**^{f} involves applications to forecasting impacts of climate change. Suppose we are really interested in some quantification, say **R**^{f}, of some system (e.g., biological, ecological, agricultural, or regional climatic) that depends in part on the climate variables **X**^{f}. The object of interest is then

[**R**^{f}|**Y**^{p}, **Y**^{f}] = ∫ [**R**^{f}|**X**^{f}][**X**^{f}|**Y**^{p}, **Y**^{f}] *d***X**^{f},

where [**R**^{f}|**X**^{f}] is a probability model for the impacts of interest conditional on climate variables. Typically, this distribution would be a parameterized model, developed in part much like our development of [**X**^{f}|**X**^{p}] and [**X**^{p}]. Success may well require the best, most informed modeling of all stages of such a complex hierarchical model. We also note that formal study of the impact of prior assumptions is an area of research in Bayesian statistics. For discussion of *robust Bayesian analysis* in a climate analysis setting, see Berliner et al. (2000a) and see Berger (1985, section 4.7) for a general introduction.

In summary, our views regarding objectivity versus subjectivity in the context of forecasting climate and climate impacts are (i) the use of uninformative priors offers useful benchmark analyses, though total objectivity is not achievable; (ii) informed analyses may be more efficient than their uninformed counterparts in combining science, model output, and observational data; and (iii) objectivity does not need to correspond to plausibility.

## Acknowledgments

LMB and YK were supported by the NSF under Grant DMS-01-39897. YK was also partially supported by the Department of Statistics, The Ohio State University. We are grateful to Dr. Claudia Tebaldi at NCAR for providing the climate model output. Excellent comments and critiques from two reviewers led to substantial improvements in this article.

## REFERENCES


### APPENDIX A

#### Derivation of (28)

Our derivation of (28) relies on a result from linear algebra.

##### Lemma

If 𝗔 is a *dN* × *dN* invertible matrix, and 𝗕 is a *dN* × *d* matrix, then

Note that

where

and, similarly,

Hence, we can apply the lemma to calculate the inverses of each **A**_{m} and **Σ**. After some algebra we obtain

where

Next,

where

where

Finally,

### APPENDIX B

#### Parameter Models

##### Data model parameters

###### Observation error variances

All four measurement error variances in (46) were assigned independent inverse gamma priors (e.g., Berger 1985, p. 561). Inverse gamma distributions can be chosen to match one’s prior mean and variance and are also convenient for calculations. We assumed prior mean (standard deviation) 0.0303 (0.0003) for both *σ*^{2}_{0n} and *σ*^{2}_{0s} and prior mean (standard deviation) 0.0152 (0.0002) for both *σ*^{2}_{1n} and *σ*^{2}_{1s}.
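The moment matching referred to here can be done in closed form. For an inverse gamma distribution with shape *a* and scale *b*, the mean is *b*/(*a* − 1) (for *a* > 1) and the variance is *b*^{2}/[(*a* − 1)^{2}(*a* − 2)] (for *a* > 2), which gives *a* = (*m*/*s*)^{2} + 2 and *b* = *m*(*a* − 1) for a target mean *m* and standard deviation *s*. A sketch:

```python
def inverse_gamma_params(m, s):
    """Shape a and scale b of an inverse gamma with mean m and std dev s."""
    a = (m / s) ** 2 + 2.0
    b = m * (a - 1.0)
    return a, b

# The two prior specifications quoted above.
for m, s in [(0.0303, 0.0003), (0.0152, 0.0002)]:
    a, b = inverse_gamma_params(m, s)
    # Round-trip check: recover the requested mean and standard deviation.
    mean = b / (a - 1.0)
    sd = (b ** 2 / ((a - 1.0) ** 2 * (a - 2.0))) ** 0.5
    print(round(mean, 4), round(sd, 4))
```

Substituting back confirms the algebra: the variance reduces to *m*^{2}/(*a* − 2) = *s*^{2}.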

###### Changepoint in observation error variances

We assumed that *t _{c}* is a Gaussian random variable with mean corresponding to January 1900 and standard deviation corresponding to 7 yr.

###### Model error covariances

We use a *d*-dimensional generalization of the inverse gamma known as the inverse Wishart distribution (e.g., Box and Tiao 1973, section 8.5.1). A random *d* × *d* covariance matrix **Σ** has an inverse Wishart prior, denoted *W*^{−1} (**Σ**_{0}, *υ*), if its prior density *π*(·) is

where **Σ**_{0} is a fixed, positive definite covariance matrix. For *ν* > 2, the mean of this prior is *ν*(*ν* − 2)^{−1}**Σ**_{0}. The *degrees of freedom* parameter *ν* controls the spread of the distribution. For *ν* ≤ 2 the density is too diffuse for the mean of the distribution to exist; for *ν* a little greater than 2, the density is relatively diffuse, and as *ν* grows it becomes increasingly concentrated about the mean.

For our prior, we assume that **Σ**_{Y1} and **Σ**_{Y2} are independent realizations from a *W*^{−1}(**Σ**_{0}, *ν* = 50) distribution, where **Σ**_{0} = diag(2, 1). We assume that **Σ**_{β} is independently distributed from a *W*^{−1}(**Σ**_{0β}, *ν* = 10) distribution, with **Σ**_{0β} = diag(2, 1). Table 2 indicates that these priors were completely overwhelmed by the data.

##### Process model parameters

###### Mean parameters

Recalling (53) we assumed **α**_{i} = **a**_{α} + *C*_{i}**b**_{α} + **e**_{αi}, where the errors are modeled as an autoregressive time series:

where

As for **a**_{α} and **b**_{α}, we allow for *ϕ*_{αn}, *ϕ*_{αs}, and **Σ**_{α} to be different in the observation and forecast periods. We use tilde notation to indicate these parameters in the forecast period. We assumed that **a**_{α}, **b**_{α}, **ã**_{α}, and **b̃**_{α} have diffuse, uniform priors; *ϕ*_{αn}, *ϕ*_{αs}, *ϕ̃*_{αn}, and *ϕ̃*_{αs} are independent and uniformly distributed on the interval (−1, 1); and **Σ**_{α} and **Σ̃**_{α} are independent and have *W*^{−1}(**Σ**_{0} = **I**_{2}, *ν* = 0.001) distributions; recall that with *ν* this small, the distribution is very diffuse.

###### Autoregression parameters

For month *t* in the observation period, we assume [see (54)] **η**_{j} = **a**_{η} + |*S*_{j}|**b**_{η} + **e**_{ηj}, where

and

In the forecast period, we replace (54) by

where

Discussion of how this model can be related to (54) is given in KB. We assumed that

**a**_{η}, **b**_{η}, **ã**_{η}, and **b̃**_{η} have diffuse, uniform priors; *ϕ*_{ηn}, *ϕ*_{ηs}, *ϕ̃*_{ηn}, and *ϕ̃*_{ηs} are independent and uniformly distributed on the interval (−1, 1); and **Σ**_{η} and **Σ̃**_{η} are independent and have *W*^{−1}(**Σ**_{0} = 𝗜_{2}, *ν* = 0.001) distributions.

###### Model error variances

Recall the autoregressive time series model (55):

We assume that

As above, we assumed the same model for both periods but with different parameters, and we used vague priors analogous to those for the mean parameter model. Table 3 summarizes posterior results for these parameters.

## Footnotes

*Corresponding author address:* L. Mark Berliner, Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210-1247. Email: mb@stat.osu.edu