Abstract

The authors develop statistical data models to combine ensembles from multiple climate models in a fashion that accounts for uncertainty. This formulation enables treatment of model specific means, biases, and covariance matrices of the ensembles. In addition, the authors model the uncertainty in using computer model results to estimate true states of nature. Based on these models and principles of decision making in the presence of uncertainty, this paper poses the problem of superensemble experimental design in a quantitative fashion. Simple examples of the resulting optimal designs are presented. The authors also provide a Bayesian climate modeling and forecasting analysis. The climate variables of interest are Northern and Southern Hemispheric monthly averaged surface temperatures. A Bayesian hierarchical model for these quantities is constructed, including time-varying parameters that are modeled as random variables with distributions depending in part on atmospheric CO2 levels. This allows the authors to do Bayesian forecasting of temperatures under different Special Report on Emissions Scenarios (SRES). These forecasts are based on Bayesian posterior distributions of the unknowns conditional on observational data for 1882–2001 and climate system model output for 2002–97. The latter dataset is a small superensemble from the Parallel Climate Model (PCM) and the Community Climate System Model (CCSM). After summarizing the results, the paper concludes with discussion of potential generalizations of the authors’ strategies.

1. Introduction

Quantification and treatment of uncertainty in climate forecasting and assessments of impacts of climate change are critical problems. Discussions of the issues can be found in various Intergovernmental Panel on Climate Change (IPCC) publications (also see Allen et al. 2000; Reilly et al. 2001; Webster 2003). Among the many important challenges is the extraction of information from ensembles of climate model runs from multiple climate models in a fashion that enables users to provide uncertainty estimates [see Krishnamurti et al. (1999) for discussion of the use of such “superensembles”].

Bayesian approaches to climate change analysis are receiving increasingly significant attention (e.g., see Berliner et al. 2000a; Moss and Schneider 2000; Tebaldi et al. 2005). For brevity, we defer to those references and general discussions of Bayesian statistics (Berger 1985; Bernardo and Smith 1994) for discussion and motivation of the Bayesian approach in climate science and decision making.

Primary aspects of the strategies presented here are

  1. treatment of climate model output as data (after developing the appropriate machinery, we proceed as if ensembles of model output are observations);

  2. experimental design of superensembles;

  3. focusing on selected aspects of climate as dictated by the forecasting problem at hand; and

  4. modeling the role of influences on the climate system (i.e., atmospheric CO2) indirectly as effects on the parameters of stochastic climate models.

The design and analysis of computer experiments are subjects of substantial interest and research. To bring context to the first two aspects above, we offer a simplified and specialized perspective of the area. Santner et al. (2003) provide in-depth discussion and review. Most approaches are Bayesian or have a Bayesian flavor in that they involve the representation of and operation on uncertainty in probabilistic manners. Let X denote the unknowns of interest, in our case, the true state of the future climate variables under study, and let Y be the potential climate model output we may generate. Assume that both are to be modeled as random and arising from some joint probability distribution, say [x, y]. Probability theory dictates that such a distribution admits two representations:

 
[x, y] = [x | y][y] = [y | x][x],    (1)

where, for example, [x|y] is the conditional distribution of X given Y = y and [y] is the marginal distribution of Y. (Note that the equality [x|y][y] = [y|x][x] is actually a statement of Bayes’ theorem.) Typically, a variety of parameters are introduced in building these distributions and past data are incorporated, but we ignore these aspects for now.
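As a quick numerical illustration (ours, not the paper’s), the following Python snippet builds a small, invented discrete joint distribution for (X, Y) and checks that both factorizations in (1) recover it.

```python
# A minimal check of the two factorizations in (1) on an invented
# 3 x 2 discrete joint distribution for (X, Y).
import numpy as np

joint = np.array([[0.10, 0.05],   # rows index x, columns index y
                  [0.30, 0.15],
                  [0.20, 0.20]])

p_y = joint.sum(axis=0)           # marginal [y]
p_x = joint.sum(axis=1)           # marginal [x]
x_given_y = joint / p_y           # [x|y]: each column sums to one
y_given_x = joint / p_x[:, None]  # [y|x]: each row sums to one

# Both factorizations recover the joint -- a statement of Bayes' theorem.
assert np.allclose(x_given_y * p_y, joint)
assert np.allclose(y_given_x * p_x[:, None], joint)
```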

The lion’s share of the statistical literature in this area involves the development of [y] as a function of selected model inputs (e.g., initial conditions, parameters, forcings, etc.). Based on the results of some model runs with specified inputs, inferences are drawn for model output corresponding to other inputs. Note that this has nothing to do with the values of X. It remains to construct a probability model [x|y]. This step is much like the usual development of prior distributions in Bayesian approaches. Indeed, one can in principle skip the step of producing [y] and directly construct a predictive model for X using “prior” information, including model output, and scientific judgment.

The suggestion in this article focuses on the second factorization of the joint distribution in (1). That is, we formulate a data model [y|x] and a prior distribution [x]. Once accomplished, we apply Bayes’ theorem to produce the posterior distribution [x | y^o], where y^o is the actual model output we obtain. This distribution serves as the basis for inference and forecasting.

It is natural to ask if one approach is better than the other. The answer depends on how well we can formulate and assess the distributions used and, hence, is surely problem dependent. Computational issues and ease in interpretations are also relevant. In either case, assessments of models and their ability to simulate observable features of climate would obviously play a role, as do beliefs regarding the climate and the qualities of our models for it. However, in our forecasting context, natural approaches to model assessment and calibration based on observations are not directly available since we have no physical observations of future climate under CO2 Special Report on Emissions Scenarios (SRES). Hence, even the cleverest of model–observation comparison and assessment procedures leave uncertainty regarding model-based forecasting.

The use of conditional models [x|y] or [y|x] does not indicate a belief in causal relationships. The future climate is not impacted by our model computations (at least ignoring the possibility that model output is used to convince policy makers to act in some fashion, which modifies forcings and thereby impacts future climate). Similarly, our model computations are not determined by the future climate. In more familiar applications of Bayesian updating based on “real” observations, we compute the conditional distribution of the unknowns given those observations, but with no suggestion that those observations cause the unknowns to behave in a fashion consistent with that conditional distribution.

We conclude this portion of the discussion with the following perspective, or perhaps warning, regarding the strategy described here. The enterprise of climate forecasting under a fixed emissions scenario can be viewed as answering a “what if” question. Namely, what do we think the climate will be if we assume the given scenario obtains? Our treatment of climate model output as data essentially answers the following question: “Suppose it is the final time of the forecast period and the assumed emissions scenario obtained. If we observed the climate model output as real data, what is our reconstruction of the climate?” We do not really believe that climate model output is as informative about the future climate as observations will be, but acting as if it is can lead to overconfidence in the results unless care is taken. We illustrate and clarify the issue in section 4. Further discussion of issues, suggestions, and references on modeling model output can be found in Bowman et al. (1993), Goldstein and Rougier (2005, 2006), Higdon et al. (2005), and Kennedy and O’Hagan (2001).

The third and fourth aspects above also merit clarification. The “climate” as represented in large-scale, coupled models is gigantic in scope and dimension. We believe that specifying uncertainty in model results taken in their entirety is intractable. For example, specifying, estimating, and then using the dependence structure (e.g., covariance matrix) of a random vector of dimension on the order of tens of millions is prohibitively complicated. For similar reasons, communication of results of climate change studies focuses on features of the climate, such as sea level, suitably averaged surface temperature, etc. Also, studies regarding local and regional impacts typically require information on some subset of the climate. These points suggest that it is the information regarding the behavior of dimension-reduced reflections of the climate that we are able to use, understand, and communicate to scientists, policy makers, and the general public. In this article forecasts of low-dimensional functions of climate variables are the target of our uncertainty analysis. This means we do not suggest a single model and associated uncertainties that serve to answer all questions. Rather, the analysis may vary with the goals. We return to this issue in section 5.

Our treatments regarding aspect 4 arise in our modeling strategy. We model climate variables as evolving according to a multiscale stochastic process. These models contain collections of parameters changing on differing scales. Our strategy uses covariate information (e.g., atmospheric CO2) to model the values and space–time evolution of such parameters, as opposed to modeling climate variables directly. In part our motivation is the familiar issue of attributing local-in-time variations (e.g., “weather”) to human-induced climate change. We redefine the problem as one involving influences on the parameters of probability distributions as opposed to one involving local-in-time variations of realizations from these distributions (Berliner 2003).

Our strategy relies on Bayesian hierarchical modeling. The modifier hierarchical refers to formulating the probability models for collections of variables in stages as component distributions that imply a full distribution; (1) is a basic example. In our illustration, the climate variables of interest are hemispheric, monthly averaged surface temperatures. This setting has received considerable attention: see Kaufmann and Stern (1997), Smith et al. (2003), and Wigley et al. (1998). We consider two models, the Parallel Climate Model (PCM; Washington et al. 2000) and the Community Climate System Model (CCSM; Dai et al. 2001), and three of the familiar SRES scenarios. Separate forecasting analyses are performed for each scenario.

Section 2 presents a Bayesian formulation of a data model for combining superensembles. Implications of that data model to the design of superensemble experiments are reviewed in section 3. By design, we mean the selections of models and individual ensemble sizes used to create a superensemble. Such selections typically involve compromises between model quality and the allocation of computational resources. The issues are relevant in many contexts (e.g., weather forecasting). We have arranged the presentation so readers more interested in the modeling and example can skip section 3 with no loss in continuity.

In section 4 we develop and analyze models for hemispheric surface temperatures. We combine observational data from the period 1882–2001 with climate model output relevant to the period 2002–97. This exemplifies formal combination of observational data and climate model output into an analysis that accounts for uncertainty in each information source. Section 5 offers discussion of further issues, potential applications, and extensions.

2. Data model

a. Derivation: Univariate case

We first develop a data model for a scalar climate variable X at a fixed time (dependence on that time is suppressed in the notation). The theory is described first, followed by some discussion of the implications of the model.

Consider output from M climate models all run based on the same emissions scenario. For each model m, m = 1, . . . , M, assume that an ensemble of size n_m is available, and let Y_m denote the n_m vector of ensemble-member-derived values corresponding to X from model m. For example, if X is globally averaged surface temperature, then elements of each Y_m are average surface temperatures computed from ensemble members.

Notation used here includes

  • Y|X, Σ ∼ Gau(X, Σ) is read “the conditional distribution of Y given X and Σ is Gaussian (multivariate normal) with mean X and covariance matrix Σ”;

  • 1_k is a k vector whose entries are all equal to one; and

  • 𝗜_k is the k × k identity matrix.

Finally, we point out common but potentially confusing conventions. In writing a stage of a hierarchical model, say Y|X, θ, if one assumes that given X and θ, Y is independent of θ, it is standard to simply write Y|X. This does not mean Y is unconditionally independent of θ. Second, quantities assumed known throughout the modeling are typically not indicated in the conditioning.

We describe a two-stage hierarchical model for the climate model output. Each stage encapsulates different forms of uncertainty. The role of the Gaussian assumptions is discussed later.

In stage 1, we suppose that the means of the outputs of the models contain some common quantity β relevant to the true climate and model-dependent biases. Let b be the M vector of the biases:

 
b = (b_1, . . . , b_M)′.    (2)

We assume that given β and b, Y_1, . . . , Y_M are conditionally independent and for each m

 
Y_m | β, b ∼ Gau{(β + b_m)1_{n_m}, σ²_{Y_m}𝗜_{n_m}}.    (3)

Note that ensemble members within a model and across models are all assumed to be conditionally independent given β and b. The variances σ²_{Y_m} are model dependent. For example, they quantify variations around the means that arise due to differing initial and boundary conditions.

In stage 2, we assume that β and b are conditionally independent given the true climate value X and have Gaussian distributions

 
β | X ∼ Gau(X, σ²_β)    (4)

and

 
b ∼ Gau{μ, diag(σ²_{b_1}, . . . , σ²_{b_M})}.    (5)

The elements of the prior mean vector

 
μ = (μ_1, . . . , μ_M)′    (6)

are user-specified prior guesses at the model biases, perhaps based on previous model verification studies. Note that the prior model for the biases is assumed not to depend on X. The assumption that the biases are uncorrelated can be relaxed easily.
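To make the two-stage hierarchy concrete, here is a minimal Python sketch that simulates a superensemble from (3)–(5) for a scalar X; the true value, variances, prior bias means, and ensemble sizes are all invented for illustration.

```python
# Simulate the two-stage hierarchy of section 2a; all settings invented.
import numpy as np

rng = np.random.default_rng(0)

X = 0.5                                 # true climate value
sigma2_beta = 0.2                       # var of beta about X, Eq. (4)
mu = np.array([0.3, -0.1])              # prior mean biases, Eq. (6)
sigma2_b = np.array([0.1, 0.4])         # bias variances, Eq. (5)
sigma2_Y = np.array([1.0, 2.0])         # within-model variances, Eq. (3)
n = [4, 1]                              # ensemble sizes

# Stage 2: common mean and model-specific biases.
beta = rng.normal(X, np.sqrt(sigma2_beta))
b = rng.normal(mu, np.sqrt(sigma2_b))

# Stage 1: conditionally independent ensemble members given beta and b.
Y = [rng.normal(beta + b[m], np.sqrt(sigma2_Y[m]), size=n[m])
     for m in range(len(n))]
print(Y)
```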

Probability theory can now be applied to extract the implied model for the data as functions of X (see Berger 1985, section 4.6). By multiplying the density functions implicitly defined in stages 1 and 2 and integrating out β, we obtain the marginal distribution of Y_1, . . . , Y_M conditional on X and b:

 
(Y′_1, . . . , Y′_M)′ | X, b ∼ Gau(μ_Y, Σ_Y), with μ_Y = {(X + b_1)1′_{n_1}, . . . , (X + b_M)1′_{n_M}}′ and Σ_Y composed of diagonal blocks Σ_m and off-diagonal blocks 𝗖_{mm′},    (7)

where 𝗖_{mm′} is an n_m × n_m′ matrix whose entries are all equal to σ²_β and

 
Σ_m = σ²_{Y_m}𝗜_{n_m} + σ²_β1_{n_m}1′_{n_m}.    (8)

Similarly, we can integrate out b based on (5), yielding

 
(Y′_1, . . . , Y′_M)′ | X ∼ Gau(μ̃_Y, Σ̃_Y), with μ̃_Y = {(X + μ_1)1′_{n_1}, . . . , (X + μ_M)1′_{n_M}}′ and Σ̃_Y of the same block form,    (9)

where

 
Σ̃_m = σ²_{Y_m}𝗜_{n_m} + (σ²_β + σ²_{b_m})1_{n_m}1′_{n_m}  and  𝗖_{mm′} = σ²_β1_{n_m}1′_{n_m′}.    (10)

1) Remarks

If we drop all Gaussian assumptions and relax all conditional independence assumptions to conditionally uncorrelated assertions, the means and covariances given in (7) and (9) remain unchanged. This can be argued based on a random effects representation of the steps above. Letting all e’s indicated below be uncorrelated, zero-mean random errors, we have that ensemble member k from model m arises as

 
Y_mk = β + b_m + e_{m,k},  var(e_{m,k}) = σ²_{Y_m};    (11)
 
b_m = μ_m + e_{b_m},  var(e_{b_m}) = σ²_{b_m};    (12)
 
β = X + e_β,  var(e_β) = σ²_β.    (13)

Computation of means, variances, and covariances can then be done by applying their formal definitions. This approach may seem simpler than the full probabilistic arguments above. We emphasized the probability because (i) we use the hierarchy later, and (ii) the formal arguments can be applied in more complex settings to include nonlinearity, nonadditivity of errors, etc., though the random effects analogs may be intractable.

Finally, we view the model stages described here as examples. More informed models can be formulated to depend on how the ensemble members are generated. For example, suppose ensemble members are designed in a “two-way layout” with initial condition serving as one factor and some model parameterization or model physics serving as a second. We can readily write a model that includes different biases for different model parameterizations, say b_{j(m)}, j = 1, . . . , J_m, where J_m is the number of model parameterizations used in model m. In general, we can entertain the sorts of models found in the statistical analysis of variance and regression literatures (e.g., Ravishanker and Dey 2002), followed by assigning priors on the modeled effects.

2) Implications and issues

Implications of (7) and (9) include the indicated covariances between any two ensemble members from model m, as well as covariances between ensemble members from different models. These covariances arise despite the fact that no models in the hierarchy included any nonzero covariances. The common covariance σ²_β between any two ensemble members from different models [see 𝗖_{mm′} in (7) and (9)] accrues mathematically from integrating out the one-dimensional quantity β. Intuitively, it arises because an unobservable “error” β − X is present in every ensemble member. Similarly, the further integration yielding (10) adds σ²_{b_m} to the covariances between ensemble members from the same model because the additional error b_m − μ_m is present in each of those ensemble members.

As we justify later, a key implication of (4) is that not even an infinite ensemble from a climate model can provide an exact climate forecast. The fundamental issue is our inclusion of β and its relationship to X. This idea is best indicated by comparison. If one simply ignores these quantities and averages ensemble members to obtain forecasts and uses sample variances to indicate uncertainty, our claim is that one is estimating β + b_m and σ²_{Y_m} [see (11)]. Indeed, an “infinite” ensemble from model m can provide the value of β + μ_m, but we claim that the relationship of that number to X is the subject of the further uncertainty modeled in (13).
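These covariance implications are easy to confirm by simulation. In the Python sketch below (illustrative parameter values only), the empirical covariance between two members of the same model approaches σ²_β + σ²_{b_m}, while that between members of different models approaches σ²_β.

```python
# Monte Carlo check of the covariances implied by (9)-(10); the
# variances and ensemble sizes are invented test values.
import numpy as np

rng = np.random.default_rng(1)
X, sigma2_beta = 0.0, 0.2
mu = np.array([0.0, 0.0])
sigma2_b = np.array([0.1, 0.4])
sigma2_Y = np.array([1.0, 2.0])
n = np.array([3, 2])                  # members 0-2: model 1; 3-4: model 2
reps = 200_000

beta = rng.normal(X, np.sqrt(sigma2_beta), size=reps)        # Eq. (4)
b = rng.normal(mu, np.sqrt(sigma2_b), size=(reps, 2))        # Eq. (5)
means = np.repeat(beta[:, None] + b, n, axis=1)              # beta + b_m
Y = rng.normal(means, np.repeat(np.sqrt(sigma2_Y), n))       # Eq. (3)

C = np.cov(Y, rowvar=False)
print(C[0, 1])   # within model 1: approx sigma2_beta + sigma2_b[0] = 0.30
print(C[0, 3])   # across models:  approx sigma2_beta = 0.20
```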

b. Multiple climate variables

We assume that the derived climate variables of interest form a d-dimensional vector X and that, at a fixed time, each ensemble member is a d-dimensional vector (as before, these are computed from the actual model output). The basic structure of (9) carries over to the multivariate case. For brevity, we present only the results here, using the following notation. Let 𝗔 and 𝗕 be matrices of dimensions p × q and r × s, respectively. Their Kronecker product 𝗔 ⊗ 𝗕 is the pr × qs matrix (a_{ij}𝗕). For a collection of n, d-dimensional vectors, let vec(Y) denote the (n × d)-dimensional vector obtained by stringing out the n vectors.

An ensemble of size n_m from model m is a collection of n_m d-dimensional vectors, combined to produce a (d × n_m)-dimensional vector vec(Y_m). These vectors are similarly combined, yielding a (d × N)-dimensional data vector vec(Y), where

 
N = n_1 + · · · + n_M.    (14)

We consider multivariate analogs of (3), (4), and (5). Define d × d covariance matrices Σ_β, Σ_{b_m}, and Σ_{Y_m} to be multivariate extensions of σ²_β, σ²_{b_m}, and σ²_{Y_m}, respectively. Also, let b_m be the d vector of biases for model m, with prior means μ_m. Finally, using the same conditional independence assumptions, we obtain

 
vec(Y_m) | β, b ∼ Gau{1_{n_m} ⊗ (β + b_m), 𝗜_{n_m} ⊗ Σ_{Y_m}}    (15)

and

 
β | X ∼ Gau(X, Σ_β)  and, independently,  b_m ∼ Gau(μ_m, Σ_{b_m}), m = 1, . . . , M.

The multivariate generalization of (9) is

 
vec(Y) | X ∼ Gau(μ_Y, Σ), with μ_Y = [{1_{n_1} ⊗ (X + μ_1)}′, . . . , {1_{n_M} ⊗ (X + μ_M)}′]′ and Σ composed of diagonal blocks Σ_m and off-diagonal blocks 𝗖_{mm′},    (16)

where

  • Σ_m is (dn_m × dn_m) and given by 
    Σ_m = 𝗜_{n_m} ⊗ Σ_{Y_m} + (1_{n_m}1′_{n_m}) ⊗ (Σ_β + Σ_{b_m}); and
  • 𝗖_{mm′} is the (dn_m × dn_m′) matrix (1_{n_m}1′_{n_m′}) ⊗ Σ_β. (Both kinds of blocks are assembled numerically in the sketch below.)
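The following Python sketch assembles the (dN × dN) covariance matrix of (16) from these blocks with Kronecker products; the dimensions, ensemble sizes, and covariance values are invented inputs.

```python
# Assemble the block covariance of vec(Y) in (16); invented test inputs.
import numpy as np

d = 2
n = [4, 1]                                   # ensemble sizes, N = 5
Sigma_beta = np.array([[0.20, 0.05], [0.05, 0.30]])
Sigma_b = [np.diag([0.10, 0.10]), np.diag([0.40, 0.50])]
Sigma_Y = [np.eye(d), 2.0 * np.eye(d)]

blocks = []
for m in range(len(n)):
    row = []
    for mp in range(len(n)):
        ones = np.ones((n[m], n[mp]))
        if m == mp:
            # Sigma_m = I_{n_m} (x) Sigma_Ym + (1 1') (x) (Sigma_beta + Sigma_bm)
            row.append(np.kron(np.eye(n[m]), Sigma_Y[m])
                       + np.kron(ones, Sigma_beta + Sigma_b[m]))
        else:
            # C_mm' = (1_{n_m} 1'_{n_m'}) (x) Sigma_beta
            row.append(np.kron(ones, Sigma_beta))
    blocks.append(row)

Sigma = np.block(blocks)
print(Sigma.shape)                           # (10, 10) = (d*N, d*N)
```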

3. Experimental design of superensembles

In much of our formalization of the design problem, we consider the estimation of X by the method of generalized least squares (GLS) applied to the data model (16). It follows that good designs make features of the associated covariance matrix of the GLS estimator small. We clarify this with some basic results.

a. Univariate X and one climate model

We assume d = 1 and consider a single model, say model m. We can write (16) as

 
Ỹ_m ≡ Y_m − μ_m1_{n_m},  Ỹ_m | X ∼ Gau[X1_{n_m}, σ²_m{(1 − ρ_m)𝗜_{n_m} + ρ_m1_{n_m}1′_{n_m}}],    (17)

where

 
σ²_m = σ²_β + σ²_{b_m} + σ²_{Y_m}  and    (18)
 
ρ_m = (σ²_β + σ²_{b_m})/σ²_m.    (19)

That is, the “prior-mean-bias-adjusted” ensemble members have the same marginal distributions and common correlation; they are said to be exchangeable. The GLS estimate of X, denoted by X̂_m, based on an exchangeable ensemble works out to be the simple arithmetic average of the data:

 
X̂_m = (1/n_m) Σ_{k=1}^{n_m} Ỹ_{mk} = Ȳ_m − μ_m.    (20)

Under Gaussian assumptions, we have

 
X̂_m | X ∼ Gau{X, var(X̂_m | X)},    (21)

where

 
var(X̂_m | X) = σ²_m{(1 − ρ_m)/n_m + ρ_m}    (22)
 
= σ²_β + σ²_{b_m} + σ²_{Y_m}/n_m.    (23)

Derivations of these claims are discussed later. The indicated mean and variance of X̂_m are maintained without Gaussian assumptions.

In concert with standard intuition, increasing ensemble size does decrease uncertainty, as reflected in decreases in var(X̂_m | X). However, the decreases are moderate compared to those associated with uncorrelated observations. Further, as the ensemble size n_m tends to infinity, our uncertainty regarding X as quantified by var(X̂_m | X) is bounded from below by σ²_β + σ²_{b_m}. We find this comforting; we would not expect an infinite ensemble from any practical model to produce a perfect climate forecast. This also avoids potential embarrassments of infinite ensembles from different models yielding different “perfect” forecasts. We note that similar analyses and concerns also apply in general model-based forecasting, including ensemble forecasting in support of numerical weather prediction.

The correlation present in (19) responds to the relative variations in model error reflected in σ²_β + σ²_{b_m} and the variation due to ensembling reflected in σ²_{Y_m}. Note that if σ²_{Y_m} is very small relative to σ²_β + σ²_{b_m}, we can conclude that many ensemble members from a poor model are of little value. This is hardly a novel claim; the point is that our representations quantify what “many,” “poor,” and “little” mean. At the other extreme, an ensemble of size 5 from a hypothetical perfect model (i.e., uncorrelated ensemble members) yields the same variance for the GLS estimate as an infinite ensemble if ρ_m = 0.20 (cf. Berger 1985, exercise 149, p. 307).

Next, suppose we believe σ²_β + σ²_{b_m} = 1.09, σ²_{Y_m} = 1 and that we judge that achieving var(X̂_m | X) = 1.10 is acceptable. It follows that we require an ensemble size of 100. Alternatively, suppose we could improve the model to yield σ²_β + σ²_{b_m} = 1.00, without changing σ²_{Y_m}. Then, an ensemble of size 10 also yields var(X̂_m | X) = 1.10. Such arguments can also be reversed (though the issue of also changing σ²_{Y_m} is relevant in all comparisons). That is, a very large ensemble from a simple, cheap model that does not grossly degrade model quality may be competitive with a necessarily small ensemble from an expensive though better model. Of course, this notion is discussed by climate scientists; again, we offer a relevant quantification for such discussion.
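The arithmetic of such comparisons reduces to evaluating (23); a minimal sketch, using the variance settings from the example above:

```python
# Ensemble-size arithmetic of Eq. (23):
# var = sigma2_beta + sigma2_b + sigma2_Y / n.
def gls_var(sigma2_model, sigma2_Y, n):
    """sigma2_model stands for sigma2_beta + sigma2_b_m."""
    return sigma2_model + sigma2_Y / n

print(gls_var(1.09, 1.0, 100))  # 1.10: poorer model, ensemble of 100
print(gls_var(1.00, 1.0, 10))   # 1.10: better model, ensemble of 10
```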

b. GLS in the general case

Recalling that vec(Y) denotes the vector of superensemble data, and defining vec(Ỹ) to be its bias-corrected version (each Y_m adjusted by its prior mean bias μ_m as above), we can write the multivariate data model (16) as

 
vec(Ỹ) | X ∼ Gau{(1_N ⊗ 𝗜_d)X, Σ},    (24)

where Σ is the dN × dN covariance matrix defined in (16). Applying least squares theory, the GLS estimate of X is

 
X̂ = {(1_N ⊗ 𝗜_d)′Σ^{−1}(1_N ⊗ 𝗜_d)}^{−1}(1_N ⊗ 𝗜_d)′Σ^{−1} vec(Ỹ).    (25)

Under Gaussian assumptions, we have that the distribution of X̂ given X is Gaussian:

 
X̂ | X ∼ Gau{X, cov(X̂ | X)},    (26)

where the d × d covariance matrix is

 
cov(X̂ | X) = {(1_N ⊗ 𝗜_d)′Σ^{−1}(1_N ⊗ 𝗜_d)}^{−1}    (27)
 
= Σ_β + {Σ_{m=1}^{M} (Σ_{b_m} + Σ_{Y_m}/n_m)^{−1}}^{−1}.    (28)

The expression in (27) is the theoretical result; its simplification to the expression in (28) is an exercise in linear algebra and is reviewed in appendix A.
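Readers who prefer a numerical check to the linear algebra can verify that (27) and (28) agree; the sketch below does so for an invented two-model configuration.

```python
# Numerical check that the GLS covariance (27) collapses to (28);
# all covariance settings are invented test values.
import numpy as np

d, n = 2, [4, 3]
N = sum(n)
Sigma_beta = np.array([[0.20, 0.05], [0.05, 0.30]])
Sigma_b = [np.diag([0.10, 0.10]), np.array([[0.40, 0.10], [0.10, 0.50]])]
Sigma_Y = [np.eye(d), 2.0 * np.eye(d)]

# Assemble Sigma of (16): diagonal blocks Sigma_m, off-diagonal C_mm'.
blocks = []
for m in range(2):
    row = []
    for mp in range(2):
        ones = np.ones((n[m], n[mp]))
        if m == mp:
            row.append(np.kron(np.eye(n[m]), Sigma_Y[m])
                       + np.kron(ones, Sigma_beta + Sigma_b[m]))
        else:
            row.append(np.kron(ones, Sigma_beta))
    blocks.append(row)
Sigma = np.block(blocks)

H = np.kron(np.ones((N, 1)), np.eye(d))                  # 1_N (x) I_d
cov27 = np.linalg.inv(H.T @ np.linalg.solve(Sigma, H))   # Eq. (27)
cov28 = Sigma_beta + np.linalg.inv(                      # Eq. (28)
    sum(np.linalg.inv(Sigma_b[m] + Sigma_Y[m] / n[m]) for m in range(2)))
print(np.allclose(cov27, cov28))                         # True
```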

c. Design and decision theory

Formal design of superensemble experiments relies on optimization of the expectation of a cost function that combines inference loss and the cost of the production and data processing of the superensemble. A quantification of this notion involves a definition of an expected cost or risk, say ℛ, of the form

 
ℛ = E{L(X̂, X)} + C(n_1, . . . , n_M),    (29)

where L is inference loss, X̂ is our forecast of X, E(·) indicates expectation with respect to X̂ and perhaps X, and C is computational cost. Construction of such a criterion quantifies the trade-off of increasing the n_m, which presumably decreases our expected inference loss but increases our computational cost.

A common example of an inference loss is quadratic loss:

 
L(X̂, X) = K(X̂ − X)′𝗤(X̂ − X),    (30)

where K is some constant, which calibrates the loss to the context of the problem at hand, and 𝗤 is some d × d positive definite matrix. The role of 𝗤 is to allow for differential treatment of the coordinates of X depending on their importance. We let 𝗤 be the identity matrix for this discussion. Note that we generally require specification of K; at the least, it is used to ensure that L and C are measured in the same units. For example, if C is in dollars and X represents temperature in degrees, then K is in dollars per degree squared. (Of course, we could rescale the computational cost to be in degrees squared per dollar.)

There are two fundamental approaches to statistical decision problems (see Berger 1985 for more complete discussion). Operationally, they differ in the choice of the probability distribution under which the expectation in (29) is taken. In frequentist decision theory, one considers the random forecast or estimator X̂(Y) as a function of the yet-to-be-collected data Y and computes the expectation of the cost with respect to the conditional distribution of Y given X holding X fixed (i.e., the data model). If possible, one would then choose the best estimator as dictated by L and in turn the best design. This is not possible in general, since the result of the expectation is a function of X. Hence, some form of restricted optimization or modified criterion is necessary [e.g., minimax estimation, best estimation within the class of unbiased and linear-in-Y estimators (i.e., GLS), etc.].

The other approach is Bayesian. First, to find the optimal estimate of X, we minimize the expected loss computed under the posterior distribution of X given the data Y. This is a well-posed optimization problem; for quadratic loss, the result is the posterior mean of X given Y. For design purposes we then take the expectation of the conditional expected loss (a function of Y but not X) with respect to the marginal distribution of Y and optimize the result over all possible designs.

We begin with the frequentist approach (we call the approach frequentist, despite the fact that our data model is Bayesian, for ease in comparison to the decision theory literature). Note that the selection of GLS estimators as forecasts leads to an expected quadratic loss that is (i) not a function of the data and (ii) simply the variance of the GLS estimator. Hence, optimization over candidate designs is well posed.

1) Illustrations

The following examples all assume quadratic inference loss and d = 1. Though the univariate case may be an oversimplification, the results offer qualitative guidance for higher-dimensional cases. Note that knowledge of σ²_β is not needed in these cases.

In example 1, M = 1. Assume each of our n ensemble members costs c units. From (22) and (29), we find that the risk associated with an ensemble of size n is

 
ℛ(n) = K(σ²_β + σ²_b + σ²_Y/n) + cn.    (31)

We easily obtain the optimal ensemble size

 
n* = σ_Y(K/c)^{1/2}.    (32)

Of course n* must be an integer. Interpretations of n* and the optimized risk K(σ²_β + σ²_b) + 2σ_Y(Kc)^{1/2} in terms of σ_Y, K, and c are immediate.
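A short sketch evaluates (31) and (32) for invented values of K, c, and the variances, and confirms the optimum by brute force over integer ensemble sizes.

```python
# Example 1 arithmetic: risk(n) = K*(s2_beta + s2_b + s2_Y/n) + c*n,
# minimized at n* = sigma_Y * sqrt(K/c), Eq. (32). All values invented.
import numpy as np

K, c = 100.0, 1.0
s2_beta, s2_b, s2_Y = 0.2, 0.3, 4.0

def risk(n):
    return K * (s2_beta + s2_b + s2_Y / n) + c * n

n_star = np.sqrt(s2_Y * K / c)                           # = 20.0 here
print(n_star, risk(n_star))                              # optimum and risk
print(K * (s2_beta + s2_b) + 2 * np.sqrt(s2_Y * K * c))  # closed-form risk
print(min(range(1, 1000), key=risk))                     # integer check: 20
```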

In example 2, M > 1. Applying (28) for d = 1 and assuming each ensemble member from model m costs c_m units, the risk associated with a superensemble of sizes n_1, . . . , n_M is

 
ℛ(n_1, . . . , n_M) = K[σ²_β + {Σ_{m=1}^{M} (σ²_{b_m} + σ²_{Y_m}/n_m)^{−1}}^{−1}] + Σ_{m=1}^{M} c_m n_m.    (33)

Example 3 involves “allocation designs.” By allocation design we mean specification of the numbers n_m, assuming that the total computational cost is fixed. Note that basic properties of optimal allocations also carry over to the unconstrained case of example 2, since its solution produces some optimal computational cost and the allocations for that cost must be optimal. Also note that since the total cost is fixed, the optimization can be done without specification of K.

For simplicity we assume the cost c_m is constant across models. Hence, N is fixed, so we are to solve the constrained problem

 
minimize {Σ_{m=1}^{M} (σ²_{b_m} + σ²_{Y_m}/n_m)^{−1}}^{−1}  subject to  Σ_{m=1}^{M} n_m = N and 0 ≤ n_m ≤ N.    (34)

Solving this is a relatively simple integer programming problem. However, to see properties of solutions, we find approximate solutions acting as if the n_m are continuous. The resulting problem is amenable to the method of Lagrange multipliers. Candidate solutions can be obtained quickly by ignoring the inequality constraints 0 ≤ n_m ≤ N for each m. If the resulting candidate satisfies those inequalities, then it is a solution. Hence, we are to maximize

 
Σ_{m=1}^{M} (σ²_{b_m} + σ²_{Y_m}/n_m)^{−1} − λ(Σ_{m=1}^{M} n_m − N).    (35)

The resulting candidate solution is

 
n*_m = (σ_{Y_m}/√λ − σ²_{Y_m})/σ²_{b_m},  where λ > 0 is chosen so that Σ_{m} n*_m = N.    (36)

In case 1, σ²_{Y_m} = σ²_{b_m} for each m. Then the proportion of ensemble members allocated to model m is

 
n*_m/N = {(N + M)/N} σ^{−1}_{Y_m}/Σ_{j=1}^{M} σ^{−1}_{Y_j} − 1/N ≈ σ^{−1}_{Y_m}/Σ_{j=1}^{M} σ^{−1}_{Y_j}.    (37)

This is the correct solution if

 
(N + M) σ^{−1}_{Y_m} ≥ Σ_{j=1}^{M} σ^{−1}_{Y_j}, i.e., n*_m ≥ 0,    (38)

for each m, which we anticipate holding if the σ²_{Y_m} are all roughly of the same order. On the other hand, if one of these, say σ²_{Y_m′}, is huge compared to the others, then depending on N, the inequality constraint would be active and the optimal solution is to delete model m′ from consideration. An alternative is to increase N until the constraint is satisfied, though this may be impractical.

In case 2, σ²_{Y_m} = σ²_Y for each m. Computing (36), we find

 
n*_m/N = σ^{−2}_{b_m}/Σ_{j=1}^{M} σ^{−2}_{b_j}.    (39)

That is, the optimal fraction of N to allocate to model m is inversely proportional to σ²_{b_m}. In case 3, σ²_{b_m} = σ²_b for each m. We find

 
n*_m = (σ_{Y_m}/√λ − σ²_{Y_m})/σ²_b,  with λ again determined by Σ_{m} n*_m = N.    (40)

This solution appears to be at odds with those of cases 1 and 2. In those cases, the general idea is to direct more ensembles to the models with “small uncertainty.” In case 3 it appears that one is to direct more ensembles to the high uncertainty models. This is essentially correct if N is sufficiently large, since then

 
n*_m/N ≈ σ_{Y_m}/Σ_{j=1}^{M} σ_{Y_j}.    (41)

This makes sense since σ²_{b_m} is the same for all m; the solution tends to make the σ²_{Y_m}/n_m small. However, if N is small and some σ²_{Y_m} are so large as to be poorly controlled even if all ensembles are directed to those m, it is better to ignore them and focus on the remaining models. The specifics are delicate in their dependence on σ²_{Y_m}, σ²_b, and N. For example, say M = 2 and set σ²_{Y_1} = 4, σ²_{Y_2} = 100, and σ²_b = 4. For N = 4, 10, and 94, the optimal (n*_1, n*_2) are (4, 0), (5, 5), and (19, 75), respectively. These results are indicative of the potential value of combining complex and simple, though perhaps less precise, models.
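The following sketch reproduces this case 3 example: it evaluates the continuous candidate from (40) and confirms the integer optima (4, 0), (5, 5), and (19, 75) by exhaustive search.

```python
# Case 3 allocation example: M = 2, sigma2_Y = (4, 100), sigma2_b = 4.
import itertools
import numpy as np

s2_Y = np.array([4.0, 100.0])
s2_b = np.array([4.0, 4.0])

def var_gls(n):
    """d = 1 variance from Eq. (28), omitting the additive sigma2_beta."""
    active = n > 0
    return 1.0 / np.sum(1.0 / (s2_b[active] + s2_Y[active] / n[active]))

for N in (4, 10, 94):
    # Interior Lagrange solution, Eq. (40): with common s2_b, the value
    # a = 1/sqrt(lambda) solves sum_m (sigma_Ym * a - s2_Ym) / s2_b = N.
    a = (N * s2_b[0] + s2_Y.sum()) / np.sqrt(s2_Y).sum()
    candidate = (np.sqrt(s2_Y) * a - s2_Y) / s2_b[0]
    # Brute force over all integer allocations (n1, N - n1).
    best = min((np.array([n1, N - n1]) for n1 in range(N + 1)), key=var_gls)
    print(N, candidate, best)   # candidate hits the boundary n2 = 0 at N = 4
```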

2) Extensions

First, we consider the Bayesian design in a special case. Suppose d = 1, our data model is Gaussian, and our prior for X is also Gaussian with variance τ². It follows that the posterior variance of X conditional on the ensembles does not depend on the data or the prior mean. In particular, the posterior variance is

 
var{X | vec(Y)} = (1/τ² + 1/σ²_e)^{−1},    (42)

where

 
σ²_e = σ²_β + {Σ_{m=1}^{M} (σ²_{b_m} + σ²_{Y_m}/n_m)^{−1}}^{−1}    (43)

is precisely the quantity optimized in the frequentist approach. It is easy to check that var{X | vec(Y)} is a monotone increasing function of σ²_e. Hence, the example results in cases 1–3 above are in fact Bayesian designs under these assumptions. Note that they do not depend on τ². This seemingly happy circumstance need not carry over to the multivariate case.

For multivariate X, designs should be chosen to make cov(X̂ | X) “small.” To make this mathematically precise, one selects some scalar-valued function of cov(X̂ | X) and finds designs to minimize it. Traditional choices of that function include the trace and determinant of cov(X̂ | X). (Note that choosing the trace enables optimal design without knowledge of Σ_β.) For brevity, we do not pursue this further in this article [see Berliner et al. (1999) for discussion in a weather forecasting context and general references].

Finally, the designs described so far depend on our prior for the model biases. This suggests that they are efficient but sensitive to the specifications of the σ²_{b_m}. An alternative approach that does not involve the prior on the biases is based on the following form of (7):

 
vec(Y) | X, b ∼ Gau{𝗗(X, b′)′, diag(σ²_{Y_1}𝗜_{n_1}, . . . , σ²_{Y_M}𝗜_{n_M}) + σ²_β1_N1′_N},    (44)

where 𝗗 is the N × (M + 1) matrix,

 
𝗗 = (1_N  𝗚), with 𝗚 the N × M block-diagonal matrix whose mth diagonal block is 1_{n_m}.    (45)

For M > 1, we can find the GLS estimator of the vector composed of X and b, together with its covariance matrix. Hence, we can find designs that are good for simultaneously estimating X and b. Alternatively, viewing b as nuisance parameters and noting that the implied var(X̂ | X, b) is independent of b, we can optimize it directly to obtain designs for estimating X.

4. Climate forecasting: Hemispheric temperatures

Let X_p and X_f denote past and future climate variable trajectories and Y_p and Y_f be corresponding data. The main object of interest in Bayesian climate forecasting is the posterior distribution of future climate variables conditional on the data:

 
[X_f | Y_p, Y_f].    (46)

A useful strategy for developing this object begins with inclusion of X_p. In general this allows application of dynamics and physical reasoning to relate X_f to X_p, rather than attempting to relate X_f to Y_p directly. The plan is then to form a data model [Y_p, Y_f | X_p, X_f] and a process prior of the form [X_f | X_p][X_p]. Bayes’ theorem is applied to complete the analysis. Generally, the implementation of this suggestion involves many modeling steps, introduction of unknown parameters, and computational overhead.

In our bivariate (d = 2) example the derived climate variables X are hemispheric- and monthly averaged surface temperatures for the period 1882–2097. The years 1882–2001 are designated the observation period. The model data Y_f for the forecasting period 2002–97 are computed from ensembles from M = 2 models, PCM and CCSM, labeled m = 1, 2, respectively, and three SRES scenarios, A2, A1B, and B1 (CO2 concentrations for these scenarios were obtained from the IPCC Web site). We have n_1 = 4 ensemble members from PCM for each scenario and n_2 = 1 ensemble member from CCSM for scenarios A1B and B1 only. Finally, we deseasonalized each ensemble member by adjusting it for its monthly means for the first 30 yr of the forecast period.

Our models combine those developed by Y. Kim and L. M. Berliner (2006, unpublished manuscript, hereafter KB) with the data model reasoning for model output described in section 2. First, we develop some notation. Let t index months. The observational period includes months t = 1, . . . , P, P = 12 × 120, and the forecast period is months t = P + 1, . . . , P + F, F = 12 × 96. We let X_pt and X_ft be (d = 2)-dimensional vectors of true Northern and Southern Hemispheric temperatures for month t in the observation and forecast periods, respectively. Similarly, Y_pt is the two-dimensional vector of the corresponding observations in month t. Finally, vec(Y_ft) is the vector of hemispheric temperatures for month t obtained from the model output. Note that for scenarios A1B and B1, these data vectors are 2 × (4 + 1) = 10 dimensional, but 2 × 4 = 8 dimensional for scenario A2.

a. Hierarchical model

1) Data models

(i) Observations

We assumed that the Y_pt, t = 1, . . . , P, are conditionally independent given X_pt, t = 1, . . . , P, and all other model parameters and that for each t the conditional distribution of Y_pt given X_p and the parameters depends on X_p only through X_pt. Preliminary data analysis suggested that the measurement error variance was comparatively large in the early portion of the observation period. Hence, we allowed for a random changepoint, say t_c, and assumed Gaussian distributions, leading to

 
Y_pt | X_pt, t_c ∼ Gau{X_pt, diag(σ²_{1n}, σ²_{1s})}  for t ≤ c  and    (47)
 
Y_pt | X_pt, t_c ∼ Gau{X_pt, diag(σ²_{2n}, σ²_{2s})}  for t > c,    (48)

where c is the integer part of t_c and the notation diag(. . .) denotes a matrix with the indicated quantities on the diagonal and off-diagonal elements all equal to zero. Priors used for the measurement error variances and t_c are described in appendix B.

(ii) Model output

As in the case of the observational data, we assume conditional independence over time of the model output vec(Y_ft) conditional on X_f and other model parameters, and that vec(Y_ft) conditional on X_f and the parameters depends on X_f only through X_ft. Next, due to the limited number of ensembles available, we assume that the error covariance matrices Σ_β, Σ_{b_m}, and Σ_{Y_m}, m = 1, 2, are constant in time. We assume that the model biases are locally constant in time. We defined five temporal periods: 2002–31, 2032–51, 2052–71, 2072–91, and 2092–97, indexed by L = 1, . . . , 5, respectively. For each L = 1, . . . , 5, we let b_{1L} (b_{2L}) be a 2 vector of the Northern Hemisphere (NH) and Southern Hemisphere (SH) biases for model 1 (2).

For each t in the forecast time period, t = P + 1, . . . , P + F, we assume that

 
vec(Y_ft) | β_t, b ∼ Gau[{(1_{n_1} ⊗ (β_t + b_{1L}))′, (1_{n_2} ⊗ (β_t + b_{2L}))′}′, diag(𝗜_{n_1} ⊗ Σ_{Y_1}, 𝗜_{n_2} ⊗ Σ_{Y_2})],  t in period L,    (49)

where the model means β_t are modeled as

 
β_t | X_ft ∼ Gau(X_ft, Σ_β).

The prior model for the coordinates of the model biases b_{1L} and b_{2L}, L = 1, . . . , 5, is that they are all independent and identically distributed Gaussians with mean 0 and variance 9; we used this prior for all three scenarios. The priors used for Σ_{Y_1}, Σ_{Y_2}, and Σ_β are discussed in appendix B. Finally, we assume that the observational data and the model output data are conditionally independent throughout time given the true climate variables and the model parameters.

2) Process model

It is well understood that the climate system displays a variety of space–time scales of behavior. Our suggestion is to use multiscale Bayesian hierarchical models, which allow the data to direct the treatment of scales of variation. In this article this is accomplished by using familiar time series models, but with space–time varying parameters. Of course, spatial modeling in two dimensions is not indicative of the issues in higher dimensions, so our real focus is on temporal scales. Furthermore, the example allows demonstration of how both covariate information, in our case CO2 data and the Southern Oscillation index (SOI), and dynamics of parameter evolution can be combined.

The main process model evolves monthly temperatures via a simple autoregressive time series model with local-in-time parameters. For each month t and conditional on indicated parameters, we assume that

 
X_t = α_{i(t)} + 𝗛_{j(t)}(X_{t−1} − α_{i(t)}) + e_t,  e_t ∼ Gau{0, diag(σ²_{k(t)n}, σ²_{k(t)s})},    (50)

where

 
𝗛_{j(t)} = diag{η_{j(t)n}, η_{j(t)s}}    (51)

and where the subscripts i, j, k indicate that the

  • mean parameters α_{i(t)},
 
  • autoregression parameters η_{j(t)} = {η_{j(t)n}, η_{j(t)s}}′, and
 
  • process error variances σ²_{k(t)} = {σ²_{k(t)n}, σ²_{k(t)s}}′

are themselves time varying, each on their own time scales. Specifically, the full time period is partitioned into equal-length subintervals i = 1, . . . , I. We assume that for all months t in subinterval i, α_{i(t)} = α_i. Such local-in-time definitions are similarly applied to the autoregression parameters and process variances; the subscripts j and k indicate that they can be defined on different temporal subintervals. The NH and SH parameters need not vary on the same time scales, but we do not make such separations here.

Note that the inference and modeling problem is now fairly complex. Specifically, we not only have parameters to model, but the lengths of the three sets of local subintervals are to be inferred. KB devote substantial attention to this aspect of model selection and the required calculations. In this article, we simply specify the locality definitions to be those KB found to be best supported by the data in the observational period. Specifically, the mean parameters and process error variances are assumed to vary on an 8-yr period; the autoregressive parameters vary on a 2-yr period.

Finally, to implement (50), we need a distribution for X_0, that is, the true temperatures for the month before our starting time. We simply assumed
 
X_0 ∼ Gau(0, Σ_0)    (52)
 
for a fixed covariance matrix Σ_0. Results are very insensitive to this assumption.
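A minimal simulation of the process model (50) may help fix ideas; the parameter values below are invented, the subinterval lengths match the 8- and 2-yr blocks just described, and the zero initial condition stands in for a draw from (52).

```python
# Sketch of the bivariate (NH, SH) process model (50): an AR(1) in
# deviations from epoch means, with parameters constant on their own
# subintervals. All parameter values are invented.
import numpy as np

rng = np.random.default_rng(2)
T = 12 * 24                                # 24 years of months
alpha_len, eta_len, var_len = 96, 24, 96   # 8-, 2-, and 8-yr blocks

# Epoch-constant parameters for (NH, SH).
alpha = rng.normal([0.0, 0.0], 0.1, size=(T // alpha_len, 2))
eta = rng.uniform(0.3, 0.7, size=(T // eta_len, 2))
sig = rng.uniform(0.05, 0.15, size=(T // var_len, 2))

X = np.empty((T + 1, 2))
X[0] = 0.0                                 # stand-in for the X_0 draw
for t in range(1, T + 1):
    a = alpha[(t - 1) // alpha_len]
    h = eta[(t - 1) // eta_len]
    s = sig[(t - 1) // var_len]
    X[t] = a + h * (X[t - 1] - a) + rng.normal(0.0, s)
print(X[:5])
```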

3) Parameter models

For brevity we review our priors for the mean, autoregression, and process variance parameters here and provide a more complete discussion for the other parameters in appendix B.

(i) Mean parameters

Let C_i represent global average CO2 concentrations averaged over time interval i. All CO2 values are assumed known in this article. Our model for the mean parameters is a regression onto CO2:

 
α_i = a_α + b_α C_i + e_{α,i},    (53)

where the errors e_{α,i} are modeled as an autoregressive time series as indicated in appendix B. We used this model in both the observation and forecast periods but assume that the a_α and b_α are different in those periods. As needed, we use notation to indicate parameters specific to the forecast period.
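As a toy, non-Bayesian illustration of (53), the sketch below recovers invented regression coefficients from simulated epoch means by ordinary least squares; the paper instead treats a_α and b_α within the full hierarchy.

```python
# Toy version of (53) for one hemisphere: regress invented epoch means
# alpha_i on epoch-averaged CO2 C_i by ordinary least squares.
import numpy as np

rng = np.random.default_rng(3)
C = np.array([290.0, 300.0, 320.0, 350.0, 380.0])    # ppmv, invented
alpha_nh = 0.1 + 0.005 * C + rng.normal(0.0, 0.02, C.size)

A = np.column_stack([np.ones_like(C), C])
a_hat, b_hat = np.linalg.lstsq(A, alpha_nh, rcond=None)[0]
print(a_hat, b_hat)    # roughly recovers the invented (0.1, 0.005)
```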

(ii) Autoregression parameters

We used very similar models for the autoregression parameters but found that the absolute value of the SOI was useful as a predictor during the observational period. [Recall that their interval length of constant values was set at 2 yr (see also Smith et al. 2003).]

For month t in the observation period, assume

 
η_{j(t)} = a_η + b_η|SOI_t| + e_{η,j},    (54)

where the errors e_{η,j} are modeled in (B4). This model requires modification in the forecast period. Potential options include 1) obtaining ensemble-based estimates of an SOI using the original climate model output; 2) producing a more intense statistical model that forecasts SOI, which is analogous to parameterization in numerical modeling; and 3) producing a simple statistical model, motivated by the SOI model, that makes no reference to SOI. We only pursue the third option here, as described in appendix B.

(iii) Model error variances

Let λ_i = {ln(σ²_{in}), ln(σ²_{is})}′ (recall that these variances are assumed to vary on the same scale as the mean parameters). We assumed an autoregressive time series model:

 
λ_i = a_λ + b_λλ_{i−1} + e_{λ,i}.    (55)

Further details are given in appendix B.

b. Results

Recall that the process model mean, autoregression, and variance parameters play a special role in our approach. We intend that they statistically control local-in-time climate and weather variations. Also, we seek to explain some of their variability using covariate information. Of particular importance in the forecast period is the behavior of the mean parameters modeled as regressions onto CO2 levels. Figure 1 presents temporal plots of 100 posterior realizations of the mean parameters for NH (Fig. 1a) and SH (Fig. 1b). In all cases, these plots indicate very small posterior variances; that is, the ensemble spread is tiny. We suggest that these plots strongly indicate the roles of CO2 concentrations in determining climate change. Intuitively, we have “filtered out” local variations, yielding comparatively low uncertainty indications of future climate.

Fig. 1. Hemispheric process model means. (a) NH and (b) SH process model mean parameters (denoted by α’s in text). One hundred ensembles generated from the posterior distribution are plotted in each of the cases: observation period 1882–2001 (black) and the forecast period 2002–97 for SRES scenarios A1B (red), B1 (blue), and A2 (green), respectively.

Figure 2 shows analogous results for the autoregression parameters. Recall that the role of these parameters is to represent climatic variations occurring at moderate time scales (i.e., in the observational period, these parameters are modeled based on SOI). Note that their behavior is more variable in the observation period than in the forecast period. We expected this since SOI was available in the observation period; in fact, the level of variations in the forecast period was higher than we expected. This suggests that the models do produce some climate variations on this scale and that our model responded to them. Of course, we do not know if these variations will match future climate. We also note that variations depend on both the scenario used and the hemisphere considered.

Fig. 2. Hemispheric process autoregression parameters. (a) NH and (b) SH process model autoregression coefficients (denoted by η’s in text). One hundred ensembles generated from the posterior distribution are plotted in each of the cases: observation period 1882–2001 (black) and the forecast period 2002–97 for SRES scenarios A1B (red), B1 (blue), and A2 (green), respectively.

Figure 3 presents the results for the process variances. Note that the variances are an order of magnitude higher in NH than in SH. We also see some general indications that the variances grow with CO2 in the forecast period. The growth of both the autoregression and variance parameters toward the latter half of the twenty-first century reflects the notion of increasing variability of medium-range temperature behavior and weather variations due to climate change. We remark that the comparative flatness of variations in these quantities in the early portion of the forecast period could be an artifact of our selection of period 2002–31 as the basis of our deseasonalization of the climate model output.

Fig. 3. Hemispheric process variances. (a) NH and (b) SH process variances (denoted by σ²_i’s in text). One hundred ensembles generated from the posterior distribution are plotted in each of the cases: observation period 1882–2001 (black) and the forecast period 2002–97 for SRES scenarios A1B (red), B1 (blue), and A2 (green), respectively.

Table 1 summarizes posterior results for the intercept and slope of the regression of process model means on CO2 [see (53)]. Note that for both hemispheres, the slopes during the observation period are an order of magnitude larger than those for the forecast period for all three scenarios. Another trend in each case is that the NH slopes are larger than the SH slopes. Since the units for CO2 are parts per million by volume (ppmv), the units for the slopes are °C (ppmv)⁻¹. (To calibrate the results in Table 1, note that, for NH and scenario A1B, the expected increase in the mean of temperature arising from an increase in CO2 of 100 ppmv is 0.676° ± 0.013°C, that is, plus and minus one standard deviation.)

Table 1. Means (std dev) of intercept and slope of regression of means on CO2 based on 5000 realizations from the posterior distribution.
Table 3. Means (std dev) of model error covariance matrices based on 5000 realizations from the posterior distribution, where Σ_{Y_1} is for PCM and Σ_{Y_2} is for CCSM.

We next consider the data model parameters. For the observational period recall that we modeled an unknown changepoint t_c in the measurement error variance. Its posterior mean is at year 1898.6 with a standard deviation of 4.14 yr. The posterior means (standard deviations) of the measurement error variances for both NH and SH observations before the changepoint are 0.0015 (0.0003). The corresponding values after the changepoint are 6.3585 × 10⁻⁵ (3.60 × 10⁻⁶) for NH and 6.3510 × 10⁻⁵ (3.67 × 10⁻⁶) for SH.

Tables 2 and 3 summarize posterior results for the model biases and the model error covariance matrices. One interesting observation is that PCM appears to have more bias but less variance than CCSM. A second note is that the variability quantified through Σ_β appears to be significantly smaller than that quantified by Σ_{Y_1} and Σ_{Y_2}.

Table 2. Means (std dev) of biases b_1 and b_2 based on 5000 realizations from the posterior distribution.

Next, we review temperature forecasting results. In Fig. 4, we have plotted 100 trajectories of NH annually averaged temperatures simulated from the posterior distribution. The plot includes results for all three scenarios. We have also plotted the temperature observations through 2002, but note that they are not separable from the posterior realizations at the scale of this graphic. Figure 5 presents the results for SH. The spread in the trajectories provides a visual indication of variability or uncertainty. We note that the forecasted trends and patterns of increases in temperatures, including dependence on the assumed scenario, are in rough agreement with those currently available at the IPCC Web site.

Fig. 4. NH temperature reconstructions and forecasts. One hundred ensembles generated from the posterior distribution are plotted in each of the cases: observation period 1882–2001 (black; observations are also plotted here) and the forecast period 2002–97 for SRES scenarios A1B (red), B1 (blue), and A2 (green), respectively.

Fig. 5. SH temperature reconstructions and forecasts. One hundred ensembles generated from the posterior distribution are plotted in each of the cases: observation period 1882–2001 (black; observations are also plotted here) and the forecast period 2002–97 for SRES scenarios A1B (red), B1 (blue), and A2 (green), respectively.

We now return to the issue mentioned in section 1 regarding potential overconfidence in posterior results. As a prelude, note that for fixed hemisphere, scenario, and model, the posterior standard deviations of the biases decrease as the lengths of the epochs considered increase. Recalling that the priors were the same in all cases, a portion of this tendency is due to sample size: not the number of ensemble members, but rather the lengths of the epochs over which the biases are assumed to be constant. To indicate how such behavior can impact conclusions, Fig. 6 presents results analogous to those of Fig. 4 based on using the same model but with the assumption that the NH and SH biases are constant over the entire forecast period. Note how the basic form of these plots is similar, but toward the end of the century, the spread of the ensemble members in Fig. 6 is roughly one-half that in Fig. 4.

Fig. 6. NH temperature reconstructions. Same as in Fig. 4 but based on a model assuming constant biases for all times in the forecast period.

Our choice of five epochs and their lengths was ad hoc and chosen for illustration. We expect more epochs of shorter lengths would indicate even more uncertainty. On the other hand, as the epoch lengths become very small, the priors for the biases become critical and their selection becomes more challenging. However, at least in our case, transferring questions of climate change from actual temperatures to the parameters of the temperature probability distribution appears to have another advantage. Specifically, the inferences on the future means of temperatures as indicated in Fig. 1 remained virtually unchanged in both form and spread when the analysis is based on the assumption that the biases are constant over the entire forecast period.

c. Computations

Exact computation of the posterior distribution is intractable in our case. All inferences presented above were obtained by Markov chain Monte Carlo (MCMC). This procedure simulates a specially constructed Markov chain whose stationary distribution coincides with the posterior distribution. After a burn-in or transience period, realizations of the chain are (approximately) identically distributed ensembles from the target posterior. They are then used to estimate posterior quantities of interest. The exact algorithm used is a fairly standard implementation of Gibbs sampling for time series problems. For brevity, we do not review this further here. Examples and discussion of MCMC in geophysical settings can be found in Berliner et al. (2003b), KB, and Wikle et al. (1998); see Gelman et al. (1995) for a general introduction.

One technical issue merits mention. Though the MCMC samples are from the posterior distribution of interest, they are correlated, leading to impacts on estimation much like those discussed in section 2a. This is why estimation is based on such seemingly large MCMC ensemble sizes of 5000 iterations after burn-in times conservatively set to 10 000. The 100 trajectories plotted in each of the figures were based on every 50th member of our 5000-member ensembles from the posterior.
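The effect of these burn-in and thinning choices can be illustrated on a toy chain; the sticky AR(1) below merely stands in for correlated MCMC output, and its autocorrelation parameter is invented.

```python
# Toy illustration of the burn-in/thinning scheme in the text: 10 000
# burn-in draws, 5000 retained, every 50th kept.
import numpy as np

rng = np.random.default_rng(4)
n_burn, n_keep, thin = 10_000, 5_000, 50

chain = np.empty(n_burn + n_keep)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = 0.95 * chain[t - 1] + rng.normal()

kept = chain[n_burn:]
thinned = kept[::thin]                      # 100 nearly independent draws
lag1 = np.corrcoef(kept[:-1], kept[1:])[0, 1]
lag1_thinned = np.corrcoef(thinned[:-1], thinned[1:])[0, 1]
print(lag1, lag1_thinned)                   # thinning slashes autocorrelation
```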

5. Review and discussion

a. Model output as data

A key feature of our statistical modeling of climate model output as described in section 2 is allowance for model-specific uncertainties regarding the means, biases, and covariance properties of superensemble members. The crucial step enabling uncertainty analysis of superensembles is viewing the model-specific means as random variables with common mean β. Further uncertainty is reflected in construction of a distribution for β conditional on the true state of the climate variables of interest. We investigated implications of the models in inferring climate variables. Formally, this involved inspection of the marginal distribution of superensemble data conditional on the true climate state. A key result was that ensemble members are correlated in this marginal distribution. The correlation structures implied various limitations on our ability to infer the true climate states, even in the presence of large ensemble sizes.

The data modeling strategy can be generalized in several directions. As mentioned in section 2a, additional sources of model error and ensemble variation can be modeled. Next, though we illustrated the reasoning using Gaussian distributions, the strategy can be implemented with more general models and distributions. We used Gaussian distributions because they are familiar and led to comparatively simple calculations. Further, their results are readily amenable to our discussion of design of superensembles in section 3. Our message is the strategy, however.

A second direction for generalization involves consideration of different model classes. In section 2, all models were assumed to be similar in the sense that we used a single β as a common mean. As an alternative example, suppose we consider two model classes: class A models might be expensive, large-scale climate system models such as PCM and CCSM. Suppose class B models are comparatively cheap and simpler climate models, for example, energy balance models; statistical dynamic models, in the sense of climate scientists (see McGuffie and Henderson-Sellers 1997); or statistical models in the sense of section 4. This suggests introduction of model class means, say β_A and β_B, and associated model class covariances. In particular, we probably expect Σ^A_β to be “smaller” than Σ^B_β. Note that classes A and B can be further partitioned into subclasses with subclass-specific distributions.

As discussed at the end of section 4b, a critical issue involves the assumed temporal structure of data models. We did not pursue this aspect in more detail in our example because of the small number of ensembles and because the ensembles for each SRES scenario did not show any appreciable spread in time. Hence, we could not meaningfully separate temporal variations in bias from variations in variance without more data and/or very strong prior information dictating the form of such variations.

Temporal structure of the models is also important in designing superensembles targeted for inference of temporal trajectories of climate. Formal consideration of time-varying data model parameters can lead to high-dimensional, complex covariance matrices to be modeled and optimized, even if we wish to infer low-dimensional climate summaries. The difficulties may be somewhat lessened if we can model the requisite covariance matrices as simple functions of time (e.g., an unknown univariate function of time multiplying an unknown but constant matrix). In the case of large climate models, allowance for unknown covariances with arbitrary temporal evolution is problematic due to anticipated small ensemble sizes.

b. Design

Section 3 provided a formalization of the problem of selecting models and ensemble sizes for the generation of superensembles. The formalization involved optimization of decision theoretic criteria based on losses accruing from errors in inference and experimental costs. Our illustrations focused on inferences judged by quadratic inference loss and hence relied on generalized least squares (GLS) estimation. Computational results were reviewed for the case of a univariate climate variable. We reiterate that these issues are also of interest in other contexts (e.g., numerical weather forecasting).

Our analyses in section 3c offer intuition regarding the behavior of optimal designs. More general treatment of multivariate climate variables raises greater computational challenges, though no new conceptual ones. While we do not envision applying the approach to millions of variables, cases involving hundreds can be addressed. Optimal designs can be computed relatively cheaply, at worst by numerical search (a toy search is sketched below). Furthermore, suboptimal but comparatively “good” designs may be obtained even more easily.
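
The sketch below illustrates such a numerical search for a two-model, univariate case. It uses a deliberately simplified criterion of our own construction: model m's ensemble mean is assigned variance τ²m + σ²m/nm, and the means are combined by inverse-variance weighting, which ignores the cross-model correlation induced by a shared β. All numbers are illustrative.

```python
import itertools
import numpy as np

# Illustrative inputs (assumed): model-level variances tau2,
# within-ensemble variances s2, and per-run costs c.
tau2 = np.array([0.04, 0.09])
s2 = np.array([0.50, 0.25])
c = np.array([10.0, 1.0])           # model 0 is expensive, model 1 is cheap
budget = 60.0

def combined_variance(n):
    """Variance of the inverse-variance-weighted combination of ensemble means."""
    active = n > 0
    v = tau2[active] + s2[active] / n[active]
    return 1.0 / np.sum(1.0 / v)

best = None
for n in itertools.product(range(31), repeat=2):
    n = np.array(n)
    if n.sum() == 0 or n @ c > budget:
        continue                    # skip empty or over-budget designs
    val = combined_variance(n)
    if best is None or val < best[0]:
        best = (val, n)

print(f"optimal ensemble sizes: {best[1]}, achieved variance: {best[0]:.4f}")
```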

As posed here, design requires specification of various data model covariance matrices and information regarding model biases. Selection of these quantities can be based on past model assessments. When feasible, we recommend studying the sensitivity of optimal designs to those specifications. Also, since analyses such as those presented in section 4 produce posterior distributions of these covariances and biases, they can be viewed as preliminary studies that develop information about important model behaviors. Indeed, using either our simple model or another simple model, the archived output from the many models widely available to the climate community can be used to gain information about the data model parameters. Such information is directly relevant to climate forecasters. Finally, recall that we recommended severe reduction in the dimension of X relative to the dimension of climate system models. This enables both formal analysis combining information sources and problem-specific optimal design. On the other hand, generation of ensemble members from massive models remains too expensive to imagine running separate ensembles for every interesting climate and climate change impacts study. We suggest that community superensembles from class A models could be generated optimally for some selection of X thought to be robust across many potential uses. With our techniques, users with special problems and needs can then design an “experiment” consisting of samples from this (or any) archive, augmented by superensembles from class B models designed for their specific goals.

c. Overlapping observations and model output

In principle, improved estimation of the model output data models and their parameters can be obtained by overlapping the observation times and the model output times. In some cases this is straightforward in our framework, at least under the assumption that the observations and model outputs are conditionally independent. Furthermore, running the Bayesian model without overlap and checking how well the results match observations, up to measurement error, can serve as a model diagnostic. However, in our specific setting, the real observations correspond to a climate under the influence of the true forcings, while the model output was generated under SRES scenarios. Hence, differences arising in an overlap period may be due to both model error and forcing variations. More intensive modeling would be required to separate these effects.

d. Concluding remarks

The Bayesian climate forecasting approach outlined at the outset of section 4 serves as a target for analysis. Two key points merit discussion. First, our forecast distribution (45) is actually the marginal predictive distribution obtained formally by integrating the full posterior

 
[Xp, Xf | Yp, Yf]   (56)

with respect to Xp. Note that (56) means that under the model, Yf is informative about Xp, as in backcasting or reanalysis. However, using climate model output generated from different CO2 scenarios to infer Xp makes little sense to us. In our example, the posterior distribution of the past was not affected by Yf computed under any of the scenarios, in large part due to our allowing the process model parameters and their priors to be different in the observation and forecast periods.

The second key point is that, since climate model output is treated as data, the Bayesian approach relies on specification of a prior process model [Xf], or in our case [Xf|Xp], since we use past data. This means that results may vary with the selection of these priors. A related issue in the multimodel context is the selection of which model to use as the prior and which to treat as data sources. One potential guide in this selection is the plausibility of each model as a stand-alone climate forecaster. For example, we used our simple model because it offers a flexible structure with parameters that can be readily influenced by data and related to covariates of interest. However, we would not propose generating ensembles of output from the forecast distribution [Xf|Yp] based on our simple model.

This topic introduces an element of subjectivity to the strategy, but subjectivity has already arisen in our data modeling. For this reason, it may be tempting to suggest that climate forecasters avoid explicit methods such as the strategy suggested here. We counter that the simple use of ensemble means and variances to construct forecasts, forecast error quantifications, or confidence intervals corresponds to an implicit data model in which the ensemble members are independent, identically distributed (and unbiased) observations of climate; the consequences of that implicit assumption are illustrated below.
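
A hedged simulation of this point (our construction; all values illustrative): when ensemble members share a common model-level deviation from the truth, the naive iid interval, mean ± 2 sd/√n, undercovers badly.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative values: true state, shared-deviation sd, member sd, ensemble size.
x_true, sigma_beta, sigma, n = 0.0, 0.5, 1.0, 50
n_reps, hits = 5000, 0

for _ in range(n_reps):
    common = rng.normal(0.0, sigma_beta)          # shared model-level deviation
    y = x_true + common + rng.normal(0.0, sigma, n)
    half = 2 * y.std(ddof=1) / np.sqrt(n)         # naive iid half-width
    hits += abs(y.mean() - x_true) <= half

print(f"coverage of naive ~95% interval: {hits / n_reps:.2f}")  # well below 0.95
```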

A related suggestion for managing the selection of a process model prior is to maintain the data modeling much as presented here, but to use an uninformative, typically uniform, process prior for Xf. When implementable, the results would be useful benchmarks. Indeed, we used uninformative priors on some model parameters introduced in section 4 (see appendix B). Further, the formulas for GLS estimation presented in section 3 [e.g., (25), (28)] are Bayesian posterior means and covariances for climate based on such uniform priors. However, their implementation requires knowledge and/or estimation of all data model covariances. Hence, one either assumes their values (objectively?) or faces additional problems, including the loss of desirable properties of GLS and the need to account for uncertainty due to covariance estimation.

An argument for maintaining informed priors for Xf involves applications to forecasting the impacts of climate change. Suppose we are really interested in some quantity, say Rf, of some system (e.g., biological, ecological, agricultural, or regional climatic) that depends in part on the climate variables Xf. The object of interest is then

 
[Rf | Yp, Yf] = ∫ [Rf | Xf] [Xf | Yp, Yf] dXf,

where [Rf|Xf] is a probability model for the impacts of interest conditional on climate variables. Typically, this distribution would be a parameterized model, developed much as we developed [Xf|Xp] and [Xp]. Success may well require the best, most informed modeling at all stages of such a complex hierarchical model. We also note that formal study of the impact of prior assumptions is an active area of research in Bayesian statistics. For discussion of robust Bayesian analysis in a climate setting, see Berliner et al. (2000a), and see Berger (1985, section 4.7) for a general introduction.
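
In practice, this integral is usually evaluated by composition sampling: draw Xf from its forecast distribution, then draw Rf from [Rf|Xf]. A minimal sketch with stand-in Gaussian components (both distributions are placeholders of our own, not fitted models):

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 100000

# Stand-in forecast distribution [Xf | Yp, Yf]: illustrative values only.
xf = rng.normal(loc=1.5, scale=0.4, size=n_draws)

# Stand-in impacts model [Rf | Xf]: impact grows nonlinearly with warming.
rf = rng.normal(loc=np.maximum(0.0, xf - 1.0) ** 2, scale=0.1)

print(f"P(Rf > 0.5) ~= {(rf > 0.5).mean():.3f}")
print(f"posterior mean impact ~= {rf.mean():.3f}")
```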

In summary, our views regarding objectivity versus subjectivity in forecasting climate and climate impacts are that (i) uninformative priors offer useful benchmark analyses, though total objectivity is not achievable; (ii) informed analyses may combine science, model output, and observational data more efficiently than their uninformed counterparts; and (iii) objectivity need not correspond to plausibility.

Acknowledgments

LMB and YK were supported by the NSF under Grant DMS-01-39897. YK was also partially supported by the Department of Statistics, The Ohio State University. We are grateful to Dr. Claudia Tebaldi at NCAR for providing the climate model output. Excellent comments and critiques from two reviewers led to substantial improvements in this article.

REFERENCES

Allen, M., P. A. Stott, J. F. B. Mitchell, R. Schnur, and T. L. Delworth, 2000: Quantifying the uncertainty in forecasts of anthropogenic climate change. Nature, 407, 617–620.

Berger, J. O., 1985: Statistical Decision Theory and Bayesian Analysis. 2nd ed. Springer-Verlag, 617 pp.

Berliner, L. M., 2003: Uncertainty and climate change. Stat. Sci., 18, 430–435.

Berliner, L. M., Z-Q. Lu, and C. Snyder, 1999: Statistical design for adaptive weather observations. J. Atmos. Sci., 56, 2536–2552.

Berliner, L. M., R. A. Levine, and D. J. Shea, 2000a: Bayesian climate change assessment. J. Climate, 13, 3805–3820.

Berliner, L. M., C. K. Wikle, and N. Cressie, 2000b: Long-lead prediction of Pacific SSTs via Bayesian dynamic modeling. J. Climate, 13, 3953–3968.

Bernardo, J. M., and A. F. M. Smith, 1994: Bayesian Theory. Wiley, 586 pp.

Bowman, K. P., J. Sacks, and Y-F. Chang, 1993: Design and analysis of numerical experiments. J. Atmos. Sci., 50, 1267–1278.

Box, G. E. P., and G. C. Tiao, 1973: Bayesian Inference in Statistical Analysis. Addison-Wesley, 588 pp.

Dai, A., T. M. L. Wigley, B. A. Boville, J. T. Kiehl, and L. E. Buja, 2001: Climates of the twentieth and twenty-first centuries simulated by the NCAR Climate System Model. J. Climate, 14, 485–519.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin, 1995: Bayesian Data Analysis. Chapman and Hall, 526 pp.

Goldstein, M., and J. Rougier, 2005: Probabilistic formulations for transferring inferences from mathematical models to physical systems. SIAM J. Sci. Comput., 26, 467–487.

Goldstein, M., and J. Rougier, 2006: Reified Bayesian modelling and inference for physical systems. J. Stat. Plann. Inference, in press.

Higdon, D., M. C. Kennedy, J. Cavendish, J. Cafeo, and R. D. Ryne, 2005: Combining field data and computer simulations for calibration and prediction. SIAM J. Sci. Comput., 26, 448–466.

Kaufmann, R. K., and D. I. Stern, 1997: Evidence for human influence on climate from hemispheric temperature relations. Nature, 388, 39–44.

Kennedy, M. C., and A. O’Hagan, 2001: Bayesian calibration of computer models. J. Roy. Stat. Soc., 63B, 425–464.

Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensembles. Science, 285, 1548–1550.

McGuffie, K., and A. Henderson-Sellers, 1997: A Climate Modelling Primer. 2nd ed. Wiley, 261 pp.

Moss, R. H., and S. H. Schneider, 2000: Uncertainties. Guidance Papers on the Cross Cutting Issues of the Third Assessment Report of the IPCC, R. Pachauri, T. Taniguchi, and K. Tanaka, Eds., World Meteorological Organization, 33–51.

Ravishanker, N., and D. Dey, 2002: A First Course in Linear Model Theory. Chapman and Hall, 473 pp.

Reilly, J., P. H. Stone, C. E. Forest, M. D. Webster, H. D. Jacoby, and R. G. Prinn, 2001: Uncertainty and climate change assessments. Science, 293, 430–433.

Santner, T. J., B. J. Williams, and W. I. Notz, 2003: The Design and Analysis of Computer Experiments. Springer, 283 pp.

Smith, R. L., T. M. L. Wigley, and B. D. Santer, 2003: A bivariate time series approach to anthropogenic trend detection in hemispheric mean temperatures. J. Climate, 16, 1228–1240.

Tebaldi, C., R. L. Smith, D. Nychka, and L. O. Mearns, 2005: Quantifying uncertainty in projections of regional climate change: A Bayesian approach to the analysis of multimodel ensembles. J. Climate, 18, 1524–1540.

Washington, W. M., and Coauthors, 2000: Parallel climate model (PCM) control and transient simulations. Climate Dyn., 16, 755–774.

Webster, M., 2003: Communicating climate change uncertainty to policy-makers and the public. Climatic Change, 61, 1–8.

Wigley, T. M. L., R. L. Smith, and B. D. Santer, 1998: Anthropogenic influence on the autocorrelation function of hemispheric-mean temperatures. Science, 282, 1676–1679.

Wikle, C. K., L. M. Berliner, and N. Cressie, 1998: Hierarchical Bayesian space-time models. Environ. Ecol. Stat., 5, 117–154.

APPENDIX A

Derivation of (28)

Our derivation of (28) relies on a result from linear algebra.

Lemma

If 𝗔 is a dN × dN invertible matrix and 𝗕 is a dN × d matrix, then

 
formula

Note that

 
formula

where

 
formula

and, similarly,

 
formula

Hence, we can apply the lemma to calculate the inverses of each Am and Σ. After some algebra we obtain

 
formula

where

 
formula

Next,

 
formula
 
formula

where

 
formula

Next, we apply (A5) in (A8) and substitute the result into

 
formula

where

 
formula

Finally,

 
formula
 
formula

Both (A13) and the reduction of 𝗛⁻¹ to the result in (28) are applications of the matrix identity (𝗘 + 𝗙)⁻¹ = 𝗘⁻¹ − 𝗘⁻¹(𝗘⁻¹ + 𝗙⁻¹)⁻¹𝗘⁻¹.
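
This identity is easy to verify numerically; the following sketch (an illustration, not part of the derivation) checks it on random symmetric positive definite matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
# Random symmetric positive definite E and F, so all inverses below exist.
A = rng.normal(size=(d, d)); E = A @ A.T + d * np.eye(d)
B = rng.normal(size=(d, d)); F = B @ B.T + d * np.eye(d)

inv = np.linalg.inv
lhs = inv(E + F)
rhs = inv(E) - inv(E) @ inv(inv(E) + inv(F)) @ inv(E)

print(np.allclose(lhs, rhs))   # True
```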

APPENDIX B

Parameter Models

Data model parameters
Observation error variances

All four measurement error variances in (46) were assigned independent inverse gamma priors (e.g., Berger 1985, p. 561). Inverse gamma distributions can be chosen to match one’s prior mean and variance and are convenient for calculations. We assumed prior mean (standard deviation) 0.0303 (0.0003) for both σ²0n and σ²0s, and prior mean (standard deviation) 0.0152 (0.0002) for both σ²1n and σ²1s.
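
Moment matching for the inverse gamma is a closed-form solve: with mean m = b/(a − 1) (a > 1) and variance s² = b²/[(a − 1)²(a − 2)] (a > 2), one obtains a = 2 + (m/s)² and b = m(a − 1). A short sketch (the helper name is ours; the moments are those quoted above):

```python
from scipy import stats

def invgamma_from_moments(mean, sd):
    """Shape a and scale b of an inverse gamma matching a given mean and sd."""
    a = 2.0 + (mean / sd) ** 2
    b = mean * (a - 1.0)
    return a, b

for m, s in [(0.0303, 0.0003), (0.0152, 0.0002)]:
    a, b = invgamma_from_moments(m, s)
    d = stats.invgamma(a, scale=b)     # check the match numerically
    print(f"a={a:.1f}, b={b:.4f} -> mean={d.mean():.4f}, sd={d.std():.4f}")
```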

Changepoint in observation error variances

We assumed that tc is a Gaussian random variable with mean corresponding to January 1900 and standard deviation corresponding to 7 yr.

Model error covariances

We use a d-dimensional generalization of the inverse gamma known as the inverse Wishart distribution (e.g., Box and Tiao 1973, section 8.5.1). A random d × d covariance matrix Σ has an inverse Wishart prior, denoted W−1(Σ0, ν), if its prior density π(·) is

 
formula

where Σ0 is a fixed, positive definite covariance matrix. For ν > 2, the mean of this prior is ν(ν − 2)⁻¹Σ0. The degrees of freedom parameter ν controls the spread of the distribution: for ν ≤ 2 the density is too diffuse for the mean to exist; for ν a little greater than 2, the density is relatively diffuse; and as ν grows, the density becomes increasingly concentrated about its mean.
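
To see the concentration effect, one can sample with SciPy. Note that SciPy’s invwishart(df, scale) uses a different parameterization from the W−1(Σ0, ν) convention here (SciPy’s mean is scale/(df − d − 1)), so the sketch below illustrates only the qualitative role of the degrees of freedom:

```python
import numpy as np
from scipy import stats

scale = np.diag([2.0, 1.0])
for df in (4, 10, 100):   # larger df: draws concentrate about the mean
    draws = stats.invwishart(df=df, scale=scale).rvs(size=2000, random_state=0)
    print(f"df={df:3d}  mean of draws:\n{draws.mean(axis=0).round(2)}")
```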

For our prior, we assume that ΣY1 and ΣY2 are independent draws from a W−1(Σ0, ν = 50) distribution, where Σ0 = diag(2, 1). We assume that Σβ is independently distributed as W−1(Σ0β, νβ = 10), with Σ0β = diag(2, 1). Table 2 indicates that these priors were completely overwhelmed by the data.

Process model parameters
Mean parameters

Recalling (53), we assumed αi = aα + Cibα + eαi, where the errors are modeled as an autoregressive time series:

 
formula

where

 
formula

As with aα and bα, we allow ϕαn, ϕαs, and Σα to differ between the observation and forecast periods. We use tilde notation (e.g., ϕ̃αn) to indicate the forecast period versions of these parameters. A small simulation sketch of this error model follows the list below. We assumed that

  • aα, bα, ãα, and b̃α have diffuse, uniform priors;

  • ϕαn, ϕαs, ϕ̃αn, and ϕ̃αs are independent and uniformly distributed on the interval (−1, 1); and

  • Σα and Σ̃α are independent and have W−1(Σ0 = 𝗜2, ν = 0.001) distributions; recall that with ν this small, the distribution is very diffuse.
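
A minimal simulation of this error model (our sketch, assuming a diagonal autoregression matrix, as the notation suggests): eαt = diag(ϕαn, ϕαs) eα,t−1 + ωt, with ωt ~ N(0, Σα). All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1200                                  # months, illustrative
phi = np.diag([0.6, 0.4])                 # assumed (phi_alpha_n, phi_alpha_s)
sigma_alpha = np.array([[0.02, 0.005],
                        [0.005, 0.01]])   # assumed innovation covariance
L = np.linalg.cholesky(sigma_alpha)

e = np.zeros((T, 2))
for t in range(1, T):
    e[t] = phi @ e[t - 1] + L @ rng.normal(size=2)

print("sample lag-1 autocorrelations:",
      [round(np.corrcoef(e[1:, i], e[:-1, i])[0, 1], 2) for i in range(2)])
```

The uniform (−1, 1) priors on the ϕ’s above enforce stationarity of this error process.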

Autoregression parameters

For month t in the observation period, we assume [see (54)] ηt = aη + |St| bη + eηt, where

 
formula

and

 
formula

In the forecast period, we replace (54) by

 
formula

where

 
formula

Discussion of how this model can be related to (54) is given in KB. We assumed that

  • aη, bη, ãη, and b̃η have diffuse, uniform priors;

  • ϕηn, ϕηs, ϕ̃ηn, and ϕ̃ηs are independent and uniformly distributed on the interval (−1, 1); and

  • Ση and Σ̃η are independent and have W−1(Σ0 = 𝗜2, ν = 0.001) distributions.

Model error variances

Recall the autoregressive time series model (55):

 
formula

We assume that

 
formula

As above, we assumed the same model for both periods but with different parameters, and we used vague priors analogous to those for the mean parameters. Table 3 summarizes posterior results for these parameters.

Footnotes

Corresponding author address: L. Mark Berliner, Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210-1247. Email: mb@stat.osu.edu