A Bayesian fingerprinting methodology for assessing anthropogenic impacts on climate was developed. This analysis considers the effect of increased CO2 on near-surface temperatures. A spatial CO2 fingerprint based on control and forced model output from the National Center for Atmospheric Research Climate System Model was developed. The Bayesian approach is distinguished by several new facets. First, the prior model for the amplitude of the fingerprint is a mixture of two distributions: one reflects prior uncertainty in the anticipated value of the amplitude under the hypothesis of “no climate change”; the second reflects behavior assuming “climate change forced by CO2.” Second, within the Bayesian framework, a new formulation of detection and attribution analyses based on practical significance of impacts rather than traditional statistical significance was presented. Third, since Bayesian analyses can be very sensitive to prior inputs, a robust Bayesian approach, which investigates the ranges of posterior inferences as prior inputs are varied, was used. Following presentation of numerical results that support the claim of changes in temperature patterns due to anthropogenic CO2 forcing, the article concludes with a comparative analysis for another CO2 fingerprint and selected discussion.
1. Introduction
Statistical analyses play a fundamental role in climate change assessment (Barnett et al. 1999). Many of the methods known as “fingerprinting” have strong roots in classical statistical hypothesis testing and regression analysis (e.g., Levine and Berliner 1999). An alternative to the traditional statistical viewpoint is offered by the Bayesian approach. To most easily illustrate the Bayesian approach, we develop a Bayesian fingerprinting analysis for near-surface temperature, restricted to the impact of CO2 forcing only. Heuristically, the modeling process formalizes the relation:
data = a · g + “noise,”  (1)
where a is an unknown parameter; g is a spatially varying CO2 fingerprint; and “noise” is modeled to account for both errors in the data and natural climate variability.
The observational dataset used here is a subset (1961–98) of the Jones annual temperature observations (Jones 1994; Jones et al. 1999). We rely on model output from the National Center for Atmospheric Research (NCAR) Climate System Model (CSM; see Boville and Gent 1998; Meehl et al. 2000) in developing a statistical model. When appropriate, datasets were spatially interpolated to the 5° × 5° Jones data grid. We used a 300-yr CSM control run as a basis for estimating natural climate variability. We used a 120-yr CO2 forced run in combination with the control run in constructing a CO2 fingerprint pattern g. Specifically, we computed differences between temporal averages of the final 100 yr of the forced CSM run and 100 yr of the control run at each grid box. Figure 1 displays this fingerprint. (The double use of the CSM control run in developing a fingerprint and estimating climate variability may be of some concern, though we judged that to be minor for our analyses.) For comparison, we also present analyses based on the CO2 fingerprint pattern of Santer et al. (1995).
The Bayesian viewpoint is characterized by two main points: uncertainty modeling and learning from data. In general, the uncertain quantities may be unobserved physical variables, including future values as in forecasting problems, as well as unknown model parameters. In this article, we consider only the latter setting: our analysis will be for the unknown amplitude a introduced in (1). The formal mechanisms for achieving the above two points are as follows.
All unknowns are treated as random variables and endowed with a prior distribution. For our example, the prior is denoted by π(a).
Learning from data proceeds via Bayes’s theorem: the data are related to the unknowns through a probability model, f(data|a) in our case. This distribution, often known as the likelihood function, is typically formulated in conventional statistical analyses as well. An intuitive interpretation of Bayes’s theorem is that once the data have been observed, the prior distribution is reweighted by the likelihood function to produce the posterior. General reviews can be found in Berger (1985), Epstein (1985), and Bernardo and Smith (1994).
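The prior-reweighting reading of Bayes’s theorem can be illustrated with a small numerical sketch (all numbers are artificial, and a grid approximation stands in for the continuous calculation):

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Grid of candidate amplitude values (artificial example).
grid = [i / 100.0 for i in range(-300, 301)]

prior = [normal_pdf(a, 0.0, 1.0) for a in grid]   # prior pi(a): N(0, 1)
lik = [normal_pdf(1.5, a, 0.5) for a in grid]     # likelihood f(data | a), data = 1.5

# Bayes' theorem on the grid: reweight the prior by the likelihood, normalize.
weights = [p * f for p, f in zip(prior, lik)]
z = sum(weights)
posterior = [w / z for w in weights]

# The posterior mean is pulled from the prior mean (0) toward the observation
# (1.5); the exact conjugate answer for these settings is 1.0.
post_mean = sum(a * w for a, w in zip(grid, posterior))
```

The grid sum plays the role of the normalizing constant; the same reweighting idea underlies the closed-form normal calculations used later.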
Specific reasons for adopting the Bayesian approach in climate change analysis include
inability to adopt the fundamental practices associated with traditional statistical cause–effect analysis, such as controlled experiments comparing responses to treatments;
the need to account for various uncertainties in observational data and our knowledge of how the climate system works;
the desire to assess the practical significance of potential changes, as opposed merely to their statistical significance.
Common approaches for dealing with the first item involve the use of climate models and their responses to various forcings or treatments. We followed suit in this article, using CSM model output, though in general a variety of prior elicitation approaches may be considered (Berger 1985; Bernardo and Smith 1994).
The second item is only partially dealt with here. Our statistical modeling described in section 2 explicitly adjusts for (i) errors in the observational data, (ii) missing data, and (iii) both spatial and temporal dependence structures present in climate variability. There is some evidence that observational uncertainties are negligible in climate change analyses for global, annual near-surface temperatures (Hegerl et al. 2000). Nevertheless, we present an approach for adjusting for data errors here because the method is simple and may be of interest in climate change studies for other physical variables with less reliable data. For most of this article, we focus on uncertainty regarding the amplitude a. First, a prior distribution for a is developed. However, the resulting prior is itself the object of uncertainty, and often a source of controversy. To address this problem, we develop robust Bayesian analyses. The key is that rather than viewing the prior as fixed and perfectly specified, we view our prior information as specifying only a class of plausible priors. The robust Bayesian approach considers the ranges of posterior inferences as the prior varies over the specified class.
Perspective for the third item is best provided via comparison to non-Bayesian approaches. In the simplified framework considered here, traditional fingerprinting involves applications of the triumvirate of classical statistical methods: estimation, confidence intervals, and testing for the amplitude a. Detection of climate change is associated with rejection of the hypothesis that a = 0 via classical statistical testing. However, such rejection does not admit the interpretation that the event a = 0 has small probability, nor does it imply anything regarding the chance that “a is near zero.” (Such statements make no sense in the strict classical statistical approach: a is not viewed as a random variable and hence cannot be the object of probability statements.) Next, assuming a “detection,” standard approaches to attribution are intended to provide statistical assessments of the consistency of a with a value (or values) of the parameter suggested by climate model responses to hypothesized anthropogenic forcings. The idea is that a forced, model-based estimate of a, say aM, is used as the null hypothesis of another test. Attribution is then associated with the failure to reject the hypothesis that a = aM. Again, such a failure to reject does not imply that the event “a is near aM” has high probability. Rather, it means that the observed data are not surprising if we assume that a = aM. Levine and Berliner (1999) note that such an approach is not recognized as being statistically valid even from the classical viewpoint.
The Bayesian’s treatment of parameters such as a as random variables can lead to very different formulations and interpretations of results. As opposed to the indirect inferences associated with classical statistical testing, Bayesian analyses can produce direct probability statements to serve as bases for detection and attribution. In particular, we define detection as occurring when the posterior probability that a departs significantly from 0 is large. “Significance” here refers to practical significance, rather than statistical significance. Specifically, for a specified neighborhood, say 𝒟, of a = 0, we have a detection if the posterior probability of the event “a is in 𝒟” is small. Similarly, for a specified neighborhood, say 𝒜, of a = aM, we (essentially) define attribution as occurring when the posterior probability that “a is in 𝒜” is large. (We say “essentially” here, because we include another technical condition for attribution, as clarified in section 3a.)
Forms of Bayesian analyses in climate change research have been discussed in the literature. Some examples include Hasselmann (1998), Leroy (1998), and Tol and de Vos (1998). However, these papers do not provide complete views of the potential of the Bayesian approach. Neither does this article, though we do provide substantial discussion of the flexibility of modeling and forms of inference associated with the Bayesian approach [also see Chiang et al. (1999, manuscript submitted to J. Climate)]. On the other hand, the scope of the model used here is limited. We make no pretense to describing a complete climate change analysis. We use a variety of simplifying assumptions. Some were chosen to enhance readability of the article; some were chosen due to computational limitations. For example, though we use a model for the space–time structure of the noise process in (1), we did not adjust for our uncertainty in that model. Inclusion of prior distributions for features of the distribution of the noise (e.g., the natural variability covariance matrix) is mandated. Further, we limit the form of signals to be purely spatial and a to be constant in time. A natural generalization is to allow for time-varying amplitudes and fingerprints (e.g., Hasselmann 1993; North and Stevens 1998; Santer et al. 1995; Tett et al. 1999). In a Bayesian framework, a collection of amplitudes a_t would be endowed with a time series prior developed for climate change analysis. Also, extensions to Bayesian multiattribute fingerprinting are quite accessible (Hasselmann 1997; Hegerl et al. 1997). Attention would focus on a vector, say a, whose dimension corresponds to the number of fingerprints used in the model. The posterior distribution of a would present us with dependence structures among the elements of a that would be of interest scientifically. We will pursue these avenues elsewhere.
Next, anthropogenic climate change need not be restricted to the study of signals. For example, we might anticipate potential changes in climate variability. We could use richer models and priors to enable consideration of both signal and variability. More generally, the probability distribution of climate may change form. Other potential extensions are mentioned in section 5d.
In section 2 we describe development of a statistical model for the data and a prior distribution for the parameter a particularly tailored for climate change analysis. Section 3 offers development of a variety of posterior Bayesian inferences. These include formalizations of our above notions of Bayesian detection and attribution as well as the robust Bayesian approach. Numerical results are presented in section 4. These results support both detection of change and attribution of that change to anthropogenic CO2 forcing. In section 5 we present companion Bayesian results for the CO2 fingerprint described in Santer et al. (1995) and some remarks and discussion.
2. Bayesian fingerprinting: Formulation
We discuss developments of the data model or likelihood function f(data|a) and prior π(a) in the next two subsections.
a. Likelihood function: f(data|a)
The observational data used here are the familiar Jones data (Jones 1994), consisting of monthly temperature observations on a 5° × 5° grid; we only use data for the period 1961–98. Our analyses are for yearly averages of these data. The Jones grid leads to n = 36 × 72 = 2592 spatial grid points. Some data are missing. Our analysis uses only actual data, that is, we do not impute any missing observations.
Let T_t represent the n vector of “true” annual, gridded temperature anomalies in year t, t = 1, . . . , m. (We perform analyses for four different choices of m.) We assume that these anomalies represent departures from a baseline compatible with that chosen in preparation of the Jones data; namely, the mean of temperatures over the 30-yr period 1961–90. (In general, definitions of baselines appropriate for climate change studies are very important in the interpretation of results.) Let T = (T′_1, . . . , T′_m)′ be the m · n vector collecting temperature anomalies over the entire period, where ′ denotes transpose. Define vectors of observations Y_1, . . . , Y_m, where Y_t represents “observed” annual temperatures in year t = 1, . . . , m. These vectors may be of various lengths, say n_t, due to missing data. Let Y = (Y′_1, . . . , Y′_m)′ be the collection of these vectors.
Our data model includes a formal adjustment for errors present in the observations. We assume that these errors all have mean zero, are independent from year to year, and have normal (Gaussian) distributions. To keep track of missing data, for each year define n_t × n incidence matrices X_t, consisting of rows of zeroes, except for a single 1, indicating locations of the observations. This is summarized for each year as
Y_t | T_t ∼ N(X_t T_t, D_t),
where D_t is an n_t × n_t covariance matrix. Alternatively, we can represent the model for all observations as
Y | T ∼ N(XT, D),
where X is a block diagonal matrix with blocks X_t and D is a block diagonal matrix with blocks D_t. [For completeness, note that the dimensions of X are (Σ_{t=1}^m n_t) × (n · m); the dimensions of D are (Σ_{t=1}^m n_t) × (Σ_{t=1}^m n_t).]
The second stage of our model considers natural climate variability and the spatial fingerprint. In particular we assume that each year t = 1, . . . , m
T_t | a ∼ N(a g, Σ_s),
where Σ_s is an n × n spatial covariance matrix; g is an n vector denoting the fingerprint. Note that, as in the fingerprint work of Hasselmann (1997) and the references therein, we assume that the fingerprint pattern is constant over time. Under this model, a standard probability calculation implies that, unconditional on T,
Y | a ∼ N(aXG, XΣX′ + D),  (6)
where G is an m · n vector of m replicates of g. The matrix Σ is an n · m × n · m space–time covariance matrix developed to represent both the spatial dependence reflected in Σ_s and temporal dependence. We assume that Σ is space–time separable as described in appendix A [see North and Wu (1999, manuscript submitted to J. Climate) and Cressie and Huang (1999) for potential avenues to relaxing this assumption].
A fully Bayesian analysis would involve formulation of prior probability models for the quantities a, D, Σ, and perhaps g. In view of the high dimensionality of the problem, this is a daunting task. For purposes of illustration, we follow simpler strategies of (i) formulating plausible estimates of D and Σ, and (ii) selecting fixed fingerprints. We focus on modeling and inference for the amplitude a here and describe our estimation of D and Σ in appendix A.
Both Bayesian and traditional approaches to statistical inference benefit from application of the so-called sufficiency principle (Berger 1985, 126–127; Epstein 1985, 23–25). The idea is to construct functions of the data that summarize all the statistical information contained in the data regarding the unknown parameters. It turns out that for the model (6) a sufficient statistic for a is the generalized least squares (GLS) estimator, denoted by â(Y) (e.g., Graybill 1976; Levine and Berliner 1999). This estimator is given by
â(Y) = [G′X′(XΣX′ + D)^{−1}XG]^{−1} G′X′(XΣX′ + D)^{−1}Y.  (7)
The quantity â(Y) in (7) is often known as the optimal filter in the climate science literature, though it should be clarified that it is optimal under a mean squared error criterion, and then only within the class of estimators that are both unbiased and linear in the data vector Y. It is also familiar as the most probable value of a, though this Bayesian-like interpretation is founded on a so-called “uniform prior” assumption; we do not use uniform priors in this article.
A further result from statistics (Graybill 1976) implies that
â(Y) | a ∼ N(a, σ²),  (8)
where
σ² = [G′X′(XΣX′ + D)^{−1}XG]^{−1}.  (9)
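A toy version of this calculation may help fix ideas. The sketch below assumes a scalar amplitude, a fully observed field, and a diagonal error covariance, so the matrix algebra of (7) and (9) collapses to weighted sums; the fingerprint and variance values are invented for illustration:

```python
def gls_amplitude(y, g, var):
    """GLS estimate of a scalar amplitude and its sampling variance, for a
    diagonal error covariance with per-gridbox variances `var` (a simplified
    stand-in for the full matrix expressions in (7) and (9))."""
    num = sum(gi * yi / vi for gi, yi, vi in zip(g, y, var))
    den = sum(gi * gi / vi for gi, vi in zip(g, var))
    return num / den, 1.0 / den

# Invented fingerprint, variances, and a noise-free field with amplitude 0.17:
g = [1.0, 0.5, -0.5, 2.0]
var = [0.2, 0.1, 0.3, 0.2]
y = [0.17 * gi for gi in g]

a_hat, sigma2 = gls_amplitude(y, g, var)   # recovers a = 0.17 (no noise here)
```

With noisy y, the returned sigma2 quantifies the sampling variance corresponding to (9).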
b. Prior: π(a)
The issues of climate change motivate construction of the prior π(a) as a mixture of two main components as follows: with probability p, a has a normal distribution with mean 0 and variance τ²; with probability 1 − p, a has a normal distribution with mean μA and variance τA². Symbolically, we write
π(a) = p n(0, τ²) + (1 − p) n(μA, τA²),  (10)
where n( · , · ) represents the probability density function for a normal distribution; the first argument is the mean and the second is the variance.
In specifying the prior parameters, note that we should think of a and g as coming in pairs. That is, our prior views concerning the actual values of a depend on the choice of g. The first component of (10) models information about a under the qualitative hypothesis that anthropogenic CO2 forcing has not impacted global temperatures, at least not in a fashion reflected by the fingerprint g. The selection of zero as the prior mean of a in this component seems quite natural. The variance τ² is interpreted as the naturally arising variability we anticipate in estimates of a obtained by projecting temperature fields onto the selected fingerprint g. The probability p is then a quantification of the degree of belief in the hypothesis of no anthropogenic impacts.
The second component of the prior represents an implied distribution on a assuming significant anthropogenic effects representable as changes in temperature patterns along the fingerprint g. The quantity μA is the hypothesized shift in a, which in turn translates to shifts a · g in the temperature anomalies. The variance τA² reflects anticipated variability in a under CO2 forcing. This parameter reflects, in part, natural climate variability, though there is no requirement that τA² = τ².
Note that the hypotheses here are modeled as distributions rather than as single points (i.e., either a = 0 or a = μA, exactly). This is in contrast to the simplified prior used in Hasselmann (1998), which is essentially (10) with both τ² and τA² set equal to zero, and which thus ignores a significant source of uncertainty.
The variation represented by τ² would be best estimated through (unavailable) replications of the climate system under a no CO2 forcing scenario. As a surrogate, we use CSM control run output. A single 300-yr series was partitioned into 100 batches of equal time duration, each representing “replications” of an unforced, surrogate global climate system over short time periods. The batches overlap in that they are constructed by taking a running window of 200 yr. For each batch and our CSM fingerprint pattern g, we computed generalized least squares estimates of a as if the batched data were real observations. That is, in each case, we assumed a model like (6) but with model output playing the role of the data Y and no measurement error covariance matrix D. The sample standard deviation (0.007) of the resulting 100 estimates then provides a plausible first guess at τ. However, since the results of this batching process would vary with sample size, number of batches, and so on, as well as across different CSM runs, we actually use the somewhat larger value τ = 0.02.
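The batching idea can be sketched as follows. Everything here is an artificial stand-in: an AR(1) series plays the role of the control-run projections, window means play the role of the per-batch GLS fits, and the inflation step is an ad hoc analog of the move from 0.007 to 0.02 described above:

```python
import random
import statistics

random.seed(0)

# Artificial AR(1) surrogate for 300 yr of control-run projection values.
n_years, window = 300, 200
series = [0.0]
for _ in range(n_years - 1):
    series.append(0.6 * series[-1] + random.gauss(0.0, 0.01))

# One "amplitude estimate" per overlapping 200-yr window; the window mean is a
# stand-in for the per-batch GLS fit described in the text.
estimates = [statistics.mean(series[i:i + window])
             for i in range(n_years - window + 1)]

# Sample standard deviation across batches gives a first guess at tau; the
# inflation factor and floor below are assumptions, echoing the text's
# decision to use a value larger than the raw guess.
tau_guess = statistics.stdev(estimates)
tau = max(2.0 * tau_guess, 0.02)
```

The 300-yr series with a 200-yr running window yields 101 overlapping batches, close to the 100 described in the text.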
Prior parameters under CO2 forcing were computed analogously to the previous discussion but using batches of the CSM CO2 forced output. We used the sample mean of the batched results to estimate the prior mean, μA. The sample variance provides a guess for τA², though we again used a slightly larger value since we are merely sampling from a particular CSM run.
3. Posterior inferences
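For reference, the mixture posterior invoked below follows from standard conjugate normal calculations. A reconstruction consistent with the text's references to (11)–(14), with σ² the sampling variance of â from (9), is:

```latex
\pi(a \mid \hat a) \;=\; p(\hat a)\, n\!\left(\mu_0(\hat a),\, v_0\right)
  \;+\; \left(1 - p(\hat a)\right) n\!\left(\mu_1(\hat a),\, v_1\right), \tag{11}

\mu_0(\hat a) = \frac{\tau^2 \hat a}{\sigma^2 + \tau^2}, \qquad
v_0 = \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}, \tag{12}

\mu_1(\hat a) = \frac{\sigma^2 \mu_A + \tau_A^2 \hat a}{\sigma^2 + \tau_A^2}, \qquad
v_1 = \frac{\sigma^2 \tau_A^2}{\sigma^2 + \tau_A^2}, \tag{13}

p(\hat a) = \frac{p\, n(\hat a;\, 0,\ \sigma^2 + \tau^2)}
  {p\, n(\hat a;\, 0,\ \sigma^2 + \tau^2) + (1 - p)\, n(\hat a;\, \mu_A,\ \sigma^2 + \tau_A^2)}. \tag{14}
```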
The posterior distribution of a, referenced below as (11), is a mixture of the two posteriors corresponding to each of the two components of the prior (10). The mixing weight p(â) is a function of the observed data. It is important to note that p(â) is not necessarily interpretable as the posterior probability that the no-change prior distribution is correct. Rather, it is a weight associated with the corresponding posterior distribution. Indeed, depending on the data, either one or both of the components of the posterior distribution (11) may be very different from their corresponding priors.
a. Detection and attribution
We next formalize the specialized inspections of the posterior distribution intended to serve as Bayesian procedures for detection and attribution. One motivation is to offer quantitative measures of direct “practical significance” of results as opposed to restriction to indirect “statistical significance.” We define “no significant climate change” to be the event that a ∈ 𝒟, where 𝒟 is some neighborhood of zero and ∈ is read as “is an element of.” Similarly, define another set, 𝒜, a neighborhood of the mean μA, to represent an attribution set reflecting values of a considered plausible in relation to the physics of CO2 impacts on climate as reflected (in our case) in the CSM. Recall that the physical meanings of a and g are paired through their product. Hence, the sets 𝒟 and 𝒜 may vary with g.
With these definitions, we can use the Bayesian model for inferences regarding detection and attribution. The notions are intuitive: we claim a detection if the posterior probability that a ∈ 𝒟 is small, and we claim an attribution if the posterior probability that a ∈ 𝒜 is large and, in addition, the posterior weight p(â) on the no-change component is small.
The two requirements in the above statement of attribution need not be redundant. Consider the following example (the numbers are completely artificial, chosen for ready illustration). Suppose that â|a ∼ N(a, 1) [see (8)]. The prior parameters chosen are p = 0.5, τ² = 1, μA = 5, and τA² = 25. The observed value of â is −5. Computing (14), we obtain p(â) = 0.045, apparently suggesting that these data offer strong evidence for the anthropogenically forced climate change model. Intuitively, this suggestion seems silly. The observed â = −5 is actually closer to the center, 0, of the no-change prior distribution than to the center, 5, of the forced prior distribution. However, in a sense, it is true that the forced model offers a better explanation of the data than does the no-change model. The key is the highly different values of the prior variances. Specifically, â is five standard deviations (under the no-change component of the prior) away from 0, but only two standard deviations away from 5 under the forced alternative component. That is, neither alternative offers a good explanation of these data, though in comparing only the two candidates, the forced model is far superior.
Two points should be clarified. First, the above behavior of p(â) could not occur if τ² = τA². However, our analyses below suggest that for our setting τ² < τA², so this discussion is important. The second, more critical point is that sole reliance on too simple a summary of the posterior can be misleading, even though the full posterior is well behaved. Computing the actual posterior (11) for our example yields
π(a | â = −5) ≈ 0.045 n(−2.5, 0.5) + 0.955 n(−4.62, 0.96).
This posterior is indeed far from either of the two components of the prior; the data have overwhelmed both of them. If we defined the negligible impact region 𝒟 to be the neighborhood (−0.5, 0.5), we have Pr(a ∈ 𝒟|â = −5) = 0.000 14. That is, we have a detection (of something); there is strong evidence of significant departure (as expressed in 𝒟) from the no-change hypothesis. On the other hand, if we define attribution regions 𝒜 to be even very large neighborhoods of μA = 5 [e.g., (2, 8)], we have Pr(a ∈ 𝒜|â = −5) = 0.0. That is, we find no evidence for attribution, despite the high posterior probability 0.955 on the forced alternative posterior model.
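This artificial example is easy to reproduce. The sketch below implements the standard normal-mixture formulas; the computed region probabilities agree with the quoted values up to rounding:

```python
import math

def npdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def ncdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

# Artificial example: a_hat | a ~ N(a, 1); prior p = 0.5, tau^2 = 1, mu_A = 5,
# tau_A^2 = 25; observed a_hat = -5.
p, tau2, mu_A, tauA2, sig2, a_hat = 0.5, 1.0, 5.0, 25.0, 1.0, -5.0

# Posterior weight p(a_hat) on the no-change component [cf. (14)].
f0 = npdf(a_hat, 0.0, sig2 + tau2)    # marginal of a_hat, no-change prior
f1 = npdf(a_hat, mu_A, sig2 + tauA2)  # marginal of a_hat, forced prior
p_hat = p * f0 / (p * f0 + (1.0 - p) * f1)   # ~0.045

# The two conjugate posterior components [cf. (12) and (13)].
m0, v0 = tau2 * a_hat / (sig2 + tau2), sig2 * tau2 / (sig2 + tau2)
m1 = (sig2 * mu_A + tauA2 * a_hat) / (sig2 + tauA2)
v1 = sig2 * tauA2 / (sig2 + tauA2)

def post_prob(lo, hi):
    """Posterior probability that a lies in (lo, hi) under the mixture."""
    return (p_hat * (ncdf(hi, m0, v0) - ncdf(lo, m0, v0))
            + (1.0 - p_hat) * (ncdf(hi, m1, v1) - ncdf(lo, m1, v1)))

detect = post_prob(-0.5, 0.5)   # ~1e-4: strong evidence of departure from 0
attrib = post_prob(2.0, 8.0)    # essentially 0: no support for attribution
```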
These examples, suggesting the potential need for care and inspection of the full posterior rather than simple summaries, may cause some readers to wonder if it might be easier to rely on non-Bayesian approaches. We think not, for a variety of reasons outlined in this article. Further, traditional testing is not immune to unintuitive results, but rather also requires care in interpretations. See Berger (1985) and Levine and Berliner (1999, p. 573) for examples.
Two important questions arise. First, who is to quantify what “small” and “large” mean in our definitions of detection and attribution? Second, who is to define the sets 𝒟 and 𝒜? Note that analogs of the first question also arise in non-Bayesian detection and attribution studies. Namely, who is to decide on cutoffs of achieved statistical significance (in the case of detection) and lack of statistical significance (in the case of attribution)? Regarding the specification of 𝒟 and 𝒜, some might suggest that such specifications are separate from the science of climate change. Even if that is true, we at least provide descriptions of tools for quantitative assessments of the practical impacts of human influences.
Our proposed detection and attribution analyses differ from conventional, two-step approaches, primarily due to the points of the previous paragraph. The Bayesian framework is not limited to simply computing direct analogs of classical procedures. For example, Hasselmann (1998) wrote “. . . it is not possible in the Bayesian approach to separate formally between detection and attribution.” We disagree, and have offered such a separation.
b. Classes of priors
The prior (10) assumes that μA, τ², and τA² are known quantities. In reality, we would be uncertain about these parameters. (This also is true for p; one could seldom argue that p should be 0.7253, but not 0.73. However, study of the impact of varying p is quite easy.) In the Bayesian paradigm, a hierarchical view, in which prior distributions for these quantities are incorporated into the analysis, could be adopted. An alternative is the robust Bayesian viewpoint. The idea is that our information seldom is refined enough to identify a single prior distribution. Rather, our prior information specifies a class of priors, denoted by Γ. We then study ranges of posterior measures (probabilities, estimates, etc.) as the prior varies over Γ. See Berger (1985, chapter 4) for overviews and examples of both the hierarchical and robust Bayesian viewpoints. Those descriptions include more general classes of priors (relaxation of reliance on Gaussian priors, etc.) and analyses.
As in (10), consider priors of the form π(a) = p n(0, τ²) + (1 − p) n(μA, τA²). For ease in conveying the idea, we assume that τ² is fixed, but we have substantial uncertainty in μA and τA². Similar calculations without this imposition are quite accessible, but our preliminary analyses for formulating estimates of the prior parameters suggest that this simplification is reasonable.
Define a collection of priors:
Γ = {π: π(a) = p n(0, τ²) + (1 − p) n(μA, τA²), μl ⩽ μA ⩽ μu, τl ⩽ τA ⩽ τu},
for specified bounds μl < μu and τl < τu. One example of a robust Bayesian calculation is to find the range of p(â) as the prior varies. During these calculations, â, p, σ², and τ² are fixed. Hence, from (14) we see that the maximum of p(â) is of the form
max p(â) = {1 + [(1 − p)/p] m(â)}^{−1},  (19)
and the minimum is of the form
min p(â) = {1 + [(1 − p)/p] M(â)}^{−1},  (20)
where the values M(â) and m(â) are to be determined. The mathematics for these optimizations are outlined in appendix B. The results have a useful interpretation in terms of odds ratios. The quantity (1 − p)/p is the prior odds in favor of the anthropogenically forced component of the distribution of a. The values M(â) and m(â) are extremal values of the corresponding odds as reflected by the data. The products of these values and the prior odds are extremals of posterior odds ratios. We return to this interpretation when presenting numerical results in the next section.
Computing bounds for Pr(a ∈ 𝒟|â) and Pr(a ∈ 𝒜|â) is more complicated than computing those for p(â). However, these bounds are easily found using simple numerical optimization methods.
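A crude numerical version of such a bound computation is a grid search over the class Γ. The settings below (â, σ², and the grid resolution) are illustrative inventions, not the article's values:

```python
import math

def npdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def p_post(a_hat, p, sig2, tau2, mu_A, tauA2):
    """Posterior weight on the no-change component [cf. (14)]."""
    f0 = npdf(a_hat, 0.0, sig2 + tau2)
    f1 = npdf(a_hat, mu_A, sig2 + tauA2)
    return p * f0 / (p * f0 + (1.0 - p) * f1)

# Illustrative fixed inputs and a grid approximation of the class Gamma:
a_hat, p, sig2, tau2 = 0.15, 0.5, 0.002, 0.0004          # tau = 0.02
mus = [0.15 + 0.002 * i for i in range(21)]              # mu_A in [0.15, 0.19]
taus = [0.05 + 0.002 * i for i in range(11)]             # tau_A in [0.05, 0.07]

# Range of p(a_hat) as the prior varies over the class.
vals = [p_post(a_hat, p, sig2, tau2, m, t * t) for m in mus for t in taus]
p_min, p_max = min(vals), max(vals)
```

The closed-form extrema (19) and (20) replace this brute-force search in the analyses reported below.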
4. Results
We present posterior inferences using four subsets of the Jones data corresponding to the time periods: (a) 1961–98; (b) 1970–98; (c) 1980–98; (d) 1988–98. The year 1961 was chosen as the initial time point since the baseline from which the Jones data anomalies are computed begins in 1961. These successively shorter time periods are suggested since climate change trends are anticipated to be increasingly more visible during the latter part of the twentieth century. It should be noted that this reasoning suggests that we should use different priors for a depending on the time period analyzed. For simplicity, we do not do so here.
Table 1 presents the generalized least squares estimates â of a defined in (7) and the associated standard deviations σ computed from (9) for each of the four time periods considered. Note that, as anticipated above, these estimates grow when computed for time windows increasingly focused on the recent past.
a. Basic analysis
Our prior parameter estimates, resulting from the “batching” of CSM output described earlier, are τ = 0.02, μA = 0.17, and τA = 0.05. (We actually inflated standard deviations obtained from those model data analyses a bit.) For these prior parameter values and each of the four time periods, Table 1 presents the means and standard deviations [see (12) and (13)] for the two components of the posterior distribution (11). Table 1 also includes the posterior weight p(â) [see (14)] on the no change posterior model, assuming the prior weight p = 0.5. Figure 2 displays the prior and posterior distributions of a for p = 0.5. The plots also display the likelihood functions derived from (8). Note that the posterior distributions are bimodal for time periods a and b, but not time periods c and d. The bimodality is a consequence of the mixing of the two components in the posterior distribution (11). In the latter two time periods, the posterior mixture probability p(â) is essentially zero. Thus the posterior distribution in these time periods is solely the component of the posterior distribution under CO2 forcing. These results generally suggest that the data most resemble the component of the posterior distributions of a corresponding to anthropogenic forcing; this behavior seems overwhelming for the most recent time periods.
Figure 3 displays the probability p(â) of the “no change” component of the posterior model, as a function of the prior probability p. These probabilities are of course related. For example, the more confident one is in the hypothesis of “no climate change,” the more difficult it is for data to sway that opinion. The posterior probability p(â) is of moderate size for time period a. In fact, for p = 0.5, p(â) is about 0.70, providing more weight for the no change component of the posterior distribution. For time period b, the posterior probabilities p(â) lie well below 0.5 for most of the range of p, thus placing substantial weight on the “anthropogenic forcing” component of the posterior distribution. We have not included corresponding plots for the remaining time periods c and d. The evidence from these periods is so overwhelmingly supportive of the forced model that p(â) (and its upper bound) is essentially zero unless p is absurdly large (0.99 or so).
b. Detection and attribution results
Figures 4 and 5 present the posterior probabilities of the detection and attribution regions, Pr(a ∈ 𝒟|â) and Pr(a ∈ 𝒜|â), respectively, as functions of p, for the time periods a and b. We selected 𝒟 = [0 − 0.05, 0 + 0.05] and 𝒜 = [μA − 0.05, μA + 0.05]. Figure 4 shows moderate probabilities of the detection region across the range of p for time period a, suggesting little or no evidence of a detected signal. In fact, when p = 0.5, Pr(a ∈ 𝒟|â) is about 0.6. For the same data, Fig. 5 displays small posterior probabilities of the attribution region across the range of p. There is some suggestion of a detected change based on time period b, in that Pr(a ∈ 𝒟|â) < 0.5 for most of the range of p. However, since the corresponding posterior probability Pr(a ∈ 𝒜|â) is on the order of 0.3–0.4 across the range of p, there is little evidence for attribution to the CO2 forcing represented in our CSM-based fingerprint. These moderate values may be expected after examining Fig. 2. In that figure, the likelihood function lies in between the two components of the prior mixture, and the two components of the posterior distribution overlap significantly, particularly for time period a.
For time periods c and d, the data strongly support the CO2 forcing model. In particular, the posterior probabilities Pr(a ∈ 𝒟|â) are essentially zero (less than 6 × 10−3 for c; less than 10−6 for d) for reasonable prior probabilities p. The posterior probabilities Pr(a ∈ 𝒜|â) are constant across the range of p with values 0.92 (period c) and 0.73 (period d). We thus do not include these probabilities for time periods c and d in Figs. 4 and 5. The constant values arise because the posterior weight 1 − p(â) on the forced posterior component is essentially one for all p. (We explain the apparent decrease in this probability based on period d below.)
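Under the Gaussian mixture prior and Gaussian model for â, the posterior for a is itself a two-component mixture of Gaussians, so the detection and attribution probabilities reduce to mixture-weighted Gaussian interval probabilities. The sketch below uses the baseline prior values (μA = 0.17, τA = 0.05, τ = 0.02) with illustrative â and σ, not values from the article.

```python
from math import erf, exp, pi, sqrt

def norm_pdf(x, mu, var):
    return exp(-(x - mu) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)

def norm_cdf(x, mu, var):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * var)))

def region_probability(a_hat, sigma2, p, lo, hi,
                       mu_A=0.17, tau2=0.02 ** 2, tau2_A=0.05 ** 2):
    """Posterior probability that the amplitude a lies in [lo, hi]."""
    # Mixture weight on the "no change" component, updated by the data.
    m0 = norm_pdf(a_hat, 0.0, tau2 + sigma2)
    m1 = norm_pdf(a_hat, mu_A, tau2_A + sigma2)
    w = p * m0 / (p * m0 + (1.0 - p) * m1)
    # Conjugate Gaussian posterior for a within each component.
    mean0 = a_hat * tau2 / (tau2 + sigma2)
    var0 = tau2 * sigma2 / (tau2 + sigma2)
    mean1 = (a_hat * tau2_A + mu_A * sigma2) / (tau2_A + sigma2)
    var1 = tau2_A * sigma2 / (tau2_A + sigma2)
    interval = lambda m, v: norm_cdf(hi, m, v) - norm_cdf(lo, m, v)
    return w * interval(mean0, var0) + (1.0 - w) * interval(mean1, var1)
```

Detection is associated with a small value of this probability over 𝒟 = [−0.05, 0.05]; attribution with a large value over 𝒜 = [μA − 0.05, μA + 0.05].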
c. Robust Bayesian results
To illustrate robust Bayesian results for a plausible class of priors, we specified Γ based on the ranges μA ∈ [0.15, 0.19] and τA ∈ [0.05, 0.07] (recall that our baseline prior had μA = 0.17, τA = 0.05).
Table 2 displays the computed values of M(â) and m(â) needed to find the extrema of p(â) described in (19) and (20). These extrema are plotted as functions of p for the first two time periods in Fig. 3. For time period b, the results are quite strong. For example, we have that for p = 0.5, 0.041 ⩽ p(â) ⩽ 0.104; even for the large value p = 0.75, 0.115 ⩽ p(â) ⩽ 0.258.
For the two later time periods c and d the very strong evidence against the no change hypothesis remains even over this class of priors. The values of m and M for these time periods are overwhelmingly supportive of the CO2 forced distribution. The posterior probability of the no change model is essentially zero (to three decimal places) over the entire class for the third data subset, unless one has absurdly high prior odds on the no change model. For example, one would need prior odds in favor of no change of about 6300-to-1 to be indifferent on the hypotheses; that is, for the posterior range on p(â) to include 0.5. The results for the last data subset are even more extreme. However, we do not claim that based on the fourth time period, we have proven that starting with an indifferent prior (p = 0.5), the odds in favor of the CO2 forced model are at least 43 524-to-1. Such extreme odds and corresponding probabilities are computed as tail events of the Gaussian distributions assumed in this article. While these Gaussian distributions appear plausible, they can imply probabilities of tail events that are subject to round-off error and are neither testable nor believable. Further, these probabilities rest on all the other assumptions made concerning other parameter specifications. Hence, while we find these results compelling, the actual values obtained should not be considered exact when they are this extreme. However, even after rounding 43 524-to-1 down to 1000-to-1 or 100-to-1, the result remains of serious interest.
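The odds quoted here are direct transformations of the posterior probability p(â); for example, odds of 43 524-to-1 in favor of the forced model correspond to p(â) = 1/43 525. A minimal sketch of the conversion:

```python
def odds_in_favor_of_forcing(p_post):
    """Posterior odds for the forced model, given p(a_hat) for "no change"."""
    return (1.0 - p_post) / p_post

def p_from_odds(odds):
    """Posterior probability of "no change" implied by odds-to-1 for forcing."""
    return 1.0 / (1.0 + odds)
```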
Figures 4 and 5 present the upper and lower bounds on the detection and attribution probabilities, Pr(a ∈ [−0.05, 0.05]|â) and Pr(a ∈ [μA − 0.05, μA + 0.05]|â), respectively for time periods a and b. Notice that the ranges for the probability of the detection region are tight for each of the two time periods. Though not displayed here, the latter two time periods c and d show similar behavior with probability ranges bounded very close to zero. This suggests that our interpretation of these posterior probabilities is “robust” with respect to priors varying within Γ. Our conclusion is that there is very strong evidence of a detected signal based on the last three time periods.
The attribution region probabilities are also robust over Γ. Based on time period a, the posterior probabilities of 𝒜 lie within a range of zero to 0.3, suggesting no ability to attribute a climate change signal to anthropogenic forcing. For time period b, the attribution probabilities may fall slightly above 0.5 depending on prior parameter specification on Γ. The range over which the posterior probability of 𝒜 may fall is tight for periods c and d as well, being (0.8, 0.92) and (0.45, 0.95) respectively. These ranges are constant across p and are thus not shown in Fig. 5. Hence our conclusion that the data most resemble the component of the posterior distribution corresponding to anthropogenic forcing is reasonable over this range of prior distributions. The larger range, (0.45, 0.95), on the last time period is a consequence of the large estimated signal, â = 0.22, actually being greater than the upper bound on μA used to define Γ. Since these data do intuitively support strong projection onto our fingerprint, we still consider this good evidence for attribution. In fact, had both our range for μA been more conservative and our definition of 𝒜 been larger to include large a, the resulting probabilities of 𝒜 would be much greater. (Recall that such increases are actually in accord with our prior beliefs, though we did not vary the prior on a with the time period considered.)
a. An alternate fingerprint
We performed an analogous analysis using the CO2 fingerprint (see Fig. 6) developed in Santer et al. (1995). This fingerprint uses the GRANTOUR/CCM1 experimental configuration (Taylor and Penner 1994). Table 3 presents the GLS estimates â of a and associated standard deviations σ for the time periods a–d. Note that these results are similar to those based on our CSM fingerprint, except that the estimated amplitudes for the Santer et al. fingerprint are consistently smaller. However, recall our earlier alert that amplitudes and fingerprints should be viewed as arising in pairs. This means that we should redefine our basic prior, class of priors, and detection and attribution regions for use in analyzing the Santer et al. fingerprint. One option is to repeat the batching ideas used earlier. A second option is to view the adjustments needed as a problem of standardization to place each fingerprint on a common scale for a fair comparison. We chose the second option.
Let g denote our CSM fingerprint, w denote the Santer fingerprint, and ag and aw denote the corresponding amplitudes. We consider renormalization of the fingerprints by their squared L2 norms (sums of squared values): ‖g‖² = 90 and ‖w‖² = 100. Standardizing each fingerprint by the square root of its squared norm, we suggest that if we are to compare projections of the data
then we expect that aw ≈ 0.95 ag.
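The factor 0.95 can be recovered from the quoted norms: matching the projections of the standardized fingerprints requires ag‖g‖ ≈ aw‖w‖, so aw ≈ (‖g‖/‖w‖) ag = √(90/100) ag. A one-line sketch:

```python
from math import sqrt

def amplitude_conversion_factor(g_sq_norm=90.0, w_sq_norm=100.0):
    """Factor relating the two amplitudes after standardizing each
    fingerprint by the square root of its squared L2 norm:
    a_w is approximately factor * a_g."""
    return sqrt(g_sq_norm / w_sq_norm)
```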
A simple transformation of the prior developed for ag takes us to a comparable prior for aw: namely, μA = (0.95)(0.17) = 0.162, τA = (0.95)(0.05) = 0.0475, and τ = (0.95)(0.02) = 0.019. Since this argument is quite ad hoc and leaves us less certain about where we anticipate aw to be, we thought it reasonable to use a larger class of priors for a robust Bayesian analysis: namely, we assumed that Γw be generated by the ranges μA ∈ [0.13, 0.21] and τA ∈ [0.05, 0.08]. (Compare these with μA ∈ [0.15, 0.19] and τA ∈ [0.05, 0.07] used for our fingerprint.)
For brevity, we present results only for detection and attribution analyses. We again associate detection with small probabilities Pr(aw ∈ [−0.05, 0.05]|â), and attribution with large probabilities Pr(aw ∈ [μA − 0.05, μA + 0.05]|â) (recall, now the baseline μA = 0.162). Figures 7 and 8 are plots of these probabilities and ranges as functions of p for the time periods a and b. For time periods c and d, the probabilities of the detection region are tightly bounded around zero for reasonable values of p and thus not presented in Fig. 7. For these same time periods, the probabilities of the attribution region are constant across the range of p with values 0.65 and 0.95 and constant bounds from the robust analysis (0.05, 0.98) and (0.90, 0.96), respectively.
The results appear somewhat less definitive for the Santer fingerprint than for the CSM fingerprint. In particular, the ranges on probabilities appear larger, but of course the class Γw is larger than its counterpart. However, there is one major exception: time period d now yields very strong and robust attribution. Note that the estimate of aw from this period does fall in the range of means defined by Γw. Indeed, our overall conclusion is in concert with that of our CSM-based fingerprint: namely, there is strong evidence in the recent data for detection and attribution to CO2 forcing as reflected in either fingerprint.
b. Comparison to traditional results
We provide traditional, non-Bayesian detection and attribution results. By non-Bayesian detection, we mean rejection of the null hypothesis that a = 0, when compared to the alternative hypothesis a > 0. Such a test rejects when the significance probability or p value for the test is small. By non-Bayesian attribution, we mean failure to reject (or “acceptance” of) the null hypothesis that a = μA, when compared to the alternative hypothesis a ≠ μA. Intuitively this is indicated by a relatively large significance probability. Note that the results given in Table 4 are generally in accordance with simple summaries of our Bayesian results. One exception is the apparent strong, traditional detection results for both fingerprints based on the full data for the time period 1961–98. Our companion Bayesian analyses are more conservative. This is a result of the fact that the non-Bayesian calculation only considers the plausibility of the hypothesis a = 0 exactly, while our Bayesian formulation studies the plausibility of a lying near zero. Such conservatism through more attention to uncertainty suggests that one may derive comparatively more confidence in our Bayesian detections when they do occur. The situation is somewhat reversed in the context of attribution. The traditional test considers the plausibility of the value a = μA exactly, while our Bayesian formulation essentially considers events of the form “a lies near μA.” Nevertheless, the use of robust Bayesian calculations in conjunction with inspection of practical significance of results suggests both the inferential value and robustness of the Bayesian viewpoint.
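The traditional calculations can be sketched directly, treating the estimate â as Gaussian about a with known standard deviation σ (a standard assumption for the GLS estimate; the numerical inputs below are illustrative):

```python
from math import erf, sqrt

def std_norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def detection_p_value(a_hat, sigma):
    """One-sided p value for H0: a = 0 against a > 0."""
    return 1.0 - std_norm_cdf(a_hat / sigma)

def attribution_p_value(a_hat, sigma, mu_A=0.17):
    """Two-sided p value for H0: a = mu_A against a != mu_A;
    a large value indicates 'acceptance' (traditional attribution)."""
    z = abs(a_hat - mu_A) / sigma
    return 2.0 * (1.0 - std_norm_cdf(z))
```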
Though the form of GLS estimates used here is simple, the actual calculations for finding â and σ2 are not easy, primarily due to the presence of the inverse of the very high dimensional matrix V. However, the calculations only require that we compute two quadratic forms [the numerator and denominator in (7)]. Rather than attempting inversion of V, we numerically solved the linear equation Vz = H.
Since z = V−1H, the computations are completed by finding two products: Y′z and H′z. We used the conjugate gradient algorithm to approximate the solution z (Golub and van Loan 1996).
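This computation can be sketched with a small illustrative V. The conjugate gradient iteration below follows the standard algorithm for symmetric positive-definite systems; the variance expression 1/(H′V⁻¹H) is the standard GLS form, stated here as an assumption since (7) is not reproduced above.

```python
import numpy as np

def conjugate_gradient(V, H, tol=1e-10, max_iter=1000):
    """Approximately solve V z = H for symmetric positive-definite V
    (Golub and van Loan 1996), avoiding explicit inversion of V."""
    z = np.zeros_like(H)
    r = H - V @ z                 # initial residual
    d = r.copy()                  # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Vd = V @ d
        alpha = rs / (d @ Vd)     # optimal step along d
        z = z + alpha * d
        r = r - alpha * Vd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d # conjugate update of direction
        rs = rs_new
    return z

def gls_amplitude(Y, H, V):
    """GLS estimate a_hat = (Y' V^-1 H) / (H' V^-1 H) and its variance
    1/(H' V^-1 H), both obtained from the single solve V z = H."""
    z = conjugate_gradient(V, H)
    denom = H @ z                 # quadratic form H' V^-1 H
    return (Y @ z) / denom, 1.0 / denom
```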
An intriguing and important issue involves the selection of fingerprints. In our analysis we viewed the fingerprint g as fixed. Of course, we might also quantify our uncertainty about g (e.g., Allen and Tett 1999). (Before doing so, we must entertain the question of the existence of a g.) In developing a joint prior on ag and g, a two-step thought process may be of value. Any joint prior π(g, ag) can be factored as π(g, ag) = π(ag|g)π(g). This means one can build the prior on the fingerprints, and then a prior on amplitudes conditional on fingerprints. For example, we might envision a collection of models or theories, each of which produces particular forms of fingerprints, and then implied distributions on the ag. One can use the Bayesian strategy to assess the models (see Leroy 1998), as well as to combine information from them. In the statistics literature, these tracks are known as model selection and model averaging, respectively.
The specification and interpretations of modeled quantities are intricately related. We have already alluded to the intertwining of fingerprints g and amplitudes a. In addition, the meaning of climate variability, expressed through Σ, must be gauged against the fact that the primary model is essentially a signal-plus-noise model. That is, Σ does not exist in isolation but rather is defined to reflect variation around a particular specification of the mean (taken here to be zero) of anomalies. The difficulty is that the anomalies are in turn defined as departures from some, typically ad hoc, baseline. These apparently subtle issues merit more careful attention to enable unequivocal interpretation of statistically estimated amplitudes.
e. Objectivity versus subjectivity
Bayesian statistics is by nature subjective; people with different priors may arrive at different conclusions. This issue has a long history and many interesting points of discussion (Berger 1985; Bernardo and Smith 1994), especially since traditional statistics methods are often thought to be objective. In complex settings it is difficult to argue that only objectivity really guides analysis; a variety of assumptions are made about the modeling. Indeed, substantial prior information is used in deciding on methods of analysis and data sources to be used, as well as in the final interpretation of results. Rather than “sweeping them under the carpet” (a phrase due to I. J. Good in this context), the Bayesian seeks to make clear what the assumptions are and quantifies them. This enables comparatively simple assessments of model assumptions and permits inspections of the sensitivity of results to these assumptions. Of course, we admit that there is essentially no such thing as a perfect Bayesian, and in that spirit we have alluded to the robust Bayesian viewpoint.
However, even if we accept that classical statistical results are objective, we reiterate that they are only indirect answers to the fundamental questions typically of interest. Further, what about climate change science should be objective? The public is not interested in truly objective predictions regarding climate change, and so on. Rather, the most informed, intelligent predictions ought to be the goal. This statement should not be misinterpreted to suggest that analysts should base analyses on their personal hidden agendas, bounties on “proofs” of anthropogenic climate change or the reverse, and so on. Rather than being objective, analyses should be impartial (a word used by Laplace in this context), but informed by the best information and scientific opinion available. Again, there may be no such thing as a perfect and impartial Bayesian. Nevertheless, Bayesians suggest that the most effective way to seek impartiality as a community is for analyses to include description and quantification of the prior information used.
This research was supported in part by the Geophysical Statistics Project at the National Center for Atmospheric Research in Boulder, Colorado. LMB was also supported by the U. S. Environmental Protection Agency Agreement R827257-01-9. We are grateful to Rol Madden, Doug Nychka, and Chris Wikle for valuable discussions and comments on preliminary versions of this article. Tim Hoar shared valuable advice on large dataset manipulations. We also thank Phil Jones, Tim Osborne, and Mark New for providing “Jones data” and error variance estimates; Lawrence Buja for help in accessing CSM data; and Tom Wigley for providing the “Santer et al. fingerprint.” Last, the Editor, Francis Zwiers, and two referees provided valuable and insightful comments and suggestions for this article.
Estimation of Variability
Data error covariance
The observational error covariance matrix D was obtained from the standard errors of Jones et al. (1997). These standard errors were computed for each decade from 1856 to 1998. Figure A1 presents the average of these decadal standard errors. We assumed the yearly standard errors were equal to each other within each decade. Hence, we set the annual standard errors to be √10 times the decadal standard errors provided in Jones et al. (1997).
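The √10 scaling amounts to a one-line computation; it rests on the assumption that each decadal value is the standard error of a mean of 10 independent years with common variance:

```python
from math import sqrt

def annual_standard_error(decadal_se, years_per_decade=10):
    """Annual standard error implied by a decadal one, assuming the decadal
    value is the standard error of a mean of independent, equal-variance
    annual values."""
    return sqrt(years_per_decade) * decadal_se
```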
Natural climate variability
First, we define the Kronecker product of two matrices BI×J and CK×L. Let B(i, j) denote the i, jth element of B. Define B ⊗ C to be the IK × JL matrix obtained by arranging the blocks B(i, j)C over all pairs (i, j).
Our estimation of Σ is based on a critical assumption. Namely, we assume that Σ is space–time separable, in that it can be factored as Σ = Σc ⊗ Σs,
where Σc is an m × m matrix representing temporal dependencies and Σs is an n × n matrix representing spatial dependence structure. This assumption implies that we can estimate the spatial and temporal matrices separately. While we do not believe space–time separability is strongly defensible, we make the assumption for practical reasons. It also offers some adjustment for temporal dependence.
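The separable structure can be illustrated with small hypothetical matrices (m = 2 time points, n = 3 grid points; the entries below are invented for illustration). A practical payoff is that the Kronecker structure also factors the inverse, (Σc ⊗ Σs)⁻¹ = Σc⁻¹ ⊗ Σs⁻¹, which eases the high-dimensional computations noted earlier:

```python
import numpy as np

# Hypothetical temporal correlation (AR(1)-like) and spatial covariance.
Sigma_c = np.array([[1.0, 0.33],
                    [0.33, 1.0]])
Sigma_s = np.array([[2.0, 0.5, 0.1],
                    [0.5, 1.5, 0.3],
                    [0.1, 0.3, 1.0]])

# Separable space-time covariance: blocks Sigma_c(i, j) * Sigma_s.
Sigma = np.kron(Sigma_c, Sigma_s)   # shape (m*n, m*n) = (6, 6)
```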
Estimation of Σc
We set Σc to be the correlation matrix for an autoregressive process of order one (Brockwell and Davis 1991). In particular, all diagonal elements are equal to one. Off-diagonal elements are of the form ρ^d, where d is the time lag in years. We estimated ρ from the CSM control run. The result was ρ̂ = 0.33.
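Both steps can be sketched as follows: building the AR(1) correlation matrix, and estimating ρ as the lag-1 sample autocorrelation. The estimator and the control-run series used below are illustrative assumptions, not the article's exact procedure.

```python
import numpy as np

def ar1_correlation_matrix(m, rho):
    """m x m AR(1) correlation matrix: entry (i, j) equals rho**|i - j|."""
    lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return rho ** lags

def estimate_rho(x):
    """Lag-1 sample autocorrelation of a mean-removed series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)
```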
Estimation of Σs
For the space–time separable model with Σc specified as a correlation matrix, Σs is a spatial covariance of Tt for every year t. That is, it can be estimated (using surrogate model output) by averaging over years. Data from the 300-yr NCAR CSM control run are denoted by C = (C′1, . . . , C′300)′, where Ci is an n vector of the model output of global temperatures at time i = 1, . . . , 300. We model Σs as the sum
where the (λi, Pi) are the first 20 singular value–EOF pairs from the NCAR CSM control run. Here υ was set to 100. (The division by 300 is an adjustment for sample size.)
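A sketch of such an estimate is given below: a truncated EOF expansion of the control-run anomalies (via the SVD) plus a diagonal nugget term. The exact combination of singular values, sample-size adjustment, and the role of υ is our assumption, since the displayed equation is not reproduced here; with the truncation level equal to the full rank and no nugget, the construction reduces to the ordinary sample covariance.

```python
import numpy as np

def eof_spatial_covariance(C, k=20, nugget=100.0):
    """Estimate a spatial covariance from control-run output C (years x grid)
    by a truncated EOF (SVD) expansion plus a diagonal nugget.

    ASSUMED form: sum over the leading k pairs of (s_i**2 / n_years) P_i P_i'
    plus nugget * I; the article's exact formula may differ in detail.
    """
    n_years = C.shape[0]
    A = C - C.mean(axis=0)                 # anomalies about the sample mean
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    P = Vt[:k].T                           # leading spatial EOF patterns
    lam = s[:k] ** 2 / n_years             # eigenvalues, adjusted for sample size
    return (P * lam) @ P.T + nugget * np.eye(C.shape[1])
```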
Robust Bayesian Calculations
We compute the minimum and maximum values of p(â) over the class Γ defined in (18). We assume that the lower bound, μl, on μA is a positive value. We also assume that â is positive.
We rewrite p(â) in (14) as
Minimization (maximization) of p(â) involves maximization (minimization) of Q. The special form of Q enables these optimizations in two simple steps.
Case 1: â < μl
To minimize Q we should first set μA = μu, since Q(μu, τ2A) ⩽ Q(μA, τ2A) for all τA in Γ. It remains then to minimize Q(μu, τ2A) with respect to τA. The derivative of this function with respect to τ2A is
where H is strictly positive. Therefore, we must consider three subcases:
Case a: If (μu − â)2/(τ2A + σ2) > 1 on Γ, Q is increasing and the minimizing value of τ2A is the endpoint τ2l.
Case b: If (μu − â)2/(τ2A + σ2) < 1 on Γ, Q is decreasing and the minimizing value of τ2A is the endpoint τ2u.
Case c: If (μu − â)2/(τ2A + σ2) = 1 somewhere in Γ, Q is minimized at either τ2l or τ2u; one simply checks these two values numerically.
Next, to maximize Q we first set μA = μl, since Q(μl, τ2A) ≥ Q(μA, τ2A) for all τA in Γ. To maximize Q(μl, τ2A) with respect to τA, repeat the differentiation argument as in (B3). This again leads to three subcases:
Case a: If (μl − â)2/(τ2A + σ2) > 1 on Γ, Q is increasing and the maximizing value of τ2A is the endpoint τ2u.
Case b: If (μl − â)2/(τ2A + σ2) < 1 on Γ, Q is decreasing and the maximizing value of τ2A is the endpoint τ2l.
Case c: If (μl − â)2/(τ2A + σ2) = 1 somewhere in Γ, that value is the maximizer; namely, τ2A = (μl − â)2 − σ2.
Case 2: μl ⩽ â ⩽ μu
To minimize Q we set μA to be that endpoint of the interval that is farthest from â. Let μ* denote the result. That is, μ* = μu if μu − â ⩾ â − μl;
otherwise, μ* = μl. Then we again turn to differentiation to complete the minimization.
Case a: If (μ* − â)2/(τ2A + σ2) > 1 on Γ, the minimizing value of τ2A is τ2l.
Case b: If (μ* − â)2/(τ2A + σ2) < 1 on Γ, the minimizing value of τ2A is τ2u.
Case c: If (μ* − â)2/(τ2A + σ2) = 1 somewhere in Γ, the minimizer is either τ2l or τ2u.
To maximize Q we first set μA = â, since Q(â, τ2A) ≥ Q(μA, τ2A) for all τA in Γ. Hence, we need only maximize (τ2A + σ2)−1/2, which of course leads to τ2l as the solution.
Case 3: â > μu
This case is similar to Case 1. To minimize Q we set μA = μl. Then
Case a: If (μl − â)2/(τ2A + σ2) > 1 on Γ, the minimizer is τ2l.
Case b: If (μl − â)2/(τ2A + σ2) < 1 on Γ, the minimizer is τ2u.
Case c: If (μl − â)2/(τ2A + σ2) = 1 somewhere in Γ, the minimizer is either τ2l or τ2u.
To maximize Q set μA = μu. Next,
Case a: If (μu − â)2/(τ2A + σ2) > 1 on Γ, the maximizer is τ2u.
Case b: If (μu − â)2/(τ2A + σ2) < 1 on Γ, the maximizer is τ2l.
Case c: If (μu − â)2/(τ2A + σ2) = 1 somewhere in Γ, the maximizer is τ2A = (μu − â)2 − σ2.
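The case logic above can be checked numerically by brute force. The factor Q is taken here to be the forced-component predictive density (τ2A + σ2)^(−1/2) exp[−(μA − â)2/(2(τ2A + σ2))], a form inferred from the derivative condition and the remark following Case 2; the inputs (â = 0.17, σ = 0.02, and the Γ ranges μA ∈ [0.15, 0.19], τA ∈ [0.05, 0.07]) illustrate Case 2.

```python
import numpy as np

def Q(mu_A, tau2_A, a_hat, sigma2):
    """Forced-component predictive factor optimized in this appendix
    (assumed Gaussian form, up to constants)."""
    v = tau2_A + sigma2
    return np.exp(-(mu_A - a_hat) ** 2 / (2.0 * v)) / np.sqrt(v)

def grid_extrema(a_hat, sigma2, mu_l, mu_u, tau_l, tau_u, n=201):
    """Brute-force extrema of Q over the rectangle Gamma."""
    mus = np.linspace(mu_l, mu_u, n)
    tau2s = np.linspace(tau_l ** 2, tau_u ** 2, n)
    vals = Q(mus[:, None], tau2s[None, :], a_hat, sigma2)
    return vals.min(), vals.max()
```

For â inside [μl, μu], the grid maximum should match Q at (μA = â, τA = τl), and the minimum should sit at the μ endpoint farthest from â with τA = τu (since the ratio (μ − â)2/(τ2A + σ2) stays below one here), in agreement with Case 2.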
Corresponding author address: Mark Berliner, Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210-1247.
* The National Center for Atmospheric Research is sponsored by the National Science Foundation.