A Bayesian analysis of the evidence for human-induced climate change in global surface temperature observations is described. The analysis uses the standard optimal detection approach and explicitly incorporates prior knowledge about uncertainty and the influence of humans on the climate. This knowledge is expressed through prior distributions that are noncommittal on the climate change question. Evidence for detection and attribution is assessed probabilistically using clearly defined criteria. Detection requires that there is high likelihood that a given climate-model-simulated response to historical changes in greenhouse gas concentration and sulphate aerosol loading has been identified in observations. Attribution entails a more complex process that involves both the elimination of other plausible explanations of change and an assessment of the likelihood that the climate-model-simulated response to historical forcing changes is correct. The Bayesian formalism used in this study deals with this latter aspect of attribution in a more satisfactory way than the standard attribution consistency test. Very strong evidence is found to support the detection of an anthropogenic influence on the climate of the twentieth century. However, the evidence from the Bayesian attribution assessment is not as strong, possibly due to the limited length of the available observational record or sources of external forcing on the climate system that have not been accounted for in this study. It is estimated that strong evidence from a Bayesian attribution assessment using a relatively stringent attribution criterion may be available by 2020.
Statistical analysis plays a major role in climate change detection and attribution studies. The method widely used in such studies is known as “optimal fingerprinting” and utilizes the generalized multiple regression model
where vector y contains a filtered version of the observed temperature record, matrix 𝗫 contains estimated response patterns (signal patterns) to known forcings on the climate system, and vector ɛ contains natural variability that is generated by internal climate processes. The filtering process that is applied to the observations typically results in an observed vector y with only 10–20 elements. Each column in matrix 𝗫 contains the same number of elements as y, and a separate column is used for each of the external forcing factors that is taken into account in the analysis. Vector β contains one element for each signal taken into account. The signal pattern estimates contained in the columns of 𝗫 are typically obtained by using a coupled climate system model (see, e.g., Flato et al. 2000) to simulate the response to known forcing changes in the twentieth century. Vector ɛ is assumed to be a realization of a Gaussian random vector consisting of correlated elements.
The forcings that are most often accounted for in optimal fingerprinting studies result from anthropogenic changes in greenhouse gas concentrations and sulphate aerosol loadings. Some studies also account for anthropogenic changes in the distribution of ozone, and variability in natural external forcing factors such as changes in the solar output and in the stratospheric loading of aerosols from explosive volcanic eruptions. Because climate models simulate natural internal variability as well as the response to specified changes in external forcing, the response to forcing is typically estimated by averaging across an ensemble of simulations of the twentieth century. These signal estimates, which must be linearly independent, are inserted into the columns of matrix 𝗫. The vector of parameters β accounts for the possibility that the amplitudes of the climate model responses to the specified external forcings may not be correct by allowing us to scale the signal patterns to best match the pattern of change that is contained in the observations. A scaling factor different from unity might also indicate that the observations have been affected by one or more additional forcing factors not included in the model that may have similar patterns of response.
Detection and attribution questions are assessed through a combination of deductive reasoning (to determine whether there is evidence that other mechanisms of change not included in the climate model could plausibly explain the observed change) and by testing specific hypotheses on β. Detailed accounts of the optimal fingerprinting method are given by Hasselmann (1979, 1997), Allen and Tett (1999), and Allen et al. (2004). See also the authoritative review of Mitchell et al. (2001).
The detection of a postulated climate change signal occurs when its amplitude in observations is shown to be significantly different from zero. This is handled in the standard optimal fingerprinting approach by testing the null hypothesis
where 0 is a vector of zeros. Rejection of HD by a one-sided test of significance leads to detection at a specified level of significance.
Attribution refers to the process of establishing a cause and effect relationship between the observed change and the hypothesized forcing agents. The requirements for making an attribution claim (e.g., see Mitchell et al. 2001) are detection, elimination of other plausible causes, and evidence that the observed change is consistent with the estimated response to external forcing; that is, β = 1 where 1 is a vector of units. Classical “optimal fingerprinting” analysis uses a hypothesis test called the attribution consistency test (Hasselmann 1997; Allen and Tett 1999) to test the null hypothesis
Formally, consistency between the observed and climate-model-simulated response to forcing can be claimed when HA cannot be rejected. While a failure to reject HA does not constitute direct evidence for the hypothesis, consistency does provide support for attribution, particularly when the assessment of HA involves the simultaneous testing of several externally forced signals.
The Bayesian approach to assessing evidence (e.g., Hasselmann 1998; Leroy 1998; Berliner et al. 2000) provides an alternative to standard (frequentist) hypothesis testing. The Bayesian approach as presented by Berliner et al. (2000) consists of three main steps:
(i) “Optimal” (generalized linear) regression to estimate the vector of amplitudes β that produce the best fit between the climate model estimates of the response to external forcing and the observed data.
(ii) Identification of the posterior distribution of the amplitudes. This is achieved by combining prior knowledge about the amplitudes that has been expressed as a prior probability distribution with the optimal estimates obtained in (i).
(iii) A detection and attribution assessment that is done by examining the posterior probabilities for appropriately defined detection and attribution criteria.
The standard optimal fingerprinting analysis, in contrast, consists of step (i) followed by a step (iii′) in which hypotheses HD and HA are tested. Differences in the clarity of the inferences that can be made with these two approaches will become apparent in the example that will be described below.
We will see that the choice of inference technique does not affect the conclusion that there is strong evidence that the warming observed during the latter half of the twentieth century is unlikely to have been entirely due to natural internal variability (Houghton et al. 2001). Nevertheless, the Bayesian approach has the advantages that it allows incorporation of prior knowledge into the analysis and, more importantly, that it provides direct probabilistic assessments of the evidence for detection and attribution.
The particular Bayesian approach used in this paper is a modified version of Berliner et al.’s (2000) method. Our modifications include changes in the details of how the observations are filtered, the use of a different climate model to estimate the signals of interest, the use of a different approach for specifying the parameters of the prior distribution on the signal amplitudes, and a somewhat different approach to the assessment of the results. We apply the method to the detection of an anthropogenic signal in observed surface air and sea temperatures, and through this application we highlight issues associated with its implementation. The remainder of this paper is organized as follows. We describe the observations and the Bayesian method with which we study the climate change problem in section 2. Our findings are presented in section 3. Discussion and concluding remarks are given in section 4.
2. Data and methods
Our study is set up in a manner similar to Zwiers and Zhang (2003). The observational dataset used is the HadCRUTv dataset (Jones et al. 2001), which is based on monthly values of surface air and sea temperature anomalies relative to 1961–90 and is presented on a 5° latitude by 5° longitude grid.
We use only one externally forced climate signal in this study—that being the combined response to historical changes in greenhouse gas concentrations and the direct effect of sulphate aerosol loading. Following standard convention, we will refer to this signal as the “GS” signal. Our estimates of the GS signal, and of natural climate variability, are constructed from two versions of a coupled global climate model. The models used are the Canadian Center for Climate Modelling and Analysis (CCCma) CGCM1 (Flato et al. 2000; Boer et al. 2000) and CGCM2 (Flato and Boer 2001). Details of the models and an extensive archive of model output are available from the CCCma Web site (http://www.cccma.ec.gc.ca). Ensembles of three GS simulations are available from each model, for a total combined ensemble of six GS simulations. The GS signal is estimated by averaging across this six-member ensemble.
b. Analysis strategy
The data and climate model output that we use are preprocessed to restrict the analysis to large spatial scales and to filter as much natural internal variability as possible from the GS signal estimate. Thus our analysis is conducted on decadal averages of observed annual-mean temperature anomalies relative to the climatology for the twentieth century, calculated from nonmissing years in the observed data. These decadal anomalies are aggregated into 30° latitude by 40° longitude grid boxes for both observations and climate model output.
Separate analyses are performed for each of six overlapping five-decade periods (1900–49, 1910–59, 1920–69, 1930–79, 1940–89, and 1950–99). To avoid systematic bias, missing data are not filled in. Model output is flagged as missing whenever the corresponding observations are missing so that the length of the y vector is the same as the row dimension of the 𝗫 matrix. Observed annual means are treated as missing even if one month within the year is missing. Decadal means are treated as missing if fewer than 6 of the 10 years are present. In the absence of any missing data, the observational vector y would have length 5 × 6 × 9 = 270 for any one five-decade period, where 6 × 9 = 54 is the number of 30° latitude by 40° longitude grid boxes that cover the globe. Missing data reduces the dimension length to a number ranging from 220 for the period 1900–49 to 253 for the period 1950–1999.
c. Introduction to Bayesian aspects
Before we proceed with our analysis, readers not familiar with Bayesian statistics may need some clarifications on steps (ii) and (iii). We should begin by noting that the Bayesian aspect of our analysis begins with the reduced information that is obtained from the optimal regression analysis. That is, we use Bayesian tools to make inferences about β beginning with the estimate β̂ that was obtained by means of optimal regression. A more complete analysis would include the regression step within the Bayesian framework, and perhaps even the signal matrix 𝗫. However, this would significantly complicate the application of the Bayesian paradigm (see, e.g., the discussion in Berliner et al. 2000).
Bayesian inferences are based on the posterior distribution, which in our case is the conditional distribution of β given the estimated value β̂ from the “optimal” regression in step (i). To see how such a distribution is obtained, let π(β) be the density function of the prior distribution of β that expresses our prior knowledge about the true value of the amplitude vector before observing y and estimating β̂. After observing y and estimating β̂, our knowledge of β is updated by calculating the posterior distribution of β conditional upon β̂, which is denoted π(β/β̂). By Bayes theorem, the posterior distribution is given by
where f(β̂|β) is the conditional density of β̂ given the true amplitude β.
The posterior distribution contains all of the available information about β and thus any statistical inferences concerning β should be drawn from this distribution. Similar to Berliner et al. (2000), our detection criterion requires that β have a large posterior probability of lying outside a predefined neighborhood of 0. Because the external signals can be defined so that positive values of β would be expected when the signals are present in the observations, our “neighborhood” of 0 includes all negative amplitudes. Thus in our one signal case, the detection region simply reduces to an open-ended interval on the positive β axis. Similarly, in the multisignal case, the resulting multidimensional detection region is a positive quadrant of the β plane. Our criterion for assessing whether there is evidence to support attribution requires that β have large posterior probability of lying in a predefined neighborhood of 1. This neighborhood is referred to as the attribution region. In both cases, the posterior probabilities are calculated by integrating the posterior distribution [Eq. (2)] over these predefined regions.
The following subsections provide the details of our analysis procedure.
d. Optimal detection procedure
To perform the “optimal” regression, we implemented the method outlined by Allen and Tett (1999). Thus the estimated scaling factor obtained from the generalized linear regression is
where N is an estimate of the covariance matrix of ɛ. To ensure the variance of β̂ is not negatively biased because of possible overfitting (Hegerl et al. 1997), the variance of β̂ is estimated by
where N is a second, statistically independent estimate of the covariance matrix of ɛ.
The covariance matrices must be estimated from control simulation output from climate models because the available instrumental record does not contain sufficient information to do this reliably. Thus as in Zwiers and Zhang (2003), our covariance matrices are estimated from 1600 years of control simulation, composed of 600 years from CGCM1 and 1000 years from CGCM2. The 1600 years of control output is partitioned into two 800-yr subsets consisting of the first 300 years from CGCM1 and the first 500 years from CGCM2 in one case, and the last 300-yr and 500-yr segments in the other.
Covariance matrix N is estimated from the 72 overlapping five-decade control run chunks contained in the first control subset. Note that the chunks do not span the boundary between the 300-yr CGCM1 and 500-yr CGCM2 segments. Each 50-yr chunk is processed in the same manner as the observations to obtain a sequence of five decadal means that are then masked as the observations and aggregated into 30° × 40° grid boxes. The covariance matrix is then estimated from these processed sequences of control run output. Note that this estimation process is repeated for each of the five-decade analysis periods because the details of the masking, which reflect observational coverage, change from one period to the next. Matrix N is estimated similarly from the second subset of control run output.
A difficulty encountered in all detection studies using the optimal fingerprinting technique is that, even with the use of long control simulations, the covariance matrices N and N are not invertible. This occurs because the dimension of y remains large relative to the length of the available control simulations even after preprocessing into decadal and 30° × 40° box means. The exact dimension of y varies with the particular five-decade period being analyzed, but it is always larger than the number of overlapping five-decade sequences (72) that are available in each of the control simulation subsets. Consequently, the regression is performed in the space spanned by the k leading EOFs of N. A residual consistency test (Allen and Tett 1999) is performed to determine the appropriate truncation level k. A simple schematic that outlines the optimal fingerprinting approach can be found in Weaver and Zwiers (2000).
Once the “optimal” regression has been performed and β̂ computed, we can proceed with steps (ii) and (iii). This requires the selection of a prior distribution for β and the calculation of the resulting posterior distribution.
e. The prior distribution
An important element of the analysis is the choice of prior distribution. This distribution embodies our prior knowledge about the magnitude of the response to the external forcing of the climate system that is being investigated. This knowledge can be either subjective or objective in nature. An example of subjective knowledge that could be incorporated into the prior is a judgment about whether the climate has the potential to be affected by anthropogenic forcing. Such a judgment, which might be based on theoretical considerations, should be independent of observations gathered during the period that is represented by y. An example of how objective knowledge could be incorporated into the prior is that the width of the prior distribution could be chosen to reflect uncertainty in the amplitude of the response of climate models to known external forcings. Such an estimate of uncertainty might be obtained by performing “cross model” studies in which we attempt to detect a signal estimated with one climate model in the output of several other models (e.g., Hegerl et al. 2000). The width of the prior might also reflect uncertainty in the pattern of response, in the sense that a biased (but not totally orthogonal) response pattern may require a scaling factor different from unity to make the best fit with observations. In addition, the width of the prior could be augmented to reflect uncertainty in the forcing itself. Some considerations based on energy balance model (EBM) experiments that informed our specific choices of prior distributions are discussed below.
More generally, uncertainty can be modeled explicitly in the Bayesian framework by building hierarchical models in which uncertainty in the forcing and signal response patterns is explicitly taken into account. Alternatively, one could perform a robust Bayesian analysis in which the prior distribution is varied systematically over a range of possibilities that reflects the uncertainty of prior knowledge more completely than a single distribution (Berliner et al. 2000). However, we considered such approaches to be beyond the scope of this paper.
To make the above more concrete, we use an EBM to explore how climate model and forcing uncertainty might translate into uncertainty on β. To determine the uncertainty range, we have employed a suite of EBM simulations that reflect an estimate of climate model and forcing uncertainty. EBM simulations reproduce many of the large-scale temperature responses of general circulation models (e.g., Crowley et al. 1991). EBMs do not simulate internal climate variability (unless forced with noise) and thus they produce climate change signal estimates that are not affected by internal climate variability. Here we employ a linear two-dimensional (i.e., realistic land–sea distribution) seasonal model that has been used in comparisons to paleoclimate reconstructions (Crowley et al. 2003; Hegerl et al. 2003).
Parameters such as the climate sensitivity and ocean heat uptake must be set explicitly in EBMs. To reflect the range of climate model uncertainty and the most important forcing uncertainty, three parameters in the EBM simulations are varied:
The climate sensitivity, which represents the equilibrium response to CO2 doubling, was varied in 0.5°C increments from 1.5° to 4.5°C. This reflects the range presently covered by most coupled climate models as reported by Houghton et al. (2001).
The diffusivity, which determines the ocean heat uptake in the EBM, was set to values between 0.95 and 2.5 cm2 s−1, yielding a conservative range embracing an observational estimate from tracer studies of 1.7 ± 0.2 cm2 s−1 (Li et al. 1984).
Since EBMs cannot distinguish between direct and indirect tropospheric aerosol forcing, we also employed direct aerosol forcing starting in 1850 that reaches values between 0.5 and 4.0 W m−2 by 2000 in increments of 0.5 W m−2 for the latitude strip of 30°–90°N. The forcing in other latitude bands was tapered to lower values in successive equal-area strips so as to yield a total global aerosol forcing of approximately 0.3 to 2.5 W m−2. This approximately covers the Houghton et al. (2001) range for the total combined direct and indirect aerosol forcing.
The first two parameters are important determinants of the large-scale response of climate models to external forcing (cf. Forest et al. 2000), while the total value of aerosol forcing is highly uncertain (Houghton et al. 2001) and was therefore also varied. Note that all simulations are forced with changes in greenhouse gases, direct aerosols, and solar and volcanic forcing (see Crowley et al. 2003 for details). For practical reasons, we use the simulated Northern Hemispheric mid- to high-latitude (30°–90°N) mean annual-mean temperature to assess the variations in response that result from the prescribed variations in parameter settings and aerosol forcing.
We chose one of the simulations, whose parameters characterize the CCCma models (sensitivity 3.5°C, 30°–90°N aerosol forcing of −2 W m−2, and midrange ocean heat uptake) and determined by simple linear regression the scaling that brings this simulation into closest agreement to each member of the suite of simulations. The scaling factors range between 0.15 and 1.84 for the 1950–99 period when all three parameters are varied. The range becomes 0.57 to 1.15 if climate sensitivity is the only free parameter (with diffusivity fixed to 1.9 cm2 s−1 and the 30°–90°N average aerosol forcing set to −2.0 W m−2). The ranges are wider if the entire twentieth century is used to determine the scaling factors (0.50 to 1.20 if only the sensitivity is varied). The asymmetry of the scaling ranges, with a greater part of the range below unit scaling than above, reflects the fact that CCCma’s models have a moderately large climate sensitivity.
Using these results as guidance and acknowledging that not all sources of uncertainty have been taken into account, it would be appropriate for a simple Gaussian prior that gives the greatest prior likelihood to β = 1 to have variance τ2 ≈ 0.25. This yields a plus or minus two standard deviation prior uncertainty range for β of 0.0 to 2.0 that is inclusive of the ranges derived above, and provides a central plus or minus one standard deviation range for β of 0.5 to 1.5.
To examine the sensitivity of the results to a range of plausible choices of prior, we construct three alternative priors, each of which is noncommittal on the climate change question.
where p = 1/2 and the component distribution variances τ21 and τ22 are set to 0.25. A value of p other than 1/2 could be used to reflect the analyst’s degree of belief that the signal X is either present (p near 0) or absent (p near 1) in the observations y (Berliner et al. 2000; Min et al. 2004). The choice of prior with p = 1/2 gives equal weight to the possibilities that the GS signal is either absent or present in the data, and conditional on the latter, reflects our estimates of the effect of climate model and forcing uncertainty on β. The width of the prior conditional on the former possibility is set to the same value for convenience. The resulting mixture distribution gives the greatest amount of weight to β values in the interval [0, 1], but does not give preference to any particular subset of values in this range.
To ensure that our results are not sensitive to the choice of prior, we consider two additional distributions that express prior knowledge and uncertainty, or the lack thereof, in different ways. The first, π0 (not shown), is identical to π1 except that the variance of its two components is only 0.1 instead of 0.25. This prior is still noncommittal about whether the GS signal is present in the observations, but it expresses greater confidence that the climate model is able to respond correctly to the specified GS forcing. The other prior distribution we consider is the noninformative prior, π2(β) ∝ 1, an improper distribution that gives equal weight to all values of β between −∞ and ∞ (Fig. 1b). This distribution expresses a complete absence of prior knowledge about β by not favoring any value of β over another.
f. The posterior distribution
As indicated in section 2b, the posterior distribution is derived via Bayes’ theorem [Eq. (2)] from the distribution of β̂ conditional on β, which is also known as the likelihood function, and the prior distribution.
Assuming that ɛ has a multivariate Gaussian distribution, the distribution of β̂ conditional on β is approximately f(β̂|β) = ϕ(β, σ̂2), where σ̂2 [Eq. (4)] is the estimated variance of β̂ and ϕ(·, ·) is a Gaussian probability density function. The distribution is approximate because σ̂2 is estimated. However, this estimate is known to be consistent, and thus the quality of this approximation will improve as the size of the samples available for calculating covariance estimates N and N increases.
The posterior that results from prior π0, which is a mixture of Gaussian components with variances τ1 = τ2 = 0.1, is identical, except that different values of τ1 and τ2 are used in the expressions above. Finally, the posterior distribution that corresponds to the improper (uninformative) prior π2 can be shown to be
g. Detection and attribution assessment
The posterior distributions are used to make detection and attribution assessments by calculating the posterior probabilities that β lies in predefined detection (𝒟) and attribution (𝒜) regions, respectively.
We have chosen to define the detection region as 𝒟 = [0.1, ∞). We leave open the question of how we should interpret a scaling factor, of say, β = 0.1 (the climate model response is 10 times the observed response) or β = 10 (the model response is only 1/10 the observed response), possibilities that are both contained within 𝒟. However, we acknowledge that such an outcome would raise important questions about the climate model and/or the completeness and accuracy of the specification of the external forcing to the climate model. Thus others may want to define 𝒟 differently. In any case, given a defensible definition of 𝒟, detection is claimed if the posterior probability that β ∈ 𝒟, which we denote by P𝒟|β̂ = Pr(β ∈ 𝒟|β̂), is large.
The exact interpretation of a “large” posterior probability is a matter of judgment, but we can be guided by the use of Bayes’ factors (e.g., see Kass and Raftery 1995, and references therein). The Bayes factor B is the scaling factor that relates the prior odds of detection to the posterior odds of detection through the equation
That is, the Bayes factor is the ratio of the posterior odds to the prior odds. Jeffreys (1935) first proposed the Bayes factor as a way of comparing scientific theories. Kass and Raftery (1995), following Jeffreys (1961), suggest that 3 ≤ β ≤ 20 indicates “positive” evidence for the hypothesis that is being evaluated, that 20 ≤ β ≤ 150 indicates “strong” evidence, and that B > 150 represents “very strong” evidence. The prior odds of detection using definition 𝒟 are easily shown to be approximately 9:4 under prior π1. Thus a posterior probability of detection P𝒟|β̂ > 0.978 is required to obtain a Bayes factor that indicates strong evidence for detection, while P𝒟|β̂ > 0.997 is required to conclude that there is very strong evidence.
As noted previously, statistical evidence that the climate model has responded correctly to historical external forcing is an important aspect of the overall attribution assessment. Thus it is reasonable to define the attribution region 𝒜 as a subset of the detection region that contains the point β = 1. A large posterior probability that β ∈ 𝒜, which we denote by P𝒜/β̂ = Pr(β ∈ 𝒜|β̂), provides evidence in support of attribution.
For purposes of illustration, we consider an attribution region defined as 𝒜 = (0.8, 1.2). This particular choice of 𝒜 implies that the criteria for making an attribution claim should include a high posterior likelihood that the model response to forcing is within 20% of observed. Further discussion of the choice of 𝒜 is given in section 4.
As with detection, we may use the Bayes factors to guide the interpretation of P𝒜/β̂. The prior odds of attribution given our particular definition of 𝒜 are approximately 0.22:1 under prior π1. Under these circumstances a posterior probability P𝒜/β̂ > 0.97 would represent very strong evidence for consistency between model and observed response to forcing, while P𝒜/β̂ > 0.81 would still represent strong evidence.
The residual consistency test statistic reported in Zwiers and Zhang (2003) suggests that our analysis should be conducted with approximately 15–25 EOFs retained. We therefore use truncations in that range. Figure 2 displays the posterior distribution π1 obtained for time periods 1900–49 and 1950–99 with 15, 20, and 25 EOFs retained. As can be seen, the posterior is not very sensitive to the number of EOFs retained over this truncation range. The posterior is somewhat wider for 1900–49 than that for 1950–99 for truncations in the 15–25 EOF range that are shown. This occurs because there is more missing data in the earlier period and because the signal is weaker. The width of the posterior [Eq. (6)] depends upon σ̂2, which in turn depends inversely upon the generalized inner product of β̂ with itself [Eq. (4)]. The latter increases when the signal is stronger and when the signal vector is longer, and thus the posterior narrows under those circumstances.
Most of the posterior probability in the 1950–99 analysis is allocated to β values between 0.4 and 0.8 when 20 EOFs are used, suggesting that the mean model response to the historical forcing changes is somewhat stronger than indicated by the observed temperature changes. This may, in part, be the result of a downward bias in the estimate of β that occurs because some noise from internal climate variability remains in the estimated GS signal even after averaging the six available forced climate simulations. However, fitting the regression model by means of the total least squares algorithm (Allen and Stott 2003) to account for noise in the estimated signal pattern increases β̂ only slightly. The somewhat too strong model response may also be due to missing, and probably net negative, forcings (Mitchell et al. 2001) such as the indirect effects of sulphate aerosols and the combined effect of solar and volcanic forcing.
Figure 3 shows the posterior probabilities that β lies in the detection and attribution regions that are calculated from π1(β|β̂) for each time period. The probability of detection P𝒟|β̂ is greater than 0.998 for both 1900–49 and 1950–99 when 20 EOFs are retained, indicating that it is highly unlikely (very strong evidence) that the observed temperature changes during these periods are entirely the result of natural internal climate variability. Similar results are obtained for 1910–59 and 1940–89. For the other two time periods, the probability of detection ranges between 0.4 and 0.8. Work by others (e.g., Stott et al. 2000, 2001; Hegerl et al. 2003) suggests that the nonmonotonic nature of the warming during the twentieth century that underlies these results can only be adequately explained by considering both the anthropogenic forcing factors investigated here and natural external forcing factors such as solar and volcanic forcing.
The probability of attribution P𝒜/β̂ that is computed from π1(β|β̂) with 20 EOFs retained is less than 0.5 for all time periods and ranges between 0.1 and 0.3 for truncations with 15 to 25 EOFs for the 1950–99 period. These probabilities are not large, and thus attribution is not strongly supported when the posterior is interpreted using our specific choice of attribution region 𝒜. The choice of 𝒜 and the implications for attribution are further discussed in section 4.
We repeated our analysis with the noninformative prior distribution π2. The resulting posterior distribution π2(β|β̂) (not shown) is somewhat wider than π1(β|β̂). Correspondingly, the probabilities of detection and attribution calculated from π2(β|β̂) are slightly smaller than those displayed in Fig. 3. Specifically, the probability of detection is about 0.0001 less than that reported above and the probability of attribution is about 0.01 less. Results obtained using prior distribution π0 are similar to those obtained with π1. Overall, results appear to be insensitive to variations in prior distributions that vary the way they express uncertainty about the climate model response to GS forcing but are similarly noncommittal about whether there has been an anthropogenic influence on climate in the twentieth century. Less moderate prior positions, where greater weight is put on the possibility that β = 1, or depending on perspective, β = 0, may result in a somewhat more narrowly defined posterior distribution.
4. Conclusions and discussion
We have described a Bayesian approach to detection and attribution that was first proposed by Berliner et al. (2000). The method is based on the standard optimal fingerprinting approach of Hasselmann (1979, 1997). In contrast to Berliner et al., the details of our implementation of the optimal fingerprinting component of the method are similar to those used in recent optimal fingerprinting studies (e.g., Allen and Tett 1999; Allen et al. 2004). In addition, we used a series of sensitivity experiments with an energy balance model to derive “ballpark” estimates of the prior uncertainty of the scaling factor on the climate-model-simulated response to external forcing. These estimates helped inform our choices of parameters for the prior distributions used in our study.
As an example, we assessed whether the combined effect of greenhouse gases and sulfate aerosols as simulated by the CCCma climate models (the CCCma “GS” signal) is detectable in the observed global surface temperature record in the twentieth century. Furthermore, we also assessed whether the posterior distribution for the scaling factor on the GS signal provides evidence to support attribution. The assessment was made with three prior distributions that are noncommittal on the question of whether there has been an anthropogenic influence on the climate of the twentieth century. We used specific assessment criteria that were defined in terms of predefined sets of scaling factors (𝒟 and 𝒜) and predefined posterior probability thresholds. The latter were determined by using Bayes factors to associate posterior probabilities with specific levels of “strength of evidence.”
Consistent with earlier studies, our Bayesian analysis indicates that the anthropogenic forcing signal is detectable in the early and late halves of the twentieth century. Note, however, that the early-century anthropogenic climate change signal may be somewhat confounded with the response to natural forcing changes during, and prior to, this period. Tett et al. (1999) and others have suggested that solar forcing changes may also have contributed to the early warming. However, all studies performed to date suggest that the anthropogenic forcing and response dominate during the latter half of the twentieth century (Mitchell et al. 2001; International Ad Hoc Detection and Attribution Group 2005).
Our results are insensitive to our choice of prior distribution because our selection of priors does not include distributions that express strong, a priori positions on anthropogenic climate change. The resulting posterior distributions have considerable spread, and thus we were not able to contribute strong evidence to an attribution assessment with our specific definition of the attribution region 𝒜. For 1950–99, when surface temperature data coverage is the most complete, most of the posterior probability is allocated to β values less than unity, suggesting that the CCCma climate models may have oversimulated the combined response to greenhouse gas and direct sulphate aerosol forcing. This may, in part, be the result of missing forcings such as the indirect effects of sulphate aerosols, the effects of other types of aerosols such as mineral dust and carbonaceous aerosols, the effects of the changing abundance of tropospheric and stratospheric ozone, and the effects of variations in natural external forcing of the climate system.
Our findings, which are preliminary in nature, do not in any way negate the Houghton et al. (2001) conclusion that “. . . most of the warming observed over the last 50 years is attributable to human activities.” Houghton et al. used a very broad range of evidence to make their attribution assessment, including a number of optimal detection studies using several different coupled climate models. What our findings do tell us is that the limited observational record, together with missing forcings and possible bias in the CCCma model, preclude the possibility of constructing a narrow posterior distribution that is centered directly on β = 1. Thus strong evidence supporting attribution does not emerge when using an attribution region that imposes a relatively stringent criterion, where the amplitude of the model response must be within 20% of observed. However, if we use a less stringent criterion with, say, 𝒜 = (0.5, 1.5) (the model response must be within 50% of observed), we then find that the posterior probability that β ∈ 𝒜 increases to levels of approximately 0.7 to 0.9 (depending upon EOF truncation) for the 1950–1999 period. This corresponds to Bayes factors in the range 3.2 < B < 12.4, giving an indication of “positive” evidence to support an attribution assessment. Relaxing the attribution criterion might be reasonable if taking into consideration external forcings not included in the CCCma GS simulations that may have a pattern of response similar to the GS response.
The absence of strong evidence to support attribution in the presence of “very strong” evidence for detection requires further consideration. As already suggested, there are several factors that may affect the amount of evidence that the posterior distribution on β can provide to support attribution. One of these is the length of the observational record. The posterior distribution should become narrower as the analysis period is made longer and extended into the future.
Another factor that affects the width of the posterior distribution is the signal strength. The latter is expected to continue to increase and therefore should contribute to a reduction in the width of the posterior as the analysis period shifts into the future. To quantify when strong evidence in support of an attribution assessment might emerge, we reevaluated the posterior distribution π1(β|β̂) using the 1950–99 values of the scaling factor β̂, but estimates of future values of σ̂2. The latter is strongly affected by the signal strength. The steps in our calculation were as follows:
The future estimates of σ̂2 [(Eq. (4)] were obtained using the 1950–99 observational mask and the corresponding 1950–99 covariance matrices, but using a future five-decade signal vector. The latter was masked in the same way as the 1950–99 observations and calculated relative to the same base period. This was repeated for all five-decade periods beginning with 1960–2009 and ending with 2050–2099.
The estimated future values of σ̂2 and the 1950–99 values for β̂ were substituted into π1(β|β̂). For each “posterior” that was obtained in this way, we calculated the probability that β ∈ (μp − 0.2, μp + 0.2) where μp is the mode of the posterior distribution. This, in effect, is the attribution probability P𝒜|β̂ that would be obtained for 𝒜 = (0.8, 1.2) when β̂ = 1.
The results of this exercise are displayed in Fig. 4 for truncations with 15, 20, and 25 EOFs. The graph has been extended to the left using the posteriors that are appropriate for those historical periods. By centering the attribution region on the mode of the posterior distribution, we have made the assumption that bias in the estimate of β will not be a problem with signals derived from future climate models. With this assumption, the suggestion is that strong or very strong evidence in support of attribution, based on a requirement that the model response to external forcing should be within 20% of observed, should emerge by 2020.
A third factor that affects the posterior distribution is the accuracy of our estimated covariance matrices. Since these matrices are estimated from climate model control runs they may be biased by model errors, although the use of multiple models (Gillett et al. 2002) can help to reduce that bias. The matrices also remain uncertain because of the limited length of the control simulations.
A fourth factor is whether all of the prior information that is available for inclusion in the analysis via the prior distribution has, in fact, been used. In our example we purposefully used priors that are noncommittal on whether there has been an anthropogenic influence on climate. Use of a prior distribution centered on β = 1, rather than on β = 0.5 as in our noncommittal priors, will result in a somewhat sharper posterior distribution and an increase in the posterior probability that β ∈ 𝒜. The multiple lines of evidence for attribution from sources other than surface temperature data that are described in Houghton et al. (2001) and subsequent literature (e.g., International Ad Hoc Detection and Attribution Group 2005) justifies the use of a prior distribution that does not sit on the fence. Recent experience with the detection of externally forced signals in the paleoclimate record (e.g., Hegerl et al. 2003) provides further justification.
Using a Gaussian prior π3 = ϕ(1, 0.25) results in attribution probabilities P𝒜|β̂ = 0.11 to 0.37 for 1950–99 when using between 15 and 25 EOFs and when 𝒜 = (0.8, 1.2). This is slightly greater than the result obtained with the noncommittal prior. However, the corresponding Bayes factors, which lie in the range 0.3 < B < 1.3, still do not indicate support for attribution. When we increase the size of the attribution region to 𝒜 = (0.5, 1.5) we find that P𝒜|β̂ ranges from 0.81 to 0.91, which is again somewhat larger than obtained with the noncommittal prior. In this case the corresponding Bayes factors range between 2 and 4.7, indicating that we now have weaker evidence in support of attribution than obtained with the noncommittal prior.
This is an example of the effect that the choice of prior has on the criteria for the assessment of the evidence in support of attribution (and also detection) when evidence is evaluated by means of Bayes factors. The relatively strong prior position on attribution that is expressed with distribution π3 has substantially larger prior odds of attribution than the noncommittal prior π1 (approximately 2.2:1 as opposed to 0.22:1). Consequently, the posterior odds of attribution must be correspondingly larger to achieve the same assessment of the strength of evidence. For example, to be identified as strong evidence, the posterior odds would have to be approximately 44:1 when using prior π3 while posterior odds of approximately 4.4:1 would be sufficient when using π1.
This illustrates an important benefit of using Bayes factors to assess evidence in support of detection or attribution. By comparing posterior odds ratios with corresponding prior odds ratios, the Bayes factor requires inferences to be heavily dependent on evidence that is obtained from the data. If the analyst has taken a strong prior position on detection or attribution, the Bayes factor requires the data to substantially improve the prior assessment of the odds of detection or attribution before declaring that there is, in fact, strong supporting evidence for that position.
The ability to incorporate prior knowledge into the analysis, particularly with regards to sources of uncertainty, is an important advantage of the Bayesian approach. This knowledge can be both subjective and objective and can include prior information about whether there has been a climatic response to anthropogenic forcing, about climate model uncertainty, and about the nature of the response to natural external forcing changes in, for example, paleo records, as discussed above. Another important advantage of the Bayesian approach is that the analyst must explicitly define the criteria that are used to assess evidence in support of detection and attribution. Admittedly, the choice of prior and the definitions of 𝒟 and 𝒜 are subjective, but they transparently reflect the perspective of the analyst making the climate change assessment.
In comparison, the frequentist criteria for the assessment of evidence are much less explicit. The frequentist criterion for detection is clear enough (we seek evidence that the observed change global surface temperature is not consistent with HD), but exactly what the rejection of HD means in probabilistic terms is not clear. More problematically, the standard criterion for the assessment of evidence in support of attribution is the failure to reject HA (Berliner et al. 2000). This means that the attribution criterion is relaxed, rather than made more stringent, when the significance level is reduced, say, from 10% to 1%. Also, perversely, attribution becomes more difficult with the standard approach as the observed record increases in length for a given signal strength. The analyst does not have direct control over the definition of the criteria for detection or attribution in either case (this happens, at best, by varying the significance level). In the Bayesian setting we must give, and defend, clear definitions of the criteria used to assess the evidence for detection and attribution. Once having defined the criteria, we can calculate the probabilities that they have been met. These posterior probabilities, in turn, can be used in a decision theoretic framework to make rational adaptation and mitigation decisions that minimize expected costs and losses.
We gratefully acknowledge that work conducted by Terry Lee was supported by the Canadian Foundation for Climate and Atmospheric Science through the Canadian CLIVAR Research Network. Work by Min Tsao was supported by the Natural Sciences and Engineering Research Council through a Discovery Grant. Francis Zwiers and Xuebin Zhang both benefited from support provided by the Climate Change Action Fund. This paper was improved by insightful and helpful comments provided by Greg Flato, Nathan Gillett, Slava Kharin, and Andreas Hense.
Corresponding author address: Dr. Francis W. Zwiers, Canadian Centre for Climate Modelling and Analysis, Meteorological Service of Canada, P.O. Box 1700, STN CSC, Victoria, BC V8W 2Y2, Canada. Email: email@example.com