## Abstract

Multiple changes in Earth’s climate system have been observed over the past decades. Determining how likely each of these changes is to have been caused by human influence is important for decision making with regard to mitigation and adaptation policy. Here we describe an approach for deriving the probability that anthropogenic forcings have caused a given observed change. The proposed approach is anchored into causal counterfactual theory (Pearl 2009), which was introduced recently, and in fact partly used already, in the context of extreme weather event attribution (EA). We argue that these concepts are also relevant to, and can be straightforwardly extended to, the context of detection and attribution of long-term trends associated with climate change (D&A). For this purpose, and in agreement with the principle of *fingerprinting* applied in the conventional D&A framework, a trajectory of change is converted into an event occurrence defined by maximizing the causal evidence associated to the forcing under scrutiny. Other key assumptions used in the conventional D&A framework, in particular those related to numerical model error, can also be adapted conveniently to this approach. Our proposal thus allows us to bridge the conventional framework with the standard causal theory, in an attempt to improve the quantification of causal probabilities. An illustration suggests that our approach is prone to yield a significantly higher estimate of the probability that anthropogenic forcings have caused the observed temperature change, thus supporting more assertive causal claims.

## 1. Introduction

Investigating causal links between climate forcings and the observed climate evolution over the instrumental era represents a significant part of the research effort on climate. Studies addressing these aspects in the context of climate change have been providing, over the past decades, an ever-increasing level of causal evidence that is important for decision-makers in international discussions on mitigation policy. In particular, these studies have produced far-reaching causal claims; for instance, the latest IPCC report (AR5; IPCC 2014) stated that “It is extremely likely that human influence has been the dominant cause of the observed warming since the mid-20th century” (p. 4). An important part of this causal claim, as well as many related others, regards the associated level of uncertainty. More precisely, the expression “extremely likely” in the latter quote has been formally defined by the IPCC (Mastrandrea et al. 2010; see Table 1) to correspond to a probability of 95%. The above quote hence implicitly means that the probability that the observed warming since the mid-twentieth century was not predominantly caused by human influence but by natural factors is roughly 1:20. Based on the current state of knowledge, that means that it is not yet possible to fully rule out that natural factors were the main causes of the observed global warming. This probability of 1:20, as well as all the probabilities associated with the numerous causal claims that can be found in the past and present climate literature and are summarized in AR5, is a critical quantity that is prone to affect the way in which climate change is apprehended by citizens and decision makers, and thereby to affect decisions on the matter. It is thus of interest to examine the method followed to derive these probabilities and, potentially, to improve it.

The aforementioned studies buttressing the above claim usually rely on a conventional attribution framework in which “causal attribution of anthropogenic climate change” is understood to mean “demonstration that a detected change is consistent with the estimated responses to anthropogenic and natural forcings combined, but not consistent with alternative, physically plausible explanations that exclude important elements of anthropogenic forcings” (Hegerl et al. 2007, p. 668). While this definition has proved to be very useful and relevant, it offers a description of causality that is arguably overly qualitative for the purpose of deriving a probability. In particular, it comes short of a mathematical definition of the word *cause* and, incidentally, of the *probability to have caused* that we in fact wish to quantify. Hence, beyond these general guidance principles, the actual derivation of these probabilities is left to some extent to the interpretation of the practitioner. In practice, causal attribution has usually been performed by using a class of linear regression models (Hegerl and Zwiers 2011):

where the observed climate change *y* is regarded as a linear combination of *p* externally forced response patterns with referred to as fingerprints, and where *ε* represents internal climate variability and observational error (all variables are vectors of dimension *n*). The regression coefficient accounts for possible error in climate models in simulating the amplitude of the pattern of response to forcing *f*. After inference and uncertainty analysis, the value of each coefficient and the magnitude of the confidence intervals determine whether or not the observed response is attributable to the associated forcing. The desired probability of causation (i.e., the probability that the forcing of interest *f* has caused the observed change *y*) is denoted hereafter . It has often been equated to the probability that the corresponding linear regression coefficient is positive^{1}:

A shortcoming of the conventional framework summarized in Eqs. (1) and (2) above is that a linear regression coefficient does not have any causal meaning from a formal standpoint. As acknowledged by Pearl (2009), turning an intrinsically deterministic notion such as causality into a probabilistic one is a difficult general problem that has also long been a matter of debate (Simpson 1951; Suppes 1970; Mellor 1995). Nevertheless, the current approach can be theoretically improved in the context of climate change where the values of the probabilities of causation have such important implications.

Our proposal to tackle this objective is anchored in a coherent theoretical corpus of definitions, concepts, and methods of general applicability that has emerged over the past three decades to address the issue of evidencing causal relationships empirically (Pearl 2009). This general framework is increasingly used in diverse fields (e.g., in epidemiology, economics, and social science) in which investigating causal links based on observations is a central matter. Recently, it has been introduced in climate science for the specific purpose of attributing weather- and climate-related extreme events (Hannart et al. 2016), which we refer to simply as “extreme events” hereafter. The latter article gave a brief overview of causal theory and articulated it with the conventional framework used for the attribution of extreme events, which is also an important topic in climate attribution. In particular, Hannart et al. (2016) showed that the key quantity referred to as the fraction of attributable risk (FAR) (Allen 2003; Stone and Allen 2005), which buttresses most extreme event attribution (EA) studies, can be directly interpreted within causal theory.

However, Hannart et al. (2016) did not address how to extend and adapt this theory in the context of the attribution of climate changes occurring on longer time scales. Yet, a significant advantage of the definitions of causal theory is precisely that they are relevant no matter the temporal and spatial scale. For instance, from the perspective of a paleoclimatologist studying Earth’s climate over the past few hundred millions of years, global warming over the past 150 years can be considered as a climate event. As a matter of fact, the word “event” is used in paleoclimatology to refer to “rapid” changes in the climate system, but ones that may yet last centuries to millennia. Where to draw the line is thus arbitrary: one person’s long-term trend is another person’s short-term event. It should therefore be possible to tackle causal attribution within a unified methodological framework based on shared concepts and definitions of causality. Doing so would allow us to bridge the methodological gap that exists between EA and trend attribution at a fundamental level, thereby covering the full scope of climate attribution studies. Such a unification would present in our view several advantages: enhancing methodological research synergies between D&A topics, improving the shared interpretability of results, and streamlining the communication of causal claims—in particular when it comes to the quantification of uncertainty, that is, of the probability that a given forcing has caused a given observed phenomenon.

Here, we adapt some formal definitions of causality and probability of causation to the context of climate change attribution. Then, we detail technical implementation under standard assumptions used in D&A. The method is finally illustrated on the warming observed over the twentieth century.

## 2. Causal counterfactual theory

While an overview of causal theory cannot be repeated here, it is necessary for clarity and self-containedness to highlight its key ideas and most relevant concepts for the present discussion.

Let us first recall the so-called counterfactual definition of causality by quoting the eighteenth-century Scottish philosopher David Hume: “We may define a cause to be an object followed by another, where, if the first object had not been, the second never had existed.” In other words, an event *E* (*E* stands for effect) is caused by an event *C* (*C* stands for cause) if and only if *E* would not occur were it not for *C*. Note that the word *event* is used here in its general, mathematical sense of a *subset* of a sample space . According to this definition, evidencing causality requires a counterfactual approach by which one inquires whether or not the event *E* would have occurred in a hypothetical world, termed counterfactual, in which the event *C* would not have occurred. The fundamental approach of causality that is implied by this definition is still entirely relevant in the standard causal theory. It may also arguably be connected to the guidance principles of the conventional climate change attribution framework and to the optimal fingerprinting models, in a qualitative manner. The main virtue of the standard causality theory of Pearl consists in our view in formalizing precisely the above qualitative definition, thus allowing for sound quantitative developments. A prominent feature of this theory consists in first recognizing that causation corresponds to rather different situations and that three distinct facets of causality should be distinguished: (i) necessary causation, where the occurrence of *E* requires that of *C* but may also require other factors; (ii) sufficient causation, where the occurrence of *C* drives that of *E* but may not be required for *E* to occur; and (iii) necessary and sufficient causation, where (i) and (ii) both hold. The fundamental distinction between these three facets can be visualized by using the simple illustration shown in Fig. 1.

While the counterfactual definition as well as the three facets of causality described above may be understood at first in a fully deterministic sense, perhaps the main strength of Pearl’s formalization is to propose an extension of these definitions under a probabilistic setting. The probabilities of causation are thereby defined as follow:

where and are the complementaries of *C* and *E*, and where the notation means that an *intervention* is applied to the system under causal investigation. For instance PS, the *probability of sufficient causation*, reads from the above that the probability that *E* occurs when *C* is interventionally forced to occur, conditional on the fact that neither *C* nor *E* was occurring in the first place. Conversely PN, the *probability of necessary causation*, is defined as the probability that *E* would not occur when *C* is interventionally forced to not occur, conditional on the fact that both *C* and *E* were occurring in the first place. While we omit here the formal definition of the intervention for brevity, the latter can be understood merely as experimentation: applying these definitions thus requires the ability to experiment. Real experimentation, whether in situ or in vivo, is often accessible in many fields but it is not in climate research for obvious reasons. In this case, one can thus only rely on numerical *in silico* experimentation: the implications of this constraint are discussed further.

While the probabilities of causation are not easily computable in general, their expression fortunately becomes quite simple under assumptions that are reasonable in the case of external forcings (i.e., exogeneity and monotonicity):

where is the so-called *factual* probability of the event *E* in the real world where *C* did occur and is its *counterfactual* probability in the hypothetic world as it is would have been had *C* not occurred. One may easily verify that Eq. (4) holds in the three examples of Fig. 1 by assuming that the switches are probabilistic and exogenous. In any case, under such circumstances, the causal attribution problem can thus be narrowed down to computing an estimate of the probabilities and *p*. The latter only requires two experiments: a factual experiment and a counterfactual one . Equation (3) then yields PN, PS, and PNS from which a causal statement can be formulated.

Each of the three probabilities PS, PN, and PNS has different implications depending on the context. For instance, two perspectives can be considered: (i) the *ex post* perspective of the plaintiff or the judge who asks “does *C* bear the responsibility of the event *E* that did occur?” and (ii) the *ex ante* perspective of the planner or the policymaker who instead asks “what should be done w.r.t. *C* to prevent future occurrence of *E*?”. It is PN that is typically more relevant to context (i) involving legal responsibility, whereas PS has more relevance for context (ii) involving policy elaboration. Both these perspectives could be relevant in the context of climate change, and it thus makes sense to trade them off. Note that PS and PN can be articulated with the conventional definition recalled in introduction. Indeed, the “*demonstration that the change is consistent with* (…)” implicitly corresponds to the idea of sufficient causation, whereas “(…) *is not consistent with* (…)” corresponds to that of necessary causation. The conventional definition therefore implicitly requires a high PS and a high PN to attribute a change to a given cause.

PNS may be precisely viewed as a probability that combines necessity and sufficiency. It does so in a conservative way since we have by construction that . In particular, this means that a low PNS does not imply the absence of a causal relationship because either a high PN or a high PS may still prevail even when PNS is low. On the other hand, it presents the advantage that any statement derived from PNS asserting the existence of a causal link holds both in terms of necessity and sufficiency. This property is thus prone to simplify causal communication, in particular toward the general public, since the distinction no longer needs to be explained. Therefore, establishing a high PNS may be considered as a suitable goal to evidence the existence of a causal relationship in a simple and straightforward way. In particular, the limiting case corresponds to the fully deterministic, systematic, and single-caused situation in Fig. 1c—that is, undeniably the most stringent way in which one may understand causality.

## 3. Probabilities of causation of climate change

We now return to the question of interest: for a given forcing *f* and an observed evolution of the climate system *y*, can *y* be attributed to *f*? More precisely, what is the probability that *f* has caused *y*? We propose to tackle this problem by applying the causal counterfactual theory to the context of climate change, and more specifically by using the three probabilities of causation PN, PS, and PNS recalled above. This section shows that it can be done to a large extent similarly to the approach of Hannart et al. (2016) for EA. In particular, as in EA, the crucial question to be answered as a starting point consists of narrowing down the definitions of the cause event *C* and of the effect event *E* associated with the question at stake—where the word *event* is used here in its general mathematical sense of subset.

### a. Counterfactual setting

For the cause event *C*, a straightforward answer is possible: we can follow the exact same approach as in EA by defining *C* as “presence of forcing *f*” (i.e., the factual world that occurred) and as “absence of forcing *f*” (i.e., the counterfactual world that would have occurred in the absence of *f*). Indeed, forcing *f* can be switched on and off in numerical simulations of the climate evolution over the industrial period, as in the examples of Fig. 1 and as in standard EA studies. Incidentally, the sample space consists of the set of all possible climate trajectories in the presence and absence of *f*, including the observed one *y*. In other words, all forcings other than *f* are held constant at their observed values as they are not concerned by the causal question.

In practice and by definition, the factual runs of course always correspond to the historical experiment (HIST), using the Climate Model Intercomparison Project’s (CMIP) terminology as described by Taylor et al. (2012). The counterfactual runs are obtained from the same setting as historical but switching off the forcing of interest. For instance, if the forcing consists of the anthropogenic forcing then the counterfactual runs correspond to the historicalNat (NAT) experiment, that is, . Likewise, if the forcing consists of the CO_{2} forcing, then the counterfactual runs corresponds to the “all except CO_{2}” experiment. However, no such runs are available in CMIP5 (https://cmip.llnl.gov/cmip5/docs/historical_Misc_forcing.pdf; see section 6 for discussion). Last, it is worth underlining that the historicalAnt experiment, which combines all anthropogenic forcings, thus corresponds to the counterfactual setting associated with the natural forcings. Therefore, runs from the historicalAnt experiment are relevant for the attribution of the natural forcings only; they are not relevant for the attribution of the anthropogenic forcings under the present counterfactual causal theory.

These definitions of *C* and have an important implication w.r.t. the design of numerical experiments in climate change attribution. In contrast with the design usually prevailing in D&A (forcing *f* only), the latter experiments are required to be counterfactual (i.e., all forcings except *f*). We elaborate further on this remark in section 6.

### b. Balancing necessity and sufficiency

To define the effect event *E*, we propose to follow the same approach as in EA, where *E* is usually defined based on an ad hoc climatic index *Z* exceeding a threshold *u*:

Thus, defining *E* implies choosing an appropriate variable *Z* and threshold *u* that reflect the focus of the question while keeping in mind the implications of the balance between the probabilities of necessary and sufficient causation. We now illustrate this issue and lay out some proposals to address it.

Consider the question “Have anthropogenic CO_{2} emissions caused global warming?”. Following the above, the event “global warming” may be loosely defined as a positive trend on global Earth surface temperature, that is, , where *Z* is the global surface temperature linear trend coefficient and the threshold *u* is zero. In that case, *E* nearly always occurs in the factual world but it is also frequent in the counterfactual one ( medium), and thus the emphasis is mostly on PS (i.e., on sufficient causation), while PN and PNS will have moderate values (Figs. 2b,e). But if global warming is more restrictively defined as a warming trend comparable to or greater than the observed trend, that is, , where is the observed trend, then the event becomes nearly impossible in the counterfactual world but remains frequent in the factual one ( medium), and thus the emphasis is on PN (i.e., on necessary causation) while the values of PS and PNS will this time be low. Therefore, the above two extreme definitions both yield a low PNS. But under a more balanced definition of *global warming* as a trend exceeding an intermediate value , then the event nearly always occurs in the factual world and yet remains very rare in the counterfactual one . Hence PNS is then high: both necessary and sufficient causation prevail. We propose to take advantage of this optimal value to define the event “global warming” as the global trend index *Z* exceeding the optimal threshold such that the amount of causal evidence, in a PNS sense, is maximized:

where the condition insures that the event has actually occurred. When used on real data (see section 6), this approach yields a high value of for the above question (Figs. 2b,e).

Let us now consider the question “Did anthropogenic CO_{2} emissions cause the Argentinian heatwave of December 2013?” (Hannart et al. 2015). Here, the event can be defined as , where *Z* is surface temperature anomaly averaged over an ad hoc space–time window. Like in the previous case, the causal evidence again shifts from necessary and not sufficient (Figs. 2a,d) when *u* is equal to the observed value of the index *z* = 24.5°C (unusual event in both worlds but much more so in the counterfactual one) to sufficient and not necessary when *u* is small (usual event in both worlds but much more so in the factual one). Like in the previous case, a possible approach here would be to balance both quantities by maximizing PNS in *u* as in Eq. (6). However, this would lead here to a substantially lower threshold that no longer reflects the rare and extreme nature of the event “heatwave” under scrutiny. Furthermore, this would yield a well-balanced but fairly low level of causal evidence . Thus maximizing PNS is not relevant here. Instead, maximizing PN, even if that is at the expense of PS, is arguably more relevant here since we are dealing with extreme events that are rare in both worlds, thereby inherently limiting the evidence of sufficient causation. This maximization corresponds to , which often yields the highest observed threshold . Therefore, PN (i.e., the FAR) is an appropriate metric for the attribution of extreme events, and a high threshold *u* matching with the observed value *z* should be used in order to maximize it. In contrast with extreme events, long-term changes are prone to be associated with much powerful causal evidence that simultaneously involves necessary and sufficient causation, and may yield high values for PN, PS, and PNS. PNS is thus an appropriate summary metric to consider for the attribution of climate changes, in agreement with D&A guidance principles (Hegerl et al. 2010). An optimal intermediate threshold can be chosen by maximizing PNS.

### c. Building an optimal index

In the above example where *global warming* is the focus of the question, the variable of interest *Z* to define the event can be considered as implicitly stated in the question, insofar as the term “global warming” implicitly refers to an increasing trend on global temperature. However, in the context of climate change attribution, we often investigate the cause of “an observed change *y*” with no precise a priori regarding the characteristics of the change that are relevant w.r.t. causal evidencing. Furthermore, *y* may be a large dimensional space–time vector. Thus the definition of the index *Z* in this case is more ambiguous.

We argue that in such a case, the physical characteristics of *y* that are implicitly considered relevant to the causal question are precisely those that best enhance the existence of a causal relationship in a PNS sense. This indeed corresponds to the idea of “fingerprinting” used thus far in climate change attribution studies (as well as in criminal investigations, hence the name): we seek a fingerprint—that is, a distinctive characteristic of *y* that would never appear in the absence of forcing *f* (i.e., ) but systematically does in its presence (i.e., ). If this characteristic shows up in observations, then the causal evidence is conclusive. A fingerprint may thus be thought of as a characteristic that maximizes the gap between *p* and and thereby maximizes PNS, by definition.

As an illustration, Marvel and Bonfils (2013) focus on the attribution of changes in precipitation, and subsequently address the question “Has anthropogenic forcing caused the observed evolution of precipitation at a global level?”. Arguably, this study illustrates our point in the sense that it addresses the question by defining a *fingerprint* index *Z* that aims precisely at reflecting the features of the change in precipitation that are thought to materialize frequently (if not systematically) in the factual world and yet are expected to be rare (if not impossible) in the counterfactual one, based on physical considerations. In practice, the index *Z* defined by the authors consists of a nondimensional scalar summarizing the main spatial and physical features of precipitation evolution w.r.t. dynamics and thermodynamics. The factual and counterfactual PDFs of *Z* are then derived from the HIST and NAT runs respectively (Fig. 2c). From these PDFs, one can easily obtain an optimal threshold for the precipitation index *Z* by applying Eq. (6) (Fig. 2f). This yields , meaning that anthropogenic forcings *have about as likely as not caused the observed evolution of precipitation.*

A qualitative approach driven by physical considerations, such as the one of Marvel and Bonfils (2013), is perfectly possible to define a fingerprint index *Z* that aims at maximizing PNS. However, a quantitative approach can also help in order to define *Z* optimally, and to identify the features of *y* that best discriminate between the factual and counterfactual worlds. Indeed, the qualitative, physical elicitation of *Z* may be difficult when the joint evolution of the variables at stake is complex or not well understood a priori. Furthermore, one may also wish to combine lines of evidence by treating several different variables at the same time in *y* (i.e., precipitation and temperature; Yan et al. 2016). Let us introduce the notation where *Y* is the space–time vectorial random variable of size *n* of which observed realization is *y*, and *ϕ* is a mapping from to . Extending Eq. (6) to the simultaneous determination of the optimal mapping and optimal threshold , we propose the following maximization:

In words, we thus propose to choose the value of the threshold, but also to choose the index *ϕ* among the set all possible indexes , so as to maximize PNS. The event defined above in Eq. (7) may thus be referred to as the *optimal fingerprint* w.r.t. forcing *f*. The maximization performed in Eq. (7) also suggests that our approach shares some similarity with the method of Yan et al. (2016), insofar as the variables of interest are in both cases selected mathematically by maximizing a criterion that is relevant for attribution [i.e., potential detectability in Yan et al. (2016); PNS in the present article] rather than by following qualitative, physics- or impact-oriented, considerations.

## 4. Implementation under the standard framework

We now turn to the practical aspects of implementing the approach described in section 3 above, based on the observations *y* and on climate model experiments. We detail these practical aspects in the context of the standard framework briefly recalled in section 1, namely multivariate linear regression under a Gaussian setting. Note that the assumptions underlying the latter conventional framework could be challenged (e.g., pattern scaling description of model error and Gaussianity). However, the purpose of this section is not to challenge these assumptions. It is merely to illustrate in detail how these assumptions can be used within the general causal framework proposed. Furthermore, the details of the mathematical derivation shown in this subsection cannot be covered exhaustively here in order to meet the length constraint. However, some important steps of the derivation are described in appendix A, and the complete details and justification thereof can be found in the references given in the text.

### a. Generalities

The maximization of Eq. (7) requires the possibility to evaluate the probabilities of occurrence *p* and , in the factual and counterfactual world, of the event , for any *ϕ* and *u*. For this purpose, it is convenient to derive beforehand the factual and counterfactual PDFs of the random variable *Y*, denoted and respectively. Assuming their two first moments are finite, we introduce

The means *μ* and represent the expected response in the factual and counterfactual worlds; it is intuitive that their difference will be key to the analysis. The covariances and represent all the uncertainties at stake; they must be carefully determined based on additional assumptions. To avoid repetition in presenting these assumptions, we will detail them for the factual world only, but they will be applied identically in both worlds.

As recalled above, in situ experimentation on the climate system is not accessible, thus we are left with *in silico* experimentation as the only option. While the increasing realism of climate system models renders such an *in silico* approach plausible, it is clear that modeling errors associated to their numerical and physical imperfections should be taken into account into . In addition, sampling uncertainty and observational uncertainty, which are commonplace sources of uncertainty in dealing with experimental results in an in situ context as well, should also be taken into account. Finally, internal climate variability also needs to be factored in. The latter four sources of uncertainty can be represented by decomposing into a sum of four terms:

where the component represents climate internal variability, represents model uncertainty, represents observational uncertainty, and represents sampling uncertainty. Assumptions regarding the latter four sources of uncertainty are also key in the conventional Gaussian linear regression framework recalled in section 1. We therefore propose to take advantage of some assumptions, data, and procedures that have been previously introduced under the conventional framework, and adapt them to specify *μ*, , , , and . The statistical model specification below can thus be viewed as an extension of developments under the conventional framework, in particular those exposed in Hannart (2016). The various parameters and data involved, as well as their conditioning, are summarized in the hierarchical model represented in Fig. 3, which we now describe.

### b. Model description

As in the conventional linear regression formulation recalled in Eq. (1), we assume that the random variable *Y* is Gaussian with mean and covariance :

In the conventional framework, climate models are assumed to correctly represent the response patterns but to err on their amplitude. Recognizing that the scaling factors *β* thereby aim at representing the error associated to models, we prefer to treat *β* as a random variable instead of a fixed parameter to be estimated. The latter factors are also assumed to be Gaussian:

where we assume that the expected value of *β* is , and *ω* is a scalar parameter that will be determined further in this section. Combining Eqs. (10) and (11) yields (see appendix A)

where . Equation (12) thus shows that it is possible to translate the pattern scaling term from the mean of *Y* to the covariance of *Y*. We believe such a mean-covariance translation is relevant here, since the pattern scaling assumption is meant to represent a source of uncertainty. Furthermore, the covariance associated with the latter source of uncertainty can be represented by the component , which results from the random scaling of the individual responses . Furthermore, the expected value of *Y*, denoted *μ*, is equal to the sum of the latter individual responses. Under the additivity assumption prevailing in the conventional framework, *μ* thus corresponds to the model response under the scenario where the *p* forcings are present. Hence, *μ* can be obtained by estimating directly the combined response as opposed to estimating the individual responses one by one and summing them up. Such a direct estimation of *μ* is indeed advantageous from a sampling error standpoint, as will be made clear immediately below.

The PDF of *Y* in Eq. (12) involves three quantities, *μ*, , and , that need to be estimated from a finite ensemble of model runs denoted , which of course introduces sampling uncertainty. Assuming independence among runs, it is straightforward to show that ( appendix A)

where is the ensemble average for the individual response *i*, is the ensemble average for the combined response, is the number of runs available for the individual response to forcing *i*; and *r* is the number of combined forcings runs. Combining Eqs. (12) and (13), and after some algebra, this becomes ( appendix A)

with , and where the sampling uncertainty on the responses *μ* and thus corresponds to the term . On the other hand, the internal variability component also has to be estimated from the preindustrial control runs, which introduces additional sampling uncertainty. The sampling uncertainty on can be treated by following the approach of Hannart (2016), which introduces an inverse Wishart conjugate prior for . This leads to an inverse Wishart posterior for that has the following expression ( appendix A):

where the estimated covariance consists of a so-called shrinkage estimator:

where is the empirical covariance of the control ensemble, is a shrinkage target matrix taken here to be equal to (i.e., and for ), the shrinkage intensity is obtained from the marginal likelihood maximization described in Hannart and Naveau (2014), and .

Combining Eqs. (14) and (15), and after some algebra and an approximation, this results in ( appendix A)

where we adopted the simplified parametric form for the covariance of observational error, and where is the multivariate *t* distribution with mean *μ*, variance , and *ν* degrees of freedom. Equation (17) implies that taking into account the sampling uncertainty on does not affect the total variance of *Y*. Instead, it only affects the shape of the PDF of *Y*, which has thicker tails than the Gaussian distribution. With these parameterizations, our model for *Y* is thus a parametric Student’s *t* model with two parameters .

After computing the estimators , , , and using the ensemble of model experiments as described above, the parameters are estimated by fitting the above model to the observation *y* based on likelihood maximization. The log-likelihood of the model has the following expression:

The estimators and are then obtained numerically using a standard maximization algorithm (e.g., gradient descent). With being obtained from factual runs (i.e., HIST runs) and containing all the forcings including *f*, this procedure thus yields the PDF of *Y* in the factual world:

Next, to obtain , one simply needs to change the mean to obtained as the ensemble average for the counterfactual experiment “all forcings except *f.*” Some changes also need to be made regarding the covariance. Indeed, since forcing *f* is absent in the counterfactual world, the model error covariance component , corresponding to the random scaling of the response to forcing *f*, must be taken out of the covariance. Furthermore, if the number of counterfactual runs differs from the number of factual runs *r*, the sampling uncertainty associated with estimating *μ* also has to be modified. The PDF of *Y* in the counterfactual world can thus be written

As noted above, when *f* is anthropogenic forcing, the counterfactual experiment NAT is usually available in CMIP runs, allowing for a straightforward derivation of . But for other forcings, by the design of CMIP experiments, counterfactual runs are usually not available. A possible workaround then consists in applying the additivity assumption to approximate with . For instance, if CO_{2} is the forcing of interest, the counterfactual response to all forcings except CO_{2} emissions can be approximated by subtracting the CO_{2} individual response from the all-forcings response. However in that case, the sampling uncertainty term corresponding to the estimation of must be added to the covariance .

### c. Derivation of the probabilities of causation

With the two PDFs of *Y* in hand, an approximated solution to the maximization of Eq. (7) can be conveniently obtained by linearizing *ϕ*, yielding a closed mathematical expression for the optimal index :

Equation (21) is a well-known result of linear discriminant analysis (LDA) (McLachlan 2004). Details of approximations made and of the mathematical derivation of Eq. (21) are given in appendix B. The optimal index can thus be interpreted as the projection of *Y* onto the vector , which will be denoted hereinafter; that is, .

To obtain PNS, we then need to derive the factual and counterfactual CDFs of , denoted *G* and respectively. Since no closed form expression of these CDFs is available, we simulate realizations thereof. Drawing two samples of *N* random realizations of *Y* from the Student *t* distributions and is straightforward, by treating the Student *t* as a compound Gaussian chi-squared distribution. Samples of *Z* are then immediately obtained by projecting onto and the desired CDFs can be estimated using the standard kernel smoothing estimator (Bowman and Azzalini 1997), yielding and for all . Finally, follows as

and

where .

### d. Reducing computational cost

When the dimension of *y* is large, the above described procedure can become prohibitively costly if applied straightforwardly, due to the necessity to derive the inverse and determinant of at several steps of the procedure. However, the computational cost of these operations can be drastically reduced by applying the Sherman–Morrison–Woodbury formula (Woodbury 1950), which states that the inverse of a low rank correction of some matrix can be computed by doing a low rank correction to the inverse of the original matrix. Omitting the notation for more clarity, we have

where can be inverted using the same formula:

where . Likewise, we apply the Sylvester formula (Sylvester 1851) twice to compute the determinant of :

Independently of *n*, the matrix is of size *p* and matrix is of size , and is diagonal. Obtaining their inverse and determinant is therefore computationally cheap. Hence, the inverse and determinant of can be obtained at a low computational cost by applying first Eq. (25) to determine and second Eqs. (24) and (26).

## 5. Illustration on temperature change

Our methodological proposal is applied to the observed evolution of Earth’s surface temperature during the twentieth century, with the focus being restrictively on the attribution to anthropogenic forcings. More precisely, *y* consists of a spatial–temporal vector of size *n* = 54, which contains the observed surface temperatures averaged over 54 time–space windows. These windows are defined at a coarse resolution: Earth’s surface is divided into six regions of similar size (three in each hemisphere) while the period 1910–2000 is divided into nine decades. The decade 1900–10 is used as a reference period, and all values are converted to anomalies w.r.t. the first decade. The HadCRUT4 observational dataset (Morice et al. 2012) was used to obtain *y*. With respect to climate simulations, the runs of the IPSL-CM5A-LR model (Dufresne et al. 2013) for the NAT, ANT, HIST, and PIcontrol experiments were used (see appendix C for details) and converted to the same format as *y* after adequate space–time averaging.

Following the procedure described in section 4, we successively derived the estimated factual response using the *r* HIST runs, the estimated counterfactual response using the NAT runs, the estimated individual responses and using the NAT runs and ANT runs respectively and , and the estimated covariance from the PIcontrol runs. Then, we derived and by likelihood maximization to obtain and .

An assessment of the relative importance of the four components of uncertainty was obtained by deriving the trace of each component (i.e., the sum of diagonal terms) normalized to the trace of the complete covariance. Climate variability is found to be the dominant contribution, followed by model uncertainty, observational uncertainty, and sampling uncertainty (not shown). The split between model and observational uncertainty is to some extent arbitrary as we have no objective way to separate them based only on *y*; that is, the model could be equivalently formulated as and . An objective separation would require an ensemble representing observational uncertainty, allowing for a preliminary estimation of .

The optimal vector , designed to capture the space–time patterns that best discriminate the HIST evolution and the NAT one, was then obtained from Eq. (21). To identify which features of *Y* are captured by this optimal mapping, the coefficients were averaged spatially and temporally, and were plotted in Fig. 4. First, it can be noted that the coefficients’ global average is large and positive: a major discriminant feature is merely global mean temperature, as expected. Second, the coefficients also exhibit substantial variation around their average in both space and time. Spatial variations of unsurprisingly suggest that, beyond global mean temperature, other discriminant features include the warming contrast prevailing between the two hemispheres and/or between low and high latitudes (the low resolution prevents a clear separation), as well as between ocean and land (Fig. 4a). Temporal variations of suggest that discriminant features include the linear trend increase, as expected, but also higher-order temporal variations (Fig. 4b).

The PDFs of the optimal index were derived and are plotted in Fig. 5, together with PNS as a function of the threshold *u*. The maximum of PNS determines the desired probability of causation:

In IPCC terminology, this would mean that anthropogenic forcings *have virtually certainly caused the observed evolution of temperature*, according to our approach. More precisely, the probability that the observed evolution of temperature is not caused by anthropogenic forcings is then one in ten thousand (1:10 000) instead of one in twenty (1:20). Therefore, the level of causal evidence found here is substantially higher than the level assessed in the IPCC report. This discrepancy will be discussed in section 6.

Before digging into this discussion, it is interesting to assess the relative importance of the “trivial” causal evidence coming from the global increase in temperature, and of the less obvious causal evidence coming from space–time features captured by . For this purpose, we merely split into the sum of a global average contribution and of the remaining variations . The PDFs of the resulting indexes are plotted in Figs. 5a and 5b. Their bivariate PDF can also be visualized with the scatterplot of Fig. 5c. The following two probabilities of causation are obtained:

where refers to the globally averaged evolution and refers to other features of evolution. Therefore, while the globally averaged warming provides alone a substantial level of evidence [i.e., ], these results suggest that the overwhelmingly high overall evidence [i.e., ] is primarily associated with nonobvious space–time features of the observed temperature change.

## 6. Discussion

### a. Comparison with previous statements

The probabilities of causation obtained by using our proposal may depart from the levels of uncertainty asserted by the latest IPCC report, and/or by previous work. For instance, when *y* corresponds to the evolution of precipitation observed over the entire globe during the satellite era (1979–2012), we have shown in section 3 that, using the dynamic–thermodynamic index built by Marvel and Bonfils (2013), the associated probability of causation is found to be 0.43. This probability is thus significantly lower than the one implied by the claim made in this study that “the changes in precipitation observed in the satellite era are likely to be anthropogenic in nature” wherein *likely* implicitly means .

In contrast with the situation prevailing for precipitation, when *y* corresponds to the observed evolution of Earth’s surface temperature during the twentieth century, and in spite of using a very coarse spatial resolution, we found a probability of causation , which considerably exceeds the 0.95 probability implied by the latest IPCC report. Such a gap deserves to be discussed.

First, the probability of causation defined in our approach is of course sensitive to the assumptions that are made on the various sources of uncertainty, all of which are here built into . Naturally, increasing the level of uncertainty, for instance through an inflation factor applied to , reduces the probability of causation (Fig. 6). In the present illustration, uncertainty needs to inflated by a factor of 2.4 to obtain in agreement with the IPCC statement. Therefore, a speculative explanation for the gap is that experts may be adopting a conservative approach by implicitly inflating uncertainty, although not explicitly, perhaps in an attempt to account for additional sources of uncertainty that are not well known. In the present case, such an inflation should amount to 2.4 to explain the gap. This number is arguably too high to provide a satisfactory standalone explanation, yet overall such a conservativeness may partly contribute to the discrepancy when it comes to temperature. In any case, this highlights the need for a more explicit and consistent use of conservativeness—if any.

Besides the effect of inflating the individual variances, it is important to note that the probability of causation may also be greatly reduced when the correlation coefficients of the covariance , whether spatial or temporal, are inflated. This less straightforward effect can be explained by the fact that higher correlations imply greater spatial and temporal coherence of the noise, which is thus more prone to confounding with a highly coherent signal, and thereby reduces the probability of causation. Conservativeness may thus be associated with an inflation of correlations, in addition to an inflation of variances.

Another possible explanation for the discrepancy is more technical. Many previous attribution studies buttressing the IPCC statement regarding temperature are based on an inference method for the linear regression model of Eq. (1) which is not optimal w.r.t. maximizing causal evidence—despite it being often referred to as “optimal fingerprinting*.*” More precisely, the inference on the scaling factors *β* and the associated uncertainty quantification are obtained by projecting the observation *y* as well as the patterns onto the leading eigenvectors of the covariance associated with climate internal variability. Such a projection choice sharply contrasts with the projection used in our approach, which is indeed performed onto the vector . Denoting the eigenvectors of and the corresponding eigenvalues, the expression of can be written

Equation (29) shows that projecting onto does not emphasize the leading eigenvectors of , in contrast to the aforementioned standard projection. Instead, it emphasizes the eigenvectors that simultaneously present a low eigenvalue and a strong alignment with the contrast between the two worlds . As a matter of fact, the ratio can be interpreted as the signal-to-noise ratio associated to the eigenvector , where the eigenvalue quantifies the magnitude of the noise and that of the causal signal. Projecting onto thus maximizes the signal-to-noise ratio. In contrast, since is a large contribution to (the dominant one in our illustration), a projection onto the leading eigenvectors of naturally tends to amplify the noise, and to partly hide the relevant causal signal .

To assess whether or not these theoretical remarks hold in practice, we revisited our illustration and quantified the impact on of using such a projection onto the leading eigenvectors of . For this purpose, after computing the projection matrix on the 10 leading eigenvectors of , our procedure was applied identically, but this time using the projected vector . Results are shown in Fig. 7, again after splitting the contribution of global mean change and patterns of change. Unsurprisingly, the probability of causation associated to the global mean change remains unmodified at 0.978. However, the probability of causation associated with the space–time features of warming drops from 0.9994 to 0.92. Indeed, the features that most discriminate the two worlds, and are summarized in , do not align well with the leading eigenvectors of . They are thus incompletely reflected by the projected vector (Fig. 8). Furthermore, the uncertainty of the resulting index is high by construction, as the magnitude of climate variability is maximized when projecting onto its leading modes. This further contributes to reducing to 0.992.

### b. Counterfactual experiments

Our methodological proposal has an immediate implication w.r.t. the design of standardized CMIP experiments dedicated to D&A: a natural option would be to change the present design “forcing *f* only” into a counterfactual design “all forcings except *f.*” Indeed, is driven by the difference between the factual response *μ* (i.e., historical experiment) and the counterfactual response (i.e., all forcings except *f* experiment). Under the assumption that forcings do not interact with one another and that the combined response matches with the sum of the individual responses, the difference coincides with the individual response (i.e., forcing *f* only experiment). While this hypothesis is well established for temperature at large scale (Gillett et al. 2004), it appears to break down for other variables (e.g., precipitation; Shiogama et al. 2013) or over particular regions (e.g., the southern extratropics; Morgenstern et al. 2014) where forcings appear to significantly interplay. Such a lack of additivity would inevitably damage the results of the causal analysis. It is thus important in our view to better understand the domain of validity of the forcing-additivity assumption and to evaluate the drawbacks of the present “one forcing only” design versus its advantages. Such an analysis does require “forcing *f* only” experiments, but also “all forcings except *f*” experiments in order to allow for comparison. This effort would hence justify including in the Detection and Attribution Model Intercomparison Project (DAMIP) set of experiments an “all forcings except *f*” experiment—which is presently absent even in the lowest priority tier thereof—at least for the most important forcings such as anthropogenic CO_{2}.

### c. Benchmarking high probabilities

Section 5 showed that the proposed approach may sometimes yield probabilities of causation that are very close to one. How can we communicate such low levels of uncertainty? This question arises insofar as the term “virtual certainty” applies as soon as PNS exceeds 0.99 under the current IPCC language (Table 1). Thus, this terminology would be unfit to express in words a PNS increase from 0.99 to 0.9999, say—even though such an increase corresponds to a large reduction of uncertainty by a factor of 100. One option to address this issue is to use instead the uncertainty terminology of theoretical physics, in which a probability is translated into an exceedance level under the Gaussian distribution, measured in numbers of *σ* from the mean (where *σ* denotes standard deviation), that is, with *F* the CDF of the standard Gaussian distribution. Under such terminology, “virtual certainty” thus corresponds to a level of uncertainty of 2.3*σ*, while found in section 5 reaches 3.7*σ*. It is interesting to note that the level of uncertainty officially requested in theoretical physics to corroborate a discovery as such (e.g., the existence of the Higgs boson) is 5*σ*. By applying such standards, one may actually consider that is still too low a probability to corroborate that human influence has indeed been the cause of the observed warming. Whether or not such standards are relevant in the particular context of climate change—which relates to defining the proper level of aversion to false discovery suitable in that context—is a disputable matter. In any case, increasing beyond the “virtual certainty” threshold of 0.99 by building more evidence into the analysis, is possible and may still be considered as a relevant research goal.

### d. Alternative assumptions

The mathematical developments of section 4 are but an illustration of how our proposed causal approach, as framed in section 3, can be implemented when one uses the conventional assumptions of pattern scaling and Gaussianity associated to the standard linear regression setting. In that sense, section 4 thus shows that the proposed causal framing is perfectly compatible with the conventional linear regression setting: it should be viewed as an extension of, rather than an alternative to, the latter setting. Nevertheless, it is important to underline that the application of the causal framework of section 3 is by no means restricted to the conventional linear regression setting. One may, for instance, challenge some aspects of the latter (e.g., the pattern scaling description of model error) and formulate an alternative parameterization of the covariance . This does not affect the relevance of the maximization of Eq. (7), which can be implemented similarly.

### e. Attribution as a classification problem

Last, it should be noted that the maximization of Eq. (7) can be viewed as a binary classification problem. Indeed, as illustrated in Fig. 5, solving Eq. (7) is equivalent to building a function of observations that allows us to optimally discriminate between two “classes”: the factual class and the counterfactual class. Under this perspective, PNS is related to the percent of correct classification decisions made by the classifier and is thus a measure of its skill.

Viewing the fingerprinting index as a classifier offers a fruitful methodological angle in our opinion. Indeed, classification is a classic and widespread problem in statistics and machine learning for which numerous solutions are readily available. For instance, under the restrictive assumptions of section 4—among them the Gaussian assumption—one obtains a linear classifier under a closed form expression, which is well known in linear discriminant analysis (McLachlan 2004). But more recent developments have focused on the non-Gaussian situations where a nonlinear classifier is more suitable, and can be formulated for instance using as diverse approaches as random forests, support vector machine, or neural nets (Alpaydin 2010). Testing such approaches for the present attribution problem certainly offers promise. However, the difficulty of physically interpreting such complex classifiers represent a challenge in such approaches.

## 7. Summary and conclusions

We have introduced an approach for deriving the probability that a forcing has caused a given observed change. The proposed approach is anchored into causal counterfactual theory (Pearl 2009), which has been introduced recently in the context of event attribution (EA). We argued that these concepts are also relevant, and can be straightforwardly extended to the context of climate change attribution. For this purpose, and in agreement with the principle of *fingerprinting* applied in the conventional detection and attribution (D&A) framework, a trajectory of change is converted into an event occurrence defined by maximizing the causal evidence associated to the forcing under scrutiny. Other key assumptions used in the conventional D&A framework, in particular those related to numerical models error, can also be adapted conveniently to this approach. Our proposal thus allows us to bridge the conventional framework with the standard causal theory, in an attempt to improve the quantification of causal probabilities. Our illustration suggests that our approach is prone to yield a higher estimate of the probability that anthropogenic forcings have caused the observed temperature change, thus supporting more assertive causal claims.

## Acknowledgments

We gratefully acknowledge helpful comments by Aurélien Ribes and three anonymous reviewers. This work was supported by the French Agence Nationale de la Recherche grant DADA (AH, PN), and the grants LEFE-INSU-Multirisk, AMERISKA, A2C2, and Extremoscope (PN). The work of PN was completed during his visit at the IMAGE-NCAR group in Boulder, Colorado, United States.

### APPENDIX A

#### Derivation of the PDF of *Y*

Given the quadratic dependence to *β* of the two terms under the integral in the right-hand side of Eq. (A1), it is clear that the PDF of the left-hand side is also Gaussian. Thus, instead of computing the above integral, it is more convenient to derive the mean and variance of this PDF by applying the rule of total expectation and total variance:

Next, in order to account for the sampling uncertainty on the estimation of *μ*, we apply the Bayes theorem to derive the PDF of *μ* conditional on the ensemble . Denote the *r* simulated responses in that are assumed to be i.i.d. according to a Gaussian with mean *μ* and covariance . We then have

where is the empirical mean of the ensemble, and we use the improper prior . The exact same approach yields .

To integrate out *μ*, we proceed by following the same reasoning as above for integrating out *β*. Since the resulting PDF is clearly Gaussian, it suffices to derive its mean and variance:

Likewise, to integrate out , we derive the total mean and total variance:

with . Note that is no longer Gaussian after integrating out **x** because **x** appears in the covariance of . However, for simplicity, we approximate it to be Gaussian.

The sampling uncertainty on the covariance matrix is addressed by using an approach described in Hannart and Naveau (2014), the main ideas of which are succinctly recalled here. The reader is referred to the publication for details and explicit calculations. In summary, we apply the Bayes theorem in order to derive , as for *μ* and **x**. However, we use this time an informative conjugate prior on , as opposed to an improper prior:

where denotes the a priori mean of and *a* is a scalar parameter that drives the a priori variance. Furthermore, the mean and variance parameters of this informative prior are estimated from by maximizing the marginal likelihood associated with this Bayesian model:

where is the *n*-variate gamma function and is the empirical covariance. The estimators satisfy to

where is a set of definite positive matrices chosen to introduce a regularization constraint on the covariance. Here we choose as the set of definite positive diagonal matrices, and we derive an approximated solution to Eq. (A8) with and . Because the prior PDF is fitted on the data, this approach can be referred to as “empirical Bayesian.” The “fitted” prior is then updated using the ensemble , and the obtained posterior has a closed form expression due to conjugacy:

where and . We can then use the above posterior to integrate out in the PDF of *Y*, in order to obtain :

The integral above does not have a closed form expression because the variance of is not proportional to . To address this issue, we approximate by . This assumption is conservative in the sense that it extends the sampling uncertainty on to even though is a constant. It yields a closed form expression of the above integral thanks to conjugacy:

### APPENDIX B

#### Optimal Index Derivation

Let us solve the optimization problem of Eq. (7) under the above assumptions. For simplicity, we restrict our search to so-called *half-space* events, which are defined by , where is a linear index with *ϕ* a vector of dimension *n*, and *u* is a threshold. Let us consider PNS as a function of *ϕ* and *u*:

For simplicity, we will use an expression of in the treatment of the optimization problem which approximates by a Gaussian PDF, even though it is a Student *t* PDF from the calculations of section 4. Note that this approximation is made restrictively here for deriving an optimal index. Once this index is obtained, it is the then the true Student *t* PDF of *Y* that will be used to derive the desired value of PNS. Therefore, the implication of this approximation is to yield an index that is suboptimal and thereby underestimates the maximized value :

where *F* is the standard Gaussian CDF. The first-order condition in *u*, , thus yields

Next, we introduce a third approximation to solve Eq. (B3), yielding

Then, the first-order condition in *ϕ*, , yields

which proves Eq. (21). Figure 5c illustrates this solution and also shows that the optimization problem of Eq. (7) may be viewed as a classification problem. Our proposal to solve Eq. (7) is in fact similar to a commonplace classification algorithm used in machine learning and known as a support vector machine (SVM) (Cortes and Vapnik 1995).

### APPENDIX C

#### Data Used in Illustration

As in Hannart (2016), observations were obtained from the HADCRUT4 monthly temperature dataset (Morice et al. 2012), while GCM model simulations were obtained from the IPSL CM5A-LR model (Dufresne et al. 2013), downloaded from the CMIP5 database. An ensemble of runs consisting of two sets of forcings was used, the natural set of forcings (NAT) and the anthropogenic set of forcings (ANT) for which three runs are available in each case over the period of interest and from which an ensemble average was derived. On the other hand, a single preindustrial control run of 1000 years is available and was thus split into 10 individual control runs of 100 years. Temperature in both observations and simulations were converted to anomalies by subtracting the time average over the reference period 1960–91. The data were averaged temporally and spatially using a temporal resolution of 10 years. Averaging was performed for both observations and simulations by using restrictive values for which observations were nonmissing, for a like-to-like comparison between observations and simulations.

## REFERENCES

*Introduction to Machine Learning.*2nd ed. MIT Press, 537 pp.

*Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations.*Oxford University Press, 204 pp.

*Climate Change 2007: The Physical Science Basis,*S. Solomon et al., Eds., Cambridge University Press, 663–745.

*Climate Change 2014: Synthesis Report.*R.K. Pachauri and L.A. Meyer, Eds., IPCC, 151 pp.

*Discriminant Analysis and Statistical Pattern Recognition.*Wiley Interscience, 526 pp.

*The Facts of Causation*. Routledge, 251 pp.

*Causality: Models, Reasoning and Inference*. 2nd ed. Cambridge University Press, 487 pp.

*A Probabilistic Theory of Causality*. North-Holland Publishing, 130 pp.

## Footnotes

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1}

The notation corresponds to the confidence level associated to the confidence interval under a frequentist approach, and to the posterior probability that is positive under a Bayesian one.