Describing the Relationship between a Weather Event and Climate Change: A New Statistical Approach

Describing the relationship between a weather event and climate change—a science usually termed event attribution —involves quantifying the extent to which human inﬂuence has affected the frequency or the strength of an observed event. In this study we show how event attribution can be implemented through the application of nonstationary statistics to transient simulations, typically covering the 1850–2100 period. The use of existing CMIP-style simulations has many advantages, including their availability for a large range of coupled models and the fact that they are not conditional to a given oceanic state. We develop a technique for providing a multimodel synthesis, consistent with the uncertainty analysis of long-term changes. Last, we describe how model estimates can be combined with historical observations to provide a single diagnosis accounting for both sources of information. The potential of this new method is illustrated using the 2003 European heat wave and under a Gaussian assumption. Results suggest that (i) it is feasible to perform event attribution using transient simulations and nonstationary statistics, even for a single model; (ii) the use of multimodel synthesis in event attribution is highly desirable given the spread in single-model estimates; and (iii) merging models and observations substantially reduces uncertainties in human-induced changes. Investigating transient simulations also enables us to derive insightful diagnostics of how the targeted event will be affected by climate change in the future.


Introduction
Describing the relationship between a given weather or climate event and anthropogenic climate change is a growing area of activity in the field of climate science (National Academies of Sciences, Engineering, and Medicine 2016). Since the pioneering studies of Allen (2003) and Stott et al. (2004), the concept of event attribution has been applied to a wide variety of events, as synthesized in the annual special issues of the Bulletin of the American Meteorological Society (BAMS) ''Explaining Extreme Events from a Climate Perspective'' (Peterson et al. 2012, and subsequent issues). 1 Multiple approaches have been introduced to address this question. Beyond issues related to the definition of the event of interest, the most commonly used approach is probabilistic and involves a comparison of the distributions of extreme events in the factual versus Denotes content that is immediately available upon publication as open access. counterfactual worlds (e.g., Stott et al. 2004;Pall et al. 2011). Particular attention is paid to changes in probability of the event associated with human influence. Various alternatives have been proposed in the literature; one of these involves focusing on the thermodynamic component of human influence (e.g., Trenberth et al. 2015;Cattiaux et al. 2010). However, this study will focus on the probabilistic approach and its statistical implementation, that is, how estimating changes in occurrence frequency and the corresponding uncertainty.
At least two methods are commonly used to derive such probabilities.
First, large ensembles of simulations are used to sample the factual and counterfactual statistical distributions (Pall et al. 2011;Massey et al. 2015;Christidis et al. 2013;Ciavarella et al. 2018;Wehner et al. 2018). Such ensembles are typically produced with atmosphericonly models forced by prescribed sea surface temperatures (SSTs); factual SSTs are usually taken from observations, while counterfactual SSTs are derived by subtracting an estimate of the anthropogenic influence. Such ensembles can be very large, typically from a few hundred to more than 10 000 simulations of one year or one season. One important advantage of using such large ensembles is that the signal-to-noise ratio is increased (sampling noise is reduced), and probabilities can be estimated very straightforwardly by counting exceedences, that is, using a minimal statistical inference-although more complex treatments have also been proposed (Paciorek et al. 2018). Several disadvantages should also be mentioned: the computational cost is relatively high (large number of simulations, which have to be redone on an annual basis at least), processes involving oceanatmosphere coupling are missing (Dong et al. 2017), results are conditional on the particular pattern of SSTs considered (Risser et al. 2017), model bias or reliability issues can affect results (Bellprat and Doblas-Reyes 2016), and last, modeling uncertainty is usually not assessed comprehensively due to the lack of coordinated exercise  is a noticeable exception].
Second, occurrence probabilities can be inferred from observations and observed trends, assuming that the trends are entirely related to human influence on climate (e.g., van Oldenborgh et al. 2015;Vautard et al. 2015). This approach eliminates concerns related to climate model biases and/or error in representing climate change. However, inference becomes dependent on the choice of a statistical model, and the signal-to-noise ratio is usually limited in observations (e.g., Li et al. 2019). Uncertainty in the observed trend estimate might lead to very wide uncertainty in the probability ratio or other diagnostics of the human influence. Further, this technique is highly dependent on the availability of a long, homogeneous historical record.
A few attempts have been made to consider these two approaches simultaneously (Uhe et al. 2016;Eden et al. 2016;Hauser et al. 2017), providing very helpful comparisons of methods for selected case studies. Several attribution studies from the World Weather Attribution 2 (WWA) also made some synthesis of both sources of information. However, to the best of our knowledge, no statistical method has been proposed to formally combine models and observations results in order to provide one single estimate of human influence.
In this paper, we tackle several of these issues. First, we propose to base event attribution on transient the Coupled Model Intercomparison Project (CMIP)-style simulations-typically a combination of historical and RCP scenarios. This is done through the use of nonstationary statistics (section 3). Second, we propose a statistical procedure to create a rigorous multimodel synthesis. This question has not been fully addressed in previous event attribution literature. We show that, if a large multimodel ensemble is available, the assumptions and techniques used to construct multimodel syntheses for large-scale mean variables can be extended to event attribution (section 4). Third, we introduce a statistical framework for merging information from models and observations. The proposed method is essentially Bayesian, in the sense that available observations are used to constrain the model range further (section 5).
In addition to the statistical inference itself, we promote the use of two additional diagnostics in describing the relationship between a particular event and climate change. First, the human influence is quantified both in terms of probability and intensity of the event. Although highlighting this duality is not new, using one point of view or the other may have contributed to past controversies (Otto et al. 2012), and both quantities can be derived from the same statistical analysis. Second, we describe how the characteristics of the event (frequency, intensity) evolve with time. This allows us to describe not only the human influence to date-the main diagnosis of event attribution studies-but also how a similar event will be affected by climate change in the future. This type of outcome is another benefit of using transient simulations and might be very helpful for communicating the relationship between an event and climate change in a comprehensive way.
Several facets of the proposed statistical method have been already introduced in the literature, for example, the investigation of future event probability (Sun et al. 2014;Christidis et al. 2015a;Kirchmeier-Young et al. 2017) or the use of some observational constraints (e.g., Christidis et al. 2015b). The general structure of our method also shares several similarities with recent WWA studies [e.g., the use of a covariate and nonstationary statistics; see Van Oldenborgh et al. (2017), among other examples]. Using transient CMIPstyle simulations for event attribution is not a new idea either (Lewis and Karoly 2013;Sun et al. 2014;Christidis et al. 2015b;King et al. 2015;Kirchmeier-Young et al. 2017. The main issue with such simulations is that the sample size is limited-usually to no more than 10 members. This is at least partly compensated by the fact that these simulations include a period of time (near the end of the twenty-first century) in which the human influence is much more pronounced than in the current climate-increasing the signal-to-noise ratio. Regardless, there are tremendous advantages in using such simulations: they are already available (reducing the additional computational cost for an event attribution analysis to almost zero), performed with fully coupled models (i.e., accounting for coupled processes, and also not conditional on a specific oceanic state), and available for many models (making it possible to account for a wider range of model uncertainties).
The main goal of this paper is to describe the proposed statistical method, and to provide a first illustration of its potential. The proposed algorithm is flexible and could be improved in several ways, without significantly affecting its general structure. Strengths and weaknesses are also discussed in the last section.

a. Event definition and indicators of the human influence
A relatively wide variety of events or classes of events has been considered in event attribution. In this study we focus on simple classes of events such as where Y is a univariate random climate variable-typically temperature, rainfall, or wind speed, averaged over a given time window and spatial domain-and s is a predetermined threshold. This class of events consists of all outcomes that are greater than or equal to the threshold s. The choice of s is, in some way, informed by the weather event that motivates the study. We assume that this event has happened at a time t e in the factual (F) world. 3 The attribution analysis involves describing the characteristics of a similar event happening in the counterfactual (C) world. 4 The meaning of similar depends on the point of view (frequency or intensity) and is discussed below. As we consider transient simulations where climate changes with time, describing how the characteristics of the event vary with time, for example, in the factual world, is also of interest. Changes in occurrence frequency/probability can be assessed by comparing the probability of the event E happening in (F) versus (C), considering the same threshold s. Denoting F F,t and F C,t the cumulative distribution functions of Y at time t in the factual and counterfactual worlds, respectively, we define Human influence is then typically characterized through the probability ratio (PR) or the fraction of attributable probability 5 (FAP; Stott et al. 2004): As they are of critical importance, we will denote p F 5 p F (t e ) and p C 5 p C (t e ) the probabilities at time t e . Changes in intensity are assessed by comparing the magnitude of events with the same occurrence probability; this value is set to p F , consistent with the observed event: In other words, I F and I C are the quantiles of order p F of F F,t and F C,t , respectively. By definition, I F (t e ) 5 s. DI tells how much more/less intense the event with exactly the same frequency would have been in the counterfactual world. Note that, for some climate variable such as precipitation, a relative difference (i.e., DI 5 I F /I C ) might be better appropriate than absolute difference. Two important remarks can be added. First, conventional attribution studies only calculate PR, FAP, or DI at time t e , that is, the exact date at which the event was observed. Calculation of PR, FAP, or DI at a different date allows us to quantify the human influence, had the event occurred at that date. Second, describing how the characteristics of the event are changing through time, for example, in the factual world, is also helpful (see, e.g., Christidis et al. 2015a). This can be done using relative indices, for example, , or DI rel (t) 5 I F (t) 2 I F (t e ): (5) All these diagnostics are calculated and illustrated subsequently.

b. Case study: 2003 European heat wave
To illustrate the method presented in this paper, we focus on the 2003 European heat wave (EHW03), an event that has long been scrutinized in event attribution studies (Stott et al. 2004;Schär et al. 2004;Christidis et al. 2015a). We define EHW03 (variable Y) as a 1-month event occurring in August 2003 near Paris, France. The spatial domain considered is a 58 3 58 square surrounding Paris, that is, [458-508N] and [08-58E]. The choice of this space-time window is debatable (see, e.g., Cattiaux and Ribes 2018). A monthly value was considered as a convenient choice in order to involve as many phase 5 of CMIP (CMIP5) models as possible and investigate their level of agreement. The threshold used is s 5 15.388C anomaly with respect to the 1961-90 mean, corresponding to actual value observed in 2003.
As further discussed below, our technique also requires the use of a covariate X, which is assumed to be representative of climate change magnitude over time. We consider the summer mean temperature over western Europe ([358-708N] and [108W-308E]) in this respect.

c. Data
We use data from a collection of climate models from CMIP5 (all 24 models considered are listed in Fig. 4). For each model, we combine historical simulations (1850-2005) and RCP8.5 simulations . We use all available runs in cases where ensembles have been performed-using a different number of historical and RCP8.5 simulations is not problematic. Preindustrial control simulations are also used to quantify internal variability and derive confidence intervals.
Our method also requires using observed data. We use HadCRUT4 (Morice et al. 2012

Statistical analysis of transient simulations
In this section, we consider data from one single climate model and describe how changes in occurrence probability can be calculated from such data. By construction, transient simulations exhibit a nonstationary climate, so using nonstationary statistics is a key component of our approach. Therefore, we consider a covariate X that is assumed to be representative of climate change magnitude over time. The covariate will typically be a temperature, averaged either globally or over a large region, on a seasonal or annual basis. Several studies have already used the global mean temperature as such a covariate van der Wiel et al. 2017). Here we use summer mean European average temperature and refer to section 6 for further discussion on this choice. Once this covariate X has been selected, our procedure is as follows.
a. Anthopogenic and natural contributions to changes in X As a first step, we need to estimate the forced responses in the covariate X and, in particular, the contributions of natural versus anthropogenic forcings to changes in X. This is typically the purpose of detection and attribution techniques. However, these techniques are not usually designed to provide smooth time series as a result. We therefore propose a hybrid approach using generalized additive models (GAM; Hastie 2017; Wood 2017).
We assume that where m x is a constant, e t is an energy balance model (EBM; see Held et al. 2010) response to natural forcings only at the global scale, b is an unknown scaling factor, f(t) is a smooth temporal function, and « t is a random term describing internal variability. Within this framework, estimation of the response to natural forcing is quite consistent with usual practice in detection and attribution [D&A; more specifically, the ordinary least squares (OLS) formulation], as it involves the estimation of an unknown scaling factor b. The main innovation is the consideration of the response e t derived from an EBM (similar to Huber and Knutti 2012), rather than a more complex model. In doing this, we take advantage of the forcing time series and avoid involving additional noise (i.e., internal variability) from a climate model run. The impact of major volcanic eruptions, for instance, becomes clearly noticeable in that way. In practice, we consider the EBM parameter settings given in Geoffroy et al. (2013), for a subset of CMIP5 models. We calculate the EBM responses e t using the corresponding settings (one calculation for each CMIP model). The multimodel average of all e t is taken as a best estimate, while we randomly sample over the finite set of responses to quantify uncertainty. Finally, the rescaling of e t is important if a regional temperature is considered, as EBMs are only suitable to describe global or hemispheric means.
The response to anthropogenic forcing f is assumed to be smooth over time and is estimated from the data x t using smoothing splines. The regularity assumption can be regarded as sensible, as greenhouse gas and aerosols, that is, the two dominant drivers, vary quite slowly over time. Anthropogenically induced changes are computed with respect to a reference date t ref , implying that f(t ref ) 5 0; we consider t ref 5 1850, consistent with CMIP5 protocol, but another reference date could be used. Although the human influence is dominant on mean temperature changes over recent decades (Bindoff et al. 2013), the estimate of f is still influenced by low-frequency internal variability. It will be necessary to account for this component in the uncertainty analysis.
Estimation within model (6) can be made using standard GAM tools. Here we chose to estimate f using smoothing splines with 6 equivalent degrees of freedom-this number was tuned using cross validation.
Quantifying uncertainty in this decomposition is more difficult, since it is important to account for dependencies in « t . It is assumed that « t ; N(0, S), where S is known (derived from preindustrial control simulations, as usual in D&A) but not equal to identity. Uncertainties in X nat and X ant are assessed by using (i) perturbed values of e t (using EBM parameters fitted to individual CMIP models), (ii) parametric uncertainty on b given [f(), e t ], and (iii) parametric uncertainty on f() This decomposition procedure is illustrated in Fig. 1 for one particular CMIP5 model (CNRM-CM5). Response to major volcanic eruptions can be easily identified in both the factual world (all forcings combined) and the counterfactual world (natural forcings only). The human influence emerges from noise near 1970 in this model. This is not necessarily contradictory with the fact that human influence is not attributable at that date in the instrumental record-10 model runs are used, while only one observed realization is available, implying different signal-to-noise ratios.
This decomposition produces two major outputs: the estimated response to natural forcings only X nat , corresponding to the expected value of X in the counterfactual world, and X all 5 X ant 1 X nat , describing the state of X in the factual world.
b. Fitting a nonstationary distribution to Y As a second step, a nonstationary distribution is fitted to the variable of interest Y. X all (t) is used as a covariate in this nonstationary fit. Two types of distributions are considered: The parameters g 5 (m 0 , m 1 , s 0 , s 1 ) can be estimated via maximum likelihood. However, no closed formula is available in this case, and an optimization algorithm is needed. We used the nlminb R routine, chosen from other possible options. Confidence regions on g can be derived by bootstrapping [X all (t), Y t ] and simultaneously considering uncertainty on X all , derived from the previous step. The bootstrap used assumes FIG. 1. ANT and NAT influences in covariate X. Changes in covariate X (raw data are shown as black points) are decomposed into changes related to (top) all forcings combined (red), (middle) anthropogenic forcings only (green), and (bottom) natural forcings only (blue). Shaded area corresponds to 5%-95% confidence regions. All data are from CNRM-CM5, including all runs available from this model (10 historical, 5 RCP8.5).
year-to-year independence and therefore does not account for potential interannual correlation. d Nonparametric distributions, assuming that the quantile of order a at time t, q a (t), satisfies In this case, the parameters g 5 (m a 0 , m a 1 ) can be estimated, for a collection of a, using quantile regression (Koenker and Bassett 1978;Koenker and Hallock 2001). Given typical sample sizes (a few hundreds to thousands of data) and the computational cost of resampling, a fast algorithm is needed, and we used the Frisch-Newton approach after preprocessing (see Portnoy et al. 1997, implemented in R under ''pfn''). Uncertainty on (m a 0 , m a 1 ) is assessed through a bootstrap procedure.
This list of distributions is obviously not exhaustive, and other families might be used. For instance, generalized extreme value (GEV) distributions could be of interest when the focus is on annual maxima (Coles et al. 2001). Still, a nonparametric distribution offers the highest flexibility, and might be nicely complemented by adjusting generalized Pareto distribution (GPD) to the tails, in order to improve estimation of rare values. 6 In the remainder of this paper, we focus on nonstationary Gaussian distribution only for illustrating our method. So far, nonparametric distributions were only used to analyze transient simulations, and provided results consistent with Gaussian distributions.
The fit of a nonstationary Gaussian distribution is illustrated in Fig. 2. This figure suggests that X all (t) is an appropriate covariate for Y t , as there is no evidence of a nonlinear relationship in the data. More generally, this type of diagnosis can be used to check if the choice of the covariate is appropriate. The fact that the three regression lines (corresponding to mean and quantiles) are almost parallel indicates that there are almost no changes in variance for this particular model and variable. We also find that, thanks to the temporal smoothing, internal variability does not contribute to the correlation between X all (t) and Y t (not shown)-although internal variability does induce a significant correlation between X t and Y t in this particular case.
In the following, g will be split into (g 0 , g 1 ), where g 0 are parameters describing the distribution of Y at a reference time (or in a stationary climate), while g 1 are parameters describing the sensitivity of Y to changes in X. For instance, in the Gaussian case, g 0 5 (m 0 , s 0 ) and g 1 5 (m 1 , s 1 ).

c. Estimating changes in probability/intensity
Once a nonstationary distribution has been fitted on Y, the PDF of Y in the factual world is given by (8). The equivalent PDF in the counterfactual world is obtained by replacing X all by X nat . These PDFs can be translated into the corresponding CDFs F F,t and F C,t . Then, the diagnostics proposed in section 2 can be derived easily, leading to PR and DI.
Changes in frequency and intensity, as estimated from one particular CMIP5 model, are illustrated in Fig. 3 for the EHW03 event as defined in section 2b. The event frequency moves from about 2.10 24 in 1850 to more than 1/2 in 2100 in the factual world. These numbers are not inconsistent with Stott et al. (2004), as we consider a smaller space-time domain. Over the same period, the FIG. 2. Fit of a nonstationary Gaussian distribution. Illustration of the nonstationary fit for data from the CNRM-CM5 model, assuming Gaussian distribution. Black points: data [X all (t), Y t ], where X all (t) has been estimated following section 3a, and Y t is the raw Y data, that is, August mean surface air temperatures from CNRM-CM5 for the 58 3 58 grid square that includes Paris. Several simulations are considered, leading to several values of Y t at each time t. Many points lie in the bottom-left corner, corresponding to the quasi-stationary climate of the period before 1980. Red lines: change in the mean of Y t (solid) or 5%-95% quantiles (dashed lines) as estimated from the nonstationary fit. magnitude of the event (now defined by its occurrence probability) increases by about 68C. These diagnostics suggest that recent changes are large in terms of probability ratios (PR near 10 in 2003) while remaining limited in terms of magnitude (near 18C in 2003) in the CNRM-CM5 model. The influence of natural forcings is clearly discernible, and mainly driven by large volcanic eruptions. Consistent with Fig. 1, both frequency and intensity exhibit a discernible human influence as early as 1970 in this model. Human influence becomes huge during the twenty-first century, with PR higher than 10 3 in 2100. Overall, confidence intervals are relatively narrow but are consistent with the estimated changes in X (which exhibits limited uncertainty; Fig. 1) and the fact that there is a clear relationship between X and Y (Fig. 2). The latter implies that any significant change in X translates into a significant change in Y.

a. Results from CMIP5 models
To give a broader picture, this procedure can then be applied to other CMIP models (Fig. 4). This reveals that model uncertainty is large-unlike estimation (or sampling) uncertainty, which remains very limited. Model best estimates of PR vary from 1.7 to more than 300 at the date of the event. Discrepancies among models are also very large in terms of DI, from 0.28 to 38C in 2003. Similar findings are made on the other parameters involved: p C , p F , I C , and I F -keeping in mind that model biases contribute substantially to discrepancies in I C and I F .
Unlike CNRM-CM5, some individual models exhibit a significant cooling trend (e.g., FGOALS-g2, ACCESS1.3, all versions of MIROC and CMCC) or warming trend [e.g., BCC_CSM1.1(m), INM-CM4.0, GISS-E2-R] in X all prior to 1950 (Fig. 5a)-a period over which the anthropogenic forcings are limited. Most of this trend is interpreted as resulting from human influence (i.e., falls into X ant ) according to the simple decomposition described in section 3a. Such trends typically result in PR becoming significantly different from 1, and DI significantly different from 0, soon after 1850 (Figs. 5d,g). At this stage it is unclear whether these trends (i) are related to low-frequency internal variability that is inappropriately taken into account, (ii) can be explained by a long-term regional drift (imbalance) in preindustrial control simulations, or (iii) highlight an early onset The multimodel distribution estimated from this sample of models is illustrated through: the mean and 90% confidence region (i.e., confidence region of u 0 , or resulting diagnostics, using the ''models are statistically indistinguishable from the truth'' paradigm), or (right) new realizations drawn from that distribution (which can be interpreted as virtual climate models). of the anthropogenic influence, dominated by either aerosols or greenhouse gases (GHGs).
Though large, discrepancies among models can be easily understood. Models disagree on the magnitude of the changes in the covariate X (different global or local sensitivity to forcings), the variance of Y (which strongly influences the probability of exceeding a high threshold), and the strength of the relationship between Y and X. Each model might exhibit some bias in one of these characteristics. This highlights the need for a multimodel synthesis.

b. Building a multimodel synthesis
Techniques for building a multimodel synthesis have received much attention in both the literature and the IPCC assessments, due to their importance in providing climate change projections for the next century, including an assessment of uncertainty (Collins et al. 2013). Literature on the subject of how to use an ensemble of opportunity such as the CMIP ensemble, that is, where no particular design effort is made to cover the range of uncertainty (Tebaldi and Knutti 2007;Knutti et al. 2010a,b), is relatively abundant. However, these efforts have not been translated into event attribution thus far to the best of our knowledge, although multimodel event attribution studies have become more popular (e.g., Sun et al. 2014;Kirchmeier-Young et al. 2017Stone et al. 2019). In this section we introduce one possible method for conducting such a synthesis in the context of the statistical framework described above. Here we propose a technique similar to that outlined in Ribes et al. (2017); we review the main concepts here but refer to that publication for a more detailed discussion.
Following section 3, the parameters describing the response of one single model are u 5 [X all (t), X nat (t), g]all diagnostics can be derived from u. Let u i be the true value of u in model i (i 5 1, . . . , M), letû i be its estimator (as derived from section 3), and let u o be the true value of u in the real world. The key idea behind the multimodel synthesis is to assume that (u i ) i51,..., M are realizations of one multimodel distribution. Further, it is assumed that the truth u o is also a possible realization of this multimodel distribution-a paradigm known as models are statistically indistinguishable from the truth (Annan and Hargreaves 2010), or model-truth exchangeability (Rougier et al. 2013). These are strong assumptions, given that we do not know how the available sample of models was drawn from the space of plausible representations of the climate. There is no consensus on whether these assumptions lead to well-calibrated probabilistic forecasts (Collins et al. 2013), but some authors report empirical evidence that they are consistent with past observations, in particular, for temperature (Annan and Hargreaves 2010;van Oldenborgh et al. 2013). In the following, this multimodel distribution is estimated using a Gaussian assumption. Multimodel statistics such as confidence regions can then be derived.
In more detail, we assume that (for i 5 1, . . . , m) leading toû where u i is the value of u for model i,û i is its estimate, m m and S m are the mean and variance of the multimodel population (i.e., S m represents modeling uncertainty on u), and S u,i describes the uncertainty related to internal variability in the estimation of u i . For each model,û i and S u,i can be derived from the estimation procedure described in section 3. The next step is to estimate m m and S m from the available sample ofû i -we refer to Ribes et al. (2017) for this technical step. Last, a prediction region for u o (i.e., a draw of a new value of u) can be derived from (m m , S m ). Given a collection of CMIP models such as in Fig. 4, our procedure can be used to derive multimodel statistics and confidence regions (Fig. 5, and ''MULTI'' confidence ranges in Fig. 4). The fitted multimodel distribution can also be used to sample new realizations (using Monte Carlo simulations) corresponding to virtual climate models (Fig. 5), which is a way to check that the fitted distribution is consistent with the model sample.
Multimodel uncertainty is found to be much larger than the sampling uncertainty related to internal variability in one given model. This is not surprising for a monthlong temperature event such as the one investigated here and is consistent with many other studies (e.g., Hawkins and Sutton 2009)-but might not be the case with other variables or temporal scales. The multimodel confidence range (5%-95%) for PR is about [1.4, 260] in 2003, which better reflects the overall uncertainty than singlemodel estimates. It is worth noting that the reported multimodel confidence regions are not equal to the range of single-model results. Some models can be excluded from the final confidence region if they are outliers in terms of u. And, in the presence of a very large sample of models, the bounds of the multimodel confidence region would converge to the corresponding quantiles of the model sample.
This uncertainty range in PR is larger than reported by Stott et al. (2004). Two important features of our method could contribute to this discrepancy, in addition to the different event definitions. First, the ensemble of models considered here is larger than in any other attribution study, enabling a more comprehensive exploration of uncertainties-resulting in larger confidence intervals. In particular, differences in the variance of Y strongly contribute to the spread in p F . Second, the attribution performed here is less constrained than other approaches. For instance, Stott et al. (2004) used historical observations (as opposed to model experiments only) to estimate the attributable warming, following standard D&A practice. In the most widespread event attribution procedure, the human-induced SST warming is also constrained by observations (Pall et al. 2011). This highlights the benefit of incorporating observed information in our procedure-a path explored in the next section. Our results also suggest that it is critical to consider a large ensemble of models for event attribution-as for assessing many other features of climate change.

Merging models and observations
In this section we introduce two techniques to combine observations and information provided by climate models, using the multimodel synthesis as a starting point. Among other possible approaches, we focus on using observations to constrain changes in X and estimate the distribution of Y. Other options are briefly discussed in section 6.

a. Observed changes in X
Detection and attribution studies have long illustrated that observations might be used to derive information on, for example, human-induced warming to date (Bindoff et al. 2013), particularly when the signal-tonoise ratio is high. In contrast with this, in Figs. 5a,b, models exhibit large differences in the simulated regional warming (covariate X) to date: 0.58 to 2.08C in 2015. It is thus natural to investigate whether available observations enable us to narrow this range further.
In mathematical terms, real observations of covariable X, say X o , can be written as where X o all is the real-world response to external forcings and « o is the contribution of internal variability, considered to be random. Using a Bayesian perspective, multimodel uncertainty on X all (derived from the multimodel synthesis, e.g., Fig. 5b) can be considered as a prior distribution for X o all , say p(X o all ). If both this prior distribution and the distribution of « o are known, then the posterior distribution of X o all jX o can be derived, using a conventional Bayesian technique. The distribution of u o jX o (u o contains X o all but is larger) can be derived similarly. This derivation is particularly easy under the assumptions made in section 4, as all distributions involved [i.e., p(X o all ) and the distribution of «] are assumed to be Gaussian. As a consequence, u o jX o follows a Gaussian distribution as well, but with mean and variance different from (m m , S m ). Last, this constrained distribution can be used, instead of the unconstrained one, to derive all results, following the same procedure as in section 4.
Application of this procedure to summer mean temperature over Europe (i.e., our covariate X) suggests that some model responses to historical forcings are inconsistent with observations (Fig. 6). This phenomenon can be explained as follows. Intuitively, X o all (t) is the expectation of X o at time t, so observations should be distributed around X o all (t). An estimate of X o all is not quite plausible if it lies far away from most of observed values x o t . In Fig. 6, the upper and lower bounds of the multimodel distribution p(X o all ) fall almost outside the set of observations over the beginning (before 1900) or the end (after 2000) of the observed period, suggesting some inconsistency. 7 Using observational information FIG. 6. Observational constraint on covariate X. Observation of covariate X (here European summer mean temperature since 1850, black points), are compared to the multimodel distribution of externally forced changes in X, that is, p(X o all ) (light pink, 5%-95% confidence region, identical to Fig. 5b). Uncertainty in changes in X decreases after applying the observational constraint, that is, X o all jX o (dark pink, 5%-95% confidence region). Best estimates before (light brown) and after (brown) applying the observational constraint are almost indistinguishable in this case, as observations are consistent with the multimodel mean estimate. All values are temperature anomalies with respect to the 1961-90 period. therefore leads to a substantial reduction of the multimodel uncertainty in changes in X. This reduction is particularly clear over the historical period: the multimodel 5%-95% confidence range of total 1850-2015 warming is [0.58, 2.08C] without the use of any observations [i.e., in p(X o all )], but shrinks to [0.98, 1.48C] after applying the observational constraint (i.e., in X o all jX o ). The spread in future warming is also substantially reduced-[3.88, 7.78C] and [4.58, 6.98C] for the corresponding 5%-95% confidence ranges before and after applying the observational constraint, respectively.
The impact on event attribution diagnostics is also very clear, with a sharp reduction of uncertainties in PR or DI (Fig. 8). The lower bound of PR is particularly affected (3.1 after accounting for observations, as opposed to 1.4 without), because some model responses exhibit almost no human-induced warming in 2003, while observations suggest that such weak responses are unlikely. The effect on DI is also very clear: uncertainty reduces from [10.18, 12.38C] before accounting for observations to [10.58, 11.58C] after. Overall, these results suggest that taking observational information into account is very helpful, even if done only through the covariate X, that is, at a large spatiotemporal scale.

b. Observed distribution of Y
Another way of merging climate models' outputs with real-world observations is to estimate the distribution of Y, for example, at the time of the event. Climate models exhibit all sorts of biases (e.g., biases in the mean climate, the variability, the tails of the distribution, etc.) that can make the simulated p F (and the entire probability distribution) erroneous (Bellprat and Doblas-Reyes 2016). Figure 4 shows that estimates of p F vary widely, and are inconsistent among models; the multimodel synthesis exhibits large uncertainty: 5 3 10 27 to 3 3 10 22 . In such a case, the historical record can bring useful information to constrain the range of model estimate. Considering observations is also consistent with the practice of many national weather services in characterizing extreme events (Klein Tank et al. 2009).
Here, we illustrate how observations of Y, say Y o , can be used to infer p F and, more generally, the distribution of Y at time t e . To do this, both changes in the covariate X, that is, X all , and the nonstationary coefficients g 1 5 (m 1 , s 1 ) are estimated from climate models (with some uncertainty) and subsequently treated as known in the statistical inference. Then, observations are used to estimate the stationary parameters g 0 5 (m 0 , s 0 ). In this way, models are used to estimate some parameters (X all and g 1 ) while observations are used to estimate others (g 0 ). As we make no use of the value of g 0 previously estimated in each model, the proposed treatment cannot be really considered as a constraint. This approach could rather be considered as a form of bias correction.
Under the Gaussian assumption, the parameters (m 0 , s 0 ) of Eq. (8) have to be estimated. Given estimates of g 1 and X all , m 0 can be naturally estimated bŷ where the overbar denotes an average, y o t are the available observations of Y, and y o t , m 1 and X all are estimated from models. Then, s 0 can be estimated bŷ where sd is the sample standard deviation, denoting again that every term in the right-hand side is known, and that [y o t 2 m 1 X all (t) 2m 0 ] ; Nf0, s 0 [1 1 s 1 X all (t)]g. Note that these estimators (m 0 ,ŝ 0 ) do not necessarily coincide with the maximum likelihood estimators but are, however, quite natural and attractive for computational reasons. The uncertainty of these parameters can be assessed by extending the bootstrap procedure to y o t (i.e., resampling observations y o t , as would be done in a stationary context), and considering simultaneously uncertainty in m 1 , s 1 , and X all , as derived from the multimodel synthesis.
Our procedure is illustrated in Fig. 7. As the CanESM2 model simulates a larger change than CNRM-CM5, the 2003 event is relatively less abnormal according to that model, resulting in a larger estimate of p F (changes in variance are small and do not substantially influence the results in this case). Model discrepancies in estimates of p F are therefore largely related to the spread in the nonstationary term m 1 X all .
Applying this procedure to all single models and to the multimodel synthesis leads to reduced uncertainty in the estimate of p F (Fig. 8). Estimates of p C are similarly improved. However, the impact on attribution diagnostics, that is, PR and DI, is quite limited. The proposed procedure refines the estimation of the Y distribution, but does not affect the estimation of human influence, and so coefficients measuring that influence are only indirectly impacted.

c. Applying the two constraints together
The two constraints presented above can also be applied simultaneously. If so, observations of X are used to constrain X all first, then parameters g 0 are estimated using observations of Y, given (X all , g 1 ). Therefore, observed information is used in both X and g 0 , in addition to model information. Note, however, that we neglect potential dependencies between observations of X and Y. Results obtained in this way can be considered as the final results of our procedure described, as they combine all sources of information (Figs. 8 and 9 for the multimodel synthesis only).
Applying the two constraints simultaneously leads to a real narrowing of uncertainties in estimates of probabilities p F (t e ) and p C (t e ) (where g 0 plays a critical role), and in the derived attribution diagnostics PR and DI (where X ant is critical; Fig. 8). Uncertainty in PR shrinks from [1.4, 260] (multimodel synthesis, no use of observations) to [3.8, 90] (applying the two constraints).
Uncertainty in DI is also reduced, from [10.18, 12.38C] to [10.58, 11.58C] (i.e., roughly by a factor of 2). In all cases considered, applying the two constraints together reduces model spread further than using one single constraint or no observations at all.
Remarkably, time series of PR and DI can still be derived after applying these constraints (Fig. 9), suggesting a few comments. First, human influence on the likelihood of an event like EWH03 has been significant since the mid 1980s (Figs. 9a,c). Second, the odds of observing an event such as EHW03 (in the sense of the same magnitude) have strongly increased since 2003;  (Fig. 9c). Last, a very large fraction of this human-induced warming is expected to take place after 2003: 16.88C [14, 19.8].
Overall, these results suggest that our approach, in addition to covering a wide range of uncertainties through the use of a large ensemble of models, can lead to relatively constrained attribution results. They also suggest that, in the particular case under consideration, the unconstrained parameters g 1 do not vary widely among models.

Discussion
In this section we review several aspects of our proposed method that deserve particular attention.
a. Choice of the covariate X Selecting an appropriate covariate X is a key step in our method. The choice of this covariate is at least partly subjective and can impact the final results. Using a global or regional temperature may be appropriate, as changes in many variables have been described to scale with temperature (e.g., Collins et al. 2013;Tebaldi and Arblaster 2014). Pattern scaling, however, works better if GHGs are the only source of external forcing. In practice, other forcings also contributed to recent changes-in particular, anthropogenic aerosols. As the strength of the aerosols forcing varies considerably over space, using a regional temperature as a covariate might better reflect the regional balance between various external forcings. In any case, relying on a covariate X is a strong assumption of our method, which must be properly acknowledged.

b. Choice of the Y distribution
In the present application, our method is only illustrated with Gaussian distributions. This is a clear limitation to the field of application of that method, as many climate variables do not follow a Gaussian distribution. Then, as the EHW03 event is very rare, a Gaussian assumption might not well represent the upper tail of the distribution of temperature, potentially leading to errors in the estimation of probabilities (particularly in the counterfactual world). Extending our procedure to other distributions, in particular, GEVs or nonparametric distributions (as suggested in section 3b) will open a broader scope of application.

c. Limitations in using nonstationary statistics
The use of nonstationary statistics is central in this approach, and some limitations must be pointed out. First, a sufficiently large SNR is needed in model data in order to allow fitting of the nonstationary model. The entire procedure can fail if nonstationary coefficients cannot be estimated properly [see, e.g., Li et al. (2019) for a discussion on this limitation]. In this respect, the temperature event considered in this study was an easy one. The method will have to be tested on other events/ variables (e.g., precipitation, wind), to determine the extent of its field of application. Second, further statistical developments might improve the fit of the statistical model. In the current analysis, Y data were limited to a specific space-time domain-we ignore any information available outside this domain. Using further spatial (e.g., a broader region than that of the event; Christidis et al. 2015b) or temporal (e.g., modeling the entire seasonal cycle; Rigal et al. 2019) information might be particularly attractive, but it would involve a sharp increase in the complexity of the statistical model and inference.

d. Climate model evaluation and reliability
Many previous attribution studies have devoted much attention to the evaluation of climate models, in particular, their ability to simulate extreme events consistent with observations, and whether the models include the appropriate physical processes and forcings. This model reliability issue has sometimes been tackled through implementing model selection (e.g., King et al. 2015). Recent studies have also suggested that model biases do impact event attribution results (Bellprat and Doblas-Reyes 2016;Bellprat et al. 2019).
In this study, we do not perform any model evaluation or selection, even though using CMIP models (of relatively low resolution) somewhat strengthens this reliability concern. Instead, we use bias-correction techniques (through the C0 observational constraint, and through defining the event as an anomaly with respect to a given reference period) to correct some of the model biases. Beyond that, more sophisticated bias-correction techniques could be applied prior to the statistical analysis-potentially contributing to reconcile event attribution with the use of coarse-resolution CMIP-style models. Such a treatment, however, is only acceptable if model-simulated changes are robust and little affected by present-day climate biases.
Broader application of our technique (e.g., to other spatiotemporal scales or other variables) might require case-by-case evaluation of the models' reliability, on the basis of available literature. In various cases, model limitations can prevent reliable attribution, for example, if models are just not able to simulate the type of event considered (e.g., tropical cyclones, weather events related to a specific topography), or if some forcings are missing (e.g., irrigation, changes in surface roughness). Two adaptations of our method could be proposed to address at least some of these issues. First, our method could be easily applied to transient experiments performed with high-resolution models, such as Coordinated Regional Climate Downscaling Experiment (CORDEX) models-following a general recommendation to use high-resolution models (National Academies of Sciences, Engineering, and Medicine 2016, and references therein). Note, however, that limiting the analysis to a small number of models can have undesirable effects on the uncertainty analysis. Second, inconsistencies between models and observations might be identified by additional observational constraints, as discussed below.

e. Uncertainty quantification and modeling uncertainty
One key outcome of our analysis is that modeling uncertainty is critical in event attribution. Uncertainty ranges vary greatly in size if derived using one model only versus a multimodel ensemble, with ranges far too narrow in the former case. The technique proposed to build the multimodel synthesis could be improved in many ways, for example, by using a link function for some parameters in u, a non-Gaussian dependence structure, or another paradigm than the model truth exchangeability. Last, the uncertainty ranges derived from the multimodel synthesis are relatively large, but it still might be useful to check that observations are consistent with these-no such check was implemented here.

f. Additional observational constraints
Section 5 explored two possible ways to use observations to constrain attribution results, in which observations of Y are only used to estimate the stationary parameters g 0 . A natural extension would be to also constrain g 1 , that is, the magnitude of changes in Y. This could be done using a Bayesian approach, in which the multimodel uncertainty on g 1 is used as a prior distribution.

g. Consistency with other approaches
Assessing the consistency of our results with previous studies that also focused on the EWH03 event is not easy, primarily because variations in the event definition can contribute to discrepancies in the results (Cattiaux and Ribes 2018). Roughly speaking, our results in terms of PR lie somewhere between Stott et al. (2004) and Christidis et al. (2015a). Our estimate of DI seems consistent with the figures in Christidis et al. (2015a). Providing closer comparisons between our approach and other event attribution methods will be of primary interest in the future. From another perspective, our approach provides an adaptation of projections uncertainty plumes to a specific event, sharing some common features with the treatment used in IPCC AR5 (e.g., use of CMIP models, treatment of uncertainties).

Conclusions
This study describes and illustrates a new statistical method for event attribution that can be decomposed into three steps. First, event attribution diagnostics are derived from transient CMIP-style experiments using nonstationary statistics and an appropriate covariate. Single-model results derived from this step typically exhibit large discrepancies. Second, a multimodel synthesis is performed, assuming model-truth exchangeability. Evidence suggests that this synthesis might represent uncertainty better than single-model analyses. Third, multimodel information is combined with historical observations in order to account for all sources of information available. This blending typically reduces uncertainty in the final attribution diagnostics, while providing a more comprehensive description of the event and human influence on it.
This study illustrates that it is possible to perform event attribution using available transient simulations. This is an important result because the use of such experiments offers several advantages. In particular, it offers the possibility of characterizing the human influence on a singular event in a way that is consistent with long-term projections, that is, using the same data and a similar quantification of uncertainty. The calculation of uncertainty plumes covering both the past and future also provides a new perspective on the human influence on a singular event. And, obviously, reusing available simulations could save much computation time and effort.
Overall, this method could facilitate communication about the human influence on a particular event, as the diagnostics it provides are, by construction, consistent with other long-term indicators of climate change. The method is also promising in that it allows a rapid analysis of events, as all input data are already available. Testing this approach on a broad range of event types and scales will be necessary before any systematic application.
Data availability statement: All data used in this study are publicly available. 1) HadCRUT4 observations are available at https://crudata.uea.ac.uk/cru/ data/temperature/. 2) Outputs of CMIP5 simulations are available at https://esgf-node.llnl.gov/projects/cmip5/. Parameters of the EBMs were taken from Geoffroy et al. (2013). Time series of natural forcings used to compute EBMs response were taken from IPCC AR5, Annex II (IPCC 2013).