1. Introduction
Civil engineers, flood planners, and other risk managers have to take into account the nonstationarity of our climate system when computing return levels1 to assess, for example, the quality of existing dams or heights of new dikes and other structures. It is natural to wonder how a 100-yr return level for a dike built 50 years ago, under the assumption of stationarity, could be different from today’s value. As emphasized by Rootzén and Katz (2013), a delicate issue is the interpretation of return periods and return levels within a nonstationary world. A 100-yr return level computed 50 years ago, today, or in 50 years may differ and become a relative quantity. Another challenge is to discriminate between a set of potential causes in order to explain detected changes (see, e.g., Zwiers et al. 2013; Hegerl and Zwiers 2011; IPCC 2013).
To deal with relative probability changes and causality questions, researchers in the event attribution community (see, e.g., Uhe et al. 2016) frequently work with the risk ratio RR = p1/p0 of two probabilities of the same event, p0 and p1, under two different situations: a “world that might have been” (a counterfactual world without anthropogenic forcings) and the “world that is” (a factual world). The archetypal example of such an event is that the random variable of interest, say the annual maximum temperature, exceeds a high threshold. The so-called fraction of attributable risk FAR = 1 − 1/RR has also been used (see, e.g., Stott et al. 2004, 2016), and it can be viewed as the probability of necessary causation within the mathematical paradigm of causality theory (see Hannart et al. 2016).
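As a purely illustrative numerical example (the probabilities below are hypothetical and not taken from any attribution study), the following minimal Python sketch evaluates RR and FAR for an event that is four times more likely in the factual world:

```python
# Hypothetical probabilities of the same event in the two worlds (illustration only).
p0 = 0.01  # counterfactual world ("world that might have been")
p1 = 0.04  # factual world ("world that is")

rr = p1 / p0            # risk ratio RR = p1/p0
far = 1.0 - 1.0 / rr    # fraction of attributable risk FAR = 1 - 1/RR

print(f"RR  = {rr:.2f}")   # 4.00
print(f"FAR = {far:.2f}")  # 0.75
```

Under Pearl's reading recalled above, the FAR of 0.75 would then be interpreted as a probability of necessary causation for this hypothetical event.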
Instead of focusing on return period changes, a frequent approach in event attribution studies, our goal in this paper is to determine how record event frequencies differ between the factual and counterfactual worlds. Many reasons can be invoked to motivate this change of viewpoint regarding the event of interest. In practice, records often have large economic and human impacts. In terms of media communication, whenever a temperature or rainfall record is broken, the news anchor typically asks whether such a record is due to climate change; it is important to have the appropriate probability tool to address this question precisely. Conceptually, record events (see, e.g., King 2017; Meehl et al. 2009; Resnick 1987) are different from maxima and excesses above a high threshold. Reformulating extreme event attribution in terms of records rather than threshold excesses simplifies the statistical issues because only one probability has to be estimated instead of two. Under some additional assumptions based on extreme value theory (EVT; see, e.g., Embrechts et al. 1997; Coles 2001), we will show that the computation of the FAR for records can be simplified dramatically and leads to simple formulas when evaluated through Pearl’s causality theory (see Hannart et al. 2016). The results yield interesting conclusions when applied to two datasets of annual maxima of daily temperature maxima: observations recorded in Paris and outputs from the Météo-France CNRM-CM5 climate model. Still, our objective is not to replace the classical FAR and RR, but to offer a complementary instrument for climatologists, hydrologists, and risk managers. Although strongly connected to the conventional quantities, our new FAR and RR for records differ from them in interpretation.
a. Main assumptions and notations
The probabilistic framework in event attribution studies often starts with the simple and reasonable hypothesis that two independent samples are available, say
As relative probability ratios of the type p1/p0 or (p1 − p0)/p1 are often the main objects of interest in event attribution, we will always assume that such ratios are well defined and finite. Such a hypothesis is classical within the event attribution community and simplifies mathematical expressions and interpretations. Assuming a finite FAR implies that the right-hand end point of
b. Records and return periods


c. Interpreting records in an event attribution context
Given any return period r, different types of records can be defined by comparing data from our different worlds. For example, the event

To make the link with past event attribution studies, we use the notation p0,r and p1,r; they differ in their interpretation and inference properties from the classical survival functions
Given the return period r, we have to find the solution of the system defined by p0,r and p1,r. By construction, p0,r is always equal to 1/r, and consequently, there is only one quantity to estimate: p1,r. The consequence of fixing p0,r = 1/r is particularly relevant when dealing with intercomparison studies like CMIP5 or the upcoming CMIP6 (see, e.g., Kharin et al. 2013; Taylor et al. 2012). To compare the values of p1 from different climate models, it would be best to fix p0 to the same value for all climate models. Otherwise, the interpretation of the ratio p1/p0 will only be relative, an important shortcoming for assessments like CMIP5. For example, the ratio
The remaining task is to estimate p1,r. To carry this out, the rest of the paper is organized as follows. Section 2 presents our statistical model and its corresponding inferential strategy. Examples are given in section 3; in particular, the analysis of yearly temperature maxima from a climate model illustrates the applicability of our approach. Section 4 concludes by highlighting potential improvements and further directions.
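Before turning to the model, a brief numerical aside illustrates why fixing p0,r = 1/r makes p1,r directly comparable across models. The values below are hypothetical, and we assume the record-based analogs of RR and FAR are defined in line with the introduction, that is, rr(r) = p1,r/p0,r and far(r) = 1 − p0,r/p1,r:

```python
# Hypothetical example: two climate models compared on the same record event.
r = 100
p0_r = 1.0 / r                                  # identical for every model, by construction
p1_r = {"model_A": 0.020, "model_B": 0.035}     # illustrative estimates only

for name, p1 in p1_r.items():
    rr_r = p1 / p0_r                            # record-based risk ratio (assumed definition)
    far_r = 1.0 - p0_r / p1                     # record-based FAR (assumed definition)
    print(f"{name}: rr(r) = {rr_r:.1f}, far(r) = {far_r:.2f}")

# Because p_{0,r} is the same for both models, the p_{1,r} values (and hence rr and far)
# can be compared directly across models.
```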
2. Statistical model for p1,r



Table 1 provides three examples from extreme value theory, a well-developed probability theory for distributions of maxima and related upper-tail behaviors (see, e.g., Embrechts et al. 1997; Coles 2001). From this table, one can see that a shifted Gumbel random variable, that is, Z = X + μ, produces a random variable W = −logG(Z) that is exponentially distributed. The same result can be obtained from an inflated Fréchet, that is, Z = σX with σ > 0, or from a Weibull–EVT type,6 that is, Z = μ + σX. This hints that block maxima, where a block is a season or a year, can produce factual and counterfactual records compatible with an exponentially distributed W. In practice, we do not have to fit a generalized extreme value distribution (GEV) to the factual and counterfactual data; we only need to assume that W is exponentially distributed.
Three examples for X and Z (see, e.g., Fig. 1). Note that X and Z have to have the same lower and upper end points. For the Weibull type, this imposes x < 1/(−ξ) with σ > 0 and μ = (1 − σ)/(−ξ).
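To make the Gumbel row of Table 1 concrete, here is a minimal simulation sketch (ours, not part of the original analysis): X is standard Gumbel with CDF G(x) = exp(−e^{−x}), Z = X + μ is a shifted copy standing in for the factual world, and W = −log G(Z) is checked against an exponential model; the shift μ = 1 is an arbitrary illustrative value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu = 1.0          # hypothetical shift between the counterfactual and factual worlds
n = 100_000

# Counterfactual CDF G is standard Gumbel; the factual variable is Z = X + mu.
z = stats.gumbel_r.rvs(loc=mu, size=n, random_state=rng)
w = -np.log(stats.gumbel_r.cdf(z, loc=0.0))          # W = -log G(Z)

# Under the Gumbel-shift example of Table 1, W should be exponentially distributed.
w_mean = w.mean()                                    # fitted exponential scale
ks = stats.kstest(w, "expon", args=(0.0, w_mean))    # scale estimated from the data,
                                                     # so the p value is only indicative
print(f"mean(W) = {w_mean:.3f}  (exp(-mu) = {np.exp(-mu):.3f})")
print(f"KS p value = {ks.pvalue:.2f}")               # no evidence against exponentiality
```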
a. Testing the null hypothesis, H0: W ~ exp(θ)





b. Inference of θ under H0: W ~ exp(θ)

The constant θ, which plays an essential role, can be interpreted in different ways. Since





c. Inference of far(r) and the risk ratio under H0: W ~ exp(θ)


Explicit FAR(u) and far(r) for the three examples in Table 1. The “limit” row indicates the asymptotic value of far(r) as r ↑ ∞ and, equivalently, the FAR(u) asymptote as u moves toward the upper support bound. See also Fig. 1.
Comparison of far(r) (dotted lines) defined by Eq. (6) with FAR(xr) (solid lines). The horizontal dashed lines correspond to the limits shown in Table 2.
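The behavior shown in Fig. 1 can be reproduced by brute-force Monte Carlo without the closed-form expressions of Table 2. The sketch below again uses the hypothetical Gumbel-shift example and assumes the definitions implicit above: FAR(u) = 1 − p0(u)/p1(u) with p0(u) = P(X > u) and p1(u) = P(Z > u), xr the counterfactual level exceeded with probability 1/r, and p1,r taken as the probability that a factual value exceeds the maximum of r − 1 counterfactual values (with p0,r = 1/r).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu = 1.0            # hypothetical Gumbel shift between the two worlds
n_sim = 100_000     # Monte Carlo sample size

def far_threshold(r):
    """FAR(x_r), with x_r the counterfactual return level of period r."""
    x_r = stats.gumbel_r.ppf(1.0 - 1.0 / r, loc=0.0)  # P(X > x_r) = 1/r
    p1 = stats.gumbel_r.sf(x_r, loc=mu)               # P(Z > x_r)
    return 1.0 - (1.0 / r) / p1

def far_record(r):
    """Monte Carlo far(r), assuming p_{1,r} = P(Z > max of r-1 counterfactual years)."""
    x = stats.gumbel_r.rvs(loc=0.0, size=(n_sim, r - 1), random_state=rng)
    z = stats.gumbel_r.rvs(loc=mu, size=n_sim, random_state=rng)
    p1_r = np.mean(z > x.max(axis=1))
    return 1.0 - (1.0 / r) / p1_r

for r in (2, 10, 50):
    print(f"r = {r:3d}: FAR(x_r) = {far_threshold(r):.3f}, far(r) = {far_record(r):.3f}")
# Both quantities approach the same limit, here 1 - exp(-mu) ~ 0.63, as r grows.
```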


3. Examples
a. Annual temperature maxima in Paris
To illustrate how our approach can be applied to climatological time series, we study one of the longest records of annual maxima of daily temperature maxima in Paris (Montsouris weather station). To represent the factual and counterfactual worlds, this Paris time series is divided into two parts: 1900–30 and 1990–2015. Although the influence of anthropogenic forcing may not be completely absent in the first part of the record, Fig. 2 still shows that the hypothesis of exponentially distributed Wi seems reasonable: neither the histogram nor the q-q plot contradicts the exponential hypothesis. The top-right panel compares the fitted exponential density (a decreasing convex dotted line) with the observed histogram, which is also decreasing and convex. The lower-left panel plots the ranked observations against those expected under the fitted model; their proximity to the diagonal (the solid line) indicates a good fit.
Analysis of annual maxima of daily temperature maxima recorded at the weather station of Montsouris in Paris. To compute far(r), the period 1900–30 plays the role of the counterfactual world, and the recent climatology 1990–2015 represents the factual world. The 90% confidence intervals are based on Eq. (5).
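For readers who want to reproduce the two graphical diagnostics just described, the following sketch draws a histogram with the fitted exponential density and an exponential q-q plot for a generic sample w; how w is obtained from the two Paris subperiods follows the definition of W in section 2, and the synthetic data below are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def exponential_diagnostics(w):
    """Histogram with fitted exponential density, and an exponential q-q plot."""
    w = np.sort(np.asarray(w))
    n = w.size
    scale = w.mean()                          # fitted exponential scale (mean of w)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

    # Histogram of w with the fitted exponential density (dotted line).
    grid = np.linspace(0.0, w.max(), 200)
    ax1.hist(w, bins="auto", density=True)
    ax1.plot(grid, np.exp(-grid / scale) / scale, ":")
    ax1.set(xlabel="w", ylabel="density")

    # Q-Q plot: ranked observations vs quantiles expected under the fitted model.
    probs = (np.arange(1, n + 1) - 0.5) / n
    expected = -scale * np.log(1.0 - probs)
    ax2.plot(expected, w, "o")
    ax2.plot(expected, expected, "-")         # diagonal: perfect agreement
    ax2.set(xlabel="expected quantiles", ylabel="ranked observations")
    return fig

# Illustrative data only: a synthetic exponential sample standing in for the Paris w values.
rng = np.random.default_rng(2)
exponential_diagnostics(rng.exponential(scale=0.5, size=30))
plt.show()
```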
To complement these visual diagnostics, our exponential fit is compared, via the Akaike information criterion (AIC) and Bayesian information criterion (BIC), to a Weibull fit, a natural but more complex alternative. The lower scores for the exponential pdf (AIC = 11.83 and BIC = 13.05 vs AIC = 13.2 and BIC = 15.7) confirm our preference for the simpler model. Concerning the test of Cox and Oakes (1984), the inferred p value is 0.51, indicating that there is no reason to reject the exponential model for our Parisian data. The upper-left panel of Fig. 2 displays far(r) with its associated 90% confidence interval. Even from a single location like Paris, one can deduce that far(r) is not equal to zero, but is rather around 0.5.
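The model comparison reported above can be sketched as follows for a generic sample w: maximum likelihood fits of an exponential and a Weibull model, both with the location fixed at zero, and their AIC/BIC. The numbers quoted in the text come from the Paris data and are not reproduced by this illustrative example.

```python
import numpy as np
from scipy import stats

def compare_exponential_weibull(w):
    """Return (AIC, BIC) pairs for exponential and Weibull fits of a positive sample w."""
    w = np.asarray(w)
    n = w.size

    # Exponential fit: one free parameter (the scale), whose MLE is the sample mean.
    ll_exp = stats.expon.logpdf(w, scale=w.mean()).sum()
    aic_exp, bic_exp = 2 * 1 - 2 * ll_exp, np.log(n) * 1 - 2 * ll_exp

    # Weibull fit: two free parameters (shape k and scale), location fixed at zero.
    k, _, scale = stats.weibull_min.fit(w, floc=0.0)
    ll_wb = stats.weibull_min.logpdf(w, k, scale=scale).sum()
    aic_wb, bic_wb = 2 * 2 - 2 * ll_wb, np.log(n) * 2 - 2 * ll_wb

    return (aic_exp, bic_exp), (aic_wb, bic_wb)

# Synthetic exponential data: the lower AIC/BIC should typically favor the simpler model.
rng = np.random.default_rng(3)
print(compare_exponential_weibull(rng.exponential(scale=0.5, size=30)))
```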
b. Analysis of yearly temperature maxima from CNRM-CM5
To illustrate our method with numerical climate model outputs, we study yearly maxima of daily maxima of near-surface air temperature generated by the Météo-France CNRM-CM5 climate model. We focus on a single climate simulation with “all forcings”9 for the period 1975–2005. This should mimic the current climatology, that is, the factual world, at least according to this model. During this 30-yr climatological period, we assume that the climate system is approximately stationary with respect to annual maxima. Concerning the “world that might have been” without anthropogenic forcings, we consider a second run from the same numerical model, but with only natural forcings. For this counterfactual world, we assume stationarity over the full length of the simulation, that is, for the period 1850–2012.
Our first task is to check whether our null hypothesis on W is rejected. The white areas in Fig. 3 correspond to the locations where the exponential hypothesis has been rejected by the Cox and Oakes test at the 5% level. For the other points, the estimated value of θ inferred from Eq. (5) ranges from 0.2 to 1.2. A θ near one (orange) indicates identical factual and counterfactual worlds. A θ close to zero (blue) implies that the factual and counterfactual worlds differ strongly, with an increase in temperature records. Note that the estimated θ can be larger than one, meaning that
Comparing yearly temperature maxima from two runs of the Météo-France CNRM-CM5 climate model. The factual run has all forcings (1975–2005), and the counterfactual has only natural forcings (1850–2012). This figure displays
Figure 4 displays the largest values of p1,r − p0,r over all values of r ≥ 2. In causality theory, this difference is called the probability of sufficient and necessary causation, and under H0, it is equal to
As in Fig. 3, but for probabilities of sufficient and necessary causation at locations such that
Interpreting the two maps jointly highlights the African continent, Brazil, and western Australia, which appear to be regions with low θ and large probabilities of causation, that is, with a strong contrast between the all-forcings and natural-forcings-only runs. Northern Brazil, most of Africa, and western Australia show the strongest evidence of such causation. Variability in surface air temperature is low in tropical regions and over many ocean areas, and it is strongly reduced in areas of sea ice loss, so we expect large relative differences between the counterfactual and factual worlds in these cases. Elsewhere, simulated variability is larger relative to the simulated change in mean climate; as a consequence, both PN and PS remain locally small in those regions. This includes the Mediterranean basin, which is sometimes referred to as a “hot spot.” Note that, given the period of time considered for the “factual world,” non-GHG forcings like anthropogenic aerosols can contribute to this result (offsetting part of the GHG-induced warming).
4. Discussions and future work
a. Event definition
Our work is closely related to the fundamental and sometimes overlooked question of how to define the event itself. Ideally, the event of interest should be easily interpretable, its probability should be simple to compute in both worlds, and it should belong to an invariant reference framework in order to compare different studies. These desiderata were fulfilled by slightly moving away from the idea of defining the event as an excess over a fixed threshold and instead setting the event as a record.
A potential worry could have been that treating record events, instead of excesses above a threshold u, would move us drastically away from FAR(u). Figure 1 showed that this concern is unfounded. In other words, the practitioner interested in assessing the likelihood of extreme events relative to a counterfactual scenario can use either far(r) or FAR(xr) for large r or, equivalently, either rr(r) or RR(xr).
b. Null hypothesis
Statistically, inferences of the probabilities of interest are simple, fast, and efficient under H0: W ~ exp(θ). In particular, such an assumption is satisfied for simple GEV setups (see Table 1). Theoretically, it is rooted in the variable W = −logG(Z). This transformed variable shows that the interest lies neither in Z alone nor in X alone (through the CDF G) but in how they combine. In other words, our main message is that it is counterproductive to study the two samples X and Z minutely and independently. As the FAR and risk ratios are relative quantities, working with the relative statistic W greatly simplifies their inference.
Concerning the applications, our analysis of a long temperature record in Paris and of a climate model indicates that the exponential assumption for W appears to be reasonable for temperature maxima over land areas. Still, a comprehensive study with a large number of both weather stations and climate models is needed to confirm this hypothesis. Future work should also focus on precipitation maxima. In this situation, we conjecture that the Weibull MGF, with its extra parameter k, could be a better fit than the exponential MGF because extreme precipitation has a stronger variability. The Weibull parameter k should capture the changes between the factual and counterfactual tails. Another option when H0 is rejected is to estimate nonparametrically p1,r [see Eq. (A1)], which provides the necessary information to compute confidence intervals in this case. The main advantage of this approach is that goodness-of-fit tests are not required, but its main drawback is that the sample size n should always be much greater than the return period of interest so
To summarize our findings about H0, it is important to emphasize that this hypothesis allows us to work with small sample sizes; see our Paris example with n = 30. This is a major difference from nonparametric methods, which need thousands of factual and counterfactual simulations. Hence, our methodology is fairly general. One possible application could be conditional attribution, in which the event itself is linked to another covariate, such as the atmospheric circulation. Further work on far(r) would be to estimate it conditionally on atmospheric circulation properties, in order to separate the roles of changes in circulation from those of temperature increases (see, e.g., Yiou et al. 2017).
Finally, our goal in this paper is not to discard the classical FAR(u) and RR(u) in favor of far(r) and rr(r). The two have different interpretations, and they complement each other. Here, the new p0,r and p1,r should be viewed as an additional risk management tool for practitioners interested in causality, especially those who base their decision-making process on records. Conceptually, our approach has the advantage of providing a fixed reference framework, a necessity for intercomparisons, which could thus help future exercises like CMIP6.
APPENDIX
Mathematical Computation
a. Proof of Eq. (2)

b. Proof of Eq. (4)
In this section, we prove a more general result than Eq. (4); the latter can be deduced by taking r = 2 in the following proposition.

















c. Computation of p1,r in the GEV case










For the special case when ξ0 = ξ1, this simplifies to p1,r = 1/(1 + cr).
Acknowledgments
This work is supported by ERC Grant 338965-A2C2 and P. Naveau’s work was partially financed by the EUPHEME project (ERA4CS ERA-NET).
REFERENCES
Ascher, S., 1990: A survey of tests for exponentiality. Commun. Stat. Theory Methods, 19, 1811–1825, https://doi.org/10.1080/03610929008830292.
Brockwell, P. J., and R. A. Davis, 1991: Time Series: Theory and Methods, 2nd ed. Springer Series in Statistics, Springer, 577 pp.
Coles, S. G., 2001: An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics, Springer, 208 pp.
Cox, D. R., and D. Oakes, 1984: Analysis of Survival Data, Chapman & Hall/CRC Monogr. on Statistics and Applied Probability, No. 21, Chapman and Hall, 212 pp.
Embrechts, P., C. Klüppelberg, and T. Mikosch, 1997: Modelling Extremal Events for Insurance and Finance. Stochastic Modelling and Applied Probability Series, Vol. 33, Springer, 648 pp.
Hannart, A., J. Pearl, F. E. L. Otto, P. Naveau, and M. Ghil, 2016: Causal counterfactual theory for the attribution of weather and climate-related events. Bull. Amer. Meteor. Soc., 97, 99–110, https://doi.org/10.1175/BAMS-D-14-00034.1.
Hegerl, G., and F. Zwiers, 2011: Use of models in detection and attribution of climate change. Wiley Interdiscip. Rev.: Climate Change, 2, 570–591, https://doi.org/10.1002/wcc.121.
Henze, N., and S. G. Meintanis, 2005: Recent and classical tests for exponentiality: A partial review with comparisons. Metrika, 61, 29–45, https://doi.org/10.1007/s001840400322.
IPCC, 2013: Climate Change 2013: The Physical Science Basis. Cambridge University Press, 1535 pp., https://doi.org/10.1017/CBO9781107415324.
Kharin, V. V., F. W. Zwiers, X. Zhang, and M. Wehner, 2013: Changes in temperature and precipitation extremes in the CMIP5 ensemble. Climatic Change, 119, 345–357, https://doi.org/10.1007/s10584-013-0705-8.
King, A. D., 2017: Attributing changing rates of temperature record breaking to anthropogenic influences. Earth’s Future, 5, 1156–1168, https://doi.org/10.1002/2017EF000611.
Mason, D. M., and W. R. Van Zwet, 1987: A refinement of the KMT inequality for the uniform empirical process. Ann. Probab., 15, 871–884, https://doi.org/10.1214/aop/1176992070.
Meehl, G. A., C. Tebaldi, G. Walton, D. Easterling, and L. McDaniel, 2009: Relative increase of record high maximum temperatures compared to record low minimum temperatures in the U.S. Geophys. Res. Lett., 36, L23701, https://doi.org/10.1029/2009GL040736.
Resnick, S., 1987: Extreme Values, Regular Variation, and Point Processes. Springer Series in Operations Research and Financial Engineering, Springer, 320 pp.
Rootzén, H., and R. W. Katz, 2013: Design Life Level: Quantifying risk in a changing climate. Water Resour. Res., 49, 5964–5972, https://doi.org/10.1002/wrcr.20425.
Stott, P. A., D. A. Stone, and M. R. Allen, 2004: Human contribution to the European heatwave of 2003. Nature, 432, 610–614, https://doi.org/10.1038/nature03089.
Stott, P. A., and Coauthors, 2016: Attribution of extreme weather and climate-related events. Wiley Interdiscip. Rev.: Climate Change, 7, 23–41, https://doi.org/10.1002/wcc.380.
Taylor, K., R. Stouffer, and G. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, https://doi.org/10.1175/BAMS-D-11-00094.1.
Uhe, P., F. E. L. Otto, K. Haustein, G. J. van Oldenborgh, A. D. King, D. C. H. Wallom, M. R. Allen, and H. Cullen, 2016: Comparison of methods: Attributing the 2014 record European temperatures to human influences. Geophys. Res. Lett., 43, 8685–8693, https://doi.org/10.1002/2016GL069568.
Yiou, P., A. Jézéquel, P. Naveau, F. E. L. Otto, R. Vautard, and M. Vrac, 2017: A statistical framework for conditional extreme event attribution. Adv. Stat. Climate Meteor. Oceanogr., 3, 17–31, https://doi.org/10.5194/ascmo-3-17-2017.
Zwiers, F., and Coauthors, 2013: Climate extremes: Challenges in estimating and understanding recent changes in the frequency and intensity of extreme climate and weather events. Climate Science for Serving Society, G. R. Asrar and J. W. Hurrell, Eds., Springer, 339–389.
Given a fixed number of years r, the return level, say xr, is the value that should be crossed on average once during the period r under the assumption of stationarity. The number r is called the return period, and, hence, it is associated with an event of probability 1/r. The return level xr associated with period r can also be defined as the value that is exceeded with probability 1/r in any given year.
All CDFs in this article are assumed to be continuous.
For ease of interpretation, the temporal increment will always be a year in this work, but this is arbitrary and other increments could be used instead.
The IID assumption appears reasonable for yearly maxima in the counterfactual world, the topic of interest in our examples.
With the constraint that X and Z share the same support, that is, x < 1/(−ξ) with σ > 0 and μ = (1 − σ)/(−ξ).
As
That is, the model is driven by specifying the history of greenhouse gas concentrations, aerosols, land use, volcanic aerosol, and solar output over the period of simulation from preindustrial times up to 2005.