## Abstract

In attempts to attribute a single event—such as an observed heat wave or flooding—to climate change, the probability distributions of the quantity under consideration for current and preindustrial conditions are compared to determine a possible changed occurrence rate. These distributions are typically calculated from large ensembles produced by climate models and require large computational resources. In this study, a simple alternative surrogate method together with analytical considerations will be used as a test bed to inform about methodological issues connected to the selection problem and deviations from Gaussianity that should be considered before comprehensive climate models are invoked.

The author will mainly study the influence of the selection problem, which in this context means that when an event has been observed it is not obvious how the probability distributions should be defined. Should similar events be looked for in the immediate neighborhood of the observation or in an extended area? It is shown that this choice will have serious consequences for the distributions of the events and the attribution to climate change.

The author also demonstrates that deviations from Gaussianity can have a large influence on the conclusions and that it is important that the ensembles adequately represent the features that contribute to the extreme events under consideration. In particular, it is shown that the fractional attributable risk has very different behavior for heavy-tailed distributions than for Gaussian distributions. In the example considered with the surrogate method—European heat waves—important features also include the seasonal variation in skewness.

## 1. Introduction

There is a general understanding that the potentially most adverse consequences of climate change are related to increased occurrence of extreme events such as floodings and heat waves. The number of such extremes have often been reported to increase (Frich et al. 2002; Alexander et al. 2006; Meehl et al. 2009; Coumou and Rahmstorf 2012; Peterson et al. 2012; Fischer and Knutti 2015). But because extremes by their very nature are scarce, and because events in nearby in space and time are not necessarily independent, it is not straightforward to determine whether changes in the frequencies of such events are a consequence of the internal variability in the climate system alone or whether they can be attributed to anthropogenic climate change (Frei and Schär 2001; Stott et al. 2004). However, such challenges in attribution can be met by suitable statistical methods (Benestad 2003; Alexander et al. 2006; Kunkel et al. 2007; Meehl et al. 2009; Wergen and Krug 2010; Christiansen 2013) taking the reduced number of degrees of freedom into account. Such efforts have led to a growing evidence for changes in the frequency of occurrence of extreme events.

In parallel with the progress in the general detection and attribution problems—demonstrating that climate has changed in some defined statistical sense and establishing the most likely causes for the detected change—there has been an increasing interest in the attribution of single extreme events. Such event attribution aims at giving quantitative estimations of the role of anthropogenic climate change for the occurrence of a single observed event. This interest is partly driven by the fact that both decision makers and the public often, after an extreme event has occurred, naturally ask how much anthropogenic climate change has contributed to the observed event and how often similar events will happen in the future. There is also an emerging interest in legal issues connected to loss and damage caused by extreme events and their possible connection to anthropogenic climate change (Allen et al. 2007; James et al. 2014). But while the more general attribution problem discussed in the first paragraph is obviously well defined although challenging, the situation is less clear for the attribution of single extreme events (e.g., Allen 2003). Climate and climate change refer to statistical quantities such as mean and variance of spatiotemporal fields (e.g., temperature and precipitation). Single events are connected to weather—the faster fluctuations on top of the climate. While climate is a boundary condition problem (the boundaries being, e.g., the atmospheric CO_{2} content and the solar constant) weather depends mainly on the initial conditions. So, how can a single event be attributed to changes in the boundary conditions?

Such an attribution must necessarily be probabilistic in nature. Assume we have observed, say, an extreme temperature or precipitation event in a specific region. If we have information about the frequency of similar events of different magnitudes in both the preindustrial climate and in the climate under anthropogenic forcing we will be able to make statements about how much the probability of the observed (or more extreme) event differs between the two situations. Note the words “similar events”; the complications of that phrase will soon be elaborated upon.

More precisely, let the variable *x* have the probability density in the unperturbed climate and the density in a perturbed climate. Let the corresponding cumulated distributions be denoted and . Here we will use the general terms unperturbed and perturbed climate. In the present case the unperturbed climate is the climate without anthropogenic changes, sometimes denoted the preindustrial climate, or “the world that might have been.” The perturbed climate is the climate under anthropogenic changes.

The fractional attributable risk (FAR) for an observation , , is then defined as

where is the complimentary cumulative distribution function. So is a measure of how much the probability for an event as extreme or more extreme than the observed, , differs between the unperturbed and perturbed climate.

The fractional attributable risk was introduced in climate science by Allen (2003) and Stott et al. (2004), and the methodology has been used in, for example, Schär et al. (2004), Jaeger et al. (2008), Pall et al. (2011), and Knutson et al. (2014). See also the review of Stott et al. (2013) and the assessment of Bindoff et al. (2014). In most of the previous studies the information about the distributions comes from ensemble experiments with climate models. These could be dedicated experiments where ensemble members are produced by perturbing the initial conditions or existing ensembles of experiments such as those compiled in the CMIP5 project. So far there have only been a few studies of the theoretical properties of the FAR (Hansen et al. 2014; Hannart et al. 2015).

It often happens that an observed event is extreme both in the unforced and forced climate; that is, is far to the right of both and (Schär et al. 2004; Christidis et al. 2013; Knutson et al. 2014). In this case, the FAR will be difficult to constrain because both and will be close to one. There are a number of explanations for such a situation:

The event is actually a very extreme event also in the forced climate.

The system used to generate the probabilities is too conservative and is not producing enough extremes. Perhaps the model is too Gaussian.

We have been too restrictive in our definition of a “similar event.” Perhaps we have tacitly selected an observed event after scanning a larger area, while the probabilities are calculated by looking at the specific site where the event happened.

The second explanation deals with the tails of the distributions and this situation might be improved by better modeling tools giving a better representation of the underlying processes. Observational studies of, for example, surface temperature, often show significant deviations from Gaussianity both regarding skewness and kurtosis (Perron and Sura 2013) and regarding heavy tails (Ruff and Neelin 2012).

The third explanation is more philosophical and is basically the selection problem. Consider, for example, the situation where we have observed an extreme temperature or precipitation event at a particular station or site. In the calculation of the FAR we consider the frequencies of similar or more extreme events. But how do we define “similar”? Should we study the frequencies of temperature or precipitation at the same station or in an extended region? How large should the extended region be? After all, the particular station or site is only special because it is where we observed the extreme event and we might have been likewise exited or alarmed if an extreme event was observed at a different site. This choice is important because a, say, three-sigma event observed at a particular location is by definition rare but may be observed often if a more extended region is considered (Chase et al. 2006). Thus, it is obvious that the number of similar or more extreme events in both the unforced and forced climate increases (or at least do not decrease) when the considered region grows, but it is far from obvious what happens to the FAR. As for the choice of spatial region, the choice of the period that should be considered is far from obvious and lead to similar ambiguity. We would probably have been as surprised if, for example, the European heat wave had happened in 2001 or 1995 as we were when it happened in 2003. Note that when the considered region and period reach planetary and climatic scales the event attribution problem transforms into the general attribution problem.

Note that the selection problem also includes other choices—for example, the choice of the diagnostic used to measure the severity of the event (Sippel and Otto 2014). Regarding heat waves, a range of definitions have been proposed, some based exclusively on meteorological variables such as temporal duration and spatial extend of temperature (Stefanon et al. 2012) and others based also on the physiological parameters that are needed to describe the physiological heat load (Koppe et al. 2004). Each choice of diagnostic may give a different FAR and to avoid the selection problem we should decide which index to focus on before the event has actually happened or at least report the FAR for a range of relevant diagnostics.

It is this framing of a similar event that constitutes the selection problem and is the main subject of the present paper. It is not obvious that a good solution to the selection problem exists; statistics destroy the unique and the unique defies statistics (Pauli 1955). However, it is important to acknowledge it and to be aware of the ambiguity it might inflict on measures such as the FAR (Otto et al. 2015).

Below we will study the challenges mentioned in the last two bullet points in more detail. We will show how both non-Gaussianity and the size of the region considered when defining similar events can change the FAR.

As mentioned, most previous attempts of attribution of single events have been based on ensembles of climate model experiments, although extrapolation of long observed time series has also been used (van Oldenborgh 2007). For the general detection and attribution problem several alternative methods to generate the ensembles have been applied, including one-dimensional Monte Carlo simulations (Rahmstorf and Coumou 2011; Benestad 2003) and a simple energy balance model (Stone and Allen 2005). The method of flow analogs has been used for process attribution (Cattiaux et al. 2010; Cattiaux and Yiou 2013). In this paper we discuss an alternative to climate model experiments based on a simple algorithm to produce ensembles of surrogate fields based on observations. This method produces surrogate fields with the same spatial and temporal structure as the target field. The method does not depend on physical parameterizations but only on statistical assumptions. Therefore, it produces estimates of the FAR that are independent from those obtained with climate models, which still have limitations related to tuning and bias corrections (Stone and Allen 2005; Christensen et al. 2008; Teutschbein and Seibert 2013). The surrogate method is fast and flexible and can therefore also be used for sensitivity studies and to test the robustness of the FAR to methodological choices. A fundamental assumption is that it is possible in the observations to empirically separate internal variability from climate change; in the present formulation, this is done by assuming that climate change in each geographical point has the form of a polynomial trend. In this paper we use the surrogate method to illustrate the importance of the selection problem in a realistic setting focusing on European heat waves.

## 2. Theoretical considerations

Let us consider a vector , , of *n* independent and identically distributed variables each with the probability density and cumulative distribution . We denote the probability density and the cumulated distribution of the largest value in this vector, , by and . Because of the assumed independence we have . Thus, *n* can be seen as the size of the region or the length of the period under consideration and the difference between and expresses the selection problem in this simple case.

In the following, the unperturbed climate for *x* will be represented by a density with mean 0 and unit variance. We assume that the climate change amounts to a simple shift of the density so that the perturbed climate will be represented by , which is just moved to the right (but still with unit variance). So the climate change is measured in units of the standard deviations of the unperturbed climate. Representing climate change as a simple shift seems to be a very good first-order approximation in particular for temperature (Simolo et al. 2011; Donat and Alexander 2012; Huntingford et al. 2013; Weaver et al. 2014). There is an ongoing discussion over whether changes in variance is also in some cases affecting temperature extremes (Hansen et al. 2012; Rhines and Huybers 2013; Hartmann et al. 2013). We now study , the FAR of an observed , for different values of *n* and for different densities .

For we will focus on Gaussian distributions and *t* distributions. The *t* distributions have *d* degrees of freedom with . The *t* distributions have fatter tails than the Gaussian and approach a Gaussian for large *d*. We will report only results for but similar results are obtained for other values of *d*.

Let us first consider the case where is Gaussian. We assume that the climate change is . The two densities, and , corresponding to , are shown in the top panel of Fig. 1 (left curves). The corresponding FAR is shown as a function of in the upper curve in the top panel of Fig. 2. This curve is monotonically increasing, meaning that the higher the observed value is, the larger the FAR is, and the more important would we assume climate change has been. For a value of 2 (4) we find a FAR of about 0.4 (0.6), which should be interpreted as 40% (60%) of the risk of that event is attributable to climate change (Stott et al. 2013).

The last paragraph corresponds to . Let us now consider larger values of *n*. In the top panel of Fig. 1 (right curves) the densities and are shown for . The densities have moved to larger values compared to , reflecting that the probability for observing a larger value has increased; their means are 2.51 and 2.81, respectively (note the constant difference equal to ). They are also more narrow and more skewed than for ; they have both a standard deviation of 0.42 and a skewness of 0.65. This is in agreement with the well-known fact (e.g., Leadbetter et al. 1983) that the density of for large *n* will converge (properly normalized) toward a Gumbel distribution. The mean drifts rather slowly to the right as while the width decreases as . The FAR (top panel in Fig. 2) has the same form as for for large values of while it has decreased for small values. Actually, for the smallest values, the FAR is very close to zero. For example, the FAR is almost zero for , while it is still around 0.6 for . The FAR has much the same structure for different values of *n*. For large values of they fall on the same upper branch (Fig. 2) but the jump from values near zero to the upper branch moves to the right and becomes more abrupt for increasing *n*. In fact, the FAR always decreases with increasing *n* as long as . This can be shown by inspection of the derivative of with respect to *n*.

While the Gaussian distribution may be sufficient in some situations, many relevant processes follow distributions with heavier tails. We now consider *t* distributed with 5 degrees of freedom. The bottom panel in Fig. 1 shows the distributions and for and . For both distributions have a standard deviation of 1.46 and a skewness of 3.57 while their means are 3.15 and 3.45, respectively. It is worth noting that the deviations from Gaussianity are much more pronounced for than for . This is in agreement with the fact that this distribution for large *n* will converge (properly normalized) not toward a Gumbel distribution but a Fréchet distribution. The FARs (bottom panel in Fig. 2) are—perhaps surprisingly—very different from the Gaussian case; they are not monotonically increasing as before but have a maximum. This means that the impact of climate change would be assumed larger for medium values around 1.5–3 (for ) than for real extremes above 4. Note also that the FAR never exceeds 0.4 for the curves shown in the bottom panel of Fig. 2, and therefore never more than 40% of the risk of an event will be attributable to climate change even for .

For large there is an extreme difference between the *t*-distributed case (where the FAR approaches 0) and the Gaussian case (where the FAR approaches 1). This is accentuated in Fig. 3 where the FAR is shown as function of both and *n* for . The discussion above has been for but similar results are obtained for other values of climate change as demonstrated in Fig. 4 where the FAR is shown as function of both and for . While *R* increases with in both cases it is obvious that the differences between the Gaussian and *t*-distributed cases exist for all values of . The local maximum in the FAR in the *t*-distributed case becomes even more distinctive for large .

The difference in the form of the FAR between the Gaussian and the *t*-distributed cases can be explained analytically. In our case we have . For large *x* we have in the Gaussian case and in the *t*-distributed case. We therefore find that the FAR approaches 1 for large *x* in the Gaussian case and 0 in the *t*-distributed case. In general, distributions with heavy tails will have a maximum in the FAR while light-tailed distributions will have a monotonically increasing FAR. In fact, the results shown for the Gaussian and the *t* distribution are very characteristic for light- and heavy-tailed distributions, respectively, and similar results are obtained with other distributions falling into these classes. For a Gumbel distribution—the limiting case between light- and heavy-tailed distributions—the FAR converges to a constant for large . A simple calculation finds this constant to be , where *ρ* is the scale parameter.

Let us briefly summarize the results and put them into a real-world perspective.

The FAR depends strongly on the

*n*(Fig. 3). For a given the FAR will approach 0 for increasing*n.*This holds for both the Gaussian and non-Gaussian cases. This means that the selection problem should be taken seriously. We need to consider the size of the region that should be included in our calculation of the frequencies. Note that the parameter*n*in the discussion above is a measure of the number of spatial and/or temporal degrees of freedom. In real-world situations the number of spatial degrees of freedom is difficult to estimate and depends both on the field as well as on the time scale under consideration (Jones and Briffa 1996) and may differ between different parts of the globe (North et al. 2011) and between different seasons. It is of order 50–100 for monthly surface temperatures in NH Europe (Wang and Shen 1999; Bretherton et al. 1999) and probably an order of magnitude larger for daily means. For daily precipitation the number of spatial degrees of freedom is much larger (Moron et al. 2006; Benestad 2013). Temporal degrees of freedom can be estimated from the temporal decorrelation length , where is the autocorrelation coefficient at lag 1. For the daily surface temperatures (see, e.g., Fig. 7) a typical value of is 0.7 and we find days. For a three-month season the number of temporal degrees of freedom is therefore around 15. The total number of degrees of freedom when both space and time are taken into account can therefore vary with many orders of magnitude depending on the field, the region, and the period under consideration.While the FAR is well behaved for Gaussian distributions it shows a rather counterintuitive behavior for distributions with heavy tails. The FAR does not increase monotonously with the strength of the observation but has a local maximum. This means that while the frequency of extreme events does increase in the changed climate it is not necessarily the most extreme events that are most attributable to climate change. The difference in FAR based on Gaussian and heavy-tailed distributions increases with increasing . In the most extreme cases the Gaussian distributions give a FAR of 1, while heavy-tailed distributions give a FAR of 0. We note that although in general observed temperatures show light-tailed behavior (Katz 2002) there is some evidence that daily temperatures can show heavy tails (Ruff and Neelin 2012). Stronger evidence for heavy tails is found for precipitation (Koutsoyiannis 2004; Wilson and Toumi 2005) and in particular for hydrological parameters such as river flows (Anderson and Meerschaert 1998; Katz 2002). Climate models used for event attribution should therefore be carefully tested to confirm that they represent relevant deviations from Gaussianity adequately (Vautard et al. 2013).

## 3. Data and surrogate fields

To estimate the influence of climate change on a specific event we need to find the frequencies of the quantity under consideration for both preindustrial (unperturbed) conditions and climate change (perturbed) conditions. These distributions will typically be calculated from large ensembles produced by a climate model and require large computational resources. In the remainder of this paper we use an alternative approach based on ensembles of surrogate fields calculated from observations.

The ensembles are produced by a Monte Carlo approach, which we have previously used in tests of paleo-reconstruction methods (e.g., Christiansen et al. 2009, 2010; Christiansen and Ljungqvist 2011) and in a study of the significance of the increase in warm records (Christiansen 2013).

The surrogate fields are generated with a phase-scrambling procedure described in Christiansen (2007) and in more detail in (Christiansen 2013). The general outline of the procedure is familiar from bootstrap methods; first a transformation of the original data into stationary anomalies is performed, stationary surrogate anomalies are produced from the original stationary anomalies, and the final surrogate data are produced by applying the inverse transformation to the surrogate anomalies.

In the present paper the starting point is the NCEP daily near-surface temperatures for the period 1948–2011 defined on a 2.5° × 2.5° longitude–latitude grid (Kalnay et al. 1996). The average annual cycle and the secular variations—trends and variability on the lowest frequencies estimated by a third-order polynomial fit—are removed from each grid point. The third-order polynomial fit is typically relatively flat until the middle of the 1970s, whereafter it increases. The anomalies are not yet stationary as they have larger variance in winter than in summer. We obtain stationary anomalies by dividing by the annual periodic envelopes of the standard deviations. The resulting stationary anomalies are Fourier transformed, the Fourier phases are randomized but with the same random phases for all grid points, and inverse Fourier transforms are performed to get the stationary surrogate anomalies. Now the surrogate anomalies are multiplied by the annual periodic envelopes and the average annual cycles are added to get a surrogate field of the unperturbed climate state. Adding also the secular trends gives us a surrogate of the changed climate.

There is a further confounding factor that turns out to be important for the extreme warm events we consider here. Daily temperatures in Europe happen to be positively skewed in summer. However, the phase-scrambling procedure produces Gaussian-distributed anomalies. We therefore transform the surrogate anomalies in every grid point so their distributions in each season correspond to the distributions of the observed anomalies. This is done by the transformation defined by , where and are the original and transformed surrogate anomalies and and are the distributions of the original surrogate anomalies and the observed anomalies (Christiansen 2005). Note that this forces the distributions of summer (or other seasons) surrogate anomalies in the period 1948–2011 to be identical to the observed distributions over this period. There will still be intra- and interannual variability in the standard deviation and skewness of the surrogates as we will see below.

This method can be seen as a field extension of bootstrapping as it produces surrogate fields with the same spatial and temporal structure (as measured with lagged cross correlations) as the target field (Christiansen et al. 2009).

In Figs. 5–8 relevant measures of the temperature field—skewness, spatial correlations, temporal correlations, and standard deviations—are shown for both observations and a single surrogate. Results, which are based on anomalies after the average annual cycle and the secular variations have been removed, are shown for both winter, December–February, and summer, June–August. Note that in observations large seasonal cycles are found in standard deviations (Fig. 8) and skewness (Fig. 5) while spatial (Fig. 6) and temporal (Fig. 7) correlations show much weaker seasonal changes As expected from the description of the method the observed standard deviations and the skewness are reproduced perfectly in the surrogate. The spatial correlations—illustrated by the correlations between the grid point at 40°N, 15°E and all other grid points—and the temporal correlations—illustrated by the autoregression coefficient at lag 1—are very well represented although without the weak seasonality found in the observations.

We have seen that the surrogates represent the observed long-term statistics well. However, it is also important that the surrogates have a good representation of the observed inter- and intraannual variability. In the following we will focus on winters but similar results are found for the other seasons. The geographical patterns for the winter mean temperatures are shown for three different winters in Fig. 9 for both the observations and a single surrogate. We note that although there are large differences from year to year in both observations and in the surrogate, the structure and strength of these patterns are similar whether they are based on observations or on the surrogate. The same is the case for the intraseasonal standard deviation and the skewness calculated for each winter from daily values. The spatial averages of these three quantities (over the area shown in Fig. 9) are shown as function of year in Fig. 10 for observations and for 30 different surrogates. It is obvious that the observed temporal variability of these quantities is well represented by the surrogates. The main deviation seems to be a small bias in the skewness.

One could worry that the variability in the surrogates would be limited by the fact that they are based on a single observed sample. For example, the transformation of the surrogate anomalies to the observed sample distributions will to some extent constrain the surrogates. Here it should be remembered that the distributions from the observed sample are well determined because of the long period it is based on. However, we have confirmed that this effect is not important by fitting parametric distributions—for example, a Gaussian mixture with three modes—to the observed distributions and transforming the surrogates to samples from these distributions. Also, the phase-scrambling method reproduces the amplitudes of the observed spectra perfectly. The observed spectra have sampling uncertainty in the amplitudes and these can be considerable for a specific frequency. However, the phase-scrambling method can be adjusted to reproduce surrogate spectra where also the amplitudes are allowed to vary (Davison 1997). Again, no significant impacts on the results are found when using this method.

## 4. The FAR of European heat waves

In this section we will use the surrogate methodology described in the last section to investigate the challenges mentioned in section 2 in a more realistic setting. We will focus on heat waves in Europe which have attracted considerable attention after the fatal event in 2003. During July and August 2003 many European countries experienced record-breaking temperatures (Schär et al. 2004; Beniston 2004; Black et al. 2004; García-Herrera et al. 2010; Stott et al. 2004) and more than 20 000 people died of heat-related stress (Bouchama 2004; Fouillet et al. 2008).

To estimate the rarity of an event we need a diagnostics that can be compared across different spatial regions and different seasons. We choose a diagnostic calculated daily as the area of the largest contiguous region with deseasonalized temperatures larger than . Here *σ* is the standard deviation of the deseasonalized and detrended temperatures calculated for each grid point. Under Gaussianity on average 0.14% of the area would be above so we are considering extreme warm events.

To investigate the effect of the size of the region—one aspect of the selection problem—we consider three different regions all from 35° to 70°N but spanning different longitude intervals: Europe, 350°E–30°W, extended Europe: 310°E–70°W, and extratropical NH: −180°E–180°W. This is the real-world version of the problem of the number of degrees of freedom discussed in section 2. Here the number of degrees of freedom is not clear as it depends both on the spatial and seasonal distribution of, for example, skewness and spatial correlations. For each surrogate and for each region we calculate the daily index.

This index calculated from the NCEP data for Europe is shown in the top panel of Fig. 11 as function of time. The effect of the seasonal difference in skewness is clearly seen as a dominance of large values in summer. It is also obvious that there is a strong interannual variability in the index with a tendency for large values in the last 10–20 yr. The largest value, 493.7 (unit of 1000 km^{2}, but we will drop the units from now on), is reached 9 August 2003, corresponding to the European heat wave of 2003. Other strong events are found on 4 May 1990 (245.1) and 11 August 2004 (279.6). The heat wave in 2006 (Rebetez et al. 2009) is recorded as 190.1 on this scale at 19 July.

The same indices calculated for extended Europe and extratropical NH are also shown in Fig. 11. As expected, the larger the region under consideration, the more strong events are recorded. For both extended Europe and extratropical NH the event of 2003 has been exceeded by a stronger event 6 August 2010 (510.8) corresponding to the Russian summer heat wave (Rahmstorf and Coumou 2011; Dole et al. 2011; Otto et al. 2012). Also note that the seasonal asymmetry is weakest for extratropical NH, in agreement with the observation in the last section that the strong seasonal difference in skewness is restricted to Europe.

To calculate the FAR related to the index we generate 1000 surrogates of both the unperturbed climate and the changed climate using the method described in section 3. We saw in section 3 that the surrogate method produces fields with the same general characteristics as the real surface temperature. However, we should still investigate whether this level of realism is enough to produce heat wave indices with the correct characteristics. The index for the extended Europe from a single typical surrogate of the perturbed climate (bottom panel in Fig. 11) shows an excellent similarity to the corresponding index calculated from the original NCEP data. This similarity includes the seasonality, the amplitude, and the low-frequency variability. Histograms of the daily values of the indices are shown in Fig. 12. Again a very good similarity is found regarding events of all sizes. The only difference is for the largest events where the surrogate shows a little too few. This is mainly related to the period since 2003 and may be due to random fluctuations (van Oldenborgh et al. 2009) not caught by the secular trend in form of a third-order polynomial. Similar results are obtained for Europe and the extratropical NH.

From each surrogate index we find its maximum value over the whole 1948–2011 period. As mentioned in the introduction the choice of period is as ambiguous as the choice of spatial region. Here we mainly focus on the influence of the size of the spatial region but results from choosing a shorter period are briefly discussed below. The resulting probability distributions of the resulting 1000 maxima are shown in Fig. 13 for both the unperturbed surrogates and the climate change surrogates. These distributions are all heavily skewed toward large events. The distributions for the climate change surrogates have larger mean and variance than those for the unperturbed climate while the skewness is approximately the same. For the unperturbed climate the mean, standard deviation (and skewness) are 276, 63 (1.43) for Europe; 323, 65 (1.20) for extended Europe; and 377, 73 (1.18) for extratropical NH. For the changed climate they are 332, 83 (1.43) for Europe; 413, 85 (1.32) or extended Europe; and 465, 90 (1.11) for extratropical NH. So, when the area under consideration increases the distributions move toward higher values and broaden while the skewness decreases. The latter point is due to the positive skewness of European summer temperatures. If Europe is considered, the probability of an event such as the observed 2003 heat wave is very rare in both the unperturbed and the changed climate; in only 7 and 46 surrogates, respectively, out of the 1000 is an event at least as severe found. Considering larger areas these frequencies increase; for extended Europe the counts are 21 and 154 surrogates and for extratropical NH 73 and 315.

The three corresponding FARs, shown in the last panel of Fig. 13, increase monotonically with the value of the maximum index except for the largest values. However, for an event of this severity the FAR is very badly constrained with an ensemble of 1000. The dashed curves in Fig. 13 indicate the 95% uncertainty intervals calculated by a simple bootstrap procedure.

The FARs differ most in the interval between approximately 200 to 400. For an event of 250—the same size as the observed event in May 1990 and a little more than half size of that observed for the 2003 heat wave—the FAR decreases from 0.4 for Europe over 0.2 for the extended Europe, to near 0 if the extratropical NH is considered. For an event of the size of that observed for the 2003 heat wave the difference seems to be smaller. However, for such a strong event the FARs for extended Europe and the extratropical NH are only barely distinguishable—the FARs are 0.86 and 0.77 with 95% uncertainty intervals of 0.81–0.91 and 0.72–0.82. For Europe the FAR is 0.84 and the 95% uncertainty interval 0.73–0.94. The widths of the 95% uncertainty intervals are found to scale well with the inverse square root of the number of surrogates. Thus for obtaining 95% uncertainty intervals with widths of 0.05 at an event similar to the 2003 heat wave we would need in the order of 15 000 surrogates for Europe and 4000 for the extended Europe and the extratropical NH.

In the discussion above we focused on a specific diagnostic—namely, the largest daily spatially contiguous region with temperatures larger than . It is certainly of interest to investigate to which extent conclusions about the FAR is robust to changes in the diagnostic. This diagnostic used above does not take the persistence of heat waves into account. We define an alternative index by a natural extension as the largest daily contiguous volume in space and time with temperatures larger than . This new index has the dimension of time multiplied by length squared and we use a unit of 1000 × day × km^{2}. The distributions and FARs for the three regions are seen in Fig. 14. The distributions are now more skewed than for the old index (cf. Fig. 13). The FARs for Europe and extended Europe are now very similar to each other while the FAR for extratropical NH is very different. For intermediate values of the index very large differences can be found in the FARs; for example, for a value of 2000 the FAR for Europe and extended Europe is around 0.7 while it is only 0.2 for extratropical NH. Note that the 2003 heat wave is also extreme measured with this index with a FAR not distinguishable from 1 for all regions.

We saw in the simple setting of section 2 that the FAR is sensitive to the details of the underlying distribution. With the surrogate approach we can investigate how serious this challenge is under more realistic circumstances. In section 3 we found that there is a rather strong seasonal difference in the skewness in particular in the European region. We can address the importance of this seasonality by adjusting the procedure in section 3 to generate surrogates without this seasonality but where surrogates still have the same skewness in annual mean as the NCEP data. The resulting FARs (Fig. 15) are now much lower than when the seasonality in skewness was included. For the 2003 heat wave the FAR is now around 0.5.

Two other measures of heat waves that do not include spatial extent but only temporal persistence have been studied. These are defined by calculating the averages over 5 or 30 days of normalized anomalies in each grid cell and then finding the largest of these values over the considered region. The FAR for these measures are shown in Figs. 16a,b. As for the other measures the event of 2003 has FARs close to 1. Note that the results for Europe and extended Europe are almost identical, in particular for a 5-day mean. This is again a consequence of the positive summer skewness in Europe.

The surrogate method needs to separate internal variability from trends. Previously in the paper this was done by subtracting a third-order polynomial fit from each grid point. To test the sensitivity to this choice we have repeated the calculation of the FAR by subtracting polynomials of first, fifth, and seventh order. The results for the extended Europe are shown in Fig. 16c using the measure based on the area of the largest contiguous region. The results for polynomials of order 3, 5, and 7 are identical within the error bars. Using a first-order polynomial—which is obviously a bad choice since the trend has not been linear since 1948—does change the results; the FAR now becomes smaller as would be expected because the perturbed and changed climate are more similar.

Until now we have defined the maximum values using the full period 1948–2011. However, one could argue that this period is too long because anthropogenic climate change did not affect the first part of the period. We have therefore repeated the analysis when defining the maximum value using only the last 15 yr of the period. The results for the FARs are shown in Fig. 16d. Compared to Fig. 13 we find that the FARs have increased; this should be expected because of the reduced number of degrees of freedom.

## 5. Conclusions

We have studied some methodological issues with event attribution related to the selection problem and deviations from Gaussianity. We have considered simple situations where some analytical progress could be made and we have introduced a surrogate ensemble method that is less resource demanding than the ensemble climate model simulations usually used for event attribution. With the surrogate approach we have focused on European heat waves.

Our main conclusions are as follows.

The selection problem must be taken seriously. Increasing the region or the period involved when calculating the frequencies not only changes the frequencies themselves but also the FAR. We saw that theoretically the FAR will decrease when increasing the number of degrees of freedom and from the simple examples in section 2 that the effect can be considerable when varying the number of degrees of freedom from 1 to 100. We also found that the FAR changes most reluctantly for the largest values of the observation although they eventually will approach zero for very large numbers of degrees of freedom. This behavior was confirmed in the more realistic setting of section 3 where the surrogate method was applied to heat waves. Increasing the area of the considered region we found that the FAR decreases considerably for intermediate values—for example, from 0% to 40%—while for the largest events (such as the European 2003 summer heat wave) smaller changes were found.

The FAR is very sensitive to deviations from Gaussianity. The simple examples in section 2 showed that for heavy-tailed distributions the FAR is not an increasing function of the observation but that the most extreme events may have a lesser FAR than more intermediate values. In the study of European heat waves we saw that the FAR was sensitive to the surrogates having the observed seasonality in skewness. This strongly suggests that we should carefully evaluate the event attribution systems’ ability to faithfully simulate the statistical characteristics of the diagnostics under consideration.

We also found that the FAR may depend on the diagnostic chosen. For the heat waves we considered two diagnostics—one including temporal persistence, the other not—and we found significant differences in the corresponding FARs. Finally, we mention that the number of ensemble members needed to constrain the FAR can be quite large for the most extreme events. For the European 2003 heat wave we estimated that around 4000–15 000 ensemble members are needed to obtain a 95% significance interval of 0.05 depending on the size of the considered region. These numbers should be compared to typical ensemble sizes in previous work, which range from around 100 when CMIP5 experiments are used (Knutson et al. 2014; Min et al. 2014), over several hundred when dedicated systems are applied (Christidis et al. 2013), to several thousand when computations are publicly distributed (Pall et al. 2011; Sparrow et al. 2013).

We have demonstrated that statistical methods such as the surrogate ensemble method used here can be applied for event attribution and in particular as a test bed for studying methodological issues. It is obvious, as also mentioned in Christiansen (2013), that the surrogate method has both pros and cons. It is nonparametric and does not depend on any assumptions about the physical mechanisms. It is fast and does not require extensive computer resources. On the other hand, we need to separate internal variability from the secular variations using only the brief period with observations. The anomalies—after removing trends and annual cycles—are assumed to be stationary. So if the anthropogenic climate change includes new physics influencing, for example, the tails of the distributions, these changes will not be appropriately represented by the surrogates. Furthermore, the scrambling method works best for fields that are not too far from Gaussian, such as temperature, while it might have more problems for fields that are strongly non-Gaussian, such as daily precipitation.

## Acknowledgments

This work was supported by EUCLEIA project funded by the European Union’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement 607085. The NCEP Reanalysis data were provided by the NOAA/CIRES Climate Diagnostics Center, Boulder, Colorado, from their website at http://www.cdc.noaa.gov/.

## REFERENCES

*Geophys. Res. Lett.*,

**31**, L02202, doi:.

*Climate Change 2013: The Physical Science Basis*, T. F. Stocker et al., Eds., Cambridge University Press, 867–952, doi:.

*Geophys. Res. Lett.*,

**37**, L20704, doi:.

*Geophys. Res. Lett.*,

**33**, L23709, doi:.

*Bootstrap Methods and Their Application.*Cambridge University Press, 582 pp.

*Geophys. Res. Lett.*,

**38**, L06702, doi:.

*Geophys. Res. Lett.*,

**39**, L14707, doi:.

*Climate Change 2013: The Physical Science Basis*, T. F. Stocker et al., Eds., Cambridge University Press, 159–254, doi:.

*Climatic Variations and Forcing Mechanisms of the Last 2000 Years*, P. D. Jones, R. S. Bradley, and J. Jouzel, Eds., NATO ASI Series I: Global Environmental Change, Vol. 41, Springer-Verlag, 625–644.

*16th Conf. on Probability and Statistics in the Atmospheric Sciences*, Orlando, FL, Amer. Meteor. Soc., J3.5. [Available online at https://ams.confex.com/ams/annual2002/techprogram/paper_26949.htm.]

*Extremes and Related Properties of Random Sequences and Processes*. Springer, 336 pp.

*Geophys. Res. Lett.*,

**36**, L23701, doi:.

*Geophys. Res. Lett.*,

**39**, L04702, doi:.

**132**, 531–543, doi:.

*The Interpretation of Nature and the Psyche*, C. G. Jung and W. E. Pauli, Eds., Pantheon, 147–241.

*Geophys. Res. Lett.*,

**39**, L04704, doi:.

*Geophys. Res. Lett.*,

**38**, L16701, doi:.

*Environ. Res. Lett.*,

**7**, 014023, doi:.

*Geophys. Res. Lett.*,

**32**, L18711, doi:.

*Climate Science for Serving Society*, G. R. Asrar and J. W. Hurrell, Eds., Springer Netherlands, 307–337, doi:.

**41**,

*Geophys. Res. Lett.*,

**32**, L14812, doi:.