The annual “State of the Climate” report, published in the Bulletin of the American Meteorological Society (BAMS), has included a supplement since 2011 composed of brief analyses of the human influence on recent major extreme weather events. Several dozen extreme weather events have now been examined in these supplements, but the studies have differed in their data sources as well as in how they defined the events, analyzed them, and considered the role of anthropogenic emissions. This study reexamines most of these events using a single analytical approach and a single set of climate model and observational data sources. In response to recent studies emphasizing the importance of using multiple methods for extreme weather event attribution, results from these analyses are compared collectively against those reported in the BAMS supplements, with the aim of characterizing the degree to which the lack of a common methodological framework may or may not influence overall conclusions. Results are broadly similar to those reported earlier for extreme temperature events but disagree for a number of extreme precipitation events. On this basis, we conclude that the lack of comprehensive uncertainty analysis in recent extreme weather attribution studies matters and should be considered when interpreting results, but that, as yet, it has not introduced a systematic bias across these studies.
In each year since 2012, a supplement has been published along with the annual “State of the Climate” report in the Bulletin of the American Meteorological Society (BAMS) (Peterson et al. 2012, 2013; Herring et al. 2014, 2015). Each supplement has consisted of a collection of studies by different author teams that looked at extreme weather events that occurred during the previous year, with the underlying question being how their properties, including their occurrence, may have changed as a consequence of anthropogenic climate change. The BAMS supplements have grown in size with the first, second, and third supplements consisting of 6, 19, and 22 studies, respectively (this study was conducted before the fourth supplement was published in 2015). A few of these studies have looked at multiple individual events—for example, one study examined six different rainfall extremes over the United States (Knutson et al. 2014b). Given this, we count the total number of event analyses in these three supplements to be 63. It should be noted, however, that in four cases more than one paper examines the same event. These event analyses are listed in Tables S1–S3 in the supplemental material.
There are a number of aspects of extreme weather in which an anthropogenic role may be discerned (Stott et al. 2013). Since attribution statements may be sensitive to the methods and data sources used (Otto et al. 2012) [also see studies 33 (Swain et al. 2014), 34 (Wang and Schubert 2014), and 35 (Funk et al. 2014) in Table S3 in the supplemental material], this study aims to identify whether the published BAMS attribution results would differ if a common methodological and data framework were applied across all events. Understanding the importance of this sensitivity has been identified as a major priority in event attribution research (Stott et al. 2013; Titley et al. 2016). To highlight this, the Titley et al. (2016, p. 11) report specifically notes that “bringing multiple scientifically appropriate approaches together, including multiple models and multiple studies helps distinguish results that are robust from those that are much more sensitive to how the question is posed and the approach taken.” We adopt the attribution concept described in Stone and Allen (2005) as implemented by Pall et al. (2011) and D. Stone and P. Pall (2016, unpublished manuscript). Along with being popular in recent years, this approach is straightforward to apply to the relatively large number of events being examined here. It involves the comparison of the probability of extreme weather under a factual scenario of conditions that occurred around the time of the event (e.g., greenhouse gas concentrations and ocean temperatures) against the probability under a counterfactual scenario in which anthropogenic emissions had never occurred.
To facilitate a systematic investigation, we restrict our analyses to temperature and precipitation extremes of one or more calendar months in length. Both the climate model used in this study (see section 2) and the available observationally based products may be unsuitable for some events, such as subdaily storms. This reduces the total number of events considered from 63 to 48. We further exclude events either where the multiple observational datasets we use indicate the event was in fact not extreme (defined here as an anomalous magnitude having been equaled or exceeded more than 10 times during the 1961–2010 period) or where our climate model poorly simulates the frequencies of extreme temperature or precipitation over the spatial and temporal scales of the event (what constitutes a poor fit is detailed in the methods section). These additional constraints further reduce the total number of events for which attribution statements are calculated to 36, as indicated in Tables S1–S3 in the supplemental material.
It should be stressed that our analyses are complementary to those performed within the BAMS supplements. However, our systematic approach may overlook some issues that are often studied explicitly within BAMS supplement contributions, such as confirmation that the dynamics of the extreme events in our climate model simulations resemble the dynamics of observed events. Disagreement between our result and that published in a BAMS supplement may reflect shortcomings of either analysis (or both) or the differences in the way the attribution question was framed (Otto et al. 2015). The primary aim of this study is to identify event types for which attribution conclusions may be sensitive to choice of methodology.
Two ensembles of 390 independently and identically distributed realizations of the period from January 2010 to December 2013 have been constructed using the Community Atmosphere Model, version 5.1 (CAM5.1), a numerical model of the atmosphere–land system representing phenomena larger than ~1° in longitude–latitude (Neale et al. 2012). Each realization within an ensemble is driven by the same external boundary conditions but starts from a different initial weather state such that each ensemble represents a spread of possible weather trajectories given the external boundary conditions. The first ensemble is driven by a factual “real world” (RW) scenario simulating weather that might have occurred under observed historical boundary conditions. These boundary conditions include changing greenhouse gas, tropospheric aerosol (prescribed burdens), volcanic aerosol, and ozone concentrations; solar luminosity; sea surface temperatures; sea ice coverage; and land cover. The second ensemble is a counterfactual “natural” (NAT) scenario, in which emissions from human activities had not interfered with the climate system. In the NAT scenario, greenhouse gases, tropospheric aerosols, and ozone have been altered to estimated preindustrial (circa 1855) levels, while ocean temperatures have been cooled and sea ice coverage expanded according to an estimate based on output from the international CMIP5 climate modeling effort (D. Stone and P. Pall 2016, unpublished manuscript). This adjustment to ocean surface conditions preserves month-to-month and year-to-year variability, such as variability related to the El Niño–Southern Oscillation phenomenon. It should be noted that, because of the way in which sea ice concentrations have been imposed, the runs in the NAT scenario cool off with time. Mean temperature differences (RW minus NAT) in CAM5.1 do not stand out from those in two other AGCMs (MIROC5 and HadGEM3-A-N216) for the periods and latitudes over which the examined extremes occurred (figure not shown). Such an artifact should only influence attribution statements for temperature extremes occurring over polar regions. These CAM5.1 simulations have been produced under the international Climate of the 20th Century Plus (C20C+) Detection and Attribution Project (http://portal.nersc.gov/c20c; Folland et al. 2014; the data can be accessed at http://portal.nersc.gov/c20c/data.html). All analyses in this paper are based on monthly mean output of precipitation and near-surface air temperature averaged over the regional domains indicated in Figs. S1–S3 in the supplemental material.
Monthly observational data have been obtained from the Climatic Research Unit Time Series, version 3.22 (CRU TS 3.22; Harris et al. 2014); NOAA Precipitation Reconstruction over Land (PREC/L) on a 2.5° longitude–latitude grid (Chen et al. 2002); GPCC, version 6 (Schneider et al. 2014); and GPCP, version 2.2 (Adler et al. 2003), products for precipitation; and CRU TS 3.22 and GISTEMP (Hansen et al. 2010) products for 2-m air temperature. These data are compared against output from 50 of the RW simulations over the 1961–2010 (1979–2010 for GPCP, version 2.2) period in order to assess the climate model’s ability to reproduce the type of extreme weather being analyzed.
The analysis consists of two main steps. The first step [see the examples in Fig. 1 or Figs. S1–S3 (center) in the supplemental material for all 48 events] tests the model’s ability to realistically simulate temperature or precipitation extremes over the same temporal and spatial domain as was previously examined in the BAMS supplements. For example, the red curve in Fig. 1d is obtained from 50 RW realizations of the January–February mean monthly rainfall over California from 1961 to 2010, sorted in descending order and plotted logarithmically to focus on the tail of the distribution in which we are interested (in this case, the minimum). The resulting curve estimates the return values for all possible exceedance probabilities. The remaining solid curves are similarly constructed from the observational products. We adjust for systematic mean bias between the simulations and each of the observational products by adding to the observed temperatures the difference between the 1961–2010 average of the simulations and that of the respective observations; for precipitation, the observations are instead multiplied by the ratio of the averages. We bias correct the observations toward the model, rather than the other way around, because we have one model and multiple observational products; as the choice does not influence exceedance probabilities, the simpler option is preferred.
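For illustration, this adjustment can be sketched in a few lines of Python (the function and variable names below are ours, not part of the original analysis), applying the additive correction for temperature and the multiplicative correction for precipitation:

```python
import numpy as np

def bias_adjust_obs(obs, model_clim_mean, obs_clim_mean, variable):
    """Shift (temperature) or scale (precipitation) observations toward the
    model's 1961-2010 climatology, as described in the text."""
    if variable == "temperature":
        # additive correction: obs + (model mean - obs mean)
        return obs + (model_clim_mean - obs_clim_mean)
    if variable == "precipitation":
        # multiplicative correction: obs * (model mean / obs mean)
        return obs * (model_clim_mean / obs_clim_mean)
    raise ValueError("variable must be 'temperature' or 'precipitation'")

# Illustrative numbers only (e.g., mm/day for a Jan-Feb rainfall example)
obs_series = np.array([2.1, 0.4, 3.8, 1.2])
adjusted = bias_adjust_obs(obs_series, model_clim_mean=1.5,
                           obs_clim_mean=1.9, variable="precipitation")
```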
The uncertainty bars on the model (red) curve are prediction intervals for the return value curve, calculated at a discrete set of exceedance probabilities as the 5th and 95th percentiles of the return values estimated separately from each of the 50 ensemble members. If the model reasonably characterizes the behavior of extremes, we would expect the line for a given observational dataset to fall within approximately 90% of the intervals. Since we are primarily interested in the model’s ability to simulate extremes, we consider only the prediction intervals for events rarer than the 30% exceedance probability. The model passes our “fit for purpose” test if at least 70% of those intervals include the values from the observation-based curves. This percentage of intervals, representing how closely the tail of the model distributions corresponds to the tails of the observed distributions, is reported in the legends of Fig. 1 as the fitness test (FT) value. If the model passes our fitness test and at least one of the observational products indicates the event was indeed extreme (defined as exceeding the tenth warmest/coldest/wettest/driest event for its season and region during the 1961–2010 period), then the analysis for the given event enters the second step, wherein we formulate an attribution statement.
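A minimal sketch of this fitness test, assuming model output arranged as a (members × years) array and a single bias-adjusted observational series (array shapes and the function name are our assumptions), could look as follows:

```python
import numpy as np

def fitness_test(model, obs, tail_frac=0.30, pass_frac=0.70, minimum=False):
    """Sketch of the 'fit for purpose' (FT) test described in the text.

    model : array (members, years) of simulated 1961-2010 seasonal values
    obs   : array (years,) of bias-adjusted observed seasonal values
    Builds 5th-95th percentile prediction intervals from per-member return
    values at exceedance probabilities rarer than `tail_frac`, then checks
    what fraction of those intervals contain the observed return values.
    """
    probs = np.arange(0.02, tail_frac + 1e-9, 0.02)    # exceedance probabilities
    # Return value at exceedance probability p: the (1 - p) quantile for
    # maxima (hot/wet) or the p quantile for minima (cold/dry, low rainfall).
    q = probs if minimum else 1.0 - probs
    member_rv = np.quantile(model, q, axis=1)           # shape (len(q), members)
    lo, hi = np.percentile(member_rv, [5, 95], axis=1)  # prediction intervals
    obs_rv = np.quantile(obs, q)
    frac_inside = np.mean((obs_rv >= lo) & (obs_rv <= hi))
    return frac_inside >= pass_frac, frac_inside

# Synthetic illustration: a 50-member x 50-yr ensemble vs 50 observed years
rng = np.random.default_rng(0)
passed, ft = fitness_test(rng.normal(size=(50, 50)), rng.normal(size=50))
```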
In the second step we calculate the probability ratio (PR)—a metric characterizing the anthropogenic contribution to the occurrence of the extreme (often also termed the risk ratio). In this study, PR is defined as the probability of occurrence in the real world ($p_{\mathrm{RW}}$) divided by the probability in the natural world ($p_{\mathrm{NAT}}$); that is, $\mathrm{PR} = p_{\mathrm{RW}}/p_{\mathrm{NAT}}$. If PR > 1, anthropogenic activities have increased the chance of the event, while if PR < 1 they have decreased the chance of the event. Event probabilities are estimated from the RW and NAT simulations for the year of the event. To define an extreme event, we construct three different sets of thresholds. The first set of thresholds is determined by the magnitude of the actual events according to the observational datasets. As the multiple observational products differ from each other, this set comprises multiple credible thresholds. The second set of thresholds [1in20(c) in Fig. 2] is determined by the 5th or 95th percentile (1-in-20-yr event) of the 50-member RW ensemble of 1961–2010 simulations. As this ensemble has a clear trend toward warmer temperatures as time evolves, we construct a third set of thresholds (1in20 in Fig. 2) by determining the 5th or 95th percentile of the 390 event realizations for the year of each event from the 2010–13 RW runs—thus, here we define $p_{\mathrm{RW}}$ to be 5%.
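In its simplest empirical form, the PR calculation can be sketched as follows (Python; the threshold choice and all names are ours for illustration). For thresholds lying in the far tail of an ensemble, the study instead uses the peaks-over-threshold fit sketched further below:

```python
import numpy as np

def probability_ratio(rw_event, nat_event, threshold, upper_tail=True):
    """PR = p_RW / p_NAT for a fixed event threshold, estimated simply by
    counting realizations beyond the threshold.

    rw_event, nat_event : the 390 realizations of the event's month/season
    and year under the RW and NAT scenarios, respectively.
    """
    beyond = (lambda x: x >= threshold) if upper_tail else (lambda x: x <= threshold)
    p_rw = np.mean(beyond(rw_event))
    p_nat = np.mean(beyond(nat_event))
    return np.inf if p_nat == 0 else p_rw / p_nat

# Illustration with synthetic data and a "1in20"-style threshold
# (the 95th percentile of the event-year RW realizations)
rng = np.random.default_rng(1)
rw = rng.normal(0.5, 1.0, 390)
nat = rng.normal(0.0, 1.0, 390)
print(probability_ratio(rw, nat, threshold=np.percentile(rw, 95)))
```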
We estimate $p_{\mathrm{NAT}}$ (for all thresholds) and $p_{\mathrm{RW}}$ [for the observationally based and 1in20(c) thresholds] using only the 390 realizations available for the month/season and year of the event. Estimation is done in one of two ways. In most cases, the threshold is either above the 80th or below the 20th percentile of the simulations. In this case, the peaks-over-threshold (POT) extreme value statistical methodology (described further below) is used to fit a distribution to the tail of the entire sample of simulations; from this fitted distribution we calculate the probabilities of the event of interest. However, it is possible for a threshold to not be that extreme in one set of simulations (i.e., to be between the 20th and 80th percentiles). In this case, $p_{\mathrm{RW}}$ or $p_{\mathrm{NAT}}$ is expressed simply as the percentage of realizations that exceed the threshold. This latter situation was encountered in some estimates of $p_{\mathrm{NAT}}$ where a rare cold event in the RW simulations was common in the NAT simulations.
The peaks-over-threshold analysis fits a distribution to all the exceedances over a high cutoff using a point process model, as in Tomassini and Jacob (2009) and Cooley and Sain (2010). (Note that we use the term “cutoff” rather than “threshold” to distinguish the value used to define exceedances in the POT analysis from the value used to define extreme events.) We use a cutoff of the 80th percentile, for hot/wet events, or the 20th percentile, for cold/dry events, of the 390 realizations available for each scenario for a given event. This point process approach is equivalent to fitting a generalized Pareto distribution for excesses over a cutoff and is consistent with the generalized extreme value (GEV) distribution for block maxima. The basic parameters of the point process model can be expressed in terms of those of a GEV distribution. Using these parameter estimates, standard calculations provide the estimated probabilities of exceeding the various event thresholds (Coles 2001). Uncertainty bars on the best estimate (BE) of the PRs are calculated by generating 1000 bootstrap datasets of the RW and NAT realizations. For each dataset the corresponding PR is calculated (on the log scale) per the procedures discussed above. This gives a sample of 1000 log(PR) values that characterize the sampling distribution of the PR estimate. To quantify uncertainty in the estimated log(PR), we used the basic bootstrap confidence interval procedure, by which lower and upper uncertainty bars are calculated by BE − (E95 − BE) and BE − (E05 − BE), respectively, where E95 and E05 represent the 95th and 5th percentiles of the 1000 bootstrapped log(PR) values (Davison and Hinkley 1997).
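A rough sketch of this estimation and uncertainty procedure is given below (Python with NumPy/SciPy; we approximate the point process formulation by fitting a generalized Pareto distribution to excesses over the cutoff, and all function names are our assumptions). The sketch dispatches to the empirical fraction when the threshold is not extreme within an ensemble, uses the POT fit otherwise, and forms the basic bootstrap interval BE − (E95 − BE), BE − (E05 − BE) for log(PR) described above:

```python
import numpy as np
from scipy.stats import genpareto

def tail_prob(sample, threshold, cutoff_q=0.80):
    """P(X > threshold) from a generalized Pareto fit to excesses over the
    cutoff (80th percentile here; hot/wet case). For cold/dry events one
    would negate the data or work with the 20th percentile instead."""
    u = np.quantile(sample, cutoff_q)
    excesses = sample[sample > u] - u
    shape, _, scale = genpareto.fit(excesses, floc=0.0)   # location fixed at 0
    # P(X > x) = P(X > u) * P(X - u > x - u | X > u)
    return np.mean(sample > u) * genpareto.sf(threshold - u, shape, 0.0, scale)

def event_prob(sample, threshold):
    """Upper-tail case: empirical fraction if the threshold is not extreme
    in this ensemble (between the 20th and 80th percentiles); POT otherwise."""
    p20, p80 = np.percentile(sample, [20, 80])
    if p20 <= threshold <= p80:
        return np.mean(sample >= threshold)
    return tail_prob(sample, threshold)

def bootstrap_log_pr(rw, nat, threshold, n_boot=1000, seed=0):
    """Best estimate (BE) of log(PR) with basic-bootstrap 5%-95% bars."""
    rng = np.random.default_rng(seed)
    be = np.log(event_prob(rw, threshold) / event_prob(nat, threshold))
    boot = np.empty(n_boot)
    for i in range(n_boot):
        rw_b = rng.choice(rw, size=rw.size, replace=True)
        nat_b = rng.choice(nat, size=nat.size, replace=True)
        boot[i] = np.log(event_prob(rw_b, threshold) / event_prob(nat_b, threshold))
    e05, e95 = np.percentile(boot, [5, 95])
    return be, be - (e95 - be), be - (e05 - be)            # BE, lower, upper
```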
Frequency distributions of weather for the specified month/season and year of the event are shown in Fig. 3 as blue (NAT) and red (RW) histograms [see Figs. S1–S3 (right) for all events] for the same selection of events as in Fig. 1. Also as in Fig. 1, dashed lines in Fig. 3 are the bias-adjusted observed event magnitudes, as well as the simulation-based thresholds (see section 3).
PRs for all events found to in fact be extreme in the observational record, and for which our model is deemed suitable for the task, are shown in Fig. 2. If the event only met the extreme requirement for some of the observational products, then results are only shown for those products. If none of the observational products meet the extreme requirement or the model fit is poor—for example, in the cases of events 2 and 3 (Funk 2012), respectively (the 2011 East African droughts)—then the attribution step is not performed. Results for hot, cold, wet, and dry events are shown in Figs. 2a, 2b, 2c, and 2d, respectively. For reference, the solid line represents a PR of unity, meaning the event likelihood is unchanged as a consequence of anthropogenic emissions. Markers above or below this line indicate that human activity increased or decreased the event likelihood, respectively. Framed markers indicate a PR of infinity; that is, $p_{\mathrm{NAT}}$ is estimated as zero from the available simulations. Uncertainty bars have been calculated on the 1in20(c) markers except for those with best estimates of infinity or near infinity, since in these cases the bootstrap procedure fails. For these infinity and near-infinity cases, the fraction of RW realizations exceeding the thresholds lies between and , depending on the event, meaning that the conclusion of a PR much greater than unity is robust to the uncertainty in the near-zero $p_{\mathrm{NAT}}$. Uncertainty is larger for thresholds based on actual events (not shown) because these thresholds are almost always more extreme than the simulation-based thresholds. For example, for event 33, a low rain event, there are only two RW and zero NAT values below the CRU TS 3.22 value.
To test whether results are robust to common assumptions regarding the fitted distributions, Fig. S4 in the supplemental material depicts PRs calculated assuming all RW and NAT distributions are Gaussian. Since we are examining events of at least one month in duration, the aggregation of daily temperature or rainfall data yields distributions that converge toward a Gaussian (except possibly for some low-precipitation events). This assumption may nonetheless introduce significant errors in estimating the probability of the more extreme thresholds. At the same time, assuming Gaussianity means the fitted curves are unbounded; thus, a nonzero probability of exceedance is calculated in every case. This means markers that would otherwise have been found to be infinity in Fig. 2 now have extremely high or low, but finite, PRs. Uncertainty has been calculated in the same way as in Fig. 2. Given the similarity between Fig. S4 in the supplemental material and Fig. 2, we conclude that results are qualitatively robust to our assumptions concerning the tails of these distributions.
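For completeness, the Gaussian variant of the PR calculation could be sketched as follows (Python; names are ours, and this illustrates the robustness check rather than reproducing the study's code):

```python
import numpy as np
from scipy.stats import norm

def gaussian_pr(rw, nat, threshold, upper_tail=True):
    """PR under the Gaussian assumption used for the robustness check.
    Normal tails are unbounded, so the exceedance probability, and hence
    the PR, is always finite."""
    def p(sample):
        mu, sigma = sample.mean(), sample.std(ddof=1)
        return norm.sf(threshold, mu, sigma) if upper_tail \
            else norm.cdf(threshold, mu, sigma)
    return p(rw) / p(nat)

# Illustrative use with synthetic event-year ensembles
rng = np.random.default_rng(2)
rw, nat = rng.normal(0.5, 1.0, 390), rng.normal(0.0, 1.0, 390)
print(gaussian_pr(rw, nat, threshold=np.percentile(rw, 95)))
```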
In Table 1 we summarize all the PRs for each event in Fig. 2 and the corresponding attribution conclusions from the BAMS supplement studies into three categories: the event likelihood increased, decreased, or hardly changed as a consequence of human activity. The BAMS supplement published in 2014 (Herring et al. 2014) included a table describing the conclusions of all of the studies using these categories. The earlier BAMS supplements (Peterson et al. 2012, 2013) did not include such a table, however, so we determined the categorization based on the conclusions stated in the papers. Given the existence of the summary table in Herring et al. (2014), we believe the BAMS supplements are intended to be interpreted according to these categories. Some papers in the earlier supplements expressed conclusions in terms of a relation to a long-term warming over a large area of the ocean, for instance, rather than anthropogenic emissions of greenhouse gases; in these cases we referred to the confident detection of human influence on large-scale warming (Bindoff et al. 2013) and interpreted the studies as relevant for assessing the role of human activity. Attribution statements in this study are conditional on SSTs at the time of the event, which, for example, may have occurred during an El Niño or La Niña event. In contrast, “general attribution” refers to attribution statements that are not conditional on the observed SSTs at the time of the event. We argue, however, that the boundary conditions are only one of the factors on which a study’s attribution results may be conditional. For example, different physics and parameterizations in models can result in different attribution statements, regardless of whether the models used are coupled atmosphere–ocean GCMs or AGCMs.
Based on our calculated PRs, the existence of a human role is assigned when the 1in20(c) error bars and four out of five (for precipitation; two out of three for temperature) of the other best-estimate values fall outside a near-unity PR range of – (indicated by the dashed lines in Fig. 2). We discuss this definition of near unity further below. Figure 4 shows the allocation of attribution statements from the BAMS supplements into the three categories, with the area of the pie charts being proportional to the number of positive, negative, or neutral conclusions per event type. The area of each pie chart filled with gray or white represents the relative agreement or disagreement between our statements and those found in the BAMS supplements, respectively. Usage of a less strict criterion of the 1in20(c) error bars and three out of five of the other best-estimate values for precipitation yields a similar result, with two attribution statements [13 (Rupp et al. 2013) and 30 (King et al. 2013)] switching from neutral to positive.
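One reading of this categorization rule can be expressed schematically as follows (Python; the near-unity bounds are left as parameters because their numerical values are indicated by the dashed lines in Fig. 2 rather than reproduced here, and the function name and example values are ours):

```python
def categorize(bars_1in20c, other_best_estimates, lo, hi, needed):
    """One reading of the categorization rule described in the text.

    bars_1in20c : (lower, upper) uncertainty bars on the 1in20(c) PR
    other_best_estimates : PR best estimates for the remaining thresholds
    lo, hi : the near-unity PR range treated as neutral (placeholders here)
    needed : number of other estimates required outside the range,
             e.g. 4 of 5 for precipitation, 2 of 3 for temperature
    """
    above = sum(pr > hi for pr in other_best_estimates)
    below = sum(pr < lo for pr in other_best_estimates)
    bars_above = bars_1in20c[0] > hi        # entire interval above the range
    bars_below = bars_1in20c[1] < lo        # entire interval below the range
    if bars_above and above >= needed:
        return "increased"
    if bars_below and below >= needed:
        return "decreased"
    return "neutral"

# Hypothetical wet event with five alternative thresholds and placeholder bounds
print(categorize((1.6, 4.0), [2.2, 3.1, 1.9, 2.7, 1.1], lo=0.9, hi=1.1, needed=4))
```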
The comparison in Fig. 4 between the conclusions of the BAMS supplement studies and our analysis suggests some sensitivity to the choice of methodology. All but one (Cattiaux and Yiou 2013) of the discrepancies concern rainfall events. The influence of emissions on temperature extremes now appears strong enough that attribution conclusions are robust to the choice of methodology, data sources, and other factors (Christidis et al. 2012a,b; Angélil et al. 2014; Fischer and Knutti 2015). The discrepancies between our results and the BAMS conclusions run both ways: in some cases the respective BAMS supplement study concludes that human emissions played a substantial role whereas we conclude that they did not, and in other cases the reverse holds.
A crucial element of the comparison between our conclusions and those of the BAMS supplement studies displayed in Fig. 4 is the definition of the boundary between the neutral and nonneutral categories. There is still no accepted value of the PR beyond which emissions should be deemed to have played an important role, and the selection of such a value may well be context specific. For instance, while a boundary close to unity may be appropriate for general monitoring of human influence on climate, civil court cases may prefer a doubling (Grossman 2003). Our choice of a – range for the neutral category has in fact been selected in part because it provides a good overall match to the conclusions of the BAMS supplement studies; a narrow neutral range also matches the definition of nonneutral in the summary table of Herring et al. (2014), which effectively uses an infinitesimal range for papers that use the PR measure. Use of a larger – neutral range only converts one dry-positive agreed case to disagreement (Fig. S5 in the supplemental material). However, a – range, which might be considered the widest range plausibly labeled neutral, results in complete agreement for all wet and dry cases concluded to be neutral in the BAMS supplements but almost complete disagreement for wet and dry cases concluded to be positive in the BAMS supplements (Fig. S5 in the supplemental material). The PRs for the temperature-related events are far enough from unity to be insensitive to these choices.
The analysis conducted here should not be considered a full, comprehensive, and unbiased assessment of the role of anthropogenic emissions in extreme weather generally. For instance, attribution results in approximately half of the studies reexamined are not conditional on the SSTs at the time of the event, whereas in this study they are. This study additionally uses a single estimate of the change in SSTs associated with anthropogenic greenhouse gas emissions. SST warming patterns tend to vary between models, leading to large differences in attribution statements (Lewis and Karoly 2015; Shiogama et al. 2016). This source of uncertainty is not accounted for in this study.
Also, the selection of events in this analysis is far from unbiased. For one thing, the locations of the events examined in the three BAMS supplements correlate strongly with the authors’ proximity to the events. Another selection factor may be the degree of anticipated media and public interest. Both factors may be involved in the strong focus on Europe, the United States, and Australia (49 of 63 events). An aggravating factor is that researchers are usually cautious in their analyses for regions where observational records are known to be poor, but the tight schedule of the BAMS supplement submission process hinders that degree of caution. In areas with poor monitoring, it becomes imperative to ensure that observational or observationally based data are adequate for characterizing the event (e.g., was the event even extreme?) and either for assessing long-term trends or for evaluating/calibrating the climate models that are used for the assessments. However, even if such biases in the original BAMS supplements did not exist, our selection criteria (the extreme and fit-for-purpose tests) may have imposed biases of their own.
For computational tractability, climate models include approximations to the physical equations governing the climate system, and differences in the approximations made across climate models can lead to differences in the climate described by the models. For instance, Angélil et al. (2014) find that PR estimates for 1-in-1-yr hot and cold extremes could differ between two climate models by a factor of 2. Bellprat and Doblas-Reyes (2016) additionally show that the use of a single climate model for event attribution statements can lead to overestimated attribution statements. We therefore stress the importance of using multiple models for probabilistic event attribution. Additionally, only one estimate of the attributable ocean warming due to emissions was used in calculating the sea surface temperatures for the NAT scenario, but there may be a strong sensitivity to uncertainty in this estimate (Pall et al. 2011; Shiogama et al. 2014). Accounting for these uncertainties requires the production of new climate model data products, as is currently underway, for instance, within the international C20C+ Detection and Attribution Project (Folland et al. 2014), of which the CAM5.1 simulations used here are a first submission, and within the weather@home project (Massey et al. 2015).
One of the reasons for our selection of the Pall et al. (2011) approach is that it can be applied systematically across a wide range of event types; some other approaches require analyses tailored to each event, such as those performed in Cattiaux and Yiou (2012, 2013). Our analysis has compared results derived from one approach against results obtained using a variety of attribution concepts and analysis methods [including the Pall et al. (2011) approach itself in some cases], and although some recent studies have used multiple attribution frameworks (King et al. 2015), this study constitutes the first large-scale assessment of the sensitivity of conclusions about the role of emissions from human activities in the occurrence of extreme weather events to the choice of methodology. Such an approach to extreme event attribution has recently been emphasized in Titley et al. (2016). Titley et al. (2016) additionally advise the use of multiple observational and/or reanalysis products for model evaluation. This suggestion corresponds to recommendations in Angélil et al. (2016), who find large uncertainty in exceedance probabilities for rainfall and temperature extremes among many of the current-generation products used to evaluate GCMs. It should be noted that this study does not examine individual events but rather classes of events, defined as those that exceed (or, for cold events, fall below) the observed magnitude of the event.
While the Pall et al. (2011) approach has become popular in recent years, there remain different views on what constitutes the concept of attribution (Stott et al. 2013). For example, Titley et al. (2016) and Shepherd (2016) both identify different concepts of attribution. The “storyline” concept examines the role of various factors contributing to the event as it occurred, while the “risk based” concept answers the question probabilistically (Shepherd 2016). While these concepts can be broadly considered as addressing attribution, they are not identical. Similarly, Hannart et al. (2016) note that current attribution work focuses on the sufficiency aspect but could equally focus on the necessity aspect. Meanwhile, although our analysis benefits from superior and more numerous data sources than can be afforded by most studies on the tight schedule of the BAMS supplements, the more targeted nature of the BAMS supplement studies may permit a more thorough evaluation of the adequacy of the data sources and of the confidence in conclusions. Thus, the conclusions in our analysis for each event could be considered similarly plausible to those in the BAMS supplement studies. In that sense, discrepancies between our conclusions and those in the BAMS supplements indicate that uncertainties in analysis methods have yet to be adequately considered, particularly for precipitation events. However, considering that the disagreements run in both directions (positive to neutral and neutral to positive), we find no evidence that the selection of methodology introduces a systematic bias in favor of a particular event attribution conclusion.
This work was supported by the Regional and Global Climate Modeling Program of the Office of Biological and Environmental Research in the Department of Energy Office of Science under Contract DE-AC02-05CH11231. Calculations were performed at the National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory. We thank Chris Funk, Shreyas Cholia, and Prabhat for helpful discussion.
Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/JCLI-D-16-0077.s1.