Treatment of error and uncertainty is an essential component of science and is crucial in policy-relevant disciplines, such as climate science. We posit here that awareness of both “false positive” and “false negative” errors is particularly critical in climate science and assessments, such as those of the Intergovernmental Panel on Climate Change. Scientific and assessment practices likely focus more attention on avoiding false positives, which could lead to a higher prevalence of false-negative errors. We explore here the treatment of error avoidance in two prominent case studies regarding sea level rise and Himalayan glacier melt as presented in the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. While different decision rules are necessarily appropriate for different circumstances, we highlight that false-negative errors also have consequences, including impaired communication of the risks of climate change. We present recommendations for better accounting for both types of errors in the scientific process and scientific assessments.
Climate science and assessment sometimes focus too strongly on avoiding false-positive errors, when false-negative errors may be just as important.
The concept of risk has been identified as a fundamental framing to the analysis of what to do about anthropogenic climate change, unanimously agreed to by the signatories of the United Nations Framework Convention on Climate Change (Pachauri and Reisinger 2007; Alley et al. 2007; National Research Council 2011). Stephen Schneider was essential in drafting the language in the summary for policymakers of the Synthesis Report of the Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC) that has framed the risk-based approach to climate change: “Responding to climate change involves an iterative risk management process that includes both mitigation and adaptation” (Pachauri and Reisinger 2007, p. 64). At its core, risk assessment and risk management involve determination of probabilities and consequences of outcomes, both of which have uncertainties associated with them. Scientists aim to illuminate the full probability distributions of risks by accounting for the full range of different types of uncertainties while avoiding potential errors in causal relationships via statistical forms of inference, such as hypothesis testing.
Based on formal hypothesis testing in statistics, scientists typically consider two types of error (Fig. 1). A type 1 error is a false positive: a researcher states that a specific relationship exists when in fact it does not. Type 1 errors are typically avoided in hypothesis testing by determining whether a p value, roughly the probability that a result at least as extreme could be obtained by chance alone if no relationship existed, falls below a predetermined threshold. A 5% p value cutoff has become scientific convention in many fields of the natural sciences, but a different threshold could, in principle, be selected. This false positive comes in the form of a double negative (a type 1 error means incorrectly rejecting the null hypothesis that a relationship does not exist). A type 2 error is the reverse: the null hypothesis is not rejected despite being false, which is a false negative. A scientist says no relationship exists when, in fact, one does; but again, the p value threshold for making such a claim is arbitrary.
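The trade-off between the two error types can be illustrated with a small simulation. The following is a minimal sketch rather than an analysis from this paper: it assumes normally distributed data with known unit variance, a two-sided z test, and a hypothetical true effect of 0.5 standard deviations with 20 observations per experiment. The function names and all parameter values are illustrative assumptions.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """Two-sided p value of a z test of the null hypothesis 'the true mean is zero'.

    Assumes the standard deviation sigma is known (an illustrative simplification).
    """
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def error_rates(alpha, true_effect=0.5, n=20, trials=5000, seed=0):
    """Estimate (type 1 rate, type 2 rate) for a given p value threshold alpha.

    true_effect, n, and trials are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    false_positives = 0
    false_negatives = 0
    for _ in range(trials):
        # World A: no relationship exists; rejecting the null is a type 1 error.
        null_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p_value(null_sample) < alpha:
            false_positives += 1
        # World B: a relationship exists; failing to reject is a type 2 error.
        effect_sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        if two_sided_p_value(effect_sample) >= alpha:
            false_negatives += 1
    return false_positives / trials, false_negatives / trials


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        type1, type2 = error_rates(alpha)
        print(f"alpha={alpha:.2f}  type 1 rate={type1:.3f}  type 2 rate={type2:.3f}")
```

Under these assumptions, tightening the threshold lowers the simulated type 1 rate but raises the type 2 rate, because the same evidence must clear a higher bar before a real effect is declared.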
This statistical formulation of type 1/type 2 errors is relevant in the detection and attribution of climate change (Trenberth 2011), determining whether an observed impact or a climatic extreme event is likely to have been caused by anthropogenic climate change. Yet, type 1 and type 2 errors are also relevant to the projection of climate change and climate impacts in assessing the future scenarios' respective risks and mean and lower and upper bounds of projected climate changes/impacts from different sources (Schneider 2006). In scientific assessments such as the IPCC, scientists synthesize and weight multiple lines of evidence from diverse tools. Thus, the relative avoidance of type 1 versus type 2 errors can shape this synthesis process and the findings produced. In this case, an overestimation of a given climate impact is analogous to type 1 errors (i.e., a false positive in the magnitude of an impact), while an underestimation of the impact corresponds to type 2 errors (Schneider 2006; Brysse et al. 2013).
Recent research suggests that, for a number of key attributes of climate change, scientists have “erred on the side of least drama” by underestimating changes in climate assessments (Brysse et al. 2013), effectively accepting a higher risk of type 2 errors to lower the chances of type 1 errors. Yet decision makers often take both type 1 and type 2 errors seriously. While many risk management and decision-making frameworks take account of and attempt to minimize the occurrence of both types of errors, available evidence suggests that recent climate science does not amply consider both types of errors, particularly in assessments.
Type 1 and type 2 errors become especially important in what has been termed “postnormal science,” where risks and/or uncertainty are high in a policy-relevant issue and decisions must likely be made without complete certainty (Funtowicz and Ravetz 1993). With its dependence on the complex and chaotic coupled climate–land–ocean system, human activities, policy decisions, system inertia, and time lags, climate science and climate impacts are generally considered within these landscapes of postnormal science (Bray and von Storch 1999; Saloranta 2001). These two types of errors factor into the complex landscape of uncertainty characterization, which has been increasingly explored and utilized within the context of the IPCC (Mastrandrea et al. 2011; Moss and Schneider 2000; O'Reilly et al. 2011; Yohe and Oppenheimer 2011). Yet, careful treatment of type 2 errors can fall outside current uncertainty characterizations and it has particular relevance to climate impacts (Trenberth 2005). Failure to account for both type 1 and type 2 errors leaves a discipline or assessment processes in danger of irrelevancy, misrepresentation, and unnecessary damages to society and human well-being (Oppenheimer et al. 2007). We further explore error avoidance in the context of two prominent case studies in the Fourth Assessment Report of the IPCC.
SEA LEVEL RISE IN THE IPCC FOURTH ASSESSMENT REPORT.
Sea level rise constitutes one of the most prominent and visible climate change impacts reported by the IPCC, with implications for human livelihoods and billions of dollars required for adapting, managing, and planning for sea level rise in the twenty-first century. From 1993 to 2003 sea level increased at a rate of about 3 mm yr−1, which is significantly higher than the 1.8 mm yr−1 average increase for the twentieth century (Alley et al. 2007). Working Group I (WGI) of the IPCC attributed about half of this current increase to the melting of land ice, a dynamical and incompletely understood process that has accelerated in recent years (Bindoff et al. 2007). The melting of Greenland and Antarctic ice sheets, however, had still not been modeled with great accuracy and had, in fact, been increasing at unpredictable rates. Because ice sheet melting was accelerating quickly and in unpredictable ways, “quantitative projections of how much it would add [to sea level rise] cannot be made with confidence” (Bindoff et al. 2007, p. 409). The authors decided, given these realities, to remove sea level rise driven by ice melt from their future estimates—not because the ice was not melting but because future rates could not be projected.
More specifically, Working Group I of the Fourth Assessment Report dealt with this insufficient understanding by leaving the acceleration of ice sheet melt out of its quantitative projections of the future. Table 3 of the summary for policymakers, which presents the sea level rise projections, includes sea level contributions from ice sheet flow held steady at the rates observed from 1993 to 2003, but it does not include a continuation of the observed acceleration of melt (Alley et al. 2007). The Fourth Assessment Report gives ranges for sea level rise by 2100 that are lower than those reported in the Third Assessment Report and the Fifth Assessment Report (Fig. 2), but it warns that “Larger values cannot be excluded, but understanding of these effects is too limited to assess their likelihood or provide a best estimate or an upper bound for sea level rise” (Alley et al. 2007, p. 14), and it provides a footnote to explain why.
We highlight this example as an instance of how type 2 errors could potentially manifest in scientific assessments. Naturally, the projected range is for a future date and, while observed trends exceed the projected trends, we will not know whether any range was in error until that date arrives. Several scientists pointed out this potential type 2 error in the peer-reviewed literature and described it as a consequence of “scientific reticence” (Hansen 2007), which includes a strong focus on avoiding type 1 errors. The limitations of consensus and dynamics of the IPCC assessment process, however, may have instead influenced this range (Oppenheimer et al. 2007; Solomon et al. 2008), as the process of determining upper and lower bounds involves integrating and weighting different sources of information and model simulations.
We analyzed a dataset of major U.S. and U.K. media outlet news coverage of the IPCC WGI report to examine whether media outlets reported the critical caveat regarding the upper bounds of sea level rise. A failure to report this caveat would suggest that this potential type 2 error impaired effective communication of climate risks. We applied published methods of media analysis to a database of articles from seven major U.S. and U.K. newspapers that mention global warming or climate change (Rick et al. 2011), subsampled for mentions of sea level from 1 February to 31 March 2007, to examine media coverage of the release of the WGI report. Of the news articles in the dataset that covered the report release, 81% reported the quantitative sea level rise projections (18–59 cm), while only 31% mentioned the qualitative caveats about missing dynamical ice sheet contributions. Other studies have found that the media more often report IPCC summaries of sea level rise than individual studies (Rick et al. 2011), which indicates that the IPCC-reported range matters for climate change communication and risk assessment.
A retrospective analysis of several key attributes of global warming concluded that the IPCC as an institution has tended to be generally conservative and has often underestimated key characteristics of climate (Brysse et al. 2013). This arguably has led to larger (though unknown) type 2 error rates, particularly in presenting upper bounds of climate changes and impacts that might not capture the full tails of the probability density function. As we discuss in the “Conclusions” section, higher type 2 error rates may be particularly harmful in presenting the full spectrum of risk for risk assessment and management.
HIMALAYAN GLACIER MELT IN THE IPCC FOURTH ASSESSMENT REPORT.
The Himalayan region is known as the “third pole” for its extensive glaciers, which provide critical water resources for millions of people in India, China, and other nations. In 2010, three years after the publication of the Fourth Assessment Report, it came to light that a single section in a chapter of Climate Change 2007: Impacts, Adaptation and Vulnerability had overstated the rate at which glaciers were melting in the Himalayan region. Stemming from a lapse in the application of quality control and review of nonjournal literature and potentially a simple typographical error, a section in chapter 10 of that volume mistakenly reported that many glaciers could melt away by 2035, though the executive summary of the chapter correctly concluded, “The retreat of glaciers and permafrost in Asia in recent years is unprecedented as a consequence of warming” (Cruz et al. 2007, p. 471). While recent research has in fact shown that the majority of Himalayan glaciers are melting and at a rate on par with glaciers around the world (Fujita and Nuimura 2011; Kaab et al. 2012; Kargel et al. 2011), the 2035 melt date is almost certainly an overstatement of melt rates (Bolch et al. 2012) and thus provides an example of a possible type 1 error.
In contrast to the sea level rise case, the scientific community and media response to this potential error was substantial. In the peer-reviewed literature, the melt date was described as incorrect (Cogley et al. 2010) and some suggested that “this error . . . shredded the reputation of a large and usually rigorous international virtual institution” (Kargel et al. 2011, p. 14 709). The IPCC issued a formal statement, saying it “regret[s] the poor application of well-established IPCC procedures in this instance” (IPCC 2013, p. 1). The IPCC response emphasized that the organization has numerous processes and procedures to examine evidence and to avoid errors. These procedures had simply not been adequately followed in this case.
Did the overestimation actually damage the scientific credibility of the IPCC? It is hard to know the true impact, but polling data since the incident indicate that it likely did not. A poll conducted in June 2010 found that 14% of Americans had recently heard in the news about errors in the IPCC report (Leiserowitz et al. 2013). About 5% said that these errors had decreased their trust in climate scientists, though these respondents were largely concentrated in the “doubtful” and “dismissive” categories, which already had relatively low trust in climate scientists (Leiserowitz et al. 2013).
Another poll questioned a nationally representative sample of Americans about the Himalayan glacier error in June 2010, six months after the incident. Around 24% of the nation said they remembered hearing about recent errors, but only 4% said they thought the errors indicated scientific misconduct (J. Krosnick and B. MacInnis 2014). Based on a set of calculations using respondents' stated degree of trust in climate scientists, the authors determined that the theoretical upper bound of opinion change was a 5% decrease in trust in climate scientists. The actual change in the degree of trust based on longitudinal polling data from this study, however, was statistically indistinguishable from zero (J. Krosnick and B. MacInnis 2014). The average change in public belief in the existence of global warming across all nine sets of available polling data before and after the Himalayan glacier error and the hacking of the University of East Anglia e-mails, and thus potentially attributable to these two events, was 6%. Longitudinal analysis of public opinion over 2006–11, however, indicates that year-to-year fluctuations in temperature appear to have a much larger effect on public opinion (J. Krosnick and B. MacInnis 2014), which aligns with recent research documenting the direct “experiential learning” effect of temperatures on public opinion on climate change in many sections of the U.S. public (Myers et al. 2013). Taken together, the polling data since this incident indicate that a relatively small portion of Americans were aware of this controversy, that Americans have generally trusted scientists studying the environment, and that this trust did not decline following this error (J. Krosnick and B. MacInnis 2014).
CONCLUSIONS.
The two case studies analyzed here illustrate the intricacies and complexities in avoiding both type 1 and type 2 errors in scientific assessments. Oppenheimer and colleagues (2007) have noted that searching for consensus in an assessment process such as the IPCC can be counterproductive to risk assessment. We suggest that assessment can further institutionalize the aversion to type 1 errors and the attendant risk of committing type 2 errors. Both in paradigm and procedure, the scientific method and culture prioritize type 1 error aversion (Hansson 2013) and “erring on the side of least drama” (O'Reilly et al. 2011) or “scientific reticence” (Hansen 2007), and this can be amplified by both publication bias and scientific assessment (Freudenburg and Muselli 2010; Lemons et al. 1997; O'Reilly et al. 2011). Thus, the high-consequence tails of the distribution of climate impacts, where experts may disagree on likelihood or where understanding is still limited, can often be left out or understated in the assessment process (Oppenheimer et al. 2007; Socolow 2011). As participants in the IPCC assessments, we have observed an excessive focus on avoiding type 1 errors at various stages in the assessment process, which may have worsened following the Himalayan glacier event.
Growing evidence suggests that, partly owing to this treatment of error as well as other processes, consensus scientific assessments to date are likely to underestimate climate disruptions (Brysse et al. 2013; Freudenburg and Muselli 2010; O'Reilly et al. 2011). A recent paper reviewed the suite of studies that compared past predictions with recent observations of sea level rise, surface temperature increase, melting of Arctic sea ice, permafrost thaw, and hurricane intensity and frequency. The study found that IPCC assessments of projections were on the whole largely correct or even underestimates (possible type 2 errors), and that there was little to no evidence of “alarmism” or widespread overestimates (Brysse et al. 2013). Thus, while a full accounting of the relative prevalence of type 1 versus type 2 errors is not possible (as what determines an “error” is a difficult question and future projections cannot be assessed currently), the balance of evidence indicates that potential type 2 errors may be more prevalent in assessments, such as the IPCC.
This asymmetry of treatment of error has unintended consequences. Type 2 errors can hinder communication of the full range of possible climate risks to the media, the public, and decision makers who have to justify the basis of their analyses. Thus, such errors have the potential to lead to unnecessary loss of lives, livelihoods, or economic damages. Yet, as Stephen Schneider eloquently highlighted throughout his work, high-consequence, controversial, uncertain impacts are exactly what policy makers and other stakeholders would like to know to perform risk management (National Research Council 2011; Schneider et al. 1998; Socolow 2011).
Naturally, different situations and contexts call for different decision rules in weighing type 1 versus type 2 errors, and type 1 error aversion is beneficial in certain circumstances. Moreover, uncertainty must be recognized as multifaceted and textured. Brian Wynne described four kinds of uncertainty: 1) “risk,” where we know the odds and where system behavior and outcomes can be defined as well as quantified through probabilities; 2) “uncertainty,” where system parameters are known, but not the odds or probability distributions; 3) “ignorance,” meaning risks that escape recognition; and 4) “indeterminacy,” which captures elements of the conditionality of knowledge and contextual scientific, social, and political factors (Wynne 1992). Thus, the risks that arise from uncertainty under these conditions of postnormal science have material implications. Incomplete presentation of the full range of possible outcomes (likelihood compounded by consequence) can lead to a lack of preparedness, loss of livelihoods or lives, and economic damage.
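The effect of omitting a poorly constrained upper tail on a “likelihood compounded by consequence” calculation can be sketched numerically. The example below is purely illustrative: the scenario probabilities and losses are hypothetical numbers in arbitrary units, not values drawn from the IPCC or from any study cited here, and the function names are our own.

```python
def expected_loss(scenarios):
    """Sum of probability times consequence over (probability, loss) pairs."""
    return sum(p * loss for p, loss in scenarios)


def truncate_tail(scenarios, max_loss):
    """Drop outcomes with loss above max_loss and renormalize the remaining probabilities."""
    kept = [(p, loss) for p, loss in scenarios if loss <= max_loss]
    total = sum(p for p, _ in kept)
    return [(p / total, loss) for p, loss in kept]


# Hypothetical outcomes as (probability, loss in arbitrary units).
# The last entry stands in for a low-probability, high-consequence tail.
SCENARIOS = [(0.50, 1.0), (0.30, 3.0), (0.15, 10.0), (0.05, 100.0)]

if __name__ == "__main__":
    full = expected_loss(SCENARIOS)
    truncated = expected_loss(truncate_tail(SCENARIOS, max_loss=10.0))
    print(f"expected loss, full distribution: {full:.2f}")
    print(f"expected loss, tail omitted:      {truncated:.2f}")
```

With these made-up numbers, the single tail outcome carries only 5% of the probability but most of the expected loss, so leaving it out because its likelihood is hard to assess lowers the apparent risk by more than half.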
Error and uncertainty are inherent to all science, scientific inquiry, and policy decision making. Furthermore, various mobilizations of uncertainty and varied interpretations of risk have long played a critical part in making climate change meaningful in civil society. Climate science, especially the IPCC assessments, is considered a leader in the treatment of uncertainty in a highly complex and societally relevant research field (Morgan and Mellon 2011). Thus, lessons learned in climate science regarding treatment of uncertainty and type 1/2 errors may also be applicable in other policy-relevant fields, such as medicine. While considerations of type 1 and type 2 errors sometimes fall outside the typical approach to uncertainty characterization, several steps would help better address an asymmetry of error:
First, as part of an awareness of one's own epistemological biases, treatment of type 2 error as error is critical.
Second, reporting the full range of possible outcomes, even if improbable, controversial, or poorly understood, is essential if it is “not implausible.”
Third, drawing on information from diverse sources, especially in scientific assessment such as the IPCC, can help avoid type 2 errors.
Finally, better use of formal expert elicitation analysis can provide a full spectrum of possible impacts, supplement other data sources, and help avoid type 2 errors.
The IPCC has made progress in opening the door to some of these areas. The most recent uncertainty guidance document covers some of the above-mentioned steps and states that “findings can be constructed from the perspective of minimizing false-positive (type 1) or false-negative (type 2) errors, with resultant tradeoffs in the information emphasized” (Mastrandrea et al. 2013, p. 1). Furthermore, the expert elicitation literature is also expanding in its treatment of major climate system uncertainties. A recent study on sea level rise based on elicitation analysis of 90 experts estimated the range of sea level rise by 2100 at 40–120 cm (Horton et al. 2014), with upper bounds above the current IPCC “likely” range (Fig. 2).
Regardless of the fate of the IPCC periodic reports, assessments of climate science will continue and will be aimed at providing rigorous risk assessment of climate change impacts. Ultimately, awareness among climate scientists of both type 1 and type 2 errors will best advance the field and help provide accurate and nuanced risk assessment for decision makers.
This paper is dedicated to Stephen H. Schneider, whose wisdom on epistemology and uncertainty inspires us continually. We thank M. Mastrandrea and K. Mach for the helpful discussion of the concepts and their comments on the manuscript. W. R. L. A. was supported in part by an award from the Department of Energy (DOE) Office of Science Graduate Fellowship Program (DOE SCGF) and by the NOAA Climate and Global Change Postdoctoral Fellowship Program, administered by the University Corporation for Atmospheric Research.