1. Introduction
Forecasts for major weather events often begin days in advance. The weather models upon which forecasts are based update frequently and generally grow more accurate as lead times decrease (Lazo et al. 2009; Wilson and Giles 2013). However, meteorologists are sometimes reluctant to update forecasts provided to the public out of fear that inconsistency in subsequent forecasts will be confusing and negatively affect user trust. “Inconsistency” in this context means that the most recent forecast (e.g., 10 in. of snow accumulation; 1 in. = 2.54 cm) differs from the previous forecast (e.g., 2 in. of snow accumulation) for the same target date (next Saturday). In fact, maintaining consistency in forecasts is considered best practice by some institutions, such as the National Oceanic and Atmospheric Administration (NOAA 2016). Yet, because forecasts tend to grow more accurate as lead time decreases, the choice to maintain consistency can come at a cost to accuracy (how closely the forecast matches the outcome).
There is strong evidence that forecast inaccuracy reduces trust. For instance, in a study in which participants used overnight low-temperature forecasts to make road-salting decisions, they rated trust significantly higher and took protective action more often with low-error forecasts than with high-error forecasts (Joslyn and LeClerc 2012). Similarly, in a study in which participants used reports from financial analysts to make investment decisions, participants rated competence, trust, and likelihood of buying future reports higher for accurate financial analysts than for inaccurate financial analysts (Kadous et al. 2009). Also, relative to patients who imagined receiving accurate initial mammogram test results, those who imagined receiving false-positive breast cancer test results reported reduced trust and increased likelihood to delay future mammography (Kahn and Luce 2003). Even preschoolers show reduced trust in inaccurate relative to accurate informants (Pasquini et al. 2007; Ronfard and Lane 2018).
By contrast, evidence on the effect of inconsistency in forecasts is sparse and comes largely from nonweather domains. For instance, consumers believe that consistency between two estimates from the same source is a signal of skill (Falk and Zimmermann 2017). There is also evidence that information about an event from multiple sources is preferred when it is in agreement as opposed to conflicting, all else being equal (Smithson 1999). Moreover, confidence in one’s own decision is higher when based on information from financial advisors who agree with one another as opposed to those who do not agree (Budescu et al. 2003).
There is also recent evidence from our own laboratory that speaks to the effect of inconsistency in predictions on trust and decision-making. For example, one study that manipulated forecast consistency in sequential thunderstorm and snow forecasts from a single source found that consistent (relative to inconsistent) forecasts led to greater trust (Losee and Joslyn 2018). There is also research comparing the impact of inconsistency with that of inaccuracy, suggesting that sequential forecast inconsistency reduces user trust but inaccuracy has a larger negative effect on trust (Burgeno and Joslyn 2020). In these experiments, participants based their school-closure decisions on snow-accumulation forecasts (e.g., Monday forecast: 4 in. of snow on Wednesday) from a single source, two days and one day in advance of an anticipated storm. Not only was inaccuracy more detrimental to trust in the forecast, but inconsistency appeared to provide useful information. It increased participants’ uncertainty expectations, reflected in a wider range of expected outcomes, and led to more conservative closure decisions. An inaccuracy by inconsistency interaction effect suggested that differences in trust due to inconsistency shrank when forecasts were inaccurate. In other words, the reduction in trust due to inaccuracy was so substantial that inconsistency had little additional impact.
At least part of the reason that inconsistency is less detrimental to trust in sequential forecasts may be the fact that, when forecasts are inconsistent, people understand that the most recent forecast is more likely to be more accurate and regard it as a replacement for an earlier forecast. Indeed, prior research on sequential forecasts suggests that participants’ best estimates were more heavily influenced by recent forecasts (1 day in advance) than initial forecasts (2 days in advance), suggesting that participants expected the most recent forecast to be more reliable and weighted it more heavily (Burgeno and Joslyn 2020).
The relative effects of inconsistency and inaccuracy have also been compared in an experiment based on snow forecasts from two different sources provided at the same time, one day in advance of an anticipated storm. It revealed that while inaccuracy significantly reduced trust, inconsistency between the two sources did not (Su et al. 2021). In fact, participants incorporated information from both sources equally in their outcome estimates and appeared to glean useful information from inconsistencies. As with inconsistent sequential forecasts (Burgeno and Joslyn 2020), inconsistencies led participants to infer greater uncertainty and make more cautious decisions. Therefore, inconsistency appears to be less problematic for trust than inaccuracy in both sequential forecasts coming from the same source and simultaneous forecasts from different sources, and it may provide useful information.
It is important to note that a particular kind of trust was measured in this line of work, referred to as “calculative trust.” There are at least two kinds of trust that could be affected: 1) relational trust, representing the social bond between the trustor and the trustee, which is based on factors such as the forecast providers’ intentions, attitudes, or goals; and 2) calculative trust, sometimes called “confidence,” which is based on factors directly related to the quality of the forecast and derived from factors such as past performance (Siegrist et al. 2005; Twyman et al. 2008; Earle 2010). The kind of trust tested in the work reported below was also “calculative” trust.
Although the research on inconsistency reviewed above is both important and foundational, it is crucial to note that to isolate the effects of inaccuracy and inconsistency, all of the experiments cited above from our own laboratory used highly controlled forecast stimuli, limiting the range of forecast values and closely matching the degrees of inaccuracy and inconsistency at small amounts (about 2 in.). In other words, both inconsistency and inaccuracy were essentially categorical variables (either inconsistent or consistent, inaccurate or accurate). Moreover, exactly one-half of forecast pairs were inconsistent and the other one-half were consistent. Similarly, exactly one-half of forecasts in each consistency category were inaccurate and one-half were exactly accurate. That raises the question, “Will the same effects be observed in more realistic forecast situations in which forecasts vary naturally and take on a wider range of values?” Indeed, the degree of inconsistency may be crucial. For instance, the impact on users may be greater if the snow-accumulation forecast decreases from 7 to 1 in. as compared with from 3 to 1 in. in the subsequent forecast. This may, in turn, translate into a greater impact on trust. Indeed, for some users, small inconsistencies may not be regarded as inconsistency at all but rather as an informative update. Larger inconsistencies, however, may have a qualitatively different impact. In addition, there may be relationships between forecast values, inconsistency, and inaccuracy in actual forecasts that may also be relevant. The experiment reported here was designed to evaluate the relative impact on trust of forecast inconsistencies and inaccuracies that vary naturally and take on a wide range of values.
The other question this work was designed to answer is whether there is a benefit to adding an uncertainty estimate to inconsistent forecasts. By “uncertainty estimate” in this context we mean a probabilistic forecast (e.g., 30% chance) indicating the likelihood of a particular outcome. Although forecast inconsistency may reduce trust in some situations, it may be possible to preserve trust in the face of inconsistency by adding a probabilistic forecast, as has been shown with inaccuracy. For example, in the road-salting study (Joslyn and LeClerc 2012) mentioned above, probabilistic forecasts reduced the negative effects of forecast inaccuracy on both trust and decision-making. When provided the probability of observing temperatures at or below the decision threshold (in addition to single-value forecasts) participants rated trust higher than those who received single-value low-temperature forecasts alone. There are likely two main reasons for this effect. First, the acknowledgment of uncertainty may make the forecast seem “less wrong” when it fails to verify, preserving trust in the face of forecast error. Second, people have an intuitive understanding of the uncertainty inherent in weather forecasts, even when not specified (Joslyn and Savelli 2010). Therefore, a forecast that makes the uncertainty explicit may seem more honest. Moreover, in these experiments (Joslyn and LeClerc 2012), participants made better decisions from an economic standpoint when they were provided with probabilistic forecasts. In another study, probabilistic forecasts preserved trust to a greater degree than did lowering false alarm rates. They also increased compliance with weather warnings (LeClerc and Joslyn 2015).
In yet another set of studies, probabilistic forecasts added to flood warnings enhanced subjective understanding of flood likelihood and reduced recency biases relative to a return-period expression (e.g., 10-yr flood) and to a no-information control group (Grounds et al. 2018). Thus, a growing body of evidence suggests that laypeople can use explicit probabilistic information and that it may offer several benefits in the decision-making process, not least of which is preserving trust. Therefore, uncertainty estimates may attenuate the loss of trust due to forecast inconsistency. However, for these benefits to be observed, it may be necessary for probabilistic forecasts to be reliable. In one study [experiment 1 of Burgeno and Joslyn (2020)] when targeted outcomes were observed 50% of the time regardless of the probability predicted, no effect of including probabilistic forecasts (as compared with single-value forecasts) was observed. Thus, the experiment reported here was designed to test whether including reliable probabilistic forecasts preserves trust in the face of forecast inconsistency.
In sum, the experiment reported here was designed to test whether the reduction in trust due to forecast inconsistency extends to inconsistency values that vary naturally and, if so, whether the reduction in trust is attenuated by including uncertainty estimates. It also tested the impact of these factors on participants’ own outcome estimates and decision quality. This experiment employed the school-closure paradigm described above (Burgeno and Joslyn 2020). However, the new experiment used entirely different, realistic forecast stimuli and was conducted 2 years later with a different group of participants. The participants’ goal was to decide, based on a sequence of snow forecasts taken from historical records, whether it was appropriate to close schools due to a snowstorm based on a 6-in.-or-more accumulation rule. One-half of participants received probabilistic forecasts in addition to the single-value snow-accumulation amount to determine the impact of probabilistic forecasts on trust and decision quality.
We hypothesized that probabilistic forecasts would enhance trust and that inconsistency and inaccuracy would reduce trust. Furthermore, we hypothesized that probabilistic forecasts would attenuate the negative effects of forecast inconsistency and inaccuracy on trust and enhance decision quality. We predicted that inconsistency would be interpreted as indicating greater uncertainty in the forecast, reflected in a wider range of expected outcomes, and we tested whether this would be affected by probabilistic forecasts. Last, we hypothesized that more-recent forecasts, in inconsistent pairs, would have a greater impact on participants’ accumulation estimates. Hypotheses were preregistered on Open Science Framework and can be viewed online (https://osf.io/dv6j8).
2. Method
a. Participants
A total of 420 University of Washington psychology students participated for course credit and the opportunity to earn a cash bonus. After executing data-cleaning procedures (described below), data from 398 participants (62% female; mean age = 19.5) remained and were included in the analyses below.
b. Procedure
Participants first gave informed consent and provided their age and gender. Next, they read and listened to instructions spoken by the experimenter that explained the computer-based task1 (appendix A). Participants were asked to advise schools on whether to close because of an anticipated snowstorm based on weather forecasts provided by “a private weather service that specializes in local predictions.” Although several factors are considered when actual closure decisions are made, in this simplified task, the decision was based on snow accumulation alone. Participants were instructed to advise closing if they expected 6 in. or more of snow accumulation. Participants provided school closure advice for 65 schools across the region for each of two hypothetical winter periods for a total of 130 trials. Each week was described as involving a different school district to encourage participants to regard the trials as independent of one another.
To better simulate actual weather-related decisions that have real consequences, a point system was used. Participants’ ending point balance was converted to cash at the conclusion of the experiment to encourage them to put forth their best effort. Participants began with a virtual budget of 332 points. Their goal was to retain as many points as possible. Closure recommendations cost 2 points to reflect the cost of makeup days. There was no cost for recommending that a school stay open; however, if participants advised staying open and 6 in. or more of snow was observed, they incurred a 6-point penalty to reflect the risk of accidents and injuries due to dangerous road conditions. Notice that, as with many real-world weather-related decisions, the cost of protection is less than the potential cost of the adverse weather event.
Participants earned a cash bonus based on their ending point balance, at the rate of $1 for every 32 points by which the final balance exceeded 72 points. A 72-point threshold was selected to discourage the simplistic and unrealistic strategy of recommending closure for every trial, which would result in a final balance at the payment threshold of 72 points.2 In addition to providing real consequences for the decisions made, this point system held constant the cost and the penalty across participants. In other words, unlike many real-life weather threats, for which the cost of protection or the vulnerability to consequences may be greater for some, in this context, it was the same across participants, reducing statistical noise and allowing us to better detect differences due to the forecasts alone.
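The point system described above can be sketched in a few lines. The following is a minimal illustration rather than the software used in the experiment: the function and constant names are hypothetical, and paying fractional dollars in proportion to the points earned is an assumption, because the text states only the rate.

```python
# Illustrative sketch of the point system; names are hypothetical.
START_BALANCE = 332      # starting virtual budget (points)
CLOSE_COST = 2           # cost of recommending closure
MISS_PENALTY = 6         # penalty for staying open when 6 in. or more is observed
CLOSE_THRESHOLD_IN = 6   # closure rule (inches of snow)
BONUS_THRESHOLD = 72     # balance above which points convert to cash
POINTS_PER_DOLLAR = 32   # conversion rate


def trial_cost(closed, observed_in):
    """Points lost on one trial, given the advice and the observed snowfall."""
    if closed:
        return CLOSE_COST
    return MISS_PENALTY if observed_in >= CLOSE_THRESHOLD_IN else 0


def final_balance(decisions, outcomes):
    """Ending balance after a sequence of (closed?, observed inches) trials."""
    spent = sum(trial_cost(c, o) for c, o in zip(decisions, outcomes))
    return START_BALANCE - spent


def cash_bonus(balance):
    """Dollar bonus; proportional payout is an assumption, not stated in the text."""
    return max(0, balance - BONUS_THRESHOLD) / POINTS_PER_DOLLAR


# Closing on all 130 trials leaves exactly the 72-point threshold, so no bonus.
always_close = final_balance([True] * 130, [0] * 130)
```

Under these assumptions, the always-close strategy ends at 332 − 130 × 2 = 72 points, which is why it earns nothing.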
For every trial, participants based their school-closure decision on two snow forecasts for Wednesday, one issued on Monday (2 days prior to the event) and one on Tuesday (1 day prior to the event). Forecasts were presented sequentially, centered on separate screens with the current weekday in the top-left-hand corner in boldface font. To determine how the forecasts influenced participants’ estimates, participants were asked to report the number of inches of snow they expected for Wednesday as well as the least (minimum estimate, “as little as”) and greatest (maximum estimate, “as much as”) number of inches that they would not be surprised by. Then, participants rated their trust in the forecast “to help them make their [school closure] decision” on a 6-point drop-down menu, from “not at all” to “completely.” Notice that this question asks participants to focus on the quality of the information itself rather than the source of the information. See appendix B for the exact wording of each question. The current point balance was displayed in the bottom left-hand corner of each screen. When participants completed all four questions, they clicked a “next” button in the bottom right-hand corner of the screen to progress to the next screen and could not go back and change responses on the previous screen. Then, the second forecast was shown, and participants answered the same four questions with respect to the second forecast. Next, the decision screen appeared. The current day (Tuesday) was displayed in boldface font in the top-left-hand corner with two buttons in the middle of the screen labeled “close” and “stay open.” Below each respective button was a reminder of the associated point cost and that “close” meant, “I think snow accumulation will be 6 in. or more,” whereas “stay open” meant, “I think snow accumulation will be less than 6 in.”
After participants submitted their school-closure decision, a fourth screen appeared saying that the school followed their advice and either stayed open or closed. The observed snow accumulation on Wednesday was displayed, and the resulting cost or penalty was shown (unless neither occurred). Participants’ point balance and, if applicable, the penalty incurred were displayed in the bottom-left corner of the screen. Participants again rated their trust in the forecasts using the same pull-down menu. In sum, each trial consisted of four screens: 1) Monday forecast for Wednesday, 2) Tuesday forecast for Wednesday, 3) Tuesday night school-closure decision, and 4) Wednesday outcome. Then, the next trial began with a new set of forecasts and outcome that pertained to a school in a different district. Participants completed four practice trials before the test trials began (see appendix A).
c. Forecast stimuli
The snow-accumulation forecasts (48 and 24 h in advance), the probabilities of 6 in. or more of accumulation, and the observed 24-h snow-accumulation outcomes were based on data obtained from the Eastern Region Headquarters of NOAA. The original set of 160 forecasts pertained to a snowstorm that occurred in several locations over the eastern United States on 9 February 2017.3 In the experiment we used 130 of these, treating each pair of forecasts and outcomes as a separate event. All single-value forecasts and observed accumulation amounts were rounded to the nearest inch. Although some other small changes were made (described below), the vast majority of trials4 included original forecast values, and all outcome values were identical to the original historical forecast set. As a result, forecasts varied naturally in terms of snow-accumulation totals, accuracy, and consistency. Moreover, the critical characteristics of the original forecast dataset were maintained (see appendix C).
1) Consistency
Consistency was defined as an exact match between the Monday and Tuesday forecasts. All inconsistent trials were inconsistent by 1 in. or more, with a range from −8 to 8 in. and a mean of 2.86 in. Because the original dataset had few exactly consistent pairs, their number was increased by making slight changes to the second forecast in 9 pairs (7%) in which the initial differences were small. The original cases pertained to a single weather event in which the expected accumulation increased over time, so there were very few descending forecast pairs (4%; N = 7). This was problematic because some anecdotal evidence5 suggests that forecasters are more likely to alter downgraded forecasts (descending in this case) to maintain consistency. Therefore, we increased the proportion of descending forecasts by flipping the order of 19 inconsistent forecast pairs. As a result, in the forecast set used here, 40 (61%) of the inconsistent trials were ascending (values increased from first to second forecast) and the remaining 26 (39%) were descending (values decreased from first to second forecast).
2) Accuracy
Accuracy was gauged relative to the second (Tuesday) forecast. By this standard, the proportion of exactly accurate forecasts was similar to that of the historical forecast set (see appendix C). All inaccurate trials were inaccurate by 1 in. or more. Inaccuracies ranged from −6 to 10 in. and had a mean of −1.02 in. Thus, like the historical forecast set, inaccurate forecasts were positively (high) biased by about an inch, and fewer than 20% crossed the 6-in. decision threshold (e.g., a second forecast of 7 in. and an observed snow accumulation of 5 in.).
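The consistency and accuracy definitions above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sign convention for inaccuracy (observed minus forecast, so that over-forecasts are negative) is an assumption inferred from the negative mean being described as a high bias.

```python
# Illustrative sketch of the trial classifications; names are hypothetical.

def classify_consistency(forecast_1_in, forecast_2_in):
    """Label a Monday/Tuesday forecast pair and return the signed change."""
    change = forecast_2_in - forecast_1_in
    if change == 0:
        return "consistent", change
    return ("ascending" if change > 0 else "descending"), change


def forecast_error(forecast_2_in, observed_in):
    """Signed inaccuracy of the second forecast.

    Assumed convention: observed minus forecast (negative = over-forecast).
    """
    return observed_in - forecast_2_in


def crosses_threshold(forecast_2_in, observed_in, threshold_in=6):
    """True when forecast and outcome fall on opposite sides of the closure rule."""
    return (forecast_2_in >= threshold_in) != (observed_in >= threshold_in)


# Example from the text: a second forecast of 7 in. with 5 in. observed.
example_error = forecast_error(7, 5)
example_crossing = crosses_threshold(7, 5)
```

In this example the forecast is 2 in. too high and falls on the opposite side of the 6-in. decision threshold from the outcome.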
3) Probabilistic forecasts
For one-half of participants, the forecast also included the probability of 6 in. or more of snow. The probabilities were also based on those provided in the historical dataset. However, it was important to first test the impact of well-calibrated probabilistic forecasts; otherwise, null effects could be due to either the genuine lack of an effect or simply the lack of an effect for uncalibrated probabilities. This was especially true of the most recent forecast used as the standard for accuracy. Therefore, some second-forecast probabilities were altered slightly so that forecast probabilities for 6 in. or more of snow accumulation roughly matched the frequency of observing 6 in. or more of snow. See appendix D for the calibration procedures and appendix C for forecast characteristics.
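One generic way to check whether forecast probabilities "roughly match" observed frequencies is a reliability table. The sketch below is a textbook-style check and is not the calibration procedure of appendix D, which is not reproduced here; the function name, the 10-point bin width, and the use of integer percentages are all assumptions made for illustration.

```python
from collections import defaultdict

# Generic reliability (calibration) check; NOT the authors' appendix D procedure.

def reliability_table(prob_percent, exceeded, bin_width=10):
    """Compare forecast probabilities with observed frequencies, by bin.

    prob_percent: integer forecast probabilities (0-100) of 6 in. or more.
    exceeded:     booleans, True when 6 in. or more was observed.
    Returns {bin lower edge: (mean forecast %, observed frequency %, n)}.
    """
    bins = defaultdict(list)
    top_bin = (100 // bin_width) - 1
    for p, hit in zip(prob_percent, exceeded):
        idx = min(p // bin_width, top_bin)  # put 100% into the top bin
        bins[idx * bin_width].append((p, hit))

    table = {}
    for edge, pairs in sorted(bins.items()):
        n = len(pairs)
        mean_forecast = sum(p for p, _ in pairs) / n
        observed_freq = 100 * sum(1 for _, hit in pairs if hit) / n
        table[edge] = (mean_forecast, observed_freq, n)
    return table


# Ten hypothetical trials with a 30% forecast, three of which verified.
demo = reliability_table([30] * 10, [True] * 3 + [False] * 7)
```

For a well-calibrated set, the mean forecast probability and the observed frequency in each bin are close to one another.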
d. Design
A single-factor (forecast format), between-participants design was used. One-half of the participants received a single-value forecast, and the other one-half received the same single value and the probability of 6 in. or more of snow accumulation (e.g., “. . . 4 inches of snow. . . however, there’s a 30% chance of 6 or more inches of snow”). We refer to the former as deterministic in that they imply an exact outcome (e.g., “. . . 4 inches of snow”) and the latter as probabilistic. Thus, other than the additional probability of observing 6 in. or more of snow, the forecasts and outcomes seen by both groups of participants were identical. Forecasts were presented in one of four fixed orders.6
Participants were randomly assigned to one of two forecast format conditions and one of four forecast orders. In the analyses reported below, forecast values, the magnitude of inconsistency and inaccuracy, and the economically optimal decision (see closure-decision analysis below) were also included as predictor variables. The outcome variables were trust rating, closing schools or not (closure decisions), and snow-accumulation estimates.
3. Results
Prior to conducting the main analyses, we eliminated participants who did not understand the task, were not paying attention, or were not taking the task seriously. To this end, participant data were excluded if 1) they provided a maximum snow-accumulation estimate that was lower than their minimum estimate, 2) their average best estimate was unreasonably large, or 3) their highest day-2 maximum or minimum estimate was unreasonably large. “Unreasonably large” was defined as greater than the national record accumulation amount for lowland (200 m or less above sea level) snowfall (49 in.; National Climatic Data Center 2019). Twenty-two participants were excluded in this procedure, leaving 398 participants in the following analyses.
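The exclusion rules can be expressed as a simple filter. The sketch below is one possible reading and not the authors' cleaning script: the record layout (a list of per-trial dictionaries with day-2 estimates) and the function name are hypothetical, and applying rule 1 to any single trial is an assumption.

```python
# One possible reading of the exclusion rules; data layout is hypothetical.
RECORD_LOWLAND_SNOW_IN = 49  # national record cited in the text (inches)


def should_exclude(trials):
    """Return True if a participant meets any exclusion criterion.

    trials: list of dicts with keys 'best', 'min', 'max' (estimates in inches).
    """
    # Rule 1 (assumed to apply to any single trial): maximum below minimum.
    reversed_range = any(t["max"] < t["min"] for t in trials)
    # Rule 2: average best estimate above the record amount.
    mean_best = sum(t["best"] for t in trials) / len(trials)
    # Rule 3: highest maximum or minimum estimate above the record amount.
    highest_bound = max(max(t["max"], t["min"]) for t in trials)
    return (
        reversed_range
        or mean_best > RECORD_LOWLAND_SNOW_IN
        or highest_bound > RECORD_LOWLAND_SNOW_IN
    )


plausible = [{"best": 4, "min": 2, "max": 7}, {"best": 6, "min": 5, "max": 9}]
```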
a. Trust
1) Hypotheses
The primary hypotheses for this research concerned whether trust was impacted by access to probabilistic forecasts, inconsistency between two consecutive forecasts for the same event, inaccuracy of the most recent forecast (when compared with the outcome), or interactions among these variables.7 We hypothesized the following:
- H1: Probabilistic forecasts would increase trust in forecasts relative to deterministic forecasts.
- H2: Inconsistency would reduce trust in forecasts.
- H3: Inaccuracy would reduce trust in forecasts.
- H4: The negative effect of forecast inconsistency on trust would be attenuated by the inclusion of a probabilistic forecast.
- H5: The negative effect of forecast inaccuracy on trust would be attenuated by the inclusion of a probabilistic forecast.
2) Data analysis plan
Because of our interest in the effects of inaccuracy on trust (as well as inconsistency and probabilistic forecasts), we analyzed the postoutcome trust measure, at which point forecast accuracy was known to participants (question 6 in appendix B).8 Trust was an ordinal variable. Therefore, it was analyzed with a series of generalized estimating equations (GEEs) using cumulative link proportional odds regression models (see appendix E for model details and regression tables), which are designed to model ordinal data and population-averaged (between group) effects. To conduct these analyses, we used the multgee package (Touloumis 2015) for R. We specified an “independence” working correlation structure9 and robust standard errors to build in resistance to possible misspecifications of the working correlation structure.10 For this and all subsequent analyses, an alpha level of 0.05 was used to determine statistical significance.
3) Trust ratings
As hypothesized, probabilistic forecasts increased trust (Table E1 in appendix E). The estimated association between forecast format and trust was significant such that when trials included probabilistic forecasts, as compared with equivalent trials with deterministic forecasts (inaccuracy and inconsistency held constant), the odds of reduced trust decreased (trust increased) by approximately 20% [estimated odds ratio = 0.80, 95% confidence interval (CI) = (0.65, 0.99), and p = 0.04].
Contrary to our predictions, inconsistency (mismatch between forecast 1 and 2) appeared to slightly increase (rather than decrease) trust (Table E1 in appendix E). The estimated association between inconsistency and trust ratings was significantly positive, such that a 1-in. difference between forecast 1 and 2, when compared with otherwise equivalent trials (inaccuracy and format held constant), decreased the odds of trust reduction (increased trust) by approximately 7% [estimated odds ratio = 0.93, 95% CI = (0.92, 0.94), and p < 0.001].
Meanwhile, inaccuracy, the degree of mismatch between forecast 2 and the outcome, appeared to decrease trust as predicted (Table E1 in appendix E). The estimated association between forecast inaccuracy and trust was significantly negative, such that a 1-in. difference between forecast 2 and the observed accumulation, when compared with otherwise equivalent trials (inconsistency and format held constant), increased the odds of trust reduction (reduced trust) by approximately 15% [estimated odds ratio = 1.15, 95% CI = (1.14, 1.17), and p < 0.001]. To reiterate, although the effect of inaccuracy on trust confirmed our hypothesis, the effect of inconsistency did not. Inaccuracy had a negative association with trust (decreased trust), whereas inconsistency had a slight positive association with trust (increased trust).
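The percentage changes quoted with the odds ratios above follow from a simple conversion, sketched here for readers less familiar with proportional odds models. The function names are hypothetical. The multi-inch calculation assumes that the per-inch odds ratio compounds multiplicatively, which holds when the predictor enters the model linearly; that is an assumption about the model form, not a result reported in the text.

```python
# Converting an odds ratio into a percent change in odds; names are hypothetical.

def percent_change_in_odds(odds_ratio):
    """Percent change in the odds of reduced trust per unit of the predictor."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_for_difference(per_inch_odds_ratio, inches):
    """Odds ratio for a multi-inch difference, assuming a linear predictor."""
    return per_inch_odds_ratio ** inches


format_effect = round(percent_change_in_odds(0.80))         # probabilistic format
inconsistency_effect = round(percent_change_in_odds(0.93))  # per inch
inaccuracy_effect = round(percent_change_in_odds(1.15))     # per inch
three_inch_inaccuracy = odds_ratio_for_difference(1.15, 3)
```

Values below zero indicate lower odds of reduced trust (higher trust), and values above zero indicate higher odds of reduced trust.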
Previous research suggested that inconsistent forecasts had a smaller effect on trust when forecasts were inaccurate (Burgeno and Joslyn 2020). To better understand this relationship with naturalistic forecasts incorporating a wider range of inconsistencies and inaccuracies, we conducted exploratory analyses with inaccuracy dichotomized at 3 in. (roughly the mean of inaccuracies; see Tables E6 and E7 in appendix E). In these data, the strength of the positive association between inconsistency and trust differed significantly across levels of forecast accuracy (p < 0.001), such that it was stronger (interaction odds ratio farther from 1) for trials with greater inaccuracy [more than 3 in. from the outcome; estimated odds ratio = 0.80 and 95% CI = (0.78, 0.82)], when compared with equivalent trials (forecast format held constant) with less inaccuracy [less than 3 in.; estimated odds ratio = 0.93 and 95% CI = (0.92, 0.94)]. This suggests that, as with previous research, at low forecast inaccuracy there was an association between inconsistency and trust. However, as the inaccuracies increased (greater than 3 in.; not tested in previous research) the effect on trust was greater. In contrast with previous research, the association between trust and inconsistency was positive. Therefore, the positive effect of inconsistency on trust was greater when inaccuracy was greater.
The association between inconsistency and trust also differed significantly across forecast format (p < 0.001). The positive association between inconsistency and trust was stronger (farther from odds ratio = 1) for trials that included probabilistic forecasts [estimated odds ratio = 0.90; 95% CI = (0.88, 0.91)], when compared with equivalent trials (inaccuracy held constant) with deterministic forecasts [estimated odds ratio = 0.96 and 95% CI = (0.94, 0.97); see Tables E2 and E3 in appendix E]. In other words, probabilistic forecasts were associated with a stronger increase in trust due to inconsistency when compared with equivalent trials with deterministic forecasts (for the general pattern, see Fig. 1a).
Similarly, in support of our hypothesis, the association between inaccuracy and trust differed significantly by format (p = 0.04). The negative association was weaker for trials that included probabilistic forecasts [estimated odds ratio = 1.13; 95% CI = (1.12, 1.15)], when compared with equivalent trials (consistency held constant) with deterministic forecasts [estimated odds ratio = 1.17 and 95% CI = (1.14, 1.19); see Tables E4 and E5 in appendix E]. In other words, as hypothesized, probabilistic forecasts attenuated the negative effect of inaccuracy on trust, when compared with equivalent trials with deterministic forecasts (for the general pattern, see Fig. 1b).
Taken together, these results suggest that, here, with more realistic forecasts, unlike previous experiments, inconsistency increased rather than decreased trust, and the impact was greater with greater inaccuracies. However, the rest of our predictions were confirmed. Probabilistic forecasts increased trust and interacted with the effects on trust due to both inconsistency and inaccuracy. Probabilistic forecasts enhanced the positive association between inconsistency and trust. At the same time, probabilistic forecasts attenuated the reduction in trust due to inaccurate forecasts.
b. Accumulation estimates, ranges, and closure decisions
Next we examined participants’ closure decisions and snow-accumulation estimates. We hypothesized the following:
- H6: Probabilistic forecasts would enhance decision quality, defined here as the expected value of the decision (see calculation below), and greater differentiation of closure decisions across the decision-threshold value (see below).
- H7: Inconsistency would increase uncertainty expectations, defined here as the range of anticipated outcomes (“as much as,” “as little as”), and increase decision quality. We also asked what further impact forecast format (deterministic, probabilistic) would have on uncertainty expectations.
- H8: The most recent forecast (forecast 2) would have a greater impact on participants’ outcome estimates requested after forecast 2 was shown than would the initial forecast (forecast 1), suggesting that participants understood that the most recent forecast was more accurate.
1) Data analysis plan
The continuous variables (decision quality, uncertainty expectations, and snow-accumulation estimates) were analyzed using linear mixed-model regressions.11 A t statistic (coefficient divided by its standard error) and an alpha level of 0.05 were used to determine whether the coefficient of each predictor variable (see appendix G for regression tables) was significantly different from 0, that is, whether the contribution of that predictor was significant. School-closure decisions were analyzed as a binary variable, modeled with a series of binary logistic GEEs (see appendix F for model details and regression tables). To conduct these analyses, we used the geepack package for R (Højsgaard et al. 2005), with robust standard errors. We specified an “independence” working correlation structure and binomial family (see Table 1 for descriptive statistics).
Descriptive statistics of participants’ mean response on key dependent variables within each forecast format condition (deterministic or probabilistic). Here, SD indicates standard deviation, F1 refers to forecast 1, and F2 refers to forecast 2.
2) Decision quality
First, we examined whether probabilistic forecasts improved decision quality. The quality of the participant’s decision was defined as its value, prior to knowing the outcome, referred to as the “expected value” (Bernoulli 1954). We describe it here as the “expected cost” because only losses (cost of closure or penalty) were possible in this task. For each trial, the optimal choice was the one with the least expected cost (Murphy 1977). There were two possible options on every trial, to advise 1) keeping the school open or 2) closing. The expected cost of keeping a school open (there was no actual cost at that point) was the product of the 6-point penalty and the chance of receiving it (the percent chance of 6 in. or more of snow for the second forecast on that trial). The cost of closing a school was the 2 points that participants paid when they selected that option. A 33% chance of 6 in. or more of snow was the break-even point at which the expected cost of staying open (0.33 × 6 = 2) was equal to the cost of closing (2 points). Therefore, whenever the chance of 6 in. or more was greater than 33%, it was optimal to advise closing because the cost of closing was less than the expected cost of staying open. Whenever the chance of 6 in. or more was less than 33%, it was optimal to advise staying open. A difference score was calculated on each trial by subtracting the expected (or actual) cost of the participant’s choice from the expected cost of the optimal choice on that trial (henceforth referred to as expected cost difference). A “0” difference indicates that the participant made the optimal choice. Otherwise, the value is negative. Then, a linear mixed-model regression analysis was conducted on the expected cost difference (appendix G), with forecast format (probabilistic/deterministic), inconsistency, and inconsistency by forecast format interaction entered simultaneously as predictors (footnote 12).
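The expected cost calculation described above can be illustrated with a minimal sketch. The function and constant names are hypothetical and are not taken from the study materials; only the 6-point penalty, the 2-point closing cost, and the sign convention of the difference score come from the text.

```python
PENALTY_POINTS = 6.0      # lost when a school stays open and 6 in. or more falls
CLOSE_COST_POINTS = 2.0   # paid every time closure is advised


def expected_cost(choice: str, p_exceed: float) -> float:
    """Expected cost in points; p_exceed is the chance (0-1) of 6 in. or more."""
    if choice == "close":
        return CLOSE_COST_POINTS          # certain cost, paid up front
    if choice == "open":
        return PENALTY_POINTS * p_exceed  # penalty weighted by its probability
    raise ValueError("choice must be 'open' or 'close'")


def expected_cost_difference(choice: str, p_exceed: float) -> float:
    """Optimal expected cost minus the chosen option's cost (0 or negative)."""
    best = min(expected_cost("open", p_exceed), expected_cost("close", p_exceed))
    return best - expected_cost(choice, p_exceed)


# Break-even probability: the point where both options cost the same.
break_even = CLOSE_COST_POINTS / PENALTY_POINTS

print(round(break_even, 2))                               # 0.33
print(round(expected_cost_difference("open", 0.60), 2))   # -1.6
print(round(expected_cost_difference("close", 0.10), 2))  # -1.4
```

In the two illustrative trials (the probabilities 0.60 and 0.10 are made up for the example), staying open at a 60% chance and closing at a 10% chance are both suboptimal, so both difference scores are negative.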
Confirming our hypothesis, the expected cost difference was smaller (decision quality was better) for probabilistic forecasts than for the deterministic forecasts (Table 1). In particular, shifting from the deterministic to the probabilistic format predicted a 0.06-unit decrease in the expected cost difference [t(51 736) = 10.10; p < 0.001].
There was also an unpredicted increase in decision quality due to inconsistency, although it was smaller than the effect of forecast format. For every 1-unit increase in inconsistency, there was a 0.02-unit decrease in expected cost difference [decision quality was better; t(51 736) = 21.19; p < 0.001]. Additionally, the inconsistency by forecast format interaction was significant, such that the probabilistic forecast reduced the expected cost difference (increased decision quality) for smaller inconsistencies but less so for larger inconsistencies, where decision quality was already higher [t(51 736) = 6.73, p < 0.001, and B = 0.01]. We will return to this issue in the discussion.
To better understand the decision errors participants made, we next examined the difference in participants’ decisions to close schools above and below the optimal decision threshold. As mentioned above, according to expected value theory, it was optimal to close schools whenever the probability of 6 in. or more of snow was 33% or higher and to keep schools open otherwise. By this standard, as is common with decisions that involve only losses (Kahneman and Tversky 1979; footnote 13), most decision errors (65%) were risk seeking (participants kept schools open when they should have closed) as opposed to risk averse (closing schools when they should stay open). Binary logistic GEE models were used to examine the associations between closure decisions (open or close) and forecast format, inconsistency, and a categorical variable that indicated whether the optimal decision was to stay open or close on that trial. Two interactions were also tested: forecast format by inconsistency and forecast format by optimal decision. Thus, there were three models: one with the main effects entered simultaneously and one for each of the two interaction effects (controlling for all main effects; appendix F).
Indeed, participants tended to follow the optimal strategy. The estimated association between optimal decision and actual closure decisions was significantly positive, such that a day-2 forecast probability at or above 33% multiplied the odds of deciding to close by approximately 30, an increase of roughly 2900% [estimated odds ratio = 30.39, 95% CI = (28.30, 32.60), and p < 0.001; see Table F1 in appendix F].
Importantly, as reflected in the expected value analysis, participants made fewer errors with probabilistic forecasts. The estimated association between optimal decision and closure decisions varied significantly across forecast format, p < 0.001. Probabilistic forecasts supported greater differentiation across the decision threshold [estimated odds ratio = 44.2; 95% CI = (39.4, 49.5)] relative to deterministic forecasts [inconsistency held constant; estimated odds ratio = 22.7, 95% CI = (21.2, 24.4); see Tables F2 and F4 in appendix F]. In other words, relative to participants who received deterministic forecasts, those who received probabilistic forecasts had lower odds of deciding to close when it was optimal to keep a school open and higher odds of deciding to close when it was optimal to close.
In contrast, participants closed more often overall as the inconsistency in forecasts increased. The estimated association between inconsistency and closure decision was significantly positive, such that a 1-in. difference between forecast 1 and 2, when compared with otherwise equivalent trials (forecast format and threshold orientation held constant), increased the odds of deciding to close by approximately 44% [estimated odds ratio = 1.44, 95% CI = (1.42, 1.46), and p < 0.001; see Table F1 in appendix F]. In addition, the association between optimal decision and actual closure decisions was stronger for larger inconsistencies than for smaller inconsistencies [estimated odds ratio = 1.68, 95% CI = (1.59, 1.77), and p < 0.001 (footnote 14); see Table F3 in appendix F].
Thus, examination of closure decisions above and below the optimal threshold (33% chance of 6 in. or more) aligned with the expected value analysis. Probabilistic forecasts allowed participants to make better decisions than did deterministic forecasts in both analyses. Participants also made better decisions when forecasts were inconsistent. This was due in part to the fact that inconsistency encouraged them to close the schools more often, an advantage in this task in which people tend to be risk seeking (the majority of errors were failures to close when closing was optimal).
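As a reading aid, the conversion between an odds ratio and a percent change in odds used in these paragraphs can be sketched as follows. The helper names are hypothetical. The compounding over several inches assumes that the inconsistency effect is linear on the log-odds scale, as a logistic model with a single linear term implies; the 3-in. case is an illustration and is not a result reported in the text.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def compounded_odds_ratio(odds_ratio_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, assuming a log-linear effect."""
    return odds_ratio_per_unit ** units


# Reported odds ratios: inconsistency (per inch) and optimal decision.
print(round(percent_change_in_odds(1.44)))       # 44
print(round(percent_change_in_odds(30.39)))      # 2939

# Illustrative only: implied odds ratio for a 3-in. inconsistency.
print(round(compounded_odds_ratio(1.44, 3), 2))  # 2.99
```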
3) Range estimates
The above analysis suggests that participants made better decisions (in this case more conservative; closing schools more often) both when they were provided with explicit uncertainty estimates, the percent chance of 6 in. or more of snow, and when there was greater inconsistency between the day-1 and day-2 forecasts. This latter result could be due in part to the fact that participants interpreted inconsistency as an indication of uncertainty in the forecast. To determine whether this was the case, a range of anticipated outcomes was calculated. This was done by subtracting participants’ minimum (question 2 in appendix B) from their maximum (question 3 in appendix B) estimate of the number of inches that would not surprise them, taken after the second forecast. A wider range of anticipated outcomes suggests greater perceived uncertainty. Then, a linear mixed-model regression was conducted on range of outcomes, with inconsistency, forecast format, and the inconsistency × forecast format interaction entered simultaneously as predictors (see appendix G for regression tables). Confirming our hypothesis, forecast inconsistency tended to increase the range of anticipated outcomes. More specifically, every 1-in. increase in inconsistency predicted a 0.62-in. increase in range [t(51 340) = 88.23; p < 0.001]. The main effect of forecast format did not reach significance [t(404) = 1.39; p = 0.17]. However, the inconsistency by forecast format interaction was significant, such that participants who received probabilistic forecasts expected a smaller range of values for lower-magnitude inconsistencies and a larger range of values for higher-magnitude inconsistencies, relative to participants who received deterministic forecasts [t(51 340) = 10.37, p < 0.001, and B = 0.09; see Fig. 2]. Thus, as predicted, participants expected greater uncertainty with greater inconsistency. In addition, probabilistic forecasts amplified the difference in uncertainty expectations across the range of inconsistency.
4) Snow-accumulation estimates
To determine how the two forecasts influenced a participant’s own expectations of the outcome, we next examined snow-accumulation estimates for Wednesday made after forecast 2 (question 1 in appendix B). A linear mixed-model regression was conducted on snow-accumulation estimates, with three continuous predictor variables (forecast 1 value, forecast 2 value, and inconsistency) and the categorical predictor, forecast format (deterministic/probabilistic), entered simultaneously with the inconsistency by forecast format interaction (footnote 15; see appendix G for regression tables).
As hypothesized, the second forecast was a much better predictor of snow-accumulation estimates than the first forecast. For every 1-unit increase in the second forecast, there was a 0.87-unit increase in estimated snow accumulation [t(51 330) = 322.72; p < 0.001] (footnote 16). In contrast, for every 1-unit increase in the first forecast, there was only a 0.08-unit increase in estimated snow accumulation [t(51 330) = 30.11; p < 0.001]. In addition, there was an unpredicted effect of inconsistency. Inconsistency slightly but significantly reduced estimates. More specifically, every 1-unit increase in inconsistency predicted a 0.03-unit decrease in estimated snow accumulation [t(51 330) = 5.91; p < 0.001]. The main effect of forecast format failed to reach significance [t(411) = 0.72; p = 0.47]. However, the inconsistency by forecast format interaction was marginally significant [t(51 330) = 1.97; p = 0.05], such that the reduction in estimates due to inconsistency was stronger for deterministic forecasts than probabilistic forecasts. In sum, as predicted, these results suggest that participants weighted the most recent forecast roughly 10 times as heavily as the earlier forecast in their own estimate.
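The weighting comparison in the summary sentence follows from the ratio of the two reported regression coefficients. A minimal sketch of that arithmetic, with hypothetical variable names and only the two coefficients taken from the text:

```python
B_FORECAST_1 = 0.08  # change in estimate per 1-unit change in forecast 1
B_FORECAST_2 = 0.87  # change in estimate per 1-unit change in forecast 2


def relative_weight(b_recent: float, b_initial: float) -> float:
    """How many times as heavily the recent forecast is weighted."""
    return b_recent / b_initial


ratio = relative_weight(B_FORECAST_2, B_FORECAST_1)
print(round(ratio, 1))  # 10.9
```

This compares the coefficients only. It does not reproduce the fitted model, which also includes inconsistency, forecast format, and their interaction.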
4. Discussion and conclusions
The experiment reported here is the first to demonstrate the benefits of probabilistic forecasts to enhance both trust in the forecast and decision quality in the face of forecast inconsistency. Participants made better decisions in terms of both increased expected value and fewer decision errors with probabilistic than deterministic forecasts. A closer inspection of decision errors clarified the benefits of the probabilistic forecast. Because only costs and losses were possible in this task, participants made more risk-seeking errors (failing to close schools when it was economically optimal) than risk-averse errors (closing schools when it was not economically optimal). In cost/loss situations such as this, people tend to prefer taking a risk to paying a small cost up front to protect against that risk, even when it is not economically optimal to do so (Kahneman and Tversky 1979). However, the error analysis revealed that those with probabilistic forecasts were less prone to this strategy. They differentiated to a greater degree across the optimal-decision threshold. In other words, when provided with the probability of 6 in. or more of snow, participants closed schools more often when it was economically optimal to do so (probability of 6 in. or more was 33% or more) and kept schools open more often when it was economically optimal to do so (probability of 6 in. or more was less than 33%) when compared with participants using the deterministic forecast alone.
This experiment is also the first to demonstrate the impact of forecast inconsistency on trust and decision-making using naturalistic forecast stimuli. The basic conclusions from these results align remarkably well with those reported in previous highly controlled experiments (Burgeno and Joslyn 2020; Su et al. 2021), suggesting that forecast inconsistency, as defined here, may not be as detrimental to trust as is often assumed.
However, here, in contrast to the highly controlled studies cited above, the results suggest that naturalistic forecast inconsistency may have a positive impact on trust. One potential explanation resides in the set of historical forecasts used here, in which the inconsistent forecasts were predominantly ascending (the second forecast was for greater accumulation than the first). Moreover, the increasing forecast trend tended to be confirmed by the outcome in those trials. For 72% of the ascending trials, the observed accumulation was higher than the most recent forecast. People may have expected the trend to continue, as has been shown in previous research (Hohle and Teigen 2015, 2019; Maglio and Polman 2016), and confirmation of those expectations may have increased trust. Another factor that may have increased trust slightly is that fewer of the inconsistencies between forecasts (31%) crossed the 6-in. decision threshold in this experiment, relative to the highly controlled studies (50% at minimum). Because participants’ decisions depended on whether 6 in. or more was expected, an inconsistency may be less trustworthy when the two forecasts point toward different choices (close/open). Therefore, the slight positive effect of inconsistency on trust found in this experiment may be specific to situations in which there is an ascending trend or the trend in forecasts is confirmed by the result or the inconsistency is less consequential to the decision. Resolving these issues might be a fruitful line of future research, in which such variables could be systematically manipulated to determine their individual impacts on trust.
An alternative, more general explanation is that inconsistency increases trust because it acts as an estimate of uncertainty. As with the prior research (Burgeno and Joslyn 2020), the results reported here demonstrated that participants expected a larger range of outcomes with greater inconsistency, suggesting that they perceived greater uncertainty in these forecasts. However, here, unlike the previous highly controlled studies in which inconsistency was held constant at a few inches, some of the inconsistencies were much larger. This may have enhanced the positive effect of perceived uncertainty on trust. It is clear that an explicit expression of uncertainty increases trust. As with numerous previous experiments (Joslyn and LeClerc 2012; LeClerc and Joslyn 2015; Grounds et al. 2018), the inclusion of the probabilistic forecast increased trust over the single-value forecast. It may be that when uncertainty is acknowledged in some way, either with an explicit uncertainty estimate or implied by the inconsistency in forecasts, the forecast seems less “wrong” when the single-value forecast does not match the observed snow accumulation.
Somewhat surprisingly, forecast inconsistency also increased decision quality slightly, perhaps because it was interpreted as a sign of uncertainty. Forecast inconsistency appeared to encourage greater cautiousness, with participants closing schools more often overall, as was seen in previous research (Burgeno and Joslyn 2020). Greater cautiousness tended to increase decision quality in this task because the majority of errors were risk seeking (failing to close when it was optimal). The increase in cautiousness with inconsistent forecasts seen here may have been due in part to the fact that with these forecast data, greater inconsistency tended to be correlated with higher forecast snow-accumulation totals in forecast 2 (r = 0.39; p < 0.001). However, this could not have been the explanation in the previous research in which inconsistency also increased cautiousness (Burgeno and Joslyn 2020), because forecast values in those experiments were held constant across conditions. Thus, an explanation that accounts best for all of these results is that the increase in decision quality is due to the fact that inconsistency acts to signal uncertainty, which promotes cautiousness. Regardless of the reason, it is important to note that the positive effect of inconsistency on closure decisions differed qualitatively from that of probabilistic forecasts, which was more precise. Probabilistic forecasts, because they specified the percent chance of snow accumulation surpassing the decision threshold (6 in. or more), increased closure decisions mainly when it was optimal to do so and not otherwise.
We were also interested in how people integrate information from differing forecasts to form their own estimates. In line with the previous research on sequential forecasts (Burgeno and Joslyn 2020), participants’ snow-accumulation estimates were influenced more strongly by the second than by the first forecast values. In other words, although participants did not completely disregard the first forecast, they appeared to understand that the most recent forecast should take precedence. There are at least two possible explanations for this. It may be due to extensive extraexperimental experience with real weather forecasts, leading to many, oftentimes correct, intuitions about forecasts (Morss et al. 2008; Joslyn and Savelli 2010; Savelli and Joslyn 2012). However, it is important to note that our forecast stimuli were realistic in the sense that second forecasts [mean inaccuracy = 2.14; standard deviation (SD) = 1.81] were on average closer to accurate than first forecasts (mean inaccuracy = 2.42; SD = 2.03). Participants might have learned (explicitly or implicitly) to discount first forecasts within the context of the experimental experience.
The main limitation of the research presented here is related to one of its primary goals: to evaluate the effects of forecast accuracy and consistency on trust and decision-making in the context of naturalistic forecasts. Allowing forecasts and outcomes to vary naturally led to a loss in internal validity. In other words, some of the effects observed here may be limited to similar forecast sets. For instance, here (and perhaps in most naturalistic situations), inconsistency led to a slight increase in trust. This could have been due to the perception of greater uncertainty per se (perhaps due to larger inconsistencies than in the previous highly controlled studies) or to the predominance of ascending and confirmed trends in this forecast set. Similarly, participants’ increased cautiousness with inconsistent forecasts may have been due to the perception of greater uncertainty per se or to the fact that inconsistent forecasts often included slightly higher snow-total values. Thus, future work should test these effects with different naturalistic forecast data as well as manipulate them systematically in controlled studies to verify these particular effects. Another issue that could be resolved in future research is whether the source of inconsistency matters. For instance, inconsistency could be due to capricious weather situations or to lack of expertise among forecasters, which may impact some form of trust. Finally, it is important to note that this was a student sample. It is possible that greater experience or differences in education level might lead to slightly different results. However, recent evidence suggests that the ability to use probabilistic forecasts to make better decisions is similar among college students and a broader population (Grounds 2016; Grounds and Joslyn 2018). It is also important to note that decisions in a controlled experimental environment such as this differ in many respects from those made in real-world situations in which other factors play a role and the decision consequences can be very serious.
Importantly, the main results reported here align with a growing body of highly controlled experimental research. We have shown here that, in line with previous research (Joslyn and LeClerc 2012; LeClerc and Joslyn 2015; Grounds et al. 2018; etc.), explicit numeric uncertainty estimates preserved trust in the context of naturalistic forecasts and outcomes, especially as inaccuracy increased. Probabilistic forecasts also allowed users to make better decisions from an economic perspective. In addition, the research reported here provides converging evidence that the effect of forecast inconsistency is not as problematic as once thought and may also confer some benefits for forecast users. It is important to consider the impact of forecast inconsistency in the context of forecast inaccuracy as we have done here, because there can be a trade-off between them. Weather models tend to grow more accurate as lead times decrease. Therefore, the artificial maintenance of forecast consistency can be at a cost to accuracy. As shown previously in studies with highly controlled forecast stimuli (Burgeno and Joslyn 2020; Su et al. 2021) and here with naturalistic forecast data, inaccuracy is much more detrimental to trust than inconsistency. This is true whether inconsistency is based on a single source (present experiment; Burgeno and Joslyn 2020) or resides in multiple sources (Su et al. 2021). It is true whether forecasts are encountered sequentially (present experiment; Burgeno and Joslyn 2020) or simultaneously (Su et al. 2021). All of this evidence points in the same direction: Inaccuracy is far more detrimental to user trust than inconsistency. In fact, much of this research suggests that inconsistency may be beneficial in that it provides useful information to decision-makers. Based on this converging evidence, we recommend that forecast providers avoid artificially preserving consistency at a potential loss to accuracy. Updating forecasts and including well-calibrated uncertainty estimates can preserve trust in the information source and provide users with decision-relevant information.
Footnote 1: The experiment was programed in Microsoft Excel Visual Basic and conducted on standard desktop computers.
Footnote 2: The endowment was calculated by multiplying the number of trials (130) by the cost of closing (2 points) and adding that product to the payment threshold, (130 × 2) + 72 = 332. This was done to create a cushion of points to maintain engagement with the task.
Footnote 3: Special thanks are given to David B. Radell at NOAA and the National Weather Service for providing us with the forecast data.
Footnote 4: There were 68% presented in the same order as in the historical forecast set.
Footnote 5: Unpublished interviews with operational forecasters at National Weather Service Western Region, Seattle, Washington.
Footnote 6: Order was a control variable to ensure that any observed effects would not be tied to a particular order. All dependent variables were summarized across order.
Footnote 7: The numbering of hypotheses reported here is slightly different than those registered, although the content is the same.
Footnote 8: The trust measure taken earlier (question 4) was prior to making the decision or learning the outcome, at which point accuracy was unknown to the participant. Nonetheless, it yielded similar results, with probabilistic forecasts increasing trust by approximately 35% [estimated odds ratio = 0.65, 95% CI = (0.51, 0.82), and p < 0.001]. Inconsistency increased trust by approximately 5% [estimated odds ratio = 0.95, 95% CI = (0.94, 0.97), and p < 0.001].
Footnote 9: This is a simplifying assumption that responses nested within a participant are independent of one another.
Footnote 10: A working correlation structure does not need to be specified correctly because robust standard errors, with wider confidence intervals than naïve standard errors, are agnostic to the structure specified. Therefore, even if the working correlation structure is misspecified, the model will still generate appropriate estimates.
Footnote 11: Linear mixed-model regression analyses are also capable of accounting for clustered responses.
Footnote 12: Inaccuracy was not included as a predictor because participants had not learned the outcome at the point at which they made a decision.
Footnote 13: There are some exceptions to this at very small likelihoods (Tversky and Fox 1995).
Footnote 14: This may be explained by the fact that magnitude of inconsistency was positively correlated with the probability of greater than 6 in. (r = 0.67; p < 0.001), making it generally optimal to close in trials in which there were large inconsistencies.
Footnote 15: Inaccuracy was not included as a predictor because participants had not yet learned the outcome at the point at which they made an estimate.
Footnote 16: Note that, because of the inclusion of random effects, R2 is uninterpretable for mixed-model regressions.
Acknowledgments.
This research was supported by the National Science Foundation under Grant 1559126. Special thanks are given to David B. Radell at NOAA and the National Weather Service for providing us with the forecast data and to Mengying Xu, Keiko Shannon, Justin Takeuchi, Yuan (Eva) Yin, and Brandy Steed for conducting data-collection sessions. Note: This paper was also submitted in partial fulfillment of a dissertation.
Data availability statement.
All data used and collected for this study are available online (https://osf.io/dv6j8).
APPENDIX A
Task Instructions and Training Trials
The following sections contain the exact text of the task instructions and training trials (Fig. A1) that participants were given.
a. Scenario
You have been hired to work for a decision consultancy. Your project this winter is to consult with school districts faced with widespread snowstorms. Your job is to provide decision advice regarding whether they should close school for the day or stay open for class.
Schools are closed when driving conditions are unsafe to prevent accidents and injuries. However, school closures are expensive to the district because days must be made up at the end of the school year.
You will be provided with forecasts for each school area 2-days and 1-day in advance of a storm to help you make your decision. Due to microclimates across the regions, snow accumulation can differ from location to location; therefore, you will receive weather forecast information from a private weather service that specializes in local predictions.
There will be two periods for which winter storms are anticipated across two regions. For each storm, you will provide school closure advice for 65 schools located throughout the region. You will see a screen indicating the new period after school 65.
If you think the school area will receive 6 inches of snow or more, advise closing. If you think the school area will receive less than 6 inches, advise staying open.
Your boss gives cash bonuses to the members of the decision consultancy staff who offer the best advice. You will begin with 332 points. It will cost 2 points every time you advise closing. It will cost 0 points if you advise to stay open. However, if you advise to stay open and 6 inches of snow or more is observed, then you will be penalized 6 points. Your goal is to give the best advice possible and retain as many points as you can. You will receive a cash bonus if your ending balance is above 72 points.
b. Summary
When you expect 6 inches or more of snow, you should advise the school to close.
When you expect less than 6 inches of snow, you should advise the school to stay open.
Cost to close schools: 2 points to compensate for makeup days.
Penalty for staying open when 6 inches or more are observed: 6 points to compensate for traffic accidents and injuries.
You will receive one dollar for every 32 points above 72 points at the end of the session.
You will now see several demonstration trials to help you understand your task. After those trials, you will begin making your own decisions. Your goal is to end up with the highest number of points possible.
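The point rules stated in the instructions above can be summarized in a minimal sketch. All names are hypothetical. Paying the bonus in whole dollars only is an assumption, because the instructions do not say how a partial block of 32 points was handled.

```python
START_POINTS = 332       # initial endowment
CLOSE_COST = 2           # paid each time closing is advised
OPEN_PENALTY = 6         # charged if open is advised and 6 in. or more falls
THRESHOLD_INCHES = 6.0   # decision threshold for the penalty
BONUS_FLOOR = 72         # balance above which the cash bonus starts
POINTS_PER_DOLLAR = 32   # one dollar for every 32 points above the floor


def trial_cost(advise_close: bool, observed_inches: float) -> int:
    """Points lost on one trial under the stated rules."""
    if advise_close:
        return CLOSE_COST
    return OPEN_PENALTY if observed_inches >= THRESHOLD_INCHES else 0


def final_balance(trials) -> int:
    """trials: iterable of (advise_close, observed_inches) pairs."""
    return START_POINTS - sum(trial_cost(c, s) for c, s in trials)


def bonus_dollars(balance: int) -> int:
    """Whole-dollar bonus (assumption: partial blocks of points pay nothing)."""
    return max(0, balance - BONUS_FLOOR) // POINTS_PER_DOLLAR


# Advising closure on all 130 schools leaves exactly the bonus floor.
always_close = [(True, 0.0)] * 130
print(final_balance(always_close))                 # 72
print(bonus_dollars(final_balance(always_close)))  # 0
```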
APPENDIX B
Questions Asked on Each Trial
The following list contains the text of the questions that participants were asked on each trial:
-
How much snow accumulation do you expect on Wednesday?
-
I would not be surprised if the snow accumulation was as little as ____ inches
-
I would not be surprised if the snow accumulation was as much as ____ inches
-
How much do you trust Monday’s [or Tuesday’s] forecast to help you make your decision? [Response: 6-point scale from “not at all” to “completely”]
-
Do you want to close the school tomorrow? [Response options—Close: cost 2 points (I think snow accumulation will be 6 inches or more); Stay Open: cost 0 point (I think snow accumulation will be less than 6 inches)]
-
Trust rating taken after outcome is shown: How much did you trust this week’s forecasts to help you make your decision? [Response: 6-point scale from “not at all” to “completely”]
APPENDIX C
Features Preserved in the Forecast Set as Compared with the Historical Forecast Set
Table C1 contains the statistics for the forecast and historical forecast sets for comparison.
Table C1. Statistics for the forecast and historical forecast sets; SD indicates standard deviation.
APPENDIX D
Probabilistic Forecast Calibration Procedure
A binning technique was used to examine the reliability of the probabilistic forecasts because there were very few cases at the same probability, precluding more conventional measures such as the Brier score (Brier 1950). A bin was considered to be calibrated if the proportion of observed events with 6 in. or more fell within the probability range for that bin. For instance, bin 2 ranged between 5% and 14% and contained 1 of 10 trials (10%) in which 6 in. or more of snow accumulation was observed.
In the historical forecast dataset, the proportion of outcomes at or above the 6-in. threshold was within a few percentage points of the bin boundaries in most cases. However, in the higher-probability bins (65%–74%, 75%–84%, and 85%–94%), the proportions were as many as 25 percentage points higher than the upper bound of the bin, suggesting a low bias in the forecast probabilities for that day. Therefore, slight changes were made (cases were removed, cases were duplicated, and/or the probabilities were modified) to perfect probabilistic forecast reliability while maintaining the basic characteristics of the historical forecast set. See Table D1.
Table D1. Day-2 forecast and observed probabilities.
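The binning check described in this appendix can be sketched as follows. The function names and the example trials are hypothetical and are not the study data. Only the calibration criterion (the observed proportion falls within the bin's probability range) and the bin boundaries 5%–14% and 75%–84% come from the text.

```python
def observed_percent(events) -> float:
    """Percent of trials in which 6 in. or more was observed."""
    return 100.0 * sum(events) / len(events)


def bin_is_calibrated(lower: float, upper: float, events) -> bool:
    """True if the observed percent falls within the bin's range."""
    return lower <= observed_percent(events) <= upper


def calibration_table(trials, bins):
    """trials: (forecast_percent, event_observed) pairs; bins: (lower, upper).

    Returns one row per non-empty bin:
    (lower, upper, n_trials, observed_percent, calibrated).
    """
    rows = []
    for lower, upper in bins:
        events = [hit for prob, hit in trials if lower <= prob <= upper]
        if events:
            rows.append((lower, upper, len(events),
                         observed_percent(events),
                         bin_is_calibrated(lower, upper, events)))
    return rows


# Made-up trials: ten forecasts of 10% with one event, four of 80% with four.
example_trials = [(10, True)] + [(10, False)] * 9 + [(80, True)] * 4
for row in calibration_table(example_trials, [(5, 14), (75, 84)]):
    print(row)
# (5, 14, 10, 10.0, True)
# (75, 84, 4, 100.0, False)
```

The second made-up bin mirrors the kind of low bias described for the higher-probability bins, where events occurred more often than the forecast probabilities implied.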
APPENDIX E
Trust Analyses: GEE Model Descriptions and Regression Tables by Hypotheses
a. Hypotheses
The hypotheses associated with trust analyses are given in the following list:
-
Hypothesis 1—Is forecast format associated with trust?
-
Hypothesis 2—Is inconsistency associated with trust?
-
Hypothesis 3—Is inaccuracy associated with trust?
-
Hypothesis 4—Does the association between inconsistency and trust differ across forecast formats?
-
Hypothesis 5—Does the association between inaccuracy and trust differ across forecast formats?
-
Exploratory—Relationship between inconsistency, inaccuracy, and trust.
b. Models
Tables E1–E7 describe the models associated with the above hypotheses and explorations. The postoutcome trust measure, at which point forecast accuracy was known to participants, was analyzed. Postoutcome trust was an ordinal variable. Therefore, it was analyzed with a series of GEEs using cumulative link proportional odds regression models, which are designed to model ordinal data and population-averaged (between group) effects. We specified an “independence” working correlation structure and robust standard errors to build in resistance to possible misspecifications of the working correlation structure.
Hypotheses 1, 2, and 3 are addressed by model I with predictors inaccuracy, inconsistency, and forecast format. Here and in subsequent tables, LL indicates lower limit and UL indicates upper limit for the 95% CI.
Hypothesis 4 is addressed by model II with predictors inaccuracy, inconsistency, forecast format, and the inconsistency by forecast format interaction.
Model III, with predictors inaccuracy, forecast format, and the inconsistency by deterministic and inconsistency by probabilistic interactions, was fit to explore how the association between inconsistency and trust differed across forecast formats.
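The link between an interaction model and a model with format-specific inconsistency terms can be made explicit. The following is a sketch under assumptions not stated in the original: forecast format is dummy coded with $\mathrm{Prob} = 1$ for the probabilistic format and $\mathrm{Det} = 1 - \mathrm{Prob}$, and $\eta$ denotes the linear predictor (indices omitted for brevity).

```latex
\begin{align*}
\text{Model II:}\quad
\eta &= \beta_A\,\mathrm{Inacc} + \beta_F\,\mathrm{Prob}
      + \beta_I\,\mathrm{Inc} + \beta_{IF}\,(\mathrm{Inc} \times \mathrm{Prob}) \\
\text{Model III:}\quad
\eta &= \beta_A\,\mathrm{Inacc} + \beta_F\,\mathrm{Prob}
      + \gamma_D\,(\mathrm{Inc} \times \mathrm{Det})
      + \gamma_P\,(\mathrm{Inc} \times \mathrm{Prob}) \\
\text{with}\quad
\gamma_D &= \beta_I, \qquad \gamma_P = \beta_I + \beta_{IF}.
\end{align*}
```

Under these assumptions the two models are reparameterizations of each other: model II tests whether the inconsistency effect differs between formats (via $\beta_{IF}$), whereas model III reports the inconsistency effect within each format directly.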
Hypothesis 5 is addressed by model IV with predictors inaccuracy, inconsistency, forecast format, and the inaccuracy by forecast format interaction.
Model V, with predictors inconsistency, forecast format, and the inaccuracy by deterministic and inaccuracy by probabilistic interactions, was fit to explore how the association between inaccuracy and trust differed across forecast formats.
Exploratory model VI, with predictors inaccuracy, forecast format, and inconsistency by forecast format interaction, was fit to test whether the association between inconsistency and trust differed across accuracy.
Exploratory model VII, with predictors inaccuracy, forecast format, and the inconsistency by high-accuracy and inconsistency by low-accuracy interactions, was fit to explore how the association between inconsistency and trust differed across accuracy.
APPENDIX F
Closure Decision Analyses: Binary GEE Model Descriptions and Regression Tables by Hypotheses
a. Hypotheses
The hypotheses associated with closure decision analyses are given in the following list:
- Hypothesis 6a: Is forecast format associated with closure decisions?
- Question 6b: Is optimal decision associated with closure decisions?
- Question 6c: Does the association between optimal decision and closure decisions differ across forecast formats?
- Question 6d: Does the association between optimal decision and closure decisions differ across inconsistency?
- Question 6e: Is inconsistency associated with closure decisions?
- Exploratory: Relationship between optimal decision, forecast format, and closure decisions.
b. Models
Tables F1–F4 describe the models associated with the above hypotheses, questions, and explorations.
Hypothesis 6a and questions 6b and 6e are addressed by model I with predictors optimal decision, inconsistency, and forecast format.
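For orientation, model I for closure decisions can be sketched as a binary regression. The notation is assumed rather than taken from the original: $C_{ij} = 1$ if participant $i$ chose to close on trial $j$, the predictors are 0/1 indicators, and a logit link is assumed because the appendix specifies only that the GEE is binary.

```latex
\[
\operatorname{logit} P(C_{ij} = 1)
  = \gamma_0 + \gamma_1\,\mathrm{Optimal}_{ij}
  + \gamma_2\,\mathrm{Inconsistency}_{ij}
  + \gamma_3\,\mathrm{Format}_{ij} .
\]
```

With a logit link, $\exp(\gamma_1)$ would be the population-averaged odds ratio for closing when closure is the optimal decision relative to when it is not, holding the other predictors fixed.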
Question 6c is addressed by model II with predictors optimal decision, inconsistency, forecast format, and the optimal decision by forecast format interaction.
Question 6d is addressed by model III with predictors optimal decision, inconsistency, forecast format, and the optimal decision by inconsistency interaction.
Exploratory model IV, with predictors forecast format, inconsistency, optimal decision by deterministic forecast format, and optimal decision by probabilistic forecast format, was fit to explore how the association between optimal decision and closure decisions differed across forecast formats.
APPENDIX G
Linear Mixed-Model Regression Tables for Expected Value, Uncertainty Range, and Best Estimate
The regression tables for expected value (Table G1), best estimate (Table G2), and uncertainty range (Table G3) are presented here.
Table G1. Regression table for expected value.
Table G2. Regression table for best estimate.
Table G3. Regression table for uncertainty range.
REFERENCES
Bernoulli, D., 1954: Exposition of a new theory on the measurement of risk. Econometrica, 22, 23–36, https://doi.org/10.2307/1909829.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Budescu, D. V., A. K. Rantilla, H.-T. Yu, and T. M. Karelitz, 2003: The effect of asymmetry among advisors on the aggregation of their opinions. Organ. Behav. Hum. Decis. Processes, 90, 178–194, https://doi.org/10.1016/S0749-5978(02)00516-2.
Burgeno, J. N., and S. L. Joslyn, 2020: The impact of weather forecast inconsistency on user trust. Wea. Climate Soc., 12, 679–694, https://doi.org/10.1175/WCAS-D-19-0074.1.
Earle, T. C., 2010: Trust in risk management: A model-based review of empirical research. Risk Anal., 30, 541–574, https://doi.org/10.1111/j.1539-6924.2010.01398.x.
Falk, A., and F. Zimmermann, 2017: Consistency as a signal of skills. Manage. Sci., 63, 2197–2210, https://doi.org/10.1287/mnsc.2016.2459.
Grounds, M. A., 2016: Communicating weather uncertainty: An individual differences approach. Ph.D. dissertation, University of Washington, 141 pp.
Grounds, M. A., and S. L. Joslyn, 2018: Communicating weather forecast uncertainty: Do individual differences matter? J. Exp. Psychol. Appl., 24, 18–33, https://doi.org/10.1037/xap0000165.
Grounds, M. A., J. E. LeClerc, and S. Joslyn, 2018: Expressing flood likelihood: Return period versus probability. Wea. Climate Soc., 10, 5–17, https://doi.org/10.1175/WCAS-D-16-0107.1.
Hohle, S. M., and K. H. Teigen, 2015: Forecasting forecasts: The trend effect. Judgment Decis. Making, 10, 416–428, https://doi.org/10.1017/S1930297500005568.
Hohle, S. M., and K. H. Teigen, 2019: When probabilities change: Perceptions and implications of trends in uncertain climate forecasts. J. Risk Res., 22, 555–569, https://doi.org/10.1080/13669877.2018.1459801.
Højsgaard, S., U. Halekoh, and J. Yan, 2005: The R package geepack for generalized estimating equations. J. Stat. Software, 15, 1–11, https://doi.org/10.18637/jss.v015.i02.
Joslyn, S., and S. Savelli, 2010: Communicating forecast uncertainty: Public perception of weather forecast uncertainty. Meteor. Appl., 17, 180–195, https://doi.org/10.1002/met.190.
Joslyn, S. L., and J. E. LeClerc, 2012: Uncertainty forecasts improve weather-related decisions and attenuate the effects of forecast error. J. Exp. Psychol. Appl., 18, 126–140, https://doi.org/10.1037/a0025185.
Kadous, K., M. Mercer, and J. Thayer, 2009: Is there safety in numbers? The effects of forecast accuracy and forecast boldness on financial analysts’ credibility with investors. Contemp. Accounting Res., 26, 933–968, https://doi.org/10.1506/car.26.3.12.
Kahn, B. E., and M. F. Luce, 2003: Understanding high-stakes consumer decisions: Mammography adherence following false-alarm test results. Mark. Sci., 22, 393–410, https://doi.org/10.1287/mksc.22.3.393.17737.
Kahneman, D., and A. Tversky, 1979: Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291, https://doi.org/10.2307/1914185.
Lazo, J. K., R. E. Morss, and J. L. Demuth, 2009: 300 billion served: Sources, perceptions, uses, and values of weather forecasts. Bull. Amer. Meteor. Soc., 90, 785–798, https://doi.org/10.1175/2008BAMS2604.1.
LeClerc, J., and S. Joslyn, 2015: The cry wolf effect and weather-related decision making. Risk Anal., 35, 385–395, https://doi.org/10.1111/risa.12336.
Losee, J. E., and S. Joslyn, 2018: The need to trust: How features of the forecasted weather influence forecast trust. Int. J. Disaster Risk Reduct., 30A, 95–104, https://doi.org/10.1016/j.ijdrr.2018.02.032.
Maglio, S. J., and E. Polman, 2016: Revising probability estimates: Why increasing likelihood means increasing impact. J. Pers. Soc. Psychol., 111, 141–158, https://doi.org/10.1037/pspa0000058.
Morss, R. E., J. L. Demuth, and J. K. Lazo, 2008: Communicating uncertainty in weather forecasts: A survey of the U.S. public. Wea. Forecasting, 23, 974–991, https://doi.org/10.1175/2008WAF2007088.1.
Murphy, A. H., 1977: The value of climatological, categorical and probabilistic forecasts in the cost-loss ratio situation. Mon. Wea. Rev., 105, 803–816, https://doi.org/10.1175/1520-0493(1977)105<0803:TVOCCA>2.0.CO;2.
National Climatic Data Center, 2019: 30 years of Seattle snow accumulation data (1989–2019). National Climatic Data Center, accessed 2 December 2019, https://www.ncdc.noaa.gov/cdo-web.
NOAA, 2016: Risk communication and behavior: Best practices and research findings. NOAA Rep., 66 pp., https://repository.library.noaa.gov/view/noaa/29484.
Pasquini, E. S., K. H. Corriveau, M. Koenig, and P. L. Harris, 2007: Preschoolers monitor the relative accuracy of informants. Dev. Psychol., 43, 1216–1226, https://doi.org/10.1037/0012-1649.43.5.1216.
Ronfard, S., and J. D. Lane, 2018: Preschoolers continually adjust their epistemic trust based on an informant’s ongoing accuracy. Child Dev., 89, 414–429, https://doi.org/10.1111/cdev.12720.
Savelli, S., and S. Joslyn, 2012: Boater safety: Communicating weather forecast information to high-stakes end users. Wea. Climate Soc., 4, 7–19, https://doi.org/10.1175/WCAS-D-11-00025.1.
Siegrist, M., H. Gutscher, and T. C. Earle, 2005: Perception of risk: The influence of general trust, and general confidence. J. Risk Res., 8, 145–156, https://doi.org/10.1080/1366987032000105315.
Smithson, M., 1999: Conflict aversion: Preference for ambiguity vs conflict in sources and evidence. Organ. Behav. Hum. Decis. Processes, 79, 179–198, https://doi.org/10.1006/obhd.1999.2844.
Su, C., J. N. Burgeno, and S. Joslyn, 2021: The effects of consistency among simultaneous forecasts on weather-related decisions. Wea. Climate Soc., 13, 3–10, https://doi.org/10.1175/WCAS-D-19-0089.1.
Touloumis, A., 2015: R Package multgee: A generalized estimating equations solver for multinomial responses. J. Stat. Software, 64, 1–14, https://doi.org/10.18637/jss.v064.i08.
Tversky, A., and C. R. Fox, 1995: Weighing risk and uncertainty. Psychol. Rev., 102, 269–283, https://doi.org/10.1037/0033-295X.102.2.269.
Twyman, M., N. Harvey, and C. Harries, 2008: Trust in motives, trust in competence: Separate factors determining the effectiveness of risk communication. Judgment Decis. Making, 3, 111–120, https://doi.org/10.1017/S1930297500000218.
Wilson, L. J., and A. Giles, 2013: A new index for the verification of accuracy and timeliness of weather warnings. Meteor. Appl., 20, 206–216, https://doi.org/10.1002/met.1404.