## 1. Introduction

The construction of models that attempt to ascertain the economic value of weather and climate forecasts has a long history in meteorology and allied fields (Katz and Murphy 1997). Such valuation models are necessary if we are to understand when a particular set of forecasts might be favorably applied to a given decision problem, and they also play an important role in legitimizing meteorological research in wider society, particularly to funding bodies (Pielke and Carbone 2002). The dual motivations of forecast producers—a wish to provide weather and climate information that is useful to society and a simultaneous desire to pursue scientific research for its own sake—are not always in consonance. The scientific community has evolved a set of metrics by which it measures the performance of its forecasts (see, e.g., Wilks 2006) that are at best only a partial indication of whether they will be useful to actual forecast users. This is easily demonstrated by incorporating even the crudest representation of a user’s decision problem into the verification exercise. Indeed, Murphy and Ehrendorfer (1987) show that increases in forecast accuracy can actually decrease forecast value. The inadequacy of standard forecast verification measures as indicators of forecast value is well known (Richardson 2001), and models of forecast value based on decision theory are a significant step toward obtaining measures of forecast performance that are relevant to real decision makers.

However, even those decision models that have attempted to resolve some of the parameters that might be relevant to actual forecast users have invariably focused on a best-case scenario, in which forecast users are assumed to be hyperrational decision makers who process their perfect knowledge of the forecasting products in statistically sophisticated ways. Such models are normative in flavor, prescribing forecast use strategies that are optimal with respect to a stated decision metric. As such, these valuation models tend to overstate the value that actual users extract from forecasts. This has the dual effect of giving the scientific community a sense of legitimacy that is not necessarily mirrored on the ground while leaving real opportunities for gains in the uptake of objectively valuable forecasts relatively unexplored. Empirical studies over temporal scales ranging from days to seasons (Stewart et al. 2004; Rayner et al. 2005; Vogel and O’Brien 2006; Patt et al. 2005) suggest that the forecast user’s behavior is often not accurately predicted by such normative valuation models and that a variety of economic, institutional, and behavioral factors can contribute to the forecast use choices of individuals (Patt and Gwata 2002; Roncoli 2006).

The purpose of this paper is to demonstrate that even modest changes to the behavioral assumptions of normative valuation models can lead to a substantially different picture of the forecast user’s behavior and hence the de facto value of forecasting information. Such a behavioral model may not only provide a more accurate picture of realized forecast value for certain user groups but also highlight situations in which interventions such as user education programs are most likely to be efficacious. The paper considers one of the simplest decision problems, the cost–loss scenario (see, e.g., Murphy 1977; Katz and Murphy 1997; Zhu et al. 2002; Smith and Roulston 2004), and compares the behavior of two types of stylized agents. The first type of agent is a rational decision maker who is statistically sophisticated (i.e., makes decisions based on knowledge of the statistical properties of the forecasts) and has perfect information about the forecasts she receives—I describe her behavior using a standard normative model used elsewhere in the literature (Richardson 2001; Wilks 2001; Katz and Ehrendorfer 2006). The second type of agent is also a rational decision maker but initially has limited information about the forecasting product, does not trust the forecasts completely, and is statistically unsophisticated; that is, he does not keep track of the consequences of his choices in a statistically consistent manner. Because he does not know that he will be better off using the forecasts, I model his behavior as a learning process in which his forecast use choices are guided by his past experience. As he decides whether or not to make use of the forecasts and experiences the consequences of his decisions, he learns from them in a manner consistent with a prominent psychological theory of learning behavior known as reinforcement learning.

Reinforcement learning is a widely used paradigm for representing so-called operant conditioning, in which the consequences of a behavior modify its future likelihood of occurrence. It has a long history in psychology, going back at least as far as the statement of the Law of Effect by Thorndike (1911), which has been argued to be a necessary component of any theory of behavior (Dennett 1975). The framework is based on the empirical observation that the frequency of a given behavior increases when it leads to positive consequences and decreases when it leads to negative consequences (Thorndike 1933; Bush and Mosteller 1955; Cross 1973). Because of its emphasis on the frequencies of choices, the model is necessarily probabilistic in nature. In addition, the model is forgetful, in that the only knowledge it has of past choices and their consequences is the agent’s current propensity for a given choice. Thus information about the entire historical sequence of choices and outcomes is compressed into a single value. This loss of information allows the model to represent statistically unsophisticated learning behavior. I construct a minimal behavioral model of the user’s learning process based on this framework. The model has been intentionally kept as basic as possible to facilitate comparisons with the normative paradigm. Yet even in this case, the introduction of learning dynamics has a marked effect on our picture of users’ rates of forecast usage and hence the value of forecasts.

In the following section, a normative model of forecast value for the cost–loss decision problem is briefly introduced, and its assumptions are explained. Section 3 develops the behavioral model of statistically unsophisticated learning in detail and derives its properties. In section 4, a quantitative relationship between the normative and behavioral models of forecast value is established. It is shown that accounting for learning dynamics reduces the user’s realized value score by a factor that depends on his decision parameters (i.e., the cost–loss ratio), the climatological probability of the adverse event, and the forecast skill. The implications of this result are examined, and the general properties of the dependence of the deviation between the two models on these parameters are established. The paper concludes by commenting on the policy relevance of the results and suggesting a focus for future research.

## 2. A normative model of forecast value

The normative model of forecast value that I will use was developed by Richardson (2001), though it is very similar to that of Wilks (2001). I focus on probabilistic forecasts because they are widely used by operational forecast centers. Indeed, Murphy (1977) shows that reliable probability forecasts are necessarily more valuable than categorical forecasts for the decision problem I consider. The model rests on a specification of the user’s decision problem, the assumption that users are rational decision makers with a perfect understanding of the forecasting product, and simplifying assumptions about the nature of the forecasts. I briefly develop the model below.

Models of forecast value that are based on the rational actor assumption prescribe optimal user decision-making behavior. The notion of optimality requires that there be something to optimize. In the case of the model developed here, users are assumed to act so as to minimize their expected losses. In general, expected losses are dependent on the details of the user’s decision problem. To make the discussion concrete, I will consider the cost–loss problem—a staple of the forecast valuation literature (Katz and Murphy 1997).

At each time step in the cost–loss scenario, users must decide whether or not to protect against the adverse effects of a weather event—for example, purchase hurricane insurance—that occurs with some stationary climatological probability *p*_{c}. The insurance costs *C*, and the users have an exposure of *L*, where *L* > *C*. If the hurricane occurs, the policy pays out an amount *L* so that the losses are completely covered. If we let *a* be the user’s action (*a* = 1 if she protects and *a* = 0 if she does not) and let *e* be the event (*e* = 1 if the event occurs and *e* = 0 if it does not), then we can represent this scenario with the loss matrix 𝗟(*a*, *e*), depicted in Table 1.

A rational user protects only when the expected loss from not protecting exceeds the cost of protection—that is, only when the probability *p* of the hurricane occurring satisfies *p* > *C*/*L*. I define *z* to be the cost–loss ratio *z* := *C*/*L*. Thus the decision rule is

*a*(*p*) = 1 if *p* > *z*, and *a*(*p*) = 0 otherwise. (1)
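The decision problem can be made concrete in a few lines of Python. This is a minimal sketch with function names of my own choosing, not code from the paper:

```python
# Loss matrix L(a, e) for the cost-loss scenario (Table 1):
# protecting always costs C (the insurance premium); an unprotected
# event costs the full exposure L; otherwise nothing is lost.
def loss(a: int, e: int, C: float, L: float) -> float:
    if a == 1:
        return C                    # insured: losses fully covered
    return L if e == 1 else 0.0

# Decision rule (1): protect exactly when the forecast probability p
# exceeds the cost-loss ratio z = C / L.
def act(p: float, z: float) -> int:
    return 1 if p > z else 0
```

For example, with *C* = 2 and *L* = 10 (so *z* = 0.2), a forecast *p* = 0.3 triggers protection, whereas *p* = 0.1 does not.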

To compute the value of a forecasting product as a whole, it is necessary to find the average losses that the user sustains when making use of the forecasts. This is achieved by specifying the joint distribution of forecasts *p* and events *e*. This joint distribution can be decomposed by using the calibration–refinement factorization (Murphy and Winkler 1987; Wilks 2001). In this scheme, one writes the joint distribution as the product of the marginal distribution of the forecasts *p*, which I will call *g*(*p*), and the conditional distribution of the event given the forecast. Because there are only two events in the cost–loss scenario, only one such conditional distribution is needed; for example, Prob(*e* = 1|*p*) because Prob(*e* = 0|*p*) = 1 − Prob(*e* = 1|*p*). In what follows I define *f*_{1}(*p*) := Prob(*e* = 1|*p*). The function *f*_{1}(*p*), also known as the calibration function, determines the reliability of the forecasts. Perfectly reliable forecasts have *f*_{1}(*p*) = *p*. Such forecasts can be shown to be unconditionally unbiased (Jolliffe and Stephenson 2003). In what follows I assume that forecasts are perfectly reliable. Although an idealization of reality, the perfect reliability assumption is a good working hypothesis. Provided sufficient validation data are available, operational forecasts are calibrated using empirical calibration functions [i.e., an estimate of *f*_{1}(*p*)] so that the calibrated forecasts approximate perfect reliability. However, one should keep in mind that forecasts that are calibrated to the past are not necessarily calibrated to the true probability of the event occurring in the present (Oreskes et al. 1994). Although this point affects what we mean when we talk about calibrated forecasts, it has no bearing on the analysis that follows.
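The notion of perfect reliability, *f*_{1}(*p*) = *p*, can be checked empirically by binning forecasts and comparing each bin’s mean forecast with the observed event frequency. A small synthetic sketch (the Beta(2, 4) marginal is an arbitrary choice of mine, purely for illustration):

```python
import random

random.seed(0)

# Synthetic, perfectly reliable forecasts: draw p from a marginal g(p),
# then draw the event e ~ Bernoulli(p), so that Prob(e = 1 | p) = p.
bins = 10
stats = [[0, 0.0, 0] for _ in range(bins)]   # [event count, sum of p, n] per bin
for _ in range(100_000):
    p = random.betavariate(2.0, 4.0)         # an arbitrary choice of g(p)
    e = 1 if random.random() < p else 0
    b = min(int(p * bins), bins - 1)
    stats[b][0] += e
    stats[b][1] += p
    stats[b][2] += 1

# For reliable forecasts, the observed event frequency in each bin
# should track the mean forecast issued in that bin.
for events, p_sum, n in stats:
    if n > 1000:
        print(f"mean forecast {p_sum / n:.2f} -> observed frequency {events / n:.2f}")
```

The same binning procedure, applied to an archive of real forecast–event pairs, yields the empirical calibration function mentioned above.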

Note that perfect reliability also constrains the marginal distribution of the forecasts: the mean of *g*(*p*) is constrained to be equal to the climatological probability of the event.

With the distribution *g*(*p*)—which describes the probability of receiving a forecast *p*—in hand, the expected losses *V*_{F} that are sustained when the forecast product is used can be calculated. First, I define the function *μ*(*x*), the partial mean of *g*(*p*) at *x*, via

*μ*(*x*) := ∫_{0}^{x} *p*′*g*(*p*′) d*p*′.

Because the forecasts are perfectly reliable, a user following the decision rule (1) protects (at cost *C* = *Lz*) whenever *p* > *z* and otherwise sustains an expected loss *Lp*. Writing *G*(*x*) := ∫_{0}^{x} *g*(*p*) d*p* for the cumulative distribution function of the forecasts, the expected losses are

*V*_{F} = *L*{*μ*(*z*) + *z*[1 − *G*(*z*)]}.

If instead the user had access only to the climatological information—that is, if she always received the forecast *p* = *p*_{c}—the user’s expected losses would be *V*_{clim} = *L* min(*z*, *p*_{c}), whereas a user with perfect foreknowledge of the events would sustain losses *V*_{perf} = *Lzp*_{c}. Following Richardson (2001), the value of the forecasts is measured by the normative relative value score

VS_{N} := (*V*_{clim} − *V*_{F})/(*V*_{clim} − *V*_{perf}), (8)

which expresses the reduction in expected losses achieved by the forecasts as a fraction of the maximum possible reduction. Notice that VS_{N} depends on *C* and *L* only through their ratio *z*.

To obtain concrete results, I take *g*(*p*) to be a beta distribution with (positive) parameters *r*, *s*:

*g*(*p*) = *p*^{r−1}(1 − *p*)^{s−1}/*B*(*r*, *s*),

where *B*(*r*, *s*) is the beta function. Forecast performance is measured by the Brier score,

BS := *E*[(*p* − *e*)^{2}], (9)

where *E* denotes an expectation over the joint distribution of *e* and *p*. It thus measures the average squared deviation between the forecasts and the realized events. Using Eq. (9), the Brier skill score (BSS), a measure of forecast performance analogous to the relative value score specified earlier, can be shown to be given by

BSS := 1 − BS/[*p*_{c}(1 − *p*_{c})] = Var(*p*)/[*p*_{c}(1 − *p*_{c})],

where the second equality holds for perfectly reliable forecasts; for the beta distribution it reduces to BSS = 1/(*r* + *s* + 1). This constraint, together with the requirement that the mean of *g*(*p*), which is equal to *r*/(*r* + *s*), match *p*_{c}, implies that the parameters of *g*(*p*) can be determined in terms of the Brier skill score and the climatological probability of the event:

*r* = *p*_{c}(1 − BSS)/BSS,  *s* = (1 − *p*_{c})(1 − BSS)/BSS.

The normative relative value score is thus a function of just three parameters: *z*, *p*_{c}, and BSS. These three parameters capture the details of the decision problem, the environment, and the forecast performance, respectively.
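The beta parameters can be recovered from *p*_{c} and BSS alone. A minimal Python sketch (the closed forms used here follow from matching the beta mean to *p*_{c} and its variance to the implied Brier skill score of reliable forecasts; the function name is mine):

```python
def beta_params(pc: float, bss: float) -> tuple:
    """Parameters (r, s) of g(p) with mean pc and Brier skill score bss.

    For perfectly reliable forecasts BS = pc*(1 - pc) - Var(p), so
    BSS = Var(p) / (pc*(1 - pc)); a Beta(r, s) with mean pc has
    Var(p) = pc*(1 - pc)/(r + s + 1), which gives r + s = (1 - bss)/bss.
    """
    total = (1.0 - bss) / bss
    return pc * total, (1.0 - pc) * total

r, s = beta_params(pc=0.3, bss=0.25)
assert abs(r / (r + s) - 0.3) < 1e-12            # mean equals climatology
assert abs(1.0 / (r + s + 1.0) - 0.25) < 1e-12   # implied BSS recovered
```

Note that low skill (BSS → 0) sends *r* + *s* → ∞, concentrating *g*(*p*) at the climatology, while high skill spreads the forecasts toward 0 and 1.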

## 3. A behavioral model: Learning from experience

The normative model described above is an elegant extension of the forecast validation literature to include a representation of a decision structure that agents might face when applying forecast information in their daily lives. Although the cost–loss decision problem is by no means universal, it is an intuitive and straightforward example and general enough to be useful in a wide range of applied settings (Stewart et al. 2004; Katz and Murphy 1997). The inclusion of a decision structure into the validation and valuation exercise is a vital and necessary step; however, it falls short of providing an indication of how much value actual decision makers extract from forecasts. This should be no surprise because the model is intentionally normative, rather than positive, in its design. Thus it is perhaps best thought of as providing an upper bound on forecast value.

Behavioral deviations from the normative framework are among the main reasons for expecting real forecast users to extract less value from forecasts than normative models say they should. There is abundant evidence in both the applied meteorology and psychology literatures that suggests that psychological factors can affect people’s forecast use behavior and thus the realized value of forecasts. A number of studies have focused on the difficulties of communicating probabilistic forecasts (Gigerenzer et al. 2005; Roncoli 2006; National Research Council 2006), while others (Nicholls 1999), inspired by the seminal work of Kahneman and Tversky (2000), have emphasized the importance of cognitive heuristics and biases as explanations of suboptimal behavior. The perceived trustworthiness of forecasts is also a key limitation on their uptake. In the context of the cost–loss scenario, the theoretical analysis of Millner (2008) suggests that perceived forecast value is nonlinearly related to perceived accuracy and that a critical trust threshold must be crossed before forecasts are believed to have any value at all.

The normative model’s representation of user behavior assumes that users are rational, in the sense of the decision rule given in (1), and they have perfect knowledge of the properties of the forecasts; that is, they understand that the forecasts are perfectly reliable, and hence that they are better off using them than resorting to their climatological information. Thus perfectly knowledgeable users have complete trust in the forecasts, by definition. Although the heuristics and biases literature interrogates the rationality assumption in the context of decision making under uncertainty, there has been rather less formal treatment of the implications of the perfect knowledge assumption for users of weather and climate forecasts. When this assumption is relaxed in the normative model developed earlier, instead of using the forecasts all the time, the user is faced with a choice between using forecasts and other information sources. How this choice is made depends crucially on the representation of the user’s learning about forecast performance. If we assume that the user is statistically sophisticated—that is, he eventually deduces the conditional distribution *f*_{1}(*p*) after long-term exposure to the forecasts—then we can expect him to ultimately converge to normative behavior. This assumption may be justified for a subset of forecast users—for example, energy traders and sophisticated commercial farmers. If on the other hand he is statistically unsophisticated, does not trust the forecasts completely, or just does not pay close attention to forecast performance, his forecast use choices are likely to be dictated by other, more informal, learning processes. This case is likely to be more appropriate for people who lack the requisite statistical training, or the will, to understand objective quantifications of forecast performance and instead form opinions about the benefits of forecast use based on their experience. 
Put another way, their learning process is based on a response to the stimulus provided by the consequences of their forecast use choices, rather than cognitive reflection on the problem. In the remainder of this section, I propose a model of how a user who engages in such a learning process might differ from normative behavior.

The model I will develop is a very basic version of reinforcement learning—one of the most prevalent psychological frameworks for understanding learning processes. Theories of learning based on a notion of reinforcement go back as far as Thorndike (1911) and Pavlov (1928), who studied associative learning in animals. The theory was later refined and developed by Thorndike (1933) and Skinner (1933) and became one of the theoretical pillars of the so-called behaviorist school of psychology. Although by no means unchallenged, reinforcement learning still underpins a vast swathe of behavioral research today. [Refer to Lieberman 1999 and Mazur 2006 for book-length treatments that contextualize reinforcement in the wider literature on learning.]

In its simplest form, reinforcement learning suggests that the *frequency* of a given behavior is “reinforced” by its consequences; that is, choices that lead to positive consequences increase in frequency and those that lead to negative consequences decrease in frequency. The emphasis on the frequencies of choices, which necessitates a probabilistic description of choice behavior rather than a deterministic choice rule, is one of the fundamental differences between reinforcement learning and normative paradigms based on decision theory. Cross (1973) explains that, “[I]f we repeatedly reward a certain action, we in no sense guarantee that the action will be taken on a subsequent occasion, even in a ceteris paribus sense; only that the likelihood of its future selection will be increased. The vast body of experimental material on learning that has been accumulated provides convincing evidence that this interpretation is a good one.” As Cross suggests, reinforcement learning–type models have been remarkably successful at reproducing learning behavior in a variety of contexts, including more complex scenarios than ours in which agents engage in strategic interactions (Erev and Roth 1998; Arthur 1991).

The particular formal model of choice behavior I will adopt itself has a long history in mathematical psychology and economics. The first of this class of models (Bush and Mosteller 1955) considered the case where reinforcing stimuli were either present or absent and modeled the change in the frequencies of choices based on these binary outcomes. This model was later extended by Cross (1973) to include the effect of positive payoffs of different magnitudes on reinforcement. The version of the model I employ here is a slightly extended modern incarnation taken from Brenner (2006), which allows payoffs to be either positive or negative, and hence can represent negative reinforcement as well. The model also captures what psychologists refer to as spontaneous recovery (Thorndike 1932), in which actions that are nearly abandoned because of a series of negative consequences quickly increase in frequency if they result in positive outcomes. The model has the advantage of being intuitive, and analytically tractable, so that the behavioral modification to the normative relative value score in Eq. (8) can be computed explicitly as a function of the model parameters.

Imagine a hypothetical decision maker, call him Burrhus.^{1} Burrhus is not as certain about whether or not to use the forecasts as the users in the normative model. He has two sources of information: the forecasts and his knowledge of the climatological probability of the adverse event. Occasions will arise in which these two information sources will be in conflict, and Burrhus will be forced to choose between them. It is this choice that I wish to model. Burrhus suffers from informational constraints, in that initially he has no knowledge of the performance of a given forecast system, and he is statistically unsophisticated, in that even after long-term exposure to the forecasting service, he does not base his decision on the conditional distribution *f*_{1}(*p*). Instead, let us suppose that Burrhus bases his forecast use decisions on his experience of the consequences of his choices, and that he learns from the outcomes of those choices in a manner consistent with reinforcement learning. I thus assume that at each time step, Burrhus makes a probabilistic choice between using the forecasts or the climatology. We are interested in how the probability of his making use of the forecasts might evolve over time as he makes decisions and receives feedback on their consequences.

Learning occurs only on those occasions when the two information sources conflict—call the set of such events *D*—because when the forecasts and the climatology agree, the recommended action is the same regardless of which source is followed. Let *t* be an index of the events in the set *D*. Given that there is disagreement, let Burrhus’s forecast use choice at *t* be *c*(*t*), where *c*(*t*) = 1 if he chooses to make use of the forecasts and *c*(*t*) = 0 if he chooses to resort to his climatological information. I assume that once his choice is made, Burrhus acts rationally; that is, he acts in accordance with the decision rule (1) and follows the recommendation of his chosen information source. Once a choice is made and a consequence is realized, Burrhus learns from his experience in such a way that a positive outcome at *t* increases his chance of making choice *c*(*t*) at *t* + 1, whereas a negative outcome reduces it. Suppose that *q*(*c*, *t*) is Burrhus’s probability of making choice *c* at *t*. The Brenner (2006) model of how these probabilities evolve is

*q*(*c*, *t* + 1) = *q*(*c*, *t*) + *νR*(*t*)[1 − *q*(*c*, *t*)] if *R*(*t*) ≥ 0,
*q*(*c*, *t* + 1) = *q*(*c*, *t*) + *νR*(*t*)*q*(*c*, *t*) if *R*(*t*) < 0. (17)

Here *R*(*t*) is the *reinforcement strength* at *t* and *ν* is the *learning rate*, where we require *ν* to be small enough that the updated probabilities remain in [0, 1]. Positive outcomes [*R*(*t*) > 0] reinforce the choice that led to them by increasing the probability of that choice being made on the next occasion of disagreement. The increase in probability is proportional to the strength of the reinforcement and to the probability of the alternative choice. The proportionality constant *ν* is the learning rate and determines how quickly Burrhus responds to new information. Similarly, negative outcomes [*R*(*t*) < 0] are negatively reinforced, in that the probability of making the choice that led to them at *t* + 1 is decreased by an amount proportional to the reinforcement strength and to the probability of the choice *c*(*t*). Notice that this rule only tells us how to update the probability of making the choice *c*(*t*); however, because there are only two possible values of *c* and the sum of the probabilities of the choices must equal one, we have that *q*(1 − *c*, *t* + 1) = 1 − *q*(*c*, *t* + 1). This allows us to express the updating procedure in terms of just one of the probabilities, say, *q*_{t} := *q*(1, *t*)—the probability of following the forecasts.
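In code, the update rule takes just a few lines. A minimal sketch of the two-branch reinforcement update described above (function and variable names are mine; the final clamp is a safeguard that the text achieves instead by bounding the learning rate):

```python
def reinforce(q_c: float, R: float, nu: float) -> float:
    """Update the probability q_c of the choice just taken, given
    reinforcement strength R and learning rate nu (a rule of the
    form of Eq. (17)).

    Positive R moves q_c toward 1 in proportion to (1 - q_c);
    negative R moves it toward 0 in proportion to q_c.
    The complementary choice automatically has probability 1 - q_c.
    """
    if R >= 0:
        q_new = q_c + nu * R * (1.0 - q_c)
    else:
        q_new = q_c + nu * R * q_c
    return min(1.0, max(0.0, q_new))   # guard against an oversized step

# A rewarded choice becomes more likely; a punished one, less likely.
q = 0.5
q_up = reinforce(q, R=8.0, nu=0.05)    # positive reinforcement
q_dn = reinforce(q, R=-2.0, nu=0.05)   # negative reinforcement
assert q_dn < q < q_up
```

Note the spontaneous-recovery property mentioned above: when *q_c* is near zero, a positive outcome produces a large proportional increase, because the increment scales with (1 − *q_c*).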

It remains to specify the reinforcement strength at *t*.^{2} I will assume that the reinforcement strength is determined as follows: Suppose the loss associated with the action *a*(*t*) at *t*, given that event *e*(*t*) occurred, is 𝗟(*a*(*t*), *e*(*t*)), where 𝗟 is the loss matrix in Table 1. Then the reinforcement strength is given by

*R*(*t*) = 𝗟(1 − *a*(*t*), *e*(*t*)) − 𝗟(*a*(*t*), *e*(*t*)), (19)

that is, the difference between the loss Burrhus would have sustained had he taken the alternative action at *t* and the loss that was actually realized given that he followed his choice *c*(*t*). Thus *R*(*t*) is a measure of the regret or happiness that Burrhus feels from his choice. For example, suppose that at *t* = *τ* Burrhus chose to make use of the forecasts. Suppose that *z* > *p*_{c}, so that when the forecasts and the climatology disagree, the forecasts necessarily suggest that Burrhus should protect—that is, *a*(*τ*) = 1. Assume that the adverse event was subsequently realized [*e*(*τ*) = 1]. Then the reinforcement strength used to update the probability of using the forecasts is *R*(*τ*) = 𝗟(0, 1) − 𝗟(1, 1) = *L* − *C* > 0. In this case Burrhus will be more likely to use the forecast on the next occasion of disagreement because it led him to take the correct action.

It is important to keep in mind that the choice of reinforcement strength in Eq. (19), although intuitive, is not necessarily an accurate representation of human choice behavior. In general, reinforcement may be moderated by the variability in the rewards obtained from a given choice (Behrens et al. 2007), nonlinear responses to rewards of different magnitudes, and asymmetries between the reinforcing effects of regret and happiness (Kahneman and Tversky 1979). The motivation for my choice is a desire to keep the model simple and tractable and as close as possible to its normative analog. Thus the model should be interpreted as a stylized example of the behavioral modeling paradigm, rather than an empirically substantiated predictive model.

Notice that the possible values of *R* are just linear combinations of *C* and *L*. The update rule (17) can thus be written as a function of *z* = *C*/*L* only, by defining a rescaled learning rate *λ* := *νL*. By using this definition, Eqs. (17)–(19), and the fact that *q*(0, *t*) = 1 − *q*_{t}, a complete list of possible outcomes from the reinforcement rule can be generated. This list is reproduced in appendix A. One finds that the learning rule reduces to (taking the case *z* > *p*_{c}, so that on *D* the forecasts recommend protection)

*q*_{t+1} = *q*_{t} + *λ*(1 − *z*)(1 − *q*_{t}) if *e*(*t*) = 1,
*q*_{t+1} = (1 − *λz*)*q*_{t} if *e*(*t*) = 0, (20)

with the analogous rule holding when *z* < *p*_{c}, and where the constraint on the learning rate, expressed in terms of *λ*, becomes

*λ* ∈ (0, 1). (21)

Remarkably, the update depends only on whether the forecasts proved correct, not on which choice Burrhus actually made. This is a consequence of the antisymmetry of the reinforcement strength [*R*(*a*, *e*) = −*R*(1 − *a*, *e*)] and the fact that the two information sources necessarily disagree when learning occurs. At face value, these equations seem to suggest that the learning process is independent of forecast performance; however, this is not the case. The reason is that because attention is restricted to the set *D*, the probability of the event *e* = 1 is no longer given by *p*_{c}, the climatological probability. In fact, if we let *p*_{1|D} be the probability of *e* = 1 given that we are in *D*, then (again for *z* > *p*_{c}, where *D* consists of the occasions on which *p* > *z*) we have that

*p*_{1|D} = [*p*_{c} − *μ*(*z*)]/[1 − *G*(*z*)], (22)

where *μ* is the partial mean and *G* the cumulative distribution function of *g*(*p*). Thus *p*_{1|D} depends implicitly on the forecast performance through the distribution *g*(*p*) and also on the value of *z*. A representative trajectory of the sequence *q*_{t} is plotted in Fig. 1.
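The learning dynamics on the disagreement set are straightforward to simulate. The sketch below treats the *z* > *p*_{c} case, in which the forecasts recommend protection and the event occurs with probability *p*_{1|D}; the function name and parameter values are mine and are purely illustrative:

```python
import random

def simulate_q(q0: float, lam: float, z: float, p1_D: float,
               steps: int, rng: random.Random) -> list:
    """Trajectory of q_t under the reduced learning rule (z > p_c case):
    the event vindicates the forecast with probability p1_D, raising q_t
    by lam*(1 - z)*(1 - q_t); otherwise q_t shrinks by a factor (1 - lam*z).
    """
    q, traj = q0, [q0]
    for _ in range(steps):
        if rng.random() < p1_D:                    # adverse event occurs on D
            q = q + lam * (1.0 - z) * (1.0 - q)    # forecast vindicated
        else:
            q = (1.0 - lam * z) * q                # forecast "cried wolf"
        traj.append(q)
    return traj

traj = simulate_q(q0=0.5, lam=0.3, z=0.4, p1_D=0.7,
                  steps=1000, rng=random.Random(1))
assert all(0.0 <= q <= 1.0 for q in traj)          # probabilities stay valid
```

Because *λ*(1 − *z*) < 1 and *λz* < 1 whenever *λ* ∈ (0, 1), the trajectory is confined to [0, 1] without any explicit clamping.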

Notice that although a statistically sophisticated forecast user would eventually converge to the value *q*_{t} = 1—because forecasts are more valuable on average than the climatology—Burrhus’s tendency toward a particular choice is constantly in flux. Because the past is represented only by the current value *q*_{t}, he does not have access to a complete picture of the historical consequences of his actions. His tendencies thus change over time as he makes successive choices and receives feedback on their consequences.

The trajectory illustrated in Fig. 1 is only one of an infinite number. To draw some general conclusions about Burrhus’s behavior for particular parameter values, we need to understand the statistics of the updating process given by Eq. (20). To do this I will make use of the following results:

**Theorem 1:** Let ⟨*q*_{t}⟩ be the expected value of *q*_{t}. Then ⟨*q*_{t}⟩ satisfies the linear recursion

⟨*q*_{t+1}⟩ = *A*⟨*q*_{t}⟩ + *B*,

where *A*, *B* are constants given by

*A* = 1 − *λ*[*p*_{1|D}(1 − *z*) + (1 − *p*_{1|D})*z*],  *B* = *λ*(1 − *z*)*p*_{1|D}, (24)

the expression for *B* applying to the case *z* > *p*_{c}; when *z* < *p*_{c}, *B* = *λz*(1 − *p*_{1|D}) and *A* is unchanged.

**Theorem 2:** If *λ* ∈ (0, 1), then eventually (*t* → ∞) the update rule in Eq. (20) gives rise to a distribution *P*(*q*) of the values of *q*_{t} that is independent of the initial value *q*_{0} and asymptotically stationary.

A proof of theorem 1 is given in appendix B, and a sketch of the proof of theorem 2 is given in appendix C. Figure 2 illustrates theorem 2 by simulating the long-run distribution *P*(*q*) for fixed values of *z*, *p*_{c}, and BSS, and different values of *λ*.

Although the mode and higher moments of *P*(*q*) shift when *λ* is changed, the figure suggests that the long-run mean value remains unchanged. With theorem 1 in hand, it is possible to calculate an explicit formula for the long-run mean value of *q*_{t} that demonstrates this fact. Define *q*_{∞} := lim_{t→∞}⟨*q*_{t}⟩; then theorem 1 implies *q*_{∞} = *B*/(1 − *A*). Thus, for *z* > *p*_{c},

*q*_{∞} = (1 − *z*)*p*_{1|D}/[*p*_{1|D}(1 − *z*) + (1 − *p*_{1|D})*z*],

in which the learning rate cancels. Thus, apart from the learning rate *λ* and the initial value *q*_{0}, the long-run average dynamics of the learning model are completely specified by *z*, *p*_{c}, and BSS. The sequence of expected values ⟨*q*_{t}⟩ is guaranteed to converge to *q*_{∞} provided |*A*| < 1. This requirement translates into a condition on *λ* that ensures convergence; however, it can be shown that this condition is not as restrictive as that in (21).^{3} Furthermore, restricting *λ* to be less than 1 ensures that the entire distribution of *q* values (not just the mean) converges to a long-run distribution that is independent of the initial value *q*_{0}. Thus the distributions plotted in Fig. 2 are valid for all *q*_{0}.
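Because the fixed point *B*/(1 − *A*) contains no *λ*, simulations run with different learning rates should share the same long-run mean. A quick seeded Monte Carlo check (the parameter values are illustrative, and *p*_{1|D} is set by hand rather than computed from *g*):

```python
import random

def longrun_mean(lam: float, z: float, p1_D: float,
                 steps: int, seed: int) -> float:
    """Time-averaged q_t after a burn-in, under the reduced rule
    for the z > p_c case (event probability p1_D on the set D)."""
    rng, q, acc = random.Random(seed), 0.5, 0.0
    burn = steps // 2
    for t in range(steps):
        if rng.random() < p1_D:
            q = q + lam * (1.0 - z) * (1.0 - q)
        else:
            q = (1.0 - lam * z) * q
        if t >= burn:
            acc += q
    return acc / (steps - burn)

z, p1 = 0.4, 0.7
q_inf = (1 - z) * p1 / (p1 * (1 - z) + (1 - p1) * z)   # = B / (1 - A)
m_slow = longrun_mean(0.1, z, p1, steps=200_000, seed=2)
m_fast = longrun_mean(0.6, z, p1, steps=200_000, seed=3)
assert abs(m_slow - q_inf) < 0.02 and abs(m_fast - q_inf) < 0.02
```

The two time averages agree with the analytical fixed point even though the fast learner’s *q*_{t} fluctuates far more widely, consistent with the *λ*-invariance of the long-run mean.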

The rate at which ⟨*q*_{t}⟩ converges to its long-run value *q*_{∞} depends on the constant *A*. Provided *A* > 0, convergence to *q*_{∞} is monotonic; because *A* > 0 when *λ* satisfies (21), I will focus on this case. Thus when *q*_{0} < *q*_{∞}, the sequence of expected values increases monotonically toward *q*_{∞}, whereas when *q*_{0} > *q*_{∞} it decreases monotonically toward *q*_{∞}. The magnitude of *A* gives the rate of convergence of the sequence of mean values, with values close to zero (one) corresponding to fast (slow) convergence. Inspecting the expression for *A* in Eq. (24), it is clear that the larger *λ* is, the faster the convergence. Moreover, one can show that lim_{z→0} *A* = lim_{z→1} *A* = 1.^{4} Thus users with high or low values of *z* exhibit slow convergence to *q*_{∞}. Figure 3 illustrates the dependence of *A* on *z* for a sample case.

In the next section, I make use of these results to establish the relationship between the behavioral learning model and the normative model of section 2.

## 4. Relationship between normative and behavioral models

Armed with the statistics of the learning process, the relationship between the behavioral and normative models can now be made precise. Let *V*_{B} be the expected loss that the behavioral learner sustains in the long run. To compute it, I begin by defining the set *D*^{c}, the complement of *D*—that is, the set of events in which the forecasts and the climatology agree with one another—and let *P*_{D} = Prob(*D*) = 1 − Prob(*D*^{c}) be the probability of the forecasts and the climatology disagreeing. In addition, let *V*_{clim|D} and *V*_{F|D} be the expected losses from using the climatology and the forecasts, respectively, conditional on the fact that the two information sources disagree, with similar definitions holding for the case where they do agree. Then, the average losses *V*_{B} that the behavioral learner sustains are given by

*V*_{B} = *P*_{D}[*q*_{∞}*V*_{F|D} + (1 − *q*_{∞})*V*_{clim|D}] + (1 − *P*_{D})*V*_{F|Dc}.

Since *V*_{F|Dc} = *V*_{clim|Dc}, we have that *V*_{B} − *V*_{F} = (1 − *q*_{∞})(*V*_{clim} − *V*_{F}). Defining the behavioral relative value score as VS_{B} := (*V*_{clim} − *V*_{B})/(*V*_{clim} − *V*_{perf}), it follows that VS_{B} is related to the normative relative value score VS_{N} via

VS_{B} = *q*_{∞}VS_{N}. (34)

I plot VS_{B} as a function of *z* for several values of the Brier skill score in Fig. 4.

The relationship (34) and the expression for *q*_{∞} allow several general conclusions to be drawn. First, notice that Eq. (34) implies that the behavioral relative value score is always less than the normative relative value score—statistically unsophisticated forecast users do not attain the normative ideal even after long-term exposure to the forecasting product. In fact, for fixed parameter values, VS_{B} is a constant fraction *q*_{∞} of VS_{N}. Thus the key to understanding the relationship between the two models is to understand this quantity, which depends on *z*, BSS, and *p*_{c}.
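Both value scores can be estimated numerically for a sample parameter set. The sketch below assumes the standard cost–loss quantities *V*_{clim} = *L* min(*z*, *p*_{c}), *V*_{perf} = *Lzp*_{c}, and VS_{N} = (*V*_{clim} − *V*_{F})/(*V*_{clim} − *V*_{perf}) of Richardson (2001), a beta forecast distribution with parameters implied by *p*_{c} and BSS, and the multiplicative relationship VS_{B} = *q*_{∞}VS_{N} derived in this section; it is an illustration of my own construction, not the paper’s code:

```python
import random

def value_scores(z: float, pc: float, bss: float,
                 n: int = 400_000, seed: int = 4) -> tuple:
    """Monte Carlo estimate of (VS_N, VS_B) for the z > pc case, with L = 1."""
    rng = random.Random(seed)
    # Beta parameters of g(p) implied by pc and BSS (reliable forecasts).
    r, s = pc * (1 - bss) / bss, (1 - pc) * (1 - bss) / bss
    loss_f = 0.0          # running total of losses when following forecasts
    sum_p_D = n_D = 0     # statistics on the disagreement set {p > z}
    for _ in range(n):
        p = rng.betavariate(r, s)
        loss_f += z if p > z else p   # protect at cost z, else expected loss p
        if p > z:                     # climatology (pc < z) says "don't protect"
            sum_p_D += p
            n_D += 1
    v_f = loss_f / n
    v_clim, v_perf = min(z, pc), z * pc
    vs_n = (v_clim - v_f) / (v_clim - v_perf)
    p1_D = sum_p_D / n_D              # Prob(e = 1 | disagreement)
    q_inf = (1 - z) * p1_D / (p1_D * (1 - z) + (1 - p1_D) * z)
    return vs_n, q_inf * vs_n         # (normative, behavioral) value scores

vs_n, vs_b = value_scores(z=0.4, pc=0.3, bss=0.3)
assert 0.0 < vs_b < vs_n < 1.0   # learning strictly erodes realized value
```

Sweeping *z* or BSS in this sketch reproduces the qualitative patterns discussed below: the gap between the two scores closes at the extremes and peaks at intermediate values.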

To understand the dependence of *q*_{N} on *z*, consider first the effect of *z* on reinforcement. For large (small) values of *z*, the reinforcement strength *R* in Eq. (19) will be large in absolute magnitude when *e* = 0 (*e* = 1). However, using Eq. (22) one can show that when *z* is large (small), the probability of *e* = 0 (*e* = 1) occurring is relatively low. It turns out that the low probability dominates the strength of the reinforcement for extreme values of *z*. Extensive numerical simulations for a variety of parameter values show that *z* = 0 (for *p*_{c} > 0.5) or *z* = 1 (for *p*_{c} < 0.5) gives rise to the lowest value of *q*_{N}. Thus users with *z* close to 0 or 1 exhibit the greatest relative reduction in their achieved value scores, with intermediate values of *z* corresponding to higher values of *q*_{N}. However, because VS_{N} tends to zero as *z* tends to 0 or 1 [as can be shown from the definition (8)], the difference between the losses realized by normative and behavioral agents is small for extreme values of *z*. The behavior of *V*_{B} − *V*_{F} for intermediate values of *z* is as follows: For *p*_{c} < 0.5, the difference between losses is increasing in *z* for *z* < *p*_{c}, whereas for *z* > *p*_{c} it either decreases monotonically or has a single maximum before descending back to zero at *z* = 1. This behavior is reversed for *p*_{c} > 0.5, with the loss difference increasing monotonically or exhibiting a single maximum for *z* < *p*_{c} and decreasing for *z* > *p*_{c}. This behavior is illustrated in Fig. 5.
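The numerical simulations referred to above can be sketched in a few lines. The snippet below is a minimal illustration, not the code used for the paper's figures: it iterates the two updating functions given in appendix B for the case *z* > *p*_{c} and checks that the learner's long-run mean forecast-use frequency settles at the fixed point of the mean recursion. The function name `simulate_mean_q` and all parameter values are my own illustrative choices.

```python
import random

# Monte Carlo sketch of the reinforcement learner for z > p_c, using the
# updating functions from appendix B: h0(q) = (1 - lam*z)*q is applied with
# probability 1 - p1D, and h1(q) = (1 - lam*(1 - z))*q + lam*(1 - z) with
# probability p1D. All parameter values are illustrative.

def simulate_mean_q(z, lam, p1D, q0=0.5, burn_in=2000, steps=50000, seed=0):
    rng = random.Random(seed)
    q, total = q0, 0.0
    for t in range(burn_in + steps):
        if rng.random() < p1D:
            q = (1 - lam * (1 - z)) * q + lam * (1 - z)   # apply h1
        else:
            q = (1 - lam * z) * q                          # apply h0
        if t >= burn_in:
            total += q
    return total / steps

z, lam, p1D = 0.6, 0.3, 0.4
# Fixed point of the mean recursion (cf. appendix B and footnote 3):
q_star = p1D * (1 - z) / ((1 - p1D) * z + p1D * (1 - z))
assert abs(simulate_mean_q(z, lam, p1D) - q_star) < 0.02
```

Sweeping *z*, *p*_{1|D}, or *λ* in the same way reproduces the kind of parameter dependence discussed in the text.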

Now consider the dependence of *q*_{N} on BSS. Note that VS_{N} − VS_{B} = (1 − *q*_{N})VS_{N} is proportional to *V*_{B} − *V*_{F} when we hold *z* and *p*_{c} fixed and vary BSS. I plot the difference between the value scores as a function of BSS in Fig. 6.

The figure suggests that the difference between the losses realized by behavioral and normative agents is always a unimodal function of BSS, with the largest deviations occurring for intermediate values of BSS and the losses sustained converging for extreme values of BSS. The reason for this is simple—the factor (1 − *q*_{N}) is decreasing in BSS, whereas VS_{N} is increasing in BSS. The competition between these two effects leads to the inverted U shape in Fig. 6. In addition, because VS_{N} = 0 when BSS = 0 and *q*_{N} tends to 1 as BSS tends to 1, the difference between the value scores vanishes at the extremes of forecast skill.

The fact that the difference between the normative and behavioral models has a characteristic dependence on the decision parameter *z* and the forecast skill BSS allows predictions to be made about when the effect of statistically unsophisticated learning on user behavior and realized value is likely to be most important. In general, the model suggests that behavioral deviations from the normative ideal are likely to be significant for users with intermediate values of *z* and for forecasts of moderate skill. This provides a direct justification for the model: not only may it provide a more realistic representation of the behavior of certain kinds of forecast users, but it can also suggest which users stand to gain the most from an increased knowledge and understanding of forecasts and their performance.

## 5. Conclusions

The analysis presented in this paper offers an alternative modeling paradigm to the normative framework for assessing forecast value in the case of cost–loss decisions. The model is designed to incorporate a specific behavioral effect—learning dynamics based on the assumption that the forecast user is statistically unsophisticated; that is, the forecast user does not deduce the statistical properties of the forecasts after long-term exposure to them. Instead, the user reacts to forecasts in a manner consistent with the theory of reinforcement learning, so that the frequencies of forecast use choices are positively or negatively reinforced depending on their consequences. A simple model of this process based on existing models of reinforcement in the psychology and economics literature was proposed, its consequences deduced, and its deviation from the normative model analyzed. It was demonstrated that accounting for statistically unsophisticated learning reduces the relative value score that the forecast user achieves by a multiplicative factor that depends on the user’s cost–loss ratio, the forecast skill, and the climatological probability of the adverse event. An analytical expression for this factor was derived, and its properties analyzed. It was shown that differences between the losses sustained by normative and statistically unsophisticated users are greatest for users with intermediate cost–loss ratios and when forecasts are of intermediate skill. These predictions of the model are empirically testable, and if verified in the field or in laboratory experiments, they could act as a useful heuristic for directing interventions [such as those described by Patt et al. (2007)] aimed at educating users, thus increasing the value users realize from forecasts.

If we accept the assertion of Pielke and Carbone (2002) that it is in the interests of the scientific community to take responsibility not only for the production of forecasts but to follow them through all the way to end users’ decisions, then it is vital to attempt a systematic scientific study of the user’s forecast use behavior. In pursuing this, we would do well to learn from and collaborate with colleagues in neuroscience and experimental psychology, and experimental and behavioral economics. These disciplines have evolved powerful tools for understanding the neural processes involved in decision making (Yu and Dayan 2005; Behrens et al. 2007; Platt and Huettel 2008; Rushworth and Behrens 2008), patterns and biases in decision making (Kahneman and Tversky 2000; Nicholls 1999), and sophisticated models of learning and trust (Camerer 2003; Erev et al. 2008). The learning model presented here is a stylized example of the behavioral modeling paradigm that was designed to emphasize the importance of behavioral factors in determining user behavior and thus de facto forecast value. It should certainly not be used for policy recommendations in the absence of empirical data; such data come in two varieties—descriptive field studies and laboratory experiments. Descriptive studies of forecast value (Patt et al. 2005; Stewart et al. 2004; Stewart 1997), in which researchers attempt to analyze the behavior of forecast users in the field, have found it difficult to make coherent measurements of forecast value and its determinants, owing largely to a wide range of circumstantial variables that are beyond the observer’s control. For this reason, more effort and resources should be directed toward running controlled laboratory experiments in which test subjects interact with forecasting products in simulated decision scenarios [see Sonka et al. (1988) for an early attempt]. 
Such experiments allow for a much greater degree of controllability than field studies, enabling the experimenter to build testable models of decision-making behavior that stand a greater chance of being generalizable than any single field study. It is not clear a priori that results obtained in the laboratory will necessarily be applicable in more complex real-world decision environments. Laboratory investigations can, however, serve to highlight key behavioral factors that influence decision making even in controlled situations and thus suggest causative hypotheses that can be empirically tested in the field. Hopefully, this will not only allow us to make more credible estimates of the true value of forecasts but will also suggest tangible, behaviorally sound, methods of increasing their value for the users that are their raison d’être.

## Acknowledgments

I thank Christopher Honey, Gareth Boxall, and Rachel Denison for their helpful suggestions. The comments of the two anonymous reviewers were very useful. The financial support of the Commonwealth Scholarship Commission and the NRF are gratefully acknowledged.

## REFERENCES

Arthur, W., 1991: Designing economic agents that act like human agents: A behavioral approach to bounded rationality. *Amer. Econ. Rev.*, **81**, 353–359.

Behrens, T., Woolrich, M., Walton, M., and Rushworth, M., 2007: Learning the value of information in an uncertain world. *Nat. Neurosci.*, **10**, 1214–1221.

Brenner, T., 2006: Agent learning representation: Advice on modelling economic learning. *Agent-Based Computational Economics*, L. Tesfatsion and K. Judd, Eds., Vol. 2, *Handbook of Computational Economics*, Elsevier, 895–947.

Bush, R., and Mosteller, F., 1955: *Stochastic Models for Learning*. Wiley, 365 pp.

Camerer, C., 2003: *Behavioral Game Theory: Experiments in Strategic Interaction*. Princeton University Press, 550 pp.

Cross, J., 1973: A stochastic learning model of economic behavior. *Quart. J. Econ.*, **87**, 239–266.

Dennett, D., 1975: Why the law of effect will not go away. *J. Theory Soc. Behav.*, **5**, 169–188.

Erev, I., and Roth, A., 1998: Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. *Amer. Econ. Rev.*, **88**, 848–881.

Erev, I., Ert, E., and Yechiam, E., 2008: Loss aversion, diminishing sensitivity, and the effect of experience on repeated decisions. *J. Behav. Decis. Making*, **21**, 575–597.

Gigerenzer, G., Hertwig, R., van den Broek, E., Fasolo, B., and Katsikopoulos, K., 2005: “A 30% chance of rain tomorrow”: How does the public understand probabilistic weather forecasts? *Risk Anal.*, **25**, 623–629.

Jolliffe, I., and Stephenson, D., 2003: *Forecast Verification: A Practitioner’s Guide in Atmospheric Science*. Wiley, 240 pp.

Kahneman, D., and Tversky, A., 1979: Prospect theory: An analysis of decision under risk. *Econometrica*, **47**, 263–292.

Kahneman, D., and Tversky, A., 2000: *Choices, Values, and Frames*. Cambridge University Press, 840 pp.

Katz, R., and Murphy, A., 1997: *Economic Value of Weather and Climate Forecasts*. Cambridge University Press, 222 pp.

Katz, R., and Ehrendorfer, M., 2006: Bayesian approach to decision making using ensemble weather forecasts. *Wea. Forecasting*, **21**, 220–223.

Khamsi, M., and Kirk, W., 2001: *An Introduction to Metric Spaces and Fixed Point Theory*. Wiley, 302 pp.

Lieberman, D., 1999: *Learning: Behavior and Cognition*. 3rd ed. Wadsworth Publishing, 595 pp.

Mazur, J., 2006: *Learning and Behavior*. 6th ed. Prentice Hall, 444 pp.

Millner, A., 2008: Getting the most out of ensemble forecasts: A valuation model based on user–forecast interactions. *J. Appl. Meteor. Climatol.*, **47**, 2561–2571.

Murphy, A., 1977: The value of climatological, categorical and probabilistic forecasts in the cost–loss ratio situation. *Mon. Wea. Rev.*, **105**, 803–816.

Murphy, A., and Ehrendorfer, M., 1987: On the relationship between the accuracy and value of forecasts in the cost–loss ratio situation. *Wea. Forecasting*, **2**, 243–251.

Murphy, A., and Winkler, R., 1987: A general framework for forecast verification. *Mon. Wea. Rev.*, **115**, 1330–1338.

National Research Council, 2006: *Completing the Forecast: Characterizing and Communicating Uncertainty for Better Decisions Using Weather and Climate Forecasts*. National Academies Press, 112 pp.

Nicholls, N., 1999: Cognitive illusions, heuristics, and climate prediction. *Bull. Amer. Meteor. Soc.*, **80**, 1385–1397.

Norman, M., 1968: Some convergence theorems for stochastic learning models with distance diminishing operators. *J. Math. Psychol.*, **5**, 61–101.

Oreskes, N., Shrader-Frechette, K., and Belitz, K., 1994: Verification, validation, and confirmation of numerical models in the earth sciences. *Science*, **263**, 641–646.

Patt, A., and Gwata, C., 2002: Effective seasonal climate forecast applications: Examining constraints for subsistence farmers in Zimbabwe. *Global Environ. Change*, **12**, 185–195.

Patt, A., Suarez, P., and Gwata, C., 2005: Effects of seasonal climate forecasts and participatory workshops among subsistence farmers in Zimbabwe. *Proc. Natl. Acad. Sci. USA*, **102**, 12673–12678.

Patt, A., Ogallo, L., and Hellmuth, M., 2007: Learning from 10 years of climate outlook forums in Africa. *Science*, **318**, 49–50.

Pavlov, I., 1928: *Lectures on Conditioned Reflexes: Twenty-Five Years of Objective Study of the Higher Nervous Activity (Behavior) of Animals*. International Publishers, 414 pp.

Pielke, R., Jr., and Carbone, R. E., 2002: Weather impacts, forecasts, and policy: An integrated perspective. *Bull. Amer. Meteor. Soc.*, **83**, 393–403.

Platt, M., and Huettel, S., 2008: Risky business: The neuroeconomics of decision making under uncertainty. *Nat. Neurosci.*, **11**, 398–403.

Rayner, S., Lach, D., and Ingram, H., 2005: Weather forecasts are for wimps: Why water resource managers do not use climate forecasts. *Climatic Change*, **69**, 197–227.

Richardson, D., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. *Quart. J. Roy. Meteor. Soc.*, **127**, 2473.

Roncoli, C., 2006: Ethnographic and participatory approaches to research on farmer responses to climate predictions. *Climate Res.*, **33**, 81–99.

Roulston, M. S., and Smith, L. A., 2004: The boy who cried wolf revisited: The impact of false alarm intolerance on cost–loss scenarios. *Wea. Forecasting*, **19**, 391–397.

Rushworth, M., and Behrens, T., 2008: Choice, uncertainty and value in prefrontal and cingulate cortex. *Nat. Neurosci.*, **11**, 389–397.

Skinner, B., 1933: The rate of establishment of a discrimination. *J. Gen. Psychol.*, **9**, 302–350.

Sonka, S., Changnon, S., and Hofing, S., 1988: Assessing climate information use in agribusiness. Part II: Decision experiments to estimate economic value. *J. Climate*, **1**, 766–774.

Stewart, T., 1997: Forecast value: Descriptive decision studies. *Economic Value of Weather and Climate Forecasts*, R. Katz and A. Murphy, Eds., Cambridge University Press, 147–181.

Stewart, T., Pielke, R., Jr., and Nath, R., 2004: Understanding user decision making and the value of improved precipitation forecasts: Lessons from a case study. *Bull. Amer. Meteor. Soc.*, **85**, 223–235.

Thorndike, E., 1911: *Animal Intelligence: Experimental Studies*. Macmillan, 297 pp.

Thorndike, E., 1932: *The Fundamentals of Learning*. AMS Press, 638 pp.

Thorndike, E., 1933: A theory of the action of the after-effects of a connection upon it. *Psychol. Rev.*, **40**, 434–439.

Vogel, C., and O’Brien, K., 2006: Who can eat information? Examining the effectiveness of seasonal climate forecasts and regional climate-risk management strategies. *Climate Res.*, **33**, 111–122.

Wilks, D., 2001: A skill score based on economic value for probability forecasts. *Meteor. Appl.*, **8**, 209–219.

Wilks, D., 2006: *Statistical Methods in the Atmospheric Sciences*. 2nd ed. Academic Press, 627 pp.

Yu, A., and Dayan, P., 2005: Uncertainty, neuromodulation, and attention. *Neuron*, **46**, 681–692.

Zhu, Y., Toth, Z., Wobus, R., Richardson, D., and Mylne, K., 2002: The economic value of ensemble-based weather forecasts. *Bull. Amer. Meteor. Soc.*, **83**, 73–83.

## APPENDIX A

### Reinforcement Learning Rule


## APPENDIX B

### Proof of Theorem 1

The learning rule Eq. (19) can be written in the form

*q*_{t+1} = *h*_{i}(*q*_{t}),     (B1)

where *i* ∈ {0, 1} indexes a set of two updating functions—for example, *h*_{0}(*q*) = (1 − *λz*)*q* and *h*_{1}(*q*) = [1 − *λ*(1 − *z*)]*q* + *λ*(1 − *z*) when *z* > *p*_{c}. At each step *t*, one of the updating functions is chosen to update the value of *q*_{t} with a known probability *p*_{i}, where *p*_{0} = 1 − *p*_{1|D} and *p*_{1} = *p*_{1|D}. Systems of functions such as Eq. (B1), in which a function is applied with a known probability at each iteration, are known as iterated function systems (IFS).

Define *T*_{t}(*q*_{0}) to be the set of values that *q*_{t} takes with nonzero probability, given the initial value *q*_{0} and that the *q*’s are updated according to (B1). The mean value of *q*_{t} is given by

⟨*q*_{t}⟩ = Σ_{*q*_{t} ∈ *T*_{t}(*q*_{0})} *w*(*q*_{t}) *q*_{t},

where *w*(*q*_{t}) is the probability of being at *q*_{t} at time *t*. Now for each element *q*_{t+1} of *T*_{t+1}(*q*_{0}), we have that *q*_{t+1} = *h*_{e(t)}(*q*_{t}) for some *q*_{t} ∈ *T*_{t}(*q*_{0}) and some *e*(*t*) ∈ {0, 1}. The probability of this *q*_{t+1} occurring satisfies *w*(*q*_{t+1}) = *p*_{e(t)}*w*(*q*_{t}). Using the fact that the *h*_{i} are all linear and that for a linear function *h*(*q*), *Eh*(*q*) = *h*(*Eq*), where *E* is the expectation operator, we have that

⟨*q*_{t+1}⟩ = *p*_{0}*h*_{0}(⟨*q*_{t}⟩) + *p*_{1}*h*_{1}(⟨*q*_{t}⟩).

Substituting the expressions for *h*_{i} and *p*_{i} into this equation and collecting terms gives the result.
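As a quick numerical sanity check of this last step, the sketch below (my own code, with illustrative parameter values) iterates the mean recursion above and confirms that it converges to the fixed point obtained by collecting terms:

```python
# Deterministic iteration of the mean recursion for the case z > p_c.
# Parameter values are illustrative; lam in (0, 1) guarantees convergence.
z, lam, p1 = 0.7, 0.2, 0.35
p0 = 1 - p1
h0 = lambda q: (1 - lam * z) * q
h1 = lambda q: (1 - lam * (1 - z)) * q + lam * (1 - z)

m = 0.9                                  # arbitrary initial mean <q_0>
for _ in range(500):
    m = p0 * h0(m) + p1 * h1(m)          # <q_{t+1}> = p0*h0(<q_t>) + p1*h1(<q_t>)

# Collecting terms gives the fixed point q* = p1*(1 - z) / (p0*z + p1*(1 - z)).
q_star = p1 * (1 - z) / (p0 * z + p1 * (1 - z))
assert abs(m - q_star) < 1e-9
```

The recursion contracts by the factor |1 − *λ*(*p*_{0}*z* + *p*_{1}(1 − *z*))| < 1 at each step, which is why convergence is geometric.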

## APPENDIX C

### Proof of Theorem 2

The theorem follows as a direct consequence of theorem 2.2 in Norman (1968). To establish it, I will work with the case *z* > *p*_{c}. The analysis presented below follows through in an exactly analogous manner for the case *z* ≤ *p*_{c}.

Let *S* = ([0, 1], *d*) be a metric space on the unit interval, where *d*(*x*, *y*) = |*x* − *y*| is the Euclidean distance between *x* and *y*. A *contraction mapping* on *S* is a function *h*: *S* → *S* that satisfies

*d*[*h*(*x*), *h*(*y*)] ≤ *k* *d*(*x*, *y*) for some *k* < 1 and all *x*, *y* ∈ *S*;     (C1)

that is, *h* shrinks the distances between the initial points by at least a factor of *k* with each iteration. Readers interested in the general properties of such maps and the spaces they act on are referred to Khamsi and Kirk (2001). If each of the functions *h*_{i}(*q*_{t}) in (B1) is a contraction mapping, then the IFS defined on *S* is *distance diminishing*.^{5} Finally, define the distance *d*(*A*, *B*) between two sets *A* and *B* as

*d*(*A*, *B*) = min_{*a* ∈ *A*, *b* ∈ *B*} *d*(*a*, *b*).     (C2)

Now let *K*_{t}(*q*, *q*_{0}) be the probability of being in state *q* at time *t* given an initial value *q*_{0}.

**Theorem 2.2** of Norman (1968): Suppose that an IFS of the form (B1) is distance diminishing and has no absorbing states. Suppose also that the following condition is satisfied:

*d*[*T*_{n}(*q*_{0}), *T*_{n}(*q*′_{0})] → 0 as *n* → ∞ for all *q*_{0}, *q*′_{0} ∈ *S*.     (C3)

Then the distribution of *q*_{t} converges (uniformly) to the stationary asymptotic distribution *P*(*q*) for any initial value *q*_{0}.

To apply the theorem, we must thus verify the following:

(i) The system has no absorbing states; that is, there is no value of *q*_{t} for which *q*_{t+1} = *q*_{t} with probability 1.

(ii) Each of the functions *h*_{i} is a contraction mapping.

(iii) The condition (C3) is satisfied.

Condition (i) holds because *h*_{0} and *h*_{1} have distinct fixed points (0 and 1, respectively) and each is applied with nonzero probability, so no value of *q*_{t} is left unchanged with probability 1.

The *h*_{i}(*q*_{t}) are linear functions. They may thus be written as *h*_{i}(*q*) = *r*_{i}*q* + *s*_{i}, where *r*_{i} and *s*_{i} are constants. One can verify using the definition (C1) that a linear function on *S* is a contraction mapping if it has a slope of absolute magnitude less than 1. Thus we require, for *z* > *p*_{c},

|1 − *λz*| < 1 and |1 − *λ*(1 − *z*)| < 1.

Since 0 < *z* < 1, these inequalities are satisfied for *λ* ∈ (0, 1).

It remains to show that, for these values of *λ*, condition (C3) is satisfied. To do this, let *α*_{n} be in *T*_{n}(*q*_{0}). This means that there is a sequence {*i*_{k}} of values of *i* such that

*α*_{n} = *h*_{i_{n}} ∘ … ∘ *h*_{i_{1}}(*q*_{0}).

Suppose *β*_{n} is the result of the same sequence of functions that gives rise to *α*_{n} but with a different starting value *q*′_{0}. Because the *h*_{i} are linear, *α*_{n} = (*r*_{i_{1}} ⋯ *r*_{i_{n}})*q*_{0} + *c* and *β*_{n} = (*r*_{i_{1}} ⋯ *r*_{i_{n}})*q*′_{0} + *c*, where *c* is a constant that is the same for both terms. The distance between *α*_{n} and *β*_{n} is therefore

*d*(*α*_{n}, *β*_{n}) = |*r*_{i_{1}} ⋯ *r*_{i_{n}}| |*q*_{0} − *q*′_{0}| ≤ *k*^{n}|*q*_{0} − *q*′_{0}|,

where *k* = max[|1 − *λ*(1 − *z*)|, |1 − *λz*|] < 1 for *λ* ∈ (0, 1). The minimum distance between *T*_{n}(*q*_{0}) and *T*_{n}(*q*′_{0}) must be less than or equal to the distance between *α*_{n} and *β*_{n}, so we have that

*d*[*T*_{n}(*q*_{0}), *T*_{n}(*q*′_{0})] ≤ *k*^{n}|*q*_{0} − *q*′_{0}| → 0 as *n* → ∞.

Thus for all *q*_{0}, *q*′_{0} ∈ *S* condition (C3) is satisfied and the theorem is established.
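The contraction argument is easy to check numerically. The sketch below (my own illustrative code, with arbitrary parameter values) drives two copies of the process from different starting values using the same random sequence {*i*_{k}} and verifies that their separation shrinks at least as fast as *k*^{n}:

```python
import random

# Distance-diminishing check: apply the SAME sequence of maps h_i to two
# different starting values and verify d(alpha_n, beta_n) <= k**n * d(q0, q0').
# Parameter values are illustrative; lam in (0, 1) makes both maps contractions.
z, lam = 0.6, 0.4
h = [lambda q: (1 - lam * z) * q,                         # h0, slope 1 - lam*z
     lambda q: (1 - lam * (1 - z)) * q + lam * (1 - z)]   # h1, slope 1 - lam*(1-z)

k = max(abs(1 - lam * (1 - z)), abs(1 - lam * z))
q0, q0p = 0.05, 0.95
q, qp = q0, q0p
rng = random.Random(1)
for n in range(1, 51):
    i = rng.randrange(2)              # same index applied to both chains
    q, qp = h[i](q), h[i](qp)
    assert abs(q - qp) <= k ** n * abs(q0 - q0p) + 1e-12
```

After 50 steps the two trajectories are numerically indistinguishable, which is the content of condition (C3) in miniature.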

Loss matrix 𝗟(*a*, *e*) for the cost–loss scenario:

| | *e* = 0 | *e* = 1 |
| --- | --- | --- |
| *a* = 0 | 0 | *L* |
| *a* = 1 | *C* | *C* |
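The loss matrix referenced above translates directly into code. The sketch below assumes the standard cost–loss convention (protective action *a* = 1 costs *C* whatever happens; *a* = 0 incurs *L* only if *e* = 1) and recovers the familiar climatological rule: protect exactly when *z* = *C*/*L* < *p*_{c}. Function names and parameter values are my own, for illustration only.

```python
def loss(a, e, C, L):
    """Loss matrix L(a, e): protecting (a = 1) costs C whatever happens;
    not protecting (a = 0) loses L only if the adverse event (e = 1) occurs."""
    return C if a == 1 else (L if e == 1 else 0.0)

def expected_expense(a, p, C, L):
    """Expected loss of always choosing action a when e = 1 has probability p."""
    return (1 - p) * loss(a, 0, C, L) + p * loss(a, 1, C, L)

C, L_, p_c = 20.0, 100.0, 0.3            # illustrative values; z = C/L = 0.2
best = min((0, 1), key=lambda a: expected_expense(a, p_c, C, L_))
assert best == 1                          # z = 0.2 < p_c = 0.3, so protect
assert abs(expected_expense(1, p_c, C, L_) - C) < 1e-9
```

Comparing `expected_expense(0, ...)` and `expected_expense(1, ...)` for a grid of *p* values reproduces the climatological break-even point at *p* = *z*.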

^{1} Named in honor of B. F. Skinner.

^{2} Notice that actions are determined by the information source selected, because the climatology always prescribes the same action and we know that the forecast disagrees with the climatology. Thus *a* = 1 when *c* = 1 and *z* > *p*_{c}, and *a* = 0 when *c* = 1 and *z* ≤ *p*_{c}.

^{3} To explain this further, notice that |*A*| < 1 implies that 0 < *λ* < 2/(*z* + *p*_{1|D} − 2*zp*_{1|D}). The expression on the right-hand side of this inequality is larger than or equal to 2; however, the weakest constraint on *λ* from the requirement (21) is that *λ* < max_{z} {1/(max{*z*, 1 − *z*})} = 2.

^{5} This implication is specific to our set *S* and depends critically on the fact that *k* is strictly less than one for all the functions *h*_{i}. Refer to Norman (1968) for the full general list of conditions that the IFS must satisfy for it to be distance diminishing.