Comparison of Single-Valued Forecasts in a User-Oriented Framework

Michael Foley, Science and Innovation Group, Bureau of Meteorology, Melbourne, Australia

Nicholas Loveday, Science and Innovation Group, Bureau of Meteorology, Darwin, Australia

Abstract

We compare single-valued forecasts from a consensus of numerical weather prediction models to forecasts from a single model across a range of user decision thresholds and sensitivities, using the relative economic value framework, and present this comparison in a new graphical format. With the help of a simple linear error model, we obtain theoretical results and perform synthetic calculations to gain insights into how the results relate to the characteristics of the different forecast systems. We find that multimodel consensus forecasts are more beneficial for users interested in decisions near the climatological mean, due to their reduced spread of errors compared to the constituent models. Single model forecasts may present greater benefit for users sensitive to extreme events if the forecasts have smaller conditional biases than the consensus forecasts and hence better resolution of such events. The results support use of consensus averaging approaches for single-valued forecast services in typical conditions. However, it is hard to cater for all user sensitivities in more extreme conditions. This underscores the importance of providing probability-based services for unusual conditions.

© 2020 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Michael Foley, michael.foley@bom.gov.au


1. Motivation and scope

There remains a place in weather forecast services for single-valued forecasts of weather parameters important for day-to-day decision-making (Lazo et al. 2009). Such forecasts provide easily understood information about the weather and can be used as a simple basis for decision-making. This is notwithstanding the widely recognized benefits of explicitly probabilistic forecasts for more sophisticated decision-makers (Palmer 2002; Buizza 2008; Verkade and Werner 2011; Ramos et al. 2013) and the increasing availability of such probabilistic forecasts based on dynamical ensemble prediction systems (Buizza and Leutbecher 2015) and statistical postprocessing systems (Vannitsem et al. 2018; Hemri et al. 2014).

If a single-valued forecast is to be provided, a decision must be made as to the most appropriate information to feed this service. This decision may reside in the way an automatically generated service is constructed, or it might be a choice made day by day by a weather forecaster deciding whether or not to adjust forecast guidance from an automated system. Different types of single-valued forecasts have different characteristics. Some may be termed "deterministic," in the sense that they relate to a particular forecast scenario such as that provided by the output of a single numerical weather prediction (NWP) model. Others represent a statistical construct taken from the distribution of possible outcomes, rather than any one scenario. For instance, the forecast could be the average of an ensemble of forecasts, whether from a "poor-man's ensemble" set of deterministic NWP systems or from an ensemble prediction system.

When single-valued forecasts are displayed in gridded form, the differences in the various types of information become more obvious, with the single model output giving a meteorological picture consistent with one possible future state of the atmosphere, and the ensemble mean giving a picture that is more or less fuzzy depending on the degree of uncertainty. The former might be preferred by the meteorologist who wants a precise picture to which they can attach conceptual models to craft an accompanying narrative to the forecast. Likewise, a forecast user such as a fire behavior modeler might need a dynamically consistent picture to get a realistic representation of how a cold front would affect a fire. On the other hand, the ensemble mean has the advantage of not representing unforecastable and potentially misleading detail. This could be preferable for users who base their decision-making on values of meteorological parameters at their location, without making inferences based on meteorological features.

In this paper we compare single-valued forecast systems by considering how valuable they are for users’ decision-making. This goes beyond how forecasts perform when compared with observations using standard metrics such as root mean squared error or mean absolute error. As has been pointed out (Murphy 1993; Roebber and Bosart 1996; Marzban 2012), the benefit of a forecast to a user (which we will refer to as “forecast value”) does not necessarily depend on forecast skill as measured by closeness of forecasts to observations. In section 2 we look at which is the more valuable out of two example forecast systems, using a simple representation of forecast value. We present a graphical depiction of this comparison across the range of user sensitivities and across the range of decision thresholds.

Section 3 explores how the results relate to the properties of each of the two forecast systems. We consider forecast value in the light of a simple linear forecast error model that expresses forecast system characteristics in terms of conditional bias, unconditional bias, and random error spread. In section 4 we draw some theoretical conclusions about which forecast system is the more valuable for all users, at particular decision thresholds. We further examine the relationship between forecast value and different forecast system characteristics in section 5 by performing experiments with synthetic series of observations and forecasts. This allows us to identify the superiority of consensus average approaches for forecast decisions in nonextreme conditions, due to their reduced spread of random errors. However, in more extreme conditions, a forecast system with less conditional bias than the consensus will benefit some users.

In section 6 we provide suggestions regarding aspects of forecast system performance for which improvements would increase value to users.

2. Comparing forecast value

a. Relative economic value

In the relative economic value framework (Richardson 2000), a user who is affected by a particular weather event (e.g., temperature exceeding 30°C) will experience a loss L if that event occurs, unless they protect in advance against the event at cost C. We consider the expense Eclimate that they would have incurred due to the weather events if they had made the same decision on every occasion based on the climatology of the weather event, the expense Eforecast that they would have incurred had they acted based on whether or not the forecast system predicted the event, and the expense Eperfect that they would have incurred if they used a hypothetical perfect forecast. The relative economic value V of a forecast system is defined as

$$V = \frac{E_{\mathrm{climate}} - E_{\mathrm{forecast}}}{E_{\mathrm{climate}} - E_{\mathrm{perfect}}}. \tag{1}$$

The different possible forecast and observation outcomes for an event are shown in Table 1. By considering the user costs or losses associated with each outcome, it can be shown (Richardson 2000) that

$$V = \frac{\min(\alpha, \bar{o}) - F\alpha(1 - \bar{o}) + H\bar{o}(1 - \alpha) - \bar{o}}{\min(\alpha, \bar{o}) - \bar{o}\alpha}, \tag{2}$$

where α = C/L is the cost–loss ratio, ō is the observed relative frequency (hits + misses) of the event during the period of the forecasts, F = false alarms/(1 − ō) is the false alarm rate, and H = hits/ō is the hit rate.
Table 1. Contingency table for a binary event, where the different outcomes refer to relative frequencies of occurrence.

                     Event observed   Event not observed
Event forecast       hits             false alarms
Event not forecast   misses           correct rejections
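Expressed in code, (2) reduces to a few lines. The following Python sketch is our own illustration of the formula (the function and argument names are not from the paper):

```python
def relative_economic_value(alpha, obar, hit_rate, false_alarm_rate):
    """Relative economic value V of a forecast system, Eq. (2) (Richardson 2000).

    alpha            -- user cost-loss ratio C/L, with 0 < alpha < 1
    obar             -- observed relative frequency of the event
    hit_rate         -- H = hits / obar
    false_alarm_rate -- F = false alarms / (1 - obar)
    """
    numerator = (min(alpha, obar)
                 - false_alarm_rate * alpha * (1 - obar)
                 + hit_rate * obar * (1 - alpha)
                 - obar)
    denominator = min(alpha, obar) - obar * alpha
    return numerator / denominator
```

A perfect system (H = 1, F = 0) returns V = 1, while V ≤ 0 means the user would have done at least as well basing every decision on climatology.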

b. Significant cost–loss ratios

It follows from (2) that the difference in value ΔV between two different sets of forecasts depends on the differences in hit rate ΔH and false alarm rate ΔF as follows:

$$\Delta V = \frac{-\Delta F\,\alpha(1 - \bar{o}) + \Delta H\,\bar{o}(1 - \alpha)}{\min(\alpha, \bar{o}) - \bar{o}\alpha}. \tag{3}$$

Assuming we exclude the case of events that are always or never observed to happen, and assuming users have nonzero costs and losses, then 0 < α < 1 and 0 < ō < 1, so that the denominator of this expression is always positive. This means that if one forecast system has both a lower false alarm rate and a higher hit rate than the other, then it is the more valuable for any user's cost–loss ratio.

If one forecast system has a better hit rate and the other has a better false alarm rate, then the forecast system that is more valuable varies with cost–loss ratio. In this case it follows from (3) that the set of forecasts with the higher hit rate is more valuable for α < αequal, while the set of forecasts with the lower false alarm rate is more valuable for α > αequal, where

$$\alpha_{\mathrm{equal}} = \frac{\Delta H\,\bar{o}}{\Delta F(1 - \bar{o}) + \Delta H\,\bar{o}}. \tag{4}$$

This can be rewritten as

$$\alpha_{\mathrm{equal}} = \frac{\bar{o}\,\Delta H}{\Delta \bar{f}}, \tag{5}$$

where f̄ is the relative frequency (hits + false alarms) of forecasts of the event and Δf̄ is the difference in f̄ between the two forecast sources.

The comparison of the value of two forecast systems needs to be considered in the context of the range of cost–loss ratios for which the value exceeds that of climatology. For any set of forecasts, there will be a value of α at which a user is so sensitive to the event (loss ≫ cost) that it is better to always protect against the event than to make a decision based on the forecast. Following Richardson [2000, his Eq. (10)], this holds for α < αlow, where

$$\alpha_{\mathrm{low}} = \frac{\bar{o}(1 - H)}{1 - \bar{f}}. \tag{6}$$

Furthermore, there is a value of α at which a user is so insensitive to the event (loss ≈ cost) that it is better to never protect, rather than use the forecast. Again following Richardson [2000, his Eq. (11)], this holds for α > αhigh, where

$$\alpha_{\mathrm{high}} = \frac{\bar{o}H}{\bar{f}}. \tag{7}$$
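The breakpoints (4), (6), and (7) likewise reduce to one-line computations. A minimal sketch, assuming the forms reconstructed above (helper names are ours):

```python
def alpha_equal(obar, delta_hit, delta_false_alarm):
    """Cost-loss ratio at which two forecast systems are equally valuable, Eq. (4)."""
    return delta_hit * obar / (delta_false_alarm * (1 - obar) + delta_hit * obar)

def alpha_low(obar, fbar, hit_rate):
    """Below this cost-loss ratio, always protecting beats using the forecast, Eq. (6)."""
    return obar * (1 - hit_rate) / (1 - fbar)

def alpha_high(obar, fbar, hit_rate):
    """Above this cost-loss ratio, never protecting beats using the forecast, Eq. (7)."""
    return obar * hit_rate / fbar
```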

c. Single-valued forecasts to compare

Studies looking at forecast value have typically compared probability forecasts to each other or have been used to show that probability forecasts are more valuable than single-valued forecasts (Richardson 2000; Mylne 2002; Zhu et al. 2002). Here we are applying the concept of relative economic value to compare different single-valued forecasts. One study that applied this idea to different types of single-valued forecasts was by Buizza (2001). He showed, in the case of a synthetic ensemble of rainfall forecasts, that the ensemble mean was more valuable than the control ensemble member for most rainfall thresholds, but that the control was more valuable for very sensitive users at the highest rainfall threshold examined.

Following Buizza (2001) but using real forecast data, we take two forecast systems in use at the Australian Bureau of Meteorology that are examples of these two different approaches to providing single-valued forecasts. The gridded Operational Consensus Forecast (OCF) is a poor-man's ensemble, taking bias-corrected forecasts from several NWP models as input to a weighted consensus (Engel and Ebert 2007, 2012) with statistical downscaling to around 5-km resolution based on a gridded analysis. It exemplifies the statistical averaging approach. The Australian Community Climate and Earth-System Simulator NWP model (Puri et al. 2013), run in its regional configuration (ACCESS-R) at around 12-km horizontal resolution, is an example of the single-scenario approach. The ACCESS-R output is one of the NWP inputs into OCF. In each case, the forecasts are for temperature at 0600 UTC (between 1400 and 1700 local time depending on time zone) compared to observations at sites, at a lead time of 36 h. The forecasts are extracted from data presented to forecasters in the Graphical Forecast Editor (GFE), for use in provision of forecaster-curated gridded forecast services. The OCF and ACCESS-R forecasts have been converted through bilinear interpolation to a 3- or 6-km grid (gridded forecast resolution varies by state across Australia) and have been adjusted further based on the difference in elevation between guidance and observation using a standard adiabatic lapse rate of 6.5°C km⁻¹. This is the standard processing that forecast guidance receives on ingestion into the GFE, and as the same processing has been applied to data from each forecast system, we do not expect that it will affect comparisons between the two. The forecasts are valid at 200 observation stations across the southern part of Australia, with forecasts taken from the Southern Hemisphere summer seasons (December–February) for 2015/16, 2016/17, 2017/18 and 2018/19.

d. Display of comparison

In Fig. 1a we consider a decision threshold of 30°C and show relative economic value curves as presented by Richardson (2000) and others, plotting relative economic value as a function of cost–loss ratio for each forecast system. The resulting curves show that for a decision threshold of 30°C, the consensus forecasts (OCF) are more valuable irrespective of user sensitivity to the event. By contrast, Fig. 1b for a decision threshold of 35°C shows that more sensitive users with cost–loss ratios up to 0.33 would have been better off using the deterministic NWP forecasts (ACCESS-R). Less sensitive users with higher cost–loss ratios would have been better off using the OCF forecasts. In both examples, the most sensitive and insensitive users at the extremes of cost–loss ratio would have been better off basing their decision on climatology.

Fig. 1. Relative economic value curves for OCF and ACCESS-R, as a function of cost–loss ratio α for decision thresholds (a) X = 30°C and (b) 35°C. Forecasts are for 36-h lead time for screen-level temperature at 0600 UTC, across 200 observation locations in southern Australia for summer 2015/16, 2016/17, 2017/18, and 2018/19.

To obtain a more general idea of the usefulness of the single-valued forecasts, we proceed to look at comparisons for decisions based on whether the temperature forecasts exceed a threshold X, across a range of values of X and cost–loss ratios α between 0 and 1. Figure 2 shows which is the most valuable forecast system across these ranges, with the changeover point being given by αequal from (5). The gray shaded areas show where neither forecast system is superior to a climatologically based decision, using αlow and αhigh from (6) and (7). The hit rate and false alarm rate for each forecast system are also shown. OCF has the better (higher) hit rate at decision thresholds below 33°C, while ACCESS-R has the better hit rate above. ACCESS-R has the better (lower) false alarm rate below 26°C, and OCF is better above. Therefore OCF, with the higher hit rate and lower false alarm rate, is more valuable for all user sensitivities for decision thresholds between 26° and 33°C. Figure 1a demonstrates this for a cross section of Fig. 2 at the 30°C threshold. For thresholds exceeding 33°C, the deterministic NWP model proves to be the more valuable for more sensitive users, while the consensus forecast is the more valuable for less sensitive users. Figure 1b corresponds to a cross section of Fig. 2 at the 35°C threshold, which lies in this regime. For thresholds beneath 26°C, the opposite result holds.
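A map such as Fig. 2 can be assembled by sweeping the decision threshold and the cost–loss ratio. The sketch below is our own illustration, assuming forecasts and observations are held in NumPy arrays and reusing relative_economic_value from the sketch in section 2a:

```python
import numpy as np

def hit_and_false_alarm_rates(fcst, obs, threshold):
    """H and F for the event obs > threshold, from relative frequencies."""
    obs_event = obs > threshold
    fcst_event = fcst > threshold
    obar = obs_event.mean()
    hit_rate = np.mean(fcst_event & obs_event) / obar
    false_alarm_rate = np.mean(fcst_event & ~obs_event) / (1 - obar)
    return hit_rate, false_alarm_rate

def best_system_map(fcst_a, fcst_b, obs, thresholds, alphas):
    """Label each (threshold, alpha) cell: 0 = climatology, 1 = system A, 2 = system B."""
    label = np.zeros((len(thresholds), len(alphas)), dtype=int)
    for i, x in enumerate(thresholds):
        obar = np.mean(obs > x)
        rates = [hit_and_false_alarm_rates(f, obs, x) for f in (fcst_a, fcst_b)]
        for j, alpha in enumerate(alphas):
            values = [relative_economic_value(alpha, obar, h, f) for h, f in rates]
            if max(values) > 0:  # at least one system beats a climatological decision
                label[i, j] = 1 + int(values[1] > values[0])
    return label
```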

Fig. 2. Most valuable basis for decision, out of OCF, ACCESS-R, and climatology, as a function of cost–loss ratio α and decision threshold X. Also shown are hit rates and false alarm rates for OCF and ACCESS-R. Forecasts are as for Fig. 1.

3. Effect of error characteristics on value

a. Linear error model

To understand how the nature of the errors of different types of forecast systems relates to the comparative value of those forecast systems, we use a simple linear model of the errors, following Tian et al. (2016). This error model proposes that forecasts can be related to observations through the relationship

$$y_i = \lambda x_i + \delta + \varepsilon_i, \tag{8}$$

where yi is the ith prediction by the forecast system, xi is the corresponding observation, λ and δ are constants, and εi is a random error drawn from a distribution with standard deviation σ and mean 0.

If the forecast system has an unconditional bias β when averaged over all the events i, then (8) can be expressed as

$$y_i = \lambda x_i + (1 - \lambda)\bar{x} + \beta + \varepsilon_i, \tag{9}$$

where x̄ is the mean of the observations. We will refer to λ as the "scale."
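Because (9) is generative, it can be used to simulate forecasts for a given observation series, as we do in section 5. A minimal sketch (the helper name is our own):

```python
import numpy as np

def synthetic_forecasts(obs, lam, beta, sigma, rng):
    """Draw forecasts from the linear error model of Eq. (9):
    y_i = lam * x_i + (1 - lam) * xbar + beta + eps_i, with eps_i ~ N(0, sigma^2)."""
    return (lam * obs + (1 - lam) * obs.mean() + beta
            + rng.normal(0.0, sigma, size=obs.shape))
```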

b. Error characteristics of different types of forecast systems

We consider how the characteristics of a consensus forecast system compare with those of an individual NWP model within the linear error model framework. Say a consensus forecast system with error model parameters λC, βC, and σC comprises the mean of forecasts from n individual NWP models, the jth of which has its own error model parameters λj, βj, and σj.

It is well recognized (see, e.g., Woodcock and Engel 2005) that consensus forecasts exhibit smaller errors than forecasts based on single deterministic NWP model outputs, due to cancellation of random errors. If those errors are uncorrelated between the different NWP models, then it follows from the general properties of variances [as laid out, for instance, by Brunk (1965, chapter 5)] that the standard deviation of the random errors for the consensus forecast is given by

$$\sigma_C = \frac{1}{\sqrt{n}}\left(\frac{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}{n}\right)^{1/2}. \tag{10}$$

A consensus of n models will have a lower spread of random errors, relative to the constituent models, due to the factor 1/√n, with the proviso that the constituent models have similar random error spreads. Expanding the consensus by adding another model with an anomalously large random spread could overcome the effect of the 1/√n factor and increase σC. This is one reason why having more constituent models does not always improve performance of the consensus, as observed by Arribas et al. (2005).
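As a quick numerical illustration of (10), assuming uncorrelated member errors:

```python
import numpy as np

def consensus_sigma(member_sigmas):
    """Random error spread of an equal-weight n-member consensus mean, Eq. (10)."""
    s = np.asarray(member_sigmas, dtype=float)
    return np.sqrt(np.mean(s**2)) / np.sqrt(len(s))

print(consensus_sigma([2.0, 2.0, 2.0, 2.0]))       # 1.0: the 1/sqrt(n) reduction
print(consensus_sigma([2.0, 2.0, 2.0, 2.0, 6.0]))  # ~1.44: one noisy member raises it
```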

Due to the linear form of the error model, the other parameters λC and βC will simply be the means of the corresponding parameters for the constituent models. The OCF consensus forecast system includes a removal of recently observed unconditional bias, which generally leads to forecast bias βC ≈ 0. The scale parameter λ can be thought of as representing biases that are conditional on the observed value. This can be seen from the form of (9): for λ closer to 1 the forecasts will follow the observations, and for λ closer to 0 they will follow the mean observation and be insufficiently extreme. Conditional biases are not removed in the OCF system. Therefore the constituent NWP model whose λ is closest to 1 will be less conditionally biased than the consensus, whose λ is the mean of the constituent values.

c. Error characteristics of two forecast systems

Returning to the example used in section 2c, we use least squares fitting to estimate the error model parameters for the OCF and ACCESS-R forecast systems. The fitting was done separately for each observation site, to give an idea of the amount of variation in the results for different locations.
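One way to perform such a fit is sketched below; the paper does not describe its fitting code, so the helper name and details are our own:

```python
import numpy as np

def fit_linear_error_model(fcst, obs):
    """Least squares fit of y = lam * x + delta + eps, returning (lam, beta, sigma).
    Under Eq. (9), beta is the difference between the forecast and observed means."""
    lam, delta = np.polyfit(obs, fcst, 1)      # slope and intercept
    residuals = fcst - (lam * obs + delta)
    sigma = residuals.std(ddof=2)              # two fitted parameters
    beta = fcst.mean() - obs.mean()
    return lam, beta, sigma
```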

Figure 3a compares random error standard deviation for OCF and ACCESS-R, and shows that for most locations the error spread is smallest for the consensus system.

Fig. 3. Comparison of (a) standard deviation of random errors σ and (b) scale parameter λ for OCF and ACCESS-R 36-h forecasts of 0600 UTC temperature from the linear error model with least squares fit for individual observation locations in southern Australia for summer 2015/16, 2016/17, 2017/18, and 2018/19. Points above (below) the diagonal in (a) have a smaller (larger) spread of random errors for OCF than for ACCESS-R. Points in the left and right quadrants of the cross in (b) have a scale closer to 1 for ACCESS-R than for OCF, while the converse holds for the top and bottom quadrants.

Figure 3b compares the scale parameter. It tends to be less than one, showing that both forecast systems tend to underforecast the extremes when verified at a point in time and space. However, the NWP system has the scale parameter nearer to one for the majority of locations, showing that it has less conditional bias.

The unconditional bias is shown in Fig. 4, and is more clustered around zero for the consensus system, reflecting the fact that a bias correction has been applied. There is still some scatter of bias for OCF, which may be due to the fact that the bias correction is done relative to a gridded analysis (Engel and Ebert 2007) while we are comparing to site observations.

Fig. 4. Comparison of unconditional bias β for same forecasts as for Fig. 3.

The results for the parameters, from the fits for the 200 stations, are summarized in the means for each forecast system given in Table 2. These quantify the lower random error spread σ for OCF and better conditional error characteristics λ for ACCESS-R. The mean unconditional biases for both forecast systems are small, and in fact the mean result for ACCESS-R is slightly better than that for OCF in this example.

Table 2. Linear error model parameters fit to 36-h OCF and ACCESS-R forecasts of 0600 UTC temperature for summer 2015/16, 2016/17, 2017/18, and 2018/19, with the mean taken across 200 observation sites for southern Australia.

4. Analytic results

To enable us to derive theoretical relationships between the error characteristics of two forecast systems A and B, and the difference in relative economic value between A and B, we make the simplifying assumptions that they can be adequately described by a linear error model with random errors drawn from a Gaussian distribution. Tian et al. (2016) note that real forecast errors may not be well described by the linear error model. For instance, the simple linear model of forecast error does not take into account the possibility that error spread might be dependent on extremeness of the observations. Furthermore, the residual errors may not be well represented by samples from a Gaussian distribution.

While we acknowledge these caveats, these simplifications allow us to gain useful insights from relationships derived for three cases described in sections 4a, 4b, and 4c.

a. Forecasts with no conditional or unconditional biases

In the case where both forecast systems have no unconditional or conditional biases, so βA = βB = 0 and λA = λB = 1, then if σA < σB it follows (see derivation in the appendix) that system A always has higher hit rate and lower false alarm rate than system B. As discussed in section 2b, this means that A has higher relative economic value than B. This holds for any cost–loss ratio and any decision threshold, so all users are better off making decisions based on forecasts from system A than on forecasts from system B (though as discussed in section 2b, they may still be better off making decisions based on climatology if particularly sensitive or insensitive to the event).

b. Forecasts with conditional biases only

There are straightforward techniques to bias-correct forecasts, for instance by removing recent unconditional bias and assuming that the past bias of the forecast system will be a good indicator of the future bias. Thus, it is useful to consider forecast systems that have only conditional biases (λA, λB ≠ 1), with the unconditional biases having been removed (βA = βB = 0).

In this case, in the special situation where the decision threshold is equal to the mean observed value x̄, then if

$$\frac{\sigma_A}{\lambda_A} < \frac{\sigma_B}{\lambda_B}, \tag{11}$$

it follows (see the appendix) that system A always has a higher hit rate and lower false alarm rate than system B, and thus A is more valuable. This holds for any user cost–loss ratio.

This result can be understood as follows. If λ = 0, the forecasts are centered around the mean of the observations; as λ increases toward 1, the forecasts are progressively closer to being centered on the observed value and thus more clearly separated from x̄, which is our decision threshold, with lower likelihood of the random error leading to the forecast being on the wrong side of x̄.

As the decision threshold moves to more extreme values above (below) the mean, the result in this case no longer holds true. The forecast system that has λ closer to 1, being less conditionally biased, will forecast the event more (less) frequently and at some point will have more (fewer) of both hits and false alarms than the other forecast system. Then there will no longer be one forecast system that is most valuable for all users, but instead the more valuable system will depend on cost–loss ratio following (5). We will explore this behavior in more detail in section 5.

c. Forecasts with unconditional biases

If inequality (11) holds and if there are no restrictions on the unconditional bias, then (see the appendix) A is more valuable than B for any user cost–loss ratio at the decision threshold

$$X = \bar{x} + \frac{\beta_A \sigma_B - \beta_B \sigma_A}{(1 - \lambda_A)\sigma_B - (1 - \lambda_B)\sigma_A}. \tag{12}$$

It can be seen from this expression that the more the unconditional biases βA and βB differ, the more the threshold tends to move away from the observed mean. For instance, in the simple case where system A has zero unconditional bias, system B has negative unconditional bias, and they have the same conditional bias λ, the denominator of (12) is positive due to (11), which leads to X > x̄. The negative bias will decrease both the hit rate and the false alarm rate for B. This has the effect of moving the range of thresholds in which A has both the better hit and false alarm rates to higher values, as hit rates and false alarm rates decrease monotonically as a function of threshold.
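Equation (12) is straightforward to evaluate once the error model parameters are fitted. A sketch; the β values in the demonstration call are hypothetical, chosen for illustration only (Table 2's biases are small but are not asserted here):

```python
def all_users_threshold(xbar, lam_a, beta_a, sigma_a, lam_b, beta_b, sigma_b):
    """Decision threshold X of Eq. (12) at which system A is more valuable than
    system B for every cost-loss ratio, valid when sigma_a/lam_a < sigma_b/lam_b."""
    numerator = beta_a * sigma_b - beta_b * sigma_a
    denominator = (1 - lam_a) * sigma_b - (1 - lam_b) * sigma_a
    return xbar + numerator / denominator

# Hypothetical unconditional biases, for illustration only:
X = all_users_threshold(24.2, lam_a=0.911, beta_a=0.10, sigma_a=1.96,
                        lam_b=0.943, beta_b=-0.12, sigma_b=2.39)  # ~28.9 here
```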

5. Synthetic results

To gain further insights into the relationship between the error characteristics of forecast systems and their comparative value, we generate series of synthetic observations and use the error model described earlier to create series of synthetic forecasts given these observations. The number of observations generated is large (5 million) so that the forecast performance is adequately sampled for the most extreme observations.

The relationships described in section 4 and the appendix are independent of the form of the distribution of the observations. For simplicity, we have drawn the synthetic observations from a Gaussian distribution. The synthetic observations in sections 5a, 5b, 5c, and 5e below are generated with mean value (24.2°C) and standard deviation (6.7°C) matching those of the real observations in the example in section 2c. In section 5d the standard deviation of the observations is varied to explore how this affects the forecast value comparisons.

a. Forecasts with different random error spreads

We compare two hypothetical forecast systems A and B, neither of which has any biases (βA = βB = 0 and λA = λB = 1). System A has a random error standard deviation σA = 1.96°C while system B has σB = 2.39°C, matching the values for OCF and ACCESS-R, respectively, as given in Table 2. Figure 5 shows the more valuable of the two systems, in the same format as Fig. 2. As expected from section 4a, the system with the smaller spread of random errors has a higher hit rate and lower false alarm rate and thus is more valuable for all users at all thresholds.
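This experiment can be reproduced along the following lines, reusing the synthetic_forecasts and hit_and_false_alarm_rates helpers sketched earlier (our own illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(24.2, 6.7, size=5_000_000)   # synthetic observations

# Unbiased systems differing only in random error spread (section 5a):
fcst_a = synthetic_forecasts(obs, lam=1.0, beta=0.0, sigma=1.96, rng=rng)
fcst_b = synthetic_forecasts(obs, lam=1.0, beta=0.0, sigma=2.39, rng=rng)

for x in (24.2, 30.0, 35.0):                  # H and F at example thresholds
    print(x, hit_and_false_alarm_rates(fcst_a, obs, x),
          hit_and_false_alarm_rates(fcst_b, obs, x))
```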

Fig. 5. Most valuable basis for decision, out of A (synthetic forecasts with βA = 0, λA = 1, σA = 1.96), B (βB = 0, λB = 1, σB = 2.39), and climatology as a function of cost–loss ratio α and decision threshold X. Forecasts were generated using the linear error model with random errors drawn from a Gaussian distribution. Also shown are the hit rates and false alarm rates for A and B. Observations are synthetic with mean 24.2°C (dot–dashed line) and standard deviation 6.7°C.

b. Forecasts with different conditional biases

Now we give our two hypothetical forecast systems A and B an equal random error spread (σA = σB = 2.18°C, in between the OCF and ACCESS-R values) and no unconditional biases (βA = βB = 0), but system A has scale λA = 0.911 while system B has scale λB = 0.943, again as per the values for OCF and ACCESS-R in Table 2. This will introduce conditional biases such that the forecasts will tend to be less extremely high or low compared to observations, more so in the case of A than B. As can be seen in Fig. 6, the result of the greater underforecasting of extremes by A is that the hit rate (false alarm rate) becomes worse more rapidly as event thresholds rise above (fall below) the mean.

Fig. 6. As in Fig. 5, for A (βA = 0, λA = 0.911, σA = 2.18) and B (βB = 0, λB = 0.943, σB = 2.18).

As expected from (11), system B provides more valuable forecasts for all users when the decision threshold equals the mean of 24.2°C. However, users who are more insensitive (sensitive) to an extremely high (low) event threshold will obtain more value from system A.

c. Forecasts with different biases and random error spreads

We make synthetic versions of the OCF and ACCESS-R forecasts by using all the error model parameters from Table 2. Because σ/λ is lower for OCF than ACCESS-R, it follows that the synthetic OCF forecasts are more valuable than synthetic ACCESS-R for all user sensitivities at the event threshold 28.9°C derived from (12). This can be seen in Fig. 7. However, as in section 5b, as one moves to higher thresholds, there comes a point where the system with larger conditional biases (synthetic OCF) ends up with a poorer hit rate and better false alarm rate than the other system due to underforecasting of the event. Beyond that point, (5) means that sensitive users (α < αequal) would benefit more from the other forecast system (synthetic ACCESS-R) due to its better hit rate, despite its larger spread of random errors. Similarly, as one moves to lower thresholds, the overforecasting of the event by the system with larger conditional biases (synthetic OCF) leads to the other forecast system (synthetic ACCESS-R) being better for insensitive users (α > αequal) due to its better false alarm rate.

Fig. 7. Most valuable basis for decision, out of synthetic versions of OCF, ACCESS-R, and climatology, as a function of cost–loss ratio α and decision threshold X. Forecasts were generated using the linear error model with random errors drawn from a Gaussian distribution, taking parameters from Table 2. Also shown are hit rates and false alarm rates for synthetic OCF and ACCESS-R. Observations are synthetic with mean 24.2°C (dot–dashed line) and standard deviation 6.7°C. The dotted line shows the decision threshold corresponding to (12), where one forecast system will be best for all sensitivities.

By comparing Fig. 2 for the real forecasts and observations to Fig. 7 for the synthetic data, one can see similar features that suggest that, despite the many simplifications underpinning the theoretical results and synthetic calculations, they can provide insights into real data.

We have looked at a range of examples (not shown) for other parts of Australia, other times of day, other forecast systems (NWP forecasts from the European Centre for Medium-Range Weather Forecasts and manually produced forecasts) and other parameters (daily maximum and minimum temperature, dewpoint, and wind speed). For temperatures, we find that results for real data match the theoretical expectations when the forecast systems being compared have well separated values of σ/λ. In other examples where the values of σ/λ are close together, the theoretical results are not necessarily borne out by the real data, which is not surprising given the many assumptions that have been made. Also, there can be cases where the event threshold from (12) is extreme, and therefore is not relevant to realistic ranges of forecast values and does not have well-sampled data around the threshold, for instance where σA ≈ σB and λA ≈ λB but βA ≠ βB. Wind and dewpoint are not so well represented by the linear error model and do not conform to the theoretical expectations.

d. Dependence on observational spread

To show the effect of observational spread on the results, we repeat section 5b with the standard deviation of the synthetic observations being decreased and increased by 20% (Figs. 8 and 9, respectively). From this we see that when the observational scatter is smaller, the climatologically based decision is best for a larger range of user sensitivities (larger gray area on Fig. 8 compared with Fig. 9). In other words, if the weather doesn’t vary much, for instance for temperature in the tropics, a forecast is unlikely to help a user make a decision unless it is particularly accurate.

Fig. 8. As in Fig. 6, but synthetic observations have lower variability with standard deviation 5.3°C. (The data are noisier at extreme high and low decision thresholds as the narrower distribution leads to fewer cases in the extremes, compared with Fig. 6.)

Fig. 9. As in Fig. 6, but synthetic observations have higher variability with standard deviation 8.0°C.

The range of user sensitivities for which system A, the more conditionally biased forecast system, is better than system B is also wider when the observational scatter is smaller. This is shown by a smaller tan area at extreme decision thresholds in Fig. 8 compared with Fig. 9. In the extremes, system A tends to be closer to climatology than to the observed temperature, compared with system B. For a given high extreme, system A will have lower false alarm rates and hit rates. As the observational scatter is reduced, false alarms become increasingly more prevalent than hits, and the false alarm term in αequal (4) dominates, leading to a lower cost–loss ratio at which system B becomes the more valuable. Conversely, at low extremes the hit terms in (4) dominate, leading to a higher cost–loss ratio at which system B becomes the more valuable.

e. Sensitivity to form of forecast error distribution

The derivations of the theoretical relationships in section 4 assume that the distribution of random forecast errors is Gaussian. To test how sensitive the results are to this assumption, we replace the Gaussian with the skew normal distribution (Azzalini 1985), which reduces to a Gaussian when its shape parameter is zero and is skewed for other choices of shape parameter.
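Such errors can be drawn with scipy.stats.skewnorm, standardized so that the mean stays 0 and the standard deviation stays σ as the error model requires. A sketch (the helper name is ours):

```python
import numpy as np
from scipy import stats

def skewed_errors(shape, sigma, size, rng=None):
    """Skew normal random errors standardized to mean 0 and standard deviation sigma."""
    dist = stats.skewnorm(shape)
    raw = dist.rvs(size=size, random_state=rng)
    return sigma * (raw - dist.mean()) / dist.std()

# Skewness implied by the shape parameter:
print(float(stats.skewnorm(1).stats(moments='s')))  # ~0.14
print(float(stats.skewnorm(2).stats(moments='s')))  # ~0.46
```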

For example, Fig. 10 is a comparison of synthetic versions of OCF and ACCESS-R forecasts with the same parameters as in section 5c but with skewed random error distributions having shape parameter 1 for OCF (skewness 0.14) and shape parameter 2 for ACCESS-R (skewness 0.46). The skewness values for the random errors of the actual OCF and ACCESS-R forecasts are 0.16 and 0.19, respectively. When synthetic forecasts are generated with random error distributions skewed by these amounts (not shown), the result is hard to distinguish from Fig. 7, so we have increased the difference in skewness for effect. Comparing Fig. 7 and Fig. 10 shows that the area where synthetic ACCESS-R is most valuable decreases for high decision thresholds and increases for low thresholds. The skew moves the median of the random errors to the left of the mean, and the more skewed the distribution, the more hit rates reduce for decision thresholds above the observational mean, and the more false alarm rates reduce for decision thresholds below the observational mean.

Fig. 10. As in Fig. 7, but with random errors drawn from a skew normal distribution having skewness 0.14 for synthetic OCF and 0.46 for synthetic ACCESS-R.

To show that the analytic relationships of section 4 are useful for a range of distribution shapes beyond Gaussian, we have repeated the scenarios from sections 5a to 5d for the same choices of σ, λ, and β but with a variety of different shape parameter combinations for forecast systems A and B. The results remain consistent with the analytic relationships in all cases up until A reaches shape parameter −5 (skewness −0.85) and B reaches shape parameter 5 (skewness 0.85), when the analytic result for forecasts with no conditional or unconditional biases (section 4a) is no longer obeyed for all decision thresholds.

6. Discussion and conclusions

We have applied a user-oriented approach to compare the worth of single-valued temperature forecasts from two different forecast systems and have presented a new graphical depiction showing which system has higher relative economic value as a function of user cost–loss ratio and decision threshold. Through use of a linear error model we have been able to gain insights into how the nature of the forecast systems being compared affects which forecast system is more valuable.

Many assumptions and simplifications have been made along the way. Decisions made in the real world will not necessarily be well represented by a simple cost–loss proposition or a single decision threshold, and various more complex approaches have been suggested (Shorr 1966; Matte et al. 2017; Roulston and Smith 2004). There are other properties of the forecast that may affect user decision-making, for instance forecast stability (Griffiths et al. 2019), which is not considered in this framework. We have already noted limitations with the linear error model.

We have seen that synthetic forecasts generated using the linear error model can give qualitatively similar features to real forecasts when comparing relative economic value. This gives us reason to expect that the insights from a simple error model can also assist our understanding of how relative economic value compares for real-world forecasting systems.

The roles of unconditional and conditional bias can suggest strategies for increasing the usefulness of a forecast system. In the absence of biases, the spread of random errors is seen to have a direct relationship to forecast value for all user sensitivities. Therefore, averaging approaches to derive single-valued forecasts from an ensemble of independent forecasts are beneficial as they contribute to this aspect of forecast value. However, conditional biases degrade the value of forecasts in extreme conditions. While emphasis has been placed on the more straightforward task of correcting unconditional biases in forecast systems (Woodcock and Engel 2005), it is also beneficial to reduce conditional biases to maximize the value of a forecast system.

The results give insight into when a particular single-valued forecast system is going to be the most useful. We have seen, via synthetic forecasts, that forecasts become more valuable relative to climatology the more variable the range of observations is. We have shown that there can be event thresholds for which one forecast system is unequivocally better than another for all user sensitivities. For forecast systems where unconditional bias has been removed, this occurs around the mean observed value. We have seen how decisions based on whether conditions will be below or above normal can be better provided by a bias-corrected consensus than a deterministic NWP output. Such a forecast system provides the best information for all decision-makers in routine conditions that are not far removed from the mean.

However, single-valued forecasts from one forecast system are not going to be optimal for all types of users. For sensitive (insensitive) users in extreme high (low) conditions, a single output from a skillful NWP model can be more valuable than a consensus which fails to resolve the extremes. This is consistent with the findings of Buizza (2001) with synthetic rainfall forecasts. This has implications in a context where forecasters can intervene in the forecast production process. If forecasters start with a forecast system that is optimal in routine conditions, and can understand and consistently reduce conditional biases for more extreme conditions, this will have the effect of widening the range of decision thresholds for which the forecast is best for all users.

The most complete solution to addressing different user sensitivities is to move to multiple forecasts that are tailored to different user sensitivities, or else to provide explicitly probabilistic forecasts, particularly for thresholds corresponding to extreme and potentially impactful events, which will not be optimally served by one single-valued forecast system. However, for as long as there is an ongoing demand for a general single-valued forecast service, there will remain a need to consider how to optimize the user benefits of single-valued forecasts. The methods explored in this paper provide one avenue for such consideration.

Acknowledgments

The authors thank Beth Ebert, Deryn Griffiths, Ioanna Ioannou, and two anonymous reviewers for their helpful review comments.

APPENDIX

Derivation of Theoretical Relationships

Here we give the derivation of the relationships in section 4 for the comparative value of forecast systems in some particular cases, under the linear error model described in section 3. We make the assumption that the random errors are drawn from a Gaussian distribution.

As discussed in section 2b, the relative economic value for forecast system A increases with hit rate HA and decreases with false alarm rate FA. Say the event of interest is an observation x exceeding some threshold X. If the forecast system is well described by the error model in (9), then the distribution of forecast values for a given observation x, over many cases, is centered on

$$\mu_A = \lambda_A x + (1 - \lambda_A)\bar{x} + \beta_A, \tag{A1}$$

with standard deviation σA. For a Gaussian distribution, the proportion of forecast values that exceed the event threshold X is given by

$$\frac{1}{\sqrt{2\pi}} \int_{(X - \mu)/\sigma}^{\infty} e^{-t^2/2}\,dt \tag{A2}$$

[following Brunk (1965), p. 150], so that

$$H_A = \frac{1}{\bar{o}\sqrt{2\pi}} \int_{X}^{\infty} P(x) \int_{(X - \mu_A)/\sigma_A}^{\infty} e^{-t^2/2}\,dt\,dx \tag{A3}$$

and

$$F_A = \frac{1}{(1 - \bar{o})\sqrt{2\pi}} \int_{-\infty}^{X} P(x) \int_{(X - \mu_A)/\sigma_A}^{\infty} e^{-t^2/2}\,dt\,dx, \tag{A4}$$

where P(x) is the probability density of x being observed.
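Under these Gaussian assumptions, (A3) can be checked numerically; e.g., a sketch for Gaussian-distributed observations (function and argument names are illustrative):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def hit_rate_A3(X, obs_mean, obs_sd, lam, beta, sigma):
    """Hit rate of Eq. (A3) for Gaussian observations and the error model of Eq. (9)."""
    obs = stats.norm(obs_mean, obs_sd)
    obar = obs.sf(X)                                  # climatological event frequency
    def integrand(x):
        mu = lam * x + (1 - lam) * obs_mean + beta    # Eq. (A1)
        return obs.pdf(x) * stats.norm.sf((X - mu) / sigma)
    hits, _ = quad(integrand, X, np.inf)
    return hits / obar
```

The false alarm rate (A4) follows the same pattern, with the outer integral taken over x < X and normalization by 1 − ō.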

Forecast system A will be more valuable than forecast system B for all user cost–loss ratios if ΔH = HA − HB > 0 and ΔF = FA − FB < 0.

Using (A1) and (A3), ΔH > 0 if (but not only if)

$$\frac{X - [\lambda_A x + (1 - \lambda_A)\bar{x} + \beta_A]}{\sigma_A} < \frac{X - [\lambda_B x + (1 - \lambda_B)\bar{x} + \beta_B]}{\sigma_B} \quad \text{for all } x > X \tag{A5}$$

[so that the inner integral in the expression (A3) for HA includes a larger area than in the equivalent expression for HB].

Similarly, ΔF < 0 if (but not only if)

$$\frac{X - [\lambda_A x + (1 - \lambda_A)\bar{x} + \beta_A]}{\sigma_A} > \frac{X - [\lambda_B x + (1 - \lambda_B)\bar{x} + \beta_B]}{\sigma_B} \quad \text{for all } x < X \tag{A6}$$

[so that the inner integral in (A4) includes a smaller area than in the equivalent expression for FB].
In the case where the forecasts have no unconditional or conditional biases, βA = βB = 0 and λA = λB = 1. Expression (A5) becomes

$$\frac{X - x}{\sigma_A} < \frac{X - x}{\sigma_B} \quad \text{for all } x > X, \tag{A7}$$

so that the numerator is negative, and (A6) becomes

$$\frac{X - x}{\sigma_A} > \frac{X - x}{\sigma_B} \quad \text{for all } x < X, \tag{A8}$$

so that the numerator is positive. Both inequalities hold when σA < σB, which together prove the result in section 4a. This holds for any event threshold X.
In the case where the forecasts do have biases, we allow λA, λB ≠ 1 and βA, βB ≠ 0. We can rearrange (A5) to obtain x > x*, for all x > X, where

$$x^* = \frac{(\sigma_B - \sigma_A)X + [(1 - \lambda_B)\sigma_A - (1 - \lambda_A)\sigma_B]\bar{x} + \sigma_A\beta_B - \sigma_B\beta_A}{\sigma_B\lambda_A - \sigma_A\lambda_B}. \tag{A9}$$

This holds, without loss of generality, if σA/λA < σB/λB, so that the denominator of (A9) is positive. For ΔH > 0 to be true given that x > X, we therefore require X ≥ x*. Conversely, because of the opposite direction of the inequalities in (A6), for ΔF < 0 to be true we require X ≤ x*.

Therefore ΔH > 0 and ΔF < 0 both hold when σA/λA < σB/λB and X = x*. Solving this equation for X leads to the result in section 4c. The result in section 4b follows when βA = βB = 0. Unlike the random-error-only case, in these cases the value comparisons at other thresholds X depend on P(x) and may differ for users with different cost–loss ratios.

In the case where σA = σB and λA = λB but βA ≠ βB (for instance, when comparing an un-bias-corrected forecast system with a system using the same forecasts but with unconditional biases removed), (A5) and (A6) cannot both be satisfied, so we do not obtain a simple general solution for circumstances where one forecast system is better than the other for all user sensitivities.

REFERENCES

Arribas, A., K. B. Robertson, and K. R. Mylne, 2005: Test of a poor man's ensemble prediction system for short-range probability forecasting. Mon. Wea. Rev., 133, 1825–1839, https://doi.org/10.1175/MWR2911.1.

Azzalini, A., 1985: A class of distributions which includes the normal ones. Scand. J. Stat., 12, 171–178.

Brunk, H. D., 1965: An Introduction to Mathematical Statistics. 2nd ed. Blaisdell, 429 pp.

Buizza, R., 2001: Accuracy and potential economic value of categorical and probabilistic forecasts of discrete events. Mon. Wea. Rev., 129, 2329–2345, https://doi.org/10.1175/1520-0493(2001)129<2329:AAPEVO>2.0.CO;2.

Buizza, R., 2008: The value of probabilistic prediction. Atmos. Sci. Lett., 9, 36–42, https://doi.org/10.1002/asl.170.

Buizza, R., and M. Leutbecher, 2015: The forecast skill horizon. Quart. J. Roy. Meteor. Soc., 141, 3366–3382, https://doi.org/10.1002/qj.2619.

Engel, C., and E. Ebert, 2007: Performance of hourly Operational Consensus Forecasts (OCFs) in the Australian region. Wea. Forecasting, 22, 1345–1359, https://doi.org/10.1175/2007WAF2006104.1.

Engel, C., and E. Ebert, 2012: Gridded operational consensus forecasts of 2-m temperature over Australia. Wea. Forecasting, 27, 301–322, https://doi.org/10.1175/WAF-D-11-00069.1.

Griffiths, D., M. Foley, I. Ioannou, and T. Leeuwenburg, 2019: Flip-flop index: Quantifying revision stability for fixed-event forecasts. Meteor. Appl., 26, 30–35, https://doi.org/10.1002/met.1732.

Hemri, S., M. Scheuerer, F. Pappenberger, K. Bogner, and T. Haiden, 2014: Trends in the predictive performance of raw ensemble weather forecasts. Geophys. Res. Lett., 41, 9197–9205, https://doi.org/10.1002/2014GL062472.

Lazo, J. K., R. E. Morss, and J. L. Demuth, 2009: 300 billion served: Sources, perceptions, uses, and values of weather forecasts. Bull. Amer. Meteor. Soc., 90, 785–798, https://doi.org/10.1175/2008BAMS2604.1.

Marzban, C., 2012: Displaying economic value. Wea. Forecasting, 27, 1604–1612, https://doi.org/10.1175/WAF-D-11-00138.1.

Matte, S., M. A. Boucher, V. Boucher, and T. C. Fortier Filion, 2017: Moving beyond the cost-loss ratio: Economic assessment of streamflow forecasts for a risk-averse decision maker. Hydrol. Earth Syst. Sci., 21, 2967–2986, https://doi.org/10.5194/hess-21-2967-2017.

Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281–293, https://doi.org/10.1175/1520-0434(1993)008<0281:WIAGFA>2.0.CO;2.

Mylne, K. R., 2002: Decision-making from probability forecasts based on forecast value. Meteor. Appl., 9, 307–315, https://doi.org/10.1017/S1350482702003043.

Palmer, T. N., 2002: The economic value of ensemble forecasts as a tool for risk assessment: From days to decades. Quart. J. Roy. Meteor. Soc., 128, 747–774, https://doi.org/10.1256/0035900021643593.

Puri, K., and Coauthors, 2013: Implementation of the initial ACCESS numerical weather prediction system. Aust. Meteor. Oceanogr. J., 63, 265–284, https://doi.org/10.22499/2.6302.001.

Ramos, M. H., S. J. Van Andel, and F. Pappenberger, 2013: Do probabilistic forecasts lead to better decisions? Hydrol. Earth Syst. Sci., 17, 2219–2232, https://doi.org/10.5194/hess-17-2219-2013.

Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 126, 649–667, https://doi.org/10.1002/qj.49712656313.

Roebber, P. J., and L. F. Bosart, 1996: The complex relationship between forecast skill and forecast value: A real-world analysis. Wea. Forecasting, 11, 544–559, https://doi.org/10.1175/1520-0434(1996)011<0544:TCRBFS>2.0.CO;2.

Roulston, M. S., and L. A. Smith, 2004: The boy who cried wolf revisited: The impact of false alarm intolerance on cost–loss scenarios. Wea. Forecasting, 19, 391–397, https://doi.org/10.1175/1520-0434(2004)019<0391:TBWCWR>2.0.CO;2.

Shorr, B., 1966: The cost/loss utility ratio. J. Appl. Meteor., 5, 801–803, https://doi.org/10.1175/1520-0450(1966)005<0801:TCUR>2.0.CO;2.

Tian, Y., G. S. Nearing, C. D. Peters-Lidard, K. W. Harrison, and L. Tang, 2016: Performance metrics, error modeling, and uncertainty quantification. Mon. Wea. Rev., 144, 607–613, https://doi.org/10.1175/MWR-D-15-0087.1.

Vannitsem, S., D. S. Wilks, and J. Messner, 2018: Statistical Postprocessing of Ensemble Forecasts. Elsevier Science, 362 pp.

Verkade, J. S., and M. G. Werner, 2011: Estimating the benefits of single value and probability forecasting for flood warning. Hydrol. Earth Syst. Sci., 15, 3751–3765, https://doi.org/10.5194/hess-15-3751-2011.

Woodcock, F., and C. Engel, 2005: Operational consensus forecasts. Wea. Forecasting, 20, 101–111, https://doi.org/10.1175/WAF-831.1.

Zhu, Y., Z. Toth, R. Wobus, D. Richardson, and K. Mylne, 2002: The economic value of ensemble-based weather forecasts. Bull. Amer. Meteor. Soc., 83, 73–83, https://doi.org/10.1175/1520-0477(2002)083<0073:TEVOEB>2.3.CO;2.