## 1. Introduction

Weather forecasts, seasonal forecasts, and climate projections, consisting of either single values, ensembles, or probabilities, can be used to help make decisions of many different types (Wilks and Wolfe 1998; Wilks et al. 1993; Jewson and Ziehmann 2004; Skoglund et al. 2015; Molinder et al. 2018; CNN 2019). In certain use-cases, decisions can then be further improved by using information about how the forecasts or projections might change over time. One such use-case occurs when the forecast user has to decide whether to act now or wait for the next forecast. Weather forecast–related situations in which this question arises are diverse, ranging from a farmer deciding whether to fertilize crops now or wait for the next forecast to an event organizer deciding whether to cancel a large outdoor event now or wait for the next forecast. Climate projection–related situations range from a government deciding whether to build a flood defense now or wait for the next set of projections to an investor deciding whether to buy or sell an asset now or wait for the next set of projections.

Another set of use-cases in which forecast change information is useful arises when the forecast user has to decide which of a series of lagged forecasts (also known as revised forecasts) they will use, when using a forecast has both costs and benefits. Examples of these types of situation would include the following: when forecasts are expensive, and there is a trade-off between the benefit of being able to use a forecast and the cost; when forecast users need to perform calibration of forecasts to render them useful, which involves a cost; in the lead-up to an event, when the forecast user is trying to understand how frequently they should undertake a detailed and costly assessment of whether to cancel the event due to possible adverse weather; in a trading situation, when the forecast user is trying to decide how often potentially costly forecast-based hedging should be planned.

In all of the above examples, the forecast (or projection) user would ideally have access to information about how the forecasts and their skill may change. More advanced forecast users with in-house research teams may be able to derive such information themselves from their own forecast archives. For other forecast users faced with these kinds of decisions, we make the suggestion, building on the work of a number of previous authors (such as Jewson and Ziehmann 2004; Regnier and Harr 2006; McLay 2008, 2011; Fowler et al. 2015; Griffiths et al. 2019), that it may be useful if forecast change information were made available by forecast providers along with their forecasts. This would be most impactful for the most commonly used and most critical forecasts, which might include tropical cyclone forecasts, forecasts for key temperature indices used in the energy industry, and climate projections for key metrics such as regional rainfall.

In general, what forecast information is required depends on the decisions being made. The logic of decision-making that involves temporal aspects of forecasts has been analyzed in various ways by various authors (Murphy et al. 1985; Epstein and Murphy 1987; Murphy and Ye 1990; Wilks 1991; Katz 1993; McLay 2008, 2011). In particular, Regnier and Harr (2006) and Jewson et al. (2021, hereafter J21) have argued that the act-now-or-wait decision can be improved significantly when forecast-change information is included.

Aspects of forecast changes have also been analyzed by various authors. Hamill (2003) looked at whether lagged forecast information could be used to improve the final forecast via forecast calibration; Jewson and Ziehmann (2004) showed that the size of forecast changes is relevant for weather-related risk management and can be predicted using the ensemble spread; Zsoter et al. (2009) and Richardson et al. (2020) looked at diagnostics of “jumpiness,” or “consistency,” of uncalibrated lagged numerical model forecasts; and Fowler et al. (2015) and Griffiths et al. (2019) looked at measures of consistency of lagged forecasts, for the purpose of communicating information about possible forecast changes to users of forecasts.

Our goal is to define a comprehensive set of diagnostics for communicating information about sizes of forecast changes, and demonstrate with some examples how the information might be used. Relative to Fowler et al. (2015), who discuss autocorrelation of forecasts, runs tests, and metrics for understanding changes in tropical cyclone forecasts, we discuss a broader and simpler set of forecast change diagnostics, but only for univariate forecasts. Relative to Griffiths et al. (2019), who consider a single summary index of forecast changes that they call the flip-flop index, we consider a larger number of more granular ways to communicate information about univariate forecast changes.

A large amount of information about forecast changes can be derived from analysis of historical forecasts: we call forecast change metrics derived in this way *unconditional* forecast change metrics. Conversely, one can also consider forecast change metrics that vary as a function of the forecast, which we call *conditional* forecast change metrics. We will focus on ways to communicate unconditional forecast change metrics, since they provide general information about a forecast, and are relevant to a wide range of possible user situations. We explore ways in which unconditional forecast change information and metrics could be presented in readily digestible formats that might help forecast users improve their decision-making in situations of the types discussed above.

Some forecast-based decisions are made quantitatively and require specific forecast metrics as input to mathematical decision models, but many forecast-based decisions are made qualitatively, based on a subjective assessment of many different factors. For such subjective assessments, the forecast user can benefit from being able to understand the behavior of the forecast from as many different perspectives as possible. The metrics we show are designed to present new perspectives in order to contribute to such assessments. Our overall philosophy is that forecast users can only benefit from having more information about the performance of the forecast that they are using, including information about forecast changes.

We consider situations in which a forecast user is interested in forecasts for a specific target day, and we will demonstrate the ideas in the context of medium range site-specific temperature forecasts. Presenting forecast change information for other forecasts, and climate projections, may require slightly different approaches, but could use many of the same concepts.

In section 2, we discuss some use-case examples, the forecast and observational data we use for our study, and how we calibrate the forecast. In section 3, we present a set of unconditional metrics related to changes in the forecast *values* over time. In section 4, we present a set of unconditional metrics related to changes in the forecast *skill* over time. In section 5, we discuss a number of examples that illustrate how some of the information we have presented might be used in a quantitative way to decide when to use a forecast. In section 6, we summarize and conclude.

## 2. Use-case examples, data, and forecast calibration

### a. Use-case examples

As we present various forecast change metrics, we will consider six particular use-cases, derived from the examples we have given above. To distinguish clearly between a number of slightly different situations that arise when using forecasts, in some places we will separate *using* forecasts into the two steps of first *accessing* the forecast and then *acting* on the forecast.

The first use-case we consider (*case 1*) is the situation in which the user has to decide whether to act now or wait for the next forecast, as discussed above, in Regnier and Harr (2006), and in J21. In this use-case, there is considered to be no cost to accessing a forecast, and the forecast user might access forecasts every day and evaluate each time whether to act or not. They have to make a difficult trade-off between the benefits of acting early, but based on forecasts that are less accurate on average, versus acting later, based on forecasts that are more accurate on average. To make that trade-off, they need information about how different, and how much more accurate, the later forecasts might be. There are many situations in which this use-case arises in real decision-making, as discussed in the introduction.

The second use-case (*case 2*) is the situation in which accessing the forecast has some cost, which is constant. This cost might be a literal cost that the user has to pay to the forecast provider, or it might be a cost in terms of the time, attention, and human or computing resources required to acquire or calibrate the forecast. Examples of literal costs of forecasts are commonplace in applied meteorology, since many users access forecasts from forecast providers (including national meteorological services) that charge fees. A specific example of cost in terms of time, attention, and resources arises in the insurance industry, as a hurricane approaches the U.S. coastline. Ideally, insurance companies would monitor such storms round the clock, but in practice this may need substantial human resources. As a result of costs of accessing forecasts, the user might decide only to access the forecasts intermittently as the target day approaches. In a fully logical approach, the forecast user should decide when to access the forecast based on some kind of cost–benefit analysis. This decision should take into account, among other things, how much the forecast is likely to change over time and how much more accurate the forecast is likely to become.

There are various cases, depending on the user situation. For instance, in many situations, a new forecast that is unlikely to be materially different from the previous forecast may not be of much use. A new forecast that is likely to be different but unlikely to be more accurate may not be of much use, either. In other, rather specific, situations, however, knowing the new forecasts may be useful even if the new forecast is not materially different, nor likely to be more accurate, because it may help predict the behavior of other forecast users. This might be the case in weather-forecast influenced trading, for instance. In most situations, a new forecast that is likely to be more accurate is of more use.

In case 2 situations, the forecast user has to assess these issues of how much a forecast is likely to change, and how much more accurate it is likely to be, before they have seen the forecast, in order to determine the value of the forecast and make their decision as to whether to access the forecast and accept the associated cost. A related point is that in some such situations, it would be helpful for forecast users if forecast providers would alert users when new forecasts show particularly large changes relative to previous forecasts and relative to typical sizes of changes. Users may then decide to access a forecast that they would typically not access.
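
The cost–benefit logic for case 2 can be made concrete with a deliberately simplified sketch. All quantities below are hypothetical and are not taken from any example in this paper: the user accesses the next forecast only if the probability of a materially different forecast, multiplied by the value of the better decision that would result, exceeds the access cost.

```python
# Toy cost-benefit check for case 2; every number here is hypothetical.
# Access the next forecast only when its expected value to the decision
# exceeds the cost of accessing it.
p_material_change = 0.3      # P(new forecast differs enough to alter the decision)
benefit_if_changed = 100.0   # value gained from the improved decision in that case
access_cost = 20.0           # constant cost of accessing the forecast

expected_value = p_material_change * benefit_if_changed
access = expected_value > access_cost   # here 0.3 * 100 = 30 > 20: access
```

In a real application, `p_material_change` is exactly the kind of quantity that the unconditional forecast change metrics presented below would supply.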

The third use-case (*case 3*) is the situation in which accessing the forecast has no cost, but taking action based on the forecast has some cost, again considered constant. This cost might be the cost of running impact models based on the forecast, or it might be the cost of making adjustments to plans based on the forecast or on the output from the impact models. Examples of costs related to running impact models are common in the insurance industry, since complex impact models are used which convert forecasts into possible damages. Examples of costs related to making adjustments to plans based on forecasts are common throughout applied meteorology. For instance, in trading, trading positions may need to be adjusted as forecasts change, and each trade may involve a brokerage fee. In case 3 situations, the user might access every forecast, but only take action on some of the forecasts.

We consider two variations of case 3. In case 3a, we assume that the user has to decide in advance of obtaining any of the forecasts (perhaps months in advance) at which points in time they will act on the forecasts. This situation arises when acting on a forecast requires human or other resources that must be hired, purchased, or allocated in advance. In case 3b, we assume that the user can decide in real time, once they have seen a forecast, whether to act on it or not. In case 3a, the in-advance decision about which forecasts to act on should, in a logical approach, depend on the *likely* changes in the values and skill of the forecast. In case 3b, the real-time decision about which forecasts to act on can depend on the *actual* size of the change in the forecast, which is then known, and the *likely* change in skill of the forecast.

In case 3b, if the new forecast is different, but unlikely to be more accurate, then taking action is perhaps less likely to make sense, except in the situations referred to above where understanding the behavior of other forecast users is also relevant. If the new forecast is likely to be more accurate, then taking action may make more sense.

The reason we distinguish between case 2 and case 3, which both involve a constant cost when using the forecast, is so that we can separate case 3 into 3a and 3b, which lead to different decision logic. In fact, case 2 could also be separated in a similar way, into “in-advance” and “real-time” decision situations, and the resulting decisions would differ for cases in which the previous forecast affects the decision as to whether it is worth accepting the cost of accessing the current forecast. However, we do not make this distinction, since although this is relevant to decision-making in general, it is less relevant to the use of the forecast change metrics we describe below. For most situations, including our examples below, case 2 and case 3a are equivalent in terms of the way information would be used by the forecast user, while case 3b is different.

The fourth use-case (*case 4*) is similar to case 3, but now we consider that the cost of acting on the forecast varies. There are many reasons why costs of acting on a forecast might vary. Examples would include cancellation charges increasing with time, labor costs being higher on weekends, a farmer ramping up or down the level of irrigation as a function of the forecast, or a trader ramping up or down the level of hedging as a function of the forecast. As with case 3, we separate case 4 into two variations. In case 4a, we assume that the user has to decide in advance of obtaining any forecasts at which points in time they will act on the forecasts, and in case 4b we assume they can decide in real time. The reason we make a distinction between case 3 and case 4 is that in case 4, the decision as to whether to act on a forecast needs to be made differently, since the cost of acting may depend on the size of the forecast change. In case 4a, this decision needs to be made in advance based on estimates of the *likely* size of the change in the forecast, while in case 4b, it can be made based on the *actual* forecast value.

This categorization of use-cases is not intended to be exhaustive, but rather to help guide understanding of some of the different situations in which forecast change information may be useful. The differences between the cases are summarized in Table 1. As we present metrics for measuring forecast change, we will use all six cases (1, 2, 3a, 3b, 4a, 4b) to illustrate how the metrics could be interpreted and applied.

Table 1. Definitions of cases 1, 2, 3a, 3b, 4a, and 4b. The first column is the label for the case. The second column specifies whether there is a cost related to accessing the forecast. The third column specifies whether there is a cost related to acting on the forecast, and whether that cost is constant or variable. The fourth column indicates when the decision has to be made as to which forecasts to access or act on.

### b. Forecasts and observed data

The data we use for our study consist of 2-m temperature reforecasts from ECMWF and observed temperature data from the Swedish Meteorological and Hydrological Institute (SMHI) station in central Stockholm, WMO reference 98210. The reforecasts are ERA-Interim reforecasts, initialized at 1200 GMT (Dee et al. 2011) for a grid point corresponding to Stockholm. They run from 1980 to 2018, giving 39 years of forecasts in total, and contain 8 lead times, corresponding to forecasts made at initialization times at 24-h intervals from 180 h before the target day to 12 h before the target day. We will refer to these forecasts as −8, −7, …, −1 day forecasts.

### c. Forecast calibration

Before calculating any forecast change metrics, we calibrate the forecasts in a series of four steps as follows.

First, we number the years from 1 to 39 and split both the forecast and observed datasets into odd and even years, giving 20 odd years and 19 even years. We split the data in this way so that we can derive calibration parameters using the odd years as training data and apply them to the even years as testing data, thus calculating all metrics out of sample. Calculating metrics out of sample in this way means that calibration uncertainty is included in the metrics. We split by odd and even years, rather than first and second halves, to reduce the possible impact of temperature trends.
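
This odd/even split can be sketched as follows, assuming the forecasts and observations for one lead time are held in year-by-day arrays; the array names and the placeholder data are our own illustrative choices, not the Stockholm data.

```python
import numpy as np

# placeholder data: 39 years x 365 target days, for one lead time
n_years = 39
rng = np.random.default_rng(0)
fcst = rng.normal(size=(n_years, 365))   # stand-in forecast anomalies
obs = rng.normal(size=(n_years, 365))    # stand-in observations

year_number = np.arange(1, n_years + 1)  # number the years 1..39
odd = year_number % 2 == 1               # 20 odd years: training data
even = ~odd                              # 19 even years: testing data

fcst_train, obs_train = fcst[odd], obs[odd]    # fit calibration parameters here
fcst_test, obs_test = fcst[even], obs[even]    # compute all metrics here
```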

Second, we use the training data to create smoothed average seasonal cycles for both forecasts and observations by combining all 20 years to produce an average seasonal cycle, which we then smooth using a 31-day window. The window length was chosen by trial and error, to give a suitably smooth seasonal cycle in the training data. We use separate seasonal cycles for each forecast lead time and for the observations. We then remove the resulting seasonal cycles from the 19 years of testing data for both forecasts and observations, to create anomalies.
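
A minimal sketch of this step, assuming daily data in a years-by-days array and ignoring leap days; the circular running mean and all names below are our own illustrative choices.

```python
import numpy as np

def smoothed_seasonal_cycle(train, window=31):
    """Average over years (axis 0), then smooth the resulting
    day-of-year cycle with a circular running mean of length window."""
    cycle = train.mean(axis=0)                     # raw seasonal cycle, shape (365,)
    half = window // 2
    padded = np.concatenate([cycle[-half:], cycle, cycle[:half]])  # wrap the year
    return np.convolve(padded, np.ones(window) / window, mode="valid")

# toy training data: 20 years with an annual cycle plus weather noise
rng = np.random.default_rng(1)
day = np.arange(365)
train = 10.0 * np.sin(2 * np.pi * day / 365) + rng.normal(0.0, 3.0, size=(20, 365))

cycle = smoothed_seasonal_cycle(train)
anomalies = train - cycle        # remove the smoothed cycle to form anomalies
```

The same function would be applied separately to the forecasts at each lead time and to the observations, and the resulting cycles subtracted from the testing data.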

Third, we use the training data anomalies to calculate seasonal cycles of forecast bias for each lead time, which we then smooth using a 15-day window. The window length was chosen by trial and error, to give suitably smooth estimates of bias in the training data. We apply these biases as bias corrections to the testing data forecast anomalies. Removing bias in this way ensures that forecast changes between consecutive forecasts for the same target day have a mean close to zero.

Fourth, we use multiple linear regression to produce regression-adjusted forecasts, by regressing the anomaly observations onto the corresponding bias-corrected anomaly forecasts for the same target day. The predictors in the multiple linear regression include all prior forecasts for that day; e.g., the 6-day forecasts produced by the regression use 6-, 7-, and 8-day forecasts as predictors. We do not include an offset in the regression, since bias correction has already been applied in the previous step. We fit the regression parameters to the training data and apply them to the testing data. Applying multiple linear regression in this way ensures that forecast changes between a forecast and all subsequent forecasts for the same target day are uncorrelated in the training data and approximately uncorrelated in the testing data. Implicit in the way we have implemented this regression step is the assumption that the regression parameters do not vary throughout the year. To test the validity of this assumption, we will also repeat parts of our analysis using regression parameters calculated on a seasonal basis.
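
The regression step can be sketched as follows for the 6-day forecast, whose predictors are the 6-, 7-, and 8-day bias-corrected anomaly forecasts. All data below are synthetic stand-ins, and the noise levels are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000                                  # toy training sample of target days
truth = rng.normal(0.0, 4.0, size=n)      # anomaly observations

# synthetic bias-corrected anomaly forecasts, noisier at longer leads
f8 = truth + rng.normal(0.0, 3.5, size=n)
f7 = truth + rng.normal(0.0, 3.0, size=n)
f6 = truth + rng.normal(0.0, 2.5, size=n)

# regress the observations onto all forecasts up to and including the
# 6-day lead; no intercept, since bias was removed in the previous step
X = np.column_stack([f6, f7, f8])
beta, *_ = np.linalg.lstsq(X, truth, rcond=None)

f6_adjusted = X @ beta   # regression-adjusted 6-day forecast
# in the setup described above, beta would be fitted on the training
# years and then applied to the testing years in the same way
```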

The statistical behavior of the resulting forecasts is illustrated in Fig. 1 using standard metrics. We see that the root-mean-squared error (RMSE) of the forecast reduces from 3.6°C on day −8 to 1.8°C on day −1. The mean absolute error (MAE) reduces from 2.8° to 1.3°C. The anomaly correlation increases from 0.52 to 0.91 and the anomaly standard deviation of the forecast increases from 2.15° to 3.96°C. The low standard deviation of the longer lead forecasts is a result of the regression adjustment, which reduces the variance of forecasts with low anomaly correlation as part of the process of minimizing RMSE.
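
For reference, the standard metrics quoted above can be computed per lead time as in this small sketch; the function name and the toy inputs are our own, not the Stockholm data.

```python
import numpy as np

def verify(fcst, obs):
    """Return RMSE, MAE, and anomaly correlation for one lead time,
    given anomaly forecasts and anomaly observations."""
    err = fcst - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    corr = np.corrcoef(fcst, obs)[0, 1]
    return rmse, mae, corr

# toy check: a forecast equal to the observations verifies perfectly
obs = np.array([1.0, -2.0, 0.5, 3.0])
rmse, mae, corr = verify(obs, obs)
```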

These calibrated forecasts are single-valued forecasts, to be interpreted as estimates of the mean of the distribution of future temperatures. This interpretation is implicit in the use of linear regression to calibrate the forecasts. Probability distributions around these mean values can be created by using the RMSE values given in Fig. 1a as the standard deviation of a distribution around the mean, using a distribution such as the normal or the *t* distribution. However, probabilities created in this way neglect the possibility that the uncertainty around the mean may vary with the weather state. Uncertainty estimates that do vary with the weather state can be created from ensemble forecasts using nonhomogeneous regression (Jewson et al. 2004; Gneiting et al. 2005; Wilks 2006; Thorarinsdottir and Johnson 2012; Scheuerer 2013; Lalić et al. 2017; Gebetsberger et al. 2018).
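
A sketch of this construction for a normal distribution, using the complementary error function; the forecast mean and threshold are illustrative choices of ours, while the two standard deviations match the day −8 and day −1 RMSE values quoted above.

```python
import math

def exceedance_prob(mean, sd, threshold):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2.0)))

fcst_mean = 2.0    # calibrated single-valued forecast (degC anomaly), illustrative
threshold = 5.0    # illustrative event: anomaly above 5 degC

p_day8 = exceedance_prob(fcst_mean, 3.6, threshold)  # day -8 RMSE as sd
p_day1 = exceedance_prob(fcst_mean, 1.8, threshold)  # day -1 RMSE as sd
# the sharper day -1 distribution assigns a lower probability to the
# same event for the same mean forecast
```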

All of the metrics we present below are ways of understanding possible changes in the mean of the forecast. One way information about changes in probabilities could be derived would be to combine information about possible changes in the mean with information about changes in the standard deviation: this approach was used in J21.

All the metrics shown in sections 3 and 4 are unconditional metrics and are derived from the 19 years of calibrated testing data. These are all examples of diagnostics that might be useful for informing the types of forecast-based decisions discussed in the introduction.

## 3. Changes in forecast values

We now show a number of diagnostics that illustrate how the calibrated forecasts described in section 2, considered for a fixed target day, change from one initialization time to a later initialization time. In this section, we focus on changes in the forecast *values* and not changes in the skill.

### a. Measures of typical sizes of forecast change

#### 1) Root-mean-square change

The first diagnostic of forecast change we consider is the root-mean-square size of forecast changes (RMSC) from one day to another. This is a generalization of RMSE, and as such is a useful first measure to assess the typical size of forecast changes. We illustrate RMSC values in Fig. 2 for our example forecasts. The format of Figs. 3–8 follows the format of Fig. 2, and we will now explain this format in detail.
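
As a concrete sketch of the RMSC calculation (with synthetic lagged forecasts; the storage format and noise model are our own assumptions): forecasts for each initialization day are held per target day, with day 0 holding the observations, so that RMSC from day *M* to day 0 reproduces the RMSE of the day-*M* forecast.

```python
import numpy as np

def rmsc(fcsts, M, N):
    """Root-mean-square change from the day-M forecast to the day-N
    forecast (N > M), over all target days in the archive."""
    change = fcsts[N] - fcsts[M]
    return np.sqrt(np.mean(change ** 2))

# synthetic archive: 1000 target days; each forecast is the observation
# plus noise whose standard deviation shrinks toward the target day
rng = np.random.default_rng(3)
obs = rng.normal(0.0, 4.0, size=1000)
fcsts = {0: obs}                     # the day-0 "forecast" is the observations
for day in range(-8, 0):
    fcsts[day] = obs + rng.normal(0.0, -0.4 * day, size=1000)

# rmsc(fcsts, M, 0) is then the RMSE of the day-M forecast, which is
# the sense in which RMSC generalizes RMSE
```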

Fig. 2. Root-mean-square changes in the forecasts from Fig. 1, from the forecast made on day *M* (on the horizontal axes) to the forecast made on day *N* (the numbers indicated on the lines inside each graph), where *N* > *M*, i.e., all changes are forward in time. All panels show the same values, but with different red crosses to aid with interpretation. Markers show the changes (a) from day *M* to day 0, (b) from day *M* to day −4, (c) from day −5 to day *N*, and (d) from day *M* to the next forecast on day *M* + 1.

Citation: Weather and Forecasting 37, 1; 10.1175/WAF-D-21-0086.1

Fig. 3. Further diagnostics of sizes of changes in the forecasts from Fig. 1. All changes are from day *M* (horizontal axes) to day *N* (numbers in the graph). (a) Mean absolute change in the forecast, (b) the probability that the absolute change exceeds 0.25°C, (c) the probability that the absolute change exceeds 1°C, and (d) the probability that the absolute change exceeds 4°C.

Fig. 4. Further diagnostics of sizes of changes in the forecasts from Fig. 1. All changes are from day *M* (horizontal axes) to day *N* (numbers in each graph). All panels show probabilities of changes less than or greater than certain values. Probabilities of change (a) <−0.25°C, (b) >0.25°C, (c) <−2°C, and (d) >2°C.

Fig. 5. Further diagnostics of sizes of changes in the forecasts from Fig. 1. All changes are from day *M* (horizontal axes) to day *N* (numbers in each graph). All panels show percentiles of changes for different probabilities. Changes for the (a) 75th percentile, (b) 95th percentile, (c) 99th percentile, and (d) 99.9th percentile.

Fig. 6. Further diagnostics of sizes of changes in the forecasts from Fig. 1. All changes are from day *M* (horizontal axes) to day *N* (numbers in each graph). All panels show root-mean-square changes, as in Fig. 2, but now separated by season.

Fig. 7. Diagnostics of sizes of changes in forecast skill for the forecasts from Fig. 1. All changes are from day *M* (horizontal axes) to day *N* (numbers in each graph). (a) Reduction in RMSE, (b) reduction in RMSE as the proportion of the standard deviation of the anomaly observations of temperature, (c) the ratio of RMSC from Fig. 2 to the RMSE from (a), and (d) the inverse of this ratio.

Fig. 8. Further diagnostics of sizes of changes in the forecast skill of the forecasts from Fig. 1. All changes are from day *M* (horizontal axes) to day *N* (numbers in each graph). All panels show probabilities that the later forecast is more or less accurate than the earlier forecast. (a) Probabilities that the later forecast is more accurate, (b) probabilities that the later forecast is less accurate, (c) probabilities that the later forecast is more than 0.5°C more accurate, and (d) probabilities that the later forecast is not more than 0.5°C more accurate.

The horizontal axes in all panels in Fig. 2 show the initialization time of a forecast as the number of days before the target day and range from 8 days before the target day (shown as −8) to 1 day before the target day (shown as −1). We will refer to the number on the horizontal axis as *M*. The graph contains 8 lines, each labeled with a number. These numbers also refer to the initialization time of a forecast as the number of days before the target day and range from 7 days before the target day (shown as −7) to 0 days before the target day (shown as 0), which is the target day itself (at which point the “forecast” is just the observations). We will refer to these numbers as *N*. The lines show values of RMSC of the forecast from day *M* to day *N*. We only show changes going forward in time, and so the value of *N* is always greater than (i.e., closer to zero than) the value of *M* for all the plotted values.
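The RMSC values plotted in Fig. 2 can be computed directly from an archive of forecasts and observations for a fixed target day. The following is a minimal sketch, assuming a hypothetical array `forecasts` of shape `(n_days, n_cases)` in which row *d* holds the forecast issued *d* days before the target (row 0 holding the observations themselves); the function name is illustrative:

```python
import numpy as np

def rmsc_matrix(forecasts):
    """Root-mean-square change (RMSC) between every pair of
    initialization days M < N for a fixed target day.

    forecasts: array of shape (n_days, n_cases); row d holds the
    forecast issued d days before the target (row 0 = observations,
    i.e., the day-0 "forecast").
    Returns a dict mapping (M, N) -> RMSC, with M and N given as
    negative day offsets (M < N <= 0).
    """
    n_days = forecasts.shape[0]
    out = {}
    for m in range(n_days - 1, 0, -1):    # earlier forecast, day M = -m
        for n in range(m - 1, -1, -1):    # later forecast,   day N = -n
            change = forecasts[n] - forecasts[m]
            out[(-m, -n)] = float(np.sqrt(np.mean(change ** 2)))
    return out
```

With this convention, `rmsc_matrix(forecasts)[(-8, 0)]` reproduces the RMSE of the day −8 forecast, which is the sense in which RMSC generalizes RMSE.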

The four panels in Fig. 2 give the same values of RMSC (i.e., show the same black lines) but with different red markers. We will use the red markers to explain how to interpret this and the following figures. Figure 2a has markers added to the line that corresponds to all RMSC values from day *M* to day *N* = 0, varying *M* from −8 to −1. Since the forecast on day *N* = 0 is the observed value, this line corresponds to forecast changes from the forecast on day *M* to the observations, and is the same as the RMSE of the forecast, as shown in Fig. 1a. This is the sense in which RMSC is a generalization of RMSE. The points in Fig. 2, and subsequent graphs, could be joined by lines in different ways, or not at all. We have chosen to join them with lines in such a way that the values corresponding to the same value of *N* are joined together. One reason for joining them in this way is so that the *N* = 0 curve in Fig. 2 is equivalent to the standard RMSE curve.

Figure 2b has markers added to points that correspond to changes from day *M* to day *N* = −4. These points are joined in a curve because they correspond to the same value of *N* of −4. The values of *M* for this curve only run over preceding forecasts, from *M* = −8 to *M* = −5. The values on this curve might be useful for a forecast user who has accessed a forecast on day *M* (where *M* < −4) and has decided that the next time they will access the forecast will be on day *N* = −4. The marked curve shows them what size of change to expect, depending on the value of *M*.

Figure 2c has markers added to points that correspond to changes from day *M* = −5 to day *N*, for values of *N* from −4 to 0. These values would be relevant for a forecast user who has already accessed the forecast on day −5 and would like to know how much the forecast might change over the next 1, 2, 3, or 4 days, and finally how large the change might be from the day −5 forecast to the observations. One striking feature of the marked changes is that there is a large difference between the 1-day sizes of changes up to day −1 and the change to day *N* = 0; i.e., the changes between consecutive forecasts are smaller than the change between the final forecast and the observations. This feature is visible in all the metrics we present below. To give an example of how the information in Fig. 2c might be used, one that relates to cases 2, 3a, and 4a as defined in section 2: if the forecast user concludes from this graph that the change in the forecast is likely to be small between day −5 and day −4, then they may conclude that there is little reason to use the forecast on day −4, because of the cost involved.

Figure 2d has markers added to points that correspond to changes from day *M* to day *N* = *M* + 1, i.e., the 1-day changes in the forecast. As *M* changes from −8 to −2 these changes become gradually smaller, after which the final change, from day −1 to day 0, is again large.

RMSC values in general are also relevant to case 1, and J21 shows how RMSC values can be used as input to an algorithm that can be used to make the act-now-or-wait decision.

#### 2) Mean absolute change

Figure 3a again shows the typical sizes of forecast changes, but now measured using mean absolute change (MAC) rather than RMSC. The qualitative structure of the changes is the same as for RMSC, although the values are slightly smaller. The *N* = 0 line corresponds to the MAE curve shown in Fig. 1b. RMSC and MAC communicate very similar information, although some users may prefer one over the other.

### b. Probability of absolute changes of size X

Figures 3b–d show another way to understand possible sizes of forecast changes, by showing the probability that the absolute (i.e., unsigned) value of the change in the forecast will be greater than a threshold. We have picked thresholds of 0.25°, 1°, and 4°C to illustrate a range of behavior. Different users may prefer different thresholds, depending on their usage of the forecast. The threshold 0.25°C is a small change for our example forecast, and Fig. 3b shows that in most cases the probability that the change will exceed 0.25°C is greater than 80%. The only exceptions are the 1-day forecast changes from day −5 to −4, from day −4 to −3, from day −3 to −2, and from day −2 to −1.

Figure 3c shows probabilities of changes greater than 1°C. We see that the changes over the longest time periods, e.g., from day −8 to day −1, have a fairly high probability of being greater than 1°C. The changes over the shortest time periods and for the forecasts nearest to the target, e.g., from day −2 to day −1, show a much lower probability of showing this size of change.

Figure 3d shows probabilities of changes of greater than 4°C. A change of 4°C is a large forecast change from our example forecast and we see that the probabilities of changes of this size are all less than 30%. For the shortest time periods (e.g., 1-day changes such as from day −8 to day −7), the probability of such a large change is very close to zero.
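Exceedance probabilities of this kind can be estimated empirically by counting historical cases. Below is a minimal sketch, assuming a 1D array of historical changes *f*_{N} − *f*_{M} for one (*M*, *N*) pair; the function name is illustrative, and the thresholds echo those used in the text:

```python
import numpy as np

def prob_abs_change_exceeds(changes, threshold):
    """Empirical probability that the absolute (unsigned) forecast
    change |f_N - f_M| exceeds the given threshold."""
    changes = np.asarray(changes, dtype=float)
    return float(np.mean(np.abs(changes) > threshold))

# illustrative thresholds, in deg C, matching those used in the text
thresholds = (0.25, 1.0, 4.0)
```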

These values may be relevant for forecast users in cases 2, 3a, or 4a, if when using the forecast there will be little impact if the forecast does not change by a *material* amount. Such users could consult these charts to assess whether a material change is likely and hence whether it would be worth using a particular forecast or not.

### c. Probability of change of size X

Figure 4 presents probabilities of different sizes of changes in the forecast, but now for signed changes. Figure 4a shows the probability of forecast *reductions* of greater than 0.25°C, and Fig. 4b shows the probability of forecast *increases* of greater than 0.25°C. Since these are small changes in the forecast, the probabilities are all close to 50%. If the distribution of forecast changes were entirely symmetrical around zero then these two figures would look the same. However, there are some slight differences between the two figures, indicating some skew or asymmetry in the forecast changes in our dataset. Figures 4c and 4d show the probabilities of forecast reductions and increases of greater than 2°C. This is a larger change, and now the probabilities are lower.

These values could again be useful for users in cases 2, 3a, and 4a, and could additionally be used in situations in which the question arises as to whether to act now or wait for the next forecast (case 1), and in which the decision is being made subjectively, rather than via the algorithm given in J21. For instance, if the weather on the target day is currently forecasted to be bad, and if, from these charts, the chance of it improving is small, then there may be more reason to act now rather than wait.

### d. Percentiles of changes

Another way to understand the distribution of possible changes in the forecast is to consider percentiles of the distribution of change. Figure 5a shows the 75th percentile, Fig. 5b shows the 95th percentile, Fig. 5c shows the 99th percentile and Fig. 5d shows the 99.9th percentile, all calculated empirically (i.e., by ranking the data and picking out the appropriate values). These four percentiles are all positive, indicating increases. Similar figures can also be created for percentiles below the 50th, which would likely show negative changes, representing decreases. Considering the change from day −8 to day 0 we see that there is a 25% chance this change will be 2.3°C or larger, a 5% chance that it will be 6.2°C or larger, a 1% chance it will be 9.2°C or larger, and a 0.1% chance that it will be 13°C or larger.

For the 99.9th percentile the number of cases in our testing data becomes small, and the results are noisy. However, it is perhaps useful for some forecast users to know that in 1 in 1000 cases the forecast changes can be rather large, e.g., even the 1-day change from day −8 to day −7 can be larger than +8°C.

We have used an empirical approach to calculate the percentiles. An alternative approach would be to fit a distribution and read percentiles from the fitted distribution.
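Both approaches can be sketched in a few lines. The fitted alternative below assumes a normal family purely for illustration (using the standard-library `NormalDist`); other distributions could equally be fitted:

```python
import numpy as np
from statistics import NormalDist

def change_percentiles(changes, pcts=(75, 95, 99, 99.9)):
    """Percentiles of the distribution of forecast changes.

    empirical: rank the data and interpolate (np.percentile).
    fitted:    fit a normal distribution and read off its quantiles,
               which smooths the noisy upper tail.
    """
    changes = np.asarray(changes, dtype=float)
    empirical = {p: float(np.percentile(changes, p)) for p in pcts}
    dist = NormalDist(changes.mean(), changes.std(ddof=1))
    fitted = {p: dist.inv_cdf(p / 100.0) for p in pcts}
    return empirical, fitted
```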

### e. Seasonal variations

Many forecast properties vary with season, and it is important to assess whether that is the case for our regression parameters. We recalibrate the forecasts, but now allowing regression parameters to vary by season, again splitting into training and testing datasets. Figure 6 shows the resulting RMSC for each season, which can be compared with the annual values in Fig. 2. We see that there are indeed seasonal variations, and in particular the changes are smaller in SON than in the other seasons. This suggests that all the other metrics (in both section 3 above and section 4 below) may also vary with season, and for more accurate results should all be calculated on a seasonal basis. Given that our goal is just to illustrate different metrics, however, these are the only seasonal metrics we include in this article.

## 4. Changes in forecast skill

We now show a number of diagnostics that illustrate how the *skill* of the forecast described in section 2, considered for a fixed target day, changes from one initialization time to a later initialization time. Forecast users in all of our six use-cases could potentially use this information. In most cases, using an update to a forecast only has value if the new forecast is likely to be more skillful than the old forecast, and hence understanding changes in forecast skill is crucial to understanding whether using a forecast update might be worthwhile. In some cases, using an updated forecast will only be worthwhile if the skill increases by a large enough amount, and this is reflected in some of the metrics we consider for understanding changes in forecast skill.

### a. Reduction in RMSE

Figure 7a shows the reduction in the RMSE (RRMSE) from day *M* to day *N*, which is our first measure of how forecast skill increases between different lead times as we approach the target day. These values are not to be confused with the RMSC values shown in Fig. 2, which only show change and not skill. As with RMSC, the RRMSE can also be considered as a generalization of RMSE, since the *N* = 0 line shows the RMSE of the forecast, as shown in Fig. 1a. All the other lines in Fig. 7 can be derived from this *N* = 0 line simply by calculating differences between different points on that line. We see that the 1-day reductions in the RMSE are all small, especially for day −8 to day −7, suggesting that using updated forecasts on consecutive days may often not be worthwhile if there is a cost involved. Conversely, the reductions in the RMSE between the early forecasts and the late forecasts are large, suggesting that using updated forecasts is likely worthwhile, unless the cost of doing so is large. The reductions in RMSE are gradual, and there is no point at which the RMSE reduces significantly more rapidly from one forecast to the next. The largest reductions in RMSE occur when moving from any of the forecasts to the final observation (for which the RMSE is zero).
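Since every RRMSE curve is a difference of points on the *N* = 0 RMSE curve, all the values can be generated from that single curve. A minimal sketch, assuming a hypothetical dict `rmse` keyed by (negative) day offset:

```python
def rrmse(rmse, m, n):
    """Reduction in RMSE (RRMSE) from the forecast issued on day m to
    the forecast issued on day n (m < n <= 0); day 0 denotes the
    observations, whose RMSE is zero by definition."""
    curve = dict(rmse)
    curve[0] = 0.0           # observations have zero error
    return curve[m] - curve[n]

# RMSE values for days -3 and -2, as quoted in section 4b (deg C)
example = {-3: 2.22, -2: 1.98}
```

For instance, `rrmse(example, -3, -2)` gives the 1-day reduction of 0.24°C used in the worked example of section 4b.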

Figure 7b shows the same values as in Fig. 7a but divided by the standard deviation of the anomaly observations, to allow interpretation of the reductions in RMSE in the context of the climatological variability of the temperature (and many of the other metrics we present could also be shown in this way). While the values for 1-day reductions are only a small fraction of the climatological variability, again indicating that using 1-day forecast updates may not be particularly valuable if there is any cost involved, the reductions from the early forecasts to the late forecasts approach 50% of the climatological variability, suggesting that multiday updates could be valuable.

For case 1, J21 shows how changes in RMSE can be used to calculate the RMSC values that are used as the input to an algorithm that can be used to make the act-now-or-wait decision. For the other cases, the RRMSE would be one way to measure the benefit of using an updated forecast, to be weighed against the costs.

### b. Comparing RMSE and RMSC

To compare changes in the forecast values with the change in the forecast skill, Fig. 7c shows the ratio of typical sizes of forecast changes, as measured using the RMSC shown in Fig. 2, to the typical sizes of changes in forecast errors, as measured using the RRMSE shown in Fig. 7a. Figure 7d shows the same values, but as one divided by this ratio, as an alternative way to present the same information. In Fig. 7c, we see that for the 1-day forecast changes, such as from day −5 to day −4, the RMSC/RRMSE ratio is fairly high, with RMSC values which are 4 to 6 times larger than the typical reduction in RMSE. For these 1-day forecast changes, most of the change in the forecast is not related to a reduction in forecast error, but is random. For changes over longer periods, such as changes from day −5 to day −1, the ratio is lower and a larger part of the change in the forecast is related to a reduction in forecast error. For changes between day *M* and day 0 (the target day), the ratio is 1, by definition, since the numerator and the denominator of the ratio are both the RMSE. For some users, large forecast changes are inconvenient, as taking action may be costly, and the cost may depend on the size of the forecast change (cases 3b and 4b). The inconvenience of forecast changes has been discussed in, for example, Hamill (2003), Zsoter et al. (2009), McLay (2011), Fowler et al. (2015), Griffiths et al. (2019) and Richardson et al. (2020). However, forecasts cannot improve unless they change. The ratios in Figs. 7c and 7d give a simple way to assess the possible value imparted by a change in the forecast, by comparing the size of change with the increase in skill. A forecast change for which the change is likely much larger than the decrease in RMSE may be mostly inconvenience and may have less value than a forecast change for which the two are more similar, where the inconvenience of the change is balanced by the increase in information.

To understand the size of this ratio, we write the RMSE of the forecasts on day *M* and day *N* as *r*_{M} and *r*_{N}, where *M* is the earlier forecast, and hence *r*_{M} > *r*_{N}. From Eq. (2) in J21, for well calibrated forecasts (such as those used in our example), the RMSC is related to the RMSE by the expression

$$\mathrm{RMSC} = \sqrt{r_M^2 - r_N^2}. \tag{1}$$

If we write *r*_{M} = *r*_{N} + *ε*, so that RRMSE = *ε*, then we can linearize Eq. (1), giving

$$\frac{\mathrm{RMSC}}{\mathrm{RRMSE}} \approx \sqrt{\frac{2 r_N}{\varepsilon}}. \tag{2}$$

As an example, consider the change from day *M* = −3 to day *N* = −2, for which *r*_{M} = 2.22°C, *r*_{N} = 1.98°C, and *ε* = 0.24°C. These values give RMSC/RRMSE ≈ 4.06, which is close to what we see in Fig. 7c for the actual ratio for this particular change. From this analysis, we see that the value of the ratio is mostly a consequence of the size of the fractional change in RMSE from one forecast to the next. Equation (2) shows that small fractional changes in RMSE lead to large values of the ratio. This shows that small fractional changes in skill from day to day will always lead to large volatility of the forecast: it would not be possible to reduce the volatility of the forecast without improving the skill. This result can perhaps help with the interpretation of some of the forecast volatility analyses in the papers cited above.
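The linearized ratio can be checked numerically against the exact expression, using the day −3 to day −2 values quoted above and J21's relation between RMSC and RMSE for well calibrated forecasts:

```python
import math

r_m, r_n = 2.22, 1.98          # RMSE on day -3 and day -2 (deg C)
eps = r_m - r_n                # reduction in RMSE, 0.24 deg C

rmsc = math.sqrt(r_m**2 - r_n**2)        # J21: RMSC^2 = r_M^2 - r_N^2
exact_ratio = rmsc / eps                 # exact RMSC/RRMSE
approx_ratio = math.sqrt(2 * r_n / eps)  # linearized ratio
```

The linearized value rounds to 4.06, the number quoted in the text, and the exact value is only slightly larger.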

All panels in Fig. 7 can also be produced using absolute changes and absolute errors, which may be preferred by some forecast users. The results are similar (not shown).

### c. Probability that a later forecast is more accurate

Figure 2 shows that all forecasts made at later times are more accurate *on average*. However, they are not more accurate in every case, and these on-average results only give a very limited picture of how accuracy improves in individual cases. To remedy this, Fig. 8a shows the probability that one of the later forecasts will be more accurate, where more accurate means having a smaller absolute error.

All the values for the forecasts themselves are between 50% and 80%. The observations are more accurate 100% of the time, by definition, as shown by the *N* = 0 line. Moving from day *M* to *N* = *M* + 1, i.e., for 1-day changes, the forecast has less than a 60% chance of being more accurate, and this does not vary much with *M*. Some users might conclude that having less than a 60% chance of being more accurate is not sufficient to justify updating the forecast they are using. Moving from day −8 to day −1 the forecast has nearly an 80% chance of being more accurate, and would be more likely to justify a forecast update.

Figure 8b presents exactly the same information, but as the probability that a subsequent forecast will be *less* accurate. Although the mathematical relationship between Figs. 8a and 8b is trivial, we include both, since some forecast users may prefer one over the other.
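These probabilities can be estimated by comparing absolute errors case by case. A minimal sketch, assuming paired arrays of historical forecast errors for the earlier and later forecasts; the `margin` argument anticipates the "materially more accurate" variant discussed next (a margin of 0.5°C):

```python
import numpy as np

def prob_more_accurate(err_m, err_n, margin=0.0):
    """Empirical probability that the later forecast (errors err_n)
    is more accurate than the earlier one (errors err_m), i.e., has a
    smaller absolute error, by more than `margin` degrees."""
    err_m = np.abs(np.asarray(err_m, dtype=float))
    err_n = np.abs(np.asarray(err_n, dtype=float))
    return float(np.mean(err_n < err_m - margin))
```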

### d. Probability that a later forecast is materially more accurate

Figure 8a includes cases where the forecast is only very slightly more accurate. However, if a new forecast is only very slightly more accurate, it is almost certainly not worth updating to the new forecast if there is a cost in doing so. Figure 8c attempts to factor this in and shows the probability that the later forecast will be *materially* more accurate, which we define as being more than 0.5°C more accurate. The choice of a 0.5°C threshold to define materially more accurate is arbitrary, and different forecast users may prefer a different threshold. Testing for material increases in accuracy is a tougher test for the forecast, but perhaps more relevant for a forecast user. Figure 8c shows that there are many forecast changes for which the later forecast has less than a 50% probability of being materially more accurate. This includes all the one- or two-day changes from forecast to forecast (i.e., from day *M* to day *N* = *M* + 1 or day *N* = *M* + 2). For all of the changes that show less than a 50% chance of being materially more accurate, users may well conclude that updating is not worthwhile. All the 4-, 5-, 6-, 7-, and 8-day changes in Fig. 8c have greater than a 50% chance of being materially more accurate, and users may conclude that updating to the new forecast is worthwhile in all these cases. The changes from day *M* to the observations are not 100% likely to be materially more accurate, since there is a possibility that the forecast on day *M* is already within 0.5°C of the observations. Figure 8d shows the probability that the later forecast will *not* be materially more accurate, as another way of presenting the same information.

## 5. Discussion and examples

We now discuss five illustrative examples of possible usage of the types of forecast change information presented above.

### a. Act now or wait for the next forecast

Our first example (example A) is based on case 1 from section 2. J21 addressed the question of whether to cancel an event or wait for the next forecast, two forecast steps in advance of the event, using a simple extension of the well-known cost–loss model (for discussion of the basic cost–loss model, see Murphy 1969; Kernan 1975; Katz and Murphy 1997; Buizza 2001; Roulin 2007; Richardson et al. 2020). J21 assumed that values can be specified for the costs of cancellation one and two days in advance, and for the loss that would be suffered if bad weather occurs on the day of the event. They described an algorithm for making this decision in a rational way, which involves simulating the possible values for the second forecast given the first forecast. The algorithm uses RMSC values to simulate possible future forecasts using a Markov process, which allows calculation of probabilities of all possible forecast trajectories. The Markov process uses a simplified approach in which normal distributions are used to calculate the transitions, which contrasts with a number of other authors who have used the more general approach of using transition matrices (e.g., Regnier and Harr 2006; McLay 2008, 2011). The simulated future forecasts allow the evaluation of the utility of the two choices (act now or wait).
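The simulation step can be sketched as follows. This is an illustrative reading of the approach, not J21's exact implementation: for well calibrated forecasts the next forecast is drawn from a normal distribution centred on the current forecast, with standard deviation equal to the relevant RMSC; the function name and the cancellation threshold are hypothetical:

```python
import numpy as np

def simulate_next_forecast(current, rmsc_mn, n_samples=10_000, seed=0):
    """Monte Carlo samples of the next forecast, given the current
    forecast and the RMSC from day M to day N, using a normal
    transition centred on the current forecast."""
    rng = np.random.default_rng(seed)
    return current + rng.normal(0.0, rmsc_mn, size=n_samples)

# e.g., the probability that the next forecast crosses a (hypothetical)
# cancellation threshold of -2 deg C, given a current forecast of 1:
samples = simulate_next_forecast(current=1.0, rmsc_mn=1.5)
p_cross = float(np.mean(samples < -2.0))
```

Probabilities of this kind, conditional on the current forecast, are the quantities that the act-now-or-wait decision rule weighs against the costs of early and late cancellation.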

These simulations feed into a decision rule, and J21 use out-of-sample testing with real forecasts and observations to show that using the algorithm and the decision rule leads to better decisions, in terms of higher average utility, than using simpler methods based on the forecasts alone. The algorithm beats the simpler methods because it takes into account what can already be deduced about the likely values of the second forecast given the first forecast. This algorithm, and the solution to the basic cost–loss model, are both special cases of the general decision framework known as dynamic programming (Bellman 1957), which has been applied to various decision-making problems in atmospheric sciences (Katz et al. 1982; Wilks 1991; Regnier and Harr 2006; McLay 2008, 2011).

Example A uses both unconditional and conditional forecast change information. The inputs for the algorithm are the forecast RMSE values, from which RMSC values were derived. RMSC values could also be estimated directly. These inputs are unconditional, i.e., do not depend on the current forecast value. Based on these inputs, and the value of the current forecast, the algorithm calculates two forecast-dependent probabilities, related to forecast change. These probabilities are conditional forecast change information, and determine the decision.

In most real act-now-or-wait-for-the-next-forecast situations, it would not be possible to specify the costs and losses required to formulate the problem mathematically as described in J21. In these more general situations, the act-now-or-wait decision would likely be made based on a subjective evaluation after considering all the available information. All of the unconditional metrics discussed in sections 3 and 4 above could feed into that subjective evaluation, and the analysis in J21 suggests that calculating some additional conditional metrics might also be useful. Examples of conditional metrics might include the probability that the next forecast will predict bad weather, given the current forecast, and the probability that the weather on the day of the event will be bad, even if the final forecast predicts good weather, given the current forecast [which is a limiting case of the “sneak” situation discussed in McLay (2011)].

### b. When to access or act on forecasts: A constant cost example

Our next four examples (examples B–E) consider the question of when to access or act on forecasts from an unconditional perspective, i.e., from a perspective before any forecasts are available. Because no forecasts are available, decisions cannot be made based on forecast values, but rather only on estimated statistical behavior of the forecasts. These problems cannot be solved using the dynamic programming framework, because they lack the necessary recursive structure, but can be solved using a more general optimization framework, as we show below.

We model the benefit and cost of using the forecast on day *N*, having previously used the forecast on day *M*, as follows:

$$\mathrm{benefit} = \frac{(r_N - r_M)\,N}{r_{-8}}, \qquad \mathrm{cost} = C. \tag{3}$$

In the expression for benefit, *r*_{−8} is used as a constant normalizing factor. We assume that non-dimensionalization has been used so that the cost and benefit are expressed in the same units. The values of benefit can be derived from the changes in RMSE shown in Fig. 7a. To give an example: we can consider whether to use the forecast on day *N* = −2, having previously used it on day *M* = −3, using the values of *r*_{−3} and *r*_{−2} given in section 4b above, and *r*_{−8} = 3.60°C. These values give a benefit of (−0.24)(−2)/3.60 = 0.133. If this benefit is greater than the cost *C*, then we would decide to use the forecast on day −2; otherwise, we would not use the forecast.
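The worked example above can be reproduced in a few lines; the function name is illustrative:

```python
def benefit(r_m, r_n, n, r_first=3.60):
    """Benefit of using the forecast on day n (= N), having previously
    used it on day M: the reduction in RMSE (r_m - r_n) times the
    number of days remaining to the target (-n), normalized by the
    RMSE of the first forecast, r_{-8} = 3.60 deg C."""
    return (r_n - r_m) * n / r_first

b = benefit(2.22, 1.98, -2)   # the worked example: (-0.24)(-2)/3.60
```

The forecast on day −2 would then be used whenever `b` exceeds the cost *C*.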

This model has only a single parameter *C*, the cost of using the forecast. We will consider a range of values for this parameter, and in each case we will determine on which days the model implies forecasts should be used. The decisions for the different days interact, since using a forecast on day *M* reduces the benefit of using the forecast on days *M* − 1 and *M* + 1, because it decreases the RMSE reduction possible on consecutive days. We will assume that, at the start, the forecast is always used on day −8, leaving 7 binary decisions to be made with respect to which of the 7 later forecasts to use. These 7 binary decisions lead to 128 possible strategies for which set of forecasts to use. We perform a brute-force search through all these 128 strategies, computing net benefit = benefit − cost for each forecast within each strategy, and total net benefit as the sum of the net benefits across all 7 forecast decisions within each strategy. For each value of *C*, we then choose the single strategy that gives the highest total net benefit. If none of the strategies give positive total net benefit, then no forecasts are used except for day −8.
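The brute-force search over the 128 strategies can be sketched as follows. The RMSE curve in the test below is a hypothetical declining curve (only the day −8, −3, and −2 values echo numbers quoted in the text), and each used forecast contributes benefit and cost according to the constant-cost model above:

```python
from itertools import product

def best_strategy(rmse, cost):
    """Brute-force search over the 2**7 = 128 strategies for which of
    the forecasts on days -7 ... -1 to use, given that the day -8
    forecast is always used.

    rmse: dict mapping day offset to forecast RMSE.  Each used
    forecast on day n, following the previously used day m,
    contributes (rmse[m] - rmse[n]) * (-n) / rmse[-8] of benefit and
    costs `cost`.  Returns the best (total net benefit, strategy)
    pair; the all-zeros strategy (use no further forecasts) has
    total net benefit 0 by construction.
    """
    days = range(-7, 0)
    best = (0.0, (0,) * 7)
    for strategy in product((0, 1), repeat=7):
        used = [-8] + [d for d, u in zip(days, strategy) if u]
        net = sum((rmse[m] - rmse[n]) * (-n) / rmse[-8] - cost
                  for m, n in zip(used, used[1:]))
        if net > best[0]:
            best = (net, strategy)
    return best
```

Each `strategy` tuple corresponds to one row of Table 2, and interpreting it as a binary number gives the strategy number plotted in Fig. 9b.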

Results are given in Table 2 and Fig. 9. In Table 2, “1” indicates that the model recommends that the forecast should be used for a given forecast day and cost *C*, while “0” indicates that it should not be. We see that for *C* > 1.25, the algorithm recommends using no further forecasts after day −8, because using forecasts is simply too expensive and all strategies have negative total net benefit because of the large cost. For *C* ∈ (0.38, 1.25), the algorithm recommends using just one further forecast, on day −4. Although the forecast on day −4 is less accurate than later forecasts, the algorithm values it more highly because it is available earlier, according to Eq. (3). For *C* ∈ (0.21, 0.37), the algorithm recommends using two further forecasts, on days −5 and −3. The algorithm separates them in time because that increases the change in the RMSE for the forecast on day −3, which increases the benefit on that day. As *C* is reduced further, more and more forecasts are used, until *C* has reduced to <0.053, at which point the algorithm calculates that the most beneficial strategy is to use every forecast. The optimal strategy involving just two forecasts is also given in Table 3.

Results from calculations to determine the optimal strategy with respect to which of the forecasts from Fig. 1 to use (access or act on), according to the four cost–benefit models described in the text. (a) For example B (as described in section 5b), shows the total net benefit of the optimal strategy vs the cost parameter (red line), as well as the total net benefit of the strategy of using no forecasts after the first, and the total net benefit of using all forecasts. (b) The strategy number in each case. (c),(e),(g) As in (a), but for examples C, D, and E (as described in sections 5c, 5d and 5e, respectively), and (d),(f),(h) the strategy numbers for examples C, D, and E.


Optimal strategies for whether to use (access or act on) each forecast leading up to the target day, in a situation where using a forecast has a fixed cost, and a variable benefit proportional to the increase in skill and the length of time to the target day (see section 5b, example B). The first column gives the range of cost parameters. In the next seven columns, a 1 indicates that a forecast on that day is to be used, and a 0 that it should not be used. The last column gives the total number of forecasts used in each case.

Optimal strategies as shown in Table 2, but now for four different examples, and only for the optimal solution that uses exactly two forecasts in each case.

Figure 9a shows the variation of the total net benefit versus the cost parameter *C* for the optimal strategy derived as described above, and also for the simpler strategies of 1) only using the first forecast, and then no other forecasts, and 2) using all the forecasts. We see that for small values of the cost parameter, the optimal strategy and using all forecasts give equal total net benefit, while for large values of the cost parameter, the optimal strategy and using no forecasts give equal total net benefit. In between these two extremes, there is a range of values of the cost parameter for which the optimal strategy beats both of the simpler strategies. Figure 9b shows how the strategy chosen varies with the cost parameter. The strategies are coded using a binary representation, as in Table 2, and the binary number is converted to a decimal number, which we call the strategy number and plot to illustrate how the strategy changes as *C* varies. The eight plateaus in this graph give the regions over which the eight strategies given in Table 2 are optimal. For the third strategy [*C* ∈ (0.21, 0.37)], this region is very small.

In practice, for a fixed cost (assumed to be known to the user), the characteristics of the forecast changes would determine the optimal strategy, illustrating the value of the forecast change information.

In a variation of this example, the benefit of the forecast could instead be modeled using the change in MAE rather than the change in RMSE. This model applies to cases 2 and 3a but is not relevant for case 3b, in which the decision as to whether to act on the forecast can be made once the forecast is known. Analysis of the case 3b situation is more complex and would require conditional forecast change metrics and modeling of how the benefit of using the forecast depends on the actual value of the forecast.

### c. When to act on forecasts: An increasing cost example

Our next example (example C) is like example B, except that there is now a *variable* cost associated with acting on the forecast. In this case, we model the variable cost as increasing as we approach the target day: the benefit is as in Eq. (3), and the cost [Eq. (6)] is proportional to the parameter *C*, increasing from zero for the day −7 forecast to its largest value for the day −1 forecast.

Once again, there is just a single parameter *C*, and the cost of acting on the forecast is proportional to this parameter. As before, we can determine the best course of action for each value of *C* by brute-force search through all 128 possible strategies.

Figures 9c and 9d show the variation of the total net benefit with the cost parameter for the optimal strategy, and the strategy number for the optimal strategy, in the same format as Figs. 9a and 9b. For low values of the cost parameter, using all the forecasts makes the most sense. As the cost parameter increases, at some point using all the forecasts becomes too expensive and is beaten by using none of the forecasts. The optimized solution beats them both, however, and favors using the early forecasts, as shown by the large values of the strategy number in Fig. 9d. For the highest values of the cost parameter, the optimal strategy is to use just the first forecast, which beats using no forecast. This strategy is optimal because, in this cost model, using just the first forecast has no cost [*N* = −7 in Eq. (6)]. The optimal strategy involving just two forecasts is given in Table 3. We see that in this case, the first two forecasts are chosen, reflecting the low costs at those times.

### d. When to act on forecasts: A peaking cost example

Results are shown in Figs. 9e and 9f. The optimal strategies differ from those in examples B and C: they now favor both early and late forecasts and avoid forecasts in the middle of the time period. The optimal strategy involving just two forecasts is given in Table 3. We see that in this case the first and last forecasts are chosen, reflecting the low costs at these times.
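The qualitative shapes of the cost schedules in examples C and D can be sketched as simple functions of the forecast day. The linear and tent-shaped forms below are assumptions chosen to reproduce the behavior described in the text, not the paper's actual Eq. (6):

```python
# Illustrative cost schedules for examples C (increasing) and D (peaking).
# Functional forms are assumptions for demonstration only.

def increasing_cost(day, C):
    """Example C: cost rises as the target day (day 0) approaches.

    The first forecast (day -7) is free here, matching the statement that
    using just the first forecast has no cost.
    """
    return C * (day + 7)   # 0 at day -7, rising to 6*C at day -1

def peaking_cost(day, C):
    """Example D: cost peaks in the middle of the forecast period."""
    return C * (4 - abs(day + 4))   # largest at day -4, smallest at the ends
```

Under the increasing schedule, early forecasts are cheap and hence favored; under the peaking schedule, forecasts at both ends of the period are cheap and hence favored, consistent with the optimal strategies reported for these examples.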

### e. When to act on forecasts: Cost proportional to size of change

Results from this experiment are shown in Figs. 9g and 9h. The results are similar to those shown in Figs. 9a and 9b for example B, because the sizes of the changes in our forecast, and hence the costs, vary little, making this close to the constant cost case. The only qualitative difference is that there are now nine plateaus in Fig. 9h, reflecting that there are now nine different solutions, rather than eight, because two of them correspond to using four forecasts. The optimal strategy involving just two forecasts is given in Table 3. We see that in this case, forecasts are chosen on days −5 and −3, matching the solution from example B. In general, however, examples B and E show different optimal solutions (see Fig. 9).

In a variation of this example, one could model the cost as a function of the average absolute change in the forecast (Fig. 3a). One could also consider situations in which the cost is zero until the forecast has changed by a material amount, as might be the case for users who only rerun their impact models if the forecast is likely to change by a material amount. For these situations, it might be more appropriate to model the cost as proportional to the probability of large changes in the forecast (using data from Figs. 3b,c,d), or percentiles of change in the forecast (using data from Fig. 5). This model applies to case 4a but is not relevant for case 4b, in which the decision as to whether to act on the forecast can be made once the forecast is known. As with case 3b, the optimal strategy for case 4b would be more complex.
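For the change-proportional cost model and its variations, a user needs statistics of past forecast changes. A minimal sketch of how such statistics might be computed from an archive of consecutive forecasts for the same target day follows; the function name and the example numbers are hypothetical:

```python
import statistics

def change_stats(pairs, threshold):
    """Summarize forecast changes from (previous, revised) forecast pairs.

    Returns the mean absolute change (cf. Fig. 3a) and the fraction of
    changes exceeding `threshold` (cf. Figs. 3b-d).
    """
    changes = [abs(new - old) for old, new in pairs]
    return {
        "mean_abs_change": statistics.mean(changes),
        "prob_large_change": sum(c > threshold for c in changes) / len(changes),
    }
```

For instance, `change_stats([(10.0, 11.0), (5.0, 5.5), (0.0, 3.0)], threshold=2.0)` gives a mean absolute change of 1.5 and a large-change probability of 1/3; in practice, these statistics would be computed separately for each lead time.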

### f. Comparison and discussion of examples B–E

The examples B–E illustrate how details of the costs and benefits of using forecasts at different times can lead to different optimal strategies, involving using different subsets of the full set of forecasts. There are many other possible configurations that could arise in different user situations.

In cases where the cost and benefit cannot be precisely quantified, which covers most real cases, the various graphs presented in sections 3 and 4 above allow the user to take forecast change information into account when deciding which forecasts to access or act on, albeit in a qualitative or subjective way.

## 6. Summary

Weather forecasts, seasonal forecasts, and climate projections exist to help improve decision-making. In some situations, single value forecasts, ensemble forecasts, or probabilistic forecasts may be all that are required. In other situations, information about possible forecast changes can further improve decisions. For instance, using forecast change information can help make better decisions in the commonly occurring situation in which a forecast user is considering whether to act now or wait for the next forecast (Regnier and Harr 2006; Jewson et al. 2021, hereafter J21). Forecast change information can also help when there are costs associated with using forecasts, because a forecast user may have to decide how frequently, and exactly when, to use the forecast in order to maximize the total benefit from the forecast while minimizing the total cost.

Currently, forecast users are unable to obtain information about possible forecast changes unless they derive it themselves from an archive of past forecasts. We suggest that it would be useful for forecast providers to supply this information, which in turn raises the question of how exactly such information should be presented. We have explored this question in the context of medium-range single-value forecasts of mean temperature for Stockholm from the ERA-Interim reforecast dataset. We have produced a number of exhibits that show ways in which both changes in the forecast and changes in the skill of the forecast can be presented. Different forecast users operate in different environments and may have different perspectives as to which of the exhibits they find most useful.

J21 showed that, for forecasts consisting of normal distributions and for a certain class of idealized decision situations in which all costs can be quantified, changes in root-mean-squared error (RMSE) of forecasts from one forecast to the next (for the same target day) can be used in an algorithm that answers the question as to whether to act now or wait for the next forecast. The algorithm leads to better decisions, on average, when compared with decisions made using only the current forecast. In general act-now-or-wait decision situations, it is typically not possible to quantify all the costs of different actions and outcomes and hence not possible to derive algorithms to make decisions. Decisions must then be made subjectively after consideration of all available information. The forecast change metrics we have presented can form part of this body of information, and forecast users who have access to these metrics could make better decisions.

To further elucidate the potential uses of forecast change information, we have presented four new illustrative examples (examples B–E) that explore the question of when to use forecasts, when using the forecast has a cost. In all four examples, using the forecast has a benefit that is modeled as proportional to the reduction in RMSE, to capture the idea that more accurate forecasts are more useful, and proportional to the number of days to the target day, to capture the idea that earlier forecasts are more useful. The cost is modeled as constant in example B, increasing in example C, peaking in example D, and proportional to the size of the change in the forecast in example E. For all examples, we have computed optimal strategies for a range of values of a cost parameter. In all cases, when the cost parameter, and hence the cost, is low, it is best to use all forecasts. The interesting cases arise as the cost increases, when there is a trade-off between the cost of using the forecast and the resulting expected benefit. All four models then select subsets of the full set of forecasts as the optimal strategy, generally involving forecasts that are spaced out in time in some way. The models differ in terms of exactly which forecasts to choose because of their different cost models. These models illustrate that the choice of exactly which forecasts to use can be a subtle one that may vary among users and situations, and that can only be resolved given appropriate information about possible forecast changes. Once again, in general decision situations, it is typically not possible to quantify the costs and benefits, as we have done in these examples, but the forecast change metrics we have presented can form part of the body of information on which subjective decisions can be based and would again be expected to improve the decisions made.

The actual costs and benefits that occur as a result of using a forecast will vary among forecasts, and, in particular, more volatile forecasts are likely to lead to greater costs. This has been discussed in previous literature (Hamill 2003; Zsoter et al. 2009; McLay 2011; Fowler et al. 2015; Griffiths et al. 2019; Richardson et al. 2020). How exactly forecast volatility affects costs in the examples we have considered would vary among the examples, since volatility at different lead times would have different impacts in the different examples. It would be informative to extend our models and the examples to study this effect.

We have attempted to present information about forecast changes in ways that are readily understandable. However, there is undoubtedly potential for further improvements in how this information could be presented, especially for less technical users of forecasts. The challenge of finding better ways to present this information would need to be addressed alongside more general questions about how forecast information, such as forecast uncertainty and forecast probabilities, might be presented for such users.

We have considered forecasts for temperature at a single location as a starting point for exploring the subject of how to communicate forecast changes. Further work might include investigating how similar types of information could be presented for other types of forecast. Precipitation forecasts are different in character and would require a different set of exhibits. For instance, precipitation forecasts are often presented as probabilities, and metrics related to forecast changes would need to be presented as possible sizes of changes in the forecast probabilities. Metrics related to changes in forecast skill would need to use measures of skill that apply to probability forecasts. Tropical cyclone forecasts are different again and are presented both as track and intensity, as well as maps of local variables (typically wind, rainfall, and surge depth). As a result, metrics of change could be calculated for changes in tracks, intensities, and local variables. Understanding possible changes in tropical cyclone forecasts can help with the decision of whether to evacuate or wait for the next forecast. Finally, presenting information about possible changes in climate projections would be useful for those making long-term decisions based on such projections, but poses some different challenges.

In conclusion, we have explored the question of how forecast change information might be presented to forecast users, to help them make better decisions in the various situations in which using forecast change information can be beneficial. The metrics we have presented give forecast users general information about the behavior and performance of a forecast, which can feed into subjective decisions that the forecast user may need to make or can feed into objective algorithms that can automate decisions.

## Acknowledgments.

The authors thank the anonymous reviewers, who made suggestions that have improved the manuscript considerably. Gabriele Messori has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant 948309, CENÆ Project), and was partly supported by the Swedish Research Council Vetenskapsrådet Grant 2016-03724.

## Data availability statement.

The ECMWF forecast data can be obtained from ECMWF. The temperature observations for Stockholm can be obtained from the Swedish Meteorological and Hydrological Institute (SMHI).

## REFERENCES

Bellman, R., 1957: *Dynamic Programming.* Dover Publications, 366 pp.

Buizza, R., 2001: Accuracy and economic value of categorical and probabilistic forecasts of discrete events. *Mon. Wea. Rev.*, **129**, 2329–2345, https://doi.org/10.1175/1520-0493(2001)129<2329:AAPEVO>2.0.CO;2.

CNN, 2019: The New York City Triathlon has been canceled because of the heat. CNN, accessed 10 January 2022, https://edition.cnn.com/2019/07/19/us/heat-wave-new-york-triathlon-canceled-wxc/index.html.

Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. *Quart. J. Roy. Meteor. Soc.*, **137**, 553–597, https://doi.org/10.1002/qj.828.

Epstein, E., and A. Murphy, 1987: Use and value of multiple-period forecasts in a dynamic model of the cost–loss ratio situation. *Mon. Wea. Rev.*, **116**, 746–761, https://doi.org/10.1175/1520-0493(1988)116<0746:UAVOMP>2.0.CO;2.

Fowler, T., B. Brown, J. Gotway, and P. Kucera, 2015: Spare change: Evaluating revised forecasts. *Mausam*, **66**, 635–644.

Gebetsberger, M., J. Messner, G. Mayr, and A. Zeileis, 2018: Estimation methods for nonhomogeneous regression models: Minimum continuous ranked probability score versus maximum likelihood. *Mon. Wea. Rev.*, **146**, 4323–4338, https://doi.org/10.1175/MWR-D-17-0364.1.

Gneiting, T., A. Raftery, A. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. *Mon. Wea. Rev.*, **133**, 1098–1118, https://doi.org/10.1175/MWR2904.1.

Griffiths, D., M. Foley, I. Ioannou, and T. Leeuwenburg, 2019: Flip-flop index: Quantifying revision stability for fixed-event forecasts. *Meteor. Appl.*, **26**, 30–35, https://doi.org/10.1002/met.1732.

Hamill, T., 2003: Evaluating forecasters’ rules of thumb: A study of *d*(prog)/*dt*. *Wea. Forecasting*, **18**, 933–937, https://doi.org/10.1175/1520-0434(2003)018<0933:EFROTA>2.0.CO;2.

Jewson, S., and C. Ziehmann, 2004: Using ensemble forecasts to predict the size of forecast changes, with application to weather swap value at risk. *Atmos. Sci. Lett.*, **4**, 15–27, https://doi.org/10.1016/S1530-261X(03)00003-3.

Jewson, S., A. Brix, and C. Ziehmann, 2004: A new parametric model for the assessment and calibration of medium-range ensemble temperature forecasts. *Atmos. Sci. Lett.*, **5**, 96–102, https://doi.org/10.1002/asl.69.

Jewson, S., S. Scher, and G. Messori, 2021: Decide now or wait for the next forecast? Testing a decision framework using real forecasts and observations. *Mon. Wea. Rev.*, **149**, 1637–1650, https://doi.org/10.1175/MWR-D-20-0392.1.

Katz, R., 1993: Dynamic cost–loss ratio decision making model with an autocorrelated climate variable. *J. Climate*, **6**, 151–160, https://doi.org/10.1175/1520-0442(1993)006<0151:DCLRDM>2.0.CO;2.

Katz, R., and A. Murphy, 1997: *Economic Value of Weather and Climate Forecasts.* Cambridge University Press, 240 pp.

Katz, R., A. Murphy, and R. Winkler, 1982: Assessing the value of frost forecasts to orchardists: A dynamic decision-making approach. *J. Appl. Meteor.*, **21**, 518–531, https://doi.org/10.1175/1520-0450(1982)021<0518:ATVOFF>2.0.CO;2.

Kernan, G., 1975: The cost–loss decision model and air pollution forecasting. *J. Appl. Meteor.*, **14**, 8–16, https://doi.org/10.1175/1520-0450(1975)014<0008:TCLDMA>2.0.CO;2.

Lalić, B., A. Firany Sremac, L. Dekic, and J. Eitzinger, 2017: Seasonal forecasting of green water components and crop yields of winter wheat in Serbia and Austria. *J. Agric. Sci.*, **156**, 645–657, https://doi.org/10.1017/S0021859617000788.

McLay, J., 2008: Markov chain modelling of sequences of lagged NWP ensemble probability forecasts: An exploration of model properties and decision support applications. *Mon. Wea. Rev.*, **136**, 3655–3670, https://doi.org/10.1175/2008MWR2376.1.

McLay, J., 2011: Diagnosing the relative impact of sneaks, phantoms and volatility in sequences of lagged ensemble probability forecasts with a simple dynamic decision model. *Mon. Wea. Rev.*, **139**, 387–402, https://doi.org/10.1175/2010MWR3449.1.

Molinder, J., H. Kornich, E. Olsson, H. Bergstrom, and A. Sjoblom, 2018: Probabilistic forecasting of wind power production losses in cold climates. *Wind Energy Sci.*, **3**, 667–680, https://doi.org/10.5194/wes-3-667-2018.

Murphy, A., 1969: On expected-utility measures in cost–loss ratio decision situations. *J. Appl. Meteor.*, **8**, 989–991, https://doi.org/10.1175/1520-0450(1969)008<0989:OEUMIC>2.0.CO;2.

Murphy, A., and Q. Ye, 1990: Optimal decision-making and the value of information in a time-dependent version of the cost–loss ratio situation. *Mon. Wea. Rev.*, **118**, 939–949, https://doi.org/10.1175/1520-0493(1990)118<0939:ODMATV>2.0.CO;2.

Murphy, A., R. Katz, R. Winkler, and W. Hsu, 1985: Repetitive decision making and the value of forecasts in the cost–loss ratio situation: A dynamic model. *Mon. Wea. Rev.*, **113**, 801–813, https://doi.org/10.1175/1520-0493(1985)113<0801:RDMATV>2.0.CO;2.

Regnier, E., and P. Harr, 2006: A dynamic decision model applied to hurricane landfall. *Wea. Forecasting*, **21**, 764–780, https://doi.org/10.1175/WAF958.1.

Richardson, D. S., H. L. Cloke, and F. Pappenberger, 2020: Evaluation of the consistency of ECMWF ensemble forecasts. *Geophys. Res. Lett.*, **47**, e2020GL087934, https://doi.org/10.1029/2020GL087934.

Roulin, E., 2007: Skill and relative economic value of medium-range hydrological ensemble predictions. *Hydrol. Earth Syst. Sci.*, **11**, 725–737, https://doi.org/10.5194/hess-11-725-2007.

Scheuerer, M., 2013: Probabilistic quantitative precipitation forecasting using Ensemble Model Output Statistics. *Quart. J. Roy. Meteor. Soc.*, **140**, 1086–1096, https://doi.org/10.1002/qj.2183.

Skoglund, L., J. Kuttenkeuler, and A. Rosen, 2015: A comparative study of deterministic and ensemble weather forecasts for weather routing. *J. Mar. Sci. Technol.*, **20**, 429–441, https://doi.org/10.1007/s00773-014-0295-9.

Thorarinsdottir, T., and M. Johnson, 2012: Probabilistic wind gust forecasting using nonhomogeneous Gaussian regression. *Mon. Wea. Rev.*, **140**, 889–897, https://doi.org/10.1175/MWR-D-11-00075.1.

Wilks, D. S., 1991: Representing serial correlation of meteorological events and forecasts in dynamic decision-analytic methods. *Mon. Wea. Rev.*, **119**, 1640–1662, https://doi.org/10.1175/1520-0493(1991)119<1640:RSCOME>2.0.CO;2.

Wilks, D. S., 2006: Comparison of ensemble-MOS methods in the Lorenz ’96 setting. *Meteor. Appl.*, **13**, 243–256, https://doi.org/10.1017/S1350482706002192.

Wilks, D. S., and D. W. Wolfe, 1998: Optimal use and economic value of weather forecasts for lettuce irrigation in a humid climate. *Agric. For. Meteor.*, **89**, 115–129, https://doi.org/10.1016/S0168-1923(97)00066-X.

Wilks, D. S., R. E. Pitt, and G. W. Fick, 1993: Modeling optimal alfalfa harvest scheduling using short-range weather forecasts. *Agric. Syst.*, **42**, 277–305, https://doi.org/10.1016/0308-521X(93)90059-B.

Zsoter, E., R. Buizza, and D. Richardson, 2009: “Jumpiness” of the ECMWF and Met Office EPS control and ensemble-mean forecasts. *Mon. Wea. Rev.*, **137**, 3823–3836, https://doi.org/10.1175/2009MWR2960.1.