1. Introduction
In The Critique of Pure Reason, Immanuel Kant wrote that “The usual touchstone, whether that which someone asserts is merely his persuasion—or at least his subjective conviction, that is, his firm belief—is betting.” The mathematical theory of probability was born out of gambling when, in the mid-seventeenth century, Blaise Pascal and Pierre de Fermat set out to explain the consistent losses of the Chevalier de Méré in a popular dice game. While recreational gambling is frowned upon by some, regulated in many places, and banned in others, examples involving tossed coins, dice, and lotteries are a staple of probability and statistics classes. In recent decades, economists have taken a more than recreational interest in the ability of betting to elicit and aggregate information and have advocated the use of “information markets” or “prediction markets” as a way of synthesizing disparate sources of information and expertise using the mechanics of betting (Arrow et al. 2008; Abramowicz 2008).
The challenge of combining multiple sources of information, different modeling approaches, and human knowledge is common to seasonal and climate forecasting. While the use of prediction markets for climate forecasting has been suggested by some climate scientists, legal scholars, and economists, there are few examples of their use (Hsu 2011; Vandenbergh et al. 2013; Nay et al. 2016; Lucas and Mormann 2018; Aliakbari and McKitrick 2018; Roulston et al. 2022). This is partly because of regulatory obstacles, due to their similarity with recreational gambling, and because switching to a system that directly rewards forecasters for accuracy would represent a fundamental change for many funders of seasonal and climate forecasting.
In this article, we will introduce prediction markets and explain how they can solve some pressing problems faced by the users of climate forecasts. We will also describe a suite of climate-related prediction markets, with expert participants, that we have run with horizons up to one year ahead. These have demonstrated the ability of relevant experts to engage with prediction markets and have allowed us to make a preliminary evaluation of whether the collective probability forecasts the markets generate are probabilistically calibrated, or reliable, in the terminology of meteorology.
2. How do prediction markets work?
Most prediction markets are based on conditional contracts that economists call “Arrow–Debreu” securities. These are simply contracts that pay out a fixed amount (often $1.00) if the event specified in the contract occurs. For example, a contract might pay out $1.00 if it snows in New York on Christmas Day. Participants trade these contracts with each other and, if we assume the prevailing price equals the expected payout, Prob(snow) × $1.00, then the price can be interpreted as the probability that it will snow according to the collective wisdom of “the market.” At the point of buying a contract, the arrangement is no different from fixed-odds betting: a price of $0.25 would correspond to odds of 3:1 against. In a prediction market, however, participants have the option of selling and so do not have to hold the contract until expiration. Buying and selling by participants determines a prevailing price, and it is this that allows a prediction market to act as a mechanism for dynamically aggregating information.
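As a minimal illustration of this reading of prices (the `payout` parameter generalizes the $1.00 contract; the function names are illustrative):

```python
def implied_probability(price: float, payout: float = 1.00) -> float:
    """Read a contract price as the market-implied probability,
    assuming price = Prob(event) * payout."""
    return price / payout

def odds_against(p: float) -> float:
    """Convert a probability to fractional odds against: 0.25 -> 3.0, i.e., 3:1."""
    return (1.0 - p) / p

print(implied_probability(0.25))  # 0.25
print(odds_against(0.25))         # 3.0, i.e., odds of 3:1 against
```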
There are different mechanisms by which contracts are traded. Many prediction markets use a continuous double auction (CDA) in which participants can post the price at which they are prepared to buy a contract (a “bid”) or the price at which they are prepared to sell one (an “ask”). If the highest bid matches or exceeds the lowest ask, a trade occurs. CDAs work well if there are many participants with divergent views. If there are frequent trades, the price of the most recent one can be used as a proxy for the probability of the event occurring. CDAs work less well if there are fewer participants or if there is strong agreement about the fair value of the contract, in which case very few trades may occur. The market-based estimate of the probability can be assumed to lie between the highest bid and the lowest ask, but sometimes this “bid–ask spread” can be large, making estimation of the probability difficult.
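A minimal sketch of this matching rule for a single contract, ignoring order sizes, partial fills, and time priority, which real exchanges handle:

```python
import heapq

class SimpleCDA:
    """Minimal continuous double auction for one binary contract (illustrative)."""

    def __init__(self):
        self.bids = []  # max-heap of bid prices (stored negated)
        self.asks = []  # min-heap of ask prices

    def post_bid(self, price: float):
        heapq.heappush(self.bids, -price)
        return self.match()

    def post_ask(self, price: float):
        heapq.heappush(self.asks, price)
        return self.match()

    def match(self):
        # A trade occurs when the highest bid meets or exceeds the lowest ask.
        if self.bids and self.asks and -self.bids[0] >= self.asks[0]:
            bid = -heapq.heappop(self.bids)
            ask = heapq.heappop(self.asks)
            return (bid + ask) / 2  # one convention for the trade price
        return None

    def spread(self):
        # With no recent trades, the implied probability lies in this interval.
        if self.bids and self.asks:
            return -self.bids[0], self.asks[0]
        return None
```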
An alternative to a CDA is an automated market maker (AMM) that will always quote a price at which it will buy or sell contracts. Many financial markets have market makers, participants prepared to both buy and sell, but these market makers seek to profit by buying at lower prices than they sell. In contrast, the AMM in a prediction market can be subsidized and designed to lose money in return for accurate information, a fundamentally different goal from that of traditional market makers or bookmakers in gambling. AMMs adjust their pricing based on how many contracts for different outcomes they have already sold and can be based on “proper scoring rules” for evaluating probability forecasts, such as the logarithmic or Brier scores (Hanson 2003). Under proper scoring rules, which are a standard way of verifying probabilistic weather forecasts, participants maximize their expected reward by expressing their true beliefs about probabilities (Gneiting and Raftery 2007); proper scoring rules are “incentive compatible” in the jargon of economics. By always being prepared to take the other side of a bet, AMMs allow prediction markets to function with a small number of participants, or when participants largely agree. While AMMs can greatly improve the functioning of a prediction market, they require a subsidy: a sponsor must essentially provide a prize that will be shared among participants in proportion to the contribution they make to the accuracy of the collective forecast. It is not necessary for participants to understand the workings of the AMM; they merely need to decide whether the prices it quotes are above or below what they consider “fair value.”
The first academic prediction market, now known as the Iowa Electronic Markets (IEM), was established by the University of Iowa in the 1980s. The IEM focuses on predicting the outcome of elections (Berg and Rietz 2014). Participants stake real money, although the number of participants and the size of their stakes are restricted under an agreement with the Commodity Futures Trading Commission (CFTC). Forecasts produced by the IEM have been shown to be, on average, more accurate than conventional opinion polling (Berg et al. 2008a). It is likely, however, that participants take polls into consideration when placing their trades, which underlines the important point that prediction markets are not really an alternative mechanism for making forecasts but a mechanism for aggregating many forecasts into a unified prediction. This aggregation can include tacit expertise concerning the relative merits of different information sources. More recently, prediction markets with expert participants have been used to predict whether psychology experiments will replicate, and these markets were found to be more accurate than surveying the same experts (Dreber et al. 2015). The ability to let a wider range of participants contribute is an oft-cited advantage of prediction markets, but results such as those of Dreber et al. (2015) suggest that, even for a given group of experts, prediction markets are an advantageous way of combining their views.
Prediction markets for climate are not a new idea. Mark Boslough created a pilot market in 2011 for the global mean temperature anomaly in the following year on the now-defunct commercial platform Intrade (Boslough 2011). This market consisted of a strip of contracts for whether the temperature anomaly would exceed a range of thresholds from +0.30° to +1.10°C at intervals of 0.05°C. By partitioning a continuous quantity, like the temperature anomaly, into discrete intervals, a prediction market using binary contracts can be used to generate an implied probability distribution for the variable. Unfortunately, Intrade used a CDA and the market suffered from the problem of low activity and very large bid–ask spreads discussed above. This problem was exacerbated by the number of different contracts available. These issues illustrate the advantages of using an AMM when markets are for more specialized topics that struggle to attract mass interest in the way that sports betting does.
3. What problems can prediction markets for climate forecasting solve?
Prediction markets offer a solution to a couple of significant problems that affect climate forecasting: how to aggregate many sources of information and how to align the incentives of forecasters with those of forecast users. In addition, more sophisticated prediction markets can address the interdependence of forecasts of greenhouse gas (GHG) concentrations and forecasts of future climate, a problem that has been called “circularity.”
a. Aggregation.
Climate forecasting is a multidisciplinary activity. Predicting future climate given the future concentration of GHGs involves atmospheric science, oceanography, and other physical sciences. Much of this expertise is codified in coupled atmosphere–ocean general circulation models (CGCMs), but there is also knowledge not included within the models, as demonstrated by differences in simulations from different CGCMs. Furthermore, unconditional forecasts of future climate require future GHG concentrations to be predicted as well, an endeavor that calls for insight into economics, public policy, the quality of institutions, and the likely path of technological innovation (Moore et al. 2022; Venmans and Carr 2024). As well as the diversity of disciplines, there is a diversity of modeling approaches: CGCMs are augmented with different ways to downscale predictions to produce more localized forecasts, including regional climate models, statistical methods, and more recently AI-based techniques. Combining disparate information to produce forecasts relevant to decision-makers is a task for which prediction markets are well suited.
b. Aligning incentives.
Weather forecasters, whose predictions are for a few days ahead, can be rigorously evaluated with only a few months of forecasts and their subsequent verifications. Evaluating seasonal forecasts, with prediction horizons of a few months, with similar statistical robustness requires several years of forecast–verification pairs. Some practitioners attempt to get around this constraint by evaluating “reforecasts,” which are forecasts made retrospectively for verification times in the past (e.g., Hamill et al. 2006; Weisheimer and Palmer 2014; Risbey et al. 2022). While reforecasts should only use information that would have been available at the point when the forecast was made, in practice reforecasting is susceptible to model overfitting and model selection biases, leading to exaggerated estimates of forecast skill. In finance, this type of retrospective forecasting is called “backtesting” and, because of selection biases, its results are treated with extreme caution (Bailey et al. 2014). In climate forecasting, the phrase “artificial skill” describes the inflation of skill estimates that occurs when information that would not have been available when the forecast was made is indirectly incorporated into the forecast, for example, through the screening of predictor variables (DelSole and Shukla 2009) or via bias correction or the definition of climatology (Risbey et al. 2021). The most stringent forecast evaluations use truly out-of-sample predictions, in which the forecast was issued before the verification time. This makes accumulating a sufficiently large dataset difficult for seasonal forecasts and usually impossible for climate forecasts with horizons of years to decades. It is a major problem for users of these forecasts, who cannot know how good a forecaster is and have no robust way of evaluating forecasters before making use of their predictions. Economists call this “information asymmetry” and have explained how it can cause the breakdown of markets when buyers are not prepared to pay for quality they cannot verify and sellers are unwilling to invest in quality they cannot demonstrate (Akerlof 1970). Under these conditions, it is more rational for forecast providers to focus on presentation and the user-friendliness of their portals than on the accuracy of their forecasts. A common solution to the problem of asymmetric information is to make sellers’ compensation contingent on quality, for example, by providing warranties (Grossman 1981). Prediction markets do this by rewarding forecasters based on the accuracy of their predictions.
c. Circularity.
If a prediction market for the future global temperature anomaly implies only a moderate rise in temperatures over the next few decades, this could be because market participants believe that the sensitivity of climate to GHGs is low, or it could be because they expect effective action to reduce GHGs. If these two scenarios cannot be differentiated, the prediction is of limited use to policy makers trying to mitigate climate change (although it is still potentially useful for climate adaptation). This interdependence of policy and prediction is sometimes referred to as circularity in the analogous context of interest rate setting and inflation forecasting (Bernanke and Woodford 1997; Sumner and Jackson 2008). It can be addressed using more complicated prediction markets, which, for example, allow for joint predictions of future GHG concentrations and global temperature anomalies. Such a market generates a two-dimensional distribution of prices that can be interpreted as the joint probability distribution for GHG concentrations and temperature anomaly. Probability distributions for temperature, conditional on a particular GHG concentration, can then be extracted, allowing the low-sensitivity scenario to be distinguished from the low-GHG-concentration scenario.
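As an illustration of that extraction, the sketch below uses a randomly generated, purely hypothetical joint price grid; the conditional distribution is just one row of the grid, renormalized.

```python
import numpy as np

# Hypothetical joint price grid: rows index GHG-concentration bins,
# columns index temperature-anomaly bins. Prices sum to one across the grid.
rng = np.random.default_rng(0)
joint_prices = rng.random((10, 20))
joint_prices /= joint_prices.sum()

# Marginal (unconditional) temperature distribution: sum over GHG bins.
p_temp = joint_prices.sum(axis=0)

# Temperature distribution conditional on GHG concentration falling in bin g,
# distinguishing low-sensitivity from low-concentration explanations.
g = 3
p_temp_given_g = joint_prices[g, :] / joint_prices[g, :].sum()
```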
The distinguishing feature of prediction markets is that their primary and often only purpose is “information discovery.” Traditional financial markets for stocks and futures also perform information discovery, but they do this as a side effect of their primary purpose, which is the transfer of assets or risks. For example, there are futures contracts called weather derivatives whose payout depends on weather-related variables, such as the average monthly temperature in a specified city (Zeng 2000; Dutton 2002; Jewson and Brix 2005). These contracts were invented to allow firms, such as power companies, to hedge their weather-related risks. The prices of these derivatives provide indirect predictions of the weather variables to some extent, but they focus on specific cities and are only actively traded up to a year or so ahead, so they provide little information about long-range climate change. Researchers have also studied real estate prices and found that homes more exposed to sea level rise can sell at discounts compared to similar but less exposed properties (Bernstein et al. 2019), although these discounts may still be underpricing the risk (Gourevitch et al. 2023).
Many of the most significant climate risks will ultimately fall to governments who may be unable or unwilling to transfer them but who could still benefit from market-based predictions of the risks. Prediction markets can decouple the ability of markets to aggregate information from their role in the transfer of risk.
4. Demonstration prediction markets for climate-related risks
Since 2018, we have run two dozen individual prediction markets for climate-related risks covering four topics: U.K. monthly temperatures and rainfall, the Niño-3.4 sea surface temperature anomaly, Atlantic hurricane activity, and U.K. wheat yield. The primary goal of these markets was to test and refine the design of prediction markets for climate-related applications and to familiarize relevant experts with the concept. However, this collection of proof-of-concept markets also constitutes an “experiment-of-opportunity” that allows us to perform a preliminary evaluation of the collective forecasts produced by prediction markets.
None of the markets were “pay-to-play.” Instead, teams and individuals with relevant expertise were invited to participate and endowed with credits with which to trade contracts. After the actual outcomes were known, and the markets were settled, the credits that participants had accumulated were converted into cash rewards provided by the market sponsor. This arrangement avoided falling foul of laws regulating online betting. The exact mechanism used for converting on-platform credits to cash varied and was influenced by the sponsor and a desire to test different incentive schemes.
The prediction horizons of the demonstration markets ranged from a couple of months to 1 year ahead. Even the longest horizons were significantly shorter than those relevant to long-range climate prediction but, because the markets were proofs-of-concept, the horizons represented a compromise between using time scales relevant to climate and being able to collect enough forecast–verification pairs to allow statistically meaningful analyses.
All the markets were hosted on versions of the AGORA prediction market platform developed by Winton Group and Hivemind Technologies Ltd. (Roulston et al. 2016). On this platform, each market has a set of outcomes defined so that one outcome, and only one, will occur. AGORA allows participants to define contracts covering one or more of these outcomes, which they trade via an AMM. The AMM quotes a price (between 0.00 and 1.00) determined by how many of the outcomes it has already sold, using an algorithm driven by the “logarithmic market scoring rule” (LMSR). Once the actual outcome is known, any contract including that outcome converts to 1.00 credit while all other contracts become worthless. If a participant believes an outcome is undervalued, that is, its price is less than the probability of it occurring, then they can buy it, and the AMM responds by raising its price and lowering the prices of the other outcomes so that prices across all outcomes always sum to one. If a participant believes an outcome is overvalued then, because the outcomes cover every eventuality and their prices always sum to one, there must be other outcomes they believe are undervalued, which they can buy instead. The details of Hanson’s (2003) LMSR-AMM are given in the appendix, but its key feature is that it ultimately rewards participants according to the logarithmic scoring rule, which is a proper scoring rule for probability forecasts (Good 1952).
The 24 markets fell into four groups.
a. U.K. monthly temperature and rainfall.
In 2018, six markets were run by Winton to simultaneously predict the monthly average of the maximum daily temperature for the United Kingdom and the total monthly rainfall—statistics published by the Met Office—for April–September. All the markets opened for trading in March. The outcome space of each market was a two-dimensional grid with temperature partitioned into intervals of 0.2°C, ranging from 0° to 25°C, and rainfall partitioned into intervals of 5 mm, from 0 to 200 mm. Open intervals covered temperatures below 0°C and above 25°C and rainfall above 200 mm. With 127 temperature intervals and 41 rainfall intervals, there were 5207 joint outcomes. Figure 1 shows a snapshot of prices in the July market on a day in June along with the 5207 distinct outcomes. The purpose of these markets was to test the viability of joint-outcome markets with two-dimensional outcome spaces and a very large number of distinct outcomes. A market with this structure could be used to make joint predictions of carbon dioxide concentration and global temperature anomalies for future years, overcoming the problem of circularity.
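The outcome-space arithmetic can be checked directly; a short sketch of the partitioning described above:

```python
import numpy as np

# 125 closed 0.2°C temperature bins on [0, 25] plus two open ends = 127 bins;
# 40 closed 5 mm rainfall bins on [0, 200] plus one open end = 41 bins.
temp_edges = np.concatenate(([-np.inf], np.linspace(0.0, 25.0, 126), [np.inf]))
rain_edges = np.concatenate((np.linspace(0.0, 200.0, 41), [np.inf]))
n_temp, n_rain = len(temp_edges) - 1, len(rain_edges) - 1
print(n_temp, n_rain, n_temp * n_rain)  # 127 41 5207
```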
Fig. 1. A snapshot of prices on 28 Jun 2018 in the prediction market to jointly predict the U.K. monthly temperature and rainfall for July 2018. The grid shows the partitioning of the outcome space into 5207 distinct outcomes. The shading signifies the price of each of these outcomes. These prices can be interpreted as the probabilities of the outcomes. Similar joint markets for GHG concentrations and global temperature anomalies could be used to produce unconditional predictions of temperature and predictions conditioned on the GHG concentration.
Two dozen teams from atmospheric science, statistics, engineering, and economics departments in British universities took part. They were endowed with on-platform credits with which to trade in any of the six markets. After the September market had been settled, the 10 teams that had accumulated the most credits received cash rewards from Winton: the first-placed team received 10 000 GBP, the second-placed team 9000 GBP, and so on down to the 10th-placed team, which received 1000 GBP. This “tournament” scheme for distributing prizes was chosen, out of regulatory caution, instead of simply paying out prizes in proportion to accumulated credits. A potential drawback of tournament payoffs, however, is that they can encourage excessive risk taking by participants trying to improve their ranking (Witkowski et al. 2023). The large number of prizes was intended to mitigate this problem.
b. El Niño–Southern Oscillation.
During 2019 and 2020, the company Hivemind ran markets to predict the monthly averaged sea surface temperature anomaly (SSTA) in the Niño-3.4 region of the central Pacific (5°N–5°S, 170°–120°W), a key indicator of El Niño–Southern Oscillation (ENSO).
A wide range of approaches is used for predicting ENSO (Tang et al. 2018). Some groups use models that simulate the dynamics of the ocean and atmosphere, while other researchers use statistical models and machine learning. This diversity of forecasting methods makes ENSO prediction an interesting candidate for a prediction market.
Nine markets to predict the monthly averaged Niño-3.4 SSTA were run for the months of July 2019–March 2020. All the markets opened for trading in April 2019, and each one closed on the last day of the month it covered. The outcome space was partitioned into intervals of 0.1°C ranging from −4° to +4°C, with open intervals at either end, giving a total of 41 outcomes.
Eighteen researchers with relevant expertise from the United Kingdom, Europe, and the United States took part, and each was endowed with on-platform credits for each of the monthly markets. In contrast to the markets for U.K. temperature and rainfall, credits were not transferable between markets. After each monthly market was settled, the credits accumulated by participants were converted to lottery tickets, and each month’s lottery winner received 500 GBP. The rationale for converting credits to lottery tickets is that participants should want to maximize their expected number of tickets irrespective of their own risk preferences. Since, under the LMSR, the number of lottery tickets awarded is proportional to a proper score, participants should trade strictly according to their true beliefs about the outcome probabilities. This approach is sometimes used in experimental economics to induce subjects to behave in a risk-neutral way (Berg et al. 2008b).
ENSO is the largest source of interannual variability in the global climate, so predictions of ENSO are of wide interest. However, the 9-month period covered by the ENSO prediction markets during 2019 and 2020 did not contain any El Niño or La Niña events, and consecutive months are correlated, so it is not possible to draw statistically meaningful conclusions about the performance of the ENSO predictions in isolation.
c. Atlantic hurricanes.
The Atlantic hurricane season runs from June to November. The number of hurricanes that occur each year is highly variable: There were no recorded hurricanes in 1907 or 1914 but 15 in 2005, and the average number is about 6.
Identifying trends in Atlantic hurricane activity is difficult due to heterogeneity in the observational record (Vecchi et al. 2021), and predicting whether climate change will affect hurricane activity is also difficult because general circulation models do not have sufficient resolution to simulate hurricanes directly. Instead, global simulations must be downscaled using either models that simulate the climate at a higher resolution over a smaller region (e.g., Knutson et al. 2022) or statistical methods that make assumptions about the relationship between larger-scale features and hurricane activity (e.g., Vecchi et al. 2008).
During the 2020 Atlantic hurricane season, two prediction markets were run: one for the total number of hurricanes and one for the number making U.S. landfall. The markets were sponsored by Lloyd’s of London as part of Hivemind’s participation in its Lloyd’s Lab insurtech accelerator. Both markets had an outcome space of 21 outcomes: no hurricanes, 1 hurricane, 2 hurricanes, and so on, up to “20 or more” hurricanes. Due to the timing of the accelerator, the markets opened in August 2020, more than 2 months into the season and after two hurricanes had already occurred. The markets closed on 30 November, by which time 13 storms had been classed as hurricanes and 6 of these had made landfall in the United States.
Twenty individuals from academic institutions and commercial forecasting and reinsurance firms took part. They were endowed with on-platform credits that could be deployed in either market. After settlement, the three individuals in each market who had accumulated the most credits received 1200, 800, and 500 GBP for first, second, and third places, respectively.
Figure 2 shows a snapshot of prices in the hurricane market on 28 August 2020 and is typical of a market that generates a one-dimensional probability distribution.
Fig. 2. A snapshot of prices in the prediction market for the number of Atlantic hurricanes during the 2020 season on 28 Aug 2020. These prices can be interpreted as an implied probability distribution.
d. U.K. wheat yields.
The YIELD21 market to predict the U.K. average wheat yield for the 2020/21 growing season was run by Hivemind, sponsored by Agrimetrics, a supplier of agricultural data, in partnership with Weather Logistics, a seasonal forecast provider.
Wheat is the United Kingdom’s largest crop. It is winter wheat: planting begins in October, and harvesting starts the following July. Between 2016/17 and 2022/23, yields ranged from 7.0 to 8.7 t ha−1 (tonnes per hectare), averaging 7.9 t ha−1. The outcome space of the market consisted of 20 intervals of 0.2 t ha−1 between 6 and 10 t ha−1, with two additional outcomes covering values below 6 t ha−1 and above 10 t ha−1.
Sixteen participants from academia and crop research organizations took part. The market opened on 1 February 2021 and closed on 30 September 2021. The market was settled when DEFRA published the U.K. wheat yield of 7.8 t ha−1 in December 2021. After the market was settled, the three participants with the most credits received monetary rewards from Agrimetrics.
Table 1 provides a summary of the 24 markets, including the number of separate trades made by participants over the duration of each market. Some of the markets had thousands of trades; in these, many participants used the platform’s API to place automated trades informed by their own models of the probability distribution. Figure 3 shows how the probability distributions implied by the prices in each market evolved over its duration. The distributions converge on the actual outcomes as the prediction horizon shrinks, indicating that the markets were effectively incorporating new information up to the point each market was closed for trading.
Table 1. The markets included in the analyses. The joint U.K. rainfall–temperature markets had two-dimensional outcome spaces, but the marginal distributions for rainfall and temperature were treated as separate markets for the analysis.
Fig. 3. The evolution of the price distributions throughout each of the 24 markets. The black line represents the median of the implied probability distributions, while the gray envelopes represent the 50% and 90% intervals. The green line is the outcome that was ultimately observed.
5. Evaluation of prediction market-based forecasts
Although the markets described were not conceived as a verification experiment, we can still use the data they produced to evaluate some aspects of their performance. Two key attributes that are desirable in probabilistic forecasts are reliability and resolution. Reliability, referred to by statisticians as “probabilistic calibration,” means that the frequency of actual outcomes matches their predicted probabilities. For example, if you collect all the forecasts which say there is an 80% chance of rain, it should have subsequently rained following 80% of those forecasts. This property generalizes to forecasts of continuous quantities: for a well-calibrated forecast distribution, the actual observation should fall below the 10th percentile 10% of the time, below the 20th percentile 20% of the time, and so on. Given that probabilistic forecasts are reliable, we would also like them to have resolution. For binary forecasts, this means making many predictions close to 0% or 100%; for continuous quantities, it means that the forecast probability distributions should be narrow. Resolution is sometimes referred to as sharpness, but forecasts can be sharp yet inaccurate. The property of discrimination refers to the likelihood of a forecast predicting the actual outcome and is related to the information content of the forecast (Potts 2003).
The reliability of ensemble forecasts is typically evaluated using rank histograms (Hamill 2001), and the reliability of probability forecasts for binary events can be tested by plotting reliability curves of the fraction of times the event occurs against the predicted probability (Toth et al. 2003). Because the prediction markets produce probability distributions of continuous quantities (or discrete but ordered outcomes in the case of hurricane numbers), it is more appropriate to use quantile–quantile (Q–Q) plots. In a Q–Q plot, the fraction of actual observations that fall below each percentile of the forecast distribution is plotted against that percentile. For a large number of reliable forecasts, this curve should lie along the diagonal, with X% of the actuals falling below the Xth percentile for any value of X. For a smaller number of forecasts, there will be deviations from the diagonal even if the forecasts are reliable. Whether these deviations are significant can be determined by simulating an ensemble of Q–Q curves for perfectly reliable forecasts by drawing synthetic actuals from the forecast probability distributions. If the true Q–Q curve lies within this ensemble, the forecasts are consistent with being reliable.
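A minimal sketch of this construction, assuming the probability integral transform (PIT) of each forecast, that is, the market-implied CDF evaluated at the verification, has been computed; for reliable forecasts of (approximately) continuous quantities, PIT values are uniform, so the envelope can be simulated from uniform draws (function names are illustrative):

```python
import numpy as np

def qq_curve(pit_values, percentiles):
    """Fraction of verifications falling at or below each forecast percentile."""
    return np.array([(pit_values <= q).mean() for q in percentiles])

def reliability_envelope(n_forecasts, percentiles, n_sim=10_000, level=0.95):
    """Pointwise envelope of Q-Q curves for perfectly reliable forecasts.
    Drawing synthetic actuals from the forecast distributions is equivalent,
    for continuous forecasts, to drawing uniform PIT values."""
    rng = np.random.default_rng(1)
    curves = np.array([qq_curve(rng.uniform(size=n_forecasts), percentiles)
                       for _ in range(n_sim)])
    lo = np.quantile(curves, (1.0 - level) / 2, axis=0)
    hi = np.quantile(curves, 1.0 - (1.0 - level) / 2, axis=0)
    return lo, hi

percentiles = np.linspace(0.05, 0.95, 19)
lo, hi = reliability_envelope(n_forecasts=24, percentiles=percentiles)
# Compare qq_curve(actual_pits, percentiles) against the band [lo, hi].
```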
Figure 4 shows the Q–Q curves for the prediction market forecasts for prediction horizons of 120, 90, and 60 days. The gray envelopes in these plots represent an ensemble of Q–Q curves for perfectly reliable forecasts. The Q–Q curves for the true actuals fall within the envelopes, indicating that the forecasts produced by the prediction markets cannot be distinguished from perfectly reliable forecasts given the sample size.
Fig. 4. Reliability Q–Q curves for the forecasts generated by prediction markets. The panels show the reliability of the forecasts at lead times of 120, 90, and 60 days. The gray envelopes represent reliability curves for perfectly reliable forecasts constructed by drawing synthetic verifications from the probability distributions implied by market prices; 95% of the reliability curves for these perfect forecasts lie within the envelope.
To assess the relative information content of the market-based predictions, a benchmark climatology was used for each forecast. For the predictions of U.K. monthly temperatures and rainfall, the historical values for the relevant month between 1900 and 2017 were used. For the Niño-3.4 SSTA, the relevant monthly values since 1950 were used. The hurricane climatologies were based on annual numbers of hurricanes since 1920, while for the U.K. wheat yield, reported yields between 1990 and 2019 were used. In all cases, a probability distribution in the outcome space of the market was constructed from the historical samples using kernel density estimation. Some of the predicted quantities show trends, either due to climate change or possible changes to observing technologies. No attempt was made to adjust for these when estimating the climatological distributions.
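A sketch of how such a climatological benchmark could be built, assuming a one-dimensional historical sample and the market’s bin edges (the function name, the stand-in data, and the wheat-yield edges shown are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

def climatology_bin_probs(history, bin_edges):
    """Climatological probability for each market outcome bin, estimated
    from a historical sample by Gaussian kernel density estimation."""
    kde = gaussian_kde(history)
    probs = np.array([kde.integrate_box_1d(lo, hi)
                      for lo, hi in zip(bin_edges[:-1], bin_edges[1:])])
    return probs / probs.sum()  # renormalize over the whole outcome space

# Illustrative only: mapping 30 years of synthetic "yields" onto the
# 22 wheat-market bins (20 closed 0.2 t/ha bins plus two open ends).
edges = np.concatenate(([-np.inf], np.linspace(6.0, 10.0, 21), [np.inf]))
yields = np.random.default_rng(3).normal(7.9, 0.5, size=30)  # stand-in data
print(climatology_bin_probs(yields, edges).round(3))
```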
Figure 5 shows how the estimated mean information content of the market-based forecasts changes with the prediction horizon, measured as the number of days until the market closed for trading. The error on this mean was estimated by resampling the categories of the forecasts: U.K. temperature, U.K. rainfall, Niño-3.4 SSTA, hurricane numbers, and U.K. wheat yield. This allows for potential correlations of the forecast errors within these categories. The plot indicates that, even at the longest horizons, the prediction market forecasts are more informative than the climatologies used as benchmarks. As mentioned, these climatologies were not adjusted for trends, so the improvement provided by the prediction markets could partly reflect participants adjusting the climatologies to take trends into account. At shorter horizons, there is a marked increase in the informativeness of the prediction market forecasts, which is to be expected. The U.K. temperature and rainfall markets and the Niño-3.4 SSTA markets were predicting monthly statistics and allowed trading up to the end of the month in question, so when the prediction horizon falls below 30 days, these markets can start incorporating observations into the prediction of what the final temperature average or total rainfall will be. The increase begins as early as 90 days before market close, however, demonstrating that the market participants were using forecasts to inform their trading.
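A plausible sketch of this calculation, assuming the relative information of Eq. (1) takes the ignorance-score form of Roulston and Smith (2002), together with the category-level bootstrap used for the standard error (both functions are illustrative assumptions, not the article’s exact code):

```python
import numpy as np

def relative_information(p_market, p_clim, outcome_idx):
    """Assumed form of Eq. (1): difference in logarithmic (ignorance) score,
    in bits, between the market and climatology at the realized outcome."""
    return np.log2(p_market[outcome_idx]) - np.log2(p_clim[outcome_idx])

def bootstrap_se_by_category(values, categories, n_boot=10_000):
    """Standard error of the mean relative information, resampling whole
    forecast categories (numpy arrays) to allow for correlated errors
    within a category."""
    rng = np.random.default_rng(2)
    cats = np.unique(categories)
    means = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(cats, size=len(cats), replace=True)
        vals = np.concatenate([values[categories == c] for c in sample])
        means[i] = vals.mean()
    return means.std()
```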
Fig. 5. The relative information [given by Eq. (1)] of the probability forecasts generated by the prediction markets as a function of days until market close. The forecasts were benchmarked against a climatological distribution estimated from historical observations of the variable being predicted. The mean relative information of all the forecasts is also shown, along with an estimate of the standard error. The standard error was estimated by bootstrap resampling of the forecast categories (U.K. temperature, U.K. rainfall, Niño-3.4 SSTA, hurricanes, and U.K. wheat yield).
6. Discussion and conclusions
Although the markets described in this article were not conceived as a single study but as separate proofs-of-concept, with different sponsoring organizations, they do demonstrate several important features:
1) That participants with expertise relevant to weather and climate prediction can engage with prediction markets that are considerably more complex than common binary markets, including joint-outcome markets for simultaneously predicting two quantities. Such markets are necessary for addressing the circularity problem created by the interdependence of GHG concentrations and climate forecasts.
2) That participants can be incentivized with schemes that do not require them to pay to take part, circumventing regulatory constraints or outright prohibitions on gambling.
3) That prediction markets for climate-related predictands can produce probability forecasts that are consistent with being reliable (probabilistically calibrated).
4) That climate-related prediction markets can successfully incorporate new information as it becomes available to participants.
None of the demonstration markets described in this article had prediction horizons beyond 1 year. Whether the reliability seen in these markets would persist at much longer horizons can only be determined by running markets with horizons of multiple years, more pertinent to long-range climate forecasting. Running prediction markets with multiyear horizons would introduce governance issues that would need to be addressed: participants would need to be confident that the institution running the market will still exist for the market’s duration. There are, however, numerous financial instruments, such as bonds and pensions, that involve multidecadal commitments, so this is not a fundamental impediment to running long-range prediction markets.
Subsidized prediction markets for climate forecasting, with expert participants, could be adopted by public or private sector organizations that want to solve the problem of asymmetric information that is inherent in long-range climate prediction and, to a lesser extent, seasonal forecasting. Such markets would also allow organizations to reduce the risks associated with relying on a single provider of forecasts. Prediction markets would also be an effective way to integrate emerging forecasting techniques, such as AI, with more established methods. Adopting prediction markets for climate forecasting will require something of a cultural shift by both climate forecasters and funding organizations. Our experience suggests that climate forecasters can quickly become proficient in using prediction markets, but it remains to be seen whether funders and users of climate forecasts can become comfortable with market-based predictions.
Acknowledgments.
The authors thank the participants who attended a workshop on climate prediction market design hosted by Winton in London in 2016. They would also like to thank Winton, Hivemind, Lloyd’s Lab, Agrimetrics, and Weather Logistics for supporting markets in this study and all the institutions and individuals who participated in the markets. Feedback from reviewers greatly improved the quality of this paper. Climate Risk and Uncertainty Collective Intelligence Aggregation Laboratory (CRUCIAL) is supported by the SCOR Foundation for Science.
Data availability statement.
Analysis code and anonymized data are available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PG3MPA.
APPENDIX: The LMSR Market Maker
The logarithmic market scoring rule (LMSR), proposed by Hanson (2003, 2007), is a subsidized automated market maker. Prediction markets using the LMSR can support large numbers of possible outcomes without suffering from low liquidity. If $q_i$ is the number of contracts on outcome $i$ that the market maker has sold, its cost function is

$$C(\mathbf{q}) = b \ln \left( \sum_{j=1}^{N} e^{q_j/b} \right),$$

and a participant whose trade moves the market maker’s position from $\mathbf{q}$ to $\mathbf{q}'$ is charged $C(\mathbf{q}') - C(\mathbf{q})$. The price quoted for outcome $i$ is

$$p_i = \frac{e^{q_i/b}}{\sum_{j=1}^{N} e^{q_j/b}}.$$

Notice that these prices are normalized so that $\sum_{i=1}^{N} p_i = 1$. The liquidity parameter $b$ controls how quickly prices move in response to trading and caps the sponsor’s subsidy, the market maker’s worst-case loss, at $b \ln N$. The LMSR market maker rewards participants linearly in the logarithmic scoring rule. The logarithmic scoring rule is a strictly proper scoring rule that incentivizes participants to reveal their true beliefs about the probabilities of each outcome.
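A compact sketch of this market maker, consistent with the equations above (the class and parameter names are illustrative, not the AGORA implementation):

```python
import numpy as np

class LMSRMarketMaker:
    """Sketch of Hanson's (2003) LMSR automated market maker.
    Cost function C(q) = b * ln(sum_j exp(q_j / b)); the price of outcome i
    is the softmax exp(q_i/b) / sum_j exp(q_j/b), so prices sum to one.
    The liquidity parameter b bounds the worst-case loss at b * ln(N)."""

    def __init__(self, n_outcomes: int, b: float):
        self.q = np.zeros(n_outcomes)  # contracts sold per outcome
        self.b = b

    def cost(self, q=None) -> float:
        q = self.q if q is None else q
        return self.b * np.log(np.sum(np.exp(q / self.b)))

    def prices(self) -> np.ndarray:
        z = np.exp(self.q / self.b)
        return z / z.sum()

    def trade(self, outcome: int, n_contracts: float) -> float:
        """Buy (or, if negative, sell) contracts on one outcome.
        The charge is the change in the cost function."""
        q_new = self.q.copy()
        q_new[outcome] += n_contracts
        charge = self.cost(q_new) - self.cost()
        self.q = q_new
        return charge

# Example: a 41-outcome market; buying pushes the quoted price up.
mm = LMSRMarketMaker(n_outcomes=41, b=100.0)
print(mm.prices()[0])            # uniform initially: 1/41
mm.trade(outcome=0, n_contracts=50)
print(mm.prices()[0])            # higher after the purchase
```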
References
Abramowicz, M., 2008: Predictocracy: Market Mechanisms for Public and Private Decision Making. Yale University Press, 352 pp.
Akerlof, G. A., 1970: The market for “Lemons”: Quality uncertainty and the market mechanism. Quart. J. Econ., 84, 488–500, https://doi.org/10.2307/1879431.
Aliakbari, E., and R. McKitrick, 2018: Information aggregation in a prediction market for climate outcomes. Energy Econ., 74, 97–106, https://doi.org/10.1016/j.eneco.2018.06.002.
Arrow, K. J., and Coauthors, 2008: The promise of prediction markets. Science, 320, 877–878, https://doi.org/10.1126/science.1157679.
Bailey, D. H., J. M. Borwein, M. L. de Prado, and Q. J. Zhu, 2014: Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance. Not. Amer. Math. Soc., 61, 458–471, https://doi.org/10.2139/ssrn.2308659.
Berg, J. E., and T. A. Rietz, 2014: Market design, manipulation, and accuracy in political prediction markets: Lessons from the Iowa Electronic Markets. Political Sci. Polit., 47, 293–296, https://doi.org/10.1017/S1049096514000043.
Berg, J. E., F. D. Nelson, and T. A. Rietz, 2008a: Prediction market accuracy in the long run. Int. J. Forecasting, 24, 285–300, https://doi.org/10.1016/j.ijforecast.2008.03.007.
Berg, J. E., T. A. Rietz, and J. W. Dickhaut, 2008b: On the performance of the lottery procedure for controlling risk preferences. Handbook of Experimental Economics Results, Vol. 1, Elsevier, 1087–1097, https://doi.org/10.1016/S1574-0722(07)00115-1.
Bernanke, B. S., and M. Woodford, 1997: Inflation forecasts and monetary policy. J. Money Credit Banking, 29, 653–684, https://doi.org/10.2307/2953656.
Bernstein, A., M. T. Gustafson, and R. Lewis, 2019: Disaster on the horizon: The price effect of sea level rise. J. Financ. Econ., 134, 253–272, https://doi.org/10.1016/j.jfineco.2019.03.013.
Boslough, M., 2011: Using prediction markets to generate probability density functions for climate change risk assessment. 2011 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstracts GC11A-0885, https://ui.adsabs.harvard.edu/abs/2011AGUFMGC11A0885B/abstract.
DelSole, T., and J. Shukla, 2009: Artificial skill due to predictor screening. J. Climate, 22, 331–345, https://doi.org/10.1175/2008JCLI2414.1.
Dreber, A., T. Pfeiffer, J. Almenberg, S. Isaksson, B. Wilson, Y. Chen, B. A. Nosek, and M. Johannesson, 2015: Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl. Acad. Sci. USA, 112, 15 343–15 347, https://doi.org/10.1073/pnas.1516179112.
Dutton, J. A., 2002: Opportunities and priorities in a new era for weather and climate services. Bull. Amer. Meteor. Soc., 83, 1303–1312, https://doi.org/10.1175/1520-0477-83.9.1303.
Gneiting, T., and A. E. Raftery, 2007: Strictly proper scoring rules, prediction, and estimation. J. Amer. Stat. Assoc., 102, 359–378, https://doi.org/10.1198/016214506000001437.
Good, I. J., 1952: Rational decisions. J. Roy. Stat. Soc., 14B, 107–114, https://doi.org/10.1111/j.2517-6161.1952.tb00104.x.
Gourevitch, J. D., C. Kousky, Y. Liao, C. Nolte, A. B. Pollack, J. R. Porter, and J. A. Weill, 2023: Unpriced climate risk and the potential consequences of overvaluation in US housing markets. Nat. Climate Change, 13, 250–257, https://doi.org/10.1038/s41558-023-01594-8.
Grossman, S. J., 1981: The informational role of warranties and private disclosure about product quality. J. Law Econ., 24, 461–483, https://doi.org/10.1086/466995.
Hagedorn, R., and L. A. Smith, 2009: Communicating the value of probabilistic forecasts with weather roulette. Meteor. Appl., 16, 143–155, https://doi.org/10.1002/met.92.
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550–560, https://doi.org/10.1175/1520-0493(2001)129<0550:IORHFV>2.0.CO;2.
Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33–46, https://doi.org/10.1175/BAMS-87-1-33.
Hanson, R., 2003: Combinatorial information market design. Inf. Syst. Front., 5, 107–119, https://doi.org/10.1023/A:1022058209073.
Hanson, R., 2007: Logarithmic markets scoring rules for modular combinatorial information aggregation. J. Prediction Markets, 1, 3–15, https://doi.org/10.5750/jpm.v1i1.417.
Hsu, S.-L., 2011: A prediction market for climate outcomes. Univ. Colo. Law Rev., 83, 179, https://doi.org/10.2139/ssrn.1770882.
Jewson, S., and A. Brix, 2005: Weather Derivative Valuation: The Meteorological, Statistical, Financial and Mathematical Foundations. Cambridge University Press, 392 pp.
Knutson, T. R., J. J. Sirutis, M. A. Bender, R. E. Tuleya, and B. A. Schenkel, 2022: Dynamical downscaling projections of late twenty-first-century U.S. landfalling hurricane activity. Climatic Change, 171, 28, https://doi.org/10.1007/s10584-022-03346-7.
Lucas, G. M., Jr., and F. Mormann, 2018: Betting on climate policy: Using prediction markets to address global warming. UC Davis Law Rev., 52, 1429–1486.
Moore, F. C., K. Lacasse, K. J. Mach, Y. A. Shin, L. J. Gross, and B. Beckage, 2022: Determinants of emissions pathways in the coupled climate–social system. Nature, 603, 103–111, https://doi.org/10.1038/s41586-022-04423-8.
Nay, J. J., M. Van der Linden, and J. M. Gilligan, 2016: Betting and belief: Prediction markets and attribution of climate change. 2016 Winter Simulation Conf. (WSC), Washington, DC, Institute of Electrical and Electronics Engineers, 1666–1677, https://doi.org/10.1109/WSC.2016.7822215.
Potts, J. M., 2003: Basic concepts. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley and Sons, 13–36.
Risbey, J. S., and Coauthors, 2021: Standard assessments of climate forecast skill can be misleading. Nat. Commun., 12, 4346, https://doi.org/10.1038/s41467-021-23771-z.
Risbey, J. S., and Coauthors, 2022: Common issues in verification of climate forecasts and projections. Climate, 10, 83, https://doi.org/10.3390/cli10060083.
Roulston, M., T. Kaplan, B. Day, and K. Kaivanto, 2022: Prediction-market innovations can improve climate-risk forecasts. Nat. Climate Change, 12, 879–880, https://doi.org/10.1038/s41558-022-01467-6.
Roulston, M. S., and L. A. Smith, 2002: Evaluating probabilistic forecasts using information theory. Mon. Wea. Rev., 130, 1653–1660, https://doi.org/10.1175/1520-0493(2002)130<1653:EPFUIT>2.0.CO;2.
Roulston, M. S., D. J. Hand, and D. W. Harding, 2016: Establishing a real-money prediction market for climate on decadal horizons. 2016 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstracts GC13A-1188, https://ui.adsabs.harvard.edu/abs/2016AGUFMGC13A1188R/abstract.
Sandroni, A., 2014: At least do no harm: The use of scarce data. Amer. Econ. J. Microecon., 6 (1), 1–4, https://doi.org/10.1257/mic.6.1.1.
Sumner, S., and A. L. Jackson, 2008: Using prediction markets to guide global warming policy. 63rd Int. Atlantic Economic Conf., Madrid, Spain, 14–18.
Tang, Y., and Coauthors, 2018: Progress in ENSO prediction and predictability study. Natl. Sci. Rev., 5, 826–839, https://doi.org/10.1093/nsr/nwy105.
Toth, Z., O. Talagrand, G. Candille, and Y. Zhu, 2003: Probability and ensemble forecasts. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley and Sons, 137–163.
Vandenbergh, M. P., K. T. Raimi, and J. M. Gilligan, 2013: Energy and climate change: A climate prediction market. UCLA Law Rev., 61, 1962.
Vecchi, G. A., K. L. Swanson, and B. J. Soden, 2008: Whither hurricane activity? Science, 322, 687–689, https://doi.org/10.1126/science.1164396.
Vecchi, G. A., C. Landsea, W. Zhang, G. Villarini, and T. Knutson, 2021: Changes in Atlantic major hurricane frequency since the late-19th century. Nat. Commun., 12, 4054, https://doi.org/10.1038/s41467-021-24268-5.
Venmans, F., and B. Carr, 2024: Literature-informed likelihoods of future emissions and temperatures. Climate Risk Manage., 44, 100605, https://doi.org/10.1016/j.crm.2024.100605.
Weisheimer, A., and T. N. Palmer, 2014: On the reliability of seasonal climate forecasts. J. Roy. Soc. Interface, 11, 20131162, https://doi.org/10.1098/rsif.2013.1162.
Witkowski, J., R. Freeman, J. W. Vaughan, D. M. Pennock, and A. Krause, 2023: Incentive-compatible forecasting competitions. Manage. Sci., 69, 1354–1374, https://doi.org/10.1287/mnsc.2022.4410.
Zeng, L., 2000: Weather derivatives and weather insurance: Concept, application, and analysis. Bull. Amer. Meteor. Soc., 81, 2075–2082, https://doi.org/10.1175/1520-0477(2000)081<2075:WDAWIC>2.3.CO;2.