The Signal-to-Noise Paradox in Climate Forecasts: Revisiting Our Understanding and Identifying Future Priorities

The signal-to-noise paradox (SNP) in model-based climate forecasting refers to counterintuitive situations where time series of ensemble-mean forecasts correlate better with observations of the real world than with individual members of the model forecast ensemble. It implies that the predictability of the real world exceeds the predictability within the model world.
The expected correlation of observations with a forecast ensemble mean is related to the signal-to-noise ratio of the forecasting system in a monotonic but nonlinear way (Kumar 2009). Here, "signal" refers to the temporal variability of the ensemble mean, and "noise" to the variability of the ensemble members about the ensemble mean. The SNP occurs when the correlation between the ensemble-mean forecast and observations is greater than expected given the signal-to-noise ratio of the forecasting system.
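A minimal sketch of this monotonic but nonlinear relationship, assuming a "perfect model" in which the observations behave like one more ensemble member (a common signal plus independent noise with the model's variances). Under that assumption, with S the ratio of signal to noise standard deviations, the expected correlation is S / sqrt(1 + S^2) for an infinite ensemble; with N members the ensemble mean still carries noise variance sigma_n^2 / N, which lowers the correlation further. The function name `expected_correlation` is hypothetical and not taken from Kumar (2009).

```python
import math


def expected_correlation(snr: float, n_members: float = math.inf) -> float:
    """Expected correlation of an ensemble mean with observations.

    Assumes a perfect model (hypothetical setup): obs = signal + noise with the
    model's own signal and noise variances. `snr` = sigma_signal / sigma_noise.
    """
    s2 = snr ** 2
    if math.isinf(n_members):
        # Infinite ensemble: the ensemble mean is the pure signal.
        return snr / math.sqrt(1.0 + s2)
    # Finite ensemble: the mean retains noise variance sigma_noise^2 / N.
    return s2 / math.sqrt((s2 + 1.0 / n_members) * (s2 + 1.0))


for snr in (0.25, 0.5, 1.0, 2.0):
    print(f"S={snr:4.2f}  r(N=10)={expected_correlation(snr, 10):.3f}  "
          f"r(N=inf)={expected_correlation(snr):.3f}")
```

An SNP would show up as an observed correlation that lies above this curve for the model's own S; the sketch ignores sampling error in the correlation itself.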
The SNP is often quantified with the ratio of predictable components (RPC) between the real and model worlds. The predictable component of the observations is estimated from the correlation coefficient between the ensemble-mean signal and the observations, and the (squared) predictable component of the model is estimated from the fraction of the signal variance to the total model variance. This latter fraction is identical to the (squared) correlation coefficient between the ensemble-mean signal and the individual ensemble members. If the RPC is significantly larger than 1, then the observations are more predictable than the model ensemble realizations, which constitutes the SNP.
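The RPC definition above can be sketched directly, assuming an ensemble of anomalies stored as a (years × members) array. The function name `rpc` and the synthetic data are hypothetical; the check draws the observations exactly like another ensemble member, so the RPC should come out close to 1 (slightly below 1 for a finite ensemble).

```python
import numpy as np


def rpc(ensemble: np.ndarray, obs: np.ndarray) -> float:
    """Ratio of predictable components: r(ens_mean, obs) / sqrt(var_sig / var_tot).

    ensemble: forecast anomalies, shape (n_years, n_members).
    obs:      observed anomalies, shape (n_years,).
    """
    ens_mean = ensemble.mean(axis=1)
    # Predictable component of the real world: correlation of the
    # ensemble-mean signal with the observations.
    r_obs = np.corrcoef(ens_mean, obs)[0, 1]
    # Squared predictable component of the model: signal variance (variance
    # of the ensemble mean) over total variance (mean variance of the members).
    var_sig = ens_mean.var()
    var_tot = ensemble.var(axis=0).mean()
    return r_obs / np.sqrt(var_sig / var_tot)


# Hypothetical perfect-model check with synthetic data.
rng = np.random.default_rng(0)
n_years, n_members = 2000, 40
signal = rng.standard_normal(n_years)
ensemble = signal[:, None] + rng.standard_normal((n_years, n_members))
obs = signal + rng.standard_normal(n_years)
print(f"RPC (perfect model): {rpc(ensemble, obs):.2f}")
```

With real observations and a few decades of hindcasts, an RPC above 1 only indicates the SNP if it survives a significance test, since both the numerator and the denominator are noisy estimates.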
Evidence supporting the existence of the SNP in seasonal and decadal climate predictions was first presented by Scaife et al. (2014) and Eade et al. (2014). They described cases in which the predictable components of the winter atmospheric circulation over the North Atlantic were lower in models than in observations. Despite numerous studies exploring different facets of the SNP since the comprehensive review by Scaife and Smith (2018), a conclusive solution to the problem has not yet been reached. The Oxford workshop provided a dedicated platform not only to present our current understanding to an expert audience but, more importantly, to critically discuss gaps and problems in our state of knowledge. The primary objective of the workshop was to achieve a better and more complete comprehension of the paradox, as well as to identify suggestions toward its resolution. During the workshop it became clear that our community, including the authors of this report, holds a range of perspectives on the problem, and this meeting report reflects the diversity of ideas and evidence presented at the meeting.

Shaping the problem
The statistical interpretation. The correlation between the ensemble mean and observations can be visualized in a simple trigonometric framework, with the three sides of a triangle corresponding to the standard deviations (square roots of the variances) of the ensemble mean, the observations and the forecast error (i.e., the root-mean-square difference between the ensemble mean and the observations; Bröcker et al. 2023); the correlation is then the cosine of the angle between the ensemble-mean and observation sides. In this way, the behavior of the RPC, and the possible emergence of the SNP, can be studied as a function of fairly primitive statistics. Of the several conclusions that emerge, one of the most noteworthy is the sensitive dependence of the RPC on these statistics, particularly when the model signal is small. Consequently, in this situation it is easy either to misdiagnose a forecast as suffering from the SNP or to miss its presence.
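The triangle picture can be checked numerically with a short sketch, assuming centered anomalies and population (ddof=0) statistics, and taking the sides as the standard deviations of the ensemble mean, the observations and the ensemble-mean error. The law of cosines then reproduces the Pearson correlation exactly. The function name `triangle_sides` and the synthetic weak-signal data are hypothetical.

```python
import numpy as np


def triangle_sides(ens_mean: np.ndarray, obs: np.ndarray):
    """Return the three triangle sides (standard deviations) and the implied correlation."""
    m = ens_mean - ens_mean.mean()           # centered ensemble-mean anomalies
    o = obs - obs.mean()                     # centered observed anomalies
    sd_m = np.sqrt(np.mean(m ** 2))          # ensemble-mean (signal) side
    sd_o = np.sqrt(np.mean(o ** 2))          # observation side
    sd_e = np.sqrt(np.mean((m - o) ** 2))    # forecast-error side
    # Law of cosines: sd_e^2 = sd_m^2 + sd_o^2 - 2 sd_m sd_o cos(theta),
    # where cos(theta) is the correlation between m and o.
    r = (sd_m ** 2 + sd_o ** 2 - sd_e ** 2) / (2.0 * sd_m * sd_o)
    return sd_m, sd_o, sd_e, r


# Hypothetical 30-year record with a weak ensemble-mean signal.
rng = np.random.default_rng(1)
obs = rng.standard_normal(30)
ens_mean = 0.2 * obs + 0.1 * rng.standard_normal(30)
sd_m, sd_o, sd_e, r = triangle_sides(ens_mean, obs)
print(f"sd_m={sd_m:.2f}, sd_o={sd_o:.2f}, sd_e={sd_e:.2f}, r={r:.2f}")
```

When sd_m is small, the numerator is a small difference between the nearly equal quantities sd_o^2 and sd_e^2, divided by a small denominator, so modest sampling changes in any side move r (and hence the RPC) considerably, which is the sensitivity described above.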
Formally, a small signal-to-noise ratio can be due to a deficit in the signal, to excessive noise, or to a combination of both. In the statistical interpretation, the SNP tends to manifest itself as a too-weak predicted ensemble-mean signal compared with the observed signal (Siegert et al. 2016; Falkena et al. 2022; Williams et al. 2023).
A nonlinear perspective in terms of circulation regimes. A complementary interpretation of the SNP in terms of dominant and persistent circulation regimes has been put forward by Strommen and Palmer (2019) and Zhang and Kirtman (2019). A nonlinear circulation regime can be likened to a potential well in state space. Here, the signal can be understood as the regime centroid and the noise as the width of the regime. In general, models predict the regime centroids, and the ensemble-mean signal arising from fluctuations between these preferred regimes, reasonably well. However, the simulated regimes are less statistically significant than those in observational data, implying that the potential wells associated with them are too shallow and broad compared with the deeper and sharper regimes of the real world. Such blurred regimes would be characterized by weak persistence around the regime centroids. Combining realistic signals (regime centroids) with excessive noise (shallow regimes) thus yields signal-to-noise ratios that are too small.
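As a toy illustration of the potential-well picture (not a model of the atmosphere, and not the method of the cited studies), the sketch below integrates a one-dimensional particle in the symmetric double-well potential V(x) = depth * (x^2 - 1)^2 driven by white noise, using the Euler-Maruyama scheme. The regime centroids stay at x = -1 and x = +1 for any depth, so only the well depth changes; a shallower well produces more frequent regime switches, i.e., weaker persistence. The potential, noise level, thresholds and function name are all hypothetical choices.

```python
import numpy as np


def count_regime_transitions(depth: float, noise: float = 0.7, dt: float = 0.01,
                             n_steps: int = 200_000, seed: int = 0) -> int:
    """Count switches between the wells of V(x) = depth * (x^2 - 1)^2.

    Euler-Maruyama integration of dx = -V'(x) dt + noise dW (toy setup).
    """
    rng = np.random.default_rng(seed)
    kicks = (noise * np.sqrt(dt) * rng.standard_normal(n_steps)).tolist()
    x, regime, transitions = 1.0, 1, 0
    for kick in kicks:
        drift = -4.0 * depth * x * (x ** 2 - 1.0)   # -dV/dx
        x += drift * dt + kick
        # Hysteresis: count a switch only once the state is well inside the other well.
        if regime == 1 and x < -0.5:
            regime, transitions = -1, transitions + 1
        elif regime == -1 and x > 0.5:
            regime, transitions = 1, transitions + 1
    return transitions


for depth in (0.25, 1.0):
    print(f"well depth {depth}: {count_regime_transitions(depth)} regime transitions")
```

Both runs share the same centroids and noise, so the difference in switching rate comes from the well depth alone, mirroring the argument that shallow model regimes carry realistic centroids but too much effective noise.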

Detection of the paradox
Following the earlier findings, the number of studies commenting on the signal and noise characteristics has been growing rapidly. While we cannot mention all of them here, new evidence for, but also against, the existence of the SNP in climate predictions was presented at the workshop.
As part of a skill comparison of the winter North Atlantic Oscillation (NAO) across several seasonal forecast models, Baker et al. (2018) demonstrated that most but not all individual systems had RPC > 1. Consistent with Stockdale et al. (2015), Charlton-Perez et al. (2019) found that the low signal-to-noise ratio is predominantly a feature of the lower and middle troposphere, linked to weak signal amplitudes in early winter, and is not present in the stratosphere. Using a very large ensemble of decadal predictions, Smith et al. (2020) found high correlation skill for the NAO but very low amplitudes of the ensemble-mean signal, resulting in a high RPC. While the SNP was initially identified for the winter season, new studies have shown evidence that it also occurs for summer precipitation over northern Europe and the Tibetan Plateau (Dunstone et al. 2018; Yeager et al. 2018; Hu and Zhou 2021). More recently, Dunstone et al. (2023) diagnosed the SNP in predictions of the summer NAO, with spuriously weak dynamical signals in the model.
It is important to contrast and learn from situations where the SNP is not apparent. In seasonal hindcasts for recent decades, Falkena et al. (2022) found that while the system exhibited a signal deficit in the winter NAO index of about a factor of 2, the zonal flow regimes (both the positive NAO and a negative counterpart of Scandinavian blocking) showed no signal deficit at all. The deficit for the NAO index thus resulted primarily from the negative NAO (blocked) regime, where the predictable signal in the model did not align with the observations.
Applying the concept of predictable modes of climate variability to seasonal forecasts, Hodson et al. (2023) found that the SNP does not affect all predictable modes over the North Atlantic region equally. The amplitude of the NAO-like mode, which is influenced by Indian Ocean temperatures, is significantly underestimated. In contrast, the Pacific-North American (PNA)-like mode, which is strongly related to tropical Pacific variability, shows less of an underestimation.
Seasonal forecasts of the Southern Annular Mode (SAM) during late spring were analyzed by Seviour et al. (2014) and Byrne et al. (2019). Neither study found a significant SNP, but in Seviour et al. (2014) the ensemble-mean correlation with observations was twice that expected, raising the possibility that a Southern Hemisphere SNP may be present in that system.
Whether the SNP already emerges in subseasonal predictions is as yet unclear. Preliminary results presented at the workshop showed a possible weak SNP for the NAO at time scales of approximately 30 days, but its disappearance at longer lead times indicates a lack of robustness.
The combination of multiple single-model forecast ensembles into a multimodel ensemble has often been used as a pragmatic "ensemble of opportunity" to overcome problems in single-model systems. Its success in providing more skillful and reliable forecasts relies on model error compensation and reduced overconfidence (Palmer et al. 2004). However, diagnosing an SNP is problematic in a multimodel ensemble because its members are not designed to represent model uncertainty in a coherent theoretical way and will thus primarily reflect the diversity of the individual modeling systems' deficiencies with respect to signals and noise. Nevertheless, multimodel ensembles are widely used for climate predictions, and their weak predictable signals need to be calibrated to obtain the most accurate and reliable forecasts (Smith et al. 2022).
Hypotheses for mechanisms contributing to the SNP
Nonstationarity and intermittency characteristics. Before we discuss physical and dynamical mechanisms that may lie behind the SNP, it is instructive to reflect on the nonstationary and intermittent character of the paradox. In coupled and uncoupled seasonal hindcasts covering a 110-yr period, the predictable components were shown to undergo low-frequency variations, linked to significant multidecadal skill variations (Weisheimer et al. 2019, 2020). While the North Atlantic circulation in winter showed regions with RPC > 1 toward the end of the twentieth century, the early and particularly the middle decades of the century did not. Given the substantial uncertainty in estimating the RPC for the winter NAO index, no robust periods with values significantly above 1 were detected. Any mechanism proposed to explain the SNP in the recent past will thus also have to explain the absence of the paradox in earlier historical decades.
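The sampling uncertainty of the RPC for a short record can be gauged with a simple bootstrap over years. The sketch below is one possible approach, not the method used by Weisheimer et al.: it assumes years are independent (ignoring autocorrelation), resamples whole years with replacement, and reports a 2.5%-97.5% percentile interval. Function names, ensemble size and the synthetic 30-winter record are hypothetical.

```python
import numpy as np


def rpc(ensemble, obs):
    """RPC = r(ensemble mean, obs) / sqrt(signal variance / total variance)."""
    ens_mean = ensemble.mean(axis=1)
    r_obs = np.corrcoef(ens_mean, obs)[0, 1]
    return r_obs / np.sqrt(ens_mean.var() / ensemble.var(axis=0).mean())


def bootstrap_rpc(ensemble, obs, n_boot=2000, seed=0):
    """Percentile interval for the RPC from resampling years with replacement."""
    rng = np.random.default_rng(seed)
    n_years = len(obs)
    samples = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_years, size=n_years)  # resampled years
        samples[b] = rpc(ensemble[idx], obs[idx])
    lo, hi = np.percentile(samples, [2.5, 97.5])
    return lo, hi, samples


# Hypothetical short record: 30 winters, 20 members, weak signal.
rng = np.random.default_rng(42)
n_years, n_members = 30, 20
signal = 0.5 * rng.standard_normal(n_years)
ensemble = signal[:, None] + rng.standard_normal((n_years, n_members))
obs = signal + rng.standard_normal(n_years)
lo, hi, _ = bootstrap_rpc(ensemble, obs)
print(f"RPC = {rpc(ensemble, obs):.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

For records of a few decades and weak signals, such intervals are typically wide, which is why an RPC point estimate above 1 alone is weak evidence for the SNP.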
A closer look at the contributions from individual years (or seasons) to the covariances that form the basis for the correlation and RPC analyses revealed very large interannual variability, leading to pronounced intermittency in the contributions of individual years to the overall estimate of the SNP (Weisheimer et al. 2019). The signal-to-noise problems of the winter NAO are dominated by a few years with larger amplitudes of the observed and modeled anomalies, while most years contribute very little to the SNP estimates. For the winter NAO, four of the five largest contributing years were characterized by negative NAO flow patterns over the North Atlantic, and one year was the strongest positive NAO winter of the twentieth century. These findings suggest that efforts to understand the paradox can be concentrated on a few key extreme years or seasons.
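The year-by-year decomposition described above can be sketched as follows, assuming population (ddof=0) standard deviations: because r = sum_t m'_t o'_t / (n sd_m sd_o) for centered anomalies m' and o', each year contributes one additive term, and the terms sum exactly to the correlation. The function name, the year range and the synthetic data are hypothetical.

```python
import numpy as np


def yearly_correlation_contributions(ens_mean: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Split r(ens_mean, obs) into one additive term per year."""
    m = ens_mean - ens_mean.mean()
    o = obs - obs.mean()
    # r = sum_t m_t * o_t / (n * sd_m * sd_o), with population standard deviations.
    return m * o / (len(m) * m.std() * o.std())


# Hypothetical 40-winter example.
rng = np.random.default_rng(3)
years = np.arange(1981, 2021)
obs = rng.standard_normal(years.size)
ens_mean = 0.3 * obs + 0.3 * rng.standard_normal(years.size)
contrib = yearly_correlation_contributions(ens_mean, obs)

top = np.argsort(np.abs(contrib))[::-1][:5]   # five largest |contributions|
for i in top:
    print(years[i], f"{contrib[i]:+.3f}")
share = np.abs(contrib[top]).sum() / np.abs(contrib).sum()
print(f"r = {contrib.sum():.2f}; top five years carry {share:.0%} of the total |contribution|")
```

The same per-year split applied to the signal and total variances would show which years dominate the RPC, identifying candidates for the case studies suggested above.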

Weak teleconnections and signal deficits. Teleconnections between the tropics and the extratropics are the backbone mechanism by which predictable signals in the tropics influence remote regions in the extratropics. However, the strength of these teleconnections appears to be systematically underestimated in subseasonal and seasonal forecast models (Garfinkel et al. 2022; Hardiman et al. 2022; Di Capua et al. 2023; Molteni and Brookshaw 2023; Roberts et al. 2023; Williams et al. 2023). While some of this bias may be due to sea surface temperature (SST) biases (Beverley et al. 2023), the signal deficit caused by too-weak teleconnections is also evident with prescribed observed SSTs or before the SST biases develop (Chen et al. 2020; Garfinkel et al. 2022).
The stratosphere provides a pathway for tropical signals to propagate to the extratropical troposphere, and the quasi-biennial oscillation (QBO) has been identified as another likely source of the SNP for the NAO (Garfinkel et al. 2018). With the QBO-NAO teleconnections being too weak in seasonal forecasts, diagnostically amplifying the teleconnection increased the ensemble-mean NAO signal and resolved the signal-to-noise problem (O'Reilly et al. 2019).
In the Euro-Atlantic sector during winter, the predictable seasonal signal seems largely associated with El Niño-Southern Oscillation (ENSO) events, and hence with just a small number of years. The ENSO signal in circulation regimes can be enhanced using Bayesian regularization and is evident even in 10-member ensembles (Falkena et al. 2023). As noted earlier, the model signal matches that of the observations for the zonal flow regimes, but not for the blocked flow regimes. Regressing the observations against the ensemble-mean forecast provides a simple and robust measure of whether there is a signal deficit, which does not rely on being able to estimate the noise in the observations (Falkena et al. 2022).
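A minimal sketch of the regression diagnostic, assuming centered anomalies: the slope beta = cov(m, o) / var(m) of the observations regressed on the ensemble mean is about 1 when the model signal has the right amplitude and exceeds 1 when the model signal is too weak. In the hypothetical synthetic example the model signal is halved, echoing the factor-of-2 NAO deficit mentioned earlier, so beta comes out close to 2. Function name, ensemble size and data are illustrative assumptions.

```python
import numpy as np


def regression_slope(ens_mean: np.ndarray, obs: np.ndarray) -> float:
    """Slope of the least-squares regression of observations onto the ensemble mean."""
    m = ens_mean - ens_mean.mean()
    o = obs - obs.mean()
    return float(np.sum(m * o) / np.sum(m ** 2))


# Hypothetical signal-deficit example: the model signal is half the real one.
rng = np.random.default_rng(7)
n_years, n_members = 5000, 200
signal = rng.standard_normal(n_years)
obs = signal + rng.standard_normal(n_years)
ensemble = 0.5 * signal[:, None] + rng.standard_normal((n_years, n_members))
beta = regression_slope(ensemble.mean(axis=1), obs)
print(f"regression slope beta = {beta:.2f} (beta > 1 indicates a signal deficit)")
```

Only the ensemble mean and the observations enter the slope, so no estimate of observational noise is needed. With small ensembles, residual noise in the ensemble mean pulls beta slightly downward (regression dilution), which is why the sketch uses a large ensemble.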
Transient eddy feedback. Eddy feedback describes the interaction of small-scale transient eddies with large-scale climate anomalies in the midlatitudes. It is an essential part of the process that governs the intensity and persistence of North Atlantic weather patterns (Zhao et al. 2023). It has been hypothesized that, for climate models to simulate the correct amplitude of the tropospheric response to remote influences, they need to correctly simulate eddy feedback. One example of this is the jet stream response to Arctic sea ice loss (Screen et al. 2022). Eddy feedback is too weak over the North Atlantic in almost all current climate models (Smith et al. 2022). It has recently been argued that stronger and more accurate eddy feedback is linked with stronger predictable signals, increased skill, and a reduced signal-to-noise error in seasonal forecast systems of the northern extratropics (Hardiman et al. 2022).
There are mounting indications, though, that this bias is less pronounced in the Southern Hemisphere (e.g., Screen et al. 2022). The reasons for such a hemispheric asymmetry in eddy feedback strength are yet to be understood, with the role of topography and stationary waves being plausible candidates.
Air-sea coupling. Inadequate representation of air-sea interactions in current climate models has also been proposed as a potential cause of or contributor to the SNP. Ossó et al. (2020) presented evidence that two-way coupling between the North Atlantic Ocean and atmosphere in summer was not well simulated in a coupled climate model, which implied weaker predictable signals in the model. Zhang et al. (2021) reported that decadal predictability is underestimated in CMIP5 models and suggested that ocean-atmosphere coupling in ocean-eddy-rich regions (such as the Gulf Stream) could be an important factor. Patrizio et al. (2023) recently described further evidence supporting this hypothesis for boreal winter in the North Atlantic.
Model resolution as a plausible fundamental underlying problem. To a large extent, the model errors in the representation of the dynamical and physical mechanisms likely contributing to the SNP can be seen as direct consequences of coarse model resolution, in both the atmosphere and the ocean. The strength of the eddy feedback is sensitive to resolution, and lower-resolution models have been shown to significantly underestimate the strength of this feedback (Scaife et al. 2019). Recent results showed that increasing the horizontal model resolution substantially improved the performance of decadal predictions, including their skill and signal-to-noise characteristics (Yeager et al. 2023). It is currently less clear which specific aspects of increased resolution contributed to this overall improvement, and a direct comparison with other systems is lacking.
It is plausible that problems with circulation regime structures, local air-sea interactions and eddy feedback could be solved with sufficient resolution. However, the model resolutions that may be required to substantially increase model fidelity are not currently computationally affordable for global seasonal and decadal prediction systems.

Take-away conclusions
The discussions between the participants during and after the workshop revealed that our community lacks a commonly agreed approach to the puzzle that the SNP represents. Nonetheless, the following key points of consensus have been identified:
• The SNP is neither a simple-to-understand nor a simple-to-fix problem. Its causes defy easy comprehension, and it admits no straightforward solution. Although there has been progress in unraveling the intricacies of the paradox, the conundrum evidently remains unsolved, even after a decade-long debate within our community. A variety of approaches is needed.
• The commonly employed RPC diagnostic requires careful interpretation. The RPC is highly sensitive to the relative magnitudes of the variances of the observations, the ensemble-mean forecast and the error of the ensemble mean. Combined with weak correlations and short observational records, this results in large sampling uncertainties that can significantly affect RPC estimates and compromise their robustness. Alternative diagnostics should therefore be explored. For example, the regression coefficient of the observations against the ensemble-mean forecast provides a simple diagnostic of a signal deficit in models, i.e., of whether the predictable signal is stronger than the predicted signal.
• The SNP exhibits intermittent behavior, often dominated by a few decisive forecast years. Correlation-based diagnostics, including skill, RPC, and regression, provide the means to identify these key years, which should form the basis for in-depth case studies of the underlying physical and dynamical processes. This interannual intermittency is superimposed on low-frequency multidecadal variations of the signal-to-noise characteristics.
• Multimodel ensembles are inadequate to reliably diagnose the SNP. Concepts from single-model ensembles do not readily apply to multimodel ensembles of opportunity and, if applied without caution, may lead to erroneous conclusions. Signal-to-noise errors can occur in multimodel ensembles because of differing model responses rather than the anomalously small predictability seen in single models. Signal-to-noise problems in multimodel ensembles therefore require careful interpretation.
• Further diagnostics are needed to investigate the influence of eddy feedback on the paradox. Enhanced diagnostics based on, e.g., potential vorticity and E-vectors may be essential for a more thorough understanding of the processes involved in the feedback mechanisms.
• The potential emergence of the paradox on shorter time scales opens up new avenues to scrutinize the underlying physical hypotheses. Should evidence of signal-to-noise errors on multiweekly time scales prove robust, it could offer opportunities for more comprehensive testing, as subseasonal forecasts are frequently integrated into high-resolution operational forecasting activities.
• The signal-to-noise paradox is likely not attributable to a single model deficiency. Instead, we assume that it arises from the collective impact of various deficiently represented physical and dynamical mechanisms. For example, eddy feedback on the large-scale circulation, air-sea interactions, circulation regime structures and tropical-extratropical teleconnections are interconnected factors that all contribute to the complexity of the phenomenon. Too-coarse model resolution, including that of the topography, has been argued to be a broader underlying issue for many of these.

Based on these conclusions, we outline the following future research directions toward resolving the SNP:
1) Models with improved and more realistic representations of a range of physical and dynamical processes will be required. Further increasing the model resolution will help achieve this.
2) The intermittent nature of the paradox provides a basis for focused case studies to test the hypotheses described above ("windows of opportunity").
3) The regime dependence of the SNP over the North Atlantic needs to be tested and quantified in multiple forecasting systems to establish more robustly which modes of variability or circulation regimes are worst affected.
4) If the SNP arises from processes that amplify predictable signals in the real world but are poorly represented in current models, it will need to be explained why these processes should preferentially amplify only the signals and not also the noise (i.e., internal variability). This puzzle is particularly apparent for hypotheses such as the role of eddy feedback, which might well be expected to influence noise as well as signals.
5) Detecting the SNP requires large ensembles, which poses a practical constraint because of high computational costs. The rapidly evolving field of machine learning and, more broadly, artificial intelligence is likely to provide impetus for exploring alternative methods of generating large ensembles.

Distinguishing the signal from the noise requires both scientific knowledge and self-knowledge: the serenity to accept things we cannot predict, the courage to predict the things we can, and the wisdom to know the difference.
Nate Silver (Silver 2012)