Improving Arctic weather and seasonal climate prediction: recommendations for future forecast systems evolution from the European project APPLICATE

The Arctic environment is changing, increasing the vulnerability of local communities and ecosystems, and impacting its socio-economic landscape. In this context, weather and climate prediction systems can be powerful tools to support strategic planning and decision-making at different time horizons. This article presents several success stories from the H2020 project APPLICATE on how to advance Arctic weather and seasonal climate prediction, synthesizing the key lessons learned throughout the project and providing recommendations for future model and forecast system development.

An efficient generation and exploitation of ensemble forecasts can also offer important predictive advantages, such as a better characterization of forecast uncertainty and a more efficient constraint of the predictable signals. Likewise, multimodel forecast ensembles can bring further advantages, such as a better sampling of structural model uncertainty (Min et al. 2020).
Over the last four decades, major NWP improvements have been achieved in a so-called quiet revolution (Bauer et al. 2015), underpinned by continuous scientific and technical advances at weather prediction centers worldwide. These improvements have gone hand in hand with considerable advances in supercomputing technologies and power, and a growing network of high-quality observations, which have been jointly leveraged to improve model fidelity and perform larger ensembles of predictions. Predictive skill has also increased at longer forecast horizons, especially at subseasonal and seasonal scales (Vitart and Robertson 2018; Tang et al. 2018). In the Arctic, seasonal predictions of features like the sea ice edge and surface air temperature already show promising potential (Day et al. 2014; Zampieri et al. 2018; Dirkson et al. 2019). Such predictions are being promoted and intercompared for both dynamical and statistical systems (which currently show similar levels of skill) through the yearly Sea Ice Outlooks (Steele et al. 2021) of the Sea Ice Prediction Network Phase 2 (SIPN2).
However, the Arctic is a region where Earth system predictions have suffered from both reduced availability of in situ observations and model limitations to realistically simulate the relevant physical processes (Jung et al. 2016). To help overcome these problems, the European H2020 project APPLICATE (https://applicate-h2020.eu/; November 2016 to May 2021) fostered new dedicated research to improve weather and seasonal climate predictive skill in and around the Arctic, playing a key role in international initiatives like YOPP, MOSAiC, and SIPN2. In the following, we present a selection of lessons learned throughout APPLICATE, followed by a list of recommendations for Arctic weather and climate forecast system development.

Improving forecast initialization through enhanced assimilation of observations
Increasing the uptake of Arctic observations is key to initializing forecast models with a better estimate of the observed state. For this, understanding which assimilated observations yield higher predictive potential is essential.
Observing System Experiments, in which different observation types were alternately excluded from data assimilation when creating the initial conditions for the Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF), were used to perform a first comprehensive study documenting the role of different observation sources on Arctic NWP skill (Lawrence et al. 2019). All observations were shown to positively impact short- and medium-range forecast skill in the Arctic troposphere (e.g., in Figs. 1a,b for pan-Arctic 500 hPa geopotential height). Microwave sounding observations from low-Earth-orbit satellites and conventional observations showed the largest contributions to skill in summer and winter, respectively, as indicated by the fact that they led to larger random forecast errors when their respective observations were excluded. The analysis also showed that microwave sounding data were suboptimally used in winter, in particular over snow and sea ice covered areas, where their uptake was reduced (Figs. 1c,d) due to model errors and increased uncertainties in estimating surface emissivity and temperature. This study thus provided a good example of how forecast initialization and model development are interconnected.
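The metric behind these experiments, as described in the caption of Fig. 1, is the change in the standard deviation of forecast errors when an observation type is withheld, expressed as a fraction of the control value. The following Python sketch illustrates that calculation on synthetic numbers. The function name, the array layout, and the data are hypothetical and chosen only for illustration; this is not the code used by Lawrence et al. (2019).

```python
import numpy as np

def normalized_std_change(err_denial, err_control):
    """Fractional change in the std of forecast errors relative to control.

    Positive values mean that random forecast errors grow when the
    observation type is withheld from the assimilation.
    """
    err_denial = np.asarray(err_denial, dtype=float)
    err_control = np.asarray(err_control, dtype=float)
    std_control = err_control.std(ddof=1)
    return (err_denial.std(ddof=1) - std_control) / std_control

# Synthetic forecast errors (illustrative values, not real data)
rng = np.random.default_rng(0)
control_errors = rng.normal(0.0, 10.0, size=500)  # all observations assimilated
denial_errors = 1.05 * control_errors             # errors inflated by 5%

change = normalized_std_change(denial_errors, control_errors)
print(f"normalized change in error std: {change:.3f}")
```

In this toy case the denial errors are a scaled copy of the control errors, so the result equals the imposed inflation of 5%. In a real experiment the two error samples would come from separate forecast sets verified against the same analysis.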
The implementation of new nudging capabilities in three different APPLICATE seasonal forecast systems showed promising benefits from the assimilation of sea ice properties on the Arctic prediction skill. Blockley and Peterson (2018) found remarkable improvements in predicting the summer Arctic sea ice extent and the local surface weather when satellite observations of sea ice thickness (SIT) were used to initialize the Met Office Global Seasonal forecast system (GloSea; Fig. 2a). The assimilation of different SIT products was later found to also enhance the short-term (up to 5 days) predictive skill of sea ice concentrations (Fiedler et al. 2022; Mignac et al. 2022), an important milestone toward the future operational implementation of SIT assimilation within Met Office forecasting systems. However, the limited length of current SIT observations precludes performing long hindcasts to robustly bias-correct the forecasts and thus reduce forecast uncertainty (Blanchard-Wrigglesworth et al. 2017). A more idealized study, assimilating only sea ice freeboard observations in EC-Earth with an ensemble Kalman filter (EnKF), was also shown to improve the prediction of sea ice extent in July-October (Fig. 2b), in particular the September minimum. A different study, also based on EC-Earth predictions, demonstrated that forecast errors around the sea ice edge can be reduced by assimilating sea ice concentrations through nudging (Fig. 2c). These latter predictions even showed increased skill in the summer atmospheric circulation over the North Atlantic and Eurasia through an improved representation of the teleconnection mechanisms with the regions where sea ice errors were reduced (Acosta Navarro et al. 2022).

Model development to enhance process representation
APPLICATE also explored how to improve Arctic forecasts through enhanced process representation. New or revised model components in APPLICATE NWP systems, including a version of IFS-HRES with a multilayer snow scheme (Arduini et al. 2019; Day et al. 2020) and new versions of ARPEGE, AROME and IFS-HRES with dynamic sea ice models (Bazile et al. 2020; Day et al. 2022), demonstrated that improved representation of cold surfaces can lead to enhanced predictions of near-surface variables. The typical improvements achieved for predicting surface air temperature in three distinct Arctic regions are quantified in Fig. 3. The inclusion of the dynamic sea ice model GELATO (Global Experimental Leads and Ice for Atmosphere and Ocean; Salas Mélia 2002) in ARPEGE led to significant reductions in mean absolute errors, with particularly large improvements in the regions surrounded by sea ice. Similar forecast improvements were also found for predicting low and high temperature extremes. The impact of the multilayer snow scheme in IFS-HRES was more complex (Fig. 3b), reducing a warm bias in continental regions of northern Scandinavia, but increasing a cold bias in Arctic maritime climates (e.g., Svalbard and Norway).
APPLICATE also devoted substantial efforts to develop and test new parameterizations and model features to refine the representation of sea ice, like sea ice thickness distribution (Massonnet et al. 2019; Moreno-Chamarro et al. 2020), landfast ice, snow on sea ice, form drag, and melt ponds (Sterlin et al. 2021). These showed mixed results in terms of improved forecast skill, both in NWP and seasonal systems (Ponsoni et al. 2021). An identified problem was that preexisting (and partly compensating) systematic model errors, unbalanced physics, and numerical issues disguised the expected improvements. Another limitation was related to the use of a forced ocean-sea ice model configuration for testing some of the parameterizations. Because this configuration uses prescribed atmospheric fields (thus excluding air-sea feedbacks), some of the improvements detected for new model features did not hold when eventually evaluated in coupled mode. Further work also showed that the coupling with the atmosphere could introduce new systematic errors not present in the stand-alone ocean-sea ice models, like an erroneous circulation of Atlantic Water in the Arctic in the coupled version of the Alfred Wegener Institute Climate Model (AWI-CM1), arising from a bias in the overlying atmospheric circulation (Hinrichs et al. 2021). All the above results thus highlighted the importance of testing new developments directly in the final model configurations.
Understanding the physical origins of model errors is essential to assist and improve model development. This understanding can be fostered through the implementation of new model frameworks for process-based analysis, and by exploiting new observations to facilitate model evaluation. APPLICATE made important contributions on both fronts. On the first, the development of an Atmosphere-Ocean Single Column Model (Hartung et al. 2018, 2022), based on ECMWF IFS, provided a versatile and computationally inexpensive tool that was used to show that a revised skin conductivity parameter, which controls the coupling between sea ice and atmosphere, reduced a pre-identified surface warm bias over melting sea ice regions. Tjernström et al. (2021) later evaluated the vertical structure of the lower atmosphere in IFS using data from the Arctic Ocean 2018 campaign, showing that the warm bias, which had a deep vertical structure, arose from complex interactions between surface temperature, clouds, the energy fluxes, and vertical mixing. On the second front, APPLICATE also exploited the wealth of in situ observations from Arctic supersites (e.g., from YOPPsiteMIP) for model error investigation, which involved the definition of process-based diagnostics for evaluating aspects as diverse as the surface energy budget and solid precipitation (Day et al. 2020; Køltzow et al. 2020). This exercise also highlighted the importance of standardizing the formats and preprocessing of Arctic observations, as carried out within YOPP.
APPLICATE additionally explored the impact of model resolution on process representation and forecast quality. Increasing the model resolution in the AROME-Arctic NWP system from 2.5 to 1.25 km and from 65 to 90 vertical levels showed overall added value in forecast accuracy of near-surface variables, especially during winter (Køltzow et al. 2021). Valkonen et al. (2020) additionally showed that regional NWP configurations with kilometer and subkilometer grid spacing can outperform state-of-the-art Arctic operational systems over complex terrain areas thanks to a better representation of topographic wind effects. By contrast, retrospective seasonal predictions based on EC-Earth3 and CNRM-CM6 showed no clear added value on Arctic prediction of increasing the ocean-sea ice model spatial resolution from 100 to 25 km, which could be due to a suboptimal tuning of the higher-resolution configurations. Tuning efforts in the Arctic could be further compromised at even finer resolutions at which linear kinematic sea ice features start to emerge, dramatically increasing the computational demands. This motivated the development of a computationally efficient sea ice solver for FESOM2 (Koldunov et al. 2019), a global ocean-sea ice model able to reach, through its unstructured mesh, ground-breaking resolutions in the Arctic for climate scale simulations.
Ensemble strategies to boost Arctic prediction
APPLICATE also explored new ways to optimize ensemble prediction in the Arctic. For NWP, the major forecast improvements were achieved with the development of a pioneering high-resolution regional ensemble prediction system (EPS) based on AROME-Arctic (Køltzow et al. 2021). This system exhibited promising improvements in different Arctic regions when compared to its deterministic counterpart, in particular for forecasting low-probability tail events in surface temperature (Fig. 3c) and precipitation (not shown). A fundamentally different approach, based on the combination of large multimodel ensembles with process-oriented constraints, led to large improvements in seasonal prediction. The method, developed in Acosta Navarro et al. (2020), used the strength of an observed teleconnection (generally underestimated in models) between Barents-Kara sea ice concentrations in November and the winter atmospheric circulation in western Russia (Mori et al. 2019), as a metric to subselect those members of an ensemble of APPLICATE systems for which the teleconnection was more realistic. This physically constrained ensemble showed substantially higher skill for the atmospheric circulation in western Russia and surface air temperature in Eurasia (Figs. 4a,b). Similar skill enhancements for forecast years 2-10 were also found when subselecting a multimodel ensemble from the Decadal Climate Prediction Project (DCPP; Boer et al. 2016; Figs. 4d,e) according to the strength of the same teleconnection. Other recent studies also showed the potential of applying physical constraints in the analysis of large prediction ensembles to circumvent systematic model problems (Smith et al. 2020; Donegan et al. 2021). Also worth highlighting is a separate study that confirmed the benefits of multisystem initiatives for enhancing forecast accuracy of sea ice features in the melt season (Batté et al. 2020).
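The subselection step can be sketched in Python as follows, using the numbers given in the caption of Fig. 4 (120 members, 10,000 random combinations of 12 members, the strongest 5% retained). This is a minimal illustration under stated assumptions, not the implementation of Acosta Navarro et al. (2020): the teleconnection strength is assumed here to be the interannual correlation between the subsample-mean sea ice index and the subsample-mean sea level pressure index, with both indices sign-oriented so that a stronger link gives a larger positive value. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def teleconnection_strength(ice_index, slp_index):
    """Interannual correlation between two indices (assumed strength metric)."""
    return np.corrcoef(ice_index, slp_index)[0, 1]

def constrain_ensemble(ice, slp, n_sub=12, n_draws=10_000, keep_frac=0.05, seed=0):
    """Keep the random member combinations with the strongest teleconnection.

    ice, slp: arrays of shape (n_members, n_years) with predicted indices.
    Returns the retained member indices, shape (n_keep, n_sub), and their
    strengths, sorted from weakest to strongest.
    """
    rng = np.random.default_rng(seed)
    n_members = ice.shape[0]
    # Draw random combinations of members without repetition within a draw
    combos = np.array([rng.choice(n_members, size=n_sub, replace=False)
                       for _ in range(n_draws)])
    # Strength of the link in each subsample mean
    strengths = np.array([
        teleconnection_strength(ice[c].mean(axis=0), slp[c].mean(axis=0))
        for c in combos
    ])
    n_keep = int(round(keep_frac * n_draws))
    best = np.argsort(strengths)[-n_keep:]
    return combos[best], strengths[best]

# Synthetic indices (illustrative only): a weak common signal plus member noise
rng = np.random.default_rng(1)
n_members, n_years = 120, 28
signal = rng.normal(size=n_years)
ice = signal + rng.normal(size=(n_members, n_years))
slp = 0.2 * signal + rng.normal(size=(n_members, n_years))

combos, strengths = constrain_ensemble(ice, slp)
print(combos.shape)  # (500, 12)
```

Skill would then be evaluated on the forecasts built from the retained combinations and compared with the full ensemble; that evaluation step is not shown here.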

Recommendations based on APPLICATE findings
Initialization enhancements, through an improved uptake of atmospheric (particularly microwave sounding data over snow and ice) and sea ice thickness/freeboard observations, were shown to improve predictive skill from a few days to several months ahead. Improving the representation of cold surfaces through different approaches (e.g., improved modeling of snow and sea ice and their coupling to the atmosphere) also led to improvements in Arctic NWP systems. The importance of producing physically consistent ocean and sea ice initial conditions for Arctic seasonal prediction was demonstrated as well (Cruz-García et al. 2021). Performing similar analyses in decadal prediction is thus encouraged. Many state-of-the-art coupled models used for weather and climate prediction are subject to considerable model biases over the Arctic, limiting their predictive capacity and credibility to assess future changes. Understanding the origin of these biases, and how each model component contributes to them, is a time-consuming but crucial first step to eventually correct them through improved model development and tuning. For this, APPLICATE has successfully developed process-based diagnostics and metrics to compare models and observations (Day et al. 2020; Khosravi et al. 2022). These diagnostics provided key knowledge to later improve, e.g., the representation of the vertical structure of the lower atmosphere and the feedback mechanisms with the surface, a much-needed step to fully capitalize on the forecast improvements currently achieved at the surface. Another key process proposed to control the remote influence of Arctic sea ice on the midlatitudes, which is still debated (Blackport and Screen 2020), is the strength of the eddy momentum feedback on the mean flow. This feedback is underestimated in current models with respect to observations, which could explain the generally weak model responses found in the Polar Amplification Model Intercomparison Project to Arctic sea ice loss (Smith et al. 2022). We therefore recommend further work to understand why models underestimate the eddy momentum feedback and how this error could be corrected.
The vast amount of satellite measurements used to improve forecast quality in the assimilation process could be further exploited for a more efficient model evaluation (and forecast system development) in the Arctic. For this, it is critical to consider the uncertainties in observation-based products, which can show important differences in sea ice mean state and variability (Ponsoni et al. 2019; Moreno-Chamarro et al. 2020). Likewise, models can also assist the development of future observing systems by identifying the sampling locations that provide more predictability (Ponsoni et al. 2020; Sandu et al. 2021).
Given the high potential shown by multisystem forecast ensembles to boost skill through the use of optimized ensemble aggregation techniques and the introduction of physical constraints to select members with larger skill, it is important to promote and further exploit existing multi-collaborative initiatives for retrospective and real-time prediction, as is currently done through DCPP, the Copernicus Climate Data Store, and SIPN2.
Preliminary results from APPLICATE suggest that the high computational costs of enhancing horizontal resolution for seasonal climate prediction in the Arctic are still not worthwhile when compared to the benefits of larger ensembles, at least for the resolutions and models considered thus far. In a regional NWP system, enhanced resolution was found to improve the predictive skill, but to a lesser degree than an ensemble approach with similar computational costs (Køltzow et al. 2021). Unveiling the real benefits of higher resolution configurations will require optimally tuned model versions.

Fig. 1. (a),(b) Normalized change in the standard deviation of the forecast errors of the ECMWF IFS system for the Arctic geopotential height at 500 hPa for different observation system experiments for the summer and winter seasons, respectively. Forecast errors are verified against the ECMWF operational analysis. Values are given as fractions with respect to the errors of a control forecast in which all observations are assimilated. Positive values in this quantity indicate that random forecast errors increase when the respective observations are excluded. (c),(d) Number of observations from channel 5 of the microwave sounding unit AMSU-A that were assimilated in the ECMWF operational system in June-September 2016 and in December 2017-March 2018, respectively. All panels are reproduced from Lawrence et al. (2019).

Fig. 2. (a) September 2012 mean probability of sea ice for two ensembles of seasonal forecasts made with the GloSea system in which sea ice thickness observations are not included (CTRL-HC, left) and included (ThDA-HC, right) within the model initialization. Contours of 15% sea ice concentration (SIC) are overlain to represent the sea ice edge for the ensemble mean (orange) and the Copernicus Marine Environment Monitoring Service (CMEMS) reanalysis (black). Probability is defined at each point as the proportion of ensemble members with at least 15% SIC. The CMEMS and modeled sea ice extent (SIE) and associated integrated ice edge error (IIEE) are included in the lower-right corner (units: 10⁶ km²). Both panels are reproduced from Blockley and Peterson (2018). (b) Total Arctic 2012 SIE in two sets of 50-member seasonal predictions with EC-Earth performed with/without assimilation of OSI-450 sea ice freeboard observations. Colored shades indicate the 5%-95% ensemble range (light shade) and interquartile 25%-75% range (medium shade), and thick lines the ensemble medians. (c) IIEE in the Arctic (solid lines) and its Atlantic sector (dashed lines) for two sets of retrospective predictions with EC-Earth initialized with/without SIC assimilation through sea ice nudging. Each set covers the period 1992-2019 and includes 30 ensemble members. IIEE is evaluated against NSIDC-0051 observations (Cavalieri et al. 1996) after bias-correction is applied (Batté et al. 2020). All forecast systems in the figure are initialized on the first of May.
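The two diagnostics used in this figure can be illustrated with a short Python sketch. The sea ice probability follows the definition in the caption (proportion of members with at least 15% SIC). For the IIEE, the sketch assumes the commonly used definition, namely the total area of grid cells where forecast and reference disagree on whether SIC reaches the 15% threshold; whether the studies cited here apply exactly this form (e.g., regarding regridding or bias correction) is not specified in the text. Function names, the toy grid, and the uniform cell area are hypothetical.

```python
import numpy as np

SIC_THRESHOLD = 0.15  # 15% sea ice concentration marks the ice edge

def sea_ice_probability(sic_ensemble):
    """Proportion of members with at least 15% SIC at each grid point.

    sic_ensemble: array (n_members, ny, nx) of concentrations in [0, 1].
    """
    return (np.asarray(sic_ensemble) >= SIC_THRESHOLD).mean(axis=0)

def integrated_ice_edge_error(sic_forecast, sic_reference, cell_area):
    """Area where forecast and reference disagree on ice presence.

    Sum of the overestimated area (ice forecast but not in the reference)
    and the underestimated area (ice in the reference but not forecast).
    """
    fc_ice = np.asarray(sic_forecast) >= SIC_THRESHOLD
    ref_ice = np.asarray(sic_reference) >= SIC_THRESHOLD
    overestimate = np.sum(cell_area * (fc_ice & ~ref_ice))
    underestimate = np.sum(cell_area * (~fc_ice & ref_ice))
    return overestimate + underestimate

# Toy 2 x 3 grid with uniform 625 km^2 cells (illustrative values only)
reference = np.array([[0.9, 0.5, 0.1],
                      [0.8, 0.2, 0.0]])
forecast = np.array([[0.9, 0.1, 0.3],
                     [0.8, 0.6, 0.0]])
cell_area = np.full(reference.shape, 625.0)

iiee = integrated_ice_edge_error(forecast, reference, cell_area)
prob = sea_ice_probability(np.stack([forecast, reference]))
print(iiee)  # 1250.0 (one overestimated and one underestimated cell)
```

On a real grid the cell areas vary with latitude, so the area field would be read from the model or observation grid description rather than set to a constant.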

Fig. 3. Averaged improvements/deteriorations (red/blue) in the mean absolute error (MAE) for predicting 2 m temperatures up to 4 days ahead in all weather station sites of three characteristic Arctic regions (inland northern Scandinavia, the Norwegian coastal and fjord areas, and Svalbard) resulting from (a) the inclusion of the GELATO sea ice model in ARPEGE, (b) including a multilayer snow scheme (MLSS) in IFS-HRES, and (c) comparing the mean of an EPS based on AROME-Arctic with its deterministic counterpart. Improvements/deteriorations in the ability to forecast tail events are additionally quantified with the Brier skill score (BSS) for low (below the observed 10th percentile) and high (above the observed 90th percentile) temperature events using, respectively, ARPEGE predictions with prescribed climatological sea ice, IFS-HRES predictions with a single layer snow scheme, and the deterministic AROME-Arctic system as reference forecasts. Significant changes (at 95% confidence level) calculated by bootstrapping are enclosed by a black frame.
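The scores named in this caption can be sketched in Python as follows. The example assumes the standard definitions of MAE and of the Brier skill score, BSS = 1 - BS/BS_ref, with a deterministic reference treated as a forecast probability of 0 or 1. The threshold is set by hand here, whereas in practice it would be the observed 10th or 90th percentile at each site. All names and numbers are hypothetical, and the site averaging and bootstrapping used for the figure are not reproduced.

```python
import numpy as np

def mae(forecast, observed):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(forecast, float) - np.asarray(observed, float)))

def tail_event_probability(ensemble, threshold, tail="low"):
    """Fraction of members beyond the threshold; ensemble is (n_members, n_cases)."""
    ens = np.asarray(ensemble, dtype=float)
    hits = ens < threshold if tail == "low" else ens > threshold
    return hits.mean(axis=0)

def brier_score(prob_forecast, event_observed):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    p = np.asarray(prob_forecast, dtype=float)
    o = np.asarray(event_observed, dtype=float)
    return np.mean((p - o) ** 2)

def brier_skill_score(prob_forecast, prob_reference, event_observed):
    """BSS = 1 - BS / BS_ref; positive values mean the forecast beats the reference."""
    bs_ref = brier_score(prob_reference, event_observed)
    return 1.0 - brier_score(prob_forecast, event_observed) / bs_ref

# Illustrative 2 m temperatures (deg C) for four cases; not real data
observed = np.array([-25.0, -10.0, -5.0, 0.0])
threshold = -20.0                    # stand-in for the observed 10th percentile
event = observed < threshold         # cold tail event occurred or not

ensemble = np.array([[-26.0, -12.0, -5.0, 1.0],
                     [-24.0, -9.0, -4.0, 0.0],
                     [-22.0, -11.0, -6.0, -1.0],
                     [-21.0, -8.0, -3.0, 2.0],
                     [-18.0, -21.0, -7.0, -22.0]])
deterministic = np.array([-19.0, -10.0, -5.0, -21.0])  # reference forecast

p_eps = tail_event_probability(ensemble, threshold, tail="low")
p_ref = (deterministic < threshold).astype(float)
bss = brier_skill_score(p_eps, p_ref, event)
print(round(bss, 3), mae(deterministic, observed))
```

In this toy case the deterministic reference misses the one cold event and raises one false alarm, while the ensemble assigns a high probability to the event, which gives a BSS close to 1.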

Fig. 4. (a),(b) Skill difference for predicting winter SLP and atmospheric temperature at 2 m (T2m) when subsampling a multimodel ensemble of November-initialized seasonal predictions from APPLICATE (120 members in total) according to the strength of an observed teleconnection between November Barents-Kara sea ice and winter sea level pressure in western Russia [green box in (a)], as compared to the full ensemble. The constrained ensemble includes the 5% subset of 10,000 random combinations of 12 members yielding the strongest teleconnection strength. Stippling represents significant skill differences at the 95% confidence level. (c),(d) As in (a),(b), but in an ensemble of decadal predictions comprising 55 members from 6 models contributing to DCPP. Skill is computed in winter for the average of forecast years 2-10. Observational references for evaluating the skill in the seasonal/decadal predictions are Centre ERS d'Archivage et de Traitement (CERSAT)/HadISSTv1.1 for sea ice and ERA-Interim/JRA55 for SLP and T2m.