The North Atlantic Oscillation (NAO) is the main driver of weather variability in parts of Eurasia, Greenland, North America, and North Africa on a range of time scales. Successful extended-range NAO predictions would equate to improved predictions of precipitation and temperature in these regions. It has become clear that the NAO is influenced by the stratosphere, but because this downward coupling is not fully reproduced by all forecast models the potential for improved NAO forecasts has not been fully realized. Here, an analysis of 21 winters of subseasonal forecast data from the European Centre for Medium-Range Weather Forecasts monthly forecasting system is presented. By dividing the forecasts into clusters according to their errors in North Atlantic Ocean sea level pressure 15–30 days into the forecasts, we identify relationships between these errors and the state of the stratospheric polar vortex when the forecasts were initialized. A key finding is that the model overestimates the persistence of both the negative NAO response following a weak polar vortex and the positive NAO response following a strong polar vortex. A case in point is the sudden stratospheric warming in early 2019, which was followed by five consecutive weeks of an overestimation of the negative NAO regime. A consequence on the ground was temperature predictions for northern Europe that were too cold. Another important finding is that the model appears to misrepresent the gradual downward impact of stratospheric vortex anomalies. This result suggests that an improved representation and prediction of stratosphere–troposphere coupling in models might yield substantial benefits for extended-range weather forecasting in the Northern Hemisphere midlatitudes.
The North Atlantic Oscillation (NAO), which was described qualitatively in the latter half of the 18th century and more quantitatively using weather observations about 100 years later (e.g., van Loon and Rogers 1978; Greatbatch 2000; Hurrell et al. 2001; Visbeck et al. 2001; Wanner et al. 2001, and references therein), is the dominant driver of atmospheric variability over the extratropical North Atlantic Ocean. Usually defined as a meridional seesaw pattern of atmospheric pressure between a northern region near Iceland and a southern area roughly between the Azores and Portugal, the NAO varies on daily to decadal time scales (Thompson and Wallace 2000; Woollings et al. 2014) and affects weather in North Atlantic coastal regions and beyond. The negative phase of the NAO is characterized by a blocking high in the northern part of the North Atlantic (Croci-Maspoli et al. 2007) and has wide-reaching impacts on the ground, including anomalously cold and dry weather in northern Europe and warm, wet anomalies over the Mediterranean region. At the other end of the scale, the positive NAO phase is typically accompanied by lower-than-normal pressure near Iceland and a jet stream curving northward into northern Europe, bringing warm and stormy weather (Hurrell and Deser 2010).
Because the NAO is so important for the weather in large parts of the Northern Hemisphere’s midlatitudes, a substantial body of research has been devoted to NAO forecasting in recent years. Even though an increase in NAO prediction skill does not necessarily translate to an enhanced ability to forecast surface variables (Scaife et al. 2014; Athanasiadis et al. 2017), the NAO is often used as a benchmark for dynamical prediction models.
While it has emerged that the NAO has a potentially predictable deterministic component on time scales of 2–3 weeks (Domeisen et al. 2018; Zhang et al. 2019), there is also significant skill in predicting the NAO on subseasonal (Vitart 2014), seasonal (Weisheimer et al. 2020), annual (Dunstone et al. 2016) and decadal (Smith et al. 2019) time scales, although the models exhibit a marked underconfidence in predicting the NAO at seasonal and longer lead times (Baker et al. 2018).
To enhance the NAO forecast skill, it is important to understand the mechanisms that dictate its variability. The fluctuations of the NAO between positive, neutral and negative phases are partly generated by internal atmospheric variability (DeWeaver and Nigam 2000), but they also respond to remote forcing (e.g., Marshall et al. 2001; Osborn 2004; Cui et al. 2014). In particular, the stratosphere has been shown to strongly affect NAO variability (Thompson and Wallace 1998). Periods during which the stratospheric polar vortex is unusually weak, sudden stratospheric warmings (SSWs; Butler et al. 2017), tend to coincide with a nudging of the NAO toward its negative phase (Charlton-Perez et al. 2018) and an increased persistence of this phase (Domeisen 2019). On the ground, this yields a higher likelihood of, for example, cold-air outbreaks in northern Europe (Kolstad et al. 2010). At the other end of the spectrum, strong vortex events are frequently followed by a positive phase of the NAO (Baldwin and Dunkerton 2001).
As a result of its influence on the NAO, the stratosphere has been shown to contribute skill in the North Atlantic–European region on subseasonal (Tripathi et al. 2015; Butler et al. 2019; Choi and Son 2019; Domeisen et al. 2020c) to seasonal (Sigmond et al. 2013; Scaife et al. 2016; Dobrynin et al. 2018; O’Reilly et al. 2019; Nie et al. 2019) time scales. But despite the demonstrated increase in NAO skill after SSW events, not all stratospheric events contribute to this skill. Although there is variation between SSW events (Domeisen et al. 2020a), only roughly two thirds of SSWs are followed by a persistent negative NAO response (Charlton-Perez et al. 2018; Domeisen 2019), while the remaining third are followed by a positive NAO response (Afargan-Gerstman and Domeisen 2020).
The purpose of the study presented here was to investigate North Atlantic SLP errors in the European Centre for Medium-Range Weather Forecasts (ECMWF) monthly forecasting system for lead times on the subseasonal time scale (15–30 days). When we divided the errors into three clusters, it became evident that two of the clusters projected onto too-negative or too-positive NAO conditions. By relating these errors to the state of the stratospheric polar vortex when the model forecasts were initialized, we identified a weak but general relationship between NAO errors and the initial vortex state. The largest NAO errors were found to arise when the model failed to capture the diversity of tropospheric responses to anomalous stratospheric vortex states, instead overpredicting the “standard” NAO response to both SSWs and strong vortex events. This prediction bias negatively impacts the model’s prediction skill after anomalous stratospheric vortex states, which could otherwise have greatly contributed to improved subseasonal prediction skill of the winter troposphere.
2. Data and methods
a. Data sources
The model system used here is the ECMWF monthly forecasting system, which has two components: a real-time forecasting system with 51 ensemble members, and hindcasts with 11 members. Each ensemble member is a 46-day coupled ocean–atmosphere integration. The real-time forecasts are initialized on each Monday and Thursday, and we used all 30 ensemble forecasts that were initialized between 19 November 2018 and 28 February 2019. The corresponding hindcasts were initialized on the same dates each year from 1998/99 to 2017/18, giving us a total of 21 winters when combining the hindcasts and forecasts, for a total of 30 × 21 = 630 predictions. Because we only consider ensemble means, we do not distinguish between hindcasts and forecasts and from now on refer to them simply as “forecasts.” Unless otherwise specified, we used time-averaged values for the forecast lead times between 15 and 30 days.
The model version investigated here is CY45R1, which became operational on 6 June 2018. The horizontal grid spacing of the model is 16 km at lead times of up to 15 days and 32 km thereafter. The forecasting system is a high-top model (model top around 0.01 hPa) and exhibits high vertical resolution in the stratosphere (38 levels at 100 hPa and above), which is among the highest of the subseasonal prediction systems that provide stratospheric data (Domeisen et al. 2020b). The data were downloaded from the S2S database (Vitart et al. 2017), where instantaneous data for 0000 UTC on each day were available at a resolution of 1.5°. The following variables were downloaded: SLP, 2-m temperature, zonal wind at 10 hPa, and geopotential height at 50, 100, and 200 hPa. The forecasts are compared with ERA5 reanalysis data (Hersbach et al. 2020), from which the same variables were downloaded on the same 1.5° grid as the forecasts.
Prior to performing the analysis described below, we converted both reanalysis and forecast values to dimensionless anomalies by first subtracting the datewise climatological mean and then dividing by the datewise climatological standard deviation. For the forecasts, we applied this standardization to the ensemble means for each lead time.
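The datewise standardization can be sketched as follows. This is an illustrative Python helper, not the authors' code; the array layout (one row per winter, one column per initialization date) is an assumption made for the example.

```python
import numpy as np

def standardize_datewise(values, years_axis=0):
    """Convert raw fields to dimensionless anomalies by subtracting the
    datewise climatological mean and dividing by the datewise
    climatological standard deviation (hypothetical helper; the array
    layout is assumed, with shape (n_years, n_dates, ...))."""
    clim_mean = values.mean(axis=years_axis, keepdims=True)
    clim_std = values.std(axis=years_axis, keepdims=True)
    return (values - clim_mean) / clim_std

# Toy example: 21 winters x 30 initialization dates of raw SLP (hPa)
rng = np.random.default_rng(0)
raw = rng.normal(loc=1010.0, scale=8.0, size=(21, 30))
anom = standardize_datewise(raw)
# Each date's anomalies now have zero mean and unit standard deviation
# across the 21 winters, which neutralizes the seasonal cycle.
```

For the forecasts, the same operation would be applied to the ensemble means separately at each lead time.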
b. Methods
1) Clustering of forecast errors
For each of the 630 forecasts, the average SLP for lead times between 15 and 30 days was computed for each grid point inside a subregion of the North Atlantic (25°–75°N, 80°W–10°E). From ERA5 data, the average SLP for the same days was also calculated, and then 630 maps of the differences between the two (forecast minus reanalysis) were compiled. As mentioned, the differences were evaluated using standardized and dimensionless SLP values. The k-means (MacQueen 1967) implementation in the Scikit-learn Python library (Pedregosa et al. 2011) was then used to divide the difference maps into clusters. The number of clusters is a subjective choice, but it had little bearing on the results presented here. Many alternatives were tried, ranging from 2 to 10 clusters. Two clusters with oppositely signed SLP errors with NAO-like spatial structures always emerged, and the spatial signatures of these did not change appreciably with the number of total clusters. This is probably because the NAO explains a large share of the SLP variance in the North Atlantic. The results presented here are based on three clusters. As will be shown below (Fig. 1), this yielded two largely symmetric clusters that projected onto an NAO-like pattern and a third cluster without a distinct spatial signature.
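The clustering step can be sketched with the Scikit-learn implementation cited above. The toy data, grid size, and variable names below are illustrative only; the real input would be the 630 standardized SLP error maps, flattened to one row per forecast.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative stand-in for the 630 flattened SLP error maps
# (grid points inside 25-75N, 80W-10E as columns; sizes are assumed).
rng = np.random.default_rng(1)
n_forecasts, n_gridpoints = 630, 40 * 60
error_maps = rng.normal(size=(n_forecasts, n_gridpoints))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(error_maps)   # cluster index (0-2) per forecast
cluster_means = kmeans.cluster_centers_   # mean error map of each cluster
```

Reshaping each row of `cluster_means` back onto the latitude–longitude grid gives composite error maps like those in Fig. 1.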
2) Stratospheric sudden warmings
The major SSW central dates in the study period were identified using the zonal mean of ERA5 10-hPa zonal wind at 60°N. Henceforth, we refer to this wind speed (m s−1) as U and to its dimensionless standardized version, which was used to neutralize the seasonal cycle, as Û. Following Charlton and Polvani (2007), a central date is the first day on which U < 0. To avoid recording the same event twice, we also required that U > 0 for 20 consecutive dates before the central date. The SSW central dates during the study period from November 1998 to February 2019 are listed in Table 1, along with the central dates in Butler et al. (2017). Note that the two missing central dates, numbers 29 and 40 in Butler et al. (2017), both occurred in March, that is, outside our seasonal scope.
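This detection rule can be sketched in a few lines. The function below is an illustrative reimplementation of the Charlton and Polvani (2007) criterion as described in the text, not the authors' code; it operates on a daily time series of the 10-hPa, 60°N zonal-mean wind.

```python
import numpy as np

def ssw_central_dates(u, min_westerly_days=20):
    """Indices of SSW central dates in a daily series `u` of zonal-mean
    10-hPa zonal wind at 60N (m/s): the central date is the first day
    with easterlies (u < 0), provided the wind was westerly on the
    preceding `min_westerly_days` consecutive days (sketch)."""
    centrals = []
    westerly_run = 0
    for i, wind in enumerate(u):
        if wind < 0:
            if westerly_run >= min_westerly_days:
                centrals.append(i)
            westerly_run = 0  # easterly day breaks the westerly run
        else:
            westerly_run += 1
    return centrals

# Toy series: a reversal after 30 westerly days, a brief recovery with a
# second reversal (too soon to count), then a genuine second event.
u = np.array([10.0] * 30 + [-5.0] * 3 + [8.0] * 5
             + [-2.0] * 2 + [12.0] * 25 + [-1.0])
print(ssw_central_dates(u))  # -> [30, 65]
```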
3) NAO and NAM index
The NAO index was calculated as follows. For each date, the area-averaged SLP was computed for two regions: all grid points within 500 km of Stykkisholmur in Iceland, and all grid points within 500 km of Lisbon, Portugal. This yielded time series of SLP for each region. As in Hurrell (1995), these time series were standardized separately for each region to avoid domination by the “northern” time series, which has a considerably higher variance than the “southern” time series. Then the difference between the southern and the northern time series was calculated to form a new time series, and the standardized version of this was used as our NAO index.
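The index calculation above can be sketched as follows, assuming the two area-averaged SLP time series have already been extracted (the helper and variable names are illustrative, in the spirit of Hurrell 1995).

```python
import numpy as np

def nao_index(slp_north, slp_south):
    """NAO index from area-averaged SLP near Iceland (north) and Lisbon
    (south): standardize each series separately so the higher-variance
    northern series does not dominate, take south minus north, and
    standardize the difference (illustrative sketch)."""
    def z(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()
    return z(z(slp_south) - z(slp_north))

# Toy series with the documented variance asymmetry between regions
rng = np.random.default_rng(2)
north = rng.normal(1005.0, 10.0, size=200)  # higher variance near Iceland
south = rng.normal(1020.0, 4.0, size=200)
nao = nao_index(north, south)
```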
To assess the forecast errors lower in the stratosphere and near the tropopause level, we calculated an indicator of the Northern Hemisphere Annular Mode (NAM), which is related to the strength of the polar vortex and the NAO index, at 50, 100, and 200 hPa. We computed a NAM index as −1 times the standardized area-averaged geopotential height anomalies north of 65°N. This index is equivalent to the NAM index obtained with more complex methods based on principal component analysis (Baldwin and Thompson 2009).
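A minimal sketch of this NAM definition, assuming the polar-cap (north of 65°N) area average of geopotential height anomalies has already been computed for a given level:

```python
import numpy as np

def nam_index(z_polar_cap):
    """NAM index at one pressure level as defined in the text: -1 times
    the standardized area-averaged geopotential height anomaly north of
    65N, so low polar heights (a strong vortex) give a positive index
    (sketch under that definition)."""
    z = np.asarray(z_polar_cap, dtype=float)
    return -(z - z.mean()) / z.std()

# Toy polar-cap height anomalies (m); the most negative height anomaly
# should map to the largest (most positive) NAM value.
heights = np.array([120.0, -80.0, 40.0, -160.0, 60.0])
nam = nam_index(heights)
```

Applied at 50, 100, and 200 hPa, this gives the simple PCA-free NAM indicator used in the text (cf. Baldwin and Thompson 2009).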
4) Significance testing
As we computed composites for 16-day periods and the ECMWF forecasts were initialized twice a week, many of the composite periods overlapped. To account for the resulting autocorrelation, we used a two-sided bootstrapping approach to calculate the statistical significance of the composites in Figs. 1, 2, 4a, and 4b (note that Fig. 4 is discussed in more detail below). The null hypothesis was that each set of cluster members within a given winter could just as well have occurred in any of the 21 winters. To test this, 10 000 synthetic composites were calculated as follows. For cluster x, the set of forecast initialization dates belonging to winter y was retained, but y was replaced by a random winter out of the 21 possible winters (with replacement). As an example, assume a hypothetical cluster of 10 forecasts. Further assume that five of these forecasts were initialized in the winter of 2000/01 and the other five were initialized in 2012/13. Each of the 10 000 synthetic composites would consist of five forecasts from a random winter between 1998/99 and 2018/19 (initialized on the same days and months as the ones in 2000/01) and five forecasts from another random winter in the same time interval (initialized on the same days and months as the ones in 2012/13). This procedure ensured that each of the synthetic composites for x had the same intraseasonal autocorrelation. If the actual composite value was lower than the 2.5th or greater than the 97.5th percentile of the set of synthetic composites, we defined the value to be statistically significant at the 5% level.
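The winter-swapping bootstrap described above can be sketched as follows. This is an illustrative reimplementation under assumed data structures: `values[y, d]` holds the composited quantity for winter y and initialization date d, and `cluster_members` maps each winter index to the date indices of the cluster's forecasts in that winter. A reduced number of resamples is used here for speed.

```python
import numpy as np

def winter_swap_composites(values, cluster_members, n_boot=10_000, seed=0):
    """Synthetic composites: for each resample, every winter's set of
    initialization dates is kept, but the winter itself is redrawn at
    random with replacement (sketch of the procedure in the text)."""
    rng = np.random.default_rng(seed)
    n_winters = values.shape[0]
    boots = np.empty(n_boot)
    for b in range(n_boot):
        samples = []
        for dates in cluster_members.values():
            y = rng.integers(n_winters)  # random replacement winter
            samples.extend(values[y, d] for d in dates)
        boots[b] = np.mean(samples)
    return boots

rng = np.random.default_rng(3)
values = rng.normal(size=(21, 30))  # 21 winters x 30 dates (toy data)
members = {2: [0, 3, 5, 8, 11], 14: [1, 4, 6, 9, 12]}  # toy cluster
actual = np.mean([values[y, d] for y, ds in members.items() for d in ds])
boots = winter_swap_composites(values, members, n_boot=2000)
lo, hi = np.percentile(boots, [2.5, 97.5])
significant = bool(actual < lo or actual > hi)  # two-sided 5% test
```

Because each resample reuses the within-winter date pattern, the synthetic composites preserve the intraseasonal autocorrelation of the original cluster.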
Bootstrapping was also used to find out how many forecasts each bin in Fig. 4c might have contained by chance if they had been independent of Û. We created a set of 10 000 forecasts for which the distribution of the 30 forecasts into clusters for each year was swapped with the distribution from a random year out of the possible 21 years (with replacement). This method retained the autocorrelation of the original distribution while at the same time the cluster distribution was randomized. We defined the number of forecasts in a cluster to be significantly different from random if it was lower than the 2.5th percentile or higher than the 97.5th percentile of this random set.
A similar method was used to calculate the significance of the composite average temperature anomalies in Figs. 8b and 9b (discussed in more detail below), where a set of 10 000 sequences consisting of 11 consecutive forecasts were drawn randomly from the forecasts. The anomalies were defined to be statistically significantly different from random if they were either less than the 2.5th or greater than the 97.5th percentiles of this set.
3. Results
a. Forecast error clusters
As described in the previous section, the 630 North Atlantic SLP error maps (forecast minus reanalysis), averaged for lead times between 15 and 30 days, were divided into three clusters. In Fig. 1, the mean forecast errors are shown for each of the clusters, named A, B, and C. The errors in clusters A and B project onto the typical spatial pattern of the NAO. Figure 1a shows that the forecasts in cluster A are in a too-negative phase of the NAO, with too low pressure in a wide belt stretching from the subtropics to the Iberian Peninsula and too high pressure over Greenland and Iceland. Although the clustering algorithm only considered grid points inside the outlined region, the forecast errors are significant and positive over most of the polar cap north of 60°N. Significant and negative errors also occur over large parts of the Pacific. Figure 1b shows that the NAO in the cluster-B forecasts is too positive. Overall, there is substantial symmetry between the errors in clusters A and B, although the errors in cluster A cover a larger geographical region than the errors in cluster B. The total number of forecasts in A and B is 355 (56% of the 630 forecasts). The average of the forecast errors in cluster C is small, as shown in Fig. 1c, indicating that there is no systematic error pattern in this set of forecasts. The remainder of the study is mainly focused on clusters A and B.
The SLP anomalies in the forecasts and the reanalysis corresponding to the forecast errors shown in Fig. 1 are now studied separately. The mean forecast SLP anomaly for cluster A has a distinctly NAO-negative signature after 15–30 days (Fig. 2a), but what actually occurred during that period was that the NAO was weakly positive (Fig. 2b). In cluster B, the mean forecast SLP anomaly corresponds to a positive NAO (Fig. 2c), while the mean reanalysis anomaly is that of a negative NAO (Fig. 2d). Hence, on average, these clusters comprise forecasts in which the model predicted the incorrect sign of the NAO.
We now show that the divergence of the NAO phase between forecasts and observations is clear already from a few days into the forecasts. In Fig. 3, the NAO index averaged over all members in each cluster is shown for each lead time between 1 and 30 days. Over the first few days in the cluster-A cases, the mean NAO index is decreasing in both the reanalysis and the forecasts (Fig. 3a). After about five days, the mean NAO index in the reanalysis starts to increase and eventually becomes positive, while the mean forecast NAO index continues to decrease and stays negative throughout the period. The differences between the forecast and the reanalysis are significant from one week onward. For the forecasts in cluster B, there is a similar divergence between the reanalysis and the forecast NAO index (Fig. 3b). In both clusters, the forecast model tends to persist and enhance the initial phase of the NAO, while in the reanalysis, the NAO index switches sign on average between 11 and 14 days after initialization.
The cluster analysis identified sets of forecasts with large ensemble mean error in the NAO in days 15–30. Importantly, the emergence of consistent signals early in the forecast period, as shown in Fig. 3, reveals that these are not simply random errors, as might be expected in a system with low intrinsic predictability. Instead, there is a consistent dynamical setup in these cases that suggests a systematic bias in the model under these situations. Further evidence of the common structures within these two clusters is presented in the following section.
b. Linkages to initial vortex states
We now consider the state of the stratospheric polar vortex on the initialization dates of the forecasts in clusters A and B. Figure 4a shows the mean 10-hPa zonal wind anomalies on the 168 initialization dates in cluster A. The significant and negative mean anomalies in a wide belt on both sides of 60°N (the usual reference latitude for the polar vortex at 10 hPa) show that, on average, the polar vortex was weaker than normal on these dates. On the initialization dates of the 187 forecasts in cluster B, the average vortex was slightly stronger than normal, but the composite average zonal wind anomalies are not statistically significant (Fig. 4b).
The histogram in Fig. 4c is an alternative visualization of the link between the vortex states on initialization and the SLP forecast errors. The 630 forecasts were divided into five equally sized bins according to Û (the standardization was done to account for the seasonal cycle of U). Each bin contains 126 forecasts, and the height of each bar indicates the percentage of the total number of forecasts in each cluster.
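The quintile binning behind Fig. 4c can be sketched as follows (toy data; the real input would be the 630 standardized initial vortex winds):

```python
import numpy as np

# Divide 630 forecasts into five equally sized bins by the standardized
# 10-hPa wind on initialization (illustrative reconstruction).
rng = np.random.default_rng(4)
u_hat = rng.normal(size=630)  # stand-in for the standardized initial wind
order = np.argsort(u_hat)
bins = np.empty(630, dtype=int)
bins[order] = np.repeat(np.arange(5), 126)  # bin 0 = weakest vortex states
counts = np.bincount(bins)                  # 126 forecasts per bin
```

Cross-tabulating `bins` against the cluster labels then gives the per-bin cluster percentages plotted in the histogram.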
First, we note that the cluster-C percentages, which are shown here to contrast with the ones for clusters A and B, do not significantly deviate from what could have occurred by chance. By contrast, 53 (32%) of the 168 cluster-A forecasts belong to the bin with the weakest initial vortex states. This fraction is significantly different from what could be expected by chance. The weak vortex bin also contains significantly fewer cluster-B forecasts than a randomly distributed set. The number of cluster-A members in each bin decreases gradually with increasing vortex strength. The last bin, which consists of the forecasts with strong initial vortex states, only contains 16 (10%) of the cluster-A members, that is, less than one-third of the number of cluster-A forecasts in the weak vortex bin. This is significantly fewer than expected by chance. The fourth bin, which consists of forecasts that were initialized when the vortex was between average and strong, contains significantly more cluster-B forecasts than expected by chance.
There is clearly a relationship between North Atlantic SLP errors and the polar vortex when the initial state of the vortex is either weak or strong, although a large share of the 630 SLP forecasts do not appear to be directly linked to the initial vortex state. We now take a small detour from the clusters and relate the initial vortex strength to subsequent errors in the NAO index, which is the most familiar indicator of SLP variance in the extratropical North Atlantic. In Fig. 5, all the 630 NAO forecast errors at lead times of 15–30 days are plotted against Û when the forecasts were initialized. The correlation coefficient is 0.25, and this is statistically significant at the 5% level, according to a bootstrapping test that takes the intraseasonal autocorrelation of the time series into account. The correlation is even significant for the cluster-C members alone (the gray circles in the scatterplot). This is a powerful result, which shows that there is a general relationship between NAO forecast errors and the initial vortex state.
c. Downward impact
The results in the previous section made it clear that the initial state of the stratospheric polar vortex is linked to SLP forecast errors in the North Atlantic region between 15 and 30 days into the forecasts. It is possible that the NAO forecast errors can be explained by an erroneous representation of the downward impact of stratospheric vortex anomalies. To test this, we calculated zonal wind anomalies at 10 hPa and geopotential height anomalies at 50, 100, and 200 hPa in both the model and the reanalysis during the period from 15 to 30 days after the forecasts in each cluster had been initialized.
As we already know from Fig. 4a, the initial polar vortex is weaker than normal in the cluster-A forecasts (Fig. 6a). With increasing lead time, the negative anomalies diminish in magnitude. It is remarkable how closely the forecast anomalies follow the reanalysis. The differences are nonsignificant at all lead times. The same applies to the anomalies for the cluster-B cases (Fig. 6b). In summary, this shows that the mean zonal wind forecast at 10 hPa is successful for both cluster A and cluster B at lead times of up to 30 days.
Figure 6c shows how the cluster-A NAM index forecast errors evolve with increasing lead time. Already 8 days into the forecast, the error becomes significant at all three levels. At lead times longer than about 10 days, the errors are consistently smaller at 50 hPa than at 100 hPa, and smaller at 100 hPa than at 200 hPa. As was seen for the NAO index in Fig. 3a, the 200-hPa NAM index in the reanalysis gradually becomes less negative with increasing lead time, whereas the forecast NAM index first becomes more negative and then stays negative. Similar evolutions, although less pronounced, are seen for the NAM index at 50 and 100 hPa. In the cluster-B cases shown in Fig. 6d, the forecast and reanalysis-based NAM index values also diverge after about 10 days, and the mean forecast error grows with decreasing height, just as it did in the cluster-A cases.
d. Persistence of forecast errors
A timeline of how the forecasts are distributed among the clusters is shown in Fig. 7. The forecasts have a tendency to belong to the same cluster for multiple subsequent initialization dates. Cluster B has the highest persistence, with 64% of cluster-B forecasts immediately followed by forecasts in the same cluster. The corresponding fractions for cluster A and cluster C are 57% and 59%, respectively. Note, however, that these percentages should only be interpreted relative to each other, since some persistence is expected merely due to the fact that we consider 16-day means of forecasts that are separated by only 3 or 4 days. Figure 7 also shows the dates on which U < 0, along with the major SSW central dates. Of the total number of dates in the analysis period, U was negative on 8%. To identify an equal number of weak and strong vortex dates, we defined strong vortex dates as the dates on which Û was higher than its overall 92nd percentile. These dates are also marked in Fig. 7.
Roughly one-half (23 of 50) of the forecasts that were initialized when U < 0 belong to cluster A, even though the cluster-A cases only constitute about 27% of the total number of forecasts. The major SSW in the beginning of January 2019 was followed by 11 consecutive forecasts in cluster A (corresponding to more than 5 weeks, as there are two forecasts per week). Another long cluster-A sequence following an SSW occurs after the SSW in January 2009.
Cluster B makes up 17 of the 47 forecasts (36%) that were initialized on strong vortex dates. But as the cluster-B forecasts make up 30% of the total, this is not remarkable. It is more revealing that only 4 of the 47 strong vortex forecasts belong to cluster A. Two sequences of 10 or more consecutive cluster-B forecasts occur in 2012/13 and 2006/07, but these are not linked to strong initial vortex states. The long run with mainly cluster-B forecasts early in the winter of 2015/16 is, however, associated with a long period of strong vortex dates.
In the next section, the 2018/19 and 2015/16 winters are used to illustrate how cluster-A and cluster-B SLP forecast errors relate to regional temperature forecast errors.
e. Links to temperature forecast errors
To show how the forecast errors evolved during the 2018/19 winter, we calculated the NAO index for the forecast and the reanalysis, averaged for lead times of 15–30 days after each of the forecast initialization dates. These NAO index time series are shown in Fig. 8a, along with the standardized wind anomaly Û on initialization. The major SSW central date (2 January 2019) is indicated by a vertical line. The shaded area denoting the difference between the forecast and the reanalysis-derived NAO index shows that the forecasts were very good prior to the SSW, but after this the forecast NAO index was far too low. As expected, the too-negative NAO forecasts belong to cluster A, as indicated by the blue circles along the bottom of the graph.
The average 2-m temperature anomalies 15–30 days into the 11 cluster-A forecasts are shown in the map in Fig. 8b, and the geographical pattern clearly resembles the typical temperature anomaly pattern accompanying the negative phase of the NAO, which consists of anomalously cold temperatures in northern Europe and on the U.S. East Coast, in tandem with anomalously warm temperatures in North Africa and in the northwest Atlantic (e.g., van Loon and Rogers 1978; Domeisen et al. 2020c).
To provide some context of just how large the forecast errors were in this case in a specific region, we calculated the area-averaged 2-m temperature errors at lead times between 15 and 30 days inside the northern European region outlined in Fig. 8b. The first 10 of the 11 cluster-A forecasts after the major SSW were too cold in this region. The largest error occurred for the forecast initialized on 24 January 2019, when the forecast was 2.2σ colder than the reanalysis. To put this in perspective, this was the third-largest negative northern European temperature forecast error following major SSWs in the study period. The two even larger errors were found after the 5 January 2004 SSW, when the forecast initialized on 17 January had a temperature error of −2.4σ 15–30 days into the forecast, and after the 31 December 2001 SSW, when the forecast initialized on 7 January 2002 was 2.3σ colder than the reanalysis for the same lead times.
In Fig. 7, several series of consecutive cluster-B forecasts were evident, and some of these could be linked to strong initial vortex states. In Fig. 9a, the NAO forecast error during the winter of 2015/16, during which one such sequence occurred, is shown. The bars representing Û show that, except for a few days in February, the initial stratospheric vortex was strong throughout the winter. Most of the forecasts up until the one initialized on 31 December 2015 belong to cluster B. The forecast NAO index was consistently too high for the cluster-B members. The average 2-m temperature anomaly for the cluster-B forecasts (including the one initialized on 4 February 2016) is shown in Fig. 9b. These forecasts were too warm in northern Europe, with average errors of more than 2 standard deviations in Finland.
4. Summary and discussion
In this paper, we have investigated the sources of subseasonal forecast errors over the North Atlantic sector. A cluster analysis identified cases of significant discrepancies between the predicted and observed sign of the NAO. While such errors might be expected to occur randomly in forecasts of a complex system with limited predictability, common dynamical structures were seen in both the troposphere and stratosphere in these cases, revealing systematic model biases in stratosphere–troposphere coupling in certain situations. It is worth noting that the results presented here were based on model data from only one leading prediction center (ECMWF). Further study is needed to determine if other subseasonal ensemble systems exhibit similar characteristics, in particular low-top models. Butler et al. (2020) identified considerable diversity in the bias of the NAM response among S2S prediction systems for the 2018 and 2019 SSW events, some of which can be traced back to the stratosphere. We emphasize that even though our results are statistically significant, there is still much diversity in the relationship between NAO errors and the initial state of the polar vortex (Fig. 5).
A cluster analysis enabled the cases exhibiting errors in the NAO forecast to be isolated. These cases exhibit a remarkable persistence of forecast errors, which are seen to emerge in forecast week two and persist over the subsequent two weeks (see Fig. 3). Forecast errors can also persist for several consecutive forecasts, as highlighted by the long series of too-negative NAO forecasts during the winter of 2018/19 (see Fig. 8). This suggests that the emerging errors in forecasts from the preceding week or two may provide useful information on likely errors for today’s forecasts.
The overestimation of the negative NAO response in the troposphere likely results from an overestimation of the tendency of weak vortex states to be followed by typical negative NAO phases. As shown for the example of the 2019 SSW, the negative NAO response was severely overestimated and resulted in much too cold temperature forecasts for northern Europe. Rao et al. (2019) found that the onset of the January 2019 SSW event was forecast in the ECMWF model 25 days before the event, which is notably longer than for most other SSW events (Domeisen et al. 2020c). Nevertheless, the downward impact was not correctly forecast by the model, highlighting that forecasting SSWs and forecasting their downward coupling are separate problems. We mention here that a follow-up study on how the forecast errors were distributed vertically for specific events, including both SSWs and strong vortex events, would be worthwhile for enhancing our understanding of the mechanisms that give rise to the forecast errors near the surface.
In a recent paper related to the one presented here, Choi and Son (2019) used the ECMWF model to investigate composites for high and low SLP forecast skill north of 20°N. Their high skill composite, which corresponded to enhanced skill over parts of the North Atlantic and North Pacific, was associated with a 10-hPa geopotential height dipole anomaly pattern. As we did not fully account for zonally asymmetric flow, our study is not directly comparable, but it is worth noting that Choi and Son (2019) showed that different asymmetries with respect to the wave phasing in the stratospheric flow can have an impact on tropospheric prediction skill.
In another related study, Matsueda and Palmer (2018) investigated the flow-dependent predictability of the North Atlantic/European region in the THORPEX Interactive Grand Global Ensemble (TIGGE) and found a general overestimation of the frequency of the negative NAO regime. They related this to too frequent transitions from zonal to “wavier” regimes like the negative NAO, rather than an overestimation of its persistence. This is not necessarily at odds with the overly persistent negative NAO cases reported here. Their focus was the shorter 10–15-day time scale, on which stratospheric dynamics are likely to play a weaker role than in our forecasts. Also note that Matsueda and Palmer (2018) used a four-regime system, and so some of the transitions between these four may occur within one of the larger clusters used here. Thus, the overly frequent negative NAO transitions in the forecasts of Matsueda and Palmer (2018) likely manifest as enhanced persistence of the negative NAO cases in our analysis.
We emphasize that the origin of tropospheric NAO biases is not limited to the stratosphere. North Atlantic winter variability can also be influenced by the tropics (Scaife et al. 2017; Jiménez-Esteve and Domeisen 2018), by the Arctic (e.g., Kolstad and Årthun 2018; Blackport and Screen 2019), by the upstream circulation in the North Pacific (Drouard et al. 2013, 2015), and by the underlying ocean (Visbeck 2002).
Stratospheric variability is widely recognized as an important source of predictability on subseasonal time scales. The results presented here highlight that the diversity in the tropospheric response to stratospheric forcing is a particular challenge for forecast models.
Subseasonal prediction lies in an intermediate band of time scales between those dominated by initial condition information, as in weather forecasting, and those dominated by boundary condition information, as in near-term climate predictions. Weather forecasts have traditionally suffered from a lack of diversity or spread in model ensembles (e.g., Buizza 1997), whereas near-term climate predictions of the NAO suffer from the opposite problem of having too-large internal variability within ensembles (e.g., Eade et al. 2014). The forecast errors presented here could provide examples of the former of these issues on the stratospheric weather time scale, as the model appears to be constrained too strongly by the polar vortex state in the initial conditions.
Overcoming the challenge of allowing for diversity in dynamical stratosphere–troposphere coupling could significantly improve the skill of subseasonal predictions for events such as cold temperature extremes in northern Europe and storm track anomalies in central and southern Europe in winter.
Author Kolstad was supported by the Research Council of Norway through the Seasonal Forecasting Engine project (Grant 270733). Support to authors Domeisen and Wulff from the Swiss National Science Foundation through project PP00P2_170523 is gratefully acknowledged. Author Woollings acknowledges support from the Research Council of Norway, Grant 310391.