1. Introduction
The wintertime extratropical stratospheric variability is known to be predictable at longer lead times than that in the troposphere (Waugh et al. 1998; Jung and Leutbecher 2007; Taguchi 2014). This predictability can be attributed to a lack of large-scale hydrodynamic instabilities (baroclinic or barotropic), the dominance of quasi-stationary planetary-scale waves, and long radiative damping time scales. All of these factors lead to a longer time scale of the wintertime stratospheric dynamical variability, which is seen in observations (Baldwin et al. 2003) and reproduced by climate models (Gerber et al. 2010). There is special interest in the predictability of extreme anomalies in the extratropical stratospheric circulation such as sudden stratospheric warmings (SSWs), which are extraordinary events characterized by an abrupt rise in polar stratospheric temperatures by several tens of degrees within a few days, accompanied by a quick deceleration of the westerly winds. This interest is motivated by the need to improve weather forecasts. It is known that during extreme stratospheric conditions, such as SSWs, the stratosphere exerts a downward influence on the tropospheric circulation that often leads to persistent weather patterns lasting for several weeks (e.g., Baldwin et al. 2003; Thompson et al. 2002; Lehtonen and Karpechko 2016). Specifically, the tropospheric circulation after SSWs often remains in the negative phase of the northern annular mode (NAM; Baldwin et al. 2003), with corresponding weather impacts including higher than normal sea level pressure over the Arctic, an increased frequency of cold events in northern Eurasia and eastern North America (Thompson et al. 2002; Kolstad et al. 2010), and increased precipitation in southwestern Europe (Sigmond et al. 2013). Several studies (e.g., Sigmond et al. 2013; Tripathi et al. 2015a) demonstrated that during such periods extended weather forecasts for certain regions are more skillful. Note that, although the influence of SSWs on the troposphere is statistically robust, not all SSWs have considerable tropospheric impacts (e.g., Karpechko et al. 2017).
The recent review by Tripathi et al. (2015b) showed that the predictability of SSW events, as reported by different studies, varies between 6 and 30 days. Although this spread may reflect differences in the predictability of particular events and in the skill of forecasting systems, it may also stem from the different approaches used to quantify predictability. Tripathi et al. (2016) performed a detailed predictability study of a single SSW event that occurred during January 2013 using five forecast models and showed that most of the models predicted the SSW when initialized 10 days before the event but failed to predict it when initialized 15 days before the event. Whether or not this result can be generalized to other events remains an open question.
Several more studies (e.g., Kim and Flatau 2010; Mukougawa and Hirooka 2004) focused on the predictability of single events; however, only a few reported the predictability of various events using the same system and same approach. Marshall and Scaife (2010) analyzed five events and reported that their predictability was between 9 and 15 days. Taguchi (2016b) analyzed the predictability of SSWs in a single system covering the period from 2001/02 to 2012/13 and showed that, on average, the fraction of forecast ensemble members that predict an SSW increases from 20% on day 15, to 30% on day 10, and to 70% on day 5. In another paper, Taguchi (2016a) analyzed SSW predictability during the period 1979–2012 and suggested that the predictability of SSWs depends on the geometry of the polar vortex, with split SSWs being less predictable than displacement SSWs. However, it remains unclear whether these results reflect the skill of particular forecast systems, or the intrinsic predictability of the SSW events. Further studies are needed to test these results with different forecast systems to improve our understanding of SSW predictability.
The goal of our study is to document SSW predictability limits in the European Centre for Medium-Range Weather Forecasts (ECMWF) extended-range forecast system. The advantage of our study is that we use a relatively large number of reforecasts (about 22 month⁻¹) in comparison to previous studies, which allows us to better quantify predictability limits. The dataset is described in section 2. In section 3 we assess the skill of forecasts in the wintertime Arctic stratosphere and analyze the predictability of SSW events. The conclusions are given in section 4.
2. Data and methods
We use retrospective forecasts (hindcasts) of the extended-range forecast system of ECMWF (Vitart 2014). After an update in November 2013, the system has 91 levels in the vertical with the top level at 0.01 hPa, which makes it well suited for studying stratospheric predictability. Along with the real-time forecasts, the system also produces hindcasts needed to correct systematic model biases. The hindcasts are initialized on the same calendar date as the forecasts but for the 20 preceding years. Since our focus is on the predictability of SSWs, we use hindcast sets initialized between 1 November and 31 March in conjunction with forecasts for four winters: 2013/14–2016/17. For winter 2013/14 we only use hindcasts after 21 November (i.e., after the update of the system). The hindcasts are initialized once a week (Thursdays) during winters 2013/14 and 2014/15 and twice a week (Mondays and Tuesdays) during winters 2015/16 and 2016/17. We use all hindcasts except those corresponding to Monday's forecasts from 2016/17; the dates of these hindcasts coincide with the dates of the 2013/14 forecasts, so to avoid overlap and to simplify data handling they are not used. Altogether, 2120 hindcasts corresponding to 106 forecast dates and covering extended winters 1993/94 to 2015/16 are analyzed. In terms of temporal coverage, the hindcasts are available for 61% of all dates during the extended winters of 1993/94–2015/16.
Each hindcast is an ensemble of perturbed forecasts consisting of either 5 (winters 2013/14 and 2014/15) or 11 (winters 2015/16 and 2016/17) members. The hindcasts from the first two winters run for 32 days while those from the last two winters run for 46 days. For consistency we only analyze the first 32 days from all hindcasts.
We assess the skill of zonal wind (U10) and geopotential height (Z10) hindcasts at the 10-hPa pressure level, which is the standard level used to diagnose SSWs. Forecast skill is assessed in terms of the correlation coefficient and absolute error for the zonal winds, and the root-mean-square error (RMSE) and anomaly correlation coefficient (ACC) for the polar cap (60°–90°N) geopotential height. For verification we use ECMWF's interim reanalysis (ERA-I; Dee et al. 2011).
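For illustration, these polar cap diagnostics can be computed as in the following minimal Python sketch. This is not the code used to produce the results: the cosine-latitude area weighting over the cap and all function names are our assumptions.

```python
import numpy as np

def cap_weights(lat):
    """Cosine-latitude weights, zeroed outside the 60-90N polar cap.

    lat: array of latitudes (degrees north), broadcastable to the field shape.
    """
    w = np.cos(np.deg2rad(lat))
    return np.where(lat >= 60.0, w, 0.0)

def rmse(fc, an, lat):
    """Area-weighted RMSE of a forecast field against the verifying analysis."""
    w = cap_weights(lat)
    return np.sqrt(np.sum(w * (fc - an) ** 2) / np.sum(w))

def acc(fc_anom, an_anom, lat):
    """Area-weighted anomaly correlation coefficient over the polar cap.

    Inputs are anomalies from the respective forecast and observed
    climatologies, on the same latitude-longitude grid.
    """
    w = cap_weights(lat)
    cov = np.sum(w * fc_anom * an_anom)
    return cov / np.sqrt(np.sum(w * fc_anom ** 2) * np.sum(w * an_anom ** 2))
```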
There were 13 major SSWs during the period analyzed here. The central dates for these events, the event type, duration, and intensity are listed in Table 1. The dates are the same as in Karpechko et al. (2017) and are defined as dates between 1 November and 31 March when the daily mean zonal mean winds at 60°N and 10 hPa reverse from westerly to easterly, indicating a weakening of the stratospheric polar vortex, and are followed by a return to westerlies lasting at least 10 days before the final warming (so that final warmings are excluded).
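A minimal sketch of this detection criterion is given below, assuming a daily wind series for one extended winter; the exact event-separation and final-warming rules applied by Karpechko et al. (2017) may differ in detail, and the function name is illustrative.

```python
import numpy as np

def ssw_central_dates(u, in_window, min_return=10):
    """Detect SSW central dates in a daily series of zonal-mean zonal wind
    at 60N and 10 hPa (m/s) covering one extended winter.

    u: 1D daily wind array; in_window: boolean mask, True for 1 Nov-31 Mar;
    min_return: number of consecutive westerly days required after the
    event, so that final warmings are excluded.
    Returns the indices of the central dates (first easterly day of each event).
    """
    central = []
    d = 1
    while d < len(u):
        if in_window[d] and u[d] < 0.0 and u[d - 1] >= 0.0:
            # Candidate central date: confirm a later return to westerlies
            # lasting at least `min_return` consecutive days.
            run = 0
            for k in range(d + 1, len(u)):
                run = run + 1 if u[k] >= 0.0 else 0
                if run >= min_return:
                    central.append(d)
                    d = k  # skip past the recovery so each event is counted once
                    break
        d += 1
    return central
```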
Table 1. Central dates for the SSWs during 1993–2016. The events with strong tropospheric impacts according to Karpechko et al. (2017) are shown in boldface, while those without impacts are shown in italics. The third column shows the SSW type (S, split; D, displacement). The fourth column shows the duration as the number of days during which the zonal winds remained easterly before first returning to westerly. The fifth column shows the SSW intensity as the minimum daily mean zonal wind during the SSW period.


SSW predictability is assessed by analyzing ensemble mean zonal wind hindcasts and also by calculating the SSW probability as the probability that the zonal winds on each forecast day are easterly. Several previous studies (e.g., Tripathi et al. 2016; Taguchi 2016b) calculated the SSW probability by counting the fraction of ensemble members that satisfy the SSW criterion during each forecast day. Since the size of the forecast ensemble in our study is rather small (5 or 11 members), applying this method here would result in a coarse probabilistic forecast resolution (i.e., only probabilities in multiples of 0.2 would be possible in the case of 5 members). Therefore, we use a different approach in which we calculate the probability density function (PDF) of the zonal wind forecast, assuming a normal distribution defined by the mean and standard deviation of the ensemble. The SSW probability on each forecast day is then simply the fraction of the PDF area that is below 0 m s⁻¹. The assumption of a normal distribution may not be well justified because it ignores the possibility of multiple flow regimes. However, comparison with other studies addressing specific events, such as Tripathi et al. (2016), shows similar results, suggesting that our approach is reasonable.
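The calculation amounts to evaluating the normal cumulative distribution function at 0 m s⁻¹, as in the minimal sketch below (the function name and the small guard against zero spread are our additions).

```python
import numpy as np
from scipy.stats import norm

def ssw_probability(members):
    """Daily SSW probability from an ensemble of zonal-mean zonal wind
    forecasts at 60N and 10 hPa.

    members: array (n_members, n_days), winds in m/s. For each forecast
    day a normal PDF is defined by the ensemble mean and standard
    deviation; the SSW probability is the fraction of that PDF below
    0 m/s, i.e. the probability of easterly winds.
    """
    mu = members.mean(axis=0)
    sigma = members.std(axis=0, ddof=1)      # ensemble spread
    sigma = np.maximum(sigma, 1e-9)          # guard against zero spread
    return norm.cdf(0.0, loc=mu, scale=sigma)
```

With a 5-member ensemble, counting members below 0 m s⁻¹ yields only probabilities in steps of 0.2, whereas the fitted CDF varies continuously with the ensemble mean and spread.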
3. Results
a. Wintertime Arctic stratosphere predictability
In this section we start by assessing the forecast skill in the wintertime Arctic stratosphere during November–March and then look in more detail at the predictability of individual SSW events in section 3b. Figure 1a shows the correlation between the observed and hindcast zonal mean zonal winds as a function of hindcast lead time. Figure 1 confirms the skill level reported by Vitart (2014) for the previous 62-level version of the ECMWF model and also shows a moderate improvement after the introduction of the 91-level version. In the previous version the correlations decreased to 0.6 and 0.8 on forecast days 27 and 16, respectively (Vitart 2014), while in the present version they decrease to the same levels on forecast days 29 and 18, suggesting a gain of about two forecast days. It is reasonable to assume that persistence associated with the long time scales of the stratospheric variability is an important source of forecast skill throughout the stratosphere. Indeed, the skill of persistence forecasts, calculated as a correlation between the observed winds at day 0 and at all following dates, is about 0.6 after two weeks (Fig. 1a, dashed line). It is important to note that the forecast system performs considerably better than the persistence forecast at all lead times.
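The persistence benchmark can be computed as in the sketch below, assuming a continuous daily series of observed winds and the indices of the 106 initialization dates; the names are illustrative.

```python
import numpy as np

def persistence_skill(u_obs, start_idx, max_lead=32):
    """Persistence-forecast skill: correlation between observed winds on
    the initialization dates and the observed winds k days later.

    u_obs: 1D daily observed wind series at 60N, 10 hPa; start_idx:
    integer indices of the hindcast initialization dates within u_obs
    (start_idx + max_lead must stay inside the series).
    Returns an array of correlations for leads 1..max_lead.
    """
    u0 = u_obs[start_idx]
    return np.array([np.corrcoef(u0, u_obs[start_idx + k])[0, 1]
                     for k in range(1, max_lead + 1)])
```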

Fig. 1. (a) Correlation skill of zonal mean zonal wind hindcasts at 10 hPa and 60°N. The solid line shows the anomaly correlation and the dashed line the persistence forecast. (b) ACC and (c) RMSEs of the polar cap (60°–90°N) geopotential height hindcasts. In (b),(c) solid black lines show mean diagnostics for all hindcasts, dark shading shows 25th–75th percentiles, and light shading shows 5th–95th percentiles. Solid (dashed) red lines show mean diagnostics (25th–75th percentiles) for the hindcasts initialized 10–15 days before SSWs.
Figure 1b shows anomaly correlation coefficients (ACCs) for the polar cap Z10. This diagnostic takes into account the predictability of the whole flow system in the latitude–longitude plane; therefore, it is not surprising that the ACC decreases faster with lead time than the correlation skill of the zonal mean wind forecasts. When averaged over all hindcasts, ACC reaches levels of 0.6 and 0.8 on forecast days 16 and 11, respectively. However, the spread of the skill is large: in the worst 5% of the hindcasts the skill decreases below 0.6 on day 9 while in the best 5% the skill is much higher and exceeds 0.8 even after 1 month.
We next address the following question: What are the meteorological conditions leading to the largest forecast errors? Figure 2 shows the evolution of the forecasted and observed winds averaged across the 5% of cases when the zonal mean zonal wind hindcasts at 60°N show the largest absolute errors by day 15. The 5% of cases when the hindcasts show the smallest zonal wind errors are also shown for comparison. Qualitatively similar results are found if the selections of the cases are based on day 10 or day 20 errors. It can be seen that the hindcasts with the largest errors are initialized during periods with stronger than normal zonal winds, which then weaken by day 15. The hindcasts capture the initial deceleration; however, they are not able to reproduce the intensity of the wind weakening. In comparison, the forecasts with the smallest errors are initialized during weaker than normal zonal winds, which change little during the hindcast period.
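The compositing step can be sketched as follows; selecting cases by the absolute ensemble-mean error on day 15 and the names used here are our assumptions about the procedure described in the text.

```python
import numpy as np

def error_composites(u_fc, u_obs, day=15, frac=0.05):
    """Composite wind evolution for the hindcasts with the largest and
    smallest absolute zonal wind errors on a given forecast day.

    u_fc: (n_hindcasts, n_days) ensemble-mean hindcast winds at 60N,
    10 hPa; u_obs: the verifying observed winds aligned to the same
    initialization dates.
    """
    err = np.abs(u_fc[:, day] - u_obs[:, day])
    n = max(1, int(round(frac * len(err))))   # 5% of 2120 hindcasts = 106 cases
    order = np.argsort(err)
    best, worst = order[:n], order[-n:]
    return {"largest_obs":  u_obs[worst].mean(axis=0),
            "largest_fc":   u_fc[worst].mean(axis=0),
            "smallest_obs": u_obs[best].mean(axis=0),
            "smallest_fc":  u_fc[best].mean(axis=0)}
```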

Fig. 2. Zonal mean zonal winds at 60°N and 10 hPa averaged over the periods with the 5% largest (red) and 5% smallest (blue) absolute errors of the 15-day hindcast zonal winds. Each composite consists of 106 cases. Solid lines show ERA-I and dashed lines show hindcasts. Dots indicate days when the mean observed zonal winds over the periods with the largest and smallest hindcast errors are different at p = 0.05. The dotted gray line shows climatological winds averaged over all hindcast periods.
Figure 3 shows a similar comparison for the Z10 fields. Here, the selections are based on the 5% of cases with the largest and smallest RMSEs by day 15 (Fig. 1c). Selections based on the ACC diagnostics show broadly similar patterns but with a smaller magnitude of the anomalies. The results seen in Fig. 3 are consistent with those in Fig. 2 and provide additional insights. The hindcasts with the largest RMSEs by day 15 are initialized when a stronger than normal polar vortex is located over the European Arctic and an anticyclonic anomaly is present over Canada. By day 15 the vortex weakens and a wavenumber-2 structure appears. We note, however, that there is considerable case-to-case variability. Figure 3 also shows, for comparison, the 5% of hindcasts with the smallest errors. In this case the hindcasts are initialized when the polar vortex is weaker than normal. By day 15 the anomalies weaken, consistent with Fig. 2; however, the pattern shows only small changes.

Fig. 3. Polar cap (60°–90°N) geopotential height anomalies at 10 hPa (Z10, m). The Z10 values are averaged over the periods with (a),(c),(e) the 5% largest and (b),(d),(f) 5% smallest RMSEs of the 15-day hindcast Z10. Each composite consists of 106 cases. Shown are results for (a),(b) ERA-I for day 0; (c),(d) ERA-I for day 15; and (e),(f) hindcasts for day 15.
In summary, the hindcasts in the Arctic wintertime stratosphere are skillful (correlation skill >0.6) at lead times beyond 2 weeks, which is consistent with most previous studies, and they show a modest improvement over the skill reported by Vitart (2014) as a result of model upgrades. The largest forecast errors correspond to cases of vortex weakening. We next move to assessing the predictability of the SSW events.
b. Predictability of SSWs
We start this section by looking at the Z10 forecast skill diagnostics. The ACC and RMSEs for the hindcasts initialized between 10 and 15 days before the SSW central dates are shown in Figs. 1b and 1c. Looking first at ACC (Fig. 1b), we note that, at most lead times, there is little difference between hindcasts initialized before SSWs and all other hindcasts. On the other hand the RMSE diagnostic (Fig. 1c) shows larger errors for the hindcasts initialized before SSWs. On average, the RMSEs are larger by 40%–60% at lead times shorter than 20 days in hindcasts initialized before SSW events in comparison to all other hindcasts. There is a decrease in RMSEs for SSW events after day 20, which corresponds to the recovery from SSWs. During this period, the magnitude of the anomalies decreases and the absolute errors become smaller even if there is no skill improvement in terms of ACC (Fig. 1b). Analysis of ACC and RMSE diagnostics suggests that the system predicts the spatial patterns of the anomalies during SSW events with the same skill as during other situations; however, the magnitude of the events is underestimated.
We next address the following question: How long in advance can the SSW events be predicted? Figure 4 shows ensemble mean hindcasts initialized within 1 month before the SSW central dates for each SSW event (Table 1). Note that hindcast lead times vary between the events depending on the availability of the hindcasts. Figure 4 shows that, in many cases, hindcasts initialized less than 10 days before the SSWs predict events close in time to the ones that are actually observed. However, SSWs 5, 6, and 9 are not predicted even at lead times of 4–6 days. It can be seen from Table 1 that these are weak events lasting only a few days; therefore, in these cases even a relatively small forecast error may result in forecasted zonal winds missing the SSW criterion. The ACC and RMSE diagnostics for these events do not show anomalous values in comparison to the other SSWs (Table 2), suggesting that the apparently low predictability of these events is likely due to methodological problems rather than to, for example, low atmospheric predictability.

Fig. 4. Zonal mean zonal winds at 60°N and 10 hPa for the 13 SSW cases that occurred during 1993–2016. ERA-I is shown with black lines, red lines show hindcasts that predict an SSW to occur within 3 days from the date they actually occurred, yellow lines show hindcasts that predict an SSW but on a date that differs by more than 3 days from when it actually occurred, and the gray lines show hindcasts that do not predict an SSW. Blue shading marks periods when the observed winds were negative. All available hindcasts initialized 1–32 days before SSWs are shown.
Table 2. SSW hindcast statistics. The second through sixth columns show results from 10-day hindcasts, except for SSWs 5 and 9, for which 11-day results are shown. The second column shows the minimum ensemble mean wind within 3 days of the central date. The third and fourth columns show the ACC for Z10 on the central date (absolute value and corresponding percentile across all 10-day hindcasts). RMSEs are shown in the fifth and sixth columns. The seventh and eighth columns show the forecast lead days when the predicted SSW probability dropped below 0.5 and 0.9 for the first time, respectively. Numbers in parentheses show the number of lead days when the predicted SSW probability dropped below the corresponding limits and, additionally, the event was predicted within 3 days of the actual one. The events with strong tropospheric impacts according to Karpechko et al. (2017) are shown in boldface, while those without impacts are shown in italics.


For events 2, 3, and 10 some hindcasts predict an SSW close to the actual dates already at lead times of 20–30 days. However, these hindcasts are followed by hindcasts initialized at shorter lead times that do not predict an SSW, which raises doubts as to whether such long lead times can be interpreted as predictability limits for these events. It is possible that the loss of predictability at shorter lead times is due to larger initialization errors during certain meteorological conditions, or that the relatively small size of the hindcast ensembles is not sufficient to sample all possible meteorological situations; however, these hypotheses should be studied separately. Interestingly, Mukougawa and Hirooka (2004) reported a successful forecast of SSW 1 at a lead time of 30 days; however, this event is not predicted at such long lead times in the ECMWF system.
It is interesting to compare our results with those of Tripathi et al. (2016), who studied the predictability of SSW 13 using several forecast models but only looked at forecasts initialized every fifth day (i.e., days 15, 10, and 5 before the SSW). They showed that most forecast models predicted this event at days 5 and 10 but not at day 15. Here, we find that the ensemble mean hindcasts predict the SSW at lead times of 13 days and less, but not at longer lead times, which is in agreement with Tripathi et al. (2016). Further analysis (not shown) reveals that, while hindcasts initialized 10 or fewer days before the event predict a split type of SSW, as observed, the 12- and 13-day hindcasts predict a displacement-type SSW. Moreover, the ensemble mean of the day 12 hindcast predicts an SSW to occur only 16 days after the actual event. A mismatch in time between actual and predicted SSWs of up to 2 weeks can also be seen in some other hindcasts for several events: SSWs 2, 3, 5, 8, and 12 (Fig. 4).
We also consider false alarms, or cases when the model predicted an SSW but none actually occurred. Using ensemble mean winds to define SSWs, we find that the total number of false alarm cases is 33, which corresponds to a false alarm rate (the ratio of false alarms to the total number of SSW nonoccurrences) of 0.03. This means that a very small fraction of forecasts predicted an SSW when none actually occurred. Most of these false alarms correspond to situations in which the observed winds weaken to 5 m s⁻¹ or less but remain positive (i.e., no SSW occurred), with only a few cases when the observed winds remained stronger than 10 m s⁻¹ while the model predicted negative winds (i.e., an SSW).
We next consider probabilistic SSW forecasts. The SSW probability is calculated for each forecast day using the spread across ensemble members, as explained in section 2. The maximum daily SSW probability during a 32-day forecast period is thereafter used as a measure of the risk of having an SSW during the forecast period. Note that this measure does not indicate for which date an SSW is forecasted; that is, it could be forecasted for a different date than the one actually observed. For example, a hindcast initialized 10 days before SSW 4 (11 February 2001) predicted an SSW to occur (ensemble mean winds become easterly) on 11 February 2001, while a hindcast initialized 9 days before the same SSW predicted an SSW to occur on 16 February 2001; nevertheless, both hindcasts predicted a high probability of an SSW during the forecast period, with maximum daily probabilities of 0.70 and 0.99, respectively. In the following we will refer to the maximum daily SSW probability as the forecasted SSW probability or, simply, as the SSW probability.
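Continuing the sketch from section 2, the risk measure is simply the maximum of the daily probabilities over the hindcast period (again a sketch; the function name is illustrative).

```python
import numpy as np
from scipy.stats import norm

def forecasted_ssw_probability(members):
    """Risk of an SSW at any time during the forecast: the maximum of
    the daily SSW probabilities over the 32-day hindcast period.

    members: array (n_members, n_days) of zonal-mean zonal wind at 60N,
    10 hPa (m/s).
    """
    mu = members.mean(axis=0)
    sigma = np.maximum(members.std(axis=0, ddof=1), 1e-9)
    return float(norm.cdf(0.0, loc=mu, scale=sigma).max())
```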
Figure 5 shows the forecasted SSW probabilities for all SSWs as a function of forecast lead time with respect to the actual central date. Note that an SSW probability of 0.5 means that the ensemble mean zonal wind becomes easterly according to the definition used here (section 2). There is good correspondence between our SSW probability diagnostic and the frequency of SSW occurrence. This correspondence can be seen from the reliability diagram (Wilks 2006) shown in Fig. 6. The reliability diagram shows the frequency of the observed events as a function of their forecasted probability; that is, it indicates how often events predicted with a given probability have actually occurred. The proximity of the curve to the diagonal indicates that events predicted to occur with, say, a probability of 0.5 have actually occurred in 50% of cases, which means that the forecasts are well calibrated, or reliable.
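A reliability curve of this kind can be computed as sketched below; the choice of 10 probability bins and the handling of empty bins are our assumptions. The 5%–95% range in Fig. 6 is obtained by bootstrap resampling of the forecast–outcome pairs (repeated 10,000 times), which is omitted here for brevity.

```python
import numpy as np

def reliability_curve(p_fc, occurred, n_bins=10):
    """Observed SSW frequency as a function of forecasted probability.

    p_fc: forecasted SSW probabilities, one per hindcast; occurred:
    boolean array, True where an SSW was actually observed during the
    corresponding 32-day verification window.
    Returns (mean forecast probability, observed frequency, count) per
    nonempty bin.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_fc, edges) - 1, 0, n_bins - 1)
    mean_p, obs_freq, counts = [], [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            mean_p.append(p_fc[sel].mean())        # mean forecast probability
            obs_freq.append(occurred[sel].mean())  # observed frequency
            counts.append(int(sel.sum()))
    return np.array(mean_p), np.array(obs_freq), np.array(counts)
```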

Fig. 5. Predicted SSW probability for each SSW as a function of lead time calculated with respect to the actual SSW central date. Each panel shows the probability for three to four SSWs grouped according to their time of occurrence for visualization purposes.

Fig. 6. Reliability diagram of the predicted SSW probability when (a) the whole forecast period is considered and (b) when only forecast days 15–32 are considered. The reliability curve is in red. The size of the dots is proportional to the number of forecasts with given forecast probability. The numbers of forecasts in each probability bin are shown next to the dots. The dotted line shows the observed SSW occurrence frequency during a 32-day period of 0.13. Dashed lines bound the area where forecasts are skillful. The 5%–95% uncertainty range for the forecasts is estimated by a bootstrapping procedure repeated 10 000 times.
Only hindcasts initialized during periods of westerly winds (92% of the hindcasts) are included in Fig. 6 because for the hindcasts initialized during periods of easterly winds the SSW probability is 1 by definition. Furthermore, only hindcasts initialized from November to February are considered. This constraint is applied to exclude the transition to summer circulation (i.e., final warmings), which typically happens by April (e.g., Karpetchko et al. 2005); including March hindcasts would also reflect the model's skill in capturing the seasonal cycle rather than subseasonal variability. The SSW occurrence probability during a 32-day hindcast period is estimated to be 0.13; this is calculated from the observed frequency of SSW occurrence of 0.6 per November–March season (Butler et al. 2015), assuming an equal SSW occurrence probability on each day of the 151-day season (i.e., 32 × 0.6/151 ≈ 0.13). Using ensemble mean zonal winds to define SSWs in the hindcasts, we find a very similar frequency of SSW occurrence of 0.15 in the ECMWF model. Since SSWs are relatively rare events, most hindcasts show only a small SSW probability, and the number of hindcasts predicting a given SSW probability decreases as the probability increases (Fig. 6). However, there is a small increase in the number of hindcasts predicting a high SSW probability of >0.9. These hindcasts are mostly initialized just a few days before an SSW, when the ensemble spread is small and the predictability is close to its deterministic limit. Indeed, the number of hindcasts with SSW probability >0.9 decreases considerably when only forecast days 15–32 are considered (Fig. 6b). The predicted SSW probability is closely linked with the frequency of observed SSWs; however, there is a tendency for the forecasts to be slightly underconfident (i.e., predicting SSWs less frequently than they occur) for the probability range 0.3–0.5 and slightly overconfident (i.e., predicting SSWs more frequently than they occur) for probabilities >0.9. The underconfidence may be related to the model's positive bias in zonal winds of 1–3 m s⁻¹ at lead times of more than 12 days (not shown). The overconfidence, which is clear when the whole 0–32-day forecast period is considered but cannot be diagnosed for forecast days 15–32 alone because of the large uncertainty, may be associated with an ensemble spread that is too small during the first forecast days; this might be improved by increasing the number of ensemble members. Overall, the closeness of the reliability curve to the diagonal indicates that the SSW probability diagnostic is reliable (Wilks 2006). This conclusion applies to the results from the whole forecast period (Fig. 6a), as well as to those obtained when only forecast days 15–32 are considered (Fig. 6b).
Returning to Fig. 5, it can be seen that, for most events, the predicted SSW probability is close to 1 at lead times of less than 5 days. At longer lead times there is large variability from event to event. For example, for SSW 10 the forecasted probability is close to 1 at lead times of 13 days and less, while for SSW 6 the probability is less than 0.5 just 4 days before the event. As discussed above, defining SSW predictability based on satisfying the SSW criterion may result in low predictability for short-lived and weak events, such as SSW 6, which only lasted for 1 day with zonal mean zonal winds of −2 m s⁻¹. Moreover, short-lived events typically do not have considerable surface impacts (Karpechko et al. 2017), and therefore they may be less interesting from the surface climate predictability perspective. Focusing only on the events with long-lasting tropospheric impacts, defined by Karpechko et al. (2017) as a negative phase of the NAM at 1000 hPa lasting for at least 45 days after the SSW, it can be seen (Table 2) that, in five out of seven such events, an SSW probability >0.5 was already forecasted at lead times of 10–13 days, although in four of these cases the earliest forecasts predicted an SSW to occur 1–2 weeks later than it actually did (Fig. 4 and Table 2). The two cases among these events with the shortest predictability limits (less than 10 days), SSWs 11 and 12, are discussed in more detail below.
For event 12, a low SSW probability was predicted in hindcasts initialized between 10 and 7 days before the event. On the other hand, several hindcasts initialized 11–25 days before the actual SSW predicted a high probability of SSWs occurring as early as 30 January (i.e., 10 days before the actual event). Dörnbrack et al. (2012) analyzed the quality of the ECMWF medium-range forecasts for this event and showed that, while the initial vortex weakening in late January (see Fig. 4) was well predicted in 10-day forecasts, the evolution of the disturbed flow in early February, before the major SSW took place, was less well predicted. Dörnbrack et al. (2012) suggested that the rapidly changing flow and disturbed vortex during this period introduced larger uncertainty into the initial forecast conditions, which resulted in large uncertainty in the forecasts. In agreement with this suggestion we find that, for example, the RMSE of the hindcast initialized on 30 January 2010 was already unusually large (39 m, corresponding to the 95th percentile across all 10-day hindcasts) on day 0. Thus, problems with initialization in the stratosphere have likely contributed to the low predictability of SSW 12.
Unlike SSW 12, SSW 11 started much more abruptly, and the vortex was unusually strong 1–2 weeks before the event (Harada et al. 2010). Table 2 shows that the low predictability of this event is evident in several diagnostics. For example, the 10-day hindcast shows both a large RMSE (829 m, corresponding to the 99th percentile) and a low ACC (0.54, corresponding to the 6th percentile). Problems with initialization could also have contributed to the low predictability of this event; however, they were not as severe (an RMSE at day 0 of 28 m, corresponding to the 83rd percentile) as those during SSW 12, and are unlikely to be the main cause. The low predictability of this event was also pointed out by Taguchi (2016b) and Kim and Flatau (2010) in other forecast systems. The latter study reported that this event was not predicted by their system in 9-day forecasts but was well predicted in 4-day forecasts. Taguchi (2016b) showed that the low predictability of this event was associated with strongly underestimated wave forcing, quantified in terms of the eddy heat flux. The low predictability of SSW 11 in different forecast systems may thus reflect an intrinsically low predictability of the atmospheric flow during this period.
Since the observed frequency of SSWs is only about 0.13 month⁻¹, a significant increase in the predicted SSW probability above this threshold may be used as an early warning of increased SSW risk. For example, the days when the predicted SSW probabilities increase above 0.5 and 0.9 are listed in Table 2 for each event. In addition, it may be even more illuminating to analyze SSW predictability averaged over several events. Figure 7 shows the predicted SSW probability averaged over SSWs 2, 4, 7, 8, 11, and 13, that is, over all events with strong tropospheric impacts (Table 1) except SSW 12. Here, SSW 12 is excluded because, unlike the other cases, it occurred following a period with a strongly disturbed vortex, and thus its predictability may not be representative of a typical SSW. Moreover, there is some ambiguity even in defining the central date of this event; for example, the operational ECMWF analysis showed that an SSW had already formed on 26 January 2010 (Dörnbrack et al. 2012), that is, two weeks earlier than the date based on the reanalyses (Table 1).
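The Table 2 threshold diagnostics can be extracted as in the sketch below: scanning from short to long leads, one finds the first lead day at which the forecasted probability falls below the threshold (equivalently, the last lead at which it still exceeds it as the event approaches). The indexing convention and function name are our assumptions.

```python
import numpy as np

def first_lead_below(prob_by_lead, threshold):
    """First forecast lead day (1-based) at which the forecasted SSW
    probability drops below `threshold`, scanning from short to long leads.

    prob_by_lead[k] holds the forecasted SSW probability of the hindcast
    initialized k+1 days before the SSW central date.
    """
    below = np.flatnonzero(np.asarray(prob_by_lead) < threshold)
    return int(below[0]) + 1 if below.size else None

# Illustrative use, reproducing the two Table 2 columns for one event:
# first_lead_below(prob, 0.5), first_lead_below(prob, 0.9)
```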

Fig. 7. Mean predicted SSW probability as a function of lead time averaged over SSWs 2, 4, 7, 8, 11, and 13. The lead time is calculated with respect to the actual SSW central dates. Numbers next to the dots indicate the total number of ensemble members from all hindcasts at that lead time. Dashed line shows observed frequency of SSW occurrence (0.13 month⁻¹). SSW probabilities of 0.5 and 0.9 are shown by the dotted lines.
Figure 7 reveals three distinguishable periods. The first period, lasting from 1 month before the events until day 13, is characterized by a slow increase in SSW probability from near 0 to 0.3. During the second period the probability increases rapidly, from 0.5 on day 12 to 0.8 on day 8. Finally, the third period, lasting from day 7 until the SSW central date, is characterized by a high SSW probability with values close to 1. Thus, skillful probabilistic forecasts of the long-lasting SSW events can be made 8–12 days in advance and, in some cases, possibly even at longer lead times (Fig. 4). At the same time, deterministic SSW forecasts may not be possible until about day 7 before the event.
We also analyzed the spread of the hindcast ensemble members before the SSWs but did not find a dependence on lead time with respect to the central dates, with larger differences found between individual SSWs. However, hindcasts initialized at lead times less than 7 days before SSWs show a smaller spread after SSWs, in comparison to the mean spread in all hindcasts at the same lead times. Since these hindcasts skillfully predict SSWs (Fig. 7), smaller ensemble spread during periods of recovery from SSWs may be related to an enhanced degree of weather forecast skill (Tripathi et al. 2015a); however, this hypothesis requires further investigation.
Finally, we discuss the dependence of SSW predictability on event type. It has been suggested by Taguchi (2016a) that forecasts have larger errors for split- than for displacement-type SSWs. Analysis of the 10-day forecasts (Table 2) shows that a similar difference can be found in the ECMWF system: the mean RMSE averaged over the six split events is 372 m, whereas that for the seven displacement events is 342 m. Similarly, the mean ACCs for the two types are 0.83 and 0.87, respectively. However, the difference between these subsamples is rather small and depends strongly on the inclusion of the extraordinary split event SSW 11, which had large forecast errors. Thus, because of the small number of events, our results are inconclusive.
4. Conclusions and discussion
This study investigates the predictability of sudden stratospheric warmings (SSWs), defined as a reversal of the zonal mean zonal winds at 10 hPa and 60°N, in the ECMWF extended-range forecast system. Our study is one of the few that investigate the predictability of several SSW events in the same system, which allows case-to-case comparison.
The forecasts initialized during the 10–15-day periods preceding SSWs have skill similar to that of the other forecasts in terms of pattern correlation, but they have larger RMSEs because they significantly underestimate the magnitude of the anomalies. Using the spread of the ensemble members to define the predicted SSW probability, we show that for some events a high SSW probability, strongly exceeding the climatological SSW occurrence frequency, is forecasted 12–13 days before the events. In cases when weak easterlies appear for a short period of time, the system may miss the event at lead times of 4–6 days even if the forecast errors are not exceptionally large. Although such events may not be predicted at long lead times, they are usually restricted to the stratosphere and have little impact on the tropospheric circulation. Focusing on the events with significant tropospheric impacts and averaging over several events, we show that, as lead time decreases, the forecasted SSW probability increases, on average, throughout the whole 1-month forecast period. Until day 13 the increase is slow, but thereafter the probability rises rapidly, from 0.3 to nearly 1 by day 7. We conclude that during this period (i.e., days 8–12 before the events) the SSWs can be considered predictable in a probabilistic sense. After day 7 the predictability can be considered close to deterministic.
Our results agree with those of most previous studies, which suggest that the predictability limit of SSWs is close to 10 days (Tripathi et al. 2015b), and they are obtained using a larger set of hindcasts than most of those studies. Comparison of the results for individual events, such as the 2009 and 2013 SSWs, also gives comparable predictability limits (Kim and Flatau 2010; Tripathi et al. 2016). Our results do not suggest that SSWs are regularly predictable beyond 2 weeks, as was found for one case by Mukougawa and Hirooka (2004); however, we do show that some SSW events were predicted at lead times of more than 20 days even though they were not predicted at shorter lead times. Better understanding of why these forecasts were successful at such long lead times, but not at shorter ones, could help us make progress in extended-range forecasting. Another question that stems from our study is whether or not the tropospheric impacts of SSWs are predictable at the same lead times as the SSWs themselves. This work is in progress and the results will be reported in a separate study.
Acknowledgments
The study is funded by the Academy of Finland (Grants 286298 and 294120). The author thanks Drs. Andrew Charlton-Perez, Magdalena Balmaseda, Frederic Vitart, and Nicholas Tyrrell for useful discussions and comments on an earlier version of the manuscript. Comments from three anonymous reviewers are also appreciated.
REFERENCES
Baldwin, M. P., D. B. Stephenson, D. W. J. Thompson, T. J. Dunkerton, A. J. Charlton, and A. O’Neill, 2003: Stratospheric memory and skill of extended‐range weather forecasts. Science, 301, 636–640, https://doi.org/10.1126/science.1087143.
Butler, A. H., D. J. Seidel, S. C. Hardiman, N. Butchart, T. Birner, and A. Match, 2015: Defining sudden stratospheric warmings. Bull. Amer. Meteor. Soc., 96, 1913–1928, https://doi.org/10.1175/BAMS-D-13-00173.1.
Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
Dörnbrack, A., M. C. Pitts, L. R. Poole, Y. J. Orsolini, K. Nishii, and H. Nakamura, 2012: The 2009–2010 Arctic stratospheric winter—General evolution, mountain waves and predictability of an operational weather forecast model. Atmos. Chem. Phys., 12, 3659–3675, https://doi.org/10.5194/acp-12-3659-2012.
Gerber, E. P., and Coauthors, 2010: Stratosphere‐troposphere coupling and annular mode variability in chemistry‐climate models. J. Geophys. Res., 115, D00M06, https://doi.org/10.1029/2009JD013770.
Harada, Y., A. Goto, H. Hasegawa, N. Fujikawa, H. Naoe, and T. Hirooka, 2010: A major stratospheric sudden warming event in January 2009. J. Atmos. Sci., 67, 2052–2069, https://doi.org/10.1175/2009JAS3320.1.
Jung, T., and M. Leutbecher, 2007: Performance of the ECMWF forecasting system in the Arctic during winter. Quart. J. Roy. Meteor. Soc., 133, 1327–1340, https://doi.org/10.1002/qj.99.
Karpetchko, A., E. Kyro, and B. M. Knudsen, 2005: Arctic and Antarctic polar vortices 1957–2002 as seen from the ERA-40 reanalyses. J. Geophys. Res., 110, D21109, https://doi.org/10.1029/2005JD006113.
Karpechko, A. Yu., P. Hitchcock, D. H. W. Peters, and A. Schneidereit, 2017: Predictability of downward propagation of major sudden stratospheric warmings. Quart. J. Roy. Meteor. Soc., 143, 1459–1470, https://doi.org/10.1002/qj.3017.
Kim, Y.-J., and M. Flatau, 2010: Hindcasting the January 2009 Arctic sudden stratospheric warming and its influence on the Arctic Oscillation with unified parameterization of orographic drag in NOGAPS. Part I: Extended-range stand-alone forecast. Wea. Forecasting, 25, 1628–1644, https://doi.org/10.1175/2010WAF2222421.1.
Kolstad, E. W., T. Breiteig, and A. A. Scaife, 2010: The association between stratospheric weak polar vortex events and cold air outbreaks in the Northern Hemisphere. Quart. J. Roy. Meteor. Soc., 136, 886–893, https://doi.org/10.1002/qj.620.
Lehtonen, I., and A. Yu. Karpechko, 2016: Observed and modeled tropospheric cold anomalies associated with sudden stratospheric warmings. J. Geophys. Res. Atmos., 121, 1591–1610, https://doi.org/10.1002/2015JD023860.
Marshall, A. G., and A. A. Scaife, 2010: Improved predictability of stratospheric sudden warming events in an atmospheric general circulation model with enhanced stratospheric resolution. J. Geophys. Res., 115, D16114, https://doi.org/10.1029/2009JD012643.
Mukougawa, H., and T. Hirooka, 2004: Predictability of stratospheric sudden warming: A case study for 1998/99 winter. Mon. Wea. Rev., 132, 1764–1776, https://doi.org/10.1175/1520-0493(2004)132<1764:POSSWA>2.0.CO;2.
Sigmond, M., J. F. Scinocca, V. V. Kharin, and T. G. Shepherd, 2013: Enhanced seasonal forecast skill following stratospheric sudden warmings. Nat. Geosci., 6, 98–102, https://doi.org/10.1038/ngeo1698.
Taguchi, M., 2014: Stratospheric predictability: Basic characteristics in JMA 1-month hindcast experiments for 1979–2009. J. Atmos. Sci., 71, 3521–3538, https://doi.org/10.1175/JAS-D-13-0295.1.
Taguchi, M., 2016a: Connection of predictability of major stratospheric sudden warmings to polar vortex geometry. Atmos. Sci. Lett., 17, 33–38, https://doi.org/10.1002/asl.595.
Taguchi, M., 2016b: Predictability of major stratospheric sudden warmings: Analysis results from JMA operational 1-month ensemble predictions from 2001/02 to 2012/13. J. Atmos. Sci., 73, 789–806, https://doi.org/10.1175/JAS-D-15-0201.1.
Thompson, D. W. J., M. P. Baldwin, and J. M. Wallace, 2002: Stratospheric connection to Northern Hemisphere wintertime weather: Implications for prediction. J. Climate, 15, 1421–1428, https://doi.org/10.1175/1520-0442(2002)015<1421:SCTNHW>2.0.CO;2.
Tripathi, O. P., A. Charlton-Perez, M. Sigmond, and F. Vitart, 2015a: Enhanced long-range forecast skill in boreal winter following stratospheric strong vortex conditions. Environ. Res. Lett., 10, 104007, https://doi.org/10.1088/1748-9326/10/10/104007.
Tripathi, O. P., and Coauthors, 2015b: The predictability of the extratropical stratosphere on monthly time-scales and its impact on the skill of tropospheric forecasts. Quart. J. Roy. Meteor. Soc., 141, 987–1003, https://doi.org/10.1002/qj.2432.
Tripathi, O. P., and Coauthors, 2016: Examining the predictability of the stratospheric sudden warming of January 2013 using multiple NWP systems. Mon. Wea. Rev., 144, 1935–1960, https://doi.org/10.1175/MWR-D-15-0010.1.
Vitart, F., 2014: Evolution of ECMWF sub-seasonal forecast skill scores. Quart. J. Roy. Meteor. Soc., 140, 1889–1899, https://doi.org/10.1002/qj.2256.
Waugh, D. W., J. M. Sisson, and D. J. Karoly, 1998: Predictive skill of an NWP system in the southern lower stratosphere. Quart. J. Roy. Meteor. Soc., 124, 2181–2200, https://doi.org/10.1002/qj.49712455102.
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences: An Introduction. 2nd ed. Academic Press, 627 pp.