How do stratospheric perturbations influence North American weather regime predictions?

Abstract: Observational evidence shows changes to North American weather regime occurrence depending on the strength of the lower-stratospheric polar vortex. However, it is not yet clear how this occurs or to what extent an improved stratospheric forecast would change regime predictions. Here we analyze four North American regimes at 500 hPa, constructed in principal component (PC) space. We consider both the location of the regimes in PC space and the linear regression between each PC and the lower-stratospheric zonal-mean winds, yielding a theory of which regime transitions are likely to occur due to changes in the lower stratosphere. Using a set of OpenIFS simulations, we then test the effect of relaxing the polar stratosphere to ERA-Interim on subseasonal regime predictions. The model start dates are selected based on particularly poor subseasonal regime predictions in the European Centre for Medium-Range Weather Forecasts CY43R3 hindcasts. While the results show only a modest improvement to the number of accurate regime predictions, there is a substantial reduction in Euclidean distance error in PC space. The average movement of the forecasts within PC space is found to be consistent with expectation for moderate-to-large lower-stratospheric zonal wind perturbations. Overall, our results provide a framework for interpreting the stratospheric influence on North American regime behavior. The results can be applied to subseasonal forecasts to understand how stratospheric uncertainty may affect regime predictions, and to diagnose which regime forecast errors are likely to be related to stratospheric errors.


Introduction
The framework of large-scale weather regimes is now increasingly used in wintertime subseasonal-to-seasonal (S2S) prediction (from ∼2 weeks to 2 months ahead; White et al. 2017), although the concept of a weather "regime" is not new (Rex 1951). Regimes are characteristically recurrent, persistent, and quasi-stationary (e.g., Michelangeli et al. 1995) with typical time scales of weeks, well suited to the subseasonal scale where they can manifest "windows of opportunity" for skillful extended-range forecasts (Mariotti et al. 2020; Robertson et al. 2020).
Unlike empirical orthogonal functions (EOFs) (e.g., Hannachi et al. 2007), regimes defined through clustering methods are not bound by orthogonality or variance partitioning constraints. These regimes can therefore more closely represent the full anomalous flow configuration on a given day by benefiting from "mode mixing" and are accordingly easier to interpret, providing a useful way to understand extended-range ensemble forecasts. By characterizing recurrent flow configurations, weather regimes can also be used to diagnose flow-dependent predictability (Ferranti et al. 2015; Matsueda and Palmer 2018). From an impacts perspective, regimes have been used to better understand meteorological impacts on energy demand (e.g., Grams et al. 2017; van der Wiel et al. 2019; Garrido-Perez et al. 2020), precipitation and wildfire risk (Robertson and Ghil 1999; Robertson et al. 2020), and public health (Charlton-Perez et al. 2019; Huang et al. 2020).

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-21-0413.1.s1.
A significant source of tropospheric subseasonal predictability during boreal winter is variability in the Arctic stratospheric polar vortex, including sudden stratospheric warmings (SSWs; e.g., Charlton and Polvani 2007) and strong vortex events (e.g., Limpasuvan et al. 2005; Tripathi et al. 2015). The downward influence of the stratosphere can be viewed as the modulation of weather regime transition and persistence. Perhaps the simplest regime framework employs the two phases of the North Atlantic Oscillation (NAO), which are similar to the Northern Annular Mode (NAM) and Arctic Oscillation (AO) patterns and strongly influenced by the stratosphere (Ambaum et al. 2001; Baldwin and Thompson 2009; Hitchcock and Simpson 2014). More complex regime analyses for the North Atlantic-European sector invoke four (e.g., Vautard 1990; Cassou 2008), six (Falkena et al. 2020), or seven (e.g., Grams et al. 2017) regimes depending on the method, focus, or purpose of the analysis.
Using four North Atlantic regimes, Charlton-Perez et al. (2018) found significant differences in the occurrence likelihood of three regimes between strong and weak lower-stratospheric vortex states, while the probability of Scandinavian blocking was invariant. Beerli and Grams (2019) related the stratospheric modulation of Atlantic weather regimes to whether or not the regime projected strongly onto the NAO pattern. They emphasized that regimes that do not project strongly onto the NAO provide a route for a wider variety of weather patterns following anomalous stratospheric vortex states. Subsequently, Maycock et al. (2020) analyzed the North Atlantic response to SSWs from the perspective of modulation of the three eddy-driven jet regimes, finding an increase in the occurrence and persistence of the southernmost regime (corresponding to the negative NAO). Domeisen et al. (2020a) assessed the varying degrees of stratosphere-troposphere coupling following major SSWs (e.g., Karpechko et al. 2017; White et al. 2019) by considering the regimes present during SSW onset and in the weeks afterward, suggesting that the antecedent state of the troposphere may play an important role in determining subsequent downward coupling.
In recent years, the influence of the stratosphere on North American climate variability has received increased attention, likely owing to the extreme cold-air outbreaks during winter 2013/14 that accompanied disruption to the polar vortex (Yu and Zhang 2015; Waugh et al. 2017). However, relatively less attention has been given to explicitly viewing the impact of the stratosphere on North American weather from a tropospheric regimes perspective. As North America is influenced by weather from both the Atlantic and Pacific to different degrees across the continent, a challenge with defining North American regimes is the choice of domain. Some studies (e.g., Amini and Straus 2019; Fabiano et al. 2021) focus on upstream variability in the Pacific-North American (PNA) sector (akin to the Atlantic regimes with respect to Europe), while others focus on the continent as a whole and incorporate both Atlantic and Pacific variability. Despite some methodological differences, a growing number of studies have defined a consistent and reproducible set of four wintertime regimes in the 500-hPa geopotential height anomaly field centered over North America (e.g., Straus et al. 2007; Vigaud et al. 2018; Lee et al. 2019b; Robertson et al. 2020). The regimes capture both PNA-like and NAO-like behavior.
More specifically, Lee et al. (2019b) analyzed these four North American regimes (the Arctic high, Arctic low, Alaskan ridge, and Pacific trough) in the context of the strength of the lower-stratospheric polar vortex in reanalysis. They found significant differences between the occurrence of three of the regimes during strong and weak stratospheric vortex states of a similar magnitude to those in Charlton-Perez et al. (2018) for the North Atlantic. The Alaskan ridge regime did not show a relationship with the stratospheric vortex strength, but was found to be strongly linked to North American cold waves. Lee et al. (2019b) hypothesized that tropical forcing (e.g., Wang et al. 2014) or stratospheric wave reflection (Kodera et al. 2016; Kretschmer et al. 2018; Matthias and Kretschmer 2020) may dominate driving the Alaskan ridge, owing to the similarity of the regime to patterns associated with both. As a purely observation-based study, the results of Lee et al. (2019b) were noncausal and did not assess when or how changes in the stratospheric state would change regime occurrence, or whether improved stratospheric forecasts would yield better regime predictions. Addressing these points is therefore a goal of the present study.
To diagnose the downward influence of the stratosphere on the troposphere, and changes in tropospheric forecast skill arising from a correctly predicted stratosphere, model experiments in which the stratospheric state is artificially nudged or relaxed to a different state (such as that from reanalysis) have been used. Most studies have focused on the seasonal-scale effects (Douville 2009; Hitchcock and Simpson 2014; Jung et al. 2010a,b). However, Kautz et al. (2020) used relaxation experiments on S2S time scales to quantify the role of the February 2018 SSW in the predictability and onset of the subsequent Eurasian cold wave. They found an increased probability of surface cold extremes in forecasts with a nudged stratosphere, but that the evolution of the lower-stratospheric NAM following the SSW, rather than simply the occurrence of the SSW, was important for more accurate tropospheric forecasts. The importance of persistent lower-stratospheric anomalies in eliciting a tropospheric response is consistent with climate model studies (Maycock and Hitchcock 2015; Runde et al. 2016) and the polar-night jet oscillation events of Hitchcock et al. (2013).
Although SSWs and their strong vortex counterpart are typically harbingers of persistent anomalous lower-stratospheric NAM states (Baldwin and Dunkerton 2001), they do not necessarily propagate into the lowermost stratosphere, and anomalous lower-stratospheric NAM states can occur without a typical midstratospheric precursor. Hence, analysis of the effect of the stratosphere on the troposphere need not only focus on such extreme midstratospheric circulation events. Further, the NAM in the lower stratosphere during midwinter possesses a very long time scale (over 4 weeks; Baldwin et al. 2003), key for the S2S prediction scale. In this study, we focus on subseasonal variability in the strength of the lower-stratospheric polar vortex, diagnosed through the zonal-mean zonal wind at 100 hPa and 60°N (U100). We do not explicitly consider SSWs or strong vortex events.
The overall goal of this study is to understand how changes or uncertainty in the subseasonal lower stratospheric vortex state can influence changes or uncertainty in predictions of North American weather regimes. We do this first by a statistical analysis of the regimes and their underlying EOFs in reanalysis, and then through analyzing a set of model experiments in which the stratosphere is nudged toward reanalysis. A greater understanding of the relationship between stratospheric variability and regimes will help in both the real-world understanding and interpretation of regime forecast uncertainty, and in subsequent studies of regime dynamics and predictability. It would also be a useful tool to examine how model biases affect the representation of stratosphere-troposphere coupling.
The paper is organized as follows. Section 2 introduces the data, methods, and model experiments. Section 3 defines the regimes and their underlying EOFs, and the relationship between these EOFs and the lower-stratospheric polar vortex strength. Section 4 develops a theory of how the stratosphere may influence regime behavior. Section 5 presents the results of a modeling study used to test the theory. A summary and conclusion of our work follows in section 6, including implications for S2S prediction.

a. Hindcasts and reanalysis
For historical analysis and verification, we use the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis (Dee et al. 2011). Hindcasts are taken from version CY43R3 of the ECMWF extended-range prediction system (used to produce operational forecasts from July 2017 to June 2018) as part of the S2S database. The hindcasts consist of an 11-member ensemble (1 unperturbed member and 10 perturbed members) initialized from ERA-Interim twice per week. The model has a resolution of Tco639 up to day 15 and Tco319 after day 15, and 91 vertical levels. All data are sampled once per day at 0000 UTC and regridded to 2.5° latitude-longitude resolution, both for computational efficiency and because we consider only large-scale fields.

b. Regime definitions
The definition of North American weather regimes follows that of Lee et al. (2019b), extended by 1 year. We take 500-hPa geopotential heights (Z500) in the region 180°-30°W, 20°-80°N in all December-March days in the period 1 January 1979-31 December 2018 in ERA-Interim (4840 days) and subtract the daily climatology over this period. (Any trends in Z500 are found to have little impact on the regimes, so detrending is not performed.) Then, data are weighted by the square root of cosine latitude, and EOF analysis is performed, retaining the leading 12 EOFs that explain close to 80% of the variance; k-means clustering is then performed (Pedregosa et al. 2011) in the nonstandardized 12-dimensional principal component (PC) space, with k set to 4. In addition to reducing the dimensionality of the clustering problem and filtering smaller-scale variability, performing the clustering in PC space produces a coordinate system that enables interpretation of the regimes in terms of their comprising EOFs, linking two widely used prediction frameworks. After generating the clusters, each day is then assigned to one of the four regimes by the minimum Euclidean distance to the cluster centroids in PC space.
For regime assignment in the hindcasts, the model Z500 climate is first subtracted, to account for systematic biases. The model climate is computed for each initialization date and lead time over the 20-year hindcast period. Then, the daily data are projected onto the 12 EOFs, and each day is assigned to a regime based on these pseudo-PC loadings. As an additional forecast diagnostic in the model experiments, weekly mean regimes are produced by first averaging the PCs over a 7-day period and then assigning to a regime; these are found to be largely consistent with the regime occupying the majority of days within each week (not shown).
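The assignment step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the centroid coordinates are hypothetical placeholder values, only three PCs are used instead of the 12 retained in the paper, and the function names `assign_regime` and `weekly_mean_regime` are assumptions introduced here.

```python
import math

# Hypothetical centroid coordinates in a truncated 3D PC space.
# These values are illustrative only and are NOT taken from the paper.
CENTROIDS = {
    "ArH": (-0.5, -1.0, 0.2),
    "ArL": (0.3, 1.0, 0.4),
    "AkR": (-1.0, 0.4, -0.6),
    "PT": (1.1, 0.1, -0.3),
}


def assign_regime(pcs, centroids=CENTROIDS):
    """Return the regime whose centroid has the minimum Euclidean distance to pcs."""
    return min(centroids, key=lambda name: math.dist(pcs, centroids[name]))


def weekly_mean_regime(daily_pcs, centroids=CENTROIDS):
    """Average the (pseudo-)PCs over the supplied days, then assign a regime."""
    n_days = len(daily_pcs)
    n_pcs = len(daily_pcs[0])
    mean_pcs = tuple(sum(day[k] for day in daily_pcs) / n_days for k in range(n_pcs))
    return assign_regime(mean_pcs, centroids)


# Usage: seven days of pseudo-PC loadings lying close to the PT centroid.
week = [(1.0, 0.0, -0.2)] * 7
print(weekly_mean_regime(week))
```

Averaging the PCs before assignment, rather than taking the most frequent daily regime, follows the order of operations stated in the text for the weekly mean regimes.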

c. Regime bust criteria
We select subseasonal regime "busts" from the ECMWF hindcasts where there is strong ensemble support (≥7 members, or approximately two-thirds) for one specific incorrect regime to be dominant (i.e., present on at least 8 days) during days 14-27 (weeks 3-4). These criteria are designed to pick out cases that suggest a strong, but incorrect, subseasonal signal constraining the model, analogous to a "precise but inaccurate" forecast. As such, the model confidence may be erroneously interpreted as enhanced predictability and accuracy, with potentially large real-world impacts from subsequent decision-making. We choose only hindcasts initialized during December-February, as the seasonal cycle may affect week-3-4 forecasts initialized during March. These criteria yield 31 initialization dates. A further stipulation is applied such that the initialization dates must be separated by at least 21 days to avoid analyzing multiple instances of the same event; in these cases, the earliest initialization date is selected. This step filters the number of cases to 20 (i.e., on average 1 per winter), which are listed in Table 1. Except for forecasts of an Arctic high verifying as an Alaskan ridge, all forecast-verification combinations are included at least once (not by design).
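As a rough sketch of how these selection criteria could be applied, the snippet below checks a single hindcast for a bust and thins a list of initialization dates. It is an illustration under stated assumptions: the function names are hypothetical, a case with no dominant verifying regime is treated as "not a bust", and the date thinning uses a simple greedy pass that keeps the earliest date, which is only one possible reading of the 21-day rule.

```python
from collections import Counter
from datetime import date


def dominant_regime(daily_regimes, min_days=8):
    """Regime present on at least min_days days of the window, else None."""
    name, count = Counter(daily_regimes).most_common(1)[0]
    return name if count >= min_days else None


def is_bust(ensemble, verification, min_members=7, min_days=8):
    """True if >= min_members members share one dominant regime that is wrong.

    ensemble: list of per-member daily regime labels for days 14-27.
    verification: daily regime labels from reanalysis for the same days.
    """
    observed = dominant_regime(verification, min_days)
    if observed is None:  # assumption: no dominant verifying regime, no bust
        return False
    votes = Counter(dominant_regime(member, min_days) for member in ensemble)
    votes.pop(None, None)  # ignore members without a dominant regime
    if not votes:
        return False
    forecast, support = votes.most_common(1)[0]
    return support >= min_members and forecast != observed


def thin_dates(dates, min_gap_days=21):
    """Greedy pass keeping the earliest date and dropping any within the gap."""
    kept = []
    for d in sorted(dates):
        if not kept or (d - kept[-1]).days >= min_gap_days:
            kept.append(d)
    return kept


# Usage: 8 of 11 members predict PT on 10 of 14 days, but ArH verifies.
members = [["PT"] * 10 + ["ArL"] * 4] * 8 + [["ArH"] * 14] * 3
observed_days = ["ArH"] * 9 + ["PT"] * 5
print(is_bust(members, observed_days))
```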
No stratospheric error criteria are included in order to assess both to what extent poor subseasonal regime forecasts are associated with stratospheric errors and the effect of stratospheric relaxation even in cases with a relatively well-forecast stratosphere. We find that the majority of bust cases feature ensemble-mean U100 error magnitudes ≥ 3 m s⁻¹ (14 of the 20 initialization dates, including 8 week-3 and 12 week-4 forecasts), approximately the mean absolute error (MAE) of the December-February week-3-4 hindcasts (see Fig. S1 in the online supplemental material). This suggests that regime busts and large lower-stratospheric vortex errors often co-occur.

d. OpenIFS model
For model experiments, we use OpenIFS version 43r3v1, a research version of the ECMWF IFS (Integrated Forecast System) model CY43R3, but without data assimilation. The model is initialized from ERA-Interim and run on a linear Gaussian grid with T255 resolution, 60 vertical levels (i.e., the resolution of ERA-Interim), and a time step of 45 min. Output data are bilinearly interpolated onto a 2.5° latitude-longitude grid. Each ensemble consists of an unperturbed member and 20 perturbed members, in which spread is generated by the stochastically perturbed parameterization tendencies (SPPT) and stochastic kinetic energy backscatter (SKEB) schemes (Leutbecher et al. 2017). The ensemble size is chosen as a balance between the potential gain from additional members compared with the 11-member hindcasts and computational expense. The OpenIFS runs differ from the operational model in both resolution and in that there is no representation of initial condition uncertainty, so some differences between these model runs and the equivalent hindcasts are to be expected. As we are primarily considering forecasts on time scales of several weeks, the initial condition uncertainty is considered less important, and the stochastic schemes generate spread comparable to the hindcasts in the fields analyzed in this study.
[Table 1 caption fragment: … -4 dom.) regime is that which is predicted by ≥7 ensemble members (64%) to be present on ≥8 days during days 14-27 inclusive, verified against the ERA-Interim regime that is present for ≥8 days during the same time period. Week-3 and week-4 regimes are the regimes of the weekly mean field with the largest ensemble support; εU is the ensemble-mean error in the 100-hPa 60°N zonal-mean zonal winds averaged over each week. The data are grouped by the dominant regime prediction and then sorted by the week-4 εU.]

For each initialization date, two sets of ensembles are produced: a control (CTR) run in which the forecast freely evolves (comparable with the equivalent hindcast, notwithstanding the model differences), and a relaxed (RLX) run in which the Arctic stratosphere is nudged toward ERA-Interim using the IFS relaxation scheme (e.g., Jung et al. 2010a). The relaxation scheme operates by applying a nonphysical tendency to the model equations of the form

$$-k\,(X - X_{\mathrm{obs}}),$$

where X is a model prognostic variable, X_obs is the "observed" value from ERA-Interim, and k [unit: (time step)⁻¹] is the relaxation coefficient controlling the strength of the forcing [following, e.g., Jeuken et al. (1996) and Magnusson (2017)]. The term X_obs at each model time step is generated by linear interpolation between 6-hourly reanalysis files. A relaxation time scale of 12 h is used in this study, corresponding to k = 0.0625 per time step given the 45-min model time step, which can be interpreted as nudging the model state at each time step by 6.25% of the departure from the reanalysis. Vorticity, divergence, and temperature are relaxed in model gridpoint space with an exponential taper at both the latitude and model-level boundaries.
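The arithmetic behind the relaxation coefficient can be checked with a toy example. The sketch below is not the IFS relaxation scheme itself: it applies the nudging increment to a single scalar with a fixed reference value, omits the spatial taper and the model's own dynamics, and uses hypothetical function names.

```python
def relaxation_coefficient(timescale_hours, timestep_minutes):
    """Fraction of the departure removed per time step: k = dt / tau."""
    return (timestep_minutes / 60.0) / timescale_hours


def interpolate_obs(x_start, x_end, fraction):
    """Linear interpolation between two consecutive 6-hourly reference values."""
    return x_start + fraction * (x_end - x_start)


def nudge(x, x_obs, k):
    """Apply one relaxation increment of the form -k (X - X_obs)."""
    return x - k * (x - x_obs)


k = relaxation_coefficient(timescale_hours=12, timestep_minutes=45)
print(k)  # 0.0625, i.e. 6.25% of the departure per step

# With a constant reference and no other tendencies, the departure decays
# geometrically: after 16 steps (12 h) roughly 36% of it remains.
x, x_obs = 10.0, 0.0
for _ in range(16):
    x = nudge(x, x_obs, k)
print(round(x / 10.0, 3))
```

The remaining fraction after one relaxation time scale is close to, but slightly below, 1/e because the increment is applied in discrete steps.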
A profile of the relaxation domain is shown in Fig. 1. The domain boundaries are chosen both to maximize constraint of the polar lower stratosphere while allowing for a sufficiently smooth taper to minimize negative numerical effects, and to remain largely poleward and upward of the subtropical jet to reduce directly constraining the tropical upper-tropospheric waveguide. The choice of domain is also limited by the vertical level spacing of the model in the upper troposphere and lower stratosphere. We employ a weaker stratospheric nudging than some previous studies (e.g., Jung et al. 2010a; Kautz et al. 2020), but note that the relaxation in our study extends further into the lower stratosphere. Analysis of the output fields shows this relaxation strength is enough to constrain the model. Time series of the U100 forecasts from the CTR and RLX experiments and the corresponding verification from ERA-Interim are shown in Fig. S2.
As the random seed used in the stochastic schemes is fixed for each ensemble member, the equivalent ensemble members in the CTR and RLX experiments differ only by the stratospheric nudging. In analyzing the OpenIFS runs, we assume the model climatology is equivalent to that of the corresponding CY43R3 hindcasts.

e. Significance testing
Throughout the paper, statistical significance is assessed at the 95% confidence level by bootstrap resampling (e.g., Wilks 2019). Random samples (with replacement) are taken from the population and the quantity under analysis (e.g., a regression coefficient) is calculated and stored. This process is repeated 10 000 times, and then a confidence interval is constructed from the appropriate percentiles of this distribution (2.5th-97.5th percentiles for two-sided 95% confidence).
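A percentile bootstrap of this kind can be sketched as follows for a regression slope. This is an illustration only: the helper names are hypothetical, the data are synthetic, and the days are resampled independently, which is an assumption (the text does not state whether serial correlation in daily data is accounted for, e.g., by resampling blocks).

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_ci(x, y, stat=slope, n_boot=10_000, level=0.95, seed=0):
    """Two-sided percentile confidence interval from resampling with replacement."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    values.sort()
    lower = values[int((1 - level) / 2 * n_boot)]
    upper = values[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Usage with synthetic data: true slope of 2 plus unit-variance noise.
noise = random.Random(1)
x = [float(i) for i in range(50)]
y = [2.0 * xi + noise.gauss(0.0, 1.0) for xi in x]
low, high = bootstrap_ci(x, y, n_boot=2000)
print(low < high, low > 0.0)  # slope is significantly different from zero
```

A coefficient is then judged significantly different from zero at the 95% level when the interval excludes zero.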

Regimes and EOFs
The centroids of the four regimes (expressed as the Z500 field reconstructed from the sum of the centroid loading in the leading 12 EOFs), along with the percent of days assigned to each (the occupation frequency), are shown in Figs. 2a-d. In terms of both spatial patterns and the ranking of occupation frequency, these match the regimes of Lee et al. (2019b) and so we follow their naming convention [after Straus et al. (2007)]: Arctic high (ArH), Arctic low (ArL), Alaskan ridge (AkR), and Pacific trough (PT). The coordinates of the regime centroids in the leading 12 PCs are shown in Fig. 2e. Only the leading three PCs have large contributions to the centroids; performing the same clustering analysis but retaining only the leading three PCs yields very similar patterns, with only 4% of days assigned to a different regime. Therefore, we now focus our analysis on these leading three EOFs.

Maps of the EOFs and the percent of the total variance explained are shown in Figs. 2f-h. In total, these three EOFs explain close to 40% of the daily variance within the domain, and are well separated according to the criterion of North et al. (1982). The sign of the EOFs is here defined such that a positive loading produces an anomalous trough in the northeast Pacific. EOF1 is similar to the PNA (Wallace and Gutzler 1981) but slightly eastward shifted. It also bears some similarity to the tropical-Northern Hemisphere (TNH) pattern (Mo and Livezey 1986; Liang et al. 2017). Furthermore, there is a meridional dipole in the North Atlantic in the eastern edge of the domain, reminiscent of NAO-like variability. EOF2 has a meridional dipole in Z500 anomalies, and thus some similarity to the surface-based NAM/AO, but with a center of action over Alaska that is not characteristic of the surface NAM (e.g., Thompson and Wallace 1998). EOF3 is characterized by a wavenumber-2 pattern across the domain.
Comparison of these regional EOFs with the leading three EOFs for the Northern Hemisphere poleward of 20°N (Figs. S3-S5) shows a high degree of similarity in both the correlation of the PC time series (Pearson's correlation r ≥ 0.77; p < 0.05) and spatially (area-weighted pattern correlation ≥ 0.87 over the North American domain). We can therefore be confident that the leading three EOFs used in the clustering are regional manifestations of hemispheric variability, and that hemispheric variability is dominant in the smaller domain under consideration. The EOFs presented here, with the most NAM-like pattern in EOF2 while the leading EOF contains NAM/NAO and PNA-like characteristics, agree well with the upper-tropospheric EOF analysis of Baldwin and Thompson (2009). For all three North American EOFs, the e-folding time scales of the PC time series are 5-7 days, which is similar to the median number of consecutive days with the same regime assignment. However, a quarter of the individual blocks of consecutive regime days persist for more than 1 week (including one instance of 39 days of ArL up to and including 22 February 1990), motivating their utility for extended-range prediction.
To understand the relationship between regime occurrence and the lower-stratospheric vortex presented in Lee et al. (2019b), we examine the relationship between U100 and the leading EOFs that define the clusters. We perform linear regression between each PC time series and the contemporaneous U100 to see how changes in U100 may modulate the location of a point within the 3D PC space and thus its regime attribution. The instantaneous relationship is used since we are considering the lower stratosphere as an upper boundary condition to the troposphere, with both a much longer memory (e.g., Baldwin et al. 2003) and greater predictability (Son et al. 2020); lagged relationships (not shown) reveal these coefficients are either effectively maximized at lag 0 or, considering uncertainty, largely invariant for ±7 days (within the PC e-folding time scale). Some of this relationship may relate to the vertical extension of a primarily tropospheric zonal wind signature associated with these EOFs into the lower stratosphere. However, on subseasonal scales (well beyond tropospheric decorrelation time scales) this remains the component of the structure that is potentially predictable.
The regression coefficients are shown in Fig. 3. Although the coefficients for all three EOFs are significantly different from zero, the linear relationship is 3-5 times stronger for EOF2. Similarly, the Pearson's correlations between U100 and PCs 1 and 3 are small (r = −0.13 and 0.10, respectively), but moderate for PC2 (r = 0.42). Thus, the effect of the stratosphere in this 3D EOF space is mostly contained within EOF2, which is consistent with its annular-like spatial pattern and the height-dependent NAM results of Baldwin and Thompson (2009). The sign of the regression coefficients is such that a decrease in U100 is associated with an increase in Z500 in the vicinity of Greenland/the northern node of the NAO, in agreement with the canonical response of the troposphere to a weakened stratospheric vortex.

Theory of regime transitions and the stratosphere
In this section, we develop a theory of which regime transitions may be possible solely due to a stratospheric perturbation by jointly considering the linear relationship between U100 and the three PCs, and the location of the regimes within the space spanned by the three PCs. The theory can be interpreted as an idealized framework where all else is instantaneously equal and only the stratosphere is changed, retaining potential predictability arising from other tropospheric processes.
Using the regression coefficients between U100 and the PC time series, we define the stratospheric perturbation vector b. This vector represents the movement within the 3D PC space arising from a perturbation to U100, ΔU, that is explained by the linear regression coefficients:

$$\mathbf{b} = \Delta U\,(a_1, a_2, a_3),$$

where $a_k$ denotes the regression coefficient of PC k on U100. Note that b is not a function of the position within PC space and is thus constant for a given ΔU. While the truncation to a 3D PC space was earlier motivated by the coordinates of the regime centroids, the linear relationship between the leading three EOFs and U100 also accounts for nearly all of the linear relationship with Z500 (Fig. S6).
The transition vector g between two points (e.g., two cluster centroids) within this space is then defined as the respective distances between the coordinates in the three PCs:

$$\mathbf{g} = (\Delta \mathrm{PC}_1, \Delta \mathrm{PC}_2, \Delta \mathrm{PC}_3),$$

where $\Delta \mathrm{PC}_k = \mathrm{PC}_k(B) - \mathrm{PC}_k(A)$ for the transition from point A to point B. Hence, inverse transitions have an equal but opposite transition vector: $\mathbf{g}(A, B) = -\mathbf{g}(B, A)$.
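The angle u between b and g follows as

$$\cos u = \frac{\mathbf{b} \cdot \mathbf{g}}{\lVert \mathbf{b} \rVert\,\lVert \mathbf{g} \rVert},$$

where $\lVert \mathbf{x} \rVert = \sqrt{x_1^2 + x_2^2 + x_3^2}$ denotes the Euclidean norm of a 3D vector x.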
We use this framework to model which regime transitions are possible solely with stratospheric forcing by considering whether the vectors b (either positive or negative) and g point in a similar direction, known as "cosine similarity" (e.g., Han et al. 2012). If u ≥ 90° (cos u ≤ 0), then no component of the regime transition or movement within the 3D PC space can be explained by the linear relationship between the PCs and U100, since the contribution of b would be 0 (in the case of orthogonal vectors, u = 90°) or oppose g (cos u < 0). A smaller angle indicates the transition is more likely since the projection of b in the direction of g is larger (as cos u is larger), thus requiring a smaller ΔU. We focus on angles, rather than explicit distances, since the distances between regimes for any point are dependent on the initial location.

Figure 4 presents a 3D depiction (in the space spanned by the leading three EOFs) of b (both positive and negative; i.e., for a strengthening or weakening stratospheric vortex) applied to each regime centroid and the transition vector g between the centroids. The regime centroids form a tetrahedron in this space. Some of the transition vectors lie closer to b than others owing to their relative locations within this space. For example, the positive b vector and the transition vector from the ArH to PT centroids are close, while the transition vectors from the AkR centroid are almost perpendicular to either sign of b. The angles between the centroid g vectors and b are quantified in the protractor-like polar plots in Fig. 5. The angles are expressed such that both positive and negative b are aligned with 0° (thus, the angle between each g and b < 0 is a reflection of that to b > 0 about 90°). For a point starting at the ArH centroid (Fig. 5a), there is substantial cosine similarity between b > 0 and transition vectors to all other regimes (for all three, u < 60°). The similarity is strongest for the transition vectors to PT and ArL, which have approximately equal cosine similarity. The angles between b < 0 and all three transition vectors are >90°; thus, the theory does not allow a transition away from ArH given ΔU < 0. Overall, ArH has the largest number of transition vectors with small angles/high cosine similarity. Equally, the minimum angle between either sign of b and any g vector is between b < 0 and transitions to ArH (Figs. 5b-d). This is consistent with the observed probability of transitions into, and the persistence of, ArH/NAO−, which is the most sensitive of both the North American and North Atlantic regimes to the strength of U100 (Charlton-Perez et al. 2018; Lee et al. 2019b).
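The angle test described above can be sketched as follows. All numbers are hypothetical: the regression coefficients in `COEFS` only mimic the signs of the reported correlations, the two centroids are placeholder coordinates, and the function names are assumptions introduced for illustration.

```python
import math

# Hypothetical regression coefficients of PCs 1-3 on U100 (per m/s) and
# hypothetical centroid coordinates; none of these values are from the paper.
COEFS = (-0.1, 0.4, 0.1)
ARH = (-0.5, -1.0, 0.2)
ARL = (0.3, 1.0, 0.4)


def perturbation_vector(coefs, delta_u):
    """b: movement in PC space explained by a U100 perturbation delta_u."""
    return tuple(delta_u * c for c in coefs)


def transition_vector(point_a, point_b):
    """g: coordinate differences for the transition from A to B."""
    return tuple(pb - pa for pa, pb in zip(point_a, point_b))


def angle_deg(v, w):
    """Angle between two vectors from their cosine similarity, in degrees."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    cos_u = dot / (math.hypot(*v) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_u))))


def is_permitted(b_vec, g_vec):
    """A transition is permitted in this framework only if the angle is < 90."""
    return angle_deg(b_vec, g_vec) < 90.0


g = transition_vector(ARH, ARL)
for delta_u in (10.0, -10.0):
    b = perturbation_vector(COEFS, delta_u)
    print(delta_u, round(angle_deg(b, g), 1), is_permitted(b, g))
```

Because reversing the sign of ΔU reverses b, the two printed angles sum to 180°, which corresponds to the reflection about 90° noted in the text.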
For the PT regime (Fig. 5b), there is a small angle between the negative b vector and the transition vector to ArH (i.e., equal and opposite to the positive b and the transition from ArH to PT). While transitions are possible to both AkR with b < 0, and to ArL with b > 0, the angles are close to 90°, suggesting that these are unlikely. Considering the ArL regime (Fig. 5c), transitions to all three other regimes are possible with b < 0. The smallest angle is to the ArH transition vector, while the angles to the PT and AkR transitions are large. No regime transitions from ArL are possible in this framework with ΔU > 0. Last, the angles between the transition vectors and b are all relatively large for AkR (Fig. 5d), as previously suggested by the 3D depiction in Fig. 4. For b < 0, only a transition to ArH has an angle < 90°. Transitions to ArL and PT are possible with b > 0, but the angles are relatively large and thus more unlikely.
We next extend our analysis beyond points initiating at the centroids and incorporate the effect of spread around the PC space spanned by each regime. First, we consider all the assigned regime days in ERA-Interim. The leading three PCs are then perturbed by b in the range −30 ≤ ΔU ≤ 30 m s⁻¹, and subsequently reassigned to a regime by minimum Euclidean distance. The maximum magnitude of ΔU is chosen here to be close to the maximum observed variability in U100; the largest U100 errors in individual CY43R3 ensemble members are close to ±20 m s⁻¹. Note that in reality the tropospheric response to a small ΔU may be larger than the linear framework implies. Figure 6 depicts the conditional probability, for each initial regime, of either remaining in the same regime or transitioning to each of the other regimes for each ΔU. Only those transition pathways with u < 90° occur, and the relative likelihood manifests the degree of similarity (i.e., the angle) between b and g. There are no transitions away from ArH for ΔU < 0 (Fig. 6a) or away from ArL for ΔU > 0 (Fig. 6c). For ΔU < 0, the dominant transition for all regimes is to ArH. For ΔU > 0, transitions from ArH to PT dominate (Fig. 6a) while transitions to ArL dominate for AkR and PT (Figs. 6b,d). Transitioning into AkR from any other regime is unlikely even for large |ΔU|, while transitioning out of AkR is the least likely for any of the regimes where a transition pathway exists (despite its unique approximately equal sensitivity for either sign of ΔU). Although not explicitly shown, there is also evidence of multiple transitions occurring as |ΔU| increases. For example, the probability of transitioning into AkR from each of the other regimes reaches a peak for |ΔU| between 10 and 20 m s⁻¹ before declining.
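The perturb-and-reassign procedure can be sketched as below. This is a toy version under stated assumptions: the centroids and regression coefficients are hypothetical placeholder values (so the resulting probabilities do not reproduce Fig. 6), the input points are simply the centroids rather than all ERA-Interim days, and the function names are introduced here for illustration.

```python
import math

# Hypothetical centroids and regression coefficients (not from the paper).
CENTROIDS = {
    "ArH": (-0.5, -1.0, 0.2),
    "ArL": (0.3, 1.0, 0.4),
    "AkR": (-1.0, 0.4, -0.6),
    "PT": (1.1, 0.1, -0.3),
}
COEFS = (-0.1, 0.4, 0.1)  # change in PCs 1-3 per 1 m/s change in U100


def assign_regime(point, centroids):
    """Regime with the minimum Euclidean distance to the point."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


def transition_probabilities(points, centroids, coefs, delta_u):
    """Conditional probability of each final regime given the initial regime."""
    counts = {}
    for point in points:
        start = assign_regime(point, centroids)
        moved = tuple(p + delta_u * c for p, c in zip(point, coefs))
        end = assign_regime(moved, centroids)
        per_start = counts.setdefault(start, {})
        per_start[end] = per_start.get(end, 0) + 1
    return {
        start: {end: n / sum(ends.values()) for end, n in ends.items()}
        for start, ends in counts.items()
    }


# Usage: sweep the perturbation over a range of delta_u values (m/s).
days = list(CENTROIDS.values())
for delta_u in (-10.0, 0.0, 10.0):
    print(delta_u, transition_probabilities(days, CENTROIDS, COEFS, delta_u))
```

Applying the same constant shift to every point reflects the statement that b does not depend on the position within PC space.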
As a general diagnostic of the sensitivity of each initial regime state to a lower-stratospheric perturbation, we can consider the probability of transitioning out of the regime for ΔU = ±10 m s⁻¹ (approximately equal to the maximum week-3-4 ensemble-mean U100 error magnitude in CY43R3 hindcasts). For ΔU = 10 m s⁻¹, 58% of ArH days transition into a new regime, while only 17% of AkR days and 6% of PT days do so. For ΔU = −10 m s⁻¹, the sensitivity of PT and ArL is approximately equal, with 39% of PT and 38% of ArL days transitioning into a new regime. Only 15% of AkR days transition into a new regime.
Overall, the results presented in Figs. 4-6 are in agreement with the observed differences in regime occurrence in strong and weak stratospheric vortex states in Lee et al. (2019b). The theory also gives results consistent with the relationship between the regimes (particularly ArH and ArL) and the concurrent NAO index (Fig. S7), given the strong modulation of the NAO by the stratosphere. Further, the proposed framework yields insight into specific regime transitions under different vortex states that are not limited by the observational sample size. In summary:
• ΔU < 0 moves the majority of points within PC space toward only ArH, consistent with this regime being the only one more likely under weak vortex conditions.
• ΔU > 0 does little to change the regime assignment for days initially assigned to ArL or PT, while these are favored transitions for initial ArH and AkR states. This is consistent with ArL and PT being more likely under strong vortex conditions.
• A very large |ΔU| is required to shift toward and away from AkR, with a similar proportion of transitions resulting from both positive and negative perturbations. This behavior is consistent with the observed statistically equal occurrence of this regime in strong and weak vortex states.
These conclusions are highly idealized, requiring both a perfectly linear response and the sole (or dominant) change being to U100. It is also possible that β may be sensitive to the initial position within PC space. However, the corroboration with observations suggests the potential use of this framework in interpreting the regime response to changes and uncertainty in the stratosphere on subseasonal time scales. The analysis in the next section considers whether imposing stratospheric relaxation yields a tropospheric response consistent with this simple but novel theory.

Model experiments
In analyzing the results of the relaxation experiments, we seek to answer the following two questions:
• What is the effect of stratospheric relaxation on regime forecast accuracy in these cases?
• Regardless of the forecast accuracy, is the change in the forecast consistent with the theory in section 4?

a. Regime predictions
A comparison between the weekly mean regimes in the CTR and RLX ensembles, for weeks 3 and 4, is shown in Fig. 7. The improvement in the total number of ensemble members with a correctly assigned weekly mean regime is modest: 13% in week 3 and 15% in week 4. Therefore (recalling that these cases were selected as particularly poor forecasts), the overall fraction of correctly assigned regimes remains low in the RLX experiment: 40% in week 3 and 25% in week 4. Any improvement is also case dependent. The greatest improvement in week 3 is in the 11 December 2001 case (7 more members correctly assigned to ArH), and the greatest in week 4 is in the 29 January 1998 case (5 more members correctly assigned to PT). The latter was a case with a very large U100 error (cf. Table 1). In several cases, there is a decrease in the number of correctly assigned ensemble members. Thus, constraining the stratospheric state is not enough to fix these regime bust cases, which may be unsurprising given that only a selection of these cases have very large stratospheric errors, while all have largely inaccurate regime predictions. This result indicates that the stratospheric state should not be viewed
Figure 7 also shows that there are changes to the number of ensemble members assigned to the incorrect regimes, regardless of whether there is a change to the number assigned to the correct regime. On a member-by-member basis, 34% and 57% of the total ensemble members in weeks 3 and 4, respectively, are assigned to a different regime in the RLX experiments. Thus, by week 4, the stratospheric nudging has shifted the majority of ensemble members into a new regime, suggesting significant movement within the PC space in which the regimes are assigned. For example, in week 4 of the 11 December 2001 case, there is no increase in the number of members correctly assigned to PT, but there is a gain of eight ensemble members assigned to AkR (with ArH and ArL losing four members each).
While a full case-by-case analysis may yield further specific insight, it is beyond the scope of this study; we instead focus on the general results across this set of forecasts.

b. Error reduction in PC space
Despite the small and case-dependent regime improvement, for almost all cases the mean Euclidean distance error of the ensemble in 3D PC space is reduced (Fig. 8a). This diagnostic is useful because it incorporates changes to forecasts that maintain the same regime attribution and is proportional to the root-mean-square error (RMSE) of the Z500 field reconstructed from the leading three EOFs (see the online supplemental material; note that because non-normalized PCs are used, the total error on subseasonal time scales is dominated by the EOFs with the largest eigenvalues). Hence, in the space in which regimes are assigned, the RLX forecasts are almost entirely closer to the verification. The improvement is maximized in week 3 (median 14%), with only two cases showing an increase in error (21 December 2005 and 8 February 2010, both of which had negligible week-3 U100 errors in the CTR run). The median improvement in week 4 is 12%, but with much greater spread than week 3. There is a 30% improvement in a single case (21 December 2014), while four cases show no change or increased error (7 December 2000, 11 December 2001, 8 February 2010, and 15 February 2017). Also shown in Fig. 8a is the mean change in Euclidean distance error obtained by perturbing the PCs of the CTR ensemble by β multiplied by ΔU between the CTR and RLX experiments. This shows that a simple statistical nudge of the PCs using the known linear relationships also yields an error reduction that is on average ∼50% of that obtained by running the full dynamical relaxation experiment. Thus, a substantial component of the dynamical effect of imposing a different stratospheric state on these EOFs can be explained by the observed linear relationship between the PCs and U100. To understand whether larger stratospheric forcing yields larger error reduction, Fig. 8b shows the case-by-case change in ensemble-mean Euclidean distance error against the magnitude of the U100 change between the CTR and RLX experiments for weeks 3 and 4. There is no immediately clear relationship, with the greatest error reduction occurring with a U100 change of only 1 m s⁻¹ while the largest error increase occurs with a U100 change of 2.6 m s⁻¹ (8 February 2010). The large relative error reduction for small ΔU suggests a potential role of zonally asymmetric corrections or other changes to the vortex that do not project strongly onto U100 (and thus fall outside the framework proposed here). However, across this set of 20 cases, for ΔU exceeding 3 m s⁻¹, there is a systematic error reduction. We revisit this apparent threshold in the analysis below.
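The error-ratio diagnostic of Fig. 8 can be sketched as below. This is an illustrative reading rather than the authors' implementation: it assumes the ensemble error is the mean over members of each member's Euclidean distance to the verification (the text could also be read as the distance of the ensemble mean), and all arrays, values, and function names are hypothetical.

```python
import numpy as np

def mean_distance_error(ensemble_pcs, verification_pcs):
    """Mean over members of the Euclidean distance to the verification in PC space."""
    return float(np.linalg.norm(ensemble_pcs - verification_pcs, axis=1).mean())

def error_ratio(ctr_pcs, new_pcs, verification_pcs):
    """Ratio of the new ensemble's error to the CTR error; values below 1 are improvements."""
    return mean_distance_error(new_pcs, verification_pcs) / mean_distance_error(
        ctr_pcs, verification_pcs
    )

def statistical_nudge(ctr_pcs, beta, delta_u):
    """Shift every CTR member by the stratospheric perturbation vector times delta_u."""
    return ctr_pcs + np.asarray(beta, dtype=float) * delta_u

# PLACEHOLDER data for illustration only (weekly mean PCs of one hypothetical case).
rng = np.random.default_rng(1)
verification = np.array([0.5, -0.2, 0.1])
ctr = verification + np.array([1.5, 0.8, -0.4]) + rng.normal(scale=0.3, size=(11, 3))
rlx = verification + np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.3, size=(11, 3))
beta = np.array([0.05, 0.02, -0.01])
delta_u = -6.0  # hypothetical ensemble-mean U100 change, RLX minus CTR (m/s)

ratio_rlx = error_ratio(ctr, rlx, verification)
ratio_stat = error_ratio(ctr, statistical_nudge(ctr, beta, delta_u), verification)
```

Comparing `ratio_stat` with `ratio_rlx` across cases is the kind of comparison that underlies the statement that the statistical nudge recovers roughly half of the dynamical error reduction.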

c. Movement within PC space
We now investigate whether the movement of the forecasts within 3D PC space is consistent with what might be expected from the theory established in section 4. For this analysis, we analyze three vectors and three different angles within PC space. Figure 9 shows a schematic of this approach. The vectors are defined as follows:
• CTR-ERA: the vector between the CTR forecast and the verification from ERA-Interim (i.e., the error in the CTR forecast).
• CTR-RLX: the vector between the CTR and RLX forecasts.
• CTR-STAT: the vector between the CTR forecast and the statistically nudged forecast (i.e., the stratospheric perturbation vector β multiplied by ΔU between the CTR and RLX experiments).
Then, the size of the three angles can be used to answer the following questions:
• θ₁ = θ(CTR-ERA, CTR-RLX): Does stratospheric relaxation move the CTR forecast toward the verification?
• θ₂ = θ(CTR-RLX, CTR-STAT): Does stratospheric relaxation move the CTR forecast in the direction expected from β?
• θ₃ = θ(CTR-ERA, CTR-STAT): Does statistical nudging by β move the CTR forecast toward the verification?
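The three angles listed above can be computed from ensemble-mean positions in PC space as in the following sketch. The input values are placeholders, the function names are hypothetical, and the statistically nudged vector is taken to be β multiplied by ΔU, as described for Fig. 8a.

```python
import numpy as np

def angle_deg(a, b):
    """Angle (degrees) between vectors a and b, from their cosine similarity."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))))

def diagnostic_angles(ctr, rlx, era, beta, delta_u):
    """Angles between the CTR-ERA, CTR-RLX, and CTR-STAT vectors in PC space."""
    ctr, rlx, era = (np.asarray(x, dtype=float) for x in (ctr, rlx, era))
    ctr_era = era - ctr                                   # error of the CTR forecast
    ctr_rlx = rlx - ctr                                   # movement due to relaxation
    ctr_stat = np.asarray(beta, dtype=float) * delta_u    # statistical nudge
    return {
        "theta1": angle_deg(ctr_era, ctr_rlx),
        "theta2": angle_deg(ctr_rlx, ctr_stat),
        "theta3": angle_deg(ctr_era, ctr_stat),
    }

# PLACEHOLDER ensemble-mean PCs and beta for one hypothetical case.
angles = diagnostic_angles(
    ctr=[1.2, 0.4, -0.3],
    rlx=[0.8, 0.1, -0.2],
    era=[0.1, -0.5, 0.2],
    beta=[0.05, 0.02, -0.01],
    delta_u=-6.0,
)
# An angle below 90 degrees indicates that the two vectors point in broadly the same direction.
```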
A scatterplot of the week-3 and week-4 angles versus the magnitude of ΔU between the CTR and RLX experiments is shown in Fig. 10. To focus on the overall shift of the ensemble in the relaxed experiments, and since β is defined from linear best-fit regression coefficients, we perform this analysis on the perturbations to the PCs and U100 averaged across the ensemble. Nevertheless, similar results are obtained when considering the results across all individual ensemble members (not shown). Figure 10a shows that in the majority of cases, in both weeks 3 and 4, the stratospheric relaxation moved the predictions toward the verification. Only two cases in week 3 and six cases in week 4 do not exhibit any similarity (i.e., θ > 90°). These results are consistent with the reduction in Euclidean distance error and its relationship with the magnitude of ΔU (Fig. 8).
Figure 10b assesses whether the stratospheric perturbation vector outlined in section 4 is a good representation of the effect of a dynamically applied stratospheric perturbation. For |ΔU| < ∼3 m s⁻¹, the points are scattered across almost the full range of angles, indicating no clear relationship between the theory and the movement of these forecasts in PC space. However, although the sample is smaller, for |ΔU| > ∼3 m s⁻¹, the angles are systematically much smaller than 90°, especially for week-4 forecasts, which feature larger ΔU. Hence, we conclude that on average, these forecasts moved in PC space in the general direction expected from the theory.
Finally, Fig. 10c assesses whether the simple statistical perturbation moves the CTR forecast toward the verification without running a full dynamical experiment (cf. Fig. 10a). As in Fig. 10b, but unlike in Fig. 10a, there is no clear evidence of vector similarity for small ΔU, but there is evidence of a systematic shift for ΔU exceeding ∼3 m s⁻¹ in magnitude. As a result, for larger U100 errors the tropospheric forecast can be partially corrected statistically (as indicated by Fig. 8a), but there is evidently additional gain from a dynamically corrected stratosphere even for small ΔU.
The 3 m s⁻¹ threshold is most apparent for angles involving β, although there is some suggestion of it in the behavior of the RLX experiment (in terms of both angles and Euclidean distance error). It is not clear why 3 m s⁻¹ should be a threshold; it may be related to the signal magnitude required to emerge above the typical ensemble-mean variability, and thus may be sensitive to ensemble size. Across the CY43R3 hindcasts, 3 m s⁻¹ is approximately two-thirds of the standard deviation of the ensemble-mean U100 in weeks 3-4 (∼4.5 m s⁻¹), although these are not directly comparable owing to the smaller hindcast ensemble size. As mentioned in section 2, 3 m s⁻¹ is also approximately the MAE of the ensemble-mean week-3-4 U100 in the CY43R3 hindcasts, and so errors of this magnitude are a reasonably frequent occurrence.
FIG. 8. (a) Boxplots of the ratio between the ensemble-mean Euclidean distance error in 3D PC space between the weekly averaged RLX and CTR ensembles for the 20 cases. Red lines denote the median, and notches show 95% confidence intervals obtained by 10 000 bootstrap resamples (with replacement). Black triangles denote the mean. Blue circles represent the average ratio obtained by statistically perturbing the CTR PCs by the stratospheric perturbation vector multiplied by the change in U100 between the CTR and RLX ensembles. Whiskers extend to 1.5 times the interquartile range or extremes (whichever is smaller); outliers are shown as open circles. (b) Scatterplot of the week-3 (green squares) and week-4 (maroon circles) error ratio against the magnitude of the ensemble-mean change in U100 between CTR and RLX.
In week 4 (when ΔU is generally largest), the magnitudes of the correlations between the ensemble-mean change in the PCs and the ensemble-mean ΔU from CTR to RLX (and thus the individual components of β) are maximized. These correlations are largest for EOF2 (r = 0.60, p < 0.05) and EOF3 (r = 0.48, p < 0.05), but the correlation is small and insignificant for EOF1 (r = −0.19, p = 0.40; although it is similar to that in ERA-Interim). Furthermore, we can find the "effective" vector in the model by computing the regression coefficients between ΔU and each ΔPC across all ensemble members. For weeks 3-4, these are not significantly different from the components of β in ERA-Interim, except slightly for EOF1 in week 3. As a result, the angles between this effective vector and β are small (26° in the week-3 forecasts and 12° in the week-4 forecasts), confirming that β is a good approximation of the response to an imposed stratospheric change.
FIG. 9. Schematic of the angle-based approach (here in a 2D PC space). There are three vectors: the vector from the control forecast to the ERA-I verification (CTR-ERA; red), the vector from the control forecast to the relaxed forecast (CTR-RLX; purple), and the stratospheric perturbation vector to the statistically nudged forecast (CTR-STAT; gray). Here θ₁ denotes the angle between CTR-ERA and CTR-RLX, θ₂ the angle between CTR-RLX and CTR-STAT, and θ₃ the angle between CTR-ERA and CTR-STAT.

Nevertheless, across the range of cases studied here, the response of EOF1 to stratospheric perturbations is not well approximated by linear regression. This may be due to nonlinearity, or because the relationship between the EOF and U100 is not causal (recalling the similarity between the EOF and patterns related to tropical forcing). Sample size may be an issue, given the small expected response in EOF1. There may also be limitations in the representation of stratosphere-troposphere coupling in the model, such as the overestimation of the NAO response reported by Kolstad et al. (2020) using a similar but more recent ECMWF forecast model (CY45R1). The relatively low vertical resolution employed here, particularly in the upper troposphere and lower stratosphere, may also have limited the downward coupling and forecast improvement arising from the stratosphere (Kawatani et al. 2019; Domeisen et al. 2020c).
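The "effective" vector discussed in this subsection, and its angle to β, can be illustrated with the sketch below. It assumes an ordinary least squares regression with an intercept, fitted separately for each PC across all members; the text does not state the exact regression setup, so this is an assumption. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def effective_vector(delta_u, delta_pcs):
    """OLS slope of each delta-PC on delta-U across ensemble members (intercept removed)."""
    du = np.asarray(delta_u, dtype=float)
    dp = np.asarray(delta_pcs, dtype=float)
    du_anom = du - du.mean()
    dp_anom = dp - dp.mean(axis=0)
    return du_anom @ dp_anom / (du_anom @ du_anom)

def angle_deg(a, b):
    """Angle (degrees) between vectors a and b, from their cosine similarity."""
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))))

# SYNTHETIC data for illustration only: member-wise RLX-minus-CTR changes.
rng = np.random.default_rng(2)
beta_reference = np.array([0.05, 0.02, -0.01])   # placeholder for the reanalysis-derived beta
delta_u = rng.normal(scale=5.0, size=220)        # hypothetical U100 changes (m/s)
delta_pcs = np.outer(delta_u, beta_reference) + rng.normal(scale=0.1, size=(220, 3))

beta_effective = effective_vector(delta_u, delta_pcs)
angle_to_reference = angle_deg(beta_effective, beta_reference)
```

A small angle between the two vectors indicates that the direction of the model's response matches the observed regression, even if an individual component (such as EOF1 here) is poorly constrained.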

Summary and conclusions
Understanding and exploiting stratospheric variability is a key way in which the accuracy and usefulness of S2S forecasts and the fidelity of stratosphere-troposphere coupling within models can be increased. In this study, we investigated how perturbations to the strength of the lower-stratospheric polar vortex can influence North American weather regime predictions. Our novel technique involved jointly considering the linear relationship between the vortex strength and the leading EOFs that contribute to the regimes (Fig. 3), and the relative location of the regimes within the EOF space (Fig. 4). We used an angle-based approach to quantify which transitions are likely to occur (using cosine similarity) for a given regime and stratospheric perturbation (Fig. 5). These results agree with the observed changes in regime occurrence under different stratospheric vortex states reported in Lee et al. (2019b) and provide an explanation for the regime behavior. However, both the regime framework and EOFs are defined primarily from a mathematical, rather than physical, standpoint, and therefore the results of this work largely focus on the mathematics of regime attribution.
We then performed a set of stratospheric relaxation model experiments, selecting 20 cases from the ECMWF hindcasts in which there was strong, coherent ensemble support for an incorrect regime to dominate during weeks 3-4. The majority (14) of these cases featured U100 errors approximately equal to or greater than the MAE in either week 3 or 4 or both, suggesting a link to the erroneous tropospheric forecasts. We found that the stratospheric relaxation is not enough to eliminate the regime errors, but the relaxation does lead to shifts in the ensemble distribution of the regimes within each forecast indicating substantial movement within PC space (Fig. 7). The results also showed an overall 10%-20% improvement in the accuracy of the forecasts in terms of Euclidean distance error/RMSE, which was most consistent in cases where the stratospheric error was larger (Fig. 8).
Analysis of the transition vectors between the CTR and RLX forecasts in PC space provided insight into the effect of stratospheric relaxation in the space in which regimes are assigned. The results (Fig. 10) illustrated that stratospheric relaxation generally moved the forecasts toward the ERA-Interim verification and in the direction expected from the theory, while statistically nudging the CTR ensembles by the corresponding stratospheric perturbation vector also generally moved the forecasts toward the verification. For |ΔU| > ∼3 m s⁻¹, this effect was particularly pronounced. Consequently, the model experiments support the proposed theory of which regime transitions may be possible solely because of changes to the stratospheric state (Fig. 5).
Overall, our results provide evidence that, all else being equal:
• The average shift of an ensemble of subseasonal North American weather regime forecasts in response to changes in the strength of the lower-stratospheric vortex is broadly generic and predictable.
• Correcting the stratospheric state leads to an improvement in the large-scale subseasonal tropospheric forecast over North America, but it does not necessarily correct the regime assignment (likely due to other sources of error).
• Some tropospheric regime states are more likely to change regime assignment for a given stratospheric perturbation than others. This arises due to the location of the regimes in PC space relative to the linear tropospheric response to the stratosphere.
We therefore propose that this vector-based approach can be used to identify, a priori, the regime forecast-verification scenarios in which lower-stratospheric errors are more likely to have played a substantial role, and thus to work toward understanding the overall contribution of such errors to subseasonal North American weather regime forecast accuracy. Further, in certain circumstances when stratospheric uncertainty is dominant, the method could possibly be used in real time to qualitatively interpret regime forecast uncertainty owing to stratospheric uncertainty. This approach is likely to be most useful 2-3 weeks before SSWs or strong vortex events, when abrupt forecast shifts (e.g., Lee et al. 2019a) are more likely due to the current predictability limit of these phenomena (Domeisen et al. 2020b). It may also be plausible to use the technique on-the-fly to linearly impose alternate regime "storylines" arising from a different stratospheric evolution without running additional dynamical forecasts.
Moreover, the dominantly linear and apparently generic response to the lower-stratospheric forcing on these time scales is somewhat similar to the long-lag response following SSWs in the model experiments of White et al. (2020). The idea that the tropospheric flow configuration following an imposed stratospheric change depends on the state of the troposphere is not new (e.g., Gerber et al. 2009), but as a result, potential gains in subseasonal regime prediction skill from the stratosphere may be minimal if the tropospheric forecast otherwise drifts too far from the truth [also recently suggested by Charlton-Perez et al. (2021)]. This potential limitation is consistent with the regime forecasts remaining largely inaccurate even in cases where large lower-stratospheric errors were corrected, notwithstanding the imperfections of the model experiment.
Employing a stronger stratospheric nudging in the model experiments presented in this paper may produce greater improvement in the regime forecasts. On the other hand, constraining the prediction too strongly would exceed a realistically achievable level of stratospheric forecast accuracy on these scales. It is also plausible that the nudging may have limited potential tropospheric forecast accuracy (when compared with a true perfect stratosphere forecast) by inducing unrealistic wave behavior or generation at the boundary of the nudging domain (Hitchcock et al. 2022). Model experiments with greater horizontal and vertical resolution may also yield better results, with evidence supporting a link between increased resolution and better representation of modes of variability in S2S models (Quinting and Vitart 2019; Lee et al. 2020) and downward stratosphere-troposphere coupling (Kawatani et al. 2019). The 60-level model version used in the experiments performed here (limited by the resolution of ERA-Interim) is coarser than the 91-level model used operationally, suggesting there is scope for the impact of an improved stratospheric forecast to be greater in the operational model (and thus lead to more regime shifts).
Further, we have exclusively considered the effect of changes to the strength of the lower-stratospheric polar vortex defined through the zonal-mean zonal wind at 100 hPa and 60°N. A more complex analysis may incorporate the effects of wave propagation (Perlwitz and Harnik 2003; Kodera et al. 2008), vortex morphology (Cohen et al. 2021), or the representation of ozone chemistry (e.g., Oehrlein et al. 2020). While the use of zonal-mean quantities is motivated by annular modes, the approach can mask important subhemispheric variability such as localized wave reflection (e.g., Matthias and Kretschmer 2020).
A case-by-case analysis of the dynamics involved, including the interplay between stratospheric errors and other leading sources of subseasonal prediction (e.g., the Madden-Julian oscillation, which can act together with stratospheric variability; Schwartz and Garfinkel 2017; Barnes et al. 2019; Green and Furtado 2019), is a potentially fruitful avenue of future work. Moreover, using the proposed angular diagnostic to assess the tropospheric regime response to stratospheric perturbations across a much larger set of simulations (and in different geographic regions) will aid in understanding the robustness of the results of this study.
… Natural Environment Research Council (NERC) via the SCENARIO Doctoral Training Partnership (NE/L002566/1) at the University of Reading. S.J.W. was supported by the National Centre for Atmospheric Science, a NERC collaborative centre. This work is based on S2S data. S2S is a joint initiative of the World Weather Research Programme (WWRP) and the World Climate Research Programme (WCRP). The original S2S database is hosted at ECMWF as an extension of the TIGGE database. The authors thank Glenn Carver and Marcus Koehler at ECMWF for their support with OpenIFS and preparation of the initial conditions and relaxation data, and three anonymous reviewers for their helpful comments on earlier versions of the manuscript.