## 1. Introduction

Many features of the atmosphere–ocean system’s large-scale variability can be viewed as transitions between qualitatively different regimes. Examples include blocking, monsoons, El Niño, and sudden stratospheric warming (SSW) events (the subject of this paper), all of which are associated with extreme weather. From a scientific perspective, regime transitions are handles by which to probe the climate’s nonlinear, nonequilibrium dynamics. They expose novel physics and push us to qualitatively expand our physical understanding. From a human perspective, these relatively rare anomalies pose major societal challenges (Lesk et al. 2016; Kron et al. 2019), especially with a changing climate and increasing reliance on weather-susceptible infrastructure (e.g., Mann et al. 2017; Frame et al. 2020).

Regime transitions are used as benchmarks for model development across a hierarchy, from state-of-the-art Earth system models with billions of variables (e.g., Stephenson et al. 2008; Lengaigne and Vecchi 2010; Vitart and Robertson 2018) to conceptual low-order models with fewer than 10 variables (e.g., Charney and DeVore 1979; Timmermann et al. 2003; Ruzmaikin et al. 2003; Crommelin et al. 2004; Thual et al. 2016). In Finkel et al. (2021), we addressed near-term forecasting of regime transitions in the context of an idealized SSW model constructed by Holton and Mass (1976), which possesses two metastable states: a strong-vortex regime and a weak-vortex regime. The present paper’s chief goal is to address questions about the long-term climate statistics of rare events by way of a case study on SSW-like regime transitions in the Holton–Mass model: How often do they occur, what are their typical development pathways, and how variable are those pathways between events?

We will use the framework of transition path theory (TPT; E and Vanden-Eijnden 2006), which offers a concise set of quantities to answer these questions. An SSW event is represented as a *transition path* from the strong vortex regime, which we denote state *A*, to the weak vortex regime, state *B*. The main quantity of interest will be the *reactive current* **J**_{AB}, defined in section 3, which specifies the flow of probability density through state space *conditioned on an A → B transition event being underway*. To properly implement that conditional statement, we will need two auxiliary quantities. First, the *forward committor* *q*^{+}_{B}(**x**) is the probability that the system, starting from an initial condition **x**, next reaches *B* before *A*. This is a measure of progress toward SSW: What is the probability of observing an SSW before returning to the strong vortex climatology? Second, the *backward committor* *q*^{−}_{A}(**x**) is the probability that the system, observed at **x**, last visited *A* more recently than *B*, i.e., the model was last in the metastable strong vortex climatology, as opposed to just recovering from a recent SSW.

The forward committor itself was a primary focus of Finkel et al. (2021), where we pursued forecasting as the main objective. Committor probabilities are generally gaining traction as a metric for weather prediction; see Tantet et al. (2015) for an application to atmospheric blocking, Lee et al. (2018) for an application to tropical cyclone downscaling, Lucente et al. (2022) for an application to El Niño, and Miloshevich et al. (2022) for an application to heat waves. However, in the present paper we are pursuing climatological statistics rather than forecasting probabilities, using the committor only as an intermediate calculation for the reactive current, which characterizes the full transition process from *A* to *B* rather than its “forward half” from **x** to *B*.

Some previous studies (Crommelin 2003; Tantet et al. 2015) have visualized what are essentially reactive currents for blocking events in an observable subspace of leading EOFs. However, these studies were not couched in the language of TPT, a formalism that brings more quantitative results. Namely, the reactive current **J**_{AB} provides a direct estimate of the SSW rate, decomposing it over a continuous probability distribution of pathways. Formal TPT has not yet been widely taken up by the atmosphere–ocean science community, with a few exceptions (Finkel et al. 2020; Miron et al. 2021, 2022). Part of our goal here is to encourage a common quantitative language for discussing regime transitions, which could help to organize several existing lines of research.

The reactive current **J**_{AB}, like the committors, is a function on the full state space and must be projected onto a low-dimensional subspace of observables to be visualized. In Finkel et al. (2021), we used the zonal-mean zonal wind *U*, an index for polar vortex strength, and vertically integrated heat flux (IHF), which roughly measures the amplitude and phase tilt of vortex-disrupting planetary waves. Here we continue to use those coordinates, but also introduce a new subspace based on the zonal-mean meridional potential vorticity (PV) gradient and eddy enstrophy. These two quantities obey a conservation law in the absence of dissipation and stochastic forcing, a slight variation of the Eliassen–Palm relation. This allows us to diagnose more precisely the crucial roles of dissipation and stochastic forcing in driving the transition process, an important step toward understanding their causal relationship. Other kinds of atmospheric regime transitions will have different relevant physical diagnostics, any of which can be seen as an independent variable for the committor function and reactive current.

This paper is organized as follows. In section 2 we review the dynamical model. In section 3 we visualize the evolution of SSW events using the probability current, and introduce the key quantities for TPT—committors, densities, and currents—along with a brief summary of the method to compute them, which is more thoroughly explained in the online supplementary document. In section 4, we use reactive current to construct a composite SSW evolution, and compare this to the standard composite method. In section 5, we change coordinates to better examine the dynamics of SSW events. We assess future directions and conclude in section 6.

## 2. A stochastically forced Holton–Mass model of SSW dynamics

We use exactly the same model as in Finkel et al. (2021), which is presented here for completeness.

### a. Model specification

The Holton–Mass model describes the stratosphere in terms of a zonal-mean zonal wind and a perturbation geostrophic streamfunction *ψ*′(*x*, *y*, *z*, *t*) on a *β*-plane channel with a central latitude of *θ* = 60°N, a meridional extent of 60°, and a height of 70 km, with the coordinate *z* ranging from 0 at the bottom of the domain (the tropopause) to 70 km at the top of the domain.

The zonal wind and *ψ*′ are projected onto a single zonal wavenumber *k* = 2/(*a* cos *θ*) and a single meridional wavenumber, where *a* = 6370 km is the radius of Earth and *H* = 7 km is the scale height.

The projected variables *U* (the mean flow) and Ψ (a complex-valued wave amplitude) evolve according to the projected primitive equations and the linearized quasigeostrophic potential vorticity (QGPV) equation. A nondimensionalized version of the equations, rearranged slightly from Finkel et al. (2021), is given by Eq. (3): the mean flow *U*(*z*, *t*) satisfies Eq. (3a), and the wave amplitude Ψ(*z*, *t*) satisfies Eq. (3b). In these equations, *f*_{0} is the Coriolis parameter at 60°N, *N*^{2} = 4 × 10^{−4} s^{−2} is the stratification, and *L* = 2.5 × 10^{5} km is a horizontal length scale chosen to make the nondimensionalized *U* and Ψ variables have similar climatological variances. The linear relaxation toward *U*^{R}(*z*) = 10 m s^{−1} + (*γ*/1000)*z* on the right-hand side of Eq. (3a) is the force that maintains the typically strong polar vortex. Here *γ* = 1.5 m s^{−1} km^{−1}. The relaxation is mediated by a Newtonian cooling profile *α*(*z*), which is plotted in Fig. 1a in its original dimensional units. Meanwhile, the lower boundary condition on Ψ comes from a bottom topography *h* cos(*kx*), where *h* = 38.5 m. This serves as a source of planetary waves.

After discretizing in the vertical, the state space has dimension *d* = 3 × (27 − 2) = 75, with a state vector **X**(*t*) that collects Re{Ψ}, Im{Ψ}, and *U* at the interior levels and evolves deterministically with velocity **v**[**X**(*t*)]. We furthermore perturb the system by stochastic forcing to represent unresolved processes such as smaller-scale Rossby and gravity waves, initial condition uncertainties, and sources of model error, an approach originally put forward by Birner and Williams (2008) and used more recently by Esler and Mester (2019). The forcing is white in time, giving an Itô diffusion

$$d\mathbf{X}(t) = \mathbf{v}[\mathbf{X}(t)]\,dt + \boldsymbol{\sigma}\,d\mathbf{W}(t),$$

where **v**(**x**) (not to be confused with meridional wind velocity *υ*) is the drift function determined by Eq. (3), **W**(*t*) is an (*m* + 1)-dimensional white-noise process, and **σ** is a *d* × (*m* + 1) matrix that shapes the forcing. In general **σ** could depend on **X**, but for simplicity we fix it to a constant, defined as follows. At each time step *δt* = 0.005 days, after incrementing the full system by *δ***X** = **v**(**X**)*δt*, we additionally increment the zonal wind profile by the stochastic term of Eq. (9), whose amplitude is *σ*_{U} = 1 m s^{−1} day^{−1/2}; the units reflect the quadratic variation of Brownian motion. The numerical scheme is known as Euler–Maruyama (see, e.g., Pavliotis 2014, chapter 5). Equation (9) fully defines the matrix **σ**. For *k* = 0, …, *m*, the *k*th column starts with 50 zeros, since there is no forcing on Re{Ψ} or Im{Ψ}. The last 25 entries are evenly spaced samples of the sinusoidal factor in Eq. (9), all multiplied by *σ*_{U}.

The specific choice of stochastic forcing does affect the transition path statistics, but our method can be applied to any stochastic forcing. Because of the nonlinear coupling between *U*(*z*) and Ψ(*z*) in Eqs. (3a) and (3b), the noise injected into *U* feeds into Ψ after a single time step.
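The time-stepping procedure described above can be sketched in a few lines. This is a minimal illustration of the Euler–Maruyama update, not the model code: the drift `toy_drift`, the three-variable state, and the single-column noise matrix are hypothetical stand-ins chosen only to mimic the structure in which the wave variables receive no direct forcing and the wind variable does.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, dt, n_steps, rng):
    """Integrate dX = drift(X) dt + sigma dW with the Euler-Maruyama scheme."""
    d, n_noise = sigma.shape
    path = np.empty((n_steps + 1, d))
    path[0] = x0
    for n in range(n_steps):
        # Deterministic increment: delta X = v(X) * dt
        deterministic = drift(path[n]) * dt
        # Stochastic increment: sigma @ eta * sqrt(dt), with eta ~ N(0, I)
        eta = rng.standard_normal(n_noise)
        path[n + 1] = path[n] + deterministic + sigma @ eta * np.sqrt(dt)
    return path

# Hypothetical toy drift (NOT the Holton-Mass equations): linear relaxation.
def toy_drift(x):
    return -x

# Assumed structure: zero rows for the first two ("wave") variables,
# a nonzero row for the last ("wind") variable.
sigma = np.array([[0.0], [0.0], [1.0]])
rng = np.random.default_rng(0)
path = euler_maruyama(toy_drift, sigma, np.ones(3), dt=0.005, n_steps=1000, rng=rng)
```

In this toy drift the variables are uncoupled, so the unforced components stay deterministic. In the Holton–Mass model the coupling between *U* and Ψ would carry the noise into the wave variables at the next step.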

### b. Diagnostics

We use two main diagnostics. The first is the zonal-mean zonal wind *U*(*z*), an index for vortex strength which is used to define regimes *A* and *B*. The second is the meridional eddy heat flux, whose expression in terms of Ψ involves *R*, the ideal gas constant for dry air, and *φ*, the phase of the complex-valued streamfunction Ψ. Hence the heat flux is related to the amplitude and phase tilt of the waves, both of which rise significantly during an SSW event. We also use the density-weighted vertical integral of heat flux (IHF).

### c. Bistability

We use the same constant parameters and boundary conditions as Finkel et al. (2021), which give rise to two stable equilibria: a radiative equilibrium–like state, denoted **a**, and a disturbed state **b**, in which upward-propagating stationary waves flux momentum down to the lower boundary, weakening zonal winds. Detailed bifurcation analysis by Yoden (1987a) and Christiansen (2000) found a range of values for bottom topography *h* that create bistability. Figures 1b and 1c depict the zonal wind and streamfunction of these two equilibria. SSW events in this model are abrupt transitions from the region near **a** to the region near **b**. If a strong wave from below happens to catch the stratospheric vortex in a vulnerable configuration, then a burst of wave activity can propagate upward, ripping apart the polar vortex and causing zonal wind to collapse (Charney and Drazin 1961; Yoden 1987b). With certain parameters, the vortex can get stuck in repeated “vacillation cycles,” in which the vortex begins to restore with the help of radiative forcing, only to be undermined quickly by the wave. The situation of two well-separated equilibria is highly idealized, and not a generic feature of climate phenomena; this system, with these parameters, serves to demonstrate qualitative features of SSW, not to represent the real stratosphere quantitatively. Holton and Mass (1976), Yoden (1987b), Christiansen (2000), and Finkel et al. (2021) contain further details.

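A *transition path* is defined as an unbroken segment, or trajectory, of the system that begins in a region *A* of state space (a neighborhood of **a**) and travels to another region *B* (a neighborhood of **b**) without returning to *A*. As in Finkel et al. (2021), we define *A* and *B* based on the zonal-mean zonal wind at *z* = 30 km: *A* is the set of states with *U*(30 km) at or above an upper threshold, and *B* is the set of states with *U*(30 km) at or below a lower threshold, where the thresholds are set by the values of *U*(30 km) at **a** and **b**, respectively.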

An SSW event is then a transition from *A* to *B*, while the reverse, from *B* to *A*, represents the recovery of the vortex. The definition of *B* modifies the widely used definition of Charlton and Polvani (2007) in two ways. First, we use zonal wind at 30 km above the tropopause (in log-pressure coordinates), because 30 km is where the zonal wind profile of **b** reaches a minimum; Christiansen (2000) used this same coordinate when studying the same model. [The standard 10 hPa pressure level would correspond to *z* = −7 km × log(10/1000) − 10 km ≈ 22 km above the tropopause in this model.] Second, we modify the zonal wind thresholds in order to ensure that **a** ∈ *A* and **b** ∈ *B*.
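The bracketed conversion from pressure level to model height can be checked numerically. The short sketch below assumes, as in the formula quoted above, a 7-km scale height, a 1000-hPa reference pressure, a tropopause 10 km above the surface, and a natural logarithm; the function name is hypothetical.

```python
import math

def height_above_tropopause_km(p_hpa, scale_height_km=7.0,
                               p_surface_hpa=1000.0, tropopause_km=10.0):
    """Log-pressure height of a pressure level, measured from the tropopause."""
    # Log-pressure height above the surface: z = -H * ln(p / p_surface)
    z_surface_km = -scale_height_km * math.log(p_hpa / p_surface_hpa)
    # Shift the origin to the tropopause, the bottom of the model domain
    return z_surface_km - tropopause_km

z_10hpa = height_above_tropopause_km(10.0)
print(round(z_10hpa, 1))
```

The result is close to 22 km, several kilometers below the 30-km level used to define *A* and *B*.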

An important consequence of our *A* and *B* definitions is that the *A* → *B* transition path takes ∼80 days. By design, this includes the slow initial *preconditioning* stage of vortex breakdown in advance of the ∼10-day time horizon that traditionally comprises an SSW event. In this paper, “SSW event” should be interpreted as both the preconditioning and the ensuing vortex collapse.

Figure 2 shows time series of *U* with the transitions highlighted: one set of colored strips marks *A* → *B* transitions, while green strips denote *B* → *A* transitions. The long periods in between, which we call the *A* → *A* and *B* → *B* phases, demonstrate the bistable nature of regimes *A* and *B*. The fleeting *A* → *B* phase, however, is what we seek to understand. When the system is en route from *A* to *B*, we say it is (*AB*) *reactive*, using a term from chemistry literature where the passage from *A* (reactant) to *B* (product) models a chemical reaction. The following section will introduce the *reactive density* *π*_{AB}(**x**) and associated *reactive current* **J**_{AB}(**x**), which help us visualize the transition as a path distribution through state space and make the foregoing observations more quantitative.

## 3. The reactive density and reactive current: A distribution over transition paths

Consider a long simulation of **X**(*t*) undergoing transitions between states *A* and *B*. Aggregating together statistics from only the transition paths yields a probability distribution called the *reactive density* *π*_{AB}(**x**), defined such that *π*_{AB}(**x**) *d***x** is the probability of finding the system in *d***x**, conditional on an *A* → *B* transition being underway, where *d***x** is a small region about **x**. One could estimate *π*_{AB} by binning samples from a long simulation, but including only those samples in transit directly from *A* to *B*. Associated to *π*_{AB} is a vector field called the *reactive current* **J**_{AB}(**x**), which quantifies the probability flux passing through **x** per unit time only during transition paths. Roughly speaking, *π*_{AB} specifies where transition paths go, and **J**_{AB} specifies how they move. Below we define them formally, but Figs. 3a–c give some intuition by projecting them on the subspace (*U*, IHF) at *z* = 10, 20, and 30 km. Background shading indicates the strength of *π*_{AB}, and arrows indicate the magnitude and direction of **J**_{AB}. Overlaid in thin blue lines are 10 randomly sampled transition paths from the long ergodic simulation. These sample paths cluster in the same regions of state space identified as high probability under *π*_{AB}, and on average flow along the arrows, corroborating qualitatively that *π*_{AB}(**x**) and **J**_{AB} describe the location and evolution of the model in state space.
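The binning estimate mentioned above requires deciding, for each sample of a long time series, whether it lies on an *A* → *B* transition path. A minimal sketch follows, assuming a scalar index `u` and hypothetical thresholds (at or above 50 for *A*, at or below 5 for *B*) that are illustrative values rather than those of the model. A sample is reactive when it lies outside both sets, the most recently visited set is *A*, and the next visited set is *B*.

```python
import numpy as np

def reactive_mask(in_A, in_B):
    """Boolean mask of samples that lie on an A -> B transition path."""
    in_A = np.asarray(in_A, dtype=bool)
    in_B = np.asarray(in_B, dtype=bool)
    n = len(in_A)
    # Codes: 0 = neither set visited yet, 1 = A, 2 = B
    last_set = np.zeros(n, dtype=int)
    current = 0
    for t in range(n):  # forward sweep: most recently visited set
        if in_A[t]:
            current = 1
        elif in_B[t]:
            current = 2
        last_set[t] = current
    next_set = np.zeros(n, dtype=int)
    current = 0
    for t in range(n - 1, -1, -1):  # backward sweep: next visited set
        if in_A[t]:
            current = 1
        elif in_B[t]:
            current = 2
        next_set[t] = current
    outside = ~(in_A | in_B)
    return outside & (last_set == 1) & (next_set == 2)

# Synthetic index series with one excursion that returns to A (A -> A),
# one A -> B transition, and one B -> A recovery.
u = np.array([55.0, 40.0, 30.0, 52.0, 45.0, 20.0, 3.0, 10.0, 30.0, 55.0])
mask = reactive_mask(u >= 50.0, u <= 5.0)

# One-dimensional reactive density estimate: histogram of reactive samples only.
density, edges = np.histogram(u[mask], bins=5, range=(0.0, 60.0), density=True)
```

Only the two samples between the last visit to *A* and the arrival in *B* are marked as reactive; the earlier excursion that returns to *A* and the later recovery are excluded.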

The transition path ensemble shows marked differences between altitudes. At *z* = 10 km, the vortex strength (*U*) of states **a** and **b** is about the same, but the IHF is very distinct. The reactive current aligns with the IHF axis. Mathematically, this reflects the lower boundary condition *U*(*z* = 0) = *U*^{R}(*z* = 0). Physically, this means that the heat flux due to the wave is the dominant physical process, with only small changes in zonal wind strength. The higher altitude of *z* = 30 km, by contrast, exhibits a large reduction in zonal wind strength, but only in the late stages of the process. In fact, the pattern of reactive density *π*_{AB} at *z* = 30 km (Fig. 3c) tells us that this final deceleration is quite sudden: the magnitude of *π*_{AB} is large near *A*, meaning transition paths linger there for a long time and only slowly crawl downward and to the right. But at the point IHF(30 km) ≈ 2.5 × 10^{4} K m s^{−1}, *U*(30 km) ≈ 30 m s^{−1} (the region marked by a black rectangle in Figs. 3c,f), *π*_{AB} reduces in magnitude and the reactive current spreads out widely as it turns downward toward set *B*. This is a signal that the transition paths are becoming both faster and more variable.

As a further point of comparison with **J**_{AB}, we have plotted the minimum-action pathway from *A* to *B* with thick cyan lines (section 3 of the supplement specifies the numerical method). This represents the most likely transition path in the low-noise limit (e.g., Freidlin and Wentzell 1970; E et al. 2004; Forgoston and Moore 2018), and indeed it follows the direction of reactive current. With finite noise, however, the transition path ensemble spreads significantly around the minimum-action pathway, especially at the higher altitude of 30 km in the late stage of the transition process. Because of this, it is not possible for *any* single pathway, minimum action or not, to meaningfully represent the full ensemble.

We will show that the slow, initial phase of SSW involves *preconditioning* of the vortex: gradual erosion of the wind field by the stochastic forcing into a configuration that is especially susceptible to wave propagation. Once the wave burst is triggered, it imparts swift changes to the entire zonal wind profile. However, the bulk of SSW progress, probabilistically speaking, occurs in the preconditioning phase. Below we make this qualitative description precise by relating the reactive current to the forecast functions from Finkel et al. (2021): the committor and expected lead time metrics.

### a. Mathematical relationship between current, committor, density, and rate

Suppose the system is observed at **X**(*t*_{0}) = **x** with a vortex that is neither strong nor fully broken down, so **x** ∉ *A* ∪ *B*. The state **X**(*t*) will soon evolve into either *A* or *B*, since both are attractive. The probability of hitting *B* first is called the *forward committor* (to *B*):

$$q^+_B(\mathbf{x}) = \mathbb{P}_{\mathbf{x}}\left\{\mathbf{X}\left[\tau^+_{A\cup B}(t_0)\right] \in B\right\},$$

where the subscript **x** denotes a conditional probability given **X**(*t*_{0}) = **x**, and $\tau^+_S(t_0) = \min\{t \ge t_0 : \mathbf{X}(t) \in S\}$ is the *first hitting time* after *t*_{0} to a set *S*.

Our system is autonomous, with no external time-dependent forcing, so we can set *t*_{0} = 0 and drop the argument from the hitting time. Nonautonomous systems can be accommodated by augmenting **x** with a periodic variable for time (e.g., to include the seasonal cycle) or by augmenting *A* and *B* to include initial and terminal times (e.g., to better examine climate change effects). Periodic- and finite-time TPT has been presented formally in Helfmann et al. (2020), and we have applied it to a dataset of state-of-the-art ensemble forecasts in Finkel et al. (2022). As a conceptual demonstration, however, the autonomous Holton–Mass model makes for a clearer exposition.

A second forecast function is the *expected lead time* (to *B*), defined as the expected time until hitting *B* conditional on hitting *B* first. Finkel et al. (2021) described both forecast functions in the subspace (*U*, IHF). We do the same here, but additionally we overlay the reactive current. In Figs. 3d–f, background shading represents the expected lead time and black contours represent committor level sets of 0.1, 0.2, 0.5, 0.8, and 0.9.

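The committor’s contour structure differs markedly between altitude levels. At 10 and 30 km (Figs. 3d,f), the contours have kinks: depending on the initial condition, a fluctuation in either *U* or IHF might have the greater effect on the committor. The intermediate altitude of 20 km seems special in having committor contours that align with the IHF axis along the main channel of reactive current. In other words, along that channel the committor is determined almost entirely by *U*(20 km), which is consistent with the finding in Finkel et al. (2021) that the 21.5-km altitude holds the most predictive power.

The reactive current **J**_{AB} is related to the committor, but **J**_{AB} contains some key information that the committor does not. As a *fore*cast function, the committor does not distinguish *A* → *B* transitions from *B* → *B* transitions, where the system leaves state *B* (beginning to recover), but then falls back to the weak-vortex state. To isolate the transition events from *A* to *B*, we need to introduce the *backward committor* (to *A*):

$$q^-_A(\mathbf{x}) = \mathbb{P}_{\mathbf{x}}\left\{\mathbf{X}\left[\tau^-_{A\cup B}(t_0)\right] \in A\right\},$$

where $\tau^-_S(t_0) = \max\{t \le t_0 : \mathbf{X}(t) \in S\}$ is the *most recent hitting time*. That is, *q*^{−}_{A}(**x**) is the probability that a system observed at **x** last came from *A*, not *B*. The backward-in-time probabilities refer specifically to the process **X**(*t*) *in steady-state*, allowing us once again to set *t*_{0} = 0. In other words, the backward committor is defined with respect to the *steady-state probability density π*(**x**), where *π*(**x**) *d***x** is the long-run fraction of time that the system spends in a small region *d***x** about **x**.

Combining the steady-state density with the two committors gives the reactive density,

$$\pi_{AB}(\mathbf{x}) = \frac{1}{Z_{AB}}\,\pi(\mathbf{x})\,q^-_A(\mathbf{x})\,q^+_B(\mathbf{x}),$$

where *Z*_{AB} is a normalizing constant such that the right-hand side integrates to one. The associated reactive current can in turn be expressed in terms of *π*, the two committors, and their gradients, where **∇** represents the gradient operator over state space.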
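For reference, the standard TPT expression for the reactive current of an Itô diffusion with drift **v** and constant noise matrix **σ** can be written compactly. The block below is a sketch following the general theory (E and Vanden-Eijnden 2006) and is not a reproduction of this paper's numbered equations. The symbols *q*^{+}_{B} and *q*^{−}_{A} for the forward and backward committors, the diffusion matrix **D**, and the steady-state probability current **J** are notational assumptions introduced here.

```latex
\begin{aligned}
\mathbf{J}(\mathbf{x}) &= \pi(\mathbf{x})\,\mathbf{v}(\mathbf{x})
  - \nabla\cdot\left[\mathbf{D}\,\pi(\mathbf{x})\right],
  \qquad \mathbf{D} = \tfrac{1}{2}\,\boldsymbol{\sigma}\boldsymbol{\sigma}^{\top}, \\
\mathbf{J}_{AB}(\mathbf{x}) &= q^-_A(\mathbf{x})\,q^+_B(\mathbf{x})\,\mathbf{J}(\mathbf{x})
  + \pi(\mathbf{x})\,\mathbf{D}\left[\,q^-_A(\mathbf{x})\,\nabla q^+_B(\mathbf{x})
  - q^+_B(\mathbf{x})\,\nabla q^-_A(\mathbf{x})\right].
\end{aligned}
```

The first term is the steady-state current weighted by the probability that a path through **x** is reactive. The second is a diffusive contribution that points up the gradient of the forward committor and down the gradient of the backward committor.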

The reactive current also determines the *rate*, or inverse return time, of the event [approximately (1700 days)^{−1} for the Holton–Mass model with our chosen parameters]. Let *C* be a closed hypersurface in state space that encloses *A* and is disjoint with *B*; we call this a *dividing surface*. In the context of the diagrams in Fig. 3, *C* is any curve separating region *A* from region *B*. Then we have

$$\text{rate} = \int_C \mathbf{J}_{AB}\cdot\mathbf{n}\,dS, \tag{22}$$

where **n** is an outward unit normal from *C* and *dS* is a surface area element. The integral relationship (22) holds for any dividing surface, implying that the current is divergence-free outside of *A* and *B*, but has a source in *A* and a sink in *B* [see Vanden-Eijnden (2006) for a thorough mathematical explanation of **J**_{AB}]. This constraint immediately implies a link between magnitude and width of **J**_{AB} streamlines. In Figs. 3c and 3f, the strong magnitude of **J**_{AB} near **a** implies a thin central channel, and strict constraints on the mechanisms of early SSW onset. In other words, the initial preconditioning phase can only happen in a small number of ways. On the other hand, the subsequent weakening of **J**_{AB} implies a wider channel, and hence a greater variety of pathways in the late stage of the transition. This widening appears in the projection onto *U* and IHF at 30 km; at the lower altitudes, the current remains strong and narrow all the way through the transition process (Fig. 3, columns 1 and 2).

The reactive current and density characterize the transition path ensemble across the continuum of possible pathways, providing more information than the numerical value of the rate itself. Given any user-defined set of coordinates, the reactive current projection maps the transition paths in those coordinates, as a statistical ensemble with average behavior and variability. Below, following a brief note on the computational method, sections 4 and 5 demonstrate how to use reactive current and density to describe climatology and strengthen physical understanding of a rare transition event.

### b. Computational method

The quantities presented in section 3, as well as the results to follow, could be computed directly by running a model for long enough to undergo a large number of SSW events and analyzing the statistics of those transitions. This procedure, which we call the “ergodic simulation” (ES) method, is possible in the 75-dimensional Holton–Mass model, and we have performed such a simulation of 10^{6} days for validation purposes. However, this can be a major computational barrier in global climate models when the numerical integration is costly and the return period is long compared to the simulation time step. Anticipating the need for fundamentally different techniques in high-dimensional state spaces, we have instead used the dynamical Galerkin approximation (DGA; Thiede et al. 2019; Strahan et al. 2021). A large collection of trajectories are launched in parallel with initial conditions distributed across state space, each one running for only a short time relative to the return period. Here we use 3 × 10^{5} trajectories of length 20 days each, which is shorter than the 80-day duration of a single SSW event and much shorter than the 1700-day return period. Afterward, we assemble all these pieces together to estimate the quantities of interest, exploiting the Markov property. The total simulation time is not always reduced by this method—in our case, the short simulations total 6 × 10^{6} days compared with the 1 × 10^{6}-day ES—but the format opens the door for many interesting possibilities, such as massive parallelization and adaptive sampling. In particular, as we show in Finkel et al. (2022), DGA is uniquely positioned to exploit large ensembles of short weather forecasts from high-fidelity operational models.
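To illustrate how short trajectories can be stitched together through the Markov property, the sketch below estimates a forward committor from start and end points of short trajectories that have been assigned to discrete bins. This is a simplified Markov-state-model stand-in (indicator basis functions on hypothetical bins), not the DGA implementation used in this paper; the function name, the six-bin state space, and the trajectory pairs are all assumptions made for the example.

```python
import numpy as np

def committor_from_pairs(start, end, n_states, A, B):
    """Forward committor on discrete bins from (start, end) trajectory pairs."""
    A = np.asarray(A)
    B = np.asarray(B)
    # Count transitions between bins over one short-trajectory lag time
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (start, end), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalize to a transition matrix; unvisited bins keep zero rows
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Boundary conditions: q = 0 on A, q = 1 on B
    q = np.zeros(n_states)
    q[B] = 1.0
    # Interior bins D solve (I - P_DD) q_D = P_DB 1
    D = np.setdiff1d(np.arange(n_states), np.concatenate([A, B]))
    lhs = np.eye(len(D)) - P[np.ix_(D, D)]
    rhs = P[np.ix_(D, B)].sum(axis=1)
    q[D] = np.linalg.solve(lhs, rhs)
    return q

# Hypothetical data: from each interior bin, one short trajectory ends one
# bin lower and one ends one bin higher (an unbiased random walk).
start = np.array([1, 1, 2, 2, 3, 3, 4, 4])
end = np.array([0, 2, 1, 3, 2, 4, 3, 5])
q = committor_from_pairs(start, end, n_states=6, A=[0], B=[5])
print(q)
```

For this unbiased walk the committor rises linearly from 0 on *A* to 1 on *B*, the known exact answer. No single trajectory needs to travel from *A* to *B* for the estimate to be obtained, which is the property that makes short-trajectory methods attractive when the return period is long.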

The basic DGA algorithm for rare event analysis has been described and tested in a recent series of articles (Thiede et al. 2019; Strahan et al. 2021; Finkel et al. 2021; Antoszewski et al. 2021). It is closely related to the “analog Markov chain” approach of Lucente et al. (2021). Recently, an approach to learning neural network approximations of forecast functions using short trajectory data was introduced in Strahan et al. (2022). Due to the dependence on steady state and backward-in-time quantities, a full TPT analysis as carried out in this paper requires additional calculations beyond what is described in Finkel et al. (2021). We leave these details to the supplement in order to keep the focus on the results of our TPT analysis, which are robust with respect to algorithmic parameters.

## 4. SSW composites

Here we explain the traditional notion of a rare event “composite” and contrast it with the composite intrinsically defined by TPT. The results are qualitatively similar, but the TPT description allows a rigorous mathematical connection to the reactive current and SSW rate.

The standard “composite” of an SSW event is a day-by-day aggregate of all the SSW events in a given dataset, aligned by the central warming date. This can include statistics, such as the mean and quantiles, of any observable function, such as the zonal-mean zonal wind or heat flux. Charlton and Polvani (2007) and Charlton et al. (2007) used this method to describe SSW climatology and establish benchmarks for stratosphere-resolving GCMs. We form a standard composite of *U*(30 km) from our Holton–Mass model in Fig. 4a, averaging together 300 events from a long ergodic simulation.

Here, we propose a complementary “TPT composite” based on reactive density. Instead of aligning events by the central warming date, we align the events by a general coordinate *f*(**x**), which can be user-defined but must fulfill the minimal criterion of increasing from *A* to *B*, so it represents some objective notion of progress. At any progress level *f*_{0}, the TPT composite is defined by restricting the reactive density *π*_{AB}(**x**) to the level set {**x**: *f*(**x**) = *f*_{0}}. Fixing *f* = *f*_{0} is not the same as fixing the lead time, because *f*(**x**) is a deterministic function of the initial condition **x**, unlike the hitting time, which is random.

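In Figs. 4b and 4c, we juxtapose two alternative composites with the standard composite aligned by warming date. One uses a progress coordinate based on the expected lead time, i.e., the *expected* time until the central warming date. The other uses the forward committor, which increases from 0 on *A* to 1 on *B*.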

The traditional and TPT composites are similar in shape, with an initially gradual decay in *U*(30 km) accelerating into a rapid decline in the final few days. As a function of lead time, whether actual or expected, *U*(30 km) accelerates steadily through the whole transition, in both the traditional and TPT composites. But as a function of committor, *U*(30 km) decreases linearly at first and then accelerates downward later in the transition. The composites also differ in their spread. In the traditional composite, *U*(30 km) becomes steadily less variable over time, with the whole ensemble collapsing into a single path by construction, as *t* = 0 is the time of the event when *U*(30 km) = 0. But when viewed as a function of expected lead time or committor, *U*(30 km) becomes more variable in the middle of the path, on the approach to *q*^{+}_{B} → 1.

The same variability is reflected in Figs. 3c and 3f. In the boxed region, the reactive density weakens and the reactive current spreads out, some paths turning straight downward into *B* and others accumulating still more heat flux before making the plunge. The *U* flank of the current, especially in the boxed region, the *U* axis. The lowest visible level set of *U*(30 km).

Physically, the TPT composites are more variable than the traditional composite because they do not condition on the timing of the event, which cannot be known in advance: given only Ψ(*z*, *t*) and *U*(*z*, *t*) at a given time, the TPT composite is the best one can do. The expected lead time quantifies SSW predictability, as established in Finkel et al. (2021). Here, we additionally incorporate the backward committor through *π*_{AB}, and so restrict focus to *transition* events (“major warmings”) from *A* to *B*.

As a loose analogy, a student’s progress toward a degree can be measured objectively in course credits. On the other hand, first-year exams might weed out half of all students, which means that the *probabilistic* halfway point usually comes before half of required credits are done. A third metric, the time until graduation, can vary due to random effects like gap years and pandemics, which can cause a student to space their course load unevenly in time. Each cross section of the student population—conditioning on a fixed number of credits completed, probability of graduation, or expected time until graduation—is a different statistical ensemble, each one conveying different information.

Going forward, we will use the committor as the progress coordinate of choice. That way, each point along the composite is an average over trajectories that are equally predictable in their probability to reach *B*, i.e., to proceed to an SSW. Often it is not just a singular coin toss that determines the fate of **X**(*t*), but a whole sequence of “coin tosses”—random turns through state space—aligning in just such a way to navigate from *A* to *B*. With the committor as a progress coordinate, the “coin tosses” are equidistributed along the horizontal axis, though they may not be equidistributed in time.

The same composite technique can be used to visualize the vertical wind structure at different stages. Figure 5 plots composites of the zonal wind profile *U*(*z*). The layer *z* = 20–25 km is the key transition region, below which zonal wind evolves relatively smoothly and with a symmetric distribution, and above which it varies rapidly with a skewed distribution. This mirrors the cooling profile *α*(*z*), which has its own transition region centered at 25 km. It is not surprising that zonal wind just below, at 21.5 km, is an optimal linear predictor, as we found in Finkel et al. (2021).

## 5. A wave–mean flow interaction perspective

The previous section presented **J**_{AB} and *π*_{AB} as functions of two basic observables, zonal wind and integrated heat flux, and constructed a composite evolution of these observables. In this section, we incorporate more detailed physical knowledge to improve the interpretability of our TPT results. In particular, we manipulate the dynamical equations to derive an enstrophy budget in the Holton–Mass model, which reveals a more natural set of coordinates that separates conservative from nonconservative processes. By visualizing the current in these coordinates, we identify physical drivers of each stage in the transition process. Our goal is twofold: first, to show how TPT can be formulated for any observables, and second, more narrowly in the context of this study, to show how the dynamics become clearer when those observables are well chosen.

### a. An eddy enstrophy formulation of the Holton–Mass model

A common diagnostic for wave–mean flow interaction systems is the wave activity, which helps to explain the dynamics of the Holton–Mass model, in particular the upward wave propagation that destabilizes the vortex. Below we derive a related set of equations for the eddy enstrophy, which enjoys a simpler balance equation and which we have found is better numerically suited for TPT analysis.

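We multiply the linearized QGPV equation by the perturbation potential vorticity *q*′ and take a zonal average, yielding an evolution equation for the eddy enstrophy.

We wish to work with the projected version of the equation, Eq. (3b), rather than the original PDE, to account for the approximation made in the projection. The perturbation PV *q*′ is represented in the projected equations through the streamfunction *ψ*′(*x*, *y*, *z*, *t*), which is in geostrophic balance with the wind (*u*, *υ*).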

*β*, as in Yoden (1987b). Instead, we next turn to the mean-flow Eq. (3a), which is an evolution equation for the PV gradient

_{e}*β*rather than

_{e}*U*directly. Multiplying through by

*β*, we find

_{e}*z*) is the squared meridional gradient of zonal-mean potential vorticity, which is highly correlated to zonal wind strength

*U*(

*z*) in the Holton–Mass model;

*R*is a relaxation coefficient for Γ, strengthening the vortex via radiative cooling.

*F _{q}β _{e}* cancels to give

*α*(*z*), which appears both in *D* and *R*.

### b. Using the reactive current to quantify the importance of nonconservative processes

Dissipation and forcing act to disrupt the conservation of *U*, IHF). We take square roots because the visualizations are clearer, and the units of s^{−1} are more comparable with those of zonal wind *U*(*z*) and radiative cooling *α*(*z*). [We note that the fixed point **b** in Fig. 6d appears to have committor <1; this is possible when projecting out nonlinear coordinates because set *B* is defined based on the 30-km level, and the state-space regions that resemble **b** at 10 km may not resemble it at 30 km.] In the upper stratosphere, at *z* = 30 km (Figs. 6c,f), the main channel of reactive current flows along a circular arc, approximately conserving *U*, IHF) space, the reactive density *π _{AB}* decreases along that circular arc, meaning the transition paths accelerate.
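To make the link between conservation and circular arcs explicit, consider two generic nonnegative quantities *X* and *Y* whose sum is conserved. These symbols are stand-ins introduced here for illustration only; they do not reproduce the exact normalization of the enstrophy budget.

```latex
X + Y = C
\quad\Longrightarrow\quad
\bigl(X^{1/2}\bigr)^{2} + \bigl(Y^{1/2}\bigr)^{2} = C .
```

Under that assumption, the point (*X*^{1/2}, *Y*^{1/2}) moves on a circle of radius *C*^{1/2}, so departures of the current from a circular arc in square-root coordinates point to nonconservative processes.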

On the other hand, **J*** _{AB}* projected at *z* = 10 km (Figs. 6a,d) shows that the dynamics are never conservative in the lower stratosphere: the initial motion points not along a circular arc but directly leftward, such that

Finally, consider the middle altitude of 20 km, where **J*** _{AB}* has a shape that is intermediate between the current at 10 and 30 km. It does not have distinctly positive or negative curvature, but flows along a straight channel from *A* to *B*. Twenty kilometers seems to be in just the right altitude range to feel significant dissipation and stochastic forcing (a feature of the lower boundary) but also to channel a good share of the loss of Γ to the gain of

^{1/2}(20 km) and

Figures 7a–c show the composite evolution of

At 20 and 30 km, the distribution of

The composites, as well as the reactive currents, support the notion of the “typical” transition path as an initially nonconservative creep at low altitudes, which opens up a valve to allow waves to propagate upward and finally yields a very abrupt collapse at high altitudes after a long, mostly conservative phase. With the enstrophy budget, Eq. (28), we can assess the importance of each term by plotting those composites as well. Figures 7d–f show the composite evolution of each term at each altitude: *Rβ _{e}* (the relaxation of the squared mean PV gradient, Γ) in blue,

*D* (the dissipation of enstrophy,

*β _{e}F _{q}* (the transfer of enstrophy from Γ to

*relative* to the total budget. The sum

**a** or **b**, depending on where the initial condition falls relative to the surface dividing the two attractors.

**J*** _{AB}* ⋅ **∇***f*(**x**) for any observable *f*, so it is appropriate to view the arrows in Figs. 3 and 6 as a proxy for the stochastic tendencies of the projected observables.
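As a minimal illustration of the dot product between a current vector and the gradient of an observable, the sketch below estimates the gradient by central differences and contracts it with the current. The state, the current vector, the observable, and the function name `directional_tendency` are hypothetical toy choices, not output from the Holton–Mass model or from the authors' code.

```python
import numpy as np

def directional_tendency(J, x, f, eps=1e-6):
    """Return J . grad f(x), with grad f estimated by central differences."""
    x = np.asarray(x, dtype=float)
    J = np.asarray(J, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return float(J @ grad)

# Hypothetical two-variable state and current vector (illustrative values only).
x = np.array([3.0, 4.0])
J = np.array([1.0, -2.0])

def radius(s):
    """Toy observable f(x) = |x|; its gradient at (3, 4) is (0.6, 0.8)."""
    return np.sqrt(np.sum(s**2))

tendency = directional_tendency(J, x, radius)
print(tendency)  # negative: this current vector points toward smaller |x|
```

A negative value means that the current, at that point, carries the observable downward; the sign and size of this number are what an arrow in a projected current plot encodes along one axis.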

We introduce *q*_{0} highlights the stochastic effects responsible for taking the system from *q*_{0} to *q*_{0} + *dq*. Often it is not just a single coin flip that decides the fate of **X**(*t*), but a whole sequence of random turns through state space aligning in just such a way to navigate from *A* to *B*.

The role of stochasticity is most stark at 10 and 20 km (Figs. 7d,e) and for *B*, the easier it is for deterministic drift to carry it out alone. At 30 km (Fig. 7f), all forms of dissipation and forcing start out *relatively* small compared to the magnitude of *slows* the collapse of *U*(30 km) at the end. It seems that to achieve the *A* → *B* transition, which is defined entirely in terms of *U*(30 km), the most common mechanism is a persistent negative push applied to lower altitudes, and this ultimately sets up the higher altitudes for more sudden, deterministic collapse after the “hard work” of eroding the vortex from below is mostly finished.

In summary, the TPT diagnostics have demonstrated that the SSW process begins with steady, significant decay of the PV gradient (here, its squared gradient, Γ) at lower altitudes, driven by the stochastic forcing, with only conservative changes taking place at higher altitudes. This preconditioning of the vortex opens up a valve to the midstratosphere. In the late stages of the transition, starting between

## 6. Conclusions

Transition path theory (TPT) is a mathematical framework that can be used to assess the near-term predictability and long-term climatology of anomalous weather events. The framework lends itself naturally to events associated with regime transitions, but it can be applied to more general anomalies. The key is to be able to define a suitable “reaction coordinate,” or measure of progress, linking the event to the mean state. We have analyzed the statistical ensemble of sudden stratospheric warmings (SSWs) in the idealized Holton–Mass model. Here, measures of the vortex strength (or the mean potential vorticity) and heat flux (eddy enstrophy) provide natural coordinates for applying the theory.

Probability densities and currents tell us how the system evolves through state space during a breakdown of the polar stratospheric vortex. The reactive current, **J*** _{AB}*, allows one to condition dynamical tendencies on the occurrence of a rare event. By overlaying **J*** _{AB}* over observable subspaces at different altitudes in the stratosphere, we have identified the key roles of dissipation and stochastic forcing in driving SSWs in the Holton–Mass model. The stochastic driving represents the effects of unresolved Rossby and gravity waves that have been stripped from this highly truncated model. The action of these nonconservative processes, stochastic driving in particular, matters most at lower altitudes early in the transition process, conditioning the vortex, while the higher altitudes are shielded from significant dissipation. It is only late in the transition process, after the likelihood of the event has surpassed 60%, that the upper-level winds play a significant role in the dynamics.

This work is an early application of TPT to atmospheric science. We believe it holds potential as a framework for forecasting, risk analysis, and uncertainty quantification. Thus far, it has been used mainly to analyze protein folding in molecular dynamics, but is now being applied in diverse fields such as social science (Helfmann et al. 2021), as well as ocean and atmospheric science (Finkel et al. 2020; Helfmann et al. 2020; Miron et al. 2021, 2022). TPT results are best interpreted when viewed in a physically meaningful observable subspace of variables. Utilizing physical knowledge and experience with the system allows one to gain the most from the methodology. With the rather simple Holton–Mass model, we identified such a subspace based on an enstrophy budget. In different versions of quasigeostrophic dynamics, the wave activity (Nakamura and Solomon 2010; Lubis et al. 2018) and other diagnostics based on the transformed Eulerian mean (Andrews and McIntyre 1976) are likely to be informative coordinates.

Significant challenges remain for deploying TPT analysis at scale to state-of-the-art climate models. We have used a dynamical Galerkin approximation (DGA) short trajectory analysis algorithm to compute TPT quantities. One important limitation of this computational pipeline is the data generation step. We used a long direct simulation to sample the background climatology, which served the double purpose of seeding initial data points for short trajectories and providing a ground truth for validating the accuracy of DGA. The former point is critical: one must cover the space of initial conditions to capture the dynamics of extreme events. In some cases, short trajectory data already exist, e.g., from the subseasonal-to-seasonal (S2S) database (Vitart and Robertson 2018), which we have used recently in Finkel et al. (2022) to estimate centennial-scale SSW rates from only 21 years of ensemble forecasts. In other cases, it is advantageous to generate fresh data in undersampled regions of state space, which would require more advanced sampling methods such as the adaptive sampling strategies proposed in Lucente et al. (2021) and Strahan et al. (2022), or rare event simulation schemes such as in Mohamad and Sapsis (2018), Ragone et al. (2018), Webber et al. (2019), and Ragone and Bouchet (2020).
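The idea of learning long-time statistics from many short trajectories can be sketched on a toy problem. The example below is not the DGA pipeline used in this work: it swaps in a simpler Markov-state-model estimate (indicator functions on bins) of the committor for a one-dimensional double-well system. The dynamics, noise level, lag time, bin layout, and set definitions are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def short_trajectories(x0, n_steps=20, dt=0.01, sigma=0.7):
    """Euler-Maruyama for dX = -(X^3 - X) dt + sigma dW; returns endpoints only."""
    x = x0.copy()
    for _ in range(n_steps):
        x += -(x**3 - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.size)
    return x

# 1) Seed short trajectories across state space (here: uniformly, an assumption;
#    a long control run could supply the starting points instead).
x_start = rng.uniform(-1.6, 1.6, size=200_000)
x_end = short_trajectories(x_start)

# 2) Coarse-grain into bins and count start-bin -> end-bin transitions.
edges = np.linspace(-1.6, 1.6, 33)  # 32 bins of width 0.1
n_bins = edges.size - 1
i = np.clip(np.digitize(x_start, edges) - 1, 0, n_bins - 1)
j = np.clip(np.digitize(x_end, edges) - 1, 0, n_bins - 1)
counts = np.zeros((n_bins, n_bins))
np.add.at(counts, (i, j), 1.0)
P = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic matrix

# 3) Define toy sets A (left well) and B (right well) on bin centers.
centers = 0.5 * (edges[:-1] + edges[1:])
in_A = centers < -1.0
in_B = centers > 1.0
in_D = ~(in_A | in_B)

# 4) Committor: q = 0 on A, q = 1 on B, and (I - P_DD) q_D = P_DB 1 elsewhere.
lhs = np.eye(in_D.sum()) - P[np.ix_(in_D, in_D)]
rhs = P[np.ix_(in_D, in_B)].sum(axis=1)
q = np.zeros(n_bins)
q[in_B] = 1.0
q[in_D] = np.linalg.solve(lhs, rhs)
```

No single trajectory in this sketch travels from *A* to *B*; the committor emerges from stitching together short segments, which is why covering the space of initial conditions matters. A finite lag time and coarse bins introduce biases that a more careful basis choice would aim to reduce.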

## Acknowledgments.

During the time of writing, J.F. was supported by the U.S. DOE, Office of Science, Office of Advanced Scientific Computing Research, Department of Energy Computational Science Graduate Fellowship under Award DE-SC0019323. During the time of writing, R.J.W. was supported by New York University’s Dean’s Dissertation Fellowship and by the Research Training Group in Modeling and Simulation funded by the NSF via Grant RTG/DMS-1646339. E.P.G. acknowledges support from the NSF through Grants AGS-1852727 and OAC-2004572. This work was partially supported by the NASA Astrobiology Program, Grant 80NSSC18K0829, and benefited from participation in the NASA Nexus for Exoplanet Systems Science research coordination network. J.W. acknowledges support from the Advanced Scientific Computing Research Program within the DOE Office of Science through Award DE-SC0020427 and from the NSF through Award DMS-2054306. The computations in the paper were done on the high-performance computing cluster at New York University. We thank John Strahan, Aaron Dinner, and Chatipat Lorpaiboon for many helpful conversations and methodological advice.

## Data availability statement.

The code to produce the dataset and results, either on the Holton–Mass model or on other systems, is publicly available at https://github.com/justinfocus12/SHORT. Interested users are encouraged to contact J.F. for more guidance on usage of the code.

## REFERENCES

Andrews, D. G., and M. E. McIntyre, 1976: Planetary waves in horizontal and vertical shear: The generalized Eliassen-Palm relation and the mean zonal acceleration. *J. Atmos. Sci.*, **33**, 2031–2048, https://doi.org/10.1175/1520-0469(1976)033<2031:PWIHAV>2.0.CO;2.

Antoszewski, A., C. Lorpaiboon, J. Strahan, and A. R. Dinner, 2021: Kinetics of phenol escape from the insulin R6 hexamer. *J. Phys. Chem.*, **125B**, 11 637–11 649, https://doi.org/10.1021/acs.jpcb.1c06544.

Birner, T., and P. D. Williams, 2008: Sudden stratospheric warmings as noise-induced transitions. *J. Atmos. Sci.*, **65**, 3337–3343, https://doi.org/10.1175/2008JAS2770.1.

Bolhuis, P. G., D. Chandler, C. Dellago, and P. L. Geissler, 2002: Transition path sampling: Throwing ropes over mountain passes in the dark. *Annu. Rev. Phys. Chem.*, **53**, 291–318, https://doi.org/10.1146/annurev.physchem.53.082301.113146.

Charlton, A. J., and L. M. Polvani, 2007: A new look at stratospheric sudden warmings. Part I: Climatology and modeling benchmarks. *J. Climate*, **20**, 449–469, https://doi.org/10.1175/JCLI3996.1.

Charlton, A. J., and Coauthors, 2007: A new look at stratospheric sudden warmings. Part II: Evaluation of numerical model simulations. *J. Climate*, **20**, 470–488, https://doi.org/10.1175/JCLI3994.1.

Charney, J. G., and P. G. Drazin, 1961: Propagation of planetary-scale disturbances from the lower into the upper atmosphere. *J. Geophys. Res.*, **66**, 83–109, https://doi.org/10.1029/JZ066i001p00083.

Charney, J. G., and J. G. DeVore, 1979: Multiple flow equilibria in the atmosphere and blocking. *J. Atmos. Sci.*, **36**, 1205–1216, https://doi.org/10.1175/1520-0469(1979)036<1205:MFEITA>2.0.CO;2.

Christiansen, B., 2000: Chaos, quasiperiodicity, and interannual variability: Studies of a stratospheric vacillation model. *J. Atmos. Sci.*, **57**, 3161–3173, https://doi.org/10.1175/1520-0469(2000)057<3161:CQAIVS>2.0.CO;2.

Crommelin, D. T., 2003: Regime transitions and heteroclinic connections in a barotropic atmosphere. *J. Atmos. Sci.*, **60**, 229–246, https://doi.org/10.1175/1520-0469(2003)060<0229:RTAHCI>2.0.CO;2.

Crommelin, D. T., J. D. Opsteegh, and F. Verhulst, 2004: A mechanism for atmospheric regime behavior. *J. Atmos. Sci.*, **61**, 1406–1419, https://doi.org/10.1175/1520-0469(2004)061<1406:AMFARB>2.0.CO;2.

Du, R., V. S. Pande, A. Y. Grosberg, T. Tanaka, and E. S. Shakhnovich, 1998: On the transition coordinate for protein folding. *J. Chem. Phys.*, **108**, 334, https://doi.org/10.1063/1.475393.

E, W., and E. Vanden-Eijnden, 2006: Towards a theory of transition paths. *J. Stat. Phys.*, **123**, 503–523, https://doi.org/10.1007/s10955-005-9003-9.

E, W., W. Ren, and E. Vanden-Eijnden, 2004: Minimum action method for the study of rare events. *Commun. Pure Appl. Math.*, **57**, 637–656, https://doi.org/10.1002/cpa.20005.

Esler, J. G., and M. Mester, 2019: Noise-induced vortex-splitting stratospheric sudden warmings. *Quart. J. Roy. Meteor. Soc.*, **145**, 476–494, https://doi.org/10.1002/qj.3443.

Finkel, J., D. S. Abbot, and J. Weare, 2020: Path properties of atmospheric transitions: Illustration with a low-order sudden stratospheric warming model. *J. Atmos. Sci.*, **77**, 2327–2347, https://doi.org/10.1175/JAS-D-19-0278.1.

Finkel, J., R. J. Webber, E. P. Gerber, D. S. Abbot, and J. Weare, 2021: Learning forecasts of rare stratospheric transitions from short simulations. *Mon. Wea. Rev.*, **149**, 3647–3669, https://doi.org/10.1175/MWR-D-21-0024.1.

Finkel, J., E. P. Gerber, D. S. Abbot, and J. Weare, 2022: Revealing the statistics of extreme events hidden in short weather forecast data. arXiv, 2206.05363v1, https://doi.org/10.48550/arXiv.2206.05363.

Forgoston, E., and R. O. Moore, 2018: A primer on noise-induced transitions in applied dynamical systems. *SIAM Rev.*, **60**, 969–1009, https://doi.org/10.1137/17M1142028.

Frame, D. J., S. M. Rosier, I. Noy, L. J. Harrington, T. Carey-Smith, S. N. Sparrow, D. A. Stone, and S. M. Dean, 2020: Climate change attribution and the economic costs of extreme weather events: A study on damages from extreme rainfall and drought. *Climatic Change*, **162**, 781–797, https://doi.org/10.1007/s10584-020-02729-y.

Freidlin, M. I., and A. D. Wentzell, 1970: *Random Perturbations of Dynamical Systems*. Springer, 460 pp.

Helfmann, L., E. Ribera Borrell, C. Schütte, and P. Koltai, 2020: Extending transition path theory: Periodically driven and finite-time dynamics. *J. Nonlinear Sci.*, **30**, 3321–3366, https://doi.org/10.1007/s00332-020-09652-7.

Helfmann, L., J. Heitzig, P. Koltai, J. Kurths, and C. Schütte, 2021: Statistical analysis of tipping pathways in agent-based models. *Eur. Phys. J. Spec. Top.*, **230**, 3249–3271, https://doi.org/10.1140/epjs/s11734-021-00191-0.

Holton, J. R., and C. Mass, 1976: Stratospheric vacillation cycles. *J. Atmos. Sci.*, **33**, 2218–2225, https://doi.org/10.1175/1520-0469(1976)033<2218:SVC>2.0.CO;2.

Kron, W., P. Löw, and Z. W. Kundzewicz, 2019: Changes in risk of extreme weather events in Europe. *Environ. Sci. Policy*, **100**, 74–83, https://doi.org/10.1016/j.envsci.2019.06.007.

Lee, C.-Y., M. K. Tippett, A. H. Sobel, and S. J. Camargo, 2018: An environmentally forced tropical cyclone hazard model. *J. Adv. Model. Earth Syst.*, **10**, 223–241, https://doi.org/10.1002/2017MS001186.

Lengaigne, M., and G. A. Vecchi, 2010: Contrasting the termination of moderate and extreme El Niño events in coupled general circulation models. *Climate Dyn.*, **35**, 299–313, https://doi.org/10.1007/s00382-009-0562-3.

Lesk, C., P. Rowhani, and N. Ramankutty, 2016: Influence of extreme weather disasters on global crop production. *Nature*, **529**, 84–87, https://doi.org/10.1038/nature16467.

Lubis, S. W., C. S. Y. Huang, and N. Nakamura, 2018: Role of finite-amplitude eddies and mixing in the life cycle of stratospheric sudden warmings. *J. Atmos. Sci.*, **75**, 3987–4003, https://doi.org/10.1175/JAS-D-18-0138.1.

Lucente, D., J. Rolland, C. Herbert, and F. Bouchet, 2021: Coupling rare event algorithms with data-based learned committor functions using the analogue Markov chain. arXiv, 2110.05050v3, https://doi.org/10.48550/arXiv.2110.05050.

Lucente, D., C. Herbert, and F. Bouchet, 2022: Committor functions for climate phenomena at the predictability margin: The example of El Niño–Southern Oscillation in the Jin and Timmermann model. *J. Atmos. Sci.*, **79**, 2387–2400, https://doi.org/10.1175/JAS-D-22-0038.1.

Mann, M. E., S. Rahmstorf, K. Kornhuber, B. A. Steinman, S. K. Miller, and D. Coumou, 2017: Influence of anthropogenic climate change on planetary wave resonance and extreme weather events. *Sci. Rep.*, **7**, 45242, https://doi.org/10.1038/srep45242.

Miloshevich, G., B. Cozian, P. Abry, P. Borgnat, and F. Bouchet, 2022: Probabilistic forecasts of extreme heatwaves using convolutional neural networks in a regime of lack of data. arXiv, 2208.00971v1, https://doi.org/10.48550/ARXIV.2208.00971.

Miron, P., F. J. Beron-Vera, L. Helfmann, and P. Koltai, 2021: Transition paths of marine debris and the stability of the garbage patches. *Chaos*, **31**, 033101, https://doi.org/10.1063/5.0030535.

Miron, P., F. J. Beron-Vera, and M. J. Olascoaga, 2022: Transition paths of North Atlantic deep water. *J. Atmos. Oceanic Technol.*, **39**, 959–971, https://doi.org/10.1175/JTECH-D-22-0022.1.

Mohamad, M. A., and T. P. Sapsis, 2018: Sequential sampling strategy for extreme event statistics in nonlinear dynamical systems. *Proc. Natl. Acad. Sci. USA*, **115**, 11 138–11 143, https://doi.org/10.1073/pnas.1813263115.

Nakamura, N., and A. Solomon, 2010: Finite-amplitude wave activity and mean flow adjustments in the atmospheric general circulation. Part I: Quasigeostrophic theory and analysis. *J. Atmos. Sci.*, **67**, 3967–3983, https://doi.org/10.1175/2010JAS3503.1.

Oksendal, B., 2003: *Stochastic Differential Equations: An Introduction with Applications*. Springer, 379 pp.

Pavliotis, G. A., 2014: *Stochastic Processes and Applications*. Springer, 339 pp.

Ragone, F., and F. Bouchet, 2020: Computation of extreme values of time averaged observables in climate models with large deviation techniques. *J. Stat. Phys.*, **179**, 1637–1665, https://doi.org/10.1007/s10955-019-02429-7.

Ragone, F., J. Wouters, and F. Bouchet, 2018: Computation of extreme heat waves in climate models using a large deviation algorithm. *Proc. Natl. Acad. Sci. USA*, **115**, 24–29, https://doi.org/10.1073/pnas.1712645115.

Ruzmaikin, A., J. Lawrence, and C. Cadavid, 2003: A simple model of stratospheric dynamics including solar variability. *J. Climate*, **16**, 1593–1600, https://doi.org/10.1175/1520-0442(2003)016<1593:ASMOSD>2.0.CO;2.

Stephenson, D. B., B. Casati, C. A. T. Ferro, and C. A. Wilson, 2008: The extreme dependency score: A non-vanishing measure for forecasts of rare events. *Meteor. Appl.*, **15**, 41–50, https://doi.org/10.1002/met.53.

Strahan, J., A. Antoszewski, C. Lorpaiboon, B. P. Vani, J. Weare, and A. R. Dinner, 2021: Long-time-scale predictions from short-trajectory data: A benchmark analysis of the trp-cage miniprotein. *J. Chem. Theory Comput.*, **17**, 2948–2963, https://doi.org/10.1021/acs.jctc.0c00933.

Strahan, J., J. Finkel, A. R. Dinner, and J. Weare, 2022: Forecasting using neural networks and short-trajectory data. arXiv, 2208.01717v1, https://doi.org/10.48550/ARXIV.2208.01717.

Tantet, A., F. R. van der Burgt, and H. A. Dijkstra, 2015: An early warning indicator for atmospheric blocking events using transfer operators. *Chaos*, **25**, 036406, https://doi.org/10.1063/1.4908174.

Thiede, E., D. Giannakis, A. R. Dinner, and J. Weare, 2019: Approximation of dynamical quantities using trajectory data. arXiv, 1810.01841v2, https://doi.org/10.48550/arXiv.1810.01841.

Thual, S., A. J. Majda, N. Chen, and S. N. Stechmann, 2016: Simple stochastic model for El Niño with westerly wind bursts. *Proc. Natl. Acad. Sci. USA*, **113**, 10 245–10 250, https://doi.org/10.1073/pnas.1612002113.

Timmermann, A., F.-F. Jin, and J. Abshagen, 2003: A nonlinear theory for El Niño bursting. *J. Atmos. Sci.*, **60**, 152–165, https://doi.org/10.1175/1520-0469(2003)060<0152:ANTFEN>2.0.CO;2.

Vanden-Eijnden, E., 2006: Transition path theory. *Computer Simulations in Condensed Matter Systems: From Materials to Chemical Biology*, Springer, 453–493, https://doi.org/10.1007/3-540-35273-2_13.

Vitart, F., and A. W. Robertson, 2018: The Sub-Seasonal To Seasonal Prediction project (S2S) and the prediction of extreme events. *npj Climate Atmos. Sci.*, **1**, 3, https://doi.org/10.1038/s41612-018-0013-0.

Webber, R. J., D. A. Plotkin, M. E. O’Neill, D. S. Abbot, and J. Weare, 2019: Practical rare event sampling for extreme mesoscale weather. *Chaos*, **29**, 053109, https://doi.org/10.1063/1.5081461.

Yoden, S., 1987a: Bifurcation properties of a stratospheric vacillation model. *J. Atmos. Sci.*, **44**, 1723–1733, https://doi.org/10.1175/1520-0469(1987)044<1723:BPOASV>2.0.CO;2.

Yoden, S., 1987b: Dynamical aspects of stratospheric vacillations in a highly truncated model. *J. Atmos. Sci.*, **44**, 3683–3695, https://doi.org/10.1175/1520-0469(1987)044<3683:DAOSVI>2.0.CO;2.