## 1. Introduction and background

The polar winter stratosphere typically supports a strong, cyclonic polar vortex, maintained by the thermal wind relation and meridional temperature gradient. A sudden stratospheric warming (SSW) event is a large excursion from this normal state, which can take many different forms. In split-type SSWs, the vortex splits completely in two. In displacement-type SSWs the vortex displaces far away from the pole (these can be considered wavenumber-2 and wavenumber-1 disturbances, respectively). Both types are “major” warmings, as the mean zonal wind reverses. In a “minor” warming, the zonal wind slows down significantly without completely reversing (Butler et al. 2015).

SSW is a rare event occurring about twice every 3 years, depending on the definition used (Butler et al. 2015). Its effects can propagate downward into the troposphere, altering the tropospheric jet stream and inducing extreme midlatitude surface weather events, including cold spells and precipitation (Baldwin and Dunkerton 2001; Thompson et al. 2002). Abrupt cold spells severely stress infrastructures, economies and human lives, and every bit of extra prediction lead time is helpful for adaptation. Unfortunately, numerical weather prediction struggles to forecast SSW at any lead time longer than about 2 weeks (Tripathi et al. 2016). Understanding SSW is therefore important for practical forecasting as well as science, but this task remains difficult. Several different geophysical fields are often used as indices of SSW onset. One simple indicator is zonal-mean zonal wind at 60°N, which defines thresholds for minor and major warming (Charlton and Polvani 2007; Butler et al. 2015). Another common indicator is the 10 hPa geopotential height field, which was used by Inatsu et al. (2015) to estimate a fluctuation–dissipation relation in its leading empirical orthogonal functions (EOFs). Many studies have examined SSW precursors and dominant pathways through simulation and observation. Limpasuvan et al. (2004), for instance, catalogued the various wavenumber forcings, heat fluxes and zonal wind anomalies that accompanied each stage of SSW events from reanalysis data. While planetary wave forcing from the troposphere is an accepted proximal cause of SSW, the polar vortex’s susceptibility to such forcing, or “preconditioning,” is a nontrivial and debated function of its geometry (Albers and Birner 2014; Bancalá et al. 2012). Tropospheric blocking is also thought to be linked to SSW; Martius et al. (2009) and Bao et al. (2017) found blocking to precede many major SSW events of the past half century. 
The diversity and complex life cycle of SSWs make it difficult to build a unified picture of their onset.

In this article, we are interested in developing a detailed understanding of transition events between two states, at least one of which is typically long lived. Consider, for example, a particle with position *x*(*t*) moving in the double-well potential energy landscape *V*(*x*) = *x*^{4}/4 − *x*^{2}/2 (illustrated in Fig. 1) and forced by stochastic white noise. A central quantity for describing transitions between the two wells is the *committor*: the probability of reaching the right well before the left well from a given starting position.

We denote this function by *q*(*x*), which solves the Kolmogorov backward equation (to be introduced later). For this simple system the equation takes a form that can be solved exactly:

Note that the boundary conditions are implied by the probabilistic interpretation.

The committor is plotted in the right panel of Fig. 1 for various noise levels. The potential landscape picture is not fully general, but it is a useful mental model. The equations that determine *q*(*x*) will be presented in section 3.
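As a rough numerical illustration of the double-well committor, the sketch below assumes the dynamics *dx* = −*V*′(*x*) *dt* + *σ dW* and takes the two wells to be the points *x* = −1 and *x* = +1. Under those assumptions (chosen here for illustration, not taken from the text), the backward equation −*V*′*q*′ + (*σ*^{2}/2)*q*″ = 0 integrates to a ratio of two integrals of exp(2*V*/*σ*^{2}), which the hypothetical function `committor` evaluates with a midpoint rule.

```python
import math


def potential(x):
    """Double-well potential V(x) = x^4/4 - x^2/2."""
    return x**4 / 4 - x**2 / 2


def committor(x, sigma, n=2000):
    """Probability of reaching x = +1 before x = -1, starting from x in [-1, 1].

    Assumes dx = -V'(x) dt + sigma dW (an illustrative convention), so that
    q'(x) is proportional to exp(2 V(x) / sigma^2).
    """
    def integral(a, b):
        # Midpoint rule with n panels; returns 0 for an empty interval.
        if b <= a:
            return 0.0
        h = (b - a) / n
        return h * sum(
            math.exp(2 * potential(a + (i + 0.5) * h) / sigma**2)
            for i in range(n)
        )

    return integral(-1.0, x) / integral(-1.0, 1.0)


if __name__ == "__main__":
    for sigma in (0.3, 1.0, 3.0):
        values = [round(committor(x, sigma), 3) for x in (-0.5, 0.0, 0.5)]
        print(f"sigma={sigma}: q(-0.5), q(0), q(0.5) = {values}")
```

For large *σ* the result approaches the straight line (*x* + 1)/2, whereas for small *σ* it sharpens into a step near the barrier at *x* = 0.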

In the case of SSW, the long-lived states are the steady and disturbed circulation regimes of the stratospheric polar vortex, which we label *A* and *B*, respectively. Recent work by Yasuda et al. (2017) has studied SSW in an equilibrium statistical mechanics framework, with these two stable states as saddle points of energy functionals. Transition path theory (TPT) takes a complementary nonequilibrium view, describing the long-time (steady state) statistics of trajectories between the two states. For example, TPT introduces a probability density of reactive trajectories (or “reactive density”) indicating the regions where trajectories tend to spend their time en route from *A* to *B*. The system is said to be *reactive* at a point in time if it has most recently visited *A* and will next visit *B*. The associated probability current of reactive trajectories (or “reactive current”) indicates the preferred direction and speed of transition paths. These detailed descriptors of the mechanism underlying a rare event can be expressed in terms of probabilistic forecasts like the committor *q*(*x*), the probability of entering state *B* before reaching state *A* from a given initial condition *x* (not in either *A* or *B*). The committor is the ideal probabilistic forecast in the usual variance-minimizing sense of conditional expectations (Durrett 2013). Any other predictor of a transition derived through experiments and observations, such as vortex preconditioning and forcing at different wavenumbers (Albers and Birner 2014; Bancalá et al. 2012; Martius et al. 2009; Bao et al. 2017), necessarily corresponds to an approximation of the committor.

Ensemble simulation is a commonly used method to estimate the committor at a single given initial condition by measuring the fraction of ensemble trajectories that achieve the rare event. This is challenging because many simulations are needed to generate enough rare events for significant statistical power. Recent and ongoing work aims to channel this computing power more efficiently in weather simulation using importance sampling and large deviation theory (Hoffman et al. 2006; Weare 2009; Vanden-Eijnden and Weare 2013; Ragone et al. 2018; Dematteis et al. 2018; Plotkin et al. 2019; Webber et al. 2019). The committor, and other quantities of interest, are fundamentally averages over sample trajectories. We can also express them as solutions to a concrete set of partial differential equations (PDEs) using basic stochastic calculus. TPT provides a framework to exploit these quantities and enhance our understanding of rare events, both from simulation data and from the fundamental equations of motion.
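The ensemble approach can be sketched for the double-well example from earlier in this section. The code below is a minimal illustration, assuming the dynamics *dx* = −*V*′(*x*) *dt* + *σ dW*, an Euler–Maruyama time step, and wells at *x* = ±1; the function name, step size, and ensemble size are hypothetical choices rather than settings used in this study.

```python
import math
import random


def drift(x):
    """-V'(x) for the double-well potential V(x) = x^4/4 - x^2/2."""
    return x - x**3


def ensemble_committor(x0, sigma, n_members=400, dt=1e-3, seed=0):
    """Fraction of ensemble members that reach x >= 1 before x <= -1.

    Each member is integrated with the Euler-Maruyama scheme, an assumed
    discretization of dx = -V'(x) dt + sigma dW.
    """
    rng = random.Random(seed)
    hits_b = 0
    for _ in range(n_members):
        x = x0
        while -1.0 < x < 1.0:
            x += drift(x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x >= 1.0:
            hits_b += 1
    return hits_b / n_members


if __name__ == "__main__":
    for x0 in (-0.5, 0.0, 0.5):
        print(x0, ensemble_committor(x0, sigma=1.0))
```

With *n* members, the binomial standard error of such an estimate is √[*q*(1 − *q*)/*n*], which is why a large ensemble is needed at every initial condition of interest.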

TPT has been applied primarily to molecular dynamics simulation to determine reaction rates and pathways of complex conformational transitions (E and Vanden-Eijnden 2010; Metzner et al. 2006; E and Vanden-Eijnden 2006); however, the framework does not depend on the details of the underlying dynamical system. TPT deals particularly with stochastically forced systems, such as Brownian dynamics of particles. Stochastic forcing applies quite generally; while the climate system is deterministic in principle, nonlinear interactions between resolved and unresolved scales inevitably lead to resolution-dependent model errors that can be approximated as stochastic. Hasselmann (1976) originally formulated stochastic climate models to capture the influence of quickly evolving “weather” variables on the slowly evolving “climate” variables. Stochastic parameterization remains an active area of research. For example, Franzke and Majda (2006) had success in capturing energy fluxes of a three-layer quasigeostrophic model by projecting onto ten EOF modes and treating the remainder as stochastic forcing. Kitsios and Frederiksen (2019) addressed the challenge of designing consistent numerical schemes for subgrid-scale parameterization. Deep convection in the atmosphere and turbulence in the ocean boundary layer are two examples of multiscale processes that are especially challenging to resolve.

The aim of this article is to introduce the key quantities and relations describing the path properties of rare atmospheric events. Computing those quantities for more complicated systems than the low-order model studied here is a significant and, we argue, worthwhile challenge. We do not address that computational challenge here. Instead we note that development of approximation techniques for TPT and related quantities in high-dimensional settings is an active area of research (Thiede et al. 2019; Bowman et al. 2009; Chodera and Noé 2014). These methods incorporate both simulated and observational data, and we outline them briefly in the conclusions.

The paper is organized as follows. Section 2 describes the dynamical model we use, building on work by Ruzmaikin et al. (2003) and Birner and Williams (2008). Section 3 describes the mathematical framework of TPT, with detailed, but informal, derivations mainly put in the online supplement. Section 4 explains the methodology and results particular to this model.

## 2. Dynamical model

Holton and Mass (1976) studied “stratospheric vacillation cycles,” a certain kind of minor warming in which zonal wind oscillates on a roughly seasonal time scale. They posited a mechanism of wave-mean flow interaction, which continues to be an important modeling paradigm. The quasigeostrophic equations are confined to a *β*-plane channel from 60° to the North Pole, and the streamfunction is perturbed from below by orographically induced planetary waves, specified through the lower boundary condition. The Holton–Mass model combines the zonal-mean flow equations,

and the linearized quasigeostrophic potential vorticity equation,

where

and

These equations represent zonal [(2)] and meridional [(3)] momentum balance, conservation of energy [(4)], conservation of mass [(5)], and a combination of all these for the perturbations [(6)]. Overbars and primes represent zonal averages and perturbations. Φ is the geopotential height; *H* = 7 km is the atmospheric scale height; *z* = −*H*ln(*p*/*p*_{0}) is log pressure; *ρ*_{s} = *ρ*_{0}*e*^{−z/H} is a standard density profile; *α* = *α*(*z*) is an altitude-dependent damping coefficient; and

where *ψ*′ is the zonal perturbation of *ψ* = (*g*/*f*_{0})Φ. *k* = 2/(*a* cos60°) and *a* is Earth’s radius. These wavenumbers are commonly observed in real SSWs and used in theoretical studies (Birner and Williams 2008; Ruzmaikin et al. 2003; Yoden 1987; Holton and Mass 1976); a split-type SSW is an extreme wavenumber-two perturbation. The lower boundary condition at *z* = *z*_{B} (the tropopause) is

where *h* is a topographically induced perturbation to geopotential height at the tropopause. Holton and Mass found that for a certain range of *h*, this system has qualitatively different regimes: a steady eastward zonal flow close to radiative equilibrium, and a weaker zonal flow with quasi-periodic “vacillations” from eastward to westward, even under constant forcing. Each vacillation cycle consists of a sudden warming and cooling over the time scale of weeks. Although these individual cycles are interesting weather events unto themselves, in this paper we think of the vacillations as occurring within a general *climate regime* that is conducive to sudden warming, as opposed to the steady flow state, which is not. Transitions between these two regimes, which we focus on here, are more accurately described as climatological shifts than weather events. The study by Ruzmaikin et al. (2003) varies *h* on an interannual time scale, with each single winter season occupying one of the two stable states and generating its daily weather accordingly. Hence, for this paper we will use the term “climate transitions.”

The original Holton–Mass model discretizes the above PDE with finite differences across 27 vertical levels, a resolution assumed to be close to a continuum limit. Following several studies at this resolution (Holton and Mass 1976; Yoden 1987; Christiansen 2000), Ruzmaikin et al. (2003) applied the most severe truncation possible, resolving only three vertical levels (including fixed boundaries) for easy analysis and exploration of parameter space. This reduces phase space to only three degrees of freedom: *U*(*t*), which modulates the zonal-mean zonal wind; *X*(*t*) = Re{Ψ(*t*)}; and *Y*(*t*) = Im{Ψ(*t*)}, where *X* and *Y* modulate the amplitude and phase of the perturbation streamfunction:

Carrying the ansatz through the quasigeostrophic equations, Ruzmaikin et al. (2003) derived the following system:

The primary control parameter *h* represents topographic forcing and other sources of planetary waves, such as land–sea ice contrast. While Ruzmaikin et al. (2003) also varies Λ, representing vertical wind shear, we will only vary *h* and set Λ constant. Time derivatives

Remarkably, this hugely simplified model retains the qualitative structure of the Holton–Mass model as a bistable system for a certain range of *h* between the critical values *h*_{1} ≈ 20 m and *h*_{2} ≈ 160 m, as shown in the bifurcation diagram of Fig. 2. Blue points represent the normal state of the vortex, in approximate thermal wind balance with the radiative equilibrium temperature field (henceforth called the “radiative solution”). Red points represent a disturbed vortex, with weaker zonal wind and vacillations. This climatological regime supports more SSW events, and is henceforth called the “vacillating solution.” We use the same blue-red color scheme consistently here to represent these two states. Transitions between them happen on interannual time scales, affecting each year’s likelihood of SSW events. The structure of transitions is illustrated in Fig. 3: as *h* increases slowly past the bifurcation threshold *h*_{2}, the system enters a series of rapid, large-amplitude oscillations that spiral into the weaker-circulation state.

In Figs. 2 and 3, transitions require crossing the bifurcation threshold *h*_{2}, where the radiative solution ceases to exist. Birner and Williams (2008) introduced additive white-noise forcing in the *U* variable to model unresolved gravity waves and found that these perturbations were sufficient to excite the system out of its normal state and into a vacillating regime. In Fig. 4 we illustrate stochastic trajectories of the system for three different (fixed) values of *h*. (For numerical reasons we also add a small amount of independent white noise to the *X* and *Y* variables.) Even when *h* is far below *h*_{2}, transitions still occur, and in fact the preference for the vacillating solution branch increases quickly with *h*.

Birner and Williams (2008) used direct numerical simulation and the Fokker–Planck equation to calculate long-term occupation statistics, that is, how much time on average was spent in each regime and the mean first passage time before a transition to the vacillating regime, all for a range of forcing and noise levels. Our approach differs in both target and methodology. We aim to characterize the transition process between the two states, to monitor its progress in real time, as well as to describe statistics of the transition over many realizations. Methodologically, transition path theory phrases these questions in terms of the *generator* of the stochastic process, a differential operator that encodes all information about the behavior of the process.

## 3. Path properties

TPT characterizes the statistics of transitions between states. In this section, we introduce the key quantities needed for TPT as applied to the Ruzmaikin model to obtain a more complete picture than we get from the sample paths shown above. For mathematical details, see the online supplement and background literature (E and Vanden-Eijnden 2006; Metzner et al. 2006; E and Vanden-Eijnden 2010).

### a. Infinitesimal generator

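The noisy Ruzmaikin model can be expressed compactly as a stochastic differential equation (SDE), specifically a *diffusion* process, in the variable **Z**_{t} = [*X*(*t*), *Y*(*t*), *U*(*t*)], with drift vector **b**(*z*) = [*b*_{1}(*z*), *b*_{2}(*z*), *b*_{3}(*z*)] and a 3 × 3 diffusion matrix **σ**(*z*):

$$
d\mathbf{Z}_t = \mathbf{b}(\mathbf{Z}_t)\,dt + \boldsymbol{\sigma}(\mathbf{Z}_t)\,d\mathbf{W}_t .
$$

Here, **W**_{t} is a three-vector of independent Brownian motions. We use the Ito convention for stochastic integration. While **σ** can in principle be any *z*-dependent matrix, we make **σ** diagonal and constant: **σ**(*z*) = diag(*σ*_{1}, *σ*_{2}, *σ*_{3}), creating independent additive noise in the *X*, *Y*, and *U* variables; *σ*_{1} and *σ*_{2} have units of m^{2} s^{−1} day^{−1/2}, while *σ*_{3} has units of m s^{−1} day^{−1/2}.

Associated with this equation is the infinitesimal generator ℒ, which describes how expectations of functions of the state evolve in time. If *f*(⋅) is a smooth function of phase-space variables, then

$$
\mathcal{L}f(z) = \lim_{t \to 0} \frac{\mathbb{E}[f(\mathbf{Z}_t) \mid \mathbf{Z}_0 = z] - f(z)}{t},
$$

where 𝔼[⋅ | **Z**_{0} = *z*] denotes an expectation over sample paths started at *z*.

Ito’s lemma (the chain rule for diffusion SDEs) gives the Kolmogorov backward equation, which represents ℒ as a partial differential operator:

$$
\mathcal{L}f = \sum_{i} b_i \frac{\partial f}{\partial z_i} + \frac{1}{2}\sum_{i,j} \left(\boldsymbol{\sigma}\boldsymbol{\sigma}^{\top}\right)_{ij} \frac{\partial^2 f}{\partial z_i\,\partial z_j} .
$$

The diffusion matrix **σσ**^{⊤} is contracted with the Hessian matrix of *f*, whose entries are ∂^{2}*f*/∂*z*_{i}∂*z*_{j}. The generator provides path statistics as the solution to PDEs, as illustrated in the following subsections.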
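The two views of the generator, as a limit of expectations and as a differential operator, can be compared on a case with a closed-form answer. The sketch below is an illustration under our own assumptions, not part of the Ruzmaikin model: a one-dimensional Ornstein–Uhlenbeck process *dZ* = −*Z dt* + *σ dW* with test function *f*(*z*) = *z*^{2}, for which the conditional second moment is known exactly. The function names are hypothetical.

```python
import math


def generator_pde(z, sigma):
    """Generator applied to f(z) = z^2 for drift b(z) = -z.

    Uses the differential form b f' + (sigma^2 / 2) f''.
    """
    return -z * (2.0 * z) + 0.5 * sigma**2 * 2.0


def generator_limit(z, sigma, t):
    """Difference quotient (E[f(Z_t) | Z_0 = z] - f(z)) / t.

    The expectation uses the exact Ornstein-Uhlenbeck second moment,
    E[Z_t^2] = z^2 exp(-2t) + (sigma^2 / 2) (1 - exp(-2t)).
    """
    mean_sq = z**2 * math.exp(-2.0 * t) + 0.5 * sigma**2 * (1.0 - math.exp(-2.0 * t))
    return (mean_sq - z**2) / t


if __name__ == "__main__":
    z, sigma = 0.7, 0.5
    for t in (1e-1, 1e-2, 1e-3):
        print(t, generator_limit(z, sigma, t), generator_pde(z, sigma))
```

As *t* shrinks, the difference quotient converges to the value given by the differential operator, which is the content of the backward equation in this simple setting.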

### b. Equilibrium probability density

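This stochastic process admits a time-dependent probability density *ρ*(*z*, *t*), which can be derived from the generator. For example, if the system starts in a known position **Z**_{0} = *z*, then *ρ*(*z*′, 0) = *δ*(*z* − *z*′). The density spreads out from this initial point over time according to the Fokker–Planck equation, which can be written in terms of the adjoint ℒ^{∗} of the generator ℒ:

$$
\partial_t \rho = \mathcal{L}^{*}\rho = -\sum_{i} \frac{\partial}{\partial z_i}\left(b_i \rho\right) + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial z_i\,\partial z_j}\left[\left(\boldsymbol{\sigma}\boldsymbol{\sigma}^{\top}\right)_{ij}\rho\right] .
$$

When *d***Z**_{t} = *d***W**_{t}, then **b** = 0 and **σ** = *I*, giving the heat equation ∂_{t}*ρ* = (1/2)∇^{2}*ρ*. The stationary (equilibrium) density is the time-independent solution *π*(*z*), which solves ℒ^{∗}*π* = 0 subject to the normalization ∫*π*(*z*) *dz* = 1 and *π*(*z*) ≥ 0.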
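For a one-dimensional gradient system the stationary density is available in closed form, which gives a simple check on the stationary Fokker–Planck equation. The sketch below assumes the double-well dynamics *dx* = −*V*′(*x*) *dt* + *σ dW* with *V*(*x*) = *x*^{4}/4 − *x*^{2}/2, for which *π* is proportional to exp(−2*V*/*σ*^{2}), and verifies numerically that the probability flux *bπ* − (*σ*^{2}/2)*π*′ vanishes. The grid, function names, and tolerances are illustrative assumptions. In a nongradient system such as the Ruzmaikin model the stationary flux is only divergence-free, not zero.

```python
import math


def stationary_density(sigma, lo=-3.0, hi=3.0, n=1201):
    """Grid points and normalized density proportional to exp(-2 V / sigma^2)."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    weights = [math.exp(-2.0 * (x**4 / 4 - x**2 / 2) / sigma**2) for x in xs]
    # Trapezoid rule for the normalization constant.
    norm = h * (sum(weights) - 0.5 * (weights[0] + weights[-1]))
    return xs, [w / norm for w in weights]


def max_flux(xs, pi, sigma):
    """Largest |b pi - (sigma^2 / 2) pi'| on interior points (central differences)."""
    h = xs[1] - xs[0]
    worst = 0.0
    for i in range(1, len(xs) - 1):
        dpi = (pi[i + 1] - pi[i - 1]) / (2.0 * h)
        flux = (xs[i] - xs[i] ** 3) * pi[i] - 0.5 * sigma**2 * dpi
        worst = max(worst, abs(flux))
    return worst


if __name__ == "__main__":
    xs, pi = stationary_density(sigma=1.0)
    print("max |flux| =", max_flux(xs, pi, sigma=1.0))
```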

### c. Committor probability

The stationary density is an equilibrium quantity characterizing the long-term occupation statistics. But it is insufficient to describe the events of interest to us, which are *transition paths*: trajectory segments beginning inside the radiative state and ending inside the vacillating state. Specifically, we define the sets *A* and *B* as ellipsoids around these two fixed points, respectively. Their size is determined by contours of a local approximation to the stationary density *π*; see supplement for details. We say that a snapshot **Z**_{t} of the system is undergoing a *transition* (or reaction) at time *t* if it is on the way from set *A* to set *B*. This involves information about both its future and its past, for which we introduce the forward and backward committor probabilities in this section.

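The forward committor *q*^{+} (denoted *q* when context is clear) describes the progress of a stochastic trajectory traveling from set *A* to set *B*, as follows:

$$
q^{+}(z) = \mathbb{P}\{\mathbf{Z}_t \text{ next hits } B \text{ before } A \mid \mathbf{Z}_0 = z\} .
$$

The boundary conditions on *A* and *B* follow naturally from the probabilistic definition. If the system begins in set *A*, by path continuity it will certainly next find itself in *A*, with zero chance of hitting *B* first. Starting in set *B* the opposite is true. The committor therefore obeys the following boundary value problem, written in terms of the generator ℒ (see online supplement for derivation):

$$
\mathcal{L}q^{+}(z) = 0 \ \text{ for } z \notin A \cup B, \qquad q^{+}(z) = 0 \ \text{ for } z \in A, \qquad q^{+}(z) = 1 \ \text{ for } z \in B . \tag{25}
$$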

This equivalence between a conditional expectation with respect to a Markov process, like the committor, and the solution to a PDE involving the generator of the process is generally referred to as a Feynman–Kac relation (Karatzas and Shreve 1998) and is well studied. The PDE in (25) is most naturally posed on an infinite domain, but as a numerical approximation we solve it in a large rectangular domain and impose homogeneous Neumann conditions at the domain boundary. A limiting example is the noise-dominated case, where **b**(*z*) is negligible and, for isotropic noise, the committor equation reduces to Laplace’s equation, ∇^{2}*q*^{+} = 0.

If posed on the interval [0, 1], with *A* = {0} and *B* = {1}, the solution is *q*^{+}(*z*) = *z*. The linear increase from set *A* to set *B* reflects the greater likelihood of entering *B* when beginning closer to it. This limit is reflected in Fig. 1, which shows the committor of the double-well potential approaching a straight line for large *σ* values.
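The one-dimensional limiting case can be checked with a small finite-difference solver. The sketch below is an illustration only: it discretizes *b*(*z*)*q*′ + (*σ*^{2}/2)*q*″ = 0 on [0, 1] with central differences and solves the resulting tridiagonal system by the Thomas algorithm. It is not the three-dimensional finite volume scheme used in this study, and the function name and grid size are hypothetical.

```python
def solve_committor_1d(drift, sigma, n=101):
    """Solve b q' + (sigma^2 / 2) q'' = 0 on [0, 1] with q(0) = 0 and q(1) = 1.

    drift is a callable b(z); the grid has n points including both endpoints.
    """
    h = 1.0 / (n - 1)
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, n - 1):
        b = drift(i * h)
        lo = 0.5 * sigma**2 / h**2 - b / (2.0 * h)
        up = 0.5 * sigma**2 / h**2 + b / (2.0 * h)
        lower.append(lo)
        diag.append(-(lo + up))
        upper.append(up)
        rhs.append(0.0)
    # Move the known boundary value q(1) = 1 to the right-hand side;
    # q(0) = 0 contributes nothing.
    rhs[-1] -= upper[-1] * 1.0

    # Thomas algorithm: forward elimination, then back substitution.
    m = len(diag)
    for i in range(1, m):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    q = [0.0] * m
    q[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        q[i] = (rhs[i] - upper[i] * q[i + 1]) / diag[i]
    return [0.0] + q + [1.0]


if __name__ == "__main__":
    pure_noise = solve_committor_1d(lambda z: 0.0, sigma=1.0)
    toward_a = solve_committor_1d(lambda z: -2.0, sigma=1.0)
    print("no drift, q(0.5) =", pure_noise[50])
    print("drift toward A, q(0.5) =", toward_a[50])
```

With zero drift the solver reproduces *q*^{+}(*z*) = *z*; a constant drift toward *A* pulls the midpoint value below 1/2, as expected from the probabilistic interpretation.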

Prediction is naturally much harder in high-dimensional systems such as stratospheric models. A number of physically interpretable fields, such as zonal wind and geopotential height anomalies, seem to have some predictive power for SSW, but prediction by any single such diagnostic is suboptimal. Insofar as they are successful, these variables approximate certain aspects of the committor. For example, the committor might increase monotonically with the quasi-biennial oscillation index. Furthermore, statistical correlations potentially obscure the conditional relationships needed. For example, Martius et al. (2009) and Bao et al. (2017) examined tropospheric precursors to SSW events in reanalysis records, finding that blocking events preceded most major SSWs, potentially by enhancing upward-propagating planetary waves. (We use “precursor” only to mean an event that sometimes happens before SSW.) Blocking influences SSW through height perturbations at the tropopause, which would enter the Ruzmaikin model as low-frequency variations in lower boundary forcing *h*. Since we fix *h* constant, the blocking precursor is outside our scope here. Farther down the dynamic chain, however, are other measurable precursors such as vertical wave activity flux and meridional heat flux, which are also found to have predictive power (Sjoberg and Birner 2012). However comprehensive the model, we would naturally expect the true committor probability to exhibit similar patterns to canonical precursors of that model such as blocking (for a troposphere-coupled model) and heat flux (for a stratosphere-only model). There is an important difference, though: while a precursor *P* may appear with high probability *given* that a SSW is imminent, the committor specifies the probability of a SSW given an observed pattern. As acknowledged in Martius et al. (2009), many blocking events did not lead to SSW events, meaning that the probability of SSW given blocking can be low even if the probability of blocking given SSW is high.

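While *q*^{+} describes the future of a transition, the backward committor *q*^{−} describes its past. It is defined as

$$
q^{-}(z) = \mathbb{P}\{\mathbf{Z}_t \text{ last visited } A \text{ rather than } B \mid \mathbf{Z}_0 = z\} .
$$

Here, *q*^{−} solves the time-reversed Kolmogorov backward equation, with boundary values *q*^{−} = 1 on *A* and *q*^{−} = 0 on *B*.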

We now describe the fundamental statistics characterizing transition events as identified by TPT and explain how they can be expressed in terms of quantities such as *q*^{+}, *q*^{−}, and *π*. The probability density of reactive trajectories *ρ*_{R}(*z*), the probability of observing the system **Z**_{t} at the location *z* during a transition, is proportional (up to a normalization constant) to the product *π*(*z*)*q*^{−}(*z*)*q*^{+}(*z*). This density is large in regions of phase space that are highly trafficked by reactive trajectories. This is how TPT gives information about precursors, indicating regions of phase space that are usually visited by the system over the course of a transition path.

The direction and intensity of this traffic are specified by the *reactive current*. To develop this concept, we start by introducing the probability current **J**, a vector field that satisfies a continuity equation with the time-dependent density *ρ*:

$$
\partial_t \rho + \nabla \cdot \mathbf{J} = 0 .
$$

If *ρ* were the density and *υ* the velocity field of a fluid, **J** would be *ρυ*. One can think of **J** as an instantaneous (in time and position) average over all possible system trajectories, though a precise mathematical description requires some care. In equilibrium, when *ρ* = *π* is no longer changing, ∇ ⋅ **J** = 0, or equivalently ∮_{C} **J** ⋅ **n** *dS* = 0, where *C* is any closed surface with outward normal **n**.

The reactive current **J**_{AB} is also an “average velocity,” but restricted to reactive paths. Unlike **J**, **J**_{AB} is not divergence-free, with a source in *A* and a sink in *B* (where transition paths start and end). We define **J**_{AB} implicitly via surface integrals. If *C* is any surface enclosing set *A* but not set *B*, with outward normal **n**, then the flux ∫_{C} **J**_{AB} ⋅ **n** *dS* is the number of forward transitions per unit time, also called the *transition rate* *R*_{AB}; see E and Vanden-Eijnden (2010) for details. The supplement describes another expression in terms of the generator. The result is (Metzner et al. 2006)

$$
\mathbf{J}_{AB} = q^{+}q^{-}\mathbf{J} + \frac{\pi}{2}\,\boldsymbol{\sigma}\boldsymbol{\sigma}^{\top}\left(q^{-}\nabla q^{+} - q^{+}\nabla q^{-}\right),
$$

where again *π* is the stationary density and **J** is the equilibrium probability current. This expression has intuitive ingredients. Multiplying **J** by *q*^{+}*q*^{−} conditions the equilibrium probability current on the trajectory being reactive, meaning en route from *A* to *B*. The term *q*^{−}∇*q*^{+} − *q*^{+}∇*q*^{−} reflects the fact that trajectories from *A* to *B* must ascend a gradient of *q*^{+}, going from *q*^{+} = 0 to *q*^{+} = 1, while descending a gradient of *q*^{−}.

Just as **J**_{AB}(*z*) describes the average reactive velocity, a streamline *z*_{t} of **J**_{AB}(*z*) (solving *dz*_{t}/*dt* = **J**_{AB}(*z*_{t}) with *z*_{0} = **a** ∈ *A* and *z*_{T} = **b** ∈ *B* for some *T* > 0) is a kind of “average” transition path. Although the streamline will not be realized by any particular transition path, it will have common geometric features in phase space with many actual path samples. At low noise the reactive trajectories will cluster in a thin corridor about the streamline. The streamline is a more dynamical description of precursors: whereas regions of high reactive density are commonly observed states along reactive trajectories, streamlines of reactive current are commonly observed *sequences* of states along reactive trajectories. The study by Limpasuvan et al. (2004), for example, described a sequence of events in a prototypical SSW life cycle based on reanalysis including vortex preconditioning, wave forcing, and anomalous heat fluxes at various levels in the troposphere and stratosphere. The sequence described there likely corresponds to a streamline of the reactive current.

The committor also quantifies the relative balance of time spent on the way to each set. If more probability mass lies in the region where *q*^{+} > 1/2, set *B* is globally more imminent, whereas more mass where *q*^{+} < 1/2 indicates set *A* is. A single summary statistic of imminence is the average committor during a long trajectory,

$$
\langle q^{+} \rangle_{\pi} = \lim_{T \to \infty} \frac{1}{T} \int_0^T q^{+}(\mathbf{Z}_t)\,dt = \int q^{+}(z)\,\pi(z)\,dz .
$$

An average below (above) 1/2 would indicate more time spent on the way to *A* (*B*).

Another statistic, the forward transition rate, captures the frequency of transitions between *A* and *B* rather than the overall time spent in each. We earlier defined *R*_{AB} as the number of *A* → *B* transitions per unit time. Since a *B* → *A* transition must occur between every two *A* → *B* transitions, *R*_{AB} = *R*_{BA} =: *R*. The inverse of the transition rate is the return time, a widely used metric for changing frequency of extreme events under climate change scenarios (Easterling et al. 2000). However, the forward and backward transitions may differ in important characteristics like speed. To capture this asymmetry, we need a *dynamical* analog to the equilibrium statistic ⟨*q*^{+}⟩_{π}. The typical quantity of choice is the *rate constant* *k*_{AB}, which is larger if *A* → *B* transitions happen faster than *B* → *A* transitions. We therefore normalize *R* by the overall time spent having come from *A*, which is ⟨*q*^{−}⟩_{π}:

$$
k_{AB} = \frac{R}{\langle q^{-} \rangle_{\pi}} .
$$

This rate constant, defined in Vanden-Eijnden (2014), parallels the chemistry definition. If *X*_{A} and *X*_{B} are two chemical species, with [⋅] denoting concentration, the forward and backward rate constants *k*_{AB} and *k*_{BA} are defined so that

$$
\frac{d[X_A]}{dt} = -k_{AB}[X_A] + k_{BA}[X_B], \qquad \frac{d[X_B]}{dt} = k_{AB}[X_A] - k_{BA}[X_B] .
$$

In the language of transition path theory, [*X*_{A}] is the long-term probability of the system existing most recently in state *A*, which is ⟨*q*^{−}⟩_{π}. Rates are also expressible in terms of expected passage times. Thinking of [*X*_{A}] as the total probability of having last visited set *A*, 1/*k*_{AB} = [*X*_{A}]/*R* estimates the total transition time between entering *A* (having last visited *B*) and next reentering *B*. It is these inverse quantities we display in the results section.
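The quantities introduced in this section have direct analogs for a discrete-time Markov chain, in the spirit of the discrete formulation of Metzner et al. (2009), and these are easy to compute by hand or by machine. The sketch below is a toy illustration under our own assumptions: the four-state transition matrix `EXAMPLE_P` is hypothetical, state 0 plays the role of *A* and state 3 the role of *B*, and simple fixed-point iterations stand in for the linear solves. Rates are per time step of the chain.

```python
# Hypothetical 4-state chain: state 0 stands in for A, state 3 for B.
EXAMPLE_P = [
    [0.9, 0.1, 0.0, 0.0],
    [0.3, 0.5, 0.2, 0.0],
    [0.0, 0.2, 0.5, 0.3],
    [0.0, 0.0, 0.1, 0.9],
]


def stationary(P, iters=2000):
    """Stationary distribution by power iteration (pi <- pi P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def committor(P, zero_set, one_set, iters=2000):
    """Solve q_i = sum_j P_ij q_j off the two sets, with q = 0 and q = 1 on them."""
    n = len(P)
    q = [1.0 if i in one_set else 0.0 for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i not in zero_set and i not in one_set:
                q[i] = sum(P[i][j] * q[j] for j in range(n))
    return q


def tpt_summary(P, A, B):
    """Stationary density, committors, reactive density and flux, and rates."""
    n = len(P)
    pi = stationary(P)
    q_plus = committor(P, zero_set=A, one_set=B)
    # Transition matrix of the time-reversed chain.
    P_rev = [[pi[j] * P[j][i] / pi[i] for j in range(n)] for i in range(n)]
    q_minus = committor(P_rev, zero_set=B, one_set=A)
    # Reactive density: pi * q_minus * q_plus, normalized to sum to one.
    weights = [pi[i] * q_minus[i] * q_plus[i] for i in range(n)]
    total = sum(weights)
    reactive_density = [w / total for w in weights]
    # Reactive flux along each edge i -> j.
    flux = [
        [pi[i] * q_minus[i] * P[i][j] * q_plus[j] if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    rate = sum(flux[i][j] for i in A for j in range(n) if j not in A)
    mean_q_minus = sum(pi[i] * q_minus[i] for i in range(n))
    return {
        "pi": pi,
        "q_plus": q_plus,
        "q_minus": q_minus,
        "reactive_density": reactive_density,
        "flux": flux,
        "rate": rate,
        "k_ab": rate / mean_q_minus,
    }


if __name__ == "__main__":
    out = tpt_summary(EXAMPLE_P, A={0}, B={3})
    print("R =", out["rate"], " k_AB =", out["k_ab"], " 1/k_AB =", 1.0 / out["k_ab"])
```

For this symmetric toy chain, ⟨*q*^{−}⟩_{π} = 1/2, so the rate constant is exactly twice the transition rate; an asymmetric chain would separate *k*_{AB} from *k*_{BA}.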

These quantities together make an informative description of the typical transition process from *A* to *B*. We now proceed to analyze the transition path properties of the Ruzmaikin stratospheric model.

## 4. Methodology

### Spatial discretization

The quantities of interest described above (*π*, *q*^{+}, *q*^{−}, and **J**_{AB}) emerge as solutions to PDEs involving the generator *d* dimensions. Here we use the same domain and noise levels as Birner and Williams (2008): −0.06 ≤ *X* ≤ 0.04, −0.05 ≤ *Y* ≤ 0.05, 0 ≤ *U* ≤ 0.8 in units nondimensionalized in terms of the radius of Earth and the length of a day. We tile this with a grid of 40 × 40 × 80 grid cells. We choose a noise constant *σ*_{3} in the *U* variable in the range 0.4–1.5. This is a similar range to observed atmospheric gravity wave momentum forcing (Birner and Williams 2008). For numerical reasons, we also add small noise to the streamfunction variables *X* and *Y*, in proportion to the domain size. Specifically, as *U* spans a range of 0.8 and *X*, *Y* span a smaller range of 0.1, we choose *σ*_{1} and *σ*_{2} to be *σ*_{3} × (0.1/0.8). This adjustment does change our results with respect to Birner and Williams (2008), causing more transitions in both directions at lower *h* than if only the *U* variable were perturbed. While gravity wave drag forces the zonal wind, eddy interactions and other sources of internal variability can perturb the streamfunction as well, and it is not uncommon to represent these effects stochastically (DelSole and Farrell 1995). There are surely more accurate representations of noise, but this important issue is not our focus. We retain these perturbations for numerical convenience, but stress that the general principles of the TPT framework are independent of any specific form of stochasticity. In the forthcoming experiments, we will refer only to *σ*_{3} with the understanding that *σ*_{1} and *σ*_{2} are adjusted proportionally. The discretization we use has strengths and limitations. Given the matrix *δX*, *δY*, *δU*). 
In our current example, the spacing is not nearly small enough to guarantee this (matrix entries were just as often negative as positive), but results are still accurate, as verified by stochastic simulations to be described in the results section. While we could have used one-sided finite differences to enforce positivity, this would have degraded the overall numerical accuracy of the solutions. We opted instead to zero out negatives, which were always negligible in magnitude.

The discretized Kolmogorov backward equation is the linear system *Lq*^{+} = 0, where *L* denotes the matrix discretization of the generator, augmented with appropriate boundary conditions. The definition of *A* and *B* is a design choice that should satisfy three conditions: 1) they are disjoint, 2) *A* contains the radiative fixed point and *B* the fixed point of the vacillating regime, and 3) both sets are relatively stable in the chosen noise range. We choose *A* and *B* to be ellipsoids with orientations determined by the covariance of the equilibrium density of the linearized stochastic dynamics about their respective fixed points, as described in the online supplement. The choice of the sizes of *A* and *B* is a subjective decision that alters the very definition of a reactive trajectory; hence, different sizes emphasize different features of the transition path ensemble, especially in oscillatory systems like this one. We made *A* and *B* large enough to enclose the many loops that often accompany the escape from *A* and the descent into *B*, so that we can focus on the relatively rare crossing of phase space. More sophisticated techniques exist for shrinking the two sets while erasing resulting loops (Lu and Vanden-Eijnden 2014; Banisch and Vanden-Eijnden 2016); for simplicity, we forgo these techniques for the current study.

Careful discretization is important for constructing the dominant pathways discussed above, that is, the streamlines *z*_{t} satisfying *dz*_{t}/*dt* = **J**_{AB}(*z*_{t}). Direct numerical integration of this equation is sensitive to discretization errors in *q*^{+}, *q*^{−}, and *π*. These can be severe enough to prevent *z*_{t} from reaching set *B*. To guarantee that full transitions are extracted, we instead apply shortest-path algorithms on the graph induced by the discretization, as described in Metzner et al. (2009). The supplement contains more details on this computation.
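One way to extract a pathway from a graph of reactive fluxes is to look for the path whose weakest edge carries as much flux as possible. The sketch below implements this widest-path (maximum-bottleneck) variant of Dijkstra's algorithm on a small dictionary-based graph. It is offered only as a plausible reading of "shortest-path algorithms on the graph"; the exact procedure used in this study is described in the supplement, and the function name and graph format here are hypothetical.

```python
import heapq


def widest_path(flux, source, target):
    """Path from source to target that maximizes the smallest edge flux along it.

    flux maps each node to a dict {neighbor: positive flux on that edge}.
    Returns (path, bottleneck), or (None, 0.0) if target is unreachable.
    """
    best = {source: float("inf")}  # best bottleneck found so far per node
    parent = {}
    heap = [(-best[source], source)]  # max-heap via negated widths
    done = set()
    while heap:
        neg_width, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, f in flux.get(u, {}).items():
            width = min(-neg_width, f)
            if width > best.get(v, 0.0):
                best[v] = width
                parent[v] = u
                heapq.heappush(heap, (-width, v))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], best[target]


if __name__ == "__main__":
    # Toy flux graph: the route through "y" has the larger bottleneck.
    toy = {"A": {"x": 0.5, "y": 0.2}, "x": {"B": 0.1}, "y": {"B": 0.2}}
    print(widest_path(toy, "A", "B"))
```

Because every candidate path is built from graph edges, the extracted pathway always terminates in the target set when one is reachable, which is the property that motivates a graph-based approach over streamline integration.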

## 5. Results

We begin this section by describing the kinematic path characteristics of the process in its three-dimensional phase space, according to the quantities described above. Following this purely geometrical description, we will suggest some dynamical interpretations and compare with previous studies. Finally, we will map statistical features as functions of background parameters.

The Ruzmaikin model is attractive for demonstrating use of the tools introduced in section 3 due to its low-dimensional state space, in which PDEs can be solved numerically using standard methods such as our finite volume scheme. We tested the committor’s accuracy empirically by randomly selecting 50 cells in our grid (this is 0.04% of the grid) and evolving *n* = 60 stochastic trajectories forward in time from each, stopping when they reach either set *A* or set *B*. The fraction of trajectories starting from *z* that first reach *B* is taken as the empirical committor at point *z*, to be compared with *q*^{+} from the finite volume scheme. Figure 5 clearly demonstrates the usefulness of the committor for probabilistic forecasting. The left column displays the committor calculated from finite volumes, averaged in the *Y* direction, for two different forcing levels *h*. The right column shows a scatterplot of the empirical committor against the finite volume *q*^{+} at the 50 randomly selected grid cells. We expect the points to fall along the line on which the two estimates agree.

Figure 5 also shows how *q*^{+} responds to increasing *h*, even far below the bifurcation threshold: the committor values throughout state space become rapidly skewed toward unity (meaning redder in the picture). This means that even slight perturbations can kick the system out of state *A* toward state *B*. Another indicator is the “isocommittor surface,” the set of points *z* such that *q*^{+}(*z*) = 1/2; that is, the system has equal probability of next entering set *A* or *B*. In the left-hand column this is the set of gray points (averaging out the variable *Y*). For low forcing, this surface tightly encloses set *B*, meaning the system must wander very close before a transition is imminent. For high forcing values, the isocommittor hugs set *A* more closely, meaning that small perturbations from this normal state can easily push the system into dangerous territory. In Fig. 6, the isocommittor is shown as a set of gray points in a 3D plot viewed from various vantage points. In the low-*U* region, the isocommittor resembles a spiral staircase, reflecting the spiral-shaped stable manifold of the fixed point in set *B*. Different initial positions with the same streamfunction phase, differing only slightly in the *U* direction, can have drastically different final destinations. These spiral surfaces are responsible for the blue lobes in the lower part of Fig. 5, but they disappear at higher noise.

Figures 7 and 8 display numerical solutions of the equilibrium density *π* and reactive density *ρ*_{R} ∝ *πq*^{+}*q*^{−} for two forcing levels. While *π* indicates where **Z**_{t} tends to reside, *ρ*_{R} indicates where **Z**_{t} resides *given* a transition from *A* to *B* is underway. As *h* increases, even far below the bifurcation threshold, *π* responds strongly, shifting weight toward state *B*. On the other hand, the reactive density displays similar characteristics for all *h* values. In the *X*–*U* plane, the two lobes of high reactive density surrounding *A* indicate that zonal wind tends to remain strong for a while before dipping into the weaker regime. Viewing the same field in the *X*–*Y* plane (Fig. 8) reveals a halo of intermediate density about set *A*. While many different motions would be consistent with this pattern, the coming figures verify that the early stages of transition have circular loops in the *X*–*Y* plane, meaning zonal movement of the streamfunction’s peaks and troughs. The exact streamfunction phase corresponding to the (*X*, *Y*) position is calculated as follows. Recall the streamfunction is *X* + *iY*. In polar coordinates, *ϕ* = tan^{−1}(*Y*/*X*). The full streamfunction is

*ψ*′ ∝ Re{(*X* + *iY*)*e*^{i2λ}} = (*X*^{2} + *Y*^{2})^{1/2} cos(2*λ* + *ϕ*),

where *λ* is longitude.

The angle from the origin in the *X*–*Y* plane indicates the zonal streamfunction phase, and circular motion indicates zonal movement. (This “looping” motion is indeed shared by the transition path samples shown in Figs. 9 and 10, to be described later.)

The darkest (most-trafficked) region of this loop is the sector *π*/4 ≲ *ϕ* ≲ *π*/2. The relationship between (*X*, *Y*) and *λ* indicates *ψ*′ is maximized at longitudes *λ* = {−*ϕ*/2, *π* − *ϕ*/2}. As the maximum reactive density occurs around *ϕ* = 3*π*/8, the streamfunction peaks are at {−3*π*/16, 13*π*/16} ≈ {326°, 146°}. What is the significance of this phase relative to the lower boundary forcing? Recalling the forcing form Re{Ψ(*z*_{B}, *t*)*e*^{ikx}} = *h*Re{*e*^{i2λ}} ∝ cos(2*λ*), the bottom peaks are located at *λ* = {0, *π*}. Hence, the bulk of the transition process happens when the perturbation streamfunction at the midstratosphere lags the lower boundary condition by 3/16 ± 1/16 of a wavelength. Meanwhile, the *X*–*U* plane reveals what happens to the zonal wind speed during the SSW transition. The high-reactive density region discussed above coincides with the crescent-shaped bridge of high density between the sets in Fig. 7. This suggests that in an SSW, the zonal wind weakens while the streamfunction stays in that particular phase window.
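The phase arithmetic above can be checked with a short script. It assumes a wavenumber-2 perturbation whose longitudinal dependence is cos(2*λ* + *ϕ*), which is the form consistent with the peak longitudes *λ* = {−*ϕ*/2, *π* − *ϕ*/2} quoted in the text; the function names are hypothetical.

```python
import math

def streamfunction_phase(X, Y):
    """Phase angle of X + iY; atan2 resolves the quadrant that Y/X alone cannot."""
    return math.atan2(Y, X)

def peak_longitudes_deg(phi, wavenumber=2):
    """Longitudes in [0, 360) where cos(wavenumber * lam + phi) is maximal."""
    return [
        math.degrees((2.0 * math.pi * m - phi) / wavenumber) % 360.0
        for m in range(wavenumber)
    ]

def lag_in_wavelengths(phi):
    """Lag of the peaks behind a cos(wavenumber * lam) forcing, in wavelengths."""
    return phi / (2.0 * math.pi)

phi_max = 3.0 * math.pi / 8.0            # phase of maximum reactive density
peaks = peak_longitudes_deg(phi_max)     # approximately [326.25, 146.25]
lag = lag_in_wavelengths(phi_max)        # 3/16 of a wavelength
```

The sector *π*/4 ≲ *ϕ* ≲ *π*/2 maps in the same way to lags between 1/8 and 1/4 of a wavelength, that is, 3/16 ± 1/16.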

The pictures of reactive density suggest that reactive trajectories tend to loop around set *A*, physically meaning the streamfunction tends to travel in one direction before slowing down, but they technically convey no *directional* information to explicitly support this claim. For this, we turn to the reactive current. We computed the discrete-space effective current matrix from *π*, *q*^{+}, and *q*^{−}. Physically, this matrix represents the flux of a vector field from grid cell *i* to cell *j*. From this we calculated the maximum-current paths as described in Metzner et al. (2009) and displayed the results in Fig. 9 for a forcing level of *h* = 30 m (other levels are qualitatively similar). Both the *X*–*U* and *X*–*Y* views are shown. Superimposed on these paths are seven actual reactive trajectories that occurred during a long stochastic simulation, to demonstrate features that are captured by the dominant pathways. The dominant path from *A* to *B* indeed contains a half loop in the *X*–*Y* plane in the clockwise direction, which means an eastward phase velocity. With a smaller set *A*, this dominant path would contain more of these loops. However, during the next transition stage, the streamfunction slows to a halt at the phase angle *ϕ* = *π*/2, doubles back and travels westward as zonal wind loses strength. The smear of high density in the neighborhood *ϕ* ~ 3*π*/16 therefore comprises not only a precipitous drop in zonal wind (which happens at the edge of that region) but also a backtrack, this time with weaker background zonal wind. This behavior is borne out by the trajectory samples, which vacillate in the upper-middle section of the *X*–*Y* plot. These paths are displayed as space–time diagrams of the streamfunction in Fig. 10. In Fig. 10a, the dominant path’s two loops correspond to two troughs moving east past a fixed longitude before the slowdown. The random streamfunction trajectories shown in Figs. 10b–g do not follow this representative history exactly, but they do combine elements of it: steady eastward wave propagation followed by slowdown and reversal. Each stage can have multiple false starts. Notably, the slowdown consistently happens at the same phase, with peaks at ~120° and 300°E, at roughly the same phase as found from the density plots. In fact, the figures show a brief slowdown every time the streamfunction passes this phase. This can be thought of as a representation of blocking events that often accompany sudden stratospheric warmings. The third transition path shown is an exception to the general pattern, making a final turn toward the east instead of to the west. This outlier of a reactive trajectory can also be seen in Fig. 9, as the single green trajectory that decreases in *X* before decreasing in *U* instead of the other way around.
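For a Markov jump process with generator entries *L*_{ij}, Metzner et al. (2009) define the reactive current as *f*_{ij} = *π*_{i}*q*^{−}_{i}*L*_{ij}*q*^{+}_{j} for *i* ≠ *j* and the effective current as max(*f*_{ij} − *f*_{ji}, 0). The sketch below implements these two formulas on a hypothetical three-state chain (cell 0 is *A*, cell 2 is *B*); it is not the finite volume discretization used here, and the function name and inputs are assumptions for illustration.

```python
def effective_current(pi, q_minus, q_plus, L):
    """Effective reactive current f+_ij = max(f_ij - f_ji, 0).

    f_ij = pi_i * q_minus_i * L_ij * q_plus_j for i != j, with L the
    generator (rate) matrix of the discretized process.
    """
    n = len(pi)
    f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                f[i][j] = pi[i] * q_minus[i] * L[i][j] * q_plus[j]
    return [[max(f[i][j] - f[j][i], 0.0) for j in range(n)] for i in range(n)]

# Hypothetical symmetric chain A = {0} <-> 1 <-> B = {2} with unit rates.
L_toy = [[-1.0, 1.0, 0.0],
         [1.0, -2.0, 1.0],
         [0.0, 1.0, -1.0]]
pi_toy = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]   # uniform stationary distribution
q_plus_toy = [0.0, 0.5, 1.0]                  # forward committor
q_minus_toy = [1.0, 0.5, 0.0]                 # backward committor

f_plus = effective_current(pi_toy, q_minus_toy, q_plus_toy, L_toy)
rate_AB = sum(f_plus[0])   # total effective current leaving A
```

In this toy chain the current entering the middle cell equals the current leaving it, as conservation of reactive flux requires, and the total current out of *A* gives the transition rate.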

This kinematic sequence of events has a dynamical interpretation with precedent in prior literature. A critical ingredient of SSW is meridional eddy heat flux, which in this model enters the equation for *U* in (15), showing that a reduced equator-to-pole temperature gradient in turn weakens the vortex via the thermal wind relation. The association of heat flux with SSW has been demonstrated in reanalysis (Sjoberg and Birner 2012) and in detailed numerical simulations of internal stratospheric dynamics, even with time-independent lower boundary forcing (Scott and Polvani 2006). This relationship favors the phase *ϕ* = *π*/2 as the most susceptible state for SSW onset, which is exactly picked out by the dominant transition pathway in Fig. 9.

However, immediately after the wind starts weakening at *ϕ* = *π*/2, where the streamfunction lines up with its lower boundary condition, the phase velocity reverses, giving rise to the westward lag of 3*π*/16 we observed in the reactive density. A similar phase lag has also been observed in more detailed numerical studies. For example, Scott and Polvani (2006) observed a lag of *π*/2 across the whole stratosphere (*π*/4 at the midlevel), quite similar to our result. They found that vortex breakup was preceded by a long, slow build-up phase in which the vortex became increasingly vertically coherent, only to be ripped apart by an upward- and west-propagating wave. In an experiment with slowly increasing lower boundary forcing, Dunkerton et al. (1981) saw a phase lag across the whole stratosphere that increased from ~100° to ~180° (50° to 90° between the lower boundary and the midstratosphere) over the course of the warming event. They attribute this phase tilt to the zonal wind rapidly reversing and carrying the streamfunction along. The weakening zonal wind simply removes the Doppler shift from the Rossby wave dispersion relation, so that for sufficiently weak *U*, rotation in the *X*–*Y* plane is counterclockwise and phase speed is westward.

Let us reemphasize the probabilistic interpretation of reactive density. We have found that transitions from *A* to *B* are accompanied by anomalous increases in meridional heat flux. In other words, a trajectory that enters the region of high reactive density is not thereby highly likely to proceed to *B*; the committor alone conveys that information. Rather, a trajectory is highly likely to pass through that region *given* that it is reactive. Notably, reactive trajectories are unlikely to take a straight-line path from *A* to *B* with *U*, *X*, and *Y* changing linearly. This unrealistic path would represent a zonally stationary streamfunction growing steadily in magnitude, while zonal wind falls off gradually. At higher noise levels, however, the system would be increasingly dominated by pure Brownian motion, and such a path would become more plausible.

We now turn to a quantitative comparison of committors and transition rates for different forcing and noise levels. These trends illustrate the effects of modeling choices and global change on the climatology of SSW. Planetary wave forcing *h* varies across days and seasons as well as different planets. The strength of additive noise *σ*_{3} (which determines the full diffusion matrix **σ** by proportionality) is a modeling choice intended to represent gravity wave drag. Different stochastic parameterizations will vary in their effective *σ*_{3} value, and it is important to understand the sensitivity of SSW to model choices (Sigmond and Scinocca 2010). Furthermore, long-term climate change may cause both parameters to drift, altering the occurrence of SSW-induced severe weather events.

The measure of the relative “imminence” of a vacillating solution versus a radiative solution, as described in the background section on committors, is the equilibrium density-weighted average committor, denoted ⟨*q*^{+}⟩_{π}. Figure 11a shows this quantity for 25 ≤ *h* ≤ 45 m and 0.4 ≤ *σ*_{3} ≤ 2.0 m s^{−1} day^{−1/2}. Two trends are clearly expected from the basic physics of the model. First, as seen in Figs. 7 and 8, ⟨*q*^{+}⟩_{π} should increase with *h*. Second, in the limit of large noise and infinite domain size, the dominance of Brownian motion will smooth out the committor function and make ⟨*q*^{+}⟩_{π} tend to an intermediate value between zero and one. On the other hand, as noise approaches zero, the dynamical system becomes increasingly deterministic, and the ultimate destination of a trajectory will depend entirely on which basin of attraction it starts in. The boundary, or *separatrix*, between these two basins is the stable manifold of the third (unstable) fixed point. In the case of a potential system, in which the drift is the negative gradient of a potential function *V*, the separatrix would be the ridge separating two valleys of *V*. Our system admits no such potential function, but this is a useful visual analogy. The committor function becomes a step function in the deterministic limit, with the discontinuity located exactly on this boundary. The addition of low noise moves the committor 1/2 surface away from the separatrix, possibly asymmetrically: one basin will shrink, becoming more precarious with respect to random perturbations, while the other will expand, becoming a stronger global attractor. Which basin will shrink is not evident a priori, so we compute the averaged committor, ⟨*q*^{+}⟩_{π}, as a summary statistic that will increase when the basin of *B* expands.
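On a grid, the averaged committor reduces to a weighted mean of the cell values of *q*^{+} with weights *π*. The sketch below shows that computation for hypothetical three-cell arrays; the function name and the numbers are assumptions for illustration.

```python
def average_committor(pi, q_plus):
    """Equilibrium-weighted mean of the forward committor over grid cells.

    `pi` holds (possibly unnormalized) equilibrium weights per cell and
    `q_plus` the committor value in the same cells.
    """
    total = sum(pi)
    return sum(p * q for p, q in zip(pi, q_plus)) / total

# Hypothetical three-cell example: most equilibrium weight sits near A.
q_cells = [0.0, 0.5, 1.0]
avg_uniform = average_committor([1.0, 1.0, 1.0], q_cells)
avg_near_A = average_committor([0.8, 0.1, 0.1], q_cells)
```

Shifting equilibrium weight toward cells with larger *q*^{+}, or raising *q*^{+} in heavily weighted cells, both increase the statistic, which is why it responds to an expanding basin of *B*.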

Figure 11a plots the trends in ⟨*q*^{+}⟩_{π} as a function of *h* (along the horizontal axis) and *σ*_{3} (along the vertical axis). The two basic hypotheses are verified: ⟨*q*^{+}⟩_{π} increases monotonically as *h* increases, and ⟨*q*^{+}⟩_{π} ~ 1/2 as *σ*_{3} increases, no matter the value of *h*. The less predictable behavior is in the range *h* = 35 m, 0.75 ≤ *σ*_{3} ≤ 1.0 m s^{−1} day^{−1/2}, where ⟨*q*^{+}⟩_{π} displays nonmonotonicity with respect to noise, at low noise levels. As *σ*_{3} increases, the average committor increases from ~0.4 to ~0.6, and then decreases again. The four committor plots at the bottom of Fig. 11 illustrate the trend graphically. At low noise, the *A* basin includes winding passageways leading from the small-*U* region back to *A*. Small additive noise closes them off, effectively expanding the *B* basin. As noise increases and Brownian motion dominates the dynamics, committor values everywhere relax back to less-extreme values, reflecting the unbiased nature of Brownian motion.

Despite the coarse grid resolution, the first-order effect of noise is clear. At the low and high margins of *h*, where falling into state *A* and *B* respectively is virtually certain, an increase in noise decreases this virtual certainty, and the trend continues at larger noise to attenuate ⟨*q*^{+}⟩_{π} to its limit of 1/2. The middle *h* range, however, behaves differently. Whereas *h* = 35 m appears to balance out the basin sizes at low noise, a slight noise increase tends to kick the system out of the *A* basin and toward *B*, more so than the other way around. At higher noise, the committor relaxes back to 1/2. Examining the committor fields, it is clear that the isocommittor surface location does not move back and forth; rather, it moves toward *A*, and then the rest of the field flattens out.

Figure 12 shows trends in the return times of SSW with varying *h* and *σ*_{3}. There are several different return times of interest. The first, shown in Fig. 12a, is the total expected time between one transition event and the next, whose reciprocal is the *rate R*_{AB}, the number of transitions per unit time. The return time is a symmetric quantity between *A* and *B*, since every forward transition is accompanied by a backward one. Among the parameter combinations, (*h*, *σ*_{3}) ≈ (35 m, 0.75 m s^{−1} day^{−1/2}) is the one that minimizes return time, or equivalently maximizes the transitions per unit time. The value *h* = 35 m is a forcing level that approximately balances out the time spent between the two sets, making transitions relatively common. At lower noise, transitions are exceedingly rare, and at higher noise the two states cease to be long lived. However, this symmetric quantity does not capture information about the relative speed of transition from *A* to *B* versus from *B* back to *A*. Figure 12b shows a different passage time, which is the average time between the end of a backward (*B* → *A*) transition and the end of the next forward (*A* → *B*) transition, which we call *T*_{AB}. This is computed as the reciprocal of the rate constant *k*_{AB}, as described in the previous section. In other words, the stopwatch begins when the system returns to *A* after having last visited *B*, and ends when the system next hits *B*. This metric is asymmetric: a smaller *A* → *B* return time indicates that the forward transition is faster than the backward transition. Figure 12c shows the complementary *B* → *A* return time *T*_{BA}. Unsurprisingly, an increase in *h* causes a decrease in *T*_{AB} and an increase in *T*_{BA} regardless of the noise level. The noise level has a less obvious effect. Whereas *T*_{BA} decreases monotonically with increasing noise, regardless of *h*, the forward time *T*_{AB} is minimized by a midrange noise level of *σ*_{3} ≈ 0.75 m s^{−1} day^{−1/2}. This is another reflection of the bias toward state *B* that is effected by adding noise to a very low baseline.
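The relation between these return times can be made concrete with a small calculation. The sketch assumes the standard transition path theory normalization, in which *k*_{AB} = *R*_{AB}/*ρ*_{A} and *k*_{BA} = *R*_{AB}/(1 − *ρ*_{A}), where *ρ*_{A} is the equilibrium-weighted mean of *q*^{−} (the fraction of time the system last visited *A*). Whether this matches the definition in the previous section exactly is an assumption, and the three-state chain is hypothetical.

```python
def passage_times(pi, q_minus, rate_AB):
    """Return (T_AB, T_BA) from the transition rate and backward committor.

    rho_A = sum_i pi_i * q_minus_i is the fraction of time the system was
    last in A (pi is normalized internally). Then T_AB = rho_A / R_AB and
    T_BA = (1 - rho_A) / R_AB, so T_AB + T_BA = 1 / R_AB.
    """
    total = sum(pi)
    rho_A = sum(p * q for p, q in zip(pi, q_minus)) / total
    return rho_A / rate_AB, (1.0 - rho_A) / rate_AB

# Hypothetical symmetric chain A = {0} <-> 1 <-> B = {2} with unit rates,
# for which the transition rate is 1/6 per unit time.
T_AB, T_BA = passage_times([1.0, 1.0, 1.0], [1.0, 0.5, 0.0], 1.0 / 6.0)
total_return_time = T_AB + T_BA
```

For this symmetric toy chain both passage times equal 3 time units, matching a direct first-passage calculation, and they sum to the symmetric return time 1/*R*_{AB} = 6. Asymmetry between *T*_{AB} and *T*_{BA} enters only through *ρ*_{A}.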

## 6. Conclusions

Transition path theory is a framework for describing rare transitions between states. We have described TPT along with a number of its key ingredients like the forward and backward committor functions. While it has been applied primarily to molecular systems, we believe it offers valuable insight into climate and weather phenomena such as sudden stratospheric warming, primarily through committors, reactive densities, and reactive currents. Of interest apart from its role in TPT, the committor defines an optimal probabilistic forecast, borne out by direct numerical simulation experiments. The reactive densities and currents describe the geometric properties of dominant transition mechanisms at low noise. In applying TPT to a noisy, truncated Holton–Mass model, we find that transitions tend to begin with a drop in mean zonal wind and a reversal of the streamfunction’s phase velocity at a particular streamfunction phase. This is consistent with the significance of blocking precursors to SSW as found in Martius et al. (2009), insofar as this idealized model can represent them. We also find that noise has a nonmonotonic effect on the overall preference for a vacillating state, measured by the average committor. At a forcing of *h* = 35 m, where the isocommittor surface (essentially the basin boundary of the deterministic dynamics) divides the space approximately in half, we find that raising the noise tilts the balance decisively toward the vacillating solution. Still larger noise evens the whole field out. The transition rate constant shows a similar dependence on *h* and *σ*_{3}.

In future work, we plan to scale these methods to more realistic and complex models as well as observational data, where predicting SSW remains an active area of research. In this high-dimensional setting, the generator may be unknown or computationally intractable. Any state space with more than ~5 degrees of freedom is beyond the reach of a finite-volume discretization, because the number of grid cells increases exponentially with dimension. However, there is a generic insight that physical models evolve on very low-dimensional manifolds within the full available state space. A growing body of research in molecular dynamics (Thiede et al. 2019), fluid dynamics (Giannakis et al. 2018; Froyland and Junge 2018), climate dynamics (Giannakis and Majda 2012; Sabeerali et al. 2017), and general multiscale systems (Harlim and Yang 2018; Berry et al. 2015; Giannakis 2015, 2019) exploits the intrinsic low-dimensionality to represent the infinitesimal generator more efficiently. While here we represented the generator as a finite-volume or finite-difference operator on a grid, one can also write it in a basis of globally coherent functions, such as Fourier modes, or more generally harmonic functions on a manifold. Given only data, without an explicit form of the dynamics, this manifold and the basis functions can be estimated from (for example) the diffusion maps algorithm, and the generator’s action on this basis can be approximated from short trajectories. These ideas can be applied to computing the dynamical statistics that have been the focus of this paper (Thiede et al. 2019). We hope that these techniques will enable more efficient observation strategies for targeted data assimilation procedures with the goal of tracking the progression of specific extreme events, including hurricanes and heat waves as well as sudden stratospheric warming.

## Acknowledgments

Justin Finkel was funded by the Department of Energy Computational Science Graduate Fellowship under Grant DE-FG02-97ER25308. We acknowledge support from the National Science Foundation under NSF Award 1623064. Jonathan Weare was supported by the Advanced Scientific Computing Research Program within the DOE Office of Science through Award DE-SC0020427. The reviewers of the paper provided invaluable insights on physical interpretation of the mathematical results. In particular, they helped us to recognize the connection between our computed maximum-flux reactive pathway and anomalous heat fluxes that weaken the zonal wind and reverse the streamfunction’s phase velocity. We thank Cristina Cadavid, Thomas Birner and Paul Williams for helpful clarifications about previous work. We thank Eric Vanden-Eijnden, Mary Silber, Robert Webber, Erik Thiede, Aaron Dinner, Tiffany Shaw, and Noboru Nakamura for useful discussions throughout this project. The University of Chicago’s Research Computing Center provided the computational resources to greatly expedite the calculations done here.

Data availability statement: Data from this study will be made available upon request.

## APPENDIX

### Numerical Constants

Below are the numerical coefficients used in the reduced-order Ruzmaikin model, with very similar values to Ruzmaikin et al. (2003) and Birner and Williams (2008). The relationship with physical parameters is described in the appendix of Ruzmaikin et al. (2003). Note that our notation differs slightly: following Birner and Williams (2008), we write the topographic forcing in terms of *h* rather than Ψ_{0} = *gh*/*f*_{0}, a difference that results in numerical factors of ~1000 depending on the convention used:

## REFERENCES

Albers, J. R., and T. Birner, 2014: Vortex preconditioning due to planetary and gravity waves prior to sudden stratospheric warmings. *J. Atmos. Sci.*, 71, 4028–4054, https://doi.org/10.1175/JAS-D-14-0026.1.

Baldwin, M. P., and T. J. Dunkerton, 2001: Stratospheric harbingers of anomalous weather regimes. *Science*, 294, 581–584, https://doi.org/10.1126/science.1063315.

Bancalá, S., K. Krüger, and M. Giorgetta, 2012: The preconditioning of major sudden stratospheric warmings. *J. Geophys. Res.*, 117, D04101, https://doi.org/10.1029/2011JD016769.

Banisch, R., and E. Vanden-Eijnden, 2016: Direct generation of loop-erased transition paths in non-equilibrium reactions. *Faraday Discuss.*, 195, 443–468, https://doi.org/10.1039/C6FD00149A.

Bao, M., X. Tan, D. L. Hartmann, and P. Ceppi, 2017: Classifying the tropospheric precursor patterns of sudden stratospheric warmings. *Geophys. Res. Lett.*, 44, 8011–8016, https://doi.org/10.1002/2017GL074611.

Berry, T., D. Giannakis, and J. Harlim, 2015: Nonparametric forecasting of low-dimensional dynamical systems. *Phys. Rev.*, 91E, 032915, https://doi.org/10.1103/PhysRevE.91.032915.

Birner, T., and P. D. Williams, 2008: Sudden stratospheric warmings as noise-induced transitions. *J. Atmos. Sci.*, 65, 3337–3343, https://doi.org/10.1175/2008JAS2770.1.

Bou-Rabee, N., and E. Vanden-Eijnden, 2015: Continuous-time random walks for the numerical solution of stochastic differential equations. American Mathematical Society Paper, 124 pp.

Bowman, G. R., K. A. Beauchamp, G. Boxer, and V. S. Pande, 2009: Progress and challenges in the automated construction of Markov state models for full protein systems. *J. Chem. Phys.*, 131, 124101, https://doi.org/10.1063/1.3216567.

Butler, A. H., D. J. Seidel, S. C. Hardiman, N. Butchart, T. Birner, and A. Match, 2015: Defining sudden stratospheric warmings. *Bull. Amer. Meteor. Soc.*, 96, 1913–1928, https://doi.org/10.1175/BAMS-D-13-00173.1.

Charlton, A. J., and L. M. Polvani, 2007: A new look at stratospheric sudden warmings. Part I: Climatology and modeling benchmarks. *J. Climate*, 20, 449–469, https://doi.org/10.1175/JCLI3996.1.

Chodera, J. D., and F. Noe, 2014: Markov state models of biomolecular conformational dynamics. *Curr. Opin. Struct. Biol.*, 25, 135–144, https://doi.org/10.1016/j.sbi.2014.04.002.

Christiansen, B., 2000: Chaos, quasiperiodicity, and interannual variability: Studies of a stratospheric vacillation model. *J. Atmos. Sci.*, 57, 3161–3173, https://doi.org/10.1175/1520-0469(2000)057<3161:CQAIVS>2.0.CO;2.

DelSole, T., and B. F. Farrell, 1995: A stochastically excited linear system as a model for quasigeostrophic turbulence: Analytic results for one- and two-layer fluids. *J. Atmos. Sci.*, 52, 2531–2547, https://doi.org/10.1175/1520-0469(1995)052<2531:ASELSA>2.0.CO;2.

Dematteis, G., T. Grafke, and E. Vanden-Eijnden, 2018: Rogue waves and large deviations in deep sea. *Proc. Natl. Acad. Sci. USA*, 115, 855–860, https://doi.org/10.1073/pnas.1710670115.

Dunkerton, T., C.-P. F. Hsu, and M. E. McIntyre, 1981: Some Eulerian and Lagrangian diagnostics for a model stratospheric warming. *J. Atmos. Sci.*, 38, 819–844, https://doi.org/10.1175/1520-0469(1981)038<0819:SEALDF>2.0.CO;2.

Durrett, R., 2013: *Probability: Theory and Examples*. Cambridge University Press, 430 pp.

E, W., and E. Vanden-Eijnden, 2006: Towards a theory of transition paths. *J. Stat. Phys.*, 123, 503–523, https://doi.org/10.1007/s10955-005-9003-9.

E, W., and E. Vanden-Eijnden, 2010: Transition-path theory and path-finding algorithms for the study of rare events. *Annu. Rev. Phys. Chem.*, 61, 391–420, https://doi.org/10.1146/annurev.physchem.040808.090412.

Easterling, D. R., G. A. Meehl, C. Parmesan, S. A. Changnon, T. R. Karl, and L. O. Mearns, 2000: Climate extremes: Observations, modeling, and impacts. *Science*, 289, 2068–2074, https://doi.org/10.1126/science.289.5487.2068.

Franzke, C., and A. J. Majda, 2006: Low-order stochastic mode reduction for a prototype atmospheric GCM. *J. Atmos. Sci.*, 63, 457–479, https://doi.org/10.1175/JAS3633.1.

Froyland, G., and O. Junge, 2018: Robust FEM-based extraction of finite-time coherent sets using scattered, sparse, and incomplete trajectories. *SIAM J. Appl. Dyn. Syst.*, 17, 1891–1924, https://doi.org/10.1137/17M1129738.

Giannakis, D., 2015: Dynamics-adapted cone kernels. *SIAM J. Appl. Dyn. Syst.*, 14, 556–608, https://doi.org/10.1137/140954544.

Giannakis, D., 2019: Data-driven spectral decomposition and forecasting of ergodic dynamical systems. *Appl. Comput. Harmon. Anal.*, 47, 338–396, https://doi.org/10.1016/j.acha.2017.09.001.

Giannakis, D., and A. J. Majda, 2012: Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. *Proc. Natl. Acad. Sci. USA*, 109, 2222–2227, https://doi.org/10.1073/pnas.1118984109.

Giannakis, D., A. Kolchinskaya, D. Krasnov, and J. Schumacher, 2018: Koopman analysis of the long-term evolution in a turbulent convection cell. *J. Fluid Mech.*, 847, 735–767, https://doi.org/10.1017/jfm.2018.297.

Harlim, J., and H. Yang, 2018: Diffusion forecasting model with basis functions from QR-decomposition. *J. Nonlinear Sci.*, 28, 847–872, https://doi.org/10.1007/s00332-017-9430-1.

Hasselmann, K., 1976: Stochastic climate models: Part I. Theory. *Tellus*, 28, 473–485, https://doi.org/10.3402/tellusa.v28i6.11316.

Hoffman, R. N., J. M. Henderson, S. M. Leidner, C. Grassotti, and T. Nehrkorn, 2006: The response of damaging winds of a simulated tropical cyclone to finite-amplitude perturbations of different variables. *J. Atmos. Sci.*, 63, 1924–1937, https://doi.org/10.1175/JAS3720.1.

Holton, J. R., and C. Mass, 1976: Stratospheric vacillation cycles. *J. Atmos. Sci.*, 33, 2218–2225, https://doi.org/10.1175/1520-0469(1976)033<2218:SVC>2.0.CO;2.

Inatsu, M., N. Nakano, S. Kusuoka, and H. Mukougawa, 2015: Predictability of wintertime stratospheric circulation examined using a nonstationary fluctuation–dissipation relation. *J. Atmos. Sci.*, 72, 774–786, https://doi.org/10.1175/JAS-D-14-0088.1.

Karatzas, I., and S. E. Shreve, 1998: *Brownian Motion and Stochastic Calculus*. Springer, 470 pp.

Kitsios, V., and J. S. Frederiksen, 2019: Subgrid parameterizations of the eddy–eddy, eddy–mean field, eddy–topographic, mean field–mean field, and mean field–topographic interactions in atmospheric models. *J. Atmos. Sci.*, 76, 457–477, https://doi.org/10.1175/JAS-D-18-0255.1.

Limpasuvan, V., D. W. J. Thompson, and D. L. Hartmann, 2004: The life cycle of the Northern Hemisphere sudden stratospheric warmings. *J. Climate*, 17, 2584–2596, https://doi.org/10.1175/1520-0442(2004)017<2584:TLCOTN>2.0.CO;2.

Lu, J., and E. Vanden-Eijnden, 2014: Exact dynamical coarse-graining without time-scale separation. *J. Chem. Phys.*, 141, 044109, https://doi.org/10.1063/1.4890367.

Martius, O., L. M. Polvani, and H. C. Davies, 2009: Blocking precursors to stratospheric sudden warming events. *Geophys. Res. Lett.*, L14806, https://doi.org/10.1029/2009GL038776.

Metzner, P., C. Schutte, and E. Vanden-Eijnden, 2006: Illustration of transition path theory on a collection of simple examples. *J. Chem. Phys.*, 125, 084110, https://doi.org/10.1063/1.2335447.

Metzner, P., C. Schutte, and E. Vanden-Eijnden, 2009: Transition path theory for Markov jump processes. *Multiscale Model. Simul.*, 7, 1192–1219, https://doi.org/10.1137/070699500.

Pavliotis, G. A., 2014: *Stochastic Processes and Applications*. Springer, 339 pp.

Plotkin, D. A., R. J. Webber, M. E. O’Neill, J. Weare, and D. S. Abbot, 2019: Maximizing simulated tropical cyclone intensity with action minimization. *J. Adv. Model. Earth Syst.*, 11, 863–891, https://doi.org/10.1029/2018MS001419.

Ragone, F., J. Wouters, and F. Bouchet, 2018: Computation of extreme heat waves in climate models using a large deviation algorithm. *Proc. Natl. Acad. Sci. USA*, 115, 24–29, https://doi.org/10.1073/pnas.1712645115.

Ruzmaikin, A., J. Lawrence, and C. Cadavid, 2003: A simple model of stratospheric dynamics including solar variability. *J. Climate*, 16, 1593–1600, https://doi.org/10.1175/1520-0442-16.10.1593.

Sabeerali, C. T., R. S. Ajayamohan, D. Giannakis, and A. J. Majda, 2017: Extraction and prediction of indices for monsoon intraseasonal oscillations: An approach based on nonlinear Laplacian spectral analysis. *Climate Dyn.*, 49, 3031–3050, https://doi.org/10.1007/s00382-016-3491-y.

Scott, R. K., and L. M. Polvani, 2006: Internal variability of the winter stratosphere. Part I: Time-independent forcing. *J. Atmos. Sci.*, 63, 2758–2776, https://doi.org/10.1175/JAS3797.1.

Sigmond, M., and J. F. Scinocca, 2010: The influence of the basic state on the Northern Hemisphere circulation response to climate change. *J. Climate*, 23, 1434–1446, https://doi.org/10.1175/2009JCLI3167.1.

Sjoberg, J. P., and T. Birner, 2012: Transient tropospheric forcing of sudden stratospheric warmings. *J. Atmos. Sci.*, 69, 3420–3432, https://doi.org/10.1175/JAS-D-11-0195.1.

Thiede, E. H., D. Giannakis, A. R. Dinner, and J. Weare, 2019: Galerkin approximation of dynamical quantities using trajectory data. *J. Chem. Phys.*, 150, 244111, https://doi.org/10.1063/1.5063730.

Thompson, D. W. J., M. P. Baldwin, and J. M. Wallace, 2002: Stratospheric connection to Northern Hemisphere wintertime weather: Implications for prediction. *J. Climate*, 15, 1421–1428, https://doi.org/10.1175/1520-0442(2002)015<1421:SCTNHW>2.0.CO;2.

Tripathi, O. P., and Coauthors, 2016: Examining the predictability of the stratospheric sudden warming of January 2013 using multiple NWP systems. *Mon. Wea. Rev.*, 144, 1935–1960, https://doi.org/10.1175/MWR-D-15-0010.1.

Vanden-Eijnden, E., 2014: Transition path theory. *An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation*, Springer, 91–100.

Vanden-Eijnden, E., and J. Weare, 2013: Data assimilation in the low noise regime with application to the Kuroshio. *Mon. Wea. Rev.*, 141, 1822–1841, https://doi.org/10.1175/MWR-D-12-00060.1.

Weare, J., 2009: Particle filtering with path sampling and an application to a bimodal ocean current model. *J. Comput. Phys.*, 228, 4312–4331, https://doi.org/10.1016/j.jcp.2009.02.033.

Webber, R. J., D. A. Plotkin, M. E. O’Neill, D. S. Abbot, and J. Weare, 2019: Practical rare event sampling for extreme mesoscale weather. *Chaos*, 29, 053109, https://doi.org/10.1063/1.5081461.

Yasuda, Y., F. Bouchet, and A. Venaille, 2017: A new interpretation of vortex-split sudden stratospheric warmings in terms of equilibrium statistical mechanics. *J. Atmos. Sci.*, 74, 3915–3936, https://doi.org/10.1175/JAS-D-17-0045.1.

Yoden, S., 1987: Bifurcation properties of a stratospheric vacillation model. *J. Atmos. Sci.*, 44, 1723–1733, https://doi.org/10.1175/1520-0469(1987)044<1723:BPOASV>2.0.CO;2.