## Abstract

Many natural systems undergo critical transitions, i.e., sudden shifts from one dynamical regime to another. In the climate system, the atmospheric boundary layer can experience sudden transitions between fully turbulent states and quiescent, quasi-laminar states. Such rapid transitions are observed in polar regions or at night when the atmospheric boundary layer is stably stratified, and they have important consequences in the strength of mixing with the higher levels of the atmosphere. To analyze the stable boundary layer, many approaches rely on the identification of regimes that are commonly denoted as weakly and very stable regimes. Detecting transitions between the regimes is crucial for modeling purposes. In this work a combination of methods from dynamical systems and statistical modeling is applied to study these regime transitions and to develop an early warning signal that can be applied to nonstationary field data. The presented metric aims to detect nearing transitions by statistically quantifying the deviation from the dynamics expected when the system is close to a stable equilibrium. An idealized stochastic model of near-surface inversions is used to evaluate the potential of the metric as an indicator of regime transitions. In this stochastic system, small-scale perturbations can be amplified due to the nonlinearity, resulting in transitions between two possible equilibria of the temperature inversion. The simulations show such noise-induced regime transitions, successfully identified by the indicator. The indicator is further applied to time series data from nocturnal and polar meteorological measurements.

## 1. Introduction

The atmospheric boundary layer (ABL) is the lowest part of the atmosphere that is directly influenced by Earth’s surface and across which turbulent exchanges of momentum, heat and matter between the surface and the free atmosphere occur. During daytime, surface warming leads to an unstable or convective boundary layer. During clear-sky nights, radiative cooling leads to a surface that is cooler than the air aloft and the ABL becomes stably stratified. The stable stratification can also arise when warm air is advected over a colder surface, which is a frequent event in polar regions. Turbulence in the resulting stable boundary layer (SBL) is subject to buoyant damping and is only maintained through mechanical production of turbulent kinetic energy (TKE). Understanding and modeling the SBL is essential for regional and global atmospheric models, yet there are many well-documented challenges to simulate stably stratified atmospheric flows (Sandu et al. 2013; Holtslag et al. 2013; LeMone et al. 2018). One of the challenges is to develop an accurate understanding and representation of distinct regimes of the SBL and transitions between them (Baas et al. 2017).

Numerous observational and modeling studies show that the SBL can be classified, to a first approximation, in a weakly stable regime in which turbulence is continuous, and a very stable regime with patchy and intermittent turbulence, requiring a different modeling approach (Mahrt 2014). The weakly stable regime typically occurs when cloud cover limits nocturnal radiative cooling at the land surface, or with strong winds associated to wind shear that produces enough TKE to sustain turbulence. The vertical mixing is therefore maintained and a well-defined boundary layer usually exists in which turbulent quantities decrease upward from the surface layer following the classical model of Monin–Obukhov similarity theory and related existing concepts (Mahrt 2014). The associated temperature stratification, or temperature inversion, is weak. The strongly stable regime occurs with strong stratification and weak winds and does not follow the traditional concept of a boundary layer. Transitions from weakly stable to strongly stable regimes are caused by a strong net radiative cooling at the surface that increases the inversion strength and eventually leads to suppressed vertical exchanges unless winds are strong enough to maintain turbulence (van de Wiel et al. 2007). The reduced vertical mixing results in a decoupling from the surface, such that similarity theory breaks down (Acevedo et al. 2015). Intermittent bursts tend to be responsible for most of the turbulent transport (Acevedo et al. 2006; Vercauteren et al. 2016). Such bursts alter the temperature inversion and can sometimes drive transitions from strongly to weakly stable boundary layers.

Transitions between the different SBL regimes have been found by modeling studies to be dynamically unpredictable. Based on a numerical model representing the exchanges between the surface and the SBL using a very simple two-layer scheme, McNider (1995) showed the existence of bistable equilibria of the system, which can thus transition between very different states under the influence of random perturbations. Interacting nonlinear processes that lead to this bistability partly involve thermal processes at the land surface, as was highlighted following the hypothesis that continuous turbulence requires the turbulence heat flux to balance the surface energy demand resulting from radiative cooling (van de Wiel et al. 2007, 2012b, 2017). According to this maximum sustainable heat flux hypothesis, a radiative heat loss that is stronger than the maximum turbulent heat flux that can be supported by the flow with a given wind profile will lead to the cessation of turbulence (van de Wiel et al. 2012a) and thus to a regime transition. This concept is used by van Hooijdonk et al. (2015) to show that the shear over a layer of certain thickness can predict SBL regimes when sufficient averaging of data is considered. Based on observations, Sun et al. (2012) identify a height and site-dependent wind speed threshold that triggers a transition between a regime in which turbulence increases slowly with increasing wind speed from a regime where turbulence increases rapidly with the wind speed. The change of the relationship between the turbulence and the mean wind speed occurs abruptly at the transition. Sun et al. (2016) attribute this difference to the turbulent energy partitioning between turbulent kinetic energy and turbulent potential energy (TPE): in the very stable regime, shear-induced turbulence will have to enhance the TPE in order to counter the stable stratification before enhancing the TKE.

The combined importance of the wind speed and of the surface thermal processes has also been evidenced by numerical studies using idealized single-column models of the atmosphere. Single-column models with a first-order turbulence closure scheme (Baas et al. 2017, 2019; Holdsworth and Monahan 2019) or a second-order closure scheme (Maroneze et al. 2019) are able to representatively simulate transitions from weakly to strongly stable regimes. Yet, direct numerical simulations show that transitions from strongly to weakly stable regimes can occur following a localized, random perturbation of the flow (Donda et al. 2015). Field studies have also highlighted examples of transitions induced by small-scale perturbations of the flow (Sun et al. 2012). In fact, a statistical classification scheme introduced by Vercauteren and Klein (2015) shows that the SBL flow transitions between periods of strong and weak influence of small-scale, nonturbulent flow motions on TKE production in the SBL. Such submesoscale fluctuations of the flow (e.g., induced by various kind of surface heterogeneity) are typically not represented in models but are important in strongly stable regimes (Vercauteren et al. 2019), and may trigger regime transitions. Stochastic modeling approaches are a promising framework to analyze their impact on regime transitions. A related statistical classification of the Reynolds-averaged boundary layer states introduced by Monahan et al. (2015) highlighted that regime transitions are a common feature of SBL dynamics around the globe (Abraham and Monahan 2019). Regime transitions typically take an abrupt character (Baas et al. 2019; Acevedo et al. 2019). Predicting the transition point remains a challenge (van Hooijdonk et al. 2016).

Abrupt or critical transitions are ubiquitous in complex natural and social systems. The concept of critical transition is formally defined in dynamical systems theory and relates to the notion of bifurcation (Kuznetsov 2013). When the dynamics is controlled by a system of equations depending on an external parameter (often called forcing), the stability of the equilibrium solutions can change abruptly and this is also reflected on macroscopic observables of the system. Sometimes, one can have early warning signals of a transition because the systems experience some influences of the bifurcated state before actually reaching it. The motion of a particle undergoing random fluctuations in an asymmetric double-well energy potential *V* is a minimal system to detect early warnings, in which each well or local minimum of the energy potential corresponds to a stable equilibrium state of the system. For a small fixed level of noise, the control parameter is Δ*V*, the depth of the well in which the particle is located, leading to an energy barrier that the particle has to overcome in order to transition to the second stable equilibrium. If Δ*V* is large, the distribution of the positions of the particle will be quasi Gaussian and the autocorrelation function of the position of the particle will have an exponential decay. Conversely, if Δ*V* is reduced when the particle approaches the bifurcation point, then the particle position’s distribution starts to “feel” the effect of the other state and the distribution will be skewed toward the new state. Similarly, the excursions from the equilibrium position will become larger, increasing the autocorrelation time. Early warning signals can then, e.g., be defined based on the changes of the autocorrelation time.

These first early warning signs have been successfully applied to several systems with excellent results (Scheffer et al. 2009, 2012), including the present context of SBL regime transitions (van Hooijdonk et al. 2016). However, sometimes, transitions can happen without detectable early warnings (Hastings and Wysham 2010). The main limitation of early warning signals based on the increase of autocorrelation is that their activation does not always correspond to a bifurcation. Indeed, if a single well potential widens, as it can occur in nonstationary systems, the distribution of a particle’s position experiences the same increase in skewness and autocorrelation function without the need of approaching a bifurcation (Lenton et al. 2012; Faranda et al. 2014). In our context of SBL flows, nonstationarity of the energy potential governing the dynamics can be due to changes in the mean wind speed or cloud cover, for example. For these reasons, (Faranda et al. 2014) have introduced a new class of early warning indicators based on defining a distance from the dynamics expected from a particle evolving in a single-well potential. The suggested indicator statistically quantifies the dynamical stability of the observables and was already used by Nevo et al. (2017) to show that strongly stable flow regimes are dynamically unstable and may require high-order turbulence closure schemes to represent the dynamics. Alternative new early warnings are based on the combination of statistical properties of observables when approaching the bifurcation (Chen et al. 2012).

In the present analysis, we investigate if the early warning indicator introduced by Faranda et al. (2014) can be used to detect nearing transitions between SBL flow regimes, based on both simulated data and field measurements. We show that the conceptual model that was recently suggested by van de Wiel et al. (2017) to understand SBL regime transitions in terms of thermal coupling of the land surface is equivalent to a dynamical system representing the evolution of the temperature inversion evolving in a double-well energy potential. We extend this conceptual model to a stochastic model where added noise represents the effect of natural fluctuations of the temperature inversion’s rate of change. The resilience of equilibria of the nonrandom model to perturbations as well as the bifurcation points are known analytically (as was discussed in van de Wiel et al. 2017), and we thus use the simulated data to test our indicator. Additionally, the indicator relies on calculating statistical properties of the data with a moving window approach and is sensitive to the choice of the window length. We suggest two complementary, data-driven but physically justified approaches to define an appropriate window length for which results can be trusted. Finally, the indicator is applied to nocturnal temperature inversion data from a site in Dumosa, Australia, as well as from temperature inversion data from Dome C, Antarctica.

## 2. Analyzing the dynamical stability of stable boundary layer regimes

The goal of our study is to investigate if a statistical early warning indicator of regime transitions can be successfully used to detect nearing regime transitions in the SBL. In section 2a, the conceptual model introduced by van de Wiel et al. (2017) to study regime transitions will be introduced, along with its dynamical stability properties. In section 2b, the model is extended to a stochastic model in which noise represents fluctuations in the dynamics of the near-surface temperature inversion. In section 2c, we present a statistical indicator that was introduced in Faranda et al. (2014) and applied to SBL turbulence data in Nevo et al. (2017) to estimate the dynamical equilibrium properties of time series, based on a combination of dynamical systems concepts and stochastic processes tools. The conceptual model describes the evolution of the near-surface temperature inversion and is used to produce time series of controlled data for which the theoretical equilibrium properties are known.

### a. Model description and linear stability analysis

A conceptual model was introduced by van de Wiel et al. (2017) to study regime transitions of near-surface temperature inversions in the nocturnal and polar atmospheric boundary layer. The authors were able to determine a connection between the dynamical stability of the temperature inversion and the ambient wind speed *U* through their model and measurements. Mathematically speaking, the model is a dynamical system represented by a first-order ordinary differential equation (ODE), which describes the time evolution of the difference between the temperature at a reference height *T*_{r} and the surface temperature *T*_{s}. Although the equilibrium properties of the system and the dynamical stability properties (i.e., the resilience to perturbations) of all equilibria states were thoroughly discussed in van de Wiel et al. (2017), for the sake of completeness we briefly introduce the model and summarize the linear stability analysis of equilibrium points of the resulting ODE for different values of a bifurcation parameter. The bifurcation parameter is related to the ambient wind speed.

Assuming that the wind speed and temperature are constant at a given height *z*_{r}, the following equation describes the evolution of the near-surface inversion strength, based on a simple energy balance at the ground surface:

In this energy balance model, *c*_{υ} is the heat capacity of the soil, Δ*T* = *T*_{r} − *T*_{s} is the inversion strength between the temperature at height *z*_{r} and at the surface *z*_{s}, *Q*_{n} is the net long wave radiative flux (an energy loss at the surface that will be set as a constant), *G* is the soil heat flux (an energy storage term that will be parameterized as a linear term), and *H* is the turbulent sensible heat flux (a nonlinear energy transport term that will be parameterized in the following).

After parameterizing the fluxes, the model has the form

in which *Q*_{i} is the isothermal net radiation, *λ* is a lumped parameter representing all feedbacks from soil heat conduction and radiative cooling as a net linear effect, *ρ* is the density of air at constant pressure, *c*_{p} is the heat capacity of air at constant pressure, *c*_{D} = [*κ*/ln(*z*_{r}/*z*_{0})]^{2} is the neutral drag coefficient, with *κ* ≈ 0.4 the von Kármán constant, *z*_{0} the roughness length, and *z*_{r} the reference height, *U* is the wind speed at height *z*_{r}, *R*_{b} = *z*_{r}(*g*/*T*_{r})(Δ*T*/*U*^{2}) is the bulk Richardson number, and *f*(*R*_{b}) is the stability function used in Monin–Obukhov similarity theory.

The lumped parameter *λ* corresponds to a linear term in the model as the soil is assumed to respond linearly to the temperature inversion. Moreover, Δ*Tf*(*R*_{b}) is a nonlinear term due to the nonlinear dependence of turbulent diffusion on the vertical temperature gradient.

Following van de Wiel et al. (2017), instead of analyzing the dynamical stability of the energy-balance model (2) itself, we will present the linear stability analysis of a simplified system that has a similar mathematical structure but is mathematically convenient to analyze. Using a cutoff, linear form for the stability function, i.e., *f*(*x*) = 1 − *x* and *f*(*x*) = 0 for *x* > 1, the simplified model is

and *x*(*t*_{0}) = *x*_{0}. Here, up to dimensional constants, *x* represents Δ*T*. The parameter *C* will be treated as a bifurcation parameter for this simplified system. Similar types of stability functions are typically used in numerical weather prediction tools, and the cutoff form facilitates the mathematical analysis of the model. Note that to be consistent with the original model, the stability function should include a dependence on both the temperature and the wind speed via *R*_{b}. Removing this dependence as it is done here changes some of the nonlinearity; however, it makes the mathematical analysis very simple and the qualitative behavior of the system is similar to the original system (see van de Wiel et al. 2017, their Figs. 8 and 10). In that sense, the model loses some physical significance for mathematical convenience, but the qualitative nonlinear feedback processes are maintained. This simplification also has for implication that while *C* is related to the wind speed, it cannot be directly interpreted as such in the context of the energy balance model. For a deeper discussion of the model, its simplifications and the model parameters, the reader is referred to its thorough presentation by van de Wiel et al. (2017).

For fixed and physically meaningful values of *Q*_{i} and *λ*, Eq. (3) can have either one, two, or three possible equilibrium solutions depending on the fixed values [see illustration in van de Wiel et al. (2017), Figs. 10–12, and related discussion for more details]. The equilibrium solutions will be functions of the parameter *C*, which we will consider as a bifurcation parameter in the following. Physically, the case of strong thermal coupling between the surface and the atmosphere, corresponding to a large value of *λ*, results in one unique equilibrium solution whose value depends on *C*. In van de Wiel et al. (2017), it is hypothesized that such a case is representative of a grass site such as Cabauw, the Netherlands. The solution is linearly stable to perturbations, i.e., linear stability analysis shows that perturbed solutions are attracted back to the equilibrium. The case of no coupling (*λ* = 0) leads to two equilibrium solutions, one of which is linearly stable and the other unstable to perturbations (i.e., perturbed solutions are repelled by the equilibrium). A weak coupling strength, with an intermediate value of *λ* that could be representative of a snow surface or another thermally insulated ground surface, results in three possible equilibrium solutions. The two extreme solutions are stable to perturbations, while the middle equilibrium solution is unstable. Perturbed solutions around the middle equilibrium will thus be attracted by either the upper or the lower equilibrium. Plotting those three equilibrium solutions as a function of the bifurcation parameter *C* results in a back-folded curve, which is qualitatively similar to observations of the temperature inversion shown as a function of wind speed at Dome C (see Vignon et al. 2017). The bifurcation diagram is shown in Fig. 1 for parameter values such that *λ* > 0 and *Q*_{i} > *λ*, resulting in the case with three possible equilibrium solutions. By convention, the unstable equilibrium branch is denoted by a dashed line. In the following, we will analyze transitions between the two stable equilibria. If the system undergoes random perturbations in this bistable context, a perturbation could drive the system sufficiently far from a stable equilibrium state so that it comes near the unstable equilibrium and finally gets attracted by the second equilibrium. The three possible equilibria are denoted as $xe1$, $xe2$, and $xe3$. To study such regime transitions induced by random perturbations, the conceptual model is extended with a noise term in the following section.

### b. Extending the conceptual model by randomization

The conceptual model (3) can be equivalently written in terms of a gradient system, in which the temperature inversion represented here by *x* evolves according to the influence of an underlying potential *V*(*x*). The randomized model to be introduced will be based on this gradient structure. Specifically, the initial-value problem (3) can be written as

where it is easy to see that the potential is given by

The linear stability analysis discussed in the previous section can thus be understood in the sense that the temperature inversion *x* equilibrates at a local minimum of a potential *V*. That is, an equilibrium point *x*_{e} satisfies *V*′(*x*_{e}) = 0. Figure 2 sketches the form of the potential with the exemplary parameter values *λ* = 2, *Q*_{i} = 2.5, and *C* = 6.4. Note that *V*(*x*) is a double-well potential in that case where each local minimum corresponds to one of the stable equilibrium points $xe1$ and $xe3$, while the local maximum corresponds to the system’s unstable equilibrium $xe2$.

While the conceptual model (3) has proven very insightful to explain observed sharp transitions in temperature inversions, it only allows for regime transitions when drastic changes in the model *parameters* (i.e., bifurcations) occur. That is, the model is overly idealized and in reality one can expect regime transitions to also take place due to small natural fluctuations of the temperature inversion itself in certain cases, e.g., when the potential barrier separating the two local minima and corresponding stable equilibria is shallow. Therefore, we will consider an appropriate randomized variant of the model. Specifically, we consider the stochastic differential equation (SDE) model

to account for small random perturbations to the temperature inversion’s rate of change. Here, *B* denotes a standard Brownian motion (i.e., a stochastic process) and *σ* > 0 scales the intensity of the fluctuations, while the potential *V* is as in (4). As the randomized dynamics is characterized by the same potential, also the equilibrium points of the nonrandom model (3) will describe the dominant effects of the randomized model’s dynamics. However, due to the presence of the noise, the stable equilibria of the nonrandom model (3) are not limiting points for the stochastic counterpart in (5), in the sense that the temperature inversion may still fluctuate after reaching a stable equilibrium. The reason is that in a context of two stable equilibria (i.e., for parameter values such that the model (3) exhibits two stable equilibria, denoted earlier as the thermally weakly coupled state), the random perturbations can trigger transitions from one stable equilibrium to another one. We will therefore refer to the formerly stable states as: metastable. Note that depending on the coupling strength and noise intensity, the likelihood of regime transitions can change drastically and the system may or may not exhibit metastable states. The type of noise (additive or multiplicative for example, or noise with a Levy distribution) will also affect regime transitions. In our subsequent simulations and analyses, we will focus on the case of two metastable states with additive noise and leave other cases for future research.

The effect of these random perturbations to a metastable equilibrium point *x*_{e} can be understood through a localized approximation of the original dynamics. More precisely, consider a second-order Taylor approximation of the potential around an equilibrium point *x*_{e}, yielding the quadratic approximate potential $V\u02dc$:

For the same parameter values that were used to plot the original potential in Fig. 2, the red line in the same figure shows the approximate quadratic potential around the equilibrium value $xe1$. Using the locally quadratic potential, we can thus define a locally approximate dynamics for the temperature inversion by replacing *V* in (5) by $V\u02dc$, resulting in

where $k:=d2V/dx2\u2061(x)|x=xe\u2208R$ and *X* is introduced to describe the approximate dynamics of the former *x*. This approximate dynamics is an example of the well-studied Ornstein–Uhlenbeck process and it provides an accurate description of the full dynamics in the neighborhood to the equilibrium point *x*_{e}. Discretizing the Ornstein–Uhlenbeck process *X* using the Euler–Maruyama scheme with a step-size Δ*t*:= *T*/*L* for some positive integer *L* we furthermore find that the process at discrete times *t* ∈ {1, …, *L*} approximately satisfies the difference equation

in the sense that *X*_{t} ≈ *X*(*t*Δ*t*). By defining $\mu :=kxe\u2208R$, *ϕ*:= (1 − *k*Δ*t*) and *w*_{t}:= *σ*{*B*(*t*Δ*t*) − *B*[(*t* − 1)Δ*t*]} this can be written as

which is a so-called autoregressive model of order 1, denoted AR(1), thanks to the properties of the (scaled) Brownian increments *w*_{t}. Consequently, we see that the discretized Ornstein–Uhlenbeck process can be accurately approximated by an AR(1) process. This derivation can also be found in Thomson (1987). Combining this with the observation that the Ornstein–Uhlenbeck process offered an accurate approximation to the original dynamics in the vicinity of a stable equilibrium, we can thus conclude that the local dynamics in the neighborhood of a metastable state can be approximately described by an AR(1) process.

### c. Statistical indicator for the dynamical stability of time series

In section 2a we discussed the simplified model by van de Wiel et al. (2017) that was developed to understand regime transitions in near-surface temperature inversions. This model provides a hypothesis that explains the existence of two possible equilibria of the temperature inversion for a given wind speed. In agreement with the randomized conceptual model introduced in the previous section, we say that a system exhibiting at least two metastable equilibria is called metastable. In this section the goal is to describe a methodology for statistically detecting critical transitions based on time series data. For the detection we apply an indicator for the dynamical stability (i.e., the resilience to perturbations) of time series, which was defined by Faranda et al. (2014) and applied to SBL turbulence data in Nevo et al. (2017). The indicator uses a combination of methods from dynamical systems and from statistical modeling. In its definition, deviations from AR(1) processes in the space of so-called autoregressive-moving-average (ARMA) models are used to quantify the dynamical stability of a time series. A time series *x*_{t}, $t\u2208Z$, is an ARMA(*p*, *q*) process if it is stationary and can be written as

with constant *ν*, coefficients *ϕ*_{p}, *θ*_{q},and {*w*_{t}} being white noise with positive variance *σ*^{2}. The coefficients *ϕ*_{p} and *θ*_{q} additionally have to satisfy some constraints (see Brockwell and Davis 2016). Notice that AR(1) = ARMA(1, 0). Intuitively the parameters *p* and *q* are related to the memory lag of the process. The longer the system takes to return to the equilibrium after a perturbation, the more memory we expect to observe in the process. Examples of simple systems along with their ARMA(*p*, *q*) characteristics can be found in Faranda et al. (2014).

In section 2b, it was shown that the dynamics when the system is close to a stable equilibrium can be approximated by an AR(1) process. We will assume that far from the transition from one dynamical regime to another, the time series of a generic physical observable can be described by an ARMA(*p*, *q*) model with a reasonably low number of *p*, *q* parameters and coefficients. Indeed, far from a transition, the system will tend to remain around an equilibrium despite random perturbations, and excursions from the equilibrium are short. The idea behind the modeling assumption is that ARMA processes are an important parametric family of stationary time series (Brockwell and Davis 2016). Their importance is due to their flexibility and their capacity to describe many features of stationary time series. Thereby, choosing ARMA(*p*, *q*) processes for modeling the dynamics away from a stable state is a reasonable Ansatz. Close to a transition, the resilience of the system to perturbations is weak and longer excursions from the equilibrium occur. The statistical properties (such as the shape and/or the persistence of the autocorrelation function) of the system change drastically, leading to an expected increase of the value *p* + *q* (Faranda et al. 2014). Based on this, we use ARMA(*p*, *q*) models in the following to analyze the stability of a dynamical system. The dynamical stability indicator that will be defined next will be used to obtain indicators for detecting the system’s proximity to a regime transition.

To quantify the local stability of a time series, we first slice the time series *x*_{t} with a moving time window of fixed length *τ*. In other words, we obtain subsequences {*x*_{1}, …, *x*_{τ}}, {*x*_{2}, …, *x*_{τ+1}}, …, {*x*_{t−τ+1}, …, *x*_{t}} of the original time series that overlap. By slicing the original time series, we obtain a sequence of shorter time series for which it is reasonable to suppose that they are amenable to ARMA modeling. In detail, we assume that the subsequences are realizations of linear processes with Gaussian white noise, which then implies that the process is stationary. We then fit an ARMA(*p*, *q*) model for every possible value of (*p*, *q*), with *p* ≤ *p*_{max} and *q* ≤ *q*_{max}, to these subsequences, where *p*_{max} and *q*_{max} are predefined thresholds. Afterward we choose the best fitting ARMA(*p*, *q*) model by choosing the one with the minimal Bayesian information criterion (BIC) (Schwarz 1978), which is a commonly used and well-studied tool in statistical model selection. Assuming that we have the maximum likelihood estimator $\beta ^:=(\nu ^,\varphi ^1,\u2026,\varphi ^p,\theta ^1,\u2026,\theta ^q)$ of the fitted ARMA(*p*, *q*) model (which can be obtained using a so-called innovation algorithm, as it is, for example, implemented in the “forecast” R package (Hyndman et al. 2019), which is used for the analyses), the BIC is formally defined as

where $L\u2061(\beta ^)$ denotes the associated likelihood function evaluated at the maximum likelihood estimator $\beta ^$. The second term introduces a penalty for high-order models (i.e., those that contain more parameters) to avoid overfitting.

We reiterate that when the system is close to a metastable state, its dynamics can be well approximated by an AR(1) process. When the system is approaching an unstable point separating multiple basins of attraction, the approximation no longer holds as the underlying potential cannot be approximated by a quadratic potential anymore. The change in the shape of the potential introduces new correlations in the time series, resulting in higher-order ARMA terms when fitting such a model to data.

The definition of the stability indicator is based on this observation, in the sense that it assumes that the dynamics near a metastable state can be modeled by an ARMA(1, 0) or AR(1) process. Specifically, the stability indicator is defined as

For a stable state, the most likely statistical model is an AR(1) process and one expects that $\u03d2=0$. The indicator $\u03d2$ gives a normalized distance between the stable state $\u2061(\u03d2=0)$ and the state in which the system is. The limit $\u03d2\u21921$ corresponds to a most likely statistical model of high order and probably to a nearing transition. While a formal proof of this statement is still missing, tests performed for systems of increasing complexity in Nevo et al. (2017) showed promising results where the indicator correctly identified changes in the stability of metastable states. Note that the character of the noise present in the physical system (additive noise, multiplicative, Levy process, etc.) will affect the ARMA model approximation and impact the values of $\u03d2$. To simplify the notation, we drop the dependence of $\u03d2$ on *p*, *q*, and *τ* in the following discussion.

The reliability of $\u03d2$ highly depends on the choice of *τ*, the window length (which we will consider in number of discrete observations in the following), and it relates to the characteristic time scales of the dynamics. Intuitively, the window length, when converted to its equivalent physical duration (i.e., the number of discrete observations multiplied by the discrete sampling time), has to be shorter than the residence time scale in one basin of attraction (i.e., the time spent in the neighborhood of an equilibrium before transitioning to another one) in order to satisfy the local stationarity, but large enough so that statistical model estimation is reliable. In winter at Dome C where the polar winter results in a near absence of daily cycle, no preferred time scale of residence around an equilibrium of the temperature inversion was observed (an equilibrium can remain for several days); however, the transition between two equilibria was observed to take place over a time scale on the order of 10 h (Baas et al. 2019). For nocturnal flows, the residence time scale is tightly connected to the daily cycle and could be of a few hours during the night, or the entire night. The transition between two equilibria typically takes place over a duration of about a half hour. For reliable statistical estimation, multiple tests showed that a minimum window length of 20 discrete points is needed. With a sampling time of 1 min, that means that a moving window of approximately 20 or 30 min may be appropriate.

In addition, the sampling frequency has to be fine enough to sample typical fluctuations of the dynamics. In the following analyses, we find a sampling frequency of 1 min to be appropriate for that purpose. The characteristic time scale here is given by the time scale at which the system recovers from perturbations (which is estimated by linear stability analysis in the case where the model is known; see, e.g., van de Wiel et al. 2017), and the time interval between two observations should be close to or smaller than this quantity so that (small-scale) local fluctuations can be resolved. Since the characteristic time scales of the system cannot be known analytically in many situations, for example when analyzing time series from atmospheric models or from field data, we suggest two data-driven approaches to select a window length:

In the first approach, the mean residence time around each metastable state as well as the mean transition time between the two states will be estimated based on a

*data clustering approach.*The observations will be clustered in the metastable regimes and an intermediate, transition regime. From the clustered data, the mean residence time in each cluster will be evaluated. This approach will provide an upper bound to select the window length.The second approach is motivated by the fact that the indicator $\u03d2$ is obtained through a statistical inference procedure through the definition of the BIC, which involves fitting suitable ARMA processes to data. Specifically, a maximum likelihood approach is used, which assumes that subsequences are sampled from a normal distribution. To assess the validity of this

*statistical approach*, a normality test will be implemented as a criterion to select a window length for which the normality assumption is justified and ARMA model estimation is reliable.

Both approaches are applicable when the data-generating model is unknown. This is important in cases where data showing signs of metastability are available, but an underlying model is unknown. A summary of the full algorithmic procedure used to calculate the statistical indicator is given in appendix C.

#### 1) Clustering approach: *K* means

In the first approach, we suggest using the *K*-means algorithm (Hartigan and Wong 1979, see pseudo code in appendix A) to select a window length for the analysis. In the context of analyzing transitions in the temperature inversion, the idea is to cluster the data into three different clusters: data around each stable fixed point and data near the unstable fixed point (in other words, data covering the transition periods between two metastable states). By that, the goal is to estimate the average time needed by the system to transition between two metastable states. The mean residence time within each cluster is calculated from the time series of cluster affiliation. We choose *τ* (recall that we consider it in number of discrete observations and not in physical time) such that it is smaller than the minimal mean time spent in one cluster, which should ensure that subsequences remain mostly around one equilibrium. This value is denoted by *τ*_{Kmeans}. For the simulated data in the following, each simulated time series will be assigned a window length *τ*_{Kmeans} by this procedure. For the nocturnal dataset, we cluster the entire dataset once and obtain a length *τ*_{Kmeans} of 22 points, corresponding to a duration of 22 min. For the polar dataset, only one continuous time series during a polar winter will be considered and assigned one value of *τ*_{Kmeans}, namely, 10 points, corresponding to a duration of 100 min. This window length is insufficient to obtain reliable statistical estimations.

Note that this clustering approach to determining a residence time scale around an equilibrium is a crude approximation and suffers from many caveats: a high density of data close to a given value of the temperature inversion may not necessarily relate to the existence of metastable equilibria, but could occur due to nonstationary dynamics or complex nonlinear effects, for example. Nevertheless, we use it as a first approach and future research may result in more reliable approaches. In the following analyses, the *K*-means procedure can be interpreted as providing an upper bound for selecting a window length for the analysis and thus, combined with the following criterion, will offer an applicability criterion for our method.

#### 2) Statistical approach: Anderson–Darling normality test

The *K*-means clustering approach described above estimates the system’s physical time scales, but the statistical properties of the process should also be considered for reliable calculations. To fit ARMA models reliably and to calculate the Bayesian information criterion for ARMA model selection, we need the underlying process to follow a normal distribution. Note that the reliability of ARMA model fitting generally increases for increasing number of data points (assuming stationarity remains fulfilled). In this approach, we suppose that the subsequences are sampled from a normal distribution, at least for some window length *τ*. We then choose *τ* as the biggest window length such that this normality assumption holds (more precisely, such that the normality hypothesis cannot be rejected). This value is denoted by *τ*_{AD}. If we find that *τ*_{AD} has to be much larger than *τ*_{Kmeans} to fulfil the normality assumption, we will interpret this as a sign of undersampling in the data.

Specifically, the statistical test results in a *p*-value for each subsequence, and we choose the window length such that the median of the *p*-values of all subsequences is above a threshold for which the null hypothesis cannot be rejected. The normality test applied here is the Anderson–Darling test (Anderson and Darling 1952), abbreviated AD test, as it is, for example, more stable than the Kolmogorov–Smirnov test (Stephens 1974). Further details of the AD test are summarized in appendix B. Similar to the clustering approach, the window length *τ*_{AD} for which normality cannot be rejected is selected for each analyzed time series, and this value of *τ*_{AD} is then used for all subsequences of the time series. Each of the simulated dataset, the nocturnal dataset and the polar dataset will be assigned a single value of *τ*_{AD}. The value of *τ*_{AD} for the nocturnal dataset is 19 discrete points, hence 19 min with a sampling frequency of 1 min. For the polar dataset, the value of *τ*_{AD} is 43 points corresponding to a duration of 430 min.

## 3. Stability analysis of simulated and observed time series

In this section we quantify the reliability of the stability indicator introduced in section 2c. We start by testing it on a controlled dataset generated by the simplified model for near-surface temperature inversion (see section 2a) and then proceed by applying $\u03d2$, the stability indicator, to observational data. In the tests we use the auto.arima(⋅) function from the “forecast” R package (Hyndman et al. 2019). The auto.arima(⋅) function fits ARMA(*p*, *q*) models by calculating the maximum likelihood estimators for a given model order (using the innovation algorithm mentioned earlier). It calculates the corresponding BIC [using the definition (7)] for all ARMA(*p*, *q*) models with *p* ≤ *p*_{max} and *q* ≤ *q*_{max}, where *p*_{max} and *q*_{max} are thresholds set to 10 in our application, and then it chooses the ARMA model with the minimal BIC value. This procedure is repeated for each subsequence of data, using the moving window approach, and the minimal BIC value leads to the optimal ARMA(*p*, *q*) model to represent the given subsequence.

### a. Simulated time series

To generate the simulated data, we use the conceptual randomized model (5), which we recall here for the reader’s convenience:

where *V*(*x*) is the energy potential defined in (4). That is, the data-generating model reads

The SDE model (9) is approximated path-wise (i.e., for each realization of the driving Brownian path) using the Euler–Maruyama scheme.

For the purpose of testing the accuracy of the $\u03d2$ indicator and its potential to detect nearing regime transitions, one realization {*x*_{t}} of the stochastic process is used for each fixed value of the bifurcation parameters *C*. Multiple fixed values of *C* are used, resulting in one time series per value of *C*. The initial parameters are set to *t*_{0} = 0 and $x\u2061(t0)=min{xei|i=1,\u20092,\u20093}$ where $xei$ are the three equilibria of the system. To generate the controlled dataset the model parameters are set to *λ* = 2, *Q*_{i} = 2.5, and *σ* = 0.35. The value of *C* is varied between *C* = 5.3 and *C* = 7.2 with discrete increments of 0.1 and one simulation is done per value of *C*. The simulations are run for *n* = 2000 time steps of size Δ*t* = 0.01. The amplitude of the noise, or diffusion coefficient, *σ* = 0.35 is chosen as it resulted in trajectories for which regime transition could be observed on the time interval [0, *T* = *n*Δ*t* = 20]. The range for *C* is chosen because for these values the time series shows frequent transitions from one metastable state to another. To choose the window length *τ* we apply both the *K*-means algorithm [section 2c(1)] and the Anderson–Darling test [section 2c(2)]. The length *τ* is determined individually for each simulation, i.e., for each fixed value of *C*. The *K*-means algorithm can be used to estimate the amount of discrete observation points covering the transition time. We set the cluster number to three as we expect three equilibria. The results of the clustering algorithm are exemplary shown in Fig. 3 for *C* = 6.4. Note that *t* ∈ [0, *n*Δ*t*] = [0, 20], whereas we will express our window length *τ* in number of discrete points in the following. In this case the equilibria are $xe1=0.46\u2009\u2061(metastable)$, $xe2=0.97\u2009\u2061(unstable)$, and $xe3=1.25\u2009\u2061(metastable)$. The cluster centers, estimated by the *K*-means algorithm, are 0.46, 0.97, and 1.31, which are a close approximation of the equilibria. Therefore, we expect a good estimation for the amount of points covering the transition. The average time spend in each cluster are (for *C* = 6.4): mean(*T*_{1}) = 112.2, mean(*T*_{2}) = 94, and mean(*T*_{3}) = 286.67, where mean(*T*_{i}) is the average time spent without observed transitions in cluster *i* ∈ {1, 2, 3} expressed in number of discrete points. The minimal mean residence time is thus mean(*T*_{2}) = 94 and provides an upper bound to select a window length that respects the time scales of the system. The window length *τ* is thus chosen such that it is smaller than the minimal average time spent in one cluster, i.e., for *C* = 6.4 we choose *τ* < *τ*_{Kmeans}: = min {mean(*T*_{i})|*i* = 1, 2, 3} = 94. For all tested *C*, we choose *τ* = min {mean(*T*_{i})|*i* = 1, 2, 3} − 5 in order to give room for some uncertainty in the evaluation of the time spent in each cluster, due to potential overlaps of the clusters (we recall that the minimal mean residence time should be understood as an upper bound to select *τ*). By applying $\u03d2$ to the data generated by the simplified model with *C* = 5.3, *C* = 6.4, and *C* = 7.2 we get the results shown in Fig. 4. The solid red lines correspond to the stable equilibria and the dotted red line to the unstable one. The colors ranging from dark blue to yellow represent the stability of the points measured by $\u03d2$ and we always color the last point of the subsequence. The simulation is initialized around the stable equilibrium $xe1$, where short memory of the random perturbations should prevail. As expected, the values of $\u03d2$ remains close to 0 [corresponding to a most likely AR(1) model] as long as the simulation oscillates around the equilibrium. The time series eventually approaches the neighborhood of the unstable equilibrium where long memory properties are to be expected and thus higher-order ARMA(*p*, *q*) models, hence larger values of $\u03d2$, are more likely. This first transition through the unstable equilibrium is well recognized with higher values of $\u03d2$ (green dots after the dotted red line).

The Anderson–Darling test can be used to find the biggest *τ* for which we can assume that most of the subsequences are sampled from a normal distribution and hence trust the ARMA model selection and fitting. As shown in Fig. 5, for *C* = 6.4 the Anderson–Darling test yields that for *τ* = 67 the median of the *p*-values for all subsequences is greater than the significance level 0.05. The solid line in the gray boxes is the median of the *p*-values for a fixed *τ* while the upper and lower border of the gray boxes refer to the upper and lower quartile of the *p*-values. The dotted horizontal line is the significance level. We report that the values of *τ*_{AD} given by the Anderson–Darling test are ranging from 60 to 70 discrete points for all values for *C*.

Figure 6 summarizes the $\u03d2$ values obtained for different choices of *τ* and different values of *C* in a bifurcation diagram. In the figure, the equilibrium solutions of the deterministic Eq. (3) are shown by a red line for the considered range of values of *C*. This is the same diagram as shown in Fig. 1, where the upper and lower branches of the equilibrium solution correspond to the two stable equilibria, while the middle one is the unstable equilibrium separating the two basins of attraction of the stable equilibria. A discontinuity in the solution is visible between the upper and middle solution branches, which is due to the discontinuity introduced by the cutoff form of the stability function. The $\u03d2$ values obtained for the simulations of the stochastic system [Eq. (9)] for all considered values of *C* are then shown as a scatterplot along with the equilibrium solution, and the darker color corresponds to higher values of $\u03d2$. As the initial condition for all simulations is taken at the lowest equilibrium values, the transitions are expected to occur between the lower and upper equilibrium branches when the system transitions from the basin of attraction of the lowest equilibrium value to that of the highest value. High values of $\u03d2$ are indeed mainly found in this region of the diagram.

Three methods are used to select the value of *τ* and the results associated with these window sizes are shown in Fig. 6. In Fig. 6a, *τ* = *τ*_{Kmeans} − 5 is used. Around the stable branches, values of $\u03d2$ are small, denoting that stable states are detected as such. Large values of $\u03d2$ are found between the unstable branch and the upper stable branch of the bifurcation diagram, indicating that transitions from the lower to the upper stable branches are detected by the indicator. The fact that the high values are not exactly located around the unstable branch is due to the use of a finite window size for the calculation: in the diagram, the color is always assigned to the last point of the subsequence. For small values of *C*, e.g., for *C* = 5.4, large values of $\u03d2$ are occasionally inappropriately found around the upper stable branch. For small *C*, the potential well will be relatively steep and the system rapidly approaches the second equilibrium, so that the detection can be too slow. Figure 6b shows the results for $\u03d2$ when choosing *τ* according to the Anderson–Darling test, denoted as *τ*_{AD}. The figure is very similar to the one using *τ*_{Kmeans} except that for *C* ≤ 5.6 there are more high values of $\u03d2$ located around the stable branch. This is due to the fact that for these *C*’s the *τ* values chosen by the Anderson–Darling test are larger than the ones estimated by the *K*-means algorithm. Consequently, the local stationarity assumption may break down. For *C* ≥ 5.9 the values of *τ* given by the AD test are smaller than those of the *K*-means algorithm. In these cases $\u03d2$ gives a good indication for the stability. Figure 6c is a bifurcation plot showing only the time series for which *τ* can be chosen to satisfy both the *K*-means and the Anderson–Darling condition, i.e., *τ*_{AD} < *τ*_{Kmeans} − 5. Here *τ*:= *τ*_{AD} is used for the analysis and time series that do not satisfy the condition are discarded from the analysis. In this case, we see that large values of $\u03d2$ always occur between the unstable branch and the upper stable branch; thus, $\u03d2$ is capable of recognizing the location of unstable equilibria for all *C* and stable equilibria are never assigned a large value of $\u03d2$. A note of caution should however be given regarding the reverse transitions from the upper stable branch to the lower stable branch in those numerical examples. Indeed, those are poorly identified. This is probably related to the asymmetry of the underlying potential, which induces different characteristic time scales in the system and hence a need to adapt the value of *τ* locally, and not just globally for all types of transitions as it is done here. To overcome this difficulty, an adaptive tuning of *τ* would be required, which will be left for future research. With these possible limitations in mind, $\u03d2$ will next be applied to observational data.

As a final remark, the values of *τ* considered here can be compared to analytical results in this numerical example. Indeed, for this simple bistable example system, analytical results can provide the expected time taken by the system to transition from one of the local equilibria to the bifurcation point (Krumscheid et al. 2015) and can serve as a comparison to the statistical estimations of *τ* obtained here. As a matter of fact, for *C* close to the bifurcation point, the results given by the Anderson–Darling test are similar to those given by analytical calculations.

### b. Analysis of regime transition in observed nocturnal and polar temperature inversions

In this section we apply the stability indicator $\u03d2$ on observational data obtained from one site near Dumosa for which nocturnal data are selected, and from Dome C for which we consider the polar winter. When we plot Δ*T* over *U* for both sites (see Fig. 7) we see a clear sign of two distinct states: one when the wind is weak and Δ*T* is large and one for strong wind where Δ*T* is small.

#### 1) Dumosa

The first observational dataset consists of temperature measurements from a site near Dumosa. The site was located in a large area with mostly homogeneous and flat terrain, covered by wheat crops, and measurements were taken during the crop season. The temperature measurements were made on the main tower at heights of 3 and 6 m and the wind measurements at 6 m. The frequency of measurements is 1 min. Further details about the observational site can be found in Lang et al. (2018). As we want to use data where we can expect temperature inversions to take place, we exclusively use evening and nighttime data from March until June 2013 (89 days). Each night of data results in a time series of 1020 discrete observations. Similar to the simulated data, we use the *K*-means algorithm and the Anderson–Darling test to choose the window length *τ*. The results are shown for all nights considered together in Fig. 8. According to these tests the maximal *τ* for which we can assume normality (*τ*_{AD}) is 19 discrete observations (hence 19 min with the sampling frequency of 1 min) and the *τ*, which corresponds to the minimal mean residence time in one of three clusters (*τ*_{Kmeans}) is 22. Hence, we have *τ*_{AD} < *τ*_{Kmeans} and choosing *τ*_{AD} should be appropriate. The results for $\u03d2$ applied to all 89 nights with both choices of window lengths are given in Fig. 9. The results highlight a lower branch with low values of $\u03d2$, or dynamics identified as stable, and an upper branch with high values of $\u03d2$, or dynamics identified as unstable. In some cases, a proper ARMA(*p*, *q*) model cannot be fitted by the statistical methods, resulting in absence of results for some windows. Generally, a reliable ARMA(*p*, *q*) fit becomes difficult for a time series with less than 20 observations, and the estimated window lengths are on the lowest end to obtain the statistical estimations. Figure 10 shows the time evolution of Δ*T* when conditionally averaged for all nights with the wind speed (wsp) being in a given category. The corresponding time evolution of $\u03d2$ is shown for the same conditional averages. The window length here is chosen as the most restrictive criterion *τ* = *τ*_{AD} < *τ*_{Kmeans}. For low wind speeds, the $\u03d2$ values are high (on average), which implies that in this case we have an unstable system. Note that in this dataset, a leveling off of the temperature inversion for low wind speeds [which could correspond to the stable equilibrium of a strong inversion according to the model of van de Wiel et al. (2017)] is not very evident from Fig. 9. It could be that the temperature inversion does not have time to reach the stable equilibrium during the night, or that other instabilities that are not considered in the simplified model arise in strong stability conditions. For example, flow instabilities such as submesomotions, which are favored in strongly stratified situations, could make the system dynamically unstable.

#### 2) Antarctica

The Dome C data were measured at the Concordia Research Station, which is located on the Antarctica Plateau. It is a French–Italian research facility that was built 3233 m above sea level. It is extensively described for example in Genthon et al. (2010). The Dome C dataset contains 10-min-averaged meteorological data from 2017. Regimes and their transitions were analyzed by Vignon et al. (2017) and Baas et al. (2019). Important for our analysis are measurements of the temperature at height 9.4 m and surface, the wind speed (m s^{−1}) at height 8 m and the radiation made in the polar night, which is from March to September. We focus on the polar night during which multiple regime transitions take place. Following van de Wiel et al. (2017) the data are classified into two subcategories of radiative forcing being the sum of net shortwave and incoming longwave radiation: *R*_{+} = *K*^{↓} − *K*^{↑} + *L*^{↓}. Strong cooling is favored in cases of low incoming radiation and when plotting Δ*T* = *T*_{9.4m} − *T*_{s} over the wind speed *U*_{8m} a back-folding of the points becomes apparent when *R*^{+} < 80 W m^{−2} (van de Wiel et al. 2017, their Fig. 6; and less clearly in our Fig. 7). Therefore, we focus on the case when *R*^{+} < 80 W m^{−2}. We apply $\u03d2$ to the longest consecutive time series with *R*^{+} < 80 W m^{−2}, which is from 1050 LT 3 August to 2150 LT 24 August 2017, i.e., 3091 data points. Our analyzed time series is thus shorter than the one visualized in van de Wiel et al. (2017), which explains the differences in the scatterplot. Again we choose *τ* with the Anderson–Darling test and the *K*-means algorithm. The value for *τ* given by the Anderson–Darling test *τ*_{AD} = 43 observations (with a corresponding window duration of 430 min) is much larger than the one given by the *K*-means algorithm (*τ*_{Kmeans} = 10 observations, corresponding to a window duration of 100 min), which is in fact too few points to expect a good fit for the ARMA(*p*, *q*) model. In Fig. 11 we see that no transitions are recognized by $\u03d2$ with *τ* = *τ*_{AD} (left) but with *τ* = *τ*_{Kmeans} (right) some transitions are noted. This indicates that the data frequency of this dataset is not high enough to give reliable results for the stability indicator.

### c. Sensitivity analysis on averaged data

As discussed when applying the indicator $\u03d2$ on the Dome C dataset, there is strong indication for the data frequency being crucial for the reliability of the results when applying the stability indicator. Observational data are often stored in block averages, e.g., measurements over 5-min time window are averaged into 1 data point. The issue with this can be that the data frequency can be too low to sample typical fluctuations during the observed transition. In more detail, if the time taken by the system to transition from one metastable state to the next is less than approximately 20 discrete measurement points (the minimum needed to have relevant statistical results according to our tests), then the approach may not be applicable. Therefore, data frequency needs to be high enough to give reliable results for $\u03d2$. As a comparative study to illustrate this point, we block average the temperature measurements for the Dumosa data, such that we repeat the analysis based on 5-min-averaged data instead of 1-min averages. Thereby, we reduce the length of the time series for each individual night from 1020 to 204 data points. Again we choose *τ* with the Anderson–Darling test and the *K*-means algorithm. There is a clear distinction between the *τ* estimated by the Anderson–Darling test and the one given by the *K*-means algorithm for the 5 min data, contrary to the 1 min data where both methods suggested comparable window lengths. Indeed, *τ*_{AD} = 31 and *τ*_{Kmeans} = 7 for the 5 min averaged data whereas *τ*_{AD} = 19 and *τ*_{Kmeans} = 22 for the 1 min data. The small value for *τ* given by the *K*-means algorithm for the 5 min averaged data suggests that there is only a small amount of points covering the transition time and we cannot fit an ARMA(*p*, *q*) model properly to subsequences this short. Moreover, as the value for *τ* given by the Anderson–Darling test is much bigger than the amount of points covering the transition time we do not expect reliable results for $\u03d2$ with this *τ*. Figure 12 confirms this hypothesis. The left plot is with *τ* = *τ*_{AD} and the right plot is with *τ* = *τ*_{Kmeans}.

## 4. Discussion and conclusions

In this study we analyzed the potential of a statistical indicator to be used to detect the system’s proximity to critical regime transitions in the near-surface temperature inversion. The statistical indicator evaluates the dynamical stability of time series resulting from a dynamical system and was initially suggested in Faranda et al. (2014). Based on idealized numerical simulations, van Hooijdonk et al. (2016) had found the presence of early warning signs in the turbulent flow field before a transition from weakly stable to strongly stable conditions. These signs included a critical slowing down, referring to the fact that dynamical systems tend to recover slower from perturbations when approaching a transition point in the dynamics. This slowdown was evaluated based on fluctuations of the temperature field and the early warning signal relied on a change in the variance. Such metrics, which are often used in studies of tipping points, can become problematic when the underlying dynamics is highly nonstationary, as an increase of variance could be due to the nonstationarity of the system without implying a transition (Lenton et al. 2012; Faranda et al. 2014). The typical scatter of atmospheric field data and their inherent nonstationarity makes the application of classical critical slowdown metrics difficult. The metric presented and used here is different in that it statistically quantifies the deviation from the dynamics expected when the system is close to a stable equilibrium. Specifically, the indicator is based on ARMA modeling with a moving window for which local stationarity is assumed, and the distance from stable equilibrium dynamics is evaluated based on a Bayesian information criterion. The indicator crucially relies on an appropriate window length and we suggested two methods to select its value in a data-driven manner. That is, both methods can be used when the underlying model governing the dynamics is unknown, such that those can be applied to field data with significant scatter. Our suggestion to ensure reliable results is to use a combination of both approaches. The shortest residence time around an equilibrium estimated through the *K*-means approach provides an upper bound to select a window length that respects the time scale of the system, i.e., a length that ensures local stationarity for ARMA model fitting. The window length should be selected as shorter or equal to this upper bound, and such that the data within individual windows mostly satisfy normality to ensure reliable Bayesian inference. The Anderson–Darling normality test is appropriate, but an improvement of the clustering approach to estimate the residence time around an equilibrium (here done based on a simple *K*-means clustering approach) would be beneficial. Based on this approach, we find that a nocturnal temperature inversion dataset with a sampling frequency of 1 min can be analyzed successfully using a window length of approximately 20 min. Slower sampling frequency did not lead to conclusive results.

The conceptual model introduced by van de Wiel et al. (2017) was developed to understand regime transitions in the near-surface temperature inversion and can support scenarios with multiple stable equilibria. For our purpose of identifying the system’s proximity to regime transitions, it offers an ideal model for which the theoretical dynamical stability can be calculated analytically. We extended the model to include random perturbations in the dynamics and used the resulting stochastic model to provide a test dataset on which to evaluate the potential of the indicator of regime transitions. In this stochastic system, small-scale perturbations can be amplified due to the nonlinearity, resulting in transitions between the bistable equilibria. Our simulations show such noise-induced regime transitions, successfully identified by the indicator $\u03d2$. More research would however be beneficial in order to assess the type of noise that is appropriate to represent randomized dynamics of the SBL.

The application to field data was done for one nocturnal dataset and one polar dataset. In their discussion, van de Wiel et al. (2017) suggest that the strength of the thermal coupling between the soil and the atmosphere may be a key process to distinguish between cases where the temperature inversion has a unique stable equilibria and cases with bistable equilibria, separated by an unstable equilibrium. The wind speed dependence of observational scatter is partly attributed to the existence of a dynamically unstable branch in the system in cases where the thermal coupling is weak. In both datasets considered in our analysis, a weak thermal coupling is to be expected. Clearly in the polar dataset, the snow surface leads to a weak thermal coupling between the atmosphere and the soil (Vignon et al. 2017; van de Wiel et al. 2017). The nocturnal dataset originates from a wheat crop near Dumosa, Australia, probably resulting in a weak thermal coupling as well. While the Dome C data did not have the required sampling rate in order to have reliable estimates of the dynamical stability, the Dumosa data were found to have a clear signal with one dynamically stable branch and one dynamically unstable branch. A second dynamically stable branch corresponding to a strong inversion was not clearly observed. This data-driven result agrees with the theoretical result of van de Wiel et al. (2017), namely, that a dynamically unstable branch exists for a certain range of wind speeds in case of weak atmosphere–surface thermal coupling. Note that this is an idealized model and other nonrepresented physical processes may be at work and impact the interpretations. As an additional note of caution, if the nature of the noise was different in the two populations evident in the Dumosa data, then the value of $\u03d2$ could differ even if both branches were dynamically stable. Differences in the noise memory properties may also impact the results. Indeed, depending on how the noise enters the dynamics, its memory might or might not be represented well by an ARMA model. The result on the Dumosa data is nevertheless promising for the use of the indicator as an early warning signal of regime transitions. Extending the analysis to a polar night with an appropriate sampling frequency would be very interesting, as multiple regime transitions occur during the long-lived temperature inversion (Baas et al. 2019). Moreover, comparing results obtained for a site with strong atmosphere–surface thermal coupling would provide great insight to compare the dynamical stability of field data to the dynamical stability predicted by the conceptual model.

To be noted is the fact that the conceptual model is derived for a temperature inversion between the surface and a height at which the wind speed stays relatively constant during the night, found to be approximately 40 m at Cabauw in the Netherlands and 10 m at Dome C. The Dumosa dataset did not offer the possibility to select a height with such a constant wind speed. The measurements, taken at lower heights in this case, will be prone to submesoscale activity, inducing perturbations of the shallow inversion, which could affect the dynamical stability of the time series. In fact, the earlier application of the dynamical stability indicator to SBL data in Nevo et al. (2017) showed that higher stability corresponded to unstable dynamics of the vertical velocity fluctuations and of the wind speed. More analyses would be needed to assess the influence of the measurement height on the evaluated dynamical stability of the temperature inversion. Nevertheless, our results encourage the use of the statistical dynamical stability as a metric to detect nearing regime transitions in the SBL. The ability to detect nearing regime transitions in atmospheric numerical weather prediction and climate models could offer a possibility to use a different type of SBL parameterization in those specific cases without relying on the assumption of turbulence stationarity.

## Acknowledgments

The authors wish to thank Christophe Genthon and Etienne Vignon for providing the Dome C data, obtained as part of the CALVA observation project supported by the French polar institute IPEV, and for lively discussions. The research was funded by the Deutsche Forschungsgemeinschaft (DFG) through Grant VE 933/2-2. A.K. acknowledges funding from the DFG through the Collaborative Research Center CRC1114, “Scaling Cascades in Complex system.” The scientific exchanges between N.V. and D.F. were greatly facilitated by funding through the DAAD exchange program Procope through the project “Data-driven dynamical stability of stably stratified turbulent flows.” We are grateful to anonymous reviewers for insightful comments that helped us improve the manuscript.

### APPENDIX A

#### Details of the *K*-Means Clustering Algorithm

The clustering is done using the *K*-means algorithm with the following steps:

Input:

*k*= number of clusters, set of points*x*_{i−τ+1}, …,*x*_{i}*.*Place centroids

*c*_{1}, …,*c*_{k}at random locations.Repeat until none of the cluster assignments change:

For each point

*x*_{i}find nearest centroid*c*_{j}and assign*x*_{i}to cluster*j*For each cluster

*j*= 1, …,*k*calculate new centroid*c*_{j}= mean of all points*x*_{i}assigned to cluster*j*in previous step.

### APPENDIX B

#### Details of the Anderson–Darling Normality Test

*F*

_{n}(

*x*) and the normal distribution

*F**(

*x*). The statistic for this test is

*ψ*is a nonnegative weight function that is used to emphasize the tails of the presumed distribution. We use the modified AD statistic given by D’Agostino and Stephens (1986), which takes into accounts the sample size

*n*:

The null hypothesis of this normality test is that the data are sampled from a normal distribution. When the *p*-value is greater than the predetermined critical value (*α* = 0.05), the null hypothesis is not rejected and thus we conclude that the data are normally distributed.

### APPENDIX C

#### Summary of the Algorithmic Procedure

The full procedure to apply the statistical indicator to a time series follows the following steps:

Input: Time series

*x*_{1}, …,*x*_{T}*.*Evaluate the window length

*τ*_{Kmeans}:Cluster the time series in

*k*clusters (expected number of equilibrium states) using the*K*-means algorithm.For each cluster, calculate the mean residence time of the time series.

The minimal mean residence time over the

*k*clusters provides an upper bound for*τ*_{Kmeans}.

Evaluate the window length

*τ*_{AD}:For a range of window lengths

*τ*_{min}<*τ*<*τ*_{max}, apply the AD test statistic to the time series in a moving window approach.For each

*τ*, calculate the median of the*p*-values obtained over all windows.Select the largest

*τ*for which the median*p*-value is greater than the predetermined critical value (*α*= 0.05) as*τ*_{AD}.

If

*τ*_{AD}<*τ*_{Kmeans}, select*τ*_{AD}as a window length. Else, ARMA model fitting may be inappropriate.Repeat for each window of length

*τ*_{AD}:Select the best fitting ARMA(

*p*,*q*) model (minimal BIC).Fit an ARMA (1, 0) to the window and calculate the BIC.

Calculate $\u03d2$. High values will indicate transitions. The values themselves may depend on the dataset.

## REFERENCES

*J. Atmos. Sci.*

*Bound.-Layer Meteor.*

*Quart. J. Roy. Meteor. Soc.*

*Quart. J. Roy. Meteor. Soc.*

*Ann. Math. Stat.*

*Bound.-Layer Meteor.*

*Quart. J. Roy. Meteor. Soc.*

*Introduction to Time Series and Forecasting*

*Sci. Rep.*

*Quart. J. Roy. Meteor. Soc.*

*J. Phys.*

*J. Geophys. Res.*

*J. Roy. Stat. Soc.*

*Ecol. Lett.*

*J. Atmos. Sci.*

*Bull. Amer. Meteor. Soc.*

*Phys. Rev. E*

*Elements of Applied Bifurcation Theory*. Vol. 112. Springer Science and Business Media, 632 pp

*Bound.-Layer Meteor.*

*A Century of Progress in Atmospheric and Related Sciences: Celebrating the American Meteorological Society Centennial*,

*Meteor. Monogr.*, No. 59, Amer. Meteor. Soc.

*Philos. Trans. Roy. Soc. London*

*Annu. Rev. Fluid Mech.*

*Quart. J. Roy. Meteor. Soc.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*Phys. Rev. Fluids*

*J. Adv. Model. Earth Syst.*

*Nature*

*Science*

*Ann. Stat.*

*J. Amer. Stat. Assoc.*

*J. Atmos. Sci.*

*Bound.-Layer Meteor.*

*J. Fluid Mech.*

*Flow Turbul. Combust.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*J. Atmos. Sci.*

*Bound.-Layer Meteor.*

*J. Atmos. Sci.*

*Quart. J. Roy. Meteor. Soc.*

*Bound.-Layer Meteor.*

*Quart. J. Roy. Meteor. Soc.*

## Footnotes

Denotes content that is immediately available upon publication as open access.

Goodness-of-Fit Techniques. Marcel Dekker, 560 pp.