## Abstract

Understanding non-Gaussian statistics of atmospheric synoptic and low-frequency variability has important consequences in the atmospheric sciences, not least because weather and climate risk assessment depends on knowing and understanding the exact shape of the system’s probability density function. While there is no doubt that many atmospheric variables exhibit non-Gaussian statistics on many time (and spatial) scales, a full and complete understanding of this phenomenon remains a challenge. Various mechanisms behind the observed atmospheric non-Gaussian statistics have been proposed but remain, however, multifaceted and scattered in the literature: nonlinear dynamics, multiplicative noise, cross-frequency coupling, nonlinear boundary layer drag, and others. Given the importance of this subject for weather and climate research, and in an attempt to contribute to this topic, a thorough review and discussion of the different mechanisms that lead to non-Gaussian weather and climate variability are presented in this paper and an outlook is given.

## 1. Introduction: What is the problem?

One of the defining characteristics of Earth’s weather and climate is its variability on a very broad spectrum of time scales, ranging from hours to centuries and millennia (e.g., Peixoto and Oort 1992). Whereas external orbital variations described by Milankovitch cycles are believed to be the dominant driving force for millennial climate variability (Hays et al. 1976; Berger et al. 1984), weather and climate variations on shorter submillennial time scales are the result of complex nonlinear interactions among many subsystems, degrees of freedom, or modes (e.g., Ghil and Childress 1987; Ghil and Robertson 2002).

Because this paper (with a few exceptions) focuses on synoptic and large-scale atmospheric variability, we predominantly distinguish between synoptic time scales of up to a few days and low-frequency variability, defined as being above the synoptic time scale. Whereas the synoptic band is populated by dynamically relatively well-understood weather systems and phenomena, which can be said to be “predictable” to some extent, atmospheric low-frequency variability is not yet fully understood. However, a thorough understanding of low-frequency variations, dominated by tropospheric planetary waves and other coherent large-scale structures such as blocking, is essential for extending the practical range of numerical weather forecasts beyond the theoretical limit of about 1–2 weeks. Attention has thus been mostly paid to finding persistent or quasi-stationary atmospheric flow regimes, often loosely defined as large-scale atmospheric flow configurations that persist longer than individual weather systems (e.g., Pandolfo 1993). The situation, however, is much deeper than this, as discussed below.

The atmospheric system is a high-dimensional and highly complex dynamical system with very many nonlinearly interacting modes. One important feature of the large-scale (predominantly low-frequency) atmospheric circulation is its non-Gaussianity. Understanding non-Gaussianity has become an important objective in weather/climate research, because weather and climate risk assessment depends on knowing and understanding the tails of probability distributions. In particular, there is broad consensus that the most hazardous effects of climate change are related to a potential increase (in frequency and/or intensity) of extreme weather and climate events populating the tails of often non-Gaussian distributions.

Gaussianity or normality is a basic concept so important, for reasons that will become clear later, that non-Gaussianity often begs for explanation. It should be noted that many important atmospheric small-scale and mesoscale phenomena exhibit strong non-Gaussian statistics. Most importantly, precipitation statistics are highly non-Gaussian, often modeled by a gamma distribution (Wilks 2006). In fact, understanding and modeling precipitation and convection statistics is a very active and important area of research (e.g., Peters et al. 2001; Neelin et al. 2008; Peters et al. 2009; Stechmann and Neelin 2011, 2014; Hannachi 2014). However, whereas all atmospheric time and spatial scales are definitely linked through nonlinear (i.e., multiscale) interactions, the primary focus of this paper is on the large-scale atmospheric circulation. In particular, non-Gaussian precipitation and convection statistics are such important topics that they deserve a review paper in their own right; a thorough review is beyond the scope of this paper.

One of the earliest studies that highlighted the non-Gaussianity of the large-scale circulation goes back to the early 1980s with White (1980), which was followed up by many studies since then (Trenberth and Mo 1985; Sutera 1986; Hansen and Sutera 1986; Nakamura and Wallace 1991; Holzer 1996; Stephenson et al. 2004; Hannachi 2010; Woollings et al. 2010a; Turner and Hannachi 2010; Hannachi and Turner 2013b; Berner and Branstator 2007; Rennert and Wallace 2009; Sura et al. 2005; Perron and Sura 2013; Pasmanter and Selten 2010). Non-Gaussianity manifests most often in skewness and/or kurtosis as discussed later. The origin of non-Gaussianity, however, is not unique and we can cite (and discuss in this paper) nonlinear dynamics, multiplicative noise, cross-frequency coupling, and nonlinear boundary layer drag. (We note here that extreme events, resulting for example from severe storms, can affect one side of the distribution tail more than the other, hence generating skewness.) For example, the nonlinear paradigm is based on the fact that large-scale flow has a tendency to reside, longer^{1} than typical synoptic time scales, in “preferred” regions of the system’s state space. Broadly speaking those regions represent what is known as nonlinear circulation regimes.

The hope is that such regions contain initial atmospheric states from which better long-term forecasts are possible (Colucci and Baumhefner 1992). The issue of climate change has also brought the question related to the existence of preferred flow regimes up front in science media (Palmer 1999; Corti et al. 1999; Hsu and Zwiers 2001; Christiansen 2003). That is, the dynamics of low-frequency variability and flow regimes has been, and still is, the focus of many studies. An often used measure for atmospheric regime behavior resulting from nonlinear dynamics is a non-Gaussian probability density function (PDF) of the variable under consideration, and we will focus on this condition in this paper. Note, however, that the *absence* of non-Gaussianity is *not* equivalent to the absence of regimes. In fact, a system with Gaussian statistics may still show some persistence in regions of its phase space because the PDF of a stochastic system is shaped by the deterministic drift *and* stochastic diffusion (called the Fokker–Planck equation). Nevertheless, our main concern is the (relatively) easy to observe non-Gaussianity of atmospheric variability.

While there is no doubt that many atmospheric variables exhibit non-Gaussian statistics on many time scales, the proposed mechanisms behind the observed non-Gaussianity are multifaceted and scattered in the literature. Atmospheric blocking, for example, in addition to the extreme weather events mentioned above, contributes consistently to the observed non-Gaussian behavior in weather and climate variability. We therefore feel a need to compile and present all those processes in one place. Given the importance of this topic for climate research, it is our wish in this paper to review and discuss the different mechanisms that can lead to non-Gaussian weather and climate variability. This paper, however, does not provide a categorical synthesis to reconcile the different theories, a task that goes well beyond the scope of this paper but is a direction of our future study. The objective of this paper is threefold:

to compile and discuss the different investigations in one place, which we believe will meet the need of the climate researcher;

to help evaluate and assess what has been done and what might remain to be done; and

to use this document for didactic purposes for students starting a research career in large-scale dynamics.

To achieve this goal and get some insights into non-Gaussianity in weather and climate, it is necessary to have a thorough understanding of the central role of Gaussianity^{2} first. The paper is organized as follows. In section 2 we briefly review some basics as to why the Gaussian distribution plays such a central role not only in climate but also in nearly all branches of science. In section 3 we present some basic measures of non-Gaussianity. A description of non-Gaussianity in weather and climate and its possible sources is given in section 4. Section 5 discusses how weather and climate patterns maximizing non-Gaussianity can be defined in much the same way as empirical orthogonal functions, and a summary and discussion are provided in section 6.

## 2. Gaussianity: Basics

The distribution most commonly used in the physical sciences is the Gaussian distribution. In fact, the Gaussian is extensively used in *all* fields that deal with data analysis (like social sciences, medicine, and life sciences). This section reviews a few fundamental ideas of why the Gaussian distribution plays such a central role in probability and statistics and notably in weather and climate. It seems that the normal distribution was first used by Abraham de Moivre in 1733. However, it was Carl Friedrich Gauss who proved in 1809 that the maximum likelihood estimator of a PDF’s location parameter equals the arithmetic mean only for the normal distribution. The distribution was later coined “Gaussian.” Nevertheless, Gauss’s results went initially largely unnoticed because they were presented as a minor finding in a large book on astronomy. It was Pierre-Simon Laplace who saw the significance in 1810 and then gave us the first version of the central limit theorem shortly thereafter. See Stigler (1990) or Jaynes (2003) for more detailed historical accounts of the Gaussian distribution.

### a. Central limit theorem

The central limit theorem (CLT) is one of the cornerstones of probability and mathematical statistics. It explains why the distributions of many observed quantities tend to be close to Gaussian. In a nutshell, the classical CLT states that the distribution of the sum (or mean) of a sufficiently large number of independent but equally distributed (with identical finite mean and variance) random variables will be approximately Gaussian. Remember that many variables studied in meteorology and climatology are averages or integrated quantities of some type (von Storch and Zwiers 1999; Wilks 2006). That is, we may expect to observe Gaussian distributions frequently, as long as the requirements for the classical CLT are approximately met (independence, identical distributions, and finite variance).

In mathematical terms, let be independent random variables that are identically distributed (i.e., all have the same PDF) and have finite mean *μ* and variance *σ*^{2}. Then if (), we have

where *P*[⋅] denotes the probability of an event happening. That is, the random variable , which is the standardized variable corresponding to *S*_{n}, is asymptotically normal. The theorem is also true under more general conditions. For example, it also holds when are independent random variables with the same mean and variance but not necessarily identically distributed. For a more detailed discussion see a standard textbook on probability and statistics (e.g., Feller 1968; Schiller et al. 2008). Beyond the standard textbook treatment it should be noted that the approach to the Gaussian limit may be very slow and not necessarily monotonic. For example, if the system has a highly ordered attractor (like the Lorenz attractor), then the PDFs of running averages may show complicated changes as the averaging window is increased (Marshall and Molteni 1993; Dwivedi et al. 2007). Furthermore, if high-frequency “synoptic” Gaussian variability is superimposed upon non-Gaussian low-frequency variability, then averaging may first amplify non-Gaussianity (Teng et al. 2004).

Besides specific cases where approaching the Gaussian limit is very slow and nonmonotonic, a common CLT example considers the sum of many independent harmonic waves. While the distribution of a single harmonic wave is highly non-Gaussian (it is bimodal because the signal slows down near the two extreme phase), a sum of many independent harmonic waves becomes close to Gaussian. That is why, neglecting nonlinear wave–wave interactions as a first approximation, the distribution of local sea surface heights (dominated by a random field of wind forced surface gravity waves) is close to Gaussian (e.g., LeBlond and Mysak 1978; Ochi 2005; Gemmrich and Garett 2008). One could argue that atmospheric non-Gaussianity is at least partly due to strong harmonic signals. However, in the atmosphere we rarely observe strong harmonic signals. One exception is the equatorial quasi-biennial oscillation (QBO) with a quasi-periodicity of about 27–29 months affecting tropical dynamics.

Overall, the key ingredient of any CLT version is that the random variable being observed should be the sum (or mean) of many *independent* random variables. That is, if the random variables are *not* independent of each other we expect non-Gaussian distributions to arise. One obvious implication is that temporal and/or spatial averaging should make any quantity more Gaussian, as long as the original signal is not correlated in time and/or space. For example, if we have a non-Gaussian signal on a daily time scale that is uncorrelated from day to day, averaging in time (e.g., weekly, monthly, seasonally, or annually) would make the signal more Gaussian. However, if the original non-Gaussian signal has a significant nonvanishing autocorrelation for time lags beyond a day, temporal averaging will have less an effect on the original non-Gaussianity. That means that atmospheric low-frequency variability beyond the synoptic time scale typically prohibits the strict application of the CLT on longer time scales, potentially allowing for significant non-Gaussian low-frequency variability. It is worth noting here that in very high-dimensional systems the PDFs of the marginal distributions in most low-dimensional hyperspaces are near-Gaussian because of the possible involvement of sums of large numbers of variables,^{3} as in EOFs.

Thus we identified two opposing statistical principles: In general, averaging destroys non-Gaussianity, whereas (auto)correlations (or persistence) have a tendency to retain non-Gaussianity by slowing down the approach to Gaussianity. As we have seen, the devil is in the details, and the actual rate (not necessarily monotonic) of convergence to Gaussianity depends on the system under consideration. However, the interplay between these two principles can categorize the nature of the PDF of climate. But, besides these statistical principles there is also another “physical–statistical principle” that can lead to Gaussianity, and this is discussed next.

### b. Maximum entropy distribution

Another way of looking at the distribution of a physical quantity is the following. Suppose a physical system is characterized by a set of general constraints. For example, under some circumstances it makes physical sense to assume that the mean and the variance of a system are fixed. While the real atmosphere is full of nonstationary phenomena (e.g., annual and diurnal cycles), in such conditions the assumption of statistical stationarity is reasonable to first order. We can then ask this question: Which distribution is the most consistent (and probable) with the given constraints?

The answer to that question is given by the principle of maximum entropy, introduced by Jaynes^{4} in 1957, postulating that the probability density function *p*(*x*), which best represents a given variable *x* subject to known constraints, is the one with the largest information entropy (Jaynes 1957a,b). The information entropy is often described as a measure that describes how uninformative a particular probability distribution is. That is, the principle of maximum entropy would give the distribution that is the most noninformative given the constraints. However, most uninformative is equivalent to most probable distribution.

The probabilistic interpretation of the maximum entropy distribution (the “Wallis derivation”) is attributed to Graham Wallis and is described in detail in Jaynes (2003); it is mathematically equivalent to the combinatorial derivation of Maxwell–Boltzmann statistics. In a nutshell, suppose we want to randomly distribute *N* quanta of probability (each worth 1/*N*) among *m* bins (think of randomly tossing *N* balls in *m* buckets or bins). That will result in *n*_{i} quanta or balls in the *i*th bin (i.e., *i* = 1, …, *m*), giving us the discrete probability distribution *p*_{i} = *n*_{i}/*N*. Normalization requires . In addition, suppose that the resulting distribution *p*_{i} has to meet certain constraints (e.g., a prescribed mean and variance). That means, we have to check if the probability *p*_{i} thus obtained is consistent with the given constraints. If not, we will repeat the experiment until the constraints are met. What is the most probable distribution of this random experiment? The probability of a specific distribution *p*_{i} (with *n*_{i} quanta or balls in bin *i*) is the multinomial distribution (e.g., Feller 1968). That is, the probability *P* to randomly produce the distribution *p*_{i} is

The most probable result is the one which maximizes *P*. Instead of maximizing *P* directly we can also maximize the logarithm of *P*. Doing just that gives us the desired answer. The most probable distribution *p*_{i} is the one maximizing , subject to the constraints, which we identify as the discrete version of the information entropy *S*.

For example, given only the mean *μ* and the variance *σ*^{2} of a real-valued *x*, the Gaussian distribution maximizes the information entropy . That means, given the mean and variance of a physical system (and no other constraints), that the most uninformative (i.e., probable) distribution is the Gaussian. Another well-known example is, of course, that of a nonnegative variable *E* (such as energy), where only the average is known. The maximum entropy probabilities *p*(*E*) are then given by , which we identify as the Boltzmann distribution. Therefore, the laws of statistical mechanics can be interpreted independently of physical arguments: they simply represent the best estimate, given limited information (Jaynes 1957a,b). That also means that the information entropy and the thermodynamic entropy are conceptually equivalent (having in mind that in thermodynamic applications we have to include the Boltzmann constant). A comprehensive treatment of information theory applied to geophysical flows can be found in Majda and Wang (2006).

To summarize, if we assume that it makes physical sense that the mean and the variance of a system are fixed, we identified another physical–statistical mechanism that prefers Gaussianity. However, how do we reconcile the principle of maximum entropy with the observed non-Gaussian statistics in weather and climate? Within the maximum entropy framework there is only one answer: other, very specific constraints inherent to the systems or quantity under consideration.

## 3. Basic measures of non-Gaussianity

Before we look into the origin and sources of atmospheric non-Gaussianity, let us remember how we typically identify non-Gaussian statistics. The most obvious method is to simply compare an observed PDF to a Gaussian, keeping in mind that we have to perform tests to quantify the statistical significance of deviations from Gaussianity (see, e.g., Stephenson et al. 2004). It should be noted that impeccable statistical significance testing of atmospheric non-Gaussianity is itself a major issue, mainly because of the temporal and spatial correlations in the data but also because many of the methods used have been chosen ad hoc without a sufficient knowledge about their properties. While this is an important and interesting topic, a detailed discussion is beyond the scope of this paper. A thorough discussion of the statistical significance of non-Gaussian atmospheric regimes can be found in Stephenson et al. (2004), while Christiansen (2005b) discusses the possibility of finding spurious multimodality in a unimodal dataset if a specific method (nonlinear principal component analysis) is not applied carefully. However, most statistical test for non-Gaussianity use some kind of Monte Carlo method (e.g., Kimoto and Ghil 1993a; Christiansen 2005a, 2009; Perron and Sura 2013), which can be (relatively) easily implemented for univariate or bivariate time series.

The univariate case is straightforward and easy to visualize. Take a time series, calculate the PDF, and compare to a Gaussian (i.e., significance test). As a typical bivariate case, we consider non-Gaussian regimes in a highly truncated bivariate phase space spanned by two leading EOFs of daily geopotential height (or streamfunction) anomalies from NCEP–NCAR reanalyses (e.g., Mo and Ghil 1988; Molteni et al. 1990; Kimoto and Ghil 1993a,b; Corti et al. 1999; Smyth et al. 1999; Weisheimer et al. 2001; Stephenson et al. 2004; Monahan et al. 2001; Berner 2005; Berner and Branstator 2007; Majda et al. 2003, 2008; Franzke et al. 2005; Sura et al. 2005). In our example (from Sura et al. 2005), 750-hPa streamfunction anomalies were defined by removing the seasonal cycle at each grid point and then applying a 7-day running mean filter. An empirical orthogonal function/principal component (EOF/PC) analysis (e.g., Hannachi 2007) was then applied to the streamfunction anomalies. The first two principal components (PC1 and PC2) are then normalized to have zero mean and unit standard deviation. The bivariate PDF of PC1 and PC2 is shown in Fig. 1a. Positive (negative) departures from Gaussianity (Fig. 1b) indicate that the observed PDF is greater (smaller) than the corresponding bivariate Gaussian distribution. The regions of positive and negative departures from Gaussianity are statistically significant at the 90% confidence level (thick contours in Fig. 1a), as determined by a Monte Carlo method. Similar plots can be found in many publications (e.g., Kimoto and Ghil 1993a; Weisheimer et al. 2001; Stephenson et al. 2004).

In the important case of having a time series at many horizontal (or vertical) locations, we typically cannot estimate the full PDF at every grid point. Then skewness (normalized third moment) and kurtosis (normalized fourth moment) can be used to assess (and visualize) the spatial structure of non-Gaussian statistics. For anomalies *x*′ with zero mean and standard deviation *σ*, the skewness *S* and kurtosis *K* are defined respectively as

where denotes an ensemble average, or assuming ergodicity, a time average. Remember that skewness represents the asymmetry of the PDF. It is positive if the right tail contains more data than the left tail, and negative if the opposite is true. Symmetric distributions, such as the classic Gaussian, have zero skewness. Kurtosis, on the other hand, represents the peakiness of the data distribution; it is high when the data include numerous extreme events. The definition *K* here is sometimes referred to as “excess kurtosis” with respect to that of a Gaussian distribution. The exact standard errors of skewness and kurtosis depend on their underlying population distribution, but can be approximated using a Gaussian as and , respectively, where *N*_{in} is the effective number of independent observations (e.g., Brooks and Carruthers 1953; Perron and Sura 2013).

Some of the earliest maps of atmospheric skewness and kurtosis (of 500-hPa geopotential heights) were presented in White (1980) and Trenberth and Mo (1985). A more recent example is presented in Fig. 2, showing horizontal (500 hPa) and zonally averaged maps of daily December–February (DJF) and June–August (JJA) geopotential height skewness and kurtosis. In a nutshell, we see (for the most part) that the non-Gaussianity follows zonally symmetric patterns. Overall, the horizontal patterns are consistent with the earlier findings of White (1980), Trenberth and Mo (1985), Nakamura and Wallace (1991), and Holzer (1996). In addition, the cross sections are analogous to the ones found in Holzer (1996). In these early (and groundbreaking) papers, the areas featuring non-Gaussianity were qualitatively linked to climatological features of the synoptic atmosphere such as the mean jet location and troughs and ridges. In particular, an important part of the non-Gaussianity is attributed to blocking in the atmosphere, in which the large-scale flow pattern becomes stuck in an untypical configuration for a few days to several weeks.

## 4. Non-Gaussianity: Sources and mechanisms

In this section we present and discuss general mechanisms that can lead to non-Gaussian atmospheric behavior: nonlinear deterministic regime dynamics, multiplicative noise, cross-frequency coupling, jet variability and meandering, and nonlinear boundary layer drag. In the conclusion we will evaluate the conflict between the different theories and make suggestions for future research.

### a. Nonlinear deterministic regime dynamics

Traditionally, the non-Gaussiansity of the atmospheric large-scale circulation has been attributed to nonlinear deterministic dynamics. It has long been observed that the atmospheric midlatitude circulation appears to alternate between a zonal flow and a blocked flow with a pronounced wave component (e.g., Pandolfo 1993). This behavior suggests that there may exist more than one large-scale flow regime consistent with a given external forcing. Charney and DeVore (1979), Wiin-Nielsen (1979), and Hart (1979) were the first who proposed that the occurrence of large-scale flow regimes may be due to multiple equilibria of the nonlinear governing equations. They suggest that blocking and zonal flow could be associated with two stable stationary solutions of the spectrally truncated nonlinear barotropic quasigeostrophic vorticity equation.

The general mathematical theory is straightforward and very appealing. For convenience and didactic purposes we briefly summarize the main ideas. Consider a *p*-dimensional dynamical system written as an ordinary differential equation for the *p*-dimensional state vector :

The system is called autonomous if (no explicit time dependence). In the following we only discuss autonomous systems, and we can assume without loss of generality that it is also dissipative, that is,

The roots of [points were ] are called equilibria, fixed points, singular points, or steady (or stationary) states of the dynamical system. The stability of a given equilibrium **x**_{e} can be deduced from the linearized version of the dynamical system, which is obtained by a truncated Taylor series expanded about the equilibrium **x**_{e}. That is, we perturb the equilibrium **x**_{e} through a small disturbance ** δ**:

Linearization yields

with the Jacobian matrix _{A},

That is, the dynamics of the perturbation ** δ** is governed, to first approximation, by a linear

*p*-dimensional autonomous ordinary differential equation

The stability of Eq. (10) is deduced from the eigenvalues of the Jacobian matrix at the equilibrium **x**_{e}. The eigenvalues *λ* are given by

where is the *p* × *p* identity matrix. If all eigenvalues of have a negative real part, the equilibrium **x**_{e} is stable. If just one eigenvalue has a positive real part, the equilibrium **x**_{e} is unstable. Within this framework Legras and Ghil (1985) elucidate the dynamical importance of unstable fixed points. Steady states that are unstable to a small number of modes, but stable to a large number of modes, may act to steer a time-dependent model, thus providing a mechanism for a temporarily persistent regime and, therefore, non-Gaussian statistics. This spurred a renewed effort to explore the existence of regimes in observed midlatitude flows (e.g., Sutera 1986; Hansen and Sutera 1986; Mo and Ghil 1988; Molteni et al. 1990; Kimoto and Ghil 1993a,b; Cheng and Wallace 1993; Corti et al. 1999; Smyth et al. 1999; Monahan et al. 2001, and many others). Sutera (1986) and Hansen and Sutera (1986) pioneered this effort by studying an atmospheric wave amplitude index (the root-mean-square of the wavenumber 2–4 height variance) and finding two non-Gaussian flow regimes (i.e., a bimodal PDF) in a short dataset, of 4 (Sutera 1986) and 16 (Hansen and Sutera 1986) winters. However, the statistical significance of the two regimes has been questioned repeatedly (e.g., Christiansen 2005a; Ambaum 2008). In fact, Ambaum (2008) showed that the wave amplitude index is basically unimodal. In addition, other observational studies generally did not find regime behavior as pronounced as in the theoretical and simple model studies predicted by dynamical system theory. That is, the observational evidence for multiple states or regimes in the atmospheric circulation is rather equivocal. One main reason could be the relatively short record of observations (or reanalyses), keeping in mind the substantial autocorrelation in these data particularly for daily time scales. For monthly time scales the observational evidence for non-Gaussian regimes remains so weak that the null hypothesis of multinormal behavior cannot be rejected at the 5% significance level (Stephenson et al. 2004). Furthermore, the relevance of multiple equilibria in truncated models has been questioned by Tung and Rosenthal (1985).

Observational evidence for regimes proved to be a challenging problem and not an easy exercise, particularly when looking straight at standard variables (e.g., geopotential height and sea level pressure). Thus conclusions obtained so far seem to suggest dependence on the variable and method used. In fact, the search for multiple states in the atmospheric circulation has been an ongoing endeavor guided by improved statistical techniques (such as mixture and hidden Markov models), long model runs (to obtain longer-than-observed datasets), physical intuition (such as analyzing novel variables like jet stream position), simplified models, or clever combinations of all thereof. In the remainder of this subsection on nonlinear deterministic dynamics we discuss several groundbreaking studies that did just that. Note that not all results are necessarily consistent with each other, illustrating the still equivocal character of the search for nonlinear deterministic regimes.

For example, it is possible to find metastable (i.e., quasi-persistent) atmospheric flow regimes despite nearly Gaussian statistics, and this has been done at various levels. The analyses performed by Haines and Hannachi (1995) and Hannachi (1997a,b) provide some evidence for the existence of nonlinear regimes in midlatitude tropospheric low-frequency variability (LFV), in the form of equilibria of simplified dynamical systems. They projected the quasigeostrophic simplified dynamics onto the space spanned by the leading few EOFs and identified the flow tendencies and associated equilibria (e.g., Haines and Hannachi 1995; Hannachi 1997a). Figure 3 shows an example of flow tendencies obtained using the dynamics of a two-level quasigeostrophic model on the sphere projected onto the leading two streamfunction EOFs of a 10-yr perpetual January run of the U.K. Universities Global Atmospheric Modeling Project (UGAMP; see, e.g., Hannachi 1997b). The singular points (or equilibria) of the obtained dynamics are shown as points with zero tendencies. Among the states identified were the Pacific–North American (PNA) patterns and blocking flows over the North Pacific (Haines and Hannachi 1995; Hannachi 1997b). In the same vein Branstator and Berner (2005) used a very long simulation of the NCAR GCM and investigated the flow tendencies within the space of the leading few EOFs. They also identified equilibria over the extratropical Northern Hemisphere in the form of propagating polar waves.

The dynamics of dissipative systems within its state space is determined to a large extent by the metastable fixed points. In addition, unstable periodic homoclinic orbits define the nature of the system’s attracting set, which is chaotic. The chaotic attractor of low-order dissipative systems has a topological complicated geometry, often fractal, and consequently with a non-Gaussian PDF. The Lorenz system (Lorenz 1963) is a simple model of this paradigm. Chaotic behavior, however, has not been identified beyond a shadow of a doubt in the observed atmospheric extratropical low-frequency dynamics.

As we stated above, it is possible that the chaotic behavior could be concealed or eroded by the complexity and high-dimensionality of the atmospheric system with very many nonlinearly interacting modes. The quest for equilibria in observations has therefore turned into a quest for “quasi-stationarity” (Legras and Ghil 1985; Vautard 1990; Marshall and Molteni 1993), persistence (Horel 1985; Dole and Gordon 1983; Legras and Ghil 1985; Hansen et al. 1993), clustering within the system’s state space (Molteni et al. 1990; Cheng and Wallace 1993; Hannachi 2007; Straus et al. 2007), and exploration of the non-Gaussianity of the system’s PDF (Hannachi 2007, 2010; Woollings et al. 2010a,b; Hannachi et al. 2012).

The outstanding feature of a non-Gaussian PDF is its skewness and/or kurtosis, which may be noted visually in one and two dimensions. The nonlinear regime paradigm constitutes one way to interpret the non-Gaussianity through an elegant probabilistic framework, the mixture model, by which any PDF *f*(**x**) can be decomposed as closely as desired by a convex sum of multivariate Gaussian density functions (e.g., Anderson and Moore 1979) as

where are the *M* mixing proportions or weights of the mixture model and they satisfy

and *μ*_{k} and **Σ**_{k} are, respectively, the mean and the covariance matrix of the *k*th, , multivariate normal density function *g*_{k}:

where *p* is the state space dimension.

The mixture model was applied extensively in atmospheric LFV and other related studies (Haines and Hannachi 1995; Hannachi 1997b; Smyth et al. 1999; Hannachi and O’Neill 2001; Woollings et al. 2010a,b; Berner and Branstator 2007; Stephenson et al. 2004; Hannachi 2007; Hannachi et al. 2012). One useful application of the mixture model was performed by Woollings et al. (2010a), who applied it to the North Atlantic Oscillation (NAO). In that study the NAO is defined as the leading winter EOF obtained using the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40; see Uppala et al. 2005) over the North Atlantic sector (20°*–*90°N, 90°W–90°E) based on the 500-hPa geopotential height (Z500). The skewness of the NAO was −0.23 and is significant at the 5% level. The NAO PDF along with the Gaussian fit is shown in Fig. 4 (top).

Now the NAO has two opposite phases for which the positive phase normally corresponds to a zonal flow, whereas the negative phase corresponds to a blocked flow often located over Greenland. Woollings et al. (2010a) decomposed the daily NAO index into Greenland blocking and no-blocking events. Greenland blocking events (GBE) are defined based on wave breaking obtained by searching for reversals in the meridional potential temperature gradient on the 2-PVU surface^{5} (1 PVU = 10^{−6} K kg^{−1} m^{2} s^{−1}). Figure 4 (bottom) shows the PDF of the NAO split into the PDFs of GBE and non-GBE days. These individual PDFs have much less skewness, with no significance. Now the mixture model applied to the NAO time series is shown in Fig. 5. The agreement between the mixture model PDF and the PDFs corresponding to the GBE/non-GBE days is outstanding although the wave-breaking definition provides a slight overestimation of blocking occurrence. This provides another suggestion for NAO variability, namely that it is due to transitions between the zonal and blocked flow over the North Atlantic.

We present next our last example, which is on the jet stream position and its link to non-Gaussianity. For example, the jet latitude from different climate models is highly correlated to jet latitude skewness as obtained, for example, from the integrations from phase 3 of CMIP (Barnes and Hartmann 2010; Hannachi et al. 2013) and phase 5 of CMIP (Barnes and Polvani 2013), showing a decrease of skewness associated with increasing jet latitude. The westerly jet stream is important in low-frequency variability in that it is associated with much of weather and climate variability and teleconnection in the extratropics. The nonlinear regimes are actually a manifestation of the position of the jet stream (Woollings et al. 2010b; Hannachi et al. 2012). Over the North Atlantic region the eddy-driven jet stream latitude exhibits an outstanding trimodal behavior (Fig. 6) and the modes are associated with the Greenland blocking and two zonal flows, one with a rather strong northward-shifted jet and the other with a rather broad jet located at about 45°N (Woollings et al. 2010b). To investigate the relationship between the eddy-driven jet stream variability and large-scale flow over the North Atlantic Woollings et al. (2010b) and Hannachi et al. (2012) analyzed the PDF of the winter (DJF) Z500 from ERA-40 using the leading two EOFs, namely the NAO and the East Atlantic (EA) patterns. This PDF is based on a three-component mixture model and is shown in Fig. 7. The individual Gaussian components are shown by their covariance ellipses^{6} and their centers. The crosses represent the flow associated with the modes of the jet latitude PDF (Fig. 6). The color in Fig. 7 refers to the daily values of the jet latitude anomaly. The regimes, provided by the centers of the individual bivariate Gaussians, are shown in Fig. 8 and are very similar to the modes of the jet latitude PDF (Hannachi et al. 2012). Clearly Figs. 7 and 8 show that each regime corresponds to a particular position of the jet stream. For instance, the southern jet position (left-hand side ellipse) represents Greenland anticyclone blocking with its negative NAO signature. The northern jet (bottom right-hand side ellipse) represents a negative EA pattern whereas the third regime represents a central position of the jet and is a combination of positive NAO and EA patterns.

Mixture models can also be applied in a way that is consistent with the dynamics. This can be achieved using hidden Markov models. A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. If the Markov chain possesses metastable states, they are identified as regimes. HMMs have been successfully used to search for regimes in simplified (quasigeostrophic: barotropic and three-layer baroclinic) general circulation models (Majda et al. 2006; Franzke et al. 2008). An extended HMM-type method also finds regimes in a comprehensive general circulation model (GCM) (Franzke et al. 2009). Franzke et al. (2011) also applied HMMs to the jet stream latitude over the North Atlantic to identify transitions between its positions.

Last but not least, it is important to mention that applications of mixture models and HMMs are not without problems. In particular, in the basic implementation of these methods the number of clusters *M* has to be specified a priori. For the mixture model *M* can be in principle determined by cross validation (Smyth 2000). However, it has been shown by Christiansen (2007) that applying the clustering methods to the Northern Hemisphere winter tropospheric geopotential heights gives conflicting and fragile results.

### b. Multiplicative noise

The concept of non-Gaussian statistics induced by multiplicative noise is best explained in conjunction with the nonlinear deterministic framework. We only have to add the idea of separate “fast” and “slow” time scales to upgrade the nonlinear (deterministic) framework into a stochastic one. [The following discussion follows Sura (2011, 2013), where the reader will find more details.]

Note that extratropical atmospheric flow does not generally have a spectral gap that allows this decomposition to be made unambiguously. Therefore stochastic climate models have to be seen as more or less rigorous approximations of the real system. While those approximations can be heuristics or mathematically more systematic and advanced, in both cases some leap of faith is required initially. Subsequently, the results have to be carefully validated by observations. Here we keep the math relatively simple to focus on the more intuitive physical aspects. A more detailed mathematical discussion of stochastic climate models and multiplicative noise can be, for example, found in Majda et al. (2008). In addition, more general and comprehensive treatments of stochastic differential equations and related topics may be found in many textbooks (e.g., Gardiner 2004; Øksendal 2007; Paul and Baschnagel 1999; van Kampen 2007; Horsthemke and Léfèver 1984).

Suppose climate dynamics are split into slow (i.e., slowly decorrelating) and fast (i.e., rapidly decorrelating) contributions; this is an assumption well known in turbulence theory. The fast part is then approximated as noise. That is, we consider the dynamics of a *p*-dimensional system whose state vector **x** is governed by the stochastic differential equation (SDE)

where the vector **A**(**x**) represents all slow, deterministic processes. The product of the matrix (**x**) and the noise vector ** η**(

*t*), (

**x**)

**(**

*η**t*), represents the influence of fast variables on the slow variables by a stochastic approximation. The stochastic components

*η*

_{i}are assumed to be independent Gaussian white-noise processes: and , where denotes the time average (or, assuming ergodicity, an ensemble average) and

*δ*is the Dirac delta function. In general,

**x**in Eq. (15) will have non-Gaussian statistics and is, therefore, well suited to study, for example, extreme events. This is well known to be the case if

**A**(

**x**) is nonlinear, even if (

**x**) is constant (i.e., the noise is state independent or additive; see left branch of Fig. 9). It is also true, yet less well known, if the deterministic dynamics are linear, represented for example as

_{0}

**x**with the matrix

_{0}, as long as (

**x**) is not constant (i.e., the noise is state dependent or multiplicative; see right branch of Fig. 9). Note that the schematic diagram in Fig. 9 is not intended to represent observed (realistic) atmospheric conditions (most atmospheric phenomena are not bimodal); the bimodality is used to highlight the impact of multiplicative noise in an intuitive manner. The actual functional form of stochastic models representing aspects of atmospheric low-frequency variability can be found in specific papers (e.g., Majda et al. 2006, 2009).

Let us look into the physics of each branch in more detail, starting with the left one. There it is obvious that transitions from one potential well to the other driven by additive noise will result in a bimodal PDF, as long as the additive noise is not too strong (in that case we get a monomodal PDF because the potential barrier does not inhibit the motion of the trajectory). This is not, however, the only dynamical system that can produce such a PDF. Consider instead a linear system (the right branch), represented by a unimodal deterministic potential, in which the trajectories are perturbed by multiplicative noise. Because of the larger noise amplitudes near the center of the monomodal potential, as compared to the strength of the noise to the right and left, the system’s trajectory is more often found on either side of the central noise maximum, and this system will have a bimodal PDF as well. Thus the same non-Gaussian PDF can result from either a slow (deterministic) nonlinear dynamical system or a fast (stochastic) nonlinear dynamical system.

For a more mathematical perspective the starting point is again Eq. (15). We now define (to use the vector/matrix form of the coming expansion); let , and linearize around the basic state through a Taylor expansion to get

with the Jacobian [for the function **A**(**x**)] evaluated at and assuming that we have a constant mean with a vanishing deterministic forcing for the mean, [or that the tendency of the mean is forced by the deterministic drift of the mean, ]. Because the time mean of the multiplicative noise term is, in general, nonzero we have to introduce an additional mean forcing to ensure that . We also included a supposedly small residual **r**, representing the higher-order terms of the Taylor expansion and all other neglected phenomena. Thus, although the governing equations for weather and climate are nonlinear, for many applications the anomalies are well approximated by a linear SDE of the general component form

where we dropped the primes, lumped together all the derivatives and coefficients into new matrices and tensors, and used Einstein’s summation convention (sum over repeated indices) to highlight the functional form of, specifically, the noise terms. Note that *A*_{ij} is now a matrix and not a general (potentially nonlinear) function **A**(**x**) anymore. Therefore, besides the linear deterministic term *A*_{ij}*x*_{j}, *G*_{ip}*η*_{p} represents the additive (state independent), and *E*_{ijp}*x*_{j}*η*_{p} the multiplicative (state dependent) noise contribution. Note that the additive and multiplicative noise terms are correlated because each term is multiplied by the same noise *η*_{p}. In addition, *D*_{i} represents the correlations of the noise forcing with the state variable itself (the “noise-induced drift”), which can be shown to be related to the noise parameters as *D*_{i} = −0.5*E*_{ijp}*G*_{jp} (e.g., Sardeshmukh and Sura 2009).

Equation (17) without multiplicative noise (i.e., *E*_{ijp} = 0) already plays a central role in studying climate variability. For example, the matrices *A*_{ij} and *Q*_{ij} = *G*_{im}*G*_{jm} (the product of *G*_{im} with its transpose) can be estimated directly from the data. This technique is called linear inverse modeling and has been successfully applied to study many climate phenomena (e.g., Penland 1989; Penland and Ghil 1993; Penland and Matrosova 1994; Penland and Sardeshmukh 1995; Winkler et al. 2001). In addition, dynamical studies of quasigeostrophic turbulence often rely on multivariate linear models with additive noise (e.g., Farrell and Ioannou 1995, 1996; Whitaker and Sardeshmukh 1998; DelSole 2004). While the multivariate linear models with additive noise are very useful for many applications, they have one inherent shortcoming: they can only represent Gaussian statistics (Gardiner 2004) and are, therefore, more often than not inconsistent with observations. However, in the general case with multiplicative noise (i.e., *E*_{ijp} ≠ 0) Eq. (17) will exhibit non-Gaussian statistics.

Recent studies demonstrated that certain types of observed non-Gaussian statistics are consistent with linear stochastically forced dynamics with correlated additive and multiplicative (CAM) noise forcing (Sura and Sardeshmukh 2008; Sardeshmukh and Sura 2009; Sura and Sardeshmukh 2009; Sura and Gille 2010; Sura and Perron 2010; Sura 2011; B. West and P. Sura 2013, unpublished manuscript). The consistency follows from the fact that the PDF and closed expressions for the moments, such as skewness and kurtosis, can be analytically derived from the stationary Fokker–Planck equation for a system such as Eq. (17). In particular, for a variable *x* (with zero mean and standard deviation *σ*) drawn from such a PDF *p*(*x*) we obtain the following:

the property that the (excess) kurtosis

*K*is always greater than 1.5 times the square of the skewness*S*(minus an adjustment constant*r*), ; andthe power-law tails, that is, (with the power-law exponent

*α*) for large .

Remarkably, the theoretically derived *K*–*S* inequality and the power-law behavior is found to be largely consistent with observed non-Gaussian variability of many variables in the ocean and the atmosphere. One exception is that the observed power-law exponents are typically different for the positive and negative tails, pointing to the still incomplete character of the simplest (i.e., linear) multiplicative noise model.

An example, based on DJF 500-hPa geopotential height skewness and kurtosis (Fig. 2), is presented in Fig. 10 (B. West and P. Sura 2013, unpublished manuscript). The upper panel shows maps of estimated power-law exponents *α* for positive (left) and negative (right) geopotential height anomalies (i.e., the power-law *α* is separately estimated for positive and negative anomalies). The lower panel shows scatterplots of geopotential height kurtosis *K* versus skewness *S*. The skewness and kurtosis values are taken from DJF data present in Fig. 2 and are, therefore, obviously identical in both plots. The dashed parabolic lines denote , indicating the constraint that kurtosis is always greater than 1.5 times the square of the skewness (minus an adjustment constant). The color of the points represents the value of *α* for positive (left) and negative (right) geopotential height anomalies.

### c. Kinematic (meandering jet stream)

It has been recently suggested that the skewness pattern of 500-hPa geopotential height anomalies can arise as a simple kinematic consequence of a meandering jet stream (Luxford and Woollings 2012). A Northern Hemisphere jet corresponds to a strong meridional geopotential height gradient. As the jet meanders, the gradient shifts north and south with the jet. If we now assume a location north (south) of the mean jet and consider jet stream shifts, we will see that the resulting PDF is positively (negatively) skewed. When the jet stream moves south there is little effect at this point, because the geopotential height gradient to the north of the jet is relatively weak. If, however, the jet moves north, the region of strong gradients moves over our point of observation and experiences a strong increase in height. This will naturally result in more positive than negative geopotential height anomalies and hence a positively skewed PDF. The opposite is true (i.e., we obtain a negatively skewed PDF) if we consider a point just to the south of the mean jet. The idea is schematically presented in Fig. 11 where the curves represent schematics of the tropopause height. Note that this is a simplified schematic omitting some details. For example, when the jet meanders equatorward it is typically slightly stronger and higher up compared to the poleward displacement.

### d. Cross-frequency coupling

Cross-frequency coupling can be best explained by reexamining the definition of skewness and (excess) kurtosis.^{7} Let us assume that the anomaly time series *x* is split into two frequency bands: . Then skewness *S* and excess kurtosis *K* of *x* can be written as

where *σ*_{x} is the standard deviation of the full signal *x*. It can be seen that the non-Gaussianity of *x* is not simply related to the higher moments of *x*_{1} and *x*_{2} alone, but also to higher-order mixed moments. That is, only if the mixed moments are zero (or add up to zero), the skewness (and kurtosis) of the full signal is a linear combination of the synoptic and low-frequency skewness (and kurtosis). Of course, the choice of the frequency bands is arbitrary; the crucial point is that skewness and kurtosis of a time series *x* always includes cross-frequency terms.

For example, the skewness of wintertime 500-hPa geopotential height variability is dominated by the cross-frequency coupling between low [<(30 day)^{−1}] and medium [from (30 day)^{−1} to (6 day)^{−1}] frequency variability (Rennert and Wallace 2009). To come to that conclusion, Rennert and Wallace (2009) analyze the 6-day low-pass filtered geopotential height field *Z*_{LM}, which consists of a low-frequency contribution *Z*_{L} and a medium-frequency part *Z*_{M}: *Z*_{LM} = *Z*_{L} + *Z*_{M}.

Rennert and Wallace (2009) then calculate the individual contributions to the overall skewness of *Z*_{LM}: , , , and . The results [adapted from Rennert and Wallace (2009)] are shown in Fig. 12. First, the total skewness of *Z*_{LM} is shown in Fig. 12a. Most important, the contribution from the cross-frequency coupling term is presented in Fig. 12b. The contributions from and are shown in Figs. 12c and 12d, respectively. The remaining cross-coupling term is negligible and, therefore, not shown. It can be seen that the term is the dominant contributor, showing that the skewness of wintertime 500-hPa geopotential height variability is dominated by the cross-frequency coupling between low- and medium-frequency variability.

### e. Nonlinear boundary layer drag

Because the atmospheric boundary plays an important role in influencing the free troposphere and, therefore, the general circulation, we also include a boundary layer mechanism that induces non-Gaussianity (skewness) in surface winds. Of course, the detailed knowledge of surface wind statistics is very important for many practical applications such as weather risk management. From a more scientific point of view, sea surface wind statistics are also important to understand ocean–atmosphere coupling through momentum and heat fluxes.

To explain the impact of nonlinear boundary layer drag on the statistics of surface winds, we focus on the skewness of sea surface winds analyzed by Monahan (2004, 2006a,b). For example, Fig. 13 shows the mean, standard deviation, and skewness of zonal sea surface (995 hPa) winds from NCEP–NCAR reanalyses (Kalnay et al. 1996; Kistler et al. 2001). The mean sea surface zonal wind field displays the familiar tropical easterlies and midlatitude westerlies. The standard deviation shows variability minima in the tropics/subtropics and maxima in the midlatitude storm tracks. The skewness of the zonal wind is generally positive in the tropics and negative in the midlatitudes. In fact, there exists a linear relationship between the mean and skewness of the zonal winds: positive means winds are related to negative skewness and vice versa. The same relationship holds for meridional winds; see Monahan (2004) for more details.

The basic principle behind the observed anticorrelation between mean and skewness of sea surface winds can be understood rather easily (Monahan 2004). The main building block is the fact that the stress in the boundary layer is nonlinear. According to Monin–Obukhov similarity theory, the zonal and meridional components of the surface stress vector are and , where *u* and *υ* are the zonal and meridional surface wind components, *c*_{D} is the drag coefficient, and *ρ* is the density of air. Besides the pressure gradient force, the Coriolis force, downward momentum transport (from the top of the boundary layer), and nonlinear advection, the local rates of velocity changes are impacted by the stresses:

where *h* is the depth of the boundary layer [see Monahan (2004) for more details]. Now the basic understanding of the link between the mean and the skewness of the sea surface wind components is straightforward. First suppose that the mean wind is zero and that the fluctuations are equally likely to be positive and negative. Then both the mean and skewness of the resulting wind will be zero. Now suppose that the mean wind is positive. Because of the nonlinear surface drag, positive wind anomalies will experience stronger friction than negative ones. Therefore, the winds will be negatively skewed. The opposite is true if we consider a negative mean wind. That is, the non-Gaussianity of the sea surface winds is due to the nonlinear boundary layer drag. Over land the situation is a bit more complicated because the boundary layer is influenced more strongly by the diurnal cycle than over sea (He et al. 2010; Monahan et al. 2011).

## 5. Non-Gaussianity as modes via optimization principles

The complexity of weather/climate variability implies that non-Gaussianity is an intrinsic physical property of the system. This suggests that modes of atmospheric variability characterized by non-Gaussianity can be defined. Like the case of prominent modes of variability, such as the Arctic Oscillation (AO), the NAO, the EA patterns, etc., which explain maximum variance, there is the possibility to explore modes of variability that maximize various criteria of non-Gaussianity. Skewness and excess kurtosis are obvious examples of such criteria. Other more exotic measures can also be used such as multimodality of the system’s PDF, or the negative entropy (negentropy; see, e.g., Christiansen 2009; Hannachi et al. 2009), defined by

where *p*(*x*) is the PDF of the variable *X* and *g*(*x*) is the Gaussian PDF with the same variance as *X*.

Directions within the system’s state space that maximize these nonnormality measures can be used to identify prominent patterns. Projection pursuit (Friedman and Tukey 1974) refers to the method that identifies directions **a**, within the state space, such that the time series obtained by projecting the data **x**_{t}, onto **a** (i.e., ) maximizes a specific criterion , for example,

Christiansen (2009) applied projection pursuit to 20- and 500-hPa geopotential heights from NCEP–NCAR reanalyses for the period 1948–2005 by optimizing five indices. In terms of the 500-hPa heights Christiansen (2009) identified a pattern resembling a combination of the PNA and a European pattern when kurtosis is maximized, and a NAO pattern when the negentropy is maximized.

Maximizing non-Gaussianity (e.g., skewness and negentropy) actually overlaps substantially with independent component analysis (ICA; Hyvärinen et al. 2001), which aims at finding statistically independent patterns. Hannachi et al. (2009) presented ICA as a EOF rotation method by maximizing independence between EOF loadings as measured by a fourth-order moment and identified patterns like the NAO and AO (note that the resulting patterns are not spatially orthogonal like EOF modes). In the above studies (Christiansen 2009; Hannachi et al. 2009) skewness was not considered explicitly. Pasmanter and Selten (2010) maximized skewness in an attempt to decompose atmospheric data into skewness modes. The method is similar to projection pursuit and seeks to maximize skewness subject to unit variance, that is,

where is the expectation operator and represents the covariance matrix of , . After prewhitening the data and differentiation, the above system boils down to a quadratic system of equations:

where represents the elements of the skewness tensor. The above system can be solved only numerically when *p* > 2.

Pasmanter and Selten (2010) applied this to the winter (DJF) daily 500-hPa streamfunction from ERA-40. The first maximal skewness mode represents a Rossby wave–like propagating from central Pacific along the west North American coast (Fig. 14, left). The second skewness mode (Fig. 14, right) projects onto the PNA pattern and contains a west Pacific pattern and a particular phase of the wave in the intermediate frequency range of Rennert and Wallace (2009, see their Fig. 14). These modes explain altogether around 8% of the total variance (by construction the skewness modes are temporally uncorrelated, so it makes sense to talk about them explaining a certain fraction of the total variance).

A few important points can be drawn from these findings. Although the amount of variance explained by those patterns maximizing skewness is not very large it is still significant when compared to the variance of extratropical modes of variability or EOFs. The other and perhaps most important point is that modes maximizing skewness or kurtosis project well onto the large-scale flow. These observations point to the importance of non-Gaussianity in the atmosphere, which cannot be overlooked. There is also the possibility that some of the “interesting structures” of large-scale flow may lie on a (nonlinear) low-dimensional manifold for which nonlinear methods (Hannachi and Turner 2013a,b; Monahan et al. 2001), can, despite their shortcomings (Monahan and Fyfe 2007), still provide some valuable information. It is also important to say, in this regard, that when we deal with non-Gaussian structures conventional approaches that operate on hyperplanes may not be efficient and one may have to try different measures that focus on the non-Gaussian character based, for example, on information theoretic metrics (Monahan and DelSole 2009).

## 6. Summary and discussion

Understanding non-Gaussian statistics of weather and climate variability is important in the atmospheric and ocean sciences for many reasons including weather/climate prediction, risk assessment, and model evaluation/comparison. For instance, weather and climate risk assessment, attributable, for example, to extremes, depends on knowing and understanding the exact shape of the system PDF. In particular, we need to know the different mechanisms responsible for, and quantify their individual contribution to, the non-Gaussianity in order to understand how a PDF might change under global warming.

In this paper we have attempted to do a systematic survey from the literature of non-Gaussianity in large-scale tropospheric variability and compiled it in one place, which we hope will stimulate further discussion within the community. Although non-Gaussianity is ubiquitous in the climate system and exists almost in the entirety of the spectrum here we focused mainly on low-frequency variability and large-scale motion of the free atmosphere and the atmospheric boundary layer. Non-Gaussianity can be investigated through analyzing the PDF of the system. Because of system’s complexity PDF estimation is often restricted to a few modes of variability. A simple alternative is to use the third- and fourth-order moments, that is, skewness and kurtosis computed usually pointwise.

As it turns out, most atmospheric low-frequency variability is characterized by largely significant skewness and kurtosis partly enhanced by the persistence characterizing the system. Different sources and mechanisms contribute to the observed non-Gaussianity, namely the following:

nonlinear regime dynamics,

multiplicative noise,

cross-frequency coupling,

jet variability and meandering, and

nonlinear boundary layer drag

Processes such as blocking and associated extreme weather events can be explained by the jet variability and nonlinearity. Undoubtedly not all extreme events are linked to the jet stream variability and other large-scale processes. Local phenomena such as wind gusts, convective fronts, and thunderstorms and many mesoscale extreme events have nonnegligible effect on the PDF of weather phenomena, which can cascade, through cumulative effects, to the large-scale flow and ultimately enhance its nonnormality. But as we have chosen to focus on low-frequency and large-scale flows these mesoscale phenomena are not considered. It is worth noting here that it is extremely difficult to identify or trace back the contribution of small scales and mesoscales to the observed non-Gaussianity of large-scale flows.

Patterns maximizing various measures of non-Gaussianity are found to represent large-scale patterns and do include some signatures from familiar teleconnection patterns such as the NAO, PNA, and the West Pacific (WP) pattern. Patterns maximizing skewness, however, do not exactly match those large-scale teleconnection patterns. The leading maximum skewness patterns explain a significant, though not very large, percentage of variance. Those leading patterns project well onto the large-scale flow and reflect the importance of non-Gaussianity in weather and climate, which cannot be overlooked.

Although we focused mainly on non-Gaussianity in the large-scale troposphere, non-Gaussianity is also and without a shadow of a doubt a characteristic feature of large-scale ocean variability. In fact, from a climate perspective, the non-Gaussianity of sea surface temperature (SST) variability plays an important role because SSTs directly influence atmospheric events such as hurricanes. In addition, SST-driven atmosphere–ocean interactions are responsible for large-scale climate phenomena like El Niño–Southern Oscillation (ENSO). Local SST variability is strongly non-Gaussian and possibly driven by multiplicative noise (Sura and Sardeshmukh 2008). In this case the physical aspect of the state-dependent noise forcing is easy to understand. If we (realistically) assume that SST variability is mainly driven by the vertical heat flux *F*_{Q} through the air–sea boundary, the typical bulk heat flux parameterization takes the form (positive flux upward), where *T*_{o} and *T*_{a} are the sea surface and air temperatures, respectively, is the wind speed, and *β* is a positive parameter depending on densities of seawater and air, specific heats, the Bowen ratio, and other physical processes. If we allow for rapidly varying, effectively stochastic winds we immediately see that the rapid wind component multiplies the slowly varying SST, giving rise to an effectively multiplicative noise forcing. In strong currents like the Gulf Stream, random advection by surface currents plays a (mathematically) similar role (Sura 2010). Large-scale non-Gaussian SST variability, such as in the ENSO region (e.g., Burgers and Stephenson 1999), may also be explained by nonlinear dynamics of the coupled ocean–atmosphere system. Examples include nonlinear stochastic oscillator models (e.g., Neelin et al. 1998; Dijkstra 2006; Cane and Zebiak 1985; Suarez and Schopf 1988, and many others), nonlinear stochastic models relating the thermocline depth to the SST (e.g., Hannachi et al. 2003), or even exotic chaotic dynamics (e.g., Jin et al. 1994). A detailed discussion of nonlinear and non-Gaussian ENSO dynamics is, however, beyond the scope of this paper.

At this point a paradigm does not seem to exist that allows us to physically understand how a PDF might change in a warming climate, and the development of a conceptual yet physically reasonable model to study how climate change might affect non-Gaussianity would be invaluable in this regard. Many climate projections just look into the change of the mean and the variance, that is, assuming Gaussian statistics. However, the non-Gaussian statistics (the shape of the distribution) will most likely also be altered in a changing climate. Although various studies have investigated changes in weather and climate extremes due to a warming climate, apart from a few studies that investigated some aspects of high-order moments (e.g., skewness or kurtosis) related to jet latitude (Hannachi et al. 2013; Barnes and Hartmann 2010; Barnes and Polvani 2013) no quantitative studies have genuinely investigated skewness and/or kurtosis changes pointing to the need to investigate those changes in a future warming climate.

One of our main future research lines will be to develop methods to quantitatively understand the non-Gaussian features of weather/climate PDFs as this will turn out to be invaluable in gaining profound insights into weather/climate dynamics. While there are, as we have seen, many mechanisms contributing to non-Gaussian atmospheric statistics, we believe that there may exist a deeper grand principle beyond the diversity of phenomena. For example, one possibility is that atmospheric and/or climate statistics are collectively non-Gaussian to optimize a specific functional of the weather/climate system. One possible candidate to explore would be the entropy production (e.g., Paltridge 1975; Ozawa et al. 2003; Kleidon 2009). That is, the climate system may exhibit a specific collective non-Gaussianity to maximize the overall entropy production. Although this idea is purely speculative, it may point a way forward toward a unified perspective of non-Gaussianity in weather and climate variability.

## Acknowledgments

Most of this work was done while PS was visiting the International Meteorological Institute (IMI) at Stockholm University. We are grateful for the generous funding provided by IMI, and the support from faculty and staff of the Department of Meteorology (MISU). We thank Dr. Adam Monahan and two anonymous reviewers for their constructive comments that helped improve the manuscript. We also thank Drs. Libby Barnes and Christian Franzke for their comments and for the discussion we had with them.

## REFERENCES

*Milankovitch and Climate: Understanding the Response to Astronomical Forcing.*NATO ASI Series, Vol. 126, Springer, 896 pp.

**62,**1792–1811, doi:.

**36,**1205–1216, doi:.

*Handbook of Stochastic Methods for Physics, Chemistry and the Natural Science.*3rd ed. Springer-Verlag, 415 pp.

*Handbook of Statistics 1: Analysis of Variance,*P. R. Krishnaiah, Ed., North Holland Publishing Company, 279–320.

*Stochastic Differential Equations*. 6th ed. Springer-Verlag, 369 pp.

*ances in*Geophys

*ics,*Vol. 34, Academic Press, 93–174, doi:.

*The History of Statistics: The Measurement of Uncertainty before 1900*. Belknap Press, 432 pp.

*Extremes in a Changing Climate,*A. AghaKouchak et al., Eds., Springer, 181–222.

*Advances in Geophysics,*Vol. 29, Academic Press, 227–249, doi:.

*Stochastic Processes in Physics and Chemistry.*3rd ed. Elsevier, 463 pp.

*Statistical Methods in the Atmospheric Sciences.*2nd ed. Academic Press, 627 pp.

## Footnotes

^{1}

Hence higher predictability.

^{2}

Mardia (1980) referred in her book to a statement by Geary (1947): “Normality is a myth; there never was, and never will be a normal distribution.” Mardia, however, points out that this is an overstatement from a practical point of view and it is important to know when a sample departs from normality.

^{4}

Referred to in the literature as Jaynes’ principle.

^{5}

Considered as a proxy for the dynamical tropopause.

^{6}

That contains 84% of the total volume delimited by the bivariate Gaussian PDF and the horizontal plane.

^{7}

Note that it is not unreasonable to consider cross-frequency coupling as kinematic, but we prefer to separate them as cross-frequency coupling does not need a jet.